Wednesday 29 November 2017

Hot Folder Data Importing

With hot folder data importing, CSV files are imported automatically when they are moved to a folder that is scanned periodically by the system. 

The acceleratorservices extension comes with a batch package that enables automated importing of data from hot folders. 

The infrastructure enables the import of CSV files that are internally translated into multi-threaded ImpEx scripts. 

The infrastructure uses Spring Integration to provide a service-based design.

Diagram of Components
The classes are structured into three major parts:
  • Tasks executed by the Spring Integration infrastructure: HeaderSetupTask, BatchHeader, HeaderTask, HeaderInitTask, ImpexTransformerTask, ImpexRunnerTask, CleanupTask
  • Converters that provide the ImpEx header and convert CSV rows into ImpEx rows, with optional filtering: ImpexConverter, ImpexRowFilter
  • Helper and utility classes: SequenceIdParser, RegexParser, CleanupHelper

General Flow
  • Spring Integration periodically scans the configured input directory for new files.
  • If new files are found, they are moved to the processing subdirectory and then sent to the Batch Import pipeline, which consists of the following tasks:
  • HeaderSetupTask: Creates a new BatchHeader.
  • HeaderInitTask: Retrieves a sequence ID and (optionally) a language from the file name.
  • ImpexTransformerTask: Creates one or many ImpEx files from the CSV input and writes error lines to the error subdirectory.
  • ImpexRunnerTask: Processes all ImpEx files sequentially; each import runs with multiple threads.
  • CleanupTask: Deletes all transformed files and moves the imported file with an optionally appended timestamp to the archive subdirectory.
  • ErrorHandler: Deletes all transformed files and moves the imported file with an optionally appended timestamp to the error subdirectory.
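
How these tasks are chained together is shown below as a minimal Spring Integration sketch. The channel names, bean ids, and method names are placeholders, and the namespace declarations for the int: and file: schemas are omitted; the shipped hot-folder-common-spring.xml contains the authoritative wiring. Sample definitions for the individual task beans appear in the Technical Background section.

<!-- Channels connecting the stages of the batch import pipeline (names are illustrative). -->
<int:channel id="batchFilesProc"/>
<int:channel id="batchFilesHeaderInit"/>
<int:channel id="batchFilesTran"/>
<int:channel id="batchFilesImp"/>
<int:channel id="batchFilesCleanup"/>

<!-- Each task is invoked as a service activator; its return value is wrapped in a message
     and passed on to the next channel in the chain. -->
<int:service-activator input-channel="batchFilesProc"       output-channel="batchFilesHeaderInit"
                       ref="batchHeaderSetupTask" method="execute"/>
<int:service-activator input-channel="batchFilesHeaderInit" output-channel="batchFilesTran"
                       ref="batchHeaderInitTask" method="execute"/>
<int:service-activator input-channel="batchFilesTran"       output-channel="batchFilesImp"
                       ref="batchTransformerTask" method="execute"/>
<int:service-activator input-channel="batchFilesImp"        output-channel="batchFilesCleanup"
                       ref="batchRunnerTask" method="execute"/>
<int:service-activator input-channel="batchFilesCleanup"
                       ref="batchCleanupTask" method="execute"/>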

Configuration Files
  • hot-folder-spring.xml (<HYBRIS_BIN_DIR>/ext-accelerator/acceleratorservices/resources/acceleratorservices/integration)
  • hot-folder-common-spring.xml (<HYBRIS_BIN_DIR>/ext-template/yacceleratorcore/resources/yacceleratorcore/integration)
  • hot-folder-store-electronics-spring.xml (<HYBRIS_BIN_DIR>/ext-template/yacceleratorcore/resources/yacceleratorcore/integration)
  • hot-folder-store-apparel-spring.xml (<HYBRIS_BIN_DIR>/ext-template/yacceleratorcore/resources/yacceleratorcore/integration)
  • hot-folder-store-powertools-spring.xml (<HYBRIS_BIN_DIR>/ext-template/yb2bacceleratorcore/resources/yb2bacceleratorcore/integration)
  • project.properties (<HYBRIS_BIN_DIR>/ext-accelerator/acceleratorservices)

Spring Integration Configuration
  • file:inbound-channel-adapter: Scans a directory at a configurable interval and sends files to a configured channel under the following conditions: 
    • Only files matching a specified regular expression are retrieved (filename-regex). 
    • Files are processed in the order defined by the FileOrderComparator, using the following priority rules: 
      • If a priority is configured for the file prefix, that priority is used. 
      • For files with equal priority, the older file is processed first. 
  • file:outbound-gateway: Moves a file to the processing subdirectory.
  • int:service-activator: Invokes a referenced bean when a message is received on the configured input channel. The bean's response is wrapped in a new message and sent to the configured output channel.
  • int:channel: Sets up a channel.
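
For illustration, a trimmed-down configuration using these elements might look as follows. Directory paths, the polling interval, the file name pattern, and channel ids are placeholders; the shipped hot-folder-*-spring.xml files are the authoritative versions. The batchFilesProc channel is the entry point of the pipeline sketched in the General Flow section above.

<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:int="http://www.springframework.org/schema/integration"
       xmlns:file="http://www.springframework.org/schema/integration/file"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
           http://www.springframework.org/schema/beans/spring-beans.xsd
           http://www.springframework.org/schema/integration
           http://www.springframework.org/schema/integration/spring-integration.xsd
           http://www.springframework.org/schema/integration/file
           http://www.springframework.org/schema/integration/file/spring-integration-file.xsd">

    <!-- Poll the hot folder every second; only files matching the regular expression are picked up,
         ordered by the fileOrderComparator bean (a FileOrderComparator). -->
    <file:inbound-channel-adapter id="batchFilesInbound"
            directory="${hotfolder.import.dir}"
            filename-regex="^(.*)-(\d+)\.csv$"
            comparator="fileOrderComparator"
            channel="batchFiles">
        <int:poller fixed-rate="1000"/>
    </file:inbound-channel-adapter>

    <int:channel id="batchFiles"/>

    <!-- Move the picked-up file into the processing subdirectory and delete the original,
         then hand it to the first pipeline channel. -->
    <file:outbound-gateway request-channel="batchFiles"
            reply-channel="batchFilesProc"
            directory="${hotfolder.import.dir}/processing"
            delete-source-files="true"/>
</beans>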

Technical Background
  • Spring Integration periodically scans the configured input directory for new files (see hot-folder-store-apparel-spring.xml and project.properties).
  • New files are moved to the processing subdirectory (delete-source-files=true removes the original source file after it has been written to the destination), and then HeaderSetupTask is called (the method attribute selects the method of the referenced bean to invoke).
HeaderSetupTask
  • catalog: The catalog to use. This setting is applied to the default header substitution $CATALOG$.
  • net: The net setting to apply to prices. This setting is applied to the default header substitution $NET$.
  • HeaderInitTask is called next (a sample HeaderSetupTask bean definition is sketched below).
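A HeaderSetupTask bean along these lines might be declared as follows; the bean id, property values, and the fully qualified class name are assumptions (check the acceleratorservices extension for the actual package):

<!-- Hypothetical HeaderSetupTask bean; catalog and net feed the $CATALOG$ and $NET$ header substitutions. -->
<bean id="batchHeaderSetupTask"
      class="de.hybris.platform.acceleratorservices.dataimport.batch.task.HeaderSetupTask">
    <property name="catalog" value="apparelProductCatalog"/>
    <property name="net" value="false"/>
</bean>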
 
HeaderInitTask
  • sequenceIdParser: The regular expression used to extract the sequence ID from the file name.
  • languageParser: The regular expression used to extract the language from the file name.
  • fallbackLanguage: The language to use if no language is set in the file name.
  • ImpexTransformerTask is called next; its init-method is executed first and then the specified method of the bean. A sketch of the HeaderInitTask bean follows below.
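A minimal sketch of such a HeaderInitTask bean, assuming illustrative bean ids, regular expressions, and package names:

<!-- Hypothetical HeaderInitTask bean; the parsers apply regular expressions to file names such as product-en-000123.csv. -->
<bean id="batchHeaderInitTask"
      class="de.hybris.platform.acceleratorservices.dataimport.batch.task.HeaderInitTask">
    <property name="sequenceIdParser" ref="batchSequenceIdParser"/>
    <property name="languageParser" ref="batchLanguageParser"/>
    <property name="fallbackLanguage" value="en"/>
</bean>

<!-- Assumed wiring of the helper classes listed above (SequenceIdParser, RegexParser). -->
<bean id="batchSequenceIdParser"
      class="de.hybris.platform.acceleratorservices.dataimport.batch.util.SequenceIdParser">
    <property name="parser">
        <bean class="de.hybris.platform.acceleratorservices.dataimport.batch.util.RegexParser">
            <property name="regex" value="-(\d+)\.csv"/>
        </bean>
    </property>
</bean>

<bean id="batchLanguageParser"
      class="de.hybris.platform.acceleratorservices.dataimport.batch.util.RegexParser">
    <property name="regex" value="-([a-z]{2})-"/>
</bean>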
 
 
ImpexTransformerTask performs the following tasks: 
  • It retrieves all configured converters matching the file name prefix.
  • For every converter found, it converts the input file as follows: 
    • It adds the ImpEx file header once, with substitutions applied.
    • It converts every row that is not filtered out.
    • If a line has missing input fields, the line is written along with the error message to a file in the error subdirectory.
ImpexTransformerTask
  • fieldSeparator: The separator used to read CSV files (default value: ,). 
  • encoding: The file encoding to use (default value: UTF-8). 
  • linesToSkip: The number of lines to skip in every CSV file (default value: 0).
  • converterMap: Maps file name prefixes to one or multiple converters that produce the ImpEx files.
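A sketch of the corresponding transformer bean, with illustrative ids and an assumed init-method name; the converterMap entry references the base_product converter sketched later in this post:

<!-- Hypothetical ImpexTransformerTask bean; the init-method name and package are assumptions. -->
<bean id="batchTransformerTask"
      class="de.hybris.platform.acceleratorservices.dataimport.batch.task.ImpexTransformerTask"
      init-method="initConvertersMap">
    <property name="fieldSeparator" value=","/>
    <property name="encoding" value="UTF-8"/>
    <property name="linesToSkip" value="0"/>
    <property name="converterMap">
        <map>
            <!-- file name prefix mapped to one or more converters -->
            <entry key="base_product">
                <list>
                    <ref bean="baseProductConverter"/>
                </list>
            </entry>
        </map>
    </property>
</bean>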
CleanupHelper
  • timeStampFormat: If set, appends a timestamp in the specified format to input files moved to the archive or error subdirectory.
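The helper can be declared once and shared by the cleanup and error handling steps; the format value below is only an example:

<!-- Hypothetical CleanupHelper bean; the timestamp is appended to archived or failed input files. -->
<bean id="cleanupHelper"
      class="de.hybris.platform.acceleratorservices.dataimport.batch.util.CleanupHelper">
    <property name="timeStampFormat" value="yyyyMMddHHmmss"/>
</bean>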
ConverterMapping
A ConverterMapping associates a file name prefix (for example, base_product) with one or more converters; a sample mapping and converter are sketched after the following property list.
Converter
  • header: The ImpEx header to use including header substitutions:
    $NET$: the net setting
    $CATALOG$: the catalog prefix
    $LANGUAGE$: the language setting
    $TYPE$: an optional type attribute that can be applied if filtering is configured 
  • impexRow: The template for an ImpEx row, adhering to the syntax: {('+')? (<columnId> | 'S')}
    The '+' character adds a mandatory check to this column. Any lines with missing attributes are written to an error file in the error subdirectory.
    The 'S' can be used for writing the current sequence ID at the template position. Optionally, columns can be quoted by enclosing the column in the template with quotation marks.
  • rowFilter: An optional row filter. The supplied expression must be a valid Groovy expression. The current row map consisting of column ID and value is referenced by row.
    Configuring multiple converters with row filters gives the option to split a supplied CSV input file into different ImpEx files according to specified filter criteria. 
  • type: An optional type that can be retrieved in the header using the header substitution $TYPE$.
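Putting this together, a mapping and a converter for the base_product file name prefix might be declared roughly as follows. The bean ids, class and package names, and the ImpEx header and row template are illustrative assumptions, not the shipped configuration:

<!-- Hypothetical mapping: associates the base_product prefix with the converter below. -->
<bean id="baseProductConverterMapping"
      class="de.hybris.platform.acceleratorservices.dataimport.batch.converter.mapping.impl.DefaultConverterMapping">
    <property name="mapping" value="base_product"/>
    <property name="converter" ref="baseProductConverter"/>
</bean>

<!-- Hypothetical converter: header substitutions are filled in from the BatchHeader at transformation time. -->
<bean id="baseProductConverter"
      class="de.hybris.platform.acceleratorservices.dataimport.batch.converter.impl.DefaultImpexConverter">
    <property name="header">
        <value>
            # ImpEx for importing base products into $CATALOG$
            INSERT_UPDATE Product;code[unique=true];name[lang=$LANGUAGE$];unit(code);catalogVersion(catalog(id),version)[default=$CATALOG$:Staged]
        </value>
    </property>
    <!-- Column 0 (mandatory) is the product code, column 1 the name, column 2 the unit. -->
    <property name="impexRow">
        <value>;{+0};{1};{2}</value>
    </property>
</bean>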
A converter for the variant file name prefix follows the same pattern, typically combined with an ImpexRowFilter so that only matching rows are converted; a sketch follows below.
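Such a variant converter could look roughly like this; the row filter implementation class, the Groovy expression, the column indices, and the type value are assumptions used only for illustration:

<!-- Hypothetical variant converter; a corresponding mapping for the variant prefix would reference it. -->
<bean id="variantConverter"
      class="de.hybris.platform.acceleratorservices.dataimport.batch.converter.impl.DefaultImpexConverter">
    <property name="header">
        <value>
            # ImpEx for importing size variants into $CATALOG$
            INSERT_UPDATE $TYPE$;code[unique=true];baseProduct(code);size[lang=$LANGUAGE$]
        </value>
    </property>
    <property name="impexRow">
        <value>;{+0};{+1};{3}</value>
    </property>
    <!-- Hypothetical ImpexRowFilter implementation that evaluates a Groovy expression against the row map. -->
    <property name="rowFilter">
        <bean class="de.hybris.platform.acceleratorservices.dataimport.batch.converter.impl.GroovyRowFilter">
            <property name="expression" value="row[3] != null"/>
        </bean>
    </property>
    <!-- Available in the header through the $TYPE$ substitution. -->
    <property name="type" value="ApparelSizeVariantProduct"/>
</bean>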
  • ImpexRunnerTask is called.
All generated ImpEx files are sent sequentially to the platform ImportService by the AbstractImpexRunnerTask.
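An assumed wiring of the runner bean (importService and sessionService are standard platform beans; the package name is an assumption):

<!-- Hypothetical ImpexRunnerTask bean; it hands each generated ImpEx file to the platform ImportService. -->
<bean id="batchRunnerTask"
      class="de.hybris.platform.acceleratorservices.dataimport.batch.task.ImpexRunnerTask">
    <property name="sessionService" ref="sessionService"/>
    <property name="importService" ref="importService"/>
</bean>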
  • CleanupTask is called.
 
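Finally, a sketch of the cleanup bean, reusing the CleanupHelper shown earlier (ids and package are assumptions):

<!-- Hypothetical CleanupTask bean; the helper moves the imported file to the archive subdirectory and deletes transformed files. -->
<bean id="batchCleanupTask"
      class="de.hybris.platform.acceleratorservices.dataimport.batch.task.CleanupTask">
    <property name="cleanupHelper" ref="cleanupHelper"/>
</bean>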
