Wednesday 29 November 2017

Hot Folder Data Importing

With hot folder data importing, CSV files are imported automatically when they are moved to a folder that is scanned periodically by the system. 

The acceleratorservices extension comes with a batch package that enables automated importing of data from hot folders. 

The infrastructure enables the import of CSV files that are internally translated into multi-threaded ImpEx scripts. 

The infrastructure uses Spring Integration to provide a service-based design.

Diagram of Components
The classes are structured into three major parts:
  • Tasks executed by the Spring Integration infrastructure: HeaderSetupTask, BatchHeader, HeaderTask, HeaderInitTask, ImpexTransformerTask, ImpexRunnerTask, CleanupTask
  • Converters that provide the ImpEx header and convert CSV rows into ImpEx rows, with optional filtering: ImpexConverter, ImpexRowFilter
  • Helper and utility classes: SequenceIdParser, RegexParser, CleanupHelper

General Flow
  • Spring Integration periodically scans the configured input directory for new files.
  • If new files are found, they are moved to the processing subdirectory and then sent to the Batch Import pipeline, which consists of the following tasks:
  • HeaderSetupTask: Creates a new BatchHeader.
  • HeaderInitTask: Retrieves a sequence ID and (optionally) a language from the file name.
  • ImpexTransformerTask: Creates one or many ImpEx files from the CSV input and writes error lines to the error subdirectory.
  • ImpexRunnerTask: Processes all ImpEx files sequentially; each import runs with multiple threads.
  • CleanupTask: Deletes all transformed files and moves the imported file with an optionally appended timestamp to the archive subdirectory.
  • ErrorHandler: Deletes all transformed files and moves the imported file with an optionally appended timestamp to the error subdirectory.
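
How these tasks are chained together is shown below as a minimal Spring Integration sketch. The channel names, bean ids, and method names are placeholders, and the namespace declarations for the int: and file: schemas are omitted; the shipped hot-folder-common-spring.xml contains the authoritative wiring. Sample definitions for the individual task beans appear in the Technical Background section.

<!-- Channels connecting the stages of the batch import pipeline (names are illustrative). -->
<int:channel id="batchFilesProc"/>
<int:channel id="batchFilesHeaderInit"/>
<int:channel id="batchFilesTran"/>
<int:channel id="batchFilesImp"/>
<int:channel id="batchFilesCleanup"/>

<!-- Each task is invoked as a service activator; its return value is wrapped in a message
     and passed on to the next channel in the chain. -->
<int:service-activator input-channel="batchFilesProc"       output-channel="batchFilesHeaderInit"
                       ref="batchHeaderSetupTask" method="execute"/>
<int:service-activator input-channel="batchFilesHeaderInit" output-channel="batchFilesTran"
                       ref="batchHeaderInitTask" method="execute"/>
<int:service-activator input-channel="batchFilesTran"       output-channel="batchFilesImp"
                       ref="batchTransformerTask" method="execute"/>
<int:service-activator input-channel="batchFilesImp"        output-channel="batchFilesCleanup"
                       ref="batchRunnerTask" method="execute"/>
<int:service-activator input-channel="batchFilesCleanup"
                       ref="batchCleanupTask" method="execute"/>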

Configuration Files
  • hot-folder-spring.xml (<HYBRIS_BIN_DIR>/ext-accelerator/acceleratorservices/resources/acceleratorservices/integration)
  • hot-folder-common-spring.xml (<HYBRIS_BIN_DIR>/ext-template/yacceleratorcore/resources/yacceleratorcore/integration)
  • hot-folder-store-electronics-spring.xml (<HYBRIS_BIN_DIR>/ext-template/yacceleratorcore/resources/yacceleratorcore/integration)
  • hot-folder-store-apparel-spring.xml (<HYBRIS_BIN_DIR>/ext-template/yacceleratorcore/resources/yacceleratorcore/integration)
  • hot-folder-store-powertools-spring.xml (<HYBRIS_BIN_DIR>/ext-template/yb2bacceleratorcore/resources/yb2bacceleratorcore/integration)
  • project.properties (<HYBRIS_BIN_DIR>/ext-accelerator/acceleratorservices)

Spring Integration Configuration
  • file:inbound-channel-adapter: Scans a directory at a configurable interval and sends files to a configured channel under the following conditions: 
    • Only files matching a specified regular expression are retrieved (filename-regex). 
    • Files are processed in the order defined by the FileOrderComparator, using the following priority rules: 
      • If a priority is configured for the file prefix, that priority is used. 
      • For files with equal priority, the older file is processed first. 
  • file:outbound-gateway: Moves a file to the processing subdirectory.
  • int:service-activator: Invokes a referenced bean when a message is received on the configured input channel. The bean's response is wrapped in a new message and sent to the configured output channel.
  • int:channel: Sets up a channel.
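
For illustration, a trimmed-down configuration using these elements might look as follows. Directory paths, the polling interval, the file name pattern, and channel ids are placeholders; the shipped hot-folder-*-spring.xml files are the authoritative versions. The batchFilesProc channel is the entry point of the pipeline sketched in the General Flow section above.

<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:int="http://www.springframework.org/schema/integration"
       xmlns:file="http://www.springframework.org/schema/integration/file"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
           http://www.springframework.org/schema/beans/spring-beans.xsd
           http://www.springframework.org/schema/integration
           http://www.springframework.org/schema/integration/spring-integration.xsd
           http://www.springframework.org/schema/integration/file
           http://www.springframework.org/schema/integration/file/spring-integration-file.xsd">

    <!-- Poll the hot folder every second; only files matching the regular expression are picked up,
         ordered by the fileOrderComparator bean (a FileOrderComparator). -->
    <file:inbound-channel-adapter id="batchFilesInbound"
            directory="${hotfolder.import.dir}"
            filename-regex="^(.*)-(\d+)\.csv$"
            comparator="fileOrderComparator"
            channel="batchFiles">
        <int:poller fixed-rate="1000"/>
    </file:inbound-channel-adapter>

    <int:channel id="batchFiles"/>

    <!-- Move the picked-up file into the processing subdirectory and delete the original,
         then hand it to the first pipeline channel. -->
    <file:outbound-gateway request-channel="batchFiles"
            reply-channel="batchFilesProc"
            directory="${hotfolder.import.dir}/processing"
            delete-source-files="true"/>
</beans>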

Technical Background
  • Spring Integration periodically scans the configured input directory for new files (see hot-folder-store-apparel-spring.xml and project.properties).
  • New files are moved to the processing subdirectory (delete-source-files=true removes the original source file after it has been written to the destination), and then HeaderSetupTask is called (the method attribute selects the method of the referenced bean to invoke).
HeaderSetupTask
  • catalog: The catalog to use. This setting is applied to the default header substitution $CATALOG$.
  • net: The net setting to apply to prices. This setting is applied to the default header substitution $NET$.
  • HeaderInitTask is called next (a sample HeaderSetupTask bean definition is sketched below).
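A HeaderSetupTask bean along these lines might be declared as follows; the bean id, property values, and the fully qualified class name are assumptions (check the acceleratorservices extension for the actual package):

<!-- Hypothetical HeaderSetupTask bean; catalog and net feed the $CATALOG$ and $NET$ header substitutions. -->
<bean id="batchHeaderSetupTask"
      class="de.hybris.platform.acceleratorservices.dataimport.batch.task.HeaderSetupTask">
    <property name="catalog" value="apparelProductCatalog"/>
    <property name="net" value="false"/>
</bean>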
 
HeaderInitTask
  • sequenceIdParser: The regular expression used to extract the sequence ID from the file name.
  • languageParser: The regular expression used to extract the language from the file name.
  • fallbackLanguage: The language to use if no language is set in the file name.
  • ImpexTransformerTask is called next; its init-method is executed first and then the specified method of the bean. A sketch of the HeaderInitTask bean follows below.
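A minimal sketch of such a HeaderInitTask bean, assuming illustrative bean ids, regular expressions, and package names:

<!-- Hypothetical HeaderInitTask bean; the parsers apply regular expressions to file names such as product-en-000123.csv. -->
<bean id="batchHeaderInitTask"
      class="de.hybris.platform.acceleratorservices.dataimport.batch.task.HeaderInitTask">
    <property name="sequenceIdParser" ref="batchSequenceIdParser"/>
    <property name="languageParser" ref="batchLanguageParser"/>
    <property name="fallbackLanguage" value="en"/>
</bean>

<!-- Assumed wiring of the helper classes listed above (SequenceIdParser, RegexParser). -->
<bean id="batchSequenceIdParser"
      class="de.hybris.platform.acceleratorservices.dataimport.batch.util.SequenceIdParser">
    <property name="parser">
        <bean class="de.hybris.platform.acceleratorservices.dataimport.batch.util.RegexParser">
            <property name="regex" value="-(\d+)\.csv"/>
        </bean>
    </property>
</bean>

<bean id="batchLanguageParser"
      class="de.hybris.platform.acceleratorservices.dataimport.batch.util.RegexParser">
    <property name="regex" value="-([a-z]{2})-"/>
</bean>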
 
 
ImpexTransformerTask performs the following tasks: 
  • It retrieves all configured converters matching the file name prefix.
  • For every converter found, it converts the input file as follows: 
    • It adds the ImpEx file header once, with substitutions applied.
    • It converts every row that is not filtered out.
    • If a line has missing input fields, the line is written along with the error message to a file in the error subdirectory.
ImpexTransformerTask
  • fieldSeparator: The separator used to read CSV files (default value: ,). 
  • encoding: The file encoding to use (default value: UTF-8). 
  • linesToSkip: The number of lines to skip in every CSV file (default value: 0).
  • converterMap: Maps file name prefixes to one or multiple converters that produce the ImpEx files.
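A sketch of the corresponding transformer bean, with illustrative ids and an assumed init-method name; the converterMap entry references the base_product converter sketched later in this post:

<!-- Hypothetical ImpexTransformerTask bean; the init-method name and package are assumptions. -->
<bean id="batchTransformerTask"
      class="de.hybris.platform.acceleratorservices.dataimport.batch.task.ImpexTransformerTask"
      init-method="initConvertersMap">
    <property name="fieldSeparator" value=","/>
    <property name="encoding" value="UTF-8"/>
    <property name="linesToSkip" value="0"/>
    <property name="converterMap">
        <map>
            <!-- file name prefix mapped to one or more converters -->
            <entry key="base_product">
                <list>
                    <ref bean="baseProductConverter"/>
                </list>
            </entry>
        </map>
    </property>
</bean>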
CleanupHelper
  • timeStampFormat: If set, appends a timestamp in the specified format to input files moved to the archive or error subdirectory.
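The helper can be declared once and shared by the cleanup and error handling steps; the format value below is only an example:

<!-- Hypothetical CleanupHelper bean; the timestamp is appended to archived or failed input files. -->
<bean id="cleanupHelper"
      class="de.hybris.platform.acceleratorservices.dataimport.batch.util.CleanupHelper">
    <property name="timeStampFormat" value="yyyyMMddHHmmss"/>
</bean>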
ConverterMapping
A ConverterMapping associates a file name prefix (for example, base_product) with one or more converters; a sample mapping and converter are sketched after the following property list.
Converter
  • header: The ImpEx header to use including header substitutions:
    $NET$: the net setting
    $CATALOG$: the catalog prefix
    $LANGUAGE$: the language setting
    $TYPE$: an optional type attribute that can be applied if filtering is configured 
  • impexRow: The template for an ImpEx row, adhering to the syntax: {('+')? (<columnId> | 'S')}
    The '+' character adds a mandatory check to this column. Any lines with missing attributes are written to an error file in the error subdirectory.
    The 'S' can be used for writing the current sequence ID at the template position. Optionally, columns can be quoted by enclosing the column in the template with quotation marks.
  • rowFilter: An optional row filter. The supplied expression must be a valid Groovy expression. The current row map consisting of column ID and value is referenced by row.
    Configuring multiple converters with row filters gives the option to split a supplied CSV input file into different ImpEx files according to specified filter criteria. 
  • type: An optional type that can be retrieved in the header using the header substitution $TYPE$.
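Putting this together, a mapping and a converter for the base_product file name prefix might be declared roughly as follows. The bean ids, class and package names, and the ImpEx header and row template are illustrative assumptions, not the shipped configuration:

<!-- Hypothetical mapping: associates the base_product prefix with the converter below. -->
<bean id="baseProductConverterMapping"
      class="de.hybris.platform.acceleratorservices.dataimport.batch.converter.mapping.impl.DefaultConverterMapping">
    <property name="mapping" value="base_product"/>
    <property name="converter" ref="baseProductConverter"/>
</bean>

<!-- Hypothetical converter: header substitutions are filled in from the BatchHeader at transformation time. -->
<bean id="baseProductConverter"
      class="de.hybris.platform.acceleratorservices.dataimport.batch.converter.impl.DefaultImpexConverter">
    <property name="header">
        <value>
            # ImpEx for importing base products into $CATALOG$
            INSERT_UPDATE Product;code[unique=true];name[lang=$LANGUAGE$];unit(code);catalogVersion(catalog(id),version)[default=$CATALOG$:Staged]
        </value>
    </property>
    <!-- Column 0 (mandatory) is the product code, column 1 the name, column 2 the unit. -->
    <property name="impexRow">
        <value>;{+0};{1};{2}</value>
    </property>
</bean>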
A converter for the variant file name prefix follows the same pattern, typically combined with an ImpexRowFilter so that only matching rows are converted; a sketch follows below.
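Such a variant converter could look roughly like this; the row filter implementation class, the Groovy expression, the column indices, and the type value are assumptions used only for illustration:

<!-- Hypothetical variant converter; a corresponding mapping for the variant prefix would reference it. -->
<bean id="variantConverter"
      class="de.hybris.platform.acceleratorservices.dataimport.batch.converter.impl.DefaultImpexConverter">
    <property name="header">
        <value>
            # ImpEx for importing size variants into $CATALOG$
            INSERT_UPDATE $TYPE$;code[unique=true];baseProduct(code);size[lang=$LANGUAGE$]
        </value>
    </property>
    <property name="impexRow">
        <value>;{+0};{+1};{3}</value>
    </property>
    <!-- Hypothetical ImpexRowFilter implementation that evaluates a Groovy expression against the row map. -->
    <property name="rowFilter">
        <bean class="de.hybris.platform.acceleratorservices.dataimport.batch.converter.impl.GroovyRowFilter">
            <property name="expression" value="row[3] != null"/>
        </bean>
    </property>
    <!-- Available in the header through the $TYPE$ substitution. -->
    <property name="type" value="ApparelSizeVariantProduct"/>
</bean>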
  • ImpexRunnerTask is called.
All generated ImpEx files are sent sequentially to the platform ImportService by the AbstractImpexRunnerTask.
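An assumed wiring of the runner bean (importService and sessionService are standard platform beans; the package name is an assumption):

<!-- Hypothetical ImpexRunnerTask bean; it hands each generated ImpEx file to the platform ImportService. -->
<bean id="batchRunnerTask"
      class="de.hybris.platform.acceleratorservices.dataimport.batch.task.ImpexRunnerTask">
    <property name="sessionService" ref="sessionService"/>
    <property name="importService" ref="importService"/>
</bean>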
  • CleanupTask is called.
 
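Finally, a sketch of the cleanup bean, reusing the CleanupHelper shown earlier (ids and package are assumptions):

<!-- Hypothetical CleanupTask bean; the helper moves the imported file to the archive subdirectory and deletes transformed files. -->
<bean id="batchCleanupTask"
      class="de.hybris.platform.acceleratorservices.dataimport.batch.task.CleanupTask">
    <property name="cleanupHelper" ref="cleanupHelper"/>
</bean>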
