Extract Record Processing

An Extract Process is a process that selects records and, for each record, produces output that is written to a file for integration with external systems. The base batch control Plug-in Driven Extract Template (F1-PDBEX) may be used as a template.

Fastpath: Please read the topic Ad-hoc Processes. Its information about the Select Records plug-in and some of the information about the Process Record plug-in apply to extract processes. This topic focuses on aspects that are unique to extract processes.
The system provided process includes parameters to configure the file path and file name for the created file. The file path supports all the functionality described in the topic Referencing URIs. The file name supports system substitution variables, such as run number, thread number, user, and date/time. The process also supports compression if an appropriate extension is used. Refer to the File Name parameter description in the batch control for more information.
Note: Refer to Flexible File Name / Writing Multiple Files for information about support for substituting a business value into the file name.
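To make the substitution behavior concrete, the following is a minimal sketch of the kind of token replacement a file name template implies. The token names (`RUN`, `THREAD`, `USER`, `DATE`) and the `{...}` syntax here are purely illustrative assumptions, not the product's actual variables; refer to the File Name parameter description in the batch control for the real substitution variables.

```python
import re
from datetime import datetime

def substitute(file_name: str, run: int, thread: int, user: str) -> str:
    """Replace illustrative placeholder tokens in a file name template.

    The token names and {...} syntax are hypothetical; the actual
    substitution variables are documented on the batch control's
    File Name parameter.
    """
    values = {
        "RUN": str(run),
        "THREAD": str(thread),
        "USER": user,
        # fixed date used here so the example output is deterministic
        "DATE": datetime(2024, 1, 15).strftime("%Y%m%d"),
    }
    # unknown tokens are left untouched rather than raising an error
    return re.sub(r"\{(\w+)\}", lambda m: values.get(m.group(1), m.group(0)), file_name)

result = substitute("extract_{RUN}_{THREAD}.csv", run=42, thread=1, user="batch")
print(result)  # → "extract_42_1.csv"
```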

For extract processes, the Process Record algorithm is responsible for returning the data that should be written to the file as one or more XML instances, along with the schema name(s) that describe the XML instance(s). The program writes the data to the file in the format indicated by the File Format batch parameter. By default the service uses the OUAF format for date and time. To override this and use XSD format, set the Date Time Format batch parameter to 'XSD'.
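The difference between the two date/time renderings can be illustrated with standard formatting calls. The OUAF-style string shown here (hyphen-separated date, dot-separated time) is an assumption about the default rendering; the XSD form is the ISO 8601 lexical format defined by XML Schema.

```python
from datetime import datetime

dt = datetime(2024, 1, 15, 10, 30, 0)

# Assumed OUAF-style rendering: hyphen-separated date, dot-separated time
ouaf = dt.strftime("%Y-%m-%d-%H.%M.%S")

# XSD (ISO 8601) rendering, as produced when Date Time Format is 'XSD'
xsd = dt.isoformat()

print(ouaf)  # → "2024-01-15-10.30.00"
print(xsd)   # → "2024-01-15T10:30:00"
```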

If an existing schema satisfies the output requirements, it may be used. Otherwise, define a data area that describes the output format of the records. Note that you should define an appropriate Field for each element in the schema. This is required for the fixed position File Format, but is good practice for any output format. Refer to F1-PDBGenProcExtractRecord for an example of a data area with fields defined for each schema element.

There are two options for designing and configuring the Process Record algorithm:

  • Create a specific Process Record algorithm type that encapsulates the extract logic. The product provides a base algorithm type that illustrates the basic technique to follow. Refer to the algorithm type General Process - Sample Process Record Extract (F1-GENPROCEX) for more information.

  • Create a file integration type that defines the records to be included in the extract and configure a plug-in driven batch control that references that integration type. Refer to Extract Using File Integration for more information about implementing this type of functionality.

Your specific edge product may provide other Process Record algorithm types out of the box. Use the algorithm type query to search for records with this algorithm entity.

Grouping by Record XML Node

When configuring an extract process that should produce the output in XML format, your Process Record plug-in may return multiple schemas with information that is all part of the same overall record. For example, the output could include account information and related service agreement details:
<account type="group">
    <accountId>1234567890</accountId>
    ...
</account>
<sa type="group">
    <saId>123457665</saId>
    ...
</sa>
<sa type="group">
    <saId>1234588913</saId>
    ...
</sa>
In this case, you may want to wrap all of that information in a grouping XML tag so that all the information for one record is kept together. Because the Process Record plug-in may also return header or footer records that are not part of a given selected record's extract information, the batch process cannot determine on its own which schemas returned by the plug-in belong together logically. Use the 'record XML node' output parameter in the schema collection to indicate the outer XML node that groups related information together. For example:
<SchemaInstance>
<recordXMLNode>record</recordXMLNode>
<schemaName>CM-AccountRecord</schemaName>
<schemaType>F1DA</schemaType>
<data><account><accountId>1234567890</accountId>...</account></data>
</SchemaInstance>
<SchemaInstance>
<recordXMLNode>record</recordXMLNode>
<schemaName>CM-SAInfo</schemaName>
<schemaType>F1DA</schemaType>
<data><sa><saId>123457665</saId>...</sa></data>
</SchemaInstance>
<SchemaInstance>
<recordXMLNode>record</recordXMLNode>
<schemaName>CM-SAInfo</schemaName>
<schemaType>F1DA</schemaType>
<data><sa><saId>1234588913</saId>...</sa></data>
</SchemaInstance>
All schemas returned from a single call to the Process Record plug-in that have the same record XML node will be grouped in the written output within that XML tag, as per the example below:
<root>
<record>
<account>
    <accountId>1234567890</accountId>
    ...
</account>
<sa>
    <saId>123457665</saId>
    ...
</sa>
<sa>
    <saId>1234588913</saId>
    ...
</sa>
</record>
<record>
<account>
    <accountId>987654320</accountId>
    ...
</account>
<sa>
    ...
</sa>
</record>
</root>
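The grouping step above can be sketched in a few lines: take the schema instances returned by one call to the Process Record plug-in and wrap consecutive instances that share the same record XML node in one outer tag. This is an illustrative sketch of the described behavior, not the product's actual implementation; the sample payloads mirror the example above.

```python
from itertools import groupby

# Each entry mirrors one <SchemaInstance> returned by a single call to the
# Process Record plug-in: the grouping node name plus the rendered payload.
instances = [
    {"recordXMLNode": "record", "data": "<account><accountId>1234567890</accountId></account>"},
    {"recordXMLNode": "record", "data": "<sa><saId>123457665</saId></sa>"},
    {"recordXMLNode": "record", "data": "<sa><saId>1234588913</saId></sa>"},
]

def wrap_record(instances):
    """Wrap consecutive instances sharing a record XML node in one outer tag.

    Sketch only: instances with the same node value, returned together, are
    grouped inside that tag; a header/footer instance with a different node
    (or none) would form its own group.
    """
    parts = []
    for node, group in groupby(instances, key=lambda i: i["recordXMLNode"]):
        body = "".join(i["data"] for i in group)
        parts.append(f"<{node}>{body}</{node}>")
    return "".join(parts)

output = wrap_record(instances)
print(output)
```

Running this produces one `<record>` element containing the account instance followed by both `<sa>` instances, matching the grouped output shown above.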

Multi-threaded Extract

When extracting data, the volume of data may warrant running the job multi-threaded. In this case, a separate file is produced per thread, with the thread number included in the file name. You may proactively include the thread number as a substitution variable in the file name; if you do not, the system appends the thread number.

As a convenience, the system supports concatenating the extract files produced by the various threads at the end. The system does this by finding files whose file names match except for the thread number. As such, the file name may not contain date or time if the concatenation parameter is true. Please note the following with respect to the concatenation feature:

  • The content of each file is concatenated together "as is". If the individual files have header, footer or summary information, there is no logic to consolidate that information in the concatenated file. If you want a single header, footer or summary, you must run the extract single threaded.
  • Concatenation is not supported if the file name indicates that zip compression should be used.
  • The individual thread files are retained.
  • If the file format is XML, the files written for the individual threads are written with an extension of ".tmp".
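The "as is" concatenation described above can be sketched as follows. The `<stem>_<thread>.<ext>` naming convention assumed here is illustrative only; the actual matching is done by the system based on file names that differ only in the thread number.

```python
import re
from pathlib import Path
from tempfile import TemporaryDirectory

def concatenate_thread_files(directory: Path, merged_name: str) -> Path:
    """Concatenate per-thread extract files "as is", in thread order.

    Sketch only: assumes thread files are named <stem>_<thread>.<ext> and
    simply appends their bytes, with no header/footer/summary consolidation,
    mirroring the behavior described above.
    """
    pattern = re.compile(r"^(?P<stem>.+)_(?P<thread>\d+)\.(?P<ext>\w+)$")
    files = sorted(
        (p for p in directory.iterdir() if pattern.match(p.name)),
        key=lambda p: int(pattern.match(p.name)["thread"]),
    )
    merged = directory / merged_name
    with merged.open("wb") as out:
        for f in files:  # the individual thread files are retained
            out.write(f.read_bytes())
    return merged

with TemporaryDirectory() as tmp:
    d = Path(tmp)
    (d / "extract_1.csv").write_text("a,1\n")
    (d / "extract_2.csv").write_text("b,2\n")
    content = concatenate_thread_files(d, "extract.csv").read_text()

print(content)  # → "a,1\nb,2\n"
```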

Skipping Records

By default, the extract process expects one or more schemas to be returned by each call to the Process Record plug-in. (Ideally, when designing any process, the Select Records algorithm selects only records that should be processed, so the Process Record algorithm does not need to confirm whether a record belongs in the output.) If, however, the Process Record plug-in checks a condition and finds that no data needs to be extracted for a record, the algorithm should set the 'is skipped' output parameter to true so that the program does not write an empty row.

Note that the base Process Record algorithm for file integration types (F1-FILEX-PR) sets the 'is skipped' parameter to true if the file integration type's extract record algorithm returns no data.
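The skip behavior can be sketched as a Process Record response that returns either a populated schema collection or an 'is skipped' indicator. The dictionary shape and field names below mirror the parameters described in this topic but are illustrative, not the plug-in's actual interface.

```python
def process_record(record: dict) -> dict:
    """Sketch of a Process Record response for an extract process.

    Returns either a schema-instance collection or an 'is skipped'
    indicator so the batch process writes no empty row. Illustrative
    shape only; not the plug-in specification.
    """
    if not record.get("extractable_data"):  # condition check: nothing to extract
        return {"isSkipped": True, "schemaInstances": []}
    return {
        "isSkipped": False,
        "schemaInstances": [
            {
                "recordXMLNode": "record",
                "schemaName": "CM-AccountRecord",  # hypothetical schema name
                "schemaType": "F1DA",
                "data": record["extractable_data"],
            }
        ],
    }

print(process_record({"extractable_data": None})["isSkipped"])   # → True
print(process_record({"extractable_data": "<account/>"})["isSkipped"])  # → False
```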

Configuring a New Process

The points documented in the topic on configuring a new ad-hoc process with respect to the Select Records algorithm also apply to extract processes.

The Process Record algorithm is responsible for returning one or more schema instances populated with the information that should be written to the file. If your edge product does not deliver a suitable plug-in, create a plug-in script whose algorithm entity is Batch Control - Process Record. Note that the plug-in receives all the information selected by the SQL defined in the Select Records plug-in.

Refer to Configuring an Extract Process Using File Integration Type for information on designing extracts using this configuration.

If a new processing script is required, define the algorithm type and algorithm for the newly created script. Create a batch control by duplicating the base template F1-PDBEX. Plug in the algorithms created above and configure the parameters as appropriate. Note that you may configure custom ad-hoc parameters on the batch control if required. Both base and custom batch parameter values are available to the Select Records and Process Record plug-in algorithms.