How do I migrate my Data Source Connector project to a Batch Processor project?

The Batch Processor replaces the Data Source Connector and provides the same functionality, and more. If you have an existing project or application that uses the Data Source Connector, you can migrate it to the Batch Processor by following the steps outlined below.

Upgrade the rulebase

The Batch Processor only supports rulebases compiled in the current version of Oracle Policy Modeling. You will have to update the rulebase you are currently using with the Data Source Connector by opening it in Oracle Policy Modeling.

  1. Open the rulebase project in Oracle Policy Modeling
  2. Follow the upgrade project steps
  3. Build the rulebase

 

In most cases, upgrading a rulebase is a straightforward procedure. For more information see the topic Upgrade a project in the Oracle Policy Modeling User's Guide.

Convert Data Source Connector file descriptors to Batch Processor configuration

The Data Source Connector requires a descriptor for each csv file to be processed. The file descriptor contains information about the columns and the data contained in each column. The Batch Processor does not require file descriptors.

You can migrate the information from a Data Source Connector file descriptor to Batch Processor configuration in the following way.

Header information must exist in the original csv file. The first row of any csv file should contain the column names.

Example Data Source Connector file descriptor for parent.csv

 

parent.xml

<table name="parent" xmlns="http://oracle.com/determinations/connector/data-source/table">
        <columns>
                <column header="id" ordinality="1"/>
                <column header="name" ordinality="2"/>
                <column header="wage" ordinality="3"/>
        </columns>
</table>

 

Example data in parent.csv:

 

parent.csv

1,John Robinson,350

2,Lois Griffin,400

 

For the Batch Processor, the parent.xml file descriptor is no longer required, but the column headers must be added as the first row of parent.csv.

Example data in parent.csv for Batch Processor:

 

parent.csv

id,name,wage

1,John Robinson,350

2,Lois Griffin,400
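For projects with many csv files, the header row can be generated from the old descriptor rather than typed by hand. The following is a minimal sketch, not part of either product: the function names `headers_from_descriptor` and `add_header_row` are hypothetical, and it assumes every descriptor follows the `<column header="..." ordinality="..."/>` layout shown above.

```python
import csv
import io
import xml.etree.ElementTree as ET


def headers_from_descriptor(descriptor_xml):
    """Return the column headers of a descriptor, ordered by ordinality."""
    root = ET.fromstring(descriptor_xml)
    # Match on the local tag name so the XML namespace does not matter.
    columns = [el for el in root.iter() if el.tag.rsplit("}", 1)[-1] == "column"]
    columns.sort(key=lambda col: int(col.get("ordinality")))
    return [col.get("header") for col in columns]


def add_header_row(csv_text, headers):
    """Prepend a header row to csv text that has no header."""
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(headers)
    writer.writerows(rows)
    return out.getvalue()


descriptor = """<table name="parent" xmlns="http://oracle.com/determinations/connector/data-source/table">
    <columns>
        <column header="id" ordinality="1"/>
        <column header="name" ordinality="2"/>
        <column header="wage" ordinality="3"/>
    </columns>
</table>"""

data = "1,John Robinson,350\n2,Lois Griffin,400\n"
migrated = add_header_row(data, headers_from_descriptor(descriptor))
print(migrated)
```

The sketch works on strings so that it can be tried without touching real files; reading and writing the actual csv files is left to the caller.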

 

If the column headings match the public names of attributes in the parent entity, then no further configuration is needed. Otherwise, attributes can be explicitly configured in the Batch Processor configuration file.

If you did need to explicitly map the parent entity you would add the mapping to the Batch Processor configuration file.

Example mapping for person.csv (mapped to rulebase global entity)

<mappings>
        <mapping entity="global" table="person" primary-key="id">
                <attribute name="name" field="name" />
                <attribute name="wage" field="wage" />
        </mapping>
        ...
</mappings>

For more information on Batch Processor mappings, see the topics Example Entity mapping and Specify an attribute mapping.
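When many columns need explicit mapping, a starting `<mapping>` element can be generated from the csv header row and then edited by hand so that each `name` holds the attribute's public name. This is only a sketch under assumptions: the helper `mapping_from_header` is hypothetical, it reuses the element and attribute names from the example mapping above, and it leaves the primary key column out of the attribute list as that example does.

```python
import csv
import io
import xml.etree.ElementTree as ET


def mapping_from_header(csv_text, entity, table, primary_key):
    """Build a <mapping> element with one <attribute> per non-key column."""
    header = next(csv.reader(io.StringIO(csv_text)))
    mapping = ET.Element(
        "mapping",
        {"entity": entity, "table": table, "primary-key": primary_key},
    )
    for field in header:
        if field == primary_key:
            continue  # the key is declared on <mapping>, not as an attribute
        # name defaults to the column heading; edit it to the public name.
        ET.SubElement(mapping, "attribute", {"name": field, "field": field})
    return mapping


sample = "id,name,wage\n1,John Robinson,350\n2,Lois Griffin,400\n"
mapping = mapping_from_header(sample, "global", "person", "id")
print(ET.tostring(mapping, encoding="unicode"))
```

The generated element still has to be placed inside the `<mappings>` element of the configuration file.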

Convert Data Source Connector configuration to Batch Processor configuration

The Batch Processor relies on configuration very similar to that of the Data Source Connector. Configuration for the Batch Processor can be placed in a configuration file or passed in as command-line arguments. If you are converting a Data Source Connector project, it will be easier to place your configuration in an XML file, as this is very similar to the Data Source Connector configuration.

 

The following entries list each Data Source Connector (DSC) setting with its description, followed by the corresponding Batch Processor (BP) setting and its description.

DSC configuration: <threads>
DSC description: The number of threads to process the data; example: <threads>3</threads>
BP configuration: In <options> element: <processors>
BP description: The number of processes to process the data. Similar to threads, but these are distinct Java or .NET processes; example: <processors>3</processors>

DSC configuration: <run-limit>
DSC description: The number of records to process; example: <limit>1000</limit>
BP configuration: In <options> element: <limit>
BP description: For csv files, the Batch Processor always processes all records. A limit can be set for database input only; example: <limit>1000</limit>

DSC configuration: <time-out>
DSC description: The maximum time the batch processor will run for before quitting.
BP configuration: No corresponding setting.

DSC configuration: <data-sources>
DSC description: Defines all the csv files to be used and the rulebase to process.
BP configuration: In <options> element: <csv> and <rulebase>
BP description: All csv files in the directory specified in the csv element will be processed; example input: <csv>data</csv>; example rulebase: <rulebase>./rulebase/SimpleBenefits.zip</rulebase>

DSC configuration: data-mappings
DSC description: Defines attributes, entities and relationships.
BP configuration: <mappings>
BP description: See Data mapping in the XML configuration file.

DSC configuration: output
DSC description: Defines the output.
BP configuration: In <options> element: <output>
BP description: The Batch Processor has many more available output options. The default output is csv, which can be explicitly defined. Data is always overwritten; example: <output type="csv">../output</output>

Ensure that output attributes are defined in the Batch Processor configuration

As part of converting the data mappings in step 3, you should have carried the output attributes over from the Data Source Connector as outputs in the Batch Processor.

Because outputs are the most important aspect of a batch process, it is worth checking that the outputs defined in the Batch Processor are correct. In the Batch Processor, output attributes can be defined in two different ways.

Defined in the csv file

In this case, the output attribute is defined in an empty column of the csv file, with parentheses around the column name. The parentheses indicate that this column is an output column.

The name of the column should match the public name of the inferred attribute that will be the output.

Example person.csv

id,name,wage,(person_is_eligible)

1,John Robinson,350,

2,Lois Griffin,400,

In the example above, the entity that the person rows correspond to is expected to have an inferred attribute with the public name "person_is_eligible". When the Batch Processor executes, the person.csv file will be written to the output with this column filled in with the known results.

Note that the number of commas in the csv file must match the number of columns even if they are empty.
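A missing trailing comma is easy to overlook, so it can help to check the column count of each row before running a batch. The following is a minimal sketch of such a check; `find_ragged_rows` is a hypothetical helper and not part of the Batch Processor.

```python
import csv
import io


def find_ragged_rows(csv_text):
    """Return (line_number, field_count) for rows whose width differs from the header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    problems = []
    for row in reader:
        # Blank lines parse as empty rows and are ignored.
        if row and len(row) != len(header):
            problems.append((reader.line_num, len(row)))
    return problems


good = "id,name,wage,(person_is_eligible)\n1,John Robinson,350,\n2,Lois Griffin,400,\n"
# The last row below is missing its trailing comma.
bad = "id,name,wage,(person_is_eligible)\n1,John Robinson,350,\n2,Lois Griffin,400\n"

print(find_ragged_rows(good))
print(find_ragged_rows(bad))
```

An empty result means every data row has the same number of fields as the header row.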

Defined in the configuration mapping

Output attributes can be defined in the Batch Processor mapping in a very similar way to how they are defined in the Data Source Connector.

Define the attribute in the corresponding mapping element for the entity and add the attribute output="true". The field attribute represents the column that the attribute will be written to.

<attribute name="person_is_eligible" field="person_is_eligible" output="true"/>

 

Differences between Data Source Connector output and Batch Processor output

When outputting csv files, the Batch Processor, like the Data Source Connector, writes the resulting attributes to the specified location. However, there are some differences.

The Data Source Connector writes the original csv file and adds the outputs to the output csv, whereas the Batch Processor does the following:

Example:

person.csv in input

id,name,wage,(person_is_eligible)

1,John Robinson,350,

2,Lois Griffin,400,

will look like the following person.csv in output

id,name,wage,(person_is_eligible)

1,John Robinson,350,true

2,Lois Griffin,400,false

 

In the example above, you can see that all the input fields present in person.csv remain in their original format. Any output parameters are inserted into the csv file. Unknown and Uncertain results are always output as a blank string.
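Downstream code that reads the output csv needs to tell a blank result apart from a real value. The following sketch shows one way to do that, assuming output columns keep their parenthesized headers as in the example above. The helper `read_outputs` and the third data row are hypothetical and are used only to illustrate a blank result.

```python
import csv
import io


def read_outputs(csv_text):
    """Parse output csv text; blank cells in output columns become None."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    # Headers wrapped in parentheses mark output columns.
    is_output = [h.startswith("(") and h.endswith(")") for h in header]
    names = [h[1:-1] if out else h for h, out in zip(header, is_output)]
    records = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        record = {}
        for name, out, value in zip(names, is_output, row):
            # A blank output cell means the result was unknown or uncertain.
            record[name] = None if (out and value == "") else value
        records.append(record)
    return records


output_csv = (
    "id,name,wage,(person_is_eligible)\n"
    "1,John Robinson,350,true\n"
    "2,Lois Griffin,400,false\n"
    "3,Hypothetical Person,100,\n"  # made-up row with a blank result
)

for record in read_outputs(output_csv):
    print(record)
```

Values are kept as strings; converting "true" and "false" to booleans is left to the caller, because a blank cell does not say whether the result was Unknown or Uncertain.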