Configure the Batch Processor

The Batch Processor is invoked in one of the following ways, depending on whether the Java or the .NET version is used.

java -jar determinations-batch.jar <command line parameters>

Determinations.Batch.exe <command line parameters>

Command line configuration

The following is a list and description of each of the Batch Processor's command line parameters:

--rulebase <rulebase path>

Specifies the rulebase to be used for the batch processor.

--csv <folder>

Specifies the folder in which the csv data files are located. This parameter must be provided if the --database parameter is not used.

--delimiter <character>

Identifies the value delimiter to be used when reading and writing CSV files. Defaults to a single comma (,) character. This parameter will be ignored if the batch processor is not reading from or writing to CSV files.

As white space characters cannot be passed easily as command line parameters, special values of \t (tab) and \s (space) can be used to specify a tab or space character as the delimiter; for example: --delimiter \t.

--coverage <coverage file>

Outputs a coverage file that can be imported into Oracle Policy Modeling's Analyze Coverage File feature.

--database <db-connection-string>

Specifies the connection string of the database to be used as the source of input data; for example: jdbc:oracle:thin:user/password@localhost:1521/example. This parameter must be provided if the --csv parameter is not used.

--dboutput

Writes the results of the batch run back to the database. This parameter can only be used if the input came from the database (--database option).

--userid <db-userid>

Specifies the user id for a database connection. This parameter can only be used when the userid is not provided in the connection string (--database option).

--password <db-password>

Specifies the password for a database connection. This parameter can only be used when the password is not provided in the connection string (--database option).

--dbprovider <provider-name>

Specifies the provider invariant name for a .NET database connection.

--driver <driver-name>

Specifies the name of the database driver to be used to connect to the database specified by the --database parameter; for example, oracle.jdbc.OracleDriver. This parameter will be ignored if the --database parameter is not included.

--driversrc <path>

Specifies the full path of the external resource containing the database driver identified by the --driver parameter; for example, the path of a jar file. This parameter will be ignored if the --database parameter is not included.

--base <name>

Specifies the 'base' table that represents the cases. A set of csv files represents a database, and one of those files must be identified as the one corresponding to cases. This parameter is optional if there is only a single csv file, or if there is a csv file named 'global'; otherwise it is mandatory.

--processors <number>

Specifies the number of processors to use for the batch processor; default value is the number of processors available.

--blocksize <number>

Specifies the number of cases included in each data block read or updated.

--output <folder>

Specifies the folder in which to write any input or output attributes in csv format. If included, the output folder must not be the same as the data folder specified by the --csv parameter.

--limit <number>

For database input only, this sets a limit on the number of cases processed by the batch processor. This can be useful if you are operating on a large data set but do not want to process all the cases; for example, when verifying that the configuration is correct.

--export <folder>

Exports cases as saved sessions into the specified folder. The --limit parameter is handy if you wish to limit the number of cases to be exported.

--exporttsc <filename>

Exports cases into a single .tsc test case file, suitable for adding to an Oracle Policy Modeling project. The test file will have the extension '.tsc' appended if it is missing; for example, --exporttsc c:\temp\my_test.tsc.

--config <filename>

Specifies the XML configuration file that is used for mapping, that is, for runs that do not rely solely on the zero-configuration conventions.

XML file configuration

Options can be specified on the command line (see Command line configuration above) or in a Batch Processor configuration file.

When the Batch Processor starts, it looks in the current working directory for a file called config.xml; if the file is found, the configuration is read from it.

Set options in the XML configuration file

All options that can be set on the command line can also be set in the <options> section of the configuration file. If an option is set both on the command line and in the configuration file, the command line value overrides the configuration file setting.
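As a minimal sketch of the override rule, assume that config.xml contains the <options> section below (the values are illustrative and reuse the examples from this page). If the Batch Processor is then started with --processors 4 on the command line, 4 processors are used rather than 2, while the rulebase and csv settings are still taken from the file.

```xml
<options>
        <!-- used unless --rulebase is also given on the command line -->
        <rulebase>SocialServicesScreening.zip</rulebase>
        <!-- used unless --csv is also given on the command line -->
        <csv>./data/csv</csv>
        <!-- overridden by --processors 4 on the command line -->
        <processors>2</processors>
</options>
```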

The following options can be set:

 

Element name: base
Description: Name of the base entity for the Batch Processor; equivalent to --base on the command line. If the entity name specified does not exist in the rulebase, it will be mapped to global.
Example: <base>global</base>

Element name: processors
Description: The number of slave processors to start; equivalent to --processors on the command line.
Example: <processors>2</processors>

Element name: limit
Description: Limits the number of rows to process; equivalent to --limit on the command line.
Example: <limit>1000</limit>

Element name: rulebase
Description: The rulebase to use; equivalent to --rulebase on the command line.
Example: <rulebase>SocialServicesScreening.zip</rulebase>

Element name: csv
Description: The csv directory to get input from; equivalent to --csv on the command line.
Example: <csv>./data/csv</csv>

Element name: delimiter
Description: The value delimiter to be used when reading from or writing to CSV files; equivalent to --delimiter on the command line. Special values of \t (tab) and \s (space) can be used to specify a tab or space character as the delimiter respectively.
Example: <delimiter>\t</delimiter>

Element name: blocksize
Description: Specifies the number of cases included in each data block read or updated; equivalent to --blocksize on the command line.
Example: <blocksize>800</blocksize>

Element name: database
Description: The definition for a database source; this element has sub-elements which are equivalent to the --database, --driver, --driversrc, --userid and --password options on the command line.
Example:
<database>
    <url>http://localhost/db:8001</url>
    <driver></driver>
    <driversrc></driversrc>
    <userid></userid>
    <password></password>
</database>

Element name: output
Description: The output location. The "type" attribute indicates the type of output (defaults to "csv"). Equivalent to the --output, --dboutput, --export, --exporttsc and --coverage options on the command line.

  • If the type is "db", the output is written back to the database. No value is expected here.
  • If the type is "csv", the value is a directory where the csv files with outcomes will be written.
  • If the type is "coverage", "export" or "exporttsc", the value is the location to which the coverage file, the exported sessions or the exported test case file is written.

Example: <output type="csv">./data/out/csv</output>
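The options above can be combined for a run that reads cases from a database and writes the results back to it. The following is a sketch only: it assumes that the <url> element accepts the same kind of JDBC connection string as the --database parameter, and the driver jar path, user id and password are placeholders rather than values from a real installation.

```xml
<options>
        <rulebase>SocialServicesScreening.zip</rulebase>
        <database>
                <!-- assumption: same form of connection string as the --database example,
                     without the user id and password -->
                <url>jdbc:oracle:thin:@localhost:1521/example</url>
                <driver>oracle.jdbc.OracleDriver</driver>
                <!-- placeholder path of the jar file that contains the driver -->
                <driversrc>./lib/ojdbc.jar</driversrc>
                <!-- placeholder credentials -->
                <userid>batch_user</userid>
                <password>batch_password</password>
        </database>
        <!-- type "db": results are written back to the database, so no value is given -->
        <output type="db" />
        <blocksize>800</blocksize>
</options>
```

Because there are no zero-configuration conventions for database input, a configuration like this also needs a <mappings> section.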

Data mapping in the XML configuration file

Mappings are used to map csv and database structures to Oracle Policy Automation data structures: boolean format, entities, relationships and attributes.

If the input data is csv files, much of the mapping from csv data to Oracle Policy Automation data may be done automatically (see Zero-configuration conventions for CSV input). Data mappings can be specified to extend or change the default mappings of csv data.

If the input data is database tables, mapping information must be specified, as there are no zero configuration conventions for database input.

Specify the global boolean format

The global boolean format defines the format for boolean values read from and written to a csv or database data source.

The <boolean-format> element must include the following attributes:

  1. The xml attribute true-value defines the value for true when reading from, or writing to, the data source.
  2. The xml attribute false-value defines the value for false when reading from, or writing to, the data source.

Example global boolean mapping

<mappings>
        <boolean-format true-value="" false-value="" />
        <!-- entity mapping -->
        <mapping ...>
                <!-- entity attributes and relationships -->
        </mapping>
</mappings>
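The example above leaves both values empty. As an illustration only, a data source that stores boolean values as Y and N (assumed values, not taken from a real data set) might declare the format as follows.

```xml
<mappings>
        <!-- "Y" and "N" are assumed values for illustration -->
        <boolean-format true-value="Y" false-value="N" />
        <!-- entity mappings follow here -->
</mappings>
```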

Specify an entity mapping

Specifying an entity mapping is done as follows:

  1. The xml attribute entity is used to specify the entity in the rulebase. In the example below, it refers to an entity called customer in the rulebase.
  2. The xml attribute table is used to define the source table. In the example below, the source is either a csv file called customer.csv or a database table called customer.
  3. The xml attribute primary-key is used to define the primary key for the source. For a database source, this is where you specify the primary key of your table. Note that by default, the primary key for a csv source is the '#' column in the csv file.

Example entity mapping

<mappings>
        <mapping entity="customer" table="customer" primary-key="#">
                …
                <!-- entity attributes and relationships -->
        </mapping>
</mappings>
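For a database source, the same element names the table and its primary key column. The sketch below assumes a hypothetical table CUSTOMER with a primary key column CUSTOMER_ID; neither name comes from a real schema.

```xml
<mappings>
        <!-- hypothetical database table and primary key column -->
        <mapping entity="customer" table="CUSTOMER" primary-key="CUSTOMER_ID">
                <!-- entity attributes and relationships -->
        </mapping>
</mappings>
```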

Specify an attribute mapping

Attribute mappings are contained within an entity mapping. Each attribute element specifies the mapping for one rulebase attribute, using the following xml attributes:

  1. 'name' is used to define the name of the attribute on the rulebase.
  2. 'field' is used to define the source field (for example, column in the csv file).
  3. The optional 'output' is used to identify the field as an output field. If it is not included, the value defaults to "false" and the field is not used as output.
  4. When writing csv output, the optional 'csv-output-field' can be used to change the column name in the output.

 

IMPORTANT:

If a field was specified as an output field in the csv via parentheses '(' and ')', but is specified again in the configuration XML file without the output="true" flag, it will not be an output field. The information specified in the configuration file supersedes the CSV information if present in both places.

Example attribute mappings

<mapping entity="customer" table="customer" primary-key="#">
        <attribute name="income" field="income" />
        <attribute name="result" field="result" output="true" />
        <attribute name="result" field="result" csv-output-field="newcolumnname" />
</mapping>
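The example above shows the output and csv-output-field attributes on separate lines. The following sketch assumes that the two can be combined on a single attribute element, so that the result field is written to the csv output under a different column name. That combination is an assumption, and newcolumnname is only a placeholder.

```xml
<mapping entity="customer" table="customer" primary-key="#">
        <!-- input only: output is omitted, so it defaults to "false" -->
        <attribute name="income" field="income" />
        <!-- assumption: output="true" and csv-output-field may be used together -->
        <attribute name="result" field="result" output="true" csv-output-field="newcolumnname" />
</mapping>
```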

Specify a relationship mapping

All relationships must have the following two xml attributes:

  1. 'name' - must match the relationship name of the source entity.
  2. 'source-entity' - is the name for the source entity of the relationship.

Specific to one-to-one and one-to-many/many-to-one relationships:

'foreign-key' - is the name of the column that is used as the foreign key for the relationship. The foreign key has to be specified on the many side of a one-to-many relationship.

Specific to many-to-many relationships:

  1. 'rel-source' - is the source of the many-to-many mapping. In the example below, the source of the many-to-many mapping is the csv file called plansproducts.
  2. 'source-key' - is the foreign key reference to the primary key of the source table.
  3. 'target-key' - is the foreign key reference to the primary key of the target table.

 

Note: Many-to-many relationships can be specified at either side.

Example relationship mappings

<mapping entity="customer" table="customer" primary-key="#">
        <relationship name="applicanttopincomeearner" source-entity="global" foreign-key="applicanttopincomeearner" />
        <relationship name="customersfavoriteproductrev" source-entity="product" foreign-key="customersfavoriteproduct" />
</mapping>
<mapping entity="product" table="product" primary-key="#">
        <!-- many-to-many -->
        <relationship name="plansproducts" source-entity="plan" rel-source="plansproducts" source-key="plan" target-key="product"/>
</mapping>
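To show where the foreign key sits in a one-to-many relationship, the sketch below assumes a hypothetical entity child and a hypothetical relationship customerschildren from customer (the one side) to child (the many side). Following the pattern of the example above, the relationship element is placed in the mapping of the many side, and foreign-key names the column in the child source that refers to the customer.

```xml
<!-- hypothetical "many" side of a one-to-many relationship -->
<mapping entity="child" table="child" primary-key="#">
        <!-- "customer" is an assumed column in the child source holding the customer's key -->
        <relationship name="customerschildren" source-entity="customer" foreign-key="customer" />
</mapping>
```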

Structure of the XML configuration file

<configuration>
        <options>
        ...
        </options>
        <mappings>
        ...
        </mappings>
</configuration>

 

The structure of a Batch Processor configuration is quite simple. The root element of the configuration file is the <configuration> element.

Next there can be a single <options> element which contains all the Batch Processor configuration options (see Set options in the XML configuration file).

Next there can be a single <mappings> element which contains all of the Batch Processor data mappings.
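Putting the pieces together, a complete config.xml for csv input might look like the sketch below. It reuses the option values and the customer mapping from the examples on this page; the boolean values are assumptions, and whether the file works as written depends on the rulebase and data actually used.

```xml
<configuration>
        <options>
                <rulebase>SocialServicesScreening.zip</rulebase>
                <csv>./data/csv</csv>
                <!-- must not be the same folder as the csv input folder -->
                <output type="csv">./data/out/csv</output>
        </options>
        <mappings>
                <!-- assumed boolean values for illustration -->
                <boolean-format true-value="true" false-value="false" />
                <mapping entity="customer" table="customer" primary-key="#">
                        <attribute name="income" field="income" />
                        <attribute name="result" field="result" output="true" />
                </mapping>
        </mappings>
</configuration>
```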