The Batch Processor is designed to be easy to configure and use. If comma-separated (csv) files are used as input, little or no configuration is needed for the Batch Processor to read them correctly and produce the correct results. To achieve this, a set of zero-configuration conventions is used to identify entities, attributes and relationships.
When reading csv files, unless otherwise specified, the file name (without the .csv extension) is matched to the public name of an entity in the rulebase.
For example, if you have a csv file with the name person.csv, and your rulebase has an entity with the public name person, the Batch Processor will automatically load data from this file into the person entities when processing.
The first line in a csv file should contain the column headings for the data in subsequent lines. These column headings will be matched to attribute public names.
For example, in the csv text below, the headings "name", "age" and "date_of_birth" will be matched to attributes with the corresponding public names.
#,name,age,date_of_birth
1,John Citizen,30,1982-02-13
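The two naming conventions above can be illustrated with a short Python sketch. This is not the Batch Processor's own code; the function name load_entity_rows is hypothetical, and the sketch only shows how a file name and a heading line could be turned into an entity public name and attribute public names.

```python
import csv
import io
from pathlib import Path


def load_entity_rows(file_name, csv_text):
    """Return (entity_public_name, rows) for one csv file.

    The entity name is the file name without its .csv extension, and each
    row is a dict keyed by the column headings found on the first line.
    """
    entity_name = Path(file_name).stem
    # skipinitialspace tolerates headings written as "#, name" with a space.
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    rows = [dict(row) for row in reader]
    return entity_name, rows


person_csv = "#,name,age,date_of_birth\n1,John Citizen,30,1982-02-13\n"
entity, rows = load_entity_rows("person.csv", person_csv)
print(entity)
print(rows[0]["name"], rows[0]["date_of_birth"])
```

All values are kept as text here; converting them to numbers or dates is a separate step that depends on the attribute type.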
By default, a column with the heading "#" is assumed to be a unique identifier for that entity. Only the identifier for an entity (row) needs to be unique.
In the example data below, the entity Bank has a unique identifier column ("#").
Bank.csv
#, bank_name
1, First National Bank
2, Second National Bank
3, Third National Bank
One-to-many relationships can be represented in a csv file. If a column heading is matched to the public name of a relationship, the relationship will be loaded.
In the example data below, the "to-one" side of the one-to-many relationship is represented by putting the id of the bank in a column called "customers_bank". The Batch Processor reads this in correctly if the customer-to-bank side of the relationship has the public name "customers_bank".
Bank.csv
#, bank_name
1, First National Bank
2, Second National Bank
Customer.csv
#, customer, balance, customers_bank
1, John Citizen, 3000, 1
2, Joan Citizen, 1000, 2
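As a rough illustration of how the "customers_bank" column ties each customer to a bank, the Python sketch below resolves the identifiers in that column against the "#" column of Bank.csv. The helper names read_rows and link_to_one are hypothetical and are not part of the Batch Processor.

```python
import csv
import io

BANK_CSV = """#, bank_name
1, First National Bank
2, Second National Bank
"""

CUSTOMER_CSV = """#, customer, balance, customers_bank
1, John Citizen, 3000, 1
2, Joan Citizen, 1000, 2
"""


def read_rows(csv_text):
    """Read csv text into a list of dicts keyed by column heading."""
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    return list(reader)


def link_to_one(source_rows, target_rows, relationship_column):
    """Map each source identifier to the target row it references."""
    targets_by_id = {row["#"]: row for row in target_rows}
    return {
        row["#"]: targets_by_id[row[relationship_column]]
        for row in source_rows
    }


links = link_to_one(read_rows(CUSTOMER_CSV), read_rows(BANK_CSV), "customers_bank")
print(links["2"]["bank_name"])
```

A customer row that names a bank id missing from Bank.csv would raise a KeyError in this sketch; how the Batch Processor itself treats that case is not covered here.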
A many-to-many relationship will be identified if the name of the csv file matches the public name of the many-to-many relationship. The csv file should contain two columns: one holding the identifier of the source entity and one holding the identifier of the target entity.
If a rulebase has a many-to-many relationship "the customer's products" with the public name "customers_products", the example data below will be read in as:
- Customer 1 targets: Products A, B, C
- Customer 2 targets: Products C, D
- Customer 3 targets: Products A, B
Customer.csv
#, customer
1, John Citizen
2, Joan Citizen
3, Fred Bloggs
Product.csv
#, product
1, Product A
2, Product B
3, Product C
4, Product D
customers_products.csv
Customer, Product
1, 1
1, 2
1, 3
2, 3
2, 4
3, 1
3, 2
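The grouping of customers_products.csv into per-customer targets can be sketched in Python as follows. The function read_many_to_many is a hypothetical helper written for this illustration, and the sketch works on identifiers only.

```python
import csv
import io
from collections import defaultdict

CUSTOMERS_PRODUCTS_CSV = """Customer, Product
1, 1
1, 2
1, 3
2, 3
2, 4
3, 1
3, 2
"""


def read_many_to_many(csv_text, source_column, target_column):
    """Group target identifiers by source identifier, keeping file order."""
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    targets = defaultdict(list)
    for row in reader:
        targets[row[source_column]].append(row[target_column])
    return dict(targets)


pairs = read_many_to_many(CUSTOMERS_PRODUCTS_CSV, "Customer", "Product")
for customer_id, product_ids in pairs.items():
    print(customer_id, product_ids)
```

Product identifiers 1 to 4 correspond to Products A to D in Product.csv, so customer 1 is linked to Products A, B and C, as in the list above.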
Attribute values specified in a csv file can be mapped to any attribute in the corresponding entity. Any column in a csv file that cannot be mapped is ignored, and a warning is written to the log files.
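The "ignore and warn" behaviour can be pictured with the minimal Python sketch below. It is an assumption-laden illustration: map_columns and the logger name batch_sketch are hypothetical, and only attribute headings and the "#" identifier column are considered (relationship columns are left out for brevity).

```python
import logging

logger = logging.getLogger("batch_sketch")  # hypothetical logger name


def map_columns(headings, attribute_public_names):
    """Split headings into mapped and ignored lists.

    A heading is mapped if it is the "#" identifier column or matches an
    attribute public name; every other heading is ignored with a warning.
    """
    mapped, ignored = [], []
    for heading in headings:
        if heading == "#" or heading in attribute_public_names:
            mapped.append(heading)
        else:
            logger.warning("Column %r could not be mapped and is ignored", heading)
            ignored.append(heading)
    return mapped, ignored


mapped, ignored = map_columns(
    ["#", "name", "age", "shoe_size"],
    {"name", "age", "date_of_birth"},
)
print(mapped, ignored)
```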
The expected formats for various attribute values are:
Value type | Format description | Blank value |
---|---|---|
Number / Currency | The Batch Processor treats currency values as identical to simple numeric values when reading input and writing output. For numeric and currency values, it will apply the following rules: For character fields: | Blank and NULL values are considered UNCERTAIN |
Text | Text values will be read and written verbatim. | Blank text values are treated as empty strings. |
Boolean | For reading and writing boolean values, the Batch Processor will apply the following rules: Reading input values: Writing output values: | Blank and NULL values are considered UNCERTAIN |
Date | For reading and writing date values, the Batch Processor will apply the following rules: For character fields: | Blank and NULL values are considered UNCERTAIN |
Date-Time | For reading and writing date-time values, the Batch Processor will apply the following rules: For character fields: | Blank and NULL values are considered UNCERTAIN |
Time of Day | For reading and writing time values, the Batch Processor will apply the following rules: For character fields: | Blank and NULL values are considered UNCERTAIN |
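The blank-value column of the table can be summarised in a small Python sketch. Everything named here is an assumption made for illustration: UNCERTAIN is a stand-in sentinel, interpret_blank is a hypothetical helper, "blank" is taken to mean an empty string, NULL is modelled as None, and a NULL text value is treated like a blank one. Non-blank values are passed through untouched, because the per-type format rules are not reproduced in this sketch.

```python
# Hypothetical sentinel standing in for the UNCERTAIN state.
UNCERTAIN = object()


def interpret_blank(raw_value, value_type):
    """Apply the blank-value rule for one field.

    value_type is one of "number", "currency", "text", "boolean", "date",
    "date-time" or "time". Blank text becomes an empty string; a blank
    value of any other type becomes UNCERTAIN.
    """
    is_blank = raw_value is None or raw_value == ""
    if not is_blank:
        return raw_value  # type-specific parsing is out of scope here
    if value_type == "text":
        return ""
    return UNCERTAIN


print(interpret_blank("", "text") == "")
print(interpret_blank("", "date") is UNCERTAIN)
print(interpret_blank("1982-02-13", "date"))
```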