CSV input for the Batch Processor

The Batch Processor is designed to be easy to configure and use. If comma-separated (csv) files are used as input, little or no configuration is needed for the Batch Processor to read them in correctly and produce the correct results. To achieve this, a set of zero-configuration conventions is used for identifying entities, attributes and relationships.

Zero-configuration conventions for csv input

CSV file name used as entity public name

When reading in csv files, unless otherwise specified, the file name (without the .csv extension) is matched against the entity public names in the rulebase.

For example, if you have a csv file with the name person.csv, and your rulebase has an entity with the public name person, the Batch Processor will automatically load data from this file into the person entities when processing.
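The file name convention can be illustrated with a minimal Python sketch. This is not part of the Batch Processor; the function name entity_name_for is hypothetical and only shows how the entity public name is derived from the file name.

```python
from pathlib import Path


def entity_name_for(csv_path: str) -> str:
    """Return the file name without its .csv extension.

    Hypothetical helper: under the convention described above, this is
    the name that is matched against entity public names in the rulebase.
    """
    return Path(csv_path).stem


print(entity_name_for("data/person.csv"))  # person
```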

Column headings used as attribute public names

The first line in a csv file should contain the column headings for the data in subsequent lines. These column headings will be matched to attribute public names.

For example, in the csv text below, the headings "name", "age" and "date_of_birth" will be matched to attributes with the corresponding public names.

#,name,age,date_of_birth

1,John Citizen,30,1982-02-13
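To check which headings a file exposes before running a batch, the header line can be read with Python's standard csv module. This is only a sketch of the convention, not the Batch Processor's own reader; the name read_rows is hypothetical.

```python
import csv
import io

# The sample csv text from the example above.
SAMPLE = "#,name,age,date_of_birth\n1,John Citizen,30,1982-02-13\n"


def read_rows(text):
    """Return (headings, rows); each row is a dict keyed by column heading."""
    reader = csv.DictReader(io.StringIO(text))
    headings = reader.fieldnames  # taken from the first line of the file
    rows = list(reader)           # every subsequent line is a data row
    return headings, rows


headings, rows = read_rows(SAMPLE)
print(headings)         # ['#', 'name', 'age', 'date_of_birth']
print(rows[0]["name"])  # John Citizen
```

Each heading other than "#" is a candidate attribute public name.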

Column with name "#" as identifier

By default, a column with the heading "#" is assumed to be a unique identifier for that entity. Only the identifier for an entity (row) needs to be unique.

Example:

In the example data below, the entity Bank has a unique identifier column ("#").

Bank.csv

#, bank_name

1, First National Bank

2, Second National Bank

3, Third National Bank
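Because the identifier must be unique, it can be useful to check a file for repeated "#" values before processing. The following Python sketch is a hypothetical pre-check, not a Batch Processor feature. It uses skipinitialspace=True because the sample data has a space after each comma; whether the Batch Processor itself trims such spaces is not stated here.

```python
import csv
import io
from collections import Counter

BANK_CSV = """#, bank_name
1, First National Bank
2, Second National Bank
3, Third National Bank
"""


def duplicate_ids(text, id_column="#"):
    """Return the identifier values that appear on more than one row."""
    rows = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    counts = Counter(row[id_column] for row in rows)
    return sorted(value for value, count in counts.items() if count > 1)


print(duplicate_ids(BANK_CSV))  # []
```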

Column headings used for relationships

One-to-many relationships can be represented in a csv file. If a column heading is matched to the public name of a relationship, the relationship will be loaded.

Example

In the example data below, the "to-one" side of the one-to-many relationship is represented by putting the id of the bank in a column called "customers_bank". The Batch Processor will read this in correctly if the customer-to-bank side of the relationship has the public name "customers_bank".

Bank.csv

#, bank_name

1, First National Bank

2, Second National Bank

 

Customer.csv

#, customer, balance, customers_bank

1, John Citizen, 3000, 1

2, Joan Citizen, 1000, 2
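The way the "customers_bank" column refers to rows in Bank.csv can be sketched in Python as a lookup by identifier. This only illustrates the data layout; the helper names load and link_customers_to_banks are hypothetical and do not reflect how the Batch Processor is implemented.

```python
import csv
import io

BANK_CSV = "#, bank_name\n1, First National Bank\n2, Second National Bank\n"
CUSTOMER_CSV = (
    "#, customer, balance, customers_bank\n"
    "1, John Citizen, 3000, 1\n"
    "2, Joan Citizen, 1000, 2\n"
)


def load(text):
    """Read csv text into a list of dicts keyed by column heading."""
    return list(csv.DictReader(io.StringIO(text), skipinitialspace=True))


def link_customers_to_banks(customers, banks, relationship="customers_bank"):
    """Resolve each customer's relationship column to the matching bank row."""
    banks_by_id = {bank["#"]: bank for bank in banks}
    return {
        customer["customer"]: banks_by_id[customer[relationship]]["bank_name"]
        for customer in customers
    }


links = link_customers_to_banks(load(CUSTOMER_CSV), load(BANK_CSV))
print(links["John Citizen"])  # First National Bank
```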

CSV file name used as many-to-many relationship

A many-to-many relationship will be identified if the name of the csv file matches the public name of the many-to-many relationship. The csv file should contain two columns, which hold the identifiers of the source and target entities of the relationship.

Example

If a rulebase has a many-to-many relationship "the customer's products" with the public name "customers_products", the relationship can be supplied using the following three files:

 

Customer.csv

#, customer

1, John Citizen

2, Joan Citizen

3, Fred Bloggs

 

Product.csv

#, product

1, Product A

2, Product B

3, Product C

4, Product D

 

customers_products.csv

Customer, Product

1, 1

1, 2

1, 3

2, 3

2, 4

3, 1

3, 2
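Each row of customers_products.csv pairs one customer identifier with one product identifier. As a rough illustration of what those rows express, the following Python sketch groups the pairs by customer; the function name products_by_customer is hypothetical.

```python
import csv
import io
from collections import defaultdict

LINKS_CSV = """Customer, Product
1, 1
1, 2
1, 3
2, 3
2, 4
3, 1
3, 2
"""


def products_by_customer(text):
    """Group product identifiers under the customer identifier they are paired with."""
    links = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text), skipinitialspace=True):
        links[row["Customer"]].append(row["Product"])
    return dict(links)


print(products_by_customer(LINKS_CSV))
# {'1': ['1', '2', '3'], '2': ['3', '4'], '3': ['1', '2']}
```

In the sample data, customer 1 (John Citizen) is therefore linked to Products A, B and C.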

Value formats for csv inputs

Attribute values specified in a csv file can be mapped to any attribute in the corresponding entity. Any column in a csv file that cannot be mapped is ignored, and a warning is written to the log files.

The expected formats for various attribute values are:

 

Value type: Number / Currency

Format description: The Batch Processor treats currency values as identical to simple numeric values when reading input and writing output. For numeric and currency values, it will apply the following rules:

For character fields:

  • Must use the '.' character as the decimal separator.
  • Thousands separators are not supported and must not be used.
  • Currency symbols are not supported and must not be used.
  • For scientific notation, the '+' character is not supported in the exponent.

Blank value: Blank and NULL values are considered UNCERTAIN.

Value type: Text

Format description: Text values will be read and written verbatim.

Blank value: Blank text values are treated as empty strings.

Value type: Boolean

Format description: For reading and writing boolean values, the Batch Processor will apply the following rules:

Reading input values:

  • For character fields:
    • String values are case insensitive, so "YES" is the same as "yes" and "Yes".
    • White space characters will be ignored, so " YES " is the same as "YES".
    • "true", "yes" and "1" will be read as TRUE.
    • "false", "no" and "0" will be read as FALSE.

Writing output values:

  • The values will be "true" and "false".

Blank value: Blank and NULL values are considered UNCERTAIN.

Value type: Date

Format description: For reading and writing date values, the Batch Processor will apply the following rules:

For character fields:

  • Input values must be provided in the format "yyyy-MM-dd".
  • Output values will be written in the format "yyyy-MM-dd".

Blank value: Blank and NULL values are considered UNCERTAIN.

Value type: Date-Time

Format description: For reading and writing date-time values, the Batch Processor will apply the following rules:

For character fields:

  • Input values must be provided in the format "yyyy-MM-dd HH:mm:ss", in 24-hour time.
  • Output values will be written in the format "yyyy-MM-dd HH:mm:ss", in 24-hour time.

Blank value: Blank and NULL values are considered UNCERTAIN.

Value type: Time of Day

Format description: For reading and writing time values, the Batch Processor will apply the following rules:

For character fields:

  • Input values must be provided in the format "HH:mm:ss", in 24-hour time.
  • Output values will be written in the format "HH:mm:ss", in 24-hour time.

Blank value: Blank and NULL values are considered UNCERTAIN.
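The formats above can be checked before a batch run with a small validation script. The following Python sketch is one possible reading of the rules, not the Batch Processor's own parser. The function names are hypothetical, UNCERTAIN is represented as None, only blank values are treated as UNCERTAIN, and the numeric pattern is a deliberately strict assumption (for example, it rejects a leading '+' sign, which the rules above do not mention).

```python
import re
from datetime import datetime
from decimal import Decimal

UNCERTAIN = None  # assumption: a blank input value is represented as None

# Assumed numeric grammar: '.' as decimal separator, no thousands
# separators, no currency symbols, no '+' in the exponent.
NUMBER_PATTERN = re.compile(r"-?\d+(\.\d+)?([eE]-?\d+)?")


def parse_number(raw):
    value = raw.strip()
    if not value:
        return UNCERTAIN
    if not NUMBER_PATTERN.fullmatch(value):
        raise ValueError(f"unsupported number format: {raw!r}")
    return Decimal(value)


def parse_boolean(raw):
    value = raw.strip().lower()  # white space and case are ignored
    if not value:
        return UNCERTAIN
    if value in {"true", "yes", "1"}:
        return True
    if value in {"false", "no", "0"}:
        return False
    raise ValueError(f"unsupported boolean value: {raw!r}")


def _parse_with(raw, pattern):
    # Shared helper: blank is UNCERTAIN, otherwise parse with strptime.
    value = raw.strip()
    return datetime.strptime(value, pattern) if value else UNCERTAIN


def parse_date(raw):
    parsed = _parse_with(raw, "%Y-%m-%d")  # yyyy-MM-dd
    return parsed.date() if parsed else UNCERTAIN


def parse_date_time(raw):
    return _parse_with(raw, "%Y-%m-%d %H:%M:%S")  # yyyy-MM-dd HH:mm:ss


def parse_time_of_day(raw):
    parsed = _parse_with(raw, "%H:%M:%S")  # HH:mm:ss, 24-hour time
    return parsed.time() if parsed else UNCERTAIN
```

For example, parse_boolean(" YES ") returns True, parse_date("1982-02-13") returns the corresponding date, and parse_number("1,000") raises ValueError because thousands separators are not supported.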

 

See also:

Value formats for csv output