Data Cleanser Processing Attributes (Analyzing and Cleansing Data for Sun Master Index)

Data Cleanser Processing Attributes

The attributes of the cleansingVariable element in the configuration file define the data source and path names for the Data Cleanser as well as global validation rules. A sample of the cleansing attributes is shown first, followed by a table that lists and describes each attribute.

<cleansingVariable objectdefFilePath="../../src/Configuration" validateType="true"
validateNull="false" validateLength="true" DBconnection="../StagingDB"
goodFilePath="./Output/good.txt" badFilePath="./Output/bad.txt" startCount="1"
standardizer="true"/>
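
As a rough illustration of how these attributes might be read programmatically, the following Python sketch parses the sample with the standard library. It assumes the sample is a well-formed, self-closing XML element with every value quoted. The attribute names are taken from the sample as written, and `read_cleansing_attributes` is a hypothetical helper, not part of the Data Cleanser.

```python
import xml.etree.ElementTree as ET

# The sample element, written as well-formed XML (assumption: self-closing
# element, every attribute value quoted).
SAMPLE = '''<cleansingVariable objectdefFilePath="../../src/Configuration"
    validateType="true" validateNull="false" validateLength="true"
    DBconnection="../StagingDB" goodFilePath="./Output/good.txt"
    badFilePath="./Output/bad.txt" startCount="1" standardizer="true"/>'''

# Attributes documented as true/false indicators.
BOOLEAN_ATTRIBUTES = ("validateType", "validateNull", "validateLength", "standardizer")


def read_cleansing_attributes(xml_text):
    """Return the element's attributes, with true/false flags converted to bool."""
    element = ET.fromstring(xml_text)
    settings = dict(element.attrib)
    for name in BOOLEAN_ATTRIBUTES:
        if name in settings:
            value = settings[name].strip().lower()
            if value not in ("true", "false"):
                raise ValueError(f"{name} must be 'true' or 'false', got {value!r}")
            settings[name] = value == "true"
    return settings


settings = read_cleansing_attributes(SAMPLE)
print(settings["validateType"], settings["validateNull"], settings["DBconnection"])
```

Rejecting any flag value other than `true` or `false` is a design choice of this sketch; how the Data Cleanser itself treats unexpected values is not stated here.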

Attribute	Description
objectdefFilePath	The path and filename for the `object.xml` file to use to cleanse the data.
validateType	An indicator of whether the cleanser should validate each field's data type against the type defined in `object.xml`. Specify `true` to validate field type; otherwise specify `false`. If you validate against type and the validation fails for any field in a record, the record is written to the bad file.
validateNull	An indicator of whether the cleanser should check for null values in each field that is configured to be required in `object.xml`. Specify `true` to check for null values; otherwise specify `false`. If you check for null values and any required field in a record is null, the record is written to the bad file.
validateLength	An indicator of whether the cleanser should validate each field's length against the length defined in `object.xml`. Specify `true` to validate field length; otherwise specify `false`. If you validate against length and the validation fails for any field in a record, the record is written to the bad file.
DBconnection	The path to the staging database or the path and name of the flat file containing the data to be cleansed. Use forward slashes in this path rather than backslashes.
badDataFilePath	The path and name of the file that lists the records that are found to contain bad data during the cleansing process. This file includes an error message for each record describing the reason it was rejected. If you specify a path that does not exist, you need to create the path.
goodDataFilePath	The path and name of the file that lists the records that do not contain any bad data. These records can be processed through the Initial Bulk Match and Load tool into the master index database. If you specify a path that does not exist, you need to create the path.
startCounter	The starting number for the GID generator for the cleansed records. The GID is a unique value used by the Initial Bulk Match and Load tool, which takes the good data file created by the cleansing process as its input. Enter a non-negative long value. For the initial cleansing, set this to `1`.
standardizer	An indicator of whether the Data Cleanser should standardize the input data according to the standardization rules defined in the `mefa.xml` file in the master index project. Specify `true` to standardize the data. This populates the standardized values into the output file. Specify `false` to bypass standardization. If this attribute is missing or no value is specified, the default is `true`.
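
The way the three validation flags route a record to the good or bad file can be sketched in Python as follows. This is a minimal model under stated assumptions, not the Data Cleanser's implementation. `FieldDef` is a hypothetical stand-in for a field definition in `object.xml`, empty strings are treated as null, GIDs are assumed to increase by one per good record, and rejected records are assumed not to consume a GID.

```python
from dataclasses import dataclass


@dataclass
class FieldDef:
    """Hypothetical stand-in for one field definition from object.xml."""
    name: str
    type: type
    max_length: int
    required: bool


def validate_record(record, field_defs, validate_type=True, validate_null=False,
                    validate_length=True):
    """Return a list of error messages; an empty list means the record is good."""
    errors = []
    for field in field_defs:
        value = record.get(field.name)
        if value is None or value == "":  # assumption: empty string counts as null
            if validate_null and field.required:
                errors.append(f"{field.name}: required field is null")
            continue  # nothing further to check on an empty value
        if validate_type and not isinstance(value, field.type):
            errors.append(f"{field.name}: expected {field.type.__name__}")
        if validate_length and len(str(value)) > field.max_length:
            errors.append(f"{field.name}: longer than {field.max_length}")
    return errors


def cleanse(records, field_defs, start_counter=1, **flags):
    """Split records into good (with sequential GIDs) and bad (with reasons)."""
    good, bad = [], []
    next_gid = start_counter  # assumption: GIDs increase by one per good record
    for record in records:
        errors = validate_record(record, field_defs, **flags)
        if errors:
            bad.append((record, "; ".join(errors)))
        else:
            good.append((next_gid, record))
            next_gid += 1
    return good, bad


# Usage with made-up field definitions and records.
fields = [FieldDef("LastName", str, 10, True), FieldDef("Age", int, 3, False)]
records = [
    {"LastName": "Smith", "Age": 42},
    {"LastName": "", "Age": 30},                   # null in a required field
    {"LastName": "Featherstonehaugh", "Age": 51},  # exceeds the length limit
    {"LastName": "Jones", "Age": "n/a"},           # wrong type
]
good, bad = cleanse(records, fields, start_counter=1, validate_null=True)
for record, reason in bad:
    print(record, "->", reason)
```

With `validate_null=True`, only the first record is accepted in this example. Leaving `validate_null` at `false`, as in the sample configuration, would also accept the second record, because its empty required field is no longer checked.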