Both the Data Cleanser and the Data Profiler are designed to read data from a staging database created by extracting data from your source database using Data Integrator. You can also extract your data to a flat file using the extractor of your choice. If you use Data Integrator to extract the data to be analyzed and cleansed, the extracted data is written to an Axion flat-file database in the required format for the Data Profiler and Data Cleanser.
If you use a data extractor other than Data Integrator, the data needs to be placed in a flat file in a format that the Data Profiler and Data Cleanser can read. If your data is in a different format, you can define a custom data reader to read the flat file into the Data Profiler and Data Cleanser. The analysis tools can read a flat file in the following format without any additional configuration:
GID|SystemCode|LocalID|UpdateDate|UserID|ObjectFields |
where:
GID is a global ID computed by the data analysis. This field can be empty in the flat file.
SystemCode is the processing code for the system from which the record originated.
LocalID is the object's local ID in the given system.
UpdateDate is the most recent update date for the record. This field can be empty.
UserID is the logon ID of the user who last updated the record. This field can be empty.
ObjectFields includes the objects and fields that are defined in object.xml. Include every field defined in the object structure, in the order in which the fields are defined, including standardized, phonetic, and parsed fields. Child object types are delimited by a pound sign (#), and multiple child objects of one type are delimited by a dollar sign ($).
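The layout described above can be sketched as a small Python helper that assembles one line from its parts. This is an illustrative sketch, not part of the product: the function name `build_record`, its parameters, and the two-field parent object in the usage example are hypothetical. It assumes that field values never contain the delimiter characters, that each child object is introduced by a dollar sign, and that the line ends with the closing pipe shown in the format line. How a child type with no objects should be written is not covered by the description above, so the sketch does not address that case.

```python
RESERVED = ("|", "#", "$")  # delimiters used by the flat-file layout


def _join(fields):
    """Join field values with pipes, rejecting values that contain a delimiter."""
    for value in fields:
        if any(ch in value for ch in RESERVED):
            raise ValueError(f"field value {value!r} contains a delimiter")
    return "|".join(fields)


def build_record(system_code, local_id, parent_fields, children,
                 gid="", update_date="", user_id="", closing_pipe=True):
    """Assemble one flat-file line (hypothetical helper).

    parent_fields: parent-object field values, in object.xml order.
    children: one list per child object type; each entry is a list of
              child objects, and each child object is a list of field values.
    """
    # GID, UpdateDate, and UserID may be left empty.
    header = [gid, system_code, local_id, update_date, user_id]
    line = _join(header + list(parent_fields))
    for child_type in children:
        # "#" starts a child object type; "$" starts each child object.
        line += "#" + "".join("$" + _join(child) for child in child_type)
    # Assumption: the line ends with a pipe, as in the format line.
    return (line + "|") if closing_pipe else line


# Usage with a shortened, hypothetical parent object of two fields.
line = build_record(
    "ORACLE", "00160419",
    parent_fields=["P", "ELIZABETH"],
    children=[
        [["BETH", "", "CAHILL"], ["LIZ", "ANN", "CAHILL"]],  # alias objects
        [["CH", "9895557848", ""]],                          # phone object
    ],
)
print(line)
# |ORACLE|00160419|||P|ELIZABETH#$BETH||CAHILL$LIZ|ANN|CAHILL#$CH|9895557848||
```

Rejecting values that contain a pipe, pound sign, or dollar sign is a design choice made for the sketch, because the description above does not define an escape mechanism for those characters.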
Below is an example of a valid input record based on the standard master index Person template, which includes alias, address, and phone objects. Note the empty fields after the first and last names; these hold the phonetic and standardized data that the Data Cleanser inserts. Empty fields also follow the street address; these hold the parsed street address components that the Data Cleanser inserts.
28|ORACLE|00160419|11/14/1999 08:41:10|GSMYTHE|P|ELIZABETH|||ANN|WARREN||||MRS |554-44-55555|08/18/1977|Y|F|M|W|13|BAP|ENG|STEVE|ANN|MARCH|GEORGE|CAHILL|SHEFFIELD |CT|USA|E|Y||C4411444|CA|07/21/2018||ENG|USA#$BETH||CAHILL$LIZ|ANN|CAHILL#$H|1519 BOARDWALK||||||Unit 5|SHEFFIELD|CT|09876|1075|CAPE BURR|USA$W|12500 EAST RIVER ST. ||||||Suite 1310|CAPE BURR|CT|09877||CAPE BURR|USA#$CH|9895557848|$CB|9895551500|19 |
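To see how the delimiters divide the record above, the following Python sketch splits it into header fields, parent fields, and child objects. It is a hedged illustration, not the product's reader: `parse_record` and `HEADER_NAMES` are hypothetical names. The sketch assumes that field values never contain a pipe, pound sign, or dollar sign, that the final pipe on the line is a terminator rather than an extra field, and that spaces around a value can be trimmed.

```python
HEADER_NAMES = ("GID", "SystemCode", "LocalID", "UpdateDate", "UserID")

# The sample record from the text, split across string literals for readability.
RECORD = (
    "28|ORACLE|00160419|11/14/1999 08:41:10|GSMYTHE|P|ELIZABETH|||ANN|WARREN||||MRS |"
    "554-44-55555|08/18/1977|Y|F|M|W|13|BAP|ENG|STEVE|ANN|MARCH|GEORGE|CAHILL|SHEFFIELD |"
    "CT|USA|E|Y||C4411444|CA|07/21/2018||ENG|USA"
    "#$BETH||CAHILL$LIZ|ANN|CAHILL"
    "#$H|1519 BOARDWALK||||||Unit 5|SHEFFIELD|CT|09876|1075|CAPE BURR|USA"
    "$W|12500 EAST RIVER ST. ||||||Suite 1310|CAPE BURR|CT|09877||CAPE BURR|USA"
    "#$CH|9895557848|$CB|9895551500|19 |"
)


def parse_record(line):
    """Split one flat-file line into (header, parent_fields, children)."""
    line = line.strip()
    if line.endswith("|"):
        line = line[:-1]  # assumption: the final pipe only terminates the line
    segments = line.split("#")  # segment 0 is header + parent; the rest are child types
    leading = [field.strip() for field in segments[0].split("|")]
    header = dict(zip(HEADER_NAMES, leading[:5]))
    parent_fields = leading[5:]
    children = []
    for segment in segments[1:]:
        # Each child object starts with "$", so the piece before the first "$" is empty.
        objects = segment.split("$")[1:]
        children.append([[f.strip() for f in obj.split("|")] for obj in objects])
    return header, parent_fields, children


header, parent_fields, children = parse_record(RECORD)
print(header["SystemCode"], header["LocalID"])  # ORACLE 00160419
print([len(child_type) for child_type in children])  # [2, 2, 2]
print(children[0])  # [['BETH', '', 'CAHILL'], ['LIZ', 'ANN', 'CAHILL']]
```

In this record the three child types are alias, address, and phone, each with two objects. The empty strings in the parsed lists correspond to the empty fields in the record, so field positions are preserved.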