Analyzing and Cleansing Data for a Master Index (Java CAPS Documentation)
After you customize the configuration file for the Data Cleanser, you can run the Data Cleanser against the staging database or a flat file. This step generates two files: one containing the records that passed all validations and were successfully cleansed, and one containing the records that failed validation, along with an error message for each.
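As a quick sanity check after a run, you might compare the number of records in the two output files. The following is a minimal sketch, not part of the Data Cleanser itself. It assumes that each record occupies one line in both files (the layout of the bad file, including where the error message appears, may differ in your environment), and the function names and file paths are hypothetical.

```python
def count_records(path):
    """Count non-empty lines, treating each line as one record (an assumption)."""
    with open(path, encoding="utf-8") as handle:
        return sum(1 for line in handle if line.strip())


def summarize_run(good_path, bad_path):
    """Return record counts for the good file, the bad file, and their total."""
    good = count_records(good_path)
    bad = count_records(bad_path)
    return {"good": good, "bad": bad, "total": good + bad}
```

If the total differs from the number of records you expected the Data Cleanser to read, review the run before correcting and reprocessing the bad file.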
Before You Begin
Before performing this step, make sure you have completed the following procedures:
Caution - Be sure to change the DBConnection attribute in the configuration file to point to the renamed file and change the startcounter value to the next record to be processed. For example, if the original run processed 100 good records, change the value to “101” to start processing the bad records. Any records cleansed from the fixed file are appended to the good data file.
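The startcounter arithmetic in the caution above can be sketched as follows. This is an illustration only: the function name is hypothetical, and it assumes that the original run began with a startcounter of 1 unless you pass a different starting value.

```python
def next_start_counter(good_records_processed, original_start=1):
    """Return the startcounter value to use when reprocessing the fixed file.

    Assumes counters are assigned sequentially beginning at original_start.
    """
    if good_records_processed < 0:
        raise ValueError("good_records_processed must not be negative")
    if original_start < 1:
        raise ValueError("original_start must be at least 1")
    return original_start + good_records_processed


# The example from the caution: 100 good records in the original run.
print(next_start_counter(100))  # prints 101
```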
Note - The final output to the good file can be loaded into the master index database using the Initial Bulk Match and Load tool (see Loading the Initial Data Set for a Master Index). The Data Cleanser automatically places the data in the correct format based on the object.xml file.
Next Steps
Continue to Performing Frequency Analyses on Cleansed Data to analyze the cleansed data.