Skip Navigation Links | |
Exit Print View | |
Analyzing and Cleansing Data for a Master Index Java CAPS Documentation |
Analyzing and Cleansing Data for a Master Index
Data Cleansing and Analysis Overview
Data Cleansing and Profiling Process Overview
Required Format for Flat Data Files
Generating the Data Profiler and Data Cleanser
To Generate the Data Profiler and Data Cleanser
Determining the Fields to Analyze
Defining the Data Analysis Rules
Performing the Initial Data Analysis
To Perform the Initial Data Analysis
Reviewing the Data Profiler Reports
Performing Frequency Analyses on Cleansed Data
Adjusting the Master Index Configuration
Data Profiler Processing Attributes
Data Profiler Global Variables
Simple Frequency Analysis Rules
Constrained Frequency Analysis Rules
Pattern Frequency Analysis Rules
Data Cleanser Processing Attributes
Data Cleanser Global Variables
Simple Frequency Analysis Report Samples
After you review the Data Profiler reports, you should be able to determine which fields need to be validated or modified before the records can be loaded into the master index database. Before you begin this step, have a clear outline of the fields to validate, the actions to take if data fails or passes validation, and valid and invalid values to compare against. Also define any values that will be replaced, deleted, or truncated by the Data Cleanser, as well as any values that are simply rejected.
Before You Begin
Review the Data Analysis reports to determine the cleansing rules (see Reviewing the Data Profiler Reports for more information).
By default, the Data Cleanser is generated and extracted to NetBeans_Projects/Project_Name/cleanser-generated/cleanser.
Note - You can rename the configuration file and you can create multiple configuration files, each defining a different set of rules. Use sampleConfig.xml as a template for any files you create.
For more information, see Data Cleanser Global Variables.
For information about the available rules and the syntax to use, see Data Cleanser Rules Syntax.
Next Steps
Continue to Cleansing the Legacy Data.