Analyzing and Cleansing Data for a Master Index (Java CAPS Documentation)
Based on the results of the final frequency analyses (see Performing Frequency Analyses on Cleansed Data), you might need to adjust the configuration of the master index application. If the frequencies are too high, adjust the block fields. Set the relative match weights based on how unique each match field is. The results might also indicate that you need to define exclusion files for the Initial Bulk Match and Load tool, or filters for the single best record (SBR), so that certain values are not used for matching. For example, if a large number of SSN fields contain the default value “000-00-0000”, you can exclude that value from the blocking process, the match process, or the survivor calculation for the single best record.
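As a rough illustration of the reasoning above, the following Java sketch counts how often each value occurs in a single field, flags values whose share of the records exceeds a threshold as candidates for exclusion, and computes a simple uniqueness ratio that could inform relative match weights. It does not use the Data Profiler or Initial Bulk Match and Load APIs. The class name, method names, sample data, and the 0.5 threshold are all hypothetical and chosen only for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; not part of the master index or Data Profiler API.
public class FrequencyExclusionSketch {

    /** Counts how often each distinct value occurs in one field. */
    static Map<String, Integer> countFrequencies(List<String> values) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String value : values) {
            counts.merge(value, 1, Integer::sum);
        }
        return counts;
    }

    /**
     * Returns the values whose share of all records is greater than maxShare.
     * These are candidates for an exclusion list or filter.
     */
    static List<String> exclusionCandidates(List<String> values, double maxShare) {
        List<String> candidates = new ArrayList<>();
        if (values.isEmpty()) {
            return candidates;
        }
        for (Map.Entry<String, Integer> entry : countFrequencies(values).entrySet()) {
            double share = (double) entry.getValue() / values.size();
            if (share > maxShare) {
                candidates.add(entry.getKey());
            }
        }
        return candidates;
    }

    /** Ratio of distinct values to total records; closer to 1.0 means more unique. */
    static double uniquenessRatio(List<String> values) {
        if (values.isEmpty()) {
            return 0.0;
        }
        return (double) countFrequencies(values).size() / values.size();
    }

    public static void main(String[] args) {
        // Assumed sample data: SSN values taken from cleansed records.
        List<String> ssns = Arrays.asList(
                "000-00-0000", "123-45-6789", "000-00-0000",
                "987-65-4321", "000-00-0000");

        System.out.println("Exclusion candidates: " + exclusionCandidates(ssns, 0.5));
        System.out.println("Uniqueness ratio: " + uniquenessRatio(ssns));
    }
}
```

In this sample, the default SSN accounts for three of the five records, so it is the only value flagged. In practice the threshold depends on the data set and on how the field is used for blocking and matching.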