Analyzing and Cleansing Data for a Master Index, Java CAPS Documentation
In this step, you define the rules for the frequency and pattern analyses to perform before you cleanse the data. Use the results of the initial Data Profiler run to learn more about your data, so that you know which fields need to be transformed or validated during the cleansing process. The Data Profiler runs three types of analysis: simple frequency analyses, pattern frequency analyses, and constrained frequency analyses. Constrained frequency analyses let you specify validation rules.
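To make the three analysis types concrete, the following Java sketch applies each one to a small, made-up list of ZIP code values. It illustrates the general techniques only and is not the Data Profiler's implementation or API. The class and method names, the A/9 pattern notation, and the ZIP code validation rule are all assumptions chosen for this example.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;

// Hypothetical illustration of frequency, constrained frequency, and pattern
// frequency analysis; not part of the Java CAPS Data Profiler.
public class FrequencySketch {

    // Simple frequency analysis: count how often each distinct value occurs.
    public static Map<String, Integer> simpleFrequency(List<String> values) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String v : values) {
            counts.merge(v, 1, Integer::sum);
        }
        return counts;
    }

    // Constrained frequency analysis: count only the values selected by a rule.
    public static Map<String, Integer> constrainedFrequency(List<String> values,
                                                            Predicate<String> rule) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String v : values) {
            if (rule.test(v)) {
                counts.merge(v, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Reduce a value to its shape: letters become 'A', digits become '9',
    // and any other character is kept as-is (assumed notation).
    public static String toPattern(String value) {
        StringBuilder sb = new StringBuilder();
        for (char c : value.toCharArray()) {
            if (Character.isLetter(c)) {
                sb.append('A');
            } else if (Character.isDigit(c)) {
                sb.append('9');
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Pattern frequency analysis: count how often each shape occurs.
    public static Map<String, Integer> patternFrequency(List<String> values) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String v : values) {
            counts.merge(toPattern(v), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Sample data includes a typo (letter O instead of zero) and an empty value.
        List<String> zips = List.of("90210", "90210", "9021O", "10001-1234", "");

        System.out.println(simpleFrequency(zips));
        System.out.println(patternFrequency(zips));

        // Hypothetical validation rule: select values that are not a
        // 5-digit ZIP code or a ZIP+4 code.
        Predicate<String> invalidZip = v -> !v.matches("\\d{5}(-\\d{4})?");
        System.out.println(constrainedFrequency(zips, invalidZip));
    }
}
```

A pattern report like this tends to surface formatting problems (such as `9999A`) that a plain value count can hide among many distinct values, which is one reason to run both kinds of analysis before deciding on cleansing rules.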
You can define the rules in the configuration file using an XML or text editor, or you can open and edit the file from the Files window in NetBeans.
Before You Begin
Determine the fields to profile, as described in Determining the Fields to Analyze.
By default, the Data Profiler is generated and extracted to NetBeans_Projects/Project_Name/profiler-generated/profile.
Note - You can rename the configuration file and you can create multiple configuration files, each defining a different set of rules. Use sampleConfig.xml as a template for any files you create.
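If you prefer to script the creation of additional configuration files, the following Java sketch copies the template with the standard `java.nio.file.Files.copy` method. It assumes that `sampleConfig.xml` is located in the default Data Profiler directory; the class name, the `copyTemplate` helper, and the new file name `addressRules.xml` are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for creating a new rules file from the sample template.
public class CopyProfilerConfig {

    // Copies sampleConfig.xml in profilerDir to a new file in the same directory.
    // Files.copy without options throws FileAlreadyExistsException if the target
    // already exists, so an existing rules file is never overwritten.
    public static Path copyTemplate(Path profilerDir, String newFileName) throws IOException {
        Path template = profilerDir.resolve("sampleConfig.xml");
        Path target = profilerDir.resolve(newFileName);
        return Files.copy(template, target);
    }

    public static void main(String[] args) throws IOException {
        // Assumed default location; replace with your own NetBeans project path.
        Path profilerDir = Paths.get("NetBeans_Projects", "Project_Name",
                "profiler-generated", "profile");
        Path created = copyTemplate(profilerDir, "addressRules.xml");
        System.out.println("Created " + created);
    }
}
```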
Defining global variables is optional. For more information, see Data Profiler Global Variables.
For information about the available rules and the syntax to use, see Data Profiler Rules Syntax.
Next Steps
Continue to Performing the Initial Data Analysis.