Analyzing and Cleansing Data for a Master Index     Java CAPS Documentation


Configuring the Data Cleansing Rules

After you review the Data Profiler reports, you should be able to determine which fields must be validated or modified before the records are loaded into the master index database. Before you begin this step, outline the fields to validate, the action to take when a value passes or fails validation, and the valid and invalid values to compare against. Also identify any values that the Data Cleanser should replace, delete, or truncate, and any values that should simply be rejected.
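It can help to make this outline concrete before you express it as Data Cleanser rules. The following Java sketch is a hypothetical planning aid only: the CleansingPlanSketch class, the FieldRule record, the Gender and FirstName fields, and the sample values are illustrative assumptions, not part of the Data Cleanser or its configuration format. For each field it replaces known values, truncates overlong values, and then validates the result; if any field fails validation, the whole record is rejected.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.function.Predicate;

public class CleansingPlanSketch {

    // Hypothetical outline entry for one field (not a Data Cleanser type).
    record FieldRule(String field,
                     Map<String, String> replacements, // value to replace -> new value
                     int maxLength,                    // truncate longer values; 0 = no limit
                     Predicate<String> isValid) {      // check applied after replace/truncate
    }

    // Replaces, truncates, then validates each field in rule order.
    // Missing fields are treated as empty strings.
    // Returns the cleansed record, or Optional.empty() if any field fails
    // validation, which models rejecting the whole record.
    static Optional<Map<String, String>> cleanse(Map<String, String> record, List<FieldRule> rules) {
        Map<String, String> out = new LinkedHashMap<>(record);
        for (FieldRule rule : rules) {
            String value = out.getOrDefault(rule.field(), "");
            value = rule.replacements().getOrDefault(value, value);
            if (rule.maxLength() > 0 && value.length() > rule.maxLength()) {
                value = value.substring(0, rule.maxLength());
            }
            if (!rule.isValid().test(value)) {
                return Optional.empty();
            }
            out.put(rule.field(), value);
        }
        return Optional.of(out);
    }

    // Sample outline: normalize Gender to M/F/U, and require a non-blank
    // FirstName, truncated to at most 10 characters.
    static List<FieldRule> sampleRules() {
        Set<String> genders = Set.of("M", "F", "U");
        return List.of(
            new FieldRule("Gender", Map.of("MALE", "M", "FEMALE", "F"), 0, genders::contains),
            new FieldRule("FirstName", Map.of(), 10, v -> !v.isBlank()));
    }

    public static void main(String[] args) {
        List<FieldRule> rules = sampleRules();
        System.out.println(cleanse(Map.of("Gender", "MALE", "FirstName", "Christopher-John"), rules));
        System.out.println(cleanse(Map.of("Gender", "X", "FirstName", "Ann"), rules));
    }
}
```

With these sample rules, the first record is kept with Gender set to M and FirstName truncated to Christophe, while the second record is rejected because X is not a valid Gender value. In the Data Cleanser itself, these decisions are written as rules in the configuration file (see Data Cleanser Rules Syntax); the sketch only shows the intent of the outline.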

To Configure the Data Cleansing Rules

Before You Begin

Review the Data Profiler reports to determine which cleansing rules you need (see Reviewing the Data Profiler Reports for more information).

  1. Navigate to the location of the Data Cleanser.

    By default, the Data Cleanser is generated and extracted to NetBeans_Projects/Project_Name/cleanser-generated/cleanser.

  2. Open sampleConfig.xml.

    Note - You can rename the configuration file, and you can create multiple configuration files that each define a different set of rules. Use sampleConfig.xml as a template for any files you create.


  3. In the cleansingVariable element, enter values for the attributes defined in Data Cleanser Processing Attributes.
  4. In the varList element, define all of the variables to use in the cleansing rules.

    For more information, see Data Cleanser Global Variables.

  5. Define the rules for the Data Cleanser.

    For information about the available rules and the syntax to use, see Data Cleanser Rules Syntax.

  6. Save and close the file.
  7. If the path you specified for the output files does not exist, create it.
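Before you run the Data Cleanser, a small check can catch two mistakes from the steps above: a configuration file that is missing the cleansingVariable or varList element, and an output path that does not exist yet. The following Java sketch uses only the JDK (javax.xml.parsers and java.nio.file) and is not part of the Data Cleanser. The CleanserConfigCheck class and its method names are hypothetical, and the output directory is passed as a command-line argument because the sketch does not assume which processing attribute holds that path.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class CleanserConfigCheck {

    // Returns the names of required elements that do not appear anywhere
    // in the configuration file (empty list if both are present).
    static List<String> missingElements(Path configFile) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        Document doc = factory.newDocumentBuilder().parse(configFile.toFile());
        List<String> missing = new ArrayList<>();
        for (String name : List.of("cleansingVariable", "varList")) {
            if (doc.getElementsByTagName(name).getLength() == 0) {
                missing.add(name);
            }
        }
        return missing;
    }

    // Step 7: creates the output directory and any missing parents.
    // Does nothing if the directory already exists.
    static Path ensureOutputDir(Path outputDir) throws IOException {
        return Files.createDirectories(outputDir);
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: java CleanserConfigCheck <configFile> <outputDir>");
            System.exit(2);
        }
        Path config = Path.of(args[0]);
        List<String> missing = missingElements(config);
        if (!missing.isEmpty()) {
            System.err.println("Missing elements in " + config + ": " + missing);
            System.exit(1);
        }
        System.out.println("Output directory ready: " + ensureOutputDir(Path.of(args[1])));
    }
}
```

For example, pass sampleConfig.xml (or your renamed copy) as the first argument and the output path you configured as the second. Files.createDirectories is used instead of Files.createDirectory so that missing parent directories are created as well and an existing directory is not treated as an error.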

Next Steps

Continue to Cleansing the Legacy Data.