Analyzing and Cleansing Data for Sun Master Index

Configuring the Data Cleansing Rules

After you review the Data Profiler reports, you should be able to determine which fields need to be validated or modified before the records can be loaded into the master index database. Before you begin this step, have a clear outline of the fields to validate, the actions to take if data fails or passes validation, and valid and invalid values to compare against. Also define any values that will be replaced, deleted, or truncated by the Data Cleanser, as well as any values that are simply rejected.
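One way to keep that outline explicit is to record it as structured data before editing the configuration file. The sketch below is only a planning aid: the `FieldPlan` record, the action names, and the example field names are hypothetical and are not part of the Data Cleanser configuration syntax.

```python
from dataclasses import dataclass, field

# Hypothetical planning record. These names are illustrative only and do
# not correspond to elements or attributes in the Data Cleanser config.
@dataclass
class FieldPlan:
    field_name: str   # field to validate (example names below are made up)
    check: str        # plain-language description of the validation
    on_fail: str      # "reject", "replace", "delete", or "truncate"
    replacements: dict = field(default_factory=dict)  # old value -> new value

ALLOWED_ACTIONS = {"reject", "replace", "delete", "truncate"}

def validate_plan(plans):
    """Return a list of problems found in the outline (empty if none)."""
    problems = []
    for p in plans:
        if p.on_fail not in ALLOWED_ACTIONS:
            problems.append(f"{p.field_name}: unknown action '{p.on_fail}'")
        if p.on_fail == "replace" and not p.replacements:
            problems.append(
                f"{p.field_name}: 'replace' needs at least one replacement value"
            )
    return problems

# Example outline with assumed field names.
plans = [
    FieldPlan("Person.SSN", "must be nine digits", "reject"),
    FieldPlan("Person.Gender", "must be M or F", "replace",
              {"Male": "M", "Female": "F"}),
    FieldPlan("Person.LastName", "must fit the column length", "truncate"),
]
problems = validate_plan(plans)
```

An empty `problems` list means every field in the outline has a recognized action and every replacement rule names at least one value to substitute.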

To Configure the Data Cleansing Rules

Before You Begin

Review the Data Profiler reports to determine the cleansing rules (see Reviewing the Data Profiler Reports for more information).

  1. Navigate to the location of the Data Cleanser.

    By default, the Data Cleanser is generated and extracted to NetBeans_Projects/Project_Name/cleanser-generated/cleanser.

  2. Open sampleConfig.xml.


    Note: You can rename the configuration file, and you can create multiple configuration files, each defining a different set of rules. Use sampleConfig.xml as a template for any files you create.


  3. In the cleansingVariable element, enter values for the attributes defined in Data Cleanser Processing Attributes.

  4. In the varList element, define all of the variables to use in the cleansing rules.

    For more information, see Data Cleanser Global Variables.

  5. Define the rules for the Data Cleanser.

    For information about the available rules and the syntax to use, see Data Cleanser Rules Syntax.

  6. Save and close the file.

  7. If the path you specified for the output files does not exist, create it.
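The file-handling parts of this procedure (copying sampleConfig.xml as a template, checking that the edited file is still well-formed XML, and creating a missing output path) can be scripted. The following is a minimal sketch using only the Python standard library. The function names are hypothetical, and the output file paths are passed in by the caller because this sketch does not assume which configuration attributes hold them.

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path

def copy_template(template: Path, new_config: Path) -> Path:
    """Copy the sample configuration to a new file, refusing to overwrite."""
    if new_config.exists():
        raise FileExistsError(f"{new_config} already exists")
    shutil.copyfile(template, new_config)
    return new_config

def is_well_formed(config: Path) -> bool:
    """Return True if the file parses as XML. This checks syntax only,
    not whether the cleansing rules themselves are valid."""
    try:
        ET.parse(str(config))
        return True
    except ET.ParseError:
        return False

def ensure_output_dirs(output_files):
    """Create the parent directory of each output file if it is missing.
    Returns the list of directories that were created."""
    created = []
    for f in output_files:
        parent = Path(f).parent
        if not parent.exists():
            parent.mkdir(parents=True)
            created.append(parent)
    return created
```

A typical use would be to call `copy_template` once for each rule set, edit the copy, run `is_well_formed` on it after saving, and then pass the output file paths entered in the configuration to `ensure_output_dirs` before running the Data Cleanser.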

Next Steps

Continue to Cleansing the Legacy Data.