Analyzing and Cleansing Data for Sun Master Index

Defining the Data Analysis Rules

In this step, you define the rules for the frequency and pattern analyses to perform before cleansing the data. Use the results of the initial Data Profiler run to learn which fields need to be transformed or validated during the cleansing process. The Data Profiler runs standard frequency analyses, pattern analyses, and constrained frequency analyses; the constrained frequency analyses let you specify validation rules.
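To make the three kinds of analysis concrete, the following Python sketch shows the general idea behind each one on a small list of sample values. It is an illustration only and not the Data Profiler's implementation: the function names, the sample data, the pattern notation (9 for a digit, A for a letter), and the reading of a constrained analysis as "count only the values that pass a validation rule" are all assumptions made for this example.

```python
from collections import Counter


def frequency_analysis(values):
    """Count how often each distinct value occurs, most common first."""
    return Counter(values).most_common()


def to_pattern(value):
    """Reduce a value to its shape: digits become 9, letters become A.

    This notation is an assumption for illustration; the Data Profiler
    may represent patterns differently.
    """
    shape = []
    for ch in value:
        if ch.isdigit():
            shape.append("9")
        elif ch.isalpha():
            shape.append("A")
        else:
            shape.append(ch)  # keep punctuation and spaces as-is
    return "".join(shape)


def pattern_analysis(values):
    """Count how often each value shape occurs, most common first."""
    return Counter(to_pattern(v) for v in values).most_common()


def constrained_frequency(values, is_valid):
    """Frequency analysis over only the values that pass is_valid."""
    return Counter(v for v in values if is_valid(v)).most_common()


# Hypothetical sample data for a phone number field.
phones = ["555-1234", "555-1234", "5551234", "555-9876", "n/a"]

if __name__ == "__main__":
    print(frequency_analysis(phones))
    print(pattern_analysis(phones))
    print(constrained_frequency(phones, lambda v: len(v) == 8))
```

In this sample, the pattern counts show at a glance that most values share one shape while a few do not, which is the kind of finding that suggests a field needs transformation or validation during cleansing.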

You can define the rules using an XML or text editor, or you can access and edit the file from the Files window in NetBeans.

To Define Data Analysis Rules

Before You Begin

Determine the fields to profile, as described in Determining the Fields to Analyze.

  1. Navigate to the location of the Data Profiler.

    By default, the Data Profiler is generated and extracted to NetBeans_Projects/Project_Name/profiler-generated/profile.

  2. Open sampleConfig.xml.


    Note: You can rename the configuration file, and you can create multiple configuration files, each defining a different set of rules. Use sampleConfig.xml as a template for any files you create.


  3. In the profilerVariable element, enter values for the attributes defined in Data Profiler Processing Attributes.

  4. In the varList element, define all of the field variables to use in the profiling rules.

    This step is optional. For more information, see Data Profiler Global Variables.

  5. Define the rules for each pattern or frequency analysis.

    For information about the available rules and the syntax to use, see Data Profiler Rules Syntax.
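If you maintain several configuration files, a small helper script can copy the template and confirm that an edited file is still well-formed XML before you run the Data Profiler. The Python sketch below is one possible approach, not part of the product. It relies only on the file and element names mentioned in this procedure (sampleConfig.xml, profilerVariable, varList); the function names are hypothetical, and the script makes no assumptions about the rest of the file's structure.

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path


def copy_template(template, target):
    """Copy the template configuration to a new file without overwriting."""
    template, target = Path(template), Path(target)
    if target.exists():
        raise FileExistsError(f"{target} already exists")
    shutil.copyfile(template, target)
    return target


def find_elements(config_path, names=("profilerVariable", "varList")):
    """Report which of the named elements appear anywhere in the file.

    ET.parse raises xml.etree.ElementTree.ParseError if the file is not
    well-formed XML, so a successful call also serves as a syntax check.
    """
    root = ET.parse(config_path).getroot()
    # Drop any "{namespace}" prefix so plain element names can be compared.
    present = {el.tag.rsplit("}", 1)[-1] for el in root.iter()}
    return {name: name in present for name in names}
```

For example, calling copy_template("sampleConfig.xml", "addressConfig.xml") from the profiler directory creates a second configuration file (addressConfig.xml is a made-up name), and find_elements("addressConfig.xml") returns a dictionary showing whether each element is present. Because defining varList is optional, a False value for it is not necessarily an error.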

Next Steps

Continue to Performing the Initial Data Analysis.