Analyzing and Cleansing Data for Sun Master Index

Data Cleansing and Profiling Process Overview

The general process of cleansing data begins with analysis, followed by cleansing, followed by additional analysis. By the time you are ready to load your existing data into the master index database, it should be of the best possible quality. To ensure this, you might need to repeat some of the profiling and cleansing steps until all errant data is caught and fixed. The steps below provide a general overview of the analysis and cleansing process.

  1. Make sure the master index application is configured and generated.

  2. Extract the data to a flat file.

  3. Generate the Data Profiler and Data Cleanser from the master index application.

  4. Determine the fields to use for the initial analysis.

  5. Define the rules for the initial analysis and perform the initial analysis. This results in a set of reports for you to review to determine the cleansing rules to define.

  6. Define the rules for the cleansing process and cleanse the data. This is an iterative process, and you might need to define additional rules once the initial cleansing process is complete.

  7. Analyze the blocking and matching fields to determine how to best configure the match process for the master index application.

After you configure the master index, you can load the data into the master index database using the Initial Bulk Match and Load tool (see Loading the Initial Data Set for a Sun Master Index for more information).
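
The analyze, cleanse, and re-analyze cycle described in steps 5 and 6 can be illustrated with a small, generic sketch. This is not the Data Profiler or Data Cleanser API. The pipe-delimited layout, the field names (FirstName, LastName, Gender, DOB), and the cleansing rules are all assumptions made up for illustration.

```python
from collections import Counter

# Hypothetical pipe-delimited extract; the layout and field names are
# illustrative assumptions, not the format the master index tools require.
SAMPLE = """\
FirstName|LastName|Gender|DOB
JOHN|SMITH|M|1970-01-15
john |Smith|male|1970-01-15
MARY|JONES|F|
"""

def parse(text, delimiter="|"):
    """Turn the flat-file text into a list of field-name -> value dicts."""
    lines = text.strip().splitlines()
    header = lines[0].split(delimiter)
    return [dict(zip(header, line.split(delimiter))) for line in lines[1:]]

def profile(records, field):
    """Frequency analysis: count each distinct value found in one field."""
    return Counter(rec[field] for rec in records)

def cleanse(records):
    """Apply simple example rules: trim, uppercase, normalize gender codes."""
    gender_map = {"MALE": "M", "FEMALE": "F"}
    cleaned = []
    for rec in records:
        new = {name: value.strip().upper() for name, value in rec.items()}
        new["Gender"] = gender_map.get(new["Gender"], new["Gender"])
        cleaned.append(new)
    return cleaned

records = parse(SAMPLE)
before = profile(records, "Gender")   # initial analysis
cleaned = cleanse(records)            # cleansing pass
after = profile(cleaned, "Gender")    # follow-up analysis

print("before:", dict(before))
print("after: ", dict(after))
```

Comparing the two frequency counts shows whether the rules removed the inconsistent values. If unexpected values remain, you would define additional rules and repeat the cycle.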