Loading the Initial Data Set for a Sun Master Index

Initial Bulk Match and Load Tool Processing Configuration

The processing properties described in the following table configure how the IBML Tool processes data. In these properties, you define a name for each IBML Tool, the location of the working directories, polling properties, and so on. Some of these properties apply only to specific phases of the match and load process, and some apply only to the master processor or only to the slave processors.

Table 1 IBML Tool Processing Properties

Property Name 

Description 

loaderName 

A name for the IBML Tool residing on the current processor. The name must be unique across all IBML Tools in the distributed environment. You do not need to modify it if you are using a single processor.

isMasterLoader 

An indicator of whether the IBML Tool being configured is the master IBML Tool. Specify true if it is the master or only IBML Tool; otherwise specify false.

matchAnalyzerMode 

An indicator of whether to process the data in match analysis mode, which only generates analysis reports, or to perform the complete match process and generate the master index image files. Specify true to perform an analysis only; specify false to perform the actual blocking and matching process and generate the master index image files.

BulkLoad 

An indicator of whether the current run will load the matched data into the database using SQL*Loader once the match process is complete. Specify true to load the data. To run a match analysis or only the matching process, specify false. If you run only the match process, you can verify the results and then load the output of the Bulk Matcher at a later time.

standardizationMode 

An indicator of whether to standardize the input data. Leave the value of this property set to true.

deleteIntermediateDirs 

An indicator of whether the working directories are deleted when each process is complete. Specify true to delete the directories; specify false to retain the directories.

optimizeDuplicates 

An indicator of whether to automatically merge records in the input data if they have the same system and local ID. Specify true to automatically merge the duplicate records; otherwise specify false. The default is true.

rmiPort 

This property is not currently used.

workingDir 

The absolute path to the directory in which the IBML Tools create the working files as they progress through the processing stages. The master IBML Tool also creates the master index image files here. If the path you specify does not exist, create it before running the IBML Tool. 

ftp.workingDir 

The absolute path to the directory on the master processor where files are placed for distribution to the remaining IBML Tools. You only need to define this property for the master IBML Tool and only if you are running multiple IBML Tools. All other tools ignore this property. 

numBlockBuckets 

The number of block buckets to create for the initial distribution of data blocks. Each IBML Tool works on one bucket at a time, so when multiple IBML Tools are running, several buckets are processed at once. The number of block buckets you specify depends on the number of records to process and how specific the data blocks are in the blocking query.

numThreads 

The number of threads to run in parallel during processing. 

numEUIDBuckets 

The number of buckets the EUID assigner should place the processed records into after they have been matched and assigned an EUID. 

totalNoOfRecords 

The total number of records being processed. The value does not need to be exact, but it must be greater than or equal to the actual number of records.

pollInterval 

The number of milliseconds the IBML Tools should wait before polling the master IBML Tool for their next task. 

maxWaitTime 

The maximum time for an IBML Tool to wait for the next task before giving up.
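
Several of the properties in Table 1 constrain one another, so it can help to check a set of values before starting a run. The following Python sketch is one possible way to do that; it is not part of the IBML Tool. The function name validate_ibml_settings, the dictionary representation of the properties, and the tool_count and actual_record_count arguments are all hypothetical, and the rule that every numeric property must be a positive whole number is an assumption rather than something the table states.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Assumption: each of these properties holds a positive whole number.
POSITIVE_INT_PROPERTIES = (
    "numBlockBuckets", "numThreads", "numEUIDBuckets",
    "totalNoOfRecords", "pollInterval", "maxWaitTime",
)


def _is_true(settings, name):
    """Treat the property as set only when its value is the text 'true'."""
    return str(settings.get(name, "")).strip().lower() == "true"


def _is_absolute(path):
    """Accept absolute paths in either UNIX or Windows notation."""
    return PurePosixPath(path).is_absolute() or PureWindowsPath(path).is_absolute()


def validate_ibml_settings(settings, actual_record_count, tool_count=1):
    """Return a list of problems found in a dict of property values (hypothetical helper)."""
    problems = []

    for name in POSITIVE_INT_PROPERTIES:
        value = str(settings.get(name, "")).strip()
        if not value.isdecimal() or int(value) < 1:
            problems.append(f"{name} must be a positive integer")

    # totalNoOfRecords may overestimate the record count but must not underestimate it.
    total = str(settings.get("totalNoOfRecords", "")).strip()
    if total.isdecimal() and int(total) < actual_record_count:
        problems.append("totalNoOfRecords must be >= the actual number of records")

    # Match analysis only generates reports, so BulkLoad should be false for that run.
    if _is_true(settings, "matchAnalyzerMode") and _is_true(settings, "BulkLoad"):
        problems.append("BulkLoad should be false when matchAnalyzerMode is true")

    if not _is_true(settings, "standardizationMode"):
        problems.append("standardizationMode should be left set to true")

    if not _is_absolute(str(settings.get("workingDir", ""))):
        problems.append("workingDir must be an absolute path")

    # ftp.workingDir matters only on the master, and only with multiple IBML Tools.
    if tool_count > 1 and _is_true(settings, "isMasterLoader"):
        if not _is_absolute(str(settings.get("ftp.workingDir", ""))):
            problems.append("ftp.workingDir must be an absolute path on the master")

    # A single IBML Tool is always the master.
    if tool_count == 1 and not _is_true(settings, "isMasterLoader"):
        problems.append("isMasterLoader must be true for the only IBML Tool")

    return problems


# Illustrative values only; none of these are defaults taken from the product.
example = {
    "loaderName": "loader1", "isMasterLoader": "true",
    "matchAnalyzerMode": "false", "BulkLoad": "false",
    "standardizationMode": "true", "deleteIntermediateDirs": "true",
    "optimizeDuplicates": "true", "workingDir": "/data/ibml/work",
    "numBlockBuckets": "10", "numThreads": "4", "numEUIDBuckets": "10",
    "totalNoOfRecords": "1000000", "pollInterval": "1000", "maxWaitTime": "60000",
}
print(validate_ibml_settings(example, actual_record_count=900000))
```

The checks mirror the descriptions in the table: BulkLoad is false for an analysis-only run, totalNoOfRecords is at least the real record count, workingDir is an absolute path, and ftp.workingDir is examined only for the master IBML Tool in a multiple-tool configuration.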