JavaScript is required to for searching.
Skip Navigation Links
Exit Print View
Loading the Initial Data Set for a Master Index     Java CAPS Documentation
search filter icon
search icon

Document Information

Loading the Initial Data Set for a Master Index

Related Topics

Initial Bulk Match and Load Overview

Initial Bulk Match and Load Process Overview

Data Preparation, Matching, and Loading Procedure Overview

Distributed Processing

About the Bulk Match Process

Block Distribution

Record Matching

EUID Assignment

Master Index Image Creation

Potential Duplicate Creation

About the Bulk Load Process

About the Cluster Synchronizer

Required Format for Flat Data Files

Generating the Initial Bulk Match and Load Tool

To Generate the Initial Bulk Match and Load Tool

Configuring the Environment

To Configure the Environment

Creating the Cluster Synchronizer Database

To Create the Cluster Synchronization Tables

Configuring the Initial Bulk Match and Load Tool

Configuring the Initial Bulk Match and Load Tool Processing

To Configure the IBML Tool

Configuring Initial Bulk Match and Load Tool Logging

To Configure IBML Tool Logging

Initial Bulk Match and Load Tool Configuration Properties

Initial Bulk Match and Load Tool Field Validation Configuration

Initial Bulk Match and Load Tool Blocking Query Configuration

Initial Bulk Match and Load Tool Match String Configuration

Initial Bulk Match and Load Tool Processing Configuration

FTP Server Configuration

Cluster Synchronizer Database Configuration

SQL*Loader Configuration

Data Reader Configuration

Additional Properties

Initial Bulk Match and Load Tool Logging Properties

Performing a Match Analysis

Running the Bulk Matcher in Analysis Mode

To Run the Bulk Matcher in Analysis Mode

Reviewing the Match Analysis Results

Reconfiguring the Matching Logic

To Reconfigure the Matching Logic

Performing the Bulk Match

To Perform the Bulk Match

Running the Bulk Match and Bulk Load in One Step (SQL*Loader Only)

To Run the Bulk Match and Bulk Load in One Step

Loading the Matched Data Into the Master Index Database

Loading Matched Data Using SQL*Loader

To Load Matched Data Using SQL*Loader

Loading Matched Data Using the Command-Line Bulk Loader

To Load Matched Data Using the Command-Line Bulk Loader

Command-Line Bulk Loader Properties

Loading the Matched Data Into the Master Index Database

The IBML Tool provides two methods to load the master data images generated by the Bulk Matcher. A command line tool is provided to generate and then run the extract, transform, and load (ETL) collaborations that load the data. Alternatively, you can use SQL*Loader to load the data if the master index database is running on Oracle.

Perform one of the following procedures to load the matched data into your master index database:

Loading Matched Data Using SQL*Loader

If the master index database runs on an Oracle platform, you can use either SQL*Loader or the command-line Bulk Loader to load the matched data into the database. SQL*Loader cannot be used for a SQL Server or MySQL database.

To Load Matched Data Using SQL*Loader

  1. Complete the steps under Performing the Bulk Match.
  2. From the master IBML Tool home directory, run cluster-truncate.sql against the cluster synchronizer database.
  3. For each IBML Tool, open loader-config.xml (located in the IBML Tool home directory in the conf subdirectory).
    1. Define the SQL*Loader property as described in SQL*Loader Configuration.
    2. Change the value of the BulkLoad property to true.
    3. Save and close the file.
  4. To generate the loader, do one of the following.
    • If the master loader is running on Windows:
      1. Navigate to the master IBML Tool home directory and open generate-sql-loader.bat for editing.
      2. Change the value of the JDBC_JAR_PATH variable in the first line to the location and name of the database driver for the master index database platform; for example, set JDBC_JAR_PATH=C:\oracle\jdbc\lib\ojdbc14.jar.
      3. Close and save the file.
      4. Double-click generate-sql-loader.bat or type generate-sql-loader from a command line.
    • If the master loader is running on UNIX:
      1. Navigate to the master IBML Tool home directory and open sh generate-sql-loader.sh for editing.
      2. Change the value of the JDBC_JAR_PATH variable in the first line to the location and name of the database driver for the master index database platform; for example, export JDBC_JAR_PATH=${oracle_home}/jdbc/lib/ojdbc14.jar.
      3. Close and save the file.
      4. Type sh generate-sql-loader.sh at the command line.

      A new directory named sqlldr is created in the working directory.

  5. In the master IBML Tool home directory, run cluster-truncate.sql against the master index database to clear the cluster synchronizer tables.
  6. In the sqlldr folder in the working directory, run drop.sql against the master index database to drop constraints and indexes.
  7. In the sqlldr directory, do one of the following:
    • On Windows, double-click bulk-loader.bat or type bulk-loader.bat from a command line.
    • On UNIX, type sh bulk-loader.sh at the command line.
  8. After the data is loaded, close any command prompts that were left open by the process and examine the SQL*Loader log files located in the sqlldr/log directory to ensure there were no errors during processing.

    Note - Any records that contained bad data and were not inserted into the master index database are written to the sqlldr/bad directory. Any records that contained bad data and were discarded are written to the sqlldr/discard directory.


  9. In the sqlldr directory, run create.sql against the master index database to reinstate the dropped indexes and constraints.

Loading Matched Data Using the Command-Line Bulk Loader

You can use the command-line Bulk Loader to load data into an Oracle, MySQL, or SQL Server database. Using the command-line tool does not require the use of NetBeans, but it does require that NetBeans be installed on the master loader machine.

To Load Matched Data Using the Command-Line Bulk Loader

  1. Complete the steps under Performing the Bulk Match.
  2. In the master IBML Tool home directory, open genCollab.bat (or genCollab.sh for UNIX) and configure the properties described in Command-Line Bulk Loader Properties.
  3. Save and close the file.
  4. In the master IBML Tool home directory, do one of the following:
    • On Windows, double-click genCollab.bat or type genCollab.bat from a command line.
    • On UNIX, type sh genCollab.sh at the command line.

      This generates a zip file in the IBML Tool home directory.

  5. Extract the contents of etl-loader.zip to the current directory.

    This generates an ETL collaboration and creates a new directory, ETLloader, in the IBML Tool home directory.

  6. In the master IBML Tool home directory, run cluster-truncate.sql against the master index database to clear the cluster synchronizer tables.
  7. In the ETLloader/config directory, open logger.properties and modify any logging properties if needed.
  8. In the ETLloader directory, do one of the following:
    • On Windows, double-click startLoad.bat or type startLoad.bat from a command line.
    • On UNIX, type sh startLoad.sh at the command line.
  9. After the data is loaded, check the log files to ensure there were no errors during processing.

Command–Line Bulk Loader Properties

The ETL collaboration is generated by a file that includes configurable properties you need to define. The file is named genCollab.bat for Windows and genCollab.sh for UNIX. It is located in the directory where you extracted the IBML Tool files on the master processor. The following table lists and describes the default properties for the file.


Tip - If you get a usage error when running the Bulk Loader after configuring the properties below, remove the double-quotes from around the paths and filenames (but not from the delimiters).


Table 8 Command–Line Bulk Loader Properties

Property Name
Description
NetBeans and Java Properties
NB_HOME
The absolute path to the NetBeans home directory on the master processor.
JAVAPATH
The absolute path to the bin directory in the Java installation; for example, C:\\Java\jre1.5.0_11\bin.
DB_DRIVER_PATH
The absolute path to the database driver for the database platform of the master index database.
DB_DRIVER_NAME
The name of the database driver in the path specified above; for example, ojdbc14.jar.
Source Data Properties
SOURCE_LOC
The absolute path to the data files to be loaded into the master index database. These are located in the masterindex folder in the working directory you created for the Bulk Matcher.
FIELD_DELIMITER
The character that separates the fields in the master data image files. By default, fields are delimited by a pipe character (|).
RECORD_DELIMITER
The characters that separate the records in the master data image files. By default, the records are delimited by three dollar signs ($$$).
Target Database Properties
TARGET_DB_TYPE
The database platform used for the master index database. Specify 1 for Oracle, 2 for MySQL, or 3 for SQL Server.
TARGET_LOC
The name or IP address of the server on which the master index database resides.
TARGET_PORT
The port number on which the master index database is listening. The default port is 1521 for Oracle, 1433 for SQL Server, and 3306 for MySQL.
TARGET_ID
The SID or database name of the master index database.
TARGET_SCHEMA
The name of the database schema that defines the tables, fields, and relationships for the master index database. The default schema for SQL Server databases is “dbo”; for Oracle, the default is the same as the SID name of the database.
TARGET_CATALOG
The name of the database catalog containing the master index database metadata. This property can be left empty.
TARGET_LOGIN
The login ID of the user with administrator abilities for the master index database.
TARGET_PW
The password for the above login ID.