Loading the Matched Data Into the Master Index Database

The IBML Tool provides two methods to load the master data images generated by the Bulk Matcher. A command line tool is provided to generate and then run the extract, transform, and load (ETL) collaborations that load the data. Alternatively, you can use SQL*Loader to load the data if the master index database is running on Oracle.

Perform one of the following procedures to load the matched data into your master index database:

Loading Matched Data Using SQL*Loader

Note –

This procedure includes steps that were updated for Java CAPS Release 6 Update 1. The variable JDBC_JAR_PATH was previously named ORACLE_JDBC_JAR and was not present in all files.

If the master index database runs on an Oracle platform, you can use either SQL*Loader or the command-line Bulk Loader to load the matched data into the database. SQL*Loader cannot be used for a SQL Server or MySQL database.

To Load Matched Data Using SQL*Loader

Complete the steps under Performing the Bulk Match.

From the master IBML Tool home directory, run cluster-truncate.sql against the cluster synchronizer database.

For each IBML Tool, open loader-config.xml (located in the IBML Tool home directory in the conf subdirectory).
1. Define the SQL*Loader property as described in SQL*Loader Configuration.
2. Change the value of the BulkLoad property to true.
3. Save and close the file.

To generate the loader, do one of the following.
- If the master loader is running on Windows:
  1. Navigate to the master IBML Tool home directory and open generate-sql-loader.bat for editing.
  2. Change the value of the JDBC_JAR_PATH variable in the first line to the location and name of the database driver for the master index database platform; for example, set JDBC_JAR_PATH=C:\oracle\jdbc\lib\ojdbc14.jar.
  3. Save and close the file.
  4. Double-click generate-sql-loader.bat or type generate-sql-loader from a command line.
- If the master loader is running on UNIX:
  1. Navigate to the master IBML Tool home directory and open generate-sql-loader.sh for editing.
  2. Change the value of the JDBC_JAR_PATH variable in the first line to the location and name of the database driver for the master index database platform; for example, export JDBC_JAR_PATH=${oracle_home}/jdbc/lib/ojdbc14.jar.
  3. Save and close the file.
  4. Type sh generate-sql-loader.sh at the command line.

On either platform, a new directory named sqlldr is created in the working directory.
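On UNIX, the edit to JDBC_JAR_PATH can also be scripted. The following is a minimal sketch, assuming generate-sql-loader.sh contains a line of the form `export JDBC_JAR_PATH=...` as in the example above. The helper name `set_jdbc_jar_path` and the jar location in the usage comment are hypothetical and are not part of the IBML Tool.

```shell
#!/bin/sh
# Sketch: point JDBC_JAR_PATH in a loader script at a given driver jar.
# Assumes the script contains a line starting with "export JDBC_JAR_PATH=".
# set_jdbc_jar_path is a hypothetical helper, not part of the IBML Tool.
set_jdbc_jar_path() {
    script=$1
    jar=$2
    if [ ! -f "$script" ]; then
        echo "script not found: $script" >&2
        return 1
    fi
    if ! grep -q '^export JDBC_JAR_PATH=' "$script"; then
        echo "no JDBC_JAR_PATH line in $script" >&2
        return 1
    fi
    tmp=$(mktemp) || return 1
    # "|" is the sed delimiter, so the jar path must not contain "|" or "&".
    if sed "s|^export JDBC_JAR_PATH=.*|export JDBC_JAR_PATH=$jar|" "$script" > "$tmp"; then
        cat "$tmp" > "$script"   # overwrite in place, keeping file permissions
        status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    return "$status"
}

# Example usage (the jar path is a placeholder):
#   set_jdbc_jar_path generate-sql-loader.sh /opt/oracle/jdbc/lib/ojdbc14.jar
#   sh generate-sql-loader.sh
```

The helper refuses to touch a file that has no JDBC_JAR_PATH line, so a script in an unexpected format is left unchanged.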

In the master IBML Tool home directory, run cluster-truncate.sql against the master index database to clear the cluster synchronizer tables.

In the sqlldr folder in the working directory, run drop.sql against the master index database to drop constraints and indexes.
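Because SQL*Loader applies only to Oracle, the SQL scripts in this procedure can be run with SQL*Plus. The following is a minimal sketch, assuming `sqlplus` is on the PATH. The helper name `run_sql_script`, the `DB_CONNECT` variable (a `user/password@SID` connect string), and the credentials in the usage comment are hypothetical and are not part of the IBML Tool.

```shell
#!/bin/sh
# Sketch: run one SQL script against the master index database with SQL*Plus.
# DB_CONNECT (user/password@SID) and run_sql_script are illustrative names.
run_sql_script() {
    script=$1
    if [ ! -f "$script" ]; then
        echo "SQL script not found: $script" >&2
        return 2
    fi
    if ! command -v sqlplus >/dev/null 2>&1; then
        echo "sqlplus not found on PATH" >&2
        return 3
    fi
    # -S suppresses banners and prompts; -L makes a single logon attempt.
    # "exit" is piped in so SQL*Plus quits even if the script has no EXIT.
    echo exit | sqlplus -S -L "$DB_CONNECT" "@$script"
}

# Example usage, in the order the procedure gives (values are placeholders):
#   DB_CONNECT='mi_user/mi_password@ORCL'
#   run_sql_script cluster-truncate.sql   # from the master IBML Tool home directory
#   run_sql_script drop.sql               # from the sqlldr directory
```

A connect string passed on the command line can be visible to other users in the process list, so on shared machines you may prefer to run the scripts from an interactive SQL*Plus session instead.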

In the sqlldr directory, do one of the following:
- On Windows, double-click bulk-loader.bat or type bulk-loader.bat from a command line.
- On UNIX, type sh bulk-loader.sh at the command line.

After the data is loaded, close any command prompts that were left open by the process and examine the SQL*Loader log files located in the sqlldr/log directory to ensure there were no errors during processing.

Note –
Any records that contained bad data and were not inserted into the master index database are written to the sqlldr/bad directory. Any records that contained bad data and were discarded are written to the sqlldr/discard directory.
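The check described above can be partly automated on UNIX. The following is a minimal sketch, assuming the log files report problems with the standard `ORA-nnnnn` and `SQL*Loader-nnn` error codes and that any non-empty file in the bad or discard directory holds records that were not loaded. The helper name `check_sqlldr_output` is hypothetical.

```shell
#!/bin/sh
# Sketch: summarize SQL*Loader output under a sqlldr directory.
# Assumes logs use standard "ORA-nnnnn" / "SQL*Loader-nnn" error codes.
# check_sqlldr_output is a hypothetical helper name.
check_sqlldr_output() {
    dir=$1
    problems=0
    # Report log files that mention an Oracle or SQL*Loader error code.
    for f in "$dir"/log/*; do
        [ -f "$f" ] || continue
        if grep -Eq 'ORA-[0-9]+|SQL\*Loader-[0-9]+' "$f"; then
            echo "errors in log: $f"
            problems=$((problems + 1))
        fi
    done
    # Report non-empty files in bad/ and discard/ (records that were not loaded).
    for sub in bad discard; do
        for f in "$dir/$sub"/*; do
            { [ -f "$f" ] && [ -s "$f" ]; } || continue
            echo "unloaded records ($sub): $f"
            problems=$((problems + 1))
        done
    done
    # Succeed only when nothing was reported.
    [ "$problems" -eq 0 ]
}

# Example usage, run from the working directory:
#   check_sqlldr_output sqlldr || echo "review the files listed above"
```

A clean result from this sketch does not replace reading the logs, since it only looks for error codes and leftover record files.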

In the sqlldr directory, run create.sql against the master index database to reinstate the dropped indexes and constraints.

Loading Matched Data Using the Command-Line Bulk Loader

You can use the command-line Bulk Loader to load data into an Oracle, MySQL, or SQL Server database. You do not need to run NetBeans to use the command-line tool, but NetBeans must be installed on the master loader machine.

To Load Matched Data Using the Command-Line Bulk Loader

Complete the steps under Performing the Bulk Match.

In the master IBML Tool home directory, open genCollab.bat (or genCollab.sh for UNIX) and configure the properties described in Command-Line Bulk Loader Properties.

Save and close the file.

In the master IBML Tool home directory, do one of the following:
- On Windows, double-click genCollab.bat or type genCollab.bat from a command line.
- On UNIX, type sh genCollab.sh at the command line.

This generates a zip file in the IBML Tool home directory.

Extract the contents of etl-loader.zip to the current directory.

This generates an ETL collaboration and creates a new directory, ETLloader, in the IBML Tool home directory.

In the master IBML Tool home directory, run cluster-truncate.sql against the master index database to clear the cluster synchronizer tables.

In the ETLloader/config directory, open logger.properties and modify any logging properties if needed.

In the ETLloader directory, do one of the following:
- On Windows, double-click startLoad.bat or type startLoad.bat from a command line.
- On UNIX, type sh startLoad.sh at the command line.

After the data is loaded, check the log files to ensure there were no errors during processing.

Command-Line Bulk Loader Properties

The ETL collaboration is generated by a file that includes configurable properties you need to define. The file is named genCollab.bat for Windows and genCollab.sh for UNIX. It is located in the directory where you extracted the IBML Tool files on the master processor. The following table lists and describes the default properties for the file.

Tip –

If you get a usage error when running the Bulk Loader after configuring the properties below, remove the double-quotes from around the paths and filenames (but not from the delimiters).

Table 7 Command-Line Bulk Loader Properties


Property Name	Description
NetBeans and Java Properties
NB_HOME	The absolute path to the NetBeans home directory on the master processor.
JAVAPATH	The absolute path to the `bin` directory in the Java installation; for example, `C:\Java\jre1.5.0_11\bin`.
DB_DRIVER_PATH	The absolute path to the database driver for the database platform of the master index database.
DB_DRIVER_NAME	The name of the database driver in the path specified above; for example, `ojdbc14.jar`.
Source Data Properties
SOURCE_LOC	The absolute path to the data files to be loaded into the master index database. These are located in the `masterindex` folder in the working directory you created for the Bulk Matcher.
FIELD_DELIMITER	The character that separates the fields in the master data image files. By default, fields are delimited by a pipe character (|).
RECORD_DELIMITER	The characters that separate the records in the master data image files. By default, the records are delimited by three dollar signs ($$$).
Target Database Properties
TARGET_DB_TYPE	The database platform used for the master index database. Specify 1 for Oracle, 2 for MySQL, or 3 for SQL Server.
TARGET_LOC	The name or IP address of the server on which the master index database resides.
TARGET_PORT	The port number on which the master index database is listening. The default port is 1521 for Oracle, 1433 for SQL Server, and 3306 for MySQL.
TARGET_ID	The SID or database name of the master index database.
TARGET_SCHEMA	The name of the database schema that defines the tables, fields, and relationships for the master index database. The default schema for SQL Server databases is “dbo”; for Oracle, the default is the same as the SID name of the database.
TARGET_CATALOG	The name of the database catalog containing the master index database metadata. This property can be left empty.
TARGET_LOGIN	The login ID of the user with administrator abilities for the master index database.
TARGET_PW	The password for the above login ID.
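
Before generating the collaboration on UNIX, it can help to sanity-check the values you plan to use. The following is a minimal sketch that mirrors the property names and default ports in Table 7. It assumes the values are available as shell variables and that the Java executable in JAVAPATH is named `java`. The function names `default_target_port` and `check_loader_props` are hypothetical and are not part of the IBML Tool.

```shell
#!/bin/sh
# Sketch: sanity-check a few Bulk Loader properties before running genCollab.sh.
# Function names are hypothetical; variable names mirror Table 7.

# Print the default port for a TARGET_DB_TYPE value
# (1 = Oracle, 2 = MySQL, 3 = SQL Server).
default_target_port() {
    case $1 in
        1) echo 1521 ;;
        2) echo 3306 ;;
        3) echo 1433 ;;
        *) echo "unknown TARGET_DB_TYPE: $1" >&2; return 1 ;;
    esac
}

# Return 0 only if the path properties point at existing locations
# and TARGET_DB_TYPE is one of the documented values.
check_loader_props() {
    bad=0
    [ -d "$NB_HOME" ] || { echo "NB_HOME is not a directory: $NB_HOME" >&2; bad=1; }
    [ -x "$JAVAPATH/java" ] || { echo "no java executable in JAVAPATH: $JAVAPATH" >&2; bad=1; }
    [ -f "$DB_DRIVER_PATH/$DB_DRIVER_NAME" ] || {
        echo "driver not found: $DB_DRIVER_PATH/$DB_DRIVER_NAME" >&2; bad=1; }
    [ -d "$SOURCE_LOC" ] || { echo "SOURCE_LOC is not a directory: $SOURCE_LOC" >&2; bad=1; }
    default_target_port "$TARGET_DB_TYPE" >/dev/null || bad=1
    return "$bad"
}

# Example usage (after setting the variables to your planned values):
#   check_loader_props && echo "default port: $(default_target_port "$TARGET_DB_TYPE")"
```

The sketch checks only paths and the database type; it does not connect to the database or validate the login properties.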
