Loading the Initial Data Set for a Sun Master Index

Loading the Matched Data Into the Master Index Database

The IBML Tool provides two methods to load the master data images generated by the Bulk Matcher. A command line tool is provided to generate and then run the extract, transform, and load (ETL) collaborations that load the data. Alternatively, you can use SQL*Loader to load the data if the master index database is running on Oracle.

Perform one of the following procedures to load the matched data into your master index database:

Loading Matched Data Using SQL*Loader


Note –

This procedure includes steps that were updated for Java CAPS Release 6 Update 1. The variable JDBC_JAR_PATH was previously ORACLE_JDBC_JAR, and wasn't present in all files.


If the master index database runs on an Oracle platform, you can use either SQL*Loader or the command-line Bulk Loader to load the matched data into the database. SQL*Loader cannot be used for a SQL Server or MySQL database.

ProcedureTo Load Matched Data Using SQL*Loader

  1. Complete the steps under Performing the Bulk Match.

  2. From the master IBML Tool home directory, run cluster-truncate.sql against the cluster synchronizer database.

  3. For each IBML Tool, open loader-config.xml (located in the IBML Tool home directory in the conf subdirectory).

    1. Define the SQL*Loader property as described in SQL*Loader Configuration.

    2. Change the value of the BulkLoad property to true.

    3. Save and close the file.

  4. To generate the loader, do one of the following.

    • If the master loader is running on Windows:

      1. Navigate to the master IBML Tool home directory and open generate-sql-loader.bat for editing.

      2. Change the value of the JDBC_JAR_PATH variable in the first line to the location and name of the database driver for the master index database platform; for example, set JDBC_JAR_PATH=C:\oracle\jdbc\lib\ojdbc14.jar.

      3. Close and save the file.

      4. Double-click generate-sql-loader.bat or type generate-sql-loader from a command line.

    • If the master loader is running on UNIX:

      1. Navigate to the master IBML Tool home directory and open sh generate-sql-loader.sh for editing.

      2. Change the value of the JDBC_JAR_PATH variable in the first line to the location and name of the database driver for the master index database platform; for example, export JDBC_JAR_PATH=${oracle_home}/jdbc/lib/ojdbc14.jar.

      3. Close and save the file.

      4. Type sh generate-sql-loader.sh at the command line.

      A new directory named sqlldr is created in the working directory.

  5. In the master IBML Tool home directory, run cluster-truncate.sql against the master index database to clear the cluster synchronizer tables.

  6. In the sqlldr folder in the working directory, run drop.sql against the master index database to drop constraints and indexes.

  7. In the sqlldr directory, do one of the following:

    • On Windows, double-click bulk-loader.bat or type bulk-loader.bat from a command line.

    • On UNIX, type sh bulk-loader.sh at the command line.

  8. After the data is loaded, close any command prompts that were left open by the process and examine the SQL*Loader log files located in the sqlldr/log directory to ensure there were no errors during processing.


    Note –

    Any records that contained bad data and were not inserted into the master index database are written to the sqlldr/bad directory. Any records that contained bad data and were discarded are written to the sqlldr/discard directory.


  9. In the sqlldr directory, run create.sql against the master index database to reinstate the dropped indexes and constraints.

Loading Matched Data Using the Command-Line Bulk Loader

You can use the command-line Bulk Loader to load data into an Oracle, MySQL, or SQL Server database. Using the command-line tool does not require the use of NetBeans, but it does require that NetBeans be installed on the master loader machine.

ProcedureTo Load Matched Data Using the Command-Line Bulk Loader

  1. Complete the steps under Performing the Bulk Match.

  2. In the master IBML Tool home directory, open genCollab.bat (or genCollab.sh for UNIX) and configure the properties described in Command–Line Bulk Loader Properties.

  3. Save and close the file.

  4. In the master IBML Tool home directory, do one of the following:

    • On Windows, double-click genCollab.bat or type genCollab.bat from a command line.

    • On UNIX, type sh genCollab.sh at the command line.

      This generates a zip file in the IBML Tool home directory.

  5. Extract the contents of etl-loader.zip to the current directory.

    This generates an ETL collaboration and creates a new directory, ETLloader, in the IBML Tool home directory.

  6. In the master IBML Tool home directory, run cluster-truncate.sql against the master index database to clear the cluster synchronizer tables.

  7. In the ETLloader/config directory, open logger.properties and modify any logging properties if needed.

  8. In the ETLloader directory, do one of the following:

    • On Windows, double-click startLoad.bat or type startLoad.bat from a command line.

    • On UNIX, type sh startLoad.sh at the command line.

  9. After the data is loaded, check the log files to ensure there were no errors during processing.

Command–Line Bulk Loader Properties

The ETL collaboration is generated by a file that includes configurable properties you need to define. The file is named genCollab.bat for Windows and genCollab.sh for UNIX. It is located in the directory where you extracted the IBML Tool files on the master processor. The following table lists and describes the default properties for the file.


Tip –

If you get a usage error when running the Bulk Loader after configuring the properties below, remove the double-quotes from around the paths and filenames (but not from the delimiters).


Table 7 Command–Line Bulk Loader Properties

Property Name 

Description 

NetBeans and Java Properties

NB_HOME 

The absolute path to the NetBeans home directory on the master processor. 

JAVAPATH 

The absolute path to the bin directory in the Java installation; for example, C:\\Java\jre1.5.0_11\bin.

DB_DRIVER_PATH 

The absolute path to the database driver for the database platform of the master index database. 

DB_DRIVER_NAME 

The name of the database driver in the path specified above; for example, ojdbc14.jar.

Source Data Properties

SOURCE_LOC 

The absolute path to the data files to be loaded into the master index database. These are located in the masterindex folder in the working directory you created for the Bulk Matcher.

FIELD_DELIMITER 

The character that separates the fields in the master data image files. By default, fields are delimited by a pipe character (|). 

RECORD_DELIMITER 

The characters that separate the records in the master data image files. By default, the records are delimited by three dollar signs ($$$). 

Target Database Properties

TARGET_DB_TYPE 

The database platform used for the master index database. Specify 1 for Oracle, 2 for MySQL, or 3 for SQL Server.  

TARGET_LOC 

The name or IP address of the server on which the master index database resides. 

TARGET_PORT 

The port number on which the master index database is listening. The default port is 1521 for Oracle, 1433 for SQL Server, and 3306 for MySQL. 

TARGET_ID 

The SID or database name of the master index database. 

TARGET_SCHEMA 

The name of the database schema that defines the tables, fields, and relationships for the master index database. The default schema for SQL Server databases is “dbo”; for Oracle, the default is the same as the SID name of the database. 

TARGET_CATALOG 

The name of the database catalog containing the master index database metadata. This property can be left empty. 

TARGET_LOGIN 

The login ID of the user with administrator abilities for the master index database. 

TARGET_PW 

The password for the above login ID.