Oracle Java CAPS Master Index User's Guide
Master Index Database Scripts and Design
Master Index Database Requirements
Database Platform Requirements
Master Index Database Structure
Designing the Master Index Database
Before you create the master index database, familiarize yourself with the database scripts and the database structure. Analyze your database requirements, including hardware considerations, startup data, indexing needs, performance, and so on.
The following topics provide information to help you in your analysis.
Based on the information you specified about code lists and external systems, the wizard creates SQL scripts that you can use to define startup data for the master index application. When you generate the application, additional scripts are generated for creating and dropping database tables. These scripts appear under the Database Script node of the master index project and are named create.sql, systems.sql, codelist.sql, and drop.sql. You can modify these scripts as needed to customize the tables, indexes, startup data, and database distribution. You can also create new database scripts if needed.
When you configure the master index database, consider several factors, including basic software requirements, operating systems, and disk space. This section summarizes the requirements for the database. For more detailed information about designing and implementing the database, refer to the appropriate database platform documentation. The person responsible for the database configuration should be a database administrator familiar with the master index database and with your data processing requirements.
The master index database can run on MySQL Enterprise Server, SQL Server, or Oracle. For specific version information, see Java CAPS 6.3 Components and Supported External Systems in Planning for Oracle Java CAPS 6.3 Installation. You must have this software installed before beginning the database installation. Make sure you also install the latest patches for the version you are using.
The database can be installed on any operating system supported by the database platform you are using. See the documentation that came with your database server for more information.
This section describes the minimum recommended hardware configuration for a database installation. These requirements are based on the minimum requirements recommended by database vendors for a typical installation. Depending on the size of the database and expected volume, you should increase these recommendations as needed. See the documentation for your database for more information and for supported operating systems.
For information and tips about installing a MySQL database for Oracle Java CAPS Master Index, see Chapter 2 of the MySQL 5.1 Reference Manual.
For a Windows database server, the following configuration is recommended as a minimal installation:
Windows 2000 SP3 or later, Windows XP SP2, or Windows Server 2003
Pentium 266 or later
1 GB RAM (increase this based on the number of users, connections to the database, and volume)
Virtual memory should be double the amount of RAM
3 GB disk space plus an additional 2 KB for each system record to be stored in the database (note that this is a conservative estimate per system record, assuming that most records do not contain complete data). This depends on the Oracle environment you install. Enterprise Edition can take up to 5 GB.
256-color video
For a UNIX database server, the following configuration is recommended as a minimal installation:
256 MB RAM (increase this based on the number of users and connections to the database)
Swap space should be a minimum of twice the amount of RAM
2 GB disk space plus an additional 2 KB for each system record to be stored in the database (note that this is a conservative estimate per system record, assuming that most records do not contain complete data).
Note - Disk space recommendations do not take into account the volume and processing requirements or the number of users. These are minimal requirements to install a generic database. At a minimum, the empty database and the database software will require 2.5 GB of disk space.
The following configuration is recommended as a minimal installation for a SQL Server database.
Pentium III-compatible processor or higher
512 MB RAM as a minimum; at least 1 GB is recommended (increase this based on the number of users, connections to the database, and volume)
3 GB disk space plus an additional 2 KB for each system record to be stored in the database (note that this is a conservative estimate per system record, assuming that most records do not contain complete data). This depends on the SQL Server environment you install.
VGA or higher resolution
Note - Disk space recommendations do not take into account the volume and processing requirements or the number of users. These are minimal requirements to install a generic database. At a minimum, the empty database and the database software will require 1.6 GB of disk space.
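As a rough worked illustration of the disk space guidance in this section (a base allowance plus 2 KB per system record), the estimate can be written as follows. The record count of 5,000,000 is a hypothetical figure chosen only for illustration, the 3 GB base is the figure quoted above for the Windows and SQL Server configurations, and the conversion assumes 1 GB = 1,048,576 KB.

```latex
D_{\text{est}} = D_{\text{base}} + N \times 2\,\text{KB}

D_{\text{est}} \approx 3\,\text{GB} + \frac{5{,}000{,}000 \times 2\,\text{KB}}{1{,}048{,}576\,\text{KB/GB}}
             \approx 3\,\text{GB} + 9.5\,\text{GB}
             \approx 12.5\,\text{GB}
```

Treat the result as a floor rather than a target. As the notes above point out, these figures do not account for transaction volume, processing requirements, or the number of users.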
The master index database contains some common tables that are created for all implementations and some that are customized for each implementation. The common tables include standard database system tables and supporting tables, such as sbyn_seq_table, sbyn_common_header, and sbyn_common_detail. These tables do not store information about the enterprise object structure you defined. The names of the tables that store information about the enterprise object are customized based on the object structure.
Two tables store information about the primary, or parent, object you defined: sbyn_parent_object and sbyn_parent_objectsbr, where parent_object is the name you specified for the parent object in the object structure. The sbyn_parent_object table stores parent object data from each local system and the sbyn_parent_objectsbr table stores the parent object data contained in the SBRs. Similar tables are created for each child object you defined in the object structure.
For a complete description of the database tables, see Oracle Java CAPS Master Index Processing Reference.
In designing the database, there are several factors to consider, such as the volume of data stored in the database and the number of transactions processed by the database daily. The master index database should be created in its own tablespaces. The following sections describe some of the analyses to perform along with considerations to take into account when designing the database.
The MySQL, Oracle, and SQL Server installation guides provide detailed information about installing the database software for optimal performance. All three database platforms include guides containing information about monitoring and fine-tuning your database, including tuning memory, swap space, I/O, CPU usage, block and file size, and so on. You should be familiar with these concepts prior to creating the database.
Before defining the object structure, you analyzed the structure of the legacy data to help you define the object structure and the attributes of each field. You can use this data analysis to determine the amount of data that will be stored in the database, which will help you size the master index database and decide how to best distribute the database. Knowing the volume of existing data plus the expected daily transaction volume will help you plan the requirements of the database server, such as networking needs, disk space, memory, swap space, and so on.
The data structure analysis also helps you determine the processing codes and descriptions to enter in the common tables (described below), and should help you determine any default values that have been entered into certain fields that could skew the matching probability weights.
Common table data analysis involves gathering information about the abbreviations used for specific data elements in each sending system, such as system codes and codes for certain attributes of the objects in your database. For example, if you are indexing person objects, there might be processing codes for genders, such as F for female, M for male, and so on. The processing codes and their descriptions are stored in a set of database tables known as common maintenance tables. The wizard creates a script to help you load the processing codes into the database.
When an enterprise object appears on the MIDM, the master index application translates the processing codes defined in the common tables into their descriptions so the user is not required to decipher each code. The data elements stored in the common maintenance tables are also used to populate the drop-down lists that appear for certain fields in the MIDM. Users can select from these options to populate the associated fields.
User code data analysis involves gathering information about the abbreviations used for specific data elements in each sending system for a field whose format or possible values are constrained by a separate field. For example, if you store credit card information, you might have a drop-down list in the Credit Card field for each credit card type. The format of the field that stores the credit card number is dependent on the type of credit card you select. You could also use user code data to validate cities with postal codes. The abbreviations and related constraint information are stored in the sbyn_user_code table.
When you create the master index database, you need to consider several factors, such as sizing, distribution, indexes, and extents. By default, all of the master index database tables for an Oracle database are installed in the system tablespace. You should install the master index tables in different tablespaces, depending on the original size and expected volume of the database. For SQL Server, the master index tables belong to “dbo” by default.
To begin the database installation, you first create a database instance using the provided configuration tools or command line functions. Use the tools provided by the database vendor to define the tablespace and extent sizing for the database.
When you create the database instance, you can define the distribution of your system tables, data tables, rollback logs, dump files, control files, and so on. Use internal policies regarding relational database distribution to determine how to best distribute your master index database.
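For an Oracle database, moving the master index tables out of the system tablespace might look like the following minimal sketch. The tablespace names (mi_data, mi_index), the user name (mi_user), the file paths, and the sizes are all hypothetical placeholders, and splitting data and indexes into separate tablespaces is only one common layout, not a requirement stated by this guide. The sketch assumes the database user already exists.

```sql
-- Oracle syntax. All names, paths, and sizes are hypothetical placeholders;
-- derive real values from your own sizing and distribution analysis.
CREATE TABLESPACE mi_data
  DATAFILE '/u01/oradata/mi/mi_data01.dbf' SIZE 2G
  AUTOEXTEND ON NEXT 256M MAXSIZE UNLIMITED;

CREATE TABLESPACE mi_index
  DATAFILE '/u02/oradata/mi/mi_index01.dbf' SIZE 1G
  AUTOEXTEND ON NEXT 128M MAXSIZE UNLIMITED;

-- Make mi_data the default for the (assumed existing) master index user
-- so that tables created by that user no longer land in SYSTEM.
ALTER USER mi_user
  DEFAULT TABLESPACE mi_data
  QUOTA UNLIMITED ON mi_data
  QUOTA UNLIMITED ON mi_index;
```

MySQL and SQL Server organize storage differently, so consult the platform documentation for the equivalent settings.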
By default, indexes are defined for the following tables: sbyn_appl, sbyn_common_header, sbyn_common_detail, sbyn_enterprise, sbyn_transaction, sbyn_assumedmatch, sbyn_potentialduplicates, sbyn_audit, and sbyn_merge. You can create additional indexes against the database to optimize the searching and matching processes. At a minimum, it is recommended that all combinations of fields used for blocking or matching be indexed. For each query block defined in the blocking query, create an index containing the fields in that block.
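As a sketch of the one-index-per-query-block recommendation, suppose the parent object is named Person and the blocking query defines two blocks: one on the phonetic first and last names and one on a social security number. The table name follows the sbyn_parent_objectsbr pattern described earlier, but the column names and index names below are hypothetical. Substitute the fields from your own blocking query and confirm which tables that query actually reads.

```sql
-- Hypothetical block 1: phonetic first name + phonetic last name.
CREATE INDEX SBYN_PERSONSBR_BLK1 ON SBYN_PERSONSBR (FIRSTNAME_PHON ASC, LASTNAME_PHON ASC);

-- Hypothetical block 2: social security number.
CREATE INDEX SBYN_PERSONSBR_BLK2 ON SBYN_PERSONSBR (SSN ASC);
```

Each index contains exactly the fields of one block, so a query on that block can be resolved from a single index.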
The following indexes are automatically created to improve performance when running large reports from the command line or the MIDM.
CREATE INDEX SBYN_POTENTIALDUPLICATES3 ON SBYN_POTENTIALDUPLICATES (TRANSACTIONNUMBER ASC);
CREATE INDEX SBYN_ASSUMEDMATCH2 ON SBYN_ASSUMEDMATCH (TRANSACTIONNUMBER ASC);
CREATE INDEX SBYN_TRANSACTION4 ON SBYN_TRANSACTION (EUID2 ASC, TIMESTAMP ASC);
CREATE INDEX SBYN_TRANSACTION3 ON SBYN_TRANSACTION (TIMESTAMP ASC, TRANSACTIONNUMBER ASC);
Note - To improve performance, these four indexes should be dropped prior to performing an initial load or batch load of data. They can be recreated once the load is complete if you are running the provided reports.
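The drop-and-recreate sequence described in the note could be scripted roughly as follows. The DROP statements use Oracle syntax. MySQL and SQL Server expect the table name as well, as shown in the comment. This is a sketch, not a script shipped with the product.

```sql
-- Before the initial load or batch load (Oracle syntax).
DROP INDEX SBYN_POTENTIALDUPLICATES3;
DROP INDEX SBYN_ASSUMEDMATCH2;
DROP INDEX SBYN_TRANSACTION4;
DROP INDEX SBYN_TRANSACTION3;
-- MySQL / SQL Server form, for example:
--   DROP INDEX SBYN_TRANSACTION3 ON SBYN_TRANSACTION;

-- After the load completes, recreate the indexes if you run the provided reports.
CREATE INDEX SBYN_POTENTIALDUPLICATES3 ON SBYN_POTENTIALDUPLICATES (TRANSACTIONNUMBER ASC);
CREATE INDEX SBYN_ASSUMEDMATCH2 ON SBYN_ASSUMEDMATCH (TRANSACTIONNUMBER ASC);
CREATE INDEX SBYN_TRANSACTION4 ON SBYN_TRANSACTION (EUID2 ASC, TIMESTAMP ASC);
CREATE INDEX SBYN_TRANSACTION3 ON SBYN_TRANSACTION (TIMESTAMP ASC, TRANSACTIONNUMBER ASC);
```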