Developing Sun Master Indexes (Repository)

Master Index Database Scripts and Design (Repository)

Before you create the master index database, familiarize yourself with the database scripts and the database structure. Analyze your database requirements, including hardware considerations, startup data, indexing needs, performance, and so on.

The following topics provide information to help you in your analysis.

Master Index Database Scripts (Repository)

Based on the information you specified about code lists and external systems, the wizard creates SQL scripts that you can use to define startup data for the master index application. When you generate the application, additional scripts are generated for creating and dropping the database tables. These scripts appear under the Database Script node of the master index project and are named Systems, Code List, Create Application_Name Database, and Drop Application_Name Database (where Application_Name is the name you defined for the application in the wizard). You can modify these scripts as needed to customize the tables, indexes, startup data, and database distribution. You can also create new database scripts if needed.

Master Index Database Requirements (Repository)

When configuring the master index database, there are several factors to consider, including basic software requirements, operating systems, disk space, and so on. This section provides a summary of requirements for the database. For more detailed information about designing and implementing the database, refer to the appropriate Oracle or SQL Server documentation. The person responsible for the database configuration should be an Oracle or SQL Server database administrator familiar with the master index database and with your data processing requirements.

Database Platform Requirements

The master index database can be run on SQL Server 2005 or on Oracle 9i or 10g. You must have this software installed before beginning the database installation. Make sure you also install the latest patches for the version you are using.

Operating System Requirements

The database can be installed on any operating system supported by the database platform you are using. See the Oracle or SQL Server documentation for more information.

Hardware Requirements

This section describes the minimum recommended hardware configuration for a database installation. These requirements are based on the minimums recommended by Oracle and Microsoft for a typical installation. Increase them as needed for the size of your database and the expected transaction volume. See your Oracle or SQL Server documentation for more information and for supported operating systems.

Oracle Database

For a Windows database server, the following configuration is recommended as a minimal installation:

Windows 2000 SP3 or later, Windows XP SP2, or Windows Server 2003
Pentium 266 or later
1 GB RAM (increase this based on the number of users, connections to the database, and volume)
Virtual memory should be double the amount of RAM
3 GB disk space plus an additional 2 KB for each system record to be stored in the database (note that this is a conservative estimate per system record, assuming that most records do not contain complete data). The base figure depends on the Oracle environment you install; Enterprise Edition can take up to 5 GB.
256-color video

For a UNIX database server, the following configuration is recommended as a minimal installation:

256 MB RAM (increase this based on the number of users and connections to the database)
Swap space should be a minimum of twice the amount of RAM
2 GB disk space plus an additional 2 KB for each system record to be stored in the database (note that this is a conservative estimate per system record, assuming that most records do not contain complete data).

Note –
Disk space recommendations do not take into account the volume and processing requirements or the number of users. These are minimal requirements to install a generic database. At a minimum, the empty database and the database software will require 2.5 GB of disk space.

Microsoft SQL Server

The following configuration is recommended as a minimal installation for a SQL Server database.

Pentium III-compatible processor or higher
512 MB RAM as a minimum; at least 1 GB is recommended (increase this based on the number of users, connections to the database, and volume)
3 GB disk space plus an additional 2 KB for each system record to be stored in the database (note that this is a conservative estimate per system record, assuming that most records do not contain complete data). The base figure depends on the SQL Server environment you install.
VGA or higher resolution

Note –
Disk space recommendations do not take into account the volume and processing requirements or the number of users. These are minimal requirements to install a generic database. At a minimum, the empty database and the database software will require 1.6 GB of disk space.
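As a rough worked example of the disk space formula above, assume a hypothetical volume of 1,000,000 system records (an illustrative figure, not a recommendation) and take 1 GB as 1,048,576 KB:

```latex
\[
3\ \text{GB} + 1{,}000{,}000 \times 2\ \text{KB}
  = 3\ \text{GB} + 2{,}000{,}000\ \text{KB}
  \approx 3\ \text{GB} + 1.9\ \text{GB}
  = 4.9\ \text{GB}
\]
```

The same arithmetic applies to the Oracle figures, substituting the base disk space listed for the platform. As the notes state, the result does not account for transaction volume, processing requirements, or the number of users.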

Master Index Database Structure (Repository)

The master index database contains some common tables that are created for all implementations and some that are customized for each implementation. The common tables include standard Oracle or SQL Server tables and supporting tables, such as sbyn_seq_table, sbyn_common_header, and sbyn_common_detail. These tables do not store information about the enterprise object structure you defined. The names of the tables that store information about the enterprise object are customized based on the object structure.

Two tables store information about the primary, or parent, object you defined: sbyn_parent_object and sbyn_parent_objectsbr, where parent_object is the name you specified for the parent object in the object structure. The sbyn_parent_object table stores parent object data from each local system, and the sbyn_parent_objectsbr table stores the parent object data contained in the single best records (SBRs). Similar tables are created for each child object you defined in the object structure.
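As an illustration of this naming pattern, assume a hypothetical object structure with a parent object named Person and a child object named Address. These names are assumptions derived from the rule above, not tables from any particular project. On Oracle, a query such as the following sketch lists the master index tables in the current schema so you can check which ones were generated:

```sql
-- Oracle: list the master index tables owned by the current user.
-- The backslash escape makes the underscore match literally rather
-- than act as a single-character wildcard.
SELECT table_name
  FROM user_tables
 WHERE table_name LIKE 'SBYN\_%' ESCAPE '\'
 ORDER BY table_name;

-- For the hypothetical Person/Address structure, the naming rule implies
-- tables such as SBYN_PERSON, SBYN_PERSONSBR, SBYN_ADDRESS, and
-- SBYN_ADDRESSSBR alongside the common tables.
```

Oracle stores unquoted identifiers in uppercase, which is why the pattern uses SBYN rather than sbyn.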

For a complete description of the database tables, see Understanding Sun Master Index Processing (Repository).

Designing the Master Index Database (Repository)

In designing the database, there are several factors to consider, such as the volume of data stored in the database and the number of transactions processed by the database daily. The master index database should be created in its own tablespaces. The following sections describe some of the analyses to perform along with considerations to take into account when designing the database.

Designing for Performance Optimization

The Oracle and SQL Server installation guides provide detailed information about installing the database software for optimal performance. Both database platforms include guides containing information about monitoring and fine-tuning your database, including tuning memory, swap space, I/O, CPU usage, block and file size, and so on. You should be familiar with these concepts prior to creating the database.

Data Structure Analysis

Before defining the object structure, you analyzed the legacy data to determine the object structure and the attributes of each field. You can use this analysis to estimate the amount of data that will be stored in the database, which helps you size the master index database and decide how best to distribute it. Knowing the volume of existing data plus the expected daily transaction volume helps you plan the requirements of the database server, such as networking needs, disk space, memory, and swap space.

The data structure analysis also helps you determine the processing codes and descriptions to enter in the common tables (described below), and helps you identify default values in certain fields that could skew the matching probability weights.

Common Table Data

Common table data analysis involves gathering information about the abbreviations used for specific data elements in each sending system, such as system codes and codes for certain attributes of the objects in your database. For example, if you are indexing person objects, there might be processing codes for genders, such as F for female, M for male, and so on. The processing codes and their descriptions are stored in a set of database tables known as common maintenance tables. The wizard creates a script to help you load the processing codes into the database.

When an enterprise object appears on the Enterprise Data Manager (EDM), the master index application translates the processing codes defined in the common tables into their descriptions so the user does not need to decipher each code. The data elements stored in the common maintenance tables are also used to populate the drop-down lists that appear for certain fields in the EDM. Users can select from these options to populate the associated fields.

User Code Data

User code data analysis involves gathering information about the abbreviations used for specific data elements in each sending system for a field whose format or possible values are constrained by a separate field. For example, if you store credit card information, you might have a drop-down list in the Credit Card field for each credit card type. The format of the field that stores the credit card number is dependent on the type of credit card you select. You could also use user code data to validate cities with postal codes. The abbreviations and related constraint information are stored in the sbyn_user_code table.

Database Considerations

When you create the master index database, you need to consider several factors, such as sizing, distribution, indexes, and extents. By default, all of the master index database tables for an Oracle database are installed in the system tablespace. You should install the master index tables in different tablespaces, depending on the original size and expected volume of the database. For SQL Server, the master index tables belong to “dbo” by default.
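A minimal sketch of moving master index tables out of the system tablespace on Oracle follows. The tablespace names (midx_data, midx_index), data file paths, and sizes are hypothetical placeholders; choose values that match your own sizing analysis. The sbyn_transaction table and its SBYN_TRANSACTION3 index are used only because they are named elsewhere in this section.

```sql
-- Oracle: create dedicated tablespaces for master index data and indexes.
-- Names, paths, and sizes are illustrative assumptions.
CREATE TABLESPACE midx_data
  DATAFILE '/u01/oradata/midx/midx_data01.dbf' SIZE 500M
  AUTOEXTEND ON NEXT 100M
  EXTENT MANAGEMENT LOCAL;

CREATE TABLESPACE midx_index
  DATAFILE '/u01/oradata/midx/midx_index01.dbf' SIZE 250M
  AUTOEXTEND ON NEXT 50M
  EXTENT MANAGEMENT LOCAL;

-- Relocate an existing table. Moving a table marks its indexes UNUSABLE,
-- so each index on the table must be rebuilt afterward.
ALTER TABLE sbyn_transaction MOVE TABLESPACE midx_data;
ALTER INDEX sbyn_transaction3 REBUILD TABLESPACE midx_index;
```

Alternatively, you can add a TABLESPACE clause to the CREATE TABLE and CREATE INDEX statements in the Create database script before running it, which avoids relocating tables after they are created.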

Database Sizing

To begin the database installation, you first create an Oracle or SQL Server database instance using the provided configuration tools. Use the tools provided by Oracle or Microsoft to define the tablespace and extent sizing for the database.

Database Distribution

When you create the database instance, you can define the distribution of your system tables, data tables, rollback logs, dump files, control files, and so on. Use internal policies regarding relational database distribution to determine how to best distribute your master index database.

Database Indexes

By default, indexes are defined for the following tables: sbyn_appl, sbyn_common_header, sbyn_common_detail, sbyn_enterprise, sbyn_transaction, sbyn_assumedmatch, sbyn_potentialduplicates, sbyn_audit, and sbyn_merge. You can create additional indexes against the database to optimize the searching and matching processes. At a minimum, it is recommended that all combinations of fields used for blocking or matching be indexed. For each query block defined in the blocking query, create an index containing the fields in that block.
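As a sketch, suppose a blocking query contains two query blocks: one on phonetic first and last name, and one on date of birth and gender. The table sbyn_personsbr and the column and index names below are hypothetical; substitute the tables and fields that your own blocking query references.

```sql
-- One composite index per query block (all names are illustrative).

-- Block 1: phonetic first name + phonetic last name
CREATE INDEX sbyn_personsbr_blk1
    ON sbyn_personsbr (fname_phon ASC, lname_phon ASC);

-- Block 2: date of birth + gender
CREATE INDEX sbyn_personsbr_blk2
    ON sbyn_personsbr (dob ASC, gender ASC);
```

Each additional index adds write overhead, so limit new indexes to the fields that your blocking and matching configuration actually uses.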

The following indexes are automatically created to improve performance when running large reports from the command line or EDM.

CREATE INDEX SBYN_POTENTIALDUPLICATES3 ON SBYN_POTENTIALDUPLICATES (TRANSACTIONNUMBER ASC);

CREATE INDEX SBYN_ASSUMEDMATCH2 ON SBYN_ASSUMEDMATCH (TRANSACTIONNUMBER ASC);

CREATE INDEX SBYN_TRANSACTION4 ON SBYN_TRANSACTION (EUID2 ASC, TIMESTAMP ASC);

CREATE INDEX SBYN_TRANSACTION3 ON SBYN_TRANSACTION (TIMESTAMP ASC, TRANSACTIONNUMBER ASC);

Note –
To improve load performance, drop these four indexes before performing an initial load or batch load of data. If you run the provided reports, recreate the indexes once the load is complete.
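The drop step in this note can be sketched as follows, using the four index names listed above. The Oracle form is shown; the SQL Server 2005 form, which also names the table, appears in the comments.

```sql
-- Oracle: drop the report indexes before an initial or batch load.
DROP INDEX SBYN_POTENTIALDUPLICATES3;
DROP INDEX SBYN_ASSUMEDMATCH2;
DROP INDEX SBYN_TRANSACTION4;
DROP INDEX SBYN_TRANSACTION3;

-- SQL Server 2005 equivalent:
--   DROP INDEX SBYN_POTENTIALDUPLICATES3 ON SBYN_POTENTIALDUPLICATES;
--   DROP INDEX SBYN_ASSUMEDMATCH2 ON SBYN_ASSUMEDMATCH;
--   DROP INDEX SBYN_TRANSACTION4 ON SBYN_TRANSACTION;
--   DROP INDEX SBYN_TRANSACTION3 ON SBYN_TRANSACTION;

-- After the load completes, rerun the four CREATE INDEX statements
-- listed above to restore the indexes.
```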