Understanding Sun Master Index Configuration Options (Repository)

Configuration Overview for Sun Master Index (Repository)

The files that configure the components of the master index application are created by the wizard and define characteristics of the application, such as how data is processed, queried, and matched, and how it appears on the Enterprise Data Manager (EDM). These files configure the runtime components of the master index application.

The following topics provide an overview of the configurable components of a master index application and of the configuration files that define processing properties and the data structure of the master index application. They also describe the relationships between these files.

About the Configuration Files for Sun Master Index (Repository)

Several XML configuration files define primary characteristics of the master index application, such as how data is processed, queried, and matched. These files configure runtime components of the master index application.

The configuration files include the following:

Master Index Object Definition File

In the wizard, you define the objects and fields contained in the object structure, along with properties for those fields. The information you specify is written to the Object Definition file in the master index project. This file defines the objects stored in the master index application and their relationships to one another. It also defines the fields contained in each object, as well as certain properties of each field, such as length, data type, whether it is required, and whether it is a unique key. This file contains one parent object; all other objects must be children of that parent object. The object structure you define in the Object Definition file determines the structure of the database tables that store object data, the structure of the Java API, and the structure of the OTD (Object Type Definition) generated for the project.

Master Index Candidate Select File

The Query Builder component of the master index application is configured in the Candidate Select file, which defines the available queries. In this file, you define the types of queries that can be performed from the EDM and the queries that are used during the match process. You can define both phonetic and alphanumeric searches for the EDM; by default, these are called basic queries. You can also define blocking queries, which define blocks of criteria fields for the match process. The master index application queries the database once for each block, in turn: after completing a pass using the criteria defined in one block, it performs another pass using the next block. Blocking queries can also be used in place of the basic phonetic query in the EDM.
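The multi-pass blocking behavior can be pictured with the following minimal Java sketch. It is not the product's Query Builder API: the class `BlockingQuerySketch`, the record `Rec`, and the phonetic, date-of-birth, and SSN criteria are hypothetical, and each block is modeled as a predicate evaluated against an in-memory list instead of a database query. The sketch also assumes that the results of all passes are combined into a single pool of match candidates.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical sketch of multi-pass blocking; not the Sun Master Index API.
final class BlockingQuerySketch {
    // Simplified stored record; field names are illustrative only.
    record Rec(String euid, String lastNamePhonetic, String dob, String ssn) {}

    // Runs one pass per block, in order, and keeps every EUID found
    // (assumption: results of all passes are combined into one candidate pool).
    static Set<String> candidates(List<Rec> db, List<Predicate<Rec>> blocks) {
        Set<String> pool = new LinkedHashSet<>();
        for (Predicate<Rec> block : blocks) {
            for (Rec r : db) {
                if (block.test(r)) {
                    pool.add(r.euid());
                }
            }
        }
        return pool;
    }

    public static void main(String[] args) {
        Rec input = new Rec(null, "SM0", "1970-01-01", "123-45-6789");
        List<Rec> db = List.of(
                new Rec("E1", "SM0", "1970-01-01", "999-99-9999"),
                new Rec("E2", "JN0", "1980-05-05", "123-45-6789"),
                new Rec("E3", "JN0", "1990-09-09", "000-00-0000"));
        List<Predicate<Rec>> blocks = List.of(
                // Block 1: phonetic last name plus date of birth
                r -> r.lastNamePhonetic().equals(input.lastNamePhonetic()) && r.dob().equals(input.dob()),
                // Block 2: SSN alone
                r -> r.ssn().equals(input.ssn()));
        System.out.println(candidates(db, blocks)); // prints [E1, E2]
    }
}
```

Here E1 is found by the first block and E2 only by the second, which is why later passes can widen the candidate pool beyond what a single query would return.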

Master Index Match Field File

In the Match Field file, you configure the Matching Service by specifying the fields to be standardized and the fields to be used for matching, and by defining how those fields are standardized and matched. The file also specifies the match and standardization engines to use and the query process for matching. Standardization includes defining fields to be reformatted (or parsed), normalized, or converted to their phonetic versions. For matching, you must also define the data string to be passed to the match engine. The rules you define for standardization and matching depend on the match and standardization engines in use. Understanding the Sun Match Engine describes the rules for the Sun Match Engine.

The Threshold file, described below, also configures the match process by defining match parameters that set the weight thresholds and determine how assumed matches and potential duplicates are processed. It also specifies the query to use for matching.

Master Index Threshold File

The Threshold file configures the Manager Service and defines properties of the match process. You specify the match and duplicate thresholds in this file, and define certain system parameters, such as the update mode, how to process records above the match threshold, how to manage same system matches, and whether merged records can be updated. This file also specifies which of the queries defined in the Query Builder to use for matching queries.
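To make the two thresholds concrete, the following hypothetical Java sketch classifies a match weight against them. The class and enum names are invented for illustration, the threshold values in `main` are arbitrary rather than product defaults, and treating a weight exactly equal to a threshold as meeting it is an assumption. The real Manager Service also applies the other Threshold file parameters, such as same-system match handling, which this sketch ignores.

```java
// Hypothetical sketch; names and boundary handling are assumptions.
final class ThresholdSketch {
    enum Outcome { ASSUMED_MATCH, POTENTIAL_DUPLICATE, NO_MATCH }

    // Classifies a match weight against the match and duplicate thresholds.
    static Outcome classify(double weight, double matchThreshold, double duplicateThreshold) {
        if (duplicateThreshold > matchThreshold) {
            throw new IllegalArgumentException("duplicate threshold must not exceed match threshold");
        }
        if (weight >= matchThreshold) {
            return Outcome.ASSUMED_MATCH;       // at or above the match threshold
        }
        if (weight >= duplicateThreshold) {
            return Outcome.POTENTIAL_DUPLICATE; // between the two thresholds
        }
        return Outcome.NO_MATCH;                // below both thresholds
    }

    public static void main(String[] args) {
        double match = 30.0, duplicate = 10.0; // illustrative values only
        for (double w : new double[] {35.2, 18.7, 4.1}) {
            System.out.println(w + " -> " + classify(w, match, duplicate));
        }
    }
}
```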

The Threshold file also configures the EUIDs assigned by the master index application. You can specify an EUID length, whether a checksum value is used for additional verification, and a “chunk size”. Specifying a chunk size allows the EUID generator to obtain a block of EUIDs from the sbyn_seq_table database table so it does not need to query the table each time it generates a new EUID.
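The benefit of a chunk size can be seen in the following minimal Java sketch, which assumes a simplified model: an in-memory `SeqTable` class stands in for the sbyn_seq_table table, and each call to its hypothetical `reserve` method represents one database query. The zero-padded format and the digit-sum check digit are illustrative assumptions, not the checksum algorithm the product uses, and appending the check digit after the configured length is also an assumption.

```java
// Hypothetical sketch of chunked EUID allocation; not the product's generator.
final class EuidSketch {
    // In-memory stand-in for sbyn_seq_table; each reserve() models one query.
    static final class SeqTable {
        private long next = 1;
        int queries = 0;

        synchronized long reserve(int chunk) {
            queries++;
            long start = next;
            next += chunk;
            return start;
        }
    }

    static final class Generator {
        private final SeqTable table;
        private final int length;
        private final int chunk;
        private final boolean checksum;
        private long current = 0; // next unused value in the cached block
        private long end = 0;     // exclusive upper bound of the cached block

        Generator(SeqTable table, int length, int chunk, boolean checksum) {
            this.table = table;
            this.length = length;
            this.chunk = chunk;
            this.checksum = checksum;
        }

        synchronized String next() {
            if (current == end) { // cached block exhausted: query the table once
                current = table.reserve(chunk);
                end = current + chunk;
            }
            String base = String.format("%0" + length + "d", current++);
            return checksum ? base + checkDigit(base) : base;
        }

        // Illustrative check digit (sum of digits mod 10), an assumption.
        static int checkDigit(String digits) {
            int sum = 0;
            for (char c : digits.toCharArray()) {
                sum += c - '0';
            }
            return sum % 10;
        }
    }

    public static void main(String[] args) {
        SeqTable table = new SeqTable();
        Generator gen = new Generator(table, 10, 1000, false);
        for (int i = 0; i < 2500; i++) {
            gen.next();
        }
        System.out.println("table queries: " + table.queries); // prints table queries: 3
    }
}
```

With a chunk size of 1000, generating 2500 EUIDs touches the table only three times instead of 2500 times; the trade-off is that values in a reserved but unused block may be skipped if the application restarts.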

Master Index Best Record File

In the Best Record file, you can define formulas that determine which data in an enterprise record should be considered the most reliable and how updates to the single best record (SBR) will be handled. The survivor calculator uses these formulas to decide what data from each system record to include in each object’s SBR. The SBR is the portion of the enterprise record that represents the data that is considered to be the most accurate and current for an object.

The SBR is defined by a mapping of fields from external system records. Since there might be many external systems, you can optionally specify a strategy to select the value for an SBR field from the list of external values. You can also specify any additional fields that might be required by the selection strategy to determine which external system contains the best data, such as the object’s update date and time.
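As an illustration of a selection strategy, the following hypothetical Java sketch picks an SBR field value from the values supplied by each external system, using the object's update date and time as the additional field the strategy needs. The names `SurvivorSketch` and `SystemValue` are invented, and "the most recently updated non-blank value wins" is only one possible strategy, not necessarily the one the product applies by default.

```java
import java.time.LocalDateTime;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical "most recently updated wins" strategy; not the product's survivor calculator.
final class SurvivorSketch {
    // One external system's value for a single SBR field, with its update timestamp.
    record SystemValue(String system, String value, LocalDateTime updated) {}

    // Ignores missing or blank values, then keeps the value with the latest update time.
    static Optional<String> select(List<SystemValue> values) {
        return values.stream()
                .filter(v -> v.value() != null && !v.value().isBlank())
                .max(Comparator.comparing(SystemValue::updated))
                .map(SystemValue::value);
    }

    public static void main(String[] args) {
        LocalDateTime t0 = LocalDateTime.of(2020, 1, 1, 9, 0);
        List<SystemValue> address = List.of(
                new SystemValue("SiteA", "12 Main St", t0),
                new SystemValue("SiteB", "48 Oak Ave", t0.plusDays(3)),
                new SystemValue("SiteC", "", t0.plusDays(5)));
        System.out.println(select(address).orElse("<none>")); // prints 48 Oak Ave
    }
}
```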

This file also allows you to specify custom update procedures, which you define in Java classes that plug in to the application. These classes define special processing to perform against a record when the record is created, updated, merged, or unmerged. They must be created in the Custom Plug-ins module and can be specified for each transaction type in the Best Record file.

Master Index Field Validation File

By default, the Field Validation file (validation.xml) defines certain validations for the local identifiers assigned by each external system. You can create custom Java classes that define rules for validating field values before they are saved to the master index database. You can then specify the Java classes in the Field Validation file to make them part of the Sun Master Index application.
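The following minimal Java sketch shows the general shape of such a custom validation class. The `FieldValidator` interface, `ValidationException`, and the per-system regular expressions are hypothetical stand-ins; the actual interface a master index validation class must implement, and the way it is registered in validation.xml, are defined by the product and may differ.

```java
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical validation plug-in shape; the real product interface may differ.
final class ValidationSketch {
    static final class ValidationException extends Exception {
        ValidationException(String message) {
            super(message);
        }
    }

    // Assumed contract: throw an exception to reject a value before it is saved.
    interface FieldValidator {
        void validate(String systemCode, String localId) throws ValidationException;
    }

    // Example rule: each external system assigns local IDs in its own format.
    static final class LocalIdFormatValidator implements FieldValidator {
        private final Map<String, Pattern> formats;

        LocalIdFormatValidator(Map<String, Pattern> formats) {
            this.formats = formats;
        }

        @Override
        public void validate(String systemCode, String localId) throws ValidationException {
            Pattern format = formats.get(systemCode);
            if (format == null) {
                throw new ValidationException("Unknown system: " + systemCode);
            }
            if (localId == null || !format.matcher(localId).matches()) {
                throw new ValidationException("Invalid local ID for " + systemCode + ": " + localId);
            }
        }
    }

    public static void main(String[] args) {
        FieldValidator v = new LocalIdFormatValidator(
                Map.of("SITEA", Pattern.compile("\\d{3}-\\d{3}-\\d{3}")));
        try {
            v.validate("SITEA", "123-456-789"); // accepted
            v.validate("SITEA", "12345");       // rejected
        } catch (ValidationException e) {
            System.out.println(e.getMessage());
        }
    }
}
```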

Master Index Security File

This file is not currently used; it is a placeholder for future versions.

Master Index Enterprise Data Manager File

Configuration of the appearance and certain processing properties of the EDM is contained in the Enterprise Data Manager file. In this file, you define each object and field that appears on the EDM, along with the properties of each field, such as the field type and length, field labels, format masks, and so on. You can also define the order in which objects and fields appear on the EDM pages.

This file defines several additional properties of the EDM, including the types of searches available, whether wildcard characters can be used, the criteria for the searches, and the results fields that appear. You can also specify whether an audit log records each instance in which data is accessed through the EDM. For healthcare-based master index applications, such as Sun Master Patient Index (an application built on the Sun Master Index platform), this audit log supports the privacy rules mandated by HIPAA. This file also includes the configuration of the reports generated from the EDM.

Finally, the Enterprise Data Manager file defines certain implementation information, such as the application server in use, debugging rules, and security activation.

Match and Standardization Engine Configuration Files

Several match and standardization engine configuration files are included in the project tree. You can customize matching logic and standardization information for the match and standardization engines by modifying these files. The match configuration file, which defines and configures the comparator functions, can be modified using the Configuration Editor - Repository or the NetBeans text editor. The standardization files, which provide information to the standardization engine about how data should be parsed and normalized, can be modified using the text editor.

For information about the structure of these files and how they can be modified, see Understanding the Sun Match Engine.

Using the Editors for Sun Master Index (Repository)

You can use the NetBeans XML editor or the Configuration Editor - Repository to modify the configuration files created by the wizard. The Configuration Editor provides a series of windows to help guide you through the configuration of master index application components. The NetBeans XML editor allows you to modify the XML code directly.

The following topics provide additional information about the editors:

XML Editors

If you are familiar with XML, you can configure the master index application by modifying the XML code directly. Use caution when modifying the XML files because there are dependencies between files. For example, all fields listed in any of the configuration files must also be defined in the Object Definition file, and any queries referenced in the Enterprise Data Manager file must also be defined in the Candidate Select file.
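Because of these cross-file dependencies, it can help to check references after editing the XML by hand. The following hypothetical Java sketch assumes the field names have already been extracted from each file (XML parsing is omitted) and reports, per file, any referenced field that the Object Definition file does not define. The class name, the qualified field-name format, and the file labels are illustrative assumptions; the same check applies to query names against the Candidate Select file.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical cross-file reference check; XML parsing is deliberately omitted.
final class ConfigConsistencySketch {
    // Returns each file's referenced names that are missing from the defined set.
    static Map<String, Set<String>> undefinedReferences(Set<String> defined,
                                                        Map<String, Set<String>> refsByFile) {
        Map<String, Set<String>> missing = new TreeMap<>();
        refsByFile.forEach((file, refs) -> {
            Set<String> undefined = new TreeSet<>(refs);
            undefined.removeAll(defined);
            if (!undefined.isEmpty()) {
                missing.put(file, undefined);
            }
        });
        return missing;
    }

    public static void main(String[] args) {
        Set<String> defined = Set.of("Person.FirstName", "Person.LastName", "Person.DOB");
        Map<String, Set<String>> refs = Map.of(
                "Match Field file", Set.of("Person.FirstName", "Person.SSN"),
                "Enterprise Data Manager file", Set.of("Person.LastName"));
        System.out.println(undefinedReferences(defined, refs)); // prints {Match Field file=[Person.SSN]}
    }
}
```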

Configuration Editor - Repository

The Configuration Editor - Repository allows you to modify most, but not all, configuration elements for a master index application using a graphical user interface. You can also use the editor to modify the match configuration file for the Sun Match Engine, but not the standardization configuration files. Elements that the Configuration Editor does not support must be modified using the NetBeans XML editor. The following summary describes which features can be configured using the Configuration Editor and which must be modified using the XML editor.

Object Definition File

You can modify most elements of the Object Definition file using the Configuration Editor. The database type and date format can only be modified using the XML editor.

Changing the database type is not recommended. If you do modify the database type or date format elements, regenerate the application to create the updated database scripts. Regenerating does not recreate the Systems or Code Lists scripts; you must update those manually.

Candidate Select File

You can modify all elements in the Candidate Select file using the Configuration Editor. However, if you create a query to use in the Enterprise Data Manager (EDM) or as the matching query, you must manually add it to the appropriate file (the Enterprise Data Manager file or the Threshold file, respectively).

Threshold File

You can modify the duplicate and match thresholds using the Configuration Editor, but most other elements in the Threshold file cannot be modified this way and must be changed using the XML editor.

Match Field File

You can use the Configuration Editor to modify all commonly modified elements in the Match Field file, including defining standardization structures, normalization structures, and phonetic encoding. If you create custom classes to implement a block picker, pass controller, match engine, or standardization engine, you need to specify the implementation classes in this file using the XML editor.

Best Record File

The Configuration Editor does not modify the Best Record file. If you make any changes to the object structure, review this file to verify that all fields and objects are included in the survivor strategy and that the field and object names are correct.

Field Validation File

The Configuration Editor does not modify the Field Validation file. If you create a custom field validation class, you need to specify the implementation class in this file using the XML editor.

Enterprise Data Manager File

Only some elements of the Enterprise Data Manager file can be modified using the Configuration Editor. You can add and delete fields that appear on the EDM and modify their display names and value and input masks. All other field properties can only be modified using the XML editor.

Field integrity is maintained when you delete a field using the Configuration Editor. The field is automatically deleted from the EDM object structure and from any EDM page definitions that include the field, such as a search page or report.

Match Configuration File

You can modify all components of the Match Configuration file using the Configuration Editor, including adding and removing comparators. The Configuration Editor does not validate the extra parameters that can be used for certain comparators, so you should verify your changes by reviewing the match configuration file manually.