The topics listed here provide information about the configuration files for the master index application. They also describe what the configuration options mean and how they affect master index processing.
Note that Java CAPS includes two versions of Sun Master Index. Sun Master Index (Repository) is installed in the Java CAPS repository and provides all the functionality of previous versions in the new Java CAPS environment. Sun Master Index is a service-enabled version of the master index that is installed directly into NetBeans. It includes all of the features of Sun Master Index (Repository) plus several new features, like data analysis, data cleansing, data loading, and an improved Data Manager GUI. Both products are components of the Sun Master Data Management (MDM) Suite. This document relates to Sun Master Index (Repository) only.
The above topics are reference only. For instructions on configuring a master index application, see Configuring Sun Master Indexes (Repository).
Several topics provide information and instructions for implementing and using a Repository-based master index application. For a complete list of topics related to working with Sun Master Index (Repository), see Related Topics in Developing Sun Master Indexes (Repository).
Sun Master Index provides a flexible framework that allows you to create matching and indexing applications called enterprise-wide master index applications. It is an application building tool to help you design, configure, and create a master index application that will uniquely identify and cross-reference the business objects stored in your system databases. Business objects can be any type of entity for which you store information, such as customers, patients, vendors, businesses, inventory, and so on.
The following topics provide additional information about Sun Master Index:
In Sun Master Index, you define the data structure of the business objects to be stored and cross-referenced. In addition, you define the logic that determines how data is updated, standardized, weighted, and matched in the master index database. The structure and logic you define are located in a group of XML configuration files that you create using the wizard. These files are created within the context of a NetBeans project, and can be further customized using either the Configuration Editor - Repository or the NetBeans XML editor. This document describes the structure of the XML files and how each configuration option affects the master index application.
Sun Master Index provides features and functions that allow you to create and configure a master index application for any type of data. The primary function of Sun Master Index is to automate the creation of a highly configurable master index application. A wizard guides you through the initial setup steps, and the Configuration Editor - Repository allows you to further customize the configuration of the master index application. The components you need to implement a master index application are automatically generated.
Sun Master Index provides the following features:
Rapid Development - Rapid and intuitive development of a master index application using a wizard to create the master index configuration and using XML documents to configure the attributes of the index. Templates are provided for quick development of person and company object structures.
Automated Component Generation - Sun Master Index automatically creates the configuration files that define the primary attributes of the master index application, including the configuration of the Enterprise Data Manager (EDM). Sun Master Index also generates scripts that create the appropriate database schemas and an Object Type Definition (OTD) based on the object definition you create and configure.
Configurable Survivor Calculator - Sun Master Index provides predefined strategies for determining which field values to populate in the single best record (SBR). You can define different survivor rules for each field, and you can create a custom survivor strategy to implement in the master index application.
Flexible Architecture - Sun Master Index provides a flexible platform that allows you to create a master index application for any business object. You can customize the object structure so the master index application can match and store any type of data, allowing you to design an application that specifically meets your data processing needs.
Configurable Matching Algorithm - Sun Master Index provides standard support for the Sun Match Engine. In addition, you can plug in a custom matching algorithm to the master index application.
Custom Java API - Sun Master Index generates a Java API that is customized to the object structure you define. You can call the methods in this API in the Collaborations that define the transformation rules for data processed by the master index application.
Standard Reports - Sun Master Index provides a set of standard reports with each master index application that can be run from a command line or from the EDM. The reports help you monitor the state of the data stored in the master index application and help you identify configuration changes that might be required.
The files that configure the components of the master index application are created by the wizard and define characteristics of the application, such as how data is processed, queried, and matched, and how it appears on the Enterprise Data Manager (EDM). These files configure the runtime components of the master index application.
The following topics provide an overview of the configurable components of a master index application and of the configuration files that define processing properties and the data structure of the master index application. They also describe the relationships between these files.
Several XML configuration files define primary characteristics of the master index application, such as how data is processed, queried, and matched. These files configure runtime components of the master index application.
The configuration files include the following:
In the wizard, you define the objects and fields contained in the object structure, along with properties for those fields. The information you specify is written to the Object Definition file in the master index project. This file defines the objects stored in the master index application and their relationships to one another. It also defines the fields contained in each object, as well as certain properties of each field, such as length, data type, whether it is required, whether it is a unique key, and so on. This file contains one parent object; all other objects must be child objects to that parent object. The object structure you define in the Object Definition file determines the structure of the database tables that store object data, the structure of the Java API, and the structure of the OTD generated for the project.
The Query Builder component of the master index application is configured in the Candidate Select file, which defines the available queries. In this file, you define the types of queries that can be performed from the EDM and the queries that are used during the match process. You can define both phonetic and alphanumeric searches for the EDM. By default, these are called basic queries. You can also define blocking queries, which define blocks of criteria fields for the match process. The master index application queries the database using the criteria defined in each block, one at a time. After completing a query on the criteria defined in one block, it performs another pass using the next block of defined criteria. Blocking queries can also be used in place of the basic phonetic query in the EDM.
In the Match Field file, you configure the Matching Service by specifying the fields to be standardized and the fields to be used for matching, as well as defining how the fields are standardized and matched. It also specifies the match and standardization engines to use and the query process for matching. Standardization includes defining fields to be reformatted (or parsed), normalized, or converted to their phonetic version. For matching, you must also define the data string to be passed to the match engine. The rules you define for standardization and matching are dependent on the match and standardization engines in use. Understanding the Sun Match Engine describes the rules for the Sun Match Engine.
The Threshold file, described below, also configures the match process by defining match parameters that set the weight thresholds and determine how assumed matches and potential duplicates are processed. It also specifies the query to use for matching.
The Threshold file configures the Manager Service and defines properties of the match process. You specify the match and duplicate thresholds in this file, and define certain system parameters, such as the update mode, how to process records above the match threshold, how to manage same system matches, and whether merged records can be updated. This file also specifies which of the queries defined in the Query Builder to use for matching queries.
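As a rough illustration of how the two thresholds partition matching weights, the following Python sketch classifies a single candidate's weight. The function name, the outcome labels, and the treatment of a weight that exactly equals a threshold are assumptions made for illustration; they are not identifiers or rules taken from the Threshold file, and the real handling of records above the match threshold also depends on the other parameters described above.

```python
def classify_weight(weight, duplicate_threshold, match_threshold):
    """Classify a matching weight against the two configured thresholds.

    Assumption: a weight equal to a threshold counts as reaching it.
    """
    if duplicate_threshold > match_threshold:
        raise ValueError("duplicate threshold must not exceed match threshold")
    if weight >= match_threshold:
        return "assumed match"
    if weight >= duplicate_threshold:
        return "potential duplicate"
    return "no match"


# Hypothetical threshold values, chosen only for the example.
outcomes = [classify_weight(w, 7.25, 29.0) for w in (31.5, 12.0, 3.0)]
```

In this sketch, only weights that fall between the two thresholds are flagged as potential duplicates, which is the band a data steward would typically review manually.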
The Threshold file also configures the EUIDs assigned by the master index application. You can specify an EUID length, whether a checksum value is used for additional verification, and a “chunk size”. Specifying a chunk size allows the EUID generator to obtain a block of EUIDs from the sbyn_seq_table database table so it does not need to query the table each time it generates a new EUID.
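The effect of the chunk size can be sketched as follows. This is a minimal Python model, not the product's EUID generator: `SequenceStore` is an in-memory stand-in for the sbyn_seq_table table, the class and method names are hypothetical, the zero-padded ID format is an assumption, and checksum handling is omitted.

```python
class SequenceStore:
    """In-memory stand-in for the database sequence table (hypothetical)."""

    def __init__(self, start=1):
        self.next_value = start
        self.reads = 0  # counts simulated round trips to the table

    def reserve(self, count):
        """Reserve `count` consecutive values and return the first one."""
        self.reads += 1
        first = self.next_value
        self.next_value += count
        return first


class ChunkedIdGenerator:
    """Hands out IDs from a reserved block, refilling only when it runs out."""

    def __init__(self, store, chunk_size, id_length):
        self.store = store
        self.chunk_size = chunk_size
        self.id_length = id_length
        self._next = 0
        self._limit = 0  # exclusive upper bound of the current block

    def next_id(self):
        if self._next >= self._limit:
            first = self.store.reserve(self.chunk_size)
            self._next = first
            self._limit = first + self.chunk_size
        value = self._next
        self._next += 1
        # Assumption: IDs are zero-padded to the configured length.
        return str(value).zfill(self.id_length)


store = SequenceStore()
generator = ChunkedIdGenerator(store, chunk_size=100, id_length=10)
ids = [generator.next_id() for _ in range(250)]
```

With a chunk size of 100, generating 250 IDs touches the simulated table three times instead of 250. The trade-off of a larger chunk is that reserved values left unused when the generator stops are skipped.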
In the Best Record file, you can define formulas that determine which data in an enterprise record should be considered the most reliable and how updates to the single best record (SBR) will be handled. The survivor calculator uses these formulas to decide what data from each system record to include in each object’s SBR. The SBR is the portion of the enterprise record that represents the data that is considered to be the most accurate and current for an object.
The SBR is defined by a mapping of fields from external system records. Since there might be many external systems, you can optionally specify a strategy to select the value for an SBR field from the list of external values. You can also specify any additional fields that might be required by the selection strategy to determine which external system contains the best data, such as the object’s update date and time.
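To make the idea of a per-field selection strategy concrete, here is a minimal Python sketch of one possible rule: take each SBR field from the most recently updated system record that has a value for it. The function names, the dictionary layout of a system record, and the `update_date` key are all hypothetical; this does not reproduce any of the predefined strategies shipped with Sun Master Index.

```python
from datetime import datetime


def most_recent_value(system_records, field):
    """Return the field value from the latest-updated record that has one."""
    candidates = [r for r in system_records if r.get(field) not in (None, "")]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r["update_date"])[field]


def build_sbr(system_records, fields):
    """Assemble a single best record by applying the rule to each field."""
    return {field: most_recent_value(system_records, field) for field in fields}


# Hypothetical system records from two external systems.
records = [
    {"system": "A", "update_date": datetime(2008, 3, 1),
     "LastName": "Smith", "SSN": "123456789"},
    {"system": "B", "update_date": datetime(2008, 6, 1),
     "LastName": "Smythe", "SSN": ""},
]
sbr = build_sbr(records, ["LastName", "SSN"])
```

Here the update date is the "additional field" the strategy needs: system B supplies the last name because it was updated later, while the SSN falls back to system A because system B has no value.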
This file also allows you to specify custom update procedures that you define in custom Java code you can plug in to the application. You can create Java classes that define special processing to perform against a record when the record is created, updated, merged, or unmerged. These classes must be created in the Custom Plug-ins module and can be specified for each transaction type in the Best Record file.
By default, the Field Validation file (validation.xml) defines certain validations for the local identifiers assigned by each external system. You can create custom Java classes that define rules for validating field values before they are saved to the master index database. You can then specify the Java classes in the Field Validation file to make them part of the Sun Master Index application.
This file is not currently used, and is a placeholder to be used in future versions.
Configuration of the appearance and certain processing properties of the EDM is contained in the Enterprise Data Manager file. In this file, you define each object and field that appears on the EDM, along with the properties of each field, such as the field type and length, field labels, format masks, and so on. You can also define the order in which objects and fields appear on the EDM pages.
This file defines several additional properties of the EDM, including the types of searches available, whether wildcard characters can be used, the criteria for the searches, and the results fields that appear. You can also specify whether an audit log is maintained of each instance in which data is accessed through the EDM. For healthcare-based master index applications, such as Sun Master Patient Index (an application built on the Sun Master Index platform), this supports the privacy rules mandated by the HIPAA regulation for healthcare. This file also includes the configuration of the reports generated from the EDM.
Finally, the Enterprise Data Manager file defines certain implementation information, such as the application server in use, debugging rules, and security activation.
Several match and standardization engine configuration files are included in the project tree. You can customize matching logic and standardization information for the match and standardization engines by modifying these files. The match configuration file, which defines and configures the comparator functions, can be modified using the Configuration Editor - Repository or the NetBeans text editor. The standardization files, which provide information to the standardization engine about how data should be parsed and normalized, can be modified using the text editor.
For information about the structure of these files and how they can be modified, see Understanding the Sun Match Engine.
You can use the NetBeans XML editor or the Configuration Editor - Repository to modify the configuration files created by the wizard. The Configuration Editor provides a series of windows to help guide you through the configuration of master index application components. The NetBeans XML editor allows you to modify the XML code directly.
The following topics provide additional information about the editors:
If you are familiar with XML, you can configure the master index application by modifying the XML code directly. Use caution when modifying the XML files because there are dependencies between files. For example, all fields listed in any of the configuration files must also be defined in the Object Definition file. Any queries referenced in the Enterprise Data Manager file must also be defined in the Candidate Select file.
The Configuration Editor - Repository allows you to modify most, but not all, configuration elements for a master index application using a graphical user interface. You can also use the editor to modify the match configuration file for the Sun Match Engine, but not to modify the standardization configuration files. While you can use the Configuration Editor to modify most of the configuration files, some elements can only be modified using the NetBeans XML editor. Following is a summary of which features can be configured using the Configuration Editor and which need to be modified using the XML editor.
Object Definition File
You can modify most elements of the Object Definition file using the Configuration Editor. The following can only be modified using the XML editor:
Database type
Date format
Maximum field value
Minimum field value
It is not recommended that you change the database type, but if you modify the database type or date format elements, you need to regenerate the application to create the updated database scripts. This does not recreate the Systems or Code Lists scripts; you need to update those manually.
Candidate Select File
You can modify all elements in the Candidate Select file using the Configuration Editor. If you create a query to use in the Enterprise Data Manager (EDM) or to use for the matching query, you need to add the query to the appropriate file (the Threshold file or the Enterprise Data Manager file) manually.
Threshold File
Most elements in the Threshold file cannot be modified using the Configuration Editor. You can modify the duplicate and match thresholds from the Configuration Editor.
Match Field File
You can use the Configuration Editor to modify all commonly modified elements in the Match Field file, including defining standardization structures, normalization structures, and phonetic encoding. If you create custom classes to implement a block picker, pass controller, match engine, or standardization engine, you need to specify the implementation classes in this file using the XML editor.
Best Record File
The Configuration Editor does not modify the Best Record file. If you make any changes to the object structure, review this file to verify that all fields or objects are included in the survivor strategy and that the field and object names are correct.
Field Validation File
The Configuration Editor does not modify the Field Validation file. If you create a custom field validation class, you need to specify the implementation class in this file using the XML editor.
Enterprise Data Manager File
Several elements in the Enterprise Data Manager file are not modified using the Configuration Editor. You can add and delete fields that appear on the EDM and modify the display name and the value and input masks. All other field properties can only be modified using the XML editor.
Field integrity is maintained when you delete a field using the Configuration Editor. The field is automatically deleted from the EDM object structure and from any EDM page definitions that include the field, such as a search page or report.
Match Configuration File
You can modify all components of the Match Configuration file using the Configuration Editor, including adding and removing comparators. The Configuration Editor does not validate the extra parameters that can be used for certain comparators, so you should verify your changes by reviewing the match configuration file manually.
The properties for the objects you will store in the master index database are defined in the Object Definition file. This file defines the parent and child objects to be indexed and the fields contained in each object, including key properties for each field, such as the field size, unique record identifiers, and whether certain fields are required or can be updated. After you define the master index framework and create the configuration files, you can modify the object structure that you defined.
The Object Definition is used as a basis for most of the master index application components. The information you specify for this file defines the dynamic Java API and the database structure for the primary tables that store object information in the master index application.
The following topics describe the Object Definition file, which defines the object structure.
The object definition includes three primary components that together define the structure of the data in the master index application, the database structure, and the method OTD. Most configuration files in the master index application rely on the objects and fields defined in the Object Definition. For example, the fields you specify for the match string, queries, standardization, and the survivor calculator must all be defined in the Object Definition.
The following topics describe each component of the object definition:
In a master index application, information is stored in objects. Each object in the data structure represents a different type of information. For example, if you are indexing businesses, you might have one object type to store general information about the business (such as the business name and type), one to store address information, and one to store contact information. When indexing personal information, you might have one object type to store general information about the person (such as their name, date of birth, and gender), one to store address information, and one to store telephone information. The object structure can have several objects, but only one primary object (called the parent object). This object is the parent to all other objects defined in the Object Definition. The object structure can have multiple child objects or no child objects at all.
Generally, a record in the master index application has information in one parent object and multiple child objects. A record can also have multiple instances of each child object. For example, in the person index example above, a record for a single person would have one name, one date of birth, and one gender, all three stored in the parent object. However, the same record might have several different addresses, each of which is stored in a separate Address object.
Each object in the object structure contains fields that store the data elements of the object. You can specify properties for each field in the object structure, such as a length, name, data type, formatting rules, and so on. The fields you define in the object structure also determine the structure of the method OTD and the database tables. You can also specify certain properties for each field that determine how the database columns are defined, including the length, name, and required data type.
In the Object Definition, you must specify the parent and child objects. The object structure must contain one parent object. All remaining objects defined in the structure must be specified as child objects to that parent object.
The object structure is defined in the Object Definition file in XML format. The information entered into the default configuration file is based on the objects and fields you defined in the wizard. Depending on how completely you defined the object structure in the wizard, this file might not require customization.
The following topics provide information about working with the Object Definition file:
When you use the wizard to define the object structure, all the configuration files for the master index application are automatically generated based on the information you provide. You can modify the Object Definition file at any time prior to deploying the associated project, but you must regenerate the application and redeploy the project after doing so. If you modify the object structure using the Configuration Editor, the remaining configuration files are updated accordingly to keep them synchronized. If you update the object structure by modifying the file directly, you also need to update the remaining configuration files. For example, if you modify the file directly and you delete a field from the object structure that also appears on the EDM, appears in the SBR, and is defined for standardization and matching, you must remove the field from the Enterprise Data Manager file, the Best Record file, and the Match Field file. Any changes made to the file without regenerating the project will not take effect.
The possible modifications to this file are restricted by the schema definition, so be sure to validate the file after making any changes.
Table 1 lists each element in the Object Definition file and provides a description of each element along with any requirements or constraints for each element.
Table 1 Object Definition File Structure
Following is a short sample illustrating the elements in the Object Definition file. The DOB field shows usage of the minimum-value element, the SSN field shows usage of the pattern element, and the AddressType field illustrates the code-module element. The AddressType field also has the key-type set to true, meaning that each record can only contain one address of each address type.
<name>Person</name>
<database>oracle</database>
<dateformat>MM/dd/yyyy</dateformat>
<nodes>
  <tag>Person</tag>
  <fields>
    <field-name>LastName</field-name>
    <field-type>string</field-type>
    <size>40</size>
    <updateable>true</updateable>
    <required>true</required>
    <key-type>false</key-type>
  </fields>
  <fields>
    <field-name>FirstName</field-name>
    <field-type>string</field-type>
    <size>40</size>
    <updateable>true</updateable>
    <required>true</required>
    <key-type>false</key-type>
  </fields>
  <fields>
    <field-name>DOB</field-name>
    <field-type>date</field-type>
    <updateable>true</updateable>
    <required>true</required>
    <minimum-value>1900-01-01</minimum-value>
    <key-type>false</key-type>
  </fields>
  <fields>
    <field-name>SSN</field-name>
    <field-type>string</field-type>
    <size>16</size>
    <updateable>true</updateable>
    <required>false</required>
    <pattern>[0-9]{9}</pattern>
    <key-type>false</key-type>
  </fields>
</nodes>
<nodes>
  <tag>Address</tag>
  <fields>
    <field-name>AddressType</field-name>
    <field-type>string</field-type>
    <size>8</size>
    <updateable>true</updateable>
    <required>true</required>
    <code-module>ADDRTYPE</code-module>
    <key-type>true</key-type>
  </fields>
  ...
</nodes>
<nodes>
  <tag>Phone</tag>
  ...
</nodes>
<relationships>
  <name>Person</name>
  <children>Address</children>
  <children>Phone</children>
</relationships>
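Because fields referenced in the other configuration files must be defined in the Object Definition, a quick consistency check can be useful after editing the XML by hand. The Python sketch below uses only the nodes, tag, fields, and field-name elements shown in the sample above. The `<definition>` wrapper element, the function names, and the `Object.Field` notation for references are assumptions made so that the fragment parses and the example stays small; they are not taken from the product schema.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample, wrapped in a hypothetical root element.
OBJECT_DEFINITION = """
<definition>
  <name>Person</name>
  <nodes>
    <tag>Person</tag>
    <fields><field-name>LastName</field-name></fields>
    <fields><field-name>SSN</field-name></fields>
  </nodes>
  <nodes>
    <tag>Address</tag>
    <fields><field-name>AddressType</field-name></fields>
  </nodes>
</definition>
"""


def defined_fields(xml_text):
    """Collect every field as 'Object.Field' from the object definition."""
    root = ET.fromstring(xml_text)
    defined = set()
    for node in root.findall("nodes"):
        tag = node.findtext("tag")
        for field in node.findall("fields"):
            defined.add(f"{tag}.{field.findtext('field-name')}")
    return defined


def undefined_references(xml_text, references):
    """Return the referenced fields that the object definition lacks."""
    known = defined_fields(xml_text)
    return sorted(ref for ref in references if ref not in known)


# Hypothetical references gathered from another configuration file.
missing = undefined_references(
    OBJECT_DEFINITION,
    ["Person.LastName", "Person.MiddleName", "Address.AddressType"],
)
```

A check like this does not replace schema validation; it only flags references that point at fields the object structure does not define.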
In the Candidate Select file, you configure properties of the Query Builder, which is a class that uses defined criteria and options to generate queries and query results from a master index database. The criteria and options used by the Query Builder to create database queries are defined in the Candidate Select file. The criteria must be fields that are defined in the Object Definition, and the options are key and value pairs that fine-tune the query operation. You can define the characteristics of the searches performed from the Enterprise Data Manager and of the queries used by the master index application to search for a candidate pool of potential matches for incoming records.
The following topics provide information about queries and the structure of the Candidate Select file:
The master index application performs two types of queries. Users perform manual queries from the EDM and the master index application automatically performs queries before processing matches for an incoming record. Two types of queries, basic queries and blocking queries, are predefined in the Query Builder. By default, basic queries are defined for the EDM and blocking queries are defined for match processing, though this is not required. You can also use a blocking query for the phonetic searches performed from the EDM. Both types of queries are configured by the Candidate Select file, and custom queries can be created and implemented with the master index application.
You can configure certain query properties. You can configure both basic and blocking queries to search on standardized or phonetic versions of the search criteria, and you can also specify that they search on exact values or a range of values. Basic queries can be configured to allow wildcard characters. For the blocking queries, you define the criteria to include in each block of query criteria.
The following topics provide additional information about the different types of queries:
By default, searches performed from the EDM follow the logic defined in the configured basic queries. You can specify which query type to use for each search defined for the EDM (this is specified in the Enterprise Data Manager file). These searches can be weighted, which means that the match engine calculates the likelihood that the search results match the actual search criteria and assigns a matching weight to each returned record. You can specify whether the search is performed on the original or phonetic version of the criteria.
The basic query uses all supplied search criteria to create a single SQL query. For this query, each field in the WHERE clause is joined by an AND operator, meaning that only records that match on all search fields are returned. This query has an option to allow wildcard characters in the search criteria (a percent sign (%) indicates multiple unknown characters). When this option is set to true, the query uses the LIKE operator rather than the equals operator (=). This option allows you to search by criteria for which you have incomplete data.
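The shape of the SQL that such a query produces can be sketched in a few lines of Python. This is an illustration of the AND and LIKE behavior described above, not the Query Builder's actual implementation; the function name, the `person` table, and the column names are hypothetical, and the table and column names are assumed to come from trusted configuration rather than user input.

```python
def build_basic_query(table, criteria, allow_wildcards=False):
    """Build one parameterized SELECT with every criterion joined by AND.

    When allow_wildcards is true, LIKE replaces the equals operator so that
    a percent sign in a value matches any run of characters.
    """
    if not criteria:
        raise ValueError("at least one search criterion is required")
    operator = "LIKE" if allow_wildcards else "="
    clauses = [f"{column} {operator} ?" for column in criteria]
    sql = f"SELECT * FROM {table} WHERE " + " AND ".join(clauses)
    return sql, list(criteria.values())


# Hypothetical search: partial last name, full first name.
sql, params = build_basic_query(
    "person", {"last_name": "SM%", "first_name": "JOHN"}, allow_wildcards=True
)
```

Because every criterion is joined by AND, adding a field to the search can only narrow the result set, which is why a single mismatched field can leave a basic query with no results.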
The searches performed from the EDM can be further customized in the Enterprise Data Manager file (for more information, see Enterprise Data Manager Configuration).
When the master index application evaluates possible matches of records sent to the master index application from external systems and from the EDM, the index performs a set of predefined SQL queries to retrieve a subset of possible matches. These queries are known as blocking queries. The matching algorithm processes the input record against the profiles retrieved from the blocking query (known as the candidate pool) and assigns them matching probability weights.
In the Candidate Select file, you define the criteria and conditions for querying the database to retrieve the subset of possible matches to the incoming record, including Oracle hints and SQL Server OPTION hints. You can define multiple queries, known as blocks, for each blocking query, and the master index application performs each of these queries in turn until sufficient records are retrieved (called a match pass). Using the default Query Builder, a block is only processed if the search criteria include all of the fields defined for that block. Each field in a block is joined by an AND operator in the WHERE clause, and each block is joined by a UNION operator. This type of search can also be used as a phonetic search in the EDM.
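The block rules described above (a block is skipped unless all of its fields are supplied, fields within a block are joined by AND, and blocks are joined by UNION) can be sketched as follows. This Python fragment is only a model of that description: the function name, the `person` table, and the column names are hypothetical, and database hints and match-pass control are left out.

```python
def build_blocking_query(table, blocks, criteria):
    """Build a UNION of one SELECT per block whose fields are all supplied."""
    selects, params = [], []
    for block in blocks:
        # Skip the block unless every one of its fields has a value.
        if not all(criteria.get(field) not in (None, "") for field in block):
            continue
        where = " AND ".join(f"{field} = ?" for field in block)
        selects.append(f"SELECT * FROM {table} WHERE {where}")
        params.extend(criteria[field] for field in block)
    return " UNION ".join(selects), params


# Hypothetical block definitions and incoming criteria (no SSN, no gender).
blocks = [["ssn"], ["first_name", "dob"], ["last_name", "gender"]]
incoming = {"first_name": "JOHN", "dob": "1970-01-01", "last_name": "SMITH"}
sql, params = build_blocking_query("person", blocks, incoming)
```

In this example only the second block contributes to the query, because the incoming record lacks an SSN and a gender. Defining several narrow blocks therefore keeps the candidate pool available even when individual fields are missing.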
The blocking queries you define here are referenced in the Threshold file, which specifies which one of the defined blocking queries to use for match processing. They might also be referenced in the Enterprise Data Manager file if a blocking query is used for phonetic searches from the EDM. To enable extensive searching (that is, searching against additional tables, such as an alias table for a person index), you must add the fields from that table to the blocking query.
You can configure both basic queries and blocking queries to perform phonetic searches from the EDM. If you use a basic query, then all entered criteria must match existing records in order to return results from the search. If you use a blocking query, several queries are performed using different combinations of data until enough matching records are returned or until all defined combinations have been tried.
For example, if you use a basic query and enter first and last name, date of birth, gender, and SSN for criteria, the basic query might not return any matches if any one of those fields does not match the criteria. However, if you use a blocking query for the same example, it might search on SSN, then on first name and date of birth, and then on last name and gender. The query returns any matching records from any of the query passes.
Both basic and blocking queries can be configured to perform exact searches or range searches. An exact search performs a query for the exact value entered into a field as search criteria; range searches perform a query on a range of values based on the value entered into a field as search criteria. The basic query supports standard range searching, where both the lower and upper limits of the range are supplied. The blocking query supports standard range searching plus two additional types that use predefined offset values or constants.
Offset values allow you to specify values to be added to or subtracted from the entered value to determine the range on which to search. Constants provide a default value to use as a range when no value is entered or when incomplete information is available.
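The difference between offsets and constants can be shown with a short Python sketch. The function names and the sample values are hypothetical, and the sketch covers only the two ideas in the paragraph above; the product's actual processing logic, including how the two are combined, is described in Range Search Processing (Repository) and may differ.

```python
from datetime import date, timedelta


def offset_range(value, lower_offset, upper_offset):
    """Derive a search range by subtracting and adding offsets to one value."""
    return value - lower_offset, value + upper_offset


def range_with_constants(lower, upper, default_lower, default_upper):
    """Fill in any missing range limit with a predefined constant."""
    return (
        default_lower if lower is None else lower,
        default_upper if upper is None else upper,
    )


# Offsets: search one year (365 days) either side of the entered date of birth.
dob_range = offset_range(date(1970, 6, 15), timedelta(days=365), timedelta(days=365))

# Constants: only the lower limit was entered, so the upper limit is defaulted.
open_range = range_with_constants(date(1970, 1, 1), None,
                                  date(1900, 1, 1), date(2099, 12, 31))
```

Offsets suit fields where the entered value is probably close to the stored one, such as a date of birth, while constants keep a range query usable when one or both limits are missing.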
Range searching is configured in both the Enterprise Data Manager file and the Candidate Select file. The processing logic for different types of range searching is described in Range Search Processing (Repository).
The properties for the predefined queries are defined in the Candidate Select file in XML format. Some of the information entered into the default configuration file is based on the fields you specified for blocking in the wizard, and some is standard across all implementations. For most implementations, this file will require some customization.
The following topics provide information about working with the Candidate Select file:
You can modify the Candidate Select file at any time, but you must regenerate the application and redeploy the project after making any changes to the file. The properties of the blocking query used by the match process should not be modified after moving into production because it can cause unexpected matching weight results. The possible modifications to this file are restricted by the schema definition, so be sure to validate the file after making any changes. Most of the components in this file can be configured using the Configuration Editor, which simplifies the process of defining queries by providing a graphical interface to perform the required tasks.
Table 2 lists each element in the Candidate Select file and provides a description of each element along with any requirements or constraints for each element.
Table 2 Candidate Select File Structure
Element/Attribute |
Description |
||||
---|---|---|---|---|---|
The configuration class for the query builders. This should not be modified. |
|||||
A list of query definitions. This element defines each query and the attributes of each query. |
|||||
A unique ID for the element. This element is used to identify the Query Builder and is referenced from the Enterprise Data Manager file when specifying the query to use on a search page. It is also referenced from the Match Field file when specifying the query to use for matching. No spaces are allowed in this attribute. |
|||||
The fully qualified name of the query class. Two default Query Builder classes are provided.
|
|||||
The fully qualified name of the class that parses the config elements for each query. This should not be modified for the default queries. |
|||||
An indicator of whether the query criteria is standardized before being passed to the query. Specify true if any fields are standardized for the query; specify false if no fields are standardized for the query. |
|||||
An indicator of whether the query criteria is phonetically encoded before being passed to the query. Specify true if any fields are phonetically encoded for the query; specify false if no fields are phonetically encoded for the query. |
|||||
The configuration information for a query. Each query-builder element contains one config element. |
|||||
One query parameter, specified by key and value attributes, as described below. This is only used by basic queries; blocking queries do not use this element. |
|||||
A parameter for the query option. For the default basic query, only the UseWildCard key is available. |
|||||
The value of the key specified by the corresponding key attribute. For the default option, UseWildCard, specify true to allow wildcard characters for that query type; otherwise specify false. When wildcard characters are enabled, you can enter a percent sign (%) to indicate multiple unknown characters. |
|||||
A list of Oracle hints or SQL Server OPTION hints and defined query criteria blocks, which are identified by unique ID numbers. |
|||||
An attribute of the block-definition element that specifies the unique ID number of each query block. Each block defined for the blocking query must be identified by a unique ID. |
|||||
A hint to add to the query to help optimize query execution. Hints are especially useful when a blocking query uses only child object fields; the hint can specify to scan the child object table first. This element is optional. For SQL Server, only OPTION hints are supported. |
|||||
A list of fields to be included in each query block, including indicators of whether a range is to be used and, if so, what type of range search to perform. |
|||||
An indicator of the type of search to perform on the field defined in the following elements. Each type of search element defines one field in a block-rule element; that is, one field in a query block. This element includes a field element, a source or constant element, and, for range searches only, a default element that defines lower and upper bounds. Specify one of the following types.
Tip – If a field is to be used for simple range searching (where the user or incoming message supplies the lower and upper limits of the range), be sure to define that field for range searching in the Enterprise Data Manager file for the searches that use this query. For more complex range searches that use offset values or constants instead of user-supplied limits, do not define the field for range searching in the Enterprise Data Manager file. |
|||||
The fully qualified field name of the field to be included in the query block (for example, Enterprise.Person.Address.AddressLine1). |
|||||
The qualified field name of the source field in the object from which the criteria is obtained (for example, Person.Address.AddressLine1). An asterisk (*) can be used as a wildcard character. If the criteria should be a constant value instead of being supplied by a user or incoming message, define a constant element instead of a source element. Tip – When a field in a child object is defined for a blocking query, use the asterisk wildcard character in the ePath to the source field to ensure all instances of the child object in an incoming message are used as search criteria. Each instance is joined by an OR operator. For example, this configuration:
would result in a WHERE clause similar to this:
|
|||||
A constant value that provides the criteria for a search. Define this element instead of a source element if the criteria is a constant rather than being user defined. You can use a constant value with the following types of queries: equals, not-equals, greater-than-or-equals, and less-than-or-equals. |
|||||
A list of upper and lower limits defining a range search. If no limits are defined, the search is a simple range search in which the upper and lower values are supplied by the user or the incoming message (for example, in “Date of Birth From” and “Date of Birth To” fields). |
|||||
The lower limit of a constant or offset range search. Use a negative number for the lower limit of an offset search. This number is added to the value supplied for the search to determine the lower limit of the range. The value can be numeric, date, or string. See Range Search Processing (Repository) for more information. |
|||||
The type of range search. Define the type attribute as offset to use an offset value or as constant to define a lower constant. |
|||||
The upper limit of a constant or offset range search. The value can be numeric, date, or string. See Range Search Processing (Repository) for more information. |
|||||
The type of range search. Define the type attribute as offset to use an offset value or as constant to define an upper constant. |
Below is a sample illustrating the elements in the Candidate Select file.
<QueryBuilderConfig module-name="QueryBuilder"
    parser-class="com.stc.eindex.configurator.impl.querybuilder.QueryBuilderConfiguration">
  <query-builder name="ALPHA-SEARCH"
      class="com.stc.eindex.querybuilder.BasicQueryBuilder"
      parser-class="com.stc.eindex.configurator.impl.querybuilder.KeyValueConfiguration"
      standardize="true" phoneticize="false">
    <config>
      <option key="UseWildcard" value="true"/>
    </config>
  </query-builder>
  <query-builder name="PHONETIC-SEARCH"
      class="com.stc.eindex.querybuilder.BasicQueryBuilder"
      parser-class="com.stc.eindex.configurator.impl.querybuilder.KeyValueConfiguration"
      standardize="true" phoneticize="true">
    <config>
      <option key="UseWildcard" value="false"/>
    </config>
  </query-builder>
  <query-builder name="BLOCKER-SEARCH"
      class="com.stc.eindex.querybuilder.BlockerQueryBuilder"
      parser-class="com.stc.eindex.configurator.impl.blocker.BlockerConfig"
      standardize="true" phoneticize="true">
    <config>
      <block-definition number="ID000000">
        <block-rule>
          <equals>
            <field>Enterprise.SystemSBR.Person.FnamePhonetic</field>
            <source>Person.FnamePhoneticCode</source>
          </equals>
          <equals>
            <field>Enterprise.SystemSBR.Person.LnamePhonetic</field>
            <source>Person.LnamePhoneticCode</source>
          </equals>
        </block-rule>
      </block-definition>
      <block-definition number="ID000001">
        <block-rule>
          <equals>
            <field>Enterprise.SystemSBR.Person.SSN</field>
            <source>Person.SSN</source>
          </equals>
        </block-rule>
      </block-definition>
      <block-definition number="ID000002">
        <hint>ALL_ROWS</hint>
        <block-rule>
          <equals>
            <field>Enterprise.SystemSBR.Person.FnamePhonetic</field>
            <source>Person.FnamePhoneticCode</source>
          </equals>
          <range>
            <field>Enterprise.SystemSBR.Person.DOB</field>
            <source>Person.DOB</source>
            <default>
              <lower-bound type="offset">-5</lower-bound>
              <upper-bound type="offset">5</upper-bound>
            </default>
          </range>
          <equals>
            <field>Enterprise.SystemSBR.Person.Gender</field>
            <source>Person.Gender</source>
          </equals>
        </block-rule>
      </block-definition>
    </config>
  </query-builder>
</QueryBuilderConfig>
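When reviewing a customized Candidate Select file, it can help to list which fields each query block uses. The following is a minimal sketch using Python's standard `xml.etree.ElementTree` module; it is not part of the product. The embedded XML is an abbreviated copy of the sample above (parser classes and some blocks omitted), and the `summarize` helper is a hypothetical name introduced here.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the sample above, reduced to two queries for brevity.
CANDIDATE_SELECT = """
<QueryBuilderConfig module-name="QueryBuilder">
  <query-builder name="ALPHA-SEARCH"
      class="com.stc.eindex.querybuilder.BasicQueryBuilder"
      standardize="true" phoneticize="false">
    <config>
      <option key="UseWildcard" value="true"/>
    </config>
  </query-builder>
  <query-builder name="BLOCKER-SEARCH"
      class="com.stc.eindex.querybuilder.BlockerQueryBuilder"
      standardize="true" phoneticize="true">
    <config>
      <block-definition number="ID000001">
        <block-rule>
          <equals>
            <field>Enterprise.SystemSBR.Person.SSN</field>
            <source>Person.SSN</source>
          </equals>
        </block-rule>
      </block-definition>
      <block-definition number="ID000002">
        <block-rule>
          <range>
            <field>Enterprise.SystemSBR.Person.DOB</field>
            <source>Person.DOB</source>
            <default>
              <lower-bound type="offset">-5</lower-bound>
              <upper-bound type="offset">5</upper-bound>
            </default>
          </range>
        </block-rule>
      </block-definition>
    </config>
  </query-builder>
</QueryBuilderConfig>
"""


def summarize(xml_text):
    """Map each query name to a list of (block number, search type, field)."""
    root = ET.fromstring(xml_text)
    summary = {}
    for query in root.findall("query-builder"):
        rules = []
        for block in query.findall("config/block-definition"):
            # Each child of block-rule is one search-type element (equals, range, ...).
            for rule in block.findall("block-rule/*"):
                field = rule.findtext("field").strip()
                rules.append((block.get("number"), rule.tag, field))
        summary[query.get("name")] = rules
    return summary


summary = summarize(CANDIDATE_SELECT)
```

Basic queries have no block definitions, so they map to an empty list; the blocking query maps to one entry per field in each block.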
Both basic and blocking queries can be configured to perform both exact searches and range searches. The following topics describe how different configurations of exact and range searches are processed.
Range searching for basic queries is configured in the search page section of the Enterprise Data Manager file by tagging the field with a “choice” attribute. When you specify a field for range searching, two corresponding fields appear on the EDM with “From” and “To” appended to the name (for example, a field named “Date of Birth” would display two fields: “Date of Birth From” and “Date of Birth To”). You can also define a field for both exact and range searching by defining the field twice for the search page, once with the choice attribute set to “exact” and once with it set to “range”. In this case, three fields appear on the EDM: one with the given field name, one with “From” appended to the name, and one with “To” appended to the name.
Table 3 describes the queries formed for different exact or range search scenarios. Table 4 describes the queries formed for combination exact and range search scenarios.
The following variables are used in these tables:
field_name is the field name as specified in the search page section of the Enterprise Data Manager file (the field named field_name is used for exact searching)
value is the value entered into the exact search field
value_from is the value entered into the field_name From field
value_to is the value entered into the field_name To field
Field Configuration in the Enterprise Data Manager file | Resulting Fields on EDM | Fields Populated for Search | Where Clause
---|---|---|---
choice attribute set to “exact” | field_name | field_name | where field_name = value
choice attribute set to “range” | field_name From, field_name To | field_name From, field_name To | where field_name >= value_from and field_name <= value_to
choice attribute set to “range” | field_name From, field_name To | field_name From | where field_name >= value_from
choice attribute set to “range” | field_name From, field_name To | field_name To | where field_name <= value_to
In the following table, when field_name is populated but not used in the WHERE clause, its value is used for weighting purposes. These cases are marked with an asterisk (*).
Table 4 Combination Exact and Range Queries
Field Configuration in the Enterprise Data Manager file | Resulting Fields on EDM | Fields Populated for Search | Where Clause
---|---|---|---
field defined once with choice attribute set to “exact” and once with it set to “range” | field_name, field_name From, field_name To | field_name | where field_name = value
field defined once with choice attribute set to “exact” and once with it set to “range” | field_name, field_name From, field_name To | field_name From, field_name To | where field_name >= value_from and field_name <= value_to
field defined once with choice attribute set to “exact” and once with it set to “range” | field_name, field_name From, field_name To | field_name From | where field_name >= value_from
field defined once with choice attribute set to “exact” and once with it set to “range” | field_name, field_name From, field_name To | field_name To | where field_name <= value_to
field defined once with choice attribute set to “exact” and once with it set to “range” * | field_name, field_name From, field_name To | field_name, field_name From | where field_name >= value_from
field defined once with choice attribute set to “exact” and once with it set to “range” * | field_name, field_name From, field_name To | field_name, field_name To | where field_name <= value_to
field defined once with choice attribute set to “exact” and once with it set to “range” * | field_name, field_name From, field_name To | field_name, field_name From, field_name To | where field_name >= value_from and field_name <= value_to
Blocking queries are configured in the Candidate Select file, and, if the blocking query is used on the EDM, in the Enterprise Data Manager file. In order for the fields defined for range searching in the blocking query to appear on the EDM, the fields must be configured correctly in the Enterprise Data Manager file.
In addition to the standard range searching (described under Basic Query Range Searching), blocking queries support constant and offset range searches, allowing you to specify default upper and lower offset values or to specify upper and lower constant limits. Using offsets adds the specified values to the actual field value to determine the range on which to search. Note that this means the lower offset value should be a negative number and the upper offset value should be a positive number in order to create a valid range. You can also define a combination of a constant upper limit with lower offset value or a constant lower limit with an upper offset value.
When upper and lower offset values are defined, the application searches for values that are greater than or equal to the field value plus the lower offset value (which is typically a negative number) and less than or equal to the field value plus the upper offset value. You do not need to define both an upper and a lower offset value.
The method for adding offsets differs between numeric and date type fields. For numeric data types, the offset value is added to the actual number. For date data types, the offset value is added to the day portion of the date (for example, if the offsets are -5 and +5 and the date entered is 01/10/2005, then the lower and upper bounds are 01/05/2005 and 01/15/2005).
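The offset arithmetic can be illustrated with a short Python sketch. The `offset_bounds` helper is a hypothetical name, not a master index API, and the sketch assumes that date offsets are whole days that may roll over into the neighboring month; the source text only states that the offset is added to the day portion.

```python
from datetime import date, timedelta


def offset_bounds(value, lower=None, upper=None):
    """Return (lower_bound, upper_bound) for an offset range search.

    Dates are shifted by whole days; numbers are shifted arithmetically.
    A bound is None when its offset is not defined.
    """
    def shift(offset):
        if offset is None:
            return None
        if isinstance(value, date):
            # Assumption: day offsets may cross month boundaries.
            return value + timedelta(days=offset)
        return value + offset

    return shift(lower), shift(upper)


# The date example from the text: 01/10/2005 with offsets of -5 and +5.
date_bounds = offset_bounds(date(2005, 1, 10), lower=-5, upper=5)
number_bounds = offset_bounds(100, lower=-5, upper=5)
lower_only = offset_bounds(100, lower=-5)
```

Because the lower offset is added rather than subtracted, it must be negative to produce a bound below the entered value.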
Table 5 describes the queries formed for different exact or range offset search scenarios. Table 6 describes the query formed for combination exact and offset range search scenarios.
The following variables are used in these tables:
field_name is the field name as specified in the search page section of the Enterprise Data Manager file (the field named field_name is used for exact searching)
value is the value entered into the exact search field
value_from is the value entered into the field_name From field
value_to is the value entered into the field_name To field
lower is the lower offset value
upper is the upper offset value
Field Configuration in the Enterprise Data Manager file | Resulting Fields on EDM | Offset Configuration in the Candidate Select file | Fields Populated for Search | Where Clause
---|---|---|---|---
choice attribute set to “exact” | field_name | both upper and lower offsets defined | field_name | where field_name >= (value + lower) and field_name <= (value + upper)
choice attribute set to “exact” | field_name | only lower offset defined | field_name | where field_name >= (value + lower)
choice attribute set to “exact” | field_name | only upper offset defined | field_name | where field_name <= (value + upper)
choice attribute set to “range” | field_name From, field_name To | upper, lower, or both offsets are defined | field_name From, field_name To | where field_name >= value_from and field_name <= value_to
choice attribute set to “range” | field_name From, field_name To | upper, lower, or both offsets are defined | field_name From | where field_name >= value_from
choice attribute set to “range” | field_name From, field_name To | upper, lower, or both offsets are defined | field_name To | where field_name <= value_to
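The rows of Table 5 can be read as a small decision procedure. The sketch below is a hypothetical illustration for numeric values only, not the product's query builder: `offset_where` is an invented name, it covers only the Table 5 scenarios (the field is configured as either exact or range, not both), and it writes values directly into the clause for readability where real code would use bind parameters.

```python
def offset_where(field_name, value=None, value_from=None, value_to=None,
                 lower=None, upper=None):
    """Build the WHERE clause for the offset scenarios listed in Table 5."""
    parts = []
    if value_from is not None or value_to is not None:
        # Range fields on the EDM: user-supplied limits are used; offsets are ignored.
        if value_from is not None:
            parts.append(f"{field_name} >= {value_from}")
        if value_to is not None:
            parts.append(f"{field_name} <= {value_to}")
    elif value is not None:
        # Exact field on the EDM: each defined offset is added to the entered value.
        if lower is not None:
            parts.append(f"{field_name} >= {value + lower}")
        if upper is not None:
            parts.append(f"{field_name} <= {value + upper}")
    return "where " + " and ".join(parts) if parts else ""


both = offset_where("age", value=40, lower=-5, upper=5)
lower_only = offset_where("age", value=40, lower=-5)
ranged = offset_where("age", value_from=30, value_to=50, lower=-5, upper=5)
```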
In Table 6, the field configuration in the Enterprise Data Manager file defines the field twice for searching, once with the choice attribute set to “exact” and once with it set to “range”.
In the following cases, when field_name is populated but not used in the WHERE clause, its value is used for weighting purposes. These cases are marked with an asterisk (*).
Table 6 Combination Offset Range Queries
Offset Configuration in the Candidate Select file | Fields on EDM | Fields Populated for Search | Query Result
---|---|---|---
both upper and lower bound offsets are defined | field_name, field_name From, field_name To | field_name | where field_name >= (value + lower) and field_name <= (value + upper)
only a lower offset is defined | field_name, field_name From, field_name To | field_name | where field_name >= (value + lower)
only an upper offset is defined | field_name, field_name From, field_name To | field_name | where field_name <= (value + upper)
upper, lower, or both offsets are defined | field_name, field_name From, field_name To | field_name From, field_name To | where field_name >= value_from and field_name <= value_to
upper, lower, or both offsets are defined | field_name, field_name From, field_name To | field_name From | where field_name >= value_from
upper, lower, or both offsets are defined | field_name, field_name From, field_name To | field_name To | where field_name <= value_to
both upper and lower offsets are defined | field_name, field_name From, field_name To | field_name, field_name From | where field_name >= value_from and field_name <= (value + upper)
only a lower offset is defined | field_name, field_name From, field_name To | field_name, field_name From | where field_name >= (value + lower)
only an upper offset is defined | field_name, field_name From, field_name To | field_name, field_name From | where field_name <= (value + upper)
both upper and lower offsets are defined | field_name, field_name From, field_name To | field_name, field_name To | where field_name <= value_to and field_name >= (value + lower)
only a lower offset is defined | field_name, field_name From, field_name To | field_name, field_name To | where field_name >= (value + lower)
only an upper offset is defined | field_name, field_name From, field_name To | field_name, field_name To | where field_name <= (value + upper)
both upper and lower offsets are defined * | field_name, field_name From, field_name To | field_name, field_name From, field_name To | where field_name >= value_from and field_name <= value_to
When you define upper and lower constants for a field, these values are used for the WHERE clause of the query if no data is passed in as search criteria for that field. They are also used when only one of the “from” or “to” fields is populated. You do not need to define both an upper and a lower constant value. If you define only an upper constant value, only a “less than or equals” clause is used in the query; if you define only a lower constant value, only a “greater than or equals” clause is used in the query.
For numeric type fields, the constant must be defined as all digits, with one decimal point allowed. For date type fields, the constant must be in the standard SQL format of yyyy-mm-dd.
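These two format rules are easy to check before deploying a configuration. The following is a minimal sketch of such a check; the helper names are hypothetical, and the sketch assumes that a numeric constant must contain at least one digit and carries no sign, which the text does not state explicitly.

```python
import re
from datetime import datetime


def is_numeric_constant(text):
    """True for digits with at most one decimal point, such as 120 or 12.5."""
    # Assumption: at least one digit is required and no sign is allowed.
    return re.fullmatch(r"(?=.*[0-9])[0-9]*\.?[0-9]*", text) is not None


def is_date_constant(text):
    """True for a real calendar date written as yyyy-mm-dd."""
    if re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", text) is None:
        return False
    try:
        datetime.strptime(text, "%Y-%m-%d")
    except ValueError:
        # The shape is right but the date does not exist (for example, month 13).
        return False
    return True
```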
Table 7 describes the queries formed for different exact or range constant search scenarios. Table 8 describes the query formed for combination exact and range search scenarios.
The following variables are used in these tables:
field_name is the field name as defined in the search page section of the Enterprise Data Manager file (the field named field_name is used for exact searching)
value is the value entered into the exact search field
value_from is the value entered into the field_name From field
value_to is the value entered into the field_name To field
lower is the lower constant value
upper is the upper constant value
Field Configuration in the Enterprise Data Manager file | Resulting Fields on EDM | Fields Populated for Search | Where Clause
---|---|---|---
choice attribute set to “exact” | field_name | field_name | where field_name = value
choice attribute set to “range” | field_name From, field_name To | field_name From, field_name To | where field_name >= value_from and field_name <= value_to
choice attribute set to “range” | field_name From, field_name To | field_name From | where field_name >= value_from and field_name <= upper
choice attribute set to “range” | field_name From, field_name To | field_name To | where field_name <= value_to and field_name >= lower
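The fallback behavior in the range rows of Table 7 can be summarized as: use the entered limit if there is one, otherwise use the constant if one is defined. The sketch below illustrates that rule. `constant_where` is a hypothetical helper, values are written into the clause unquoted for readability, and the conditions are always emitted lower bound first, which is logically equivalent to the ordering shown in the table.

```python
def constant_where(field_name, value_from=None, value_to=None,
                   lower=None, upper=None):
    """Build the WHERE clause for a range field with optional constant limits."""
    # An entered From or To value wins; a missing one falls back to the constant.
    low = value_from if value_from is not None else lower
    high = value_to if value_to is not None else upper
    parts = []
    if low is not None:
        parts.append(f"{field_name} >= {low}")
    if high is not None:
        parts.append(f"{field_name} <= {high}")
    return "where " + " and ".join(parts) if parts else ""


from_only = constant_where("dob", value_from="1970-01-01",
                           lower="1900-01-01", upper="2000-12-31")
nothing_entered = constant_where("dob", lower="1900-01-01", upper="2000-12-31")
```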
In Table 8, the field configuration in the Enterprise Data Manager file defines the field twice for searching, once with the choice attribute set to “exact” and once with it set to “range”.
In the following cases, when field_name is populated but not used in the WHERE clause, its value is used for weighting purposes. These cases are marked with an asterisk (*).
Table 8 Combination Constant Range Queries
Constant Configuration in the Candidate Select file | Fields on EDM | Fields Populated for Search | Query Result
---|---|---|---
upper, lower, or both constants are defined | field_name, field_name From, field_name To | field_name | where field_name = value
upper, lower, or both constants are defined | field_name, field_name From, field_name To | field_name From, field_name To | where field_name >= value_from and field_name <= value_to
either upper or both constants are defined | field_name, field_name From, field_name To | field_name From | where field_name >= value_from and field_name <= upper
lower constant is defined | field_name, field_name From, field_name To | field_name From | where field_name >= value_from
either upper or both constants are defined | field_name, field_name From, field_name To | field_name, field_name From | where field_name >= value_from and field_name <= upper
lower constant is defined * | field_name, field_name From, field_name To | field_name, field_name From | where field_name >= value_from
either lower or both constants are defined | field_name, field_name From, field_name To | field_name To | where field_name <= value_to and field_name >= lower
upper constant is defined | field_name, field_name From, field_name To | field_name To | where field_name <= value_to
either lower or both constants are defined | field_name, field_name From, field_name To | field_name, field_name To | where field_name <= value_to and field_name >= lower
upper constant is defined * | field_name, field_name From, field_name To | field_name, field_name To | where field_name <= value_to
upper, lower, or both constants are defined * | field_name, field_name From, field_name To | field_name, field_name From, field_name To | where field_name >= value_from and field_name <= value_to
You can use a combination of offset and constant values to define range searching for a field. Table 9 describes the query formed for combination offset and constant search scenarios.
The following variables are used in these tables:
field_name is the field name as defined in the search page section of the Enterprise Data Manager file (the field named field_name is used for exact searching)
value is the value entered into the exact search field
value_from is the value entered into the field_name From field
value_to is the value entered into the field_name To field
lower is the lower constant or offset value
upper is the upper constant or offset value
In Table 9, the field configuration in the Enterprise Data Manager file defines the field twice for searching, once with the choice attribute set to “exact” and once with it set to “range”.
In the following cases, when field_name is populated but not used in the WHERE clause, its value is used for weighting purposes. These cases are marked with an asterisk (*).
Table 9 Combination Constant and Offset Range Queries
Offset and Constant Configuration in the Candidate Select file | Fields on EDM | Fields Populated for Search | Query Result
---|---|---|---
upper offset and lower constant are defined | field_name, field_name From, field_name To | field_name | where field_name >= lower and field_name <= (value + upper)
upper offset and lower constant are defined | field_name, field_name From, field_name To | field_name From, field_name To | where field_name >= value_from and field_name <= value_to
upper offset and lower constant are defined | field_name, field_name From, field_name To | field_name From | where field_name >= value_from
upper offset and lower constant are defined | field_name, field_name From, field_name To | field_name To | where field_name <= value_to and field_name >= lower
upper offset and lower constant are defined | field_name, field_name From, field_name To | field_name, field_name From | where field_name <= (value + upper) and field_name >= value_from
upper offset and lower constant are defined | field_name, field_name From, field_name To | field_name, field_name To | where field_name <= value_to and field_name >= lower
upper offset and lower constant are defined * | field_name, field_name From, field_name To | field_name, field_name From, field_name To | where field_name >= value_from and field_name <= value_to
upper constant and lower offset are defined | field_name, field_name From, field_name To | field_name | where field_name <= upper and field_name >= (value + lower)
upper constant and lower offset are defined | field_name, field_name From, field_name To | field_name From, field_name To | where field_name >= value_from and field_name <= value_to
upper constant and lower offset are defined | field_name, field_name From, field_name To | field_name From | where field_name >= value_from and field_name <= upper
upper constant and lower offset are defined | field_name, field_name From, field_name To | field_name To | where field_name <= value_to
upper constant and lower offset are defined * | field_name, field_name From, field_name To | field_name, field_name From | where field_name <= upper and field_name >= value_from
upper constant and lower offset are defined * | field_name, field_name From, field_name To | field_name, field_name To | where field_name <= value_to and field_name >= (value + lower)
upper constant and lower offset are defined * | field_name, field_name From, field_name To | field_name, field_name From, field_name To | where field_name >= value_from and field_name <= value_to
In the Threshold file, you define certain system parameters for the Manager Service, such as matching thresholds, EUID properties, and the blocking query to use for match processing. The Manager Service is the main interface of the indexing system. This interface coordinates all components of the master index application, including the database, master index project, Enterprise Data Manager, runtime environment, and match engine. The main interface is a stateless session bean, though some methods return objects that have handles to stateful beans.
The following topics describe the Manager Service and the Threshold file.
In the Threshold file, you define certain properties of the match process, such as duplicate and match thresholds, the query to use for matching, logic for automatic merges, and properties of the EUIDs assigned by the master index application (such as their length and whether a checksum value is used). This file is also used to define the update mode (optimistic or pessimistic) and merged record updates.
The following topics describe the configurable components in the Threshold file:
Custom logic classes specify any custom plug-ins created for the master index project that define custom processing for the execute match methods. If no classes are specified, execute match processing is carried out using the default logic (this is described in Understanding Sun Master Index Processing (Repository)).
The update mode specifies whether a record’s potential duplicate list is reevaluated when key fields are updated in the record. Performing the reevaluation helps keep the potential duplicate list current, but requires more system resources.
There are two update modes.
Pessimistic – In this mode, a record’s potential duplicates are reevaluated whenever updates are made to the record’s key fields. Key fields are fields involved in blocking and matching.
Optimistic – In this mode, potential duplicates are not reevaluated when key fields are updated in a record. After an update, the potential duplicate list for a record remains the same as before the update occurred.
The merge update status determines whether changes can be made to records that have a status of “merged”. These are the EUID records that are not retained after a merge. For example, when an incoming record is an assumed match with an SBR that has a status of “merged”, the master index application checks the value of the merged-record-update element. If the element is set to “Enabled”, the merged SBR is updated with the new information. If the element is set to “Disabled”, an exception is thrown and the update is not performed. Typically, it is recommended that merged records not be updated.
The blocking query, specified by the query-builder element, identifies one of the queries defined in the Candidate Select file as the query to use for match processing. This query is used by the master index application when searching for a candidate pool of possible matches to an incoming record. If the query takes any parameters, they are defined using the option element.
The DecisionMakerConfig element of the Threshold file allows you to specify how the Manager Service evaluates query results. When the master index application processes an incoming record, it compares the new record against existing records in the database and assigns a matching weight between possible matches with the incoming record. The master index application uses the values that you specify in this section to determine how to handle records that fall within certain matching weight ranges. Records with a matching weight above the duplicate threshold are treated as potential duplicates; records with a matching weight above the match threshold are treated as potential duplicates or assumed matches, depending on the value of the OneExactMatch parameter and the number of records with a matching weight above the match threshold.
For the default Decision Maker, you can configure the parameters described below.
OneExactMatch – This parameter specifies logic for assumed matches. If OneExactMatch is set to true and there is more than one record above the match threshold, then none of the records are considered an assumed match and all are flagged as potential duplicates. If OneExactMatch is set to false and there is more than one record above the match threshold, then the record with the highest matching weight is considered an assumed match and the rest are flagged as potential duplicates.
SameSystemMatch – This parameter indicates whether the master index application will match two records that originated from the same system whose matching weight falls above the match threshold. If SameSystemMatch is set to true, no assumed matches are made between records associated with the same system. If SameSystemMatch is set to false, assumed matches can be made between records associated with the same system.
DuplicateThreshold – The duplicate threshold specifies the matching probability weight at or above which two records are considered to potentially represent the same object. Records with matching weights between the duplicate and match thresholds are always flagged as potential duplicates. A thorough data analysis combined with testing will help determine the best value for the duplicate and match thresholds.
MatchThreshold – The match threshold specifies the matching probability weight at or above which two records are assumed to be a match and are automatically merged in the master index database.
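The interaction between the two thresholds and OneExactMatch can be sketched as follows. This is a simplified illustration rather than the default Decision Maker: the `classify` function, the record IDs, and the threshold values are hypothetical, SameSystemMatch handling is left out, and a single record at or above the match threshold is assumed to be treated as an assumed match.

```python
def classify(weights, duplicate_threshold, match_threshold, one_exact_match):
    """Return (assumed_match_id or None, sorted potential duplicate ids).

    weights maps a candidate record id to its matching weight.
    """
    above_match = [rid for rid, w in weights.items() if w >= match_threshold]
    duplicates = [rid for rid, w in weights.items()
                  if duplicate_threshold <= w < match_threshold]
    assumed = None
    if len(above_match) == 1:
        # Assumption: a lone record above the match threshold is an assumed match.
        assumed = above_match[0]
    elif len(above_match) > 1:
        if one_exact_match:
            # More than one candidate: no assumed match, all become potential duplicates.
            duplicates.extend(above_match)
        else:
            # The highest weight wins; the rest become potential duplicates.
            assumed = max(above_match, key=lambda rid: weights[rid])
            duplicates.extend(rid for rid in above_match if rid != assumed)
    return assumed, sorted(duplicates)


weights = {"A": 31.0, "B": 28.5, "C": 12.0, "D": 3.0}
strict = classify(weights, duplicate_threshold=7.25, match_threshold=25.0,
                  one_exact_match=True)
lenient = classify(weights, duplicate_threshold=7.25, match_threshold=25.0,
                   one_exact_match=False)
```

In this example, record D falls below the duplicate threshold and is ignored in both cases; the setting of OneExactMatch only changes how records A and B are handled.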
The EUID generator controls how EUIDs are created for each unique record in the master index database. For the default EUID generator, you can define three parameters.
IdLength
This parameter defines the length of the EUIDs created by the master index application. By default, the length of the EUID columns in the master index database is 20. If you choose an ID length larger than 20, make sure to manually modify the length of the EUID columns in the database creation scripts.
ChecksumLength
The ChecksumLength parameter allows you to specify the length of a checksum value. Checksum values help validate EUIDs to ensure accurate identification of records as they are transmitted throughout the system. The checksum process attaches a number, generated through an algorithm, to the end of a new EUID. When a host system receives this number, it strips off the checksum digits to obtain the EUID, and then recalculates the checksum using the same algorithm. If the checksum values agree, the host system knows the EUID number is correct. Specify “0” (zero) if you do not want to use the checksum function.
Using a checksum value affects the total length of the EUID. If you specify a checksum length greater than 0, the EUID generator creates sequential EUIDs based on the sbyn_seq_table table, and then appends the checksum value to the end of the EUID to determine the final EUID number. For example, if you set IdLength to 8 and ChecksumLength to 2, then the EUIDs assigned by the master index application will be 10 characters long. If the next sequence number is 10908000, the EUID assigned to the next record is 10908000 plus the checksum (it might be 1090800034, for example). The next EUID would be 10908001 plus the checksum (1090800125, for example). The first eight digits are sequential, but the last two digits are seemingly arbitrary.
If you use a checksum value, make sure to take into consideration the total length of the EUIDs (IdLength plus ChecksumLength) when determining the length of the EUID columns in the database.
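The append-then-verify cycle and the length calculation can be illustrated as follows. The checksum algorithm used by the default EUID generator is not described here, so toy_checksum below is a placeholder (a weighted digit sum) chosen only to make the example runnable; all function names are hypothetical.

```python
EUID_COLUMN_LENGTH = 20  # default length of the EUID columns, as stated above


def toy_checksum(sequence_id: str, checksum_length: int) -> str:
    """Placeholder checksum (weighted digit sum). NOT the product's algorithm."""
    if checksum_length == 0:
        return ""
    total = sum((i + 1) * int(digit) for i, digit in enumerate(sequence_id))
    return str(total % 10 ** checksum_length).zfill(checksum_length)


def build_euid(sequence_number: int, id_length: int, checksum_length: int) -> str:
    """Zero-pad the sequence number to IdLength and append the checksum digits."""
    sequence_id = str(sequence_number).zfill(id_length)
    if len(sequence_id) > id_length:
        raise ValueError("sequence number does not fit in IdLength digits")
    return sequence_id + toy_checksum(sequence_id, checksum_length)


def verify_euid(euid: str, checksum_length: int) -> bool:
    """Strip the checksum digits, recompute them, and compare."""
    if checksum_length == 0:
        return True
    sequence_id, received = euid[:-checksum_length], euid[-checksum_length:]
    return toy_checksum(sequence_id, checksum_length) == received


def fits_default_column(id_length: int, checksum_length: int) -> bool:
    """True if IdLength plus ChecksumLength fits the default EUID column."""
    return id_length + checksum_length <= EUID_COLUMN_LENGTH
```

For IdLength 8 and ChecksumLength 2, build_euid(10908000, 8, 2) returns a 10-character value whose first eight characters are the sequence number. An IdLength of 20 with a 2-digit checksum would exceed the default column length, so the database creation scripts would need to be adjusted.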
ChunkSize
For efficiency, the default EUID generator does not need to query the sbyn_seq_table table in the database each time a new EUID is created. Instead, you can specify a number of EUIDs to be allocated in chunks to the EUID generator. For example, if you specify a chunk size of 1000, EUIDs are allocated to the generator 1000 ID numbers at a time. The generator can process up to 1000 new records and assign all 1000 numbers without needing to query sbyn_seq_table. When all 1000 EUIDs are used, another 1000 are allocated. If the server running the master index application is reset before all 1000 numbers are used, the unused numbers are discarded and never used, meaning that EUIDs might not always be assigned sequentially.
Specifying a chunk size affects the numbering of the EUID column in the sbyn_seq_table. If you specify a chunk size of 1, then each time a new EUID is assigned, the value of the EUID column increases by one. If you specify a larger chunk size, then the value of the EUID column increases by the value of the chunk size each time the allocated EUIDs are used. For example, if you specify a chunk size of 1000, the beginning EUID sequence number is 1000, even though EUIDs are assigned beginning with 0001, then 0002, and so on. When the first 1000 EUIDs are assigned, another 1000 EUID numbers are allocated to the generator and the EUID column changes from 1000 to 2000.
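The chunk allocation behavior can be modeled with a small in-memory simulation. This is a simplified sketch, not the implementation of com.stc.eindex.idgen.impl.DefaultEuidGenerator: SequenceTable stands in for the EUID row of sbyn_seq_table, and both class names are hypothetical.

```python
class SequenceTable:
    """In-memory stand-in for the EUID row of sbyn_seq_table (hypothetical)."""

    def __init__(self):
        self.value = 0
        self.queries = 0  # how many times the "database" was hit

    def reserve(self, chunk_size: int) -> int:
        """Advance the stored value by chunk_size and return the new value."""
        self.queries += 1
        self.value += chunk_size
        return self.value


class ChunkedEuidGenerator:
    """Hands out sequential IDs, touching the table once per chunk."""

    def __init__(self, table: SequenceTable, chunk_size: int, id_length: int):
        self.table = table
        self.chunk_size = chunk_size
        self.id_length = id_length
        self._next = 1
        self._limit = 0  # nothing allocated yet

    def next_euid(self) -> str:
        if self._next > self._limit:
            # Current chunk is exhausted (or none was allocated): reserve another.
            self._limit = self.table.reserve(self.chunk_size)
            self._next = self._limit - self.chunk_size + 1
        euid = str(self._next).zfill(self.id_length)
        self._next += 1
        return euid
```

With a chunk size of 1000, the first call returns 0001 and moves the table value to 1000. Creating a new generator against the same table (simulating a server reset) starts at 1001 and moves the table value to 2000, so the unused numbers from the first chunk are skipped.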
The properties of the Manager Service are defined in the Threshold file in XML format. The information entered into the default configuration file is standard across all implementations, so the file will require some customization.
The following topics provide information about working with the Threshold file:
You can modify the Threshold file at any time, but you must regenerate the application and redeploy the project after making any changes to the file. Use caution when updating this file after moving into production, since changing certain properties, such as the blocking query, can cause unexpected matching and weighting results. Most of the configuration options in this file cannot be modified using the Configuration Editor. The exceptions are the match and duplicate thresholds. The possible modifications to this file are restricted by the schema definition, so be sure to validate the file after making any changes.
Table 10 lists each element in the Threshold file and provides a description of each element along with any requirements or constraints for each element.
Table 10 Threshold File Structure
| Element/Attribute | Description |
|---|---|
| MasterControllerConfig | The configuration class for the Manager Service. The attributes define the module name and Java class. The default values should not be changed. |
| logic-class | A custom plug-in that defines custom processing logic for the execute match functions that can be called from Collaborations and Business Processes. This element is optional. |
| logic-class-gui | A custom plug-in that defines custom processing logic for the execute match function that is called from the Enterprise Data Manager (EDM). This element is optional. |
| update-mode | An indicator of whether to recalculate potential duplicates when a record is updated. Specify Pessimistic to recalculate potential duplicates; specify Optimistic to prevent potential duplicate recalculation on updates. |
| merged-record-update | An indicator of whether records with a status of Merged can be updated. Specify Enabled to allow updates of merged records; specify Disabled to ensure that records with a Merged status are not updated. |
| execute-match | Specifies the blocking query to use for match processing. |
| query-builder name | The name of the blocking query to use for match processing. The name must match a query defined in the Candidate Select file. |
| | Optional parameters for the blocking query. Currently, parameters are not used by any predefined blocking queries. |
| key | A parameter for the blocking query. |
| | The value of the key specified by the corresponding key attribute. |
| DecisionMakerConfig | The configuration class for the Decision Maker. The attributes define the module name and Java class. The default values should not be changed. |
| decision-maker-class | The Java class that contains the methods used by the Decision Maker class. The default value, com.stc.eindex.decision.impl.DefaultDecisionMaker, should not need to be changed, but you can implement a custom Decision Maker class. The default class accepts the parameters described below. |
| parameters | A list of parameters for the Decision Maker class. |
| parameter | A definition of a Decision Maker parameter. The parameters element can contain multiple parameter elements, each defining one parameter. |
| | A brief description of the parameter. This element is optional. |
| parameter-name | The name of the parameter. The default Decision Maker class takes the following parameters: OneExactMatch, SameSystemMatch, DuplicateThreshold, and MatchThreshold (see Decision Maker for more information about these parameters). |
| parameter-type | The type of parameter. Valid values are java.lang.Long, java.lang.Short, java.lang.Byte, java.lang.String, java.lang.Integer, java.lang.Boolean, java.lang.Double, or java.lang.Float. |
| parameter-value | The value of the parameter. For OneExactMatch and SameSystemMatch, this must be a Boolean value. For MatchThreshold and DuplicateThreshold, this must be a Float value. |
| EuidGeneratorConfig | The configuration class for the EUID Generator. The attributes define the module name and Java class. The default values should not be changed. |
| euid-generator-class | The Java class used by the master index application to generate new EUIDs. The default class is com.stc.eindex.idgen.impl.DefaultEuidGenerator, which assigns sequential EUIDs based on the three parameters described below. |
| parameters | A list of parameters for the EUID Generator class. |
| parameter | A parameter definition. The parameters element can contain multiple parameter elements, each defining one parameter. |
| | A brief description of the parameter. This element is optional. |
| parameter-name | The name of the parameter. The default EUID Generator class takes the following parameters: IdLength, ChecksumLength, and ChunkSize (see EUID Generator for more information about these parameters). |
| parameter-type | The type of parameter. Valid values are java.lang.Long, java.lang.Short, java.lang.Byte, java.lang.String, java.lang.Integer, java.lang.Boolean, java.lang.Double, or java.lang.Float. |
| parameter-value | The value of the parameter. For the default parameters, the values are all integers. |
Below is a sample of the Threshold file configuration.
<MasterControllerConfig module-name="MasterController" parser-class="com.stc.eindex.configurator.impl.master.MasterControllerConfiguration">
  <logic-class>CustomMatchLogic</logic-class>
  <logic-class-gui>CustomMatchLogicEDM</logic-class-gui>
  <update-mode>Pessimistic</update-mode>
  <merged-record-update>Disabled</merged-record-update>
  <execute-match>
    <query-builder name="BLOCKER-SEARCH"></query-builder>
  </execute-match>
</MasterControllerConfig>
<DecisionMakerConfig module-name="DecisionMaker" parser-class="com.stc.eindex.configurator.impl.decision.DecisionMakerConfiguration">
  <decision-maker-class>com.stc.eindex.decision.impl.DefaultDecisionMaker</decision-maker-class>
  <parameters>
    <parameter>
      <parameter-name>OneExactMatch</parameter-name>
      <parameter-type>java.lang.Boolean</parameter-type>
      <parameter-value>false</parameter-value>
    </parameter>
    <parameter>
      <parameter-name>SameSystemMatch</parameter-name>
      <parameter-type>java.lang.Boolean</parameter-type>
      <parameter-value>true</parameter-value>
    </parameter>
    <parameter>
      <parameter-name>DuplicateThreshold</parameter-name>
      <parameter-type>java.lang.Float</parameter-type>
      <parameter-value>7.25</parameter-value>
    </parameter>
    <parameter>
      <parameter-name>MatchThreshold</parameter-name>
      <parameter-type>java.lang.Float</parameter-type>
      <parameter-value>29.0</parameter-value>
    </parameter>
  </parameters>
</DecisionMakerConfig>
<EuidGeneratorConfig module-name="EuidGenerator" parser-class="com.stc.eindex.configurator.impl.idgen.EuidGeneratorConfiguration">
  <euid-generator-class>com.stc.eindex.idgen.impl.DefaultEuidGenerator</euid-generator-class>
  <parameters>
    <parameter>
      <parameter-name>IdLength</parameter-name>
      <parameter-type>java.lang.Integer</parameter-type>
      <parameter-value>10</parameter-value>
    </parameter>
    <parameter>
      <parameter-name>ChecksumLength</parameter-name>
      <parameter-type>java.lang.Integer</parameter-type>
      <parameter-value>0</parameter-value>
    </parameter>
    <parameter>
      <parameter-name>ChunkSize</parameter-name>
      <parameter-type>java.lang.Integer</parameter-type>
      <parameter-value>1000</parameter-value>
    </parameter>
  </parameters>
</EuidGeneratorConfig>
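When reviewing a Threshold file outside the Configuration Editor, it can help to read the parameter elements into typed values. The following sketch uses Python's standard xml.etree.ElementTree module on an excerpt of the sample above. The enclosing Configuration root element and the read_parameters helper are assumptions made so the excerpt parses as one document; the final comparison of the two thresholds is a sanity check suggested by the parameter descriptions, not a rule enforced by the schema.

```python
import xml.etree.ElementTree as ET

# Converters for the parameter-type values listed in Table 10.
CONVERTERS = {
    "java.lang.Boolean": lambda text: text.lower() == "true",
    "java.lang.Byte": int,
    "java.lang.Short": int,
    "java.lang.Integer": int,
    "java.lang.Long": int,
    "java.lang.Float": float,
    "java.lang.Double": float,
    "java.lang.String": str,
}


def param(name, type_name, value):
    """Build one <parameter> element as text (helper for the excerpt below)."""
    return (f"<parameter><parameter-name>{name}</parameter-name>"
            f"<parameter-type>java.lang.{type_name}</parameter-type>"
            f"<parameter-value>{value}</parameter-value></parameter>")


# Excerpt of the sample, wrapped in a hypothetical <Configuration> root.
SAMPLE = (
    "<Configuration>"
    "<DecisionMakerConfig module-name='DecisionMaker'><parameters>"
    + param("OneExactMatch", "Boolean", "false")
    + param("SameSystemMatch", "Boolean", "true")
    + param("DuplicateThreshold", "Float", "7.25")
    + param("MatchThreshold", "Float", "29.0")
    + "</parameters></DecisionMakerConfig>"
    "<EuidGeneratorConfig module-name='EuidGenerator'><parameters>"
    + param("IdLength", "Integer", "10")
    + param("ChecksumLength", "Integer", "0")
    + param("ChunkSize", "Integer", "1000")
    + "</parameters></EuidGeneratorConfig>"
    "</Configuration>"
)


def read_parameters(root, section):
    """Return {parameter-name: typed value} for one configuration section."""
    params = {}
    for element in root.iterfind(f"{section}/parameters/parameter"):
        name = element.findtext("parameter-name", "").strip()
        type_name = element.findtext("parameter-type", "").strip()
        raw = element.findtext("parameter-value", "").strip()
        params[name] = CONVERTERS[type_name](raw)
    return params


root = ET.fromstring(SAMPLE)
decision = read_parameters(root, "DecisionMakerConfig")
euid = read_parameters(root, "EuidGeneratorConfig")
# Sanity check (an assumption): the duplicate threshold should not exceed
# the match threshold.
thresholds_ordered = decision["DuplicateThreshold"] <= decision["MatchThreshold"]
```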
The Matching Service, configured in the Match Field file, contains the matching and standardization engines used in the match process, as well as the phonetic encoders used for phonetically encoding data. You can configure the match and standardization engines for the master index application in the Match Field file, and also specify special standardization, matching, and weighting logic used by the engines. This file also defines the strategy for identifying unique records and finding the best matches in the master index database. For optimization, the Match Field components are configurable, allowing you to choose the strategy that best fits your requirements or to implement your own custom components.
The following topics describe the components of the Matching Service and the structure of the Match Field file:
The Matching Service is configured by the Match Field file, which defines the configurable properties for standardizing data and matching records. These processes are highly configurable for the master index application, allowing you to design and develop the match strategy that best suits your processing requirements.
The following components make up the Matching Service:
Standardization of incoming data applies three functions to the data processed by the master index application: reformatting (or parsing), normalization, and phonetic encoding. These functions help prepare data for matching and searching. Some fields might require all three steps, some just normalization and phonetic conversion, and other data might only need phonetic encoding. You can specify which fields require any of these steps in the standardization configuration section of the Match Field file. In addition, you can specify the nationality of the data being standardized by the Sun Match Engine.
The three stages of standardization include the following:
Data Reformatting
If incoming records contain data that is not formatted properly, it must be reformatted before it can be normalized. One good example of this is free-form text address fields. If you are matching or searching on street addresses that are contained in one or more free-form text fields (that is, the street address is contained in one field, apartment number in another, and so on), that field must be parsed into its individual components (house number, street name, street type, and so on) before the data can be normalized.
Data Normalization
When you normalize data, the data is converted into a standard form. A common use for normalization is to convert nicknames into their standard names, such as converting “Rich” to “Richard” or “Meg” to “Margaret”. Another example is normalizing street address components. For example, “Dr.” or “Drv” in a street address might be normalized to “Drive”. Normalized values are obtained from lookup tables.
Phonetic Encoding
Once data has gone through any necessary reformatting and normalization, it can be phonetically encoded. Phonetic values are generally used in blocking queries in order to obtain all possible matches to an incoming record. They are also used to perform searches from the EDM that allow for misspellings and typographic errors. Typically, first names use Soundex encoding and last names and street names use NYSIIS encoding.
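The normalization and phonetic encoding stages can be illustrated with a short sketch. The lookup tables below contain only the examples mentioned above, and the soundex function is a generic American Soundex implementation. Neither is taken from the Sun Match Engine, whose lookup tables and encoders might behave differently.

```python
# Tiny lookup tables built from the examples in the text (illustrative only).
FIRST_NAME_LOOKUP = {"RICH": "RICHARD", "MEG": "MARGARET"}
STREET_TYPE_LOOKUP = {"DR": "DRIVE", "DRV": "DRIVE"}

_SOUNDEX_GROUPS = {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT", "4": "L", "5": "MN", "6": "R"}
_SOUNDEX_CODES = {letter: digit
                  for digit, letters in _SOUNDEX_GROUPS.items()
                  for letter in letters}


def normalize(value: str, lookup: dict) -> str:
    """Return the standard form from the lookup table, or the cleaned input."""
    key = value.strip().rstrip(".").upper()
    return lookup.get(key, key)


def soundex(name: str) -> str:
    """Generic American Soundex: first letter plus three digits."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    encoded = [letters[0]]
    previous = _SOUNDEX_CODES.get(letters[0], "")
    for letter in letters[1:]:
        if letter in "HW":
            continue  # H and W do not separate letters with the same code
        code = _SOUNDEX_CODES.get(letter, "")
        if code and code != previous:
            encoded.append(code)
        previous = code  # vowels reset the previous code
    return "".join(encoded)[:4].ljust(4, "0")
```

Normalizing before encoding means that “Meg” and “Margaret” end up with the same phonetic value, which is what allows a blocking query on the phonetic field to retrieve both.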
The MatchingConfig section of the Match Field file allows you to define the data fields that are sent to the match engine (called the match string). Probabilistic weighting is performed only against the fields you specify as the match columns. You can specify any field in the object structure as a match column as long as the match engine is configured to use all fields specified. You must specify at least one match field.
The configuration of this section of the Match Field file is specific to the match engine you are using and the types of fields on which you are matching. For more information about how the matching should be configured for the Sun Match Engine, see Understanding the Sun Match Engine.
The match and standardization engines control the processes of standardizing data and generating matching probability weights between records. Sun Master Index provides the ability to use the standardization and match engines that best suit your indexing requirements. You can configure the master index application to use the Sun Match Engine, or you can configure the index to use a customized engine of your choice.
These engines perform two functions:
Standardize data to a common format
Calculate the likelihood that two objects match
The engines are called during match processing, when the master index application retrieves the best matches during a weighted search from the EDM or when the master index application checks for duplicate records during an insert or update from the EDM or an external system.
The block picker and pass controller define how the blocking query is executed during the match process. By default, the matching process is executed in multiple stages. Each configured block that defines query criteria is executed and evaluated separately (each query block execution and evaluation is referred to as a match pass). After a block is evaluated, the pass controller determines whether the results found are sufficient or whether matching should continue with another match pass.
The block picker chooses the block definition to use for each match pass. Block definitions define the criteria for each query that checks the database for a subset of the records to be used for matching. The block picker has access to the match results from previous match passes, as well as lists of applicable block definitions that have been executed and of those that have not been executed.
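The division of labor between the block picker and the pass controller can be sketched as a loop. Everything in this example is hypothetical: the function signatures, the in-memory DATABASE, the placeholder phonetic codes, and the scoring rule are illustrative only and do not represent the interfaces of the classes configured in the MEFAConfig element.

```python
DATABASE = [
    {"id": "E1", "first_phon": "R163", "last_phon": "SNAT", "dob": "1970-01-01"},
    {"id": "E2", "first_phon": "R163", "last_phon": "JANS", "dob": "1982-05-17"},
    {"id": "E3", "first_phon": "M620", "last_phon": "SNAT", "dob": "1970-01-01"},
]

# Each block definition lists the fields that must be equal for a record
# to be retrieved in that match pass.
BLOCKS = [("first_phon", "last_phon"), ("last_phon", "dob")]


def fetch_block(block, record):
    """Blocking query: return the subset of records matching the block criteria."""
    return [row for row in DATABASE if all(row[f] == record[f] for f in block)]


def score(record, candidate):
    """Toy match weight: one point per equal field."""
    return sum(1.0 for f in ("first_phon", "last_phon", "dob")
               if record[f] == candidate[f])


def pick_next_block(results, executed, remaining):
    """Block picker: here, simply take the next unexecuted block."""
    return remaining[0] if remaining else None


def continue_until_strong_match(results, executed, remaining):
    """Pass controller: stop once any candidate scores 3.0 or more."""
    return not any(weight >= 3.0 for weight in results.values())


def run_match_passes(record, blocks, pick_block, should_continue):
    results, executed, remaining = {}, [], list(blocks)
    while remaining:
        block = pick_block(results, executed, remaining)
        if block is None:
            break
        remaining.remove(block)
        executed.append(block)
        for candidate in fetch_block(block, record):
            weight = score(record, candidate)
            results[candidate["id"]] = max(weight, results.get(candidate["id"], weight))
        if not should_continue(results, executed, remaining):
            break
    return results
```

With a pass controller that stops on a strong match, the loop can end after the first pass; a controller that always continues executes every block and accumulates candidates from all of them.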
Sun Master Index provides extensible phonetic encoding capabilities, which are typically used to retrieve records with similar field values from the database for matching. By default, several phonetic encoders are defined to be used in the master index application. Typically, Soundex is used to encode first names (or SoundexFR for first names in the France national domain) and NYSIIS to encode last names. When using the Sun Match Engine, you can specify different types of phonetic encoders, such as Metaphone, Double Metaphone, and Refined Soundex. When you specify the fields in the standardization configuration to be phonetically encoded, you can select one of the encoders defined in the phonetic encoders section.
The following steps illustrate one possible processing sequence that occurs when data is received from an external system and processed by the master index application.
1. A record is received from an external system.
2. The local ID does not yet exist in the master index application; initiate the standardization and matching process.
3. Standardize the record to a common format.
   1. Standardize free-form text.
   2. Normalize fields that need to be converted to a common format.
   3. Phonetically encode fields that are commonly misspelled or spelled in different ways.
4. Match the record against entries in the database.
   1. Use the selected blocking query (specified in the Threshold file) to retrieve a block of records that might match the new record.
   2. Build and execute the query according to the input record.
   3. Calculate match scores comparing the incoming record against existing records (this is done by the match engine).
   4. Determine whether to repeat the matching process with another block of records, based on the MEFAConfig element in the Match Field file.
   5. Return match scores for further processing.
5. Determine whether to add the system record to an existing EUID record or to insert the system record as a new EUID record (based on the parameters defined in the DecisionMaker element of the Threshold file).
The properties for the match and standardization process are defined in the Match Field file in XML format. Some of the information entered into the default configuration file is taken from the wizard, but the file might require additional customization in order to meet your data processing needs.
The following topics provide information about working with the Match Field file:
You can modify the Match Field file at any time, but modifying the file is not recommended once you move to production because this file defines how records are processed and data integrity is maintained. You must regenerate the application and redeploy the project after making any changes to this file. Modifying this file once you are in production might cause weighting and standardization to be handled differently, causing unexpected match weight results.
Most of the components configured by this file can be modified using the Configuration Editor. The editor provides a graphical interface that simplifies defining normalization, standardization, matching, and phonetic encoding. It also maintains referential integrity between files in cases where standardization, normalization, or phonetic encoding requires additional fields to be added to the object structure. The possible modifications to this file are restricted by the schema definition, so be sure to validate the file after making any changes.
Table 11 lists each element in the Match Field file and provides a description of each element along with any requirements or constraints for each element.
Table 11 Match Field File Structure
| Element/Attribute | Description |
|---|---|
| StandardizationConfig | The configuration information for fields to be standardized. It consists of several structures that define standardization rules for a set of fields. The StandardizationConfig attributes define the module name and Java class, and their default values should not be changed. |
| standardize-system-object | A standardization structure that defines configuration rules, including normalization, parsing, and phonetic encoding. Each standardization structure contains three primary elements: structures-to-normalize, free-form-texts-to-standardize, and phoneticize-fields. These elements are all required; however, any of them can be empty. |
| system-object-name | The name of the object containing the fields defined for standardization. Specifying the parent object allows you to specify any field in any object for standardization. You can also create multiple standardization structures and specify a different object for each structure. |
| structures-to-normalize | The configuration information for fields that require normalization (but not parsing or reformatting) before being processed by the standardization engine. |
| group | The national domain, source fields, and target fields for one normalization unit. You can define multiple group elements. |
| standardization-type | The type of standardization to perform on the source fields. This is specific to the type of data being processed and the standardization engine being used. For more information about Sun Match Engine types, see Understanding the Sun Match Engine. |
| domain-selector | The Java class used by the Sun Match Engine to determine the nationality of the data being processed. If no selector is specified, the default is US. Possible values for the Sun Match Engine include the following: |
| locale-field-name | The ePath to an identifying field in the object structure that indicates which of the defined locale-codes definitions to use. If no field is specified, the standardization engine defaults to the United States domain. This field must be contained in the object that contains the fields defined for normalization in this structure. |
| locale-maps | A list of locale codes that define how the standardization engine determines which national domain to use. |
| locale-codes | A list of value and locale pairs that indicate the national domain to use based on the value of the identifying field in an incoming message (specified by the locale-field-name). |
| value | A value that, when contained in the identifying field, indicates that the standardization engine will use the corresponding locale element to determine which national domain to use to standardize the data. To specify a default domain, enter “Default” in this element. |
| locale | A domain code indicating which national domain to use to standardize data when the identifying field value in a transaction matches the corresponding value element. The supported locale codes for the Sun Match Engine include the following: |
| unnormalized-source-fields | A list of source fields to be normalized. |
| source-mapping | The configuration information for one field in the list of source fields to be normalized. |
| unnormalized-source-field-name | The ePath of the source field to normalize in the system object (for example, Person.FirstName). |
| standardized-object-field-id | An identification code that identifies the field to normalize to the standardization engine. This ID is specific to the standardization engine in use and must correspond to a field ID defined by that engine. For more information, see Understanding the Sun Match Engine. |
| normalization-targets | A list of destination fields to hold the normalized data. |
| target-mapping | The configuration information for one field in the list of destination fields. |
| standardized-object-field-id | An identification code that identifies the normalized field to the standardization engine. This is specific to the standardization engine in use and must correspond to a field ID defined by that engine. For more information, see Understanding the Sun Match Engine. |
| standardized-target-field-name | The ePath of the target field in which the normalized value is saved in the system object (for example, Person.Alias[*].StdLastName). |
| free-form-texts-to-standardize | The configuration information for fields that require parsing or reformatting and, optionally, normalization, before being processed by the standardization engine. |
| group | The configuration information for the national domain and the source and target fields for one standardization unit. You can define multiple group elements. |
| standardization-type | The type of standardization to perform on the source fields. This is specific to the standardization engine being used and the type of data being processed. For more information, see Understanding the Sun Match Engine. |
| domain-selector | The Java class used by the Sun Match Engine to determine the nationality of the data being processed. Possible values are listed below. If no selector is specified, the default is US. |
| locale-field-name | The ePath to an identifying field in the object structure that indicates which of the defined locale-codes definitions to use. If this element is not defined, the standardization engine defaults to the United States domain. This field must be contained in the object that contains the fields defined for standardization in this structure. |
| locale-maps | A list of locale codes that define how the standardization engine determines which national domain to use. |
| locale-codes | A list of value and locale pairs that indicate the national domain to use based on the value of the identifying field in an incoming message (specified by the locale-field-name). |
| value | A value that, when contained in the identifying field, indicates that the standardization engine will use the corresponding locale element to determine which national domain to use to standardize the data. To specify a default domain, enter “Default” in this element. |
| locale | A domain code indicating which national domain to use to standardize data when the identifying field value in a transaction matches the corresponding value element. Supported locale codes for the Sun Match Engine are listed below. |
| unstandardized-source-fields | A list of fields to be standardized. |
| unstandardized-source-field-name | A field to be standardized. If you define more than one source field in the same standardization unit, the fields are concatenated during standardization with a pipe character (\|) between lines (for the Sun Match Engine). |
| standardization-targets | A list of fields in which the standardized data from the source fields is stored. |
| target-mapping | The configuration information for one destination field in which standardized data from the source field will be stored. One source field will likely have several destination fields. |
| standardized-object-field-id | An abbreviation that identifies the destination field to the standardization engine. This must correspond to a field ID defined by the standardization engine being used. For more information, see Understanding the Sun Match Engine. |
| standardized-target-field-name | The ePath of the destination field in the object where the standardized value will be saved (for example, Person.Address[*].StreetName). |
| phoneticize-fields | A list of fields to be phonetically encoded. |
| phoneticize-field | The configuration information for each field to be phonetically encoded, including the encoder to use. |
| unphoneticized-source-field-name | The ePath of the source field in the system object from which the value to phonetically encode will be retrieved (for example, Person.Address[*].StreetName). Note: This can refer to the original field or to a standardized or normalized field. |
| phoneticized-target-field-name | The ePath of the field in which the phonetically encoded value will be saved in the system object. |
| | A field ID to identify the field to the phonetic encoder. This is not currently used with the Sun Match Engine. |
| encoding-type | The phonetic encoder to use for this field. This must correspond to the encoding-type configured for the desired encoder in the PhoneticEncodersConfig element. |
| MatchingConfig | The configuration information for the match string (that is, the fields that are included in the data string sent to the match engine and against which weighting is performed). The attributes of the MatchingConfig element define the module name and Java class, and their default values should not be changed. |
| match-system-object | The configuration and field definitions for the match string. |
| object-name | The name of the object containing the fields in the match string. If you specify the parent object, you can specify fields from the parent and any child object in the match string. |
| match-columns | A list of fields in the match string. This element contains multiple match-column elements. |
| match-column | The configuration information for one field in the match string. You will use multiple match-column elements. |
| column-name | The fully qualified field name that defines the location of each field on which to match (for example, Enterprise.SystemSBR.Person.Address.City). |
| match-type | The type of matching performed on the specified field. This is an ID that is specific to the match engine and identifies the field to the match engine. This value must correspond to a match type defined for the match engine. |
| | An integer specifying the order in which the field appears in the match string. This element is optional. If no order is specified, matching is performed in the order in which the fields are listed. |
| MEFAConfig | The configuration information for the components of the matching service. The MEFAConfig attributes (module-name and parser-class) define the module name and Java class, and their default values should not be changed. You should only change the names of the component classes in this section if you created a corresponding custom component. |
| block-picker | The configuration information for the Java class that chooses which block of criteria defined for the blocking query to use for each match pass. |
| class-name | The name of the block picker Java class. |
| pass-controller | The configuration information for the Java class that determines whether the blocking query should continue performing match passes after each match pass is complete. |
| class-name | The name of the pass controller Java class. |
| standardizer-api | The configuration information for the standardization engine to use. |
| class-name | The name of the standardizer API Java class. |
| standardizer-config | The configuration information for the Java class that provides configuration information to the standardization engine. |
| class-name | The name of the standardizer configuration Java class. |
| matcher-api | The configuration information for the match engine to use. |
| class-name | The name of the match engine API Java class. |
| matcher-config | The configuration information for the Java class that provides configuration information to the match engine. |
| class-name | The name of the match engine configuration Java class. |
| PhoneticEncodersConfig | The configuration information for the phonetic encoders used by the master index application. The attributes (module-name and parser-class) define the module name and Java class. The default values should not be changed. |
| | A list of phonetic encoders used by the standardization engine. |
| encoding-type | The name of the phonetic encoder, such as NYSIIS, Soundex, or Metaphone. |
| | The fully qualified name of the Java class that determines the behavior of the phonetic encoder. The following default classes are defined for the Sun Match Engine. |
Below is a short sample of the Match Field file based on a master index application processing person data. This sample covers the basic elements of the Match Field file, but a production environment would contain several more fields to standardize as well as several additional match string fields.
<StandardizationConfig module-name="Standardization"
    parser-class="com.stc.eindex.configurator.impl.standardization.StandardizationConfiguration">
  <standardize-system-object>
    <system-object-name>Person</system-object-name>
    <structures-to-normalize>
      <group standardization-type="PersonName"
          domain-selector="com.stc.eindex.matching.impl.SingleDomainSelectorUS">
        <unnormalized-source-fields>
          <source-mapping>
            <unnormalized-source-field-name>Person.Alias[*].FirstName</unnormalized-source-field-name>
            <standardized-object-field-id>FirstName</standardized-object-field-id>
          </source-mapping>
          <source-mapping>
            <unnormalized-source-field-name>Person.Alias[*].LastName</unnormalized-source-field-name>
            <standardized-object-field-id>LastName</standardized-object-field-id>
          </source-mapping>
        </unnormalized-source-fields>
        <normalization-targets>
          <target-mapping>
            <standardized-object-field-id>FirstName</standardized-object-field-id>
            <standardized-target-field-name>Person.Alias[*].StdFirstName</standardized-target-field-name>
          </target-mapping>
          <target-mapping>
            <standardized-object-field-id>LastName</standardized-object-field-id>
            <standardized-target-field-name>Person.Alias[*].StdLastName</standardized-target-field-name>
          </target-mapping>
        </normalization-targets>
      </group>
      <group standardization-type="PersonName"
          domain-selector="com.stc.eindex.matching.impl.SingleDomainSelectorUS">
        <unnormalized-source-fields>
          <source-mapping>
            <unnormalized-source-field-name>Person.FirstName</unnormalized-source-field-name>
            <standardized-object-field-id>FirstName</standardized-object-field-id>
          </source-mapping>
          <source-mapping>
            <unnormalized-source-field-name>Person.LastName</unnormalized-source-field-name>
            <standardized-object-field-id>LastName</standardized-object-field-id>
          </source-mapping>
        </unnormalized-source-fields>
        <normalization-targets>
          <target-mapping>
            <standardized-object-field-id>FirstName</standardized-object-field-id>
            <standardized-target-field-name>Person.StdFirstName</standardized-target-field-name>
          </target-mapping>
          <target-mapping>
            <standardized-object-field-id>LastName</standardized-object-field-id>
            <standardized-target-field-name>Person.StdLastName</standardized-target-field-name>
          </target-mapping>
        </normalization-targets>
      </group>
    </structures-to-normalize>
    <free-form-texts-to-standardize>
      <group standardization-type="Address"
          domain-selector="com.stc.eindex.matching.impl.MultiDomainSelector">
        <locale-field-name>Person.Country</locale-field-name>
        <locale-maps>
          <locale-codes>
            <value>Default</value>
            <locale>US</locale>
          </locale-codes>
        </locale-maps>
        <unstandardized-source-fields>
          <unstandardized-source-field-name>Person.Address[*].AddressLine1</unstandardized-source-field-name>
          <unstandardized-source-field-name>Person.Address[*].AddressLine2</unstandardized-source-field-name>
        </unstandardized-source-fields>
        <standardization-targets>
          <target-mapping>
            <standardized-object-field-id>HouseNumber</standardized-object-field-id>
            <standardized-target-field-name>Person.Address[*].HouseNumber</standardized-target-field-name>
          </target-mapping>
          <target-mapping>
            <standardized-object-field-id>MatchStreetName</standardized-object-field-id>
            <standardized-target-field-name>Person.Address[*].StreetName</standardized-target-field-name>
          </target-mapping>
          <target-mapping>
            <standardized-object-field-id>StreetNamePrefDirection</standardized-object-field-id>
            <standardized-target-field-name>Person.Address[*].StreetDir</standardized-target-field-name>
          </target-mapping>
          <target-mapping>
            <standardized-object-field-id>StreetNameSufType</standardized-object-field-id>
            <standardized-target-field-name>Person.Address[*].StreetType</standardized-target-field-name>
          </target-mapping>
        </standardization-targets>
      </group>
    </free-form-texts-to-standardize>
    <phoneticize-fields>
      <phoneticize-field>
        <unphoneticized-source-field-name>Person.FirstName_Std</unphoneticized-source-field-name>
        <phoneticized-target-field-name>Person.FirstName_Phon</phoneticized-target-field-name>
        <encoding-type>Soundex</encoding-type>
      </phoneticize-field>
      <phoneticize-field>
        <unphoneticized-source-field-name>Person.LastName_Std</unphoneticized-source-field-name>
        <phoneticized-target-field-name>Person.LastName_Phon</phoneticized-target-field-name>
        <encoding-type>NYSIIS</encoding-type>
      </phoneticize-field>
      <phoneticize-field>
        <unphoneticized-source-field-name>Person.Address[*].StreetName</unphoneticized-source-field-name>
        <phoneticized-target-field-name>Person.Address[*].StreetNamePhoneticCode</phoneticized-target-field-name>
        <encoding-type>NYSIIS</encoding-type>
      </phoneticize-field>
    </phoneticize-fields>
  </standardize-system-object>
</StandardizationConfig>
<MatchingConfig module-name="Matching"
    parser-class="com.stc.eindex.configurator.impl.matching.MatchingConfiguration">
  <match-system-object>
    <object-name>Person</object-name>
    <match-columns>
      <match-column>
        <column-name>Enterprise.SystemSBR.Person.StdFirstName</column-name>
        <match-type>FirstName</match-type>
      </match-column>
      <match-column>
        <column-name>Enterprise.SystemSBR.Person.StdLastName</column-name>
        <match-type>LastName</match-type>
      </match-column>
      <match-column>
        <column-name>Enterprise.SystemSBR.Person.DOB</column-name>
        <match-type>DOB</match-type>
      </match-column>
    </match-columns>
  </match-system-object>
</MatchingConfig>
<MEFAConfig module-name="MEFA"
    parser-class="com.stc.eindex.configurator.impl.MEFAConfiguration">
  <block-picker>
    <class-name>com.stc.eindex.matching.impl.PickAllBlocksAtOnce</class-name>
  </block-picker>
  <pass-controller>
    <class-name>com.stc.eindex.matching.impl.PassAllBlocks</class-name>
  </pass-controller>
  <standardizer-api>
    <class-name>com.stc.eindex.matching.adapter.SbmeStandardizerAdapter</class-name>
  </standardizer-api>
  <standardizer-config>
    <class-name>com.stc.eindex.matching.adapter.SbmeStandardizerAdapterConfig</class-name>
  </standardizer-config>
  <matcher-api>
    <class-name>com.stc.eindex.matching.adapter.SbmeMatcherAdapter</class-name>
  </matcher-api>
  <matcher-config>
    <class-name>com.stc.eindex.matching.adapter.SbmeMatcherAdapterConfig</class-name>
  </matcher-config>
</MEFAConfig>
<PhoneticEncodersConfig module-name="PhoneticEncoders"
    parser-class="com.stc.eindex.configurator.impl.PhoneticEncodersConfig">
  <encoder>
    <encoding-type>NYSIIS</encoding-type>
    <encoder-implementation-class>com.stc.eindex.phonetic.impl.Nysiis</encoder-implementation-class>
  </encoder>
  <encoder>
    <encoding-type>Soundex</encoding-type>
    <encoder-implementation-class>com.stc.eindex.phonetic.impl.Soundex</encoder-implementation-class>
  </encoder>
</PhoneticEncodersConfig>
The Update Manager contains the logic used to generate the single best record (SBR) for a given object. The SBR is defined by a mapping of fields from external systems to the SBR, allowing you to define the fields from each system that are kept in the SBR. For each field in the SBR, an ePath denotes the location in the external system records from which the value is retrieved. Since there can be many external systems, you can optionally specify a strategy to select the SBR field from the list of external values. You can also specify any additional fields that might be required by the selection strategy to determine which external system contains the best data (by default, the record’s update date and time is always taken into account). The Update Manager also specifies any custom Java classes to be used for different types of update transactions, such as merges, unmerges, changes to existing records, and new record inserts.
The Update Manager is configured in the Best Record file. The following topics describe the Update Manager and the Best Record file.
The survivor calculator generates and updates the SBR for each record. The SBR for an enterprise object is created from what is considered to be the most reliable information contained in each system record for a particular object. The information used from each local system to populate the SBR is determined by the survivor calculator defined in the Update Manager. The fields defined in the survivor calculator are also the fields contained in the SBR. You can configure the survivor calculator to determine the best fields for the SBR from a combination of all the source system records. The survivor calculator can consider factors such as the relative reliability of a system, how recent the data is, and whether data entered from the EDM overwrites data entered from any other system.
The survivor calculator consists of the rules defined for the survivor helper and the weighted calculator.
Phonetic and standardized fields do not need to be defined in the Best Record file since their field values are determined by the standardization engine for the SBR.
The logic that determines how the fields in the SBR are populated and how certain updates are performed is highly configurable in a master index application, allowing you to design and develop the match strategy that best suits your processing requirements.
The survivor helper defines a list of fields on which survivor calculation is performed, and thus the list of fields included in the SBR. Each field is called a candidate field. For each candidate field, you specify whether to use the default survivor calculation strategy or a custom strategy. The survivor helper must list each field contained in the SBR; any fields that are not listed here will not be populated in the SBR.
For each field, you can specify system fields to be taken into consideration as well as a specific survivorship strategy. Sun Master Index provides the following three basic strategies to determine survivorship for each field; you can also define and implement custom strategies.
Default Strategy
Weighted Strategy
Union Strategy
The default survivor strategy maps fields directly from the local system records to the SBR. When you specify the default survivor strategy for a field, you must also specify the parameter that defines the source system. For example, if you specify the default survivor strategy for the field “Person.LastName” and define the preferred system as “SystemA”, the last name field in the SBR is always taken from SystemA (unless the value is overridden in the EDM).
The default survivor strategy is com.stc.eindex.survivor.impl.DefaultSurvivorStrategy.
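As a rough illustration of the example above, a candidate field that always takes its value from SystemA might be configured along the following lines. This is only a sketch: the element names come from the Best Record file structure, but the parameter name preferredSystem and the placement of a parameters list inside the survivor-strategy element are assumptions made for illustration, so verify both against the schema before using them.

```xml
<!-- Sketch only: "preferredSystem" is an assumed parameter name, and
     nesting <parameters> under <survivor-strategy> is also an assumption. -->
<candidate-field name="Person.LastName">
  <survivor-strategy>
    <strategy-class>com.stc.eindex.survivor.impl.DefaultSurvivorStrategy</strategy-class>
    <parameters>
      <parameter>
        <parameter-name>preferredSystem</parameter-name>
        <parameter-type>java.lang.String</parameter-type>
        <!-- The source system code whose value is copied to the SBR -->
        <parameter-value>SystemA</parameter-value>
      </parameter>
    </parameters>
  </survivor-strategy>
</candidate-field>
```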
The weighted strategy is the most complex survivor strategy, and uses a combination of weighted calculations to determine the most reliable source of data for each field. This strategy is highly customizable, and you can define which calculation or set of calculations to use for each field. The calculations can be based on the update date of the data, system reliability, and agreement between systems. In the default configuration, the calculations are defined in the WeightedCalculator section of the Best Record file.
The weighted survivor strategy is com.stc.eindex.survivor.impl.WeightedSurvivorStrategy. You can define general weighted calculations to be performed by default for each field, and you can define specialized calculations to be performed for specific fields.
The union strategy combines the data from all source systems to populate the fields in the SBR for which this strategy is specified. For example, if you store aliases for person names in the database, you might want to store all possible alias records and not just the “best” alias information. To do this, specify the union strategy for the alias object so that all alias information from all source systems is stored in the SBR.
The union strategy is applied to entire objects rather than to fields. This strategy combines all child objects from an enterprise object’s source systems to populate the SBR. If the source systems contain two or more instances of a child object with the same unique key (such as two home telephone numbers), the union strategy only populates the most current child object in the SBR. For example, if the union strategy is assigned to the address object and each address object is identified by a unique key (such as the address type), the SBR only contains the most current address record of each address type (for example, one home address, one office address, and so on).
The union strategy is com.stc.eindex.survivor.impl.UnionSurvivorStrategy.
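For the address example described above, the candidate field entry might look like the following sketch. It assumes the Person object has a child object named Address and mirrors the way the Alias object is assigned the union strategy in the sample Best Record file.

```xml
<!-- Sketch: assigns the union strategy to every field of an assumed
     Address child object, so all addresses with distinct unique keys
     from all source systems are carried into the SBR. -->
<candidate-field name="Person.Address[*].*">
  <survivor-strategy>
    <strategy-class>com.stc.eindex.survivor.impl.UnionSurvivorStrategy</strategy-class>
  </survivor-strategy>
</candidate-field>
```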
By default, the weighted calculator implements the weighted strategy defined above. Use the WeightedCalculator section to define conditions and weights that determine the best information with which to populate the SBR. The weighted calculator selects a single value for the SBR from a set of system fields. The selection process is based on the different qualities defined for each field.
The weighted calculator defines two sets of rules. The default rules apply to all fields in a record except those fields for which rules are specifically defined. The candidate rules only apply to those fields for which they are specifically defined. If you modify the default rules, the changes will apply to all fields except the fields for which candidate rules are defined.
You can define several strategies to help the weighted calculator determine the best information to populate into each field of the SBR. Each of these strategies is defined by a quality, a preference, and a utility. The quality defines the type of weighted calculation to perform, the preference indicates the source being rated, and the utility indicates the reliability. You can define multiple strategies for each field, and a linear summation on the utility score of each strategy determines the best value to populate in the SBR field.
The weighted calculator strategies include:
SourceSystem
SystemAgreement
MostRecentModified
The SourceSystem strategy indicates the best source system for a field, and is used when the quality of the field in question depends on its origin. For example, to indicate that the data from SystemA for a specific field is of a higher quality than SystemB, define a SourceSystem quality for “SystemA” and one for “SystemB”. Then assign SystemA a higher utility value (85.0, for example) and SystemB a lower utility value (30.0, for example). This indicates that SystemA is a more reliable source for the field. If both SystemA and SystemB contain the specified field, the value from SystemA is populated into the SBR. If the field is empty in SystemA but the field in SystemB contains a value, then the value from SystemB is used.
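The SystemA and SystemB example can be sketched in the WeightedCalculator section as shown here. The field name Person.LastName is chosen only for illustration, and the system codes are the placeholder codes used in the text.

```xml
<!-- Sketch: SystemA is rated as the more reliable source for this field. -->
<candidate-field name="Person.LastName">
  <parameter>
    <quality>SourceSystem</quality>
    <preference>SystemA</preference>
    <utility>85.0</utility>
  </parameter>
  <parameter>
    <quality>SourceSystem</quality>
    <preference>SystemB</preference>
    <utility>30.0</utility>
  </parameter>
</candidate-field>
```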
The SystemAgreement strategy prorates the utility score based on the number of systems whose values for the specified field are in agreement. For example, if the first name field for SystemA is “John”, for SystemB is “John”, and for SystemC is “Jon”, SystemA and SystemB together receive two-thirds of the utility score, while SystemC only receives one-third. The value populated into the SBR is “John”. You do not need to define a preference for the SystemAgreement strategy, but you must define source systems.
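A SystemAgreement parameter might be sketched as follows. The field name and the utility of 60.0 are hypothetical values chosen to make the proration easy to follow, and the arithmetic in the comments simply applies the two-thirds and one-third split described above.

```xml
<!-- Sketch: no preference element is defined for SystemAgreement.
     With the hypothetical utility of 60.0 and three systems reporting
     "John", "John", and "Jon":
       "John" (2 of 3 systems) receives 60.0 * 2/3 = 40.0
       "Jon"  (1 of 3 systems) receives 60.0 * 1/3 = 20.0 -->
<candidate-field name="Person.FirstName">
  <parameter>
    <quality>SystemAgreement</quality>
    <utility>60.0</utility>
  </parameter>
</candidate-field>
```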
The MostRecentModified strategy ranks the field values from the source systems in descending order according to the time that the object was last modified. The value populated in the SBR comes from the most recently modified object. You do not need to define a preference for the MostRecentModified strategy, but you must define a utility.
The Update Manager policies specify custom Java classes that provide additional processing logic for each type of update transaction. By default, this additional processing is not defined in a standard master index application. You can define custom update policies using the Custom Plug-ins function in the master index project, which appears after the project is generated. The Custom Plug-in function also provides the ability to build and compile the custom Java code, and Sun Master Index automatically incorporates the classes when you generate the application. The Java classes defining the update policies are specified for the master index application in the UpdateManagerConfig element of the Best Record file.
There are seven types of update policies defined in the Update Manager.
Enterprise Merge Policy – The enterprise merge policy defines additional processing to perform when two enterprise objects are merged. This policy is defined by the EnterpriseMergePolicy element.
Enterprise Unmerge Policy – The enterprise unmerge policy defines additional processing to perform when an unmerge transaction occurs. This policy is defined by the EnterpriseUnmergePolicy element.
Enterprise Update Policy – The enterprise update policy defines additional processing to perform when a record is updated. This policy is defined by the EnterpriseUpdatePolicy element.
Enterprise Create Policy – The enterprise create policy defines additional processing to perform when a new record is inserted into the master index database. This policy is defined by the EnterpriseCreatePolicy element.
System Merge Policy – The system merge policy defines additional processing to perform when two system objects are merged. This policy is defined by the SystemMergePolicy element.
System Unmerge Policy – The system unmerge policy defines additional processing to perform when system objects are unmerged. This policy is defined by the SystemUnmergePolicy element.
Undo Assume Match Policy – The undo assume match policy defines additional processing to perform when an assumed match transaction is reversed. This policy is defined by the UndoAssumeMatchPolicy element.
The update policy section includes the SkipUpdateIfNoChange flag. When set to “true”, this flag prevents the update policies from being performed when no changes are made to an existing record, which helps increase performance when processing a large number of updates.
The properties for the update process are defined in the Best Record file in XML format. Some of the information entered into the default configuration file is based on the fields defined in the wizard and some is standard across all implementations. For most implementations, this file will require customization.
The following topics provide information about working with the Best Record file:
You can customize the configuration of the Update Manager by modifying the Best Record file. This file cannot be modified using the Configuration Editor; you need to modify the file directly. You can modify this file at any time, but it is not recommended after moving into production. The configuration controls how the SBR for each object is created, and modifying the file can cause discrepancies in how SBRs are formed before and after the modifications. It might also cause discrepancies in match results, since matching is performed against the SBR. You must regenerate the application and redeploy the project after modifying this file. The possible modifications to this file are restricted by the schema definition, so be sure to validate the file after making any changes.
Table 12 lists each element in the Best Record file and provides a description of each element along with any requirements or constraints for each element.
Table 12 Best Record File Structure
Element/Attribute | Description |
---|---|
SurvivorHelperConfig | The configuration of the overall survivor strategy. The SurvivorHelperConfig attributes (module-name and parser-class) define the module name and Java class, and their default values should not be changed. |
helper-class | The Java class that determines how to retrieve values from system records and to set them in the SBR. The default class uses the ePath notation to retrieve and set the values. |
default-survivor-strategy | The configuration information for the survivor strategy to use as the default. You can define multiple survivor strategies and use different strategy combinations for the candidate fields. Any field that is not assigned a specific survivor strategy in the candidate-definitions list uses the default survivor strategy specified here. |
strategy-class | The Java class name of the default strategy. |
parameters | A list of optional parameters for the default survivor strategy. |
parameter | A parameter for the default survivor strategy. The default parameter points to another section in the Best Record file that configures the default class. There can be multiple parameter elements. |
description | An optional element that briefly describes the parameter. |
parameter-name | The name of the parameter. |
parameter-type | The Java data type for the parameter value. For both the DefaultSurvivorStrategy and the WeightedSurvivorStrategy, this value is java.lang.String. |
parameter-value | The value of the named parameter. |
candidate-definitions | The configuration information for the fields to be included in the SBR. For any field that does not use the default survivor strategy, an alternate strategy is defined. All fields that are included in the SBR must be listed here. |
candidate-field name | The qualified field name for a field in the SBR (for more information about field notations, see Master Index Field Notations). |
description | A short description of the candidate field (this element is optional). |
system-fields | A field (other than the candidate field) that is evaluated to determine the value for the SBR. One example of this would be to evaluate the last update date of the system records to determine which value is most recent. This element is not currently used by any of the standard survivor strategies provided with the master index application, but might be useful when defining custom strategies. |
field-name | The name of the field to use to determine the value for the SBR. |
survivor-strategy | An alternate survivor strategy to use for the given field in place of the default strategy defined in the default-survivor-strategy element. If a strategy is not specifically defined for a field, the default strategy is used for that field. |
strategy-class | The name of the Java class to use for the alternate survivor strategy. |
WeightedCalculator | The configuration of the weighted calculator. By default, this is the strategy specified as the default strategy in the default-survivor-strategy element. The WeightedCalculator attributes (module-name and parser-class) define the module name and Java class, and their default values should not be changed unless you create a custom class. |
default-parameters | The configuration information for the default weighted calculator logic for all fields except those whose logic is defined in the candidate-field element below. |
parameter | A parameter for the default logic of the weighted calculator. |
quality | The type of weighted calculation to perform. You can specify any of the following: SourceSystem, SystemAgreement, or MostRecentModified. For more information about these qualities, see Weighted Calculator. |
preference | The preferred value for the specified quality. This element is only used for the SourceSystem quality and must be a source system code. |
utility | A value that indicates the reliability of the specified quality for determining the best field value for the SBR. You define the scale for the utility values. |
candidate-field | A field for which you want to use custom logic for the weighted calculator. The logic you specify here overrides the logic defined in the default-parameters section, but only for the fields specified. Each candidate field is identified by a name attribute and defines the survivor strategies for one field. |
name | The name of the candidate field for which you want to define override logic. |
parameter | A parameter configuring the weighted calculator for the candidate field. You can define multiple parameters for each candidate field. |
quality | The type of weighted calculation to perform. You can specify any of the following: SourceSystem, SystemAgreement, or MostRecentModified. |
preference | The preferred value for the specified quality. This element is only used for the SourceSystem quality and the preference must be a source system code. |
utility | A value that indicates the reliability of the specified quality for determining the best field value for the SBR. You define the scale for the utility values. |
UpdateManagerConfig | The configuration information for the Update Manager. This section defines a list of Java classes to manage custom processing for different types of transactions. You can create the custom classes in the Custom Plug-ins function of the master index project and then specify those classes here. The UpdateManagerConfig attributes (module-name and parser-class) define the module name and Java class, and their default values should not be changed. |
EnterpriseMergePolicy | A class that defines additional processing to perform when two enterprise objects are merged. |
EnterpriseUnmergePolicy | A class that defines additional processing to perform when two enterprise objects are unmerged. |
EnterpriseUpdatePolicy | A class that defines additional processing to perform when a record is updated. |
EnterpriseCreatePolicy | A class that defines additional processing to perform when a new record is created. |
SystemMergePolicy | A class that defines additional processing to perform when two system objects are merged. |
SystemUnmergePolicy | A class that defines additional processing to perform when system objects are unmerged. |
UndoAssumeMatchPolicy | A class that defines additional processing to perform when an assumed match transaction is reversed. |
SkipUpdateIfNoChange | An indicator of whether the update policies are carried out if no changes are made to the existing record. Specify “true” to prevent the update policies from being performed when no changes are made to an existing record. |
Below is a sample of the Best Record file using a very small object structure based on person data. Note that standardized and phonetic fields are included in the candidate fields to ensure that they are also included in the SBR. In this sample, all fields use the default strategy except those included in the Alias object, which uses the union strategy. The value that is populated in the LastName field of the SBR is dependent on the SSN field of the system objects. In addition, custom logic is defined only for the SSN field; the remaining fields use the default logic defined in the default-parameters element.
<SurvivorHelperConfig module-name="SurvivorHelper"
    parser-class="com.stc.eindex.configurator.impl.SurvivorHelperConfig">
  <helper-class>com.stc.eindex.survivor.impl.DefaultSurvivorHelper</helper-class>
  <default-survivor-strategy>
    <strategy-class>com.stc.eindex.survivor.impl.WeightedSurvivorStrategy</strategy-class>
    <parameters>
      <parameter>
        <parameter-name>ConfigurationModuleName</parameter-name>
        <parameter-type>java.lang.String</parameter-type>
        <parameter-value>WeightedSurvivorCalculator</parameter-value>
      </parameter>
    </parameters>
  </default-survivor-strategy>
  <candidate-definitions>
    <candidate-field name="Person.LastName">
      <system-fields>
        <field-name>Person.SSN</field-name>
      </system-fields>
    </candidate-field>
    <candidate-field name="Person.FirstName"/>
    <candidate-field name="Person.MiddleName"/>
    <candidate-field name="Person.DOB"/>
    <candidate-field name="Person.Gender"/>
    <candidate-field name="Person.SSN"/>
    <candidate-field name="Person.FnamePhoneticCode"/>
    <candidate-field name="Person.LnamePhoneticCode"/>
    <candidate-field name="Person.StdFirstName"/>
    <candidate-field name="Person.StdLastName"/>
    <candidate-field name="Person.Alias[*].*">
      <survivor-strategy>
        <strategy-class>com.stc.eindex.survivor.impl.UnionSurvivorStrategy</strategy-class>
      </survivor-strategy>
    </candidate-field>
  </candidate-definitions>
</SurvivorHelperConfig>
<WeightedCalculator module-name="WeightedSurvivorCalculator"
    parser-class="com.stc.eindex.configurator.impl.WeightedCalculatorConfig">
  <candidate-field name="Person.SSN">
    <parameter>
      <quality>SourceSystem</quality>
      <preference>SBYN</preference>
      <utility>100.0</utility>
    </parameter>
    <parameter>
      <quality>MostRecentModified</quality>
      <utility>75.0</utility>
    </parameter>
  </candidate-field>
  <default-parameters>
    <parameter>
      <quality>MostRecentModified</quality>
      <utility>80.0</utility>
    </parameter>
    <parameter>
      <quality>SourceSystem</quality>
      <preference>SBYN</preference>
      <utility>100.0</utility>
    </parameter>
  </default-parameters>
</WeightedCalculator>
<UpdateManagerConfig module-name="UpdateManager"
    parser-class="com.stc.eindex.configurator.impl.UpdateManagerConfig">
  <EnterpriseMergePolicy>com.stc.eindex.user.CustomMergePolicy</EnterpriseMergePolicy>
  <EnterpriseUnmergePolicy>com.stc.eindex.user.CustomUnmergePolicy</EnterpriseUnmergePolicy>
  <EnterpriseUpdatePolicy>com.stc.eindex.user.CustomUpdatePolicy</EnterpriseUpdatePolicy>
  <EnterpriseCreatePolicy>com.stc.eindex.user.CustomCreatePolicy</EnterpriseCreatePolicy>
  <SystemMergePolicy>com.stc.eindex.user.CustomSystemMergePolicy</SystemMergePolicy>
  <SystemUnmergePolicy>com.stc.eindex.user.CustomSystemUnmergePolicy</SystemUnmergePolicy>
  <UndoAssumeMatchPolicy>com.stc.eindex.user.CustomUndoMatchPolicy</UndoAssumeMatchPolicy>
  <SkipUpdateIfNoChange>true</SkipUpdateIfNoChange>
</UpdateManagerConfig>
The following sample illustrates how the weighted calculator uses the parameters you define to determine which field values to use in the SBR. Using this sample, if only one of the two system records contains a value, that value is used in the SBR regardless of update date. If both system records contain a value and they were updated at the same time, the SAP field value is used (80.0 > 30.0). If both system records contain a value but the CDW record was the most recently modified, the value from CDW is populated into the SBR because its summed utility exceeds that of SAP ((30.0 + 70.0) > 80.0).
<default-parameters>
  <parameter>
    <quality>SourceSystem</quality>
    <preference>SAP</preference>
    <utility>80.0</utility>
  </parameter>
  <parameter>
    <quality>MostRecentModified</quality>
    <utility>70.0</utility>
  </parameter>
  <parameter>
    <quality>SourceSystem</quality>
    <preference>CDW</preference>
    <utility>30.0</utility>
  </parameter>
</default-parameters>
You can define custom logic for field validations and then specify them in the Field Validation file to associate the logic with the master index application. The custom logic is created as a Java class using the Custom Plug-ins function of the master index project. The custom validation classes must implement com.stc.eindex.objects.validation.ObjectValidator. The exception thrown is com.stc.eindex.objects.validation.exception.ValidationException.
The following topic describes the structure of the Field Validation file and provides information about custom field validators.
By default, the Field Validation file defines one validation rule named validate-local-id. This rule defines certain validations that are performed against local ID and system fields before they are entered into the database. The local ID validator verifies that the system code is valid, the local ID format is correct, the local ID is the correct length, and that neither field is null.
The following topics provide information about working with the Field Validation file:
You can modify the Field Validation file using the XML editor. The possible modifications to this file are restricted by the schema definition, so be sure to validate the file after making any changes. When you modify this file, you must regenerate the application and redeploy the project for the changes to take effect.
The Field Validation file consists primarily of a list of rules. Each rule is defined within the ValidationConfig element and is defined by attributes within a rules element. Table 13 describes the elements and attributes of the Field Validation file.
Table 13 Field Validation File Elements
Element or Attribute | Description |
---|---|
rules | The configuration information for the validation rules. |
rule | The configuration information for a specific validation rule in a rules list. |
name | A rule attribute that specifies a name for the validation rule. |
object-name | A rule attribute that specifies the name of the class that defines the object to which the validation rule is applied, such as SystemObject or ParentNameObject (where ParentName is the name of the parent object in the Object Definition). |
class | A rule attribute that specifies the complete path of the Java class containing the validation rule. |
Plug the custom validation classes you create into the master index application by specifying the name of the custom plug-in for the class in the Field Validation file, as shown below.
<ValidationConfig module-name="Validation"
    parser-class="com.stc.eindex.configurator.impl.validation.ValidationConfiguration">
  <rules>
    <rule name="validate-auxiliary-id" object-name="PersonObject"
        class="com.stc.eindex.user.AuxiliaryId"/>
    <rule name="validate-birth-date" object-name="PersonObject"
        class="com.stc.eindex.user.BirthDate"/>
  </rules>
</ValidationConfig>
The Enterprise Data Manager (EDM) is the web-based user interface for the master index application that allows you to monitor and modify data in the index. This interface is highly configurable, and can be customized by modifying the Enterprise Data Manager file in the master index project.
The following topics describe the EDM, the Enterprise Data Manager file structure, and how the EDM can be configured:
The EDM is a web-based interface that allows you to manage and monitor the data in your master index database. Using the EDM, you can search for records; add, update, deactivate, and reactivate records; review and resolve potentially duplicate records; compare records; and merge and unmerge records. You can also view a transaction history for each record and an audit log of access to the database. This interface is highly configurable, allowing you to customize certain processing properties as well as the appearance of certain windows.
The EDM facilitates the use of screen readers and other assistive technology by providing information through HTML tags. It also provides tooltips when the cursor is placed over links and images on the EDM pages.
You can configure several properties of the EDM to display the information you want in the way you want, to define the way searches can be processed, and to define the criteria that can be used for each search. Certain implementation options are also configured in the Enterprise Data Manager file, such as application server information, debug options, and security information.
The configurable properties of the user interface fall into these five categories:
In the Enterprise Data Manager file, you can specify which objects appear on the EDM windows and the order in which they appear. You can also specify the fields displayed in each object and configure properties for each field. You can specify a field’s name, length, order of appearance on the EDM, required data type, whether text can be entered into a field or if it must be selected from a predefined value, whether a field or combination of fields must be unique to the parent object, and whether the value of the field is hidden under certain circumstances. You can also specify whether the format of a field is dependent on the value of a related field (for example, the format of a credit card number field could be dependent on the type of credit card specified).
In the Enterprise Data Manager file, relationships define the hierarchy of the object types listed for the EDM. By specifying relationships, you define parent and child nodes. The parent and child nodes you specify in the relationships element must also be defined in the node elements of the file. You can specify one parent object; the remaining objects must be child objects to the parent you define. This section is dependent on the relationships section in the Object Definition file, and should only be changed if corresponding changes are made to the Object Definition file.
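The fragment here sketches one parent object with two child objects. The element names are taken from the sample file later in this section; the Address and Phone child objects are assumptions for illustration, and the field elements are omitted for brevity.

```xml
<!-- Every object named in the relationships element needs its own node element -->
<node-Person>
  <!-- field elements for Person omitted -->
</node-Person>
<node-Address display-order="1">
  <!-- field elements for Address omitted -->
</node-Address>
<node-Phone display-order="2">
  <!-- field elements for Phone omitted -->
</node-Phone>

<relationships>
  <!-- The single parent object -->
  <name>Person</name>
  <!-- One children element for each child object -->
  <children>Address</children>
  <children>Phone</children>
</relationships>
```

In this sketch the same parent and child objects would also need to appear in the relationships section of the Object Definition file.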
You can configure several display properties for the pages that appear on the EDM. For most configurable pages, you can specify the number of fields on each row, and for all search pages you can specify the number of results to display on each page. You can also specify the type of object to display on a page and the name of the tabbed heading.
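As an illustration of these display properties, the fragment here configures the Matching Review pages. The element names and the tab-entrance value come from the sample file later in this section; the tab name and the numeric values are hypothetical.

```xml
<matching-review>
  <!-- Type of object displayed on the pages (the parent object) -->
  <root-object>Person</root-object>
  <!-- Name of the tabbed heading (assumed label) -->
  <tab-name>Duplicate Review</tab-name>
  <!-- Entry page URL; this element should not be modified -->
  <tab-entrance>/EnterPDSearchAction.do</tab-entrance>
  <pd-search-page>
    <!-- Number of fields on each row of the search page -->
    <field-per-row>1</field-per-row>
  </pd-search-page>
  <search-result-list-page>
    <!-- Number of results displayed on each page -->
    <item-per-page>25</item-per-page>
    <!-- Maximum number of records returned by the search -->
    <max-result-size>200</max-result-size>
  </search-result-list-page>
</matching-review>
```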
Certain properties of the following pages can be configured:
In the display configuration, you can specify whether an audit log is maintained of all instances in which object information was accessed from the EDM. If the log is maintained, then information about each instance of access can be viewed on the EDM. This is especially useful in healthcare implementations, where privacy of information is mandated.
A local ID is a unique identification code assigned to a record by the system in which the record originated. By default, this code is named “Local ID” on the EDM pages. This name can be modified to a name more recognizable by EDM users.
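A short sketch of the audit log and local ID settings might look like the following. The element names are taken from the sample file later in this section; the label text is a hypothetical choice for a healthcare implementation, and the other page elements are omitted.

```xml
<gui-definition>
  <system-display-name-overrides>
    <!-- Heading of the local ID search section (assumed label) -->
    <local-id-header>Medical Record Lookup</local-id-header>
    <!-- Replaces "Local ID" on fields and columns (assumed label) -->
    <local-id>MRN</local-id>
  </system-display-name-overrides>
  <page-definition>
    <!-- search, history, matching review, and report pages omitted -->
    <audit-log>
      <!-- true writes an audit log entry each time data is accessed on the EDM -->
      <allow-insert>true</allow-insert>
    </audit-log>
  </page-definition>
</gui-definition>
```

With a local-id value of MRN, abbreviated labels such as "LID1" and "LID2" would be expected to appear as MRN1 and MRN2, following the rule given in the element descriptions.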
Of the configurable pages, the page that requires the most configuration is the Search page. In addition to defining the number of fields per row and the number of records to display in the search results list, you can also specify the search criteria that appear and the types of searches allowed from the EDM.
You can define and name several search pages, each with its own configuration. For each page, you specify groups of fields that are displayed in boxed areas. Each boxed area can represent a different type of search, such as a demographic search, address search, EUID search, and so on.
For each search page you define, you must also specify the search types available, such as alphanumeric or phonetic. You can configure searches by specifying a name for the search, the maximum number of records to return, whether the results are weighted, and whether wildcard characters can be used. When you define the search types for the EDM, you must specify a query for each type you define. The queries you specify must already be defined in the Candidate Select file.
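The fragment here sketches one search page that offers two search types. The element names, the choice attribute, and the query names BLOCKER-SEARCH2 and ALPHA-SEARCH are taken from the sample file later in this section; the screen title and instruction text are hypothetical, and the parameter name is spelled as it appears in that sample. Both queries are assumed to be defined already in the Candidate Select file.

```xml
<simple-search-page>
  <screen-title>Person Lookup</screen-title>
  <field-per-row>2</field-per-row>
  <show-euid>false</show-euid>
  <show-lid>false</show-lid>
  <instruction>Enter a name and a date of birth or date range</instruction>
  <!-- One boxed area on the page -->
  <field-group>
    <description>Demographics</description>
    <field-ref>Person.LastName</field-ref>
    <field-ref>Person.FirstName</field-ref>
    <!-- DOB is listed twice so it can be searched exactly or by range -->
    <field-ref choice="exact">Person.DOB</field-ref>
    <field-ref choice="range">Person.DOB</field-ref>
  </field-group>
  <!-- Weighted phonetic search without wildcards -->
  <search-option>
    <display-name>Phonetic Search</display-name>
    <query-builder>BLOCKER-SEARCH2</query-builder>
    <weighted>true</weighted>
    <parameter>
      <name>UseWildcard</name>
      <value>false</value>
    </parameter>
  </search-option>
  <!-- Unweighted alphanumeric search that accepts wildcards -->
  <search-option>
    <display-name>Alpha Search</display-name>
    <query-builder>ALPHA-SEARCH</query-builder>
    <weighted>false</weighted>
    <parameter>
      <name>UseWildcard</name>
      <value>true</value>
    </parameter>
  </search-option>
</simple-search-page>
```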
The Enterprise Data Manager file defines certain information about the application server for the master index implementation, such as the names of certain validation and management components, debug parameters, and security information. The security for the master index application is based in the application server.
EDM properties are defined in the Enterprise Data Manager file in XML format. Some of the information entered into the default configuration file is based on the fields you defined in the wizard, and some is standard across all implementations. For most implementations, this file will require customization.
The following topics provide information about working with the Enterprise Data Manager file:
You can modify the Enterprise Data Manager file at any time, but you must regenerate the application and redeploy the project after making any changes to the file. Changes made to this file do not affect match processing; they affect only the EDM. Most components of this file cannot be modified using the Configuration Editor; you need to modify the file directly. The possible modifications to this file are restricted by the schema definition, so be sure to validate the file after making any changes.
Table 14 lists each element in the Enterprise Data Manager file and provides a description of each element along with any requirements or constraints for each element.
Table 14 Enterprise Data Manager File Structure
Element/Attribute |
Description |
---|---|
The configuration information for an object on the EDM. Each object that appears on the EDM must be defined in a node element. The node is named for the object it defines; for example: node-Person or node-Address. Most of the information you need to specify in the node elements is generated by the wizard, but you can modify the information as needed. Note – All fields defined in the Enterprise Data Manager file must also be defined in the Object Definition file; however, not all fields defined in the Object Definition file need to be defined for the EDM. Any fields or objects not listed in the Enterprise Data Manager file will not appear on the EDM. |
|
The order in which the child object types appear in the tree view pane on the EDM pages. |
|
An indicator of whether EDM users must specify which instances of the child object type to retain during a system record merge. Specify false to give EDM users the option of selecting the child objects to retain or accepting the default objects (the default objects are those in the destination system). Specify true to force the user to select the child objects to retain. |
|
The configuration information for a field on the EDM. Each field that appears on the EDM must be defined in a field element. The element is named for the field it defines; for example: field-FirstName. |
|
The name of the field as it will appear on the EDM. |
|
The order in which the field appears on the EDM. For example, specify 1 to indicate this is the first field on the EDM pages, 2 to indicate it is the second field, and so on. |
|
The maximum number of characters displayed on the EDM for the field. |
|
The type of display for the field. Specify one of the following options.
|
|
The master index data type for the data populated in the field. The following data types are supported:
|
|
A mask used by the EDM to add punctuation to a field. You can add an input mask to display telephone numbers as “(123)456-7890” even though the database might store them as “1234567890” and the user enters the numbers with no punctuation. The following character types are allowed:
|
|
A mask used by the master index application to strip any extra characters that were added by the input mask to ensure that data is stored in the database in the correct format. This mask must be the same length as the input mask. To specify a value mask, type the same value as is entered for the input mask, but type an “x” in place of each punctuation mark. For example, using the above phone number example, you need to specify a value mask of xDDDxDDDxDDDD. A value mask is not required for date fields. |
|
The name of the menu list used to populate the drop-down list for the field. This is required if the gui-type specified is MenuList, and it must match a code of an element in the sbyn_common_header database table. |
|
An indicator of whether the field (or a combination of key fields) must be unique in an enterprise record. Unique key fields identify unique child objects in an enterprise object. Specify true to indicate the field is a key field; specify false if it is not. |
|
An indicator of whether the value of the field is hidden on the EDM for records with a VIP status of “VIP”. Only users with the eView.Admin or eView.VIP user roles can view the hidden information. Specify true to hide the field value; specify false (or remove the is-sensitive element) to display the field value. Note – This element is only used if the object-sensitive-plug-in-class in the impl-details section is populated. |
|
Configuration information for the hierarchy of the parent and child objects displayed on the EDM. |
|
The name of the parent object. |
|
The name of a child object. There should be one children element for each child node. |
|
The configuration information for certain classes that are required for the EDM to connect to the server. This section also defines debug and security options. |
|
The JNDI name for the Master Controller. The default name is ejb/app-nameMasterController, where app-name is the name of the master index application. |
|
The JNDI name for the processing code validator. The default name is ejb/app-nameCodeLookup. |
|
The JNDI name for the user code validator (used for non-unique IDs). The default name is ejb/app-nameUserCodeLookup. |
|
The JNDI name for the EDM report generator. The default name is ejb/app-nameReportGenerator. |
|
An indicator of whether debug information is logged. Specify true to log debug information. |
|
The destination to which debug information is written. Specify console to log debug information to a monitor; specify file to print to a file. |
|
An indicator of whether authorization security is enabled for the EDM (this refers to the security permissions defined in the application server). Specify true to enable security. |
|
The name of the class that contains logic for masking the data in certain fields from certain users. For example, certain sensitive information should only be viewed by administrators. If you specify field masking, you must define a custom plug-in to handle the process and specify it here. Note – Sun Master Patient Index provides a predefined field-masking class. |
|
The configuration information for the pages that appear on the EDM, including the types of searches available on the EDM. |
|
The first page to appear when you log in to the EDM. Enter one of the following values.
|
|
Configuration information for the names of the local ID fields and headings that appear on the EDM. If you want to change the field label from “Local ID” to a name that is more relevant to your implementation, define that name here. |
|
The name to use for the heading label of the local ID search section of the EDM Lookup page. |
|
The name to use for all fields and columns containing local IDs. In fields where the local ID label is abbreviated to “LID1” or “LID2”, the name becomes local-id1 or local-id2 (where local-id is the value specified for this element). |
|
The configuration information for individual pages on the EDM. |
|
The configuration information for the Search pages. |
|
The name of the type of object returned by the search (this must be the parent object). |
|
A name for the search pages. This name appears on the tab label associated with the search pages on the EDM. |
|
The URL to the entry page of the search pages. This element should not be modified. |
|
Each simple-search-page element contains the configuration information for one of the searches that appears on the Search page. |
|
The name of the search as it appears in the Search Type drop-down list on the EDM, from which users can select a type of search to perform. |
|
The number of fields to display in each row on the search page. For best readability, this value should be set at 1 or 2. |
|
An indicator of whether to display the EUID. Specify true to display the EUID; otherwise specify false. Only display this field if you want it to take precedence over all other search criteria. When the EUID is displayed, it appears in its own labelled box. |
|
An indicator of whether to display the local ID and system fields. Specify true to display the fields; otherwise specify false. Only display these fields if you want them to take precedence over all other search criteria (except the EUID field). When the local ID is displayed, the local ID and system fields appear in their own labelled box. |
|
A short statement to help the user process a search. The text you enter here appears above the search fields on the Search page. |
|
A list of fields that appear on the Search page. Each field group is contained in a labelled box on the Search page. |
|
A description of the fields defined for the field-group element. This value appears as a box label for the area of the page that contains the specified fields. |
|
The simple field names of the fields in the field group using their corresponding objects as the root. For example, the path to the FirstName field in the Person object is “Person.FirstName”. You can define multiple field-ref elements for each field group. |
|
An indicator of whether the field is required in order to perform a search. Specify any of the following values.
|
|
An indicator of whether this field allows you to search by a range of values rather than an exact value. Specify one of the following values.
To define a field for both exact and range searching, define the field twice; once with this attribute set to exact and once with it set to range. |
|
The configuration information for a search. Each search-option element defines one type of search for the page. |
|
A short phrase describing the type of search to perform, such as “Alphanumeric Search” or “Phonetic Search”. This appears next to the option button on the search page when multiple search options are defined for one page. |
|
The type of query to use when this type of search is selected. The value entered here must match a query-builder name in the Candidate Select file. |
|
An indicator of whether the results of the search are assigned matching probability weights. Specify true to assign matching weights or false to return unweighted results. |
|
The maximum number of records to return for a search. This value must be a positive number, and is only used for blocking queries. Setting the candidate threshold to zero is equivalent to not setting a threshold. |
|
A list of optional parameters for the search. |
|
The name of the parameter. Currently, only UseWildCard is available. |
|
The value of the parameter. For the UseWildCard parameter, this is an indicator of whether the parameter is enabled or disabled. Specify true to allow wildcard characters or false to perform exact-match searches. |
|
The configuration information for the Search Results page. |
|
The number of resulting records to display on one page. |
|
The maximum number of records to return for a search. |
|
The simple field names of the fields to appear in the search results list using their corresponding objects as the root. For example, the path to the FirstName field in the Person object is “Person.FirstName”. You can define multiple field-ref elements. The EUID appears in the list by default, so it does not need to be specified here. |
|
The configuration information for the View/Edit page. |
|
The number of fields to display in each row of the EDM. For best readability, this value should be kept at 1. |
|
The configuration of the Create System Record pages. |
|
The name of the type of object created (this must be the parent object). |
|
A name for the Create System Record pages. This name appears on the tab label associated with the Create System Record pages on the EDM. |
|
The URL to the entry page of the Create System Record pages. This element should not be modified. |
|
The configuration information for the History page. |
|
The name of the type of object displayed on the History pages (this must be the parent object). |
|
A name for the History pages. This name appears on the tab label associated with the History pages on the EDM. |
|
The URL to the entry page of the History pages. This element should not be modified. |
|
The configuration information for the History Search page. |
|
The number of fields to display in each row of the History Search page. For best readability, this value should be set at 1 or 2. |
|
The configuration information for the History Search Results page. |
|
The number of transaction records to display on each page of the search results. |
|
The maximum number of records to return for a History search. |
|
The simple field names of the fields to appear in the search results list using their corresponding objects as the root. These fields are in addition to the permanent fields, which include TransactionID, EUID1, EUID2, System, LID1, LID2, Function, SystemUser, and TimeStamp. |
|
The configuration information for the merge history tree. |
|
The configuration information for the fields that appear on the merge tree in addition to those that permanently appear in the merge tree transaction table. Permanent fields in the transaction table include TransactionID, EUID1, EUID2, System, LID1, LID2, Function, SystemUser, and TimeStamp. Any fields defined here also appear on the History Search Result page, but if a field is listed in both this element and the search-result-list-page element, only one instance of the field appears in the results list. |
|
The configuration information for the Matching Review page. |
|
The name of the type of object displayed on the Matching Review pages (this must be the parent object). |
|
A name for the Matching Review pages. This name appears on the tab label associated with the Matching Review pages on the EDM. |
|
The URL to the entry page of the Matching Review pages. This element should not be modified. |
|
The configuration information for the Potential Duplicate Search page. |
|
The number of fields to appear in each row on the Potential Duplicate Search page. For best readability, this value should be set at 1 or 2. |
|
The configuration information for the Matching Review Search Results page. |
|
The number of resulting records to display on each page of the search results. |
|
The maximum number of records to return for the Matching Review search. |
|
The configuration information for the Reports page. |
|
The name of the type of object displayed on the Reports pages (this must be the parent object). |
|
A name for the Reports pages. This name appears on the tab label associated with the Reports pages on the EDM. |
|
The URL to the entry page of the Reports pages. This element should not be modified. |
|
The number of fields to display in each row of the Reports Search page. |
|
Configuration information for each report run by the EDM with the exception of search reports (which do not need to be configured). Each report is defined by a report element. These reports are identical to the reports that can be run from the command line report client. |
|
The type of report being generated. Specify any of the following production reports.
Or specify any of the following activity reports.
|
|
The descriptive name of the report. This can be any string and will appear as the title in the specified report. |
|
An indicator of whether the report can be run from the EDM. Specify true to allow the report to be run; specify false to disable the report. |
|
The number of records to display on the report. If no value is entered or if the value is zero (0), the size defaults to 1000 records. To retrieve all records for a report, enter a very large value for this element. |
|
The configuration information for the fields that appear on each report. |
|
A list of fields to display on the report in addition to those that are displayed automatically. Use the simple field name for the field-ref value (simple field names are described in Master Index Field Notations). This element should be empty for the activity reports; if a list of fields is supplied for any activity reports, it is ignored. |
|
The configuration information for the Audit Log pages. |
|
An indicator of whether entries are written to the audit log each time data is accessed on the EDM. Specify true to enable the audit log. Specify false to disable the audit log. If the audit log is maintained, the Audit Log page is available on the EDM for searching and viewing audit log entries and the information is stored in the sbyn_audit table. |
Below is a short sample of the Enterprise Data Manager file based on a master index application processing person information.
<node-Person>
  <field-LastName>
    <display-name>Last Name</display-name>
    <display-order>1</display-order>
    <max-length>40</max-length>
    <gui-type>TextBox</gui-type>
    <value-type>string</value-type>
    <key-type>true</key-type>
  </field-LastName>
  <field-FirstName>
    <display-name>First Name</display-name>
    <display-order>2</display-order>
    <max-length>40</max-length>
    <gui-type>TextBox</gui-type>
    <value-type>string</value-type>
    <key-type>true</key-type>
  </field-FirstName>
  <field-DOB>
    <display-name>DOB</display-name>
    <display-order>3</display-order>
    <max-length>32</max-length>
    <gui-type>TextBox</gui-type>
    <value-type>date</value-type>
    <key-type>true</key-type>
  </field-DOB>
  <field-Gender>
    <display-name>Gender</display-name>
    <display-order>4</display-order>
    <max-length>8</max-length>
    <gui-type>MenuList</gui-type>
    <value-list>GENDER</value-list>
    <value-type>string</value-type>
    <key-type>true</key-type>
  </field-Gender>
  <field-SSN>
    <display-name>SSN</display-name>
    <display-order>5</display-order>
    <max-length>16</max-length>
    <gui-type>TextBox</gui-type>
    <value-type>string</value-type>
    <input-mask>DDD-DD-DDDD</input-mask>
    <value-mask>DDDxDDxDDDD</value-mask>
    <is-sensitive>true</is-sensitive>
  </field-SSN>
</node-Person>
<node-Alias display-order="1">
  <field-LastName>
    <display-name>LastName</display-name>
    <display-order>1</display-order>
    <max-length>40</max-length>
    <gui-type>TextBox</gui-type>
    <value-type>string</value-type>
    <key-type>true</key-type>
  </field-LastName>
  <field-FirstName>
    <display-name>FirstName</display-name>
    <display-order>2</display-order>
    <max-length>40</max-length>
    <gui-type>TextBox</gui-type>
    <value-type>string</value-type>
    <key-type>true</key-type>
  </field-FirstName>
</node-Alias>
<relationships>
  <name>Person</name>
  <children>Alias</children>
</relationships>
<impl-details>
  <master-controller-jndi-name>ejb/PersonMasterController</master-controller-jndi-name>
  <validation-service-jndi-name>ejb/PersonCodeLookup</validation-service-jndi-name>
  <usercode-jndi-name>ejb/PersonUserCodeLookup</usercode-jndi-name>
  <reportgenerator-jndi-name>ejb/PersonReportGenerator</reportgenerator-jndi-name>
  <debug-flag>true</debug-flag>
  <debug-dest>console</debug-dest>
  <enable-security>true</enable-security>
  <object-sensitive-plug-in-class>com.stc.eindex.security.VIPObjectSensitivePlugIn</object-sensitive-plug-in-class>
</impl-details>
<gui-definition>
  <system-display-name-overrides>
    <local-id-header>System Identifier</local-id-header>
    <local-id>System ID</local-id>
  </system-display-name-overrides>
  <page-definition>
    <eo-search>
      <root-object>Person</root-object>
      <tab-name>Person Search</tab-name>
      <tab-entrance>/EnterEOSearchSimpleAction.do</tab-entrance>
      <simple-search-page>
        <screen-title>Advanced Lookup (Phonetic)</screen-title>
        <field-per-row>2</field-per-row>
        <show-euid>false</show-euid>
        <show-lid>false</show-lid>
        <instruction/>
        <field-group>
          <description>Demographics</description>
          <field-ref>Person.LastName</field-ref>
          <field-ref>Person.FirstName</field-ref>
          <field-ref choice="range">Person.DOB</field-ref>
          <field-ref>Person.Gender</field-ref>
          <field-ref>Person.SSN</field-ref>
        </field-group>
        <search-option>
          <display-name>Phonetic Search</display-name>
          <query-builder>BLOCKER-SEARCH2</query-builder>
          <weighted>true</weighted>
          <parameter>
            <name>UseWildcard</name>
            <value>false</value>
          </parameter>
        </search-option>
      </simple-search-page>
      <simple-search-page>
        <screen-title>Advanced Lookup (Alpha)</screen-title>
        <field-per-row>2</field-per-row>
        <show-euid>false</show-euid>
        <show-lid>false</show-lid>
        <instruction>Enter as much information as possible to narrow the search</instruction>
        <field-group>
          <description>Demographics</description>
          <field-ref>Person.LastName</field-ref>
          <field-ref>Person.FirstName</field-ref>
          <field-ref>Person.Gender</field-ref>
          <field-ref choice="range">Person.DOB</field-ref>
          <field-ref>Person.SSN</field-ref>
        </field-group>
        <search-option>
          <display-name>Alpha Search</display-name>
          <query-builder>ALPHA-SEARCH</query-builder>
          <weighted>false</weighted>
          <parameter>
            <name>UseWildcard</name>
            <value>true</value>
          </parameter>
        </search-option>
      </simple-search-page>
      <simple-search-page>
        <screen-title>Simple Person Lookup</screen-title>
        <field-per-row>2</field-per-row>
        <show-euid>true</show-euid>
        <show-lid>true</show-lid>
        <instruction/>
        <field-group>
          <description>SSN</description>
          <field-ref>Person.SSN</field-ref>
        </field-group>
        <search-option>
          <display-name>Alpha Search</display-name>
          <query-builder>ALPHA-SEARCH</query-builder>
          <weighted>false</weighted>
          <parameter>
            <name>UseWildcard</name>
            <value>true</value>
          </parameter>
        </search-option>
      </simple-search-page>
      <search-result-list-page>
        <item-per-page>10</item-per-page>
        <max-result-size>100</max-result-size>
        <field-ref>Person.LastName</field-ref>
        <field-ref>Person.FirstName</field-ref>
        <field-ref>Person.DOB</field-ref>
      </search-result-list-page>
      <eo-view-page>
        <field-per-row>1</field-per-row>
      </eo-view-page>
    </eo-search>
    <create-eo>
      <root-object>Person</root-object>
      <tab-name>Create System Record</tab-name>
      <tab-entrance>/EnterEOCreateAction.do</tab-entrance>
    </create-eo>
    <history>
      <root-object>Person</root-object>
      <tab-name>History</tab-name>
      <tab-entrance>/EnterXASearchAction.do</tab-entrance>
      <xa-search-page>
        <field-per-row>2</field-per-row>
      </xa-search-page>
      <search-result-list-page>
        <item-per-page>10</item-per-page>
        <max-result-size>100</max-result-size>
        <field-ref>Person.FirstName</field-ref>
        <field-ref>Person.LastName</field-ref>
      </search-result-list-page>
      <merge-history-key-field>
        <field-ref>Person.FirstName</field-ref>
        <field-ref>Person.LastName</field-ref>
      </merge-history-key-field>
    </history>
    <matching-review>
      <root-object>Person</root-object>
      <tab-name>Matching Review</tab-name>
      <tab-entrance>/EnterPDSearchAction.do</tab-entrance>
      <pd-search-page>
        <field-per-row>2</field-per-row>
      </pd-search-page>
      <search-result-list-page>
        <item-per-page>10</item-per-page>
        <max-result-size>100</max-result-size>
      </search-result-list-page>
    </matching-review>
    <reports>
      <root-object>Person</root-object>
      <tab-name>Reports</tab-name>
      <tab-entrance>/EnterReportSearchAction.do</tab-entrance>
      <search-page-field-per-row>2</search-page-field-per-row>
      <report name="Potential Duplicate" title="Potential Duplicate Report">
        <enable>true</enable>
        <max-result-size>2000</max-result-size>
        <fields>
          <field-ref>Person.FirstName</field-ref>
          <field-ref>Person.LastName</field-ref>
          <field-ref>Person.SSN</field-ref>
          <field-ref>Person.DOB</field-ref>
        </fields>
      </report>
    </reports>
    <audit-log>
      <allow-insert>false</allow-insert>
    </audit-log>
  </page-definition>
</gui-definition>
The configuration files use specific notations to identify a field or a group of fields in an enterprise or system object. Sun Master Index uses three types of notation.
The following topics describe each type of notation used:
In the Best Record file, an element path, called an ePath, is used to specify the location of a field or list of fields. ePaths are also used in the StandardizationConfig element of the Match Field file. An ePath is a sequence of nested nodes in an enterprise record where the most nested element is a data field or a list of data fields. ePaths allow you to retrieve and transform values that are located in the object tree.
ePath strings can be of four basic types:
ObjectField - A field defined in the master index object structure.
ObjectNode - A parent or child object defined in the master index object structure.
ObjectField List - A list of references to certain ObjectFields in the master index object structure.
ObjectNode List - A list of references to certain ObjectNodes in the master index object structure.
A context node is specified when evaluating each ePath expression. The context node is treated as the root of the structure being evaluated.
These topics describe and illustrate how to form ePath strings:
The syntax of an ePath consists of three components: nodes, qualifiers, and fields, as shown below.
node{.node{'['qualifier']'}+}+.field
Node - Specifies the node type and optionally includes qualifiers to restrict the number of nodes. A node without any qualifier defaults to only the first node of the specified type. Use “node.*” to address a node rather than a field.
Qualifier - Restricts the number of nodes addressed at each level. The following qualifiers are allowed:
* (asterisk) - Denotes all nodes of the specified type.
int - Accesses the node by index.
@keystring=valuestring - Accesses the node using a key-value pair. Only one instance of the node is addressed using keys. If a composite key is defined, then multiple key-value pairs can be separated by a comma in the ePath (for example, [@key1=value1,@key2=value2]). The following ePath uses the keystring qualifier and returns the alias where the unique key field type is “Main”. It returns only one alias in a given record.
Person.Alias[@type=Main]
filter=value - Considers only nodes whose field matches the specified value. A subset of nodes is addressed using filters. Multiple filter-value pairs can be separated by a comma (for example, [filter1=value1, filter2=value2]). The following ePath uses the filter qualifier and returns all aliases where the last name is “Jones”.
Person.Alias[lastname=Jones]
Field - Designates the field to return and is in the form of a string.
The following sample illustrates an object structure containing a system object from Site A with a local ID of 111. The object contains a first name, last name, and three addresses. Following the sample are several ePath examples that refer to various elements of this object structure, each with a description of the data it addresses in the sample.
Enterprise
  SystemObject - A 111
    Person
      FirstName
      LastName
      -Address
        AddressType = Home
        Street = 800 Royal Oaks Dr.
        City = Monrovia
        State = CA
        PostalCode = 91016
      -Address
        AddressType = Office
        Street = 181 2nd Ave.
        City = Monrovia
        State = CA
        PostalCode = 91016
      -Address
        AddressType = Billing
        Street = 100 Grand Avenue
        City = El Segundo
        State = CA
        PostalCode = 90245
Person.Address.City – Equivalent to Person.Address[0].City.
Person.FirstName – Uses Person as the context, and is equivalent to Enterprise.SystemObject[@SystemCode=A,@Lid=111].Person.FirstName with Enterprise as the context.
Person.Address[@AddressType=Home].City – Returns a single ObjectField reference to “Monrovia” (the City field of the home address).
Person.Address[City=Monrovia,State=CA].Street – Returns a list of ObjectField references: “800 Royal Oaks Dr.”, “181 2nd Ave.” (the street fields for both addresses where the city is Monrovia and the state is CA). Note that a reference to the Billing address is not returned.
Person.Address[*].Street – Returns a list of ObjectField references: “800 Royal Oaks Dr.”, “181 2nd Ave.”, “100 Grand Avenue”. Note that all references to Street are returned.
Person.Address[2].* – Addresses the third address object (index 2, counting from 0 as in Person.Address[0] above) as an ObjectNode instead of an ObjectField.
The Candidate Select file and the MatchingConfig element of the Match Field file use qualified field names to specify the location of a field. This method defines a specific field and is not used to define a list of fields. A qualified field name is a sequence of nested nodes in an enterprise record where the most nested element is a data field.
There are two types of qualified field names.
Fully qualified field names - Allow you to define fields within the context of the enterprise object; that is, the field name uses Enterprise as the root. These are used in the MatchingConfig element of the Match Field file and to specify the fields in a query block in the Candidate Select file.
Qualified field names - Allow you to define fields within the context of the parent object; that is, the field name uses the name of the parent object as the root. These are used in the Candidate Select file to specify the source fields for the blocking query criteria.
The following topics describe and illustrate how to form qualified field name strings.
The syntax of a fully qualified field name is:
Enterprise.SystemSBR.parent_object.child_object.field_name
where parent_object refers to the name of the parent object in the index, child_object refers to the name of the child object that contains the field, and field_name is the full name of the field. If the parent object contains the field being defined, the child object is not required in the path.
The syntax of a qualified field name is:
parent_object.child_object.field_name
The following sample illustrates an object structure that could be defined in the Object Definition file. The object contains a Person parent object, and Address and Phone child objects.
Person
  FirstName
  LastName
  DateOfBirth
  Gender
  -Address
    AddressType
    StreetAddress
    Street
    City
    State
    PostalCode
  -Phone
    PhoneType
    PhoneNumber
The following fully qualified field names are valid for the sample structure above.
Enterprise.SystemSBR.Person.FirstName
Enterprise.SystemSBR.Person.Address.StreetAddress
Enterprise.SystemSBR.Person.Phone.PhoneNumber
The qualified field names that correspond with the fully qualified names listed above are:
Person.FirstName
Person.Address.StreetAddress
Person.Phone.PhoneNumber
In the Enterprise Data Manager file, simple field names are used to specify the location of a field that appears on the EDM. These are used in the GUI configuration section of the file. Simple field names define a specific field and are not used to define a list of fields. They include only the field name and the name of the object that contains the field. Simple field names allow you to define fields within the context of an object.
The following topics describe and illustrate how to form simple field notations:
The syntax of a simple field name is:
object.field_name
where object refers to the name of the object that contains the field being defined and field_name is the full name of the field.
The following sample illustrates an object structure that could be defined in the Object Definition file. The object contains a Person parent object, and Address and Phone child objects.
Person
  FirstName
  LastName
  DateOfBirth
  Gender
  -Address
    AddressType
    StreetAddress
    Street
    City
    State
    PostalCode
  -Phone
    PhoneType
    PhoneNumber
The following simple field names are valid for the sample structure above.
Person.FirstName
Address.StreetAddress
Phone.PhoneNumber
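As a sketch of where these names are used, the fragment here lists the three simple field names above as columns of a search results list. The search-result-list-page and field-ref elements are taken from the sample Enterprise Data Manager file; the page sizes are illustrative, and the fragment assumes that the Address and Phone objects and their fields are defined in node elements of the same file.

```xml
<search-result-list-page>
  <item-per-page>10</item-per-page>
  <max-result-size>100</max-result-size>
  <!-- Each field-ref value is object.field_name; no Enterprise or parent prefix -->
  <field-ref>Person.FirstName</field-ref>
  <field-ref>Address.StreetAddress</field-ref>
  <field-ref>Phone.PhoneNumber</field-ref>
</search-result-list-page>
```

Note that the child fields are written as Address.StreetAddress and Phone.PhoneNumber rather than Person.Address.StreetAddress, which is the qualified form used in the Candidate Select file.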