Skip Navigation Links | |
Exit Print View | |
Oracle Java CAPS Master Index Configuration Reference Java CAPS Documentation |
Oracle Java CAPS Master Index Configuration Reference
About Oracle Java CAPS Master Index
Oracle Java CAPS Master Index Configuration
Features of Oracle Java CAPS Master Index
Configuration Overview for Oracle Java CAPS Master Index
About the Configuration Files for Oracle Java CAPS Master Index
Master Index validation.xml File
Master Index security.xml File
Match and Standardization Engine Configuration Files
Using the Editors for Oracle Java CAPS Master Index
Master Index Configuration Editor
Master Index Object Definition Configuration
Master Index Object Definition Components
Master Index Object Definition Objects
Master Index Object Definition Fields
Master Index Object Definition Relationships
The Master Index object.xml File
Modifying the Master Index Object Definition
Basic Queries in a Master Index
Blocking Queries in a Master Index
Blocking Query Block Processing
Phonetic Queries in a Master Index
Master Controller Configuration
Custom Logic Classes in master.xml
Merged Record Updates in master.xml
Match and Standardization Engines
Block Picker and Pass Controller
Sample Standardization and Matching Sequence
Survivor Strategy Configuration
The Survivor Calculator and the SBR
Survivor Helper Default Strategy
Survivor Helper Weighted Strategy
Survivor Helper Union Strategy
Weighted Calculator SourceSystem Strategy
Weighted Calculator SystemAgreement Strategy
Weighted Calculator MostRecentModified Strategy
Update Manager Update Policies
Update Manager Update Policy Flag
SBR, Matching, and Blocking Filter Configuration
Field Validation Configuration
In query.xml, you configure properties of the Query Builder, which is a class that uses defined criteria and options to generate queries and query results from a master index database. The criteria and options used by the Query Builder to create database queries are defined in query.xml. The criteria must be fields that are defined in the Object Definition, and the options are key and value pairs that fine-tune the query operation. You can define the characteristics of the searches performed from the Master Index Data Manager and of the queries used by the master index application to search for a candidate pool of potential matches for incoming records.
The following topics provide information about queries and the structure of query.xml:
The master index application performs two types of queries. Users perform manual queries from the MIDM and the master index application automatically performs queries before processing matches for an incoming record. Two types of queries, basic queries and blocking queries, are predefined in the Query Builder. By default, basic queries are defined for the MIDM and blocking queries are defined for match processing, though this is not required. You can also use a blocking query for the phonetic searches performed from the MIDM. Both types of queries are configured by query.xml, and custom queries can be created and implemented with the master index application.
You can configure certain query properties. You can configure both basic and blocking queries to search on standardized or phonetic versions of the search criteria, and you can also specify that they search on exact values or a range of values. Basic queries can be configured to allow wildcard characters. For the blocking queries, you define the criteria to include in each block of query criteria.
The following topics provide additional information about the different types of queries:
By default, searches performed from the MIDM follow the logic defined in the configured basic queries. You can specify which query type to use for each search defined for the MIDM (this is specified in midm.xml). These searches can be weighted, which means that the match engine calculates the likelihood that the search results match the actual search criteria and assigns a matching weight to each returned record. You can specify whether the search is performed on the original or phonetic version of the criteria.
The basic query uses all supplied search criteria to create a single SQL query. For this query, each field in the WHERE clause is joined by an AND operator, meaning that only records that match on all search fields are returned. This query has an option to allow wildcard characters in the search criteria (a percent sign (%) indicates multiple unknown characters). When this option is set to true, the query uses the LIKE operator rather than EQUALS. This option allows you to search by criteria for which you have incomplete data.
The searches performed from the MIDM can be further customized in midm.xml (for more information, see Master Index Data Manager Configuration).
When the master index application evaluates possible matches of records sent to the master index application from external systems and from the MIDM, the index performs a set of predefined SQL queries to retrieve a subset of possible matches. These queries are known as blocking queries. The matching algorithm processes the input record against the profiles retrieved from the blocking query (known as the candidate pool) and assigns them matching probability weights.
In query.xml, you define the criteria and conditions for querying the database to retrieve the subset of possible matches to the incoming record, including hints. You can define multiple queries, known as blocks, for each blocking query, and the master index application performs each of these queries in turn until sufficient records are retrieved (called a match pass). Using the default Query Builder, a block is only processed if the search criteria include all of the fields defined for that block. Each field in a block is joined by an AND operator in the WHERE clause, and each block is joined by a UNION operator. This type of search can also be used as a phonetic search in the MIDM.
The blocking queries you define here are referenced in master.xml, which specifies which one of the defined blocking queries to use for match processing. They might also be referenced in midm.xml if a blocking query is used for phonetic searches from the MIDM. To enable extensive searching (that is, searching against additional tables, such as an alias table for a person index), you must add the fields from that table to the blocking query.
Certain fields used as criteria in the blocking query might contain known default or invalid values. You can define exclusion lists to filter out unwanted or invalid values from the blocking query. For more information, see SBR, Matching, and Blocking Filter Configuration.
You can configure both basic queries and blocking queries to perform phonetic searches from the MIDM. If you use a basic query, then all entered criteria must match existing records in order to return results from the search. If you use a blocking query, several queries are performed using different combinations of data until enough matching records are returned or until all defined combinations have been tried.
For example, if you use a basic query and enter first and last name, date of birth, gender, and SSN for criteria, the basic query might not return any matches if any one of those fields does not match the criteria. However, if you use a blocking query for the same example, it might search on SSN, then on first name and date of birth, and then on last name and gender. The query returns any matching records from any of the query passes.
Both basic and blocking queries can be configured to perform exact searches or range searches. An exact search performs a query for the exact value entered into a field as search criteria; range searches perform a query on a range of values based on the value entered into a field as search criteria. The basic query supports standard range searching, where both the lower and upper limits of the range is supplied. The blocking query supports standard range searching plus two additional types that use predefined offset values or constants.
Offset values allow you to specify values to be added to or subtracted from the entered value to determine the range on which to search. Constants provide a default value to use as a range when no value is entered or when incomplete information is available.
Range searching is configured in both midm.xml and query.xml. The processing logic for different types of range searching is described in Range Search Processing.
The properties for the predefined queries are defined in query.xml. Some of the information entered into the default configuration file is based on the fields you specified for blocking in the wizard, and some is standard across all implementations. For most implementations, this file will require some customization.
The following topics provide information about working with query.xml:
You can modify query.xml at any time, but you must regenerate the application and redeploy the project after making any changes to the file. The properties of the blocking query used by the match process should not be modified after moving into production because it can cause unexpected matching weight results. The possible modifications to this file are restricted by the schema definition, so be sure to validate the file after making any changes. Most of the components in this file can be configured using the Configuration Editor, which simplifies the process of defining queries by providing a graphical interface to perform the required tasks.
This topic describes the structure of the XML file, including general requirements and constraints, and provides a sample implementation.
Table 2 lists each element in query.xml and provides a description of each element along with any requirements or constraints for each element.
Table 2 query.xml File Structure
|
Below is a sample illustrating the elements in query.xml.
<QueryBuilderConfig module-name="QueryBuilder" parser-class= "com.sun.mdm.index.configurator.impl.querybuilder.QueryBuilderConfiguration"> <query-builder name="ALPHA-SEARCH" class="com.sun.mdm.index.querybuilder.BasicQueryBuilder" parser-class="com.sun.mdm.index.configurator.impl.querybuilder. KeyValueConfiguration" standardize="true" phoneticize="false"> <config> <option key="UseWildcard" value="true"/> </config> </query-builder> <query-builder name="PHONETIC-SEARCH" class="com.sun.mdm.index.querybuilder.BasicQueryBuilder" parser-class="com.sun.mdm.index.configurator.impl.querybuilder. KeyValueConfiguration" standardize="true" phoneticize="true"> <config> <option key="UseWildcard" value="false"/> </config> </query-builder> <query-builder name="BLOCKER-SEARCH" class="com.sun.mdm.index.querybuilder.BlockerQueryBuilder" parser- class="com.sun.mdm.index.configurator.impl.blocker.BlockerConfig" standardize="true" phoneticize="true"> <config> <block-definition number="ID000000"> <block-rule> <equals> <field>Enterprise.SystemSBR.Person.FnamePhonetic </field> <source>Person.FnamePhoneticCode</source> </equals> <equals> <field>Enterprise.SystemSBR.Person.LnamePhonetic </field> <source>Person.LnamePhoneticCode</source> </equals> </block-rule> </block-definition> <block-definition number="ID000001"> <block-rule> <equals> <field>Enterprise.SystemSBR.Person.SSN</field> <source>Person.SSN</source> </equals> </block-rule> </block-definition> <block-definition number="ID000002"> <hint>ALL_ROWS</hint> <block-rule> <equals> <field>Enterprise.SystemSBR.Person.FnamePhonetic </field> <source>Person.FnamePhoneticCode</source> </equals> <range> <field>Enterprise.SystemSBR.Person.DOB</field> <source>Person.DOB</source> <default> <lower-bound type="offset">-5</lower-bound> <upper-bound type="offset">5</upper-bound> </default> </range> <equals> <field>Enterprise.SystemSBR.Person.Gender</field> <source>Person.Gender</source> </equals> </block-rule> </block-definition> </config> </query-builder> </QueryBuilderConfig>
Both basic and blocking queries can be configured to perform both exact searches and range searches. The following topics describe how different configurations of exact and range searches are processed.
Range searching for basic queries is configured in the search page section of midm.xml by tagging the field with a “choice” attribute. When you specify a field for range searching, two corresponding fields appear on the MIDM with “From” and “To” appended to the name (for example, a field named “Date of Birth” would display two fields: “Date of Birth From” and Date of Birth To”). You can also define a field for both exact and range searching by defining the field twice for the search page, once with the choice attribute set to “exact” and once with it set to “range”. In this case, three fields appear on the MIDM: one with the given field name, one with “From” appended to the name, and one with “To” appended to the name.
Table 3 describes the queries formed for different exact or range search scenarios. Table 4 describes the queries formed for combination exact and range search scenarios.
The following variables are used in these tables:
field_name is the field name as specified in the search page section of midm.xml (the field named field_name is used for exact searching)
value is the value entered into the exact search field
value_from is the value entered into the field_name From field
value_to is the value entered into the field_name To field
Table 3 Standard Range Queries
|
In the following table, when field_name is populated but not used in the WHERE clause, its value is used for weighting purposes. These cases are marked with an asterisk (*).
Table 4 Combination Exact and Range Queries
|
Blocking queries are configured in query.xml, and, if the blocking query is used on the MIDM, in midm.xml. In order for the fields defined for range searching in the blocking query to appear on the MIDM, the fields must be configured correctly in midm.xml.
In addition to the standard range searching (described under Basic Query Range Searching), blocking queries support constant and offset range searches, allowing you to specify default upper and lower offset values or to specify upper and lower constant limits. Using offsets adds the specified values to the actual field value to determine the range on which to search. Note that this means the lower offset value should be a negative number and the upper offset value should be a positive number in order to create a valid range. You can also define a combination of a constant upper limit with lower offset value or a constant lower limit with an upper offset value.
When upper and lower offset values are defined, the application searches for values that are greater than or equal to the field value plus the lower offset value (which is typically a negative number) and less than or equal to the field value plus the upper offset value. You do not need to define both an upper and a lower offset value.
Note - For date fields, the method for adding the offsets is different for numeric than for date type fields. For numeric data types, the offset value is added to the actual number. For date data types, the offset value is added to the day portion of the date (for example, if the offsets were -5 and +5 and the date entered is 01/10/2005, then the upper and lower bounds would be 01/05/2005 and 01/15/2005).
Table 5 describes the queries formed for different exact or range offset search scenarios. Table 6 describes the query formed for combination exact and offset range search scenarios.
The following variables are used in these tables:
field_name is the field name as specified in the search page section of midm.xml (the field named field_name is used for exact searching)
value is the value entered into the exact search field
value_from is the value entered into the field_name From field
value_to is the value entered into the field_name To field
lower is the lower offset value
upper is the upper offset value
Table 5 Standard Offset Range Queries
|
In Table 6, the field configuration in midm.xml defines the field twice for searching, once with the choice attribute set to “exact” and once with it set to “range”.
In the following cases, when field_name is populated but not used in the WHERE clause, its value is used for weighting purposes. These cases are marked with an asterisk (*).
Table 6 Combination Offset Range Queries
|
When you define upper and lower constants for a field, these values are used for the WHERE clause of the query if no data is passed in as search criteria for that field. They are also used when only one of the “from” or “to” fields is populated. You do not need to define both an upper and a lower constant value. If you define only an upper constant value, only a “less than or equals” clause is used in the query; if you define only a lower constant value, only a “greater than or equals” clause is used in the query.
Note - For numeric type fields, the constant must be defined as all digits, with one decimal point allowed. For date type fields, the constant must be in the standard SQL format of yyyy-mm-dd.
Table 7 describes the queries formed for different exact or range constant search scenarios. Table 8 describes the query formed for combination exact and range search scenarios.
The following variables are used in these tables:
field_name is the field name as defined in the search page section of midm.xml (the field named field_name is used for exact searching)
value is the value entered into the exact search field
value_from is the value entered into the field_name From field
value_to is the value entered into the field_name To field
lower is the lower constant value
upper is the upper constant value
Table 7 Standard Constant Range Queries
|
In Table 8, the field configuration in midm.xml defines the field twice for searching, once with the choice attribute set to “exact” and once with it set to “range”.
In the following cases, when field_name is populated but not used in the WHERE clause, its value is used for weighting purposes. These cases are marked with an asterisk (*).
Table 8 Combination Constant Range Queries
|
You can use a combination of offset and constant values to define range searching for a field. Table 9 describes the query formed for combination offset and constant search scenarios.
The following variables are used in these tables:
field_name is the field name as defined in the search page section of midm.xml (the field named field_name is used for exact searching)
value is the value entered into the exact search field
value_from is the value entered into the field_name From field
value_to is the value entered into the field_name To field
lower is the lower constant or offset value
upper is the upper constant or offset value
In Table 9, the field configuration in midm.xml defines the field twice for searching, once with the choice attribute set to “exact” and once with it set to “range”.
In the following cases, when field_name is populated but not used in the WHERE clause, its value is used for weighting purposes. These cases are marked with an asterisk (*).
Table 9 Combination Constant and Offset Range Queries
|