Understanding Sun Master Index Configuration Options (Repository)

Candidate Select Configuration (Repository)

In the Candidate Select file, you configure properties of the Query Builder, a class that uses defined criteria and options to generate queries and query results from a master index database. The criteria must be fields that are defined in the Object Definition, and the options are key and value pairs that fine-tune the query operation. You can define the characteristics of both the searches performed from the Enterprise Data Manager (EDM) and the queries that the master index application uses to retrieve a candidate pool of potential matches for incoming records.

The following topics provide information about queries and the structure of the Candidate Select file:

Query Builder Components (Repository)

Queries are run against the master index in two ways: users perform manual queries from the EDM, and the master index application automatically performs queries before processing matches for an incoming record. Two types of queries, basic queries and blocking queries, are predefined in the Query Builder. By default, basic queries are defined for the EDM and blocking queries are defined for match processing, though this is not required. You can also use a blocking query for the phonetic searches performed from the EDM. Both types of queries are configured in the Candidate Select file, and custom queries can be created and implemented with the master index application.

You can configure both basic and blocking queries to search on standardized or phonetic versions of the search criteria, and to search on either exact values or a range of values. Basic queries can be configured to allow wildcard characters. For blocking queries, you define the criteria to include in each block of query criteria.

The following topics provide additional information about the different types of queries:

Basic Queries in a Master Index (Repository)

By default, searches performed from the EDM follow the logic defined in the configured basic queries. You can specify which query type to use for each search defined for the EDM (this is specified in the Enterprise Data Manager file). These searches can be weighted, which means that the match engine calculates the likelihood that the search results match the actual search criteria and assigns a matching weight to each returned record. You can specify whether the search is performed on the original or phonetic version of the criteria.

The basic query uses all supplied search criteria to create a single SQL query. Each field in the WHERE clause is joined by an AND operator, so only records that match on all search fields are returned. The query has an option to allow wildcard characters in the search criteria, where a percent sign (%) stands for multiple unknown characters. When this option is set to true, the query uses the LIKE operator rather than the equals (=) operator, which lets you search on criteria for which you have incomplete data.
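As a minimal sketch of how the wildcard option is switched on, the following fragment configures a basic query with the wildcard option set to true. The query name, class names, and attribute values are copied from the Candidate Select sample later in this section; whether they suit a given project is an assumption.

```xml
<!-- Sketch: a basic query that accepts the percent sign (%) as a wildcard. -->
<query-builder name="ALPHA-SEARCH"
 class="com.stc.eindex.querybuilder.BasicQueryBuilder"
 parser-class="com.stc.eindex.configurator.impl.querybuilder.KeyValueConfiguration"
 standardize="true" phoneticize="false">
   <config>
      <!-- true: criteria are compared with LIKE; false: criteria must match exactly -->
      <option key="UseWildcard" value="true"/>
   </config>
</query-builder>
```

With this setting, a criterion such as Sm% would be expected to match any value that begins with Sm.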

The searches performed from the EDM can be further customized in the Enterprise Data Manager file (for more information, see Enterprise Data Manager Configuration).

Blocking Queries in a Master Index (Repository)

When the master index application evaluates possible matches for records that it receives from external systems and from the EDM, it performs a set of predefined SQL queries to retrieve a subset of possible matches. These queries are known as blocking queries. The matching algorithm processes the input record against the profiles retrieved by the blocking query (known as the candidate pool) and assigns them matching probability weights.

In the Candidate Select file, you define the criteria and conditions for querying the database to retrieve the subset of possible matches to the incoming record, including Oracle hints and SQL Server OPTION hints. You can define multiple queries, known as blocks, for each blocking query, and the master index application performs each of these queries in turn until sufficient records are retrieved (called a match pass). Using the default Query Builder, a block is only processed if the search criteria include all of the fields defined for that block. Each field in a block is joined by an AND operator in the WHERE clause, and each block is joined by a UNION operator. This type of search can also be used as a phonetic search in the EDM.

The blocking queries you define here are referenced in the Threshold file, which specifies which one of the defined blocking queries to use for match processing. They might also be referenced in the Enterprise Data Manager file if a blocking query is used for phonetic searches from the EDM. To enable extensive searching (that is, searching against additional tables, such as an alias table for a person index), you must add the fields from that table to the blocking query.

Phonetic Queries in a Master Index (Repository)

You can configure both basic queries and blocking queries to perform phonetic searches from the EDM. If you use a basic query, then all entered criteria must match existing records in order to return results from the search. If you use a blocking query, several queries are performed using different combinations of data until enough matching records are returned or until all defined combinations have been tried.

For example, if you use a basic query and enter first and last name, date of birth, gender, and SSN for criteria, the basic query might not return any matches if any one of those fields does not match the criteria. However, if you use a blocking query for the same example, it might search on SSN, then on first name and date of birth, and then on last name and gender. The query returns any matching records from any of the query passes.
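The blocking example above could be expressed along the following lines. This is only a sketch: the block numbers are arbitrary, the field and source paths follow the pattern of the Candidate Select sample later in this section, and first and last names are represented by the phonetic fields used in that sample, which is an assumption about how the names would be searched.

```xml
<!-- Sketch: three blocks tried in turn; block numbers are illustrative. -->
<config>
   <!-- Block 1: SSN only -->
   <block-definition number="ID000000">
      <block-rule>
         <equals>
            <field>Enterprise.SystemSBR.Person.SSN</field>
            <source>Person.SSN</source>
         </equals>
      </block-rule>
   </block-definition>
   <!-- Block 2: first name (phonetic) AND date of birth -->
   <block-definition number="ID000001">
      <block-rule>
         <equals>
            <field>Enterprise.SystemSBR.Person.FnamePhonetic</field>
            <source>Person.FnamePhoneticCode</source>
         </equals>
         <equals>
            <field>Enterprise.SystemSBR.Person.DOB</field>
            <source>Person.DOB</source>
         </equals>
      </block-rule>
   </block-definition>
   <!-- Block 3: last name (phonetic) AND gender -->
   <block-definition number="ID000002">
      <block-rule>
         <equals>
            <field>Enterprise.SystemSBR.Person.LnamePhonetic</field>
            <source>Person.LnamePhoneticCode</source>
         </equals>
         <equals>
            <field>Enterprise.SystemSBR.Person.Gender</field>
            <source>Person.Gender</source>
         </equals>
      </block-rule>
   </block-definition>
</config>
```

Because a block is processed only when the search criteria include every field it names, a search that supplies an SSN but no date of birth would use the first block and skip the second.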

Range Searching (Repository)

Both basic and blocking queries can be configured to perform exact searches or range searches. An exact search performs a query for the exact value entered into a field as search criteria; a range search performs a query on a range of values based on the value entered into a field as search criteria. The basic query supports standard range searching, where both the lower and upper limits of the range are supplied. The blocking query supports standard range searching plus two additional types that use predefined offset values or constants.

Offset values allow you to specify values to be added to or subtracted from the entered value to determine the range on which to search. Constants provide a default value to use as a range when no value is entered or when incomplete information is available.
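To make the two additional range types concrete, the following sketch shows one block with offset bounds and one with constant bounds. The DOB field and the offsets of -5 and 5 come from the Candidate Select sample later in this section. The block numbers, the constant values, and their date format are assumptions for illustration only, and the unit applied to a date offset is not stated here.

```xml
<!-- Sketch: offset and constant range rules; values are illustrative. -->
<config>
   <block-definition number="ID000000">
      <block-rule>
         <!-- Offset range: -5 and 5 are added to the entered value to get the limits -->
         <range>
            <field>Enterprise.SystemSBR.Person.DOB</field>
            <source>Person.DOB</source>
            <default>
               <lower-bound type="offset">-5</lower-bound>
               <upper-bound type="offset">5</upper-bound>
            </default>
         </range>
      </block-rule>
   </block-definition>
   <block-definition number="ID000001">
      <block-rule>
         <!-- Constant range: fixed limits (hypothetical values, assumed date format) -->
         <range>
            <field>Enterprise.SystemSBR.Person.DOB</field>
            <source>Person.DOB</source>
            <default>
               <lower-bound type="constant">1900-01-01</lower-bound>
               <upper-bound type="constant">2000-12-31</upper-bound>
            </default>
         </range>
      </block-rule>
   </block-definition>
</config>
```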

Range searching is configured in both the Enterprise Data Manager file and the Candidate Select file. The processing logic for different types of range searching is described in Range Search Processing (Repository).

The Candidate Select File (Repository)

The properties for the predefined queries are defined in the Candidate Select file in XML format. Some of the information entered into the default configuration file is based on the fields you specified for blocking in the wizard, and some is standard across all implementations. For most implementations, this file will require some customization.

The following topics provide information about working with the Candidate Select file:

Modifying the Candidate Select File

You can modify the Candidate Select file at any time, but you must regenerate the application and redeploy the project after making any changes to the file. Do not modify the properties of the blocking query used by the match process after moving into production, because doing so can cause unexpected matching weight results. The possible modifications to this file are restricted by the schema definition, so be sure to validate the file after making any changes. Most of the components in this file can be configured using the Configuration Editor, which simplifies the process of defining queries by providing a graphical interface to perform the required tasks.

Candidate Select File Description

Table 2 lists each element in the Candidate Select file and provides a description of each element along with any requirements or constraints for each element.

Table 2 Candidate Select File Structure

Element/Attribute 

Description 

QueryBuilderConfig

The configuration class for the query builders. This should not be modified. 

query-builder

A list of query definitions. This element defines each query and the attributes of each query. 

query-builder/name

A unique ID for the query. This attribute identifies the Query Builder and is referenced from the Enterprise Data Manager file when specifying the query to use on a search page. It is also referenced from the Threshold file when specifying the blocking query to use for matching. No spaces are allowed in this attribute.

query-builder/class

The fully qualified name of the query class. Two default Query Builder classes are provided. 

  • com.stc.eindex.querybuilder.BasicQueryBuilder – Builds dynamic queries using all the available input fields. When configured to use normalized and phonetic data, this query performs phonetic searches; when configured not to use normalized and phonetic data, this query is used for exact alphanumeric searches.

  • com.stc.eindex.querybuilder.BlockerQueryBuilder – Builds queries using the criteria defined in the block definitions defined for the query. When a blocking query is performed, the application searches only on the blocks for which the query has complete data.

query-builder/parser-class

The fully qualified name of the class that parses the config elements for each query. This should not be modified for the default queries.

query-builder/standardize

An indicator of whether the query criteria is standardized before being passed to the query. Specify true if any fields are standardized for the query; specify false if no fields are standardized for the query.

query-builder/phoneticize

An indicator of whether the query criteria is phonetically encoded before being passed to the query. Specify true if any fields are phonetically encoded for the query; specify false if no fields are phonetically encoded for the query.

config

The configuration information for a query. Each query-builder element contains one config element.

option

One query parameter, specified by key and value attributes, as described below. This is only used by basic queries; blocking queries do not use this element.

option/key

A parameter for the query option. For the default basic query, only the UseWildcard key is available.

option/value

The value of the key specified by the corresponding key attribute. For the default option, UseWildcard, specify true to allow wildcard characters for that query type; otherwise specify false. When wildcard characters are enabled, you can enter a percent sign (%) to indicate multiple unknown characters.

block-definition

A list of Oracle hints or SQL Server OPTION hints and defined query criteria blocks, which are identified by unique ID numbers. 

block-definition/number

An attribute of the block-definition element that specifies the unique ID number of each query block. Each block defined for the blocking query must be identified by a unique ID.

hint

A hint to add to the query to help optimize query execution. Hints are especially useful when a blocking query uses only child object fields; the hint can specify to scan the child object table first. This element is optional. For SQL Server, only OPTION hints are supported.

block-rule

A list of fields to be included in each query block, including indicators of whether a range is to be used and, if so, what type of range search to perform. 

type of search

An indicator of the type of search to perform on the field defined in the following elements. Each type of search element defines one field in a block-rule element; that is, one field in a query block. This element includes a field element, a source or constant element, and, for range searches only, a default element that defines lower and upper bounds.

Specify one of the following types.

  • equals - Performs an exact search against either the criteria or the value defined for the constant element.

  • not-equals - Searches for values that do not equal either the criteria or the value defined for the constant element.

  • greater-than-or-equal - Performs a search for values that are greater than or equal to either the criteria or the value defined for the constant element.

  • less-than-or-equal - Performs a search for values that are less than or equal to either the criteria or the value defined for the constant element.

  • range - Performs a search against either a static range or a user-defined range. If you select this option, you must specify upper and lower bounds in a default element.


Tip –

If a field is to be used for simple range searching (where the user or incoming message supplies the lower and upper limits of the range), be sure to define that field for range searching in the Enterprise Data Manager file for the searches that use this query. For more complex range searches that use offset values or constants instead of user-supplied limits, do not define the field for range searching in the Enterprise Data Manager file.


field

The fully qualified field name of the field to be included in the query block (for example, Enterprise.Person.Address.AddressLine1). 

source

The qualified field name of the source field in the object from which the criteria is obtained (for example, Person.Address.AddressLine1). An asterisk (*) can be used as a wildcard character. If the criteria should be a constant value instead of being supplied by a user or incoming message, define a constant element instead of a source element.


Tip –

When a field in a child object is defined for a blocking query, use the asterisk wildcard character in the ePath to the source field to ensure all instances of the child object in an incoming message are used as search criteria. Each instance is joined by an OR operator. For example, this configuration:


<field>Enterprise.SystemSBR.Person.Alias.FirstName</field>

<source>Person.Alias[*].FirstName</source>

would result in a WHERE clause similar to this:


WHERE Alias.FirstName='Meg' OR Alias.FirstName='Maggie'

constant

A constant value that provides the criteria for a search. Define this element instead of a source element if the criteria is a constant rather than being user defined. You can use a constant value with the following types of search: equals, not-equals, greater-than-or-equal, and less-than-or-equal.

default

A list of upper and lower limits defining a range search. If no limits are defined, the search is a simple range search in which the upper and lower values are supplied by the user or the incoming message (for example, in “Date of Birth From” and “Date of Birth To” fields). 

lower-bound

The lower limit of a constant or offset range search. Use a negative number for the lower limit of an offset search. This number is added to the value supplied for the search to determine the lower limit of the range. The value can be numeric, date, or string. See Range Search Processing (Repository) for more information.

lower-bound/type

The type of range search. Define the type attribute as offset to use an offset value or as constant to define a lower constant.

upper-bound

The upper limit of a constant or offset range search. The value can be numeric, date, or string. See Range Search Processing (Repository) for more information.

upper-bound/type

The type of range search. Define the type attribute as offset to use an offset value or as constant to define an upper constant.
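As a hedged illustration of the constant element described in Table 2, the fragment below combines a user-supplied criterion with a fixed one in a single block. The field paths follow the Candidate Select sample; the block number, the constant value, and its date format are hypothetical.

```xml
<!-- Sketch: a block that mixes a source criterion with a constant criterion. -->
<block-definition number="ID000000">
   <block-rule>
      <!-- Criterion taken from the user or the incoming message -->
      <equals>
         <field>Enterprise.SystemSBR.Person.SSN</field>
         <source>Person.SSN</source>
      </equals>
      <!-- Fixed criterion: a constant element replaces the source element -->
      <greater-than-or-equal>
         <field>Enterprise.SystemSBR.Person.DOB</field>
         <constant>1900-01-01</constant>
      </greater-than-or-equal>
   </block-rule>
</block-definition>
```

Both rules sit in the same block-rule, so they would be joined by an AND operator in the WHERE clause.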

Candidate Select Example

Below is a sample illustrating the elements in the Candidate Select file.


<QueryBuilderConfig module-name="QueryBuilder" parser-class=
   "com.stc.eindex.configurator.impl.querybuilder.QueryBuilderConfiguration">
   <query-builder name="ALPHA-SEARCH"
    class="com.stc.eindex.querybuilder.BasicQueryBuilder"
    parser-class="com.stc.eindex.configurator.impl.querybuilder.KeyValueConfiguration"
    standardize="true" phoneticize="false">
      <config>
         <option key="UseWildcard" value="true"/>
      </config>
   </query-builder>
   <query-builder name="PHONETIC-SEARCH"
    class="com.stc.eindex.querybuilder.BasicQueryBuilder"
    parser-class="com.stc.eindex.configurator.impl.querybuilder.KeyValueConfiguration"
    standardize="true" phoneticize="true">
      <config>
         <option key="UseWildcard" value="false"/>
      </config>
   </query-builder>
   <query-builder name="BLOCKER-SEARCH"
    class="com.stc.eindex.querybuilder.BlockerQueryBuilder"
    parser-class="com.stc.eindex.configurator.impl.blocker.BlockerConfig"
    standardize="true" phoneticize="true">
      <config>
         <block-definition number="ID000000">
            <block-rule>
               <equals>
                  <field>Enterprise.SystemSBR.Person.FnamePhonetic</field>
                  <source>Person.FnamePhoneticCode</source>
               </equals>
               <equals>
                  <field>Enterprise.SystemSBR.Person.LnamePhonetic</field>
                  <source>Person.LnamePhoneticCode</source>
               </equals>
            </block-rule>
         </block-definition>
         <block-definition number="ID000001">
            <block-rule>
               <equals>
                  <field>Enterprise.SystemSBR.Person.SSN</field>
                  <source>Person.SSN</source>
               </equals>
            </block-rule>
         </block-definition>
         <block-definition number="ID000002">
            <hint>ALL_ROWS</hint>
            <block-rule>
               <equals>
                  <field>Enterprise.SystemSBR.Person.FnamePhonetic</field>
                  <source>Person.FnamePhoneticCode</source>
               </equals>
               <range>
                  <field>Enterprise.SystemSBR.Person.DOB</field>
                  <source>Person.DOB</source>
                  <default>
                     <lower-bound type="offset">-5</lower-bound>
                     <upper-bound type="offset">5</upper-bound>
                  </default>
               </range>
               <equals>
                  <field>Enterprise.SystemSBR.Person.Gender</field>
                  <source>Person.Gender</source>
               </equals>
            </block-rule>
         </block-definition>
      </config>
   </query-builder>
</QueryBuilderConfig>