5.2 Analyzers

The Analyzer assists compliance teams and technical users in understanding how specific input data—such as names or addresses—is tokenized, normalized, and matched against watchlist entries. It enables detailed examination of synonym expansion, token classification, and match scoring mechanisms that are essential for fine-tuning the screening engine and ensuring accurate detection of true positives while minimizing false positives.

Configure an Analyzer in Index Management

  1. From the Navigation List, click Watchlist Management > Index Management. The Index Management page displays a list of available watchlists.
  2. Click the Index JSON Edit icon (Figure 5-1 Edit icon) to view the JSON file details associated with the watchlist.

    The Edit Index JSON pop-up appears.

  3. You can edit the JSON in this window.

    For example, to use Country Synonyms for the country name in the Prohibited Country watchlist, edit the index JSON of the Country List and change the analyzerType of the relevant attribute ("name" : "v_country_name") to address.

    The following sample is the index JSON for the Dow Jones watchlist. Each object in the attributes array defines the pre-processing of one field, and the analyzerType of a text attribute can be any analyzer type listed in Table 5-3.

    {
      "schemaName" : "",
      "runSkey" : 68,
      "batchRunId" : "WLDJWLoad_2024-03-28_1711619630679_1",
      "tableName" : "FCC_WL_DJW_V",
      "deletedProfilesTableName" : null,
      "filterCondition" : "1=1",
      "indexName" : "fcc_idx_djw",
      "indexAlias" : "idx_djw",
      "disasterRecovery" : false,
      "indexLogicalName" : "Watchlist",
      "indexBusinessName" : "Dow Jones",
      "indexKeyAttribute" : "n_uid",
      "loadType" : "FullLoad",
      "shards" : 3,
      "replicas" : 4,
      "attributes" : [
        { "name" : "v_given_name", "type" : "text", "similarity" : "boolean", "analyzerType" : "name", "searchAnalyzerType" : null, "fields" : [ ], "termVector" : null },
        { "name" : "v_family_name", "type" : "text", "similarity" : "boolean", "analyzerType" : "namestop", "searchAnalyzerType" : null, "fields" : [ ], "termVector" : null },
        { "name" : "v_full_name", "type" : "text", "similarity" : "boolean", "analyzerType" : "namestop", "searchAnalyzerType" : null, "fields" : [ ], "termVector" : null },
        { "name" : "v_aliases_given_name", "type" : "text", "similarity" : "boolean", "analyzerType" : "namestop", "searchAnalyzerType" : null, "fields" : [ ], "termVector" : null },
        { "name" : "v_aliases_family_name", "type" : "text", "similarity" : "boolean", "analyzerType" : "name", "searchAnalyzerType" : null, "fields" : [ ], "termVector" : null },
        { "name" : "v_entity_name", "type" : "text", "similarity" : "boolean", "analyzerType" : "organization", "searchAnalyzerType" : null, "fields" : [ ], "termVector" : null },
        { "name" : "v_entity_name_bus_strip", "type" : "text", "similarity" : "boolean", "analyzerType" : "organization", "searchAnalyzerType" : null, "fields" : [ ], "termVector" : null },
        { "name" : "v_original_script_name", "type" : "text", "similarity" : "boolean", "analyzerType" : "name", "searchAnalyzerType" : null, "fields" : [ ], "termVector" : null },
        { "name" : "v_date_of_births", "type" : "text", "similarity" : "boolean", "analyzerType" : "date", "searchAnalyzerType" : null, "fields" : [ ], "termVector" : null },
        { "name" : "v_passports", "type" : "text", "similarity" : "boolean", "analyzerType" : "name", "searchAnalyzerType" : null, "fields" : [ ], "termVector" : null },
        { "name" : "v_ssn", "type" : "text", "similarity" : "boolean", "analyzerType" : "name", "searchAnalyzerType" : null, "fields" : [ ], "termVector" : null },
        { "name" : "v_identification_numbers", "type" : "text", "similarity" : "boolean", "analyzerType" : "name", "searchAnalyzerType" : null, "fields" : [ ], "termVector" : null },
        { "name" : "v_city", "type" : "text", "similarity" : "boolean", "analyzerType" : "address", "searchAnalyzerType" : null, "fields" : [ ], "termVector" : null },
        { "name" : "v_country", "type" : "text", "similarity" : "boolean", "analyzerType" : "address", "searchAnalyzerType" : null, "fields" : [ ], "termVector" : null },
        { "name" : "v_nationality", "type" : "text", "similarity" : "boolean", "analyzerType" : "address", "searchAnalyzerType" : null, "fields" : [ ], "termVector" : null },
        { "name" : "v_residence", "type" : "text", "similarity" : "boolean", "analyzerType" : "address", "searchAnalyzerType" : null, "fields" : [ ], "termVector" : null },
        { "name" : "v_yob", "type" : "text", "similarity" : "boolean", "analyzerType" : "name", "searchAnalyzerType" : null, "fields" : [ ], "termVector" : null },
        { "name" : "v_min_yob", "type" : "integer", "similarity" : "boolean", "analyzerType" : null, "searchAnalyzerType" : null, "fields" : [ ], "termVector" : null },
        { "name" : "v_max_yob", "type" : "integer", "similarity" : "boolean", "analyzerType" : null, "searchAnalyzerType" : null, "fields" : [ ], "termVector" : null },
        { "name" : "v_address", "type" : "text", "similarity" : "boolean", "analyzerType" : "address", "searchAnalyzerType" : null, "fields" : [ ], "termVector" : null },
        { "name" : "v_aliases", "type" : "text", "similarity" : "boolean", "analyzerType" : "namestop", "searchAnalyzerType" : null, "fields" : [ ], "termVector" : null },
        { "name" : "v_gender", "type" : "text", "similarity" : "boolean", "analyzerType" : "gender", "searchAnalyzerType" : null, "fields" : [ ], "termVector" : null },
        { "name" : "v_place_of_birth", "type" : "text", "similarity" : "boolean", "analyzerType" : "address", "searchAnalyzerType" : null, "fields" : [ ], "termVector" : null },
        { "name" : "v_title", "type" : "text", "similarity" : "boolean", "analyzerType" : "gender", "searchAnalyzerType" : null, "fields" : [ ], "termVector" : null }
      ],
      "customAnalyzer" : [ ],
      "customFilter" : [ ],
      "customCharFilter" : [ ],
      "customTokenizer" : [ ],
      "others" : [ "n_wl_skey", "n_run_skey", "v_wl_sub_type", "v_wl_type", "v_entity_type", "n_uid" ],
      "replaceEmptyFields" : [ ],
      "replaceCharFields" : [
        { "name" : "v_full_name", "charArray" : [ "-.", "''" ], "replaceWith" : [ " ", "" ] },
        { "name" : "v_family_name", "charArray" : [ "-.", "''" ], "replaceWith" : [ " ", "" ] },
        { "name" : "v_given_name", "charArray" : [ "-.", "''" ], "replaceWith" : [ " ", "" ] },
        { "name" : "v_entity_name", "charArray" : [ "-.", "''", "&", "()" ], "replaceWith" : [ " ", "", "and", " " ] },
        { "name" : "v_entity_name_bus_strip", "charArray" : [ "-.", "''", "&", "()" ], "replaceWith" : [ "", "", "and", " " ] },
        { "name" : "v_original_script_name", "charArray" : [ "-.", "''" ], "replaceWith" : [ " ", "" ] },
        { "name" : "v_country", "charArray" : [ "-.", "''" ], "replaceWith" : [ " ", "" ] },
        { "name" : "v_address", "charArray" : [ "-.", "''" ], "replaceWith" : [ " ", "" ] },
        { "name" : "v_aliases_given_name", "charArray" : [ "-.", "''" ], "replaceWith" : [ " ", "" ] },
        { "name" : "v_aliases_family_name", "charArray" : [ "-.", "''" ], "replaceWith" : [ " ", "" ] },
        { "name" : "v_aliases", "charArray" : [ "-.", "''" ], "replaceWith" : [ " ", "" ] }
      ],
      "translateFields" : [ "v_family_name", "v_given_name", "v_full_name", "v_aliases_family_name", "v_aliases_given_name", "v_aliases" ]
    }
  4. Click Validate to verify that your edits to the JSON are valid.
  5. Click Save to update the JSON or click Cancel to close without saving your changes.
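
The edit described in step 3 can also be prepared or checked offline before it is pasted into the Edit Index JSON pop-up. The following minimal Python sketch assumes the index JSON has been copied into a string. The helper name set_analyzer_type and the allow-list KNOWN_ANALYZER_TYPES are hypothetical: the list contains only the analyzerType values that appear in the sample index JSON, because the JSON identifiers of the other analyzers in Table 5-3 are not shown in this guide.

```python
import json

# Hypothetical allow-list: only the analyzerType values that appear in the
# sample index JSON. The identifiers used for the other analyzers in
# Table 5-3 are not documented here and would need to be added.
KNOWN_ANALYZER_TYPES = {"name", "namestop", "organization", "address", "date", "gender"}


def set_analyzer_type(index_json: str, attribute: str, analyzer: str) -> str:
    """Return index_json with the analyzerType of `attribute` set to `analyzer`.

    Raises ValueError if the analyzer is not in the allow-list or if no entry
    in the "attributes" array has the given name.
    """
    if analyzer not in KNOWN_ANALYZER_TYPES:
        raise ValueError(f"unknown analyzerType: {analyzer!r}")
    index = json.loads(index_json)  # raises an error on malformed JSON
    for attr in index.get("attributes", []):
        if attr.get("name") == attribute:
            attr["analyzerType"] = analyzer
            return json.dumps(index, indent=2)
    raise ValueError(f"attribute not found: {attribute!r}")


if __name__ == "__main__":
    # Hypothetical fragment of a Country List index JSON; the starting
    # analyzerType value is an assumption made for illustration.
    sample = '{"attributes": [{"name": "v_country_name", "type": "text", "analyzerType": "name"}]}'
    print(set_analyzer_type(sample, "v_country_name", "address"))
```

Failing loudly on an unknown attribute or analyzer catches typos before the JSON reaches the Validate step.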

Analyzer Types

Table 5-3 Analyzer Types

For each analyzer type, the table lists its supported filters, the type of each filter (Synonym or Stop Word), and a description.

Name
  Supported filters: Individual Name Synonyms (Synonym), Individual Title (Stop Word)
  The Name analyzer processes person names by applying standardization rules such as resolving name synonyms (e.g., Will - William) and removing non-essential titles (e.g., Mr., Dr.). This improves match results during screening.

Address
  Supported filters: Country Synonyms (Synonym)
  The Address analyzer processes address components by resolving country-specific synonyms (e.g., USA - United States, UK - United Kingdom). This enhances consistency and accuracy in location-based matching.
  Example:
    Input: 123 Main St, NY, USA
    Normalized to: 123 Main Street, New York, United States

Phone
  Supported filters: No token filters
  The Phone analyzer tokenizes and indexes phone numbers without applying any additional filters, enabling straightforward, direct matching of phone number values during screening.

Email
  Supported filters: No token filters
  The Email analyzer processes email addresses as exact tokens without applying any transformations or filters. It is designed for direct string matching, so the full email address is preserved for accurate comparison.

Organization
  Supported filters: Organization Numbers (Synonym), Organization Suffix (Stop Word), Organization Strip Words (Stop Word)
  The Organization analyzer standardizes organization names by removing common suffixes (e.g., Inc, Ltd), normalizing common terms, and ignoring non-distinct or generic words. This improves matching for corporate entity names.

Gender
  Supported filters: Individual Gender (Synonym)
  The Gender analyzer handles gender-related fields by resolving known synonyms (e.g., F - Female, M - Male) for consistent identity matching.

Date
  Supported filters: No token filters
  Dates are indexed as-is, with no transformation or filtering.

Name Stop
  Supported filters: Individual Name Synonyms (Synonym), Individual Title (Stop Word), Individual Name Strip Words (Stop Word)
  The Name Stop analyzer cleans and normalizes names by removing non-essential tokens and standardizing known variations. This helps improve match accuracy during screening.
  Example: For the name Mr. John A. Smith, the Name Stop analyzer:
    • Removes the title “Mr.” using the Individual Title filter.
    • Removes initials such as “A.” if they are listed in Individual Name Strip Words.
    • Recognizes that “John” is a synonym for “Jon” if this is defined in Individual Name Synonyms.
  Result: John Smith, normalized for better matching.

TF Analyzer
  Supported filters: No token filters
  The TF Analyzer is used in Transaction Filtering to tokenize and normalize input data by applying filters such as lowercasing, stop word removal, and synonym resolution for improved match accuracy.

Document ID
  Supported filters: No token filters
  A custom analyzer that tokenizes text on delimiters (comma, semicolon, tilde, parentheses, and space), converts tokens to lowercase, and removes duplicates. It is intended for indexing document identifiers such as national ID numbers or other personal identification numbers.

Organization Strip
  Supported filters: Organization Numbers (Synonym), Organization Stop Words (Stop Word)
  The Organization Strip analyzer strips common suffixes and terms that do not add unique value to an organization name, improving match consistency across similar entries.
  Example: For the organization name ABC Technologies Pvt. Ltd., the Organization Strip analyzer:
    • Removes stop words such as “Pvt.” and “Ltd.”
    • May replace common synonyms, such as “Technologies” with “Tech”, if configured.
  Processed result: ABC Tech
  This normalizes similar entries such as ABC Technologies Limited and ABC Tech Ltd. to a comparable form for accurate screening.

Alphanumeric Keyword
  Supported filters: No token filters
  A custom analyzer that splits text on any sequence of characters that are neither letters nor digits, converts all characters to lowercase, and applies ASCII folding to normalize accented or special characters to their ASCII equivalents.
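
The Document ID and Alphanumeric Keyword analyzers in Table 5-3 are described precisely enough to sketch. The following Python approximation follows those two descriptions and is not the product's implementation. The function names are hypothetical, and the ASCII folding step is an assumption: it uses Unicode NFKD decomposition, which folds accented letters such as é or ü but drops characters such as ø or ß instead of mapping them the way a full ASCII-folding filter would.

```python
import re
import unicodedata

# Document ID: delimiters are comma, semicolon, tilde, parentheses, and space.
_DOC_ID_DELIMS = re.compile(r"[,;~() ]+")

# Alphanumeric Keyword: split on runs of characters that are not letters or
# digits. Python's Unicode-aware \w covers letters, digits, and underscore,
# so [\W_] matches everything except (approximately) letters and digits.
_NON_ALNUM = re.compile(r"[\W_]+")


def document_id_tokens(text: str) -> list[str]:
    """Split on the delimiters, lowercase, and drop duplicates (first one wins)."""
    tokens = [t.lower() for t in _DOC_ID_DELIMS.split(text) if t]
    return list(dict.fromkeys(tokens))  # dict keeps insertion order


def _ascii_fold(token: str) -> str:
    # Approximate ASCII folding: decompose (NFKD) and discard the non-ASCII
    # combining marks, e.g. "é" -> "e". Characters with no decomposition
    # (such as "ø" or "ß") are removed rather than mapped.
    decomposed = unicodedata.normalize("NFKD", token)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def alphanumeric_keyword_tokens(text: str) -> list[str]:
    """Split on non-alphanumeric runs, lowercase, then ASCII-fold each token."""
    tokens = [t.lower() for t in _NON_ALNUM.split(text) if t]
    folded = (_ascii_fold(t) for t in tokens)
    return [t for t in folded if t]  # drop tokens that folded to nothing


if __name__ == "__main__":
    print(document_id_tokens("AB123;ab123 (X9~Z8), cd45"))
    print(alphanumeric_keyword_tokens("Société Générale-SA/2024"))
```

The first call yields ['ab123', 'x9', 'z8', 'cd45'] because the two spellings of AB123 collapse to one lowercase token; the second yields ['societe', 'generale', 'sa', '2024'].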
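
The Name Stop and Organization Strip examples in Table 5-3 follow the same pattern: split the text into tokens, drop stop words, then map synonyms to a common form. The following Python sketch reproduces those two worked examples under stated assumptions. The word lists are small, hypothetical stand-ins for the Individual Title, Individual Name Strip Words, Individual Name Synonyms, Organization Stop Words, and synonym lists configured in the product; tokens are lowercased, which the examples do not show; and synonyms are modeled as one-way replacement with a canonical form, whereas a search-engine synonym filter may instead expand a token into all of its variants.

```python
import re

# Hypothetical stand-ins for the configured filter lists (not product data).
NAME_STOP_WORDS = {"mr", "mrs", "ms", "dr",  # Individual Title (Stop Word)
                   "a"}                      # Individual Name Strip Words (Stop Word)
NAME_SYNONYMS = {"jon": "john", "will": "william"}  # Individual Name Synonyms

ORG_STOP_WORDS = {"pvt", "ltd", "limited"}  # Organization Stop Words
ORG_SYNONYMS = {"technologies": "tech"}     # synonym mapping "if configured"


def analyze(text: str, stop_words: set[str], synonyms: dict[str, str]) -> list[str]:
    """Lowercase, split on non-alphanumerics, drop stop words, map synonyms."""
    tokens = [t for t in re.split(r"[\W_]+", text.lower()) if t]
    kept = [t for t in tokens if t not in stop_words]
    # Replace each token with its canonical form when one is defined.
    return [synonyms.get(t, t) for t in kept]


if __name__ == "__main__":
    for name in ("Mr. John A. Smith", "Jon Smith"):
        print(name, "->", analyze(name, NAME_STOP_WORDS, NAME_SYNONYMS))
    for org in ("ABC Technologies Pvt. Ltd.", "ABC Tech Ltd.", "ABC Technologies Limited"):
        print(org, "->", analyze(org, ORG_STOP_WORDS, ORG_SYNONYMS))
```

Both names reduce to ['john', 'smith'] and all three organization spellings reduce to ['abc', 'tech'], which is what lets them match. Removing stop words before applying synonyms keeps a title or suffix from being rewritten into a token that then survives.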