5.2 Analyzers

The Analyzer assists compliance teams and technical users in understanding how specific input data—such as names or addresses—is tokenized, normalized, and matched against watchlist entries. It enables detailed examination of synonym expansion, token classification, and match scoring mechanisms that are essential for fine-tuning the screening engine and ensuring accurate detection of true positives while minimizing false positives.

Configure Analyzer on Index Management

From the Navigation List, click Watchlist Management > Index Management. The Index Management page displays a list of available watchlists.
Click Index JSON (Figure 5-1 Edit icon) to view the JSON file details associated with this watchlist.
The Edit Index JSON pop-up appears.

You can edit the JSON in this window.

For example, to use Country Synonyms for Country Name under the Prohibited Country watchlist, edit the Country List's index JSON and change the analyzerType of the relevant attribute ("name" : "v_country_name") to address.

{
  "schemaName" : "",
  "runSkey" : 68,
  "batchRunId" : "WLDJWLoad_2024-03-28_1711619630679_1",
  "tableName" : "FCC_WL_DJW_V",
  "deletedProfilesTableName" : null,
  "filterCondition" : "1=1",
  "indexName" : "fcc_idx_djw",
  "indexAlias" : "idx_djw",
  "disasterRecovery" : false,
  "indexLogicalName" : "Watchlist",
  "indexBusinessName" : "Dow Jones",
  "indexKeyAttribute" : "n_uid",
  "loadType" : "FullLoad",
  "shards" : 3,
  "replicas" : 4,
  "attributes" : [
    { "name" : "v_given_name", "type" : "text", "similarity" : "boolean", "analyzerType" : "name", "searchAnalyzerType" : null, "fields" : [ ], "termVector" : null },
    { "name" : "v_family_name", "type" : "text", "similarity" : "boolean", "analyzerType" : "namestop", "searchAnalyzerType" : null, "fields" : [ ], "termVector" : null },
    { "name" : "v_full_name", "type" : "text", "similarity" : "boolean", "analyzerType" : "namestop", "searchAnalyzerType" : null, "fields" : [ ], "termVector" : null },
    { "name" : "v_aliases_given_name", "type" : "text", "similarity" : "boolean", "analyzerType" : "namestop", "searchAnalyzerType" : null, "fields" : [ ], "termVector" : null },
    { "name" : "v_aliases_family_name", "type" : "text", "similarity" : "boolean", "analyzerType" : "name", "searchAnalyzerType" : null, "fields" : [ ], "termVector" : null },
    { "name" : "v_entity_name", "type" : "text", "similarity" : "boolean", "analyzerType" : "organization", "searchAnalyzerType" : null, "fields" : [ ], "termVector" : null },
    { "name" : "v_entity_name_bus_strip", "type" : "text", "similarity" : "boolean", "analyzerType" : "organization", "searchAnalyzerType" : null, "fields" : [ ], "termVector" : null },
    { "name" : "v_original_script_name", "type" : "text", "similarity" : "boolean", "analyzerType" : "name", "searchAnalyzerType" : null, "fields" : [ ], "termVector" : null },
    { "name" : "v_date_of_births", "type" : "text", "similarity" : "boolean", "analyzerType" : "date", "searchAnalyzerType" : null, "fields" : [ ], "termVector" : null },
    { "name" : "v_passports", "type" : "text", "similarity" : "boolean", "analyzerType" : "name", "searchAnalyzerType" : null, "fields" : [ ], "termVector" : null },
    { "name" : "v_ssn", "type" : "text", "similarity" : "boolean", "analyzerType" : "name", "searchAnalyzerType" : null, "fields" : [ ], "termVector" : null },
    { "name" : "v_identification_numbers", "type" : "text", "similarity" : "boolean", "analyzerType" : "name", "searchAnalyzerType" : null, "fields" : [ ], "termVector" : null },
    { "name" : "v_city", "type" : "text", "similarity" : "boolean", "analyzerType" : "address", "searchAnalyzerType" : null, "fields" : [ ], "termVector" : null },
    { "name" : "v_country", "type" : "text", "similarity" : "boolean", "analyzerType" : "address", "searchAnalyzerType" : null, "fields" : [ ], "termVector" : null },
    { "name" : "v_nationality", "type" : "text", "similarity" : "boolean", "analyzerType" : "address", "searchAnalyzerType" : null, "fields" : [ ], "termVector" : null },
    { "name" : "v_residence", "type" : "text", "similarity" : "boolean", "analyzerType" : "address", "searchAnalyzerType" : null, "fields" : [ ], "termVector" : null },
    { "name" : "v_yob", "type" : "text", "similarity" : "boolean", "analyzerType" : "name", "searchAnalyzerType" : null, "fields" : [ ], "termVector" : null },
    { "name" : "v_min_yob", "type" : "integer", "similarity" : "boolean", "analyzerType" : null, "searchAnalyzerType" : null, "fields" : [ ], "termVector" : null },
    { "name" : "v_max_yob", "type" : "integer", "similarity" : "boolean", "analyzerType" : null, "searchAnalyzerType" : null, "fields" : [ ], "termVector" : null },
    { "name" : "v_address", "type" : "text", "similarity" : "boolean", "analyzerType" : "address", "searchAnalyzerType" : null, "fields" : [ ], "termVector" : null },
    { "name" : "v_aliases", "type" : "text", "similarity" : "boolean", "analyzerType" : "namestop", "searchAnalyzerType" : null, "fields" : [ ], "termVector" : null },
    { "name" : "v_gender", "type" : "text", "similarity" : "boolean", "analyzerType" : "gender", "searchAnalyzerType" : null, "fields" : [ ], "termVector" : null },
    { "name" : "v_place_of_birth", "type" : "text", "similarity" : "boolean", "analyzerType" : "address", "searchAnalyzerType" : null, "fields" : [ ], "termVector" : null },
    { "name" : "v_title", "type" : "text", "similarity" : "boolean", "analyzerType" : "gender", "searchAnalyzerType" : null, "fields" : [ ], "termVector" : null }
  ],
  "customAnalyzer" : [ ],
  "customFilter" : [ ],
  "customCharFilter" : [ ],
  "customTokenizer" : [ ],
  "others" : [ "n_wl_skey", "n_run_skey", "v_wl_sub_type", "v_wl_type", "v_entity_type", "n_uid" ],
  "replaceEmptyFields" : [ ],
  "replaceCharFields" : [
    { "name" : "v_full_name", "charArray" : [ "-.", "''" ], "replaceWith" : [ " ", "" ] },
    { "name" : "v_family_name", "charArray" : [ "-.", "''" ], "replaceWith" : [ " ", "" ] },
    { "name" : "v_given_name", "charArray" : [ "-.", "''" ], "replaceWith" : [ " ", "" ] },
    { "name" : "v_entity_name", "charArray" : [ "-.", "''", "&", "()" ], "replaceWith" : [ " ", "", "and", " " ] },
    { "name" : "v_entity_name_bus_strip", "charArray" : [ "-.", "''", "&", "()" ], "replaceWith" : [ "", "", "and", " " ] },
    { "name" : "v_original_script_name", "charArray" : [ "-.", "''" ], "replaceWith" : [ " ", "" ] },
    { "name" : "v_country", "charArray" : [ "-.", "''" ], "replaceWith" : [ " ", "" ] },
    { "name" : "v_address", "charArray" : [ "-.", "''" ], "replaceWith" : [ " ", "" ] },
    { "name" : "v_aliases_given_name", "charArray" : [ "-.", "''" ], "replaceWith" : [ " ", "" ] },
    { "name" : "v_aliases_family_name", "charArray" : [ "-.", "''" ], "replaceWith" : [ " ", "" ] },
    { "name" : "v_aliases", "charArray" : [ "-.", "''" ], "replaceWith" : [ " ", "" ] }
  ],
  "translateFields" : [ "v_family_name", "v_given_name", "v_full_name", "v_aliases_family_name", "v_aliases_given_name", "v_aliases" ]
}

In this sample, each block under attributes defines the pre-processing of one field. For analyzerType, you can enter any analyzer listed under Analyzer Types.

Click Validate to verify that your edits to the JSON are valid.
Click Save to update the JSON or click Cancel to close without saving your changes.
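
The attribute change described in the example above can also be prepared outside the pop-up and then pasted in. The following is a minimal Python sketch and is not part of the product: the helper name set_analyzer_type and the trimmed sample JSON (including its starting analyzerType value) are assumptions made for illustration. Only the attributes, name, and analyzerType keys are taken from the index JSON shown above. The Validate button in the pop-up remains the authoritative check.

```python
import json


def set_analyzer_type(index_json: str, attribute_name: str, analyzer_type: str) -> str:
    """Return the index JSON with analyzerType changed for one attribute.

    Hypothetical helper: it only edits text that you would then paste into
    the Edit Index JSON pop-up. It does not call the application.
    """
    # json.loads raises json.JSONDecodeError if the text is not valid JSON.
    config = json.loads(index_json)

    matches = [
        attribute
        for attribute in config.get("attributes", [])
        if attribute.get("name") == attribute_name
    ]
    if not matches:
        raise KeyError(f"attribute {attribute_name!r} not found in index JSON")

    for attribute in matches:
        attribute["analyzerType"] = analyzer_type

    return json.dumps(config, indent=2)


# Trimmed, illustrative sample; a real index JSON has many more keys.
SAMPLE_INDEX_JSON = """
{
  "attributes" : [ {
    "name" : "v_country_name",
    "type" : "text",
    "similarity" : "boolean",
    "analyzerType" : "name",
    "searchAnalyzerType" : null,
    "fields" : [ ],
    "termVector" : null
  } ]
}
"""

if __name__ == "__main__":
    print(set_analyzer_type(SAMPLE_INDEX_JSON, "v_country_name", "address"))
```

The sketch fails early on malformed JSON or a misspelled attribute name, which are the two mistakes most likely to occur when editing a long index JSON by hand.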

Analyzer Types

Table 5-3 Analyzer Types

Analyzer Type	Supported Filters	Type	Description
Name	Individual Name Synonyms; Individual Title	Synonym; Stop Word	The Name analyzer processes person names by applying standardization rules such as name synonyms (e.g., Will - William) and removal of non-essential titles (e.g., Mr., Dr.). This ensures better match results during screening.
Address	Country Synonyms	Synonym	The Address analyzer processes address components by resolving country-specific synonyms (e.g., USA - United States, UK - United Kingdom). This enhances consistency and accuracy in location-based matching. Example: Input: `123 Main St, NY,USA` Normalized to: `123 Main Street, New York, United States`
Phone	No Token Filters	-	The Phone Analyzer tokenizes and indexes phone numbers without applying any additional filters. It enables straightforward and direct matching of phone number values during screening.
Email	No Token Filters	-	The Email Analyzer processes email addresses as exact tokens without applying any transformations or filters. It is designed for direct string matching, ensuring that the full email address is preserved for accurate comparison.
Organization	Organization Numbers; Organization Suffix; Organization Strip Words	Synonym; Stop Word; Stop Word	The Organization analyzer standardizes organization names by removing common suffixes (e.g., Inc, Ltd), normalizing common terms, and ignoring non-distinct or generic words. This improves matching for corporate entity names.
Gender	Individual Gender	Synonym	The Gender analyzer handles gender-related fields by resolving known synonyms (e.g., F - Female, M - Male) for consistent identity matching.
Date	No Token Filters	-	Dates are indexed as-is with no transformation or filtering.
Name Stop	Individual Name Synonyms; Individual Title; Individual Name Strip Words	Synonym; Stop Word; Stop Word	The Name Stop Analyzer cleans and normalizes names by removing non-essential tokens and standardizing known variations. This helps improve match accuracy during screening. Example: For the name `Mr. John A. Smith`, the Name Stop Analyzer removes the title “Mr.” using the Individual Title filter, removes initials such as “A.” if they are listed in Strip Words, and recognizes that “John” is a synonym for “Jon” if defined in Individual Name Synonyms. Result: `John Smith`, normalized for better matching.
TF Analyzer	No Token Filters	-	The TF Analyzer is used in Transaction Filtering to tokenize and normalize input data by applying filters such as lowercasing, stop word removal, and synonym resolution for improved match accuracy.
Document ID	No Token Filters	-	A custom analyzer that tokenizes text using delimiters (comma, semicolon, tilde, parentheses, and space), converts tokens to lowercase, and removes duplicates. This analyzer is intended for indexing document identifiers such as national ID numbers or other personal identification numbers.
Organization Strip	Organization Numbers; Organization Stop Words	Synonym; Stop Word	The Organization Strip Analyzer strips common suffixes or terms that do not add unique value from organization names, improving match consistency across similar entries. Example: For the organization name `ABC Technologies Pvt. Ltd`, the Organization Strip Analyzer removes stop words such as “Pvt.” and “Ltd.” and may replace common synonyms such as “Technologies” with “Tech” if configured. Processed result: `ABC Tech`. This helps normalize similar entries such as ABC Technologies Limited and ABC Tech Ltd. to a comparable form for accurate screening.
Alphanumeric Keyword	No Token Filters	-	A custom analyzer that splits text on any sequence of non-letter and non-digit characters, converts all characters to lowercase, and applies ASCII folding to normalize accented or special characters to their ASCII equivalents.
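
The behavior described in the table for the Document ID, Alphanumeric Keyword, and Name Stop analyzers can be approximated with the following Python sketch. It is an illustration only and not the product's implementation. All function names are hypothetical, the TITLES, STRIP_WORDS, and NAME_SYNONYMS contents are invented stand-ins for the configured filter lists, and ASCII folding is approximated by removing accent marks, which may differ from the folding the engine applies.

```python
import re
import unicodedata

# Delimiters listed for the Document ID analyzer: comma, semicolon, tilde,
# parentheses, and space.
_DOCUMENT_ID_DELIMITERS = re.compile(r"[,;~() ]+")

# Any run of characters that are neither letters nor digits.
_NON_ALPHANUMERIC = re.compile(r"[\W_]+")


def document_id_tokens(text: str) -> list[str]:
    """Split on the delimiters, lowercase, and drop duplicate tokens."""
    tokens = [t.lower() for t in _DOCUMENT_ID_DELIMITERS.split(text) if t]
    return list(dict.fromkeys(tokens))  # keeps first-seen order


def ascii_fold(token: str) -> str:
    """Approximate ASCII folding by removing combining accent marks."""
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def alphanumeric_keyword_tokens(text: str) -> list[str]:
    """Split on non-alphanumeric runs, lowercase, then fold accents."""
    composed = unicodedata.normalize("NFC", text)
    return [ascii_fold(t.lower()) for t in _NON_ALPHANUMERIC.split(composed) if t]


# Hypothetical filter contents for the Name Stop example; the real lists
# are maintained in the application, not in code like this.
TITLES = {"mr", "mrs", "ms", "dr"}
STRIP_WORDS = {"a"}
NAME_SYNONYMS = {"jon": "John"}


def name_stop_tokens(name: str) -> list[str]:
    """Drop titles and strip words, then map known name variants."""
    result = []
    for raw in name.split():
        token = raw.strip(".,")
        key = token.lower()
        if not token or key in TITLES or key in STRIP_WORDS:
            continue
        result.append(NAME_SYNONYMS.get(key, token))
    return result


if __name__ == "__main__":
    print(document_id_tokens("AB123;ab123 (X9~Y7), X9"))      # ['ab123', 'x9', 'y7']
    print(alphanumeric_keyword_tokens("São-Paulo/Ünit_42"))   # ['sao', 'paulo', 'unit', '42']
    print(" ".join(name_stop_tokens("Mr. John A. Smith")))    # John Smith
```

With these assumed lists, `Jon Smith` and `Mr. John A. Smith` both reduce to `John Smith`, which is the kind of normalization the Name Stop row describes.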