1.1 General matching strategy
This section provides a brief description of the general strategy used in Oracle Financial Services Customer Screening. It consists of three main components: identifier preparation, clustering, and matching.
Identifier preparation
Data sets differ in structure, and these differences must always be normalized before clustering and matching, so that the matching process does not need to repeat the same transformations for each comparison.
Identifier preparation is used to ensure that the records conform to a pre-defined data structure which can be used by the rest of the matching process, and also to eliminate common forms of variance between the records (such as spelling variants of given names and abbreviations of frequently-used tokens).
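As an illustration only, the kind of normalization described above can be sketched in Python. This is not the product's implementation: the function name prepare_identifier and both lookup tables are hypothetical, and a real deployment would rely on curated reference data rather than a handful of hard-coded entries.

```python
import re

# Hypothetical lookup tables, invented for this sketch.
GIVEN_NAME_VARIANTS = {
    "jon": "john",
    "johnny": "john",
    "mohamed": "mohammed",
    "muhammad": "mohammed",
}
TOKEN_ABBREVIATIONS = {
    "ltd": "limited",
    "co": "company",
    "corp": "corporation",
}


def prepare_identifier(raw_name: str) -> str:
    """Return a normalized form of a name, ready for clustering and matching."""
    # Lower-case so that later comparisons are case-insensitive.
    lowered = raw_name.lower()
    # Replace punctuation with spaces, then split on any run of whitespace.
    tokens = re.sub(r"[^\w\s]", " ", lowered).split()
    # Expand abbreviations first, then map given-name spelling variants.
    standardized = [
        TOKEN_ABBREVIATIONS.get(token, GIVEN_NAME_VARIANTS.get(token, token))
        for token in tokens
    ]
    return " ".join(standardized)
```

Under these assumptions, prepare_identifier("Jon  Smith & Co., Ltd.") yields "john smith company limited", so two records that differ only in spelling variant, abbreviation, case, or punctuation end up with the same prepared value.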
Clustering
Clustering is used to minimize the work that must be performed by the final stage of matching. It works by splitting the working and reference data into wide tranches (clusters), based on similarities in significant data fields. Only records that share similar characteristics, and are therefore placed in the same cluster, are compared on a record-by-record basis later in the matching process.
If very wide clusters are used, there will be a large number of records in each cluster. This means that there is a reduced risk that true matches will be missed, but also that a greater amount of processing power is required to compare all the clustered records by brute force. A tighter clustering strategy will result in smaller clusters, with fewer records per cluster. This results in reduced processing requirements for row-by-row comparisons but increases the likelihood that some true matches will not be detected.
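The trade-off between wide and tight clusters can be made concrete with a small sketch. The clustering key used here (the first few characters of the last name token) is an assumption chosen for illustration, and the names cluster_key and candidate_pairs are hypothetical; the product's actual cluster definitions are not shown in this section.

```python
from collections import defaultdict


def cluster_key(name: str, key_length: int) -> str:
    """Hypothetical key: the first key_length characters of the last token."""
    tokens = name.lower().split()
    return tokens[-1][:key_length] if tokens else ""


def candidate_pairs(working, reference, key_length):
    """Return the (working, reference) pairs that share a cluster."""
    # Each cluster holds two lists: working records and reference records.
    clusters = defaultdict(lambda: ([], []))
    for record in working:
        clusters[cluster_key(record, key_length)][0].append(record)
    for record in reference:
        clusters[cluster_key(record, key_length)][1].append(record)

    pairs = []
    for working_records, reference_records in clusters.values():
        # Only records inside the same cluster are compared one by one.
        pairs.extend((w, r) for w in working_records for r in reference_records)
    return pairs


working = ["john smith", "anna smyth"]
reference = ["jon smith", "ana smythe", "peter jones"]

wide_pairs = candidate_pairs(working, reference, key_length=1)
tight_pairs = candidate_pairs(working, reference, key_length=6)
```

With this sample data, the wide key produces four candidate pairs, including ("anna smyth", "ana smythe"). The tight key produces a single pair and never compares "anna smyth" with "ana smythe", which shows how a tighter strategy saves comparisons at the cost of possibly missing a true match.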
Matching
Note:
Oracle Financial Services Customer Screening does not use the Match decision, because it never treats a pair of records as an automatic match that requires no review.
The rules are applied as a decision table, so if a pair of records qualifies as a match under a rule higher in the table, it will not be compared using any rules below that. All rules are configured to operate on a case-insensitive basis. Unless stated otherwise, all noise and whitespace characters are removed or normalized before matching.
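The decision-table behaviour described above can be sketched as an ordered list of rules in which the first qualifying rule wins. Everything named here is an assumption made for illustration: the two rules, the field names name and dob, and the decision labels "Review" and "No match" are hypothetical and are not taken from the product's rule set.

```python
import re


def normalize(value: str) -> str:
    """Lower-case, drop punctuation (noise), and collapse whitespace."""
    without_noise = re.sub(r"[^\w\s]", "", value.lower())
    return " ".join(without_noise.split())


# Hypothetical rules, ordered from strongest to weakest.
RULES = [
    (
        "exact name and exact date of birth",
        lambda a, b: a["name"] == b["name"] and a["dob"] == b["dob"],
    ),
    (
        "exact name only",
        lambda a, b: a["name"] == b["name"],
    ),
]


def evaluate(working: dict, reference: dict):
    """Return (decision, rule_name) for one pair of records."""
    a = {field: normalize(value) for field, value in working.items()}
    b = {field: normalize(value) for field, value in reference.items()}
    for rule_name, predicate in RULES:
        if predicate(a, b):
            # First qualifying rule wins; lower rules are never evaluated.
            # The outcome is a review decision, never an automatic match.
            return ("Review", rule_name)
    return ("No match", None)
```

For example, evaluate({"name": "John  SMITH", "dob": "1980-01-01"}, {"name": "john smith.", "dob": "1980-01-01"}) qualifies under the first rule despite the differences in case, spacing, and punctuation, so the second rule is never consulted for that pair.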