Oracle Text (OT)

This topic provides information on the database objects, related tables, available services, OT matching job workflows as well as the limitations and key considerations for OT matching.

Database Objects and OT-Related Tables

All the OT-related tables are located in the entity resolution (ER) schema. The following is a list of the key tables:

Table 4-3 Key Tables and their Descriptions

Table Name Description
FCC_MR_RULESET This table contains the OT match rule sets along with the relevant OT metadata. It is located in the ER schema.
FCC_ER_FULL_OT This table is created from FCC_ER_FULL. It is a source table for OT matching.
FCC_MR_SELECTED_CANDIDATES This table is located in the ER schema. It consists of the OT matching candidates for each source record prior to scoring
FCC_MR_MATCHED_RESULTS This table is located in the ER schema. It consists of the selected candidates based on scoring output.
FCC_ER_OT_RUNSKEY_MAP This table maintains the mapping between the ER runskey and the OT runskey for each batch run.
FCC_M_PIPELINE_BATCH_RUN This table contains OT Batch specific details such as OT Runskey, BATCH_ID etc. It is used to track OT Batch executions.
FCC_MATCH_RULESET (Studio Schema) This table is located in the ER schema. It is used to track the OT matching process execution.

Services Architecture

For information on the services that are available for OT matching, see thePort Numbers for Application

OT Matching Integration in ER Jobs

Job 1: Rule set Synchronization & Setup

  • The job synchronizes the rule sets from FCC_MATCH_RULESET (Studio schema) to FCC_MR_RULESET (ER schema).
  • Only the OT rule sets that are not yet synced to the ER schema will be synced in the job execution.
  • Ensure that all the OS rule sets are disabled before the Job 1 execution.
  • Domain index creation for OT matching is handled in Job 1
  • Synonym Matching:
    Thesaurus:
    • In general terms, a thesaurus is referred to as a book or resource that lists words in groups of synonyms and related concepts.
    • Similarly, Oracle text provides user ability to create, store & update virtual thesaurus into your database
    • Note:

      • There can be only one thesaurus object in a Database server.
      • If the thesaurus is owned by a different user and needs to be created by a new one, then the older user needs to remove the thesaurus first. The new user needs to set the F_IS_RECENTLY_CHANGED flag column to Y, in the FCC_IDX_M_LOOKUP table against the respective thesaurus record.
      • This step must be performed in case the thesaurus data is updated.
      • Update the V_SYNONYM_NAME column in the FCC_MR_M_ENTITY_ATTRIBUTES table against the desired column / attribute to use synonym data.

    Synonyms:

    • A synonym is a word or phrase which has the exact same meaning, or a very similar meaning, to another word.
    • Oracle Text enables the user to create case-sensitive or case-insensitive thesauruses that define synonyms and hierarchical relationships between words and phrases. You can create a thesaurus and load it into the system.
    • CTX_THES Package: To maintain and browse your thesaurus programmatically, you can use the CTX_THES PL/SQL package. With this package, you can browse terms and hierarchical relationships, add and delete terms, add and remove thesaurus relations, and import and export thesauri in and out of the thesaurus tables.
    • Thesaurus Operators: To expand query terms according to your loaded thesaurus, you can use the thesaurus operators in the CONTAINS clause. For example, use the SYN operator to expand a term such as user to its synonyms: 'syn(user)'
    • Scenarios where it might be required:
      • Data containing different names but having the same meaning.
      • Data containing the same sounding names
    • Steps to enable synonym matching:
      • Set the result.applySynonyms property to true, which is present in the applications.properties file of matching-service-ot and scoring-engine services.
      • For Matching-service-ot, navigate to the following path:
        • <COMPLIANCE_STUDIO_INSTALLATION_PATH>/deployed/matching-service-ot/conf
      • For scoring-engine, navigate to the following path:
        • <COMPLIANCE_STUDIO_INSTALLATION_PATH>/deployed/scoring-engine/conf
  • The new table for OT, FCC_ER_FULL_OT, is populated from the FCC_ER_FULL table. The FCC_ER_FULL_OT table works as the source for OT matching

Job 2: OT Matching Execution

  • For each OT batch run, including RERUN, a new OT runskey is generated and mapped to the ER runskey.
  • Mapping is stored in the FCC_ER_OT_RUNSKEY_MAP table and is used for tracking in the FCC_M_PIPELINE_BATCH_RUN table.

Limitations and Key Considerations

The following are the limitations and key considerations for OT matching
  • Out-Of-the-Box (OOB) rule sets for OS and OT have different structures. Ensure that only the OT OOB rulesets are enabled for OT matching.
  • Custom rule sets for OT must not have groups or sub-rules.