2.8.3.6 Creating a Thesaurus for Entity Resolution

This section describes how to create a thesaurus for Entity Resolution.

The thesaurus is required to execute the Entity Resolution batches.

Note:

  • This section is applicable when MATCHING_MECHANISM is set to OT.
  • There is no additional configuration required for OpenSearch.
  • For a fresh installation, there is no thesaurus available in the database.

To create a thesaurus in the ER schema:

  1. Navigate to the <COMPLIANCE_STUDIO_INSTALLATION_PATH>/deployed/candidate-selection/utility/bin directory.
  2. Execute the following command:
    ./CreateDBThesaurus.sh <DATA_SCHEMA_ALIAS> <PATH TO STORE PRE-PROCESSED FILES GENERATED BY UTILITY> <MODE>

    For example:

    ./CreateDBThesaurus.sh ER_SCHEMA_ALIAS /user/thesaurusFiles CREATE

    The <MODE> argument accepts one of two options:
    1. Create: Generates the pre-seeded thesaurus in the database.
    2. Reset: Updates an existing thesaurus. If the data changes, run the script with the reset option to update the thesaurus.

    Note:

    A database server can hold only one thesaurus with the specified thesaurus name.
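
The steps above can be wrapped in a small helper that checks the arguments before calling the utility. The following is a minimal sketch, not part of the product: the function names validate_mode and run_thesaurus and the DRY_RUN variable are hypothetical, COMPLIANCE_STUDIO_INSTALLATION_PATH is assumed to be exported as an environment variable, and the uppercase spelling RESET is an assumption (only CREATE appears in the documented example).

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around CreateDBThesaurus.sh; not shipped with the product.

# Accept only the two documented modes.
# Assumption: the reset mode is passed as RESET, mirroring the CREATE example.
validate_mode() {
  case "${1:-}" in
    CREATE|RESET) return 0 ;;
    *) echo "Unsupported mode: '${1:-}' (expected CREATE or RESET)" >&2
       return 1 ;;
  esac
}

# Usage: run_thesaurus <DATA_SCHEMA_ALIAS> <OUTPUT_PATH> <MODE>
run_thesaurus() {
  local schema_alias="${1:-}" out_path="${2:-}" mode="${3:-}"

  # All three positional arguments are required by the utility.
  if [ -z "$schema_alias" ] || [ -z "$out_path" ]; then
    echo "Usage: run_thesaurus <DATA_SCHEMA_ALIAS> <OUTPUT_PATH> <MODE>" >&2
    return 1
  fi
  validate_mode "$mode" || return 1

  # With DRY_RUN=1, print the command instead of executing it.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "./CreateDBThesaurus.sh $schema_alias $out_path $mode"
    return 0
  fi

  # Assumption: the installation path is available as an environment variable.
  local bin_dir="${COMPLIANCE_STUDIO_INSTALLATION_PATH:-}/deployed/candidate-selection/utility/bin"
  (cd "$bin_dir" && ./CreateDBThesaurus.sh "$schema_alias" "$out_path" "$mode")
}
```

After sourcing the file, running DRY_RUN=1 run_thesaurus ER_SCHEMA_ALIAS /user/thesaurusFiles CREATE prints the command that would be executed, which allows the arguments to be reviewed before the thesaurus is created or reset.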