Smart duplicate search

The smart duplicate search considers all the values present in the incoming record or search criteria to find potential matching records. It returns a matching case record even if some of the search criteria values do not match; those values simply do not contribute to the total match score.

The application uses this algorithm only when the Enable Smart Duplicate Search switch is enabled in Argus Console > System Configuration > Common Profile Switches > Case Processing > Duplicate Search, and displays a record match score in the Score field. By default, the search results are sorted in descending order of score.

If the Enable Smart Duplicate Search switch is not enabled, the duplicate search does not return a record if even a single criterion does not match. For example, if all the search criteria values match a record except patient age, the duplicate search does not return this case record, whereas the smart duplicate search returns it in the search results, provided the total match score is at or above the predefined score threshold. Hence, the smart duplicate search reduces the number of search iterations, making the process more efficient.

Note:

This switch is global; that is, any change made to this switch impacts all enterprises.

Score Threshold

A threshold value is defined in Argus Console > System Configuration > System Management (Common Profile Switches) > Case Processing > Duplicate Search. While searching for a case record, the smart duplicate search algorithm displays matching records that have a score equal to or greater than the predefined threshold. The threshold is also referred to as the record level score threshold.

Note:

This switch is set at the Enterprise level. Any change made to this switch impacts only that enterprise.

Smart search algorithm

The smart search algorithm uses fuzzy matching, so you can enter all the search criteria at once, and records are returned based on their total score. A record can appear in the results list even if one or more of the search criteria do not match it, as long as its total score is at or above the configured threshold.

For example, if the record threshold is 50, then only matching cases with a score of 50 or more are displayed in the list.
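The threshold and default sort order described above can be sketched in a few lines of Python. This is only an illustration of the documented behavior, not the product's implementation; the `CaseMatch` type, the `displayed_matches` function, and the case identifiers are hypothetical names introduced for the example.

```python
from typing import List, NamedTuple


class CaseMatch(NamedTuple):
    case_number: str  # hypothetical identifier, for illustration only
    score: int        # total match score, already rounded to a whole number


def displayed_matches(matches: List[CaseMatch], threshold: int) -> List[CaseMatch]:
    """Keep records scoring at or above the threshold, highest score first."""
    kept = [m for m in matches if m.score >= threshold]
    return sorted(kept, key=lambda m: m.score, reverse=True)


candidates = [
    CaseMatch("CASE-A", 42),
    CaseMatch("CASE-B", 50),
    CaseMatch("CASE-C", 87),
]

for match in displayed_matches(candidates, threshold=50):
    print(match.case_number, match.score)
```

With a threshold of 50, the record scoring 42 is dropped, and the remaining two records are listed with the score of 87 first.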

The following are the main features provided by the smart search algorithm:
  • Numerical Scoring - The smart search algorithm applies numerical scoring to numeric and date fields, so that the entered search criteria match the value in the case record if that value is within a pre-defined threshold.

    For example, for a threshold of 1 month, a patient age of 12 months matches 11 months. Similarly, for a threshold of 15 days, the Event Onset date 10-Jan-2024 matches 20-Jan-2024.

    The smaller the deviation between the search criteria and the matching value, the higher the score contribution, and vice versa.

  • Soundex - Matches misspelled text fields or names that sound similar. This feature is available through the Full Search check box on the Initial Case Entry screen.
  • Weighted Matching - Each matching attribute contributes differently to the total score based on its pre-configured weight.

    For example, if Patient DOB has more weight configured than Patient Age, then the Patient DOB field contributes more to the total score than the Patient Age field.

  • Transposed Date Matching - A transposed date, that is, a date with the day and month swapped, is also considered a match. A match on the original date receives a higher score than a match on the transposed date.

    For example, Patient DOB: 10-Jan-2024 matches 01-Oct-2024.

    Note:

    • Date thresholds are applied to date as well as transposed date.
    • Date limits are calculated based on the input date and not based on the transposed date.
    • Records with a transposed date are considered if the transposed date is within the date limits; otherwise, they are excluded. For example:

      For the initial receipt date of 03-Jan-2024, the receipt range limit is 04-Nov-2023 to 03-Mar-2024. In this case, the transposed date 01-Mar-2024 falls within the date limits calculated from 03-Jan-2024.

      In contrast, for the initial receipt date of 10-Jan-2024, the receipt range limit is 11-Nov-2023 to 10-Mar-2024, and the transposed date 01-Oct-2024 does not fall within the date limits. Hence, the record with the transposed date is not considered a match.

  • Exclusion Criteria - These criteria help avoid performance issues. If any of the following fields does not match, the record is excluded from the search results:
    • Report Type
    • COI
    • Gender
    • Initial Receipt Date

    Note:

    Because the above fields are part of the exclusion criteria, they do not contribute to the match score and do not appear in the Match Score Explanation screen. The Initial Receipt Date field is an exception: because of numerical scoring, it contributes to the match score and also appears in the Match Score Explanation screen.
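The date-limit and transposed-date rules in the list above can be reproduced with a short Python sketch. It is an illustration under stated assumptions, not the product's algorithm: the 60-day window is inferred from the arithmetic of the two receipt-date examples (03-Jan-2024 gives 04-Nov-2023 to 03-Mar-2024), and the function names `transpose`, `date_limits`, `transposed_date_considered`, and `within_threshold` are hypothetical.

```python
from datetime import date, timedelta
from typing import Optional, Tuple

# Assumption: a +/-60 day window, inferred from the receipt-date examples above.
RANGE_DAYS = 60


def transpose(d: date) -> Optional[date]:
    """Swap day and month; return None when the swap is not a valid date."""
    try:
        return date(d.year, d.day, d.month)
    except ValueError:
        return None


def date_limits(input_date: date, range_days: int = RANGE_DAYS) -> Tuple[date, date]:
    """Limits are computed from the input date, never from the transposed date."""
    delta = timedelta(days=range_days)
    return input_date - delta, input_date + delta


def transposed_date_considered(input_date: date, range_days: int = RANGE_DAYS) -> bool:
    """True when the transposed date exists and lies inside the input date's limits."""
    swapped = transpose(input_date)
    if swapped is None:
        return False
    low, high = date_limits(input_date, range_days)
    return low <= swapped <= high


def within_threshold(searched: date, recorded: date, threshold_days: int) -> bool:
    """Numerical scoring precondition: the two dates differ by at most the threshold."""
    return abs((searched - recorded).days) <= threshold_days


print(date_limits(date(2024, 1, 3)))                  # limits for 03-Jan-2024
print(transposed_date_considered(date(2024, 1, 3)))   # 01-Mar-2024 is inside the limits
print(transposed_date_considered(date(2024, 1, 10)))  # 01-Oct-2024 is outside the limits
print(within_threshold(date(2024, 1, 10), date(2024, 1, 20), 15))
```

The sketch only decides whether a value is eligible to match. How much an eligible value contributes to the score depends on the configured weights and on the deviation, which the text above describes only qualitatively.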

Match Score

To understand the score value, click the Score of the selected record in the duplicate search results. The Match Score Explanation screen appears with the fields for which search criteria have been entered.

The Match Score Explanation screen compares the searched values with the corresponding values in the matching case record. You can also view the contribution of each field to the total match score along with the match type (exact match, similar values, or no match), which helps you decide whether the incoming record is a duplicate or a follow-up of the selected record.

Note:

The match score also depends on the number of search terms. The significance of the number of terms in the search criteria diminishes as the number of search terms increases. However, a search with few terms, such as searching only by Reporter's First Name, may return a very low score (as low as 5), even on an exact match. Hence, Oracle recommends searching with more of the basic information available for the case rather than searching on only one field.

If the Reference ID or case number matches, then this field is prioritized over the rest of the fields, and the case record is considered a match with a score of 100, ignoring the match score contribution of the rest of the fields. The Match Type is Priority Match in this scenario.

The total score for the record is rounded off to a whole number, so there can be a rounding difference between the sum of the individual field scores and the total score.

The Reference ID field, if available in the search criteria, is always the first field in the Match Score Explanation screen.
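The rounding difference mentioned in this note can be illustrated with a small Python sketch. The field contributions below are made-up values, and half-up rounding is an assumption, since the rounding mode is not stated here. The sketch rounds each field score to 2 decimal places and the total to a whole number, as described for the Match Score Explanation screen.

```python
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical unrounded field contributions, for illustration only.
raw_field_scores = [Decimal("33.333"), Decimal("21.456"), Decimal("12.849")]

# Assumption: half-up rounding; the rounding mode is not documented here.
displayed_fields = [
    s.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP) for s in raw_field_scores
]
displayed_total = sum(raw_field_scores).quantize(Decimal("1"), rounding=ROUND_HALF_UP)

print(sum(displayed_fields))  # sum of the field scores as displayed
print(displayed_total)        # total score as displayed
```

Here the displayed field scores add up to 67.64, while the displayed total is 68, which is the kind of small mismatch the note describes.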

Field name Description
Search Criteria

Displays the values in the search criteria or incoming record being searched against.

Note:

Only duplicate search fields that are checked and have values in the incoming record or search criteria are displayed.

Exclusion Criteria fields are not displayed in the Match Score Explanation screen as these fields do not contribute to the total score.

The Initial Receipt Date field is always displayed, if entered as search criteria or present in the incoming record.

Selected Record

Displays the value for the matching case record.

For example, if the search criteria match Patient First Name, then only First Name is displayed under Selected Record.
  • In case of no match, Patient First Name and Last Name are displayed, separated by a space.
  • If only Patient First Name or Last Name is available, then the non-blank value is displayed.
  • If both Patient First Name and Last Name are blank, then Patient Initials are displayed. If Patient Initials are also unavailable, this field is left blank.
Score

Displays the individual contribution of each field to the total score. It helps you understand how much each field contributed to the record being a match.

Individual field scores are rounded off to 2 decimal places.

Match Type

Displays the type of match for the values:
  • Priority Match - A high-priority field, such as Reference ID, matches.

    When a record is a priority match, both Match Type and Score are displayed as N/A for the rest of the fields in the search criteria.

  • No Match - A criterion that is used in the search, but the searched value does not match the value in the duplicate search result case record being assessed.
  • Exact Match - A criterion that is used in the search, where the value in the duplicate search result case record being assessed matches exactly.
  • Similar Values - A criterion that is used in the search, where the intent of the values matches, but the values are reported or captured as similar values that are close to the value searched (such as dates within range, age ranges, or Soundex matches). For example:
    • Soundex: "Johnson" vs. "Jonsen"
    • Misspelling: "Headahce" vs. "Headache"
    • Date Range match: "20-Nov-2023" vs. "21-Oct-2023"

    Because Argus uses Oracle Text, which is designed to match information based on the text (words) entered, records whose field values are variants of the same value are also considered similar matches and are assigned the same score. For example, if one record has the symptom Rash and another has Severe Rash, both records are considered similar matches and are assigned the same score.
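To show why a pair such as "Johnson" and "Jonsen" counts as a Soundex match, the following Python sketch implements the classic American Soundex code. Whether the application uses exactly this variant is an assumption; the `soundex` function here is a standalone illustration and not an Argus or Oracle API.

```python
def soundex(name: str) -> str:
    """Classic American Soundex: first letter plus three digits."""
    codes = {}
    groups = (
        ("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
        ("L", "4"), ("MN", "5"), ("R", "6"),
    )
    for letters, digit in groups:
        for ch in letters:
            codes[ch] = digit

    letters_only = [c for c in name.upper() if c.isalpha()]
    if not letters_only:
        return ""

    result = letters_only[0]
    prev = codes.get(letters_only[0], "")
    for ch in letters_only[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = codes.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code is kept after it
    return (result + "000")[:4]


print(soundex("Johnson"), soundex("Jonsen"))
```

Both names reduce to the same code, J525, so a Soundex comparison treats them as similar even though the spellings differ.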