Configuring Matching Stopwords

When matching request items or deduplicating nodes in a viewpoint, you can define stopwords at the node type level. Stopwords are ignored when string values are compared.

For example, when matching or deduplicating using the Name property, you may want to ignore words such as "Company", "Corporation", and "Incorporated", as including these common words may result in false positives.

Considerations

  • You can add stopwords only to string properties (that is, properties with the String, Memo, or Numeric String data type).
  • The stopword must match the entire word in the property value, including punctuation marks. Partial matches are not counted. For example, if you define the stopword "Corporation", the word "Corp" is not ignored. Similarly, if you define the stopword "Corp" (without a period), the word "Corp." (with a period) is not ignored.
  • Stopwords are not case-sensitive.
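
The whole-word, punctuation-sensitive, case-insensitive behavior described above can be illustrated with a minimal Python sketch. This is not the product's implementation: the function name strip_stopwords is hypothetical, and splitting the value into words on whitespace is an assumption made for illustration.

```python
def strip_stopwords(value, stopwords):
    """Drop every word that equals a stopword exactly, ignoring case.

    Assumption (not confirmed by the product documentation): words are
    separated by whitespace, and punctuation stays attached to its word.
    """
    lowered = {word.lower() for word in stopwords}
    kept = [word for word in value.split() if word.lower() not in lowered]
    return " ".join(kept)


# Whole-word match: the stopword is removed.
print(strip_stopwords("Acme Corporation", {"Corporation"}))  # Acme
# Partial match: "Corp" is not the stopword "Corporation", so it stays.
print(strip_stopwords("Acme Corp", {"Corporation"}))         # Acme Corp
# Punctuation counts: "Corp." is not the stopword "Corp", so it stays.
print(strip_stopwords("Acme Corp.", {"Corp"}))               # Acme Corp.
# Case is ignored.
print(strip_stopwords("Acme CORP", {"corp"}))                # Acme
```

To ignore both "Corp" and "Corp." in this sketch, both forms would have to be listed as stopwords.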

Configuring Stopwords on a Node Type

  1. Inspect the node type that you want to configure stopwords for. See Inspecting a Node Type.
  2. Navigate to the Rules tab, and on the Stopwords sub-tab click Edit.
  3. Click Add, and then select the property that you want to add stopwords for from the Property drop-down menu.

    Note:

    If you select the same property on multiple rows, the stopword values for each of the rows are merged into a single row for that property when you click Save.
  4. In Matching Stopwords, enter the stopwords that you want to ignore during the matching process.

    Note:

    The values are converted to lowercase when they are stored. Stopwords are not case-sensitive.
  5. To remove a stopword, click the X icon next to it. To remove the property and all of its stopwords completely, click Actions and select Remove.
  6. Click Save.
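
The two notes in the steps above (rows for the same property are merged on Save, and values are stored in lowercase) can be sketched in Python as follows. The function name merge_stopword_rows and the row layout are hypothetical, and dropping duplicate values during the merge is an assumption; the documentation states only that the rows are merged.

```python
def merge_stopword_rows(rows):
    """Merge (property, stopwords) rows into one entry per property.

    Values are lowercased, as described in the note on stored values.
    Assumption: duplicates that appear after lowercasing are dropped.
    """
    merged = {}
    for prop, words in rows:
        bucket = merged.setdefault(prop, [])
        for word in words:
            lowered = word.lower()
            if lowered not in bucket:
                bucket.append(lowered)
    return merged


# Two rows were entered for the Name property and one for Description.
rows = [
    ("Name", ["Company", "Corporation"]),
    ("Description", ["The"]),
    ("Name", ["Incorporated", "company"]),
]
print(merge_stopword_rows(rows))
# {'Name': ['company', 'corporation', 'incorporated'], 'Description': ['the']}
```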

Stopword Processing

When running request item matching or deduplication for a node type that has matching stopwords configured:

  • The stopwords for each property are ignored in every matching rule that uses that property.
  • The stopwords for all properties are ignored when calculating the match score of a matching rule for a match candidate (when matching request items) or matched node (when running deduplication).

For example, when matching request items, suppose your incoming request item has a node named "StreamVault Media" and your node type contains a match candidate named "The StreamVault Media Company".

  • With no stopwords configured, your match score is 61, because 17 of 28, or 61%, of the characters in the request item match the match candidate. See How are match scores calculated and how do I use them?
  • If you configure the stopwords "The" and "Company", your match score becomes 100, because those words are ignored when calculating the match score and thus all 17 characters in the request item match the match candidate.

The revised match score may cause a match candidate to be automatically accepted as a match or automatically excluded from the match results when it previously was not.
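
The StreamVault example can be traced with a short Python sketch that applies stopwords per property before values are compared. The names STOPWORDS_BY_PROPERTY and comparable_value, the dictionary layout, and the Description values are hypothetical, and whitespace word splitting is an assumption. The sketch does not reproduce the product's match score calculation; it shows only that the two Name values become identical once the stopwords are ignored, which is consistent with a score of 100.

```python
# Hypothetical configuration: stopwords are defined for the Name property only.
STOPWORDS_BY_PROPERTY = {"Name": {"the", "company"}}


def comparable_value(node, prop):
    """Return the property value with that property's stopwords removed.

    Assumption: words are split on whitespace and compared in lowercase.
    """
    stopwords = STOPWORDS_BY_PROPERTY.get(prop, set())
    words = str(node.get(prop, "")).split()
    return " ".join(word for word in words if word.lower() not in stopwords)


request_item = {"Name": "StreamVault Media", "Description": "The media company"}
candidate = {"Name": "The StreamVault Media Company", "Description": "The media company"}

name_value = comparable_value(candidate, "Name")
print(name_value)                                            # StreamVault Media
print(len(name_value))                                       # 17
print(comparable_value(request_item, "Name") == name_value)  # True

# Description has no stopwords configured, so its value is left unchanged.
print(comparable_value(candidate, "Description"))            # The media company
```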