Each rule is assigned a weight value. Weights from -100 to 100 are relative: they adjust the rating for topic assignment rather than deciding it outright. Use a negative weight value to lower the rating and a positive weight value to raise it. Use a weight value of zero (0) to preempt other rule matches. Weights outside the -100 to 100 range are absolute.

Rules are evaluated in rank order. If the highest ranking rule is an absolute positive (> 100), the topic is assigned with 100% confidence. If the highest ranking rule is an absolute negative (< -100), the topic is not assigned. In the absence of absolute rules, the weights of the matching rules are added to compute the relevance.
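The evaluation order described above can be sketched in a few lines of Python. This is only an illustration, not the product's implementation: the function name evaluate_topic and its return values are hypothetical, the caller is assumed to supply the weights of the matching rules already sorted in rank order, and the sketch assumes that an absolute weight that is not highest ranking is simply left out of the sum (the text does not cover that case).

```python
def evaluate_topic(weights_in_rank_order):
    """Combine the weights of the rules that matched one document.

    Hypothetical helper: the input is the list of weights of the matching
    rules, highest ranking rule first.
    """
    if weights_in_rank_order:
        top = weights_in_rank_order[0]
        if top > 100:
            # Absolute positive: topic assigned with 100% confidence.
            return ("assigned", 100)
        if top < -100:
            # Absolute negative: topic is not assigned.
            return ("rejected", None)
    # No deciding absolute rule: add the relative weights.
    # Assumption: lower-ranked absolute weights are ignored here.
    relevance = sum(w for w in weights_in_rank_order if -100 <= w <= 100)
    return ("relative", relevance)


print(evaluate_topic([100, 0, -20]))
print(evaluate_topic([-101, 100]))
```

In the relative case, the returned sum would still have to be compared against the configured thresholds before the topic is assigned.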

For example, for the topic union strike, you might define the following rules (weights are in parentheses):

1. strike three (0)
2. strike out (0)
3. strike two (0)
4. strike one (0)
5. union—strike (100)
6. strike (100)
7. baseball (-101)

Any document containing the word strike matches rule 6; however, these documents may or may not be about union strikes. Any document that contains any of the first four phrases also matches rule 6, but because rules 1 through 4 appear first, they preempt the rule 6 match. (For example, strike out matches rules 2 and 6. Because rule 2 comes first, it is selected and receives a weight value of 0.) A document containing the phrase union strike matches rule 5 and receives a weight value of 100. Rule 7 absolutely excludes any document that includes baseball.
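The preemption behavior in this example can be illustrated with a small Python sketch. It is a model built on assumptions, not the product's matcher: rule 5 is treated as the plain phrase "union strike", matching is case-insensitive on whole words, and an earlier rule is assumed to preempt a later one whenever their matched text overlaps. The names RULES and selected_matches are hypothetical.

```python
import re

# The seven example rules, in the order listed above: (phrase, weight).
# Rule 5 is modeled as the plain phrase "union strike" (an assumption).
RULES = [
    ("strike three", 0),
    ("strike out", 0),
    ("strike two", 0),
    ("strike one", 0),
    ("union strike", 100),
    ("strike", 100),
    ("baseball", -101),
]


def selected_matches(text):
    """Return (rule_number, weight) for each match that is not preempted."""
    text = text.lower()
    taken = []     # character spans already claimed by an earlier rule
    selected = []
    for number, (phrase, weight) in enumerate(RULES, start=1):
        pattern = r"\b" + re.escape(phrase) + r"\b"
        for m in re.finditer(pattern, text):
            start, end = m.start(), m.end()
            # A later rule loses if its match overlaps an earlier rule's match.
            if any(start < t_end and t_start < end for t_start, t_end in taken):
                continue
            taken.append((start, end))
            selected.append((number, weight))
    return selected


print(selected_matches("He will strike out"))          # rule 2 preempts rule 6
print(selected_matches("The union strike continues"))  # rule 5 preempts rule 6
print(selected_matches("A baseball strike"))           # rules 6 and 7 both match
```

Under these assumptions, "strike out" contributes only the weight 0 from rule 2, while a document that mentions baseball still carries the absolute weight of -101 from rule 7.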

The actual assignment of topics to documents occurs during indexing. The options that control this assignment are set in TPO sets (see the Text Processing Option Sets chapter of this guide for more information):

  • Topic limit per doc—Controls the maximum number of topics to assign to an item; the default is 10.

  • Topic relevance threshold

  • Topic confidence threshold

The topic limit determines the maximum number of topics to assign to any content item; the default is 10. The relevance and confidence thresholds are the values that a topic's rule weights must score a document above for the topic to be considered relevant to that item. Their default values are 1 and 0, respectively.
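As a rough illustration of how these three options interact, the following Python sketch filters a set of candidate topics for one item. It rests on several assumptions: each candidate already has a relevance score and a confidence score (how the product computes them is not described here), a score must be strictly above its threshold to pass, and when more topics pass than the limit allows, the ones with the highest relevance are kept. The function name select_topics and its parameters are hypothetical, not actual option names.

```python
def select_topics(candidates, topic_limit=10,
                  relevance_threshold=1, confidence_threshold=0):
    """Pick the topics to assign to one content item.

    candidates maps a topic name to a (relevance, confidence) pair.
    Defaults mirror the documented defaults: 10, 1, and 0.
    """
    passing = [
        (name, relevance)
        for name, (relevance, confidence) in candidates.items()
        # Assumption: "above" is read as strictly greater than.
        if relevance > relevance_threshold and confidence > confidence_threshold
    ]
    # Assumption: prefer the most relevant topics when the limit applies.
    passing.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in passing[:topic_limit]]


scores = {
    "union strike": (100, 50),
    "sports": (0, 10),
    "labor": (30, 20),
}
print(select_topics(scores))
print(select_topics(scores, topic_limit=1))
```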

 