The exclude list configuration contains a set of terms that are removed from the final list of extracted terms.
Each exclude term is compared against the canonical form and all raw forms of an extracted term; if it matches any of them, the extracted term is excluded. This is equivalent to canonicalizing the exclude term.
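The matching rule above can be sketched in a few lines of Python. This is only an illustration of the rule as described, not the CAS implementation: the `Term` class and the `apply_excludes` function are hypothetical names, and the sketch assumes exact, case-sensitive string comparison, which the text does not specify.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class Term:
    """Hypothetical model of an extracted term: one canonical form plus its raw forms."""
    canonical: str
    raw_forms: List[str] = field(default_factory=list)


def apply_excludes(terms: Iterable[Term], excludes: Iterable[str]) -> List[Term]:
    """Return the terms whose canonical and raw forms all miss the exclude list."""
    exclude_set = set(excludes)  # assumption: exact, case-sensitive matching
    kept = []
    for term in terms:
        forms = {term.canonical, *term.raw_forms}
        # A term is dropped as soon as any one of its forms is an exclude.
        if forms.isdisjoint(exclude_set):
            kept.append(term)
    return kept


terms = [
    Term("12.1 megapixel", ["12.1 megapixel", "12.1 megapixels"]),
    Term("optical zoom", ["optical zoom"]),
]

# The exclude matches a raw form only, yet the whole term is removed.
kept = apply_excludes(terms, ["12.1 megapixels"])
```

Because a match on any raw form removes the whole term, listing a single variant in the exclude list is enough to suppress every form that shares its canonical form.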
The exclude list configuration can be passed to the CAS term extraction manipulator by creating a new Record Store of a supported type, for example delimited or JDBC. You must load the data into the Record Store and add the following pass-through information to the manipulator configuration.
| Parameter | Configuration Value |
|---|---|
| Exclude List Record Store instance name | Name of the Record Store instance that contains the exclude terms. |
| Exclude term property name | Name of the record property that contains the exclude term in the Exclude List Record Store. |
The format rules for the excluded terms list are as follows:
The list is processed after all terms have been extracted from the records.
In this brief example of a delimited exclude list file, EXCLUDE and RecordSpec are the headers (multiple headers are allowed). For the Exclude term property name property in the CAS term extraction manipulator, you must pass EXCLUDE.

    EXCLUDE,RecordSpec
    - 12.1 megapixel,1
    - 12 MP,2
    The 12.1 megapixel sensor,4
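To make the role of the property name concrete, the following sketch reads a delimited file with the same two headers and collects the values of the EXCLUDE column. It is not how CAS loads a Record Store; it only shows which column the Exclude term property name refers to. The `read_exclude_terms` function is a hypothetical helper, the sample rows are illustrative, and a comma delimiter is assumed.

```python
import csv
import io
from typing import List, TextIO

# Illustrative sample with the same headers as the example file (comma-delimited).
SAMPLE = """EXCLUDE,RecordSpec
12.1 megapixel,1
12 MP,2
The 12.1 megapixel sensor,4
"""


def read_exclude_terms(stream: TextIO, property_name: str = "EXCLUDE") -> List[str]:
    """Collect the non-empty values of one column from a delimited exclude list."""
    reader = csv.DictReader(stream)
    if property_name not in (reader.fieldnames or []):
        raise KeyError(f"header {property_name!r} not found in exclude list")
    terms = []
    for row in reader:
        value = (row[property_name] or "").strip()
        if value:  # skip rows with an empty exclude value
            terms.append(value)
    return terms


exclude_terms = read_exclude_terms(io.StringIO(SAMPLE))
```

Other columns, such as RecordSpec here, are ignored by the helper; only the column named by the property is treated as a source of exclude terms.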