Term extraction is the process of tagging a Guided Search record with a list of its relevant terms. A term represents a concept mentioned in the record’s source document and is typically a noun phrase: one or more nouns, possibly with adjoining words. A relevant term is one that carries information about a document relative to the rest of the corpus.

During term extraction, the term variants found in documents are stemmed so that they can be compared and aggregated, but the final representation of each term uses its dominant form, that is, the most frequent variant. Using the dominant form recovers the preferred representation (singular or plural form, capitalization) of proper nouns and brand names.
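The aggregation step described above can be illustrated with a minimal sketch. This is not the product's implementation: the `naive_stem` function is a hypothetical stand-in for a real stemmer (it only lowercases words and strips a trailing "s"), and the names `dominant_forms` and `observed` are assumptions introduced for the example. The sketch groups variants by their stemmed key and then reports the most frequent surface form in each group.

```python
from collections import Counter, defaultdict


def naive_stem(term: str) -> str:
    """Toy normalizer (assumption): lowercase each word and strip a trailing 's'.

    A real term extractor would use a proper stemmer; this only serves to
    make variants of the same term collide on one key.
    """
    words = term.lower().split()
    return " ".join(w[:-1] if w.endswith("s") and len(w) > 3 else w for w in words)


def dominant_forms(variants):
    """Group surface variants by stem and return the most frequent variant per stem."""
    groups = defaultdict(Counter)
    for variant in variants:
        groups[naive_stem(variant)][variant] += 1
    # most_common(1) yields the single highest-count surface form in each group
    return {stem: counts.most_common(1)[0][0] for stem, counts in groups.items()}


# Hypothetical variants as they might appear across several documents
observed = [
    "iPhone", "iPhones", "iphone", "iPhone",
    "Blu-ray discs", "Blu-ray Discs", "Blu-ray Discs",
]
result = dominant_forms(observed)
print(result)
```

In this sketch the variants "iPhone", "iPhones", and "iphone" share one stemmed key, and the reported term is "iPhone" because it occurs most often, which preserves the brand's capitalization.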

Term extraction is performed by CAS using a CAS term extractor manipulator. Configuration information on term extraction is given in the Configuration Guidelines for Term Extraction chapter.

