Phrases are expressions composed of several terms. Phrases whose meaning differs from the sum of their parts are called non-compositional phrases. For example, the phrase pick up means “to lift”, a meaning that does not follow directly from pick and up. Phrases that take their meaning directly from their component terms are called compositional phrases. For example, brick house means “a house made of bricks”; the phrase itself carries no special meaning. Because a compositional phrase is just a particular sequence of terms, there is no benefit to treating it as a unit: other occurrences of the same terms reflect the same content even when they do not appear in phrase order.

Proper names, such as New York and International Business Machines, constitute a third kind of phrase, but ATG Search treats them as compound tokens rather than as grammatical phrases.

The ATG Search dictionary contains both compositional and non-compositional phrases. During indexing, ATG Search uses non-compositional phrases as index terms in place of their component terms. For example, documents containing the phrase pick up are indexed by that phrase rather than by pick and up. As with other compound terms found during tokenization, indexing by these phrases makes retrieval more precise. Phrases also have morphological variants of their component terms, such as picked up and picking up. All of these variants are recognized as the same phrase and are therefore indexed by the same term. Compositional phrases, by contrast, are ignored during indexing, and documents are indexed by their component terms. This makes both the index and the retrieval results more consistent.
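To illustrate the indexing rule, here is a minimal Python sketch, not ATG Search's actual implementation. The phrase sets, the tiny lemma table standing in for real morphological analysis, and the function name index_terms are all hypothetical. Non-compositional phrases are matched on lemmas, so pick up, picked up, and picking up all produce the single index term "pick up", while a compositional phrase such as brick house falls back to its component terms.

```python
# Hypothetical dictionary entries (illustration only, not ATG Search data).
NON_COMPOSITIONAL = {("pick", "up")}   # indexed as a single term
COMPOSITIONAL = {("brick", "house")}   # listed only to show indexing ignores it

# Toy lemma table standing in for real morphological analysis (assumption).
LEMMAS = {"picked": "pick", "picking": "pick", "picks": "pick", "houses": "house"}

MAX_LEN = max(len(p) for p in NON_COMPOSITIONAL)


def lemma(token: str) -> str:
    """Map a token to its base form using the toy lemma table."""
    token = token.lower()
    return LEMMAS.get(token, token)


def index_terms(text: str) -> list[str]:
    """Return index terms: a non-compositional phrase (in any inflected form)
    becomes one term; everything else, including compositional phrases,
    is indexed by its component terms."""
    lemmas = [lemma(t) for t in text.split()]
    terms = []
    i = 0
    while i < len(lemmas):
        # Try the longest non-compositional phrase starting at position i.
        for n in range(min(MAX_LEN, len(lemmas) - i), 1, -1):
            key = tuple(lemmas[i:i + n])
            if key in NON_COMPOSITIONAL:
                terms.append(" ".join(key))
                i += n
                break
        else:
            # No phrase starts here: index the single term.
            terms.append(lemmas[i])
            i += 1
    return terms
```

For example, index_terms("She picked up the brick house") yields "pick up" as one term but keeps "brick" and "house" separate. Punctuation handling and a real lemmatizer are omitted for brevity.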

During query processing, ATG Search treats non-compositional phrases as query terms that behave just like normal words: each has its own term weight, based on its frequency, and its own term expansions, based on the dictionary. ATG Search also recognizes compositional phrases at query time, but does not treat them as query terms, because they have not been indexed as a unit. Instead, ATG Search uses them to expand the component terms of the phrase. For example, the phrase bank machine can expand to the single term atm or to multiple terms such as automatic teller.
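The query-time behavior can be sketched in Python as follows. This is an illustrative model under stated assumptions, not ATG Search's query processor: the dictionary contents and the names process_query and matches are hypothetical, and term weights and lemmatization are left out. A query becomes a list of clauses; each clause lists alternatives, and each alternative is a tuple of index terms that must all be present. A non-compositional phrase is a single query term, while a compositional phrase keeps its component terms and adds the phrase's expansions as alternatives.

```python
# Hypothetical dictionary entries (illustration only, not ATG Search data).
NON_COMPOSITIONAL = {("pick", "up")}
COMPOSITIONAL_EXPANSIONS = {
    ("bank", "machine"): [("atm",), ("automatic", "teller")],
}

Alternative = tuple[str, ...]  # index terms that must all match
Clause = list[Alternative]     # any one alternative satisfies the clause


def process_query(query: str) -> list[Clause]:
    """Turn a query string into clauses of alternative index-term tuples."""
    tokens = query.lower().split()
    clauses: list[Clause] = []
    i = 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in NON_COMPOSITIONAL:
            # Indexed as one term, so it is matched as one query term.
            clauses.append([(" ".join(pair),)])
            i += 2
        elif pair in COMPOSITIONAL_EXPANSIONS:
            # Not indexed as a unit: search for the component terms
            # or for any of the phrase's expansions.
            clauses.append([pair] + COMPOSITIONAL_EXPANSIONS[pair])
            i += 2
        else:
            clauses.append([(tokens[i],)])
            i += 1
    return clauses


def matches(doc_terms: set[str], clauses: list[Clause]) -> bool:
    """A document matches if every clause has an alternative fully present."""
    return all(any(set(alt) <= doc_terms for alt in clause) for clause in clauses)
```

With this model, a query for bank machine matches a document indexed by "atm" even though neither bank nor machine occurs in it. Representing expansions as alternatives of whole term tuples is one design choice among several; it keeps multi-term expansions such as automatic teller from matching on a single component.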
