This section summarizes the fundamental data structure that represents the user’s query, called a query term vector. ATG Search identifies the tokens, terms, compound terms, phrases, and normalizations that appear in the input. The end result is a sequence of query items. This section describes the items included, and the diagram that follows provides additional details.

The query sequence contains:

The term weight of a query item is computed from the frequency of the surface index term plus the frequencies of any equivalent terms in its expansion. For example, the 10th item, installing, has a document frequency of 2300, which translates into a weight of 26 out of 100. In this example, the terms with very low weight have been explicitly weighted in the dictionary and are unaffected by their frequency. The 11th term, a, is the only true stop-word, with a weight of 0. It is ignored completely in this query, although it could be significant within a larger double-quoted string (see the Literal Constraint section in the User-Entered Query Operators chapter).
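The weighting rules above can be illustrated with a minimal sketch. The actual formula ATG Search uses is not given in this section, so the logarithmic scaling, the corpus_size parameter, and the function names below are all assumptions made for illustration; the sketch is not expected to reproduce the weight of 26 quoted above. It shows only the three behaviors described: frequencies of equivalent terms are added together, more frequent terms receive lower weights, and an explicit dictionary weight overrides the frequency-based value.

```python
import math
from typing import Optional, Sequence


def combined_frequency(surface_freq: int, expansion_freqs: Sequence[int]) -> int:
    """Frequency of the surface term plus its equivalent (expansion) terms."""
    return surface_freq + sum(expansion_freqs)


def term_weight(doc_freq: int, corpus_size: int,
                dictionary_weight: Optional[int] = None) -> int:
    """Return a weight from 0 to 100 for a query item.

    Hypothetical scaling: the real ATG Search formula is not documented here.
    """
    # An explicit dictionary weight wins and ignores frequency entirely.
    if dictionary_weight is not None:
        return dictionary_weight
    if corpus_size < 2:
        raise ValueError("corpus_size must be at least 2")
    # Clamp the frequency into the range [1, corpus_size].
    doc_freq = max(1, min(doc_freq, corpus_size))
    # Assumed log scaling: rare terms approach 100, ubiquitous terms approach 0.
    scale = 1.0 - math.log(doc_freq) / math.log(corpus_size)
    return round(100 * scale)


# Hypothetical numbers: a surface term plus two expansions, and a stop-word
# whose weight of 0 is fixed in the dictionary.
freq = combined_frequency(2000, [200, 100])
print(freq, term_weight(freq, corpus_size=1_000_000))
print(term_weight(900_000, corpus_size=1_000_000, dictionary_weight=0))
```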

Query Term Vector

A query item can also hold information about query operators, which are discussed in the User-Entered Query Operators chapter. For example, the 8th item, administrator, was double-quoted, which means it is constrained to match literally and will not match administrate. Also, the 6th item, logged in, was preceded by the simple Boolean operator +, which means that results are required to contain this term (or any of its expansions). Once constructed, the query term vector contains all the information needed to execute the query and retrieve results.
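As a rough illustration of how a query item might carry this operator information, the following sketch models an item as a small record. The QueryItem class, its field names, the weights, and the satisfies_required helper are hypothetical and are not part of the ATG Search API; they only mirror the behaviors described above (literal matching for double-quoted items, mandatory matching for items preceded by +, and zero-weight stop-words being ignored).

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class QueryItem:
    """Hypothetical model of one entry in a query term vector."""
    text: str
    weight: int                                           # 0 to 100; 0 means stop-word
    expansions: List[str] = field(default_factory=list)   # equivalent terms
    literal: bool = False                                  # item was double-quoted
    required: bool = False                                 # item was preceded by +

    def matches(self, term: str) -> bool:
        # A literal item matches only its exact surface form.
        if self.literal:
            return term == self.text
        return term == self.text or term in self.expansions


def satisfies_required(items: Iterable[QueryItem], doc_terms: List[str]) -> bool:
    """True if the document contains every required item or one of its expansions."""
    for item in items:
        if item.weight == 0:
            continue  # stop-words are ignored
        if item.required and not any(item.matches(t) for t in doc_terms):
            return False
    return True


# Illustrative items only; the weights are made up.
logged_in = QueryItem("logged in", 40, expansions=["log in"], required=True)
administrator = QueryItem("administrator", 50,
                          expansions=["administrate"], literal=True)
stop_a = QueryItem("a", 0)
vector = [logged_in, administrator, stop_a]

print(administrator.matches("administrate"))             # literal item, no expansion match
print(satisfies_required(vector, ["log in", "server"]))  # required item met via expansion
```

Keeping the operator flags on the item itself, rather than in a separate structure, reflects the statement that the vector alone holds everything needed to execute the query.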

 