Normally, ATG Search retrieves indexed content using a relatively short input query from a user. The terms of the query are used to retrieve matching statements, which represent the basic results of the search. Some situations, however, such as consistency checks during the authoring of new content, require retrieving the index items (documents) that are most similar to another index item. In effect, a whole document is treated as a query that retrieves other documents from the index. To perform this kind of retrieval, ATG Search provides the Similar Text request. This chapter describes how the request works, its response, and the parameters that control it.

To retrieve index items using the whole text of another item, ATG Search first constructs a list of the unique terms in the input item. The result is a vector of unique terms and their frequencies in the input item. Next, the Similar Text request iterates over these terms, retrieves every item that contains each term, and collects these items. During collection, ATG Search computes a term frequency-inverse document frequency (TF-IDF) value for each retrieved item, which measures the strength of similarity between the input terms and that item. After all input terms are processed, ATG Search sorts the retrieved items by TF-IDF value and returns up to the requested maximum number of results. The results have the same structure as the standard query response, as described in the Query Results section of the Standard Query chapter. However, because there is no matching statement text, the index item URL is returned instead. The relevancy score is the TF-IDF value.
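
The retrieval steps described above can be illustrated with a minimal Python sketch. It is not ATG Search code, and it does not reflect the product's actual weighting formula, which this chapter does not specify. The whitespace tokenizer, the in-memory inverted index, the sample URLs, the function names (tokenize, build_index, similar_text), and the smoothed IDF formula log(1 + N/df) are all assumptions chosen only to make the idea concrete.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase whitespace tokenization stands in for real text analysis.
    return text.lower().split()


def build_index(docs):
    # Hypothetical inverted index: term -> {item URL: term frequency in that item}.
    index = defaultdict(dict)
    for url, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][url] = tf
    return index


def similar_text(input_text, index, n_docs, max_results=10):
    # Step 1: vector of unique terms and their frequencies in the input item.
    term_vector = Counter(tokenize(input_text))
    scores = defaultdict(float)
    # Step 2: iterate over the terms and collect every item containing each one.
    for term, input_tf in term_vector.items():
        postings = index.get(term, {})
        if not postings:
            continue
        # Assumed IDF variant; rarer terms contribute more to the score.
        idf = math.log(1 + n_docs / len(postings))
        for url, item_tf in postings.items():
            # Step 3: accumulate a TF-IDF score per collected item.
            scores[url] += input_tf * item_tf * idf
    # Step 4: sort by score (highest first, URL as tie-breaker) and truncate.
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return ranked[:max_results]


# Hypothetical sample items keyed by URL.
docs = {
    "/docs/a.html": "reset your password from the account page",
    "/docs/b.html": "change the shipping address on the account page",
    "/docs/c.html": "track an order after shipping",
}
index = build_index(docs)
results = similar_text("reset the account password", index, len(docs))
for url, score in results:
    print(url, round(score, 3))
```

As in the request described above, each result pairs an item URL with its score rather than with matching statement text, and items that share no term with the input never enter the result list.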
