Stemming refers to the removal of common endings from words, such as ing from selling. Search engines stem terms to increase the number of matches between query terms and documents. For example, if the documents contain the terms selling, sells, and sell, and the query contains selling, a literal match retrieves only the documents with selling and misses those that contain only sells or sell. Stemming trades exactness in the query for more results and potentially improved relevance.
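The difference between a literal match and a stemmed match can be sketched in a few lines of Python. This is a minimal illustration, not how ATG Search is implemented: the suffix list, the naive_stem function, and the sample documents are all assumptions made up for the example.

```python
# Hypothetical suffix list for illustration; real stemmers use far more rules.
SUFFIXES = ("ing", "s")

def naive_stem(word):
    """Strip the first matching suffix, keeping a stem of at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

# Made-up sample documents.
documents = {
    1: "selling used books",
    2: "she sells shells",
    3: "we sell maps",
    4: "buy a map",
}

def literal_match(query, docs):
    """Return ids of documents containing the exact query term."""
    return sorted(i for i, text in docs.items() if query in text.split())

def stemmed_match(query, docs):
    """Return ids of documents containing any word with the same stem as the query."""
    target = naive_stem(query)
    return sorted(
        i for i, text in docs.items()
        if any(naive_stem(word) == target for word in text.split())
    )

print(literal_match("selling", documents))  # [1]
print(stemmed_match("selling", documents))  # [1, 2, 3]
```

The literal match finds only the document that contains selling, while the stemmed match also finds the documents that contain sells and sell.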

If stemming is performed on both the documents and the query, documents including any form of sell can be retrieved by any form of sell in the query. Alternatively, stemming can be performed only on the query. In this case, the documents are indexed with unstemmed forms of sell, but the query stems selling into a query for all forms that have the stem sell (selling or sells or sell). This process is a form of term expansion, in which a single term is expanded to many terms, and it is discussed in the Term Expansion section.
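Query-side stemming can be sketched as follows. The index holds unstemmed terms, and the query term is expanded to every indexed term that shares its stem. The index contents and the function names (naive_stem, expand_query, search) are hypothetical and chosen only for illustration; they do not describe the ATG Search API.

```python
SUFFIXES = ("ing", "s")  # hypothetical suffix list

def naive_stem(word):
    """Strip the first matching suffix, keeping a stem of at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

# Index built from unstemmed document terms: term -> set of document ids.
index = {
    "selling": {1},
    "sells": {2},
    "sell": {3},
    "seller": {4},
}

def expand_query(term, vocabulary):
    """Expand one query term to all vocabulary terms that share its stem."""
    stem = naive_stem(term)
    return sorted(w for w in vocabulary if naive_stem(w) == stem)

def search(term, index):
    """Union the posting lists of every expanded form of the query term."""
    results = set()
    for variant in expand_query(term, index):
        results |= index[variant]
    return sorted(results)

print(expand_query("selling", index))  # ['sell', 'selling', 'sells']
print(search("selling", index))        # [1, 2, 3]
```

In this sketch seller is not retrieved, because the toy suffix list has no rule for er. That is a property of the example, not a statement about any real stemmer.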

Stemming does not work in all cases. Words such as sold cannot be stemmed by stripping off endings. Terms such as sing have an ending that looks like a suffix but is not. Other terms keep extra letters that are not part of the stem: stripping ing from swimming leaves swimm rather than swim. Furthermore, stemming cannot handle the diverse suffixes of most languages other than English. Therefore, ATG Search goes beyond stemming and performs robust morphological analysis, described in the Morphology section.
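These failure cases are easy to reproduce with a deliberately naive suffix stripper. The strip_ing function below is a hypothetical helper written for this illustration only.

```python
def strip_ing(word):
    """Remove a trailing 'ing' with no further checks (deliberately naive)."""
    return word[:-3] if word.endswith("ing") else word

print(strip_ing("selling"))   # sell   (works as intended)
print(strip_ing("sold"))      # sold   (irregular form, never reduced to sell)
print(strip_ing("sing"))      # s      (ing is part of the word, not a suffix)
print(strip_ing("swimming"))  # swimm  (doubled m is not part of the stem swim)
```

Only the first result is the stem a searcher would expect. The other three either fail to match the base form or produce a string that matches nothing useful.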

 