The proximity module ranks how close the query terms are to each other in a document by counting the number of intervening words. It is designed primarily for use with unstructured data.
Like the first module, the proximity module groups its results into variable-sized strata, because the difference in significance between an interval of one word and an interval of two words is usually greater than the difference between an interval of 21 words and one of 22 words. If no terms match, the document is placed in the lowest stratum.
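As a rough illustration of variable-sized strata, the sketch below maps a count of intervening words to a stratum index. The boundary values in `STRATUM_UPPER_BOUNDS` and the function name `stratum_for` are invented for this example; the actual stratum boundaries used by the module are not stated here.

```python
from bisect import bisect_left

# Hypothetical upper bounds (inclusive) for each stratum. Strata are narrow
# for small word counts and grow wider as the count increases.
STRATUM_UPPER_BOUNDS = [0, 1, 2, 3, 5, 8, 12, 20, 40]


def stratum_for(intervening_words):
    """Return a stratum index, where 0 is the best stratum.

    Passing None stands for "no terms matched" and yields the lowest stratum.
    """
    # One overflow stratum sits past the last bound; the lowest stratum,
    # reserved for documents with no match, comes after that.
    lowest = len(STRATUM_UPPER_BOUNDS) + 1
    if intervening_words is None:
        return lowest
    # bisect_left finds the first bound that is >= the count.
    return bisect_left(STRATUM_UPPER_BOUNDS, intervening_words)
```

With these illustrative bounds, counts of 1 and 2 land in different strata, while counts of 21 and 22 share one.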
Single words and phrases are assigned to the best stratum because there are no intervening words. When the query has multiple terms, the proximity module behaves as follows:
All of the absolute positions for each of the query terms are computed.
The smallest range that includes at least one instance of each of the query terms is calculated. The length of this range is measured in words. The score for each document is the stratum that contains the difference between the range’s length and the number of terms in the query; smaller differences are better than larger differences.
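The two steps above can be sketched as a sliding-window search over the term positions. This is a minimal sketch, not the module's actual implementation: the function name `proximity_difference` is hypothetical, tokens are assumed to be already normalized (for example, lowercased), and repeated query terms are assumed to count once.

```python
from collections import Counter


def proximity_difference(doc_tokens, query_terms):
    """Return (smallest range length in words) - (number of query terms).

    Returns None when at least one query term is absent from the document.
    """
    terms = set(query_terms)  # assumption: duplicate query terms count once

    # Step 1: absolute positions of every query-term occurrence, in order.
    hits = [(pos, tok) for pos, tok in enumerate(doc_tokens) if tok in terms]
    if {tok for _, tok in hits} != terms:
        return None

    # Step 2: slide a window over the hits to find the smallest range that
    # contains at least one instance of each term.
    counts = Counter()
    covered = 0  # number of distinct terms currently inside the window
    best = None
    left = 0
    for right_pos, right_tok in hits:
        counts[right_tok] += 1
        if counts[right_tok] == 1:
            covered += 1
        # Shrink from the left while the window still covers every term.
        while covered == len(terms):
            left_pos, left_tok = hits[left]
            length = right_pos - left_pos + 1
            if best is None or length < best:
                best = length
            counts[left_tok] -= 1
            if counts[left_tok] == 0:
                covered -= 1
            left += 1

    return best - len(terms)
```

For the tokens `["big", "bird", "likes", "his", "cats"]` and the query terms `["big", "cats"]`, the smallest range is five words long, so the function returns 3, the number of intervening words. A single-term query always yields 0.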
Under query expansion (that is, stemming and the thesaurus), the expanded terms are treated as if they were in the query, so the proximity metric is computed using the locations of the expanded terms in the matching document.
For example, if a user searches for “big cats” and a document contains the sentence “Big Bird likes his cat” (stemming takes cats to cat), then the proximity metric is computed just as if the sentence were “Big Bird likes his cats.”
The proximity module scores partially matched queries as if the query contained only the matching terms. For example, if a user searches for “cat dog fish” and a partially matched document contains only cat and fish, then the document is scored as if the query “cat fish” had been entered.
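The handling of expanded terms and partial matches can be illustrated as a preprocessing step that runs before any range is measured. In the sketch below, `effective_query_and_positions` and `toy_stem` are hypothetical names; `toy_stem` is a deliberately crude stand-in for a real stemmer, and thesaurus expansion is not modeled.

```python
def toy_stem(word):
    # Hypothetical stand-in for a real stemmer: lowercase the word and
    # strip a trailing "s" from words longer than three letters.
    word = word.lower()
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def effective_query_and_positions(doc_tokens, query_terms, stem=toy_stem):
    """Return (matched query terms, {query term: [positions]}).

    Document tokens whose stem equals a query term's stem are treated as
    occurrences of that query term. Query terms that never occur are
    dropped, so scoring proceeds as if only the matched terms were entered.
    """
    stem_to_term = {stem(term): term for term in query_terms}

    positions = {}
    for pos, tok in enumerate(doc_tokens):
        term = stem_to_term.get(stem(tok))
        if term is not None:
            positions.setdefault(term, []).append(pos)

    # Partial match: keep only the query terms found in the document.
    matched = [term for term in query_terms if term in positions]
    return matched, positions
```

Applied to the tokens of “Big Bird likes his cat” with the query terms `["big", "cats"]`, both terms match, at positions 0 and 4. Applied to a document containing only cat and fish with the query terms `["cat", "dog", "fish"]`, the matched terms are `["cat", "fish"]`.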
Note: The proximity module does not work with Boolean searches, cross-field matching, or wildcard search. It assigns all such matches a score of zero.