Designed primarily for use with unstructured data, the Proximity module ranks how close the query terms are to each other in a document by counting the number of intervening words.
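The counting idea described above can be illustrated with a minimal sketch. It is not the module's actual implementation: the function names, the tokenizer, and the rule of taking the smallest window that contains every query term are all assumptions made for illustration, and the query terms are assumed to be distinct single words.

```python
import re
from itertools import product

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def min_intervening_words(document, query_terms):
    """Return the fewest non-query words found inside any window that
    contains one occurrence of every query term (hypothetical metric).
    Returns None if any query term is missing from the document."""
    tokens = tokenize(document)
    positions = []
    for term in query_terms:
        hits = [i for i, tok in enumerate(tokens) if tok == term.lower()]
        if not hits:
            return None
        positions.append(hits)
    # Try every combination of one position per term and keep the tightest.
    best = None
    for combo in product(*positions):
        window = max(combo) - min(combo) + 1
        gap = window - len(query_terms)
        if best is None or gap < best:
            best = gap
    return best

print(min_intervening_words("Big dog likes his cat", ["big", "cat"]))  # 3
print(min_intervening_words("A big cat sleeps", ["big", "cat"]))       # 0
```

The brute-force search over position combinations keeps the sketch short; it is only practical for small documents and short queries.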

Like the First module, this module groups its results into variable-sized strata, because the difference in significance between an interval of one word and an interval of two words is usually greater than the difference between an interval of 21 words and an interval of 22 words. If no terms match, the document is placed in the lowest stratum.
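A short sketch may help show why variable-sized strata suit this metric. The stratum boundaries below are hypothetical, chosen only so that small intervals get their own strata while large intervals share one; the actual boundaries used by the module are not stated here.

```python
from bisect import bisect_left

# Hypothetical upper bounds (in intervening words) for each stratum.
# Narrow strata for small gaps, wide strata for large gaps.
STRATUM_UPPER_BOUNDS = [0, 1, 2, 5, 10, 25, 50]

# Assumed index of the lowest stratum, used when no terms match.
LOWEST_STRATUM = len(STRATUM_UPPER_BOUNDS) + 1

def stratum_for_gap(gap):
    """Map a count of intervening words to a stratum index (0 is best).
    A gap of None means no terms matched, so the lowest stratum is used."""
    if gap is None:
        return LOWEST_STRATUM
    return bisect_left(STRATUM_UPPER_BOUNDS, gap)

print(stratum_for_gap(1), stratum_for_gap(2))    # 1 2
print(stratum_for_gap(21), stratum_for_gap(22))  # 5 5
```

With these assumed boundaries, intervals of one and two words fall into different strata, while intervals of 21 and 22 words share a stratum.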

Single words and phrases are assigned to the best stratum because they have no intervening words. When the query has multiple terms, Proximity behaves as follows:

Under stemming, spelling correction, and the thesaurus, the expanded terms are treated as if they were in the query, so the proximity metric is computed using the locations of the expanded terms in the matching document.

For example, if a user searches for big cats and a document contains the sentence, "Big dog likes his cat" (stemming takes cats to cat), then the proximity metric is computed just as if the sentence were, "Big dog likes his cats."
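The example above can be reproduced with a small sketch. The stemmer here is a toy stand-in that only strips a trailing "s", and the gap calculation (smallest window containing every term) is an assumption; neither reflects the product's real stemming or scoring code.

```python
import re
from itertools import product

def toy_stem(word):
    """Hypothetical stand-in for a real stemmer: strip one trailing 's'
    from words longer than three letters."""
    return word[:-1] if len(word) > 3 and word.endswith("s") else word

def stemmed_gap(document, query):
    """Count intervening words after stemming both the document and the
    query, so an expanded term is located as if it were the query term."""
    tokens = [toy_stem(t) for t in re.findall(r"[a-z0-9]+", document.lower())]
    terms = [toy_stem(t) for t in query.lower().split()]
    positions = [[i for i, tok in enumerate(tokens) if tok == term]
                 for term in terms]
    if any(not hits for hits in positions):
        return None  # at least one term has no match
    return min(max(c) - min(c) + 1 - len(terms) for c in product(*positions))

# Both sentences give the same result for the query "big cats".
print(stemmed_gap("Big dog likes his cat", "big cats"))   # 3
print(stemmed_gap("Big dog likes his cats", "big cats"))  # 3
```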

Proximity scores partially matched queries as if the query contained only the matching terms. For example, if a user searches for cat dog fish and a partially matched document contains only cat and fish, then the document is scored as if the query cat fish had been entered.
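As a rough illustration of this rule, the sketch below reduces a query to the terms that actually occur in a document before any proximity calculation takes place. The function name and the simple tokenizer are assumptions made for this example.

```python
import re

def matched_terms(document, query_terms):
    """Return the query terms that occur in the document, in query order.
    Under partial matching, proximity would then be computed on this
    reduced list only (hypothetical helper)."""
    tokens = set(re.findall(r"[a-z0-9]+", document.lower()))
    return [term for term in query_terms if term.lower() in tokens]

print(matched_terms("The cat watched the fish", ["cat", "dog", "fish"]))
# ['cat', 'fish']
```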

Proximity does not work with Boolean searches, cross-field matching, or wildcard searches. It assigns all such matches a score of zero.

