Designed primarily for use with unstructured data, the First module ranks documents by how close the query terms are to the beginning of the document.
First groups its results into strata of varying size. The strata are not the same size because, while the first word is probably more relevant than the tenth word, the 301st word is probably not much more relevant than the 310th. The module relies on the observation that the closer a term is to the beginning of a document, the more likely it is to be relevant.
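To make the idea of unequal strata concrete, here is a minimal sketch in Python. The actual stratum boundaries used by First are not stated here, so the doubling scheme below (each stratum twice as wide as the previous one) and the function name `stratum_for_position` are assumptions chosen only for illustration.

```python
def stratum_for_position(position: int) -> int:
    """Return a 0-based stratum index for a 1-based word position.

    Hypothetical scheme, not the module's real boundaries: stratum k covers
    positions 2**k through 2**(k + 1) - 1, so strata widen as they get later.
    """
    if position < 1:
        raise ValueError("position must be 1 or greater")
    # bit_length() - 1 equals floor(log2(position)) for positive integers.
    return position.bit_length() - 1


# Lower stratum indexes are better (closer to the start of the document).
for pos in (1, 10, 301, 310):
    print(pos, stratum_for_position(pos))
```

Under this assumed scheme, the first and tenth words land in different strata, while the 301st and 310th words share one, which mirrors the reasoning above.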
The First module works as follows:
- When the query has a single term, First's behavior is straightforward: it retrieves the first absolute position of the word in the document, then calculates which stratum contains that position. The score for the document is based on that stratum; earlier strata are better than later strata.
- When the query has multiple terms, First behaves as follows:
  - The first absolute position of each query term is determined.
  - The median of these positions is calculated. This median is treated as the position of the query in the document and is used for stratification as described for the single-term case.
- With query expansion (using stemming, spelling correction, or the thesaurus), First treats expanded terms as if they occurred in the source query. For example, the query glucose intolerence is corrected to glucose intolerance (with intolerence spell-corrected to intolerance). First then proceeds as it does in the non-expansion case: the first position of each term is computed and the median of these positions is taken.
- In a partially matched query, where only some of the query terms cause a document to match, First behaves as if the intersection of terms that occur in the document and terms that occur in the original query were the entire query. For example, if the query cat bird dog is partially matched to a document on the terms cat and bird, then the document is scored as if the query were cat bird.
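The cases above can be sketched in a few lines of Python. This is an illustrative approximation, not the module's implementation: the function names `first_positions` and `query_position` are hypothetical, positions are assumed to be 1-based word offsets, any query expansion is assumed to have been applied to the query terms already, and `statistics.median_low` is an assumed choice for the case where an even number of terms match (the text does not say how that case is resolved).

```python
from statistics import median_low
from typing import Optional, Sequence


def first_positions(query_terms: Sequence[str], doc_words: Sequence[str]) -> dict[str, int]:
    """Map each query term that occurs in the document to its first 1-based position."""
    wanted = set(query_terms)
    positions: dict[str, int] = {}
    for index, word in enumerate(doc_words, start=1):
        if word in wanted and word not in positions:
            positions[word] = index
    return positions


def query_position(query_terms: Sequence[str], doc_words: Sequence[str]) -> Optional[int]:
    """Return the median of the first positions of the matching terms.

    Terms that do not occur in the document are ignored, so a partial match
    is treated as if the matching terms were the entire query.
    Returns None when no query term occurs in the document.
    """
    found = first_positions(query_terms, doc_words)
    if not found:
        return None
    # Assumption: with an even number of matches, take the lower middle value.
    return median_low(list(found.values()))


doc = "the cat sat near a bird while rain fell".split()

print(query_position(["cat"], doc))                  # single term
print(query_position(["cat", "bird", "rain"], doc))  # multiple terms
print(query_position(["cat", "bird", "dog"], doc))   # partial match on cat, bird
```

The position returned would then be mapped to a stratum to produce the score. Note that the partial-match call gives the same result as a query of just cat bird, as described in the last case.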
First's interaction with other features
First works for partial match modes, such as MatchPartial, as well as for MatchAll. For partial matches, First ranks documents based on the median position of the matching terms.
First does not work with Boolean searches, cross-field matching, or wildcard search. It assigns all such matches a score of zero.