Oracle8i interMedia Text Reference
Release 2 (8.1.6)
Part Number A77063-01
Query Operators
Use the NEAR operator to return a score based on the proximity of two or more query terms. Oracle returns higher scores for terms closer together and lower scores for terms farther apart in a document.
NEAR((word1, word2,..., wordn) [, max_span [, order]])
For word1 through wordn, specify the query terms, separated by commas. Each term can be a single word or a phrase.
For max_span, optionally specify the size of the largest clump. The default is 100. Oracle returns an error if you specify a number greater than 100.
A clump is the smallest group of words in which all query terms occur. All clumps begin and end with a query term.
For near queries with two terms, max_span is the maximum distance allowed between the two terms. For example, to query on dog and cat where dog is within 6 words of cat, issue the following query:
near((dog, cat), 6)
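The following minimal Python sketch makes the clump definition concrete by finding clumps in a tokenized document. It is not Oracle's implementation. The helpers tokenize and find_clumps are hypothetical. The sketch assumes that terms are single words and that a clump's span is the difference between the word positions of its first and last terms.

```python
import re
from typing import List, Sequence, Tuple


def tokenize(text: str) -> List[str]:
    # Assumption: a simple lexer that lowercases and keeps only letters/digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def find_clumps(tokens: Sequence[str], terms: Sequence[str],
                max_span: int = 100) -> List[Tuple[int, int]]:
    """Return (start, end) word positions of minimal clumps (hypothetical model).

    A clump is the smallest window containing every query term; it begins
    and ends with a query term. Clumps whose span (end - start) exceeds
    max_span are discarded.
    """
    wanted = {t.lower() for t in terms}
    candidates = []
    for start, tok in enumerate(tokens):
        if tok not in wanted:
            continue  # a clump must begin with a query term
        seen = set()
        for pos in range(start, len(tokens)):
            if tokens[pos] in wanted:
                seen.add(tokens[pos])
                if seen == wanted:
                    # Earliest end that covers every term from this start.
                    candidates.append((start, pos))
                    break
    # A window is minimal if it does not strictly contain another candidate.
    minimal = [w for w in candidates
               if not any(o != w and w[0] <= o[0] and o[1] <= w[1]
                          for o in candidates)]
    return [(s, e) for s, e in minimal if e - s <= max_span]
```

Under this reading, in "The dog sat near a big cat today." the word dog is at position 1 and cat is at position 6. The pair therefore forms a clump for max_span 6 but not for max_span 4.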
For order, specify TRUE for Oracle to search for terms in the order you specify. The default is FALSE.
For example, to search for the words monday, tuesday, and wednesday in that order with a maximum clump size of 20, issue the following query:
near((monday, tuesday, wednesday), 20, TRUE)
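The effect of the order flag can be modeled by accepting only windows in which the terms appear in the listed sequence. The Python sketch below is a hypothetical model of order set to TRUE, not Oracle's algorithm. The function ordered_clumps assumes at least two distinct, single-word, lowercase terms and measures span as the difference between word positions.

```python
from typing import List, Sequence, Tuple


def ordered_clumps(tokens: Sequence[str], terms: Sequence[str],
                   max_span: int = 100) -> List[Tuple[int, int]]:
    """Minimal windows where the terms occur in the given order (hypothetical)."""
    if len(terms) < 2:
        raise ValueError("expected at least two terms")
    candidates = []
    for start, tok in enumerate(tokens):
        if tok != terms[0]:
            continue  # an ordered clump must begin with the first term
        nxt = 1  # index of the next term we are waiting for
        for pos in range(start + 1, len(tokens)):
            if tokens[pos] == terms[nxt]:
                nxt += 1
                if nxt == len(terms):
                    candidates.append((start, pos))  # earliest completion
                    break
    # Drop windows that strictly contain another candidate window.
    minimal = [w for w in candidates
               if not any(o != w and w[0] <= o[0] and o[1] <= w[1]
                          for o in candidates)]
    return [(s, e) for s, e in minimal if e - s <= max_span]
```

Consider the token list "wednesday monday tuesday lunch wednesday". An unordered search would also accept the window wednesday, monday, tuesday at positions 0 to 2. The ordered search accepts only monday through wednesday at positions 1 to 4. This is one reason why the two settings of the flag can produce different clump counts, and therefore different scores, for the same document.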
Oracle might return different scores for the same document when you issue query expressions that are identical except for the setting of the order flag.
The scoring for the NEAR operator combines frequency of the terms with proximity of terms. For each document that satisfies the query, Oracle returns a score between 1 and 100 that is proportional to the number of clumps in the document and inversely proportional to the average size of the clumps. This means many small clumps in a document result in higher scores, since small clumps imply closeness of terms.
The number of terms in a query also affects the score. A query with many terms (seven, for example) generally needs fewer clumps in a document to score 100 than a query with few terms (two, for example).
As defined earlier, a clump is the smallest group of words in which all query terms occur, and every clump begins and ends with a query term. The max_span parameter sets the size of the largest clump that is counted.
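A toy formula can illustrate the qualitative relationship described above: more clumps raise the score, and larger clumps lower it. This is explicitly not Oracle's scoring algorithm, which this section does not give. The function toy_near_score and its scale constant are hypothetical, clump size is taken to be the number of words in the clump, and the sketch ignores the effect of the number of query terms.

```python
from typing import Sequence, Tuple


def toy_near_score(clumps: Sequence[Tuple[int, int]], scale: float = 10.0) -> int:
    """Illustrative only, NOT Oracle's formula.

    Grows with the number of clumps, shrinks as the average clump size grows,
    and is clamped to the 1..100 range. `scale` is a hypothetical constant.
    """
    if not clumps:
        return 0  # no clump: the document does not satisfy the query
    sizes = [end - start + 1 for start, end in clumps]  # size in words
    avg_size = sum(sizes) / len(sizes)
    raw = scale * len(clumps) / avg_size
    return max(1, min(100, round(raw)))
```

Under this toy formula, three two-word clumps score higher than one two-word clump. A single ten-word clump scores lower than a single two-word clump. The actual numbers Oracle produces will differ.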
You can combine the NEAR operator with other operators such as AND and OR. Scores for the combined expression are calculated the same way as for any other query that uses those operators.
For example, to find all documents that contain the terms tiger, lion, and cheetah, where lion and tiger are within 10 words of each other, issue the following query:
near((tiger, lion), 10) and cheetah
The score returned for each document is the lower score of the near operator and the term cheetah.
You can also use the equivalence operator (=) to specify alternatives for a single term in a near query:
near((stock crash, Japan=Korea), 20)
This query asks for all documents that contain the phrase stock crash within twenty words of Japan or Korea.
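One way to picture the equivalence operator is that a single position in the NEAR list can match any of several alternatives. The following Python sketch covers only the two-term case and uses the hypothetical helpers phrase_positions and near_two. It treats Japan=Korea as a list of alternatives and measures distance between the first words of each match. This section does not state Oracle's exact distance rule for phrases, so that part is an assumption.

```python
from typing import List, Sequence


def phrase_positions(tokens: Sequence[str], alternatives: Sequence[str]) -> List[int]:
    """Start positions where any alternative (a word or a phrase) occurs."""
    hits = set()
    for alt in alternatives:
        words = alt.lower().split()
        for i in range(len(tokens) - len(words) + 1):
            if list(tokens[i:i + len(words)]) == words:
                hits.add(i)
    return sorted(hits)


def near_two(tokens: Sequence[str], term_a: Sequence[str],
             term_b: Sequence[str], max_span: int) -> bool:
    """True if some match of term_a starts within max_span words of term_b."""
    return any(abs(a - b) <= max_span
               for a in phrase_positions(tokens, term_a)
               for b in phrase_positions(tokens, term_b))
```

For example, near_two(tokens, ["stock crash"], ["Japan", "Korea"], 20) succeeds if either Japan or Korea appears close enough to the phrase.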
You can also write near queries using the syntax of previous ConText releases. For example, to find all documents where lion occurs near tiger, you can write:
lion near tiger
or, with the semicolon:
lion;tiger
Either query is equivalent to the following query:
near((lion, tiger), 100, FALSE)
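The older forms are shorthand for a NEAR query that uses the default max_span of 100 and order FALSE, so the rewrite can be expressed mechanically. The legacy_to_near helper below is a hypothetical illustration, not an Oracle utility. It handles only simple single-word terms joined by near or a semicolon.

```python
import re


def legacy_to_near(query: str) -> str:
    """Rewrite 'a near b' or 'a;b' into NEAR syntax (hypothetical helper).

    Fills in the documented defaults: max_span 100 and order FALSE.
    """
    # Split on a semicolon or on the word "near" surrounded by whitespace.
    parts = re.split(r"\s*;\s*|\s+near\s+", query.strip(), flags=re.IGNORECASE)
    terms = [p for p in parts if p]
    return "near((" + ", ".join(terms) + "), 100, FALSE)"
```

Both legacy_to_near("lion near tiger") and legacy_to_near("lion;tiger") produce the same NEAR expression.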
When you use highlighting and your query contains the near operator, all occurrences of all terms in the query that satisfy the proximity requirements are highlighted. Highlighted terms can be single words or phrases.
For example, assume a document contains the following text:
Chocolate and vanilla are my favorite ice cream flavors. I like chocolate served in a waffle cone, and vanilla served in a cup with caramel syrup.
If the query is near((chocolate, vanilla), 100, FALSE), the following is highlighted:
<<Chocolate>> and <<vanilla>> are my favorite ice cream flavors. I like <<chocolate>> served in a waffle cone, and <<vanilla>> served in a cup with caramel syrup.
However, if the query is near((chocolate, vanilla), 4, FALSE), only the first pair, which lies within four words, is highlighted:
<<Chocolate>> and <<vanilla>> are my favorite ice cream flavors. I like chocolate served in a waffle cone, and vanilla served in a cup with caramel syrup.
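The highlighting in this example can be reproduced with a short two-term Python sketch. The highlight_near function is hypothetical and is not the CTX_DOC API. It assumes that, for two terms, each clump is a pair of adjacent occurrences of different terms, and that distance is measured in word positions.

```python
import re


def highlight_near(text: str, term_a: str, term_b: str, max_span: int) -> str:
    """Wrap in << >> every term occurrence that belongs to a qualifying clump.

    Two-term sketch: adjacent occurrences of different terms form a clump,
    and a clump qualifies if its word distance is <= max_span.
    """
    words = list(re.finditer(r"[A-Za-z0-9]+", text))
    a, b = term_a.lower(), term_b.lower()
    # (word index, term) for every occurrence of either term, in text order.
    occ = [(i, m.group().lower()) for i, m in enumerate(words)
           if m.group().lower() in (a, b)]
    keep = set()
    for (i, t1), (j, t2) in zip(occ, occ[1:]):
        if t1 != t2 and j - i <= max_span:
            keep.update((i, j))
    # Insert markers right to left so earlier character offsets stay valid.
    out = text
    for i in sorted(keep, reverse=True):
        m = words[i]
        out = out[:m.start()] + "<<" + m.group() + ">>" + out[m.end():]
    return out
```

With max_span 4, only the first chocolate and vanilla pair (word positions 0 and 2) qualifies. The later clumps span nine and seven word positions.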
For more information about the procedures you can use for highlighting, see Chapter 8, "CTX_DOC Package".
You can use the NEAR operator with the WITHIN operator for section searching as follows:
near((dog, cat)) within Headings
When evaluating expressions such as these, Oracle looks for clumps that lie entirely within the given section.
In the example above, only clumps of dog and cat that lie entirely within the Headings section are counted. That is, if dog lies within Headings and cat lies five words from dog but outside Headings, this pair of words does not satisfy the expression and is not counted.
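This kind of section filtering amounts to keeping only the clumps that fit inside the section's word range. The sketch below assumes, hypothetically, that a section is represented as an inclusive (start, end) range of word positions. Oracle's internal representation of sections is not described here, and clumps_within is an illustrative name.

```python
from typing import List, Sequence, Tuple


def clumps_within(clumps: Sequence[Tuple[int, int]],
                  section: Tuple[int, int]) -> List[Tuple[int, int]]:
    """Keep clumps whose first and last words both fall inside the section.

    Clumps and the section are inclusive (start, end) word positions.
    """
    lo, hi = section
    return [(s, e) for s, e in clumps if lo <= s and e <= hi]
```

Suppose Headings covers words 0 through 4. A clump (3, 8) models the case described above: dog sits at word 3 inside Headings, and cat sits five words later at word 8, outside it. That clump is dropped, and (1, 3) is kept.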