Phrase search queries are generally more expensive to process than normal conjunctive search queries.
In addition to the work associated with a conjunctive query, a phrase search operation must verify the presence of the exact requested phrase.
The cost of phrase search operations depends mostly on how frequently the query words appear in the data. Searches for phrases containing relatively infrequent words (such as proper names) are generally very rapid, because the base conjunctive search narrows the results to a small set of candidate hits, and within these hits relatively few possible match positions need to be considered.
On the other hand, searches for phrases containing only very common words are more expensive. For example, consider a search for the phrase "to be or not to be" on a large collection of documents. Because all of these words are quite common, the base conjunctive search does not narrow the set of candidate hit documents significantly. Then, within each candidate result document, numerous possible word positions need to be scanned, because these words tend to be frequently reused within a single document.
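The two stages described above (conjunctive narrowing, then position-by-position verification) can be illustrated with a small sketch. This is not the MDEX Engine's implementation. It is a toy positional index written for illustration, and the names `build_index` and `phrase_search`, the whitespace tokenization, and the sample documents are all assumptions. The function reports how many candidate documents survive the conjunctive step and how many start positions are examined during verification.

```python
from collections import defaultdict

def build_index(docs):
    """Toy positional index (hypothetical): word -> {doc_id: [positions]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split()):
            index[word].setdefault(doc_id, []).append(pos)
    return index

def phrase_search(index, phrase):
    """Return (matching doc ids, candidate doc count, start positions checked)."""
    words = phrase.lower().split()
    if not words or any(w not in index for w in words):
        return [], 0, 0
    # Stage 1: conjunctive search keeps documents containing every query word.
    candidates = set(index[words[0]])
    for w in words[1:]:
        candidates &= set(index[w])
    # Stage 2: phrase verification tries each occurrence of the first word
    # as a possible start of the exact phrase.
    hits, checked = [], 0
    for doc_id in sorted(candidates):
        positions = {w: set(index[w][doc_id]) for w in words}
        for start in index[words[0]][doc_id]:
            checked += 1
            if all(start + i in positions[w] for i, w in enumerate(words)):
                hits.append(doc_id)
                break  # one exact match is enough for this document
    return hits, len(candidates), checked

docs = {
    1: "to be or not to be that is the question",
    2: "not to be outdone he chose to be bold or to be quiet",
    3: "hamlet met ophelia",
}
index = build_index(docs)

# Common words: both documents 1 and 2 survive the conjunctive step,
# and several start positions must be checked.
print(phrase_search(index, "to be or not to be"))   # ([1], 2, 4)
# Rare words: one candidate document, one position checked.
print(phrase_search(index, "hamlet met ophelia"))   # ([3], 1, 1)
```

In this toy data, the phrase made of common words leaves two candidate documents and four start positions to check, while the phrase made of rare words leaves one of each. On a large collection the same effect is what makes common-word phrases more expensive.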
Even very difficult queries (such as "to be or not to be") are handled by the MDEX Engine within a few seconds (depending on hardware), and possibly faster on moderately sized data sets. If such queries are expected to be very common, adequate hardware must be employed to ensure sufficient throughput. In most applications, phrase searches tend to be used far less frequently than normal searches. Also, most phrase searches contain at least one information-rich, low-frequency word, enabling results to be returned rapidly (that is, in less than a second).
You can use the --phrase_max <num> flag for the dgraph to specify the maximum number of words allowed in each phrase for text search. Using this flag can improve the performance of text searches that contain phrases. The default value is 10. If a phrase exceeds the maximum number of words, the phrase is truncated to the maximum word count and a warning is logged.
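The truncation behavior described above can be modeled with a short sketch. This only illustrates the documented rule (default of 10 words, truncate and log a warning when the limit is exceeded). It is not the dgraph's code, and the function name `truncate_phrase`, the logger name, the warning text, and the whitespace-based word splitting are assumptions.

```python
import logging

logger = logging.getLogger("phrase_max_sketch")  # hypothetical logger name

DEFAULT_PHRASE_MAX = 10  # default value stated in the text above

def truncate_phrase(phrase, phrase_max=DEFAULT_PHRASE_MAX):
    """Keep at most phrase_max words of a phrase, warning on truncation.

    Words are split on whitespace, which is an assumption of this sketch.
    """
    words = phrase.split()
    if len(words) > phrase_max:
        logger.warning(
            "Phrase has %d words; truncating to %d", len(words), phrase_max
        )
        words = words[:phrase_max]
    return " ".join(words)

# A six-word phrase is under the default limit and passes through unchanged.
print(truncate_phrase("to be or not to be"))
# With a limit of 3, only the first three words are kept.
print(truncate_phrase("to be or not to be", phrase_max=3))
```

A lower limit bounds how many words each phrase verification has to compare, at the cost of matching only the leading part of longer phrases.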