This section discusses tuning the spelling auto-correction and spelling Did You Mean features.
Spelling auto-correction performance depends on the size of the dictionary in use. In systems with very large dictionaries, spell-corrected keyword searches that contain many words can take a disproportionately long time to process relative to other Dgraph requests.
Before deploying an application to production, carefully analyze the performance of the system against the application's requirements.
You can use the admin?op=updateaspell administrative query to make changes to the Aspell spelling dictionary without having to stop and restart the MDEX Engine. This administrative query causes the MDEX Engine to temporarily stop processing other regular queries, update the spelling dictionary, and then resume its regular processing.
The latency of the admin?op=updateaspell operation grows with the total amount of searchable text, so at large data scale the pause in regular query processing can be long.
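Because the update is an ordinary administrative URL request, it can be scripted. The following is a minimal Python sketch, assuming the Dgraph listens on a host and port that you supply (the values used below are hypothetical) and that the admin query is issued as an HTTP GET to /admin?op=updateaspell on that port. The helper names updateaspell_url and update_aspell are illustrative only.

```python
from urllib.parse import urlunsplit
from urllib.request import urlopen


def updateaspell_url(host: str, port: int) -> str:
    """Build the admin?op=updateaspell URL for a Dgraph at host:port."""
    return urlunsplit(("http", f"{host}:{port}", "/admin", "op=updateaspell", ""))


def update_aspell(host: str, port: int, timeout: float = 600.0) -> int:
    """Send the admin request and return the HTTP status code.

    The MDEX Engine stops processing regular queries while it rebuilds the
    dictionary, and the rebuild takes longer with more searchable text, so a
    generous timeout is used here (600 seconds is an arbitrary choice).
    """
    with urlopen(updateaspell_url(host, port), timeout=timeout) as resp:
        return resp.status
```

For example, updateaspell_url("localhost", 15000) yields http://localhost:15000/admin?op=updateaspell. Because regular queries are paused during the update, schedule such a call for a low-traffic window.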
The performance of spelling correction in the Dgraph depends heavily on the size of the dictionary. An unnecessarily large dictionary can slow response times and provide less focused results.
Dictionary pruning techniques allow you to reduce the size of the dictionary without sacrificing much in the way of usefulness. To improve spelling correction performance, consider making the following adjustments in Developer Studio’s Spelling editor:
Set the minimum number of word occurrences to a number greater than one.
The first setting in the Spelling editor indicates the number of times a word must occur in the source data in order for it to be included in the dictionary. For record search, the default value is four, which means only words that appear four or more times are included in the dictionary.
Set the minimum word length to a number greater than one.
The second setting in the Spelling editor specifies the minimum length (number of characters) of a word for inclusion in the dictionary. By default, words that are longer than three characters and shorter than sixteen characters are included.
While less dramatic than tuning the minimum word occurrences, adjusting the minimum word length can result in a cleaner, more useful dictionary.
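To see how the two pruning settings interact, the following Python sketch builds a toy dictionary from word counts. It is not how the Dgraph or Developer Studio builds the Aspell dictionary; it only models the thresholds described above. The tokenizer (lowercase alphanumeric runs) is a simplifying assumption, and the defaults mirror the record search values in the text: at least four occurrences, and a length greater than three and less than sixteen characters (that is, 4 to 15 inclusive).

```python
import re
from collections import Counter

# Simplified tokenizer: runs of lowercase letters and digits.
WORD_RE = re.compile(r"[a-z0-9]+")


def build_dictionary(texts, min_occurrences=4, min_length=4, max_length=15):
    """Return the set of words that pass the occurrence and length filters.

    min_length and max_length are inclusive bounds on word length.
    """
    counts = Counter()
    for text in texts:
        counts.update(WORD_RE.findall(text.lower()))
    return {
        word
        for word, n in counts.items()
        if n >= min_occurrences and min_length <= len(word) <= max_length
    }
```

With the texts ["red shirt"] * 4 + ["blue tie"], the default settings keep only "shirt": "red" occurs often enough but is too short, and "blue" and "tie" occur only once. Relaxing both thresholds to one admits all four words, which shows how quickly an unpruned dictionary grows.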
Word-break analysis allows you to consider alternate queries computed by changing the word divisions in the user's query. The performance impact of word-break analysis can be considerable, depending on your data. Seemingly small deviations from the default values, such as increasing the value of --wb_maxbrks from one to two or decreasing the value of --wb_minbrklen from two to one, can have a significant impact, because they greatly increase the workload on the MDEX Engine. Oracle suggests that you tune this feature carefully and test its impact thoroughly before exposing it in a production environment.
Lowering the value of --dym_hthresh (a Dgraph spelling option) may improve the performance of Did You Mean.
The --dym_hthresh option indicates when spelling Did You Mean engages. The default is 20, meaning that spelling Did You Mean engages whenever a query returns 20 or fewer results.
Depending on your data, making Did You Mean suggestions at this point may be unnecessary or even overwhelming to your end users. Setting --dym_hthresh to 2 or 4 is often a better choice.
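The effect of the threshold on how often Did You Mean runs can be estimated from a log of per-query result counts. The following Python sketch models only the rule stated above (Did You Mean engages when a query returns at most --dym_hthresh results); the function names and the sample counts are hypothetical, and the real engine applies additional logic not shown here.

```python
def dym_engages(num_results: int, hthresh: int = 20) -> bool:
    """True if spelling Did You Mean would engage for this result count."""
    return num_results <= hthresh


def dym_rate(result_counts, hthresh):
    """Fraction of queries for which Did You Mean would engage."""
    if not result_counts:
        return 0.0
    return sum(dym_engages(n, hthresh) for n in result_counts) / len(result_counts)


if __name__ == "__main__":
    # Hypothetical per-query result counts, for illustration only.
    sample = [0, 1, 3, 7, 15, 20, 45, 120]
    for threshold in (20, 4, 2):
        print(threshold, dym_rate(sample, threshold))
```

With these sample counts, Did You Mean engages for 75% of queries at the default threshold of 20, but for only 37.5% at 4 and 25% at 2, which is where the reduction in spelling work comes from.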

