Word-break analysis allows the Spelling Correction feature to
consider alternate queries computed by changing the word divisions in the
user’s query.
For example, if the query is Back Street Boys, word-break analysis could
instruct the Oracle Endeca Server to consider the alternate query
Backstreet Boys.
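The Python sketch below illustrates the idea behind this example: build alternates by deleting exactly one space between adjacent words, and keep only the ones whose merged word is a known term. It is not the Dgraph's implementation. The function name `remove_one_break` and the sample vocabulary are hypothetical.

```python
def remove_one_break(query: str, vocabulary: set[str]) -> list[str]:
    """Return alternate queries formed by deleting exactly one break
    (space) between adjacent words, keeping only those whose merged
    word appears in the (hypothetical) vocabulary."""
    words = query.lower().split()
    alternates = []
    for i in range(len(words) - 1):
        merged = words[i] + words[i + 1]
        if merged in vocabulary:
            # Rebuild the query with the two words joined into one term.
            alternates.append(" ".join(words[:i] + [merged] + words[i + 2:]))
    return alternates


vocabulary = {"backstreet", "boys", "back", "street"}
print(remove_one_break("Back Street Boys", vocabulary))
# ['backstreet boys']
```

Only one break is removed per alternate, so "streetboys" is also tried but discarded because it is not in the vocabulary.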
The following statements describe how word-break analysis works in the
Dgraph process of the Oracle Endeca Server:
- It is enabled by default.
- As part of word-break analysis, the Dgraph process removes breaks
from the original term or adds breaks to it, as needed.
- The maximum number of word
breaks that the Dgraph adds to or removes from a query is one.
- The minimum length for a new term created by word-break analysis is
two characters; the Dgraph does not produce corrections containing terms
shorter than two characters. For example, it does not correct anear to
a near. It could, however, correct anear to an ear if the data corpus
contains actual terms that match both an and ear.
- When word-break analysis is applied to a query, the substrings into
which the term is broken must appear in the data in succession. For
example, starting with the query box17, word-break analysis would find
box 17, as well as box-17, assuming that the hyphen (-) has not been
specified as a search character. However, it would not find 17 old
boxes, because the target terms do not appear consecutively and in order.
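The following Python sketch illustrates the splitting side of these rules: insert exactly one break, require each resulting part to be at least two characters long, and accept a split only if its parts appear as consecutive tokens, in order, in some document. It is not the Dgraph's implementation and covers only adding a break, not removing one. The function names, the sample documents, and the tokenizer (which treats the hyphen as a separator, mirroring the case where the hyphen is not a search character) are all hypothetical.

```python
import re

MIN_TERM_LENGTH = 2  # minimum length of a term created by a split


def tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer: lowercase alphanumeric runs, so "box-17"
    # yields ["box", "17"] (hyphen is not a search character).
    return re.findall(r"[a-z0-9]+", text.lower())


def split_candidates(term: str) -> list[tuple[str, str]]:
    """All ways to add exactly one break so that both parts have at
    least MIN_TERM_LENGTH characters."""
    return [
        (term[:i], term[i:])
        for i in range(MIN_TERM_LENGTH, len(term) - MIN_TERM_LENGTH + 1)
    ]


def appears_in_succession(parts: tuple[str, ...], tokens: list[str]) -> bool:
    """True if `parts` occur as consecutive tokens, in the same order."""
    n = len(parts)
    return any(tuple(tokens[i:i + n]) == parts for i in range(len(tokens) - n + 1))


def word_break_matches(term: str, documents: list[str]) -> dict[str, list[int]]:
    """Map each valid split of `term` to the indices of the documents
    that contain its parts in succession."""
    tokenized = [tokenize(d) for d in documents]
    result: dict[str, list[int]] = {}
    for parts in split_candidates(term.lower()):
        hits = [i for i, toks in enumerate(tokenized) if appears_in_succession(parts, toks)]
        if hits:
            result[" ".join(parts)] = hits
    return result


docs = ["box 17 shipped", "label: box-17", "17 old boxes"]
print(word_break_matches("box17", docs))
# {'box 17': [0, 1]}
print(word_break_matches("anear", ["an ear of corn", "a near miss"]))
# {'an ear': [0]}
```

In this sketch, "a near" is never generated because its first part is shorter than two characters, and "17 old boxes" is not matched because box and 17 are not adjacent and in order there.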