To optimize the performance of wildcard search, follow these
recommendations.
- Account for the increased time needed for indexing. In general, enabling
wildcard search in the Oracle Endeca Server increases the time and disk space
required for indexing, even if users never issue wildcard queries. Therefore,
first review the business requirements for your Endeca application to decide
whether you need wildcard search.
Note: To optimize performance, the Dgraph process of the Oracle
Endeca Server performs wildcard indexing for words that are shorter than 1024
characters. Words that are longer than 1024 characters are not indexed for
wildcard search.
- In addition, if wildcard search is not enabled before you load the records,
and you later issue a Configuration Web Service request to enable it after a
large number of records already exist in the data domain, the request triggers
re-indexing, which increases processing time.
- Do not use "low information" queries. For optimal performance, use wildcard
search queries that contain at least 2-3 non-wildcarded characters, such as
abc* and ab*de. Avoid wildcard queries with only one non-wildcarded character,
such as a*, because queries with such extremely low information require a
significant amount of time to process.
Queries that contain only wildcards, or only wildcards and punctuation or
spaces, such as *. (star followed by period) or * * (star space star), are
rejected by the Oracle Endeca Server.
- Analyze the format of your typical wildcard queries. Doing so makes you
aware of the performance implications associated with one specific wildcard
search pattern.
Do you have queries that contain punctuation between strings of text, such as
ab*c.def*?
For strings with punctuation, the Dgraph process generates lists of words that
match each of the punctuation-separated wildcard expressions. Only in this case
does the Dgraph use the --wildcard_max <count> setting to optimize its
performance.
Increasing the --wildcard_max <count> value improves the completeness of
results returned by wildcard search for strings with punctuation, but
negatively affects performance. You may therefore want to find a value that
provides a reasonable trade-off.
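
The query guidelines above can be approximated with a client-side pre-check that runs before a query is sent to the Oracle Endeca Server. The following Python sketch is illustrative only and is not part of the product: the function name classify_wildcard_query, the category labels, and the MIN_INFO_CHARS threshold are hypothetical. It assumes that only * acts as a wildcard, that only letters and digits count as non-wildcarded characters, and that any other non-space character counts as punctuation. The server's own tokenization rules may differ.

```python
import re

# Assumption: 2 is the lower bound of the "2-3 non-wildcarded characters" guideline.
MIN_INFO_CHARS = 2


def classify_wildcard_query(query: str) -> str:
    """Classify a query as not_wildcard, rejected, low_information, punctuated, or ok."""
    if "*" not in query:
        return "not_wildcard"
    # Assumption: only letters and digits count as information-carrying characters.
    info_chars = sum(1 for ch in query if ch.isalnum())
    if info_chars == 0:
        # Only wildcards, punctuation, or spaces: the server rejects such queries.
        return "rejected"
    if info_chars < MIN_INFO_CHARS:
        # Example: a* (expensive to process).
        return "low_information"
    # Any character that is not a word character, whitespace, or * is treated
    # as punctuation; this is the case in which --wildcard_max applies.
    if re.search(r"[^\w\s*]", query):
        return "punctuated"
    return "ok"


if __name__ == "__main__":
    for q in ["abc*", "ab*de", "a*", "*.", "* *", "ab*c.def*"]:
        print(f"{q!r}: {classify_wildcard_query(q)}")
```

An application might use such a check to warn users about low-information queries, or to log how often punctuated wildcard queries occur when choosing a --wildcard_max value.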