Preventing expensive wildcard searches

Certain types of wildcard queries may cause the MDEX Engine to grow in memory footprint and take a long time to complete. Even though these types of queries are legitimate searches that would eventually return, they can cause the appearance of a timeout and potentially cause a site outage. As a best practice, Endeca recommends preventing these types of wildcard queries in your front-end application code.

The behavior of such wildcard queries does not typically indicate an actual timeout of the MDEX Engine; instead, it may indicate, for example, that the search term is so broad that it takes a very long time to compute results. For example, to process a search for "a*", the MDEX Engine must return every record containing any word beginning with "a", which is a time-intensive query for the Dgraph to compute.

Some types of wildcard queries are potentially very expensive for the MDEX Engine to compute. To prevent users from issuing them, use front-end application code to intercept all queries that contain a wildcard character (*).
Note: If search queries contain only wildcards and punctuation, such as *.*, the MDEX Engine rejects them for performance reasons and returns no results.
Apply the following recommendations in front-end application code at query time:
  1. Remove all non-searchable characters from each wildcard query before issuing it to the MDEX Engine.

    Stripping non-searchable characters should make little difference in your search results because the MDEX Engine treats non-searchable characters as white space both when indexing and when retrieving word matches.

  2. Parse the queries to calculate their search term length and avoid very low-information queries, such as "a*". For example, you may want to prevent wildcard queries that contain fewer than 3 non-wildcard characters from being issued to the MDEX Engine.

    Filtering out such queries should make no practical difference in your search results because a wildcard search on two characters or fewer brings back an unusable result set in almost all instances.

  3. Exclude wildcard queries that also contain quoted phrase searches. This should not affect your search results because users who issue a quoted phrase search most likely expect exact matches and do not need wildcards in this case.

You can implement these recommendations in the front-end application tier by programmatically analyzing the search terms that users enter before issuing them to the MDEX Engine, determining whether each query should be issued, and, if not, prompting the user to submit a better query (or using logic of your choice to handle this situation).
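
As an illustration only, the following Python sketch shows one way such a pre-query check might look. It is not part of any Endeca API: the names check_search_terms, Decision, and MIN_NON_WILDCARD_CHARS are hypothetical. It assumes that only letters and digits are searchable (adjust the pattern if your application configures additional search characters), and it interprets recommendation 3 as declining to issue any wildcard query that contains a double quote.

```python
import re
from dataclasses import dataclass

WILDCARD = "*"
# Threshold taken from the example in recommendation 2; tune it for your data.
MIN_NON_WILDCARD_CHARS = 3

# Assumption: only letters and digits are searchable. Everything else except
# whitespace and the wildcard itself is treated as non-searchable.
_NON_SEARCHABLE = re.compile(r"[^\w\s*]|_")


@dataclass
class Decision:
    issue: bool       # True if the query should be sent to the MDEX Engine
    terms: str        # the (possibly cleaned) search terms
    reason: str = ""  # message to show the user when the query is not issued


def check_search_terms(raw: str) -> Decision:
    """Decide whether a user's search terms should be issued as entered."""
    # Queries without a wildcard are outside the scope of these checks.
    if WILDCARD not in raw:
        return Decision(True, raw)

    # Recommendation 3: do not combine wildcards with quoted phrases.
    if '"' in raw:
        return Decision(False, raw, "Remove either the quotes or the wildcard (*).")

    # Recommendation 1: replace non-searchable characters with spaces, which
    # mirrors how the text says the engine treats them, then collapse runs.
    cleaned = " ".join(_NON_SEARCHABLE.sub(" ", raw).split())

    # Recommendation 2: every wildcard term needs enough non-wildcard characters.
    for term in cleaned.split():
        if WILDCARD in term and len(term.replace(WILDCARD, "")) < MIN_NON_WILDCARD_CHARS:
            return Decision(
                False,
                cleaned,
                f"Enter at least {MIN_NON_WILDCARD_CHARS} characters before the wildcard.",
            )

    return Decision(True, cleaned)
```

In this sketch, check_search_terms("comp*") is issued unchanged, while check_search_terms("a*") is not issued and carries a message the application can show to the user. A query such as "wi-fi*" is first cleaned to "wi fi*" and then declined, because the wildcard term "fi*" has only two non-wildcard characters.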

Note: In the majority of cases, none of these changes should impact the user experience.