Oracle Commerce Guided Search - Configuring Search

Configuring Search

Refine user search by configuring thesaurus, stop words, search characters, stemming, and spelling dictionary features.

About search configuration

Developer Studio makes it possible for you to refine your users' search experience by making adjustments to the following features:

Thesaurus
You can establish a one-way or a two-way equivalence between words to enrich your users' query results. You access the Thesaurus editor from the Project tab.
Stop words
You can add stop words. Stop words are ignored by the Endeca search query engine. You access the Stop Words editor from the Project tab.
Search characters
You can add search characters to the search list. This allows non-alphanumeric characters to be indexed along with alphanumeric characters, rather than being treated as whitespace. You access the Search Characters editor from the Edit > Search Characters menu.
Stemming
You can enable or disable stemming for a variety of languages. Stemming defines sets of words (for example, "shirt" and "shirts") that should be considered strictly equivalent for all search operations. You access the Stemming editor using the Edit > Stemming command.
Spelling dictionary
You can tune the size of your application's spelling dictionary by instructing Dgidx to exclude small words, large words, and infrequently used words. You access the Spelling Dictionary editor using the Edit > Spelling dictionary command.

Using the Thesaurus

Option	Description
From	The word or phrase that, when searched on, will also return hits on the words or phrases in the To list.
To	The additional words or phrases whose results will also be returned when the user's query returns hits on the From entry.
Add	Adds the word or phrase entered in the field above to the To list.
Modify	Used to modify a word or phrase in the To list.
Remove	Removes the selected word or phrase from the To list.

Two-way Thesaurus Entry editor

Define a list of words or phrases that will return results for each other when searched.

The Two-way Thesaurus Entry editor contains the following fields:

Option	Description
To	The list of words or phrases that will have a mutual equivalence relationship. Searches on any of the words in the list will return hits for the other words in the list as well.
Add	Adds the word or phrase entered in the field above to the To list.
Modify	Used to modify a word or phrase in the To list. To modify an entry, select it in the To list, make your changes in the editing field, and click Modify.
Remove	Removes the selected word or phrase from the To list.
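
The difference between one-way and two-way entries can be sketched in a few lines of Python. This is only an illustration of the equivalence rules described above, not how the MDEX Engine stores or applies the thesaurus; the function names and sample entries are hypothetical.

```python
from collections import defaultdict

def build_thesaurus(one_way, two_way):
    """Map each phrase to the set of phrases it expands to."""
    expansions = defaultdict(set)
    # One-way entry: only the From phrase expands to the To phrases.
    for source, targets in one_way:
        expansions[source.lower()].update(t.lower() for t in targets)
    # Two-way entry: every phrase in the list expands to all the others.
    for group in two_way:
        lowered = [g.lower() for g in group]
        for term in lowered:
            expansions[term].update(t for t in lowered if t != term)
    return expansions

def expand(query, expansions):
    """Return the query followed by its expansions (sorted for stable output)."""
    q = query.lower()
    return [q] + sorted(expansions.get(q, ()))

# Hypothetical sample entries.
thesaurus = build_thesaurus(
    one_way=[("phone", ["telephone", "mobile phone"])],
    two_way=[["couch", "sofa", "settee"]],
)
```

With these sample entries, expanding "phone" adds both To phrases, expanding "telephone" adds nothing (the entry is one-way), and expanding any of "couch", "sofa", or "settee" adds the other two.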

Using Stemming

About stemming

The stemming feature broadens search results to include word roots and word derivations.

Stemming is intended to allow words with a common root form (such as the singular and plural forms of nouns) to be considered interchangeable in search operations. For example, search results for the word "shirt" will include the derivation "shirts," while a search for "shirts" will also include its word root "shirt."

Stemming equivalences are strictly two-way (that is, all-to-all). For example, a search for the singular form of a noun (such as "child") will also return matches for the plural form "children." Likewise, a search for "children" will return matches for "child" and "children." Stemmed words, therefore, are considered equivalent and interchangeable for all search operations.

In contrast, the thesaurus feature supports one-way mappings in addition to two-way mappings.
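
As a rough illustration of all-to-all equivalence, the Python sketch below treats each stemming group as a set in which every member maps to the whole set. The group contents and function names are assumptions made for the example; the stemming files supplied with the product are not modeled here.

```python
def build_stem_index(groups):
    """Index every word form by the full set of forms it is equivalent to."""
    index = {}
    for group in groups:
        members = frozenset(word.lower() for word in group)
        for word in members:
            index[word] = members
    return index

def stem_equivalents(term, index):
    """Return all forms considered interchangeable with the term."""
    t = term.lower()
    return sorted(index.get(t, {t}))

# Hypothetical stemming groups.
stems = build_stem_index([["shirt", "shirts"], ["child", "children"]])
```

Looking up either "child" or "children" returns the same two forms, which is what distinguishes stemming from a one-way thesaurus entry.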

Note

Stemming files are provided by Endeca for various languages. While you can enable the use of these files, you cannot modify their contents.

Enabling stemming

Open the Stemming editor from the Edit menu to enable stemming.

To enable stemming for one or more languages in your project:

On the Edit menu, choose Stemming.
The Stemming editor appears.
Check one or more of the language check boxes on the list.
Click OK.

Note

The Stemming editor allows you to turn the default version of Dutch and German stemming (with the word forms file) on and off. If you want to implement dynamic stemming for Dutch or German, you must edit the stemming.xml file directly, as described in the Endeca Advanced Development Guide. Subsequent use of the Stemming editor will not overwrite manual changes to the stemming.xml file.

To disable stemming, use the above procedure, but uncheck the languages for which you do not want stemming.

Stemming editor

The Stemming editor allows you to enable or disable stemming for your supported languages.

The "Enable stemming for" list contains an entry for every supported language. Stemming is enabled for each language whose check box is checked.

Using Search Characters

About search characters

You can specify punctuation marks as searchable, in addition to digits and upper- and lower-case letters (automatically set as valid search characters).

Upper- and lower-case letters and the digits 0 to 9 are automatically included as valid search characters in your Endeca-enabled application. However, in the case of other characters, such as certain punctuation characters, you can specify whether the character should be indexed along with alphanumeric characters in a token or instead treated as whitespace.

Search characters are configured globally for all search operations.
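
The effect of adding a search character can be approximated with a small tokenizer. The Python sketch below is a simplification that assumes ASCII text and lowercases tokens; the real tokenization rules of the MDEX Engine are more involved, and the function name is hypothetical.

```python
def tokenize(text, extra_search_chars=""):
    """Split text into tokens, treating non-search characters as whitespace."""
    allowed = set(extra_search_chars)
    tokens, current = [], []
    for ch in text:
        # Letters and digits are always searchable; other characters are
        # searchable only if they were added as search characters.
        if (ch.isascii() and ch.isalnum()) or ch in allowed:
            current.append(ch.lower())
        elif current:
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens
```

For example, tokenizing the text C++ and C# with no additional characters yields the tokens c, and, c. With + and # added as search characters, the same text yields c++, and, c#.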

Note

Implementing search features requires additional work outside of Developer Studio. Please refer to the Endeca Basic Development Guide for details.

Adding search characters

Add search characters from the Search Characters editor, under the Edit menu.

To add search characters:

On the Edit menu, choose Search Characters.
The Search Characters editor appears.
Click Modify.
The Additional Search Characters dialog box appears.
Type the character(s) you want to add.
If you have more than one, do not separate them with commas or spaces; simply type them one after another.
Click OK.
Click Close.

Deleting search characters

Delete search characters from the Search Characters editor, under the Edit menu.

To remove an additional character from the list of searchable characters:

On the Edit menu, choose Search Characters.
The Search Characters editor appears.
Click Modify.
The Additional Search Characters dialog box appears.
Highlight the character(s) you want to remove.
Press Delete.
Click OK.
The confirmation message appears.
Click Yes.
Click Close.

Search Characters editor

The Search Characters editor lists the standard special search characters, as well as any others you have specified.

Using Stop Words

About stop words

Stop words are words that are set to be ignored by the Endeca MDEX Engine.

Typically, common words like "the" are included in the stop word list. In addition, you might want to add terms that are prevalent in your data set. For example, if your data consists of lists of books, you might want to add the word "book" itself to the stop word list, since a search on that word would return an impracticably large set of records.

Note

Words added to the stop word list are not expanded by other Endeca Developer Studio features like stemming and thesaurus. That means that if you set the word "item" as a stop word, its plural form "items" will not be marked automatically as a stop word. If you want both forms to be on the stop word list, you must add them individually.
Stop words are counted in any search mode (such as MatchPartial) that calculates results based on number of matching terms. However, the Endeca MDEX Engine reduces the minimum term match and maximum word omit requirement by the number of stop words contained in the query.
Stop words must be single words only, and cannot contain any non-searchable characters. If more than one word is entered as a stop word, neither the individual words nor the combined phrase will act as a stop word. Non-searchable characters within a stop word will also cause this behavior. Entering "full-bodied" as a stop word acts just as if you had entered "full bodied", and does not have any effect on searches.
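
The notes above can be illustrated with a short Python sketch. It assumes that only letters and digits are searchable (no additional search characters have been configured) and uses hypothetical function names; it is not a description of the engine's internals.

```python
import re

def effective_stop_words(entries):
    """Keep only entries that can act as stop words in this sketch:
    a single word made up entirely of searchable characters."""
    return {e.lower() for e in entries if re.fullmatch(r"[A-Za-z0-9]+", e)}

def strip_stop_words(query_terms, stop_words):
    """Drop query terms that exactly match a stop word (no stemming applied)."""
    return [t for t in query_terms if t.lower() not in stop_words]

# "full-bodied" and "red wine" have no effect, as described in the note.
stop = effective_stop_words(["the", "item", "full-bodied", "red wine"])
remaining = strip_stop_words(["the", "items", "item"], stop)
```

Here only "the" and "item" act as stop words, and the plural "items" survives in the query because stop words are not expanded by stemming.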

Creating stop words

Set stop words from the Stop Words view, under Search Configuration in the Project Explorer.

To add a word to the stop list:

In the Project Explorer, expand Search Configuration and double-click Stop Words.
The Stop Words view appears.
In the Stop Words view, click New.
The Stop Word editor appears.
Type the word that you want to add to the stop word list, and then click OK.

Modifying stop words

Modify stop words from the Stop Words view.

To edit a stop word:

In the Stop Words view, select the stop word that you want to modify, and then click Edit.
The Stop Word editor appears.
Make the necessary changes to the stop word, and then click OK.

Deleting stop words

Remove stop words from the Stop Words view.

To remove a word from the stop word list:

In the Stop Words view, select the stop word you want to remove and click Delete.
When the confirmation message appears, click Yes.

Sorting stop words

Sort stop words in ascending or descending alphabetical order in the Stop Words view.

The entries in Stop Words view are sorted by name to make them easier to work with. You can choose whether the sort is ascending or descending.

To sort the stop words in your list:

In the Stop Words view, click either the Sort Ascending or Sort Descending button.

Using Automatic Phrasing

About automatic phrasing

When an application user provides individual search terms in a query, the automatic phrasing feature groups those individual terms into a search phrase and returns query results for the phrase.

Automatic phrasing is similar to placing quotation marks around search terms before submitting them in a query. For example, 'my search terms' is the phrased version of the query my search terms. However, automatic phrasing removes the need for application users to place quotation marks around search phrases to get phrased results.

The result of automatic phrasing is that a Web application can process a more restricted query and therefore return fewer and more focused search results. This feature is available only for record search.

The automatic phrasing feature works by:

Comparing individual search terms in a query to a list of application-specific search phrases. The list of search phrases is stored in a project's phrase dictionary.
Grouping the search terms into search phrases.
Returning query results based either on the automatically phrased query, or on the original unphrased query along with automatically phrased 'Did You Mean?' (DYM) alternatives.
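
The first two steps above can be sketched in Python as follows. The sketch assumes a greedy, longest-match-first strategy and case-insensitive comparison, neither of which is stated in this guide; the function name and sample phrases are hypothetical.

```python
def auto_phrase(terms, phrase_dictionary):
    """Group consecutive terms that exactly match a dictionary phrase."""
    phrases = {tuple(p.lower().split()) for p in phrase_dictionary}
    longest = max((len(p) for p in phrases), default=0)
    result, i = [], 0
    while i < len(terms):
        # Try the longest possible phrase starting at position i first.
        for size in range(min(longest, len(terms) - i), 0, -1):
            candidate = tuple(t.lower() for t in terms[i:i + size])
            if candidate in phrases:
                result.append('"' + " ".join(candidate) + '"')
                i += size
                break
        else:
            # No phrase starts here, so keep the term unphrased.
            result.append(terms[i])
            i += 1
    return result
```

With a dictionary containing "low tannin", the terms low tannin are grouped into one phrase. Terms in a different order, or with an extra term in between, are left unphrased.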

Point three above suggests the two typical implementation scenarios to choose from when using automatic phrasing:

Process an automatically phrased form of the query and suggest the original unphrased query as a DYM alternative.
In this scenario, the automatic phrasing feature rewrites the original query's search terms into a phrased query before processing it. If you are also using DYM, you can display the unphrased alternative so the user can opt out of automatic phrasing and select their original query, if desired.
For example, an application user searches a wine catalog for the terms low tannin. The MDEX Engine compares the search terms against the phrase dictionary, finds a phrase entry for "low tannin," and processes the phrased query "low tannin." The MDEX Engine returns 3 records for the phrased query "low tannin" rather than 16 records for the user's original unphrased query low tannin. However, the Web application also presents a "Did you mean low tannin?" selection so the user can opt out of automatic phrasing, if desired.
Process the original query and suggest an automatically-phrased form of the query as a DYM alternative.
In this scenario, the automatic phrasing feature processes the unphrased query as entered and determines if a phrased form of the query exists. If a phrased form is available, the Web application displays an automatically-phrased alternative as a "Did you mean?" option. The user can opt in to automatic phrasing, if desired.
For example, an application user searches a wine catalog for low tannin. The MDEX Engine returns 16 records for the user's unphrased query low tannin. The Web application also presents a Did you mean "low tannin"? option so the user can opt in to automatic phrasing, if desired.
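
The choice between the two scenarios comes down to which form of the query is processed and which form is offered as the DYM alternative. The Python sketch below captures only that decision; it does not use the Presentation API, and the function and parameter names are hypothetical.

```python
def plan_query(original, phrased, rewrite_with_phrase):
    """Return (query_to_process, dym_suggestion) for one of the two scenarios."""
    if phrased == original:
        # Nothing was phrased, so there is no alternative to suggest.
        return original, None
    if rewrite_with_phrase:
        # Scenario 1: process the phrased query; the user can opt out.
        return phrased, original
    # Scenario 2: process the original query; the user can opt in.
    return original, phrased
```

For the wine catalog example, scenario 1 processes "low tannin" and suggests low tannin, while scenario 2 does the reverse.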

There are two tasks to implement automatic phrasing:

Add phrases to your project using Developer Studio.
Add Presentation API code to support either of the two implementation scenarios described above.

Note

Implementing search features requires additional work outside of Developer Studio. Refer to the Endeca Advanced Development Guide for details.

Automatic phrasing and query expansion

Grouping of terms as a phrase exempts the phrase from thesaurus expansion and stemming.

Once individual search terms in a query are grouped as a phrase, the phrase is not subject to thesaurus expansion or stemming by the MDEX Engine.

Automatic phrasing, spelling correction, and DYM

Describes the processing order of spelling correction and the DYM function with regard to automatic phrasing.

If you are using automatic phrasing, you should enable the MDEX Engine for both spelling correction and "Did you mean?" If you want spelling-corrected automatic phrases, spelling correction ensures search terms are corrected before the terms are automatically phrased. DYM gives users the choice to opt in to or opt out of automatic phrasing.

The MDEX Engine applies spelling correction to a query before automatically phrasing the terms. This processing order means, for example, if a user misspells the query Napa Valle, the MDEX Engine first spell corrects to Napa Valley and then automatically phrases to "Napa Valley." Without spelling correction enabled, automatic phrasing would typically not find a matching phrase in the phrase dictionary.
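
The processing order can be illustrated with a Python sketch that corrects each term first and only then looks for a phrase. The correction step here uses difflib from the Python standard library as a stand-in; it is not the MDEX Engine's spelling correction, and the vocabulary, cutoff value, and function names are assumptions.

```python
import difflib

def correct_terms(terms, vocabulary):
    """Replace each unknown term with its closest vocabulary word, if any."""
    corrected = []
    for term in terms:
        t = term.lower()
        if t in vocabulary:
            corrected.append(t)
            continue
        match = difflib.get_close_matches(t, vocabulary, n=1, cutoff=0.8)
        corrected.append(match[0] if match else t)
    return corrected

def phrase_if_known(terms, phrase_dictionary):
    """Quote the terms as one phrase only if the exact phrase is known."""
    joined = " ".join(terms)
    return f'"{joined}"' if joined in phrase_dictionary else joined

vocabulary = ["napa", "valley", "tannin"]   # hypothetical indexed words
phrase_dictionary = {"napa valley"}         # hypothetical phrase dictionary

corrected = correct_terms(["Napa", "Valle"], vocabulary)
phrased = phrase_if_known(corrected, phrase_dictionary)
uncorrected = phrase_if_known(["napa", "valle"], phrase_dictionary)
```

With correction applied first, the misspelled query is phrased as "napa valley". Without it, napa valle finds no matching phrase and stays unphrased.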

If you implement automatic phrasing to rewrite the query using an automatic phrase, then enabling DYM gives users a way to opt out of automatic phrasing. On the other hand, if you implement automatic phrasing to process the original query and suggest automatically-phrased alternatives, then enabling DYM allows users to take advantage of automatically phrased alternatives as follow-up queries.

Note

For details about configuring spelling correction and DYM, see the Endeca Advanced Development Guide.

Adding Phrases to a Project

Methods for adding phrases

Import phrases from an XML file, or extract phrases from dimension names.

There are two ways to include phrases in your Developer Studio project:

Import phrases from an XML file.
Extract phrases from dimension names.

After you add phrases and update your instance configuration, the MDEX Engine builds the phrase dictionary. You cannot view the phrases in Developer Studio. However, after adding phrases and saving your project, you can examine the phrases contained in a project's phrase dictionary by opening the project file named phrases.xml in a text editor. Directly modifying phrases.xml is not supported.

Importing phrases from an XML file

You import an XML file of phrases using the Import Phrases dialog box in Developer Studio. The Import Phrases dialog box can be accessed from either the File menu or from the Automatic Phrasing dialog box.

Before you import the XML file, make sure that it conforms to phrase_import.dtd, in the Endeca Navigation Platform conf/dtd directory.

Here is a simple example of a phrase file that conforms to phrase_import.dtd:

<?xml version='1.0' encoding='UTF-8' standalone='no' ?>
<!DOCTYPE PHRASE_IMPORT SYSTEM 'phrase_import.dtd'>
<PHRASE_IMPORT>
  <PHRASE>Napa Valley</PHRASE>
  <PHRASE>low tannin</PHRASE>
</PHRASE_IMPORT>
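
If your phrases come from another system, you may prefer to generate the import file rather than write it by hand. The Python sketch below produces a file with the same structure as the example above. It does not validate the output against phrase_import.dtd, and the function name and sample phrases are hypothetical.

```python
from xml.sax.saxutils import escape

def build_phrase_import(phrases):
    """Return the text of a phrase import file for the given phrases."""
    lines = [
        "<?xml version='1.0' encoding='UTF-8' standalone='no' ?>",
        "<!DOCTYPE PHRASE_IMPORT SYSTEM 'phrase_import.dtd'>",
        "<PHRASE_IMPORT>",
    ]
    # escape() converts &, <, and > so that each phrase is valid XML text.
    lines += [f"  <PHRASE>{escape(p)}</PHRASE>" for p in phrases]
    lines.append("</PHRASE_IMPORT>")
    return "\n".join(lines) + "\n"

doc = build_phrase_import(["Napa Valley", "low tannin", "Anderson & Brothers"])
```

Note that the ampersand in Anderson & Brothers is written as &amp;amp; in the generated file. You would then save the text with UTF-8 encoding and import it as described in the following procedure.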

To import phrases from an XML file:

In the Project Explorer, expand Search Configuration.
Double-click Automatic Phrasing.
The Automatic Phrasing dialog box displays.
Click Import Phrases.
The Import Phrases dialog box displays.
Note
Alternatively, you can select Import Phrases from the File menu to invoke the Import Phrases dialog box.
Either type the path to your phrases file or click the Browse button to locate the file.
Click OK on the Import Phrases dialog box.
Click OK on the Automatic Phrasing dialog box.
The Messages pane displays the number of phrases read in from the XML file.
Select Save from the File menu.

Extracting phrases from dimension names

In addition to importing an XML file of phrases, you can add phrases to your project based on the dimension values of any dimension you choose.

The MDEX Engine adds each multi-term dimension value in a selected dimension to the phrase dictionary. Single-term dimension values are not included. For example, if you import a Winery dimension from a wine catalog, the MDEX Engine creates a phrase entry for multi-term names such as Agostina Pieri but not for single-term names such as Alessi.

In addition, the MDEX Engine adds to the phrase dictionary each multi-term synonym that has "Search" checked on the Synonyms dialog box. In this release, dimension value phrases that have been modified by a partial update pipeline are not reflected in the phrase dictionary.
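
The selection rule described above can be sketched in Python. The data layout (a name plus a list of synonyms with a search flag) and the sample synonyms are assumptions made for the example, not the format Developer Studio uses to store dimensions.

```python
def extract_phrases(dimension_values):
    """Collect multi-term names and multi-term searchable synonyms."""
    phrases = []
    for value in dimension_values:
        candidates = [value["name"]]
        # Only synonyms marked as searchable are considered.
        candidates += [s["text"] for s in value.get("synonyms", []) if s["search"]]
        # Single-term entries are skipped.
        phrases.extend(c for c in candidates if len(c.split()) > 1)
    return phrases

# Hypothetical Winery dimension values.
winery = [
    {"name": "Agostina Pieri",
     "synonyms": [{"text": "A. Pieri", "search": True},
                  {"text": "Pieri Estate", "search": False}]},
    {"name": "Alessi"},
]
winery_phrases = extract_phrases(winery)
```

With this sample data, Agostina Pieri and the searchable synonym A. Pieri become phrases, while Alessi (a single term) and Pieri Estate (not searchable) do not.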

To extract phrases from dimension names:

In the Project Explorer, expand Search Configuration.
Double-click Automatic Phrasing.
The Automatic Phrasing dialog box displays.
From the All dimensions list, select a dimension that you want to extract phrases from and click Add. Repeat this step as necessary for additional dimensions.
Click OK.
Select Save from the File menu.

Adding search characters that support automatic phrasing

Inclusion of original punctuation marks in search query phrases returns more relevant results.

To add search characters that support automatic phrasing:

If you have phrases that include punctuation, add those punctuation marks as search characters. Adding the punctuation marks ensures that the MDEX Engine includes the punctuation when tokenizing the query, and therefore the MDEX Engine can match search terms with punctuation to phrases with punctuation.

For example, suppose you add phrases based on a Winery dimension, and consequently the Winery name Anderson & Brothers exists in your phrase dictionary. You should create a search character for the ampersand (&).

Note

For details on search characters, see About search characters.

Tips and troubleshooting for automatic phrasing

Depending on how a phrased query is processed it may create dead-end results, for reasons including significance of term order and the fact that the MDEX Engine does not extend user phrases to match those in the phrase dictionary.

The following table provides tips and troubleshooting guidance about using the automatic phrasing feature.

Tip	Description
Examining how a phrased query was processed
Single word phrases	You can include a single word in your phrases_import.xml file and treat the word as a phrase in your project. This may be useful if you do not want stemming or thesaurus expansion applied to single word query terms. You cannot include single word phrases by extracting them from dimension values using the Phrases dialog box. They have to be imported from your phrases_import.xml file.
Extending user phrases	The MDEX Engine does not extend phrases a user provides to match a phrase in the phrase dictionary. For example, if a user provides the query A "BC" D and "BCD" is in the phrase dictionary, the MDEX Engine does not extend the user's original phrasing of "BC" to "BCD."
Term order is significant in phrases	Phrases are matched only if search terms are provided in the same exact order and with the same exact terms as the phrase in the phrase dictionary. For example, if "weekend bag" is in the phrase dictionary, the MDEX Engine does not automatically phrase the search terms "weekend getaway bag" or "bag, weekend" to match "weekend bag."
Possible dead ends	If an application automatically phrases search terms, it is possible a query may not produce results when it seemingly should have. Specifically, one way in which a dead-end query can occur is when a search phrase is displayed as a DYM link with results and navigation state filtering excludes the results. For example, suppose a car sales application is set up to process a user's original query and display any automatic phrase alternatives as DYM options. Further suppose a user navigates to Cars > Less than $15,000 and then provides the search terms luxury package. The search terms match the phrase 'luxury package' in the phrase dictionary. The user receives query results for Cars > Less than $15,000 that matched some occurrences of the terms luxury and package. However, if the user clicks the DYM link Did you mean "luxury package"? then no results are available because the navigation state Cars > Less than $15,000 excludes them. Note: See the Endeca Advanced Development Guide for details about how processing order affects queries.
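
The dead-end case in the last row can be reproduced with a toy data set. The Python sketch below uses two made-up records and very simple matching rules (all terms present for the unphrased query, exact adjacent words for the phrased query); it only illustrates how the navigation state and the phrase interact.

```python
# Hypothetical car records.
records = [
    {"id": 1, "price": 12000, "text": "compact car with sport package and luxury trim"},
    {"id": 2, "price": 32000, "text": "sedan with luxury package"},
]

def matches_all_terms(record, terms):
    """Unphrased query: every term appears somewhere in the text."""
    words = record["text"].split()
    return all(t in words for t in terms)

def matches_phrase(record, phrase):
    """Phrased query: the words appear together, in order."""
    return f" {phrase} " in f" {record['text']} "

def search(matcher, query, max_price):
    """Apply the navigation state (a price limit) and then the matcher."""
    return [r["id"] for r in records
            if r["price"] < max_price and matcher(r, query)]

unphrased = search(matches_all_terms, ["luxury", "package"], 15000)
phrased = search(matches_phrase, "luxury package", 15000)
phrased_anywhere = search(matches_phrase, "luxury package", float("inf"))
```

Within the Less than $15,000 navigation state, the unphrased query finds record 1 but the phrased query finds nothing, even though the phrase matches record 2 outside that navigation state.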

If you want to fine-tune the size of the spelling dictionary, and consequently tune the performance of spelling-corrected queries, you can specify constraints to control which words Dgidx adds to the spelling dictionary. You can configure the dictionary entries separately for dimension search and record search.

To constrain spelling dictionary entries:

In the Project Explorer, expand Search Configuration.
Double-click Spelling.
Select the Dimension Search tab.
In the It Occurs at Least ... Times field, provide a number that indicates the minimum number of times the word must appear in your source data before the word should be included in the spelling dictionary.
In the And Is Between ... and ... Characters Long fields, provide values that represent the minimum and maximum length of a word that should be included in the spelling dictionary.
Select the Record Search tab.
Repeat steps 4 and 5.
Click OK.
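
The effect of these constraints can be sketched in Python. This is an illustration of the filtering rule only, with hypothetical names and sample words; it does not reflect how Dgidx builds the dictionary.

```python
from collections import Counter

def spelling_dictionary(words, min_occurrences, min_length, max_length):
    """Keep words that occur often enough and fall within the length limits."""
    counts = Counter(w.lower() for w in words)
    return sorted(
        word for word, n in counts.items()
        if n >= min_occurrences and min_length <= len(word) <= max_length
    )

# Hypothetical words taken from source data.
source_words = ["merlot", "Merlot", "merlot", "a", "a", "a", "zinfandel"]
kept = spelling_dictionary(source_words, min_occurrences=2, min_length=3, max_length=16)
```

With a minimum of 2 occurrences and a length between 3 and 16 characters, only merlot is kept: the word a is too short and zinfandel occurs only once.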

Developer Studio Help