The key to producing optimal XHTML documents is specifying the right set of properties – omitting properties that you don’t need, and including the ones that you do. The smaller your XHTML documents are, the smaller your index will be; a smaller index uses less memory and can be searched more quickly than a larger one. Of course, an index that’s missing important data is not very useful, so you need to make tradeoffs based on the needs of your environment. This section provides guidelines to help you determine which properties to include as text properties and which ones to include as metadata.
In addition to omitting unnecessary properties, you can reduce the size of your XHTML documents by applying various property value filters to the repository data. For information about these filters and how to use them, see the Using Property Value Filters section of the Customizing the XHTML Output chapter. Also, see the IndexingOutputConfig Analysis section for tools to help pinpoint unnecessary properties in your index.
Guidelines for Text Properties
Some guidelines for determining which properties to include as text properties, and which ones to omit:
Include properties whose values contain text that users are likely to search for. For product catalogs, typical properties might be description, longDescription, color, and brand.
Don’t include multiple properties that contain the same data. For example, if a product’s description and longDescription properties always contain the same text, include only one of these properties in the index. (Similarly, if the description property always contains a subset of the text in the longDescription property, such as the first sentence, then include longDescription and omit description.)
Don’t include properties that have values that users are unlikely to search for. These include date and Boolean values, and some types of numeric values. (Note, however, that these properties are often appropriate for metadata.)
Don’t include properties that may lead to irrelevant or undesired results. For example, suppose you have a Shoes category with two subcategories, Men’s Shoes and Women’s Shoes. If the description property of the Shoes category is “Men’s and women’s shoes,” and you include ancestorCategories.description in the index, searches for “women’s shoes” will return men’s shoes as well as women’s, because the ancestorCategories.description property for each item in Men’s Shoes contains the phrase “women’s shoes.”
Be careful not to confuse the name of a property with its values. For example, you might be inclined to include a Boolean property named onSale, on the assumption that users may include “on sale” in search queries. But the resulting index will not include onSale (the name of the property); it will include true and false (the values of the property), so searching for “on sale” will not have the desired effect.
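The last two guidelines can be illustrated with a toy inverted index. This is only a sketch of the general idea, not how the search engine builds its index: the functions tokenize, build_index, and search, and the sample items, are hypothetical names and data invented for the example.

```python
import re

def tokenize(value):
    """Lowercase a property value and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", str(value).lower())

def build_index(items, text_properties):
    """Map each token to the ids of items whose indexed values contain it."""
    index = {}
    for item_id, props in items.items():
        for name in text_properties:
            # Only the property VALUE is tokenized; the name is never indexed.
            for token in tokenize(props.get(name, "")):
                index.setdefault(token, set()).add(item_id)
    return index

def search(index, query):
    """Return the ids of items that contain every token of the query."""
    token_sets = [index.get(token, set()) for token in tokenize(query)]
    return set.intersection(*token_sets) if token_sets else set()

# Hypothetical sample data: one item from each shoe subcategory.
items = {
    "mens-boot": {
        "description": "Leather boot",
        "onSale": True,
        "ancestorCategories.description": "Men's and women's shoes",
    },
    "womens-sandal": {
        "description": "Strappy sandal",
        "onSale": False,
        "ancestorCategories.description": "Men's and women's shoes",
    },
}

# Indexing onSale adds the tokens "true" and "false", not "on" or "sale".
by_flag = build_index(items, ["description", "onSale"])
on_sale_hits = search(by_flag, "on sale")
true_hits = search(by_flag, "true")

# The shared ancestor description makes items in both subcategories match.
by_ancestor = build_index(items, ["description", "ancestorCategories.description"])
ancestor_hits = search(by_ancestor, "women's shoes")
```

In this sketch, on_sale_hits is empty because neither word was ever indexed, while ancestor_hits contains both items even though only one of them is a women’s shoe.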
Keep in mind that these are just guidelines, and you may need to deviate from them depending on the requirements of your environment. For example, you may want to include a Boolean property as a text property if you translate true and false into searchable Strings. (See Translating Property Values.) Or there may be certain numeric properties (e.g., product codes) that you want to make available for searching.
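As a rough illustration of the idea behind translating values, the sketch below maps a Boolean onSale value to a searchable phrase before the text is indexed. It does not show the product’s actual translation mechanism (see Translating Property Values for that); the TRANSLATIONS table, the translate and text_for_index functions, and the productCode property are assumptions made up for the example.

```python
# Hypothetical translation table: property name -> {raw value: searchable text}.
TRANSLATIONS = {
    "onSale": {True: "on sale", False: ""},
}

def translate(name, value):
    """Return the searchable text for a value, or the value as a string if unmapped."""
    mapping = TRANSLATIONS.get(name)
    if mapping is None:
        return str(value)
    return mapping.get(value, str(value))

def text_for_index(props, text_properties):
    """Join the translated values of the chosen text properties, skipping empty ones."""
    parts = [translate(name, props[name]) for name in text_properties if name in props]
    return " ".join(part for part in parts if part)

text_properties = ["description", "onSale", "productCode"]

discounted = text_for_index(
    {"description": "Leather boot", "onSale": True, "productCode": 40417},
    text_properties,
)
full_price = text_for_index(
    {"description": "Leather boot", "onSale": False, "productCode": 40417},
    text_properties,
)
```

Here discounted is "Leather boot on sale 40417" and full_price is "Leather boot 40417", so a query for “on sale” can match only the discounted item, and the numeric product code is searchable in both.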
Guidelines for Metadata Properties
Some guidelines for determining which properties to include as metadata properties, and which ones to omit:
Include any properties that you want to use for creating facets (for example, faceting by size).
Include any properties that you want to use in search configuration rankings and rules (for example, ranking by brand).
Include any properties that you want to use as sort criteria (for example, sorting by salePrice).
Include any properties that you want to use in query constraints (for example, using catalogId to restrict results to items in the catalog assigned to the user).
Include any properties that you want to be able to access in search results. For example, you might include $repositoryId for the document-level item so you can access the repository item that a search result represents. But don’t include such properties if you don’t need them; see Suppressing Properties.
Don’t include properties that do not match any of these criteria. In particular, you should not include long text fields (such as longDescription) as metadata properties. Although it is possible to include these properties in constraints (e.g., “do not return results whose longDescription contains Acme”), such constraints are very inefficient, and these properties can increase the index size significantly.
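One way to apply both sets of guidelines during planning is to record, for each property, how you intend to use it, and derive its role from that. The sketch below is a simplified encoding of the guidelines, not a tool shipped with the product: PropertySpec, classify, the value-type labels, and the use labels are all hypothetical, and the rules deliberately ignore the exceptions discussed above (such as translated Boolean values).

```python
from dataclasses import dataclass, field

# Hypothetical labels for the metadata uses listed in the guidelines.
METADATA_USES = {"facet", "ranking", "sort", "constraint", "result"}
# Value types that users are unlikely to search for as plain text.
UNSEARCHABLE_TYPES = {"boolean", "date"}

@dataclass
class PropertySpec:
    name: str
    value_type: str                 # e.g. "string", "long_text", "boolean", "date", "number"
    searched_by_users: bool = False
    uses: set = field(default_factory=set)

def classify(spec):
    """Return the roles ("text", "metadata") that the guidelines suggest for a property."""
    roles = set()
    if spec.searched_by_users and spec.value_type not in UNSEARCHABLE_TYPES:
        roles.add("text")
    # Long text fields are kept out of metadata even if a constraint could use them.
    if spec.uses & METADATA_USES and spec.value_type != "long_text":
        roles.add("metadata")
    return roles

specs = [
    PropertySpec("longDescription", "long_text", searched_by_users=True, uses={"constraint"}),
    PropertySpec("brand", "string", searched_by_users=True, uses={"ranking"}),
    PropertySpec("size", "string", uses={"facet"}),
    PropertySpec("salePrice", "number", uses={"sort"}),
    PropertySpec("onSale", "boolean", uses={"facet"}),
    PropertySpec("internalNotes", "string"),
]

plan = {spec.name: classify(spec) for spec in specs}
```

In the resulting plan, brand is both a text and a metadata property, longDescription is text only, size, salePrice, and onSale are metadata only, and internalNotes has no role and can be omitted from the index.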