Text processing options are parameter settings that control how text is processed during both indexing and query processing. Examples include whether particular metatags are indexed as text or excluded from indexing, whether items are indexed at the document or sentence level, and the minimum number of words that should be considered a distinct sentence.

These parameters are gathered into sets, which apply all of their text processing option settings when building an index or processing a query.

Some individual options within a text processing option (TPO) set must have the same value across all content in a project (project-level options), while others may vary with the content (content-level options). In Search Administration, a single TPO set is associated with the project as a whole; this is the project TPO set. You can also associate option sets with content; these are called content TPO sets.


You can add TPO sets at either the project or the content level.

The option values used for text processing come from:

  • The project TPO set if no content TPO set is specified.

  • The merge of the project TPO set and the content TPO set, if both sets are specified.

If both a project TPO set and a content TPO set are present, project-level options always take their values from the project TPO set. For each content-level option, the content TPO set either specifies its own value explicitly or leaves the option unset, in which case the value is inherited from the project TPO set. Only one content TPO set can be associated with a given content name.
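The merge rules above can be sketched in a few lines of Python. This is only an illustration of the precedence logic, not the product's implementation: the option names (index_unit, input_language, min_sentence_words), their classification as project-level or content-level, and the function merge_tpo_sets are all hypothetical, and None is used here to stand for an option left unset in the content TPO set.

```python
from typing import Any, Dict, Optional

# Hypothetical classification: which options are project-level.
# The real product defines this per option; these names are made up.
PROJECT_LEVEL_OPTIONS = {"index_unit"}


def merge_tpo_sets(
    project_set: Dict[str, Any],
    content_set: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Return the effective option values for one piece of content."""
    # Start from the project TPO set; it is the only source when
    # no content TPO set is specified.
    effective = dict(project_set)
    if content_set is None:
        return effective

    for name, value in content_set.items():
        if name in PROJECT_LEVEL_OPTIONS:
            # Project-level options always come from the project set.
            continue
        if value is None:
            # Unset in the content set: inherit the project value.
            continue
        effective[name] = value
    return effective


project = {"index_unit": "document", "input_language": "english",
           "min_sentence_words": 3}
content = {"index_unit": "sentence", "input_language": "french",
           "min_sentence_words": None}

merged = merge_tpo_sets(project, content)
```

In this sketch, merged keeps index_unit from the project set, takes input_language from the content set, and inherits min_sentence_words from the project set because the content set leaves it unset.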

Note that if you are indexing structured content from a repository, such as a product catalog, you do not need to use a TPO set to specify input languages; the repository indexing process overrides any TPO language selection and uses the languages specified in your IndexingOutputConfig component. However, if you are indexing file-based content in multiple languages, you must separate your content by language and use a content-level TPO set to specify the Input Language for that content. One way to do this is to place the files for each language in a separate directory; you can then add content representing each directory to the content set separately.
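As a rough illustration of the directory-per-language approach, the following Python sketch groups relative file paths by their top-level directory, on the assumption that each such directory is named for the language of the files it holds. The function group_by_language_dir and the sample paths are hypothetical; the sketch only shows how the content might be partitioned before each group is added to the content set with its own content TPO set.

```python
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Dict, Iterable, List


def group_by_language_dir(paths: Iterable[str]) -> Dict[str, List[str]]:
    """Group relative file paths by their first directory component."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for path in paths:
        parts = PurePosixPath(path).parts
        if len(parts) < 2:
            # The file is not inside a language directory; skip it.
            continue
        groups[parts[0]].append(path)
    return dict(groups)


# Hypothetical layout: one directory per language.
files = ["english/a.html", "french/b.html", "english/c.html", "readme.txt"]
by_language = group_by_language_dir(files)
```

Each key in by_language then corresponds to one content entry, which would be paired with a content TPO set specifying the matching Input Language.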

If you have an active search project selected in the Workbench, you can view the merged set of options. For example, suppose your search project MyProject uses TPO set A at the project level and TPO set B at the content level for some content. If you edit TPO set B with MyProject selected, any option set to “use project level setting” displays the actual setting used in the project.


Copyright © 1997, 2013 Oracle and/or its affiliates. All rights reserved.