Oracle Commerce Guided Search - Oracle Language Technology (OLT) language analysis

Oracle Language Technology (OLT) language analysis

Oracle Language Technology analysis performs language-specific dictionary-based forms of linguistic analysis, including the following:

Segmentation: Identifying word breaks in text from languages that do not use whitespace as a word delimiter. The unseparated words must be adjacent to one another and in the same property. Note that Latin-1 analysis is unsuitable for languages that do not use whitespace as a delimiter.
Tokenization: Breaking a stream of text up into words, phrases, symbols, or other meaningful elements.
Orthographic normalization: Accounting for variations in the representation of words in languages that have standardized alternatives to diacritic marks (such as "ae" or "a" for ä in German); for example, treating "Furtwaengler" and "Furtwangler" as matching terms.
Decompounding: Dividing compound word forms into their base terms; for example, dividing "Altertumswissenschaft" into "Altertums" and "Wissenschaft".
Diacritic folding: Ignoring character accents in data when indexing and searching text.
Dynamic stemming: Determining the base (uninflected) form of a word. The process is based on dictionary entries and language-specific rules.
Stop words: Ignoring common words (such as "the", "and", or "while") that have no value for searching.
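
The following Python sketch illustrates, in a simplified way, what several of these forms of analysis do to a term. It is not OLT code and does not reflect how the MDEX Engine implements them; the function names, the expansion table, the stop word list, and the two-word lexicon are all hypothetical and exist only for illustration.

```python
import unicodedata

# Hypothetical table of German alternatives to diacritic marks (illustrative only).
GERMAN_EXPANSIONS = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}

# Hypothetical stop word list, using the examples given in the text above.
STOP_WORDS = {"the", "and", "while"}

# Hypothetical lexicon for the decompounding example.
LEXICON = {"altertums", "wissenschaft"}


def fold_diacritics(text):
    """Diacritic folding: drop combining accent marks from each character."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def orthographic_variants(term):
    """Orthographic normalization: collect the spellings treated as matching."""
    lowered = unicodedata.normalize("NFC", term).lower()
    expanded = "".join(GERMAN_EXPANSIONS.get(ch, ch) for ch in lowered)
    return {lowered, fold_diacritics(lowered), expanded}


def decompound(word, lexicon=LEXICON):
    """Decompounding: split a word into two base terms found in the lexicon.

    This toy version only handles two-part compounds; a real analyzer
    relies on a full dictionary and language-specific rules.
    """
    lowered = word.lower()
    for i in range(1, len(lowered)):
        head, tail = lowered[:i], lowered[i:]
        if head in lexicon and tail in lexicon:
            return [head, tail]
    return [lowered]


def remove_stop_words(tokens):
    """Stop words: discard tokens that appear in the stop word list."""
    return [t for t in tokens if t.lower() not in STOP_WORDS]


if __name__ == "__main__":
    print(sorted(orthographic_variants("Furtwängler")))
    print(decompound("Altertumswissenschaft"))
    print(remove_stop_words(["the", "conductor", "and", "orchestra"]))
```

Under these assumptions, a query for "Furtwaengler" or "Furtwangler" would match an indexed "Furtwängler" because all three spellings fall in the same variant set.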

A single MDEX Engine can process any number of the originally supported languages whose default language analysis is OLT; for example, a single MDEX Engine can process data in Arabic, Finnish, and Hebrew. However, among the languages that were not originally supported, a single MDEX Engine can process only one language whose default analysis is OLT.

You can configure how the originally supported languages are handled in the file stemming.xml.

For a complete list of the languages supported by the MDEX Engine, see MDEX Engine Supported Languages.

Note

OLT analysis is only partially compatible with Oracle Commerce record and dimension search features; for example, it does not support wildcard search, phrase search, or search characters. If your application requires these search features, use Latin-1 analysis.

Different releases of the MDEX Engine may include different versions of OLT. To find out which version of OLT the MDEX Engine uses, run Dgidx or Dgraph with the --version option at the command line.

Note

Only one type of language analysis can be applied to any particular record, dimension, or property.
