Using Text Enrichment

The Text Enrichment component lets you extract information from, and assess, free-form text data.

Extracted information includes named entities, themes, quotations, and a document summary.

The Text Enrichment component uses the Salience Engine from Lexalytics. Depending on your license, the Salience Engine may also be able to assess the sentiment of the input text. Sentiment can be evaluated for the whole input (or document), for specific entities, or for specific themes.

Supported Text Enrichment features

The Salience Engine supports a wide variety of text extraction features, but the Endeca Text Enrichment component supports only a limited set of them. The following table lists the supported features.
Table 1. Supported Text Enrichment features
Text Enrichment feature: Resulting information in the output record
Sentiment Analysis: An overall sentiment score for the current document, for specific entities, or for specific themes. This functionality is available by special license.

This feature can be enabled or disabled.

Named Entities: A list of named entities in the current document. You can specify which types of entities to extract. Supported entity types include:
  • Company (i.e., businesses)
  • Person
  • Place (i.e., geographical locations)
  • Product
  • Sports
  • Title
  • List (for user-defined entities)

The output record includes one column per type. Each column can contain multiple values.

If Sentiment Analysis is enabled, the entities are added to different groups based on their sentiment scores. You must specify the ranges for the entity sentiment scores. The output record includes one column per range and each column can contain multiple values.

This feature can be enabled or disabled.

Themes: A list of themes in the document. All meta-themes are added to the output record in a field you specify.
For any theme that is not a meta-theme, if the theme score is higher than a user-specified threshold, then:
  • If Sentiment Analysis is enabled, the theme is added to a group based on its sentiment score. You must specify the ranges for the sentiment scores. The output record includes one column per range and each column can contain multiple values.
  • Regardless of whether Sentiment Analysis is enabled or disabled, the theme is also added to a separate user-specified field (i.e., not the meta-theme field).

This feature can be enabled or disabled.

Quotations: A list of quotes in the document, each attributed to its speaker. You can specify the maximum length of quotes and the name of the field/property in the output record.

This feature can be enabled or disabled.

Document Summary: A shortened version of the input content that best represents the whole content in a limited number of words.

This feature is always enabled. It cannot be disabled.
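
The grouping rules described for Named Entities and Themes can be illustrated with a short Python sketch. It is not the component's API or its actual implementation. The range boundaries, the column names (negative, neutral, positive, meta_themes, themes), the Theme class, and the function names are all hypothetical and exist only to show how one column per sentiment range, each holding multiple values, might be filled. The score scale used by the Salience Engine is not assumed here.

```python
from dataclasses import dataclass

# Hypothetical user-specified ranges: (column name, inclusive low, exclusive high).
SENTIMENT_RANGES = [
    ("negative", float("-inf"), -0.2),
    ("neutral", -0.2, 0.2),
    ("positive", 0.2, float("inf")),
]


@dataclass
class Theme:
    """Hypothetical stand-in for one extracted theme."""
    text: str
    score: float       # theme score compared against the threshold
    sentiment: float   # sentiment score used for range grouping
    is_meta: bool = False


def range_name(sentiment, ranges=SENTIMENT_RANGES):
    """Return the name of the range containing the score, or None."""
    for name, low, high in ranges:
        if low <= sentiment < high:
            return name
    return None


def group_entities(entities, ranges=SENTIMENT_RANGES):
    """Group (text, sentiment) pairs into one multi-value column per range."""
    columns = {name: [] for name, _, _ in ranges}
    for text, sentiment in entities:
        name = range_name(sentiment, ranges)
        if name is not None:
            columns[name].append(text)
    return columns


def route_themes(themes, threshold, sentiment_enabled, ranges=SENTIMENT_RANGES):
    """Place themes into output fields following the rules in the table."""
    record = {"meta_themes": [], "themes": []}
    if sentiment_enabled:
        for name, _, _ in ranges:
            record["themes_" + name] = []
    for theme in themes:
        if theme.is_meta:
            # All meta-themes go to their own field.
            record["meta_themes"].append(theme.text)
        elif theme.score > threshold:
            # Non-meta themes above the threshold always go to the themes field.
            record["themes"].append(theme.text)
            if sentiment_enabled:
                name = range_name(theme.sentiment, ranges)
                if name is not None:
                    record["themes_" + name].append(theme.text)
    return record


if __name__ == "__main__":
    sample = [
        Theme("customer service", 0.9, -0.4),
        Theme("fast delivery", 0.3, 0.7),
        Theme("Retail", 1.0, 0.0, is_meta=True),
    ]
    print(group_entities([("Acme", 0.6), ("Bob", -0.5), ("Paris", 0.0)]))
    print(route_themes(sample, threshold=0.5, sentiment_enabled=True))
```

In this sketch a theme whose score does not exceed the threshold is dropped entirely, and the sentiment columns are created only when Sentiment Analysis is enabled, mirroring the two bullet points in the Themes row.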

Lexalytics information sources

The Lexalytics Support Web site provides two sources of information on the Salience Engine:

Although both sources are aimed at a developer audience, they can provide useful information for Integrator ETL users who are implementing the Text Enrichment feature.