About Text Enrichment

The Text Enrichment component provides information extraction and summarization capabilities.

Extracted information includes entities (such as people, places, and organizations), quotations, and themes. The Text Enrichment component uses the Salience Engine from Lexalytics. Depending on the version of the Salience Engine that you purchased, the engine can also extract sentiment from documents at the document, entity, and theme levels.

Supported Text Enrichment features

The following table lists the features supported by the Text Enrichment component. Although the Salience Engine supports more extraction features, the Text Enrichment component supports only the ones listed here.
Each entry below gives a Text Enrichment feature, followed by the resulting information in the output record.

Sentiment Analysis: An overall sentiment score for the current document. Computed only if the Sentiment Analysis feature is enabled.

Named Entities: A list of the named entities in the current document. Computed only if the Named Entities feature is enabled. The user specifies which entity types are extracted. The supported entity types are:
  • Company (i.e., businesses)
  • Person
  • Place (i.e., geographical locations)
  • Product
  • Sports
  • Title
  • List (for user-defined entities)

The output record has one column per entity type, and each column can hold multiple values.

If Sentiment Analysis is also enabled, the entities are additionally grouped by their sentiment scores. The user must specify the sentiment score ranges that define these groups. The output record has one column per range, and each column can hold multiple values.

Themes: A list of the themes in the document. Computed only if the Themes feature is enabled. All meta-themes are added to the output record, in a field/property whose name the user must specify.
For any theme that is not a meta-theme and whose theme score is higher than a user-specified threshold:
  • If Sentiment Analysis is enabled, the theme is added to a group based on its sentiment score. The user must specify the sentiment score ranges. The output record has one column per range, and each column can hold multiple values.
  • Whether or not Sentiment Analysis is enabled, the theme is also added to a separate user-specified field (distinct from the meta-theme field).

Quotations: A list of the quotations in the document, together with their speakers. Computed only if the Quotations feature is enabled. The user can specify the maximum quotation length and the name of the field/property in the output record.

Document Summary: A shortened version of the input content, intended to represent the whole content as well as possible in a limited number of words.
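The routing rules for named entities and themes can be summarized in a short sketch. This is a minimal Python illustration of the logic described above, not the Text Enrichment component's implementation and not the Salience Engine API. The Entity and Theme classes, the function names, the column names, and the half-open range convention (lower <= score < upper) are hypothetical assumptions made for the example; passing ranges=None stands in for "Sentiment Analysis disabled".

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical sentiment range: (output column name, lower bound, upper bound).
# Assumption: a score s falls in a range when lower <= s < upper.
SentimentRange = Tuple[str, float, float]
# Output record: column name -> list of values (columns are multi-valued).
Record = Dict[str, List[str]]


@dataclass
class Entity:
    text: str
    entity_type: str  # e.g. "Company", "Person", "Place"
    sentiment: float


@dataclass
class Theme:
    text: str
    score: float
    sentiment: float
    is_meta: bool


def add_value(record: Record, column: str, value: str) -> None:
    """Append a value to a multi-valued column, creating the column if needed."""
    record.setdefault(column, []).append(value)


def add_by_sentiment(record: Record, value: str, sentiment: float,
                     ranges: List[SentimentRange]) -> None:
    """Add the value to the column of each range that contains its sentiment score."""
    for column, lower, upper in ranges:
        if lower <= sentiment < upper:
            add_value(record, column, value)


def route_entities(entities: List[Entity], selected_types: Set[str],
                   ranges: Optional[List[SentimentRange]]) -> Record:
    """One column per selected entity type, plus one column per sentiment
    range when sentiment analysis is enabled (ranges is not None)."""
    record: Record = {}
    for entity in entities:
        if entity.entity_type not in selected_types:
            continue  # the user did not select this entity type
        add_value(record, entity.entity_type, entity.text)
        if ranges is not None:
            add_by_sentiment(record, entity.text, entity.sentiment, ranges)
    return record


def route_themes(themes: List[Theme], threshold: float, meta_field: str,
                 theme_field: str,
                 ranges: Optional[List[SentimentRange]]) -> Record:
    """Meta-themes always go to meta_field; other themes only if score > threshold."""
    record: Record = {}
    for theme in themes:
        if theme.is_meta:
            add_value(record, meta_field, theme.text)
        elif theme.score > threshold:
            if ranges is not None:  # sentiment analysis enabled
                add_by_sentiment(record, theme.text, theme.sentiment, ranges)
            # Added to the separate theme field whether or not sentiment is enabled.
            add_value(record, theme_field, theme.text)
    return record


if __name__ == "__main__":
    inf = float("inf")
    entity_ranges = [("NegativeEntities", -inf, -0.1),
                     ("NeutralEntities", -0.1, 0.1),
                     ("PositiveEntities", 0.1, inf)]
    entities = [Entity("Lexalytics", "Company", 0.3),
                Entity("Boston", "Place", 0.0),
                Entity("Jane Doe", "Person", -0.6)]
    # Person is not selected, so "Jane Doe" is skipped.
    print(route_entities(entities, {"Company", "Place"}, entity_ranges))
    # {'Company': ['Lexalytics'], 'PositiveEntities': ['Lexalytics'], 'Place': ['Boston'], 'NeutralEntities': ['Boston']}

    theme_ranges = [("NegativeThemes", -inf, -0.1),
                    ("NeutralThemes", -0.1, 0.1),
                    ("PositiveThemes", 0.1, inf)]
    themes = [Theme("customer service", 0.9, -0.4, False),
              Theme("pricing", 0.2, 0.5, False),  # below the threshold, dropped
              Theme("Business", 1.0, 0.0, True)]
    print(route_themes(themes, 0.5, "MetaThemes", "Themes", theme_ranges))
    # {'NegativeThemes': ['customer service'], 'Themes': ['customer service'], 'MetaThemes': ['Business']}
```

The sketch keeps entity and theme ranges as separate lists because the user specifies the ranges separately for each. Whether the actual component treats range boundaries as inclusive or exclusive is not stated here, so the half-open convention is only an assumption.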

Lexalytics information sources

The Lexalytics Support Web site provides two sources of information on the Salience Engine:

Although both sources are aimed at a developer audience, they can provide useful information for Integrator users who are implementing the Text Enrichment feature.