The Text Enrichment component can extract, summarize, and assess input text.
The Text Enrichment component uses the Salience Engine from Lexalytics to extract entities (people, places, organizations, themes, and quotes) from source files. The extracted entities can be written to an output file or loaded into an Endeca data domain.
The Salience Engine also provides the ability to assess the sentiment of input text. This enhanced capability requires a different license from the basic text enrichment capability.
The metadata schema for the Text Enrichment component is not fixed.
The following table describes the configuration properties available for the Text Enrichment component.
| Name | Description | Valid Values | Example |
|---|---|---|---|
| Configuration file | Absolute path to the Text Enrichment properties file. Recommended practice is to store the configuration file in the project directory. | Valid file path. You can use ${PROJECT} or a similar global variable to specify the path. | ${PROJECT}/TextEnrichments.properties |
| Input field | Name of the field in the input record that you want to enrich (that is, extract entities from and assess sentiment for) | Field names | survey_responses |
| Salience license file | Absolute path to the Lexalytics Salience license file | Valid file path | C:/Program Files (x86)/Lexalytics/license.v5 or /usr/endeca/salience/license.v5 |
| Salience data path | Absolute path to the Lexalytics data directory | Valid file path | C:/Program Files (x86)/Lexalytics/data or /usr/endeca/salience/data |
| Error handling key field | Specifies a field to store error-handling output. You must specify a value for this field. If you do not have a specific error field, you can specify the primary key field name. (The primary key field must exist in the input metadata.) | Alphanumeric characters | salience_errors |
| Text threshold (percent) | The minimum percentage of alphanumeric characters that the input field must contain for the field to be processed. If no threshold is specified, the system default is 80. | Positive integers | 80 |
| Number of threads | The number of threads the component should consume. If no thread count is specified, the component uses one thread. | Positive integers | 4 |
| Multi-assign delimiter | Sets the character that separates multi-assign values in a property in a source record. This delimiter is different from the delimiter that separates property fields in the source record. See also Multi-assign delimiter. | A single character. The default is the Unicode DELETE character (U+007F). You do not have to set this property if your data does not include multi-assign properties. |  |
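
The text threshold and multi-assign delimiter settings can be illustrated with a short Python sketch. This is not the component's implementation. It assumes the threshold compares alphanumeric characters against all characters in the field, whitespace included, which the table does not confirm. The function names `alnum_percent`, `passes_threshold`, and `split_multi_assign` are hypothetical and exist only for illustration.

```python
# Illustrative sketch only; not the Text Enrichment component's actual code.

# Unicode DELETE (U+007F), the documented default multi-assign delimiter.
DEFAULT_DELIMITER = "\u007f"

# Documented default for "Text threshold (percent)".
DEFAULT_THRESHOLD = 80


def alnum_percent(text: str) -> float:
    """Return the percentage of characters in text that are alphanumeric.

    Assumption: every character, including whitespace and punctuation,
    counts toward the total. The real component may count differently.
    """
    if not text:
        return 0.0
    alnum_count = sum(1 for ch in text if ch.isalnum())
    return 100.0 * alnum_count / len(text)


def passes_threshold(text: str, threshold: int = DEFAULT_THRESHOLD) -> bool:
    """Return True if text meets the minimum alphanumeric percentage."""
    return alnum_percent(text) >= threshold


def split_multi_assign(value: str, delimiter: str = DEFAULT_DELIMITER) -> list[str]:
    """Split a multi-assign property value into its individual values."""
    return value.split(delimiter)


if __name__ == "__main__":
    print(passes_threshold("abcd1234!"))   # 8 of 9 characters are alphanumeric
    print(passes_threshold("ab!!"))        # 2 of 4 characters are alphanumeric
    print(split_multi_assign("red\u007fgreen\u007fblue"))
```

Under these assumptions, a field made up mostly of punctuation or markup falls below the default threshold of 80 and is skipped, while ordinary text passes.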