Text Enrichment properties file

The Text Enrichment Properties file defines the configuration of the Salience Engine for the Text Enrichment component instance. All instances of the component can use the same properties file, or you can use different properties files to support different instances of the component.

The Text Enrichment properties file:
  • specifies whether supported text extraction features are enabled
  • specifies the input fields to process
  • defines scoring thresholds for assessments
  • defines the field names for assessment output data

The spelling and case of the configuration properties must match the spelling and case listed in the following sections.
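The sample file at the end of this section uses `key = value` lines and `#` comments. If the component reads the file with standard Java properties semantics (an assumption; the loader itself is not described here), then a key whose case differs from the documented name is simply not found. The following minimal Java sketch shows this with `java.util.Properties`. The class name `CaseSensitiveKeys` is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical illustration: java.util.Properties keys are case-sensitive.
// Assumes the properties file follows standard Java properties semantics.
final class CaseSensitiveKeys {
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            // StringReader does not perform real I/O, but load() declares IOException.
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // The key is written with the wrong case.
        Properties props = load("te.Theme.Enabled = true\n");
        // Looking up the documented (lowercase) name finds nothing.
        System.out.println(props.getProperty("te.theme.enabled")); // null
        System.out.println(props.getProperty("te.Theme.Enabled")); // true
    }
}
```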

The following sections and tables describe the configuration properties. Some of the setting values are user-variable (in other words, the user can customize the name or setting), while others must use the specific values described in the tables.

Global Sentiment Analysis property

The following property provides overall control of sentiment analysis:
te.sentiment-analysis.enabled

If this property is set to true, all levels of sentiment analysis (document, entity, and theme) are enabled, and each level can then be enabled or disabled individually. If it is set to false, no sentiment analysis of any type is performed.

If you want to use this feature, you must purchase a license that includes sentiment analysis.

Table 1. Document Sentiment Analysis properties
Document Sentiment property Meaning
te.document-sentiment.enabled If set to true (the default), Sentiment Analysis is enabled for documents and an overall sentiment score is computed for the current document. If set to false, Document Sentiment Analysis is disabled.
te.document-sentiment.field Sets the target field name in the output records in which the sentiment score is written. The field name is user-variable, and DocumentSentiment is the default.
te.document-sentiment.use-chains If set to true (the default), document sentiment scoring uses lexical chains in the computation of the sentiment score.
Table 2. Named Entity processing properties
Named Entity property Meaning
te.entity.enabled If set to true (the default), Named Entity Extraction is enabled. If set to false, Named Entity Extraction is disabled.
te.entity.types Sets the types of named entities to extract. Supported types are:
  • Company
  • Person
  • Place
  • Product
  • Sports
  • Title
  • List (must be used if you have user-defined entities)

The default types are Person, Company, and Product.

Each configured entity type is written to a target field whose name is made up of the entity type (such as Person) prefixed by the te.entity.field-prefix value.

te.entity.field-prefix Sets the prefix name that is used to determine the final field names for the named entities. The field name is user-variable, and Entities is the default. For example, if you set this value to Entities and you configure Person and Company entity types to be extracted, then EntitiesPerson and EntitiesCompany will be the two target field names in the output records.

If you have user-defined entities, then all the user-defined entities are put in the EntitiesList target field.

te.entity-sentiment.enabled If set to true (the default), Sentiment Analysis is enabled for entities and a sentiment score is computed for the entities. If set to false, Entity Sentiment Analysis is disabled.
te.entity-sentiment.cut1.label Sets the value for this cut1 label. This will be the column name for the negative entities to be extracted. The field name is user-variable, and EntitiesNegative is the default. Negative entities are entities with a score less than the negative threshold.
te.entity-sentiment.cut1.value Sets the EntitiesNegative threshold. The value is user-variable (for example, a value of -0.1 can be used). Entity-sentiment scores that are less than this value are written to the EntitiesNegative record field.
te.entity-sentiment.cut2.label Sets the value for this cut2 label. This will be the column name for the neutral type of entity sentiments to be extracted. Neutral entities are entities with a sentiment score between the negative and positive thresholds. The field name is user-variable, and EntitiesNeutral is the default.
te.entity-sentiment.cut2.value Sets the EntitiesPositive threshold. The value is user-variable (for example, a value of 0.1 can be used). Entity-sentiment scores that are greater than this value are written to the EntitiesPositive record field.
te.entity-sentiment.cut3.label Sets the value for this cut3 label. This is the column name for the positive type of entity sentiments to be extracted. Positive entities are entities with a score greater than the positive threshold. The field name is user-variable, and EntitiesPositive is the default.
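The field-naming rule for te.entity.field-prefix and te.entity.types can be sketched in Java as follows. This is a hypothetical helper, not the component's code. It assumes that te.entity.types is a comma-separated list whose entries may be surrounded by spaces, as in the sample properties file.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: derives entity output field names as described in Table 2.
final class EntityFieldNames {
    // prefix: value of te.entity.field-prefix (for example, "Entities").
    // typesValue: value of te.entity.types (for example, "Person, Company").
    static List<String> fieldNames(String prefix, String typesValue) {
        List<String> names = new ArrayList<>();
        for (String type : typesValue.split(",")) {
            String trimmed = type.trim();
            if (!trimmed.isEmpty()) {
                // Field name = prefix + entity type, e.g. "Entities" + "Person".
                // User-defined entities use the List type, giving "EntitiesList".
                names.add(prefix + trimmed);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Values taken from the sample properties file.
        System.out.println(fieldNames("Entities", "Person, Company, Product, Place"));
        // prints [EntitiesPerson, EntitiesCompany, EntitiesProduct, EntitiesPlace]
    }
}
```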
Table 3. Theme processing properties
Theme property Meaning
te.theme.enabled If set to true (the default), Theme Extraction is enabled. If set to false, Theme Extraction is disabled.
te.theme-type.enabled Specifies whether output themes will be standard or normalized.
Valid values are:
  • standard
  • normalized

If you specify normalized as the value for this property, you must define a normalization.dat file in the directory %LEXALYTICS_HOME%/data/themes. For more information, see Normalizing themes. If the file normalization.dat does not exist, standard themes will be output.

If the text enrichment properties file does not include this property, standard themes are output. If the value of the property is specified incorrectly, the graph will fail.

te.theme.field Sets the target field name in the output records in which kept theme names are written. The field name is user-variable, and Themes is the default. Kept themes are those themes whose score is higher than the te.theme.score.threshold setting and have made the te.theme.keep-max cut-off list.
te.theme.score.threshold Sets a score threshold for keeping themes. That is, only keep themes with a score greater than this threshold. The value is user-variable, and 1.0 is the default.
te.theme.keep-max Sets a threshold for keeping the best themes. That is, of those themes that are above the te.theme.score.threshold setting, only keep the themes with the best scores. The value is user-variable, and the default is 100.
te.theme-sentiment.enabled If set to true (the default), Sentiment Analysis is enabled for themes and a sentiment score is computed for the themes. If set to false, Theme Sentiment Analysis is disabled.
te.theme-sentiment.cut1.label Sets the value for this cut1 label. The field name is user-variable, and ThemesNegative is the default. Negative themes are themes with a score less than the negative threshold.
te.theme-sentiment.cut1.value Sets the ThemesNegative threshold. The value is user-variable (for example, a value of -0.1 can be used).
te.theme-sentiment.cut2.label Sets the value for the cut2 label. The field name is user-variable, and ThemesNeutral is the default. Neutral themes are themes with a sentiment score between the negative and positive thresholds.
te.theme-sentiment.cut2.value Sets the ThemesPositive threshold. The value is user-variable (for example, a value of 0.1 can be used). Theme-sentiment scores that are greater than this value are written to the ThemesPositive record field.
te.theme-sentiment.cut3.label Sets the value for this cut3 label. The field name is user-variable, and ThemesPositive is the default. Positive themes are themes with a score greater than the positive threshold.
te.meta-theme.field Sets the target field name in the output records in which the meta-themes are written. The field name is user-variable, and ThemesMeta is the default. Meta-themes are a list of themes in the document.
te.meta-theme.frequency.threshold Sets a score threshold for keeping meta-themes. That is, only keep meta-themes with a score greater than this threshold. The value is user-variable, and 1.0 is the default.
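Two of the theme rules in Table 3 can be sketched in Java: how te.theme-type.enabled resolves to standard or normalized themes, and how te.theme.score.threshold and te.theme.keep-max select the kept themes. The class `ThemeSettings` and its nested `Theme` holder are hypothetical. The sketch also assumes case-sensitive matching of the values standard and normalized, and returns kept themes in descending score order, which this section does not specify.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch of the theme rules in Table 3; not the component's code.
final class ThemeSettings {
    enum ThemeType { STANDARD, NORMALIZED }

    // Minimal theme holder, for illustration only.
    static final class Theme {
        final String name;
        final double score;

        Theme(String name, double score) {
            this.name = name;
            this.score = score;
        }
    }

    // typeValue: value of te.theme-type.enabled, or null if the property is absent.
    // normalizationFile: assumed to point at %LEXALYTICS_HOME%/data/themes/normalization.dat.
    static ThemeType resolveType(String typeValue, Path normalizationFile) {
        if (typeValue == null) {
            return ThemeType.STANDARD; // property omitted: standard themes
        }
        switch (typeValue) {
            case "standard":
                return ThemeType.STANDARD;
            case "normalized":
                // Without normalization.dat, standard themes are output instead.
                return Files.exists(normalizationFile) ? ThemeType.NORMALIZED : ThemeType.STANDARD;
            default:
                // An incorrectly specified value makes the graph fail.
                throw new IllegalArgumentException("Invalid te.theme-type.enabled value: " + typeValue);
        }
    }

    // Keep themes scoring strictly above scoreThreshold (te.theme.score.threshold),
    // then only the keepMax (te.theme.keep-max) best-scoring themes among them.
    static List<String> keptThemes(List<Theme> themes, double scoreThreshold, int keepMax) {
        return themes.stream()
                .filter(t -> t.score > scoreThreshold)
                .sorted(Comparator.comparingDouble((Theme t) -> t.score).reversed())
                .limit(keepMax)
                .map(t -> t.name)
                .collect(Collectors.toList());
    }
}
```

For example, with a threshold of 1.0 and a keep-max of 2, themes scoring 0.5, 2.0, 1.5, and 3.0 reduce to the two themes scoring 3.0 and 2.0.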
Table 4. Quotation processing properties
Quotation property Meaning
te.quotation.enabled If set to true (the default), Quoted Content Extraction is enabled. If set to false, Quoted Content Extraction is disabled.
te.quotation.field Sets the target field name in the output records in which quoted content is written. The field name is user-variable, and the default is Quotes.
te.quotation.max-length Sets the maximum length (in characters) of a quotation. The default length is 200. Note that if the quotation in the source field is longer than this setting, the source quotation is not written to the target field.
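The max-length rule drops an over-long quotation rather than truncating it. A minimal Java sketch of that rule follows. The class `QuotationFilter` is hypothetical, and counting "characters" as Unicode code points is an assumption.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch of the te.quotation.max-length rule; not the component's code.
final class QuotationFilter {
    // Quotations longer than maxLength are dropped entirely, not truncated.
    // A quotation of exactly maxLength characters is kept.
    // Assumption: "characters" means Unicode code points.
    static List<String> keep(List<String> quotes, int maxLength) {
        return quotes.stream()
                .filter(q -> q.codePointCount(0, q.length()) <= maxLength)
                .collect(Collectors.toList());
    }
}
```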
Table 5. Social Media property
Social Media property Meaning
te.short-content.enabled If set to true, the processing of Social Media content (for example, Twitter data) is enabled. If set to false (the default), Social Media processing is disabled. Note that Social Media (short content) processing applies only to the default language data (English). If you are using language data other than English, make sure to set this property to false.
Table 6. Document Summary properties
Document Summary property Meaning
te.summary.field Sets the column name in the output file in which the summarization of the input content is written. The field name is user-variable, and the default is Summary.
te.summary.length Sets the document summary length in sentences. The default length is 3 sentences.
Table 7. Basic custom properties
Basic Custom properties Meaning
te.salience.userdataDirectory Takes an absolute path to a directory that contains a user-created data dictionary.
te.sentiment.setSentimentDictionary Takes an absolute path to a user-created dictionary that will be used as the sentiment dictionary for the Salience Engine (that is, this dictionary overrides the default Salience sentiment dictionary).
te.sentiment.addSentimentDictionary Takes an absolute path to a user-created dictionary that will be used in addition to the current sentiment Analysis dictionary:
  • If te.sentiment.setSentimentDictionary has been used, then the additional dictionary is added to the first user-created sentiment dictionary.
  • If te.sentiment.setSentimentDictionary has not been used, then the additional dictionary is added to the default Salience sentiment dictionary.
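The precedence between the two dictionary properties can be sketched as follows. The class `SentimentDictionaries` and the `DEFAULT_DICTIONARY` placeholder are hypothetical; the sketch only models which dictionaries are in effect, not how the Salience Engine loads them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of sentiment dictionary precedence; not the component's code.
final class SentimentDictionaries {
    // Placeholder for the built-in Salience sentiment dictionary (hypothetical name).
    static final String DEFAULT_DICTIONARY = "<default Salience sentiment dictionary>";

    // setDictionary: value of te.sentiment.setSentimentDictionary, or null if unset.
    // addDictionary: value of te.sentiment.addSentimentDictionary, or null if unset.
    static List<String> effective(String setDictionary, String addDictionary) {
        List<String> dictionaries = new ArrayList<>();
        // setSentimentDictionary replaces the default as the base dictionary.
        dictionaries.add(setDictionary != null ? setDictionary : DEFAULT_DICTIONARY);
        // addSentimentDictionary is layered on top of whichever base is in effect.
        if (addDictionary != null) {
            dictionaries.add(addDictionary);
        }
        return dictionaries;
    }
}
```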
Table 8. Query topic processing properties
Query Topics property Meaning
te.query-topics.enabled If set to true (the default), query topic processing is enabled. If set to false, query topic processing is disabled.
te.query-topics.field Sets the target field name in the output records to which specified query topics are written. The field name is user-variable, and QueryTopics is the default.
Salience.Options.QueryTopics.setQueryTopicList Specifies the location and name of the file used to define the topics and queries you want to use to tag output from this instance of the Text Enrichment component.
te.query-topics-sentiment.enabled If set to true (the default), Sentiment Analysis is enabled for query topics and a sentiment score is computed for the topics. If set to false, Query Topic Sentiment Analysis is disabled.
te.query-topic-sentiment.cut1.label Sets the value for this cut1 label. The field name is user-variable, and QueryTopicsNegative is the default. Negative query topics are query topics with a score less than the negative threshold.
te.query-topic-sentiment.cut1.value Sets the QueryTopicsNegative threshold. The value is user-variable (for example, a value of -0.1 can be used).
te.query-topic-sentiment.cut2.label Sets the value for the cut2 label. The field name is user-variable, and QueryTopicsNeutral is the default. Neutral query topics are query topics with a sentiment score between the negative and positive thresholds.
te.query-topic-sentiment.cut2.value Sets the QueryTopicsPositive threshold. The value is user-variable (for example, a value of 0.1 can be used). Positive query topics are query topics with a score greater than the positive threshold.
te.query-topic-sentiment.cut3.label Sets the value for this cut3 label. The field name is user-variable, and QueryTopicsPositive is the default. Positive query topics are query topics with a score greater than the positive threshold.

Advanced Custom options

In addition to the te.* configuration properties listed above, you can set other options provided by the Salience API. You can use custom options from the following classes:
Salience.Options.Base.xxx
Salience.Options.Collections.xxx
Salience.Options.Concepts.xxx
Salience.Options.Entities.xxx
Salience.Options.QueryTopics.xxx
Salience.Options.Sentiment.xxx
where xxx is the name of the specific method you want to configure, such as Salience.Options.Base.setFailLongSentence.

Information on these classes is available in the Lexalytics Salience 5.1 Javadoc:

http://dev.lexalytics.com/doc/java-se5.1/

These API extension points are not parsed by the Text Enrichment component. The values are passed directly to the Salience Engine as is.
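To illustrate the pass-through behavior, the following Java sketch collects the Salience.Options.* entries from a loaded `java.util.Properties` object, leaving their values untouched. The class `OptionPartition` is hypothetical, and the sketch does not show how the component actually invokes the Salience API.

```java
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical sketch: separates Salience API options from te.* properties.
final class OptionPartition {
    static final String SALIENCE_PREFIX = "Salience.Options.";

    // Returns the entries that would be handed to the Salience Engine unparsed.
    static Map<String, String> salienceOptions(Properties props) {
        Map<String, String> passThrough = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            if (key.startsWith(SALIENCE_PREFIX)) {
                passThrough.put(key, props.getProperty(key)); // value kept verbatim
            }
        }
        return passThrough;
    }
}
```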

Sentiment activator interaction

Four configuration activation properties control Sentiment Analysis:
  • te.sentiment-analysis.enabled

    Enables or disables Sentiment Analysis on a global basis.

  • te.document-sentiment.enabled

    Enables or disables Document Sentiment Analysis.

  • te.entity-sentiment.enabled

    Enables or disables Entity Sentiment Analysis.

  • te.theme-sentiment.enabled

    Enables or disables Theme Sentiment Analysis.

If te.sentiment-analysis.enabled is set to false, Sentiment Analysis is disabled globally. The document, entity, and theme sentiment activators are all treated as false, regardless of the specific setting of the individual activators. No sentiment analysis of any type is performed.

If te.sentiment-analysis.enabled is set to true, you can enable and disable document, entity, and theme sentiment analysis in any combination. For example, if you are not interested in entity sentiment analysis, you can disable it but enable document and theme sentiment analysis.
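The interaction between the global switch and the per-level activators can be summarized in a short Java sketch. The class `SentimentActivators` is hypothetical. In particular, it treats a missing te.sentiment-analysis.enabled as false, because this section does not document that property's default, while the per-level activators default to true as stated in Tables 1 to 3.

```java
import java.util.Properties;

// Hypothetical sketch of the sentiment activator rules; not the component's code.
final class SentimentActivators {
    static final String GLOBAL_KEY = "te.sentiment-analysis.enabled";

    // level is "document", "entity", or "theme".
    static boolean isEnabled(Properties props, String level) {
        // Assumption: a missing global switch is treated as false (default not documented).
        // trim() drops trailing whitespace, which Properties.load keeps in values.
        boolean global = Boolean.parseBoolean(props.getProperty(GLOBAL_KEY, "false").trim());
        // Per-level activators default to true, as listed in Tables 1 to 3.
        boolean levelOn = Boolean.parseBoolean(
                props.getProperty("te." + level + "-sentiment.enabled", "true").trim());
        // The global switch overrides every per-level setting.
        return global && levelOn;
    }
}
```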

Customizing the theme, entity, and query topic sentiment cuts

If you are using Sentiment Analysis for themes, entities, and query topics, you can customize the number of cuts. The Named Entity (Table 2), Theme (Table 3), and Query Topics (Table 8) property tables above assume that you are using three cuts for negative, neutral, and positive scores, but you can use more or fewer cuts.

For example, named entities are added to different user-configured fields based on their sentiment scores. You can configure the output fields by specifying range thresholds and field names as follows (the fieldName and sentimentScore values are user-supplied):
te.entity-sentiment.cut1.label = fieldName1
te.entity-sentiment.cut1.value = sentimentScore1
te.entity-sentiment.cut2.label = fieldName2
te.entity-sentiment.cut2.value = sentimentScore2
te.entity-sentiment.cut3.label = fieldName3
te.entity-sentiment.cut3.value = sentimentScore3
...
te.entity-sentiment.cutN-1.label = fieldNameN-1
te.entity-sentiment.cutN-1.value = sentimentScoreN-1
te.entity-sentiment.cutN.label = fieldNameN

This field schema can be represented graphically by this illustration:

Field schema diagram

The above configuration specifies N different fields into which named entities are mapped based on their sentiment scores. Any entity whose sentiment score is between MIN_FLOAT and sentimentScore1 is placed in fieldName1. Then, any entity whose sentiment score is between sentimentScore1 and sentimentScore2 is placed in fieldName2, and so on. Finally, any entity whose sentiment score is between sentimentScoreN-1 and MAX_FLOAT is placed in fieldNameN.

The label can be any string that is allowed to be a field-name (e.g., EntitiesBucket1). The value can be any floating-point number.

Note: There are no default values for the above-mentioned properties in the Text Enrichment component. Therefore, a property will not be used unless you add it to the properties file, with a named label and a floating-point value.
The following is an example configuration:
te.entity-sentiment.cut1.label = EntitiesNegative
te.entity-sentiment.cut1.value = -0.1
te.entity-sentiment.cut2.label = EntitiesNeutral
te.entity-sentiment.cut2.value = 0.1
te.entity-sentiment.cut3.label = EntitiesPositive

Configure theme sentiment and query topic sentiment the same way. The only difference is the name of the fields used in the configuration.
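The bucketing rules described in this section can be sketched in Java as follows. The class `SentimentCuts` is hypothetical, and the sketch makes assumptions that this section does not state: cuts are numbered consecutively from cut1, cut values increase with the cut number, a score exactly equal to a cut value falls into the next cut up, and the open upper end of the last cut is represented by Double.POSITIVE_INFINITY rather than MAX_FLOAT. Because only the property prefix differs, the same code would apply to the theme and query topic cut properties.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical sketch of sentiment-cut bucketing; not the component's code.
final class SentimentCuts {
    static final class Cut {
        final String label;      // cutK.label
        final double upperBound; // cutK.value; +Infinity for the last cut

        Cut(String label, double upperBound) {
            this.label = label;
            this.upperBound = upperBound;
        }
    }

    // Reads prefix.cut1.*, prefix.cut2.*, ... until a label is missing.
    // prefix is, for example, "te.entity-sentiment".
    static List<Cut> fromProperties(Properties props, String prefix) {
        List<Cut> cuts = new ArrayList<>();
        for (int k = 1; ; k++) {
            String label = props.getProperty(prefix + ".cut" + k + ".label");
            if (label == null) {
                break;
            }
            String value = props.getProperty(prefix + ".cut" + k + ".value");
            // The last cut has no value, so it is open-ended on the high side.
            double bound = (value == null) ? Double.POSITIVE_INFINITY : Double.parseDouble(value.trim());
            cuts.add(new Cut(label.trim(), bound));
        }
        return cuts;
    }

    // Returns the label of the first cut whose upper bound is greater than the score.
    static String bucket(List<Cut> cuts, double score) {
        if (cuts.isEmpty()) {
            throw new IllegalArgumentException("no cuts configured");
        }
        for (Cut cut : cuts) {
            if (score < cut.upperBound) {
                return cut.label;
            }
        }
        // Reached only for NaN or +Infinity scores, or when every cut has a value.
        return cuts.get(cuts.size() - 1).label;
    }
}
```

With the example configuration above, a score of 0.0 lands in EntitiesNeutral and a score of 0.3 lands in EntitiesPositive.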

Sample Text Enrichment properties file

# Enable Sentiment Analysis on global basis
te.sentiment-analysis.enabled = true

# Enable Document Sentiment
te.document-sentiment.enabled = true
te.document-sentiment.field = DocumentSentiment

# Enable Entity extraction
te.entity.enabled = true
# Entity types to allow and their prefix
te.entity.types = Person, Company, Product, Place
te.entity.field-prefix = Entities
# Entity sentiment goes -0.1 < s < 0.1
te.entity-sentiment.enabled = true
te.entity-sentiment.cut1.label = EntitiesNegative
te.entity-sentiment.cut1.value = -0.1
te.entity-sentiment.cut2.label = EntitiesNeutral
te.entity-sentiment.cut2.value = 0.1
te.entity-sentiment.cut3.label = EntitiesPositive

# Enable Theme extraction
te.theme.enabled = true
te.theme.field = Themes
# Only keep themes with score greater than the threshold
te.theme.score.threshold = 0.0
# Of those that are above the threshold, only keep the best 50
te.theme.keep-max = 50

# Theme sentiment goes -0.1 < s < 0.1
te.theme-sentiment.cut1.label = ThemesNegative
te.theme-sentiment.cut1.value = -0.1
te.theme-sentiment.cut2.label = ThemesNeutral
te.theme-sentiment.cut2.value = 0.1
te.theme-sentiment.cut3.label = ThemesPositive
# Set meta-theme field and only keep those above 0.1
te.meta-theme.field = ThemesMeta
te.meta-theme.frequency.threshold = 0.1

# Enable Quotation extraction
te.quotation.enabled = true
te.quotation.field = Quotes
# Max length of a quotation, in characters
te.quotation.max-length = 400

#Enable query topic processing
te.query-topics.enabled = true 
te.query-topics.field = QueryTopics
#Set the location of the query topics definition file
Salience.Options.QueryTopics.setQueryTopicList = /localdisk/djones/lexalytics/salience-6.0/custom/QueryDefinedTopics.dat
te.query-topics-sentiment.enabled = true
te.query-topics-sentiment.cut1.label = QueryTopicsNegative
te.query-topics-sentiment.cut1.value = -0.1
te.query-topics-sentiment.cut2.label = QueryTopicsNeutral
te.query-topics-sentiment.cut2.value = 0.1
te.query-topics-sentiment.cut3.label = QueryTopicsPositive

#Enable Twitter processing
te.short-content.enabled = true 

# Summary is always enabled
te.summary.field = Summary
# Document summary length in sentences
te.summary.length = 2

# Set location of my user directory
te.salience.userdataDirectory=/localdisk/djones/lexalytics/salience-6.0/data/user
# Add my sentiment dictionary to the Salience default
te.sentiment.addSentimentDictionary=/localdisk/djones/lexalytics/salience-6.0/custom/custom.hsd