Text Enrichment properties file

The Text Enrichment properties file defines how the Salience Engine is configured for an instance of the Text Enrichment component. All instances of the component can use the same properties file, or you can use a different properties file for each instance.

The Text Enrichment properties file:

specifies whether supported text extraction features are enabled
specifies the input fields to process
defines scoring thresholds for assessments
defines the field names for assessment output data

The spelling and case of the configuration properties must match the spelling and case listed in the following sections.

The following sections and tables describe the configuration properties. Some setting values are user-variable (that is, you can customize the name or value), while others must use the specific values described in the tables.

Global Sentiment Analysis property

The following property provides overall control of sentiment analysis:

te.sentiment-analysis.enabled

If this property is set to `true`, sentiment analysis is available at all levels (document, entity, and theme), and each level can then be enabled or disabled individually. If it is set to `false`, no sentiment analysis is performed at any level.

If you want to use this feature, you must purchase a license that includes sentiment analysis.

Table 1. Document Sentiment Analysis properties
Document Sentiment property	Meaning
te.document-sentiment.enabled	If set to `true` (the default), Sentiment Analysis is enabled for documents and an overall sentiment score is computed for the current document. If set to `false`, Document Sentiment Analysis is disabled.
te.document-sentiment.field	Sets the target field name in the output records in which the sentiment score is written. The field name is user-variable, and `DocumentSentiment` is the default.
te.document-sentiment.use-chains	If set to `true` (the default), document sentiment scoring uses lexical chains to compute the sentiment score. If set to `false`, lexical chains are not used.

Table 2. Named Entity processing properties
Named Entity property	Meaning
te.entity.enabled	If set to `true` (the default), Named Entity Extraction is enabled. If set to `false`, Named Entity Extraction is disabled.
te.entity.types	Sets the types of named entities to extract. Supported types are `Company`, `Person`, `Place`, `Product`, `Sports`, `Title`, and `List` (`List` must be used if you have user-defined entities). The default types are `Person`, `Company`, and `Product`. Each configured entity type is written to a target field whose name is the entity type (such as `Person`) prefixed by the te.entity.field-prefix value.
te.entity.field-prefix	Sets the prefix that is used to build the final field names for the named entities. The prefix is user-variable, and `Entities` is the default. For example, if you set this value to `Entities` and configure the `Person` and `Company` entity types to be extracted, then `EntitiesPerson` and `EntitiesCompany` are the two target field names in the output records. If you have user-defined entities, all of them are written to the `EntitiesList` target field.
te.entity-sentiment.enabled	If set to `true` (the default), Sentiment Analysis is enabled for entities and a sentiment score is computed for the entities. If set to `false`, Entity Sentiment Analysis is disabled.
te.entity-sentiment.cut1.label	Sets the cut1 label, which is the name of the output field for negative entities. The field name is user-variable, and `EntitiesNegative` is the default. Negative entities are entities with a sentiment score less than the negative threshold.
te.entity-sentiment.cut1.value	Sets the `EntitiesNegative` threshold. The value is user-variable (for example, a value of `-0.1` can be used). Entities whose sentiment scores are less than this value are written to the `EntitiesNegative` field.
te.entity-sentiment.cut2.label	Sets the cut2 label, which is the name of the output field for neutral entities. The field name is user-variable, and `EntitiesNeutral` is the default. Neutral entities are entities with a sentiment score between the negative and positive thresholds.
te.entity-sentiment.cut2.value	Sets the `EntitiesPositive` threshold. The value is user-variable (for example, a value of `0.1` can be used). Entities whose sentiment scores are greater than this value are written to the `EntitiesPositive` field.
te.entity-sentiment.cut3.label	Sets the cut3 label, which is the name of the output field for positive entities. The field name is user-variable, and `EntitiesPositive` is the default. Positive entities are entities with a sentiment score greater than the positive threshold.
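The field-naming rule for entities can be illustrated with a short Java sketch. The class `EntityFieldNames` and its method are hypothetical helpers written for this topic, not part of the Text Enrichment component or the Salience API; the sketch assumes that te.entity.types is a comma-separated list, as in the sample properties file at the end of this topic.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of the component): shows how output field
// names are formed from te.entity.field-prefix and te.entity.types.
class EntityFieldNames {

    // Splits a comma-separated te.entity.types value and prepends the prefix
    // to each entity type, for example "Entities" + "Person" -> "EntitiesPerson".
    static List<String> fieldNames(String prefix, String typesValue) {
        List<String> names = new ArrayList<>();
        for (String type : typesValue.split(",")) {
            String trimmed = type.trim();
            if (!trimmed.isEmpty()) {
                names.add(prefix + trimmed);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Prints [EntitiesPerson, EntitiesCompany, EntitiesList]
        System.out.println(fieldNames("Entities", "Person, Company, List"));
    }
}
```

With user-defined entities, including `List` in the types produces the `EntitiesList` field described for te.entity.field-prefix.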

Table 3. Theme processing properties
Theme property	Meaning
te.theme.enabled	If set to `true` (the default), Theme Extraction is enabled. If set to `false`, Theme Extraction is disabled.
te.theme-type.enabled	Specifies whether output themes are standard or normalized. Valid values are `standard` and `normalized`. If you specify `normalized`, you must define a normalization.dat file in the directory %LEXALYTICS_HOME%/data/themes. For more information, see Normalizing themes. If normalization.dat does not exist, standard themes are output. If the properties file does not include this property, standard themes are output. If the value of the property is specified incorrectly, the graph fails.
te.theme.field	Sets the target field name in the output records in which kept theme names are written. The field name is user-variable, and `Themes` is the default. Kept themes are themes whose score is higher than the te.theme.score.threshold setting and that fall within the te.theme.keep-max cut-off.
te.theme.score.threshold	Sets a score threshold for keeping themes: only themes with a score greater than this threshold are kept. The value is user-variable, and `1.0` is the default.
te.theme.keep-max	Sets the maximum number of themes to keep. Of the themes that are above the te.theme.score.threshold setting, only the te.theme.keep-max themes with the best scores are kept. The value is user-variable, and the default is `100`.
te.theme-sentiment.enabled	If set to `true` (the default), Sentiment Analysis is enabled for themes and a sentiment score is computed for the themes. If set to `false`, Theme Sentiment Analysis is disabled.
te.theme-sentiment.cut1.label	Sets the cut1 label, which is the name of the output field for negative themes. The field name is user-variable, and `ThemesNegative` is the default. Negative themes are themes with a sentiment score less than the negative threshold.
te.theme-sentiment.cut1.value	Sets the `ThemesNegative` threshold. The value is user-variable (for example, a value of `-0.1` can be used). Themes whose sentiment scores are less than this value are written to the `ThemesNegative` field.
te.theme-sentiment.cut2.label	Sets the cut2 label, which is the name of the output field for neutral themes. The field name is user-variable, and `ThemesNeutral` is the default. Neutral themes are themes with a sentiment score between the negative and positive thresholds.
te.theme-sentiment.cut2.value	Sets the `ThemesPositive` threshold. The value is user-variable (for example, a value of `0.1` can be used). Themes whose sentiment scores are greater than this value are written to the `ThemesPositive` field.
te.theme-sentiment.cut3.label	Sets the cut3 label, which is the name of the output field for positive themes. The field name is user-variable, and `ThemesPositive` is the default. Positive themes are themes with a sentiment score greater than the positive threshold.
te.meta-theme.field	Sets the target field name in the output records in which the meta-themes are written. The field name is user-variable, and `ThemesMeta` is the default. Meta-themes are a list of themes in the document.
te.meta-theme.frequency.threshold	Sets a score threshold for keeping meta-themes. That is, only keep meta-themes with a score greater than this threshold. The value is user-variable, and `1.0` is the default.
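The way te.theme.score.threshold and te.theme.keep-max combine can be sketched in Java. `ThemeFilter` is a hypothetical helper, and the theme scores passed to it stand in for the scores that the Salience Engine computes; the sketch also assumes that themes with equal scores keep their input order, which this topic does not specify.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of the component): applies the two theme
// cut-offs described in Table 3 to a map of theme name -> theme score.
class ThemeFilter {

    // Keeps themes whose score is greater than scoreThreshold, then keeps
    // only the keepMax best-scoring themes, highest score first.
    static List<String> keptThemes(Map<String, Double> scores, double scoreThreshold, int keepMax) {
        List<Map.Entry<String, Double>> candidates = new ArrayList<>();
        for (Map.Entry<String, Double> entry : scores.entrySet()) {
            if (entry.getValue() > scoreThreshold) {
                candidates.add(entry);
            }
        }
        // List.sort is stable, so themes with equal scores keep their input order.
        candidates.sort(Map.Entry.<String, Double>comparingByValue(Comparator.reverseOrder()));

        List<String> kept = new ArrayList<>();
        for (int i = 0; i < Math.min(keepMax, candidates.size()); i++) {
            kept.add(candidates.get(i).getKey());
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, Double> scores = new LinkedHashMap<>();
        scores.put("pricing", 2.5);
        scores.put("battery life", 0.4);
        scores.put("customer service", 1.8);
        scores.put("shipping", 3.1);
        // With te.theme.score.threshold = 1.0 and te.theme.keep-max = 2,
        // prints [shipping, pricing]
        System.out.println(keptThemes(scores, 1.0, 2));
    }
}
```

The threshold is applied first and keep-max second, so lowering keep-max never admits a theme that is below the score threshold.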

Table 4. Quotation processing properties
Quotation property	Meaning
te.quotation.enabled	If set to `true` (the default), Quoted Content Extraction is enabled. If set to `false`, Quoted Content Extraction is disabled.
te.quotation.field	Sets the target field name in the output records in which quoted content is written. The field name is user-variable, and the default is `Quotes`.
te.quotation.max-length	Sets the maximum length (in characters) of a quotation. The default is 200. If a quotation in the source field is longer than this setting, it is not written to the target field.
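The te.quotation.max-length rule behaves like a length filter: an over-long quotation is dropped, not truncated. In this Java sketch, `QuotationFilter` is hypothetical, the quotations are assumed to be already extracted as strings, and length is counted in Unicode code points as an assumption about what "characters" means here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of the component): drops quotations that are
// longer than te.quotation.max-length, as described in Table 4.
class QuotationFilter {

    // Returns the quotations whose length, counted in Unicode code points,
    // does not exceed maxLength. Longer quotations are omitted entirely
    // rather than truncated.
    static List<String> keep(List<String> quotations, int maxLength) {
        List<String> kept = new ArrayList<>();
        for (String quote : quotations) {
            if (quote.codePointCount(0, quote.length()) <= maxLength) {
                kept.add(quote);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> quotes = List.of("We will ship in March.", "This quotation is far too long to keep.");
        // Prints [We will ship in March.]
        System.out.println(keep(quotes, 25));
    }
}
```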

Table 5. Social Media property
Social Media property	Meaning
te.short-content.enabled	If set to `true`, processing of Social Media content (for example, Twitter data) is enabled. If set to `false` (the default), Social Media processing is disabled. Social Media (short content) processing applies only to the default language data (English). If you are using language data other than English, make sure that this property is set to `false`.

Table 6. Document Summary properties
Document Summary property	Meaning
te.summary.field	Sets the column name in the output file in which the summarization of the input content is written. The field name is user-variable, and the default is `Summary`.
te.summary.length	Sets the document summary length in sentences. The default length is 3 sentences.

Table 7. Basic custom properties
Basic Custom properties	Meaning
te.salience.userdataDirectory	Takes an absolute path to a directory that contains a user-created data dictionary.
te.sentiment.setSentimentDictionary	Takes an absolute path to a user-created dictionary that will be used as the sentiment dictionary for the Salience Engine (that is, this dictionary overrides the default Salience sentiment dictionary).
te.sentiment.addSentimentDictionary	Takes an absolute path to a user-created dictionary that is used in addition to the current sentiment analysis dictionary. If te.sentiment.setSentimentDictionary is set, the additional dictionary supplements the user-created dictionary specified by that property. If te.sentiment.setSentimentDictionary is not set, the additional dictionary supplements the default Salience sentiment dictionary.
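The rules for te.sentiment.setSentimentDictionary and te.sentiment.addSentimentDictionary can be summarized as a small resolution function. This Java sketch is hypothetical: `SentimentDictionaries` is not a component or Salience class, the string `DEFAULT_DICTIONARY` is a placeholder for the default Salience sentiment dictionary, and the file paths are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of the component or the Salience API):
// returns the sentiment dictionaries in effect, base dictionary first.
class SentimentDictionaries {

    // Placeholder standing in for the default Salience sentiment dictionary.
    static final String DEFAULT_DICTIONARY = "<default Salience sentiment dictionary>";

    // setDictionary: value of te.sentiment.setSentimentDictionary, or null if unset.
    // addDictionary: value of te.sentiment.addSentimentDictionary, or null if unset.
    static List<String> inEffect(String setDictionary, String addDictionary) {
        List<String> dictionaries = new ArrayList<>();
        // A set dictionary replaces the default; otherwise the default is used.
        dictionaries.add(setDictionary != null ? setDictionary : DEFAULT_DICTIONARY);
        // An added dictionary supplements whichever base dictionary is in effect.
        if (addDictionary != null) {
            dictionaries.add(addDictionary);
        }
        return dictionaries;
    }

    public static void main(String[] args) {
        // Prints [<default Salience sentiment dictionary>, /opt/custom/custom.hsd]
        System.out.println(inEffect(null, "/opt/custom/custom.hsd"));
    }
}
```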

Table 8. Query topic processing properties
Query Topics property	Meaning
te.query-topics.enabled	If set to `true` (the default), query topic processing is enabled. If set to `false`, query topic processing is disabled.
te.query-topics.field	Sets the target field name in the output records to which specified query topics are written. The field name is user-variable, and `QueryTopics` is the default.
Salience.Options.QueryTopics.setQueryTopicList	Specifies the location and name of the file used to define the topics and queries you want to use to tag output from this instance of the Text Enrichment component.
te.query-topics-sentiment.enabled	If set to `true` (the default), Sentiment Analysis is enabled for query topics and a sentiment score is computed for the topics. If set to `false`, Query Topic Sentiment Analysis is disabled.
te.query-topic-sentiment.cut1.label	Sets the cut1 label, which is the name of the output field for negative query topics. The field name is user-variable, and `QueryTopicsNegative` is the default. Negative query topics are query topics with a sentiment score less than the negative threshold.
te.query-topic-sentiment.cut1.value	Sets the `QueryTopicsNegative` threshold. The value is user-variable (for example, a value of `-0.1` can be used). Query topics whose sentiment scores are less than this value are written to the `QueryTopicsNegative` field.
te.query-topic-sentiment.cut2.label	Sets the cut2 label, which is the name of the output field for neutral query topics. The field name is user-variable, and `QueryTopicsNeutral` is the default. Neutral query topics are query topics with a sentiment score between the negative and positive thresholds.
te.query-topic-sentiment.cut2.value	Sets the `QueryTopicsPositive` threshold. The value is user-variable (for example, a value of `0.1` can be used). Query topics whose sentiment scores are greater than this value are written to the `QueryTopicsPositive` field.
te.query-topic-sentiment.cut3.label	Sets the cut3 label, which is the name of the output field for positive query topics. The field name is user-variable, and `QueryTopicsPositive` is the default. Positive query topics are query topics with a sentiment score greater than the positive threshold.

Advanced Custom options

In addition to the te.* configuration properties listed above, you can set other options provided by the Salience API. You can use custom options from the following classes:

Salience.Options.Base.xxx
Salience.Options.Collections.xxx
Salience.Options.Concepts.xxx
Salience.Options.Entities.xxx
Salience.Options.QueryTopics.xxx
Salience.Options.Sentiment.xxx

where xxx is the name of the specific method you want to configure, such as Salience.Options.Base.setFailLongSentence.

Information on these classes is available in the Lexalytics Salience 5.1 Javadoc:

http://dev.lexalytics.com/doc/java-se5.1/

These API extension points are not parsed by the Text Enrichment component. Their values are passed directly to the Salience Engine as is.
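Conceptually, the component keeps its own te.* properties separate from Salience.Options.* entries, which it forwards unchanged. The following Java sketch shows one way to collect such pass-through entries with java.util.Properties; `SalienceOptions` is hypothetical, the paths are illustrative, and this topic does not say how the component actually reads the file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical helper (not part of the component): collects the
// Salience.Options.* entries that are passed to the Salience Engine as is.
class SalienceOptions {

    static final String PREFIX = "Salience.Options.";

    // Returns every Salience.Options.* key with its raw value, sorted by key.
    // Values are not parsed or validated here.
    static Map<String, String> passThrough(Properties props) {
        Map<String, String> options = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            if (key.startsWith(PREFIX)) {
                options.put(key, props.getProperty(key));
            }
        }
        return options;
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(
                "te.query-topics.enabled = true\n"
                + "Salience.Options.QueryTopics.setQueryTopicList = /opt/custom/QueryDefinedTopics.dat\n"));
        // Prints {Salience.Options.QueryTopics.setQueryTopicList=/opt/custom/QueryDefinedTopics.dat}
        System.out.println(passThrough(props));
    }
}
```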

Sentiment activator interaction

Four configuration activation properties control Sentiment Analysis:

te.sentiment-analysis.enabled
Enables or disables Sentiment Analysis on a global basis.
te.document-sentiment.enabled
Enables or disables Document Sentiment Analysis.
te.entity-sentiment.enabled
Enables or disables Entity Sentiment Analysis.
te.theme-sentiment.enabled
Enables or disables Theme Sentiment Analysis.

If te.sentiment-analysis.enabled is set to false, Sentiment Analysis is disabled globally. The document, entity, and theme sentiment activators are all treated as false, regardless of the specific setting of the individual activators. No sentiment analysis of any type is performed.

If te.sentiment-analysis.enabled is set to true, you can enable and disable document, entity, and theme sentiment analysis in any combination. For example, if you are not interested in entity sentiment analysis, you can disable it while keeping document and theme sentiment analysis enabled.
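The interaction between the global and per-level activators amounts to a logical AND. In this Java sketch, `SentimentActivators` is hypothetical. A missing per-level property is treated as `true` (the default listed in the property tables), while a missing global property is treated as `false`; that second choice is an assumption, because this topic does not state a default for te.sentiment-analysis.enabled.

```java
import java.util.Properties;

// Hypothetical helper (not part of the component): decides whether sentiment
// analysis runs at one level (document, entity, or theme).
class SentimentActivators {

    static final String GLOBAL_KEY = "te.sentiment-analysis.enabled";

    // levelKey is, for example, "te.entity-sentiment.enabled".
    static boolean isEnabled(Properties props, String levelKey) {
        // Assumption: a missing global property is treated as false.
        boolean global = Boolean.parseBoolean(props.getProperty(GLOBAL_KEY, "false").trim());
        // Per-level activators default to true, as listed in the property tables.
        boolean level = Boolean.parseBoolean(props.getProperty(levelKey, "true").trim());
        // When the global switch is false, every per-level setting is ignored.
        return global && level;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(GLOBAL_KEY, "true");
        props.setProperty("te.entity-sentiment.enabled", "false");
        // Prints "document=true entity=false"
        System.out.println("document=" + isEnabled(props, "te.document-sentiment.enabled")
                + " entity=" + isEnabled(props, "te.entity-sentiment.enabled"));
    }
}
```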

Customizing the theme, entity, and query topic sentiment cuts

If you are using Sentiment Analysis for themes, entities, and query topics, you can customize the number of cuts. The Named Entity, Theme, and Query Topic property tables above assume three cuts (for negative, neutral, and positive scores), but you can use more or fewer cuts.

For example, named entities are added to different user-configured fields based on their sentiment scores. You can configure the output fields by specifying range thresholds and field names as follows, where fieldName1 through fieldNameN and sentimentScore1 through sentimentScoreN-1 are user-supplied values:

te.entity-sentiment.cut1.label = fieldName1
te.entity-sentiment.cut1.value = sentimentScore1
te.entity-sentiment.cut2.label = fieldName2
te.entity-sentiment.cut2.value = sentimentScore2
te.entity-sentiment.cut3.label = fieldName3
te.entity-sentiment.cut3.value = sentimentScore3
...
te.entity-sentiment.cutN-1.label = fieldNameN-1
te.entity-sentiment.cutN-1.value = sentimentScoreN-1
te.entity-sentiment.cutN.label = fieldNameN

[Illustration: the field schema, with the sentiment-score range divided into consecutive intervals mapped to fieldName1 through fieldNameN]

The above configuration specifies N different fields into which named entities are mapped based on their sentiment scores. Any entity whose sentiment score is between MIN_FLOAT and sentimentScore1 is placed in fieldName1. Any entity whose sentiment score is between sentimentScore1 and sentimentScore2 is placed in fieldName2, and so on. Finally, any entity whose sentiment score is between sentimentScoreN-1 and MAX_FLOAT is placed in fieldNameN.

The label can be any string that is allowed to be a field-name (e.g., EntitiesBucket1). The value can be any floating-point number.

Note: The Text Enrichment component has no default values for the cut properties described above. Therefore, a cut is not used unless you add it to the properties file with a label and, for every cut except the last, a floating-point value.

The following is an example configuration:

te.entity-sentiment.cut1.label = EntitiesNegative
te.entity-sentiment.cut1.value = -0.1
te.entity-sentiment.cut2.label = EntitiesNeutral
te.entity-sentiment.cut2.value = 0.1
te.entity-sentiment.cut3.label = EntitiesPositive

Configure theme sentiment and query topic sentiment the same way. The only difference is the name of the fields used in the configuration.
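The range mapping described above can be sketched in Java. `SentimentCuts` is a hypothetical helper, not part of the component: it reads the cutN.label and cutN.value properties for a given prefix (such as te.entity-sentiment or te.theme-sentiment) and returns the field for a score. This topic does not say which field receives a score exactly equal to a threshold; the sketch assumes it goes to the higher range.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper (not part of the component): maps a sentiment score to
// an output field using prefix.cut1.label/.value ... prefix.cutN.label.
class SentimentCuts {

    private final List<String> labels = new ArrayList<>();
    private final List<Double> upperBounds = new ArrayList<>();

    // Reads cuts in order until a label is missing. Every cut except the last
    // must have a value; the last cut has a label only.
    static SentimentCuts fromProperties(Properties props, String prefix) {
        SentimentCuts cuts = new SentimentCuts();
        for (int i = 1; ; i++) {
            String label = props.getProperty(prefix + ".cut" + i + ".label");
            if (label == null) {
                break;
            }
            cuts.labels.add(label.trim());
            String value = props.getProperty(prefix + ".cut" + i + ".value");
            if (value == null) {
                break; // the label-only cut is the last one
            }
            cuts.upperBounds.add(Double.parseDouble(value.trim()));
        }
        // N labels need exactly N-1 thresholds (and at least one label).
        if (cuts.labels.size() != cuts.upperBounds.size() + 1) {
            throw new IllegalArgumentException("expected N labels and N-1 values for " + prefix);
        }
        return cuts;
    }

    // Returns the label of the first range whose upper bound exceeds the score;
    // scores at or above the last threshold fall into the last field.
    String fieldFor(double score) {
        for (int i = 0; i < upperBounds.size(); i++) {
            if (score < upperBounds.get(i)) {
                return labels.get(i);
            }
        }
        return labels.get(labels.size() - 1);
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("te.entity-sentiment.cut1.label", "EntitiesNegative");
        props.setProperty("te.entity-sentiment.cut1.value", "-0.1");
        props.setProperty("te.entity-sentiment.cut2.label", "EntitiesNeutral");
        props.setProperty("te.entity-sentiment.cut2.value", "0.1");
        props.setProperty("te.entity-sentiment.cut3.label", "EntitiesPositive");
        SentimentCuts cuts = fromProperties(props, "te.entity-sentiment");
        // Prints "EntitiesNegative EntitiesNeutral EntitiesPositive"
        System.out.println(cuts.fieldFor(-0.5) + " " + cuts.fieldFor(0.0) + " " + cuts.fieldFor(0.7));
    }
}
```

Because the number of cuts is read from the file, the same helper handles two cuts or five cuts without changes.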

Sample Text Enrichment properties file

# Enable Sentiment Analysis on global basis
te.sentiment-analysis.enabled = true

# Enable Document Sentiment
te.document-sentiment.enabled = true
te.document-sentiment.field = DocumentSentiment

# Enable Entity extraction
te.entity.enabled = true
# Entity types to allow and their prefix
te.entity.types = Person, Company, Product, Place
te.entity.field-prefix = Entities
# Entity sentiment: negative below -0.1, neutral from -0.1 to 0.1, positive above 0.1
te.entity-sentiment.enabled = true
te.entity-sentiment.cut1.label = EntitiesNegative
te.entity-sentiment.cut1.value = -0.1
te.entity-sentiment.cut2.label = EntitiesNeutral
te.entity-sentiment.cut2.value = 0.1
te.entity-sentiment.cut3.label = EntitiesPositive

# Enable Theme extraction
te.theme.enabled = true
te.theme.field = Themes
# Only keep themes with score greater than the threshold
te.theme.score.threshold = 0.0
# Of those that are above the threshold, only keep the best 50
te.theme.keep-max = 50

# Theme sentiment: negative below -0.1, neutral from -0.1 to 0.1, positive above 0.1
te.theme-sentiment.cut1.label = ThemesNegative
te.theme-sentiment.cut1.value = -0.1
te.theme-sentiment.cut2.label = ThemesNeutral
te.theme-sentiment.cut2.value = 0.1
te.theme-sentiment.cut3.label = ThemesPositive
# Set meta-theme field and only keep those above 0.1
te.meta-theme.field = ThemesMeta
te.meta-theme.frequency.threshold = 0.1

# Enable Quotation extraction
te.quotation.enabled = true
te.quotation.field = Quotes
# Max length of a quotation, in characters
te.quotation.max-length = 400

# Enable query topic processing
te.query-topics.enabled = true
te.query-topics.field = QueryTopics
# Set the location of the query topics definition file
Salience.Options.QueryTopics.setQueryTopicList = /localdisk/djones/lexalytics/salience-6.0/custom/QueryDefinedTopics.dat
te.query-topics-sentiment.enabled = true
te.query-topics-sentiment.cut1.label = QueryTopicsNegative
te.query-topics-sentiment.cut1.value = -0.1
te.query-topics-sentiment.cut2.label = QueryTopicsNeutral
te.query-topics-sentiment.cut2.value = 0.1
te.query-topics-sentiment.cut3.label = QueryTopicsPositive

# Enable Social Media (short content) processing, for example Twitter data
te.short-content.enabled = true

# Summary is always enabled
te.summary.field = Summary
# Document summary length in sentences
te.summary.length = 2

# Set location of my user directory
te.salience.userdataDirectory=/localdisk/djones/lexalytics/salience-6.0/data/user
# Add my sentiment dictionary to the Salience default
te.sentiment.addSentimentDictionary=/localdisk/djones/lexalytics/salience-6.0/custom/custom.hsd
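If you write your own tooling to inspect a properties file like this one with java.util.Properties, note that Properties ignores whitespace around the = separator but keeps trailing whitespace as part of the value. A line such as `te.short-content.enabled = true ` (with a trailing space) therefore yields the string "true ", which Boolean.parseBoolean reads as false. The following sketch trims values before parsing; `PropertiesCheck` is hypothetical and assumes nothing about how the Text Enrichment component itself parses the file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical tooling sketch (not part of the component): reads boolean
// activators from a Text Enrichment properties file, trimming values first.
class PropertiesCheck {

    // Returns the trimmed boolean value of key, or defaultValue if the key is absent.
    static boolean flag(Properties props, String key, boolean defaultValue) {
        String raw = props.getProperty(key);
        return raw == null ? defaultValue : Boolean.parseBoolean(raw.trim());
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        // Note the trailing space after "true" on the first line.
        props.load(new StringReader("te.short-content.enabled = true \nte.summary.length = 2\n"));
        // Untrimmed value "true " parses as false.
        System.out.println(Boolean.parseBoolean(props.getProperty("te.short-content.enabled")));
        // Trimmed value parses as true.
        System.out.println(flag(props, "te.short-content.enabled", false));
    }
}
```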