The Text Enrichment Properties file defines the configuration of the Salience Engine for the Text Enrichment component instance. All instances of the component can use the same properties file, or you can use different properties files to support different instances of the component.
The spelling and case of the configuration properties must match the spelling and case listed in the following sections.
The following sections and tables describe the configuration properties. Some of the setting values are user-variable (in other words, the user can customize the name or setting), while others must use the specific values described in the table.
te.sentiment-analysis.enabled
If this property is set to true, all levels of sentiment analysis (document, entity, and theme) are enabled. Each level can then be enabled and disabled individually.
If you want to use this feature, you must purchase a license that includes sentiment analysis.
Document Sentiment property | Meaning |
---|---|
te.document-sentiment.enabled | If set to true (the default), Sentiment Analysis is enabled for documents and an overall sentiment score is computed for the current document. If set to false, Document Sentiment Analysis is disabled. |
te.document-sentiment.field | Sets the target field name in the output records in which the sentiment score is written. The field name is user-variable, and DocumentSentiment is the default. |
te.document-sentiment.use-chains | If set to true (the default), document sentiment scoring uses lexical chains in the computation of the sentiment score. |
Named Entity property | Meaning |
---|---|
te.entity.enabled | If set to true (the default), Named Entity Extraction is enabled. If set to false, Named Entity Extraction is disabled. |
te.entity.types | Sets the types of named entities to extract. The default types are Person, Company, and Product. Each configured entity type is written to a target field whose name is made up of the entity type (such as Person) prefixed by the te.entity.field-prefix value. |
te.entity.field-prefix | Sets the prefix name that is used to determine the final field names for the named entities. The field name is user-variable, and Entities is the default. For example, if you set this value to Entities and you configure Person and Company entity types to be extracted, then EntitiesPerson and EntitiesCompany will be the two target field names in the output records. If you have user-defined entities, then all the user-defined entities are put in the EntitiesList target field. |
te.entity-sentiment.enabled | If set to true (the default), Sentiment Analysis is enabled for entities and a sentiment score is computed for the entities. If set to false, Entity Sentiment Analysis is disabled. |
te.entity-sentiment.cut1.label | Sets the value for this cut1 label. This will be the column name for the negative entities to be extracted. The field name is user-variable, and EntitiesNegative is the default. Negative entities are entities with a score less than the negative threshold. |
te.entity-sentiment.cut1.value | Sets the EntitiesNegative threshold. The value is user-variable (for example, a value of -0.1 can be used). Entity-sentiment scores that are less than this value are written to the EntitiesNegative record field. |
te.entity-sentiment.cut2.label | Sets the value for this cut2 label. This will be the column name for the neutral type of entity sentiments to be extracted. Neutral entities are entities with a sentiment score between the negative and positive thresholds. The field name is user-variable, and EntitiesNeutral is the default. |
te.entity-sentiment.cut2.value | Sets the EntitiesPositive threshold. The value is user-variable (for example, a value of 0.1 can be used). Entity-sentiment scores that are greater than this value are written to the EntitiesPositive record field. |
te.entity-sentiment.cut3.label | Sets the value for this cut3 label. This is the column name for the positive type of entity sentiments to be extracted. Positive entities are entities with a score greater than the positive threshold. The field name is user-variable, and EntitiesPositive is the default. |
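
As a rough illustration of the naming rule described for te.entity.field-prefix, the following Python sketch derives the target field names from the prefix and the configured entity types. The helper name `entity_field_names` is hypothetical, and splitting te.entity.types on commas is an assumption based on the sample properties file; this is not the component's actual implementation.

```python
def entity_field_names(prefix, entity_types):
    """Build target field names as <prefix><type> (hypothetical helper)."""
    # Assumption: te.entity.types is a comma-separated list of type names.
    types = [t.strip() for t in entity_types.split(",") if t.strip()]
    return [prefix + t for t in types]


# Mirrors the example in the table: prefix "Entities", types Person and Company.
print(entity_field_names("Entities", "Person, Company"))
# ['EntitiesPerson', 'EntitiesCompany']
```
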
Theme property | Meaning |
---|---|
te.theme.enabled | If set to true (the default), Theme Extraction is enabled. If set to false, Theme Extraction is disabled. |
te.theme-type.enabled | Specifies whether output themes are standard or normalized. Valid values are standard and normalized. If you specify normalized as the value for this property, you must define a normalization.dat file in the directory %LEXALYTICS_HOME%/data/themes. For more information, see Normalizing themes. If the file normalization.dat does not exist, standard themes are output. If the text enrichment properties file does not include this property, standard themes are output. If the value of the property is specified incorrectly, the graph fails. |
te.theme.field | Sets the target field name in the output records in which kept theme names are written. The field name is user-variable, and Themes is the default. Kept themes are those whose score is higher than the te.theme.score.threshold setting and that fall within the te.theme.keep-max cut-off. |
te.theme.score.threshold | Sets a score threshold for keeping themes. That is, only keep themes with a score greater than this threshold. The value is user-variable, and 1.0 is the default. |
te.theme.keep-max | Sets a threshold for keeping the best themes. That is, of those themes that are above the te.theme.score.threshold setting, only keep the themes with the best scores. The value is user-variable, and the default is 100. |
te.theme-sentiment.enabled | If set to true (the default), Sentiment Analysis is enabled for themes and a sentiment score is computed for the themes. If set to false, Theme Sentiment Analysis is disabled. |
te.theme-sentiment.cut1.label | Sets the value for this cut1 label. The field name is user-variable, and ThemesNegative is the default. Negative themes are themes with a score less than the negative threshold. |
te.theme-sentiment.cut1.value | Sets the ThemesNegative threshold. The value is user-variable (for example, a value of -0.1 can be used). |
te.theme-sentiment.cut2.label | Sets the value for the cut2 label. The field name is user-variable, and ThemesNeutral is the default. Neutral themes are themes with a sentiment score between the negative and positive thresholds. |
te.theme-sentiment.cut2.value | Sets the ThemesPositive threshold. The value is user-variable (for example, a value of 0.1 can be used). ThemesPositive are themes with a score greater than the positive threshold. |
te.theme-sentiment.cut3.label | Sets the value for this cut3 label. The field name is user-variable, and ThemesPositive is the default. Positive themes are themes with a score greater than the positive threshold. |
te.meta-theme.field | Sets the target field name in the output records in which the meta-themes are written. The field name is user-variable, and ThemesMeta is the default. Meta-themes are a list of themes in the document. |
te.meta-theme.frequency.threshold | Sets a score threshold for keeping meta-themes. That is, only keep meta-themes with a score greater than this threshold. The value is user-variable, and 1.0 is the default. |
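
The interaction between te.theme.score.threshold and te.theme.keep-max can be pictured with a small Python sketch. It assumes that "best" means highest score and that themes arrive as (name, score) pairs; the function name `keep_themes` and the sample data are hypothetical and only illustrate the two-step filter described in the table.

```python
def keep_themes(scored_themes, score_threshold=1.0, keep_max=100):
    """Return the names of kept themes, highest score first (illustrative only)."""
    # Step 1: keep only themes whose score is greater than the threshold.
    above = [(name, score) for name, score in scored_themes if score > score_threshold]
    # Step 2: of those, keep at most keep_max themes with the best scores.
    above.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in above[:keep_max]]


themes = [("battery life", 3.2), ("screen", 0.8), ("customer service", 2.1), ("price", 1.5)]
print(keep_themes(themes, score_threshold=1.0, keep_max=2))
# ['battery life', 'customer service']
```
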
Quotation property | Meaning |
---|---|
te.quotation.enabled | If set to true (the default), Quoted Context Extraction is enabled. If set to false, Quoted Context Extraction is disabled. |
te.quotation.field | Sets the target field name in the output records in which quoted content is written. The field name is user-variable, and the default is Quotes. |
te.quotation.max-length | Sets the maximum length (in characters) of a quotation. The default length is 200. Note that if the quotation in the source field is longer than this setting, the source quotation is not written to the target field. |
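
Note that te.quotation.max-length drops over-long quotations rather than truncating them. A minimal Python sketch of that behavior follows; the function name `keep_quotations` is hypothetical, and measuring length with Python's `len` is an assumption about how characters are counted.

```python
def keep_quotations(quotations, max_length=200):
    """Keep only quotations that fit within max_length characters."""
    # Quotations longer than max_length are omitted entirely, not shortened.
    return [q for q in quotations if len(q) <= max_length]


short_quote = "It works as advertised."
long_quote = "x" * 201  # one character over the default limit
print(keep_quotations([short_quote, long_quote]))
# ['It works as advertised.']
```
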
Social Media property | Meaning |
---|---|
te.short-content.enabled | If set to true, the processing of Social Media (for example, Twitter data) is enabled. If set to false (the default), Social Media processing is disabled. Note that Social Media (short content) processing applies only to the default language data (that is, English). If you are using language data other than English, make sure to set this property to false. |
Document Summary property | Meaning |
---|---|
te.summary.field | Sets the column name in the output file in which the summarization of the input content is written. The field name is user-variable, and the default is Summary. |
te.summary.length | Sets the document summary length in sentences. The default length is 3 sentences. |
Basic Custom properties | Meaning |
---|---|
te.salience.userdataDirectory | Takes an absolute path to a directory that contains a user-created data dictionary. |
te.sentiment.setSentimentDictionary | Takes an absolute path to a user-created dictionary that will be used as the sentiment dictionary for the Salience Engine (that is, this dictionary overrides the default Salience sentiment dictionary). |
te.sentiment.addSentimentDictionary | Takes an absolute path to a user-created dictionary that will be used in addition to the current sentiment analysis dictionary. |
Query Topics property | Meaning |
---|---|
te.query-topics.enabled | If set to true (the default), query topic processing is enabled. If set to false, query topic processing is disabled. |
te.query-topics.field | Sets the target field name in the output records to which specified query topics are written. The field name is user-variable, and QueryTopics is the default. |
Salience.Options.QueryTopics.setQueryTopicList | Specifies the location and name of the file used to define the topics and queries you want to use to tag output from this instance of the Text Enrichment component. |
te.query-topics-sentiment.enabled | If set to true (the default), Sentiment Analysis is enabled for query topics and a sentiment score is computed for the topics. If set to false, Query Topic Sentiment Analysis is disabled. |
te.query-topic-sentiment.cut1.label | Sets the value for this cut1 label. The field name is user-variable, and QueryTopicsNegative is the default. Negative query topics are query topics with a score less than the negative threshold. |
te.query-topic-sentiment.cut1.value | Sets the QueryTopicsNegative threshold. The value is user-variable (for example, a value of -0.1 can be used). |
te.query-topic-sentiment.cut2.label | Sets the value for the cut2 label. The field name is user-variable, and QueryTopicsNeutral is the default. Neutral query topics are query topics with a sentiment score between the negative and positive thresholds. |
te.query-topic-sentiment.cut2.value | Sets the QueryTopicsPositive threshold. The value is user-variable (for example, a value of 0.1 can be used). Positive query topics are query topics with a score greater than the positive threshold. |
te.query-topic-sentiment.cut3.label | Sets the value for this cut3 label. The field name is user-variable, and QueryTopicsPositive is the default. Positive query topics are query topics with a score greater than the positive threshold. |
    Salience.Options.Base.xxx
    Salience.Options.Collections.xxx
    Salience.Options.Concepts.xxx
    Salience.Options.Entities.xxx
    Salience.Options.QueryTopics.xxx
    Salience.Options.Sentiment.xxx

where xxx is the name of the specific method you want to configure, such as Salience.Options.Base.setFailLongSentence.
Information on these classes is available in the Lexalytics Salience 5.1 Javadoc:
http://dev.lexalytics.com/doc/java-se5.1/
These API extension points are not parsed by the Text Enrichment component. The values are passed directly to the Salience Engine as is.
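Property | Meaning |
---|---|
te.sentiment-analysis.enabled | Enables or disables Sentiment Analysis on a global basis. |
te.document-sentiment.enabled | Enables or disables Document Sentiment Analysis. |
te.entity-sentiment.enabled | Enables or disables Entity Sentiment Analysis. |
te.theme-sentiment.enabled | Enables or disables Theme Sentiment Analysis. |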
If te.sentiment-analysis.enabled is set to false, Sentiment Analysis is disabled globally. The document, entity, and theme sentiment activators are all treated as false, regardless of the specific setting of the individual activators. No sentiment analysis of any type is performed.
If te.sentiment-analysis.enabled is set to true, you can enable and disable document, entity, and theme sentiment analysis in any combination. For example, if you are not interested in entity sentiment analysis, you can disable it but enable document and theme sentiment analysis.
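
The relationship between the global activator and the individual activators can be sketched in Python as follows. This is only an illustration of the rules above, not the component's code: the function name `effective_sentiment_flags` is hypothetical, properties are modeled as a plain dictionary of strings, and treating a missing global property as false is an assumption (this section does not state its default).

```python
def effective_sentiment_flags(props):
    """Return which sentiment levels actually run for a set of properties."""

    def flag(key, default):
        # Property values are strings; anything other than "true" counts as false.
        return props.get(key, str(default)).strip().lower() == "true"

    # Assumption: the global switch is treated as false when it is absent.
    global_on = flag("te.sentiment-analysis.enabled", False)
    levels = {
        "document": "te.document-sentiment.enabled",
        "entity": "te.entity-sentiment.enabled",
        "theme": "te.theme-sentiment.enabled",
    }
    # Individual activators default to true, but only count when the global switch is on.
    return {level: global_on and flag(key, True) for level, key in levels.items()}


props = {
    "te.sentiment-analysis.enabled": "true",
    "te.entity-sentiment.enabled": "false",
}
print(effective_sentiment_flags(props))
# {'document': True, 'entity': False, 'theme': True}
```
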
If you are using Sentiment Analysis for themes, entities, and query topics, you can customize the number of cuts. The "Named Entity property", "Theme property", and "Query Topics property" tables above assume that you are using three cuts for negative, neutral, and positive scores, but you can use more or fewer cuts.
    te.entity-sentiment.cut1.label = fieldName1
    te.entity-sentiment.cut1.value = sentimentScore1
    te.entity-sentiment.cut2.label = fieldName2
    te.entity-sentiment.cut2.value = sentimentScore2
    te.entity-sentiment.cut3.label = fieldName3
    te.entity-sentiment.cut3.value = sentimentScore3
    ...
    te.entity-sentiment.cutN-1.label = fieldNameN-1
    te.entity-sentiment.cutN-1.value = sentimentScoreN-1
    te.entity-sentiment.cutN.label = fieldNameN
This field schema can be represented graphically by this illustration:
The above configuration specifies N different fields into which the named-entities will be mapped based on their sentiment-scores. Any entity whose sentiment-score is between MIN_FLOAT and sentimentScore1 will be placed in fieldName1. Then, any entity whose sentiment-score is between sentimentScore1 and sentimentScore2 will be placed in fieldName2, and so on. Finally, any entity whose sentiment score is between sentimentScoreN-1 and MAX_FLOAT will be placed in fieldNameN.
The label can be any string that is allowed to be a field-name (e.g., EntitiesBucket1). The value can be any floating-point number.
    te.entity-sentiment.cut1.label = EntitiesNegative
    te.entity-sentiment.cut1.value = -0.1
    te.entity-sentiment.cut2.label = EntitiesNeutral
    te.entity-sentiment.cut2.value = 0.1
    te.entity-sentiment.cut3.label = EntitiesPositive
Configure theme sentiment and query topic sentiment the same way. The only difference is the name of the fields used in the configuration.
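
The mapping from a sentiment score to one of N fields can be sketched in Python as shown below. This is a minimal illustration under stated assumptions, not the Salience Engine's implementation: the function name `assign_bucket` is hypothetical, and a score exactly equal to a threshold is assumed to fall into the higher bucket, because the boundary handling is not fully specified here.

```python
from bisect import bisect_right


def assign_bucket(score, labels, thresholds):
    """Return the field label whose score range contains `score`.

    labels:     N field names (cut1.label .. cutN.label)
    thresholds: N-1 ascending values (cut1.value .. cutN-1.value)
    """
    if len(labels) != len(thresholds) + 1:
        raise ValueError("expected exactly one more label than thresholds")
    if list(thresholds) != sorted(thresholds):
        raise ValueError("thresholds must be in ascending order")
    # Count the thresholds at or below the score; that count is the bucket index.
    # Assumption: a score equal to a threshold goes to the higher bucket.
    return labels[bisect_right(thresholds, score)]


labels = ["EntitiesNegative", "EntitiesNeutral", "EntitiesPositive"]
thresholds = [-0.1, 0.1]
for score in (-0.5, 0.0, 0.7):
    print(score, assign_bucket(score, labels, thresholds))
# -0.5 EntitiesNegative
# 0.0 EntitiesNeutral
# 0.7 EntitiesPositive
```

The same function covers theme and query topic sentiment by passing their labels and thresholds; adding a cut means adding one label and one threshold.
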
    # Enable Sentiment Analysis on global basis
    te.sentiment-analysis.enabled = true
    # Enable Document Sentiment
    te.document-sentiment.enabled = true
    te.document-sentiment.field = DocumentSentiment
    # Enable Entity extraction
    te.entity.enabled = true
    # Entity types to allow and their prefix
    te.entity.types = Person, Company, Product, Place
    te.entity.field-prefix = Entities
    # Entity sentiment goes -0.1 < s < 0.1
    te.entity-sentiment.enabled = true
    te.entity-sentiment.cut1.label = EntitiesNegative
    te.entity-sentiment.cut1.value = -0.1
    te.entity-sentiment.cut2.label = EntitiesNeutral
    te.entity-sentiment.cut2.value = 0.1
    te.entity-sentiment.cut3.label = EntitiesPositive
    # Enable Theme extraction
    te.theme.enabled = true
    te.theme.field = Themes
    # Only keep themes with score greater than the threshold
    te.theme.score.threshold = 0.0
    # Of those that are above the threshold, only keep the best 50
    te.theme.keep-max = 50
    # Theme sentiment goes -0.1 < s < 0.1
    te.theme-sentiment.cut1.label = ThemesNegative
    te.theme-sentiment.cut1.value = -0.1
    te.theme-sentiment.cut2.label = ThemesNeutral
    te.theme-sentiment.cut2.value = 0.1
    te.theme-sentiment.cut3.label = ThemesPositive
    # Set meta-theme field and only keep those above 0.1
    te.meta-theme.field = ThemesMeta
    te.meta-theme.frequency.threshold = 0.1
    # Enable Quotation extraction
    te.quotation.enabled = true
    te.quotation.field = Quotes
    # Max length of a quotation, in characters
    te.quotation.max-length = 400
    # Enable query topic processing
    te.query-topics.enabled = true
    te.query-topics.field = QueryTopics
    # Set the location of the query topics definition file
    Salience.Options.QueryTopics.setQueryTopicList = /localdisk/djones/lexalytics/salience-6.0/custom/QueryDefinedTopics.dat
    te.query-topics-sentiment.enabled = true
    te.query-topics-sentiment.cut1.label = QueryTopicsNegative
    te.query-topics-sentiment.cut1.value = -0.1
    te.query-topics-sentiment.cut2.label = QueryTopicsEntitiesNeutral
    te.query-topics-sentiment.cut2.value = 0.1
    te.query-topics-sentiment.cut3.label = QueryTopicsPositive
    # Enable Twitter processing
    te.short-content.enabled = true
    # Summary is always enabled
    te.summary.field = Summary
    # Document summary length in sentences
    te.summary.length = 2
    # Set location of my user directory
    te.salience.userdataDirectory=/localdisk/djones/lexalytics/salience-6.0/data/user
    # Add my sentiment dictionary to the Salience default
    te.sentiment.addSentimentDictionary=/localdisk/djones/lexalytics/salience-6.0/custom/custom.hsd