Updating Sentiment Analysis models

This topic describes how to set up and update the two Sentiment Analysis models with new training data.

The training data set for each language consists of two input files, named as follows:
  • <lang>_pos.txt contains text with positive sentiment.
  • <lang>_neg.txt contains text with negative sentiment.

<lang> is a supported language code: en (UK/US English), fr (French), de (German), it (Italian), or es (Spanish).

Each text file must contain one sentence per line. Train your sentiment model on examples of the same type of data that the model will see in use. For example, if you want to determine the sentiment of tweets, obtain example tweets as training data. You can either provide your own data or buy it. A good model needs at least several hundred examples, and preferably thousands.
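As a quick sanity check before training, you can count how many examples each file holds. The following is a minimal sketch, not part of the product: the function name count_examples and the sample sentences are made up for illustration, and it assumes that every non-blank line is one example.

```shell
#!/bin/sh
# Count the non-blank lines in a training file (assumption: one example per line).
count_examples() {
    # grep -c prints the number of lines that contain at least one
    # non-whitespace character; blank lines are not counted.
    grep -c '[^[:space:]]' "$1"
}

# Demo with throwaway files; the sentences below are invented sample data.
demo_dir=$(mktemp -d)
printf 'Great phone, love it.\n\nBest service ever.\n' > "$demo_dir/en_pos.txt"
printf 'Terrible battery life.\n' > "$demo_dir/en_neg.txt"

for f in "$demo_dir/en_pos.txt" "$demo_dir/en_neg.txt"; do
    echo "$(basename "$f"): $(count_examples "$f") examples"
done
```

A count far below several hundred for either file suggests that more data is needed before you update the model.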

Each language-specific set of training files must reside in a directory whose name corresponds to the language of the files. The directory names are:
  • american
  • french
  • german
  • italian
  • spanish
The suggested naming structure of the entire directory is:
/<root>/models/sentiment/<language>
where <language> is one of the directory names listed above.
Create a <language> directory only if you intend to build models for that language. For example, you can have these two language directories:
/share/models/sentiment/american
/share/models/sentiment/french

The american directory would have the en_pos.txt and en_neg.txt files, while the french directory would have the fr_pos.txt and fr_neg.txt files.
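The layout described above can be created and checked with a short script. This is a sketch under stated assumptions: the helper names lang_code_for and check_language_dir are hypothetical, the directory-to-prefix mapping for german, italian, and spanish is inferred from the two lists above, and the demo uses a temporary root instead of /share so that it can be run anywhere.

```shell
#!/bin/sh
# Map a language directory name to its file prefix (mapping inferred from the lists above).
lang_code_for() {
    case "$1" in
        american) echo en ;;
        french)   echo fr ;;
        german)   echo de ;;
        italian)  echo it ;;
        spanish)  echo es ;;
        *)        return 1 ;;
    esac
}

# Report whether a language directory holds both expected training files.
check_language_dir() {
    dir=$1
    code=$(lang_code_for "$(basename "$dir")") || {
        echo "unknown language directory: $dir" >&2
        return 1
    }
    for suffix in pos neg; do
        [ -f "$dir/${code}_${suffix}.txt" ] || {
            echo "missing: $dir/${code}_${suffix}.txt" >&2
            return 1
        }
    done
    echo "ok: $dir"
}

# Demo in a throwaway root; a real setup would use a root such as /share.
root=$(mktemp -d)
mkdir -p "$root/models/sentiment/american" "$root/models/sentiment/french"
touch "$root/models/sentiment/american/en_pos.txt" \
      "$root/models/sentiment/american/en_neg.txt"

check_language_dir "$root/models/sentiment/american"
check_language_dir "$root/models/sentiment/french" || true   # no files copied yet
```

In the demo, the american directory passes the check, while the french directory is reported as missing fr_pos.txt because no files were copied into it.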

To update the Sentiment Analysis model:

  1. Create the directory structure (explained above) for the Sentiment Analysis training files, with a separate sub-directory for each language version.
    In our example, the following directory is used for the American English version of the training files:
    /share/models/sentiment/american
  2. Copy the en_pos.txt and en_neg.txt files into the /american directory.
  3. Run the bdd-admin script with the update-model command, the sentiment model-type argument, and the absolute path to the /sentiment directory:
    ./bdd-admin.sh update-model sentiment /share/models/sentiment
If successful, the command prints messages similar to these:
[2015/08/14 15:35:02 -0400] [web2014.example.com] Generating the sentiment model file using new model file...Success!
[2015/08/14 15:35:55 -0400] [Admin Server] Publishing the sentiment model file...
[2015/08/14 15:36:07 -0400] [Admin Server] Successfully published the model file.

The operation replaces the Sentiment Analysis model's current JAR on the YARN worker nodes with the new one.

You can revert the model by running the command without the path argument:
./bdd-admin.sh update-model sentiment

This reverts the Sentiment Analysis model to the original, shipped version.
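
The update and revert invocations above differ only in the path argument, so they can be wrapped in one helper. The sketch below is illustrative only: run_update_model, BDD_ADMIN, and DRY_RUN are hypothetical names, and the location of bdd-admin.sh is an assumption. By default it only prints the command it would run, using the two command forms shown in this topic.

```shell
#!/bin/sh
# Location of the bdd-admin script (assumption; adjust to your installation).
BDD_ADMIN=${BDD_ADMIN:-./bdd-admin.sh}

# With a directory argument: update the model from new training data.
# With no argument: revert to the original, shipped model.
run_update_model() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Dry run (the default here): show the command without executing it.
        echo "would run: $BDD_ADMIN update-model sentiment${1:+ $1}"
    else
        # ${1:+"$1"} expands to the quoted path only when one was given.
        "$BDD_ADMIN" update-model sentiment ${1:+"$1"}
    fi
}

run_update_model /share/models/sentiment   # update
run_update_model                           # revert
```

Set DRY_RUN=0 to execute the commands for real once the printed command lines look correct.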