Subject Line Predictions and Push Title Optimization FAQ

Subject Lines continue to be a fundamental aspect of winning consumer attention. Marketers can deploy with higher confidence by infusing machine learning and the scientific method into the art of copywriting. You may improve open rates by selecting subject lines more likely to awaken a customer’s interest.

In this document, you'll learn about:


The Subject Line Predictions feature uses historic subject lines and associated open rates to predict whether a new subject line will lead to a higher or lower than average open rate. The marketer is shown a prediction of whether the subject line will lead to a ‘Good’ or ‘Poor’ open rate, predicted open rate, key phrases that may have a positive or negative impact on the subject line, and similar subject lines with their associated average open rate. When there are too many unknown words in the subject line, it results in "No prediction".

It enables the marketer to quickly optimize and determine the best subject line before launching a campaign, preventing the hassle of keeping a repository of all email subject lines that they have used so far.

Points to remember

  1. This feature is supported for email and push campaigns
  2. Personalization tokens are taken into account while making predictions. However, complex RPL scripts that reference content in the content library cannot be processed.
  3. Text in the subject line is made lower case and punctuation marks are stripped from the analysis.
  4. There needs to be at least 300 unique subject lines in order to be able to train the model. If there is not enough data to process, the algorithm will return a "No prediction".
  5. The algorithm can return "No prediction" in the following instances:
    1. When there is an error receiving the input text from the user interface.
    2. If we receive a subject line that contains complex RPL (that is, nested if/else statements) or references the subject line in the content library.
    3. When there are insufficient data, for example, there are fewer than 300 historical subject lines.
    4. When there are too many unknown words in the subject line for the model to predict. "Unknown" is defined as not being in our vocabulary we build during the learning phase.
  6. The model doesn’t factor in emotions or intent in a subject line. For example, any subject line with a sarcastic or negative tone will not be flagged as "Poor".
  7. Emojis are converted to equivalent text and processed to be used for predictions.
  8. Key phrase highlighting is available for 3-word and 4-word phrases. It is available only for English subject lines and Latin-based subject lines.
  9. The model does not factor in the audience or content referenced in the campaign.

Data science mechanics

Predictions are made by applying data science algorithms to data collected from historical subject line performances. The algorithm is applied only to your data and not across all Oracle Responsys customers.

  1. The system leverages the following customer data:
    • For Subject lines, the data used is: sent count, unique open count, and delivered count
    • For Push titles, the data used is: Title, sent count, delivered count, open count
  2. During the learning phase, the algorithm parses the subject lines to create the vocabulary (a set of words and phrases up to 3 words in length, which are present in the historical subject lines). Then, after characterizing the subject line by the occurrences of words and phrases it contains, Oracle Responsys uses that information along with the known good/poor open rate label, to build the predictive model for subject line performance.
  3. When Oracle Responsys receive a new subject line to make a prediction, Responsys characterizes it with the same method used in the learning phase. That is, Oracle Responsys identifies the occurrences of words/phrases that exist in the vocabulary and input it into the predictive model to return a good/poor prediction.
  4. The predictive model requires a minimum of 300 subject lines. The algorithm will split the data into a training set (80%) and a testing set (20%). For example: If there are 500 subject lines in your account, the system will train the model on 400 records and then tests on 100 records.

Frequently Asked Questions

Can you predict on subject lines that use personalization variables?

Yes. Personalization variables like the recipient’s first name typically improve open rates and will be taken into consideration in predictions.

Can you predict performance for dynamic subject lines?

Yes. Each version of the subject line will receive its own prediction.

Note: This does not apply to Push titles.

What languages does the Subject line prediction support and can we get a predicted open rate for a non-English subject line?

Yes, the prediction feature is supported for non-English subject lines as well. In addition to knowing whether a subject line will yield a better or poor open rate, you can receive a predicted open rate for the following languages: Swedish, Spanish, Portuguese, German, French, Italian, Chinese, Japanese, and Hebrew.

For Push titles, the languages supported are: English and Portugese.

Does the model look into industry benchmarks?

Currently, the model looks into the data present in your account.

What is the cadence at which the model gets trained?

The model receives incremental training bi-weekly for both subject lines and push titles.

What is the lookback window for the model?

The model currently looks at all the historical data.

Why does the prediction return false positives?

Our algorithm is evolving, you may notice false positives in our prediction in some of these instances:

  1. Typos in subject lines are not detected. We don’t parse for word definitions nor compare against common dictionaries.
  2. Subject lines that may flag the email to be spam may not be identified as "poor". Again, we do not parse based on word definitions or assumed connotations. This does not apply to push titles.
  3. A best practice is to keep subject lines succinct (6-8 words). Please be aware, the model currently can mark longer subject lines as “good”. For Push titles, a best practice is to keep titles succinct.

Is the key phrase highlighting in a given subject line supported for character-based languages?

No, the key phrase highlighting is not supported for Japanese, Chinese, and Hebrew subject lines.

When would I see top key phrase insights? What do the metrics "Average open rate" and "Usage frequency" mean?

The top phrases and words insights are shown when we don't have insights into key phrases that have an impact on the subject line (that is, when key phrase highlighting is not available).

Average open rate is the average of unique open rates of those subject lines where this phrase appeared.

Usage frequency is the number of subject lines where this top phrase that appeared.

Can I get predictions for subject lines that I want to use as part of my multivariate test (MVT) campaign?

Yes, you can get predictions for subject lines used as part of MVT. If each of the subject line variants is unique, then they will be treated as unique subject lines and count toward your overall minimum requirement of having 300 subject lines in the account for the prediction to work well.

How long will it take for the prediction algorithm to work after the feature is enabled for the account?

The subject line prediction feature should work as soon as it is enabled for the account, provided that your account meets the criteria of having at least 300 subject lines.

How many similar subject lines will I see for a given subject line?

You will see up to 4 similar subject lines and their corresponding average open rate.

Learn more

Subject Line Predictions and Push Title Optimization FAQ

Impacts of enabling or disabling Subject Line Predictions