This appendix provides a high-level overview of some key features of the Oracle RTD self-learning predictive models, and provides examples to help business analysts understand how specific predictive model settings affect the Oracle RTD Decision logic. It highlights how those features can be used, throughout the lifecycle of models from inception to maturity, to strike the right balance between data exploitation (making the best of the current state of knowledge captured by Oracle RTD self-learning models) and data exploration (gaining more knowledge to improve the ability to make better decisions).
This appendix assumes that the reader is already familiar with Oracle RTD predictive modeling and decision logic concepts and terminology. For more information, see the sections The Oracle RTD Decisioning Process and About Models.
This appendix does not provide an in-depth mathematical explanation of the Oracle RTD predictive modeling features.
Disclaimer
While this appendix illustrates the behavior of various Oracle RTD model settings with specific examples, the choice of settings for improving upon a given decision strategy must be made in the overall context of your Inline Service at a given point in time, as those settings can interfere with one another. The examples and heuristics described in this appendix are presented primarily to convey understanding of Oracle RTD model behavior. While they may inform best practices in model design, they are not a substitute for the development of a cohesive modeling strategy through iterations. Oracle recommends a thorough analysis of actual production data by skilled data analysts before making changes to those settings.
Predictive models are functions built from observed data samples and used to estimate an outcome that is unknown at the point of decision. For example:
Based on this past week's interaction data, how likely is this visitor to click on this banner when presented on this place in this page?
Based on past email campaign results, how likely is this customer to visit our web site as a result of this new email?
The estimated outcome is never completely accurate, as the processes being modeled have inherent randomness and are affected by data that is not available to the model.
When defining models with Oracle RTD, you can select from three types of models:
Choice Event Model
Choice Model
Model
While they share most of their characteristics, each type of model corresponds to a specific usage pattern and differs by the degree of automation provided to support those use cases. For more information, see Section 13.14.1, "Model Types."
The rest of this appendix focuses on Choice Event Models, which are the most frequently used in Inline Services as they provide the highest degree of automation. Choice Event Models are used to predict and analyze the conditional probability that specific events occur for a choice, given that a base event has occurred.
To understand how Oracle RTD makes use of predictive Models to make Decisions, it is important to distinguish between the concepts of model likelihoods and decision scores.
Model likelihoods are numerical values between 0 and 1 returned by predictive models. In the context of Choice Event Models, these represent the estimated likelihood that a specific event will happen under given circumstances (the base event occurrence) and customer data and context. These likelihoods can be used for various purposes in an Inline Service, for example as thresholds in eligibility rules or as a scoring method for ranking choices.
Decision scores are numerical values of any range used in the context of making decisions. With Oracle RTD, multiple scores can be associated with choices to select the "best choice" based on several Performance Goals (sometimes referred to as KPIs). The Oracle RTD Decision Framework lets you create many Performance Goals and associate a scoring method with each of them.
An example of decision logic with one single performance goal (that is, driven by a single score) is when the "best choice" is the one with the highest estimated likelihood of click as computed by an Oracle RTD Choice Event model.
An example of decision logic with multiple performance goals (that is, driven by a combination of multiple scores) is when the "best choice" is the one with the highest combined expected revenue and the lowest cost. In this case, the expected revenue could be computed by multiplying the likelihood predicted by an Oracle RTD Choice Event model by the revenue associated with the product if the offer is accepted, while the cost might be a constant value associated with each product.
To simplify the examples, the rest of this appendix follows the use case of a single performance goal decision scheme, where the scoring method used to select the "best choice" is directly mapped with a Choice Event Predictive Model.
Oracle RTD models learn incrementally, which means they start with zero knowledge and gain a better ability to make accurate predictions as more and more cases are seen.
While it is tempting to think that models reach discrete stages of knowledge (often described as "model convergence" in traditional data mining literature), this definition does not directly apply in the context of Oracle RTD as its models continuously evolve in their knowledge.
While it is not easy to strictly define when an Oracle RTD model reaches a stable level of knowledge, the two initial phases of maturation are clearly distinct from the subsequent phase and are bound by the Significance Threshold parameter.
The Significance Threshold is a number defined globally at the Oracle RTD Inline Service level which applies to all Models, choices and their associated events defined in this Inline Service.
The default value of 25 is chosen to ensure very short model maturation cycles. Short maturation cycles are achieved by deciding early that the models have seen enough cases to start trusting their likelihood calculations. Giving models an early opportunity to validate their likelihood calculations reduces the time it takes for a model to make accurate predictions. Increasing the value of the Significance Threshold results in a more conservative approach, as models will need to see more cases before they make significant predictions.
At the beginning of its life, a model returns NaN (Not a Number) for any prediction request. This indicates that the model does not have enough information to even approximate a prediction. How NaN is handled by default during the scoring process is discussed in the rest of this appendix, but users can define their own course of action if this default behavior does not address their needs, keeping in mind that this initial phase is transient by nature.
For example, you can restrict choice eligibility to the cases where the likelihood value for choices is not NaN.
Or, using a different strategy, you may decide to make choices eligible as long as their likelihood is NaN or higher than 5%. This strategy gives choices an opportunity to prove themselves under high uncertainty, and otherwise restricts consideration to cases where the choice is considered a good enough match (> 5%).
In the context of decisions, you could also use a scoring rule while the model returns NaN.
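The two eligibility strategies described above can be sketched as follows. This is a minimal illustration in Python rather than Oracle RTD rule syntax; the function names and the `min_likelihood` parameter are hypothetical.

```python
import math


def has_prediction(likelihood: float) -> bool:
    """Strict strategy: the choice is eligible only once the model returns a value."""
    return not math.isnan(likelihood)


def is_eligible(likelihood: float, min_likelihood: float = 0.05) -> bool:
    """Lenient strategy: eligible while the likelihood is unknown (NaN),
    or once it is higher than the floor (5% in the example above)."""
    # NaN compares False against every number, so it must be tested explicitly.
    return math.isnan(likelihood) or likelihood > min_likelihood
```

Because any ordinary comparison with NaN evaluates to false, a rule written only as "likelihood higher than 5%" would silently exclude every new choice; the explicit NaN test is what gives new choices their opportunity.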
This initial maturation period lasts until the model has seen a minimum number of base event records. This number is defined as twice the Significance Threshold (by default 2 * 25 = 50).
The second period starts once the number of base events reaches twice the Significance Threshold, at which point the model will begin returning likelihoods. While the number of positive events is still low, models will use a simplistic likelihood calculation method based on averages. This means that during this period, models will return the same likelihood for all customers at a given point in time, which is the average likelihood. This average likelihood will itself change with new learning.
The third period starts once the number of positive events seen is higher than twice the Significance Threshold, at which point models will return different likelihoods for different cases. Note that Choice Event models can have multiple positive events, in which case the maturation period described previously applies to each event separately.
The following table summarizes the preceding discussion and characterizes the method used to compute the likelihood of a given positive event during each period of maturation within the active time window. Note that the method by which Oracle RTD combines those positive event likelihoods into a final likelihood is described independently in Section A.5, "Model Likelihood Computation for Multiple Events." For simplicity, the Significance Threshold is assumed to be fixed at 25.
Period | Number of Cases (Base Event) | Number of Positive Cases (Positive Event) | Likelihood (for the positive event for which the likelihood is computed) | Rationale |
---|---|---|---|---|
Initial Phase | Less than 50 | Any number | NaN | As there is not enough data even for a basic guess, models return NaN |
Second Phase | More than 50 | Greater than or equal to 1 and less than 50 | Average Likelihood = Number of Positive Events / Number of Base Events | The average is a good initial approximation when there is not yet enough data to make reliable predictions |
Third Phase | More than 50 | More than 50 | Different prediction for each case | Model has seen enough data to make predictions and will continue to mature with more data. |
For example, assume a Choice Event model is defined to predict likelihood of someone clicking on an offer when presented. In this case the presentation of the offer is the base event and a click on the offer is the positive event. The evolution of the values returned by the model can be seen in the following table.
Times Offer Presented | Times Offer Clicked | Prediction | Notes |
---|---|---|---|
0 | 0 | NaN | No knowledge |
23 | 1 | NaN | Too little knowledge |
60 | 0 | NaN | Too little knowledge of positive events |
60 | 2 | 3.3% | Average (2/60) is the first approximation |
922 | 35 | 3.8% | Average (35/922) is still a good approximation |
1442 | 57 | Different value for each customer | - |
25436 | 2011 | More accurate different values for each customer | - |
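The phases above can be reproduced with a short sketch. It is not Oracle RTD code: the function name is hypothetical, and the handling of counts that fall exactly on the boundary (exactly 50 base events or exactly 50 positive events) is an assumption, since the text and the tables word those boundaries slightly differently.

```python
import math
from typing import Optional


def early_likelihood(base_events: int, positive_events: int,
                     significance_threshold: int = 25) -> Optional[float]:
    """Return NaN in the initial phase, the average likelihood in the second
    phase, and None in the third phase (where the model returns a different
    prediction for each case, which this sketch does not attempt to model)."""
    minimum = 2 * significance_threshold
    # Initial phase: too few base events, or no positive event seen yet.
    if base_events < minimum or positive_events < 1:
        return math.nan
    # Second phase (boundary handling is an assumption): average likelihood.
    if positive_events <= minimum:
        return positive_events / base_events
    # Third phase: per-case predictions from the mature model.
    return None


for presented, clicked in [(0, 0), (23, 1), (60, 0), (60, 2), (922, 35), (1442, 57)]:
    print(presented, clicked, early_likelihood(presented, clicked))
```

Running the loop over the first rows of the table gives NaN for the first three rows, averages of about 3.3% and 3.8% for the next two, and `None` for the row with 57 clicks.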
One of the built-in features of Oracle RTD models is their capability to introduce some degree of variability in their prediction calculations. If the Randomize Likelihood option is selected, Oracle RTD adds a noise factor to the computed likelihood. This noise factor is normally distributed, with a mean of 0 and a standard deviation that depends on the total number of positive events and on the Significance Threshold. The randomization comes into play as soon as the model is capable of producing likelihoods, that is, from the second phase of maturity.
Introducing some level of variability in the decision process by way of this randomization factor (often described as Noise in statistical literature) is important in closed loop systems because it creates opportunities to give a fair chance to all offers. This introduced variability enables a closed loop system to reassess its established knowledge (often described as "getting a system out of a local optimum" in statistical literature), making it possible to expand learning and give fairer opportunities to other choices. By controlling the amplitude of the introduced variability over time for each choice event likelihood computation (details appear later in this section), Oracle RTD tries to find the right balance in reassessing its established knowledge without adversely affecting its ability to truly predict responses.
The following table shows some examples for the likelihood computation of a given choice event ("Click"):
Times Choice Presented | Times Choice Clicked | Predicted Likelihood of Click | Average Click Likelihood | Standard Deviation of Noise | Random Noise (For Example) | Final Likelihood Value Used (For Example) |
---|---|---|---|---|---|---|
37 | 5 | NaN | 13.5% | 6.04% | - | NaN |
1250 | 17 | 1.4% | 1.4% | 0.33% | 0.32% | 1.72% |
1250 | 17 | 1.4% | 1.4% | 0.33% | -0.11% | 1.29% |
3506 | 44 | 1.3% | 1.3% | 0.19% | -0.44% | 0.86% |
5577 | 71 | 5.8% | 1.3% | 0.18% | 0.12% | 5.92% |
As illustrated in the preceding table, the level of random noise will affect the final likelihood value used, even for the same input values.
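Note that, because the noise is drawn at random, the value drawn for a specific computation may be larger than one drawn earlier, even though the standard deviation of the noise decreases over time. In the table, the noise of -0.44% drawn at a standard deviation of 0.19% is larger in magnitude than the 0.32% drawn at a standard deviation of 0.33%.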
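The effect of the noise factor can be illustrated with a small sketch. The formula Oracle RTD uses to derive the standard deviation is not given in this appendix, so the sketch takes the standard deviation as an input (the values used below are copied from the table); the function name is hypothetical.

```python
import math
import random


def randomized_likelihood(likelihood: float, noise_std: float,
                          rng: random.Random) -> float:
    """Add zero-mean, normally distributed noise to a computed likelihood."""
    if math.isnan(likelihood):
        # No likelihood yet (initial phase): nothing to randomize.
        return math.nan
    return likelihood + rng.gauss(0.0, noise_std)


rng = random.Random()
# Same inputs as the two rows with 1250 presentations and 17 clicks.
first = randomized_likelihood(0.014, 0.0033, rng)
second = randomized_likelihood(0.014, 0.0033, rng)
print(f"{first:.4%} {second:.4%}")  # two different final values for the same inputs
```

Averaged over many requests the noise cancels out, so the final values remain centered on the predicted likelihood while individual decisions still vary.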
It is important to understand that the amplitude of the noise introduced by the Randomize Likelihood option decreases quickly over time, as the standard deviation of the introduced noise diminishes with the number of positive events. The following graph shows the standard deviation of the noise for different numbers of positive events in the context of the likelihood computation for a given event, assuming an average response rate of 5%.
Oracle RTD choice event models are defined with an ordered list of events. For example, given a sequence of Presented -> Clicked -> Purchased events that might occur at a different pace and at different times for various choices, Oracle RTD allows this sequence to be defined in one single model rather than having to create and manage separate models for each specific event.
The order of events is significant in the computation of a choice likelihood because Oracle RTD attempts to use the most valued outcome in its computation (for example, a purchase as opposed to a click, as a purchase is ranked higher in the defined sequence of events). This strategy enables Oracle RTD to identify breaks in the sequence of positive events as early as possible, and to target clickers who will purchase rather than clickers who will not purchase during the decision process. From this definition, it also follows that deeper (later) events are expected to be less frequent than earlier events.
If a deeper (later) event's likelihood cannot be computed - that is, we are in the initial phase of the model and the likelihood according to the logic explained previously is NaN - then Oracle RTD will approximate the likelihood using the following method. This method consists of finding the deepest event for which it is possible to compute the likelihood, and multiplying that likelihood by the relative proportion to the originally requested event.
Even when a sequence of events is defined at the model level, Oracle RTD allows the Inline Service implementation to request likelihood scores for a given event.
If a choice has A as a base event and three positive events B, C and D, asking for the likelihood of event D will result in the following actions:
If the likelihood of D can be computed, then Oracle RTD returns it
If the likelihood of D cannot be computed, but the likelihood for B or C can be computed, then Oracle RTD returns the likelihood for the previous event - B or C - multiplied by the relative proportion as described previously
If the likelihood cannot be computed for B or C, Oracle RTD returns NaN
In general, if the model for event D for a specific choice has not converged but the model for C has converged, asking for the prediction for D will give a different result than asking for C.
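The fallback logic for events B, C and D can be sketched as follows. This appendix does not spell out how the "relative proportion" is computed, so the sketch receives it as a caller-supplied function; that parameter, the function name and the data layout are all hypothetical.

```python
import math
from typing import Callable, Dict, Sequence


def likelihood_for(event_order: Sequence[str],
                   likelihoods: Dict[str, float],
                   requested: str,
                   relative_proportion: Callable[[str, str], float]) -> float:
    """event_order lists the positive events from earliest to deepest;
    likelihoods maps each event to its likelihood, or NaN if not computable."""
    index = list(event_order).index(requested)
    value = likelihoods[requested]
    if not math.isnan(value):
        return value
    # Walk back from the requested event to find the deepest earlier event
    # whose likelihood can be computed.
    for earlier in reversed(event_order[:index]):
        if not math.isnan(likelihoods[earlier]):
            return likelihoods[earlier] * relative_proportion(earlier, requested)
    return math.nan
```

With the order `["B", "C", "D"]`, a request for D uses D when available, otherwise C, otherwise B, and returns NaN only when none of them can be computed.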
The Oracle RTD Decision process consists of sorting all eligible choices based on the weights and performance goals that the business is interested in optimizing. This sorting of choices operates on the choice 'total score'. To compute a total score, Oracle RTD computes the score of the choice for each of the performance goals, then applies the weights to them and sums them up, reaching the final total score.
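As a rough illustration of the total score computation, the following sketch applies weights to per-goal scores and sums them. The goal names, weights and values are invented for the example, every goal is treated as required, and representing a goal to be minimized (cost) as a negated score is an assumption of this sketch rather than a description of Oracle RTD internals.

```python
import math
from typing import Dict


def total_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of the per-goal scores of one choice.
    Returns NaN when the score of any weighted goal is missing or NaN."""
    total = 0.0
    for goal, weight in weights.items():
        score = scores.get(goal, math.nan)
        if math.isnan(score):
            return math.nan
        total += weight * score
    return total


likelihood = 0.02            # from a Choice Event model (illustrative value)
revenue_if_accepted = 100.0  # illustrative value
cost = 0.5                   # illustrative constant cost of the choice

scores = {"revenue": likelihood * revenue_if_accepted, "cost": -cost}
weights = {"revenue": 0.7, "cost": 0.3}
print(total_score(scores, weights))
```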
In the context of this appendix, we focus only on single performance goal decision schemes, where the scoring method used to select the best choice is mapped directly to a Choice Event Predictive Model. In this case, the total score associated with each choice is based on the likelihood calculated by the Choice Event model.
We will also assume that the Required option in the Performance Goals setup is selected, which indicates that each choice should have a score value for this Performance Goal in order for the total score to be computed. As a total score for each choice is required for the sorting of choices, Oracle RTD needs to define a default strategy for when the defined scoring method does not return a value. This situation typically occurs during the early stages of model maturity. Before a deeper examination of the choice sorting algorithm, the next sections describe some common scenarios in which some scores cannot be computed.
One important challenge of prediction-based decision schemes is to handle cases where newly introduced choices (whose models have not yet converged) have to compete with existing choices (associated with established predictive models).
New choices have to be given a fair chance to demonstrate their value before Oracle RTD starts relying on models to predict their likelihood of positive events.
Oracle RTD provides an easy way to ensure a "fair" representation of new choices. "Fair" is defined as being proportionally equal. For example, if there are 15 eligible choices, then a fair representation would give each choice a 1 in 15 (approximately 6.7%) chance of being selected first.
Now, at the same time that we give chances to new eligible choices, we want to use our knowledge for eligible choices for which we have good models. Therefore, ideally if we have a few choices that are new and a few that are established, we would like to give fair representation to the new choices, however many they are, as well as optimized representation to the established ones.
The following example illustrates how Oracle RTD solves this requirement.
Assume there are 15 eligible choices, of which 10 are established and we have good models for them, and 5 are new choices.
Ideally, in this case, one of the 5 new choices, picked at random, would come first about 33% of the time (5 out of 15), and one of the 10 established choices would come first, based on its total score, the remaining 67% of the time.
In order to achieve this fairness, Oracle RTD goes through the following process:
Assign to each choice a random number - whether its scores can be computed or not
Sort the choices by comparing them one against the other (iterating through the whole list of eligible choices using pairwise comparisons):
When comparing two choices for which the total score can be computed, use the total score in the comparison
When comparing two choices where one or both of the total score cannot be computed, use the assigned random numbers for comparison
This simple process achieves the fairness goals described previously. In other words, established choices are compared by their total score when competing with established choices, and any choice competing with a "new" choice is sorted randomly (but consistently, as the random number is assigned at the beginning of the sorting process). It is important to note that the random number used to compare choices in the decision process is entirely separate from the randomization performed for the model likelihood calculation described earlier in this appendix.
Note that if the Required checkbox is not checked and the scoring method does not return a value, then the one-on-one comparison with other choices will be based on the other Performance Goals. If there are no performance goals for which the score can be computed, then the random pairwise comparison is used.
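The pairwise comparison rule can be sketched as follows. The class and function names are hypothetical, a total score that cannot be computed is represented as NaN, and treating the higher random number as the winner is an assumption. How the pairwise comparisons are sequenced into a full ordering is not specified in this appendix, so the sketch stops at the comparison itself.

```python
import math
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    name: str
    total_score: float        # NaN when the total score cannot be computed
    random_number: float = 0.0


def assign_random_numbers(candidates: List[Candidate], rng: random.Random) -> None:
    """Assign one random number per choice, once, before any comparison."""
    for candidate in candidates:
        candidate.random_number = rng.random()


def ranks_ahead(a: Candidate, b: Candidate) -> bool:
    """True if choice a should be placed ahead of choice b."""
    if not math.isnan(a.total_score) and not math.isnan(b.total_score):
        # Both choices are established: compare by total score.
        return a.total_score > b.total_score
    # At least one choice is new: compare the pre-assigned random numbers.
    return a.random_number > b.random_number
```

Because the random number is assigned once per choice rather than drawn at each comparison, two comparisons involving the same pair always give the same answer within one decision.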
As a general rule, you should monitor the number of new choices being introduced at once into an established system, to ensure that the proportion of choices scored using a random number compared to those not using a random number is not reducing the overall performance of the system. This random comparison will only be in place during the initial and transient phases of learning.
By the very nature of most data and inter-data associations, estimates of future events by way of predictive modeling are subject to error. Oracle RTD models have several features to help understand the nature of those errors and design optimal decision schemes with knowledge of those potential errors.
The Oracle RTD Decision Center quality reports provide the following set of metrics, at both Choice and Choice Group level, to characterize the quality of predictive models:
Model Quality
Cumulative Gains Chart
Model Errors
Model quality is a numeric value, from 0 to 100, that indicates how reliable a model is at predicting an outcome.
Model quality is computed by comparing the area under the model's lift curve with that of a theoretical ideal model.
In Oracle RTD, model quality is computed incrementally as more records are seen. Initially there is a period where there is not yet enough data in the model for the model to be reliable. During this period, Decision Center reports do not show the model quality metric. After the model becomes reliable, Decision Center reports will display model quality.
While the model quality metric distills model evaluation into a single number, it should be understood as a relative number rather than an absolute value.
Some rules of thumb:
A model quality below 30 is generally not very useful. This may occur for a variety of reasons, such as:
Input data not sufficiently correlated to output
Too much noise introduced by input variables with too many values
Modeling the wrong data
A model quality above 95 should be examined, because some inputs may be self-fulfilling, that is, they are always present when the outcome is present
A model quality of 50 or more is good
A model quality of 70 or more is excellent
While there are many metrics that can be associated with predictive models to characterize their behavior, the Model Quality attempts to summarize several of those into one single metric and as such cannot accurately express all aspects of a model. Nevertheless, Model Quality is a good indicator of when to use and trust a model for the purpose of making decisions over alternative scoring strategies.
For example:
If there is a pre-existing offline model score that can be used for scoring while your Oracle RTD models are learning, you can consider using this model until the quality of the Oracle RTD model reaches 30 or 40.
Alternatively, if no such pre-existing model exists, then you should use the Oracle RTD model from the beginning of the process.
Another option is to assign an unknown likelihood (NaN) to choices for as long as it takes for the Oracle RTD model to learn enough to achieve the desired quality. Assuming the associated Performance Goal is marked as required, this strategy would cause your offer to be presented proportionally in a random manner.
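The three options above amount to a small selection rule, sketched below. The function and parameter names are hypothetical, and the quality threshold of 30 is simply the lower of the two example values mentioned above.

```python
import math
from typing import Optional


def choose_score(rtd_likelihood: float,
                 model_quality: Optional[float],
                 offline_score: Optional[float] = None,
                 min_quality: float = 30.0,
                 hold_back: bool = False) -> float:
    """Pick the score to use for a choice while the Oracle RTD model matures.
    model_quality is None while the quality metric is not yet reported."""
    ready = model_quality is not None and model_quality >= min_quality
    if ready:
        return rtd_likelihood
    if offline_score is not None:
        # A pre-existing offline score is available: use it while learning.
        return offline_score
    if hold_back:
        # Unknown likelihood: the choice is then presented in a random manner.
        return math.nan
    # No alternative: use the Oracle RTD model from the beginning.
    return rtd_likelihood
```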
An alternative indicator for deciding whether to use and trust a model for the purpose of making decisions over alternative scoring strategies is the model lift. Oracle RTD provides a runtime API that can be used to determine the relative strength of a model at a given point in the lift chart.
The model "lift at 20%" is also a good indicator of model quality. This API returns the lift that the model provides at the 20% point in the lift curve, as seen in reports in Decision Center. Once again, the exact threshold to use depends on your decisioning strategies and the availability of alternative scoring strategies.
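For readers unfamiliar with the metric, the following sketch computes a lift at 20% offline from a list of scored records, using the common data mining definition (response rate among the top-scored 20% of records divided by the overall response rate). It does not call the Oracle RTD runtime API, and whether Decision Center uses exactly this definition is an assumption.

```python
import math
from typing import List, Tuple


def lift_at(scored: List[Tuple[float, int]], percent: int = 20) -> float:
    """scored holds (predicted likelihood, outcome) pairs, outcome being 1 or 0."""
    ranked = sorted(scored, key=lambda record: record[0], reverse=True)
    total_positives = sum(outcome for _, outcome in ranked)
    if total_positives == 0:
        return math.nan  # lift is undefined without any positive outcome
    # Size of the top slice, rounded up, using integer arithmetic.
    top_n = max(1, -(-len(ranked) * percent // 100))
    top_rate = sum(outcome for _, outcome in ranked[:top_n]) / top_n
    overall_rate = total_positives / len(ranked)
    return top_rate / overall_rate
```

A lift of 1 means the model does no better than random selection at that point of the curve; the higher the lift, the more the top-scored cases concentrate the positive outcomes.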
Finally, a more elaborate strategy would be to build several predictive models using different likelihood estimation strategies and to determine in real time, by way of an aggregate model, how those various models should be combined.
How does model quality play a role in determining choice likelihood?
Model quality does not play any role in the likelihood calculation.
How does the Significance Threshold play a role?
The Significance Threshold (set within the Model Defaults tab of the Application) is used in many places in the computation of predictions from models, in the building of prediction models and the determination of which attributes and values to consider most predictive.
Because of the variety of uses for this threshold, it is generally recommended not to change it.
The best way to interpret this threshold is as the number of cases that have to be seen before we start making statistically valid conclusions.
In an environment where the volume is very low, you may consider lowering this threshold below the default of 25.
In an environment where there are alternative ways of estimating likelihoods and the Oracle RTD models are to be used only when their accuracy is very high, it is recommended to increase this threshold, perhaps up to a value of 50.
What is the best practice for removing non-predictive attributes?
By default and because the number of input attributes has minimal impact on scoring throughput, the overall best practice for Oracle RTD is to include all attributes that may contribute to predictiveness of a choice.
Once a model has converged, the inclusion of attributes can be tuned based on observation of their predictiveness. While attributes that have zero predictiveness (as seen in Decision Center driver reports) for all choices of the model may be considered for exclusion, such attributes should not be excluded from the model if they may become predictive at a later time or for newly introduced choices.
Does Oracle RTD identify noisy attributes?
Certain attributes that exhibit dramatically high (90-100) predictiveness should be excluded if they are determined to be collinear with the choices they predict. Those attributes can be excluded in Decision Studio by using the Excluded Attributes section during model configuration or by unchecking "Use for Analysis" in the attribute properties (in which case they would be excluded for all models).
Oracle RTD models can be set up to attempt to automatically identify the cases where some attributes have a suspiciously high degree of predictiveness by enabling the Premise Noise Reduction option in the Model definition. In these cases, Decision Center reports highlight those suspiciously high predictiveness and correlation values with gray bars. For more information, see the topic "Premise Noise Reduction" in Section 13.14.2, "Common Model Parameters."
It is important to note that there is no need to manually exclude attributes that are participating in the eligibility rules of choices, as the presence of those attributes does not impact likelihood calculations since they are accounted for in both the base and the positive events counts.
Attributes that are known to be non-correlated to model outputs, such as unique identifiers or reference keys in a database, can be excluded from the model altogether. Even if they are included, Oracle RTD has been designed for automation and there is negligible noise that will be incurred by including potentially non-predictive attributes.
When is it necessary to rebuild models?
Traditional data mining techniques require predictive models to be rebuilt on a regular basis, as often as declining model quality requires. Oracle RTD provides several features to automate model rebuilds and therefore optimize model exploitation.
The first feature is model time windows. This automates the rebuild of models on a set schedule, so that old data and its past influence do not affect the models of new time windows.
The second feature is that models are reassessed on a regular basis. The parameter "Build When Data Changes by" in the Model Defaults tab for the Application in the Inline Service enables you to set the percentage of records that must be seen before a new prediction model is built. The default value is 20%.
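One way to read the "Build When Data Changes by" parameter is as a trigger on the proportion of new records seen since the last build. The sketch below follows that reading, which is an assumption, as the exact trigger used by Oracle RTD is not described here; the function and parameter names are hypothetical.

```python
def should_rebuild(records_at_last_build: int, records_now: int,
                   change_percent: int = 20) -> bool:
    """True once the records seen since the last build reach change_percent
    of the number of records the last build was based on."""
    if records_at_last_build <= 0:
        # No previous build: build as soon as there is any data.
        return records_now > 0
    new_records = records_now - records_at_last_build
    # Integer arithmetic avoids rounding issues at the exact boundary.
    return new_records * 100 >= change_percent * records_at_last_build
```

Under this reading, a model built on 1,000 records would be rebuilt once 200 further records have been seen.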
How many attributes does Oracle RTD select from the total attribute population when calculating scores? All? Top 20? Top 30? Or a variable number based on statistical calculation, and if so, what method?
Oracle RTD includes all of the entity attributes as part of model learning, unless the Use for Analysis checkbox is deselected within the properties box for that attribute.
Model compression does reduce the number of attributes as well as the number of pre-computed bins for numeric values, typically resulting in a 10:1 reduction in size. However, note that no reports are currently available that show which values are used.
The Correlation Threshold, defined at the Inline Service application level, controls the degree of model compression. By default it is set to 0, and therefore only removes attribute values that have a zero correlation with the positive event, which results in compressed models without any loss of information. By increasing this threshold, you can increase the model compression ratio at the cost of eliminating predictors whose level of correlation is below your defined threshold.
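The effect of the threshold can be pictured with a simple filter. This is only a conceptual sketch: the attribute values and correlation figures are invented, the function name is hypothetical, and comparing on the absolute value of the correlation is an assumption.

```python
from typing import Dict


def compress(correlations: Dict[str, float],
             correlation_threshold: float = 0.0) -> Dict[str, float]:
    """Keep only the attribute values whose correlation with the positive
    event is above the threshold (in absolute value)."""
    return {value: corr for value, corr in correlations.items()
            if abs(corr) > correlation_threshold}


correlations = {"age=18-25": 0.31, "region=West": 0.05,
                "gender=F": -0.12, "account_id=42": 0.0}
print(sorted(compress(correlations)))        # only the zero-correlation value is dropped
print(sorted(compress(correlations, 0.10)))  # weaker predictors are dropped as well
```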