10 Setting Up and Using Model Snapshots

The Oracle RTD Model Snapshot feature allows you to export data accumulated in Oracle RTD predictive models to external database tables. These results include counts of events, predictiveness values, and correlations. The data exported from the Oracle RTD models can then be analyzed using standard reporting and business intelligence products and techniques.

The data that Oracle RTD collects in its predictive models for a given Inline Service is attached to a Study. The association between an Inline Service and a Study is defined at deployment time.

The Model Snapshot functionality of Oracle RTD operates at the Study level and affects all the models defined in the Inline Service. Using Model Snaphots, you will be able to export the data contained in all the Models of an Inline Service for a given Study.

The data exported by the Model Snapshot feature allows you to replicate and extend the standard choice and choice group level "predictive model" reports provided by Oracle RTD Decision Center. Furthermore, when associated with customer data from a data warehouse, this exported data enables offline customer centric reporting of predictive insights collected and generated by Oracle RTD.

This section consists of the following topics:

Section 10.1, "Overview of Setting Up and Using Model Snapshots"
Section 10.2, "Model Snapshot Tables Schema"
Section 10.3, "Configuring the Model Snapshot Tables"
Section 10.4, "Populating and Clearing the Model Snapshot Tables"
Section 10.5, "Creating Reports from the Model Snapshot Data"
Section 10.6, "Handling Partitions"
Section 10.7, "Tuning the Model Snapshot Process"

10.1 Overview of Setting Up and Using Model Snapshots

Note:

The description of how to set up and use the Model Snapshot functionality of Oracle RTD assumes that you have a running Inline Service with populated predictive models.

There are three main stages in the process of setting up and using model snapshots, as follows:

Configuring the model snapshot tables.
Populating the model snapshot tables from the learned data, and clearing the tables as required.
Creating reports from the model snapshot tables.

There are two parameters related to model snapshots that are for tuning purposes. For more information, see Section 10.7, "Tuning the Model Snapshot Process."

10.2 Model Snapshot Tables Schema

Oracle RTD exports its predictive model data using a multi-dimensional schema as described in the entity relationship model of Figure 10-1, which shows the model snapshot tables (except for RTDSnapshotSchemaVersion), their columns, and the inter-table relationships, as represented by the connector lines.

Each connector represents a one-to-many relationship, from the table at the top end of the connector to the table at the lower end. In the reverse direction, the relationship from the lower table to the top table is one-to-one.

Figure 10-1 Model Snapshot Tables Schema

Surrounding text describes Figure 10-1 .

The one-to-many relationships between the elements represented by the tables are as follows (each one-to-many relationship implies the corresponding one-to-one relationship in the reverse direction):

Each Study can have one or more Applications, and one or more Models.
Each Application can have one or more Choice Groups, and one or more Choices.
Each Choice Group can have one or more Choices.
Each Choice, Event, and Model can have one or more Model Instances.
Each Model Instance can have one or more Cumulative Gains.

There are also three many-to-many relationships between Model Instances and Attributes, namely Partition, Predictiveness, and Correlation.

The table and column names appear in the following list.

RTDApp

Column Description

id

Primary key.

study_id

Foreign key to RTDStudy.

name

Application name.
RTDAttribute

Column Description

id

Primary key.

name

Attribute name.

path

Delimited attribute names showing how this attribute was reached from the root of the session.

RTDChoice

Column	Description
id	Primary key.
app_id	Foreign key to RTDApp.
internal_name	Internal name for Choice.
display_name	Name displayed for Choice.
choicegroup_id	Foreign key to RTDChoiceGroup.

RTDChoiceGroup

Column Description

id

Primary key.

app_id

Foreign key to RTDApp.

internal_name

Internal name for Choice Group.

display_name

Name displayed for Choice Group.

RTDCorrelation

Column	Description
model_instance_id	Foreign key to RTDModelInstance.
attribute_id	Foreign key to RTDAttribute.
count_output_input	Count of those in the population where the value of this output attribute was found given the input.
correlation	Value between -1 and 1. A positive correlation indicates the degree to which the input attribute's value is associated with the value of the output. A negative correlation indicates a negative association.
count_input	Count of those in the population where this value of this input attribute was found.
value	Value for the input attribute.

Note:

The sum of over the RTDCorrelation column count_input for a particular model instance may be less than the related RTDModelInstance column count_total.

Also, the sum over the RTDCorrelation column count_output_input for a particular model instance may be less than the related RTDModelInstance column count_positive.

These results may occur in the following situations:

Where rows were not stored in RTDCorrelation due to exceedingly low correlations, for example, too close to 0.
Where the RTDCorrelation column value would have been null, meaning a value for the attribute was absent from the session. In these cases, no correlation is calculated.

RTDCumulativeGains

Column Description

model_instance_id

Foreign key to RTDModelInstance.

x

Cumulative gains curve data point.

y

Cumulative gains curve data point.
RTDEvent

Column Description

id

Primary key.

name

Event name.
RTDModel

Column Description

id

Primary key.

name

Model name.

study_id

Foreign key to RTDStudy.

RTDModelInstance

Column	Description
id	Primary key.
state	Model state. Values can be: c: Completed. The model is completed, such as for previous time windows, and is no longer subject to change. It will not be rewritten unless you perform a total model snapshot, or delete the model snapshots and then perform an incremental model snapshot. d: Combined. The model is for the current and previous time window. s: Split. The model is for the current time window. w: Written. The model is currently being written. Results may be inconsistent.
model_id	Foreign key to RTDModel.
choice_id	Foreign key to RTDChoice.
event_id	Foreign key to RTDEvent.
count_total	Total number of base events.
count_positive	Size of the subset of the population where this non-base event was recorded.
time_window_start	Start of time window.
time_window_end	End of time window.
quality	Quality of the model, value vary from 0 to 1. Higher values are better, 0 means nothing was learned.
mae	Mean absolute error.
me	Mean error.
mse	Mean square error.
rmse	Root mean square error.
rel_mae	Relative mean absolute error.
rel_me	Relative mean error.
rel_mse	Relative mean square error.
rel_rmse	Relative root mean square error.

RTDPartition

Column Description

model_instance_id

Foreign key to RTDModelInstance.

attribute_id

Foreign key to RTDAttribute.

value

Attribute value for the partition.

RTDPredictiveness

Column	Description
model_instance_id	Foreign key to RTDModelInstance.
attribute_id	Foreign key to RTDAttribute.
predictiveness	Explanatory score of an input attribute for a particular model instance. Values vary from 0 to 1, higher values are better.

RTDSnapshotSchemaVersion

Column Description

major

Major version.

minor

Minor version.
RTDStudy

Column Description

id

Primary key.

name

Study name.

Column	Description
major	Major version.
minor	Minor version.

10.3 Configuring the Model Snapshot Tables

Note:

The model snapshot tables are set up in the SDDB database during Simple and Enterprise installs of Oracle Real-Time Decisions. You may choose to store the model snapshot data in the SDDB database tables, but this is not recommended for a production system.

The main objectives of this stage are to create the model snapshot tables in a non-SDDS database, and to register them with the application server and the JMX MBeans.

Important:

The text values in the model snapshot tables must be case sensitive. If the default setting for your database is case insensitive, make sure that you override the setting when creating the model snapshot tables.

To configure the model snapshot tables:

Select the database where your model snapshot tables will be stored.

From the RTD_HOME\scripts directory, run the command that creates the model snapshot tables:

sdexec com.sigmadynamics.tools.SDDBTool.SDDBTool -f -i -I InitSnapshotDb.ctl db_type db_host db_port db_name db_runtime_user db_admin_user db_admin_password

Note:

By replacing the -i parameter with -u, the operation changes from initialization of a new schema to an upgrade of an existing schema.

The following table describes the parameters for the sdexec script.

Parameter	Description
`db_type`	The database type. Select one of the following: oracle, sqlserver, db2.
`db_host`	The name of the computer hosting the database server.
`db_port`	The database port number.
`db_name`	The name of the database or, for Oracle Database, the SID.
`db_runtime_user`^Foot 1	The user name of the run time user for the system.
`db_admin_user`	The name of a user that has rights on the database to create tables and stored procedures.
`db_admin_password`	The password of the administrative user.

^Footnote 1For Oracle Database, the db_runtime_user and db_admin_user are the same user.

Create a new Data Source in your application server, that references the database where your model snapshot tables are stored.

For details of how to create a Data Source in an application server, see Chapter 5, "Configuring Data Access for Oracle Real-Time Decisions."

As part of the operation of creating a new Data Source, you provide a new JNDI name, that is used in the steps following.
Go to the directory where you expanded the RTD.ear file during installation (RTD_HOME/package/expanded).
Open the ls.war archive, extract web.xml, then open web.xml for editing. Scroll to the bottom of the file. Copy the section for the definition of the resource reference of SDDS_LS and paste it after the existing section. In the copied section, replace the string SDDS with the JNDI name (jndi_name) that you provided in step 3.

For example:
```
<resource-ref id="jndi_name_LS">
  <res-ref-name>jndi_name</res-ref-name>
  <res-type>javax.sql.DataSource</res-type>
  <res-auth>Container</res-auth>
  <res-sharing-scope>Unshareable</res-sharing-scope>
</resource-ref>
```
Save the changes and close the file, then re-archive the file back in ls.war.
In Enterprise Manager, register the name of the new Data Source in the OracleRTD MBeans:
- Navigate to MBeans > OracleRTD > SDClusterPropertyManager > Misc.
- For the ModelSnapshotDSName attribute, specify the name of the Data Source that you created for the model snapshots.

10.4 Populating and Clearing the Model Snapshot Tables

You can take snapshots of the model learning data at any time, even if not enough data has been accumulated for prediction purposes.

Each model is defined to have a time window, such as week, month, and quarter. This determines how much data to gather in the learning process, and also influences how much data to write to the model snapshot tables.

You can select the amount of data to write from the following options:

All the model learning data for a Study

All the model learning data for a Study for the current time window

You can also delete all model learning data for a Study from the model snapshot tables.

To populate or clear the model snapshot tables:

In Enterprise Manager, navigate to MBeans > OracleRTD > Learning Server > your study name.
Click the Operations tab.
Click the appropriate snapshot option:
- CompleteSnapshot
  
  Deletes all previously snapped data for the Study, and rewrites all Study data, up to the current time.
- IncrementalSnapshot
  
  Deletes Study data for any incomplete time window, and rewrites the current time window's data, up to the current time.
- DeleteSnapshot
  
  Deletes all snapped data for the Study.

10.5 Creating Reports from the Model Snapshot Data

You can create reports from the model snapshot tables, typically by using standard SQL Select commands, or by sending the data to business intelligence products which produce reports similar to the following product sales driver report:

Surrounding text describes prod_sales.gif.

The rest of this section provides examples of scripts that extract information from several of the model snapshot tables. Each script example is followed by sample output. Notes are provided for some of the examples to help you to interpret the results.

The Inline Service used for these examples was a CrossSell application, and the data was generated by running the Oracle RTD Load Generator script to completion, simulating 400,000 user sessions.

10.5.1 Counts by Choice Query

The following query gets the counts for every Choice, for all time windows:

select g.display_name     as 'Choice Group', 
       c.display_name     as 'Choice',
       e.name             as 'Event',
      mi.timewindow_start as 'Start',
      mi.timewindow_end   as 'End',
      mi.state            as 'Model Status',
       m.name             as 'Model Name',
      mi.count_total,
      mi.count_positive,
      mi.quality
from RTDApp a 
     inner join RTDStudy s          on s.id=a.study_id
     inner join RTDModel m          on m.study_id=s.id
     inner join RTDModelInstance mi on mi.model_id=m.id
     inner join RTDEvent e          on mi.event_id=e.id
     inner join RTDChoice c         on c.id=mi.choice_id
     inner join RTDChoiceGroup g    on c.choicegroup_id=g.id
where a.name='CrossSell'
order by m.name, 
         g.display_name, 
         c.display_name,
       mi.timewindow_start

Figure 10-2 shows the results of the Counts by Choice query.

Figure 10-2 Counts by Choice Query Results

Surrounding text describes Figure 10-2 .

Notes on the Counts by Choice Query Results

In row 13, Choice=Gold Card, Event=Interested, count_total=477, count_positive=19, and quality=0.0. This shows that, out of 477 users that were presented with the Gold Card offer, 19 were interested. The counts are small and the model quality with respect to the Gold Card Choice and the Interested Event is 0.

In row 14, Choice=Gold Card, Event=Purchased, count_total=477, count_positive=1, and quality=0.0. This shows that one user purchased this offer. The model quality with respect to the Gold Card Choice and the Purchased Event is 0.

Both row 13 and row 14 apply to the period of time from April 1, 2003 to July 1, 2003.
In the columns Choice Group and Choice, the value BASE EVENT means "in general" or "overall."

For example, in row 1, Choice=BASE EVENT, Event=Interested, count_total=24917, count_positive=1663, and quality is approximately 0.6882. This means that, in the period between the Start and End dates, a grand total of 24917 users were presented offers, and 1663 of these were interested. The overall model quality for this period of time was about 0.69.

In row 2, Choice=BASE EVENT, Event=Purchased, count_total=24917, count_positive=220, and quality is approximately 0.5666. This means that, for the same period of time as for row 1, there were 220 Purchased events across all Choices, and the model quality was about 0.57.

10.5.2 Top Six Predictive Attributes Query

The following query selects the top six predictive attributes, for each time window, for the Credit Protection Choice resulting in the Purchased Event.

select a.name                'Attribute Name',
       p.predictiveness      'Predictiveness',
       c.display_name        'Choice Name',
      mi.timewindow_start as 'Start',
      mi.timewindow_end   as 'End',
      mi.state            as 'Model Status'
from RTDApp app
     inner join RTDChoice c         on c.app_id=app.id
     inner join RTDStudy s          on s.id=app.study_id
     inner join RTDModel m          on m.study_id=s.id
     inner join RTDModelInstance mi on mi.model_id=m.id and mi.choice_id=c.id
     inner join RTDEvent e          on mi.event_id=e.id
     inner join RTDPredictiveness p on p.model_instance_id=mi.id
     inner join RTDAttribute a      on a.id=p.attribute_id
where app.name         = 'CrossSell'
  and   c.display_name = 'Credit Protection'
  and   e.name         = 'Purchased'
  and   m.name         = 'OfferAcceptance'
  and   7 > (select count(*) 
             from RTDPredictiveness p2 
             where p2.model_instance_id = p.model_instance_id 
               and p2.predictiveness > p.predictiveness)
order by mi.timewindow_end desc,
          p.predictiveness desc

Figure 10-3 shows the results of the Top Six Predictive Attributes query.

Figure 10-3 Top Six Predictive Attributes Query Results

Surrounding text describes Figure 10-3 .

10.5.3 Difference Between Expected and Actual Counts Query

The following query shows, for people of different marital statuses, the number who actually purchased credit protection and the number who were expected to do so. The report also shows the difference between the two values, and the importance of the correlation for the offer acceptance.

select cor.value               'customer MaritalStatus',
       cor.count_output_input  'Actual Count',
       mi.count_positive*cor.count_input/mi.count_total 'Expected Count',
  100* (  (mi.count_total*cor.count_output_input)
        / (mi.count_positive*cor.count_input) - 1) 'Percent Difference', 
       cor.correlation         'Importance',
         c.display_name        'Choice Name',
        mi.timewindow_start as 'Start',
        mi.timewindow_end   as 'End',
        mi.state            as 'Model Status'
from RTDApp app
     inner join RTDChoice c         on c.app_id=app.id
     inner join RTDStudy s          on s.id=app.study_id 
     inner join RTDModel m          on m.study_id=s.id
     inner join RTDModelInstance mi on mi.model_id=m.id and mi.choice_id=c.id
     inner join RTDEvent e          on mi.event_id=e.id
     inner join RTDCorrelation cor  on cor.model_instance_id=mi.id
     inner join RTDAttribute a      on a.id = cor.attribute_id
where app.name         = 'CrossSell'
    and c.display_name = 'Credit Protection'
    and e.name         = 'Purchased'
    and m.name         = 'OfferAcceptance'
    and a.name         = 'customer MaritalStatus'
order by mi.timewindow_end desc,
        cor.correlation desc

Notes on the Difference Between Expected and Actual Counts Query

The Actual Count, cor.count_output_input, is the actual number of people who purchased credit protection, for each marital status.
The Expected Count is a simple linear projection of the total count of each marital status, cor.count_input, to those that purchased credit protection, as expressed by mi.count_positive/mi.count_total.
The Percent Difference is 100 * (Actual Count - Expected Count) / Expected Count.

Figure 10-4 shows the results of the Difference Between Expected and Actual Counts query.

Figure 10-4 Difference Between Expected and Actual Counts Query Results

Surrounding text describes Figure 10-4 .

Notes on the Difference Between Expected and Actual Counts Query Results

There are two rows for each marital status, each corresponding to one of the two time periods, April 1, 2003 - July 1, 2003 and July 1, 2003 - October 1, 2003.

10.6 Handling Partitions

The RTDPartition table holds values for partitioning attributes. If a Model is not partitioned, the Model has one Model Instance per time window, and there are no associated rows in the RTDPartition table.

A Model that is split along one or more of its dimensions is a partitioned Model.

As an example, a Model M can be partitioned by two attributes, Marital Status and Favorite Beverage. If there are 3 values for Marital Status (Married, Single, Divorced) and 2 for Favorite Beverage (coffee, tea), then this model has 6 model instances.

In this case, each Model Instance has two associated RTDPartition rows. For example, the Model Instance for the combination (Marital Status=Married and Favorite Beverage=Coffee) would be associated with two RTDPartition rows, containing the following information:

RTDPartition row 1: Attribute=Marital Status, Value=Married
RTDPartition row 2: Attribute=Favorite Beverage, Value=Coffee

Whether or not a Model is partitioned can influence the results of queries on the model snapshot tables. To avoid repetitions in your results, include RTDPartition and RTDAttribute join conditions in your query.

The following example modifies and extends the Difference Between Expected and Actual Counts query to cover the case of a Model partitioned on two attributes, Diabetic, which has "yes" and "no" values, and Marital Status.

select a.name,
       p.value,
      subquery.* 
from (select cor.value               'Favorite Sports',
             cor.count_output_input  'Actual Count',
             mi.count_positive*cor.count_input/mi.count_total 'Expected Count',
  100* (  (mi.count_total*cor.count_output_input)
        / (mi.count_positive*cor.count_input) - 1) 'Percent Difference', 
       cor.correlation         'Importance',
         c.display_name        'Choice Name',
        mi.timewindow_start as 'Start',
        mi.timewindow_end   as 'End',
        mi.state            as 'Model Status'
        mi.id                   model_instance_id
  from RTDApp app
       inner join RTDChoice c         on c.app_id=app.id
       inner join RTDStudy s          on s.id=app.study_id 
       inner join RTDModel m          on m.study_id=s.id
       inner join RTDModelInstance mi on mi.model_id=m.id and mi.choice_id=c.id
       inner join RTDEvent e          on mi.event_id=e.id
       inner join RTDCorrelation cor  on cor.model_instance_id=mi.id
       inner join RTDAttribute a      on a.id = cor.attribute_id
  where app.name       = 'HighlyPartitionedDataset'
    and c.display_name = 'Fanta'
    and e.name = 'loved'
    and m.name = 'SatisfactionModel'
    and a.name = 'Favorite Sports') as subquery 
inner join RTDPartition p on subquery.model_instance_id = p.model_instance_id
inner join RTDAttribute a on p.attribute_id = a.id
order by subquery.[End] desc,
         subquery.model_instance_id,
         a.name,
         p.value
         subquery.[Importance] desc

Figure 10-5 shows the results of the Partitioned Expected and Actual Counts query.

Figure 10-5 Partitioned Expected and Actual Counts Query Results

Surrounding text describes Figure 10-5 .

10.7 Tuning the Model Snapshot Process

You can tune the model snapshot process through the following parameters, available as JMX MBean attributes:

ModelSnapshotMinAbsCorrelation controls whether to snapshot all correlation rows or to set a minimum correlation value for snapshots.
ModelSnapshotNumberOfBins controls the number of bins for model snapshots.

For more information, see Section 13.3.2, "About OracleRTD > SDClusterPropertyManager > Misc."

Column	Description
id	Primary key.
study_id	Foreign key to RTDStudy.
name	Application name.

Column	Description
id	Primary key.
name	Attribute name.
path	Delimited attribute names showing how this attribute was reached from the root of the session.

Column	Description
model_instance_id	Foreign key to RTDModelInstance.
x	Cumulative gains curve data point.
y	Cumulative gains curve data point.

Column	Description
id	Primary key.
name	Model name.
study_id	Foreign key to RTDStudy.