This chapter describes model snapshots, as implemented in Oracle Real-Time Decisions (Oracle RTD), Version 2.2.1. It contains the following topics:
Section 3.5, "Populating and Clearing the Model Snapshot Tables"
Section 3.6, "Creating Reports from the Model Snapshot Data"
In Oracle RTD Version 2.2.1, you can copy the results of Oracle RTD's learning process to database tables. These results include counts of events, predictiveness values, and correlations. From these tables, you can generate reports using simple SQL commands. This feature is known as Model Snapshots.
Each Inline Service is associated with a Study name, and each Inline Service may have one or more Models. The learning process data applies to each Model in a Study. Using the Model Snapshot feature, you dump all the Model results for a Study to the database.
Note: Strictly speaking, the correct term for the data saved is study snapshot rather than model snapshot. However, for simplicity, the two terms are used interchangeably in this section. |
Overview of Setting Up and Using Model Snapshots
There are four main stages in the process of setting up and using model snapshots, as follows:
Configuring the model snapshot tables.
Running one or more applications which add Oracle RTD learning data to their associated models.
This stage does not explicitly use the model snapshot tables, but it supplies the data to be used in the following stage.
Populating the model snapshot tables from the learned data, and clearing the tables as required.
Creating reports from the model snapshot tables.
Two new system parameters related to model snapshots have been added in Oracle RTD Version 2.2.1. For more information, see Section 3.8, "Tuning the Model Snapshot Process."
Figure 3-1 shows the model snapshot tables, their columns, and the inter-table relationships, as represented by the connector lines.
Each connector represents a one-to-many relationship, from the table at the top end of the connector to the table at the lower end. In the reverse direction, the relationship from the lower table to the top table is one-to-one.
For example, the relationship from RTDStudy to RTDApp is one-to-many. From RTDApp to RTDStudy, the relationship is one-to-one. This corresponds to the real-world situation that each Study can have one or more Applications, and each Application must have one Study only. For production work, Oracle recommends that you use only one Application per Study.
Note: In this section, the term Application is synonymous with the term Inline Service. For example, the table RTDApp stores information about Oracle RTD Applications, which are also known as Inline Services. |
The one-to-many relationships between the elements represented by the tables are as follows (each one-to-many relationship implies the corresponding one-to-one relationship in the reverse direction):
Each Study can have one or more Applications, and one or more Models.
Each Application can have one or more Choice Groups, and one or more Choices.
Each Choice Group can have one or more Choices.
Each Choice, Event, and Model can have one or more Model Instances.
There are also three many-to-many relationships between Model Instances and Attributes, namely Partition, Predictiveness, and Correlation.
The table and column names appear in the following list.
RTDApp
Column | Description |
---|---|
id | Primary key. |
study_id | Foreign key to RTDStudy. |
name | Application name. |
RTDAttribute
Column | Description |
---|---|
id | Primary key. |
name | Attribute name. |
path | Delimited attribute names showing how this attribute was reached from the root of the session. |
RTDChoice
Column | Description |
---|---|
id | Primary key. |
app_id | Foreign key to RTDApp. |
internal_name | Internal name for Choice. |
display_name | Name displayed for Choice. |
choicegroup_id | Foreign key to RTDChoiceGroup. |
RTDChoiceGroup
Column | Description |
---|---|
id | Primary key. |
app_id | Foreign key to RTDApp. |
internal_name | Internal name for Choice Group. |
display_name | Name displayed for Choice Group. |
RTDCorrelation
Column | Description |
---|---|
model_instance_id | Foreign key to RTDModelInstance. |
attribute_id | Foreign key to RTDAttribute. |
count_output_input | Count of those in the population where the value of this output attribute was found given the input. |
correlation | Value between -1 and 1. A positive correlation indicates the degree to which the input attribute's value is associated with the value of the output. A negative correlation indicates a negative association. |
count_input | Count of those in the population where this value of this input attribute was found. |
value | Value for the input attribute. |
RTDEvent
Column | Description |
---|---|
id | Primary key. |
name | Event name. |
RTDModel
Column | Description |
---|---|
id | Primary key. |
name | Model name. |
study_id | Foreign key to RTDStudy. |
RTDModelInstance
Column | Description |
---|---|
id | Primary key. |
state | Model state.
Values can be: c: Completed.The model is completed, such as for previous time windows, and is no longer subject to change. It will not be rewritten unless you perform a total dump, or delete the dumps and then perform an incremental dump. d: Combined. The model is for the current and previous time window. s: Split. The model is for the current time window. w: Written. The model is currently being written. Results may be inconsistent. |
model_id | Foreign key to RTDModel. |
choice_id | Foreign key to RTDChoice. |
event_id | Foreign key to RTDEvent. |
count_total | Total number of base events. |
count_positive | Size of the subset of the population where this non-base event was recorded. |
time_window_start | Start of time window. |
time_window_end | End of time window. |
quality | Quality of the model, value vary from 0 to 1. Higher values are better, 0 means nothing was learned. |
RTDPartition
Column | Description |
---|---|
model_instance_id | Foreign key to RTDModelInstance. |
attribute_id | Foreign key to RTDAttribute. |
value | Attribute value for the partition. |
RTDPredictiveness
Column | Description |
---|---|
model_instance_id | Foreign key to RTDModelInstance. |
attribute_id | Foreign key to RTDAttribute. |
predictiveness | Explanatory score of an input attribute for a particular model instance. Values vary from 0 to 1, higher values are better. |
RTDStudy
Column | Description |
---|---|
id | Primary key. |
name | Study name. |
The main objectives of this stage are to create the model snapshot tables, and to register them with the application server and the JMX MBeans.
To configure the model snapshot tables:
Select the database where your model snapshot tables will be stored.
Note: You may choose to store the tables in the SDDB database, but this is not recommended for a production system. |
From the RTD_HOME\scripts directory, run the command that creates the model snapshot tables:
sdexec com.sigmadynamics.tools.SDDBTool.SDDBTool -f -i -I InitSnapshotDb.ctl db_type db_host db_port db_name db_runtime_user db_admin_user db_admin_password
The following table describes the parameters for the sdexec script.
Parameter | Description |
---|---|
db_type |
The database type.
Select one of the following: oracle, sqlserver, db2. |
db_host |
The name of the computer hosting the database server. |
db_port |
The database port number. |
db_name |
The name of the database or, for Oracle Database, the SID. |
db_runtime_user Foot 1 |
The user name of the run-time user for the system. |
db_admin_user |
The name of a user that has rights on the database to create tables and stored procedures. |
db_admin_password |
The password of the administrative user. |
Footnote 1 For Oracle Database, the db_runtime_user
and db_admin_user
are the same user.
Create a new Data Source in your application server, that references the database where your model snapshot tables are stored.
For details of how to create a Data Source in an application server, see the chapter Configuring Data Access for Oracle Real-Time Decisions in the Oracle Real-Time Decisions Installation and Administration of Oracle RTD.
As part of the operation of creating a new Data Source, you provide a new JNDI name, that is used in the steps following.
Add the new Data Source to Oracle RTD.
The actions required to add the new Data Source to Oracle RTD depend on which application server you are using.
Perform the steps appropriate to your application server, as follows:
If your application server is OC4J, continue at step 5.
If your application server is WebSphere, continue at step 11.
If your application server is WebLogic, continue at step 14.
Adding the Data Source to Oracle RTD in OC4J
If Oracle RTD is running on standalone OC4J, go to the directory OC4J_HOME
/j2ee/home/applications/OracleRTD
. If Oracle RTD is running on Oracle Application Server, go to ORACLE_AS_HOME
/j2ee/
oc4j_instance
/applications/OracleRTD
.
This directory is the location where the RTD.ear
file was expanded when you deployed it as an application.
Locate the file ./ls/WEB-INF/web.xml
and open it for editing. Scroll to the bottom of the file. Copy the section for the definition of the resource reference of SDDS
and paste it after the existing section. In the copied section, replace the string SDDS
with the JNDI name (jndi_name
) that you entered in step 3.
For example:
<resource-ref id="jndi_name_LS"> <res-ref-name>jndi_name</res-ref-name> <res-type>javax.sql.DataSource</res-type> <res-auth>Container</res-auth> <res-sharing-scope>Unshareable</res-sharing-scope> </resource-ref>
Then, save the changes and close the file.
Go to the directory OC4J_HOME
/j2ee/home/application-deployments/OracleRTD
. Make sure to go to the application-deployments
directory, not the applications
directory.
Locate the file ./ls/WEB-INF/orion-web.xml
and open it for editing. Scroll to the bottom of the file. Copy the line for the definition of the resource reference mapping of SDDS
and paste it after the existing line. In the copied line, replace the string SDDS
with the JNDI name (jndi_name
) that you entered in step 3. For example:
<resource-ref-mapping name="jndi_name" location="jndi_name"/>
Then, save the changes and close the file.
Start OC4J.
Continue at step 16.
Adding the Data Source to Oracle RTD in WebSphere
Open the following file for editing:
WEBSPHERE_HOME/AppServer/profiles/profile_name/config/cells/server_name/ applications/OracleRTD.ear/deployments/OracleRTD/ls.war/WEB-INF/web.xml
Create a new <resource-ref>
entry, as follows:
Copy the existing <resource-ref>
entry for the Oracle RTD Database (SDDS
) and paste it directly below.
Modify the id
attribute by entering a descriptive value followed by _LS
, similar to the SDDS
entry. The id
must have a unique value within the file.
Modify the <res-ref-name>
tag by entering the JNDI name (jndi_name
) that you provided in step 3.
For example:
<resource-ref id="jndi_name_LS"> <res-ref-name>jndi_name</res-ref-name> <res-type>javax.sql.DataSource</res-type> <res-auth>Container</res-auth> <res-sharing-scope>Unshareable</res-sharing-scope> </resource-ref>
Save the changes and close the file.
Continue at step 16.
Adding the Data Source to Oracle RTD in WebLogic
Go to the directory where you expanded the RTD.ear
file during installation (RTD_HOME
/package/expanded
).
Open the ls.war
archive, extract web.xml
, then open web.xml
for editing. Scroll to the bottom of the file. Copy the section for the definition of the resource reference of SDDS_LS
and paste it after the existing section. In the copied section, replace the string SDDS
with the JNDI name (jndi_name
) that you provided in step 3.
For example:
<resource-ref id="jndi_name_LS"> <res-ref-name>jndi_name</res-ref-name> <res-type>javax.sql.DataSource</res-type> <res-auth>Container</res-auth> <res-sharing-scope>Unshareable</res-sharing-scope> </resource-ref>
Save the changes and close the file, then re-archive the file back in ls.war.
Continue at step 16.
In the J2SE Console, register the name of the new Data Source in the OracleRTD MBeans:
Navigate to MBeans > OracleRTD > SDClusterPropertyManager > Misc.
For the ModelSnapshotDSName attribute, specify the name of the Data Source that you created for the model snapshots.
Figure 3-2 shows the model snapshot Data Source name SDDM specified for the ModelSnapshotDSName attribute in the J2SE Console.
Applications that use Oracle RTD are typically configured to activate model learning, as part of their normal operating procedures.
The process of activating Oracle RTD model learning is simply to execute the calling application. The appropriate application operations then trigger the procedures that send data to Oracle RTD's model learning process. By default, all models learn at session close time.
You can take snapshots of the model learning data at any time, even if not enough data has been accumulated for prediction purposes.
Each model is defined to have a time window, such as week, month, and quarter. This determines how much data to gather in the learning process, and also influences how much data to write to the model snapshot tables.
You can select the amount of data to write from the following options:
All the model learning data for a Study
All the model learning data for a Study for the current time window
You can also delete all model learning data for a Study from the model snapshot tables.
To populate or clear the model snapshot tables:
In the J2SE Console, navigate to MBeans > OracleRTD > Learning Server > your study name.
Click the Operations tab.
Figure 3-3 shows the model snapshot operations specified for the CrossSell Study in the J2SE Console.
Click the appropriate snapshot option:
CompleteSnapshot
Deletes all previously snapped data for the Study, and rewrites all Study data, up to the current time.
IncrementalSnapshot
Deletes Study data for any incomplete time window, and rewrites the current time window's data, up to the current time.
DeleteSnapshot
Deletes all snapped data for the Study.
You can create reports from the model snapshot tables, typically by using standard SQL Select commands.
This section provides examples of scripts that extract information from several of the model snapshot tables. Each script example is followed by sample output. Notes are provided for some of the examples to help you to interpret the results.
The Inline Service used for these examples was a CrossSell application, and the data was generated by running the Oracle RTD Load Generator script to completion, simulating 400,000 user sessions.
The following query gets the counts for every Choice, for all time windows:
select g.display_name as 'Choice Group', c.display_name as 'Choice', e.name as 'Event', mi.timewindow_start as 'Start', mi.timewindow_end as 'End', mi.state as 'Model Status', m.name as 'Model Name', mi.count_total, mi.count_positive, mi.quality from RTDApp a inner join RTDStudy s on s.id=a.study_id inner join RTDModel m on m.study_id=s.id inner join RTDModelInstance mi on mi.model_id=m.id inner join RTDEvent e on mi.event_id=e.id inner join RTDChoice c on c.id=mi.choice_id inner join RTDChoiceGroup g on c.choicegroup_id=g.id where a.name='CrossSell' order by m.name, g.display_name, c.display_name, mi.timewindow_start
Figure 3-4 shows the results of the Counts by Choice query.
Notes on the Counts by Choice Query Results
In row 13, Choice=Gold Card, Event=Interested, count_total=477, count_positive=19, and quality=0.0. This shows that, out of 477 users that were presented with the Gold Card offer, 19 were interested. The counts are small and the model quality with respect to the Gold Card Choice and the Interested Event is 0.
In row 14, Choice=Gold Card, Event=Purchased, count_total=477, count_positive=1, and quality=0.0. This shows that one user purchased this offer. The model quality with respect to the Gold Card Choice and the Purchased Event is 0.
Both row 13 and row 14 apply to the period of time from April 1, 2003 to July 1, 2003.
In the columns Choice Group and Choice, the value BASE EVENT means "in general" or "overall."
For example, in row 1, Choice=BASE EVENT, Event=Interested, count_total=24917, count_positive=1663, and quality is approximately 0.6882. This means that, in the period between the Start and End dates, a grand total of 24917 users were presented offers, and 1663 of these were interested. The overall model quality for this period of time was about 0.69.
In row 2, Choice=BASE EVENT, Event=Purchased, count_total=24917, count_positive=220, and quality is approximately 0.5666. This means that, for the same period of time as for row 1, there were 220 Purchased events across all Choices, and the model quality was about 0.57.
The following query selects the top six predictive attributes, for each time window, for the Credit Protection Choice resulting in the Purchased Event.
select a.name 'Attribute Name', p.predictiveness 'Predictiveness', c.display_name 'Choice Name', mi.timewindow_start as 'Start', mi.timewindow_end as 'End', mi.state as 'Model Status' from RTDApp app inner join RTDChoice c on c.app_id=app.id inner join RTDStudy s on s.id=app.study_id inner join RTDModel m on m.study_id=s.id inner join RTDModelInstance mi on mi.model_id=m.id and mi.choice_id=c.id inner join RTDEvent e on mi.event_id=e.id inner join RTDPredictiveness p on p.model_instance_id=mi.id inner join RTDAttribute a on a.id=p.attribute_id where app.name = 'CrossSell' and c.display_name = 'Credit Protection' and e.name = 'Purchased' and m.name = 'OfferAcceptance' and 7 > (select count(*) from RTDPredictiveness p2 where p2.model_instance_id = p.model_instance_id and p2.predictiveness > p.predictiveness) order by mi.timewindow_end desc, p.predictiveness desc
Figure 3-5 shows the results of the Top Six Predictive Attributes query.
The following query shows, for people of different marital statuses, the number who actually purchased credit protection and the number who were expected to do so. The report also shows the difference between the two values, and the importance of the correlation for the offer acceptance.
select cor.value 'customer MaritalStatus', cor.count_output_input 'Actual Count', mi.count_positive*cor.count_input/mi.count_total 'Expected Count', (100*(mi.count_positive*cor.count_input/mi.count_total - cor.count_output_input))/cor.count_output_input 'Percent Difference', cor.correlation 'Importance', c.display_name 'Choice Name', mi.timewindow_start as 'Start', mi.timewindow_end as 'End', mi.state as 'Model Status' from RTDApp app inner join RTDChoice c on c.app_id=app.id inner join RTDStudy s on s.id=app.study_id inner join RTDModel m on m.study_id=s.id inner join RTDModelInstance mi on mi.model_id=m.id and mi.choice_id=c.id inner join RTDEvent e on mi.event_id=e.id inner join RTDCorrelation cor on cor.model_instance_id=mi.id inner join RTDAttribute a on a.id = cor.attribute_id where app.name = 'CrossSell' and c.display_name = 'Credit Protection' and e.name = 'Purchased' and m.name = 'OfferAcceptance' and a.name = 'customer MaritalStatus' order by mi.timewindow_end desc, cor.correlation desc
Notes on the Difference Between Expected and Actual Counts Query
The Actual Count, cor.count_output_input
, is the actual number of people who purchased credit protection, for each marital status.
The Expected Count is a simple linear projection of the total count of each marital status, cor.count_input
, to those that purchased credit protection, as expressed by mi.count_positive/mi.count_total
.
The Percent Difference is 100 * (Expected Count - Actual Count) / Actual Count.
Figure 3-6 shows the results of the Difference Between Expected and Actual Counts query.
Notes on the Difference Between Expected and Actual Counts Query Results
There are two rows for each marital status, each corresponding to one of the two time periods, April 1, 2003 - July 1, 2003 and July 1, 2003 - October 1, 2003.
The RTDPartition table holds values for partitioning attributes. If a Model is not partitioned, the Model has one Model Instance per time window, and there are no associated rows in the RTDPartition table.
A Model that is split along one or more of its dimensions is a partitioned Model.
As an example, a Model M can be partitioned by two attributes, Marital Status and Favorite Beverage. If there are 3 values for Marital Status (Married, Single, Divorced) and 2 for Favorite Beverage (coffee, tea), then this model has 6 model instances.
In this case, each Model Instance has two associated RTDPartition rows. For example, the Model Instance for the combination (Marital Status=Married and Favorite Beverage=Coffee) would be associated with two RTDPartition rows, containing the following information:
RTDPartition row 1: Attribute=Marital Status, Value=Married
RTDPartition row 2: Attribute=Favorite Beverage, Value=Coffee
Whether or not a Model is partitioned can influence the results of queries on the model snapshot tables. To avoid repetitions in your results, include RTDPartition and RTDAttribute join conditions in your query.
The following example modifies and extends the Difference Between Expected and Actual Counts query to cover the case of a Model partitioned on two attributes, Diabetic, which has "yes" and "no" values, and Marital Status.
select a.name, p.value, subquery.* from (select cor.value 'Favorite Sports', cor.count_output_input 'Actual Count', mi.count_positive*cor.count_input/mi.count_total 'Expected Count', (100*(mi.count_positive*cor.count_input/mi.count_total - cor.count_output_input))/cor.count_output_input 'Percent Difference', cor.correlation 'Importance', c.display_name 'Choice Name', mi.timewindow_start as 'Start', mi.timewindow_end as 'End', mi.state as 'Model Status' mi.id model_instance_id from RTDApp app inner join RTDChoice c on c.app_id=app.id inner join RTDStudy s on s.id=app.study_id inner join RTDModel m on m.study_id=s.id inner join RTDModelInstance mi on mi.model_id=m.id and mi.choice_id=c.id inner join RTDEvent e on mi.event_id=e.id inner join RTDCorrelation cor on cor.model_instance_id=mi.id inner join RTDAttribute a on a.id = cor.attribute_id where app.name = 'HighlyPartitionedDataset' and c.display_name = 'Fanta' and e.name = 'loved' and m.name = 'SatisfactionModel' and a.name = 'Favorite Sports') as subquery inner join RTDPartition p on subquery.model_instance_id = p.model_instance_id inner join RTDAttribute a on p.attribute_id = a.id order by subquery.[End] desc, subquery.model_instance_id, a.name, p.value subquery.[Importance] desc
Figure 3-7 shows the results of the Partitioned Expected and Actual Counts query.
Two new system properties are available in Oracle RTD Version 2.2.1 for tuning the model snapshot process:
ModelSnapshotMinAbsCorrelation controls whether to dump all correlation rows or to set a minimum correlation value for dumping. The default value of 0.000001 prevents the dumping of very small value correlation rows. Set the value to 0 to dump all correlation rows.
ModelSnapshotNumberOfBins controls the number of bins for model snapshots. Numeric attribute values are automatically binned, or assigned to numeric ranges. The default number of bins is 5. To achieve greater resolution of your numeric data, increase the number of bins.
Caution: For the same numeric attribute, Oracle RTD creates different bins in different time windows. Therefore, it is unlikely that you will be able to join numeric attribute values across time windows. |
If you want to change the value of a system property, set the system properties as server properties for your application server. See Oracle Real-Time Decisions Installation and Administration of Oracle RTD for details of how to configure server properties for your application server.
For WebSphere and OC4J, edit the JVM arguments. For WebLogic, edit the Java options. For example, add the parameter -DModelSnapshotNumberOfBins=10
to the JVM arguments for WebSphere and OC4J, or to the Java options for WebLogic.