1
Introducing Oracle9iAS Personalization

Oracle9iAS Personalization (OP) enables 1:1 marketing for e-businesses by dynamically serving personalized recommendations in real time to both registered customers and anonymous visitors. OP uses data mining technology to sift through the large amounts of data gathered from Web sites and other applications to find patterns within purchase, demographic, ratings, and navigational data. OP answers such questions as "Which items is this person most likely to buy or like, and with what likelihood?", "People who bought this item are likely to buy which other item?", and "What should I offer this customer to retain his or her business?"

This chapter first explains what personalization is and then introduces Oracle9iAS Personalization.

What Is Personalization?

Personalization makes recommendations automatically, implicitly, and explicitly. It should not be confused with similar processes such as customization, business rules, or collaborative filtering.

Customization requires users to explicitly state preferences such as which stocks or sports teams to track. Personalization automatically deduces the customer's interests from the customer's behavior.

Business rules, such as "people who buy digital cameras buy many batteries for the cameras," are created from the experience of human beings running a business. They are not automatic and do not necessarily apply to a particular customer.

Collaborative filtering considers a customer's purchasing history but usually is not able to distinguish gifts from regular purchases. For example, you may never buy perfume for yourself, but do sometimes buy perfume as a gift. In this case, you may not want to get recommendations about specials on perfume. Personalization can take such things into consideration.

Personalization permits delivering recommendations with the touch and timing of someone who knows you well.

What Is Oracle9iAS Personalization?

Oracle9iAS Personalization (OP) is an integrated software product that provides a way for businesses to personalize recommendations they suggest to customers.

Recommendations are personalized for each customer. Note: A customer is often thought of as an anonymous visitor or registered customer at a Web site, but can also be a customer calling in to a call center or using an ATM. For OP to be able to serve recommendations, the applications need only be able to make Java API calls to or from OP.

OP collects the data and uses it to build predictive models that support personalized recommendations of the form "a person who has clicked links x and y and who has demographic characteristics a and b is likely to buy z".

OP incorporates visitor activity into its recommendations in real time -- during the Web visitor's session. For example, OP records a visitor's navigation through the Web site, noting the links that are clicked, etc. The visitor may respond to a Web site's request to rate something, e.g., a book or a movie; the rating becomes part of the data stored for that visitor. Any purchases made become part of the data stored for that visitor. All the Web-based behavior for the visitor is saved to a database, where OP uses it to build predictive models. This data can be updated with data collected in subsequent sessions, thereby increasing the accuracy of predictions.

OP can work in conjunction with an existing Web application or other customer applications. The Web application asks OP to record certain activities, and the data is saved by OP into a schema in an Oracle9i database.

When the application asks OP to produce a list of products that the visitor is likely to purchase, OP passes back a scored list of recommendations. The list is compiled from the visitor's current behavior (stored in a database table) and from historical data held in another schema.

A third schema maintains administrative schedules and activities.

Although making recommendations to Web site visitors is one important use of OP, OP can provide recommendations in other situations. Any application that collects customer data and needs to provide recommendations can use OP. OP also has a batch interface that can be used to generate recommendations for use in marketing campaigns.

What Kind of Data Does OP Collect?

OP collects four kinds of data:

navigational behavior
ratings
purchases
demographic data

Of these, navigational behavior allows the most flexibility. It can represent anything the Web application wants to consider a hit (e.g., viewing a page, clicking a link/item, etc.).

Visitors to the Web site are of two types: registered visitors (customers) and unregistered visitors (visitors). For customers, OP has both data from a current session and historical data collected over time for that customer, as well as demographic data. For visitors, there is no historical data associated with this person, so recommendations are based on current session behavior and any demographic data that may be available.
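The distinction between the two visitor types can be restated as a small sketch. The function and the source labels below are hypothetical Python illustrations, not OP identifiers (the OP API itself is Java); they only encode which kinds of data are available for each visitor type as described above.

```python
def available_data_sources(is_registered, visitor_demographics_available=False):
    """Return the kinds of data a recommendation could draw on.

    Hypothetical helper: the labels are illustrative, not OP names.
    """
    sources = ["current_session"]  # available for every visitor
    if is_registered:
        # Registered customers also have history collected over time
        # and demographic data.
        sources.append("historical")
        sources.append("demographic")
    elif visitor_demographics_available:
        # Unregistered visitors have no history; demographic data is
        # used only when it happens to be available.
        sources.append("demographic")
    return sources
```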

How Does OP Collect the Data?

OP collects the data using Java calls provided by the REAPI (Recommendation Engine Application Programming Interface). These calls add information to the recommendation engine (RE) cache for the specific session, identified by a session ID. The RE finds the correct session ID by looking up one of the following arguments passed in the REAPI calls:

appSessionID -- used by sessionful Web applications (that is, an application that stores an identifier for each session)

customerID -- used by sessionless Web applications (that is, an application that does not store an identifier for each session)

In more detail: The data collected are temporarily stored in a dual buffer cache. Periodically the buffer is flushed and the data are sent to the appropriate RE schema. The session data are then used, combined with historical data, to generate recommendations. Finally, the RE instance periodically flushes the data to the Mining Table Repository (MTR) for sessions that have concluded or timed out. The OP administrator can set configuration parameters to indicate what data (by data source type) is saved to the MTR. The data in the MTR is then used to build predictive models for future deployment.
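The dual buffer idea can be illustrated with a minimal single-threaded sketch in Python. Everything here is an assumption made for illustration: the class name, the `sink` callback standing in for the RE schema, and the flush trigger are hypothetical, and a real implementation would also need synchronization between writers and the flusher.

```python
class DualBufferCache:
    """Toy double buffer: new records fill one buffer while the other
    is handed off for persistence.

    Hypothetical sketch; names and behavior are assumptions, not OP code.
    """

    def __init__(self, sink):
        self._active = []    # receives new records
        self._standby = []   # empty buffer swapped in at flush time
        self._sink = sink    # callable that persists a batch of records

    def add(self, record):
        self._active.append(record)

    def flush(self):
        """Swap the buffers, then pass the filled one to the sink.

        Returns the number of records flushed.
        """
        self._active, self._standby = self._standby, self._active
        batch = list(self._standby)
        self._standby.clear()
        if batch:
            self._sink(batch)
        return len(batch)
```

Because the buffers are swapped before the batch is written, calls to `add` made after the swap land in the fresh buffer and are picked up by the next flush.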

Sessionful and Sessionless Web Applications

Some Web applications are sessionful, i.e., they create a session for each visit to the Web site. Others are sessionless (stateless), i.e., they do not create sessions.

Regardless of whether the calling Web application is sessionful or sessionless, OP is always sessionful; OP always creates a session internally and maps that session to the Web site's session if there is one.
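One way to picture this mapping is a registry that always hands back an internal session, keyed by whichever identifier the application supplies. The Python sketch below is hypothetical: the class, the method, and the integer session IDs are assumptions for illustration and do not reproduce the RE's actual lookup. Only the two identifier names, appSessionID and customerID, come from the REAPI description.

```python
import itertools


class SessionRegistry:
    """Toy model of mapping application identifiers to internal sessions.

    Hypothetical sketch, not the RE implementation.
    """

    def __init__(self):
        self._by_app_session = {}  # appSessionID -> internal session id
        self._by_customer = {}     # customerID -> internal session id
        self._next_id = itertools.count(1)

    def resolve(self, app_session_id=None, customer_id=None):
        """Return the internal session id, creating one on first use."""
        if app_session_id is not None:
            # Sessionful application: it supplies its own session identifier.
            table, key = self._by_app_session, app_session_id
        elif customer_id is not None:
            # Sessionless application: fall back to the customer identifier.
            table, key = self._by_customer, customer_id
        else:
            raise ValueError("need app_session_id or customer_id")
        if key not in table:
            table[key] = next(self._next_id)
        return table[key]
```

In both cases the caller ends up with an internal session, which mirrors the statement that OP is always sessionful.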

During the OP session, the Web application can collect data and/or request recommendations.

What Does OP Do with the Data?

OP uses the data to build data mining models. The models predict what the Web site visitor will probably like or buy. The predictions are based on the data collected for that Web site visitor in previous sessions and in the current session, and on all available demographic, ratings, purchase, and navigational data stored in the MTR.

A model is no better than the data that it is based on. As time passes, more data is collected. When there is more data available, a model should be rebuilt.

The OP Administrator defines a package that controls model building and deployment. A package specifies the build settings and other attributes that control the way a model is to be built, as well as the RE Farm (collection of recommendation engines) to which the model is to be deployed. After the build is complete, the package consists of the rules tables that are deployed to the recommendation engines in the specified RE Farm.

The OP Administrator creates and manages schedules for building the packages, and for deploying the packages to the recommendation engines (REs) that will produce the recommendations. Recommendation engines with the same package are grouped together in recommendation engine farms (RE Farms). These and related terms are defined more fully in the next section.

Models Built by OP

OP uses data mining models to make predictions. A model is actually part of an OP package.

OP uses one of two algorithms, depending on the type of recommendation requested by the Web application. Both algorithms are based on a theorem of Bayes concerning conditional probability. See Appendix A for a description of the algorithms.
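Appendix A describes the actual algorithms. As background only, the following Python sketch applies Bayes' theorem, P(z | x) = P(x | z) P(z) / P(x), to a tiny set of made-up counts. The counts and the function name are invented for illustration and say nothing about how OP computes its scores.

```python
from fractions import Fraction

# Made-up counts for illustration only: 100 sessions in total.
TOTAL = 100
BOUGHT_Z = 20                 # sessions in which item z was bought
CLICKED_X = 40                # sessions in which link x was clicked
CLICKED_X_AND_BOUGHT_Z = 16   # sessions with both events


def posterior_buy_given_click():
    """P(buy z | clicked x) computed with Bayes' theorem."""
    prior = Fraction(BOUGHT_Z, TOTAL)                        # P(z)
    likelihood = Fraction(CLICKED_X_AND_BOUGHT_Z, BOUGHT_Z)  # P(x | z)
    evidence = Fraction(CLICKED_X, TOTAL)                    # P(x)
    return likelihood * prior / evidence
```

With these counts the function returns 2/5, which matches the direct ratio of 16 sessions with both events to 40 sessions with a click on x.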

Data for Model Building

Model building requires data. OP must have the data required to build a model before you try to build and deploy a package.

If you have data collected and saved to an Oracle database, that data can be used to populate the MTR tables. As an alternative, the MTR schema can be mapped to the existing data via views.

However, if you have no previously collected data, you must use the REAPI methods addItem and addItems to collect data. Data collection occurs in the Recommendation Engine (RE). For an RE to be up and running, there must be a package deployed in that RE. However, in order to build and deploy a package, you must have data in the MTR. To put it simply, you can't collect data unless you have enough data to build a package. You resolve this problem by populating the MTR with seed data and then using the seed data to build and deploy an initial package. See the administrator's guide for information on how to use the seed data.
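The ordering constraint can be written out as a short sketch. The Python function below is hypothetical; its step names are illustrative labels rather than OP commands, and it only encodes the dependency described above: data in the MTR, then a built package, then a deployed package, and only then data collection.

```python
def bootstrap(mtr_rows, seed_rows):
    """Return the ordered steps needed before real data collection starts.

    Hypothetical sketch of the ordering; not an OP procedure.
    """
    steps = []
    if not mtr_rows:
        # No data yet: seed the MTR so that a first package can be built.
        mtr_rows = list(seed_rows)
        steps.append("populate MTR with seed data")
    if not mtr_rows:
        raise ValueError("cannot build a package without data")
    steps.append("build package")
    steps.append("deploy package to RE")
    steps.append("collect data through REAPI")
    return steps
```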

OP Components

The user of the OP Administrative UI is anyone who needs to build and deploy packages. The Oracle9iAS Personalization User's Guide is designed to introduce anyone who needs to build and deploy packages to the basic components and interfaces of OP. This guide may also be useful to people who design and implement REAPI programs.

Note:

OP requires both an Oracle9i database and the Oracle9i Application Server. The database and the application server can either be installed on the same system or on different systems. If they are installed on different systems, some OP components are installed on the system where the database is installed; others are installed on the system where the application server is installed.

The OP components and interfaces consist of:

Administrative UI: A browser-based user interface that permits the OP Administrator to schedule package builds, deployments, and reports, manage RE Farms and REs, and otherwise manage OP. Chapter 2 describes the Administrative UI in detail. The Administrative UI is installed on the system where Oracle9iAS is installed.
Recommendation Engine (RE): An RE consists of the programs and tables (RE schema) required for collecting data and making recommendations. The RE supports a Web application written in Java for collecting and preprocessing customer and visitor data, and for providing recommendations to those customers and visitors. Access to the RE is provided via the REAPI (Recommendation Engine Application Programming Interface). A given RE may support one or more Java server processes in a Web application. An RE resides in the customer database on the system where Oracle9i is installed.
Recommendation Engine Farm (RE Farm): A logical grouping of one or more REs that are populated with the same deployable package (data mining model(s)). An RE Farm is generally treated as a single unit for management from the Administrative UI.
Package: An object created using the Administrative UI. A package contains the information from historical data necessary to make recommendations. A package defines the build settings and other attributes necessary for building data mining models and for scheduling model builds. A package also contains the model that is built from this definition.
Mining Object Repository (MOR): The schema that maintains mining metadata and mining model results. The MOR contains data required for logging in to the data mining system, logging off, and scheduling OP events. The Administrative UI provides a way to interact with the MOR. The MOR is installed on the system where the database is installed.
Mining Table Repository (MTR): The MTR contains the schema and data to be used for data mining. The MTR has a fixed schema designed to support the building of models that produce recommendations. The MTR resides in the customer database on the system where Oracle9i is installed.
Recommendation Engine API (REAPI): A collection of Java classes that enable a Web application to collect and preprocess data used to build OP models and to obtain recommendations in real time from OP (that is, to score items for particular customers). REAPI is installed on the system where Oracle9iAS is installed.
Batch Recommendation Engine API: A collection of Java classes that permits users to obtain bulk recommendations in batch, i.e., offline. The Batch API is installed on the system where Oracle9iAS is installed.
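The relationship between REs, RE Farms, and packages can be pictured with a small data-structure sketch. The Python classes below are hypothetical and exist only to illustrate the definitions in the list above; they do not correspond to OP classes or tables.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RecommendationEngine:
    name: str
    package: Optional[str] = None  # name of the deployed package, if any


@dataclass
class REFarm:
    """A logical group of REs that all carry the same package."""

    name: str
    engines: List[RecommendationEngine] = field(default_factory=list)

    def deploy(self, package):
        """Deploy one package to every RE so the farm stays uniform."""
        for engine in self.engines:
            engine.package = package

    def is_uniform(self):
        """True when all REs in the farm carry the same package."""
        return len({engine.package for engine in self.engines}) <= 1
```

Treating the farm as the unit of deployment is what lets it be managed as a single object: one call updates every RE in the group.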

During OP installation, you can optionally populate the MTR with a small amount of sample data. If you choose this option, you can access an RE demo and test some recommendations and administrative actions.

Location of OP Components

OP requires both Oracle9i and the Oracle9i Application Server (Oracle9iAS). Oracle9i and Oracle9iAS may be installed on different systems.

If the database and Oracle9iAS are installed on different systems, the following OP components are installed on the system where Oracle9iAS is installed:

REAPI
RE Batch API
REAPI Demo
The Administrative UI

All other components are installed on the system where the database is installed.

How It All Works

OP's components and process are diagrammed in Figure 1-1. The diagram is a flow chart of the entire Oracle Personalization process.

Keep in mind the following main points about the different OP components: the MTR (Mining Table Repository) is where all the data is stored -- data that is used to create the model that produces the rules that generate the recommendations.

The MOR (Mining Object Repository) provides the administrative environment within OP; it holds all the tables that are responsible for OP's administrative functions. You access the MOR through the Administrative UI, which is where you control MOR functions such as creating recommendation engine farms, building packages, and scheduling packages for builds and deployments.

Figure 1-1 Oracle Personalization Process.


The RE (Recommendation Engine) is the part of OP that generates the recommendations that are displayed within a Web application. The RE is also the part of OP that collects data into the RE schema.

The process, in a nutshell, is as follows: The data (step 1) resides in the MTR, and is transferred to the MOR (step 2), where a package (a data mining model) is developed. Once that package is built, it is deployed to the recommendation engine (step 3), where it is used to score data and records and develop the recommendations that are then passed to the Web application (step 4). The Web application collects data through the REAPI data collection methods; that data is passed back to the REs (step 5) and is eventually synchronized back to the MTR (step 6).

Where does the data come from? Wherever it comes from, it has to end up in the MTR. There are two ways of getting the data into the MTR. The most direct way is to map existing customer data (session data -- ratings, purchasing, or navigational -- or demographic data) onto the MTR schema. This method lets you generate recommendations very quickly. If you have no customer data, you can use the seed data that is optionally installed with OP. The point is simply that you have to have data of some kind to get started; you cannot build a model without data. It can be real data, mapped to the MTR schema, or artificial data, which you use to get started and can then adjust as real customer data comes in.

The second way to get data into the MTR is through the REAPI data collection methods, which are called from within the Web site. As a visitor moves through the Web application, the REAPI calls collect data at different points and send it back to the RE, which then passes the data back to the MTR, where it is used in subsequent model builds.

Note that to collect data using REAPI calls, there must be a deployed package in the RE. Of course, you cannot deploy a package until you build it, and you cannot build a package without data. If you have no real data, you can use the seed data to kick-start the process. The seed data is artificial data -- it is the minimum amount of data required to build and deploy a package. Once you have built and deployed a package, you can collect real data, and you can then build a package that can be used to generate recommendations relevant to your target.
