
Oracle Data Mining Application Developer's Guide
10g Release 1 (10.1)

Part Number B10699-01

ODM Java API Basic Usage

This chapter describes how to use the ODM Java interface to write data mining applications in Java. Our approach in this chapter is to use a simple example to describe the use of different features of the API.

For detailed descriptions of the class and method usage, refer to the Javadoc that is shipped with the product. See the administrator's guide for the location of the Javadoc.

3.1 Connecting to the Data Mining Server

To perform any mining operation in the database, first create an instance of oracle.dmt.odm.DataMiningServer class. This instance is used as a proxy to create connections to a data mining server (DMS), and to maintain the connection. The DMS is the server-side, in-database component that performs the actual data mining operations within ODM. The DMS also provides a metadata repository consisting of mining input objects and result objects, along with the namespaces within which these objects are stored and retrieved.

In this step, we illustrate creating a DataMiningServer object and then logging in to get the connection. Note that there is a logout method to release all the resources held by the connection.

// Create an instance of the DMS server and get a connection.
// Supply the database JDBC URL, user name, and password for the
// data mining user schema.
DataMiningServer dms = new DataMiningServer(
   "DB_URL",    // JDBC URL: jdbc:oracle:thin:@Hostname:Port:SID
   "user_name", // User name
   "password"); // Password

//Login to get the DMS connection
oracle.dmt.odm.Connection m_dmsConn = dms.login();

3.2 Describing the Mining Data

In the ODM Java interface, the oracle.dmt.odm.LocationAccessData (LAD) and oracle.dmt.odm.PhysicalDataSpecification (PDS) classes are used to describe the mining dataset (a table or view in the user schema). To represent a single-record format dataset, use an instance of the NonTransactionalDataSpecification class; to represent a multi-record format dataset, use the TransactionalDataSpecification class. Both classes inherit from the common superclass PhysicalDataSpecification. For more information about the data formats, refer to ODM Concepts.

In this step, we illustrate creating LAD and PDS objects for both types of formats.

3.2.1 Creating LocationAccessData

The LocationAccessData (LAD) class encapsulates the dataset location details. The following code describes the creation of this object.

// Create a LocationAccessData by specifying the table/view name
// and the schema name 
LocationAccessData lad =
          new LocationAccessData("input table name", "schema name");

3.2.2 Creating NonTransactionalDataSpecification

The NonTransactionalDataSpecification class contains the LocationAccessData object and specifies the data format as single-record case. The following code describes the creation of this object.

// Create the actual NonTransactionalDataSpecification
PhysicalDataSpecification pds =
          new NonTransactionalDataSpecification(lad);

3.2.3 Creating TransactionalDataSpecification

The TransactionalDataSpecification class contains a LocationAccessData object; it specifies the data format as multi-record case and it specifies the column roles.

This dataset must contain three types of columns: Sequence-Id/case-id column to represent each case, attribute name column, and attribute value column. This format is commonly used when the data has a large number of attributes. For more information, refer to ODM Concepts. The following code illustrates the creation of this object.

// Create the actual TransactionalDataSpecification for transactional data.
PhysicalDataSpecification pds =
          new TransactionalDataSpecification(
                    "CASE_ID",    //column name for sequence id
                    "ATTRIBUTES", //column name for attribute name
                    "VALUES",     //column name for value
                    lad           //Location Access Data
          );
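The relationship between the two formats can be sketched outside the API: each single-record row expands into one (case id, attribute name, attribute value) row per attribute. The class and method names below are hypothetical, for illustration only; they are not part of the ODM API.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: expands one single-record (non-transactional) row
// into multi-record (transactional) rows of the form
// (case id, attribute name, attribute value).
class TransactionalFormatDemo {
    static List<String[]> toTransactional(String caseId,
                                          String[] attrNames,
                                          String[] attrValues) {
        List<String[]> rows = new ArrayList<>();
        for (int i = 0; i < attrNames.length; i++) {
            rows.add(new String[] { caseId, attrNames[i], attrValues[i] });
        }
        return rows;
    }

    public static void main(String[] args) {
        // One case with two attributes becomes two transactional rows.
        for (String[] r : toTransactional("101",
                new String[] { "AGE", "GENDER" },
                new String[] { "34", "F" })) {
            System.out.println(r[0] + ", " + r[1] + ", " + r[2]);
        }
    }
}
```

This expansion is why the transactional format is preferred when the data has a large number of (mostly sparse) attributes.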

3.3 MiningFunctionSettings Object

The class oracle.dmt.odm.settings.function.MiningFunctionSettings (MFS) is the common superclass for all types of mining function settings classes. It encapsulates the details of function and algorithm settings, logical data, and data usage specifications. For more detailed information about logical data and data usage specifications, refer to the Javadoc documentation for oracle.dmt.odm.data.LogicalDataSpecification and oracle.dmt.odm.settings.function.DataUsageSpecification.

An MFS object is a named object that can be stored in the DMS. If no algorithm is specified, the underlying DMS selects the default algorithm and its settings for that function. For example, Naive Bayes is the default algorithm for the classification function. The ODM Java interface provides a function settings class for each mining function, along with associated algorithm settings classes; the algorithm settings classes include:

oracle.dmt.odm.settings.algorithm.NaiveBayesSettings (Default)

oracle.dmt.odm.settings.algorithm.SVMRegressionSettings (Default)

oracle.dmt.odm.settings.algorithm.AprioriAlgorithmSettings (Default)

oracle.dmt.odm.settings.algorithm.KMeansAlgorithmSettings (Default)

oracle.dmt.odm.settings.algorithm.OClusterAlgorithmSettings


In this step, we illustrate the creation of a ClassificationFunctionSettings object using the Naive Bayes algorithm.

3.3.1 Creating Algorithm Settings

The class oracle.dmt.odm.settings.algorithm.MiningAlgorithmSettings is the common superclass for all algorithm settings. It encapsulates all the settings that can be tuned by a data-mining expert based on the problem and the data. ODM provides default values for algorithm settings; refer to the Javadoc documentation for more information about each of the algorithm settings. For example, Naive Bayes has two settings: singleton_threshold and pairwise_threshold. The default value for both of these settings is 0.01.

In this step we create a NaiveBayesSettings object that will be used by the next step to create the ClassificationFunctionSettings object.

// Create the Naive Bayes algorithm settings by setting both the pairwise
// and singleton thresholds to 0.01.
NaiveBayesSettings nbAlgo = new NaiveBayesSettings(0.01f, 0.01f);

3.3.2 Creating Classification Function Settings

An MFS object can be created in two ways: by using the constructor or by using create and adjust utility methods. If you have the input dataset, it is recommended that you use the create utility method because it simplifies the creation of this complex object.

In this example, the utility method is used to create a ClassificationFunctionSettings object for a dataset that has all unprepared categorical attributes and an ID column. Here we use automated binning; for more information about data preparation, see Section 3.12.

// Create classification function settings using the create utility method
ClassificationFunctionSettings mfs =
          ClassificationFunctionSettings.create(
                   m_dmsConn,       //DMS Connection
                   nbAlgo,          //NB algorithm settings
                   pds,             //Build data specification
                   "target_attribute_name",         //Target column
                   AttributeType.categorical,       //Target attribute type
                   DataPreparationStatus.unprepared //Default preparation status
          );

//Set ID attribute as an inactive attribute
mfs.adjustAttributeUsage(new String[]{"ID"},AttributeUsage.inactive);

3.3.3 Validate and Store Mining Function Settings

Because the MiningFunctionSettings object is a complex object, it is a good practice to validate the correctness of this object before persisting it. If you use utility methods to create MFS, then it will be a valid object.

The following code illustrates validation and persistence of the MFS object.

// Validate and store the ClassificationFunctionSettings object
try {
  mfs.validate();
  mfs.store(m_dmsConn, "Name_of_the_MFS");
} catch(ODMException invalidMFS) {
  throw invalidMFS;
}

3.4 MiningTask Object

The class oracle.dmt.odm.task.MiningTask is the common superclass for all the mining tasks. This class provides asynchronous execution of mining operations in the database using DBMS_JOBS. For each execution of the task an oracle.dmt.odm.task.ExecutionHandle object is created. The ExecutionHandle object provides the methods to retrieve the status of the execution and utility methods like waitForCompletion, terminate, and getStatusHistory. Refer to the Javadoc API documentation of these classes for more information.

The ODM Java interface has a task class for each mining operation; this chapter illustrates MiningBuildTask, ClassificationTestTask, MiningLiftTask, and MiningApplyTask.
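The asynchronous execution model can be sketched generically: store the task, submit it, then poll its status until it reaches a terminal state. The sketch below is not the ODM API; the Task interface and status strings are hypothetical stand-ins for ExecutionHandle and MiningTaskStatus.

```java
// Generic sketch of waitForCompletion-style polling (not the ODM API).
class TaskPollingDemo {
    interface Task { String getStatus(); } // e.g. "RUNNING", "COMPLETED", "ERROR"

    // Poll the task's status until it leaves the RUNNING state
    // or the poll budget is exhausted.
    static String waitForCompletion(Task task, int maxPolls) {
        String status = task.getStatus();
        int polls = 0;
        while (status.equals("RUNNING") && polls < maxPolls) {
            // A real implementation would sleep between polls.
            status = task.getStatus();
            polls++;
        }
        return status;
    }

    public static void main(String[] args) {
        // Simulated task that completes after two polls.
        Task task = new Task() {
            int calls = 0;
            public String getStatus() {
                return (++calls < 3) ? "RUNNING" : "COMPLETED";
            }
        };
        System.out.println(waitForCompletion(task, 100)); // prints COMPLETED
    }
}
```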

3.5 Build a Mining Model

To build a mining model, the MiningBuildTask object is used. It encapsulates the input and output details of the model build operation.

In this step, we illustrate creating, storing, and executing the MiningBuildTask object and retrieving the task execution status by using the ExecutionHandle object.

// Create a build task and store it.
MiningBuildTask buildTask =
          new MiningBuildTask(
                   pds,                 //Build data specification
                   "Name_of_the_MFS",   //Mining function settings name
                   "name_of_the_model"  //Mining model name
          );

// Store the task
buildTask.store(m_dmsConn, "name_of_the_build_task");

// Execute the task
ExecutionHandle execHandle = buildTask.execute(m_dmsConn);

// Wait for the task execution to complete
MiningTaskStatus status = execHandle.waitForCompletion(m_dmsConn);

After the build task completes successfully, the model is stored in the DMS with a name specified by the user.

3.6 MiningModel Object

The class oracle.dmt.odm.model.MiningModel is the common superclass for all the mining models. It is a wrapper class for the actual model stored in the DMS. Each model class provides methods for retrieving the details of the models. For example, AssociationRulesModel provides methods to retrieve the rules from the model using different filtering criteria. Refer to Javadoc API documentation for more details about the model classes.

In this step, we illustrate restoring the NaiveBayesModel object and retrieving the ModelSignature object. The ModelSignature object specifies the input attributes required to apply data using a specific model.

//Restore the Naive Bayes model by name
//(restore method per the model class Javadoc)
NaiveBayesModel nbModel = (NaiveBayesModel)
          MiningModel.restore(m_dmsConn, "name_of_the_model");

//Get the model signature 
ModelSignature nbModelSignature = nbModel.getSignature();

3.7 Testing a Model

After creating the classification model, you can test the model to assess its accuracy and compute a confusion matrix using the test dataset.

In this step, we illustrate how to test the classification model using the ClassificationTestTask object and how to retrieve the test results using the ClassificationTestResult object.

3.7.1 Describe the Test Dataset

To test the model, a compatible test dataset is required. For example, if the model is built using a single-record dataset, then the test dataset must also be in single-record format. All the active attributes and the target attribute column must be present in the test dataset.

To test a model, the user needs to specify the test dataset details using the PhysicalDataSpecification class.

     //Create PhysicalDataSpecification
      LocationAccessData lad = new LocationAccessData(
                                 "test_dataset_name",
                                 "schema_name" );
      PhysicalDataSpecification pds =
        new NonTransactionalDataSpecification( lad );

3.7.2 Test the Model

After creating the PhysicalDataSpecification object, create a ClassificationTestTask instance by specifying the input arguments required to perform the test operation. Before executing a task, it must be stored in the DMS. After invoking execute on the task, the task is submitted for asynchronous execution in the DMS. To wait for the completion of the task, use waitForCompletion method.

      //Create, store & execute the test task
      ClassificationTestTask testTask = new ClassificationTestTask(
                   pds,                 //test data specification
                   "name_of_the_model", //model to be tested
                   "name_of_the_test_results_object" );
      testTask.store(m_dmsConn, "name_of_the_test_task");
      ExecutionHandle execHandle = testTask.execute(m_dmsConn);

     //Wait for completion of the test task
     MiningTaskStatus taskStatus = execHandle.waitForCompletion(m_dmsConn);

3.7.3 Get the Test Results

After successful completion of the test task, you can restore the results object persisted in the DMS using the restore method. The ClassificationTestResult object has get methods for accuracy and confusion matrix. The toString method can be used to display the test results.

//Restore the test results
ClassificationTestResult testResult =
          ClassificationTestResult.restore(m_dmsConn,
                   "name_of_the_test_results_object");

//Get accuracy
double accuracy = testResult.getAccuracy();

//Get confusion matrix
ConfusionMatrix confMatrix = testResult.getConfusionMatrix();

//Display results
System.out.println(testResult.toString());
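Independently of the ODM classes, the quantities returned here can be computed directly from actual and predicted target values. The sketch below is illustration only; the class names are hypothetical.

```java
// Conceptual sketch: accuracy and a two-class confusion matrix
// derived from actual vs. predicted targets (classes coded 0/1).
class TestMetricsDemo {
    // matrix[actual][predicted]
    static int[][] confusionMatrix(int[] actual, int[] predicted) {
        int[][] m = new int[2][2];
        for (int i = 0; i < actual.length; i++) {
            m[actual[i]][predicted[i]]++;
        }
        return m;
    }

    // Accuracy = correct predictions (the diagonal) / all predictions.
    static double accuracy(int[][] m) {
        double correct = m[0][0] + m[1][1];
        double total = m[0][0] + m[0][1] + m[1][0] + m[1][1];
        return correct / total;
    }

    public static void main(String[] args) {
        int[] actual    = {1, 0, 1, 1, 0};
        int[] predicted = {1, 0, 0, 1, 0};
        int[][] m = confusionMatrix(actual, predicted);
        System.out.println("Accuracy: " + accuracy(m)); // 4 of 5 correct -> 0.8
    }
}
```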

3.8 Lift Computation

Lift is a measure of how much better prediction results are using a model than could be obtained by chance. You can compute lift after the model is built successfully. You can compute lift using the same test dataset. The test dataset must be compatible with the model as described in Section 2.2.4.

In this step, we illustrate how to compute lift by using MiningLiftTask object and how to retrieve the test results using MiningLiftResult object.

3.8.1 Specify Positive Target Value

To compute lift, a positive target value needs to be specified. This value depends on the dataset and the data mining problem. For example, for a marketing campaign response model, the positive target value could be "customer responds to the campaign". In the Java interface, oracle.dmt.odm.Category class is used to represent the target value.

      Category positiveCategory = new Category(
        "Display name of the positive target value",
        "String representation of the target value",
        DataType.intType ); //Data type

3.8.2 Compute Lift

To compute lift, create a MiningLiftTask instance by specifying the input arguments that are required to perform the lift operation. The user needs to specify the number of quantiles to be used. A quantile is the specific value of a variable that divides the distribution into two parts: those values that are greater than the quantile value and those values that are less. Here the test dataset records are divided into the user-specified number of quantiles and lift is computed for each quantile.

      //Create, store & execute the lift task
      MiningLiftTask liftTask = new MiningLiftTask(
                   pds,                 //test data specification
                   10,                  //Number of quantiles
                   positiveCategory,    //Positive target value
                   "name_of_the_model", //Input model name
                   "name_of_the_lift_results_object" );
      liftTask.store(m_dmsConn, "name_of_the_lift_task");
      ExecutionHandle execHandle = liftTask.execute(m_dmsConn);

     //Wait for completion of the lift task
     MiningTaskStatus taskStatus = execHandle.waitForCompletion(m_dmsConn);

3.8.3 Get the Lift Results

After successful completion of the lift task, you can restore the MiningLiftResult object persisted in the DMS using the restore method. To get the lift measures for each quantile, use getLiftResultElements(). The toString() method can be used to display the lift results.

//Restore the lift results
MiningLiftResult liftResult = 
          MiningLiftResult.restore(m_dmsConn, "name_of_the_lift_results");
//Get lift measures for each quantile
LiftResultElement[] quantileLiftResults =
          liftResult.getLiftResultElements();

//Display results
System.out.println(liftResult.toString());
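The per-quantile lift computation described above can be sketched generically: records are sorted by the model's score, split into quantiles, and each quantile's positive rate is compared to the overall positive rate. This is not the ODM implementation; names are hypothetical.

```java
import java.util.Arrays;

// Conceptual sketch of quantile-based lift.
class LiftDemo {
    // targets: actual 0/1 outcomes, already sorted by descending model score.
    // Assumes targets.length is divisible by quantiles, for brevity.
    static double[] liftPerQuantile(int[] sortedTargets, int quantiles) {
        int n = sortedTargets.length;
        double overallRate = Arrays.stream(sortedTargets).sum() / (double) n;
        double[] lift = new double[quantiles];
        int per = n / quantiles;
        for (int q = 0; q < quantiles; q++) {
            int positives = 0;
            for (int i = q * per; i < (q + 1) * per; i++) {
                positives += sortedTargets[i];
            }
            // Lift = quantile's positive rate / overall positive rate.
            lift[q] = (positives / (double) per) / overallRate;
        }
        return lift;
    }

    public static void main(String[] args) {
        // 8 records sorted by score; positives concentrated at the top.
        int[] targets = {1, 1, 1, 0, 1, 0, 0, 0};
        System.out.println(Arrays.toString(liftPerQuantile(targets, 2)));
        // top half holds 3 of 4 positives -> lift 1.5; bottom half -> 0.5
    }
}
```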

3.9 Scoring Data Using a Model

A classification or clustering model can be applied to new data to make predictions; the process is referred to as "scoring data."

Similar to the test dataset, the apply dataset must have all the active attributes that were used to build the model. Unlike the test dataset, the apply dataset does not need a target attribute column; the apply process predicts the values of the target attribute. The ODM Java API supports real-time scoring in addition to batch scoring (that is, scoring with an input table).

In this step, we illustrate how to apply a model to a table/view to make predictions and how to apply a model to a single record for real-time scoring.

3.9.1 Describing Apply Input and Output Datasets

The Apply operation requires an input dataset that has all the active attributes that were used to build the model. It produces an output table in the user-specified format.

//Create PhysicalDataSpecification
LocationAccessData lad = new LocationAccessData(
          "apply_input_table_name", "schema_name" );
PhysicalDataSpecification pds =
          new NonTransactionalDataSpecification( lad );

//Output table location details
LocationAccessData outputTable = new LocationAccessData(
          "apply_output_table_name", "schema_name" );

3.9.2 Specify the Format of the Apply Output

The DMS also needs to know the content of the scoring output. This information is captured in a MiningApplyOutput (MAO) object. An instance of MiningApplyOutput specifies the data (columns) to be included in the apply output table that is created as the result of an apply operation. The columns in the apply output table are described by a combination of ApplyContentItem objects. These columns can be either from the input table or generated by the scoring task (for example, prediction and probability). The following steps create a MiningApplyOutput object:

        // Create MiningApplyOutput object using default settings
        MiningApplyOutput mao = MiningApplyOutput.createDefault();

        // Add all the source attributes to be returned with the scored result.
        // For example, here we add attribute "CUST_ID" from the original table
        // to the apply output table
        MiningAttribute sourceAttribute =
                  new MiningAttribute("CUST_ID", DataType.intType,
                        AttributeType.notApplicable); //usage assumed; see Javadoc
        Attribute destinationAttribute = new Attribute(
                  "CUST_ID", DataType.intType);

        ApplySourceAttributeItem m_ApplySourceAttributeItem =
           new ApplySourceAttributeItem(sourceAttribute, destinationAttribute);
        // Add a source and destination mapping
        mao.addItem(m_ApplySourceAttributeItem);

3.9.3 Apply the Model

To apply the model, create a MiningApplyTask instance by specifying the input arguments that are required to perform the apply operation.

    //Create, store & execute the apply task
    MiningApplyTask applyTask = new MiningApplyTask(
                     pds,                 //apply input data specification
                     "name_of_the_model", //Input model name
                     mao,                 //MiningApplyOutput object
                     outputTable,         //Apply output table location details
                     "name_of_the_apply_results" //Apply results name
                     );
    applyTask.store(m_dmsConn, "name_of_the_apply_task");
    ExecutionHandle execHandle = applyTask.execute(m_dmsConn);

    //Wait for completion of the apply task
    MiningTaskStatus taskStatus = execHandle.waitForCompletion(m_dmsConn);

3.9.4 Real-Time Scoring

To apply the model to a single record, use the oracle.dmt.odm.result.RecordInstance class. Model classes that support record apply have a static apply method, which takes a RecordInstance object as input and returns the prediction and its probability.

In this step, we illustrate the creation of the RecordInstance object and score it using the Naive Bayes model's static apply method.

//Create RecordInstance object for a model with two active attributes
RecordInstance inputRecord = new RecordInstance();

//Add active attribute values to this record
AttributeInstance attr1 = new AttributeInstance("Attribute1_Name", value);
AttributeInstance attr2 = new AttributeInstance("Attribute2_Name", value);
//Attach the attributes to the record (method name per the RecordInstance Javadoc)
inputRecord.addAttributeInstance(attr1);
inputRecord.addAttributeInstance(attr2);

//Record apply, output record will have the prediction value and its probability 
RecordInstance outputRecord = NaiveBayesModel.apply(
m_dmsConn, inputRecord, "model_name");
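Conceptually, record apply for Naive Bayes combines the class priors with per-attribute conditional probabilities to produce a posterior probability for each target class. The sketch below is a generic illustration, not the ODM NaiveBayesModel API; all names and numbers are hypothetical.

```java
// Conceptual sketch of single-record Naive Bayes scoring:
// posterior(c) is proportional to prior(c) * product of P(attr value | c).
class RecordScoreDemo {
    // condProb[c][a] = P(record's value of attribute a | class c)
    static double[] posterior(double[] priors, double[][] condProb) {
        double[] post = new double[priors.length];
        double norm = 0.0;
        for (int c = 0; c < priors.length; c++) {
            post[c] = priors[c];
            for (double p : condProb[c]) post[c] *= p;
            norm += post[c];
        }
        // Normalize so the class probabilities sum to 1.
        for (int c = 0; c < post.length; c++) post[c] /= norm;
        return post;
    }

    public static void main(String[] args) {
        // Two classes, two active attributes; the numbers are made up.
        double[] post = posterior(new double[] {0.7, 0.3},
                                  new double[][] { {0.2, 0.5}, {0.6, 0.4} });
        System.out.printf("P(0)=%.3f P(1)=%.3f%n", post[0], post[1]);
    }
}
```

The predicted class is the one with the highest posterior; its posterior is the probability returned with the prediction.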

3.10 Use of CostMatrix

The class oracle.dmt.odm.CostMatrix is used to represent the costs of the false positive and false negative predictions. It is used for classification problems to specify the costs associated with the false predictions. A user can specify the cost matrix in the classification function settings. For more information about the cost matrix, see ODM Concepts.

The following code illustrates how to create a cost matrix object where the target has two classes: YES (1) and NO (0). Suppose a positive (YES) response to the promotion generates $2 and the cost of the promotion is $1. Then the cost of misclassifying a positive responder (predicting NO for an actual YES) is $2, and the cost of misclassifying a non-responder (predicting YES for an actual NO) is $1.

        // Define a list of categories
        Category negativeCat = new Category(
                "negativeResponse", "0", DataType.intType);
        Category positiveCat = new Category(
                "positiveResponse", "1", DataType.intType);

        // Define a Cost Matrix
        // addEntry( Actual Category, Predicted Category, Cost Value)
        CostMatrix costMatrix = new CostMatrix();
        // Row 1
        costMatrix.addEntry(negativeCat, negativeCat, new Integer("0"));
        costMatrix.addEntry(negativeCat, positiveCat, new Integer("1"));
        // Row 2
        costMatrix.addEntry(positiveCat, negativeCat, new Integer("2"));
        costMatrix.addEntry(positiveCat, positiveCat, new Integer("0"));
        // Set Cost Matrix to MFS
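To see how such a cost matrix changes predictions, one can compute the expected cost of each candidate prediction and choose the minimum. The sketch below is a generic illustration using the costs from the example, not the ODM implementation.

```java
// Conceptual sketch: cost-sensitive prediction.
// cost[actual][predicted], as in the addEntry calls above.
class CostMatrixDemo {
    // prob[c] = model's probability that the actual class is c.
    // Returns the predicted class with the minimum expected cost.
    static int minCostPrediction(double[] prob, double[][] cost) {
        int best = 0;
        double bestCost = Double.MAX_VALUE;
        for (int predicted = 0; predicted < cost.length; predicted++) {
            double expected = 0.0;
            for (int actual = 0; actual < cost.length; actual++) {
                expected += prob[actual] * cost[actual][predicted];
            }
            if (expected < bestCost) { bestCost = expected; best = predicted; }
        }
        return best;
    }

    public static void main(String[] args) {
        // Costs from the example: rows = actual (0=NO, 1=YES), cols = predicted.
        double[][] cost = { {0, 1}, {2, 0} };
        // Even a modest probability of YES can justify predicting YES,
        // because missing a responder costs twice a wasted promotion.
        double[] prob = { 0.6, 0.4 };
        System.out.println(minCostPrediction(prob, cost)); // prints 1 (YES)
    }
}
```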

3.11 Use of PriorProbabilities

The class oracle.dmt.odm.PriorProbabilities is used to represent the prior probabilities of the target values. It is used for classification problems if the actual data has a different distribution for target values than the data provided for the model build. A user can specify the prior probabilities in the classification function settings. For more information about the prior probabilities, see ODM Concepts.

The following code illustrates how to create a PriorProbabilities object when the target has two classes, YES (1) and NO (0), with a probability of 0.05 for YES and 0.95 for NO.

        // Define a list of categories
        Category negativeCat = new Category(
                "negativeResponse", "0", DataType.intType);
        Category positiveCat = new Category(
                "positiveResponse", "1", DataType.intType);  
        // Define a Prior Probability 
        // AddEntry( Target Category, Probability Value)
        PriorProbabilities priorProbability = new PriorProbabilities();
        // Row 1
        priorProbability.addEntry(negativeCat, new Float("0.95"));
        // Row 2
        priorProbability.addEntry(positiveCat, new Float("0.05"));       
        // Set Prior Probabilities to MFS 
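The effect of priors can be sketched generically: when the build data's class distribution differs from the real population, the model's probabilities are re-weighted by the ratio of population priors to build-data priors and re-normalized. This is an illustration of the idea, not the ODM implementation.

```java
// Conceptual sketch of prior-probability adjustment.
class PriorsDemo {
    static double[] adjust(double[] modelProb, double[] buildPriors,
                           double[] populationPriors) {
        double[] adjusted = new double[modelProb.length];
        double norm = 0.0;
        for (int c = 0; c < modelProb.length; c++) {
            // Re-weight each class by how over/under-represented it was
            // in the build data relative to the population.
            adjusted[c] = modelProb[c] * populationPriors[c] / buildPriors[c];
            norm += adjusted[c];
        }
        for (int c = 0; c < adjusted.length; c++) adjusted[c] /= norm;
        return adjusted;
    }

    public static void main(String[] args) {
        // Model built on a 50/50 sample; true priors are 0.95 NO / 0.05 YES.
        double[] adjusted = adjust(new double[] {0.3, 0.7},
                                   new double[] {0.5, 0.5},
                                   new double[] {0.95, 0.05});
        System.out.printf("NO=%.3f YES=%.3f%n", adjusted[0], adjusted[1]);
    }
}
```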

3.12 Data Preparation

Data mining algorithms require the data to be prepared to build mining models and to score. Data preparation requirements can be specific to a function and an algorithm. ODM algorithms require binning (discretization) or normalization, depending on the algorithm. For more information about which algorithm requires which type of data preparation, see ODM Concepts. The Java API supports automated binning, automated normalization, external binning, winsorizing, and embedded binning.

In this section, we illustrate how to do the automated binning, automated normalization, external binning, and embedded binning.

3.12.1 Automated Binning and Normalization

In the MiningFunctionSettings, if any of the active attributes are set as unprepared attributes, the DMS chooses the appropriate data preparation (that is, binning or normalization), depending on the algorithm, and prepares the data automatically before passing it to the algorithm.

3.12.2 External Binning

The class oracle.dmt.odm.transformation.Transformation provides the utility methods to perform external binning. Binning is a two-step process: first, bin boundary tables are created; then the actual data is binned using the bin boundary tables as input.

The following code illustrates the creation of bin boundary tables for a table with one categorical attribute and one numerical attribute.

 //Create an array of DiscretizationSpecification
 //for the two columns in the table
 DiscretizationSpecification[] binSpec = new DiscretizationSpecification[2];

 //Specify binning criteria for categorical column.
 //In this example we are specifying binning criteria 
 //as top 5 frequent values need to be used and 
 //the rest of the less frequent values need 
 //to be treated as OTHER_CATEGORY 
 CategoricalDiscretization binCategoricalCriteria = 
          new CategoricalDiscretization(5,"OTHER_CATEGORY");

 binSpec[0] = new DiscretizationSpecification(
          "categorical_attribute_name", binCategoricalCriteria);

 //Specify binning criteria for numerical column.
 //In this example we are specifying binning criteria 
 //as use equal width binning with 10 bins and use
 //winsorize technique to filter 1 tail percent 
 float tailPercentage = 1.0f; //tail percentage value

 NumericalDiscretization binNumericCriteria = 
          new NumericalDiscretization(10, tailPercentage);
 binSpec[1] = new DiscretizationSpecification(
          "numerical_attribute_name", binNumericCriteria);

 //Create PhysicalDataSpecification object for the input data
 LocationAccessData lad = new LocationAccessData(
          "input_table_name", "schema_name" );
 PhysicalDataSpecification pds =
          new NonTransactionalDataSpecification( lad );

 //Create bin boundary tables
 //(method name assumed; see the Transformation Javadoc for the exact signature)
 Transformation.createDiscretizationTables(
          m_dmsConn, //DMS connection
          lad,  pds, //Input data details
          binSpec ); //Binning criteria

//Resulting discretized view location
LocationAccessData resultViewLocation = new LocationAccessData(
          "discretized_view_name", "schema_name" );

//Perform binning
//(method name assumed; see the Transformation Javadoc for the exact signature)
Transformation.discretize(
            m_dmsConn, // DMS connection
            lad, pds,  // Input data details
            resultViewLocation, // location of the resulting binned view
            true                // open ended binning
            );
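The equal-width binning requested by the criteria above can be sketched generically: boundaries divide the value range into bins of equal width, and open-ended binning sends out-of-range values to the edge bins. This is an illustration, not the Transformation API; names are hypothetical.

```java
// Generic sketch of the two-step equal-width binning:
// step 1 computes bin boundaries, step 2 maps values to bin indexes.
class BinningDemo {
    static double[] equalWidthBoundaries(double min, double max, int bins) {
        double[] bounds = new double[bins + 1];
        double width = (max - min) / bins;
        for (int i = 0; i <= bins; i++) bounds[i] = min + i * width;
        return bounds;
    }

    static int binIndex(double value, double[] bounds) {
        // Open-ended: values outside the range fall into the edge bins.
        int lastBin = bounds.length - 2;
        if (value <= bounds[0]) return 0;
        if (value >= bounds[bounds.length - 1]) return lastBin;
        for (int i = 1; i < bounds.length - 1; i++) {
            if (value < bounds[i]) return i - 1;
        }
        return lastBin;
    }

    public static void main(String[] args) {
        double[] bounds = equalWidthBoundaries(0.0, 100.0, 10);
        System.out.println(binIndex(34.5, bounds));  // prints 3
        System.out.println(binIndex(250.0, bounds)); // out of range -> prints 9
    }
}
```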

3.12.3 Embedded Binning

In the case of external binning, the user needs to maintain the bin boundary tables and use these tables to bin the data. In the case of embedded binning, the user can supply the bin boundary tables as an input to the model build operation. The model maintains these tables internally and uses them to bin the data for build, apply, test, and lift operations.

The following code illustrates how to associate the bin boundary tables with the mining function settings.

//Create location access data objects for bin boundary tables
LocationAccessData numBinBoundaries = new LocationAccessData(
          "numerical_bin_boundary_table", "schema_name" );

LocationAccessData catBinBoundaries = new LocationAccessData(
          "categorical_bin_boundary_table", "schema_name" );

//Get the Logical Data Specification from the MiningFunctionSettings class
LogicalDataSpecification lds = mfs.getLogicalDataSpecification();

//Set the bin boundary tables to the logical data specification  
lds.setUserSuppliedDiscretizationTables(numBinBoundaries, catBinBoundaries);

3.13 Text Mining

The ODM Java API supports text mining for the SVM and NMF algorithms. For these algorithms, an input table can have a combination of categorical, numerical, and text columns. The data mining server (DMS) internally performs the transformations required for the text data before building the model.

Note that for text mining, the case-id column must be specified in the NonTransactionalDataSpecification object; the case-id column must have non-NULL, unique values.

The following code illustrates how to set the text attribute in the ODM Java API.

//Set a case-id/sequence-id column for the dataset with an active text attribute
Attribute sequenceAttr =
    new Attribute("case_id_column_name", DataType.intType); //data type assumed
pds.setSequenceAttribute(sequenceAttr);

//Set the text attribute
mfs.adjustAttributesType( new String[] {"text_attribute_column"}, 
            AttributeType.text );

3.14 Summary of Java Sample Programs

All the demo programs listed in the table below are located in the directory $ORACLE_HOME/dm/demo/sample/java.

The summary description of these sample programs is also provided in $ORACLE_HOME/dm/demo/sample/java.101/README.txt.

Note: Before executing these programs, make sure that the SH schema and the user schema are installed with the datasets used by these programs. You also need to provide the DB URL, user name, and password in the login method, and a valid data schema name by changing the DATA_SCHEMA_NAME constant value in the program.

Table 3-1  Java Sample Programs

The sample programs demonstrate the following:

Classification using the ABN algorithm

Determining the most important attributes using the Attribute Importance algorithm, then using the resulting AI model to build a classification model with the Naive Bayes algorithm

Building an Association (AR) model using the Apriori algorithm and extracting the association rules

Use of a cost matrix, comparing results with and without the cost matrix

Use of discretization methodologies: automated binning, external discretization, and user-supplied bin boundaries (embedded binning)

Clustering using the k-Means algorithm

Classification using the Naive Bayes algorithm

Feature extraction and text mining using the Non-Negative Matrix Factorization (NMF) algorithm

Clustering using the O-Cluster algorithm

Import and export of a PMML model

Use of prior probabilities, comparing results with and without the prior probabilities

Classification and text mining using the SVM algorithm

Regression using the SVM algorithm