
4 Using DBMS_DATA_MINING

This chapter discusses topics related to writing data mining programs with the PL/SQL interface.

This chapter provides an overview of the steps required to perform basic Oracle Data Mining tasks. For detailed examples of how to perform these tasks, see the sample programs in Chapter 5.

This chapter does not include detailed descriptions of the PL/SQL subprograms. For that information, see the DBMS_DATA_MINING and DBMS_DATA_MINING_TRANSFORM chapters in the PL/SQL Packages and Types Reference.

The DBMS_DATA_MINING package provides PL/SQL support for in-database data mining. You can use the package to build a mining model, test the model, and apply this model to your data to obtain predictive and descriptive information.


DBMS_DATA_MINING subprograms are presented in PL/SQL Packages and Types Reference. Sample code is described in Chapter 5 of this manual; the code itself is in the dm/demo/sample/plsql directory.

4.1 Development Methodology

The development methodology for data mining using the DBMS_DATA_MINING API is divided into two phases.

The first phase covers application and data analysis and design, in which you perform the following two steps:

  1. Analyze your problem, and choose the mining function and algorithm.
  2. Analyze the data to be used for building mining models (build data), testing predictive models (test data), and the new data on which the model will be applied (scoring data).

The second phase involves developing a mining application using the DBMS_DATA_MINING and DBMS_DATA_MINING_TRANSFORM packages. It consists of the following steps:

  1. Prepare the build, test, and scoring data in a manner suitable for the chosen mining function and algorithm, using the DBMS_DATA_MINING_TRANSFORM package, a third-party tool, or direct SQL or PL/SQL utility scripts. An important caveat is that all three datasets must be prepared in an identical manner for mining results to be meaningful. This step is optional.
  2. Prepare a settings table that overrides the default mining algorithm for a given mining function, and the default algorithm settings. This is also an optional step.
  3. Build a mining model for the given training dataset.
  4. For predictive models (classification and regression), test the model for its accuracy and other attributes. This amounts to applying the model on the test data (that is, scoring the test data) and computing various metrics on the apply results.
  5. Retrieve the model signature to determine the mining attributes required by a given model for scoring. This information will help ascertain that the scoring data is suitable for a given model. This is an optional step.
  6. Apply a classification, regression, clustering, or feature extraction model to new data to generate predictions and/or descriptive summaries and patterns about the data.
  7. Retrieve the model details to understand why a model scored the data in a particular manner. This is an optional step.
  8. Repeat steps 3 through 7 until you obtain satisfactory results.
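The following is a minimal sketch of steps 3 and 6 of this flow for a classification model. All table, column, and model names (drugstore_build, drugstore_score, cust_id, affinity_card, drugstore_model) are hypothetical; see Chapter 5 for complete, working samples.

-- Step 3: build a classification model from prepared build data
BEGIN
  DBMS_DATA_MINING.CREATE_MODEL(
    model_name          => 'drugstore_model',
    mining_function     => DBMS_DATA_MINING.CLASSIFICATION,
    data_table_name     => 'drugstore_build',
    case_id_column_name => 'cust_id',
    target_column_name  => 'affinity_card',
    settings_table_name => 'drugstore_settings');
END;
/

-- Step 6: apply the model to prepared scoring data
BEGIN
  DBMS_DATA_MINING.APPLY(
    model_name          => 'drugstore_model',
    data_table_name     => 'drugstore_score',
    case_id_column_name => 'cust_id',
    result_table_name   => 'drugstore_apply_result');
END;
/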

4.2 Mining Models, Function, and Algorithm Settings

The DBMS_DATA_MINING package creates a mining model for a mining function using a specified mining algorithm that supports the function. The algorithm can be influenced by specific algorithm settings.

4.2.1 Mining Model

A model is identified by its name. Like tables in the database, a model has storage associated with it. But unlike a table, the form, shape, and content of this storage is opaque to the user. However, the user can view the contents of a model -- that is, the patterns and rules that constitute a mining model -- using algorithm-specific GET_MODEL_DETAILS functions. In addition, dm_user_models provides the model size in megabytes.
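For example, the following query lists the models in your schema along with this information:

SELECT * FROM dm_user_models;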

4.2.2 Mining Function

The DBMS_DATA_MINING package supports Classification, Regression, Association Rules, Clustering, Feature Extraction, and Attribute Importance. You can specify your choice of mining function through a parameter to the CREATE_MODEL procedure.

4.2.3 Mining Algorithm

Each mining function can be implemented using one or more algorithms. Table 4-1 provides a list of supported algorithms. Oracle assumes a default algorithm for each mining function, but you can override this default through an explicit setting in the settings table.

Table 4-1   DBMS_DATA_MINING Summary of Functions and Algorithms

Mining Function      Mining Algorithm
Classification       Naive Bayes (NB) -- default algorithm
Classification       Adaptive Bayes Network (ABN)
Classification       Support Vector Machine (SVM)
Regression           Support Vector Machine (SVM) -- default algorithm
Association Rules    Apriori Association Rules (AR)
Clustering           k-Means (KM)
Feature Extraction   Non-Negative Matrix Factorization (NMF)

Each algorithm has one or more settings or parameters that influence the way it builds the model. Oracle assumes a default set of algorithm settings for each mining algorithm. These defaults are available for your review through the table function GET_DEFAULT_SETTINGS. To override the defaults, you must provide the choice of the algorithm and the settings for the algorithm through a settings table input to the CREATE_MODEL procedure.

4.2.4 Settings Table

The settings table is a simple relational table with a fixed schema. You can choose the name of the settings table, but the column names and their types must be defined as specified below.

(setting_name   VARCHAR2(30),
setting_value   VARCHAR2(128))

The values provided in the settings table override the default values assumed by the system. The values inserted into the setting_name column are one or more of several constants defined in the DBMS_DATA_MINING package. Depending on what the setting name denotes, the value for the setting_value column can be a predefined constant or the actual numerical value corresponding to the setting itself. The setting_value column is defined to be VARCHAR2, so you must cast numerical inputs to string using the TO_CHAR() function before inserting them into the settings table.

Table 4-2 through Table 4-7 list the various setting names and the valid setting values, with a brief explanation of each setting.

Table 4-2 DBMS_DATA_MINING Function Settings
Setting Name    Setting Value (with Permissible Value Ranges)

algo_name

Classification: One of:

  • algo_naive_bayes
  • algo_support_vector_machines
  • algo_adaptive_bayes_network

Regression:

  • algo_support_vector_machines

Association Rules:

  • algo_apriori_association_rules

Clustering:

  • algo_kmeans

Feature Extraction:

  • algo_non_negative_matrix_factor

Attribute Importance:

  • algo_ai_mdl

clas_priors_table_name

VARCHAR2 string denoting the name of a relational table of fixed schema containing prior probabilities. The schema of this table is provided in "Prior Probabilities Table" later in this chapter.

This input is applicable only to classification algorithms. The prior probabilities table must be present in the current user's schema.

clus_num_clusters

TO_CHAR(numeric_expr >= 1)

Number of clusters generated by a clustering algorithm.

Default value is 10.

feat_num_features

TO_CHAR(numeric_expr >= 1)

Number of features to be extracted.

Default value is estimated from the data by the algorithm.

asso_max_rule_length

TO_CHAR(2 <= numeric_expr <= 20)

Maximum rule length for the AR algorithm.

Default value is 4.

asso_min_confidence

TO_CHAR(0 <= numeric_expr <= 1)

Minimum confidence value for the AR algorithm.

Default value is 0.1.

asso_min_support

TO_CHAR(0 <= numeric_expr <= 1)

Minimum support value for the AR algorithm.

Default value is 0.1.

Table 4-3  Algorithm Settings for Adaptive Bayes Network
Setting Name    Setting Value (with Permissible Value Ranges)

abns_model_type

Model type for Adaptive Bayes Network:

  • abns_single_feature (single feature)
  • abns_multi_feature (multi feature)
  • abns_naive_bayes (naive bayes)

Default value is abns_multi_feature.

abns_max_build_minutes

TO_CHAR(numeric_expr >= 0)

The maximum time threshold for completion of model build. Default value is 0, which implies no time limit.

abns_max_nb_predictors

TO_CHAR(numeric_expr > 0)

Maximum number of Naive Bayes predictors to be considered for model build, when the model type is chosen to be abns_naive_bayes. Default value is 10.

abns_max_predictors

TO_CHAR(numeric_expr > 0)

Maximum number of predictors considered for the model build. Default value is 25.

Table 4-4  Algorithm Settings for Naive Bayes
Setting Name    Setting Value (with Permissible Value Ranges)

nabs_singleton_threshold

TO_CHAR(0 <= numeric_expr <= 1)

Value of singleton threshold for NB algorithm.

Default value is 0.01.

nabs_pairwise_threshold

TO_CHAR(0 <= numeric_expr <= 1)

Value of pairwise threshold for NB algorithm.

Default value is 0.01.

Table 4-5  Algorithm Settings for Support Vector Machines
Setting Name    Setting Value (with Permissible Value Ranges)

svms_kernel_function

Kernel for Support Vector Machine:

  • svms_linear (for Linear Kernel)
  • svms_gaussian (for Gaussian Kernel)

Default value is svms_linear.

svms_kernel_cache_size

TO_CHAR(numeric_expr > 0)

Value of kernel cache size for SVM algorithm. Applies to Gaussian kernel only.

Default value is 50000000 bytes.

svms_conv_tolerance

TO_CHAR(numeric_expr > 0)

Convergence tolerance for SVM algorithm.

Default value is 0.001.

svms_std_dev

TO_CHAR(numeric_expr > 0)

Value of standard deviation for SVM algorithm. This is applicable only to the Gaussian kernel.

Default value is estimated from the data by the algorithm.

svms_complexity_factor

TO_CHAR(numeric_expr > 0)

Value of complexity factor for SVM algorithm.

Default value is estimated from the data by the algorithm.

svms_epsilon

TO_CHAR(numeric_expr > 0)

Value of epsilon factor for SVM Regression.

Default value is estimated from the data by the algorithm.

Table 4-6  Algorithm Settings for Non-Negative Matrix Factorization
Setting Name    Setting Value (with Permissible Value Ranges)

nmfs_random_seed

TO_CHAR(numeric_expr)

Random seed for NMF algorithm.

Default value is -1.

nmfs_num_iterations

TO_CHAR(1 <= numeric_expr <= 500)

Number of iterations for NMF algorithm.

Default value is 50.

nmfs_conv_tolerance

TO_CHAR(0 < numeric_expr <= 0.5)

Convergence tolerance for NMF algorithm.

Default value is 0.05.

Table 4-7  Algorithm Settings for k-Means
Setting Name    Setting Value (with Permissible Value Ranges)

kmns_distance

Distance function for k-Means Clustering:

  • kmns_euclidean
  • kmns_cosine
  • kmns_fast_cosine

Default value is kmns_euclidean.

kmns_iterations

TO_CHAR(0 < numeric_expr <= 20)

Number of iterations for k-Means algorithm.

Default value is 3.

kmns_conv_tolerance

TO_CHAR(0 < numeric_expr <= 0.5)

Convergence tolerance for k-Means algorithm.

Default value is 0.01.

kmns_split_criterion

Split criterion for k-Means Clustering:

  • kmns_variance
  • kmns_size

Default value is kmns_variance.

kmns_num_bins

Number of histogram bins. Specifies the number of bins in the attribute histogram produced by k-Means. The bin boundaries for each attribute are computed globally on the entire training data set. The binning method is equi-width. All attributes have the same number of bins, with the exception of attributes with a single value, which have only one bin.

Range > 0

Default value is 10.

kmns_block_growth

TO_CHAR(1 < numeric_expr <= 5)

Growth factor for memory allocated to hold cluster data.

Default value is 2.

kmns_min_pct_attr_support

Minimum percentage support required for attributes in rules. Specifies the minimum percentage of values for an attribute in a given cluster required to include this attribute in the rule description of the cluster. That is, if the required support level is not met, the attribute is omitted from the rule description; this retains only the well-represented attributes in the rule. Setting the parameter value too high in data with missing values can result in very short or even empty rules.

Range >= 0 and <= 1

Default value is 0.1.

You can create a settings table as shown in the example below for an SVM classification model, and edit the individual values using SQL DML.

CREATE TABLE drugstore_settings (
  setting_name VARCHAR2(30),
  setting_value VARCHAR2(128));

-- override the default for convergence tolerance for SVM Classification
INSERT INTO drugstore_settings (setting_name, setting_value)
VALUES (dbms_data_mining.svms_conv_tolerance, TO_CHAR(0.081));

The table function GET_DEFAULT_SETTINGS provides all the default settings for mining functions and algorithms. If you intend to override many of the default settings, you can create a seed settings table from these defaults and edit it using DML.

CREATE TABLE drug_store_settings AS
SELECT setting_name, setting_value
  FROM TABLE(DBMS_DATA_MINING.GET_DEFAULT_SETTINGS)
 WHERE setting_name LIKE 'SVMS_%';
-- update the values using appropriate DML
-- update the values using appropriate DML

You can also create a settings table based on another model's settings using GET_MODEL_SETTINGS, as shown in the example below; my_other_model stands for the name of an existing model.

CREATE TABLE my_new_model_settings AS
SELECT setting_name, setting_value
  FROM TABLE(DBMS_DATA_MINING.GET_MODEL_SETTINGS('my_other_model'));

Prior Probabilities Table

Consult Oracle Data Mining Concepts for an explanation of the prior probabilities table. You can specify a prior probabilities table as an optional function setting when building classification models.

You must create the prior probabilities table using the fixed schema shown below. For numerical targets, use the following schema:

target_value      NUMBER
prior_probability NUMBER

For categorical targets, use the following schema:

target_value      VARCHAR2
prior_probability NUMBER

Next, provide the name of the prior probabilities table as input to the setting_value column in the settings table, with DBMS_DATA_MINING.clas_priors_table_name as the corresponding value for the setting_name column, as shown below:

INSERT INTO drugstore_settings (setting_name, setting_value)
VALUES (dbms_data_mining.clas_priors_table_name, 'drugstore_priors');
-- drugstore_priors is a hypothetical prior probabilities table name

Cost Matrix Table

Consult Oracle Data Mining Concepts for an explanation of the cost matrix. You must create a cost matrix table with the fixed schema shown below. For numerical targets, use the following schema:

actual_target_value    NUMBER
predicted_target_value NUMBER
cost                   NUMBER

For categorical targets, use the following schema:

actual_target_value    VARCHAR2
predicted_target_value VARCHAR2
cost                   NUMBER
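For instance, a minimal cost matrix for a binary categorical target could be created as follows; the table name and cost values are illustrative only.

CREATE TABLE nb_cost_matrix (
  actual_target_value    VARCHAR2(10),
  predicted_target_value VARCHAR2(10),
  cost                   NUMBER);

-- false negatives are charged three times as much as false positives
INSERT INTO nb_cost_matrix VALUES ('yes', 'yes', 0);
INSERT INTO nb_cost_matrix VALUES ('no',  'no',  0);
INSERT INTO nb_cost_matrix VALUES ('yes', 'no',  3);
INSERT INTO nb_cost_matrix VALUES ('no',  'yes', 1);
COMMIT;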

The DBMS_DATA_MINING package enables you to evaluate the cost of predictions from classification models in an iterative manner during the experimental phase of mining, and to eventually apply the optimal cost matrix to predictions on the actual scoring data in a production environment.

The data input to each COMPUTE procedure in the package is the result generated from applying the model on test data. In addition, if you also provide a cost matrix as an input, the COMPUTE procedure generates test results taking the cost matrix into account. This enables you to experiment with various costs for a given prediction against the same APPLY results, without rebuilding the model and applying it against the same test data for every iteration.

Once you arrive at an optimal cost matrix, you can then input this cost matrix to the RANK_APPLY procedure along with the results of APPLY on your scoring data. RANK_APPLY will provide your new data ranked by cost.
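A sketch of this iteration for a classification model follows. The table and column names are hypothetical, and only a subset of the parameters is shown; see the COMPUTE_CONFUSION_MATRIX and RANK_APPLY descriptions in PL/SQL Packages and Types Reference for the complete signatures.

DECLARE
  v_accuracy NUMBER;
BEGIN
  -- test results that take the candidate cost matrix into account
  DBMS_DATA_MINING.COMPUTE_CONFUSION_MATRIX(
    accuracy                    => v_accuracy,
    apply_result_table_name     => 'nb_apply_result',
    target_table_name           => 'nb_test_targets',
    case_id_column_name         => 'cust_id',
    target_column_name          => 'affinity_card',
    confusion_matrix_table_name => 'nb_confusion_matrix',
    cost_matrix_table_name      => 'nb_cost_matrix');
  DBMS_OUTPUT.PUT_LINE('Accuracy: ' || ROUND(v_accuracy, 4));
END;
/

-- once the cost matrix is tuned, rank the APPLY results on scoring data by cost
BEGIN
  DBMS_DATA_MINING.RANK_APPLY(
    apply_result_table_name      => 'nb_apply_result',
    case_id_column_name          => 'cust_id',
    score_column_name            => 'prediction',
    score_criterion_column_name  => 'probability',
    ranked_apply_result_tab_name => 'nb_ranked_result',
    top_N                        => 2,
    cost_matrix_table_name       => 'nb_cost_matrix');
END;
/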

4.3 Mining Operations and Results

There are essentially four classes of mining operations supported by the DBMS_DATA_MINING package:

  • Operations that create, drop, and rename a model: CREATE_MODEL, DROP_MODEL, RENAME_MODEL
  • Operations that apply a model to new data and rank the apply results: APPLY, RANK_APPLY
  • Operations that retrieve the contents, settings, and signature of a model: the GET_* table functions
  • Utility operations that compute test metrics for a model and move models between schemas: the COMPUTE_* procedures, EXPORT_MODEL, and IMPORT_MODEL

Of these, the first set represents DDL-like operations. The last set represents utilities. The rest are query-like operations in that they do not modify the model. Besides these operations, the Oracle Data Mining installation also provides supporting capabilities, such as the user views listed in Table 4-18 and the sample programs described in Chapter 5.

Mining results are either returned as result sets or persisted as fixed schema tables.

4.3.1 Build Results

The CREATE_MODEL operation creates a mining model. The viewable contents of a mining model are provided to you through GET_MODEL_DETAILS functions for each supported algorithm. In addition, GET_MODEL_SIGNATURE and GET_MODEL_SETTINGS provide descriptive information about the model.
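For example, assuming a Naive Bayes model named nb_model (a hypothetical name), the following query retrieves its viewable contents:

SELECT *
  FROM TABLE(DBMS_DATA_MINING.GET_MODEL_DETAILS_NB('nb_model'));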

4.3.2 Apply Results

The APPLY operation creates and populates a fixed schema table with a given name. The schema of this table varies based on the particular mining function, algorithm, and target attribute type -- numerical or categorical.

The RANK_APPLY operation takes this results table as input and generates another table with results ranked based on a top-N input, and for classification models, also based on cost. The schema of this table varies based on the particular mining function, algorithm, and the target attribute type -- numerical or categorical.

4.3.3 Test Results for Classification Models

The COMPUTE routines provided in the package calculate the most commonly used test metrics for classification. They are not tied to a particular model; they can compute the metrics from any meaningful data input, as long as the schemas of the input tables fit the specification of the apply results table and the targets table. For details, see the descriptions of COMPUTE_CONFUSION_MATRIX, COMPUTE_LIFT, and COMPUTE_ROC in PL/SQL Packages and Types Reference.

4.3.4 Test Results for Regression Models

The most commonly used metrics for regression models are root mean square error and mean absolute error. You can use the simple SQL queries provided below to compute these metrics by replacing the placeholder tokens with table and column names appropriate for your application.

Root Mean Square Error

SELECT sqrt(avg((A.prediction - B.target_column_name) * 
                (A.prediction - B.target_column_name))) rmse 
  FROM apply_results_table A, targets_table B 
 WHERE A.case_id_column_name = B.case_id_column_name;

Mean Absolute Error

Given the targets_table generated from the test data with the schema:

(case_id_column_name    VARCHAR2,
target_column_name      NUMBER)

and apply results table for regression with the schema:

(case_id_column_name    VARCHAR2, 
prediction              NUMBER)

and a normalization table (optional) with the schema:

(attribute_name        VARCHAR2(30),
scale                  NUMBER,
shift                  NUMBER)

the query for mean absolute error is:

SELECT AVG(ABS(A.prediction - T.actual_value)) mean_absolute_error
  FROM (SELECT B.case_id_column_name,
               (B.target_column_name * N.scale + N.shift) actual_value
          FROM targets_table B,
               normalization_table N
         WHERE N.attribute_name = 'target_column_name') T,
       apply_results_table A
 WHERE A.case_id_column_name = T.case_id_column_name;

You can fill in the placeholder values with the actual column and table names chosen by you. If the data has not undergone normalization transformation, you can eliminate the normalization_table references from the subquery. See dm/demo/sample/plsql/svmrdemo.sql for an example.

4.4 Mining Data

The data input for all the mining operations should be through standard relational tables, where each row in the table represents a case, and each column in the table uniquely represents a mining attribute. We call this format single-record case.

A table in the Oracle RDBMS supports up to 1,000 columns. The DBMS_DATA_MINING package requires a mandatory case identifier column, which means that you can provide up to 999 mining attributes using this representation; this is adequate for most business applications.
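For illustration, a build table in single-record case format might look like the following; the table and column names are hypothetical.

CREATE TABLE customers_build (
  cust_id       NUMBER,        -- mandatory case identifier
  age           NUMBER,        -- numerical mining attribute
  gender        CHAR(1),       -- categorical mining attribute
  occupation    VARCHAR2(30),  -- categorical mining attribute
  affinity_card NUMBER);       -- target attribute for classification

Each row represents one case, and each column other than cust_id represents one mining attribute.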

4.4.1 Wide Data Support

The notable exceptions to this common scenario are applications in domains such as bioinformatics and text mining, where the data is characterized as wide and shallow -- relatively few cases (numbering in the thousands), but several thousand mining attributes.

You can provide such data in a multi-record case format, where attribute/value pairs are grouped into collections (nested tables) associated with a given case-id. You must use the fixed collection types DM_Nested_Numericals and DM_Nested_Categoricals to define columns that represent collections of numerical attributes and categorical attributes, respectively.

A few caveats apply to the use of the multi-record case format, as discussed below.

From a data modeling perspective, the multi-record case format is analogous to the dimension-fact relationship commonly found in OLAP applications, where the dimension and fact tables are aggregated based on a common key for generating reports using materialized views or SQL queries.

The main distinction between the OLAP fact table and the multi-record case is this: Each row in the fact table represents a column value, whereas each row in the multi-record collection represents an attribute name (paired with its corresponding value in another column in the nested table). The two fixed collection types are provided to highlight and enforce this distinction. These two types are defined with the reasonable assumption that mining attributes of the same type (numericals versus categoricals) are generally grouped together, just as a fact table contains values that logically correspond to the same entity.

Oracle strongly recommends that you present your multi-record case data using object views, and present the view as input to CREATE_MODEL and APPLY operations. Apart from the benefit of providing all your mining attributes through a single row-source without impacting their physical data storage, the view acts as a join specification on the underlying tables that can be used by the server for efficiently accessing your data.

We illustrate this discussion on wide data with a real-world example of an analytical pipeline for brain tumor research. The datasets of relevance are gene expression data from the instruments (fact table) and clinical data about the patients (dimension table). The schemas of these tables are provided below.

Clinical Data -- Dimension Table

(case_id    NUMBER,
name        VARCHAR2(30),
type        VARCHAR2(30),
subtype     VARCHAR2(30),
gender      CHAR(1),
age         NUMBER,
status      VARCHAR2(30))

Gene Expression Data -- Fact Table

(case_id    NUMBER,
gene        VARCHAR2(30),
expr        NUMBER)

Let us consider building a model with status as the prediction target, and gender, age, and expr as the predictors. You can provide the build data input using an object view that combines the clinical_data table and the gene_expression_data table with the schema:

(case_id     NUMBER,
gender       CHAR(1),
age          NUMBER,
gene_expr    DM_Nested_Numericals,
status       VARCHAR2(30))

The query for constructing such an object view is relatively simple in Oracle SQL:

CREATE OR REPLACE VIEW gene_expr_build AS
SELECT C.case_id,
       C.gender,
       C.age,
       CAST(MULTISET(
       SELECT gene, expr
         FROM gene_expression_data
        WHERE case_id = C.case_id) AS DM_Nested_Numericals
       ) gene_expr,
       C.status
  FROM clinical_data C;

Now gene_expr_build can be provided as the input data_table_name for CREATE_MODEL.
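A sketch of the corresponding CREATE_MODEL call follows; the model name is hypothetical, and a settings table can be supplied as described earlier in this chapter.

BEGIN
  DBMS_DATA_MINING.CREATE_MODEL(
    model_name          => 'brain_tumor_model',
    mining_function     => DBMS_DATA_MINING.CLASSIFICATION,
    data_table_name     => 'gene_expr_build',
    case_id_column_name => 'case_id',
    target_column_name  => 'status');
END;
/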

4.4.2 Attribute Types

Oracle Data Mining handles categorical and numerical attributes. See Oracle Data Mining Concepts for explanation and examples of attribute types.

The DBMS_DATA_MINING package infers the attribute type based on the type of the column representing the mining attribute. You must define all columns representing numerical attributes to be of type NUMBER. You must define all columns representing categorical attributes to be of type VARCHAR2 or CHAR.

In situations where you have numbers that are deemed categorical data by your application logic, you must typecast such attribute values using the TO_CHAR() operator and populate them into a VARCHAR2 or CHAR column representing the mining attribute.

In situations where you have numeric attribute values stored in a CHAR or VARCHAR2 column, you must typecast those attribute values using the TO_NUMBER() operator and store them in a NUMBER column.

If persisting these transformed values in another table is not a viable option, you could also create a view with these conversions in place, and provide the view name to represent the training data input for the CREATE_MODEL operation.
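For example, assuming a hypothetical customer table in which the numeric zipcode column is treated as categorical by the application, a minimal sketch of such a view is:

CREATE OR REPLACE VIEW customers_build_v AS
SELECT cust_id,
       age,
       TO_CHAR(zipcode) zipcode,  -- numeric code treated as a categorical attribute
       affinity_card
  FROM customers_build;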

4.4.3 Target Attribute

Classification and Regression algorithms require a target attribute. The package supports single targets. The target attribute for all classification algorithms can be numerical or categorical. SVM Regression supports only numerical target attributes.

4.4.4 Data Transformations

All mining operations assume the incoming data to be already prepared and transformed. You can preprocess your data using the DBMS_DATA_MINING_TRANSFORM package, or by using any third-party tool, or using your own homegrown utilities based on SQL and/or PL/SQL scripts.

If you preprocess or transform the input data, you must also reverse-transform the results from APPLY to map the data back to its original form.

See PL/SQL Packages and Types Reference for a description of the DBMS_DATA_MINING_TRANSFORM package.

The suggestion for customers with wide data is to perform transformations before creating the wide-data views over data in single-record (2D) or multi-record (transactional) format. It is possible to use the DBMS_DATA_MINING_TRANSFORM package (dbmsdmxf) on the multi-record format; in that case, all attributes are transformed in the same way. In most cases, attributes in transactional (multi-record) form are on the same scale, so this approach works. Otherwise, the suggestion is to split the data into sets of similar items and transform them separately.

4.5 Performance Considerations

If you have mining attributes numbering in the few hundreds, and your application requires them to be represented as columns in the same row of the table, here are some data storage issues to consider. For a table with many columns, the key question is the (average) row length, not the number of columns. Having more than 255 columns in a table built with a smaller block size typically results in intra-block chaining. Oracle stores multiple row pieces in the same block owing to pragmatics of design, but the overhead to maintain the column information is minimal as long as all row pieces fit in a single data block. If the rows do not fit in a single data block, you may consider using a larger database block size (or use multiple block sizes in the same database). For more details, consult Oracle Data Mining Concepts and the Oracle Database Performance Tuning Guide.

4.6 Rules and Limitations for DBMS_DATA_MINING

The use of DBMS_DATA_MINING is subject to the following rules and limitations:

4.7 Summary of Data Types, Constants, Exceptions, and User Views

The DBMS_DATA_MINING and the DBMS_DATA_MINING_TRANSFORM packages use the data types shown in Table 4-8.

Table 4-8 DBMS_DATA_MINING Summary of Data Types
Data Type    Purpose

DM_ABN_Detail

This type represents each row of the model detail output generated by GET_MODEL_DETAILS_ABN.

DM_ABN_Details

This type represents the ABN model details generated by GET_MODEL_DETAILS_ABN.

DM_Centroid

This type represents the centroid of a cluster. It is used when retrieving cluster details using GET_MODEL_DETAILS_KM.

DM_Child

This type represents each child node of a cluster.

DM_Children

This type represents a set of child nodes for a given cluster identifier.

DM_Cluster

This type represents a cluster retrieved using GET_MODEL_DETAILS_KM.

DM_Clusters

This type represents a set of clusters.

DM_Conditional

This type represents each conditional probability from a set of conditional probabilities associated with each mining attribute used in a Naive Bayes or Adaptive Bayes Network model.

DM_Conditionals

This type represents the conditional probabilities associated with a given mining attribute used in a Naive Bayes or Adaptive Bayes Network model. It is used when retrieving model details using GET_MODEL_DETAILS_NB or GET_MODEL_DETAILS_ABN, respectively.

DM_Histogram_Bin

This type represents a histogram associated with a cluster identifier. It is used when retrieving cluster details using GET_MODEL_DETAILS_KM.

DM_Histograms

This type represents a set of histograms.

DM_Item

This type represents an item in a set of items.

DM_Items

This type represents the set of items in an ItemSet.

DM_ItemSet

This type represents an ItemSet.

DM_ItemSets

This type represents the frequent itemsets in Association models.

DM_Model_Settings

This type represents the algorithm settings retrieved using the GET_MODEL_SETTINGS function.

DM_Model_Signature

This type represents the list of model signature attributes generated by GET_MODEL_SIGNATURE.

DM_Model_List

This type represents the list of model names provided as input for the parameter model_names in the EXPORT_MODEL and IMPORT_MODEL procedures.

DM_NB_Detail

This type represents each row of the model detail output generated by GET_MODEL_DETAILS_NB.

DM_NB_Details

This type represents the NB model details generated by GET_MODEL_DETAILS_NB.

DM_Nested_Categoricals

This type represents a nested table of categorical attributes, used for representing wide data.

DM_Nested_Numericals

This type represents a nested table of numerical attributes, used for representing wide data.

DM_NMF_Attribute

This type represents each attribute in an attribute set for NMF model details.

DM_NMF_Attribute_Set

This type represents a set of attributes that correspond to a feature identifier, returned by GET_MODEL_DETAILS_NMF.

DM_NMF_Feature

This type represents a feature in an NMF model.

DM_NMF_Feature_Set

This type represents a set of features returned by GET_MODEL_DETAILS_NMF.

DM_Predicate

This type represents each predicate in the set of predicates in a rule.

DM_Predicates

This type represents a set of predicates that constitute either the antecedent or the consequent of a rule.

DM_Ranked_Attribute

This type represents an entry in the set of ranked attributes returned by GET_MODEL_DETAILS_AI, ranked by the attribute's importance.

DM_Ranked_Attributes

This type represents a list of ranked attributes returned by GET_MODEL_DETAILS_AI.

DM_Rule

This type represents each rule in a list of rules generated by either GET_ASSOCIATION_RULES or GET_MODEL_DETAILS_KM.

DM_Rules

This type represents rules retrieved for Association Rules or k-Means models using GET_ASSOCIATION_RULES and GET_MODEL_DETAILS_KM, respectively.

DM_SVM_Attribute

This type represents each attribute in an attribute set for SVM model details.

DM_SVM_Attribute_Set

This type represents a set of attributes returned by GET_MODEL_DETAILS_SVM for a linear model.

DM_SVM_Linear_Coeff

This type represents an entry in the set of linear coefficients returned by GET_MODEL_DETAILS_SVM.

DM_SVM_Linear_Coeff_Set

This type represents the set of linear coefficients returned by GET_MODEL_DETAILS_SVM for an SVM model built using the linear kernel.

Table 4-9 through Table 4-16 list the constants to be used for various settings in the settings table.

Table 4-9  DBMS_DATA_MINING Constants Summary: Mining Function
Constant    Purpose

association

Parameter value for mining_function in CREATE_MODEL, representing the association mining function

attribute_importance

Parameter value for mining_function in CREATE_MODEL, representing the attribute importance mining function

classification

Parameter value for mining_function in CREATE_MODEL, representing the classification mining function

regression

Parameter value for mining_function in CREATE_MODEL, representing the regression mining function

clustering

Parameter value for mining_function in CREATE_MODEL, representing the clustering mining function

feature_extraction

Parameter value for mining_function in CREATE_MODEL, representing the feature extraction mining function

Table 4-10  DBMS_DATA_MINING Constants Summary: Function Settings
Constant    Purpose

clas_priors_table_name

Setting name representing the prior probabilities table name for the classification function

clus_num_clusters

Setting name representing the number of clusters for the clustering function

feat_num_features

Setting name representing the number of features for the feature extraction function

asso_max_rule_length

Setting name representing the maximum rule length

asso_min_confidence

Setting name representing the minimum confidence

asso_min_support

Setting name representing the minimum support

Table 4-11  DBMS_DATA_MINING Constants Summary: Algorithm Settings
Constant    Purpose

algo_name

Setting name representing the mining algorithm

algo_apriori_association_rules

Setting value for the Apriori algorithm for association rules

algo_naive_bayes

Setting value for the Naive Bayes (NB) algorithm for classification

algo_support_vector_machines

Setting value for the Support Vector Machine (SVM) algorithm for classification or regression

algo_non_negative_matrix_factor

Setting value for Non-Negative Matrix Factorization (NMF) for feature extraction

algo_kmeans

Setting value for k-Means (KM) for clustering

algo_ai_mdl

Setting value for the Minimum Description Length based algorithm for attribute importance

Table 4-12  DBMS_DATA_MINING Constants Summary: Adaptive Bayes Network
Constant    Purpose

abns_model_type

Setting name representing the ABN model type

abns_single_feature

Setting value representing a single feature ABN model

abns_multi_feature

Setting value representing a multi feature ABN model

abns_naive_bayes

Setting value representing a Naive Bayes ABN model

abns_max_build_minutes

Setting name representing the maximum time threshold to complete an ABN model build

abns_max_nb_predictors

Setting name representing the maximum number of Naive Bayes predictors to be considered for building an ABN model of type abns_naive_bayes

Table 4-13  DBMS_DATA_MINING Constants Summary: Naive Bayes
Constant    Purpose

nabs_singleton_threshold

Setting name representing the singleton threshold for Naive Bayes

nabs_pairwise_threshold

Setting name representing the pair-wise threshold for Naive Bayes

Table 4-14  DBMS_DATA_MINING Constants Summary: Support Vector Machines
Constant    Purpose

svms_kernel_function

Setting name representing the kernel function for SVM

svms_linear

Setting value for the Linear Kernel for SVM

svms_gaussian

Setting value for the Gaussian Kernel for SVM

svms_kernel_cache_size

Setting name representing the kernel cache size for SVM

svms_conv_tolerance

Setting name representing the convergence tolerance for SVM

svms_std_dev

Setting name representing the standard deviation for SVM (Gaussian kernel)

svms_complexity_factor

Setting name representing the complexity factor for SVM

svms_epsilon

Setting name representing epsilon for SVM Regression

Table 4-15  DBMS_DATA_MINING Constants Summary: Non-Negative Matrix Factorization
Constant    Purpose

nmfs_num_iterations

Setting name representing the number of iterations

nmfs_conv_tolerance

Setting name representing the convergence tolerance

nmfs_random_seed

Setting name representing the random seed for NMF

Table 4-16  DBMS_DATA_MINING Constants Summary: k-Means
Constant    Purpose

kmns_distance

Setting name representing the distance function

kmns_euclidean

Setting value representing the Euclidean distance function

kmns_cosine

Setting value representing the cosine distance function

kmns_fast_cosine

Setting value representing the fast cosine distance function

kmns_iterations

Setting name representing the number of iterations

kmns_conv_tolerance

Setting name representing the convergence tolerance

kmns_split_criterion

Setting name representing the split criterion

kmns_variance

Setting value representing variance as the split criterion

kmns_size

Setting value representing size as the split criterion

kmns_block_growth

Setting name representing the growth factor for memory allocated to hold cluster data

kmns_num_bins

Setting name representing the number of histogram bins

kmns_min_pct_attr_support

Setting name representing the minimum percentage support required for attributes in rules

Table 4-17 lists the errors generated by DBMS_DATA_MINING.

Table 4-17 DBMS_DATA_MINING Errors Summary
Oracle Error Description


Invalid input parameter %s


Column %s does not exist in the input table %s


Model %s does not exist


Model %s already exists


Invalid setting name %s


Invalid setting value for setting name %s


Duplicate or multiple function settings


Duplicate or multiple algorithm settings for function %s


Invalid setting: %s for function %s


Algorithm name: %s is invalid


Invalid target data type in input data for function: %s


Contradictory values for settings: %s, %s


Duplicate setting: %s


Model %s is incompatible with current operation


Feature not supported


Apply result table %s is incompatible with current operation


Maximum number of attributes exceeded


Maximum target cardinality exceeded


Data mining model export failed, job name=%s, error=%s


Data mining model import failed, job name=%s, error=%s


Model is currently in use by another process


No support vectors were found


No target values were found


No target counter examples were found


Input data for model build contains negative values


NMF: number of features not between [1, %s]


No statistically significant features were found


Apply rules prohibited for this model mode


Invalid model type %s for Adaptive Bayes Network algorithm

Table 4-18 lists the user views provided by Oracle to obtain information about the models generated using DBMS_DATA_MINING.

Table 4-18 DBMS_DATA_MINING Summary of User Views
User View    Purpose

DM_USER_MODELS

Lists all models in a given user's schema.

4.8 Summary of DBMS_DATA_MINING Subprograms

Table 4-19  DBMS_DATA_MINING Summary of Subprograms
Subprogram    Purpose

APPLY Procedure

Applies a model to scoring data

CREATE_MODEL Procedure

Creates (builds) a mining model

COMPUTE_CONFUSION_MATRIX Procedure

Computes the confusion matrix from the APPLY results on test data for a classification model; also provides the accuracy of the model

COMPUTE_LIFT Procedure

Computes lift for a given positive target value from the APPLY results on test data for a classification model

COMPUTE_ROC Procedure

Computes Receiver Operating Characteristic for target attributes with binary class from the APPLY results on test data for a classification model

DROP_MODEL Procedure

Drops a model

EXPORT_MODEL Procedure

Exports one or more models from a schema

GET_ASSOCIATION_RULES Function

This table function returns the rules from an Association model

GET_DEFAULT_SETTINGS Function

This table function returns all the default settings for all mining functions and algorithms

GET_FREQUENT_ITEMSETS Function

Returns a set of rows that represent the frequent itemsets from an Association model

GET_MODEL_DETAILS_ABN Function

Provides the details of an Adaptive Bayes Network model

GET_MODEL_DETAILS_KM Function

Provides the details of a k-Means model

GET_MODEL_DETAILS_NB Function

Provides the details of a Naive Bayes model

GET_MODEL_DETAILS_NMF Function

Provides the details of an NMF model

GET_MODEL_DETAILS_SVM Function

Provides the details of an SVM model

GET_MODEL_SETTINGS Function

Provides the settings used to build a model

GET_MODEL_SIGNATURE Function

Provides the signature of a model

IMPORT_MODEL Procedure

Imports one or more models into the current schema

RANK_APPLY Procedure

Ranks the predictions from the APPLY results for a classification model

RENAME_MODEL Procedure

Renames a model

4.9 Model Export and Import

Data mining models can be moved between Oracle databases or schemas. For example, in an organization, data mining specialists may build and test data mining models in a data mining lab. After models are built and tested in the lab, the chosen model may be moved to a scoring engine used by applications. Because the data mining lab and the scoring engine usually do not share the same database, the model must be exported from the lab and then imported into the scoring engine. Model export and import can be a routine procedure: as new data accumulate, data mining specialists will build and test new models, and newer and better models will be loaded onto the scoring engine on a regular basis. DBAs will also want to back up and restore models as part of routine database maintenance.

Native export and import of data mining models are supported in the following scenarios:

4.9.1 Limitations

The use of model export and import is subject to the following limitations:

4.9.2 Prerequisites

Prerequisites for model export are as follows:

Prerequisites for model import are as follows:


4.9.3 Choose the Right Utility

There are two ways to export models:

  • Use the Oracle Data Pump Export utility (expdp) to export a schema or database that contains mining models.
  • Use the DBMS_DATA_MINING.EXPORT_MODEL procedure to export one or more specific models.

To export all data mining models in a user schema, you can either run expdp or use EXPORT_MODEL with the parameter model_filter set to NULL. Note the difference between the two operations: When you run expdp to export the schema, all objects in the schema including data mining models are exported. When you run EXPORT_MODEL with a NULL model_filter, only the models will be exported.

There are also two ways to import models from the dump file:

  • Use the Oracle Data Pump Import utility (impdp) to import the full dump file set.
  • Use the DBMS_DATA_MINING.IMPORT_MODEL procedure to import specific models.

In general, if you want to import the full dump file set, run impdp. This imports all database objects and data, including all data mining models, from the dump file set. If you want to import models only, use IMPORT_MODEL. When the parameter model_filter is set to NULL, all models are imported from the dump. If valid model names are assigned in model_filter, this operation imports only named models from the dump file set.
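As a sketch, exporting and then importing named models might look like the following. This assumes a directory object DM_DUMP mapped to a valid file system directory and hypothetical model names; verify the parameter lists against the EXPORT_MODEL and IMPORT_MODEL descriptions in PL/SQL Packages and Types Reference.

BEGIN
  -- export only the two named models to a dump file set in DM_DUMP
  DBMS_DATA_MINING.EXPORT_MODEL(
    filename     => 'models_out',
    directory    => 'DM_DUMP',
    model_filter => 'name IN (''NB_MODEL'', ''SVM_MODEL'')');
END;
/

BEGIN
  -- import only NB_MODEL from the dump file set
  DBMS_DATA_MINING.IMPORT_MODEL(
    filename     => 'models_out01.dmp',
    directory    => 'DM_DUMP',
    model_filter => 'name = ''NB_MODEL''');
END;
/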

4.9.4 Temp Tables

Data mining model export and import jobs will create and manage two temporary tables in the user schema: DM$P_MODEL_EXPIMP_TEMP and DM$P_MODEL_TABKEY_TEMP. Users should not manipulate these tables.