Skip Headers
Oracle® Data Mining User's Guide
12c Release 1 (12.1)

E17693-15
Go to Documentation Home
Home
Go to Book List
Book List
Go to Table of Contents
Contents
Go to Index
Index
Go to Master Index
Master Index
Go to Feedback page
Contact Us

Go to previous page
Previous
Go to next page
Next
PDF · Mobi · ePub

2 About the Data Mining API

This chapter provides an overview of the components of the Oracle Data Mining API. This chapter contains the following topics:

About Mining Models

Mining models are database schema objects that perform data mining. As with all schema objects, access to mining models is controlled by database privileges. Models can be exported and imported. They support comments, and they can be tracked in the Database auditing system.

Mining models are created by the CREATE_MODEL procedure in the DBMS_DATA_MINING PL/SQL package. Models are created for a specific mining function, and they use a specific algorithm to perform that function. Mining function is a data mining term that refers to a class of mining problems to be solved. Examples of mining functions are: regression, classification, attribute importance, clustering, anomaly detection, and feature extraction. Oracle Data Mining supports one or more algorithms for each mining function.

Note:

Most types of mining models can be used to score data. However, it is possible to score data without applying a model. Dynamic scoring and predictive analytics return scoring results without a user-supplied model. They create and apply transient models that are not visible to the user. For more information, see:

Data Mining Data Dictionary Views

The data dictionary views for Oracle Data Mining are listed in Table 2-1. DBA and USER versions of the views are also available.

Table 2-1 Data Dictionary Views for Oracle Data Mining

View Name Description

ALL_MINING_MODELS

Provides information about all accessible mining models

ALL_MINING_MODEL_ATTRIBUTES

Provides information about the attributes of all accessible mining models

ALL_MINING_MODEL_SETTINGS

Provides information the configuration settings for all accessible mining models


USER_MINING_MODELS

Example 2-1 describes USER_MINING_MODELS and shows a sample query.

Example 2-1 USER_MINING_MODELS

describe user_mining_models
 Name                             Null?    Type
 -------------------------------- -------- ----------------------
 MODEL_NAME                       NOT NULL VARCHAR2(128)
 MINING_FUNCTION                           VARCHAR2(30)
 ALGORITHM                                 VARCHAR2(30)
 CREATION_DATE                    NOT NULL DATE
 BUILD_DURATION                            NUMBER
 MODEL_SIZE                                NUMBER
 COMMENTS                                  VARCHAR2(4000)

The following query returns the models in the user's schema that use the Support Vector Machine algorithm.

SELECT  mining_function, model_name, algorithm
     FROM user_mining_models
     WHERE algorithm IN 'SUPPORT_VECTOR_MACHINES'
     ORDER BY mining_function;

MINING_FUNCTION           MODEL_NAME           ALGORITHM
------------------------- -------------------- ------------------------------
CLASSIFICATION            SVMO_SH_CLAS_SAMPLE  SUPPORT_VECTOR_MACHINES
CLASSIFICATION            T_SVM_CLAS_SAMPLE    SUPPORT_VECTOR_MACHINES
CLASSIFICATION            SVMC_SH_CLAS_SAMPLE  SUPPORT_VECTOR_MACHINES
REGRESSION                SVMR_SH_REGR_SAMPLE  SUPPORT_VECTOR_MACHINES

USER_MINING_MODEL_ATTRIBUTES

Example 2-2 describes USER_MINING_MODEL_ATTRIBUTES and shows a sample query. Attributes are the predictors or conditions that are used to create models and score data.

Example 2-2 USER_MINING_MODEL_ATTRIBUTES

describe user_mining_model_attributes
 Name                             Null?    Type
 -------------------------------- -------- ----------------------
 MODEL_NAME                       NOT NULL VARCHAR2(128)
 ATTRIBUTE_NAME                   NOT NULL VARCHAR2(128)
 ATTRIBUTE_TYPE                            VARCHAR2(11)
 DATA_TYPE                                 VARCHAR2(106)
 DATA_LENGTH                               NUMBER
 DATA_PRECISION                            NUMBER
 DATA_SCALE                                NUMBER
 USAGE_TYPE                                VARCHAR2(8)
 TARGET                                    VARCHAR2(3)
 ATTRIBUTE_SPEC                            VARCHAR2(4000)

The following query returns the attributes of an SVM classification model named T_SVM_CLAS_SAMPLE in the user's schema. The model has both categorical and numerical attributes and includes one attribute that is unstructured text.

SELECT model_name, attribute_name, attribute_type, target
    FROM user_mining_model_attributes
    WHERE model_name IN 'T_SVM_CLAS_SAMPLE';
 
MODEL_NAME           ATTRIBUTE_NAME            ATTRIBUTE_TYPE       TAR
-------------------- ------------------------- -------------------- ---
T_SVM_CLAS_SAMPLE    PRINTER_SUPPLIES          NUMERICAL            NO
T_SVM_CLAS_SAMPLE    CUST_INCOME_LEVEL         CATEGORICAL          NO
T_SVM_CLAS_SAMPLE    AGE                       NUMERICAL            NO
T_SVM_CLAS_SAMPLE    HOME_THEATER_PACKAGE      NUMERICAL            NO
T_SVM_CLAS_SAMPLE    CUST_GENDER               CATEGORICAL          NO
T_SVM_CLAS_SAMPLE    CUST_MARITAL_STATUS       CATEGORICAL          NO
T_SVM_CLAS_SAMPLE    BOOKKEEPING_APPLICATION   NUMERICAL            NO
T_SVM_CLAS_SAMPLE    EDUCATION                 CATEGORICAL          NO
T_SVM_CLAS_SAMPLE    HOUSEHOLD_SIZE            CATEGORICAL          NO
T_SVM_CLAS_SAMPLE    FLAT_PANEL_MONITOR        NUMERICAL            NO
T_SVM_CLAS_SAMPLE    COUNTRY_NAME              CATEGORICAL          NO
T_SVM_CLAS_SAMPLE    OS_DOC_SET_KANJI          NUMERICAL            NO
T_SVM_CLAS_SAMPLE    COMMENTS                  TEXT                 NO
T_SVM_CLAS_SAMPLE    OCCUPATION                CATEGORICAL          NO
T_SVM_CLAS_SAMPLE    Y_BOX_GAMES               NUMERICAL            NO
T_SVM_CLAS_SAMPLE    BULK_PACK_DISKETTES       NUMERICAL            NO
T_SVM_CLAS_SAMPLE    YRS_RESIDENCE             NUMERICAL            NO
T_SVM_CLAS_SAMPLE    AFFINITY_CARD             CATEGORICAL          YES

See Also:

USER_MINING_MODEL_SETTINGS

Example 2-3 describes USER_MINING_MODEL_SETTINGS and shows a sample query. Settings influence model behavior. Settings may be specific to an algorithm or to a mining function, or they may be general.

Example 2-3 USER_MINING_MODEL_SETTINGS

describe user_mining_model_settings
 Name                             Null?    Type
 -------------------------------- -------- ----------------------
 MODEL_NAME                       NOT NULL VARCHAR2(128)
 SETTING_NAME                     NOT NULL VARCHAR2(30)
 SETTING_VALUE                             VARCHAR2(4000)
 SETTING_TYPE                              VARCHAR2(7)

The following query returns the settings for a model named SVD_SH_SAMPLE in the user's schema. The model uses the Singular Value Decomposition algorithm for feature extraction.

SELECT model_name, setting_name, setting_value, setting_type
    FROM user_mining_model_settings
    WHERE model_name IN 'SVD_SH_SAMPLE'
    ORDER BY setting_name;
 
MODEL_NAME     SETTING_NAME                 SETTING_VALUE                SETTING
-------------- ---------------------------- ---------------------------- -------
SVD_SH_SAMPLE  ALGO_NAME                     ALGO_SINGULAR_VALUE_DECOMP  INPUT
SVD_SH_SAMPLE  FEAT_NUM_FEATURES             10                          DEFAULT
SVD_SH_SAMPLE  ODMS_APPROXIMATE_COMPUTATION  ODMS_APPR_COMP_DISABLE      DEFAULT
SVD_SH_SAMPLE  PREP_AUTO                     OFF                         DEFAULT
SVD_SH_SAMPLE  SVDS_SCORING_MODE             SVDS_SCORING_SVD            DEFAULT
SVD_SH_SAMPLE  SVDS_U_MATRIX_OUTPUT          SVDS_U_MATRIX_ENABLE        INPUT

See Also:

Data Mining PL/SQL Packages

The PL/SQL interface to Oracle Data Mining is implemented in three packages, as shown in Table 2-2.

Table 2-2 Data Mining PL/SQL Packages

Package Name Description

DBMS_DATA_MINING

Routines for creating and managing mining models

DBMS_DATA_MINING_TRANSFORM

Routines for transforming the data for mining

DBMS_PREDICTIVE_ANALYTICS

Routines that perform predictive analytics


DBMS_DATA_MINING

The DBMS_DATA_MINING package contains routines for creating mining models, for performing operations on mining models, and for querying mining models. The package includes routines for:

  • Creating, dropping, and performing other DDL operations on mining models

  • Obtaining detailed information about model attributes, rules, and other information internal to the model (model details)

  • Computing test metrics for classification models

  • Specifying costs for classification models

  • Exporting and importing models

DDL for Mining Models

Table 2-3 describes the DDL operations for mining models.

Table 2-3 DDL for Mining Models

DDL DBMS_DATA_MINING Description

Create model

CREATE_MODEL

Creates a model

Drop model

DROP_MODEL

Drops a model

Rename model

RENAME_MODEL

Renames a model

Add cost matrix

ADD_COST_MATRIX

Adds a cost matrix to a classification model

Remove cost matrix

REMOVE_COST_MATRIX

Removes the cost matrix from a classification model

Alter reverse expression

ALTER_REVERSE_EXPRESSION

Alters the reverse transformation expression associated with a model


The DBMS_DATA_MINING package contains a number of functions that return information about mining models. For example, the query in Example 2-4 returns details about feature 1 of the feature extraction model named NMF_SH_Sample.

Example 2-4 Sample Model Details Query

SELECT F.feature_id,
       A.attribute_name,
       A.attribute_value,
       A.coefficient
  FROM TABLE(DBMS_DATA_MINING.GET_MODEL_DETAILS_NMF('NMF_SH_Sample')) F,
       TABLE(F.attribute_set) A
WHERE feature_id = 1
  AND attribute_name in ('AFFINITY_CARD','AGE','COUNTRY_NAME')
ORDER BY feature_id,attribute_name,attribute_value;

DBMS_DATA_MINING_TRANSFORM

The DBMS_DATA_MINING_TRANSFORM package contains routines that perform data transformations such as binning, normalization, and outlier treatment. The package includes routines for:

  • Specifying transformations in a format that can be embedded in a mining model

  • Specifying transformations as relational views (external to mining model objects)

  • Specifying distinct properties for columns in the build data. For example, you can specify that the column should be interpreted as unstructured text, or that the column should be excluded from Automatic Data Preparation.

Table 2-4 DBMS_DATA_MINING_TRANSFORM Transformation Methods

Transformation Method Description

XFORM interface

CREATE, INSERT, and XFORM routines specify transformations in external views

STACK interface

CREATE, INSERT, and XFORM routines specify transformations for embedding in a model

SET_TRANSFORM

Specifies transformations for embedding in a model


The statements in Example 2-5 create an SVM classification model called T_SVM_Clas_sample with an embedded transformation that causes the comments attribute to be treated as unstructured text data.

Example 2-5 Sample Embedded Transformation

DECLARE
  xformlist dbms_data_mining_transform.TRANSFORM_LIST;
BEGIN
  dbms_data_mining_transform.SET_TRANSFORM(
    xformlist, 'comments', null, 'comments', null, 'TEXT');
  DBMS_DATA_MINING.CREATE_MODEL(
    model_name          => 'T_SVM_Clas_sample',
    mining_function     => dbms_data_mining.classification,
    data_table_name     => 'mining_build_text',
    case_id_column_name => 'cust_id',
    target_column_name  => 'affinity_card',
    settings_table_name => 't_svmc_sample_settings',
    xform_list => xformlist);
END;
/ 

DBMS_PREDICTIVE_ANALYTICS

The DBMS_PREDICTIVE_ANALYTICS package contains routines that perform an automated form of data mining known as predictive analytics. With predictive analytics, you do not need to be aware of model building or scoring. All mining activities are handled internally by the procedure. The DBMS_PREDICTIVE_ANALYTICS package includes these routines:

  • EXPLAIN ranks attributes in order of influence in explaining a target column.

  • PREDICT predicts the value of a target column based on values in the input data.

  • PROFILE generates rules that describe the cases from the input data.

The EXPLAIN statement in Example 2-6 lists attributes in the view mining_data_build_v in order of their importance in predicting affinity_card.

Example 2-6 Sample EXPLAIN Statement

BEGIN 
    DBMS_PREDICTIVE_ANALYTICS.EXPLAIN( 
        data_table_name      => 'mining_data_build_v', 
        explain_column_name  => 'affinity_card', 
        result_table_name    => 'explain_results'); 
END; 
/

See Also:

DBMS_PREDICTIVE_ANALYTICS in Oracle Database PL/SQL Packages and Types Reference

Data Mining SQL Scoring Functions

The Data Mining SQL language functions use Oracle Data Mining to score data. The functions can apply a mining model schema object to the data, or they can dynamically mine the data by executing an analytic clause. SQL functions are available for all the data mining algorithms that support the scoring operation. Table 2-5 lists the Data Mining SQL functions.

Table 2-5 Data Mining SQL Functions

Function Description

CLUSTER_ID

Returns the ID of the predicted cluster

CLUSTER_DETAILS

Returns detailed information about the predicted cluster

CLUSTER_DISTANCE

Returns the distance from the centroid of the predicted cluster

CLUSTER_PROBABILITY

Returns the probability of a case belonging to a given cluster

CLUSTER_SET

Returns a list of all possible clusters to which a given case belongs along with the associated probability of inclusion

FEATURE_ID

Returns the ID of the feature with the highest coefficient value

FEATURE_DETAILS

Returns detailed information about the predicted feature

FEATURE_SET

Returns a list of objects containing all possible features along with the associated coefficients

FEATURE_VALUE

Returns the value of the predicted feature

PREDICTION

Returns the best prediction for the target

PREDICTION_BOUNDS

(GLM only) Returns the upper and lower bounds of the interval wherein the predicted values (linear regression) or probabilities (logistic regression) lie.

PREDICTION_COST

Returns a measure of the cost of incorrect predictions

PREDICTION_DETAILS

Returns detailed information about the prediction

PREDICTION_PROBABILITY

Returns the probability of the prediction

PREDICTION_SET

Returns the results of a classification model, including the predictions and associated probabilities for each case


Example 2-7 shows a query that returns the results of the CLUSTER_ID function. The query applies the model em_sh_clus_sample, which finds groups of customers that share certain characteristics. The query returns the identifiers of the clusters and the number of customers in each cluster.

Example 2-7 CLUSTER_ID Function

-- -List the clusters into which the customers in this
-- -data set have been grouped.
--
SELECT CLUSTER_ID(em_sh_clus_sample USING *) AS clus, COUNT(*) AS cnt 
  FROM mining_data_apply_v
GROUP BY CLUSTER_ID(em_sh_clus_sample USING *)
ORDER BY cnt DESC;


SQL> -- List the clusters into which the customers in this
SQL> -- data set have been grouped.
SQL> --
SQL> SELECT CLUSTER_ID(em_sh_clus_sample USING *) AS clus, COUNT(*) AS cnt
  2    FROM mining_data_apply_v
  3  GROUP BY CLUSTER_ID(em_sh_clus_sample USING *)
  4  ORDER BY cnt DESC;
 
      CLUS        CNT
---------- ----------
         9        311
         3        294
         7        215
        12        201
        17        123
        16        114
        14         86
        19         64
        15         56
        18         36

See Also: