22 About the Data Mining API

Overview of the Oracle Data Mining application programming interface (API) components.

22.1 About Mining Models

Mining models are database schema objects that perform data mining.

As with all schema objects, access to mining models is controlled by database privileges. Models can be exported and imported. They support comments, and they can be tracked in the Database auditing system.

Mining models are created by the CREATE_MODEL procedure in the DBMS_DATA_MINING PL/SQL package. Models are created for a specific mining function, and they use a specific algorithm to perform that function. Mining function is a data mining term that refers to a class of mining problems to be solved. Examples of mining functions are: regression, classification, attribute importance, clustering, anomaly detection, and feature extraction. Oracle Data Mining supports one or more algorithms for each mining function.

Note:

Most types of mining models can be used to score data. However, it is possible to score data without applying a model. Dynamic scoring and predictive analytics return scoring results without a user-supplied model. They create and apply transient models that are not visible to you.

22.2 Data Mining Data Dictionary Views

Lists Oracle Data Mining data dictionary views.

The data dictionary views for Oracle Data Mining are listed in the following table. A database administrator (DBA) and USER versions of the views are also available.

Table 22-1 Data Dictionary Views for Oracle Data Mining

View Name Description

ALL_MINING_MODELS

Provides information about all accessible mining models

ALL_MINING_MODEL_ATTRIBUTES

Provides information about the attributes of all accessible mining models

ALL_MINING_MODEL_PARTITIONS

Provides information about the partitions of all accessible partitioned mining models

ALL_MINING_MODEL_SETTINGS

Provides information about the configuration settings for all accessible mining models

ALL_MINING_MODEL_VIEWS

Provides information about the model views for all accessible mining models

ALL_MINING_MODEL_XFORMS

Provides the user-specified transformations embedded in all accessible mining models.

22.2.1 ALL_MINING_MODELS

Describes an example of ALL_MINING_MODELS and shows a sample query.

The following example describes ALL_MINING_MODELS and shows a sample query.

Example 22-1 ALL_MINING_MODELS

 describe ALL_MINING_MODELS
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 OWNER                                     NOT NULL VARCHAR2(128)
 MODEL_NAME                                NOT NULL VARCHAR2(128)
 MINING_FUNCTION                                    VARCHAR2(30)
 ALGORITHM                                          VARCHAR2(30)
 CREATION_DATE                             NOT NULL DATE
 BUILD_DURATION                                     NUMBER
 MODEL_SIZE                                         NUMBER
 PARTITIONED                                        VARCHAR2(3)
 COMMENTS                                           VARCHAR2(4000)

The following query returns the models accessible to you that use the Support Vector Machine algorithm.

SELECT mining_function, model_name
    FROM all_mining_models
    WHERE algorithm = 'SUPPORT_VECTOR_MACHINES'
    ORDER BY mining_function, model_name;

MINING_FUNCTION           MODEL_NAME                                            
------------------------- --------------------                                  
CLASSIFICATION            PART2_CLAS_SAMPLE                                     
CLASSIFICATION            PART_CLAS_SAMPLE                                      
CLASSIFICATION            SVMC_SH_CLAS_SAMPLE                                   
CLASSIFICATION            SVMO_SH_CLAS_SAMPLE                                   
CLASSIFICATION            T_SVM_CLAS_SAMPLE                                     
REGRESSION                SVMR_SH_REGR_SAMPLE  

22.2.2 ALL_MINING_MODEL_ATTRIBUTES

Describes an example of ALL_MINING_MODEL_ATTRIBUTES and shows a sample query.

The following example describes ALL_MINING_MODEL_ATTRIBUTES and shows a sample query. Attributes are the predictors or conditions that are used to create models and score data.

Example 22-2 ALL_MINING_MODEL_ATTRIBUTES

describe ALL_MINING_MODEL_ATTRIBUTES
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 OWNER                                     NOT NULL VARCHAR2(128)
 MODEL_NAME                                NOT NULL VARCHAR2(128)
 ATTRIBUTE_NAME                            NOT NULL VARCHAR2(128)
 ATTRIBUTE_TYPE                                     VARCHAR2(11)
 DATA_TYPE                                          VARCHAR2(106)
 DATA_LENGTH                                        NUMBER
 DATA_PRECISION                                     NUMBER
 DATA_SCALE                                         NUMBER
 USAGE_TYPE                                         VARCHAR2(8)
 TARGET                                             VARCHAR2(3)
 ATTRIBUTE_SPEC                                     VARCHAR2(4000)

The following query returns the attributes of an SVM classification model named T_SVM_CLAS_SAMPLE. The model has both categorical and numerical attributes and includes one attribute that is unstructured text.

SELECT attribute_name, attribute_type, target
    FROM all_mining_model_attributes
    WHERE model_name = 'T_SVM_CLAS_SAMPLE'
    ORDER BY attribute_name;

ATTRIBUTE_NAME            ATTRIBUTE_TYPE       TAR                              
------------------------- -------------------- ---                              
AFFINITY_CARD             CATEGORICAL          YES                              
AGE                       NUMERICAL            NO                               
BOOKKEEPING_APPLICATION   NUMERICAL            NO                               
BULK_PACK_DISKETTES       NUMERICAL            NO                               
COMMENTS                  TEXT                 NO                               
COUNTRY_NAME              CATEGORICAL          NO                               
CUST_GENDER               CATEGORICAL          NO                               
CUST_INCOME_LEVEL         CATEGORICAL          NO                               
CUST_MARITAL_STATUS       CATEGORICAL          NO                               
EDUCATION                 CATEGORICAL          NO                               
FLAT_PANEL_MONITOR        NUMERICAL            NO                               
HOME_THEATER_PACKAGE      NUMERICAL            NO                               
HOUSEHOLD_SIZE            CATEGORICAL          NO                               
OCCUPATION                CATEGORICAL          NO                               
OS_DOC_SET_KANJI          NUMERICAL            NO                               
PRINTER_SUPPLIES          NUMERICAL            NO                               
YRS_RESIDENCE             NUMERICAL            NO                               
Y_BOX_GAMES               NUMERICAL            NO 

22.2.3 ALL_MINING_MODEL_PARTITIONS

Describes an example of ALL_MINING_MODEL_PARTITIONS and shows a sample query.

The following example describes ALL_MINING_MODEL_PARTITIONS and shows a sample query.

Example 22-3 ALL_MINING_MODEL_PARTITIONS

describe ALL_MINING_MODEL_PARTITIONS
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 OWNER                                     NOT NULL VARCHAR2(128)
 MODEL_NAME                                NOT NULL VARCHAR2(128)
 PARTITION_NAME                                     VARCHAR2(128)
 POSITION                                           NUMBER
 COLUMN_NAME                               NOT NULL VARCHAR2(128)
 COLUMN_VALUE                                       VARCHAR2(4000) 

The following query returns the partition names and partition key values for two partitioned models. Model PART2_CLAS_SAMPLE has a two column partition key with system-generated partition names.

SELECT model_name, partition_name, position, column_name, column_value
    FROM all_mining_model_partitions
    ORDER BY model_name, partition_name, position;

MODEL_NAME           PARTITION_ POSITION COLUMN_NAME          COLUMN_VALUE      
-------------------- ---------- -------- -------------------- ---------------   
PART2_CLAS_SAMPLE    DM$$_P0           1 CUST_GENDER          F                 
PART2_CLAS_SAMPLE    DM$$_P0           2 CUST_INCOME_LEVEL    HIGH              
PART2_CLAS_SAMPLE    DM$$_P1           1 CUST_GENDER          F                 
PART2_CLAS_SAMPLE    DM$$_P1           2 CUST_INCOME_LEVEL    LOW               
PART2_CLAS_SAMPLE    DM$$_P2           1 CUST_GENDER          F                 
PART2_CLAS_SAMPLE    DM$$_P2           2 CUST_INCOME_LEVEL    MEDIUM            
PART2_CLAS_SAMPLE    DM$$_P3           1 CUST_GENDER          M                 
PART2_CLAS_SAMPLE    DM$$_P3           2 CUST_INCOME_LEVEL    HIGH              
PART2_CLAS_SAMPLE    DM$$_P4           1 CUST_GENDER          M                 
PART2_CLAS_SAMPLE    DM$$_P4           2 CUST_INCOME_LEVEL    LOW               
PART2_CLAS_SAMPLE    DM$$_P5           1 CUST_GENDER          M                 
PART2_CLAS_SAMPLE    DM$$_P5           2 CUST_INCOME_LEVEL    MEDIUM            
PART_CLAS_SAMPLE     F                 1 CUST_GENDER          F                 
PART_CLAS_SAMPLE     M                 1 CUST_GENDER          M                 
PART_CLAS_SAMPLE     U                 1 CUST_GENDER          U         

Related Topics

22.2.4 ALL_MINING_MODEL_SETTINGS

Describes an example of ALL_MINING_MODEL_SETTINGS and shows a sample query.

The following example describes ALL_MINING_MODEL_SETTINGS and shows a sample query. Settings influence model behavior. Settings may be specific to an algorithm or to a mining function, or they may be general.

Example 22-4 ALL_MINING_MODEL_SETTINGS

 describe ALL_MINING_MODEL_SETTINGS
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 OWNER                                     NOT NULL VARCHAR2(128)
 MODEL_NAME                                NOT NULL VARCHAR2(128)
 SETTING_NAME                              NOT NULL VARCHAR2(30)
 SETTING_VALUE                                      VARCHAR2(4000)
 SETTING_TYPE                                       VARCHAR2(7)

The following query returns the settings for a model named SVD_SH_SAMPLE. The model uses the Singular Value Decomposition algorithm for feature extraction.

SELECT setting_name, setting_value, setting_type
    FROM all_mining_model_settings
    WHERE model_name = 'SVD_SH_SAMPLE'
    ORDER BY setting_name;

SETTING_NAME                   SETTING_VALUE                  SETTING           
------------------------------ ------------------------------ -------           
ALGO_NAME                      ALGO_SINGULAR_VALUE_DECOMP     INPUT             
ODMS_MISSING_VALUE_TREATMENT   ODMS_MISSING_VALUE_AUTO        DEFAULT           
ODMS_SAMPLING                  ODMS_SAMPLING_DISABLE          DEFAULT           
PREP_AUTO                      OFF                            INPUT             
SVDS_SCORING_MODE              SVDS_SCORING_SVD               DEFAULT           
SVDS_U_MATRIX_OUTPUT           SVDS_U_MATRIX_ENABLE           INPUT  

22.2.5 ALL_MINING_MODEL_VIEWS

Describes an example of ALL_MINING_MODEL_VIEWS and shows a sample query.

The following example describes ALL_MINING_MODEL_VIEWS and shows a sample query. Model views provide details on the models.

Example 22-5 ALL_MINING_MODEL_VIEWS

describe ALL_MINING_MODEL_VIEWS
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 OWNER                                     NOT NULL VARCHAR2(128)
 MODEL_NAME                                NOT NULL VARCHAR2(128)
 VIEW_NAME                                 NOT NULL VARCHAR2(128)
 VIEW_TYPE                                          VARCHAR2(128)

The following query returns the model views for a model SVD_SH_SAMPLE. The model uses the Singular Value Decomposition algorithm for feature extraction.

SELECT view_name, view_type
    FROM all_mining_model_views
    WHERE model_name = 'SVD_SH_SAMPLE'
    ORDER BY view_name;

VIEW_NAME                 VIEW_TYPE                                             
------------------------- --------------------------------------------------    
DM$VESVD_SH_SAMPLE        Singular Value Decomposition S Matrix                 
DM$VGSVD_SH_SAMPLE        Global Name-Value Pairs                               
DM$VNSVD_SH_SAMPLE        Normalization and Missing Value Handling              
DM$VSSVD_SH_SAMPLE        Computed Settings                                     
DM$VUSVD_SH_SAMPLE        Singular Value Decomposition U Matrix                 
DM$VVSVD_SH_SAMPLE        Singular Value Decomposition V Matrix                 
DM$VWSVD_SH_SAMPLE        Model Build Alerts 

Related Topics

22.2.6 ALL_MINING_MODEL_XFORMS

Describes an example of ALL_MINING_MODEL_XFORMS and provides a sample query.

The following example describes ALL_MINING_MODEL_XFORMS and provides a sample query.

Example 22-6 ALL_MINING_MODEL_XFORMS

describe ALL_MINING_MODEL_XFORMS
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 OWNER                                     NOT NULL VARCHAR2(128)
 MODEL_NAME                                NOT NULL VARCHAR2(128)
 ATTRIBUTE_NAME                                     VARCHAR2(128)
 ATTRIBUTE_SUBNAME                                  VARCHAR2(4000)
 ATTRIBUTE_SPEC                                     VARCHAR2(4000)
 EXPRESSION                                         CLOB
 REVERSE                                            VARCHAR2(3)

The following query returns the embedded transformations for a model PART2_CLAS_SAMPLE.

SELECT attribute_name, expression
    FROM all_mining_model_xforms
    WHERE model_name = 'PART2_CLAS_SAMPLE'
    ORDER BY attribute_name;

ATTRIBUTE_NAME                                                                  
-------------------------                                                       
EXPRESSION                                                                      
--------------------------------------------------------------------------------
CUST_INCOME_LEVEL                                                               
CASE CUST_INCOME_LEVEL WHEN 'A: Below 30,000' THEN 'LOW'                        
    WHEN 'L: 300,000 and above' THEN 'HIGH'                                     
    ELSE 'MEDIUM' END     

Related Topics

22.3 Data Mining PL/SQL Packages

The PL/SQL interface to Oracle Data Mining is implemented in three packages.

The following table displays the PL/SQL packages.

Table 22-2 Data Mining PL/SQL Packages

Package Name Description

DBMS_DATA_MINING

Routines for creating and managing mining models

DBMS_DATA_MINING_TRANSFORM

Routines for transforming the data for mining

DBMS_PREDICTIVE_ANALYTICS

Routines that perform predictive analytics

22.3.1 DBMS_DATA_MINING

Understand the routines of DBMS_DATA_MINING package.

The DBMS_DATA_MINING package contains routines for creating mining models, for performing operations on mining models, and for querying mining models. The package includes routines for:

  • Creating, dropping, and performing other DDL operations on mining models

  • Obtaining detailed information about model attributes, rules, and other information internal to the model (model details)

  • Computing test metrics for classification models

  • Specifying costs for classification models

  • Exporting and importing models

  • Building models using Oracle's native algorithms as well as algorithms written in R

22.3.2 DBMS_DATA_MINING_TRANSFORM

Understand the routines of DBMS_DATA_MINING_TRANSFORM package.

The DBMS_DATA_MINING_TRANSFORM package contains routines that perform data transformations such as binning, normalization, and outlier treatment. The package includes routines for:

  • Specifying transformations in a format that can be embedded in a mining model.

  • Specifying transformations as relational views (external to mining model objects).

  • Specifying distinct properties for columns in the build data. For example, you can specify that the column must be interpreted as unstructured text, or that the column must be excluded from Automatic Data Preparation.

22.3.2.1 Transformation Methods in DBMS_DATA_MINING_TRANSFORM

Summarizes the methods for transforming data in DBMS_DATA_MINING_TRANSFORM package.

Table 22-3 DBMS_DATA_MINING_TRANSFORM Transformation Methods

Transformation Method Description

XFORM interface

CREATE, INSERT, and XFORM routines specify transformations in external views

STACK interface

CREATE, INSERT, and XFORM routines specify transformations for embedding in a model

SET_TRANSFORM

Specifies transformations for embedding in a model

The statements in the following example create an Support Vector Machine (SVM) Classification model called T_SVM_Clas_sample with an embedded transformation that causes the comments attribute to be treated as unstructured text data.

Example 22-7 Sample Embedded Transformation

DECLARE
  xformlist dbms_data_mining_transform.TRANSFORM_LIST;
BEGIN
  dbms_data_mining_transform.SET_TRANSFORM(
    xformlist, 'comments', null, 'comments', null, 'TEXT');
  DBMS_DATA_MINING.CREATE_MODEL(
    model_name          => 'T_SVM_Clas_sample',
    mining_function     => dbms_data_mining.classification,
    data_table_name     => 'mining_build_text',
    case_id_column_name => 'cust_id',
    target_column_name  => 'affinity_card',
    settings_table_name => 't_svmc_sample_settings',
    xform_list => xformlist);
END;
/ 

22.3.3 DBMS_PREDICTIVE_ANALYTICS

Understand the routines of DBMS_PREDICTIVE_ANALYTICS package.

The DBMS_PREDICTIVE_ANALYTICS package contains routines that perform an automated form of data mining known as predictive analytics. With predictive analytics, you do not need to be aware of model building or scoring. All mining activities are handled internally by the procedure. The DBMS_PREDICTIVE_ANALYTICS package includes these routines:

  • EXPLAIN ranks attributes in order of influence in explaining a target column.

  • PREDICT predicts the value of a target column based on values in the input data.

  • PROFILE generates rules that describe the cases from the input data.

The EXPLAIN statement in the following example lists attributes in the view mining_data_build_v in order of their importance in predicting affinity_card.

Example 22-8 Sample EXPLAIN Statement

BEGIN 
    DBMS_PREDICTIVE_ANALYTICS.EXPLAIN( 
        data_table_name      => 'mining_data_build_v', 
        explain_column_name  => 'affinity_card', 
        result_table_name    => 'explain_results'); 
END; 
/

22.4 Data Mining SQL Scoring Functions

Understand the different data mining SQL scoring functions.

The Data Mining SQL language functions use Oracle Data Mining to score data. The functions can apply a mining model schema object to the data, or they can dynamically mine the data by executing an analytic clause. SQL functions are available for all the data mining algorithms that support the scoring operation. All Data Mining SQL functions, as listed in the following table can operate on R Mining Model with the corresponding mining function. However, the functions are not limited to the ones listed here.

Table 22-4 Data Mining SQL Functions

Function Description

CLUSTER_ID

Returns the ID of the predicted cluster

CLUSTER_DETAILS

Returns detailed information about the predicted cluster

CLUSTER_DISTANCE

Returns the distance from the centroid of the predicted cluster

CLUSTER_PROBABILITY

Returns the probability of a case belonging to a given cluster

CLUSTER_SET

Returns a list of all possible clusters to which a given case belongs along with the associated probability of inclusion

FEATURE_COMPARE Compares two similar and dissimilar set of texts from two different documents or keyword phrases or a combination of both

FEATURE_ID

Returns the ID of the feature with the highest coefficient value

FEATURE_DETAILS

Returns detailed information about the predicted feature

FEATURE_SET

Returns a list of objects containing all possible features along with the associated coefficients

FEATURE_VALUE

Returns the value of the predicted feature

ORA_DM_PARTITION_NAME Returns the partition names for a partitioned model

PREDICTION

Returns the best prediction for the target

PREDICTION_BOUNDS

(GLM only) Returns the upper and lower bounds of the interval wherein the predicted values (linear regression) or probabilities (logistic regression) lie.

PREDICTION_COST

Returns a measure of the cost of incorrect predictions

PREDICTION_DETAILS

Returns detailed information about the prediction

PREDICTION_PROBABILITY

Returns the probability of the prediction

PREDICTION_SET

Returns the results of a classification model, including the predictions and associated probabilities for each case

The following example shows a query that returns the results of the CLUSTER_ID function. The query applies the model em_sh_clus_sample, which finds groups of customers that share certain characteristics. The query returns the identifiers of the clusters and the number of customers in each cluster.

Example 22-9 CLUSTER_ID Function

-- -List the clusters into which the customers in this
-- -data set have been grouped.
--
SELECT CLUSTER_ID(em_sh_clus_sample USING *) AS clus, COUNT(*) AS cnt 
  FROM mining_data_apply_v
GROUP BY CLUSTER_ID(em_sh_clus_sample USING *)
ORDER BY cnt DESC;


SQL> -- List the clusters into which the customers in this
SQL> -- data set have been grouped.
SQL> --
SQL> SELECT CLUSTER_ID(em_sh_clus_sample USING *) AS clus, COUNT(*) AS cnt
  2    FROM mining_data_apply_v
  3  GROUP BY CLUSTER_ID(em_sh_clus_sample USING *)
  4  ORDER BY cnt DESC;
 
      CLUS        CNT
---------- ----------
         9        311
         3        294
         7        215
        12        201
        17        123
        16        114
        14         86
        19         64
        15         56
        18         36