About Scoring and Deployment
Scoring is the application of models to new data. In Oracle Machine Learning for SQL, scoring is performed by SQL language functions.
Predictive functions perform classification, regression, or anomaly detection. Clustering functions assign rows to clusters. Feature extraction functions transform the input data into a set of higher-order predictors. A scoring procedure is also available in the DBMS_DATA_MINING PL/SQL package.
Deployment refers to the use of models in a target environment. Once the models have been built, the challenges come in deploying them to obtain the best results, and in maintaining them within a production environment. Deployment can be any of the following:
-
Scoring data either for batch or real-time results. Scores can include predictions, probabilities, rules, and other statistics.
-
Extracting model details to produce reports. For example: clustering rules, decision tree rules, or attribute rankings from an Attribute Importance model.
-
Extending the business intelligence infrastructure of a data warehouse by incorporating machine learning results in applications or operational systems.
-
Moving a model from the database where it was built to the database where it is used for scoring (export/import).
OML4SQL supports all of these deployment scenarios.
Note:
OML4SQL scoring operations support parallel execution. When parallel execution is enabled, multiple CPU and I/O resources are applied to the execution of a single database operation.
Parallel execution offers significant performance improvements, especially for operations that involve complex queries and large databases typically associated with decision support systems (DSS) and data warehouses.
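As a hedged illustration (not part of the original text), one common way to request parallel execution for a scoring query is a PARALLEL hint on the input table. The model dt_sh_clas_sample and view mining_data_apply_v are the ones used in the examples later in this chapter:

```sql
-- Sketch: request a degree of parallelism of 4 for the scoring query.
SELECT /*+ PARALLEL(v, 4) */
       cust_id,
       PREDICTION(dt_sh_clas_sample USING *) pred
  FROM mining_data_apply_v v;
```

Whether parallel execution is actually used also depends on the database's parallel execution configuration.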
Use the Oracle Machine Learning for SQL Functions
Some of the benefits of using SQL functions for Oracle Machine Learning for SQL are listed.
The OML4SQL functions provide the following benefits:
-
Models can be easily deployed within the context of existing SQL applications.
-
Scoring operations take advantage of existing query execution functionality. This provides performance benefits.
-
Scoring results are pipelined, enabling the rows to be processed without requiring materialization.
The machine learning functions produce a score for each row in the selection. The functions can apply a machine learning model schema object to compute the score, or they can score dynamically without a pre-defined model, as described in "Dynamic Scoring".
Choose the Predictors
You can select different attributes as predictors in a PREDICTION function through a USING clause.
The OML4SQL functions support a USING clause that specifies which attributes to use for scoring. You can specify some or all of the attributes in the selection, and you can specify expressions. The following examples all use the PREDICTION function to find the customers who are likely to use an affinity card, but each example uses a different set of predictors.
Models score without error even when a categorical predictor value was not present in the training data. A score is produced using the remaining predictors, so batch scoring does not fail because of a single record with an unrecognized value. In some algorithms, such as k-Means or Gaussian SVM, an unknown value can still change the prediction in a meaningful way, for example by producing larger distances. Furthermore, the table or view provided for scoring may contain additional columns that were not present when the model was built; only the columns matching the model signature are used. Scoring may also be performed with fewer predictors than are listed in the model signature.
In the case of partitioned models, a NULL score is produced if the partition value is invalid. If the partition column value is omitted, an error message is returned.
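As a hedged sketch (the model name part_clas_model and the partition column REGION are hypothetical), scoring a partitioned model looks the same as scoring an ordinary model, provided the partition column is present in the scoring input:

```sql
-- Sketch: score against a hypothetical partitioned model part_clas_model,
-- partitioned on REGION. Each row is routed to the matching model partition.
SELECT cust_id,
       PREDICTION(part_clas_model USING *) pred
  FROM mining_data_apply_v;
-- A row whose REGION value matches no model partition yields a NULL score;
-- omitting the REGION column from the scoring input raises an error.
```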
The query in Example 3-7 uses all the predictors.
The query in Example 3-8 uses only gender, marital status, occupation, and income as predictors.
The query in Example 3-9 uses three attributes and an expression as predictors. The prediction is based on gender, marital status, occupation, and the assumption that all customers are in the highest income bracket.
Example 3-7 Using All Predictors
The dt_sh_clas_sample model is created by the oml4sql-classification-decision-tree.sql example.
SELECT cust_gender, COUNT(*) AS cnt, ROUND(AVG(age)) AS avg_age
  FROM mining_data_apply_v
 WHERE PREDICTION(dt_sh_clas_sample USING *) = 1
 GROUP BY cust_gender
 ORDER BY cust_gender;
The output is as follows:
C CNT AVG_AGE
- ---------- ----------
F 25 38
M 213 43
Example 3-8 Using Some Predictors
SELECT cust_gender, COUNT(*) AS cnt, ROUND(AVG(age)) AS avg_age
  FROM mining_data_apply_v
 WHERE PREDICTION(dt_sh_clas_sample
       USING cust_gender, cust_marital_status, occupation, cust_income_level) = 1
 GROUP BY cust_gender
 ORDER BY cust_gender;
The output is as follows:
C CNT AVG_AGE
- ---------- ----------
F 30 38
M 186 43
Example 3-9 Using Some Predictors and an Expression
SELECT cust_gender, COUNT(*) AS cnt, ROUND(AVG(age)) AS avg_age
  FROM mining_data_apply_v
 WHERE PREDICTION(dt_sh_clas_sample
       USING cust_gender, cust_marital_status, occupation,
             'L: 300,000 and above' AS cust_income_level) = 1
 GROUP BY cust_gender
 ORDER BY cust_gender;
The output is as follows:
C CNT AVG_AGE
- ---------- ----------
F 30 38
M 186 43
Single-Record Scoring
You can score a single record. The resulting score, 0 or 1, predicts whether the customer is unlikely or likely to use an affinity card.
The Oracle Machine Learning for SQL functions can produce a score for a single record, as shown in Example 3-10 and Example 3-11.
Example 3-10 returns a prediction for customer 102001 by applying the classification model NB_SH_Clas_sample. The resulting score is 0, meaning that this customer is unlikely to use an affinity card. The NB_SH_Clas_Sample model is created by the oml4sql-classification-naive-bayes.sql example.
Example 3-11 returns a prediction for 'Affinity card is great' as the comments attribute by applying the text machine learning model T_SVM_Clas_sample. The resulting score is 1, meaning that this customer is likely to use an affinity card. The T_SVM_Clas_sample model is created by the oml4sql-classification-text-analysis-svm.sql example.
Example 3-10 Scoring a Single Customer
SELECT PREDICTION(NB_SH_Clas_Sample USING *)
  FROM sh.customers
 WHERE cust_id = 102001;
The output is as follows:
PREDICTION(NB_SH_CLAS_SAMPLEUSING*)
-----------------------------------
0
Example 3-11 Scoring a Single Text Expression
SELECT PREDICTION(T_SVM_Clas_sample USING 'Affinity card is great' AS comments)
  FROM DUAL;
The output is as follows:
PREDICTION(T_SVM_CLAS_SAMPLEUSING'AFFINITYCARDISGREAT'ASCOMMENTS)
-----------------------------------------------------------------
1
Prediction Details
Prediction details are XML strings that provide information about the score.
Details are available for all types of scoring: clustering, feature extraction, classification, regression, and anomaly detection. Details are available whether scoring is dynamic or the result of model apply.
The details functions, CLUSTER_DETAILS, FEATURE_DETAILS, and PREDICTION_DETAILS, return the actual values of the attributes used for scoring and the relative importance of the attributes in determining the score. By default, the functions return the five most important attributes in descending order of importance.
Cluster Details
Shows an example of the CLUSTER_DETAILS function.
For the most likely cluster assignments of customer 100955 (probability of assignment > 20%), the query in the following example produces the five attributes that have the most impact for each of the likely clusters. The clustering functions apply an Expectation Maximization model named em_sh_clus_sample to the data selected from mining_data_apply_v. The "5" specified in CLUSTER_DETAILS is not required, because five attributes are returned by default. The em_sh_clus_sample model is created by the oml4sql-clustering-expectation-maximization.sql example.
Example 3-12 Cluster Details
SELECT S.cluster_id, probability prob,
       CLUSTER_DETAILS(em_sh_clus_sample, S.cluster_id, 5 USING T.*) det
  FROM (SELECT v.*,
               CLUSTER_SET(em_sh_clus_sample, NULL, 0.2 USING *) pset
          FROM mining_data_apply_v v
         WHERE cust_id = 100955) T,
       TABLE(T.pset) S
 ORDER BY 2 DESC;
The output is as follows:
CLUSTER_ID PROB DET
---------- ----- ----------------------------------------------------------------------------
14 .6761 <Details algorithm="Expectation Maximization" cluster="14">
<Attribute name="AGE" actualValue="51" weight=".676" rank="1"/>
<Attribute name="HOME_THEATER_PACKAGE" actualValue="1" weight=".557" rank="2"/>
<Attribute name="FLAT_PANEL_MONITOR" actualValue="0" weight=".412" rank="3"/>
<Attribute name="Y_BOX_GAMES" actualValue="0" weight=".171" rank="4"/>
<Attribute name="BOOKKEEPING_APPLICATION" actualValue="1" weight="-.003" rank="5"/>
</Details>
3 .3227 <Details algorithm="Expectation Maximization" cluster="3">
<Attribute name="YRS_RESIDENCE" actualValue="3" weight=".323" rank="1"/>
<Attribute name="BULK_PACK_DISKETTES" actualValue="1" weight=".265" rank="2"/>
<Attribute name="EDUCATION" actualValue="HS-grad" weight=".172" rank="3"/>
<Attribute name="AFFINITY_CARD" actualValue="0" weight=".125" rank="4"/>
<Attribute name="OCCUPATION" actualValue="Crafts" weight=".055" rank="5"/>
</Details>
Feature Details
Shows an example of the FEATURE_DETAILS function.
The query in the following example returns the three attributes that have the greatest impact on the top Principal Components Analysis (PCA) projection for customer 101501. The FEATURE_DETAILS function applies a Singular Value Decomposition (SVD) model named svd_sh_sample to the data selected from the svd_sh_sample_build_num table. The table and model are created by the oml4sql-singular-value-decomposition.sql example.
Example 3-13 Feature Details
SELECT FEATURE_DETAILS(svd_sh_sample, 1, 3 USING *) proj1det
  FROM svd_sh_sample_build_num
 WHERE CUST_ID = 101501;
The output is as follows:
PROJ1DET
--------------------------------------------------------------------------------
<Details algorithm="Singular Value Decomposition" feature="1">
<Attribute name="HOME_THEATER_PACKAGE" actualValue="1" weight=".352" rank="1"/>
<Attribute name="Y_BOX_GAMES" actualValue="0" weight=".249" rank="2"/>
<Attribute name="AGE" actualValue="41" weight=".063" rank="3"/>
</Details>
Prediction Details
Shows examples of the PREDICTION_DETAILS function.
The query in the following example returns the attributes that are most important in predicting the age of customer 100010. The prediction functions apply a Generalized Linear Model regression model named GLMR_SH_Regr_sample to the data selected from mining_data_apply_v. The GLMR_SH_Regr_sample model is created by the oml4sql-regression-glm.sql example.
Example 3-14 Prediction Details for Regression
SELECT cust_id,
       PREDICTION(GLMR_SH_Regr_sample USING *) pr,
       PREDICTION_DETAILS(GLMR_SH_Regr_sample USING *) pd
  FROM mining_data_apply_v
 WHERE CUST_ID = 100010;
The output is as follows:
CUST_ID PR PD
------- ----- -----------
100010 25.45 <Details algorithm="Generalized Linear Model">
<Attribute name="FLAT_PANEL_MONITOR" actualValue="1" weight=".025" rank="1"/>
<Attribute name="OCCUPATION" actualValue="Crafts" weight=".019" rank="2"/>
<Attribute name="AFFINITY_CARD" actualValue="0" weight=".01" rank="3"/>
<Attribute name="OS_DOC_SET_KANJI" actualValue="0" weight="0" rank="4"/>
<Attribute name="BOOKKEEPING_APPLICATION" actualValue="1" weight="-.004" rank="5"/>
</Details>
The query in the following example returns the customers who work in Tech Support and are likely to use an affinity card (with more than 85% probability). The prediction functions apply a Support Vector Machine (SVM) classification model named svmc_sh_clas_sample to the data selected from mining_data_apply_v. The query includes the prediction details, which show that education is the most important predictor. The svmc_sh_clas_sample model is created by the oml4sql-classification-svm.sql example.
Example 3-15 Prediction Details for Classification
SELECT cust_id,
       PREDICTION_DETAILS(svmc_sh_clas_sample, 1 USING *) PD
  FROM mining_data_apply_v
 WHERE PREDICTION_PROBABILITY(svmc_sh_clas_sample, 1 USING *) > 0.85
   AND occupation = 'TechSup'
 ORDER BY cust_id;
The output is as follows:
CUST_ID PD
------- ---------------------------------------------------------------------------------------
100029 <Details algorithm="Support Vector Machines" class="1">
<Attribute name="EDUCATION" actualValue="Assoc-A" weight=".199" rank="1"/>
<Attribute name="CUST_INCOME_LEVEL" actualValue="I: 170\,000 - 189\,999" weight=".044"
rank="2"/>
<Attribute name="HOME_THEATER_PACKAGE" actualValue="1" weight=".028" rank="3"/>
<Attribute name="BULK_PACK_DISKETTES" actualValue="1" weight=".024" rank="4"/>
<Attribute name="BOOKKEEPING_APPLICATION" actualValue="1" weight=".022" rank="5"/>
</Details>
100378 <Details algorithm="Support Vector Machines" class="1">
<Attribute name="EDUCATION" actualValue="Assoc-A" weight=".21" rank="1"/>
<Attribute name="CUST_INCOME_LEVEL" actualValue="B: 30\,000 - 49\,999" weight=".047"
rank="2"/>
<Attribute name="FLAT_PANEL_MONITOR" actualValue="0" weight=".043" rank="3"/>
<Attribute name="HOME_THEATER_PACKAGE" actualValue="1" weight=".03" rank="4"/>
<Attribute name="BOOKKEEPING_APPLICATION" actualValue="1" weight=".023" rank="5"/>
</Details>
100508 <Details algorithm="Support Vector Machines" class="1">
<Attribute name="EDUCATION" actualValue="Bach." weight=".19" rank="1"/>
<Attribute name="CUST_INCOME_LEVEL" actualValue="L: 300\,000 and above" weight=".046"
rank="2"/>
<Attribute name="HOME_THEATER_PACKAGE" actualValue="1" weight=".031" rank="3"/>
<Attribute name="BULK_PACK_DISKETTES" actualValue="1" weight=".026" rank="4"/>
<Attribute name="BOOKKEEPING_APPLICATION" actualValue="1" weight=".024" rank="5"/>
</Details>
100980 <Details algorithm="Support Vector Machines" class="1">
<Attribute name="EDUCATION" actualValue="Assoc-A" weight=".19" rank="1"/>
<Attribute name="FLAT_PANEL_MONITOR" actualValue="0" weight=".038" rank="2"/>
<Attribute name="HOME_THEATER_PACKAGE" actualValue="1" weight=".026" rank="3"/>
<Attribute name="BULK_PACK_DISKETTES" actualValue="1" weight=".022" rank="4"/>
<Attribute name="BOOKKEEPING_APPLICATION" actualValue="1" weight=".02" rank="5"/>
</Details>
The query in the following example returns the two customers that differ the most from the rest of the customers. The prediction functions apply an anomaly detection model named SVMO_SH_Clas_sample to the data selected from mining_data_apply_v. Anomaly detection uses a one-class SVM classifier. The model is created by the oml4sql-anomaly-detection-1class-svm.sql example.
Example 3-16 Prediction Details for Anomaly Detection
SELECT cust_id, pd
  FROM (SELECT cust_id,
               PREDICTION_DETAILS(SVMO_SH_Clas_sample, 0 USING *) pd,
               RANK() OVER (ORDER BY PREDICTION_PROBABILITY(SVMO_SH_Clas_sample, 0 USING *) DESC,
                            cust_id) rnk
          FROM mining_data_one_class_v)
 WHERE rnk <= 2
 ORDER BY rnk;
The output is as follows:
CUST_ID PD
---------- -----------------------------------------------------------------------------------
102366 <Details algorithm="Support Vector Machines" class="0">
<Attribute name="COUNTRY_NAME" actualValue="United Kingdom" weight=".078" rank="1"/>
<Attribute name="CUST_MARITAL_STATUS" actualValue="Divorc." weight=".027" rank="2"/>
<Attribute name="CUST_GENDER" actualValue="F" weight=".01" rank="3"/>
<Attribute name="HOUSEHOLD_SIZE" actualValue="9+" weight=".009" rank="4"/>
<Attribute name="AGE" actualValue="28" weight=".006" rank="5"/>
</Details>
101790 <Details algorithm="Support Vector Machines" class="0">
<Attribute name="COUNTRY_NAME" actualValue="Canada" weight=".068" rank="1"/>
<Attribute name="HOUSEHOLD_SIZE" actualValue="4-5" weight=".018" rank="2"/>
<Attribute name="EDUCATION" actualValue="7th-8th" weight=".015" rank="3"/>
<Attribute name="CUST_GENDER" actualValue="F" weight=".013" rank="4"/>
<Attribute name="AGE" actualValue="38" weight=".001" rank="5"/>
</Details>
GROUPING Hint
OML4SQL functions include PREDICTION*, CLUSTER*, FEATURE*, and ORA_DM_*. The GROUPING hint is an optional hint that applies to machine learning scoring functions when scoring partitioned models.
Enhanced PREDICTION Function Command Format
This hint results in partitioning the input data set into distinct data slices so that each partition is scored in its entirety before advancing to the next partition. However, parallelism by partition is still available. Data slices are determined by the partitioning key columns used when the model was built. This method can be used with any machine learning function against a partitioned model. The hint may yield a query performance gain when scoring large data that is associated with many partitions but may negatively impact performance when scoring large data with few partitions on large systems. Typically, there is no performance gain if you use the hint for single row queries.
<prediction function> ::=
PREDICTION <left paren> /*+ GROUPING */ <prediction model>
[ <comma> <class value> [ <comma> <top N> ] ]
USING <machine learning attribute list> <right paren>
The syntax is shown only for the PREDICTION function, but the hint is applicable to any machine learning scoring function in the PREDICTION, CLUSTER, and FEATURE families.
Example 3-17 GROUPING Hint
SELECT PREDICTION(/*+ GROUPING */ my_model USING *) pred FROM <input table>;
Related Topics
In-Database Scoring
In-database scoring applies machine learning models to new data within the database, ensuring security, efficiency, and ease of integration with applications.
Scoring is the application of a machine learning algorithm to new data. In traditional machine learning, models are built using specialized software on a remote system and deployed to another system for scoring. This is a cumbersome, error-prone process open to security violations and difficulties in data synchronization.
With Oracle Machine Learning for SQL, scoring is simple and secure. The scoring engine and the data both reside within the database. Scoring is an extension to the SQL language, so the results of machine learning can easily be incorporated into applications and reporting systems.
Parallel Execution and Ease of Administration
Parallel execution and in-database scoring provide performance advantages and simplify model deployment, ensuring efficient handling of large data sets.
In-database scoring provides performance advantages. All Oracle Machine Learning for SQL scoring routines support parallel execution, which significantly reduces the time required for executing complex queries and scoring large data sets.
In-database machine learning minimizes the IT effort needed to support Oracle Machine Learning for SQL initiatives. Using standard database techniques, models can easily be refreshed (re-created) on more recent data and redeployed. The deployment is immediate since the scoring query remains the same; only the underlying model is replaced in the database.
Related Topics
SQL Functions for Model Apply and Dynamic Scoring
In Oracle Machine Learning for SQL, scoring is performed by SQL language functions. Understand the different ways of scoring using SQL functions.
The functions perform prediction, clustering, and feature extraction. The functions can be invoked in two different ways: by applying a machine learning model object (Example 3-18), or by executing an analytic clause that computes the machine learning analysis dynamically and applies it to the data (Example 3-19). Dynamic scoring, which eliminates the need for a model, can supplement, or even replace, the more traditional methodology described in "The Machine Learning Process".
In Example 3-18, the PREDICTION_PROBABILITY function applies the model svmc_sh_clas_sample, created in Example 5-1, to score the data in mining_data_apply_v. The function returns the ten customers in Italy who are most likely to use an affinity card.
In Example 3-19, the functions PREDICTION and PREDICTION_PROBABILITY use the analytic syntax (the OVER () clause) to dynamically score the data in mining_data_build_v. The query returns the customers who currently do not have an affinity card, with the probability that they are likely to use one.
Example 3-18 Applying an Oracle Machine Learning for SQL Model to Score Data
SELECT cust_id
  FROM (SELECT cust_id,
               RANK() OVER (ORDER BY PREDICTION_PROBABILITY(svmc_sh_clas_sample, 1 USING *) DESC,
                            cust_id) rnk
          FROM mining_data_apply_v
         WHERE country_name = 'Italy')
 WHERE rnk <= 10
 ORDER BY rnk;
The output is as follows:
CUST_ID
----------
101445
100179
100662
100733
100554
100081
100344
100324
100185
101345
Example 3-19 Executing an Analytic Function to Score Data
SELECT cust_id, pred_prob
  FROM (SELECT cust_id, affinity_card,
               PREDICTION(FOR TO_CHAR(affinity_card) USING *) OVER () pred_card,
               PREDICTION_PROBABILITY(FOR TO_CHAR(affinity_card), 1 USING *) OVER () pred_prob
          FROM mining_data_build_v)
 WHERE affinity_card = 0
   AND pred_card = 1
 ORDER BY pred_prob DESC;
The output is similar to:
CUST_ID PRED_PROB
---------- ---------
102434 .96
102365 .96
102330 .96
101733 .95
102615 .94
102686 .94
102749 .93
.
.
.
101656 .51
Dynamic Scoring
You can perform dynamic scoring if, for some reason, you do not want to apply a predefined model.
The Oracle Machine Learning for SQL functions operate in two modes: by applying a predefined model, or by executing an analytic clause. If you supply an analytic clause instead of a model name, the function builds one or more transient models and uses them to score the data.
The ability to score data dynamically without a predefined model extends the application of basic embedded machine learning techniques into environments where models are not available. Dynamic scoring, however, has limitations. The transient models created during dynamic scoring are not available for inspection or fine tuning. Applications that require model inspection, the correlation of scoring results with the model, special algorithm settings, or multiple scoring queries that use the same model, require a predefined model.
The following example shows a dynamic scoring query. The example identifies the rows in the input data that contain unusual customer age values.
Example 3-20 Dynamic Prediction
SELECT cust_id, age, pred_age, age-pred_age age_diff, pred_det
  FROM (SELECT cust_id, age, pred_age, pred_det,
               RANK() OVER (ORDER BY ABS(age-pred_age) DESC) rnk
          FROM (SELECT cust_id, age,
                       PREDICTION(FOR age USING *) OVER () pred_age,
                       PREDICTION_DETAILS(FOR age ABS USING *) OVER () pred_det
                  FROM mining_data_apply_v))
 WHERE rnk <= 5;
The output is as follows:
CUST_ID AGE PRED_AGE AGE_DIFF PRED_DET
------- ---- ---------- -------- --------------------------------------------------------------
100910 80 40.6686505 39.33 <Details algorithm="Support Vector Machines">
<Attribute name="HOME_THEATER_PACKAGE" actualValue="1"
weight=".059" rank="1"/>
<Attribute name="Y_BOX_GAMES" actualValue="0"
weight=".059" rank="2"/>
<Attribute name="AFFINITY_CARD" actualValue="0"
weight=".059" rank="3"/>
<Attribute name="FLAT_PANEL_MONITOR" actualValue="1"
weight=".059" rank="4"/>
<Attribute name="YRS_RESIDENCE" actualValue="4"
weight=".059" rank="5"/>
</Details>
101285 79 42.1753571 36.82 <Details algorithm="Support Vector Machines">
<Attribute name="HOME_THEATER_PACKAGE" actualValue="1"
weight=".059" rank="1"/>
<Attribute name="HOUSEHOLD_SIZE" actualValue="2" weight=".059"
rank="2"/>
<Attribute name="CUST_MARITAL_STATUS" actualValue="Mabsent"
weight=".059" rank="3"/>
<Attribute name="Y_BOX_GAMES" actualValue="0" weight=".059"
rank="4"/>
<Attribute name="OCCUPATION" actualValue="Prof." weight=".059"
rank="5"/>
</Details>
100694 77 41.0396722 35.96 <Details algorithm="Support Vector Machines">
<Attribute name="HOME_THEATER_PACKAGE" actualValue="1"
weight=".059" rank="1"/>
<Attribute name="EDUCATION" actualValue="< Bach."
weight=".059" rank="2"/>
<Attribute name="Y_BOX_GAMES" actualValue="0" weight=".059"
rank="3"/>
<Attribute name="CUST_ID" actualValue="100694" weight=".059"
rank="4"/>
<Attribute name="COUNTRY_NAME" actualValue="United States of
America" weight=".059" rank="5"/>
</Details>
100308 81 45.3252491 35.67 <Details algorithm="Support Vector Machines">
<Attribute name="HOME_THEATER_PACKAGE" actualValue="1"
weight=".059" rank="1"/>
<Attribute name="Y_BOX_GAMES" actualValue="0" weight=".059"
rank="2"/>
<Attribute name="HOUSEHOLD_SIZE" actualValue="2" weight=".059"
rank="3"/>
<Attribute name="FLAT_PANEL_MONITOR" actualValue="1"
weight=".059" rank="4"/>
<Attribute name="CUST_GENDER" actualValue="F" weight=".059"
rank="5"/>
</Details>
101256 90 54.3862214 35.61 <Details algorithm="Support Vector Machines">
<Attribute name="YRS_RESIDENCE" actualValue="9" weight=".059"
rank="1"/>
<Attribute name="HOME_THEATER_PACKAGE" actualValue="1"
weight=".059" rank="2"/>
<Attribute name="EDUCATION" actualValue="< Bach."
weight=".059" rank="3"/>
<Attribute name="Y_BOX_GAMES" actualValue="0" weight=".059"
rank="4"/>
<Attribute name="COUNTRY_NAME" actualValue="United States of
America" weight=".059" rank="5"/>
</Details>
Real-Time Scoring
You can perform real-time scoring by running a SQL query. An example shows a real-time query using the PREDICTION_PROBABILITY function. Based on the result, a customer representative can offer a value card to the customer.
Oracle Machine Learning for SQL functions enable prediction, clustering, and feature extraction analysis to be easily integrated into live production and operational systems. Because machine learning results are returned within SQL queries, machine learning can occur in real time.
With real-time scoring, point-of-sales database transactions can be mined. Predictions and rule sets can be generated to help front-line workers make better analytical decisions. Real-time scoring enables fraud detection, identification of potential liabilities, and recognition of better marketing and selling opportunities.
The query in the following example uses a Decision Tree model named dt_sh_clas_sample to predict the probability that customer 101488 uses an affinity card. A customer representative can retrieve this information in real time when talking to this customer on the phone. Based on the query result, the representative can offer an extra-value card, since there is a 73% chance that the customer will use a card. The model is created by the oml4sql-classification-decision-tree.sql example.
Example 3-21 Real-Time Query with Prediction Probability
SELECT PREDICTION_PROBABILITY(dt_sh_clas_sample, 1 USING *) cust_card_prob
  FROM mining_data_apply_v
 WHERE cust_id = 101488;
The output is as follows:
CUST_CARD_PROB
--------------
.72764
DBMS_DATA_MINING.APPLY
The APPLY procedure in DBMS_DATA_MINING is a batch apply operation that writes the results of scoring directly to a table.
The columns in the table depend on the machine learning function.
Scoring with APPLY generates the same results as scoring with the SQL scoring functions. Classification produces a prediction and a probability for each case; clustering produces a cluster ID and a probability for each case, and so on. The difference lies in the way that scoring results are captured and the mechanisms that can be used for retrieving them.
APPLY creates an output table with the columns shown in the following table:
Table 3-14 APPLY Output Table
Machine Learning Technique | Output Columns |
---|---|
classification | CASE_ID, PREDICTION, PROBABILITY |
regression | CASE_ID, PREDICTION |
anomaly detection | CASE_ID, PREDICTION, PROBABILITY |
clustering | CASE_ID, CLUSTER_ID, PROBABILITY |
feature extraction | CASE_ID, FEATURE_ID, MATCH_QUALITY |
Since APPLY output is stored separately from the scoring data, it must be joined to the scoring data to support queries that include the scored rows. Thus any model that is used with APPLY must have a case ID.
A case ID is not required for models that are applied with SQL scoring functions. Likewise, storage and joins are not required, since scoring results are generated and consumed in real time within a SQL query.
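As a hedged sketch (the output table name apply_output is hypothetical), joining APPLY output back to the scoring data through the case ID looks like this:

```sql
-- Sketch: join a hypothetical APPLY output table back to the scoring data,
-- using the case ID column (cust_id) that both tables share.
SELECT s.cust_id, s.age, o.prediction, o.probability
  FROM mining_data_apply_v s
  JOIN apply_output o
    ON o.cust_id = s.cust_id
 WHERE o.prediction = 1;
```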
The following example illustrates anomaly detection with APPLY. The query of the APPLY output table returns the first ten customers in the table. Each has a probability for being typical (1) and a probability for being anomalous (0). The SVMO_SH_Clas_sample model is created by the oml4sql-anomaly-detection-1class-svm.sql example.
Example 3-22 Anomaly Detection with DBMS_DATA_MINING.APPLY
EXEC dbms_data_mining.apply('SVMO_SH_Clas_sample', 'svmo_sh_sample_prepared', 'cust_id', 'one_class_output');

SELECT * FROM one_class_output WHERE rownum < 11;
The output is as follows:
CUST_ID PREDICTION PROBABILITY
---------- ---------- -----------
101798 1 .567389309
101798 0 .432610691
102276 1 .564922469
102276 0 .435077531
102404 1 .51213544
102404 0 .48786456
101891 1 .563474346
101891 0 .436525654
102815 0 .500663683
102815 1 .499336317
Related Topics
Create a Model that Includes Machine Learning Operations on Text
Create a model and specify the settings to perform machine learning operations on text.
Oracle Machine Learning for SQL supports unstructured text within columns of VARCHAR2, CHAR, CLOB, BLOB, and BFILE, as described in the following table:
Table 3-15 Column Data Types That May Contain Unstructured Text
Data Type | Description |
---|---|
BFILE and BLOB | Oracle Machine Learning for SQL interprets BLOB and BFILE as text only if you identify the columns as text when you create the model. If you do not identify the columns as text, then CREATE_MODEL returns an error. |
CHAR | OML4SQL interprets CHAR as categorical by default. You can identify columns of CHAR as text when you create the model. |
CLOB | OML4SQL interprets CLOB as text. |
VARCHAR2 | OML4SQL interprets VARCHAR2 with data length > 4000 as text. OML4SQL interprets VARCHAR2 with data length <= 4000 as text only if you identify the columns as text when you create the model. |
The settings described in the following table control the term extraction process for text attributes in a model. Instructions for specifying model settings are in "Specifying Model Settings".
Table 3-16 Model Settings for Text
Setting Name | Data Type | Setting Value | Description |
---|---|---|---|
ODMS_TEXT_POLICY_NAME | VARCHAR2 | Name of an Oracle Text policy object created with CTX_DDL.CREATE_POLICY | Affects how individual tokens are extracted from unstructured text. |
ODMS_TEXT_MAX_FEATURES | INTEGER | 1 <= value <= 100000 | Maximum number of features to use from the document set (across all documents of each text column) passed to CREATE_MODEL. Default is 3000. |
A model can include one or more text attributes. A model with text attributes can also include categorical and numerical attributes.
To create a model that includes text attributes:
-
Create an Oracle Text policy object.
-
Specify the model configuration settings that are described in "Table 3-16".
-
Specify which columns must be treated as text and, optionally, provide text transformation instructions for individual attributes.
-
Pass the model settings and text transformation instructions to DBMS_DATA_MINING.CREATE_MODEL2 or DBMS_DATA_MINING.CREATE_MODEL.
Note:
All algorithms except O-Cluster can support columns of unstructured text.
The use of unstructured text is not recommended for association rules (Apriori).
In the following example, an SVM model is used to predict customers that are most likely to be positive responders to an Affinity Card loyalty program. The data comes with a text column that contains user generated comments. By creating an Oracle Text policy and specifying model settings, the algorithm automatically uses the text column and builds the model on both the structured data and unstructured text.
This example uses a view called mining_data
which is created from SH.SALES
table. A training data set called mining_train_text
is also created.
The following queries show you how to create an Oracle Text policy followed by building a model using CREATE_MODEL2
procedure.
%script
BEGIN
  ctx_ddl.create_policy('dmdemo_svm_policy');
END;
/
The output is:
PL/SQL procedure successfully completed.
%script
BEGIN
  DBMS_DATA_MINING.DROP_MODEL('T_SVM_Clas_sample');
EXCEPTION WHEN OTHERS THEN NULL;
END;
/
DECLARE
  v_setlst  DBMS_DATA_MINING.SETTING_LIST;
  xformlist dbms_data_mining_transform.TRANSFORM_LIST;
BEGIN
  v_setlst(dbms_data_mining.algo_name)              := dbms_data_mining.algo_support_vector_machines;
  v_setlst(dbms_data_mining.prep_auto)              := dbms_data_mining.prep_auto_on;
  v_setlst(dbms_data_mining.svms_kernel_function)   := dbms_data_mining.svms_linear;
  v_setlst(dbms_data_mining.svms_complexity_factor) := '100';
  v_setlst(dbms_data_mining.odms_text_policy_name)  := 'DMDEMO_SVM_POLICY';
  v_setlst(dbms_data_mining.svms_solver)            := dbms_data_mining.svms_solver_sgd;

  dbms_data_mining_transform.SET_TRANSFORM(
    xformlist, 'comments', null, 'comments', null, 'TEXT');

  DBMS_DATA_MINING.CREATE_MODEL2(
    model_name          => 'T_SVM_Clas_sample',
    mining_function     => dbms_data_mining.classification,
    data_query          => 'select * from mining_train_text',
    set_list            => v_setlst,
    case_id_column_name => 'cust_id',
    target_column_name  => 'affinity_card',
    xform_list          => xformlist);
END;
/
The output is:
PL/SQL procedure successfully completed.
---------------------------
PL/SQL procedure successfully completed.
---------------------------
Create a Text Policy
An Oracle Text policy specifies how text content must be interpreted. You can provide a text policy to govern a model, an attribute, or both the model and individual attributes.
If a model-specific policy is present and one or more attributes have their own policies, Oracle Machine Learning for SQL uses the attribute policies for the specified attributes and the model-specific policy for the other attributes.
The CTX_DDL.CREATE_POLICY procedure creates a text policy.
```
CTX_DDL.CREATE_POLICY(
  policy_name   IN VARCHAR2,
  filter        IN VARCHAR2 DEFAULT NULL,
  section_group IN VARCHAR2 DEFAULT NULL,
  lexer         IN VARCHAR2 DEFAULT NULL,
  stoplist      IN VARCHAR2 DEFAULT NULL,
  wordlist      IN VARCHAR2 DEFAULT NULL);
```
The parameters of CTX_DDL.CREATE_POLICY are described in the following table.
Table 3-17 CTX_DDL.CREATE_POLICY Procedure Parameters
Parameter Name | Description |
---|---|
policy_name | Name of the new policy object. Oracle Text policies and text indexes share the same namespace. |
filter | Specifies how the documents must be converted to plain text for indexing. |
section_group | Identifies sections within the documents. |
lexer | Identifies the language that is being indexed. |
stoplist | Specifies words and themes to exclude from term extraction. For example, the word "the" is typically in the stoplist for English language documents. The system-supplied stoplist is used by default. See "Stoplists" in Oracle Text Reference. |
wordlist | Specifies how stems and fuzzy queries must be expanded. A stem defines a root form of a word so that different grammatical forms have a single representation. A fuzzy query includes common misspellings in the representation of a word. See Oracle Text Reference. |
Configure a Text Attribute
Provide transformation instructions for a text attribute or unstructured text by explicitly identifying the column datatypes.
As shown in Table 3-15, you can identify columns of CHAR, shorter VARCHAR2 (<=4000), BFILE, and BLOB as text attributes. If CHAR and shorter VARCHAR2 columns are not explicitly identified as unstructured text, then CREATE_MODEL processes them as categorical attributes. If BFILE and BLOB columns are not explicitly identified as unstructured text, then CREATE_MODEL returns an error.
To identify a column as a text attribute, supply the keyword TEXT in an attribute specification. The attribute specification is a field (attribute_spec) in a transformation record (transform_rec). Transformation records are components of transformation lists (xform_list) that can be passed to CREATE_MODEL or CREATE_MODEL2.
Note:
An attribute specification can also include information that is not related to text. Instructions for constructing an attribute specification are in "Embedding Transformations in a Model".
You can provide transformation instructions for any text attribute by qualifying the TEXT keyword in the attribute specification with the subsettings described in the following table.
Table 3-18 Attribute-Specific Text Transformation Instructions
Subsetting Name | Description | Example |
---|---|---|
BIGRAM | A sequence of two adjacent elements from a string of tokens, which are typically letters, syllables, or words. Here, NORMAL tokens are mixed with their bigrams. | (BIGRAM) |
POLICY_NAME | Name of an Oracle Text policy object created with CTX_DDL.CREATE_POLICY. | (POLICY_NAME:my_policy) |
STEM_BIGRAM | Here, STEM tokens are extracted first and then stem bigrams are formed. | (STEM_BIGRAM) |
SYNONYM | Oracle Machine Learning for SQL supports synonyms. The following is an optional parameter: <thesaurus>, where thesaurus is the name of the thesaurus defining the synonyms. | (SYNONYM) (SYNONYM[tech_thesaurus]) |
TOKEN_TYPE | The following values are supported: NORMAL (the default), STEM, and THEME. See "Token Types in an Attribute Specification". | (TOKEN_TYPE:THEME) |
MAX_FEATURES | Maximum number of features to use from the attribute. | (MAX_FEATURES:3000) |
Note:
The TEXT keyword is only required for CLOB and longer VARCHAR2 (>4000) when you specify transformation instructions. The TEXT keyword is always required for CHAR, shorter VARCHAR2, BFILE, and BLOB, whether or not you specify transformation instructions.
Tip:
You can view attribute specifications in the data dictionary view ALL_MINING_MODEL_ATTRIBUTES, as described in Oracle Database Reference.
Token Types in an Attribute Specification
When stems or themes are specified as the token type, the lexer preference for the text policy must support these types of tokens.
The following example adds themes and English stems to BASIC_LEXER.
```
BEGIN
  CTX_DDL.CREATE_PREFERENCE('my_lexer', 'BASIC_LEXER');
  CTX_DDL.SET_ATTRIBUTE('my_lexer', 'index_stems', 'ENGLISH');
  CTX_DDL.SET_ATTRIBUTE('my_lexer', 'index_themes', 'YES');
END;
/
```
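A preference such as my_lexer can then be supplied as the lexer parameter of CTX_DDL.CREATE_POLICY. The following is a sketch only; it assumes the my_lexer preference exists, and the policy name my_policy is illustrative:

```
BEGIN
  -- Create a text policy that tokenizes with the custom lexer preference
  CTX_DDL.CREATE_POLICY(
    policy_name => 'my_policy',
    lexer       => 'my_lexer');
END;
/
```

Because my_lexer enables stems and themes, a model governed by this policy can use STEM or THEME as the token type.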
Example 3-23 A Sample Attribute Specification for Text
This expression specifies that text transformation for the attribute must use the text policy named my_policy. The token type is THEME, and the maximum number of features is 3000.
```
"TEXT(POLICY_NAME:my_policy)(TOKEN_TYPE:THEME)(MAX_FEATURES:3000)"
```
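As a sketch of how such a specification is attached to a column at build time, the string can be passed as the attribute specification argument of SET_TRANSFORM, in the same way the plain 'TEXT' keyword was passed in the earlier CREATE_MODEL2 example. This assumes a comments text column and an existing my_policy text policy:

```
DECLARE
  xformlist dbms_data_mining_transform.TRANSFORM_LIST;
BEGIN
  -- Qualify the TEXT keyword with a policy, token type, and feature limit
  dbms_data_mining_transform.SET_TRANSFORM(
    xformlist, 'comments', null, 'comments', null,
    'TEXT(POLICY_NAME:my_policy)(TOKEN_TYPE:THEME)(MAX_FEATURES:3000)');
  -- xformlist can now be supplied as the xform_list argument
  -- of DBMS_DATA_MINING.CREATE_MODEL2
END;
/
```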