Oracle Data Mining can mine tables, views, star schemas, transactional data, and unstructured data. The OREdm functions provide R interfaces that use arguments that conform to typical R usage for corresponding predictive analytics and data mining functions.
This section has the following topics:
The functions in the OREdm package provide access to the in-database data mining functionality of Oracle Database. You use these functions to build data mining models in the database.
The following table lists the Oracle R Enterprise functions that build Oracle Data Mining models and the corresponding Oracle Data Mining algorithms and functions.
Table 4-2 Oracle R Enterprise Data Mining Model Functions
| Oracle R Enterprise Function | Oracle Data Mining Algorithm | Oracle Data Mining Function |
|---|---|---|
|
|
Minimum Description Length |
Attribute Importance for Classification or Regression |
|
|
Apriori |
Association Rules |
|
|
Decision Tree |
Classification |
|
(12.2 feature) |
Expectation Maximization |
Clustering |
|
(12.2 feature) |
Explicit Semantic Analysis |
Feature Extraction |
|
|
Generalized Linear Models |
Classification and Regression |
|
|
k-Means |
Clustering |
|
|
Naive Bayes |
Classification |
|
|
Non-Negative Matrix Factorization |
Feature Extraction |
|
|
Orthogonal Partitioning Cluster (O-Cluster) |
Clustering |
|
(12.2 feature) |
Extensible R Algorithm |
Association Rules, Attribute Importance, Classification, Clustering, Feature Extraction, and Regression |
|
(12.2 feature) |
Singular Value Decomposition |
Feature Extraction |
|
|
Support Vector Machines |
Classification and Regression |
In each OREdm R model object, the slot name (or fit.name) is the name of the underlying Oracle Data Mining model generated by the OREdm function. While the R model exists, the Oracle Data Mining model name can be used to access the Oracle Data Mining model through other interfaces, including:
Oracle Data Miner
Any SQL interface, such as SQL*Plus or SQL Developer
In particular, the models can be used with the Oracle Data Mining SQL prediction functions.
With Oracle Data Miner you can do the following:
Get a list of available models
Use model viewers to inspect model details
Score appropriately transformed data
Note:
Any transformations performed in the R space are not carried over into Oracle Data Miner or SQL scoring.
Users can also get a list of models using SQL for inspecting model details or for scoring appropriately transformed data.
By default, models built using OREdm functions are transient objects; they do not persist past the R session in which they were built unless they are explicitly saved in an Oracle R Enterprise datastore. Oracle Data Mining models built using Data Miner or SQL, on the other hand, exist until they are explicitly dropped.
Model objects can be saved or persisted. Saving a model object generated by an OREdm function allows it to exist across R sessions and keeps the corresponding Oracle Data Mining object in place. While the OREdm model exists, you can export and import it; then you can use it apart from the Oracle R Enterprise R object existence.
You can use the MODEL_NAME parameter in odm.settings to explicitly name an Oracle Data Mining object created in the database. The named Oracle Data Mining model object persists in the database just like those created using Oracle Data Miner or SQL.
Related Topics
Beginning with Oracle Database 12c, Release 2 (12.2), functions in the Oracle Data Mining package have an argument that specifies settings for an Oracle Data Mining model and some have an argument for setting text mining parameters.
With the odm.setting argument to an OREdm function, you can specify a list of Oracle Data Mining parameter settings. Each list element's name and value refer to the parameter setting name and value, respectively. The setting value must be numeric or string. Refer to Oracle Data Mining User’s Guide for each algorithm's valid settings.
The settings function returns a data.frame that lists each Oracle Data Mining parameter setting name and value pair used to build the model.
Partitioned Oracle Data Mining Models
A partitioned model is an ensemble model that consists of multiple sub-models. To create a partitioned Oracle Data Mining model, use the odm.setting argument with ODMS_PARTITION_COLUMNS as the name and with the names of the columns by which to partition the input data as the value. The OREdm function returns a model with a sub-model for each partition. The partitions are based on the unique values found in the columns.
Partitioned models can automate scoring by allowing you to reference the top-level model only, which causes the proper sub-model to be chosen based on the values of the partitioned column or columns for each row of data to be scored.
The partitions function returns an ore.frame that lists each partition of the specified model object and the associated partition column values of the model. Partition names are system-determined. The function returns NULL for a non-partitioned model.
Text Mining Attribute Settings
Some OREdm functions have a ctx.settings argument that specifies text mining attribute settings with which you can specify Oracle Text attribute-specific settings. With the odm.settings argument, you can specify the Oracle text policy, the minimal number of documents in which each token occurs, and the maximum number of distinct features for text mining. With the ctx.settings argument, you specify the columns that should be treated as text and the type of text transformation to apply.
The ctx.settings argument applies to the following functions:
ore.odmESA, Explicit Semantic Analysis
ore.odmGLM, Generalized Linear Models
ore.odmKMeans, k-Means
ore.odmNMF, Non-Negative Matrix Factorization
ore.odmSVD, Singular Value Decomposition
ore.odmSVM, Support Vector Machines
Note:
To create an Oracle Text policy, the user must have theCTXSYS.CTX_DDL privilege.See Also:
Creating a Model that Includes Text Mining in Oracle Data Mining User’s Guide for valid text attribute values.Example 4-8 Example of Text Mining with ore.odmKMeans
This example uses the odm.settings and ctx.settings arguments. The figure following the example shows the output of the histogram(km.mod1) function.
x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
colnames(x) <- c("x", "y")
X <- ore.push (data.frame(x))
km.mod1 <- NULL
km.mod1 <- ore.odmKMeans(~., X, num.centers = 2)
km.mod1
summary(km.mod1)
rules(km.mod1)
clusterhists(km.mod1)
histogram(km.mod1)
km.res1 <- predict(km.mod1,X,type="class",supplemental.cols=c("x","y"))
head(km.res1,3)
km.res1.local <- ore.pull(km.res1)
plot(data.frame(x = km.res1.local$x,
y = km.res1.local$y),
col = km.res1.local$CLUSTER_ID)
points(km.mod1$centers2, col = rownames(km.mod1$centers2), pch = 8, cex=2)
head(predict(km.mod1,X))
head(predict(km.mod1,X,type=c("class","raw"),supplemental.cols=c("x","y")),3)
head(predict(km.mod1,X,type="raw",supplemental.cols=c("x","y")),3)
# Text mining with ore.odmKMeans
title <- c('Aids in Africa: Planning for a long war',
'Mars rover maneuvers for rim shot',
'Mars express confirms presence of water at Mars south pole',
'NASA announces major Mars rover finding',
'Drug access, Asia threat in focus at AIDS summit',
'NASA Mars Odyssey THEMIS image: typical crater',
'Road blocks for Aids')
response <- c('Aids', 'Mars', 'Mars', 'Mars', 'Aids', 'Mars', 'Aids')
# TEXT contents in character column
KM_TEXT <- ore.push(data.frame(CUST_ID = seq(length(title)),
RESPONSE = response, TITLE = title))
# Create text policy (CTXSYS.CTX_DDL privilege is required)
ore.exec("Begin ctx_ddl.create_policy('ESA_TXTPOL'); End;")
# Specify POLICY_NAME, MIN_DOCUMENTS, MAX_FEATURES and
# text column attribute specification
km.mod <- ore.odmKMeans( ~ TITLE, data = KM_TEXT, num.centers = 2L,
odm.settings = list(ODMS_TEXT_POLICY_NAME = "ESA_TXTPOL",
ODMS_TEXT_MIN_DOCUMENTS = 1,
ODMS_TEXT_MAX_FEATURES = 3,
kmns_distance = "dbms_data_mining.kmns_cosine",
kmns_details = "kmns_details_all"),
ctx.settings = list(TITLE = "TEXT(TOKEN_TYPE:STEM)"))
summary(km.mod)
settings(km.mod)
print(predict(km.mod, KM_TEXT, supplemental.cols = "RESPONSE"), digits = 3L)
ore.exec("Begin ctx_ddl.drop_policy('ESA_TXTPOL'); End;")
Listing for This Example
R> x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
+ matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
R> colnames(x) <- c("x", "y")
R>
R> X <- ore.push (data.frame(x))
R> km.mod1 <- NULL
R> km.mod1 <- ore.odmKMeans(~., X, num.centers = 2)
R> km.mod1
Call:
ore.odmKMeans(formula = ~., data = X, num.centers = 2)
Settings:
value
clus.num.clusters 2
block.growth 2
conv.tolerance 0.01
details details.all
distance euclidean
iterations 3
min.pct.attr.support 0.1
num.bins 10
random.seed 0
split.criterion variance
odms.missing.value.treatment odms.missing.value.auto
odms.sampling odms.sampling.disable
prep.auto ON
R> summary(km.mod1)
Call:
ore.odmKMeans(formula = ~., data = X, num.centers = 2)
Settings:
value
clus.num.clusters 2
block.growth 2
conv.tolerance 0.01
details details.all
distance euclidean
iterations 3
min.pct.attr.support 0.1
num.bins 10
random.seed 0
split.criterion variance
odms.missing.value.treatment odms.missing.value.auto
odms.sampling odms.sampling.disable
prep.auto ON
Centers:
x y
2 -0.07638266 0.04449368
3 0.98493306 1.00864399
R> rules(km.mod1)
cluster.id rhs.support rhs.conf lhr.support lhs.conf lhs.var lhs.var.support lhs.var.conf predicate
1 1 100 1.0 92 0.86 x 86 0.2222222 x <= 1.2209
2 1 100 1.0 92 0.86 x 86 0.2222222 x >= -.6188
3 1 100 1.0 86 0.86 y 86 0.4444444 y <= 1.1653
4 1 100 1.0 86 0.86 y 86 0.4444444 y > -.3053
5 2 50 0.5 48 0.96 x 48 0.0870793 x <= .4324
6 2 50 0.5 48 0.96 x 48 0.0870793 x >= -.6188
7 2 50 0.5 48 0.96 y 48 0.0893300 y <= .5771
8 2 50 0.5 48 0.96 y 48 0.0893300 y > -.5995
9 3 50 0.5 49 0.98 x 49 0.0852841 x <= 1.7465
10 3 50 0.5 49 0.98 x 49 0.0852841 x > .4324
11 3 50 0.5 50 0.98 y 49 0.0838225 y <= 1.7536
12 3 50 0.5 50 0.98 y 49 0.0838225 y > .2829
R> clusterhists(km.mod1)
cluster.id variable bin.id lower.bound upper.bound label count
1 1 x 1 -0.61884662 -0.35602715 -.6188466:-.3560272 6
2 1 x 2 -0.35602715 -0.09320769 -.3560272:-.0932077 17
3 1 x 3 -0.09320769 0.16961178 -.0932077:.1696118 15
4 1 x 4 0.16961178 0.43243125 .1696118:.4324312 11
5 1 x 5 0.43243125 0.69525071 .4324312:.6952507 8
6 1 x 6 0.69525071 0.95807018 .6952507:.9580702 17
7 1 x 7 0.95807018 1.22088965 .9580702:1.2208896 18
8 1 x 8 1.22088965 1.48370911 1.2208896:1.4837091 4
9 1 x 9 1.48370911 1.74652858 1.4837091:1.7465286 4
10 1 y 1 -0.89359597 -0.59946141 -.893596:-.5994614 2
11 1 y 2 -0.59946141 -0.30532685 -.5994614:-.3053269 4
12 1 y 3 -0.30532685 -0.01119230 -.3053269:-.0111923 11
13 1 y 4 -0.01119230 0.28294226 -.0111923:.2829423 24
14 1 y 5 0.28294226 0.57707682 .2829423:.5770768 13
15 1 y 6 0.57707682 0.87121138 .5770768:.8712114 12
16 1 y 7 0.87121138 1.16534593 .8712114:1.1653459 26
17 1 y 8 1.16534593 1.45948049 1.1653459:1.4594805 5
18 1 y 9 1.45948049 1.75361505 1.4594805:1.753615 3
19 2 x 1 -0.61884662 -0.35602715 -.6188466:-.3560272 6
20 2 x 2 -0.35602715 -0.09320769 -.3560272:-.0932077 17
21 2 x 3 -0.09320769 0.16961178 -.0932077:.1696118 15
22 2 x 4 0.16961178 0.43243125 .1696118:.4324312 10
23 2 x 5 0.43243125 0.69525071 .4324312:.6952507 2
24 2 x 6 0.69525071 0.95807018 .6952507:.9580702 0
25 2 x 7 0.95807018 1.22088965 .9580702:1.2208896 0
26 2 x 8 1.22088965 1.48370911 1.2208896:1.4837091 0
27 2 x 9 1.48370911 1.74652858 1.4837091:1.7465286 0
28 2 y 1 -0.89359597 -0.59946141 -.893596:-.5994614 2
29 2 y 2 -0.59946141 -0.30532685 -.5994614:-.3053269 4
30 2 y 3 -0.30532685 -0.01119230 -.3053269:-.0111923 11
31 2 y 4 -0.01119230 0.28294226 -.0111923:.2829423 24
32 2 y 5 0.28294226 0.57707682 .2829423:.5770768 9
33 2 y 6 0.57707682 0.87121138 .5770768:.8712114 0
34 2 y 7 0.87121138 1.16534593 .8712114:1.1653459 0
35 2 y 8 1.16534593 1.45948049 1.1653459:1.4594805 0
36 2 y 9 1.45948049 1.75361505 1.4594805:1.753615 0
37 3 x 1 -0.61884662 -0.35602715 -.6188466:-.3560272 0
38 3 x 2 -0.35602715 -0.09320769 -.3560272:-.0932077 0
39 3 x 3 -0.09320769 0.16961178 -.0932077:.1696118 0
40 3 x 4 0.16961178 0.43243125 .1696118:.4324312 1
41 3 x 5 0.43243125 0.69525071 .4324312:.6952507 6
42 3 x 6 0.69525071 0.95807018 .6952507:.9580702 17
43 3 x 7 0.95807018 1.22088965 .9580702:1.2208896 18
44 3 x 8 1.22088965 1.48370911 1.2208896:1.4837091 4
45 3 x 9 1.48370911 1.74652858 1.4837091:1.7465286 4
46 3 y 1 -0.89359597 -0.59946141 -.893596:-.5994614 0
47 3 y 2 -0.59946141 -0.30532685 -.5994614:-.3053269 0
48 3 y 3 -0.30532685 -0.01119230 -.3053269:-.0111923 0
49 3 y 4 -0.01119230 0.28294226 -.0111923:.2829423 0
50 3 y 5 0.28294226 0.57707682 .2829423:.5770768 4
51 3 y 6 0.57707682 0.87121138 .5770768:.8712114 12
52 3 y 7 0.87121138 1.16534593 .8712114:1.1653459 26
53 3 y 8 1.16534593 1.45948049 1.1653459:1.4594805 5
54 3 y 9 1.45948049 1.75361505 1.4594805:1.753615 3
R> histogram(km.mod1)
R>
R> km.res1 <- predict(km.mod1, X, type="class", supplemental.cols = c("x","y"))
R> head(km.res1, 3)
x y CLUSTER_ID
1 -0.43646407 0.26201831 2
2 -0.02797831 0.07319952 2
3 0.11998373 -0.08638716 2
R> km.res1.local <- ore.pull(km.res1)
R> plot(data.frame(x = km.res1.local$x,
+ y = km.res1.local$y),
+ col = km.res1.local$CLUSTER_ID)
R> points(km.mod1$centers2, col = rownames(km.mod1$centers2), pch = 8, cex = 2)
R>
R> head(predict(km.mod1, X))
'2' '3' CLUSTER_ID
1 0.9992236 0.0007763706 2
2 0.9971310 0.0028690375 2
3 0.9974216 0.0025783939 2
4 0.9997335 0.0002665114 2
5 0.9917773 0.0082226599 2
6 0.9771667 0.0228333398 2
R> head(predict(km.mod1,X,type=c("class","raw"),supplemental.cols=c("x","y")),3)
'2' '3' x y CLUSTER_ID
1 0.9992236 0.0007763706 -0.43646407 0.26201831 2
2 0.9971310 0.0028690375 -0.02797831 0.07319952 2
3 0.9974216 0.0025783939 0.11998373 -0.08638716 2
R> head(predict(km.mod1,X,type="raw",supplemental.cols=c("x","y")),3)
x y '2' '3'
1 -0.43646407 0.26201831 0.9992236 0.0007763706
2 -0.02797831 0.07319952 0.9971310 0.0028690375
3 0.11998373 -0.08638716 0.9974216 0.0025783939R>
R>
R> # Text mining with ore.odmKMeans
R> title <- c('Aids in Africa: Planning for a long war',
+ 'Mars rover maneuvers for rim shot',
+ 'Mars express confirms presence of water at Mars south pole',
+ 'NASA announces major Mars rover finding',
+ 'Drug access, Asia threat in focus at AIDS summit',
+ 'NASA Mars Odyssey THEMIS image: typical crater',
+ 'Road blocks for Aids')
R> response <- c('Aids', 'Mars', 'Mars', 'Mars', 'Aids', 'Mars', 'Aids')
R>
R> KM_TEXT <- ore.push(data.frame(CUST_ID = seq(length(title)),
+ RESPONSE = response, TITLE = title))
R>
R> # Create text policy (CTXSYS.CTX_DDL privilege is required)
R> ore.exec("Begin ctx_ddl.create_policy('ESA_TXTPOL'); End;")
R>
R> # specify POLICY_NAME, MIN_DOCUMENTS, MAX_FEATURES and
R> # Text column attribute specification
R> km.mod <- ore.odmKMeans( ~ TITLE, data = KM_TEXT, num.centers = 2L,
+ odm.settings = list(ODMS_TEXT_POLICY_NAME = "ESA_TXTPOL",
+ ODMS_TEXT_MIN_DOCUMENTS = 1,
+ ODMS_TEXT_MAX_FEATURES = 3,
+ kmns_distance = "dbms_data_mining.kmns_cosine",
+ kmns_details = "kmns_details_all"),
+ ctx.settings = list(TITLE="TEXT(TOKEN_TYPE:STEM)"))
R> summary(km.mod)
Call:
ore.odmKMeans(formula = ~TITLE, data = KM_TEXT, num.centers = 2L,
odm.settings = list(ODMS_TEXT_POLICY_NAME = "ESA_TXTPOL",
ODMS_TEXT_MIN_DOCUMENTS = 1, ODMS_TEXT_MAX_FEATURES = 3,
kmns_distance = "dbms_data_mining.kmns_cosine",
kmns_details = "kmns_details_all"),
ctx.settings = list(TITLE = "TEXT(TOKEN_TYPE:STEM)"))
Settings:
value
clus.num.clusters 2
block.growth 2
conv.tolerance 0.01
details details.all
distance cosine
iterations 3
min.pct.attr.support 0.1
num.bins 10
random.seed 0
split.criterion variance
odms.missing.value.treatment odms.missing.value.auto
odms.sampling odms.sampling.disable
odms.text.max.features 3
odms.text.min.documents 1
odms.text.policy.name ESA_TXTPOL
prep.auto ON
Centers:
TITLE.MARS TITLE.NASA TITLE.ROVER TITLE.AIDS
2 0.5292307 0.7936566 0.7936566 NA
3 NA NA NA 1
R> settings(km.mod)
SETTING_NAME SETTING_VALUE SETTING_TYPE
1 ALGO_NAME ALGO_KMEANS INPUT
2 CLUS_NUM_CLUSTERS 2 INPUT
3 KMNS_BLOCK_GROWTH 2 INPUT
4 KMNS_CONV_TOLERANCE 0.01 INPUT
5 KMNS_DETAILS KMNS_DETAILS_ALL INPUT
6 KMNS_DISTANCE KMNS_COSINE INPUT
7 KMNS_ITERATIONS 3 INPUT
8 KMNS_MIN_PCT_ATTR_SUPPORT 0.1 INPUT
9 KMNS_NUM_BINS 10 INPUT
10 KMNS_RANDOM_SEED 0 DEFAULT
11 KMNS_SPLIT_CRITERION KMNS_VARIANCE INPUT
12 ODMS_MISSING_VALUE_TREATMENT ODMS_MISSING_VALUE_AUTO DEFAULT
13 ODMS_SAMPLING ODMS_SAMPLING_DISABLE DEFAULT
14 ODMS_TEXT_MAX_FEATURES 3 INPUT
15 ODMS_TEXT_MIN_DOCUMENTS 1 INPUT
16 ODMS_TEXT_POLICY_NAME ESA_TXTPOL INPUT
17 PREP_AUTO ON INPUT
R> print(predict(km.mod, KM_TEXT, supplemental.cols = "RESPONSE"), digits = 3L)
'2' '3' RESPONSE CLUSTER_ID
1 0.0213 0.9787 Aids 3
2 0.9463 0.0537 Mars 2
3 0.9325 0.0675 Mars 2
4 0.9691 0.0309 Mars 2
5 0.0213 0.9787 Aids 3
6 0.9463 0.0537 Mars 2
7 0.0213 0.9787 Aids 3
R>
R> ore.exec("Begin ctx_ddl.drop_policy('ESA_TXTPOL'); End;")