7.10 Extensible R Algorithm Model

The ore.odmRAlg function creates an Extensible R algorithm model.

The Extensible R algorithm builds, scores, and views an R model using registered R scripts. It supports classification, regression, clustering, feature extraction, attribute importance, and association machine learning functions.

For information on the ore.odmRAlg function arguments and for an example of using the function, call help(ore.odmRAlg).

Settings for an Extensible R Algorithm Model

The following table lists settings that apply to Extensible R Algorithm models.

Table 7-9 Extensible R Algorithm Model Settings

Setting Name Setting Value Description

RALG_BUILD_FUNCTION

R_BUILD_FUNCTION_SCRIPT_NAME

Specifies the name of an existing registered R script for R algorithm mining model build function. The R script defines an R function for the first input argument for training data and returns an R model object. For Clustering and Feature Extraction mining function model build, the R attributes dm$nclus and dm$nfeat must be set on the R model to indicate the number of clusters and features respectively.

The RALG_BUILD_FUNCTION must be set along with ALGO_EXTENSIBLE_LANG in the model_setting_table.

RALG_BUILD_PARAMETER

SELECT value param_name, ...FROM DUAL

Specifies a list of numeric and string scalar for optional input parameters of the model build function.

RALG_SCORE_FUNCTION

R_SCORE_FUNCTION_SCRIPT_NAME

Specifies the name of an existing registered R script to score data. The script returns a data.frame containing the corresponding prediction results. The setting is used to score data for mining functions such as Regression, Classification, Clustering, and Feature Extraction. This setting does not apply to Association and Attribute Importance functions.

RALG_WEIGHT_FUNCTION

R_WEIGHT_FUNCTION_SCRIPT_NAME

Specifies the name of an existing registered R script for R algorithm that computes the weight (contribution) for each attribute in scoring. The script returns a data.frame containing the contributing weight for each attribute in a row. This function setting is needed for PREDICTION_DETAILS SQL function.

RALG_DETAILS_FUNCTION R_DETAILS_FUNCTION_SCRIPT_NAME Specifies the name of an existing registered R script for R algorithm that produces the model information. This setting is required to generate a model view.
RALG_DETAILS_FORMAT SELECT type_value column_name,FROM DUAL Specifies the SELECT query for the list of numeric and string scalars for the output column type and the column name of the generated model view. This setting is required to generate a model view.

Example 7-9 Using the ore.odmRAlg Function

library(OREembed)

digits <- getOption("digits")
options(digits = 5L)

IRIS <- ore.push(iris)

# Regression with glm
ore.scriptCreate("glm_build", 
                 function(data, form, family) 
                 glm(formula = form, data = data, family = family)) 

ore.scriptCreate("glm_score", 
                  function(mod, data) 
                    { res <- predict(mod, newdata = data); 
                      data.frame(res) })

ore.scriptCreate("glm_detail", function(mod) 
                 data.frame(name=names(mod$coefficients), 
                              coef=mod$coefficients))

ore.scriptList(name = "glm_build")
ore.scriptList(name = "glm_score")
ore.scriptList(name = "glm_detail")

ralg.glm <- ore.odmRAlg(IRIS, mining.function = "regression",
                        formula = c(form="Sepal.Length ~ ."),
                        build.function = "glm_build", 
                        build.parameter = list(family="gaussian"),
                        score.function = "glm_score",
                        detail.function = "glm_detail", 
                        detail.value = data.frame(name="a", coef=1))
summary(ralg.glm)
predict(ralg.glm, newdata = head(IRIS), supplemental.cols = "Sepal.Length")

ore.scriptDrop(name = "glm_build")
ore.scriptDrop(name = "glm_score")
ore.scriptDrop(name = "glm_detail")

# Classification with nnet
ore.scriptCreate("nnet_build", 
                 function(dat, form, sz){
                   require(nnet); 
                   set.seed(1234);
                   nnet(formula = formula(form), data = dat, 
	                       size = sz, linout = TRUE, trace = FALSE); 
                  }, 
                  overwrite = TRUE)

ore.scriptCreate("nnet_detail", function(mod)
                 data.frame(conn = mod$conn, wts = mod$wts), 
                 overwrite = TRUE)

ore.scriptCreate("nnet_score", 
                 function(mod, data) {
                   require(nnet); 
                   res <- data.frame(predict(mod, newdata = data)); 
                   names(res) <- sort(mod$lev); res
                 })

ralg.nnet <- ore.odmRAlg(IRIS, mining.function = "classification",
                         formula = c(form="Species ~ ."),
                         build.function = "nnet_build", 
                         build.parameter = list(sz=2),
                         score.function = "nnet_score",
                         detail.function = "nnet_detail",
                         detail.value = data.frame(conn=1, wts =1))

summary(ralg.nnet)
predict(ralg.nnet, newdata = head(IRIS), supplemental.cols = "Species")

ore.scriptDrop(name = "nnet_build")
ore.scriptDrop(name = "nnet_score")
ore.scriptDrop(name = "nnet_detail")

# Feature extraction with pca
# Feature extraction with pca
ore.scriptCreate("pca_build", 
                 function(dat){
                   mod <- prcomp(dat, retx = FALSE)
                   attr(mod, "dm$nfeat") <- ncol(mod$rotation)
                   mod}, 
                 overwrite = TRUE)

ore.scriptCreate("pca_score", 
                 function(mod, data) {
                   res <- predict(mod, data)
                   as.data.frame(res)}, 
                 overwrite=TRUE)

ore.scriptCreate("pca_detail", 
                 function(mod) {
                   rotation_t <- t(mod$rotation)
                   data.frame(id = seq_along(rownames(rotation_t)), 
                                             rotation_t)},
	               overwrite = TRUE)

X <- IRIS[, -5L]
ralg.pca <- ore.odmRAlg(X, 
                        mining.function = "feature_extraction",
                        formula = NULL,
                        build.function = "pca_build",
                        score.function = "pca_score",
                        detail.function = "pca_detail",
                        detail.value = data.frame(Feature.ID=1, 
                                                  ore.pull(head(X,1L))))

summary(ralg.pca)
head(cbind(X, Pred = predict(ralg.pca, newdata = X)))

ore.scriptDrop(name = "pca_build")
ore.scriptDrop(name = "pca_score")
ore.scriptDrop(name = "pca_detail")

options(digits = digits)

Listing for This Example

R> library(OREembed)
R> 
R> digits <- getOption("digits")
R> options(digits = 5L)
R> 
R> IRIS <- ore.push(iris)
R> 
R> # Regression with glm
R> ore.scriptCreate("glm_build", 
+                   function(data, form, family) 
+                   glm(formula = form, data = data, family = family))
R> 
R> ore.scriptCreate("glm_score", 
+                    function(mod, data)
+                      { res <- predict(mod, newdata = data); 
+                        data.frame(res) })
R> 
R> ore.scriptCreate("glm_detail", function(mod) 
+                   data.frame(name=names(mod$coefficients), 
+                                     coef=mod$coefficients))
R>
R> ore.scriptList(name = "glm_build")
       NAME                                                                            SCRIPT
1 glm_build function (data, form, family) \nglm(formula = form, data = data, family = family)

R> ore.scriptList(name = "glm_score")
       NAME                                                                                    SCRIPT
1 glm_score function (mod, data) \n{\n    res <- predict(mod, newdata = data)\n    data.frame(res)\n}
R> ore.scriptList(name = "glm_detail")
        NAME                                                                               SCRIPT
1 glm_detail function (mod) \ndata.frame(name = names(mod$coefficients), coef = mod$coefficients)
R> 
R> ralg.glm <- ore.odmRAlg(IRIS, mining.function = "regression",
+                         formula = c(form="Sepal.Length ~ ."),
+                         build.function = "glm_build", 
+                         build.parameter = list(family="gaussian"),
+                         score.function = "glm_score",
+                         detail.function = "glm_detail", 
+                         detail.value = data.frame(name="a", coef=1))
R> 
R> summary(ralg.glm)

Call:
ore.odmRAlg(data = IRIS, mining.function = "regression", formula = c(form = "Sepal.Length ~ ."), 
    build.function = "glm_build", build.parameter = list(family = "gaussian"), 
    score.function = "glm_score", detail.function = "glm_detail", 
    detail.value = data.frame(name = "a", coef = 1))

Settings: 
                                                                                       value
odms.missing.value.treatment                                         odms.missing.value.auto
odms.sampling                                                          odms.sampling.disable
prep.auto                                                                                OFF
build.function                                                            OML_USER.glm_build
build.parameter              select 'Sepal.Length ~ .' "form", 'gaussian' "family" from dual
details.format                 select cast('a' as varchar2(4000)) "name", 1 "coef" from dual
details.function                                                         OML_USER.glm_detail
score.function                                                            OML_USER.glm_score

               name     coef
1       (Intercept)  2.17127
2      Petal.Length  0.82924
3       Petal.Width -0.31516
4       Sepal.Width  0.49589
5 Speciesversicolor -0.72356
6  Speciesvirginica -1.02350
R> predict(ralg.glm, newdata = head(IRIS), supplemental.cols = "Sepal.Length")
  Sepal.Length PREDICTION
1          5.1     5.0048
2          4.9     4.7568
3          4.7     4.7731
4          4.6     4.8894
5          5.0     5.0544
6          5.4     5.3889
R> 
R> ore.scriptDrop(name = "glm_build")
R> ore.scriptDrop(name = "glm_score")
R> ore.scriptDrop(name = "glm_detail")
R> 
R> # Classification with nnet
R> ore.scriptCreate("nnet_build", 
+                   function(dat, form, sz){
+                     require(nnet); 
+                     set.seed(1234);
+                     nnet(formula = formula(form), data = dat, 
+                            size = sz, linout = TRUE, trace = FALSE); 
+                   }, 
+                   overwrite = TRUE)
R> 
R> ore.scriptCreate("nnet_detail", function(mod)
+                   data.frame(conn = mod$conn, wts = mod$wts), 
+                   overwrite = TRUE)
R> 
R> ore.scriptCreate("nnet_score", 
+                   function(mod, data) {
+                     require(nnet); 
+                     res <- data.frame(predict(mod, newdata = data)); 
+                     names(res) <- sort(mod$lev); res
+                   })
R> 
R> ralg.nnet <- ore.odmRAlg(IRIS, mining.function = "classification",
+                           formula = c(form="Species ~ ."),
+                           build.function = "nnet_build", 
+                           build.parameter = list(sz=2),
+                           score.function = "nnet_score",
+                           detail.function = "nnet_detail",
+                           detail.value = data.frame(conn=1, wts =1))
R> 
R> summary(ralg.nnet)

Call:
ore.odmRAlg(data = IRIS, mining.function = "classification", 
    formula = c(form = "Species ~ ."), build.function = "nnet_build", 
    build.parameter = list(sz = 2), score.function = "nnet_score", 
    detail.function = "nnet_detail", detail.value = data.frame(conn = 1, 
        wts = 1))

Settings: 
                                                                     value
clas.weights.balanced                                                  OFF
odms.missing.value.treatment                       odms.missing.value.auto
odms.sampling                                        odms.sampling.disable
prep.auto                                                              OFF
build.function                                         OML_USER.nnet_build
build.parameter              select 'Species ~ .' "form", 2 "sz" from dual
details.format                          select 1 "conn", 1 "wts" from dual
details.function                                      OML_USER.nnet_detail
score.function                                         OML_USER.nnet_score

   conn       wts
1     0   1.46775
2     1 -12.88542
3     2  -4.38886
4     3   9.98648
5     4  16.57056
6     0   0.97809
7     1  -0.51626
8     2  -0.94815
9     3   0.13692
10    4   0.35104
11    0  37.22475
12    5 -66.49123
13    6  70.81160
14    0  -4.50893
15    5   7.01611
16    6  20.88774
17    0 -32.15127
18    5  58.92088
19    6 -91.96989
R> predict(ralg.nnet, newdata = head(IRIS), supplemental.cols = "Species")
  Species PREDICTION PROBABILITY
1  setosa     setosa     0.99999
2  setosa     setosa     0.99998
3  setosa     setosa     0.99999
4  setosa     setosa     0.99998
5  setosa     setosa     1.00000
6  setosa     setosa     0.99999
R> 
R> ore.scriptDrop(name = "nnet_build")
R> ore.scriptDrop(name = "nnet_score")
R> ore.scriptDrop(name = "nnet_detail")
R> 
R> ore.scriptCreate("pca_build", 
+                   function(dat){
+                     mod <- prcomp(dat, retx = FALSE)
+                     attr(mod, "dm$nfeat") <- ncol(mod$rotation)
+                     mod}, 
+                   overwrite = TRUE)
R> 
R> ore.scriptCreate("pca_score", 
+                   function(mod, data) {
+                     res <- predict(mod, data)
+                     as.data.frame(res)}, 
+                   overwrite=TRUE)
R> 
R> ore.scriptCreate("pca_detail", 
+                   function(mod) {
+                     rotation_t <- t(mod$rotation)
+                     data.frame(id = seq_along(rownames(rotation_t)), 
+                                               rotation_t)},
+                   overwrite = TRUE)
R> 
R> X <- IRIS[, -5L]
R> ralg.pca <- ore.odmRAlg(X, 
+                         mining.function = "feature_extraction",
+                         formula = NULL,
+                         build.function = "pca_build",
+                         score.function = "pca_score",
+                         detail.function = "pca_detail",
+                         detail.value = data.frame(Feature.ID=1, 
+                                                   ore.pull(head(X,1L))))
R> 
R> summary(ralg.pca)

Call:
ore.odmRAlg(data = X, mining.function = "feature_extraction", 
    formula = NULL, build.function = "pca_build", score.function = "pca_score", 
    detail.function = "pca_detail", detail.value = data.frame(Feature.ID = 1, 
        ore.pull(head(X, 1L))))

Settings: 
                                                                  value
odms.missing.value.treatment                    odms.missing.value.auto
odms.sampling                                     odms.sampling.disable
prep.auto                                                           OFF
build.function                                       OML_USER.pca_build
details.format    select 1 "Feature.ID", 5.1 "Sepal.Length", 3.5 "Sepal.Width", 1.4 "Petal.Length", 0.2 "Petal.Width" from dual
details.function                                    OML_USER.pca_detail
score.function                                       OML_USER.pca_score

  Feature.ID Sepal.Length Sepal.Width Petal.Length Petal.Width
1          1     0.856671    0.358289      0.36139   -0.084523
2          2    -0.173373   -0.075481      0.65659    0.730161
3          3     0.076236    0.545831     -0.58203    0.597911
4          4     0.479839   -0.753657     -0.31549    0.319723
R> head(cbind(X, Pred = predict(ralg.pca, newdata = X)))
  Sepal.Length Sepal.Width Petal.Length Petal.Width FEATURE_ID
1          5.1         3.5          1.4         0.2          2
2          4.9         3.0          1.4         0.2          4
3          4.7         3.2          1.3         0.2          3
4          4.6         3.1          1.5         0.2          4
5          5.0         3.6          1.4         0.2          2
6          5.4         3.9          1.7         0.4          2
R> 
R> ore.scriptDrop(name = "pca_build")
R> ore.scriptDrop(name = "pca_score")
R> ore.scriptDrop(name = "pca_detail")
R> 
R> options(digits = digits)