RALG_SCORE_FUNCTION

Use the RALG_SCORE_FUNCTION setting to specify an existing registered R script that an R algorithm machine learning model uses to score data.

The specified R script defines an R function. The first input argument is the model object. The second input argument is the R data.frame that contains the data to score.

Example 6-7 Example of RALG_SCORE_FUNCTION

This example shows how the R function takes a linear model (LM) and scores the data in the data.frame. The function argument object is the LM model. The argument newdata is a data.frame containing the data to score.

function(object, newdata) {res <- predict.lm(object, newdata = newdata, se.fit = TRUE); data.frame(fit=res$fit, se=res$se.fit, df=summary(object)$df[1L])}
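Before registering such a function, it can help to try it in a plain R session and inspect the shape of its output. The following is a minimal sketch, assuming a model fitted on the built-in cars dataset; the names score_lm, fit, and scoring_data are illustrative and do not come from the original.

```r
# Illustrative sketch: `score_lm`, `fit`, and `scoring_data` are assumed names,
# and the built-in `cars` dataset stands in for real training data.
score_lm <- function(object, newdata) {
  res <- predict.lm(object, newdata = newdata, se.fit = TRUE)
  data.frame(fit = res$fit, se = res$se.fit, df = summary(object)$df[1L])
}

# Fit a simple linear model locally.
fit <- lm(dist ~ speed, data = cars)

# Score three new rows; the result has one row per input row.
scoring_data <- data.frame(speed = c(10, 15, 20))
out <- score_lm(fit, scoring_data)
print(out)
```

The result has one row for each row of scoring_data and the three columns fit, se, and df.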

The output of the R function must be a data.frame. Each row represents the prediction for the corresponding scoring data from the input data.frame. The columns of the data.frame are specific to machine learning functions, such as:

Regression: A single numeric column for the predicted target value, with two optional columns containing the standard error of the model fit and the number of degrees of freedom. The optional columns are required for the SQL function PREDICTION_BOUNDS to work.

Example 6-8 Example of RALG_SCORE_FUNCTION for Regression

This example shows how to specify the name of the R script MY_LM_PREDICT_SCRIPT that is used to score the model in the model settings table model_setting_table.

Begin
insert into model_setting_table values
(dbms_data_mining.ralg_score_function, 'MY_LM_PREDICT_SCRIPT');
End;
/

In the Oracle Machine Learning for R script repository, the script MY_LM_PREDICT_SCRIPT is registered as:

function(object, newdata) {data.frame(pre = predict(object, newdata = newdata))}

Classification: Each column represents the predicted probability of one target class. The column name is the target class name.

Example 6-9 Example of RALG_SCORE_FUNCTION for Classification

This example shows how to specify the name of the R script MY_LOGITGLM_PREDICT_SCRIPT that is used to score the logit Classification model in the model settings table model_setting_table.

Begin
insert into model_setting_table values
(dbms_data_mining.ralg_score_function, 'MY_LOGITGLM_PREDICT_SCRIPT');
End;
/

In the Oracle Machine Learning for R script repository, MY_LOGITGLM_PREDICT_SCRIPT is registered as follows. It is a logit Classification with two target classes, "0" and "1".

'function(object, newdata) {
   pred <- predict(object, newdata = newdata, type="response");
   res <- data.frame(1-pred, pred);
   names(res) <- c("0", "1");
   res}'
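The probability layout can be checked locally in plain R. The following is a minimal sketch, assuming a binomial glm fitted on the built-in mtcars dataset with the 0/1 column am as the target; score_logit, fit, and scoring_data are illustrative names that do not come from the original.

```r
# Illustrative sketch: `score_logit`, `fit`, and `scoring_data` are assumed names,
# and `mtcars` with target `am` (0/1) stands in for real training data.
score_logit <- function(object, newdata) {
  pred <- predict(object, newdata = newdata, type = "response")
  res <- data.frame(1 - pred, pred)
  names(res) <- c("0", "1")   # column names are the target class names
  res
}

# Fit a logit model locally.
fit <- glm(am ~ wt, data = mtcars, family = binomial())

# Score the first five rows; each row holds P(class "0") and P(class "1").
scoring_data <- mtcars[1:5, "wt", drop = FALSE]
out <- score_logit(fit, scoring_data)
print(out)
```

Each row of the output sums to 1, because the two columns are complementary probabilities.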

Clustering: Each column represents the predicted probability of one cluster. The columns are arranged in order of cluster ID. Cluster IDs are consecutive integers starting from 1. To support CLUSTER_DISTANCE in the R model, the output of the R score function must include additional columns after the probability columns, one per cluster in order of cluster ID, each containing the distance to that cluster.

Example 6-10 Example of RALG_SCORE_FUNCTION for Clustering

This example shows how to specify the name of the R script MY_CLUSTER_PREDICT_SCRIPT that is used to score the model in the model settings table model_setting_table.

Begin
insert into model_setting_table values
(dbms_data_mining.ralg_score_function, 'MY_CLUSTER_PREDICT_SCRIPT');
End;
/

In the Oracle Machine Learning for R script repository, the script MY_CLUSTER_PREDICT_SCRIPT is registered as:

'function(object, dat){
     mod <- object[[1L]]; ce <- object[[2L]]; sc <- object[[3L]];
     newdata <- scale(dat, center = ce, scale = sc);
     centers <- mod$centers;
     ss <- sapply(as.data.frame(t(centers)),
     function(v) rowSums(scale(newdata, center=v, scale=FALSE)^2));
     if (!is.matrix(ss)) ss <- matrix(ss, ncol=length(ss));
     disp <- -1 / (2* mod$tot.withinss/length(mod$cluster));
     distr <- exp(disp*ss);
     prob <- distr / rowSums(distr);
     as.data.frame(cbind(prob, sqrt(ss)))}'

The function fetches the centers of the clusters from the R model, and computes the probabilities for belonging to each cluster based on the distance of the score data to the corresponding cluster center.
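The script expects the model object to be a list of three elements: the clustering model, the centering vector, and the scaling vector. The following is a minimal sketch of how such an object could be built and scored locally, assuming a kmeans model on the built-in iris measurements; score_cluster, sx, mod, and object are illustrative names, and the choice of kmeans is an assumption that the original does not state.

```r
# Illustrative sketch: the names and the use of kmeans on `iris` are assumptions.
score_cluster <- function(object, dat) {
  mod <- object[[1L]]; ce <- object[[2L]]; sc <- object[[3L]]
  newdata <- scale(dat, center = ce, scale = sc)
  centers <- mod$centers
  # Squared distance of every row to every cluster center.
  ss <- sapply(as.data.frame(t(centers)),
               function(v) rowSums(scale(newdata, center = v, scale = FALSE)^2))
  if (!is.matrix(ss)) ss <- matrix(ss, ncol = length(ss))
  disp <- -1 / (2 * mod$tot.withinss / length(mod$cluster))
  distr <- exp(disp * ss)
  prob <- distr / rowSums(distr)
  as.data.frame(cbind(prob, sqrt(ss)))
}

# Standardize the training data and keep the centering and scaling vectors.
sx <- scale(iris[, 1:4])
set.seed(1)
mod <- kmeans(sx, centers = 3)
object <- list(mod, attr(sx, "scaled:center"), attr(sx, "scaled:scale"))

# Score five rows: 3 probability columns followed by 3 distance columns.
out <- score_cluster(object, iris[1:5, 1:4])
print(out)
```

With three clusters, the output has six columns: the first three are probabilities that sum to 1 in each row, and the last three are the distances to the cluster centers.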

Feature Extraction: Each column represents the coefficient value of one feature. The columns are arranged in order of feature ID. Feature IDs are consecutive integers starting from 1.

Example 6-11 Example of RALG_SCORE_FUNCTION for Feature Extraction

This example shows how to specify the name of the R script MY_FEATURE_EXTRACTION_SCRIPT that is used to score the model in the model settings table model_setting_table.

Begin
insert into model_setting_table values
(dbms_data_mining.ralg_score_function, 'MY_FEATURE_EXTRACTION_SCRIPT');
End;
/

In the Oracle Machine Learning for R script repository, the script MY_FEATURE_EXTRACTION_SCRIPT is registered as:

 'function(object, dat) { as.data.frame(predict(object, dat)) }'

The function calls the predict method of the R model on the score data and returns the result as a data.frame, with one column of coefficient values per feature.
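The original does not say which R model this script is paired with. As a minimal sketch, assuming a principal component analysis fitted with prcomp on the built-in iris measurements, the function can be tried locally as follows; score_features and mod are illustrative names.

```r
# Illustrative sketch: the use of prcomp on `iris` and all names are assumptions.
score_features <- function(object, dat) {
  as.data.frame(predict(object, dat))
}

# Fit a PCA model locally on the four numeric columns.
mod <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)

# Score five rows; each column holds the coefficient for one feature.
out <- score_features(mod, iris[1:5, 1:4])
print(out)
```

For this assumed model the output has one column per principal component, named PC1 through PC4, in feature order.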

Related Topics

Registered R Scripts