The ore.odmGLM function builds Generalized Linear Models (GLM), which include and extend the class of linear models (linear regression). Generalized linear models relax the restrictions on linear models, which are often violated in practice. For example, binary (yes/no or 0/1) responses do not have the same variance across classes.
The Oracle Data Mining GLM is a parametric modeling technique. Parametric models make assumptions about the distribution of the data. When the assumptions are met, parametric models can be more efficient than non-parametric models.
The challenge in developing models of this type involves assessing the extent to which the assumptions are met. For this reason, quality diagnostics are key to developing quality parametric models.
In addition to the classical weighted least squares estimation for linear regression and iteratively re-weighted least squares estimation for logistic regression, both solved through Cholesky decomposition and matrix inversion, Oracle Data Mining GLM provides a conjugate gradient-based optimization algorithm that does not require matrix inversion and is very well suited to high-dimensional data. The choice of algorithm is handled internally and is transparent to the user.
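As a rough illustration of the iteratively re-weighted least squares idea described above, the following base R sketch fits a logistic regression by repeatedly solving a weighted least squares problem through a Cholesky factorization. It is not the Oracle Data Mining implementation. The function name irls_logistic and its arguments are hypothetical, and the sketch works on a local data.frame rather than an ore.frame.

```r
# Minimal IRLS sketch for binary logistic regression (base R only).
# Illustrative only; this is NOT the Oracle Data Mining algorithm.
irls_logistic <- function(X, y, max_iter = 25, tol = 1e-8) {
  beta <- rep(0, ncol(X))
  for (iter in seq_len(max_iter)) {
    eta <- as.vector(X %*% beta)       # linear predictor
    mu  <- 1 / (1 + exp(-eta))         # inverse of the logit link
    w   <- mu * (1 - mu)               # binomial variance function
    z   <- eta + (y - mu) / w          # working response
    XtWX <- crossprod(X, w * X)        # X' W X
    XtWz <- crossprod(X, w * z)        # X' W z
    R <- chol(XtWX)                    # XtWX = R' R, R upper triangular
    # Solve R' R beta = X' W z with two triangular solves
    beta_new <- as.vector(backsolve(R, forwardsolve(t(R), XtWz)))
    converged <- max(abs(beta_new - beta)) < tol
    beta <- beta_new
    if (converged) break
  }
  names(beta) <- colnames(X)
  beta
}

# Local comparison against the base R glm function on the infert data set
X <- model.matrix(~ spontaneous + induced, data = infert)
y <- infert$case
beta_irls <- irls_logistic(X, y)
beta_glm  <- coef(glm(case ~ spontaneous + induced, data = infert,
                      family = binomial()))
print(cbind(irls = beta_irls, glm = beta_glm))
```

The two triangular solves stand in for an explicit matrix inverse. A conjugate gradient solver, as mentioned above for high-dimensional data, would avoid forming the factorization as well.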
GLM can be used to build classification or regression models as follows:
Classification: Binary logistic regression is the GLM classification algorithm. The algorithm uses the logit link function and the binomial variance function.
Regression: Linear regression is the GLM regression algorithm. The algorithm assumes no target transformation and constant variance over the range of target values.
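The link and variance functions named above can be inspected with the base R family objects. This sketch only illustrates the definitions and assumes nothing about how ore.odmGLM evaluates them internally. All variable names are illustrative.

```r
# Classification: logit link and binomial variance function
logit_family <- binomial(link = "logit")
mu <- c(0.1, 0.25, 0.5, 0.9)
logit_link <- logit_family$linkfun(mu)    # log(mu / (1 - mu))
binom_var  <- logit_family$variance(mu)   # mu * (1 - mu)

# Regression: identity link (no target transformation), constant variance
linear_family <- gaussian(link = "identity")
target <- c(1, 10, 100)
identity_link <- linear_family$linkfun(target)   # returns target unchanged
const_var     <- linear_family$variance(target)  # the same value for every target

print(rbind(mu = mu, link = logit_link, variance = binom_var))
print(rbind(target = target, link = identity_link, variance = const_var))
```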
The ore.odmGLM function allows you to build two different types of models. Some arguments apply to classification models only and some to regression models only.
For information on the ore.odmGLM function arguments, invoke help(ore.odmGLM).
The following examples build several models using GLM. The input ore.frame objects are R data sets pushed to the database.
Example 4-15 Building a Linear Regression Model
This example builds a linear regression model using the longley data set.
longley_of <- ore.push(longley)
longfit1 <- ore.odmGLM(Employed ~ ., data = longley_of)
summary(longfit1)
Listing for Example 4-15
R> longley_of <- ore.push(longley)
R> longfit1 <- ore.odmGLM(Employed ~ ., data = longley_of)
R> summary(longfit1)
Call:
ore.odmGLM(formula = Employed ~ ., data = longley_of)
Residuals:
     Min       1Q   Median       3Q      Max
-0.41011 -0.15767 -0.02816  0.10155  0.45539
Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)  -3.482e+03  8.904e+02  -3.911 0.003560 **
GNP.deflator  1.506e-02  8.492e-02   0.177 0.863141
GNP          -3.582e-02  3.349e-02  -1.070 0.312681
Unemployed   -2.020e-02  4.884e-03  -4.136 0.002535 **
Armed.Forces -1.033e-02  2.143e-03  -4.822 0.000944 ***
Population   -5.110e-02  2.261e-01  -0.226 0.826212
Year          1.829e+00  4.555e-01   4.016 0.003037 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.3049 on 9 degrees of freedom
Multiple R-squared: 0.9955, Adjusted R-squared: 0.9925
F-statistic: 330.3 on 6 and 9 DF, p-value: 4.984e-10
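To sanity-check the listing without a database connection, the same formula can be fit locally with the base R lm function on the original longley data.frame. This is only a comparison sketch. It assumes that the in-database fit and an ordinary least squares fit should agree closely on such a small data set, and local_fit is an illustrative name.

```r
# Local ordinary least squares fit on the same data and formula
local_fit <- lm(Employed ~ ., data = longley)

# Coefficients and fit statistics to compare with the listing above
local_coef <- coef(local_fit)
local_r2   <- summary(local_fit)$r.squared

print(signif(local_coef, 4))
print(round(local_r2, 4))
```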
Example 4-16 Using Ridge Estimation for the Coefficients of the ore.odmGLM Model
This example uses the longley_of ore.frame from Example 4-15. It invokes the ore.odmGLM function with ridge = TRUE to request ridge estimation of the coefficients and with ridge.vif = TRUE to include the VIF statistics shown in the listing.
longfit2 <- ore.odmGLM(Employed ~ ., data = longley_of, ridge = TRUE,
ridge.vif = TRUE)
summary(longfit2)
Listing for Example 4-16
R> longfit2 <- ore.odmGLM(Employed ~ ., data = longley_of, ridge = TRUE,
+ ridge.vif = TRUE)
R> summary(longfit2)
Call:
ore.odmGLM(formula = Employed ~ ., data = longley_of, ridge = TRUE,
ridge.vif = TRUE)
Residuals:
    Min      1Q  Median      3Q     Max
-0.4100 -0.1579 -0.0271  0.1017  0.4575
Coefficients:
               Estimate   VIF
(Intercept)  -3.466e+03 0.000
GNP.deflator  1.479e-02 0.077
GNP          -3.535e-02 0.012
Unemployed   -2.013e-02 0.000
Armed.Forces -1.031e-02 0.000
Population   -5.262e-02 0.548
Year          1.821e+00 2.212
Residual standard error: 0.3049 on 9 degrees of freedom
Multiple R-squared: 0.9955, Adjusted R-squared: 0.9925
F-statistic: 330.2 on 6 and 9 DF, p-value: 4.986e-10
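For background on why ridge estimation is a natural choice for this data set, the classical variance inflation factors of the longley predictors can be computed locally in base R. This sketch uses the textbook definition (the diagonal of the inverse predictor correlation matrix). It is an assumption that this is a useful point of comparison, and these values are not the same quantity as the VIF column that ore.odmGLM reports under ridge estimation.

```r
# Classical VIFs for the longley predictors (local data.frame, base R only)
X <- as.matrix(longley[, setdiff(names(longley), "Employed")])

# VIF_j = 1 / (1 - R_j^2), which equals the j-th diagonal element
# of the inverse of the predictor correlation matrix
vif <- diag(solve(cor(X)))
names(vif) <- colnames(X)

print(round(vif, 1))
```

Large values indicate strongly collinear predictors, which is the situation ridge estimation is meant to stabilize.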
Example 4-17 Building a Logistic Regression GLM
This example builds a logistic regression (classification) model. It uses the infert data set. The example invokes the ore.odmGLM function and specifies logistic as the type argument, which builds a binomial GLM.
infert_of <- ore.push(infert)
infit1 <- ore.odmGLM(case ~ age+parity+education+spontaneous+induced,
data = infert_of, type = "logistic")
infit1
Listing for Example 4-17
R> infert_of <- ore.push(infert)
R> infit1 <- ore.odmGLM(case ~ age+parity+education+spontaneous+induced,
+ data = infert_of, type = "logistic")
R> infit1
Response:
case == "1"
Call: ore.odmGLM(formula = case ~ age + parity + education + spontaneous +
induced, data = infert_of, type = "logistic")
Coefficients:
     (Intercept)              age           parity  education0-5yrs education12+ yrs      spontaneous          induced
        -2.19348          0.03958         -0.82828          1.04424         -0.35896          2.04590          1.28876
Degrees of Freedom: 247 Total (i.e. Null); 241 Residual
Null Deviance: 316.2
Residual Deviance: 257.8 AIC: 271.8
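The listing reports coefficients for education0-5yrs and education12+ yrs, which suggests that 6-11yrs serves as the baseline level in the in-database model. The base R glm function uses the first factor level (0-5yrs) as its baseline by default. The following local sketch assumes that releveling the factor is enough to make the two parameterizations comparable. The names infert_local and local_logit are illustrative.

```r
# Local copy of infert with 6-11yrs as the education baseline (an assumption
# made to match the coefficient names in the listing above)
infert_local <- infert
infert_local$education <- relevel(infert_local$education, ref = "6-11yrs")

# Binomial GLM with the default logit link
local_logit <- glm(case ~ age + parity + education + spontaneous + induced,
                   data = infert_local, family = binomial())

print(round(coef(local_logit), 5))
print(round(deviance(local_logit), 1))
```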
Example 4-18 Specifying a Reference Value in Building a Logistic Regression GLM
This example builds a logistic regression (classification) model and specifies a reference value. The example uses the infert_of ore.frame from Example 4-17.
infit2 <- ore.odmGLM(case ~ age+parity+education+spontaneous+induced,
data = infert_of, type = "logistic", reference = 1)
infit2
Listing for Example 4-18
R> infit2 <- ore.odmGLM(case ~ age+parity+education+spontaneous+induced,
+ data = infert_of, type = "logistic", reference = 1)
R> infit2
Response:
case == "0"
Call: ore.odmGLM(formula = case ~ age + parity + education + spontaneous +
induced, data = infert_of, type = "logistic", reference = 1)
Coefficients:
     (Intercept)              age           parity  education0-5yrs education12+ yrs      spontaneous          induced
         2.19348         -0.03958          0.82828         -1.04424          0.35896         -2.04590         -1.28876
Degrees of Freedom: 247 Total (i.e. Null); 241 Residual
Null Deviance: 316.2
Residual Deviance: 257.8 AIC: 271.8
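Comparing the listings for Example 4-17 and Example 4-18, the response changes from case == "1" to case == "0" and every coefficient changes sign, while the deviances stay the same. The following base R sketch reproduces that behavior locally by modeling the complementary outcome. It only illustrates the sign symmetry of logistic regression and does not use the reference argument of ore.odmGLM. The names fit_case1 and fit_case0 are illustrative.

```r
# Model the probability that case equals 1
fit_case1 <- glm(case ~ age + parity + education + spontaneous + induced,
                 data = infert, family = binomial())

# Model the probability that case equals 0 (the complementary outcome)
fit_case0 <- glm(I(1 - case) ~ age + parity + education + spontaneous + induced,
                 data = infert, family = binomial())

# The coefficients are negatives of each other; the deviance is unchanged
print(round(cbind(case1 = coef(fit_case1), case0 = coef(fit_case0)), 5))
print(c(deviance(fit_case1), deviance(fit_case0)))
```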