Missing Values

GLM automatically replaces missing values.

When building or applying a model, Oracle Machine Learning automatically replaces missing values of numerical attributes with the mean and missing values of categorical attributes with the mode.

You can configure the Generalized Linear Model algorithm to override the default treatment of missing values. With the ODMS_MISSING_VALUE_TREATMENT setting, you can cause the algorithm to delete rows in the training data that have missing values instead of replacing them with the mean or the mode. However, when the model is applied, Oracle Machine Learning for SQL performs the usual mean/mode missing value replacement. As a result, it is possible that the statistics generated from scoring does not match the statistics generated from building the model.

If you want to delete rows with missing values in the scoring the model, you must perform the transformation explicitly. To make build and apply statistics match, you must remove the rows with NULLs from the scoring data before performing the apply operation. You can do this by creating a view.

CREATE VIEW viewname AS SELECT * from tablename 
     WHERE column_name1 is NOT NULL 
     AND   column_name2 is NOT NULL 
     AND   column_name3 is NOT NULL .....

Note:

In Oracle Machine Learning for SQL, missing values in nested data indicate sparsity, not values missing at random.

The value ODMS_MISSING_VALUE_DELETE_ROW is only valid for tables without nested columns. If this value is used with nested data, an exception is raised.