Missing Values
GLM automatically replaces missing values.
When building or applying a model, Oracle Machine Learning automatically replaces missing values of numerical attributes with the mean and missing values of categorical attributes with the mode.
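To make mean/mode replacement concrete, the query below sketches roughly what the substitution amounts to. It is an illustration only: the algorithm performs the replacement internally, using statistics derived from the training data, and you do not need to write such a query yourself. The table CUSTOMERS and the columns CUST_ID, AGE, and OCCUPATION are hypothetical names introduced for this example.

```sql
-- Illustration only: CUSTOMERS, CUST_ID, AGE, and OCCUPATION are hypothetical.
-- A missing numerical value is filled with the column mean (AVG ignores NULLs);
-- a missing categorical value is filled with the most frequent value (the mode).
SELECT cust_id,
       NVL(age, (SELECT AVG(age) FROM customers)) AS age_filled,
       NVL(occupation,
           (SELECT STATS_MODE(occupation) FROM customers)) AS occupation_filled
FROM   customers;
```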
You can configure the Generalized Linear Model algorithm to override the default treatment of missing values. With the ODMS_MISSING_VALUE_TREATMENT setting, you can cause the algorithm to delete rows in the training data that have missing values instead of replacing them with the mean or the mode. However, when the model is applied, Oracle Machine Learning for SQL performs the usual mean/mode missing value replacement. As a result, the statistics generated from scoring might not match the statistics generated from building the model.
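As a minimal sketch of how the setting might be supplied, the block below stores ODMS_MISSING_VALUE_TREATMENT with the value ODMS_MISSING_VALUE_DELETE_ROW in a settings table and passes that table to DBMS_DATA_MINING.CREATE_MODEL. The settings table GLM_SETTINGS, the model name, the training table TRAINING_DATA, its columns CUST_ID and TARGET_VALUE, and the choice of regression as the mining function are all assumptions made for illustration.

```sql
-- Hypothetical settings table; the name GLM_SETTINGS is an assumption.
CREATE TABLE glm_settings (
  setting_name  VARCHAR2(30),
  setting_value VARCHAR2(4000)
);

BEGIN
  -- Select the Generalized Linear Model algorithm.
  INSERT INTO glm_settings (setting_name, setting_value)
  VALUES (dbms_data_mining.algo_name,
          dbms_data_mining.algo_generalized_linear_model);

  -- Delete training rows with missing values instead of using mean/mode.
  INSERT INTO glm_settings (setting_name, setting_value)
  VALUES (dbms_data_mining.odms_missing_value_treatment,
          dbms_data_mining.odms_missing_value_delete_row);
  COMMIT;

  -- Model, table, and column names below are hypothetical.
  DBMS_DATA_MINING.CREATE_MODEL(
    model_name          => 'GLM_DELETE_ROWS_MODEL',
    mining_function     => dbms_data_mining.regression,
    data_table_name     => 'TRAINING_DATA',
    case_id_column_name => 'CUST_ID',
    target_column_name  => 'TARGET_VALUE',
    settings_table_name => 'GLM_SETTINGS');
END;
/
```

The setting affects only the build. Scoring data passed to the resulting model still receives mean/mode replacement unless rows with NULLs are filtered out beforehand.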
If you want to delete rows with missing values when scoring the model, you must perform the transformation explicitly. To make build and apply statistics match, you must remove the rows with NULLs from the scoring data before performing the apply operation. You can do this by creating a view.
CREATE VIEW viewname AS
SELECT * FROM tablename
WHERE column_name1 IS NOT NULL
  AND column_name2 IS NOT NULL
  AND column_name3 IS NOT NULL .....
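A filled-in version of this pattern, followed by an apply operation against the view, might look like the sketch below. The view SCORING_DATA_COMPLETE, the table SCORING_DATA, its columns, and the model GLM_DELETE_ROWS_MODEL are hypothetical names. The sketch assumes the model uses exactly the three attributes listed; in practice the WHERE clause needs one condition for every attribute the model uses.

```sql
-- Hypothetical names throughout: SCORING_DATA, its columns, and the model.
-- Keep only the rows in which every model attribute is present.
CREATE VIEW scoring_data_complete AS
SELECT *
FROM   scoring_data
WHERE  age        IS NOT NULL
  AND  occupation IS NOT NULL
  AND  income     IS NOT NULL;

-- Apply the model to the filtered view rather than to the base table,
-- so that no mean/mode replacement takes place during scoring.
SELECT cust_id,
       PREDICTION(glm_delete_rows_model USING *) AS predicted_value
FROM   scoring_data_complete;
```

Because the view is evaluated at query time, rows added to the base table later are filtered the same way without any further maintenance.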
Note:
In Oracle Machine Learning for SQL, missing values in nested data indicate sparsity, not values missing at random.
The value ODMS_MISSING_VALUE_DELETE_ROW is valid only for tables without nested columns. If this value is used with nested data, an exception is raised.