9.15 Neural Network

The oml.nn class creates a Neural Network (NN) model for classification and regression.

Neural Network models can be used to capture intricate nonlinear relationships between inputs and outputs or to find patterns in data.

The oml.nn class methods build a feed-forward neural network for classification and regression on oml.DataFrame data. The network supports multiple hidden layers, each with a configurable number of nodes. Each layer can have one of several activation functions.

The output layer is a single numeric or binary categorical target. The output layer can have any of the activation functions. It has the linear activation function by default.
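As a rough illustration of what a feed-forward pass with per-layer activation functions looks like, the following standalone sketch evaluates a tiny network in plain Python. It does not use oml. The activation formulas are the usual textbook definitions for the function names listed under the NNET_ACTIVATIONS setting and are assumed, not confirmed, to match the in-database implementation. The toy weights and the helper names (dense_layer, forward) are hypothetical.

```python
import math

# Textbook definitions of the activation families named by NNET_ACTIVATIONS.
# Assumption: the in-database implementation may differ in details.
ACTIVATIONS = {
    "NNET_ACTIVATIONS_ARCTAN": math.atan,
    "NNET_ACTIVATIONS_BIPOLAR_SIG": lambda x: 2.0 / (1.0 + math.exp(-x)) - 1.0,
    "NNET_ACTIVATIONS_LINEAR": lambda x: x,
    "NNET_ACTIVATIONS_LOG_SIG": lambda x: 1.0 / (1.0 + math.exp(-x)),
    "NNET_ACTIVATIONS_TANH": math.tanh,
}


def dense_layer(inputs, weights, biases, activation):
    """Apply one fully connected layer: activation(w . x + b) for each node."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(inputs, layers):
    """Feed inputs through a list of (weights, biases, activation_name) layers."""
    values = inputs
    for weights, biases, name in layers:
        values = dense_layer(values, weights, biases, ACTIVATIONS[name])
    return values


# Hypothetical toy network: 2 inputs -> 2 hidden nodes (log-sigmoid)
# -> 1 output node with the linear activation.
layers = [
    ([[0.5, -0.5], [1.0, 1.0]], [0.0, -1.0], "NNET_ACTIVATIONS_LOG_SIG"),
    ([[2.0, -2.0]], [0.5], "NNET_ACTIVATIONS_LINEAR"),
]
output = forward([1.0, 1.0], layers)
print(output)  # a single value, roughly 0.0379
```

The linear output activation leaves the weighted sum unchanged, which is why it is a natural default for a numeric target.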

Modeling with the oml.nn class is well-suited for noisy and complex data such as sensor data. Problems involving such data typically have the following characteristics:

Potentially many (numeric) predictors, for example, pixel values
The target may be discrete-valued, real-valued, or a vector of such values
Training data may contain errors; the algorithm is robust to noise
Fast scoring
Model transparency is not required; the models are difficult to interpret

Typical steps in Neural Network modeling are the following:

Specifying the architecture
Preparing the data
Building the model
Specifying the stopping criteria: iterations, error on a validation set within tolerance
Viewing statistical results from the model
Improving the model
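The stopping criteria named in the steps above can be sketched in plain Python. This is an illustration only, not the algorithm's actual logic: it assumes that convergence means the training loss changed by less than a tolerance between iterations, and that held-aside stopping counts consecutive iterations without improvement on the validation loss. The function name stopping_point and the loss sequences are hypothetical. The default values mirror the documented defaults of NNET_ITERATIONS, NNET_TOLERANCE, and NNET_HELDASIDE_MAX_FAIL.

```python
def stopping_point(train_losses, heldaside_losses, max_iterations=200,
                   tolerance=1e-6, max_fail=6):
    """Return (iterations_run, reason) for recorded per-iteration losses.

    Assumed criteria (a sketch, not the in-database implementation):
      - "tolerance": training loss changed by less than `tolerance`
      - "heldaside": validation loss failed to improve `max_fail` times in a row
      - "max_iterations": the iteration budget (or the recorded data) ran out
    """
    best = float("inf")
    fails = 0
    steps = min(max_iterations, len(train_losses))
    for i in range(steps):
        if i > 0 and abs(train_losses[i - 1] - train_losses[i]) < tolerance:
            return i + 1, "tolerance"
        if heldaside_losses[i] < best:
            best, fails = heldaside_losses[i], 0
        else:
            fails += 1
            if fails >= max_fail:
                return i + 1, "heldaside"
    return steps, "max_iterations"


# Hypothetical loss curves: training keeps improving, validation stalls at 0.8.
train = [1.0 - 0.05 * i for i in range(20)]
heldaside = [1.0, 0.9, 0.8] + [0.8] * 17
print(stopping_point(train, heldaside))  # (9, 'heldaside')
```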

For information on the oml.nn class attributes and methods, invoke help(oml.nn) or see Oracle Machine Learning for Python API Reference.

Settings for a Neural Network Model

The following table lists settings for NN models.

Table 9-13 Neural Network Models Settings

Setting Name	Setting Value	Description
`CLAS_COST_TABLE_NAME`	table_name	The name of a table that stores a cost matrix for the algorithm to use in scoring the model. The cost matrix specifies the costs associated with misclassifications. The cost matrix table is user-created. The table must have the following columns: ACTUAL_TARGET_VALUE (data type: valid target data type), PREDICTED_TARGET_VALUE (data type: valid target data type), and COST (data type: NUMBER).
`CLAS_WEIGHTS_BALANCED`	`ON` `OFF`	Indicates whether the algorithm must create a model that balances the target distribution. This setting is most relevant in the presence of rare targets, as balancing the distribution may enable better average accuracy (average of per-class accuracy) instead of overall accuracy (which favors the dominant class). The default value is `OFF`.
`NNET_ACTIVATIONS`	A list of the following strings: `''NNET_ACTIVATIONS_ARCTAN''`, `''NNET_ACTIVATIONS_BIPOLAR_SIG''`, `''NNET_ACTIVATIONS_LINEAR''`, `''NNET_ACTIVATIONS_LOG_SIG''`, `''NNET_ACTIVATIONS_TANH''`	Defines the activation function for the hidden layers. For example, `'''NNET_ACTIVATIONS_BIPOLAR_SIG'', ''NNET_ACTIVATIONS_TANH'''`. Different layers can have different activation functions. The default value is `''NNET_ACTIVATIONS_LOG_SIG''`. The number of activation functions must be consistent with `NNET_HIDDEN_LAYERS` and `NNET_NODES_PER_LAYER`. Note: All quotes are single quotes; two consecutive single quotes escape a single quote in SQL statements.
`NNET_HELDASIDE_MAX_FAIL`	A positive integer	With `NNET_REGULARIZER_HELDASIDE`, the training process is stopped early if the network performance on the validation data fails to improve or remains the same for `NNET_HELDASIDE_MAX_FAIL` epochs in a row. The default value is `6`.
`NNET_HELDASIDE_RATIO`	`0 <= numeric_expr <= 1`	Defines the ratio of data that is held aside for validation in the held-aside method. The default value is `0.25`.
`NNET_HIDDEN_LAYERS`	A non-negative integer	Defines the topology by number of hidden layers. The default value is `1`.
`NNET_ITERATIONS`	A positive integer	Specifies the maximum number of iterations in the Neural Network algorithm. The default value is `200`.
`NNET_NODES_PER_LAYER`	A list of positive integers	Defines the topology by number of nodes per layer. Different layers can have different numbers of nodes. The value should be a comma-separated list of positive integers. For example, '10, 20, 5'. The setting values must be consistent with `NNET_HIDDEN_LAYERS`. The default number of nodes per layer is the number of attributes, or `50` if the number of attributes is greater than `50`.
`NNET_REG_LAMBDA`	`TO_CHAR(numeric_expr >= 0)`	Defines the L2 regularization parameter lambda. This cannot be set together with `NNET_REGULARIZER_HELDASIDE`. The default value is `1`.
`NNET_REGULARIZER`	`NNET_REGULARIZER_HELDASIDE` `NNET_REGULARIZER_L2` `NNET_REGULARIZER_NONE`	Regularization setting for the Neural Network algorithm. If the total number of training rows is greater than 50000, then the default is `NNET_REGULARIZER_HELDASIDE`. If the total number of training rows is less than or equal to 50000, then the default is `NNET_REGULARIZER_NONE`.
`NNET_SOLVER`	`NNET_SOLVER_ADAM` `NNET_SOLVER_LBFGS`	Specifies the method of optimization. The default value is `NNET_SOLVER_LBFGS`.
`NNET_TOLERANCE`	`TO_CHAR(0 < numeric_expr < 1)`	Defines the convergence tolerance setting of the Neural Network algorithm. The default value is `0.000001`.
`NNET_WEIGHT_LOWER_BOUND`	A real number	Specifies the lower bound of the region where weights are randomly initialized. `NNET_WEIGHT_LOWER_BOUND` and `NNET_WEIGHT_UPPER_BOUND` must be set together. Setting one and not setting the other raises an error. `NNET_WEIGHT_LOWER_BOUND` must not be greater than `NNET_WEIGHT_UPPER_BOUND`. The default value is `-sqrt(6/(l_nodes+r_nodes))`. The value of `l_nodes` is: for input layer dense attributes, (`1+number of dense attributes`); for input layer sparse attributes, `number of sparse attributes`; for each hidden layer, (`1+number of nodes in that hidden layer`). The value of `r_nodes` is the number of nodes in the layer that the weight is connecting to.
`NNET_WEIGHT_UPPER_BOUND`	A real number	Specifies the upper bound of the region where weights are randomly initialized. It must be set together with `NNET_WEIGHT_LOWER_BOUND`, and its value must not be smaller than the value of `NNET_WEIGHT_LOWER_BOUND`. If the bounds are not specified, the values of `NNET_WEIGHT_LOWER_BOUND` and `NNET_WEIGHT_UPPER_BOUND` are system determined. The default value is `sqrt(6/(l_nodes+r_nodes))`. See `NNET_WEIGHT_LOWER_BOUND`.
`ODMS_RANDOM_SEED`	A non-negative integer	Controls the random number seed used by the hash function to generate a random number with uniform distribution. The default value is `0`.
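The default weight bounds in the table follow the formula sqrt(6/(l_nodes+r_nodes)). The sketch below evaluates that formula and draws initial weights in plain Python, without oml. It assumes that weights are drawn uniformly within the bounds, which the table does not state explicitly. The function names (default_weight_bounds, init_weights) and the layer sizes (4 dense attributes, 30 hidden nodes) are hypothetical.

```python
import math
import random


def default_weight_bounds(l_nodes, r_nodes):
    """Return (lower, upper) = (-b, b) with b = sqrt(6 / (l_nodes + r_nodes))."""
    bound = math.sqrt(6.0 / (l_nodes + r_nodes))
    return -bound, bound


def init_weights(l_nodes, r_nodes, lower=None, upper=None, seed=0):
    """Draw an r_nodes x l_nodes weight matrix within the bounds.

    Mirrors the documented rules: both bounds must be given together,
    and the lower bound must not exceed the upper bound.
    Assumption: sampling is uniform within [lower, upper].
    """
    if (lower is None) != (upper is None):
        raise ValueError("lower and upper bounds must be set together")
    if lower is None:
        lower, upper = default_weight_bounds(l_nodes, r_nodes)
    if lower > upper:
        raise ValueError("lower bound must not be greater than upper bound")
    rng = random.Random(seed)
    return [[rng.uniform(lower, upper) for _ in range(l_nodes)]
            for _ in range(r_nodes)]


# Input layer with 4 dense attributes: l_nodes = 1 + 4; r_nodes = 30 hidden nodes.
lower, upper = default_weight_bounds(1 + 4, 30)
print(round(upper, 4))  # 0.414
weights = init_weights(1 + 4, 30)
```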

Example 9-15 Building a Neural Network Model

This example creates an NN model and uses some of the methods of the oml.nn class.

import oml
import pandas as pd
from sklearn import datasets

# Load the iris data set and create a pandas.DataFrame for it.
iris = datasets.load_iris()
x = pd.DataFrame(iris.data,
                 columns = ['Sepal_Length','Sepal_Width',
                            'Petal_Length','Petal_Width'])
y = pd.DataFrame(list(map(lambda x:
                           {0: 'setosa', 1: 'versicolor',
                            2:'virginica'}[x], iris.target)),
                 columns = ['Species'])

try:
    oml.drop('IRIS')
except:
    pass

# Create the IRIS database table and the proxy object for the table.
oml_iris = oml.create(pd.concat([x, y], axis=1), table = 'IRIS')

# Create training and test data.
dat = oml.sync(table = 'IRIS').split()
train_x = dat[0].drop('Species')
train_y = dat[0]['Species']
test_dat = dat[1]

# Create a Neural Network model object.
nn_mod = oml.nn(nnet_hidden_layers = 1, 
                nnet_activations= "'NNET_ACTIVATIONS_LOG_SIG'", 
                NNET_NODES_PER_LAYER= '30')

# Fit the NN model according to the training data and parameter
# settings.
nn_mod = nn_mod.fit(train_x, train_y)

# Show details of the model.
nn_mod

# Use the model to make predictions on test data.
nn_mod.predict(test_dat.drop('Species'), 
    supplemental_cols = test_dat[:, ['Sepal_Length', 'Sepal_Width', 
                                     'Petal_Length', 'Species']])

nn_mod.predict(test_dat.drop('Species'), 
    supplemental_cols = test_dat[:, ['Sepal_Length', 'Sepal_Width', 
                                 'Species']], proba = True)

nn_mod.predict_proba(test_dat.drop('Species'), 
    supplemental_cols = test_dat[:, ['Sepal_Length', 
      'Species']]).sort_values(by = ['Sepal_Length', 'Species',
         'PROBABILITY_OF_setosa', 'PROBABILITY_OF_versicolor'])

nn_mod.score(test_dat.drop('Species'), test_dat[:, ['Species']])

# Change the setting parameter and refit the model.
new_setting = {'NNET_NODES_PER_LAYER': '50'}
nn_mod.set_params(**new_setting).fit(train_x, train_y)
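When a network has more than one hidden layer, the topology settings have to agree with each other. The following sketch builds a consistent set of setting values in plain Python. The helper nn_settings is hypothetical and not part of oml. It assumes that "consistent" means one node count and one activation function per hidden layer. It formats the values the way the settings table and the example above show them: a comma-separated list for the node counts, and single-quoted names for the activations.

```python
ALLOWED_ACTIVATIONS = {
    "NNET_ACTIVATIONS_ARCTAN",
    "NNET_ACTIVATIONS_BIPOLAR_SIG",
    "NNET_ACTIVATIONS_LINEAR",
    "NNET_ACTIVATIONS_LOG_SIG",
    "NNET_ACTIVATIONS_TANH",
}


def nn_settings(nodes_per_layer, activations):
    """Build mutually consistent topology settings (hypothetical helper).

    Assumption: one node count and one activation per hidden layer.
    """
    if len(nodes_per_layer) != len(activations):
        raise ValueError("need one activation function per hidden layer")
    for n in nodes_per_layer:
        if not isinstance(n, int) or n <= 0:
            raise ValueError("node counts must be positive integers")
    for name in activations:
        if name not in ALLOWED_ACTIVATIONS:
            raise ValueError("unknown activation: " + name)
    return {
        "NNET_HIDDEN_LAYERS": len(nodes_per_layer),
        "NNET_NODES_PER_LAYER": ", ".join(str(n) for n in nodes_per_layer),
        "NNET_ACTIVATIONS": ", ".join("'" + a + "'" for a in activations),
    }


settings = nn_settings(
    [10, 20, 5],
    ["NNET_ACTIVATIONS_BIPOLAR_SIG", "NNET_ACTIVATIONS_TANH",
     "NNET_ACTIVATIONS_LOG_SIG"],
)
print(settings["NNET_NODES_PER_LAYER"])  # 10, 20, 5
# With a database connection, such a dict could presumably be passed on as
# keyword arguments, for example oml.nn(**settings); that is not run here.
```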

Listing for This Example

>>> import oml
>>> import pandas as pd
>>> from sklearn import datasets
>>>
>>> # Load the iris data set and create a pandas.DataFrame for it.
... iris = datasets.load_iris()
>>> x = pd.DataFrame(iris.data, 
...                  columns = ['Sepal_Length','Sepal_Width',
...                             'Petal_Length','Petal_Width'])
>>> y = pd.DataFrame(list(map(lambda x: 
...                            {0: 'setosa', 1: 'versicolor', 
...                             2:'virginica'}[x], iris.target)), 
...                  columns = ['Species'])
>>>
>>> try:
...    oml.drop('IRIS')
... except:
...    pass
>>>
>>> # Create the IRIS database table and the proxy object for the table.
... oml_iris = oml.create(pd.concat([x, y], axis=1), table = 'IRIS')
>>>
>>> # Create training and test data.
... dat = oml.sync(table = 'IRIS').split()
>>> train_x = dat[0].drop('Species')
>>> train_y = dat[0]['Species']
>>> test_dat = dat[1]
>>> 
>>> # Create a Neural Network model object.
... nn_mod = oml.nn(nnet_hidden_layers = 1, 
...                 nnet_activations= "'NNET_ACTIVATIONS_LOG_SIG'", 
...                 NNET_NODES_PER_LAYER= '30')
>>> 
>>> # Fit the NN model according to the training data and parameter
... # settings.
... nn_mod = nn_mod.fit(train_x, train_y)
>>>
>>> # Show details of the model.
... nn_mod

Algorithm Name: Neural Network

Mining Function: CLASSIFICATION

Target: Species

Settings: 
                    setting name               setting value
0                      ALGO_NAME         ALGO_NEURAL_NETWORK
1          CLAS_WEIGHTS_BALANCED                         OFF
2       LBFGS_GRADIENT_TOLERANCE                  .000000001
3            LBFGS_HISTORY_DEPTH                          20
4            LBFGS_SCALE_HESSIAN  LBFGS_SCALE_HESSIAN_ENABLE
5               NNET_ACTIVATIONS  'NNET_ACTIVATIONS_LOG_SIG'
6        NNET_HELDASIDE_MAX_FAIL                           6
7           NNET_HELDASIDE_RATIO                         .25
8             NNET_HIDDEN_LAYERS                           1
9                NNET_ITERATIONS                         200
10          NNET_NODES_PER_LAYER                          30
11                NNET_TOLERANCE                     .000001
12                  ODMS_DETAILS                 ODMS_ENABLE
13  ODMS_MISSING_VALUE_TREATMENT     ODMS_MISSING_VALUE_AUTO
14              ODMS_RANDOM_SEED                           0
15                 ODMS_SAMPLING       ODMS_SAMPLING_DISABLE
16                     PREP_AUTO                          ON

Computed Settings: 
       setting name          setting value
0  NNET_REGULARIZER  NNET_REGULARIZER_NONE

Global Statistics: 
   attribute name  attribute value
0       CONVERGED              YES
1      ITERATIONS             60.0
2      LOSS_VALUE              0.0
3        NUM_ROWS            102.0


Attributes: 
Sepal_Length
Sepal_Width
Petal_Length
Petal_Width

Partition: NO

Topology: 

    HIDDEN_LAYER_ID  NUM_NODE       ACTIVATION_FUNCTION
 0                0        30  NNET_ACTIVATIONS_LOG_SIG

Weights: 

      LAYER  IDX_FROM  IDX_TO ATTRIBUTE_NAME ATTRIBUTE_SUBNAME ATTRIBUTE_VALUE  \
 0        0       0.0       0   Petal_Length              None            None   
 1        0       0.0       1   Petal_Length              None            None   
 2        0       0.0       2   Petal_Length              None            None   
 3        0       0.0       3   Petal_Length              None            None   
 ...    ...       ...     ...            ...               ...             ...   
 239      1      29.0       2           None              None            None   
 240      1       NaN       0           None              None            None   
 241      1       NaN       1           None              None            None   
 242      1       NaN       2           None              None            None   

     TARGET_VALUE     WEIGHT  
 0           None -39.836487  
 1           None  32.604824  
 2           None   0.953903  
 3           None   0.714064  
 ...          ...        ... 
 239    virginica -22.650606  
 240       setosa   2.402457  
 241   versicolor   7.647615  
 242    virginica  -9.493982  

[243 rows x 8 columns]



>>> # Use the model to make predictions on test data.
... nn_mod.predict(test_dat.drop('Species'), 
...     supplemental_cols = test_dat[:, ['Sepal_Length', 'Sepal_Width', 
...                                      'Petal_Length', 'Species']])
     Sepal_Length  Sepal_Width  Petal_Length     Species  PREDICTION
 0            4.9          3.0           1.4      setosa      setosa
 1            4.9          3.1           1.5      setosa      setosa
 2            4.8          3.4           1.6      setosa      setosa
 3            5.8          4.0           1.2      setosa      setosa
...           ...          ...           ...         ...         ...
 44           6.7          3.3           5.7   virginica   virginica
 45           6.7          3.0           5.2   virginica   virginica
 46           6.5          3.0           5.2   virginica   virginica
 47           5.9          3.0           5.1   virginica   virginica

>>> nn_mod.predict(test_dat.drop('Species'), 
...     supplemental_cols = test_dat[:, ['Sepal_Length', 'Sepal_Width', 
...                                  'Species']], proba = True)
     Sepal_Length  Sepal_Width     Species  PREDICTION  PROBABILITY
 0            4.9          3.0      setosa      setosa     1.000000
 1            4.9          3.1      setosa      setosa     1.000000
 2            4.8          3.4      setosa      setosa     1.000000
 3            5.8          4.0      setosa      setosa     1.000000
...           ...          ...         ...        ...          ...
 44           6.7          3.3   virginica   virginica     1.000000
 45           6.7          3.0   virginica   virginica     1.000000
 46           6.5          3.0   virginica   virginica     1.000000
 47           5.9          3.0   virginica   virginica     1.000000

>>> nn_mod.predict_proba(test_dat.drop('Species'), 
...     supplemental_cols = test_dat[:, ['Sepal_Length', 
...       'Species']]).sort_values(by = ['Sepal_Length', 'Species',
...         'PROBABILITY_OF_setosa', 'PROBABILITY_OF_versicolor'])
     Sepal_Length     Species  PROBABILITY_OF_SETOSA  \
 0            4.4      setosa           1.000000e+00   
 1            4.4      setosa           1.000000e+00   
 2            4.5      setosa           1.000000e+00   
 3            4.8      setosa           1.000000e+00   
...           ...         ...                    ...   
 44           6.7   virginica          4.567318e-218   
 45           6.9  versicolor          3.028266e-177   
 46           6.9   virginica          1.203417e-215   
 47           7.0  versicolor          3.382837e-148   

     PROBABILITY_OF_VERSICOLOR  PROBABILITY_OF_VIRGINICA  
 0                3.491272e-67             3.459448e-283  
 1                8.038930e-58             2.883999e-288  
 2                5.273544e-64             2.243282e-293  
 3                1.332150e-78             2.040723e-283  
...                        ...                       ... 
 44               1.328042e-36              1.000000e+00  
 45               1.000000e+00              5.063405e-55  
 46               4.000953e-31              1.000000e+00  
 47               1.000000e+00             2.593761e-121  

>>> nn_mod.score(test_dat.drop('Species'), test_dat[:, ['Species']])
 0.9375

>>> # Change the setting parameter and refit the model.
... new_setting = {'NNET_NODES_PER_LAYER': '50'}
>>> nn_mod.set_params(**new_setting).fit(train_x, train_y)

Algorithm Name: Neural Network

Mining Function: CLASSIFICATION

Target: Species

Settings: 
                    setting name               setting value
0                      ALGO_NAME         ALGO_NEURAL_NETWORK
1          CLAS_WEIGHTS_BALANCED                         OFF
2       LBFGS_GRADIENT_TOLERANCE                  .000000001
3            LBFGS_HISTORY_DEPTH                          20
4            LBFGS_SCALE_HESSIAN  LBFGS_SCALE_HESSIAN_ENABLE
5               NNET_ACTIVATIONS  'NNET_ACTIVATIONS_LOG_SIG'
6        NNET_HELDASIDE_MAX_FAIL                           6
7           NNET_HELDASIDE_RATIO                         .25
8             NNET_HIDDEN_LAYERS                           1
9                NNET_ITERATIONS                         200
10          NNET_NODES_PER_LAYER                          50
11                NNET_TOLERANCE                     .000001
12                  ODMS_DETAILS                 ODMS_ENABLE
13  ODMS_MISSING_VALUE_TREATMENT     ODMS_MISSING_VALUE_AUTO
14              ODMS_RANDOM_SEED                           0
15                 ODMS_SAMPLING       ODMS_SAMPLING_DISABLE
16                     PREP_AUTO                          ON

Computed Settings: 
       setting name          setting value
0  NNET_REGULARIZER  NNET_REGULARIZER_NONE

Global Statistics: 
   attribute name  attribute value
0       CONVERGED              YES
1      ITERATIONS             68.0
2      LOSS_VALUE              0.0
3        NUM_ROWS            102.0

Attributes: 
Sepal_Length
Sepal_Width
Petal_Length
Petal_Width

Partition: NO

Topology: 

    HIDDEN_LAYER_ID  NUM_NODE       ACTIVATION_FUNCTION
 0                0        50  NNET_ACTIVATIONS_LOG_SIG

Weights: 

      LAYER  IDX_FROM  IDX_TO ATTRIBUTE_NAME ATTRIBUTE_SUBNAME ATTRIBUTE_VALUE  \
 0        0       0.0       0   Petal_Length              None            None   
 1        0       0.0       1   Petal_Length              None            None   
 2        0       0.0       2   Petal_Length              None            None   
 3        0       0.0       3   Petal_Length              None            None   
 ...    ...       ...     ...            ...               ...             ...   
 399      1      49.0       2           None              None            None   
 400      1       NaN       0           None              None            None   
 401      1       NaN       1           None              None            None   
 402      1       NaN       2           None              None            None   

     TARGET_VALUE     WEIGHT  
 0           None  10.606389  
 1           None -37.256485  
 2           None -14.263772  
 3           None -17.945173  
 ...          ...        ...  
 399    virginica -22.179815  
 400       setosa  -6.452953  
 401   versicolor  13.186332  
 402    virginica  -6.973605  

[403 rows x 8 columns]
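The row counts of the two Weights tables in this listing (243 and 403) can be reproduced by counting the connections of a fully connected network. The sketch below assumes one bias weight per non-input node, which appears to correspond to the rows whose IDX_FROM is NaN, together with 4 numeric input attributes and 3 target classes. The function name weight_rows is hypothetical.

```python
def weight_rows(n_inputs, hidden_nodes, n_outputs):
    """Count weights in a fully connected network, one bias per non-input node."""
    sizes = [n_inputs] + list(hidden_nodes) + [n_outputs]
    # Each layer contributes (nodes_in + 1 bias) * nodes_out weights.
    return sum((left + 1) * right for left, right in zip(sizes, sizes[1:]))


print(weight_rows(4, [30], 3))  # 243: (4 + 1) * 30 + (30 + 1) * 3
print(weight_rows(4, [50], 3))  # 403: (4 + 1) * 50 + (50 + 1) * 3
```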
