9.15 Neural Network
The oml.nn class creates a Neural Network (NN) model for classification and regression.
Neural Network models can be used to capture intricate nonlinear relationships between inputs and outputs or to find patterns in data.
The oml.nn class methods build a feed-forward neural network for classification or regression on oml.DataFrame data. The network supports multiple hidden layers, each with a configurable number of nodes. Each layer can have one of several activation functions.
The output layer is a single numeric or binary categorical target. The output layer can have any of the activation functions. It has the linear activation function by default.
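The architecture described above can be illustrated with a small, library-free forward pass. This is a minimal sketch and not the oml.nn implementation: the helper names (dense, forward, random_layer), the 4-3-1 topology, and the random weights are all hypothetical, and the log-sigmoid activation is assumed to be the logistic function 1/(1+exp(-z)).

```python
import math
import random


def log_sigmoid(z):
    """Logistic sigmoid activation (assumed meaning of log-sigmoid)."""
    return 1.0 / (1.0 + math.exp(-z))


def linear(z):
    """Linear (identity) activation, as used by default on the output layer."""
    return z


def dense(inputs, weights, biases, activation):
    """Apply one fully connected layer: activation(W . inputs + b)."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(inputs, layers):
    """Propagate inputs through a list of (weights, biases, activation) layers."""
    values = inputs
    for weights, biases, activation in layers:
        values = dense(values, weights, biases, activation)
    return values


def random_layer(n_in, n_out, activation, rng):
    """Build a layer with small random weights (illustrative values only)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.uniform(-0.5, 0.5) for _ in range(n_out)]
    return weights, biases, activation


rng = random.Random(0)
# Hypothetical topology: 4 inputs -> 3 hidden nodes (log-sigmoid) -> 1 linear output.
network = [
    random_layer(4, 3, log_sigmoid, rng),
    random_layer(3, 1, linear, rng),
]
output = forward([5.1, 3.5, 1.4, 0.2], network)
print(output)  # a one-element list holding the single numeric output
```

Each hidden layer is just another entry in the list, so adding layers or changing a layer's activation function does not change the forward routine.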
Modeling with the oml.nn class is well-suited for noisy and complex data such as sensor data. Problems involving such data typically have the following characteristics:
- Potentially many (numeric) predictors, for example, pixel values
- The target may be discrete-valued, real-valued, or a vector of such values
- Training data may contain errors, so the model must be robust to noise
- Fast scoring is required
- Model transparency is not required; the models are difficult to interpret
Typical steps in Neural Network modeling are the following:
- Specifying the architecture
- Preparing the data
- Building the model
- Specifying the stopping criteria: iterations, error on a validation set within tolerance
- Viewing statistical results from the model
- Improving the model
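The stopping criteria named in the steps above can be sketched as a single check that runs after every training iteration. This is a generic illustration, not the algorithm that oml.nn uses: the function name stop_reason is hypothetical, the tolerance test (change in validation error between consecutive iterations) is an assumed interpretation, and the default values simply mirror the NNET_ITERATIONS, NNET_TOLERANCE, and NNET_HELDASIDE_MAX_FAIL values shown in the listing for the example in this section.

```python
def stop_reason(val_errors, max_iterations=200, tolerance=1e-6, max_fail=6):
    """Return why training should stop, or None to keep going.

    val_errors holds the validation-set error recorded after each iteration.
    """
    n = len(val_errors)
    if n == 0:
        return None
    # Criterion 1: the iteration budget is used up.
    if n >= max_iterations:
        return "max_iterations"
    # Criterion 2 (assumed form): the error changed by less than the tolerance.
    if n >= 2 and abs(val_errors[-2] - val_errors[-1]) < tolerance:
        return "tolerance"
    # Criterion 3: no improvement on the best error for max_fail iterations in a row.
    best_index = val_errors.index(min(val_errors))
    if (n - 1) - best_index >= max_fail:
        return "max_fail"
    return None


# Hypothetical validation errors: the best value is reached at the third iteration.
history = [0.90, 0.50, 0.40, 0.41, 0.42, 0.43]
for i in range(1, len(history) + 1):
    reason = stop_reason(history[:i], max_fail=3)
    if reason:
        print(i, reason)  # 6 max_fail
        break
```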
For information on the oml.nn class attributes and methods, invoke help(oml.nn) or see Oracle Machine Learning for Python API Reference.
Settings for a Neural Network Model
The following table lists settings for NN models.
Table 9-13 Neural Network Models Settings
Setting Name | Setting Value | Description |
---|---|---|
CLAS_COST_TABLE_NAME | table_name | The name of a user-created table that stores a cost matrix for the algorithm to use in scoring the model. The cost matrix specifies the costs associated with misclassifications. The table must meet specific column requirements. |
CLAS_WEIGHTS_BALANCED | … | Indicates whether the algorithm must create a model that balances the target distribution. This setting is most relevant in the presence of rare targets, as balancing the distribution may enable better average accuracy (average of per-class accuracy) instead of overall accuracy (which favors the dominant class). The default value is OFF. |
NNET_ACTIVATIONS | A list of activation function strings, such as 'NNET_ACTIVATIONS_LOG_SIG' | Defines the activation function for the hidden layers. Different layers can have different activation functions. The default value is …. The number of activation functions must be consistent with …. Note: All quotes are single, and two single quotes are used to escape a single quote in SQL statements. |
NNET_HELDASIDE_MAX_FAIL | A positive integer | With …. The default value is 6. |
NNET_HELDASIDE_RATIO | … | Defines the held ratio for the held-aside method. The default value is 0.25. |
NNET_HIDDEN_LAYERS | A non-negative integer | Defines the topology by number of hidden layers. The default value is …. |
NNET_ITERATIONS | A positive integer | Specifies the maximum number of iterations in the Neural Network algorithm. The default value is 200. |
NNET_NODES_PER_LAYER | A list of positive integers | Defines the topology by number of nodes per layer. Different layers can have a different number of nodes. The value should be a comma-separated list of non-negative integers, for example, '10, 20, 5'. The setting values must be consistent with …. |
… | … | Defines the L2 regularization parameter lambda. This cannot be set together with …. The default value is …. |
NNET_REGULARIZER | … | Regularization setting for the Neural Network algorithm. If the total number of training rows is greater than 50000, then the default is …. |
… | … | Specifies the method of optimization. The default value is …. |
NNET_TOLERANCE | … | Defines the convergence tolerance setting of the Neural Network algorithm. The default value is 0.000001. |
NNET_WEIGHT_LOWER_BOUND | … | Specifies the lower bound of the region where weights are randomly initialized. NNET_WEIGHT_LOWER_BOUND and NNET_WEIGHT_UPPER_BOUND must be set together. Setting one and not setting the other raises an error. NNET_WEIGHT_LOWER_BOUND must not be greater than NNET_WEIGHT_UPPER_BOUND. The default value is -sqrt(6/(l_nodes+r_nodes)). The value of l_nodes for: …. The value of …. |
NNET_WEIGHT_UPPER_BOUND | … | Specifies the upper bound of the region where weights are initialized. It should be set in pairs with NNET_WEIGHT_LOWER_BOUND. The default value is …. |
ODMS_RANDOM_SEED | A non-negative integer | Controls the random number seed used by the hash function to generate a random number with uniform distribution. The default value is 0. |
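The topology settings have to agree with each other, and the activation list has to be written as single-quoted names inside one string. The following is a minimal sketch of helpers that prepare such values before they are passed to oml.nn. The helper names are hypothetical, the rule of one activation function per hidden layer is an assumption about what "consistent" means, the multi-layer activation format is assumed to extend the single-layer form used in this section's example, and the l_nodes and r_nodes arguments are placeholder numbers because their definitions are not given here.

```python
import math


def format_activations(names):
    """Join activation names into one string of single-quoted names."""
    return ", ".join("'{}'".format(name) for name in names)


def build_topology_settings(nodes_per_layer, activations):
    """Return a settings dict after checking that the topology is consistent."""
    # Assumption: one activation function is expected per hidden layer.
    if len(nodes_per_layer) != len(activations):
        raise ValueError("need one activation function per hidden layer")
    if any(n <= 0 for n in nodes_per_layer):
        raise ValueError("node counts must be positive integers")
    return {
        "NNET_HIDDEN_LAYERS": len(nodes_per_layer),
        "NNET_NODES_PER_LAYER": ", ".join(str(n) for n in nodes_per_layer),
        "NNET_ACTIVATIONS": format_activations(activations),
    }


def default_lower_bound(l_nodes, r_nodes):
    """Default lower weight bound from the table: -sqrt(6/(l_nodes+r_nodes))."""
    return -math.sqrt(6.0 / (l_nodes + r_nodes))


settings = build_topology_settings(
    [10, 20, 5],
    ["NNET_ACTIVATIONS_LOG_SIG"] * 3,
)
print(settings["NNET_NODES_PER_LAYER"])  # 10, 20, 5
print(settings["NNET_ACTIVATIONS"])

# Placeholder node counts, chosen only to show the formula.
print(default_lower_bound(5, 30))
```

A dictionary built this way could then be unpacked into the constructor, for example oml.nn(**settings), in the same way that the example in this section passes settings as keyword arguments.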
Example 9-15 Building a Neural Network Model
This example creates an NN model and uses some of the methods of the oml.nn class.
import oml
import pandas as pd
from sklearn import datasets

# Load the iris data set and create a pandas.DataFrame for it.
iris = datasets.load_iris()
x = pd.DataFrame(iris.data,
                 columns = ['Sepal_Length','Sepal_Width',
                            'Petal_Length','Petal_Width'])
y = pd.DataFrame(list(map(lambda x:
                           {0: 'setosa', 1: 'versicolor',
                            2:'virginica'}[x], iris.target)),
                 columns = ['Species'])

# Drop the IRIS table if it already exists.
try:
    oml.drop('IRIS')
except:
    pass

# Create the IRIS database table and the proxy object for the table.
oml_iris = oml.create(pd.concat([x, y], axis=1), table = 'IRIS')

# Create training and test data.
dat = oml.sync(table = 'IRIS').split()
train_x = dat[0].drop('Species')
train_y = dat[0]['Species']
test_dat = dat[1]

# Create a Neural Network model object.
nn_mod = oml.nn(nnet_hidden_layers = 1,
                nnet_activations= "'NNET_ACTIVATIONS_LOG_SIG'",
                NNET_NODES_PER_LAYER= '30')

# Fit the NN model according to the training data and parameter
# settings.
nn_mod = nn_mod.fit(train_x, train_y)

# Show details of the model.
nn_mod

# Use the model to make predictions on test data.
nn_mod.predict(test_dat.drop('Species'),
               supplemental_cols = test_dat[:, ['Sepal_Length', 'Sepal_Width',
                                                'Petal_Length', 'Species']])

nn_mod.predict(test_dat.drop('Species'),
               supplemental_cols = test_dat[:, ['Sepal_Length', 'Sepal_Width',
                                                'Species']], proba = True)

nn_mod.predict_proba(test_dat.drop('Species'),
                     supplemental_cols = test_dat[:, ['Sepal_Length',
                         'Species']]).sort_values(by = ['Sepal_Length', 'Species',
                         'PROBABILITY_OF_setosa', 'PROBABILITY_OF_versicolor'])

nn_mod.score(test_dat.drop('Species'), test_dat[:, ['Species']])

# Change the setting parameter and refit the model.
new_setting = {'NNET_NODES_PER_LAYER': '50'}
nn_mod.set_params(**new_setting).fit(train_x, train_y)
Listing for This Example
>>> import oml
>>> import pandas as pd
>>> from sklearn import datasets
>>>
>>> # Load the iris data set and create a pandas.DataFrame for it.
... iris = datasets.load_iris()
>>> x = pd.DataFrame(iris.data,
... columns = ['Sepal_Length','Sepal_Width',
... 'Petal_Length','Petal_Width'])
>>> y = pd.DataFrame(list(map(lambda x:
... {0: 'setosa', 1: 'versicolor',
... 2:'virginica'}[x], iris.target)),
... columns = ['Species'])
>>>
>>> try:
...     oml.drop('IRIS')
... except:
...     pass
>>>
>>> # Create the IRIS database table and the proxy object for the table.
... oml_iris = oml.create(pd.concat([x, y], axis=1), table = 'IRIS')
>>>
>>> # Create training and test data.
... dat = oml.sync(table = 'IRIS').split()
>>> train_x = dat[0].drop('Species')
>>> train_y = dat[0]['Species']
>>> test_dat = dat[1]
>>>
>>> # Create a Neural Network model object.
... nn_mod = oml.nn(nnet_hidden_layers = 1,
... nnet_activations= "'NNET_ACTIVATIONS_LOG_SIG'",
... NNET_NODES_PER_LAYER= '30')
>>>
>>> # Fit the NN model according to the training data and parameter
... # settings.
... nn_mod = nn_mod.fit(train_x, train_y)
>>>
>>> # Show details of the model.
... nn_mod
Algorithm Name: Neural Network
Mining Function: CLASSIFICATION
Target: Species
Settings:
setting name setting value
0 ALGO_NAME ALGO_NEURAL_NETWORK
1 CLAS_WEIGHTS_BALANCED OFF
2 LBFGS_GRADIENT_TOLERANCE .000000001
3 LBFGS_HISTORY_DEPTH 20
4 LBFGS_SCALE_HESSIAN LBFGS_SCALE_HESSIAN_ENABLE
5 NNET_ACTIVATIONS 'NNET_ACTIVATIONS_LOG_SIG'
6 NNET_HELDASIDE_MAX_FAIL 6
7 NNET_HELDASIDE_RATIO .25
8 NNET_HIDDEN_LAYERS 1
9 NNET_ITERATIONS 200
10 NNET_NODES_PER_LAYER 30
11 NNET_TOLERANCE .000001
12 ODMS_DETAILS ODMS_ENABLE
13 ODMS_MISSING_VALUE_TREATMENT ODMS_MISSING_VALUE_AUTO
14 ODMS_RANDOM_SEED 0
15 ODMS_SAMPLING ODMS_SAMPLING_DISABLE
16 PREP_AUTO ON
Computed Settings:
setting name setting value
0 NNET_REGULARIZER NNET_REGULARIZER_NONE
Global Statistics:
attribute name attribute value
0 CONVERGED YES
1 ITERATIONS 60.0
2 LOSS_VALUE 0.0
3 NUM_ROWS 102.0
Attributes:
Sepal_Length
Sepal_Width
Petal_Length
Petal_Width
Partition: NO
Topology:
HIDDEN_LAYER_ID NUM_NODE ACTIVATION_FUNCTION
0 0 30 NNET_ACTIVATIONS_LOG_SIG
Weights:
LAYER IDX_FROM IDX_TO ATTRIBUTE_NAME ATTRIBUTE_SUBNAME ATTRIBUTE_VALUE \
0 0 0.0 0 Petal_Length None None
1 0 0.0 1 Petal_Length None None
2 0 0.0 2 Petal_Length None None
3 0 0.0 3 Petal_Length None None
... ... ... ... ... ... ...
239 1 29.0 2 None None None
240 1 NaN 0 None None None
241 1 NaN 1 None None None
242 1 NaN 2 None None None
TARGET_VALUE WEIGHT
0 None -39.836487
1 None 32.604824
2 None 0.953903
3 None 0.714064
... ... ...
239 virginica -22.650606
240 setosa 2.402457
241 versicolor 7.647615
242 virginica -9.493982
[243 rows x 8 columns]
>>> # Use the model to make predictions on test data.
... nn_mod.predict(test_dat.drop('Species'),
... supplemental_cols = test_dat[:, ['Sepal_Length', 'Sepal_Width',
... 'Petal_Length', 'Species']])
Sepal_Length Sepal_Width Petal_Length Species PREDICTION
0 4.9 3.0 1.4 setosa setosa
1 4.9 3.1 1.5 setosa setosa
2 4.8 3.4 1.6 setosa setosa
3 5.8 4.0 1.2 setosa setosa
... ... ... ... ... ...
44 6.7 3.3 5.7 virginica virginica
45 6.7 3.0 5.2 virginica virginica
46 6.5 3.0 5.2 virginica virginica
47 5.9 3.0 5.1 virginica virginica
>>> nn_mod.predict(test_dat.drop('Species'),
... supplemental_cols = test_dat[:, ['Sepal_Length', 'Sepal_Width',
... 'Species']], proba = True)
Sepal_Length Sepal_Width Species PREDICTION PROBABILITY
0 4.9 3.0 setosa setosa 1.000000
1 4.9 3.1 setosa setosa 1.000000
2 4.8 3.4 setosa setosa 1.000000
3 5.8 4.0 setosa setosa 1.000000
... ... ... ... ... ...
44 6.7 3.3 virginica virginica 1.000000
45 6.7 3.0 virginica virginica 1.000000
46 6.5 3.0 virginica virginica 1.000000
47 5.9 3.0 virginica virginica 1.000000
>>> nn_mod.predict_proba(test_dat.drop('Species'),
... supplemental_cols = test_dat[:, ['Sepal_Length',
... 'Species']]).sort_values(by = ['Sepal_Length', 'Species',
... 'PROBABILITY_OF_setosa', 'PROBABILITY_OF_versicolor'])
Sepal_Length Species PROBABILITY_OF_SETOSA \
0 4.4 setosa 1.000000e+00
1 4.4 setosa 1.000000e+00
2 4.5 setosa 1.000000e+00
3 4.8 setosa 1.000000e+00
... ... ... ...
44 6.7 virginica 4.567318e-218
45 6.9 versicolor 3.028266e-177
46 6.9 virginica 1.203417e-215
47 7.0 versicolor 3.382837e-148
PROBABILITY_OF_VERSICOLOR PROBABILITY_OF_VIRGINICA
0 3.491272e-67 3.459448e-283
1 8.038930e-58 2.883999e-288
2 5.273544e-64 2.243282e-293
3 1.332150e-78 2.040723e-283
... ... ...
44 1.328042e-36 1.000000e+00
45 1.000000e+00 5.063405e-55
46 4.000953e-31 1.000000e+00
47 1.000000e+00 2.593761e-121
>>> nn_mod.score(test_dat.drop('Species'), test_dat[:, ['Species']])
0.9375
>>> # Change the setting parameter and refit the model.
... new_setting = {'NNET_NODES_PER_LAYER': '50'}
>>> nn_mod.set_params(**new_setting).fit(train_x, train_y)
Algorithm Name: Neural Network
Mining Function: CLASSIFICATION
Target: Species
Settings:
setting name setting value
0 ALGO_NAME ALGO_NEURAL_NETWORK
1 CLAS_WEIGHTS_BALANCED OFF
2 LBFGS_GRADIENT_TOLERANCE .000000001
3 LBFGS_HISTORY_DEPTH 20
4 LBFGS_SCALE_HESSIAN LBFGS_SCALE_HESSIAN_ENABLE
5 NNET_ACTIVATIONS 'NNET_ACTIVATIONS_LOG_SIG'
6 NNET_HELDASIDE_MAX_FAIL 6
7 NNET_HELDASIDE_RATIO .25
8 NNET_HIDDEN_LAYERS 1
9 NNET_ITERATIONS 200
10 NNET_NODES_PER_LAYER 50
11 NNET_TOLERANCE .000001
12 ODMS_DETAILS ODMS_ENABLE
13 ODMS_MISSING_VALUE_TREATMENT ODMS_MISSING_VALUE_AUTO
14 ODMS_RANDOM_SEED 0
15 ODMS_SAMPLING ODMS_SAMPLING_DISABLE
16 PREP_AUTO ON
Computed Settings:
setting name setting value
0 NNET_REGULARIZER NNET_REGULARIZER_NONE
Global Statistics:
attribute name attribute value
0 CONVERGED YES
1 ITERATIONS 68.0
2 LOSS_VALUE 0.0
3 NUM_ROWS 102.0
Attributes:
Sepal_Length
Sepal_Width
Petal_Length
Petal_Width
Partition: NO
Topology:
HIDDEN_LAYER_ID NUM_NODE ACTIVATION_FUNCTION
0 0 50 NNET_ACTIVATIONS_LOG_SIG
Weights:
LAYER IDX_FROM IDX_TO ATTRIBUTE_NAME ATTRIBUTE_SUBNAME ATTRIBUTE_VALUE \
0 0 0.0 0 Petal_Length None None
1 0 0.0 1 Petal_Length None None
2 0 0.0 2 Petal_Length None None
3 0 0.0 3 Petal_Length None None
... ... ... ... ... ... ...
399 1 49.0 2 None None None
400 1 NaN 0 None None None
401 1 NaN 1 None None None
402 1 NaN 2 None None None
TARGET_VALUE WEIGHT
0 None 10.606389
1 None -37.256485
2 None -14.263772
3 None -17.945173
... ... ...
399 virginica -22.179815
400 setosa -6.452953
401 versicolor 13.186332
402 virginica -6.973605
[403 rows x 8 columns]
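The row counts of the two Weights tables and the score in the listing can be checked with simple arithmetic. The sketch below assumes a fully connected network with one bias weight per hidden and output node (the rows whose IDX_FROM is NaN are read here as bias terms) and assumes that score returns the fraction of correctly classified test rows. The function name count_weights is hypothetical.

```python
def count_weights(n_inputs, hidden_nodes, n_outputs):
    """Count weights in a fully connected network, including one bias per node."""
    sizes = [n_inputs] + list(hidden_nodes) + [n_outputs]
    # Each node in a layer has one weight per node in the previous layer plus a bias.
    return sum((fan_in + 1) * fan_out for fan_in, fan_out in zip(sizes, sizes[1:]))


# Four predictors, one hidden layer, three target classes.
print(count_weights(4, [30], 3))  # 243, matching "[243 rows x 8 columns]"
print(count_weights(4, [50], 3))  # 403, matching "[403 rows x 8 columns]"

# The test set has 48 rows (indexes 0 to 47), so a score of 0.9375
# is consistent with 45 correct predictions.
print(45 / 48)  # 0.9375
```

Under these assumptions, raising NNET_NODES_PER_LAYER from 30 to 50 adds (4 + 1) * 20 + 20 * 3 = 160 weights, which is the difference between the two tables.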