21 Neural Network

Learn about the Neural Network algorithm for the Regression and Classification mining functions.

21.1 About Neural Network

The Neural Network algorithm in Oracle Data Mining supports the Classification and Regression mining functions.

In machine learning, an artificial neural network is an algorithm inspired by biological neural networks and is used to estimate or approximate functions that depend on a large number of generally unknown inputs. An artificial neural network is composed of a large number of interconnected neurons which exchange messages with each other to solve specific problems. The network learns from examples, tuning the weights of the connections among the neurons during the learning process. Neural networks can solve a wide variety of tasks, such as computer vision, speech recognition, and various complex business problems.

21.1.1 Neuron and activation function

Neurons are the building blocks of a Neural Network.

A neuron takes one or more weighted inputs and produces an output that depends on those inputs. The output is computed by forming the weighted sum of the inputs and feeding that sum into the activation function.

The sigmoid function is the most common choice of activation function, but other non-linear functions, piecewise linear functions, and step functions are also used. The following are some examples of activation functions (see the formulas after this list):

  • Logistic Sigmoid function

  • Linear function

  • Tanh function

  • Arctan function

  • Bipolar sigmoid function
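
For reference, a neuron with inputs $x_1, \ldots, x_n$, weights $w_1, \ldots, w_n$, bias term $b$ (a standard component, shown here for completeness), and activation function $f$ produces the output

$$y = f\left(\sum_{i=1}^{n} w_i x_i + b\right)$$

For example, the logistic sigmoid and tanh activations are

$$\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad \tanh(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}$$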

21.1.2 Loss or Cost function

A loss function or cost function maps an event, or the values of one or more variables, onto a real number that intuitively represents some "cost" associated with the event.

An optimization problem seeks to minimize a loss function. The form of the loss function is chosen based on the nature of the problem and on mathematical needs.

Different scenarios call for different loss functions (see the example formulas after this list):

  • Binary classification: cross entropy function.

  • Multi-class classification: softmax function.

  • Regression: squared error function.
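
As a concrete illustration (the notation is not from this reference), for targets $y_i$ and predictions $\hat{y}_i$ over $N$ cases, the squared error and binary cross entropy losses take the forms

$$E_{\text{sq}} = \frac{1}{2}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2, \qquad E_{\text{xent}} = -\sum_{i=1}^{N}\left[\, y_i \log \hat{y}_i + (1 - y_i)\log\left(1 - \hat{y}_i\right) \right]$$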

21.1.3 Forward-Backward Propagation

Understand forward-backward propagation.

Forward propagation computes the value of the loss function by propagating the inputs through the network: each layer forms the weighted sum of the previous layer's neuron values and applies its activation function. Backward propagation calculates the gradient of the loss function with respect to all the weights in the network. The weights are initialized with random numbers uniformly distributed within a region that is either specified by the user (by setting weight boundaries) or defined by the number of nodes in the adjacent layers (data driven). The gradients are fed to an optimization method, which in turn uses them to update the weights in an attempt to minimize the loss function.
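
Schematically, each training iteration moves the weight vector $w$ along a descent direction computed from the gradients (a generic illustration of gradient-based training, not the solver's exact update):

$$w^{(t+1)} = w^{(t)} - \eta_t \, d_t$$

where $d_t$ is the descent direction (for the L-BFGS solver described next, an approximate Newton direction) and $\eta_t$ is the step size chosen by line search.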

21.1.4 Optimization Solver

Understand optimization solver.

An optimization solver searches for the optimal solution of the loss function: the extreme value (maximum or minimum) of the loss (cost) function. For training, the solver minimizes the loss.

Oracle Data Mining implements Limited-memory Broyden–Fletcher–Goldfarb–Shanno (L-BFGS) together with line search. L-BFGS is a Quasi-Newton method that uses rank-one updates specified by gradient evaluations to approximate the Hessian matrix, and it requires only a limited amount of memory. L-BFGS is used to find the descent direction, and line search is used to find the appropriate step size. The number of historical copies kept by the L-BFGS solver is defined by the LBFGS_HISTORY_DEPTH setting. When the number of iterations is smaller than the history depth, the Hessian computed by L-BFGS is accurate; when it is larger, the Hessian is an approximation. Therefore, the history depth should be neither too small (the approximation degrades) nor too large (the computation becomes too slow). Typically, the value is between 3 and 10.
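
For example, following the settings-table pattern used in the configuration section below, the history depth can be set as follows (the value 5 is only an illustrative choice within the typical range):

INSERT INTO SETTINGS_TABLE (setting_name, setting_value) VALUES
                   ('LBFGS_HISTORY_DEPTH', '5');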

21.1.5 Regularization

Understand regularization.

Regularization refers to a process of introducing additional information to solve an ill-posed problem or to prevent over-fitting. An ill-posed problem or over-fitting can occur when a statistical model describes random error or noise instead of the underlying relationship. Typical regularization techniques include L1-norm regularization, L2-norm regularization, and held-aside.

Held-aside is usually used for large training data sets, whereas L1-norm regularization and L2-norm regularization are mostly used for small training data sets.
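
As a sketch of the same settings-table pattern, assuming a NNET_REGULARIZER setting that accepts a held-aside value (consult the DBMS_DATA_MINING settings reference for the exact constants supported by your release):

INSERT INTO SETTINGS_TABLE (setting_name, setting_value) VALUES
                   ('NNET_REGULARIZER', 'NNET_REGULARIZER_HELDASIDE');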

21.1.6 Convergence Check

The convergence check determines whether the optimal solution has been reached and whether the iterations of the optimization have come to an end.

In the L-BFGS solver, the convergence criteria include the maximum number of iterations, the infinity norm of the gradient, and the relative error tolerance. For held-aside regularization, the convergence criteria also check the loss function value on the test data set, as well as the best model learned so far. Training is terminated when the model becomes worse for a specific number of iterations (specified by NNET_HELDASIDE_MAX_FAIL), when the loss function is close to zero, or when the relative error on the test data is less than the tolerance.
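
For example, assuming NNET_ITERATIONS and NNET_TOLERANCE settings that control the maximum iteration count and the error tolerance (both setting names and values here are illustrative assumptions, not taken from this section):

INSERT INTO SETTINGS_TABLE (setting_name, setting_value) VALUES
                   ('NNET_ITERATIONS', '200');
INSERT INTO SETTINGS_TABLE (setting_name, setting_value) VALUES
                   ('NNET_TOLERANCE', '0.0001');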

21.1.7 LBFGS_SCALE_HESSIAN

Defines LBFGS_SCALE_HESSIAN.

This setting specifies how to set the initial approximation of the inverse Hessian at the beginning of each iteration. If the value is set to LBFGS_SCALE_HESSIAN_ENABLE, the initial inverse Hessian is approximated with Oren-Luenberger scaling. If it is set to LBFGS_SCALE_HESSIAN_DISABLE, the identity matrix is used as the approximation of the inverse Hessian at the beginning of each iteration.
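
Using the settings-table pattern from the configuration section below, Oren-Luenberger scaling can be enabled as follows:

INSERT INTO SETTINGS_TABLE (setting_name, setting_value) VALUES
                   ('LBFGS_SCALE_HESSIAN', 'LBFGS_SCALE_HESSIAN_ENABLE');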

21.1.8 NNET_HELDASIDE_MAX_FAIL

Defines NNET_HELDASIDE_MAX_FAIL.

Validation data (held-aside) is used to stop training early if the network performance on the validation data fails to improve or remains the same for NNET_HELDASIDE_MAX_FAIL epochs in a row.
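
For example, following the same settings-table pattern (the value 6 is illustrative):

INSERT INTO SETTINGS_TABLE (setting_name, setting_value) VALUES
                   ('NNET_HELDASIDE_MAX_FAIL', '6');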

21.2 Data Preparation for Neural Network

Learn about preparing data for Neural Network.

The algorithm automatically "explodes" categorical data into a set of binary attributes, one per category value. Oracle Data Mining algorithms automatically handle missing values, and therefore missing value treatment is not necessary.

The algorithm automatically replaces missing categorical values with the mode and missing numerical values with the mean. Neural Network requires the normalization of numeric input. The algorithm uses z-score normalization. The normalization occurs only for two-dimensional numeric columns (not nested). Normalization places the values of numeric attributes on the same scale and prevents attributes with a large original scale from biasing the solution. Neural Network scales the numeric values in nested columns by the maximum absolute value seen in the corresponding columns.
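
For reference, z-score normalization replaces each value $x$ of a numeric attribute having mean $\mu$ and standard deviation $\sigma$ with

$$x' = \frac{x - \mu}{\sigma}$$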

21.3 Neural Network Algorithm Configuration

Learn about configuring Neural Network algorithm.

Specify Nodes Per Layer

The NNET_NODES_PER_LAYER value is a comma-separated list giving the number of nodes in each hidden layer. The following example specifies two hidden layers, with two nodes in the first and three in the second:

INSERT INTO SETTINGS_TABLE (setting_name, setting_value) VALUES
                   ('NNET_NODES_PER_LAYER', '2,3');

Specify Activation Functions Per Layer

The following example specifies the tanh activation function for the first hidden layer and the logistic sigmoid for the second. The doubled single quotes embed quote characters inside the SQL string literal:

INSERT INTO SETTINGS_TABLE (setting_name, setting_value) VALUES
                   ('NNET_ACTIVATIONS', '''NNET_ACTIVATIONS_TANH'', ''NNET_ACTIVATIONS_LOG_SIG''');

Example 21-1 Example

This example shows how to build a Neural Network model. After the settings table is created and populated, insert a row to specify the algorithm:
INSERT INTO SETTINGS_TABLE (setting_name, setting_value) VALUES
     ('ALGO_NAME', 'ALGO_NEURAL_NETWORK');

Build the model as follows:

BEGIN
  DBMS_DATA_MINING.CREATE_MODEL(
    model_name          => 'my_model',
    mining_function     => dbms_data_mining.classification,  -- or dbms_data_mining.regression
    data_table_name     => 'test_table',
    case_id_column_name => 'case_id',
    target_column_name  => 'test_target',
    settings_table_name => 'settings_table');
END;
/

21.4 Scoring with Neural Network

Learn to score with Neural Network.

Scoring with Neural Network is the same as scoring with any other Classification or Regression algorithm. The following functions are supported: PREDICTION, PREDICTION_PROBABILITY, PREDICTION_COST, PREDICTION_SET, and PREDICTION_DETAILS.
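
For example, a minimal scoring query, assuming the my_model classification model built in the configuration section and a score table with the same predictor columns:

SELECT case_id,
       PREDICTION(my_model USING *)             AS predicted_target,
       PREDICTION_PROBABILITY(my_model USING *) AS probability
  FROM test_table;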