Inspect a Training Model

After you create the training model and run the data flow, you can review information about the model to determine its accuracy. Use this information to iteratively adjust the model settings to improve accuracy and predict better results.

  1. Click the Navigator icon and select Machine Learning.
  2. Click the Models tab.
  3. Click the menu icon for a model and select Inspect.
    The Inspect dialog is displayed.
  4. Browse the dialog's tabs to review information about the model, including its accuracy, and decide whether you need to adjust the model's parameters or select a more suitable training algorithm. Note the following information:
    • Quality tab - This tab contains model quality details, including accuracy metrics such as model accuracy, precision, recall, F1 value, and false positive rate. Oracle Analytics provides similar metrics regardless of the algorithm used to create the model, which makes it easy to compare different models.

      During the model creation process, the input data set is split into two parts, one to train the model and one to test it, based on the Train Partition Percent parameter. The test portion of the data set is used to measure the accuracy of the model that is built.

    • Related tab - Use to navigate to the data sets generated when you train a model. Depending on the algorithm, these data sets contain details about the model like: prediction rules, accuracy metrics, confusion matrix, key drivers for prediction, and so on.

      These details help you understand the rules the model used to determine the predictions and classifications. You can double-click a related data set to view it or to use it in a project.

  5. If your findings in the Quality and Related tabs show that you need to adjust the model parameters and retrain the model, close the Inspect dialog, click the Navigator icon, select Data, click the Data Flows tab, locate the data flow, and click Open.
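
The train and test partition described for the Quality tab can be illustrated outside Oracle Analytics. The following is a minimal Python sketch, not Oracle's implementation: the function name split_by_train_percent, the shuffling step, and the fixed seed are assumptions made for illustration. Only the idea of a Train Partition Percent comes from the steps above.

```python
import random

def split_by_train_percent(rows, train_partition_percent, seed=0):
    """Split rows into (train, test) using a percentage for the training part."""
    if not 0 < train_partition_percent < 100:
        raise ValueError("train_partition_percent must be between 0 and 100")
    shuffled = list(rows)                  # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # assumption: rows are shuffled before splitting
    cut = round(len(shuffled) * train_partition_percent / 100)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical input: ten rows, 80 percent used for training.
train, test = split_by_train_percent(range(10), 80)
print(len(train), len(test))
```

The accuracy metrics are then computed on the held-out test rows, which the model did not see during training.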

What Are Related Data Sets?

When you run the data flow to create the training model, Oracle Analytics creates a set of related data sets. You can open these data sets and create projects on them to learn about the accuracy of the model.

Depending on the algorithm you chose for your model, related data sets contain details about the model such as prediction rules, accuracy metrics, confusion matrix, and key drivers for prediction. You can use this information to fine-tune the model to get better results, and you can use related data sets to compare models and decide which model is more accurate.

For example, you can open a Drivers data set to discover which columns have a strong positive or negative influence on the model. By examining those columns, you find that some columns shouldn't be treated as model variables because they aren't realistic inputs or because they're too granular for the forecast. You use the data flow editor to open the model and, based on what you discovered, remove the irrelevant or too-granular columns and regenerate the model. You then check the Quality and Related tabs to verify whether the model's accuracy has improved. You continue this process until you're satisfied with the model's accuracy and it's ready to score a new data set.

To find and open a model, see Inspect a Training Model.

Different algorithms generate similar related data sets. Individual parameters and column names may change in the data set depending on the type of algorithm, but the functionality of the data set stays the same. For example, the column names in a statistics data set may change from Linear Regression to Logistic Regression, but the statistics data set contains accuracy metrics of the model.

These are the related data sets:

CARTree

This data set is a tabular representation of CART (Decision Tree), computed to predict the target column values. It contains columns that represent the conditions and the conditions' criteria in the decision tree, a prediction for each group, and prediction confidence. The Inbuilt Tree Diagram visualization can be used to visualize this decision tree.

The CARTree data set is generated when you select these combinations of model type and algorithm:

  • Numeric - CART for Numeric Prediction
  • Binary Classification - CART (Decision Tree)
  • Multi Classification - CART (Decision Tree)

Classification Report

This data set is a tabular representation of the accuracy metrics for each distinct value of the target column. For example, if the target column can have the two distinct values Yes and No, this data set shows accuracy metrics like F1, Precision, Recall, and Support (the number of rows in the training data set with this value) for every distinct value of the target column.
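
To make the per-value metrics concrete, here is a minimal Python sketch that computes precision, recall, F1, and support for each distinct target value. The function name classification_report and the sample labels are hypothetical, and the formulas are the standard textbook definitions rather than a confirmed description of how Oracle Analytics computes them. In this sketch, support is counted from the actual labels passed in.

```python
from collections import Counter

def classification_report(actual, predicted):
    """Return per-label precision, recall, F1, and support."""
    support = Counter(actual)            # rows whose actual value is the label
    predicted_count = Counter(predicted) # rows predicted as the label
    correct = Counter(a for a, p in zip(actual, predicted) if a == p)
    report = {}
    for label in support:
        tp = correct[label]
        precision = tp / predicted_count[label] if predicted_count[label] else 0.0
        recall = tp / support[label]
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        report[label] = {"precision": precision, "recall": recall,
                         "f1": f1, "support": support[label]}
    return report

# Hypothetical target column with the two distinct values Yes and No.
actual = ["Yes", "Yes", "Yes", "No", "No"]
predicted = ["Yes", "Yes", "No", "No", "Yes"]
report = classification_report(actual, predicted)
for label, row in report.items():
    print(label, row)
```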

The Classification Report data set is generated when you select these combinations of model type and algorithm:

  • Binary Classification - Naive Bayes, Neural Network, Support Vector Machine
  • Multi Classification - Naive Bayes, Neural Network, Support Vector Machine

Confusion Matrix

This data set, which is also called an error matrix, is a pivot table layout. Each row represents an instance of a predicted class, and each column represents an instance in an actual class. This table reports the number of false positives, false negatives, true positives, and true negatives, which are used to compute precision, recall, and F1 accuracy metrics.
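
As an illustration of how the four counts feed the accuracy metrics, the following Python sketch derives precision, recall, F1, accuracy, and false positive rate from a binary confusion matrix. The function name and the sample counts are hypothetical, and the formulas are the standard definitions, shown as an assumption about what the metrics mean rather than as Oracle's code.

```python
def safe_div(numerator, denominator):
    """Return 0.0 instead of failing when a denominator is zero."""
    return numerator / denominator if denominator else 0.0

def metrics_from_confusion(tp, fp, fn, tn):
    """Compute common accuracy metrics from binary confusion-matrix counts."""
    precision = safe_div(tp, tp + fp)   # share of predicted positives that are correct
    recall = safe_div(tp, tp + fn)      # share of actual positives that are found
    return {
        "precision": precision,
        "recall": recall,
        "f1": safe_div(2 * precision * recall, precision + recall),
        "accuracy": safe_div(tp + tn, tp + fp + fn + tn),
        "false_positive_rate": safe_div(fp, fp + tn),
    }

# Hypothetical counts read from a confusion matrix.
metrics = metrics_from_confusion(tp=40, fp=10, fn=20, tn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```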

The Confusion Matrix data set is generated when you select these combinations of model type and algorithm:

  • Binary Classification - Logistic Regression, CART (Decision Tree), Naive Bayes, Neural Network, Random Forest, Support Vector Machine
  • Multi Classification - CART (Decision Tree), Naive Bayes, Neural Network, Random Forest, Support Vector Machine

Drivers

This data set provides information about the columns that determine the target column values. Linear regression is used to identify these columns. Each column is assigned a coefficient value and a correlation value. The coefficient value describes the weight the column carries in determining the target column's value. The correlation value indicates the direction of the relationship between the target column and the input column, that is, whether the target column's value increases or decreases as the input column's value increases.
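
The difference between a coefficient and a correlation can be seen in a small Python sketch. It fits a one-column least-squares line and computes the Pearson correlation for that column. This is a simplification under stated assumptions: the function name, the column names discount and sales, and the single-column fit are hypothetical, and Oracle Analytics may compute its Drivers values differently.

```python
from math import sqrt

def slope_and_correlation(x, y):
    """Return (least-squares slope, Pearson correlation) of y against x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    coefficient = sxy / sxx             # change in target per unit of the input
    correlation = sxy / sqrt(sxx * syy) # sign gives the direction of the relationship
    return coefficient, correlation

# Hypothetical input column and target column.
discount = [0, 5, 10, 15, 20]
sales = [100, 90, 80, 70, 60]
coefficient, correlation = slope_and_correlation(discount, sales)
print(coefficient, correlation)
```

In this made-up data the correlation is negative, so the target falls as the input rises, and the coefficient states by how much per unit.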

The Drivers data set is generated when you select these combinations of model type and algorithm:

  • Numeric - Linear Regression, Elastic Net Linear Regression
  • Binary Classification - Logistic Regression, Support Vector Machine
  • Multi Classification - Support Vector Machine

Hitmap

This data set contains information about the decision tree's leaf nodes. Each row in the table represents a leaf node and contains information describing what that leaf node represents, such as segment size, confidence, and expected number of rows. For example, expected number of correct predictions = Segment Size * Confidence.
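
The formula above can be applied row by row. The following Python sketch uses made-up leaf nodes; the field names leaf, segment_size, and confidence are hypothetical and might not match the column names in an actual Hitmap data set.

```python
def expected_correct_predictions(segment_size, confidence):
    """Expected number of correct predictions = Segment Size * Confidence."""
    return segment_size * confidence

# Hypothetical leaf nodes, one per row of the data set.
leaf_nodes = [
    {"leaf": "A", "segment_size": 200, "confidence": 0.9},
    {"leaf": "B", "segment_size": 50, "confidence": 0.6},
]

for node in leaf_nodes:
    node["expected_correct"] = expected_correct_predictions(
        node["segment_size"], node["confidence"]
    )
    print(node["leaf"], round(node["expected_correct"], 2))
```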

The Hitmap data set is generated when you select this combination of model type and algorithm:

  • Numeric - CART for Numeric Prediction

Residuals

This data set provides information on the quality of the predictions, expressed as residuals. A residual is the difference between the measured value and the predicted value of a regression model. This data set contains the aggregated sum of the absolute differences between the actual and predicted values for all columns in the data set.
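
As a small worked example, the following Python sketch computes residuals and their aggregated absolute sum for a handful of made-up values. The function names and the sample numbers are hypothetical and only illustrate the definition given above.

```python
def residuals(actual, predicted):
    """Residual = measured (actual) value minus predicted value, row by row."""
    return [a - p for a, p in zip(actual, predicted)]

def sum_of_absolute_residuals(actual, predicted):
    """Aggregate the absolute differences between actual and predicted values."""
    return sum(abs(r) for r in residuals(actual, predicted))

# Hypothetical actual and predicted values for three rows.
actual = [10.0, 12.0, 9.0]
predicted = [11.0, 10.0, 9.5]
print(residuals(actual, predicted))
print(sum_of_absolute_residuals(actual, predicted))
```

A smaller aggregated value means the predictions sit closer to the measured values.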

The Residuals data set is generated when you select these combinations of model type and algorithm:

  • Numeric - Linear Regression, Elastic Net Linear Regression, CART for Numeric Prediction
  • Binary Classification - CART (Decision Tree)
  • Multi Classification - CART (Decision Tree)

Statistics

This data set's metrics depend upon the algorithm used to generate it. Note this list of metrics based on algorithm:

  • Linear Regression, CART for Numeric Prediction, Elastic Net Linear Regression - For these algorithms, the data set contains R-Square, R-Square Adjusted, Mean Absolute Error (MAE), Mean Squared Error (MSE), Relative Absolute Error (RAE), Relative Squared Error (RSE), and Root Mean Squared Error (RMSE).
  • CART (Classification And Regression Trees), Naive Bayes Classification, Neural Network, Support Vector Machine (SVM), Random Forest, Logistic Regression - For these algorithms, the data set contains Accuracy and Total F1.
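
The regression metrics listed above can be sketched in Python using their standard textbook definitions. This is an assumption about what each metric means, not a confirmed description of Oracle's formulas; the function name, the n_features parameter, and the sample values are hypothetical.

```python
from math import sqrt

def regression_statistics(actual, predicted, n_features=1):
    """Compute common regression accuracy metrics from actual and predicted values."""
    n = len(actual)
    mean_actual = sum(actual) / n
    abs_err = sum(abs(a - p) for a, p in zip(actual, predicted))
    sq_err = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    abs_dev = sum(abs(a - mean_actual) for a in actual)  # error of always predicting the mean
    sq_dev = sum((a - mean_actual) ** 2 for a in actual)
    r_square = 1 - sq_err / sq_dev
    return {
        "MAE": abs_err / n,
        "MSE": sq_err / n,
        "RMSE": sqrt(sq_err / n),
        "RAE": abs_err / abs_dev,
        "RSE": sq_err / sq_dev,
        "R-Square": r_square,
        "R-Square Adjusted": 1 - (1 - r_square) * (n - 1) / (n - n_features - 1),
    }

# Hypothetical actual and predicted values for four rows.
stats = regression_statistics([3, 5, 7, 9], [2, 5, 8, 9])
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```

Lower error values and an R-Square closer to 1 indicate a better fit on the data being measured.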

The Statistics data set is generated when you select these combinations of model type and algorithm:

  • Numeric - Linear Regression, Elastic Net Linear Regression, CART for Numeric Prediction
  • Binary Classification - Logistic Regression, CART (Decision Tree), Naive Bayes, Neural Network, Random Forest, Support Vector Machine
  • Multi Classification - Naive Bayes, Neural Network, Random Forest, Support Vector Machine

Summary

This data set contains information such as Target name and Model name.

The Summary data set is generated when you select these combinations of model type and algorithm:

  • Binary Classification - Naive Bayes, Neural Network, Support Vector Machine
  • Multi Classification - Naive Bayes, Neural Network, Support Vector Machine