## Inspect a Training Model

After you create the training model and run the data flow, you can review information about the model to determine its accuracy. Use this information to iteratively adjust the model settings to improve accuracy and predict better results.

### What Are Related Data Sets?

When you run the data flow to create the training model, Oracle Analytics creates a set of related data sets. You can open and create projects on these data sets to learn about the accuracy of the model.

Depending on the algorithm you chose for your model, related data sets contain details about the model such as prediction rules, accuracy metrics, the confusion matrix, and key drivers for prediction. You can use this information to fine-tune the model to get better results, and you can use related data sets to compare models and decide which model is more accurate.

For example, you can open a Drivers data set to discover which columns have a strong positive or negative influence on the model. By examining those columns, you find that some columns shouldn't be treated as model variables because they aren't realistic inputs or because they're too granular for the forecast. You use the data flow editor to open the model and, based on the information you discovered, you remove the irrelevant or too-granular columns and regenerate the model. You check the Quality and Results tab and verify whether the model accuracy has improved. You continue this process until you're satisfied with the model's accuracy and it's ready to score a new data set.

To find and open a model, see Inspect a Training Model.

Different algorithms generate similar related data sets. Individual parameters and column names may change in the data set depending on the type of algorithm, but the functionality of the data set stays the same. For example, the column names in a statistics data set may change from Linear Regression to Logistic Regression, but the statistics data set contains accuracy metrics of the model.

These are the related data sets:

CARTree

This data set is a tabular representation of CART (Decision Tree), computed to predict the target column values. It contains columns that represent the conditions and the conditions' criteria in the decision tree, a prediction for each group, and prediction confidence. The Inbuilt Tree Diagram visualization can be used to visualize this decision tree.

The CARTree data set is outputted when you select these model and algorithm combinations.

Model | Algorithm |
---|---|
Numeric | CART for Numeric Prediction |
Binary Classification | CART (Decision Tree) |
Multi Classification | CART (Decision Tree) |

Classification Report

This data set is a tabular representation of the accuracy metrics for each distinct value of the target column. For example, if the target column can have the two distinct values Yes and No, this data set shows accuracy metrics like F1, Precision, Recall, and Support (the number of rows in the training data set with this value) for every distinct value of the target column.
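As a rough illustration of what these per-value metrics mean, the following sketch computes precision, recall, F1, and support for each distinct target value from two lists of labels. The function name, the sample labels, and the dictionary keys are assumptions made for this example; they aren't the column names or the calculation code that Oracle Analytics uses.

```python
from collections import Counter


def classification_report(actual, predicted):
    """Return precision, recall, F1, and support for each distinct label."""
    labels = sorted(set(actual) | set(predicted))
    support = Counter(actual)  # rows in the data with this actual value
    pairs = list(zip(actual, predicted))
    report = {}
    for label in labels:
        tp = sum(1 for a, p in pairs if a == label and p == label)
        fp = sum(1 for a, p in pairs if a != label and p == label)
        fn = sum(1 for a, p in pairs if a == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = {
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "support": support[label],
        }
    return report


# Hypothetical target column with the two distinct values Yes and No.
actual = ["Yes", "Yes", "Yes", "No", "No", "No", "No", "Yes"]
predicted = ["Yes", "No", "Yes", "No", "No", "Yes", "No", "Yes"]
report = classification_report(actual, predicted)
```

In this sample, each value has a support of 4, and three of the four rows for each value are predicted correctly.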

The Classification Report data set is outputted when you select these model and algorithm combinations.

Model | Algorithms |
---|---|
Binary Classification | Naive Bayes, Neural Network, Support Vector Machine |
Multi Classification | Naive Bayes, Neural Network, Support Vector Machine |

Confusion Matrix

This data set, which is also called an error matrix, is a pivot table layout. Each row represents an instance of a predicted class, and each column represents an instance in an actual class. This table reports the number of false positives, false negatives, true positives, and true negatives, which are used to compute precision, recall, and F1 accuracy metrics.
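The layout described above can be sketched as a nested dictionary in which the outer key is the predicted class (row) and the inner key is the actual class (column). The labels and the choice of Yes as the positive class are assumptions for this example; the sketch only illustrates how the four counts lead to precision, recall, and F1.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Build a matrix whose rows are predicted classes and columns are actual classes."""
    labels = sorted(set(actual) | set(predicted))
    counts = Counter(zip(predicted, actual))
    return {p: {a: counts[(p, a)] for a in labels} for p in labels}


# Hypothetical labels; "Yes" is treated as the positive class.
actual = ["Yes"] * 5 + ["No"] * 5
predicted = ["Yes", "Yes", "Yes", "Yes", "No", "No", "No", "No", "Yes", "Yes"]
matrix = confusion_matrix(actual, predicted)

tp = matrix["Yes"]["Yes"]  # predicted Yes, actually Yes
fp = matrix["Yes"]["No"]   # predicted Yes, actually No
fn = matrix["No"]["Yes"]   # predicted No, actually Yes
tn = matrix["No"]["No"]    # predicted No, actually No

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
```

Here the matrix holds 4 true positives, 2 false positives, 1 false negative, and 3 true negatives.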

The Confusion Matrix data set is outputted when you select these model and algorithm combinations.

Model | Algorithms |
---|---|
Binary Classification | Logistic Regression, CART (Decision Tree), Naive Bayes, Neural Network, Random Forest, Support Vector Machine |
Multi Classification | CART (Decision Tree), Naive Bayes, Neural Network, Random Forest, Support Vector Machine |

Drivers

This data set provides information about the columns that determine the target column values. Linear regressions are used to identify these columns. Each column is assigned coefficient and correlation values. The coefficient value describes the weight of the column in determining the target column's value. The correlation value indicates the direction of the relationship between the target column and the dependent column, that is, whether the target column's value increases or decreases as the dependent column's value increases.
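To make the two values concrete, the following sketch fits a one-variable least-squares line and computes the Pearson correlation for a single input column. This is a simplification under stated assumptions: the documentation doesn't spell out how Oracle Analytics computes these values, and the column names `discount` and `sales` are invented for the example.

```python
from math import sqrt


def slope_and_correlation(x, y):
    """Return the least-squares slope of y on x and the Pearson correlation."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    slope = sxy / sxx                    # weight of the column
    correlation = sxy / sqrt(sxx * syy)  # sign gives the direction
    return slope, correlation


# Hypothetical input column and target column.
discount = [0, 5, 10, 15, 20]
sales = [100, 90, 80, 70, 60]
coefficient, correlation = slope_and_correlation(discount, sales)
```

In this sample, the target falls by 2 for every unit increase in the input column, so the coefficient is negative and the correlation indicates a negative relationship.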

The Drivers data set is outputted when you select these model and algorithm combinations.

Model | Algorithms |
---|---|
Numeric | Linear Regression, Elastic Net Linear Regression |
Binary Classification | Logistic Regression, Support Vector Machine |
Multi Classification | Support Vector Machine |

Hitmap

This data set contains information about the decision tree's leaf nodes. Each row in the table represents a leaf node and contains information describing what that leaf node represents, such as segment size, confidence, and expected number of rows. For example, expected number of correct predictions = Segment Size * Confidence.
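The formula in the example above can be applied row by row. In this sketch, each dictionary stands in for one leaf-node row; the keys `segment_size`, `confidence`, and `expected_correct` and the sample values are assumptions for illustration, not the actual column names in the Hitmap data set.

```python
# Hypothetical leaf-node rows.
leaf_nodes = [
    {"leaf": "A", "segment_size": 200, "confidence": 0.90},
    {"leaf": "B", "segment_size": 50, "confidence": 0.60},
]

# Expected number of correct predictions = Segment Size * Confidence.
for node in leaf_nodes:
    node["expected_correct"] = node["segment_size"] * node["confidence"]

total_expected = sum(node["expected_correct"] for node in leaf_nodes)
```

A large leaf with high confidence contributes far more expected correct predictions than a small, low-confidence leaf, which can help you decide which segments deserve attention.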

The Hitmap data set is outputted when you select these model and algorithm combinations.

Model | Algorithm |
---|---|
Numeric | CART for Numeric Prediction |

Residuals

This data set provides information on the quality of the predictions, based on their residuals. A residual is the difference between the measured value and the predicted value of a regression model. This data set contains the aggregated sum of the absolute differences between the actual and predicted values for all columns in the data set.
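The following sketch shows the underlying arithmetic for a single column: the residual for each row, and the sum of the absolute differences. The function names and sample values are assumptions for this example.

```python
def residuals(actual, predicted):
    """Return actual minus predicted for each row."""
    return [a - p for a, p in zip(actual, predicted)]


def sum_absolute_residuals(actual, predicted):
    """Return the sum of the absolute differences between actual and predicted."""
    return sum(abs(r) for r in residuals(actual, predicted))


# Hypothetical measured and predicted values.
actual = [10, 12, 15, 20]
predicted = [11, 12, 13, 22]

row_residuals = residuals(actual, predicted)
total_abs = sum_absolute_residuals(actual, predicted)
```

Taking absolute values prevents positive and negative residuals from canceling each other out in the aggregate.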

The Residuals data set is outputted when you select these model and algorithm combinations.

Model | Algorithms |
---|---|
Numeric | Linear Regression, Elastic Net Linear Regression, CART for Numeric Prediction |
Binary Classification | CART (Decision Tree) |
Multi Classification | CART (Decision Tree) |

Statistics

This data set's metrics depend upon the algorithm used to generate it. Note this list of metrics based on algorithm:

- Linear Regression, CART for Numeric Prediction, Elastic Net Linear Regression - These algorithms produce R-Square, R-Square Adjusted, Mean Absolute Error (MAE), Mean Squared Error (MSE), Relative Absolute Error (RAE), Relative Squared Error (RSE), and Root Mean Squared Error (RMSE).
- CART (Classification And Regression Trees), Naive Bayes Classification, Neural Network, Support Vector Machine (SVM), Random Forest, Logistic Regression - These algorithms produce Accuracy and Total F1.
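For reference, the numeric metrics in the first item can be sketched with their common textbook definitions. This is an illustration only: Oracle Analytics may compute them differently, and the function name, the `num_predictors` parameter, and the sample values are assumptions.

```python
from math import sqrt


def regression_metrics(actual, predicted, num_predictors=1):
    """Compute common regression error metrics from actual and predicted values."""
    n = len(actual)
    mean_actual = sum(actual) / n
    errors = [a - p for a, p in zip(actual, predicted)]
    abs_error = sum(abs(e) for e in errors)
    sq_error = sum(e * e for e in errors)
    # Baseline: error of always predicting the mean of the actual values.
    abs_dev = sum(abs(a - mean_actual) for a in actual)
    sq_dev = sum((a - mean_actual) ** 2 for a in actual)

    mse = sq_error / n
    rse = sq_error / sq_dev
    r2 = 1 - rse
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - num_predictors - 1)
    return {
        "MAE": abs_error / n,
        "MSE": mse,
        "RMSE": sqrt(mse),
        "RAE": abs_error / abs_dev,
        "RSE": rse,
        "R2": r2,
        "R2_adjusted": adj_r2,
    }


# Hypothetical actual and predicted values.
metrics = regression_metrics([3, 5, 7, 9], [2, 5, 8, 9])
```

For the error metrics (MAE, MSE, RMSE, RAE, RSE), lower values indicate a better fit. For R-Square and R-Square Adjusted, values closer to 1 indicate a better fit.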

The Statistics data set is outputted when you select these model and algorithm combinations.

Model | Algorithms |
---|---|
Numeric | Linear Regression, Elastic Net Linear Regression, CART for Numeric Prediction |
Binary Classification | Logistic Regression, CART (Decision Tree), Naive Bayes, Neural Network, Random Forest, Support Vector Machine |
Multi Classification | Naive Bayes, Neural Network, Random Forest, Support Vector Machine |

Summary

This data set contains information such as Target name and Model name.

The Summary data set is outputted when you select these model and algorithm combinations.

Model | Algorithms |
---|---|
Binary Classification | Naive Bayes, Neural Network, Support Vector Machine |
Multi Classification | Naive Bayes, Neural Network, Support Vector Machine |