The ML_TRAIN routine includes the optimization_metric option, and the ML_SCORE routine includes the metric option. Both of these options define a metric that must be compatible with the task type and the target data.

Model Metadata includes the optimization_metric field.
For more information about scoring metrics, see: scikit-learn.org. For more information about forecasting metrics, see: sktime.org and statsmodels.org.
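For example, the following is a minimal sketch of training a classification model with an explicit optimization_metric and then scoring it with the same metric. The schema, table, column, and session variable names are illustrative, and accuracy is assumed to be a metric that is compatible with the task:

mysql> CALL sys.ML_TRAIN('mlcorpus.train_table', 'target_col',
    ->      JSON_OBJECT('task', 'classification',
    ->                  'optimization_metric', 'accuracy'),
    ->      @model);
mysql> CALL sys.ML_SCORE('mlcorpus.test_table', 'target_col', @model,
    ->      'accuracy', @score, NULL);
mysql> SELECT @score;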
Binary-only metrics:
Binary and multi-class metrics:
Metrics for anomaly detection can only be used with the ML_SCORE routine. They cannot be used with the ML_TRAIN routine.
roc_auc: You must not specify the threshold or topk options.
precision_k: An Oracle implementation of a common metric for fraud detection and lead scoring. You must use the topk option. You cannot use the threshold option.
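For example, a sketch of scoring an anomaly detection model with precision_k; the table, column, and session variable names are illustrative:

mysql> CALL sys.ML_SCORE('mlcorpus.anomaly_test', 'label', @anomaly_model,
    ->      'precision_k', @score, JSON_OBJECT('topk', 100));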
The following metrics can use the threshold option, but cannot use the topk option:
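For example, a sketch that assumes f1 is one of these threshold-based metrics; the table, column, and session variable names are illustrative:

mysql> CALL sys.ML_SCORE('mlcorpus.anomaly_test', 'label', @anomaly_model,
    ->      'f1', @score, JSON_OBJECT('threshold', 0.9));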
The following rating metrics can be used for explicit feedback:
For recommendation models that use implicit feedback:
If a user and item combination in the input table is not unique, the input table is grouped by user and item columns, and the result is the average of the rankings.
If the input table overlaps with the training table and remove_seen is true, which is the default setting, the model does not repeat a recommendation and ignores the overlapping items.
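For example, a sketch of scoring a recommendation model with one of the ranking metrics described below; the names are illustrative, and it assumes that remove_seen can be set in the ML_SCORE options:

mysql> CALL sys.ML_SCORE('mlcorpus.ratings_test', 'rating', @rec_model,
    ->      'ndcg_at_k', @score, JSON_OBJECT('topk', 10, 'remove_seen', TRUE));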
The following ranking metrics can be used for implicit and explicit feedback:
precision_at_k is the number of relevant topk recommended items divided by the total topk recommended items for a particular user:

precision_at_k = (relevant topk recommended items) / (total topk recommended items)
For example, if topk is 10 and 7 of the 10 recommended items are relevant for a user, then precision_at_k is 70%.
The precision_at_k value for the input table is the average for all users. If remove_seen is true, the default setting, the average only includes users for whom the model can make a recommendation. If a user has implicitly ranked every item in the training table, the model cannot recommend any more items for that user, and that user is excluded from the average calculation when remove_seen is true.
recall_at_k is the number of relevant topk recommended items divided by the total relevant items for a particular user:

recall_at_k = (relevant topk recommended items) / (total relevant items)
For example, suppose there are 20 relevant items in total for a user. If topk is 10 and 7 of the recommended items are relevant, then recall_at_k is 7 / 20 = 35%.
The recall_at_k value for the input table is the average for all users.
hit_ratio_at_k is the number of relevant topk recommended items divided by the total relevant items for all users:

hit_ratio_at_k = (relevant topk recommended items, all users) / (total relevant items, all users)
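For example, an illustrative calculation: suppose there are two users with 20 and 10 relevant items respectively, and the topk recommendations contain 7 relevant items for the first user and 2 for the second. Then hit_ratio_at_k is (7 + 2) / (20 + 10) = 9 / 30 = 30%.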
The average of hit_ratio_at_k for the input table is recall_at_k. If there is only one user, hit_ratio_at_k is the same as recall_at_k.
ndcg_at_k is the normalized discounted cumulative gain, which is the discounted cumulative gain of the relevant topk recommended items divided by the discounted cumulative gain of the relevant topk items for a particular user.
The discounted gain of an item is the true rating divided by log2(r+1), where r is the ranking of the item in the relevant topk items. If a user prefers a particular item, the rating is higher and the ranking is lower.
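For example, an illustrative calculation: suppose topk is 3 and the true ratings of the relevant recommended items, in recommended order, are 5, 3, and 4. The discounted cumulative gain is 5/log2(2) + 3/log2(3) + 4/log2(4) ≈ 5 + 1.89 + 2 = 8.89. The ideal ordering (5, 4, 3) gives 5/log2(2) + 4/log2(3) + 3/log2(4) ≈ 5 + 2.52 + 1.5 = 9.02, so ndcg_at_k is 8.89 / 9.02 ≈ 0.99.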
The ndcg_at_k value for the input table is the average for all users.