Building and Explaining an Anomaly Detector using AutoMLx - Experimental

by the Oracle AutoMLx Team


Anomaly Detection Demo Notebook.

Copyright © 2025, Oracle and/or its affiliates.

Licensed under the Universal Permissive License v 1.0 as shown at https://oss.oracle.com/licenses/upl/

Overview of this Notebook¶

In this notebook, we build an anomaly detection model using the experimental, fully unsupervised anomaly detection pipeline in Oracle AutoMLx for the public Credit Card Fraud dataset. The dataset is a binary anomaly detection dataset; more details about it can be found at https://www.openml.org/d/1597. We explore the various options provided by the Oracle AutoMLx tool that allow the user to control the AutoML training process. We then evaluate the different models trained by AutoML. Finally, we provide an overview of the possibilities that Oracle AutoMLx offers for explaining the predictions of the tuned model.


Prerequisites¶

  • Experience level: novice (Python and Machine Learning)
  • Professional experience: some industry experience

Business Use¶

Data analytics and modeling problems using Machine Learning (ML) are becoming popular and often rely on data science expertise to build accurate ML models. Such modeling tasks primarily involve the following steps:

  • Preprocess dataset (clean, impute, engineer features, normalize).
  • Pick an appropriate model for the given dataset and prediction task at hand.
  • Tune the chosen model’s hyperparameters for the given dataset.

All of these steps are significantly time-consuming and rely heavily on data-scientist expertise. To make matters worse, the best feature subset, model, and hyperparameter choices vary widely with the dataset and the prediction task, so there is no one-size-fits-all solution that achieves reasonably good model performance. Using a simple Python API, AutoML can quickly jump-start the data science process with an accurately tuned model and appropriate features for a given prediction task.

Table of Contents¶

  • Setup
  • Load the Credit Card Fraud Dataset
  • AutoML
    • Setting the execution engine
    • Create an Instance of AutoMLx
    • Train a Model using AutoMLx
    • Analyze the AutoMLx optimization process
      • Algorithm Selection
      • Hyperparameter Tuning
    • Confusion Matrix
    • Specify a Time Budget to AutoML
  • Machine Learning Explainability (MLX)
    • Initialize an MLExplainer
    • Model Explanations (Global Feature Importance)
    • Feature Dependence Explanations
    • Prediction Explanations (Local Feature Importance)
    • Interactive What-If Explanations
    • Counterfactual Explanations
    • Aggregate Local Feature Importance & Local Feature Importance Built-in Sampling
    • Advanced Feature Importance Options
      • Change the number of iterations
      • Include the effects of feature interactions (with Shapley feature importance)
    • Advanced Feature Dependence Options (ALE)
  • References

Setup¶

Basic setup for the Notebook.

In [1]:
! pip install rdata==0.9

%matplotlib inline
%load_ext autoreload
%autoreload 2
Collecting rdata==0.9
  Using cached rdata-0.9-py3-none-any.whl.metadata (1.1 kB)
Requirement already satisfied: numpy in /scratch_user/olautoml/.conda/envs/pipeline-run-3.9.19-releasev252/lib/python3.9/site-packages (from rdata==0.9) (1.26.4)
Requirement already satisfied: xarray in /scratch_user/olautoml/.conda/envs/pipeline-run-3.9.19-releasev252/lib/python3.9/site-packages (from rdata==0.9) (2024.7.0)
Requirement already satisfied: pandas in /scratch_user/olautoml/.conda/envs/pipeline-run-3.9.19-releasev252/lib/python3.9/site-packages (from rdata==0.9) (2.2.2)
Requirement already satisfied: python-dateutil>=2.8.2 in /scratch_user/olautoml/.conda/envs/pipeline-run-3.9.19-releasev252/lib/python3.9/site-packages (from pandas->rdata==0.9) (2.9.0.post0)
Requirement already satisfied: pytz>=2020.1 in /scratch_user/olautoml/.conda/envs/pipeline-run-3.9.19-releasev252/lib/python3.9/site-packages (from pandas->rdata==0.9) (2025.2)
Requirement already satisfied: tzdata>=2022.7 in /scratch_user/olautoml/.conda/envs/pipeline-run-3.9.19-releasev252/lib/python3.9/site-packages (from pandas->rdata==0.9) (2025.2)
Requirement already satisfied: packaging>=23.1 in /scratch_user/olautoml/.conda/envs/pipeline-run-3.9.19-releasev252/lib/python3.9/site-packages (from xarray->rdata==0.9) (25.0)
Requirement already satisfied: six>=1.5 in /scratch_user/olautoml/.conda/envs/pipeline-run-3.9.19-releasev252/lib/python3.9/site-packages (from python-dateutil>=2.8.2->pandas->rdata==0.9) (1.17.0)
Using cached rdata-0.9-py3-none-any.whl (19 kB)
Installing collected packages: rdata
  Attempting uninstall: rdata
    Found existing installation: rdata 0.11.2
    Uninstalling rdata-0.11.2:
      Successfully uninstalled rdata-0.11.2
Successfully installed rdata-0.9

Load the required modules.

In [2]:
import urllib
import rdata
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

import plotly.figure_factory as ff
import plotly.express as px
from sklearn.metrics import f1_score, confusion_matrix
from sklearn.model_selection import train_test_split
from pyod.models.iforest import IForest
import time
import datetime

# Settings for plots
plt.rcParams['figure.figsize'] = [10, 7]
plt.rcParams['font.size'] = 15
import automlx
from automlx import init

Load the Credit Card Fraud Dataset¶

We start by retrieving and reading in the dataset from the provided URL.

In [3]:
url = "http://www.ulb.ac.be/di/map/adalpozz/data/creditcard.Rdata"
dst_path = "./creditcard.Rdata"

with open(dst_path, 'wb') as fout:
    fout.write(urllib.request.urlopen(url).read())
parsed_res = rdata.parser.parse_file(dst_path)
res = rdata.conversion.convert(parsed_res)
dataset = res['creditcard'].reset_index(drop=True).drop(['Time'], axis=1)

In this case, the target is identified by the Class column.

In [4]:
y = dataset.loc[:, 'Class']

We reduce the total number of features to 20 to have a reasonable training time for this demonstration.

In [5]:
df = dataset.iloc[:, :20]

Since the dataset is not split into training and validation sets by default, we now split it into training (60%), validation (20%), and test (20%) datasets. The training set will be used to create a Machine Learning model using AutoML, the validation set guides the AutoML search, and the test set will be used to evaluate the model's performance on unseen data.

In [6]:
X_train, X_test, y_train, y_test = train_test_split(df, y, train_size=0.6, random_state=0, stratify=y)
X_valid, X_test, y_valid, y_test = train_test_split(X_test, y_test, train_size=0.5, random_state=0, stratify=y_test)

X_train.shape, X_test.shape
Out[6]:
((170884, 20), (56962, 20))

Again, to keep the training time reasonable, we also downsample the training set, keeping only 5% of it.

In [7]:
X_train, _, y_train, _ = train_test_split(X_train, y_train, train_size=0.05, random_state=0, stratify=y_train)

X_train.shape
Out[7]:
(8544, 20)

We also need to reset the indexes after our downsampling.

In [8]:
X_train.reset_index(drop=True, inplace=True)
y_train.reset_index(drop=True, inplace=True)

Let's look at a few of the samples in the training dataset.

In [9]:
X_train.head()
Out[9]:
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20
0 -0.716073 0.225322 3.761183 3.207240 -0.495804 1.865745 -0.678000 0.458706 -0.009656 0.484059 0.026032 1.070390 0.905135 -1.490118 -1.857273 -0.478333 0.224669 0.763987 1.715066 0.421615
1 -0.287669 1.355348 0.607857 0.564008 0.823146 -0.623391 0.846476 -0.026066 -0.968745 -0.953380 1.469807 0.470426 0.426923 -1.388217 0.138532 0.771883 0.503981 1.289862 -0.073898 0.021523
2 -1.360701 0.068936 1.547622 0.968746 -2.539901 1.505830 2.200844 -0.087798 1.034511 -0.918203 -0.866630 0.289333 -1.194264 -0.939673 -2.124787 -0.923653 0.630370 -1.355050 -0.591845 -0.520554
3 -1.167623 -0.206586 1.155390 -1.460830 -1.248562 -0.637028 0.015802 0.154537 -3.190664 0.822285 0.061631 -0.513346 1.103738 0.029712 0.751646 -1.206843 1.420112 -0.898657 0.463735 -0.318270
4 -0.771096 -0.882119 -0.583668 -0.087184 -1.984118 0.750064 1.243510 0.479466 0.410805 -1.674414 -0.841014 -0.267114 0.487951 -1.263536 1.160706 1.723230 -0.083639 1.335979 -1.330248 0.989092

The Credit Card Fraud dataset contains only numerical features.

In [10]:
pd.DataFrame({'Data type': X_train.dtypes}).T
Out[10]:
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20
Data type float64 float64 float64 float64 float64 float64 float64 float64 float64 float64 float64 float64 float64 float64 float64 float64 float64 float64 float64 float64

The Oracle AutoMLx solution automatically handles missing values by dropping features with too many missing values, and filling in the remaining missing values based on the feature type.

In this case, there are no such missing values in our training dataset.

In [11]:
pd.DataFrame({'% missing values': X_train.isnull().sum() * 100 / len(X_train)}).T
Out[11]:
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20
% missing values 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

We visualize the distribution of the target variable in the data. The dataset is heavily unbalanced, as is often the case in the anomaly detection use-case.

In [12]:
y_df = pd.DataFrame(y_train)
y_df.columns = ['anomaly']

fig = px.histogram(y_df["anomaly"].apply(lambda x: "False" if str(x) == "0" else "True"), x="anomaly")
fig.update_layout(xaxis_title="Anomaly")
fig.show()
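For reference, we can also compute the exact fraction of anomalous samples in the downsampled training split. This is a quick sanity check (our own addition, not part of the original flow); it should confirm that frauds make up well under 1% of the data:

# Fraction of training samples labeled as fraud (Class != '0').
anomaly_rate = (y_train.astype(str) != "0").mean()
print(f"Fraction of anomalies in the training set: {anomaly_rate:.4%}")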

AutoML¶

Setting the execution engine¶

The AutoMLx package offers the function init, which allows you to initialize the parallelization engine.

In [13]:
init(engine='ray')
[2025-04-25 03:01:14,603] [automlx.backend] Overwriting ray session directory to /tmp/1odefte7/ray, which will be deleted at engine shutdown. If you wish to retain ray logs, provide _temp_dir in ray_setup dict of engine_opts when initializing the AutoMLx engine.

Create an instance of AutoML for Unsupervised Anomaly Detection - Experimental Feature¶

The Oracle AutoMLx solution provides a pipeline that automatically finds a tuned model given a prediction task and a training dataset. In particular, it can find a tuned model for the task of Unsupervised Anomaly Detection (UAD), where the training labels (whether or not a training point is an anomaly) are unknown.

The AutoML UAD Pipeline consists of three main modules:

  • Preprocessing : Clean, impute, engineer, and normalize features.
  • Algorithm Selection : Identify the right algorithm for a given dataset, choosing from amongst the following Outlier Detectors (OD):
    • IsolationForestOD
    • SubspaceOD
    • HistogramOD
    • ClusteringLocalFactorOD
    • PrincipalCompOD
    • MinCovOD
    • AutoEncoder
    • KNearestNeighborsOD
    • OneClassSVMOD
  • Hyperparameter Tuning : Find the best model hyperparameters that maximize score for the given dataset.

All these pieces are readily combined into a simple AutoML pipeline which automates the entire Machine Learning process with minimal user input/interaction.

Train a model using Oracle AutoMLx¶

The AutoMLx API is quite simple to work with. We create an instance of the pipeline. Next, the training data is passed to the fit() function, which successively executes the three previously mentioned modules.

A model is then generated and can be used for prediction tasks. We then evaluate the performance of the model on unseen data (X_test) using the F1-score.

In [14]:
est = automlx.Pipeline(task='anomaly_detection', score_metric='f1')
est.fit(X_train, X_valid=X_valid, y_valid=y_valid)

y_pred = est.predict(X_test)

score_default = f1_score(y_test.astype(int), y_pred)

print(f'F1-Score on test data : {score_default}')
[2025-04-25 03:01:54,867] [automlx.interface] Dataset shape: (65505,19)
[2025-04-25 03:02:52,433] [sanerec.autotuning.parameter] Hyperparameter epsilon autotune range is set to its validation range. This could lead to long training times
[2025-04-25 03:03:01,882] [sanerec.autotuning.parameter] Hyperparameter repeat_quality_threshold autotune range is set to its validation range. This could lead to long training times
[2025-04-25 03:03:02,032] [sanerec.autotuning.parameter] Hyperparameter scope autotune range is set to its validation range. This could lead to long training times
[2025-04-25 03:03:02,367] [automlx.data_transform] Running preprocessing. Number of features: 20
[2025-04-25 03:03:02,684] [automlx.data_transform] Preprocessing completed. Took 0.317 secs
[2025-04-25 03:03:02,728] [automlx.process] Running Model Generation
[2025-04-25 03:03:02,773] [automlx.process] Model Generation completed.
[2025-04-25 03:03:02,835] [automlx.model_selection] Running Model Selection
[2025-04-25 03:03:49,626] [automlx.model_selection] Model Selection completed - Took 46.791 sec - Selected models: [['IsolationForestOD']]
[2025-04-25 03:03:49,674] [automlx.trials] Running Model Tuning for ['IsolationForestOD']
[2025-04-25 03:03:56,558] [automlx.trials] Best parameters for IsolationForestOD: {'contamination': 0.1, 'n_estimators': 5, 'max_samples': 6, 'max_features': 0.05, 'bootstrap': False, 'behaviour': 'old'}
[2025-04-25 03:03:56,559] [automlx.trials] Model Tuning completed. Took: 6.885 secs
[2025-04-25 03:03:57,138] [automlx.interface] Re-fitting pipeline
[2025-04-25 03:03:57,153] [automlx.final_fit] Skipping updating parameter seed, already fixed by FinalFit_85a27f35-1
F1-Score on test data : 0.026709401709401708

Analyze the AutoML optimization process¶

During AutoML training, a summary of the optimization process is logged, containing:

  • Information about the training data.
  • Information about the AutoML pipeline, such as:
    • Selected algorithm that was the best choice for this data;
    • Selected hyperparameters for the selected algorithm.

AutoML provides a print_summary() API to output all the different trials performed.

In [15]:
est.print_summary()

General Summary
Training dataset shape: (8544, 20)
Validation dataset shape: (56961, 20)
Splitter: ManualSplit(Shuffle=True, Seed=7)
Score metric: f1
Selected algorithm: IsolationForestOD
Selected hyperparameters: {'contamination': 0.1, 'n_estimators': 5, 'max_samples': 6, 'max_features': 0.05, 'bootstrap': False, 'behaviour': 'old'}
AutoMLx version: 25.2.1
Python version: 3.9.21 (main, Dec 11 2024, 16:24:11) [GCC 11.2.0]

Trials Summary
Step # Samples # Features Algorithm Hyperparameters Score (f1) All Metrics Runtime (Seconds) Memory Usage (GB) Finished
Model Selection 8544 20 IsolationForestOD {'contamination': 0.1, 'n_estimators': 100, 'max_samples': 5, 'max_features': 1.0, 'bootstrap': False, 'behaviour': 'old'} 0.031 {'f1': 0.031007751937984496} 1.6463 0.3198 Fri Apr 25 03:03:13 2025
Model Selection 8544 20 HistogramOD {'contamination': 0.1, 'n_bins': 10, 'alpha': 0.1, 'tol': 0.5} 0.0299 {'f1': 0.029925187032418952} 6.5625 0.3786 Fri Apr 25 03:03:18 2025
Model Selection 8544 20 PrincipalCompOD {'contamination': 0.1, 'whiten': False, 'n_components': 0.9999, 'weighted': True, 'svd_solver': 'full', 'n_selected_components': None, 'copy': True, 'tol': 0.0, 'iterated_power': 'auto', 'standardization': True} 0.0293 {'f1': 0.029301644147810516} 1.3347 0.3462 Fri Apr 25 03:03:13 2025
Model Selection 8544 20 AutoEncoder {'contamination': 0.1, 'middle_layer_size': 2, 'encoder_length': 2, 'layer_size_growth': 'exponential', 'hidden_activation': 'relu', 'batch_norm': True, 'learning_rate': 0.001, 'epochs': 100, 'batch_size': 256, 'dropout_rate': 0.05, 'weight_decay': 1e-05, 'preprocessing': False, 'input_dim': 20} 0.029 {'f1': 0.029035821366577244} 37.3068 0.6494 Fri Apr 25 03:03:49 2025
Model Selection 8544 20 ClusteringLocalFactorOD {'contamination': 0.1, 'n_clusters': 9, 'alpha': 0.8, 'beta': 5, 'use_weights': False, 'clustering_estimator': None, 'check_estimator': False} 0.0284 {'f1': 0.02841530054644809} 6.7511 0.4133 Fri Apr 25 03:03:18 2025
Model Selection 8544 20 MinCovOD {'contamination': 0.1, 'assume_centered': False, 'support_fraction': 0.5012289325842697, 'store_precision': True} 0.0283 {'f1': 0.0282780676653762} 4.0895 0.3637 Fri Apr 25 03:03:16 2025
Model Selection 8544 20 KNearestNeighborsOD {'contamination': 0.1, 'n_neighbors': 5, 'method': 'largest', 'radius': 1.0, 'algorithm': 'ball_tree', 'leaf_size': 30, 'metric': 'minkowski', 'p': 2, 'metric_params': None} 0.0262 {'f1': 0.026224783861671472} 20.9862 0.3448 Fri Apr 25 03:03:33 2025
Model Selection 8544 20 OneClassSVMOD {'contamination': 0.1, 'gamma': 0.5, 'kernel': 'rbf', 'nu': 0.5, 'coef0': 0, 'degree': 3, 'tol': 0.001, 'shrinking': True, 'cache_size': 200, 'max_iter': -1} 0.0136 {'f1': 0.013550135501355014} 23.1880 0.5145 Fri Apr 25 03:03:35 2025
Model Tuning 8544 20 IsolationForestOD {'contamination': 0.1, 'n_estimators': 5, 'max_samples': 6, 'max_features': 0.05, 'bootstrap': False, 'behaviour': 'old'} 0.032 {'f1': 0.03197925669835782} 0.1362 0.3567 Fri Apr 25 03:03:51 2025
Model Tuning 8544 20 IsolationForestOD {'contamination': 0.1, 'n_estimators': 101, 'max_samples': 5, 'max_features': 1.0, 'bootstrap': False, 'behaviour': 'old'} 0.0315 {'f1': 0.03149606299212599} 0.4225 0.6598 Fri Apr 25 03:03:50 2025
... ... ... ... ... ... ... ... ... ...
Model Tuning 8544 20 IsolationForestOD {'contamination': 0.1, 'n_estimators': 5, 'max_samples': 37, 'max_features': 0.05, 'bootstrap': False, 'behaviour': 'old'} 0.0237 {'f1': 0.02370163820146393} 0.1536 0.3488 Fri Apr 25 03:03:51 2025
Model Tuning 8544 20 IsolationForestOD {'contamination': 0.1, 'n_estimators': 5, 'max_samples': 36, 'max_features': 0.05, 'bootstrap': False, 'behaviour': 'old'} 0.0236 {'f1': 0.023570190641247834} 0.1406 0.6632 Fri Apr 25 03:03:51 2025
Model Tuning 8544 20 IsolationForestOD {'contamination': 0.1, 'n_estimators': 5, 'max_samples': 5, 'max_features': 1.0, 'bootstrap': False, 'behaviour': 'old'} 0.0038 {'f1': 0.0037695207323640285} 0.1230 0.3496 Fri Apr 25 03:03:51 2025
Model Tuning 8544 20 IsolationForestOD {'contamination': 0.1, 'n_estimators': 5, 'max_samples': 5, 'max_features': 0.9999905, 'bootstrap': False, 'behaviour': 'old'} 0.0012 {'f1': 0.0011851851851851852} 0.1246 0.6598 Fri Apr 25 03:03:51 2025
Model Tuning 8544 20 IsolationForestOD {'contamination': 0.1, 'n_estimators': 5, 'max_samples': 5, 'max_features': 0.05, 'bootstrap': False, 'behaviour': 'old'} 0.0012 {'f1': 0.0011813349084465446} 0.0994 0.3379 Fri Apr 25 03:03:55 2025
Model Tuning 8544 20 IsolationForestOD {'contamination': 0.1, 'n_estimators': 5, 'max_samples': 5, 'max_features': 0.05, 'bootstrap': False, 'behaviour': 'old'} 0.0012 {'f1': 0.0011813349084465446} 0.1292 0.3519 Fri Apr 25 03:03:51 2025
Model Tuning 8544 20 IsolationForestOD {'contamination': 0.1, 'n_estimators': 5, 'max_samples': 5, 'max_features': 0.050009500000000005, 'bootstrap': False, 'behaviour': 'old'} 0.0012 {'f1': 0.0011813349084465446} 0.1274 0.3477 Fri Apr 25 03:03:50 2025
Model Tuning 8544 20 IsolationForestOD {'contamination': 0.1, 'n_estimators': 5, 'max_samples': 5, 'max_features': 0.05001900000000001, 'bootstrap': False, 'behaviour': 'old'} 0.0012 {'f1': 0.0011813349084465446} 0.1151 0.3501 Fri Apr 25 03:03:50 2025
Model Tuning 8544 20 IsolationForestOD {'contamination': 0.1, 'n_estimators': 5, 'max_samples': 5, 'max_features': 0.472893264264607, 'bootstrap': False, 'behaviour': 'old'} 0.0005 {'f1': 0.00046838407494145194} 0.1258 0.3477 Fri Apr 25 03:03:50 2025
Model Tuning 8544 20 IsolationForestOD {'contamination': 0.1, 'n_estimators': 5, 'max_samples': 5, 'max_features': 0.472902764264607, 'bootstrap': False, 'behaviour': 'old'} 0.0005 {'f1': 0.00046838407494145194} 0.1166 0.3585 Fri Apr 25 03:03:50 2025

We also provide the capability to visualize the results of each stage of the AutoML pipeline.

Algorithm Selection¶

The plot below shows the scores predicted by Algorithm Selection for each algorithm. The horizontal line shows the average score across all algorithms. Algorithms below the line are colored turquoise, whereas those that score higher than the mean are colored teal. Here we can see that IsolationForestOD achieved the highest predicted score (orange bar) and is chosen for the subsequent stages of the pipeline.

In [16]:
# Each trial is a row in a dataframe that contains
# Algorithm, Number of Samples, Number of Features, Hyperparameters, Score, Runtime, Memory Usage, Step as features
trials = est.completed_trials_summary_[est.completed_trials_summary_["Step"].str.contains('Model Selection')].copy()
name_of_score_column = f"Score ({est._inferred_score_metric[0].name})"
trials.replace([np.inf, -np.inf], np.nan, inplace=True)
trials.dropna(subset=[name_of_score_column], inplace=True)
scores = trials[name_of_score_column].tolist()
models = trials['Algorithm'].tolist()
colors = []

y_margin = 0.10 * (max(scores) - min(scores))
s = pd.Series(scores, index=models).sort_values(ascending=False)
s = s.dropna()
for f in s.keys():
    if f.strip() == est.selected_model_.strip():
        colors.append('orange')
    elif s[f] >= s.mean():
        colors.append('teal')
    else:
        colors.append('turquoise')

fig, ax = plt.subplots(1)
ax.set_title("Algorithm Selection Trials")
ax.set_ylim(min(scores) - y_margin, max(scores) + y_margin)
ax.set_ylabel(est._inferred_score_metric[0].name)
s.plot.bar(ax=ax, color=colors, edgecolor='black')
ax.axhline(y=s.mean(), color='black', linewidth=0.5)
plt.show()

Hyperparameter Tuning¶

Hyperparameter Tuning is the last stage of the AutoML pipeline, and focuses on improving the chosen algorithm's score on the dataset. We use a novel algorithm to search across many hyperparameter dimensions, converging automatically once optimal hyperparameters are identified. Each trial in the graph below represents a particular hyperparameter configuration for the selected model.

In [17]:
# Each trial is a row in a dataframe that contains
# Algorithm, Number of Samples, Number of Features, Hyperparameters, Score, Runtime, Memory Usage, Step as features
trials = est.completed_trials_summary_[est.completed_trials_summary_["Step"].str.contains('Model Tuning')].copy()
trials.replace([np.inf, -np.inf], np.nan, inplace=True)
trials.dropna(subset=[name_of_score_column], inplace=True)
trials.drop(trials[trials['Finished'] == -1].index, inplace=True)
trials['Finished'] = trials['Finished'].apply(
    lambda x: time.mktime(datetime.datetime.strptime(x, "%a %b %d %H:%M:%S %Y").timetuple()))
trials.sort_values(by=['Finished'], ascending=True, inplace=True)
scores = trials[name_of_score_column].tolist()
score = []
score.append(scores[0])
for i in range(1,len(scores)):
    if scores[i]>= score[i-1]:
        score.append(scores[i])
    else:
        score.append(score[i-1])
y_margin = 0.10 * (max(score) - min(score))

fig, ax = plt.subplots(1)
ax.set_title("Hyperparameter Tuning Trials")
ax.set_xlabel("Iteration $n$")
ax.set_ylabel(est._inferred_score_metric[0].name)
ax.grid(color='g', linestyle='-', linewidth=0.1)
ax.set_ylim(min(score) - y_margin, max(score) + y_margin)
ax.plot(range(1, len(trials) + 1), score, ':', marker='s', color='teal', markersize=3)
plt.show()

Confusion Matrix¶

Evaluating an anomaly detection model is slightly more involved than computing a single score. Essentially, we would like to know when the model was wrong and when it was right. We use a confusion matrix to help us visualize the model's behavior.

In [18]:
cm = confusion_matrix(y_test.astype(int), y_pred, labels=[False, True])
cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]

text = [[f"{y*100:.2f}" for y in x] for x in cm]
fig = ff.create_annotated_heatmap(cm, x=['Normal', 'Fraud'], y=['Normal', 'Fraud'], annotation_text=text, colorscale='Viridis')
fig.add_annotation(dict(font=dict(color="black",size=14),
                        x=0.5,
                        y=-0.15,
                        showarrow=False,
                        text="Predicted value",
                        xref="paper",
                        yref="paper"))

fig.add_annotation(dict(font=dict(color="black",size=14),
                        x=-0.15,
                        y=0.5,
                        showarrow=False,
                        text="Actual",
                        textangle=-90,
                        xref="paper",
                        yref="paper"))
fig.update_layout(margin=dict(t=50, l=150))
fig.show()
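Because the matrix above is row-normalized, it hides how rare fraud cases are in absolute terms. As a complement (our own addition, using the same labels and predictions as above), we can print the raw counts:

# Raw, unnormalized confusion-matrix counts for the same predictions.
cm_counts = confusion_matrix(y_test.astype(int), y_pred, labels=[False, True])
print(pd.DataFrame(cm_counts,
                   index=['Actual Normal', 'Actual Fraud'],
                   columns=['Predicted Normal', 'Predicted Fraud']))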

Specify a time budget to Oracle AutoMLx¶

The Oracle AutoMLx tool also allows a user to specify a time budget in seconds, using the time_budget argument to fit(). Given the small size of this dataset, we give a small time budget of 10 seconds. Note that this run also uses the fully unsupervised unsupervised_unify95 score metric, since we do not pass a labeled validation set this time.

In [19]:
est_timebudget = automlx.Pipeline(task='anomaly_detection', score_metric='unsupervised_unify95')
est_timebudget.fit(X_train, time_budget=10)
y_pred = est_timebudget.predict(X_test)
score_timebudget = f1_score(y_test.astype(int), y_pred)

print(f'F1-Score on test data : {score_timebudget}')
[2025-04-25 03:04:02,809] [automlx.interface] Dataset shape: (17088,19)
[2025-04-25 03:04:02,906] [automlx.data_transform] Running preprocessing. Number of features: 20
[2025-04-25 03:04:03,080] [automlx.data_transform] Preprocessing completed. Took 0.174 secs
[2025-04-25 03:04:03,122] [automlx.process] Running Model Generation
[2025-04-25 03:04:03,169] [automlx.process] Model Generation completed.
[2025-04-25 03:04:03,235] [automlx.model_selection] Running Model Selection
[2025-04-25 03:04:13,204] [automlx.model_selection] Model Selection completed - Took 9.969 sec - Selected models: [['MinCovOD']]
[2025-04-25 03:04:13,215] [automlx.process] Timebudget exceeded for steps ['HyperparameterOptimization'], skipping processing of [MinCovOD (InputTargetDataTransformer_MinCovOD)]
[2025-04-25 03:04:13,296] [automlx.interface] Re-fitting pipeline
[2025-04-25 03:04:13,307] [automlx.final_fit] Skipping updating parameter seed, already fixed by FinalFit_c8b84ec6-f
F1-Score on test data : 0.3114754098360656
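We can now compare the two runs directly; both scores are F1 on the same held-out test set:

# Compare the default pipeline with the time-budgeted one.
print(f"F1 with default settings : {score_default:.4f}")
print(f"F1 with 10s time budget  : {score_timebudget:.4f}")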

Machine Learning Explainability¶

For a variety of decision-making tasks, getting only a prediction as model output is not sufficient. A user may wish to know why the model output that prediction, or which data features were relevant to it. For that purpose, the Oracle AutoMLx solution defines the MLExplainer object, which can compute a variety of model explanations.

Initialize an MLExplainer¶

The MLExplainer object takes as arguments the trained model, the training data, and the task. If you know the labels for your dataset, you may provide them; however, since we are dealing with anomaly detection, they are optional. When labels are not provided, the model's predictions are used instead.

In [20]:
explainer = automlx.MLExplainer(est,
                              X_train,
                              target_names=['Normal', 'Anomaly'],
                              task='anomaly_detection')

Model Explanations (Global Feature Importance)¶

The notion of Global Feature Importance intuitively measures how much the model's performance (relative to the model's original predictions, or to the provided training labels if available) would change if a given feature were dropped from the dataset and the model retrained. (Note that this is unlike the default explainers for classification and regression tasks, which explain the model as if it were not retrained. Also unlike those supervised explainers, the anomaly detection explainer does not support interventional explanations.) Note that this notion of feature importance still considers each feature independently of all other features.

Compute the importance¶

By default, we use a permutation method that measures the importance of each feature in turn; it therefore runs in linear time with respect to the number of features in the dataset.

The method explain_model() computes these feature importances. It also provides 95% confidence intervals for each feature importance.

In [21]:
result_explain_model_default = explainer.explain_model()

Visualization¶

There are two ways to display the explanation's results:

  • to_dataframe() will return a dataframe of the results.
  • show_in_notebook() will show the results as a bar plot.

The features are returned in decreasing order of importance.

In [22]:
result_explain_model_default.to_dataframe()
Out[22]:
Feature Attribution Lower Bound Upper Bound
0 V19 0.062594 0.058448 0.066740
1 V8 0.047495 0.042024 0.052967
2 V4 0.035441 0.030222 0.040659
3 V3 0.030964 0.028117 0.033810
4 V14 0.013733 0.011846 0.015620
12 V11 0.000000 0.000000 0.000000
18 V18 0.000000 0.000000 0.000000
17 V17 0.000000 0.000000 0.000000
16 V16 0.000000 0.000000 0.000000
15 V15 0.000000 0.000000 0.000000
14 V13 0.000000 0.000000 0.000000
13 V12 0.000000 0.000000 0.000000
10 V9 0.000000 0.000000 0.000000
11 V10 0.000000 0.000000 0.000000
9 V7 0.000000 0.000000 0.000000
8 V6 0.000000 0.000000 0.000000
7 V5 0.000000 0.000000 0.000000
6 V2 0.000000 0.000000 0.000000
5 V1 0.000000 0.000000 0.000000
19 V20 0.000000 0.000000 0.000000
In [23]:
result_explain_model_default.show_in_notebook()

Feature Dependence Explanations (Partial Dependence Plots)¶

Another way to measure dependency on a feature is through a partial dependence plot (PDP). Given a dataset, a PDP displays the average output of the model as a function of the value of the selected set of features.

The X-axis is the value of the feature V17, and the Y-axis is the model's corresponding average prediction over the whole dataset. The shaded interval corresponds to a 95% confidence interval for that average.

The histogram on top shows the distribution of the value of the feature V17 in the dataset.

In [24]:
result_explain_feature_dependence_default = explainer.explain_feature_dependence('V17')
result_explain_feature_dependence_default.show_in_notebook()

Prediction Explanations¶

In addition to the model's global behavior, a user might be curious about the decision logic behind a specific prediction made by the model, or the impact of specific feature values on that prediction. The Oracle AutoMLx solution offers prediction explanations to address such questions.

Local Feature Importance¶

Given a data sample, one can also obtain the local importance, which is the importance of the features for the model's prediction on that sample. In the following cells, we consider the first training sample that the model predicts to be an anomaly. The function explain_prediction() computes the local importance for a given sample.

In the plot, V8=0.8878 means that the value of feature V8 for this sample is 0.8878. Removing that feature and retraining the model would change the model's prediction by the magnitude of the bar. That is, in this case, the model's predicted probability that the point is anomalous is approximately 0.4% larger because the model was able to observe the value of V8.

In [25]:
# y_pred currently holds test-set predictions; recompute on the
# training set, since we explain a training sample below.
y_pred_train = est.predict(X_train)
anomaly_indices = np.where(y_pred_train == 1)[0]
In [26]:
index = anomaly_indices[0]
result_explain_prediction_default = explainer.explain_prediction(X_train.iloc[index:index+1,:])
result_explain_prediction_default[0].show_in_notebook()

Interactive What-If Explanations¶

The Oracle AutoMLx solution also offers a What-If tool to explain a trained ML model's predictions through a simple interactive interface.

You can use the What-If explainer to explore and immediately visualize how changing a sample's value affects the model's prediction. Furthermore, What-If can be used to visualize how the model's predictions relate to any feature of the dataset.

In [27]:
explainer.explore_whatif(X_test, y_test)

Counterfactual Explainers¶

Counterfactual explainers are another set of advanced features supported by Oracle AutoMLx. They help explain a trained ML model's predictions by identifying the minimal set of changes necessary to flip the model's decision, resulting in a different outcome. To achieve this, the solution frames the explanation process as an optimization problem, similar to adversarial discovery, while ensuring that the counterfactual perturbations are feasible and diverse.

With the Oracle AutoMLx solution, users are guaranteed a near-zero failure rate in generating a set of counterfactual explanations; the explainer can only fail if the reference training set doesn't contain any example with the desired class. AutoMLx also supports simple constraints on features, via features_to_fix and permitted_range, to ensure the feasibility of the generated counterfactual examples. Additionally, users can adjust tunable parameters to control the proximity and diversity of the explanations relative to the original input.

The Oracle AutoMLx solution supports the following strategy for creating counterfactual examples:

  • ace: The AutoMLx counterfactual explainer, introduced by Oracle Labs, which uses KDTree structures to find a set of nearest but diverse counterfactuals per sample.

The final results can be returned either through the interactive What-If interface, to show the model's prediction sensitivity, or as static tables and figures.

In [28]:
explainer.configure_explain_counterfactual(strategy='ace')
explanations = explainer.explain_counterfactual(X_test[0:1],
                                               n_counterfactuals=3,
                                               desired_pred='auto')
explanations[0].show_in_notebook()

Aggregate Local Feature Importance & Local Feature Importance Built-in Sampling¶

We now summarize all of the individual local feature importance explanations into one single aggregate explanation.

To speed up the computation of the local feature importance explanations, we enable the explainer's built-in sampling.

In [29]:
explainer.configure_explain_prediction(sampling={'technique': 'random', 'n_samples': 2000})
In [30]:
# We select 5 random instances here as an example and show the aggregate explanation of those instances.
local_explanations = explainer.explain_prediction(X_train.sample(n=5))
alfi = explainer.aggregate(explanations=local_explanations)
alfi.show_in_notebook()

Advanced Feature Importance Options¶

We now demonstrate more advanced options for computing feature importance. Here, we will explain a custom isolation forest model from the PyOD package. Note that the MLExplainer object is capable of explaining any anomaly detection model, as long as the model follows a pyod-style interface with predict and predict_proba functions (a minimal sketch of such an interface follows the next cell).

In [31]:
pyod_model = IForest()
pyod_model.fit(X_train)  # pyod detectors are fit unsupervised; labels are not used

y_pred = pd.Series(pyod_model.predict(X_train), index=X_train.index)
explainer_pyod = automlx.MLExplainer(pyod_model,
                                   X_train,
                                   target_names=['Normal', 'Anomaly'],
                                   task="anomaly_detection")

Changing the number of iterations¶

One can modify the number of iterations, n_iter, used to estimate the global importance of the model or the local importance of a prediction.

Increasing n_iter increases computation time linearly; however, it provides more accurate importance estimates, thereby decreasing the variance across repeated calls to explain_model/explain_prediction.

The default value is auto, which selects a suitable value based on the chosen method of explanation. Note that decreasing the number of iterations to 1 means that confidence intervals are no longer available.

In this example, because we are explaining a different model, the order of the most important features has changed.

In [32]:
result_explain_model_n_iter_1 = explainer_pyod.explain_model(n_iter=1)
result_explain_model_n_iter_1.show_in_notebook()

Including the effects of feature interactions (with Shapley feature importance)¶

The Oracle AutoMLx solution allows one to change how the effects of feature interactions are taken into account. This can be done through the tabulator_type argument of both the global and local importance methods.

tabulator_type can be set to one of the following four options: permutation, kernel_shap, shapley, or shap_pi.

  • permutation: This is the default method of the MLExplainer object; its behaviour was described above.

  • kernel_shap: Feature importance attributions will be calculated using an approximation of the Shapley value method. It typically provides relatively high-quality approximations; however, it currently does not provide confidence intervals.

  • shapley: Feature importance is computed using the popular game-theoretic Shapley value method. Technically, this measures the importance of each feature while including the effect of all feature interactions. As a result, it runs in exponential time with respect to the number of features in the dataset. Because this method includes the interaction effects of the other features, two features that contain duplicated information will each appear less important. Note that the interpretation of this method's result is a bit different from the permutation method's result. An interested reader may find the Interpretable Machine Learning book (https://christophm.github.io/interpretable-ml-book/) a good source for learning more about it.

  • shap_pi: Feature importance attributions will be computed using an approximation of the Shapley value method. It runs in linear time, but may miss the effect of interactions between some features, which may therefore produce lower-quality results. Most likely, you will notice that this method yields larger confidence intervals than the other two.

Summary: permutation can miss important features for anomaly detection. Exact SHAP (shapley) doesn't, but its running time is exponential in the number of features. kernel_shap is an approximation of the exact SHAP method that does not provide confidence intervals. shap_pi runs in linear time and is thus faster than exact SHAP and kernel_shap, but it is unstable and noisy, which leads to lower-quality approximations.
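Since tabulator_type is described above as an argument of both the global and local importance methods, a linear-time Shapley approximation for the global explanation could presumably be requested as follows (a sketch under that assumption; expect wider confidence intervals than with permutation):

# Sketch: linear-time Shapley approximation for global importance.
result_explain_model_shap_pi = explainer_pyod.explain_model(tabulator_type="shap_pi")
result_explain_model_shap_pi.show_in_notebook()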

Local feature importance with kernel_shap¶
In [33]:
explainer_pyod.configure_explain_prediction(tabulator_type="kernel_shap",
                                           sampling={'technique': 'random', 'n_samples': 2000})
In [34]:
anomaly_indices = np.where(y_pred == 1)[0]
index = anomaly_indices[0]
result_explain_prediction_kernel_shap = explainer_pyod.explain_prediction(X_train.iloc[index:index+1,:])
result_explain_prediction_kernel_shap[0].show_in_notebook()

Advanced Feature Dependence Options (ALE)¶

We now show how to use an alternative method for computing feature dependence: accumulated local effects (ALE). ALE explanations are sometimes considered a better alternative to PDPs when features are correlated, because ALE does not evaluate the model outside of its training distribution. For more information, see https://christophm.github.io/interpretable-ml-book/ale.html.

Given a dataset, an ALE plot displays the average change in the output of the model, accumulated over multiple small changes in one or two features, while all other features are held fixed. By default, ALE explanations are centered around 0; thus, unlike PDPs, ALEs show the change in the prediction caused by changing a given feature, rather than the model's average prediction for a particular feature value.

The X-axis is the value of the V17 feature, and the Y-axis is the corresponding computed ALE (in units of the model's output).

The histogram on top shows the distribution of the value of the V17 feature in the dataset.

In [35]:
explainer_pyod.configure_explain_feature_dependence(explanation_type='ale')
result_explain_feature_dependence_default = explainer_pyod.explain_feature_dependence(['V17'])
result_explain_feature_dependence_default.show_in_notebook()
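Since ALE can accumulate changes over one or two features, a two-feature dependence plot can presumably be obtained by passing both feature names in the list (a sketch; the pairing with V14 is our arbitrary choice):

# Sketch: two-feature ALE, assuming the same list-based API as above.
result_ale_two_features = explainer_pyod.explain_feature_dependence(['V17', 'V14'])
result_ale_two_features.show_in_notebook()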

References¶

  • Oracle AutoML http://www.vldb.org/pvldb/vol13/p3166-yakovlev.pdf
  • scikit-learn https://scikit-learn.org/stable/
  • Interpretable Machine Learning https://christophm.github.io/interpretable-ml-book/
  • LIME https://arxiv.org/pdf/1602.04938
  • OpenML (Credit Card Fraud Dataset) https://www.openml.org/d/1597