Fairness Demo notebook.
by the Oracle AutoMLx Team
Copyright © 2025, Oracle and/or its affiliates.
Licensed under the Universal Permissive License v 1.0 as shown at https://oss.oracle.com/licenses/upl/
In this notebook, we explore the fairness features of the AutoMLx package. We start by training an AutoML model on the Census Income dataset. Later, we provide examples of how to evaluate the fairness of the model and the dataset. We also explore how the provided explanation techniques can help us gain more insight into the fairness of our model.
%matplotlib inline
%load_ext autoreload
%autoreload 2
Load the required modules.
import pandas as pd
import matplotlib.pyplot as plt
import plotly.express as px
import sklearn
from sklearn.metrics import roc_auc_score
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
# Settings for plots
plt.rcParams['figure.figsize'] = [10, 7]
plt.rcParams['font.size'] = 15
import automlx
Here, we give an overview of the key features. We first load the Census Income dataset, train a model on it, evaluate its accuracy and fairness, and compute its fairness feature importance. All of these steps are revisited in more detail throughout the rest of the notebook.
dataset = fetch_openml(name='adult', as_frame=True)
df, y = dataset.data, dataset.target
# Several of the columns are incorrectly labeled as category type in the original dataset
numeric_columns = ['age', 'capitalgain', 'capitalloss', 'hoursperweek']
for col in df.columns:
if col in numeric_columns:
df[col] = df[col].astype(int)
X_train, X_test, y_train, y_test = train_test_split(
df, y.map({">50K": 1, "<=50K": 0}).astype(int), train_size=0.8, random_state=12345
)
X_train, X_val, y_train, y_val = train_test_split(
X_train, y_train, train_size=0.75, random_state=12345
)
X_train.shape, X_val.shape, X_test.shape
((29304, 14), (9769, 14), (9769, 14))
The AutoML API is quite simple to work with. We create an instance of the pipeline. Next, the training data is passed to fit.
model = automlx.Pipeline(task='classification')
model.fit(X_train, y_train)
[2025-04-25 03:08:03,813] [automlx.backend] Overwriting ray session directory to /tmp/ct1h5q1c/ray, which will be deleted at engine shutdown. If you wish to retain ray logs, provide _temp_dir in ray_setup dict of engine_opts when initializing the AutoMLx engine.
[2025-04-25 03:08:07,973] [automlx.interface] Dataset shape: (29304,14)
[2025-04-25 03:08:12,868] [sanerec.autotuning.parameter] Hyperparameter epsilon autotune range is set to its validation range. This could lead to long training times
[2025-04-25 03:08:13,548] [sanerec.autotuning.parameter] Hyperparameter repeat_quality_threshold autotune range is set to its validation range. This could lead to long training times
[2025-04-25 03:08:13,561] [sanerec.autotuning.parameter] Hyperparameter scope autotune range is set to its validation range. This could lead to long training times
[2025-04-25 03:08:13,642] [automlx.data_transform] Running preprocessing. Number of features: 15
[2025-04-25 03:08:14,283] [automlx.data_transform] Preprocessing completed. Took 0.641 secs
[2025-04-25 03:08:14,310] [automlx.process] Running Model Generation
[2025-04-25 03:08:14,361] [automlx.process] KNeighborsClassifier is disabled. The KNeighborsClassifier model is only recommended for datasets with less than 10000 samples and 1000 features.
[2025-04-25 03:08:14,361] [automlx.process] SVC is disabled. The SVC model is only recommended for datasets with less than 10000 samples and 1000 features.
[2025-04-25 03:08:14,363] [automlx.process] Model Generation completed.
[2025-04-25 03:08:14,432] [automlx.model_selection] Running Model Selection
(run pid=2668571) [LightGBM] [Info] Number of positive: 2000, number of negative: 2000
(run pid=2668571) [LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000439 seconds.
(run pid=2668571) You can set `force_row_wise=true` to remove the overhead.
(run pid=2668571) And if memory is not enough, you can set `force_col_wise=true`.
(run pid=2668571) [LightGBM] [Info] Total Bins 391
(run pid=2668571) [LightGBM] [Info] Number of data points in the train set: 4000, number of used features: 15
(run pid=2668571) [LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500000 -> initscore=0.000000
[2025-04-25 03:08:33,792] [automlx.model_selection] Model Selection completed - Took 19.360 sec - Selected models: [['XGBClassifier']]
[2025-04-25 03:08:33,820] [automlx.adaptive_sampling] Running Adaptive Sampling. Dataset shape: (29304,16).
[2025-04-25 03:08:35,976] [automlx.trials] Adaptive Sampling completed - Took 2.1557 sec.
[2025-04-25 03:08:36,067] [automlx.feature_selection] Starting feature ranking for XGBClassifier
[2025-04-25 03:08:43,777] [automlx.feature_selection] Feature Selection completed. Took 7.726 secs.
[2025-04-25 03:08:43,832] [automlx.trials] Running Model Tuning for ['XGBClassifier']
[2025-04-25 03:09:28,601] [automlx.trials] Best parameters for XGBClassifier: {'learning_rate': 0.10242113515453982, 'min_child_weight': 12, 'max_depth': 4, 'reg_alpha': 0, 'booster': 'gbtree', 'reg_lambda': 0.01878279410038923, 'n_estimators': 143, 'use_label_encoder': False}
[2025-04-25 03:09:28,602] [automlx.trials] Model Tuning completed. Took: 44.770 secs
[2025-04-25 03:09:35,259] [automlx.interface] Re-fitting pipeline
[2025-04-25 03:09:35,275] [automlx.final_fit] Skipping updating parameter seed, already fixed by FinalFit_4c481463-9
[2025-04-25 03:09:37,613] [automlx.interface] AutoMLx completed.
<automlx._interface.classifier.AutoClassifier at 0x150a49d219a0>
y_proba = model.predict_proba(X_test)
score_original = roc_auc_score(y_test, y_proba[:, 1])
print(f'Score on test data: {score_original:.2f}')
Score on test data: 0.91
Among the several fairness metrics available in the AutoMLx package, we compute the statistical parity of the model on test data.
from automlx.fairness.metrics import ModelStatisticalParityScorer
fairness_score = ModelStatisticalParityScorer(protected_attributes='sex')
parity_test_model = fairness_score(model, X_test)
print(f'Statistical parity of the model on test data (lower is better): {parity_test_model:.2f}')
Statistical parity of the model on test data (lower is better): 0.18
Using the fairness feature importance, we can gain insight into which features contribute the most to the model's unfairness.
explainer = automlx.MLExplainer(model,
X_train,
y_train,
target_names=["<=50K", ">50K"],
task="classification")
fairness_exp = explainer.explain_model_fairness(protected_attributes='sex',
scoring_metric='statistical_parity')
fairness_exp.show_in_notebook()
AutoMLx provides a bias mitigation tool as well. We first need to initialize a ModelBiasMitigator. It requires a fitted model (the base estimator), the name of the protected attributes to use, a fairness metric, and an accuracy metric.
from automlx.fairness.bias_mitigation import ModelBiasMitigator
bias_mitigated_model = ModelBiasMitigator(
model,
protected_attribute_names="sex",
fairness_metric="equalized_odds",
accuracy_metric="balanced_accuracy",
random_seed=12345,
)
The ModelBiasMitigator can be called with the usual scikit-learn interface, notably being trained with a single call to fit.
bias_mitigated_model.fit(X_val, y_val)
<automlx.fairness.bias_mitigation._sklearn.ModelBiasMitigator at 0x150a8dbe6370>
The model can easily be used for inference.
bias_mitigated_model.predict(X_test)
array([0, 1, 0, ..., 1, 0, 0])
We can also visualize all of the best models that were found by our approach using a single show_tradeoff call.
bias_mitigated_model.show_tradeoff(hide_inadmissible=False)
A summary of these models can be accessed as shown below.
bias_mitigated_model.tradeoff_summary_
| | equalized_odds | balanced_accuracy | multiplier_sex=Female | multiplier_sex=Male |
|---|---|---|---|---|
0 | 0.006916 | 0.608927 | 0.111534 | 0.199131 |
1 | 0.023170 | 0.624593 | 0.193148 | 0.233259 |
2 | 0.028703 | 0.628152 | 0.256520 | 0.233259 |
3 | 0.036227 | 0.642396 | 0.329184 | 0.274094 |
4 | 0.046819 | 0.759193 | 1.441927 | 0.840396 |
5 | 0.052286 | 0.793150 | 3.661628 | 1.334403 |
6 | 0.095597 | 0.795917 | 1.767152 | 1.365998 |
7 | 0.097830 | 0.796030 | 1.700147 | 1.364635 |
8 | 0.129705 | 0.817235 | 8.367842 | 3.129173 |
9 | 0.151287 | 0.819858 | 5.442261 | 2.822292 |
10 | 0.184541 | 0.822028 | 4.626349 | 3.146395 |
11 | 0.216564 | 0.822988 | 3.661628 | 3.447235 |
12 | 0.237449 | 0.824125 | 2.028335 | 3.146395 |
Each of these models can be selected as the final bias-mitigated model, and inference can be performed as before. For example, the model with index=1 can be selected as shown below.
bias_mitigated_model.select_model(1)
dataset = fetch_openml(name='adult', as_frame=True)
df, y = dataset.data, dataset.target
Let's look at a few of the rows in the data.
df.head()
| | age | workclass | fnlwgt | education | education-num | marital-status | occupation | relationship | race | sex | capitalgain | capitalloss | hoursperweek | native-country |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2 | State-gov | 77516.0 | Bachelors | 13.0 | Never-married | Adm-clerical | Not-in-family | White | Male | 1 | 0 | 2 | United-States |
1 | 3 | Self-emp-not-inc | 83311.0 | Bachelors | 13.0 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0 | 0 | 0 | United-States |
2 | 2 | Private | 215646.0 | HS-grad | 9.0 | Divorced | Handlers-cleaners | Not-in-family | White | Male | 0 | 0 | 2 | United-States |
3 | 3 | Private | 234721.0 | 11th | 7.0 | Married-civ-spouse | Handlers-cleaners | Husband | Black | Male | 0 | 0 | 2 | United-States |
4 | 1 | Private | 338409.0 | Bachelors | 13.0 | Married-civ-spouse | Prof-specialty | Wife | Black | Female | 0 | 0 | 2 | Cuba |
The Adult dataset contains a mix of numerical and string data, making it a challenging problem to train machine learning models on.
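For instance, a quick look at the column dtypes (a small optional check, not part of the original walkthrough) confirms this mix of numeric and categorical features.
# Inspect which columns pandas treats as numeric vs. category/object
df.dtypes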
We visualize the distribution of the target variable in the training data.
y_df = pd.DataFrame(y)
y_df.columns = ['income']
fig = px.histogram(y_df, x="income")
fig.show()
We now visualize the distribution of the target variable conditioned on the values of sex. We can already see some biases in this dataset.
df1 = pd.concat([df, y_df], axis=1)
df1 = df1.groupby('sex')['income'].value_counts(normalize=True)
df1 = df1.mul(100).reset_index()
df1.columns = ['sex', 'income', 'percent']
fig = px.bar(df1, x="sex", y="percent", color="income", barmode="group")
fig.show()
We now separate the target labels (y) from the features (X) and split the data into training (60%), validation (20%), and test (20%) sets. The training set will be used to create a Machine Learning model using AutoML, the validation set will be used later for bias mitigation, and the test set will be used to evaluate the model's performance on unseen data.
# Several of the columns are incorrectly labeled as category type in the original dataset
numeric_columns = ['age', 'capitalgain', 'capitalloss', 'hoursperweek']
for col in df.columns:
if col in numeric_columns:
df[col] = df[col].astype(int)
X_train, X_test, y_train, y_test = train_test_split(
df, y.map({">50K": 1, "<=50K": 0}).astype(int), train_size=0.8, random_state=12345
)
X_train, X_val, y_train, y_val = train_test_split(
X_train, y_train, train_size=0.75, random_state=12345
)
X_train.shape, X_val.shape, X_test.shape
((29304, 14), (9769, 14), (9769, 14))
Protected attributes are features that may not be used as the basis for decisions (for example, race, gender, etc.). When machine learning is applied to decision-making processes involving humans, one should not only look for models with good performance, but also for models that do not discriminate against protected population subgroups.
We provide a table summarizing the fairness metrics in the AutoMLx package. Choosing the right fairness metric for a particular application is critical; it requires domain knowledge of the complete sociotechnical system. Moreover, different metrics bring in different perspectives and sometimes the data/model might need to be analyzed for multiple fairness metrics. Therefore, this choice is based on a combination of the domain, task at hand, societal impact of model predictions, policies and regulations, legal considerations, etc. and cannot be fully automated. However, we hope that the table below will help give some insights into which fairness metric is best for your application.
Machine learning models that decide outcomes affecting individuals can either be assistive or punitive. For example, a model that classifies whether or not a job applicant should be interviewed is assistive, because the model is screening for individuals that should receive a positive outcome. In contrast, a model that classifies loan applicants as high risk is punitive, because the model is screening for individuals that should receive a negative outcome. For models used in assistive applications, it is typically important to minimize false negatives (for example, to ensure individuals who deserve to be interviewed are interviewed), whereas in punitive applications, it is usually important to minimize false positives (for example, to avoid denying loans to individuals that have low credit risk). In the spirit of fairness, one should therefore aim to minimize the disparity in false negative rates across protected groups in assistive applications whilst minimizing the disparity in false positive rates for punitive applications. In the following table, we have classified each metric based on whether or not it is most appropriate for models used in assistive or punitive applications (or both). For further explanations, please refer to this book.
| Metric | Dataset | Model | Punitive | Assistive | Perfect score means |
|---|---|---|---|---|---|
| Consistency | ✓ | | NA | NA | Neighbors (k-means) have the same labels |
| Smoothed EDF | ✓ | | NA | NA | Sub-populations have equal probability of positive label (with log scaling of deviation) |
| Statistical Parity | ✓ | ✓ | ✓ | ✓ | Sub-populations have equal probability of positive prediction |
| True Positive Rates | | ✓ | | ✓ | Sub-populations have equal probability of positive prediction when their true label is positive |
| False Positive Rates | | ✓ | ✓ | | Sub-populations have equal probability of positive prediction when their true label is negative |
| False Negative Rates | | ✓ | | ✓ | Sub-populations have equal probability of negative prediction when their true label is positive |
| False Omission Rates | | ✓ | | ✓ | Sub-populations have equal probability of a positive true label when their prediction is negative |
| False Discovery Rates | | ✓ | ✓ | | Sub-populations have equal probability of a negative true label when their prediction is positive |
| Equalized Odds | | ✓ | ✓ | ✓ | Sub-populations have equal true positive rate and equal false positive rate |
| Error Rates | | ✓ | ✓ | ✓ | Sub-populations have equal probability of a false prediction |
| Theil Index | | ✓ | ✓ | ✓ | Error rates are the same for sub-populations and whole population (deviations are measured using entropy) |
The automlx.fairness.metrics module provides metrics dedicated to assessing whether model predictions and/or the true labels in a dataset comply with a particular fairness metric.
For this example, we will take a look at the statistical parity metric. This metric, also known as demographic parity, measures how much a protected group's rate of positive outcomes differs from that of the rest of the population. Such fairness metrics quantify disparities in outcomes across the demographic groups defined by protected attributes in the data, and they are therefore to be minimized in order to decrease discrepancies in model predictions with respect to specific groups of people. Traditional classification metrics such as accuracy, on the other hand, are to be maximized.
In the context of the Adult Census Income dataset, if we want to measure fairness with respect to the sex attribute, statistical parity corresponds to the disparity between the model's rates of predicting a >50K income for men and for women.
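To make the definition concrete, here is a minimal by-hand sketch (not the AutoMLx implementation, and reusing the model trained in the overview above): it computes the positive-prediction rate per sex subgroup and takes their absolute difference, which should roughly match the scorer results reported below.
# By-hand illustration of statistical parity for a single binary protected attribute:
# the absolute difference in positive-prediction rates between the two sex subgroups.
y_pred_test = pd.Series(model.predict(X_test), index=X_test.index)
rates = y_pred_test.groupby(X_test['sex']).mean()
print(rates)
print(f"Hand-computed disparity: {abs(rates.max() - rates.min()):.2f}")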
Model fairness metrics are available as scikit-learn compatible scorers, taking in a list of protected_attributes at creation and then being called with a model, X, and y on which to measure fairness. Note that the fairness features are not limited to AutoML models.
By default, the fairness metric will measure the difference between a subgroup's outcome and that of the rest of the population, returning the mean disparity over all subgroups. These two options can be changed at the creation of the metric, using the distance_measure and reduction arguments, respectively.
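As a sketch of those options, the scorer below passes the difference-based distance measure explicitly and requests a worst-case reduction; 'max' is an assumed option value used only to illustrate where the reduction argument goes, and the model from the overview above is reused.
# Hypothetical configuration: explicit distance measure and a worst-case ('max') reduction
# ('max' is an assumed value; adjust to the options supported by your AutoMLx version).
worst_case_parity = ModelStatisticalParityScorer(
    protected_attributes='sex',
    distance_measure='diff',
    reduction='max',
)
print(f"Worst-case statistical parity: {worst_case_parity(model, X_test):.2f}")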
We first train a simple sklearn random forest and then evaluate its performance and fairness.
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import OneHotEncoder
sklearn_model = sklearn.pipeline.Pipeline(
steps=[("preprocessor", OneHotEncoder(handle_unknown="ignore")), ("classifier", RandomForestClassifier())]
)
sklearn_model.fit(X_train, y_train)
Pipeline(steps=[('preprocessor', OneHotEncoder(handle_unknown='ignore')), ('classifier', RandomForestClassifier())])
We use the roc_auc_score scoring metric to evaluate the performance of this model on unseen data (X_test).
y_proba = sklearn_model.predict_proba(X_test)
score = roc_auc_score(y_test, y_proba[:, 1])
print(f'Score on test data: {score:.2f}')
Score on test data: 0.90
Now, we can also compute the statistical parity of the model on test data.
fairness_score = ModelStatisticalParityScorer(protected_attributes='sex')
parity_test_sklearn_model = fairness_score(sklearn_model, X_test)
print(f'Statistical parity of the sklearn model on test data (lower is better): {parity_test_sklearn_model:.2f}')
Statistical parity of the sklearn model on test data (lower is better): 0.19
The fairness metrics can also be used to score AutoML models, which are even easier to work with, because the AutoML pipeline handles all of the dataset pre-processing and selects the best learning algorithm.
model = automlx.Pipeline(task='classification')
model.fit(X_train, y_train)
[2025-04-25 03:11:35,769] [automlx.interface] Dataset shape: (29304,14)
[2025-04-25 03:11:35,874] [automlx.data_transform] Running preprocessing. Number of features: 15
[2025-04-25 03:11:36,280] [automlx.data_transform] Preprocessing completed. Took 0.406 secs
[2025-04-25 03:11:36,306] [automlx.process] Running Model Generation
[2025-04-25 03:11:36,357] [automlx.process] KNeighborsClassifier is disabled. The KNeighborsClassifier model is only recommended for datasets with less than 10000 samples and 1000 features.
[2025-04-25 03:11:36,357] [automlx.process] SVC is disabled. The SVC model is only recommended for datasets with less than 10000 samples and 1000 features.
[2025-04-25 03:11:36,358] [automlx.process] Model Generation completed.
[2025-04-25 03:11:36,430] [automlx.model_selection] Running Model Selection
[2025-04-25 03:11:53,842] [automlx.model_selection] Model Selection completed - Took 17.412 sec - Selected models: [['XGBClassifier']]
[2025-04-25 03:11:53,871] [automlx.adaptive_sampling] Running Adaptive Sampling. Dataset shape: (29304,16).
[2025-04-25 03:11:55,656] [automlx.trials] Adaptive Sampling completed - Took 1.7846 sec.
[2025-04-25 03:11:55,771] [automlx.feature_selection] Starting feature ranking for XGBClassifier
[2025-04-25 03:12:03,593] [automlx.feature_selection] Feature Selection completed. Took 7.839 secs.
[2025-04-25 03:12:03,647] [automlx.trials] Running Model Tuning for ['XGBClassifier']
[2025-04-25 03:12:49,388] [automlx.trials] Best parameters for XGBClassifier: {'learning_rate': 0.10242113515453982, 'min_child_weight': 12, 'max_depth': 4, 'reg_alpha': 0, 'booster': 'gbtree', 'reg_lambda': 0.01878279410038923, 'n_estimators': 143, 'use_label_encoder': False}
[2025-04-25 03:12:49,389] [automlx.trials] Model Tuning completed. Took: 45.742 secs
[2025-04-25 03:12:56,342] [automlx.interface] Re-fitting pipeline
[2025-04-25 03:12:56,357] [automlx.final_fit] Skipping updating parameter seed, already fixed by FinalFit_2070da66-2
[2025-04-25 03:12:58,079] [automlx.interface] AutoMLx completed.
<automlx._interface.classifier.AutoClassifier at 0x1501d42f5f40>
Again, we use the roc_auc_score scoring metric to evaluate the performance of this model on unseen data (X_test).
y_proba = model.predict_proba(X_test)
score_original = roc_auc_score(y_test, y_proba[:, 1])
print(f'Score on test data: {score_original:.2f}')
Score on test data: 0.91
We now continue with this model to showcase the rest of the fairness features and metrics; however, everything would also work with a scikit-learn model.
fairness_score = ModelStatisticalParityScorer(protected_attributes='sex')
parity_test_model = fairness_score(model, X_test)
print(f'Statistical parity of the model on test data (lower is better): {parity_test_model:.2f}')
Statistical parity of the model on test data (lower is better): 0.18
Below is another way to visualize statistical parity: the difference between the bars amounts to the statistical disparity.
y_pred = model.predict(X_train)
df_predict = X_train.copy()
df_predict['model prediction'] = y_pred
# predict() returns 0/1 labels, so the per-group mean is the rate of favorable predictions
pred_per_sex = df_predict.groupby('sex')['model prediction'].mean().reset_index()
pred_per_sex = pred_per_sex.rename(columns={'model prediction': 'average prediction'})
fig = px.bar(pred_per_sex, x='sex', y='average prediction')
fig.show()
We can see here that our tuned model has a statistical disparity with respect to sex of 0.33, meaning that among the two values of sex in the dataset, the model predicts the favorable outcome for one sex at a rate 33 percentage points higher than for the other.
Model fairness metrics are also available as functions taking as inputs y_true, y_pred, and subgroups. Note, though, that statistical parity, by definition, does not require the true labels.
from automlx.fairness.metrics import model_statistical_parity
y_pred = model.predict(X_test)
subgroups = X_test[['sex']]
parity_test_model = model_statistical_parity(y_pred=y_pred, subgroups=subgroups)
print(f'Statistical parity of the model on test data (lower is better): {parity_test_model:.2f}')
Statistical parity of the model on test data (lower is better): 0.18
Given a dataset with some ground truth labels, we can check whether those true labels satisfy a particular fairness metric of concern. In this context, statistical parity measures the disparity of positive label rates between subgroups and the rest of the population.
Dataset fairness metrics are available as scikit-learn compatible scorers, taking in a list of protected_attributes at creation and then being called with a model, X, and y on which to measure fairness, with model being an optional argument that is ignored.
from automlx.fairness.metrics import DatasetStatisticalParityScorer
DSPS = DatasetStatisticalParityScorer(protected_attributes='sex')
parity_test_data = DSPS(X=X_test, y_true=y_test)
Dataset fairness metrics are also available as functions taking as inputs y_true and subgroups.
from automlx.fairness.metrics import dataset_statistical_parity
subgroups = X_test[['sex']]
parity_test_data = dataset_statistical_parity(y_test, subgroups)
print(f'Statistical parity of the test data (lower is better): {parity_test_data:.2f}')
Statistical parity of the test data (lower is better): 0.20
We can see here that the test set of the Adult Census Income dataset has a statistical parity with respect to sex of 0.20, meaning that the rate of >50K labels is 20 percentage points higher for men than for women.
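As a quick by-hand check (not the AutoMLx implementation), roughly the same number can be recovered from the rate of positive true labels per subgroup in the test set.
# Rate of >50K (positive) labels per sex subgroup in the test labels
label_rates = y_test.groupby(X_test['sex']).mean()
print(label_rates)
print(f"Hand-computed label-rate disparity: {abs(label_rates.max() - label_rates.min()):.2f}")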
Interestingly, the dataset's statistical disparity (0.20) is less than the tuned model's (0.33), highlighting that a trained model can amplify the unintended bias that is contained in the dataset.
fig = px.bar(
pd.DataFrame({
'Fairness Type': ['Data Fairness', 'Model Fairness'],
'Statistical Parity': [parity_test_data, parity_test_model],
}),
x='Fairness Type',
y='Statistical Parity',
)
fig.show()
Statistical parity is only one of the many supported fairness metrics. As another example, we can compute Equalized Odds, which measures the disparity of a model’s true positive and false positive rates between different subgroups of the data based on demographic information/protected attributes.
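To make that concrete, here is a small by-hand sketch (not the AutoMLx implementation) of the per-group true and false positive rates that equalized odds compares.
# Per-group true positive rate (TPR) and false positive rate (FPR) on the test set
y_pred_test = pd.Series(model.predict(X_test), index=X_test.index)
for group, idx in X_test.groupby('sex').groups.items():
    yt, yp = y_test.loc[idx], y_pred_test.loc[idx]
    tpr = ((yp == 1) & (yt == 1)).sum() / (yt == 1).sum()
    fpr = ((yp == 1) & (yt == 0)).sum() / (yt == 0).sum()
    print(f"{group}: TPR={tpr:.2f}, FPR={fpr:.2f}")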
from automlx.fairness.metrics import EqualizedOddsScorer
fairness_score = EqualizedOddsScorer(protected_attributes='sex', distance_measure='diff')
EO_original = fairness_score(model, X_test, y_test)
print(f'Equalized odds on test data (lower is better): {EO_original:.2f}')
Equalized odds on test data (lower is better): 0.08
We can also easily compute these fairness metrics on more than one protected attribute.
fairness_score = EqualizedOddsScorer(protected_attributes=['sex', 'race'], distance_measure='diff')
EO = fairness_score(model, X_test, y_test)
print(f'Equalized odds on test data (lower is better): {EO:.2f}')
Equalized odds on test data (lower is better): 0.29
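When several protected attributes are passed, the metric is evaluated over the subgroups defined by the combinations of their values. A quick, optional look at the subgroup sizes shows how fine-grained these intersectional groups become.
# Size of each intersectional (sex, race) subgroup in the test set
print(X_test.groupby(['sex', 'race']).size())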
Note that, unlike statistical parity, we cannot compute equalized odds on the dataset alone, since it depends on the model's output. However, we can compute other metrics on the dataset, like Smoothed EDF; it is computed as the minimal exponential deviation of positive target ratios between a subgroup and the rest of the population.
from automlx.fairness.metrics import smoothed_edf
subgroups = X_train[['race', 'sex']]
smoothed_edf_score = smoothed_edf(y_train, subgroups)
print(f'Smoothed EDF score on train data: {smoothed_edf_score:.2f}')
Smoothed EDF score on train data: 1.63
For a variety of decision-making tasks, getting only a prediction as model output is not sufficient. A user may wish to know why the model outputs that prediction, or which data features are relevant for that prediction. For that purpose, the Oracle AutoMLx solution defines the MLExplainer factory function, which allows us to compute a variety of model explanations.
The MLExplainer object takes as arguments the trained model, the training data and ground truth labels, as well as the task.
explainer = automlx.MLExplainer(model,
X_train,
y_train,
target_names=["<=50K", ">50K"],
task="classification")
Global feature importance intuitively measures how much the model's performance (relative to the provided train labels) would change if a given feature were dropped from the dataset, without retraining the model. This notion of feature importance considers each feature independently from all other features.
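As a rough by-hand analogue of this idea (not the AutoMLx implementation), one can shuffle a single feature and measure how much the model's score drops without retraining; marital-status is used below purely as an example column.
# Rough permutation-style estimate: shuffle one feature and compare scores, without retraining
X_perm = X_train.copy()
X_perm['marital-status'] = X_perm['marital-status'].sample(frac=1, random_state=12345).values
score_base = roc_auc_score(y_train, model.predict_proba(X_train)[:, 1])
score_perm = roc_auc_score(y_train, model.predict_proba(X_perm)[:, 1])
print(f"Permutation importance estimate for marital-status: {score_base - score_perm:.3f}")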
global_exp = explainer.explain_model()
There are two options for displaying the explanation's results:
- to_dataframe() returns a dataframe of the results (see the quick peek after this list).
- show_in_notebook() shows the results as a bar plot.
The features are returned in decreasing order of importance.
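For example, the raw attributions can be inspected directly as a dataframe; this is just a quick peek, and the plot below conveys the same information.
# Inspect the raw attributions; rows are already sorted in decreasing order of importance
global_exp.to_dataframe().head()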
global_exp.show_in_notebook()
The Global Feature Importance attributions can be computed for fairness metrics using the explain_model_fairness() method, which provides 95% confidence intervals for each fairness feature importance attribution.
fairness_exp = explainer.explain_model_fairness(protected_attributes='sex',
scoring_metric='statistical_parity')
fairness_exp.show_in_notebook()
Here, we see that marital-status is considered to be the feature that contributes the most to the model's unfairness.
Note that fairness feature importance has to be interpreted slightly differently from the global feature importance above: the most important features are the ones that contribute the most to making the model unfair.
To highlight the difference between the two, we compare these two types of explanations below by plotting each feature's fairness importance against its global importance.
def compare(global_exp, fairness_exp):
dfg = global_exp.to_dataframe()
dff = fairness_exp.to_dataframe()
dfg = dfg.set_index('Feature')
dff = dff.set_index('Feature')
dfg.columns = [f'{col}_score' for col in dfg.columns]
dff.columns = [f'{col}_fairness' for col in dff.columns]
df = pd.concat([dfg, dff], axis=1)
df = df.reset_index()
df.columns = ['Feature', 'Increases Accuracy', 'Upper-bound Accuracy', 'Lower-bound Accuracy',
'Decreases Fairness', 'Upper-bound Fairness', 'Lower-bound Fairness',]
fig = px.scatter(df, x="Increases Accuracy", y="Decreases Fairness", text="Feature", log_x=False, size_max=60)
fig.update_traces(textposition='middle left')
fig.update_layout(
height=800,
title_text='Global vs Fairness Feature Importance'
)
fig.show()
compare(global_exp, fairness_exp)
AutoMLx provides other explainers that can sometimes also reveal unintended biases that the model has learned. For more explainers, please refer to the OracleAutoMLx_Classification notebook.
The AutoMLx package provides a bias mitigation algorithm that fine-tunes decision thresholds across demographic groups to compensate for the bias present in the original model. The approach is called Bias Mitigation.
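The document describes the approach as fine-tuning decision thresholds across demographic groups, and the tradeoff summaries below report per-group multiplying scalars. The snippet that follows is only an illustrative sketch of that idea with made-up multiplier values, not the AutoMLx implementation: per-group multipliers rescale the positive-class probability before thresholding, which effectively shifts each group's decision threshold.
# Illustrative sketch only: apply per-group multipliers to the positive-class
# probability and threshold at 0.5 (the multiplier values below are made up).
import numpy as np

def predict_with_group_multipliers(proba_pos, groups, multipliers, threshold=0.5):
    scaled = proba_pos * np.array([multipliers.get(g, 1.0) for g in groups])
    return (scaled >= threshold).astype(int)

example_preds = predict_with_group_multipliers(
    model.predict_proba(X_test)[:, 1],
    X_test['sex'].to_numpy(),
    multipliers={'Female': 1.5, 'Male': 0.8},  # hypothetical values for illustration
)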
First, we need to initialize a ModelBiasMitigator. It requires a base estimator, the name of the protected attributes to use, a fairness metric, and an accuracy metric. There are many more options that can be configured. For example, let's say you'd like the bias-mitigated model to be constrained to not exceed an absolute value for the fairness metric. This is how that can be configured.
from automlx.fairness.bias_mitigation import ModelBiasMitigator
bias_mitigated_model = ModelBiasMitigator(
model,
protected_attribute_names="sex",
fairness_metric="equalized_odds",
accuracy_metric="balanced_accuracy",
constraint_type="absolute", # indicates a hard constraint
constraint_value=0.1, # The maximum allowed equalized odds score
constraint_target="fairness", # as opposed to accuracy
)
Similarly, let's say you'd like the bias-mitigated model to be constrained to not decrease accuracy by more than 5% relative to the most accurate (but potentially unfair) model. The following is how this would be configured; these are also the default options. Other common options, such as a time limit or a random seed, can also be specified.
bias_mitigated_model = ModelBiasMitigator(
model,
protected_attribute_names="sex",
fairness_metric="equalized_odds",
accuracy_metric="balanced_accuracy",
constraint_type="relative", # default
constraint_value=0.05, # default
constraint_target="accuracy", # default
time_limit=50,
n_trials_per_group=30, # Number of different multiplying scalars to consider
random_seed=12345,
)
The ModelBiasMitigator can be called with the usual scikit-learn interface, notably being trained with a single call to fit.
bias_mitigated_model.fit(X_val, y_val)
<automlx.fairness.bias_mitigation._sklearn.ModelBiasMitigator at 0x1501cf1e32b0>
The fitted model can then be used to collect probabilities and labels like any usual model.
bias_mitigated_model.predict_proba(X_test)
array([[0.9480151 , 0.05198494], [0.00739041, 0.99260956], [0.89531946, 0.1046806 ], ..., [0.5493545 , 0.45064548], [0.6450227 , 0.3549773 ], [0.98485196, 0.01514801]], dtype=float32)
bias_mitigated_model.predict(X_test)
array([0, 1, 0, ..., 0, 0, 0])
We can see a summary of the best models found in the tradeoff_summary_ dataframe.
bias_mitigated_model.tradeoff_summary_
| | equalized_odds | balanced_accuracy | multiplier_sex=Female | multiplier_sex=Male |
|---|---|---|---|---|
0 | 0.023170 | 0.624593 | 0.193148 | 0.233259 |
1 | 0.028703 | 0.628152 | 0.256520 | 0.233259 |
2 | 0.036227 | 0.642396 | 0.329184 | 0.274094 |
3 | 0.054804 | 0.779418 | 2.391568 | 1.075606 |
4 | 0.100502 | 0.794733 | 1.552707 | 1.365998 |
5 | 0.129705 | 0.817235 | 8.367842 | 3.129173 |
6 | 0.157349 | 0.819466 | 7.183583 | 3.448481 |
7 | 0.216564 | 0.822988 | 3.661628 | 3.447235 |
8 | 0.237449 | 0.824125 | 2.028335 | 3.146395 |
We can also visualize all of the best models that were found by our approach using a single show_tradeoff call.
bias_mitigated_model.show_tradeoff(hide_inadmissible=False)
By default, the model retained and used for inference is the most fair model found within a 5% accuracy drop relative to the most accurate model found by our approach. It is highlighted in red in the figure above. Note how the base estimator without bias mitigation is dominated by a number of models available with bias mitigation. With little to no loss of accuracy, we have a model that is more than twice as fair!
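As an optional sanity check (assuming the fairness scorer only needs the model's predict method, which ModelBiasMitigator provides), we can recompute both metrics for the retained model on the held-out test set.
from sklearn.metrics import balanced_accuracy_score

# Fairness and accuracy of the currently selected bias-mitigated model on the test set
EO_mitigated = EqualizedOddsScorer(protected_attributes='sex', distance_measure='diff')(
    bias_mitigated_model, X_test, y_test
)
accuracy_mitigated = balanced_accuracy_score(y_test, bias_mitigated_model.predict(X_test))
print(f'Equalized odds after mitigation (lower is better): {EO_mitigated:.2f}')
print(f'Balanced accuracy after mitigation: {accuracy_mitigated:.2f}')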
If we prefer a model with a different fairness and accuracy tradeoff, we can instead pick another model from the tradeoff plot above. The index needed to select a model can be obtained by hovering over individual points in the plot.
We can also look up a model's index in the tradeoff_summary_ DataFrame. We can then select the model using the select_model method.
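Instead of hovering over the plot, an index can also be chosen programmatically from tradeoff_summary_; for example, the most accurate model whose equalized odds stays below 0.1 (an arbitrary cut-off used only for illustration).
# Pick the most accurate row whose fairness score stays below an arbitrary cut-off
summary = bias_mitigated_model.tradeoff_summary_
admissible = summary[summary['equalized_odds'] < 0.1]
chosen_index = admissible['balanced_accuracy'].idxmax()
print(chosen_index)  # this index could be passed to select_model instead of the hard-coded 3 below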
bias_mitigated_model.select_model(3)
We can run inference with this model, just like the other one.
bias_mitigated_model.predict(X_test)
array([0, 1, 0, ..., 1, 0, 0])