Training Models

Oracle AutoML

Oracle AutoML automates the machine learning experience. It replaces the laborious and time-consuming tasks of the data scientist, whose typical workflow is as follows:

  1. Select a model from a large number of viable candidate models.

  2. For each model, tune the hyperparameters.

  3. Select only predictive features to speed up the pipeline and reduce over-fitting.

  4. Ensure the model performs well on unseen data (also called generalization).

../../_images/motivation.png

Oracle AutoML automates this workflow and provides you with an optimal model given a time budget. In addition to incorporating these typical machine learning workflow steps, Oracle AutoML is also optimized to produce a high-quality model very efficiently. This is achieved by the following:

  • Scalable design: All stages in the Oracle AutoML Pipeline exploit both inter-node and intra-node parallelism, improving scalability and reducing runtime.

  • Intelligent choices reduce trials in each stage: Algorithms and parameters are chosen based on dataset characteristics. This ensures that the selected model is accurate and that the search is efficient. It is achieved through the use of meta-learning throughout the pipeline. Meta-learning is used in:

    • Algorithm selection to choose an optimal model class.

    • Adaptive sampling to identify the optimal set of samples.

    • Feature selection to determine the ideal feature subset.

    • Hyperparameter optimization.

The following topics describe the Oracle AutoML Pipeline and individual stages of the pipeline in more detail.
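
As a quick illustration, the pipeline can be driven from ADS through the OracleAutoMLProvider and AutoML driver classes. The following is a minimal sketch, assuming a dataset opened with DatasetFactory and a target column named Exited; the file path and time budget are illustrative, and the exact interface can vary between ADS versions.

import logging

from ads.automl.driver import AutoML
from ads.automl.provider import OracleAutoMLProvider
from ads.dataset.factory import DatasetFactory

# Open the dataset and declare the target column (path and target are illustrative)
ds = DatasetFactory.open('/tmp/churn.csv', target='Exited')
train, test = ds.train_test_split(test_size=0.2)

# Run the Oracle AutoML pipeline with a time budget (in seconds)
ml_engine = OracleAutoMLProvider(n_jobs=-1, loglevel=logging.ERROR)
oracle_automl = AutoML(train, provider=ml_engine)
model, baseline = oracle_automl.train(time_budget=300)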

Keras

Keras is an open source neural network library. It can run on top of TensorFlow, Theano, and Microsoft Cognitive Toolkit; by default, Keras uses TensorFlow as the backend. Keras is written in Python, has an R interface, and can also use PlaidML as a backend. You can familiarize yourself with Keras by reviewing About Keras.

These examples examine a binary classification problem predicting churn. This is a common type of problem that can be solved using Keras, TensorFlow, and scikit-learn.

If the data is not cached, it is pulled from GitHub, cached locally, and then loaded.

from os import path
import numpy as np
import pandas as pd
import requests

import logging
logging.basicConfig(format='%(levelname)s:%(message)s', level=logging.ERROR)

churn_data_file = '/tmp/churn.csv'
if not path.exists(churn_data_file):
    # fetch and save some data
    print('fetching data from web...', end=" ")
    r = requests.get('https://github.com/darenr/public_datasets/raw/master/churn_dataset.csv')
    with open(churn_data_file, 'wb') as fd:
        fd.write(r.content)
    print("Done")


df = pd.read_csv(churn_data_file)

Import Keras, and import scikit-learn to generate metrics. Much of the data preprocessing and modeling can be done using the ADS library. However, the following example demonstrates how to do these tasks with external libraries:

from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import confusion_matrix, roc_auc_score

from keras.models import Sequential
from keras.layers import Dense

The first step is data preparation. From the pandas.DataFrame, you extract the X and y values as numpy arrays. The feature selection is performed manually. The next step is feature encoding using the sklearn LabelEncoder. This converts categorical variables into ordinal values ('red', 'green', 'blue' -> 0, 1, 2) so that they are compatible with Keras. The data is then split using an 80/20 ratio. The training is performed on 80% of the data, and model testing is performed on the remaining 20% to evaluate how well the model generalizes.

feature_name = ['CreditScore', 'Geography', 'Gender', 'Age', 'Tenure', 'Balance',
     'NumOfProducts', 'HasCrCard', 'IsActiveMember', 'EstimatedSalary']

response_name = ['Exited']
data = df[[val for sublist in [feature_name, response_name] for val in sublist]].copy()

# Encode the category columns
for col in ['Geography', 'Gender']:
  data.loc[:, col] = LabelEncoder().fit_transform(data.loc[:, col])

# Do an 80/20 split for the training and test data
train, test = train_test_split(data, test_size=0.2, random_state=42)

# Scale the features and split the features away from the response
sc = StandardScaler() # Feature Scaling
X_train = sc.fit_transform(train.drop('Exited', axis=1).to_numpy())
X_test = sc.transform(test.drop('Exited', axis=1).to_numpy())
y_train = train.loc[:, 'Exited'].to_numpy()
y_test = test.loc[:, 'Exited'].to_numpy()

Following is a depiction of the neural network architecture. It is a sequential model with an input layer of 10 nodes. It has two hidden layers, each with 255 densely connected nodes and the ReLU activation function. The output layer has a single node with a sigmoid activation function because the model performs binary classification. The optimizer is Adam, and the loss function is binary cross-entropy. The model is optimized on the accuracy metric. This takes several minutes to run.

keras_classifier = Sequential()
keras_classifier.add(Dense(units=255, kernel_initializer='uniform', activation='relu', input_dim=10))
keras_classifier.add(Dense(units=255, kernel_initializer='uniform', activation='relu'))
keras_classifier.add(Dense(units=1, kernel_initializer='uniform', activation='sigmoid'))
keras_classifier.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

keras_classifier.fit(X_train, y_train, batch_size=10, epochs=25)

To evaluate this model, you could use sklearn or ADS.

This example uses sklearn:

y_pred = keras_classifier.predict(X_test)
y_pred = (y_pred > 0.5)

cm = confusion_matrix(y_test, y_pred)
auc = roc_auc_score(y_test, y_pred)

print("confusion_matrix:\n", cm)
print("roc_auc_score", auc)

This example uses the ADS evaluator package:

from ads.common.model import ADSModel
from ads.evaluations.evaluator import ADSEvaluator
from ads.common.data import MLData

eval_test = MLData.build(X = pd.DataFrame(sc.transform(test.drop('Exited', axis=1)), columns=feature_name),
                         y = pd.Series(test.loc[:, 'Exited']),
                         name = 'Test Data')
eval_train = MLData.build(X = pd.DataFrame(sc.transform(train.drop('Exited', axis=1)), columns=feature_name),
                          y = pd.Series(train.loc[:, 'Exited']),
                          name = 'Training Data')
clf = ADSModel.from_estimator(keras_classifier, name="Keras")
evaluator = ADSEvaluator(eval_test, models=[clf], training_data=eval_train)
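
With the evaluator created, you can display the evaluation report in a notebook; the show_in_notebook() call shown in the XGBoost comparison later in this section works here as well:

evaluator.show_in_notebook()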

Scikit-Learn

The sklearn pipeline can be used to build a model on the same churn dataset that was used in the Keras section. The pipeline allows the model to contain multiple stages and transformations. Generally, there would be pipeline stages for feature encoding, scaling, and so on. In this pipeline example, a LogisticRegression estimator is used:

from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

pipeline_classifier = Pipeline(steps=[
  ('clf', LogisticRegression())
])

pipeline_classifier.fit(X_train, y_train)

You can evaluate this model using sklearn or ADS.
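
For example, a minimal sketch with sklearn, reusing the metric imports and the X_test and y_test arrays from the Keras section:

# Predict on the held-out data and compute the same metrics used earlier
y_pred = pipeline_classifier.predict(X_test)

print("confusion_matrix:\n", confusion_matrix(y_test, y_pred))
print("roc_auc_score", roc_auc_score(y_test, y_pred))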

XGBoost

XGBoost is an optimized, distributed gradient boosting library designed to be efficient, flexible, and portable. It implements machine learning algorithms under the Gradient Boosting framework. XGBoost provides parallel tree boosting (also known as Gradient Boosting Decision Trees (GBDT) or Gradient Boosting Machines (GBM)) and can be used to solve a variety of data science problems. The code runs, unmodified, on several distributed environments (Hadoop, SGE, MPI) and can process billions of observations. You can familiarize yourself with XGBoost by reviewing the XGBoost Documentation.

Import XGBoost, and fit a classifier to the training data:

from xgboost import XGBClassifier

xgb_classifier = XGBClassifier(nthread=1)
xgb_classifier.fit(eval_train.X, eval_train.y)

From the three estimators, create three ADSModel objects: a Keras classifier, a sklearn pipeline with a single LogisticRegression stage, and an XGBoost model:

from ads.common.model import ADSModel
from ads.evaluations.evaluator import ADSEvaluator
from ads.common.data import MLData

keras_model = ADSModel.from_estimator(keras_classifier)
lr_model = ADSModel.from_estimator(pipeline_classifier)
xgb_model = ADSModel.from_estimator(xgb_classifier)

evaluator = ADSEvaluator(eval_test, models=[keras_model, lr_model, xgb_model], training_data=eval_train)
evaluator.show_in_notebook()

ADSTuner

In addition to the other services for training models, ADS includes a hyperparameter tuning framework called ADSTuner.

ADSTuner supports several hyperparameter search strategies out of the box that plug into common model libraries such as sklearn.

ADSTuner also lets you define your own search spaces and strategies, which makes it useful with any ML library that does not include hyperparameter tuning out of the box.

First, import the packages used in the rest of the usage examples:

import category_encoders as ce
import lightgbm
import logging
import numpy as np
import os
import pandas as pd
import sklearn
import xgboost

from ads.dataset.factory import DatasetFactory
from ads.hpo.stopping_criterion import *
from ads.hpo.distributions import *
from ads.hpo.search_cv import ADSTuner, NotResumableError

from lightgbm import LGBMClassifier
from sklearn import preprocessing
from sklearn.compose import ColumnTransformer
from sklearn.datasets import load_iris, load_boston
from sklearn.decomposition import PCA
from sklearn.ensemble import AdaBoostRegressor, AdaBoostClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import SGDClassifier, LogisticRegression
from sklearn.metrics import make_scorer, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif
from xgboost import XGBClassifier

Here is an example of running ADSTuner on a supported model, SGDClassifier from sklearn:

model = SGDClassifier()  # Initialize the model
X, y = load_iris(return_X_y=True)
X_train, X_valid, y_train, y_valid = train_test_split(X, y)
tuner = ADSTuner(model, cv=3)  # cv is the number of cross-validation folds
tuner.search_space()  # Show the default search space for this model
tuner.tune(X_train, y_train, exit_criterion=[NTrials(10)])

After running this code, ADSTuner generates a tuning report that lists its trials, the best-performing hyperparameters, and performance statistics.

../../_images/adstuner.png

You can use tuner.best_score to get the best score on the scoring metric used (the metric name is accessible as tuner.scoring_name). The best parameters are available at tuner.best_params, and the complete record of trials at tuner.trials.
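
For example, using the attributes named above:

print("Best {} score: {}".format(tuner.scoring_name, tuner.best_score))
print("Best parameters:", tuner.best_params)
tuner.trials  # complete record of trials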

If you have additional compute resources and would like to continue hyperparameter optimization on a model that has already been tuned, you can resume the tuning:

tuner.resume(exit_criterion=[TimeBudget(5)], loglevel=logging.NOTSET)
print('So far the best {} score is {}'.format(tuner.scoring_name, tuner.best_score))
print("The best trial found was number: " + str(tuner.best_index))

ADSTuner also has robust visualization and plotting capabilities:

tuner.plot_best_scores()
tuner.plot_intermediate_scores()
tuner.search_space()
tuner.plot_contour_scores(params=['penalty', 'alpha'])
tuner.plot_parallel_coordinate_scores(params=['penalty', 'alpha'])
tuner.plot_edf_scores()

These commands produce the following plots:

../../_images/contourplot.png ../../_images/empiricaldistribution.png ../../_images/intermediatevalues.png ../../_images/optimizationhistory.png ../../_images/parallelcoordinate.png

ADSTuner supports custom scoring functions and custom search spaces. Let's demonstrate this with a different model:

model2 = LogisticRegression()
tuner = ADSTuner(model2,
                 strategy = {
                 'C': LogUniformDistribution(low=1e-05, high=1),
                 'solver': CategoricalDistribution(['saga']),
                 'max_iter': IntUniformDistribution(500, 1000, 50)},
                 scoring=make_scorer(f1_score, average='weighted'),
                 cv=3)
tuner.tune(X_train, y_train, exit_criterion=[NTrials(5)])

ADSTuner does not support every model out of the box; the supported models are as follows:

  • Ridge

  • RidgeClassifier

  • Lasso

  • ElasticNet

  • LogisticRegression

  • SVC

  • SVR

  • LinearSVC

  • LinearSVR

  • DecisionTreeClassifier

  • DecisionTreeRegressor

  • RandomForestClassifier

  • RandomForestRegressor

  • GradientBoostingClassifier

  • GradientBoostingRegressor

  • XGBClassifier

  • XGBRegressor

  • ExtraTreesClassifier

  • ExtraTreesRegressor

  • LGBMClassifier

  • LGBMRegressor

  • SGDClassifier

  • SGDRegressor

One model that is not supported directly out of the box is AdaBoostRegressor, but here is an example of a custom strategy defined for it that works:

model3 = AdaBoostRegressor()
X, y = load_boston(return_X_y=True)
X_train, X_valid, y_train, y_valid = train_test_split(X, y)
tuner = ADSTuner(model3, strategy={'n_estimators': IntUniformDistribution(50, 100)})
tuner.tune(X_train, y_train, exit_criterion=[TimeBudget(5)])

Finally, ADSTuner supports sklearn Pipelines:

df, target = pd.read_csv(os.path.join('~', 'advanced-ds', 'tests', 'vor_datasets', 'vor_titanic.csv')), 'Survived'
X = df.drop(target, axis=1)
y = df[target]

numeric_features = X.select_dtypes(include=['int64', 'float64', 'int32', 'float32']).columns
categorical_features = X.select_dtypes(include=['object', 'category', 'bool']).columns

y = preprocessing.LabelEncoder().fit_transform(y)

X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.3, random_state=42)

num_features = len(numeric_features) + len(categorical_features)

numeric_transformer = Pipeline(steps=[
    ('num_imputer', SimpleImputer(strategy='median')),
    ('num_scaler', StandardScaler())
])

categorical_transformer = Pipeline(steps=[
    ('cat_imputer', SimpleImputer(strategy='constant', fill_value='missing')),
    ('cat_encoder', ce.woe.WOEEncoder())
])

preprocessor = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, numeric_features),
        ('cat', categorical_transformer, categorical_features)
    ]
)

pipe = Pipeline(
    steps=[
        ('preprocessor', preprocessor),
        ('feature_selection', SelectKBest(f_classif, k=int(0.9 * num_features))),
        ('classifier', LogisticRegression())
    ]
)

def customerize_score(y_true, y_pred, sample_weight=None):
    # Custom metric: fraction of correct predictions, optionally sample-weighted
    score = y_true == y_pred
    return np.average(score, weights=sample_weight)

score = make_scorer(customerize_score)
ads_search = ADSTuner(
    pipe,
    scoring=score,
    strategy='detailed',
    cv=2,
    random_state=42
)
ads_search.tune(X=X_train, y=y_train, exit_criterion=[NTrials(20)])