9.1 About Automated Machine Learning

Automated Machine Learning (AutoML) provides built-in data science expertise about data analytics and modeling that you can employ to build machine learning models.

Any modeling problem for a specified data set and prediction task involves a sequence of data cleansing and preprocessing, algorithm selection, and model tuning tasks. Each of these steps requires data science expertise to guide the process toward an efficient final model. Automated Machine Learning (AutoML) automates this process with its built-in data science expertise.

OML4Py has the following AutoML capabilities:

  • Automated algorithm selection that selects the appropriate algorithm from the supported machine learning algorithms
  • Automated feature selection that reduces the size of the original feature set to speed up model training and tuning, while possibly also increasing model quality
  • Automated tuning of model hyperparameters, which selects the model with the best value of the score metric that the user chooses from among the supported metrics

AutoML performs those common modeling tasks automatically, with less effort and potentially better results. It also leverages in-database algorithm parallel processing and scalability to minimize runtime and produce high-quality results.

Note:

Like the fit method of the machine learning classes, the AutoML functions reduce, select, and tune provide a case_id parameter that you can use to achieve repeatable data sampling and data shuffling during model building.

The AutoML functionality is also available in a no-code user interface alongside OML Notebooks on Oracle Autonomous Database. For more information, see Oracle Machine Learning AutoML User Interface.
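One common way to make sampling and shuffling repeatable is to derive each row's assignment from a stable hash of its case identifier instead of from a random number generator. The plain-Python sketch below illustrates that general idea only. The helper assign_split and its parameters are hypothetical, they are not part of the OML4Py API, and this section does not describe how OML4Py uses case_id internally.

```python
import hashlib


def assign_split(case_id, test_fraction=0.2, buckets=10_000):
    """Hypothetical helper: return 'test' or 'train' based only on the case ID.

    Illustrates hash-based, repeatable sampling; not an OML4Py function.
    """
    # SHA-256 gives the same digest in every process, unlike Python's
    # built-in hash() for strings, which is randomized per session.
    digest = hashlib.sha256(str(case_id).encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % buckets
    return "test" if bucket < test_fraction * buckets else "train"


case_ids = [101, 102, 103, 104, 105, 106, 107, 108]
split_a = {cid: assign_split(cid) for cid in case_ids}
# Process the rows in a different order: the assignment does not change,
# because it depends on the ID, not on row order or a random seed.
split_b = {cid: assign_split(cid) for cid in reversed(case_ids)}
print(split_a == split_b)  # True
```

Because the assignment depends only on the identifier, rerunning the code in a new session or reordering the rows yields the same split.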

Automated Machine Learning Classes and Algorithms

The Automated Machine Learning classes are the following.

Class Description
oml.automl.AlgorithmSelection

Using only the characteristics of the data set and the task, automatically selects the best algorithms from the set of supported Oracle Machine Learning algorithms.

Supports classification and regression functions.

oml.automl.FeatureSelection

Uses meta-learning to quickly identify the most relevant feature subsets given a training data set and an Oracle Machine Learning algorithm.

Supports classification and regression functions.

oml.automl.ModelTuning

Uses a highly parallel, asynchronous gradient-based hyperparameter optimization algorithm to tune the algorithm hyperparameters.

Supports classification and regression functions.

oml.automl.ModelSelection

Selects the best Oracle Machine Learning algorithm and then tunes that algorithm.

Supports classification and regression functions.

The Oracle Machine Learning algorithms supported by AutoML are the following:

Table 9-1 Machine Learning Algorithms Supported by AutoML

Algorithm Abbreviation Algorithm Name
dt Decision Tree
glm Generalized Linear Model
glm_ridge Generalized Linear Model with ridge regression
nb Naive Bayes
nn Neural Network
rf Random Forest
svm_gaussian Support Vector Machine with Gaussian kernel
svm_linear Support Vector Machine with linear kernel

Classification and Regression Metrics

The following tables list the scoring metrics supported by AutoML.

Table 9-2 Binary and Multiclass Classification Metrics

Metric Description, Scikit-learn Equivalent, and Formula
accuracy

Calculates the rate of correct classification of the target.

sklearn.metrics.accuracy_score(y_true, y_pred, normalize=True, sample_weight=None)

Formula: (tp + tn)/samples

f1_macro

Calculates the f-score or f-measure, which is the harmonic mean of precision and recall. The f1_macro takes the unweighted average of per-class scores.

sklearn.metrics.f1_score(y_true, y_pred, labels=None, pos_label=1, average='macro', sample_weight=None)

Formula: 2 * (precision * recall) / (precision + recall)

f1_micro

Calculates the f-score or f-measure with micro-averaging in which true positives, false positives, and false negatives are counted globally.

sklearn.metrics.f1_score(y_true, y_pred, labels=None, pos_label=1, average='micro', sample_weight=None)

Formula: 2 * (precision * recall) / (precision + recall)

f1_weighted

Calculates the f-score or f-measure with weighted averaging of per-class scores based on support (the fraction of true samples per class). Accounts for imbalanced classes.

sklearn.metrics.f1_score(y_true, y_pred, labels=None, pos_label=1, average='weighted', sample_weight=None)

Formula: 2 * (precision * recall) / (precision + recall)

precision_macro

Calculates the ability of the classifier not to assign a class to a sample that does not belong to it. The precision_macro takes the unweighted average of per-class scores.

sklearn.metrics.precision_score(y_true, y_pred, labels=None, pos_label=1, average='macro', sample_weight=None)

Formula: tp / (tp + fp)

precision_micro

Calculates the ability of the classifier not to assign a class to a sample that does not belong to it. Uses micro-averaging, in which true positives, false positives, and false negatives are counted globally.

sklearn.metrics.precision_score(y_true, y_pred, labels=None, pos_label=1, average='micro', sample_weight=None)

Formula: tp / (tp + fp)

precision_weighted

Calculates the ability of the classifier not to assign a class to a sample that does not belong to it. Uses weighted averaging of per-class scores based on support (the fraction of true samples per class). Accounts for imbalanced classes.

sklearn.metrics.precision_score(y_true, y_pred, labels=None, pos_label=1, average='weighted', sample_weight=None)

Formula: tp / (tp + fp)

recall_macro

Calculates the ability of the classifier to find all the samples of each class. The recall_macro takes the unweighted average of per-class scores.

sklearn.metrics.recall_score(y_true, y_pred, labels=None, pos_label=1, average='macro', sample_weight=None)

Formula: tp / (tp + fn)

recall_micro

Calculates the ability of the classifier to find all the samples of each class, using micro-averaging in which true positives, false positives, and false negatives are counted globally.

sklearn.metrics.recall_score(y_true, y_pred, labels=None, pos_label=1, average='micro', sample_weight=None)

Formula: tp / (tp + fn)

recall_weighted

Calculates the ability of the classifier to find all the samples of each class, using weighted averaging of per-class scores based on support (the fraction of true samples per class). Accounts for imbalanced classes.

sklearn.metrics.recall_score(y_true, y_pred, labels=None, pos_label=1, average='weighted', sample_weight=None)

Formula: tp / (tp + fn)

See Also: Scikit-learn classification metrics
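The averaging modes in Table 9-2 differ only in how the per-class counts are combined. The plain-Python sketch below computes them from scratch for a small, made-up three-class example. It is intended to agree with the scikit-learn equivalents listed above for ordinary inputs, but it is an illustration, not how AutoML computes scores in the database; the function names are hypothetical.

```python
from collections import Counter


def safe_div(num, den):
    # Return 0.0 for an empty denominator, like scikit-learn's default
    # zero_division behavior (which additionally emits a warning).
    return num / den if den else 0.0


def prf(tp, fp, fn):
    """Precision, recall, and F1 from raw counts."""
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    f1 = safe_div(2 * precision * recall, precision + recall)
    return precision, recall, f1


def classification_scores(y_true, y_pred):
    """Accuracy plus macro, micro, and weighted precision/recall/F1."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # class p was predicted for a sample of another class
            fn[t] += 1  # a sample of class t was missed
    support = Counter(y_true)  # number of true samples in each class
    n = len(y_true)
    per_class = {c: prf(tp[c], fp[c], fn[c]) for c in labels}
    # micro: pool the counts over all classes, then compute once
    micro = prf(sum(tp.values()), sum(fp.values()), sum(fn.values()))

    scores = {"accuracy": sum(tp.values()) / n}
    for i, name in enumerate(("precision", "recall", "f1")):
        # macro: unweighted mean of the per-class scores
        scores[name + "_macro"] = sum(per_class[c][i] for c in labels) / len(labels)
        scores[name + "_micro"] = micro[i]
        # weighted: per-class scores weighted by support
        scores[name + "_weighted"] = sum(per_class[c][i] * support[c] for c in labels) / n
    return scores


# Made-up three-class example
y_true = ["a", "a", "a", "b", "b", "c"]
y_pred = ["a", "a", "b", "b", "c", "c"]
clf_scores = classification_scores(y_true, y_pred)
for name in ("accuracy", "f1_macro", "f1_micro", "f1_weighted"):
    print(f"{name}: {clf_scores[name]:.4f}")
# accuracy: 0.6667
# f1_macro: 0.6556
# f1_micro: 0.6667
# f1_weighted: 0.6778
```

For single-label multiclass data, the micro-averaged scores equal accuracy, because each misclassification counts once as a false positive and once as a false negative. The weighted average leans toward the scores of the most frequent classes, which is how the weighted variants account for class imbalance.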

Table 9-3 Binary Classification Metrics Only

Metric Description, Scikit-learn Equivalent, and Formula
f1

Calculates the f-score or f-measure, which is the harmonic mean of precision and recall. By default, this metric expects the positive target class to be encoded as 1.

sklearn.metrics.f1_score(y_true, y_pred, labels=None, pos_label=1, average='binary', sample_weight=None)

Formula: 2 * (precision * recall) / (precision + recall)

precision

Calculates the ability of the classifier to not label a sample positive (1) that is actually negative (0).

sklearn.metrics.precision_score(y_true, y_pred, labels=None, pos_label=1, average='binary', sample_weight=None)

Formula: tp / (tp + fp)

recall

Calculates the ability of the classifier to label all positive (1) samples correctly.

sklearn.metrics.recall_score(y_true, y_pred, labels=None, pos_label=1, average='binary', sample_weight=None)

Formula: tp / (tp + fn)

roc_auc

Calculates the Area Under the Receiver Operating Characteristic Curve (roc_auc) from prediction scores.

sklearn.metrics.roc_auc_score(y_true, y_score, average='macro', sample_weight=None)

See also the definition of receiver operating characteristic.
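For the binary-only metrics, only the class named by pos_label is scored, and roc_auc is computed from continuous scores (for example, predicted probabilities) rather than from hard labels. The plain-Python sketch below computes roc_auc through the equivalent rank formulation: the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative sample, with ties counted as one half. The function names, the 0.5 threshold, and the sample data are hypothetical.

```python
def binary_scores(y_true, y_pred, pos_label=1):
    """Precision, recall, and F1 for the positive class only (average='binary')."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == pos_label and p == pos_label)
    fp = sum(1 for t, p in pairs if t != pos_label and p == pos_label)
    fn = sum(1 for t, p in pairs if t == pos_label and p != pos_label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def roc_auc(y_true, y_score, pos_label=1):
    """Area under the ROC curve via the rank (Mann-Whitney) formulation."""
    pos = [s for t, s in zip(y_true, y_score) if t == pos_label]
    neg = [s for t, s in zip(y_true, y_score) if t != pos_label]
    if not pos or not neg:
        raise ValueError("roc_auc needs at least one positive and one negative sample")
    # Each positive/negative pair contributes 1 if the positive is ranked higher,
    # 0.5 for a tie, and 0 otherwise.
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]  # e.g., predicted probability of class 1
y_pred = [1 if s >= 0.5 else 0 for s in y_score]  # hypothetical 0.5 threshold
print(binary_scores(y_true, y_pred))  # (1.0, 0.5, 0.6666666666666666)
print(roc_auc(y_true, y_score))       # 0.75
```

The pairwise loop grows quadratically with the number of samples, so it only suits small illustrations; practical implementations sort the scores once instead. Note that roc_auc does not depend on any threshold, while precision, recall, and f1 change with the threshold used to produce y_pred.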

Table 9-4 Regression Metrics

Metric Description, Scikit-learn Equivalent, and Formula
r2

Calculates the coefficient of determination (R squared).

sklearn.metrics.r2_score(y_true, y_pred, sample_weight=None, multioutput='uniform_average')

See also the definition of coefficient of determination.

neg_mean_absolute_error

Calculates the mean of the absolute difference of predicted and true targets (MAE).

-1.0 * sklearn.metrics.mean_absolute_error(y_true, y_pred, sample_weight=None, multioutput='uniform_average')

Formula: $\mathrm{MAE}(y, \hat{y}) = \frac{1}{n} \sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert$, where $n$ is the number of samples, $y_i$ is the true target, and $\hat{y}_i$ is the predicted target.

neg_mean_squared_error

Calculates the mean of the squared difference of predicted and true targets.

-1.0 * sklearn.metrics.mean_squared_error(y_true, y_pred, sample_weight=None, multioutput='uniform_average')

Formula: $\mathrm{MSE}(y, \hat{y}) = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$

neg_mean_squared_log_error

Calculates the mean of the squared difference between the natural logarithms of one plus the true targets and one plus the predicted targets.

-1.0 * sklearn.metrics.mean_squared_log_error(y_true, y_pred, sample_weight=None, multioutput='uniform_average')

Formula: $\mathrm{MSLE}(y, \hat{y}) = \frac{1}{n} \sum_{i=1}^{n} \left( \ln(1 + y_i) - \ln(1 + \hat{y}_i) \right)^2$

neg_median_absolute_error

Calculates the median of the absolute difference between predicted and true targets.

-1.0 * sklearn.metrics.median_absolute_error(y_true, y_pred)

Formula: $\mathrm{MedAE}(y, \hat{y}) = \operatorname{median}\left( \lvert y_1 - \hat{y}_1 \rvert, \ldots, \lvert y_n - \hat{y}_n \rvert \right)$

See Also: Scikit-learn regression metrics
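The neg_ prefix indicates that the error is multiplied by -1.0, so that for every metric in Table 9-4 a higher value is better, the same direction as r2. The plain-Python sketch below computes the regression metrics in that form for a small, made-up example. The function name is hypothetical, and the sketch is meant to agree with the scikit-learn equivalents above for ordinary inputs rather than to reproduce AutoML's implementation.

```python
import math
import statistics


def regression_scores(y_true, y_pred):
    """Table 9-4 metrics, with the error metrics negated (higher is better)."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    abs_errors = [abs(e) for e in errors]
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # assumes y_true is not constant
    scores = {
        "r2": 1.0 - ss_res / ss_tot,
        "neg_mean_absolute_error": -sum(abs_errors) / n,
        "neg_mean_squared_error": -ss_res / n,
        "neg_median_absolute_error": -statistics.median(abs_errors),
    }
    # The log-based metric uses ln(1 + x). scikit-learn rejects negative
    # values for this metric, so this sketch skips it in that case.
    if min(y_true) >= 0 and min(y_pred) >= 0:
        scores["neg_mean_squared_log_error"] = -sum(
            (math.log1p(t) - math.log1p(p)) ** 2 for t, p in zip(y_true, y_pred)
        ) / n
    return scores


# Made-up example with non-negative targets
y_true = [3, 5, 2.5, 7]
y_pred = [2.5, 5, 4, 8]
reg_scores = regression_scores(y_true, y_pred)
for name, value in reg_scores.items():
    print(f"{name}: {value:.4f}")
```

With the negation in place, a selection or tuning step can always keep the candidate with the largest score, whichever metric the user chooses.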