10.4 Model Tuning
The oml.automl.ModelTuning class tunes the hyperparameters for the specified classification or regression algorithm and training data.
Model tuning is a laborious machine learning task that relies heavily on data scientist expertise. With limited user input, the oml.automl.ModelTuning class automates this process using a highly parallel, asynchronous, gradient-based hyperparameter optimization algorithm to tune the hyperparameters of Oracle Machine Learning algorithms.
The oml.automl.ModelTuning class supports classification and regression algorithms. To use the oml.automl.ModelTuning class, you specify a data set and an algorithm to obtain a tuned model and its corresponding hyperparameters. An advanced user can provide a customized hyperparameter search space and a non-default scoring metric to this black-box optimizer.
For a partitioned model, if you pass in the column to partition on in the param_space argument of the tune method, oml.automl.ModelTuning tunes the partitioned model's hyperparameters, as illustrated in the sketch below.
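For example, the following minimal sketch tunes a partitioned Decision Tree model. The column name PARTITION_COL is hypothetical, and the sketch assumes that the partition column is supplied through the ODMS_PARTITION_COLUMNS entry of param_space; consult the API reference below for the exact setting accepted by your release.

# Minimal sketch of tuning a partitioned model (assumptions noted above).
# 'PARTITION_COL' is a hypothetical categorical column in the training data X;
# ODMS_PARTITION_COLUMNS is assumed to name the column to partition on.
at = automl.ModelTuning(mining_function='classification', parallel=4)
results = at.tune('dt', X, y,
                  param_space={'ODMS_PARTITION_COLUMNS': 'PARTITION_COL'})
tuned_model = results['best_model']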
For information on the parameters and methods of the class, invoke help(oml.automl.ModelTuning) or see Oracle Machine Learning for Python API Reference.
Example 10-3 Using the oml.automl.ModelTuning Class

This example creates an oml.automl.ModelTuning object.
import oml
from oml import automl
import pandas as pd
from sklearn import datasets
# Load the breast cancer data set.
bc = datasets.load_breast_cancer()
bc_data = bc.data.astype(float)
X = pd.DataFrame(bc_data, columns = bc.feature_names)
y = pd.DataFrame(bc.target, columns = ['TARGET'])
# Create the database table BreastCancer.
oml_df = oml.create(pd.concat([X, y], axis=1),
                    table = 'BreastCancer')
# Split the data set into training and test data.
train, test = oml_df.split(ratio=(0.8, 0.2), seed = 1234)
X, y = train.drop('TARGET'), train['TARGET']
X_test, y_test = test.drop('TARGET'), test['TARGET']
# Start an automated model tuning run with a Decision Tree model.
at = automl.ModelTuning(mining_function='classification',
                        parallel=4)
results = at.tune('dt', X, y, score_metric='accuracy')
# Show the tuned model details.
tuned_model = results['best_model']
tuned_model
# Show the best tuned model train score and the
# corresponding hyperparameters.
score, params = results['all_evals'][0]
"{:.2}".format(score), ["{}:{}".format(k, params[k])
                        for k in sorted(params)]
# Use the tuned model to get the score on the test set.
"{:.2}".format(tuned_model.score(X_test, y_test))
# An example invocation of model tuning with user-defined
# search ranges for selected hyperparameters on a new tuning
# metric (f1_macro).
search_space = {
    'RFOR_SAMPLING_RATIO': {'type': 'continuous',
                            'range': [0.01, 0.5]},
    'RFOR_NUM_TREES': {'type': 'discrete',
                       'range': [50, 100]},
    'TREE_IMPURITY_METRIC': {'type': 'categorical',
                             'range': ['TREE_IMPURITY_ENTROPY',
                                       'TREE_IMPURITY_GINI']},}
results = at.tune('rf', X, y, score_metric='f1_macro',
                  param_space=search_space)
score, params = results['all_evals'][0]
("{:.2}".format(score), ["{}:{}".format(k, params[k])
                         for k in sorted(params)])
# Some hyperparameter search ranges need to be defined based on the
# training data set sizes (for example, the number of samples and
# features). You can use placeholders specific to the data set,
# such as $nr_features and $nr_samples, as the search ranges.
search_space = {'RFOR_MTRY': {'type': 'discrete',
                              'range': [1, '$nr_features/2']}}
results = at.tune('rf', X, y,
                  score_metric='f1_macro', param_space=search_space)
score, params = results['all_evals'][0]
("{:.2}".format(score), ["{}:{}".format(k, params[k])
                         for k in sorted(params)])
# Drop the database table.
oml.drop('BreastCancer')
Listing for This Example
>>> import oml
>>> from oml import automl
>>> import pandas as pd
>>> from sklearn import datasets
>>>
>>> # Load the breast cancer data set.
... bc = datasets.load_breast_cancer()
>>> bc_data = bc.data.astype(float)
>>> X = pd.DataFrame(bc_data, columns = bc.feature_names)
>>> y = pd.DataFrame(bc.target, columns = ['TARGET'])
>>>
>>> # Create the database table BreastCancer.
>>> oml_df = oml.create(pd.concat([X, y], axis=1),
...                     table = 'BreastCancer')
>>>
>>> # Split the data set into training and test data.
... train, test = oml_df.split(ratio=(0.8, 0.2), seed = 1234)
>>> X, y = train.drop('TARGET'), train['TARGET']
>>> X_test, y_test = test.drop('TARGET'), test['TARGET']
>>>
>>> # Start an automated model tuning run with a Decision Tree model.
... at = automl.ModelTuning(mining_function='classification',
...                         parallel=4)
>>> results = at.tune('dt', X, y, score_metric='accuracy')
>>>
>>> # Show the tuned model details.
... tuned_model = results['best_model']
>>> tuned_model
Algorithm Name: Decision Tree
Mining Function: CLASSIFICATION
Target: TARGET
Settings:
                    setting name            setting value
0                      ALGO_NAME       ALGO_DECISION_TREE
1              CLAS_MAX_SUP_BINS                       32
2          CLAS_WEIGHTS_BALANCED                      OFF
3                   ODMS_DETAILS             ODMS_DISABLE
4   ODMS_MISSING_VALUE_TREATMENT  ODMS_MISSING_VALUE_AUTO
5                  ODMS_SAMPLING    ODMS_SAMPLING_DISABLE
6                      PREP_AUTO                       ON
7           TREE_IMPURITY_METRIC       TREE_IMPURITY_GINI
8            TREE_TERM_MAX_DEPTH                        8
9          TREE_TERM_MINPCT_NODE                     3.34
10        TREE_TERM_MINPCT_SPLIT                      0.1
11         TREE_TERM_MINREC_NODE                       10
12        TREE_TERM_MINREC_SPLIT                       20
Attributes:
mean radius
mean texture
mean perimeter
mean area
mean smoothness
mean compactness
mean concavity
mean concave points
mean symmetry
mean fractal dimension
radius error
texture error
perimeter error
area error
smoothness error
compactness error
concavity error
concave points error
symmetry error
fractal dimension error
worst radius
worst texture
worst perimeter
worst area
worst smoothness
worst compactness
worst concavity
worst concave points
worst symmetry
worst fractal dimension
Partition: NO
>>>
>>> # Show the best tuned model train score and the
... # corresponding hyperparameters.
... score, params = results['all_evals'][0]
>>> "{:.2}".format(score), ["{}:{}".format(k, params[k])
... for k in sorted(params)]
('0.92', ['CLAS_MAX_SUP_BINS:32', 'TREE_IMPURITY_METRIC:TREE_IMPURITY_GINI', 'TREE_TERM_MAX_DEPTH:7', 'TREE_TERM_MINPCT_NODE:0.05', 'TREE_TERM_MINPCT_SPLIT:0.1'])
>>>
>>> # Use the tuned model to get the score on the test set.
... "{:.2}".format(tuned_model.score(X_test, y_test))
'0.92
>>>
>>> # An example invocation of model tuning with user-defined
... # search ranges for selected hyperparameters on a new tuning
... # metric (f1_macro).
... search_space = {
...     'RFOR_SAMPLING_RATIO': {'type': 'continuous',
...                             'range': [0.01, 0.5]},
...     'RFOR_NUM_TREES': {'type': 'discrete',
...                        'range': [50, 100]},
...     'TREE_IMPURITY_METRIC': {'type': 'categorical',
...                              'range': ['TREE_IMPURITY_ENTROPY',
...                                        'TREE_IMPURITY_GINI']},}
>>> results = at.tune('rf', X, y, score_metric='f1_macro',
...                   param_space=search_space)
>>> score, params = results['all_evals'][0]
>>> ("{:.2}".format(score), ["{}:{}".format(k, params[k])
...                          for k in sorted(params)])
('0.92', ['RFOR_NUM_TREES:53', 'RFOR_SAMPLING_RATIO:0.4999951', 'TREE_IMPURITY_METRIC:TREE_IMPURITY_ENTROPY'])
>>>
>>> # Some hyperparameter search ranges need to be defined based on the
... # training data set sizes (for example, the number of samples and
... # features). You can use placeholders specific to the data set,
... # such as $nr_features and $nr_samples, as the search ranges.
... search_space = {'RFOR_MTRY': {'type': 'discrete',
...                               'range': [1, '$nr_features/2']}}
>>> results = at.tune('rf', X, y,
...                   score_metric='f1_macro', param_space=search_space)
>>> score, params = results['all_evals'][0]
>>> ("{:.2}".format(score), ["{}:{}".format(k, params[k])
...                          for k in sorted(params)])
('0.93', ['RFOR_MTRY:10'])
>>>
>>> # Drop the database table.
... oml.drop('BreastCancer')
Parent topic: Automated Machine Learning