8.5 Automatic Data Preparation

Oracle Machine Learning for Python supports Automatic Data Preparation (ADP) and user-directed general data preparation.

The PREP_* settings enable you to request fully automated (ADP) or manual data preparation. By default, ADP is enabled (PREP_AUTO_ON). When you prepare the data manually, you must address the data preparation requirements of each algorithm yourself.

When you enable ADP, the model uses heuristics to transform the build data according to the requirements of the algorithm. Instead of ADP, you can request that the data be shifted and/or scaled with the PREP_SCALE_* and PREP_SHIFT_* settings. The transformation instructions are stored with the model and reused whenever the model is applied. The model settings can be viewed in USER_MINING_MODEL_SETTINGS.
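The idea of storing transformation instructions at build time and reusing them at apply time can be sketched in plain Python. This is only an illustration, not the in-database implementation: the `ColumnTransform` class and the `fit_zscore` and `fit_minmax` helpers are hypothetical names, and the use of the population standard deviation is an assumption, since this section does not state which estimator is used.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass(frozen=True)
class ColumnTransform:
    """Hypothetical container for the shift and scale learned from build data."""
    shift: float
    scale: float

    def __post_init__(self):
        # A constant column has zero spread and cannot be scaled this way.
        if self.scale == 0:
            raise ValueError("scale must be non-zero (constant column?)")

    def apply(self, values):
        # Reuse the stored parameters: (x - shift) / scale.
        return [(x - self.shift) / self.scale for x in values]


def fit_zscore(build_values):
    """Shift by the mean, scale by the standard deviation (assumed population)."""
    return ColumnTransform(shift=mean(build_values), scale=pstdev(build_values))


def fit_minmax(build_values):
    """Shift by the minimum, scale by the range (max - min)."""
    lo, hi = min(build_values), max(build_values)
    return ColumnTransform(shift=lo, scale=hi - lo)


build = [2.0, 4.0, 6.0, 8.0]   # data seen when the model is built
new = [5.0, 10.0]              # data seen later, when the model is applied

zscore = fit_zscore(build)
minmax = fit_minmax(build)

build_scaled = minmax.apply(build)  # first value 0.0, last value 1.0
new_scaled = minmax.apply(new)      # uses the build-time min and range
```

Because the parameters come from the build data only, values in `new` that fall outside the build range (such as 10.0 here) map outside [0,1]; the transformation is reused as stored rather than refitted.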

PREP_* Settings

The values for the PREP_* settings are described in the following table.

Table 8-2 PREP_* Settings

Setting Name: PREP_AUTO
Setting Value: PREP_AUTO_ON, PREP_AUTO_OFF
Description: Enables fully automated data preparation. The default is PREP_AUTO_ON.

Setting Name: PREP_SCALE_2DNUM
Setting Value: PREP_SCALE_STDDEV, PREP_SCALE_RANGE
Description: Enables scaling data preparation for two-dimensional numeric columns. PREP_AUTO must be set to PREP_AUTO_OFF for this setting to take effect. The possible values are:

PREP_SCALE_STDDEV: Divides the column values by the standard deviation of the column. It is often specified together with PREP_SHIFT_MEAN to yield z-score normalization.

PREP_SCALE_RANGE: Divides the column values by the range of values. It is often specified together with PREP_SHIFT_MIN to yield a range of [0,1].

Setting Name: PREP_SCALE_NNUM
Setting Value: PREP_SCALE_MAXABS
Description: Enables scaling data preparation for nested numeric columns. PREP_AUTO must be set to PREP_AUTO_OFF for this setting to take effect. The only valid value is PREP_SCALE_MAXABS, which yields data in the range [-1,1].

Setting Name: PREP_SHIFT_2DNUM
Setting Value: PREP_SHIFT_MEAN, PREP_SHIFT_MIN
Description: Enables centering data preparation for two-dimensional numeric columns. PREP_AUTO must be set to PREP_AUTO_OFF for this setting to take effect. The possible values are:

PREP_SHIFT_MEAN: Subtracts the average of the column from each value.

PREP_SHIFT_MIN: Subtracts the minimum of the column from each value.
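The rules in the table can be captured in a small validation helper. The following is a minimal sketch: `build_prep_settings` is a hypothetical function, not part of OML4Py, and it only assembles and checks a settings dictionary using the names and values listed above. How such a dictionary is passed to a model, and the exact value spelling the API expects, are not shown in this section and are left as assumptions.

```python
# Valid values for each PREP_* setting, copied from the settings table.
PREP_VALUES = {
    "PREP_AUTO": {"PREP_AUTO_ON", "PREP_AUTO_OFF"},
    "PREP_SCALE_2DNUM": {"PREP_SCALE_STDDEV", "PREP_SCALE_RANGE"},
    "PREP_SCALE_NNUM": {"PREP_SCALE_MAXABS"},
    "PREP_SHIFT_2DNUM": {"PREP_SHIFT_MEAN", "PREP_SHIFT_MIN"},
}


def build_prep_settings(**manual):
    """Hypothetical helper: validate manual shift/scale settings.

    Always turns ADP off, because the shift and scale settings
    take effect only when PREP_AUTO is off.
    """
    settings = {"PREP_AUTO": "PREP_AUTO_OFF"}
    for name, value in manual.items():
        if name == "PREP_AUTO" or name not in PREP_VALUES:
            raise ValueError(f"not a manual PREP_* setting: {name}")
        if value not in PREP_VALUES[name]:
            raise ValueError(f"invalid value {value} for {name}")
        settings[name] = value
    return settings


# Z-score normalization: shift by the mean, scale by the standard deviation.
zscore_settings = build_prep_settings(
    PREP_SHIFT_2DNUM="PREP_SHIFT_MEAN",
    PREP_SCALE_2DNUM="PREP_SCALE_STDDEV",
)

# Range normalization to [0,1]: shift by the minimum, scale by the range.
range_settings = build_prep_settings(
    PREP_SHIFT_2DNUM="PREP_SHIFT_MIN",
    PREP_SCALE_2DNUM="PREP_SCALE_RANGE",
)
```

Forcing PREP_AUTO off inside the helper keeps the combination consistent with the table, where every shift and scale setting is ignored while ADP is on.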