8.3 Shared Settings

These settings are common to all of the OML4Py machine learning classes.

The following table lists the settings that are shared by all OML4Py models.

Table 8-1 Shared Model Settings

Setting Name Setting Value Description

ODMS_DETAILS

ODMS_ENABLE

ODMS_DISABLE

Helps to control model size in the database. Model details can consume significant disk space, especially for partitioned models. The default value is ODMS_ENABLE.

If the setting value is ODMS_ENABLE, then model detail tables and views are created along with the model. You can query the model details using SQL.

If the value is ODMS_DISABLE, then neither the model detail tables nor any tables relevant to model details are created.

The reduction in space depends on the algorithm. Model size reduction can be on the order of 10x.

ODMS_MAX_PARTITIONS

1 < value <= 1000000

Controls the maximum number of partitions allowed for a partitioned model. The default is 1000.

ODMS_MISSING_VALUE_TREATMENT

ODMS_MISSING_VALUE_AUTO

ODMS_MISSING_VALUE_MEAN_MODE

ODMS_MISSING_VALUE_DELETE_ROW

Indicates how to treat missing values in the training data. This setting does not affect the scoring data. The default value is ODMS_MISSING_VALUE_AUTO.

ODMS_MISSING_VALUE_MEAN_MODE replaces missing values with the mean (numeric attributes) or the mode (categorical attributes) both at build time and apply time where appropriate. ODMS_MISSING_VALUE_AUTO applies a different strategy depending on the algorithm.

When ODMS_MISSING_VALUE_TREATMENT is set to ODMS_MISSING_VALUE_DELETE_ROW, the rows in the training data that contain missing values are deleted. However, if you want to replicate this missing value treatment in the scoring data, then you must perform the transformation explicitly.

The value ODMS_MISSING_VALUE_DELETE_ROW is applicable to all algorithms.
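The effect of the two explicit treatments can be sketched in plain Python. This is only an illustration of the semantics described above, not the in-database implementation; the function names, column names, and sample rows are hypothetical, and a missing value is represented as None.

```python
from statistics import mean, mode

def mean_mode_fill(rows, numeric_cols, categorical_cols):
    """Mimic ODMS_MISSING_VALUE_MEAN_MODE: fill None with the mean or mode."""
    fill = {}
    for col in numeric_cols:
        fill[col] = mean([r[col] for r in rows if r[col] is not None])
    for col in categorical_cols:
        fill[col] = mode([r[col] for r in rows if r[col] is not None])
    return [
        {col: (fill[col] if val is None and col in fill else val)
         for col, val in row.items()}
        for row in rows
    ]

def delete_rows(rows):
    """Mimic ODMS_MISSING_VALUE_DELETE_ROW: drop rows with any missing value."""
    return [row for row in rows if all(v is not None for v in row.values())]

# Hypothetical training data.
train = [
    {"AGE": 30, "CITY": "Oslo"},
    {"AGE": None, "CITY": "Oslo"},
    {"AGE": 50, "CITY": None},
    {"AGE": 40, "CITY": "Rome"},
]

filled = mean_mode_fill(train, ["AGE"], ["CITY"])
kept = delete_rows(train)
print(filled)
print(kept)
```

Because ODMS_MISSING_VALUE_DELETE_ROW acts only on the training data, a step like delete_rows would have to be applied to the scoring data explicitly to obtain the same treatment there.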

ODMS_PARTITION_BUILD_TYPE

ODMS_PARTITION_BUILD_INTRA

ODMS_PARTITION_BUILD_INTER

ODMS_PARTITION_BUILD_HYBRID

Controls the parallel building of partitioned models.

ODMS_PARTITION_BUILD_INTRA builds each partition in parallel using all slaves.

ODMS_PARTITION_BUILD_INTER builds each partition entirely in a single slave, but multiple partitions may be built at the same time because multiple slaves are active.

ODMS_PARTITION_BUILD_HYBRID combines the other two types and is recommended for most situations to adapt to dynamic environments. This is the default value.

ODMS_PARTITION_COLUMNS

Comma-separated list of machine learning attributes

Requests the building of a partitioned model. The setting value is a comma-separated list of the machine learning attributes to be used to determine the in-list partition key values. These attributes are taken from the input columns, unless an XFORM_LIST parameter is passed to the model. If an XFORM_LIST parameter is passed to the model, then the attributes are taken from the attributes produced by these transformations.
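As a rough sketch, the partitioning settings can be collected into a Python dictionary of string keys and values. The helper partition_settings and the column names REGION and GENDER are hypothetical. Passing the dictionary to a model class as keyword arguments, shown in the final comment, is an assumption about how settings are supplied; it requires a database connection and is not executed here.

```python
VALID_BUILD_TYPES = {
    "ODMS_PARTITION_BUILD_INTRA",
    "ODMS_PARTITION_BUILD_INTER",
    "ODMS_PARTITION_BUILD_HYBRID",
}

def partition_settings(columns, max_partitions=1000,
                       build_type="ODMS_PARTITION_BUILD_HYBRID"):
    """Return a settings dict for a partitioned model (hypothetical helper)."""
    if not columns:
        raise ValueError("at least one partition column is required")
    # Table 8-1 gives the allowed range as 1 < value <= 1000000.
    if not 1 < max_partitions <= 1_000_000:
        raise ValueError("ODMS_MAX_PARTITIONS must satisfy 1 < value <= 1000000")
    if build_type not in VALID_BUILD_TYPES:
        raise ValueError(f"unknown build type: {build_type}")
    return {
        # The setting value is a comma-separated list of attribute names.
        "ODMS_PARTITION_COLUMNS": ",".join(columns),
        "ODMS_MAX_PARTITIONS": str(max_partitions),
        "ODMS_PARTITION_BUILD_TYPE": build_type,
    }

settings = partition_settings(["REGION", "GENDER"], max_partitions=500)
print(settings)

# Assumed usage inside an OML4Py session (not run here):
#   model = oml.svm("classification", **settings)
```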

ODMS_TABLESPACE_NAME

tablespace_name

Specifies the tablespace in which to store the model.

If you explicitly set this to the name of a tablespace (for which you have sufficient quota), then the resulting model content is created in the specified tablespace. If you do not provide this setting, then the resulting model content is created in your default tablespace.

ODMS_SAMPLE_SIZE

0 < value

Determines approximately how many rows to sample. You can use this setting only if ODMS_SAMPLING is enabled. The default value is system determined.

ODMS_SAMPLING

ODMS_SAMPLING_ENABLE

ODMS_SAMPLING_DISABLE

Allows the user to request sampling of the build data. The default is ODMS_SAMPLING_DISABLE.
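The dependency between the two sampling settings can be expressed as a small check before a model is built. The helper sampling_settings is hypothetical, and representing setting values as strings is an assumption.

```python
def sampling_settings(enabled=False, sample_size=None):
    """Return sampling settings, rejecting a size when sampling is disabled."""
    if sample_size is not None:
        if not enabled:
            # ODMS_SAMPLE_SIZE can be used only if ODMS_SAMPLING is enabled.
            raise ValueError("ODMS_SAMPLE_SIZE requires ODMS_SAMPLING_ENABLE")
        if sample_size <= 0:
            raise ValueError("ODMS_SAMPLE_SIZE must satisfy 0 < value")
    settings = {
        "ODMS_SAMPLING": "ODMS_SAMPLING_ENABLE" if enabled
        else "ODMS_SAMPLING_DISABLE"
    }
    if sample_size is not None:
        settings["ODMS_SAMPLE_SIZE"] = str(sample_size)
    return settings

print(sampling_settings(enabled=True, sample_size=10000))
print(sampling_settings())
```

When the size is omitted, the sketch leaves ODMS_SAMPLE_SIZE unset so that the system-determined default applies.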

ODMS_TEXT_MAX_FEATURES

1 <= value

The maximum number of distinct features, across all text attributes, to use from a document set passed to the model. The default is 3000. An oml.esa model has the default value of 300000.

ODMS_TEXT_MIN_DOCUMENTS

Non-negative value

This text processing setting controls how many documents a token needs to appear in to be used as a feature.

The default is 1. An oml.esa model has the default value of 3.
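The interaction between the minimum document count and the feature limit can be illustrated with a toy feature selector. This sketch makes several assumptions that the table does not state: tokens are split on whitespace (actual token extraction depends on the Oracle Text policy), and when more tokens qualify than the limit allows, the tokens that appear in the most documents are kept. The function name and sample documents are hypothetical.

```python
from collections import Counter

def select_text_features(documents, min_documents=1, max_features=3000):
    """Keep tokens found in at least min_documents documents, up to a limit."""
    doc_freq = Counter()
    for doc in documents:
        # Count each token once per document (document frequency).
        doc_freq.update(set(doc.lower().split()))
    eligible = [(tok, n) for tok, n in doc_freq.items() if n >= min_documents]
    # Assumed ranking: highest document frequency first, ties alphabetical.
    eligible.sort(key=lambda pair: (-pair[1], pair[0]))
    return [tok for tok, _ in eligible[:max_features]]

docs = ["red apple", "red car", "green apple"]
print(select_text_features(docs, min_documents=2))
print(select_text_features(docs, min_documents=2, max_features=1))
```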

ODMS_TEXT_POLICY_NAME

The name of an Oracle Text POLICY created using CTX_DDL.CREATE_POLICY.

Affects how individual tokens are extracted from unstructured text.

For details about CTX_DDL.CREATE_POLICY, see Oracle Text Reference.

PREP_AUTO

PREP_AUTO_ON

PREP_AUTO_OFF

This data preparation setting enables fully automated data preparation.

The default is PREP_AUTO_ON.

PREP_SCALE_2DNUM

PREP_SCALE_STDDEV

PREP_SCALE_RANGE

This data preparation setting enables scaling data preparation for two-dimensional numeric columns. PREP_AUTO must be set to PREP_AUTO_OFF for this setting to take effect. The following are the possible values:

PREP_SCALE_STDDEV: Divides the column values by the standard deviation of the column. It is often provided together with PREP_SHIFT_MEAN to yield z-score normalization.

PREP_SCALE_RANGE: Divides the column values by the range of values. It is often provided together with PREP_SHIFT_MIN to yield a range of [0,1].

PREP_SCALE_NNUM

PREP_SCALE_MAXABS

This data preparation setting enables scaling data preparation for nested numeric columns. PREP_AUTO must be set to PREP_AUTO_OFF for this setting to take effect. If specified, then the valid value for this setting is PREP_SCALE_MAXABS, which yields data in the range of [-1,1].
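A minimal sketch of maximum-absolute-value scaling shows why the result falls in [-1,1]. It assumes that PREP_SCALE_MAXABS divides each value by the largest absolute value of the column, which is the conventional meaning of the name; the nested column structure is ignored here, and scale_maxabs is a hypothetical helper.

```python
def scale_maxabs(values):
    """Divide each value by the largest absolute value in the column."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        # An all-zero column is returned unchanged to avoid dividing by zero.
        return list(values)
    return [v / max_abs for v in values]

scaled = scale_maxabs([-4, 2, 1])
print(scaled)
```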

PREP_SHIFT_2DNUM

PREP_SHIFT_MEAN

PREP_SHIFT_MIN

This data preparation setting enables centering data preparation for two-dimensional numeric columns. PREP_AUTO must be set to PREP_AUTO_OFF for this setting to take effect. The following are the possible values:

PREP_SHIFT_MEAN: Results in subtracting the average of the column from each value.

PREP_SHIFT_MIN: Results in subtracting the minimum of the column from each value.
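The way a shift setting combines with a scale setting can be sketched as follows: each value is first shifted and then divided by the scale. The function names are hypothetical, and the use of the sample standard deviation is an assumption, because the table does not say which form of the standard deviation is applied.

```python
from statistics import mean, stdev

def shift_scale(values, shift, scale):
    """Apply (value - shift) / scale to every value in the column."""
    return [(v - shift) / scale for v in values]

def zscore(values):
    """PREP_SHIFT_MEAN with PREP_SCALE_STDDEV: z-score normalization."""
    return shift_scale(values, mean(values), stdev(values))

def min_max(values):
    """PREP_SHIFT_MIN with PREP_SCALE_RANGE: values mapped into [0,1]."""
    lo, hi = min(values), max(values)
    return shift_scale(values, lo, hi - lo)

data = [2.0, 4.0, 6.0]
print(zscore(data))
print(min_max(data))
```

Both helpers assume a column with at least two distinct values; a constant column would give a scale of zero and needs separate handling.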