4.8.4 About Partitioned Models
Partitioned models allow you to divide your data set into multiple partitions based on specific attributes and build a model for each partition. The system automates the creation and management of these models, reducing manual effort.
When you build a model on a data set, a single generic model may not perform well on new or evolving data. To address this, you can specify partition columns to create separate models for each partition based on some characteristics. For example, if your data set includes a "REGION" attribute with four values, four models will be created automatically, each corresponding to a region. The system manages these sub-models as part of a single partitioned model.
The partitioned model resides as a first-class, persistent database object, with all sub-models stored on disk. This structure improves performance, especially for models with a large number of partitions, and allows for efficient management, such as dropping individual sub-models when needed.
When building or scoring partitioned models, not all sub-models need to be loaded into memory simultaneously. This approach optimizes memory usage and enhances processing efficiency. The system provides a single model for scoring, while users can still access individual component models if needed.
- Partitioned Model Build Process
To build a partitioned model, Oracle Machine Learning for SQL requires a partitioning key specified in a settings table. - DDL in Partitioned model
Learn about maintenance of partitioned models thorough DDL operations. - Partitioned Model Scoring
The scoring of the partitioned model is the same as that of the non-partitioned model.
Related Topics
See Also:
Oracle Machine Learning for SQL User’s Guide to understand more about partitioned models.
Parent topic: Specify Model Settings
4.8.4.1 Partitioned Model Build Process
To build a partitioned model, Oracle Machine Learning for SQL requires a partitioning key specified in a settings table.
The partitioning key is a comma-separated list of one or more columns (up to 16) from the input data set. The partitioning key horizontally slices the input data based on discrete values of the partitioning key. That is, partitioning is performed as list values as opposed to range partitioning against a continuous value. The partitioning key supports only columns of the data type NUMBER and VARCHAR2.
During the build process the input data set is partitioned based on the distinct values of the specified key. Each data slice (unique key value) results in its own model partition. The resultant model partition is not separate and is not visible to you as a standalone model. The default value of the maximum number of partitions for partitioned models is 1000 partitions. You can also set a different maximum partitions value. If the number of partitions in the input data set exceeds the defined maximum, OML4SQL throws an exception.
The partitioned model organizes features common to all partitions and the partition specific features. The common features consist of the following metadata:
-
The model name
-
The machine learning function
-
The machine learning algorithm
-
A super set of all machine learning model attributes referenced by all partitions (signature)
-
A common set of user-defined column transformations
-
Any user-specified or default build settings that are interpreted as global; for example, the Auto Data Preparation (ADP) setting
Parent topic: About Partitioned Models
4.8.4.2 DDL in Partitioned model
Learn about maintenance of partitioned models thorough DDL operations.
Partitioned models are maintained through the following DDL operations:
- Add Partition
Oracle Machine Learning for SQL supports adding a single partition or multiple partitions to an existing partitioned model. - Drop Partition
Oracle Machine Learning for SQL supports dropping a single model partition for a given partition name. - Replace Partition
Oracle Machine Learning for SQL supports replacing an existing partition for which conflicting keys are found. - Drop Partitioned Model
OML4SQL supports dropping a partitioned model. This is used when the full model is obsolete or no longer required.
Parent topic: About Partitioned Models
4.8.4.2.1 Add Partition
Oracle Machine Learning for SQL supports adding a single partition or multiple partitions to an existing partitioned model.
The addition occurs based on the input data set and the name of the existing partitioned model. The operation takes the input data set and the existing partitioned model as parameters. The partition keys are extracted from the input data set and the model partitions are built against the input data set. These partitions are added to the partitioned model.
If the input data set contains multiple keys, then the operation may create
one or more partitions. If the total number of partitions in the model increases to more
than the user-defined maximum specified when the model was created, then you get an
error. The default threshold value for the number of partitions is 1000. See
ODMS_MAX_PARTITIONS in DBMS_DATA_MINING - Global Settings for more details.
Tip:
In the case where partition keys for new partitions conflict with the existing partitions in the model, you can select from the following three approaches to resolve the conflicts:-
ERROR: Terminates the ADD operation without adding any partitions. -
REPLACE: Replaces the existing partition for which the conflicting keys are found. -
IGNORE: Eliminates the rows having the conflicting keys.
The ODMS_PARTITIONED_COLUMNS parameter is used for selecting
the column on which partition is applied. The following example shows selecting the
column for partition:
%script
BEGIN DBMS_DATA_MINING.DROP_MODEL('SVM_MOD_PARTITIONED');
EXCEPTION WHEN OTHERS THEN NULL; END;
/
DECLARE
v_setlst DBMS_DATA_MINING.SETTING_LIST;
BEGIN
v_setlst('ALGO_NAME') := 'ALGO_SUPPORT_VECTOR_MACHINES'; -- SVM algorithm
v_setlst('SVMS_KERNEL_FUNCTION') := 'SVMS_LINEAR'; -- choose linear kernel, which provides coefficients
v_setlst('ODMS_PARTITION_COLUMNS') := 'CUST_GENDER'; -- choose column on which to partition the data
DBMS_DATA_MINING.CREATE_MODEL2(
MODEL_NAME => 'SVM_MOD_PARTITIONED',
MINING_FUNCTION => 'REGRESSION',
DATA_QUERY => 'SELECT * FROM CUSTOMERS_DEMO',
SET_LIST => v_setlst,
CASE_ID_COLUMN_NAME => 'CUST_ID',
TARGET_COLUMN_NAME => 'YRS_RESIDENCE');
END;
You can also define ODMS_PARTITION_BUILD_TYPE to control the
parallel build of partitioned models.
ERROR as an
optional value for ADD_OPTIONS which terminates the add operation
without adding any
partitions.BEGIN
DBMS_DATA_MINING.ADD_PARTITION(
model_name => 'SVM_MOD_PARTITIONED',
data_query =>'SELECT * FROM CUSTOMERS_DEMO',
add_options = >'ERROR'
);
END;
/See Also:
DBMS_DATA_MINING - Global Settings and ADD_PARTITION Procedure to specify the settings.
Parent topic: DDL in Partitioned model
4.8.4.2.2 Drop Partition
Oracle Machine Learning for SQL supports dropping a single model partition for a given partition name.
Drop a partition when a particular segment of data is no longer relevant or has become obsolete.
Note:
If only one partition exists, you cannot directly drop it. Instead, add more partitions before dropping it or drop the entire model.CUST_GENDER
column on which partition was
applied.--For dropping a partition with the value 'M'
BEGIN
DBMS_DATA_MINING.DROP_PARTITION(
model_name => 'SVM_MOD_PARTITIONED',
partition_name =>'M'
);
END;
/
--For dropping a partition with the value 'F'
BEGIN
DBMS_DATA_MINING.DROP_PARTITION(
model_name => 'SVM_MOD_PARTITIONED',
partition_name =>'F'
);
END;
/
Parent topic: DDL in Partitioned model
4.8.4.2.3 Replace Partition
Oracle Machine Learning for SQL supports replacing an existing partition for which conflicting keys are found.
You can use REPLACE as an optional parameter to define how
add operation handles when rows in the input data set conflict with existing partitions
in the model.
The following example shows how you can use REPLACE as an
optional value for ADD_OPTIONS.
BEGIN
DBMS_DATA_MINING.ADD_PARTITION(
model_name => 'SVM_MOD_PARTITIONED',
data_query =>'SELECT * FROM CUSTOMERS_DEMO',
add_options = >'REPLACE'
);
END;
/See Also:
ADD_PARTITION ProcedureParent topic: DDL in Partitioned model
4.8.4.2.4 Drop Partitioned Model
OML4SQL supports dropping a partitioned model. This is used when the full model is obsolete or no longer required.
BEGIN DBMS_DATA_MINING.DROP_MODEL('SVM_MOD_PARTITIONED');
EXCEPTION WHEN OTHERS THEN NULL;
END;
/See Also:
DROP_MODEL ProcedureParent topic: DDL in Partitioned model
4.8.4.3 Partitioned Model Scoring
The scoring of the partitioned model is the same as that of the non-partitioned model.
The syntax of the machine learning function remains the same but is extended to provide an optional hint.
For scoring a partitioned model, the signature columns used during the build for the partitioning key must be present in the scoring data set. These columns are combined to form a unique partition key. The unique key is then mapped to a specific underlying model partition, and the identified model partition is used to score that row.
The partitioned objects that are necessary for scoring are loaded on demand during the query execution and are aged out depending on the System Global Area (SGA) memory.
CREATE_MODEL2 procedure is used for creating
the model. The partition attribute is CUST_GENDER. This attribute
has two options M and F.%script
BEGIN DBMS_DATA_MINING.DROP_MODEL('SVM_MOD_PARTITIONED');
EXCEPTION WHEN OTHERS THEN NULL; END;
/
DECLARE
v_setlst DBMS_DATA_MINING.SETTING_LIST;
BEGIN
v_setlst('ALGO_NAME'):= 'ALGO_SUPPORT_VECTOR_MACHINES';
v_setlst('SVMS_KERNEL_FUNCTION') :='SVMS_LINEAR';
v_setlst('ODMS_PARTITION_COLUMNS'):='CUST_GENDER';
DBMS_DATA_MINING.CREATE_MODEL2(
MODEL_NAME => 'SVM_MOD_PARTITIONED',
MINING_FUNCTION => 'REGRESSION',
DATA_QUERY => 'SELECT * FROM CUSTOMERS_DEMO',
SET_LIST => v_setlst,
CASE_ID_COLUMN_NAME => 'CUST_ID',
TARGET_COLUMN_NAME => 'YRS_RESIDENCE');
END;
The
output is as
follows:
PL/SQL procedure successfully completed.
---------------------------
PL/SQL procedure successfully completed.
%script
SELECT cust_id, YRS_RESIDENCE,
ROUND(PREDICTION(SVM_MOD_PARTITIONED USING *),2) pred_YRS_RESIDENCE
FROM CUSTOMERS_DEMO;
CUST_ID YRS_RESIDENCE PRED_YRS_RESIDENCE
100100 4 4.71
100200 2 1.62
100300 4 4.66
100400 6 5.9
100500 2 2.07
100600 3 2.74
100700 6 5.78
100800 5 7.22
100900 4 4.88
101000 7 6.49
101100 4 3.54
101200 1 1.46
101300 4 4.34
101400 4 4.34 ...SVM_MOD_PARTITIONED, for the partition where
CUST_GENDER =
'M'.SELECT target_value class, attribute_name aname, attribute_value aval, coefficient coeff
FROM (SELECT target_value, attribute_name, attribute_value, coefficient
FROM DM$VLSVM_MOD_PARTITIONED WHERE partition_name =
(SELECT ORA_DM_PARTITION_NAME(SVM_MOD_PARTITIONED using 'M' CUST_GENDER) FROM dual)
ORDER BY coefficient DESC)
WHERE ROWNUM <= 10;Use Optional Hint for Scoring a Partitioned Model
The optional hint can impact the performance of a query which involves scoring a partitioned model. See GROUPING Hint for more details.
The following example retrieves the top 3 customers per gender based on
prediction probability using a partitioned SVM model named
SVM_MOD_PARTITIONED and using a GROUPING
hint.
column gender format a1
column income format a30
column rnk format 9
SELECT cust_id, cust_gender as gender, rnk, pd FROM
( SELECT cust_id, cust_gender,
PREDICTION_DETAILS(/*+ GROUPING */ SVM_MOD_PARTITIONED, 1 USING *) pd,
rank() over (partition by cust_gender order by
PREDICTION_PROBABILITY(SVM_MOD_PARTITIONED, 1 USING *) desc, cust_id) rnk
FROM mining_data_apply_parallel_v)
WHERE rnk <= 3
order by rnk, cust_gender;
See the oml4sql-partitioned-models-svm example on GitHub for a complete example.
Related Topics
Parent topic: About Partitioned Models