6 Managing System Resources for Oracle Data Miner

This chapter provides information to help you optimize your system to support Oracle Data Miner.

Oracle Data Miner Resource Management Overview

You can effectively manage system resources for Oracle Data Miner by using features of Oracle Database and the options provided in Oracle Data Miner repository.

You can effectively manage system resources for Oracle Data Miner in the following ways:

  • To manage Oracle Data Miner sessions, develop an appropriate resource plan using Oracle Database Resource Manager.

  • To manage workflow execution, change the Oracle Data Miner default job class used for Oracle Scheduler to a job class with an appropriate resource plan.

  • To manage the model build process, change the Oracle Data Miner default maximum number of concurrent model builds.

  • To manage parallel query processing, change the Oracle Data Miner default parallel query setting to prevent users from specifying parallel processing for individual nodes and workflows.

Allocating Resources to Oracle Data Miner User Sessions

You can use Database Resource Manager to create resource plans that allocate system resources for groups of sessions based on session attributes.

Oracle Database Resource Manager allocates CPU time, configures parallel query processing, limits the number of sessions, and controls other aspects of system behavior that would otherwise be controlled by the operating system. In a database where multiple applications run concurrently and compete for system resources, you can use Oracle Database Resource Manager to distribute the workloads and optimize overall performance. For example, you could balance the demands of ETL, OLAP, data mining, and reporting workloads running simultaneously in the database.

Oracle Data Miner workflows can potentially make extensive demands on system resources, especially when transformations and large data sets are involved. In a database that must accommodate the demands of multiple applications, you can create a resource plan to limit the impact of Oracle Data Miner sessions on other applications and prevent other applications from compromising the performance of Oracle Data Miner.

Example 6-1 illustrates the creation of a simple resource plan for Oracle Data Miner. The resource plan, called SIMPLE_RESOURCE_PLAN, creates two consumer groups: DATA_MINER_GROUP and OTHER_GROUPS. The plan allocates 50% of CPU resource to DATA_MINER_GROUP and the rest to OTHER_GROUPS. The DATA_MINER_GROUP is mapped to the DMUSER account; other users are mapped to the OTHER_GROUPS group.

Example 6-1 Simple Resource Plan for Oracle Data Miner Sessions

CONNECT sys as sysdba;
Enter password: password
-- creating a pending area is the first step in defining 
-- consumer groups and resource plans
EXEC dbms_resource_manager.create_pending_area();
-- delete old plan (optional)
EXEC dbms_resource_manager.delete_plan_cascade(
               plan => 'SIMPLE_RESOURCE_PLAN');
-- create a custom consumer group for data miner workload
EXEC dbms_resource_manager.create_consumer_group(
               consumer_group => 'DATA_MINER_GROUP', 
               comment => 'Sessions for data miner operations');
-- map DMUSER account to the consumer group
EXEC dbms_resource_manager.set_consumer_group_mapping(
               attribute => dbms_resource_manager.oracle_user, 
               value => 'DMUSER', 
               consumer_group => 'DATA_MINER_GROUP');
-- create a custom resource plan
EXEC dbms_resource_manager.create_plan(
               plan => 'SIMPLE_RESOURCE_PLAN', 
               comment => 'Resource plan for database operations');
-- specifies how much CPU and parallelism 
-- should be allocated to the consumer group
EXEC dbms_resource_manager.create_plan_directive(
               plan => 'SIMPLE_RESOURCE_PLAN',
               group_or_subplan => 'DATA_MINER_GROUP',
               comment => 'Percentage of CPU for DATA_MINER_GROUP', 
               mgmt_p1 => 50, 
               utilization_limit => 55, 
               parallel_degree_limit_p1 => 8, 
               parallel_server_limit => 4);
-- specifies how much CPU should be allocated to the required OTHER_GROUPS
EXEC dbms_resource_manager.create_plan_directive(
                plan => 'SIMPLE_RESOURCE_PLAN',
                group_or_subplan => 'OTHER_GROUPS',
                comment => 'Percentage of CPU for OTHER_GROUPS', 
                mgmt_p1 => 50);
-- persist plan to the database
EXEC dbms_resource_manager.submit_pending_area();
-- Now that the resource plan is defined, enable it by setting
-- the resource_manager_plan parameter with the resource plan name
ALTER SYSTEM SET resource_manager_plan = 'SIMPLE_RESOURCE_PLAN'; 
-- DBA can also enable a resource plan for the period of time corresponding 
-- to a job scheduler Window (via the dbms_scheduler.create_window procedure)

Managing Model Builds

The process of building mining models can consume significant system resources. You have the option to control the impact of model builds on overall system resources, by increasing or decreasing the value of MAX_NUM_THREADS repository property.

MAX_NUM_THREADS specifies the maximum number of mining model builds that can execute concurrently across all workflows in an Oracle Data Miner session. MAX_NUM_THREADS has no effect on model builds that are triggered individually and do not belong to a workflow.

For example, if one workflow is attempting to build 25 models while another workflow is attempting to build 15 models and MAX_NUM_THREADS is 10, then 10 model build operations occur simultaneously and the 30 remaining model builds are queued. The delayed build processes complete with a warning message that explains the reason for the delay. Two additional properties, THREAD_WAIT_TIME and MAX_THREAD_WAIT control the queuing of model builds. See Table 6-1 for details.

Example 6-2 shows how to increase the maximum number of concurrent model builds from 10 (the default) to 15.

This script produces the following log:

-- value of MAX_NUM_THREADS before update
SELECT property_name, property_num_value 
    FROM ODMRSYS.ODMR$REPOSITORY_PROPERTIES
    WHERE property_name = 'MAX_NUM_THREADS';
PROPERTY_NAME PROPERTY_NUM_VALUE
------------------------------ ----------------------
MAX_NUM_THREADS 10
-- update MAX_NUM_THREADS
UPDATE ODMRSYS.ODMR$REPOSITORY_PROPERTIES
    SET property_num_value = 15
    WHERE property_name = 'MAX_NUM_THREADS';
1 rows updated
-- commit change
commit
committed
-- value of MAX_NUM_THREADS after update
SELECT property_name, property_num_value 
    FROM ODMRSYS.ODMR$REPOSITORY_PROPERTIES
    WHERE property_name = 'MAX_NUM_THREADS';
PROPERTY_NAME PROPERTY_NUM_VALUE
------------------------------ ----------------------
MAX_NUM_THREADS 15

Example 6-2 Changing the Number of Concurrent Model Builds

set echo on;
-- value of MAX_NUM_THREADS before update
SELECT property_name, property_num_value 
    FROM ODMRSYS.ODMR$REPOSITORY_PROPERTIES
    WHERE property_name = 'MAX_NUM_THREADS';
-- update MAX_NUM_THREADS
UPDATE ODMRSYS.ODMR$REPOSITORY_PROPERTIES
    SET property_num_value = 15
    WHERE property_name = 'MAX_NUM_THREADS';
-- commit change
COMMIT;
-- value of MAX_NUM_THREADS after update
SELECT property_name, property_num_value 
    FROM ODMRSYS.ODMR$REPOSITORY_PROPERTIES
    WHERE property_name = 'MAX_NUM_THREADS';

Managing Workflow Execution

Oracle Data Miner submits workflows to Oracle Database Scheduler for execution as Scheduler Jobs. Oracle Scheduler supports a variety of features that control how system resources are allocated. You can configure Oracle Scheduler to effectively manage a large pool of run requests.

Oracle Data Miner uses the default Scheduler job class, DEFAULT_JOB_CLASS as its own default. In a resource plan, jobs that run as DEFAULT_JOB_CLASS are not assigned to any consumer group; access to system resources is not restricted for jobs that have the default class. You can change the job class to a class that is based on a consumer group by setting the Oracle Data Miner repository property WORKFLOW_JOB_CLASS.

Example 6-3 shows you could create a MINING_CLASS job class based on a consumer group HEAVY_LOAD_RESOURCE_GROUP, which was previously created to allocate high CPU for heavy workload jobs. When you update WORKFLOW_JOB_CLASS, the workflow will run with access to system resources that are restricted to this consumer group.The resource plan for the assigned HEAVY_LOAD_RESOURCE_GROUP group must be active when the workflow is run. You can set up Scheduler windows to activate specific resource plans at specific time periods.

Example 6-3 Changing the Scheduler Job Class for Oracle Data Miner Workflows

connect sys as sysdba;
Enter password: password
EXEC DBMS_SCHEDULER.CREATE_JOB_CLASS(
              job_class_name => 'MINING_CLASS', 
              resource_consumer_group => 'HEAVY_LOAD_RESOURCE_GROUP');
GRANT EXECUTE ON MINING_CLASS to DMUSER;
-- update WORKFLOW_JOB_CLASS
UPDATE ODMRSYS.ODMR$REPOSITORY_PROPERTIES
   SET property_str_value = 'MINING_CLASS'
   WHERE property_name = 'WORKFLOW_JOB_CLASS';
-- commit change
commit;
-- verify value of WORKFLOW_JOB_CLASS after update
SELECT property_name, property_str_value 
   FROM ODMRSYS.ODMR$REPOSITORY_PROPERTIES
   WHERE property_name = 'WORKFLOW_JOB_CLASS';

Managing Parallel Processing

Oracle Data Miner workflows and views, and most Data Mining algorithms, can take advantage of parallel processing of queries when it is enabled in the database. Parameters in INIT.ORA control the behavior of parallel processing. By default parallelism is disabled (PARALLEL_DEGREE_POLICY=MANUAL).

Parallel processing of queries can be system-determined, or it can be set to Degree of Parallelism. When parallel processing is system-determined, the database dynamically determines Degree of Parallelism values for all SQL statements.

The parallel feature of Oracle Database is designed to use maximum resources assuming the operation will finish faster if you use more resources. In a multiuser environment, increasing the use of parallelism can rapidly deplete system resources, reducing resources for other users to execute parallel statements concurrently.

Oracle Data Miner workflows support a parallel query feature, which allows users to manually enable parallel processing for specific nodes or entire workflows. You can disable this feature by setting the Oracle Data Miner repository property PARALLEL_QUERY_ON_ALLOWED to FALSE. By default, this property is set to TRUE. Example 6-4 shows how to disable the parallel query processing feature of Oracle Data Miner.

The Oracle Data Miner model build process can generate many parallel sessions if your database configuration allows for it. To limit the impact on the overall system, you should implement an appropriate resource plan and, if necessary, prevent users from setting parallel query within their Data Miner sessions.

Example 6-4 Disabling the Parallel Query Feature for Oracle Data Miner Workflows and Nodes

connect sys as sysdba;
Enter password: password
-- value of PARALLEL_QUERY_ON_ALLOWED before update
SELECT property_name, property_str_value 
   FROM ODMRSYS.ODMR$REPOSITORY_PROPERTIES
   WHERE property_name = 'PARALLEL_QUERY_ON_ALLOWED';
-- update PARALLEL_QUERY_ON_ALLOWED
UPDATE ODMRSYS.ODMR$REPOSITORY_PROPERTIES
   SET property_str_value = 'FALSE'
   WHERE property_name = 'PARALLEL_QUERY_ON_ALLOWED';
-- commit change
COMMIT;
-- verify value of PARALLEL_QUERY_ON_ALLOWED after update
SELECT property_name, property_str_value 
   FROM ODMRSYS.ODMR$REPOSITORY_PROPERTIES
   WHERE property_name = 'PARALLEL_QUERY_ON_ALLOWED';
connect sys as sysdba;
Enter password: password
-- value of PARALLEL_QUERY_ON_ALLOWED before update
SELECT property_name, property_str_value 
   FROM ODMRSYS.ODMR$REPOSITORY_PROPERTIES
   WHERE property_name = 'PARALLEL_QUERY_ON_ALLOWED';
-- update PARALLEL_QUERY_ON_ALLOWED
UPDATE ODMRSYS.ODMR$REPOSITORY_PROPERTIES
   SET property_str_value = 'FALSE'
   WHERE property_name = 'PARALLEL_QUERY_ON_ALLOWED';
-- commit change
COMMIT;
-- verify value of PARALLEL_QUERY_ON_ALLOWED after update
SELECT property_name, property_str_value 
   FROM ODMRSYS.ODMR$REPOSITORY_PROPERTIES
   WHERE property_name = 'PARALLEL_QUERY_ON_ALLOWED';

Summary of Oracle Data Miner Repository Properties for System Management

A summary of the system management properties is available in the Oracle Data Miner repository.

Table 6-1 provides a summary of the system management properties available in the Oracle Data Miner repository.

Table 6-1 Oracle Data Miner Repository Properties for System Management

Property Type Description

PARALLEL_QUERY_ON_ALLOWED

Boolean

Indicates whether users can specify parallel query for nodes or workflows. Values are TRUE or FALSE. Default is TRUE.

MAX_NUM_THREADS

Integer

Maximum number of concurrent model builds. Default is 10.

THREAD_WAIT_TIME

Integer

When MAX_NUM_THREADS is reached, any outstanding model build processes are queued until the parallel model build count is less than MAX_NUM_THREADS. The THREAD_WAIT_TIME setting determines how long to wait before checking the parallel model build count. The default wait time is 5 seconds.

MAX_THREAD_WAIT

Integer

The timeout interval in seconds for a model build process that has been queued. When a timeout occurs, the build process exits with an error message stating that the process lock wait timeout interval has been exceeded. When the value is NULL, no timeout occurs. Default is NULL

MAX_STRING_SIZE VARCHAR2, NVARCHAR2 Controls the maximum size of the data types VARCHAR2 and NVARCHAR2 in tables that are generated by Oracle Data Miner. This property is initialized from the value of MAX_STRING_SIZE database property.

Note:

If you change the MAX_STRING_SIZE database property, then you must update the MAX_STRING_SIZE property in the ODMR$REPOSITORY_PROPERTIES to match the value. ODMr run time error may occur if the values are not in synchrony.

WORKFLOW_JOB_CLASS

Varchar

Oracle Scheduler job class for workflows. Default is DEFAULT_JOB_CLASS.

POLLING_IDLE_RATE NUMBER This is the rate at which Oracle Data Miner client polls the database when there are apparently no workflows detected as Running. Default is 30000 milliseconds (internal default). The actual value is 0 in the repository property table. If the administrator changes its value to a non zero value, then it becomes the active setting.
POLLING_ACTIVE_RATE NUMBER

This is the rate at which Oracle Data Miner client polls the database when there are workflows detected as Running. Default is 1500 milliseconds (internal default). The actual default in the repository property setting is 0. If you change it to a non zero value, then it becomes the active setting.

POLLING_COMPLETED_WINDOW NUMBER This is the window of time in hours to include completed workflows in the polling query result. Default is 0 in the repository. Client level default is 24. This setting will override the client level preference setting available to the user if it is changed to a non zero value.
PURGE_WORKFLOW_SCHEDULER_OBJS VARCHAR2 Purges old Oracle Scheduler objects generated by the running workflows.
  • TRUE (Default). Purges Oracle Scheduler objects automatically

  • FALSE. Does not purge any Oracle Scheduler objects. If a workflow is deleted, then Oracle Scheduler objects are purged.

    Oracle recommends that you use the FALSE option if you must triage workflow issues. You must set it to TRUE after completing the triage.

PURGE_WORKFLOW_EVENT_LOG NUMBER Controls the number of workflows runs that are preserved for each workflow in the event log. The events of the older workflow are purged to keep within this limit. Default is 2.

Note:

If you provide the value 0, then Oracle Data Miner will not purge any workflow events. This is not recommended as it will increase the size of the event log, and affect query performances.