Getting Started with Troubleshooting an Oracle Enterprise Scheduler Cluster

Troubleshooting an Oracle Enterprise Scheduler clustered environment may involve finding performance and scalability issues, using a shared database or tuning Oracle Enterprise Scheduler system performance.

Troubleshooting an Oracle Enterprise Scheduler clustered environment involves the following.

Finding Performance and Scalability Issues

You can detect performance and scalability problems by viewing performance metrics in Fusion Middleware Control. The metrics include Oracle WebLogic Server and JVM-level metrics and plots, as well as Oracle Enterprise Scheduler-level metrics.

The system-level tools of the operating system running on the server provide additional diagnostics. They show how machine resources such as network, memory, CPU, and I/O are used over time, which is especially useful for tuning job implementations. Database tools help identify problems in the database. For remote jobs, such as ADF Business Components or SOA Java jobs running on a remote system, use the corresponding Fusion Middleware Control and system-level tools on those servers.

Fusion Middleware Control provides the following types of Oracle Enterprise Scheduler metrics to help identify problems:

Completed job request statistics by job name. Shows run count, run time, success rate and last run job request statistics for completed requests by job name.
Completed job request statistics by user. Shows completed request count and run time for completed requests by user.
Completed job request statistics by work assignment. Shows wait time, processing time, completed and failed count for completed job requests by work assignment.
Completed job request count by status. Displays completed job requests in a variety of terminal states.

For more information about metrics, see Monitoring Key Performance Metrics for Oracle Enterprise Scheduler.

Using a Shared Database

A common database can be used across multiple application domains. In this case, the database may receive load from multiple sources. To help diagnose such load, Oracle Enterprise Manager allows you to see running and waiting jobs, as well as metrics across the database, from multiple Oracle Enterprise Scheduler systems.

Tuning Oracle Enterprise Scheduler System Performance

The following performance and scalability issues may occur in the context of job requests or the Oracle Enterprise Scheduler runtime.

Jobs are saturating the CPU of Oracle Enterprise Scheduler servers.
Jobs are overloading the remote systems where asynchronous jobs are running.
Ready jobs are filling the queue, despite the availability of spare CPU power, such that job output is delayed.
Multiple domains are sharing a database. A great number of concurrently running database-intensive jobs across domains are slowing down the database.
Performance and scalability are affected by the running of large jobs at the end of a financial quarter or month.
Performance and scalability are affected by the concurrent running of two or more jobs that interact very intensively with the database.

Tuning involves changing job implementations or adjusting schedules, the Oracle Enterprise Scheduler cluster size, processor bindings for work assignments, throttling, and thread limits.

Tuning Clusters

Clusters are the basic mechanism for enabling scalability and high availability. When a job runs, it is equally likely to run on any processor on which it is eligible to run at that time. By carefully controlling the size of the Oracle Enterprise Scheduler cluster, it is possible to better distribute job executions across the cluster so that servers do not become overloaded. In the case of remote jobs, the jobs actually execute on a remote system and consume very few resources in the cluster (except for a blocked thread for synchronous jobs). If jobs running locally are overloading the system, the first step is to revisit the cluster size configuration.

Processor Bindings

Some jobs can physically execute only on a given server; these jobs are bound to run only on that server. If too many jobs are bound to a particular processor, the benefits of a clustered environment are effectively lost. For high availability, avoid binding a job to a single server. Instead, enable the job to run on at least two servers, because a job bound to a single processor cannot run while that processor is down.
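The high-availability guideline above can be checked mechanically. The following is a minimal Python sketch, not an Oracle Enterprise Scheduler API: the function name, the job names, and the server names are all hypothetical, and the bindings are assumed to be available as a simple mapping from job name to the servers on which the job may run.

```python
def single_server_bindings(bindings):
    """Return the names of jobs that can run on fewer than two servers.

    `bindings` is a hypothetical mapping of job name -> iterable of server
    names; it is an illustration, not a structure exposed by the product.
    """
    return sorted(
        job for job, servers in bindings.items() if len(set(servers)) < 2
    )


# Hypothetical example data.
example_bindings = {
    "PayrollLoad": ["ess_server1"],
    "InvoiceExport": ["ess_server1", "ess_server2"],
}

at_risk = single_server_bindings(example_bindings)  # ["PayrollLoad"]
```

Any job reported by such a check cannot run while its only eligible server is down.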

Rather than relying on clusters to randomly distribute work, you may have a set of long, resource intensive jobs to be run locally within a given scheduling window. In this case, you can bind jobs to specific processors and explicitly control the distribution of these jobs.

A processor is an Oracle Enterprise Scheduler instance. One Oracle Enterprise Scheduler instance runs on one cluster node. As one cluster node typically runs on a single computer, a processor normally equates to a computer.

At times, a clustered environment runs well until the scheduled periods during which a number of resource intensive jobs run. In order to maintain performance, you can configure the cluster with extra idle nodes that are activated during busy periods so as to handle the extra job load. You use standard Oracle WebLogic Server cluster methodologies to enable this clustering. For more information about Oracle WebLogic Server clustering, see the "High Availability for WebLogic Server" chapter in the Oracle Fusion Middleware High Availability Guide.

Job performance can vary depending on whether executing jobs are synchronous or asynchronous. Synchronous jobs consume a single thread throughout their execution, and are normally short lived. (An exception might be a process or spawned job that loads a database.) Asynchronous jobs consume a thread at the beginning and end of execution for a very short time, but they otherwise run independently. Asynchronous jobs are typically long running and continue execution across server restarts. Typical examples of asynchronous jobs are PL/SQL jobs, Java jobs that invoke remote asynchronous ADF Business Components services, and Java jobs that invoke remote asynchronous SOA services.

Throttling limits the maximum number of jobs that may execute concurrently. This is important to avoid flooding the system with too many concurrently running jobs. For synchronous jobs, the limit is imposed by restricting the number of threads available for execution. For PL/SQL jobs and other asynchronous jobs, the limit is imposed by defining a maximum concurrency limit for PL/SQL jobs and for asynchronous jobs, respectively. Asynchronous throttling limits are set on the work assignment to which the jobs are assigned. For more information about setting asynchronous job limits, see Creating or Editing a Workshift.

Synchronous jobs can use up all configured threads, thereby blocking asynchronous jobs from starting. To avoid this, do not combine asynchronous and synchronous jobs in a single work assignment.
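The thread-based limit for synchronous jobs can be illustrated with a small simulation. This is a sketch under stated assumptions: it uses a standard Python thread pool as a stand-in for the request processor's worker threads, and the function name and parameters are hypothetical rather than part of Oracle Enterprise Scheduler.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def peak_concurrency(job_count, max_threads, job_seconds=0.01):
    """Run `job_count` simulated synchronous jobs on a pool of `max_threads`
    worker threads and return the highest number that ran at the same time."""
    lock = threading.Lock()
    running = 0
    peak = 0

    def job():
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        time.sleep(job_seconds)  # the job holds its thread for its whole run
        with lock:
            running -= 1

    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        futures = [pool.submit(job) for _ in range(job_count)]
        for future in futures:
            future.result()  # wait for completion and surface any error
    return peak


observed = peak_concurrency(job_count=20, max_threads=4)
```

However many jobs are submitted, the observed peak never exceeds the thread limit. The remaining jobs wait for a free thread, which is also why a pool fully occupied by synchronous jobs delays anything else that needs a thread to start.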

Using Job Incompatibility to Manage Performance

You can configure a job incompatibility not only to prevent two functionally incompatible jobs from running at the same time, but also to prevent two resource-intensive jobs from heavily loading the same resource concurrently. To maintain good performance, define an incompatibility for such jobs so that they never run at the same time. For more information about defining a job incompatibility, see Creating or Editing an Incompatibility.

Tuning Oracle Enterprise Scheduler for Optimal Performance

You can tune the following Oracle Enterprise Scheduler components:

Request dispatcher
Request processor
Connection pool size
RDBMS Scheduler

Tuning the Dispatcher

The dispatcher tuning parameters apply to the Oracle Enterprise Scheduler request dispatcher. The request dispatcher manages requests that are awaiting their scheduled execution. The request processor handles job requests once their scheduled execution time has arrived.

Parameters are as follows.

Dispatcher Enabled: Indicates whether the request dispatcher is enabled on the Oracle Enterprise Scheduler server. When disabled, that Oracle Enterprise Scheduler server does not dispatch job requests whose scheduled execution time has arrived. By default, this parameter is enabled.
Maximum Poll Interval: Specifies the maximum interval, in seconds, between checks by the request dispatcher for job requests that are ready to be dispatched. The default value is 15 seconds.

Tuning the Processor

The processor tuning parameters apply to the Oracle Enterprise Scheduler request processor. The request processor manages job requests whose scheduled execution time has arrived and that are ready to execute.

Parameters are as follows.

Processor Enabled: Indicates whether the request processor is enabled on the Oracle Enterprise Scheduler server. If disabled, the Oracle Enterprise Scheduler server does not process requests that are ready to be executed. By default, this parameter is enabled.
Maximum Processor Threads: Specifies the maximum number of threads used to process job requests. This represents the total number of worker threads that might run concurrently for all active work assignments for the Oracle Enterprise Scheduler server. By default, this parameter is set to 25.
Starvation Threshold: Indicates the wait time, in minutes, after which a job request that is ready to be executed is considered starved and becomes eligible to be processed by a starvation worker thread. The starvation worker processes only those job requests that have been ready longer than the starvation threshold. A starvation worker is not created if the threshold value is equal to zero.

If enabled (meaning the parameter value is greater than zero), a starvation worker thread is created for each active work assignment for the Oracle Enterprise Scheduler server. The Maximum Processor Threads parameter does not apply to starvation workers. By default, the value of this parameter is set to zero, such that no starvation worker is created.
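The starvation rule described above reduces to a simple comparison. The following Python sketch only restates that rule; the function and parameter names are hypothetical and do not correspond to any Oracle Enterprise Scheduler API.

```python
def is_starved(minutes_ready, starvation_threshold_minutes):
    """Return True if a ready job request counts as starved.

    A threshold of zero (the default) disables starvation handling, so no
    request is ever treated as starved in that case.
    """
    if starvation_threshold_minutes <= 0:
        return False
    # Only requests that have been ready longer than the threshold qualify.
    return minutes_ready > starvation_threshold_minutes
```

For example, with a threshold of 30 minutes, a request that has been ready for 31 minutes is starved, while one that has been ready for exactly 30 minutes is not.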

Tuning the Connection Pool Size for the Oracle Enterprise Scheduler Internal Data Source

The connection pool size for the Oracle Enterprise Scheduler internal JDBC data source should be based on the request processor tuning values configured for the Maximum Processor Threads and Starvation Threshold parameters.

If the Starvation Threshold parameter is disabled (its value is equal to zero), the recommended pool size is the maximum number of processor threads plus twenty.

If the Starvation Threshold parameter is enabled (its value is greater than zero), the recommended pool size is the maximum number of processor threads plus the number of bound work assignments plus twenty.
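The two sizing rules above can be written as a single calculation. This is a minimal Python sketch of that arithmetic; the function and argument names are hypothetical, and the result is only the recommended starting point described in this section.

```python
def recommended_pool_size(max_processor_threads,
                          starvation_threshold_minutes=0,
                          bound_work_assignments=0):
    """Return the recommended connection pool size for the internal data source."""
    if max_processor_threads < 1:
        raise ValueError("max_processor_threads must be at least 1")
    if starvation_threshold_minutes < 0 or bound_work_assignments < 0:
        raise ValueError("threshold and work assignment count must not be negative")

    size = max_processor_threads + 20
    if starvation_threshold_minutes > 0:
        # Starvation enabled: add the number of bound work assignments.
        size += bound_work_assignments
    return size


default_size = recommended_pool_size(25)  # 25 threads, starvation disabled: 45
with_starvation = recommended_pool_size(25, starvation_threshold_minutes=30,
                                        bound_work_assignments=3)  # 48
```

With the default of 25 maximum processor threads and starvation disabled, the result is 45. Enabling starvation with three bound work assignments raises it to 48.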

Tuning the RDBMS Scheduler

The RDBMS scheduler is capable of auto-tuning. Leave the JOB_QUEUE_PROCESSES parameter at its default value of 1000. Do not set it to 0: in Oracle Database 11g Release 2 and later, a value of 0 prevents Oracle Scheduler jobs from running on the instance. For more information about the JOB_QUEUE_PROCESSES parameter, see the Oracle Database Reference.

Tuning Dead Database Connections

Oracle Enterprise Scheduler spawned jobs connect to the database using SQL*Net. If the spawned jobs are canceled, Oracle Enterprise Scheduler kills these processes at the operating system level. It is possible, however, that the database connections used by these processes still exist in the database.

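To reduce dead connections in the database, set the SQLNET.EXPIRE_TIME parameter in the sqlnet.ora file on the database server. The value is the interval, in minutes, at which the database sends a probe to verify that each client connection is still active; connections that fail the probe are cleaned up. For more information about the SQLNET.EXPIRE_TIME parameter, see the "Parameters for the sqlnet.ora File" chapter in Oracle Database Net Services Reference.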
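As an illustration, an entry in the sqlnet.ora file on the database server might look like the following. The value of 10 minutes is only an example chosen here, not a recommendation from this guide; pick an interval that suits your environment.

```
# sqlnet.ora on the database server
# Probe client connections every 10 minutes (example value only)
SQLNET.EXPIRE_TIME = 10
```

A shorter interval detects dead connections sooner at the cost of more probe traffic.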