B Troubleshooting Oracle Enterprise Scheduler

This appendix describes common problems that you might encounter when using Oracle Enterprise Scheduler and explains how to solve them.

This appendix includes the following sections:

Section B.1, "Introduction to Troubleshooting Oracle Enterprise Scheduler"
Section B.2, "Getting Started with Troubleshooting Oracle Enterprise Scheduler Jobs"
Section B.3, "Getting Started with Troubleshooting an Oracle Enterprise Scheduler Cluster"
Section B.4, "Problems and Solutions"
Section B.5, "Using My Oracle Support for Additional Troubleshooting Information"

In addition to this chapter, review the Oracle Fusion Middleware Error Messages Reference for information about the error messages you may encounter.

B.1 Introduction to Troubleshooting Oracle Enterprise Scheduler

This section provides guidelines and a process for using the information in this chapter. Using the following guidelines and process will focus and minimize the time you spend resolving problems.

Guidelines

When using the information in this chapter, Oracle recommends:

After performing any of the solution procedures in this chapter, immediately retrying the failed task that led you to this troubleshooting information. If the task still fails when you retry it, perform a different solution procedure in this chapter and then try the failed task again. Repeat this process until you resolve the problem.
Making notes about the solution procedures you perform, symptoms you see, and data you collect while troubleshooting. If you cannot resolve the problem using the information in this chapter and you must log a service request, the notes you make will expedite the process of solving the problem.

B.2 Getting Started with Troubleshooting Oracle Enterprise Scheduler Jobs

You may want to troubleshoot the following typical issues that can arise when running Oracle Enterprise Scheduler jobs.

Asynchronous jobs remain in running state indefinitely.
An asynchronous job hangs or crashes.
Oracle Enterprise Scheduler is down when the remote scheduled job completes or there are network problems such that Oracle Enterprise Scheduler does not receive the completion status from the remote job.
A scheduled job is ready to execute, but does not execute.
A scheduled job is placed in manual error recovery state where troubleshooting is needed.
Oracle Enterprise Scheduler is throwing errors.
A scheduled job ends in error.

For troubleshooting Oracle Enterprise Scheduler, use the standard Oracle WebLogic Server system log. For information about viewing job request logs, see Section 6.7.

This section contains the following topics:

Troubleshooting Asynchronous Scheduled Jobs
Troubleshooting Process Jobs
Steps for Manual Recovery
Job Diagnostics

B.2.1 Troubleshooting Asynchronous Scheduled Jobs

Asynchronous jobs run on separate JVMs, and include asynchronous Oracle BI Reporting and Publishing jobs, PL/SQL jobs, and Java jobs that invoke asynchronous SOA or ADF Business Components services. When handling asynchronous scheduled jobs, Oracle Enterprise Scheduler depends on the remote job sending a completion status that defines the job outcome after it runs. However, the completion status may not be generated, or it may get lost, for any of the following reasons:

The scheduled job has crashed.
The job is stuck in a hanging state.
Oracle Enterprise Scheduler was down when the job completed.
Network problems.

In any of these cases, an asynchronous job stays in running state indefinitely. As a result, subsequent steps in a job set may not execute, or an incompatible job may be blocked indefinitely.

Oracle Enterprise Scheduler displays information in Fusion Middleware Control that enables you to locate the job on the remote system, including an external identifier.

You can take any of the following actions to troubleshoot an asynchronous job that is stuck in running state:

Use the remote system to troubleshoot the job and determine the outcome of its execution. For more information, see Section B.2.1.1 and Section B.2.1.2.
Cancel the job. For more information, see Section 4.2.4.
Configure a timeout for asynchronous jobs. For more information, see Section 4.2.1.

When configuring timeouts for jobs, you can use Fusion Middleware Control to display all jobs that have timed out. However, a job that has timed out is still in running state, so you must manually change its state. Status callbacks are still accepted for timed-out jobs; if a callback arrives, the job transitions to completion.

You can also change the status of asynchronous jobs that have not timed out. This might be necessary if a timeout has not been configured, the completion status was lost, and you notice that the job has been running for a long time.
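
The behavior described above, where a timed-out job remains in running state until someone intervenes, can be sketched as follows. This is a minimal illustration and not Oracle Enterprise Scheduler code: the `AsyncRequest` record, its fields, and the `find_timed_out` helper are hypothetical names introduced here, and the sample requests are invented.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class AsyncRequest:
    """Hypothetical stand-in for an asynchronous job request."""
    request_id: int
    started: datetime
    timeout: Optional[timedelta]  # None means no timeout is configured
    state: str = "RUNNING"


def find_timed_out(requests: List[AsyncRequest], now: datetime) -> List[int]:
    """Return IDs of RUNNING requests whose configured timeout has elapsed.

    The state is deliberately left unchanged: a timed-out job stays in
    RUNNING state until it is changed manually or a callback arrives.
    """
    return [
        r.request_id
        for r in requests
        if r.state == "RUNNING"
        and r.timeout is not None
        and now - r.started > r.timeout
    ]


now = datetime(2010, 8, 30, 18, 0)
requests = [
    AsyncRequest(101, now - timedelta(hours=3), timedelta(hours=1)),
    AsyncRequest(102, now - timedelta(minutes=10), timedelta(hours=1)),
    AsyncRequest(103, now - timedelta(hours=9), None),  # no timeout configured
]
timed_out = find_timed_out(requests, now)
print(timed_out)
```

Request 103 illustrates the case in the preceding paragraph: with no timeout configured, a long-running job is never flagged and must be noticed by other means.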

B.2.1.1 Troubleshooting Asynchronous Java Jobs

In the case of asynchronous Java jobs (including jobs that invoke remote Oracle SOA Suite or Oracle ADF Business Components services), the log records are tagged with the ECID. You can view logs across the domain by ECID to troubleshoot the job execution. Oracle ADF Business Components, SOA composites, the web services stack, and applications all tag their log records with the ECID.

For asynchronous SOA jobs, the audit trail for the instance in Fusion Middleware Control can be used to troubleshoot composite execution, as described in Section B.4.4.

Example B-1 shows sample log messages from the JRF web services stack in server-diagnostic.log, with an inline comment for each log record. This log is stored in the following directories:

(UNIX) DOMAIN_HOME/servers/server_name/logs
(Windows) DOMAIN_HOME\servers\server_name\logs

Example B-1 Asynchronous job logging messages in server-diagnostic.log

Sending message to JMS queue "oracle.j2ee.ws.server.async.DefaultRequestQueue" for asynchronous
processing of service b99d80e5-42aa-423a-9e98-f6f88b8b79dfRequest.

[2010-08-30T18:28:13.519-07:00] AdminServer NOTIFICATION []
oracle.j2ee.ws.common.jaxws.JAXWSMessages [tid: ACTIVE.ExecuteThread: '0' for queue:
'weblogic.kernel.Default (self-tuning)'] [userId: <anonymous>] 
[ecid: 0000If5kZGS76EWLHyo2yf1CV5eh000000,0:1] [WEBSERVICE_PORT.name:
EmployeeModuleServiceSoapHttpPort] [APP: ADFBCAsync] [J2EE_MODULE.name: ADFBCAsync-ejb]
[WEBSERVICE.name: EmployeeModuleService] [J2EE_APP.name: ADFBCAsync] [MessageID:
urn:uuid:01994234-4442-4dee-82a6-e1e04407af56] 

An asynchronous request message is received and successfully recorded for the service
EmployeeModuleService with the replyTo address
http://adc2180314:7001/ADFBCAsyncCallback/EmployeeModuleServiceCallbackResponseImplService.

[2010-08-30T18:28:13.724-07:00] AdminServer NOTIFICATION []
oracle.j2ee.ws.common.jaxws.JAXWSMessages [tid: ACTIVE.ExecuteThread: '0' for queue:
'weblogic.kernel.Default (self-tuning)'] [userId: <anonymous>]
[ecid: 0000If5kZGS76EWLHyo2yf1CV5eh000000,0:1] [WEBSERVICE_PORT.name:
EmployeeModuleServiceSoapHttpPort] [APP: ADFBCAsync] [J2EE_MODULE.name: ADFBCAsync-ejb]
[WEBSERVICE.name: EmployeeModuleService] [J2EE_APP.name: ADFBCAsync] [MessageID:
urn:uuid:01994234-4442-4dee-82a6-e1e04407af56] 

Started asynchronous request processing for the service EmployeeModuleService with the message
selector "b99d80e5-42aa-423a-9e98-f6f88b8b79dfRequest". Transaction enabled: "true".

[2010-08-30T18:28:13.811-07:00] AdminServer NOTIFICATION []
oracle.j2ee.ws.common.jaxws.JAXWSMessages [tid: ACTIVE.ExecuteThread: '2' for queue:
'weblogic.kernel.Default (self-tuning)'] [userId: OracleSystemUser] 
[ecid: 0000If5kZGS76EWLHyo2yf1CV5eh000000,0] [APP: ADFBCAsync] [MessageID:
urn:uuid:01994234-4442-4dee-82a6-e1e04407af56] 

Completed asynchronous request processing. A response is sent to the client.

[2010-08-30T18:28:17.307-07:00] AdminServer NOTIFICATION []
oracle.j2ee.ws.common.jaxws.JAXWSMessages [tid: ACTIVE.ExecuteThread: '2' for queue:
'weblogic.kernel.Default (self-tuning)'] [userId: OracleSystemUser] 
[ecid: 0000If5kZGS76EWLHyo2yf1CV5eh000000,0] [APP: ADFBCAsync] [MessageID:
urn:uuid:01994234-4442-4dee-82a6-e1e04407af56]

Started asynchronous response processing for the service EmployeeModuleService with the message
selector "b99d80e5-42aa-423a-9e98-f6f88b8b79dfResponse".

[2010-08-30T18:28:17.330-07:00] AdminServer NOTIFICATION []
oracle.j2ee.ws.common.jaxws.JAXWSMessages [tid: ACTIVE.ExecuteThread: '0' for queue:
'weblogic.kernel.Default (self-tuning)'] [userId: OracleSystemUser] 
[ecid: 0000If5kZGS76EWLHyo2yf1CV5eh000000,0] [APP: ADFBCAsync] [MessageID:
urn:uuid:01994234-4442-4dee-82a6-e1e04407af56] 

Completed asynchronous response processing successfully. The client should have received the
response at this point.

[2010-08-30T18:28:17.825-07:00] AdminServer NOTIFICATION []
oracle.j2ee.ws.common.jaxws.JAXWSMessages [tid: ACTIVE.ExecuteThread: '0' for queue:
'weblogic.kernel.Default (self-tuning)'] [userId: OracleSystemUser] 
[ecid: 0000If5kZGS76EWLHyo2yf1CV5eh000000,0] [APP: ADFBCAsync] [MessageID:
urn:uuid:01994234-4442-4dee-82a6-e1e04407af56] 

Completed asynchronous response processing with exceptions. The client does not receive any
response.

[2010-08-31T09:55:33.939-07:00] AdminServer ERROR [] oracle.j2ee.ws.common.jaxws.JAXWSMessages
[tid: ACTIVE.ExecuteThread: '5' for queue: 'weblogic.kernel.Default (self-tuning)'] [userId:
OracleSystemUser] [ecid: 0000If94mKW76EWLHyo2yf1CVJG1000000,0] [APP: j2wpojoasync] [MessageID:
urn:uuid:5b9a5134-1416-4bda-95fd-da6e624466d7] 

The response will not be sent again as the callback service has replied with a SOAP fault. The HTTP
response code is 500.

[2010-08-31T10:17:35.852-07:00] AdminServer ERROR [] oracle.j2ee.ws.common.jaxws.JAXWSMessages
[tid: ACTIVE.ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)'] [userId:
OracleSystemUser] [ecid: 0000If99plA76EWLHyo2yf1CVJ^i000000,0] [APP:
ADFBCAsyncInvalidCallbackCreds] [MessageID: urn:uuid:7e355428-f68f-4c3f-a368-baf529048323]

For more information about configuring parameters for a job, see Section 4.2.1. For more information about viewing the server-diagnostic.log file, see Section 6.7.6.
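
Because every record in Example B-1 carries an `[ecid: ...]` field, records belonging to one job execution can be pulled out of a log file with a small script. The following is a minimal sketch, not an Oracle-supplied tool. It assumes each record begins with a bracketed timestamp at the start of a line, as in Example B-1, and it treats the text before the first comma in the `ecid` field as the ECID. The function name `group_by_ecid` and the abbreviated sample records are illustrative.

```python
import re
from collections import defaultdict

# Matches the ECID up to the first comma, e.g. "[ecid: 0000If5k...000000,0:1]".
ECID_PATTERN = re.compile(r"\[ecid:\s*([^,\]\s]+)")
# Assumption: a record starts with a bracketed timestamp at the start of a line.
RECORD_START = re.compile(r"^\[\d{4}-\d{2}-\d{2}T", re.MULTILINE)


def group_by_ecid(log_text):
    """Split log text into records and group the records by ECID."""
    starts = [m.start() for m in RECORD_START.finditer(log_text)]
    groups = defaultdict(list)
    for begin, end in zip(starts, starts[1:] + [len(log_text)]):
        record = log_text[begin:end].strip()
        match = ECID_PATTERN.search(record)
        if match:
            groups[match.group(1)].append(record)
    return dict(groups)


# Records abbreviated from Example B-1 for illustration.
sample = """\
[2010-08-30T18:28:13.519-07:00] AdminServer NOTIFICATION []
oracle.j2ee.ws.common.jaxws.JAXWSMessages [userId: <anonymous>]
[ecid: 0000If5kZGS76EWLHyo2yf1CV5eh000000,0:1] [APP: ADFBCAsync]
[2010-08-30T18:28:13.811-07:00] AdminServer NOTIFICATION []
oracle.j2ee.ws.common.jaxws.JAXWSMessages [userId: OracleSystemUser]
[ecid: 0000If5kZGS76EWLHyo2yf1CV5eh000000,0] [APP: ADFBCAsync]
[2010-08-31T09:55:33.939-07:00] AdminServer ERROR [] oracle.j2ee.ws.common.jaxws.JAXWSMessages
[userId: OracleSystemUser] [ecid: 0000If94mKW76EWLHyo2yf1CVJG1000000,0] [APP: j2wpojoasync]
"""

groups = group_by_ecid(sample)
for ecid, records in groups.items():
    print(ecid, len(records))
```

To use the sketch against a real file, read the contents of server-diagnostic.log from the directory listed above and pass the text to `group_by_ecid`.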

B.2.1.2 Troubleshooting Asynchronous PL/SQL Jobs

PL/SQL jobs can be identified in the Oracle Database by their job names. You can find the database job names associated with the request on the Job Details page in Fusion Middleware Control. For more information about viewing job request details, see Section 4.2.4.

B.2.2 Troubleshooting Process Jobs

Spawned job processes can be identified by a combination of host name, process ID and process group ID. The Job Details page in Fusion Middleware Control shows this and additional information. For more information about viewing job request details, see Section 4.2.4.

B.2.3 Steps for Manual Recovery

Job requests may require manual error recovery for a number of reasons. For example, in the case of an asynchronous job request, the job implementation may not know whether the job request launched successfully, so a manual error recovery exception is thrown. When a job request is placed in manual error recovery, the job request must be manually completed. If the job request was in fact launched successfully and later returns a completion status, that status is ignored.

A job request may be placed in manual error recovery for the following reasons:

An asynchronous Java job request throws an ExecutionManualRecoveryException, which indicates to Oracle Enterprise Scheduler that manual recovery is necessary. The job request is placed in ERROR_MANUAL_RECOVERY state. The cause is set to Cause.PROCESS_MANUAL_RECOVER_ERROR (209).
An asynchronous Java job request throws a run-time exception or an error. If a thrown exception is not handled by the job implementation, Oracle Enterprise Scheduler cannot know whether the remote job was invoked, so manual recovery is required. The cause is set to Cause.PROCESS_MANUAL_RECOVER_ERROR (209).
If Oracle Enterprise Scheduler crashes before it has finished initiating an asynchronous job request, it cannot know whether the remote job has been invoked. The job request transitions to ERROR_MANUAL_RECOVERY, which holds onto incompatibility locks. The cause is set to Cause.PROCESS_RECOVER (210). If the service crashes after it has finished initiating the asynchronous job and the job has transitioned to running, then the job will continue in a running state and complete when the callback occurs.
A spawned job runs in a clustered environment on the first instance of Oracle Enterprise Scheduler, and that instance goes down along with the associated Perl agent. If the first instance will not be back up and running for a while, Oracle Enterprise Scheduler does not know whether the spawned process is actually still running. Manual detection and recovery are required. The cause is set to Cause.PROCESS_RECOVER (210).
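
When triaging a request in manual error recovery, it can help to work backward from the cause code to the situations that produce it. The lookup below is a sketch built only from the four cases listed above. The cause names and numeric values come from that list, while the scenario labels and the `scenarios_for` helper are hypothetical and are not part of any Oracle Enterprise Scheduler API.

```python
from enum import Enum


class Cause(Enum):
    """Cause codes quoted in the list above."""
    PROCESS_MANUAL_RECOVER_ERROR = 209
    PROCESS_RECOVER = 210


# Hypothetical scenario labels mapped to the cause described for each case.
SCENARIO_CAUSE = {
    "job threw ExecutionManualRecoveryException": Cause.PROCESS_MANUAL_RECOVER_ERROR,
    "job threw an unhandled run-time exception or error": Cause.PROCESS_MANUAL_RECOVER_ERROR,
    "scheduler crashed while initiating an asynchronous job": Cause.PROCESS_RECOVER,
    "scheduler instance running a spawned job went down": Cause.PROCESS_RECOVER,
}


def scenarios_for(code):
    """Return the listed scenarios that set the given numeric cause code."""
    cause = Cause(code)
    return [name for name, c in SCENARIO_CAUSE.items() if c is cause]


for code in (209, 210):
    print(code, Cause(code).name, scenarios_for(code))
```

A cause of 209 points at the job implementation, whereas 210 points at an interruption of the scheduler itself.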

B.2.3.1 Handling Synchronous Java Jobs Requiring Manual Recovery

If a Java job times out or runs for too long, you can either let it run to completion as usual, or attempt to terminate it.

To recover synchronous Java jobs:

If the job is not in the ERROR_MANUAL_RECOVERY state, cancel the job. See Section 4.2.6. If the job remains in CANCELLING state for an unreasonable amount of time, continue to the next step.
Take one of the following actions:
- If, after some time, the request transitions to a terminal state, no other intervention is required.
- If the request remains in CANCELLING state, then determine the Oracle Enterprise Scheduler server on which the request is running by finding the server and cluster names.
  
  Determine the cluster name from the Oracle Enterprise Scheduler process group. Process group information is shown in search results only when the scope for the job request search is All Scheduling Services sharing the ESS repository. See Section 4.2.2 for information on performing searches.
If canceling the job is not effective, restart the relevant Oracle Enterprise Scheduler server:
1. From the navigation pane, expand the farm and then WebLogic Domain.
2. Expand the Oracle Enterprise Scheduler cluster and select the Oracle Enterprise Scheduler server.
3. In the WebLogic Server home page, from the WebLogic Server menu, choose Control > Shut Down.
4. After the server is shut down, from the WebLogic Server home page, from the WebLogic Server menu, choose Control > Start Up.
In the Request Details page, from the Action menu, select Recover Stuck Request.

B.2.3.2 Handling Stuck Asynchronous Jobs Requiring Manual Recovery

When an asynchronous job request requires manual recovery, follow these basic steps. Additional steps will depend on the job type.

If a request is stuck (marked for manual recovery, taking too long, or timed out) and it is an asynchronous remote job, first check whether the remote job is still running.

To handle asynchronous jobs:

Check to see if the remote job is still running:
1. Identify the remote job by navigating to the Request Details page. See Section 4.2.4 for information about how to view the Request Details page.
  
  - For PL/SQL jobs, navigate to the Request Details page and click the Execution Type icon. The database session information displays, correlating the scheduled job request with the database scheduler job. This indicates whether the job is still executing.
  
  - For spawned jobs, navigate to the Request Details page and click the Execution Type icon to display the spawned process information. This correlates the scheduled job request with the non-Oracle Enterprise Scheduler job, in this case the operating system process. This correlation indicates whether the job is still executing.
2. Verify that the remote job is no longer running.
  
  - If the remote job was not successfully created on the remote system, set the status of the job request to ERROR.
  
  - If the remote job was created and has finished executing, determine its status and set the status of the job request accordingly.
  
  - If the remote job instance has not finished executing, wait until it completes and set the job request status accordingly.
Once the remote job is no longer running, terminate the job request in Oracle Enterprise Scheduler, so that Oracle Enterprise Scheduler is no longer keeping track of the job.
1. Navigate to the Search request page by clicking the Scheduling Service menu and selecting Search Job Request.
2. From the Quick Search list, for asynchronous jobs marked for manual recovery (ERROR_MANUAL_RECOVERY), select Asynchronous requests that need manual recovery. Requests will be in RUNNING state. For asynchronous jobs not in the ERROR_MANUAL_RECOVERY state, rather than search with the Asynchronous requests that need manual recovery option, search for the known requests that need to be updated.
3. In the search results, click the request ID to display the Request Details page.
4. In the Request Details page, from the Action menu, choose Recover Stuck Request.
5. In the Recover Stuck Request dialog box, set the state accordingly for the job request. Optionally, add a description for the status of the job request.
  
  If you set the status to ERROR, the description you add displays in the Request Details page.

B.2.4 Job Diagnostics

A scheduled job may not execute for a number of reasons, or it may fail. Either way, Fusion Middleware Control provides built-in diagnostics in the Job Details page. For jobs that fail with an error, the Job Details page displays the reason and provides access to the job request log.

Fusion Middleware Control provides the following:

Access to logs for the job request.
Database session information for PL/SQL job requests, shown in the Job Details page for a PL/SQL job request.
Spawned process information for process job requests, shown in the Job Details page for process job requests.
A message displays in the Job Details page for job requests in wait, ready or blocked state specifying the reason the job request is in that state.
Error and warning messages display in the Job Details page. Additional details also display, such as stack traces and so on.
For retried job requests, the Job Details page displays the number of times the job request was retried, the time of the next retry and the number of additional times the job request is to be retried in the event of an error.
For job requests that require manual recovery or have timed out, the Job Details page displays a message regarding a need for manual recovery.

For more information about viewing job request details, see Section 4.2.4.

Table B-1 shows the associated diagnostic codes for each state along with a description and additional information that is provided. If a request is in a state that does not appear in the table, its diagnosis will contain only the request state.

Table B-1 Job Request States and Associated Diagnostic Codes

State	Diagnostic Code	Description	Related Documentation
`BLOCKED`	`BLOCKED`	Blocked due to incompatible job request or requests. Includes the request ID of the blocking request.	For information about cancelling a job request, see Section 4.2.6.
`COMPLETED`	`POSTPROCESS_DELAY`	The job request is delayed by the post-processor. Includes the time at which the delay ends.
`PAUSED`	`PAUSED`	The job request is the parent of one or more subrequests and has been paused. Includes the request ID of a subrequest in a non-terminal state, if there is one.	For information about resuming a paused job request, see Section 4.2.5.
`READY`	`NO_ACTIVE_SERVER`	No server is active in the process group. Includes the name of the process and isolation groups.	For information about activating an Oracle Enterprise Scheduler, see Section 3.6.
`READY`	`REQUESTED_PROCESSOR_NOT_ACTIVE`	The server specified by the job request `SYS_requestedProcessor` property is not available. Includes the name of the requested processor.
`READY`	`NO_APPLICATION`	The application is either not deployed or not active. Includes the name of the application, process group and isolation group.
`READY`	`PROCESSOR_STOPPED`	The request cannot be processed because there is no server with the application deployed with an active processor.	For information about starting a request processor, see Section 3.6.2.
`READY`	`PROCESSOR_FAILED`	The request cannot be processed because the processor has failed on all servers to which the application is deployed.
`READY`	`PROCESSOR_WAIT`	The job request is waiting for an available processor thread. Includes the name of a work assignment that could process the job request.
`READY`	`INACTIVE_WORK_ASSIGNMENT_WAIT`	Waiting for a work assignment to become active. Includes the name of an inactive work assignment that could process the job request if it were active.
`READY`	`NO_LOADED_WORK_ASSIGNMENT`	There is a bound work assignment that could process the request, but the binding is not loaded. The server may be down. Includes the name of a work assignment that could process the request but is not loaded.	For information about activating an Oracle Enterprise Scheduler, see Section 3.6.
`READY`	`NO_BOUND_WORK_ASSIGNMENT`	You must bind a specialized work assignment to the job request.	For information about binding a work assignment, see Section 3.4.2.
`READY`	`THROTTLED`	The job request is asynchronous and the number of active asynchronous jobs of the same type is at the allowed limit. Includes the work assignment, workshift, the asynchronous job type and the asynchronous limit.	Change the number of threads allocated to jobs, or the asynchronous job limits for the work shift. For more information about editing a work shift, see Section 5.3.1.1.
`READY`	`DISABLED_WORK_ASSIGNMENT`	There is a bound work assignment that could process the request, but the work assignment is disabled. Includes the name of the disabled work assignment.	Enable the work assignment. For more information about editing a work assignment, see Section 5.3.1.1.
`RUNNING`	`PREPROCESS_DELAY`	The request is delayed by the pre-processor. Includes the time at which the delay ends.
`RUNNING`	`TIMED_OUT`	The job request has timed out. Typically, this code displays for timed out asynchronous Java job requests.
`RUNNING`	`NO_ACTIVE_SERVER`	The job request processing cannot continue because no server is active in the process group. Includes the name of the process and isolation groups.	Verify that the Oracle Enterprise Scheduler server is running (see Section 6.2). If necessary, restart one of the Oracle Enterprise Scheduler components, as described in Section 3.6.
`RUNNING`	`REQUESTED_PROCESSOR_NOT_ACTIVE`	Job request processing cannot continue because the server specified by the job request `SYS_requestedProcessor` property is not available. Includes the name of the requested processor.	Verify that the Oracle Enterprise Scheduler server is running (see Section 6.2). If necessary, restart one of the Oracle Enterprise Scheduler components, as described in Section 3.6.
`RUNNING`	`NO_APPLICATION`	Job request processing cannot continue because the application is either not deployed or not active. Includes the name of the application, process group and isolation group.
`RUNNING`	`PROCESSOR_STOPPED`	Job request processing cannot continue because there is no server with the application deployed with an active processor.	Verify that the Oracle Enterprise Scheduler server is running (see Section 6.2). Start a request processor, as described in Section 3.6.2.
`RUNNING`	`PROCESSOR_FAILED`	Job request processing cannot continue because the processor has failed on all servers to which the application is deployed.	Verify that the Oracle Enterprise Scheduler server is running (see Section 6.2). Restart the request processor, as described in Section 3.6.2.
`RUNNING`	`PROCESSOR_WAIT`	Waiting for an available processor thread to handle the update event for an asynchronous Java job request. Includes the name of a work assignment that could process the request; indicates whether the job request has disabled update events.
`RUNNING`	`INACTIVE_WORK_ASSIGNMENT_WAIT`	Waiting for a work assignment to become active to handle the update event for an asynchronous Java request. Includes the name of an inactive work assignment that could process the request if it were active; indicates whether the job request has disabled update events.	Activate the inactive work assignment so that the job request can be processed. For more information, see Section 5.3.1.1.
`RUNNING`	`NO_LOADED_WORK_ASSIGNMENT`	Processing of an asynchronous Java job request may be delayed. There is a bound work assignment that could process the update event, but the binding is not loaded. The server may be down. Includes the name of a work assignment that could process the job request but is not loaded; indicates whether the job request has disabled update events.	Verify that the Oracle Enterprise Scheduler server is running (see Section 6.2). If necessary, restart one of the Oracle Enterprise Scheduler components, as described in Section 3.6.
`RUNNING`	`NO_BOUND_WORK_ASSIGNMENT`	Processing of an asynchronous Java job request may be delayed. There is no bound work assignment that can handle the update event for the asynchronous Java job request; indicates whether the request has disabled update events.	Bind a work assignment to the request processor. For more information, see Section 3.4.2.
`RUNNING`	`DISABLED_WORK_ASSIGNMENT`	Processing of an asynchronous Java job request may be delayed. There is a bound work assignment that could process the update event for the job request, but the work assignment is disabled. Includes the name of the disabled work assignment; indicates whether the job request has disabled update events.	Enable the work assignment. For more information about editing a work assignment, see Section 5.3.1.1.
`WAIT`	`FUTURE_START`	The job request has a scheduled time in the future. Includes the scheduled time for the job request.
`WAIT`	`RETRY_DELAY`	An error occurred in the job request, and the job request has been delayed before being retried. Includes the scheduled retry time.
`WAIT`	`NO_APPLICATION`	The job request scheduled time has been reached, but the job request cannot be dispatched because the application is not available (is either not deployed or not active). Includes the name of the application, process group and isolation group.
`WAIT`	`PARENT_NOT_PAUSED`	The job request is a subrequest whose parent has not paused. The subrequest remains in `WAIT` state until its parent pauses. Includes the parent request ID.	For information about pausing a job request, see Section 4.2.5.
`WAIT`	`DEFERRED`	The job request is an instance of a recurrence for which the previous recurrence instance is still running. Oracle Enterprise Scheduler prevents concurrent execution of recurrent instances, and the next recurrence remains in `WAIT` state while the current recurrence is active.
`WAIT`	`DISPATCHER_STOPPED`	The job request cannot be dispatched because there is no server with the application deployed with a running dispatcher.	Restart the request dispatcher. For more information, see Section 3.6.2.
`WAIT`	`DISPATCHER_FAILED`	The job request cannot be dispatched because the dispatcher has failed on all servers to which the application is deployed.	Restart the request dispatcher or other Oracle Enterprise Scheduler components. For more information, see Section 3.6.2 and Section 3.6.1.
terminal state	`TERMINAL`	The job request is in a terminal state (`SUCCEEDED`, `WARNING`, `ERROR`, `CANCELLED`, `EXPIRED`, `VALIDATION_FAILED`, `FINISHED`). Includes the application, process group, isolation group, work assignment and workshift.

Note:

For more information about job states, see Oracle Fusion Middleware Java API Reference for Oracle Enterprise Scheduler.
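
If you record diagnostic codes in a runbook or a monitoring script, Table B-1 maps naturally onto a lookup keyed by state and diagnostic code. The sketch below encodes a handful of rows as an illustration. The `SUGGESTED_ACTION` dictionary and `diagnose` function are hypothetical names, the action text paraphrases the Related Documentation column, and rows not included fall back to reporting the request state only.

```python
# A subset of Table B-1, keyed by (state, diagnostic code).
SUGGESTED_ACTION = {
    ("BLOCKED", "BLOCKED"):
        "Consider cancelling the blocking request (Section 4.2.6).",
    ("READY", "NO_BOUND_WORK_ASSIGNMENT"):
        "Bind a work assignment (Section 3.4.2).",
    ("READY", "THROTTLED"):
        "Adjust thread or asynchronous job limits for the work shift (Section 5.3.1.1).",
    ("READY", "DISABLED_WORK_ASSIGNMENT"):
        "Enable the work assignment (Section 5.3.1.1).",
    ("RUNNING", "PROCESSOR_STOPPED"):
        "Verify the server is running (Section 6.2) and start a request processor (Section 3.6.2).",
    ("WAIT", "DISPATCHER_STOPPED"):
        "Restart the request dispatcher (Section 3.6.2).",
}


def diagnose(state, code):
    """Return the suggested action for a state and diagnostic code pair."""
    return SUGGESTED_ACTION.get(
        (state, code),
        f"No action listed for {code}; request state is {state}.",
    )


print(diagnose("READY", "THROTTLED"))
print(diagnose("WAIT", "FUTURE_START"))
```

Keying on the pair rather than on the code alone matters because several codes, such as `NO_APPLICATION` and `PROCESSOR_STOPPED`, appear under more than one state with different meanings.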

B.3 Getting Started with Troubleshooting an Oracle Enterprise Scheduler Cluster

Troubleshooting an Oracle Enterprise Scheduler clustered environment involves the following:

Finding Performance and Scalability Issues
Using a Shared Database
Tuning Oracle Enterprise Scheduler System Performance

B.3.1 Finding Performance and Scalability Issues

You can detect performance and scalability problems by viewing performance metrics in Fusion Middleware Control. Metrics include Oracle WebLogic Server and JVM-level metrics and plots, as well as Oracle Enterprise Scheduler-level metrics.

The system-level tools specific to the operating system running on the server can be used as an additional diagnostic tool. System level tools indicate how machine resources are utilized at various times, such as network, memory, CPU, I/O, and so on. Such system tools are especially useful for tuning job implementations. Database tools enable identifying problems in the database. For remote jobs, such as ADF Business Components or SOA Java jobs running on a remote system, you can use the corresponding Fusion Middleware Control and system level tools on those servers.

Fusion Middleware Control provides the following types of Oracle Enterprise Scheduler metrics to help identify problems:

Completed job request statistics by job name. Shows run count, run time, success rate and last run job request statistics for completed requests by job name.
Completed job request statistics by user. Shows completed request count and run time for completed requests by user.
Completed job request statistics by work assignment. Shows wait time, processing time, completed and failed count for completed job requests by work assignment.
Completed job request count by status. Displays completed job requests in a variety of terminal states.

For more information about metrics, see [TODO: new link. was 9.7].
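
To make the first metric type concrete, the following sketch computes run count, average run time, and success rate per job name from a list of completed requests. It is an illustration of the arithmetic only, under the assumption that a run counts as successful when its final state is `SUCCEEDED`. The function name, the tuple layout, and the sample data are hypothetical, and the sketch does not read anything from Fusion Middleware Control.

```python
from collections import defaultdict


def stats_by_job_name(completed):
    """Aggregate completed requests given as (job_name, final_state, run_seconds)."""
    totals = defaultdict(lambda: {"runs": 0, "succeeded": 0, "run_seconds": 0.0})
    for job_name, state, run_seconds in completed:
        entry = totals[job_name]
        entry["runs"] += 1
        entry["run_seconds"] += run_seconds
        if state == "SUCCEEDED":
            entry["succeeded"] += 1
    return {
        name: {
            "run_count": t["runs"],
            "avg_run_seconds": t["run_seconds"] / t["runs"],
            "success_rate": t["succeeded"] / t["runs"],
        }
        for name, t in totals.items()
    }


# Invented sample data for illustration.
completed = [
    ("NightlyLoad", "SUCCEEDED", 30.0),
    ("NightlyLoad", "ERROR", 50.0),
    ("ReportGen", "SUCCEEDED", 12.0),
]
stats = stats_by_job_name(completed)
for name, values in stats.items():
    print(name, values)
```

A job name with a falling success rate or a rising average run time is a natural starting point for the tuning steps in Section B.3.3.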

B.3.2 Using a Shared Database

A common database can be used across multiple application domains. In this case, the database may be loaded from multiple sources. To help with this, Oracle Enterprise Manager allows you to see running and waiting jobs, as well as metrics across the database from multiple Oracle Enterprise Scheduler systems.

B.3.3 Tuning Oracle Enterprise Scheduler System Performance

The following potential performance and scalability issues may occur in the context of job requests or Oracle Enterprise Scheduler runtime.

Jobs are saturating the CPU of Oracle Enterprise Scheduler servers.
Jobs are overloading the remote systems where asynchronous jobs are running.
Ready jobs are filling the queue, despite the availability of spare CPU power, such that job output is delayed.
Multiple domains are sharing a database. A great number of concurrently running database-intensive jobs across domains are slowing down the database.
Performance and scalability are affected by the running of large jobs at the end of a financial quarter or month.
Performance and scalability are affected by the concurrent running of two or more jobs that interact very intensively with the database.

Tuning involves changing job implementations or changing schedules, Oracle Enterprise Scheduler cluster size, processor bindings for work assignments, throttling and thread limits.

B.3.3.1 Tuning Clusters

Clusters are the basic mechanism for enabling scalability and high availability. When a job runs, it is equally likely to run on any processor on which it is eligible to run at that time. By carefully controlling the size of the Oracle Enterprise Scheduler cluster, it is possible to better distribute job executions across the cluster so that servers do not become overloaded. In the case of remote jobs, the jobs actually execute on a remote system and consume very few resources in the cluster (except for a blocked thread for synchronous jobs). If jobs running locally are overloading the system, the first step is to revisit the cluster size configuration.
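
The effect of cluster size on per-server load can be illustrated with a small simulation. This sketch assumes only what the paragraph above states, namely that each job is equally likely to run on any eligible server. It is not a model of the actual dispatcher, and the server names, the `simulate_dispatch` function, and the job count are invented for the example.

```python
import random
from collections import Counter


def simulate_dispatch(num_jobs, servers, seed=0):
    """Assign each job to a uniformly random eligible server and count the results."""
    rng = random.Random(seed)  # seeded so that repeated runs give the same counts
    return Counter(rng.choice(servers) for _ in range(num_jobs))


servers = ["ess_server1", "ess_server2", "ess_server3"]
counts = simulate_dispatch(3000, servers)
for server in servers:
    print(server, counts[server])
```

With three servers, each receives roughly a third of the jobs. Adding a fourth server to the list lowers the expected share of each to a quarter, which is the reasoning behind revisiting cluster size when local jobs overload the system.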

B.3.3.2 Processor Bindings

Some jobs can physically execute only on a given server; these jobs have been bound to run only on that server. If too many jobs are bound to a particular processor, the benefits of a clustered environment are effectively lost. For the purposes of high availability, avoid binding a job to just one server; instead, enable the job to run on at least two servers. Otherwise, the job will not run if the bound processor is down.
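
A simple audit can flag jobs that violate this guideline. The sketch below assumes you have already collected, by whatever means, a mapping from each job to the set of servers on which it is eligible to run. The mapping, the job and server names, and the `single_server_jobs` function are hypothetical.

```python
def single_server_jobs(bindings):
    """Return the jobs that can run on fewer than two servers, sorted by name."""
    return sorted(job for job, servers in bindings.items() if len(servers) < 2)


# Invented bindings for illustration.
bindings = {
    "NightlyLoad": {"ess_server1"},
    "ReportGen": {"ess_server1", "ess_server2"},
    "Archive": {"ess_server2", "ess_server3"},
}
at_risk = single_server_jobs(bindings)
print(at_risk)
```

Any job in the result will not run while its only eligible server is down.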

Rather than relying on the cluster to distribute work randomly, you may have a set of long, resource-intensive jobs to be run locally within a given scheduling window. In this case, you can bind jobs to specific processors and explicitly control the distribution of these jobs.

A processor is an Oracle Enterprise Scheduler instance. One Oracle Enterprise Scheduler instance runs on one cluster node. As one cluster node typically runs on a single computer, a processor normally equates to a computer.

At times, a clustered environment runs well until the scheduled periods during which a number of resource-intensive jobs run. To maintain performance, you can configure the cluster with extra idle nodes that are activated during busy periods to handle the extra job load. You use standard Oracle WebLogic Server cluster methodologies to enable this clustering. For more information about Oracle WebLogic Server clustering, see the "High Availability for WebLogic Server" chapter in the Oracle Fusion Middleware High Availability Guide.

Job performance can vary depending on whether executing jobs are synchronous or asynchronous. Synchronous jobs consume a single thread throughout their execution, and are normally short lived. (An exception might be a process or spawned job that loads a database.) Asynchronous jobs consume a thread at the beginning and end of execution for a very short time, but they otherwise run independently. Asynchronous jobs are typically long running and continue execution across server restarts. Typical examples of asynchronous jobs are PL/SQL jobs, Java jobs that invoke remote asynchronous ADF Business Components services, and Java jobs that invoke remote asynchronous SOA services.

Throttling limits the maximum number of jobs that may execute concurrently. This is important to avoid flooding the system with too many concurrently running jobs. For synchronous jobs, this limit is imposed by limiting the number of threads available for execution. For PL/SQL jobs and other asynchronous jobs, this limit is imposed by defining a maximum concurrency limit for PL/SQL jobs and for asynchronous jobs, respectively. Asynchronous throttling limits are set on the work assignment to which the jobs are assigned. For more information about setting asynchronous job limits, see Section 5.3.2.1.

It is possible for synchronous jobs to use up all configured threads, thereby blocking asynchronous jobs from starting. You can avoid this by not combining asynchronous and synchronous jobs in a single work assignment.
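The interaction between thread limits and asynchronous jobs can be illustrated with a toy model. The sketch below is not an Oracle Enterprise Scheduler API; the class name, its fields, and the limits used are hypothetical, and it assumes only what the text above states: a synchronous job holds a thread for its whole execution, while an asynchronous job needs a free thread briefly in order to start.

```python
from dataclasses import dataclass


@dataclass
class WorkAssignmentModel:
    """Toy model of one work assignment (hypothetical, for illustration only)."""
    thread_limit: int        # threads available to this work assignment
    async_limit: int         # maximum concurrently running asynchronous jobs
    busy_threads: int = 0    # threads held by running synchronous jobs
    running_async: int = 0   # asynchronous jobs currently in flight

    def try_start_sync(self) -> bool:
        # A synchronous job holds one thread for its whole execution.
        if self.busy_threads >= self.thread_limit:
            return False
        self.busy_threads += 1
        return True

    def try_start_async(self) -> bool:
        # An asynchronous job needs a free thread only briefly to start,
        # and it also counts against the asynchronous concurrency limit.
        if self.busy_threads >= self.thread_limit:
            return False
        if self.running_async >= self.async_limit:
            return False
        self.running_async += 1
        return True


# Shared work assignment: synchronous jobs take every thread first.
shared = WorkAssignmentModel(thread_limit=2, async_limit=5)
sync_started = [shared.try_start_sync() for _ in range(3)]
async_blocked = shared.try_start_async()

# Dedicated work assignment for asynchronous jobs only.
dedicated = WorkAssignmentModel(thread_limit=2, async_limit=5)
async_started = dedicated.try_start_async()
```

In this model the third synchronous job and the asynchronous job both fail to start on the shared work assignment, while the same asynchronous job starts immediately on the dedicated one.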

B.3.3.3 Using Job Incompatibility to Manage Performance

You can configure a job incompatibility not only to prevent two mutually incompatible jobs from running at the same time, but also to prevent two resource-intensive jobs from heavily loading the same resource. To maintain good performance, define an incompatibility for such jobs so that they never run at the same time. For more information about defining a job incompatibility, see Section 5.2.3.2.
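The effect of an incompatibility can be sketched as a simple admission check. This is a conceptual illustration only, not how Oracle Enterprise Scheduler stores or evaluates incompatibilities; the function and the job names (LoadLedger, RebuildIndexes, SendReport) are hypothetical.

```python
def can_start(candidate, running, incompatible_pairs):
    """Return True if no running job is declared incompatible with candidate."""
    return all(
        frozenset((candidate, other)) not in incompatible_pairs
        for other in running
    )


# Two database-intensive jobs declared incompatible because they load
# the same resource, not because they would corrupt each other's data.
incompatible_pairs = {frozenset(("LoadLedger", "RebuildIndexes"))}
running = {"LoadLedger"}

heavy_job_allowed = can_start("RebuildIndexes", running, incompatible_pairs)
light_job_allowed = can_start("SendReport", running, incompatible_pairs)
```

While LoadLedger is running, the second heavy job is held back and an unrelated job is still allowed to start.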

B.3.3.4 Tuning Oracle Enterprise Scheduler for Optimal Performance

You can tune the following Oracle Enterprise Scheduler components:

Request dispatcher
Request processor
Connection pool size
RDBMS Scheduler

Tuning the Dispatcher

The dispatcher tuning parameters apply to the Oracle Enterprise Scheduler request dispatcher. The request dispatcher manages requests that are awaiting their scheduled execution. The request processor handles job requests once they are ready to execute.

Parameters are as follows.

Dispatcher Enabled: Indicates whether the request dispatcher is enabled on the Oracle Enterprise Scheduler server. When disabled, that Oracle Enterprise Scheduler server will not dispatch job requests whose scheduled execution time has arrived. By default, this parameter is enabled.
Maximum Poll Interval: Specifies the maximum interval, in seconds, between checks by the request dispatcher for job requests that are ready to be dispatched. The default value is 15 seconds.

Tuning the Processor

The processor tuning parameters apply to the Oracle Enterprise Scheduler request processor. The request processor manages job requests whose scheduled execution time has arrived and that are ready to execute.

Parameters are as follows.

Processor Enabled: Indicates whether the request processor is enabled on the Oracle Enterprise Scheduler server. If disabled, the Oracle Enterprise Scheduler server will not process requests that are ready to be executed. By default, this parameter is enabled.
Maximum Processor Threads: Specifies the maximum number of threads used to process job requests. This represents the total number of worker threads that might run concurrently for all active work assignments for the Oracle Enterprise Scheduler server. By default, this parameter is set to 25.
Starvation Threshold: Indicates the wait time, in minutes, before a job request that is ready to be executed will be considered starved and eligible to be processed by a starvation worker thread. The starvation worker processes only those job requests that have been ready longer than the starvation time. A starvation worker is not created if the threshold value is equal to zero.

If enabled (meaning the parameter value is greater than zero), a starvation worker thread is created for each active work assignment for the Oracle Enterprise Scheduler server. The Maximum Processor Threads parameter does not apply to starvation workers. By default, the value of this parameter is set to zero, such that no starvation worker is created.
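The starvation rule described above can be expressed as a small predicate. This is a minimal sketch under the assumptions stated in the text (threshold in minutes, zero means no starvation worker); the function name and its arguments are hypothetical and do not correspond to an Oracle Enterprise Scheduler API.

```python
from datetime import datetime, timedelta


def is_starved(ready_since, now, starvation_threshold_minutes):
    """Return True if a ready request has waited longer than the threshold."""
    # A threshold of zero means no starvation worker is created,
    # so no request is ever treated as starved.
    if starvation_threshold_minutes <= 0:
        return False
    waited = now - ready_since
    return waited > timedelta(minutes=starvation_threshold_minutes)


now = datetime(2024, 1, 1, 12, 0)
long_wait = is_starved(now - timedelta(minutes=45), now, 30)
short_wait = is_starved(now - timedelta(minutes=10), now, 30)
disabled = is_starved(now - timedelta(minutes=45), now, 0)
```

With a 30-minute threshold, only the request that has been ready for 45 minutes qualifies for a starvation worker; with the default threshold of zero, none does.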

Tuning the Connection Pool Size for the Oracle Enterprise Scheduler Internal Data Source

The connection pool size for the Oracle Enterprise Scheduler internal JDBC data source should be based on the request processor tuning values configured for the Maximum Processor Threads and Starvation Threshold parameters.

The recommended pool size if the Starvation Threshold parameter is disabled (its value is equal to zero) is the number of maximum processor threads plus twenty.

The recommended pool size if the Starvation Threshold parameter is enabled (its value is greater than zero) is the number of maximum processor threads, plus the number of bound work assignments, plus twenty.
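The two recommendations can be combined into one calculation. The sketch below assumes the enabled case is read as maximum processor threads plus bound work assignments plus twenty; the function name is hypothetical, and the count of three bound work assignments is an example value only.

```python
def recommended_pool_size(max_processor_threads,
                          starvation_threshold_minutes,
                          bound_work_assignments=0):
    """Recommended connection pool size for the internal JDBC data source."""
    size = max_processor_threads + 20
    # Each bound work assignment gets its own starvation worker thread,
    # but only when the Starvation Threshold is greater than zero.
    if starvation_threshold_minutes > 0:
        size += bound_work_assignments
    return size


# Default of 25 processor threads, starvation disabled.
pool_without_starvation = recommended_pool_size(25, 0)

# Same thread count, starvation enabled, three bound work assignments.
pool_with_starvation = recommended_pool_size(25, 30, bound_work_assignments=3)
```

With the default of 25 processor threads this gives a pool size of 45 when starvation handling is disabled, and 48 in the example with three bound work assignments.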

Tuning the RDBMS Scheduler

The RDBMS scheduler is capable of auto-tuning. To enable auto-tuning, set JOB_QUEUE_PROCESSES to 0. Leave JOB_QUEUE_PROCESSES at its default value of 1000. For more information about the JOB_QUEUE_PROCESSES parameter, see the Oracle Database Reference.

B.3.3.5 Tuning Dead Database Connections

Oracle Enterprise Scheduler spawned jobs connect to the database using SQL*Net. If the spawned jobs are canceled, Oracle Enterprise Scheduler kills these processes at the operating system level. It is possible, however, that the database connections used by these processes still exist in the database.

To reduce dead connections in the database, set the SQLNET.EXPIRE_TIME parameter in the sqlnet.ora file to the desired interval, which is specified in minutes. For more information about the SQLNET.EXPIRE_TIME parameter, see the "Parameters for the sqlnet.ora File" chapter in Oracle Database Net Services Reference.

B.4 Problems and Solutions

This section describes common problems and solutions for Oracle Enterprise Scheduler. It contains the following topics:

Job Remains in WAIT State
Synchronous Job Continues in RUNNING State for Too Long
Asynchronous Jobs Remain in RUNNING State and Do Not Complete
Asynchronous Java SOA Job Remains In RUNNING State
Asynchronous Java Oracle ADF Business Components Job Remains In RUNNING State
Asynchronous PL/SQL Job Remains in RUNNING State
Job Does Not Execute at Scheduled Time
Asynchronous Java Job Requires Manual-Error Recovery
Spawned (Process Type) Job Requires Manual Error Recovery
Job Remains in CANCELLING State
Newly Added Server Is Not Being Utilized or Running Inappropriate Jobs
Oracle Enterprise Scheduler Run-Time System Is Throwing Errors
Oracle Enterprise Scheduler Is Running Out Of Database Connections
Job Queue Full Due to a Hanging Job

In addition to the recommended solutions, consider reviewing Section B.3.3 for tuning tips.

B.4.1 Job Remains in WAIT State

Problem

When a user submits a job, the job can remain in the WAIT state for too long without progressing to the RUNNING state.

Solution

To resolve this problem, verify the current status of Oracle Enterprise Scheduler from Fusion Middleware Control:

Verify the request processor and request dispatcher are running:
1. From the navigation pane, expand the farm, and then Scheduling Services.
2. Select the ESSAPP application for the appropriate Managed Server.
3. In the Scheduling Service home page, in the Scheduler Components section, ensure the Request Processor has a status of Started.
  
  If it is not running, start it. See Section 3.6.2.
4. In the Scheduling Service home page, in the Scheduler Components section, ensure the Request Dispatcher has a status of Started.
  
  If it is not running, start it. See Section 3.6.2.
Verify the ESSAPP application is running:
1. From the navigation pane, expand the farm, and then Scheduling Services.
2. Select the ESSAPP application for the appropriate Managed Server.
3. In the Scheduling Service home page, in the Scheduler Components section, ensure the Request Dispatcher has a status of Started.
4. In the WebLogic Server home page, in the Deployments section, ensure the ESSAPP application is running.
  
  If it is not running, start it. See Section 3.6.1.
Check whether the configured concurrency limit or thread count is too small by reviewing the processor and work assignment configuration.
- From the Scheduling Service menu, choose Request Processor > Configure to review the Thread Count field in the Configure Request Processor page. See Section 3.4.2.
- From the Scheduling Service menu, choose Work Allocation > Work Assignments to review the configuration in the Work Assignments page. See Section 5.3.1.

B.4.2 Synchronous Job Continues in RUNNING State for Too Long

Problem

When the user submits a job, it remains in the RUNNING state for too long.

Solution

The job may be in RUNNING state because the Oracle Enterprise Scheduler server has crashed and recovery has not taken place.

To resolve this problem, determine the current status of Oracle Enterprise Scheduler from Fusion Middleware Control:

Verify the request processor is running:
1. From the navigation pane, expand the farm and then Scheduling Services.
2. Select the ESSAPP application for the appropriate Managed Server.
3. In the Scheduling Service home page, in the Scheduler Components section, ensure the Request Processor is enabled and is started.
  
  If it is not running, start it. See Section 3.6.2.
Verify the Oracle Enterprise Scheduler server is running and start if it is not running:
1. From the navigation pane, expand the farm and then WebLogic Domain.
2. Select the Oracle Enterprise Scheduler cluster.
3. From the WebLogic Cluster page, in the Servers section, view the Status column to determine if the Oracle Enterprise Scheduler server is running.
4. If it shows a status of down (red down arrow) for a server, click the server name in the Name column.
5. In the WebLogic Server home page, from the WebLogic Server menu, choose Control > Start Up.
Look at the job output to see if the job is making progress. See Section 4.2.4.

When an Oracle Enterprise Scheduler server is restarted, synchronous jobs running on that server are transitioned to the ERROR state.

B.4.3 Asynchronous Jobs Remain in RUNNING State and Do Not Complete

Problem

PL/SQL jobs, Java jobs that invoke asynchronous SOA services, and Java jobs that invoke asynchronous Oracle Application Development Framework (Oracle ADF) Business Components services run on separate Java Virtual Machines (JVMs) or machines. In these cases, Oracle Enterprise Scheduler depends on the remote job sending a completion status at the end of processing that defines the job outcome. However, this message may never be generated, or it may be lost, resulting in the job staying in the RUNNING state. In addition, subsequent steps in a job set may not execute, or an incompatible job may be blocked indefinitely.

Solution

To resolve this issue, follow the actions described in Section B.2.1.

B.4.4 Asynchronous Java SOA Job Remains In RUNNING State

Problem

Jobs that invoke asynchronous SOA services run on separate Java Virtual Machines (JVMs) or machines. In these cases, Oracle Enterprise Scheduler depends on the remote job sending a completion status at the end of processing that defines the job outcome. However, this message may never be generated, for various reasons.

Solution

To resolve this issue, you must troubleshoot the native job.

To resolve this problem for asynchronous SOA jobs:

Search for the request, as described in Section 4.2.2.
On the Request Details page, from the Action menu, select Request Log to view the log message. See Section 4.2.4 for further information about the Request Details page.

The Log Message page displays. By default, when a user navigates to view the logs for a request, only messages that are logged in the Oracle Enterprise Scheduler cluster scope are shown. (If the ESSAPP application is not deployed to a cluster, the messages that are logged in the Managed Server scope are shown.) However, Oracle Enterprise Scheduler propagates the ECID associated with the request across subsystems, such as Oracle SOA Suite and Oracle ADF.
Make note of the value in the ECID field.
From the Broaden Target Scope list, select the /farm_name/domain_name (Oracle WebLogic Domain) to view messages across the domain.
In the Log Messages page for the Oracle WebLogic Server domain, in the Selected Targets section, ensure the search includes the ECID field with the value from the Request Details page.
Search and view log records for Oracle SOA Suite and the ECID, and note any issues. See Section 6.7.4.
View the audit trail for an SOA composite application instance using the ECID:
1. From the navigation pane, expand the farm, SOA, and then soa-infra.
2. From the SOA Infrastructure page, click the Instances tab.
3. In the Search section, enter the ECID in the ECID field.
4. Click Search to find the instance with the ECID.
5. Select the instance by clicking the ID in the Instance ID field from the Instances table.
  
  The Flow Trace page displays.
6. View the audit trail for the instance and observe whether the composite completed successfully or completed with an error. For more information about the Flow Trace window, see the "Viewing the Audit Trail and Process Flow of a BPEL Process Service Component" section in the Oracle Fusion Middleware Administrator's Guide for Oracle SOA Suite and Oracle Business Process Management Suite.
If the SOA composite is complete and the job is still running in Oracle Enterprise Scheduler, manually complete the job in the Request Details page. For related information, see Section B.2.1.

B.4.5 Asynchronous Java Oracle ADF Business Components Job Remains In RUNNING State

Problem

Jobs that invoke asynchronous Oracle ADF Business Components services run on separate JVMs or computers. In these cases, Oracle Enterprise Scheduler depends on the remote job sending a completion status at the end of processing that defines the job outcome. However, this message may never be generated, for various reasons.

Solution

To resolve this issue, you must troubleshoot the native job.

To resolve this problem for asynchronous Oracle ADF Business Components jobs:

Search for the request, as described in Section 4.2.2.
On the Request Details page, from the Action menu, select Request Log to view the log message. See Section 4.2.4 for further information about the Request Details page.

The Log Message page displays. By default, when a user navigates to view the logs for a request, only messages that are logged in the Oracle Enterprise Scheduler cluster scope are shown. (If the ESSAPP application is not deployed to a cluster, the messages that are logged in the Managed Server scope are shown.) However, Oracle Enterprise Scheduler propagates the ECID associated with the request across subsystems, such as Oracle SOA Suite and Oracle ADF.
Make note of the value in the ECID field.
From the Broaden Target Scope list, select the /farm_name/domain_name (Oracle WebLogic Domain) to view messages across the domain.
In the Log Messages page for the Oracle WebLogic Server domain, in the Selected Targets section, ensure the search includes the ECID field with the value from the Request Details page and the Component Name field.
Search and view log records for Oracle ADF Business Components and web services stack for the ECID and note any issues. See the "Viewing and Searching Log Files" section in the Oracle Fusion Middleware Administrator's Guide.
Observe whether the Oracle ADF Business Components service completed successfully or completed with an error.
If the service is complete and the job is still running in Oracle Enterprise Scheduler, manually complete the job in the Request Details page. For related information, see Section B.2.1.

B.4.6 Asynchronous PL/SQL Job Remains in RUNNING State

Problem 1

PL/SQL jobs run on separate machines. In these cases, Oracle Enterprise Scheduler depends on the remote job sending a completion status at the end of processing that defines the job outcome. However, this message may never be generated, for various reasons.

Solution 1

PL/SQL jobs can be identified in the Oracle Enterprise Scheduler by their job names. Job definition names are available from the Request Details page in the Fusion Middleware Control associated with the request. See Section 4.2.4.

To resolve this issue, troubleshoot the native job. For more information, see Section B.2.1.2.

Problem 2

Oracle Enterprise Scheduler internally uses the Database Management System (DBMS) scheduler to schedule PL/SQL jobs. In some cases, the DBMS scheduler has not scheduled the job request, even though Oracle Enterprise Scheduler has submitted the job to the DBMS scheduler and set its state to RUNNING.

Solution 2

To resolve this issue:

Verify the DBMS Scheduler resource usage. See the "Administering Oracle Scheduler" chapter in the Oracle Database Administrator's Guide.
Change the PL/SQL job throttle limit by configuring PL/SQL job limits. See Section 5.3.2.

B.4.7 Job Does Not Execute at Scheduled Time

Problem

When a job's scheduled time arrives, it does not execute.

Solution

To resolve this problem, view the Request Details page in Fusion Middleware Control. This page provides built-in diagnostics that show what the issue is. For jobs that fail with an error, the Request Details page shows the reason and provides access to the job request log from the Action menu. See Section 4.2.4.

For more information about job diagnostics, see Section B.2.4.

B.4.8 Asynchronous Java Job Requires Manual-Error Recovery

Problem

A job is placed in the ERROR_MANUAL_RECOVERY state. There are a number of reasons why a job may end up in this state. For example, for an asynchronous job, an error may leave the job implementation unable to tell whether the job was launched successfully, so it throws the error manual recovery exception. For more reasons why a job can end up in error manual recovery, see Section B.2.3.

Solution

To resolve this issue, manually update the job status to complete it. See Section B.2.3.1 and Section B.2.3.2.

B.4.9 Spawned (Process Type) Job Requires Manual Error Recovery

Problem

A spawned process type job gets placed in the ERROR_MANUAL_RECOVERY state.

Solution

To solve this issue, transition the request to a terminal state:

Identify the spawned host and process ID in the Request Details page. See Section 4.2.4 for more information about the Request Details page.
If the process is still running on the host, wait for it to complete or terminate it.
When the process is no longer running, recover the request to begin the transition to an error state. It will be subject to auto-retries if configured. From Fusion Middleware Control, perform the following steps:
1. Search for the request, as described in Section 4.2.2. When selecting a value for Status, select Cancelled and Error Manual Recovery.
2. On the Request Details page, from the Action menu, select Recover Stuck Request. See Section 4.2.4 for more information about the Request Details page.
  
  A request in the CANCELLING state is put into the CANCELLED state, and a request in the ERROR_MANUAL_RECOVERY state is put into the ERROR state, with an appropriate error message specified. The error message specified by the user is shown on the Request Details page.
3. Manually update the job status to complete it. See Section B.2.3.1 and Section B.2.3.2.

For more information about manually recovering a job, see Section B.2.3.

B.4.10 Job Remains in CANCELLING State

Problem

Sometimes when a job is canceled, it stays in the CANCELLING state and is never canceled. The result of a cancellation request depends on the stage of processing that the request has reached when the cancellation happens, and on the results of that stage. Many jobs are implemented as asynchronous ADF service invocations. The application infrastructure does not support cancellation of in-flight service requests, and as a result, the job does not cancel as expected. There are also other reasons why a job may get stuck in the CANCELLING state:

For PL/SQL jobs, Oracle Enterprise Scheduler attempts to kill the RDBMS scheduler job. For spawned processes, Oracle Enterprise Scheduler tries to kill the running process. If the job is successfully killed, the request transitions to the CANCELLED state. If the job completes before it can be killed, the state to which the request transitions depends on the result of the job execution. For these types of jobs, this issue should not occur.
Asynchronous Java job: The request was canceled, but the remote job never contacted Oracle Enterprise Scheduler with its terminal status. This could happen if the job is still executing because either the AsyncCancellable interface was not implemented or the remote cancel operation did not succeed. It could also happen if the remote system is unable to respond.
Synchronous Java job: The request was canceled and the job is still executing. This could happen if either the Cancellable interface was not implemented or the job's Executable.execute() method did not return after Cancellable.cancel() was invoked. For more information about the Executable interface, see the chapter "Use Case Oracle Enterprise Scheduler Sample Application" in Oracle Fusion Middleware Developer's Guide for Oracle Enterprise Scheduler.
Job sets (Parent-Child Requests): In cases of job sets, the cancellation operation propagates to all eligible child requests. Until all child requests are completed, the parent request will remain in CANCELLING state.
SOA Java job: In cases where a job ends in an error, use the ECID in Fusion Middleware Control to locate the composite instance, and then review the audit trail and the logs tagged with the ECID to see what happened. For more information about finding the ECID, see the "Viewing the Audit Trail and Process Flow of a BPEL Process Service Component" section in the Oracle Fusion Middleware Administrator's Guide for Oracle SOA Suite and Oracle Business Process Management Suite.
Oracle ADF Business Component Java Job: In cases where requests end in an error, look for the message ID in the server-name-diagnostic.log file in the following directories to see what happened:
```
(UNIX) DOMAIN_HOME/servers/server_name/logs
(Windows) DOMAIN_HOME\servers\server_name\logs
```
For more information about viewing log files, see the "Viewing and Searching Log Files" section in the Oracle Fusion Middleware Administrator's Guide.
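Searching the diagnostic log for a message ID or ECID can also be scripted. The following is a minimal sketch, assuming the log file is named after the server (server_name-diagnostic.log) and sits in the logs directory shown above; the helper names are hypothetical, and the search is a plain substring match that makes no assumption about the log record format.

```python
from pathlib import Path


def diagnostic_log_path(domain_home, server_name):
    """Build DOMAIN_HOME/servers/server_name/logs/server_name-diagnostic.log."""
    return (Path(domain_home) / "servers" / server_name / "logs"
            / f"{server_name}-diagnostic.log")


def find_lines(log_path, message_id):
    """Return every line of the log file that mentions message_id."""
    # errors="replace" keeps the scan going if the log contains odd bytes.
    with open(log_path, encoding="utf-8", errors="replace") as handle:
        return [line.rstrip("\n") for line in handle if message_id in line]
```

For example, find_lines(diagnostic_log_path(domain_home, "ess_server1"), ecid) returns the matching records for one server, where domain_home and ecid are values taken from your own environment.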

Solution

Manually intervene in Fusion Middleware Control to complete the job stuck in the CANCELLING state.

To recover asynchronous requests that are stuck, determine whether the remote job is still executing. If it is, nothing should be done on the Oracle Enterprise Scheduler side. If the remote job is no longer executing, then perform the following from Fusion Middleware Control to complete the request:

Search for the request, as described in Section 4.2.2.
On the Request Details page, from the Action menu, select Recover Stuck Request. See Section 4.2.4 for more information about the Request Details page.

For synchronous Java jobs, wait for the job to complete. If the job is irrevocably hung, then the server on which it is executing must be restarted.

B.4.11 Performance and Scalability Goes Down When Two Very Database Intensive Jobs Run at the Same Time

Problem

Performance and scalability go down when two very database-intensive jobs run at the same time.

To determine job performance issues:

Use operating-system level or Database Management System (DBMS) tools to check performance metrics across the Oracle Database to ensure the performance bottleneck is isolated to these two jobs running simultaneously and that tuning practices have been attempted.
Determine the execution time of the job:
1. From the navigation pane, expand the farm, and then Scheduling Services.
2. Select the ESSAPP application for the appropriate Managed Server.
3. In the Scheduling Service home page, click the Top 10 Long Running Job Requests tab to view the jobs.

Solution

To solve this problem, perform one of the following two solutions:

Mark the two very database-intensive jobs as incompatible. Sometimes you want to ensure that two jobs never run at the same time, not because they would corrupt the system if run simultaneously, but because they heavily load the same resource. In this case, you can define the jobs as incompatible so that they never run at the same time. See Section 5.2.3.
Schedule the jobs at different times. See Section 4.2.1.

B.4.12 Newly Added Server Is Not Being Utilized or Running Inappropriate Jobs

Problem

A newly added Managed Server does not run jobs as expected or is not being utilized at all.

Over time, the job history shows either that no jobs are running on this server or that a particular job is running on this server that should not run there.

To view the job history:

From the navigation pane, expand the farm, and then Scheduling Services.
Select the ESSAPP application for the appropriate Managed Server.
You will see in the job history over time that no jobs are running on this server.
Search for the request, as described in Section 4.2.2, to view the list of jobs that already have executed.
From the Request ID column in the Request Search page, click a job to go to the Request Details page for the job.
In the Execution Trail section of the Request Details page, view the Dispatcher, Processor, Work Assignment, and Workshift the job ran on.

Solution

After a new server is added, Oracle Enterprise Scheduler determines whether the default work assignment can be used, based on how other processors are bound. If it cannot use the default work assignment, it configures the new server to run only the health check service (internal work assignment ESSInternalWA). Revisit this default configuration to configure the work assignment binding to this server as desired, and remove the internal work assignment after the health check is complete. For more details, see Section 3.4.1.

B.4.13 Oracle Enterprise Scheduler Run-Time System Is Throwing Errors

Problem

The Oracle Enterprise Scheduler run-time system is not behaving properly and is throwing errors or encountering problems when processing a job.

Solution

To identify and solve this problem, review the Oracle Enterprise Scheduler system logs. From Fusion Middleware Control, perform the following:

Search for the request, as described in Section 4.2.2.
On the Request Details page, from the Action menu, select Request Log to view the log message. See Section 4.2.4 for further information about the Request Details page.
To adjust the log levels:
1. From the navigation pane, expand the farm, WebLogic Domain, Oracle Enterprise Scheduler cluster, and select the Oracle Enterprise Scheduler server (for example, ess_server1).
  
  The WebLogic Server home page displays.
2. From the WebLogic Server menu, select Logs > Log Configuration to display the Log Configuration page.

You can configure the Oracle Enterprise Scheduler server loggers for an Oracle WebLogic Server by modifying the logging.xml file of that Oracle WebLogic Server. By default, there is no explicit logger entry for Oracle Enterprise Scheduler, so it inherits the logging level and log handlers configured for the parent logger, typically the "oracle" logger or the root logger ("").

By default, the log messages for the Oracle Enterprise Scheduler server logger can be found in the Oracle WebLogic Server diagnostic log file for that Oracle WebLogic Server. For more information on logging and log levels, see Section 6.7.4.

Note:

The logger shows only logs written by Oracle Enterprise Scheduler jobs running in Oracle WebLogic Server. Once Oracle Enterprise Scheduler transfers control of running PL/SQL and C jobs to the PL/SQL or C process, respectively, their logging data is not written to the Oracle Enterprise Scheduler logs, because these jobs run in a separate process.

B.4.14 Oracle Enterprise Scheduler Is Running Out Of Database Connections

Problem

If Oracle Enterprise Scheduler is running out of database connections, there could be a problem with connection leaks in Oracle Enterprise Scheduler.

Solution

See the "Running out of Data Source Connections" section in the Oracle Fusion Middleware Administrator's Guide.

B.4.15 Job Queue Full Due to a Hanging Job

Problem

At times, some jobs may not behave as expected. The job queue may grow large because a job is spinning or hanging, or because a job has memory leaks.

Solution

Here are some typical scenarios:

A Java job goes into an infinite loop, and other jobs are waiting in the queue for it to finish.

To resolve this problem, perform the following with Fusion Middleware Control:
1. From the navigation pane, expand the farm, and then Scheduling Services.
2. Select the ESSAPP application for the appropriate Managed Server.
3. From the Scheduling Service menu, choose Request Processor > Configure to review the Thread Count field in the Configure Request Processor page. See Section 3.4.2.
  
  If the Thread Count field is set to 25, only 25 synchronous Java jobs are permitted to be in the RUNNING state at a time, and all other Java jobs in the queue must wait to be processed.
  
  If some jobs are performing heavy processing or seem to hang, you can isolate such jobs by defining dedicated work assignments to process them on a specific Oracle Enterprise Scheduler server, leaving the other servers to process the rest of the jobs. See Section 5.3.1.1.
4. Restart the Oracle Enterprise Scheduler server:
  1. From the navigation pane, expand the farm and then WebLogic Domain.
  2. Expand the Oracle Enterprise Scheduler cluster and select the Oracle Enterprise Scheduler server.
  3. In the WebLogic Server home page, from the WebLogic Server menu, choose Control > Shut Down.
  4. After the server is shut down, from the WebLogic Server home page, from the WebLogic Server menu, choose Control > Start Up.
    
    After restarting the server, the Oracle Enterprise Scheduler moves RUNNING jobs to the ERROR state and starts processing the next batch of jobs in the queue.
A PL/SQL job goes into an infinite loop and never exits. Many other PL/SQL jobs are also submitted.

Cancel the PL/SQL job to move the job to the CANCELLED state. See Section 4.2.6.
At times, heavy database inserts and updates will cause jobs to wait.

For example, consider a PL/SQL procedure that performs 1 million inserts, updates, and deletes on a table. Such a PL/SQL procedure can take from about 20 minutes to two hours to complete, depending on the DBMS load. Suppose that a work assignment named wa1 has been created with a PL/SQL concurrency limit of 25, and that many PL/SQL jobs are submitted, each of which runs this PL/SQL procedure.

In this case, the Oracle Enterprise Scheduler server processes only 25 PL/SQL jobs concurrently. Because each PL/SQL job can take up to two hours to complete, the remaining PL/SQL jobs stay in the WAIT state. When a RUNNING job completes, the first job in the WAIT queue is picked and scheduled to run.
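The waiting time in this scenario can be estimated with simple arithmetic. The sketch below is a rough worst-case model, assuming every job takes the same two hours and that slots are refilled in submission order; the function names and the total of 100 submitted jobs are hypothetical values chosen for illustration.

```python
import math


def hours_to_drain(total_jobs, concurrency_limit, hours_per_job):
    """Hours until every job has finished, running in full batches."""
    batches = math.ceil(total_jobs / concurrency_limit)
    return batches * hours_per_job


def wait_before_start(queue_index, concurrency_limit, hours_per_job):
    """Hours a job waits before starting; queue_index is zero-based."""
    return (queue_index // concurrency_limit) * hours_per_job


# 100 submitted jobs, concurrency limit of 25, two hours per job.
total_hours = hours_to_drain(100, 25, 2)

# The 26th job (index 25) cannot start until a slot frees up.
wait_for_26th = wait_before_start(25, 25, 2)
wait_for_25th = wait_before_start(24, 25, 2)
```

Under these assumptions the queue takes eight hours to drain, and the 26th job waits two hours before it starts, which is why raising the concurrency limit or spreading the submissions over time shortens the WAIT queue.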

B.5 Using My Oracle Support for Additional Troubleshooting Information

You can use My Oracle Support (formerly MetaLink) to help resolve problems. My Oracle Support contains several useful troubleshooting resources, such as:

Knowledge Base articles
Community Forums and Discussions
Patches and Upgrades
Certification information

You can access My Oracle Support at https://support.oracle.com.