Getting Started with Troubleshooting Oracle Enterprise Scheduler Jobs

This section describes how to troubleshoot typical issues that can arise when running Oracle Enterprise Scheduler jobs, such as the following:

  • Asynchronous jobs remain in running state indefinitely.

  • An asynchronous job hangs or crashes.

  • Oracle Enterprise Scheduler is down when the remote scheduled job completes, or network problems prevent Oracle Enterprise Scheduler from receiving the completion status from the remote job.

  • An invoked SOA composite returns an error.

  • A scheduled job is ready to execute, but does not execute.

  • A scheduled job is placed in manual error recovery state where troubleshooting is required.

  • Oracle Enterprise Scheduler is throwing errors.

  • A scheduled job ends in error.

To troubleshoot Oracle Enterprise Scheduler, use the standard Oracle WebLogic Server system log. For information about viewing job request logs, see Managing Logging for Oracle Enterprise Scheduler.


Troubleshooting Asynchronous Scheduled Jobs

Asynchronous jobs run on separate JVMs, including asynchronous Oracle BI Reporting and Publishing, PL/SQL and Java jobs that invoke asynchronous SOA or ADF Business Components services. When handling asynchronous scheduled jobs, Oracle Enterprise Scheduler depends on the remote job sending a completion status defining the job outcome after running. However, the completion status may not be generated, or it may get lost for any of the following reasons:

  • The scheduled job has crashed.

  • The job is stuck in a hanging state.

  • Oracle Enterprise Scheduler was down when the job completed.

  • Network problems.

In any of these cases, an asynchronous job stays in running state indefinitely. As a result, subsequent steps in a job set may not execute, or an incompatible job may be blocked indefinitely.

Oracle Enterprise Scheduler displays information in Fusion Middleware Control that enables you to locate the job on the remote system, including an external identifier.

You can take any of the following actions to troubleshoot an asynchronous job that is stuck in running state:

If you configure timeouts for jobs, you can use Fusion Middleware Control to display all jobs that have timed out. However, a job that has timed out remains in running state, and you must change its state manually. Status callbacks are still accepted for timed-out jobs; if one arrives, the job transitions to completion.

You can also change the status of asynchronous jobs that have not timed out. This might be necessary when no timeout has been configured, the completion status was lost, and you notice that the job has been running for a long time.
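
As a rough illustration of spotting such jobs, the sketch below flags asynchronous requests that have been in RUNNING state longer than a chosen threshold. The `Request` record, its field names and the 24-hour threshold are assumptions made for this example. They are not part of any Oracle Enterprise Scheduler API, and in practice the same information would come from a job request search in Fusion Middleware Control.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Request:
    """Hypothetical snapshot of a job request, for example copied from search results."""
    request_id: int
    state: str          # for example "RUNNING" or "SUCCEEDED"
    asynchronous: bool
    started: datetime


def long_running(requests, now, threshold):
    """Return asynchronous requests that have been RUNNING longer than threshold."""
    return [
        r for r in requests
        if r.asynchronous and r.state == "RUNNING" and now - r.started > threshold
    ]


if __name__ == "__main__":
    now = datetime(2010, 8, 31, 12, 0)
    sample = [
        Request(101, "RUNNING", True, now - timedelta(hours=30)),
        Request(102, "RUNNING", True, now - timedelta(minutes=5)),
        Request(103, "RUNNING", False, now - timedelta(hours=30)),
    ]
    for request in long_running(sample, now, timedelta(hours=24)):
        print(request.request_id)
```

A request flagged this way is only a candidate for investigation. Whether its status should be changed still depends on what the remote job is actually doing.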

Troubleshooting Asynchronous Java Jobs

In the case of asynchronous Java jobs (including jobs that invoke remote Oracle SOA Suite or Oracle ADF Business Components services), the log records are tagged with the execution context ID (ECID). You can view logs across the domain by ECID to troubleshoot the job execution. Oracle ADF Business Components, SOA composites, the web services stack and applications all write log records tagged with the ECID.

For asynchronous SOA jobs, the audit trail for the instance in Fusion Middleware Control can be used to troubleshoot composite execution, as described in Asynchronous Java SOA Job Remains In RUNNING State.

Example B-1 shows sample log messages from the JRF web services stack in server-diagnostic.log, with an inline comment before each log record. The log is stored in the following directories:

(UNIX) DOMAIN_HOME/servers/server_name/logs
(Windows) DOMAIN_HOME\servers\server_name\logs

For more information about configuring parameters for a job, see Submitting an Oracle Enterprise Scheduler Job Request. For more information about viewing the server-diagnostic.log file, see Saving Job Request Logs.

Example B-1 Asynchronous job logging messages in server-diagnostic.log

Sending message to JMS queue "oracle.j2ee.ws.server.async.DefaultRequestQueue" for asynchronous
processing of service b99d80e5-42aa-423a-9e98-f6f88b8b79dfRequest.

[2010-08-30T18:28:13.519-07:00] AdminServer NOTIFICATION []
oracle.j2ee.ws.common.jaxws.JAXWSMessages [tid: ACTIVE.ExecuteThread: '0' for queue:
'weblogic.kernel.Default (self-tuning)'] [userId: <anonymous>] 
[ecid: 0000If5kZGS76EWLHyo2yf1CV5eh000000,0:1] [WEBSERVICE_PORT.name:
 EmployeeModuleServiceSoapHttpPort] [APP: ADFBCAsync] [J2EE_MODULE.name: ADFBCAsync-ejb]
[WEBSERVICE.name: EmployeeModuleService] [J2EE_APP.name: ADFBCAsync] [MessageID:
urn:uuid:01994234-4442-4dee-82a6-e1e04407af56] 

An asynchronous request message is received and successfully recorded for the service
EmployeeModuleService with the replyTo address
http://adc2180314:7001/ADFBCAsyncCallback/EmployeeModuleServiceCallbackResponseImplService.

[2010-08-30T18:28:13.724-07:00] AdminServer NOTIFICATION []
oracle.j2ee.ws.common.jaxws.JAXWSMessages [tid: ACTIVE.ExecuteThread: '0' for queue:
'weblogic.kernel.Default (self-tuning)'] [userId: <anonymous>]
[ecid: 0000If5kZGS76EWLHyo2yf1CV5eh000000,0:1] [WEBSERVICE_PORT.name:
EmployeeModuleServiceSoapHttpPort] [APP: ADFBCAsync] [J2EE_MODULE.name: ADFBCAsync-ejb]
[WEBSERVICE.name: EmployeeModuleService] [J2EE_APP.name: ADFBCAsync] [MessageID:
urn:uuid:01994234-4442-4dee-82a6-e1e04407af56] 

Started asynchronous request processing for the service EmployeeModuleService with the message
selector "b99d80e5-42aa-423a-9e98-f6f88b8b79dfRequest". Transaction enabled: "true".

[2010-08-30T18:28:13.811-07:00] AdminServer NOTIFICATION []
oracle.j2ee.ws.common.jaxws.JAXWSMessages [tid: ACTIVE.ExecuteThread: '2' for queue:
'weblogic.kernel.Default (self-tuning)'] [userId: OracleSystemUser] 
[ecid: 0000If5kZGS76EWLHyo2yf1CV5eh000000,0] [APP: ADFBCAsync] [MessageID:
urn:uuid:01994234-4442-4dee-82a6-e1e04407af56] 

Completed asynchronous request processing. A response is sent to the client.

[2010-08-30T18:28:17.307-07:00] AdminServer NOTIFICATION []
oracle.j2ee.ws.common.jaxws.JAXWSMessages [tid: ACTIVE.ExecuteThread: '2' for queue:
'weblogic.kernel.Default (self-tuning)'] [userId: OracleSystemUser] 
[ecid: 0000If5kZGS76EWLHyo2yf1CV5eh000000,0] [APP: ADFBCAsync] [MessageID:
urn:uuid:01994234-4442-4dee-82a6-e1e04407af56]

Started asynchronous response processing for the service EmployeeModuleService with the message
selector "b99d80e5-42aa-423a-9e98-f6f88b8b79dfResponse".

[2010-08-30T18:28:17.330-07:00] AdminServer NOTIFICATION []
oracle.j2ee.ws.common.jaxws.JAXWSMessages [tid: ACTIVE.ExecuteThread: '0' for queue:
'weblogic.kernel.Default (self-tuning)'] [userId: OracleSystemUser] 
[ecid: 0000If5kZGS76EWLHyo2yf1CV5eh000000,0] [APP: ADFBCAsync] [MessageID:
urn:uuid:01994234-4442-4dee-82a6-e1e04407af56] 

Completed asynchronous response processing successfully. The client should have received the
response at this point.

[2010-08-30T18:28:17.825-07:00] AdminServer NOTIFICATION []
oracle.j2ee.ws.common.jaxws.JAXWSMessages [tid: ACTIVE.ExecuteThread: '0' for queue:
'weblogic.kernel.Default (self-tuning)'] [userId: OracleSystemUser] 
[ecid: 0000If5kZGS76EWLHyo2yf1CV5eh000000,0] [APP: ADFBCAsync] [MessageID:
urn:uuid:01994234-4442-4dee-82a6-e1e04407af56] 

Completed asynchronous response processing with exceptions. The client does not receive any
response.

[2010-08-31T09:55:33.939-07:00] AdminServer ERROR [] oracle.j2ee.ws.common.jaxws.JAXWSMessages
[tid: ACTIVE.ExecuteThread: '5' for queue: 'weblogic.kernel.Default (self-tuning)'] [userId:
OracleSystemUser] [ecid: 0000If94mKW76EWLHyo2yf1CVJG1000000,0] [APP: j2wpojoasync] [MessageID:
urn:uuid:5b9a5134-1416-4bda-95fd-da6e624466d7] 

The response is not sent again as the callback service has replied with a SOAP fault. The HTTP
response code is 500.

[2010-08-31T10:17:35.852-07:00] AdminServer ERROR [] oracle.j2ee.ws.common.jaxws.JAXWSMessages
[tid: ACTIVE.ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)'] [userId:
OracleSystemUser] [ecid: 0000If99plA76EWLHyo2yf1CVĴi000000,0] [APP:
ADFBCAsyncInvalidCallbackCreds] [MessageID: urn:uuid:7e355428-f68f-4c3f-a368-baf529048323] 
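
Because every record in Example B-1 carries an ecid field, one way to follow a single job through server-diagnostic.log is to filter records on that field. The following is a minimal sketch, not an Oracle-provided tool. It assumes that each record starts with a bracketed timestamp as in Example B-1, and that the part of the ecid field before the first comma identifies the execution context (the sample records show the same prefix followed by different suffixes such as `,0` and `,0:1`). The function names and the abridged sample text are made up for this illustration.

```python
import re

# A record is assumed to start with a bracketed ISO-8601 timestamp, as in Example B-1.
RECORD_START = re.compile(r"^\[\d{4}-\d{2}-\d{2}T")
# Bracketed "[name: value]" fields such as [ecid: ...] or [APP: ...].
FIELD = re.compile(r"\[([A-Za-z_][\w.]*): ([^\]]*)\]")

# Two records abridged from Example B-1 (several fields omitted for brevity).
SAMPLE = """\
[2010-08-30T18:28:13.811-07:00] AdminServer NOTIFICATION []
oracle.j2ee.ws.common.jaxws.JAXWSMessages [userId: OracleSystemUser]
[ecid: 0000If5kZGS76EWLHyo2yf1CV5eh000000,0] [APP: ADFBCAsync]
[2010-08-31T09:55:33.939-07:00] AdminServer ERROR [] oracle.j2ee.ws.common.jaxws.JAXWSMessages
[userId: OracleSystemUser] [ecid: 0000If94mKW76EWLHyo2yf1CVJG1000000,0] [APP: j2wpojoasync]
"""


def split_records(log_text):
    """Group physical lines into records, joining wrapped continuation lines."""
    records, current = [], []
    for line in log_text.splitlines():
        if RECORD_START.match(line) and current:
            records.append(" ".join(current))
            current = []
        if line.strip():
            current.append(line.strip())
    if current:
        records.append(" ".join(current))
    return records


def ecid_of(record):
    """Return the ECID prefix of a record, or None if it has no ecid field."""
    value = dict(FIELD.findall(record)).get("ecid")
    return value.split(",", 1)[0] if value else None


def records_for_ecid(log_text, ecid):
    """Return every record whose ecid field starts with the given ECID."""
    return [r for r in split_records(log_text) if ecid_of(r) == ecid]


if __name__ == "__main__":
    for record in records_for_ecid(SAMPLE, "0000If5kZGS76EWLHyo2yf1CV5eh000000"):
        print(record)
```

To use the sketch on a real log, read the file contents into a string and pass it in place of `SAMPLE`. Message text that itself contains square brackets may confuse the simple field pattern.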

Troubleshooting Asynchronous PL/SQL Jobs

PL/SQL jobs can be identified in the Oracle Database by their job names. You can find the database job names associated with the request on the Job Details page in Fusion Middleware Control. For more information about viewing job request details, see Viewing Job Request Details.

Troubleshooting SOA Composite Jobs

If a SOA composite that was invoked as a web service job is stuck or returns an error, go to the Oracle Enterprise Manager Fusion Middleware Control Request Detail page and navigate to the SOA flow trace.

Troubleshooting Process Jobs

Spawned job processes can be identified by a combination of host name, process ID and process group ID. The Job Details page in Fusion Middleware Control shows this and additional information. For more information about viewing job request details, see Viewing Job Request Details.
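
Once the host name, process ID and process group ID are known, you can check on that host whether the process still exists. The sketch below is one possible approach for POSIX hosts only, using the standard `os.kill` call with signal 0 (which delivers nothing and only tests for existence) and `os.getpgid`. The function names are hypothetical, and the code must run on the host shown in the Job Details page.

```python
import os


def process_exists(pid):
    """Return True if a process with this ID exists on the local POSIX host."""
    if os.name != "posix":
        raise NotImplementedError("signal 0 probing is only meaningful on POSIX hosts")
    if pid <= 0:
        raise ValueError("pid must be a positive integer")
    try:
        os.kill(pid, 0)  # signal 0: existence and permission check, nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def same_process_group(pid, expected_pgid):
    """Return True if the process exists and belongs to the expected process group."""
    try:
        return os.getpgid(pid) == expected_pgid
    except ProcessLookupError:
        return False


if __name__ == "__main__":
    own_pid = os.getpid()
    print(process_exists(own_pid), same_process_group(own_pid, os.getpgid(0)))
```

Comparing the process group as well as the process ID reduces the chance of mistaking an unrelated process that has reused the same ID for the spawned job.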

Steps for Manual Recovery

Job requests may require manual error recovery for a number of reasons. For example, in the case of an asynchronous job request, the job implementation may not know whether the job request launched successfully, in which case a manual error recovery exception is thrown. When a job request is placed in manual error recovery, the job request must be manually completed. If the job request was in fact launched successfully and later returns a completion status, that status is ignored.

A job request may be placed in manual error recovery for the following reasons:

  • An asynchronous Java job request throws an ExecutionManualRecoveryException, which indicates to Oracle Enterprise Scheduler that manual recovery is necessary. The job request is placed in ERROR_MANUAL_RECOVERY state. The cause is set to Cause.PROCESS_MANUAL_RECOVER_ERROR (209).

  • An asynchronous Java job request throws a run-time exception or an error. If a thrown exception is not handled by the job implementation, Oracle Enterprise Scheduler cannot know whether the remote job was invoked, so manual recovery is required. The cause is set to Cause.PROCESS_MANUAL_RECOVER_ERROR (209).

  • If Oracle Enterprise Scheduler crashes before it has finished initiating an asynchronous job request, it cannot know whether the remote job has been invoked. The job request transitions to ERROR_MANUAL_RECOVERY, which holds onto incompatibility locks. The cause is set to Cause.PROCESS_RECOVER (210). If the service crashes after it has finished initiating the asynchronous job and the job has transitioned to running, then the job continues in a running state and completes when the callback occurs.

  • A spawned job runs in a clustered environment on the first instance of Oracle Enterprise Scheduler, and that instance goes down along with the associated spawn agent. If the first instance does not come back up for a while, Oracle Enterprise Scheduler does not know whether the spawned process is still running. Manual detection and recovery are required. The cause is set to Cause.PROCESS_RECOVER (210).
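
The two cause codes in the list above can be summarized as a small lookup. The enumeration below only mirrors the names and numbers quoted in this section. It is a standalone illustration, not the `Cause` class of the Oracle Enterprise Scheduler API, and the explanation strings are paraphrases.

```python
from enum import IntEnum


class RecoveryCause(IntEnum):
    """Cause codes quoted in this section (illustrative copy, not the ESS API)."""
    PROCESS_MANUAL_RECOVER_ERROR = 209
    PROCESS_RECOVER = 210


EXPLANATION = {
    RecoveryCause.PROCESS_MANUAL_RECOVER_ERROR:
        "the asynchronous Java job threw ExecutionManualRecoveryException, "
        "or an unhandled run-time exception or error",
    RecoveryCause.PROCESS_RECOVER:
        "the scheduler or spawn agent went down, so it is unknown whether "
        "the remote job or spawned process is running",
}


def explain(code):
    """Return a one-line explanation for a manual recovery cause code."""
    try:
        cause = RecoveryCause(code)
    except ValueError:
        return f"{code}: not a manual recovery cause covered in this section"
    return f"{cause.name} ({cause.value}): {EXPLANATION[cause]}"


if __name__ == "__main__":
    print(explain(209))
    print(explain(210))
```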

Handling Synchronous Java Jobs Requiring Manual Recovery

If a Java job times out or runs for too long, you can either let it run to completion as usual, or attempt to terminate it.

To recover synchronous Java jobs:

  1. If the job is not in the ERROR_MANUAL_RECOVERY state, cancel the job. See Canceling Oracle Enterprise Scheduler Job Requests. If the job remains in CANCELLING state for an unreasonable amount of time, continue to the next step.

  2. Take one of the following actions:

    • If, after some time, the request transitions to a terminal state, no other intervention is required.

    • If the request remains in CANCELLING state, then determine the Oracle Enterprise Scheduler server on which the request is running by finding the server and cluster names.

      Determine the cluster name from the Oracle Enterprise Scheduler process group. Process group information is shown in search results only when the scope for the job request search is All Scheduling Services sharing the ESS repository. See Searching for Oracle Enterprise Scheduler Job Requests for information on performing searches.

  3. If canceling the job is not effective, restart the relevant Oracle Enterprise Scheduler server:

    1. From the navigation pane, expand the farm and then WebLogic Domain.

    2. Expand the Oracle Enterprise Scheduler cluster and select the Oracle Enterprise Scheduler server.

    3. In the WebLogic Server home page, from the WebLogic Server menu, choose Control > Shut Down.

    4. After the server is shut down, from the WebLogic Server home page, from the WebLogic Server menu, choose Control > Start Up.

  4. In the Request Details page, from the Action menu, select Recover Stuck Request.

Handling Stuck Asynchronous Jobs Requiring Manual Recovery

When an asynchronous job request requires manual recovery, follow these basic steps. Additional steps depend on the job type.

If a request is stuck (marked for manual recovery, taking too long, or timed out) and it is an asynchronous remote job, first check whether the remote job is still running.

To handle asynchronous jobs:

  1. Check to see if the remote job is still running:

    1. Identify the remote job by navigating to the Request Details page. See Viewing Job Request Details for information about how to view the Request Details page.

      - For PL/SQL jobs, navigate to the Request Details page and click the Execution Type icon. The database session information displays, correlating the scheduled job request with the database scheduler job. This indicates whether the job is still executing.

      - For spawned jobs, navigate to the Request Details page and click the Execution Type icon to display the spawned process information. This correlates the scheduled job request with the external job, in this case the operating system process. This correlation indicates whether the job is still executing.

    2. Verify that the remote job is no longer running.

      - If the remote job was not successfully created on the remote system, set the status of the job request to ERROR.

      - If the remote job was created and has finished executing, determine its status and set the status of the job request accordingly.

      - If the remote job instance has not finished executing, wait until it completes and set the job request status accordingly.

  2. After the remote job is no longer running, terminate the job request in Oracle Enterprise Scheduler, so that Oracle Enterprise Scheduler is no longer keeping track of the job.

    1. Navigate to the Search request page by clicking the Scheduling Service menu and selecting Search Job Request.

    2. From the Quick Search list, for asynchronous jobs marked for manual recovery (ERROR_MANUAL_RECOVERY), select Asynchronous requests that need manual recovery. Requests are in the RUNNING state. For asynchronous jobs that are not in the ERROR_MANUAL_RECOVERY state, do not use the Asynchronous requests that need manual recovery option; instead, search for the known requests that must be updated.

    3. In the search results, click the request ID to display the Request Details page.

    4. In the Request Details page, from the Action menu, choose Recover Stuck Request.

    5. In the Recover Stuck Request dialog box, set the state accordingly for the job request. Optionally, add a description for the status of the job request.

      If you set the status to ERROR, the description you add displays in the Request Details page.
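
The decision in step 1 of this procedure can be sketched as a small function. The `RemoteStatus` values, the function name and the returned tuples are hypothetical and exist only to make the three cases explicit. The state passed for a finished job (for example SUCCEEDED) is whatever you determined on the remote system.

```python
from enum import Enum


class RemoteStatus(Enum):
    """What was found when inspecting the remote system (hypothetical labels)."""
    NOT_CREATED = "not created"
    RUNNING = "running"
    FINISHED = "finished"


def recovery_decision(remote_status, remote_outcome=None):
    """Return (action, state_to_set) for a stuck asynchronous job request.

    action is "wait" while the remote job is still running, otherwise "recover".
    """
    if remote_status is RemoteStatus.NOT_CREATED:
        # The remote job never started, so the request is set to ERROR.
        return ("recover", "ERROR")
    if remote_status is RemoteStatus.RUNNING:
        # Wait for completion before changing the request state.
        return ("wait", None)
    if remote_outcome is None:
        raise ValueError("a finished remote job needs its outcome, e.g. 'SUCCEEDED'")
    return ("recover", remote_outcome)


if __name__ == "__main__":
    print(recovery_decision(RemoteStatus.NOT_CREATED))
    print(recovery_decision(RemoteStatus.RUNNING))
    print(recovery_decision(RemoteStatus.FINISHED, "SUCCEEDED"))
```

Only the "recover" results lead to the Recover Stuck Request dialog box. A "wait" result means repeating the check later.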

Job Diagnostics

A scheduled job may not execute for a number of reasons, or it may fail. Either way, Fusion Middleware Control provides built-in diagnostics in the Job Details page. For jobs that fail with an error, the Job Details page displays the reason and provides access to the job request log.

Fusion Middleware Control provides the following:

  • Access to logs for the job request.

  • Database session information for PL/SQL job requests, shown in the Job Details page for a PL/SQL job request.

  • Spawned process information for process job requests, shown in the Job Details page for process job requests.

  • A message displays in the Job Details page for job requests in wait, ready or blocked state specifying the reason the job request is in that state.

  • Error and warning messages display in the Job Details page, along with additional details such as stack traces.

  • For retried job requests, the Job Details page displays the number of times the job request was retried, the time of the next retry and the number of additional times the job request is to be retried in the event of an error.

  • For job requests that require manual recovery or have timed out, the Job Details page displays a message regarding a requirement for manual recovery.

For more information about viewing job request details, see Viewing Job Request Details.

Table B-1 shows the associated diagnostic codes for each state along with a description and additional information that is provided. If a request is in a state that does not appear in the table, its diagnosis contains only the request state.


Table B-1 Job Request States and Associated Diagnostic Codes

| State | Diagnostic Code | Description | Related Documentation |
|---|---|---|---|
| BLOCKED | BLOCKED | Blocked due to incompatible job request or requests. Includes the request ID of the blocking request. | For information about canceling a job request, see Canceling Oracle Enterprise Scheduler Job Requests. |
| COMPLETED | POSTPROCESS_DELAY | The job request is delayed by the post-processor. Includes the time at which the delay ends. | - |
| PAUSED | PAUSED | The job request is the parent of one or more subrequests and has been paused. Includes the request ID of a subrequest in a non-terminal state, if there is one. | For information about resuming a paused job request, see Holding and Resuming Oracle Enterprise Scheduler Job Requests. |
| READY | NO_ACTIVE_SERVER | No server is active in the process group. Includes the name of the process and isolation groups. | For information about activating an Oracle Enterprise Scheduler, see Starting and Stopping Oracle Enterprise Scheduler Components. |
| READY | REQUESTED_PROCESSOR_NOT_ACTIVE | The server specified by the job request SYS_requestedProcessor property is not available. Includes the name of the requested processor. | - |
| READY | NO_APPLICATION | The application is either not deployed or not active. Includes the name of the application, process group and isolation group. | - |
| READY | PROCESSOR_STOPPED | The request cannot be processed because there is no server with the application deployed with an active processor. | For information about starting a request processor, see Starting and Stopping a Request Processor or Dispatcher. |
| READY | PROCESSOR_FAILED | The request cannot be processed because the processor has failed on all servers to which the application is deployed. | - |
| READY | PROCESSOR_WAIT | The job request is waiting for an available processor thread. Includes the name of a work assignment that could process the job request. | - |
| READY | INACTIVE_WORK_ASSIGNMENT_WAIT | Waiting for a work assignment to become active. Includes the name of an inactive work assignment that could process the job request if it were active. | - |
| READY | NO_LOADED_WORK_ASSIGNMENT | There is a bound work assignment that could process the request, but the binding is not loaded. The server may be down. Includes the name of a work assignment that could process the request but is not loaded. | For information about activating an Oracle Enterprise Scheduler, see Starting and Stopping Oracle Enterprise Scheduler Components. |
| READY | NO_BOUND_WORK_ASSIGNMENT | You must bind a specialized work assignment to the job request. | For information about binding a work assignment, see Configuring a Request Processor. |
| READY | THROTTLED | The job request is asynchronous and the number of active asynchronous jobs of the same type is at the allowed limit. Includes the work assignment, workshift, the asynchronous job type and the asynchronous limit. | Change the number of threads allocated to jobs, or the asynchronous job limits for the work shift. For more information about editing a work shift, see Creating or Editing a Work Assignment. |
| READY | DISABLED_WORK_ASSIGNMENT | There is a bound work assignment that could process the request, but the work assignment is disabled. Includes the name of the disabled work assignment. | Enable the work assignment. For more information about editing a work assignment, see Creating or Editing a Work Assignment. |
| RUNNING | PREPROCESS_DELAY | The request is delayed by the pre-processor. Includes the time at which the delay ends. | - |
| RUNNING | TIMED_OUT | The job request has timed out. Typically, this code displays for timed out asynchronous Java job requests. | - |
| RUNNING | NO_ACTIVE_SERVER | The job request processing cannot continue because no server is active in the process group. Includes the name of the process and isolation groups. | Verify that the Oracle Enterprise Scheduler server is running (see Monitoring Oracle Enterprise Scheduler Request Activity). If necessary, restart one of the Oracle Enterprise Scheduler components, as described in Starting and Stopping Oracle Enterprise Scheduler Components. |
| RUNNING | REQUESTED_PROCESSOR_NOT_ACTIVE | Job request processing cannot continue because the server specified by the job request SYS_requestedProcessor property is not available. Includes the name of the requested processor. | Verify that the Oracle Enterprise Scheduler server is running (see Monitoring Oracle Enterprise Scheduler Request Activity). If necessary, restart one of the Oracle Enterprise Scheduler components, as described in Starting and Stopping Oracle Enterprise Scheduler Components. |
| RUNNING | NO_APPLICATION | Job request processing cannot continue because the application is either not deployed or not active. Includes the name of the application, process group and isolation group. | - |
| RUNNING | PROCESSOR_STOPPED | Job request processing cannot continue because there is no server with the application deployed with an active processor. | Verify that the Oracle Enterprise Scheduler server is running (see Monitoring Oracle Enterprise Scheduler Request Activity). Start a request processor, as described in Starting and Stopping a Request Processor or Dispatcher. |
| RUNNING | PROCESSOR_FAILED | Job request processing cannot continue because the processor has failed on all servers to which the application is deployed. | Verify that the Oracle Enterprise Scheduler server is running (see Monitoring Oracle Enterprise Scheduler Request Activity). Restart the request processor, as described in Starting and Stopping a Request Processor or Dispatcher. |
| RUNNING | PROCESSOR_WAIT | Waiting for an available processor thread to handle the update event for an asynchronous Java job request. Includes the name of a work assignment that could process the request; indicates whether the job request has disabled update events. | - |
| RUNNING | INACTIVE_WORK_ASSIGNMENT_WAIT | Waiting for a work assignment to become active to handle the update event for an asynchronous Java request. Includes the name of an inactive work assignment that could process the request if it were active; indicates whether the job request has disabled update events. | Activate the inactive work assignment so that the job request can be processed. For more information, see Creating or Editing a Work Assignment. |
| RUNNING | NO_LOADED_WORK_ASSIGNMENT | Processing of an asynchronous Java job request may be delayed. There is a bound work assignment that could process the update event, but the binding is not loaded. The server may be down. Includes the name of a work assignment that could process the job request but is not loaded; indicates whether the job request has disabled update events. | Verify that the Oracle Enterprise Scheduler server is running (see Monitoring Oracle Enterprise Scheduler Request Activity). If necessary, restart one of the Oracle Enterprise Scheduler components, as described in Starting and Stopping Oracle Enterprise Scheduler Components. |
| RUNNING | NO_BOUND_WORK_ASSIGNMENT | Processing of an asynchronous Java job request may be delayed. There is no bound work assignment that can handle the update event for the asynchronous Java job request; indicates whether the request has disabled update events. | Bind a work assignment to the request processor. For more information, see Configuring a Request Processor. |
| RUNNING | DISABLED_WORK_ASSIGNMENT | Processing of an asynchronous Java job request may be delayed. There is a bound work assignment that could process the update event for the job request, but the work assignment is disabled. Includes the name of the disabled work assignment; indicates whether the job request has disabled update events. | Enable the work assignment. For more information about editing a work assignment, see Creating or Editing a Work Assignment. |
| RUNNING | ASYNC_WAIT_COMPLETION | The job request is an asynchronous request whose execution has been initiated. Oracle Enterprise Scheduler is waiting for a notification that the asynchronous executable has completed. | The request might need to be recovered manually if the remote executable has completed. For information about asynchronous job requests, see Troubleshooting Asynchronous Scheduled Jobs. |
| WAIT | FUTURE_START | The job request has a scheduled time in the future. Includes the scheduled time for the job request. | - |
| WAIT | RETRY_DELAY | An error occurred in the job request, and the job request has been delayed before being retried. Includes the scheduled retry time. | - |
| WAIT | NO_APPLICATION | The job request scheduled time has been reached, but the job request cannot be dispatched because the application is not available (is either not deployed or not active). Includes the name of the application, process group and isolation group. | - |
| WAIT | PARENT_NOT_PAUSED | The job request is a subrequest whose parent has not paused. The subrequest remains in WAIT state until its parent pauses. Includes the parent request ID. | For information about pausing a paused job request, see Holding and Resuming Oracle Enterprise Scheduler Job Requests. |
| WAIT | DEFERRED | The job request is an instance of a recurrence for which the previous recurrence instance is still running. Oracle Enterprise Scheduler prevents concurrent execution of recurrent instances, and the next recurrence remains in WAIT state while the current recurrence is active. | - |
| WAIT | DISPATCHER_STOPPED | The job request cannot be dispatched because there is no server with the application deployed with a running dispatcher. | Restart the request dispatcher. For more information, see Starting and Stopping a Request Processor or Dispatcher. |
| WAIT | DISPATCHER_FAILED | The job request cannot be dispatched because the dispatcher has failed on all servers to which the application is deployed. | Restart the request dispatcher or other Oracle Enterprise Scheduler components. For more information, see Starting and Stopping a Request Processor or Dispatcher and Starting and Stopping an Oracle Enterprise Scheduler Service Instance. |
| WAIT | WAIT_QUEUE_MISSING | The request has no wait queue entry when one was expected. | - |
| terminal state | TERMINAL | The job request is in a terminal state (SUCCEEDED, WARNING, ERROR, CANCELLED, EXPIRED, VALIDATION_FAILED, FINISHED). Includes the application, process group, isolation group, work assignment and workshift. | - |


Note:

For more information about job states, see Oracle Fusion Middleware Java API Reference for Oracle Enterprise Scheduler.
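
For scripted triage, Table B-1 can be treated as a lookup keyed by state and diagnostic code. The sketch below copies only a handful of rows to show the idea. The dictionary, the function name and the fallback message are assumptions for this example, and a complete mapping would need every row of the table.

```python
# A few rows of Table B-1, keyed by (state, diagnostic code).
REMEDIATION = {
    ("READY", "THROTTLED"):
        "Change the number of threads allocated to jobs, or the asynchronous "
        "job limits for the work shift.",
    ("READY", "DISABLED_WORK_ASSIGNMENT"): "Enable the work assignment.",
    ("RUNNING", "NO_BOUND_WORK_ASSIGNMENT"):
        "Bind a work assignment to the request processor.",
    ("RUNNING", "ASYNC_WAIT_COMPLETION"):
        "The request might need to be recovered manually if the remote "
        "executable has completed.",
    ("WAIT", "DISPATCHER_STOPPED"): "Restart the request dispatcher.",
}


def suggest(state, code):
    """Return the suggested action for a state and diagnostic code, if copied here."""
    return REMEDIATION.get((state, code), "No action copied into this excerpt of Table B-1.")


if __name__ == "__main__":
    print(suggest("READY", "THROTTLED"))
    print(suggest("WAIT", "FUTURE_START"))
```

Keying on the pair rather than on the code alone matters because several codes, such as NO_APPLICATION and NO_ACTIVE_SERVER, appear under more than one state with different guidance.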