B.4 Problems and Solutions

This section describes common problems and solutions for Oracle Enterprise Scheduler.

In addition to the recommended solutions, consider reviewing Tuning Oracle Enterprise Scheduler System Performance for tuning tips.

B.4.1 Job Remains in WAIT State

Problem

When a user submits a job, the job can remain in the WAIT state for too long without progressing to the RUNNING state.

Solution

To resolve this problem, verify the current status of Oracle Enterprise Scheduler from Fusion Middleware Control:

Verify the request processor and request dispatcher are running:
1. From the navigation pane, expand the farm, and then Scheduling Services.
2. Select the ESSAPP application for the appropriate Managed Server.
3. In the Scheduling Service home page, in the Scheduler Components section, ensure the Request Processor has a status of Started.
  
  If it is not running, start it. See Starting and Stopping a Request Processor or Dispatcher.
4. In the Scheduling Service home page, in the Scheduler Components section, ensure the Request Dispatcher has a status of Started.
  
  If it is not running, start it. See Starting and Stopping a Request Processor or Dispatcher.
Verify the ESSAPP application is running:
1. From the navigation pane, expand the farm, and then Scheduling Services.
2. Select the ESSAPP application for the appropriate Managed Server.
3. In the Scheduling Service home page, in the Scheduler Components section, ensure the Request Dispatcher has a status of Started.
4. In the WebLogic Server home page, in the Deployments section, ensure the ESSAPP application is running.
  
  If it is not running, start it. See Starting and Stopping an Oracle Enterprise Scheduler Service Instance.
Check whether the configured concurrency or thread count is too small by reviewing the request processor and work assignment configuration:
- From the Scheduling Service menu, choose Request Processor > Configure to review the Thread Count field in the Configure Request Processor page. See Configuring a Request Processor.
- From the Scheduling Service menu, choose Work Allocation > Work Assignments to review the configuration in the Work Assignments page. See Managing Work Assignments.

B.4.2 Synchronous Job Continues in RUNNING State for Too Long

Problem

When the user submits a job, it remains in the RUNNING state for too long.

Solution

The job may be in RUNNING state because the Oracle Enterprise Scheduler server has crashed and recovery has not taken place.

To resolve this problem, determine the current status of Oracle Enterprise Scheduler from Fusion Middleware Control:

Verify the request processor is running:
1. From the navigation pane, expand the farm and then Scheduling Services.
2. Select the ESSAPP application for the appropriate Managed Server.
3. In the Scheduling Service home page, in the Scheduler Components section, ensure the Request Processor is enabled and is started.
  
  If it is not running, start it. See Starting and Stopping a Request Processor or Dispatcher.
Verify the Oracle Enterprise Scheduler server is running, and start it if it is not running:
1. From the navigation pane, expand the farm and then WebLogic Domain.
2. Select the Oracle Enterprise Scheduler cluster.
3. From the WebLogic Cluster page, in the Servers section, view the Status column to determine if the Oracle Enterprise Scheduler server is running.
4. If it shows a status of down (red down arrow) for a server, click the server name in the Name column.
5. In the WebLogic Server home page, from the WebLogic Server menu, choose Control > Start Up.
Look at the job output to see if the job is making progress. See Viewing Job Request Details.

When an Oracle Enterprise Scheduler server is restarted, synchronous jobs running on that server are transitioned to the ERROR state.

B.4.3 Asynchronous Jobs Remain in RUNNING State and Do Not Complete

Problem

PL/SQL and Java jobs that invoke asynchronous SOA services and Java jobs that invoke asynchronous Oracle Application Development Framework (Oracle ADF) Business Component services run on separate Java Virtual Machines (JVMs) or machines. In these cases, Oracle Enterprise Scheduler depends on the remote job sending a completion status at the end of processing that defines the job outcome. However, this message may never be generated, or it may be lost, resulting in the job staying in the RUNNING state. In addition, subsequent steps in a job set may not execute, or an incompatible job may be blocked indefinitely.

Solution

To resolve this issue, follow the actions described in Troubleshooting Asynchronous Scheduled Jobs.

B.4.4 Asynchronous Java SOA Job Remains In RUNNING State

Problem

Jobs that invoke asynchronous SOA services run on separate Java Virtual Machines (JVMs) or machines. In these cases, Oracle Enterprise Scheduler depends on the remote job sending a completion status at the end of processing that defines the job outcome. However, this message may never be generated, for a variety of reasons.

Solution

To resolve this issue, you must troubleshoot the native job.

To resolve this problem for asynchronous SOA jobs:

Search for the request, as described in Searching for Oracle Enterprise Scheduler Job Requests.
On the Request Details page, from the Action menu, select Request Log to view the log message. See Viewing Job Request Details for further information about the Request Details page.

The Log Message page displays. By default, when a user navigates to view the logs for a request, only messages that are logged in the Oracle Enterprise Scheduler cluster scope are shown. (If the ESSAPP application is not deployed to a cluster, the messages that are logged in the Managed Server scope are shown.) However, Oracle Enterprise Scheduler propagates the ECID associated with the request across subsystems, such as Oracle SOA Suite and Oracle ADF.
Make note of the value in the ECID field.
From the Broaden Target Scope list, select the /farm_name/domain_name (Oracle WebLogic Domain) to view messages across the domain.
In the Log Messages page for the Oracle WebLogic Server domain, in the Selected Targets section, ensure the search includes the ECID field with the value from the Request Details page.
Search and view log records for Oracle SOA Suite for the ECID and note any issues. See Setting Oracle Enterprise Scheduler Log Levels.
View the audit trail for an SOA composite application instance using the ECID:
1. From the navigation pane, expand the farm, SOA, and then soa-infra.
2. From the SOA Infrastructure page, click the Instances tab.
3. In the Search section, enter the ECID in the ECID field.
4. Click Search to find the instance with the ECID.
5. Select the instance by clicking the ID in the Instance ID field from the Instances table.
  
  The Flow Trace page displays.
6. View the audit trail for the instance and observe whether the composite completed successfully or completed with an error. For more information about the Flow Trace window, see the "Viewing the Audit Trail and Process Flow of a BPEL Process Service Component" section in the Oracle Fusion Middleware Administrator's Guide for Oracle SOA Suite and Oracle Business Process Management Suite.
If the SOA composite is complete and the job is still running in Oracle Enterprise Scheduler, manually complete the job in the Request Details page. For related information, see Troubleshooting Asynchronous Scheduled Jobs.

B.4.5 Asynchronous Java Oracle ADF Business Components Job Remains In RUNNING State

Problem

Jobs that invoke asynchronous Oracle ADF Business Components services run on separate JVMs or computers. In these cases, Oracle Enterprise Scheduler depends on the remote job sending a completion status at the end of processing that defines the job outcome. However, this message may never be generated, for a variety of reasons.

Solution

To resolve this issue, you must troubleshoot the native job.

To resolve this problem for asynchronous Oracle ADF Business Components jobs:

Search for the request, as described in Searching for Oracle Enterprise Scheduler Job Requests.
On the Request Details page, from the Action menu, select Request Log to view the log message. See Viewing Job Request Details for further information about the Request Details page.

The Log Message page displays. By default, when a user navigates to view the logs for a request, only messages that are logged in the Oracle Enterprise Scheduler cluster scope are shown. (If the ESSAPP application is not deployed to a cluster, the messages that are logged in the Managed Server scope are shown.) However, Oracle Enterprise Scheduler propagates the ECID associated with the request across subsystems, such as Oracle SOA Suite and Oracle ADF.
Make note of the value in the ECID field.
From the Broaden Target Scope list, select the /farm_name/domain_name (Oracle WebLogic Domain) to view messages across the domain.
In the Log Messages page for the Oracle WebLogic Server domain, in the Selected Targets section, ensure the search includes the ECID field with the value from the Request Details page and the Component Name field.
Search and view log records for Oracle ADF Business Components and web services stack for the ECID and note any issues. See the "Viewing and Searching Log Files" section in the Oracle Fusion Middleware Administrator's Guide.
Observe whether the Oracle ADF Business Components service completed successfully or completed with an error.
If the service is complete and the job is still running in Oracle Enterprise Scheduler, manually complete the job in the Request Details page. For related information, see Troubleshooting Asynchronous Scheduled Jobs.

B.4.6 Asynchronous PL/SQL Job Remains in RUNNING State

Problem 1

PL/SQL jobs run on separate machines. In these cases, Oracle Enterprise Scheduler depends on the remote job sending a completion status at the end of processing that defines the job outcome. However, this message may never be generated, for a variety of reasons.

Solution 1

PL/SQL jobs can be identified in the Oracle Enterprise Scheduler by their job names. Job definition names are available from the Request Details page in the Fusion Middleware Control associated with the request. See Viewing Job Request Details.

To resolve this issue, troubleshoot the native job. For more information, see Troubleshooting Asynchronous PL/SQL Jobs.

Problem 2

Oracle Enterprise Scheduler internally uses the Database Management System (DBMS) scheduler to schedule PL/SQL jobs. In some cases, the DBMS scheduler has not scheduled the job request, even though Oracle Enterprise Scheduler has submitted the job to the DBMS scheduler and set its state to RUNNING.

Solution 2

To resolve this issue:

Verify the DBMS Scheduler resource usage. See the "Administering Oracle Scheduler" chapter in the Oracle Database Administrator's Guide.
Change the PL/SQL job throttle limit by configuring PL/SQL job limits. See Managing Workshifts.

B.4.7 Job Does Not Execute at Scheduled Time

Problem

When a job's scheduled time arrives, it does not execute.

Solution

To resolve this problem, view the Request Details page in Fusion Middleware Control. This page provides built-in diagnostics that show what the issue is. For a job that fails with an error, the Request Details page shows the reason and provides access to the job request log from the Action menu. See Viewing Job Request Details.

For more information about job diagnostics, see Job Diagnostics.

B.4.8 Asynchronous Java Job Requires Manual Error Recovery

Problem

A job gets placed in the ERROR_MANUAL_RECOVERY state. There are a number of reasons why a job may end up in error manual recovery. For example, for an asynchronous job, an error may leave the job implementation unable to determine whether the job was launched successfully, in which case it throws the error manual recovery exception. For more reasons why a job can end up in error manual recovery, see Steps for Manual Recovery.

Solution

To resolve this issue, manually update the job status to complete it. See Handling Synchronous Java Jobs Requiring Manual Recovery and Handling Stuck Asynchronous Jobs Requiring Manual Recovery.

B.4.9 Spawned (Process Type) Job Requires Manual Error Recovery

Problem

A spawned process type job gets placed in the ERROR_MANUAL_RECOVERY state.

Solution

To solve this issue, transition the request to a terminal state:

Identify the spawned host and process ID in the Request Details page. See Viewing Job Request Details for more information about the Request Details page.
If the process is still running on the host, wait for it to complete or terminate it.
When the process is no longer running, recover the request to begin the transition to an error state. The request is subject to automatic retries, if configured. From Fusion Middleware Control, perform the following steps:
1. Search for the request, as described in Searching for Oracle Enterprise Scheduler Job Requests. When selecting a value for Status, select Cancelled and Error Manual Recovery.
2. On the Request Details page, from the Action menu, select Recover Stuck Request. See Viewing Job Request Details for more information about the Request Details page.
  
  A request in the CANCELLING state is put to the CANCELLED state, and a request in the ERROR_MANUAL_RECOVERY state is put to the ERROR state, specifying an appropriate error message. The error message specified by the user is shown on the Request Details page.
3. Manually update the job status to complete it. See Handling Synchronous Java Jobs Requiring Manual Recovery and Handling Stuck Asynchronous Jobs Requiring Manual Recovery.

For more information on manually recovering a job, see Steps for Manual Recovery.

B.4.10 Job Remains in CANCELLING State

Problem

Sometimes when a job is canceled, it stays in the CANCELLING state and does not get canceled. The result of a cancellation request depends on the stage of processing the request has reached when the cancel happens and on the outcome of that stage. Many jobs are implemented as asynchronous ADF service invocations. The application infrastructure does not support cancellation of in-flight service requests, and as a result, the job does not cancel as expected. There are also other reasons why a job may get stuck in the CANCELLING state, depending on the job type:

For PL/SQL jobs, Oracle Enterprise Scheduler attempts to kill the RDBMS scheduler job. For spawned processes, Oracle Enterprise Scheduler tries to kill the running process. If the job is successfully killed, the request transitions to the CANCELLED state. If the job completes before it can be killed, the state to which the request transitions depends on the result of the job execution. For these types of jobs, this issue should not occur.
Asynchronous Java job: The request was canceled, but the remote job never contacted Oracle Enterprise Scheduler with its terminal status. This could happen if the job is still executing because either the AsyncCancellable interface was not implemented or the remote cancel operation did not succeed. It could also happen if the remote system is unable to respond.
Synchronous Java job: The request was canceled and the job is still executing. This could happen if either the Cancellable interface was not implemented or the job's Executable.execute() method has still not returned after Cancellable.cancel() was invoked. For more information about the Executable interface, see the chapter "Use Case Oracle Enterprise Scheduler Sample Application" in Oracle Fusion Middleware Developer's Guide for Oracle Enterprise Scheduling Service.
Job sets (Parent-Child Requests): In cases of job sets, the cancellation operation propagates to all eligible child requests. Until all child requests are completed, the parent request remains in the CANCELLING state.
SOA Java job: In cases where a job ends in an error, look for the ECID in the Fusion Middleware Control by locating the composite instance and looking at the audit trail and logs tagged with the ECID to see what happened. For more information about finding the ECID, see the "Viewing the Audit Trail and Process Flow of a BPEL Process Service Component" section in the Oracle Fusion Middleware Administrator's Guide for Oracle SOA Suite and Oracle Business Process Management Suite.
Oracle ADF Business Component Java job: In cases where requests end in an error, look for the message ID in the server-name-diagnostic.log file in the following directories to see what happened:
```
(UNIX) DOMAIN_HOME/servers/server_name/logs
(Windows) DOMAIN_HOME\servers\server_name\logs
```
For more information about viewing log files, see the "Viewing and Searching Log Files" section in the Oracle Fusion Middleware Administrator's Guide.
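
When the log viewer is not at hand, the same lookup can be approximated by scanning the diagnostic log files directly. The following is a minimal sketch, not an Oracle-provided tool: the function name `find_log_lines` and the file-name pattern `*diagnostic*.log` are assumptions chosen for illustration, and the search is a plain substring match on a message ID or ECID.

```python
from pathlib import Path


def find_log_lines(logs_dir, token, pattern="*diagnostic*.log"):
    """Return (file name, line number, text) for every log line containing token.

    logs_dir is the server logs directory, for example
    DOMAIN_HOME/servers/server_name/logs. The default pattern is an
    assumption about how the diagnostic log files are named.
    """
    hits = []
    for log_file in sorted(Path(logs_dir).glob(pattern)):
        # errors="replace" keeps the scan going if a log contains stray bytes.
        with log_file.open(encoding="utf-8", errors="replace") as handle:
            for line_number, line in enumerate(handle, start=1):
                if token in line:
                    hits.append((log_file.name, line_number, line.rstrip("\n")))
    return hits
```

Call the function with the logs directory and the message ID (or ECID) noted from the failed request, then read the surrounding lines in the reported file to see what happened.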

Solution

Manually intervene in the Fusion Middleware Control to complete the job stuck in CANCELLING state.

To recover asynchronous requests that are stuck, determine whether the remote job is still executing. If it is, nothing should be done on the Oracle Enterprise Scheduler side. If the remote job is no longer executing, then perform the following from Fusion Middleware Control to complete the request:

Search for the request, as described in Searching for Oracle Enterprise Scheduler Job Requests.
On the Request Details page, from the Action menu, select Recover Stuck Request. See Viewing Job Request Details for more information about the Request Details page.

For synchronous Java jobs, wait for the job to complete. If the job is irrevocably hung, then the server on which it is executing must be restarted.
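
The synchronous case above comes down to cooperative cancellation: a cancel call can only record the request, and the running job has to notice it and return. The sketch below illustrates that idea in Python under stated assumptions. It does not use the Oracle Enterprise Scheduler API; the class `CooperativeJob`, the helper `run_and_cancel`, and the result strings are hypothetical.

```python
import threading
import time


class CooperativeJob:
    """Hypothetical job whose work loop checks a cancel flag between units of work."""

    def __init__(self, units):
        self.units = units
        self.completed = 0
        self._cancel_requested = threading.Event()

    def cancel(self):
        # Only records the request; the loop in execute() must notice it.
        self._cancel_requested.set()

    def execute(self):
        for _ in range(self.units):
            if self._cancel_requested.is_set():
                return "cancelled"
            time.sleep(0.01)  # stand-in for one unit of real work
            self.completed += 1
        return "completed"


def run_and_cancel(job, cancel_after_seconds):
    """Run the job on a worker thread, request cancellation, and wait for it to return."""
    result = {}
    worker = threading.Thread(target=lambda: result.update(state=job.execute()))
    worker.start()
    time.sleep(cancel_after_seconds)
    job.cancel()
    worker.join()
    return result["state"]
```

A job whose loop never checks the flag would keep running until it finished on its own, which corresponds to a request that stays in the CANCELLING state until execute() returns.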

B.4.11 Performance and Scalability Goes Down When Two Very Database Intensive Jobs Run at the Same Time

Problem

Performance and scalability go down when two very database-intensive jobs run at the same time.

To determine job performance issues:

Use operating-system level or Database Management System (DBMS) tools to check performance metrics across the Oracle Database to ensure the performance bottleneck is isolated to these two jobs running simultaneously and that tuning practices have been attempted.
Determine the execution time of the job:
1. From the navigation pane, expand the farm, and then Scheduling Services.
2. Select the ESSAPP application for the appropriate Managed Server.
3. In the Scheduling Service home page, click the Top 10 Long Running Job Requests tab to view the jobs.

Solution

To solve this problem, use one of the following two solutions:

Mark the two very database-intensive jobs as incompatible. Sometimes you want to ensure two jobs never run at the same time, not because they would corrupt the system if run simultaneously, but because they heavily load the same resource. In this case, the jobs can be defined to be incompatible so they never run at the same time. See Managing Incompatibilities.
Schedule the jobs at different times. See Submitting an Oracle Enterprise Scheduler Job Request.
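
The effect of marking two jobs as incompatible is mutual exclusion: the second job waits until the first one finishes. The following Python sketch models only that behavior. It is an illustration under stated assumptions, not how Oracle Enterprise Scheduler implements incompatibilities; `IncompatibilityGate`, `heavy_job`, and `run_both` are hypothetical names.

```python
import threading
import time


class IncompatibilityGate:
    """Hypothetical stand-in for an incompatibility: one job holds it at a time."""

    def __init__(self):
        self._lock = threading.Lock()
        self._running = 0
        self.max_running = 0  # highest number of jobs observed running together

    def run(self, job):
        with self._lock:  # a second job waits here until the first finishes
            self._running += 1
            self.max_running = max(self.max_running, self._running)
            try:
                return job()
            finally:
                self._running -= 1


def heavy_job():
    time.sleep(0.05)  # stand-in for database-intensive work
    return "done"


def run_both(gate):
    """Submit two heavy jobs at once and report how many ever ran together."""
    threads = [threading.Thread(target=gate.run, args=(heavy_job,)) for _ in range(2)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return gate.max_running
```

With the gate in place, `run_both` reports that at most one of the two jobs was running at any moment, at the cost of the second job starting later.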

B.4.12 Newly Added Server Is Not Being Utilized or Running Inappropriate Jobs

Problem

A newly added Managed Server does not run jobs as expected or is not being utilized at all.

You see in the job history over time that no jobs are running on this server or you notice in the job history that a particular job is running on this server that should not run.

To view the job history:

From the navigation pane, expand the farm, and then Scheduling Services.
Select the ESSAPP application for the appropriate Managed Server.
You see in the job history over time that no jobs are running on this server.
Search for the request, as described in Searching for Oracle Enterprise Scheduler Job Requests, to view the list of jobs that already have executed.
From the Request ID column in the Request Search page, click a job to go to the Request Details page for the job.
In the Execution Trail section of the Request Details page, view the Dispatcher, Processor, Work Assignment, and Workshift the job ran on.

Solution

After a new server is added, Oracle Enterprise Scheduler determines whether the default work assignment can be used, based on how other processors are bound. If it cannot use the default work assignment, it configures the new server to run only the health check service (internal work assignment ESSInternalWA). Revisit this default configuration: bind the desired work assignments to this server, and remove the internal work assignment after the health check is complete. For more details, see Expanding an Oracle Enterprise Scheduler Cluster.

B.4.13 Oracle Enterprise Scheduler Run-Time System Is Throwing Errors

Problem

The Oracle Enterprise Scheduler run-time system is not behaving properly and is throwing errors or encountering problems when processing a job.

Solution

To identify and solve this problem, review the Oracle Enterprise Scheduler system logs. From Fusion Middleware Control, perform the following:

Search for the request, as described in Searching for Oracle Enterprise Scheduler Job Requests.
On the Request Details page, from the Action menu, select Request Log to view the log message. See Viewing Job Request Details for further information about the Request Details page.
To adjust the log levels:
1. From the navigation pane, expand the farm, WebLogic Domain, Oracle Enterprise Scheduler cluster, and select the Oracle Enterprise Scheduler server (for example, ess_server1).
  
  The WebLogic Server home page displays.
2. From the WebLogic Server menu, select Logs > Log Configuration to display the Log Configuration page.

You can configure the Oracle Enterprise Scheduler server loggers for an Oracle WebLogic Server by modifying the logging.xml file of that Oracle WebLogic Server. By default, there is no explicit logger entry for Oracle Enterprise Scheduler, so it inherits the logging level and log handlers configured for the parent logger, typically the "oracle" logger or the root ("") logger.
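
The inheritance rule can be illustrated with Python's standard logging module, which resolves levels through a similar dotted-name hierarchy. This is only an analogy: it does not read logging.xml or configure Oracle WebLogic Server, and the logger names `oracle_demo` and `oracle_demo.scheduler` are hypothetical.

```python
import logging

# Hypothetical logger names; the child starts with no explicit level of its own.
parent = logging.getLogger("oracle_demo")
child = logging.getLogger("oracle_demo.scheduler")

parent.setLevel(logging.WARNING)
# With no level set on the child, its effective level comes from the parent.
inherited_level = child.getEffectiveLevel()

# Setting a level on the child is analogous to adding an explicit logger entry.
child.setLevel(logging.DEBUG)
explicit_level = child.getEffectiveLevel()
```

Here `inherited_level` is the parent's level and `explicit_level` is the child's own, which mirrors the difference between relying on the parent logger and adding a dedicated logger entry.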

By default, the log messages for the Oracle Enterprise Scheduler server logger can be found in the Oracle WebLogic Server diagnostic log file for that Oracle WebLogic Server. For more information on logging and log levels, see Setting Oracle Enterprise Scheduler Log Levels.

Note:

The logger shows only logs written by Oracle Enterprise Scheduler jobs running in Oracle WebLogic Server. After Oracle Enterprise Scheduler transfers control of running PL/SQL and C jobs to the PL/SQL or C process, respectively, PL/SQL and C job logging data is not written to the Oracle Enterprise Scheduler logs, because these jobs run in a separate process.

B.4.14 Oracle Enterprise Scheduler Is Running Out Of Database Connections

Problem

If Oracle Enterprise Scheduler is running out of database connections, there could be a problem with connection leaks in Oracle Enterprise Scheduler.

Solution

See the "Running out of Data Source Connections" section in the Oracle Fusion Middleware Administrator's Guide.

B.4.15 Job Queue Full Due to a Hanging Job

Problem

At times, some jobs may not behave as expected. The job queue may become large because a job is spinning or hanging, or because a job has memory leaks.

Solution

Here are some typical scenarios:

A Java job goes into an infinite loop, and other jobs are waiting in the queue for it to finish.

To resolve this problem, perform the following with Fusion Middleware Control:
1. From the navigation pane, expand the farm, and then Scheduling Services.
2. Select the ESSAPP application for the appropriate Managed Server.
3. From the Scheduling Service menu, choose Request Processor > Configure to review the Thread Count field in the Configure Request Processor page. See Configuring a Request Processor.
  
  If the Thread Count field is set to 25, only 25 synchronous Java jobs are permitted to be in the RUNNING state at the same time, and all other Java jobs in the queue have to wait to be processed.
  
  If some jobs are performing heavy processing or seem to hang, you can isolate such jobs by defining dedicated work assignments to process them on a specific Oracle Enterprise Scheduler server, leaving the other servers to process the rest of the jobs. See Creating or Editing a Work Assignment.
4. Restart the Oracle Enterprise Scheduler server:
  1. From the navigation pane, expand the farm and then WebLogic Domain.
  2. Expand the Oracle Enterprise Scheduler cluster and select the Oracle Enterprise Scheduler server.
  3. In the WebLogic Server home page, from the WebLogic Server menu, choose Control > Shut Down.
  4. After the server is shut down, from the WebLogic Server home page, from the WebLogic Server menu, choose Control > Start Up.
    
    After restarting the server, the Oracle Enterprise Scheduler moves RUNNING jobs to the ERROR state and starts processing the next batch of jobs in the queue.
A PL/SQL job goes into an infinite loop and never exits, while many other PL/SQL jobs are also submitted.

Cancel the PL/SQL job to move the job to the CANCELLED state. See Canceling Oracle Enterprise Scheduler Job Requests.
At times, heavy database inserts and updates cause jobs to wait.

For example, consider a PL/SQL procedure that performs 1 million inserts, updates, and deletes on a table. Such a PL/SQL procedure can take from about 20 minutes to two hours to complete, depending on the DBMS load. Suppose a work assignment named wa1 has been created with a PL/SQL concurrency limit setting of 25, and many PL/SQL jobs are submitted, each of which runs this PL/SQL procedure.

In this case, the Oracle Enterprise Scheduler server processes only 25 PL/SQL jobs concurrently. Because each PL/SQL job takes one to two hours to complete, the remaining PL/SQL jobs stay in the WAIT state. After a RUNNING job completes, the first job in the WAIT queue is picked and scheduled to run.
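
A rough estimate shows how quickly the queue builds up under such a limit. The sketch below is a simplification under stated assumptions: the figure of 100 submitted jobs is hypothetical, every job is assumed to take the full two hours, and jobs are treated as running in waves of 25, whereas in practice a slot is refilled as soon as any running job completes.

```python
import math


def drain_estimate(total_jobs, concurrency_limit, hours_per_job):
    """Estimate waves, total hours, and longest wait when every job takes the same time."""
    waves = math.ceil(total_jobs / concurrency_limit)
    total_hours = waves * hours_per_job
    # Jobs in the last wave wait for all earlier waves to finish.
    longest_wait_hours = max(waves - 1, 0) * hours_per_job
    return waves, total_hours, longest_wait_hours


# 100 jobs (hypothetical), concurrency limit 25, two hours per job.
print(drain_estimate(100, 25, 2))  # prints (4, 8, 6)
```

Under these assumptions the last 25 jobs spend six hours in the WAIT state before they start, which is why a large WAIT queue in this scenario reflects the concurrency limit and job duration rather than a scheduler fault.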