17 Managing Business and System Errors

This chapter describes how to indicate Oracle Enterprise Scheduler system and business errors as well as implement job request retries.

This chapter includes the following sections:

Section 17.1, "Introduction to Managing Business and System Errors"
Section 17.2, "Indicating Errors"
Section 17.3, "Configuring Retries for a Job Request"
Section 17.4, "Finding and Diagnosing Job Requests in Error State"

17.1 Introduction to Managing Business and System Errors

When an Oracle Enterprise Scheduler job request encounters an error during execution, Oracle Enterprise Scheduler can indicate whether the error is a business or system error.

A business error occurs when a job request must abort prematurely, but is otherwise able to exit cleanly, leaving its data in a consistent state. Examples of scenarios that require a job to abort prematurely include a particular application setup or configuration condition, a functional conflict that requires an early exit, and corrupt or inconsistent data.

A system error occurs when a job request encounters a technical error from which it cannot recover, but otherwise exits of its own volition. Alternatively, a system error occurs when the server or operating system running the job crashes. Examples of system errors include table space issues and unhandled runtime exceptions.

A job request that indicates an error is placed in the terminal state of ERROR. The error type field for a job request indicates whether the error is a business or system error. Job requests that fail with a system error can be retried automatically if they are properly configured. Job requests that fail with a business error cannot be retried.

17.2 Indicating Errors

You can indicate business and system errors using specific error statuses or exit codes for each job type.

For more information about using exit codes, see the following sections:

17.2.1 How to Indicate a Business Error

Table 17-1 shows the code used to indicate a business error for each job type. For a business error, the job request state is set to ERROR, the error type to Business and the cause to PROCESS_ERROR. For the Java jobs, the table lists different stages in running a job along with a business error indication for each.

Table 17-1 Indicating a Business Error

Job Type or Job Stage	Business Error Indication
`Executable.execute` (Java job)	Throw `ExecutionBizErrorException` (extends `ExecutionErrorException`).
Asynchronous Java job (initiated from AsyncJava)	Send `AsyncStatus.BIZ_ERROR`.
`Updatable.onEvent`	Return `AsyncStatus.BIZ_ERROR` in the `UpdateAction`.
`CJobType`	Return `FDP_BIZERR` using `afpend()` API.
`PlSqlJobType`	Return `retcode = '3'`.
`SqlPlusJobType`	Set `FND_JOB.BIZERR_V` using `FND_JOB.SET_SQLPLUS_STATUS` API.
`PerlJobType`	Return exit code of 3.
`HostJobType`	Return exit code of 3.

17.2.2 How to Indicate a System Error

A system error results from an unhandled exception and may also be explicitly indicated by the job, as shown in Table 17-2. For a system error, the request state is set to ERROR and the error type to System. For the Java jobs, the table lists different stages in running a job along with a system error indication for each.

Table 17-2 Indicating System Errors

Job Type or Job Stage	System Error Indication
`Executable.execute` (Java job)	Throw `ExecutionErrorException`.
Asynchronous Java job (initiated from AsyncJava)	Send `AsyncStatus.ERROR`.
`Updatable.onEvent`	Return `AsyncStatus.ERROR` in the `UpdateAction`.
`CJobType`	Return `FDP_ERROR` using `afpend()` API.
`PlSqlJobType`	Return `retcode = '2'`.
`SqlPlusJobType`	Set `FND_JOB.FAILURE_V` using `FND_JOB.SET_SQLPLUS_STATUS` API.
`PerlJobType`	Return an exit code of 1.
`HostJobType`	Return an exit code of 1.
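
Because `ExecutionBizErrorException` extends `ExecutionErrorException`, code that handles both must catch the business exception first, or the system error handler will catch it too. The following is a minimal sketch of that ordering, not Oracle Enterprise Scheduler code. The two exception classes are local stand-ins declared inside the example (the constructors and package of the real classes are not shown in this chapter), and `runJob`, `classify`, and the returned strings are hypothetical names used only for illustration.

```java
/** Illustrative only: local stand-ins, not the Oracle Enterprise Scheduler classes. */
final class ErrorIndicationSketch {

    /** Stand-in for the system error exception named in Table 17-2. */
    static class ExecutionErrorException extends Exception {
        ExecutionErrorException(String message) { super(message); }
    }

    /** Stand-in for the business error exception named in Table 17-1. */
    static class ExecutionBizErrorException extends ExecutionErrorException {
        ExecutionBizErrorException(String message) { super(message); }
    }

    /** Hypothetical job body: a setup problem is a business error, a technical failure is a system error. */
    static void runJob(boolean setupComplete, boolean storageAvailable)
            throws ExecutionErrorException {
        if (!setupComplete) {
            throw new ExecutionBizErrorException("Required application setup is missing");
        }
        if (!storageAvailable) {
            throw new ExecutionErrorException("Cannot write results");
        }
    }

    /** Classifies the outcome; the subclass is caught before its superclass. */
    static String classify(boolean setupComplete, boolean storageAvailable) {
        try {
            runJob(setupComplete, storageAvailable);
            return "SUCCEEDED";
        } catch (ExecutionBizErrorException e) {
            return "BUSINESS_ERROR";
        } catch (ExecutionErrorException e) {
            return "SYSTEM_ERROR";
        }
    }

    public static void main(String[] args) {
        System.out.println(classify(true, true));   // prints SUCCEEDED
        System.out.println(classify(false, true));  // prints BUSINESS_ERROR
        System.out.println(classify(true, false));  // prints SYSTEM_ERROR
    }
}
```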

17.3 Configuring Retries for a Job Request

Job requests that fail as a result of a system error can be retried, meaning they can be configured to automatically re-run from the pre-process stage.

Oracle Enterprise Scheduler uses an increasing delay algorithm to improve the chances that the system error will have been resolved when the request is retried. During the delay, the request is placed in WAIT state. On the first system error, the delay is 1 minute; on the second, 2 minutes; on the third, 5 minutes; on the fourth system error and greater, the delay is 10 minutes. For example, suppose a job request fails with a system error three times before it is successful. The job request is delayed a total of 8 minutes (1+2+5).
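
The delay schedule described above can be expressed as a small lookup. The following is a minimal sketch of the arithmetic only, not Oracle Enterprise Scheduler code. The class and method names (`RetryDelaySketch`, `delayMinutes`, `totalDelayMinutes`) are hypothetical.

```java
/** Illustrative only: models the documented retry delay schedule. */
final class RetryDelaySketch {

    /** Delay in minutes applied after the given system error (1-based count). */
    static int delayMinutes(int systemErrorNumber) {
        if (systemErrorNumber < 1) {
            throw new IllegalArgumentException("systemErrorNumber must be >= 1");
        }
        switch (systemErrorNumber) {
            case 1:  return 1;
            case 2:  return 2;
            case 3:  return 5;
            default: return 10; // fourth system error and greater
        }
    }

    /** Total minutes spent waiting across the given number of system errors. */
    static int totalDelayMinutes(int systemErrorCount) {
        int total = 0;
        for (int i = 1; i <= systemErrorCount; i++) {
            total += delayMinutes(i);
        }
        return total;
    }

    public static void main(String[] args) {
        // Three system errors before success: 1 + 2 + 5 = 8 minutes.
        System.out.println(totalDelayMinutes(3)); // prints 8
        // Five system errors: 1 + 2 + 5 + 10 + 10 = 28 minutes.
        System.out.println(totalDelayMinutes(5)); // prints 28
    }
}
```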

When a job request fails, resources such as incompatibility locks are released, and the job request goes back to the wait queue. Incompatibility locks are released only for the job request being retried and not for any parent request that is still active.

The job may have already completed some of its processing when the error occurs. On retry, the job must be able to continue where it left off, from the point of error. In other words, it must be an idempotent job. Idempotent jobs can be configured so that the job request is automatically retried in case of a system error.

Note:

Configure retries only for idempotent jobs.
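
One common way to make a job safe to retry is to record a checkpoint after each unit of work and resume from it on the next attempt. The following is a minimal sketch of that idea under stated assumptions, not a prescribed Oracle Enterprise Scheduler pattern. The class `IdempotentJobSketch`, its fields, and the simulated failure are hypothetical, and the in-memory checkpoint stands in for state that a real job would persist durably.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: a job that resumes from a saved checkpoint when retried. */
final class IdempotentJobSketch {

    /** Stand-in for durable state; a real job would persist this with its data changes. */
    private int nextIndex = 0;
    private final List<String> processed = new ArrayList<>();

    /**
     * Processes items starting at the checkpoint. If failAtIndex is non-negative,
     * a simulated system error is raised before that item is processed.
     */
    void run(List<String> items, int failAtIndex) {
        for (int i = nextIndex; i < items.size(); i++) {
            if (i == failAtIndex) {
                throw new IllegalStateException("Simulated system error at item " + i);
            }
            processed.add(items.get(i));
            nextIndex = i + 1; // advance the checkpoint only after the item is done
        }
    }

    List<String> processed() {
        return processed;
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("A", "B", "C", "D");
        IdempotentJobSketch job = new IdempotentJobSketch();
        try {
            job.run(items, 2); // first attempt fails before "C"
        } catch (IllegalStateException e) {
            // A real job would report this as a system error so that it can be retried.
        }
        job.run(items, -1); // the retry resumes at "C" and does not repeat "A" or "B"
        System.out.println(job.processed()); // prints [A, B, C, D]
    }
}
```

A job that restarted from the first item on retry would process "A" and "B" twice, which is the kind of unpredictable result the note above warns about.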

17.3.1 How to Configure Retries for a Job Request

The system property SYS_retries enables configuring the maximum number of times a failed job request can be retried.

To configure retries for a job request:

In JDeveloper, edit the job definition.
Using the system property SYS_retries, enter the number of times the job request is to be automatically retried. A value of zero indicates that the job request will not be retried. The property SYS_retries has a default value of zero, and can only be defined for idempotent jobs.

Note:
Job requests that fail with a business error are never automatically retried. Oracle Enterprise Scheduler ignores the SYS_retries parameter in such cases.

For more information about configuring properties for a job request, see Chapter 9, "How to Create a Job Definition."
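
The retry rules in this section reduce to a simple decision: business errors are never retried, and system errors are retried until the configured maximum is reached. The following is a minimal sketch of that decision, not the scheduler's implementation. `RetryPolicySketch`, `ErrorKind`, and `shouldRetry` are hypothetical names, and the `sysRetries` parameter stands for the value of the SYS_retries property.

```java
/** Illustrative only: models when a failed request is eligible for automatic retry. */
final class RetryPolicySketch {

    /** Stand-in for the error type of a request that ended in error. */
    enum ErrorKind { SYSTEM, BUSINESS }

    /**
     * @param kind         the type of error the request reported
     * @param retriedCount how many automatic retries have already happened
     * @param sysRetries   the configured SYS_retries value (0 means never retry)
     */
    static boolean shouldRetry(ErrorKind kind, int retriedCount, int sysRetries) {
        if (kind == ErrorKind.BUSINESS) {
            return false; // SYS_retries is ignored for business errors
        }
        return retriedCount < sysRetries;
    }

    public static void main(String[] args) {
        System.out.println(shouldRetry(ErrorKind.SYSTEM, 0, 3));   // prints true
        System.out.println(shouldRetry(ErrorKind.SYSTEM, 3, 3));   // prints false
        System.out.println(shouldRetry(ErrorKind.SYSTEM, 0, 0));   // prints false
        System.out.println(shouldRetry(ErrorKind.BUSINESS, 0, 3)); // prints false
    }
}
```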

17.3.2 What Happens at Run Time: How a Job Request Is Retried

The behavior of retried job requests differs depending on the type of job request.

Job set retry: Job sets cannot be retried. However, the steps of a job set can be retried, provided the steps themselves are job definitions. When a job set step throws a system error, Oracle Enterprise Scheduler retries the step if the job definition associated with the step is configured for retry. When retrying a step, the incompatibility locks for the step request are released, while incompatibility locks for parent job sets continue to be held. This means that the incompatibility locks for parent job sets are held across retries of a job set step. The state of the job set is unaffected by the state of the step until the step reaches a non-error terminal state or all retries for the step have been exhausted.

For serial job sets, all retries are completed for a step before any link is followed. If a job set step defines both ON_SUCCESS and ON_ERROR links, the ON_ERROR link is not followed until all retries have been exhausted and the step has reached a terminal state of ERROR.
Sub-request retry: Sub-requests can be retried. When a sub-request throws a system error, Oracle Enterprise Scheduler retries the sub-request as many times as specified by the retry configuration for the sub-request. The parent request remains in PAUSED state until the sub-request reaches a non-error terminal state or all retries for the sub-request have been exhausted. Neither sub-request execution nor retry affects the incompatibility locks of the parent job request, meaning the parent holds its incompatibilities across sub-request retries.
Recurring job request retry: A submitted recurring job request cannot be retried. However, each recurring instance can itself be retried. For example, suppose the job definition for a recurring request has SYS_retries set to 3. Each instance of the recurrence that fails with a system error can be retried up to 3 times.

17.3.3 What You Should Know about Configuring Retries for a Job Request

Following is a list of recommendations for configuring retries for a job request.

To minimize the amount of time and effort required to recover from a job failure, it is advisable to develop most jobs as idempotent jobs (able to continue from the point of failure when retried). Thus, if the same job request executes again after it previously failed, the job code ensures that the retry is handled properly. If a job is idempotent, it can be configured to automatically retry when encountering system errors. This is especially important for long-running jobs, where recovery would otherwise involve manually rolling back changes and restarting the job from the beginning.
If the job is idempotent, set SYS_retries to a positive number so that the job can be automatically retried in case of system error.
If the job is not idempotent, do not set SYS_retries. This prevents the job from being run twice with unpredictable results.
When defining a job set, make sure the ERROR branch connects to a job set step that does not depend on the successful completion of the previous step.
When developing parent and sub-requests, use the APIs described in Section 17.4, "Finding and Diagnosing Job Requests in Error State" in the parent request to determine the outcome of the sub-request. The state of the sub-request determines what to do next in the context of the parent request. The APIs enable the parent request to retrieve the state of the sub-request and determine whether any errors that have occurred in the sub-request are business or system errors.

17.4 Finding and Diagnosing Job Requests in Error State

You can use APIs to determine the following:

The state of a job request,
Which job requests have ended in error,
The number of times a job request has been retried.

Alternatively, you can use Fusion Applications Control to search for job requests that have ended in error. For more information, see the section "Managing Logging for Oracle Enterprise Scheduler" in the chapter "Managing Oracle Enterprise Scheduler Service and Jobs" in the Oracle Fusion Applications Administrator's Guide. You can also use an Oracle ADF UI to view logging information for Oracle Enterprise Scheduler jobs. For more information, see Section 9.17.3, "How to Log Scheduled Job Requests in an Oracle ADF UI."

17.4.1 Retrieving the State of a Job Request

Use the RuntimeService.getRequestDetailBasic API to retrieve the job request state. If the job request is in error state, retrieve the ErrorType of the job request to determine the type of terminal error that occurred. Example 17-1 shows sample code illustrating the use of the API.

Example 17-1 Retrieving the State of a Job Request

RequestDetail detail = runtime.getRequestDetailBasic(handle, requestId);
State state = detail.getState();
 
if (state == State.ERROR) {
    ErrorType errorType = detail.getErrorType();
    if (errorType == ErrorType.System) {
        // The job request had a system error.
    } else if (errorType == ErrorType.Business) {
        // The job request had a business error.
    }
}

For PL/SQL job requests, use the get_error_type API to determine the type of terminal error that has occurred. Example 17-2 shows sample code illustrating the use of the API.

Example 17-2 Retrieving the State of a PL/SQL Job Request

v_req_state       integer := null;
v_error_type      integer := null;
 
v_req_state := ess_runtime.get_request_state(v_request_id);
if v_req_state = ERROR_STATE then
    v_error_type := ess_runtime.get_error_type(v_request_id);
    if v_error_type = ETYPE_SYSTEM then
        -- The job request had a system error.
    elsif v_error_type = ETYPE_BUSINESS then
        -- The job request had a business error.
    end if;
end if;

17.4.2 Finding Job Requests with Business Errors

Use the RuntimeService.queryRequests API and include a match for the error state and ErrorType of business. Example 17-3 shows sample code illustrating the use of the API.

Example 17-3 Finding Job Requests with Business Errors

Filter filter = new Filter(
        RuntimeService.QueryField.STATE.fieldName(),
        Filter.Comparator.EQUALS,
        new Integer(State.ERROR.value()));
filter = filter.and(
        RuntimeService.QueryField.ERROR_TYPE.fieldName(),
        Filter.Comparator.EQUALS,
        new Integer(ErrorType.Business.value()));
Enumeration requests = runtime.queryRequests(handle, filter, null, false);
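
The query result can then be drained with the standard `java.util.Enumeration` methods. The following is a minimal sketch that runs without the scheduler runtime. It assumes that each element of the enumeration is a request ID of type `Long`, which this chapter does not state. `QueryResultSketch` and `collectRequestIds` are hypothetical names, and the sample IDs are made up for the demonstration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

/** Illustrative only: draining a query result without the scheduler runtime. */
final class QueryResultSketch {

    /** Copies every request ID from the enumeration into a list (assumes Long elements). */
    static List<Long> collectRequestIds(Enumeration<?> requests) {
        List<Long> ids = new ArrayList<>();
        while (requests.hasMoreElements()) {
            ids.add((Long) requests.nextElement());
        }
        return ids;
    }

    public static void main(String[] args) {
        // Stand-in for the value returned by runtime.queryRequests(...).
        Enumeration<Long> requests =
                Collections.enumeration(Arrays.asList(101L, 102L, 103L));
        System.out.println(collectRequestIds(requests)); // prints [101, 102, 103]
    }
}
```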

17.4.3 Determining the Number of Times a Job Request Has Been Retried

Use the RuntimeService.getRequestDetailBasic API to retrieve the job request retry count. The retry count is the number of times Oracle Enterprise Scheduler has automatically retried the job request due to a system error. Example 17-4 shows sample code illustrating the use of the API.

Example 17-4 Determining the Number of Times a Job Request Has Been Retried

RequestDetail detail = runtime.getRequestDetailBasic(handle, requestId);
int retriedCount = detail.getRetriedCount();
if (retriedCount > 0) {
    // The job request has been retried the number of times indicated by
    // retriedCount.
} else {
    // The job request has not been retried.
}

For PL/SQL job requests, use the get_retried_count API to determine the number of times Oracle Enterprise Scheduler has automatically retried the job request. Example 17-5 shows sample code illustrating the use of the API.

Example 17-5 Determining the Number of Times a PL/SQL Job Request Has Been Retried

v_rcount      integer := null;
 
v_rcount := ess_runtime.get_retried_count(v_request_id);
if v_rcount > 0 then
    -- The job request has been retried the number of times indicated by v_rcount.
else
    -- The job request has not been retried.
end if;