Chapter 8 Understanding Jobs, Events and Errors

A single operation often consists of multiple jobs, as in the following examples:

Table 8.1 Job Examples

Operation                          Jobs
--------------------------------   -------------------------------------------
Creating a clustered server pool   Create a server pool
                                   Create a cluster heartbeat device
                                   Create a cluster
Creating a virtual machine         Create a virtual machine
                                   Add the virtual machine to a server pool
                                   Create a virtual disk
                                   Add the virtual disk to the virtual machine
                                   Create a VNIC on the virtual machine

Oracle VM uses a job operation framework that supports a flexible approach to the configuration of physical and virtual objects. Oracle VM Manager maintains an accurate and consistent view of the virtualization environment while users perform separate and simultaneous jobs. A configuration change on any individual object type within Oracle VM Manager is considered to be a job. Because Oracle VM allows highly granular configuration of each object type within the environment, some configuration changes may affect more than one object type within Oracle VM Manager. As a result, some configuration actions, such as the creation of a clustered server pool, may consist of many consecutive jobs, as each object and the relationships between objects are created, updated or removed within the Oracle VM Manager database.

A job begins when you make any change in Oracle VM Manager. Job operations can be comparatively minor actions, such as renaming a virtual machine, or have a wider scope, such as the creation of a new network or storage device. Performing any of these actions changes the configuration of Oracle VM Manager. Each change you make appears as a discrete operation in the Job Summary pane on the Jobs tab. When a new job starts, information about it is displayed in the Job Summary pane at the bottom of the management pane so that you can follow its progress.

Some jobs are also triggered internally within Oracle VM Manager: by particular events, by internally scheduled recurring job operations, or as child jobs spawned as part of a larger job operation initiated by a user. Oracle VM Manager makes no distinction between a job that results from a user action and one that results from an internal event.

As of Release 3.3, Oracle VM Manager handles jobs asynchronously. This means that when an operation using the Oracle VM Manager Web Interface or Oracle VM Manager Command Line Interface triggers one or more jobs, an error may not be returned immediately if a job fails. It is important to monitor job progress when you perform an operation to check that all jobs complete successfully.
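Because a failed job may not return an error immediately, a script that drives Oracle VM Manager needs to poll each job until it reaches a final state. The following is a minimal sketch of that pattern, not an Oracle VM API: `wait_for_job`, the `fetch_state` callable, the timeout and interval defaults, and the simulated job are all hypothetical, and only the state names are taken from Table 8.2.

```python
import time
from typing import Callable

# Final states from Table 8.2; any other state means the job is still active.
TERMINAL_STATES = {"Completed", "Failed", "Aborted"}


def wait_for_job(fetch_state: Callable[[], str],
                 timeout: float = 300.0,
                 interval: float = 5.0) -> str:
    """Poll a job until it reaches a final state, and return that state.

    fetch_state is a hypothetical callable that returns the job's current
    state as a string. In practice it would wrap whichever Oracle VM Manager
    interface you use to read job status.
    """
    deadline = time.monotonic() + timeout
    while True:
        state = fetch_state()
        if state in TERMINAL_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {state!r} after {timeout} seconds")
        time.sleep(interval)


if __name__ == "__main__":
    # Simulated job: reports In Progress twice, then Failed.
    simulated = iter(["In Progress", "In Progress", "Failed"])
    final = wait_for_job(lambda: next(simulated), timeout=10, interval=0.01)
    if final != "Completed":
        print(f"Job ended in state {final}; check the job details for the cause.")
```

Treating Child Queued and Child Running as active states keeps a script waiting while a parent job's children finish, and the timeout stops it from waiting forever on a job that hangs.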

8.1 What are Job States?

A job listed in Oracle VM Manager can have any of the states defined in Table 8.2, “Job States”.

Table 8.2 Job States

Job State       Description
--------------  ------------------------------------------------------------
Completed       The job has finished successfully.
In Progress     The job is currently underway.
Aborted         The job has been aborted. Oracle VM Manager attempts to
                revert to the previous state.
Failed          The job has not finished successfully. Oracle VM Manager has
                reverted to the previous state.
Child Queued    The job has spawned a child job. This child job is in queue
                but is not running yet. The parent job is waiting for all
                child jobs to complete.
Child Running   The job has spawned a child job. This child job is currently
                running. The parent job is waiting for all child jobs to
                complete.

Some job operations, such as renaming an object, complete quickly. Others, such as adjusting the memory used by a virtual machine, take longer. Monitoring job states allows you to understand how a job is progressing.

If a job is running or fails to complete, you can abort the job to cancel it. For example, a virtual machine or Oracle VM Server may be unresponsive and fail to respond to a start or stop request. The appropriate action is to abort the job. When a job is canceled in this way, its state changes to Aborted. However, once a job has triggered an action, Oracle VM Manager may be unable to roll back to the previous state. In that case the abort succeeds and the job changes state to Aborted, but the action that the job triggered may still complete. For example, if you attempt to stop a virtual machine and then abort the stop job, Oracle VM Manager may already have sent the shutdown message to the operating system within the virtual machine. The virtual machine continues its shutdown process, so the action effectively succeeds even though the job is Aborted. See Abort Jobs in the Oracle VM Manager User's Guide for information on aborting jobs.

Jobs may hang or remain In Progress when a virtual machine is started or stopped. A paravirtual machine (PVM) may become unresponsive for a variety of reasons and consequently fail to respond to a start or stop request. The appropriate action in this case is to abort the job. For example, starting a PVM virtual machine that uses a PXE-type boot with an invalid network URL causes the virtual machine status to remain In Progress indefinitely. To resolve this, abort the virtual machine start job, then edit the virtual machine and provide the correct URL.

If a job has a Failed state, more information is available for the job, usually indicating the reason for failure. This information is accessible using any of the interfaces provided for Oracle VM Manager.

Some jobs may spawn child jobs. In this case, the parent job will have the status Child Queued or Child Running. The parent job must wait until all spawned child jobs are completed before it is able to report its status. This is typical in situations where multiple actions must be performed on different objects within Oracle VM Manager. For instance, when presenting a repository to multiple servers, a parent job is created to handle the action and report on its overall status, but subsequent child jobs are spawned to actually carry out the task on each server.

If a job seems to hang in the Child Queued state for an extended period of time, check that your server is reachable. Oracle VM Manager continues to try to execute a job even if the server is unreachable, and does not stop attempting to run the job until the job has completed or is aborted. This can lock the job queue so that subsequent actions within Oracle VM Manager are not added to it. If this happens, you may need to abort the job that is hanging in this state to free up the job queue.
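A monitoring script can surface jobs that have hung in the Child Queued state, so that an administrator knows which servers to check. This sketch assumes a hypothetical `JobRecord` holding the job's current state and the time it entered that state; the `stuck_child_queued` helper and its 30-minute threshold are arbitrary example choices, not Oracle VM defaults.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class JobRecord:
    """Hypothetical job record; real fields depend on the interface used."""
    job_id: str
    state: str              # one of the states in Table 8.2
    state_since: datetime   # when the job entered its current state


def stuck_child_queued(jobs: List[JobRecord], now: datetime,
                       limit: timedelta = timedelta(minutes=30)) -> List[str]:
    """Return the IDs of jobs that have stayed in Child Queued longer than limit."""
    return [job.job_id for job in jobs
            if job.state == "Child Queued" and now - job.state_since > limit]


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0)
    jobs = [
        JobRecord("job-1", "Child Queued", now - timedelta(hours=2)),
        JobRecord("job-2", "Child Queued", now - timedelta(minutes=5)),
        JobRecord("job-3", "In Progress", now - timedelta(hours=3)),
    ]
    for job_id in stuck_child_queued(jobs, now):
        print(f"{job_id}: check server reachability, then consider aborting")
```

The sketch only reports candidates rather than aborting them, in line with the guidance above: confirm whether the server is reachable first, and abort only a job that is genuinely hung.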

8.2 How are Failed Jobs Handled?

Job operations are validated by Oracle VM Manager as they are added to the Jobs tab. The failure of any job operation causes the following to happen:

  • The failed job is canceled.

  • Remaining jobs are attempted.
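The two rules above can be modelled in a few lines. This is a conceptual sketch, not Oracle VM Manager's internal scheduler: jobs are represented as hypothetical (description, action) pairs, an action signals failure by raising an exception, and `run_jobs` is an illustrative name.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical representation: each job is a (description, action) pair,
# and an action signals failure by raising an exception.
Job = Tuple[str, Callable[[], None]]


def run_jobs(jobs: List[Job]) -> Dict[str, str]:
    """Run jobs in order. A job that raises is recorded as Failed and
    the remaining jobs are still attempted."""
    results: Dict[str, str] = {}
    for description, action in jobs:
        try:
            action()
            results[description] = "Completed"
        except Exception:
            # Cancel only this job, then move on to the next one.
            results[description] = "Failed"
    return results


if __name__ == "__main__":
    def create_disk() -> None:
        raise RuntimeError("repository is not reachable")

    print(run_jobs([
        ("Create a virtual machine", lambda: None),
        ("Create a virtual disk", create_disk),
        ("Create a VNIC on the virtual machine", lambda: None),
    ]))
```

In the example the virtual disk job is recorded as Failed while the jobs before and after it are recorded as Completed, which is why the outcome of an operation has to be read from the individual jobs.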

In previous Oracle VM releases (before Release 3.3.x), multiple operations could be combined into a single job. If the job failed, operations belonging to the job that had not yet been executed could be canceled, and changes to the database could be rolled back to the consistent state that existed before the job was triggered. Sometimes rollback was ineffective and manual cleanup was required. Because the operations were bundled into a single job, the actual status of each operation in the job was not transparent, which made it difficult to determine what actions were needed to remedy the situation.

From Release 3.3.x onwards, operations are no longer bundled in this way. The current architecture treats each operation as a single API call, so each job performs a single action. This makes each action more transparent: when a job fails, the state of the environment is much clearer and it is easier to determine what caused the failure. Although a job can spawn child jobs, each job is still a discrete operation. As a result, if an operation fails, you must look through the job list to see which jobs failed and which succeeded. When you find a failed job, attempt to remedy the error or fault, then retry the job. If a failed job leaves the Oracle VM environment in an inconsistent state, you must clean it up using the Oracle VM Manager Web Interface or, if this cannot be done, using the Oracle VM Manager Command Line Interface.
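When reviewing the job list after an operation, it helps to separate the failed jobs from the completed ones before deciding what to remedy and retry. The sketch below does this for a hypothetical list of (description, state) pairs; `group_by_state` is an illustrative helper, and how you obtain the job list from Oracle VM Manager is outside its scope.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_state(jobs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (job description, state) pairs by state, preserving job order."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for description, state in jobs:
        grouped[state].append(description)
    return dict(grouped)


if __name__ == "__main__":
    # Hypothetical outcome of the "Creating a virtual machine" operation
    # from Table 8.1.
    job_list = [
        ("Create a virtual machine", "Completed"),
        ("Add the virtual machine to a server pool", "Completed"),
        ("Create a virtual disk", "Failed"),
        ("Add the virtual disk to the virtual machine", "Failed"),
        ("Create a VNIC on the virtual machine", "Completed"),
    ]
    for description in group_by_state(job_list).get("Failed", []):
        print(f"Remedy the fault, then retry: {description}")
```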

8.3 What are Recurring Jobs?

Oracle VM Manager periodically performs recurring jobs, such as checking server update repositories for any available software updates for Oracle VM Servers. These jobs and the settings available for them can be configured within the Oracle VM Manager Web Interface. The recurring jobs are listed in Recurring in the Oracle VM Manager User's Guide.

Some periodic tasks are performed internally and do not include any user configurable parameters. These tasks are not listed among the recurring jobs exposed via the Oracle VM Manager Web Interface.

8.4 Acknowledging Events/Errors

When a job operation fails, one or more events may be generated. These events can be viewed through any of the interfaces to Oracle VM Manager, and the affected objects are flagged with yellow or red icons in the navigation tree of the user interface. If an object has an error event associated with it, you must resolve the issue that caused the event and then acknowledge the event to clear the error and notify Oracle VM Manager that the object is able to resume normal operations. For example, an error event can occur if an Oracle VM Server or virtual machine is stopped as a result of an error. Events may also be generated if a network goes down because of a hardware issue affecting either a switch or a network interface card. Oracle VM Servers, virtual machines, repositories and storage objects can have error events associated with them.

Note that an event is often generated as the result of a problem outside of Oracle VM Manager. Acknowledge an event only when you are certain that the problem that initially triggered it has been resolved. If you acknowledge an event and the problem has not been resolved, Oracle VM Manager may continue with any remaining jobs that were blocked by the event. This can make it more difficult to roll back the environment and may result in further errors.

To acknowledge events, use the object's Events perspective. See Events Perspective in the Oracle VM Manager User's Guide.
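The acknowledgement rules in this section can be pictured with a small model in which an object stays blocked while it has an unacknowledged error event. Everything here is hypothetical and illustrative: the `ManagedObject` and `Event` classes, the "warning" and "error" severity labels, and the assumption that only error events block further jobs are not part of any Oracle VM interface.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Event:
    description: str
    severity: str              # assumed labels: "warning" or "error"
    acknowledged: bool = False


@dataclass
class ManagedObject:
    """Hypothetical stand-in for an Oracle VM Server, virtual machine,
    repository or storage object."""
    name: str
    events: List[Event] = field(default_factory=list)

    def blocked(self) -> bool:
        # Assumption: any unacknowledged error event holds back further jobs.
        return any(e.severity == "error" and not e.acknowledged
                   for e in self.events)

    def acknowledge_errors(self) -> None:
        # Acknowledging clears the error state. Nothing in this model checks
        # that the underlying problem was fixed, so call it only afterwards.
        for e in self.events:
            if e.severity == "error":
                e.acknowledged = True


if __name__ == "__main__":
    vm = ManagedObject("vm01", [Event("Stopped as a result of an error", "error")])
    print(vm.blocked())   # True until the error event is acknowledged
    vm.acknowledge_errors()
    print(vm.blocked())
```

Because acknowledging is what releases the blocked work, the model makes the risk concrete: acknowledging before the fault is fixed lets queued jobs run against an object that is still broken.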