10 Utilizing the Job System and Corrective Actions

The Enterprise Manager Cloud Control Job System can automate routine administrative tasks and synchronize components in your environment so you can manage them more efficiently.

This chapter explains how to use the Job System in the following sections:

Job System Purpose and Overview

The Enterprise Manager Job System serves these purposes:

  • Automates many administrative tasks, for example, backup, cloning, and patching

  • Enables you to create your own jobs using your own custom OS and SQL scripts

  • Enables you to create multi-task jobs that combine several tasks

  • Centralizes job scheduling for your environment in one tool

A job is a unit of work that you define to automate commonly run tasks. Scheduling flexibility is one of the advantages of jobs: you can schedule a job to start immediately or at a later date and time, and you can run it once or at a specific interval, such as three times every month.

The Job Activity page is the hub of the Job System. From this page, you can:

  • Search for existing job runs and job executions filtered by name, owner, status, scheduled start, job type, target type, and target name

  • Create a job

  • View or edit the job definition

  • Create like, copy to library, suspend, resume, stop, and delete a job

  • View results, edit, create like, suspend, resume, retry, stop, and delete a job run or execution

Figure 10-1 Job Activity Page (screen capture showing a single job)

Besides opening the Job Activity page from the Enterprise menu, you can open it from the menu of any target by selecting Job Activity. When you open the page this way, it shows only the jobs associated with that target rather than the entire list of jobs.

Changing Job Activity Summary Table Views

In addition to listing job executions in a conventional tabular format, Enterprise Manager lets you display job executions in alternate views that can make the data more meaningful, for example, by showing which jobs use the most resources or take the longest to run.

There are three job activity display options:

  • List: Conventional tabular display of job information.

  • Summary: Displays a graphical rollup of job runs based on selected attributes. Clicking any cell in the rollup view further refines the search and takes you to the tabular view, where you can analyze job runs in more detail.

Job Searches

For convenience, you can define job searches that give you quick access to the jobs you check most often. By default, the following predefined job searches appear at the top of the Job Activity page:

  • Jobs with problems in the Last 24 Hours

  • Job Run activity in the last 24 hours

  • Jobs scheduled to run in the next 24 hours

  • Jobs belonging to the currently logged in user

Saving Job Searches

Saving commonly used job searches allows you to view pertinent job information quickly.

To create a saved search:

  1. From the Available Criteria region, choose the requisite parameters for your search. The results display immediately in the Job Activity table.

  2. Click the Save Search button. The Create Saved Search dialog displays.

  3. Enter a name for your search.

  4. Choose how you want your saved search displayed:

    Show this search when I come to this page (available from the Saved Searches drop-down list)

    OR

    Show on top of the page (available as one of the summary boxes at the top of the page)

  5. Click OK to save your search.

  6. Click Run to run your search and view the results.

Editing Saved Job Searches

To edit a saved job search:

  1. From the Saved Searches menu, select Manage Saved Searches. The Manage Saved Searches dialog displays.

  2. Click the Edit icon (pencil). The Edit Saved Search dialog displays. Note: In addition to editing the search criteria, you can make the search the Default, which runs it whenever you access the Job Activity page, or have it appear at the top of the Job Activity page.

  3. Click OK to save your Job search changes.

  4. Click Done to close the Manage Saved Searches dialog.

Importing/Exporting Saved Job Searches

By default, you can see only the searches that you have created. To use searches that other administrators have created, import them. Conversely, to make your job searches available to other administrators, export them. Imported job searches appear in your list of saved searches.

What Are Job Executions and Job Runs?

The following sections explain the characteristics of each of these.

Job Executions

Job executions are usually associated with one target, such as a patch job on a particular database. These are called single-target jobs because each execution has only one target. However, job executions are not always a one-to-one mapping to a target. Some executions have multiple targets, such as comparing hosts. These jobs are called single-execution jobs, since there is only one execution for all the targets. When a job is run against multiple targets, it runs in one or many executions depending on whether it is a single-execution or single-target job. A few jobs have no target. These jobs are called targetless jobs and run in one execution.

When you submit a job to many targets, examining the status of each execution against each target would be tedious. For example, suppose you run a backup job against several databases. A typical question would be: Were all the backups successful, and if not, which ones failed? If this backup job runs every week, you would want to know each week which backups succeeded and which failed.

Job Runs

With the Job System, you can easily get these answers by viewing the job run. A job run is the summary of all job executions of a job that ran on a particular scheduled date. For example, if you have a job scheduled for March 5th, you will have a March 5 job run. The job table that shows the job run provides a roll-up of the status of the executions, such as Succeeded, Failed, or Error.
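The roll-up idea can be sketched in a few lines of POSIX shell. This is a hypothetical illustration only: the `summarize_run` function and the `target:status` input format are assumptions, not part of Enterprise Manager. The sketch simply counts execution statuses and lists the targets whose executions did not succeed, which answers the weekly backup question above. It does not reproduce how Enterprise Manager computes a run's overall status.

```shell
# Hypothetical sketch (POSIX sh): summarize the executions of one job run.
# Each argument is "target:status", for example "db1:Succeeded".
summarize_run() {
  succeeded=0
  failed=0
  other=0
  problem_targets=""
  for execution in "$@"; do
    target=${execution%%:*}        # text before the first colon
    exec_status=${execution#*:}    # text after the first colon
    case $exec_status in
      Succeeded) succeeded=$((succeeded + 1)) ;;
      Failed)    failed=$((failed + 1)) ;;
      *)         other=$((other + 1)) ;;
    esac
    # Remember every target whose execution did not succeed.
    if [ "$exec_status" != "Succeeded" ]; then
      problem_targets="$problem_targets $target"
    fi
  done
  echo "Succeeded=$succeeded Failed=$failed Other=$other"
  echo "Needs attention:${problem_targets:- none}"
}

# Example: one weekly run of a backup job against four databases.
summarize_run db1:Succeeded db2:Failed db3:Succeeded db4:Error
# Succeeded=2 Failed=1 Other=1
# Needs attention: db2 db4
```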

Operations on Job Executions and Job Runs

Besides supporting the standard job operations of create, edit, create like, and delete, the Job System enables you to:

  • Suspend jobs

    You can suspend individual executions or entire jobs. For example, you may need to suspend a job if a needed resource was unavailable, or the job needs to be postponed.

    If a repeating job is suspended past its next scheduled repeat time, or for a maximum of one day, that execution is marked "Skipped." A job is also skipped when the scheduled time plus the grace period has passed.

  • Resume jobs

    After you suspend a job, any scheduled executions do not occur until you decide to resume the job.

  • Retry all failed executions in a job run

    When analyzing individual executions or entire jobs, it is useful to retry a failed execution after you determine the cause of the problem, which removes the need to create a new job for that execution. When you use the Retry operation, Enterprise Manager links the failed execution to the retried execution and vice versa, so you can later examine the causes of the failure. Only the most recent retry is shown in the Job Run page.

With regard to job runs, the Job System enables you to:

  • Delete old job runs

  • Stop job runs

  • Retry all failed executions in a job run. Successful executions are never retried.

See Also:

For more information on job executions and runs, refer to Enterprise Manager Cloud Control online help.

Preliminary Considerations

Before proceeding to the procedures in Creating Jobs, read the topics in the following sections:

Administrator Roles

Enterprise Manager provides the following administrator types:

  • Administrator — Most jobs and other activities should be initiated using this "normal" user type

  • Super Administrator — There may be limited use cases for a super administrator to run jobs, create blackouts, or own targets.

  • Repository owner (SYSMAN) — The special repository owner user SYSMAN should almost never own jobs or perform the tasks listed for the other two types. Reserve this user for top-level actions, such as setting up the site.

Creating Scripts

Besides predefined job tasks, you can define your own job tasks by writing code to be included in OS and SQL scripts. The advantages of using these scripts include:

  • When defining these jobs, you can use target properties.

  • When defining these jobs, you can use the job library, which enables you to share the job and make updates as issues arise. However, you need to resubmit modified library jobs for them to take effect.

  • You can submit the jobs against multiple targets.

  • You can submit the jobs against a group. The job automatically keeps up with changes to group membership.

  • For host command jobs, you can submit to a cluster.

Sharing Job Responsibilities

To let you share job responsibilities, the Job System provides job privileges, which allow you to share a job with other administrators. Using privileges, you can:

  • Grant View access to the administrators who need to see the results of the job.

  • Grant Full access to the administrators who may need to edit the job definition or control the job execution (suspend, resume, stop).

You can grant these privileges on an as-needed basis.

Submitting Jobs for Groups

Rather than listing a large number of targets individually, you can use a group as the target of a job. All member targets in the group that match the selected target type of the job are selected as actual targets of the job when it runs. If the membership of the group changes, the actual target list of the job changes with it. If the job repeats, each iteration (or "run") of the job executes on the matching targets in the group at the time of the run.

Overriding the Target Type Selection

To override the target type selection for a group, set targetType=<override_target_type> in the input file for the create_job verb. For example, the default target type for OSCommand jobs is "host". To submit a job against a group of databases, specify:

     target_list=my_db_group:composite
     targetType=oracle_database

Note that any targets in the group that do not match the target type selected are ignored.
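The following POSIX shell sketch illustrates how a group is narrowed to the job's target type each time the job runs. The `select_targets` function and the `name:type` member format are hypothetical; only the `oracle_database` and `host` type names come from the example above.

```shell
# Hypothetical sketch (POSIX sh): pick the group members a job would run on.
# Arguments: the job's target type, then members written as "name:type".
select_targets() {
  wanted_type=$1
  shift
  selected=""
  for member in "$@"; do
    member_name=${member%%:*}
    member_type=${member#*:}
    # Members whose type differs from the job's target type are ignored.
    if [ "$member_type" = "$wanted_type" ]; then
      selected="$selected $member_name"
    fi
  done
  # Strip the leading space before printing.
  echo "${selected# }"
}

# A database group that also contains one host.
select_targets oracle_database \
  sales_db:oracle_database web01:host hr_db:oracle_database
# sales_db hr_db
```

Because the filter runs against the group's current membership, a database added to the group would be picked up by the next run without resubmitting the job.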

See Also:

Managing Groups

Creating Jobs

Your first task in creating a job from the Job Activity page is to choose a job type, which the next section, Selecting a Job Type, explains. The most typical job types are OS command jobs, script jobs, and multi-task jobs, which are explained in these subsequent sections:

Selecting a Job Type

Using the Job System, you can create a job by clicking Create Job in the Job Activity page and selecting the job type from the Select Job Type dialog. You can find a specific job type either by searching for its name or by specifying a target type.

The most commonly used types are as follows:

  • OS Command — Runs an operating system command or script.

  • SQL Script — Runs a user-defined SQL or PL/SQL script.

  • Multi-Task — Use to specify primary characteristics for multi-task jobs or corrective actions. Multi-task jobs enable you to create composite jobs by defining tasks, with each task functioning as an independent job. You edit and define tasks similarly to a regular job.

Creating an OS Command Job

Use this type of job to run an operating system command or script. The tasks for creating an OS Command job, and the steps within each task, are described below.

Task 1: Initiate Job Creation

  1. From the Enterprise menu, select Job, then Activity.

  2. Click Create Job. The Select Job Type dialog displays.

  3. Choose the OS Command job type and click Select.

Task 2: Specify General Job Information

Perform these steps on the General property page:

  1. Provide a required Name for the job, then select a Target Type from the drop-down.

    After you have selected a target of a particular type for the job, only targets of that same type can be added to the job. If you change target types, the targets you have populated in the Targets table disappear, as well as parameters and credentials for the job.

    If you specify a composite as the target for this job, the job executes only against targets in the composite that are of the selected target type. For example, if you specify a target type of host and a group as the target, the job only executes against the hosts in the group, even if there are other non-host targets in the group. You can also include clusters in the target list if they are of the same base target type. For example, a host cluster would be selected if the target type is "host" and a RAC database would be selected if the target type is database.

  2. Click Add, then select one or more targets from the Search and Select: Targets pop-up window. The targets now appear in the Targets table.

  3. Click the Parameters property page link.

Task 3: Specify Parameters

Perform these steps on the Parameters property page:
  1. Select either Single Operation or Script from the Command Type drop-down.

    The command or script you specify executes against each target specified in the target list for the job. The Management Agent executes it for each of these targets.

    Depending on your objectives, you can choose one of the following options:

    • Single Operation to run a specific command

    • Script to run an OS script and optionally provide an interpreter, which processes the script; for example, %perlbin%/perl or /bin/sh.

    Sometimes, a single command line is insufficient to specify the commands to run, and you may not want to install and update a script on all hosts. In this case, you can use the Script option to specify the script text as part of the job.

  2. Based on your objectives, follow the instructions in Specifying a Single Operation or Specifying a Script.

  3. Click the Credentials property page link.

Note:

The OS Command relies on the target host's shell to execute the command/interpreter specified. On *nix systems, it is /bin/sh -c and on Windows systems, it is cmd /c. The command line specified is interpreted by the corresponding shell.

Task 4: Specify Credentials - (optional)

You do not need to provide input on this page if you have already set preferred credentials and want to use them, which is the system default.

On the Credentials property page, you can specify the credentials that you want the Oracle Management Service to use when it runs the OS Command job against target hosts. The job can use either the job submitter's preferred host credentials for the selected targets, or other credentials that you specify to override the preferred credentials.

Tip:

Preferred credentials are useful when a job is submitted against multiple targets and each target needs different credentials for authentication.

  • To use preferred credentials:

    1. Select the Preferred Credential radio button, which is the default selection.

      If the target for the OS Command job is a host or host group, the preferred host credentials are used. You specify these for the host target on the Preferred Credentials page, and they are different from the host credentials for the host on which the database resides.

    2. Select either Normal Host Credentials or Privileged Host Credentials from the Host Credentials drop-down.

      You specify these separately on the Preferred Credentials page, which you can access by selecting Security from the Setup menu, then Preferred Credentials. The Preferred Credentials page appears, where you can click the Manage Preferred Credentials button to set credentials.

  • To use named credentials:

    1. Select the Named Credential radio button to override database or host preferred credentials.

      The drop-down list contains named credentials, which are credential sets saved under a name. They are not linked to targets, and you can use them to provide credential and authentication information to tasks.

  • To use other credentials:

    1. Select the New Credential radio button to override previously defined preferred credentials.

      Note that override credentials apply to all targets. This applies even for named credentials.

    2. Optionally select Sudo or PowerBroker as the run privilege.

      Sudo enables you to authorize certain users (or groups of users) to run some (or all) commands as root while logging all commands and arguments. PowerBroker provides access control, manageability, and auditing of all types of privileged accounts.

      If you provide Sudo or PowerBroker details, they must be applicable to all targets. It is assumed that Sudo or PowerBroker settings are already applied on all the hosts on which this job is to run.

      See your Super Administrator about setting up these features if they are not currently enabled.

      Tip:

      For information on using Sudo or PowerBroker, refer to the product guides on their respective product documentation pages.

Task 5: Schedule the Job - (optional)

You do not need to provide input on this page if you want to proceed with the system default of running the job immediately after you submit it.
  1. Select the type of schedule:

    • One Time (Immediately)

      If you do not set a schedule before submitting a job, Enterprise Manager executes the job immediately with an indefinite grace period. You may want to run the job immediately but specify a definite grace period in case the job cannot start, for example, because of a blackout.

      A grace period is a period of time that defines the maximum permissible delay when attempting to start a scheduled job. If the job system cannot start the execution within the grace period after the scheduled time, it sets the job status to Skipped.

    • One Time (Later)

      • Setting up a custom schedule:

        You can set up a custom schedule to execute the job at a designated time in the future. When you set the Time Zone for your schedule, the job runs simultaneously on all targets when this time zone reaches the start time you specify. If you select each target's time zone, the job runs at the scheduled time using the time zone of the managed targets. The time zone you select is used consistently when displaying date and time information about the job, such as on the Job Activity page, Job Run page, and Job Execution page.

        For example, suppose you have targets in the Western United States (US Pacific Time) and the Eastern United States (US Eastern Time), and you specify a schedule where Time Zone = US Pacific Time and Start Time = 5:00 p.m. The job runs simultaneously on all targets: at 5:00 p.m. local time on the Western targets and at 8:00 p.m. local time on the Eastern targets. If you instead specify 5:00 p.m. in the Agent time zone, the executions do not run concurrently. Each target runs the job at 5:00 p.m. local time, so the Eastern targets run it 3 hours earlier than the Western targets.

      • Specifying the Grace Period:

        The grace period controls the latest start time for the job in case the job is delayed. A job might not start for many reasons, but the most common reasons are that the Agent was down or there was a blackout. By default, jobs are scheduled with indefinite grace periods.

        A job can start any time before the grace period expires. For example, a job scheduled for 1 p.m. with a grace period of 1 hour can start any time before 2 p.m., but if it has not started by 2 p.m., it is designated as skipped.

    • Repeating

      • Defining the repeat interval:

        Specify the Frequency Type (time unit) and Repeat Every (repeat interval) parameters to define your job's repeat interval.

        The Repeat Until options are as follows:

        • Indefinite: The job runs at the defined repeat interval until it is manually unscheduled.

        • Specified Date: The job runs at the defined repeat interval until the specified date is reached.

  2. Click the Access property page link.
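The grace period and time zone rules in this task reduce to simple arithmetic. The following POSIX shell sketch is hypothetical: `grace_check` and `convert_hour` are illustrative names, times are expressed as minutes since midnight, and the time zone conversion assumes fixed standard-time UTC offsets (UTC-8 for US Pacific, UTC-5 for US Eastern). An indefinite grace period is not modeled.

```shell
# Hypothetical sketch (POSIX sh) of the scheduling arithmetic in Task 5.
# Times are minutes since midnight; offsets are whole hours from UTC.

# Prints Started or Skipped for an execution scheduled at $1 with a grace
# period of $2 minutes that could first start at $3.
grace_check() {
  scheduled=$1
  grace=$2
  first_possible_start=$3
  # The execution must start before scheduled time + grace period.
  if [ "$first_possible_start" -lt $((scheduled + grace)) ]; then
    echo Started
  else
    echo Skipped
  fi
}

# Converts a wall-clock hour at UTC offset $2 to the hour at UTC offset $3.
convert_hour() {
  hour=$1
  from_offset=$2
  to_offset=$3
  # The double modulo keeps the result in the range 0-23.
  echo $(( ((hour - from_offset + to_offset) % 24 + 24) % 24 ))
}

# A job scheduled for 1:00 p.m. with a 1-hour grace period.
grace_check $((13 * 60)) 60 $((13 * 60 + 45))   # Started (1:45 p.m.)
grace_check $((13 * 60)) 60 $((14 * 60))        # Skipped (2:00 p.m.)

# 5:00 p.m. US Pacific time (UTC-8) expressed in US Eastern time (UTC-5).
convert_hour 17 -8 -5                           # 20, that is, 8:00 p.m.
```

Because US Pacific and US Eastern time switch to daylight saving time together, the 3-hour difference in this example holds year-round.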

Task 6: Specify Who Can Access the Job - (optional)

You do not need to provide input on this page if you want to proceed with the system default of not sharing the job. The table shows the access that administrators and roles have to the job. Only the job owner (or Super Administrator) can make changes on the Job Access page.
  1. Change access levels for administrators and roles, or remove administrators and roles. What you can change depends on whether you are the job owner or a Super Administrator.

    If you are a job owner, you can:

    • Change the access of an administrator or role by choosing the Full or View access privilege in the Access Level column in the table.

    • Remove all access to the job for an administrator or role by clicking the icon in the Remove column for the administrator or role. All administrators with Super Administrator privileges have the View access privilege to a job. If you choose to provide access privileges to a role, you can only provide the View access privilege to the role, not the Full access privilege. For private roles, it is possible to grant Full access privileges.

    If you are a Super Administrator, you can:

    • Grant View access to other Enterprise Manager administrators or roles.

    • Revoke all administrator access privileges.

      Note:

      Neither the owner nor a super user can revoke View access from a super user. All super users have View access.

    For more information on access levels, see Access Level Rules.

  2. Click Add to add administrators and roles. The Create Job Add Administrators and Roles page appears.

    1. Specify a Name and Type in the Search section and click Go. If you just click Go without specifying a Name or Type, all administrators and roles in the Management Repository appear in the table.

      The value you specify in the Name field is not case-sensitive. You can specify either * or % as a wildcard character at any location in a string (the wildcard character is implicitly added to the end of any string). For example, if you specify %na in the Name field, names such as ANA, ANA2, and CHRISTINA may be returned as search results in the Results section.

    2. Select one or more administrators or roles in the Results section, then click Select to grant them access to the job. Enterprise Manager returns to the Create Job Access page or the Edit Job Access page, where you can modify the access of administrators and roles.

  3. Define a notification rule.

    You can use the Notification system (rule creation) to easily associate specific jobs with a notification rule. The Cloud Control Notification system enables you to define a notification rule that sends e-mail to the job owner when a job enters one of these chosen states:

    • Scheduled

    • Running

    • Suspended

    • Succeeded

    • Problems

    • Action Required

    Note:

    Before you can specify notifications, you need to set up your email account and notification preferences. See Using Notifications for this information.
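The Name search rules from step 2 (case-insensitive matching, * or % as wildcards, and an implicit trailing wildcard) can be approximated in POSIX shell as follows. `name_matches` is a hypothetical helper. Because it reuses shell pattern matching, the characters ? and [ in a search string would also act as wildcards here, which the Enterprise Manager search may not do.

```shell
# Hypothetical sketch (POSIX sh) of the Name search rules: case-insensitive,
# * or % matches any run of characters, and a trailing wildcard is implied.
name_matches() {
  search=$1
  candidate=$2
  # Append the implicit trailing wildcard, turn % into *, and upper-case.
  glob=$(printf '%s*' "$search" | tr '%' '*' | tr '[:lower:]' '[:upper:]')
  upper=$(printf '%s' "$candidate" | tr '[:lower:]' '[:upper:]')
  # An unquoted variable in a case pattern is matched as a shell pattern.
  case $upper in
    $glob) return 0 ;;
    *) return 1 ;;
  esac
}

for name in ANA ANA2 CHRISTINA JOHN SYSMAN; do
  if name_matches '%na' "$name"; then
    echo "match: $name"
  fi
done
# match: ANA
# match: ANA2
# match: CHRISTINA
```

Without a leading wildcard, the search matches only as a prefix: `chris` matches CHRISTINA, but `na` does not match ANA.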

Task 7: Conclude Job Creation

At this point, you can either submit the job for execution or save it to the job library.
  • Submitting the job

    Click Submit to send the active job to the job system for execution, and then view the job's execution status on the main Job Activity page. If you are creating a library job, Submit saves the job to the library and returns you to the main Job Library page where you can edit or create other library jobs.

    If you submit a job that has problems, such as missing parameters or credentials, an error appears and you will need to correct these issues before submitting an active job. For library jobs, incomplete specifications are allowed, so no error occurs.

    Note:

    If you click Submit without changing the access, only Super Administrators can view your job.

  • Saving the job to the library

    Click Save to Library to save the job to the Job Library, a repository for frequently used jobs. Other administrators can then share and reuse your library job if you provide them with access privileges. As with active jobs, you can grant View or Full access to specific administrators. Additionally, you can use the job library to store:

    • Basic job definitions, to which you add targets and other custom settings before submitting the job.

    • Jobs for your own reuse or to share with others. You can share jobs by granting View or Full access to them.

    • Critical jobs for resubmitting later, or revised versions of these jobs as issues arise.

Specifying a Single Operation

Note:

The following information applies to step 2 of "Task 3: Specify Parameters" in Creating an OS Command Job.

Enter the full command in the Command field. For example:

     /bin/df -k /private

Note the following points about specifying a single operation:

  • You can use shell commands as part of your command. The platform's default shell is used: /bin/sh on Linux and cmd /c on Windows. For example:

         ls -la /tmp > /tmp/foobar.out
    
  • To execute two consecutive shell commands, separate them with a semicolon in the Command field; the shell interprets the whole command line. For example:

         sleep 3; ls
    
  • The job status depends on the exit code returned by the command. If the command execution returns 0, the job returns a status of Succeeded. If it returns any other value, it returns a job status of Failed.
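The following POSIX shell sketch approximates how a single operation is evaluated on a Linux or UNIX host. The command line is passed to /bin/sh -c, as described in the Note under Task 3, and the exit code is mapped to a job status. `run_single_operation` is a hypothetical name, and the sketch does not show the Management Agent's actual implementation.

```shell
# Hypothetical sketch (POSIX sh): approximate how a single operation runs on
# a Linux or UNIX host. The whole command line goes to /bin/sh -c, so shell
# syntax such as ";" and ">" works, and the exit code sets the job status.
run_single_operation() {
  if /bin/sh -c "$1"; then
    echo Succeeded
  else
    echo Failed
  fi
}

run_single_operation 'sleep 1; ls / > /dev/null'         # Succeeded
run_single_operation 'ls /no/such/directory 2>/dev/null' # Failed (non-zero exit)
```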

Specifying a Script

Note:

The following information applies to step 2 of "Task 3: Specify Parameters" in Creating an OS Command Job.

The value you specify in the OS Script field is used as stdin for the command interpreter, which defaults to /bin/sh on Linux and cmd /c on Windows. You can override this with another interpreter, for example, %perlbin%/perl. The script size is limited to 2 GB.

To control the maximum output size, set the mgmt_job_output_size_limit parameter in MGMT_PARAMETERS to the required limit. Values less than 10 KB or greater than 2 GB are ignored. The default limit is 10 MB.

The job status depends on the exit code returned by the last command in the script. If the last command execution returns 0, the job returns a status of Succeeded. If it returns any other value, it returns a job status of Failed. You should implement proper exception handling in the script and return non-zero exit codes when appropriate. This will avoid situations in which the script failed, but the job reports the status as Succeeded.
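To see why the last command matters, the following hypothetical POSIX shell sketch feeds script text to /bin/sh on stdin, mirroring the behavior described above, and maps the interpreter's exit code to a status. `run_script` and the sample file paths are illustrative only.

```shell
# Hypothetical sketch (POSIX sh): feed script text to the interpreter on
# stdin, as the OS Script field does, and map the exit code to a status.
run_script() {
  if printf '%s\n' "$1" | /bin/sh; then
    echo Succeeded
  else
    echo Failed
  fi
}

# The failed copy is hidden: only the final echo's exit code (0) counts.
unsafe='cp /no/such/file /tmp/ 2>/dev/null
echo "copy step finished"'

# Exiting on the failed command makes the job report Failed.
safe='cp /no/such/file /tmp/ 2>/dev/null || exit 1
echo "copy step finished"'

run_script "$unsafe"   # prints "copy step finished", then Succeeded
run_script "$safe"     # Failed
```

Adding `set -e` as the first line of a /bin/sh script is another common way to stop at the first failing command.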

You can run a script in several ways:

  • OS Scripts — Specify the path name to the script in the OS Script field and leave the Interpreter field empty. For example:

         OS Script field:   /path/to/mycommand
         Interpreter field: (empty)

  • List of OS Commands — You do not need to enter anything in the Interpreter field for standard shell commands on Linux or UNIX systems, such as the following example. The default shell, /bin/sh, is used.

         /usr/local/bin/myProg arg1 arg2
         mkdir /home/$USER/mydir
         cp /dir/to/cp/from/file.txt /home/$USER/mydir
         /usr/local/bin/myProg2 /home/$USER/mydir/file.txt
    

    When submitting shell-based jobs, be aware of the syntax you use and the targets you choose. For example, this script does not succeed on Windows (NT) hosts.

  • Scripts Requiring an Interpreter — Although the OS shell is invoked by default, you can bypass the shell by specifying an alternate interpreter. For example, you can run a Perl script by entering the Perl script in the OS Script field and the location of the Perl executable in the Interpreter field:

         OS Script field:   <Enter-Perl-script-commands-here>
         Interpreter field: %perlbin%/perl

    The following example shows how to run a list of commands that rely on a certain shell syntax:

         setenv VAR1 value1
         setenv VAR2 value2
         /usr/local/bin/myProg $VAR1 $VAR2
    

    You would need to specify csh as the interpreter. Depending on your system configuration, you may need to specify the following string in the Interpreter field:

         /bin/csh 
    

    You can also run a script made up of Windows shell commands, as shown in the following example. The default shell, cmd /c, is used on Windows systems.

         C:\programs\MyApp arg1 arg2
         md C:\MyDir
         copy C:\dir1x\copy\from\file.txt C:\MyDir

Access Level Rules

Note:

The following rules apply to Task 6, "Specify Who Can Access the Job - (optional)".

  • Super Administrators always have View access on any job.

  • The Enterprise Manager administrator who owns the job can make any access changes to the job, except revoking View from Super Administrators.

  • Super Administrators with a View or Full access level on a job can grant View (but not Full) to any new user. Super Administrators can also revoke Full and View from normal users, and Full from Super Administrators.

  • Normal Enterprise Manager administrators with Full access levels cannot make any access changes on the job.

  • If the job owner performs a Create Like operation on a job, all access privileges for the new job are identical to those of the original job. If the job owner grants View or Full access to other administrators, and any of these administrators performs a Create Like operation on the job, ALL administrators have View access on the newly created job by default.
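The rules above can be restated as a small decision function. This POSIX shell sketch is hypothetical: `may_change` and its actor, action, and grantee keywords are illustrative names. It covers only administrators (not roles) and does not model the Create Like rule.

```shell
# Hypothetical sketch (POSIX sh) of the access level rules.
# Usage: may_change ACTOR ACTION GRANTEE
#   ACTOR:   owner | super_admin | full_admin (normal admin with Full access)
#   ACTION:  grant_view | grant_full | revoke_view | revoke_full
#   GRANTEE: super_admin | normal
may_change() {
  actor=$1
  action=$2
  grantee=$3
  # Nobody can revoke View from a Super Administrator.
  if [ "$action" = revoke_view ] && [ "$grantee" = super_admin ]; then
    echo denied
    return 0
  fi
  case $actor in
    owner)
      # The owner can make any other access change.
      echo allowed ;;
    super_admin)
      # Super Administrators can grant View but never Full; they can revoke
      # View or Full from normal users and Full from Super Administrators
      # (revoking View from a Super Administrator was rejected above).
      case $action in
        grant_view|revoke_view|revoke_full) echo allowed ;;
        *) echo denied ;;
      esac ;;
    *)
      # Normal administrators, even with Full access, cannot change access.
      echo denied ;;
  esac
}

may_change owner revoke_view super_admin    # denied
may_change super_admin grant_view normal    # allowed
may_change super_admin grant_full normal    # denied
may_change full_admin revoke_full normal    # denied
```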

Creating a SQL Script Job

The basic process for creating a SQL Script job is the same as described in Creating an OS Command Job. The following sections provide supplemental information specific to SQL Script jobs:

Specifying Targets

You can run a SQL Script job against database and cluster database target types. You select the targets to run the job against by doing the following:

  1. Click Add in the Targets section.
  2. Select the database target(s) from the pop-up.

The selected targets now appear in the Targets table.

Note:

For a cluster host or RAC database, a job runs only once for the target, regardless of the number of database instances. Consequently, a job cannot run on all nodes of a RAC cluster.

Specifying Options for the Parameters Page

In a SQL Script job, you can specify any of the following in the SQL Script field of the Parameters property page:

  • Any directives supported by SQL*Plus

  • Contents of the SQL script itself

  • Fully-qualified SQL script file; for example:

         @/private/oracle/scripts/myscript.sql
    

    Make sure that the script file is installed in the appropriate location on all targets.

  • PL/SQL script using syntax supported by SQL*Plus; for example, one of the following:

    EXEC plsql_block;
    

    or

    DECLARE
       local_date DATE;
    BEGIN
       SELECT SYSDATE INTO local_date FROM dual;
    END;
    /
    

    You can use target properties in the SQL Script field, a list of which appears in the Target Properties table. Target properties are case-sensitive. You can enter optional parameters to SQL*Plus in the Parameters field.

Specifying Host and Database Credentials

In the Credentials property page, you specify the host credentials and database credentials. The Management Agent uses the host credentials to launch the SQL*Plus executable, and uses database credentials to connect to the target database and run the SQL script. The job can use either the preferred credentials for hosts and databases, or you can specify other credentials that override the preferred credentials.

  • Use Preferred Credentials

    Select this choice if you want to use the preferred credentials for the targets for your SQL Script job. The credentials used for both host and database are those you specify in the drop-down. If you choose Normal Database Credentials, your normal database preferred credentials are used. If you choose SYSDBA Database Credentials, the SYSDBA preferred credentials are used. For both cases, the host credentials associated with the database target are used. Each time the job executes, it picks up the current values of your preferred credentials.

  • Named Credentials

    Select this choice if you want to override the preferred credentials for all targets, then enter the named credentials you want the job to use on all targets.

    Many IT organizations require that passwords be changed at regular intervals. When you change the password of a preferred credential, jobs and corrective actions that use preferred credentials automatically pick up the change, because Enterprise Manager uses the current value of the credentials (both user name and password) at execution time. Named credentials are also centrally managed: a change to a named credential propagates to all jobs and corrective actions that use it.

    For corrective actions, if you specify preferred credentials, Enterprise Manager uses the preferred credentials of the last Enterprise Manager user who edited the corrective action. For this reason, if a second user edits a corrective action that another user originally specified, Enterprise Manager requires the second user to specify the credentials to be used for that corrective action.
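The key point of the credential model above is that a job holds a reference to a credential, not a copy of its user name and password. It resolves that reference each time it runs. The following Python sketch models this late binding under that assumption. `CredentialStore`, `Job`, and the credential name `DB_NORMAL` are hypothetical and are not an Enterprise Manager API.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class CredentialStore:
    """Hypothetical stand-in for centrally managed preferred/named credentials."""
    entries: Dict[str, Tuple[str, str]] = field(default_factory=dict)  # name -> (user, password)

    def set(self, name: str, user: str, password: str) -> None:
        self.entries[name] = (user, password)

    def resolve(self, name: str) -> Tuple[str, str]:
        # Looked up on every call, so callers always see the current value.
        return self.entries[name]


@dataclass
class Job:
    """Hypothetical job that stores only a reference to a credential."""
    name: str
    credential_ref: str

    def execute(self, store: CredentialStore) -> Tuple[str, str]:
        # Resolve at execution time, not at submission time, and report what was used.
        return store.resolve(self.credential_ref)


store = CredentialStore()
store.set("DB_NORMAL", "scott", "pw-january")
job = Job("nightly_report", "DB_NORMAL")
print(job.execute(store))  # ('scott', 'pw-january')

store.set("DB_NORMAL", "scott", "pw-february")  # password rotated centrally
print(job.execute(store))  # ('scott', 'pw-february'); the job definition is unchanged
```

Because only the reference is stored, rotating a password requires no change to the job itself.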

Returning Error Codes from SQL Script Jobs

The SQL Script job internally uses SQL*Plus to run a user's SQL or PL/SQL script. If SQL*Plus returns 0, the job status is Succeeded; if it returns any other value, the job status is Failed. By default, a SQL script that encounters an error can still produce a job status of Succeeded, because SQL*Plus continues past the error and returns 0. To make such jobs return a Failed status, use the SQL*Plus EXIT command (directly or through WHENEVER SQLERROR) to return a non-zero value.

The following examples show how you can return values from your PL/SQL or SQL scripts. These, in turn, will be used as the return value of SQL*Plus, thereby providing a way to return the appropriate job status (Succeeded or Failed). Refer to the SQL*Plus User's Guide and Reference for more information about returning EXIT codes.

Example 1

WHENEVER SQLERROR EXIT SQL.SQLCODE
select column_does_not_exist from dual;

Example 2

-- No WHENEVER SQLERROR EXIT is in effect yet, so SQL*Plus reports
-- the error for the next SELECT statement but keeps running
SELECT COLUMN_DOES_NOT_EXIST FROM DUAL;
 
WHENEVER SQLERROR EXIT SQL.SQLCODE;
BEGIN
  -- SQL*Plus exits at this point, returning SQL.SQLCODE
  SELECT COLUMN_DOES_NOT_EXIST FROM DUAL;
END;
/
WHENEVER SQLERROR CONTINUE;

Example 3

variable exit_code number;
 
BEGIN
  -- Default to success so that EXIT returns 0 when no exception is raised
  :exit_code := 0;
  DECLARE
    local_empno number(5);
  BEGIN
    -- do some work which will raise exception: no_data_found
    SELECT 123 INTO local_empno FROM sys.dual WHERE 1=2;
  EXCEPTION
    WHEN no_data_found THEN
      :exit_code := 10;
    WHEN others THEN
      :exit_code := 2;
  END;
END;
/
exit :exit_code;
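The mapping from the SQL*Plus exit code to the job status can be modeled in a few lines. This Python sketch assumes a POSIX host, where the calling process sees only the low 8 bits of an exit code. The SQL*Plus User's Guide and Reference notes that the range of return codes is restricted on some operating systems. The function name `job_status` is hypothetical.

```python
def job_status(sqlplus_exit_code: int) -> str:
    """Map a SQL*Plus exit code to the job status (Succeeded or Failed).

    Assumes a POSIX host, where the parent process sees only the low
    8 bits of the exit code, i.e. the value modulo 256.
    """
    observed = sqlplus_exit_code & 0xFF
    return "Succeeded" if observed == 0 else "Failed"


print(job_status(0))     # Succeeded: no EXIT with a non-zero value was reached
print(job_status(10))    # Failed: Example 3, NO_DATA_FOUND handler
print(job_status(904))   # Failed: ORA-00904 via SQL.SQLCODE, observed as 136
print(job_status(1536))  # Succeeded: 1536 is a multiple of 256, observed as 0
```

Under that assumption, passing SQL.SQLCODE straight to EXIT can, in principle, truncate to 0 for codes that are multiples of 256. This is one reason to prefer a small fixed code (as in Example 3) or WHENEVER SQLERROR EXIT FAILURE when only the Succeeded/Failed distinction matters.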

Creating a Multi-task Job

The basic process for creating a multi-task job is the same as described in Creating an OS Command Job. The following sections provide supplemental information specific to multi-task jobs:

Job Capabilities

Multi-task jobs enable you to create complex jobs consisting of one or more distinct tasks. Because each task can run against targets of the same type or of different types, a single multi-task job can perform ad hoc operations across several kinds of targets.

The Job System's multi-task functionality makes it straightforward to build complex operations. You can create multi-task jobs in which all tasks run on a single target. You can also create a multi-task job consisting of several tasks, each with a different job type and each operating on a different target type. For example:

  • Task 1 (OS Command job type) performs an operation on Host 1.

  • If Task 1 is successful, Task 2 (SQL Script job type) runs against Database 1 and Database 2.

Specifying Targets for a Multi-task Job

You can run a multi-task job against any target type supported by a job type that can be used as a task. Not all job types can be used as tasks.

The Target drop-down in the General page enables you to choose between running the job against the same targets for all tasks, or different targets for different tasks. Because each task of a multi-task job can be considered a complete job, when choosing the Same targets for all tasks option, you add all targets against which the job is to run from the General page. If you choose the Different targets for different tasks option, you specify the targets (and required credentials) the tasks will run against as you define each task.

After making your choice from the Target drop-down, you then select the targets to run the job against by clicking Add in the Targets section.

Adding Tasks to the Job

You can use the Tasks page to:

  • Add, delete, or edit tasks of various job types

  • Set task condition and dependency logic

  • Add task error handling

You must define at least two tasks in order to set the Condition and Depends On options. A task's condition defines when the task is executed. Condition options include:

  • Always: The task is executed each time the job is run.

  • On Success: The task is executed only if the task it depends on (set with Depends On) runs successfully.

  • On Failure: The task is executed only if the task it depends on (set with Depends On) fails.

The Error Handler Task is often a "clean-up" step that can undo the partial state of the job. The Error Handler Task executes if any task of the multi-task job has an error. Errors are a more severe form of failure, usually meaning that the job system could not run the task. Failures normally indicate that the task ran, but failed. The Error Handler Task does not affect the job execution status. Use the Select Task Type page to specify the job type of the task to be used for error handling.
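The interaction between Condition, Depends On, and the Error Handler Task can be sketched as a small simulation. This Python model is an illustration under stated assumptions, not the Job System's scheduler. Tasks are evaluated in order. An On Failure task runs only when the task it depends on failed (not when it had an error). The error handler runs when any executed task had an error. The `Task` class, the `plan_run` function, and the task names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Task:
    name: str
    condition: str = "always"          # "always", "on_success", or "on_failure"
    depends_on: Optional[str] = None   # name of an earlier task


def plan_run(tasks: List[Task], outcomes: Dict[str, str]) -> Tuple[List[str], bool]:
    """Return (tasks that execute, whether the Error Handler Task would run).

    `outcomes` gives the simulated result of each task if it runs:
    "succeeded", "failed", or "error". Unlisted tasks are assumed to succeed.
    """
    results: Dict[str, str] = {}
    executed: List[str] = []
    for task in tasks:
        if task.condition == "always":
            should_run = True
        elif task.depends_on is None:
            raise ValueError(f"{task.name}: a condition other than Always needs Depends On")
        elif task.condition == "on_success":
            should_run = results.get(task.depends_on) == "succeeded"
        else:  # "on_failure": in this sketch only an ordinary failure triggers it
            should_run = results.get(task.depends_on) == "failed"
        if should_run:
            results[task.name] = outcomes.get(task.name, "succeeded")
            executed.append(task.name)
    # The error handler reacts to errors only; it does not change the job status.
    return executed, "error" in results.values()


tasks = [
    Task("task1_os_command"),
    Task("task2_sql_script", "on_success", "task1_os_command"),
    Task("task3_notify", "on_failure", "task1_os_command"),
]
print(plan_run(tasks, {"task1_os_command": "succeeded"}))
# (['task1_os_command', 'task2_sql_script'], False)
print(plan_run(tasks, {"task1_os_command": "failed"}))
# (['task1_os_command', 'task3_notify'], False)
print(plan_run(tasks, {"task1_os_command": "error"}))
# (['task1_os_command'], True)
```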

Viewing and Analyzing Job Status

Viewing the Aggregate Status of All Jobs

After you submit jobs, the status of all job executions across all targets is automatically rolled up and available for review on the Enterprise Summary page. Figure 10-2 shows the Jobs section at the bottom of the Enterprise Summary page.

Figure 10-2 Summary of Target Jobs on the Enterprise Summary Page


This summary is particularly useful when you are examining jobs that run against hundreds or thousands of systems, because it shows which job executions have failed. By clicking one of these numbers, you can drill down to study the details of the failed jobs.
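The roll-up and drill-down described above amount to grouping executions by status and then listing the targets behind one group. The Python sketch below uses made-up (target, status) pairs. `roll_up` and `drill_down` are hypothetical helpers, not part of Enterprise Manager.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical (target, status) pairs for one job that ran on several targets.
executions: List[Tuple[str, str]] = [
    ("db01", "Succeeded"),
    ("db02", "Failed"),
    ("host07", "Failed"),
    ("db03", "Running"),
]


def roll_up(runs: List[Tuple[str, str]]) -> Counter:
    """Aggregate view: how many executions are in each status."""
    return Counter(status for _target, status in runs)


def drill_down(runs: List[Tuple[str, str]], status: str) -> List[str]:
    """Detail view: which targets have executions in the given status."""
    return sorted(target for target, s in runs if s == status)


counts = roll_up(executions)
print(counts["Failed"])                  # 2
print(drill_down(executions, "Failed"))  # ['db02', 'host07']
```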

Viewing the General Status of a Particular Job

To find out general status information for a particular job or jobs you have submitted, search for them in the Job Activity page, shown in Figure 10-1.

Viewing the Status of Job Executions

You can view detailed information about a single execution or multiple executions. A single execution can have a single step or multiple steps.

To view the status of executions:

  1. From the Job Activity page, click the Name link for the job of interest. The Job Execution page displays.

  2. The page displays the job execution summary; you can also drill down on specific tasks for further information.

Switching to Enhanced View

Beginning with Cloud Control version 12.1.0.4, you can optionally invoke a view of job runs that combines the views of several drill-downs on one page. To enable the enhanced view, execute the following command:

emctl set property -name oracle.sysman.core.jobs.ui.useAdfExecutionUi -value true

To revert to the standard view, execute the following command:

emctl set property -name oracle.sysman.core.jobs.ui.useAdfExecutionUi -value false

Note:

These commands do not require you to restart the OMS.

Viewing the Enhanced Status of a Job Run with a Single Execution

A single execution can have a single step or multiple steps. To view the status of a single execution:

Viewing the Enhanced Status of a Job Run with Multiple Executions

To view the status of multiple executions:

  1. From the Job Activity page, click the Name link or Status link for a job containing multiple executions.

    The Job Execution page appears.

  2. Click an execution of interest in the left table.

    The details for that execution appear on the right side of the page.

Generating Job Event Criteria

The job system publishes status change events when a job changes its execution status, and these events have different severities based on the execution status.

Use the Job Event Generation Criteria page to choose the jobs, targets, and statuses for which events are raised, so that only useful events are generated. Settings on this page do not change job behavior in any way. To set up notifications on job events, use incident rule sets.
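Conceptually, the criteria on this page act as a filter in front of event publication. The following Python sketch assumes an event is raised only when its status is enabled and one of two things holds: its target has been added, or, for jobs without targets, targetless events are enabled. It omits the status severity setting. `JobEventCriteria` and its fields are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class JobEventCriteria:
    """Hypothetical model of the Job Event Generation Criteria settings."""
    enabled_statuses: Set[str] = field(default_factory=set)  # chosen by a Super Administrator
    targets: Set[str] = field(default_factory=set)           # added by administrators
    targetless_jobs: bool = False                            # "Jobs Without Target(s)"

    def should_publish(self, status: str, target: Optional[str]) -> bool:
        # Only decides whether an event is raised; the job itself is unaffected.
        if status not in self.enabled_statuses:
            return False
        if target is None:
            return self.targetless_jobs
        return target in self.targets


criteria = JobEventCriteria({"Failed", "Action Required"}, {"db01"}, targetless_jobs=False)
print(criteria.should_publish("Failed", "db01"))     # True
print(criteria.should_publish("Failed", "db02"))     # False: target not added
print(criteria.should_publish("Succeeded", "db01"))  # False: status not enabled
print(criteria.should_publish("Failed", None))       # False: targetless events off
```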

To access this page, from the Setup menu, select Incidents, then Job Events.

Figure 10-3 Job Event Generation Criteria Page


Enabling Events For Job Status, Status Severity, and Targetless Jobs

To enable events for job status, status severity, and targetless jobs, do the following:

  1. Ensure that you have Super Administrator privileges to select the job status for which you want to generate events.

  2. Ensure that you are an administrator with View Target privileges to add targets for which you want to generate events for the job status set by the Super Administrator.

  3. Log into Cloud Control as a Super Administrator.

  4. From the Setup menu, select Incidents and then select Job Events. The Job Event Generation Criteria Page is displayed.

  5. In the Job Event Generation Criteria page, do the following:

    1. In the "Enable Events for Job Status" region, select the statuses for which you want to publish events.

    2. In the "Enable Events for Status Severity" region, select whether you want to enable events for critical statuses, informational statuses, or both.

    3. In the "Enable Events for Jobs Without Target(s)" section, select Yes if you want to create events for jobs that are not associated with any target.

    4. In the "Events for Targets" section, click Add to add targets for which you want the job events to be enabled.

  6. Click Apply.

Adding Targets To Generate Events For Job Status

After a Super Administrator selects the job statuses for which events will be published, administrators can add targets to generate events. To add targets to generate events for job status, do the following:

  1. Ensure that you are an administrator with View Target privileges to add targets for which you want to generate events for the job status set by a Super Administrator.

  2. Log into Cloud Control as an administrator.

  3. From the Setup menu, select Incidents and then select Job Events. The Job Event Generation Criteria Page is displayed.

  4. In the Job Event Generation Criteria page, do the following:

    1. In the Events For Job Status And Targetless Jobs section, you can view the statuses for which events can be published. You can also see whether events have been enabled for jobs without targets.

    2. In the Events For Targets section, click Add to add targets for which you want the job events to be enabled. You can also remove targets for which you do not want the job events to be enabled by clicking Remove.

      Note:

      Your selected settings in the Events for Targets section are global. Adding or removing targets for events also affects other Enterprise Manager users.

  5. Click Apply.

Creating Event Rules For Job Status Change

Enterprise Manager enables you to create and apply rules to events, incidents, and problems. A rule is applied when a newly created or updated event, incident, or problem matches the conditions defined in the rule. The following sections explain how to create event rules for job status change events:

Creating Job Status Change Event Rules For Jobs

To create job status change event rules for jobs, do the following:

  1. Ensure that the relevant job status is enabled and required targets have been added to job event generation criteria.

  2. Ensure that you have administrator privileges to create event rules for job status change events.

  3. Log into Cloud Control as an administrator.

  4. From the Setup menu, select Incidents and then Incident Rules. The Incident Rules Page appears.

  5. In the Incident Rules page, click Create Rule Set to create rule sets for incidents.

  6. Specify the Name, Description, and select Enabled to enable the rule set. Select Type as Enterprise if you want to set the rule for all Enterprise Manager users or Private if you want to set the rule for a specific user only. Select Applies to Job.


    In the Job tab, click Add to add jobs for which you want to create event rules.

  7. In the Add Jobs dialog box, if you select By Pattern, enter a pattern in Job name like, select the Job Type, and enter a pattern in Job owner like. If you select Specific jobs, select the jobs. Click OK.

  8. In the Rules tab, click Create.

    In the Select Type of Rule to Create dialog box that appears, you can select from the following choices according to the rule set you want to create:

    • Incoming events and updates to events to receive notification or create incidents for job rules. If you are operating on events (for example, if you want to create incidents for incoming events, such as job failed, or notify someone), choose this option.

    • Newly created incidents or updates to incidents to receive notifications or create rules for incidents even though the events for which incidents are generated do not have associated rules. If you are operating on existing or newly created incidents (for example, you want to direct all incidents related to a group, say foo, to a particular user or escalate all incidents open for more than 3 days), choose this option.

    • Newly created problems or updates to problems to receive notifications or create rules for problems even though the incidents for which problems are generated do not have associated rules. This option does not apply for jobs.

  9. Select Incoming events and updates to events, and in the Create New Rule: Select Events page, do the following:

    1. Select By Type to Job Status Change. Select All events of type Job Status Change if you want to take an action for all job state change events for the selected jobs. Select Specific events of type Job Status Change if you only want to act on specific job states. If you have selected Specific events of type Job Status Change, select Job Status for events for which you want to create the rule.

    2. Set any other criteria that you want the rule to match.

  10. Select Newly created incidents or updates to incidents if you want to create rules for an incident even if the event associated with the incident does not have notification rules. In the Create New Rule: Select Incidents page, select any of the following:

    • All new incidents and updated incidents to apply the rule to all new and updated incidents

    • All new incidents to apply the rule to all new incidents

    • Specific incidents and then select the criteria for the incidents


  11. In the Create New Rule: Add Actions page, click Add to add actions to the rule.

  12. In the Add Conditional Actions page, specify actions to be performed when the event matches the rule.

    In the Conditions for actions section, select:

    • Always execute the actions to execute actions regardless of event.

    • Only execute the actions if specified conditions match to execute actions to match specific criteria.

    When adding actions to events, specify the following:

    • Select Create Incident to create an incident for the event to manage and track its resolution.

    • In the Notifications section, use the E-mail To, E-mail Cc, and Page fields to specify the recipients who are notified when the event matches the rule's conditions. If Advanced and Repeat Notifications options have been set, specify them.

    • In the Clear events section, select Clear permanently if you want to clear an event after the issue that generated the event is resolved.

    • If you have configured event connectors, in the Forward to Event Connectors section, you can send the events to third-party event management systems.

    When adding actions to incidents, specify the following:

    • In the Notifications section, use the E-mail To, E-mail Cc, and Page fields to specify the recipients who are notified when the incident matches the rule's conditions. If Advanced and Repeat Notifications options have been set, specify them.

    • In the Update Incident section, specify the details to triage incidents when they occur. Specify Assign to, Set priority to, Set status to, and Escalate to details.

    • In the Create Ticket section, if a ticket device has been configured, specify details to create the ticket.

    Click Continue.

  13. In the Specify Name and Description page, specify a Name and Description for the event rule. Click Next.

  14. In the Review page, verify the details you have selected for the event rule and click Continue to add this rule in the rule set.

  15. On the Create Rule Set page, click Save to save the rule set.
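The matching step that a rule in this rule set performs can be modeled as follows. This Python sketch assumes a rule created with Specific jobs and either All events or Specific events of type Job Status Change. `Event`, `JobStatusRule`, and the job name `nightly_backup` are hypothetical, and pattern-based job selection is not modeled.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Event:
    event_type: str
    job_name: str
    status: str


@dataclass(frozen=True)
class JobStatusRule:
    """Hypothetical model of an event rule in a rule set that applies to jobs."""
    jobs: FrozenSet[str]                       # jobs added on the Jobs tab
    statuses: Optional[FrozenSet[str]] = None  # None = all events of type Job Status Change

    def matches(self, event: Event) -> bool:
        if event.event_type != "Job Status Change":
            return False
        if event.job_name not in self.jobs:
            return False
        return self.statuses is None or event.status in self.statuses


only_failures = JobStatusRule(frozenset({"nightly_backup"}), frozenset({"Failed"}))
every_change = JobStatusRule(frozenset({"nightly_backup"}))

failed = Event("Job Status Change", "nightly_backup", "Failed")
succeeded = Event("Job Status Change", "nightly_backup", "Succeeded")
print(only_failures.matches(failed), only_failures.matches(succeeded))  # True False
print(every_change.matches(succeeded))                                   # True
```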

Creating Job Status Change Event Rules For Targets

To create job status change event rules for targets, do the following:

  1. Ensure that the relevant job status is enabled and required targets have been added to job event generation criteria.
  2. Ensure that you have administrator privileges to create event rules for job status change events.
  3. Log into Cloud Control as an administrator.
  4. From the Setup menu, select Incidents, then Incident Rules. The Incident Rules Page is displayed.
  5. In the Incident Rules page, click Create Rule Set to create rule sets for incidents.
  6. Specify the Name and Description, and select Enabled to enable the rule set. Select Type as Enterprise if you want to set the rule for all Enterprise Manager users, or Private if you want to set the rule for a specific user only. Select Applies to Targets.

    In the Targets tab, select one of the following:

    • All targets to apply to all targets. In the Excluded Targets section, click Add to search and select the target that you want to exclude from the rule set. Click Select.

    • All targets of types to select the types of targets to which you want to apply the rule set.

    • Specific targets to individually specify the targets. Select to Add Groups or Targets to add groups or targets and click Add to search and select the targets to which you want to apply the rule set. Click Select. In the Excluded Targets section, click Add to search and select the target that you want to exclude from the rule set. Click Select.

  7. In the Rules area, click Create.
  8. In the Select Type of Rule to Create dialog box, select from the following choices according to the rule set you want to create:
    • Incoming events and updates to events to receive notifications or create incidents for job rules. If you are operating on events (for example, if you want to create incidents for incoming events, such as job failed, or notify someone), choose this option.

    • Newly created incidents or updates to incidents to receive notifications or create rules for incidents even though the events for which incidents are generated do not have associated rules. If you are operating on existing or newly created incidents (for example, you want to direct all incidents related to a group, say foo, to a particular user or escalate all incidents open for more than 3 days), choose this option.

    • Newly created problems or updates to problems to receive notifications or create rules for problems even though the incidents for which problems are generated do not have associated rules. This option does not apply for jobs.

  9. Select Incoming events and updates to events, and in the Create New Rule: Select Events page, do the following:
    • Select By Type to Job Status Change. Select All events of type Job Status Change if you want to take an action for all job state change events for the selected jobs. Select Specific events of type Job Status Change if you only want to act on specific job states. If you have selected Specific events of type Job Status Change, select Job Status for events for which you want to create the rule.

    • Set any other criteria that you want the rule to match.

  10. Select Newly created incidents or updates to incidents if you want to create rules for an incident even if the event associated with the incident does not have notification rules. In the Create New Rule: Select Incidents page, select any of the following:
    • All new incidents and updated incidents to apply the rule to all new and updated incidents.

    • All new incidents to apply the rule to all new incidents.

    • Specific incidents and then select the criteria for the incidents.


  11. In the Create New Rule: Add Actions page, click Add to add actions to the rule.
  12. In the Add Conditional Actions page, specify actions to be performed when the event matches the rule.

    In the Conditions for actions section, select:

    • Always execute the actions to execute actions regardless of event.

    • Only execute the actions if specified conditions match to execute actions to match specific criteria.

    When adding actions to events, specify the following:

    • Select Create Incident to create an incident for the event to manage and track its resolution.

    • In the Notifications section, use the E-mail To, E-mail Cc, and Page fields to specify the recipients who are notified when the event matches the rule's conditions. If Advanced and Repeat Notifications options have been set, specify them.

    • In the Clear events section, select Clear permanently if you want to clear an event after the issue that generated the event is resolved.

    • If you have configured event connectors, in the Forward to Event Connectors section, you can send the events to third-party event management systems.

    When adding actions to incidents, specify the following:

    • In the Notifications section, use the E-mail To, E-mail Cc, and Page fields to specify the recipients who are notified when the incident matches the rule's conditions. If Advanced and Repeat Notifications options have been set, specify them.

    • In the Update Incident section, specify the details to triage incidents when they occur. Specify Assign to, Set priority to, Set status to, and Escalate to details.

    • In the Create Ticket section, if a ticket device has been configured, specify the details to create the ticket.

    Click Continue.

  13. In the Specify Name and Description page, specify a Name and Description for the event rule. Click Next.
  14. In the Review page, verify the details you have selected for the event rule and click Continue to add this rule in the rule set.
  15. On the Create Rule Set page, click Save to save the rule set.

Using Diagnostic Tools

The following sections provide procedures for these diagnostic topics:

Enabling Job Logging

You can enable and disable object logging for diagnostic purposes. By default, only messages at the warning level and above are captured.

To enable job logging for a scheduled job:

  1. From the Enterprise menu of the Cloud Control console, select Job, then Activity.
  2. In the Job Activity table, click the Name link for the job you want to log.
  3. In the Execution page that appears, click Debug from the Actions menu. Debug logging occurs for the selected job while the job is running.

    A confirmation message appears that states "Successfully enabled logging at DEBUG level."

After the job execution completes, the 'Debug' option under the 'Actions' menu is automatically disabled.
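The threshold behavior described above (WARNING and above by default, DEBUG only for an execution that has Debug enabled) resembles a logger whose level is lowered temporarily. The following is an analogy using Python's standard `logging` module, not how Enterprise Manager implements job logging. The logger name `job.nightly_backup` and the helper `debug_for_this_execution` are hypothetical.

```python
import logging
from contextlib import contextmanager


class ListHandler(logging.Handler):
    """Collects formatted records so the example can inspect what was captured."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(f"{record.levelname}: {record.getMessage()}")


@contextmanager
def debug_for_this_execution(logger):
    """Lower the threshold to DEBUG for one execution, then restore it,
    much as the Debug option is switched off when the execution completes."""
    previous = logger.level
    logger.setLevel(logging.DEBUG)
    try:
        yield logger
    finally:
        logger.setLevel(previous)


handler = ListHandler()
job_log = logging.getLogger("job.nightly_backup")  # hypothetical job name
job_log.addHandler(handler)
job_log.setLevel(logging.WARNING)  # default: WARNING and above only
job_log.propagate = False

job_log.debug("step 1 details")  # dropped
job_log.warning("step 1 slow")   # captured
with debug_for_this_execution(job_log):
    job_log.debug("step 2 details")  # captured while Debug is on
job_log.debug("step 3 details")  # dropped again

print(handler.records)
# ['WARNING: step 1 slow', 'DEBUG: step 2 details']
```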

Viewing Job Logging

If there is user-visible logging for a particular job, you can view the job execution log by doing the following:

  1. In the Job Activity page, click the Name link of the job for which you want to view the log. The Job Execution page appears.

  2. The job output log is displayed.

You can also access job logging by clicking the link that appears in the Status column of the Job Run page, then clicking Log Report on the Execution page.

To view the log for a job step, do the following:

  1. In the Job Activity page, click the link of the job for which you want to view the log.
  2. In the Job Run table, click the link that appears in the Status column for the step you want to examine.

    The Output Log appears for the step.

Debugging a Failed Job

If an execution fails and you had not previously set debug as the logging level, you can choose to set the debug level when you retry the execution. For new job executions, you can set logging at the debug level in advance by clicking the Debug button. The Object Logging field indicates whether logging is enabled at the debug level.

Perform the following procedure if you encounter a job that fails.

  1. View the job steps that failed.
  2. Check the output for the failed step(s). Aggregated job output is displayed for all steps, and also for specific steps.
  3. If the output does not contain the reason for the failure, view the logging output. You may also want to check for any incidents that have occurred while the job was running.
  4. Determine the cause of the failure and fix the problem.
  5. Enable debug mode, then resubmit the job.

    Note that the checkbox for Debug mode only appears on the confirmation page if the earlier execution was not in Debug mode. If the earlier execution was already in Debug mode, the retried execution is automatically in Debug mode.

Checking for Incidents Related to a Failed Job

A job can fail because of an internal code error, a severe scaling issue, or another Enterprise Manager issue for which you may be able to investigate an incident or event trail. For example, if the OMS was restarted because all Job Workers were stuck, many jobs would fail. A failing loader could also cause some jobs to fail.

  1. Check for incidents or alerts in the time-frame of the job.

    • To check for incidents, select Summary from the Enterprise menu of the Cloud Control console, then view the Incidents section of the Enterprise Summary page.

      For more detailed information, select Monitoring from the Enterprise menu, then select Support Workbench.

    • To check for alerts,

  2. Submit a service request with the related incident(s) or event data.

    All step output, error output, logging, remote log files, and incident dump files for a given job are captured for an incident.

    • To submit a service request, select My Oracle Support from the Enterprise menu of the console, then select Service Requests.

    • To create a technical SR, click Create SR on the Service Requests Home page. To create a contact us SR, click Create "Contact Us" SR at the top of the Contact Us Service Requests region, or click Contact Us at the top of any My Oracle Support page. If you are creating a technical service request, depending on the Support IDs registered in your profile, you can create hardware or software SRs, or both.

      The Create Service Request wizard guides you through the process of specifying product information and attaching configuration information to the SR when it is filed with Oracle Support. To ensure that Oracle Support has the most accurate target information, select the Configuration tab in the What is the Problem? section of Step 1: Problem, then select a target.

  3. Apply any patch that Oracle Support provides.

    • Select Provisioning and Patching from the Enterprise menu of the console, then select Patches & Updates.

    • Provide login credentials, then click Go.

    • Access the online help for assistance with this page.

  4. Try to submit the job again after applying the patch.
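Step 1 above is essentially a time-window correlation. The following Python sketch filters a list of incidents to those raised while the job ran. The window is widened by a small margin to catch incidents logged just before or after the run. The five-minute margin, the incident IDs, and the `incidents_in_window` function are all assumptions for illustration.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def incidents_in_window(
    incidents: List[Tuple[str, datetime]],
    job_start: datetime,
    job_end: datetime,
    margin: timedelta = timedelta(minutes=5),
) -> List[str]:
    """Return IDs of incidents raised between job_start and job_end, widened by margin."""
    earliest, latest = job_start - margin, job_end + margin
    return [incident_id for incident_id, raised in incidents if earliest <= raised <= latest]


start = datetime(2024, 3, 1, 2, 0)
end = datetime(2024, 3, 1, 2, 30)
incidents = [
    ("INC-101", datetime(2024, 3, 1, 1, 0)),   # an hour before the job
    ("INC-102", datetime(2024, 3, 1, 2, 12)),  # during the job
    ("INC-103", datetime(2024, 3, 1, 2, 33)),  # shortly after, inside the margin
]
print(incidents_in_window(incidents, start, end))  # ['INC-102', 'INC-103']
```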

To package an incident or manually trigger an incident:

  1. Access Support Workbench.
  2. Gather all job-related dumps and log files, as well as other data from the same time, and package it for Support.
  3. Review the incident-related data in Support Workbench, searching for relevant errors.
  4. If you determine the root cause without support intervention, fix the job and resubmit it.

Packaging an Incident Generated by a Job Step

Incidents (and the problems that contain them) are not packaged by default. You need to package the problems of interest in Support Workbench. You can choose whether to package all problems or only some of them.

Note:

If a job with remote log files is involved in an incident, the remote files are automatically included in the incident as part of packaging. For more information on remote log files, see Viewing Remote Log Files.

To package an incident generated by a job step:

  1. On the Log Report page, click the Incident ID link.
  2. Click the Problem Key link on the Support Workbench Incident Details page.
  3. In the Support Workbench Problem Details page, click either the Package the Problem link or the Quick Package button, then follow the instructions in the Quick Packaging wizard and online help.

Viewing Remote Log Files

Some jobs or Provisioning Adviser Framework (PAF) procedures run external commands, such as DBCA or the installer. These commands generate their own log files local to the system and Oracle home where they ran.

To view remote log files:

  1. In the Job Activity page, click the link of the job for which you want to view the log.
  2. In the Job Run table, click Log Report.
  3. In the Log Report page, click the Remote Log Files link.
  4. Specify host credentials, then click OK.

    The Remote File Viewer appears and displays the file contents.

Diagnosing Problems with Cloud Control Management Tools

The Cloud Control management portion of the console provides several tools that can assist you in assessing the current state of the job system and determining a proper course of action for optimum performance. All of these tools are accessible from the Setup menu of the Cloud Control console.

The following sections provide information on each of the available tools.

Health Overview
  1. From the Setup menu of the Cloud Control console, select Manage Cloud Control.

  2. Select the Health Overview sub-menu.

The Job System Status region of the Health Overview page displays the following information:

  • Step Scheduler Status

    The job step scheduler processes the job steps that are ready to run. If the status indicates that the job step scheduler is running in warning or error mode, the job system is not functioning normally. In this case, the job system may run in fail-over mode, in which the job dispatcher process periodically performs the job step scheduler's work as well. However, the job system may then be running below its potential capacity, so resolving the underlying problem is beneficial.

    Several possible messages can appear:

    • DBMS_SCHEDULER job for step-scheduler not found

      This message is very rare and usually indicates a potentially serious issue. The job was likely removed inadvertently or as a result of special processing (for example, a patch installation that requires recycling all DBMS_SCHEDULER jobs). No automatic resolution is possible; the issue must be addressed on a case-by-case basis.

    • Failure in checking status

      This is a rare occurrence, and the error message is usually shown. Because this error indicates only that the status could not be calculated, it may disappear on its own.

    • DBMS_SCHEDULER is disabled

      All DBMS_SCHEDULER jobs are disabled in the environment. This should not occur unless an installation of some kind is in progress. Resolve this by starting the DBMS_SCHEDULER processes.

    • All job queue processes are in use

      The DBMS_SCHEDULER processes have been exhausted. Increase the job_queue_processes initialization parameter in the repository database.

    • All slave processes are in use

      The cause is similar to the previous case. In this situation, increase the MAX_JOB_SLAVE_PROCESSES attribute of DBMS_SCHEDULER.

    • All sessions are in use

      No database sessions were available for DBMS_SCHEDULER. Increase the PROCESSES initialization parameter of the repository database.

    • Reason for delay could not be established

      This is the most common warning, and it usually appears because none of the above conditions was met. The dispatcher may simply be overloaded because there is more work than available workers; check the backlog in this case. The situation should resolve automatically, but if it persists, the number of workers available to the job system may be insufficient for the load the site experiences.

  • Job Backlog

    The job backlog indicates the number of job steps that have passed their scheduled time but have not executed yet. If this number is high and has not decreased for a long period, the job system is not functioning normally. This situation usually arises if job engine resources are unable to meet the inflow of jobs from system or user activity.

    A high backlog can also occur when specific jobs are processed abnormally, for example because they are stuck for extended periods. For more information on stuck job worker threads, do the following:

    1. From the OMS and Repository menu of the Health Overview page, select Monitoring, then Diagnostic Metrics.

    If the job system has a backlog for long periods of time, or if you want to process the backlog faster, set the following parameters with the emctl set property command. These settings assume that sufficient database resources are available to support the additional load. These parameters are most likely to be needed in a Large configuration with two OMS nodes.

    Table 10-1 Large Job System Backlog Settings

    Parameter                                       Value
    oracle.sysman.core.jobs.shortPoolSize           50
    oracle.sysman.core.jobs.longPoolSize            24
    oracle.sysman.core.jobs.longSystemPoolSize      20
    oracle.sysman.core.jobs.systemPoolSize          50
    oracle.sysman.core.conn.maxConnForJobWorkers    144

    The maxConnForJobWorkers setting may require increasing the PROCESSES parameter of the repository database by 144 * the number of OMS servers.
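The following Python sketch turns Table 10-1 into the corresponding emctl commands and works out the possible PROCESSES increase. It assumes the emctl set property -name <property> -value <value> form; confirm the exact syntax for your release (including whether a SYSMAN password or an OMS restart is involved) before running anything. The helper names are hypothetical.

```python
# Values from Table 10-1 (Large Job System Backlog Settings).
LARGE_BACKLOG_SETTINGS = {
    "oracle.sysman.core.jobs.shortPoolSize": 50,
    "oracle.sysman.core.jobs.longPoolSize": 24,
    "oracle.sysman.core.jobs.longSystemPoolSize": 20,
    "oracle.sysman.core.jobs.systemPoolSize": 50,
    "oracle.sysman.core.conn.maxConnForJobWorkers": 144,
}


def emctl_commands(settings: dict[str, int]) -> list[str]:
    """Build one 'emctl set property' command line per parameter.

    Assumption: the '-name <property> -value <value>' form; verify it
    against the emctl documentation for your Enterprise Manager release.
    """
    return [f"emctl set property -name {name} -value {value}"
            for name, value in settings.items()]


def extra_db_processes(max_conn_for_job_workers: int, oms_count: int) -> int:
    """Possible PROCESSES increase: maxConnForJobWorkers * number of OMS servers."""
    return max_conn_for_job_workers * oms_count


for cmd in emctl_commands(LARGE_BACKLOG_SETTINGS):
    print(cmd)

# Large configuration with 2 OMS nodes: 144 * 2 = 288
print(extra_db_processes(
    LARGE_BACKLOG_SETTINGS["oracle.sysman.core.conn.maxConnForJobWorkers"], 2))
```

Generating the commands from one table keeps the values you apply on each OMS consistent with the documented settings.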

Repository Home Page
  1. From the Setup menu of the Cloud Control console, select Manage Cloud Control.
  2. Select the Repository sub-menu.

The Repository Scheduler Jobs Status table in the Management Services and Repository page displays the job system purge status and next run schedule.

Management Services and Repository: All Metrics

There are two navigation paths for accessing the All Metrics page:

  • From the Setup menu

  • From the Targets menu

Setup Menu Navigation

  1. From the Setup menu of the Cloud Control console, select Manage Cloud Control.

  2. Select the Health Overview sub-menu.

  3. From the Management Services and Repository page that appears, select Monitoring from the OMS and Repository menu, then select All Metrics.

  4. Scroll down to DBMS Job Status in the left pane, then select a metric.

  5. Scroll down further and expand Repository Job Scheduler Performance.

    Definitions for the available metrics are as follows:

    • Average number of steps marked as ready by the scheduler — Average number of steps processed by the job step scheduler to mark the steps "ready" for execution. This number usually depends on the job system load over a time period.

    • Estimated time for clearing current Job steps backlog (Mins) — Estimated time to clear the backlog assuming the current inflow rate of the job system.

    • Job step backlog — Number of job steps that have passed their scheduled time but have not executed yet. If this number is high and has not decreased for a long period, the job system is not functioning normally. This situation usually arises if job engine resources are unable to meet the inflow of jobs from system or user activity.

    • Latency in marking steps as ready by the scheduler — The job step scheduler moves scheduled steps to the ready queue. This metric indicates the average latency in moving steps to the ready queue. High latency indicates abnormal functioning of the job step scheduler process.

    • Overall job steps per second — Average number of steps the job system executes per second.

    • Scheduler cycles — How often the DBMS_SCHEDULER process runs the job step scheduler. It executes a minimum of 5 cycles per minute, and the rate may increase depending on the job system load. A low number usually indicates a problem in the job step scheduler process.

  6. Scroll down further and expand the Usage Summary entries for Jobs, then select the metric in which you are interested.
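The backlog metrics above can be reasoned about with simple arithmetic. The sketch below is hypothetical: it assumes the estimated clearing time is roughly the backlog divided by the net rate at which steps drain (processing rate minus inflow rate), which may not match how Enterprise Manager actually computes the metric. It also flags a Scheduler cycles value below the documented minimum of 5 per minute.

```python
import math

MIN_SCHEDULER_CYCLES_PER_MIN = 5  # documented minimum for the step scheduler


def estimated_backlog_clear_minutes(backlog_steps: int,
                                    steps_per_second: float,
                                    inflow_steps_per_minute: float) -> float:
    """Rough minutes needed to drain the backlog at the current inflow rate.

    Assumption: time = backlog / (processing rate - inflow rate).
    Enterprise Manager's own calculation may differ.
    """
    net_per_minute = steps_per_second * 60 - inflow_steps_per_minute
    if net_per_minute <= 0:
        return math.inf  # at these rates the backlog never shrinks
    return backlog_steps / net_per_minute


def scheduler_cycles_look_low(cycles_per_minute: float) -> bool:
    """True when fewer cycles ran than the documented minimum."""
    return cycles_per_minute < MIN_SCHEDULER_CYCLES_PER_MIN


print(estimated_backlog_clear_minutes(1200, 2.0, 100))  # 1200 / (120 - 100) = 60.0
print(scheduler_cycles_look_low(3))  # True
```

An infinite estimate is the numeric form of the warning in the metric definitions: a backlog that is high and not decreasing.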

Target Menu Navigation

  1. From the Targets menu of the Cloud Control console, select All Targets.
  2. In the left pane of the All Targets page, scroll down and expand Internal, then select OMS and Repository.
  3. Click the OMS and Repository entry in the table on the page that follows.
  4. From the OMS and Repository menu in the Health Overview page that appears, select Monitoring, then All Metrics.
OMS and Repository: Diagnostic Metrics
  1. From the Setup menu of the Cloud Control console, select Manage Cloud Control.
  2. Select the Health Overview sub-menu.
  3. From the Management Services and Repository page that appears, select Monitoring from the OMS and Repository menu, then select Diagnostic Metrics.

The pbs_* metrics are relevant for diagnosing issues in the job system. This information is useful if you are looking for more information on stuck job system threads, or for job thread usage statistics that reveal outliers preventing other jobs from running.

OMS and Repository: Charts
  1. From the Setup menu of the Cloud Control console, select Manage Cloud Control.
  2. Select the Health Overview sub-menu.
  3. From the Management Services and Repository page that appears, select Monitoring from the OMS and Repository menu, then select Charts.

Assuming that the job had steps, this page shows historical charts for the overall upload backlog, job step backlog, and overall job steps per second.

Management Servers and Job Activity Details Pages
  1. From the Targets menu of the Cloud Control console, select All Targets.
  2. On the left pane under Groups, Systems, and Services, click Management Servers.
  3. Click Management Servers in the Target Name table.

    The Job System region displays a snapshot of job system status and details of processed executions. The Recent Job Executions Summary table displays the total number of user job executions expected to run within a specific time period, along with the completed count, the running count, and the count of executions that are neither completed nor running. This helps you determine whether user jobs are running as expected in the system.

  4. Click the More Details link below the summary table.
  5. Select the desired time frame in the drop-down for when executions are expected to start.

    Jobs and their status, if any, appear in the table.

  6. Select the Job Dispatchers tab.

    If more than one management server is configured, the page displays the job dispatcher and thread pool utilization information for each management server.

    • Dispatcher Utilization (%) — Measures how frequently the job dispatcher picks up the job steps. High utilization indicates a heavy job system load.

    • Throughput (steps dispatched/min) — Indicates the average number of steps, excluding internal steps, that the dispatcher processes every minute.

    • Thread Pool Utilization — Displays the total number of threads configured for each pool, the average steps selected by the thread pool per minute, and the average number of available threads.
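To make the Recent Job Executions Summary columns concrete, the sketch below buckets a list of execution statuses into completed, running, and everything else. The status labels are hypothetical simplifications: the real Job System reports more granular statuses, and which of them count as completed is an assumption here.

```python
from collections import Counter

# Hypothetical, simplified status labels used only for this illustration.
COMPLETED = "completed"
RUNNING = "running"


def recent_executions_summary(statuses: list[str]) -> dict[str, int]:
    """Bucket expected executions the way the summary table presents them."""
    counts = Counter(statuses)
    total = len(statuses)
    completed = counts[COMPLETED]
    running = counts[RUNNING]
    return {
        "total": total,
        "completed": completed,
        "running": running,
        # Everything else, for example still scheduled or suspended.
        "neither": total - completed - running,
    }


print(recent_executions_summary(
    ["completed", "completed", "running", "scheduled", "suspended"]))
# {'total': 5, 'completed': 2, 'running': 1, 'neither': 2}
```

A persistently large "neither" count for a window that has already started is the signal that user jobs are not running as expected.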

Job System Reports

The job system provides both a diagnostic report and usage report.

Diagnostic Report

  1. From the Enterprise menu of the Cloud Control console, select Reports.
  2. Select the Information Publisher Reports sub-menu.
  3. Search for job in the Title field, then click Go.
  4. Click the Job System Diagnostic Report link in the table.

This report provides an overview of the job system's health and displays diagnostic information about executing jobs or jobs that are possibly delayed beyond their scheduled time. This information is usually relevant for an Oracle Support engineer diagnosing problems in the job system.

Usage Report

Follow the steps above to access this report, except click the Job Usage Report link in step 4.

This report provides an overview of the job system usage information over the past 7 days.

Job Diagnostics

The Job Diagnostics tool provides administrators with an in-depth view of the Job System and its components.

To access Job Diagnostics:

  1. Log in to the Enterprise Manager console as a user with Super Administrator privileges.

    Note:

    You must have Super Administrator privileges in order to access the Job Diagnostics UI.
  2. From the Setup menu, select Manage Cloud Control and then Job Diagnostics. The Job Diagnostics home page appears.

For more information about the Job Diagnostics tool, see Diagnosing Job System Issues.

Creating Corrective Actions

Corrective actions enable you to specify automated responses to metric alerts and events. Corrective actions ensure that routine responses to metric alerts are automatically executed, thereby saving you time and ensuring problems are dealt with before they noticeably impact end users.

Corrective actions share many features in common with the Job System. By default, a corrective action runs on the target on which the metric alert is triggered. Alternatively, you can specify a corrective action to contain multiple tasks, with each task running on a different target. You can also receive notifications for the success or failure of corrective actions.

Since corrective actions are associated with a target's metric thresholds, you can define corrective actions if you have been granted OPERATOR or greater privilege on the target. You can define separate corrective actions for both Warning and Critical thresholds. Corrective actions must run using the credentials of a specific user. For this reason, whenever a corrective action is created or modified, you must specify the credentials that the modified action runs with.

You define corrective actions for individual metrics for monitored targets. The following sections provide instructions on setting up corrective actions and viewing the details of a corrective action execution:

Privilege and Access Requirements for Corrective Actions

In order to create, edit, delete, or associate a Corrective Action with a specific entity (such as a target, monitoring template, or event rule), you must have the requisite privileges, as shown in the following table.

You want to: Required Privileges
Create, edit, or delete a Corrective Action: The user creating, editing, or deleting the Corrective Action must have the CREATE_CA privilege.

Associate a Corrective Action with a target or monitoring template: The user associating the Corrective Action with the target or monitoring template must have the CREATE_CA privilege plus any privileges required by the target or monitoring template.

Apply a monitoring template (with an associated Corrective Action) to a target: The user applying the monitoring template must have the CREATE_CA privilege plus any privileges required by the template and target.

Associate a Corrective Action with an event rule: The rule owner associating the Corrective Action with the event rule must have the CREATE_CA privilege plus any privileges required by the Event Rule framework.

Note:

No additional privileges are required to view a Corrective Action.

Sharing Access to Corrective Actions

After you create a Corrective Action, you need to decide what access other users should have to it. If you do not want to share the corrective action, you do not need to provide input on the Corrective Action Access page.

Defining or Modifying Access

The table on the Access page shows the access that administrators and roles have to the corrective action. Only the corrective action owner (or Super Administrator) can make changes on this page.

As the corrective action owner, you can do the following:

  • Add other administrators and roles to the table by clicking Add, then selecting the appropriate type in the subsequent page that appears.

  • Change the access of an administrator or role by choosing the Full or View access right in the Access Level column in the table.

  • Remove all access to the corrective action for an administrator or role by clicking the icon in the Remove column for that administrator or role. All administrators with Super Administrator privileges have the View access right to a corrective action.

If you choose to provide access rights to a role, you can only provide the View access right to the role, not the Full access right.

If you are a Super Administrator, you can:

  • Grant View access to other Enterprise Manager administrators or roles.

  • Revoke all administrator access privileges.

Note:

If a new user is being created, the user should have the CREATE_JOB privilege to create corrective actions.

Access Level Rules

Access level rules are as follows:

  • Super Administrators always have View access for any corrective action.

  • The Enterprise Manager administrator who owns the corrective action can make any access changes to the corrective action (except revoking View from Super Administrators).

  • Super Administrators with a View or Full access level for a corrective action can grant View (but not Full) access to any new user. Super Administrators can also revoke Full and View access from normal users, and Full access from Super Administrators.

  • Normal Enterprise Manager administrators with Full access levels cannot make any access changes on the corrective action.

  • If the corrective action owner performs a Create Like operation on a corrective action, all access privileges for the new corrective action are identical to those of the original. If the owner grants View or Full access to other administrators, and any of these administrators performs a Create Like operation on this corrective action, all administrators will, by default, have View access on the newly created corrective action.
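The Create Like rules above can be modeled as follows. This is a hypothetical sketch, not an Enterprise Manager API: it assumes that the administrator performing Create Like owns the copy and that, for a non-owner's copy, the original owner and every grantee on the original receive View access. It also enforces the rule that roles can only be granted View access.

```python
from dataclasses import dataclass, field

FULL, VIEW = "Full", "View"


@dataclass
class CorrectiveActionAccess:
    """Hypothetical model of a corrective action's Access table."""
    owner: str
    grants: dict[str, str] = field(default_factory=dict)  # grantee -> Full/View


def grant(acl: CorrectiveActionAccess, grantee: str, level: str,
          is_role: bool = False) -> None:
    """Record a grant; roles may only receive View access."""
    if is_role and level != VIEW:
        raise ValueError("roles can only be granted View access")
    acl.grants[grantee] = level


def create_like(original: CorrectiveActionAccess,
                performer: str) -> CorrectiveActionAccess:
    """Access on the copy produced by a Create Like operation."""
    if performer == original.owner:
        # Owner's copy: access identical to the original.
        return CorrectiveActionAccess(performer, dict(original.grants))
    # Another administrator's copy: by default everyone gets View.
    # Assumption: the performer owns the copy; the original owner is included.
    others = (set(original.grants) | {original.owner}) - {performer}
    return CorrectiveActionAccess(performer, {name: VIEW for name in others})


acl = CorrectiveActionAccess("alice")
grant(acl, "bob", FULL)
grant(acl, "OPS_ROLE", VIEW, is_role=True)
print(create_like(acl, "bob").grants)  # alice and OPS_ROLE, both View
```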

Creating Corrective Actions for Metrics

For any target, the Metric and Collection Settings page shows whether corrective actions have been set for various metrics. For each metric, the Corrective Actions column shows whether Critical and/or Warning severities of corrective actions have been set.

  1. From any target's home page menu, select Monitoring, then Metric and Collection Settings. The Metric and Collection Settings page appears.

    Tip:

    For instance, on the home page for a host named dadvmn0630.myco.com, you would select the Host menu, then Monitoring, then Metric and Collection Settings.

  2. Click the pencil icon for a specific metric to access the Edit Advanced Settings page for the metric.
  3. In the Corrective Actions section, click Add for the metric severity (Warning and/or Critical) for which you want to associate a corrective action.
  4. Select the task type on the Add Corrective Actions page, then click Continue.
    • If you want to use a corrective action from the library, select From Library as the task type. Using a library corrective action copies the description, parameters, and credentials from the library corrective action. You must still define a name for the new corrective action. You can provide corrective action parameters if necessary.

    • If you want to create a corrective action to store in the library, see Creating a Library Corrective Action.

    • If you want to provide an Agent-side response action, select Agent Response Action as the task type. See Providing Agent-side Response Actions for more information.

  5. On the Corrective Action page, provide input for General, Parameters, and Credentials as you would when creating a job.
  6. Click Continue to save the corrective action and return to the Edit Advanced Settings page, where your corrective action now appears.
  7. Optional: To prevent multiple instances of a corrective action from operating simultaneously, enable the Allow only one corrective action for this metric to run at any given time checkbox.

    With this option enabled, neither the Critical nor the Warning corrective action runs if a severity is reported to the Oracle Management Service while an execution of either corrective action is still running. This situation can occur when a corrective action runs longer than the collection interval of the metric it corrects: the metric value may oscillate back and forth across one of the thresholds (leading to multiple executions of the same corrective action), or it may rise or fall quickly past both thresholds (in which case an execution of the Warning corrective action may overlap an execution of the Critical corrective action).

    If you do not select this option, multiple corrective action executions are launched under the aforementioned circumstances. It is the administrator's responsibility to ensure that the simultaneous corrective action executions do not conflict.

  8. Click Continue when you have finished adding corrective actions to return to the Metric and Collection Settings page.

    The page shows the corrective action value you have provided for the metric in the Corrective Actions column. Possible values are:

    • None — No corrective actions have been set for this metric.

    • Warning — A corrective action has been set for Warning, but not Critical, alerts for this metric.

    • Critical — A corrective action has been set for Critical, but not Warning, alerts for this metric.

    • Warning and Critical — Corrective actions have been set for both Warning and Critical alerts for this metric. If an Agent-side response action is associated with the metric, the value is also Warning and Critical, since Agent-side response actions are always triggered on either Critical or Warning alert severities.

  9. Repeat the process from step 2 for any other metrics, then click OK on the Metric and Collection Settings page to save your corrective actions and return to the target page you started from in step 1.
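Steps 7 and 8 can be summarized in code. The hypothetical sketch below derives the Corrective Actions column value from which severities have corrective actions, and models the "Allow only one corrective action for this metric to run at any given time" option as a gate that skips new executions while one is still running. It illustrates the documented behavior only; it is not how the OMS implements it.

```python
def corrective_actions_column(warning: bool, critical: bool,
                              agent_response: bool = False) -> str:
    """Value shown in the Corrective Actions column for one metric."""
    if agent_response or (warning and critical):
        # Agent-side response actions fire on both Warning and Critical.
        return "Warning and Critical"
    if warning:
        return "Warning"
    if critical:
        return "Critical"
    return "None"


class SingleRunGate:
    """Models the 'Allow only one corrective action ... at any given time' option."""

    def __init__(self, enabled: bool) -> None:
        self.enabled = enabled
        self.running = 0  # executions (Warning or Critical) currently running

    def try_start(self) -> bool:
        """Return True if a newly triggered execution may start."""
        if self.enabled and self.running > 0:
            return False  # skipped while another execution is still running
        self.running += 1
        return True

    def finish(self) -> None:
        self.running = max(0, self.running - 1)


gate = SingleRunGate(enabled=True)
print(gate.try_start())  # True: the Warning corrective action starts
print(gate.try_start())  # False: Critical arrives while Warning is still running
```

With the gate disabled, both calls return True, which is the case where the administrator must ensure that overlapping executions do not conflict.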

Creating a Library Corrective Action

For corrective actions that you use repeatedly, you can define a library corrective action. After a corrective action is in the library, you can reuse the corrective action definition whenever you define a corrective action for a target metric or policy rule.

  1. From the Enterprise menu, select Monitoring, then Corrective Actions. The Corrective Action Library page appears.
  2. Select a job type from the Create Library Corrective Action drop-down, then click Go.
  3. Define the General and Parameters settings of the corrective action as you would when creating a job (see Creating Jobs). For Access, go to the following optional step.
  4. Optional: Select Access to define or modify the access you want other users to have for this corrective action.

    For more information, see Sharing Access to Corrective Actions.

  5. Click Save to Library when you have finished. The Corrective Action Library page reappears, and your corrective action appears in the list.

    You can now create another corrective action based on this one (Create Like button), edit, or delete this corrective action.

You can access this library entry whenever you define a corrective action for a metric severity by selecting From Library as the task type on the Add Corrective Actions page. See step 4 in Creating Corrective Actions for Metrics for more information.

Specifying Preferred Credential Type for Corrective Actions

Preferred credentials are used to simplify access to managed targets by storing the login information for those targets in the Management Repository.

When creating a Corrective Action (CA) definition, you can specify which type of preferred credential should be used when running the CA based on the functional nature of the CA. There are two types of preferred credentials you can select when creating the CA definition:

  • Normal
  • Privileged

Graphic shows the Normal and Privileged credential options

Important: Preferred credentials need to be global named credentials.

For more information about preferred credentials, see Preferred Credentials and Global Preferred Credentials.

Which Credentials Will Be Used When a Corrective Action Runs

A Corrective Action must be run by a user with the CREATE_CA privilege. The following table lists the scenarios under which Corrective Actions run and whose credentials are used when the Corrective Action executes.

Scenario: Credentials Used

Corrective Action is run when directly associated with metric settings for a target: The Corrective Action is executed using the privileges of the user who associated the Corrective Action with the target metric threshold.

Corrective Action is run when associated with a monitoring template applied to a target: The Corrective Action is executed using the privileges of the user who associated the underlying template with the target.

Corrective Action is associated with a monitoring template that is part of an Administration Group or a Template Collection Administration Group: If the Corrective Action is part of a monitoring template or template collection that is automatically applied to an administration group, the preferred credentials of the user who associated the template with the administration group are used for the Corrective Action.

Corrective Action is associated with an Event Rule (applies to target availability events, metric alerts, and compliance events): The Corrective Action is executed using the privileges of the Event Rule owner.
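The table reduces to a lookup from scenario to the administrator whose privileges or credentials are used. The enum and keyword parameters below are hypothetical names chosen for illustration; they do not correspond to an Enterprise Manager API.

```python
from enum import Enum
from typing import Optional


class CaScenario(Enum):
    """How the corrective action came to run (rows of the table)."""
    METRIC_SETTINGS = 1       # associated directly with a target metric threshold
    MONITORING_TEMPLATE = 2   # part of a template applied to the target
    ADMIN_GROUP_TEMPLATE = 3  # template applied through an administration group
    EVENT_RULE = 4            # associated with an event rule


def credentials_owner(scenario: CaScenario, *,
                      threshold_associator: Optional[str] = None,
                      template_applier: Optional[str] = None,
                      group_associator: Optional[str] = None,
                      rule_owner: Optional[str] = None) -> str:
    """Return the administrator whose privileges or credentials run the CA."""
    owner = {
        CaScenario.METRIC_SETTINGS: threshold_associator,
        CaScenario.MONITORING_TEMPLATE: template_applier,
        CaScenario.ADMIN_GROUP_TEMPLATE: group_associator,  # preferred credentials
        CaScenario.EVENT_RULE: rule_owner,
    }[scenario]
    if owner is None:
        raise ValueError(f"no administrator supplied for {scenario.name}")
    return owner


print(credentials_owner(CaScenario.EVENT_RULE, rule_owner="ops_admin"))  # ops_admin
```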

Setting Up Notifications for Corrective Actions

Corrective actions are associated with metrics whose alerts trigger them. Any Enterprise Manager administrator with View or higher privileges on a target can receive notifications following the success or failure of a corrective action.

A single incident rule can contain any combination of alert and corrective action states, and it notifies on the same alert and corrective action states for all metrics and targets it selects. Therefore, if you want to be notified of corrective action success or failure for one metric, but only of failure for another, you need two incident rules. An incident rule can include corrective action states for metrics with which no corrective actions have been associated; in this case, no notifications are sent.
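Because one incident rule applies the same states to everything it selects, the number of rules you need equals the number of distinct state combinations you want. The sketch below groups metrics accordingly; the metric names and state labels are illustrative only.

```python
def rules_needed(wanted: dict[str, frozenset[str]]) -> dict[frozenset[str], list[str]]:
    """Group metrics by the set of corrective action states to notify on.

    Each group corresponds to one incident rule, because a rule applies
    the same states to every metric and target it selects.
    """
    groups: dict[frozenset[str], list[str]] = {}
    for metric, states in wanted.items():
        groups.setdefault(states, []).append(metric)
    return groups


wanted = {
    "Filesystem Space Available (%)": frozenset({"Succeeded", "Failed"}),
    "CPU Utilization (%)": frozenset({"Failed"}),
}
print(len(rules_needed(wanted)))  # 2 incident rules
```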

Note:

Notifications cannot be sent for Agent-side response actions, regardless of the state of any incident rules applied to the target.

To create incident rules for notifications:

  1. From the Setup menu, select Incidents, then Incident Rules.

  2. Click Create Rule Set. The Create Rule Set wizard appears.

  3. Provide the requisite information at the top of the Create Rule Set page, then select one of the target choices in the Targets sub-tab, supplying additional information as needed for the "All targets of types" and "Specific targets" choices.

  4. Select the Rules sub-tab, then click Create.

  5. In the pop-up that appears, select the default Incoming events and updates to events choice, then click Continue.

  6. On the Select Events page, enable the Type checkbox, then select Metric Alert.

  7. Click the Specific events of type Metric alert radio button, then click Add in the table that appears.

  8. In the pop-up that appears, select the Target Type, filter and select the metric, select a severity, then enable the desired corrective action status. Click OK.

  9. From the Add Actions page, click Add.

  10. Specify recipients in the Basic Notifications section of the Add Conditional Actions page.

  11. Proceed through the final two pages of the wizard, then click Continue. Your new rule appears in the Create Rule Set page.

  12. Click Save to save this rule.

After you have created one or more rule sets, you need to set up notification methods as follows:

  1. From the Setup menu, select Notifications, then Notification Methods.
  2. From the Notification Methods page, select Help, then Enterprise Manager Help for assistance on providing input for this page.

Providing Agent-side Response Actions

Agent-side response actions perform simple commands in response to an alert. When the metric triggers a warning or critical alert, the Management Agent automatically runs the specified command or script without requiring coordination with the Oracle Management Service (OMS). The Agent runs this command or script as the OS user who owns the Agent executable. Specific target properties can be used in the Agent response action script.

Note:

Use the Agent-side Response Action page to specify a single command-line action to be executed when a Warning or Critical severity is reached for a metric. For tasks that require alert context, contain more complex logic, or require that notifications be sent on success or failure, corrective actions should be used instead of an Agent-side response action.

To access this page, follow steps 1 through 4 in Creating Corrective Actions for Metrics.

Specifying Commands and Scripts

You can specify a single command or execute a script. You cannot specify special shell command characters (such as > and <) as part of the response action command. If you must include these types of special characters in your response action commands, you should use them in a script, then specify the script as the response action command.

If using a script, make sure the script is installed on the host machine that has the Agent. If using shell scripts, make sure the shell is specified either in the Response Action command line:

Script/Command: /bin/csh myScript

... or within the body of the script itself:

Script/Command: myScript

... where myScript contains the following:

     #!/bin/csh
     <rest of script>
Using Target Properties in Commands

You can use target properties in a command. Click Show Available Target Properties to display target properties you can use in the Script/Command field. The list of available target properties changes according to the type of target the response action is to run against.

Use Target Properties as command-line arguments to the script or command, then have the script reference these command-line arguments. For example, to use the %OracleHome% and %SID% target properties, your command might appear as follows:

     /bin/csh MyScript %OracleHome% %SID%

... and your script, MyScript, can reference these properties as command-line arguments. For example:

     if ("$1" == "u1/bin/OracleHome") then
         ...
     endif

Target properties are case-sensitive. For example, if you want to access the Management Agent's Perl interpreter, you can specify %perlBin%/perl <my_perl_script> in the Script/Command field.
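As an alternative to a csh script, the hypothetical Python response action below receives %OracleHome% and %SID% as command-line arguments and appends a line to a log file. The file redirection happens inside the script, because characters such as > cannot appear on the response action command line. It assumes python3 is installed on the Agent host; the script path and log path are placeholders.

```python
#!/usr/bin/env python3
"""Hypothetical Agent-side response action: log the alerting database target.

Registered, for example, as:
    /usr/bin/python3 /path/to/ra_log.py %OracleHome% %SID%
"""
import sys
from datetime import datetime, timezone

LOG_PATH = "/tmp/response_action.log"  # placeholder location


def build_message(oracle_home: str, sid: str) -> str:
    """Format one timestamped log line from the substituted target properties."""
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    return f"{stamp} alert on SID={sid} ORACLE_HOME={oracle_home}"


def main(argv: list[str], log_path: str = LOG_PATH) -> None:
    if len(argv) != 3:
        name = argv[0] if argv else "ra_log.py"
        print(f"usage: {name} <OracleHome> <SID>", file=sys.stderr)
        return
    # The redirection happens here, inside the script, because characters
    # such as '>' are not allowed on the response action command line.
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(build_message(argv[1], argv[2]) + "\n")


if __name__ == "__main__":
    main(sys.argv)
```

Because the Agent runs the command as the OS user that owns the Agent, that user needs write access to the chosen log location.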

Using Advanced Capabilities

You can get other target properties from the target's XML file in the OracleHome/sysman/admin/metadata directory, where OracleHome is the Oracle home of the Management Agent that is monitoring the target. In the XML file, look for the PROP_LIST attribute of the DynamicProperties element to get a list of properties that are not listed in the targets.xml entry for the target.

The following example is an excerpt from the hosts.xml file:

<InstanceProperties>
 	<DynamicProperties NAME="Config" FORMAT="ROW"
 		PROP_LIST="OS;Version;OS_patchlevel;Platform;Boottime;IP_address">
   	<ExecutionDescriptor>
   		<GetTable NAME="_OSConfig"/>
   		<GetView NAME="Config" FROM_TABLE="_OSConfig">
   			<ComputeColumn NAME="osName" EXPR="Linux" IS_VALUE="TRUE"/>
   			<Column NAME="osVersion"/>
   			<Column NAME="osPatchLevel"/>
   			<Column NAME="Platform"/>
   			<Column NAME="Boottime"/>
   			<Column NAME="IPAddress"/>
   		</GetView>
   	</ExecutionDescriptor>
   </DynamicProperties>
   <InstanceProperty NAME="Username" OPTIONAL="TRUE" CREDENTIAL="TRUE">
   	<ValidIf>
   		<CategoryProp NAME="OS" CHOICES="Linux"/>
   	</ValidIf>
   	<Display>
   		<Label NLSID="host_username_iprop">Username</Label>
   	</Display>
   </InstanceProperty>
   <InstanceProperty NAME="Password" OPTIONAL="TRUE" CREDENTIAL="TRUE">
   	<ValidIf>
   		<CategoryProp NAME="OS" CHOICES="Linux"/>
   	</ValidIf>
   	<Display>
   		<Label NLSID="host_password_iprop">Password</Label>
   	</Display>
   </InstanceProperty>
</InstanceProperties>
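To list the dynamic properties programmatically, you can read the PROP_LIST attribute with Python's standard xml.etree.ElementTree module. This sketch assumes the metadata file uses no XML namespace; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def dynamic_property_names(metadata_xml: str) -> list[str]:
    """Collect the PROP_LIST entries of every DynamicProperties element."""
    root = ET.fromstring(metadata_xml)
    names: list[str] = []
    for dyn in root.iter("DynamicProperties"):
        prop_list = dyn.get("PROP_LIST", "")
        names.extend(p for p in prop_list.split(";") if p)
    return names


excerpt = """<InstanceProperties>
  <DynamicProperties NAME="Config" FORMAT="ROW"
      PROP_LIST="OS;Version;OS_patchlevel;Platform;Boottime;IP_address"/>
</InstanceProperties>"""

print(dynamic_property_names(excerpt))
# ['OS', 'Version', 'OS_patchlevel', 'Platform', 'Boottime', 'IP_address']
```

To read an actual file, parse it with ET.parse(path).getroot() and apply the same loop, where path points to the target type's XML file in the OracleHome/sysman/admin/metadata directory.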

Viewing the Details of a Corrective Action Execution

There are two methods of displaying the outcome of a corrective action execution.

  • Incident Manager method

    1. From the Enterprise Manager Cloud Control console Enterprise menu, select Monitoring, then Incident Manager.

    2. Click the Search icon, select Events from the Type drop-down, then click Get Results.

    3. Double-click the message of interest in the search results table.

      The Corrective Action History table now appears at the bottom of the page.

    4. Select the desired message in the history table, then click the glasses icon as shown below.


      Shows CA history table with first entry selected and cursor on the glasses icon.

      The Corrective Action Execution page now appears, which displays the output of the corrective action, status, start time, end time, and so forth.

  • All Metrics method

    1. From the target's home page, select Monitoring, then All Metrics.

    2. From the tree panel on the left, click the desired metric name.

      A row for the metric alert now appears in the Metric Alert History table.

    3. Click the glasses icon in the Details column.

      The Incident Manager Event Details page now appears.

    4. In the Corrective Action History table at the bottom of the page, select the message in the history table, then click the glasses icon.

      The Corrective Action Execution page now appears, which displays the output of the corrective action, status, start time, end time, and so forth.

Diagnosing Job System Issues

Job Diagnostics gives you an in-depth view into the job system. Using intuitive dashboards, Job Diagnostics provides an administrator view of the job system to diagnose problems and resolve job system performance issues.

For more information about the job system, see Utilizing the Job System and Corrective Actions.

Typical Job System Issues

Below are some of the top issues that can affect Job System performance:

  • Agent is Down, Unknown, or Suspended in Blackout
  • Agent is overloaded resulting in excessive job retries (Metric Extensions can often cause this)
  • Priority jobs are getting starved due to failing System Retry Jobs
  • DB session hang due to repository background process deadlocks
  • OMS UI console to PBS communication failure
  • Corrective Actions trigger too frequently due to incorrect metric threshold settings
  • User-suspended jobs are locking resources
  • Long running jobs are blocking common Job System resources, thus preventing new jobs from running
  • Job backlog due to a stuck job at the head of the queue

The Job Diagnostics dashboard enables administrators to easily identify the above issues, diagnose the root cause, and take appropriate action.

Job System Components

The Enterprise Manager Job System is an OMS subsystem and includes a Job Scheduler and Job Workers. In turn, the Job Scheduler consists of two components: the Job Step Scheduler and the Job Dispatcher. In addition to user-submitted jobs, the majority of the background tasks in Enterprise Manager are run via a series of jobs. Typical tasks carried out by these jobs are loading metric data, calculating the availability of composite targets, rollup and purge of metric data and notifications.

Job System performance depends on numerous components working optimally. Job Diagnostics consolidates performance information about these components into intuitive dashboards for easy comparison and analysis. The primary components of the Job System are shown in the following illustration.


Graphic shows the Job System architecture.

Job System Components Used by Job Diagnostics

  • Job Step Scheduler – The Job Step Scheduler is a global component, so there is only one per Enterprise Manager environment. It is scheduled to run by the DBMS Scheduler. Its primary purpose is to mark steps as ready for the dispatcher to execute.

  • Job Dispatcher – The Enterprise Manager Job System also has a notion of short jobs (user jobs that complete quickly) and long jobs (user jobs that run a long time), and has separate worker pools in the OMS (not in the database as with the job workers) to handle those requests. The Job Dispatcher runs locally on each OMS, and its purpose is to dispatch the jobs found by the Job Step Scheduler to the Job Workers. If the dispatcher cannot keep up with the work in the queue, the backlog increases. This is not a problem as long as the backlog is temporary. If the backlog persists, then either the dispatcher cannot keep up with the amount of work, which could mean adding another OMS server, or there is a problem with the Job Workers that prevents them from accepting work from the dispatcher.

  • Job Workers – Job Workers take work for a given job step from the Job Dispatcher and process it. Depending on the step, a worker holds a thread while the step does processing in Java, contacts the repository for steps that use SQL, or contacts the agent for steps that run remotely. If Job Workers are always busy and never free, you need to add capacity, either by adding another OMS server or by increasing the number of Job Workers and potentially the number of database connections (each Job Worker takes a connection to the database).
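The interplay described above, where the Job Step Scheduler marks steps ready and the Job Dispatcher hands them to a fixed number of Job Workers, can be illustrated with a small simulation. This is a hypothetical model, not Enterprise Manager code: the function `simulate_backlog`, the per-tick granularity, and the assumption that every dispatched step finishes within one tick are simplifications. It shows why a temporary burst drains on its own, while a sustained excess of ready steps produces a backlog that keeps growing.

```python
from collections import deque


def simulate_backlog(ready_per_tick, workers):
    """Return the backlog (ready but not yet dispatched steps) after each tick.

    Hypothetical model: in each tick the scheduler marks ready_per_tick[i]
    steps as ready, and the dispatcher hands at most `workers` of them to
    free workers, oldest first. Each dispatched step is assumed to finish
    within the same tick, so all workers are free again at the next tick.
    """
    queue = deque()
    backlog = []
    next_step_id = 0
    for arrivals in ready_per_tick:
        # Step scheduler: mark new steps as ready.
        for _ in range(arrivals):
            queue.append(next_step_id)
            next_step_id += 1
        # Dispatcher: hand ready steps to free workers in arrival order.
        for _ in range(min(workers, len(queue))):
            queue.popleft()
        backlog.append(len(queue))
    return backlog


# A temporary burst drains on its own.
print(simulate_backlog([40, 10, 10], workers=25))  # [15, 0, 0]
# A sustained excess of ready steps grows the backlog every tick.
print(simulate_backlog([30] * 4, workers=25))      # [5, 10, 15, 20]
```

In the first run the burst of 40 steps is absorbed within two ticks. In the second, 30 ready steps per tick against 25 workers adds 5 steps to the backlog every tick, which is the persistent pattern that calls for more capacity.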

Accessing Job Diagnostics

  1. Log in to the Enterprise Manager console as a user with Super Administrator privileges. Note: You must have Super Administrator privileges to access the Job Diagnostics UI.
  2. From the Setup menu, select Manage Cloud Control and then Job Diagnostics. The Job Diagnostics home page displays.

Home (Overview) Dashboard

From the Job Diagnostics home page, you can select the following Job System areas to analyze.


Graphic shows the left job component selector region.

  • Home/Dispatchers: Toggle between the Job Diagnostics Home page and the Dispatchers page. See Dispatchers.
  • Retried Jobs: Jobs that are retried several times. Retries impact Job System performance by consuming excessive resources. For example, a job is retried when its agent is down or unreachable.
  • Retried Steps: Steps that are retried.
  • Longest Queues: Job queues ensure that jobs run on a particular target in a specific order, for example, save target, delete target, update properties. Various subsystems of Enterprise Manager use job queues, and queues are generally used by system jobs. The following table lists common system jobs.

    Table 10-2 Typical System Jobs that use Queues

    | Job Name | Scheduler Job Name | Task |
    |---|---|---|
    | Agent Ping | EM_PING_MARK_NODE_STATUS | Keeps track of the health of the host targets in Enterprise Manager. |
    | Daily Maintenance | EM_DAILY_MAINTENANCE | Performs daily repository maintenance tasks such as partition maintenance, stats updates, etc. |
    | Repository Metrics | MGMT_COLLECTION.Collection Subsystem | Shows the amount of work done for the repository metrics. |
    | Rollup | EM_ROLLUP_SCHED_JOB | Indicates the amount of data involved in the rollup job. |

  • Jobs Executing: View a list of jobs that have been executing for the selected Time Frame.

Job System Overview

The Overview section displays at-a-glance information about the three main elements of the Job System in addition to a list of all steps processed in the selected time frame:


Graphic shows the Job Diagnostics home page.

Dispatchers

This region shows the status of all dispatchers within your Enterprise Manager environment. Currently there is at most one dispatcher per OMS.

Steps Scheduler

The Steps Scheduler marks the steps as ready for execution so that the dispatcher can pick them up for execution.

Book-keeping Steps

Internal Job System steps that help maintain continuity of job execution when various subsystems of Enterprise Manager perform specific actions. For example, book-keeping steps mark jobs, executions, and steps as failed, scheduled, or suspended based on system events such as an agent bounce, a blackout, or a group change.

Steps Processed

List of steps that the Job System processed in a given time frame. The time frame can be fine-grained (from 5 minutes up to a maximum of 1 day). The graph shows steps marked as ready and steps that were executed, and the yellow line displays the backlog of steps. A high backlog indicates that there may be an issue, such as running out of threads.

The table shows the details of all steps that were executed within the selected time frame. Clicking on a step takes you to that step's Job Activity page, where you can view more detailed information.
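To read the backlog line alongside the two step series, the following sketch derives a backlog value per interval from the counts of steps marked ready and steps executed. It assumes, which the dashboard does not state, that backlog equals cumulative steps marked ready minus cumulative steps executed. The helpers `backlog_series` and `intervals_above`, the sample counts, and the threshold value are all hypothetical.

```python
from itertools import accumulate


def backlog_series(marked_ready, executed):
    """Backlog per interval, ASSUMING backlog = cumulative ready - cumulative executed."""
    return [
        ready - done
        for ready, done in zip(accumulate(marked_ready), accumulate(executed))
    ]


def intervals_above(backlog, threshold):
    """Indexes of intervals whose backlog exceeds a (hypothetical) threshold."""
    return [i for i, value in enumerate(backlog) if value > threshold]


# Hypothetical per-interval counts read off the Steps Processed graph.
marked_ready = [100, 120, 150, 90]
executed = [100, 100, 100, 100]

backlog = backlog_series(marked_ready, executed)
print(backlog)                                # [0, 20, 70, 60]
print(intervals_above(backlog, threshold=50)) # [2, 3]
```

Note that the backlog stays elevated in the last interval even though fewer steps were marked ready than executed; a backlog only drains once execution outpaces new work for long enough.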

Retried Jobs

Click Retried Jobs to view the jobs that were retried most often during the specified time frame, along with how many times each was retried.

For example, if the total number of retried jobs is 51 and each job was retried 100 times, then 5100 job cycles were spent retrying jobs, which can represent a significant amount of system resources.
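The arithmetic in this example extends to any mix of retry counts: the total number of execution cycles spent on retries is the sum of the per-job retry counts shown on the Retried Jobs page. The sketch below uses a hypothetical helper, `wasted_retry_cycles`, to reproduce the figure for 51 jobs at 100 retries each.

```python
def wasted_retry_cycles(retry_counts):
    """Total execution cycles spent on retries, given one retry count per job."""
    return sum(retry_counts)


# 51 jobs, each retried 100 times, as in the example above.
print(wasted_retry_cycles([100] * 51))     # 5100
# Retry counts usually differ per job; the total is still a plain sum.
print(wasted_retry_cycles([100, 40, 3]))   # 143
```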

In the following graphic, you can see the top job, SI_NMR, which was retried 100 times before it failed.


Graphic shows the Retried Jobs page.

Clicking on a job takes you to that job’s Job Activity page where you can see which target the job was executed on as well as the output log for that job.


Graphic shows the Job Activity page for the selected Retried Job.

In the above graphic, the Output log shows that NMO is not set up. When an agent is installed, you need to run the root.sh script; this enables the agent to perform actions on the host for several types of jobs. After reaching the maximum of 100 retries, the job is moved to Suspended by User status so that the user can correct the problem before moving the job forward.
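The retry-then-suspend behavior can be sketched as a loop with a cap. This is an illustrative model only: `run_with_retries` and the `attempt` callable are hypothetical, and the real Job System schedules retries asynchronously rather than in a tight loop. The cap of 100 retries and the Suspended by User outcome come from the description above.

```python
MAX_RETRIES = 100  # maximum retries described above


def run_with_retries(attempt, max_retries=MAX_RETRIES):
    """Call attempt() until it returns True or the retry cap is reached.

    Returns (final_status, retries_used). The first call is the original
    execution, so a job that never succeeds is called max_retries + 1 times.
    """
    retries = 0
    while not attempt():
        if retries == max_retries:
            # Cap reached: hand the job back to the user for correction.
            return "Suspended by User", retries
        retries += 1
    return "Succeeded", retries


# A job whose agent is never ready is suspended after 100 retries.
print(run_with_retries(lambda: False))               # ('Suspended by User', 100)

# A job that succeeds on its third execution used 2 retries.
outcomes = iter([False, False, True])
print(run_with_retries(lambda: next(outcomes)))      # ('Succeeded', 2)
```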

Longest Queues

Job Queues ensure that a list of jobs is executed in sequential order. Click Longest Queues to view how many job queues there are and the maximum number of scheduled job executions.


Graphic shows the Longest Queues page.

For example, adding or deleting a target creates several jobs that have to be executed in a particular order. This is accomplished by adding those jobs to a job queue. In the graphic above, this queue has 92 scheduled executions, and the status of the job at the top of the queue (Head Job Status) is Agent is not Ready.
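The head-of-queue effect follows directly from strict ordering: a job that is itself ready to run still waits if any job ahead of it cannot run. The sketch below models a single target's queue as a Python `deque`; `process_queue` and the `is_ready` predicate are hypothetical stand-ins for the Job System's internal checks, such as whether the agent is ready.

```python
from collections import deque


def process_queue(queue, is_ready):
    """Run jobs strictly in FIFO order, stopping at the first job that cannot run.

    Returns the jobs that ran; blocked jobs remain in `queue`.
    """
    ran = []
    while queue and is_ready(queue[0]):
        ran.append(queue.popleft())
    return ran


jobs = deque(["save target", "update properties", "delete target"])
# Hypothetical condition: only "update properties" is blocked (for example,
# because its agent is not ready).
ran = process_queue(jobs, lambda job: job != "update properties")
print(ran)         # ['save target']
print(list(jobs))  # ['update properties', 'delete target']
```

Even though `delete target` would be runnable on its own, it stays queued behind the blocked `update properties` job, which is why the Head Job Status is the first thing to check.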

Click on a Queue Name to view explicit details for that queue. The Queue Details dialog appears.


Graphic shows the Queue Details dialog.

In the Top Job Types table, you see the job types currently stuck in the queue along with the number of executions. To find out why the jobs in the queue are not getting processed, click on the Head of the Queue job name to go to the Job Activity page for the head job.

On the Job Activity page, you will find specific details about why the job at the head of the queue is causing the backlog. In this case, the current status is Agent is not Ready.

Graphic shows the Job Activity page for the job at the head of the queue.

With this knowledge, you can go to the Target Status page to determine what the problem is with the agent, as shown below.

Graphic shows the Target Status page for the problem agent.

In the above webpage, you can see that the target's status is Unknown and that the agent is blocked with a Plug-in Mismatch. If the agent is blocked and unable to upload or accept requests, all job requests on it are delayed until the problem is fixed. The solution here is to resolve the plug-in mismatch. More generally, when the head job's status is Agent is not Ready, the underlying issue can be a plug-in mismatch (as in this case), an agent that is down or in blackout, or any other issue preventing agent communication. Navigate to the agent home page to determine the root cause. Once the issue is resolved, jobs should start running again automatically.

There are also cases where a target is logically obsolete but not yet deleted from Enterprise Manager. Jobs often build up on such targets. Work with your operations team to finish deleting those targets if possible.

Jobs Executing

Click Jobs Executing to view a summary for all jobs that have successfully executed during the selected time frame.


Graphic shows the Jobs Executing page.

Click on a Job Name to view the Job Activity page for that job.

Dispatchers

As mentioned previously, Job Dispatchers are services that handle dispatching the Job Steps for execution. To view the current status of all Dispatchers in your Enterprise Manager environment (one dispatcher per OMS), select Dispatcher from the drop-down menu.



Graphic shows the selection of the Dispatchers menu option.

The Dispatcher dashboard displays.


Graphic shows the Dispatchers dashboard.

This dashboard displays the details for all dispatchers in your Enterprise Manager environment (one per OMS). Click a specific Dispatcher name to display details about that dispatcher. In addition to Status and Up Since, details for the dispatcher's Thread Pool and Connection Pool are also shown.

Thread Pool

Thread pools provide a way to scope the resources used by the Job System. For example, the user short pool defaults to 25 threads, which allows each OMS to run up to 25 short-running user steps concurrently.
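The effect of a fixed-size pool can be demonstrated with Python's `concurrent.futures.ThreadPoolExecutor`, used here only as an analogy for the OMS thread pools. The sketch submits 100 short steps to a pool sized like the default user short pool (25 threads) and records the peak number of steps running at once; `run_steps` and the 10 ms sleep standing in for step work are hypothetical.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

USER_SHORT_POOL_SIZE = 25  # default user short pool size cited in the text


def run_steps(num_steps, pool_size=USER_SHORT_POOL_SIZE):
    """Run num_steps dummy steps on a fixed-size pool; return peak concurrency."""
    lock = threading.Lock()
    running = 0
    peak = 0

    def step(_):
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        time.sleep(0.01)  # stand-in for the step's actual work
        with lock:
            running -= 1

    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        # list() forces all submitted steps to complete before returning.
        list(pool.map(step, range(num_steps)))
    return peak


print(run_steps(100) <= USER_SHORT_POOL_SIZE)  # True
```

However many steps are submitted, the excess steps wait in the executor's queue until a thread frees up, loosely analogous to how a saturated pool appears as backlog on the dashboard.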

Job steps can be categorized into five broad categories:

  • User Short -- End user (short running)
  • User Long -- End user (long running)
  • System Normal -- Steps run by system jobs.
  • System Critical -- Steps run by system jobs.
  • Internal -- Steps created by the Job System for performing low-level actions like step timeouts, grace period timeouts, and bookkeeping steps.

Connection Pool (maximum number of connections allowed for the Dispatcher)

There are three categories of connections:

  • Job Worker -- for the worker threads of the Job System that execute particular steps.
  • Job Receiver -- pool of threads that accept asynchronous status and updates from the agent.
  • Job Dispatcher -- takes care of dispatching the steps to various workers for execution.

If the Job Worker percent usage is high, the dispatcher cannot dispatch to all the workers in a timely fashion. In this situation, there could be a resource problem, and the environment could probably benefit from more worker threads. However, do not more than double the number of threads. If doubling the number of threads does not seem sufficient, contact Oracle, as adding an additional OMS might be a better option.
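The guidance above can be summarized as a simple decision rule. The sketch below is hypothetical: the 80% threshold for "high" usage, the `size_job_workers` function, and the estimated `needed` thread count are assumptions, since the text does not define them. The doubling cap and the advice to contact Oracle come from the text, and because each Job Worker takes a database connection, any increase in threads should be matched in the connection pool.

```python
from dataclasses import dataclass


@dataclass
class Recommendation:
    usage_pct: float
    new_threads: int
    note: str


def size_job_workers(busy, current, needed, high_usage_pct=80.0):
    """Suggest a Job Worker thread count (hypothetical rule of thumb).

    busy: workers currently in use; current: configured worker threads;
    needed: estimated threads required; high_usage_pct: ASSUMED threshold.
    """
    usage = 100.0 * busy / current
    if usage < high_usage_pct:
        return Recommendation(usage, current, "usage normal; no change")
    cap = 2 * current  # never more than double the thread count
    if needed <= cap:
        # Each Job Worker takes a DB connection, so size connections to match.
        return Recommendation(usage, needed, "increase threads and DB connections")
    return Recommendation(
        usage, cap, "doubling is not enough; contact Oracle (another OMS may be better)"
    )


print(size_job_workers(busy=24, current=25, needed=40))
# Recommendation(usage_pct=96.0, new_threads=40, note='increase threads and DB connections')
```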

The graph below the Dispatcher Thread Pool and Connection Pool status shows how many steps were executed, and you can select which pool to display.


Graphic shows the lower graph area highlighted.

This graph is interactive and allows you to choose the pool for which you want to see information, thus allowing you to see which pools are being used more at a specific point in time.