9 Monitor Jobs

The Monitor page displays information about all the Jobs ever started in Oracle Data Integration Platform Cloud, whether they're running, stopped, finished successfully or failed.

The Monitor page is read-only and gives you an aggregate overview of Jobs through four pages: a Summary page for all jobs, plus one page each for jobs created from the Synchronize Data, Replicate Data and Data Lake Management tasks. Each of these pages has three tiles and a table:

Agent Health

This tile reports whether capture and delivery plugins related to Oracle Data Integration Platform Cloud tasks are running on the data sources. This status is different from the status of the agents displayed on the Agents page.

Oracle Data Integration Platform Cloud has plugins on sources and targets for tasks with capture and delivery. These plugins instantiate the capture and delivery processes on the data sources, allocate port numbers, perform management and create reports. The Agent Health tile receives reports from Data Integration Platform Cloud's agents on whether these plugin processes are running. If the plugins stop working, then jobs with capture and delivery can't be performed. This tile displays a two-color ring chart: green for running plugins and red for stopped ones. The percentage of running plugins is displayed at the center of the ring chart. For example, if there are four running plugins and one stopped one, then the center of the ring displays 80%. The numbers of running and stopped plugins are each displayed adjacent to their respective sections of the chart. This chart is not clickable.
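The percentage at the center of the ring can be reproduced with simple arithmetic. The following is a minimal sketch, not the product's implementation: the function name `running_percentage` is hypothetical, and both the rounding to a whole number and the 0% result for an empty chart are assumptions.

```python
def running_percentage(running: int, stopped: int) -> int:
    """Share of running plugins as a whole-number percentage."""
    total = running + stopped
    if total == 0:
        # Assumption: with no plugins reporting, treat the center value as 0%.
        return 0
    # Assumption: the tile rounds to the nearest whole number.
    return round(100 * running / total)


# Four running plugins and one stopped plugin, as in the example above.
print(running_percentage(4, 1))  # 80
```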

The Agents page displays the status of agents, whereas the Agent Health tile displays whether the capture and delivery plugins are stopped or running. The two don't display the same information.

Currently the Agent Health displays information sent from plugins for the Synchronize Data and Replicate Data jobs. All other jobs don't use a plugin. Therefore, the Agent Health doesn't display a correct status for them. For now, use this tile only to review Synchronize Data and Replicate Data jobs.

Currently this tile doesn't distinguish between Replicate Data and Synchronize Data jobs; it shows the sum of all information sent for both of these tasks.

The number on the Agent Health tile may be different than the total number of jobs that you have. The number displayed on this tile is equivalent to the number of plugins that you have. These plugins start on your sources and targets after you set up your agents and set up your GG_HOME directories for the Synchronize Data and Replicate Data tasks. For example, if you have two GG_HOME directories, one for source and one for target of a Synchronize Data job, then you have two plugins. So you have one job, but two plugins, and if both are running, you'll see the number 2 in the Agent Health tile with a status of Running.

Job Health

The Job Health tile located in the Monitor page of Oracle Data Integration Platform Cloud displays information for all the Jobs ever run on the server.

The Job Health tile displays a tri-color ring chart: green for running, red for failed and yellow for stopped Jobs. The percentage of running Jobs is displayed at the center of the ring chart. For example, if there are three running Jobs, one stopped and one failed Job, then the center of the ring displays 60%. The numbers of running, failed and stopped Jobs are each displayed adjacent to their respective sections of the chart. This chart is not clickable.

For visual simplicity, Jobs with status of Prepared, Waiting, Running or Successful are all counted in the Running section of the chart, because they're neither Stopped nor Failed.
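The grouping described above can be sketched as follows. This is an illustration only: the function `job_health` and the plain-string statuses are hypothetical, and rounding to a whole percentage is an assumption.

```python
from collections import Counter

# Statuses that the tile counts in its green "Running" section.
RUNNING_GROUP = {"Prepared", "Waiting", "Running", "Successful"}


def job_health(statuses):
    """Return (section counts, center percentage) for a list of job statuses."""
    sections = Counter()
    for status in statuses:
        if status in RUNNING_GROUP:
            sections["Running"] += 1
        elif status in ("Stopped", "Failed"):
            sections[status] += 1
        else:
            raise ValueError(f"Unknown job status: {status}")
    total = sum(sections.values())
    # Assumption: the center value is rounded to a whole number.
    percent = round(100 * sections["Running"] / total) if total else 0
    return dict(sections), percent


counts, percent = job_health(["Running", "Successful", "Waiting", "Stopped", "Failed"])
print(counts)   # {'Running': 3, 'Stopped': 1, 'Failed': 1}
print(percent)  # 60
```

Here a Successful job and a Waiting job both land in the Running section, which is why the center shows 60% even though only one job is actually executing.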

Top Duration Report

The Top Duration tile in the Monitor page of Oracle Data Integration Platform Cloud displays the top three jobs with the longest duration.

  • For jobs that are currently running, duration is current time minus the job's start time.

  • For jobs that ran successfully and ended without being stopped, it's the job's end time minus its start time.

Failed jobs, and jobs that were stopped for a while before finishing, don't display the actual running time:

  • For a failed job, currently the duration is calculated as current time minus the job's start time, instead of the time that job stopped due to failure minus the job’s start time.

  • For a job that ran successfully, but stopped for a duration in between, the duration is still calculated as the job's end time minus its start time.

For example:

  • Job A runs for an hour (started at 2:00PM)

  • Job A is stopped for 3 hours (between 3:00 and 6:00PM)

  • Job A is restarted (restarted at 6:00PM)

  • Job A finishes successfully after an additional hour (stops successfully at 7:00PM)

The duration in this example will display five hours and you won't know that the job ran for two hours and was stopped for three hours.
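The Job A timeline can be worked through in code. This is a minimal sketch of the calculation as described in this section, not the product's implementation; the function `displayed_duration` and the calendar date are hypothetical.

```python
from datetime import datetime


def displayed_duration(start, end=None, now=None):
    """End time minus start time for finished jobs, current time minus start time otherwise."""
    if end is not None:
        reference = end
    else:
        reference = now if now is not None else datetime.now()
    return reference - start


day = datetime(2024, 1, 1)            # arbitrary date, for illustration only
start = day.replace(hour=14)          # Job A starts at 2:00PM
stopped_at = day.replace(hour=15)     # stopped at 3:00PM
restarted_at = day.replace(hour=18)   # restarted at 6:00PM
end = day.replace(hour=19)            # finishes at 7:00PM

shown = displayed_duration(start, end)
actually_ran = (stopped_at - start) + (end - restarted_at)
paused = restarted_at - stopped_at

print(shown)         # 5:00:00
print(actually_ran)  # 2:00:00
print(paused)        # 3:00:00
```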

Top 3 Data Report

The Top 3 Data Report tile in Oracle Data Integration Platform Cloud displays the top three Jobs with highest data volume in the past three days. This tile is available in the Synchronize Data, Replicate Data and Data Lake Management pages of the Monitor page.

Data volume is the number of operations replicated in the past three days, measured in megabytes per second.

Go to the Admin page and then click Data Usage to narrow down your durations and get snapshots for:

  • Total Data Processed (GB)

  • GB of Data Processed per Hour

Top 3 Lag Report

The Top 3 Lag Report tile in Oracle Data Integration Platform Cloud displays the top three running Jobs with the most lag. This tile is only available in the Synchronize Data and Replicate Data pages of the Monitor page.

The bar chart on this tile displays a color for each bar, with the color legend displaying the name of the corresponding Job.

Here, the lag is only for the Start Delivery action in the job detail page of the Synchronize Data and Replicate Data pages.

The lag displayed in the Top 3 Lag Report is not the lag displayed in the job detail page, which represents the total lag of a job. It is the Start Delivery lag, and Start Delivery is an action that exists only in Synchronize Data and Replicate Data jobs.
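The selection this tile performs amounts to filtering for running jobs and keeping the three with the largest Start Delivery lag. The sketch below illustrates that idea only: the job records, the field names `status` and `delivery_lag_s`, and the function `top_lag` are all hypothetical and don't correspond to a product API.

```python
from heapq import nlargest

# Hypothetical job records; the field names are assumptions for illustration.
jobs = [
    {"name": "sync_orders", "status": "Running", "delivery_lag_s": 42.0},
    {"name": "replicate_hr", "status": "Running", "delivery_lag_s": 7.5},
    {"name": "sync_inventory", "status": "Stopped", "delivery_lag_s": 120.0},
    {"name": "replicate_sales", "status": "Running", "delivery_lag_s": 18.0},
    {"name": "sync_customers", "status": "Running", "delivery_lag_s": 3.0},
]


def top_lag(jobs, n=3):
    """Return the n running jobs with the largest Start Delivery lag."""
    running = [job for job in jobs if job["status"] == "Running"]
    return nlargest(n, running, key=lambda job: job["delivery_lag_s"])


for job in top_lag(jobs):
    print(job["name"], job["delivery_lag_s"])
```

The stopped job is excluded even though it has the largest lag value, because the tile considers only running Jobs.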

Job Details

The Job Details section of the Monitor page in Oracle Data Integration Platform Cloud displays all the Jobs ever started.

The Summary section of the Monitor page displays details for all the jobs ever started. You can go to the Synchronize Data, Replicate Data and Data Lake Management pages to get a list of jobs for these specific tasks.

The Jobs in each of these pages are listed in a table. Click each job to see its full details.

The Type of a job in the Summary page includes jobs created from all tasks including ODI Execution and Data Preparation.

The Status of a job can be Prepared, Waiting, Running, Stopped, Successful or Failed.

Job Statuses

Every time you run an Oracle Data Integration Platform Cloud task, a new Job is created.

Job Status Life Cycle

When you click Run in the action menu of a task in the Catalog, you create a Job. A Job goes through many phases before it actually runs. Here is a list of all the states that a Job, or the actions it contains, may be in.

| Job status | Description |
| --- | --- |
| Prepared | The actions in the Job are being transformed into the correct format, so that they're ready to run. |
| Waiting | The orchestration service has received a message to run a certain action, but the action has not run yet. It could be, for example, that the runtime engine is overloaded at the time and can't run the action. |
| Running | The Job or action is running. |
| Successful | The Job or action has run and finished successfully. |
| Stopped | The Job has been manually stopped. |
| Failed | The execution of a Job or one or more of its actions failed. |

Heartbeat Lag

The Heartbeat Lag column appears only for Replicate Data jobs. It also requires a few configuration steps to display the right information.

In a Replicate Data Task, the source agent sends periodic signals, called heartbeats, from the source to the target to monitor the lag. The source agent creates these heartbeats by sending an artificial transaction to the target every 10 seconds. These transactions contain the timing information that is used for the heartbeats. The agent also creates a heartbeat table in the source and target databases to keep track of the heartbeat history.

Heartbeat lag is the difference between the time the source agent sends a signal and the time the target captures it. You can find the heartbeat lag for your job on its Job Details page. By default, this column is not visible. You need to click Customize the Table View (the gear icon), and then select Heartbeat Lag for display.
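As a rough illustration of the definition above, heartbeat lag is a subtraction of two timestamps. The function `heartbeat_lag` and the sample times are hypothetical, and the sketch assumes that the source and target clocks are comparable.

```python
from datetime import datetime, timedelta


def heartbeat_lag(sent_at: datetime, captured_at: datetime) -> timedelta:
    """Time between the source sending a heartbeat and the target capturing it."""
    return captured_at - sent_at


# Hypothetical timestamps: the heartbeat is captured three seconds after it is sent.
sent = datetime(2024, 1, 1, 12, 0, 0)
captured = datetime(2024, 1, 1, 12, 0, 3)

print(heartbeat_lag(sent, captured).total_seconds())  # 3.0
```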

Configure Heartbeat Lag

For the Heartbeat Lag to work, you must create a Connection with a DBA user named ggadmin. This user must have DBA privileges on both the source and target databases. When you create the Replicate Data Task, use the ggadmin connection with your desired schema.

You can also adjust the Heartbeat interval. The default value is 10 seconds. You can change this value in the agent.properties file. See Set Your Agent Properties.

Example 9-1 agentHeartBeatInterval

# agentHeartBeatInterval
# : This is the interval in seconds at which heart-beat is sent to the server
# Default value : 10
#agentHeartBeatInterval=10
agentHeartBeatInterval=20
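To check which interval is in effect after editing the file, you can read the uncommented assignment yourself. The following is a minimal sketch that is not part of the product: the function `read_heartbeat_interval` is hypothetical, it parses the text of the file rather than assuming a file location, and letting the last uncommented assignment win is an assumption.

```python
def read_heartbeat_interval(text: str, default: int = 10) -> int:
    """Return agentHeartBeatInterval from properties text, or the default of 10."""
    interval = default
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and commented-out lines such as "#agentHeartBeatInterval=10".
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == "agentHeartBeatInterval":
            # Assumption: if the key appears more than once, the last value wins.
            interval = int(value.strip())
    return interval


sample = """\
# agentHeartBeatInterval
# Default value : 10
#agentHeartBeatInterval=10
agentHeartBeatInterval=20
"""

print(read_heartbeat_interval(sample))  # 20
```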