3 OMS and Repository

The OMS and Repository, Oracle Management Service, OMS Console, and OMS Platform targets expose metrics that are useful for monitoring the Oracle Enterprise Manager Management Service (OMS) and Management Repository.

3.1 Active Agents

This category of metrics provides information on active agents.

3.1.1 Number of Active Agents

The number of active agents in the repository. If this number is 0, then Enterprise Manager is not monitoring any external targets, which may indicate a problem if it is unexpected.

Data Source

The number of agents whose status is up in the mgmt_current_availability table.

User Action

If no agents are running, determine why they are down, correct the cause if needed, and restart them. Log files in the agent's $ORACLE_HOME/sysman/log directory can provide information about possible causes of agent shutdown.

3.2 Active Loader Status

This category provides information on Loader usage and performance, including throughput and rows processed in the last hour.

3.2.1 Rows Processed in the Last Hour

This is the number of rows processed by the loader in the last hour.

Data Source

The mgmt_system_performance_log table in the Management Repository.

User Action

If this number continues to rise over time, then the user may want to consider adding another Management Service or increasing the number of loader threads for this Management Service. To increase the number of loader threads, add or change the em.loader.threadPoolSize entry in the emoms.properties file. The default number of threads is 2. Values between 2 and 10 are common.

3.2.2 Total Loader Runtime in the Last Hour (seconds)

This is the amount of time in seconds that the loader thread has been running in the past hour.

Data Source

The mgmt_system_performance_log table in the Management Repository.

User Action

If this number is steadily increasing along with the Loader Throughput (rows per hour) metric, then perform the actions described in the User Action section of the help topic for the Loader Throughput (rows per hour) metric. If this number increases but the loader throughput does not, check for resource constraints, such as high CPU utilization by some process, deadlocks in the Management Repository database, or processor memory problems.

3.3 Active Management Servlets

This category of metrics provides information on active management servlets.

3.3.1 Notifications Processed

The total number of notifications delivered by the Management Service over the previous 10 minutes. The metric is collected every 10 minutes and no alerts are generated.

Data Source

The mgmt_system_performance_log table in the Management Repository.

User Action

If the number of notifications processed is continually increasing over several days, then you may want to consider adding another Management Service.

3.3.2 Page Hits (per minute)

This metric indicates the average number of EM console accesses per minute. The metric is collected every 10 minutes and no alerts are generated.

Data Source

This metric is obtained using the following query of the mgmt_oms_parameters table in the Management Repository.

SELECT value FROM mgmt_oms_parameters
WHERE name = 'loaderOldestFile'

User Action

None.

3.4 Agent Status

This category of metrics provides information on the agent status.

3.4.1 Number of Agent Restarts

The number of times the agent has been restarted in the past 24 hours.

Data Source

Derived by:

(SELECT t.target_name, COUNT(*) down_count
  FROM mgmt_availability a, mgmt_targets t
  WHERE a.start_collection_timestamp = a.end_collection_timestamp
    AND a.target_guid = t.target_guid
    AND t.target_type = MGMT_GLOBAL.G_AGENT_TARGET_TYPE
    AND a.start_collection_timestamp > SYSDATE-1
  GROUP BY t.target_name)

User Action

If this number is high, check the agent logs to see if a system condition exists causing the system to bounce. If an agent is constantly restarting, the Targets Not Uploading Data metric may also be set for targets on the agents with restart problems. Restart problems may be due to system resource constraints or configuration problems.

3.5 Configuration

This category of metrics provides information on configuration.

3.5.1 Number of Administrators

The number of administrators defined for Enterprise Manager.

Data Source

The mgmt_created_users table in the Management Repository.

3.5.2 Number of Groups

The number of groups defined for Enterprise Manager.

Data Source

The mgmt_targets table in the Management Repository.

User Action

If you have a problem viewing the All Targets page, you may want to check the number of roles and groups.

3.5.3 Number of Roles

The number of roles defined for Enterprise Manager.

Data Source

The mgmt_roles table in the Management Repository.

User Action

If you have a problem viewing the All Targets page, you may want to check the number of roles and groups.

3.5.4 Number of Targets

The number of targets defined for Enterprise Manager.

Data Source

The mgmt_targets table in the Management Repository.

User Action

This metric is informational only.

3.5.5 Repository Tablespace Used

This is the total number of MB that the Management Repository tablespaces are currently using.

Data Source

The dba_data_files table in the Management Repository.

User Action

This metric is informational only.

3.5.6 Target Addition Rate (Last Hour)

The rate at which targets are being created. The target addition rate should be greatest shortly after EM is installed and then should increase briefly whenever a new agent is added. If the rate is increasing abnormally, check for abnormal agent or administrator activity and verify that the targets are useful. Also check that group creation is not being overused.

Data Source

The metric is derived from the mgmt_targets table. It is calculated as the current target count minus the target count at the last sampling.
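The subtraction described above can be sketched in a few lines of Python. This is only an illustration, not the repository's implementation; the function name `addition_rate` and the sample counts are hypothetical.

```python
def addition_rate(current_count: int, previous_count: int) -> int:
    """Return the number of items added since the last sampling.

    Hypothetical helper that mirrors "current count minus count at
    last sampling". A negative result means that more items were
    removed than added between the two samples.
    """
    return current_count - previous_count


# Made-up target counts from two consecutive samples.
print(addition_rate(current_count=1250, previous_count=1238))  # prints 12
```

The same arithmetic applies to any metric that is defined as the difference between two consecutive counts.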

User Action

This metric is informational only.

3.5.7 Total Repository Tablespace

The total MB allocated to the Management Repository tablespaces. This will always be greater than or equal to the space used.

Data Source

The dba_free_space table in the Management Repository.

User Action

This metric is informational only.

3.5.8 User Addition Rate (Last Hour)

The rate at which users are being created. The user addition rate should be low. If the rate is increasing abnormally, check for abnormal administrator activity.

Data Source

The metric is derived from the mgmt_created_users table. It is calculated as the current user count minus the user count at the last sampling.

User Action

This metric is informational only.

3.6 DBMS Job Status

This category of metrics provides information on the DBMS job status.

3.6.1 DBMS Job Invalid Schedule

This metric flags a DBMS job whose schedule is invalid. A schedule is marked 'Invalid' if it is scheduled for more than one hour in the past, or more than one year in the future. An invalid schedule means that the job is in serious trouble.
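The validity rule above can be sketched as a small check. This Python example is an illustration under stated assumptions: the function name `is_invalid_schedule` is hypothetical, the next run time is assumed to be available as a datetime value, and one year is approximated as 365 days.

```python
from datetime import datetime, timedelta


def is_invalid_schedule(next_run: datetime, now: datetime) -> bool:
    """Apply the stated rule to a job's next scheduled run time.

    Hypothetical helper: a schedule is treated as invalid when it is
    more than one hour in the past or more than one year (assumed to
    be 365 days) in the future.
    """
    too_old = next_run < now - timedelta(hours=1)
    too_far = next_run > now + timedelta(days=365)
    return too_old or too_far


now = datetime(2024, 1, 1, 12, 0, 0)
# 90 minutes in the past: invalid.
print(is_invalid_schedule(datetime(2024, 1, 1, 10, 30, 0), now))  # prints True
# 5 minutes in the future: valid.
print(is_invalid_schedule(datetime(2024, 1, 1, 12, 5, 0), now))   # prints False
```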

Data Source

The user_scheduler_jobs table in the Management Repository.

User Action

None.

3.6.2 DBMS Job Processing Time (% of Last Hour)

The percentage of the past hour the job has been running.

Data Source

The mgmt_system_performance_log table in the Management Repository.

User Action

If the value of this metric is greater than 50%, then there may be a problem with the job. Check the System Errors page for errors reported by the job. Check the Alerts log for any alerts related to the job.

3.6.3 DBMS Job UpDown

The down condition equates to the dbms_job "broken" state. The Up Arrow means not broken.

Data Source

The broken column is from the all_jobs table in the Management Repository.

User Action

Determine the reason for the dbms job failure. Once the reason for the failure has been determined and corrected, the job can be restarted through the dbms_job.run command.

To determine the reason the DBMS job failed, take the following steps:

  1. Note the name of the DBMS job that is down, as shown in its row in the table. This DBMS Job Name is 'yourDBMSjobname' in the following example.

  2. Log in to the database as the repository owner.

  3. Issue the following SQL statement:

    select dbms_jobname 
      from mgmt_performance_names 
      where display_name='yourDBMSjobname';
    
  4. If the dbms_jobname is 'myjob', then issue the following SQL statement:

    select job
      from all_jobs
      where what='myjob';
    
  5. Using the job ID returned, look for ORA-12012 messages for this job ID in the alert log and trace files, and try to determine and correct the problem.

The job can be manually restarted through the following database command:

execute dbms_job.run (jobid);

3.6.4 DBMS Job Throughput Per Second

The number of notifications delivered per second, averaged over the past hour.

Data Source

The mgmt_system_performance_log table in the Management Repository.

User Action

This metric is informational only.

3.7 Event Status

This metric category provides information about the repository metrics that track the health of the event system.

Note:

The event system maintains queues in a database queue table (em_event_bus_table) for processing events and its corresponding database view is aq$em_event_bus_table.

3.7.1 Average Event Dequeue Time (Milliseconds)

This metric displays the average time taken to dequeue a message from the event queues.

Data Source

The data for this metric comes from entries in mgmt_system_performance_log where name='DequeueTime'.

User Action

If the average event dequeue time is consistently high for more than an hour, then it might indicate that the database queue table (em_event_bus_table) requires maintenance.

3.7.2 Average Event Processing Time (Seconds)

This metric displays the average time taken to apply the incident rules to an event, incident, or problem.

Data Source

The data for this metric comes from entries in mgmt_system_performance_log where name='ProcessingTime'.

User Action

If the average event processing time is continually increasing for more than an hour, then remove any unnecessary or out-of-date incident rules. Fix any operational issues present in the Management Repository database. You might have to add an additional Management Service to increase the event system capacity.

3.7.3 Event Queue Query Time (Seconds)

This metric displays the average time taken to query the event bus (aq$em_event_bus_table).

Data Source

The data for this metric comes from entries in mgmt_system_performance_log where name= 'QueryTime'.

User Action

If the average event queue query time is consistently high for more than an hour, then it might indicate that the database queue table (em_event_bus_table) requires maintenance.

3.7.4 Average Event Queue Wait Time (Seconds)

This metric displays the average time that an event, incident, or problem waits in the event queue before it is picked up for applying incident rules.

Data Source

The data for this metric comes from entries in mgmt_system_performance_log where name='Latency'.

User Action

If the average latency is continually increasing, then remove any unnecessary or out-of-date incident rules. Fix any operational issues present in the Management Repository database. You might have to add an additional Management Service to increase the event system capacity.

3.7.5 Event System UpDown

This metric displays the status of the event system. If there are events waiting to be processed in the event queues and no events were processed for more than 5 minutes, then the event system is DOWN. Otherwise, the event system is UP.
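The status rule above can be sketched as follows. This Python example is an illustration only: the function name `event_system_status` and its arguments are hypothetical, and it assumes that the number of pending events and the time of the most recently processed event are already known.

```python
from datetime import datetime, timedelta


def event_system_status(pending_events: int,
                        last_processed: datetime,
                        now: datetime) -> str:
    """Return "DOWN" or "UP" according to the stated rule.

    Hypothetical helper: the event system is DOWN only when events are
    waiting and nothing has been processed for more than 5 minutes.
    """
    stalled = now - last_processed > timedelta(minutes=5)
    return "DOWN" if pending_events > 0 and stalled else "UP"


now = datetime(2024, 1, 1, 12, 0, 0)
# Events are waiting and the last one was processed 6 minutes ago.
print(event_system_status(3, datetime(2024, 1, 1, 11, 54, 0), now))  # prints DOWN
# No events are waiting, so an idle period does not count as a failure.
print(event_system_status(0, datetime(2024, 1, 1, 11, 0, 0), now))   # prints UP
```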

Data Source

The data for this metric comes from entries in the aq$em_event_bus_table view and the em_event_bus_queues table.

User Action

Restart the Management Service. This might resolve the issue.

3.7.6 Queues With Invalid Listener

If any of the event queues is assigned to a nonexistent Management Service, then this metric has a positive count. Otherwise, it is 0.

Data Source

The data for this metric comes from entries in em_event_bus_queues, em_event_coordinators and mgmt_failover_table.

User Action

If the value stays positive continually for more than half an hour, then restart Oracle Management Service, which might resolve the issue.

3.7.7 Queues With No Listener

If any event queue is not assigned to a Management Service, this metric has a positive value. Otherwise, it is 0.

Data Source

The data for this metric comes from entries in em_event_bus_queues, em_event_coordinators and mgmt_failover_table.

User Action

If the value stays positive continually for at least half an hour, then restart Oracle Management Service, which might resolve the issue.

3.7.8 Total Events Pending

This metric displays the number of events waiting to be processed in event queues.

Data Source

The data for this metric comes from entries in aq$em_event_bus_table, where consumer_name does not start with 'ADM' and msg_state is 'READY'.
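The filter described above can be illustrated outside the database. The following Python sketch counts pending events from rows that have already been fetched. The function name `count_pending_events`, the row dictionaries, and the sample consumer names are hypothetical; only the two column names (consumer_name and msg_state) come from the text.

```python
def count_pending_events(rows):
    """Count rows that are READY and whose consumer does not start with 'ADM'."""
    return sum(
        1
        for row in rows
        if row["msg_state"] == "READY"
        and not row["consumer_name"].startswith("ADM")
    )


# Made-up sample rows standing in for the aq$em_event_bus_table view.
sample_rows = [
    {"consumer_name": "H_QUEUE_1", "msg_state": "READY"},
    {"consumer_name": "ADM_QUEUE", "msg_state": "READY"},
    {"consumer_name": "L_QUEUE_1", "msg_state": "PROCESSED"},
    {"consumer_name": "M_QUEUE_1", "msg_state": "READY"},
]
print(count_pending_events(sample_rows))  # prints 2
```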

User Action

If the value stays high continually for at least half an hour, you might be experiencing an event flood and the issue could be temporary. Fix any operational issues present in the Management Repository database. You might need to add an additional Oracle Management Service to increase the event system capacity.

3.7.9 Total Events Processed (Last Hour)

This metric displays the total number of events currently waiting to be processed in event queues.

Data Source

The data for this metric comes from entries in aq$em_event_bus_table, where consumer_name does not start with 'ADM' and msg_state is 'READY'.

User Action

If the value stays high continually, you might experience an event flood. You might have to add an additional Oracle Management Service to increase the event system capacity.

3.8 Event Performance

This metric category provides information about the repository metrics, which track the performance of the event system for each queue type.

  • Queue type 'H' represents high priority queues. High priority queues are used for processing target availability events.

  • Queue Type 'M' represents medium priority queues. Medium priority queues are used for processing events such as metric alerts, which have noninformational severities, such as CRITICAL or WARNING.

  • Queue type 'L' represents low priority queues. Low priority queues are used for processing INFORMATIONAL events.

3.8.1 Event Processing Time (% of Last Hour)

This metric measures the percentage of CPU time elapsed while processing the load from the queues of a specific queue type.

Data Source

The data for this metric comes from entries in mgmt_system_performance_log where name= 'ProcessingTime'.

User Action

If the value stays high continually, you might experience an event flood. You might have to add an additional Oracle Management Service to increase the event system capacity.

3.8.2 Events Processed (Last Hour)

This metric displays the number of events currently waiting to be processed in event system queues of a specific queue type.

Data Source

The data for this metric comes from entries in the mgmt_system_performance_log where name= 'ProcessingTime'.

User Action

If the value stays high continually, you might experience an event flood. You might have to add an additional Oracle Management Service to increase the event system capacity.

3.8.3 Pending Event Count

This metric displays the number of events currently waiting to be processed in event system queues of a specific queue type.

Data Source

The data for this metric comes from entries in aq$em_event_bus_table, where consumer_name starts with a specific queue type value and msg_state is 'READY'.

User Action

If the value stays high continually, you might experience an event flood. You might have to add an additional Oracle Management Service to increase the event system capacity.

3.9 Expired Agent initiated (emctl) Blackouts

This metric category provides information about expired blackouts started from the Enterprise Manager command line utility (emctl).

3.9.1 Blackout Name

This metric displays the name of the blackout.

Target Version    Collection Frequency
All Versions      Every Hour

3.9.2 Agent URL

This metric displays the URL of the Management Agent that is installed on the host.

Target Version    Collection Frequency
All Versions      Every Hour

3.9.3 Number of Targets affected

This metric displays the number of targets affected by the expired blackout.

Table 3-1 Metric Summary Table

Target Version: All Versions
Evaluation and Collection Frequency: Every Hour
Default Warning Threshold: 0
Default Critical Threshold: Not Defined
Alert Text: Agent with url %emd_url% has an expired blackout (%blackout_name%) with (%target_count%) affected targets.


3.10 Incident

This category of metrics provides information on the Incident target.

3.10.1 Alert Log Error Trace File

The alert log error trace file is the name of an associated server trace file generated when the problem generating this incident occurred. If no additional trace file was generated, this field will be blank.

The following table shows how often the metric's value is collected.

Target Version    Collection Frequency
All Versions      Every 5 Minutes

Data Source

The alert log error trace file name is extracted from the database alert log.

User Action

The alert log error trace file name is provided so that the user can look in this file for more information about the problem that occurred.

3.10.2 Alert Log Name

The fully specified (includes directory path) name of the current XML alert log file.

The following table shows how often the metric's value is collected.

Target Version    Collection Frequency
All Versions      Every 15 Minutes

Data Source

This name is retrieved by searching the OMS ADR_HOME/alert directory for the most recent (current) log file.

User Action

The alert log file name is provided so that the user can look in this file for more information about the problem that occurred.

3.10.3 Diagnostic Incident

A diagnostic incident is a single occurrence of a problem (critical error) that occurred in the OMS process while using Enterprise Manager.

The following table shows how often the metric's value is collected.

Target Version    Collection Frequency
All Versions      Every 15 Minutes

Data Source

Text describing a diagnostic incident is extracted from the database alert log, which is an XML file stored in the Automatic Diagnostic Repository (ADR) that stores a chronological list of database messages and errors.

User Action

Diagnostic incidents usually indicate software errors and should be reported to Oracle using the Enterprise Manager Support Workbench.

3.10.4 ECID

The Execution Context ID (ECID) tracks requests as they move through the application server. This information is useful for diagnostic purposes because it can be used to correlate related problems encountered by a single user attempting to accomplish a single task.

The following table shows how often the metric's value is collected.

Target Version    Collection Frequency
All Versions      Every 15 Minutes

Data Source

The ECID is extracted from the database alert log.

User Action

Diagnostic incidents usually indicate software errors and should be reported to Oracle using the Enterprise Manager Support Workbench. When packaging problems using Support Workbench, the ECID will be used by Support Workbench to correlate and include any additional problems in the package.

3.10.5 Impact

An optional field (may be empty) assessing the impact of the problem that occurred.

The following table shows how often the metric's value is collected.

Target Version    Collection Frequency
All Versions      Every 15 Minutes

Data Source

The impact is extracted from the database alert log.

User Action

This field is purely informational. Diagnostic incidents usually indicate software errors and should be reported to Oracle using the Enterprise Manager Support Workbench.

3.10.6 Incident ID

The Incident ID is a number that uniquely identifies a diagnostic incident (single occurrence of a problem).

The following table shows how often the metric's value is collected.

Target Version    Collection Frequency
All Versions      Every 15 Minutes

Data Source

The incident ID is extracted from the database alert log.

User Action

Diagnostic incidents usually indicate software errors and should be reported to Oracle using the Enterprise Manager Support Workbench. Problems are one or more occurrences of the same incident. Using Support Workbench, the incident ID can be used to select the correct Problem to package and send to Oracle. Using the command line tool ADRCI, the incident ID can also be used with the show incident command to get details about the incident.

3.11 Job Dispatcher Performance

This category of metrics provides information on the performance of the job dispatcher.

3.11.1 Job Dispatcher Processing Time (% of Last Hour)

The job dispatcher is responsible for scheduling jobs as required. It starts up periodically and checks whether jobs need to be run. If the job dispatcher is running for more than the threshold percentage of the hour, then it is having problems handling the job load.

Data Source

This is the sum of the amount of time the job has run over the last hour from the mgmt_system_performance_log table in the Management Repository divided by one hour, multiplied by 100 to arrive at the percent.
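The calculation described above amounts to the following arithmetic. This Python sketch is an illustration only; the function name `percent_of_last_hour` and the sample durations are hypothetical, and the run durations are assumed to be expressed in seconds.

```python
SECONDS_PER_HOUR = 3600.0


def percent_of_last_hour(durations_seconds):
    """Sum the run durations (in seconds) and express them as a percent of one hour."""
    return sum(durations_seconds) / SECONDS_PER_HOUR * 100.0


# Made-up durations of three dispatcher runs within the last hour:
# 300 + 450 + 150 = 900 seconds, which is a quarter of an hour.
print(percent_of_last_hour([300, 450, 150]))  # prints 25.0
```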

User Action

This metric is informational only.

3.11.2 Job Steps Per Second

The number of job steps processed per second by the job dispatcher, averaged over the past hour and sampled every 10 minutes.

Data Source

The mgmt_job_execution table in the Management Repository.

User Action

This metric is informational only.

3.12 Metric Collection Errors Cleared (Per Day)

This metric category provides information about the metric collection errors cleared each day.

3.12.1 Number Of Errors Cleared

This metric displays the number of metric collection errors cleared for the day.

Target Version    Collection Frequency
All Versions      Every Day

3.13 Metric Collection Errors Reported (Per Day)

This metric category provides information about the metric collection errors reported each day.

3.13.1 Number Of Errors Reported

This metric displays the number of metric collection errors reported each day.

Table 3-2 Metric Summary Table

Target Version: All Versions
Evaluation and Collection Frequency: Every Day
Default Warning Threshold: Not Defined
Default Critical Threshold: Not Defined
Alert Text: Number of metric collection error %value%, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold.


3.14 Notification Delivery Performance

This metric category provides information about repository metrics that report various performance measures for notification methods in use. Supported notification methods include:

  • EMAIL

  • OSCMD

  • PLSQL

  • SNMP

  • SNMPV3 (Release 12c (12.1.0.4) and later)

  • JAVA

  • TICKET

3.14.1 Average Notification Time (seconds)

This metric displays the average time, in seconds, taken to deliver a notification, measured from the time the issue was published to Enterprise Manager.

Data Source

  • The data for this metric comes from the em_notify_requests and em_notify_deliveries tables (Release 12c (12.1.0.4) and later)

  • The data for this metric comes from entries in mgmt_system_performance_log where name=method_name||_OMS_SECONDS

User Action

If the average notification time is steadily increasing, verify that the notification methods specified are performing as expected. Remove any unnecessary or out-of-date incident rules.

3.14.2 Notification Processing Time (% of Last Hour)

This metric displays the percentage of CPU (elapsed time) that the notification system was active in sending notifications.

Data Source

  • The data for this metric comes from the em_notify_requests and em_notify_deliveries tables (Release 12c (12.1.0.4) and later)

  • The data for this metric comes from entries in mgmt_system_performance_log where name=method_name||DELIVERY_MILLIS (All releases earlier than 12c (12.1.0.4))

User Action

If the notification processing time is steadily increasing, then verify that the notification methods are performing as expected. Remove any unnecessary or out-of-date incident rules.

3.14.3 Notification Processed (Last Hour)

This metric displays the total number of notifications delivered by the Management Service over the past hour.

Data Source

  • The data for this metric comes from the em_notify_requests and em_notify_deliveries tables (Release 12c (12.1.0.4) and later)

  • The data for this metric comes from entries in mgmt_system_performance_log where name=method_name||DELIVERY_MILLIS (All releases earlier than 12c (12.1.0.4))

User Action

If the number of notifications processed is continually increasing over several days, then you might want to consider adding another Management Service.

3.14.4 Pending Notification Count

This metric displays the total number of notifications pending delivery.

Data Source

  • The data for this metric comes from the em_notify_requests and em_notify_deliveries tables (Release 12c (12.1.0.4) and later)

  • The data for this metric comes from aq$em_notify_qtable where consumer_name like 'method_name%' (All releases earlier than 12c (12.1.0.4))

User Action

If the number of notifications pending is continually increasing, then fix any operational issues. You might have to add an additional Management Service to increase the notification system capacity.

3.15 Notification Status

This is a Management Agent metric intended to send out-of-band notifications when the notification system is determined to be in a critical state. Note that the notification system uses a database queue table (em_notify_qtable) for managing its delivery work load and its corresponding database view is AQ$EM_NOTIFY_QTABLE.

3.15.1 Notification UpDown

If one of the following conditions exists, then the metric's value is DOWN. Otherwise, it is UP.

  • More than 500 notifications have been waiting to be completed for more than one hour.

  • An advanced notification device has not processed notifications for more than one hour and has requested that notifications be retried later.
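The two conditions above can be combined into a single check, sketched here in Python. The function name `notification_status` and its arguments are hypothetical. The sketch assumes that two counts are already available: the number of notifications that have been waiting for more than one hour, and the number of advanced notification devices that meet the second condition.

```python
PENDING_LIMIT = 500  # limit stated in the first condition


def notification_status(waiting_over_one_hour: int,
                        stalled_devices_requesting_retry: int) -> str:
    """Return "DOWN" if either condition holds, otherwise "UP".

    Hypothetical helper. Both arguments are assumed to be counts that
    were computed elsewhere, for example by repository queries.
    """
    too_many_waiting = waiting_over_one_hour > PENDING_LIMIT
    device_stalled = stalled_devices_requesting_retry > 0
    return "DOWN" if too_many_waiting or device_stalled else "UP"


print(notification_status(501, 0))  # prints DOWN
print(notification_status(500, 0))  # prints UP
print(notification_status(0, 1))    # prints DOWN
```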

Table 3-3 Metric Summary Table

Target Version: All Versions
Evaluation and Collection Frequency: Every 5 Minutes
Default Warning Threshold: Not Defined
Default Critical Threshold: DOWN
Alert Text: %Message%


Data Source

  • If more than 500 notifications are waiting to be completed for more than one hour, then data is obtained from aq$em_notify_qtable:

    SELECT COUNT(1)
      FROM aq$em_notify_qtable
     WHERE msg_state = 'READY'
       AND consumer_name NOT LIKE 'ADM%'
       AND consumer_name NOT LIKE 'RCA1'
       AND enq_time <= SYSDATE - (1/24);
    
  • If an advanced notification device has not processed notifications for more than one hour and has requested that notifications be retried later, then data is obtained from the em_notify_requeue table:

    SELECT COUNT(1)
      FROM (SELECT device_id, MIN(insertion_timestamp) AS first_ts
              FROM em_notify_requeue
             GROUP BY device_id) a
     WHERE a.first_ts > SYSDATE - (1/24);
    

User Action

  • If the pending notifications count is high, then check if notification methods are performing as expected. Remove any unnecessary or out-of-date incident rules. You might have to add an additional Management Service to increase the notification system capacity.

  • If an advanced notification method is requesting that notifications be retried later, then fix the underlying issue so that it stops requesting retries.

3.16 Overall Status

This metric category provides information about the overall status of the Management Repository.

3.16.1 Overall Backoff Requests in the Last 10 Mins

This metric displays the number of backoff requests in the last 10 minutes.

Table 3-4 Metric Summary Table

Target Version: All Versions
Evaluation and Collection Frequency: Every 10 Minutes
Default Warning Threshold: Not Defined
Default Critical Threshold: Not Defined
Alert Text: Not Defined


3.16.2 Overall Upload Backlog (Files)

This metric displays the number of files in the upload backlog.

Table 3-5 Metric Summary Table

Target Version: All Versions
Evaluation and Collection Frequency: Every 10 Minutes
Default Warning Threshold: Not Defined
Default Critical Threshold: Not Defined
Alert Text: Not Defined


3.16.3 Overall Rows Processed by Loader in the Last Hour

This metric displays the number of rows processed by the data loader in the last hour.

Table 3-6 Metric Summary Table

Target Version: All Versions
Evaluation and Collection Frequency: Every 10 Minutes
Default Warning Threshold: Not Defined
Default Critical Threshold: Not Defined
Alert Text: Not Defined


3.16.4 Overall Upload Backlog (MB)

This metric displays the size (in MB) of the upload backlog.

Table 3-7 Metric Summary Table

Target Version: All Versions
Evaluation and Collection Frequency: Every 10 Minutes
Default Warning Threshold: Not Defined
Default Critical Threshold: Not Defined
Alert Text: Not Defined


3.16.5 Unmonitored Targets (%)

This metric displays the percentage of targets that are not monitored.

Table 3-8 Metric Summary Table

Target Version: All Versions
Evaluation and Collection Frequency: Every 10 Minutes
Default Warning Threshold: 25
Default Critical Threshold: 50
Alert Text: Not Defined
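As an illustration of how the default thresholds for this metric (25 for warning, 50 for critical) might be applied, the following Python sketch computes the percentage and classifies it. The function name `unmonitored_severity`, the severity labels, and the use of a strict greater-than comparison are assumptions for the example, not a description of how Enterprise Manager evaluates thresholds.

```python
def unmonitored_severity(unmonitored: int, total: int,
                         warning: float = 25.0,
                         critical: float = 50.0):
    """Return (percentage, severity) for the given target counts.

    Hypothetical helper. The comparison operator (strictly greater
    than) and the severity labels are assumptions.
    """
    if total == 0:
        return 0.0, "CLEAR"
    percentage = 100.0 * unmonitored / total
    if percentage > critical:
        return percentage, "CRITICAL"
    if percentage > warning:
        return percentage, "WARNING"
    return percentage, "CLEAR"


# Made-up counts: 30 of 100 targets are not monitored.
print(unmonitored_severity(30, 100))
```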


3.16.6 Overall Upload Rate (MB/sec)

This metric displays the rate of data upload to the Management Repository.

Table 3-9 Metric Summary Table

Target Version: All Versions
Evaluation and Collection Frequency: Every 10 Minutes
Default Warning Threshold: Not Defined
Default Critical Threshold: Not Defined
Alert Text: Not Defined


3.17 Pending Monitoring Jobs

This metric category provides information about pending monitoring jobs.

3.17.1 Pending Template Applies

This metric displays the number of pending templates that will apply to your target.

For more information about Monitoring Templates, see Oracle Enterprise Manager Cloud Control Administrator's Guide.

Table 3-10 Metric Summary Table

Target Version: All Versions
Evaluation and Collection Frequency: Every 10 Minutes
Default Warning Threshold: Not Defined
Default Critical Threshold: Not Defined
Alert Text: Number of pending template apply jobs %value%, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold


3.17.2 Pending Metric Extension Deployments

This metric displays the number of metric extensions pending deployment.

For more information about Metric Extensions, see Oracle Enterprise Manager Cloud Control Administrator's Guide.

Table 3-11 Metric Summary Table

Target Version: All Versions
Evaluation and Collection Frequency: Every 10 Minutes
Default Warning Threshold: Not Defined
Default Critical Threshold: Not Defined
Alert Text: Number of pending metric extension deployment jobs %value%, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold


3.18 Repository Collections Performance

This category of metrics provides information on the performance of repository collections. Repository metrics are collected by background DBMS jobs in the repository database called collection workers. Repository metrics are subdivided into short-running and long-running metrics, called task classes (the short task class and the long task class). Some collection workers (1 by default) process the short task class and some (1 by default) process the long task class. Repository collection performance metrics report performance data for repository metric collections for each task class. Because these metrics are themselves repository metrics, they are also collected by the collection workers.

3.18.1 Average Collection Duration (seconds)

The total amount of time, in seconds, that the collection workers were running in the last 10 minutes. This is an indicator of the load on the repository collection subsystem. An increase can have two causes: the number of collections has increased, or some of the metrics are taking a long time to complete. Compare this metric with the Collections Processed metric to determine which of the two applies.
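The comparison between this metric and the Collections Processed metric can be sketched as follows. This Python example is only an illustration: the function name `diagnose_collection_load`, its arguments, and the returned strings are hypothetical, and it simply compares two consecutive 10-minute samples of each metric.

```python
def diagnose_collection_load(prev_duration: float, cur_duration: float,
                             prev_processed: int, cur_processed: int) -> str:
    """Suggest a likely cause for an increase in collection duration.

    Hypothetical helper that compares two consecutive samples of the
    duration metric and of the Collections Processed metric.
    """
    if cur_duration <= prev_duration:
        return "duration not increasing"
    if cur_processed > prev_processed:
        return "more collections processed"
    return "collections taking longer"


# Duration rose while the number of collections stayed flat.
print(diagnose_collection_load(120.0, 300.0, 400, 400))
```

In this made-up example the function returns "collections taking longer", which points to slow metrics rather than to a higher number of collections.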

Data Source

The data for this metric comes from entries in the mgmt_system_performance_log table where job_name=MGMT_COLLECTION.Collection Subsystem.
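One way to tell the two causes apart is to divide the run time by the Collections Processed value for the same 10-minute window and compare the result with an earlier window. The following Python sketch is illustrative only: the function name, the sample inputs, and the 1.5x tolerance are assumptions and are not part of Enterprise Manager.

```python
def diagnose_load(prev_duration, prev_count, cur_duration, cur_count, tolerance=1.5):
    """Compare two 10-minute samples and suggest why the run time grew.

    Durations are in seconds; counts are collections processed.
    The tolerance factor is an arbitrary illustrative value.
    """
    if cur_duration <= prev_duration * tolerance:
        return "no significant increase"
    # Seconds spent per collection in each window (guard against zero counts).
    prev_per = prev_duration / prev_count if prev_count else 0.0
    cur_per = cur_duration / cur_count if cur_count else float("inf")
    if cur_per > prev_per * tolerance:
        return "collections are taking longer"
    return "more collections are being processed"
```

For example, if the run time triples while the number of collections also triples, the time per collection is unchanged and the sketch reports that more collections are being processed.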

3.18.2 Collections Processed

The total number of collections that were processed in the last 10 minutes.

Data Source

The data for this metric comes from entries in the mgmt_system_performance_log table where job_name=MGMT_COLLECTION.Collection Subsystem.

3.18.3 Collections Waiting To Run

The total number of collections that were waiting to run at the time this metric was collected. An increasing value means that the collection workers are falling behind and that the number of collection workers needs to be increased. The number of collections waiting to run can be high immediately after system startup and should then decrease toward zero.

Data Source

The data for this metric comes from entries in the mgmt_collection_tasks table, which holds the list of all collections.

User Action

This metric is informational only.
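The distinction between a startup spike that drains and a backlog that keeps growing can be checked with a simple trend test over recent samples of this metric. The sketch below is a minimal illustration, assuming the samples have already been retrieved as a list of integers; the function name and the choice of three consecutive increases are hypothetical.

```python
def is_falling_behind(samples, min_rising_steps=3):
    """Return True if the last `min_rising_steps` changes were all increases."""
    if len(samples) < min_rising_steps + 1:
        return False  # not enough history to judge
    recent = samples[-(min_rising_steps + 1):]
    # Every sample must be strictly greater than the one before it.
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))
```

A series such as 120, 80, 40, 10, 0 (a startup backlog draining) returns False, while 5, 9, 14, 22 returns True.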

3.18.4 Number of Collection Workers

The total number of workers that were processing the collections.

Data Source

The data for this metric comes from entries in the mgmt_collection_workers table.

User Action

This metric is informational only.

3.18.5 Total Throughput Across Collection Workers

The total number of collections per second processed by all the collection workers.

Data Source

The data for this metric comes from entries in the mgmt_system_performance_log table where job_name=MGMT_COLLECTION.Collection Subsystem.

User Action

This metric is informational only.

3.19 Repository Collection Task Performance

This metric provides information about the performance of repository collection tasks.

3.19.1 Run Duration (Seconds)

This metric displays the run duration (in seconds) for the collection task.

Table 3-12 Metric Summary Table

  • Target Version: All Versions

  • Evaluation and Collection Frequency: Every 10 Minutes

  • Default Warning Threshold: Not Defined

  • Default Critical Threshold: Not Defined

  • Alert Text: Run duration for task id/metric %metric_name% of target type %task_target_type% is %value%, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold
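The alert text contains placeholders delimited by percent signs, such as %value% and %warning_threshold%. How Enterprise Manager performs the substitution is not described here; the following Python sketch only illustrates the idea. The function name and the sample values are hypothetical.

```python
import re

def render_alert(template, values):
    """Replace each %name% placeholder with values[name]; leave unknown ones as-is."""
    def substitute(match):
        name = match.group(1)
        return str(values[name]) if name in values else match.group(0)
    return re.sub(r"%(\w+)%", substitute, template)

template = ("Run duration for task id/metric %metric_name% of target type "
            "%task_target_type% is %value%, crossed warning (%warning_threshold%) "
            "or critical (%critical_threshold%) threshold")

# All values below are made-up sample data for illustration.
message = render_alert(template, {
    "metric_name": "example_metric",
    "task_target_type": "oracle_emrep",
    "value": 42,
    "warning_threshold": 30,
    "critical_threshold": 60,
})
```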


3.20 Repository Job Dispatcher

This category of metrics provides information on the Repository Job Dispatcher.

3.20.1 Job Step Backlog

The number of job steps that were ready to be scheduled but could not be because all the dispatchers were busy.

When this number grows steadily, it means the job scheduler is not able to keep up with the workload.

User Action

This is the sum of job steps whose next scheduled time is in the past, that is, job steps that are eligible to run but are not yet running. If the graph of this number increases steadily over time, take one of the following actions:

  • Increase the em.jobs.shortPoolSize, em.jobs.longPoolSize and em.jobs.systemPoolSize properties in the web.xml file. The web.xml file specifies the number of threads allocated to process different types of job steps. The short pool size should be larger than the long pool size.

    Property                Default Value  Recommended Value  Description
    em.jobs.shortPoolSize   10             10 - 50            Steps taking less than 15 minutes
    em.jobs.longPoolSize    8              8 - 30             Steps taking more than 15 minutes
    em.jobs.systemPoolSize  8              8 - 20             Internal jobs (e.g. agent ping)

  • Add another Management Service on a different host.

  • Check the job step contents to see if they can be made more efficient.
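Before changing the pool sizes, it can help to check a proposed set of values against the recommended ranges and against the rule that the short pool should be larger than the long pool. The sketch below is illustrative only: it does not read or write web.xml, the function name is hypothetical, and it assumes the recommended range for em.jobs.shortPoolSize is 10 to 50.

```python
# Recommended ranges taken from the table above; the 10-50 range for the
# short pool is an assumption about the intended value.
RECOMMENDED = {
    "em.jobs.shortPoolSize": (10, 50),
    "em.jobs.longPoolSize": (8, 30),
    "em.jobs.systemPoolSize": (8, 20),
}

def check_pool_sizes(settings):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for name, (low, high) in RECOMMENDED.items():
        value = settings.get(name)
        if value is None:
            problems.append(f"{name} is not set")
        elif not low <= value <= high:
            problems.append(f"{name}={value} is outside {low}-{high}")
    short = settings.get("em.jobs.shortPoolSize")
    long_ = settings.get("em.jobs.longPoolSize")
    if short is not None and long_ is not None and short <= long_:
        problems.append("short pool size should be larger than long pool size")
    return problems
```

For example, a short pool of 10 combined with a long pool of 12 falls within both ranges but is still reported, because the short pool is not larger than the long pool.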

3.21 Repository Operation Status

This metric category provides information about the operational status of the Management Repository. Repository metrics are subdivided into long and short running metrics.

3.21.1 Repository Job Scheduler Status

This metric displays the status of the Repository Job Scheduler.

Table 3-13 Metric Summary Table

  • Target Version: All Versions

  • Evaluation and Collection Frequency: Every 5 Minutes

  • Default Warning Threshold: Not Defined

  • Default Critical Threshold: DOWN

  • Alert Text: Repository job scheduler is %value%


3.21.2 Long Running Metric Collection Status

This metric displays the status of the long running metric collections.

Table 3-14 Metric Summary Table

  • Target Version: All Versions

  • Evaluation and Collection Frequency: Every 5 Minutes

  • Default Warning Threshold: PARTIALLY UP

  • Default Critical Threshold: DOWN

  • Alert Text: Long running metric collection status is %value%


3.21.3 Short Running Metric Collection Status

This metric displays the status of the short running metric collections.

Table 3-15 Metric Summary Table

  • Target Version: All Versions

  • Evaluation and Collection Frequency: Every 5 Minutes

  • Default Warning Threshold: PARTIALLY UP

  • Default Critical Threshold: DOWN

  • Alert Text: Short running metric collection status is %value%
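The status metrics in this category use string thresholds: DOWN is the default critical threshold for all three, and PARTIALLY UP is the default warning threshold for the two collection status metrics. The following sketch shows how such thresholds could map a status value to a severity. It assumes the comparison is a simple equality test and that a healthy status is reported as UP; neither assumption is confirmed by the tables above, and the function name is hypothetical.

```python
def severity(status, warning_value=None, critical_value="DOWN"):
    """Map a status string to 'critical', 'warning', or 'clear'."""
    if status == critical_value:
        return "critical"
    if warning_value is not None and status == warning_value:
        return "warning"
    return "clear"

# Collection status metrics define both thresholds.
collection_severity = severity("PARTIALLY UP", warning_value="PARTIALLY UP")
# The job scheduler status metric has no warning threshold defined.
scheduler_severity = severity("DOWN")
```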


3.22 Repository Sessions

This category of metrics provides information on the Repository sessions.

3.22.1 Repository Session Count

The number of sessions between the Management Service and the Management Repository database.

Data Source

The gv$session system view.

User Action

This metric is informational only.

3.23 Response

This metric category indicates whether Enterprise Manager is up or down. It also contains historical information for the periods in which Enterprise Manager was down.

3.23.1 Status

This metric indicates whether Enterprise Manager is up or down. If you have configured the agent monitoring the oracle_emrep target with a valid email address, you will receive an email notification when Enterprise Manager is down.

The following table shows how often the metric's value is collected and compared against the default thresholds.

Table 3-16 Metric Summary Table

  • Target Version: All Versions

  • Evaluation and Collection Frequency: Every 5 Minutes

  • Upload Frequency: Not Uploaded

  • Operator: =

  • Default Warning Threshold: Not Defined

  • Default Critical Threshold: 0

  • Consecutive Number of Occurrences Preceding Notification: 1

  • Alert Text: %Message%


Data Source

sysman/admin/scripts/emrepresp.pl

User Action

This metric checks for the following:

  • Is the Management Repository database up and accessible?

    If the Management Repository database is down, start it. If an 'Invalid Username or Password' error is displayed, verify that the name and password for the oracle_emrep target are the same as the repository owner's name and password.

  • Is at least one Management Service running?

    If a Management Service is not running, start one.

  • Is the Repository Metrics DBMS job running?

    If the DBMS job is down or has an invalid schedule, it should be restarted by following the instructions in the User Action section of the help topic for the DBMS Job Bad Schedule metric.
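The three checks above can be modeled as a single decision that reports Enterprise Manager as up only when every check passes. The sketch below is a simplified illustration of that logic, not a rendering of the emrepresp.pl script; the function name, its parameters, and the reason strings are hypothetical, and treating any failed check as "down" is an assumption.

```python
def em_status(repository_accessible, management_services_running,
              repository_metrics_job_running):
    """Return (is_up, reasons), where reasons lists every failed check."""
    reasons = []
    if not repository_accessible:
        reasons.append("Management Repository database is down or inaccessible")
    if management_services_running < 1:
        reasons.append("no Management Service is running")
    if not repository_metrics_job_running:
        reasons.append("Repository Metrics DBMS job is not running")
    # Up only when no check failed.
    return (not reasons, reasons)
```

Collecting every failed check, rather than stopping at the first one, makes it easier to see whether several corrective actions are needed at once.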

3.24 Service Initialization Errors

This category provides information on any initialization errors encountered by services like loader or events.

3.24.1 Service Status

This metric is generated if any of the OMS services (such as Loader, Notification, or PingRecorder) fails to initialize during OMS startup. At present, this metric is used only by the Loader service.

This metric has two key columns and one non-key column:

  • The key columns are Management Service and Service Name. The key values uniquely identify the Service instance that has initialization errors.

  • The non-key column is Service Status. This column indicates whether the Service is running fine or encountered an error during OMS startup.
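The relationship between the key columns and the non-key column can be pictured as a lookup table in which the pair (Management Service, Service Name) identifies a row and Service Status is the value stored for it. The following Python sketch is only a conceptual model; the class name, the helper function, and all sample values are hypothetical.

```python
from typing import NamedTuple

class ServiceKey(NamedTuple):
    """The two key columns that together identify one service instance."""
    management_service: str
    service_name: str

# Maps each key to the non-key Service Status column.
statuses = {}

def record_status(management_service, service_name, status):
    """Store or overwrite the status for one service instance."""
    statuses[ServiceKey(management_service, service_name)] = status

# Made-up sample rows for illustration.
record_status("oms-host-1", "Loader", "initialization error")
record_status("oms-host-2", "Loader", "initialization error")
record_status("oms-host-1", "Loader", "running")  # same key: replaces the first row
```

Because both key columns are part of the key, the same service name on two different Management Services produces two separate rows.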