3 OMS and Repository

The OMS and Repository target exposes metrics that are useful for monitoring the Oracle Enterprise Manager Management Service (OMS) and Management Repository.

3.1 Active Loader Status

This category of metrics provides information on Active Loader Status per OMS.

3.2 Active Management Servlets

This category of metrics provides information on active management servlets.

3.3 Agent Status

This category of metrics provides information on the agent status.

3.4 Cleared Group Security Violations

This category of metrics provides information on cleared security violations on groups.

3.5 Cleared Target Security Violations

This category of metrics provides information on cleared security violations on targets.

3.6 Configuration

This category of metrics provides information on configuration.

3.7 DBMS Job Status

This category of metrics provides information on the DBMS job status.

3.8 Duplicate Targets

This category of metrics provides information on duplicate targets.

3.9 Job Dispatcher Performance

This category of metrics provides information on the performance of the job dispatcher.

3.10 New Group Security Violations

This category of metrics provides information on new security violations on groups.

3.11 New Target Security Violations

This category of metrics provides information on new security violations on targets.

3.12 No Agents

This category of metrics provides information on no agents.

3.13 Notification Method Performance

This category of metrics provides information on the performance of notification methods.

3.14 Notification Performance

This category of metrics provides information on the performance of notifications.

3.15 Notification Status

This is a Management Agent metric intended to send out-of-band notifications when the notification system is determined to be in a critical state.

3.15.1 Average Delivery Time (ms)

This metric should be used in conjunction with the Notifications Waiting metric for the particular notification type to help determine whether a notification problem is becoming worse. If the number of notifications waiting is increasing along with the average delivery time, then the problem is most likely in the delivery, not in the number of notifications itself. Delivery time problems can be related to network problems or resource constraints.

Data Source

The data for this metric comes from entries in the mgmt_system_performance_log where name=<method_name>||_TOTAL_DELIVERY_TIME

User Action

If the value is steadily increasing, perform the following actions:

  1. Check the Errors page for errors logged by Notification Delivery.

  2. Check for resource constraints along the notification delivery path, e.g. network errors or email or SNMP servers being down.
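The comparison of Notifications Waiting with Average Delivery Time described above can be sketched in Python. This is a minimal illustration and not part of Enterprise Manager: the function names, the strict rule used for "steadily increasing", the wording of the returned messages, and the sample values are all assumptions made for the example.

```python
from typing import Sequence


def steadily_increasing(samples: Sequence[float]) -> bool:
    """Return True if every sample is greater than the one before it.

    This strict rule is an assumption; a real check might tolerate noise.
    """
    return len(samples) >= 2 and all(b > a for a, b in zip(samples, samples[1:]))


def classify_notification_trend(waiting: Sequence[float],
                                avg_delivery_ms: Sequence[float]) -> str:
    """Combine the two trends in the way the metric description suggests."""
    waiting_up = steadily_increasing(waiting)
    delivery_up = steadily_increasing(avg_delivery_ms)
    if waiting_up and delivery_up:
        return "delivery problem likely"
    if waiting_up:
        # Inference for this sketch: backlog grows while delivery time is flat.
        return "notification volume likely"
    return "no steady backlog growth"


# Hypothetical samples taken at successive collection intervals.
print(classify_notification_trend([5, 9, 14, 22], [120.0, 180.0, 260.0, 400.0]))
```

With both series rising, the sketch reports a likely delivery problem, which is the case that calls for checking the Errors page and the delivery path.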

3.15.2 Cleared Group Security Violations

This metric collects information about the cleared violations on all groups of targets that have security policies defined for their member targets. The number of cleared violations increases as more violations are rectified. It is used to trend the rate at which security policy violations are fixed.

Data Source

The data for this metric comes from entries in the mgmt_policies, mgmt_violations and mgmt_flat_target_assoc.

User Action

If the number of cleared violations is static or there are no cleared violations, check the security policy violations and ensure that the violations are rectified as recommended by the policy.

3.15.3 Cleared Target Security Violations

This metric collects information about the cleared violations on all targets that have security policies defined for them. The number of cleared violations increases as more violations are rectified. It is used to trend the rate at which security policy violations are fixed.

Data Source

The data for this metric comes from entries in the mgmt_policies, mgmt_violations and mgmt_flat_target_assoc.

User Action

If the number of cleared violations is static or there are no cleared violations, check the security policy violations and ensure that the violations are rectified as recommended by the policy.

3.15.4 DBMS Job Bad Schedule

This metric flags a DBMS job whose schedule is invalid. A schedule is marked 'Invalid' if it is scheduled for more than one hour in the past, or more than one year in the future. An invalid schedule means that the job is in serious trouble.

Data Source

The user_jobs.next_time table in the Management Repository.

User Action

If the job schedule is invalid, the DBMS job should be restarted. To do this:

  1. Note the DBMS Job Name of the affected job from its row in the table. This DBMS Job Name is 'yourDBMSjobname' in the following example.

  2. Log on to the database as the repository owner.

  3. Issue the following SQL statement:

    select dbms_jobname 
      from mgmt_performance_names 
      where display_name='yourDBMSjobname';
    
  4. If the dbms_jobname is 'myjob', then issue the following SQL statement:

    select job
      from all_jobs
      where what='myjob' ;
    
  5. Copy down the jobid.

  6. Force the job into the broken state so that it can be restarted by specifying the following DBMS job command and parameters:

    execute dbms_job.broken(jobid, true);

  7. Verify that the job has been marked as broken by using this SQL statement:

    select what, broken
      from all_jobs
      where broken='Y';
    

    You should see the job in the results.

  8. Once you've verified that the DBMS job is marked broken, restart the job with the following DBMS job command and parameters:

    execute dbms_job.run(jobid);
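The rule that marks a schedule as invalid (more than one hour in the past, or more than one year in the future) can be expressed as a short Python check. This is only a sketch: the function name is hypothetical, "one year" is assumed to mean 365 days, and the timestamps are invented.

```python
from datetime import datetime, timedelta

# Assumption for this sketch: "one year" is taken as 365 days.
ONE_HOUR = timedelta(hours=1)
ONE_YEAR = timedelta(days=365)


def schedule_is_invalid(next_run: datetime, now: datetime) -> bool:
    """Return True if next_run is over an hour in the past or over a year ahead."""
    return next_run < now - ONE_HOUR or next_run > now + ONE_YEAR


now = datetime(2024, 1, 1, 12, 0, 0)  # hypothetical current time
print(schedule_is_invalid(datetime(2024, 1, 1, 10, 30, 0), now))  # 90 minutes overdue
print(schedule_is_invalid(datetime(2024, 1, 1, 12, 10, 0), now))  # due in 10 minutes
```

The first call reports an invalid schedule and the second a valid one.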

3.15.5 DBMS Job Processing Time, % of Last Hour

The percentage of the past hour the job has been running.

Data Source

The mgmt_system_performance_log table in the Management Repository.

User Action

If the value of this metric is greater than 50%, then there may be a problem with the job. Check the System Errors page for errors reported by the job. Check the Alerts log for any alerts related to the job.

3.15.6 DBMS Job UpDown

The down condition equates to the dbms_job "broken" state. The Up Arrow means not broken.

Data Source

The broken column of the all_jobs view in the Management Repository.

User Action

Determine the reason for the dbms job failure. Once the reason for the failure has been determined and corrected, the job can be restarted through the dbms_job.run command.

To determine the reason the DBMS job failed, take the following steps (replacing yourDBMSjobname with the displayed name of the down job):

  1. Note the DBMS Job Name that is down from the row in the table. This DBMS Job Name is 'yourDBMSjobname' in the following example.

  2. Log on to the database as the repository owner.

  3. Issue the following SQL statement:

    select dbms_jobname 
      from mgmt_performance_names 
      where display_name='yourDBMSjobname';
    
  4. If the dbms_jobname is 'myjob', then issue the following SQL statement:

    select job
      from all_jobs
      where what='myjob';
    
  5. Using the job id returned, look for ORA-12012 messages for this job id in the alert log and trace files, and try to determine and correct the problem.

The job can be manually restarted through the following database command:

execute dbms_job.run (jobid);

3.15.7 Files Pending Load

The number of files waiting for the loader to process, sampled every 10 minutes.

Data Source

This metric is obtained using the following query of the mgmt_oms_parameters table in the Management Repository.

SELECT value 
  FROM mgmt_oms_parameters 
  where name='loaderFileCount'

User Action

If the Files Pending Load number is increasing steadily over a period of time, you may consider one of these options:

  • Increasing the number of background threads.

  • Adding another Management Service and pointing some of the Management Agents to the new Management Service.

3.15.8 Group Compliance

This metric gives the average compliance score of the policy rules associated with the group's member targets and with the self-target. The compliance score ranges from 0 to 100%. This metric is collected every 6 hours (360 minutes).

It indicates how well the group complies with its policy rules.

Data Source

The data for the metric comes from entries in the MGMT_POLICY_ASSOC_EVAL_SUMM.

User Action

If the value decreases steadily, perform the following:

  1. Check the group's policy rule data on the Policy Violations tab and check the individual compliance scores of the policy rules of the member targets and the self-target.

  2. Concentrate on the policy rules that have lower compliance scores and try to resolve the corresponding policy rule violations manually or through automatic corrective actions.

3.15.9 Group Security Compliance

This metric is used to collect the compliance trend of all groups of targets with respect to the security policies defined on their member targets. The security compliance score is an indication of the security health of a target. A score of 100 indicates full compliance and a score of 0 indicates no compliance.

Data Source

The data for this metric comes from entries in the mgmt_policy_assoc_eval_summ.

User Action

If the compliance score is reducing continuously, check the security policy violations and ensure that the violations are rectified as recommended by the policy.

3.15.10 Group Target Compliance

This metric gives the average compliance score of all policy rules associated with each of the group's member targets and the self-target. The metric data is rolled up by member target so that the user can see the compliance score of each member target of a group.

It helps to show the trend of how many targets in the group lie in the good compliance score range and how many lie in the poor compliance score range.

The metric is collected every 6 hours (360 minutes).
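The per-target rollup described above can be sketched in Python. This is an illustration only: the boundary between the good and poor ranges (80 here) is a hypothetical value that the source does not define, and the target names and scores are invented.

```python
from statistics import mean
from typing import Dict, List

# Hypothetical cut-off between "good" and "poor"; the source does not define one.
GOOD_SCORE_THRESHOLD = 80.0


def target_scores(rule_scores: Dict[str, List[float]]) -> Dict[str, float]:
    """Average the policy rule scores (0-100) of each member target."""
    return {target: mean(scores) for target, scores in rule_scores.items()}


def count_by_range(scores: Dict[str, float]) -> Dict[str, int]:
    """Count how many targets fall in the good and poor score ranges."""
    good = sum(1 for s in scores.values() if s >= GOOD_SCORE_THRESHOLD)
    return {"good": good, "poor": len(scores) - good}


# Hypothetical group with three member targets and their rule scores.
group = {"db01": [100.0, 90.0], "host01": [60.0, 70.0], "listener01": [85.0]}
per_target = target_scores(group)
print(per_target)
print(count_by_range(per_target))
```

In this example two targets land in the good range and one in the poor range, which is the kind of split the trend data is meant to show.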

Data Source

The compliance score of policies evaluated is in the mgmt_policy_assoc_eval_summ table.

User Action

If the average compliance score is decreasing, check the security policy violations on the Security At a Glance page. Identify the violating policies and fix the violations. Details of the violations and their policies are available on the Policy Violations page.

3.15.11 Group Violations

This metric gives the sum of the violations of all policy rules associated with the member targets of the group and with the self-target. Along with the violation count, it reports the violation level, which indicates whether a violation is Critical, Warning, or Informational. It helps to show the trend of the group's policy violations. This metric is collected every 6 hours (360 minutes).

Data Source

The data for the metric comes from entries in the MGMT_POLICY_ASSOC_EVAL_SUMM.

User Action

If the value increases steadily, perform the following:

  1. Check the policy violations of the group target and its member targets.

  2. Give priority to Critical violations, then to Warning and Informational violations. Check the policy rules causing the violations on the Policy Violations tab.

  3. Try to resolve the violations through automatic corrective actions or manual actions.

3.15.12 Job Dispatcher Job Step Average Backlog

The number of job steps that were ready to be scheduled but could not be because all the dispatchers were busy.

When this number grows steadily, it means the job scheduler is not able to keep up with the workload.

User Action

This is the sum of job steps whose next scheduled time is in the past - job steps eligible to run but not yet running. If the graph of this number increases steadily over time, the user should take one of the following actions:

  • Increase the em.jobs.shortPoolSize, em.jobs.longPoolSize and em.jobs.systemPoolSize properties in the web.xml file. The web.xml file specifies the number of threads allocated to process different types of job steps. The short pool size should be larger than the long pool size.

    Property | Default Value | Recommended Value | Description
    em.jobs.shortPoolSize | 10 | 10 - 50 | Steps taking less than 15 minutes
    em.jobs.longPoolSize | 8 | 8 - 30 | Steps taking more than 15 minutes
    em.jobs.systemPoolSize | 8 | 8 - 20 | Internal jobs (e.g. agent ping)

  • Add another Management Service on a different host.

  • Check the job step contents to see if they can be made more efficient.
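The backlog definition given above (job steps whose next scheduled time is in the past and that are not yet running) can be sketched in Python. The JobStep class is a hypothetical in-memory stand-in, not a repository table, and the timestamps are invented.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable


@dataclass
class JobStep:
    """Hypothetical in-memory view of a job step; not a repository table."""
    next_scheduled: datetime
    running: bool


def backlog(steps: Iterable[JobStep], now: datetime) -> int:
    """Count steps whose scheduled time has passed but that are not yet running."""
    return sum(1 for s in steps if s.next_scheduled < now and not s.running)


now = datetime(2024, 1, 1, 12, 0)
steps = [
    JobStep(datetime(2024, 1, 1, 11, 50), running=False),  # overdue and waiting
    JobStep(datetime(2024, 1, 1, 11, 55), running=True),   # overdue but running
    JobStep(datetime(2024, 1, 1, 12, 5), running=False),   # not yet due
]
print(backlog(steps, now))
```

Only the first step counts towards the backlog here. A value that keeps growing from one sample to the next is the signal to act on.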

3.15.13 Job Dispatcher Processing Time, % of Last Hour

The job dispatcher is responsible for scheduling jobs as required. It starts up periodically and checks whether jobs need to be run. If the job dispatcher runs for longer than the threshold levels, then it is having problems handling the job load.

Data Source

This is the sum of the amount of time the job has run over the last hour from the mgmt_system_performance_log table in the Management Repository divided by one hour, multiplied by 100 to arrive at the percent.
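The calculation described above can be written out as a small Python function. This is a sketch under the assumption that the run durations are available in seconds; the unit stored in mgmt_system_performance_log is not stated here, and the sample durations are invented.

```python
from typing import Iterable

SECONDS_PER_HOUR = 3600


def percent_of_last_hour(run_durations_seconds: Iterable[float]) -> float:
    """Sum the run durations from the last hour and express them as a percent of one hour."""
    return sum(run_durations_seconds) / SECONDS_PER_HOUR * 100


# Hypothetical durations (in seconds) of dispatcher runs logged in the last hour.
print(percent_of_last_hour([300, 450, 150]))  # 900 s out of 3600 s
```

For these values the dispatcher was busy for a quarter of the hour.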

3.15.14 Last Error

Timestamp of the latest error for the job.

Data Source

The mgmt_system_error_log table in the Management Repository.

3.15.15 Loader Directory

The directory from which the loader is getting files.

Data Source

This metric is obtained using the following query of the mgmt_oms_parameters table in the Management Repository.

SELECT value 
  FROM mgmt_oms_parameters 
  where name='loaderDirectory'

User Action

If the loader directory is out of space, you may want to look for the error files to investigate the problem.

3.15.16 Loader Name

The unique name of the loader, consisting of the Management Service name separated by a comma from the loader name on that Management Service.

Data Source

The mgmt_system_performance_log table in the Management Repository.

3.15.17 Loader Throughput (rows per hour)

This is the number of lines of XML text processed by the loader thread over the past hour.

Data Source

The mgmt_system_performance_log table in the Management Repository.

User Action

If this number continues to rise over time, then the user may want to consider adding another Management Service or increasing the number of loader threads for this Management Service. To increase the number of loader threads, add or change the em.loader.threadPoolSize entry in the emoms.properties file. The default number of threads is 2. Values between 2 and 10 are common.

3.15.18 Loader Throughput (rows per second)

This is the number of lines of XML text processed by the loader thread per second averaged over the past hour.

Data Source

The mgmt_system_performance_log table in the Management Repository.

3.15.19 Management Service Status

Shows whether the Management Service is up or down.

Data Source

The mgmt_oms_parameters and mgmt_failover_table tables in the Management Repository.

User Action

If the Management Service is down, start it. Only management services that are down can be deleted.

3.15.20 New Group Security Violations

This metric collects information about the new violations that have occurred on all groups of targets that have security policies defined for their member targets. The number of new violations increases as new violations occur and decreases as violations are cleared. It is used to trend the rate of arrival of new security policy violations.

Data Source

The data for this metric comes from entries in the mgmt_policies and mgmt_violations tables.

User Action

If the number of new violations is increasing continuously, check the security policy violations and ensure that the violations are rectified as recommended by the policy.

3.15.21 New Target Security Violations

This metric collects information about the new violations that have occurred on all targets that have security policies defined for them. The number of new violations increases as new violations occur and decreases as violations are cleared. It is used to trend the rate of arrival of new security policy violations.

Data Source

The data for this metric comes from entries in the mgmt_policies and mgmt_violations tables.

User Action

If the number of new violations is increasing continuously, check the security policy violations and ensure that the violations are rectified as recommended by the policy.

3.15.22 Notification Delivery Time

The time it took to deliver a notification, averaged over the past hour.

Data Source

The mgmt_system_performance_log table in the Management Repository.

User Action

If the average delivery time is steadily increasing, verify that the notification methods specified are valid. Remove any unnecessary or out of date notification rules and schedules.

3.15.23 Notification Processing Time, % of Last Hour

The percentage of the past hour that Notification delivery has been running.

Data Source

The mgmt_system_performance_log table in the Management Repository.

User Action

If the average delivery time is steadily increasing, verify that the notification methods specified are valid. Remove any unnecessary or out of date notification rules and schedules.

3.15.24 Notification UpDown

Displays whether the notification DBMS job (which processes severities to determine if notifications are required) is up or down.

Data Source

The user_jobs table in the Management Repository.

User Action

Determine the reason for the DBMS job failure. Once the reason for the failure has been determined and corrected, the job can be restarted through the dbms_job.run command.

To determine why the DBMS job failed, perform the following steps:

  1. Log on to the database as the Management Repository owner.

  2. Issue the following SQL statement:

    select job 
      from all_jobs 
      where what like '%CHECK_FOR_SEVERITIES%';
    
  3. Using the job id returned, look for ORA-12012 messages for this job id in the alert log and trace files, and try to determine and correct the problem.

  4. Issue the following DBMS job command and parameters:

    execute dbms_job.run (jobid);

3.15.25 Notifications Processed

The total number of notifications delivered by the Management Service over the previous 10 minutes.

Data Source

The mgmt_system_performance_log table in the Management Repository.

User Action

If the number of notifications processed is continually increasing over several days, then you may want to consider adding another Management Service.

3.15.26 Notifications Waiting

Notification Method Performance metrics measure the performance data for each notification type, such as SNMP, EMAIL, OSCMD, PLSQL and RCA. This metric shows the number of notifications queued for the method type.

Data Source

The data for this metric comes from entries in the mgmt_system_performance_log where name=<method_name>||_S_QUEUED
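In the name expressions used for the notification method metrics, `||` is SQL string concatenation, so each method type has its own log entry name. The following Python sketch builds those names; the exact spelling of the method names stored in the repository is an assumption, and the helper function is hypothetical.

```python
# Notification types named in this section; the exact spelling stored in the
# repository is an assumption.
METHOD_TYPES = ["SNMP", "EMAIL", "OSCMD", "PLSQL", "RCA"]


def log_entry_name(method_name: str, suffix: str) -> str:
    """Mirror the SQL expression <method_name>||<suffix> as string concatenation."""
    return method_name + suffix


for method in METHOD_TYPES:
    # _S_QUEUED is the suffix given for the Notifications Waiting metric.
    print(log_entry_name(method, "_S_QUEUED"))
```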

User Action

If the value is steadily increasing, perform the following actions:

  1. Check the Errors page for errors logged by Notification Delivery.

  2. Check the number of notification rules that use the method and verify that they are all necessary, removing those that are not.

  3. Verify that the addresses being used for the notifications are correct.

3.15.27 Number of Active Agents

The number of active agents in the repository. If this number is 0, then Enterprise Manager is not monitoring any external targets. This may indicate a problem if it is unexpected.

Data Source

The number of agents whose status is up in the mgmt_current_availability table.

User Action

If no agents are running, determine the reasons they are down, correct if needed and restart. Log files in the agent's $ORACLE_HOME/sysman/log directory can provide information about possible causes of agent shutdown.

3.15.28 Number of Administrators

The number of administrators defined for Enterprise Manager.

Data Source

The mgmt_created_users table in the Management Repository.

3.15.29 Number of Duplicate Targets

The count of duplicate targets in the Management Repository.

Data Source

The mgmt_duplicate_targets table in the Management Repository.

User Action

Go to the Duplicate Targets page by clicking the Duplicate targets link on the Management System Overview page. The Duplicate targets link only appears on the Management System Overview page if there are problems involving duplicate targets.

Resolve the conflict by removing the duplicate target from the conflicting Management Agent.

3.15.30 Number of Groups

The number of groups defined for Enterprise Manager.

Data Source

The mgmt_targets table in the Management Repository.

User Action

If you have a problem viewing the All Targets page, you may want to check the number of roles and groups.

3.15.31 Number of Roles

The number of roles defined for Enterprise Manager.

Data Source

The mgmt_roles table in the Management Repository.

User Action

If you have a problem viewing the All Targets page, you may want to check the number of roles and groups.

3.15.32 Number of Targets

The number of targets defined for Enterprise Manager.

Data Source

The mgmt_targets table in the Management Repository.

3.15.33 Oldest Loader File

This metric shows how long the oldest loader file has been waiting to be processed by the loader. This is an indicator of the delay from when the Management Agent sends out information to when the user receives the information.

Data Source

This metric is obtained using the following query of the mgmt_oms_parameters table in the Management Repository.

SELECT value 
  FROM mgmt_oms_parameters 
  where name='loaderOldestFile'

User Action

If the oldest loader file is extremely old, you have a loader problem. You may want to add another Management Service and point some of the Management Agents to the new Management Service.

3.15.34 Repository Tablespace Used

This is the total number of MB that the Management Repository tablespaces are currently using.

Data Source

The dba_data_files table in the Management Repository.

3.15.35 Restart Count

The number of times the agent has been restarted in the past 24 hours.

Data Source

Derived by:

(SELECT t.target_name, COUNT(*) down_count
  FROM mgmt_availability a, mgmt_targets t
  WHERE a.start_collection_timestamp = a.end_collection_timestamp
    AND a.target_guid = t.target_guid
    AND t.target_type = MGMT_GLOBAL.G_AGENT_TARGET_TYPE
    AND a.start_collection_timestamp > SYSDATE-1
  GROUP BY t.target_name)
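The logic of the query above can be mirrored on in-memory data to make the counting rule easier to follow. This Python sketch is illustrative only: the AvailabilityRecord class is a hypothetical combination of the columns the query uses, and the agent target type string "oracle_emd" is an assumed value for MGMT_GLOBAL.G_AGENT_TARGET_TYPE.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Iterable


@dataclass
class AvailabilityRecord:
    """Hypothetical row combining the columns used by the query."""
    target_name: str
    target_type: str
    start_collection: datetime
    end_collection: datetime


def restart_counts(rows: Iterable[AvailabilityRecord], now: datetime,
                   agent_type: str = "oracle_emd") -> Dict[str, int]:
    """Count zero-length availability records per agent in the last 24 hours."""
    cutoff = now - timedelta(days=1)  # corresponds to SYSDATE-1
    return dict(Counter(
        r.target_name for r in rows
        if r.start_collection == r.end_collection
        and r.target_type == agent_type
        and r.start_collection > cutoff
    ))


now = datetime(2024, 1, 2, 12, 0)
t1 = datetime(2024, 1, 2, 3, 0)
t2 = t1 + timedelta(hours=2)
old = datetime(2023, 12, 30)
rows = [
    AvailabilityRecord("hostA:3872", "oracle_emd", t1, t1),
    AvailabilityRecord("hostA:3872", "oracle_emd", t2, t2),
    AvailabilityRecord("hostB:3872", "oracle_emd", t1, t2),    # not zero-length
    AvailabilityRecord("hostA:3872", "oracle_emd", old, old),  # older than 24 hours
]
print(restart_counts(rows, now))
```

Only the two recent zero-length records for hostA:3872 are counted, so that agent shows two restarts and hostB:3872 shows none.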

User Action

If this number is high, check the agent logs to see if a system condition exists causing the system to bounce. If an agent is constantly restarting, the Targets Not Uploading Data metric may also be set for targets on the agents with restart problems. Restart problems may be due to system resource constraints or configuration problems.

3.15.36 Session Count

A count of the number of sessions between the Management Service and Management Repository database.

Data Source

The gv$session system view.

3.15.37 Steps Per Second

The number of job steps processed per second by the job dispatcher, averaged over the past hour and sampled every 10 minutes.

Data Source

The mgmt_job_execution table in the Management Repository.

3.15.38 Target Addition Rate (Last Hour)

The rate at which targets are being created. The target addition rate should be greatest shortly after EM is installed and then should increase briefly whenever a new agent is added. If the rate is increasing abnormally, you should check for abnormal agent or administrator activity and verify that the targets are useful. Check to see that group creation is not being over utilized.

Data Source

The metric is derived from the mgmt_targets table as the current target count minus the target count at the last sampling.
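The subtraction behind this metric can be sketched in Python as the difference between consecutive target counts. The function name and the sample counts are hypothetical.

```python
from typing import List, Sequence


def addition_rates(counts: Sequence[int]) -> List[int]:
    """Return the change in target count between each pair of consecutive samplings."""
    return [current - previous for previous, current in zip(counts, counts[1:])]


# Hypothetical target counts at four consecutive samplings after installation.
print(addition_rates([1000, 1200, 1210, 1212]))
```

The rates in this example fall from 200 to 10 to 2, matching the expectation that additions are highest shortly after installation and then taper off.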

3.15.39 Target Compliance

This metric gives the compliance score for each target. It is calculated from the compliance scores of the individual policy rules associated with the target. The compliance score ranges from 0 to 100 and is expressed as a percentage. It indicates how well the target complies with its associated policy rules. This metric is collected every 6 hours (360 minutes).

Data Source

The data for the metric comes from entries in the MGMT_POLICY_ASSOC_EVAL_SUMM

User Action

If the value decreases steadily, perform the following:

  1. Check the target's policy rule data on the Policy Violations tab and check the individual compliance scores of the policy rules of the target.

  2. Concentrate on the policy rules that have lower compliance scores and try to resolve the corresponding policy rule violations manually or through automatic corrective actions.

3.15.40 Target Security Compliance

This metric is used to capture the compliance trend of all targets with respect to the security policies defined on them. The security compliance score is an indication of the security health of a target. A score of 100 indicates full compliance and a score of 0 indicates no compliance.

Data Source

The data for this metric comes from entries in the mgmt_policy_assoc_eval_summ.

User Action

If the compliance score is reducing continuously, check the security policy violations and ensure that the violations are rectified as recommended by the policy.

3.15.41 Target Violations

This metric gives the sum of the violations of all policy rules associated with each target. Along with the violation count, it reports the violation level, which indicates whether a violation is Critical, Warning, or Informational. This metric is collected every 6 hours (360 minutes).

It helps to show the trend of the target's policy violations.

Data Source

The data for the metric comes from entries in the MGMT_POLICY_ASSOC_EVAL_SUMM.

User Action

If the value increases steadily, perform the following:

  1. Give priority to Critical violations, then to Warning and Informational violations. Check the policy rules causing the violations on the Policy Violations tab.

  2. Try to resolve the violations through automatic corrective actions or manual actions.

3.15.42 Throughput Per Second

The number of notifications delivered per second, averaged over the past hour.

Data Source

The mgmt_system_performance_log table in the Management Repository.

3.15.43 Total Loader Runtime in the Last Hour (seconds)

This is the amount of time in seconds that the loader thread has been running in the past hour.

Data Source

The mgmt_system_performance_log table in the Management Repository.

User Action

If this number is steadily increasing along with the Loader Throughput (rows per hour) metric, then perform the actions described in the User Action section of the help topic for the Loader Throughput (rows per hour) metric. If this number increases but the loader throughput does not, check for resource constraints, such as high CPU utilization by some process, deadlocks in the Management Repository database, or processor memory problems.
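The decision described above can be laid out as a small Python function. This is a sketch only: the rule that a series is "rising" when its last value exceeds its first is a simplification chosen for the example, and the function names and returned messages are hypothetical.

```python
from typing import Sequence


def rising(samples: Sequence[float]) -> bool:
    """Treat a series as rising if its last value exceeds its first (a simplification)."""
    return len(samples) >= 2 and samples[-1] > samples[0]


def loader_advice(runtime_seconds: Sequence[float],
                  rows_per_hour: Sequence[float]) -> str:
    """Map the two trends to the action suggested in this section."""
    if not rising(runtime_seconds):
        return "no action suggested"
    if rising(rows_per_hour):
        return "add loader threads or another Management Service"
    return "check for resource constraints"


# Hypothetical hourly samples: runtime grows while throughput stays flat.
print(loader_advice([900, 1500, 2400], [50000, 50000, 50000]))
```

In the example the loader runs longer for the same throughput, which points to resource constraints and not to load.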

3.15.44 Total Repository Tablespace

The total MB allocated to the Management Repository tablespaces. This will always be greater than or equal to the space used.

Data Source

The dba_free_space table in the Management Repository.

3.15.45 User Addition Rate (Last Hour)

The rate at which users are being created. The user addition rate should be low. If the rate is increasing abnormally, check for abnormal administrator activity.

Data Source

The metric is derived from the mgmt_created_users table as the current user count minus the user count at the last sampling.

3.16 Oracle Management Services and Repository

The OMS and Repository target exposes metrics that are useful for monitoring the Oracle Enterprise Manager Management Service (OMS) and Management Repository.

3.17 Repository Collections Performance

This category of metrics provides information on the performance of repository collections. Repository metrics are collected by background DBMS jobs in the repository database called collection workers. Repository metrics are subdivided into long-running and short-running metrics, called task classes (the short task class and the long task class). Some collection workers (1 by default) process the short task class and some (1 by default) process the long task class. Repository collection performance metrics measure the performance of repository metric collections for each task class. These metrics are themselves repository metrics and are therefore collected by the collection workers.

3.18 Repository Job Dispatcher

This category of metrics provides information on the Repository Job Dispatcher.

3.18.1 Collection Duration (seconds)

The total amount of time, in seconds, that the collection workers were running in the last 10 minutes. This is an indicator of the load on the repository collection subsystem. An increase can have two causes: the number of collections has increased, or some of the metrics are taking a long time to complete. Compare this metric with the Collections Processed metric to determine which of the two applies.
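One way to relate the collection duration to the number of collections processed is to look at the average time per collection. The Python sketch below is an assumption about how to combine the two metrics, not a documented Enterprise Manager calculation, and the readings are invented.

```python
from typing import Optional


def seconds_per_collection(duration_seconds: float, processed: int) -> Optional[float]:
    """Average time per collection in one 10-minute interval, or None if nothing ran."""
    return duration_seconds / processed if processed else None


# Hypothetical readings from two intervals: (Collection Duration, Collections Processed).
earlier = (120.0, 60)
later = (240.0, 60)

print(seconds_per_collection(*earlier))  # 2.0 seconds per collection
print(seconds_per_collection(*later))    # 4.0 seconds per collection
```

Here the duration doubled while the number of collections stayed the same, which suggests that the metrics themselves are taking longer. If the duration and the count had both doubled, the average would be unchanged and the extra load would come from more collections.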

Data Source

The data for this metric comes from entries in mgmt_system_performance_log where job_name=MGMT_COLLECTION.Collection Subsystem.

3.18.2 Collections Processed

The total number of collections that were processed in the last 10 minutes.

Data Source

The data for this metric comes from entries in mgmt_system_performance_log where job_name=MGMT_COLLECTION.Collection Subsystem.

3.18.3 Collections Waiting To Run

The total number of collections that were waiting to run at the point this metric was collected. An increasing value means that the collection workers are falling behind and that the number of workers may need to be increased. The number of collections waiting to run can be high initially on system startup and should ideally go down towards zero.

Data Source

The data for this metric comes from entries in the mgmt_collection_tasks table, which holds the list of all collections.

3.18.4 Number of Workers

The total number of workers that were processing the collections.

Data Source

The data for this metric comes from entries in the mgmt_collection_workers table.

3.18.5 Total Throughput Across Workers

The total number of collections per second processed by all the collection workers.

Data Source

The data for this metric comes from entries in mgmt_system_performance_log where job_name=MGMT_COLLECTION.Collection Subsystem.

3.19 Repository Sessions

This category of metrics provides information on the Repository sessions.

3.20 Response

This page indicates whether Enterprise Manager is up or down. It contains historical information for periods in which it was down.

3.20.1 Status

This metric indicates whether Enterprise Manager is up or down. If you have configured the agent monitoring the oracle_emrep target with a valid email address, you will receive an email notification when Enterprise Manager is down.

Metric Summary

The following table shows how often the metric's value is collected and compared against the default thresholds. The 'Consecutive Number of Occurrences Preceding Notification' column indicates the consecutive number of times the comparison against thresholds should hold TRUE before an alert is generated.

Table 3-1 Metric Summary Table

Target Version | Evaluation and Collection Frequency | Upload Frequency | Operator | Default Warning Threshold | Default Critical Threshold | Consecutive Number of Occurrences Preceding Notification | Alert Text
All Versions | Every 5 Minutes | Not Uploaded | = | Not Defined | 0 | 1 | %Message%
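The role of the 'Consecutive Number of Occurrences Preceding Notification' value can be illustrated with a short Python sketch. It assumes that a status value of 1 means up and 0 means down, and that an alert is not repeated until the condition has cleared; both are assumptions made for the example, and the function is hypothetical.

```python
import operator
from typing import Callable, Iterable, List


def alert_points(values: Iterable[float], threshold: float, occurrences: int,
                 compare: Callable[[float, float], bool] = operator.eq) -> List[int]:
    """Return the sample indexes at which an alert would be generated.

    An alert fires when the comparison has held for `occurrences`
    consecutive samples, and not again until the condition clears.
    """
    alerts: List[int] = []
    streak = 0
    for index, value in enumerate(values):
        if compare(value, threshold):
            streak += 1
            if streak == occurrences:
                alerts.append(index)
        else:
            streak = 0
    return alerts


# Hypothetical status samples (1 = up, 0 = down), one every 5 minutes.
samples = [1, 1, 0, 0, 1, 0]
print(alert_points(samples, threshold=0, occurrences=1))  # operator "=", critical threshold 0
print(alert_points(samples, threshold=0, occurrences=2))
```

With one required occurrence, as in the table, the sketch raises an alert at the first down sample of each outage. Requiring two consecutive occurrences would skip the final single down sample.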


Data Source

sysman/admin/scripts/emrepresp.pl

User Action

This metric checks for the following:

  • Is the Management Repository database up and accessible?

    If the Management Repository database is down, start it. If an 'Invalid Username or Password' error is displayed, verify that the name and password for the oracle_emrep target are the same as the repository owner's name and password.

  • Is at least one Management Service running?

    If a Management Service is not running, start one.

  • Is the Repository Metrics DBMS job running?

    If the DBMS job is down or has an invalid schedule, it should be restarted by following the instructions in the User Action section of the help topic for the DBMS Job Bad Schedule metric.