Metric Thresholds: Determining When a Monitored Condition is an Issue

Some metrics have associated predefined limiting parameters, called thresholds, that cause metric alerts (a specific type of event) to be triggered when collected metric values exceed these limits. Enterprise Manager allows you to set metric threshold values for two levels of alert severity:

  • Warning - Attention is required in a particular area, but the area is still functional.

  • Critical - Immediate action is required in a particular area. The area is either not functional or showing signs of imminent problems.

Thresholds are therefore boundary values against which monitored metric values are compared. For example, for each disk device associated with the Disk Utilization (%) metric, you might define a warning threshold at 80% disk space used and a critical threshold at 95%.
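To make the comparison concrete, the following is a minimal Python sketch of how one collected value maps to a severity level. It assumes an upper-bound metric such as Disk Utilization (%), where an alert fires only when the value strictly exceeds a threshold. The names `Severity` and `evaluate` are hypothetical illustrations, not an Enterprise Manager API. A threshold passed as `None` is treated as not set and never fires.

```python
from enum import Enum
from typing import Optional


class Severity(Enum):
    CLEAR = "clear"
    WARNING = "warning"
    CRITICAL = "critical"


def evaluate(value: float,
             warning: Optional[float],
             critical: Optional[float]) -> Severity:
    """Compare one collected value against upper-bound thresholds.

    Hypothetical helper: assumes a "greater than" comparison, as for
    Disk Utilization (%). A threshold of None counts as "not set".
    """
    # Check the more severe level first so a value above both limits
    # is reported as critical rather than warning.
    if critical is not None and value > critical:
        return Severity.CRITICAL
    if warning is not None and value > warning:
        return Severity.WARNING
    return Severity.CLEAR


# Disk device at 85% used, with warning at 80% and critical at 95%.
print(evaluate(85.0, warning=80.0, critical=95.0))  # Severity.WARNING
```

Checking the critical level before the warning level keeps the two bands mutually exclusive.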

Note:

Not all metrics need a threshold. If threshold values do not make sense for a metric, or are not needed in a particular environment, they can be removed or simply not set.

While the predefined out-of-the-box metric threshold values work for most monitoring conditions, your environment may require customized threshold values that more accurately reflect its operational norms. Setting accurate threshold values, however, can be more challenging for certain categories of metrics, such as performance metrics.

For example, what are appropriate warning and critical thresholds for the Response Time Per Transaction database metric? For such metrics, it might make more sense to be alerted when the monitored values for the performance metric deviate from normal behavior. Enterprise Manager provides features that let you capture normal performance behavior for a target and determine thresholds based on deviations from that performance norm.
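The idea of deriving thresholds from a performance norm can be illustrated with a generic statistical sketch. It assumes a set of samples captured during normal operation and places the warning and critical thresholds a fixed number of standard deviations above their mean. This is a swapped-in technique for illustration only and is not the algorithm Enterprise Manager uses. The function name `baseline_thresholds` and the multipliers `warn_k=2.0` and `crit_k=3.0` are hypothetical choices.

```python
import statistics
from typing import Sequence, Tuple


def baseline_thresholds(samples: Sequence[float],
                        warn_k: float = 2.0,
                        crit_k: float = 3.0) -> Tuple[float, float]:
    """Return (warning, critical) thresholds derived from baseline samples.

    Hypothetical helper: thresholds are mean + k * sample standard
    deviation. The multipliers are illustrative assumptions, not
    Enterprise Manager defaults.
    """
    if len(samples) < 2:
        # statistics.stdev needs at least two data points.
        raise ValueError("need at least two baseline samples")
    mean = statistics.mean(samples)
    spread = statistics.stdev(samples)
    return mean + warn_k * spread, mean + crit_k * spread


# Hypothetical response-time samples (ms) captured during normal operation.
warning, critical = baseline_thresholds([10.0, 20.0, 30.0])
print(warning, critical)  # 40.0 50.0
```

Because the thresholds move with the measured norm, a target whose normal response time is high does not alert constantly, and a target that is normally fast still alerts when it slows down noticeably.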

Note:

Enterprise Manager administrators must be granted Manage Target Metrics or greater privilege on a target in order to perform any metric threshold changes.

Preventing False Alerts: Setting the Number of Occurrences after which an Alert Is Triggered

To prevent false alerts due to spikes in metric values, the Number of Occurrences determines the period of time a collected metric value must remain above or below the threshold value before an alert is triggered or cleared. For example, if a metric value is collected every 5 minutes and the Number of Occurrences is set to 6, the successively collected metric values must stay above the threshold value for 30 minutes before an alert is triggered. Likewise, after the alert is triggered, the metric value must stay below its threshold for the same number of occurrences before the alert is cleared. For server-generated alerts, the evaluation frequency is determined by Oracle Database internals. Refer to the Oracle Database documentation on Oracle Database Server-Generated Alerts for additional details.
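The Number of Occurrences rule behaves like a consecutive-sample counter. The following is a minimal Python sketch under these assumptions:

* There is a single upper-bound threshold.
* An alert opens after `occurrences` consecutive values above the threshold.
* An open alert clears after the same number of consecutive values below the threshold.
* Any sample that breaks the streak resets the count.

`OccurrenceTracker` and its fields are hypothetical names, not Enterprise Manager internals. The sketch does not model server-generated alerts, whose evaluation is handled by the database.

```python
from dataclasses import dataclass


@dataclass
class OccurrenceTracker:
    """Hypothetical model of the Number of Occurrences setting."""
    threshold: float
    occurrences: int          # consecutive collections required
    alert_open: bool = False
    _streak: int = 0          # consecutive samples meeting the condition

    def observe(self, value: float) -> bool:
        """Feed one collected value; return True while an alert is open."""
        if self.alert_open:
            # An open alert clears only on values below the threshold.
            condition_met = value < self.threshold
        else:
            # A new alert triggers only on values above the threshold.
            condition_met = value > self.threshold
        # A single non-qualifying sample resets the streak, so a brief
        # spike (or dip) cannot change the alert state on its own.
        self._streak = self._streak + 1 if condition_met else 0
        if self._streak >= self.occurrences:
            self.alert_open = not self.alert_open
            self._streak = 0
        return self.alert_open


# Collection every 5 minutes with occurrences=6: six consecutive high
# samples (6 x 5 = 30 minutes) are needed before the alert opens.
tracker = OccurrenceTracker(threshold=80.0, occurrences=6)
states = [tracker.observe(90.0) for _ in range(6)]
print(states)  # [False, False, False, False, False, True]
```

Resetting the streak on any non-qualifying sample filters out spikes. The trade-off is detection latency: an alert is raised at the earliest after `occurrences` collection intervals.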