In some monitoring situations, a target's workload changes at regular, expected intervals. Under these conditions, a single static alert threshold would be inaccurate. For example, the appropriate alert thresholds for a database performing Online Transaction Processing (OLTP) during the day and batch processing at night would differ. Similarly, database workloads can change purely with the time period, such as weekday versus weekend. In both situations, fixed, static threshold values might result in false alerts.
Advanced Thresholds allow you to define and manage alert thresholds that are either adaptive (self-adjusting) or time-based (static).
Adaptive Thresholds are thresholds based on statistical calculations from the target's observed behavior (metrics).
This chapter covers the following topics:
You manage advanced thresholds from the Enterprise Manager console. The Advanced Threshold Management page allows you to create time-based static thresholds and adaptive thresholds. To access this page:
The Advanced Threshold Management page displays.
Adaptive thresholds are statistically computed thresholds that adapt to target workload conditions. Adaptive thresholds apply to all targets (both Agent and repository-monitored).
Creating an adaptive threshold is based on the following key concepts:
For the purpose of performance evaluation, a baseline period is a period of time used to characterize the typical behavior of the system. You compare system behavior over the baseline period to that observed at some other time.
There are two types of baseline periods:
Moving window baseline periods: Moving window baselines are defined as some number of days prior to the current date. This "window" of days forms a rolling interval that moves with the current time. The numbers of days that can be used to define a moving window baseline in Enterprise Manager are:
Example: Suppose you specified trailing 7 days as the time period when creating a moving window baseline. In this situation, the most recent 7-day period becomes the baseline period for all metric observations and comparisons today. Tomorrow, this reference period drops the oldest day and picks up today.
Moving window baselines allow you to compare current metric values with recently observed history, thus allowing the baseline to incorporate changes to the system over time. Moving window baselines are suitable for systems with predictable workload cycles.
Enterprise Manager computes moving window statistics every day rather than sampling.
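The rolling behavior of a moving window baseline can be sketched in a few lines of Python. This is only an illustration, not Enterprise Manager code: the function name `moving_window` is hypothetical, and the sketch assumes the window covers whole calendar days and ends on the day before the current date, which is how the trailing 7-day example above reads.

```python
from datetime import date, timedelta


def moving_window(today: date, trailing_days: int = 7) -> tuple[date, date]:
    """Return (first_day, last_day) of the baseline window used on `today`.

    Assumption: the window excludes the current day, so it ends yesterday
    and reaches back `trailing_days` whole days.
    """
    last_day = today - timedelta(days=1)
    first_day = today - timedelta(days=trailing_days)
    return first_day, last_day


# On Monday 2024-01-08 the trailing 7-day window covers 1 Jan through 7 Jan.
window_today = moving_window(date(2024, 1, 8))

# One day later the window drops 1 Jan and picks up 8 Jan.
window_tomorrow = moving_window(date(2024, 1, 9))
```

Because the window is recomputed from the current date each time, no explicit bookkeeping is needed to "drop" the oldest day; it simply falls outside the returned range.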
Adaptive threshold metrics are not immediately available by default; they must be defined and added to the system (registered) in order for them to become available for use by Enterprise Manager. Not all metrics can have adaptive thresholds: Adaptive Threshold metrics must fall into one of the following categories:
You can register adaptive threshold metrics from the Advanced Threshold Management page.
Threshold Change Frequency (The target timezone is used.)
None: One set of thresholds will be calculated using past data. This set of thresholds will be valid for the entire week.
None should be used when there is no usage pattern between daytime versus nighttime or within hours of a day.
By Day and Night: Two sets of thresholds (day and night) will be calculated using past data. Day thresholds will be calculated using the previous day's daytime data; night thresholds will be calculated using the previous day's nighttime data. Thresholds will be changed every day and night.
By Day and Night should be used when there are distinct performance and usage variations between day hours and night hours.
By Weekdays and Weekend: Two sets of thresholds (weekdays and weekend) will be calculated using past data. Weekdays thresholds will be calculated using the previous weekdays' data. Weekend thresholds will be calculated using the previous weekend's data. Thresholds will be changed at the start of the weekdays and at the start of the weekend.
By Day and Night, over Weekdays and Weekend: Four sets of thresholds will be calculated using past data. Weekdays Day thresholds will be calculated using the previous weekday's daytime data, and Weekdays Night thresholds will be calculated using the previous weekday's nighttime data. Weekends Day thresholds will be calculated using the previous weekend's daytime data, and Weekends Night thresholds will be calculated using the previous weekend's nighttime data. Thresholds will be changed each day and night.
Weekday day hours (7 a.m. to 7 p.m.)
Weekend day hours (7 a.m. to 7 p.m.)
Weekday night hours (7 p.m. to 7 a.m.)
Weekend night hours (7 p.m. to 7 a.m.)
By Day of Week: Seven sets of thresholds will be calculated, one for each day of the week. Thresholds will be calculated using the previous week's same-day data. Thresholds will be changed every day.
By Day of Week should be used when there is significant daily variation in usage for each day of week.
By Day and Night, per Day of Week: Fourteen sets of thresholds will be calculated, one for each day of the week and one for each night of the week. Day thresholds will be calculated using the previous week's same-day daytime data; night thresholds will be calculated using the previous week's same-day nighttime data. Thresholds will be changed every day and night, each day of the week.
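The Threshold Change Frequency options above amount to different ways of partitioning the week. The following Python sketch shows one way to map a timestamp (in the target time zone) to its partition. The function name and the option keys are hypothetical, and three assumptions are made that the text does not confirm: the 7 a.m. to 7 p.m. day hours apply to every day/night option, Saturday and Sunday form the weekend, and night hours after midnight are assigned to the calendar day on which they fall.

```python
from datetime import datetime

DAY_START_HOUR = 7     # assumed: day hours start at 7 a.m.
NIGHT_START_HOUR = 19  # assumed: night hours start at 7 p.m.
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def partition(ts: datetime, frequency: str) -> str:
    """Return the threshold partition that `ts` falls into.

    `frequency` is a hypothetical key naming one of the six
    Threshold Change Frequency options.
    """
    day_or_night = ("day" if DAY_START_HOUR <= ts.hour < NIGHT_START_HOUR
                    else "night")
    # weekday() is 0 for Monday; 5 and 6 are Saturday and Sunday.
    week_part = "weekend" if ts.weekday() >= 5 else "weekday"
    day_name = DAY_NAMES[ts.weekday()]

    labels = {
        "none": "all",                                              # 1 set
        "day_night": day_or_night,                                  # 2 sets
        "weekday_weekend": week_part,                               # 2 sets
        "day_night_weekday_weekend": f"{week_part}-{day_or_night}", # 4 sets
        "day_of_week": day_name,                                    # 7 sets
        "day_night_day_of_week": f"{day_name}-{day_or_night}",      # 14 sets
    }
    return labels[frequency]
```

For example, `partition(datetime(2024, 1, 6, 20, 0), "day_night_weekday_weekend")` returns `"weekend-night"`, because 6 January 2024 is a Saturday and 8 p.m. falls in the night hours.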
Accumulated Trailing Data
Total time period for which metric data will be collected. Options are 7, 14, 21, and 28 days. In general, you should select the larger value as the additional data helps in computing more accurate thresholds.
After you have registered the adaptive metrics, you can configure the thresholds if the predefined thresholds do not meet your monitoring requirements.
To configure adaptive thresholds:
Significance Level: Thresholds based on significance level use statistical relevance to determine which current values are statistical outliers. The primary reason to use Significance Level for alerting is that you are trying to detect statistical outliers in metric values as opposed to simply setting a threshold value. Hence, thresholds are percentile based. For example, if the significance level is set to 0.95 for a warning threshold, the metric threshold is set at the value that 5% of the collected metric values exceed, and any current values that exceed this value trigger an alert. A higher significance level of 0.98 or 0.99 will cause fewer alerts to be triggered.
Percentage of Maximum: These types of thresholds compute the threshold values based on specified percentages of the maximum observed over the period of time you selected. Percentage-of-maximum-based alerts are generated if the current value is at or above the percentage of maximum you specify. For example, if a maximum value of 1000 is encountered during a time group, and if 105 is specified as the Warning level, then values above 1050 (105% of 1000 = 1050) will raise an alert.
Percentage of Average: The average is computed for each time group and bucket, allowing you to set metric thresholds relative to the average of the values measured over a period of time and time partition. A setting of 100% corresponds to the average value.
For all types of alerts you can set the Occurrences parameter, which is the number of times the metric crosses a threshold value before an alert is generated.
Clear Threshold: Thresholds for the selected metrics will be cleared. No Alert will be generated. Use this option when you do not want any thresholds set for the metrics but you do not want to remove historical data. Important: Deregistering metrics will remove the historical data.
Depending on the option selected, the Warning, Critical, and Occurrence setting options will change.
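The three computed threshold types described above (Significance Level, Percentage of Maximum, and Percentage of Average) can be illustrated with a short Python sketch. The function names are hypothetical, and the nearest-rank percentile used for the significance level is an assumption chosen for simplicity; the text does not say which statistical method Enterprise Manager actually applies.

```python
import math
from statistics import mean


def significance_threshold(values: list[float], level: float) -> float:
    """Nearest-rank percentile: roughly (1 - level) of the values exceed it.

    Assumption: a plain nearest-rank percentile stands in for whatever
    statistical method the product uses.
    """
    ordered = sorted(values)
    # round() guards against floating-point noise such as 99.00000000000001.
    rank = max(1, math.ceil(round(level * len(ordered), 9)))
    return ordered[rank - 1]


def percent_of_maximum(values: list[float], percent: float) -> float:
    """Threshold as a percentage of the largest observed value."""
    return max(values) * percent / 100


def percent_of_average(values: list[float], percent: float) -> float:
    """Threshold as a percentage of the mean; 100 gives the average itself."""
    return mean(values) * percent / 100


# With the values 1..100 and a 0.95 level, 5 values (96..100) lie above 95.
warning = significance_threshold(list(range(1, 101)), 0.95)

# A maximum of 1000 with a 105% warning level gives a threshold of 1050.
max_based = percent_of_maximum([200, 1000, 640], 105)
```

In each case the input would be the metric values collected for one time partition over the accumulated trailing data period, so every partition gets its own threshold.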
You can set the deviation over the computed average value.
The Threshold action for insufficient data menu allows you to set the appropriate action for Enterprise Manager to take if there is not enough data to calculate a valid metric threshold. There are two actions available: Preserve the prior threshold and Suppress Alerts.
Maximum Allowed Threshold allows you to set a threshold limit that, if crossed, will raise a critical alert. A warning alert will automatically be raised at 25% less than the maximum value.
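The relationship between the Maximum Allowed Threshold and its automatic warning level is simple arithmetic, sketched below in Python. The function name and the returned labels are hypothetical, and the sketch assumes that "crossed" means strictly greater than the limit.

```python
def severity_with_cap(value: float, max_allowed: float) -> str:
    """Classify a metric value against a Maximum Allowed Threshold.

    The warning level sits 25% below the limit, that is, at 75% of it.
    Assumption: an alert is raised only when the value exceeds the level.
    """
    warning_level = max_allowed * 0.75
    if value > max_allowed:
        return "critical"
    if value > warning_level:
        return "warning"
    return "clear"


# With a limit of 100, the warning level is 75.
status = severity_with_cap(80, 100)
```

Here `status` is `"warning"`, because 80 lies above the warning level of 75 but does not cross the limit of 100.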
Even though Enterprise Manager uses the adaptive threshold settings to determine an accurate match between the target workload and the metric thresholds, it is still necessary to match the metric sampling schedule with the actual target workload. For example, your moving window baseline period (see Moving Window Baseline Periods) should match the target workloads. In some situations, you may not know the actual target workloads, in which case setting adaptive thresholds may be problematic.
To help you determine the validity of your adaptive thresholds, Enterprise Manager allows you to analyze thresholds using various adaptive settings to determine whether the settings are correct.
To analyze existing adaptive thresholds:
The Analyze Threshold page displays containing historical metric data charts (one for each metric).
Threshold Change Frequency
Threshold Based On
Metric Warning and Critical Thresholds
Because adaptive metric thresholds utilize statistical sampling of data over time, the accuracy of the thresholds will rely on the quantity and quality of the data collected. Hence, a sufficient amount of metric data needs to have been collected in order for the thresholds to be valid. To verify whether enough data has been collected for metrics registered with adaptive thresholds, use the Test Data Fitness function.
Enterprise Manager evaluates the adaptive threshold metrics and then displays the results in the Test Metrics window.
If sufficient data has been collected to compute the adaptive threshold, a green check appears in the results column. A red 'x' appears in the results column if insufficient data has been collected. To resolve this situation, you can use a longer accumulated trailing data window. Additionally, you can have metric data collected more frequently.
If you no longer want specific metrics to be adaptive, you can deregister them at any time. To deregister an adaptive threshold metric:
You can use monitoring templates to apply adaptive thresholds broadly across targets within your environment. For example, using a monitoring template, you can apply adaptive threshold settings for the CPU Utilization metric to all Host targets.
To apply adaptive thresholds using monitoring templates:
Create a template out of a target that already has adaptive threshold settings enabled.
From the Enterprise menu, select Monitoring and then Monitoring Templates. The Monitoring Templates page displays.
Click Create. The Create Monitoring Template: Copy Monitoring Settings page displays.
Choose a target on which adaptive thresholds have already been set and click Continue.
Enter a template Name and a brief Description. Click OK.
Once the monitoring template has been created, you can view or edit the template as you would any other template. To modify, add, or delete adaptive metrics in the template:
Time-based static thresholds allow you to define specific threshold values to be used at different times to account for changing workloads over time. You can use time-based static thresholds whenever the workload schedule for a specific target is well known or you know which thresholds you want to specify.
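Conceptually, a time-based static threshold is a lookup from the current time period to a fixed pair of warning and critical values. The Python sketch below illustrates the idea for a day/night schedule. All names and threshold values are hypothetical, and the 7 a.m. to 7 p.m. day hours and the "at or above" comparison are assumptions made for the example.

```python
from datetime import datetime

# Hypothetical (warning, critical) values for a CPU utilization metric:
# tighter limits during daytime OLTP, looser limits for nightly batch runs.
STATIC_THRESHOLDS = {
    "day": (70.0, 90.0),
    "night": (85.0, 95.0),
}


def active_thresholds(ts: datetime) -> tuple[float, float]:
    """Return the (warning, critical) pair in effect at time `ts`."""
    period = "day" if 7 <= ts.hour < 19 else "night"
    return STATIC_THRESHOLDS[period]


def evaluate(value: float, ts: datetime) -> str:
    """Classify a metric value against the thresholds active at `ts`."""
    warning, critical = active_thresholds(ts)
    if value >= critical:
        return "critical"
    if value >= warning:
        return "warning"
    return "clear"
```

With these values, a reading of 80 at 10 a.m. evaluates to `"warning"`, while the same reading at 11 p.m. evaluates to `"clear"`, which is the behavior a single static threshold cannot provide.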
To register metrics with time-based static thresholds:
The Metric Selector dialog displays.
The selected metrics appear in the Registered Metrics table.
If you want to set the thresholds for multiple metrics simultaneously, check the Select box for the metrics you want to update and click Bulk Configure Thresholds. The Configure Thresholds dialog displays.
Enter the revised Warning and Critical threshold values and click OK. A confirmation dialog displays stating that existing metric threshold values will be overwritten. Click Yes.
A confirmation dialog displays stating that changing Threshold Change Frequency will affect all the registered metrics and whether you want to continue. Click Yes to proceed.
If you no longer require time-based static threshold metrics, you can deregister them from the target.
To deregister time-based static metric thresholds:
As previously discussed, static thresholds do not account for expected performance variation due to increased/decreased workloads encountered by the target, such as the workload encountered by a warehouse database target against which OLTP transactions are performed. Workloads can also change based on different time periods, such as weekday versus weekend, or day versus night. These types of workload variations present conditions where fixed static metric threshold values may cause monitoring issues, such as the generation of false and/or excessive metric alerts. Ultimately, your monitoring needs dictate how to best go about obtaining accurate metric thresholds.