StandardThresholdEngined

You use the Oracle Communications Unified Assurance Metric Standard Thresholding Engine to monitor metric data and raise an event if a metric surpasses a threshold value within a specified time range. This thresholding engine is best used for monitoring availability of a device or metrics with discrete levels, such as 0% or 100% utilization. If your environment has a large variety of time ranges or poll intervals, you may wish to use multiple thresholding engines split by database shard to maintain performance.

You can run this application as a service using the Services UI.

Note:

If you are using any of the other thresholding engines, you must also enable and start the Metric Standard Thresholding Engine to process threshold violation messages into events. See the following for more information about the other thresholding engines:

Standard Thresholding Engine Setup

Add thresholds or modify existing thresholds by using the Thresholds UI. Set Type to Standard for these thresholds.
Add the thresholds to metrics. You can do this in the following ways:
- Manually, by using the Metrics UI.
- Automatically, during metric creation, using polling assignments or polling policies. See Polling Assignments and Polling Policies in Unified Assurance User's Guide.
- Automatically, during polling, using rules. See AddThresholdToMetric in Unified Assurance Developer's Guide.
Enable the default Metric Standard Thresholding Engine service, or create a custom service using specific configuration options.

See Services in Unified Assurance User's Guide for information about the Services UI.

Default Service

The following table shows the settings for the default service. Actual values are in bold; descriptions of values are in plain text.

Field	Value
Package	coreProcessing-app
Name	Metric Standard Thresholding Engine
Program	bin/core/processing/StandardThresholdEngined
Arguments	This field is blank. There is no default value.
Description	Default Value, Util, and Availability Thresholding based on Daily Tables
Failover Type	Standalone (Supported: Standalone, Primary, Redundant/Backup, Cluster)
Status	Disabled
Privileged	This option is not selected.

See Services in Unified Assurance User's Guide for general information about the settings for services.

See Using Application Primary/Backup Failover and Using Application Clustering for more information about the different failover types.

Note:

Because the Metric Standard Thresholding Engine processes a high volume of data, in larger redundant environments Oracle recommends using application clustering (the Cluster failover type) to spread the load across multiple Metric Standard Thresholding Engine instances on your primary and secondary servers.

Default Configuration

The following table shows the default configurations for the application. Actual values are in bold; descriptions of values are in plain text.

Name	Default Value	Possible Values	Notes
LogFile	logs/MetricStandardThresholdEngine.log	Text, 255 characters	The relative path to the log file.
LogLevel	ERROR	OFF, FATAL, ERROR, WARN, INFO, DEBUG	The logging level for the application.
Threads	3	An integer	The number of process threads to create when handling violations in rules. The engine takes a third of this value (rounded up) for database threads unless overridden by the DBThreads configuration.
BaseRules	processing/event/threshold/base.rules	Text, 255 characters	The relative path to the application Base Rules file.
IncludeRules	processing/event/threshold/base.includes	Text, 255 characters	The relative path to the application Include Rules file.
LoadRules	processing/event/threshold/base.load	Text, 255 characters	The relative path to the application Load Rules file.
PollTime	60	An integer	How often the thresholding engine polls thresholds to see if they need to be checked for violations, in seconds. See Coordinating Poll Times for information about the interaction between polling and frequency for thresholds, metrics, and the thresholding engine.
DeviceZoneID	This field is blank. There is no default value.	All Zones or the name of any available device zone.	Do not use. Instead, add the MetricShardID configuration to retrieve metrics from a specific shard of the metric database.
CheckTime	900	An integer	The interval, in seconds, at which to check the application configuration for changes and to get a list of devices that do not have a maintenance window.
DeviceGroupID	This field is blank. There is no default value.	The name of any available device group.	Do not use. Instead, add the MetricShardID configuration to retrieve metrics from a specific shard of the metric database.
SendAllViolations	Disabled	Enabled or Disabled	(Optional) If set to Enabled, every threshold violation (regardless of current state) will create a notification. Otherwise, only violations that changed state (from Active to Clear or from Clear to Active) will be sent.
DBThreads	3	An integer	(Optional) The number of database threads to create when sending events to the Event database. If not specified, defaults to a third (rounded up) of Threads.
Maintenance	Disabled	Enabled or Disabled	(Optional) If set to Enabled, any device that is in a maintenance window during the current cycle is removed from the list of devices that the thresholding engine checks.
FieldSetFile	This field is blank. There is no default value.	Text, 255 characters	(Optional) The path to a CSV file containing the custom list of fields that will be used when inserting events. If you specify this, you must also specify InsertSQLFile.
InsertSQLFile	This field is blank. There is no default value.	Text, 255 characters	(Optional) The path to a file containing the custom SQL INSERT statement to use when inserting events. If you specify this, you must also specify FieldSetFile.
BranchDir	core/default	Text, 255 characters	The relative path to the rules directory.
ThresholdThreads	3	An integer	The number of threshold threads to create when querying the Metric database. Enables checking thresholds in the application instead of the default thresholding engine. If not specified, application threshold checking is disabled. Caution: Increasing this too much can overwhelm the Metric database. For example, in high-volume environments, do not increase this to more than 6.
PreferIPv4	Enabled	Enabled or Disabled	Whether IPv4 or IPv6 is preferred for communication with devices, when both are available. If this configuration is missing, IPv6 will be preferred.
EventShardID	1	An integer	(Optional) If using event database sharding, the shard to insert events into.
Capture	Disabled	Enabled or Disabled	(Optional) If enabled, saves violation messages in the $A1BASEDIR/logs/captures/StandardThresholdEngine.capture log.
MetricShardID	0	An integer	(Optional) If using metric database sharding, the shard to get metrics from. Use this instead of DeviceGroupID or DeviceZoneID. The default, 0, uses all shards.
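
The "a third (rounded up)" fallback that the table describes for database threads can be illustrated with integer ceiling division. This is only a sketch of the arithmetic: the function name is hypothetical, and the assumption that the engine rounds exactly this way is inferred from the table wording, not from the engine's source.

```python
def default_db_threads(base_threads: int) -> int:
    """Return a third of base_threads, rounded up.

    Hypothetical helper: models the documented fallback used when the
    database thread count is not set explicitly.
    """
    if base_threads < 1:
        raise ValueError("thread count must be a positive integer")
    # Integer ceiling division avoids floating-point rounding.
    return (base_threads + 2) // 3


if __name__ == "__main__":
    for count in (3, 4, 6, 10):
        print(f"{count} base threads -> {default_db_threads(count)} database threads")
```

Setting the database thread count explicitly, as the default configuration does, bypasses this fallback entirely.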

Administration Details

The following list shows the technical details you will need for advanced administration of the application:

Package: coreProcessing-app
Synopsis: ./StandardThresholdEngined [OPTIONS]

Options:

 -c, --AppConfigID N   Application Config ID (Job ID)
 -?, -h, --Help        Print usage and exit

Threaded: Multithreaded

Coordinating Poll Times

The PollTime configuration for the Metric Standard Thresholding Engine interacts with the frequency at which standard thresholds are checked and with the poll time of the metrics themselves. The following settings affect timing:

On a metric: The poll time determines how frequently the metric's value is updated.
On a threshold:
- The frequency determines how frequently the metric value is checked for threshold violations.
- The time range determines the range of data points to use when checking for threshold violations.
On the Metric Standard Thresholding Engine, the poll time determines how frequently the engine polls the thresholds to see which need to be checked.

You must consider the interaction between these poll times when setting them. Because the Metric Standard Thresholding Engine processes all standard thresholds, you must be aware of the frequency of all of the related thresholds when setting the thresholding engine poll time.

The thresholds will only be checked as frequently as the Metric Standard Thresholding Engine polls them. If the Metric Standard Thresholding Engine's PollTime is set to 60 seconds, a threshold will still only be checked every minute, even if you set its Frequency to 30 seconds. However, having the Metric Standard Thresholding Engine poll the thresholds more frequently than the most frequent threshold requires results in unnecessary work.
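
As a rough model of this interaction, the sketch below estimates how often a threshold is actually checked for a given engine poll time. It assumes that a threshold is checked on the first engine poll at which at least its Frequency has elapsed since the last check; that scheduling rule and the function name are assumptions for illustration, not a description of the engine's internals.

```python
def effective_check_interval(engine_poll_time: int, threshold_frequency: int) -> int:
    """Return the assumed real interval, in seconds, between checks of a threshold.

    The engine can only act when it polls, so the interval is modeled as the
    threshold frequency rounded up to the next multiple of the engine poll time.
    """
    if engine_poll_time < 1 or threshold_frequency < 1:
        raise ValueError("times must be positive integers (seconds)")
    polls_needed = -(-threshold_frequency // engine_poll_time)  # ceiling division
    return polls_needed * engine_poll_time


if __name__ == "__main__":
    # Engine polls every 60s, threshold wants 30s: still checked every 60s.
    print(effective_check_interval(60, 30))
    # Engine polls every 15s, threshold wants 60s: checked every 60s,
    # so three of every four engine polls find nothing to do.
    print(effective_check_interval(15, 60))
```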

Similarly, checking the threshold for violations more frequently than the metric data is updated could result in false positives, while checking the threshold too infrequently could result in missed violations. For example, you could get inaccurate data by setting the threshold's Frequency to 60 seconds when the metric's Poll Time is set to 300 seconds, or setting Frequency to 300 seconds when the metric's Poll Time is 30 seconds.

As a basic guideline, Oracle recommends setting the Frequency for a threshold to an interval that is the same as or longer than both the PollTime for the Metric Standard Thresholding Engine and the Poll Time for the related metric.

See Metrics and Thresholds in Unified Assurance User's Guide for more information about configuring thresholds and metrics, including setting their frequency and poll times.

Note:

For thresholds that need very frequent polling times (less than a minute), using in-application thresholding may be more efficient than using the Metric Standard Thresholding Engine.

Poll Time Example

This example involves the following components:

Metric Standard Thresholding Engine: PollTime is set to 60s.
Threshold 1:
- Frequency is set to 300s.
- Time Range is set to 900s.
Metric 1: Poll Time is set to 300s.
Threshold 2:
- Frequency is set to 60s.
- Time Range is set to 180s.
Metric 2: Poll Time is set to 60s.

At 10:01, the following happens:

The Metric Standard Thresholding Engine polls the thresholds to see if any should be checked.
Because threshold 1 was last checked at 10:00, and it only needs to be checked every 5 minutes, the thresholding engine does not check it.
Because threshold 2 was last checked at 10:00, and needs to be checked every minute, the thresholding engine checks if metric 2, which is polled every minute, violates threshold 2.
If metric 2 violates threshold 2, which is evaluated for data received since 9:58, the thresholding engine creates an event.

If PollTime for the Metric Standard Thresholding Engine is instead set to 300, the engine will not check threshold 2 frequently enough and might miss violations. If PollTime is instead set to 15, the engine will do unnecessary work, polling the thresholds far more often than any of them needs to be checked.
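
The example timeline can be reproduced with a short simulation. This sketch uses the values from the example (engine poll time of 60 seconds, both thresholds last checked at 10:00). The scheduling rule (check a threshold once its Frequency has elapsed since the last check), the calendar date, and all names in the code are assumptions made for illustration.

```python
from datetime import datetime, timedelta

ENGINE_POLL_TIME = 60  # seconds
THRESHOLDS = {
    "threshold 1": {"frequency": 300, "time_range": 900},
    "threshold 2": {"frequency": 60, "time_range": 180},
}


def simulate(start, polls):
    """Return (poll time, threshold name, window start) for every check performed."""
    last_checked = {name: start for name in THRESHOLDS}
    checks = []
    for i in range(1, polls + 1):
        now = start + timedelta(seconds=i * ENGINE_POLL_TIME)
        for name, cfg in THRESHOLDS.items():
            # A threshold is due once its frequency has elapsed since the last check.
            if (now - last_checked[name]).total_seconds() >= cfg["frequency"]:
                window_start = now - timedelta(seconds=cfg["time_range"])
                checks.append((now, name, window_start))
                last_checked[name] = now
    return checks


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 10, 0)  # arbitrary date; only the time of day matters
    for when, name, window_start in simulate(start, 5):
        print(f"{when:%H:%M} check {name} using data since {window_start:%H:%M}")
```

Under these assumptions, the first check is threshold 2 at 10:01 using data since 09:58, and threshold 1 is not checked until 10:05.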

Standard Thresholding Engine Self-Health Metrics

The following self-health metrics are captured for the Metric Standard Thresholding Engine:

Polled Thresholds: The number of thresholds being polled.
Threshold Violations: The number of thresholds being violated.
Poll Queue Length: The number of messages in the single-threaded polling queue, waiting to be added to the threshold queue.
Process Queue Length: The number of messages in the processing queue, waiting to be processed by rules.
Database Queue Length: The number of messages in the database queue, waiting to be inserted as events in the Event database.
Poll Duration: The duration of poll requests.
Average Db Time: The average time to insert data into or get data from the database in each poll cycle.

To monitor the metrics, you can add panels for them to a new or existing dashboard in Metric Analytics. Users in groups whose role has the Admin permission in the metricAnalytics package can add or edit Metric Analytics dashboards.

To add panels for the Standard Thresholding Engine self-health metrics:

From the main navigation menu, select Analytics, then Metrics, then Dashboard.
Click New Dashboard.
Click Add an empty panel.
In the Query section, click Toggle text edit mode (the pencil icon).
Enter the following statement:
```
SELECT ("value") FROM "metrictype_<metric>" WHERE ("instance" = 'Metric Standard Thresholding Engine' AND "host" =~ /^$Host$/)
```
where <metric> is one of the following:
- Polled_Thresholds
- Threshold_Violations
- Poll_Queue_Length
- Process_Queue_Length
- Database_Queue_Length
- Poll_Duration
- Average_Db_Time
Adjust any other panel settings as needed. For example, set the query name and panel title, adjust the query, visualization, display, and axes, and so on. See Create a dashboard in the Grafana documentation for more information about creating and configuring dashboards.
Click Apply to see the final dashboard.
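
If you are adding a panel for each metric, it can be convenient to generate all of the query statements at once and paste them in. The following sketch substitutes each metric name from the list above into the statement shown in the procedure. The script and its names are illustrative only and are not part of Unified Assurance.

```python
QUERY_TEMPLATE = """SELECT ("value") FROM "metrictype_<metric>" WHERE ("instance" = 'Metric Standard Thresholding Engine' AND "host" =~ /^$Host$/)"""

SELF_HEALTH_METRICS = [
    "Polled_Thresholds",
    "Threshold_Violations",
    "Poll_Queue_Length",
    "Process_Queue_Length",
    "Database_Queue_Length",
    "Poll_Duration",
    "Average_Db_Time",
]


def build_queries():
    """Return a dict mapping each self-health metric name to its panel query."""
    return {name: QUERY_TEMPLATE.replace("<metric>", name) for name in SELF_HEALTH_METRICS}


if __name__ == "__main__":
    for name, query in build_queries().items():
        print(f"-- {name}")
        print(query)
```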

Logging the Threshold Queue Length

In addition to the default self-health metrics for the polling, process, and database queues, you may want to monitor the number of messages in the ThresholdsQueue, which contains queries to the Metric database to check for threshold violations. Monitoring this queue can be useful when stress-testing the Metric Standard Thresholding Engine.

You can do this by adding a rule to the thresholding base.rules.

The sample rule provided below logs the information for all of the queues every 100 threshold violations, but you can adjust this rule as needed. For example:

To log the threshold count for every violation, uncomment the second line.
To only log for ThresholdsQueue (the other queues are tracked in self-health metrics), comment out the lines for DatabaseQueue, ProcessQueue, and PollingQueue.

To add the rule:

From the Configuration menu, select Rules. See Rules for information about this UI.
Expand the following folders:

Core Rules (core)/Default read-write branch (default)/processing/event/threshold
Open the base.load file and add the following line:
```
$CustomHash->{ThresholdCount} = 0;
```
This sets the initial threshold count to 0.
Click Submit.

Open the base.rules file and add the following lines:

```
$CustomHash->{ThresholdCount} = $CustomHash->{ThresholdCount} + 1;
# $Log->Message('ALWAYS', "ThresholdCount: ". $CustomHash->{ThresholdCount});
if ($CustomHash->{ThresholdCount} % 100 == 0) {
    $Log->Message('ALWAYS', "ThresholdCount: ". $CustomHash->{ThresholdCount});
    $Log->Message('ALWAYS', "ThresholdsQueue: ". $ThresholdsQueue->pending());
    $Log->Message('ALWAYS', "PollingQueue: ". $PollingQueue->pending());
    $Log->Message('ALWAYS', "ProcessQueue: ". $ProcessQueue->pending());
    $Log->Message('ALWAYS', "DatabaseQueue: ". $DatabaseQueue->pending());
}
```

For every 100 threshold violations, this logs the number of pending items in the queues. You can optionally change the value in the third line to log more or less frequently, or uncomment the second line to log the threshold count every time it increases.

Click Submit.
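
After the rule has been running during a stress test, you may want to pull the queue lengths back out of the log for review. The following sketch extracts the values written by the sample rule. It relies only on the "Label: number" text that the rule produces. The timestamp prefix in the sample lines is hypothetical, because the exact log line layout is not described here, and the function name is illustrative.

```python
import re

# Labels match the message text written by the sample rule.
QUEUE_PATTERN = re.compile(
    r"(ThresholdCount|ThresholdsQueue|PollingQueue|ProcessQueue|DatabaseQueue): (\d+)"
)

# Hypothetical log lines; only the "Label: number" part comes from the rule.
SAMPLE_LINES = [
    "2024-01-01 10:00:00 ThresholdCount: 100",
    "2024-01-01 10:00:00 ThresholdsQueue: 12",
    "2024-01-01 10:00:00 PollingQueue: 0",
    "2024-01-01 10:00:00 ProcessQueue: 3",
    "2024-01-01 10:00:00 DatabaseQueue: 1",
    "2024-01-01 10:02:10 ThresholdCount: 200",
    "2024-01-01 10:02:10 ThresholdsQueue: 40",
]


def parse_queue_lines(lines):
    """Group logged values into samples; each ThresholdCount line starts a new sample."""
    samples = []
    for line in lines:
        match = QUEUE_PATTERN.search(line)
        if not match:
            continue
        label, value = match.group(1), int(match.group(2))
        if label == "ThresholdCount" or not samples:
            samples.append({})
        samples[-1][label] = value
    return samples


if __name__ == "__main__":
    for sample in parse_queue_lines(SAMPLE_LINES):
        print(sample)
```

If you uncomment the second line of the rule, most samples will contain only ThresholdCount, because the queue lengths are still logged only every 100 violations.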
