StandardThresholdEngined

You use the Oracle Communications Unified Assurance Metric Standard Thresholding Engine to monitor metric data and raise an event if a metric surpasses a threshold value within a specified time range. This thresholding engine is best used for monitoring availability of a device or metrics with discrete levels, such as 0% or 100% utilization. If your environment has a large variety of time ranges or poll intervals, you may wish to use multiple thresholding engines split by database shard to maintain performance.

You can run this application as a service using the Services UI.

Note:

If you are using any of the other thresholding engines, you must also enable and start the Metric Standard Thresholding Engine to process threshold violation messages into events. See the following for more information about the other thresholding engines:

Standard Thresholding Engine Setup

  1. Add thresholds or modify existing thresholds by using the Thresholds UI. Set Type to Standard for these thresholds.

  2. Add the thresholds to metrics. You can do this in the following ways:

    • Manually, by using the Metrics UI.

    • Automatically, during metric creation, using polling assignments or polling policies. See Polling Assignments and Polling Policies in Unified Assurance User's Guide.

    • Automatically, during polling, using rules. See AddThresholdToMetric in Unified Assurance Developer's Guide.

  3. Enable the default Metric Standard Thresholding Engine service, or create a custom service using specific configuration options.

    See Services in Unified Assurance User's Guide for information about the Services UI.

Default Service

The following table shows the settings for the default service. Actual values are in bold, descriptions of values are in plaintext.

Field | Value
Package | coreProcessing-app
Name | Metric Standard Thresholding Engine
Program | bin/core/processing/StandardThresholdEngined
Arguments | This field is blank. There is no default value.
Description | Default Value, Util, and Availability Thresholding based on Daily Tables
Failover Type | Standalone (Supported: Standalone, Primary, Redundant/Backup, Cluster)
Status | Disabled
Privileged | This option is not selected.

See Services in Unified Assurance User's Guide for general information about the settings for services.

See Using Application Primary/Backup Failover and Using Application Clustering for more information about the different failover types.

Note:

Because the Metric Standard Thresholding Engine processes a high volume of data, in larger redundant environments Oracle recommends using application clustering (the Cluster failover type) to spread the load across multiple Metric Standard Thresholding Engine instances on your primary and secondary servers.

Default Configuration

The following table shows the default configurations for the application. Actual values are in bold, descriptions of values are in plaintext.

Name | Default Value | Possible Values | Notes
LogFile | logs/MetricStandardThresholdEngine.log | Text, 255 characters | The relative path to the log file.
LogLevel | ERROR | OFF, FATAL, ERROR, WARN, INFO, DEBUG | The logging level for the application.
Threads | 3 | An integer | The number of process threads to create when handling violations in rules. The engine takes a third of this value (rounded up) for database threads unless overridden by the DBThreads configuration.
BaseRules | processing/event/threshold/base.rules | Text, 255 characters | The relative path to the application Base Rules file.
IncludeRules | processing/event/threshold/base.includes | Text, 255 characters | The relative path to the application Include Rules file.
LoadRules | processing/event/threshold/base.load | Text, 255 characters | The relative path to the application Load Rules file.
PollTime | 60 | An integer | How often, in seconds, the thresholding engine polls thresholds to see whether they need to be checked for violations. See Coordinating Poll Times for information about the interaction between polling and frequency for thresholds, metrics, and the thresholding engine.
DeviceZoneID | This field is blank. There is no default value. | All Zones or the name of any available device zone. | Do not use. Instead, add the MetricShardID configuration to retrieve metrics from a specific shard of the metric database.
CheckTime | 900 | An integer | The interval, in seconds, at which to check the application configuration for changes and to get a list of devices that do not have a maintenance window.
DeviceGroupID | This field is blank. There is no default value. | The name of any available device group. | Do not use. Instead, add the MetricShardID configuration to retrieve metrics from a specific shard of the metric database.
SendAllViolations | Disabled | Enabled or Disabled | (Optional) If set to Enabled, every threshold violation, regardless of its current state, creates a notification. Otherwise, only violations that change state (from Active to Clear or from Clear to Active) are sent.
DBThreads | 3 | An integer | (Optional) The number of database threads to create when sending events to the Event database. If not specified, defaults to a third (rounded up) of Threads.
Maintenance | Disabled | Enabled or Disabled | (Optional) If enabled, devices that are in a maintenance window during the current cycle are removed from the list of devices that the thresholding engine checks.
FieldSetFile | This field is blank. There is no default value. | Text, 255 characters | (Optional) The path to a CSV file containing the custom list of fields to use when inserting events. If you specify this, you must also specify InsertSQLFile.
InsertSQLFile | This field is blank. There is no default value. | Text, 255 characters | (Optional) The path to a file containing a custom SQL INSERT statement to use when inserting events. If you specify this, you must also specify FieldSetFile.
BranchDir | core/default | Text, 255 characters | The relative path to the rules directory.
ThresholdThreads | 3 | An integer | The number of threshold threads to create when querying the Metric database. Enables checking thresholds in the application instead of the default thresholding engine. If not specified, application threshold checking is disabled. Caution: Increasing this too much can overwhelm the Metric database. For example, in high-volume environments, do not increase this to more than 6.
PreferIPv4 | Enabled | Enabled or Disabled | Whether IPv4 or IPv6 is preferred for communication with devices when both are available. If this configuration is missing, IPv6 is preferred.
EventShardID | 1 | An integer | (Optional) If using event database sharding, the shard to insert events into.
Capture | Disabled | Enabled or Disabled | (Optional) If enabled, saves violation messages in the $A1BASEDIR/logs/captures/StandardThresholdEngine.capture log.
MetricShardID | 0 | An integer | (Optional) If using metric database sharding, the shard to get metrics from. Use this instead of DeviceGroupID or DeviceZoneID. The default, 0, uses all shards.
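Two rules from the table above can be modeled in a few lines: how the database thread count is derived (following the Threads row, which bases it on Threads) and which violations produce notifications under SendAllViolations. This is a minimal sketch, not the engine's code. The function names are hypothetical, and it assumes that a violation with no known previous state is compared against Clear.

```perl
use strict;
use warnings;
use POSIX qw(ceil);

# Hypothetical model of the database thread count: an explicit DBThreads
# value wins; otherwise use a third of Threads, rounded up.
sub database_threads {
    my (%config) = @_;
    return $config{DBThreads} if defined $config{DBThreads};
    return ceil($config{Threads} / 3);
}

# Hypothetical model of SendAllViolations: when enabled, every violation is
# sent; otherwise only a change between Clear and Active is sent.
# Assumption: an unknown previous state is treated as Clear.
sub should_notify {
    my ($send_all, $previous_state, $current_state) = @_;
    return 1 if $send_all;
    $previous_state //= 'Clear';
    return $previous_state eq $current_state ? 0 : 1;
}

print database_threads(Threads => 3), "\n";                    # 1
print database_threads(Threads => 8), "\n";                    # 3
print database_threads(Threads => 8, DBThreads => 5), "\n";    # 5
print should_notify(0, 'Active', 'Active'), "\n";              # 0
print should_notify(0, 'Clear',  'Active'), "\n";              # 1
print should_notify(1, 'Active', 'Active'), "\n";              # 1
```

With the shipped defaults (Threads 3, DBThreads 3), the explicit DBThreads value is used, so the rounding rule only matters when DBThreads is removed from the configuration.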

Administration Details

The following sections describe the technical details you need for advanced administration of the application.

Coordinating Poll Times

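The PollTime configuration for the Metric Standard Thresholding Engine interacts with the frequency at which standard thresholds are checked and with the poll time of the metrics themselves. The following settings affect timing:

  • PollTime, in the Metric Standard Thresholding Engine configuration: how often the engine polls thresholds to see whether they need to be checked.

  • Frequency, set for each threshold: how often the threshold is checked for violations.

  • Poll Time, set for each metric: how often the metric data is polled and updated.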

You must consider the interaction between these poll times when setting them. Because the Metric Standard Thresholding Engine processes all standard thresholds, you must be aware of the frequency of all of the related thresholds when setting the thresholding engine poll time.

The thresholds are only checked as frequently as the Metric Standard Thresholding Engine polls them. If the engine's PollTime is set to 60 seconds and you set a threshold's Frequency to 30 seconds, the threshold is still checked only once a minute. However, having the engine poll the thresholds more often than the most frequent threshold requires results in unnecessary work.
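The relationship between a threshold's Frequency and the engine's PollTime can be sketched as follows. The sketch assumes, consistent with the poll time example later in this section, that on each poll the engine checks any threshold whose Frequency has elapsed since its last check. Under that assumption, the real check interval is the Frequency rounded up to a whole number of polls. The function names are hypothetical.

```perl
use strict;
use warnings;
use POSIX qw(ceil);
use List::Util qw(min);

# Assumption: on each poll, the engine checks a threshold if at least its
# Frequency has elapsed since the last check. The real check interval is then
# the Frequency rounded up to a whole number of engine polls.
sub effective_interval {
    my ($frequency, $poll_time) = @_;
    return ceil($frequency / $poll_time) * $poll_time;
}

# Poll as often as the most frequent threshold needs, and no more often.
sub suggested_poll_time {
    my (@frequencies) = @_;
    return min(@frequencies);
}

print effective_interval(30, 60), "\n";           # 60: once a minute, not every 30 s
print effective_interval(90, 60), "\n";           # 120
print effective_interval(300, 60), "\n";          # 300
print suggested_poll_time(300, 60, 120), "\n";    # 60
```

The first call reproduces the case described above. The rounding of 90 seconds up to 120 follows only from the stated assumption, so treat it as illustrative.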

Similarly, checking the threshold for violations more frequently than the metric data is updated could result in false positives, while checking the threshold too infrequently could result in missed violations. For example, you could get inaccurate data by setting the threshold's Frequency to 60 seconds when the metric's Poll Time is set to 300 seconds, or setting Frequency to 300 seconds when the metric's Poll Time is 30 seconds.

As a basic guideline, Oracle recommends setting each threshold's Frequency to an interval equal to or longer than both the PollTime for the Metric Standard Thresholding Engine and the Poll Time for the related metric.
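The guideline above can be expressed as a simple configuration check. The helper name and its argument names are hypothetical, and all values are in seconds.

```perl
use strict;
use warnings;

# Hypothetical check of the guideline: a threshold's Frequency should be equal
# to or longer than both the engine PollTime and the metric Poll Time (seconds).
sub timing_warnings {
    my (%t) = @_;
    my @warnings;
    if ($t{frequency} < $t{engine_poll_time}) {
        push @warnings, "Frequency $t{frequency}s is shorter than engine PollTime "
                      . "$t{engine_poll_time}s; checks run no more often than the engine polls";
    }
    if ($t{frequency} < $t{metric_poll_time}) {
        push @warnings, "Frequency $t{frequency}s is shorter than metric Poll Time "
                      . "$t{metric_poll_time}s; checks may see stale data and report false positives";
    }
    return @warnings;
}

# The first case from the earlier example: Frequency 60 s, metric Poll Time 300 s.
print "$_\n" for timing_warnings(frequency => 60, engine_poll_time => 60, metric_poll_time => 300);
```

This check does not flag the opposite case from that example (Frequency 300 s with a 30 s metric Poll Time), where violations can be missed, because the guideline gives no cutoff for "too infrequent".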

See Metrics and Thresholds in Unified Assurance User's Guide for more information about configuring thresholds and metrics, including setting their frequency and poll times.

Note:

For thresholds that must be checked very frequently (more often than once a minute), in-application thresholding may be more efficient than the Metric Standard Thresholding Engine.

Poll Time Example

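This example involves the following components:

  • The Metric Standard Thresholding Engine, with PollTime set to 60 seconds.

  • Threshold 1, with a Frequency of 5 minutes, last checked at 10:00.

  • Threshold 2, with a Frequency of 1 minute, last checked at 10:00, which applies to metric 2.

  • Metric 2, which is polled every minute.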

At 10:01, the following happens:

  1. The Metric Standard Thresholding Engine polls the thresholds to see whether any need to be checked.

  2. Because threshold 1 was last checked at 10:00, and it only needs to be checked every 5 minutes, the thresholding engine does not check it.

  3. Because threshold 2 was last checked at 10:00, and needs to be checked every minute, the thresholding engine checks if metric 2, which is polled every minute, violates threshold 2.

  4. If metric 2 violates threshold 2, which is evaluated for data received since 9:58, the thresholding engine creates an event.

If PollTime for the Metric Standard Thresholding Engine is instead set to 300, the engine does not check threshold 2 often enough and might miss violations. If PollTime is instead set to 15, the engine polls the thresholds four times a minute, which is more often than any threshold needs to be checked and results in unnecessary work.
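The example can be replayed with a small simulation that compares the three PollTime values. It is a sketch, not the engine's scheduler: it assumes a threshold is due when at least its Frequency has elapsed since its last check, and it expresses times as seconds after 10:00.

```perl
use strict;
use warnings;

# Replays the example: returns one "time threshold" entry per check.
# Times are seconds after 10:00; every threshold was last checked at 10:00.
# Assumption: a threshold is due when at least its Frequency has elapsed.
sub simulate {
    my ($poll_time, $duration, %frequency) = @_;
    my %last_checked;
    $last_checked{$_} = 0 for keys %frequency;
    my @checks;
    for (my $now = $poll_time; $now <= $duration; $now += $poll_time) {
        for my $name (sort keys %frequency) {
            next if $now - $last_checked{$name} < $frequency{$name};
            $last_checked{$name} = $now;
            push @checks, sprintf('10:%02d:%02d %s', int($now / 60), $now % 60, $name);
        }
    }
    return @checks;
}

my %thresholds = ('threshold 1' => 300, 'threshold 2' => 60);
for my $poll_time (60, 300, 15) {
    my @checks = simulate($poll_time, 300, %thresholds);
    printf "PollTime %3d: polls=%d, checks=%d\n", $poll_time, 300 / $poll_time, scalar @checks;
}
```

Over five minutes, PollTime 60 gives 5 polls and 6 checks (threshold 2 at 10:01 through 10:05, threshold 1 at 10:05). PollTime 300 checks threshold 2 only once. PollTime 15 performs the same 6 checks but needs 20 polls to do so.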

Standard Thresholding Engine Self-Health Metrics

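The Metric Standard Thresholding Engine captures self-health metrics for the number of polled thresholds and threshold violations, the lengths of the poll, process, and database queues, the poll duration, and the average database time.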

To monitor the metrics, you can add panels for them to a new or existing dashboard in Metric Analytics. Users in groups whose role has the Admin permission in the metricAnalytics package can add or edit Metric Analytics dashboards.

To add panels for the Standard Thresholding Engine self-health metrics:

  1. From the main navigation menu, select Analytics, then Metrics, then Dashboard.

  2. Click New Dashboard.

  3. Click Add an empty panel.

  4. In the Query section, click Toggle text edit mode (the pencil icon).

  5. Enter the following statement:

    SELECT ("value") FROM "metrictype_<metric>" WHERE ("instance" = 'Metric Standard Thresholding Engine' AND "host" =~ /^$Host$/)
    

    where <metric> is one of the following:

    • Polled_Thresholds

    • Threshold_Violations

    • Poll_Queue_Length

    • Process_Queue_Length

    • Database_Queue_Length

    • Poll_Duration

    • Average_Db_Time

  6. Adjust any other panel settings as needed. For example, set the query name and panel title, adjust the query, visualization, display, and axes, and so on. See Create a dashboard in the Grafana documentation for more information about creating and configuring dashboards.

  7. Click Apply to see the final dashboard.
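If you are creating panels for several metrics, the query from step 5 differs only in the measurement name. The following sketch prints one query per metric listed in step 5 so that each can be pasted into a panel. It only builds strings and does not call Grafana or Metric Analytics.

```perl
use strict;
use warnings;

# Self-health metric names from step 5.
my @metrics = qw(
    Polled_Thresholds Threshold_Violations Poll_Queue_Length Process_Queue_Length
    Database_Queue_Length Poll_Duration Average_Db_Time
);

# Builds the step 5 query for one metric. q{} keeps $Host$ literal so that
# Grafana can substitute the dashboard's Host variable.
sub panel_query {
    my ($metric) = @_;
    return qq{SELECT ("value") FROM "metrictype_$metric" }
         . q{WHERE ("instance" = 'Metric Standard Thresholding Engine' AND "host" =~ /^$Host$/)};
}

print panel_query($_), "\n" for @metrics;
```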

Logging the Threshold Queue Length

In addition to the default self-health metrics for the polling, process, and database queues, you may want to monitor the number of messages in the thresholds queue ($ThresholdsQueue in the rules), which contains queries to the Metric database that check for threshold violations. Monitoring this queue can be useful when stress-testing the Metric Standard Thresholding Engine.

You can do this by adding a rule to the thresholding engine's base.rules file.

The following sample rule logs the number of pending items in each queue after every 100 threshold violations. You can adjust the rule as needed.

To add the rule:

  1. From the Configuration menu, select Rules. See Rules for information about this UI.

  2. Expand the following folders:

    Core Rules (core)/Default read-write branch (default)/processing/event/threshold

  3. Open the base.load file and add the following line:

    $CustomHash->{ThresholdCount} = 0;
    

    This sets the initial threshold count to 0.

  4. Click Submit.

  5. Open the base.rules file and add the following lines:

    $CustomHash->{ThresholdCount} = $CustomHash->{ThresholdCount} + 1;
    # $Log->Message('ALWAYS', "ThresholdCount: ". $CustomHash->{ThresholdCount});
    if ($CustomHash->{ThresholdCount} % 100 == 0) {
        $Log->Message('ALWAYS', "ThresholdCount: ". $CustomHash->{ThresholdCount});
        $Log->Message('ALWAYS', "ThresholdsQueue: ". $ThresholdsQueue->pending());
        $Log->Message('ALWAYS', "PollingQueue: ". $PollingQueue->pending());
        $Log->Message('ALWAYS', "ProcessQueue: ". $ProcessQueue->pending());
        $Log->Message('ALWAYS', "DatabaseQueue: ". $DatabaseQueue->pending());
    }
    

    For every 100 threshold violations, this logs the number of pending items in each queue. You can optionally change the value 100 in the third line to log more or less frequently, or uncomment the second line to log the threshold count every time it increases.

  6. Click Submit.
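To see what the rule logs before adding it to a live engine, the counting logic can be exercised outside Unified Assurance. In this sketch, StubQueue and StubLog are hypothetical stand-ins for the engine-provided queue objects and $Log. They implement only the pending and Message calls that the rule uses, and the queue lengths are made-up values. The if block is the rule from step 5, unchanged.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an engine queue: only pending() is implemented.
package StubQueue;
sub new     { my ($class, $count) = @_; return bless { count => $count }, $class }
sub pending { my ($self) = @_; return $self->{count} }

# Hypothetical stand-in for $Log: Message() records "LEVEL text" lines.
package StubLog;
sub new     { my ($class) = @_; return bless { lines => [] }, $class }
sub Message { my ($self, $level, $text) = @_; push @{ $self->{lines} }, "$level $text"; return }

package main;

my $Log             = StubLog->new;
my $ThresholdsQueue = StubQueue->new(12);
my $PollingQueue    = StubQueue->new(0);
my $ProcessQueue    = StubQueue->new(3);
my $DatabaseQueue   = StubQueue->new(1);

# base.load: runs once.
my $CustomHash = {};
$CustomHash->{ThresholdCount} = 0;

# base.rules: runs once per threshold violation; simulate 250 violations.
for (1 .. 250) {
    $CustomHash->{ThresholdCount} = $CustomHash->{ThresholdCount} + 1;
    if ($CustomHash->{ThresholdCount} % 100 == 0) {
        $Log->Message('ALWAYS', "ThresholdCount: ". $CustomHash->{ThresholdCount});
        $Log->Message('ALWAYS', "ThresholdsQueue: ". $ThresholdsQueue->pending());
        $Log->Message('ALWAYS', "PollingQueue: ". $PollingQueue->pending());
        $Log->Message('ALWAYS', "ProcessQueue: ". $ProcessQueue->pending());
        $Log->Message('ALWAYS', "DatabaseQueue: ". $DatabaseQueue->pending());
    }
}

# 10 lines: counts 100 and 200, each followed by the four queue lengths.
print "$_\n" for @{ $Log->{lines} };
```

The counter keeps growing across violations, which is why the sample initializes it in base.load rather than in base.rules.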