Performance Monitoring - Metrics

Unified Assurance Performance Monitoring provides a complete set of tools for gathering any metric, from any device, using any technology, at the granularity required for near real-time data collection. The collected data is stored using a "Big Data" model and enhanced with proactive analysis, monitoring, and reporting, which gives forewarning of anomalies before they become outages. Integration with ticketing systems allows rapid turnaround, high visibility, and better tracking of troubleshooting issues. The included Knowledgebase system keeps detailed historical and current information and troubleshooting documents in one place, so repeat issues can be found and resolved quickly. Together with ad hoc reports, scheduled reporting, and dashboards, this helps reduce overhead costs and minimize downtime.

The following sections cover configuring and using performance monitoring in Unified Assurance.

Objectives

Example

In this example, ping and SNMP polling are set up for devices, thresholds are added and configured, and a polling policy is created.

Metric Collection

Configuring Ping Polling

The following section covers Ping Polling of devices for Latency and Packet Loss metrics, using Unified Assurance's Default Ping Poller Template.

  1. Navigate to Metric Types. From this UI, you can add, edit, and remove metric types, which define how a metric is displayed in Unified Assurance.

    Configuration -> Metrics -> Metric Types

  2. In this example, the Latency and Packet Loss metric types are enabled for TopN viewing (i.e. TopN Scope set to Value), so that these metric types become available in the TopN Overview.

    Note:

    This is an optional step. Having the TopN Scope set to "Disabled" will not affect the gathering of metrics. It only affects metric availability/visibility in the TopN Overview.

    • To enable these metrics in the TopN Overview, click on them to open for editing, change the "TopN Scope" value to "Value", and click "Submit" to save the changes.
  3. Navigate to Poller Templates. Note that the "Default Ping" template includes the Latency, Packet Loss, Ping Jitter, and Ping Jitter Utilization metric types.

    Configuration -> Metrics -> Poller Templates

    • From this UI, you can create, edit and delete Poller Templates.

    • These templates consist of groups of Metric Types, and specify what metrics will be created and stored for the devices and instances that are polled using the template.

  4. Navigate to Polling Assignments. From this UI, you configure polling on a device instance for metric data.

    Configuration -> Metrics -> Polling Assignments

  5. In this example, ping polling is set up for a selected number of devices.

    • For Method, select "NA".

    • For Poller Template, select "Default Ping".

    • For Threshold Group, select "Default Ping".

      • Thresholds will be covered in detail in a later section.
    • For Poll Time, enter 300.

      • Poll Time is how often the poller polls for these metrics. It is measured in seconds, so entering 300 means the metrics are polled every 5 minutes.
  6. In the Devices section, select the devices you want to poll metrics from and use the "Add" or "Add All" buttons to select them for polling.

    • If the list of devices is very large, you can use the filter button to filter specific devices by Device Name, Device ID, Device Group and/or Device Zone.
  7. Using the filter button, filter for the "Device" instances and use the "Add All" button to select the instances.

  8. Click "Submit" to create the metrics for polling.

  9. Navigate to Services.

    Configuration -> Broker Control -> Services

  10. Click on the "Metric Ping Latency Poller" service to open the "Service (Edit)" form to the right of the UI.

  11. In the edit form, change the Status value to "Enabled". Click the "Submit" button to save the changes. The Ping Latency Poller service is now set to enabled.

  12. Select the service again, then click on the Start button to start the ping poller immediately. The Ping Latency Poller will then poll devices for Latency, Packet Loss, Ping Jitter and Ping Jitter Utilization metrics every 5 minutes (assuming the default Unified Assurance application configuration is used). These metrics will be viewable from a number of UI pages, such as the "Device Overview" dashboard and the "All Metrics Overview".

    Note:

    If the service is not started manually, the broker starts it automatically within the next minute.

  13. Navigate to Devices in the navigational bar, and then click on the "Metrics" icon for a device to view the data. This interface will display a list of all metrics that are being polled from that device. You can click on the filter icon in the top right of the UI to open the filter bar, in order to filter the list of metrics. Clicking on a metric from the list will open a performance graph for that metric.
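
For readers unfamiliar with the Default Ping metric types, the following Perl sketch shows one common way latency, packet loss, and jitter can be derived from a set of ping replies. It is an illustration under stated assumptions, not the Ping Latency Poller's algorithm: the sample round-trip times are made up, the function name is hypothetical, jitter is taken as the mean absolute difference between consecutive replies, and Ping Jitter Utilization is not covered.

```perl
use strict;
use warnings;

# Summarize round-trip times in milliseconds; undef marks a lost reply.
# (Hypothetical helper, not part of Unified Assurance.)
sub ping_summary {
    my (@rtts) = @_;
    my @ok = grep { defined } @rtts;

    # Packet loss: share of probes that received no reply.
    my $loss_pct = @rtts ? 100 * (@rtts - @ok) / @rtts : 0;

    # Latency: mean round-trip time of the replies that arrived.
    my $latency = 0;
    if (@ok) {
        $latency += $_ for @ok;
        $latency /= @ok;
    }

    # Jitter (assumed definition): mean absolute difference between
    # consecutive round-trip times.
    my $jitter = 0;
    if (@ok > 1) {
        $jitter += abs($ok[$_] - $ok[$_ - 1]) for 1 .. $#ok;
        $jitter /= $#ok;
    }
    return ($latency, $loss_pct, $jitter);
}

my ($latency, $loss, $jitter) = ping_summary(10, 12, undef, 14, 12);
printf "latency=%.1f ms loss=%.0f%% jitter=%.1f ms\n", $latency, $loss, $jitter;
```

With the sample input above, four of five probes are answered, so the sketch reports 20% loss alongside the averaged latency and jitter.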

Configuring SNMP Polling

This section covers setting up SNMP-based polling of devices. The first example uses Unified Assurance's "Default CDM" (CPU, Disk, Memory) poller template. The second demonstrates how to configure custom rules-based SNMP polling of devices.

Unified Assurance Default CDM (CPU, Disk, Memory)

Note:

The "Polling Assignments" UI is not needed for SNMP Polling. The Generic SNMP Poller is 100% rules based, and will not take any of the configured items from the "Polling Assignments" UI into consideration while running. Configuring the SNMP Poller through "Polling Assignments" may generate incorrect metrics. The only time SNMP-based metrics should be used in the "Polling Assignments" UI is specifically for applying non-rules based thresholds to already defined metrics (thresholds will be covered in a subsequent section).

Configuration -> Metrics -> Polling Assignments

  1. Navigate to Metric Types.

    Configuration -> Metrics -> Metric Types

  2. As described in the "Configuring Ping Polling" section, in this example, "Memory Used", "CPU Utilization" and "Disk Used" can be enabled for TopN Overview (TopN Scope: Utilization).

    Note:

    This is an optional step and is purely for the purpose of showing these metrics in the TopN Overview. This will have no effect on the polling of these metrics.

  3. Navigate to Services.

    Configuration -> Broker Control -> Services

  4. Click on the "Metric Generic SNMP Poller" service to open the "Service (Edit)" form to the right of the UI.

  5. In the edit form, change the Status value to "Enabled". Click the "Submit" button to save the changes. The Generic SNMP Poller service is now set to enabled.

  6. Select the service again, then click on the Start button to start the SNMP poller immediately. The Generic SNMP Poller will then poll devices for various metrics every 5 minutes (assuming the default Unified Assurance application configuration is used), depending on the rules that are available for that device. These metrics will be viewable from a number of UI pages, such as the "Device Overview" dashboard and the "All Metrics Overview".

    Note:

    If the service is not started manually, the broker starts it automatically within the next minute.

  7. Navigate to Devices in the navigational bar, and then click on the "Metrics" icon for a device to view the data. This interface will display a list of all metrics that are being polled from that device. You can click on the filter icon in the top right of the UI to open the filter bar, in order to filter the list of metrics. Clicking on a metric from the list will open a performance graph for that metric.

Custom Metrics - UPS Metrics

This second example demonstrates how to write your own rules files for polling custom SNMP metrics. In this example, rules files are written to poll UPS data from a device. The metrics polled include:

  1. Battery temperature

  2. Battery runtime

  3. Battery capacity

  4. Input and output voltage

  5. Output load (%)

Note:

The custom UPS rules shown here are only an example of how to write your own custom metrics rules files. A default installation includes a set of rules files for polling a variety of devices from numerous vendors. The following documentation has information regarding supported devices and other useful information:

Please contact Oracle Communications if there are devices that are not polled by the out-of-the-box Foundation rules.

  1. Navigate to Metric Types.

    Configuration -> Metrics -> Metric Types

  2. Add new metric types to Unified Assurance, giving them the values shown in the table below.

    Note:

    In this example, the TopN Scope is set to "Disabled". If you want these metrics to be available in the TopN Overview, set TopN Scope to "Utilization" or "Value", as appropriate for the metric.

    | Name                    | Metric Group | Format  | Unit Name  | Abbreviation Name | Value Type  | Unit Division | Direction           | TopN Type | TopN Scope |
    |-------------------------|--------------|---------|------------|-------------------|-------------|---------------|---------------------|-----------|------------|
    | UPS Battery Capacity    | None         | Float   | Capacity   | %                 | Utilization | SI (1000)     | Descending (Normal) | Both      | Disabled   |
    | UPS Battery Runtime     | None         | Integer | Seconds    | s                 | Raw         | Time          | Descending (Normal) | Both      | Disabled   |
    | UPS Battery Temperature | None         | Float   | Celsius    | C                 | Raw         | SI (1000)     | Descending (Normal) | Both      | Disabled   |
    | UPS Input Voltage       | None         | Float   | Volts      | V                 | Raw         | None          | Descending (Normal) | Both      | Disabled   |
    | UPS Output Voltage      | None         | Float   | Volts      | V                 | Raw         | None          | Descending (Normal) | Both      | Disabled   |
    | UPS Output Load %       | None         | Float   | Percentage | %                 | Raw         | SI (1000)     | Descending (Normal) | Both      | Disabled   |
  3. Navigate to Rules.

    Configuration -> Rules

    • The UI contains a list of rules directories and subdirectories.

    • Click on the "right arrow" symbol to the immediate left of a folder icon to expand that directory. Clicking on the "down arrow" symbol will collapse the directory.

  4. Click to expand Core Rules (core) -> Default read-write branch (default) -> collection -> metric -> snmp.

  5. Click to select the "snmp" folder, then click the "Add" button, and click "Add File" to add a new rules file (the form will appear to the right of the UI).

  6. Enter an appropriate name for the rules in the File Name field (e.g. ups-snmp.rules).

  7. The rules logic (Perl syntax) is entered in the text area underneath the File Name field. The following is example code for the UPS rules file.

    my $DeviceID     = $DeviceHash->{DeviceID};
    my $DeviceInfo   = $DeviceHash->{DeviceID} . ':' . $DeviceHash->{DNS} . ':' . $DeviceHash->{IP};
    my $PollInterval = $PollerConfig->{'PollTime'};
    my $PolledTime   = $DeviceHash->{PollTime};
    
    $Log->Message("INFO","ups-snmp.rules -> [$DeviceInfo] -> Entering ups-snmp.rules");
    

    Here you specify the OIDs you wish to poll for data. The OIDs used in this example were taken from the "PowerNet-MIB" MIB file.

    # OIDs to be polled
    my %OIDs = (
        'upsAdvBatteryCapacity'         => '1.3.6.1.4.1.318.1.1.1.2.2.1.0',  # Battery capacity (%)
        'upsAdvBatteryRunTimeRemaining' => '1.3.6.1.4.1.318.1.1.1.2.2.3.0',  # Battery run time remaining (TimeTicks)
        'upsAdvBatteryTemperature'      => '1.3.6.1.4.1.318.1.1.1.2.2.2.0',  # Battery temperature (Celsius)
        'upsAdvInputLineVoltage'        => '1.3.6.1.4.1.318.1.1.1.3.2.1.0',  # Input voltage
        'upsAdvOutputVoltage'           => '1.3.6.1.4.1.318.1.1.1.4.2.1.0',  # Output voltage
        'upsAdvOutputLoad'              => '1.3.6.1.4.1.318.1.1.1.4.2.3.0'   # Output load (%)
    );
    
    # Reverse lookup (OID => metric name), used to identify each polled value
    my %metricNames = reverse %OIDs;
    

    Next, match the OIDs to be polled with their corresponding MetricTypeIDs in Unified Assurance (created in steps 1 and 2). Note: the MetricTypeIDs shown in this rules file example will probably differ from the IDs assigned to the metric types you create.

    # Match each polled OID name to its MetricTypeID in Unified Assurance
    my %MetricTypeIDs= (
        'upsAdvBatteryCapacity'         => '1015',
        'upsAdvBatteryRunTimeRemaining' => '1016',
        'upsAdvBatteryTemperature'      => '1017',
        'upsAdvInputLineVoltage'        => '1018',
        'upsAdvOutputVoltage'           => '1019',
        'upsAdvOutputLoad'              => '1020',
    );
    
    # Tell the SNMP client not to translate TimeTicks into a formatted time string.
    # The raw value is in hundredths of a second, so dividing $result by 100 gives seconds.
    $Session->translate([ -timeticks => 0 ]);
    

    Next, grab the available metrics from the device for polling. This is done via $Session->get_request:

    # Grab available metrics from device for polling
    my $DeviceData= $Session->get_request (
        -varbindlist => [
            $OIDs{'upsAdvBatteryCapacity'},
            $OIDs{'upsAdvBatteryRunTimeRemaining'},
            $OIDs{'upsAdvBatteryTemperature'},      
            $OIDs{'upsAdvInputLineVoltage'},
            $OIDs{'upsAdvOutputVoltage'},
            $OIDs{'upsAdvOutputLoad'},      
        ]
    );
    

    Finally, iterate through the polled metrics and update their values in Unified Assurance:

    # Iterate through polled metrics and update each one in Unified Assurance
    foreach my $thisOID (keys(%{$DeviceData})) {
        my $result = $DeviceData->{$thisOID};
        my $metricName = $metricNames{$thisOID}; 
        my $MetricTypeID = $MetricTypeIDs{$metricName};
        $Log->Message("DEBUG", "UPS rules ->  [$metricName], oid: [$thisOID] value: [$result] type: [$MetricTypeID]");
    
        $InstanceID = 0;
        #$Log->Message('DEBUG', "UPS rules -> Searching for InstanceID for [$InstanceName] on DeviceID[$DeviceID]");
        #($InstanceID, $Error) = FindInstanceID($RulesDBH, $MetricHash, $Log, $DeviceID, $InstanceName, 1); # Not necessary, as InstanceID is already specified (0)    
        #$Log->Message('DEBUG', "UPS rules -> Found InstanceID: $InstanceID for [$InstanceName]");
    
        my ($MetricID, $Error) = FindMetricID($RulesDBH, $MetricHash, $Log, $DeviceID, $InstanceID, $MetricTypeID, $Factor, $max, $PollInterval);
        $Log->Message('DEBUG', "UPS rules -> created/updated metric [$MetricID] for [$InstanceID]");
    
        # Convert the Battery Runtime metric from TimeTicks to seconds (a translated runtime value would look like this example: [1 hour, 20:00.00])
        if($thisOID eq '1.3.6.1.4.1.318.1.1.1.2.2.3.0') {
            my $convertedTime = $result/100;
            $Log->Message("DEBUG", "UPS rules -> [$DeviceID]DataQueue params: metricid[$MetricID], value[$convertedTime], status [$Status], polltime[$PolledTime]");
            $DataQueue->enqueue($MetricID. ':' . $convertedTime . ':' . $Status . ':' . $PolledTime);
            $Log->Message("DEBUG", "UPS rules -> Finished with oid [$metricName]");
        }
        else {
            $Log->Message("DEBUG", "UPS rules -> [$DeviceID]DataQueue params: metricid[$MetricID], value[$result], status [$Status], polltime[$PolledTime]");
            $DataQueue->enqueue($MetricID. ':' . $result . ':' . $Status . ':' . $PolledTime);
            $Log->Message("DEBUG", "UPS rules -> Finished with oid [$metricName]");
        }
    }
    $Log->Message("INFO", "Exiting ups-snmp.rules");
    

    It is good practice to include log messages in your rules file, to aid in the debugging process, should anything not work as intended.

    You will also need to update the 'base.rules' file and 'base.includes' file, to include the new rules file you just created:

    base.rules

    elsif($SysObjectID =~ /^\.?1\.3\.6\.1\.4\.1\.318(?:\.|$)/) { # UPS (APC enterprise OID; dots escaped so they match literally)
        $Log->Message("WARN","Base Rules -> [$DeviceInfo] -> Polling using ups-snmp rules");
        UPSsnmpRules();
    }
    

    base.includes

    UPSsnmpRules,metricStdPoller/snmp/ups-snmp.rules
    
  8. Navigate to Services.

    Configuration -> Broker Control -> Services

  9. Click to select the "Metric Generic SNMP Poller" and ensure that its application configuration has "LogLevel" set to DEBUG.

  10. Click the restart button to restart the service.

    • When the service is restarted, the new UPS rules file and new log level will be taken into account. The poller will now poll for UPS metrics using the UPS rules file.
  11. Navigate to Logs.

  12. Use the filter bar to enter the following, replacing the value in parentheses (35 in this example) with the corresponding value for the poller from the "Services" UI. This will filter the log file using the keyword "ups":

    event.dataset:"GenericSNMPPollerd(35)" and message : "ups"
    
  13. Navigate to Devices in the navigational bar, and then click on the "Metrics" icon for a UPS device to view the data. This interface will display a list of all metrics that are being polled from that device. You can click on the filter icon in the top right of the UI to open the filter bar, in order to filter the list of metrics. Clicking on a metric from the list will open a performance graph for that metric.
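
The data handling inside the example rules file can be tried outside the poller. The following is a minimal, standalone Perl sketch and is not part of Unified Assurance: it uses a hard-coded hash in place of the `$Session->get_request` result, and the helper names (`ticks_to_seconds`, `build_record`), the MetricID, the status, and the timestamp are made-up placeholders. It shows the reverse OID-to-name lookup, the TimeTicks conversion, and the colon-separated string that the rules file passes to `$DataQueue->enqueue`.

```perl
use strict;
use warnings;

# Two of the OIDs from the rules file, keyed by metric name.
my %OIDs = (
    'upsAdvBatteryCapacity'         => '1.3.6.1.4.1.318.1.1.1.2.2.1.0',
    'upsAdvBatteryRunTimeRemaining' => '1.3.6.1.4.1.318.1.1.1.2.2.3.0',
);

# Reversing the hash swaps keys and values, giving an OID => name lookup.
# This is only safe because every OID appears exactly once.
my %metricNames = reverse %OIDs;

# SNMP TimeTicks count hundredths of a second.
sub ticks_to_seconds {
    my ($ticks) = @_;
    return $ticks / 100;
}

# Build the "MetricID:value:Status:PollTime" string used with enqueue.
sub build_record {
    my ($metric_id, $value, $status, $polled_time) = @_;
    return join(':', $metric_id, $value, $status, $polled_time);
}

# Hypothetical stand-in for the hash reference returned by get_request.
my $DeviceData = {
    '1.3.6.1.4.1.318.1.1.1.2.2.1.0' => 100,      # capacity in percent
    '1.3.6.1.4.1.318.1.1.1.2.2.3.0' => 480000,   # runtime in TimeTicks
};

foreach my $oid (sort keys %{$DeviceData}) {
    my $name  = $metricNames{$oid};
    my $value = $DeviceData->{$oid};
    $value = ticks_to_seconds($value)
        if $oid eq $OIDs{'upsAdvBatteryRunTimeRemaining'};
    # MetricID 42, status 0, and the timestamp are placeholder values.
    print "$name -> ", build_record(42, $value, 0, 1700000000), "\n";
}
```

The sample runtime of 480000 TimeTicks corresponds to 4800 seconds, which is the 1 hour 20 minutes shown in the translated example value in the rules file comment.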

Thresholds

This section covers the configuration of Thresholds, based on the UPS Metrics from the previous section.

Thresholds are used to detect and give early warning for problems that may exist for metric data being collected. The Threshold Engine analyzes the threshold definitions (defined in the Thresholds UI), looks at the metric database for the status and will create a notification or fault if the defined limit is breached. Several notification platforms are available for threshold alerting. For example, an alarm can be sent to the Event Engine, an email can be sent to an administrator, or a Syslog message can be generated.

  1. Navigate to Thresholds.

    Configuration -> Metrics -> Thresholds -> Thresholds

    • From this Thresholds UI you can define thresholds for metrics that will trigger an event or notification if the threshold value is breached.

    • For this example, thresholds will be defined for the UPS metrics set up previously.

  2. Click on the "Add" button.

  3. Fill in the form (to the right of the UI) to create a threshold that triggers a warning when the battery temperature reaches 50 degrees Celsius or more, and a critical alert at 70 degrees Celsius or more.

    • Name => UPS High Battery Temp

    • Type => Standard

    • Measurement => UPS Battery Temperature

    • Metric Field => value

    • Time Range => 15m

    • Warning => (Checked)

      • Warning Operator => >=

      • Warning Value => 50

      • Warning Severity => Major

    • Critical => (Checked)

      • Critical Operator => >=

      • Critical Value => 70

      • Critical Severity => Critical

    • Message => Performance threshold violation: UPS High Battery Temp

    • Check Location => Threshold Engine

    • Status => Enabled

  4. Add thresholds for the rest of the UPS metrics using the Threshold UI. The following are some examples that could be configured:

    • UPS Battery Runtime

    • UPS Output Load %

    • UPS Output Voltage Surge

    • UPS Input Voltage Surge

  5. Navigate to Threshold Groups.

    Configuration -> Metrics -> Thresholds -> Threshold Groups

    • From the Threshold Groups UI, you can group individual thresholds together to form a threshold group for polling assignments.
  6. Click the "Add" button to add a new threshold group. Call the group "Default UPS", and add the UPS thresholds to the group.

  7. Click the "Submit" button to save the group.

  8. Now that the threshold group for the UPS metrics has been set up, the Polling Assignments UI can be used to add the new thresholds to metrics. Navigate to Polling Assignments.

    Configuration -> Metrics -> Polling Assignments

  9. Add the UPS devices for polling, using the "Default UPS" threshold group.

    • Method => SNMP

    • Poller Template => Default UPS

    • Threshold Group => Default UPS

    • Poll Time => 300

    • Devices => Select any UPS devices on your system, but limited to the "Device" instance for each device

  10. Click "Submit" to add the thresholds.
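
To make the Warning and Critical settings from step 3 concrete, the following Perl sketch classifies a temperature reading against the two limits. It only illustrates the comparison logic and is not the Threshold Engine's implementation: the function name and the 'OK' return value are assumptions, and the Time Range setting is ignored.

```perl
use strict;
use warnings;

# Limits from the "UPS High Battery Temp" example threshold.
my $WARNING_VALUE  = 50;   # Warning Operator >=, Warning Severity Major
my $CRITICAL_VALUE = 70;   # Critical Operator >=, Critical Severity Critical

# Return the severity a single reading would map to. The critical limit is
# checked first because any value >= 70 also satisfies the warning condition.
sub battery_temp_severity {
    my ($celsius) = @_;
    return 'Critical' if $celsius >= $CRITICAL_VALUE;
    return 'Major'    if $celsius >= $WARNING_VALUE;
    return 'OK';
}

printf "%5.1f C -> %s\n", $_, battery_temp_severity($_) for (35, 50, 69.9, 70);
```

Because both operators are ">=", a reading of exactly 50 already counts as a warning and a reading of exactly 70 as critical.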

Polling Policies

The "Poller Discovery" scheduled job uses Polling Policy settings to search for devices to process, creates the types of Metrics to poll the devices for based on the selected Poller Template, and then assigns Thresholds based on the selected Threshold Group. Essentially, this is a simple, automated, and dynamic way to create and maintain Metrics and Threshold settings for certain devices and instances, rather than manually creating them using the "Polling Assignments" interface.

The following quick example is a Network Interface polling policy for Routers:

  1. Navigate to Polling Policies.

    Configuration -> Metrics -> Polling Policies

  2. Click the "Add" button to add a new polling policy.

  3. Enter the following in these form fields (the other fields can be left as is):

    • Name => Router Network Interface

    • Description => Network Interface Metric Polling Policy

    • Policy Status => Enabled

    • Match

      • IP Range => (optional)

      Note:

      Here you can specify a specific range of IP addresses to search. If left blank, the scheduled job will search every device. For this example, an IP range of 192.168.10.* is used.

      • Device Category => Router

      Note:

      The scheduled job will only use this polling policy on Router devices.

      • Instance

      Note:

      The scheduled job will only process instances that match the provided name(s).

      • Match => LIKE

      • Name => eth

    • Assign

      • Method => SNMP

      • Poller Template => Default Network Interface

      • Threshold Group => Default Network Interface

  4. Click "Submit" to save the new policy.

  5. Navigate to Jobs.

    Configuration -> Broker Control -> Jobs

  6. Ensure the "Metric Poller Discovery" scheduled job is set to Enabled.
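
The Match criteria in step 3 can be pictured as a filter over devices and their instances. The sketch below is a standalone Perl illustration, not the Poller Discovery job's actual logic: the device records, field names, and function name are hypothetical, and the interpretation of the IP wildcard (any final octet) and of LIKE (substring match on "eth") is an assumption.

```perl
use strict;
use warnings;

# Hypothetical inventory: each device has an IP, a category, and instances.
my @devices = (
    { name => 'rtr-a', ip => '192.168.10.1',  category => 'Router', instances => ['eth0', 'eth1', 'lo'] },
    { name => 'sw-b',  ip => '192.168.10.20', category => 'Switch', instances => ['eth0'] },
    { name => 'rtr-c', ip => '10.0.0.1',      category => 'Router', instances => ['eth0'] },
);

# Return "device:instance" pairs selected by the example policy.
sub match_policy {
    my (@inventory) = @_;
    my @selected;
    foreach my $dev (@inventory) {
        next unless $dev->{ip} =~ /^192\.168\.10\.\d+$/;   # IP Range 192.168.10.*
        next unless $dev->{category} eq 'Router';          # Device Category => Router
        foreach my $inst (@{ $dev->{instances} }) {
            # Instance Match => LIKE, Name => eth
            push @selected, "$dev->{name}:$inst" if $inst =~ /eth/;
        }
    }
    return @selected;
}

print "$_\n" for match_policy(@devices);
```

Only the two Ethernet instances on the router inside the IP range are selected; the switch fails the category check, the second router fails the IP range, and the loopback instance fails the name match.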

Calculation Policies

The "Metric Post-Collection Calculation Engine (PCCE)" allows for individual metrics to be combined to create a meta-metric. This meta-metric can then be used for thresholding, SLM monitoring, etc.

The Calculations interface is used to define the handling of the meta-metrics defined in the Collections interface. These calculation policies use Perl-syntax code to do special processing on the metric data. An example of a meta-metric is a total inbound bandwidth metric, where the metric data for all inbound interfaces is summed and saved as a separate metric.

Configuration -> Metrics -> Calculations

Configuration -> Metrics -> Collections
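
As a rough sketch of the total inbound bandwidth example, the Perl below sums per-interface inbound values into a single meta-metric value. It is standalone and does not use the Calculations interface or any Unified Assurance variables; the interface names, sample values, and function name are illustrative assumptions.

```perl
use strict;
use warnings;

# Hypothetical inbound rates in bits per second, keyed by interface.
my %inbound_bps = (
    eth0 => 1_200_000,
    eth1 =>   350_000,
    eth2 =>         0,
);

# Sum the individual interface values into one meta-metric value.
sub total_inbound {
    my (%rates) = @_;
    my $total = 0;
    $total += $_ for values %rates;
    return $total;
}

print "Total inbound: ", total_inbound(%inbound_bps), " bps\n";
```

In a real calculation policy the summed value would be saved as its own metric, so that it can be thresholded like any other.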

The form for adding/editing a calculation policy has the following fields:

Exercise 4 - Performance Monitoring

Using the previous examples, set up and configure Performance Monitoring on your Unified Assurance system using your lab environment.