
7 Setting Up Performance Monitoring

This chapter describes how to define the KPIs and SLAs used to monitor your network's performance, which you can then review through dashboards and reports. It also describes how to manage the alerts used to notify staff members about incidents that impact service levels, including who should be notified and when. You must have Full access level permission to define and modify KPIs and SLAs (as described in Section 14.2, "Understanding User Roles and Permissions").

7.1 Introduction

A Service Level Agreement (SLA) is an agreement between a provider and a customer that specifies the terms of the provider's responsibility to the customer, and the level of service that the customer can expect. Typically, this agreement is expressed in terms of a number of Key Performance Indicators (KPIs). These are a way of measuring and benchmarking specific aspects of an organization's performance.

For example, an SLA for a given service might promise that it will be up and running 99.999 percent of the time. Because this is a commitment given to customers, the organization could make this a KPI. As such, service availability would be monitored, and whenever it fell below this level, the appropriate staff would be notified, and corrective action taken.
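To put such a commitment into perspective, the following minimal sketch converts an availability percentage into the downtime it permits. It is an illustration only and is not part of the product; the function name and the assumption of a 365-day year are hypothetical. Under these assumptions, a 99.999 percent commitment leaves roughly 5.26 minutes of downtime per year.

```python
# Downtime budget implied by an availability commitment (illustrative only).
MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: a 365-day year


def allowed_downtime_minutes(availability_percent: float,
                             period_minutes: int = MINUTES_PER_YEAR) -> float:
    """Return the minutes of downtime permitted within the period."""
    return period_minutes * (1 - availability_percent / 100)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.999):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% availability allows {minutes:.2f} minutes of downtime per year")
```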

It is important to understand that an organization may also set KPIs for its own performance monitoring, independently of an SLA. Because KPIs provide insight into an organization's performance, they may also be tracked as part of a management dashboard.

Grouping and Filtering KPIs

KPIs are grouped into categories, which can be customized to contain related performance indicators. For example, separate categories could be defined for business and IT-related issues, such as user flow completion, visitor traffic, website availability, and so on.

Because you may need to handle a large number of KPIs, you can use the View menu shown in Figure 7-1 to filter the displayed KPIs.

If you select the "Service Levels" option, the left-hand side KPIs listing is updated to show only those KPIs that have service levels associated with them. Folders that do not contain such KPIs are not shown. Similarly, you can select the "Alerts" option to filter the listing to show only those KPIs that have alerts associated with them. The "All" option shows all currently defined KPIs.

7.2 Defining KPIs and SLAs

To create a KPI and, optionally, use it as the basis for alerts and service levels, do the following:

  1. Select Configuration, then Service level management, then select KPIs, and click the New KPI button. The dialog shown in Figure 7-2 appears.

    Figure 7-2 Metric Selection Dialog


  2. Use the Data access menu to specify if the KPI will be bound to a specific application, suite, or web service, or if it will be generic. The use of KPI access filters is described in Section 14.7, "Managing the Scope of Authorized Data Within Modules".

    In the case of an application or service-specific KPI, specify the application or service to which it should be bound. In the case of a suite-specific KPI, specify the suite type (for example, PeopleSoft), and the configured suite. Note that the options available within the Suite type menu depend on the suite instances configured on your system.

    Note that users without Full access permission need to be authorized to view information about KPIs bound to specific applications, services, and suites. This is described in Chapter 14, "Managing Users and Permissions".

  3. Use the Metric menu to select the metric to be used as the basis for monitoring. See Table E-2 for a description of the available metrics. When ready, click Next. If the metric you selected requires a filter, the dialog shown in Figure 7-3 appears. Otherwise, the dialog shown in Figure 7-4 appears.

    Figure 7-3 Required Filter Dialog


  4. Use the menu to specify a filter for the selected metric. For example, if you selected the user-flow-load-time(sec) metric, you need to specify the user flow to which it refers. If the required option is not in the displayed list, you can click the Search icon to locate it. When ready, click Next. The dialog shown in Figure 7-4 appears.

    Figure 7-4 KPI Attributes Dialog


  5. Use the check boxes shown in Table 7-1 to specify the KPI's attributes.

    Table 7-1 KPI Attribute Check Boxes

    Check box Description

    Filters

    Specifies whether you want to add filters to the selected metric at this time. For example, you could define that a metric should only apply to a particular domain.

    Requirements

    Specifies any additional requirements for the selected metric. Using this facility, you can build compound KPIs.

    Targets

    Specifies whether targets are associated with the KPI. If so, you can define a minimum and maximum range for the KPI, and how it should be calculated.

    Service Level Agreement

    Specifies whether the KPI should be incorporated into an SLA. If so, you can configure the level of your committed agreement (in percentage terms) for specific time periods.

    Alerts

    Specifies whether an alert should be associated with the KPI. If so, you need to define the duration the KPI must be down before an alert is issued, and the severity of the incident. Optionally, you can also specify whether an additional notification should be created when the KPI has returned to its set target range after a specified number of minutes.


    When ready, click Next. The dialog shown in Figure 7-5 appears.

    Figure 7-5 Filters Dialog


  6. Optionally, use this dialog to define a filter to tighten the conditions for the KPI. For example, you might specify a KPI that concerns user flow load time. Using the Dimension level menu, you can specify that you only want the KPI to apply to a particular user flow step, or only to users coming from a particular location.

    For most dimension levels (except application name or client location), one or more wildcard (*) characters can be specified. Note that filters on different dimensions will be considered to be part of a logical 'AND' clause, while multiple filters on the same dimension are considered to be part of a logical 'OR' clause.

    Click Add filter for each filter that you want to apply. Note that you see the history of your filter selections in the lower part of the dialog. If you define filters on multiple dimensions, the conditions for every dimension must be met for a match to be made. Note that this dialog only appears if you checked the Filters check box in Figure 7-4. When ready, click Next. The dialog shown in Figure 7-6 appears.

    Figure 7-6 Requirements Dialog


  7. Use this dialog to specify additional requirements for the KPI. In this way, you can build compound metric conditions. For example, the monitored service should provide an end-to-end page time of between 3 and 5 seconds for 98% of requested pages, but this requirement should only apply when page views per minute are between 5 and 10. Click Add requirement to specify compound metrics.

    Note:

    Any filter you specified in Figure 7-5 will also apply to any additional metrics. Therefore, you should ensure that the filter is relevant to the additional metrics. Also, if you specify additional (compound) metrics, all the defined requirements must be met for the KPI to trigger an alert.

    Note that this dialog only appears if you checked the Requirements check box in Figure 7-4. When ready, click Next. The dialog shown in Figure 7-7 appears.

    Figure 7-7 Targets Dialog


  8. Use this dialog to set a range for the KPI. You can define it in terms of a fixed range (for example, between 80 and 100). Alternatively, you can specify whether the KPI should be measured for small, medium, or large deviations from its auto-learnt target. For more information on the use of this facility, see Section 7.3.2, "Automatic and Fixed Targets". Note that this dialog only appears if you checked the Targets check box in Figure 7-4. When ready, click Next. The dialog shown in Figure 7-8 appears.

    Figure 7-8 Service Level Agreement Dialog


  9. Use this dialog to specify the level of your service agreement. For example, you undertake that the service will meet its specified objectives throughout 98% of the year. However, on an hourly basis, the commitment is 80%, and on a daily basis, 90%. All the period fields are mandatory.

    Note that this dialog only appears if you checked the Service Level Agreement check box in Figure 7-4. When ready, click Next. The dialog shown in Figure 7-9 appears.

  10. Use this dialog to specify the alert schedule(s) that should be used, the duration that the KPI must be down before an alert is generated, and the severity (Harmless, Warning, Minor, Critical, or Fatal) of the incident. You can also specify if and when an additional notification should be generated when the KPI returns to its set target range. It is recommended that you carefully review these settings to prevent excessive notifications.

    Note that this dialog only appears if you checked the Alerts check box in Figure 7-4. When ready, click Next. The dialog shown in Figure 7-10 appears.

    Figure 7-10 Save as Dialog


  11. Use this dialog to specify a name, category, and brief description for the monitored KPI. If you specify a new category name, this category will be automatically created. When ready, click Finish to complete your KPI definition. Note that monitoring of the new KPI starts immediately.
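The filter rules described in step 6 (filters on the same dimension are combined with a logical OR, filters on different dimensions with a logical AND, and * acts as a wildcard) can be sketched as follows. This is an illustrative model only, not the product's implementation. The function names, dimension names, and sample values are hypothetical, and the sketch does not model the restriction on wildcards for application name or client location.

```python
import re
from collections import defaultdict


def wildcard_match(pattern: str, value: str) -> bool:
    """Match value against a pattern in which '*' stands for any run of characters."""
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, value) is not None


def matches(filters, record) -> bool:
    """Return True if record satisfies the filters.

    filters is a list of (dimension, pattern) pairs. Patterns on the same
    dimension are ORed; the results for different dimensions are ANDed.
    """
    by_dimension = defaultdict(list)
    for dimension, pattern in filters:
        by_dimension[dimension].append(pattern)
    return all(
        any(wildcard_match(p, record.get(dimension, "")) for p in patterns)
        for dimension, patterns in by_dimension.items()
    )


# Hypothetical dimension names and values, for illustration only.
filters = [
    ("user-flow-step", "checkout*"),
    ("user-flow-step", "payment*"),
    ("domain", "*.example.com"),
]
hit = {"user-flow-step": "payment-confirm", "domain": "shop.example.com"}
miss = {"user-flow-step": "payment-confirm", "domain": "shop.example.org"}

print(matches(filters, hit))   # True: one step pattern matches AND the domain matches
print(matches(filters, miss))  # False: the domain filter is not satisfied
```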

7.2.1 Renaming, Moving, and Deleting KPIs

You can modify, rename, move, or delete KPIs by right clicking them and selecting the Rename or Remove options from the menu. Select the Edit option to modify the KPI. The procedure to do this is described in Section 7.3, "Modifying Existing KPIs".

7.2.2 Copying Existing KPIs

In addition to creating new KPIs from scratch, as explained in Section 7.2, "Defining KPIs and SLAs", you can also create a copy of an existing KPI, and use it as the basis for your new KPI. This is particularly useful when the new KPI is very similar to an existing one. For example, you already have an existing KPI that monitors user flow availability in the USA, but now want to create a new one for Canada. To use an existing KPI as the basis for a new one, do the following:

  1. Select Configuration, then Service level management, then KPIs, and select the required KPI from the displayed listing. Click the Copy KPI button. The dialog shown in Figure 7-11 appears.

    Figure 7-11 Copy KPI Dialog


  2. Specify a new name and location for the new KPI. Optionally, click Add category to create a new category. When ready, click Save.

  3. Use the facilities described in Section 7.3, "Modifying Existing KPIs" to modify the new KPI to meet your requirements.

7.3 Modifying Existing KPIs

You can review and modify the definitions of existing KPIs by selecting Configuration, then Service level management, then KPIs, and selecting the required KPI from the displayed listing. A screen similar to the one shown in Figure 7-12 appears.

You can use the tabs to locate particular aspects of the selected KPI, and review and modify their definition. Their associated settings are equivalent to those described in Section 7.2, "Defining KPIs and SLAs".

7.3.1 Understanding KPI Calculation Ranges

It is important to understand that a KPI's metric value is always calculated over a 1-minute interval. That is, the metric's value is derived from its average value over that 1-minute period.

The KPI calculation range specifies how many of these 1-minute period averages should be used when calculating the metric's reported value for any given 1-minute period. For example, if you specify a calculation range of 10 minutes, the metric's value over each reported 1-minute period is calculated based on the averages for the previous 10 1-minute periods. Similarly, a calculation range of 15 minutes would specify that the reported value should be derived from the averages for the last 15 1-minute periods. This is shown in Figure 7-13.

Figure 7-13 KPI Calculation Ranges


By default, the KPI calculation range is one minute. However, it can be useful to specify a longer calculation range if you want extreme values to be averaged out over a longer period.

Setting the Calculation Range

After initially defining a KPI, you can modify the KPI's measurement range. Do the following:

  1. Select Configuration, then Service level management, then KPIs, and then select the required KPI from the displayed listing.

  2. Click the Target tab within the KPI overview, and then the Edit target item. The dialog shown in Figure 7-14 appears.

    Figure 7-14 Edit KPI Target


  3. Use the Calculation range (min) menu to specify the period over which the reported metric value should be calculated. When ready, click Save.

Reporting KPI Values for Calculation Periods Longer Than 1 Minute

KPI values are calculated at the end of each 1-minute period, and are stored internally as numerator/denominator combinations per minute. When a KPI has a calculation range of longer than 1 minute, the value is reported as the sum of the numerators divided by the sum of the denominators for each 1-minute period within the calculation range. It is recommended that you bear this in mind when interpreting reported values for concurrent sessions and percentile KPIs.
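The following minimal sketch illustrates why this matters: dividing the summed numerators by the summed denominators is not the same as averaging the per-minute values. The function name and the sample data are hypothetical, and the interpretation of the numerator and denominator (total load time and page views) is an assumption made purely for illustration.

```python
def reported_value(per_minute, calculation_range):
    """Sum of numerators divided by sum of denominators over the last N minutes.

    per_minute is a list of (numerator, denominator) pairs, oldest first.
    """
    window = per_minute[-calculation_range:]
    total_numerator = sum(n for n, _ in window)
    total_denominator = sum(d for _, d in window)
    if total_denominator == 0:
        return None  # nothing was measured within the calculation range
    return total_numerator / total_denominator


# Assumed meaning: (total load time in seconds, page views) per 1-minute period.
minutes = [(2.0, 1), (300.0, 100), (4.0, 2)]

ratio_of_sums = reported_value(minutes, 3)                       # 306 / 103
mean_of_ratios = sum(n / d for n, d in minutes) / len(minutes)   # (2 + 3 + 2) / 3

print(f"{ratio_of_sums:.2f}")   # 2.97
print(f"{mean_of_ratios:.2f}")  # 2.33
```

In this example the busy minute dominates the reported value, because minutes with a larger denominator carry more weight than they would in a simple average.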

7.3.2 Automatic and Fixed Targets

As mentioned earlier, you can specify that a KPI should use automatic (or auto-learnt) targets. Because visitor traffic and usage patterns can differ widely during the course of a day, these auto-learnt minimum and maximum targets are calculated as moving averages for the current 1-minute period, based on the measured metric value for that 1-minute period over the last 30 days. For example, when a KPI metric is measured at 10:45 AM, the average against which it is compared is calculated from the last 30 days of measurements at 10:45 AM. You can specify the minimum and maximum targets in terms of small, medium, or large deviations from these moving averages.

In contrast, a fixed KPI target (whether minimum or maximum) is essentially a straight line. This is shown in Figure 7-15.

Figure 7-15 Automatic and Fixed KPI Targets Contrasted


When using auto-learnt targets, be aware of the following points:

  • Auto-learnt targets assume that a KPI has approximately the same value at the same time of day during each of the last 30 days. If this is not the case, it is recommended you use fixed targets.

  • It requires a full day before the auto-learnt targets become available. Clearly, the more days of historical data that are available, the more reliable the calculated automatic targets. During the first day that a KPI is created with auto-learnt targets, these targets are automatically set to slightly above and below the actual recorded values in order to prevent the generation of alerts.

  • Although auto-learnt targets can signal a problem if the metric value is too high or too low, if the problem persists over a long period, these abnormal values will become part of the auto-learnt targets and will, eventually, be assumed to be normal behavior.

  • Auto-learnt targets can drop dramatically if the KPI value is unavailable every day at about the same time. For example, this can occur when there is no network traffic after 18:00.

If you define a KPI to use automatic targets (see Figure 7-7), and later modify the KPI to use fixed targets, the previously calculated targets (derived by monitoring the KPI over time) are set as the new fixed targets. If you are in doubt about the fixed targets that should be set for a KPI, you can use this facility to obtain realistic initial values. Of course, you are free to modify these at any time.
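The idea behind auto-learnt targets can be sketched as follows: the baseline for a given minute of the day is the average of the values measured at that same minute over the last 30 days, and the minimum and maximum targets are placed at a deviation around that baseline. This is a simplified illustration only. The function name is hypothetical, and the deviation percentages are invented for the example, because the actual sizes of the small, medium, and large deviations are not stated in this chapter.

```python
from statistics import mean

# Hypothetical deviation sizes; the actual values are not stated in this chapter.
DEVIATION = {"small": 0.10, "medium": 0.25, "large": 0.50}


def auto_learnt_targets(history, deviation="medium", days=30):
    """Return (minimum, maximum) targets for one minute of the day.

    history holds the values measured at that same minute on previous days,
    oldest first, one value per day.
    """
    recent = history[-days:]
    if not recent:
        return None  # no historical data is available yet
    baseline = mean(recent)
    margin = baseline * DEVIATION[deviation]
    return baseline - margin, baseline + margin


# Values measured at the same minute on each of the previous five days.
history = [100, 110, 90, 105, 95]
low, high = auto_learnt_targets(history, "small")
print(low, high)
```

Because the baseline is recomputed from recent history, a persistent abnormal value gradually shifts the targets, which is the behavior cautioned against in the points above.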

7.4 Defining Service Level Schedules

In addition to defining the KPIs that will be used to track the service levels achieved by your organization, you also need to specify when these service levels should apply. Typically, an organization has a core time (for example, 9 am - 5 pm, Monday - Friday) when the committed service level should be achieved. However, you may need to define exceptions to this, such as for public holidays. For example, a limited service between 10 am and 4 pm may be required on Easter Monday. Finally, you will also need to take account of planned maintenance periods.

The scheduling of planned service levels is maintained through the Service level schedule (shown in Figure 7-16). To open it, select Configuration, then Service level management, and then select Service level schedule.

Figure 7-16 Service Level Schedule


You can mark a period within the Service level schedule by clicking and dragging over the required period of the week. Assign the selected period a status by clicking the Active or Non-active modes.

You can define exceptions by clicking the Plus (+) icon, and selecting the day, month, and year from the Exceptions list. You can remove exceptions by clicking the Minus (-) icon to the right of an exception.

Note that any changes you make are not put into effect until you click Save. On exit, any unsaved changes you made are discarded.
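The relationship between the weekly schedule and its exceptions can be modeled with the following minimal sketch, which uses the core time and Easter Monday example given earlier in this section. It is not the product's implementation. All names are hypothetical, the year chosen for Easter Monday is arbitrary, and the assumption that an exception replaces the weekly pattern for that date is made for illustration.

```python
from datetime import date, datetime, time

# Weekly core time: Monday (0) to Friday (4), 9 am - 5 pm.
CORE_DAYS = range(0, 5)
CORE_START, CORE_END = time(9, 0), time(17, 0)

# Assumption: an exception replaces the weekly pattern for that date.
# A value of None would mean that no service level applies on that day.
EXCEPTIONS = {
    date(2024, 4, 1): (time(10, 0), time(16, 0)),  # Easter Monday 2024
}


def service_level_active(moment: datetime) -> bool:
    """Return True if the committed service level applies at this moment."""
    if moment.date() in EXCEPTIONS:
        window = EXCEPTIONS[moment.date()]
        if window is None:
            return False
        start, end = window
    elif moment.weekday() in CORE_DAYS:
        start, end = CORE_START, CORE_END
    else:
        return False
    return start <= moment.time() < end


print(service_level_active(datetime(2024, 4, 2, 9, 30)))  # True: Tuesday, core time
print(service_level_active(datetime(2024, 4, 1, 9, 30)))  # False: exception starts at 10 am
```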

7.5 Defining Alert Schedules

If your organization uses alerts to notify staff members about incidents that impact service levels, you will need to specify who should be notified and when. This is done through the creation of an alert schedule. You can define as many schedules as you wish to meet your operational requirements.

When you define a KPI, you specify (in Figure 7-9) the alert schedule(s) that the KPI should use. Each alert schedule consists of a group of users, notification details, and the operative time frame. Exceptions to standard operating times can also be defined. To define the alert schedules that should be used by your KPIs, do the following:

  1. Select Configuration, then Service level management, then select Alert schedule. The window shown in Figure 7-17 appears.

    Figure 7-17 Example Alert Schedule


  2. Use the View menu to review the currently available alert schedules.

    You can define exceptions by clicking the Plus (+) icon, and selecting the day, month, and year from the Exceptions list. You can remove exceptions by clicking the Minus (-) icon to the right of an exception. You can mark a period within a schedule by clicking and dragging over the required period of the week. Assign the selected period by clicking one of the alert profiles.

    Note that any changes you make are not put into effect until you click Save. On exit, any unsaved changes you made are discarded.

  3. Click New within the toolbar to define a new alert schedule. The dialog shown in Figure 7-18 appears.

    Figure 7-18 Add Alert Schedule


  4. Specify a unique name for the alert schedule and, optionally, a brief description. It is recommended that it includes an indication of its scope and purpose. When ready, click Save.

  5. Specify the users to be notified, as described in the following section.

7.5.1 Alert Profiles

These define the users who will be notified if a KPI has been down for the specified duration required to generate an alert. Depending on how the KPI has been defined, these users will also be notified when the KPI returns to within its set target range.

For example, you might have defined a KPI for user-flow-success-rate, and have specified that a success rate of at least 70% is required for normal operation. If the KPI falls below this level within core business hours (9 am - 5 pm, Monday - Friday), all web application Business Managers should be notified. If the failure occurs outside these hours, the Helpdesk should be notified.

Each profile can be customized by right clicking it, and selecting Edit from the context menu. This is shown in Figure 7-19.

Figure 7-19 Alert Profile Context Menu


The dialog shown in Figure 7-20 appears.

Figure 7-20 Alert Profile Dialog


Use this dialog to specify the name and a brief description of the users to be notified. Use the other tabs in this dialog to specify the recipients of E-mail, SNMP, and text message notification. Use the Enabled check box for each method to activate notification.

Note:

When receiving text message-based alerts, the timestamp of the message shown within your mobile telephone may not match that recorded within your RUEI installation. This is due to time zone differences on your mobile telephone.

7.5.2 Escalation Procedures

Within the Escalation tab, shown in Figure 7-21, you can set reminders to be sent to the alert's recipients if the KPI remains down. In addition, you can define an escalation procedure if the KPI is still down after a defined period. For example, if the KPI is still down after three hours, notify another group. This escalation group can be customized by right clicking it, and selecting Edit from the context menu.

Figure 7-21 Escalation Tab


7.5.3 Measuring and Notification Intervals

It is important to understand that there are two states associated with a KPI: the KPI state, and the alert state. The KPI state can change at each measuring interval. The alert state is controlled by the properties you define for the alert. For example, consider the case in which a KPI starts to fail, and you have defined a calculation range of 5 minutes (the default), and a DOWN duration of 15 minutes. Although after 5 minutes the KPI is considered to be failing, you will not be notified about it unless it has been continually down for 15 minutes.

Similarly, the reminder and escalation durations you specify in Figure 7-21 refer to the alert. Hence, specifying a reminder duration of every hour would generate a reminder notification every 60 minutes after the original alert was sent while the KPI is still failing. It is recommended that you carefully review the values you specify for these settings to meet your operational requirements.
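The distinction between the KPI state and the alert state can be illustrated with the following minimal simulation. It assumes a per-minute list of KPI states, a DOWN duration of 15 minutes, and an hourly reminder. It is a simplified sketch rather than a description of the product's internal logic: the function name is hypothetical, and escalation and recovery notifications are left out.

```python
def notifications(kpi_down, down_duration=15, reminder_every=60):
    """Return a list of (minute, kind) events.

    kpi_down[i] is True when the KPI is outside its target range in minute i.
    An alert is raised once the KPI has been down for down_duration
    consecutive minutes; reminders follow at reminder_every-minute intervals
    for as long as the KPI stays down.
    """
    events = []
    consecutive = 0
    alert_minute = None  # minute at which the current alert was raised
    for minute, down in enumerate(kpi_down):
        if down:
            consecutive += 1
            if alert_minute is None:
                if consecutive >= down_duration:
                    alert_minute = minute
                    events.append((minute, "alert"))
            elif (minute - alert_minute) % reminder_every == 0:
                events.append((minute, "reminder"))
        else:
            # The KPI has recovered, so the alert state is reset.
            consecutive = 0
            alert_minute = None
    return events


# The KPI is fine for 5 minutes, then fails for 140 minutes.
timeline = [False] * 5 + [True] * 140
print(notifications(timeline))
# [(19, 'alert'), (79, 'reminder'), (139, 'reminder')]
```

A failure that lasts less than the DOWN duration changes the KPI state but never produces an alert, which is why these durations should be chosen with care.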

7.5.4 Testing Alert Messages

If you have enabled E-mail, SNMP, or text message notification, you can use the Test profile option in the context menu shown in Figure 7-19 to send a test alert to all specified recipients in an alert or escalation profile. This is useful for testing that the contact information has been entered correctly. You are prompted to confirm the test notification.

7.5.5 Using Mail Notifications

To define E-mail alert recipients, click the E-mail tab to open the dialog shown in Figure 7-22, and do the following:

  1. Use the Recipients fields to specify the E-mail addresses of the users to be notified. Click Add to include a user in the notification list. Note that you can remove a user from the list by clicking the Remove icon to the right of the user.

  2. Check the Enable check box to activate E-mail notification. When ready, click Save.

7.5.6 Using SNMP Notifications

To define SNMP alert recipients, click the SNMP tab to open the dialog shown in Figure 7-23, and do the following:

  1. Ensure that the Enabled check box is checked. Note that if not checked, no SNMP traps will be generated.

  2. Use the Version list to specify which version of the SNMP protocol is being used. The default is version 2c.

  3. Use the Manager address field to specify the client software address. This must be a valid network address, and can either be an IP address or a host name.

  4. Use the Community field to specify the group to which information is sent. This string acts as a password to control the clients' access to the server.

  5. Check the Enable check box to activate SNMP notification.

  6. Download the Management Information Base (MIB) definition and incorporate it into your address book of managed objects. It contains necessary information about how the received SNMP messages should be interpreted. Note that the file (oracle-ruei.mib) is also available within the RUEI installation zip file in the RUEI/extra directory. The structure of the MIB file is shown in Figure 7-24 (see Footnote 1).

Figure 7-24 SNMP MIB Structure


The available KPI information and metrics in the MIB represent the most important properties of every KPI configured within the system, and can be used as the basis for filtering and alerting. They are explained in Table 7-2.

Table 7-2 KPI Information and Metrics Structure

Object                 Type

kpiName                Text
kpiCategory            Text
kpiValue               Value
kpiMin                 Value
kpiMax                 Value
kpiSeverity            Text
kpiDuration            Value
kpiDescription         Text
eventDateTime          Value (format: YYYYMMDDHHMMSS)
eventSource            Text
eventCode              Integer
eventDescription       Text
eventUser              Value
eventRepeatCount       Integer
kpiCalculationRange    Integer


Note that KPI names in SNMP alerts are sent in UTF-8 format. Any characters in the KPI name not in ISO-Latin-1 format will be replaced by a question mark (?) character. Also, be aware that not all SNMP managers fully support UTF-8. For further information, refer to your SNMP manager product documentation.
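If you process received alerts in a script, the following minimal sketch shows how an eventDateTime value in the YYYYMMDDHHMMSS format might be parsed, and how to preview which characters of a KPI name fall outside ISO-Latin-1 and would therefore appear as question marks. The function names and sample values are hypothetical, and the sketch does not receive SNMP traps itself.

```python
from datetime import datetime


def parse_event_datetime(raw: str) -> datetime:
    """Parse an eventDateTime value given in YYYYMMDDHHMMSS form."""
    return datetime.strptime(raw, "%Y%m%d%H%M%S")


def latin1_preview(kpi_name: str) -> str:
    """Replace every character outside ISO-Latin-1 with a question mark."""
    return kpi_name.encode("latin-1", errors="replace").decode("latin-1")


print(parse_event_datetime("20240401104500"))  # 2024-04-01 10:45:00
print(latin1_preview("Café 售价"))              # Café ??
```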

Changing the Default SNMP Port

By default, SNMP alerts are sent to port 162. If you want to change this, you should specify the required port as part of the Manager address (Figure 7-23). For example:

194.158.1.23:10162
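The convention above can be sketched as follows for scripts that need to interpret a Manager address value: if no port is given, port 162 is assumed. The function name is hypothetical, and the sketch only handles IPv4 addresses and host names (IPv6 literals are not considered).

```python
DEFAULT_SNMP_TRAP_PORT = 162


def split_manager_address(address: str):
    """Split 'host[:port]' into (host, port), defaulting to port 162."""
    host, separator, port = address.rpartition(":")
    if not separator:
        # No colon present, so the whole string is the host.
        return address, DEFAULT_SNMP_TRAP_PORT
    return host, int(port)


print(split_manager_address("194.158.1.23:10162"))  # ('194.158.1.23', 10162)
print(split_manager_address("194.158.1.23"))        # ('194.158.1.23', 162)
```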

7.5.7 Using Text Message Notifications

To define text message notifications, click the Text message tab to open the dialog shown in Figure 7-25, and do the following:

Figure 7-25 Text Message Tab


  1. Use the Recipients field to specify the telephone numbers of the users to be notified. Click Add to include a user in the notification list. Note that you can remove a user from the list by clicking the Remove icon to the right of the user.

  2. Check the Enable check box to activate text message notification. When ready, click Save.

  3. If you have not already done so, you will need to configure a text message provider. If you are warned that one has not already been configured, click the warning link, and follow the instructions described in Section 15.8, "Configuring Text Message Providers".



Footnote Legend

Footnote 1: This screen features the iReasoning MIB Browser (http://www.ireasoning.com). This utility is not distributed as part of RUEI, and requires a separate license. It is intended only to illustrate the structure of the provided MIB file.