8 Configuring Threshold Monitoring

Threshold monitoring generates actions in the form of email notifications, SNMP traps and/or reports when specific events occur or thresholds are reached. The actions taken are determined by individual threshold rules, which are aggregated into threshold rule sets to simplify the process of assigning them to devices. To implement threshold rule sets, use the following general steps:

  1. Define threshold rules. These rules vary for the types of resources monitored and can include event- or threshold-driven responses to specific resource states. For example, a rule can generate a notification when CPU utilization reaches a specific threshold or every time a certain account logs on or off.

  2. Define threshold rule sets from the rules you have already defined. The rule sets aggregate threshold rules for deployment to specific devices.

  3. Assign threshold rule sets to specific devices.

  4. Update the agents to put the new rule sets into action.

The following sections describe how to perform these steps in more detail.

Understanding Threshold Rule Sets

A threshold rule set is a collection of threshold rules. Threshold rules are defined separately and later combined under the umbrella of a Threshold Rule Set.

When defining threshold rule sets, you can do one of two things:

  • Create the rules first, then create rule sets from the rules

  • Create empty rule sets, and then populate them with rules as you define them
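The relationship between rules and rule sets can be pictured as a small data model. The following is a minimal sketch for illustration only, not the Configuration Change Console's internal representation; the class names, field names, and sample rules (ThresholdRule, ThresholdRuleSet, add_rule, "High CPU", and so on) are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ThresholdRule:
    """A single rule; the field names here are illustrative assumptions."""
    name: str
    rule_type: str
    priority: str = "P5"


@dataclass
class ThresholdRuleSet:
    """A named collection of rules that is later assigned to devices."""
    name: str
    rules: list = field(default_factory=list)

    def add_rule(self, rule: ThresholdRule) -> None:
        # Avoid listing the same rule twice in one rule set.
        if rule not in self.rules:
            self.rules.append(rule)


# Approach 1: define the rules first, then build a rule set from them.
cpu_rule = ThresholdRule("High CPU", "CPU load", "P2")
login_rule = ThresholdRule("Admin login", "User login/logout", "P4")
web_servers = ThresholdRuleSet("Web servers", [cpu_rule, login_rule])

# Approach 2: create an empty rule set and populate it as rules are defined.
db_servers = ThresholdRuleSet("Database servers")
db_servers.add_rule(ThresholdRule("Disk nearly full", "File system disk usage", "P1"))
db_servers.add_rule(cpu_rule)  # the same rule is reused in a second rule set
```

Both approaches end with the same kind of object, so the choice is a matter of workflow rather than outcome.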

Begin by deciding what kind of notifications you want to create: what to monitor or act on, whom to notify, and what the escalation priority should be.

Threshold Rule Types

Threshold rules monitor specific resources and act either on thresholds being crossed or on the occurrence of specific events. The Configuration Change Console supports a number of threshold rule types, depending on the resource being monitored:

Table 8-1 Threshold Rule Types

Rule/resource type: Process activity (CPU%), User activity (CPU%), CPU load, Memory usage, File system disk usage, Agent inactivity
Notification based on: Predefined thresholds
Description: Generate an email notification, an SNMP trap, or a report when a resource crosses above or below a specified threshold for a specified period of time.

Rule/resource type: User login/logout, Error logs, Errors in database
Notification based on: Detection of specific events occurring
Description: Generate an email notification, an SNMP trap, or a report on specific events, such as user login/logout or administrative events.
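The idea of a threshold that must be crossed "for a specified period of time" can be sketched as follows. This is only an illustration of the concept, not the agent's actual algorithm. The function name breached_for, the sample format, and the strict comparison (a value equal to the threshold does not count as over or under) are assumptions.

```python
from datetime import datetime, timedelta


def breached_for(samples, threshold, direction, duration):
    """Return True if the newest samples stay beyond the threshold for `duration`.

    samples   -- list of (timestamp, percent) tuples, oldest first
    direction -- "over" or "under"
    duration  -- timedelta the condition must hold without interruption
    """
    if direction not in ("over", "under"):
        raise ValueError("direction must be 'over' or 'under'")
    if not samples:
        return False

    def beyond(value):
        return value > threshold if direction == "over" else value < threshold

    newest_time = samples[-1][0]
    streak_start = None
    # Walk backwards from the newest sample until the condition breaks.
    for timestamp, value in reversed(samples):
        if not beyond(value):
            break
        streak_start = timestamp
    if streak_start is None:
        return False
    return newest_time - streak_start >= duration


# Hypothetical one-minute CPU samples: 65% at first, then above 70% throughout.
start = datetime(2024, 1, 1, 12, 0)
readings = [65, 72, 75, 80, 78, 74, 90, 85, 71, 73, 88, 76]
samples = [(start + timedelta(minutes=m), pct) for m, pct in enumerate(readings)]

notify = breached_for(samples, 70, "over", timedelta(minutes=10))
```

A single sample back on the other side of the threshold resets the streak, which is why a brief spike does not trigger a notification in this sketch.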


Notification Options

Notification may take place by email and/or SNMP trap.

  • Configure SNMP server information using the Administration task:

    Administration --> Server Configuration --> SNMP Administration

  • Configure email server information using the Administration task:

    Administration --> Server Configuration --> Email Administration

  • If you notify by email, enter the Configuration Change Console user (defined in the People screen) who is to receive the notification. The email accounts for individuals are created in:

    Administration --> People --> People

Escalation Priorities

When the software sends a notification, it waits for an acknowledgement from the recipient. If the acknowledgement does not arrive within a certain time period, the software escalates the notification according to the rules defined in the priority levels below. The priority levels define how rapidly escalation occurs, or whether it occurs at all.

This process continues until the notification reaches a recipient without a direct manager.

Table 8-2 Escalation Priorities

Priority Definition

P1

If the notification has not been acknowledged by the recipient within 5 minutes, the notification is escalated to the recipient's direct manager. The notification is then escalated every 5 minutes to the next direct manager in the organization hierarchy until it is acknowledged or it reaches a recipient who has no direct manager. The notification remains in that final manager's inbox whether or not it is acknowledged.

P2

If the notification has not been acknowledged by the recipient within 30 minutes, the notification is escalated to the recipient's direct manager. The notification is then escalated every 30 minutes to the next direct manager in the organization hierarchy until it is acknowledged or it reaches a recipient who has no direct manager. The notification remains in that final manager's inbox whether or not it is acknowledged.

P3

Notifications that are not acknowledged by their recipient within 4 hours will be forwarded to the next highest peer. If the peer fails to respond within 4 hours, the notification will be escalated to their direct manager. The notification will follow this escalation pattern every 4 hours, alternating between peer and manager, until it is acknowledged or reaches a recipient that has no direct manager. At this point the notification will remain in the final manager's inbox whether or not it is acknowledged.

P4

Notifications that are not acknowledged by their recipient within 24 hours will be forwarded to the next highest peer. If the peer fails to respond within 24 hours, the notification will be escalated to their direct manager. The notification will follow this escalation pattern every 24 hours, alternating between peer and manager, until it is acknowledged or reaches a recipient that has no direct manager. At this point the notification will remain in the final manager's inbox whether or not it is acknowledged.

P5

This priority level has no escalation, whether or not the recipient acknowledges the notification. This priority is useful for informational notifications that do not require an action or an acknowledgement from users.


The escalation sequence continues until the notification is acknowledged or the top of the specified management hierarchy has been reached. All escalations are recorded in the Notification History. Individuals also can manually escalate notifications when responding to a notification.
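The manager-chain escalation described for P1 and P2 can be illustrated with a short sketch. It is a simplified model, not the product's implementation. It covers only P1, P2, and P5, because the peer selection used by P3 and P4 ("next highest peer") is not defined precisely enough here to model. The function name escalation_schedule, the manager_of mapping, and the sample names are hypothetical.

```python
from datetime import timedelta

# Escalation intervals taken from Table 8-2; P5 never escalates.
ESCALATION_INTERVAL = {
    "P1": timedelta(minutes=5),
    "P2": timedelta(minutes=30),
    "P5": None,
}


def escalation_schedule(recipient, priority, manager_of):
    """List (elapsed_time, person) hops for a notification nobody acknowledges.

    manager_of maps each person to their direct manager; a person missing
    from the map (or mapped to None) is treated as the top of the hierarchy.
    """
    if priority not in ESCALATION_INTERVAL:
        raise ValueError(f"unsupported priority in this sketch: {priority}")
    interval = ESCALATION_INTERVAL[priority]
    schedule = [(timedelta(0), recipient)]
    if interval is None:
        return schedule
    elapsed = timedelta(0)
    current = recipient
    seen = {recipient}
    while manager_of.get(current) is not None:
        current = manager_of[current]
        if current in seen:  # guard against a cyclic hierarchy
            break
        seen.add(current)
        elapsed += interval
        schedule.append((elapsed, current))
    return schedule


# Hypothetical organization: alice reports to bob, bob reports to carol.
managers = {"alice": "bob", "bob": "carol"}
for elapsed, person in escalation_schedule("alice", "P1", managers):
    print(f"{elapsed} -> {person}")
```

In practice the chain stops as soon as someone acknowledges the notification; the sketch shows the worst case in which nobody does.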

Defining Threshold Rules

The Threshold Rules screen, in the Threshold Rule Sets view, displays any threshold rule sets already defined on the system and lets you create new threshold rule sets.

To access this screen, navigate to Policy --> Threshold Monitoring --> Threshold Rules (Threshold Rule Sets view).

A threshold rule set aggregates threshold rules for assignment to devices or device groups. For this reason, it is helpful to first define the rules, and then create the rule sets.

To view rules already defined in the system, select Threshold Rules from the View drop-down menu of the Threshold Rules screen.

The Threshold Rules screen in Rule View lists all predefined rules.

You can restrict the rules displayed according to:

  • Rule Type: The type of threshold rule (CPU utilization, user activity, etc.)

  • Priority: The escalation/notification priority (P1-P5)

To add a new rule, select the rule type from the drop-down menu at the bottom of the screen. The system displays the Rule Definition screen, which varies depending on the type of rule you select.

Every rule has a name, associated administrator, and description.

  • The rule name is a required field.

  • The administrator is an optional setting for documentation purposes.

  • The optional description field is used to document the function of the threshold rule.

The exact options requested on the screen depend on the type of rule you are defining. The rule-specific options are summarized in the following table:

Table 8-3 Rule-Specific Options

Rule type Type-specific fields/actions

Process activity

Specify the process name or pattern.

Enter the CPU threshold and whether to notify when the system is over/under the threshold.

Enter the time that the system must remain in the specified state before the policy sends a notification.

User activity

Select login users to monitor from the Users menu.

Select the CPU utilization threshold and whether to respond when the system is over or under the threshold.

Enter the time that the user account exceeds or falls below the CPU utilization threshold before the policy sends a notification.

CPU load activity

Select the specific CPU name or all CPUs to monitor from the CPU menu.

Select the Threshold (% CPU utilization), and whether to respond when the system is over or under the threshold.

Select the time in minutes that the system must remain in the specified state before the policy sends a notification. For example, if you select 70% utilization, over, and 10 minutes, then the system must remain at over 70% CPU utilization for 10 minutes before a notification occurs.

Memory used activity

Use this rule to monitor memory usage on a managed device. On Windows platforms, memory includes both physical and virtual memory. On UNIX systems, it includes physical memory and swap space.

Enter the memory usage threshold (percentage).

Select whether to respond when the system is over or under the threshold.

Select the time in minutes that the system must remain over or under the threshold before the notification is sent.

Disk used activity

Select the file system to monitor from the File System menu.

Enter the threshold percentage for available disk storage, and select whether to notify when conditions are over or under the threshold.

Select a time that the system must remain in the specified state before the policy sends a notification.

Agent no message activity

Select the time in minutes after which to create a notification if there is no message received from the agent.

You can optionally select to be notified when the agent starts sending messages after a period of inactivity.

Errors in log

Use this type of rule to monitor errors in the Configuration Change Console log.

Specify the error pattern or select a predefined error from the menu to search for in the log. See Appendix E: Predefined Errors for Threshold Rules for a description of predefined errors and a list of potential error messages.

If you do not select a predefined error, select the module to monitor. Module options include database, agent, JMS, and user interface (UI).

If you do not select a predefined error, select the severity level at which to take action. Severity levels are S1-S5, with S1 being the most severe. Each level also includes all more severe levels. For example, if S3 is selected, notifications are triggered for S1, S2, and S3 errors.

Errors in database

Use this type of rule to monitor predefined errors in database logs or specific error messages or error numbers.

Enter the client name or a pattern for the client accessing the database. See Appendix B, "Operating System Rule Set Capability Details" for a description of predefined errors and a list of potential error messages.

Select a predefined error, or specify the error text. Note that using patterns with an asterisk (*) as a wildcard is allowed.
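Two of the matching behaviors described in the table, inclusive severity levels and asterisk wildcards, can be sketched as follows. This is an approximation for illustration, not the product's matching code. The function names severity_triggers and text_matches are hypothetical, and Python's fnmatchcase is used as a stand-in for the wildcard matching, whose exact semantics are not documented here.

```python
from fnmatch import fnmatchcase

SEVERITIES = ["S1", "S2", "S3", "S4", "S5"]  # S1 is the most severe


def severity_triggers(selected, actual):
    """True if an error of severity `actual` triggers a rule set to `selected`.

    Selecting a level also covers every more severe level, so S3 matches
    S1, S2, and S3.
    """
    return SEVERITIES.index(actual) <= SEVERITIES.index(selected)


def text_matches(pattern, message):
    """True if `message` matches `pattern`, where * matches any run of characters.

    fnmatchcase also treats ? and [] as special; whether the product does is
    not stated in this guide, so this is an approximation.
    """
    return fnmatchcase(message, pattern)


# Severities that trigger a rule configured for S3.
triggering = [s for s in SEVERITIES if severity_triggers("S3", s)]

# A sample database error message matched against a wildcard pattern.
matched = text_matches("ORA-*", "ORA-00942: table or view does not exist")
```

An unknown severity label raises ValueError in this sketch, which makes a mistyped level visible rather than silently ignored.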


Once you define the rule, you also define the actions to take:

Field Description

Notify

Enter the person to notify (from the Configuration Change Console People screen). The notification is sent to the email account associated with that individual.

Priority

Select the escalation priority: P1, P2, P3, P4, or P5. For a description of these priorities, see Escalation Priorities.

SNMP Servers

Select an SNMP server to which to send a trap. You can select more than one. SNMP servers must be configured under Administration --> Server Configuration --> SNMP Administration.

Generate Report

Choose a preconfigured report from the drop-down menu to generate that report automatically when a notification occurs.

Once you have decided what the threshold or event should be and what the level of response should be, you can define or modify the threshold rule as follows:

  1. Enter the name and administrator for the rule.

  2. Enter the rule-specific options, described in Table 8-3.

  3. Select the notification options, described in the table above.

  4. Select a report from the Generate Report drop-down menu. To avoid unnecessary database load, run reports sparingly and on as few devices as possible. The report is sent to the person specified as the notification recipient.

  5. Enable or disable the rule. To suspend notification temporarily, clear the Enabled checkbox.

  6. Apply the rule to rule sets. If you already have defined rule sets, you can add this rule to available rule sets. Select a rule set from the Available Rule Sets area and move it to the Selected Rule Sets area using the >> button.

  7. Click Save when you are done.

  8. To deploy the rules, click the Update Agents button in the toolbar at the top of the screen.

Defining Threshold Rule Sets

A threshold rule set aggregates a number of threshold rules and lets you assign them to specific devices.

To view all existing threshold rule sets, navigate to Policy --> Threshold Monitoring --> Threshold Rules and select the Threshold Rule Sets view.

To modify an existing rule set, click on the link under the Rule Set Name column for the desired rule set. To add a new rule set, scroll to the bottom of the screen and select the Create Rule Set button. Either way, the Add or Modify Threshold Rule Set screen will be displayed.

This screen prompts you to provide information and select available rules for the rule set. Enter the following information for the rule set:

  1. Enter a rule set name.

  2. Select an administrator to associate with the rule set.

  3. Enter a description for the rule set (optional).

  4. Select threshold rules for the rule set by highlighting rules from the Available Rules window and clicking the >> button to move them to the Selected Rules window.

  5. Click Save when you are done. You can also save a rule set under a different name by clicking the Save As button, delete the rule set by clicking the Delete Rule Set button, or exit the screen without making changes by clicking the Cancel button.

Assigning Rule Sets to Devices

Once you have defined your threshold rule sets, you must assign them to specific devices.

To assign rule sets, navigate to Policy --> Threshold Monitoring --> Threshold Rules (Threshold Rule Sets view). This screen displays a count of device and device group assignments for each rule set in the Device Assignments and Group Assignments columns, respectively. You can assign rule sets to individual devices or to device groups.

To assign a rule set to a device, select the Threshold Rule Sets view from the View drop-down menu, and click the Device Assignments number link for the rule set to assign. This displays the Device Mode view of the Assign Devices to Rule Set screen.

To assign a rule set to a device group, click the Group Assignments link for the rule set. This displays the Group Mode view of the Assign Devices to Policy screen.

Note:

You can set up either device assignments or group assignments for a rule set, but not both at the same time. For example, if you attempt to assign individual devices to a rule set that is already assigned to device groups, the established group assignments become invalid.
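The either/or behavior in this note can be modeled with a small sketch. It is illustrative only. The class name RuleSetAssignment and its methods are hypothetical, invalidated assignments are modeled simply as being cleared, and the reverse case (group assignments replacing device assignments) is assumed to behave symmetrically, which the note does not state.

```python
from dataclasses import dataclass, field


@dataclass
class RuleSetAssignment:
    """Tracks where one rule set is assigned; the names are illustrative."""
    rule_set: str
    devices: set = field(default_factory=set)
    groups: set = field(default_factory=set)

    def assign_devices(self, devices):
        # Switching to device assignments drops existing group assignments.
        self.groups.clear()
        self.devices.update(devices)

    def assign_groups(self, groups):
        # Assumed to be symmetric: group assignments drop device assignments.
        self.devices.clear()
        self.groups.update(groups)

    @property
    def mode(self):
        if self.devices:
            return "device"
        if self.groups:
            return "group"
        return "unassigned"


# Hypothetical rule set first assigned to a group, then to one device.
assignment = RuleSetAssignment("Web servers")
assignment.assign_groups({"Production"})
assignment.assign_devices({"web01"})
```

After the second call the rule set is in device mode and no longer has any group assignments, which mirrors the warning above.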

The screen can be switched into the following modes through the Selection Mode drop-down menu:

  • Device mode displays individual devices in a tree structure based on device groups

  • Group mode displays device groups

  • Device index mode displays the alphabetical index of available devices

To assign devices in any selection mode, select the device(s) or group(s) to which to assign the rule set, and then click Save.

Validating Threshold Assignments

Use the Validate Threshold Assignments screen to view summaries of attributes related to threshold rules and to help determine whether any configuration is missing or incomplete. To access this screen, navigate to Policy --> Threshold Monitoring --> Validate Threshold Assignments.

Use the links to view devices without rule sets or devices without team support assignments.
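The "devices without rule sets" check amounts to a set difference between the device inventory and the devices covered by at least one rule set. The sketch below illustrates the idea only; the function name devices_without_rule_sets, the shape of the assignments mapping, and the device names are hypothetical and do not reflect how the console stores this data.

```python
def devices_without_rule_sets(all_devices, assignments):
    """Return the sorted names of devices that no rule set is assigned to.

    assignments maps a rule set name to the set of devices it covers.
    """
    covered = set()
    for devices in assignments.values():
        covered.update(devices)
    return sorted(set(all_devices) - covered)


# Hypothetical inventory and assignments.
inventory = ["db01", "web01", "web02"]
assignments = {"Web servers": {"web01"}, "Databases": {"db01"}}
unmonitored = devices_without_rule_sets(inventory, assignments)
```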