Sun Management Center 3.6.1 User's Guide

Alarm Concepts

The Alarm Manager software displays alarm information for managed objects. You can view object alarm information in an administrative domain in the main console window and in the Details Alarm window.

Note –

The Sun Management Center agent is configured so that only one server receives alarm information from that agent.

The Sun Management Center 3.6.1 Alarm Manager enables you to do the following tasks:

View alarms in pages from a database
Manually run the currently registered action after an alarm has been triggered
Set and change the currently registered action from a list of all installed actions
Sort alarms
Read the factory default suggested fix for a rule
Create a new user-suggested fix for a rule
Keep a running record of user notes for an alarm instance
Acknowledge alarms when they occur
Delete closed alarms from the database

Alarm Definitions

An alarm is a notification that is triggered by an abnormal event. Sun Management Center has two types of alarms:

Predefined alarm conditions included in the software modules, such as notification when the CPU usage exceeds a certain percentage. These alarms are triggered by conditions outside of a preset range or by Sun Management Center rules. Default alarm conditions and rules are included in the modules. For some predefined alarms, you can change the threshold at which the alarm is triggered. In addition, you can modify the action to take when the alarm occurs and add information to the suggested fixes. For a list of Sun Management Center rules, see Appendix D, Sun Management Center Software Rules.
User-defined alarm conditions. You define what causes an alarm to occur, what action to take, and a suggested fix, if desired.

Alarm Indicators

The Alarm Manager software uses several different methods to alert you to an unacknowledged open alarm condition:

Colored icons in the Domain Status Summary on the main console
Colored icons in the hierarchy (tree) view
Colored icons in the topology (contents) view
Colored relevant row or relevant column in the property table (contents view)

The type and color of the alarm icon identify the severity of the alarm. For example, a red alarm icon indicates that a critical condition has developed and that corrective action is required immediately. By contrast, a blue alarm icon indicates a potential or an impending service-affecting fault.

You can acknowledge, delete, and manage the object alarms by using the Alarms Details window. For more information, see Managing and Controlling Alarms.

Figure 12–1 shows an unacknowledged, open critical alarm in the Used KB row of the Swap Statistics properties table. The row is red, which indicates a critical alarm. The alarm information propagates up the hierarchy tree view, from the individual module up to the host. You would also see a red alarm icon on the following objects:

Swap Statistics properties table
Kernel Reader module
Operating System
Host

You would also see a red alarm icon on the corresponding host, group (if any), or administrative domain in the main console window. The only exception occurs when an unacknowledged, open black alarm exists, because a black alarm has a higher severity than a red alarm.

Figure 12–1 Swap Statistics Alarm in the Details Window

Details window with Module Browser tab selected shows red (critical) icons in left pane and red on the Used KB row in right pane.

You can select Critical Alarms, Alert Alarms, and Caution Alarms simultaneously for a module table. A tick mark appears when these alarms are selected.

Note –

Unacknowledged alarms take precedence over acknowledged alarms. If the hierarchy has two or more types of alarms, the color of the more severe unacknowledged alarm is propagated up the tree. For example, if there is a yellow unacknowledged alarm in CPU usage and a red unacknowledged alarm in Disk Statistics, only the red alarm icon propagates. However, if there is a yellow unacknowledged alarm in CPU usage and a red acknowledged alarm in Disk Statistics, only the yellow alarm icon propagates.
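The precedence rule in the preceding note can be sketched in a few lines of Python. This is an illustrative model only, not Sun Management Center code: the Alarm class, the COLOR_RANK table, and the propagated_color function are hypothetical names. The ordering of red above yellow and of black above red follows the text. Placing blue below yellow is an assumption, as is returning no color when every alarm is acknowledged.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Hypothetical ranking of alarm colors, higher number = more severe.
# red > yellow and black > red follow the guide; blue < yellow is assumed.
COLOR_RANK = {"blue": 1, "yellow": 2, "red": 3, "black": 4}


@dataclass(frozen=True)
class Alarm:
    source: str            # for example "CPU usage" or "Disk Statistics"
    color: str             # one of the keys in COLOR_RANK
    acknowledged: bool = False


def propagated_color(alarms: Iterable[Alarm]) -> Optional[str]:
    """Return the color that would propagate up the tree.

    Only unacknowledged alarms are considered. Among those, the most
    severe color wins. Returns None when no unacknowledged alarm exists
    (an assumption; the guide does not describe this case here).
    """
    unacknowledged = [a for a in alarms if not a.acknowledged]
    if not unacknowledged:
        return None
    worst = max(unacknowledged, key=lambda a: COLOR_RANK[a.color])
    return worst.color


# The two cases described in the note:
both_open = [Alarm("CPU usage", "yellow"), Alarm("Disk Statistics", "red")]
red_acked = [Alarm("CPU usage", "yellow"),
             Alarm("Disk Statistics", "red", acknowledged=True)]
print(propagated_color(both_open))  # red
print(propagated_color(red_acked))  # yellow
```

Filtering out acknowledged alarms before comparing severities is what lets a less severe unacknowledged alarm win over a more severe acknowledged one.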

Alarm Severity Levels

The following alarm severities are supported:

Down Alarms: A down alarm indicates that a service-affecting condition has occurred that requires immediate corrective action. An example of this condition is a required resource, defined by a managed object, that has gone out of service. A specific example is a module that has gone down.
Critical Alarms: A critical alarm indicates that a service-affecting condition has developed that requires an urgent corrective action. An example of this condition occurs when a severe degradation in the capability of an object has occurred and you need to restore the object to full capability.
Alert Alarms: An alert alarm indicates that a non-service-affecting condition has developed. Corrective action should be taken to prevent a more serious fault.
Caution Alarms: A caution alarm indicates the detection of a potential or impending service-affecting fault before any significant effects have occurred. You should diagnose further if necessary and correct the problem before it becomes a serious service-affecting fault.
Off/Disabled Alarms: A disabled alarm indicates that a resource for a managed object is disabled. For example, a module is disabled.

Indeterminate State

Objects with black star icons are objects with indeterminate states, not to be confused with alarms. A black star or “splat” icon in the main console window means that a data acquisition failure occurred in that object. The failure is not the result of a rule infraction, so no alarm is associated with the failure.

Note –

When you view the data property table for an object, a pink row also indicates an indeterminate object state.

Domain Status Summary

The Domain Status Summary section of the main console window provides a brief view of managed object status. The colored icons designate the severity of the alarms.

Tip –

To see a definition for a status summary icon, place your cursor over the icon.

Numbers next to the alarm icons in the Domain Status Summary indicate how many managed objects have that severity as their highest-severity open, unacknowledged alarm. For example, a 1 next to the alert alarm icon (center) indicates that there is one managed object for which the highest severity alarm is alert.

The Domain Status Summary displays the number of managed objects in the administrative domain that have at least one unacknowledged open alarm of a specific severity.

Note –

If multiple types of alarms exist in the host, the icon that indicates the more severe unacknowledged open alarm appears in the Domain Status Summary.

If the most severe alarm on one host is critical and the most severe alarm on another host is alert, a 1 appears on both alarm icons.
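The counting behavior described above can be modeled with a short Python sketch. It is a hypothetical illustration, not part of the product: SEVERITY_RANK and domain_status_summary are invented names, and the input is assumed to list only open, unacknowledged alarms per managed object. The ranking follows the order in which the severities are listed in this chapter, which is itself an assumption. Off/Disabled is left out because its rank relative to the other severities is not stated.

```python
from collections import Counter
from typing import Dict, List

# Assumed ranking, higher number = more severe (taken from list order).
SEVERITY_RANK = {"caution": 1, "alert": 2, "critical": 3, "down": 4}


def domain_status_summary(open_unacked: Dict[str, List[str]]) -> Counter:
    """Count managed objects by their most severe open, unacknowledged alarm.

    open_unacked maps a managed object name to the severities of its
    open, unacknowledged alarms. Each object is counted exactly once,
    under its highest severity. Objects with no such alarms are skipped.
    """
    summary: Counter = Counter()
    for _name, severities in open_unacked.items():
        if not severities:
            continue
        worst = max(severities, key=lambda s: SEVERITY_RANK[s])
        summary[worst] += 1
    return summary


# Hypothetical domain: one host whose worst alarm is critical,
# one whose worst alarm is alert, and one with no open alarms.
summary = domain_status_summary({
    "host-a": ["critical", "caution"],
    "host-b": ["alert"],
    "host-c": [],
})
print(summary["critical"], summary["alert"], summary["caution"])  # 1 1 0
```

In this model host-a contributes only to the critical count even though it also has a caution alarm, which matches the statement that the icon for the more severe alarm is the one shown.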