Sun N1 System Manager 1.3 Discovery and Administration Guide

Monitoring Threshold Values

The value of any given monitored OS health attribute is compared to a threshold value. Low and high threshold values are defined and can be configured.

Attribute data is compared against thresholds at regular intervals.

When a monitored attribute's value is beyond the default or user-defined threshold safe range, an event is generated and a status is issued. If the value of the attribute is lower than the low threshold or higher than the high threshold, then depending on the severity of the threshold, an event is generated to show a status of nonrecoverable, critical, or warning. Otherwise the status of the OS health monitored attribute is OK, provided that a value can be obtained.

If no value can be obtained, an event is generated to show that the status of the monitored attribute is unknown. The health of an OS resource can be shown as unknown if the server is reachable but the agent for the monitoring feature cannot be contacted on SNMP port 161. For more information, see Understanding the Differences Between Unreachable and Unknown States for Managed Servers.
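
The comparison described above can be sketched in a few lines of Python. This is only an illustration of the rules in this section, not the N1 System Manager implementation: the function name `classify`, the dictionary layout, and the use of `None` for a value that cannot be obtained are all assumptions made for the example.

```python
from typing import Optional

# Threshold names follow this guide; checking the most severe level first
# and storing thresholds in a dict are assumptions made for this sketch.
HIGH_LEVELS = [("nonrecoverablehigh", "nonrecoverable"),
               ("criticalhigh", "critical"),
               ("warninghigh", "warning")]
LOW_LEVELS = [("nonrecoverablelow", "nonrecoverable"),
              ("criticallow", "critical"),
              ("warninglow", "warning")]


def classify(value: Optional[float], thresholds: dict) -> str:
    """Return the status for one sampled value of an OS health attribute."""
    if value is None:
        # No value could be obtained for the attribute.
        return "unknown"
    for name, status in HIGH_LEVELS:
        limit = thresholds.get(name)
        if limit is not None and value > limit:
            return status
    for name, status in LOW_LEVELS:
        limit = thresholds.get(name)
        if limit is not None and value < limit:
            return status
    return "OK"


# Hypothetical readings checked against the default cpustats.pctusage thresholds.
cpu = {"warninghigh": 80.0, "criticalhigh": 90.1}
for reading in (75.0, 85.0, 95.0, None):
    print(reading, classify(reading, cpu))
```

Thresholds that are not set are simply skipped, which mirrors the idea that an attribute without a configured threshold at a given severity generates no event for that severity.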

The nonrecoverable, critical, warning, and unknown statuses are represented by alarms displayed in the browser interface.

The values nonrecoverable, critical, and warning are discussed in show server in Sun N1 System Manager 1.3 Command Line Reference Manual.

Threshold values for OS health attributes can be configured at the command line. This process is explained in Setting Threshold Values. For threshold values measuring percentages, the valid range is from 0 to 100%. If you try to set a threshold value outside of this range, an error is generated. For attributes that do not measure percentages, these values depend on the number of processors in your system and on the usage characteristics of your installation.
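
If you generate threshold settings from a script, you might check percentage values before submitting them. The following Python sketch shows one way to do that. The function name is hypothetical, and the error it raises is not the error message that the N1 System Manager produces.

```python
def validate_percentage_threshold(value: float) -> float:
    """Accept a percentage threshold only if it lies in the range 0 to 100."""
    if not 0 <= value <= 100:
        raise ValueError(
            f"percentage threshold must be between 0 and 100, got {value}"
        )
    return value


print(validate_percentage_threshold(53))
```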

What Happens When a Threshold Is Broken

If the value of an OS health monitored attribute rises above the warninghigh threshold, a status of Failed Warning is issued. If the value continues to rise and passes the criticalhigh threshold, a status of Failed Critical is issued. If the value continues to rise above the nonrecoverablehigh threshold, a status of nonrecoverablehigh is issued.

If the value then falls back to the safe range, no further events are generated until the value falls below the warninghigh threshold, at which point an event is generated to show a status of normal.

If the value of a monitored attribute falls below the warninglow threshold, a status of Failed Warning is issued. If the value continues to fall, and passes the criticallow threshold, a status of Failed Critical is issued. If the value continues to fall below the nonrecoverablelow threshold, a status of nonrecoverablelow is issued.

If the value then rises back to the safe range, no further events are generated until the value rises above the warninglow threshold, at which point an event is generated to show a status of normal.
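
The event sequence for a rising and then falling value can be simulated as follows. This Python sketch is one reading of the behavior described above, not the product's algorithm. It covers the high thresholds only (the low side is the mirror image), and the function name, the sample values, and the threshold values 80 and 90 are hypothetical.

```python
def events_for_samples(samples, warninghigh, criticalhigh):
    """Return (value, status) events for a series of sampled values."""
    # Most severe level first, so the first match is the current severity.
    levels = [(criticalhigh, 2, "Failed Critical"),
              (warninghigh, 1, "Failed Warning")]
    reported = 0  # severity rank of the last reported status (0 = normal)
    events = []
    for value in samples:
        rank, label = 0, "normal"
        for limit, level_rank, name in levels:
            if value > limit:
                rank, label = level_rank, name
                break
        if rank > reported:
            # The value crossed a more severe threshold: generate an event.
            events.append((value, label))
            reported = rank
        elif rank == 0 and reported > 0:
            # The value is back below warninghigh: report normal once.
            events.append((value, "normal"))
            reported = 0
        # Otherwise the value is falling but still above warninghigh: no event.
    return events


print(events_for_samples([50, 60, 82, 95, 85, 70], warninghigh=80, criticalhigh=90))
```

In this run, the sample 85 produces no event even though it has dropped below criticalhigh, because the value has not yet fallen below the warninghigh threshold.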

Tuning Threshold Values for Your Installation

After a period of usage, you will have a better sense of which levels to set for OS health attribute thresholds. Adjust a threshold once you know more precisely which value genuinely justifies generating an event and sending an event notification to your pager or email address. For example, you might want to receive event notifications every time a certain attribute reaches a warninghigh severity threshold level. For more information, see Setting Up Event Notifications.

For important or crucial attributes at your installation, you can set the warninghigh threshold level to a low percentage value so that you are notified about a rising value as early as possible.

Procedure: To Retrieve Threshold Values for a Server

Before You Begin

To enable the management agent IP and security credentials on a server named server, add the management features on the server as explained in Adding and Upgrading Base Management and OS Monitoring Features.

Steps
  1. Log in to the N1 System Manager.

    See To Access the N1 System Manager Command Line for details.

  2. Type the show server command:


    N1-ok> show server server
    

    In this procedure, server is the name of the managed server for which you want to retrieve threshold values.

    Detailed monitoring threshold values appear in the output, including threshold information for the server's hardware health, OS health, and network reachability. Default values are shown if no specific values have been set.

    See show server in Sun N1 System Manager 1.3 Command Line Reference Manual for details.

    • Threshold information is also available from the Server Details page in the browser interface. This is shown in the following graphic.

      The graphic shows that OS monitoring information can be displayed on the Server Details page, with threshold status information.

Default Threshold Values

Factory-configured default threshold values are provided in the N1 System Manager software for some OS health thresholds. These values are stated as percentages. Table 6–3 lists default values for these OS health attributes.


Note –

Setting or modifying threshold values for hardware health attributes is not supported in this version of the Sun N1 System Manager.


Table 6–3 Factory-Configured Default Threshold Values for OS Health Attributes

Attribute Name         Description                                                                  Default Warning Threshold  Default Critical Threshold
cpustats.loadavg1min   System load expressed as average number of queued processes over 1 minute    warninghigh >4.00          criticalhigh >5.00
cpustats.loadavg5min   System load expressed as average number of queued processes over 5 minutes   warninghigh >4.10          criticalhigh >5.10
cpustats.loadavg15min  System load expressed as average number of queued processes over 15 minutes  warninghigh >4.10          criticalhigh >5.10
cpustats.pctusage      Percentage of overall CPU usage                                              warninghigh >80%           criticalhigh >90.1%
cpustats.pctidle       Percentage of CPU idle                                                       warninglow <20%            criticallow <10%
memusage.mbmemfree     Memory free in MB                                                            warninglow <39%            criticallow <29%
memusage.mbmemused     Memory used in MB                                                            warninghigh >1501          criticalhigh >2001
memusage.pctmemused    Percentage of memory in use                                                  warninghigh >80%           criticalhigh >90%
memusage.pctmemfree    Percentage of memory free                                                    warninglow <20%            criticallow <10%
memusage.kbswapused    Swap space in use in Kb                                                      warninghigh >500000        criticalhigh >1000000
fsusage.kbspacefree    File system free space in Kb                                                 warninglow <94.0Kb         criticallow <89.0Kb

Specific threshold values can be set at the command line by following the procedures described in Setting Threshold Values.
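
If you want to work with the default values in Table 6–3 from a script, the threshold notation used in the table (for example, warninghigh >80%) can be split into its parts. The following Python sketch is an illustration only. The function names and the regular expression are assumptions for this example and are not part of the N1 System Manager.

```python
import re

# Matches entries such as "warninghigh >80%", "criticallow <89.0Kb",
# or "warninghigh >500000": name, comparison, number, optional unit.
SPEC = re.compile(r"^(\w+)\s*([<>])\s*([\d.]+)\s*(%|Kb)?$")


def parse_default(spec: str):
    """Split a table entry into (threshold name, operator, limit, unit)."""
    match = SPEC.match(spec.strip())
    if match is None:
        raise ValueError(f"unrecognized threshold entry: {spec!r}")
    name, op, number, unit = match.groups()
    return name, op, float(number), unit or ""


def is_breached(reading: float, spec: str) -> bool:
    """Return True if the reading is beyond the limit given in the entry."""
    _, op, limit, _ = parse_default(spec)
    return reading > limit if op == ">" else reading < limit


print(parse_default("criticalhigh >90.1%"))
print(is_breached(85.0, "warninghigh >80%"))
print(is_breached(25.0, "warninglow <20%"))
```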

Setting Threshold Values

Threshold values for OS health attributes can be set on specific servers. Threshold values that you set at the command line for OS health attributes override the factory-configured threshold values for those attributes.

Procedure: To Set Threshold Values for a Server

Before You Begin

To enable the management agent IP and security credentials on a server named server, add the management features on the server as explained in Adding and Upgrading Base Management and OS Monitoring Features.

Steps
  1. Log in to the N1 System Manager.

    See To Access the N1 System Manager Command Line for details.

  2. Use the set server command with the threshold attribute.

    The syntax requires the threshold keyword to be followed by the attribute for which you are setting a threshold. The attribute is an OS health attribute. OS health attributes are described in OS Health Monitoring and listed in Table 6–2.

    The threshold is either criticallow, warninglow, warninghigh, or criticalhigh. The value is a numeric figure and usually represents a percentage.

    This set server operation does not change anything on the managed server itself. It only synchronizes the data held on the management server.

    • To set one threshold value, type the following:


      N1-ok> set server server threshold attribute threshold value
      
    • To set multiple threshold values for the server, type the following:


      N1-ok> set server server threshold attribute threshold value threshold value
      
    • For a server group, use the set group command with the threshold attribute. To modify one threshold for the server group:


      N1-ok> set group group threshold attribute threshold value
      
    • To modify multiple thresholds for the server group:


      N1-ok> set group group threshold attribute threshold value threshold value
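
The command forms listed above share one pattern, so a script that prepares many threshold settings could assemble the command text programmatically. The following Python sketch only builds the string to type at the N1-ok> prompt and does not run anything. The function name and its parameters are hypothetical. The severity names are the ones listed in this step.

```python
VALID_SEVERITIES = ("criticallow", "warninglow", "warninghigh", "criticalhigh")


def build_threshold_command(kind, name, attribute, thresholds):
    """Build a 'set server' or 'set group' threshold command as text.

    kind is "server" or "group"; thresholds maps a severity name to a value.
    """
    if kind not in ("server", "group"):
        raise ValueError(f"kind must be 'server' or 'group', got {kind!r}")
    if not thresholds:
        raise ValueError("at least one threshold value is required")
    parts = ["set", kind, name, "threshold", attribute]
    for severity, value in thresholds.items():
        if severity not in VALID_SEVERITIES:
            raise ValueError(f"unknown threshold severity: {severity!r}")
        parts += [severity, str(value)]
    return " ".join(parts)


print(build_threshold_command(
    "server", "serv1", "cpustats.pctusage",
    {"warninghigh": 53, "criticalhigh": 75},
))
```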
      

Example 6–5 Setting Multiple Threshold Values for CPU Percentage Usage on a Server

This example shows how to set the CPU usage warninghigh severity threshold on a managed server named serv1 to 53 percent. This example also shows how to set the criticalhigh severity threshold value to 75 percent.


N1-ok> set server serv1 threshold cpustats.pctusage warninghigh 53 criticalhigh 75


Example 6–6 Setting Multiple Threshold Values for File System Percentage Usage On a Server

This example sets the file system percentage usage warninghigh threshold on a managed server named serv1 to 75 percent. This example also sets the criticalhigh threshold value to 87 percent. This example sets the threshold for every file system on the server.


N1-ok> set server serv1 threshold fsusage.pctused warninghigh 75 criticalhigh 87

You can also specify the file system for which you want to set multiple threshold values. To set the warninghigh threshold to 75 percent and the criticalhigh threshold value to 87 percent, for the /usr file system on the same server, use the filesystem attribute:


N1-ok> set server serv1 filesystem /usr threshold fsusage.pctused warninghigh 75 criticalhigh 87


Example 6–7 Setting a Threshold Value for File System Free Space On a Server

This example sets the warninghigh threshold for file system free space for the /var file system on a managed server named serv1 to 150 Kbytes of free space.


N1-ok> set server serv1 filesystem /var threshold fsusage.kbspacefree warninghigh 150


Example 6–8 Setting a Threshold Value for Percentage of Free Memory On a Server

This example sets the criticalhigh threshold for the percentage of free memory on a managed server named serv1 to 5%.


N1-ok> set server serv1 threshold memusage.pctmemfree criticalhigh 5


Example 6–9 Deleting a Threshold Value for File System Percentage Usage on a Server

This example shows how to delete a value that was set for the warninghigh threshold on a managed server named serv1.


N1-ok> set server serv1 threshold fsusage.pctused warninghigh none

In this case, any previously set value for this threshold at this severity is deleted. In effect, monitoring is disabled for the warninghigh threshold for file system usage for this server.



Example 6–10 Setting Multiple Threshold Values for File System Usage on a Server Group

This example shows how to set the file system usage warninghigh threshold to 75 percent on a group of managed servers with a group name of grp3. This example also shows how to set the criticalhigh threshold severity value to 87 percent.


N1-ok> set group grp3 threshold fsusage.pctused warninghigh 75 criticalhigh 87