

Appendix E

Sun Management Center Software Rules




This appendix lists the Sun Management Center rules for the following modules: Kernel Reader and Health Monitor.

A rule is an alarm check mechanism that allows for complex or special purpose logic in determining the status of a monitored host or node.

There are two types of rules: simple and complex.


Note - Any user-customized Solstice SyMON 1.x rules must be ported to the new environment before the rules can be used in Sun Management Center software.


Kernel Reader

The following table lists the Kernel Reader simple rules.

TABLE E-1   Kernel Reader Simple Rules

| Property | Description |
|---|---|
| avg_1min | Load Average Over The Last 1 Minute |
| avg_5min | Load Average Over The Last 5 Minutes |
| avg_15min | Load Average Over The Last 15 Minutes |
| cpu_delta | Difference between the previous and current time |
| cpu_idle | CPU idle time |
| cpu_kernel | CPU kernel time |
| cpu_user | CPU user time |
| cpu_wait | CPU wait time |
| ipctused | Percent of inodes used |
| kpctused | Percent of Kbytes used |
| mem-inuse | Physical Memory In Use (MBytes) |
| numusers | Number Of Users |
| numsessions | Number Of User Sessions |
| swap_used | Swap Used (Kbytes) |
| wait_io | CPU wait time breakdown |
| wait_pio | CPU wait time breakdown |
| wait_swap | CPU wait time breakdown |

The following table lists the Kernel Reader complex rules.

TABLE E-2   Kernel Reader Complex Rules

| Rule ID | Description | Type of Alarm |
|---|---|---|
| rknrd100 | This rule covers a transitory event and generates an alert alarm when the disk is over 75% busy, the average queue length is over 10, and the wait queue is increasing. The alert alarm stays on until the disk is no more than 70% busy and the average queue length is no longer than 8. | Alert |
| rknrd102 | This rule covers a transitory event and generates an alert alarm if 90% of swap space is in use. The event causing the alarm stays open until swap space in use is less than 80% of the total swap space. | Alert |
| rknrd103 | This rule covers a transitory event and generates an alert alarm if swapping and paging are high for a given CPU, which indicates that the CPU may be thrashing. The alert alarm is generated when the CPU exceeds 1 swap-out, 10 page-ins, and 10 page-outs per second. The alert alarm stays on while the CPU exceeds 1 swap-out, 8 page-ins, and 8 page-outs per second. | Alert |
| rknrd105 | File System Full error. This rule looks for a file system full error message in the syslog (/var/adm/messages). | Alert alarm that is closed immediately |
| rknrd106 | No swap space error. This rule looks for a no swap space error message in the syslog (/var/adm/messages). | Alert alarm that is closed immediately |
| rknrd400 | This rule checks for a continuous CPU load over 6 per CPU for four hours. | Informational |
| rknrd401 | This rule checks for disks being busy more than 90% of the time for x hours. The parameters field holds the last time CPU load was below 6, and is initialized to some date in the year 2001. | Informational |
| rknrd402 | This rule checks if available swap space drops below 10% for x hours. The parameters field holds the last time CPU load was below 6, and is initialized to some date in the year 2001. | Informational |
| rknrd403 | This rule is not currently supported. | Informational |
| rknrd404 | An informational alarm is generated if the rule rknrd401 is triggered 4 times. | Informational |
| rknrd405 | An informational alarm is generated if the rule rknrd402 is triggered 4 times. | Informational |
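Several of the Kernel Reader complex rules open an alarm at one threshold and close it at a lower one, so that a value hovering near the limit does not toggle the alarm repeatedly. The following Python sketch illustrates that pattern using the percentages given for rknrd102 (open at 90% swap in use, close below 80%). It is not the Sun Management Center rule implementation: the class name `HysteresisAlarm`, its fields, and the choice of `>=` for the opening comparison are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class HysteresisAlarm:
    """Hypothetical alarm that opens at one threshold and closes at a lower one."""

    open_at: float       # percent in use at which the alarm opens (assumed inclusive)
    close_below: float   # percent in use below which an open alarm closes
    is_open: bool = False

    def update(self, pct_used: float) -> bool:
        """Feed one sample and return whether the alarm is open afterwards."""
        if not self.is_open and pct_used >= self.open_at:
            self.is_open = True
        elif self.is_open and pct_used < self.close_below:
            self.is_open = False
        return self.is_open


# Percentages taken from the rknrd102 description; the samples are made up.
swap_alarm = HysteresisAlarm(open_at=90.0, close_below=80.0)
samples = [70.0, 91.0, 85.0, 80.0, 79.0, 85.0]
states = [swap_alarm.update(s) for s in samples]
print(states)
```

Note that the sample of 85% keeps an open alarm open but does not open a closed one, which is the behavior the two separate thresholds are meant to produce.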


Health Monitor

The following table lists the Health Monitor complex rules.

TABLE E-3   Health Monitor Complex Rules

| Rule ID | Description | Type of Alarm |
|---|---|---|
| rhltm000 | This rule checks whether there is enough swap space. | Critical, Alert, Caution |
| rhltm001 | The kernel uses mutually exclusive locks to synchronize its operation and keep multiple CPUs from concurrently accessing critical code and data regions. Each time a CPU has to wait for a lock to become free, it wastes CPU power, and this event is counted. | Critical, Alert, Caution |
| rhltm002 | This rule is based on the observation that NFS remote procedure call timeouts may be associated with duplicate responses after the call is retransmitted. This indicates that the network is okay but the server is responding slowly. | Critical, Alert, Caution |
| rhltm003 | Here the run queue length is divided by the number of CPUs. This is based upon the fact that every CPU takes a job off the run queue in each time slice. | Critical, Alert, Caution |
| rhltm004 | A busy or slow disk reduces system throughput and increases user response times. This rule identifies the disks that are loaded so that the load can be rebalanced. | Critical, Alert, Caution |
| rhltm005 | RAM rule based on residency time for an unreferenced page. The virtual memory system indicates that it needs more memory when it scans looking for idle pages to reclaim for other uses. | Critical, Alert, Caution |
| rhltm006 | This rule refers to the kernel memory allocation problem. It shows up when login attempts or network connections fail unexpectedly. There are two possible causes: either the kernel has reached the extent of its address space, or the free list does not contain any pages to allocate. It is more a sign of a problem that may otherwise be overlooked. | Critical, Alert, Caution |
| rhltm007 | Directory Name Lookup Cache (DNLC) rule. The DNLC is a global cache of directory path name components. A cache miss means that directory entries must be read from disk and scanned to locate the right file. | Critical, Alert, Caution |
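As an illustration of the arithmetic behind rhltm003, the following Python sketch divides the run queue length by the number of CPUs and maps the result to one of the three alarm types listed in the table. The function names and the threshold values in `THRESHOLDS` are hypothetical: this appendix does not state the limits the Health Monitor module actually uses.

```python
from typing import Optional

# Hypothetical per-CPU run queue limits, highest severity first.
# These numbers are placeholders, not values from Sun Management Center.
THRESHOLDS = [
    (10.0, "Critical"),
    (5.0, "Alert"),
    (2.0, "Caution"),
]


def run_queue_per_cpu(run_queue_length: float, num_cpus: int) -> float:
    """Divide the run queue length by the number of CPUs, as rhltm003 describes."""
    if num_cpus <= 0:
        raise ValueError("num_cpus must be positive")
    return run_queue_length / num_cpus


def classify(per_cpu: float) -> Optional[str]:
    """Return the most severe alarm type whose limit is reached, or None."""
    for limit, severity in THRESHOLDS:
        if per_cpu >= limit:
            return severity
    return None


# Example: 12 runnable jobs on a 4-CPU host gives 3.0 jobs per CPU.
per_cpu = run_queue_per_cpu(12, 4)
print(per_cpu, classify(per_cpu))
```

Normalizing by CPU count means the same limits apply to hosts of different sizes: a run queue of 12 is a heavier load on a single-CPU host than on a 4-CPU host.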





Copyright © 2000 Sun Microsystems, Inc. All Rights Reserved.