Appendix D Sun Management Center Software Rules
This appendix lists the Sun Management Center rules for the following modules: Kernel Reader and Health Monitor.
Rules Concepts
A rule is an alarm check mechanism that allows for complex or special purpose
logic in determining the status of a monitored host or node.
There are two types of rules:
- Simple rules are based on the rCompare rule, in which a monitored property is compared to the criteria specified in the rule. If the rule condition becomes true, an alarm is generated. For example, a simple rule can check the percentage of disk space used. If the percentage of disk space used is greater than or equal to the percentage specified in the rule, then an alarm is generated.
- Complex rules are based on multiple conditions. For example, one complex rule states that an alert alarm is generated when the following conditions are met:
  - The disk is over 75% busy
  - The average queue length is over 10
  - The wait queue is increasing
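The two rule types described above can be sketched in Python as follows. This is an illustration of the logic only, not Sun Management Center rule syntax: the names `simple_rule_fires`, `complex_rule_fires`, and `DiskSample`, along with its field names, are hypothetical. The thresholds are the ones quoted in the examples above.

```python
from dataclasses import dataclass


def simple_rule_fires(value: float, threshold: float) -> bool:
    """Simple rule: fire when the monitored value reaches the threshold."""
    return value >= threshold


@dataclass
class DiskSample:
    """One sample of disk statistics (hypothetical field names)."""
    busy_pct: float
    avg_queue_len: float
    wait_queue_len: float


def complex_rule_fires(prev: DiskSample, cur: DiskSample) -> bool:
    """Complex rule: fire only when every condition holds at once."""
    return (
        cur.busy_pct > 75
        and cur.avg_queue_len > 10
        and cur.wait_queue_len > prev.wait_queue_len  # wait queue is increasing
    )


# Disk space at 92% used against a 90% threshold fires the simple rule.
disk_space_alarm = simple_rule_fires(92.0, 90.0)

# The complex rule needs two consecutive samples to detect a growing wait queue.
previous = DiskSample(busy_pct=70.0, avg_queue_len=9.0, wait_queue_len=2.0)
current = DiskSample(busy_pct=80.0, avg_queue_len=12.0, wait_queue_len=5.0)
disk_busy_alarm = complex_rule_fires(previous, current)
```

The complex rule takes the previous sample as well as the current one because "increasing" cannot be judged from a single reading.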
Note – Any user-customized Solstice SyMON™ 1.x rules must be ported to the Sun Management Center environment before the rules can be used in Sun Management Center software.
Kernel Reader
The following table lists the Kernel Reader simple rules.
Table D–1 Kernel Reader Simple Rules
Property | Description
avg_1min | Load average over the last minute
avg_5min | Load average over the last 5 minutes
avg_15min | Load average over the last 15 minutes
cpu_delta | Difference between the previous and current time
cpu_idle | CPU idle time
cpu_kernel | CPU kernel time
cpu_user | CPU user time
cpu_wait | CPU wait time
ipctused | Percent of inodes used
kpctused | Percent of Kbytes used
mem-inuse | Physical memory in use (Mbytes)
numusers | Number of users
numsessions | Number of user sessions
swap_used | Swap used (Kbytes)
wait_io | CPU wait time breakdown
wait_pio | CPU wait time breakdown
wait_swap | CPU wait time breakdown
The following table lists the Kernel Reader complex rules.
Table D–2 Kernel Reader Complex Rules
Rule ID | Description | Type of Alarm
rknrd100 | This rule covers a transitory event. The rule generates an alert alarm when the disk is over 75% busy, the average queue length is over 10, and the wait queue is increasing. The alert alarm remains until the disk is less than 70% busy and the average queue length is less than 8. | Alert
rknrd102 | This rule covers a transitory event. The rule generates an alert alarm if 90% of swap space is in use. The alert alarm remains until swap space in use is less than 80% of the total swap space. | Alert
rknrd103 | This rule covers a transitory event. The rule generates an alert alarm if swapping and paging are high for a given CPU. This behavior indicates that the CPU might be thrashing. An alert alarm is generated when the CPU exceeds 1 swap-out, 10 page-ins, and 10 page-outs per second. The alert alarm stays on while the CPU exceeds 1 swap-out, 8 page-ins, and 8 page-outs per second. | Alert
rknrd105 | File System Full error. This rule looks for a file system full error message in the syslog (/var/adm/messages). | Alert alarm that is closed immediately
rknrd106 | No swap space error. This rule looks for a no swap space error message in the syslog (/var/adm/messages). | Alert alarm that is closed immediately
rknrd400 | This rule checks for a continuous CPU load over six per CPU for four hours. | Informational
rknrd401 | This rule checks for disks that are busy more than 90% of the time for x hours. The parameters field holds the last time CPU load was below six, and is initialized to some date in the year 2001. | Informational
rknrd402 | This rule checks if available swap space drops below 10% for x hours. The parameters field indicates the last time that the CPU load was below six. This field is initialized to some date in the year 2001. | Informational
rknrd403 | This rule is not currently supported. | Informational
rknrd404 | An informational alarm is generated if rule rknrd401 is triggered 4 times. | Informational
rknrd405 | An informational alarm is generated if rule rknrd402 is triggered 4 times. | Informational
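Several of the transitory-event rules above open an alarm at one threshold and keep it open until the value falls below a lower one. A minimal Python sketch of that behavior, using the swap-space figures from rknrd102 (open at 90%, close below 80%), might look like the following. The class name `HysteresisAlarm` and its parameters are hypothetical, and treating "90% in use" as greater than or equal to 90 is an assumption; this is not how the rule is implemented in Sun Management Center.

```python
class HysteresisAlarm:
    """Alarm that opens at one threshold and closes below a lower one."""

    def __init__(self, open_at: float, close_below: float) -> None:
        if close_below > open_at:
            raise ValueError("close_below must not exceed open_at")
        self.open_at = open_at
        self.close_below = close_below
        self.active = False

    def update(self, value: float) -> bool:
        """Feed one sample and return whether the alarm is now active."""
        if not self.active and value >= self.open_at:
            self.active = True   # assumed: reaching the threshold opens the alarm
        elif self.active and value < self.close_below:
            self.active = False  # alarm clears only below the lower threshold
        return self.active


# Swap usage percentages sampled over time (illustrative values).
swap_alarm = HysteresisAlarm(open_at=90.0, close_below=80.0)
states = [swap_alarm.update(pct) for pct in (70.0, 91.0, 85.0, 80.0, 79.0)]
print(states)  # [False, True, True, True, False]
```

The gap between the two thresholds keeps the alarm from opening and closing repeatedly when the value hovers near a single limit.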
Health Monitor
The following table lists the Health Monitor complex rules.
Table D–3 Health Monitor Complex Rules
Rule ID | Description | Type of Alarm
rhltm000 | This rule checks whether there is enough swap space. | Critical, Alert, Caution
rhltm001 | CPU power is wasted each time a CPU has to wait for a lock to become free. This event is counted because the kernel uses mutually exclusive locks to synchronize its operation and to keep multiple CPUs from concurrently accessing critical code and data regions. | Critical, Alert, Caution
rhltm002 | NFS remote procedure call timeouts may be associated with duplicate responses after the call is retransmitted. These timeouts indicate that the network is okay but the server is responding slowly. | Critical, Alert, Caution
rhltm003 | The run queue length is divided by the number of CPUs because every CPU takes a job off the run queue in each time slice. | Critical, Alert, Caution
rhltm004 | A busy disk or a slow disk reduces system throughput and increases user response times. This rule identifies the disks that are loaded so that the load can be rebalanced. | Critical, Alert, Caution
rhltm005 | RAM rule based on residency time for an unreferenced page. The virtual memory system indicates that the system needs more memory when the system scans to look for idle pages to reclaim for other uses. | Critical, Alert, Caution
rhltm006 | This rule refers to the problem with kernel memory allocation that occurs when login attempts or network connections fail unexpectedly. There are two possible causes: Either the kernel has reached the extent of its address space, or the free list does not contain any pages to allocate. The repeated failures signify a problem that might otherwise be overlooked. | Critical, Alert, Caution
rhltm007 | A global cache of directory path name components exists. This cache is called the directory name lookup cache (DNLC). If this cache does not exist, directory entries must be read from disk and be scanned to locate the right file. | Critical, Alert, Caution
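The Health Monitor rules can each raise a Caution, Alert, or Critical alarm. As an illustration of how a rule such as rhltm003 might map a per-CPU run queue length to one of those levels, consider the Python sketch below. The threshold values and the names `THRESHOLDS` and `run_queue_severity` are hypothetical; the source does not state the actual limits used by the Health Monitor module.

```python
from typing import Optional

# Hypothetical limits in run-queue entries per CPU, ordered most severe first.
# The real Health Monitor thresholds are not given in this appendix.
THRESHOLDS = (("Critical", 10.0), ("Alert", 5.0), ("Caution", 3.0))


def run_queue_severity(run_queue_len: float, ncpus: int) -> Optional[str]:
    """Return the alarm level for the per-CPU run queue length, or None."""
    if ncpus < 1:
        raise ValueError("ncpus must be at least 1")
    # Each CPU takes a job off the run queue in every time slice,
    # so the load is judged per CPU rather than in total.
    per_cpu = run_queue_len / ncpus
    for severity, limit in THRESHOLDS:
        if per_cpu >= limit:
            return severity
    return None


print(run_queue_severity(12, 4))  # Caution (3.0 per CPU)
print(run_queue_severity(12, 1))  # Critical (12.0 per CPU)
```

Dividing by the CPU count means the same run queue length is treated as less severe on a machine with more processors.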