Appendix D Sun Management Center Software Rules
This appendix lists the Sun Management Center rules for the following
modules:
Rules Concepts
A rule is an alarm check mechanism that allows for complex or special-purpose
logic in determining the status of a monitored host or node.
There are two types of rules:
- Simple rules are based on the rCompare rule, in which a monitored property
  is compared to a value specified in the rule. If the rule condition becomes
  true, an alarm is generated. For example, a simple rule can check the
  percentage of disk space used: if the percentage of disk space used is
  greater than or equal to the percentage specified in the rule, an alarm is
  generated.
- Complex rules are based on multiple conditions. For example, one complex
  rule generates an alert alarm when all of the following conditions are met:
  - The disk is over 75% busy
  - The average queue length is over 10
  - The wait queue is increasing
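As a rough illustration of the difference, the two kinds of rule can be sketched in Python. This is not the Sun Management Center rule API: the function names `simple_rule` and `disk_complex_rule`, and the representation of "the wait queue is increasing" as two successive samples, are assumptions made for the example.

```python
def simple_rule(value: float, threshold: float) -> bool:
    """rCompare-style simple rule (sketch): alarm when the monitored value
    is greater than or equal to the threshold specified in the rule."""
    return value >= threshold


def disk_complex_rule(busy_pct: float, avg_queue: float,
                      prev_wait_queue: int, wait_queue: int) -> bool:
    """Complex rule (sketch): an alert is raised only when all conditions hold."""
    return (
        busy_pct > 75.0                    # the disk is over 75% busy
        and avg_queue > 10.0               # the average queue length is over 10
        and wait_queue > prev_wait_queue   # the wait queue is increasing
    )


# Percentage of disk space used, checked against a hypothetical 90% limit.
print(simple_rule(92.0, 90.0))               # True
print(disk_complex_rule(80.0, 12.0, 3, 5))   # True
print(disk_complex_rule(80.0, 12.0, 5, 3))   # False: the wait queue is shrinking
```

A single true comparison is enough to fire a simple rule, whereas the complex rule stays quiet unless every condition is true at the same time.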
Note – Any user-customized Solstice SyMON™ 1.x rules must be ported to the
Sun Management Center environment before the rules can be used in Sun
Management Center software.
Kernel Reader
The following table lists the Kernel Reader simple rules.
Table D–1 Kernel Reader Simple Rules

Property | Description
avg_1min | Load average over the last minute
avg_5min | Load average over the last 5 minutes
avg_15min | Load average over the last 15 minutes
cpu_delta | Difference between the previous and current time
cpu_idle | CPU idle time
cpu_kernel | CPU kernel time
cpu_user | CPU user time
cpu_wait | CPU wait time
ipctused | Percent of inodes used
kpctused | Percent of Kbytes used
mem-inuse | Physical memory in use (Mbytes)
numusers | Number of users
numsessions | Number of user sessions
swap_used | Swap used (Kbytes)
wait_io | CPU wait time breakdown
wait_pio | CPU wait time breakdown
wait_swap | CPU wait time breakdown
The following table lists the Kernel Reader complex rules.
Table D–2 Kernel Reader Complex Rules

Rule ID | Description | Type of Alarm
rknrd100 | This rule covers a transitory event. The rule generates an alert alarm when the disk is over 75% busy, the average queue length is over 10, and the wait queue is increasing. The alert alarm remains until the disk is less than 70% busy and the average queue length is less than 8. | Alert
rknrd102 | This rule covers a transitory event. The rule generates an alert alarm if 90% of swap space is in use. The alarm remains until swap space in use drops below 80% of the total swap space. | Alert
rknrd103 | This rule covers a transitory event. The rule generates an alert alarm if swapping and paging are high for a given CPU, which indicates that the CPU may be thrashing. The alert alarm is generated when the CPU exceeds 1 swap-out, 10 page-ins, and 10 page-outs per second. The alarm stays on while the CPU exceeds 1 swap-out, 8 page-ins, and 8 page-outs per second. | Alert
rknrd105 | File System Full error. This rule looks for a file system full error message in the syslog (/var/adm/messages). | Alert alarm that is closed immediately
rknrd106 | No Swap Space error. This rule looks for a no swap space error message in the syslog (/var/adm/messages). | Alert alarm that is closed immediately
rknrd400 | This rule checks for a CPU load that stays above six per CPU continuously for four hours. | Informational
rknrd401 | This rule checks for disks that are busy more than 90% of the time for x hours. The parameters field holds the last time the CPU load was below six and is initialized to a date in the year 2001. | Informational
rknrd402 | This rule checks whether available swap space stays below 10% for x hours. The parameters field holds the last time the CPU load was below six and is initialized to a date in the year 2001. | Informational
rknrd403 | This rule is not currently supported. | Informational
rknrd404 | An informational alarm is generated if rule rknrd401 is triggered four times. | Informational
rknrd405 | An informational alarm is generated if rule rknrd402 is triggered four times. | Informational
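Several of these rules (rknrd100, rknrd102, and rknrd103) raise an alarm at one threshold and clear it only at a lower one. The following Python sketch models that hysteresis for rknrd100 and rknrd102. The class `HysteresisAlarm` and the sample keys (`busy`, `queue`, `wait_delta`, `swap_pct`) are hypothetical; the code illustrates the threshold logic described in the table, not how the Kernel Reader module implements it.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Sample = Dict[str, float]  # hypothetical: one set of readings per poll


@dataclass
class HysteresisAlarm:
    """Opens when open_when is true; stays open until close_when is true."""
    open_when: Callable[[Sample], bool]
    close_when: Callable[[Sample], bool]
    active: bool = False

    def update(self, sample: Sample) -> bool:
        if not self.active and self.open_when(sample):
            self.active = True
        elif self.active and self.close_when(sample):
            self.active = False
        return self.active


# rknrd100: open when >75% busy, queue >10, and the wait queue is growing;
# clear only when <70% busy and queue <8.
rknrd100 = HysteresisAlarm(
    open_when=lambda s: s["busy"] > 75 and s["queue"] > 10 and s["wait_delta"] > 0,
    close_when=lambda s: s["busy"] < 70 and s["queue"] < 8,
)

# rknrd102: open when 90% of swap is in use; clear below 80%.
rknrd102 = HysteresisAlarm(
    open_when=lambda s: s["swap_pct"] >= 90,
    close_when=lambda s: s["swap_pct"] < 80,
)

print(rknrd100.update({"busy": 80, "queue": 12, "wait_delta": 2}))   # True
print(rknrd100.update({"busy": 72, "queue": 9, "wait_delta": 0}))    # True (still open)
print(rknrd100.update({"busy": 65, "queue": 6, "wait_delta": -1}))   # False

for pct in (85, 91, 85, 79):
    print(pct, rknrd102.update({"swap_pct": pct}))
# 85 False, 91 True, 85 True, 79 False
```

Using a lower clear threshold than trigger threshold keeps the alarm from flapping open and closed when a reading hovers near the trigger level.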
|
Health Monitor
The following table lists the Health Monitor complex rules.

Table D–3 Health Monitor Complex Rules

Rule ID | Description | Type of Alarm
rhltm000 | This rule checks whether there is enough swap space. | Critical, Alert, Caution
rhltm001 | The kernel uses mutually exclusive (mutex) locks to synchronize its operation and to keep multiple CPUs from concurrently accessing critical code and data regions. CPU power is wasted each time a CPU has to wait for a lock to become free, so this rule counts these waits. | Critical, Alert, Caution
rhltm002 | NFS remote procedure call (RPC) timeouts may be associated with duplicate responses after the call is retransmitted. Timeouts accompanied by duplicate responses indicate that the network is working but the server is responding slowly. | Critical, Alert, Caution
rhltm003 | This rule checks the run queue length divided by the number of CPUs. The length is divided by the CPU count because every CPU takes a job off the run queue in each time slice. | Critical, Alert, Caution
rhltm004 | A busy or slow disk reduces system throughput and increases user response times. This rule identifies heavily loaded disks so that the load can be rebalanced. | Critical, Alert, Caution
rhltm005 | This RAM rule is based on the residency time of an unreferenced page. When the system scans for idle pages to reclaim for other uses, the virtual memory system is indicating that the system needs more memory. | Critical, Alert, Caution
rhltm006 | This rule detects kernel memory allocation failures, which show up as login attempts or network connections that fail unexpectedly. There are two possible causes: either the kernel has reached the limit of its address space, or the free list does not contain any pages to allocate. Repeated failures signal a problem that might otherwise be overlooked. | Critical, Alert, Caution
rhltm007 | The kernel keeps a global cache of directory path name components, called the directory name lookup cache (DNLC). When a name is not found in this cache, the directory entries must be read from disk and scanned to locate the right file. | Critical, Alert, Caution
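The following Python sketch computes some of the quantities these rules describe and maps a value onto the three Health Monitor severity levels. It is an illustration only: the function names and the severity thresholds are hypothetical, and `page_residence_time` assumes that residency time is estimated as the page scanner's handspread (in pages) divided by the scan rate (pages per second), which the table does not state.

```python
import math
from typing import Optional


def run_queue_per_cpu(run_queue_length: float, ncpus: int) -> float:
    """rhltm003: normalize the run queue by the CPU count, because each CPU
    takes a job off the run queue in each time slice."""
    if ncpus <= 0:
        raise ValueError("ncpus must be positive")
    return run_queue_length / ncpus


def page_residence_time(handspread_pages: int, scan_rate: float) -> float:
    """rhltm005 (assumed formula): seconds an unreferenced page survives
    before the scanner reclaims it. No scanning is treated as no pressure."""
    if scan_rate == 0:
        return math.inf
    return handspread_pages / scan_rate


def dnlc_hit_rate(hits: int, misses: int) -> float:
    """rhltm007: percentage of name lookups served from the DNLC.
    Each miss forces directory entries to be read from disk and scanned."""
    total = hits + misses
    return 100.0 if total == 0 else 100.0 * hits / total


def severity(value: float, caution: float, alert: float,
             critical: float) -> Optional[str]:
    """Map a metric where higher is worse onto Critical, Alert, or Caution.
    The thresholds are supplied by the caller (hypothetical values)."""
    if value >= critical:
        return "Critical"
    if value >= alert:
        return "Alert"
    if value >= caution:
        return "Caution"
    return None


per_cpu = run_queue_per_cpu(12, 4)
print(per_cpu)                                                   # 3.0
print(severity(per_cpu, caution=2.0, alert=3.0, critical=5.0))   # Alert
print(page_residence_time(8192, 256))                            # 32.0
print(dnlc_hit_rate(950, 50))                                    # 95.0
```

Returning `None` below the caution threshold models the case in which none of the three alarm levels applies.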