Sun Cluster 2.2 System Administration Guide

Monitoring Utilities

You can use the Sun Cluster hastat(1M) utility, in addition to the /var/adm/messages files, to monitor a Sun Cluster configuration. You can also use the Sun Cluster Manager graphical user interface, which shows the status of major cluster components and subcomponents. For more information about the Sun Cluster Manager, refer to "Using Sun Cluster Manager". Sun Cluster also provides an SNMP agent that can be used to monitor up to 32 clusters at the same time. See Appendix C, "Using Sun Cluster SNMP Management Solutions."

If you are running Solstice DiskSuite, you can also use the metastat(1M), metadb(1M), metatool(1M), medstat(1M), and mdlogd(1M) utilities to monitor the status of your disksets. The SNMP-based Solstice DiskSuite log daemon, mdlogd(1M), generates a generic SNMP trap when Solstice DiskSuite logs a message to the syslog file. To have mdlogd(1M) send a trap only when certain messages are logged, specify a regular expression in the mdlogd.cf(4) configuration file. The trap is sent to the administrative host specified in the configuration file. The administrative host must be running a network management application such as Solstice SunNet Manager(TM). Use mdlogd(1M) as an alternative to running metastat(1M) periodically or scanning the syslog output for Solstice DiskSuite errors or warnings. See the mdlogd(1M) man page for more information.

If you are running VxVM, you can use the vxprint, vxstat, vxtrace, vxnotify, and vxva utilities. Refer to your volume management software documentation for information about these utilities.


Note -

For information about troubleshooting and repairing defective components, refer to the appropriate hardware documentation.


Monitoring the Configuration With hastat(1M)

The hastat(1M) program displays the current state of the configuration. The program displays status information for the hosts, logical hosts, private networks, public networks, data services, local disks, and disksets, along with the most recent error messages. When the -m option is specified, hastat(1M) extracts Sun Cluster-related error messages from the /var/adm/messages file and outputs the last few messages from each host. Because the recent error messages list is a filtered extract of the log messages, the context of some messages might be lost. Check the /var/adm/messages file for a complete list of the messages. The following is an example of output from hastat(1M):


# hastat -m 10

HIGH AVAILABILITY CONFIGURATION AND STATUS 
-------------------------------------------

LIST OF NODES CONFIGURED IN <ha-host1> CLUSTER
      phys-host1 phys-host2

CURRENT MEMBERS OF THE CLUSTER

     phys-host1 is a cluster member
     phys-host2 is a cluster member

CONFIGURATION STATE OF THE CLUSTER

     Configuration State on phys-host1: Stable
     Configuration State on phys-host2: Stable

UPTIME OF NODES IN THE CLUSTER

     uptime of phys-host1:         12:47pm  up 12 day(s), 21:11,  1 user,  load average: 0.21, 0.15, 0.14
     uptime of phys-host2:         12:46pm  up 12 day(s),  3:15,  3 users,  load average: 0.40, 0.20, 0.16


LOGICAL HOSTS MASTERED BY THE CLUSTER MEMBERS

Logical Hosts Mastered on phys-host1:
        ha-host1
Loghost Hosts for which phys-host1 is Backup Node:
        ha-host2

Logical Hosts Mastered on phys-host2:
        ha-host2
Loghost Hosts for which phys-host2 is Backup Node:
        ha-host1

LOGICAL HOSTS IN MAINTENANCE STATE

     None

STATUS OF PRIVATE NETS IN THE CLUSTER

     Status of Interconnects on phys-host1:
        interconnect0: selected
        interconnect1: up
     Status of private nets on phys-host1:
        To phys-host1 - UP
        To phys-host2 - UP

     Status of Interconnects on phys-host2:
        interconnect0: selected
        interconnect1: up
     Status of private nets on phys-host2:
        To phys-host1 - UP
        To phys-host2 - UP

STATUS OF PUBLIC NETS IN THE CLUSTER

Status of Public Network On phys-host1:

bkggrp  r_adp   status  fo_time live_adp
nafo0   le0     OK      NEVER   le0

Status of Public Network On phys-host2:

bkggrp  r_adp   status  fo_time live_adp
nafo0   le0     OK      NEVER   le0

STATUS OF SERVICES RUNNING ON LOGICAL HOSTS IN THE CLUSTER

       Status Of Registered Data Services
       q:                       Off
       p:                       Off
       nfs:                     On
       oracle:                  On
       dns:                     On
       nshttp:                  Off
       nsldap:                  On

      Status Of Data Services Running On phys-host1
      Data Service HA-NFS: 
      On Logical Host ha-host1:      Ok
     
      Status Of Data Services Running On phys-host2
      Data Service HA-NFS: 
      On Logical Host ha-host2:      Ok
       
       Data Service "oracle":
       Database Status on phys-host2:
       SC22FILE - running; 

       No Status Method for Data Service "dns"

       RECENT  ERROR MESSAGES FROM THE CLUSTER

       Recent Error Messages on phys-host1
       ...
       Recent Error Messages on phys-host2
       ...
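
Because the hastat(1M) report is plain text, a saved copy can be checked by a script rather than read by eye. The following is a minimal sketch, assuming the report layout matches the example above. The function name parse_hastat and the SAMPLE text are hypothetical and used only for illustration, and the sketch reads captured text rather than running hastat(1M) itself. Treating any configuration state other than Stable as worth attention is also an assumption.

```python
import re

def parse_hastat(text):
    """Pull members, per-node configuration state, and data service
    On/Off switches out of captured hastat(1M) output (layout assumed
    to match the example in this section)."""
    members = re.findall(
        r"^[ \t]*(\S+) is a cluster member[ \t]*$", text, re.MULTILINE)
    states = dict(re.findall(
        r"^[ \t]*Configuration State on (\S+):[ \t]*(\S+)[ \t]*$",
        text, re.MULTILINE))
    services = dict(re.findall(
        r"^[ \t]*(\w+):[ \t]+(On|Off)[ \t]*$", text, re.MULTILINE))
    return {"members": members, "states": states, "services": services}

# Excerpt copied from the example output above (hypothetical variable name).
SAMPLE = """\
CURRENT MEMBERS OF THE CLUSTER

     phys-host1 is a cluster member
     phys-host2 is a cluster member

CONFIGURATION STATE OF THE CLUSTER

     Configuration State on phys-host1: Stable
     Configuration State on phys-host2: Stable

       Status Of Registered Data Services
       q:                       Off
       nfs:                     On
       oracle:                  On
"""

report = parse_hastat(SAMPLE)
# Assumption: any state other than "Stable" deserves a closer look.
not_stable = [node for node, state in report["states"].items()
              if state != "Stable"]
print("members:", report["members"])
print("nodes not Stable:", not_stable)
print("services switched off:",
      [name for name, switch in report["services"].items() if switch == "Off"])
```

To apply the sketch to a real report, save the hastat(1M) output to a file and pass the file contents to parse_hastat. If the layout on your system differs from the example, adjust the patterns accordingly.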

Checking Message Files

The Sun Cluster software writes messages to the /var/adm/messages file, in addition to reporting messages to the console. The following is an example of the messages reported when a disk error occurs.


...
Jun 1 16:15:26 host1 unix: WARNING: /io-unit@f,e1200000/sbi@0.0/SUNW,pln@a0000000,741022/ssd@3,4(ssd49):   
Jun 1 16:15:26 host1 unix: Error for command `write(I))' Err
Jun 1 16:15:27 host1 unix: or Level: Fatal
Jun 1 16:15:27 host1 unix: Requested Block 144004, Error Block: 715559
Jun 1 16:15:27 host1 unix: Sense Key: Media Error
Jun 1 16:15:27 host1 unix: Vendor `CONNER':
Jun 1 16:15:27 host1 unix: ASC=0x10(ID CRC or ECC error),ASCQ=0x0,FRU=0x15
...
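
On a busy node, warnings such as the one above can be buried among routine entries. The following is a minimal sketch of a filter that picks out lines whose message text begins with WARNING:, assuming each line has the layout shown in the example (date, time, host name, tag, message). The names SYSLOG_LINE, find_warnings, and SAMPLE_LINES are hypothetical, and the sample lines are copied from the example above.

```python
import re

# Assumed line layout: "Mon D HH:MM:SS host tag: message".
SYSLOG_LINE = re.compile(r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) (\S+) (\S+?): (.*)$")

def find_warnings(lines):
    """Return (timestamp, host, message) for every line whose message
    text starts with 'WARNING:'. Lines that do not fit the assumed
    layout are skipped."""
    found = []
    for line in lines:
        match = SYSLOG_LINE.match(line.rstrip())
        if match and match.group(4).startswith("WARNING:"):
            found.append((match.group(1), match.group(2), match.group(4)))
    return found

# Lines taken from the disk error example above.
SAMPLE_LINES = [
    "Jun 1 16:15:26 host1 unix: WARNING: /io-unit@f,e1200000/sbi@0.0/SUNW,pln@a0000000,741022/ssd@3,4(ssd49):   ",
    "Jun 1 16:15:27 host1 unix: Requested Block 144004, Error Block: 715559",
    "Jun 1 16:15:27 host1 unix: Sense Key: Media Error",
]

for timestamp, host, message in find_warnings(SAMPLE_LINES):
    print(timestamp, host, message)
```

To scan the live log, pass an open file object for /var/adm/messages to find_warnings in place of SAMPLE_LINES. The detail lines that follow a warning (error level, sense key, and so on) are not matched by this filter, so consult the file itself for the full context.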


Note -

Because Solaris and Sun Cluster error messages are written to the /var/adm/messages file, the /var directory might become full. Refer to "Maintaining the /var File System" for the procedure to correct this problem.
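
A periodic check can give early warning before /var fills. The following is a minimal sketch using the Python standard library. The function names and the 90 percent threshold are assumptions chosen for illustration, not values taken from the Sun Cluster documentation.

```python
import shutil

def used_fraction(path="/var"):
    """Return the fraction (0.0 to 1.0) of the file system holding
    'path' that is currently in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total

def is_nearly_full(path="/var", threshold=0.90):
    """Return True when usage meets or exceeds the threshold.
    The default threshold is an arbitrary illustrative value."""
    return used_fraction(path) >= threshold

# "/" is used here so the sketch runs on any system; substitute "/var".
print("used fraction of /:", round(used_fraction("/"), 2))
print("nearly full:", is_nearly_full("/"))
```

The figure computed here can differ slightly from the capacity column reported by df, which accounts for blocks reserved for the superuser.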


Highly Available Data Service Utilities

In addition, Sun Cluster provides utilities for configuring and administering the highly available data services. The utilities are described in their associated man pages. The utilities include: