Sun Cluster 2.2 System Administration Guide

Part III Technical Reference

Appendix A Administering Volume Managers

This appendix provides instructions for administering Solstice DiskSuite disksets and metadevices, and for administering Sun StorEdge Volume Manager and Cluster Volume Manager objects. The procedures documented in this appendix are dependent on your volume management software.

This appendix includes the following procedures:

A.1 Using Solstice DiskSuite in the Sun Cluster Environment

This section describes using DiskSuite to administer:

Refer to the Solstice DiskSuite documentation for a complete discussion of administering DiskSuite objects.

A.1.1 Metadevice and Diskset Administration

Metadevices and disksets are created and administered using either Solstice DiskSuite command-line utilities or the DiskSuite Tool (metatool(1M)) graphical user interface.

Read the information in this chapter before using the Solstice DiskSuite documentation to administer disksets and metadevices in a Sun Cluster configuration.

Disksets are groups of disks. The primary administration task that you perform on disksets involves adding and removing disks.

Before using a disk that you have placed in a diskset, you must set up a metadevice using the disk's slices. A metadevice can be a concatenation, stripe, mirror, or UFS logging device (also called a trans device). You can also create hot spare pools that contain slices to serve as replacements when a metadevice is errored.


Note -

Metadevice names begin with d and are followed by a number. By default in a Sun Cluster configuration, there are 128 unique metadevices in the range 0 to 127. Each UFS logging device that you create will use at least seven metadevice names. Therefore, in a large Sun Cluster configuration, you might need more than the 128 default metadevice names. For instructions on changing the default quantity, refer to the Solstice DiskSuite documentation. Hot spare pool names begin with hsp and are followed by a number. You can have up to 1,000 hot spare pools ranging from hsp000 to hsp999.


A.1.1.1 About Disksets

This section provides overview information on disksets and their relationship to logical hosts, and procedures on how to add and remove disks from the diskset associated with the logical host.

Sun Cluster logical hosts are mastered by physical hosts. Only the physical host that currently masters a logical host can access the logical host's diskset. When a physical host masters a logical host's diskset, it is said to have ownership of the diskset. In general, Sun Cluster takes care of diskset ownership. However, if the logical host is in maintenance state, as reported by the hastat(1M) command, you can use the DiskSuite metaset -t command to manually take diskset ownership. Before returning the logical host to service, release diskset ownership with the metaset -r command.


Note -

If the logical hosts are up and running, you should never perform diskset administration using either the -t (take ownership) or -r (release ownership) options of the metaset(1M) command. These options are used internally by the Sun Cluster software and must be coordinated between the cluster nodes.


A.1.2 Adding a Disk to a Diskset

If the disk being added to a diskset will be used as a submirror, you must have two disks available on two different multihost disk expansion units to allow for mirroring. However, if the disk will be used as a hot spare, you can add a single disk.

A.1.3 How to Add a Disk to a Diskset (Solstice DiskSuite)

  1. Ensure that no data is on the disk.

    This is important because the partition table will be rewritten and space for a metadevice state database replica will be allocated on the disk.

  2. Insert the disk device into the multihost disk expansion unit.

    Use the instructions in the hardware documentation for your disk expansion unit for information on disk addition and removal procedures.

  3. Add the disk to a diskset.

    The syntax for the command is shown below. In this example, diskset is the name of the diskset to which the disk is to be added, and drive is the DID name of the disk in the form dN (for new installations of Sun Cluster), or cNtYdZ (for installations that upgraded from HA 1.3).


    # metaset -s diskset -a drive
    
  4. After adding the disks to the diskset by using the metaset(1M) command, use the scadmin(1M) command to reserve and enable failfast on the specified disks.


    phys-hahost1# scadmin reserve drivename
    

A.1.4 Removing a Disk From a Diskset

You can remove a disk from a diskset at any time, as long as none of the slices on the disk are currently in use in metadevices or hot spare pools.

A.1.5 How to Remove a Disk From a Diskset (Solstice DiskSuite)

  1. Use the metastat(1M) command to ensure that none of the slices are in use as metadevices or as hot spares.

  2. Use the metaset(1M) command to remove the target disk from the diskset.

    The syntax for the command is shown below. In this example, diskset is the name of the diskset containing the (failed) disk to be removed, and drive is the DID name of the disk in the form dN (for new installations of Sun Cluster), or cNtYdZ (for installations that upgraded from HA 1.3).


    # metaset -s diskset -d drive
    

    This operation can take fifteen minutes or more, depending on the size of your configuration and the number of disks.

A.1.6 Administering Multihost Metadevices

The following sections contain information about the differences between administering metadevices in the multihost Sun Cluster environment and in a single-host environment.

Unless noted in the following sections, you can use the instructions in the Solstice DiskSuite documentation.


Note -

The instructions in the Solstice DiskSuite books are relevant only for single-host configurations.


The following sections describe the Solstice DiskSuite command-line programs to use when performing a task. Optionally, you can use the metatool(1M) graphical user interface for all the tasks unless directed otherwise. Use the -s option when running metatool(1M), because it allows you to specify the diskset name.
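
For example, to start DiskSuite Tool against a diskset named hahost1 (a hypothetical diskset name used here only for illustration), you might run:


# metatool -s hahost1 &
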

A.1.6.1 Managing Metadevices

For ongoing management of metadevices, you must constantly monitor the metadevices for errors in operation, as discussed in "2.1 Monitoring Utilities".

When hastat(1M) reports a problem with a diskset, use the metastat(1M) command to locate the errored metadevice.

You must use the -s option when running either metastat(1M) or metatool(1M), so that you can specify the diskset name.
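
For example, assuming a diskset named hahost1 (a hypothetical name), you could check the status of its metadevices with:


# metastat -s hahost1
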


Note -

You should save the metadevice configuration information when you make changes to the configuration. Use metastat -p to create output similar to what is in the md.tab file and then save the output. Refer to "1.1 Saving Disk Partition Information (Solstice DiskSuite)", for details on saving partitioning data.
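
For example, assuming a diskset named hahost1 and an arbitrary output file (both names are hypothetical), you might capture the configuration with a command similar to:


# metastat -s hahost1 -p > /var/tmp/md.tab.hahost1
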


A.1.6.2 Adding a Mirror to a Diskset

Mirrored metadevices can be used as part of a logging UFS file system for Sun Cluster highly available applications.

Idle slices on disks within a diskset can be configured into metadevices by using the metainit(1M) command.
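
The following is a minimal sketch of building a two-way mirror from idle slices. The diskset, metadevice, and slice names are hypothetical; metattach(1M) attaches the second submirror after the one-way mirror is created:


# metainit -s hahost1 d11 1 1 c1t0d0s0
# metainit -s hahost1 d12 1 1 c2t0d0s0
# metainit -s hahost1 d10 -m d11
# metattach -s hahost1 d10 d12
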

A.1.6.3 Removing a Mirror From a Diskset

Sun Cluster highly available database applications can use raw mirrored metadevices for database storage. While these are not mentioned in the dfstab.logicalhost file or in the vfstab file for each logical host, they appear in the related Sun Cluster database configuration files. The mirror must be removed from these files, and the Sun Cluster database system must stop using the mirror. Then the mirror can be deleted by using the metaclear(1M) command.
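
As a hedged example (the diskset and metadevice names are hypothetical), the mirror and its submirrors could then be deleted recursively with:


# metaclear -s hahost1 -r d10
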

A.1.6.4 Taking Submirrors Offline

If you are using SPARCstorage Arrays, note that before replacing or adding a disk drive in a SPARCstorage Array tray, all metadevices on that tray must be taken offline.

In symmetric configurations, taking the submirrors offline for maintenance is complex because disks from each of the two disksets might be in the same tray in the SPARCstorage Array. You must take the metadevices from each diskset offline before removing the tray.

Use the metaoffline(1M) command to take offline all submirrors on every disk in the tray.
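
For example, assuming a diskset named hahost1 with mirror d10 and submirror d11 on a disk in the affected tray (hypothetical names), you might take the submirror offline and later return it to service with:


# metaoffline -s hahost1 d10 d11
# metaonline -s hahost1 d10 d11
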

A.1.6.5 Creating New Metadevices

After a disk is added to a diskset, create new metadevices using metainit(1M) or metatool(1M). If the new devices will be hot spares, use the metahs(1M) command to place the hot spares in a hot spare pool.

A.1.6.6 Replacing Errored Components

When replacing an errored metadevice component, use the metareplace(1M) command.

A replacement slice (or disk) must be available. This could be an existing device that is not in use, or a new device that you have added to the diskset.

You also can return to service drives that have sustained transient errors (for example, as a result of a chassis power failure) by using the metareplace -e command.
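
As a hedged illustration (the diskset, mirror, and slice names are hypothetical), replacing an errored component with a new slice, or re-enabling a component after a transient error, might look like this:


# metareplace -s hahost1 d10 c3t0d0s2 c3t2d0s2
# metareplace -s hahost1 -e d10 c3t0d0s2
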

A.1.6.7 Deleting Metadevices

Before deleting a metadevice, verify that none of the components in the metadevice is in use by Sun Cluster HA for NFS. Then use the metaclear(1M) command to delete the metadevice.

A.1.6.8 Growing Metadevices

To grow a metadevice, you must have a least two slices (disks) in different multihost disk expansion units available. Each of the two new slices should be added to a different submirror with the metainit(1M) command. You then use the growfs(1M) command to grow the file system.


Caution -

When the growfs(1M) command is running, clients might experience interruptions of service.


If a takeover occurs while the file system is growing, the file system will not be grown. You must reissue the growfs(1M) command after the takeover completes.
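
As a hedged example (the logical host, mount point, and metadevice names are hypothetical), after attaching the new slices to the submirrors you might grow the mounted file system with:


# growfs -M /hahost1/1 /dev/md/hahost1/rdsk/d30
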


Note -

The file system that contains /logicalhost/statmon cannot be grown. Because the statd(1M) program modifies this directory, it would be blocked for extended periods while the file system is growing. This would have unpredictable effects on the network file locking protocol. This is a problem only for configurations using Sun Cluster HA for NFS.


A.1.6.9 Managing Hot Spare Pools

You can add or delete hot spare devices to or from hot spare pools at any time, as long as they are not in use. In addition, you can create new hot spare pools and associate them with submirrors using the metahs(1M) command.
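
For example, assuming a diskset named hahost1 (all names here are hypothetical), you might add a slice to a hot spare pool and associate the pool with a submirror as follows:


# metahs -s hahost1 -a hsp000 c3t1d0s2
# metaparam -s hahost1 -h hsp000 d11
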

A.1.6.10 Managing UFS Logs

All UFS logs on multihost disks are mirrored. When a submirror fails, it is reported as an errored component. Repair the failure using either metareplace(1M) or metatool(1M).

If the entire mirror that contains the UFS log fails, you must unmount the file system, back up any accessible data, repair the error, repair the file system (using fsck(1M)), and remount the file system.

A.1.6.11 Adding UFS Logging to a Logical Host

All UFS file systems within a logical host must be logging UFS file systems to ensure that the failover or haswitch(1M) timeout criteria can be met. This facilitates fast switchovers and takeovers.

The logging UFS file system is set up by creating a trans device with a mirrored logging device and a mirrored UFS master file system. Both the logging device and UFS master device must be mirrored.

Typically, Slice 6 of each drive in a diskset is used for UFS logs; these slices serve as the UFS log submirrors. If the slices are smaller than the log size you want, several can be concatenated. Typically, one Mbyte of log per 100 Mbytes of file system is adequate, up to a maximum of 64 Mbytes. Ideally, log slices should be drive-disjoint from the UFS master device.
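
The following is a minimal sketch of building a mirrored log and combining it with an existing mirrored master into a trans device. All diskset, metadevice, and slice names are hypothetical, and d60 is assumed to be a previously created mirrored UFS master device:


# metainit -s hahost1 d63 1 1 c1t0d0s6
# metainit -s hahost1 d64 1 1 c2t0d0s6
# metainit -s hahost1 d62 -m d63
# metattach -s hahost1 d62 d64
# metainit -s hahost1 d61 -t d60 d62
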


Note -

If you must repartition the disk to gain space for UFS logs, then preserve the existing Slice 7, which starts on Cylinder 0 and contains at least two Mbytes. This space is required and reserved for metadevice state database replicas. The Tag and Flag fields (as reported by the format(1M) command) must be preserved for Slice 7. The metaset(1M) command sets the Tag and Flag fields correctly when the initial configuration is built.


After the trans device has been configured, create the UFS file system using newfs(1M) on the trans device.
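
For example, assuming the trans device d61 in the diskset hahost1 (hypothetical names):


# newfs /dev/md/hahost1/rdsk/d61
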

After the newfs process is completed, add the UFS file system to the vfstab file for the logical host, using the following procedure.

    Edit the /etc/opt/SUNWcluster/conf/hanfs/vfstab.logicalhost file to update the administrative and multihost UFS file system information.

Make sure that the vfstab.logicalhost files of all cluster nodes contain the same information. Use the cconsole(1) facility to make simultaneous edits to vfstab.logicalhost files on all nodes in the cluster.

Here's a sample vfstab.logicalhost file showing the administrative file system and four other UFS file systems:


#device                  device                    mount       FS   fsck  mount    mount
#to mount                to fsck                   point       type pass  at boot  options
/dev/md/hahost1/dsk/d11  /dev/md/hahost1/rdsk/d11  /hahost1    ufs  1     no       -
/dev/md/hahost1/dsk/d1   /dev/md/hahost1/rdsk/d1   /hahost1/1  ufs  1     no       -
/dev/md/hahost1/dsk/d2   /dev/md/hahost1/rdsk/d2   /hahost1/2  ufs  1     no       -
/dev/md/hahost1/dsk/d3   /dev/md/hahost1/rdsk/d3   /hahost1/3  ufs  1     no       -
/dev/md/hahost1/dsk/d4   /dev/md/hahost1/rdsk/d4   /hahost1/4  ufs  1     no       -

If the file system will be shared by Sun Cluster HA for NFS, follow the procedure for sharing NFS file systems as described in the chapter on Sun Cluster HA for NFS in the Sun Cluster 2.2 Software Installation Guide.

The new file system will be mounted automatically at the next membership monitor reconfiguration. To force membership reconfiguration, use the following command:


# haswitch -r

A.1.7 Administering Local Metadevices

Local disks can be mirrored. If a single mirror fails, use the instructions in the Solstice DiskSuite documentation to replace the failed mirror and resynchronize the replacement disk with the good disk.

A.1.8 Destructive Metadevice Actions

The metadevice actions that are not supported in Sun Cluster configurations include:

A.2 Using SSVM and CVM in the Sun Cluster Environment

Sun StorEdge Volume Manager (SSVM) and Cluster Volume Manager (CVM) are variations of the same volume manager. CVM is only used in Oracle Parallel Server (OPS) configurations. This section describes using disks under the control of the volume manager to administer:

Refer to the appropriate section for a complete discussion of administering these objects.

A.2.1 Objects Administration Overview (SSVM and CVM)

Objects under the control of a volume manager are created and administered using either command-line utilities or the Visual Administrator graphical user interface.

Read the information in this chapter before using the SSVM or CVM documentation to administer objects under the control of a volume manager in a Sun Cluster configuration. The procedures presented here are one method for performing the following tasks. Use the method that works best for your particular configuration.

These objects generally have the following relationship:

The default disk group is rootdg (the root disk group). You can create additional disk groups as necessary. The primary administration tasks that you perform on disk groups involve adding and removing disks.

Before using a disk that you have placed in a disk group, you must set up disks and subdisks (under volume manager control) to build plexes, or mirrors, using the physical disk's slices. A plex can be a concatenation or stripe.

With SSVM and CVM, applications access volumes (created on volume manager disks) rather than slices.

The following sections describe the SSVM and CVM command-line programs to use when performing a task. Optionally, you can use the graphical user interface for all the tasks unless directed otherwise.


Note -

On nodes running Sun Cluster HA data services, never manually run the vxdg import or deport options on a disk group that is under the control of Sun Cluster, unless the logical host for that disk group is in maintenance mode. Before manually importing or deporting a disk group, you must either stop Sun Cluster on all nodes that can master that disk group (by running scadmin stopnode on all such nodes), or use the haswitch -m command to switch any corresponding logical host into maintenance mode. When you are ready to return control of the disk group to Sun Cluster, the safest course is to deport the disk group before running scadmin startnode or before using haswitch(1M) to place the logical host back under the control of Sun Cluster.


A.2.2 About Disks

Before a disk can be used by SSVM or CVM, it must be identified, or initialized, as a disk that is under control of a volume manager. A fully initialized disk can be added to a disk group, used to replace a previously failed disk, or used to create a new disk group.

A.2.3 How to Initialize and Configure a Disk (SSVM and CVM)

  1. Ensure that no data is on the disk.

    This is important because existing data is destroyed if the disk is initialized.

  2. Insert the disk device and install it in the disk enclosure by following the instructions in the accompanying hardware documentation.

  3. Initialize the disk and add it to a disk group.

    This is commonly done by using either the vxdiskadm menus or the graphical user interface. Alternately, you can use the command-line utilities vxdisksetup and vxdg adddisk to initialize the disk and place it in a disk group.
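
    As a hedged illustration only (the device name, disk media name, and disk group name are hypothetical), the command-line approach might look like this:


    # vxdisksetup -i c2t0d0
    # vxdg -g acct adddisk disk01=c2t0d0s2
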

A.2.3.1 Taking a Disk Offline

Occasionally, you may need to take a physical disk offline. If the disk is corrupted, you need to disable it and remove it. You also must disable a disk before moving the physical disk device to another location to be connected to another system.

To take a physical disk offline, first remove the disk from its disk group. Then place the disk offline by using the vxdisk(1M) command.
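
For example, assuming the disk media name disk01 in disk group acct and the device c2t0d0 (hypothetical names), you might use:


# vxdg -g acct rmdisk disk01
# vxdisk offline c2t0d0s2
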

A.2.3.2 Removing a Disk

You can remove a disk to move it to another system, or you may remove the disk because the disk is failing or has failed. Alternatively, if the volumes are no longer needed, they can be removed.

To remove a disk from the disk group, use the vxdg(1M) command. To remove the disk from volume manager control by removing the private and public partitions, use the vxdiskunsetup(1M) command. Refer to the vxdg(1M) and vxdiskunsetup(1M) man pages for complete information on these commands.

A.2.4 Administering Disk Groups

For SSVM and CVM, it is most convenient to create and populate disk groups from the active node that is the default master of the particular disk group. In an N+1 configuration, each of these default master nodes shares multihost disk connectivity with only one other node in the cluster, the hot-standby node. By using these nodes to populate the disk groups, you avoid the risk of generating improperly configured groups.

A.2.5 How to Create a Disk Group (SSVM and CVM)

You can use either the vxdiskadm menus or the graphical user interface to create a new disk group. Alternately, you can use the command-line utility vxdg init.

Once the disk groups have been created and populated, each one should be deported by using the vxdg deport command. Then, each group should be imported onto the hot-standby node by using the -t option. The -t option is important, as it prevents the import from persisting across the next boot. All SSVM or CVM plexes and volumes should be created, and volumes started, before continuing.
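
As a hedged sketch (the disk group, disk media, device, and host names are hypothetical), the sequence might look like this, with the import performed on the hot-standby node:


phys-hahost1# vxdg init acct acct01=c1t0d0s2
phys-hahost1# vxdg deport acct
phys-hahost2# vxdg -t import acct
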

A.2.6 How to Move a Disk to a Different Disk Group (SSVM and CVM)

To move a disk between disk groups, remove the disk from one disk group and add it to the other.

This example moves the physical disk c1t0d1 from disk group acct to disk group log_node1 by using command-line utilities.

  1. Use the vxprint(1M) command to determine if the disk is in use.


    # vxprint -g acct
     TY NAME         ASSOC        KSTATE   LENGTH   PLOFFS   STATE    TUTIL0  PUTIL0
     dg acct         acct         -        -        -        -        -       -
    
     dm c1t0d0       c1t0d0s2     -        2050272  -        -        -       -
     dm c1t0d1       c1t0d1s2     -        2050272  -        -        -       -
     dm c2t0d0       c2t0d0s2     -        2050272  -        -        -       -
     dm c2t0d1       c2t0d1s2     -        2050272  -        -        -       -
    
     v  newvol       gen          ENABLED  204800   -        ACTIVE   -       -
     pl newvol-01    newvol       ENABLED  205632   -        ACTIVE   -       -
     sd c1t0d1-01    newvol-01    ENABLED  205632   0        -        -       -
     pl newvol-02    newvol       ENABLED  205632   -        ACTIVE   -       -
     sd c2t0d1-01    newvol-02    ENABLED  205632   0        -        -       -
    
     v  vol01        gen          ENABLED  1024000  -        ACTIVE   -       -
     pl vol01-01     vol01        ENABLED  1024128  -        ACTIVE   -       -
     sd c1t0d0-01    vol01-01     ENABLED  1024128  0        -        -       -
     pl vol01-02     vol01        ENABLED  1024128  -        ACTIVE   -       -
     sd c2t0d0-01    vol01-02     ENABLED  1024128  0        -        -       -
  2. Use the vxedit(1M) command to remove the volume to free up the c1t0d1 disk.

    You must run the vxedit command from the CVM node mastering the shared disk group.


    # vxedit -g acct -fr rm newvol
    

    The -f option forces an operation. The -r option makes the operation recursive.

  3. Remove the c1t0d1 disk from the acct disk group.

    You must run the vxdg command from the CVM node mastering the shared disk group.


    # vxdg -g acct rmdisk c1t0d1
    
  4. Add the c1t0d1 disk to the log_node1 disk group.


    # vxdg -g log_node1 adddisk c1t0d1
    

    Caution -

    This procedure does not save the configuration or data on the disk.


    This is the acct disk group after c1t0d1 is removed.


    # vxprint -g acct
    TY NAME         ASSOC        KSTATE   LENGTH   PLOFFS   STATE    TUTIL0  PUTIL0
     dg acct         acct         -        -        -        -        -       -
    
     dm c1t0d0       c1t0d0s2     -        2050272  -        -        -       -
     dm c2t0d0       c2t0d0s2     -        2050272  -        -        -       -
     dm c2t0d1       c2t0d1s2     -        2050272  -        -        -       -
    
     v  vol01        gen          ENABLED  1024000  -        ACTIVE   -       -
     pl vol01-01     vol01        ENABLED  1024128  -        ACTIVE   -       -
     sd c1t0d0-01    vol01-01     ENABLED  1024128  0        -        -       -
     pl vol01-02     vol01        ENABLED  1024128  -        ACTIVE   -       -
     sd c2t0d0-01    vol01-02     ENABLED  1024128  0        -        -       -

    This is the log_node1 disk group after c1t0d1 is added.


    # vxprint -g log_node1
     TY NAME         ASSOC        KSTATE   LENGTH   PLOFFS   STATE    TUTIL0  PUTIL0
     dg log_node1    log_node1    -        -        -        -        -       -
    
     dm c1t0d1       c1t0d1s2     -        2050272  -        -        -       -
     dm c1t3d0       c1t3d0s2     -        2050272  -        -        -       -
     dm c2t3d0       c2t3d0s2     -        2050272  -        -        -       -
     # 

    To change permissions or ownership of volumes, you must use the vxedit command.


    Caution -

    Do not use chmod or chgrp. The permissions and ownership set by chmod or chgrp are automatically reset to root during a reboot.


    Here is an example of the permissions and ownership of the volumes vol01 and vol02 in the /dev/vx/rdsk directory before a change.


    # ls -l
    crw-------   1 root     root   nnn,nnnnn  date time  vol01
    crw-------   1 root     root   nnn,nnnnn  date time  vol02
    ...

    This is an example of changing the permissions and ownership for vol01.


    # vxedit -g group_name set mode=755 user=oracle vol01
    

    After the edit, note how the permissions and ownership have changed.


    # ls -l
    crwxr-xr-x   1 oracle   root   nnn,nnnnn  date time  vol01
    crw-------   1 root     root   nnn,nnnnn  date time  vol02
    ...

A.2.7 Administering SSVM and CVM Objects

Volumes, or virtual disks, can contain file systems or applications such as databases. A volume can consist of up to 32 plexes, each of which contains one or more subdisks. In order for a volume to be usable, it must have at least one associated plex with at least one associated subdisk. Note that all subdisks within a volume must belong to the same disk group.

A.2.7.1 Creating Volumes and Adding Mirrors to Volumes

Use the graphical user interface or the command-line utility vxassist(1M) to create volumes in each disk group, and to create an associated mirror for each volume.
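
For example, assuming a disk group named acct (a hypothetical name), a mirrored 500-Mbyte volume could be created with commands similar to:


# vxassist -g acct make vol01 500m
# vxassist -g acct mirror vol01
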

The actual size of an SSVM or CVM device is slightly less than the full disk drive size. SSVM and CVM reserve a small amount of space for private use, called the private region.


Note -

The use of the same volume name is allowed if the volumes belong to different disk groups.


A.2.7.2 Adding Dirty Region Logging

Dirty Region Logging (DRL) is an optional property of a volume, used to provide a speedy recovery of mirrored volumes after a system failure. DRL keeps track of the regions that have changed due to I/O writes to a mirrored volume and uses this information to recover only the portions of the volume that need to be recovered.

A.2.7.3 Creating a Log File for an Existing Volume

Log subdisks are used to store the dirty region log of a volume that has DRL enabled. A volume with DRL has at least one log subdisk; multiple log subdisks can be used to mirror the dirty region log. Each log subdisk is associated with one of the volume's plexes. Only one log subdisk can exist per plex. If the plex contains only a log subdisk and no data subdisks, that plex can be referred to as a log plex. The log subdisk can also be associated with a regular plex containing data subdisks, in which case the log subdisk risks becoming unavailable in the event that the plex must be detached due to the failure of one of its data subdisks.

Use the graphical user interface or the command-line utility vxassist(1M) to create a log for an existing volume.
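
For example, assuming disk group acct and volume vol01 (hypothetical names), a DRL log can be added to an existing volume with:


# vxassist -g acct addlog vol01
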

A.2.7.4 Using Hot-Relocation

Hot-relocation is the ability of a system to automatically react to I/O failures on redundant (mirrored or RAID5) volume manager objects, and to restore redundancy and access to those objects. Hot-relocation is supported only on configurations using SSVM. SSVM detects I/O failures on volume manager objects and relocates the affected subdisks to disks designated as spare disks or free space within the disk group. SSVM then reconstructs the objects that existed before the failure and makes them redundant and accessible again.

When a partial disk failure occurs (that is, a failure affecting only some subdisks on a disk), redundant data on the failed portion of the disk is relocated, and the existing volumes consisting of the unaffected portions of the disk remain accessible.


Note -

Hot-relocation is performed only for redundant (mirrored or RAID5) subdisks on a failed disk. Non-redundant subdisks on a failed disk are not relocated, but you are notified of their failure.


A spare disk must be initialized and placed in a disk group as a spare before it can be used for replacement purposes. If no disks have been designated as spares when a failure occurs, SSVM automatically uses any available free space in the disk group in which the failure occurs. If there is not enough spare disk space, a combination of spare space and free space is used. You can designate one or more disks as hot-relocation spares within each disk group. Disks can be designated as spares with the vxedit(1M) command.
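
For example, assuming disk group acct and disk media name disk02 (hypothetical names), a disk can be designated as a hot-relocation spare with:


# vxedit -g acct set spare=on disk02
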

A.2.7.5 Using VxFS File Systems

You can configure and specify either UFS or VxFS file systems associated with a logical host's disk groups on volumes of type fsgen. When a cluster node masters a logical host, the logical host's file systems associated with the disk groups are mounted on the mastering node's specified mount points.

During a logical host reconfiguration sequence, it is necessary to check file systems with the fsck(1M) command. Though this process is performed in non-interactive parallel mode on UFS file systems, it can affect the overall time of the reconfiguration sequence. The logging feature of UFS, SDS, and VxFS file systems greatly reduces the time that fsck(1M) takes prior to mounting file systems.

When a data service switchover also requires volume recovery, the recovery can take longer than the time allowed for the reconfiguration steps, causing step time-outs and node aborts.

Consequently, when setting up mirrored volumes, always add a DRL log to decrease volume recovery time in the event of a system crash. When mirrored volumes are used in the cluster environment, DRL must be assigned for volumes greater than 500 Mbytes.

Use VxFS if large file systems (greater than 500 Mbytes) are used for HA data services. Note that VxFS is not bundled with Sun Cluster and must be purchased separately from Veritas.


Note -

Although it is possible to configure logical hosts with very small mirrored file systems, you should use Dirty Region Logging (DRL) or VxFS file systems because of the possibility of time-outs as the size of the file system increases.


A.2.7.6 Growing a File System

To grow a striped or RAID5 volume containing a file system, you must have free space on the same number of disks as are currently in the stripe or RAID5 volume. For example, if you have four 1GB disks striped together (giving you a 4GB file system) and you wish to add 1GB of space (to yield a 5GB file system), you must have four disks available, each with at least 0.25GB of free space. In other words, you cannot add one disk to a 4-disk stripe.

The SSVM or CVM graphical user interface will choose the disks on which to grow your file system. To select the specific disks on which to grow the file system, use the command line interface instead.
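
As a hedged sketch (the group, volume, and disk media names are hypothetical), you might grow a four-column striped volume by 1 Gbyte while restricting the allocation to specific disks, and then grow the file system itself in a separate step:


# vxassist -g acct growby vol01 1g disk01 disk02 disk03 disk04
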

UFS file systems cannot be shrunk. The only way to "shrink" a file system is to recreate the volume, run newfs on the volume, then restore the data from backup.

A.2.8 Administering Local Mirrors

Local disks can be mirrored. If a single mirror fails, use the instructions in your volume manager documentation to replace the failed mirror and resynchronize the replacement disk with the good disk.

A.3 Backing Up Multihost Data Using Solstice Backup

This section contains suggestions for using Solstice Backup(TM) to back up Sun Cluster file systems.

Solstice Backup is designed to run each copy of the server software on a single server. Solstice Backup expects files to be recovered using the same physical server from which they were backed up.

Solstice Backup has considerable data about the physical machines (host names and host IDs) corresponding to the server and clients. Solstice Backup's information about the underlying physical machines on which the logical hosts are configured affects how it stores client indexes.

Do not put the Solstice Backup /nsr database on the multihost disks. Conflicts can arise if two different Solstice Backup servers attempt to access the same /nsr database.

Because of the way Solstice Backup stores client indexes, do not back up a particular client using different Solstice Backup servers on different days. Make sure that a particular logical host is always mastered by the same physical server whenever backups are performed. This will enable future recover operations to succeed.


Note -

By default, Sun Cluster systems will not generate the full file system list for your backup configuration. If the save set list consists of the keyword All, then the /etc/vfstab file will be examined to determine which file systems should be saved. Because Sun Cluster vfstab files are kept in /etc/opt/SUNWcluster/conf/hanfs by default, Solstice Backup will not find them unless you explicitly list the Sun Cluster file systems to be saved. When you are testing your backup procedures, verify that all of the Sun Cluster file systems that need to be backed up appear in the Solstice Backup file system list.


Four methods of configuring Solstice Backup are presented here. You might prefer one depending on your particular Sun Cluster configuration. Switchover times could influence your decision. Once you decide on a method, continue using that method so that future recover operations will succeed.

Here is a description of the configuration methods:

In all four of the above backup options, you can have another server configured to temporarily perform backups in the event the designated Solstice Backup server is down. Note that you will not be able to use the temporary Solstice Backup server to recover files backed up by the normal Solstice Backup server, and that you cannot recover files backed up by the temporary server from the normal backup server.

Appendix B Sun Cluster Man Page Quick Reference

This appendix contains a quick reference to the syntax and descriptions for all the commands and utilities associated with the Sun Cluster framework and the Sun Cluster data services. They are presented in alphabetical order within their appropriate man page section.


Note -

The complete man pages described in this appendix are available online by using the man(1) command.


B.1 Man1

B.2 Man1M

B.3 Man3HA

B.4 Man4

B.5 Man7

Appendix C Sun Cluster Fault Detection

This appendix describes fault detection for Sun Cluster.

This section presents an overview of Sun Cluster fault detection. This fault detection encompasses three general approaches:

Fault monitoring performs sanity checks to ensure that the faulty node is the one being blamed for a problem, and not the healthy node.

Some of the information presented is specific to this release of Sun Cluster, and is expected to change as the product evolves. The time estimates given to detect various faults are rough approximations and are intended only to give the reader a general understanding of how Sun Cluster behaves. This document is not intended to be a program logic manual for the internals of Sun Cluster nor does it describe a programming interface.

C.1 Fault Detection Overview

As noted in the basic Sun Cluster architecture discussion, when one server goes down the other server takes over. This raises an important issue: how does one server recognize that another server is down?

Sun Cluster uses three methods of fault detection.

For the second and third methods, one server is probing the other server for a response. After detecting an apparent problem, the probing server carries out a number of sanity checks of itself before forcibly taking over from the other server. These sanity checks try to ensure that a problem on the probing server is not the real cause of the lack of response from the other server. These sanity checks are provided by hactl(1M), a library subroutine that is part of the Sun Cluster base framework; hence, data service-specific fault detection code need only call hactl(1M) to perform sanity checks on the probing server. See the hactl(1M) man page for details.

C.1.1 The Heartbeat Mechanism: Cluster Membership Monitor

Sun Cluster uses a heartbeat mechanism. The heartbeat processing is performed by a real-time high-priority process which is pinned in memory, that is, it is not subject to paging. This process is called the cluster membership monitor. In a ps(1) listing, its name appears as clustd.

Each server sends out an "I am alive" message, or heartbeat, over both private links approximately once every two seconds. In addition, each server is listening for the heartbeat messages from other servers, on both private links. Receiving the heartbeat on either private link is sufficient evidence that another server is running. A server will decide that another server is down if it does not hear a heartbeat message from that server for a sufficiently long period of time, approximately 12 seconds.

In the overall fault detection strategy, the cluster membership monitor heartbeat mechanism is the first line of defense. The absence of the heartbeat will immediately detect hardware crashes and operating system panics. It might also detect some gross operating system problems, for example, leaking away all communication buffers. The heartbeat mechanism is also Sun Cluster's fastest fault detection method. Because the cluster membership monitor runs at real-time priority and because it is pinned in memory, a relatively short timeout for the absence of heartbeats is justified. Conversely, for the other fault detection methods, Sun Cluster must avoid labelling a server as being down when it is merely very slow. For those methods, relatively long timeouts of several minutes are used, and, in some cases, two or more such timeouts are required before Sun Cluster will perform a takeover.

The fact that the cluster membership monitor runs at real-time priority and is pinned in memory leads to the paradox that the membership monitor might be alive even though its server is performing no useful work at the data service level. This motivates the data service-specific fault monitoring, as described in "C.4 Data Service-Specific Fault Probes".

C.1.2 Sanity Checking of Probing Node

The network fault probing and data service-specific fault probing require each node to probe another node for a response. Before doing a takeover, the probing node performs a number of basic sanity checks of itself. These checks attempt to ensure that the problem does not really lie with the probing node. They also try to ensure that taking over from the server that seems to be having a problem really will improve the situation. Without the sanity checks, the problem of false takeovers would likely arise. That is, a sick node would wrongly blame another node for lack of response and would take over from the healthier server.

The probing node performs the following sanity checks on itself before doing a takeover from another node:

C.2 Public Network Monitoring (PNM)

The PNM component has two primary functions:

PNM is implemented as a daemon (pnmd) which periodically gathers network statistics on the set of public network interfaces in a node. If the results indicate any abnormalities, pnmd attempts to distinguish between the following three cases:

PNM then does a directed ping to a peer daemon on the same subnet. If there is no reply, PNM does a broadcast ping on the same subnet. PNM then places the results of its findings in the CCD and compares the local results with the results of the other nodes (which are also placed in the CCD). This comparison is used to determine whether the network is down or whether the network interface is faulty. If PNM detects that the network interface is faulty and backup adapters are configured, it performs the network adapter failover.

The results of PNM monitoring are used by various entities. The network adapter failover component of PNM uses the monitoring results to decide whether an adapter failover would be useful. For example, if the network is experiencing a failure, no adapter failover is performed. Fault monitors associated with SC HA data services and the API call hactl use the PNM facility to diagnose the cause of data service failures. The information returned by PNM is used to decide whether to migrate the data service, and to determine the location of the data service after migration.

The syslog messages written by the PNM facility on detection of adapter failures are read by the SC Manager, which translates the messages into graphic icons and displays them through the graphical user interface.

You also can run the PNM utilities on the command line to determine the status of network components. For more information, see the man pages pnmset(1M), pnmstat(1M), pnmptor(1M), pnmrtop(1M), and pnmd(1M).

C.3 Sun Cluster Fault Probes

PNM monitors the health of the public network and will switch to backup connections when necessary. However, in the event of the total loss of public network access, PNM will not provide data service or logical host failover. In such a case, PNM will report the loss but it is up to an external fault probe to handle switching between backup nodes.

If you are using SSVM as your volume manager, the Sun Cluster framework is responsible for monitoring each Network Adapter Failover (NAFO) backup group defined per logical host, and initiating a switchover to a backup node when either of the following conditions is met:

If neither of these conditions is met, Sun Cluster will not attempt a switchover.

If your volume manager is Solstice DiskSuite, loss of public network causes the disconnected node to abort and causes the logical hosts mastered by that node to migrate to the backup node.

The Sun Cluster framework monitors the public networks only while the configuration includes a logical host and while a data service is in the "on" state and registered on that logical host. Only those NAFO backup groups that are in use by a logical host are monitored.

C.4 Data Service-Specific Fault Probes

The motivation for performing data service-specific fault probing is that although the server node and operating system are running, the software or hardware might be in such a confused state that no useful work at the data service level is occurring. In the overall architecture, the total failure of the node or operating system is detected by the cluster membership monitor's heartbeat mechanism. However, a node might be working well enough for the heartbeat mechanism to continue to execute even though the data service is not doing useful work.

Conversely, the data service-specific fault probes do not need to detect the state where one node has crashed or has stopped sending cluster heartbeat messages. The assumption is made that the cluster membership monitor detects such states, and the data service fault probes themselves contain no logic for handling these states.

A data service fault probe behaves like a client of the data service. A fault probe running on a machine monitors both the data service exported by that machine and, more importantly, the data service exported by another server. A sick server is not reliable enough to detect its own sickness, so each server is monitoring another node in addition to itself.

In addition to behaving like a client, a data service-specific fault probe will also, in some cases, use statistics from the data service as an indication that useful work is or is not occurring. A probe might also check for the existence of certain processes that are crucial to a particular data service.

Typically, the fault probes react to the absence of service by forcing one server to take over from another. In some cases, the fault probes will first attempt to restart the data service on the original machine before doing the takeover. If many restarts occur within a short time, the indication is that the machine has serious problems. In this case, a takeover by another server is executed immediately, without attempting another local restart.

C.4.1 Sun Cluster HA for NFS Fault Probes

The probing server runs two types of periodic probes against another server's NFS service.

  1. The probing server sends a NULL RPC to all daemon processes on the target node that are required to provide NFS service; these daemons are rpcbind, mountd, nfsd, lockd, and statd.

  2. The probing server does an end-to-end test: it tries to mount an NFS file system from the other node, and then to read and write a file in that file system. It does this end-to-end test for every file system that the other node is currently sharing. Because the mount is expensive, it is executed less often than the other probes.

If any of these probes fail, the probing node will consider doing a takeover from the serving node. However, certain conditions might inhibit the takeover from occurring immediately:

After passing these Sun Cluster HA for NFS-specific tests, the process of considering whether or not to do a takeover continues with calls to hactl(1M) (see "C.1.2 Sanity Checking of Probing Node").

The probing server also checks its own NFS service. The logic is similar to the probes of the other server, but instead of doing takeovers, error messages are logged to syslog and an attempt is made to restart any daemons whose process no longer exists. In other words, the restart of a daemon process is performed only when the daemon process has exited or crashed. The restart of a daemon process is not attempted if the daemon process still exists but is not responding, because that would require killing the daemon without knowing which data structures it is updating. The restart is also not done if a local restart has been attempted too recently (within the last hour). Instead, the other server is told to consider doing a takeover (provided the other server passes its own sanity checks). Finally, the rpcbind daemon is never restarted, because there is no way to inform processes that had registered with rpcbind that they need to re-register.

C.4.2 HA-DBMS Fault Probes

The fault probes for Sun Cluster HA for Oracle, Sun Cluster HA for Sybase and Sun Cluster HA for Informix perform similarly to monitor the database server. The HA-DBMS fault probes are configured by running one of the utilities, haoracle(1M), hasybase(1M), or hainformix(1M). (See the online man pages for a detailed description of the options for these utilities.)

Once the utilities are configured and activated, two processes are started on the local node and two on the remote node, simulating client access. The remote fault probe is initiated by the ha_dbms_serv daemon and is started when hareg -y dataservicename is run.

The HA-DBMS module uses two methods to monitor whether the DBMS service is available. First, HA-DBMS extracts statistics from the DBMS itself:

If the extracted statistics indicate that work is being performed for clients, then no other probing of the DBMS is required. Second, if the DBMS statistics show that no work is occurring, then HA-DBMS submits a small test transaction to the DBMS. If all clients happen to be idle, the DBMS statistics would show no work occurring; that is, the test transaction distinguishes the situation of the database being hung from the legitimately idle situation. Because the test transaction is executed only when the statistics show no activity, it imposes no overhead on an active database. The test transaction consists of:

HA-DBMS carefully filters the error codes returned by the DBMS, using a table that describes which codes should or should not cause a takeover. For example, in the case of Sun Cluster HA for Oracle, the scenario of table space full does not cause a takeover, because an administrator must intervene to fix this condition. (If a takeover were to occur, the new master server would encounter the same table space full condition.)

On the other hand, an error return code such as could not allocate Unix semaphore causes Sun Cluster HA for Oracle to attempt to restart ORACLE locally on this server machine. If a local restart has occurred too recently, then the other machine takes over instead (after first passing its own sanity checks).

C.4.3 Sun Cluster HA for Netscape Fault Probes

The fault monitors for all of the Sun Cluster HA for Netscape data services share a common methodology for fault monitoring of the data service instance. All use the concept of remote and local fault monitoring.

The fault monitor process running on the node which currently masters the logical host that the data service is running on is called the local fault monitor. The fault monitor process running on a node which is a possible master of the logical host is called a remote fault monitor.

Sun Cluster HA for Netscape fault monitors periodically perform a simple data service operation with the server. If the operation fails or times out, that particular probe is declared to have failed.

When a probe fails, the local fault probe attempts to restart the data service locally. This is usually sufficient to restore the data service. The remote probe keeps a record of the probe failure but does not take any action. Upon two successive failures of the probe (indicating that a restart of the data service did not correct the problem), the remote probe invokes the hactl(1M) command in "takeover" mode to initiate a failover of the logical host. Some Netscape data services use a sliding window algorithm of probe successes and failures, in which a pre-configured number of failures within the window causes the probe to take action.

You can use the hadsconfig(1M) command to tune probe interval and timeout values for Sun Cluster HA for Netscape fault monitors. Reducing the probe interval value for fault probing results in faster detection of problems, but it also might result in spurious failovers due to transient problems. Similarly, reducing the probe timeout value results in faster detection of problems related to the data service instances, but also might result in spurious takeovers if the data service is merely busy due to heavy load. For most situations, the default values for these parameters are sufficient. The parameters are described in the hadsconfig(1M) man page and in the configuration sections of each data service chapter in the Sun Cluster 2.2 Software Installation Guide.

C.4.3.1 Sun Cluster HA for DNS Fault Probes

The Sun Cluster HA for DNS fault probe performs an nslookup operation to check the health of the Sun Cluster HA for DNS server. It looks up the domain name of the Sun Cluster HA for DNS logical host from the Sun Cluster HA for DNS server. Depending upon the configuration of your /etc/resolv.conf file, nslookup might contact other servers if the primary Sun Cluster HA for DNS server is down. Thus, the nslookup operation might succeed, even when the primary Sun Cluster HA for DNS server is down. To guard against this, the fault probe verifies whether replies come from the primary Sun Cluster HA for DNS server or other servers.

C.4.3.2 Sun Cluster HA for Netscape HTTP Fault Probes

The Sun Cluster HA for Netscape HTTP fault probe checks the health of the http server by trying to connect to it on the logical host address on the configured port. Note that the fault monitor uses the port number specified to hadsconfig(1M) during configuration of the nshttp service instance.

C.4.3.3 Sun Cluster HA for Netscape News Fault Probes

The Sun Cluster HA for Netscape News fault probe checks the health of the news server by connecting to it on the logical host IP addresses and the nntp port number. It then attempts to execute the NNTP date command on the news server, and expects a response from the server within the specified probe timeout period.

C.4.3.4 Sun Cluster HA for Netscape Mail or Message Server Fault Probes

The Sun Cluster HA for Netscape Mail or Message Server fault probe checks the health of the mail or message server by probing it on all three service ports served by the server, namely the SMTP, IMAP, and POP3 ports:

For all of these tests, the fault probe expects a response string from the server within the probe timeout interval. Note that a probe failure on any of the above three service ports is considered a failure of the server. To avoid spurious failovers, the nsmail fault probe uses a sliding window algorithm for tracking probe failures and successes. If the number of probe failures in the sliding window is greater than a pre-configured number, a takeover is initiated by the remote probe.

C.4.3.5 Sun Cluster HA for Netscape LDAP Fault Probes

The Sun Cluster HA for Netscape LDAP local probe can perform a variable number of local restarts before initiating a failover. The local restart mechanism uses a sliding window algorithm; only when the number of retries is exhausted within that window does a failover occur.

The Sun Cluster HA for Netscape LDAP remote probe uses a simple telnet connection to the LDAP port to check the status of the server. The LDAP port number is the one specified during initial set-up with hadsconfig(1M).

The local probe:

C.4.4 Sun Cluster HA for Lotus Fault Probes

The Sun Cluster HA for Lotus fault probe has two parts--a local probe that runs on the node on which the Lotus Domino server processes are currently running, and a remote probe that runs on all other nodes that are possible masters of the Lotus Domino server's logical host.

Both probes use a simple telnet connection to the Lotus Domino port to check the status of the Domino server. If a probe fails to connect, it initiates a failover or takeover by invoking the hactl(1M) command.

The local fault probe can perform three local restarts before initiating a failover. The local restart mechanism uses a sliding time window algorithm; only when the number of retries is exhausted within that window does a failover occur.

C.4.5 Sun Cluster HA for Tivoli Fault Probes

Sun Cluster HA for Tivoli uses only a local fault probe. It runs on the node on which the Tivoli object dispatcher, the oserv daemon, is currently running.

The fault probe uses the Tivoli command wping to check the status of the monitored oserv daemon. The wping of an oserv daemon can fail for the following reasons:

If the local probe fails to ping the oserv daemon, it initiates a failover by invoking the hactl(1M) command. The fault probe will perform one local restart before initiating a failover.

C.4.6 Sun Cluster HA for SAP Fault Probes

The Sun Cluster HA for SAP fault probe monitors the availability of the Central Instance, specifically the message server, the enqueue server, and the dispatcher. The probe monitors only the local node by checking for the existence of the critical SAP processes. It also uses the SAP utility lgtst to verify that the SAP message server is reachable.

Upon detecting a problem, such as when a process dies prematurely or lgtst reports an error, the fault probe will first try to restart SAP on the local node for a configurable number of times (configurable through hadsconfig(1M)). If the number of restarts that the user has configured has been exhausted, then the fault probe initiates a switchover by calling hactl(1M), if this instance has been configured to allow failover (also configurable through hadsconfig(1M)). The Central Instance is shut down before the switchover occurs, and then is restarted on the remote node after the switchover is complete.

Appendix D Using Sun Cluster SNMP Management Solutions

This appendix describes how to use SNMP to monitor the behavior of a Sun Cluster configuration.

This appendix includes the following procedures:

You can use the following SNMP Management solutions to monitor Sun Cluster configurations:

D.1 Cluster SNMP Agent and Cluster Management Information Base

Sun Cluster includes a Simple Network Management Protocol (SNMP) agent, along with a Management Information Base (MIB), for the cluster. The name of the agent file is snmpd (SNMP daemon) and the name of the MIB is sun.mib.

The cluster SNMP agent is a proxy agent that is capable of monitoring several clusters (a maximum of 32) at the same time. You can manage a typical Sun Cluster from the administration workstation or System Service Processor (SSP). By installing the cluster SNMP agent on the administrative workstation or SSP, network traffic is regulated and the CPU power of the nodes is not wasted in transmitting SNMP packets.

The snmpd daemon:

The Super Monitor daemon, smond, collects hardware configuration information and critical cluster events by connecting to the in.mond daemon from each of the member nodes of the cluster(s). The smond daemon then reports the same information to the SNMP daemon (snmpd).


Note -

You need to configure only one smond daemon to collect cluster information for several clusters.


The SUNWcsnmp package contains the following:

For additional information on the snmpd and smond daemons, see Appendix B, Sun Cluster Man Page Quick Reference.

D.2 Cluster Management Information Base

The Management Information Base (MIB) is a collection of objects that can be accessed through a network management protocol. The objects are defined in a generic and consistent manner so that various management platforms can read and parse the definitions.

Run the snmpd daemon on the management server, which is the cluster administration workstation, or on any client. This agent provides information (gathered from smond) for all the SNMP attributes defined in the cluster MIB. This MIB file is typically compiled into an "SNMP-aware" network manager like the SunNet Manager Console. See "D.5 Changing the snmpd.conf File".

The sun.mib file provides information about clusters in the following tables:


Note -

In the preceding bullets, time refers to the local time on the SNMP server (on which the table is maintained). Thus, the time indicates when any attribute change is reported on the server.


D.2.1 The clustersTable Attributes

The clusters table consists of entries for all of the monitored clusters. Each entry in the table contains specific attributes that provide cluster information. See Table D-1 for the clustersTable attributes.

Table D-1 clustersTable Attributes

Attribute Names     Description
clusterName         The name of the cluster
clusterDescr        A description of the cluster
clusterVersion      The release version of the cluster
numNodes            The number of nodes in the cluster
nodeNames           The names of all the nodes in the cluster, separated by commas
quorumDevices       The names of all the quorum devices in the cluster, separated by
                    commas
clusterLastUpdate   The last time any of the attributes of this entry were modified

D.2.2 The clusterNodesTable Attributes

The cluster nodes table consists of the known nodes of all of the monitored clusters. Each entry contains specific information about the node. See Table D-2 for the clusterNodesTable attributes.


Note -

The belongsToCluster attribute acts as the key reference between this table and the clustersTable.


Table D-2 clusterNodesTable Attributes

Attribute Names    Description
nodeName           The host name of the node.
belongsToCluster   The name of the cluster (to which this node belongs).
scState            State of the Sun Cluster software component on this node (Stopped,
                   Aborted, In Transition, Included, Excluded, or Unknown). An
                   enterprise specific trap signals a change in state.
vmState            State of the volume manager software component on this node. An
                   enterprise specific trap signals a change in state.
dbState            State of the database software component on this node (Down, Up, or
                   Unknown). An enterprise specific trap signals a change in state.
vmType             The type of volume manager currently being used on this node.
vmOnNode           Mode of the SSVM software component on this node (Master, Slave, or
                   Unknown). An enterprise specific trap signals a change in state.
                   This attribute is not valid for clusters with other volume managers.
nodeLastUpdate     The last time any of the attributes of this entry were modified.

D.2.3 The switchesTable Attributes

The switches table consists of entries for all of the switches. Each entry in the table contains specific information about a switch in a cluster. See Table D-3 for the switchesTable attributes.

Table D-3 switchesTable Attributes

Attribute Names    Description
switchName         The name of the switch
numPorts           The number of ports on the switch
connectedNodes     The names of all the nodes that are presently connected to the
                   ports of the switch
switchLastUpdate   The last time any of the switch attributes of this entry were
                   modified

D.2.4 The portsTable Attributes

The ports table consists of entries for all of the switch ports. Each entry in the table contains specific information about a port within a switch. See Table D-4 for the portsTable attributes.


Note -

The belongsToSwitch attribute acts as the key reference between this table and the switchesTable.


Table D-4 portsTable Attributes

Attribute Names   Description
portId            The port ID or number
belongsToSwitch   The name of the switch (to which this port belongs)
connectedNode     The name of the node (to which this port is presently connected)
nodeAdapterId     The adapter ID (of the SCI card) on the node to which this port is
                  connected
portStatus        The status of the port (Active, Inactive, and so forth)
portLastUpdate    The last time any of the port attributes of this entry were modified

D.2.5 The lhostTable Attributes

The logical hosts table consists of entries for each logical host configured in the cluster. See Table D-5 for the lhostTable attributes.

Table D-5 lhostTable Attributes

Attribute Names   Description
lhostName         The name of the logical host
lhostMasters      The list of node names that constitute the logical host
lhostCurrMaster   The name of the node that is currently the master for the logical
                  host
lhostDS           The list of data services configured to run on this logical host
lhostDG           The disk groups configured on this logical host
lhostLogicalIP    The logical IP address associated with this logical host
lhostStatus       The current status of the logical host (UP or DOWN)
lhostLastUpdate   The last time any of the attributes of this entry were modified

D.2.6 The dsTable Attributes

The data services table consists of entries for all data services that are configured for all logical hosts in the monitored clusters. Each entry in the table consists of specific information about a data service configured on a logical host. See Table D-6 for the dsTable attributes.


Note -

The dsOnLhost attribute acts as a key reference between this table and the lhostTable.


Table D-6 dsTable Attributes

Attribute Names   Description
dsName            The name of the data service.
dsOnLhost         The name of the logical host on which the data service is configured.
dsReg             The value is 1 or 0, depending on whether the data service is
                  registered and configured to run (1) or not run (0).
dsStatus          The current status of the data service (ON/OFF/INST DOWN).
dsDep             The list of other data services on which this data service depends.
dsPkg             The package name for the data service.
dsLastUpdate      The last time any of the attributes of this entry were modified.

D.2.7 The dsinstTable Attributes

The data service instance table consists of entries for all data service instances. See Table D-7 for the dsinstTable attributes.


Note -

The dsinstOfDS attribute can be used as a key reference between this table and the dsTable. Similarly, the dsinstOnLhost attribute can be used as a key reference between this table and the lhostTable.


Table D-7 dsinstTable Attributes

Attribute Names    Description
dsinstName         The name of the data service instance
dsinstOfDS         The name of the data service of which this is a data service
                   instance
dsinstOnLhost      The name of the logical host on which this data service instance
                   is running
dsinstStatus       The status of the data service instance
dsinstLastUpdate   The last time any of the attributes of this entry were modified

D.3 Cluster SNMP Daemon and Super Monitor Daemon Operation

The SNMP daemon operates under the following provisions:

D.4 SNMP Traps

SNMP traps are asynchronous notifications, generated by the SNMP agent, that indicate a change in the state of monitored objects.

The software generates Sun Cluster-specific traps for critical cluster events. These events are listed in the following tables.

Table D-8 lists the Sun Cluster traps reflecting the state of the cluster software on a node.

Table D-8 Sun Cluster Traps Reflecting the Software on a Node

Trap Number 

Trap Name 

sc:stopped

sc:aborted

sc:in_transition

sc:included

sc:excluded

sc:unknown

Table D-9 lists the Sun Cluster traps reflecting the state of the volume manager on a node.

Table D-9 Sun Cluster Traps Reflecting the Volume Manager on a Node

Trap Number   Trap Name
10            vm:down
11            vm:up
12            vm:unknown

Table D-10 lists the Sun Cluster traps reflecting the state of the database on a node.

Table D-10 Sun Cluster Traps Reflecting the Database on a Node

Trap Number   Trap Name
20            db:down
21            db:up
22            db:unknown

Table D-11 lists the Sun Cluster traps reflecting the nature of the Cluster Volume Manager (master or slave) on a node.

Table D-11 Sun Cluster Traps Reflecting Cluster Volume Manager on a Node

Trap Number   Trap Name
30            vm_on_node:master
31            vm_on_node:slave
32            vm_on_node:unknown

Table D-12 lists the Sun Cluster traps reflecting the states of a logical host.

Table D-12 Sun Cluster Traps Reflecting the States of a Logical Host

Trap Number   Trap Name
40            lhost:givingup
41            lhost:given
42            lhost:takingover
43            lhost:taken
46            lhost:unknown

Table D-13 lists the Sun Cluster traps reflecting the states of a data service instance.

Table D-13 Sun Cluster Traps Reflecting the States of a Data Service Instance

Trap Number   Trap Name
50            ds:started
51            ds:stopped
52            ds:in-transition
53            ds:failed-locally
54            ds:failed-remotely
57            ds:unknown

Table D-14 lists the Sun Cluster traps reflecting the states of the HA-NFS data service.

Table D-14 Sun Cluster Traps Reflecting the States of the HA-NFS Data Service Instance

Trap Number   Trap Name
60            hanfs:start
61            hanfs:stop
70            hanfs:unknown

Table D-15 lists the Sun Cluster traps reflecting SNMP errors.

Table D-15 Sun Cluster Traps Reflecting SNMP Errors

Trap Number   Trap Name
100           SOCKET_ERROR:node_out_of_system_resources
101           CONNECT_ERROR:node_out_of_system_resources
102           BADMOND_ERROR:node_running_bad/old_mond_version
103           NOMOND_ERROR:mond_not_installed_on_node
104           NOMONDYET_ERROR:mond_on_node_not_responding:node_may_be_rebooting
105           TIMEOUT_ERROR:timed_out_upon_trying_to_connect_to_nodes_mond
106           UNREACHABLE_ERROR:node's_mond_unreachable:network_problems??
107           READFAILED_ERROR:node_out_of_system_resources
108           NORESPONSE_ERROR:node_out_of_system_resources
109           BADRESPONSE_ERROR:unexpected_welcome_message_from_node's_mond
110           SHUTDOWN_ERROR:node's_mond_shutdown
200           Fatal:super_monitor_daemon(smond)_exited!

For trap numbers 100-110, check the faulty node and fix the problem. For trap number 200, see "D.8 SNMP Troubleshooting".

D.5 Changing the snmpd.conf File

The snmpd.conf file contains configuration information for the snmpd daemon. Each entry in the file consists of a keyword followed by a parameter string. In most configurations, the default values in the file are sufficient.

D.5.1 How to Change the snmpd.conf File

  1. Edit the snmpd.conf file.

    For descriptions of the keywords, refer to the snmpd(7) man page.

  2. After making any changes to the snmpd.conf file, stop and then restart the smond and snmpd daemons by entering:

    # /opt/SUNWcluster/bin/smond_ctl stop
    # /opt/SUNWcluster/bin/init.snmpd stop
    # /opt/SUNWcluster/bin/init.snmpd start
    # /opt/SUNWcluster/bin/smond_ctl start
    

    An example snmpd.conf file follows.

    sysdescr        Sun SNMP Agent, SPARCstation 10, Company Property Number 123456
    syscontact      Coby Phelps
    sysLocation     Room 123
    #
    system-group-read-community     public
    system-group-write-community    private
    #
    read-community  all_public
    write-community all_private
    #
    trap            localhost
    trap-community  SNMP-trap
    #
    #kernel-file    /vmunix
    #
    managers        lvs golden

D.6 Configuring the Cluster SNMP Agent Port

By default, the cluster SNMP agent listens on User Datagram Protocol (UDP) Port 161 for requests from the SNMP manager, for example, SunNet Manager Console. You can change this port by using the -p option to the snmpd and smond daemons.

Both the snmpd and smond daemons must be configured on the same port in order to function properly.
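
One rough way to confirm which UDP port is in use on the workstation is to look for the port number in the netstat output. The port number shown here is the default; this simple grep can also match unrelated lines, so treat the result only as a quick check.

# netstat -an | grep 161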


Caution -

If you are installing the cluster SNMP agent on an SSP or an Administrative workstation running the Solaris 2.6 Operating Environment or compatible versions, always configure the snmpd and the smond programs on a port other than the default UDP port 161.


For example, on the SSP, the cluster SNMP agent interferes with the SSP SNMP agent, which also uses UDP port 161. This interference could result in the loss of RAS features of the Sun Enterprise 10000 server.

D.6.1 How to Configure the Cluster SNMP Agent Port

To configure the cluster SNMP agent on a port other than the default UDP Port 161, perform the following steps.

  1. Edit the /opt/SUNWcluster/bin/init.snmpd file and change the value of the CSNMP_PORT variable from 161 to the desired value.

  2. Edit the /opt/SUNWcluster/bin/smond_ctl file and change the value of the CSNMP_PORT variable from 161 to the same value you chose in Step 1.
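
    For example, if you choose port 5080, the edited line in each script would look similar to the following. The port value is only an illustration, and the exact form of the assignment depends on the scripts shipped with your release.

    CSNMP_PORT=5080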

  3. Stop and then restart both the snmpd and smond daemons for the changes to take effect.

    # /opt/SUNWcluster/bin/smond_ctl stop
    # /opt/SUNWcluster/bin/init.snmpd stop
    # /opt/SUNWcluster/bin/smond_ctl start
    # /opt/SUNWcluster/bin/init.snmpd start
    

    Note -

    Configuration files specific to the SNMP manager might need to be edited for the SNMP manager to become aware of the new port number. Refer to your SNMP manager documentation for more information. Alternatively, you can configure the master SNMP agent on the Administrative workstation to start the cluster SNMP proxy agent as a subagent on a port other than 161. See the Solstice Enterprise Agents User's Guide or the snmpdx(1M) man page for information on how to configure the master SNMP agent.


D.7 Using the SNMP Agent With SunNet Manager

The cluster SNMP agent has been qualified with the SunNet Manager. Perform the following procedures prior to using SunNet Manager to monitor clusters.


Note -

These procedures assume you are using UDP port 161 for SNMP. If you changed the port number as described in "D.6 Configuring the Cluster SNMP Agent Port", you need to run the SunNet Manager SNMP proxy agent, na.snmp, to use the alternate port.


D.7.1 How to Use the SNMP Agent With SunNet Manager to Monitor Clusters

  1. Copy the cluster MIB /opt/SUNWcluster/etc/sun.mib to /opt/SUNWconn/snm/agents/cluster.mib on the SunNet Manager console.

  2. On the SunNet Manager console, run mib2schema on the copied cluster.mib file:

    # /opt/SUNWconn/snm/bin/mib2schema cluster.mib
    
  3. On the Sun Cluster Administrative workstation, edit the snmpd.conf file and set the parameter string in the trap keyword to the name of the SunNet Manager console.

    For more information on editing the snmpd.conf file, refer to "D.5 Changing the snmpd.conf File".

  4. Run the smond_conf command on the Sun Cluster Administrative workstation for each cluster you want to monitor. The command has the following form:

    # /opt/SUNWcluster/bin/smond_conf -h [clustername ...]
  5. Set the proxy for cluster-snmp to be the name of the SunNet Manager console.


    Note -

    In order to monitor clusters, you must also monitor the Administrative workstation using SunNet Manager.


D.7.2 How to Reconfigure smond to Monitor a Different Cluster

You can reconfigure the smond daemon to monitor a different cluster.

  1. Stop the snmpd daemon by using:

    # /opt/SUNWcluster/bin/init.snmpd stop
    
  2. Reconfigure the smond daemon by using:

    # /opt/SUNWcluster/bin/smond_conf -h [clustername ...]
  3. Start the snmpd daemon by using:

    # /opt/SUNWcluster/bin/init.snmpd start
    
  4. Start the smond daemon by using:

    # /opt/SUNWcluster/bin/smond_ctl start
    

D.8 SNMP Troubleshooting

If the cluster MIB tables are not being populated in your management application, or if you receive trap number 200, verify that the snmpd and smond daemons are running by entering:

# ps -ef | grep snmpd
# ps -ef | grep smond
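
When both daemons are running, each command prints one line per matching process. The process IDs, time stamps, and installation paths below are only an illustration and vary by site.

# ps -ef | grep snmpd
    root   712     1  0 09:30:15 ?        0:02 /opt/SUNWcluster/bin/snmpd
    root  1045   998  0 10:12:44 pts/1    0:00 grep snmpd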

If a daemon is not running, its process does not appear in the output (although the grep command itself might still be listed).

If the daemons are not running, enter:

# /opt/SUNWcluster/bin/init.snmpd start
# /opt/SUNWcluster/bin/smond_ctl start