This appendix provides instructions for administering Solstice DiskSuite disksets and metadevices, and for administering Sun StorEdge Volume Manager and Cluster Volume Manager objects. The procedures in this appendix depend on your volume management software.
This appendix includes the following procedures:
"A.1.5 How to Remove a Disk From a Diskset (Solstice DiskSuite)"
"A.2.3 How to Initialize and Configure a Disk (SSVM and CVM)"
"A.2.6 How to Move a Disk to a Different Disk Group (SSVM and CVM)"
This section describes using DiskSuite to administer:
Disksets
Disks in disksets
Multi-host metadevices
Local metadevices
Refer to the Solstice DiskSuite documentation for a complete discussion of administering DiskSuite objects.
Metadevices and disksets are created and administered using either Solstice DiskSuite command-line utilities or the DiskSuite Tool (metatool(1M)) graphical user interface.
Read the information in this chapter before using the Solstice DiskSuite documentation to administer disksets and metadevices in a Sun Cluster configuration.
Disksets are groups of disks. The primary administration task that you perform on disksets involves adding and removing disks.
Before using a disk that you have placed in a diskset, you must set up a metadevice using the disk's slices. A metadevice can be a concatenation, stripe, mirror, or UFS logging device (also called a trans device). You can also create hot spare pools that contain slices to serve as replacements when a metadevice is errored.
Metadevice names begin with d and are followed by a number. By default in a Sun Cluster configuration, there are 128 unique metadevices in the range 0 to 127. Each UFS logging device that you create will use at least seven metadevice names. Therefore, in a large Sun Cluster configuration, you might need more than the 128 default metadevice names. For instructions on changing the default quantity, refer to the Solstice DiskSuite documentation. Hot spare pool names begin with hsp and are followed by a number. You can have up to 1,000 hot spare pools ranging from hsp000 to hsp999.
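As a quick sanity check of the default namespace, the figures above imply a hard limit on logging devices. The following is illustrative shell arithmetic only, not a Sun Cluster command:

```shell
# With 128 default metadevice names (d0 through d127) and at least
# seven names consumed per UFS logging device, the default namespace
# supports at most 18 logging devices.
DEFAULT_NAMES=128
NAMES_PER_TRANS=7
echo $((DEFAULT_NAMES / NAMES_PER_TRANS))   # prints 18
```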
This section provides overview information on disksets and their relationship to logical hosts, and procedures on how to add and remove disks from the diskset associated with the logical host.
Sun Cluster logical hosts are mastered by physical hosts. Only the physical host that currently masters a logical host can access the logical host's diskset. When a physical host masters a logical host's diskset, it is said to have ownership of the diskset. In general, Sun Cluster takes care of diskset ownership. However, if the logical host is in maintenance state, as reported by the hastat(1M) command, you can use the DiskSuite metaset -t command to manually take diskset ownership. Before returning the logical host to service, release diskset ownership with the metaset -r command.
If the logical hosts are up and running, you should never perform diskset administration using either the -t (take ownership) or -r (release ownership) options of the metaset(1M) command. These options are used internally by the Sun Cluster software and must be coordinated between the cluster nodes.
If the disk being added to a diskset will be used as a submirror, you must have two disks available on two different multihost disk expansion units to allow for mirroring. However, if the disk will be used as a hot spare, you can add a single disk.
Ensure that no data is on the disk.
This is important because the partition table will be rewritten and space for a metadevice state database replica will be allocated on the disk.
Insert the disk device into the multihost disk expansion unit.
Use the instructions in the hardware documentation for your disk expansion unit for information on disk addition and removal procedures.
Add the disk to a diskset.
The syntax for the command is shown below. In this example, diskset is the name of the diskset to which the disk is to be added, and drive is the DID name of the disk in the form dN (for new installations of Sun Cluster), or cNtYdZ (for installations that upgraded from HA 1.3).
# metaset -s diskset -a drive
After adding the disks to the diskset by using the metaset(1M) command, use the scadmin(1M) command to reserve and enable failfast on the specified disks.
phys-hahost1# scadmin reserve drivename
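Putting the two steps together, a complete session might look like the following sketch; the diskset name hahost1 and the DID drive name d5 are illustrative, not values from your configuration:

```shell
# Hedged example of adding one disk to a diskset on the mastering node.
metaset -s hahost1 -a d5     # add DID drive d5 to diskset hahost1
scadmin reserve d5           # reserve and enable failfast on the disk
```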
You can remove a disk from a diskset at any time, as long as none of the slices on the disk are currently in use in metadevices or hot spare pools.
Use the metastat(1M) command to ensure that none of the slices are in use as metadevices or as hot spares.
Use the metaset(1M) command to remove the target disk from the diskset.
The syntax for the command is shown below. In this example, diskset is the name of the diskset containing the (failed) disk to be removed, and drive is the DID name of the disk in the form dN (for new installations of Sun Cluster), or cNtYdZ (for installations that upgraded from HA 1.3).
# metaset -s diskset -d drive
This operation can take fifteen minutes or more, depending on the size of your configuration and the number of disks.
The following sections contain information about the differences between administering metadevices in the multihost Sun Cluster environment and in a single-host environment.
Unless noted in the following sections, you can use the instructions in the Solstice DiskSuite documentation.
The instructions in the Solstice DiskSuite books are relevant only for single-host configurations.
The following sections describe the Solstice DiskSuite command-line programs to use when performing a task. Optionally, you can use the metatool(1M) graphical user interface for all the tasks unless directed otherwise. Use the -s option when running metatool(1M), because it allows you to specify the diskset name.
For ongoing management of metadevices, you must constantly monitor the metadevices for errors in operation, as discussed in "2.1 Monitoring Utilities".
When hastat(1M) reports a problem with a diskset, use the metastat(1M) command to locate the errored metadevice.
You must use the -s option when running either metastat(1M) or metatool(1M), so that you can specify the diskset name.
You should save the metadevice configuration information when you make changes to the configuration. Use metastat -p to create output similar to what is in the md.tab file and then save the output. Refer to "1.1 Saving Disk Partition Information (Solstice DiskSuite)", for details on saving partitioning data.
Mirrored metadevices can be used as part of a logging UFS file system for Sun Cluster highly available applications.
Idle slices on disks within a diskset can be configured into metadevices by using the metainit(1M) command.
Sun Cluster highly available database applications can use raw mirrored metadevices for database storage. While these are not mentioned in the dfstab.logicalhost file or in the vfstab file for each logical host, they appear in the related Sun Cluster database configuration files. The mirror must be removed from these files, and the Sun Cluster database system must stop using the mirror. Then the mirror can be deleted by using the metaclear(1M) command.
If you are using SPARCstorage Arrays, note that before replacing or adding a disk drive in a SPARCstorage Array tray, all metadevices on that tray must be taken offline.
In symmetric configurations, taking the submirrors offline for maintenance is complex because disks from each of the two disksets might be in the same tray in the SPARCstorage Array. You must take the metadevices from each diskset offline before removing the tray.
Use the metaoffline(1M) command to take offline all submirrors on every disk in the tray.
After a disk is added to a diskset, create new metadevices using metainit(1M) or metatool(1M). If the new devices will be hot spares, use the metahs(1M) command to place the hot spares in a hot spare pool.
When replacing an errored metadevice component, use the metareplace(1M) command.
A replacement slice (or disk) must be available. This could be an existing device that is not in use, or a new device that you have added to the diskset.
You also can return to service drives that have sustained transient errors (for example, as a result of a chassis power failure) by using the metareplace -e command.
Before deleting a metadevice, verify that none of the components in the metadevice is in use by Sun Cluster HA for NFS. Then use the metaclear(1M) command to delete the metadevice.
To grow a metadevice, you must have at least two slices (disks) in different multihost disk expansion units available. Each of the two new slices should be added to a different submirror with the metainit(1M) command. You then use the growfs(1M) command to grow the file system.
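The sequence above might look like the following sketch. The diskset name (hahost1), metadevice numbers, slice names, and mount point are assumptions for the example; substitute your own. (Note that attaching a slice to an existing submirror concatenation is done with metattach(1M).)

```shell
# Hedged sketch of growing mirror d1 (submirrors d21 and d22) in
# diskset hahost1; all names here are illustrative.
metattach -s hahost1 d21 c1t2d0s0   # grow the first submirror
metattach -s hahost1 d22 c2t2d0s0   # grow the second submirror
growfs -M /hahost1/1 /dev/md/hahost1/rdsk/d1   # grow the mounted file system
```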
When the growfs(1M) command is running, clients might experience interruptions of service.
If a takeover occurs while the file system is growing, the file system will not be grown. You must reissue the growfs(1M) command after the takeover completes.
The file system that contains /logicalhost/statmon cannot be grown. Because the statd(1M) program modifies this directory, it would be blocked for extended periods while the file system is growing. This would have unpredictable effects on the network file locking protocol. This is a problem only for configurations using Sun Cluster HA for NFS.
You can add or delete hot spare devices to or from hot spare pools at any time, as long as they are not in use. In addition, you can create new hot spare pools and associate them with submirrors using the metahs(1M) command.
All UFS logs on multihost disks are mirrored. When a submirror fails, it is reported as an errored component. Repair the failure using either metareplace(1M) or metatool(1M).
If the entire mirror that contains the UFS log fails, you must unmount the file system, back up any accessible data, repair the error, repair the file system (using fsck(1M)), and remount the file system.
All UFS file systems within a logical host must be logging UFS file systems to ensure that the failover or haswitch(1M) timeout criteria can be met. This facilitates fast switchovers and takeovers.
The logging UFS file system is set up by creating a trans device with a mirrored logging device and a mirrored UFS master file system. Both the logging device and UFS master device must be mirrored.
Typically, Slice 6 of each drive in a diskset can be used as a UFS log submirror. If the slices are smaller than the log size you want, several can be concatenated. Typically, one Mbyte of log per 100 Mbytes of file system is adequate, up to a maximum of 64 Mbytes. Ideally, log slices should be drive-disjoint from the UFS master device.
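The sizing rule of thumb above can be expressed as a small helper; the function name is an illustration, not a Sun Cluster utility, and it assumes sizes in Mbytes:

```shell
# Suggested UFS log size (Mbytes) for a file system of a given size:
# 1 Mbyte of log per 100 Mbytes of file system, capped at 64 Mbytes.
log_size() {
    fs_mb=$1
    log_mb=$((fs_mb / 100))
    if [ "$log_mb" -lt 1 ]; then log_mb=1; fi
    if [ "$log_mb" -gt 64 ]; then log_mb=64; fi
    echo "$log_mb"
}
log_size 2000    # prints 20 (2-Gbyte file system)
log_size 10000   # prints 64 (capped)
```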
If you must repartition the disk to gain space for UFS logs, then preserve the existing Slice 7, which starts on Cylinder 0 and contains at least two Mbytes. This space is required and reserved for metadevice state database replicas. The Tag and Flag fields (as reported by the format(1M) command) must be preserved for Slice 7. The metaset(1M) command sets the Tag and Flag fields correctly when the initial configuration is built.
After the trans device has been configured, create the UFS file system using newfs(1M) on the trans device.
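As an end-to-end illustration, building the mirrored master, mirrored log, and trans device might look like the following sketch. The diskset name (hahost1), metadevice numbers, and DID slice names are assumptions for the example, not fixed values:

```shell
# Hedged sketch of building a logging UFS (trans) device in diskset
# hahost1; all names and numbers are illustrative.
metainit -s hahost1 d61 1 1 /dev/did/rdsk/d1s0   # master submirror 1
metainit -s hahost1 d62 1 1 /dev/did/rdsk/d2s0   # master submirror 2
metainit -s hahost1 d60 -m d61                   # master mirror
metattach -s hahost1 d60 d62
metainit -s hahost1 d64 1 1 /dev/did/rdsk/d1s6   # log submirror 1
metainit -s hahost1 d65 1 1 /dev/did/rdsk/d2s6   # log submirror 2
metainit -s hahost1 d63 -m d64                   # log mirror
metattach -s hahost1 d63 d65
metainit -s hahost1 d1 -t d60 d63                # trans device: master + log
newfs /dev/md/hahost1/rdsk/d1                    # create the UFS file system
```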
After the newfs process is completed, add the UFS file system to the vfstab file for the logical host, using the following procedure.
Edit the /etc/opt/SUNWcluster/conf/hanfs/vfstab.logicalhost file to update the administrative and multihost UFS file system information.
Make sure that the vfstab.logicalhost files of all cluster nodes contain the same information. Use the cconsole(1) facility to make simultaneous edits to vfstab.logicalhost files on all nodes in the cluster.
Here's a sample vfstab.logicalhost file showing the administrative file system and four other UFS file systems:
#device                   device                    mount       FS    fsck  mount    mount
#to mount                 to fsck                   point       type  pass  at boot  options
#
/dev/md/hahost1/dsk/d11   /dev/md/hahost1/rdsk/d11  /hahost1    ufs   1     no       -
/dev/md/hahost1/dsk/d1    /dev/md/hahost1/rdsk/d1   /hahost1/1  ufs   1     no       -
/dev/md/hahost1/dsk/d2    /dev/md/hahost1/rdsk/d2   /hahost1/2  ufs   1     no       -
/dev/md/hahost1/dsk/d3    /dev/md/hahost1/rdsk/d3   /hahost1/3  ufs   1     no       -
/dev/md/hahost1/dsk/d4    /dev/md/hahost1/rdsk/d4   /hahost1/4  ufs   1     no       -
If the file system will be shared by Sun Cluster HA for NFS, follow the procedure for sharing NFS file systems as described in the chapter on Sun Cluster HA for NFS in the Sun Cluster 2.2 Software Installation Guide.
The new file system will be mounted automatically at the next membership monitor reconfiguration. To force membership reconfiguration, use the following command:
# haswitch -r
Local disks can be mirrored. If a single mirror fails, use the instructions in the Solstice DiskSuite documentation to replace the failed mirror and resynchronize the replacement disk with the good disk.
The metadevice actions that are not supported in Sun Cluster configurations include:
Creation of a configuration with too few metadevice state database replicas on the local disks
Modification of metadevice state database replicas on multihost disks, unless there are explicit instructions to do so in this or another Sun Cluster book
Sun StorEdge Volume Manager (SSVM) and Cluster Volume Manager (CVM) are variations of the same volume manager. CVM is only used in Oracle Parallel Server (OPS) configurations. This section describes using disks under the control of the volume manager to administer:
Volume manager disks
Disk groups
Subdisks
Plexes
Volumes
Refer to the appropriate section for a complete discussion of administering these objects.
Objects under the control of a volume manager are created and administered using either command-line utilities or the Visual Administrator graphical user interface.
Read the information in this chapter before using the SSVM or CVM documentation to administer objects under the control of a volume manager in a Sun Cluster configuration. The procedures presented here are one method for performing the following tasks. Use the method that works best for your particular configuration.
These objects generally have the following relationship:
Disks are placed under volume manager control and are grouped into disk groups.
One or more subdisks (each representing a specific portion of a disk) are combined to form plexes, or mirrors.
A volume is composed of one or more plexes.
The default disk group is rootdg (the root disk group). You can create additional disk groups as necessary. The primary administration tasks that you perform on disk groups involve adding and removing disks.
Before using a disk that you have placed in a disk group, you must set up disks and subdisks (under volume manager control) to build plexes, or mirrors, using the physical disk's slices. A plex can be a concatenation or stripe.
With SSVM and CVM, applications access volumes (created on volume manager disks) rather than slices.
The following sections describe the SSVM and CVM command-line programs to use when performing a task. Optionally, you can use the graphical user interface for all the tasks unless directed otherwise.
On nodes running Sun Cluster HA data services, never manually run the vxdg import or deport options on a disk group that is under the control of Sun Cluster, unless the logical host for that disk group is in maintenance mode. Before manually importing or deporting a disk group, you must either stop Sun Cluster on all nodes that can master that disk group (by running scadmin stopnode on all such nodes), or use the haswitch -m command to switch any corresponding logical host into maintenance mode. When you are ready to return control of the disk group to Sun Cluster, the safest course is to deport the disk group before running scadmin startnode or before using haswitch(1M) to place the logical host back under the control of Sun Cluster.
Before a disk can be used by SSVM or CVM, it must be identified, or initialized, as a disk that is under control of a volume manager. A fully initialized disk can be added to a disk group, used to replace a previously failed disk, or used to create a new disk group.
Ensure that no data is on the disk.
This is important because existing data is destroyed if the disk is initialized.
Insert the disk device and install it in the disk enclosure by following the instructions in the accompanying hardware documentation.
Initialize the disk and add it to a disk group.
This is commonly done by using either the vxdiskadm menus or the graphical user interface. Alternatively, you can use the command-line utilities vxdisksetup and vxdg adddisk to initialize the disk and place it in a disk group.
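The command-line alternative might look like the following sketch; the device name, disk media name, and disk group name are illustrative:

```shell
# Hedged sketch of initializing a disk and adding it to a disk group.
/usr/lib/vxvm/bin/vxdisksetup -i c2t3d0   # initialize the disk for VM use
vxdg -g acct adddisk disk05=c2t3d0        # add it to disk group acct
```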
Occasionally, you may need to take a physical disk offline. If the disk is corrupted, you need to disable it and remove it. You also must disable a disk before moving the physical disk device to another location to be connected to another system.
To take a physical disk offline, first remove the disk from its disk group. Then place the disk offline by using the vxdisk(1M) command.
You can remove a disk to move it to another system, or because the disk is failing or has failed. Alternatively, if the volumes on the disk are no longer needed, they can be removed.
To remove a disk from the disk group, use the vxdg(1M) command. To remove the disk from volume manager control by removing the private and public partitions, use the vxdiskunsetup(1M) command. Refer to the vxdg(1M) and vxdiskunsetup(1M) man pages for complete information on these commands.
For SSVM and CVM, it is most convenient to create and populate disk groups from the active node that is the default master of the particular disk group. In an N+1 configuration, each of these default master nodes shares multihost disk connectivity with only one other node in the cluster, the hot-standby node. By using these nodes to populate the disk groups, you avoid the risk of generating improperly configured groups.
You can use either the vxdiskadm menus or the graphical user interface to create a new disk group. Alternately, you can use the command-line utility vxdg init.
Once the disk groups have been created and populated, each one should be deported by using the vxdg deport command. Then, each group should be imported onto the hot-standby node by using the -t option. The -t option is important, as it prevents the import from persisting across the next boot. All SSVM or CVM plexes and volumes should be created, and volumes started, before continuing.
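The deport and temporary import steps might look like the following sketch; the disk group name acct is illustrative:

```shell
# Hedged sketch. On the default master, after creating and populating
# the disk group:
vxdg deport acct
# On the hot-standby node; -t keeps the import from persisting across
# the next boot:
vxdg -t import acct
```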
To move a disk between disk groups, remove the disk from one disk group and add it to the other.
This example moves the physical disk c1t0d1 from disk group acct to disk group log_node1 by using command-line utilities.
Use the vxprint(1M) command to determine if the disk is in use.
# vxprint -g acct
TY NAME       ASSOC      KSTATE   LENGTH   PLOFFS  STATE   TUTIL0  PUTIL0
dg acct       acct       -        -        -       -       -       -
dm c1t0d0     c1t0d0s2   -        2050272  -       -       -       -
dm c1t0d1     c1t0d1s2   -        2050272  -       -       -       -
dm c2t0d0     c2t0d0s2   -        2050272  -       -       -       -
dm c2t0d1     c2t0d1s2   -        2050272  -       -       -       -
v  newvol     gen        ENABLED  204800   -       ACTIVE  -       -
pl newvol-01  newvol     ENABLED  205632   -       ACTIVE  -       -
sd c1t0d1-01  newvol-01  ENABLED  205632   0       -       -       -
pl newvol-02  newvol     ENABLED  205632   -       ACTIVE  -       -
sd c2t0d1-01  newvol-02  ENABLED  205632   0       -       -       -
v  vol01      gen        ENABLED  1024000  -       ACTIVE  -       -
pl vol01-01   vol01      ENABLED  1024128  -       ACTIVE  -       -
sd c1t0d0-01  vol01-01   ENABLED  1024128  0       -       -       -
pl vol01-02   vol01      ENABLED  1024128  -       ACTIVE  -       -
sd c2t0d0-01  vol01-02   ENABLED  1024128  0       -       -       -
Use the vxedit(1M) command to remove the volume to free up the c1t0d1 disk.
You must run the vxedit command from the CVM node mastering the shared disk group.
# vxedit -g acct -fr rm newvol
The -f option forces an operation. The -r option makes the operation recursive.
Remove the c1t0d1 disk from the acct disk group.
You must run the vxdg command from the CVM node mastering the shared disk group.
# vxdg -g acct rmdisk c1t0d1
Add the c1t0d1 disk to the log_node1 disk group.
# vxdg -g log_node1 adddisk c1t0d1
This procedure does not save the configuration or data on the disk.
This is the acct disk group after c1t0d1 is removed.
# vxprint -g acct
TY NAME       ASSOC      KSTATE   LENGTH   PLOFFS  STATE   TUTIL0  PUTIL0
dg acct       acct       -        -        -       -       -       -
dm c1t0d0     c1t0d0s2   -        2050272  -       -       -       -
dm c2t0d0     c2t0d0s2   -        2050272  -       -       -       -
dm c2t0d1     c2t0d1s2   -        2050272  -       -       -       -
v  vol01      gen        ENABLED  1024000  -       ACTIVE  -       -
pl vol01-01   vol01      ENABLED  1024128  -       ACTIVE  -       -
sd c1t0d0-01  vol01-01   ENABLED  1024128  0       -       -       -
pl vol01-02   vol01      ENABLED  1024128  -       ACTIVE  -       -
sd c2t0d0-01  vol01-02   ENABLED  1024128  0       -       -       -
This is the log_node1 disk group after c1t0d1 is added.
# vxprint -g log_node1
TY NAME       ASSOC       KSTATE  LENGTH   PLOFFS  STATE  TUTIL0  PUTIL0
dg log_node1  log_node1   -       -        -       -      -       -
dm c1t0d1     c1t0d1s2    -       2050272  -       -      -       -
dm c1t3d0     c1t3d0s2    -       2050272  -       -      -       -
dm c2t3d0     c2t3d0s2    -       2050272  -       -      -       -
#
To change permissions or ownership of volumes, you must use the vxedit command.
Do not use chmod or chgrp. The permissions and ownership set by chmod or chgrp are automatically reset to root during a reboot.
Here is an example of the permissions and ownership of the volumes vol01 and vol02 in the /dev/vx/rdsk directory before a change.
# ls -l
crw-------   1 root     root     nnn,nnnnn date time vol01
crw-------   1 root     root     nnn,nnnnn date time vol02
...
This is an example of changing the permissions and ownership for vol01.
# vxedit -g group_name set mode=755 user=oracle vol01
After the edit, note how the permissions and ownership have changed.
# ls -l
crwxr-xr-x   1 oracle   root     nnn,nnnnn date time vol01
crw-------   1 root     root     nnn,nnnnn date time vol02
...
Volumes, or virtual disks, can contain file systems or applications such as databases. A volume can consist of up to 32 plexes, each of which contains one or more subdisks. In order for a volume to be usable, it must have at least one associated plex with at least one associated subdisk. Note that all subdisks within a volume must belong to the same disk group.
Use the graphical user interface or the command-line utility vxassist(1M) to create volumes in each disk group, and to create an associated mirror for each volume.
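The command-line steps might look like the following sketch; the disk group name, volume name, and size are illustrative:

```shell
# Hedged sketch of creating a volume and its mirror with vxassist(1M).
vxassist -g acct make vol01 500m   # create a 500-Mbyte volume in acct
vxassist -g acct mirror vol01      # add an associated mirror (plex)
```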
The actual size of an SSVM or CVM device is slightly less than the full disk drive size. SSVM and CVM reserve a small amount of space for private use, called the private region.
You can use the same volume name for volumes that belong to different disk groups; volume names must be unique only within a disk group.
Dirty Region Logging (DRL) is an optional property of a volume, used to provide a speedy recovery of mirrored volumes after a system failure. DRL keeps track of the regions that have changed due to I/O writes to a mirrored volume and uses this information to recover only the portions of the volume that need to be recovered.
Log subdisks are used to store the dirty region log of a volume that has DRL enabled. A volume with DRL has at least one log subdisk; multiple log subdisks can be used to mirror the dirty region log. Each log subdisk is associated with one of the volume's plexes, and only one log subdisk can exist per plex. If a plex contains only a log subdisk and no data subdisks, it is referred to as a log plex. A log subdisk can also be associated with a regular plex that contains data subdisks; in that case, the log subdisk risks becoming unavailable if the plex must be detached due to the failure of one of its data subdisks.
Use the graphical user interface or the command-line utility vxassist(1M) to create a log for an existing volume.
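From the command line, adding a DRL log might look like the following sketch; the disk group and volume names are illustrative:

```shell
# Hedged sketch of adding a dirty region log to an existing volume.
vxassist -g acct addlog vol01
```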
Hot-relocation is the ability of a system to automatically react to I/O failures on redundant (mirrored or RAID5) volume manager objects, and to restore redundancy and access to those objects. Hot-relocation is supported only on configurations using SSVM. SSVM detects I/O failures on volume manager objects and relocates the affected subdisks to disks designated as spare disks or free space within the disk group. SSVM then reconstructs the objects that existed before the failure and makes them redundant and accessible again.
When a partial disk failure occurs (that is, a failure affecting only some subdisks on a disk), redundant data on the failed portion of the disk is relocated, and the existing volumes consisting of the unaffected portions of the disk remain accessible.
Hot-relocation is performed only for redundant (mirrored or RAID5) subdisks on a failed disk. Non-redundant subdisks on a failed disk are not relocated, but you are notified of their failure.
A spare disk must be initialized and placed in a disk group as a spare before it can be used for replacement purposes. If no disks have been designated as spares when a failure occurs, SSVM automatically uses any available free space in the disk group in which the failure occurs. If there is not enough spare disk space, a combination of spare space and free space is used. You can designate one or more disks as hot-relocation spares within each disk group. Disks can be designated as spares with the vxedit(1M) command.
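Designating a disk as a hot-relocation spare might look like the following sketch; the disk group name and disk media name are illustrative:

```shell
# Hedged sketch of marking disk01 as a hot-relocation spare in acct.
vxedit -g acct set spare=on disk01
```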
You can configure and specify either UFS or VxFS file systems associated with a logical host's disk groups on volumes of type fsgen. When a cluster node masters a logical host, the logical host's file systems associated with the disk groups are mounted on the mastering node's specified mount points.
During a logical host reconfiguration sequence, it is necessary to check file systems with the fsck(1M) command. Though this process is performed in non-interactive parallel mode on UFS file systems, it can affect the overall time of the reconfiguration sequence. The logging feature of UFS, SDS, and VxFS file systems greatly reduces the time that fsck(1M) takes prior to mounting file systems.
When the switchover of a data service also requires volume recovery, the recovery can take longer than the reconfiguration steps allow. This causes step time-outs, and the node aborts.
Consequently, when setting up mirrored volumes, always add a DRL log to decrease volume recovery time in the event of a system crash. When mirrored volumes are used in the cluster environment, DRL must be assigned for volumes greater than 500 Mbytes.
Use VxFS if large file systems (greater than 500 Mbytes) are used for HA data services. Note that VxFS is not bundled with Sun Cluster and must be purchased separately from Veritas.
Although it is possible to configure logical hosts with very small mirrored file systems, you should use Dirty Region Logging (DRL) or VxFS file systems because of the possibility of time-outs as the size of the file system increases.
To grow a striped or RAID5 volume containing a file system, you must have free space on the same number of disks that are currently in the stripe or RAID5 volume. For example, if you have four 1-Gbyte disks striped together (giving you a 4-Gbyte file system) and you want to add 1 Gbyte of space (to yield a 5-Gbyte file system), you must have four disks, each with at least 0.25 Gbyte of free space. In other words, you cannot add one disk to a four-disk stripe.
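The per-disk space requirement in the example above works out as follows; this is illustrative arithmetic only, and the variable names are not volume manager parameters:

```shell
# Growing a 4-column stripe by 1 Gbyte (1024 Mbytes) requires the
# growth divided evenly across all four columns.
COLUMNS=4
GROW_MB=1024                        # 1 Gbyte expressed in Mbytes
PER_DISK_MB=$((GROW_MB / COLUMNS))
echo "$PER_DISK_MB"                 # prints 256, i.e. 0.25 Gbyte per disk
```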
The SSVM or CVM graphical user interface will choose the disks on which to grow your file system. To select the specific disks on which to grow the file system, use the command line interface instead.
UFS file systems cannot be shrunk. The only way to "shrink" a file system is to recreate the volume, run newfs on the volume, then restore the data from backup.
Local disks can be mirrored. If a single mirror fails, use the instructions in your volume manager documentation to replace the failed mirror and resynchronize the replacement disk with the good disk.
This section contains suggestions for using Solstice Backup™ to back up Sun Cluster file systems.
Solstice Backup is designed to run each copy of the server software on a single server. Solstice Backup expects files to be recovered using the same physical server from which they were backed up.
Solstice Backup has considerable data about the physical machines (host names and host IDs) corresponding to the server and clients. Solstice Backup's information about the underlying physical machines on which the logical hosts are configured affects how it stores client indexes.
Do not put the Solstice Backup /nsr database on the multihost disks. Conflicts can arise if two different Solstice Backup servers attempt to access the same /nsr database.
Because of the way Solstice Backup stores client indexes, do not back up a particular client using different Solstice Backup servers on different days. Make sure that a particular logical host is always mastered by the same physical server whenever backups are performed. This will enable future recover operations to succeed.
By default, Sun Cluster systems will not generate the full file system list for your backup configuration. If the save set list consists of the keyword All, then the /etc/vfstab file will be examined to determine which file systems should be saved. Because Sun Cluster vfstab files are kept in /etc/opt/SUNWcluster/conf/hanfs by default, Solstice Backup will not find them unless you explicitly list the Sun Cluster file systems to be saved. When you are testing your backup procedures, verify that all of the Sun Cluster file systems that need to be backed up appear in the Solstice Backup file system list.
Four methods of configuring Solstice Backup are presented here. You might prefer one depending on your particular Sun Cluster configuration. Switchover times could influence your decision. Once you decide on a method, continue using that method so that future recover operations will succeed.
Here is a description of the configuration methods:
Use a non-cluster node, non-high availability server configured as a Solstice Backup server.
Configure an additional server apart from the Sun Cluster servers to act as the Solstice Backup server. Configure the logical hosts as clients of the server. For best results, always ensure that the logical hosts are configured on their respective default masters before doing the daily backup. This might require a switchover. Having the logical hosts mastered by alternate servers on different days (possibly as the result of a takeover) could cause Solstice Backup to become confused upon attempting a recover operation, due to the way Solstice Backup stores client indexes.
Use one Sun Cluster server configured to perform local backups.
Configure one of the Sun Cluster servers to perform local backups. Always switch the logical hosts to the Solstice Backup server before performing the daily backup. That is, if phys-hahost1 and phys-hahost2 are the Sun Cluster servers, and phys-hahost1 is the Solstice Backup server, always switch the logical hosts to phys-hahost1 before performing backups. When backups are complete, switch back the logical host normally mastered by phys-hahost2.
Use the Sun Cluster servers configured as Solstice Backup servers.
Configure each Sun Cluster server to perform local backups of the logical host it masters by default. Always ensure that the logical hosts are configured on their respective default masters before performing the daily backup. This might require a switchover. Having the logical hosts mastered by alternate servers on different days (possibly as the result of a takeover) could cause Solstice Backup to become confused upon attempting a recover operation, due to the way Solstice Backup stores client indexes.
Use one Sun Cluster server configured as the Solstice Backup server.
Configure one Sun Cluster server to back up its logical host locally and to back up its sibling's logical host over the network. Always ensure that the logical hosts are configured on their respective default masters before doing the daily backup. This might require a switchover. Having the logical hosts mastered by alternate servers on different days (possibly as the result of a takeover) could cause Solstice Backup to become confused upon attempting a recover operation, due to the way Solstice Backup stores client indexes.
In all four of the above backup options, you can have another server configured to temporarily perform backups in the event the designated Solstice Backup server is down. Note that you will not be able to use the temporary Solstice Backup server to recover files backed up by the normal Solstice Backup server, and that you cannot recover files backed up by the temporary server from the normal backup server.
This appendix contains a quick reference to the syntax and descriptions for all the commands and utilities associated with the Sun Cluster framework and the Sun Cluster data services. They are presented in alphabetical order within their appropriate man page section.
The complete man pages described in this appendix are available online by using the man(1) command.
cconsole(1) - Provides multi-window, multi-machine remote console access.
cconsole [clustername ...|hostname ...]
ccp(1) - Starts the Sun Cluster Cluster Control Panel GUI.
ccp [clustername]
chosts(1) - Expands the arguments into a list of host names or cluster name.
chosts name [name ...]
cports(1) - Expands the hostname arguments into a list of host, server, port triples.
cports hostname [hostname ...]
crlogin(1) - Provides multi-window, multi-machine remote login access.
crlogin [-l user] [clustername|hostname ...]
ctelnet(1) - Provides multi-window, multi-machine remote telnet access.
ctelnet [clustername ...|hostname ...]
lktest(1) - Tests the state of the Distributed Lock Manager (DLM).
lktest [-q]
ccdadm(1M) - Provides administrative services to the Cluster Configuration Database (CCD).
ccdadm clustername -r restore_file [-t timeout]
ccdadm clustername -c checkpoint_file [-t timeout]
ccdadm clustername -q quorum_flag [-t timeout]
ccdadm clustername [-vdh] [-t timeout]
ccdadm -p candidate_file [-t timeout]
ccdadm -x candidate_file [-t timeout]
ccdctl(1M) - Controls the starting, reconfiguring, and stopping of the ccdd(1M) server.
ccdctl transition clustername timeout [restart]
ccdd(1M) - The ccdd server performs the initialization, update, query, and reconfiguration of the Cluster Configuration Database (CCD).
ccdd
ccdinstall(1M) - Allows the Sun Cluster installation program, scinstall(1M), to update either the initial (ccd.database.init) or dynamic (ccd.database) instance of the CCD with package-specific information.
ccdinstall [-i] add component_file
ccdinstall [-i] rem key component_file
confccdssa(1M) - Configures an SSVM mirrored volume for use as a highly available file system to store the CCD database.
confccdssa clustername
finddevices(1M) - Probes a Sun Cluster for disk devices and array controllers.
finddevices
finddevices ssa
finddevices disks [cX]
finddevices rootdev
get_ci_status(1M) - Displays the cluster configuration, the SCI adapter status, and the SMA session status.
get_ci_status [-l]
get_node_status(1M) - Prints the current status of certain Sun Cluster software on the local node.
get_node_status
ha_fault_mon_config(1M) - Gets or sets NFS fault monitoring configuration parameters.
ha_fault_mon_config get ParameterName
ha_fault_mon_config set ParameterName NewValue
hactl(1M) - Provides various cluster control operations.
hactl [-n] -t|-g -s service_name -l|-p hostname [-L severity] [-k cluster_key]
hactl [-n] -r -s service_name [-k cluster_key]
hactl -f fieldname
hadsconfig(1M) - HA data services configuration command.
hadsconfig
haget(1M) - Queries the current state of the cluster HA configuration.
haget [-S] [-a APIversion] -f fieldname [-h hostname] [-s dataservicename]
hainformix(1M) - Sun Cluster HA for Informix administration command.
hainformix [-s] command [server] [datafield1 ...]
halockrun(1M) - Runs a child program while holding a lock file.
halockrun [-vsn] [-e exitcode] lockfilename prog [args]
haoracle(1M) - Sun Cluster HA for Oracle administration command.
haoracle [-s] command [instance] [datafield1 ...]
hareg(1M) - Registers Sun Cluster data services.
hareg -r service_name -m method=path [, method=path ...] [-b basedir] [-t method=timeout [, method=timeout ...]] [-d depends_on_service ...] [-h logical_host ...] [-v service_version] [-a APIversion] [-p pkg ...]
hareg -s -r Sun_service_name [-h logical_host ...]
hareg -u service_name
hareg -q service_name [-M method|-T method|-D|-H|-V|-A|-P|-B]
hareg -y|-n service_name ...
hareg [-Y|-N]
hasap_dbms(1M) - Sets dependencies and modifies timeouts for the Sun Cluster HA for SAP data service.
hasap_dbms -d depends_on_svc[,depends_on_svc]...
hasap_dbms -t method=timeout[,method=timeout]...
hasap_dbms -r
hasap_start_all_instances(1M), hasap_stop_all_instances(1M) - Scripts called from the Sun Cluster HA for SAP data service code; customize them to start and stop all application servers and test or development instances.
hasap_start_all_instances Instance-Name Context Timeout
hasap_stop_all_instances Instance-Name Context Timeout
hastat(1M) - Monitors status of Sun Cluster configurations.
hastat [-i interval] [-m message-lines]
haswitch(1M) - Performs a switchover of services or a cluster reconfiguration in a Sun Cluster configuration.
haswitch destination_hostname logicalhostname ...
haswitch -m logicalhostname ...
haswitch -r
hasybase(1M) - Sun Cluster HA for Sybase administration command.
hasybase [-s] command [server] [datafield1 ...]
hatimerun(1M) - Provides a convenient facility for timing out the execution of another (child) program.
hatimerun [-va] [-k signalname] [-e exitcode] -t timeOutSecs prog args
lkdbx(1M) - Runs the Interactive Distributed Lock Manager (DLM).
lkdbx [options]
pmfadm(1M) - Provides the administrative command-line interface to the process monitor facility.
pmfadm -c nametag [-n retries] [-t period] [-a action] command [args_to_command ...]
pmfadm -m nametag [-n retries] [-t period]
pmfadm -s nametag [-w timeout] [signal]
pmfadm -k nametag [-w timeout] [signal]
pmfadm -l nametag [-h host]
pmfadm -q nametag [-h host]
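For example, placing a daemon under the process monitor facility and later stopping it can be sketched as follows; the nametag, retry values, and daemon path are assumptions, not values from this manual.

```shell
# Sketch: run a command under pmfadm(1M) with up to 3 restarts per
# 60-minute period, then stop it with SIGTERM. Names are hypothetical.
PMFADM=${PMFADM:-pmfadm}

start_monitored() { $PMFADM -c mydaemon.tag -n 3 -t 60 /opt/app/bin/mydaemon; }
stop_monitored()  { $PMFADM -s mydaemon.tag TERM; }  # stop monitoring, send SIGTERM
```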
pnmd(1M) - The Public Network Management (PNM) service daemon.
pnmd [-s]
pnmptor(1M) - Maps the pseudo adapter name to the real adapter name in a NAFO backup group.
pnmptor p_adpname
pnmrtop(1M) - Maps the real adapter name to the pseudo adapter name in a NAFO backup group.
pnmrtop r_adpname
pnmset(1M) - Sets up the configuration for PNM (Public Network Management).
pnmset [-s] [-f filename] [-n [-t]] [-v]
pnmstat(1M) - Reports the status for NAFO backup groups monitored by PNM.
pnmstat [-d] [-l|[[[-s] -h hostname] [-c p_adpname]]]
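A quick status check with these utilities might be sketched as follows; the backup group name nafo0 is an assumption.

```shell
# Sketch: report NAFO backup group status, then map a pseudo adapter
# name to the real adapter currently backing it. Names are hypothetical.
PNMSTAT=${PNMSTAT:-pnmstat}
PNMPTOR=${PNMPTOR:-pnmptor}

check_nafo() {
    $PNMSTAT -l        # status of all NAFO backup groups on this node
    $PNMPTOR nafo0     # real adapter currently active in group nafo0
}
```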
rpc.pmfd(1M) - RPC-based process monitor server.
rpc.pmfd
scadmin(1M) - Starts up or shuts down a cluster.
scadmin [-af] startcluster localnode clustname
scadmin [-af] startnode [clustname]
scadmin [-a] stopnode [clustname]
scadmin abortpartition localnode clustname
scadmin continuepartition localnode clustname
scadmin reldisks [clustname]
scadmin resdisks [clustname]
scadmin reserve disk
scadmin switch clustname [-m] logical-hosts ...
scadmin switch clustname dest-host logical-hosts ...
scadmin switch clustname -r
scconf(1M) - Creates or modifies cluster system configuration.
scconf clustname -h hostname1 [... hostname4]
scconf clustname -i hostname if0 if1
scconf clustname -F logicalhost [disk-group]
scconf clustname -L logicalhost -n nodelist -g dglist -i logaddrinfo [-m]
scconf clustname -L logicalhost -r
scconf clustname -s data-service-name logicalhost-name
scconf clustname -s -r data-service-name logicalhost-name
scconf clustname -p
scconf clustname [+|-] D
scconf clustname -q [-m quorum-device] hostname1 hostname2
scconf clustname -q -D [-m quorum-device]
scconf clustname -U [config-file-for-Oracle-Unix-DLM]
scconf clustname -N 0|1|2|3 ethernet-address-of-host
scconf clustname -A number-of-active-hosts
scconf clustname -S none|ccdvol
scconf clustname -T new-step10-and-11-timeout-value
scconf clustname -l new-loghost-timeout-value
scconf clustname -H hostname [-dpt]
scconf clustname -t old-ip-address|TC-name [-iPl]
scconf clustname -R data-service-name [data-service-name ...]
scdidadm(1M) - Configures and administers disk ID configurations that have Solstice DiskSuite as their volume manager.
scdidadm -r [-H hostname, ...]
scdidadm -R path|instance_number
scdidadm -l|-L [-h] [-o fmt] [path|instance_number]
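For example, the listing forms can be sketched as simple wrappers:

```shell
# Sketch: wrappers around the scdidadm(1M) listing forms shown above.
SCDIDADM=${SCDIDADM:-scdidadm}

list_local_did() { $SCDIDADM -l; }   # DID mappings for the local host
list_all_did()   { $SCDIDADM -L; }   # DID mappings for all hosts
```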
scinstall(1M) - Installs Sun Cluster software on the Sun Cluster servers, and sets up the configuration.
scinstall -i [-a|c|s] [-d package dir] [-A admin file]
scinstall -u [-a|c|s] [-d package dir] [-A admin file]
scinstall -l [-d package dir]
scinstall [-d package dir]
scinstall [-V|h|o]
sm_config(1M) - The SCI adapter configuration utility for clusters.
sm_config [-t] -f filename
sma_configd(1M) - The SCI adapter configuration daemon process.
sma_configd [-i] [-t]
ha_close(3HA) - Once a data service has finished with a snapshot obtained by a previous call to ha_open(3HA), the data service should call ha_close(3HA) to release the snapshot, the handle, and all associated memory.
cc [flag ...] -I /opt/SUNWcluster/include file ... -L /opt/SUNWcluster/lib [threads lib] -lhads -lintl -ldl [library ...]
#include <hads.h>
ha_error_t ha_open(ha_handle_t *handlep);
ha_error_t ha_close(ha_handle_t handle);
ha_get_calls(3HA) - Used by data services to obtain selected Sun Cluster configuration and state information as recorded in the snapshot obtained by a previous call to ha_open(3HA).
cc [flag ...] -I /opt/SUNWcluster/include file ... -L /opt/SUNWcluster/lib [threads lib] -lhads -lintl -ldl [library ...]
#include <hads.h>
ha_error_t ha_getconfig(ha_handle_t handle, ha_config_t **config);
ha_error_t ha_getcurstate(ha_handle_t handle, ha_lhost_dyn_t **lhosts[]);
ha_error_t ha_getmastered(ha_handle_t handle, ha_lhost_dyn_t **lhosts[]);
ha_error_t ha_getnotmastered(ha_handle_t handle, ha_lhost_dyn_t **lhosts[]);
ha_error_t ha_getonoff(ha_handle_t handle, char *service_name, boolean_t *ison);
ha_error_t ha_getlogfacility(ha_handle_t handle, int *facility);
ha_getconfig(3HA) - Obtains the cluster configuration from the snapshot; one of the ha_get_calls(3HA) routines, which share the compile, link, and include requirements shown above.
ha_error_t ha_getconfig(ha_handle_t handle, ha_config_t **config);
ha_getcurstate(3HA) - Obtains the current cluster state from the snapshot; one of the ha_get_calls(3HA) routines.
ha_error_t ha_getcurstate(ha_handle_t handle, ha_lhost_dyn_t **lhosts[]);
ha_getlogfacility(3HA) - Obtains the configured syslog facility from the snapshot; one of the ha_get_calls(3HA) routines.
ha_error_t ha_getlogfacility(ha_handle_t handle, int *facility);
ha_getmastered(3HA) - Obtains, from the snapshot, the logical hosts mastered by the local node; one of the ha_get_calls(3HA) routines.
ha_error_t ha_getmastered(ha_handle_t handle, ha_lhost_dyn_t **lhosts[]);
ha_getnotmastered(3HA) - Obtains, from the snapshot, the logical hosts not mastered by the local node; one of the ha_get_calls(3HA) routines.
ha_error_t ha_getnotmastered(ha_handle_t handle, ha_lhost_dyn_t **lhosts[]);
ha_getonoff(3HA) - Obtains the on/off state of a data service from the snapshot; one of the ha_get_calls(3HA) routines.
ha_error_t ha_getonoff(ha_handle_t handle, char *service_name, boolean_t *ison);
ha_open(3HA) - Obtains a snapshot of the current state of the Sun Cluster environment.
cc [flag ...] -I /opt/SUNWcluster/include file ... -L /opt/SUNWcluster/lib [threads lib] -lhads -lintl -ldl [library ...]
#include <hads.h>
ha_error_t ha_open(ha_handle_t *handlep);
ha_error_t ha_close(ha_handle_t handle);
hads(3HA) - The hads library routines provide a programming interface for adding data service modules to a Sun Cluster configuration.
cc [flag ...] -I /opt/SUNWcluster/include file ... -L /opt/SUNWcluster/lib [threads lib] -lhads -lintl -ldl [library ...]
#include <hads.h>
ccd(4) - The Cluster Configuration Database (CCD) is a highly available, replicated database whose main purpose is to provide a single cluster-wide namespace and configuration repository for the cluster software components of the HA framework.
ccd
clusters(4) - The clusters file contains information regarding the known clusters in the local naming domain. For each cluster, a single line should be present with the following information:
clustername hostname [hostname ...]
hainformix_config_V1(4) - The configuration file for the Sun Cluster HA for Informix fault monitor.
hainformix_config_V1
hainformix_support(4) - The table of Sun Cluster HA for Informix releases supported by Sun Cluster.
hainformix_support
haoracle_config_V1(4) - The configuration file for the Sun Cluster HA for Oracle fault monitor.
haoracle_config_V1
haoracle_support(4) - The table of Sun Cluster HA for Oracle releases supported by Sun Cluster.
haoracle_support
hasybase_config_V1(4) - The configuration file for the Sun Cluster HA for Sybase fault monitor.
hasybase_config_V1
hasybase_support(4) - The table of Sun Cluster HA for Sybase releases supported by Sun Cluster.
hasybase_support
ora_cdb(4) - This file contains configuration information used by the Oracle Unix DLM during startup.
clustername.ora_cdb
pnmconfig(4) - Public Network Management (PNM) configuration file for Network Adapter Failover (NAFO). This file contains the configuration information for PNM.
pnmconfig
serialports(4) - The serialports NIS or NIS+ database maps a name to a server name and TCP port number that represent the serial port connected to the specified terminal server host.
hostname concentrator-hostname tcp-port-number
did(7) - User-configurable pseudo device driver used to run Solstice DiskSuite in an N+1 configuration.
did
mond(7) - System configuration and status reporting daemon.
in.mond
smond(7) - The super monitor daemon, which gathers system configuration and status information from several Sun Cluster 2.0 systems and passes it to the SNMP agent on the local host.
smond -h clustername1|hostname1 clustername2|hostname2 ...
smond -p port_number
smond -c config_file
smond_conf(7) - A utility script for creating the super monitor daemon's configuration file.
smond_conf -h clustername1|hostname1 clustername2|hostname2 ...
smond_ctl(7) - Starts or stops the super monitor daemon on the local host.
smond_ctl [start|stop]
snmpd(7) - The Sun Cluster SNMP Agent is an RFC 1157-compliant SNMP agent.
snmpd [-r] [-p port] [-a] [-c config-file] [-T trace-level]
This appendix describes fault detection for Sun Cluster.
This section presents an overview of Sun Cluster fault detection. This fault detection encompasses three general approaches:
A heartbeat mechanism
Fault monitoring of networks
Fault monitoring of specific data services
Fault monitoring performs sanity checks to ensure that the faulty node is the one being blamed for a problem, and not the healthy node.
Some of the information presented is specific to this release of Sun Cluster, and is expected to change as the product evolves. The time estimates given to detect various faults are rough approximations and are intended only to give the reader a general understanding of how Sun Cluster behaves. This document is not intended to be a program logic manual for the internals of Sun Cluster nor does it describe a programming interface.
As noted in the basic Sun Cluster architecture discussion, when one server goes down the other server takes over. This raises an important issue: how does one server recognize that another server is down?
Sun Cluster uses three methods of fault detection.
Heartbeat and SMA link monitoring - These monitors run over the private links. For Ethernet, there are two monitors: an SMA link monitor and a cluster membership monitor. For SCI, there are three monitors: an SMA link monitor, a cluster membership monitor, and a low-level SCI heartbeat monitor.
Network fault monitoring - All servers' public network connections are monitored: if a server cannot communicate over the public network because of a hardware or software problem, then another server in the server set will take over.
Data service-specific fault probes - Each Sun Cluster data service performs fault detection that is specific for that data service. This last method addresses the issue of whether the data service is performing useful work, not just the low-level question of whether the machine and operating system appear to be running.
For the second and third methods, one server is probing the other server for a response. After detecting an apparent problem, the probing server carries out a number of sanity checks of itself before forcibly taking over from the other server. These sanity checks try to ensure that a problem on the probing server is not the real cause of the lack of response from the other server. These sanity checks are provided by hactl(1M), a library subroutine that is part of the Sun Cluster base framework; hence, data service-specific fault detection code need only call hactl(1M) to perform sanity checks on the probing server. See the hactl(1M) man page for details.
Sun Cluster uses a heartbeat mechanism. The heartbeat processing is performed by a real-time high-priority process which is pinned in memory, that is, it is not subject to paging. This process is called the cluster membership monitor. In a ps(1) listing, its name appears as clustd.
Each server sends out an "I am alive" message, or heartbeat, over both private links approximately once every two seconds. In addition, each server is listening for the heartbeat messages from other servers, on both private links. Receiving the heartbeat on either private link is sufficient evidence that another server is running. A server will decide that another server is down if it does not hear a heartbeat message from that server for a sufficiently long period of time, approximately 12 seconds.
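The timing described above amounts to a simple rule: with heartbeats sent about every two seconds, a peer is declared down only after roughly 12 seconds of silence (about six missed heartbeats). A minimal sketch of that decision, using only the intervals quoted in the text (this is an illustration, not clustd code):

```shell
# Sketch of the membership monitor's timeout rule.
HEARTBEAT_INTERVAL=2    # seconds between "I am alive" messages
HEARTBEAT_TIMEOUT=12    # seconds of silence before declaring a peer down

peer_is_down() {
    # $1 = seconds since a heartbeat was last received on either private link
    [ "$1" -ge "$HEARTBEAT_TIMEOUT" ]
}
```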
In the overall fault detection strategy, the cluster membership monitor heartbeat mechanism is the first line of defense. The absence of the heartbeat will immediately detect hardware crashes and operating system panics. It might also detect some gross operating system problems, for example, leaking away all communication buffers. The heartbeat mechanism is also Sun Cluster's fastest fault detection method. Because the cluster membership monitor runs at real-time priority and because it is pinned in memory, a relatively short timeout for the absence of heartbeats is justified. Conversely, for the other fault detection methods, Sun Cluster must avoid labelling a server as being down when it is merely very slow. For those methods, relatively long timeouts of several minutes are used, and, in some cases, two or more such timeouts are required before Sun Cluster will perform a takeover.
The fact that the cluster membership monitor runs at real-time priority and is pinned in memory leads to the paradox that the membership monitor might be alive even though its server is performing no useful work at the data service level. This motivates the data service-specific fault monitoring, as described in "C.4 Data Service-Specific Fault Probes".
The network fault probing and data service-specific fault probing require each node to probe another node for a response. Before doing a takeover, the probing node performs a number of basic sanity checks of itself. These checks attempt to ensure that the problem does not really lie with the probing node. They also try to ensure that taking over from the server that seems to be having a problem really will improve the situation. Without the sanity checks, the problem of false takeovers would likely arise. That is, a sick node would wrongly blame another node for lack of response and would take over from the healthier server.
The probing node performs the following sanity checks on itself before doing a takeover from another node:
The probing node checks its own ability to use the public network, as described in "C.2 Public Network Monitoring (PNM)".
The probing node also checks whether its own HA data services are responding. All the HA data services that the probing node is already running are checked. If any are not responsive, takeover is inhibited, on the assumption that the probing node will not do any better trying to run another node's services if it can't run its own. Furthermore, the failure of the probing node's own HA data services to respond might be an indication of some underlying problem with the probing node that could be causing the probe of the other node to fail. Sun Cluster HA for NFS provides an important example of this phenomenon: to lock a file on another node, the probing node's own lockd and statd daemons must be working. By checking the response of its lockd and statd daemons, the probing node rules out the scenario where its own daemons' failure to respond makes the other node look unresponsive.
The PNM component has two primary functions:
To monitor the status of configured adapters on a node and report general adapter or network failures.
To fail over transparently to other backup adapters on a node when the primary adapter fails.
PNM is implemented as a daemon (pnmd) which periodically gathers network statistics on the set of public network interfaces in a node. If the results indicate any abnormalities, pnmd attempts to distinguish between the following three cases:
The network is quiescent.
The network is down.
The network interface is down.
PNM then does a directed ping to a peer daemon on the same subnet. If there is no reply, PNM does a broadcast ping on the same subnet. PNM then places the results of its findings in the CCD and compares the local results with the results of the other nodes (which are also placed in the CCD). This comparison is used to determine whether the network is down or whether the network interface is faulty. If PNM detects that the network interface is faulty and backup adapters are configured, it performs the network adapter failover.
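The decision sequence above can be sketched as follows. This is a simplified illustration of the logic only; the real pnmd exchanges its results with other nodes through the CCD rather than taking them as arguments.

```shell
# Simplified sketch of pnmd's diagnosis: distinguish a quiescent
# network, a down network, and a down interface by combining local
# ping results with what the other nodes report.
diagnose_failure() {
    directed_ping=$1    # yes if a directed ping to a peer daemon succeeded
    broadcast_ping=$2   # yes if a broadcast ping on the subnet succeeded
    peers_ok=$3         # yes if other nodes report the network as healthy

    if [ "$directed_ping" = yes ] || [ "$broadcast_ping" = yes ]; then
        echo "quiescent"          # traffic flows; the network was merely idle
    elif [ "$peers_ok" = yes ]; then
        echo "interface-down"     # only this node sees a failure: fail over the adapter
    else
        echo "network-down"       # all nodes see a failure: no adapter failover
    fi
}
```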
The results of PNM monitoring are used by various entities. The network adapter failover component of PNM uses the monitoring results to decide whether an adapter failover would be useful. For example, if the network is experiencing a failure, no adapter failover is performed. Fault monitors associated with SC HA data services and the API call hactl use the PNM facility to diagnose the cause of data service failures. The information returned by PNM is used to decide whether to migrate the data service, and to determine the location of the data service after migration.
The syslog messages written by the PNM facility on detection of adapter failures are read by the SC Manager, which translates the messages into graphic icons and displays them through the graphical user interface.
You also can run the PNM utilities on the command line to determine the status of network components. For more information, see the man pages pnmset(1M), pnmstat(1M), pnmptor(1M), pnmrtop(1M), and pnmd(1M).
PNM monitors the health of the public network and will switch to backup connections when necessary. However, in the event of the total loss of public network access, PNM will not provide data service or logical host failover. In such a case, PNM will report the loss but it is up to an external fault probe to handle switching between backup nodes.
If you are using SSVM as your volume manager, the Sun Cluster framework is responsible for monitoring each Network Adapter Failover (NAFO) backup group defined per logical host, and initiating a switchover to a backup node when either of the following conditions are met:
There is total loss of the public network (all NAFO backup groups are unavailable) and the backup node has at least one NAFO group available.
There is partial loss of the public network--at least one NAFO backup group is still active when more than one (multiple subnets) are defined for a logical host--and the backup node has a greater number of valid, active NAFO backup groups.
If neither of these conditions is met, Sun Cluster will not attempt a switchover.
If your volume manager is Solstice DiskSuite, loss of public network causes the disconnected node to abort and causes the logical hosts mastered by that node to migrate to the backup node.
The Sun Cluster framework monitors the public networks only while the configuration includes a logical host and while a data service is in the "on" state and registered on that logical host. Only those NAFO backup groups that are in use by a logical host are monitored.
The motivation for performing data service-specific fault probing is that although the server node and operating system are running, the software or hardware might be in such a confused state that no useful work at the data service level is occurring. In the overall architecture, the total failure of the node or operating system is detected by the cluster membership monitor's heartbeat mechanism. However, a node might be working well enough for the heartbeat mechanism to continue to execute even though the data service is not doing useful work.
Conversely, the data service-specific fault probes do not need to detect the state where one node has crashed or has stopped sending cluster heartbeat messages. The assumption is made that the cluster membership monitor detects such states, and the data service fault probes themselves contain no logic for handling these states.
A data service fault probe behaves like a client of the data service. A fault probe running on a machine monitors both the data service exported by that machine and, more importantly, the data service exported by another server. A sick server is not reliable enough to detect its own sickness, so each server is monitoring another node in addition to itself.
In addition to behaving like a client, a data service-specific fault probe will also, in some cases, use statistics from the data service as an indication that useful work is or is not occurring. A probe might also check for the existence of certain processes that are crucial to a particular data service.
Typically, the fault probes react to the absence of service by forcing one server to take over from another. In some cases, the fault probes will first attempt to restart the data service on the original machine before doing the takeover. If many restarts occur within a short time, the indication is that the machine has serious problems. In this case, a takeover by another server is executed immediately, without attempting another local restart.
The probing server runs two types of periodic probes against another server's NFS service.
The probing server sends a NULL RPC to all daemon processes on the target node that are required to provide NFS service; these daemons are rpcbind, mountd, nfsd, lockd, and statd.
The probing server does an end-to-end test: it tries to mount an NFS file system from the other node, and then to read and write a file in that file system. It does this end-to-end test for every file system that the other node is currently sharing. Because the mount is expensive, it is executed less often than the other probes.
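The two probe types can be sketched with standard utilities. The daemon list comes from the text; the RPC program names, target host, and mount details below are illustrative assumptions, not the actual probe code.

```shell
# Sketch of the two NFS probe types. Hosts and paths are hypothetical.
RPCINFO=${RPCINFO:-rpcinfo}
MOUNT=${MOUNT:-mount}
UMOUNT=${UMOUNT:-umount}

null_rpc_probe() {
    # $1 = serving host. Send a NULL RPC to the RPC program behind each
    # required daemon (rpcbind, mountd, nfsd, lockd, statd, by program name).
    for prog in rpcbind mountd nfs nlockmgr status; do
        $RPCINFO -T tcp "$1" "$prog" || return 1
    done
}

end_to_end_probe() {
    # $1 = serving host, $2 = shared file system. Mount it, write and
    # read a small file, then clean up.
    dir=/tmp/nfsprobe.$$
    mkdir "$dir" || return 1
    if $MOUNT -F nfs "$1:$2" "$dir"; then
        echo probe > "$dir/.probe" && cat "$dir/.probe" >/dev/null
        status=$?
        rm -f "$dir/.probe"
        $UMOUNT "$dir"
    else
        status=1
    fi
    rmdir "$dir" 2>/dev/null
    return $status
}
```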
If any of these probes fail, the probing node will consider doing a takeover from the serving node. However, certain conditions might inhibit the takeover from occurring immediately:
Grace period for local restart - Before doing the takeover, the probing node waits for a short time period that is intended to:
Give the victim node a chance to notice its own sickness and fix the problem by doing a local restart of its own daemons
Give the victim node a chance to be less busy (if it is merely overloaded)
After waiting, the prober retries the probe, going on with takeover consideration only if it fails again. In general, two entire timeouts of the basic probe are required for a takeover, to allow for a slow server.
Multiple public networks - If the other node is on multiple public networks, the probing node will try the probe on at least two of them.
Locks - Some backup utilities exploit the lockfs(1M) facility, which locks out various types of updates on a file system, so that backup can take a snapshot of an unchanging file system. Unfortunately, in the NFS context, lockfs(1M) makes a file system appear unavailable; NFS clients will see the condition NFS server not responding. Before doing a takeover, the probing node queries the other node to find out whether the file system is in lockfs state, and, if so, takeover is inhibited. The takeover is inhibited because the lockfs is part of a normal, intended administrative procedure for doing backup. Note that not all backup utilities use lockfs; some permit NFS service to continue uninterrupted.
Daemons - Unresponsiveness of lockd and statd daemons does not cause a takeover. The lockd and statd daemons, together, provide network locking for NFS files. If these daemons are unresponsive, the condition is merely logged to syslog, and a takeover does not occur. lockd and statd, in the course of their normal work, must perform RPCs to client machines, so that a dead or partitioned client can cause lockd and statd to hang for long periods of time. Thus, a bad client can make lockd and statd on the server look sick. And if a takeover by the probing server were to occur, the probing server would probably be stalled by the bad client in the same way. With the current model, a bad client will not cause a false takeover.
After passing these Sun Cluster HA for NFS-specific tests, the process of considering whether or not to do a takeover continues with calls to hactl(1M) (see "C.1.2 Sanity Checking of Probing Node").
The probing server also checks its own NFS service. The logic is similar to the probes of the other server, but instead of doing takeovers, error messages are logged to syslog and an attempt is made to restart any daemon whose process no longer exists. A restart is attempted only when the daemon process has exited or crashed; it is not attempted when the process still exists but is unresponsive, because that would require killing the daemon without knowing which data structures it is updating. A restart is also skipped if a local restart has been attempted too recently (within the last hour). Instead, the other server is told to consider doing a takeover (provided the other server passes its own sanity checks). Finally, the rpcbind daemon is never restarted, because there is no way to inform processes that had registered with rpcbind that they need to re-register.
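The restart policy above, restart only a daemon that has exited, and at most once per hour, can be sketched like this. The stamp-file path, the stubbed liveness check, and the variable names are invented for illustration; they are not the actual Sun Cluster implementation.

```shell
# Hedged sketch of the local-restart policy: restart only an exited daemon,
# and rate-limit restarts to one per MIN_INTERVAL seconds.
RESTART_STAMP=/tmp/nfsd.last_restart   # hypothetical rate-limit record
MIN_INTERVAL=3600                      # at most one local restart per hour
rm -f "$RESTART_STAMP"                 # start clean for this example

daemon_running() {
    # A real monitor would check the daemon's process; stubbed as "not running".
    return 1
}

restart_allowed() {
    # Allow a restart only if none has happened within MIN_INTERVAL.
    [ ! -f "$RESTART_STAMP" ] && return 0
    last=$(cat "$RESTART_STAMP")
    now=$(date +%s)
    [ $((now - last)) -ge "$MIN_INTERVAL" ]
}

ACTION=none
if ! daemon_running; then
    if restart_allowed; then
        date +%s > "$RESTART_STAMP"
        ACTION=restart          # restart only a daemon that has exited
    else
        ACTION=ask_peer         # too soon; let the other node consider takeover
    fi
fi
echo "$ACTION"
```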
The fault probes for Sun Cluster HA for Oracle, Sun Cluster HA for Sybase, and Sun Cluster HA for Informix perform similarly to monitor the database server. The HA-DBMS fault probes are configured by running one of the utilities, haoracle(1M), hasybase(1M), or hainformix(1M). (See the online man pages for a detailed description of the options for these utilities.)
Once the utilities are configured and activated, two processes are started on the local node and two on the remote node to simulate client access. The remote fault probe is initiated by the ha_dbms_serv daemon and is started when hareg -y dataservicename is run.
The HA-DBMS module uses two methods to monitor whether the DBMS service is available. First, HA-DBMS extracts statistics from the DBMS itself:
In Oracle, the V$SYSSTAT table is queried.
In Sybase, the global variables @@io_busy, @@pack_received, @@pack_sent, @@total_read, @@total_write, and @@connections are queried.
In Informix, the SYSPROFILE table is queried.
If the extracted statistics indicate that work is being performed for clients, then no other probing of the DBMS is required. Second, if the DBMS statistics show that no work is occurring, then HA-DBMS submits a small test transaction to the DBMS. If all clients happen to be idle, the DBMS statistics would show no work occurring; that is, the test transaction distinguishes the situation of the database being hung from the legitimately idle situation. Because the test transaction is executed only when the statistics show no activity, it imposes no overhead on an active database. The test transaction consists of:
Creating a table by the name of either HA_DBMS_REM or HA_DBMS_LOC
Inserting values into the created table
Updating the inserted value
Dropping the created table
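The four steps above might be issued as SQL like the following. The table name comes from the text, but the column name and literal values are invented, and the SQL is only printed here rather than run against a real DBMS.

```shell
# Hedged sketch of the test transaction's SQL (printed, not executed).
SQL=$(cat <<'EOF'
CREATE TABLE HA_DBMS_LOC (probe_val INTEGER);
INSERT INTO HA_DBMS_LOC VALUES (1);
UPDATE HA_DBMS_LOC SET probe_val = 2;
DROP TABLE HA_DBMS_LOC;
EOF
)
echo "$SQL"
```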
HA-DBMS carefully filters the error codes returned by the DBMS, using a table that describes which codes should or should not cause a takeover. For example, in the case of Sun Cluster HA for Oracle, the scenario of table space full does not cause a takeover, because an administrator must intervene to fix this condition. (If a takeover were to occur, the new master server would encounter the same table space full condition.)
On the other hand, an error return code such as could not allocate Unix semaphore causes Sun Cluster HA for Oracle to attempt to restart ORACLE locally on this server machine. If a local restart has occurred too recently, then the other machine takes over instead (after first passing its own sanity checks).
The fault monitors for all of the Sun Cluster HA for Netscape data services share a common methodology for fault monitoring of the data service instance. All use the concept of remote and local fault monitoring.
The fault monitor process running on the node that currently masters the logical host on which the data service runs is called the local fault monitor. A fault monitor process running on a node that is a possible master of the logical host is called a remote fault monitor.
Sun Cluster HA for Netscape fault monitors periodically perform a simple data service operation with the server. If the operation fails or times out, that particular probe is declared to have failed.
When a probe fails, the local fault probe attempts to restart the data service locally. This is usually sufficient to restore the data service. The remote probe keeps a record of the probe failure but does not take any action. Upon two successive failures of the probe (indicating that a restart of the data service did not correct the problem), the remote probe invokes the hactl(1M) command in "takeover" mode to initiate a failover of the logical host. Some Netscape data services use a sliding window algorithm of probe successes and failures, in which a pre-configured number of failures within the window causes the probe to take action.
You can use the hadsconfig(1M) command to tune probe interval and timeout values for Sun Cluster HA for Netscape fault monitors. Reducing the probe interval value for fault probing results in faster detection of problems, but it also might result in spurious failovers due to transient problems. Similarly, reducing the probe timeout value results in faster detection of problems related to the data service instances, but also might result in spurious takeovers if the data service is merely busy due to heavy load. For most situations, the default values for these parameters are sufficient. The parameters are described in the hadsconfig(1M) man page and in the configuration sections of each data service chapter in the Sun Cluster 2.2 Software Installation Guide.
The Sun Cluster HA for DNS fault probe performs an nslookup operation to check the health of the Sun Cluster HA for DNS server. It looks up the domain name of the Sun Cluster HA for DNS logical host from the Sun Cluster HA for DNS server. Depending upon the configuration of your /etc/resolv.conf file, nslookup might contact other servers if the primary Sun Cluster HA for DNS server is down. Thus, the nslookup operation might succeed, even when the primary Sun Cluster HA for DNS server is down. To guard against this, the fault probe verifies whether replies come from the primary Sun Cluster HA for DNS server or other servers.
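A check like the one described, verifying that the reply actually came from the primary server, might parse the Server: line of nslookup output. In this hedged sketch the host names and the captured output are illustrative, and the live nslookup call is shown only in a comment.

```shell
# Hedged sketch: confirm which DNS server answered, from captured output.
PRIMARY=hadns-server.example.com       # hypothetical primary HA-DNS server
REPLY=$(cat <<'EOF'
Server:  hadns-server.example.com
Address:  192.168.1.10

Name:    hadns-lhost.example.com
Address:  192.168.1.20
EOF
)
# A live probe would instead use something like:
#   REPLY=$(nslookup hadns-lhost.example.com 2>&1)
ANSWERED_BY=$(echo "$REPLY" | awk '/^Server:/ {print $2; exit}')
if [ "$ANSWERED_BY" = "$PRIMARY" ]; then
    echo "primary answered"
else
    echo "answer came from another server"   # treat as a probe failure
fi
```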
The Sun Cluster HA for Netscape HTTP fault probe checks the health of the http server by trying to connect to it on the logical host address on the configured port. Note that the fault monitor uses the port number specified to hadsconfig(1M) during configuration of the nshttp service instance.
The Sun Cluster HA for Netscape News fault probe checks the health of the news server by connecting to it on the logical host IP addresses and the nntp port number. It then attempts to execute the NNTP date command on the news server, and expects a response from the server within the specified probe timeout period.
The Sun Cluster HA for Netscape Mail or Message Server fault probe checks the health of the mail or message server by probing it on all three service ports served by the server, namely the SMTP, IMAP, and POP3 ports:
SMTP (port 25)--Executes an SMTP "hello" message on the server and then executes a quit command.
IMAP (port 143)--Executes an IMAP4 CAPABILITY command followed by an IMAP4 LOGOUT command.
POP3 (port 110)--Executes a quit command.
For all of these tests, the fault probe expects a response string from the server within the probe timeout interval. Note that a probe failure on any of the above three service ports is considered a failure of the server. To avoid spurious failovers, the nsmail fault probe uses a sliding window algorithm to track probe failures and successes. If the number of probe failures in the sliding window exceeds a pre-configured number, a takeover is initiated by the remote probe.
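A sliding-window failure count of the kind described can be sketched as follows. The window size, threshold, and sample probe history are invented for the example; 1 marks a failed probe.

```shell
# Hedged sketch of a sliding-window decision: act only when failures within
# the window exceed a pre-configured threshold.
WINDOW=5
THRESHOLD=3
history="1 0 1 1 0"     # most recent WINDOW probe results (1 = failure)

failures=0
for r in $history; do
    failures=$((failures + r))
done

RESULT="keep monitoring"
if [ "$failures" -gt "$THRESHOLD" ]; then
    RESULT="initiate takeover"
fi
echo "$RESULT"
```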
The Sun Cluster HA for Netscape LDAP local probe can perform a variable number of local restarts before initiating a failover. The local restart mechanism uses a sliding window algorithm; only when the number of retries is exhausted within that window does a failover occur.
The Sun Cluster HA for Netscape LDAP remote probe uses a simple telnet connection to the LDAP port to check the status of the server. The LDAP port number is the one specified during initial set-up with hadsconfig(1M).
The local probe:
Probes the server by running a monitoring script. The script performs a search for the LDAP common name "monitor." The common name is defined by the Directory Server and is used only for monitoring. The probe uses the ldapsearch utility to perform this operation.
Tries to restart the server locally, upon detecting a problem with the server.
Initiates the hactl(1M) command in the giveup mode upon deciding that the local node cannot reliably run the directory server instance, while the remote probe initiates the hactl(1M) command in the takeover mode. If there are multiple possible masters of the logical host, all of the remote probes invoke the takeover operation in unison. However, after the takeover, the underlying framework ensures that a unique master node is chosen for the Directory Server.
The Sun Cluster HA for Lotus fault probe has two parts--a local probe that runs on the node on which the Lotus Domino server processes are currently running, and a remote probe that runs on all other nodes that are possible masters of the Lotus Domino server's logical host.
Both probes use a simple telnet connection to the Lotus Domino port to check the status of the Domino server. If a probe fails to connect, it initiates a failover or takeover by invoking the hactl(1M) command.
The local fault probe can perform three local restarts before initiating a failover. The local restart mechanism uses a sliding time window algorithm; only when the number of retries is exhausted within that window does a failover occur.
Sun Cluster HA for Tivoli uses only a local fault probe. It runs on the node on which the Tivoli object dispatcher, the oserv daemon, is currently running.
The fault probe uses the Tivoli command wping to check the status of the monitored oserv daemon. The wping of an oserv daemon can fail for the following reasons:
The monitored oserv daemon is not running.
The oserv daemon on the server dies while monitoring a client oserv daemon.
Proper Tivoli roles (authorization) have not been set for the administrative user. See the Sun Cluster 2.2 Software Installation Guide for details about Tivoli.
If the local probe fails to ping the oserv daemon, it initiates a failover by invoking the hactl(1M) command. The fault probe will perform one local restart before initiating a failover.
The Sun Cluster HA for SAP fault probe monitors the availability of the Central Instance, specifically the message server, the enqueue server, and the dispatcher. The probe monitors only the local node by checking for the existence of the critical SAP processes. It also uses the SAP utility lgtst to verify that the SAP message server is reachable.
Upon detecting a problem, such as a process dying prematurely or lgtst reporting an error, the fault probe first tries to restart SAP on the local node a configurable number of times (set through hadsconfig(1M)). If the configured number of restarts has been exhausted, the fault probe initiates a switchover by calling hactl(1M), provided this instance has been configured to allow failover (also set through hadsconfig(1M)). The Central Instance is shut down before the switchover occurs, and then is restarted on the remote node after the switchover is complete.
This appendix describes how to use SNMP to monitor the behavior of a Sun Cluster configuration.
"D.1 Cluster SNMP Agent and Cluster Management Information Base"
"D.3 Cluster SNMP Daemon and Super Monitor Daemon Operation"
This appendix includes the following procedures:
"D.7.1 How to Use the SNMP Agent With SunNet Manager to Monitor Clusters"
"D.7.2 How to Reconfigure smond to Monitor a Different Cluster"
You can use the following SNMP management solutions to monitor Sun Cluster configurations:
Sun Cluster SNMP Agent
Domain Manager
Enterprise Manager
Sun Net Manager
SNMP-compliant HP OpenView
Sun Cluster includes a Simple Network Management Protocol (SNMP) agent, along with a Management Information Base (MIB), for the cluster. The name of the agent file is snmpd (SNMP daemon) and the name of the MIB is sun.mib.
The cluster SNMP agent is a proxy agent that can monitor several clusters (a maximum of 32) at the same time. You can manage a typical Sun Cluster from the administrative workstation or System Service Processor (SSP). Installing the cluster SNMP agent on the administrative workstation or SSP regulates network traffic and avoids wasting the CPU power of the nodes on transmitting SNMP packets.
The snmpd daemon:
Is an RFC 1157-compliant SNMP agent.
Is dedicated to supporting the Sun Cluster (SC) MIB extensions under the enterprise group of Sun Microsystems, Inc.
Provides the cluster sun.mib in ASCII format.
Supports SNMP protocol operations including GET-REQUEST, GETNEXT-REQUEST, and TRAP.
Provides the Super Monitor agent smond for data collection.
The Super Monitor daemon, smond, collects hardware configuration information and critical cluster events by connecting to the in.mond daemon from each of the member nodes of the cluster(s). The smond daemon then reports the same information to the SNMP daemon (snmpd).
You need to configure only one smond daemon to collect cluster information for several clusters.
The SUNWcsnmp package contains the following:
/opt/SUNWcluster/bin/snmpd and /opt/SUNWcluster/bin/smond binaries
ASCII /opt/SUNWcluster/etc/sun.mib file
/opt/SUNWcluster/bin/init.snmpd script (snmpd control)
/var/opt/SUNWcluster/snmpd.conf file (SNMP configuration)
/opt/SUNWcluster/etc/snmp.traps file (SNMP traps)
/opt/SUNWcluster/etc/sun-snmp.schema file (SunNet Manager schema)
/opt/SUNWcluster/bin/smond_conf script (smond configuration)
/opt/SUNWcluster/bin/smond_ctl script (smond control)
Applicable man pages
For additional information on the snmpd and smond daemons, see Appendix B, Sun Cluster Man Page Quick Reference.
The Management Information Base (MIB) is a collection of objects that can be accessed through a network management protocol. Objects should be defined in a generic, consistent manner so that various management platforms can read and parse the definitions.
Run the snmpd daemon on the management server, which is the cluster administration workstation, or on any client. This agent provides information (gathered from smond) for all the SNMP attributes defined in the cluster MIB. This MIB file is typically compiled into an "SNMP-aware" network manager like the SunNet Manager Console. See "D.5 Changing the snmpd.conf File".
The sun.mib file provides information about clusters in the following tables:
clustersTable
clusterNodesTable
switchesTable
portsTable
lhostTable
dsTable
dsinstTable
In the tables that follow, time refers to the local time on the SNMP server (on which the table is maintained). Thus, the time indicates when an attribute change was recorded on the server.
The clusters table consists of entries for all of the monitored clusters. Each entry in the table contains specific attributes that provide cluster information. See Table D-1 for the clustersTable attributes.
Table D-1 clustersTable Attributes
| Attribute Names | Description |
|---|---|
| clusterName | The name of the cluster |
| clusterDescr | A description of the cluster |
| clusterVersion | The release version of the cluster |
| numNodes | The number of nodes in the cluster |
| nodeNames | The names of all the nodes in the cluster, separated by commas |
| quorumDevices | The names of all the quorum devices in the cluster, separated by commas |
| clusterLastUpdate | The last time any of the attributes of this entry were modified |
The cluster nodes table consists of the known nodes of all of the monitored clusters. Each entry contains specific information about the node. See Table D-2 for the clusterNodesTable attributes.
When using a cross-reference, note that the belongsToCluster attribute acts as the key reference between this table and the clustersTable.
Table D-2 clusterNodesTable Attributes

| Attribute Names | Description |
|---|---|
| nodeName | The host name of the node. |
| belongsToCluster | The name of the cluster (to which this node belongs). |
| scState | State of the Sun Cluster software component on this node (Stopped, Aborted, In Transition, Included, Excluded, or Unknown). An enterprise-specific trap signals a change in state. |
| vmState | State of the volume manager software component on this node. An enterprise-specific trap signals a change in state. |
| dbState | State of the database software component on this node (Down, Up, or Unknown). An enterprise-specific trap signals a change in state. |
| vmType | The type of volume manager currently being used on this node. |
| vmOnNode | Mode of the SSVM software component on this node (Master, Slave, or Unknown). An enterprise-specific trap signals a change in state. This attribute is not valid for clusters with other volume managers. |
| nodeLastUpdate | The last time any of the attributes of this entry were modified. |
The switches table consists of entries for all of the switches. Each entry in the table contains specific information about a switch in a cluster. See Table D-3 for the switchesTable attributes.
Table D-3 switchesTable Attributes
| Attribute Names | Description |
|---|---|
| switchName | The name of the switch |
| numPorts | The number of ports on the switch |
| connectedNodes | The names of all the nodes that are presently connected to the ports of the switch |
| switchLastUpdate | The last time any of the switch attributes of this entry were modified |
The ports table consists of entries for all of the switch ports. Each entry in the table contains specific information about a port within a switch. See Table D-4 for the portsTable attributes.
When using a cross-reference, note that the belongsToSwitch attribute acts as the key reference between this table and the switchesTable.
Table D-4 portsTable Attributes

| Attribute Names | Description |
|---|---|
| portId | The port ID or number |
| belongsToSwitch | The name of the switch (to which this port belongs) |
| connectedNode | The name of the node (to which this port is presently connected) |
| nodeAdapterId | The adapter ID (of the SCI card) on the node to which this port is connected |
| portStatus | The status of the port (Active, Inactive, and so forth) |
| portLastUpdate | The last time any of the port attributes of this entry were modified |
The logical hosts table consists of entries for each logical host configured in the cluster. See Table D-5 for the lhostTable attributes.
Table D-5 lhostTable Attributes
| Attribute Names | Description |
|---|---|
| lhostName | The name of the logical host |
| lhostMasters | The list of node names that constitute the logical host |
| lhostCurrMaster | The name of the node that is currently the master for the logical host |
| lhostDS | The list of data services configured to run on this logical host |
| lhostDG | The disk groups configured on this logical host |
| lhostLogicalIP | The logical IP address associated with this logical host |
| lhostStatus | The current status of the logical host (UP or DOWN) |
| lhostLastUpdate | The last time any of the attributes of this entry were modified |
The data services table consists of entries for all data services that are configured for all logical hosts in the monitored clusters. Each entry in the table consists of specific information about a data service configured on a logical host. See Table D-6 for the dsTable attributes.
When using a cross-reference, note that the dsonLhost attribute acts as a key reference between this table and the lhostTable.
Table D-6 dsTable Attributes

| Attribute Names | Description |
|---|---|
| dsName | The name of the data service. |
| dsOnLhost | The name of the logical host on which the data service is configured. |
| dsReg | The value is 1 or 0, depending on whether the data service is registered and configured to run (1) or not (0). |
| dsStatus | The current status of the data service (ON/OFF/INST DOWN). |
| dsDep | The list of other data services on which this data service depends. |
| dsPkg | The package name for the data service. |
| dsLastUpdate | The last time any of the attributes of this entry were modified. |
The data service instance table consists of entries for all data service instances. See Table D-7 for the dsinstTable attributes.
When using a cross-reference, note that the dsinstOfDS attribute can be used as a key reference between this table and the dsTable. Similarly, the dsinstOnLhost attribute can be used as a key reference between this table and the lhostTable.
Table D-7 dsinstTable Attributes

| Attribute Names | Description |
|---|---|
| dsinstName | The name of the data service instance |
| dsinstOfDS | The name of the data service of which this is an instance |
| dsinstOnLhost | The name of the logical host on which this data service instance is running |
| dsinstStatus | The status of the data service instance |
| dsinstLastUpdate | The last time any of the attributes of this entry were modified |
The SNMP daemon operates under the following provisions:
The smond daemon connects to in.mond on all of the requested cluster nodes.
The smond daemon passes the collected config and syslog information to the snmpd daemon.
The snmpd daemon fills in the cluster MIB tables (which are available to clients through SNMP GET operations).
The snmpd daemon sends out enterprise-specific traps for critical cluster events when notified by smond syslog data.
SNMP traps are asynchronous notifications generated by the SNMP agent that indicate an unintended change in the state of monitored objects.
The software generates Sun Cluster-specific traps for critical cluster events. These events are listed in the following tables.
Table D-8 lists the Sun Cluster traps reflecting the state of the cluster software on a node.
Table D-8 Sun Cluster Traps Reflecting the Software on a Node
| Trap Number | Trap Name |
|---|---|
| 0 | sc:stopped |
| 1 | sc:aborted |
| 2 | sc:in_transition |
| 3 | sc:included |
| 4 | sc:excluded |
| 5 | sc:unknown |
Table D-9 lists the Sun Cluster traps reflecting the state of the volume manager on a node.
Table D-9 Sun Cluster Traps Reflecting the Volume Manager on a Node

| Trap Number | Trap Name |
|---|---|
| 10 | vm:down |
| 11 | vm:up |
| 12 | vm:unknown |
Table D-10 lists the Sun Cluster traps reflecting the state of the database on a node.
Table D-10 Sun Cluster Traps Reflecting the Database on a Node
| Trap Number | Trap Name |
|---|---|
| 20 | db:down |
| 21 | db:up |
| 22 | db:unknown |
Table D-11 lists the Sun Cluster traps reflecting the nature of the Cluster Volume Manager (master or slave) on a node.
Table D-11 Sun Cluster Traps Reflecting Cluster Volume Manager on a Node
| Trap Number | Trap Name |
|---|---|
| 30 | vm_on_node:master |
| 31 | vm_on_node:slave |
| 32 | vm_on_node:unknown |
Table D-12 lists the Sun Cluster traps reflecting the states of a logical host.
Table D-12 Sun Cluster Traps Reflecting the States of a Logical Host
| Trap Number | Trap Name |
|---|---|
| 40 | lhost:givingup |
| 41 | lhost:given |
| 42 | lhost:takingover |
| 43 | lhost:taken |
| 46 | lhost:unknown |
Table D-13 lists the Sun Cluster traps reflecting the states of a data service instance.
Table D-13 Sun Cluster Traps Reflecting the States of a Data Service Instance
| Trap Number | Trap Name |
|---|---|
| 50 | ds:started |
| 51 | ds:stopped |
| 52 | ds:in-transition |
| 53 | ds:failed-locally |
| 54 | ds:failed-remotely |
| 57 | ds:unknown |
Table D-14 lists the Sun Cluster traps reflecting the states of the HA-NFS data service.
Table D-14 Sun Cluster Traps Reflecting the States of the HA-NFS Data Service Instance
| Trap Number | Trap Name |
|---|---|
| 60 | hanfs:start |
| 61 | hanfs:stop |
| 70 | hanfs:unknown |
Table D-15 lists the Sun Cluster traps reflecting SNMP errors.
Table D-15 Sun Cluster Traps Reflecting SNMP Errors
| Trap Number | Trap Name |
|---|---|
| 100 | SOCKET_ERROR:node_out_of_system_resources |
| 101 | CONNECT_ERROR:node_out_of_system_resources |
| 102 | BADMOND_ERROR:node_running_bad/old_mond_version |
| 103 | NOMOND_ERROR:mond_not_installed_on_node |
| 104 | NOMONDYET_ERROR:mond_on_node_not_responding:node_may_be_rebooting |
| 105 | TIMEOUT_ERROR:timed_out_upon_trying_to_connect_to_nodes_mond |
| 106 | UNREACHABLE_ERROR:node's_mond_unreachable:network_problems?? |
| 107 | READFAILED_ERROR:node_out_of_system_resources |
| 108 | NORESPONSE_ERROR:node_out_of_system_resources |
| 109 | BADRESPONSE_ERROR:unexpected_welcome_message_from_node's_mond |
| 110 | SHUTDOWN_ERROR:node's_mond_shutdown |
| 200 | Fatal:super_monitor_daemon(smond)_exited! |
For trap numbers 100-110, check the faulty node and fix the problem. For trap number 200, see "D.8 SNMP Troubleshooting".
The snmpd.conf file is used for configuration information. Each entry in the file consists of a keyword followed by a parameter string. The default values in the file should suit your needs.
Edit the snmpd.conf file.
For descriptions of the keywords, refer to the snmpd(7) man page.
After making any changes to the snmpd.conf file, stop and then restart the smond and snmpd daemons by entering:
# /opt/SUNWcluster/bin/smond_ctl stop
# /opt/SUNWcluster/bin/init.snmpd stop
# /opt/SUNWcluster/bin/init.snmpd start
# /opt/SUNWcluster/bin/smond_ctl start
An example snmpd.conf file follows.
sysdescr        Sun SNMP Agent, SPARCstation 10, Company Property Number 123456
syscontact      Coby Phelps
sysLocation     Room 123
#
system-group-read-community    public
system-group-write-community   private
#
read-community    all_public
write-community   all_private
#
trap              localhost
trap-community    SNMP-trap
#
#kernel-file      /vmunix
#
managers          lvs golden
By default, the cluster SNMP agent listens on User Datagram Protocol (UDP) Port 161 for requests from the SNMP manager, for example, SunNet Manager Console. You can change this port by using the -p option to the snmpd and smond daemons.
Both the snmpd and smond daemons must be configured on the same port in order to function properly.
If you are installing the cluster SNMP agent on an SSP or an Administrative workstation running the Solaris 2.6 Operating Environment or compatible versions, always configure the snmpd and the smond programs on a port other than the default UDP port 161.
For example, on the SSP, the cluster SNMP agent interferes with the SSP SNMP agent, which also uses UDP port 161. This interference could result in the loss of RAS features of the Sun Enterprise 10000 server.
To configure the cluster SNMP agent on a port other than the default UDP Port 161, perform the following steps.
Edit the /opt/SUNWcluster/bin/init.snmpd file and change the value of the CSNMP_PORT variable from 161 to the desired value.
Edit the /opt/SUNWcluster/bin/smond_ctl file and change the value of the CSNMP_PORT variable from 161 to the same value you chose in Step 1.
Stop and then restart both the snmpd and smond daemons for the changes to take effect.
# /opt/SUNWcluster/bin/smond_ctl stop
# /opt/SUNWcluster/bin/init.snmpd stop
# /opt/SUNWcluster/bin/smond_ctl start
# /opt/SUNWcluster/bin/init.snmpd start
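The CSNMP_PORT edits in Steps 1 and 2 might be scripted as shown below. This is a hedged sketch applied to a sample line rather than to the real init.snmpd and smond_ctl files, and the alternate port value is an invented example.

```shell
# Hedged sketch of the CSNMP_PORT edit, applied to a sample fragment.
NEW_PORT=5161                     # example alternate port, not a requirement
SAMPLE='CSNMP_PORT=161'           # stand-in for the line in the real files
EDITED=$(echo "$SAMPLE" | sed "s/^CSNMP_PORT=161/CSNMP_PORT=${NEW_PORT}/")
echo "$EDITED"
```

The same substitution would be applied to both files so that snmpd and smond remain configured on the same port.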
Configuration files specific to the SNMP manager might need to be edited for the SNMP manager to become aware of the new port number. Refer to your SNMP manager documentation for more information. Alternatively, you can configure the master SNMP agent on the administrative workstation to start the cluster SNMP proxy agent as a subagent on a port other than 161. See the Solstice Enterprise Agents User's Guide or the snmpdx(1M) man page for information on how to configure the master SNMP agent.
The cluster SNMP agent has been qualified with the SunNet Manager. Perform the following procedures prior to using SunNet Manager to monitor clusters.
These procedures assume you are using UDP port 161 for SNMP. If you changed the port number as described in "D.6 Configuring the Cluster SNMP Agent Port", you need to run the SunNet Manager SNMP proxy agent, na.snmp, to use the alternate port.
Copy the cluster MIB /opt/SUNWcluster/etc/sun.mib to /opt/SUNWconn/snm/agents/cluster.mib on the SunNet Manager console.
On the SunNet Manager console run mib2schema on the copied cluster.mib file:
# /opt/SUNWconn/snm/bin/mib2schema cluster.mib
On the Sun Cluster Administrative workstation, edit the snmpd.conf file and set the parameter string in the trap keyword to the name of the SunNet Manager console.
For more information on editing the snmpd.conf file, refer to "D.5 Changing the snmpd.conf File".
Run the smond_conf command on the Sun Cluster Administrative workstation for each cluster you want to monitor. For example:
# /opt/SUNWcluster/bin/smond_conf -h [clustername ...]
Set the proxy for cluster-snmp to be the name of the SunNet Manager console.
In order to monitor clusters, you must also monitor the Administrative workstation using SunNet Manager.
You can reconfigure the smond daemon to monitor a different cluster.
Stop the snmpd daemon by using:
# /opt/SUNWcluster/bin/init.snmpd stop
Reconfigure the smond daemon by using:
# /opt/SUNWcluster/bin/smond_conf -h [clustername ...]
Start the snmpd daemon by using:
# /opt/SUNWcluster/bin/init.snmpd start
Start the smond daemon by using:
# /opt/SUNWcluster/bin/smond_ctl start
If the Cluster MIB tables are not filled in your application, or if you receive trap number 200, be sure that the snmpd and smond daemons are running by entering:
# ps -ef | grep snmpd
# ps -ef | grep smond
You do not see any output if the daemons are not running.
If the daemons are not running, enter:
# /opt/SUNWcluster/bin/init.snmpd start
# /opt/SUNWcluster/bin/smond_ctl start