Sun Cluster 2.2 7/00 Release Notes

Documentation Errata

4233113 - Sun Cluster documentation omits information regarding logical host timeout values and how they are used. When you configure the cluster, you set a timeout value for the logical host. This timeout value is used by the CCD when you bring a data service up or down using the hareg(1M) command. The CCD operation occurs in two steps; half of the timeout value is used for each step. Therefore, when configuring START and STOP methods for data services, make sure each method uses no more than half of the timeout value set for the logical host.
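
For example, if the timeout configured for the logical host is 360 seconds, each START and STOP method should complete within about 180 seconds. The following hareg(1M) registration is a hedged illustration only; the data service name, method paths, and the 360-second timeout are assumptions, and the exact -m and -t registration options are described in the hareg(1M) man page.


    # hareg -r mysvc \
        -m START=/opt/mysvc/util/start_svc,STOP=/opt/mysvc/util/stop_svc \
        -t START=180,STOP=180
    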

4330501 - The Sun Cluster 2.2 System Administration Guide, section 4.4, "Disabling Automatic Switchover" indicates that you can disable automatic switchover of logical hosts by using the scconf -m command. This is misleading. You can use scconf -m to disable automatic switchover of logical hosts only if you issue the command when you create the logical hosts initially.

If the logical host already exists, you must remove the logical host and then re-create it using scconf -m, in order to disable automatic switchover.
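
For example, the following hedged sequence removes an existing logical host and then re-creates it with automatic switchover disabled. The cluster name, logical host name, node list, disk group, and network interface names are placeholders; see the scconf(1M) man page for the exact option syntax.


    # scconf sc-cluster -L hahost1 -r
    # scconf sc-cluster -L hahost1 -n phys-hahost1,phys-hahost2 \
        -g dg1 -i hme0,hme0,hahost1 -m
    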

4336091 - Sun Cluster documentation omits information regarding how to set logical unit numbers (LUNs) for A1000 and A3x00 storage devices.

When you add A1000s and A3x00s to a Sun Cluster configuration, you must set the LUNs so that they survive switchover or failover of a cluster node without loss of pseudo-device information. Use the following procedure to ensure that LUNs are set correctly and permanently for these disk types.

  1. On both nodes, install or verify the existence of the RAID manager packages, SUNWosafw, SUNWosamn, SUNWosar, and SUNWosau.

  2. (Solaris 8 only) Install or verify the existence of the RAID manager patch 108553.

    Obtain the patch from your service provider or from the patch web site http://sunsolve.sun.com.

  3. Use the RM6 tool to set up the LUNs on the first node.

    Using the tool's GUI, click "Configuration," then "Module Name," and then the "Create LUN" icon.

  4. Compare the /etc/osa/rdac_address files on both nodes.

    In Step 3, LUNs were assigned to either controller A or B, and the rdac_address file records this assignment. If necessary, modify the rdac_address file on the second node so that the controller assignments match those on the first node (see the example following this procedure).

    Run the following RAID manager command on both nodes.


    # /usr/lib/osa/bin/hot_add
    
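
To compare the rdac_address files in Step 4, one approach is to copy the file from the second node and run diff against the local copy, as in the following hedged example. The node name phys-node2 is a placeholder.


    # rcp phys-node2:/etc/osa/rdac_address /var/tmp/rdac_address.node2
    # diff /etc/osa/rdac_address /var/tmp/rdac_address.node2
    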

4341222 - Chapter 1, sections 1.3.2 and 1.5.6 of the Sun Cluster 2.2 Software Installation Guide do not accurately describe the behavior of CCD quorum during cluster configuration. The documentation should reflect that it is possible to modify the cluster configuration database even when CCD quorum conditions are not met (that is, when more than half of the cluster nodes do not have a valid CCD).

Typically, the cluster software requires a quorum before updating the CCD. This requirement is highly restrictive in configurations that use logical hosts. Therefore, to overcome this limitation in Sun Cluster 2.2, all administrative and configuration commands related to logical hosts and data services that update the CCD can be executed without CCD quorum. Such commands include hareg(1M) and scconf(1M) operations.

To prevent loss of any CCD updates, always make sure that the last node to leave the cluster during cluster shutdown is the first node to rejoin the cluster at startup.
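
If you cannot guarantee this ordering, consider taking a checkpoint of the CCD before shutting the cluster down so that it can be restored if updates are lost. The following is a hedged sketch only; it assumes the ccdadm(1M) checkpoint option (-c), and the cluster name and checkpoint file name are placeholders.


    # ccdadm sc-cluster -c /var/tmp/ccd.checkpoint
    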

4342066 - The procedure "How to Change the Name of a Cluster Node" in section 3.2 of the Sun Cluster 2.2 System Administration Guide is incorrect and should not be used. The correct procedure involves changing various framework files which should not be altered manually by anyone other than your service representative. If you need to change the name of a cluster node, contact your service provider for assistance.

4342236 - The abort_net method is described incorrectly in the Sun Cluster 2.2 API Developer's Guide. The documentation states that the abort_net method can be used to execute "last wishes" cleanup code before a cluster is stopped. This is incorrect.

Instead, the abort_net method is called by the clustd daemon when a node is about to abort from the cluster, typically in a split-brain situation when the node in question is the loser in the race for the quorum device (see Chapter 1 in the Sun Cluster 2.2 Software Installation Guide for more information about quorum devices). In such a case, first abort_net is called, then the network is taken down, and finally abort is called. However, these methods are executed on the node that is aborting only if the node owns the data service associated with the methods. (That is, if the aborting node does not own any logical host, then it will not execute any of the abort methods associated with a data service.) The aborting node will stop the cluster software, but the node itself will remain up.

Note that stop and stop_net methods are called each time the cluster reconfigures itself (due to nodes joining or leaving the cluster), as part of normal cluster operation.
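
The following is a minimal sketch of what an abort_net method might look like for a user-written data service. It assumes the method is an executable script registered with hareg(1M); the service name, log message, and cleanup commands are placeholders and are not part of the Sun Cluster API.


    #!/bin/sh
    # Hypothetical abort_net method. clustd invokes it while the network is
    # still up, just before this node aborts from the cluster, and only if
    # this node currently masters a logical host used by the data service.
    
    logger -p daemon.notice "mysvc abort_net: quick cleanup before abort"
    
    # Placeholder cleanup: stop the client-facing daemon quickly so that no
    # stale state survives the abort. Replace with service-specific code.
    if [ -f /var/run/mysvc.pid ]; then
        kill -TERM `cat /var/run/mysvc.pid` 2>/dev/null
    fi
    
    exit 0
    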

4343021 - In Chapter 14 in the Sun Cluster 2.2 Software Installation Guide, the documented installation procedure for Sun Cluster HA for NetBackup is incorrect. The correct procedure is as follows.

  1. Install Sun Cluster 2.2 7/00 Release using the procedures documented in Chapter 3 of the Sun Cluster 2.2 Software Installation Guide.

  2. Stop the cluster by running the following command on all nodes, sequentially.


    # scadmin stopnode
    

  3. On all nodes, install VERITAS NetBackup, using the procedures documented in Chapter 14 of the Sun Cluster 2.2 Software Installation Guide.

  4. On all nodes, install Sun Cluster patch 109214, which enhances the scinstall(1M) command to recognize Sun Cluster HA for NetBackup. The patch is available from your service provider or from the Sun patch web site http://sunsolve.sun.com.

  5. On all nodes, re-run the scinstall command and install the Sun Cluster HA for NetBackup data service.


    # scinstall
    

  6. On all nodes, install Sun Cluster patches 108450 and 108423, which enhance the hadsconfig(1M) command to recognize Sun Cluster HA for NetBackup.

  7. Start the cluster. On the first node, run the following command:


    # scadmin startcluster
    

    Sequentially, on all other nodes, run the following command:


    # scadmin startnode
    

  8. Register the data service by running the following command on one node only.


    # hareg -s -r netbackup
    
  9. Configure the Sun Cluster HA for NetBackup data service by running the hadsconfig command on one node only. See Chapter 14 in the Sun Cluster 2.2 Software Installation Guide for configuration parameters to supply to hadsconfig.


    # hadsconfig
    
  10. Activate the data service by running the following command on one node only.


    # hareg -y netbackup
    
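
As an optional check that is not part of the documented procedure, you can verify the registration and state of the data service. Running hareg(1M) with no arguments lists the registered data services and indicates whether each is on or off, and hastat(1M) reports overall cluster and data service status.


    # hareg
    # hastat
    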

4343093 - Chapter 2, section 2.6.5, in the Sun Cluster 2.2 Software Installation Guide states that Sun Cluster 2.2 must run in C locale. This is incorrect. Sun Cluster 2.2 7/00 Release can run in C, fr (French), ko (Korean), and ja (Japanese) locales.

4344711 - Appendix C in the Sun Cluster 2.2 Software Installation Guide contains incorrect or incomplete information about configuring VERITAS Volume Manager. These errors are described in more detail below.

The document mentions only VxFS file systems, and omits information about UFS file systems. In Sun Cluster configurations, UFS file systems can be created in a similar fashion to VxFS file systems. See your system administration documentation for more information about creating and administering UFS file systems.
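
For example, the following hedged command creates a UFS file system on a VERITAS volume; the disk group and volume names are placeholders.


    # newfs /dev/vx/rdsk/dg1/vol01
    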

When using the mkfs(1M) command to create or administer VxFS file systems, use the fully qualified path to the command, /usr/lib/fs/vxfs/mkfs. The documentation omits this information wherever the mkfs command is described.
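
For example, the following hedged command uses the fully qualified path to create a VxFS file system; the disk group and volume names are placeholders.


    # /usr/lib/fs/vxfs/mkfs /dev/vx/rdsk/dg1/vol01
    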

In section C.3, "Configuring VxFS File Systems on the Multihost Disks," the procedure contains erroneous steps. The correct procedure follows. Use this procedure after creating logical hosts as described in the scconf(1M) man page or in Chapter 3, "Installing and Configuring Sun Cluster Software."

  1. Take ownership of the disk group containing the volume by using the vxdg(1M) command to import the disk group to the active node.


    phys-hahost1# vxdg import diskgroup
    

  2. Run the following scconf(1M) command on each cluster node.

    This scconf command creates a volume for the administrative file system, creates a file system within that volume, and creates mount points for that volume in the root file system ("/"). It also creates dfstab.logicalhost and vfstab.logicalhost files in /etc/opt/SUNWcluster/conf/hanfs, and adds an appropriate entry for the administrative file system to the vfstab.logicalhost file.


    phys-hahost1# scconf clustername -F logicalhost
    

  3. Create file systems for all volumes. These volumes will be mounted by the logical hosts.


    phys-hahost1# mkfs -F vxfs /dev/vx/rdsk/diskgroup/volume
    

  4. Update the vfstab.logicalhost file to include entries for the file systems created in Step 3 (see the sample entry following this procedure).

  5. Create mount points for the file systems created in Step 3.


    phys-hahost1# mkdir /logicalhost/volume
    

  6. Import the disk groups to their default masters.

    It is most convenient to create and populate disk groups from the active node that is the default master of the particular disk group.

    Import each disk group onto the default master node using the -t option. The -t option is important, as it prevents the import from persisting across the next boot.


    phys-hahost1# vxdg -t import diskgroup
    

  7. (Optional) To make file systems NFS-sharable, refer to Chapter 11, "Installing and Configuring Sun Cluster HA for NFS."
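
For Step 4, an entry in the vfstab.logicalhost file might look like the following. This is a hedged sample only; the diskgroup, volume, and logicalhost names are placeholders matching the names used in the steps above, and the field values, in particular the fsck pass, might need adjustment for your configuration.


    /dev/vx/dsk/diskgroup/volume /dev/vx/rdsk/diskgroup/volume /logicalhost/volume vxfs - no -
    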

4345750 - In Chapter 14 of the Sun Cluster 2.2 System Administration Guide, in the procedure "How to Replace a Sun StorEdge A5000 Disk (VxVM)," Steps 3 and 4 are not valid for a cluster that runs on the Solaris 7 11/99 operating environment or later. In Step 3, you should run the luxadm remove_device command on only one of the nodes connected to the array. Running the command on additional nodes is unnecessary and generates error messages. In Step 4, after you physically replace the disk, do not run the luxadm insert_device command. This command is not necessary.

If your cluster runs on a Solaris operating environment earlier than the Solaris 7 11/99 release, Steps 3 and 4 are still valid as documented.
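
For example, on one node connected to the array, Step 3 might run a command similar to the following hedged illustration; the enclosure name and slot designation are placeholders.


    # luxadm remove_device macs1,r1
    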

4356674 - In Chapter 14 of the Sun Cluster 2.2 System Administration Guide, the procedure "How to Replace a Sun StorEdge A5000 Disk (Solstice DiskSuite)" contains errors in Steps 2, 3, 11, 12, and 13. In these steps, the directory /tmp should be replaced with /var/tmp, and physical device names should be replaced with did device names. The corrected procedure, in its entirety, is as follows.

  1. Identify all metadevices or applications that use the failing disk.

    If the metadevices are mirrored or RAID 5, the disk can be replaced without stopping the metadevices. Otherwise, stop all I/O to the disk using the appropriate commands. For example, use the umount(1M) command to unmount a file system on a stripe or concatenation. An example of identifying the affected metadevices follows this procedure.

  2. Preserve the disk label, if necessary. For example:


    # prtvtoc /dev/rdsk/c1t3d0s2 > /var/tmp/c1t3d0.vtoc
    

  3. (Optional) Use metareplace to replace the disk slices if the disk has not been hot-spared. For example:


    # metareplace d1 /dev/did/dsk/d23 /dev/did/dsk/d88
    d1: device d23 is replaced with d88
    

  4. Use luxadm -F to remove the disk.

    The -F option is required because Solstice DiskSuite does not take disks offline. Repeat the command for all hosts, if the disk is multihosted. For example:


    # luxadm remove -F /dev/rdsk/c1t3d0s2
    WARNING!!! Please ensure that no filesystems are mounted on these
    device(s). All data on these devices should have been backed 
    up. The list of devices which will be removed is: 
    1: Box Name "macs1" rear slot 1
    Please enter `q' to Quit or <Return> to Continue: stopping: Drive
    in "macs1" rear slot 1....Done
    offlining: Drive in "macs1" rear  slot 1....Done
    Hit <Return> after removing the device(s).


    Note - The FPM icon for the disk drive to be removed should be blinking. The amber LED under the disk drive should also be blinking.


  5. Remove the disk drive and press Return. The output should look similar to the following:


    Hit <Return> after removing the device(s). 
    Drive in Box Name "macs1" rear slot 1 
    Removing Logical Nodes: 
    Removing c1t3d0s0 Removing c1t3d0s1 Removing c1t3d0s2 Removing
    c1t3d0s3 Removing c1t3d0s4 Removing c1t3d0s5 Removing c1t3d0s6
    Removing c1t3d0s7 Removing c2t3d0s0 Removing c2t3d0s1 Removing
    c2t3d0s2 Removing c2t3d0s3 Removing c2t3d0s4 Removing c2t3d0s5
    Removing c2t3d0s6 Removing c2t3d0s7
    # 

  6. Repeat Step 4 for all nodes, if the disk array is in a multi-host configuration.

  7. Use the luxadm insert command to insert the new disk. Repeat for all nodes. The output should be similar to the following:


    # luxadm insert macs1,r1
    The list of devices which will be inserted is: 
    1: Box Name "macs1" rear slot 1
    Please enter `q' to Quit or <Return> to Continue: Hit <Return>
    after inserting the device(s).

  8. Insert the disk drive and press Return. The output should be similar to the following:


    Hit <Return> after inserting the device(s). Drive in Box Name
    "macs1" rear slot 1  Logical Nodes under /dev/dsk and /dev/rdsk:
    c1t3d0s0 c1t3d0s1 c1t3d0s2 c1t3d0s3 c1t3d0s4 c1t3d0s5 c1t3d0s6
    c1t3d0s7 c2t3d0s0 c2t3d0s1 c2t3d0s2 c2t3d0s3 c2t3d0s4 c2t3d0s5
    c2t3d0s6 c2t3d0s7
    # 


    Note - The FPM icon for the disk drive you replaced should be lit. In addition, the green LED under the disk drive should be blinking.


  9. On all nodes connected to the disk, use scdidadm(1M) to update the DID pseudo device information.

    In this command, DID_instance is the instance number of the disk that was replaced. Refer to the scdidadm(1M) man page for more information.


    # scdidadm -R DID_instance
    

  10. Reboot all nodes connected to the new disk.

    To avoid down time, use the haswitch(1M) command to switch ownership of all logical hosts that can be mastered by the node to be rebooted. For example,


    # haswitch phys-hahost2 hahost1 hahost2
    

  11. Label the disk, if necessary. For example:


    # cat /var/tmp/c1t3d0.vtoc | fmthard -s - /dev/rdsk/c1t3d0s2
    fmthard:  New volume table of contents now in place.

  12. Replace the metadb, if necessary. For example:


    # metadb -s setname -d /dev/did/rdsk/d23s7; 
    metadb -s setname -a -c 3 /dev/did/rdsk/d23s7
    

  13. Enable the new disk slices with metareplace -e. For example:


    # metareplace -e d1 /dev/did/rdsk/d23s0
    d1: device d23s0 is enabled

    This completes the disk replacement procedure.
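
For Step 1, the following hedged commands show one way to identify the affected DID device and the metadevices built on it. The diskset name is a placeholder, and the example assumes the failing disk is c1t3d0, mapped to DID device d23 as in the steps above.


    # scdidadm -l | grep c1t3d0
    # metastat -s setname -p | grep d23
    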

4448815 - In the cports(1M) man page, there is a typo in a file name. The man page currently says: "If an entry for "serialports" has been made in the /etc/nisswitch.conf file, then the order of lookups is ..." The correct file name is /etc/nsswitch.conf.

4448860 - In the chosts(1) man page, there is a typo in a file name. The man page currently says: "If an entry for "clusters" has been made in the /etc/nisswitch.conf file, then the order of lookups is ..." The correct file name is /etc/nsswitch.conf.