Sun Cluster 3.0 Hardware Guide

Appendix A Verifying Sun Cluster Hardware Redundancy

This appendix provides tests for verifying and validating the high availability (HA) of your Sun Cluster configuration. The tests assume that you have installed the Sun Cluster hardware, the Solaris operating environment, and the Sun Cluster software. All nodes should be booted as cluster members.

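This appendix contains the following procedures:

How to Test Nodes Using a Power-off Method
How to Test Cluster Interconnects
How to Test Network Adapter Failover Groups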

If your cluster passes these tests, your hardware has adequate redundancy: nodes, cluster transport cables, and Network Adapter Failover (NAFO) groups are not single points of failure.

To perform the tests in "How to Test Nodes Using a Power-off Method" and "How to Test Cluster Interconnects", you must first identify the device groups that each node masters. Perform these tests on every pair of nodes that shares a disk device group. Each pair has a primary and a secondary for a particular device group. Use the scstat(1M) command to determine the initial primary and secondary.

For conceptual information on primary, secondary, failover, device groups, or cluster hardware, see Sun Cluster 3.0 Concepts.
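
As a rough sketch of how the initial primary and secondary might be read from scstat output, the helper below extracts both node names for one device group. It assumes that the server line has the layout "Device group servers:  <group>  <primary>  <secondary>"; confirm this against the output on your own cluster. The function name dg_servers and the names dg-schost-1, phys-schost-1, and phys-schost-2 are illustrative only.

```shell
#!/bin/sh
# dg_servers GROUP
# Read scstat-style text on stdin and print "<primary> <secondary>" for GROUP.
# Assumed line layout (verify on your release):
#   Device group servers:  dg-schost-1  phys-schost-2  phys-schost-1
dg_servers() {
    awk -v dg="$1" '
        $1 == "Device" && $2 == "group" && $3 == "servers:" && $4 == dg {
            print $5, $6
        }'
}

# Sample line standing in for real output (hypothetical names).
sample='  Device group servers:  dg-schost-1  phys-schost-2  phys-schost-1'
printf '%s\n' "$sample" | dg_servers dg-schost-1
```

On a cluster node, the same function could be fed live output by piping scstat into it. Record the result before you start the tests so that you know which node to power off and which node should take over.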

Testing Node Redundancy

This section provides the procedure for testing node redundancy and high availability of device groups. Perform the following procedure to confirm that the secondary takes over the device group mastered by the primary when the primary fails.

How to Test Nodes Using a Power-off Method

Power off the primary node.

Cluster interconnect error messages appear on the consoles of the remaining nodes.

On another node, run the scstat command to verify that the secondary took ownership of the device group mastered by the primary.

Look for the output that shows the device group ownership.
# scstat

Power on the initial primary and boot into cluster mode.

Wait for the system to boot. The system automatically starts the membership monitor software. The node then rejoins the configuration.

If you have the device group failback option enabled, skip Step 4 because the system boot process moves ownership of the device group back to the initial primary. Otherwise, proceed to Step 4 to move ownership of the device group back to the initial primary. Use the scconf -p command to determine if your device group has the device group failback option enabled.

If you do not have the device group failback option enabled, from the initial primary, run the scswitch(1M) command to move ownership of the device group back to the initial primary.
# scswitch -S -h nodename

Verify that the initial primary has ownership of the device group.

Look for the output that shows the device group ownership.
# scstat
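
The two ownership checks in this procedure can be sketched as a small test that succeeds only when a given node is listed as primary for a device group. This is a minimal sketch, assuming the same "Device group servers:" line layout as scstat output on your cluster; the function name primary_is, the node and device group names, and the "-" shown for a missing secondary are hypothetical.

```shell
#!/bin/sh
# primary_is GROUP NODE
# Succeed (exit 0) if scstat-style text on stdin lists NODE as primary of GROUP.
# Assumed line layout (verify on your release):
#   Device group servers:  <group>  <primary>  <secondary>
primary_is() {
    awk -v dg="$1" -v node="$2" '
        $1 == "Device" && $2 == "group" && $3 == "servers:" && $4 == dg {
            found = ($5 == node)
        }
        END { exit (found ? 0 : 1) }'
}

# Hypothetical snapshots: after the power-off, and after ownership moved back.
after_poweroff='  Device group servers:  dg-schost-1  phys-schost-1  -'
after_recovery='  Device group servers:  dg-schost-1  phys-schost-2  phys-schost-1'

printf '%s\n' "$after_poweroff" | primary_is dg-schost-1 phys-schost-1 \
    && echo "takeover: secondary now masters dg-schost-1"
printf '%s\n' "$after_recovery" | primary_is dg-schost-1 phys-schost-2 \
    && echo "recovery: initial primary masters dg-schost-1 again"
```

Because the function reports its result through the exit status, it can also be used in a loop that waits until ownership has moved before you continue with the next test.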

Testing Cluster Interconnect and Network Adapter Failover Group Redundancy

This section provides the procedure for testing cluster interconnect and Network Adapter Failover (NAFO) group redundancy.

How to Test Cluster Interconnects

Disconnect one of the cluster transport cables from a node that masters a device group.

Messages appear on the consoles of each node, and error messages appear in the /var/adm/messages file. If you run the scstat(1M) command, the output shows that the Sun Cluster software assigned a faulted status to the cluster transport path that you disconnected. This fault does not result in a failover.

Disconnect the remaining cluster transport cable from the primary node you identified in Step 1.

Messages appear on the consoles of each node, and error messages appear in the /var/adm/messages file. If you run the scstat command, the output shows that the Sun Cluster software assigned a faulted status to the cluster transport path that you disconnected. This action causes the primary node to go down, resulting in a partitioned cluster.

For conceptual information on failure fencing or split brain, see Sun Cluster 3.0 Concepts.

On another node, run the scstat command to verify that the secondary node took ownership of the device group mastered by the primary.
# scstat

Reconnect all cluster transport cables.

Boot the initial primary, which you identified in Step 1, into cluster mode.
{0} ok boot

If you have the device group failback option enabled, skip Step 7 because the system boot process moves ownership of the device group back to the initial primary. Otherwise, proceed to Step 7 to move ownership of the device group back to the initial primary. Use the scconf -p command to determine if your device group has the device group failback option enabled.

Verify that the Sun Cluster software assigned a path online status to each cluster transport path that you reconnected in Step 4.
# scstat

If you do not have the device group failback option enabled, move ownership of the device group back to the initial primary.
# scswitch -S -h nodename
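
The transport path checks in this procedure can be sketched as a filter that lists every path whose status is anything other than online. This is a minimal sketch, assuming that scstat reports each path on a line of the form "Transport path:  <endpoint>  <endpoint>  <status>" and that a healthy path reads "Path online"; the function name faulted_paths and the endpoint names are hypothetical.

```shell
#!/bin/sh
# faulted_paths
# Read scstat-style text on stdin and print every transport path whose
# status is not "Path online".
# Assumed line layout (verify on your release):
#   Transport path:   phys-schost-1:qfe1  phys-schost-2:qfe1  Path online
faulted_paths() {
    awk '
        $1 == "Transport" && $2 == "path:" {
            # The status may be more than one word, so rejoin fields 5..NF.
            status = $5
            for (i = 6; i <= NF; i++) status = status " " $i
            if (status != "Path online") print $3, $4, status
        }'
}

# Hypothetical snapshot taken after one cable was disconnected.
sample='  Transport path:   phys-schost-1:qfe1  phys-schost-2:qfe1  faulted
  Transport path:   phys-schost-1:hme1  phys-schost-2:hme1  Path online'
printf '%s\n' "$sample" | faulted_paths
```

After you disconnect a cable, the filter should list that path. After you reconnect all cables and the paths recover, it should print nothing.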

How to Test Network Adapter Failover Groups

Perform this procedure on each node.

Identify the current active network adapter.
# pnmstat -l

Disconnect one public network cable from the current active network adapter.

Error messages appear on the node's console. This action causes a NAFO failover to a backup network adapter.

From the master console, verify that the Sun Cluster software failed over to the backup NAFO adapter.

A NAFO failover occurred if the backup NAFO adapter displays an active status.
# pnmstat -l

Reconnect the public network cable, and wait for the initial network adapter to come online.

Switch over all IP addresses hosted by the active network adapter to the initial network adapter, and make the initial network adapter the active network adapter.
# pnmset switch adapter
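
The before-and-after comparison in this procedure can be sketched as follows. The sketch assumes that pnmstat -l prints one row per NAFO group with the active adapter in the last column (a header such as "group adapters status fo_time act_adp"); confirm this against the output on your own nodes. The function name active_adapter, the group name nafo0, and the adapter names are hypothetical.

```shell
#!/bin/sh
# active_adapter GROUP
# Read pnmstat-style text on stdin and print the active adapter of GROUP.
# Assumed layout (verify on your release): the group name is the first
# column and the active adapter is the last column of the same row.
active_adapter() {
    awk -v grp="$1" '$1 == grp { print $NF }'
}

# Hypothetical snapshots taken before and after the cable was disconnected.
before='group  adapters   status  fo_time  act_adp
nafo0  qfe0:qfe1  OK      NEVER    qfe0'
after='group  adapters   status  fo_time  act_adp
nafo0  qfe0:qfe1  OK      12       qfe1'

old=$(printf '%s\n' "$before" | active_adapter nafo0)
new=$(printf '%s\n' "$after" | active_adapter nafo0)

if [ "$old" != "$new" ]; then
    echo "nafo0 failed over: $old -> $new"
else
    echo "nafo0 still on $old"
fi
```

The same comparison, run again after you switch back, shows whether the initial adapter is once more the active adapter.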