Sun Cluster 3.0-3.1 Hardware Administration Manual for Solaris OS

Chapter 8 Verifying Sun Cluster Hardware Redundancy

This chapter describes the tests for verifying and validating the high availability (HA) of your Sun Cluster configuration. The tests in this chapter assume that you installed Sun Cluster hardware, the Solaris Operating System, and Sun Cluster software. All nodes should be booted as cluster members.

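This chapter contains the following procedures.
- How to Test Nodes Using a Power-Off Method
- How to Test Cluster Interconnects
- How to Test IP Network Multipathing Groups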

If your cluster passes these tests, your hardware has adequate redundancy. This redundancy means that your nodes, cluster transport cables, and public network adapter groups are not single points of failure. These groups are IP Network Multipathing groups, or Network Adapter Failover (NAFO) groups in Sun Cluster 3.0.

To perform the tests in How to Test Nodes Using a Power-Off Method and How to Test Cluster Interconnects, you must first identify the device groups that each node masters. Perform these tests on all cluster pairs that share a disk device group. Each pair has a primary and a secondary for a particular device group. Use the scstat(1M) command to determine the initial primary and secondary.

For conceptual information on primary, secondary, failover, device groups, or cluster hardware, see your Sun Cluster concepts documentation.

Testing Node Redundancy

This section describes how to test node redundancy and the high availability of device groups. Perform the following procedure to confirm that, when the primary fails, the secondary takes over the device group that the primary masters.

How to Test Nodes Using a Power-Off Method

Before You Begin

To perform these tests, you must first identify the device groups that each node masters. Perform these tests on all cluster pairs that share a disk device group. Each pair has a primary and a secondary for a particular device group. Use the scstat(1M) command to determine the initial primary and secondary.
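You can filter the scstat output to record the initial primary and secondary of every device group before you begin, instead of reading it by eye. The following Bourne shell function is a minimal sketch, not a Sun Cluster utility. The name dg_owners is hypothetical. The function assumes that scstat -D lists each device group on a line of the form "Device group servers: group primary secondary". Check this layout against the output on your own cluster before you rely on it.

```shell
#!/bin/sh
# dg_owners (hypothetical helper): read scstat -D output on standard input
# and print one "device-group primary secondary" line per device group.
# ASSUMPTION: each device group appears on a line such as
#   Device group servers:  dg-schost-1  phys-schost-1  phys-schost-2
dg_owners() {
    awk '/Device group servers:/ {
        # Fields 1-3 hold the label "Device group servers:";
        # fields 4-6 hold the group name, primary, and secondary.
        print $4, $5, $6
    }'
}
```

For example, run scstat -D | dg_owners on any cluster node and save the result. You can then compare device group ownership after each test.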

1. Power off the primary node.

   Cluster interconnect error messages appear on the consoles of the existing nodes.

2. On another node, verify that the secondary took ownership of the device group that the primary mastered.

   Look for the output that shows the device group ownership.
   # scstat

3. Power on the initial primary and boot the node into cluster mode.

   Wait for the system to boot. The system automatically starts the membership monitor software, and the node then rejoins the Sun Cluster configuration.

4. Do you have the device group failback option enabled?

   Use the scconf -p command to determine whether your device group has the device group failback option enabled.
   - If yes, the system boot process moves ownership of the device group back to the initial primary. Skip to Step 6.
   - If no, proceed to Step 5.

5. From the initial primary, move ownership of the device group back to the initial primary.

   In the following command, nodename is the node that currently masters the device group (the former secondary). The -S option evacuates all resource groups and device groups from that node.
   # scswitch -S -h nodename

6. Verify that the initial primary has ownership of the device group.

   Look for the output that shows the device group ownership.
   # scstat
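You can also script the ownership checks in this procedure and the failback question. The sketch below defines two hypothetical functions, and both rely on assumed output formats rather than documented interfaces:
- dg_primary_is assumes that scstat -D lists each device group on a line of the form "Device group servers: group primary secondary".
- dg_failback_enabled assumes that scconf -p prints a "Device group name:" line for each device group. It also assumes that a later "Device group failback enabled:" line in that group's entry ends in yes or no.

```shell
#!/bin/sh
# dg_primary_is GROUP NODE (hypothetical helper): read scstat -D output on
# standard input and return 0 only if NODE is the primary of GROUP.
# ASSUMPTION: ownership appears on a line such as
#   Device group servers:  dg-schost-1  phys-schost-1  phys-schost-2
# where fields 4-6 are the group name, primary, and secondary.
dg_primary_is() {
    awk -v dg="$1" -v node="$2" '
        /Device group servers:/ && $4 == dg { found = 1; primary = $5 }
        END { exit ((found && primary == node) ? 0 : 1) }'
}

# dg_failback_enabled GROUP (hypothetical helper): read scconf -p output on
# standard input and return 0 only if failback is enabled for GROUP.
# ASSUMPTION: each group starts with a "Device group name:" line and later
# contains a "Device group failback enabled:" line ending in yes or no.
dg_failback_enabled() {
    awk -v dg="$1" '
        # Remember which device group the following lines describe.
        /Device group name:/ { current = $NF }
        /Device group failback enabled:/ && current == dg { found = 1; value = $NF }
        END { exit ((found && value == "yes") ? 0 : 1) }'
}
```

For example, after you power off the primary, scstat -D | dg_primary_is dg-schost-1 phys-schost-2 should succeed on the surviving node. Similarly, scconf -p | dg_failback_enabled dg-schost-1 answers the failback question. The names dg-schost-1 and phys-schost-2 are placeholders.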

Testing Cluster Interconnect and IP Network Multipathing Group Redundancy

This section describes how to test the redundancy of the cluster interconnect and of IP Network Multipathing groups.

How to Test Cluster Interconnects

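Before You Begin

Identify the device groups that each node masters. Use the scstat(1M) command to determine the initial primary and secondary for each device group.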

1. Disconnect one of the cluster transport cables from a primary node that masters a device group.

   Messages appear on the consoles of each node, and error messages appear in the /var/adm/messages file. The Sun Cluster software assigns a faulted status to the cluster transport path that you disconnected, and the scstat(1M) command displays this status. This fault does not result in a failover.

2. Disconnect the remaining cluster transport cable from the primary node that you identified in Step 1.

   Messages appear on the consoles of each node, and error messages appear in the /var/adm/messages file. The Sun Cluster software assigns a faulted status to the cluster transport path that you disconnected, and the scstat command displays this status. This action causes the primary node to shut down, which results in a partitioned cluster.

   For conceptual information on failure fencing or split brain, see your Sun Cluster concepts documentation.

3. On another node, verify that the secondary node took ownership of the device group that the primary mastered.
   # scstat

4. Reconnect all cluster transport cables.

5. Boot the initial primary, which you identified in Step 1, into cluster mode.

   For the procedure about how to boot a node, see your Sun Cluster system administration documentation.

6. Do you have the device group failback option enabled?

   Use the scconf -p command to determine whether your device group has the device group failback option enabled.
   - If yes, the system boot process returns ownership of the resource groups and device groups to the initial primary. Skip to Step 8.
   - If no, proceed to Step 7.

7. Move all resource groups and device groups off the current primary. In the following command, from-node is the name of the current primary (the former secondary).
   # scswitch -S -h from-node

8. Verify that the Sun Cluster software assigned a path online status to each cluster transport path that you reconnected in Step 4.
   # scstat
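You can filter the transport section of the output to check transport path status during this procedure, instead of scanning the full scstat output. This sketch assumes that scstat -W prints one line per path in the form "Transport path: endpoint endpoint status". It also assumes that a healthy path shows the status Path online. The function name check_transport_paths is hypothetical.

```shell
#!/bin/sh
# check_transport_paths (hypothetical helper): read scstat -W output on
# standard input, print every transport path whose status is not
# "Path online", and return 0 only if all listed paths are online.
# ASSUMPTION: each path appears on a line such as
#   Transport path:  phys-schost-1:qfe1  phys-schost-2:qfe1  Path online
check_transport_paths() {
    awk '/Transport path:/ {
        # Fields 3 and 4 are the endpoints; the remaining fields are the status.
        status = $5
        for (i = 6; i <= NF; i++) status = status " " $i
        if (status != "Path online") {
            print $3, $4, status
            bad = 1
        }
    }
    END { exit (bad ? 1 : 0) }'
}
```

After you disconnect a cable, scstat -W | check_transport_paths should print the faulted path and return a nonzero status. After you reconnect all cables and boot the initial primary, it should print nothing and return zero.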

How to Test IP Network Multipathing Groups

Perform this test to verify that IP addresses fail over from one adapter to another adapter within the same IP Network Multipathing group.

1. Verify that all network adapters are active.
   # scstat -i

2. Disconnect one public network cable from an active network adapter.

3. Verify that the network adapter status displays as Failed.
   # scstat -i

4. Verify that the IP address failed over to another adapter in the same IP Network Multipathing group.
   # ifconfig -a

5. Reconnect the public network cable to the network adapter.
   - If FAILBACK is set to yes in the /etc/default/mpathd file, the IP address automatically returns to the original network adapter.
   - If FAILBACK is not set to yes, the IP address remains with the new adapter until you perform a manual switchover.

     For the procedure about how to move an IP address to another adapter, see your Sun Cluster system administration documentation.

6. Repeat Step 1 through Step 5 for each active adapter.
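Two small filters can read the adapter status checked in this procedure and the failback setting that decides what happens after you reconnect the cable. The function names ipmp_adapter_status and mpathd_failback are hypothetical. This sketch rests on two assumptions:
- scstat -i prints one line per adapter in the form "IPMP Group: node group group-status adapter adapter-status".
- /etc/default/mpathd sets failback with a "FAILBACK=value" line.

```shell
#!/bin/sh
# ipmp_adapter_status NODE ADAPTER (hypothetical helper): read scstat -i
# output on standard input and print the status of ADAPTER on NODE.
# ASSUMPTION: each adapter appears on a line such as
#   IPMP Group: phys-schost-1  sc_ipmp0  Online  qfe1  Online
# where fields 3-7 are node, group, group status, adapter, adapter status.
ipmp_adapter_status() {
    awk -v node="$1" -v adapter="$2" '
        /IPMP Group:/ && $3 == node && $6 == adapter { print $7 }'
}

# mpathd_failback [FILE] (hypothetical helper): print the FAILBACK value
# from FILE (default /etc/default/mpathd), or "unset" if no uncommented
# FAILBACK= line exists. The last assignment in the file wins.
mpathd_failback() {
    awk -F= '
        /^[ \t]*FAILBACK=/ {
            v = $2
            sub(/[ \t#].*$/, "", v)   # drop trailing blanks or comments
        }
        END {
            if (v == "") v = "unset"
            print v
        }' "${1:-/etc/default/mpathd}"
}
```

For example, after you disconnect the cable from qfe1, scstat -i | ipmp_adapter_status phys-schost-1 qfe1 should print Failed. The node and adapter names are placeholders. If mpathd_failback prints unset, the file has no uncommented FAILBACK line, and the in.mpathd default applies.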