Sun Cluster 3.1 - 3.2 Hardware Administration Manual for Solaris OS

Chapter 8 Verifying Sun Cluster Hardware Redundancy

This chapter describes the tests for verifying and demonstrating the high availability (HA) of your Sun Cluster configuration. The tests in this chapter assume that you installed Sun Cluster hardware, the Solaris Operating System, and Sun Cluster software. All nodes should be booted as cluster members.

This chapter contains the following procedures:

  • How to Test Device Group Redundancy Using Resource Group Failover

  • How to Test Cluster Interconnects

  • How to Test Public Network Redundancy

If your cluster passes these tests, your hardware has adequate redundancy. This redundancy means that your nodes, cluster transport cables, and IPMP groups are not single points of failure.

To perform the tests in How to Test Device Group Redundancy Using Resource Group Failover and How to Test Cluster Interconnects, you must first identify the device groups that each node masters. Perform these tests on all cluster pairs that share a disk device group. Each pair has a primary node and a secondary node for a particular device group.

Use one of the following commands to determine the initial primary and secondary:
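
The following commands are a minimal sketch that assumes the standard device group status commands of each release; verify the options against your installed documentation.

    • If you are using Sun Cluster 3.2, use the following command:


      # cldevicegroup status

    • If you are using Sun Cluster 3.1, use the following command:


      # scstat -D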

For conceptual information about primary nodes, secondary nodes, failover, device groups, or cluster hardware, see your Sun Cluster concepts documentation.

Testing Node Redundancy

This section provides the procedure for testing node redundancy and high availability of device groups. Perform the following procedure to confirm that the secondary node takes over the device group that is mastered by the primary node when the primary node fails.

How to Test Device Group Redundancy Using Resource Group Failover

Before You Begin

This procedure provides the long forms of the Sun Cluster commands. Most commands also have short forms. Except for the forms of the command names, the commands are identical. For a list of the commands and their short forms, see Appendix A, Sun Cluster Geographic Edition Object-Oriented Commands.

To perform this procedure, become superuser or assume a role that provides solaris.cluster.modify RBAC authorization.

  1. Create an HAStoragePlus resource group with which to test.

    • If you are using Sun Cluster 3.2, use the following commands (a note on bringing the new resource group online follows this step):


      # clresourcegroup create testgroup
      # clresourcetype register SUNW.HAStoragePlus
      # clresource create -t HAStoragePlus -g testgroup \
        -p GlobalDevicePaths=/dev/md/red/dsk/d0 \
        -p AffinityOn=true testresource
      
      clresourcetype register

      If the HAStoragePlus resource type is not already registered, register it.

      /dev/md/red/dsk/d0

      Replace this path with your device path.

    • If you are using Sun Cluster 3.1, use the following commands:


      # scrgadm -a -g testgroup
      # scrgadm -a -t SUNW.HAStoragePlus
      # scrgadm -a -j testresource -g testgroup -t SUNW.HAStoragePlus \
        -x GlobalDevicePaths=/dev/md/red/dsk/d0
      # scswitch -Z -g testgroup
      
      /dev/md/red/dsk/d0

      Replace this path with your device path.
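
    Note – The Sun Cluster 3.1 commands above bring testgroup online with scswitch -Z. If you are using Sun Cluster 3.2 and the new resource group remains offline or unmanaged after you create it, the following command is a minimal sketch for bringing it online before you continue; it assumes the standard clresourcegroup online subcommand with the -M option, which also places the group in the managed state:


      # clresourcegroup online -M testgroup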

  2. Identify the node that masters the testgroup.

    Run one of the following commands.

    • If you are using Sun Cluster 3.2, use the following command:


      # clresourcegroup status testgroup
      
    • If you are using Sun Cluster 3.1, use the following command:


      # scstat -g
      
  3. Power off the primary node for the testgroup.

    Cluster interconnect error messages appear on the consoles of the remaining cluster nodes.

  4. On another node, verify that the secondary node took ownership of the resource group that was mastered by the primary node.

    Check the output for the resource group ownership.

    • If you are using Sun Cluster 3.2, use the following command:


      # clresourcegroup status testgroup
      
    • If you are using Sun Cluster 3.1, use the following command:


      # scstat -g
      
  5. Power on the initial primary node. Boot the node into cluster mode.

    Wait for the system to boot. The system automatically starts the membership monitor software. The node then rejoins the cluster.
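
    On a SPARC based system, for example, you typically boot the node from the OpenBoot PROM prompt. The following is a minimal sketch and assumes console access to the ok prompt:


      ok boot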

  6. From the initial primary node, return ownership of the resource group to that node.

    • If you are using Sun Cluster 3.2, use the following command:


      # clresourcegroup switch -n nodename testgroup
      
    • If you are using Sun Cluster 3.1, use the following command:


      # scswitch -z -g testgroup -h nodename
      

    In these commands, nodename is the name of the initial primary node.

  7. Verify that the initial primary node has ownership of the resource group.

    Look for the output that shows the resource group ownership. Sample status output appears at the end of this procedure.

    • If you are using Sun Cluster 3.2, use the following command:


      # clresourcegroup status testgroup
      
    • If you are using Sun Cluster 3.1, use the following command:


      # scstat -g
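
For reference, resource group status output from the Sun Cluster 3.2 command resembles the following. This fragment is an illustrative sketch only; the node names phys-node-1 and phys-node-2 are hypothetical, and scstat -g reports the equivalent information in a different layout.


      === Cluster Resource Groups ===

      Group Name        Node Name          Suspended      Status
      ----------        ---------          ---------      ------
      testgroup         phys-node-1        No             Online
                        phys-node-2        No             Offline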
      

Testing Cluster Interconnect Redundancy

This section provides the procedure for testing cluster interconnect redundancy.

How to Test Cluster Interconnects

Before You Begin

This procedure provides the long forms of the Sun Cluster commands. Most commands also have short forms. Except for the forms of the command names, the commands are identical. For a list of the commands and their short forms, see Appendix A, Sun Cluster Geographic Edition Object-Oriented Commands.

To perform this procedure, become superuser or assume a role that provides solaris.cluster.read and solaris.cluster.modify RBAC authorization.

  1. Disconnect one of the cluster transport cables from a node in the cluster.

    Messages similar to the following appear on the console of each node and are logged in the /var/adm/messages file.


    Nov  4 08:27:21 node1 genunix: WARNING: ce1: fault detected external to device; service degraded
    Nov  4 08:27:21 node1 genunix: WARNING: ce1: xcvr addr:0x01 - link down
    Nov  4 08:27:31 node1 genunix: NOTICE: clcomm: Path node1:ce1 - node2:ce0 being cleaned up
    Nov  4 08:27:31 node1 genunix: NOTICE: clcomm: Path node1:ce1 - node2:ce0 being drained
    Nov  4 08:27:31 node1 genunix: NOTICE: clcomm: Path node1:ce1 - node2:ce0 being constructed
    Nov  4 08:28:31 node1 genunix: NOTICE: clcomm: Path node1:ce1 - node2:ce0 errors during initiation
    Nov  4 08:28:31 node1 genunix: WARNING: Path node1:ce1 - node2:ce0 initiation encountered errors, errno = 62.
      Remote node may be down or unreachable through this path.
  2. Verify that Sun Cluster has registered that the interconnect is down.

    Enter one of the following commands and verify that the interconnect path displays as Faulted.

    • If you are using Sun Cluster 3.2, use the following command:


      # clinterconnect status
      
    • If you are using Sun Cluster 3.1, use the following command:


      # scstat -W
      
  3. Reconnect the cluster transport cable that you disconnected in Step 1.

    Messages similar to the following appear on the console of each node and are logged in the /var/adm/messages file.


    Nov  4 08:30:26 node1 genunix: NOTICE: ce1: fault cleared external to device; service available
    Nov  4 08:30:26 node1 genunix: NOTICE: ce1: xcvr addr:0x01 - link up 1000 Mbps full duplex
    Nov  4 08:30:26 node1 genunix: NOTICE: clcomm: Path node1:ce1 - node2:ce0 being initiated
    Nov  4 08:30:26 node1 genunix: NOTICE: clcomm: Path node1:ce1 - node2:ce0 online
  4. Verify that Sun Cluster has registered that the interconnect is up.

    Enter one of the following commands and verify that the interconnect path displays as Online. Sample status output appears at the end of this procedure.

    • If you are using Sun Cluster 3.2, use the following command:


      # clinterconnect status
      
    • If you are using Sun Cluster 3.1, use the following command:


      # scstat -W
      
  5. Repeat Step 1 through Step 4 for each cluster transport cable that is connected to the node.

  6. Repeat Step 1 through Step 5 on each node in the cluster.
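
For reference, interconnect status output from the Sun Cluster 3.2 command resembles the following. This fragment is an illustrative sketch only; the endpoint names match the sample messages above, and scstat -W reports the equivalent information in a different layout. While a cable is disconnected, the affected path is reported as faulted instead of online.


      === Cluster Transport Paths ===

      Endpoint1               Endpoint2               Status
      ---------               ---------               ------
      node1:ce1               node2:ce0               Path online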

Testing Public Network Redundancy

This section provides the procedure for testing public network redundancy.

How to Test Public Network Redundancy

This test verifies that IP addresses fail over from one adapter to another adapter within the same IPMP group.

Before You Begin

This procedure provides the long forms of the Sun Cluster commands. Most commands also have short forms. Except for the forms of the command names, the commands are identical. For a list of the commands and their short forms, see Appendix A, Sun Cluster Geographic Edition Object-Oriented Commands.

To perform this procedure, become superuser or assume a role that provides solaris.cluster.read RBAC authorization.

  1. Create a failover resource group that contains a logical hostname resource. The logical hostname uses the IPMP groups on the system.

    • If you are using Sun Cluster 3.2, use the following command:


      # clresourcegroup create lhtestgroup
      # clreslogicalhostname create -g lhtestgroup logicalhostname
      # clresourcegroup online -M lhtestgroup
      
      logicalhostname

      The hostname whose IP address is hosted on an adapter on which an IPMP group is configured.

    • If you are using Sun Cluster 3.1, use the following commands:


      # scrgadm -a -g lhtestgroup
      # scrgadm -a -L -g lhtestgroup -l logicalhostname
      # scswitch -Z -g lhtestgroup
      
      logicalhostname

      The hostname whose IP address is hosted on an adapter on which an IPMP group is configured.

  2. Determine the adapter that currently hosts the logicalhostname.


    # ifconfig -a
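
    In the output, the IP address of the logical hostname is plumbed as a logical interface (for example, ce0:1) on the adapter that currently hosts it, and the adapter's IPMP group name appears on its groupname line. The following fragment is an illustrative sketch only; the adapter name, flags, and addresses are hypothetical:


      ce0: flags=...<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
              inet 192.168.10.11 netmask ffffff00 broadcast 192.168.10.255
              groupname ipmp0
      ce0:1: flags=...<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
              inet 192.168.10.50 netmask ffffff00 broadcast 192.168.10.255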
  3. Disconnect one public network cable from the adapter you identified in Step 2.

  4. If there are no more adapters in the group, skip to Step 7.

  5. If there is another adapter in the group, verify that the logical hostname failed over to that adapter.


    # ifconfig -a
  6. Continue to disconnect adapters in the group until you have disconnected the last adapter.

    The resource group (lhtestgroup) should fail over to the secondary node.

  7. Verify that the resource group failed over to the secondary node.

    • If you are using Sun Cluster 3.2, use the following command:


      # clresourcegroup status lhtestgroup
      
    • If you are using Sun Cluster 3.1, use the following command:


      # scstat -g
      
  8. Reconnect all adapters in the group.

  9. From the initial primary node, return ownership of the resource group to that node.

    • If you are using Sun Cluster 3.2, use the following command:


      # clresourcegroup switch -n nodename lhtestgroup
      
    • If you are using Sun Cluster 3.1, use the following command:


      # scswitch -z -g lhtestgroup -h nodename
      

    In these commands, nodename is the name of the original primary node.

  10. Verify that the resource group is running on the original primary node.

    • If you are using Sun Cluster 3.2, use the following command:


      # clresourcegroup status lhtestgroup
      
    • If you are using Sun Cluster 3.1, use the following command:


      # scstat -g