This chapter describes the tests for verifying and demonstrating the high availability (HA) of your Sun Cluster configuration. The tests in this chapter assume that you installed Sun Cluster hardware, the Solaris Operating System, and Sun Cluster software. All nodes should be booted as cluster members.
This chapter contains the following procedures:
If your cluster passes these tests, your hardware has adequate redundancy. This redundancy means that your nodes, cluster transport cables, and IPMP groups are not single points of failure.
To perform the tests in How to Test Device Group Redundancy Using Resource Group Failover and How to Test Cluster Interconnects, you must first identify the device groups that each node masters. Perform these tests on all cluster pairs that share a disk device group. Each pair has a primary node and a secondary node for a particular device group.
Use one of the following commands to determine the initial primary and secondary:
The Sun Cluster 3.2 command cldevicegroup status with the -n option
The Sun Cluster 3.1 scstat command with the -D option
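For example, assuming a device group named testdg (a placeholder name; substitute your own device group), you might run one of the following commands to list the current primary and secondary:

# cldevicegroup status -n nodename testdg
# scstat -D

The first command applies to Sun Cluster 3.2 and the second to Sun Cluster 3.1; both display the device group servers, including the primary and secondary nodes.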
For conceptual information about primary nodes, secondary nodes, failover, device groups, or cluster hardware, see your Sun Cluster concepts documentation.
This section provides the procedure for testing node redundancy and high availability of device groups. Perform the following procedure to confirm that the secondary node takes over the device group that is mastered by the primary node when the primary node fails.
This procedure provides the long forms of the Sun Cluster commands. Most commands also have short forms. Except for the forms of the command names, the commands are identical. For a list of the commands and their short forms, see Appendix A, Sun Cluster Geographic Edition Object-Oriented Commands.
To perform this procedure, become superuser or assume a role that provides solaris.cluster.modify RBAC authorization.
Create an HAStoragePlus resource group with which to test.
If you are using Sun Cluster 3.2, use the following commands:
# clresourcegroup create testgroup
# clresourcetype register SUNW.HAStoragePlus
# clresource create -t HAStoragePlus -g testgroup \
-p GlobalDevicePaths=/dev/md/red/dsk/d0 \
-p AffinityOn=true testresource
If the HAStoragePlus resource type is not already registered, register it.
Replace /dev/md/red/dsk/d0 with your device path.
If you are using Sun Cluster 3.1, use the following commands:
# scrgadm -a -g testgroup
# scrgadm -a -t SUNW.HAStoragePlus
# scrgadm -a -j testresource -g testgroup -t SUNW.HAStoragePlus \
-x GlobalDevicePaths=/dev/md/red/dsk/d0
# scswitch -Z -g testgroup
Replace /dev/md/red/dsk/d0 with your device path.
Identify the node that masters the testgroup.
Run one of the following commands.
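For example, the following status commands display resource group state, including the node that currently masters testgroup (shown as a sketch; the first command applies to Sun Cluster 3.2, the second to Sun Cluster 3.1):

# clresourcegroup status testgroup
# scstat -g

In the output, look for the node listed as the current primary (online node) for testgroup.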
Power off the primary node for the testgroup.
Cluster interconnect error messages appear on the consoles of the existing nodes.
On another node, verify that the secondary node took ownership of the resource group that was mastered by the primary node.
Check the output for the resource group ownership.
Power on the initial primary node. Boot the node into cluster mode.
Wait for the system to boot. The system automatically starts the membership monitor software. The node then rejoins the cluster.
From the initial primary node, return ownership of the resource group to the initial primary node.
If you are using Sun Cluster 3.2, use the following command:
# clresourcegroup switch -n nodename testgroup
If you are using Sun Cluster 3.1, use the following command:
# scswitch -z -g testgroup -h nodename
In these commands, nodename is the name of the primary node.
Verify that the initial primary node has ownership of the resource group.
Look for the output that shows the resource group ownership.
This section provides the procedure for testing cluster interconnect redundancy.
This procedure provides the long forms of the Sun Cluster commands. Most commands also have short forms. Except for the forms of the command names, the commands are identical. For a list of the commands and their short forms, see Appendix A, Sun Cluster Geographic Edition Object-Oriented Commands.
To perform this procedure, become superuser or assume a role that provides solaris.cluster.read and solaris.cluster.modify RBAC authorization.
Disconnect one of the cluster transport cables from a node in the cluster.
Messages similar to the following appear on the consoles of each node and are logged in the /var/adm/messages file.
Nov 4 08:27:21 node1 genunix: WARNING: ce1: fault detected external to device; service degraded
Nov 4 08:27:21 node1 genunix: WARNING: ce1: xcvr addr:0x01 - link down
Nov 4 08:27:31 node1 genunix: NOTICE: clcomm: Path node1:ce1 - node1:ce0 being cleaned up
Nov 4 08:27:31 node1 genunix: NOTICE: clcomm: Path node1:ce1 - node1:ce0 being drained
Nov 4 08:27:31 node1 genunix: NOTICE: clcomm: Path node1:ce1 - node1:ce0 being constructed
Nov 4 08:28:31 node1 genunix: NOTICE: clcomm: Path node1:ce1 - node1:ce0 errors during initiation
Nov 4 08:28:31 node1 genunix: WARNING: Path node1:ce1 - node1:ce0 initiation encountered errors, errno = 62. Remote node may be down or unreachable through this path.
Verify that Sun Cluster has registered that the interconnect is down.
Enter one of the following commands and verify that the interconnect path displays as Faulted.
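For example, depending on your Sun Cluster version, one of the following commands lists the cluster transport paths and their status (shown as a sketch; the first command applies to Sun Cluster 3.2, the second to Sun Cluster 3.1):

# clinterconnect status
# scstat -W

In the output, the path whose cable you disconnected should be reported as Faulted.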
Reconnect the cluster transport cable.
Messages similar to the following appear on the consoles of each node and are logged in the /var/adm/messages file.
Nov 4 08:30:26 node1 genunix: NOTICE: ce1: fault cleared external to device; service available
Nov 4 08:30:26 node1 genunix: NOTICE: ce1: xcvr addr:0x01 - link up 1000 Mbps full duplex
Nov 4 08:30:26 node1 genunix: NOTICE: clcomm: Path node1:ce1 - node1:ce0 being initiated
Nov 4 08:30:26 node1 genunix: NOTICE: clcomm: Path node1:ce1 - node1:ce0 online
Verify that Sun Cluster has registered that the interconnect is up.
Enter one of the following commands and verify that the interconnect path displays as Online.
Repeat Step 1 through Step 4 for each cluster transport cable on the node.
This section provides the procedure for testing public network redundancy.
This test verifies that IP addresses fail over from one adapter to another adapter within the same IPMP group.
This procedure provides the long forms of the Sun Cluster commands. Most commands also have short forms. Except for the forms of the command names, the commands are identical. For a list of the commands and their short forms, see Appendix A, Sun Cluster Geographic Edition Object-Oriented Commands.
To perform this procedure, become superuser or assume a role that provides solaris.cluster.read RBAC authorization.
Create a resource group that contains a logical hostname resource. The logical hostname fails over by using the IPMP groups on the system.
If you are using Sun Cluster 3.2, use the following command:
# clresourcegroup create lhtestgroup
# clreslogicalhostname create -g lhtestgroup logicalhostname
# clresourcegroup online lhtestgroup
In these commands, logicalhostname is the IP address that is hosted on the device on which an IPMP group is configured.
If you are using Sun Cluster 3.1, use the following commands:
# scrgadm -a -g lhtestgroup
# scrgadm -a -L -g lhtestgroup -l logicalhostname
# scswitch -Z -g lhtestgroup
In these commands, logicalhostname is the IP address that is hosted on the device on which an IPMP group is configured.
Determine the adapter on which the logicalhostname exists.
# ifconfig -a
Disconnect one public network cable from the adapter you identified in Step 2.
If there are no more adapters in the group, skip to Step 7.
If there is another adapter in the group, verify that the logical hostname failed over to that adapter.
# ifconfig -a
Continue to disconnect adapters in the group, until you have disconnected the last adapter.
The resource group (lhtestgroup) should fail over to the secondary node.
Verify that the resource group failed over to the secondary node.
Reconnect all adapters in the group.
From the initial primary node, return ownership of the resource group to the initial primary node.
If you are using Sun Cluster 3.2, use the following command:
# clresourcegroup switch -n nodename lhtestgroup
If you are using Sun Cluster 3.1, use the following command:
# scswitch -z -g lhtestgroup -h nodename
In these commands, nodename is the name of the original primary node.
Verify that the resource group is running on the original primary node.