Use this procedure to replace a failed host adapter in a running cluster. This procedure defines Node A as the node with the failed host adapter that you are replacing.
This procedure relies on the following prerequisites and assumptions.
Except for the failed host adapter, your cluster is operational and all nodes are powered on.
Your nodes are not configured with dynamic reconfiguration functionality.
If your nodes are configured for dynamic reconfiguration and you are using two entirely separate hardware paths to your shared data, see the Sun Cluster Hardware Administration Manual for Solaris OS and skip steps that instruct you to shut down the cluster.
You cannot use DR to replace a single, dual-port HBA when a quorum device is configured on that storage path. In that configuration, follow all steps in this procedure. For details about the risks and limitations of this configuration, see Configuring Cluster Nodes With a Single, Dual-Port HBA in Sun Cluster 3.1 - 3.2 Hardware Administration Manual for Solaris OS.
This restriction does not apply to clusters of three or more nodes in which no storage device is configured as a quorum device.
This procedure provides the long forms of the Sun Cluster commands. Most commands also have short forms. Except for the forms of the command names, the commands are identical. For a list of the commands and their short forms, see Appendix A, Sun Cluster Object-Oriented Commands, in Sun Cluster 3.1 - 3.2 Hardware Administration Manual for Solaris OS.
Become superuser or assume a role that provides solaris.cluster.read and solaris.cluster.modify RBAC authorization.
Determine the resource groups and device groups that are running on Node A.
Record this information; you use it in Step 10 and Step 11 of this procedure to return the resource groups and device groups to Node A.
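For example, you can list the current resource group and device group status as follows. (The exact output format depends on your configuration and is not shown here.)

If you are using Sun Cluster 3.2, use the following commands:

# clresourcegroup status
# cldevicegroup status

If you are using Sun Cluster 3.1, use the following commands:

# scstat -g
# scstat -D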
Move all resource groups and device groups off Node A.
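For example, assuming the failed node is named nodeA (a hypothetical node name used here for illustration), you can evacuate all resource groups and device groups from the node with a single command.

If you are using Sun Cluster 3.2, use the following command:

# clnode evacuate nodeA

If you are using Sun Cluster 3.1, use the following command:

# scswitch -S -h nodeA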
Shut down Node A.
For the full procedure about how to shut down and power off a node, see Chapter 3, Shutting Down and Booting a Cluster, in Sun Cluster System Administration Guide for Solaris OS.
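For example, on a SPARC based node you can shut Node A down to the OpenBoot PROM prompt with the standard Solaris shutdown command, run from the node's console:

# shutdown -g0 -y -i0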
Power off Node A.
Replace the failed host adapter.
To remove and add host adapters, see the documentation that shipped with your nodes.
If you need to upgrade the node's host adapter firmware, boot Node A into noncluster mode by adding -x to your boot instruction. Proceed to Step 8.
If you do not need to upgrade firmware, skip to Step 9.
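For example, on a SPARC based node you can boot into noncluster mode from the OpenBoot PROM ok prompt:

ok boot -x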
Upgrade the host adapter firmware on Node A.
If you use the Solaris 8, Solaris 9, or Solaris 10 Operating System, Sun Connection Update Manager keeps you informed of the latest versions of patches and features. Using notifications and intelligent needs-based updating, Sun Connection helps improve operational efficiency and ensures that you have the latest software patches for your Sun software.
You can download the Sun Connection Update Manager product for free by going to http://www.sun.com/download/products.xml?id=4457d96d.
Additional information for using the Sun patch management tools is provided in Solaris Administration Guide: Basic Administration at http://docs.sun.com. Refer to the version of this manual for the Solaris OS release that you have installed.
If you must apply a patch when a node is in noncluster mode, you can apply it in a rolling fashion, one node at a time, unless instructions for a patch require that you shut down the entire cluster. Follow the procedures in How to Apply a Rebooting Patch (Node) in Sun Cluster System Administration Guide for Solaris OS to prepare the node and to boot it in noncluster mode. For ease of installation, consider applying all patches at the same time. That is, apply all patches to the node that you place in noncluster mode.
For a list of patches that affect Sun Cluster, see the Sun Cluster Wiki Patch Klatch.
For required firmware, see the Sun System Handbook.
Boot Node A into cluster mode.
For more information about how to boot nodes, see Chapter 3, Shutting Down and Booting a Cluster, in Sun Cluster System Administration Guide for Solaris OS.
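For example, on a SPARC based node you can boot into cluster mode from the OpenBoot PROM ok prompt by booting without the -x option:

ok boot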
(Optional) Restore the device groups to the original node.
Do the following for each device group that you want to return to the original node.
If you are using Sun Cluster 3.2, use the following command:
# cldevicegroup switch -n nodename devicegroup1 [devicegroup2 ...]

nodename
    The node to which you are restoring the device groups.
devicegroup1 [devicegroup2 ...]
    The device group or groups that you are restoring to the node.
If you are using Sun Cluster 3.1, use the following command:
# scswitch -z -D devicegroup -h nodename
(Optional) Restore the resource groups to the original node.
Do the following for each resource group that you want to return to the original node.
If you are using Sun Cluster 3.2, use the following command:
# clresourcegroup switch -n nodename resourcegroup1 [resourcegroup2 ...]

nodename
    For failover resource groups, the node to which the groups are returned. For scalable resource groups, the node list to which the groups are returned.
resourcegroup1 [resourcegroup2 ...]
    The resource group or groups that you are returning to the node or nodes.
If you are using Sun Cluster 3.1, use the following command:
# scswitch -z -g resourcegroup -h nodename