Use this procedure to replace a failed host adapter in a running cluster. This procedure defines Node A as the node with the failed host adapter that you are replacing.
This procedure provides the long forms of the Sun Cluster commands. Most commands also have short forms. Except for the forms of the command names, the commands are identical. For a list of the commands and their short forms, see Appendix A, Sun Cluster Object-Oriented Commands, in Sun Cluster 3.1 - 3.2 Hardware Administration Manual for Solaris OS.
This procedure relies on the following prerequisites and assumptions.
Except for the failed host adapter, your cluster is operational and all nodes are powered on.
Your nodes are not configured with dynamic reconfiguration functionality.
If your nodes are configured for dynamic reconfiguration and you are using two entirely separate hardware paths to your shared data, see the Sun Cluster Hardware Administration Manual for Solaris OS and skip steps that instruct you to shut down the cluster.
If you are using a single, dual-port HBA to provide the connections to your shared data, you cannot use dynamic reconfiguration for this procedure. Follow all steps in the procedure. For the details on the risks and limitations of this configuration, see Configuring Cluster Nodes With a Single, Dual-Port HBA in Sun Cluster 3.1 - 3.2 Hardware Administration Manual for Solaris OS.
To perform this procedure, become superuser or assume a role that provides solaris.cluster.read and solaris.cluster.modify RBAC (role-based access control) authorization.
Determine the resource groups and device groups that are running on Node A.
Record this information. You use it in Step 12 and Step 13 of this procedure to return device groups and resource groups to Node A.
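As a rough sketch of this step, the helper below prints the status commands for each release so that you can review them before you run anything. The status subcommands with -n are the Sun Cluster 3.2 forms, and scstat with -g and -D is the Sun Cluster 3.1 form. The helper name status_cmds and the node name NodeA are placeholders for illustration, not part of the product.

```shell
# status_cmds VERSION NODE
# Hypothetical helper: prints the status commands, does not execute them.
status_cmds() {
    version=$1
    node=$2
    case $version in
        3.2)
            echo "clresourcegroup status -n $node"
            echo "cldevicegroup status -n $node"
            ;;
        3.1)
            # scstat reports on the whole cluster: -g limits the output
            # to resource groups and -D to device groups.
            echo "scstat -g"
            echo "scstat -D"
            ;;
        *)
            echo "unsupported version: $version" >&2
            return 1
            ;;
    esac
}

status_cmds 3.2 NodeA
```

Save the output of the real commands somewhere outside Node A, because the node is powered off later in the procedure.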
Record the details of any metadevices that are affected by the failed host adapter.
Record this information because you use it in Step 11 of this procedure to repair any affected metadevices.
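If your volume manager is Solaris Volume Manager, one way to capture the metadevice state is to save the metastat output for each disk set. The sketch below only prints the commands that it would suggest. The helper name, the output directory, and the disk set names oradg and nfsdg are hypothetical, and other volume managers need their own tools.

```shell
# record_metadevice_cmds OUTDIR DISKSET...
# Hypothetical helper: prints one metastat command per disk set.
# Assumes Solaris Volume Manager; nothing is executed.
record_metadevice_cmds() {
    outdir=$1
    shift
    for diskset in "$@"; do
        echo "metastat -s $diskset > $outdir/metastat.$diskset.txt"
    done
}

record_metadevice_cmds /var/tmp/hba-replace oradg nfsdg
```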
Move all resource groups and device groups off Node A.
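A minimal sketch of this step follows, assuming the clnode evacuate command for Sun Cluster 3.2 and the scswitch -S form for Sun Cluster 3.1. The run wrapper, the DRY_RUN variable, and the evacuate_node helper are illustrative names, not product features. With DRY_RUN left at 1, the commands are printed rather than executed.

```shell
# DRY_RUN=1 (the default here) prints each command instead of running it.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# evacuate_node VERSION NODE  (hypothetical helper)
evacuate_node() {
    case $1 in
        3.2) run clnode evacuate "$2" ;;
        3.1) run scswitch -S -h "$2" ;;
        *)   echo "unsupported version: $1" >&2; return 1 ;;
    esac
}

evacuate_node 3.2 NodeA
```

Printing first is a deliberate choice: evacuating a node moves running services, so it is worth reading the exact command line before you set DRY_RUN=0.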
Shut down Node A.
For the full procedure on shutting down and powering off a node, see your Sun Cluster system administration documentation.
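The sketch below illustrates one precaution for this step: it emits the Solaris shutdown command only when the local host name matches the node that you intend to stop. The helper name and the host-name check are assumptions made for illustration. The shutdown options shown (-g0 for no grace period, -y to skip confirmation, -i0 for init state 0) are the standard Solaris ones, but confirm the exact sequence for your release in the system administration documentation.

```shell
# shutdown_cmd_for TARGET [LOCAL_NAME]
# Hypothetical guard: prints the shutdown command only on the target node.
# LOCAL_NAME defaults to the output of uname -n.
shutdown_cmd_for() {
    target=$1
    local_name=${2:-$(uname -n)}
    if [ "$target" != "$local_name" ]; then
        echo "refusing: this host is $local_name, not $target" >&2
        return 1
    fi
    echo "shutdown -g0 -y -i0"
}

# Example with both names given explicitly.
shutdown_cmd_for NodeA NodeA
```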
Power off Node A.
Replace the failed host adapter.
For the procedure on removing and adding host adapters, see the documentation that shipped with your nodes.
If you need to upgrade the node's host adapter firmware, boot Node A into noncluster mode by adding -x to your boot instruction. Proceed to Step 9.
For more information about how to boot nodes, see your Sun Cluster system administration documentation.
If you do not need to upgrade the node's host adapter firmware, proceed to Step 10.
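The choice in this step can be summarized as a small decision helper. This is only a sketch. It assumes a SPARC based node where the boot instruction is typed at the OpenBoot ok prompt, and the helper name first_boot_instruction is hypothetical. On other platforms the -x option is supplied differently, so check your boot documentation.

```shell
# first_boot_instruction yes|no
# Prints the boot instruction to use after the adapter is replaced,
# depending on whether a firmware upgrade is needed.
first_boot_instruction() {
    case $1 in
        yes) echo "boot -x" ;;   # noncluster mode, then continue with Step 9
        no)  echo "boot" ;;      # cluster mode, as in Step 10
        *)   echo "usage: first_boot_instruction yes|no" >&2; return 1 ;;
    esac
}

first_boot_instruction yes
```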
Upgrade the host adapter firmware on Node A.
If you use the Solaris 8, Solaris 9, or Solaris 10 Operating System, Sun Connection Update Manager keeps you informed of the latest versions of patches and features. Using notifications and intelligent needs-based updating, Sun Connection helps improve operational efficiency and ensures that you have the latest software patches for your Sun software.
You can download the Sun Connection Update Manager product for free by going to http://www.sun.com/download/products.xml?id=4457d96d.
Additional information for using the Sun patch management tools is provided in Solaris Administration Guide: Basic Administration at http://docs.sun.com. Refer to the version of this manual for the Solaris OS release that you have installed.
If you must apply a patch when a node is in noncluster mode, you can apply it in a rolling fashion, one node at a time, unless instructions for a patch require that you shut down the entire cluster. Follow the procedures in How to Apply a Rebooting Patch (Node) in Sun Cluster System Administration Guide for Solaris OS to prepare the node and to boot it in noncluster mode. For ease of installation, consider applying all patches at the same time. That is, apply all patches to the node that you place in noncluster mode.
For a list of patches that affect Sun Cluster, see the Sun Cluster Wiki Patch Klatch.
For required firmware, see the Sun System Handbook.
Boot Node A into cluster mode.
For more information about how to boot nodes, see Chapter 3, Shutting Down and Booting a Cluster, in Sun Cluster System Administration Guide for Solaris OS.
Perform any volume management procedures that are necessary to fix any metadevices affected by this procedure, as you identified in Step 3.
For more information, see your volume manager software documentation.
(Optional) Restore the device groups to Node A.
Perform the following step for each device group you want to return to the original node.
If you are using Sun Cluster 3.2, use the following command:
# cldevicegroup switch -n NodeA devicegroup1[ devicegroup2 …]
-n NodeA: The node to which you are restoring device groups.
devicegroup1[ devicegroup2 …]: The device group or groups that you are restoring to the node.
If you are using Sun Cluster 3.1, use the following command:
# scswitch -z -D devicegroup -h NodeA
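The two command forms above differ in how they take several device groups: the Sun Cluster 3.2 command accepts a list, while the Sun Cluster 3.1 command is repeated for each group. The sketch below prints the resulting command lines for a set of recorded names. The helper name and the device group names dg1 and dg2 are placeholders.

```shell
# restore_device_groups VERSION NODE DEVICEGROUP...
# Hypothetical helper: prints the switch commands, does not execute them.
restore_device_groups() {
    version=$1
    node=$2
    shift 2
    [ $# -gt 0 ] || { echo "no device groups given" >&2; return 1; }
    case $version in
        3.2)
            # One command can name several device groups.
            echo "cldevicegroup switch -n $node $*"
            ;;
        3.1)
            # One command per device group.
            for dg in "$@"; do
                echo "scswitch -z -D $dg -h $node"
            done
            ;;
        *)
            echo "unsupported version: $version" >&2
            return 1
            ;;
    esac
}

restore_device_groups 3.1 NodeA dg1 dg2
```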
(Optional) Restore the resource groups to Node A.
Perform the following step for each resource group you want to return to the original node.
If you are using Sun Cluster 3.2, use the following command:
# clresourcegroup switch -n NodeA resourcegroup1[ resourcegroup2 …]
-n NodeA: For failover resource groups, the node to which the groups are returned. For scalable resource groups, the node list to which the groups are returned.
resourcegroup1[ resourcegroup2 …]: The resource group or groups that you are returning to the node or nodes.
If you are using Sun Cluster 3.1, use the following command:
# scswitch -z -g resourcegroup -h NodeA
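If you saved the resource group names to a file when you recorded them earlier in this procedure, a sketch like the following can turn that list into the commands shown above. The file format (one name per line, with # comments and blank lines ignored), the helper name rg_restore_cmds, and the group names nfs-rg and web-rg are assumptions for illustration. The commands are printed, not executed.

```shell
# rg_restore_cmds VERSION NODE LISTFILE
# Hypothetical helper: reads resource group names from LISTFILE and
# prints the switch commands for the given release.
rg_restore_cmds() {
    version=$1
    node=$2
    listfile=$3
    # Drop comment lines and blank lines.
    groups=$(sed -e '/^#/d' -e '/^[[:space:]]*$/d' "$listfile")
    [ -n "$groups" ] || { echo "no resource groups listed in $listfile" >&2; return 1; }
    case $version in
        3.2)
            # Unquoted expansion joins the names with single spaces.
            echo "clresourcegroup switch -n $node $(echo $groups)"
            ;;
        3.1)
            for rg in $groups; do
                echo "scswitch -z -g $rg -h $node"
            done
            ;;
        *)
            echo "unsupported version: $version" >&2
            return 1
            ;;
    esac
}

# Example with a temporary list file.
list=$(mktemp)
printf '%s\n' '# recorded earlier' 'nfs-rg' '' 'web-rg' > "$list"
rg_restore_cmds 3.2 NodeA "$list"
rm -f "$list"
```

For scalable resource groups, pass the node list in place of the single node name, as described for the -n option.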