This chapter contains information about node hardware in a cluster environment.
Some servers support the mirroring of internal hard drives (internal hardware disk mirroring or integrated mirroring) to provide redundancy for node data. To use this feature in a cluster environment, follow the steps in this section.
Depending on the version of the Solaris operating system you use, you might need to install a patch to correct change request 5023670 and ensure the proper operation of internal mirroring.
The best way to set up hardware disk mirroring is to perform RAID configuration during cluster installation, before you configure multipathing. For instructions on performing this configuration, see the Sun Cluster Software Installation Guide for Solaris OS. If you need to change your mirroring configuration after you have established the cluster, you must perform some cluster-specific steps to clean up the device IDs, as described in the procedure that follows.
Specific servers might have additional restrictions. See the documentation that shipped with your server hardware.
For specifics about how to configure your server's internal disk mirroring, refer to the documents that shipped with your server and the raidctl(1M) man page.
This procedure assumes that you have already installed your hardware and software and have established the cluster. To configure an internal disk mirror during cluster installation, see the Sun Cluster Software Installation Guide for Solaris OS.
If you use the Solaris 8, Solaris 9, or Solaris 10 Operating System, Sun Connection Update Manager keeps you informed of the latest versions of patches and features. Using notifications and intelligent needs-based updating, Sun Connection helps improve operational efficiency and ensures that you have the latest software patches for your Sun software.
You can download the Sun Connection Update Manager product for free by going to http://www.sun.com/download/products.xml?id=4457d96d.
Additional information for using the Sun patch management tools is provided in Solaris Administration Guide: Basic Administration at http://docs.sun.com. Refer to the version of this manual for the Solaris OS release that you have installed.
If you must apply a patch when a node is in noncluster mode, you can apply it in a rolling fashion, one node at a time, unless instructions for a patch require that you shut down the entire cluster. Follow the procedures in How to Apply a Rebooting Patch (Node) in Sun Cluster System Administration Guide for Solaris OS to prepare the node and to boot it in noncluster mode. For ease of installation, consider applying all patches at the same time. That is, apply all patches to the node that you place in noncluster mode.
For a list of patches that affect Sun Cluster, see the Sun Cluster Wiki Patch Klatch.
For required firmware, see the Sun System Handbook.
If there are state database replicas on the disk that you are mirroring, you must recreate them during this procedure.
If necessary, prepare the node for establishing the mirror.
Configure the internal mirror.
# raidctl -c c1t0d0 c1t1d0
Creates a mirror of the primary disk to the mirror disk. Enter the name of your primary disk as the first argument and the name of the mirror disk as the second argument.
Boot the node into single user mode.
# reboot -- -s
Clean up the device IDs.
If you are using Sun Cluster 3.2, use the following command:
# cldevice repair /dev/rdsk/c1t0d0
Updates the cluster's record of the device IDs for the primary disk. Enter the name of your primary disk as the argument.
If you are using Sun Cluster 3.1, use the following command:
# scdidadm -R /dev/rdsk/c1t0d0
Updates the cluster's record of the device IDs for the primary disk. Enter the name of your primary disk as the argument.
Confirm that the mirror has been created and only the primary disk is visible to the cluster.
If you are using Sun Cluster 3.2, use the following command:
# cldevice list
If you are using Sun Cluster 3.1, use the following command:
# scdidadm -l
The command lists only the primary disk, and not the mirror disk, as visible to the cluster.
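This verification can also be scripted. The sketch below, which assumes the Sun Cluster 3.2 `cldevice` command and the hypothetical disk names used in this procedure, stubs `cldevice` with sample output so the check can run outside a live cluster; on a real node, delete the stub so the actual command is used.

```shell
# Stub for illustration only: sample "cldevice list -v" output showing
# that only the primary disk (c1t0d0) has a device ID. Remove on a real node.
cldevice() {
  printf 'd1\tphys-schost-1:/dev/rdsk/c1t0d0\n'
}

# After cleanup, the mirror disk (c1t1d0) must not appear in the DID list.
if cldevice list -v | grep -q 'c1t1d0'; then
  echo "mirror disk still visible: repeat the device ID cleanup"
else
  echo "only the primary disk is visible to the cluster"
fi
```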
Boot the node back into cluster mode.
# reboot
If you are using Solaris Volume Manager and if the state database replicas are on the primary disk, recreate the state database replicas.
# metadb -a /dev/rdsk/c1t0d0s4
If you moved device groups off the node in Step 1, restore device groups to the original node.
Perform the following step for each device group you want to return to the original node.
If you are using Sun Cluster 3.2, use the following command:
# cldevicegroup switch -n nodename devicegroup1[ devicegroup2 …]
nodename: The node to which you are restoring device groups.
devicegroup1 devicegroup2 …: The device group or groups that you are restoring to the node.
If you are using Sun Cluster 3.1, use the following command:
# scswitch -z -D devicegroup -h nodename
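When several device groups must be returned one at a time, the step above can be driven from a loop. This sketch uses the Sun Cluster 3.2 syntax; the node and device group names are hypothetical, and `cldevicegroup` is stubbed with `echo` so the control flow can be shown outside a live cluster.

```shell
# Stub for illustration only: echoes the command instead of switching
# device groups. Remove on a real cluster node.
cldevicegroup() { echo "cldevicegroup $*"; }

NODE=phys-schost-1                      # hypothetical node name
for dg in dg-schost-1 dg-schost-2; do   # hypothetical device groups
  cldevicegroup switch -n "$NODE" "$dg"
done
```

As the synopsis shows, the real command also accepts several device groups in a single invocation; the loop form is convenient when you want to check the result of each switch before continuing.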
If you moved resource groups off the node in Step 1, move all resource groups back to the node.
If you are using Sun Cluster 3.2, use the following command:
Perform the following step for each resource group you want to return to the original node.
# clresourcegroup switch -n nodename resourcegroup1[ resourcegroup2 …]
nodename: For failover resource groups, the node to which the groups are returned. For scalable resource groups, the node list to which the groups are returned.
resourcegroup1 resourcegroup2 …: The resource group or groups that you are returning to the node or nodes.
If you are using Sun Cluster 3.1, use the following command:
# scswitch -z -g resourcegroup -h nodename
If necessary, prepare the node for removing the mirror.
Remove the internal mirror.
# raidctl -d c1t0d0
Deletes the mirror of the primary disk to the mirror disk. Enter the name of your primary disk as the argument.
Boot the node into single user mode.
# reboot -- -s
Clean up the device IDs.
If you are using Sun Cluster 3.2, use the following command:
# cldevice repair /dev/rdsk/c1t0d0 /dev/rdsk/c1t1d0
Updates the cluster's record of the device IDs. Enter the names of your disks separated by spaces.
If you are using Sun Cluster 3.1, use the following command:
# scdidadm -R /dev/rdsk/c1t0d0
# scdidadm -R /dev/rdsk/c1t1d0
Updates the cluster's record of the device IDs. Run the command once for each disk.
Confirm that the mirror has been deleted and that both disks are visible.
If you are using Sun Cluster 3.2, use the following command:
# cldevice list
If you are using Sun Cluster 3.1, use the following command:
# scdidadm -l
The command lists both disks as visible to the cluster.
Boot the node back into cluster mode.
# reboot
If you are using Solaris Volume Manager and if the state database replicas are on the primary disk, recreate the state database replicas.
# metadb -c 3 -ag /dev/rdsk/c1t0d0s4
If you moved device groups off the node in Step 1, restore the device groups to the original node.
If you are using Sun Cluster 3.2, use the following command:
# cldevicegroup switch -n nodename devicegroup1 devicegroup2 …
nodename: The node to which you are restoring device groups.
devicegroup1 devicegroup2 …: The device group or groups that you are restoring to the node.
If you are using Sun Cluster 3.1, use the following command:
# scswitch -z -D devicegroup -h nodename
If you moved resource groups off the node in Step 1, restore the resource groups to the original node.
If you are using Sun Cluster 3.2, use the following command:
Perform the following step for each resource group you want to return to the original node.
# clresourcegroup switch -n nodename resourcegroup1[ resourcegroup2 …]
nodename: For failover resource groups, the node to which the groups are restored. For scalable resource groups, the node list to which the groups are restored.
resourcegroup1 resourcegroup2 …: The resource group or groups that you are restoring to the node or nodes.
If you are using Sun Cluster 3.1, use the following command:
# scswitch -z -g resourcegroup -h nodename
This section explains the use of dual-port host bus adapters (HBAs) to provide both connections to shared storage in the cluster. While Sun Cluster Geographic Edition supports this configuration, it is less redundant than the recommended configuration. If you choose to use this configuration, you must understand the risks that a dual-port HBA configuration poses to the availability of your application.
You should strive for as much separation and hardware redundancy as possible when connecting each cluster node to shared data storage. This approach provides the following advantages to your cluster:
The best assurance of high availability for your clustered application
Good failure isolation
Good maintenance robustness
Sun Cluster Geographic Edition is usually layered on top of a volume manager, mirrored data with independent I/O paths, or a multipathed I/O link to a hardware RAID arrangement. Therefore, the cluster software does not expect a node to ever lose access to shared data. These redundant paths to storage ensure that the cluster can survive any single failure.
Sun Cluster Geographic Edition does support certain configurations that use a single, dual-port HBA to provide the required two paths to the shared data. However, using a single, dual-port HBA for connecting to shared data increases the vulnerability of your cluster. If this single HBA fails and takes down both ports connected to the storage device, the node is unable to reach the stored data. How the cluster handles such a dual-port failure depends on several factors:
The cluster configuration
The volume manager configuration
The node on which the failure occurs
The state of the cluster when the failure occurs
If you choose one of these configurations for your cluster, you must understand that the supported configurations mitigate, but do not eliminate, the previously mentioned risks to high availability and the other advantages.
Sun Cluster Geographic Edition supports the following volume manager configurations when you use a single, dual-port HBA for connecting to shared data:
Solaris Volume Manager with more than one disk in each diskset and no dual-string mediators configured. For details about this configuration, see Cluster Configuration When Using Solaris Volume Manager and a Single Dual-Port HBA.
Solaris Volume Manager for Sun Cluster Geographic Edition. For details about this configuration, see Cluster Configuration When Using Solaris Volume Manager for Sun Cluster Geographic Edition and a Single Dual-Port HBA.
If the Solaris Volume Manager metadbs lose replica quorum for a diskset on a cluster node, the volume manager panics the cluster node. Sun Cluster Geographic Edition then takes over the diskset on a surviving node and your application fails over to a secondary node.
To ensure that the node panics and is fenced off if it loses its connection to shared storage, configure each metaset with at least two disks. In this configuration, the metadbs stored on the disks create their own replica quorum for each diskset.
Dual-string mediators are not supported in Solaris Volume Manager configurations that use a single dual-port HBA. Using dual-string mediators prevents the service from failing over to a new node.
When configuring Solaris Volume Manager metasets, ensure that each metaset contains at least two disks. Do not configure dual-string mediators.
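The guidance above can be sketched as a short command sequence. The host, set, and DID names are hypothetical, and `metaset` is stubbed with `echo` so the sequence can be shown outside a live cluster; on a real node you would run the actual metaset(1M) command.

```shell
# Stub for illustration only: echoes the command instead of running it.
metaset() { echo "metaset $*"; }

# Create the set with its hosts, then add two disks so the state database
# replicas on those disks can form their own quorum for the diskset.
# Note that no mediator hosts are added (no "-a -m" step).
metaset -s oradata -a -h phys-schost-1 phys-schost-2
metaset -s oradata -a /dev/did/rdsk/d4 /dev/did/rdsk/d5
```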
When a dual-port HBA fails with both ports in this configuration, the cluster behavior depends on whether the affected node is primary for the diskset.
If the affected node is primary for the diskset, Solaris Volume Manager panics that node because it lacks required state database replicas. Your cluster reforms with the nodes that achieve quorum and brings the diskset online on a new primary node.
If the affected node is not primary for the diskset, your cluster remains in a degraded state.
Follow the instructions for replacing an HBA in your storage device documentation.
Because Solaris Volume Manager for Sun Cluster Geographic Edition uses raw disks only and is specific to Oracle Real Application Clusters (RAC), no special configuration is required.
When a dual-port HBA fails and takes down both ports in this configuration, the cluster behavior depends on whether the affected node is the current master for the multi-owner diskset.
If the affected node is the current master for the multi-owner diskset, the node does not panic. If any other node fails or is rebooted, the affected node will panic when it tries to update the replicas. The volume manager chooses a new master for the diskset if the surviving nodes can achieve quorum.
If the affected node is not the current master for the multi-owner diskset, the node remains up but the device group is in a degraded state. If an additional failure affects the master node and Solaris Volume Manager for Sun Cluster Geographic Edition attempts to remaster the diskset on the node with the failed paths, that node will also panic. A new master will be chosen if any surviving nodes can achieve quorum.
Follow the instructions for replacing an HBA in your storage device documentation.