How to Replace a Disk Drive Without Oracle Real Application Clusters (Sun Cluster 3.1 - 3.2 With SCSI JBOD Storage Device Manual for Solaris OS)

How to Replace a Disk Drive Without Oracle Real Application Clusters

You need to replace a disk drive if it fails or when you want to upgrade to a higher-quality or larger disk.

For conceptual information about quorum, quorum devices, global devices, and device IDs, see the Sun Cluster concepts documentation.

Before You Begin

This procedure relies on the following prerequisites and assumptions.

- Your cluster is operational.
- Your system is not running Oracle Real Application Clusters.
  If your system is running Oracle Real Application Clusters, see SPARC: How to Replace a Disk Drive With Oracle Real Application Clusters.
- (Solaris Volume Manager Only) If the disk drive failure prevents Solaris Volume Manager from reading the disk label, you have a backup of the disk-partitioning information.
- Your nodes are not configured with dynamic reconfiguration functionality.
  If your nodes are configured for dynamic reconfiguration, see the Sun Cluster system administration documentation, and skip steps that instruct you to shut down the node.

This procedure provides the long forms of the Sun Cluster commands. Most commands also have short forms. Except for the forms of the command names, the commands are identical. For a list of the commands and their short forms, see Appendix A, Sun Cluster Object-Oriented Commands, in Sun Cluster 3.1 - 3.2 Hardware Administration Manual for Solaris OS.

To perform this procedure, become superuser or assume a role that provides solaris.cluster.read and solaris.cluster.modify RBAC authorization.

Become superuser or assume a role that provides solaris.cluster.read and solaris.cluster.modify RBAC authorization.

Identify the failed disk drive.
- If you are using Sun Cluster 3.2, use the following command:
  # cldevice show -v cNtNdN
- If you are using Sun Cluster 3.1, use the following command:
  # scdidadm -o diskid -l cNtNdN

Record the device identifier (DID) of the failed disk drive because you assign it to the replaced disk drive later in this procedure.
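
The version-dependent choice in the identification step can be wrapped in a small helper. The following is a minimal sketch, not part of the Sun Cluster tooling: the function name did_lookup_cmd and the sample disk c1t2d0 are assumptions made for illustration. The helper only prints the command, so you can review it before you run it as superuser.

```shell
#!/bin/sh
# Hypothetical helper: print the DID lookup command for a Sun Cluster release.
# Usage: did_lookup_cmd <3.1|3.2> <cNtNdN>
did_lookup_cmd() {
    version=$1
    disk=$2
    case "$version" in
        3.2) echo "cldevice show -v $disk" ;;
        3.1) echo "scdidadm -o diskid -l $disk" ;;
        *)   echo "unsupported Sun Cluster version: $version" >&2
             return 1 ;;
    esac
}

# Example with a sample disk name: show the command for Sun Cluster 3.2.
did_lookup_cmd 3.2 c1t2d0
```

Printing the command instead of running it keeps the sketch safe to try on a machine that is not a cluster node.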

If the disk drive you are removing is configured as a quorum device, add a new quorum device that will not be affected by this procedure. Then remove the old quorum device.

To determine whether a quorum device will be affected by this procedure, use one of the following commands.
- If you are using Sun Cluster 3.2, use the following command:
  # clquorum show +
- If you are using Sun Cluster 3.1, use the following command:
  # scstat -q
To add and remove quorum devices, see the Sun Cluster system administration documentation.

If possible, back up the metadevice or volume.

For more information, see your Solaris Volume Manager or Veritas Volume Manager documentation.

(Solaris Volume Manager Only) If your disk drive failure prevents Solaris Volume Manager from reading the disk label, use the disk partitioning information that you saved.

You saved this information when you performed one of the following tasks:
- Installed your storage array in an initial cluster as outlined in Step 10 of SPARC: How to Install a Storage Array in a New SPARC Based Cluster.
- Added the storage array to an operational cluster.

(Solaris Volume Manager Only) If your disk drive failure does not prevent Solaris Volume Manager from reading the disk label, save the disk partitioning information now if you have not already done so.

Caution –
Do not save disk partitioning information under /tmp because you will lose this file when you reboot. Instead, save this file under /usr/tmp.
# prtvtoc /dev/rdsk/cNtNdNsN > filename
Use this information when you partition the new disk drive.
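
One way to keep the saved file easy to match to its disk is to derive the file name from the device name. This is a minimal sketch under stated assumptions: the helper names vtoc_backup_path and save_vtoc, the vtoc. file prefix, and the sample device c1t2d0s2 are all hypothetical. The procedure itself only requires that the file not be stored under /tmp.

```shell
#!/bin/sh
# Hypothetical helper: derive a backup file name under /usr/tmp
# from a raw device path such as /dev/rdsk/c1t2d0s2.
vtoc_backup_path() {
    echo "/usr/tmp/vtoc.$(basename "$1")"
}

# Hypothetical helper: save the partition table of a raw slice.
# It calls prtvtoc, so run it as superuser on a Solaris node.
save_vtoc() {
    out=$(vtoc_backup_path "$1")
    prtvtoc "$1" > "$out" && echo "saved to $out"
}

# Example with a sample device name: print where the backup would be written.
vtoc_backup_path /dev/rdsk/c1t2d0s2
```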

Replace the failed disk.
1. Determine which node owns the device group.
  - If you are using Sun Cluster 3.2, use the following command:
    # cldevicegroup status devgroup1
  - If you are using Sun Cluster 3.1, use the following command:
    # scstat -D
2. If you are using Veritas Volume Manager, remove the disk drive from Veritas Volume Manager control on a node that does not have ownership of the device group.
  # vxdisk offline cNtNdN
  # vxdisk rm cNtNdN
3. On a node that does not have ownership of the device group, suspend activity on the SCSI bus.
  # cfgadm -x replace_device cN::disk/cNtNdN
  When prompted, type y to suspend activity on the SCSI bus.
4. If the message "cfgadm: Component system is busy, try again: failed to offline" is displayed, follow these steps:
  1. Become superuser.
  2. Temporarily rename the file named es_rcm.pl.
    - If you are using Version 4.1 of Veritas Volume Manager or a version of Veritas Volume Manager that was released after 4.1, type:
      # mv /usr/lib/rcm/scripts/es_rcm.pl /usr/lib/rcm/scripts/DONTUSE
    - If you are using a version of Veritas Volume Manager that was released before Version 4.1, type:
      # mv /etc/rcm/scripts/es_rcm.pl /etc/rcm/scripts/DONTUSE
  3. Reissue the cfgadm command that you tried to issue previously.
    # cfgadm -x replace_device cN::disk/cNtNdN
  4. Rename the DONTUSE file to its original name.
    - If you are using Version 4.1 of Veritas Volume Manager or a version of Veritas Volume Manager that was released after 4.1, type:
      # mv /usr/lib/rcm/scripts/DONTUSE /usr/lib/rcm/scripts/es_rcm.pl
    - If you are using a version of Veritas Volume Manager that was released before Version 4.1, type:
      # mv /etc/rcm/scripts/DONTUSE /etc/rcm/scripts/es_rcm.pl
5. After SCSI bus activity stops, replace the disk and type y at the prompt.
  
  After replacing the disk, warning messages might be displayed. Ignore these messages.
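
The rename, retry, and restore sequence for es_rcm.pl can be sketched as a wrapper that always puts the file back, even when the retried command fails. The function name with_es_rcm_disabled and the sample device names are assumptions made for illustration; the directory to pass depends on your Veritas Volume Manager version, as described in the steps above.

```shell
#!/bin/sh
# Hypothetical helper: run a command while es_rcm.pl is renamed to DONTUSE,
# then restore the original name whether or not the command succeeded.
# Usage: with_es_rcm_disabled <scripts_dir> <command> [args...]
with_es_rcm_disabled() {
    dir=$1
    shift
    mv "$dir/es_rcm.pl" "$dir/DONTUSE" || return 1
    status=0
    "$@" || status=$?              # remember the exit status of the command
    mv "$dir/DONTUSE" "$dir/es_rcm.pl" || return 1
    return $status
}

# Example for Veritas Volume Manager 4.1 or later (sample device names):
#   with_es_rcm_disabled /usr/lib/rcm/scripts \
#       cfgadm -x replace_device c1::disk/c1t2d0
```

Because the wrapper passes standard input through to the command, the cfgadm prompts can still be answered interactively.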

On all nodes that are attached to the device, run the devfsadm command to probe all devices and to update the device tree.
# devfsadm
Depending on the number of devices that are connected to the node, the devfsadm(1M) command can require at least five minutes to complete.

Label the new disk drive by using the format command.

(Solaris Volume Manager Only) If you saved the disk partitioning information earlier in this procedure, or when you installed or added the storage array, use that information to partition the new disk drive from any node that is connected to the device.
# fmthard -s filename /dev/rdsk/cNtNdNsN
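
Before writing a partition table to the new disk, it can help to confirm that the saved file is actually usable. The following minimal sketch refuses to call fmthard when the backup file is missing or empty. The function name restore_vtoc is hypothetical, and the check is an added precaution rather than a step from the original procedure.

```shell
#!/bin/sh
# Hypothetical helper: restore a saved partition table only if the
# backup file exists and is not empty.
# Usage: restore_vtoc <filename> <raw_slice>
restore_vtoc() {
    if [ ! -s "$1" ]; then
        echo "partition backup missing or empty: $1" >&2
        return 1
    fi
    fmthard -s "$1" "$2"
}

# Example (sample names), run as superuser on a node connected to the device:
#   restore_vtoc /usr/tmp/vtoc.c1t2d0s2 /dev/rdsk/c1t2d0s2
```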

On all nodes, repair the device instance for the replaced disk drive.
- If you are using Sun Cluster 3.2, use the following command:
  # cldevice repair DID_number
- If you are using Sun Cluster 3.1, use the following command:
  # scdidadm -R DID_number
In both commands, DID_number is the DID of the failed disk drive that you recorded earlier in this procedure.

Perform volume management administration to add the disk drive back to its diskset or disk group.

For more information, see your Solaris Volume Manager or Veritas Volume Manager documentation.

If you want this new disk drive to be a quorum device, add the quorum device.

To add and remove quorum devices, see the Sun Cluster system administration documentation.