Sun Cluster 2.2 System Administration Guide

Chapter 10 Administering Sun Cluster Local Disks

This chapter provides instructions for administering Sun Cluster local disks. Some of the procedures documented in this chapter depend on your volume management software (Solstice DiskSuite, SSVM, or CVM). Procedures that depend on the volume manager include the volume manager name in the procedure title.

Sun Cluster administration involves monitoring the status of the configuration. (See Chapter 2, Sun Cluster Administration Tools, for information about monitoring methods.) The monitoring process might reveal problems with the local disks. The following sections provide instructions for correcting these problems.

For multihost disk administration procedures, see the administration chapter for your particular disk expansion unit. Also refer to your volume manager software documentation when you are replacing or repairing hardware in the Sun Cluster configuration.

10.1 Restoring a Local Boot Disk From Backup

Some situations require you to replace a cluster node's boot disk, such as when a software problem leaves the boot disk in an unknown state, an operating system upgrade fails, or a hardware problem occurs. Use the following procedures to restore the boot disk to a known state, or to replace the disk.


Note -

These procedures assume the existence of a backup copy of the boot disk.


10.1.1 How to Restore a Local Boot Disk From Backup (Solstice DiskSuite)

When the physical hosts are in the same cluster, this procedure is performed on the affected host while another host provides data services for all logical hosts. In this example, there are two physical hosts, phys-hahost1 and phys-hahost2, and two logical hosts, hahost1 and hahost2.

These are the detailed steps to restore a boot disk from backup in a Solstice DiskSuite configuration. In this example, phys-hahost1 contains the disk to be restored. The boot disk is not mirrored.

  1. Halt the host requiring the restore.

  2. On the other hosts in the cluster, use the metaset(1M) command to remove the host being restored from the disksets.

    In this example, the metaset(1M) command is run from the other host in the cluster, phys-hahost2.

    phys-hahost2# metaset -s hahost1 -f -d -h phys-hahost1
    phys-hahost2# metaset -s hahost2 -f -d -h phys-hahost1
    
  3. Restore the boot disk on the host being restored from the backup media.

    Follow the procedure described in "Restoring Files and File Systems" in the Solaris System Administration Guide to restore the boot disk file system.
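
    The exact commands depend on how the backup was made. The following is a minimal sketch, assuming the host was booted single-user from CD-ROM or the network, the root file system was backed up with ufsdump(1M), and the boot disk is the hypothetical c0t3d0:

    # newfs /dev/rdsk/c0t3d0s0
    # mount /dev/dsk/c0t3d0s0 /a
    # cd /a
    # ufsrestore rvf /dev/rmt/0
    # rm restoresymtable
    # installboot /usr/platform/`uname -i`/lib/fs/ufs/bootblk /dev/rdsk/c0t3d0s0
    # cd /
    # umount /a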

  4. Reboot the host being restored.

  5. Remove old DiskSuite replicas and reboot.

    If you are replacing a failed disk, old replicas will not be present. If you are restoring a disk, run the metadb(1M) command to check whether old replicas are present. If so, delete the old replicas.
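
    For example, to list any replicas that survived the restore before deleting anything (the device names shown below are hypothetical):

    phys-hahost1# metadb -i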


    Note -

    The default location for replicas is Slice 7. However, you are not required to place replicas on Slice 7.


    phys-hahost1# metadb -d -f c0t3d0s7
    phys-hahost1# reboot
    
  6. Create new DiskSuite replicas on the restored disk with the metadb(1M) command.

    phys-hahost1# metadb -afc 3 c0t3d0s7
    
  7. Add the restored host to the diskset or disksets, from the sibling host.

    phys-hahost2# metaset -s hahost1 -a -h phys-hahost1
    phys-hahost2# metaset -s hahost2 -a -h phys-hahost1
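
    You can verify the result from the sibling host; running metaset(1M) with only the set name lists the hosts and drives in that diskset:

    phys-hahost2# metaset -s hahost1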
    
  8. Start Sun Cluster on the restored host.

    phys-hahost1# scadmin startnode
    
  9. Switch back the logical hosts to the default master, if necessary.

    If manual mode is not set, an automatic switchback will occur.

    phys-hahost1# haswitch phys-hahost1 hahost1
    

10.1.2 How to Restore a Local Boot Disk From Backup (SSVM or CVM)

When the physical hosts are in the same cluster, this procedure is performed on the affected host while another host provides data services for all logical hosts. In this example, there are two physical hosts, phys-hahost1 and phys-hahost2, and two logical hosts, hahost1 and hahost2. The boot disk is not mirrored.

These are the detailed steps to restore a boot disk from backup in an SSVM or CVM configuration. In this example, phys-hahost1 contains the disk to be restored.

  1. Halt the host requiring the restore.

  2. Restore the boot disk on the host being restored from the backup media.

    Follow the procedure described in "Restoring Files and File Systems" in the Solaris System Administration Guide to restore the boot disk file system.
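
    The mechanics of the restore are the same as in the Solstice DiskSuite procedure; see the ufsrestore(1M) sketch following Step 3 of "10.1.1 How to Restore a Local Boot Disk From Backup (Solstice DiskSuite)".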

  3. Reboot the host being restored.

    The reboot causes the host to discover all the devices.
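
    If hardware was replaced along with the restore, make this a reconfiguration reboot so that the device tree is rebuilt. A sketch, assuming the host is at the OpenBoot ok prompt:

    ok boot -r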


    Note -

    If the disks are reserved, it might be necessary to run vxdctl enable at a later time, when the reservations are released.


  4. Start Sun Cluster on the local host.

    phys-hahost1# scadmin startnode
    
  5. Switch back the logical hosts to the default master, if necessary.

    If manual mode is not set, an automatic switchback will occur.

    phys-hahost1# haswitch phys-hahost1 hahost1
    

10.2 Replacing a Local Non-Boot Disk

This section describes the replacement of a failed local disk that does not contain the Solaris operating environment.

In general, if a local non-boot disk fails, you recover by restoring the data from a backup copy to a new disk.

The procedures for restoring a local boot disk are described in "10.1.1 How to Restore a Local Boot Disk From Backup (Solstice DiskSuite)", and in "10.1.2 How to Restore a Local Boot Disk From Backup (SSVM or CVM)".

10.2.1 How to Replace a Local Non-Boot Disk

These are the detailed steps to replace a failed local non-boot disk. In this example, phys-hahost2 contains the disk that failed.

  1. (Optional) Shut down the Sun Cluster services on the node with the failed disk and halt the node.

    You may not need to perform this step if the node boots from a SPARCstorage Array disk. However, if the disk to be replaced is on the same SCSI bus as the functioning boot disk, you must shut down Sun Cluster and halt the node.

    # scadmin stopnode
    ...
    # halt
    
  2. Perform the disk replacement.

    Use the procedure described in the service manual for your Sun Cluster node.

  3. Start the node in single-user mode.

  4. Run the format(1M) or fmthard(1M) command to repartition the new disk.

    Make sure that you partition the new disk exactly as the disk it replaces was partitioned. (Saving the disk format information is outlined in Chapter 1, Preparing for Sun Cluster Administration.)
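
    For example, if you saved the old disk's VTOC with prtvtoc(1M), you can reapply it with fmthard(1M). A sketch, assuming a saved VTOC file /etc/vtoc/c0t2d0 and the hypothetical replacement disk c0t2d0 (both names are illustrative):

    phys-hahost2# fmthard -s /etc/vtoc/c0t2d0 /dev/rdsk/c0t2d0s2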

  5. Run the newfs(1M) command on the new slices to create file systems.
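
    For example, to create a file system on slice 0 of the hypothetical replacement disk c0t2d0:

    phys-hahost2# newfs /dev/rdsk/c0t2d0s0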

  6. Run the mount(1M) command to mount the appropriate file systems.

    Specify the device and mount points for each file system.
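
    Continuing the example, with a hypothetical mount point of /export:

    phys-hahost2# mount /dev/dsk/c0t2d0s0 /export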

  7. Restore data from a backup copy.

    Use the instructions in the Solaris System Administration documentation to perform this step.
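
    A sketch, assuming the data was backed up with ufsdump(1M) to the default tape device and the file system is mounted at /export (hypothetical):

    phys-hahost2# cd /export
    phys-hahost2# ufsrestore rvf /dev/rmt/0
    phys-hahost2# rm restoresymtable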

  8. Reboot the node.

  9. Start Sun Cluster on the local host.

    phys-hahost1# scadmin startnode
    
  10. Switch back the logical hosts to the default master, if necessary.

    If manual mode is not set, an automatic switchback will occur.

    phys-hahost2# haswitch phys-hahost2 hahost2