Sun Cluster 2.2 System Administration Guide

Administering SPARCstorage Array Trays

This section describes procedures for administering SPARCstorage Array trays. Use the procedures described in your node hardware manual to identify the tray associated with the failed component.

To guard against data loss and a failure that might require you to replace the entire SPARCstorage Array chassis, set up all mirrors so that a single chassis contains only one submirror.


Note -

There are several different SPARCstorage Array models supported by Sun Cluster. The procedures in this section are only applicable to the SPARCstorage Array 100 series.


How to Take a SPARCstorage Array Tray Out of Service (Solstice DiskSuite)

Before removing a SPARCstorage Array tray, you must halt all I/O and spin down all drives in the tray. The drives automatically spin up if I/O requests are made, so it is necessary to stop all I/O before the drives are spun down.

These are the high-level steps to take a SPARCstorage Array tray out of service in a Solstice DiskSuite configuration: switch ownership of the affected logical hosts to another node, identify the submirrors with components on the tray and take them offline, record the locations of any metadevice state database replicas and hot spares on the tray, flush NVRAM if it is enabled, and spin down the tray.

If the entire SPARCstorage Array is being serviced, you must perform these steps on each tray.

These are the detailed steps to take a SPARCstorage Array tray out of service in a Solstice DiskSuite configuration.

  1. Switch ownership of the affected logical hosts to other nodes by using the haswitch(1M) command.


    phys-hahost1# haswitch phys-hahost1 hahost1 hahost2
    

    The SPARCstorage Array tray to be removed might contain disks included in more than one logical host. If this is the case, switch ownership of all logical hosts with disks in this tray to another node in the cluster. The luxadm(1M) command will be used later to spin down the disks. In this example, the haswitch(1M) command switched the logical hosts to phys-hahost1, enabling phys-hahost1 to perform the administrative functions.

  2. Use the metastat(1M) command on all affected logical hosts to identify all submirrors containing slices on the tray to be removed.


    phys-hahost1# metastat -s disksetname
    

  3. Stop I/O to the submirrors whose components (slices) are on the affected tray.

    Use the metaoffline(1M) command for this step. This takes the submirror offline. You can use the metadetach(1M) command to stop the I/O, but the resync cost is greater.

    When the submirrors on a tray are taken offline, the corresponding mirrors provide only one-way mirroring; that is, the data is no longer redundant until the submirrors are brought back online. (A three-way mirror does not have this problem.) When the mirror is brought back online, an automatic resync occurs.

    With all affected submirrors offline, I/O to the tray is stopped.
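
    For example, assuming the affected mirrors and submirrors are d15/d35 and d24/d54 in the diskset hahost1 (the same hypothetical names used in the return-to-service procedure later in this section), commands similar to the following take the submirrors offline:

    phys-hahost1# metaoffline -s hahost1 d15 d35
    phys-hahost1# metaoffline -s hahost1 d24 d54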

  4. Use the metadb(1M) command to identify any replicas on the tray.

    Save the metadb(1M) output to use when you replace the tray.
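
    For example, the following command lists the replicas in the hypothetical diskset hahost1; note which replicas reside on disks in the affected tray:

    phys-hahost1# metadb -s hahost1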

  5. Use the metahs(1M) command to identify any available hot spare devices and associated submirrors.

    Save the metahs(1M) output to use when you replace the tray.
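
    For example, the following command displays the hot spare pools and their status for the hypothetical diskset hahost1:

    phys-hahost1# metahs -s hahost1 -i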

  6. If NVRAM is enabled, flush the NVRAM data on the appropriate controller, tray, or disk(s).


    phys-hahost1# luxadm sync_cache pathname
    

    A confirmation appears, indicating that NVRAM data has been flushed. See "Flushing and Purging NVRAM", for details on flushing NVRAM data.

  7. Spin down the tray using the luxadm stop command.

    When the tray lock light is out, remove the tray and perform the required service.


    phys-hahost1# luxadm stop c1
    

How to Take a SPARCstorage Array Tray Out of Service (VxVM)

Before removing a SPARCstorage Array tray, you must halt all I/O and spin down all drives in the tray. The drives automatically spin up if I/O requests are made, so it is necessary to stop all I/O before the drives are spun down.

These are the high-level steps to take a SPARCstorage Array tray out of service in a VxVM configuration: switch ownership of the affected logical hosts to another node, identify the volumes and plexes on the tray and detach or stop them, flush NVRAM if it is enabled, and spin down the tray.

If the entire SPARCstorage Array is being serviced, you must perform these steps on each tray.

These are the detailed steps to take a SPARCstorage Array tray out of service in a VxVM configuration.

  1. Switch ownership of the affected logical hosts to other nodes by using the haswitch(1M) command.


    phys-hahost1# haswitch phys-hahost1 hahost1 hahost2
    

    The SPARCstorage Array tray to be removed might contain disks included in more than one logical host. If this is the case, switch ownership of all logical hosts with disks using this tray to another node in the cluster. The luxadm(1M) command will be used later to spin down the disks. In this example, the haswitch(1M) command switched the logical hosts to phys-hahost1, enabling phys-hahost1 to perform the administrative functions.

  2. Identify all the volumes and corresponding plexes on the disks in the tray that is being taken out of service.

    1. From the physical device address cNtNdN, obtain the controller number and the target number.

      For example, if the device address is c3t2d0, the controller number is 3 and the target is 2.

    2. Identify VxVM devices on the affected tray from a vxdisk list output.

      If the target is 0 or 1, identify all devices with physical addresses beginning with cNt0 and cNt1. If the target is 2 or 3, identify all devices with physical addresses beginning with cNt2 and cNt3. If the target is 4 or 5, identify all devices with physical addresses beginning with cNt4 and cNt5. Here is an example of how vxdisk can be used to obtain the information.


      # vxdisk -g diskgroup -q list | egrep c3t2\|c3t3 | nawk '{print $3}'
      

    3. Identify all plexes on the above devices by using the appropriate version (csh, ksh, or Bourne shell) of the following command.


      PLLIST=`vxprint -ptq -g diskgroup \
        -e '(aslist.sd_dm_name in ("c3t2d0","c3t3d0","c3t3d1")) && (pl_kstate=ENABLED)' \
        | nawk '{print $2}'`

      For csh, the syntax is set PLLIST .... For ksh, the syntax is export PLLIST= .... The Bourne shell requires the command export PLLIST after the variable is set.
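
      For example, with the full vxprint command from above abbreviated as "vxprint ...", the variable is set as follows in each shell (a sketch of the shell syntax only):

      csh:  set PLLIST = `vxprint ... | nawk '{print $2}'`
      ksh:  export PLLIST=`vxprint ... | nawk '{print $2}'`
      sh:   PLLIST=`vxprint ... | nawk '{print $2}'`; export PLLIST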

  3. After you have set the variable, stop I/O to the volumes whose components (subdisks) are on the tray.

    Make sure all volumes associated with that tray are detached (mirrored or RAID5 configurations) or stopped (simple plexes). Issue the following command to detach the mirrored plexes identified in the previous step.


    # vxplex det ${PLLIST}
    

    An alternate command for detaching each plex in a tray is:


    # vxplex -g diskgroup -v volume det plex
    

    To stop I/O to simple plexes, unmount any file systems or stop database access; an example follows the note below.


    Note -

    Mirrored volumes will still be active because the other half of the mirror is still available.
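
    For example, to stop I/O to a simple plex by unmounting its file system, a command such as the following could be used (the mount point is hypothetical):

    # umount /mount-point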


  4. If NVRAM is enabled, flush the NVRAM data on the appropriate controller, tray, or disk(s). Otherwise, skip to Step 5.


    # luxadm sync_cache pathname
    

    A confirmation appears, indicating that NVRAM data has been flushed. See "Flushing and Purging NVRAM", for details on flushing NVRAM data.

  5. To remove the tray, use the luxadm stop command to spin it down.

    When the tray lock light is out, remove the tray and perform the required service.


    # luxadm stop c1
    

How to Return a SPARCstorage Array Tray to Service (Solstice DiskSuite)

These are the high-level steps to return a SPARCstorage Array tray to service in a Solstice DiskSuite configuration: spin up the drives if necessary, restore any deleted metadevice state database replicas, bring the offlined submirrors back online, add back any deleted hot spares, and switch the logical hosts back to their default masters.

If the entire SPARCstorage Array has been serviced, you must perform these steps on each tray.

These are the detailed steps to return a SPARCstorage Array tray to service in a Solstice DiskSuite configuration.

  1. If the SPARCstorage Array was removed, spin up the drives in the SPARCstorage Array tray. Otherwise, skip to Step 3.

    When you have completed work on a SPARCstorage Array tray, replace the tray in the chassis. The disks will spin up automatically. However, if the disks fail to spin up, run the luxadm(1M) start command to manually spin up the entire tray. There is a short delay (several seconds) between invocation of the command and spin-up of drives in the SPARCstorage Array. In this example, c1 is the controller ID:


    phys-hahost1# luxadm start c1
    

  2. Add all metadevice state database replicas that were deleted from disks on this tray.

    Use the information saved from Step 4 in the procedure "How to Take a SPARCstorage Array Tray Out of Service (Solstice DiskSuite)" to restore the metadevice state database replicas.


    phys-hahost1# metadb -s hahost1 -a deleted-replicas
    

    To add multiple replicas to the same slice, use the -c option.
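
    For example, the following command adds three replicas to a single slice (the slice name is a placeholder):

    phys-hahost1# metadb -s hahost1 -a -c 3 cNtXdYsZ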

  3. After the disks spin up, place online all the submirrors that were taken offline.

    Use the metaonline(1M) command appropriate for the disks in this tray.


    phys-hahost1# metaonline -s hahost1 d15 d35
    phys-hahost1# metaonline -s hahost1 d24 d54
    ...

    When the metaonline(1M) command is run, an optimized resync operation automatically brings the submirrors up-to-date. The optimized resync copies only those regions of the disk that were modified while the submirror was offline. This is typically a very small fraction of the submirror capacity.

    Run metaonline(1M) as many times as necessary to bring back online all of the submirrors.


    Note -

    If you used the metadetach(1M) command to detach the submirror rather than metaoffline(1M), you must resynchronize the entire submirror using the metattach(1M) command. This typically takes about 10 minutes per gigabyte of data.
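
    For example, to reattach and resynchronize the hypothetical submirror d35 to mirror d15:

    phys-hahost1# metattach -s hahost1 d15 d35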


  4. Add back all hot spares that were deleted when the SPARCstorage Array was taken out of service.

    Use the metahs(1M) command as appropriate for your hot spare configuration. Use the information saved from Step 5 in the procedure "How to Take a SPARCstorage Array Tray Out of Service (Solstice DiskSuite)" to replace your hot spares.


    phys-hahost1# metahs -s hahost1 -a hotsparepool cNtXdYsZ
    

  5. Switch each logical host back to its default master, if necessary.


    phys-hahost1# haswitch phys-hahost2 hahost2
    

How to Return a SPARCstorage Array Tray to Service (VxVM)

These are the high-level steps to return a SPARCstorage Array tray to service in a VxVM configuration: spin up the drives if necessary, monitor the volume management recovery and assist it if needed, and switch the logical hosts back to their default masters.

If the entire SPARCstorage Array has been serviced, you must perform these steps on each tray.

These are the detailed steps to return a SPARCstorage Array tray to service in a VxVM configuration.

  1. If the SPARCstorage Array was removed, spin up the drives in the SPARCstorage Array tray. Otherwise, skip to Step 2.

    When you have completed work on a SPARCstorage Array tray, replace the tray in the chassis. The disks will spin up automatically. However, if the disks fail to spin up, run the luxadm(1M) start command to manually spin up the entire tray. There is a short delay (several seconds) between invocation of the command and spin-up of drives in the SPARCstorage Array. In this example, c1 is the controller ID.


    phys-hahost1# luxadm start c1
    

  2. After the disks spin up, monitor the volume management recovery.

    Previously affected volumes that are on the tray should begin to come back online, and the rebuilding of data should start automatically within a few minutes. If necessary, use the vxreattach and vxrecover commands to reattach disks and recover from errors; an example follows the note below. Refer to the respective man pages for more information.


    Note -

    DRL subdisks that were detached must be manually reattached.
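
    If recovery does not start on its own, commands similar to the following reattach the disk media records and start recovery of the volumes in the disk group (diskgroup is a placeholder, as in the earlier examples):

    # vxreattach
    # vxrecover -g diskgroup -sb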


  3. Switch each logical host back to its default master, if necessary.


    phys-hahost1# haswitch phys-hahost2 hahost2