As part of standard Sun Cluster administration, you should monitor the status of the configuration. See Chapter 2, Sun Cluster Administration Tools, for information about monitoring methods. During the monitoring process you might discover problems with multihost disks. The following sections provide instructions for correcting these problems.
Sun Cluster supports these SSA disk types:
100 series
200 series with differential SCSI tray
200 series with RSM (214 RSM)
Depending on the type you have and on the electrical and mechanical characteristics of the disk enclosure, adding a disk might require you to prepare all disks connected to a particular controller, all disks in a particular array tray, or only the disk being added. For example, in the SPARCstorage Array 200 series with the differential SCSI tray, you must prepare the array controller and the disk enclosure. In the SPARCstorage Array 200 series with RSM (214 RSM), you need to prepare only the new disk. In the SPARCstorage Array 110, you must prepare a single tray.
If you have a SPARCstorage Array 100 series array, follow the steps as documented. If you have a SPARCstorage Array 200 series array with differential SCSI tray, you must bring down all disks attached to the array controller that will connect to the new disk. This means you repeat all of the tray-specific steps for all disk enclosures attached to the array controller that will connect to the new disk. If you have a SPARCstorage Array 214 RSM, you need not perform any of the tray-specific steps, since individual disk drives can be installed without affecting other disks.
Refer to the hardware service manual for your multihost disk expansion unit for a description of your disk enclosure.
Depending upon the disk enclosure, adding SPARCstorage Array (SSA) multihost disks might involve taking off line all volume manager objects in the affected disk tray or disk enclosure. Additionally, the disk tray or disk enclosure might contain disks from more than one disk group, requiring that a single node own all of the affected disk groups.
These are the high-level steps to add a multihost disk in a Solstice DiskSuite configuration:
Switching logical hosts to one cluster node
Identifying the controller for this new disk, and locating an empty slot in the tray or enclosure
For Model 100 series SPARCstorage Arrays, preparing the disk enclosure for removal of a disk tray
For Model 200 series SPARCstorage Arrays with wide differential SCSI disk trays, powering down the controller and all attached disks
Deleting all hot spares from the affected drives
Deleting all metadevice state databases from the affected drives
Taking offline all metadevices containing affected drives
Spinning down all affected drives
Adding the new disk
Returning the affected drives to service
Spinning up all drives
Bringing back online all affected metadevices
Adding back all deleted hot spares
Recreating all deleted metadevices
Performing the administrative actions to prepare the disk for use by Sun Cluster
Creating the /devices special files and /dev/dsk and /dev/rdsk links
Running the scdidadm -r command
Adding the disk to the diskset
Formatting and partitioning the disk, if necessary
Performing the volume manager-related administrative tasks
These are the detailed steps to add a new multihost disk to a Solstice DiskSuite configuration.
Switch ownership of the logical host that will include the new disk to other nodes in the cluster.
Switch over any logical hosts with disks in the tray you are removing.
phys-hahost1# haswitch phys-hahost1 hahost1 hahost2
Determine the controller number of the tray to which the disk will be added.
SPARCstorage Arrays are assigned World Wide Names (WWNs). The WWN shown on the front of the SPARCstorage Array also appears in the /devices path to which the /dev entry containing the controller number is symbolically linked. For example:
phys-hahost1# ls -l /dev/rdsk | grep -i WWN | tail -1
If the WWN on the front of the SPARCstorage Array is 36cc, output similar to the following is displayed, and the controller number is c2:
phys-hahost1# ls -l /dev/rdsk | grep -i 36cc | tail -1
lrwxrwxrwx  1 root  root  94 Jun 25 22:39 c2t5d2s7 -> ../../devices/io-unit@f,e1200000/sbi@0,0/SUNW,soc@3,0/SUNW,pln@a0000800,201836cc/ssd@5,2:h,raw
Use the luxadm(1M) command with the display option to view the empty slots.
phys-hahost1# luxadm display c2
                  SPARCstorage Array Configuration
...
                        DEVICE STATUS
        TRAY 1          TRAY 2          TRAY 3
slot
1       Drive: 0,0      Drive: 2,0      Drive: 4,0
2       Drive: 0,1      Drive: 2,1      Drive: 4,1
3       NO SELECT       NO SELECT       NO SELECT
4       NO SELECT       NO SELECT       NO SELECT
5       NO SELECT       NO SELECT       NO SELECT
6       Drive: 1,0      Drive: 3,0      Drive: 5,0
7       Drive: 1,1      NO SELECT       NO SELECT
8       NO SELECT       NO SELECT       NO SELECT
9       NO SELECT       NO SELECT       NO SELECT
10      NO SELECT       NO SELECT       NO SELECT
...
The empty slots are shown with a NO SELECT status. The output shown here is from a SPARCstorage Array 110; your display will be slightly different if you are using a different series SPARCstorage Array.
Determine the tray to which you will add the new disk. If you can add the disk without affecting other drives, such as in the SPARCstorage Array 214 RSM, skip to Step 11.
In the remainder of the procedure, Tray 2 is used as an example. The slot selected for the new disk is Tray 2 Slot 7. The new disk will be known as c2t3d1.
Locate all hot spares affected by the installation.
To determine the status and location of all hot spares, run the metahs(1M) command with the -i option on each of the logical hosts.
phys-hahost1# metahs -s hahost1 -i
...
phys-hahost1# metahs -s hahost2 -i
...
Save a list of the hot spares. The list is used later in this maintenance procedure. Be sure to note the hot spare devices and their hot spare pools.
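One way to keep this record (a sketch only; the file names are arbitrary) is to direct the metahs(1M) output to temporary files, in the same way the replica information is saved later in this procedure:
phys-hahost1# metahs -s hahost1 -i > /usr/tmp/hotspares1
phys-hahost1# metahs -s hahost2 -i > /usr/tmp/hotspares2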
Use the metahs(1M) command with the -d option to delete all affected hot spares.
Refer to the man page for details on the metahs(1M) command.
phys-hahost1# metahs -s hahost1 -d hot-spare-pool components
phys-hahost1# metahs -s hahost2 -d hot-spare-pool components
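For example, if the list saved in the previous step showed a hot spare pool named hsp000 containing component c2t2d0s0 (hypothetical names used here only for illustration), the deletion would look like this:
phys-hahost1# metahs -s hahost1 -d hsp000 c2t2d0s0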
Locate all metadevice state database replicas that are on affected disks.
Run the metadb(1M) command on each of the logical hosts to locate all metadevice state databases. Direct the output into temporary files.
phys-hahost1# metadb -s hahost1 > /usr/tmp/mddb1
phys-hahost1# metadb -s hahost2 > /usr/tmp/mddb2
The output of metadb(1M) shows the location of metadevice state database replicas in this disk enclosure. Save this information for the step in which you restore the replicas.
Delete the metadevice state database replicas that are on affected disks.
Keep a record of the number and locale of the replicas that you delete. The replicas must be restored in a later step.
phys-hahost1# metadb -s hahost1 -d replicas
phys-hahost1# metadb -s hahost2 -d replicas
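For example, if the output saved in the previous step showed replicas on slice 7 of two of the affected drives (the device names below are hypothetical), the deletions might look like this:
phys-hahost1# metadb -s hahost1 -d c2t2d0s7
phys-hahost1# metadb -s hahost2 -d c2t2d1s7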
Run the metastat(1M) command to determine all the metadevice components on affected disks.
Direct the output from metastat(1M) to a temporary file so that you can use the information later when deleting and re-adding the metadevices.
phys-hahost1# metastat -s hahost1 > /usr/tmp/replicalog1
phys-hahost1# metastat -s hahost2 > /usr/tmp/replicalog2
Take offline all submirrors containing affected disks.
Use the temporary files to create a script to take offline all affected submirrors in the disk expansion unit. If only a few submirrors exist, run the metaoffline(1M) command to take each offline. The following is a sample script.
#!/bin/sh
# metaoffline -s <diskset> <mirror> <submirror>
metaoffline -s hahost1 d15 d35
metaoffline -s hahost2 d15 d35
...
Spin down the SPARCstorage Array disks in the tray using the luxadm(1M) command.
phys-hahost1# luxadm stop -t 2 c2
Add the new disk.
Use the instructions in your multihost disk expansion unit service manual to perform the hardware procedure of adding the disk. After the addition:
Make sure all disks in the tray spin up.
The disks in the SPARCstorage Array tray should spin up automatically, but if the tray fails to spin up within two minutes, force the action using the following command:
phys-hahost1# luxadm start -t 2 c2
Bring the submirrors back online.
Modify the script that you created in Step 9 to bring the submirrors back online.
#!/bin/sh
# metaonline -s <diskset> <mirror> <submirror>
metaonline -s hahost1 d15 d35
metaonline -s hahost2 d15 d35
...
Restore the hot spares that were deleted in Step 5.
phys-hahost1# metahs -s hahost1 -a hot-spare-pool components
phys-hahost1# metahs -s hahost2 -a hot-spare-pool components
Restore the original count of metadevice state database replicas to the devices in the tray.
The replicas were removed in Step 7.
phys-hahost1# metadb -s hahost1 -a replicas
phys-hahost1# metadb -s hahost2 -a replicas
Run the drvconfig(1M) and disks(1M) commands to create the new entries in /devices, /dev/dsk, and /dev/rdsk for all new disks.
phys-hahost1# drvconfig
phys-hahost1# disks
Switch ownership of the logical host that will include the new disk to the other node connected to the SPARCstorage Array.
This assumes a topology in which each disk is connected to two nodes.
phys-hahost1# haswitch phys-hahost2 hahost2
Run the drvconfig(1M) and disks(1M) commands on the cluster node that now owns the diskset to which this disk will be added.
phys-hahost2# drvconfig
phys-hahost2# disks
Run the scdidadm(1M) command to initialize the new disk for use by the DID pseudo driver.
You must run scdidadm(1M) on Node 0 in the cluster. Refer to the Sun Cluster 2.2 Software Installation Guide for details on the DID pseudo driver.
phys-hahost2# scdidadm -r
Add the disk to a diskset.
The command syntax is as follows, where diskset is the name of the diskset to which the disk is being added, and drive is the DID name of the disk in the form dN (for new installations of Sun Cluster), or cNtYdZ (for installations that upgraded from HA 1.3):
# metaset -s diskset -a drive
The metaset(1M) command might repartition this disk automatically. See the Solstice DiskSuite documentation for more information.
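As an illustration only, if the new disk were assigned the DID name d25 (a hypothetical value; use the name reported by scdidadm -l on your system) and were being added to diskset hahost2, the command would be:
phys-hahost2# metaset -s hahost2 -a d25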
Use the scadmin(1M) command to reserve and enable failfast on the specified disk that has just been added to the diskset.
phys-hahost2# scadmin reserve cNtXdYsZ
Perform the usual administration actions on the new disk.
You can now perform the usual administration steps that bring a new drive into service. These include partitioning the disk, adding it to the configuration as a hot spare, or configuring it as a metadevice. See the Solstice DiskSuite documentation for more information on these tasks.
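For example, to add slice 0 of the new disk (c2t3d1 in this procedure's example) to an existing hot spare pool, you might run a command such as the following; the pool name hsp000 is hypothetical:
phys-hahost2# metahs -s hahost2 -a hsp000 c2t3d1s0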
If necessary, switch logical hosts back to their default masters.
These are the high-level steps to add a multihost disk to a VxVM configuration:
Switching logical hosts to one cluster node
Identifying the controller for this new disk and locating an empty slot in the tray or enclosure
For Model 100 series SPARCstorage Arrays, preparing the disk enclosure for removal of a disk tray
For Model 200 series SPARCstorage Arrays with wide differential SCSI disk trays, powering down the controller and all attached disks
Identifying VxVM objects on the affected tray
Stopping I/O to volumes with subdisks on the affected tray
Adding the new disk
Returning the affected drives to service
Spinning up all drives
Bringing back online all affected VxVM objects
Performing the administrative actions to prepare the disk for use by Sun Cluster
Creating the /devices special files and /dev/dsk and /dev/rdsk links
Scanning for the new disk
Adding the disk to VM control
Formatting and partitioning the disk, if necessary
Performing the volume manager-related administrative tasks
These are the detailed steps to add a new multihost disk to a VxVM configuration.
Switch ownership of the logical host that will include the new disk to another node in the cluster.
Switch over any logical hosts with disks in the tray you are removing.
phys-hahost1# haswitch phys-hahost1 hahost1 hahost2
In a mirrored configuration, you may not need to switch logical hosts as long as the node is not shut down.
Determine the controller number of the tray to which the disk will be added.
SPARCstorage Arrays are assigned World Wide Names (WWNs). The WWN shown on the front of the SPARCstorage Array also appears in the /devices path to which the /dev entry containing the controller number is symbolically linked. For example:
phys-hahost1# ls -l /dev/rdsk | grep -i WWN | tail -1
If the WWN on the front of the SPARCstorage Array is 36cc, output similar to the following is displayed, and the controller number is c2:
phys-hahost1# ls -l /dev/rdsk | grep -i 36cc | tail -1
lrwxrwxrwx  1 root  root  94 Jun 25 22:39 c2t5d2s7 -> ../../devices/io-unit@f,e1200000/sbi@0,0/SUNW,soc@3,0/SUNW,pln@a0000800,201836cc/ssd@5,2:h,raw
phys-hahost1#
Use the luxadm(1M) command with the display option to view the empty slots.
If you can add the disk without affecting other drives, skip to Step 11.
phys-hahost1# luxadm display c2
                  SPARCstorage Array Configuration
...
                        DEVICE STATUS
        TRAY 1          TRAY 2          TRAY 3
slot
1       Drive: 0,0      Drive: 2,0      Drive: 4,0
2       Drive: 0,1      Drive: 2,1      Drive: 4,1
3       NO SELECT       NO SELECT       NO SELECT
4       NO SELECT       NO SELECT       NO SELECT
5       NO SELECT       NO SELECT       NO SELECT
6       Drive: 1,0      Drive: 3,0      Drive: 5,0
7       Drive: 1,1      NO SELECT       NO SELECT
8       NO SELECT       NO SELECT       NO SELECT
9       NO SELECT       NO SELECT       NO SELECT
10      NO SELECT       NO SELECT       NO SELECT
...
The empty slots are shown with a NO SELECT status. The output shown here is from a SPARCstorage Array 110; your display will be slightly different if you are using a different series SPARCstorage Array.
Determine the tray to which you will add the new disk.
In the remainder of the procedure, Tray 2 is used as an example. The slot selected for the new disk is Tray 2 Slot 7. The new disk will be known as c2t3d1.
Identify all the volumes and corresponding plexes on the disks in the tray which will contain the new disk.
From the physical device address cNtNdN, obtain the controller number and the target number.
In this example, the controller number is 2 and the target is 3.
Identify devices from a vxdisk list output.
Here is an example of how vxdisk can be used to obtain the information.
# vxdisk -g diskgroup -q list | nawk '/^c2/ {print $3}'
Record the volume media name for the disks from the output of the command.
Identify all plexes on the above devices by using the appropriate version (csh, ksh, or Bourne shell) of the following command.
PLLIST=`vxprint -ptq -g diskgroup -e '(aslist.sd_dm_name in ("c2t3d0")) && (pl_kstate=ENABLED)' | nawk '{print $2}'` |
For csh, the syntax is set PLLIST .... For ksh, the syntax is export PLLIST= .... The Bourne shell requires the command export PLLIST after the variable is set.
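For reference, and using the disk group and device names from the example above, the same assignment looks like this in each shell (a sketch only; substitute your own disk group and device names):
# csh
set PLLIST = `vxprint -ptq -g diskgroup -e '(aslist.sd_dm_name in ("c2t3d0")) && (pl_kstate=ENABLED)' | nawk '{print $2}'`

# ksh
export PLLIST=`vxprint -ptq -g diskgroup -e '(aslist.sd_dm_name in ("c2t3d0")) && (pl_kstate=ENABLED)' | nawk '{print $2}'`

# Bourne shell
PLLIST=`vxprint -ptq -g diskgroup -e '(aslist.sd_dm_name in ("c2t3d0")) && (pl_kstate=ENABLED)' | nawk '{print $2}'`
export PLLIST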
After you have set the variable, stop I/O to the volumes whose components (subdisks) are on the tray.
Make sure all volumes associated with that tray are detached (mirrored or RAID5 configurations) or stopped (simple plexes). Issue the following command to detach a mirrored plex.
# vxplex -g diskgroup det ${PLLIST}
An alternate command for detaching each plex in a tray is:
# vxplex -g diskgroup -v volume det plex
To stop I/O to simple plexes, unmount any file systems or stop database access.
Mirrored volumes will still be active because the other half of the mirror is still available.
Add the new disk.
Use the instructions in your multihost disk expansion unit service manual to perform the hardware procedure of adding the disk.
Make sure all disks in the tray spin up.
The disks in the SPARCstorage Array tray should spin up automatically, but if the tray fails to spin up within two minutes, force the action with the following command:
phys-hahost1# luxadm start -t 2 c2
Run the drvconfig(1M) and disks(1M) commands to create the new entries in /devices, /dev/dsk, and /dev/rdsk for all new disks.
phys-hahost1# drvconfig
phys-hahost1# disks
Force the VxVM vxconfigd driver to scan for new disks.
phys-hahost1# vxdctl enable
Bring the new disk under VM control by using the vxdiskadd command.
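For example, assuming the new disk is c2t3d1 as in the earlier example, the invocation is simply the following; vxdiskadd then prompts for the disk group and the disk media name.
phys-hahost1# vxdiskadd c2t3d1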
Perform the usual administration actions on the new disk.
You can now perform the usual administration steps that bring a new drive into service. These include partitioning the disk, adding it to the configuration as a hot spare, or configuring it as a plex.
This completes the procedure of adding a multihost disk to an existing SPARCstorage Array.
This section describes replacing a SPARCstorage Array (SSA) multihost disk without interrupting Sun Cluster services (online replacement) when the volume manager is reporting problems such as:
Components in the "Needs Maintenance" state
Hot spare replacement
Intermittent disk errors
These are the high-level steps to replace a multihost disk in a Solstice DiskSuite configuration. Some of the steps in this procedure apply only to configurations using SPARCstorage Array 100 series or SPARCstorage Array 200 series with the differential SCSI tray.
Switching logical hosts to one cluster node
Determining which disk needs replacement
Determining which tray holds the disk to be replaced
(SSA 100 and SSA 200 only) Detaching submirrors on the affected tray or disk enclosure
(SSA 100 and SSA 200 only) Running metaclear(1M) on the detached submirrors
(SSA 100 and SSA 200 only) Deleting available hot spares in the affected disk tray
Removing the bad disk from the diskset
(SSA 100 and SSA 200 only) Deleting any affected metadevice state database replicas on disks in the affected tray
(SSA 100 and SSA 200 only) Producing a list of metadevices in the affected tray
(SSA 100 and SSA 200 only) Using metaoffline(1M) on submirrors in the affected tray or submirrors using hot spares in the tray
(SSA 100 and SSA 200 only) Flushing NVRAM, if enabled
Spinning down the disk(s) and removing the tray or disk enclosure
Replacing the disk drive
Running the scdidadm -R command
Adding the new disk to the diskset
Reserving and enabling failfast on the new disk
Partitioning the new disk
(SSA 100 and SSA 200 only) Using the metainit(1M) command to initialize any devices that were cleared previously with the metaclear(1M) command
(SSA 100 and SSA 200 only) Bringing offline mirrors back on line using metaonline(1M) and resynchronizing
(SSA 100 and SSA 200 only) Attaching submirrors unattached previously
(SSA 100 and SSA 200 only) Replacing any hot spares in use in the submirrors that have just been attached
(SSA 100 and SSA 200 only) Returning the deleted hot spare devices to their original hot spare pools
Running the metastat(1M) command to verify the problem has been fixed
These are the detailed steps to replace a failed multihost disk in a Solstice DiskSuite configuration.
Switch ownership of the affected logical hosts to other nodes by using the haswitch(1M) command.
phys-hahost1# haswitch phys-hahost1 hahost1 hahost2
The SPARCstorage Array tray containing the failed disk might contain disks included in more than one logical host. If this is the case, switch ownership of all logical hosts with disks using this tray to another node in the cluster.
Identify the disk to be replaced by examining metastat(1M) and /var/adm/messages output.
When metastat(1M) reports that a device is in maintenance state or some of the components have been replaced by hot spares, you must locate and replace the device. A sample metastat(1M) output follows. In this example, device c3t3d4s0 is in maintenance state.
phys-hahost1# metastat -s hahost1
...
d50:Submirror of hahost1/d40
     State: Needs Maintenance
     Stripe 0:
         Device       Start Block    Dbase    State    Hot Spare
         c3t3d4s0     0              No       Okay     c3t5d4s0
...
Check /var/adm/messages to see what kind of problem has been detected.
...
Jun 1 16:15:26 host1 unix: WARNING: /io-unit@f,e1200000/sbi@0.0/SUNW,pln@a0000000,741022/ssd@3,4(ssd49):
Jun 1 16:15:26 host1 unix: Error for command `write(I))' Err
Jun 1 16:15:27 host1 unix: or Level: Fatal
Jun 1 16:15:27 host1 unix: Requested Block 144004, Error Block: 715559
Jun 1 16:15:27 host1 unix: Sense Key: Media Error
Jun 1 16:15:27 host1 unix: Vendor `CONNER':
Jun 1 16:15:27 host1 unix: ASC=0x10(ID CRC or ECC error),ASCQ=0x0,FRU=0x15
...
Determine the location of the problem disk by running the luxadm(1M) command.
The luxadm(1M) command lists the trays and the drives associated with them. The output differs for each SPARCstorage Array series. This example shows output from a SPARCstorage Array 100 series array. In this example, the damaged drive is Drive 3,4 in Tray 2 (device c3t3d4).
phys-hahost1# luxadm display c3
          SPARCstorage Array Configuration
Controller path:
/devices/iommu@f,e0000000/sbus@f,e0001000/SUNW,soc@0,0/SUNW,pln@a0000000,779a16:ctlr
                DEVICE STATUS
        TRAY1          TRAY2          TRAY3
Slot
1       Drive:0,0      Drive:2,0      Drive:4,0
2       Drive:0,1      Drive:2,1      Drive:4,1
3       Drive:0,2      Drive:2,2      Drive:4,2
4       Drive:0,3      Drive:2,3      Drive:4,3
5       Drive:0,4      Drive:2,4      Drive:4,4
6       Drive:1,0      Drive:3,0      Drive:5,0
7       Drive:1,1      Drive:3,1      Drive:5,1
8       Drive:1,2      Drive:3,2      Drive:5,2
9       Drive:1,3      Drive:3,3      Drive:5,3
10      Drive:1,4      Drive:3,4      Drive:5,4

                CONTROLLER STATUS
Vendor:        SUN
Product ID:    SSA110
Product Rev:   1.0
Firmware Rev:  3.9
Serial Num:    000000741022
Accumulate performance Statistics: Enabled
Detach all submirrors with components on the disk being replaced.
If you are detaching a submirror that has a failed component, you must force the detach using the metadetach -f command. The following example command detaches submirror d50 from metamirror d40.
phys-hahost1# metadetach -s hahost1 -f d40 d50
Use the metaclear(1M) command to clear the submirrors detached in Step 4.
phys-hahost1# metaclear -s hahost1 -f d50
Before deleting replicas and hot spares, make a record of the location (slice), number of replicas, and hot spare information (names of the devices and list of devices that contain hot spare pools) so that the actions can be reversed following the disk replacement.
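One way to capture this record (a sketch only; the file names are arbitrary) is to save the current replica and hot spare state to temporary files before deleting anything:
phys-hahost1# metadb -s hahost1 > /usr/tmp/hahost1.mddb
phys-hahost1# metadb -s hahost2 > /usr/tmp/hahost2.mddb
phys-hahost1# metahs -s hahost1 -i > /usr/tmp/hahost1.hs
phys-hahost1# metahs -s hahost2 -i > /usr/tmp/hahost2.hs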
Delete all hot spares that have Available status and are in the same tray as the problem disk.
This includes all hot spares, regardless of their logical host assignment. In the following example, the metahs(1M) command reports hot spares on hahost1, but shows that none are present on hahost2:
phys-hahost1# metahs -s hahost1 -i
hahost1:hsp000 2 hot spares
        c1t4d0s0        Available       2026080 blocks
        c3t2d5s0        Available       2026080 blocks
phys-hahost1# metahs -s hahost1 -d hsp000 c3t2d4s0
hahost1:hsp000:
        Hotspare is deleted
phys-hahost1# metahs -s hahost2 -i
phys-hahost1#
hahost1:hsp000 1 hot spare
        c3t2d5s0        Available       2026080 blocks
Use the metaset(1M) command to remove the failed disk from the diskset.
The syntax for the command is shown below. In this example, diskset is the name of the diskset containing the failed disk, and drive is the DID name of the disk in the form dN (for new installations of Sun Cluster), or cNtYdZ (for installations that upgraded from HA 1.3).
# metaset -s diskset -d drive
This operation can take fifteen minutes or more, depending on the size of your configuration and the number of disks.
Delete any metadevice state database replicas that are on disks in the tray to be serviced.
The metadb(1M) command with the -s option reports replicas in a specified diskset.
phys-hahost1# metadb -s hahost1
phys-hahost1# metadb -s hahost2
phys-hahost1# metadb -s hahost1 -d replicas-in-tray
phys-hahost1# metadb -s hahost2 -d replicas-in-tray
Locate the submirrors using components that reside in the affected tray.
One method is to use the metastat(1M) command to create temporary files that contain the names of all metadevices. For example:
phys-hahost1# metastat -s hahost1 > /usr/tmp/hahost1.stat
phys-hahost1# metastat -s hahost2 > /usr/tmp/hahost2.stat
Search the temporary files for the components in question (c3t3dn and c3t2dn in this example). The information in the temporary files will look like this:
...
hahost1/d35: Submirror of hahost1/d15
   State: Okay
   Hot Spare pool: hahost1/hsp100
   Size: 2026080 blocks
   Stripe 0:
      Device      Start Block     Dbase     State     Hot Spare
      c3t3d3s0    0               No        Okay

hahost1/d54: Submirror of hahost1/d24
   State: Okay
   Hot Spare pool: hahost1/hsp106
   Size: 21168 blocks
   Stripe 0:
      Device      Start Block     Dbase     State     Hot Spare
      c3t3d3s6    0               No        Okay
...
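One way to pull out only the affected entries (a sketch, assuming the affected components begin with c3t3d and c3t2d as in this example) is to search the temporary files in nawk paragraph mode:
phys-hahost1# nawk 'BEGIN {RS=""} /c3t3d|c3t2d/' /usr/tmp/hahost1.stat
phys-hahost1# nawk 'BEGIN {RS=""} /c3t3d|c3t2d/' /usr/tmp/hahost2.stat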
Take offline all other submirrors that have components in the affected tray.
Using the output from the temporary files in Step 10, run the metaoffline(1M) command on all submirrors in the affected tray.
phys-hahost1# metaoffline -s hahost1 d15 d35
phys-hahost1# metaoffline -s hahost1 d24 d54
...
Run metaoffline(1M) as many times as necessary to take all the submirrors off line. This forces Solstice DiskSuite to stop using the submirror components.
If enabled, flush the NVRAM on the controller, tray, individual disk or disks.
phys-hahost1# luxadm sync_cache pathname
A confirmation appears, indicating that the NVRAM has been flushed. See "Flushing and Purging NVRAM" for details on flushing NVRAM data.
Spin down all disks in the affected SPARCstorage Array tray(s).
Use the luxadm stop command to spin down the disks. Refer to the luxadm(1M) man page for details.
phys-hahost1# luxadm stop -t 2 c3
Do not run any Solstice DiskSuite commands while a SPARCstorage Array tray is spun down because the commands might have the side effect of spinning up some or all of the drives in the tray.
Replace the disk.
Refer to the hardware service manuals for your SPARCstorage Array for details on this procedure.
Update the DID driver's database with the new device ID.
Use the -l flag to scdidadm(1M) to identify the DID name for the lower-level device name of the drive to be replaced. Then update the DID drive database using the -R flag to scdidadm(1M). Refer to the Sun Cluster 2.2 Software Installation Guide for details on the DID pseudo driver.
phys-hahost1# scdidadm -o name -l /dev/rdsk/c3t3d4
6       phys-hahost1:/dev/rdsk/c3t3d4   /dev/did/rdsk/d6
phys-hahost1# scdidadm -R d6
Make sure all disks in the affected multihost disk expansion unit spin up.
The disks in the multihost disk expansion unit should spin up automatically. If the tray fails to spin up within two minutes, force the action by using the following command:
phys-hahost1# luxadm start -t 2 c3
Add the new disk back into the diskset by using the metaset(1M) command.
This step automatically adds back the number of replicas that were deleted from the failed disk. The command syntax is as follows, where diskset is the name of the diskset containing the failed disk, and drive is the DID name of the disk in the form dN (for new installations of Sun Cluster), or cNtYdZ (for installations that upgraded from HA 1.3):
# metaset -s diskset -a drive
(Optional) If you deleted replicas that belonged to other disksets from disks that were in the same tray as the errored disk, use the metadb(1M) command to add back the replicas.
phys-hahost1# metadb -s hahost2 -a deleted-replicas
To add multiple replicas to the same slice, use the -c option.
Use the scadmin(1M) command to reserve and enable failfast on the specified disk that has just been added to the diskset.
phys-hahost2# scadmin reserve c3t3d4
Use the format(1M) or fmthard(1M) command to repartition the new disk.
Make sure that you partition the new disk exactly as the disk it replaces was partitioned. (Saving the disk format information was recommended in Chapter 1, Preparing for Sun Cluster Administration.)
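If you saved the volume table of contents (VTOC) of the original disk with prtvtoc(1M), you can apply it to the replacement in one step with fmthard(1M). The saved file name below is hypothetical, and the replacement disk must have the same geometry as the original.
phys-hahost1# fmthard -s /etc/vtoc.c3t3d4 /dev/rdsk/c3t3d4s2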
Use the metainit(1M) command to reinitialize disks that were cleared in Step 5.
phys-hahost1# metainit -s hahost1 d50
Bring online all submirrors that were taken off line in Step 11.
phys-hahost1# metaonline -s hahost1 d15 d35
phys-hahost1# metaonline -s hahost1 d24 d54
...
Run the metaonline(1M) command as many times as necessary to bring online all the submirrors.
When the submirrors are brought back online, Solstice DiskSuite automatically performs resyncs on all the submirrors, bringing all data up-to-date.
Running the metastat(1M) command at this time would show that all metadevices with components residing in the affected tray are resyncing.
Attach submirrors that were detached in Step 4.
Use the metattach(1M) command to perform this step. See the metattach(1M) man page for details.
phys-hahost1# metattach -s hahost1 d40 d50
Replace any hot spares in use in the submirrors attached in Step 23.
If a submirror had a hot spare replacement in use before you detached the submirror, this hot spare replacement will be in effect after the submirror is reattached. This step returns the hot spare to Available status.
phys-hahost1# metareplace -s hahost1 -e d40 c3t3d4s0
Restore all hot spares that were deleted in Step 7.
Use the metahs(1M) command to add back the hot spares. See the metahs(1M) man page for details.
phys-hahost1# metahs -s hahost1 -a hsp000 c3t2d5s0
If necessary, switch logical hosts back to their default masters.
phys-hahost1# haswitch phys-hahost2 hahost2
Verify that the replacement corrected the problem.
phys-hahost1# metastat -s hahost1
In a VxVM configuration, it is possible to replace a SPARCstorage Array disk without halting the system, as long as the configuration is mirrored.
If you need to replace a disk in a bootable SPARCstorage Array, do not remove the SSA trays containing the boot disk of the hosts. Instead, shut down the host whose boot disk is present on that tray. Let the cluster software reconfigure the surviving nodes for failover to take effect before servicing the faulty disk. Refer to the SPARCstorage Array User's Guide for more information.
These are the high-level steps to replace a multihost disk in a VxVM environment using SPARCstorage Array 100 series disks.
Identifying all volumes and corresponding plexes on the disks in the tray which contains the faulty disk
Determining the controller and target number of the errored disk
Identifying devices on the tray by using the vxdisk list command
Identifying all plexes on the affected tray
Detaching all plexes on the affected tray
Removing the disk from its disk group
Spinning down the disks in the tray
Replacing the disk drive
Spinning up the drives in the tray
Initializing the replacement disk drive
Scanning the current disk configuration
Adding the replacement disk drive to the disk group
Resynchronizing the volumes
These are the detailed steps to replace a multihost disk in a VxVM environment using SPARCstorage Array 100 series disks.
If the disk to be replaced is a quorum device, use the scconf -q command to change the quorum device to a different disk.
Identify all the volumes and corresponding plexes on the disks in the tray which contains the faulty disk.
From the physical device address cNtNdN, obtain the controller number and the target number.
For example, if the device address is c3t2d0, the controller number is 3 and the target is 2.
Identify devices from a vxdisk list output.
If the target is 0 or 1, identify all devices with physical addresses beginning with cNt0 and cNt1, where N is the controller number. If the target is 2 or 3, identify all devices with physical addresses beginning with cNt2 and cNt3. If the target is 4 or 5, identify all devices with physical addresses beginning with cNt4 and cNt5. Here is an example of how vxdisk can be used to obtain the information.
# vxdisk -g diskgroup -q list | egrep c3t2\|c3t3 | nawk '{print $3}'
Record the volume media name for the faulty disk from the output of the command.
You will need this name in Step 10.
Identify all plexes on the above devices by using the appropriate version (csh, ksh, or Bourne shell) of the following command.
PLLIST=`vxprint -ptq -g diskgroup -e '(aslist.sd_dm_name in ("c3t2d0","c3t3d0","c3t3d1")) && (pl_kstate=ENABLED)' | nawk '{print $2}'` |
For csh, the syntax is set PLLIST .... For ksh, the syntax is export PLLIST= .... The Bourne shell requires the command export PLLIST after the variable is set.
After you have set the variable, stop I/O to the volumes whose components (subdisks) are on the tray.
Make sure all volumes associated with that tray are detached (mirrored or RAID5 configurations) or stopped (simple plexes). Issue the following command to detach a mirrored plex.
# vxplex det ${PLLIST}
An alternate command for detaching each plex in a tray is:
# vxplex -g diskgroup -v volume det plex
To stop I/O to simple plexes, unmount any file systems or stop database access.
Mirrored volumes will still be active because the other half of the mirror is still available.
Remove the disk from the disk group.
# vxdg -g diskgroup rmdisk diskname
Spin down the disks in the tray.
# luxadm stop -t tray controller
Replace the faulty disk.
Spin up the drives.
# luxadm start -t tray controller
Initialize the replacement disk.
# vxdisksetup -i devicename
Scan the current disk configuration again.
Enter the following commands on all nodes in the cluster.
# vxdctl enable
# vxdisk -a online
Add the new disk to the disk group.
The device-media-name is the volume media name recorded in Step 2c.
# vxdg -g diskgroup -k adddisk device-media-name=device-name
Resynchronize the volumes.
# vxrecover -g diskgroup -b -o