This chapter provides instructions for administering Sun StorEdge MultiPack and Sun StorEdge D1000 disks. Some of the procedures documented in this chapter are dependent on your volume management software (Solstice DiskSuite, SSVM, or CVM). These procedures include the volume manager name in their titles.
"12.2 Administering Sun StorEdge MultiPacks and Sun StorEdge D1000s"
"12.3 Administering Sun StorEdge MultiPack and Sun StorEdge D1000 Disks"
This chapter includes the following procedures:
"12.1.1 How to Recover From Power Loss (Solstice DiskSuite)"
"12.2.2 How to Repair a Lost Sun StorEdge MultiPack or Sun StorEdge D1000 Connection"
"12.2.4 How to Add a Sun StorEdge MultiPack or Sun StorEdge D1000"
"12.3.2 How to Add a Sun StorEdge MultiPack or a Sun StorEdge D1000 Disk"
"12.3.4 How to Replace a Sun StorEdge MultiPack or Sun StorEdge D1000 Disk (Solstice DiskSuite)"
"12.3.4 How to Replace a Sun StorEdge MultiPack or Sun StorEdge D1000 Disk (Solstice DiskSuite)"
"12.3.5 How to Replace a Sun StorEdge MultiPack or Sun StorEdge D1000 Disk (SSVM or CVM)"
"12.3.7 How to Replace a Sun StorEdge MultiPack or Sun StorEdge D1000 Enclosure (SSVM or CVM)"
Use the service manual for your Sun StorEdge MultiPack and Sun StorEdge D1000 disks, and the volume management software documentation, when you are replacing or repairing disk hardware in the Sun Cluster configuration.
When power is lost to one Sun StorEdge MultiPack or Sun StorEdge D1000, I/O operations generate errors that are detected by your volume management software. Errors are not reported until I/O transactions are made to the disk.
You should monitor the configuration for these events using the commands described in Chapter 2, Sun Cluster Administration Tools.
These are the high-level steps to recover from power loss to a disk enclosure in a Solstice DiskSuite environment:
Identifying the errored replicas
Returning the errored replicas to service
Identifying the errored devices
Returning the errored devices to service
Resyncing the disks
These are the detailed steps to recover from power loss to a disk enclosure in a Solstice DiskSuite environment.
When power is restored, use the metadb(1M) command to identify the errored replicas:
# metadb -s diskset
Return replicas to service.
After the loss of power, all metadevice state database replicas on the affected disk enclosure chassis enter an errored state. Because metadevice state database replica recovery is not automatic, it is safest to perform the recovery immediately after the disk enclosure returns to service. Otherwise, a new failure can cause a majority of replicas to be out of service and cause a kernel panic. This is the expected behavior of Solstice DiskSuite when too few replicas are available.
While these errored replicas will be reclaimed at the next takeover (haswitch(1M) or reboot(1M)), you might want to return them to service manually by first deleting and then adding them back.
Make sure that you add back the same number of replicas that were deleted on each slice. You can delete multiple replicas with a single metadb(1M) command. If you need multiple copies of replicas on one slice, you must add them in one invocation of the metadb(1M) command using the -c flag.
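Finding the errored replicas in metadb output can be scripted. The following sketch filters a sample of metadb-style output for replicas whose status flags contain an uppercase letter, which by Solstice DiskSuite convention indicates an error (the sample lines and slice names are illustrative, not from a real cluster):

```shell
# Sketch: pick out errored replicas from `metadb -s diskset` output.
# Uppercase status flags (for example W or M) mark errors; lowercase
# flags are normal. Sample output is embedded for illustration only.
sample_metadb_output() {
  cat <<'EOF'
        flags           first blk       block count
     a m  p  luo        16              1034            /dev/dsk/c3t3d4s7
      W   p  l          16              1034            /dev/dsk/c3t3d5s7
     a    p  luo        16              1034            /dev/dsk/c3t3d6s7
EOF
}

errored_replicas() {
  # Scan the flag fields (everything before the two numeric columns
  # and the device path) for an uppercase letter, then print the slice.
  sample_metadb_output | awk '
    /\/dev\// {
      for (i = 1; i < NF - 2; i++)
        if ($i ~ /[A-Z]/) { print $NF; break }
    }'
}
errored_replicas
```

On a live system you would pipe real `metadb -s diskset` output into the same awk filter instead of the embedded sample.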
Use the metastat(1M) command to identify the errored metadevices.
# metastat -s diskset
Return errored metadevices to service using the metareplace(1M) command, and resync the disks.
# metareplace -s diskset -e mirror component
The -e option transitions the component (slice) to the available state and performs a resync.
Components that have been replaced by a hot spare should be the last devices replaced using the metareplace(1M) command. If the hot spare is replaced first, it could replace another errored submirror as soon as it becomes available.
You can perform a resync on only one component of a submirror (metadevice) at a time. If all components of a submirror were affected by the power outage, each component must be replaced separately. It takes approximately 10 minutes to resync a 1.05GB disk.
If both disksets in a symmetric configuration were affected by the power outage, you can resync each diskset's affected submirrors concurrently. Log into each host separately to recover that host's diskset by running metareplace(1M) on each.
Depending on the number of submirrors and the number of components in those submirrors, the resync actions can require a considerable amount of time. A single submirror made up of 30 1.05GB drives might take about five hours to complete, whereas the same drives divided among five submirrors might take only about 50 minutes, because independent submirrors resync concurrently.
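The arithmetic behind those estimates can be sketched directly: components of one submirror resync serially, while independent submirrors resync concurrently. This uses the 10-minutes-per-1.05GB-drive figure quoted above; real times vary with hardware and load, and the guide's 50-minute figure for five submirrors is in the same ballpark as this rough calculation:

```shell
# Back-of-the-envelope resync estimates using the per-drive figure
# quoted in the text (10 minutes per 1.05GB drive; illustrative only).
minutes_per_drive=10
drives=30

# One submirror resyncs its components one at a time.
serial_minutes=$((drives * minutes_per_drive))          # 300 min = 5 hours

# Independent submirrors resync concurrently, so elapsed time is
# driven by the largest submirror.
submirrors=5
concurrent_minutes=$(((drives / submirrors) * minutes_per_drive))

echo "serial: ${serial_minutes} min, concurrent: ${concurrent_minutes} min"
```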
Power failures can detach disk drives and cause plexes to become detached, and thus, unavailable. The volume remains active, however, because the remaining plexes in a mirrored volume are still available. It is possible to reattach the disk drives and recover from this condition without halting nodes in the cluster.
These are the high-level steps to recover from power loss to a disk enclosure in an SSVM configuration:
Determining the errored plex(es) by using the vxprint and vxdisk commands
Fixing the problem that caused the power loss
Using the drvconfig and disks commands to create the /devices and /dev entries
Scanning the current disk configuration
Reattaching disks that had transient errors
Verifying there are no more errors
(Optional) For shared disk groups, running the vxdg command for each disk that was powered off
Starting volume recovery
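The command sequence above can be captured in one script. Because the real commands exist only on a Solaris node with SSVM installed, each is stubbed here as an echo so the ordering can be seen (and tested) as a dry run; on a live system you would remove the stubs and run the sequence on all nodes (master first for CVM):

```shell
# Dry-run sketch of the SSVM power-loss recovery sequence.
# Each command is stubbed as an echo for illustration; the real
# commands run only on a Solaris node.
for cmd in drvconfig disks vxdctl vxdisk vxreattach vxrecover; do
  eval "$cmd() { echo \"$cmd \$*\"; }"
done

recover_enclosure() {
  drvconfig               # rediscover the drives (run on all nodes)
  disks
  vxdctl enable           # rescan the current disk configuration
  vxdisk -a online
  vxreattach -r           # reattach disks that had transient failures
  vxrecover -bv           # start volume recovery (use -svc for shared groups)
}
recover_enclosure
```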
These are the detailed steps to recover from power loss to a disk enclosure in an SSVM configuration.
Use the vxprint command to view the errored plexes.
Optionally, specify a diskgroup with the -g diskgroup option.
Use the vxdisk command to identify the errored disks.
# vxdisk list
DEVICE       TYPE      DISK         GROUP        STATUS
...
-            -         c1t5d0       toi          failed was:c1t5d0s2
...
Fix the condition that resulted in the problem so that power is restored to all failed disks.
Be sure that the disks are spun up before proceeding.
Enter the following commands on all nodes in the cluster.
In some cases, the drive(s) must be rediscovered by the node(s).
# drvconfig
# disks
Enter the following commands on all nodes in the cluster.
The volume manager must scan the current disk configuration again.
# vxdctl enable
# vxdisk -a online
Enter the following command on all nodes in the cluster.
For CVM, enter the command on the master node first, then on the remaining nodes.
This will reattach and initiate recovery on disks that had transitory failure.
# vxreattach -r
Verify the output of the vxdisk command to see if there are any more errors.
# vxdisk list
If media was replaced, enter the following command from the master node for each disk that has been disconnected.
The physical disk and the volume manager access name for that disk must be reconnected.
# vxdg -g diskgroup -k adddisk medianame=accessname
The values for medianame and accessname appear at the end of the vxdisk list command output.
For example:
# vxdg -g toi -k adddisk c1t5d0=c1t5d0s2
# vxdg -g toi -k adddisk c1t5d1=c1t5d1s2
# vxdg -g toi -k adddisk c1t5d2=c1t5d2s2
# vxdg -g toi -k adddisk c1t5d3=c1t5d3s2
# vxdg -g toi -k adddisk c1t5d4=c1t5d4s2
You can also use the vxdiskadm command, or the graphical user interface, to reattach the disks.
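Generating the medianame=accessname pairs can be scripted from the vxdisk list output. This sketch assumes the failed-disk line format shown earlier in this section (dashes in the DEVICE and DISK columns, a "was:" suffix carrying the old access name); verify the field layout on your own system before relying on it:

```shell
# Sketch: turn `vxdisk list` failed-disk lines into the matching
# `vxdg -k adddisk` commands. Field positions follow the example
# output earlier in this section; sample data is embedded.
sample_vxdisk_list() {
  cat <<'EOF'
DEVICE       TYPE      DISK         GROUP        STATUS
-            -         c1t5d0       toi          failed was:c1t5d0s2
-            -         c1t5d1       toi          failed was:c1t5d1s2
EOF
}

adddisk_commands() {
  sample_vxdisk_list | awk '
    $5 == "failed" && $6 ~ /^was:/ {
      access = substr($6, 5)     # strip the "was:" prefix
      printf "vxdg -g %s -k adddisk %s=%s\n", $4, $3, access
    }'
}
adddisk_commands
```

The generated lines match the manual commands shown in the example above; review them before executing anything on the master node.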
From the node, start volume recovery.
# vxrecover -bv [-g diskgroup]
If you have shared disk groups, use the -svc options to the vxrecover command.
(Optional) Use the vxprint -g command to view the changes.
This section describes procedures for administering Sun StorEdge MultiPack and Sun StorEdge D1000 components. Use the procedures described in your server hardware manual to identify the failed component.
When a connection from a disk enclosure to one of the cluster nodes fails, the failure is probably due to a bad SCSI-2 cable or an SBus card.
In any event, the node on which the failure occurred will begin generating errors when the failure is discovered. Later accesses to the disk enclosure will generate additional errors. The node will exhibit the same behavior as though power had been lost to the disk enclosure.
I/O operations from the other nodes in the cluster are unaffected by this type of failure.
To diagnose the failure, use the procedures for testing the card module in the service manual for your Sun Cluster node to determine which component failed. You should free up one node and the disk enclosure that appears to be down, for hardware debugging.
Prepare the Sun Cluster system for component replacement.
Depending on the cause of the connection loss, prepare the Sun Cluster node with one of the following procedures.
If the failed component is an SBus card, see Chapter 7, Administering Server Components, to prepare the Sun Cluster node for power down.
If the problem is a bad SCSI-2 cable, the volume management software will have detected the problem and prepared the system for cable replacement.
Replace the failed component.
If the SCSI-2 cable or SBus card fails, refer to the service manual for your Sun Cluster node for detailed instructions on replacing them.
Recover from volume management software errors.
Use the procedures described in "12.1 Recovering From Power Loss".
You can add Sun StorEdge MultiPacks or Sun StorEdge D1000s to a Sun Cluster configuration at any time.
You must review the disk group configuration in your Sun Cluster configuration before adding a disk enclosure. The discussions in the chapter on planning the configuration in the Sun Cluster 2.2 Software Installation Guide, and in Appendix A, Administering Volume Managers, in this book, will help determine the impact of the disk enclosure on the configuration of disk groups.
Shut down one of the cluster nodes.
Use the procedure in "4.2 Stopping the Cluster and Cluster Nodes", to shut down the node.
Install an additional SBus card in the node, if necessary.
Use the instructions in the hardware service manual for your Sun Cluster node to install the SBus card.
Install the SBus card in the first available empty SBus slot, following all other cards in the node. This ensures that the controller numbering will be preserved if the Solaris operating environment is reinstalled. Refer to "1.4 Instance Names and Numbering", for more information.
Connect the SCSI-2 cables to the disk enclosure.
Use the instructions in the hardware service manual for your Sun Cluster node.
Set the SCSI initiator ID, as appropriate.
Use the instructions in the hardware service manual for your Sun Cluster node.
Perform a reconfiguration reboot of the node.
ok boot -r
Use the haswitch(1M) command to switch ownership of all logical hosts that can be mastered to the rebooted node.
phys-hahost1# haswitch phys-hahost2 hahost1 hahost2
Repeat "" through Step 5 on other nodes connected to this disk enclosure.
Switch ownership of the logical hosts back to the appropriate default master if necessary.
For example:
phys-hahost1# haswitch phys-hahost2 hahost2
Add the disks in the disk enclosures to the selected disk group.
Use the instructions in your volume manager documentation to add the disks to the selected disk group(s). Also, refer to appendixes in the Sun Cluster 2.2 Software Installation Guide for information on Solstice DiskSuite, SSVM, or CVM.
As part of standard Sun Cluster administration, you should monitor the status of the configuration. See Chapter 2, Sun Cluster Administration Tools, for information about monitoring methods. During the monitoring process you might discover problems with multihost disks. The following procedures describe how to correct these problems.
Sun Cluster supports different disk types. Refer to the hardware service manual for your multihost disk expansion unit for a description of your disk enclosure.
In a symmetric configuration, the disk enclosure might contain disks from multiple disk groups and will require that a single node own all of the affected disk groups.
These are the high-level steps to add a Sun StorEdge MultiPack or Sun StorEdge D1000 disk:
Identifying the controller for this new disk and locating an empty slot in the disk enclosure
Adding the new disk
Performing the administrative actions to prepare the disk for use by Sun Cluster
Creating the /devices special files and /dev/dsk and /dev/rdsk links
Adding the disk to the disk group
Formatting and partitioning the disk, if necessary
Performing the volume management-related administrative tasks
These are the detailed steps to add a new Sun StorEdge MultiPack or Sun StorEdge D1000 disk.
Determine the controller number of the disk enclosure to which the disk will be added.
Use the mount(1M) or format(1M) command to determine the controller number.
Locate an appropriate empty disk slot in the disk enclosure for the disk being added.
Identify the empty slots either by observing the disk drive LEDs on the front of the disk enclosure, or by removing the left side cover of the unit. The target address IDs corresponding to the slots appear on the middle partition of the drive bay.
In the following steps, Tray 2 is used as an example. The slot selected for the new disk is Tray 2 Slot 7. The new disk will be known as c2t3d1.
Add the new disk.
Use the instructions in your disk enclosure unit service manual to perform the hardware procedure of adding the disk.
Run the drvconfig(1M) and disks(1M) commands to create the new entries in /devices, /dev/dsk, and /dev/rdsk for all new disks.
phys-hahost1# drvconfig
phys-hahost1# disks
Switch ownership of the logical hosts to the other cluster node to which this disk is connected.
phys-hahost1# haswitch phys-hahost2 hahost1 hahost2
Run the drvconfig(1M) and disks(1M) commands on the node that now owns the disk group to which the disk will be added.
phys-hahost2# drvconfig
phys-hahost2# disks
Add the disk to a disk group using your volume management software.
For Solstice DiskSuite, the command syntax is as follows, where diskset is the name of the diskset containing the failed disk, and drive is the DID name of the disk in the form dN (for new installations of Sun Cluster), or cNtYdZ (for installations that upgraded from HA 1.3):
# metaset -s diskset -a drive
For SSVM or CVM, you can use the command line or graphical user interface to add the disk to the disk group.
If you are using Solstice DiskSuite, the metaset(1M) command might repartition this disk automatically. See the Solstice DiskSuite documentation for more information.
(Solstice DiskSuite configurations only) After adding the disks to the diskset by using the metaset(1M) command, use the scadmin(1M) command to reserve and enable failfast on the specified disks.
phys-hahost1# scadmin reserve drivename
Perform the usual administration actions on the new disk.
You can now perform the usual administration steps that are performed when a new drive is brought into service. See your volume management software documentation for more information on these tasks.
If necessary, switch logical hosts back to their default masters.
This section describes replacing a multihost disk without interrupting Sun Cluster services (online replacement) when the volume manager is reporting problems such as:
Components in the Needs Maintenance state
Hot spare replacement
Intermittent disk errors
Consult your volume management software documentation for offline replacement procedures.
Use the following procedure if you have determined that a disk has components in the Needs Maintenance state, a hot spare has replaced a component, or a disk is generating intermittent errors.
These are the high-level steps to replace a Sun StorEdge MultiPack or Sun StorEdge D1000 disk in a Solstice DiskSuite configuration:
Determining which disk needs replacement
Determining which disk expansion unit holds the disk to be replaced
Removing the bad disk from the diskset
Spinning down the disk and opening the disk enclosure
Replacing the disk drive
Running the scdidadm -R command
Adding the new disk to the diskset
Reserving and enabling failfast on the disk
Partitioning the new disk
Running the metastat(1M) command to verify the problem has been fixed
These are the detailed steps to replace a failed Sun StorEdge MultiPack or Sun StorEdge D1000 disk in a Solstice DiskSuite configuration.
Run the procedure on the host that masters the diskset in which the bad disk resides. This might require you to switch over the diskset using the haswitch(1M) command.
Identify the disk to be replaced.
Use the metastat(1M) command and /var/adm/messages output.
When metastat(1M) reports that a device is in maintenance state or some of the components have been replaced by hot spares, you must locate and replace the device. A sample metastat(1M) output follows. In this example, device c3t3d4s0 is in maintenance state:
phys-hahost1# metastat -s hahost1
 ...
 d50: Submirror of hahost1/d40
      State: Needs Maintenance
      Stripe 0:
          Device       Start Block  Dbase  State  Hot Spare
          c3t3d4s0     0            No     Okay   c3t5d4s0
 ...
Check /var/adm/messages to see what kind of problem has been detected.
...
Jun 1 16:15:26 host1 unix: WARNING: /io-unit@f,e1200000/sbi@0.0/SUNW,pln@a0000000,741022/ssd@3,4(ssd49):
Jun 1 16:15:26 host1 unix: Error for command `write(I))'
Jun 1 16:15:27 host1 unix: Error Level: Fatal
Jun 1 16:15:27 host1 unix: Requested Block 144004, Error Block: 715559
Jun 1 16:15:27 host1 unix: Sense Key: Media Error
Jun 1 16:15:27 host1 unix: Vendor `CONNER':
Jun 1 16:15:27 host1 unix: ASC=0x10(ID CRC or ECC error), ASCQ=0x0, FRU=0x15
...
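Scanning metastat output for submirrors that need maintenance can be automated with a short filter. The sketch below works against a sample in the format of the example above (two hypothetical submirrors, d50 and d51); on a live system you would feed it real `metastat -s diskset` output:

```shell
# Sketch: list submirrors that metastat reports as needing maintenance.
# The sample follows the output format shown above; adjust the patterns
# if your release formats metastat output differently.
sample_metastat() {
  cat <<'EOF'
d50: Submirror of hahost1/d40
    State: Needs Maintenance
d51: Submirror of hahost1/d41
    State: Okay
EOF
}

needs_maintenance() {
  sample_metastat | awk '
    /^d[0-9]+:/ { sub(":", "", $1); name = $1 }   # remember metadevice name
    /State: Needs Maintenance/ { print name }     # report it on a bad state
  '
}
needs_maintenance
```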
Determine the location of the problem disk.
Use the mount(1M) or format(1M) command to determine the controller number.
If the problem disk contains replicas, make a record of the slice and the number of replicas on it, then delete the replicas.
Use the metadb(1M) command to delete the replicas.
Detach all submirrors with components on the disk being replaced.
If you are detaching a submirror that has a failed component, you must force the detach using the metadetach -f option. The following example detaches submirror d50 from metamirror d40.
phys-hahost1# metadetach -s hahost1 -f d40 d50
Use the metaclear(1M) command to clear the submirrors detached in Step 5.
phys-hahost1# metaclear -s hahost1 -f d50
If the problem disk contains hot spares, make a record of the device names and the hot spare pools that contain them, then delete the hot spares.
Use the metahs(1M) command to delete hot spares.
You need to record the information before deleting the objects so that the actions can be reversed following the disk replacement.
Use the metaset(1M) command to remove the failed disk from the diskset.
The command syntax is as follows, where diskset is the name of the diskset containing the failed disk, and drive is the DID name of the disk in the form dN (for new installations of Sun Cluster), or cNtYdZ (for installations that upgraded from HA 1.3):
phys-hahost1# metaset -s diskset -d drive
This can take fifteen minutes or more, depending on the size of your configuration and the number of disks.
Replace the bad disk.
Refer to the hardware service manuals for your disk enclosure for details on this procedure.
Make sure the new disk spins up.
The disk should spin up automatically.
Update the DID driver's database with the new device ID.
If you upgraded from HA 1.3, your installation does not use the DID driver, so skip this step.
Use the -l flag to scdidadm(1M) to identify the DID name for the lower level device name of the drive to be replaced. Then update the DID drive database using the -R flag to scdidadm(1M). Refer to the Sun Cluster 2.2 Software Installation Guide for details on the DID pseudo driver.
phys-hahost1# scdidadm -o name -l /dev/rdsk/c3t3d4
6       phys-hahost1:/dev/rdsk/c3t3d4       /dev/did/rdsk/d6
phys-hahost1# scdidadm -R d6
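The DID instance name needed for `scdidadm -R` is the last path component of the `/dev/did/rdsk` entry in the mapping line. A small helper can pull it out (the sample line mirrors the example above; this is a parsing sketch, not an scdidadm feature):

```shell
# Sketch: extract the DID instance name (dN) from one line of
# `scdidadm -l` output, as shown in the example above.
did_name() {
  # $1 is one mapping line; the DID path is the last field.
  echo "$1" | awk '{ n = split($NF, parts, "/"); print parts[n] }'
}

line='6 phys-hahost1:/dev/rdsk/c3t3d4 /dev/did/rdsk/d6'
did_name "$line"
```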
Add the new disk back into the diskset using the metaset(1M) command.
This step automatically adds back the same number of replicas that were deleted from the failed disk. The syntax of the command is shown below. In this example, diskset is the name of the diskset containing the failed disk and drive is the DID name of the disk in the form dN (for new installations of Sun Cluster), or cNtYdZ (for installations that upgraded from HA 1.3).
phys-hahost1# metaset -s diskset -a drive
This operation can take fifteen minutes or more, depending on the size of your configuration and the number of disks.
Use the scadmin(1M) command to reserve and enable failfast on the specified disk that has just been added back to the diskset.
phys-hahost1# scadmin reserve c3t3d4
Use the format(1M) or fmthard(1M) command to repartition the new disk.
Make sure that you partition the new disk exactly as the disk that was replaced. (Saving the disk format information was recommended in Chapter 1, Preparing for Sun Cluster Administration.)
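If you saved the old disk's VTOC as recommended, the usual Solaris idiom is to feed that saved table to fmthard(1M). The sketch below only builds and prints the command (the saved-VTOC path is a hypothetical example); uncomment the eval on a live system after checking the command line:

```shell
# Sketch: replay a saved partition table onto the replacement disk
# with fmthard. Shown as a dry run; the saved-VTOC path below is an
# assumed example location, not a Sun Cluster convention.
old_vtoc=/etc/vtoc/c3t3d4          # saved earlier with prtvtoc (assumed path)
new_disk=/dev/rdsk/c3t3d4s2

cmd="fmthard -s $old_vtoc $new_disk"
echo "$cmd"
# eval "$cmd"                      # run only after reviewing the command
```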
Use the metainit(1M) command to reinitialize disks that were cleared in Step 6.
phys-hahost1# metainit -s hahost1 d50
Attach submirrors that were detached in Step 5.
Use the metattach(1M) command to perform this step. See the metattach(1M) man page for details.
phys-hahost1# metattach -s hahost1 d40 d50
Restore all hot spares that were deleted in Step 7.
Use metahs(1M) to add back the hot spares. See the metahs(1M) man page for details.
phys-hahost1# metahs -s hahost1 -a hsp000 c3t2d5s0
Verify that the replacement corrected the problem.
phys-hahost1# metastat -s hahost1
These are the high-level steps to replace a Sun StorEdge MultiPack or Sun StorEdge D1000 disk in an SSVM or CVM configuration:
Removing the failed disk in the disk enclosure by using the vxdiskadm command
Replacing the failed disk
Replacing the disk removed earlier by using the vxdiskadm command
For systems not running shared disk groups, master node refers to the node that has imported the disk group.
If you are running shared disk groups, determine the master and slave node by entering the following command on all nodes in the cluster:
# vxdctl -c mode
Complete the following steps from the master node.
Determine if the disk in question had failures and is in the NODEVICE state.
If this is not the case, skip to Step 8.
Run the vxdiskadm utility and enter 4 (Remove a disk for replacement).
This option removes a physical disk while retaining the disk name. The utility then queries you for the particular device that you want to replace.
Enter the disk name or list.
The following example illustrates the removal of disk c2t8d0.
Enter disk name [<disk>,list,q,?] list

Disk group: rootdg

DM NAME       DEVICE     TYPE    PRIVLEN  PUBLEN   STATE
dm c0t0d0s7   c0t0d0s7   simple  1024     20255    -

Disk group: demo

DM NAME       DEVICE     TYPE    PRIVLEN  PUBLEN   STATE
dm c1t2d0     c2t2d0s2   sliced  1519     4152640  -
dm c1t3d0     c2t3d0s2   sliced  1519     4152640  -
dm c1t4d0     c2t4d0s2   sliced  1519     4152640  -
dm c1t5d0     c2t5d0s2   sliced  1519     4152640  -
dm c1t8d0     c2t8d0s2   sliced  1519     4152640  -
dm c1t9d0     c2t9d0s2   sliced  1519     4152640  -
dm c2t2d0     c1t2d0s2   sliced  1519     4152640  -
dm c2t3d0     c1t3d0s2   sliced  1519     4152640  -
dm c2t4d0     c1t4d0s2   sliced  1519     4152640  -
dm c2t5d0     c1t5d0s2   sliced  1519     4152640  -
dm c2t8d0     c1t8d0s2   sliced  1519     4152640  -
dm c2t9d0     c1t9d0s2   sliced  1519     4152640  -

Enter disk name [<disk>,list,q,?] c2t8d0

The requested operation is to remove disk c2t8d0 from disk group demo.
The disk name will be kept, along with any volumes using the disk,
allowing replacement of the disk.

Select "Replace a failed or removed disk" from the main menu when you
wish to replace the disk.
Enter y or press Return to continue.
Continue with operation? [y,n,q,?] (default: y) y

Removal of disk c2t8d0 completed successfully.
Enter q to quit the utility.
Remove another disk? [y,n,q,?] (default: n) q
Enter vxdisk list and vxprint to view the changes.
The example disk c2t8d0 is removed.
# vxdisk list
.
c2t3d0s2   sliced  c1t3d0  demo  online shared
c2t4d0s2   sliced  c1t4d0  demo  online shared
c2t5d0s2   sliced  c1t5d0  demo  online shared
c2t8d0s2   sliced  c1t8d0  demo  online shared
c2t9d0s2   sliced  c1t9d0  demo  online shared
-          -       c2t8d0  demo  removed

# vxprint
.
dm c2t3d0     c1t3d0s2   -         4152640  -  -        -  -
dm c2t4d0     c1t4d0s2   -         4152640  -  -        -  -
dm c2t5d0     c1t5d0s2   -         4152640  -  -        -  -
dm c2t8d0     -          -         -        -  REMOVED  -  -
dm c2t9d0     c1t9d0s2   -         4152640  -  -        -  -

pl demo05-02  -          DISABLED  51200    -  REMOVED  -  -
sd c2t8d0-1   demo05-02  DISABLED  51200    0  REMOVED  -  -
.
.
.
Replace the physical drive without powering off any component.
For further information, refer to the documentation accompanying the disk enclosure unit.
As you replace the drive, you might see messages on the system console similar to those in the following example. Do not be alarmed; these messages might not indicate a problem. Proceed with the replacement as described in the next steps.
Nov 3 17:44:00 updb10a unix: WARNING: /sbus@1f,0/SUNW,fas@0,8800000/sd@2,0 (sd17):
Nov 3 17:44:00 updb10a unix: SCSI transport failed: reason 'incomplete': retrying command
Nov 3 17:44:03 updb10a unix: WARNING: /sbus@1f,0/SUNW,fas@0,8800000/sd@2,0 (sd17):
Nov 3 17:44:03 updb10a unix: disk not responding to selection
Run the vxdiskadm utility and enter 5 (Replace a failed or removed disk).
Enter the disk name.
You can enter list to see a list of disks in the REMOVED state.
The disk may appear in the NODEVICE state if it had failures.
Select a removed or failed disk [<disk>,list,q,?] list

Disk group: rootdg

DM NAME       DEVICE  TYPE  PRIVLEN  PUBLEN  STATE

Disk group: demo

DM NAME       DEVICE  TYPE  PRIVLEN  PUBLEN  STATE
dm c2t8d0     -       -     -        -       REMOVED

Select a removed or failed disk [<disk>,list,q,?] c2t8d0
The vxdiskadm utility detects the new device and asks you whether the new device should replace the removed device.
If there are other unused disks attached to the system, vxdiskadm also presents these disks as viable choices.
Enter the device name, or if the utility lists the device as the default, press Return.
The following devices are available as replacements:
        c1t8d0s2

You can choose one of these disks to replace c2t8d0.
Choose "none" to initialize another disk to replace c2t8d0.

Choose a device, or select "none" [<device>,none,q,?] (default: c1t8d0s2) <Return>

The requested operation is to use the initialized device c1t8d0s2 to
replace the removed or failed disk c2t8d0 in disk group demo.
Enter y or press Return to verify that you want this device (in the example, c1t8d0s2) to be the replacement disk.
Continue with operation? [y,n,q,?] (default: y) <Return>

Replacement of disk c2t8d0 in group demo with disk device c1t8d0s2
completed successfully.
Enter n or press Return to quit this utility.
Replace another disk? [y,n,q,?] (default: n) <Return>
Enter vxdisk list and vxprint to see the changes.
The example disk, c2t8d0, is no longer in the REMOVED state.
# vxdisk list
.
c2t2d0s2   sliced  c1t2d0  demo  online shared
c2t3d0s2   sliced  c1t3d0  demo  online shared
c2t4d0s2   sliced  c1t4d0  demo  online shared
c2t5d0s2   sliced  c1t5d0  demo  online shared
c2t8d0s2   sliced  c1t8d0  demo  online shared
c2t9d0s2   sliced  c1t9d0  demo  online shared

# vxprint
.
dm c2t4d0  c1t4d0s2  -  4152640  -  -  -  -
dm c2t5d0  c1t5d0s2  -  4152640  -  -  -  -
dm c2t8d0  c1t8d0s2  -  4152640  -  -  -  -
dm c2t9d0  c1t9d0s2  -  4152640  -  -  -  -
.
This section describes how to replace an entire Sun StorEdge MultiPack or Sun StorEdge D1000 enclosure running SSVM or CVM.
These are the high-level steps for replacing an entire failed Sun StorEdge MultiPack or Sun StorEdge D1000 in an SSVM or CVM configuration:
Removing all the disks in the defective disk enclosure by using the vxdiskadm command
Replacing the failed disk enclosure
Replacing all the disks removed earlier into the new disk enclosure by using the vxdiskadm command
For systems not running shared disk groups, master node refers to the node that has imported the disk group.
If you are running shared disk groups, determine the master and slave node by entering the following command on all nodes in the cluster:
# vxdctl -c mode
Complete the following steps from the master node.
Remove all the disks on the failed disk enclosure by running the vxdiskadm utility and entering 4 (Remove a disk for replacement).
This option enables you to remove only one disk at a time. Repeat this procedure for each disk.
Enter the list command.
In the following example, assume that the disk enclosure on controller c2 needs replacement. Based on the list output, the SSVM or CVM names for these disks are c2t2d0, c2t3d0, c2t4d0, c2t5d0, c2t8d0, and c2t9d0.
Remove a disk for replacement
Menu: VolumeManager/Disk/RemoveForReplace

Use this menu operation to remove a physical disk from a disk
group, while retaining the disk name. This changes the state
for the disk name to a "removed" disk. If there are any
initialized disks that are not part of a disk group, you will be
given the option of using one of these disks as a replacement.

Enter disk name [<disk>,list,q,?] list

Disk group: rootdg

DM NAME       DEVICE     TYPE    PRIVLEN  PUBLEN   STATE
dm c0t0d0s7   c0t0d0s7   simple  1024     20255    -

Disk group: demo

DM NAME       DEVICE     TYPE    PRIVLEN  PUBLEN   STATE
dm c1t2d0     c2t2d0s2   sliced  1519     4152640  -
dm c1t3d0     c2t3d0s2   sliced  1519     4152640  -
dm c1t4d0     c2t4d0s2   sliced  1519     4152640  -
dm c1t5d0     c2t5d0s2   sliced  1519     4152640  -
dm c1t8d0     c2t8d0s2   sliced  1519     4152640  -
dm c1t9d0     c2t9d0s2   sliced  1519     4152640  -
dm c2t2d0     c1t2d0s2   sliced  1519     4152640  -
dm c2t3d0     c1t3d0s2   sliced  1519     4152640  -
dm c2t4d0     c1t4d0s2   sliced  1519     4152640  -
dm c2t5d0     c1t5d0s2   sliced  1519     4152640  -
dm c2t8d0     c1t8d0s2   sliced  1519     4152640  -
dm c2t9d0     c1t9d0s2   sliced  1519     4152640  -
Enter the disk name (in this example, c2t2d0).
Enter disk name [<disk>,list,q,?] c2t2d0

The following volumes will lose mirrors as a result of this operation:
        demo-1
No data on these volumes will be lost.

The requested operation is to remove disk c2t2d0 from disk group demo.
The disk name will be kept, along with any volumes using the disk,
allowing replacement of the disk.

Select "Replace a failed or removed disk" from the main menu when you
wish to replace the disk.
Enter y or press Return to verify that you want to replace the disk.
Continue with operation? [y,n,q,?] (default: y) <Return>

Removal of disk c2t2d0 completed successfully.
Enter y to continue.
Remove another disk? [y,n,q,?] (default: n) y

Remove a disk for replacement
Menu: VolumeManager/Disk/RemoveForReplace

Use this menu operation to remove a physical disk from a disk
group, while retaining the disk name. This changes the state
for the disk name to a "removed" disk. If there are any
initialized disks that are not part of a disk group, you will be
given the option of using one of these disks as a replacement.
Enter the next example disk name, c2t3d0.
Enter disk name [<disk>,list,q,?] c2t3d0

The following volumes will lose mirrors as a result of this operation:
        demo-2
No data on these volumes will be lost.

The following devices are available as replacements:
        c1t2d0

You can choose one of these disks now, to replace c2t3d0.
Select "none" if you do not wish to select a replacement disk.
Enter none, if necessary.
This query arises whenever the utility recognizes a good disk in the system. If there are no good disks, you will not see this query.
Choose a device, or select "none" [<device>,none,q,?] (default: c1t2d0) none
Enter y or press Return to verify that you want to remove the disk.
The requested operation is to remove disk c2t3d0 from disk group demo.
The disk name will be kept, along with any volumes using the disk,
allowing replacement of the disk.

Select "Replace a failed or removed disk" from the main menu when you
wish to replace the disk.

Continue with operation? [y,n,q,?] (default: y) <Return>

Removal of disk c2t3d0 completed successfully.
Repeat Step 6 through Step 9 for each disk you identified in Step 3.
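Picking out every disk-media name on the failing controller from the vxdiskadm listing can be scripted, which helps avoid missing a disk when the enclosure holds many. This sketch filters a shortened sample of the listing format shown earlier (selecting by the media-name column, as the example text does); verify the columns against your own listing:

```shell
# Sketch: list disk-media names on a given controller from vxdiskadm
# "dm" listing lines. A shortened sample of the earlier listing is
# embedded for illustration.
sample_listing() {
  cat <<'EOF'
dm c1t2d0 c2t2d0s2 sliced 1519 4152640 -
dm c2t2d0 c1t2d0s2 sliced 1519 4152640 -
dm c2t3d0 c1t3d0s2 sliced 1519 4152640 -
EOF
}

disks_on_controller() {   # $1 = controller prefix, for example c2
  sample_listing | awk -v c="$1" '$1 == "dm" && $2 ~ ("^" c) { print $2 }'
}
disks_on_controller c2
```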
Power off and replace the disk enclosure.
For more information, refer to the disk enclosure documentation.
As you replace the disk enclosure, you may see messages on the system console similar to those in the following example. Do not become alarmed, as these messages may not indicate a problem. Instead, proceed with the replacement as described in the next section.
Nov 3 17:44:00 updb10a unix: WARNING: /sbus@1f,0/SUNW,fas@0,8800000/sd@2,0 (sd17):
Nov 3 17:44:00 updb10a unix: SCSI transport failed: reason 'incomplete': retrying command
Nov 3 17:44:03 updb10a unix: WARNING: /sbus@1f,0/SUNW,fas@0,8800000/sd@2,0 (sd17):
Nov 3 17:44:03 updb10a unix: disk not responding to selection
Power on the disk enclosure.
For more information, refer to your disk enclosure service manual.
Attach all the disks removed earlier by running the vxdiskadm utility and entering 5 (Replace a failed or removed disk).
This option enables you to replace only one disk at a time. Repeat this procedure for each disk.
Enter the list command to see a list of disk names now in the REMOVED state.
Replace a failed or removed disk
Menu: VolumeManager/Disk/ReplaceDisk

Use this menu operation to specify a replacement disk for a disk
that you removed with the "Remove a disk for replacement" menu
operation, or that failed during use. You will be prompted for
a disk name to replace and a disk device to use as a replacement.
You can choose an uninitialized disk, in which case the disk will
be initialized, or you can choose a disk that you have already
initialized using the Add or initialize a disk menu operation.

Select a removed or failed disk [<disk>,list,q,?] list

Disk group: rootdg

DM NAME       DEVICE  TYPE  PRIVLEN  PUBLEN  STATE

Disk group: demo

DM NAME       DEVICE  TYPE  PRIVLEN  PUBLEN  STATE
dm c2t2d0     -       -     -        -       REMOVED
dm c2t3d0     -       -     -        -       REMOVED
dm c2t4d0     -       -     -        -       REMOVED
dm c2t5d0     -       -     -        -       REMOVED
dm c2t8d0     -       -     -        -       REMOVED
dm c2t9d0     -       -     -        -       REMOVED
Enter the disk name (in this example, c2t2d0).
Select a removed or failed disk [<disk>,list,q,?] c2t2d0

The following devices are available as replacements:
        c1t2d0s2 c1t3d0s2 c1t4d0s2 c1t5d0s2 c1t8d0s2 c1t9d0s2
The vxdiskadm utility detects the new devices and asks you whether the new devices should replace the removed devices.
Enter the "replacement" or "new" device name, or if the utility lists the device as the default, press Return.
You can choose one of these disks to replace c2t2d0.
Choose "none" to initialize another disk to replace c2t2d0.

Choose a device, or select "none" [<device>,none,q,?] (default: c1t2d0s2) <Return>
Enter y or press Return to verify that you want this device (in the example, c1t2d0s2) to be the replacement disk.
The requested operation is to use the initialized device c1t2d0s2 to
replace the removed or failed disk c2t2d0 in disk group demo.

Continue with operation? [y,n,q,?] (default: y) <Return>

Replacement of disk c2t2d0 in group demo with disk device c1t2d0s2
completed successfully.
Enter y to continue.
Replace another disk? [y,n,q,?] (default: n) y
Repeat Step 15 through Step 18 for each of the REMOVED/NODEVICE disk names.