This chapter provides instructions for administering Sun StorEdge MultiPack and Sun StorEdge D1000 disks. Some of the procedures documented in this chapter are dependent on your volume management software (Solstice DiskSuite, SSVM, or CVM). These procedures include the volume manager name in their titles.
"12.2 Administering Sun StorEdge MultiPacks and Sun StorEdge D1000s"
"12.3 Administering Sun StorEdge MultiPack and Sun StorEdge D1000 Disks"
This chapter includes the following procedures:
"12.1.1 How to Recover From Power Loss (Solstice DiskSuite)"
"12.2.2 How to Repair a Lost Sun StorEdge MultiPack or Sun StorEdge D1000 Connection"
"12.2.4 How to Add a Sun StorEdge MultiPack or Sun StorEdge D1000"
"12.3.2 How to Add a Sun StorEdge MultiPack or a Sun StorEdge D1000 Disk"
"12.3.4 How to Replace a Sun StorEdge MultiPack or Sun StorEdge D1000 Disk (Solstice DiskSuite)"
"12.3.4 How to Replace a Sun StorEdge MultiPack or Sun StorEdge D1000 Disk (Solstice DiskSuite)"
"12.3.5 How to Replace a Sun StorEdge MultiPack or Sun StorEdge D1000 Disk (SSVM or CVM)"
"12.3.7 How to Replace a Sun StorEdge MultiPack or Sun StorEdge D1000 Enclosure (SSVM or CVM)"
Use the service manual for your Sun StorEdge MultiPack and Sun StorEdge D1000 disks, and the volume management software documentation, when you are replacing or repairing disk hardware in the Sun Cluster configuration.
When power is lost to one Sun StorEdge MultiPack or Sun StorEdge D1000, I/O operations generate errors that are detected by your volume management software. Errors are not reported until I/O transactions are made to the disk.
You should monitor the configuration for these events using the commands described in Chapter 2, Sun Cluster Administration Tools.
These are the high-level steps to recover from power loss to a disk enclosure in a Solstice DiskSuite environment:
Identifying the errored replicas
Returning the errored replicas to service
Identifying the errored devices
Returning the errored devices to service
Resyncing the disks
These are the detailed steps to recover from power loss to a disk enclosure in a Solstice DiskSuite environment.
When power is restored, use the metadb(1M) command to identify the errored replicas:
# metadb -s diskset
Return replicas to service.
After the loss of power, all metadevice state database replicas on the affected disk enclosure chassis enter an errored state. Because metadevice state database replica recovery is not automatic, it is safest to perform the recovery immediately after the disk enclosure returns to service. Otherwise, a new failure can cause a majority of replicas to be out of service and cause a kernel panic. This is the expected behavior of Solstice DiskSuite when too few replicas are available.
While these errored replicas will be reclaimed at the next takeover (haswitch(1M) or reboot(1M)), you might want to return them to service manually by first deleting and then adding them back.
Make sure that you add back the same number of replicas that were deleted on each slice. You can delete multiple replicas with a single metadb(1M) command. If you need multiple copies of replicas on one slice, you must add them in one invocation of the metadb(1M) command using the -c flag.
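Finding the errored replicas in metadb output can be scripted. The following sketch filters a sample of metadb-style output for replicas whose status flags contain an uppercase letter, which by Solstice DiskSuite convention indicates an error (the sample lines and slice names are illustrative, not from a real cluster):

```shell
# Sketch: pick out errored replicas from `metadb -s diskset` output.
# Uppercase status flags (for example W or M) mark errors; lowercase
# flags are normal. Sample output is embedded for illustration only.
sample_metadb_output() {
  cat <<'EOF'
        flags           first blk       block count
     a m  p  luo        16              1034            /dev/dsk/c3t3d4s7
      W   p  l          16              1034            /dev/dsk/c3t3d5s7
     a    p  luo        16              1034            /dev/dsk/c3t3d6s7
EOF
}

errored_replicas() {
  # Scan the flag fields (everything before the two numeric columns
  # and the device path) for an uppercase letter, then print the slice.
  sample_metadb_output | awk '
    /\/dev\// {
      for (i = 1; i < NF - 2; i++)
        if ($i ~ /[A-Z]/) { print $NF; break }
    }'
}
errored_replicas
```

On a live system you would pipe real `metadb -s diskset` output into the same awk filter instead of the embedded sample.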
Use the metastat(1M) command to identify the errored metadevices.
# metastat -s diskset
Return errored metadevices to service using the metareplace(1M) command, and resync the disks.
# metareplace -s diskset -e mirror component
The -e option transitions the component (slice) to the available state and performs a resync.
Components that have been replaced by a hot spare should be the last devices replaced using the metareplace(1M) command. If the hot spare is replaced first, it could replace another errored submirror as soon as it becomes available.
You can perform a resync on only one component of a submirror (metadevice) at a time. If all components of a submirror were affected by the power outage, each component must be replaced separately. It takes approximately 10 minutes to resync a 1.05GB disk.
If both disksets in a symmetric configuration were affected by the power outage, you can resync each diskset's affected submirrors concurrently. Log into each host separately to recover that host's diskset by running metareplace(1M) on each.
Depending on the number of submirrors and the number of components in those submirrors, the resync actions can require a considerable amount of time. A single submirror made up of 30 1.05GB drives might take about five hours to complete, whereas the same drives divided among five submirrors might take only about 50 minutes, because independent submirrors resync concurrently.
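The arithmetic behind those estimates can be sketched directly: components of one submirror resync serially, while independent submirrors resync concurrently. This uses the 10-minutes-per-1.05GB-drive figure quoted above; real times vary with hardware and load, and the guide's 50-minute figure for five submirrors is in the same ballpark as this rough calculation:

```shell
# Back-of-the-envelope resync estimates using the per-drive figure
# quoted in the text (10 minutes per 1.05GB drive; illustrative only).
minutes_per_drive=10
drives=30

# One submirror resyncs its components one at a time.
serial_minutes=$((drives * minutes_per_drive))          # 300 min = 5 hours

# Independent submirrors resync concurrently, so elapsed time is
# driven by the largest submirror.
submirrors=5
concurrent_minutes=$(((drives / submirrors) * minutes_per_drive))

echo "serial: ${serial_minutes} min, concurrent: ${concurrent_minutes} min"
```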
Power failures can detach disk drives and cause plexes to become detached, and thus, unavailable. The volume remains active, however, because the remaining plexes in a mirrored volume are still available. It is possible to reattach the disk drives and recover from this condition without halting nodes in the cluster.
These are the high-level steps to recover from power loss to a disk enclosure in an SSVM configuration:
Determining the errored plex(es) by using the vxprint and vxdisk commands
Fixing the problem that caused the power loss
Using the drvconfig and disks commands to create the /devices and /dev entries
Scanning the current disk configuration
Reattaching disks that had transient errors
Verifying there are no more errors
(Optional) For shared disk groups, running the vxdg command for each disk that was powered off
Starting volume recovery
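The command sequence above can be captured in one script. Because the real commands exist only on a Solaris node with SSVM installed, each is stubbed here as an echo so the ordering can be seen (and tested) as a dry run; on a live system you would remove the stubs and run the sequence on all nodes (master first for CVM):

```shell
# Dry-run sketch of the SSVM power-loss recovery sequence.
# Each command is stubbed as an echo for illustration; the real
# commands run only on a Solaris node.
for cmd in drvconfig disks vxdctl vxdisk vxreattach vxrecover; do
  eval "$cmd() { echo \"$cmd \$*\"; }"
done

recover_enclosure() {
  drvconfig               # rediscover the drives (run on all nodes)
  disks
  vxdctl enable           # rescan the current disk configuration
  vxdisk -a online
  vxreattach -r           # reattach disks that had transient failures
  vxrecover -bv           # start volume recovery (use -svc for shared groups)
}
recover_enclosure
```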
These are the detailed steps to recover from power loss to a disk enclosure in an SSVM configuration.
Use the vxprint command to view the errored plexes.
Optionally, specify a diskgroup with the -g diskgroup option.
Use the vxdisk command to identify the errored disks.
# vxdisk list
DEVICE       TYPE      DISK         GROUP        STATUS
...
-            -         c1t5d0       toi          failed was:c1t5d0s2
...
Fix the condition that resulted in the problem so that power is restored to all failed disks.
Be sure that the disks are spun up before proceeding.
Enter the following commands on all nodes in the cluster.
In some cases, the drive(s) must be rediscovered by the node(s).
# drvconfig
# disks
Enter the following commands on all nodes in the cluster.
The volume manager must scan the current disk configuration again.
# vxdctl enable
# vxdisk -a online
Enter the following command on all nodes in the cluster.
For CVM, enter the command on the master node first, then on the remaining nodes.
This will reattach and initiate recovery on disks that had transitory failure.
# vxreattach -r
Verify the output of the vxdisk command to see if there are any more errors.
# vxdisk list
If media was replaced, enter the following command from the master node for each disk that has been disconnected.
The physical disk and the volume manager access name for that disk must be reconnected.
# vxdg -g diskgroup -k adddisk medianame=accessname
The values for medianame and accessname appear at the end of the vxdisk list command output.
For example:
# vxdg -g toi -k adddisk c1t5d0=c1t5d0s2
# vxdg -g toi -k adddisk c1t5d1=c1t5d1s2
# vxdg -g toi -k adddisk c1t5d2=c1t5d2s2
# vxdg -g toi -k adddisk c1t5d3=c1t5d3s2
# vxdg -g toi -k adddisk c1t5d4=c1t5d4s2
You can also use the vxdiskadm command, or the graphical user interface, to reattach the disks.
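Generating the medianame=accessname pairs can be scripted from the vxdisk list output. This sketch assumes the failed-disk line format shown earlier in this section (dashes in the DEVICE and DISK columns, a "was:" suffix carrying the old access name); verify the field layout on your own system before relying on it:

```shell
# Sketch: turn `vxdisk list` failed-disk lines into the matching
# `vxdg -k adddisk` commands. Field positions follow the example
# output earlier in this section; sample data is embedded.
sample_vxdisk_list() {
  cat <<'EOF'
DEVICE       TYPE      DISK         GROUP        STATUS
-            -         c1t5d0       toi          failed was:c1t5d0s2
-            -         c1t5d1       toi          failed was:c1t5d1s2
EOF
}

adddisk_commands() {
  sample_vxdisk_list | awk '
    $5 == "failed" && $6 ~ /^was:/ {
      access = substr($6, 5)     # strip the "was:" prefix
      printf "vxdg -g %s -k adddisk %s=%s\n", $4, $3, access
    }'
}
adddisk_commands
```

The generated lines match the manual commands shown in the example above; review them before executing anything on the master node.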
From the node, start volume recovery.
# vxrecover -bv [-g diskgroup]
If you have shared disk groups, use the -svc options to the vxrecover command.
(Optional) Use the vxprint -g command to view the changes.
This section describes procedures for administering Sun StorEdge MultiPack and Sun StorEdge D1000 components. Use the procedures described in your server hardware manual to identify the failed component.
When a connection from a disk enclosure to one of the cluster nodes fails, the failure is probably due to a bad SCSI-2 cable or an SBus card.
In any event, the node on which the failure occurred will begin generating errors when the failure is discovered. Later accesses to the disk enclosure will generate additional errors. The node will exhibit the same behavior as though power had been lost to the disk enclosure.
I/O operations from the other nodes in the cluster are unaffected by this type of failure.
To diagnose the failure, use the procedures for testing the card module in the service manual for your Sun Cluster node to determine which component failed. You should free up one node and the disk enclosure that appears to be down, for hardware debugging.
Prepare the Sun Cluster system for component replacement.
Depending on the cause of the connection loss, prepare the Sun Cluster node with one of the following procedures.
If the failed component is an SBus card, see Chapter 7, Administering Server Components, to prepare the Sun Cluster node for power down.
If the problem is a bad SCSI-2 cable, the volume management software will have detected the problem and prepared the system for cable replacement.
Replace the failed component.
If the SCSI-2 cable or SBus card fails, refer to the service manual for your Sun Cluster node for detailed instructions on replacing them.
Recover from volume management software errors.
Use the procedures described in "12.1 Recovering From Power Loss".
You can add Sun StorEdge MultiPacks or Sun StorEdge D1000s to a Sun Cluster configuration at any time.
You must review the disk group configuration in your Sun Cluster configuration before adding a disk enclosure. The discussions in the chapter on planning the configuration in the Sun Cluster 2.2 Software Installation Guide, and in Appendix A, Administering Volume Managers, in this book, will help determine the impact of the disk enclosure on the configuration of disk groups.
Shut down one of the cluster nodes.
Use the procedure in "4.2 Stopping the Cluster and Cluster Nodes", to shut down the node.
Install an additional SBus card in the node, if necessary.
Use the instructions in the hardware service manual for your Sun Cluster node to install the SBus card.
Install the SBus card in the first available empty SBus slot, following all other cards in the node. This ensures that the controller numbering will be preserved if the Solaris operating environment is reinstalled. Refer to "1.4 Instance Names and Numbering", for more information.
Connect the SCSI-2 cables to the disk enclosure.
Use the instructions in the hardware service manual for your Sun Cluster node.
Set the SCSI initiator ID, as appropriate.
Use the instructions in the hardware service manual for your Sun Cluster node.
Perform a reconfiguration reboot of the node.
ok boot -r
Use the haswitch(1M) command to switch ownership of all logical hosts that can be mastered to the rebooted node.
phys-hahost1# haswitch phys-hahost2 hahost1 hahost2
Repeat "" through Step 5 on other nodes connected to this disk enclosure.
Switch ownership of the logical hosts back to the appropriate default master if necessary.
For example:
phys-hahost1# haswitch phys-hahost2 hahost2
Add the disks in the disk enclosures to the selected disk group.
Use the instructions in your volume manager documentation to add the disks to the selected disk group(s). Also, refer to appendixes in the Sun Cluster 2.2 Software Installation Guide for information on Solstice DiskSuite, SSVM, or CVM.
As part of standard Sun Cluster administration, you should monitor the status of the configuration. See Chapter 2, Sun Cluster Administration Tools, for information about monitoring methods. During the monitoring process you might discover problems with multihost disks. The following procedures describe how to correct these problems.
Sun Cluster supports different disk types. Refer to the hardware service manual for your multihost disk expansion unit for a description of your disk enclosure.
In a symmetric configuration, the disk enclosure might contain disks from multiple disk groups and will require that a single node own all of the affected disk groups.
These are the high-level steps to add a Sun StorEdge MultiPack or Sun StorEdge D1000 disk:
Identifying the controller for this new disk and locating an empty slot in the disk enclosure
Adding the new disk
Performing the administrative actions to prepare the disk for use by Sun Cluster
Creating the /devices special files and /dev/dsk and /dev/rdsk links
Adding the disk to the disk group
Formatting and partitioning the disk, if necessary
Performing the volume management-related administrative tasks
These are the detailed steps to add a new Sun StorEdge MultiPack or Sun StorEdge D1000 disk.
Determine the controller number of the disk enclosure to which the disk will be added.
Use the mount(1M) or format(1M) command to determine the controller number.
Locate an appropriate empty disk slot in the disk enclosure for the disk being added.
Identify the empty slots either by observing the disk drive LEDs on the front of the disk enclosure, or by removing the left side cover of the unit. The target address IDs corresponding to the slots appear on the middle partition of the drive bay.
In the following steps, Tray 2 is used as an example. The slot selected for the new disk is Tray 2 Slot 7. The new disk will be known as c2t3d1.
Add the new disk.
Use the instructions in your disk enclosure unit service manual to perform the hardware procedure of adding the disk.
Run the drvconfig(1M) and disks(1M) commands to create the new entries in /devices, /dev/dsk, and /dev/rdsk for all new disks.
phys-hahost1# drvconfig
phys-hahost1# disks
Switch ownership of the logical hosts to the other cluster node to which this disk is connected.
phys-hahost1# haswitch phys-hahost2 hahost1 hahost2
Run the drvconfig(1M) and disks(1M) commands on the node that now owns the disk group to which the disk will be added.
phys-hahost2# drvconfig
phys-hahost2# disks
Add the disk to a disk group using your volume management software.
For Solstice DiskSuite, the command syntax is as follows, where diskset is the name of the diskset containing the failed disk, and drive is the DID name of the disk in the form dN (for new installations of Sun Cluster), or cNtYdZ (for installations that upgraded from HA 1.3):
# metaset -s diskset -a drive
For SSVM or CVM, you can use the command line or graphical user interface to add the disk to the disk group.
If you are using Solstice DiskSuite, the metaset(1M) command might repartition this disk automatically. See the Solstice DiskSuite documentation for more information.
(Solstice DiskSuite configurations only) After adding the disks to the diskset by using the metaset(1M) command, use the scadmin(1M) command to reserve and enable failfast on the specified disks.
phys-hahost1# scadmin reserve drivename
Perform the usual administration actions on the new disk.
You can now perform the usual administration steps that are performed when a new drive is brought into service. See your volume management software documentation for more information on these tasks.
If necessary, switch logical hosts back to their default masters.
This section describes replacing a multihost disk without interrupting Sun Cluster services (online replacement) when the volume manager is reporting problems such as:
Components in the Needs Maintenance state
Hot spare replacement
Intermittent disk errors
Consult your volume management software documentation for offline replacement procedures.
Use the following procedure if you have determined that a disk has components in the Needs Maintenance state, a hot spare has replaced a component, or a disk is generating intermittent errors.
These are the high-level steps to replace a Sun StorEdge MultiPack or Sun StorEdge D1000 disk in a Solstice DiskSuite configuration:
Determining which disk needs replacement
Determining which disk expansion unit holds the disk to be replaced
Removing the bad disk from the diskset
Spinning down the disk and opening the disk enclosure
Replacing the disk drive
Running the scdidadm -R command
Adding the new disk to the diskset
Reserving and enabling failfast on the disk
Partitioning the new disk
Running the metastat(1M) command to verify the problem has been fixed
These are the detailed steps to replace a failed Sun StorEdge MultiPack or Sun StorEdge D1000 disk in a Solstice DiskSuite configuration.
Run the procedure on the host that masters the diskset in which the bad disk resides. This might require you to switch over the diskset using the haswitch(1M) command.
Identify the disk to be replaced.
Use the metastat(1M) command and /var/adm/messages output.
When metastat(1M) reports that a device is in maintenance state or some of the components have been replaced by hot spares, you must locate and replace the device. A sample metastat(1M) output follows. In this example, device c3t3d4s0 is in maintenance state:
phys-hahost1# metastat -s hahost1
 ...
 d50: Submirror of hahost1/d40
      State: Needs Maintenance
      Stripe 0:
          Device       Start Block  Dbase  State  Hot Spare
          c3t3d4s0     0            No     Okay   c3t5d4s0
 ...
Check /var/adm/messages to see what kind of problem has been detected.
...
Jun 1 16:15:26 host1 unix: WARNING: /io-unit@f,e1200000/sbi@0.0/SUNW,pln@a0000000,741022/ssd@3,4(ssd49):
Jun 1 16:15:26 host1 unix: Error for command `write(I))'
Jun 1 16:15:27 host1 unix: Error Level: Fatal
Jun 1 16:15:27 host1 unix: Requested Block 144004, Error Block: 715559
Jun 1 16:15:27 host1 unix: Sense Key: Media Error
Jun 1 16:15:27 host1 unix: Vendor `CONNER':
Jun 1 16:15:27 host1 unix: ASC=0x10(ID CRC or ECC error), ASCQ=0x0, FRU=0x15
...
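Scanning metastat output for submirrors that need maintenance can be automated with a short filter. The sketch below works against a sample in the format of the example above (two hypothetical submirrors, d50 and d51); on a live system you would feed it real `metastat -s diskset` output:

```shell
# Sketch: list submirrors that metastat reports as needing maintenance.
# The sample follows the output format shown above; adjust the patterns
# if your release formats metastat output differently.
sample_metastat() {
  cat <<'EOF'
d50: Submirror of hahost1/d40
    State: Needs Maintenance
d51: Submirror of hahost1/d41
    State: Okay
EOF
}

needs_maintenance() {
  sample_metastat | awk '
    /^d[0-9]+:/ { sub(":", "", $1); name = $1 }   # remember metadevice name
    /State: Needs Maintenance/ { print name }     # report it on a bad state
  '
}
needs_maintenance
```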
Determine the location of the problem disk.
Use the mount(1M) or format(1M) command to determine the controller number.
If the problem disk contains replicas, make a record of the slice and the number of replicas on it, then delete the replicas.
Use the metadb(1M) command to delete the replicas.
Detach all submirrors with components on the disk being replaced.
If you are detaching a submirror that has a failed component, you must force the detach using the metadetach -f option. The following example detaches submirror d50 from metamirror d40.
phys-hahost1# metadetach -s hahost1 -f d40 d50
Use the metaclear(1M) command to clear the submirrors detached in Step 5.
phys-hahost1# metaclear -s hahost1 -f d50
If the problem disk contains hot spares, make a record of the device names and the hot spare pools that contain them, then delete the hot spares.
Use the metahs(1M) command to delete hot spares.
You need to record the information before deleting the objects so that the actions can be reversed following the disk replacement.
Use the metaset(1M) command to remove the failed disk from the diskset.
The command syntax is as follows, where diskset is the name of the diskset containing the failed disk, and drive is the DID name of the disk in the form dN (for new installations of Sun Cluster), or cNtYdZ (for installations that upgraded from HA 1.3):
phys-hahost1# metaset -s diskset -d drive
This can take fifteen minutes or more, depending on the size of your configuration and the number of disks.
Replace the bad disk.
Refer to the hardware service manuals for your disk enclosure for details on this procedure.
Make sure the new disk spins up.
The disk should spin up automatically.
Update the DID driver's database with the new device ID.
If you upgraded from HA 1.3, your installation does not use the DID driver, so skip this step.
Use the -l flag to scdidadm(1M) to identify the DID name for the lower level device name of the drive to be replaced. Then update the DID drive database using the -R flag to scdidadm(1M). Refer to the Sun Cluster 2.2 Software Installation Guide for details on the DID pseudo driver.
phys-hahost1# scdidadm -o name -l /dev/rdsk/c3t3d4
6       phys-hahost1:/dev/rdsk/c3t3d4       /dev/did/rdsk/d6
phys-hahost1# scdidadm -R d6
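The DID instance name needed for `scdidadm -R` is the last path component of the `/dev/did/rdsk` entry in the mapping line. A small helper can pull it out (the sample line mirrors the example above; this is a parsing sketch, not an scdidadm feature):

```shell
# Sketch: extract the DID instance name (dN) from one line of
# `scdidadm -l` output, as shown in the example above.
did_name() {
  # $1 is one mapping line; the DID path is the last field.
  echo "$1" | awk '{ n = split($NF, parts, "/"); print parts[n] }'
}

line='6 phys-hahost1:/dev/rdsk/c3t3d4 /dev/did/rdsk/d6'
did_name "$line"
```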
Add the new disk back into the diskset using the metaset(1M) command.
This step automatically adds back the same number of replicas that were deleted from the failed disk. The syntax of the command is shown below. In this example, diskset is the name of the diskset containing the failed disk and drive is the DID name of the disk in the form dN (for new installations of Sun Cluster), or cNtYdZ (for installations that upgraded from HA 1.3).
phys-hahost1# metaset -s diskset -a drive
This operation can take fifteen minutes or more, depending on the size of your configuration and the number of disks.
Use the scadmin(1M) command to reserve and enable failfast on the specified disk that has just been added back to the diskset.
phys-hahost1# scadmin reserve c3t3d4
Use the format(1M) or fmthard(1M) command to repartition the new disk.
Make sure that you partition the new disk exactly as the disk that was replaced. (Saving the disk format information was recommended in Chapter 1, Preparing for Sun Cluster Administration.)
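If you saved the old disk's VTOC as recommended, the usual Solaris idiom is to feed that saved table to fmthard(1M). The sketch below only builds and prints the command (the saved-VTOC path is a hypothetical example); uncomment the eval on a live system after checking the command line:

```shell
# Sketch: replay a saved partition table onto the replacement disk
# with fmthard. Shown as a dry run; the saved-VTOC path below is an
# assumed example location, not a Sun Cluster convention.
old_vtoc=/etc/vtoc/c3t3d4          # saved earlier with prtvtoc (assumed path)
new_disk=/dev/rdsk/c3t3d4s2

cmd="fmthard -s $old_vtoc $new_disk"
echo "$cmd"
# eval "$cmd"                      # run only after reviewing the command
```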
Use the metainit(1M) command to reinitialize disks that were cleared in Step 6.
phys-hahost1# metainit -s hahost1 d50
Attach submirrors that were detached in Step 5.
Use the metattach(1M) command to perform this step. See the metattach(1M) man page for details.
phys-hahost1# metattach -s hahost1 d40 d50
Restore all hot spares that were deleted in Step 7.
Use metahs(1M) to add back the hot spares. See the metahs(1M) man page for details.
phys-hahost1# metahs -s hahost1 -a hsp000 c3t2d5s0
Verify that the replacement corrected the problem.
phys-hahost1# metastat -s hahost1
These are the high-level steps to replace a Sun StorEdge MultiPack or Sun StorEdge D1000 disk in an SSVM or CVM configuration:
Removing the failed disk in the disk enclosure by using the vxdiskadm command
Replacing the failed disk
Replacing the disk removed earlier by using the vxdiskadm command
For systems not running shared disk groups, master node refers to the node that has imported the disk group.
If you are running shared disk groups, determine the master and slave node by entering the following command on all nodes in the cluster:
# vxdctl -c mode
Complete the following steps from the master node.
Determine if the disk in question had failures and is in the NODEVICE state.
If this is not the case, skip to Step 8.
Run the vxdiskadm utility and enter 4 (Remove a disk for replacement).
This option removes a physical disk while retaining the disk name. The utility then queries you for the particular device that you want to replace.
Enter the disk name or list.
The following example illustrates the removal of disk c2t8d0.
Enter disk name [<disk>,list,q,?] list

Disk group: rootdg

DM NAME       DEVICE     TYPE    PRIVLEN  PUBLEN   STATE
dm c0t0d0s7   c0t0d0s7   simple  1024     20255    -

Disk group: demo

DM NAME       DEVICE     TYPE    PRIVLEN  PUBLEN   STATE
dm c1t2d0     c2t2d0s2   sliced  1519     4152640  -
dm c1t3d0     c2t3d0s2   sliced  1519     4152640  -
dm c1t4d0     c2t4d0s2   sliced  1519     4152640  -
dm c1t5d0     c2t5d0s2   sliced  1519     4152640  -
dm c1t8d0     c2t8d0s2   sliced  1519     4152640  -
dm c1t9d0     c2t9d0s2   sliced  1519     4152640  -
dm c2t2d0     c1t2d0s2   sliced  1519     4152640  -
dm c2t3d0     c1t3d0s2   sliced  1519     4152640  -
dm c2t4d0     c1t4d0s2   sliced  1519     4152640  -
dm c2t5d0     c1t5d0s2   sliced  1519     4152640  -
dm c2t8d0     c1t8d0s2   sliced  1519     4152640  -
dm c2t9d0     c1t9d0s2   sliced  1519     4152640  -

Enter disk name [<disk>,list,q,?] c2t8d0

The requested operation is to remove disk c2t8d0 from disk group demo.
The disk name will be kept, along with any volumes using the disk,
allowing replacement of the disk.

Select "Replace a failed or removed disk" from the main menu when you
wish to replace the disk.
Enter y or press Return to continue.
Continue with operation? [y,n,q,?] (default: y) y

Removal of disk c2t8d0 completed successfully.
Enter q to quit the utility.
Remove another disk? [y,n,q,?] (default: n) q
Enter vxdisk list and vxprint to view the changes.
The example disk c2t8d0 is removed.
# vxdisk list
.
c2t3d0s2   sliced  c1t3d0  demo  online shared
c2t4d0s2   sliced  c1t4d0  demo  online shared
c2t5d0s2   sliced  c1t5d0  demo  online shared
c2t8d0s2   sliced  c1t8d0  demo  online shared
c2t9d0s2   sliced  c1t9d0  demo  online shared
-          -       c2t8d0  demo  removed

# vxprint
.
dm c2t3d0     c1t3d0s2   -         4152640  -  -        -  -
dm c2t4d0     c1t4d0s2   -         4152640  -  -        -  -
dm c2t5d0     c1t5d0s2   -         4152640  -  -        -  -
dm c2t8d0     -          -         -        -  REMOVED  -  -
dm c2t9d0     c1t9d0s2   -         4152640  -  -        -  -

pl demo05-02  -          DISABLED  51200    -  REMOVED  -  -
sd c2t8d0-1   demo05-02  DISABLED  51200    0  REMOVED  -  -
.
.
.
Replace the physical drive without powering off any component.
For further information, refer to the documentation accompanying the disk enclosure unit.
As you replace the drive, you might see messages on the system console similar to those in the following example. Do not be alarmed; these messages might not indicate a problem. Proceed with the replacement as described in the next steps.
Nov 3 17:44:00 updb10a unix: WARNING: /sbus@1f,0/SUNW,fas@0,8800000/sd@2,0 (sd17):
Nov 3 17:44:00 updb10a unix: SCSI transport failed: reason 'incomplete': retrying command
Nov 3 17:44:03 updb10a unix: WARNING: /sbus@1f,0/SUNW,fas@0,8800000/sd@2,0 (sd17):
Nov 3 17:44:03 updb10a unix: disk not responding to selection
Run the vxdiskadm utility and enter 5 (Replace a failed or removed disk).
Enter the disk name.
You can enter list to see a list of disks in the REMOVED state.
The disk may appear in the NODEVICE state if it had failures.
Select a removed or failed disk [<disk>,list,q,?] list

Disk group: rootdg

DM NAME       DEVICE  TYPE  PRIVLEN  PUBLEN  STATE

Disk group: demo

DM NAME       DEVICE  TYPE  PRIVLEN  PUBLEN  STATE
dm c2t8d0     -       -     -        -       REMOVED

Select a removed or failed disk [<disk>,list,q,?] c2t8d0
The vxdiskadm utility detects the new device and asks you whether the new device should replace the removed device.
If there are other unused disks attached to the system, vxdiskadm also presents these disks as viable choices.
Enter the device name, or if the utility lists the device as the default, press Return.
The following devices are available as replacements:
        c1t8d0s2

You can choose one of these disks to replace c2t8d0.
Choose "none" to initialize another disk to replace c2t8d0.

Choose a device, or select "none" [<device>,none,q,?] (default: c1t8d0s2) <Return>

The requested operation is to use the initialized device c1t8d0s2 to
replace the removed or failed disk c2t8d0 in disk group demo.
Enter y or press Return to verify that you want this device (in the example, c1t8d0s2) to be the replacement disk.
Continue with operation? [y,n,q,?] (default: y) <Return>

Replacement of disk c2t8d0 in group demo with disk device c1t8d0s2
completed successfully.
Enter n or press Return to quit this utility.
Replace another disk? [y,n,q,?] (default: n) <Return>
Enter vxdisk list and vxprint to see the changes.
The example disk, c2t8d0, is no longer in the REMOVED state.
# vxdisk list
.
c2t2d0s2   sliced  c1t2d0  demo  online shared
c2t3d0s2   sliced  c1t3d0  demo  online shared
c2t4d0s2   sliced  c1t4d0  demo  online shared
c2t5d0s2   sliced  c1t5d0  demo  online shared
c2t8d0s2   sliced  c1t8d0  demo  online shared
c2t9d0s2   sliced  c1t9d0  demo  online shared

# vxprint
.
dm c2t4d0  c1t4d0s2  -  4152640  -  -  -  -
dm c2t5d0  c1t5d0s2  -  4152640  -  -  -  -
dm c2t8d0  c1t8d0s2  -  4152640  -  -  -  -
dm c2t9d0  c1t9d0s2  -  4152640  -  -  -  -
.
This section describes how to replace an entire Sun StorEdge MultiPack or Sun StorEdge D1000 enclosure running SSVM or CVM.
These are the high-level steps for replacing an entire failed Sun StorEdge MultiPack or Sun StorEdge D1000 in an SSVM or CVM configuration:
Removing all the disks in the defective disk enclosure by using the vxdiskadm command
Replacing the failed disk enclosure
Replacing all the disks removed earlier into the new disk enclosure by using the vxdiskadm command
For systems not running shared disk groups, master node refers to the node that has imported the disk group.
If you are running shared disk groups, determine the master and slave node by entering the following command on all nodes in the cluster:
# vxdctl -c mode
Complete the following steps from the master node.
Remove all the disks on the failed disk enclosure by running the vxdiskadm utility and entering 4 (Remove a disk for replacement).
This option enables you to remove only one disk at a time. Repeat this procedure for each disk.
Enter the list command.
In the following example, assume that the disk enclosure on controller c2 needs replacement. Based on the list output, the SSVM or CVM names for these disks are c2t2d0, c2t3d0, c2t4d0, c2t5d0, c2t8d0, and c2t9d0.
Remove a disk for replacement
Menu: VolumeManager/Disk/RemoveForReplace

Use this menu operation to remove a physical disk from a disk
group, while retaining the disk name. This changes the state
for the disk name to a "removed" disk. If there are any
initialized disks that are not part of a disk group, you will be
given the option of using one of these disks as a replacement.

Enter disk name [<disk>,list,q,?] list

Disk group: rootdg

DM NAME       DEVICE     TYPE    PRIVLEN  PUBLEN   STATE
dm c0t0d0s7   c0t0d0s7   simple  1024     20255    -

Disk group: demo

DM NAME       DEVICE     TYPE    PRIVLEN  PUBLEN   STATE
dm c1t2d0     c2t2d0s2   sliced  1519     4152640  -
dm c1t3d0     c2t3d0s2   sliced  1519     4152640  -
dm c1t4d0     c2t4d0s2   sliced  1519     4152640  -
dm c1t5d0     c2t5d0s2   sliced  1519     4152640  -
dm c1t8d0     c2t8d0s2   sliced  1519     4152640  -
dm c1t9d0     c2t9d0s2   sliced  1519     4152640  -
dm c2t2d0     c1t2d0s2   sliced  1519     4152640  -
dm c2t3d0     c1t3d0s2   sliced  1519     4152640  -
dm c2t4d0     c1t4d0s2   sliced  1519     4152640  -
dm c2t5d0     c1t5d0s2   sliced  1519     4152640  -
dm c2t8d0     c1t8d0s2   sliced  1519     4152640  -
dm c2t9d0     c1t9d0s2   sliced  1519     4152640  -
Enter the disk name (in this example, c2t2d0).
Enter disk name [<disk>,list,q,?] c2t2d0

The following volumes will lose mirrors as a result of this operation:
        demo-1
No data on these volumes will be lost.

The requested operation is to remove disk c2t2d0 from disk group demo.
The disk name will be kept, along with any volumes using the disk,
allowing replacement of the disk.

Select "Replace a failed or removed disk" from the main menu when you
wish to replace the disk.
Enter y or press Return to verify that you want to replace the disk.
Continue with operation? [y,n,q,?] (default: y) <Return>

Removal of disk c2t2d0 completed successfully.
Enter y to continue.
Remove another disk? [y,n,q,?] (default: n) y

Remove a disk for replacement
Menu: VolumeManager/Disk/RemoveForReplace

Use this menu operation to remove a physical disk from a disk
group, while retaining the disk name. This changes the state
for the disk name to a "removed" disk. If there are any
initialized disks that are not part of a disk group, you will be
given the option of using one of these disks as a replacement.
Enter the next example disk name, c2t3d0.
Enter disk name [<disk>,list,q,?] c2t3d0

The following volumes will lose mirrors as a result of this operation:
        demo-2
No data on these volumes will be lost.

The following devices are available as replacements:
        c1t2d0

You can choose one of these disks now, to replace c2t3d0.
Select "none" if you do not wish to select a replacement disk.
Enter none, if necessary.
This query arises whenever the utility recognizes a good disk in the system. If there are no good disks, you will not see this query.
Choose a device, or select "none" [<device>,none,q,?] (default: c1t2d0) none
Enter y or press Return to verify that you want to remove the disk.
The requested operation is to remove disk c2t3d0 from disk group demo.
The disk name will be kept, along with any volumes using the disk,
allowing replacement of the disk.

Select "Replace a failed or removed disk" from the main menu when you
wish to replace the disk.

Continue with operation? [y,n,q,?] (default: y) <Return>

Removal of disk c2t3d0 completed successfully.
Repeat Step 6 through Step 9 for each disk you identified in Step 3.
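Picking out every disk-media name on the failing controller from the vxdiskadm listing can be scripted, which helps avoid missing a disk when the enclosure holds many. This sketch filters a shortened sample of the listing format shown earlier (selecting by the media-name column, as the example text does); verify the columns against your own listing:

```shell
# Sketch: list disk-media names on a given controller from vxdiskadm
# "dm" listing lines. A shortened sample of the earlier listing is
# embedded for illustration.
sample_listing() {
  cat <<'EOF'
dm c1t2d0 c2t2d0s2 sliced 1519 4152640 -
dm c2t2d0 c1t2d0s2 sliced 1519 4152640 -
dm c2t3d0 c1t3d0s2 sliced 1519 4152640 -
EOF
}

disks_on_controller() {   # $1 = controller prefix, for example c2
  sample_listing | awk -v c="$1" '$1 == "dm" && $2 ~ ("^" c) { print $2 }'
}
disks_on_controller c2
```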
Power off and replace the disk enclosure.
For more information, refer to the disk enclosure documentation.
As you replace the disk enclosure, you may see messages on the system console similar to those in the following example. Do not become alarmed, as these messages may not indicate a problem. Instead, proceed with the replacement as described in the next section.
Nov 3 17:44:00 updb10a unix: WARNING: /sbus@1f,0/SUNW,fas@0,8800000/sd@2,0 (sd17):
Nov 3 17:44:00 updb10a unix: SCSI transport failed: reason 'incomplete': retrying command
Nov 3 17:44:03 updb10a unix: WARNING: /sbus@1f,0/SUNW,fas@0,8800000/sd@2,0 (sd17):
Nov 3 17:44:03 updb10a unix: disk not responding to selection
Power on the disk enclosure.
For more information, refer to your disk enclosure service manual.
Attach all the disks removed earlier by running the vxdiskadm utility and entering 5 (Replace a failed or removed disk).
This option enables you to replace only one disk at a time. Repeat this procedure for each disk.
Enter the list command to see a list of disk names now in the REMOVED state.
Replace a failed or removed disk
Menu: VolumeManager/Disk/ReplaceDisk

Use this menu operation to specify a replacement disk for a disk
that you removed with the "Remove a disk for replacement" menu
operation, or that failed during use. You will be prompted for
a disk name to replace and a disk device to use as a replacement.
You can choose an uninitialized disk, in which case the disk will
be initialized, or you can choose a disk that you have already
initialized using the Add or initialize a disk menu operation.

Select a removed or failed disk [<disk>,list,q,?] list

Disk group: rootdg

DM NAME       DEVICE  TYPE  PRIVLEN  PUBLEN  STATE

Disk group: demo

DM NAME       DEVICE  TYPE  PRIVLEN  PUBLEN  STATE
dm c2t2d0     -       -     -        -       REMOVED
dm c2t3d0     -       -     -        -       REMOVED
dm c2t4d0     -       -     -        -       REMOVED
dm c2t5d0     -       -     -        -       REMOVED
dm c2t8d0     -       -     -        -       REMOVED
dm c2t9d0     -       -     -        -       REMOVED
Enter the disk name (in this example, c2t2d0).
Select a removed or failed disk [<disk>,list,q,?] c2t2d0

The following devices are available as replacements:
        c1t2d0s2 c1t3d0s2 c1t4d0s2 c1t5d0s2 c1t8d0s2 c1t9d0s2
The vxdiskadm utility detects the new devices and asks you whether the new devices should replace the removed devices.
Enter the "replacement" or "new" device name, or if the utility lists the device as the default, press Return.
You can choose one of these disks to replace c2t2d0.
Choose "none" to initialize another disk to replace c2t2d0.

Choose a device, or select "none" [<device>,none,q,?] (default: c1t2d0s2) <Return>
Enter y or press Return to verify that you want this device (in the example, c1t2d0s2) to be the replacement disk.
The requested operation is to use the initialized device c1t2d0s2 to
replace the removed or failed disk c2t2d0 in disk group demo.

Continue with operation? [y,n,q,?] (default: y) <Return>

Replacement of disk c2t2d0 in group demo with disk device c1t2d0s2
completed successfully.
Enter y to continue.
Replace another disk? [y,n,q,?] (default: n) y
Repeat Step 15 through Step 18 for each of the REMOVED/NODEVICE disk names.