Sun Cluster 3.0 5/02 Supplement

Maintaining a StorEdge/Netra st A1000 Array

This section contains the procedures for maintaining a StorEdge/Netra st A1000 array in a Sun Cluster environment. Some maintenance tasks listed in Table D-2 are performed in the same way as in a non-cluster environment; for those tasks, this section references the appropriate procedures rather than repeating them. Table D-2 lists the procedures for maintaining the StorEdge/Netra st A1000 array.

Table D-2 Tasks: Maintaining a StorEdge/Netra st A1000 Array

A1000 array procedures:

  • Add an array to a running cluster. See "How to Add a Pair of StorEdge/Netra st A1000 Arrays to a Running Cluster".

  • Remove an array from a running cluster. See "How to Remove a StorEdge/Netra st A1000 Array From a Running Cluster".

  • Replace a failed array. To replace a failed array, remove the failed array and add a new array to the configuration. See "How to Remove a StorEdge/Netra st A1000 Array From a Running Cluster" and "How to Add a Pair of StorEdge/Netra st A1000 Arrays to a Running Cluster".

  • Add a disk drive to a running cluster. See "How to Add a Disk Drive in a Running Cluster".

  • Replace a disk drive in a running cluster. See "How to Replace a Failed Disk Drive in a Running Cluster".

  • Remove a disk drive from a running cluster. See "How to Remove a Disk Drive From a Running Cluster".

  • Upgrade array firmware and NVSRAM file. See "How to Upgrade Disk Drive Firmware in a Running Cluster".

  • Replace a failed controller or restore an offline controller. See "How to Replace a Failed Controller or Restore an Offline Controller".

  • Replace a power cord to an array. See the Sun StorEdge A1000 and D1000 Installation, Operations, and Service Manual or the Netra st A1000 and Netra st D1000 Installation and Maintenance Manual.

  • Replace an array cooling canister. Follow the same procedure that is used in a non-cluster environment. See the Sun StorEdge A1000 and D1000 Installation, Operations, and Service Manual or the Netra st A1000 and Netra st D1000 Installation and Maintenance Manual.

Cable procedures:

  • Replace a StorEdge A1000-to-host SCSI cable. Follow the same procedure that is used in a non-cluster environment. See the Sun StorEdge RAID Manager User's Guide and the Sun StorEdge RAID Manager 6.22.1 Release Notes.

Cabinet/power procedures:

  • Replace the battery unit. See the Sun StorEdge RAID Manager 6.22.1 Release Notes, the Sun StorEdge A1000 and D1000 Installation, Operations, and Service Manual, and the Netra st A1000 and Netra st D1000 Installation and Maintenance Manual.

  • Replace a power supply. See the Sun StorEdge A1000 and D1000 Installation, Operations, and Service Manual or the Netra st A1000 and Netra st D1000 Installation and Maintenance Manual.

Node/host adapter procedures:

  • Replace a host adapter in a node. See "How to Replace a Host Adapter in a Node (Connected to a StorEdge/Netra st A1000 array)".

How to Add a Pair of StorEdge/Netra st A1000 Arrays to a Running Cluster

Use this procedure to add a pair of StorEdge/Netra st A1000 arrays to a running cluster.

  1. Install the RAID Manager software on cluster nodes.

    For the procedure on installing RAID Manager software, see the Sun StorEdge RAID Manager Installation and Support Guide.


    Note -

    RAID Manager 6.22 or a compatible version is required for clustering with Sun Cluster 3.0.


  2. Install any StorEdge/Netra st A1000 array patches on cluster nodes.


    Note -

    For the most current list of software, firmware, and patches that are required for the StorEdge/Netra st A1000 array, refer to EarlyNotifier 20029, "A1000/A3x00/A1000FC Software/Firmware Configuration Matrix." This document is available online to Sun service providers and to customers with SunSolve service contracts at the SunSolve site: http://sunsolve.sun.com.
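
    If the EarlyNotifier document identifies patches that you need, apply them on each node with the standard Solaris patchadd(1M) utility. The following is a minimal sketch only; the patch ID is a placeholder, not an actual A1000 patch, and the patch README remains the authoritative installation procedure.

    # patchadd /var/spool/patch/123456-01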


  3. Set the Rdac parameters in the /etc/osa/rmparams file on both nodes.


    Rdac_RetryCount=1
    Rdac_NoAltOffline=TRUE
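
    As a quick check, you can confirm both entries on each node; this is a minimal sketch that assumes you edited /etc/osa/rmparams as shown above.

    # egrep "Rdac_RetryCount|Rdac_NoAltOffline" /etc/osa/rmparams
    Rdac_RetryCount=1
    Rdac_NoAltOffline=TRUE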
    

  4. Power on the StorEdge/Netra st A1000 array.

    To power on the StorEdge/Netra st A1000 array, push the power switch to the momentary on position (right side) and then release it.

  5. Shut down the first node.


    # scswitch -S -h nodename
    # shutdown -y -g0 -i0
    

  6. If you are installing new host adapters, power off the first node.

    For the full procedure on shutting down and powering off a node, see the Sun Cluster 3.0 12/01 System Administration Guide.

  7. Install the host adapters in the first node.

    For the procedure on installing host adapters, see the documentation that shipped with your host adapters and nodes.

  8. Cable the StorEdge/Netra st A1000 array to the first node.

    If you are adding a StorEdge/Netra st A1000 array, connect the differential SCSI cable between the node and the array. Verify that the entire SCSI bus length to each enclosure is less than 25 m. This measurement includes the cables to both nodes, as well as the bus length internal to each enclosure, node, and host adapter.

    Figure D-2 StorEdge/Netra st A1000 Array Cabling


  9. Did you power off the first node to install a host adapter?

    • If not, go to Step 10.

    • If you did power off the first node, power it and the StorEdge/Netra st A1000 array on, but do not allow the node to boot. If necessary, halt the node to continue with OpenBoot PROM (OBP) Monitor tasks.

  10. Find the paths to the SCSI host adapters.


    {0} ok show-disks
    ...b) /sbus@6,0/QLGC,isp@2,10000/sd...d) /sbus@2,0/QLGC,isp@2,10000/sd...

    Identify and record the two controllers that are to be connected to the disk arrays, and record these paths. Use this information to change the SCSI addresses of these controllers in the nvramrc script in Step 11. Do not include the sd directories in the device paths.

  11. Edit the nvramrc script to change the scsi-initiator-id for the host adapters on the first node.

    The default SCSI address for host adapters is 7. Reserve SCSI address 7 for one host adapter in the SCSI chain. This procedure refers to the host adapter that has SCSI address 7 as the host adapter on the "second node."

    To avoid conflicts, change the scsi-initiator-id of the remaining host adapter in the SCSI chain to an available SCSI address. This procedure refers to the host adapter that has an available SCSI address as the host adapter on the "first node."

    For a partial list of nvramrc editor and nvedit keystroke commands, see Appendix B of the Sun Cluster 3.0 12/01 Hardware Guide. For a full list of commands, see the OpenBoot 3.x Command Reference Manual.

    The following example sets the scsi-initiator-id to 6. The OpenBoot PROM Monitor prints the line numbers (0:, 1:, and so on).


    Note -

    Insert exactly one space after the quotation mark and before scsi-initiator-id.


    {0} ok nvedit 
    0: probe-all
    1: cd /sbus@6,0/QLGC,isp@2,10000 
    2: 6 " scsi-initiator-id" integer-property 
    3: device-end 
    4: cd /sbus@2,0/QLGC,isp@2,10000
    5: 6 " scsi-initiator-id" integer-property 
    6: device-end 
    7: install-console 
    8: banner <Control C>
    {0} ok


  12. Store the changes.

    The changes you make through the nvedit command are recorded on a temporary copy of the nvramrc script. You can continue to edit this copy without risk. After you have completed your edits, save the changes. If you are not sure about the changes, discard them.

    • To store the changes, type:


      {0} ok nvstore
      {0} ok 

    • To discard the changes, type:


      {0} ok nvquit
      {0} ok 

  13. Verify the contents of the nvramrc script you created in Step 11, as shown in the following example.

    If the contents of the nvramrc script are incorrect, use the nvedit command to make corrections.


    {0} ok printenv nvramrc
    nvramrc =             probe-all
                          cd /sbus@6,0/QLGC,isp@2,10000
                          6 " scsi-initiator-id" integer-property
                          device-end 
                          cd /sbus@2,0/QLGC,isp@2,10000
                          6 " scsi-initiator-id" integer-property
                          device-end 
                          install-console
                          banner
    {0} ok

  14. Instruct the OpenBoot PROM Monitor to use the nvramrc script:


    {0} ok setenv use-nvramrc? true
    use-nvramrc? = true
    {0} ok 

  15. Boot the first node.


    {0} ok boot -r
    

    For more information on booting nodes, see the Sun Cluster 3.0 12/01 System Administration Guide.

  16. Check the StorEdge/Netra st A1000 array NVSRAM file and firmware revisions, and if necessary, install the most recent revision.

    To verify that you have the current revision, see the Sun StorEdge RAID Manager Release Notes. For the procedure on upgrading the NVSRAM file and firmware, see the Sun StorEdge RAID Manager User's Guide.
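
    If you prefer the command line, RAID Manager also installs utilities under /usr/lib/osa/bin that can report controller information. The following sketch is an assumption: the lad utility lists the RAID module device names, and raidutil with the -i option displays inquiry data for a controller (c1t0d0 is a placeholder). Verify the utilities and options against your RAID Manager documentation before relying on them.

    # /usr/lib/osa/bin/lad
    # /usr/lib/osa/bin/raidutil -c c1t0d0 -i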

  17. Shut down the second node.


    # scswitch -S -h nodename
    # shutdown -y -g0 -i0
    

  18. If you are installing new host adapters, power off the second node.

    For the full procedure on shutting down and powering off a node, see the Sun Cluster 3.0 12/01 System Administration Guide.

  19. Install the host adapters in the second node.

    For the procedure on installing host adapters, see the documentation that shipped with your nodes.

  20. Cable the StorEdge/Netra st A1000 array to your node.

    Connect the differential SCSI cable between the node and the array. Make sure that the entire SCSI bus length to each enclosure is less than 25 m. This measurement includes the cables to both nodes, as well as the bus length internal to each enclosure, node, and host adapter.

    Figure D-3 StorEdge/Netra st A1000 Array Cabling


  21. Did you power off the second node to install a host adapter?

    • If not, go to Step 23.

    • If you did power off the second node, power it and the StorEdge/Netra st A1000 array on, but do not allow the node to boot. If necessary, halt the node to continue with OpenBoot PROM (OBP) Monitor tasks.

  22. Verify that the second node recognizes the new host adapters and disk drives.

    If the node does not recognize the new hardware, check all hardware connections and repeat the installation steps that you performed in Step 19.


    {0} ok show-disks
    ...b) /sbus@6,0/QLGC,isp@2,10000/sd...d) /sbus@2,0/QLGC,isp@2,10000/sd...{0} ok

  23. Verify that the scsi-initiator-id for the host adapters on the second node is set to 7.

    Use the show-disks command to find the paths to the host adapters that are connected to these enclosures. Select each host adapter's device tree node, and display the node's properties to confirm that the scsi-initiator-id for each host adapter is set to 7.


    {0} ok cd /sbus@6,0/QLGC,isp@2,10000
    {0} ok .properties
    scsi-initiator-id        00000007 
    ...

  24. Boot the second node.


    {0} ok boot -r
    

    For more information, see the Sun Cluster 3.0 12/01 System Administration Guide.

  25. On one node, verify that the DIDs have been assigned to the StorEdge/Netra st A1000 LUNs for all nodes that are attached to the StorEdge/Netra st A1000 array:


    # scdidadm -L
    

Where to Go From Here

To create a LUN from disk drives that are unassigned, see "How to Create a LUN".

To upgrade StorEdge/Netra st A1000 array firmware, see "How to Upgrade Disk Drive Firmware in a Running Cluster".

How to Remove a StorEdge/Netra st A1000 Array From a Running Cluster

Use this procedure to remove a StorEdge/Netra st A1000 array from a running cluster.


Caution -

This procedure removes all data that is on the StorEdge/Netra st A1000 array you remove.


  1. Migrate any Oracle Parallel Server/Real Application Clusters (OPS) tables, data services, or volumes off the array.
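
    For example, you can switch resource groups and disk device groups that currently use the array to a node that is not attached to the array by using scswitch(1M); the group and node names below are placeholders.

    # scswitch -z -g resource-group -h nodename
    # scswitch -z -D device-group-name -h nodename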

  2. Determine if the array contains a LUN that is configured as a quorum device.


    # scstat -q
    
    • If the array does not contain a quorum device, go to Step 3.

    • If the array contains a LUN that is configured as a quorum device, choose and configure another device on a different array to be the new quorum device. Then remove the old quorum device.
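
    A minimal sketch of relocating the quorum device with scconf(1M), assuming dN is an available DID device on another array and dM is the old quorum device:

    # scconf -a -q globaldev=dN
    # scconf -r -q globaldev=dM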

  3. Halt all activity to the array.

    See the Sun StorEdge RAID Manager User's Guide and your array documentation for instructions.

  4. Remove the LUN from disksets or disk groups.

    If a volume manager manages the LUN, run the appropriate Solstice DiskSuite/Solaris Volume Manager or VERITAS Volume Manager commands to remove the LUN from any diskset or disk group. For more information, see your Solstice DiskSuite/Solaris Volume Manager or VERITAS Volume Manager documentation. See the following paragraph for the additional VERITAS Volume Manager commands that are required.

    A LUN that was managed by VERITAS Volume Manager must be removed completely from VERITAS Volume Manager control before you can delete it. After you remove the LUN from its disk group, use the following commands:


    # vxdisk offline cNtXdY
    # vxdisk rm cNtXdY
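
    For Solstice DiskSuite/Solaris Volume Manager, the equivalent step is to remove the corresponding DID drive from its diskset; the following is a minimal sketch with placeholder set and drive names (add the -f option if you are removing the last drive in the diskset).

    # metaset -s setname -d /dev/did/rdsk/dN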
    

  5. From one node, delete the LUN.

    For the procedure on deleting a LUN, see the Sun StorEdge RAID Manager User's Guide.

  6. Disconnect all cables from the array and remove the hardware from your cluster.

  7. Remove the paths to the LUN(s) you are deleting:


    # rm /dev/rdsk/cNtXdY*
    # rm /dev/dsk/cNtXdY*
    
    # rm /dev/osa/dev/dsk/cNtXdY*
    # rm /dev/osa/dev/rdsk/cNtXdY*
    

  8. On all cluster nodes, remove references to the StorEdge/Netra st A1000 array.


    # scdidadm -C
    

  9. Remove any unused host adapter from nodes that were attached to the StorEdge/Netra st A1000 array.

    1. Shut down and power off the first node from which you are removing a host adapter:


      # scswitch -S -h nodename
      # shutdown -y -g0 -i0
      

      For the procedure on shutting down and powering off a node, see the Sun Cluster 3.0 12/01 System Administration Guide.

    2. Remove the host adapter from the first node.

      See the documentation that came with your node hardware for removal instructions.

    3. Boot the node and wait for it to rejoin the cluster.


      ok boot -r
      

    4. Repeat Step a through Step c for the second node that was attached to the StorEdge/Netra st A1000 array.

  10. Return resource groups to their primary nodes.


    # scswitch -Z
    

  11. Are you removing the last StorEdge/Netra st A1000 array from your cluster?

    • If not, you are finished with this procedure.

    • If you are removing the last StorEdge/Netra st A1000 array from your cluster, go to Step 12.

  12. Remove RAID Manager patches, then remove RAID Manager software packages.


    Caution -

    If you improperly remove RAID Manager packages, the next reboot of the node will fail. Before you remove RAID Manager software packages, see the Sun StorEdge RAID Manager 6.22.1 Release Notes for uninstallation issues.


    For the procedure on removing software packages, see the documentation that shipped with your StorEdge/Netra st A1000 array.
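
    The following is a hedged sketch that uses the standard Solaris patchrm(1M) and pkgrm(1M) utilities. The patch ID is a placeholder, and the package names shown are the ones typically delivered with RAID Manager 6.22; confirm the exact names with pkginfo before you remove anything.

    # patchrm 123456-01
    # pkginfo | grep osa
    # pkgrm SUNWosau SUNWosar SUNWosamn SUNWosafw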

How to Replace a Failed Controller or Restore an Offline Controller

Use this procedure to replace a StorEdge/Netra st A1000 controller, or to restore an offline controller.

For conceptual information on SCSI reservations and failure fencing, see the Sun Cluster 3.0 12/01 Concepts.

  1. Determine if the array contains a LUN that is configured as a quorum device.


    # scstat -q
    
    • If the array does not contain a quorum device, go to Step 2.

    • If the array contains a LUN that is configured as a quorum device, choose and configure another LUN on a different array to be the new quorum device. Then remove the old quorum device.

  2. Restart the RAID Manager daemon:


    # /etc/init.d/amdemon stop
    # /etc/init.d/amdemon start
    

  3. Do you have a failed controller?

    • If your array is offline, but does not have a failed controller, go to Step 4.

    • If you have a failed controller, replace the failed controller with a new controller, but do not bring the controller online.

      For the procedure on replacing StorEdge/Netra st A1000 controllers, see the Sun StorEdge A3500/A3500FC Controller Module Guide and the Sun StorEdge RAID Manager Installation and Support Guide for additional considerations.

  4. On one node, use the RAID Manager GUI's Recovery application to restore the controller online.


    Note -

    You must use the RAID Manager GUI's Recovery application to bring the controller online.


    For information on the Recovery application, see the Sun StorEdge RAID Manager User's Guide. If you have problems with bringing the controller online, see the Sun StorEdge RAID Manager Installation and Support Guide.

  5. On one node that is connected to the StorEdge/Netra st A1000 array, verify that the controller has the correct SCSI reservation state.

    Run the scdidadm(1M) repair option (-R) on LUN 0 of the controller you want to bring online:


    # scdidadm -R /dev/dsk/cNtXdY
    

How to Add a Disk Drive in a Running Cluster

Use this procedure to add a disk drive to a StorEdge/Netra st A1000 array that is in a running cluster.

  1. Verify that the new disk drive is formatted and not being transferred from another array.

    For information about moving drives between StorEdge/Netra st array subsystems, see the Sun StorEdge RAID Manager 6.22.1 Release Notes.

  2. Install the new disk drive to the disk array.

    For the procedure on installing a disk drive, see the Sun StorEdge A1000 and D1000 Installation, Operations, and Service Manual or the Netra st A1000 and Netra st D1000 Installation and Maintenance Manual.

  3. Allow the disk drive to spin up (approximately 30 seconds).

  4. Run Health Check to ensure that the new disk drive is not defective.

    For instructions on running Recovery Guru and Health Check, see the Sun StorEdge RAID Manager User's Guide.
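
    If you prefer the command line, RAID Manager also provides a health check utility. The path and option below are assumptions based on a typical RAID Manager 6.22 installation; the GUI procedure in the Sun StorEdge RAID Manager User's Guide remains the documented method.

    # /usr/lib/osa/bin/healthck -a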

  5. Fail the new drive, then revive the drive to update DacStore on the drive.

    For instructions on failing drives and manual recovery procedures, see the Sun StorEdge RAID Manager User's Guide.

  6. Repeat Step 2 through Step 5 for each disk drive you are adding.

Where to Go From Here

To create LUNs for the new drives, see "How to Create a LUN".

How to Replace a Failed Disk Drive in a Running Cluster

Use this procedure to replace a failed disk drive in a running cluster.

  1. Does replacing the disk drive affect any LUN's availability?

    • If not, go to Step 2.

    • If the replacement does affect LUN availability, remove the LUN(s) from volume management control. For more information, see your Solstice DiskSuite/Solaris Volume Manager or VERITAS Volume Manager documentation.

  2. Replace the disk drive in the disk array.

    For the procedure on replacing a disk drive, see the Sun StorEdge D1000 Storage Guide.

  3. Run Health Check to ensure that the new disk drive is not defective.

    For instructions on running Recovery Guru and Health Check, see the Sun StorEdge RAID Manager User's Guide.

  4. Does the failed drive belong to a drive group?

    • If the drive does not belong to a drive group, go to Step 5.

    • If the drive is part of a drive group, reconstruction starts automatically. If reconstruction does not start automatically for any reason, select Reconstruct from the Manual Recovery application. Do not select Revive. When reconstruction is complete, go to Step 6.

  5. Fail the new drive, then revive the drive to update DacStore on the drive.

    For instructions on failing drives and manual recovery procedures, see the Sun StorEdge RAID Manager User's Guide.

  6. If you removed LUNs from volume management control in Step 1, return the LUN(s) to volume management control.

    For more information, see your Solstice DiskSuite/Solaris Volume Manager or VERITAS Volume Manager documentation.

How to Remove a Disk Drive From a Running Cluster

Use this procedure to remove a disk drive from a running cluster.

  1. Determine if the LUN that is associated with the disk drive you plan to remove is configured as a quorum device.


    # scstat -q
    
    • If the LUN is not a quorum device, go to Step 2.

    • If the LUN is configured as a quorum device, choose and configure another device to be the new quorum device. Then remove the old quorum device.

  2. Remove the LUN that is associated with the disk drive you are removing.

    For the procedure on removing a LUN, see "How to Delete a LUN".

  3. Remove the disk drive from the disk array.

    For the procedure on removing a disk drive, see the Sun StorEdge D1000 Storage Guide.


    Caution -

    After you remove the disk drive, install a dummy drive to maintain proper cooling.


How to Upgrade Disk Drive Firmware in a Running Cluster


Note -

Only qualified service personnel should perform disk drive firmware updates. If you need to upgrade drive firmware, contact your local Sun solution center or Sun service provider.


How to Replace a Host Adapter in a Node (Connected to a StorEdge/Netra st A1000 array)

This section describes the procedure for replacing a failed host adapter in a running node that is attached to a StorEdge/Netra st A1000 array.

In the following procedure, Node 1's host adapter on SCSI bus A needs replacement, but Node 2 remains in service.


Note -

Several steps in this procedure require that you halt I/O activity. To halt I/O activity, take the array offline by using the RAID Manager GUI's manual recovery procedure, as described in the Sun StorEdge RAID Manager User's Guide.


  1. Without powering off the node, shut down Node 1.


    # scswitch -S -h nodename
    # shutdown -y -g0 -i0
    

    For the procedure on shutting down a node, see the Sun Cluster 3.0 12/01 System Administration Guide.

  2. From Node 2, halt I/O activity to SCSI bus A.

    See the RAID Manager User's Guide for instructions.

  3. From the array end of the SCSI cable, disconnect the SCSI bus A cable that connects the array to Node 1, then replace this cable with a differential SCSI terminator.

  4. Restart I/O activity on SCSI bus A.

    See the RAID Manager User's Guide for instructions.

  5. Does servicing the failed host adapter affect SCSI bus B?

    • If SCSI bus B is not affected, go to Step 9.

    • If SCSI bus B is affected, continue with Step 6.

  6. From Node 2, halt I/O activity to the array on SCSI bus B.

    See the RAID Manager User's Guide for instructions.

  7. From the array end of the SCSI cable, disconnect the SCSI bus B cable that connects the array to Node 1 and replace this cable with a differential SCSI terminator.

  8. Restart I/O activity on SCSI bus B.

    See the RAID Manager User's Guide for instructions.

  9. Power off Node 1.

  10. Replace Node 1's host adapter.

    See the documentation that came with your node hardware for instructions.

  11. Power on Node 1, but do not allow it to boot. If necessary, halt the node.

  12. From Node 2, halt I/O activity to the array on SCSI bus A.

    See the RAID Manager User's Guide for instructions.

  13. Remove the differential SCSI terminator from SCSI bus A, then reinstall the SCSI cable to connect the array to Node 1.

  14. Restart I/O activity on SCSI bus A.

    See the RAID Manager User's Guide for instructions.

  15. Did you install a differential SCSI terminator to SCSI bus B in Step 7?

    • If not, skip to Step 18.

    • If you did install a SCSI terminator to SCSI bus B, halt I/O activity on SCSI bus B, then continue with Step 16.

  16. Remove the differential SCSI terminator from SCSI bus B, then reinstall the SCSI cable to connect the array to Node 1.

  17. Restart I/O activity on SCSI bus B.

    See the RAID Manager User's Guide for instructions.

  18. Bring the array back online.

    See the RAID Manager User's Guide for instructions.

  19. Rebalance all logical units (LUNs).

    See the RAID Manager User's Guide for instructions.

  20. Boot Node 1 into cluster mode.


    {0} ok boot
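
    After Node 1 boots and rejoins the cluster, you can confirm its membership from either node with a quick status check:

    # scstat -n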