Sun Cluster 2.2 System Administration Guide

11.5 Replacing a SPARCstorage Array Controller and Changing the World Wide Name

The SPARCstorage Array controller has a unique identifier known as the World Wide Name (WWN) that identifies the controller to Solaris. Therefore, when SPARCstorage Array failures make it necessary to replace the controller or the entire chassis containing the controller, special procedures apply.

The WWN is like the host ID stored in the host IDPROM of a SPARC machine. The last four digits of the SPARCstorage Array WWN are displayed on the LCD panel of the chassis. The WWN is part of the /devices path associated with the SPARCstorage Array and its component drives.

If you must replace the SPARCstorage Array controller or the entire chassis, the Sun Cluster nodes will discover the new WWN when they are rebooted. To keep the new WWN from confusing the upper layers of Sun Cluster software, change the WWN of the new controller to match the WWN of the old controller. (This is similar to swapping the IDPROM when replacing a system board in a SPARC machine.)

Consider the following situations when deciding which WWN replacement procedure to use:

- The procedure described in "11.5.1 How to Change a SPARCstorage Array World Wide Name Using a Maintenance System" makes use of a separate maintenance system that enables the controller to be changed without stopping cluster nodes.
- If the SPARCstorage Array has not entirely failed or is being swapped for some other reason, prepare for the swap by performing the steps described in "11.4 Administering SPARCstorage Array Trays" for each tray in the SPARCstorage Array. Then use the procedure described in "11.5.2 How to Change a SPARCstorage Array World Wide Name".
- If the SPARCstorage Array controller has failed entirely, your volume management software has already prepared for the swap. In this case, you can use the procedure described in "11.5.2 How to Change a SPARCstorage Array World Wide Name".

11.5.1 How to Change a SPARCstorage Array World Wide Name Using a Maintenance System

This procedure describes how to change a SPARCstorage Array controller and replace its WWN with the WWN of the failed controller. This procedure enables you to replace a SPARCstorage Array controller without taking down any nodes in the cluster.

This procedure makes use of a "maintenance system," which can be any Sun Microsystems system capable of supporting a SPARCstorage Array.

This system should be loaded with the same version of the Solaris Operating Environment (2.6 or 7) as the cluster nodes, and should have all applicable patches. It should also have a CD-ROM drive, a Fibre Channel SBus Card (FC/S), and a Fibre Channel Optical Module (FC/OM). The system should have the proper FCODE and hardware revisions. Alternatively, you can boot the maintenance system over the net.

Note -

If a "maintenance system" is not available, use one of the cluster nodes for this purpose by following the steps in this procedure.

These are the high-level steps to change a SPARCstorage Array World Wide Name (WWN) using a maintenance system:

(Optional) If the controller is the quorum device, using the scconf(1M) command to select a new quorum device
Obtaining the WWN of the previous array
Detaching the optical cables and replacing the controller or array
Attaching the optical cable from the maintenance system to the new controller
Booting the maintenance system with "mini-unix" from a Solaris CD
Downloading the original WWN
Resetting the SSA
Shutting down the maintenance system
Attaching the SSA controller to the cluster nodes
Checking the new controller's firmware level from the cluster node
(Optional) If necessary, upgrading the new controller's firmware from the cluster node
Bringing the SSA tray online and performing volume management recovery

These are the detailed steps to change a SPARCstorage Array World Wide Name by using a maintenance system.

If the failed SPARCstorage Array controller is the quorum controller, select a new quorum controller by using the scconf(1M) command.

Refer to the scconf(1M) man page for more information.

Determine the WWN of the broken SPARCstorage Array.
- If the SPARCstorage Array is powered down, use the following instructions to obtain the WWN.
  
  The WWN is composed of 12 hexadecimal digits. The digits appear in the device path component that contains the characters pln@a0: they are the 12 digits that follow pln@a0, excluding the comma. Use the ls(1) command on a cluster node connected to the SSA to identify the current WWN.
# ls -l /dev/rdsk/cNt0d0s0
...SUNW,pln@a0000000,7412bf ...
In this example, the WWN for the SPARCstorage Array being replaced is 0000007412bf. The variable N in the device name represents the controller number for the broken SPARCstorage Array. The string "t0d0s0" is just an example. Use a device name that you know exists on the SPARCstorage Array, or use /dev/rdsk/cN* to match all devices.
- If the SPARCstorage Array is up and running, you can obtain the WWN by using the luxadm(1M) command.
  
  When you run luxadm(1M) with the display option and specify a controller, all the information about the SPARCstorage Array is displayed. The serial number reported by luxadm(1M) is the WWN.
# /usr/sbin/luxadm display cN
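
For the powered-down case, the rule given above (take the digits that follow pln@a0 and drop the comma) can be sketched as a small shell helper. The function name wwn_from_path is hypothetical, and the sketch assumes the /devices path has the same form as the example. It does not pad or otherwise correct a result that is not 12 digits long.

```shell
# Hypothetical helper (not part of Solaris or Sun Cluster).
# Takes one argument: a /devices path, or a whole line of "ls -l" output.
# Prints the hex digits that follow "pln@a0", with the comma removed.
# Prints nothing if the argument contains no pln@a0 component.
wwn_from_path() {
    printf '%s\n' "$1" |
        sed -n 's/.*pln@a0\([0-9A-Fa-f]*\),\([0-9A-Fa-f]*\).*/\1\2/p'
}

# Path fragment taken from the example above; prints 0000007412bf
wwn_from_path '...SUNW,pln@a0000000,7412bf ...'
```

On a cluster node, the argument would typically be the output of ls -l for one device on the array, for example "$(ls -l /dev/rdsk/cNt0d0s0)".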

Detach the optical cable from the faulty SPARCstorage Array Controller.

Replace the faulty controller.

Use the instructions in your SPARCstorage Array service manual to perform this step.

If the SPARCstorage Array has not failed entirely or is being swapped for a reason other than controller failure, prepare for the swap by performing the steps described in "11.4 Administering SPARCstorage Array Trays", for each tray in the SPARCstorage Array.

If the SPARCstorage Array controller has failed entirely, your volume manager has already prepared for the swap.

Attach the optical cable from the maintenance system to the new controller.

Enter the OpenBoot PROM on the maintenance system and boot it with "mini-unix."

Do this from the distribution CD (or its network equivalent) to put the maintenance system into single-user mode and to obtain an in-memory version of the device structure that contains the new SPARCstorage Array WWN.
<#0> ok boot cdrom -s
or
<#0> ok boot netqe1 -s
Use "mini-unix" to avoid making any permanent device information changes.

Run the luxadm download command to set the WWN.
# /usr/sbin/luxadm -s -w WWN download cN
WWN is the 12-digit WWN of the replaced controller and N is the controller number from cNtXdX in the device name. You should have obtained the WWN in Step 2.

Note -
The leading zeros must be entered as part of the WWN to make a total of 12 digits.

Caution -
Do not interrupt the download process. Wait for the shell prompt after completion of the luxadm(1M) command.

After the prompt is redisplayed, reset the SSA.

The last four digits of the new WWN should appear on the LCD display of the SPARCstorage Array.
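
Because the LCD shows only the last four digits of the WWN, and may show them in mixed case, the check after the reset can be sketched as follows. The function name lcd_matches_wwn is hypothetical, and the LCD reading is assumed to be typed in by hand.

```shell
# Hypothetical helper (not part of Solaris or Sun Cluster).
# $1: the 12-digit WWN that was downloaded
# $2: the four digits read from the LCD display
# Succeeds if the last four digits of $1 equal $2, ignoring case.
lcd_matches_wwn() {
    wwn_tail=$(printf '%s\n' "$1" | sed 's/.*\(....\)$/\1/' | tr 'A-F' 'a-f')
    lcd=$(printf '%s\n' "$2" | tr 'A-F' 'a-f')
    [ "$wwn_tail" = "$lcd" ]
}

if lcd_matches_wwn 0000007412bf 12bF; then
    echo "LCD matches the downloaded WWN"
else
    echo "LCD does not match the downloaded WWN"
fi
```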

Shut down the maintenance system.

Reattach the SPARCstorage Array controller to the cluster nodes.

Verify the SPARCstorage Array firmware level from the cluster node.

Use the luxadm(1M) command to determine the current version of the firmware. Specify the controller number (N in the example) to the luxadm(1M) command.
# /usr/sbin/luxadm display cN
Note -
If the Solaris system detects an old version of firmware on your system, it displays a message on the console and in /var/adm/messages similar to the following:
NOTICE: pln0: Old SSA firmware has been detected (Ver:3.11) : Expected (Ver:3.12) - Please upgrade
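
A message of this form can also be searched for in /var/adm/messages. The following sketch pulls the detected and expected versions out of such a line. The function name ssa_fw_notice is hypothetical, and the pattern assumes that the message text matches the sample above.

```shell
# Hypothetical filter (not part of Solaris or Sun Cluster).
# Reads log lines on standard input and, for each "Old SSA firmware"
# notice, prints the detected and expected firmware versions.
ssa_fw_notice() {
    sed -n 's/.*Old SSA firmware has been detected (Ver:\([0-9.]*\)) : Expected (Ver:\([0-9.]*\)).*/detected=\1 expected=\2/p'
}

# On a cluster node, the input would be the messages file:
#   ssa_fw_notice < /var/adm/messages
# Here the sample message is used; prints: detected=3.11 expected=3.12
printf '%s\n' 'NOTICE: pln0: Old SSA firmware has been detected (Ver:3.11) : Expected (Ver:3.12) - Please upgrade' |
    ssa_fw_notice
```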

(Optional) To upgrade the controller's firmware, follow these steps.
1. Download the proper firmware. Refer to the README file in the firmware patch for details.
  # /usr/sbin/ssaadm download -f path/ssafirmware cN
  where path is the path to the directory where the firmware is stored and N is the controller number. For example:
  # /usr/sbin/ssaadm download -f /usr/lib/firmware/ssa/ssafirmware cN
2. Reset the SPARCstorage Array by pressing the SYS OK button on the unit.
  
  There will be a short delay while the unit reboots.
3. Verify the firmware level again (using Step 11). If the firmware level or WWN is still incorrect, repeat Step 12 using a different controller.

Begin volume manager recovery.

Refer to "11.4 Administering SPARCstorage Array Trays". Wait until the SPARCstorage Array is online for all nodes, and all nodes can see all the disks.

11.5.2 How to Change a SPARCstorage Array World Wide Name

Caution -

This procedure will not work if the root disk is encapsulated by SSVM or CVM, or if the boot disk of one of the nodes is on this SPARCstorage Array. For those situations, use the procedure "11.5.1 How to Change a SPARCstorage Array World Wide Name Using a Maintenance System".

Note -

If a quorum controller fails, you must select a new quorum controller before shutting down a node.

These are the high-level steps to change a SPARCstorage Array World Wide Name:

(Optional) If the controller is the quorum device, using the scconf(1M) command to select a new quorum device
Switching ownership of logical hosts away from the node on which the repair procedure will be performed, or away from the controller being replaced
Obtaining the WWN of the previous array
Replacing the controller or array
Stopping Sun Cluster software and halting the node that does not own the disks
Rebooting the node that does not own the disks with "mini-unix"
Determining the controller number for the new array
Setting the new WWN and resetting the array
Rebooting the other cluster nodes, if necessary
Performing volume management recovery

These are the detailed steps to change a SPARCstorage Array World Wide Name.

If the failed SPARCstorage Array controller is the quorum controller, select a new quorum controller by using the scconf(1M) command.

Refer to the scconf(1M) man page for more information.

On the cluster node that is connected to the SSA being repaired, stop the Sun Cluster software and halt the system.

Use the scadmin(1M) command to switch ownership of all logical hosts to the other nodes in the cluster, and to stop Sun Cluster. Then run the halt(1M) command to stop the machine.

In this example, phys-hahost2 is the node from which the repair procedure is performed.
phys-hahost2# scadmin stopnode
...
phys-hahost2# halt

Determine the WWN of the broken SPARCstorage Array.
- If the SPARCstorage Array is powered down, use the following instructions to obtain the WWN.
  
  The WWN is composed of 12 hexadecimal digits. The digits appear in the device path component that contains the characters pln@a0: they are the 12 digits that follow pln@a0, excluding the comma. Use the ls(1) command on a cluster node connected to the SSA to identify the current WWN.
phys-hahost1# ls -l /dev/rdsk/cNt0d0s0
...SUNW,pln@a0000000,7412bf ...
In this example, the WWN for the SPARCstorage Array being replaced is 0000007412bf. The variable N in the device name represents the controller number for the broken SPARCstorage Array. The string t0d0s0 is just an example. Use a device name that you know exists on the SPARCstorage Array, or use /dev/rdsk/cN* to match all devices.
- If the SPARCstorage Array is up and running, you can obtain the WWN by using the luxadm(1M) command.
  
  When you run luxadm(1M) with the display option and specify a controller, all the information about the SPARCstorage Array is displayed. The serial number reported by luxadm(1M) is the WWN.
phys-hahost1# /usr/sbin/luxadm display cN

Replace the controller or SPARCstorage Array.

Use the instructions in your SPARCstorage Array service manual to perform this step.
- If the SPARCstorage Array has not failed completely or is being swapped for a reason other than controller failure, prepare for the swap by performing the steps described in "11.4 Administering SPARCstorage Array Trays", for each tray in the SPARCstorage Array.
- If the SPARCstorage Array controller has failed completely, your volume manager has already prepared for the swap.

Enter the OpenBoot PROM on the halted node and boot it with "mini-unix."

Do this from the distribution CD (or net equivalent) to put the host into single-user mode and to obtain an in-memory version of the device structure that contains the new SPARCstorage Array WWN.
<#0> ok boot cdrom -s
or
<#0> ok boot netqe1 -s
Use "mini-unix" to avoid making any permanent device information changes to the cluster node.

Determine the controller number for the new SPARCstorage Array.

Use the ls(1) command and the four digits displayed on the LCD display of the new SPARCstorage Array to identify the controller number.

In this example, the four digits shown on the LCD display are 143b. Note that the device name c*t0d0s0 uses pattern matching for the controller number but specifies a slice that is known to exist. This reduces the number of lines generated in the output.

# ls -l /dev/rdsk/c*t0d0s0 | grep -i 143b
lrwxrwxrwx   1 root     root          98 Mar 14 13:38
 /dev/rdsk/c3t0d0s0 ->
 ../../devices/iommu@f,e0000000/sbus@f,e0001000/SUNW,soc@3,0/SUNW
 ,pln@a0000000,74143b/ssd@0,0:a,raw

In this example, 3 (from /dev/rdsk/c3...) is the controller number of the new SPARCstorage Array under "mini-unix".

Note -

The hex digits in the LCD display are in mixed case: letters A, C, E, and F are in upper case, and letters b and d are in lower case. The example uses grep -i to ignore case in the comparison.
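
The lookup in this step can be sketched as a filter over ls -l output. The function name controller_for_lcd is hypothetical, and the sketch assumes that each link and its target appear on a single line (the example output above is taken to be wrapped only for display).

```shell
# Hypothetical filter (not part of Solaris or Sun Cluster).
# $1: the four digits read from the LCD display (any case)
# stdin: "ls -l" lines for /dev/rdsk entries
# Prints the controller number of each entry whose WWN ends in $1.
controller_for_lcd() {
    grep -i "pln@a0[0-9a-f]*,[0-9a-f]*$1/" |
        sed -n 's|.*/dev/rdsk/c\([0-9]*\)t.*|\1|p'
}

# On the node booted with "mini-unix":
#   ls -l /dev/rdsk/c*t0d0s0 | controller_for_lcd 143b
# Here the example line is used, with the digits typed in upper case; prints 3
printf '%s\n' 'lrwxrwxrwx 1 root root 98 Mar 14 13:38 /dev/rdsk/c3t0d0s0 -> ../../devices/iommu@f,e0000000/sbus@f,e0001000/SUNW,soc@3,0/SUNW,pln@a0000000,74143b/ssd@0,0:a,raw' |
    controller_for_lcd 143B
```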

Run the luxadm download command to set the WWN.

Use the controller number determined in Step 6. For example, the following command would change the WWN from the current value to the value determined in Step 3, 0000007412bf. The SPARCstorage Array controller is Controller 3.
phys-hahost2# /usr/sbin/luxadm download -w 0000007412bf c3
Note -
The leading zeros must be entered as part of the WWN to make a total of 12 digits.

Caution -
Do not interrupt the download process. Wait for the shell prompt after completion of the luxadm(1M) command.
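
Because the WWN must be exactly 12 hexadecimal digits, including leading zeros, it can help to check the value before running luxadm. The following is a minimal sketch; the function name is_valid_wwn is hypothetical.

```shell
# Hypothetical helper (not part of Solaris or Sun Cluster).
# Succeeds only if $1 is exactly 12 hexadecimal digits.
is_valid_wwn() {
    case $1 in
        ''|*[!0-9A-Fa-f]*) return 1 ;;   # empty, or contains a non-hex character
    esac
    [ "${#1}" -eq 12 ]
}

for wwn in 0000007412bf 7412bf; do
    if is_valid_wwn "$wwn"; then
        echo "$wwn: ok"
    else
        echo "$wwn: not 12 hex digits (leading zeros missing?)"
    fi
done
```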

Reset the SPARCstorage Array by pressing the SYS OK button on the unit.

There will be a short delay while the unit reboots and begins communicating with the Sun Cluster nodes.

Abort "mini-unix" and boot the host normally.

Send a break to the console, and boot the machine.

Verify the SPARCstorage Array firmware level from the cluster node.

Use the luxadm(1M) command to determine the current version of the firmware. Specify the controller number (N in the example) to the luxadm(1M) command.
phys-hahost2# /usr/sbin/luxadm display cN
Note -
If the Solaris system detects an old version of firmware on your system, it displays a message on the console and in /var/adm/messages similar to the following:
NOTICE: pln0: Old SSA firmware has been detected (Ver:3.11) : Expected (Ver:3.12) - Please upgrade

(Optional) To upgrade the controller's firmware, follow these steps.
1. Download the proper firmware. Refer to the README file in the firmware patch for details.
  # /usr/sbin/ssaadm download -f path/ssafirmware cN
  where path is the path to the directory where the firmware is stored and N is the controller number. For example:
  # /usr/sbin/ssaadm download -f /usr/lib/firmware/ssa/ssafirmware cN
2. Reset the SPARCstorage Array using the SYS OK button on the unit.
  
  There will be a short delay while the unit reboots.
3. Re-verify the firmware level (see Step 10). If either the firmware level or the WWN is still incorrect, repeat Step 11 using a different controller.

Start the node.
phys-hahost2# scadmin startnode

Switch back the logical hosts to the default master, if necessary.

Complete the replacement by restoring the volume manager components onto the repaired SPARCstorage Array.

This procedure is described in "11.4 Administering SPARCstorage Array Trays".

Reboot the other nodes in the cluster, if necessary.

You might need to reboot the other cluster nodes if they cannot recognize all disks in the SPARCstorage Array after the replacement. In that case, use the scadmin stopnode command to stop Sun Cluster activity, then reboot. After the reboot, if necessary, switch the logical hosts back to their default masters. See the scadmin(1M) man page for more information.