Sun Cluster Upgrade Guide for Solaris OS

Chapter 7 Recovering From an Incomplete Upgrade

This chapter provides the following information to recover from certain kinds of incomplete upgrades:

Cluster Recovery After an Incomplete Upgrade

This section provides information to recover from incomplete upgrades of a Sun Cluster configuration.

How to Recover from a Failed Dual-Partition Upgrade

If you experience an unrecoverable error during upgrade, perform this procedure to back out of the upgrade.

Note –

You cannot restart a dual-partition upgrade after the upgrade has experienced an unrecoverable error.

Become superuser on each node of the cluster.

Boot each node into noncluster mode.

On SPARC based systems, run the following command:
ok boot -x

On x86 based systems, perform the following steps:

In the GRUB menu, use the arrow keys to select the appropriate Solaris entry and type e to edit its commands.

The GRUB menu appears similar to the following:

GNU GRUB version 0.95 (631K lower / 2095488K upper memory)
+----------------------------------------------------------------------+
| Solaris 10 /sol_10_x86                                               |
| Solaris failsafe                                                     |
|                                                                      |
+----------------------------------------------------------------------+
Use the ^ and v keys to select which entry is highlighted.
Press enter to boot the selected OS, 'e' to edit the
commands before booting, or 'c' for a command-line.

For more information about GRUB based booting, see Booting an x86 Based System by Using GRUB (Task Map) in System Administration Guide: Basic Administration.

In the boot parameters screen, use the arrow keys to select the kernel entry and type e to edit the entry.

The GRUB boot parameters screen appears similar to the following:

GNU GRUB version 0.95 (615K lower / 2095552K upper memory)
+----------------------------------------------------------------------+
| root (hd0,0,a)                                                       |
| kernel /platform/i86pc/multiboot                                     |
| module /platform/i86pc/boot_archive                                  |
+----------------------------------------------------------------------+
Use the ^ and v keys to select which entry is highlighted.
Press 'b' to boot, 'e' to edit the selected command in the
boot sequence, 'c' for a command-line, 'o' to open a new line
after ('O' for before) the selected line, 'd' to remove the
selected line, or escape to go back to the main menu.

Add -x to the command to specify that the system boot into noncluster mode.

[ Minimal BASH-like line editing is supported. For the first word, TAB
lists possible command completions. Anywhere else TAB lists the possible
completions of a device/filename. ESC at any time exits. ]

grub edit> kernel /platform/i86pc/multiboot -x

Press Enter to accept the change and return to the boot parameters screen.

The screen displays the edited command.

GNU GRUB version 0.95 (615K lower / 2095552K upper memory)
+----------------------------------------------------------------------+
| root (hd0,0,a)                                                       |
| kernel /platform/i86pc/multiboot -x                                  |
| module /platform/i86pc/boot_archive                                  |
+----------------------------------------------------------------------+
Use the ^ and v keys to select which entry is highlighted.
Press 'b' to boot, 'e' to edit the selected command in the
boot sequence, 'c' for a command-line, 'o' to open a new line
after ('O' for before) the selected line, 'd' to remove the
selected line, or escape to go back to the main menu.

Type b to boot the node into noncluster mode.

Note –
This change to the kernel boot parameter command does not persist over the system boot. The next time you reboot the node, it will boot into cluster mode. To boot into noncluster mode instead, perform these steps again to add the -x option to the kernel boot parameter command.

On each node, run the upgrade recovery script from the installation media.

If the node successfully upgraded to Sun Cluster 3.2 1/09 software from an earlier 3.2 release, you can alternatively run the scinstall command from the /usr/cluster/bin directory.

Note –
If you upgraded from a Sun Cluster 3.1 release, run the scinstall command only from the installation media. Recovery capability for dual-partition upgrade is not available from the 3.1 versions of the scinstall command.
phys-schost# cd /cdrom/cdrom0/Solaris_arch/Product/sun_cluster/Solaris_ver/Tools
phys-schost# ./scinstall -u recover
-u

Specifies upgrade.

recover

Restores the /etc/vfstab file and the Cluster Configuration Repository (CCR) database to their original state before the start of the dual-partition upgrade.

The recovery process leaves the cluster nodes in noncluster mode. Do not attempt to reboot the nodes into cluster mode.

For more information, see the scinstall(1M) man page.

Perform either of the following tasks.
- Restore the old software from backup to return the cluster to its original state.
- Continue to upgrade software on the cluster by using the standard upgrade method.
  
  This method requires that all cluster nodes remain in noncluster mode during the upgrade. See the task map for standard upgrade, Table 2–1. You can resume the upgrade at the last task or step in the standard upgrade procedures that you successfully completed before the dual-partition upgrade failed.

SPARC: How to Recover From a Partially Completed Dual-Partition Upgrade

Perform this procedure if a dual-partition upgrade fails and the state of the cluster meets all of the following criteria:

The nodes of the first partition have been upgraded.
None of the nodes of the second partition are yet upgraded.
None of the nodes of the second partition are in cluster mode.

You can also perform this procedure if the upgrade has succeeded on the first partition but you want to back out of the upgrade.

Note –

Do not perform this procedure after dual-partition upgrade processes have begun on the second partition. Instead, perform How to Recover from a Failed Dual-Partition Upgrade.
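The criteria and the note above amount to a small decision rule. The following shell sketch encodes that rule only as an illustration. The function name choose_recovery, its yes/no arguments, and the strings it prints are hypothetical and are not part of Sun Cluster software. You must still establish the three facts yourself by inspecting the cluster. The third result covers states that this section's criteria do not address.

```shell
#!/bin/sh
# Hypothetical helper, not part of Sun Cluster: picks a recovery procedure
# from three yes/no facts that the operator has already established.
#   $1  first-partition nodes have been upgraded?               (yes|no)
#   $2  upgrade processes have begun on the second partition?   (yes|no)
#   $3  any second-partition node is in cluster mode?           (yes|no)
choose_recovery() {
    first_upgraded=$1
    second_started=$2
    second_in_cluster=$3

    if [ "$second_started" = yes ]; then
        # Per the note: once upgrade processes have begun on the second
        # partition, use the failed-upgrade procedure instead.
        echo "failed-upgrade-procedure"
    elif [ "$first_upgraded" = yes ] && [ "$second_in_cluster" = no ]; then
        # All three criteria listed for this procedure are met.
        echo "partial-upgrade-procedure"
    else
        # State not covered by the criteria in this section.
        echo "criteria-not-met"
    fi
}
```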

Before You Begin

Before you begin, ensure that all second-partition nodes are halted. First-partition nodes can be either halted or running in noncluster mode.

Perform all steps as superuser.

Boot each node in the second partition into noncluster mode.
ok boot -x

From the DVD image, run the scinstall -u recover command on each node in the second partition.

Change to the /Solaris_arch/Product/sun_cluster/Solaris_ver/Tools/ directory, where arch is sparc or x86 (Solaris 10 only) and where ver is 9 for Solaris 9 or 10 for Solaris 10.
phys-schost# cd /cdrom/cdrom0/Solaris_arch/Product/sun_cluster/Solaris_ver/Tools/
phys-schost# ./scinstall -u recover
The command restores the original CCR information, restores the original /etc/vfstab file, and eliminates modifications for startup.
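The arch and ver placeholders in the directory name must be replaced by hand. As a minimal sketch of that substitution, the following shell function builds the Tools path from its parts. The function name tools_dir and its argument order are hypothetical, and /cdrom/cdrom0 in the trailing comment is simply the mount point used in this procedure's examples.

```shell
#!/bin/sh
# Hypothetical helper: builds the Tools directory path described above.
#   $1  mount point of the installation media (for example /cdrom/cdrom0)
#   $2  arch: sparc or x86
#   $3  ver:  9 or 10
tools_dir() {
    media=$1 arch=$2 ver=$3
    case $arch in
        sparc) ;;
        x86)
            # The text states that x86 applies to Solaris 10 only.
            [ "$ver" = 10 ] || { echo "x86 requires Solaris 10" >&2; return 1; }
            ;;
        *) echo "arch must be sparc or x86" >&2; return 1 ;;
    esac
    case $ver in
        9|10) ;;
        *) echo "ver must be 9 or 10" >&2; return 1 ;;
    esac
    echo "$media/Solaris_$arch/Product/sun_cluster/Solaris_$ver/Tools"
}

# Possible use on a node (not executed by this sketch):
#   cd "$(tools_dir /cdrom/cdrom0 sparc 10)" && ./scinstall -u recover
```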

Boot each node of the second partition into cluster mode.
phys-schost# shutdown -g0 -y -i6
When the nodes of the second partition come up, the second partition resumes supporting cluster data services while running the old software with the original configuration.

Restore the original software and configuration data from backup media to the nodes in the first partition.

Boot each node in the first partition into cluster mode.
phys-schost# shutdown -g0 -y -i6
The nodes rejoin the cluster.

x86: How to Recover From a Partially Completed Dual-Partition Upgrade

Perform this procedure if a dual-partition upgrade fails and the state of the cluster meets all of the following criteria:

The nodes of the first partition have been upgraded.
None of the nodes of the second partition are yet upgraded.
None of the nodes of the second partition are in cluster mode.

You can also perform this procedure if the upgrade has succeeded on the first partition but you want to back out of the upgrade.

Note –

Do not perform this procedure after dual-partition upgrade processes have begun on the second partition. Instead, perform How to Recover from a Failed Dual-Partition Upgrade.

Before You Begin

Before you begin, ensure that all second-partition nodes are halted. First-partition nodes can be either halted or running in noncluster mode.

Perform all steps as superuser.

Boot each node in the second partition into noncluster mode by completing the following steps.

In the GRUB menu, use the arrow keys to select the appropriate Solaris entry and type e to edit its commands.

The GRUB menu appears similar to the following:

GNU GRUB version 0.95 (631K lower / 2095488K upper memory)
+-------------------------------------------------------------------------+
| Solaris 10 /sol_10_x86                                                  |
| Solaris failsafe                                                        |
|                                                                         |
+-------------------------------------------------------------------------+
Use the ^ and v keys to select which entry is highlighted.
Press enter to boot the selected OS, 'e' to edit the
commands before booting, or 'c' for a command-line.

For more information about GRUB based booting, see Booting an x86 Based System by Using GRUB (Task Map) in System Administration Guide: Basic Administration.

In the boot parameters screen, use the arrow keys to select the kernel entry and type e to edit the entry.

The GRUB boot parameters screen appears similar to the following:

GNU GRUB version 0.95 (615K lower / 2095552K upper memory)
+----------------------------------------------------------------------+
| root (hd0,0,a)                                                       |
| kernel /platform/i86pc/multiboot                                     |
| module /platform/i86pc/boot_archive                                  |
+----------------------------------------------------------------------+
Use the ^ and v keys to select which entry is highlighted.
Press 'b' to boot, 'e' to edit the selected command in the
boot sequence, 'c' for a command-line, 'o' to open a new line
after ('O' for before) the selected line, 'd' to remove the
selected line, or escape to go back to the main menu.

Add the -x option to the command to specify that the system boot into noncluster mode.

Minimal BASH-like line editing is supported.
For the first word, TAB lists possible command completions.
Anywhere else TAB lists the possible completions of a device/filename.
ESC at any time exits.

grub edit> kernel /platform/i86pc/multiboot -x

Press Enter to accept the change and return to the boot parameters screen.

The screen displays the edited command.

GNU GRUB version 0.95 (615K lower / 2095552K upper memory)
+----------------------------------------------------------------------+
| root (hd0,0,a)                                                       |
| kernel /platform/i86pc/multiboot -x                                  |
| module /platform/i86pc/boot_archive                                  |
+----------------------------------------------------------------------+
Use the ^ and v keys to select which entry is highlighted.
Press 'b' to boot, 'e' to edit the selected command in the
boot sequence, 'c' for a command-line, 'o' to open a new line
after ('O' for before) the selected line, 'd' to remove the
selected line, or escape to go back to the main menu.

Type b to boot the node into noncluster mode.

Note –
This change to the kernel boot parameter command does not persist over the system boot. The next time you reboot the node, it will boot into cluster mode. To boot into noncluster mode instead, perform these steps again to add the -x option to the kernel boot parameter command.

On each node in the second partition, run the scinstall -u recover command.
phys-schost# /usr/cluster/bin/scinstall -u recover
The command restores the original CCR information, restores the original /etc/vfstab file, and eliminates modifications for startup.

Boot each node of the second partition into cluster mode.
phys-schost# shutdown -g0 -y -i6
When the nodes of the second partition come up, the second partition resumes supporting cluster data services while running the old software with the original configuration.

Restore the original software and configuration data from backup media to the nodes in the first partition.

Boot each node in the first partition into cluster mode.
phys-schost# shutdown -g0 -y -i6
The nodes rejoin the cluster.
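The order of operations in this procedure matters: the second partition is recovered and rebooted into cluster mode before the first partition is restored and rebooted. The following shell sketch only prints that order as a checklist and executes nothing. The function name print_recovery_plan and the node names passed to it are hypothetical.

```shell
#!/bin/sh
# Hypothetical planner: prints, in order, what to do on which node.
# It executes nothing. Node names are passed as two space-separated lists.
#   $1  nodes in the second partition
#   $2  nodes in the first partition
print_recovery_plan() {
    second=$1
    first=$2
    for node in $second; do
        echo "$node: boot into noncluster mode (add -x to the GRUB kernel entry)"
    done
    for node in $second; do
        echo "$node: /usr/cluster/bin/scinstall -u recover"
    done
    for node in $second; do
        echo "$node: shutdown -g0 -y -i6"
    done
    for node in $first; do
        echo "$node: restore original software and configuration from backup"
    done
    for node in $first; do
        echo "$node: shutdown -g0 -y -i6"
    done
}

# Example with made-up node names:
print_recovery_plan "phys-schost-3 phys-schost-4" "phys-schost-1 phys-schost-2"
```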

Recovering From Storage Configuration Changes During Upgrade

This section provides the following repair procedures to follow if changes were inadvertently made to the storage configuration during upgrade:

How to Handle Storage Reconfiguration During an Upgrade

Any changes to the storage topology, including running Sun Cluster commands, should be completed before you upgrade the cluster to Solaris 9 or Solaris 10 software. If, however, changes were made to the storage topology during the upgrade, perform the following procedure. This procedure ensures that the new storage configuration is correct and that existing storage that was not reconfigured is not mistakenly altered.

Note –

This procedure provides the long forms of the Sun Cluster commands. Most commands also have short forms. Except for the forms of the command names, the commands are identical. For a list of the commands and their short forms, see Appendix B, Sun Cluster Object-Oriented Commands, in Sun Cluster System Administration Guide for Solaris OS.

Before You Begin

Ensure that the storage topology is correct. Check whether the devices that were flagged as possibly being replaced map to devices that actually were replaced. If the devices were not replaced, check for and correct possible accidental configuration changes, such as incorrect cabling.

On a node that is attached to the unverified device, become superuser.

Manually update the unverified device.
phys-schost# cldevice repair device
See the cldevice(1CL) man page for more information.

Update the DID driver.
phys-schost# scdidadm -ui
phys-schost# scdidadm -r
-u

Loads the device-ID configuration table into the kernel.

-i

Initializes the DID driver.

-r

Reconfigures the database.

Repeat Step 2 through Step 3 on all other nodes that are attached to the unverified device.
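Because the same command sequence is repeated on every attached node, it can help to wrap it in a small function. The following is a minimal sketch, not a supported tool. The helper names run and refresh_did and the DRY_RUN variable are hypothetical. The sketch also assumes that each command exits with a nonzero status on failure, which this guide does not state. By default the commands are printed rather than executed.

```shell
#!/bin/sh
# Sketch of Step 2 and Step 3 for one node. With DRY_RUN=1 (the default
# here) the commands are only printed; set DRY_RUN=0 to execute them.
# "run" and "refresh_did" are hypothetical helper names.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

#   $1  the unverified device to repair (a placeholder, as in the text)
refresh_did() {
    device=$1
    # ASSUMPTION: a nonzero exit status signals failure, so the chain
    # stops at the first command that fails.
    run cldevice repair "$device" &&
    run scdidadm -ui &&
    run scdidadm -r
}
```

Run the function once on each node that is attached to the unverified device, passing the same device argument that you would pass to cldevice repair.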

Next Steps

Return to the remaining upgrade tasks. Go to Step 4 in How to Upgrade Sun Cluster 3.2 1/09 Software (Standard).

How to Resolve Mistaken Storage Changes During an Upgrade

If accidental changes are made to the storage cabling during the upgrade, perform the following procedure to return the storage configuration to the correct state.

Note –

This procedure assumes that no physical storage was actually changed. If physical or logical storage devices were changed or replaced, instead follow the procedures in How to Handle Storage Reconfiguration During an Upgrade.

Before You Begin

Return the storage topology to its original configuration. Check the configuration of the devices that were flagged as possibly being replaced, including the cabling.

On each node of the cluster, become superuser.

Update the DID driver on each node of the cluster.
phys-schost# scdidadm -ui
phys-schost# scdidadm -r
-u

Loads the device-ID configuration table into the kernel.

-i

Initializes the DID driver.

-r

Reconfigures the database.

See the scdidadm(1M) man page for more information.

If the scdidadm command returned any error messages in Step 2, make further modifications as needed to correct the storage configuration, then repeat Step 2.
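The update-and-repeat cycle in Step 2 and Step 3 can be sketched as a bounded retry loop. This is an illustration under stated assumptions, not a supported script. The names SCDIDADM and update_did_with_retry are hypothetical. The sketch assumes that scdidadm exits with a nonzero status when it reports an error, whereas this guide speaks only of error messages, so check the command output as well.

```shell
#!/bin/sh
# Sketch of Step 2 plus the retry in Step 3, for one node.
# ASSUMPTION: scdidadm exits nonzero when it reports an error; the text
# only says "returned any error messages", so verify this on your system.
# SCDIDADM and update_did_with_retry are hypothetical names.
SCDIDADM=${SCDIDADM:-scdidadm}

#   $1  maximum number of attempts
update_did_with_retry() {
    max=$1
    attempt=1
    while [ "$attempt" -le "$max" ]; do
        if "$SCDIDADM" -ui && "$SCDIDADM" -r; then
            echo "DID driver updated on attempt $attempt"
            return 0
        fi
        echo "attempt $attempt failed: correct the storage configuration" >&2
        # A real script would pause here until the operator has fixed the
        # storage configuration; this sketch simply moves to the next attempt.
        attempt=$((attempt + 1))
    done
    return 1
}
```

The command name is read from a variable so that the loop can be exercised with a stand-in command before it is used on a cluster node.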

Next Steps

Return to the remaining upgrade tasks. Go to Step 4 in How to Upgrade Sun Cluster 3.2 1/09 Software (Standard).