Sun Cluster 2.2 System Administration Guide

Recovering From Partial Power Loss

If the Sun Cluster nodes and the multihost disk expansion units have separate power sources, a failure can take down one or more components. Several scenarios can occur. The most likely cases are:

The power to one Sun Cluster node fails, taking down only the node.
The power to one multihost disk expansion unit fails, taking down only the expansion unit.
The power to one Sun Cluster node fails, taking down the node and at least one multihost disk expansion unit.
The power to one Sun Cluster node fails, taking down the node, at least one multihost disk expansion unit, and the Terminal Concentrator.

Failure of One Node

If separate power sources are used on the nodes and the multihost disk expansion units, and you lose power to only one of the nodes, the other node detects the failure and initiates a takeover.

When power is restored to the failed node, the node reboots. You must then rejoin the node to the cluster by using the scadmin startnode command. After the node has rejoined, perform a manual switchover by using the haswitch(1M) command to restore the default logical host ownership.
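The recovery sequence above can be sketched as a small shell script. This is a minimal sketch, not a supported procedure: the names phys-hahost1 (the node that lost power and is the default master) and hahost1 (its logical host) are hypothetical placeholders, and the sketch assumes the haswitch(1M) form that takes the destination host followed by the logical host. Check the scadmin(1M) and haswitch(1M) man pages for your release before running anything. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch only: rejoin a recovered node and restore default logical host ownership.
# DEFAULT_MASTER and LOGICAL_HOST are hypothetical placeholder names.
DEFAULT_MASTER="${DEFAULT_MASTER:-phys-hahost1}"
LOGICAL_HOST="${LOGICAL_HOST:-hahost1}"
# DRY_RUN=1 (the default) prints each command instead of executing it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

recover_node() {
    # Step 1: on the node that lost power, rejoin the cluster after it reboots.
    run scadmin startnode || return 1
    # Step 2: switch the logical host back to its default master.
    # Assumed argument order: destination host, then logical host.
    run haswitch "$DEFAULT_MASTER" "$LOGICAL_HOST"
}

recover_node
```

Run the script once with the default settings to review the commands, then set DRY_RUN=0 on the recovered node to execute them. The second step is attempted only if the first one succeeds, because the switchover makes sense only after the node is back in the cluster.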

Failure of a Multihost Disk Expansion Unit

If you lose power to one of the multihost disk expansion units, your volume management software detects errors on the affected disks and takes action to put them into an error state. Disk mirroring masks this failure from the Sun Cluster fault monitoring. No switchover or takeover occurs.

When power is returned to the multihost disk expansion unit, perform the procedure documented in Chapter 11, Administering SPARCstorage Arrays, or Chapter 12, Administering Sun StorEdge MultiPacks and Sun StorEdge D1000s.

Failure of One Node and One Multihost Disk Expansion Unit

If power is lost to one of the Sun Cluster nodes and one multihost disk expansion unit, a secondary node immediately initiates a takeover.

After the power is restored, you must reboot the node, rejoin the node to the configuration by using the scadmin startnode command, and then begin monitoring activity. If manual switchover is configured, use the haswitch(1M) command to return ownership of the diskset to the node that lost power. Refer to "Switching Over Logical Hosts" for more information.

After the diskset ownership has been returned to the default master, any multihost disks that reported errors must be returned to service. Use the instructions provided in the chapters on your disk expansion unit to return the multihost disks to service.

Note - The node might finish rebooting before the multihost disk expansion unit comes up, in which case the associated disks are not accessible to the node. If this happens, reboot the node again after the multihost disk expansion unit comes up.