Sun Cluster Geographic Edition System Administration Guide

Resolving Problems With Application Resource Group Failover When Communication Lost With the Storage Device

When communication is lost between the storage device and the node on which the application is online, some application resource groups might not fail over gracefully to the nodes from which the storage is accessible. The application resource group might end up in an ERROR_STOP_FAILED state.

Solution or Workaround

The Sun Cluster infrastructure does not initiate a switchover when I/O errors occur in a volume or its underlying devices. Because no switchover or failover occurs, the device service remains online on the node that lost communication, even though the storage is no longer accessible from that node.

If this problem occurs, restart the application resource group on nodes that can access the storage by using the standard Sun Cluster procedures. For information about recovering from the ERROR_STOP_FAILED state and restarting the application, see Clearing the STOP_FAILED Error Flag on Resources in Sun Cluster Data Services Planning and Administration Guide for Solaris OS.

The Sun Cluster Geographic Edition software detects state changes in the application resource group and displays the states in the output of the geoadm status command. For more information about using this command, see Monitoring the Runtime Status of the Sun Cluster Geographic Edition Software.
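
The recovery described above might look like the following sketch. It is an illustration under stated assumptions, not the documented procedure: the resource, resource group, and node names are hypothetical placeholders, the run helper and the recover_app_rg function are invented for this example, and the scswitch options shown should be checked against the scswitch(1M) man page for the installed release (newer releases also provide the clresource and clresourcegroup commands). The script defaults to a dry run that only prints the commands.

```shell
#!/bin/sh
# Sketch only: all names below are hypothetical placeholders.
RESOURCE=app-rs              # resource whose Stop method failed
RESOURCE_GROUP=app-rg        # application resource group
FAILED_NODE=phys-node-1      # node that lost access to the storage
TARGET_NODE=phys-node-2      # node that can still access the storage
DRY_RUN=${DRY_RUN:-1}        # 1 = print the commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

recover_app_rg() {
    # Before this step, stop the application manually on FAILED_NODE,
    # as described in the referenced data services guide.

    # 1. Clear the STOP_FAILED error flag on the node where the stop failed.
    run scswitch -c -h "$FAILED_NODE" -j "$RESOURCE" -f STOP_FAILED

    # 2. Bring the resource group online on a node that can reach the storage.
    run scswitch -z -g "$RESOURCE_GROUP" -h "$TARGET_NODE"

    # 3. Confirm that Geographic Edition reports the new resource group state.
    run geoadm status
}

recover_app_rg
```

Run the script once in dry-run mode to review the commands, then set DRY_RUN=0 on a cluster node to execute them.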