Sun Cluster 3.0 U1 Data Services Installation and Configuration Guide

Clearing the STOP_FAILED Error Flag on Resources

When the Failover_mode resource property is set to NONE or SOFT and the STOP method of a resource fails, the resource goes into the STOP_FAILED state and its resource group goes into the ERROR_STOP_FAILED state. You cannot bring a resource group that is in this state online on any node, nor can you edit it: you cannot create or delete resources, or change resource-group or resource properties.

How to Clear the STOP_FAILED Error Flag on Resources

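To complete this procedure, you must supply the names of the nodes on which the resource is in the STOP_FAILED state, and the names of the affected resource and resource group.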

See the scswitch(1M) man page for additional information.


Note - Perform this procedure from any cluster node.


  1. Become superuser on a cluster member.

  2. Identify which resources have gone into the STOP_FAILED state and on which nodes.


    # scstat -g
    
  3. Manually stop the resources and their monitors on the nodes on which they are in STOP_FAILED state.

    This step might require killing processes or running resource type-specific commands or other commands.

  4. Manually set the state of these resources to OFFLINE on all the nodes on which they were manually stopped.


    # scswitch -c -h nodelist -j resource -f STOP_FAILED
    
    -c

    Clears the flag.

    -h nodelist

    Specifies the node names on which the resource was running.

    -j resource

    Specifies the name of the resource to take offline.

    -f STOP_FAILED

    Specifies the flag name.

  5. Check the resource-group state on the nodes where the STOP_FAILED flag was cleared in Step 4.

    The resource-group state should now be OFFLINE or ONLINE.


    # scstat -g
    

    If scstat -g shows that the resource group is still in the ERROR_STOP_FAILED state, run the following scswitch command to take the resource group offline on the nodes where it remains in that state.


    # scswitch -F -g resource-group
    

    -F

    Takes the resource group offline on all nodes that can master the group.

    -g resource-group

    Specifies the name of the resource group to take offline.

    The resource group can remain in the ERROR_STOP_FAILED state if it was being switched offline when the STOP method failed and the resource that failed to stop had a dependency on other resources in the resource group. Otherwise, the resource group reverts to the ONLINE or OFFLINE state automatically after you run the command in Step 4 on all STOP_FAILED resources.

    Now you can switch the resource group to the ONLINE state.
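
The command sequence in Steps 2, 4, and 5 can be sketched as a small wrapper script. This is an illustrative sketch only, not part of the Sun Cluster software. The function names (run, show_status, clear_stop_failed, offline_group), the DRY_RUN variable, and the example names phys-schost-1, nfs-res, and nfs-rg are all hypothetical. Only the scstat and scswitch invocations are taken from the procedure. Step 3 (manually stopping the resource and its monitor) depends on the resource type and is deliberately left out.

```shell
#!/bin/sh
# Sketch of Steps 2, 4, and 5 of the STOP_FAILED procedure.
# DRY_RUN (hypothetical variable) defaults to 1, so commands are only printed.
DRY_RUN=${DRY_RUN:-1}

# run CMD [ARGS...]: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "+ $*"
    else
        "$@"
    fi
}

# show_status: Steps 2 and 5, display resource-group and resource states.
show_status() {
    run scstat -g
}

# clear_stop_failed NODELIST RESOURCE
# Step 4: clear the STOP_FAILED flag. Run this only after the resource
# has been stopped by hand on the listed nodes (Step 3).
clear_stop_failed() {
    if [ $# -ne 2 ]; then
        echo "usage: clear_stop_failed nodelist resource" >&2
        return 2
    fi
    run scswitch -c -h "$1" -j "$2" -f STOP_FAILED
}

# offline_group RESOURCE_GROUP
# Step 5 fallback: take the group offline if it is still ERROR_STOP_FAILED.
offline_group() {
    if [ $# -ne 1 ]; then
        echo "usage: offline_group resource-group" >&2
        return 2
    fi
    run scswitch -F -g "$1"
}

# Example walk-through with hypothetical node, resource, and group names.
show_status
clear_stop_failed phys-schost-1 nfs-res
show_status
offline_group nfs-rg
```

With DRY_RUN left at its default, the script only prints each command prefixed with "+", which allows the sequence to be reviewed before anything is run on the cluster. Setting DRY_RUN=0 as superuser on a cluster node would execute the commands instead.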