Sun Cluster Data Services Planning and Administration Guide for Solaris OS

How to Clear the STOP_FAILED Error Flag on Resources

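To complete this procedure, you must supply the names of the nodes where the resource is in the STOP_FAILED state, the name of that resource, and the name of its resource group.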

See the scswitch(1M) man page for additional information.


Note –

Perform this procedure from any cluster node.


  1. Become superuser on a cluster member.

  2. Identify which resources have gone into the STOP_FAILED state and on which nodes.


    # scstat -g
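
On a cluster with many resources, it can help to filter the scstat -g output down to the affected lines. The following is a minimal sketch, not part of the documented procedure: find_stop_failed is a hypothetical helper name, and the pattern assumes that the state appears in the output as either "STOP_FAILED" or "Stop failed". Check the actual output on your cluster before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: read `scstat -g` output on standard input and
# print only the lines that mention a stop-failed state.
# Assumption: the state is rendered as "STOP_FAILED" or "Stop failed".
find_stop_failed() {
    grep -i 'stop[_ ]failed'
}

# Typical use on a cluster node (not executed here):
#   scstat -g | find_stop_failed
```

Each line that the filter prints should name a resource or resource group and the node on which it is in the failed state.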
    
  3. Manually stop the resources and their monitors on the nodes on which they are in STOP_FAILED state.

    This step might require that you kill processes, run commands that are specific to the resource type, or run other commands.

  4. Manually set the state of these resources to OFFLINE on all of the nodes on which you manually stopped the resources.


    # scswitch -c -h nodelist -j resource -f STOP_FAILED
    
    -c

    Clears the flag.

    -h nodelist

    Specifies a comma-separated list of the names of the nodes where the resource is in the STOP_FAILED state. The list can contain one or more node names.

    -j resource

    Specifies the name of the resource to switch offline.

    -f STOP_FAILED

    Specifies the flag name.
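
The -h option takes its node names as a single comma-separated argument, which is easy to get wrong when typing several names by hand. The sketch below assembles the command from separate arguments and prints it instead of running it, so that you can review it first. The function name clear_stop_failed_cmd and the names nfs-res, phys-schost-1, and phys-schost-2 are hypothetical and used only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the scswitch command that
# clears the STOP_FAILED flag for one resource on one or more nodes.
# Usage: clear_stop_failed_cmd resource node [node ...]
clear_stop_failed_cmd() {
    resource=$1
    shift
    nodelist=
    for node in "$@"; do
        # Join node names with commas, with no leading comma.
        nodelist=${nodelist:+$nodelist,}$node
    done
    echo "scswitch -c -h $nodelist -j $resource -f STOP_FAILED"
}

# Example with hypothetical names:
clear_stop_failed_cmd nfs-res phys-schost-1 phys-schost-2
# prints: scswitch -c -h phys-schost-1,phys-schost-2 -j nfs-res -f STOP_FAILED
```

Printing the command rather than executing it keeps the sketch safe to try on any host. Run the printed command as superuser on a cluster node only after you have manually stopped the resource as described in Step 3.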

  5. Check the resource group state on the nodes where you cleared the STOP_FAILED flag in Step 4.

    The resource group state should now be OFFLINE or ONLINE.


    # scstat -g
    

    The command scstat -g indicates whether the resource group remains in the ERROR_STOP_FAILED state. If the resource group is still in the ERROR_STOP_FAILED state, then run the following scswitch command to switch the resource group offline on the appropriate nodes.


    # scswitch -F -g resource-group
    

    -F

    Switches the resource group offline on all of the nodes that can master the group.

    -g resource-group

    Specifies the name of the resource group to switch offline.

    The resource group can remain in the ERROR_STOP_FAILED state if the resource group was being switched offline when the STOP method failure occurred and the resource that failed to stop had a dependency on other resources in the resource group. Otherwise, the resource group reverts to the ONLINE or OFFLINE state automatically after you have run the command in Step 4 on all of the STOP_FAILED resources.

    Now you can switch the resource group to the ONLINE state.
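
The decision in Step 5 can be summarized in a short sketch. It assumes that you have already read the resource group state from scstat -g and pass it in as an uppercase string. The function name next_group_action and the group name nfs-rg are hypothetical. The function only prints the suggested command or a comment and does not change the cluster.

```shell
#!/bin/sh
# Hypothetical helper: given a resource group name and the state that
# scstat -g reported for it, print the follow-up action from Step 5.
# Usage: next_group_action resource-group STATE
next_group_action() {
    group=$1
    state=$2
    case $state in
        ERROR_STOP_FAILED)
            # The group did not recover on its own: switch it offline.
            echo "scswitch -F -g $group"
            ;;
        ONLINE|OFFLINE)
            echo "# $group is $state; no further clearing is needed"
            ;;
        *)
            echo "# $group is in state $state; check scstat -g again"
            ;;
    esac
}

# Example with a hypothetical group name:
next_group_action nfs-rg ERROR_STOP_FAILED
# prints: scswitch -F -g nfs-rg
```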