A split-brain partition occurs when a subset contains exactly half of the cluster members. (This definition does not include a two-node cluster with a quorum device.) During initial installation of Sun Cluster, you were prompted to choose your preferred type of recovery from a split-brain scenario. Your choices were ask and select. If you chose ask, the system asks you, when a split-brain partition occurs, to decide which nodes should stay up. If you chose select, the system automatically selects which cluster members should stay up.
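The definition above can be expressed as a small check. This is only an illustrative sketch in Python, not Sun Cluster code: the function name is_split_brain and its parameters are hypothetical, and it assumes the two-node quorum-device exclusion is the only exception.

```python
def is_split_brain(subset_size, cluster_size, has_quorum_device=False):
    """Return True if a subset of this size counts as a split-brain partition.

    Hypothetical helper: models only the definition given in the text,
    not the actual cluster membership monitor.
    """
    # A two-node cluster with a quorum device is excluded from the definition.
    if cluster_size == 2 and has_quorum_device:
        return False
    # Split-brain: the subset holds exactly half of the cluster members.
    return subset_size * 2 == cluster_size
```

For example, under these assumptions a four-node cluster that splits into two subsets of two nodes each is a split-brain partition, while a three-to-one split is not.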
If you chose an automatic selection policy to deal with split-brain situations, your options were Lowest Nodeid or Highest Nodeid. If you chose Lowest Nodeid, then the subset containing the node with the lowest ID value becomes the new cluster. If you chose Highest Nodeid, then the subset containing the node with the highest ID value becomes the new cluster. For more details, see the section on installation procedures in the Sun Cluster 2.2 Software Installation Guide.
In either case, you must manually abort the nodes in all other subsets.
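The two automatic selection policies can be sketched as follows. This Python fragment is a simplified model for illustration only: the function name, the policy strings "lowest" and "highest", and the representation of subsets as sets of integer node IDs are all assumptions, not part of the Sun Cluster software.

```python
def select_surviving_subset(subsets, policy):
    """Pick the subset that becomes the new cluster.

    subsets: list of non-empty sets of integer node IDs (hypothetical model).
    policy:  "lowest" for Lowest Nodeid, "highest" for Highest Nodeid.
    """
    if policy == "lowest":
        # The subset containing the node with the lowest ID wins.
        return min(subsets, key=min)
    if policy == "highest":
        # The subset containing the node with the highest ID wins.
        return max(subsets, key=max)
    raise ValueError("unknown policy: %r" % (policy,))
```

With subsets {0, 1} and {2, 3}, the "lowest" policy keeps {0, 1} and the "highest" policy keeps {2, 3}. The nodes in the subset that was not selected are the ones you would then abort manually.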
If you did not choose an automatic selection policy or if the system prompts you for input at the time of the partition, then the system displays the following error message.
SUNWcluster.clustd.reconf.3010 "*** ISSUE ABORTPARTITION OR CONTINUEPARTITION *** Proposed cluster: xxx Unreachable nodes: yyy"
Additionally, a message similar to the following is displayed on the console every ten seconds:
*** ISSUE ABORTPARTITION OR CONTINUEPARTITION ***
If the unreachable nodes have formed a cluster, issue ABORTPARTITION.
(scadmin abortpartition <localnode> <clustername>)
You may allow the proposed cluster to form by issuing CONTINUEPARTITION.
(scadmin continuepartition <localnode> <clustername>)
Proposed cluster partition: 0 Unreachable nodes: 1
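If you capture this console output in a script, the proposed partition and the unreachable nodes can be pulled out of the message with a short parser. The following Python sketch is an assumption-laden illustration: the function name is hypothetical, and it assumes that node IDs are integers and that multiple IDs, if present, are separated by whitespace. Only the single-node form shown above is confirmed by the sample message.

```python
import re

# Matches both "Proposed cluster:" and "Proposed cluster partition:".
# Assumption: node IDs are integers separated by whitespace.
PARTITION_PATTERN = re.compile(
    r"Proposed cluster(?: partition)?:\s*(?P<proposed>\d+(?:\s+\d+)*)\s+"
    r"Unreachable nodes:\s*(?P<unreachable>\d+(?:\s+\d+)*)"
)

def parse_partition_message(text):
    """Return (proposed_ids, unreachable_ids), or None if no match is found."""
    match = PARTITION_PATTERN.search(text)
    if match is None:
        return None
    proposed = [int(n) for n in match.group("proposed").split()]
    unreachable = [int(n) for n in match.group("unreachable").split()]
    return proposed, unreachable
```

Applied to the sample message, this sketch yields node 0 as the proposed partition and node 1 as unreachable. Treat the result as advisory; the decision to issue abortpartition or continuepartition remains yours.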
If you did not choose an automatic selection policy, use the procedure "4.6.2 How to Choose a New Cluster" to choose the new cluster.
To restart the cluster after a split-brain failure, you must wait for the stopped node to come up entirely (it might undergo automatic reconfiguration or reboot) before you bring it back into the cluster using the scadmin startnode command.