If a node drops out of a four-node cluster as a result of a reset issued through the terminal concentrator (TC), the surviving cluster nodes cannot reserve the quorum device, because a reservation held by any one node would prevent the other healthy nodes from accessing the device. However, if you erroneously ran the scadmin startcluster command on the partitioned node, that node would form its own cluster, because it cannot communicate with any other node. No quorum reservation is in effect to prevent it from forming its own cluster.
Instead of the quorum scheme, Sun Cluster uses a cluster-wide lock (nodelock) mechanism. An unused port on the cluster's TC, or on the SSP, serves as the lock. (Campus-wide clusters use multiple TCs.) During installation, you choose the TC or SSP to use for node locking; this information is stored in the CCD. One cluster member always holds this lock for the lifetime of a cluster activation, that is, from the time the first node successfully forms a new cluster until the last node leaves the cluster. If the node holding the lock fails, the lock is automatically moved to another node.
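The documentation does not spell out how the lock is implemented, but a natural model is an exclusive TCP connection held open against the designated TC or SSP port: whichever node holds the connection holds the lock. The following C sketch illustrates that model only; the host name cluster-tc, the port number 5002, and the function acquire_nodelock() are hypothetical placeholders for the values chosen at installation and recorded in the CCD.

/*
 * Hypothetical sketch: model the nodelock as an exclusive TCP connection
 * to an unused terminal concentrator (or SSP) port.  The host and port
 * are placeholders for the TC/SSP chosen during installation and stored
 * in the CCD.
 */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <netdb.h>
#include <sys/socket.h>

/* Try to take the lock; return the connected socket, or -1 on failure. */
int
acquire_nodelock(const char *tc_host, const char *tc_port)
{
    struct addrinfo hints, *res, *rp;
    int fd = -1;

    memset(&hints, 0, sizeof (hints));
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;

    if (getaddrinfo(tc_host, tc_port, &hints, &res) != 0)
        return (-1);

    for (rp = res; rp != NULL; rp = rp->ai_next) {
        fd = socket(rp->ai_family, rp->ai_socktype, rp->ai_protocol);
        if (fd == -1)
            continue;
        if (connect(fd, rp->ai_addr, rp->ai_addrlen) == 0)
            break;              /* holding the connection == holding the lock */
        close(fd);
        fd = -1;
    }
    freeaddrinfo(res);
    return (fd);                /* caller keeps fd open for the cluster's lifetime */
}

int
main(void)
{
    /* "cluster-tc" and port "5002" are hypothetical placeholders. */
    int lockfd = acquire_nodelock("cluster-tc", "5002");

    if (lockfd == -1) {
        fprintf(stderr, "Failed to obtain NodeLock\n");
        return (1);             /* a first node would abort here */
    }
    printf("Obtained Nodelock\n");
    /* ... hold lockfd open until this node releases or loses the lock ... */
    close(lockfd);
    return (0);
}

In this model, the lock is released simply by closing the connection (or by the holding node failing), which is consistent with the lock moving to another node when its holder fails.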
The only function of the nodelock is to prevent an operator error from starting a new cluster in a split-brain scenario.
The first node to join the cluster aborts if it cannot obtain this lock. However, the second and subsequent nodes of the cluster do not fail or abort if they cannot obtain it.
Node locking functions in this way:
If the first node to form a new cluster is unable to acquire this lock, it aborts with the following message:
[SUNWcluster.reconf.nodelock.4002] $clustname Failed to obtain NodeLock status = ??
If the first node to form a new cluster acquires this lock, the following message is displayed:
[SUNWcluster.reconf.nodelock.1000] $clustname Obtained Nodelock
If one of the current cluster nodes is unable to acquire this lock during a reconfiguration, an error message is logged on the system console:
[SUNWcluster.reconf.nodelock.3004] $clustname WARNING: Failed to Force obtain NodeLock status = ??
This message warns you that the lock could not be acquired. Diagnose and fix this error as soon as possible to prevent future problems.
If a partitioned node tries to form its own cluster (by using the scadmin startcluster command), it cannot acquire the cluster lock while the cluster is active in another partition. Failure to acquire the lock causes the node to abort.
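To tie these rules together, here is a sketch of the per-node decision logic, reusing the hypothetical acquire_nodelock() helper from the earlier example (the two sketches must be compiled and linked together). Only a node forming a new cluster aborts on failure; nodes joining an existing cluster log a warning and continue. The message strings echo the console messages shown above but are illustrative, not the actual Sun Cluster daemon code.

/*
 * Hypothetical sketch of the nodelock policy during reconfiguration.
 * acquire_nodelock() is the placeholder helper from the previous example.
 */
#include <stdio.h>
#include <stdlib.h>

extern int acquire_nodelock(const char *tc_host, const char *tc_port);

/* first_node is nonzero when this node is forming a new cluster. */
void
nodelock_step(int first_node, const char *tc_host, const char *tc_port)
{
    int lockfd = acquire_nodelock(tc_host, tc_port);

    if (lockfd != -1) {
        /* Mirrors the "Obtained Nodelock" console message. */
        printf("Obtained Nodelock\n");
        return;
    }

    if (first_node) {
        /*
         * A node forming a new cluster (scadmin startcluster) must hold
         * the lock; if another partition already holds it, abort.
         */
        fprintf(stderr, "Failed to obtain NodeLock\n");
        abort();
    }

    /*
     * Second and subsequent nodes only log a warning; the cluster keeps
     * running, but the condition should be diagnosed and fixed.
     */
    fprintf(stderr, "WARNING: Failed to Force obtain NodeLock\n");
}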