Oracle Solaris Cluster 4.1 Hardware Administration Manual
When you perform a Dynamic Reconfiguration (DR) remove operation on a memory board with kernel cage memory, the affected node becomes unresponsive. Heartbeat monitoring for that node is therefore suspended on all other nodes, and the node's quorum vote count is set to 0. After the DR operation completes, heartbeat monitoring of the affected node is automatically re-enabled and the quorum vote count is reset to 1. If the DR operation does not complete, you might need to recover manually. For general information about DR, see Dynamic Reconfiguration Support in Oracle Solaris Cluster Concepts Guide.
The monitor-heartbeat subcommand is not supported in an exclusive-IP zone cluster. For more information about this command, see the cluster(1CL) man page.
When you use a DR operation to remove a system board containing kernel cage memory (memory used by the Oracle Solaris OS), the system must be quiesced in order to allow the memory contents to be copied to another system board. In a clustered system, the tight coupling between cluster nodes means that the quiescing of one node for repair can cause operations on non-quiesced nodes to be delayed until the repair operation is complete and the node is unquiesced. For this reason, using DR to remove a system board containing kernel cage memory from a cluster node requires careful planning and preparation.
Use the following information to reduce the impact of the DR quiesce on the rest of the cluster:
I/O operations for file systems or global device groups with their primary or secondary on the quiesced node will hang until the node is unquiesced. If possible, ensure that the node being repaired is not the primary for any global file systems or device groups.
I/O to Solaris Volume Manager (SVM) multi-owner disksets that include the quiesced node will hang until the node is unquiesced.
Updates to the CCR require communication between all cluster members. Any operations that result in CCR updates should not be performed while the DR operation is ongoing. Configuration changes are the most common cause of CCR updates.
Many cluster commands result in communication among cluster nodes. Refrain from running cluster commands during the DR operation.
Applications and cluster resources on the node being quiesced will be unavailable for the duration of the DR event. The time required to move applications and resources to another node should be weighed against the expected outage time of the DR event.
Scalable applications such as Oracle RAC often have a different membership standard and perform communication and synchronization actions among their members. Bring scalable application instances on the node to be repaired offline before you initiate the DR operation.
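The first item in the list above, checking whether the node to be repaired is a primary or secondary for any device group, could be scripted roughly as follows. This is a minimal sketch, not a supported tool. The function name groups_on_node is hypothetical, and the input layout is an assumption: a "--- Device Group Status ---" section whose data rows have four whitespace-separated columns (group name, primary, secondary, status), as commonly printed by cldevicegroup status. That command is not named in this section, so verify its output on your own system before relying on the filter.

```shell
#!/bin/sh
# Hypothetical helper: list device groups whose primary or secondary
# is the given node. Reads device group status text on stdin.
# ASSUMPTION: data rows follow a "--- Device Group Status ---" header and
# have four columns: group name, primary, secondary, status.
groups_on_node() {
    node=$1
    awk -v node="$node" '
        /^--- Device Group Status/ { in_dg = 1; next }  # start of the assumed section
        /^--- /                    { in_dg = 0 }        # any other section ends it
        in_dg && NF == 4 && ($2 == node || $3 == node) {
            role = ($2 == node) ? "primary" : "secondary"
            print $1, role
        }
    '
}
```

On a live cluster the function might be used as cldevicegroup status | groups_on_node pnode2, where pnode2 stands for the node that is about to be quiesced. Any group reported as "primary" is a candidate for switching to another node before the DR operation starts.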
If the DR operation does not complete, perform the following steps to re-enable heartbeat timeout monitoring for that node and to reset the quorum vote count.
From a single cluster node (which is not the node where the DR operation was performed), run the following command.
# cluster monitor-heartbeat
Use this command only in the global zone. Messages are displayed that indicate monitoring has been enabled.
If the node is at the ok prompt, boot it now.
# clquorum status
--- Quorum Votes by Node (current status) ---
Node Name       Present       Possible       Status
---------       -------       --------       ------
pnode1          1             1              Online
pnode2          1             1              Online
pnode3          0             0              Online
# clquorum votecount -n nodename 1
Here, nodename is the hostname of the node that has a quorum vote count of 0.
# clquorum status
--- Quorum Votes by Node (current status) ---
Node Name       Present       Possible       Status
---------       -------       --------       ------
pnode1          1             1              Online
pnode2          1             1              Online
pnode3          1             1              Online
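The check for nodes with a quorum vote count of 0 can also be done by filtering the clquorum status output rather than reading it by eye. The following is a minimal sketch under stated assumptions: the function name zero_vote_nodes is hypothetical, and the filter assumes the four-column node table shown in this procedure (Node Name, Present, Possible, Status) under the "--- Quorum Votes by Node" header. It only prints node names and changes nothing on the cluster.

```shell
#!/bin/sh
# Hypothetical helper: print the names of nodes whose "Possible" quorum
# vote count is 0. Reads `clquorum status` output on stdin.
# ASSUMPTION: node rows follow the "--- Quorum Votes by Node" header and
# have four columns: node name, present, possible, status.
zero_vote_nodes() {
    awk '
        /^--- Quorum Votes by Node/ { in_nodes = 1; next }  # start of the node table
        /^--- /                     { in_nodes = 0 }        # any other section ends it
        # Keep numeric data rows only; header and separator rows do not match.
        in_nodes && NF == 4 && $2 ~ /^[0-9]+$/ && $3 == 0 { print $1 }
    '
}
```

On a live cluster this might be run as clquorum status | zero_vote_nodes. Each name printed is a candidate for the clquorum votecount -n nodename 1 command described in this procedure. An empty result suggests that no node vote count needs to be reset.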