Oracle® Solaris Cluster 4.2 Hardware Administration Manual

Updated: July 2014, E39726–01

Preparing the Cluster for Kernel Cage Dynamic Reconfiguration

When you use a dynamic reconfiguration operation to remove a system board containing kernel cage memory (memory used by the Oracle Solaris OS), the system must be quiesced in order to allow the memory contents to be copied to another system board. In a clustered system, the tight coupling between cluster nodes means that the quiescing of one node for repair can cause operations on non-quiesced nodes to be delayed until the repair operation is complete and the node is unquiesced. For this reason, using dynamic reconfiguration to remove a system board containing kernel cage memory from a cluster node requires careful planning and preparation.

Use the following information to reduce the impact of the dynamic reconfiguration quiesce on the rest of the cluster:

  • I/O operations for file systems or global device groups whose primary or secondary is on the quiesced node will hang until the node is unquiesced. If possible, ensure that the node being repaired is not the primary for any global file systems or device groups.

  • I/O to Solaris Volume Manager (SVM) multi-owner disksets that include the quiesced node will hang until the node is unquiesced.

  • Updates to the Cluster Configuration Repository (CCR) require communication among all cluster members. Do not perform any operation that results in a CCR update while the dynamic reconfiguration operation is in progress. Configuration changes are the most common cause of CCR updates.

  • Many cluster commands result in communication among cluster nodes. Refrain from running cluster commands during the dynamic reconfiguration operation.

  • Applications and cluster resources on the node being quiesced will be unavailable for the duration of the dynamic reconfiguration event. The time required to move applications and resources to another node should be weighed against the expected outage time of the dynamic reconfiguration event.

  • Scalable applications such as Oracle RAC often maintain their own membership, separate from cluster membership, and perform communication and synchronization among their members. Bring scalable application instances on the node to be repaired offline before you initiate the dynamic reconfiguration operation.
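
The considerations above can be turned into a simple planning checklist. The following Python sketch is illustrative only and is not part of Oracle Solaris Cluster: the `NodePlan` class, its fields, and the `preparation_steps` function are hypothetical names, and all values are entered by hand rather than read from the cluster. It also assumes that "weighing" the move time against the outage time reduces to a direct comparison of two estimates, which is a simplification.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class NodePlan:
    """Hypothetical, hand-entered description of the node to be quiesced."""
    node: str
    primary_device_groups: List[str] = field(default_factory=list)
    scalable_resource_groups: List[str] = field(default_factory=list)
    failover_resource_groups: List[str] = field(default_factory=list)
    est_move_seconds: float = 0.0     # estimated time to move resources to another node
    est_quiesce_seconds: float = 0.0  # expected outage of the DR quiesce


def preparation_steps(plan: NodePlan) -> List[str]:
    """Return an ordered list of preparation reminders for one node."""
    steps: List[str] = []

    # The node being repaired should not be primary for any device group.
    for dg in plan.primary_device_groups:
        steps.append(f"Move the primary of device group {dg} off {plan.node}")

    # Scalable application instances should be offline before the operation.
    for rg in plan.scalable_resource_groups:
        steps.append(f"Bring scalable resource group {rg} offline on {plan.node}")

    # Assumption: move resources only if that is faster than waiting out the quiesce.
    if plan.failover_resource_groups:
        if plan.est_move_seconds < plan.est_quiesce_seconds:
            for rg in plan.failover_resource_groups:
                steps.append(f"Move resource group {rg} off {plan.node}")
        else:
            steps.append(
                "Leave failover resource groups in place; moving them is "
                "estimated to take at least as long as the quiesce"
            )

    # Cluster commands and CCR updates need communication among all nodes.
    steps.append(
        "Postpone cluster commands and configuration changes until "
        f"{plan.node} is unquiesced"
    )
    return steps


if __name__ == "__main__":
    example = NodePlan(
        node="node1",
        primary_device_groups=["dg1"],
        scalable_resource_groups=["rac-rg"],
        failover_resource_groups=["nfs-rg"],
        est_move_seconds=30,
        est_quiesce_seconds=120,
    )
    for number, step in enumerate(preparation_steps(example), start=1):
        print(f"{number}. {step}")
```

The sketch only orders the reminders. In practice, the lists and time estimates would come from the administrator's own review of the cluster state, and the final decision to move resources may depend on factors that a single comparison does not capture.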