Oracle Solaris Cluster 3.3 3/13 Hardware Administration Manual

Kernel Cage DR Recovery

When you perform a Dynamic Reconfiguration (DR) remove operation on a memory board that contains kernel cage memory, the affected node becomes unresponsive. Heartbeat monitoring for that node is therefore suspended on all other nodes, and the node's quorum vote count is set to 0. After the DR operation completes, heartbeat monitoring of the affected node is automatically re-enabled and its quorum vote count is reset to 1. If the DR operation does not complete, you might need to recover manually. For general information about DR, see Dynamic Reconfiguration Support in Oracle Solaris Cluster Concepts Guide.

The monitor-heartbeat subcommand is not supported in an exclusive-IP zone cluster. For more information about this command, see the cluster(1CL) man page.
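
For example, from the global zone of any node other than the one being reconfigured, you can observe this behavior while the DR operation is in progress. The following is a sketch: the clquorum status command is the same one used later in this procedure, while the cluster show -t global form for listing the global heartbeat properties is an assumption based on the cluster(1CL) command rather than a step from this manual.

    # clquorum status

    The node being reconfigured is reported with Present and Possible vote counts of 0 until the DR operation completes.

    # cluster show -t global | grep -i heartbeat

    Lists the global heartbeat properties that govern the suspended heartbeat monitoring.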

Preparing the Cluster for Kernel Cage DR

When you use a DR operation to remove a system board that contains kernel cage memory (memory used by the Oracle Solaris OS), the system must be quiesced so that the memory contents can be copied to another system board. In a clustered system, the tight coupling between cluster nodes means that quiescing one node for repair can delay operations on the non-quiesced nodes until the repair operation is complete and the node is unquiesced. For this reason, using DR to remove a system board that contains kernel cage memory from a cluster node requires careful planning and preparation.

Use the information in this section to reduce the impact of the DR quiesce on the rest of the cluster.
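
For example, one useful preparation check is to review the cluster's global heartbeat settings before you start the DR operation, so that you know how long the other nodes wait for the quiesced node. The following commands are a sketch only: the heartbeat_timeout property and the 60000 millisecond value are assumptions based on the standard global cluster properties, not settings required by this manual.

    # cluster show -t global | grep -i heartbeat

    If you expect the quiesce to last longer than the current timeout, the value can be raised before the DR operation, for example:

    # cluster set -p heartbeat_timeout=60000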

How to Recover From an Interrupted Kernel Cage DR Operation

If the DR operation does not complete, perform the following steps to re-enable heartbeat timeout monitoring for that node and to reset the quorum vote count.

  1. If DR does not complete successfully, manually re-enable heartbeat timeout monitoring.

    From one cluster node that is not the node where the DR operation was performed, run the following command.

    # cluster monitor-heartbeat

    Use this command only in the global zone. Messages are displayed to indicate that monitoring has been enabled.

  2. If the node that was dynamically reconfigured paused during boot, allow it to finish booting and join the cluster membership.

    If the node is at the ok prompt, boot it now.
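
    For example, on a SPARC based node that stopped at the OpenBoot PROM ok prompt, a typical sequence is the following sketch; the clnode status check is an assumption, offered as one way to confirm that the node rejoined the cluster.

    ok boot

    After the node finishes booting, confirm its membership from another cluster node:

    # clnode status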

  3. Verify that the node is now part of the cluster membership and check the quorum vote count of the cluster nodes by running the following command on a single node in the cluster.
    # clquorum status
    --- Quorum Votes by Node (current status) ---
    
    Node Name       Present       Possible       Status
    ---------       -------       --------       ------
    pnode1          1             1              Online
    pnode2          1             1              Online
    pnode3          0             0              Online
  4. If one of the nodes has a vote count of 0, reset its vote count to 1 by running the following command on a single node in the cluster.
    # clquorum votecount -n nodename 1
    nodename

    The hostname of the node that has a quorum vote count of 0.
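
    For example, using the pnode3 node from the sample output in Step 3:

    # clquorum votecount -n pnode3 1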

  5. Verify that all nodes now have a quorum vote count of 1.
    # clquorum status
    --- Quorum Votes by Node (current status) ---
    
    Node Name       Present       Possible       Status
    ---------       -------       --------       ------
    pnode1          1             1              Online
    pnode2          1             1              Online
    pnode3          1             1              Online
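
As an optional follow-up, not part of this procedure, you can also review overall cluster health from any node, for example:

    # cluster status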