Sun Cluster 3.0 12/01 Concepts

Dynamic Reconfiguration Support

Sun Cluster 3.0 support for the dynamic reconfiguration (DR) software feature is being developed in incremental phases. This section describes concepts and considerations for Sun Cluster 3.0 12/01 support of the DR feature.

Note that all of the requirements, procedures, and restrictions that are documented for the Solaris 8 DR feature also apply to Sun Cluster DR support (except for the operating environment quiescence operation). Therefore, review the documentation for the Solaris 8 DR feature before using the DR feature with Sun Cluster software. In particular, review the issues that affect non-network I/O devices during a DR detach operation. The Sun Enterprise 10000 Dynamic Reconfiguration User Guide and the Sun Enterprise 10000 Dynamic Reconfiguration Reference Manual (from the Solaris 8 on Sun Hardware collection) are both available for download from http://docs.sun.com.

Dynamic Reconfiguration General Description

The DR feature allows operations, such as the removal of system hardware, in running systems. The DR processes are designed to ensure continuous system operation with no need to halt the system or interrupt cluster availability.

DR operates at the board level. Therefore, a DR operation affects all of the components on a board. Each board can contain multiple components, including CPUs, memory, and peripheral interfaces for disk drives, tape drives, and network connections.

Removing a board terminates the system's ability to use any of the components on the board. Before removing a board, the DR subsystem determines whether the components on the board are being used. Removing a device that is being used would result in system errors. If the DR subsystem finds that a device is in use, it rejects the DR remove-board operation. Therefore, it is always safe to issue a DR remove-board operation.
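The check-then-reject behavior described above can be sketched as a small Python model. This is only an illustration of the rule, not the DR subsystem's implementation; the Board and Component classes, the try_remove_board function, and the device names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Component:
    name: str
    in_use: bool = False


@dataclass
class Board:
    name: str
    components: List[Component] = field(default_factory=list)


def try_remove_board(board: Board) -> Tuple[bool, List[str]]:
    """Return (accepted, busy_devices) for a hypothetical remove-board request.

    The request is rejected when any device on the board is in use,
    so issuing the request never takes a busy device away.
    """
    busy = [c.name for c in board.components if c.in_use]
    return (not busy, busy)


# Hypothetical board with one busy device and one idle device.
board = Board("board0", [Component("disk-ctrl", in_use=True), Component("net0")])
accepted, busy = try_remove_board(board)
print(accepted, busy)
```

Because the function only inspects state and reports which devices are busy, a rejected request leaves the modeled board unchanged, which is the sense in which the operation is safe to issue.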

The DR add-board operation is also always safe. CPUs and memory on a newly added board are automatically brought into service by the system. However, the system administrator must manually configure the cluster to actively use other components that are on the newly added board.


Note -

The DR subsystem has several levels. If a lower level reports an error, the upper level also reports an error. However, although the lower level reports the specific error, the upper level reports only "Unknown error." System administrators should ignore the "Unknown error" reported by the upper level.


The following sections describe DR considerations for the different device types.

DR Clustering Considerations for CPU Devices

When a DR remove-board operation affects CPUs on the board, the DR subsystem allows the operation and automatically makes the node stop using these CPUs.

When a DR add-board operation affects CPUs on the added board, the DR subsystem automatically makes the node start using these CPUs.

DR Clustering Considerations for Memory

For the purposes of DR, there are two types of memory to consider. These two types differ only in usage. The actual hardware is the same for both types.

The memory used by the operating system is called the kernel memory cage. Sun Cluster software does not support remove-board operations on a board that contains the kernel memory cage and will reject any such operation. When a DR remove-board operation affects memory other than the kernel memory cage, the DR subsystem allows the operation and automatically makes the node stop using that memory.

When a DR add-board operation affects memory, the DR subsystem automatically makes the node start using the new memory.

DR Clustering Considerations for Disk and Tape Drives

DR remove operations on active drives in the primary node are not allowed. DR remove operations can be performed on non-active drives in the primary node and on drives in the secondary node. Cluster data access continues both before and after the DR operation.


Note -

DR operations that affect the availability of quorum devices are not allowed. For considerations about quorum devices and the procedure for performing DR operations on them, see "DR Clustering Considerations for Quorum Devices".


The following steps briefly summarize the procedure for performing a DR remove operation on a disk or tape drive. See the Sun Cluster 3.0 U1 System Administration Guide for detailed instructions on how to perform these actions.

  1. Determine whether the disk or tape drive is part of an active device group.

    • If the drive is not part of an active device group, you can perform the DR remove operation on it.

    • If the drive is part of an active device group, go to Step 2. If a DR remove-board operation would affect an active disk or tape drive, the system rejects the operation and identifies the drives that would be affected by the operation.

  2. Determine whether the drive is a component of the primary node or the secondary node.

    • If the drive is a component of the secondary node, you can perform the DR remove operation on it.

    • If the drive is a component of the primary node, you must switch the primary and secondary nodes before performing the DR remove operation on the device.
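The two-step decision above can be written as a small function. This is a sketch of the decision logic only; the Action enum and plan_drive_removal are hypothetical names, and the two boolean inputs are assumed to have been determined already by the administrator.

```python
from enum import Enum


class Action(Enum):
    REMOVE = "perform the DR remove operation"
    SWITCH_THEN_REMOVE = "switch the primary and secondary nodes, then remove"


def plan_drive_removal(in_active_device_group: bool, on_primary_node: bool) -> Action:
    """Map the two checks in the procedure to the action they call for."""
    # Step 1: a drive outside any active device group can be removed directly.
    if not in_active_device_group:
        return Action.REMOVE
    # Step 2: in an active device group, a drive in the secondary node
    # can be removed directly.
    if not on_primary_node:
        return Action.REMOVE
    # A drive in the primary node requires a switchover first.
    return Action.SWITCH_THEN_REMOVE


for in_group in (False, True):
    for on_primary in (False, True):
        print(in_group, on_primary, plan_drive_removal(in_group, on_primary).value)
```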


Caution -

If the current primary node fails while you are performing the DR operation on a secondary node, cluster availability is impacted. The primary node has no place to fail over until a new secondary node is provided.


DR Clustering Considerations for Quorum Devices

DR remove operations cannot be performed on a device that is currently configured as a quorum device. If the DR remove-board operation would affect a quorum device, the system rejects the operation and identifies the quorum device that would be affected by the operation. You must disable the device as a quorum device before you can perform a DR remove operation on it.

The following steps briefly summarize the procedure for performing a DR remove operation on a quorum device. See the Sun Cluster 3.0 U1 System Administration Guide for detailed instructions on how to perform these actions.

  1. Enable a device other than the one you are performing the DR operation on to be the quorum device.

  2. Disable the device you are performing the DR operation on as a quorum device.

  3. Perform the DR remove operation on the device.
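The ordering of these three steps matters: the replacement quorum device is enabled before the old one is disabled. The following Python model illustrates that ordering and the rejection rule. It does not call any Sun Cluster command; QuorumConfig, dr_remove, replace_quorum_and_remove, and the device names d12 and d20 are hypothetical.

```python
class QuorumConfig:
    """Hypothetical record of which devices are configured as quorum devices."""

    def __init__(self, devices):
        self.devices = set(devices)

    def enable(self, device):
        self.devices.add(device)

    def disable(self, device):
        self.devices.discard(device)


def dr_remove(device, quorum):
    """Model of the rejection rule: a configured quorum device cannot be removed."""
    if device in quorum.devices:
        raise RuntimeError(f"{device} is configured as a quorum device")
    return f"removed {device}"


def replace_quorum_and_remove(old, new, quorum):
    quorum.enable(new)             # Step 1: enable a different quorum device
    quorum.disable(old)            # Step 2: disable the device to be removed
    return dr_remove(old, quorum)  # Step 3: perform the DR remove operation


quorum = QuorumConfig({"d12"})
print(replace_quorum_and_remove("d12", "d20", quorum))
print(sorted(quorum.devices))
```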

DR Clustering Considerations for Private Interconnect Interfaces

DR operations cannot be performed on active private interconnect interfaces. If the DR remove-board operation would affect an active private interconnect interface, the system rejects the operation and identifies the interface that would be affected by the operation. You must disable an active interface before you remove it (see also the caution below). When an interface is replaced in the private interconnect, its state remains the same, which avoids the need for additional Sun Cluster reconfiguration steps.

The following steps briefly summarize the procedure for performing a DR remove operation on a private interconnect interface. See the Sun Cluster 3.0 U1 System Administration Guide for detailed instructions on how to perform these actions.


Caution -

Sun Cluster requires that each cluster node has at least one functioning path to every other cluster node. Do not disable a private interconnect interface that supports the last path to any cluster node.


  1. Disable the transport cable that contains the interconnect interface upon which you are performing the DR operation.

  2. Perform the DR remove operation on the physical private interconnect interface.
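The last-path restriction in the caution can be checked before a transport cable is disabled. The sketch below assumes a simplified topology in which every cable connects two nodes directly (no transport junctions); the Cables mapping, is_last_path, disable_for_dr, and the cable and node names are hypothetical and do not correspond to any Sun Cluster interface.

```python
from typing import Dict, FrozenSet, Tuple

# cable id -> (pair of node names the cable connects, whether it is enabled)
Cables = Dict[str, Tuple[FrozenSet[str], bool]]


def is_last_path(cable_id: str, cables: Cables) -> bool:
    """True if disabling cable_id would leave its node pair with no enabled path."""
    pair, enabled = cables[cable_id]
    if not enabled:
        return False  # already disabled, so it carries no path
    others = [
        cid for cid, (p, en) in cables.items()
        if cid != cable_id and en and p == pair
    ]
    return not others


def disable_for_dr(cable_id: str, cables: Cables) -> None:
    """Step 1 of the procedure, guarded by the last-path check."""
    if is_last_path(cable_id, cables):
        raise RuntimeError(f"{cable_id} is the last path between its nodes")
    pair, _ = cables[cable_id]
    cables[cable_id] = (pair, False)


cables: Cables = {
    "cable1": (frozenset({"node1", "node2"}), True),
    "cable2": (frozenset({"node1", "node2"}), True),
}
disable_for_dr("cable1", cables)       # allowed: cable2 still connects the nodes
print(is_last_path("cable2", cables))  # cable2 is now the only enabled path
```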

DR Clustering Considerations for Public Network Interfaces

DR remove operations can be performed on public network interfaces that are not active. If the DR remove-board operation would affect an active public network interface, the system rejects the operation and identifies the interface that would be affected by the operation. You must first change any active public network interface so that it is no longer the active adapter instance in a Network Adapter Failover (NAFO) group.


Caution -

If the active network adapter fails while you are performing the DR remove operation on the disabled network adapter, availability is impacted. The active adapter has no place to fail over for the duration of the DR operation.


The following steps briefly summarize the procedure for performing a DR remove operation on a public network interface. See the Sun Cluster 3.0 U1 System Administration Guide for detailed instructions on how to perform these actions.

  1. Switch the active adapter to be a backup adapter so that it can be removed from the NAFO group.

  2. Remove the adapter from the NAFO group.

  3. Perform the DR operation on the public network interface.
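The first two steps can be modeled as state changes on a NAFO group. The following Python sketch shows only the ordering: switch the active role away, then remove the adapter from the group. The NafoGroup class, prepare_adapter_for_dr, and the names nafo0, qfe0, and qfe1 are hypothetical, and the DR operation itself (Step 3) is outside the model.

```python
class NafoGroup:
    """Hypothetical model of a NAFO group with one active adapter."""

    def __init__(self, name, adapters, active):
        if active not in adapters:
            raise ValueError("active adapter must be a member of the group")
        self.name = name
        self.adapters = list(adapters)
        self.active = active

    def switch_active(self, new_active):
        if new_active not in self.adapters:
            raise ValueError(f"{new_active} is not in group {self.name}")
        self.active = new_active

    def remove_adapter(self, adapter):
        if adapter == self.active:
            raise RuntimeError("cannot remove the active adapter")
        self.adapters.remove(adapter)


def prepare_adapter_for_dr(group, adapter):
    """Run Steps 1 and 2 so that the adapter is ready for the DR operation."""
    if adapter == group.active:
        backups = [a for a in group.adapters if a != adapter]
        if not backups:
            raise RuntimeError("no backup adapter to switch to")
        group.switch_active(backups[0])  # Step 1: make the adapter a backup
    group.remove_adapter(adapter)        # Step 2: remove it from the group


group = NafoGroup("nafo0", ["qfe0", "qfe1"], active="qfe0")
prepare_adapter_for_dr(group, "qfe0")
print(group.active, group.adapters)
```

After these steps the group in this model has a single adapter, which matches the caution above: until the DR operation finishes, the remaining active adapter has no backup to fail over to.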