Oracle Solaris Cluster Concepts Guide     Oracle Solaris Cluster 4.1
SPARC: Dynamic Reconfiguration Support

Oracle Solaris Cluster supports the dynamic reconfiguration (DR) software feature. This section describes concepts and considerations for Oracle Solaris Cluster support of the DR feature.

All the requirements, procedures, and restrictions that are documented for the Oracle Solaris DR feature also apply to Oracle Solaris Cluster DR support (except for the operating environment quiescence operation). Therefore, review the documentation for the Oracle Solaris DR feature before using the DR feature with Oracle Solaris Cluster software. In particular, review the issues that affect non-network I/O devices during a DR detach operation.

DR implementation can be system dependent, and might be implemented differently as technology changes. For more information, see Oracle Solaris Cluster Data Service for Oracle VM Server for SPARC Guide.

SPARC: Dynamic Reconfiguration General Description

The DR feature enables operations, such as the removal of system hardware, in running systems. The DR processes are designed to ensure continuous system operation with no need to halt the system or interrupt cluster availability.

DR operates at the board level. Therefore, a DR operation affects all the components on a board. Each board can contain multiple components, including CPUs, memory, and peripheral interfaces for disk drives, tape drives, and network connections.

Removing a board that contains active components would result in system errors. Before removing a board, the DR subsystem queries other subsystems, such as Oracle Solaris Cluster, to determine whether the components on the board are being used. If the DR subsystem finds that a board is in use, the DR remove-board operation is not performed. Therefore, it is always safe to issue a DR remove-board operation because the DR subsystem rejects operations on boards that contain active components.
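The query-and-reject behavior described above can be pictured with a small model. The following is only an illustrative sketch in Python, not the DR subsystem's actual interface: the names Board, request_remove_board, and cluster_query, and the component names, are all hypothetical. Each subsystem is modeled as a function that reports which components on a board it is using, and the remove-board request is rejected if any subsystem reports a component in use.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Board:
    """Hypothetical model of a system board and the components it carries."""
    name: str
    components: Tuple[str, ...]


# A subsystem is modeled as a callable that returns the components
# on the board that it is currently using (empty list if none).
InUseQuery = Callable[[Board], List[str]]


def request_remove_board(
    board: Board, subsystems: Dict[str, InUseQuery]
) -> Tuple[bool, Dict[str, List[str]]]:
    """Ask every subsystem whether the board is in use.

    Returns (allowed, in_use), where in_use maps each objecting
    subsystem to the components it reported as active.
    """
    in_use: Dict[str, List[str]] = {}
    for name, query in subsystems.items():
        busy = query(board)
        if busy:
            in_use[name] = busy
    # The request is allowed only if no subsystem reported active components.
    return (not in_use, in_use)


def cluster_query(board: Board) -> List[str]:
    """Hypothetical stand-in for the cluster software's answer."""
    active = {"interconnect-if0", "quorum-disk-if"}
    return [c for c in board.components if c in active]


subsystems = {"cluster": cluster_query}
board_a = Board("board-a", ("cpu0", "interconnect-if0"))
board_b = Board("board-b", ("cpu1", "mem1"))

print(request_remove_board(board_a, subsystems))
print(request_remove_board(board_b, subsystems))
```

In this model the request for board-a is rejected and the objecting component is identified, while the request for board-b is allowed. The request itself changes nothing when it is rejected, which is why issuing it is safe.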

The DR add-board operation is also always safe. CPUs and memory on a newly added board are automatically brought into service by the system. However, the system administrator must manually configure the cluster to actively use components that are on the newly added board.


Note - The DR subsystem has several levels. If a lower level reports an error, the upper level also reports an error. However, when the lower level reports the specific error, the upper level reports Unknown error. You can safely ignore this error.


The following sections describe DR considerations for the different device types.

SPARC: DR Clustering Considerations for CPU Devices

Oracle Solaris Cluster software does not reject a DR remove-board operation because of the presence of CPU devices.

When a DR add-board operation succeeds, CPU devices on the added board are automatically incorporated in system operation.

SPARC: DR Clustering Considerations for Memory

For the purposes of DR, there are two types of memory: kernel memory cage and non-kernel memory cage.

These two types differ only in usage. The actual hardware is the same for both types. Kernel memory cage is the memory that is used by the Oracle Solaris Operating System.

Consider carefully before performing a DR remove-board operation that affects kernel memory cage. Oracle Solaris Cluster software does not reject the operation, but in most cases, such a DR operation has a significant impact on the entire cluster. Cluster nodes, multiple instances of scalable applications, and the primary and secondary nodes of HA applications and services are tightly coupled. As a result, quiescing one node for repair can delay operations on non-quiesced nodes until the repair operation is complete and the node is unquiesced.

In most cases, the preferred method of removing or replacing a system board with kernel cage memory is to bring down the node that requires repair. This allows the remainder of the cluster to cleanly take over the duties of the node being repaired. Use DR to remove or replace a system board with kernel cage memory while the node is still part of the operating cluster only when circumstances prevent the node that requires repair from being brought out of the cluster. For suggestions on preparing the cluster for a DR kernel cage remove-board operation, see Preparing the Cluster for Kernel Cage DR in Oracle Solaris Cluster 4.1 Hardware Administration Manual.

When a DR add-board operation that pertains to memory succeeds, memory on the added board is automatically incorporated in system operation.

If the node being repaired panics during the DR operation, or if the DR operation is otherwise interrupted, it may be necessary to manually re-enable heartbeat monitoring and reset the repaired node's quorum vote count. These two actions are normally done automatically at the completion of the DR operation to return the cluster to a stable state. For instructions on recovering in this case, see How to Recover From an Interrupted Kernel Cage DR Operation in Oracle Solaris Cluster 4.1 Hardware Administration Manual.

SPARC: DR Clustering Considerations for Disk and Tape Drives

Oracle Solaris Cluster rejects dynamic reconfiguration (DR) remove-board operations on active drives on the primary node. You can perform DR remove-board operations on inactive drives on the primary node and on any drives in the secondary node. After the DR operation, cluster data access continues as before.


Note - Oracle Solaris Cluster rejects DR operations that impact the availability of quorum devices. For considerations about quorum devices and the procedure for performing DR operations on them, see SPARC: DR Clustering Considerations for Quorum Devices.


See Dynamic Reconfiguration With Quorum Devices in Oracle Solaris Cluster System Administration Guide for detailed instructions about how to perform these actions.

SPARC: DR Clustering Considerations for Quorum Devices

If the DR remove-board operation pertains to a board that contains an interface to a device configured for quorum, Oracle Solaris Cluster software rejects the operation. Oracle Solaris Cluster software also identifies the quorum device that would be affected by the operation. You must disable the device as a quorum device before you can perform a DR remove-board operation.

See Chapter 6, Administering Quorum, in Oracle Solaris Cluster System Administration Guide for detailed instructions about how to administer quorum.

SPARC: DR Clustering Considerations for Cluster Interconnect Interfaces

If the DR remove-board operation pertains to a board containing an active cluster interconnect interface, Oracle Solaris Cluster software rejects the operation. Oracle Solaris Cluster software also identifies the interface that would be affected by the operation. You must use an Oracle Solaris Cluster administrative tool to disable and remove the active interface before the DR operation can succeed.


Caution - Oracle Solaris Cluster software requires each cluster node to have at least one functioning path to every other cluster node. Do not disable a private interconnect interface that supports the last path to any node in the cluster.
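The last-path rule in this caution can be checked on paper before an interface is disabled. The following Python sketch is an illustration only and is not part of any Oracle Solaris Cluster tool. It assumes a simplified model in which each interconnect path joins one adapter on one node to one adapter on another node. The function name pairs_without_path and the node and adapter names are hypothetical.

```python
from itertools import combinations
from typing import Iterable, List, Tuple

Endpoint = Tuple[str, str]        # (node name, adapter name)
Path = Tuple[Endpoint, Endpoint]  # one interconnect path between two adapters


def pairs_without_path(
    nodes: Iterable[str], paths: Iterable[Path], disabled: Endpoint
) -> List[Tuple[str, str]]:
    """Return the node pairs left with no path if `disabled` is taken out.

    An empty result means every node still has at least one path
    to every other node after the interface is disabled.
    """
    # Paths that do not use the disabled adapter remain available.
    remaining = [p for p in paths if disabled not in p]
    connected = {frozenset((a[0], b[0])) for a, b in remaining}
    return [
        pair
        for pair in combinations(sorted(nodes), 2)
        if frozenset(pair) not in connected
    ]


nodes = ["node1", "node2"]
two_paths = [
    (("node1", "net1"), ("node2", "net1")),
    (("node1", "net2"), ("node2", "net2")),
]
one_path = two_paths[:1]

# With two paths, disabling one adapter leaves the other path in place.
print(pairs_without_path(nodes, two_paths, ("node1", "net1")))
# With a single path, disabling its adapter isolates the two nodes.
print(pairs_without_path(nodes, one_path, ("node1", "net1")))
```

A non-empty result identifies the node pairs that would lose their last path, which is the situation the caution warns against.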


See Administering the Cluster Interconnects in Oracle Solaris Cluster System Administration Guide for detailed instructions about how to perform these actions.

SPARC: DR Clustering Considerations for Public Network Interfaces

If the DR remove-board operation pertains to a board that contains an active public network interface, Oracle Solaris Cluster software rejects the operation. Oracle Solaris Cluster software also identifies the interface that would be affected by the operation. Before you remove a board with an active network interface present, switch over all traffic on that interface to another functional interface in the multipathing group by using the if_mpadm command.


Caution - If the remaining network adapter fails while you are performing the DR remove operation on the disabled network adapter, availability is impacted. The remaining adapter has no place to fail over for the duration of the DR operation.
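The exposure described in this caution depends on how many functional interfaces remain in the multipathing group once traffic is switched away from the interface on the board. The following Python sketch is a hypothetical planning aid, not an Oracle Solaris interface. The function name assess_public_interface_removal, the result strings, and the interface names are assumptions made for illustration.

```python
from typing import Dict


def assess_public_interface_removal(group: Dict[str, bool], target: str) -> str:
    """Classify the risk of taking `target` out of a multipathing group.

    `group` maps each interface name in the group to True if the
    interface is currently functional, False otherwise.
    """
    if target not in group:
        raise KeyError(f"{target} is not a member of the group")
    # Functional interfaces other than the one on the board being removed.
    others = [name for name, ok in group.items() if name != target and ok]
    if not others:
        # Nowhere to switch traffic: the interface cannot be vacated.
        return "no-failover-target"
    if len(others) == 1:
        # Traffic can move, but the remaining adapter has no place
        # to fail over for the duration of the DR operation.
        return "single-remaining"
    return "redundant"


print(assess_public_interface_removal({"net0": True, "net1": True}, "net0"))
print(assess_public_interface_removal({"net0": True, "net1": False}, "net0"))
print(assess_public_interface_removal(
    {"net0": True, "net1": True, "net2": True}, "net0"))
```

A "single-remaining" result corresponds to the situation in the caution: the DR operation can proceed, but a failure of the remaining adapter during the operation affects availability.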


See Administering the Public Network in Oracle Solaris Cluster System Administration Guide for detailed instructions about how to perform a DR remove operation on a public network interface.