Oracle® ZFS Storage Appliance Administration Guide

Cluster Takeover and Failback

Clustered head nodes are in one of a small set of states at any given time:

Table 10-4  Cluster States

| State | Icon | CLI/BUI Expression | Description |
|---|---|---|---|
| UNCONFIGURED | Status: Disabled | Clustering is not configured | A system that has no clustering at all is in this state. The system is either being set up or the cluster setup task has never been completed. |
| OWNER | Status: On | Active (takeover completed) | Clustering is configured, and this node has taken control of all shared resources in the cluster. A system enters this state immediately after cluster setup is completed from its user interface, and when it detects that its peer has failed (that is, after a takeover). It remains in this state until an administrator manually executes a failback operation. |
| STRIPPED | Status: Off | Ready (waiting for failback) | Clustering is configured, and this node does not control any shared resources. A system is STRIPPED immediately after cluster setup is completed from the user interface of the other node, or following a reboot, power disconnect, or other failure. A node remains in this state until an administrator manually executes a failback operation. |
| CLUSTERED | Status: On | Active | Clustering is configured, and both nodes own shared resources according to their resource assignments. If each node owns a ZFS pool and is in the CLUSTERED state, then the two nodes form what is commonly called an active-active cluster. |
| - | Enable | Rejoining cluster ... | The appliance has recently rebooted, or the appliance management software is restarting after an internal failure. Resource state is being resynchronized. |
| - | | Unknown (disconnected or restarting) | The peer appliance is powered off or rebooting, all its cluster interconnect links are down, or clustering has not yet been configured. |

Transitions among these states take place as part of two operations: takeover and failback.

Takeover can occur at any time; as discussed above, takeover is attempted whenever peer failure is detected. It can also be triggered manually using the cluster configuration CLI or BUI. This is useful for testing and for performing rolling software upgrades (upgrades in which one head is upgraded while the other provides service running the older software, then the second head is upgraded once the new software is validated). Finally, takeover occurs when a head boots and detects that its peer is absent. This allows service to resume normally when one head has failed permanently or when both heads have temporarily lost power.

Failback never occurs automatically. When a failed head is repaired and booted, it rejoins the cluster (resynchronizing its view of all resources, their properties, and their ownership) and then waits for an administrator to perform a failback operation. Until then, the surviving head continues to provide all services. This allows for a full investigation of the problem that originally triggered the takeover, validation of a new software revision, or other administrative tasks before the head returns to production service. Because failback is disruptive to clients, it should be scheduled according to business-specific needs and processes.

There is one exception: Suppose that head A has failed and head B has taken over. When head A rejoins the cluster, it becomes eligible to take over if it detects that head B is absent or has failed. The principle is that it is always better to provide service than not, even if there has not yet been an opportunity to investigate the original problem. So while failback to a previously failed head never occurs automatically, that head may still perform a takeover at any time.
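
The trigger rules for the two operations can be summarized in a small sketch. This is an illustrative model only, not appliance code: the `Event` names and the `action_for` function are hypothetical and exist only to restate which events lead to a takeover, which lead to a failback, and which lead to neither.

```python
from enum import Enum


class Event(Enum):
    """Hypothetical event names, chosen for this sketch only."""
    PEER_FAILURE_DETECTED = "peer failure detected"
    MANUAL_TAKEOVER = "administrator requested takeover"
    BOOT_WITH_PEER_ABSENT = "head booted and peer is absent"
    PEER_REJOINED = "repaired peer rejoined the cluster"
    MANUAL_FAILBACK = "administrator requested failback"


# Takeover is attempted on peer failure, on manual request,
# or when a head boots and finds its peer absent.
TAKEOVER_EVENTS = {
    Event.PEER_FAILURE_DETECTED,
    Event.MANUAL_TAKEOVER,
    Event.BOOT_WITH_PEER_ABSENT,
}


def action_for(event):
    """Return the operation that the given event leads to."""
    if event in TAKEOVER_EVENTS:
        return "takeover"
    if event is Event.MANUAL_FAILBACK:
        return "failback"
    # A rejoining peer never causes an automatic failback;
    # it waits for an administrator.
    return "wait"
```

Note that a head that has rejoined and is waiting still reacts to `PEER_FAILURE_DETECTED` with a takeover, which is the exception described above.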

When you set up a cluster, the initial state consists of the node that initiated the setup in the OWNER state and the other node in the STRIPPED state. After an initial failback operation hands the STRIPPED node its portion of the shared resources, both nodes are CLUSTERED. If both cluster nodes fail or are powered off, then upon simultaneous startup they arbitrate; one of them becomes the OWNER and the other becomes STRIPPED.
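
The state transitions described in this section can be traced with a toy model. The sketch below is a simplification under stated assumptions: the node names `A` and `B`, the `Cluster` class, and its methods are hypothetical, and a failed peer is recorded as STRIPPED immediately rather than after it reboots and rejoins.

```python
from enum import Enum


class State(Enum):
    UNCONFIGURED = "UNCONFIGURED"
    OWNER = "OWNER"
    STRIPPED = "STRIPPED"
    CLUSTERED = "CLUSTERED"


class Cluster:
    """Toy two-node model; names and methods are illustrative only."""

    def __init__(self):
        self.states = {"A": State.UNCONFIGURED, "B": State.UNCONFIGURED}

    def _peer(self, node):
        return "B" if node == "A" else "A"

    def setup(self, initiator):
        # The node that initiated setup becomes OWNER; the other is STRIPPED.
        self.states[initiator] = State.OWNER
        self.states[self._peer(initiator)] = State.STRIPPED

    def takeover(self, node):
        # Automatic (peer failure, boot with peer absent) or manual.
        if self.states[node] is State.UNCONFIGURED:
            raise ValueError("clustering is not configured")
        self.states[node] = State.OWNER
        # Simplification: the peer is shown as STRIPPED right away.
        self.states[self._peer(node)] = State.STRIPPED

    def failback(self):
        # Manual only; valid when one node is OWNER and the other STRIPPED.
        if set(self.states.values()) != {State.OWNER, State.STRIPPED}:
            raise ValueError("failback needs one OWNER and one STRIPPED node")
        self.states = {name: State.CLUSTERED for name in self.states}
```

For example, calling `setup("A")` followed by `failback()` leaves both nodes CLUSTERED, and a later `takeover("B")` leaves B as OWNER and A as STRIPPED until the next failback.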

During failback, all foreign resources (those assigned to the peer) are exported, then imported by the peer. If a pool is faulted and cannot be imported, the import failure triggers a reboot of the STRIPPED node. Attempting a failback with a faulted pool can therefore reboot the STRIPPED node.
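
Because a faulted pool can turn a failback into a reboot of the STRIPPED node, it can help to review pool health before scheduling one. The following is a minimal sketch of such a check, assuming pool information has already been gathered by some other means. The record layout (`name`, `assigned_to`, `status`), the status string `"faulted"`, and the function name are all hypothetical and do not correspond to an appliance API.

```python
def faulted_foreign_pools(pools, peer):
    """Return names of pools assigned to the peer that are faulted.

    pools: list of dicts with hypothetical keys
           'name', 'assigned_to', and 'status'.
    peer:  name of the node that would import the pools on failback.
    """
    return [
        pool["name"]
        for pool in pools
        if pool["assigned_to"] == peer and pool["status"] == "faulted"
    ]


# Illustrative data only; not output from a real appliance.
example_pools = [
    {"name": "pool-0", "assigned_to": "head-a", "status": "online"},
    {"name": "pool-1", "assigned_to": "head-b", "status": "faulted"},
]

blocked = faulted_foreign_pools(example_pools, peer="head-b")
if blocked:
    print("Postpone failback; faulted pools:", ", ".join(blocked))
```

An empty result means none of the pools that the peer would import are marked faulted in the supplied data; it says nothing about other causes of import failure.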