Chapter 10 Cluster Configuration
Understanding Cluster Resource Management
Configuration Changes in a Clustered Environment
Clustering Considerations for Storage
Clustering Considerations for Networking
Clustering Considerations for Infiniband
Clustering Redundant Path Scenarios
Preventing 'Split-Brain' Conditions
Estimating and Reducing Takeover Impact
Cluster Configuration Using the BUI
Configuring Clustering Using the CLI
Shutting Down a Clustered Configuration
ZS3-4 and 7x20 Cluster Cabling
Cluster Configuration BUI Page
It is important to understand the scope of the Oracle ZFS Storage Appliance clustering implementation. The term 'cluster' is used in the industry to refer to many different technologies with a variety of purposes. Here it means a metasystem comprising two appliance heads and shared storage, used to provide improved availability when one of the heads succumbs to certain hardware or software failures. A cluster contains exactly two appliances, or storage controllers, referred to for brevity throughout this document as heads. Each head may be assigned a collection of storage, networking, and other resources from the set available to the cluster, which allows the construction of either of two major topologies.

The term active-active commonly describes a cluster in which there are two storage pools, one assigned to each head along with the network resources clients use to reach the data stored in that pool. Active-passive refers to a cluster in which a single storage pool is assigned to the head designated as active, along with its associated network interfaces. Both topologies are supported by the Oracle ZFS Storage Appliance. The distinction between them is artificial: there is no software or hardware difference between the two, and one can be converted to the other at will simply by adding or destroying a storage pool. In both cases, if a head fails, the other (its peer) takes control of all known resources and provides the services associated with those resources.
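The takeover behavior described above can be illustrated with a small model. This is a hypothetical sketch for illustration only, not appliance software; the class and resource names are invented. It shows the key property that resource assignment is symmetric: whichever head fails, its peer assumes ownership of every known resource.

```python
# Illustrative model of cluster resource takeover (not actual
# Oracle ZFS Storage Appliance code; all names are hypothetical).

class Cluster:
    def __init__(self):
        # Resources (storage pools, network interfaces) currently
        # owned by each of the two heads.
        self.owner = {"head-a": set(), "head-b": set()}

    def assign(self, head, resource):
        self.owner[head].add(resource)

    def takeover(self, failed_head):
        # The surviving peer takes control of all of the failed
        # head's resources and returns its own name.
        peer = "head-b" if failed_head == "head-a" else "head-a"
        self.owner[peer] |= self.owner[failed_head]
        self.owner[failed_head] = set()
        return peer

# Active-active topology: one pool (and its interfaces) per head.
c = Cluster()
c.assign("head-a", "pool-0")
c.assign("head-b", "pool-1")

# If head-a fails, head-b now serves both pools.
survivor = c.takeover("head-a")
print(survivor, sorted(c.owner[survivor]))  # head-b ['pool-0', 'pool-1']
```

An active-passive configuration is the same model with only one pool assigned; destroying or adding a pool is all that distinguishes the two topologies.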
As an alternative to incurring hours or days of downtime while the head is repaired, clustering allows a peer appliance to provide service while repair or replacement is performed. In addition, clusters support rolling upgrade of software, which can reduce the business disruption associated with migrating to newer software. Some clustering technologies have certain additional capabilities beyond availability enhancement; the Oracle ZFS Storage Appliance clustering subsystem was not designed to provide these. In particular, it does not provide for load balancing among multiple heads, improve availability in the face of storage failure, offer clients a unified filesystem namespace across multiple appliances, or divide service responsibility across a wide geographic area for disaster recovery purposes. These functions are likewise outside the scope of this document; however, the Oracle ZFS Storage Appliance and the data protocols it offers support numerous other features and strategies that can improve availability:
Replication of data (described in Chapter 13), which can be used for disaster recovery at one or more geographically remote sites,
Client-side mirroring of data, which can be done using redundant iSCSI LUNs provided by multiple arbitrarily located storage servers,
Load balancing, which is built into the NFS protocol and can be provided for some other protocols by external hardware or software (applies to read-only data),
Redundant hardware components including power supplies, network devices, and storage controllers,
Fault management software, described in the Oracle ZFS Storage Appliance Customer Service Manual, that can identify failed components, remove them from service, and guide technicians to repair or replace the correct hardware,
Network fabric redundancy provided by LACP and IPMP functionality, and
Redundant storage devices (RAID).
Additional information about other availability features can be found in the appropriate sections of this document.