Chapter 1 Oracle ZFS Storage Appliance Overview
Chapter 3 Initial Configuration
Chapter 4 Network Configuration
Chapter 5 Storage Configuration
Chapter 6 Storage Area Network Configuration
Chapter 8 Setting ZFSSA Preferences
Chapter 10 Cluster Configuration
Understanding Cluster Resource Management
Configuration Changes in a Clustered Environment
Clustering Considerations for Storage
Clustering Considerations for Networking
Clustering Considerations for Infiniband
Clustering Redundant Path Scenarios
Preventing 'Split-Brain' Conditions
Estimating and Reducing Takeover Impact
Cluster Configuration Using the BUI
Configuring Clustering Using the CLI
Shutting Down a Clustered Configuration
ZS3-4 and 7x20 Cluster Cabling
Cluster Configuration BUI Page
Chapter 12 Shares, Projects, and Schema
All inter-head communication consists of one or more messages transmitted over one of the three cluster I/O links provided by the CLUSTRON hardware (see illustration below). This device offers two low-speed serial links and one Ethernet link. The use of serial links allows for greater reliability; Ethernet links may not be serviced quickly enough by a system under extremely heavy load. False failure detection and unwanted takeover are the worst way for a clustered system to respond to load; during takeover, requests will not be serviced and will instead be enqueued by clients, leading to a flood of delayed requests after takeover in addition to already heavy load. The serial links used by the Oracle ZFS Storage Appliances are not susceptible to this failure mode. The Ethernet link provides a higher-performance transport for non-heartbeat messages such as rejoin synchronization and provides a backup heartbeat.
All three links are formed using ordinary straight-through EIA/TIA-568B (8-wire, Gigabit Ethernet) cables. To allow for the use of straight-through cables between two identical controllers, the cables must be used to connect opposing sockets on the two connectors as shown below in the section on cabling.
Figure 10-2 ZS3-2 Controller Cluster I/O Ports
|
Figure 10-3 ZS3-4 and 7x20 Controller Cluster I/O Ports
Figure 2. ZS3-4 and 7x20 controller cluster I/O ports
|
Clustered heads only communicate with each other over the secure private network established by the cluster interconnects, and never over network interfaces intended for service or administration. Messages fall into two general categories: regular heartbeats used to detect the failure of a remote head, and higher-level traffic associated with the resource manager and the cluster management subsystem. Heartbeats are sent, and expected, on all three links; they are transmitted continuously at fixed intervals and are never acknowledged or retransmitted as all heartbeats are identical and contain no unique information. Other traffic may be sent over any link, normally the fastest available at the time of transmission, and this traffic is acknowledged, verified, and retransmitted as required to maintain a reliable transport for higher-level software.
Regardless of its type or origin, every message is sent as a single 128-byte packet and contains a data payload of 1 to 68 bytes and a 20-byte verification hash to ensure data integrity. The serial links run at 115200 bps with 9 data bits and a single start and stop bit; the Ethernet link runs at 1Gbps. Therefore the effective message latency on the serial links is approximately 12.2ms. Ethernet latency varies greatly; while typical latencies are on the order of microseconds, effective latencies to the appliance management software can be much higher due to system load.
Normally, heartbeat messages are sent by each head on all three cluster I/O links at 50ms intervals. Failure to receive any message is considered link failure after 200ms (serial links) or 500ms (Ethernet links). If all three links have failed, the peer is assumed to have failed; takeover arbitration will be performed. In the case of a panic, the panicking head will transmit a single notification message over each of the serial links; its peer will immediately begin takeover regardless of the state of any other links. Given these characteristics, the clustering subsystem normally can detect that its peer has failed within:
550ms, if the peer has stopped responding or lost power, or
30ms, if the peer has encountered a fatal software error that triggered an operating system panic.
All of the values described in this section are fixed; as an appliance, the Oracle ZFS Storage Appliance does not offer the ability (nor is there any need) to tune these parameters. They are considered implementation details and are provided here for informational purposes only. They may be changed without notice at any time.