Clustering Considerations for Storage
When sizing Oracle ZFS Storage Appliance for use in a cluster configuration, consider the following points:
- Whether all pools are owned by the same controller, or pools are split between the two controllers.
- Whether you want pools with no single point of failure (NSPF).
Assigning storage pool ownership - Perhaps the most important decision is whether all storage pools are owned by the same controller or split between the two controllers. There are several trade-offs to consider, as shown in Table 2-4, Clustering Considerations for Storage Pools.
Generally, pools should be configured on a single controller except when optimizing for throughput during nominal operation or when failed-over performance is not a consideration. The exact changes in performance characteristics when a controller is in the failed-over state will depend on the nature and size of the workloads. Generally, the closer a controller is to providing maximum performance on any particular axis, the greater the performance degradation along that axis when the workload is taken over by that controller's peer. Of course, in the multiple pool case, this degradation will apply to both workloads.
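The relationship between nominal utilization and failed-over degradation can be illustrated with a rough capacity model. The following Python sketch is not part of the appliance software. The function name `failed_over_fraction` and the linear capacity model are assumptions made for illustration only. The model treats each controller's nominal load as a fraction of one controller's maximum on a single axis and ignores contention effects, so its results are an optimistic bound.

```python
def failed_over_fraction(util_a: float, util_b: float) -> float:
    """Estimate the fraction of nominal throughput retained after takeover.

    util_a and util_b are each controller's nominal load, expressed as a
    fraction (0.0 to 1.0) of one controller's maximum on a given axis.
    Hypothetical linear model: the surviving controller can deliver at
    most 1.0, and contention overhead is ignored.
    """
    for util in (util_a, util_b):
        if not 0.0 <= util <= 1.0:
            raise ValueError("utilization must be between 0.0 and 1.0")

    nominal_total = util_a + util_b
    if nominal_total == 0.0:
        return 1.0  # no load, so nothing is lost

    # The surviving controller carries both workloads, capped at its maximum.
    surviving_total = min(1.0, nominal_total)
    return surviving_total / nominal_total


if __name__ == "__main__":
    # Single-controller ownership: the peer is idle, so nothing is lost.
    print(failed_over_fraction(0.5, 0.0))
    # Split ownership, lightly loaded controllers.
    print(failed_over_fraction(0.25, 0.25))
    # Split ownership, both controllers at their maximum.
    print(failed_over_fraction(1.0, 1.0))
```

Under this simplified model, retained throughput never falls below 50%. Actual systems can fall lower because of contention on the surviving controller, so treat the output as a starting point for sizing, not a prediction.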
Read cache devices are located in the controller or disk shelf, depending on your configuration.
Read cache devices located in a controller slot (internal L2ARC) do not follow data pools in takeover or failback situations. A read cache device is active in a particular cluster node only when the pool to which it is assigned is imported on the node where the device resides. Unless additional configuration steps are taken, read cache will not be available for a pool that has migrated due to a failover event. To enable a read cache device for a pool on the node that does not own it, take over the pool on the non-owning node, and then add storage and select the cache devices for configuration. Read cache devices in a cluster node should be configured as described in Configuring Storage. Write-optimized log devices are located in the storage fabric and are always accessible to whichever controller has imported the pool.
If read cache devices are located in a disk shelf (external L2ARC), read cache is always available. During a failback or takeover operation, read cache remains sharable between controllers. In this case, read performance is sustained. For external read cache configuration details, see Disk Shelf Configurations in Oracle ZFS Storage Appliance Customer Service Manual, Release OS8.8.x.
Configuring NSPF - A second important consideration for storage is the use of pool configurations with no single point of failure (NSPF). Since the use of clustering implies that the application places a high premium on availability, there is seldom a good reason to configure storage pools in a way that allows the failure of a single disk shelf to cause loss of availability. The downside to this approach is that NSPF configurations require a greater number of disk shelves than do configurations with a single point of failure. When the required capacity is very small, installation of enough disk shelves to provide for NSPF at the desired RAID level might not be economical.
The following table describes storage pool ownership for cluster configurations.
Table 2-4 Clustering Considerations for Storage Pools
| Variable | Single Controller Pool Ownership | Multiple Pools Owned by Different Controllers | 
|---|---|---|
| Total throughput (nominal operation) | Up to 50% of total CPU resources, 50% of DRAM, and 50% of total network connectivity can be used to provide service at any one time. This is straightforward: only a single controller is ever servicing client requests, so the other is idle. | All CPU and DRAM resources can be used to provide service at any one time. Up to 50% of all network connectivity can be used at any one time (dark network devices are required on each controller to support failover). | 
| Total throughput (failed over) | No change in throughput relative to nominal operation. | 100% of the surviving controller's resources will be used to provide service. Total throughput relative to nominal operation may range from approximately 40% to 100%, depending on utilization during nominal operation. | 
| I/O latency | Internal read cache is not available during a failback or takeover operation, which can significantly increase latencies for read-heavy workloads that fit into available read cache. Latency of write operations is unaffected. With external read cache configurations (EL2ARC), read performance is unaffected: read cache is shared between cluster peers during a failback or takeover operation, so read latency does not increase. | Internal read cache is not available during a failback or takeover operation, which can significantly increase latencies for read-heavy workloads that fit into available read cache. Latency of both read and write operations may be increased due to greater contention for controller resources. This is caused by running two workloads on the surviving controller instead of the usual one. When nominal workloads on each controller approach the controller's maximum capabilities, latencies in the failed-over state may be extremely high. With external read cache configurations (EL2ARC), read performance is unaffected: read cache is shared between cluster peers during a failback or takeover operation, so read latency does not increase. | 
| Storage flexibility | All available physical storage can be used by shares and LUNs. | Only the storage allocated to a particular pool can be used by that pool's shares and LUNs. Storage is not shared across pools, so if one pool fills up while the other has free space, some storage may be wasted. | 
| Network connectivity | All network devices in each controller can be used while that controller is providing service. | Only half of all network devices in each controller can be used while that controller is providing service. Therefore each pool can be connected to only half as many physically disjoint networks. | 
Related Topics