Configuration Changes in a Clustered Environment

The vast majority of appliance configuration is represented as either service properties or share/LUN properties. Share and LUN properties are stored with the user data on the storage pool itself, and are therefore always accessible to the current owner of that storage resource. Service configuration, by contrast, is stored within each controller. To ensure that both controllers provide coherent service, all service properties must be synchronized whenever a change occurs or a controller that was previously down rejoins its peer. Since all services are represented by replica resources, the appliance software performs this synchronization automatically any time a property is changed on either controller.

It is therefore unnecessary for administrators to replicate configuration changes manually. Standard operating procedures should reflect this behavior and call for making changes on only one of the two controllers once initial cluster configuration has been completed. Note as well that the process of initial cluster configuration replicates all existing configuration onto the newly configured peer. Generally, then, we derive two best practices for clustered configuration changes:

Make all storage- and network-related configuration changes on the controller that currently controls (or will control, if a new resource is being created) the underlying storage or network interface resources.
Make all other changes on either controller, but not both. Site policy should specify which controller is to be considered the master for this purpose; that choice should in turn depend on which of the controllers is functioning and on the number of storage pools that have been configured. Note that the appliance software does not make this distinction.
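
The two practices above can be sketched as a small decision helper for a site runbook. This is only an illustration under stated assumptions: the change categories, the `ChangeRequest` type, the `target_controller` function, and the controller names are all hypothetical and are not part of the appliance software, which makes no master/peer distinction.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical categories: changes tied to a specific storage or network resource.
RESOURCE_BOUND_KINDS = {"storage", "network"}


@dataclass
class ChangeRequest:
    kind: str                      # e.g. "storage", "network", "service" (hypothetical labels)
    resource_owner: Optional[str]  # controller that owns, or will own, the resource


def target_controller(change: ChangeRequest, site_master: str) -> str:
    """Return the name of the controller on which the change should be made."""
    if change.kind in RESOURCE_BOUND_KINDS:
        # Practice 1: go to the controller that controls the underlying resource.
        if change.resource_owner is None:
            raise ValueError("storage/network changes require a known resource owner")
        return change.resource_owner
    # Practice 2: everything else goes to the single controller named by site policy.
    return site_master
```

For example, with a site policy naming `ctrl-a` as master, a change to a pool owned by `ctrl-b` would be directed to `ctrl-b`, while a service property change would be directed to `ctrl-a`.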

The problem of amnesia, in which disjoint configuration changes are made and subsequently lost on each controller while its peer is not functioning, is largely overstated. This is especially true of the Oracle ZFS Storage Appliance, in which no mechanism exists for making independent changes to system configuration on each controller. This simplification largely removes the need for centralized configuration repositories and argues for a simpler approach: whichever controller is currently operating is assumed to have the correct configuration, and its peer is synchronized to it when booting. While future product enhancements may allow selection of an alternate policy for resolving configuration divergence, this basic approach offers simplicity and ease of understanding: the second controller adopts a set of configuration parameters that are already in use by an existing production system (and are therefore highly likely to be correct). For this to remain true, administrators should make sure that a failed controller rejoins the cluster as soon as it is repaired.
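
The resolution policy described above can be modeled with a toy example. This is a conceptual sketch only, not the appliance's implementation: the `Controller` class and the `set_property` and `rejoin` functions are hypothetical names, and service properties are reduced to a plain dictionary.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Controller:
    name: str
    running: bool = True
    service_props: Dict[str, str] = field(default_factory=dict)


def set_property(target: Controller, peer: Controller, key: str, value: str) -> None:
    """Change a service property on one controller and propagate it to a running peer."""
    if not target.running:
        raise RuntimeError(f"{target.name} is not running")
    target.service_props[key] = value
    if peer.running:
        peer.service_props[key] = value


def rejoin(returning: Controller, survivor: Controller) -> None:
    """On boot, the returning controller adopts the operating controller's configuration."""
    if not survivor.running:
        raise RuntimeError("no operating controller to synchronize from")
    returning.service_props = dict(survivor.service_props)
    returning.running = True
```

In this model, a change made on the surviving controller while its peer is down is never lost: when the peer rejoins, it simply takes a copy of the configuration already in production use.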

Related Topics

Shutting Down a Clustered Configuration (CLI)