Configuration Changes in a Clustered Environment

The majority of appliance configuration is represented as either service properties or share/LUN properties. Share and LUN properties are stored with the user data on the storage pool itself and thus are always accessible to the current owner of that storage resource. Service configuration, by contrast, is stored within each controller. To ensure that both controllers provide coherent service, all service properties must be synchronized whenever a change occurs or a controller that was previously down rejoins its peer. Since all services are represented by replica resources, this synchronization is performed automatically by the appliance software any time a property is changed on either controller.
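The synchronization behavior described above can be pictured with a small toy model. This is only a sketch under stated assumptions, not the appliance's implementation: the `Controller` class, the controller names, and the property keys are all hypothetical and exist purely for illustration.

```python
class Controller:
    """Toy stand-in for one cluster controller (hypothetical, not appliance code)."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.service_props = {}   # service configuration stored on this controller
        self.peer = None

    def set_property(self, key, value):
        """Apply a change locally, then replicate it to the peer if the peer is up."""
        if not self.up:
            raise RuntimeError(f"{self.name} is down")
        self.service_props[key] = value
        if self.peer is not None and self.peer.up:
            self.peer.service_props[key] = value

    def fail(self):
        """Mark this controller as down; it stops receiving replicated changes."""
        self.up = False

    def rejoin(self):
        """On rejoin, adopt the running peer's service configuration wholesale."""
        if self.peer is None or not self.peer.up:
            raise RuntimeError("no running peer to synchronize from")
        self.service_props = dict(self.peer.service_props)
        self.up = True


def make_cluster():
    """Build two controllers that know about each other."""
    a, b = Controller("ctrl-a"), Controller("ctrl-b")
    a.peer, b.peer = b, a
    return a, b


a, b = make_cluster()
a.set_property("nfs.version_max", "4")            # made once, on one controller
b.fail()
a.set_property("ntp.server", "ntp.example.com")   # peer is down and misses this
b.rejoin()                                        # rejoining peer catches up
print(a.service_props == b.service_props)
```

In this model the administrator never touches the second controller: a change made once is copied to the peer, and a controller that was down simply takes over the running controller's settings when it returns.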

It is therefore unnecessary for administrators to replicate configuration changes. Standard operating procedures should reflect this behavior and call for making changes to only one of the two controllers once initial cluster configuration has been completed. The process of initial cluster configuration will replicate all existing configuration onto the newly configured peer.

The following are best practices for clustered configuration changes:

Make all storage and network configuration changes on the controller that currently controls (or will control, if a new resource is being created) the underlying storage or network interface resources.
Make all other changes on either controller, but not both. Which controller you treat as the primary controller for these changes should depend on which of the controllers is functioning and on the number of storage pools that have been configured.
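The two practices above amount to a simple decision rule, which might be sketched as follows. This is an illustrative helper only: `ChangeKind`, `controller_for_change`, and the controller names are hypothetical and do not correspond to any appliance API or CLI command.

```python
from enum import Enum


class ChangeKind(Enum):
    STORAGE = "storage"
    NETWORK = "network"
    OTHER = "other"   # for example, service properties


def controller_for_change(kind, functioning, resource_owner=None, preferred=None):
    """Return the single controller on which a change should be made.

    kind            -- a ChangeKind value
    functioning     -- list of controller names that are currently up
    resource_owner  -- controller that owns (or will own) the storage or
                       network resource; required for STORAGE and NETWORK
    preferred       -- the site's usual choice for all other changes
    """
    if not functioning:
        raise ValueError("no functioning controller available")
    if kind in (ChangeKind.STORAGE, ChangeKind.NETWORK):
        if resource_owner is None:
            raise ValueError("storage and network changes need the owning controller")
        if resource_owner not in functioning:
            raise ValueError(f"{resource_owner} is not functioning")
        return resource_owner
    # Any other change goes to exactly one controller, never both.
    if preferred in functioning:
        return preferred
    return functioning[0]


print(controller_for_change(ChangeKind.STORAGE, ["ctrl-a", "ctrl-b"], resource_owner="ctrl-b"))
print(controller_for_change(ChangeKind.OTHER, ["ctrl-b"], preferred="ctrl-a"))
```

The function always returns exactly one controller, which mirrors the guidance that a change is made once and left to the appliance to replicate.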

Oracle ZFS Storage Appliance has no mechanism for making independent changes to system configuration on each controller. This simplification reduces the need for centralized configuration repositories. The controller that is currently operating is assumed to have the correct configuration, and its peer is synchronized to it when the peer boots. The peer therefore adopts a set of configuration parameters that are already in use by an existing production system and are highly likely to be correct. Best practice is to ensure that a failed controller rejoins the cluster as soon as it is repaired.