Volume reconfiguration is the process of creating, changing, and removing volume manager objects in the configuration (such as disk groups, volumes, and mirrors). In a cluster, this process is performed with the cooperation of all nodes. Volume reconfiguration is distributed to all nodes; identical configuration changes occur on all nodes simultaneously.
The vxconfigd daemons play an active role in volume reconfiguration. For the reconfiguration to succeed, vxconfigd must be running on all nodes.
The reconfiguration is initiated and coordinated by the initiating node, which is the node on which the system administrator is running the utility that is requesting changes to volume manager objects.
The utility on the initiating node contacts its local vxconfigd daemon, which performs some local checking to make sure that the requested change is reasonable. For instance, it rejects an attempt to create a new disk group when one with the same name already exists. The vxconfigd daemon on the initiating node then sends messages with the details of the changes to the vxconfigd daemons on all other nodes in the cluster. The vxconfigd daemon on each non-initiating node then performs its own checking. For example, a non-initiating node checks that it does not have a private disk group with the same name as the one being created; if the operation involves a new disk, each node checks that it can access that disk. When the vxconfigd daemons on all nodes agree that the proposed change is reasonable, each vxconfigd notifies its kernel, and the kernels then cooperate to either commit or abort the transaction. Before the transaction can commit, all of the kernels ensure that no I/O is underway. The master is responsible for initiating a reconfiguration and coordinating the transaction commit.
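The check-then-commit sequence described above can be sketched as a small simulation. This is an illustrative model only, not the volume manager's actual implementation: the `Node` class, its fields, and the functions `initiator_check`, `peer_check`, and `create_disk_group` are hypothetical names introduced for this example, and the kernel-level commit is reduced to clearing an I/O flag and recording the new disk group on every node.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical stand-in for one cluster node and its vxconfigd state.
    name: str
    disk_groups: set = field(default_factory=set)     # disk groups known to this node
    private_groups: set = field(default_factory=set)  # disk groups private to this node
    visible_disks: set = field(default_factory=set)   # disks this node can access
    io_active: bool = False                           # stand-in for kernel I/O state


def initiator_check(node, group):
    # Local check on the initiating node: reject a duplicate disk group name.
    return group not in node.disk_groups and group not in node.private_groups


def peer_check(node, group, disk):
    # Check on a non-initiating node: no private disk group with the same
    # name, and the new disk must be accessible from this node.
    return group not in node.private_groups and disk in node.visible_disks


def create_disk_group(initiator, peers, group, disk):
    """Return 'commit' or 'abort' for a simulated create-disk-group request."""
    if not initiator_check(initiator, group):
        return "abort"
    if not all(peer_check(p, group, disk) for p in peers):
        return "abort"
    nodes = [initiator, *peers]
    for n in nodes:
        n.io_active = False       # model: every kernel ensures no I/O is underway
    for n in nodes:
        n.disk_groups.add(group)  # identical change applied on every node
    return "commit"
```

In this model a single failed check on any node aborts the request before any node's state changes, which mirrors the requirement that all nodes agree before the kernels commit.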
If vxconfigd on any node stops running during a reconfiguration, all nodes are notified and the operation fails. If any node leaves the cluster, the operation fails unless the master has already committed it. If the master leaves the cluster, the new master (which was previously a slave) either completes or fails the operation, depending on whether it received notification of successful completion from the previous master. This notification is done in such a way that if the new master did not receive it, neither did any other slave.
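The failure rules in the preceding paragraph can be summarized as a small decision function. This is a sketch of the stated rules only; the function name, the event labels, and the two boolean parameters are hypothetical and do not correspond to any real volume manager interface.

```python
def outcome_after_failure(event, master_committed=False, new_master_notified=False):
    """Return 'fail' or 'complete' for a reconfiguration hit by a failure.

    event is one of the hypothetical labels:
      'vxconfigd_down' - vxconfigd stopped on some node
      'node_left'      - a non-master node left the cluster
      'master_left'    - the master left the cluster
    """
    if event == "vxconfigd_down":
        # All nodes are notified and the operation fails.
        return "fail"
    if event == "node_left":
        # The operation survives only if the master had already committed it.
        return "complete" if master_committed else "fail"
    if event == "master_left":
        # The new master decides based on whether it received the previous
        # master's notification of successful completion.
        return "complete" if new_master_notified else "fail"
    raise ValueError(f"unknown event: {event!r}")
```

For example, `outcome_after_failure("node_left", master_committed=True)` returns `"complete"`, while the same event before the commit returns `"fail"`.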
If a node attempts to join the cluster while a volume reconfiguration is being performed, the results depend on how far the reconfiguration has progressed. If the kernel is not yet involved, the volume reconfiguration is suspended and restarted when the join is complete. If the kernel is involved, the join waits until the reconfiguration is complete.
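The two join cases can be expressed as a simple rule. The function below is a hypothetical illustration of the behavior described above; the function name and the returned labels are invented for this example.

```python
def on_join_request(kernel_involved):
    """Return (reconfiguration_action, join_action) for a node joining
    while a volume reconfiguration is in progress."""
    if kernel_involved:
        # The reconfiguration finishes first; the join waits for it.
        return ("runs_to_completion", "waits_for_reconfiguration")
    # The kernel is not yet involved: the join goes ahead and the
    # reconfiguration is suspended, then restarted once the join completes.
    return ("suspended_then_restarted", "proceeds")
```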
When an error occurs (such as when a check on a slave fails or a node leaves the cluster), the error is returned to the utility and a message is issued to the console on the initiating node to identify the node on which the error occurred.