6 Working With Quorum Devices

A quorum device acts as a third-party arbitrator when standard quorum rules might not adequately handle node failure. A quorum device is typically used in clusters that have an even number of nodes. For example, in a two-node cluster, a failure of the nodes to communicate with each other can result in a split-brain condition, where both nodes function as primary at the same time and data corruption can follow. With a quorum device, quorum arbitration is possible and a selected node survives.
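
The effect of the extra vote can be checked with simple arithmetic. The following is a minimal sketch that assumes plain majority voting, where the quorum threshold is more than half of the expected votes, and that ignores special votequorum options such as two_node. The function name quorum_threshold is hypothetical and is used only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: majority threshold for a given number of expected votes,
# that is, floor(votes / 2) + 1.
quorum_threshold() {
    echo $(( $1 / 2 + 1 ))
}

# Two nodes with one vote each and no quorum device:
# the threshold is 2, so neither node reaches quorum alone.
quorum_threshold 2

# Two nodes plus one quorum device vote:
# the threshold is still 2, so one node plus the device vote stays quorate.
quorum_threshold 3
```

Under these assumptions, both calls print 2. The difference is that with three expected votes, a single surviving node that receives the quorum device vote still meets the threshold.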

A quorum device is a network-bound service that runs on a system outside of the cluster, ideally on a separate physical network from the cluster itself. It should run on a system that's not a node in the cluster. Although one quorum device can serve multiple clusters at the same time, it should be the only quorum device for each cluster that it serves. Each node in the cluster is configured to use the quorum device.

Installing and Enabling a Quorum Device

Installation of the quorum device requires that you install the pcs and corosync-qnetd packages on the system where you intend to run the quorum device service and then install the corosync-qdevice package on each of the nodes in the existing cluster.

  1. On the system assigned to run the quorum device service, run:
    sudo dnf install -y pcs corosync-qnetd
  2. Enable and start the systemd pcsd service by running:
    sudo systemctl enable --now pcsd
  3. If you're running a firewall on the quorum device service host, you must open the firewall ports to allow the host to communicate with the cluster. For example, run:
    sudo firewall-cmd --permanent --add-service=high-availability
    sudo systemctl restart firewalld
  4. On the quorum device service host, enable and start the quorum device service by setting the Pacemaker configuration for the node to use the net model. Run:
    sudo pcs qdevice setup model net --enable --start

    This command creates the quorum device configuration on the host with the model set to net, and then enables and starts the service so that the corosync-qnetd daemon loads and runs at boot. The examples that follow assume that this host is reachable from the cluster nodes by the name qdev.

  5. On each of the nodes within the existing cluster, install the corosync-qdevice package by running:
    sudo dnf install -y corosync-qdevice
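
Step 5 must be repeated on every cluster node. As a rough sketch, assuming that the nodes are named node1 and node2, that they're reachable over SSH from an administration host, and that sudo can run there without an interactive prompt, a loop such as the following could print or run the install command for each node. The function name install_qdevice_on_nodes and the RUNNER variable are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print, or run, the step 5 install command for each node.
# RUNNER is unset by default, so the commands are only echoed (a dry run).
# Set RUNNER to an empty string to run them over SSH instead.
install_qdevice_on_nodes() {
    for node in "$@"; do
        ${RUNNER-echo} ssh "$node" sudo dnf install -y corosync-qdevice
    done
}

# Dry run: prints one ssh command line per node.
install_qdevice_on_nodes node1 node2
```

The dry run is the default so that the command lines can be reviewed first. To run them, call the function with RUNNER set to an empty string, for example RUNNER= install_qdevice_on_nodes node1 node2.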

Configuring the Cluster for a Quorum Device

The node running the quorum device service must be authenticated to the rest of the cluster and then added to the cluster. When you add the quorum device service node, you can set configuration options, such as the algorithm used to determine quorum. After the quorum device is added to the cluster, you can check the quorum device status to verify that the device is functioning correctly.

  1. Authenticate the quorum device service node to the cluster. On a node within the existing cluster, to authenticate the node named qdev, run:
    sudo pcs host auth qdev

    You're prompted for the cluster username and password.

  2. Check that no quorum device is already configured for the cluster. A cluster must never have more than one quorum device configured. On a node within the existing cluster, run:
    sudo pcs quorum status
    Note that the output includes membership information:
    ...
    Membership information
    ----------------------
        Nodeid      Votes    Qdevice Name
             1          1         NR node1 (local)
             2          1         NR node2
    Under the Qdevice column, the value NR is displayed. The NR value indicates that no quorum devices are registered with any of the nodes within the cluster. If any other value is displayed, don't proceed with adding another quorum device to the cluster without removing the existing device first.
  3. Add the quorum device to the cluster. On one of the nodes within the existing cluster, run:
    sudo pcs quorum device add model net host=qdev algorithm=ffsplit
    Note that the host value must match the host where the quorum device service is running, in this case qdev. The algorithm value sets the algorithm used to determine quorum, in this case ffsplit.

    Algorithm options are:

    • ffsplit: a fifty-fifty split algorithm that favors the partition with the highest number of active nodes in the cluster.
    • lms: a last-man-standing algorithm that returns a vote for the nodes that are still able to connect to the quorum device service node. If a single node is still active and it can connect to the quorum device service, the cluster remains quorate. If none of the nodes can connect to the quorum device service and any one node loses connection with the rest of the cluster, the cluster becomes inquorate.

    See the corosync-qdevice(8) manual page for more information.

  4. Verify that the quorum device is configured within the cluster. On any node in the existing cluster, run:
    sudo pcs quorum config
    The output displays that a quorum device is configured and indicates the algorithm that is in use:
    Options:
    Device:
      Model: net
        algorithm: ffsplit
        host: qdev
    You can also query the quorum status for the cluster by running:
    sudo pcs quorum status
    The output displays the quorum status.
    Quorum information
    ------------------
    Date:             Fri Jul 15 14:19:07 2022
    Quorum provider:  corosync_votequorum
    Nodes:            2
    Node ID:          1
    Ring ID:          1/8272
    Quorate:          Yes
    
    Votequorum information
    ----------------------
    Expected votes:   3
    Highest expected: 3
    Total votes:      3
    Quorum:           2
    Flags:            Quorate Qdevice
    
    Membership information
    ----------------------
        Nodeid      Votes    Qdevice Name
             1          1    A,V,NMW node1 (local)
             2          1    A,V,NMW node2
             0          1            Qdevice
    Note that the membership information now displays values A,V,NMW for the Qdevice field. Values for this field can be equal to any of the following:
    • A/NA: indicates whether the quorum device is alive or not alive to each node in the cluster.
    • V/NV: indicates whether the quorum device has provided a vote to a node. In the case where the cluster is split, one node would be set to V and the other to NV.
    • MW/NMW: indicates whether the quorum device master_wins flag is set. Any node with an active quorum device that also has the master_wins flag set becomes quorate regardless of the node votes of the cluster. By default the option is unset.
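
The Qdevice flags in the membership table can also be checked from a script. The following is a minimal sketch that assumes the pcs quorum status output keeps the column layout shown in this section. The function names qdevice_flags and qdevice_healthy are hypothetical. The sketch treats NR (not registered) and NA (not alive) as problems. It doesn't treat NV as a problem, because a node can legitimately be denied a vote when the cluster is split.

```shell
#!/bin/sh
# Hypothetical helper: read `pcs quorum status` output on stdin and print
# "<node name> <Qdevice flags>" for each node row in the membership table.
qdevice_flags() {
    awk '
        $1 == "Nodeid" { in_table = 1; next }        # header row of the table
        in_table && $1 ~ /^[0-9]+$/ && NF >= 4 {     # node rows: ID, votes, flags, name
            print $4, $3                             # the Qdevice row has 3 fields and is skipped
        }
    '
}

# Hypothetical check: succeed only if at least one node row was found and
# no node reports NR (not registered) or NA (not alive).
qdevice_healthy() {
    qdevice_flags | awk '
        {
            seen = 1
            n = split($2, flag, ",")
            for (i = 1; i <= n; i++)
                if (flag[i] == "NR" || flag[i] == "NA") bad = 1
        }
        END { if (seen && !bad) exit 0; exit 1 }
    '
}
```

For example, on a cluster node you might run sudo pcs quorum status | qdevice_healthy and test the exit status, or pipe the same output to qdevice_flags to list the flags for each node.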

Managing Quorum Devices

The quorum device service must be managed from the host system where the quorum device service is running.

Quorum configuration for the cluster and the configuration of the quorum device on the cluster nodes are performed by running operations on any of the nodes within the cluster itself.

Controlling the Quorum Device Service

You can perform various operations to directly control the quorum device service. Commands that control the quorum device service must be run on the host where the quorum device service is running.

  • To view the full status for the service, run:
    sudo pcs qdevice status net --full
    Output similar to the following is displayed:
    QNetd address:                  *:5403
    TLS:                            Supported (client certificate required)
    Connected clients:              2
    Connected clusters:             1
    Maximum send/receive size:      32768/32768 bytes
    Cluster "test":
        Algorithm:          ffsplit
        Tie-breaker:        Node with lowest node ID
        Node ID 2:
            Client address:         ::ffff:192.168.2.25:33526
            HB interval:            8000ms
            Configured node list:   1, 2
            Ring ID:                1.16
            Membership node list:   1, 2
            TLS active:             Yes (client certificate verified)
            Vote:                   ACK (ACK)
        Node ID 1:
            Client address:         ::ffff:192.168.2.26:48786
            HB interval:            8000ms
            Configured node list:   1, 2
            Ring ID:                1.16
            Membership node list:   1, 2
            TLS active:             Yes (client certificate verified)
            Vote:                   ACK (ACK)
  • To start the service, run:
    sudo pcs qdevice start net
  • To stop the service, run:
    sudo pcs qdevice stop net
  • To enable the service so that it runs at boot time, run:
    sudo pcs qdevice enable net
  • To disable the service to prevent it from restarting at boot, run:
    sudo pcs qdevice disable net
  • To force the service to stop if the normal stop process is not working, run:
    sudo pcs qdevice kill net
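
When many nodes are connected, the full status output can be long. The following is a minimal sketch that reduces it to one line per node, assuming that the output keeps the "Node ID N:" and "Vote:" line layout shown in the example output. The function name qnetd_votes is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `pcs qdevice status net --full` output on stdin
# and print "<node ID> <vote>" for each connected node.
qnetd_votes() {
    awk '
        # Section header such as "Node ID 2:"; remember the ID without the colon.
        $1 == "Node" && $2 == "ID" { id = $3; sub(/:$/, "", id) }
        # Vote line such as "Vote: ACK (ACK)"; print the first vote value.
        $1 == "Vote:" { print id, $2 }
    '
}
```

On the quorum device service host, you might run sudo pcs qdevice status net --full | qnetd_votes to list the vote that each node currently holds.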

Updating Quorum Device Settings

The quorum device can be updated in the cluster configuration at any time. Modifications to the quorum device configuration must be performed on a node within the cluster. Typically, modifications to the quorum device involve changing the algorithm. However, you can modify other options that are available for a quorum device in the same way.

To update the algorithm used for the quorum device, run:

sudo pcs quorum device update model algorithm=lms

The example changes the algorithm to lms, the last-man-standing algorithm.

Note:

You can't update the host for a quorum device. You must remove the device and add it back into the cluster if you need to change the host.
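
Before changing the algorithm, it can be useful to confirm which one is configured. The following is a minimal sketch that assumes the pcs quorum config output contains an "algorithm:" line as shown earlier in this chapter. The function names qdevice_algorithm and set_qdevice_algorithm are hypothetical, and the wrapper must be run on a cluster node.

```shell
#!/bin/sh
# Hypothetical helper: read `pcs quorum config` output on stdin and print
# the value of the "algorithm:" line.
qdevice_algorithm() {
    awk '$1 == "algorithm:" { print $2 }'
}

# Hypothetical wrapper: call `pcs quorum device update` only when the
# requested algorithm differs from the one that is already configured.
set_qdevice_algorithm() {
    wanted=$1
    current=$(sudo pcs quorum config | qdevice_algorithm)
    if [ "$current" = "$wanted" ]; then
        echo "algorithm already set to $wanted"
    else
        sudo pcs quorum device update model algorithm="$wanted"
    fi
}
```

For example, set_qdevice_algorithm lms would run the update command only if the configured algorithm isn't already lms.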

Removing the Quorum Device From the Cluster

To remove the quorum device from the cluster, run the following command on a node within the cluster:
sudo pcs quorum device remove

Removing the quorum device updates the cluster configuration to remove any configuration entries for the quorum device, reloads the cluster configuration into the cluster and then disables and stops the quorum device on each node.

Because you might use the same quorum device service across multiple clusters, removing the quorum device from the cluster doesn't affect the quorum device service in any way. The service continues to run on the service host, but no longer serves the cluster from which it was removed.

Destroying the Quorum Device Service

You can destroy the quorum device service on the host where the service is running. This action stops the service and removes any configuration for the service from the host. Run:
sudo pcs qdevice destroy net

Note:

Remove the quorum device from any clusters that it services before destroying the quorum device service.
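
As a precaution, a script can refuse to destroy the service while clusters are still connected. The following is a minimal sketch that assumes the full status output contains a "Connected clusters:" line, as in the example output shown earlier. The function names connected_clusters and destroy_if_unused are hypothetical. A count of 0 only shows that no cluster is connected at that moment, so a cluster that's temporarily down could still be configured to use the service.

```shell
#!/bin/sh
# Hypothetical helper: read `pcs qdevice status net --full` output on stdin
# and print the number from the "Connected clusters:" line.
connected_clusters() {
    awk '$1 == "Connected" && $2 == "clusters:" { print $3 }'
}

# Hypothetical guard: destroy the service only when no cluster is connected.
# Run on the quorum device service host.
destroy_if_unused() {
    count=$(sudo pcs qdevice status net --full | connected_clusters)
    if [ "$count" = "0" ]; then
        sudo pcs qdevice destroy net
    else
        echo "refusing to destroy: ${count:-unknown} connected cluster(s)" >&2
        return 1
    fi
}
```

The guard doesn't replace the manual check in the preceding note. It only catches the case where a cluster is still actively connected.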