16.4 Fencing Configuration

Fencing, or stonith, is used to protect data when nodes become unresponsive. If a node fails to respond, it may still be accessing your data. The only way to be certain that your data is safe is to use fencing to ensure that the node is truly offline before the data is accessed from another node. To do this, you must configure a device that can force a node offline. A number of fencing agents are available that can be configured for this purpose. In general, stonith relies on particular hardware and service protocols that can forcibly reboot or shut down nodes to protect the cluster.
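
The fencing agents that are available depend on the fence-agents packages installed on your systems. Assuming the pcs tooling used throughout this section, you can list the installed agents and display the parameters that a particular agent accepts, for example:

# pcs stonith list
# pcs stonith describe fence_ipmilan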

In this section, example configurations using some of the available fencing agents are presented. Note that these examples make certain assumptions about hardware and assume that you already know how to set up, configure, and use the hardware concerned. The examples are provided for basic guidance only, and it is recommended that you also refer to the upstream documentation to familiarize yourself with the concepts presented here.

Before proceeding with any of these example configurations, you must ensure that stonith is enabled for your cluster configuration:

# pcs property set stonith-enabled=true

After you have configured stonith, you can check your configuration to ensure that it is set up correctly by running the following commands:

# pcs stonith show --full
# pcs cluster verify -V

To check the status of your stonith configuration, run:

# pcs stonith

To check the status of your cluster, run:

# pcs status

IPMI LAN Fencing

Intelligent Platform Management Interface (IPMI) is an interface to a subsystem that provides management features for the host system's hardware and firmware, and includes facilities to power cycle a system over a dedicated management network without any requirement to access the system's operating system. The fence_ipmilan fencing agent can be configured for the cluster so that stonith can be achieved across the IPMI LAN.

If your systems are configured for IPMI, you can run the following commands on one of the nodes in the cluster to enable the ipmilan fencing agent and to configure stonith for both nodes:

# pcs stonith create ipmilan_n1_fencing fence_ipmilan pcmk_host_list=node1 delay=5 \
ipaddr=203.0.113.1 login=root passwd=password lanplus=1 op monitor interval=60s
# pcs stonith create ipmilan_n2_fencing fence_ipmilan pcmk_host_list=node2 \
ipaddr=203.0.113.2 login=root passwd=password lanplus=1 op monitor interval=60s

In the above example, the host named node1 has an IPMI LAN interface configured on the IP 203.0.113.1. The host named node2 has an IPMI LAN interface configured on the IP 203.0.113.2. The root user password for the IPMI login on both systems is specified here as password. In each instance, you should replace these configuration variables with the appropriate information to match your own environment.
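
Optionally, you can confirm that each IPMI LAN interface is reachable and that the credentials are valid before creating the stonith resources. The following sketch uses the ipmitool utility, which is assumed to be installed, together with the example addresses and credentials shown above:

# ipmitool -I lanplus -H 203.0.113.1 -U root -P password chassis power status
# ipmitool -I lanplus -H 203.0.113.2 -U root -P password chassis power status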

Note that the delay option should be set on only one node. This helps to ensure that, in the rare case of a fence race, only one node is killed and the other continues to run. Without this option set, it is possible for both nodes to believe that they are the only surviving node and to reset each other simultaneously.
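
If you did not set the delay when you created the stonith resource, you can add or adjust it afterwards. For example, assuming the resource names used above, the following command updates the fencing resource for node1:

# pcs stonith update ipmilan_n1_fencing delay=5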

Warning

The IPMI LAN agent exposes the login credentials of the IPMI subsystem in plain text. Your security policy should ensure that it is acceptable for users with access to the Pacemaker configuration and tools to also have access to these credentials and the underlying subsystems concerned.

SCSI Fencing

The SCSI fencing agent provides storage-level fencing. It uses SCSI-3 Persistent Reservations (SCSI-3 PR) to protect storage resources from being written to by two nodes at the same time. When the agent is used in conjunction with a watchdog service, a node can be reset automatically via stonith when it attempts to access the SCSI resource without a reservation.
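
The watchdog service relies on a watchdog device being available on each node, whether a hardware watchdog or the kernel softdog module. As a quick check, you can display the watchdog device currently in use with the wdctl utility from util-linux:

# wdctl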

To configure an environment in this way, install the watchdog service on both nodes, copy the provided fence_scsi_check script into the watchdog configuration directory, and then enable the service:

# yum install watchdog
# cp /usr/share/cluster/fence_scsi_check /etc/watchdog.d/
# systemctl enable --now watchdog

To use this fencing agent, you must also enable the iscsid service provided in the iscsi-initiator-utils package on both nodes:

# yum install -y iscsi-initiator-utils
# systemctl enable --now iscsid

Once both nodes are configured with the watchdog service and the iscsid service, you can configure the fence_scsi fencing agent on one of the cluster nodes to monitor a shared storage device, such as an iSCSI target. For example:

# pcs stonith create scsi_fencing fence_scsi pcmk_host_list="node1 node2" \
 devices="/dev/sdb" meta provides="unfencing"

In the example, node1 and node2 are the hostnames of the nodes in the cluster and /dev/sdb is the shared storage device. You should replace these variables with the appropriate information to match your own environment.
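
To confirm that the fencing agent has registered keys on the shared device, you can read the SCSI-3 persistent reservations directly. The following sketch uses sg_persist from the sg3_utils package, which is assumed to be installed, against the example device shown above:

# sg_persist --in --read-keys --device=/dev/sdb
# sg_persist --in --read-reservation --device=/dev/sdb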

SBD Fencing

Storage Based Death (SBD) is a daemon that runs on each node, monitors shared storage, and uses a messaging system to track cluster health. SBD can trigger a reset of a node in the event that the appropriate fencing agent determines that stonith should be implemented.

To set up and configure SBD fencing, stop the cluster by running the following command on one of the nodes:

# pcs cluster stop --all

On each node, install and configure the SBD daemon:

# yum install sbd

Edit /etc/sysconfig/sbd to set the SBD_DEVICE parameter to identify the shared storage device. For example, if your shared storage device is available on /dev/sdc, edit the file to contain the line:

SBD_DEVICE="/dev/sdc"
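
Depending on your hardware, you may also want to point SBD at a specific watchdog device in the same file. The following sketch assumes the default device path of /dev/watchdog:

SBD_WATCHDOG_DEV="/dev/watchdog"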

Enable the SBD service in systemd. Note that the SBD service cannot be started manually and is started automatically when the cluster starts:

# systemctl enable sbd

On one of the nodes, create the SBD messaging layout on the shared storage device and confirm that it is in place. For example, to set up and verify messaging on the shared storage device at /dev/sdc, run the following commands:

# sbd -d /dev/sdc create
# sbd -d /dev/sdc list

Finally, start the cluster and configure the fence_sbd fencing agent for the shared storage device. For example, to configure the shared storage device, /dev/sdc, run the following commands on one of the nodes:

# pcs cluster start --all
# pcs stonith create sbd_fencing fence_sbd devices=/dev/sdc
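
To verify the configuration, you can dump the SBD metadata that was written to the shared storage device and check that the fencing agent is running. For example, using the example device above:

# sbd -d /dev/sdc dump
# pcs stonith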

IF-MIB Fencing

IF-MIB fencing takes advantage of SNMP to access the IF-MIB on an Ethernet network switch and to shut down the port on the switch, effectively taking a host offline. This leaves the host running but disconnects it from the network. Bear in mind that any Fibre Channel or InfiniBand connections could remain intact even after the Ethernet connection has been terminated, which means that data made available over those connections could still be at risk. As a result, it is best to configure IF-MIB fencing as a fallback fencing mechanism. See Configuring Fencing Levels for more information on how to use multiple fencing agents together to maximize the chance that stonith succeeds.

To configure IF-MIB fencing, ensure that your switch is configured for SNMP v2c at minimum and that SNMP SET messages are enabled. For example, on an Oracle Switch, via the ILOM CLI, you could run:

# set /SP/services/snmp/ sets=enabled
# set /SP/services/snmp/ v2c=enabled

On one of the nodes in your cluster, configure the fence_ifmib fencing agent for each node in your environment. For example:

# pcs stonith create ifmib_n1_fencing fence_ifmib pcmk_host_list=node1 \
ipaddr=203.0.113.10 community=private port=1 delay=5 op monitor interval=60s
# pcs stonith create ifmib_n2_fencing fence_ifmib pcmk_host_list=node2 \
ipaddr=203.0.113.10 community=private port=2 op monitor interval=60s

In the above example, the switch SNMP IF-MIB is accessible at the IP address 203.0.113.10. The host node1 is connected to port 1 on the switch. The host node2 is connected to port 2 on the switch. You should replace these variables with the appropriate information to match your own environment.
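
Before relying on this agent, you can confirm that the switch responds to SNMP queries for the IF-MIB. The following sketch uses the snmpwalk utility from the net-snmp-utils package, which is assumed to be installed, with the example address and community string shown above:

# snmpwalk -v2c -c private 203.0.113.10 IF-MIB::ifDescr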

Configuring Fencing Levels

If you have configured multiple fencing agents, you may want to set different fencing levels. Fencing levels allow you to prioritize different approaches to fencing and provide a valuable fallback mechanism should the preferred fencing approach fail.

Each fencing level is attempted in ascending order starting from level 1. If the fencing agent configured for a particular level fails, the fencing agent from the next level is attempted instead.

For example, you may wish to configure IPMI LAN fencing at level 1, but fall back to IF-MIB fencing as a level 2 option. Using the example configurations from IPMI LAN Fencing and IF-MIB Fencing, you could run the following commands on one of the nodes to set the fencing levels for each configured agent:

# pcs stonith level add 1 node1 ipmilan_n1_fencing
# pcs stonith level add 1 node2 ipmilan_n2_fencing
# pcs stonith level add 2 node1 ifmib_n1_fencing
# pcs stonith level add 2 node2 ifmib_n2_fencing
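
To review the fencing levels that are currently configured, run:

# pcs stonith level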