Creating a Service and Testing Failover

Services are created and usually configured to run a resource agent that is responsible for starting and stopping processes. Most resource agents are created according to the OCF (Open Cluster Framework) specification, which is defined as an extension for the Linux Standard Base (LSB). Many handy resource agents for commonly used processes are included in the resource-agents packages, including various heartbeat agents that track whether commonly used daemons or services are still running.
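To see which resource agents are available on a node, you can list them with pcs; an optional filter narrows the output to a particular standard or provider. For example, assuming the resource-agents packages are installed:

sudo pcs resource list ocf:heartbeat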
In the following example, a service is set up that uses a Dummy resource agent created precisely to test Pacemaker. This agent is used because it requires only a basic configuration and doesn't make any assumptions about the environment or the types of services that you intend to run with Pacemaker.

To create a service and test failover:
-
Add the service as a resource by using the pcs resource create command:

sudo pcs resource create dummy_service ocf:pacemaker:Dummy

In the previous example, dummy_service is the name that is assigned to the resource.

To invoke the Dummy resource agent, the notation ocf:pacemaker:Dummy specifies that the agent conforms to the OCF standard, that it runs in the pacemaker namespace, and that the Dummy script is used. If you were configuring a heartbeat monitor service for a clustered file system, you might use the ocf:heartbeat:Filesystem resource agent.

When you create a service, the cluster starts the resource on a node by using the resource agent's start command.
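If you want to check what an agent expects before you add a resource, pcs can print the agent's metadata, including its parameters and supported operations. For example:

sudo pcs resource describe ocf:pacemaker:Dummy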
-
Use the pcs status command to view the status of the cluster and its resources:

sudo pcs status
The following sample extract shows the command output:
Cluster name: pacemaker1
Cluster Summary:
  * Stack: corosync (Pacemaker is running)
  * Current DC: node1 (version version_information) - partition with quorum
  * Last updated: Fri Mar  7 13:31:26 2025 on node1
  * Last change:  Tue Mar  4 14:55:20 2025 by root via root on node1
  * 2 nodes configured
  * 1 resource instance configured

Node List:
  * Online: [ node2 node1 ]

Full List of Resources:
  * dummy_service (ocf:pacemaker:Dummy): Started node1

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
...
The preceding sample output shows both nodes are online, and the service is started on node1.
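If you prefer to watch the cluster state update continuously while you test failover, the crm_mon utility that ships with Pacemaker refreshes the same information on an interval (press Ctrl+C to exit):

sudo crm_mon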
-
Trigger a failover by stopping the cluster on the node where the service is running:
sudo pcs cluster stop node1
The command reports that Pacemaker and Corosync are being stopped on the node, as shown in the following sample output:
node1: Stopping Cluster (pacemaker)...
node1: Stopping Cluster (corosync)...
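Stopping the cluster stack is one way to force a failover. As an alternative that leaves Corosync and Pacemaker running, you can put the node into standby mode so that Pacemaker moves resources off it; depending on the pcs version, the subcommand is pcs node standby or pcs cluster standby:

sudo pcs node standby node1

Run the matching unstandby subcommand afterward to make the node eligible to run resources again.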
-
Verify that the cluster has been stopped, for example, by running the pcs cluster status command on the node (node1 in this example):

sudo pcs cluster status

The command reports that the cluster is no longer running, as shown in the following sample output:

Error: cluster is not currently running on this node
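As an extra check at the operating system level, the cluster daemons are managed as systemd services, so you can also confirm that they have exited:

sudo systemctl status corosync pacemaker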
-
Sign in to the other node, node2 in this example, and run the pcs status command to confirm that the service has been started on that node:

sudo pcs status
The following sample extract shows the command output:
Cluster name: pacemaker1
Cluster Summary:
  * Stack: corosync (Pacemaker is running)
  * Current DC: node2 (version version_information) - partition with quorum
  * Last updated: Fri Mar  7 13:35:11 2025 on node2
  * Last change:  Tue Mar  4 14:55:20 2025 by root via root on node1
  * 2 nodes configured
  * 1 resource instance configured

Node List:
  * Online: [ node2 ]
  * OFFLINE: [ node1 ]

Full List of Resources:
  * dummy_service (ocf:pacemaker:Dummy): Started node2

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
...
The preceding sample output shows that node1 is offline and that the service has successfully failed over and is now started on node2.
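If you only need to know where a single resource is running, the crm_resource utility that's included with Pacemaker can report its current location:

sudo crm_resource --resource dummy_service --locate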
-
Use the pcs cluster start command to start the cluster on the offline node:

sudo pcs cluster start node1
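To start the cluster stack on every configured node instead of a single node, you can use the --all option:

sudo pcs cluster start --all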
-
Confirm that the node is back online:
sudo pcs status
The following sample extract shows the expected command output:
Cluster name: pacemaker1
Cluster Summary:
  * Stack: corosync (Pacemaker is running)
  * Current DC: node2 (version version_information) - partition with quorum
  * Last updated: Fri Mar  7 13:38:36 2025 on node1
  * Last change:  Tue Mar  4 14:55:20 2025 by root via root on node1
  * 2 nodes configured
  * 1 resource instance configured

Node List:
  * Online: [ node2 node1 ]

Full List of Resources:
  * dummy_service (ocf:pacemaker:Dummy): Started node2

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
...
The preceding sample output shows node1 is back online.
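Note that dummy_service remains on node2 even after node1 rejoins the cluster: Pacemaker doesn't automatically move a running resource back to its original node when placement scores are tied, and resource stickiness settings can further discourage movement. If you want to return the service to node1 manually, one option is a resource move; be aware that this creates a location constraint, which recent pcs versions let you remove afterward with pcs resource clear:

sudo pcs resource move dummy_service node1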