Creating a Service and Testing Failover

To create a service and test failover:

Services are created and usually configured to run a resource agent that is responsible for starting and stopping processes. Most resource agents are created according to the OCF (Open Cluster Framework) specification, which is defined as an extension of the Linux Standard Base (LSB). Many useful resource agents for commonly used processes are included in the resource-agents packages, including various heartbeat agents that track whether those daemons or services are still running.
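
You can browse the agents that are available on a system by using pcs itself. For example, assuming that the resource-agents packages are installed, the following commands list the supported resource standards and the agents in the heartbeat provider namespace:

    sudo pcs resource standards
    sudo pcs resource agents ocf:heartbeat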

In the following example, a service is set up that uses the Dummy resource agent, which exists specifically for testing Pacemaker. This agent is used because it requires only minimal configuration and doesn't make any assumptions about the environment or the types of services that you intend to run with Pacemaker.
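
Before you create the resource, you can optionally display the parameters that the Dummy agent accepts and their default values:

    sudo pcs resource describe ocf:pacemaker:Dummy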

  1. Add the service as a resource by using the pcs resource create command:

    sudo pcs resource create dummy_service ocf:pacemaker:Dummy

    In the preceding command, dummy_service is the name that is assigned to the resource.

    The notation ocf:pacemaker:Dummy specifies that the resource agent conforms to the OCF standard, that it belongs to the pacemaker provider namespace, and that the Dummy agent script is used. If you were configuring a heartbeat monitor service for a clustered file system, you might use the ocf:heartbeat:Filesystem resource agent, as shown in the sketch that follows.
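
    For illustration only, a file system resource of that kind might be created as follows. The resource name, device, mount point, and file system type shown here are placeholders that you would replace with values from your own environment:

    # Hypothetical values: substitute the device, directory, and fstype
    # that match your environment.
    sudo pcs resource create cluster_fs ocf:heartbeat:Filesystem \
        device=/dev/sdb1 directory=/mnt/shared fstype=xfs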

    When you create a service, the cluster starts the resource on a node by using the resource agent's start command.
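
    The cluster also runs the agent's monitor operation periodically to check that the resource is still healthy. As a sketch, you can adjust the monitoring interval after creation with a command such as the following; the 30-second interval is illustrative:

    sudo pcs resource update dummy_service op monitor interval=30s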

  2. Use the pcs status command to view the status of the cluster and its resources:

    sudo pcs status

    The following sample extract shows the command output:

    Cluster name: pacemaker1
    Cluster Summary:
      * Stack: corosync (Pacemaker is running)
      * Current DC: node1 (version version_information) - partition with quorum
      * Last updated: Fri Mar  7 13:31:26 2025 on node1
      * Last change:  Tue Mar  4 14:55:20 2025 by root via root on node1
      * 2 nodes configured
      * 1 resource instance configured
    
    Node List:
      * Online: [ node2 node1 ]
    
    Full List of Resources:
      * dummy_service	(ocf:pacemaker:Dummy):	 Started node1
    
    Daemon Status:
      corosync: active/enabled
      pacemaker: active/enabled
      pcsd: active/enabled
    ...
    

    The preceding sample output shows both nodes are online, and the service is started on node1.
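
    If you're only interested in the resources, you can limit the output to that part of the status report:

    sudo pcs status resources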

  3. Trigger a failover by stopping the cluster on the node where the service is running:

    sudo pcs cluster stop node1

    The command reports that Pacemaker and Corosync are being stopped on the node, as shown in the following sample output:

    node1: Stopping Cluster (pacemaker)...
    node1: Stopping Cluster (corosync)...
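
    Stopping the cluster is one way to force a failover. As an alternative that keeps the cluster software running, you can put the node into standby mode so that it can't host resources, and take it out of standby again afterward. On recent pcs versions the commands are as follows; on older versions, the equivalent subcommand is pcs cluster standby:

    sudo pcs node standby node1
    sudo pcs node unstandby node1
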
  4. Verify that the cluster has been stopped, for example, by running the pcs cluster status command on the node (node1 in this example):

    sudo pcs cluster status

    The command reports that the cluster is no longer running, as shown in the following sample output:

    Error: cluster is not currently running on this node
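
    Commands that query the cluster state fail in this way on any node where the cluster is stopped. From a node that is still in the cluster, you can instead confirm the membership view, for example:

    sudo pcs status nodes
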
  5. Sign in to the other node, node2 in this example, and run the pcs status command to confirm that the service has been started on that node:

    sudo pcs status

    The following sample extract shows the command output:

    Cluster name: pacemaker1
    Cluster Summary:
      * Stack: corosync (Pacemaker is running)
      * Current DC: node2 (version version_information) - partition with quorum
      * Last updated: Fri Mar  7 13:35:11 2025 on node2
      * Last change:  Tue Mar  4 14:55:20 2025 by root via root on node1
      * 2 nodes configured
      * 1 resource instance configured
    
    Node List:
      * Online: [ node2 ]
      * OFFLINE: [ node1 ]
    
    Full List of Resources:
      * dummy_service	(ocf:pacemaker:Dummy):	 Started node2
    
    Daemon Status:
      corosync: active/enabled
      pacemaker: active/enabled
      pcsd: active/enabled
    ...
    

    The preceding sample output shows that node1 is offline and that the service has successfully failed over and is now started on node2.
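
    While you're signed in to node2, you can also review how the resource is defined. On older pcs versions, the equivalent subcommand is pcs resource show:

    sudo pcs resource config dummy_service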

  6. Use the pcs cluster start command to start the cluster on the offline node:

    sudo pcs cluster start node1
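
    Note that the service doesn't automatically fail back to node1 when the node rejoins the cluster. If you want to move the resource back manually, you can use a command such as the following; be aware that pcs resource move typically creates a location constraint, which you can remove afterward with pcs resource clear dummy_service:

    sudo pcs resource move dummy_service node1
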
  7. Confirm the node is back online:

    sudo pcs status

    The following sample extract shows the expected command output:

    Cluster name: pacemaker1
    Cluster Summary:
      * Stack: corosync (Pacemaker is running)
      * Current DC: node2 (version version_information) - partition with quorum
      * Last updated: Fri Mar  7 13:38:36 2025 on node1
      * Last change:  Tue Mar  4 14:55:20 2025 by root via root on node1
      * 2 nodes configured
      * 1 resource instance configured
    
    Node List:
      * Online: [ node2 node1 ]
    
    Full List of Resources:
      * dummy_service	(ocf:pacemaker:Dummy):	 Started node2
    
    Daemon Status:
      corosync: active/enabled
      pacemaker: active/enabled
      pcsd: active/enabled
    ...
    

    The preceding sample output shows node1 is back online.
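
When you have finished testing, you can remove the test resource from the cluster:

    sudo pcs resource delete dummy_service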