Create a High Availability Cluster For Oracle Linux on Azure

Introduction

High availability cluster services in Oracle Linux are provided by open-source packages, most notably Pacemaker and Corosync.

Pacemaker is a high availability cluster resource manager responsible for managing the life cycle of software deployed on a cluster. Pacemaker works with compatible fencing agents to manage the fencing, or stonith (shoot the other node in the head), of faulty nodes. Pacemaker ships with the Pacemaker Command Shell (pcs), which provides a high-level command-line interface for configuring the cluster and its resources.

Corosync, which is installed with Pacemaker, is a cluster engine that manages cluster membership and provides a quorum system that can notify applications when quorum is achieved or lost.
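
For illustration, once the cluster built later in this tutorial is up and running, you can query membership and quorum state from either tool's command line. This is only a quick sketch; both commands assume the cluster has already been created and started:

    sudo pcs quorum status
    # Or query Corosync directly:
    sudo corosync-quorumtool -s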

Objectives

In this tutorial, you will:

  * Enable access to the Pacemaker and Corosync packages and install the Azure fencing agent.
  * Install and enable the Pacemaker packages and create a cluster.
  * Create and set up an Apache website with a floating virtual IP address.
  * Create a fencing device for each node and test that fencing works as expected.

Prerequisites

The example steps in the procedure that follows assume you have completed the following:

  * Deployed at least two Oracle Linux instances as Azure x86 VMs.
  * Configured the instances so that they can resolve each other's hostnames and communicate over the network.
  * Installed the Apache (httpd) web server on each instance, if you intend to follow the website example.
  * Obtained the Azure subscription ID and the name of the resource group that contains the VMs.

Note: For systems hosted on Azure, clustering with Pacemaker and Corosync is only available for Azure x86 VMs.

For more information about setting up and configuring resources in Azure, see https://learn.microsoft.com/en-us/azure/

Enable Access to the Pacemaker and Corosync Packages

  1. On each node in the cluster, enable repositories with the Oracle Linux Yum Server:

    sudo dnf config-manager --enable ol8_appstream ol8_baseos_latest ol8_addons
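
    Optionally, you can confirm that the repositories are enabled before continuing:

    sudo dnf repolist --enabled | grep -E 'ol8_(appstream|baseos_latest|addons)'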
    

Install the Azure Fencing Agent

  1. On each node in the cluster, install the package with the Azure SDK dependency:

    sudo dnf install fence-agents-azure-arm python3-azure-sdk
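
    Optionally, verify on each node that the fence agent and its Azure SDK dependency are installed:

    rpm -q fence-agents-azure-arm python3-azure-sdk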
    

Install and Enable the Pacemaker Packages

  1. On each node in the cluster, install the pcs and pacemaker software packages:

    sudo dnf install -y pcs pacemaker
    
  2. On each node, configure the firewall so that the service components are able to communicate across the network.

    For example:

    sudo firewall-cmd --permanent --add-service=high-availability
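
    Because --permanent only updates the stored firewall configuration, reload firewalld (or repeat the command without --permanent) for the rule to take effect immediately. For example, to apply and verify:

    sudo firewall-cmd --reload
    sudo firewall-cmd --list-services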
    
  3. Set a password for the hacluster user on each node:

    sudo passwd hacluster
    
  4. On each of the nodes in the cluster, set the pcsd service and pacemaker service to run, and to start at boot, by running the following commands:

    sudo systemctl enable --now pcsd.service
    sudo systemctl enable --now pacemaker.service
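
    Optionally, confirm on each node that both services are running and enabled at boot:

    systemctl is-active pcsd.service pacemaker.service
    systemctl is-enabled pcsd.service pacemaker.service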
    

Create the Cluster

  1. Authenticate the pcs cluster configuration tool for the hacluster user on each node in your configuration by running the following command on one of the nodes that will form part of the cluster:

    sudo pcs host auth node1 node2 -u hacluster
    

    Replace node1 and node2 with the resolvable hostnames of the nodes that will form part of the cluster.

    The tool prompts you to provide a password for the hacluster user. Provide the password that you set for this user when you installed and configured the pacemaker software on each node in one of the earlier steps.

  2. On one of the nodes, create the cluster by using the pcs cluster setup command. You must specify a name for the cluster and the node names and IP addresses for each node in the cluster. For example, run the following command:

    sudo pcs cluster setup azure_cluster node1 addr=192.0.2.1 node2 addr=192.0.2.2
    

    Replace azure_cluster with an appropriate name for the cluster.

    Replace node1 and node2 with the resolvable hostnames of the nodes in the cluster.

    Replace 192.0.2.1 and 192.0.2.2 with the IP addresses of each of the respective hosts in the cluster.

  3. On the same node where you ran the setup command, start the cluster by running the following command:

    sudo pcs cluster start --all
    
  4. Run the following command and confirm both nodes are online:

    sudo pcs status
    
  5. Still on the same server, disable stonith before creating the virtual IP address:

    sudo pcs property set stonith-enabled=false
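
    Optionally, you can confirm the property value (a quick check; the exact subcommand can vary between pcs releases):

    sudo pcs property show stonith-enabled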
    

Create and Set Up an Apache Website with a Floating Virtual IP Address

  1. Continuing on the same server, create a virtual IP address that can be assigned dynamically to any of the nodes:

    sudo pcs resource create VirtualIP IPaddr2 ip=192.0.2.3 cidr_netmask=24 nic=eth1 op monitor interval=1s --group apachegroup
    

    Replace 192.0.2.3 with the virtual IP address you are using in your setup.

    Replace apachegroup with a name of your choice.
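
    Once the resource is created, you can confirm that the virtual IP address is active on whichever node currently hosts it. This quick check assumes the nic=eth1 setting from the example command:

    sudo pcs status resources
    ip -brief addr show eth1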

  2. Continuing on the same server, create the Apache website resource:

    sudo pcs resource create Website apache configfile="/etc/httpd/conf/httpd.conf" statusurl="http://192.0.2.3/server-status" --group apachegroup
    

    Replace 192.0.2.3 with the virtual IP address you specified earlier.

    Replace apachegroup with the group name you specified earlier.
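
    Note: The apache resource agent monitors the web server through the statusurl, so the server-status handler must be reachable on every node that can run the Website resource. If your httpd.conf does not already expose it, the following is a minimal sketch that writes a drop-in configuration on a node (the file name status.conf is only an example):

    # Expose /server-status to local requests so the cluster can monitor httpd
    printf '<Location /server-status>\n  SetHandler server-status\n  Require local\n</Location>\n' | sudo tee /etc/httpd/conf.d/status.conf > /dev/null

    Because the cluster starts and stops httpd through the resource agent, you typically do not also enable httpd.service in systemd.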

  3. Verify the cluster status:

    sudo pcs status
    

Create a Fencing Device for Each Node

  1. Continuing on the node you set up the pcs cluster on, run the following, once for each node in your cluster:

    sudo pcs stonith create resource_stonith_azure fence_azure_arm msi=true \
    resourceGroup="Azure_resource_group" \
    subscriptionId="Azure_subscription_id" \
    pcmk_host_map="node:Azure_VM_Name" \
    power_timeout=240 \
    pcmk_reboot_timeout=900 \
    pcmk_monitor_timeout=120 \
    pcmk_monitor_retries=4 \
    pcmk_action_limit=3 \
    op monitor interval=3600 \
    --group fencegroup
    

    Replace resource_stonith_azure with a node-specific resource name of your choice.

    For example, you might specify resource name resource_stonith_azure-1 when you run the command for the first server, and resource_stonith_azure-2 when you run the command for the second server, and so on.

    Replace Azure_resource_group with the name of the Azure portal resource group that holds VMs and other resources.

    Replace Azure_subscription_id with your subscription ID in Azure.

    Replace node with the resolvable hostname of the node you are creating the device for, and Azure_VM_Name with the name of the host in Azure.

    Note: The option pcmk_host_map is only required if the hostnames and the Azure VM names are not identical.

    Replace fencegroup with a group name of your choice.
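
    For reference, here is what the command might look like for the first node, using the hypothetical values resource_stonith_azure-1 for the resource name, cluster-rg for the resource group, a placeholder subscription ID, and node1-vm for the Azure VM name. Note that with msi=true the VM's managed identity must have sufficient permissions in Azure to query and restart the cluster VMs:

    sudo pcs stonith create resource_stonith_azure-1 fence_azure_arm msi=true \
    resourceGroup="cluster-rg" \
    subscriptionId="00000000-0000-0000-0000-000000000000" \
    pcmk_host_map="node1:node1-vm" \
    power_timeout=240 \
    pcmk_reboot_timeout=900 \
    pcmk_monitor_timeout=120 \
    pcmk_monitor_retries=4 \
    pcmk_action_limit=3 \
    op monitor interval=3600 \
    --group fencegroup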

  2. Continuing on the same node, enable stonith:

    sudo pcs property set stonith-enabled=true
    
  3. Continuing on the same node, run the following commands to check your configuration and ensure that it is set up correctly:

    sudo pcs stonith config
    sudo pcs cluster verify --full
    

Test That Fencing Works as Expected

  1. Fence one of the nodes:

    sudo pcs stonith fence node2
    

    Replace node2 with the resolvable hostname of the node you are fencing.

    The command output confirms that node2 has been fenced:

    Node: node2 fenced
    
  2. Verify that the node you fenced (node2 in this example) is offline while it reboots and that the resources fail over to the node that is still online (node1 in this example):

    sudo pcs status
    
    Cluster name: azure_cluster
    Cluster Summary:
      * Stack: corosync (Pacemaker is running)
      * Current DC: node1 (version 2.1.6-9.1.0.1.el8_9-6fdc9deea29) - partition with quorum
      * Last updated: Sat Feb 3 22:35:58 2024 on node1
      * Last change: Sat Feb 3 22:33:37 2024 by root via cibadmin on node1
      * 2 nodes configured
      * 4 resource instances configured

    Node List:
      * Online: [ node1 ]
      * OFFLINE: [ node2 ]

    Full List of Resources:
      * Resource Group: apachegroup:
        * VirtualIP (ocf::heartbeat:IPaddr2): Started node1
        * Website (ocf::heartbeat:apache): Started node1
      * Resource Group: fencegroup:
        * resource_stonith_azure (stonith:fence_azure_arm): Started node1
        * resource_stonith_azure-2 (stonith:fence_azure_arm): Started node1

    Daemon Status:
      corosync: active/disabled
      pacemaker: active/enabled
      pcsd: active/enabled
    
    

    The preceding output in this example shows that the fencing resource from the fenced server (resource_stonith_azure-2 from node2) and the Apache website are now both running on the online node (node1 in our example).

    You can use the wget command to confirm the website is still accessible at the floating virtual IP address (replace 192.0.2.3 with the virtual IP address you are using in your setup):

     wget http://192.0.2.3/server-status
     --2024-02-07 19:47:42--  http://192.0.2.3/server-status
     Connecting to 192.0.2.3:80... connected.
     HTTP request sent, awaiting response... 200 OK
     Length: unspecified [text/html]
     Saving to: ‘server-status’

     server-status            [ <=>                ]   9.53K  --.-KB/s    in 0s

     2024-02-07 19:47:42 (204 MB/s) - ‘server-status’ saved [9757]
    
  3. Rerun the pcs status command to confirm that the node you fenced (node2 in this example) has successfully rebooted and come back online:

    sudo pcs status
    
    Cluster name: azure_cluster
    Cluster Summary:
      * Stack: corosync (Pacemaker is running)
      * Current DC: node1 (version 2.1.6-9.1.0.1.el8_9-6fdc9deea29) - partition with quorum
      * Last updated: Sat Feb 3 22:36:08 2024 on node1
      * Last change: Sat Feb 3 22:33:37 2024 by root via cibadmin on node1
      * 2 nodes configured
      * 4 resource instances configured

    Node List:
      * Online: [ node1 node2 ]

    Full List of Resources:
      * Resource Group: apachegroup:
        * VirtualIP (ocf::heartbeat:IPaddr2): Started node1
        * Website (ocf::heartbeat:apache): Started node1
      * Resource Group: fencegroup:
        * resource_stonith_azure (stonith:fence_azure_arm): Started node2
        * resource_stonith_azure-2 (stonith:fence_azure_arm): Started node2

    Daemon Status:
      corosync: active/disabled
      pacemaker: active/enabled
      pcsd: active/enabled
    

    The preceding output in this example shows that the server fenced earlier (node2 in our example) is back online, with the fencing resource resource_stonith_azure-2 running on it.


More Learning Resources

Explore other labs on docs.oracle.com/learn or access more free learning content on the Oracle Learning YouTube channel. Additionally, visit education.oracle.com/learning-explorer to become an Oracle Learning Explorer.

For product documentation, visit Oracle Help Center.