Create a High Availability Cluster For Oracle Linux on Azure
Introduction
High availability cluster services in Oracle Linux consist of open-source packages that include the Pacemaker and Corosync features.
Pacemaker is a high availability cluster resource manager responsible for managing the life cycle of software deployed on a cluster. Pacemaker works with compatible fencing agents to manage the fencing, or stonith (shoot the other node in the head), of faulty nodes. Pacemaker ships with the Pacemaker Command Shell (pcs), which provides a high-level command-line interface to configure the cluster and its resources.
Corosync, which is installed with Pacemaker, is a cluster engine that manages cluster membership and provides a quorum system that can notify applications when quorum is achieved or lost.
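For a two-node cluster like the one in this tutorial, the corosync.conf file that pcs generates enables two-node mode, in which either node retains quorum if its peer fails. A representative quorum section (a sketch only; the actual /etc/corosync/corosync.conf is generated by pcs cluster setup and may differ) looks like this:

```
quorum {
    provider: corosync_votequorum
    two_node: 1
}
```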
Objectives
In this tutorial you will:
- Install Pacemaker, the Pacemaker Command Shell (pcs), and the Azure fencing agent for Azure Resource Manager (ARM).
- Create a high availability Oracle Linux cluster in Azure.
- Create a fencing device for each node in the cluster.
- Create an Apache website accessed with a floating virtual IP address.
- Verify that the fencing and resource failover functionality works as expected.
Prerequisites
The example steps in the procedures that follow assume you have completed the following:
Note: For systems hosted on Azure, clustering with Pacemaker and Corosync is only available for Azure x86 VMs.
- Created two Oracle Linux VMs in Azure.
- Created a system-assigned managed identity (MSI) for each VM in the cluster.
- Set up each VM with the following NICs:
  - eth0 configured with an IP address.
  - eth1 configured without an IP address (eth1 is to be used for a virtual IP).
- Set up a shared storage device accessible from all your nodes.
- Mounted the storage device on /var/www/html for use with Apache testing.
For more information about setting up and configuring resources in Azure, see https://learn.microsoft.com/en-us/azure/
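If you use the Azure CLI, the managed identity prerequisite can be scripted. The sketch below only prints the az commands so you can review them before running anything; the resource group and VM names are placeholders you must replace. Note also that the fence agent requires the identity to be granted permission to power the VMs off and on, for example through a role assignment.

```shell
# Dry-run sketch: print the Azure CLI command that would enable a
# system-assigned managed identity on each cluster VM, so the commands
# can be reviewed before running them. All names here are placeholders.
print_msi_commands() {
  resource_group="$1"; shift
  for vm in "$@"; do
    echo "az vm identity assign --resource-group $resource_group --name $vm"
  done
}

# Example: substitute your own resource group and VM names.
print_msi_commands "example-resource-group" node1-vm node2-vm
```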
Enable Access to the Pacemaker and Corosync Packages
- On each node in the cluster, enable the required repositories on the Oracle Linux Yum Server:

  sudo dnf config-manager --enable ol8_appstream ol8_baseos_latest ol8_addons
Install the Azure Fencing Agent
- On each node in the cluster, install the Azure fence agent package and its Azure SDK dependency:

  sudo dnf install fence-agents-azure-arm python3-azure-sdk
Install and Enable the Pacemaker Packages
- On each node in the cluster, install the pcs and pacemaker software packages:

  sudo dnf install -y pcs pacemaker
- On each node, configure the firewall so that the service components are able to communicate across the network, then reload the firewall so the permanent rule takes effect. For example:

  sudo firewall-cmd --permanent --add-service=high-availability
  sudo firewall-cmd --reload
- Set a password for the hacluster user on each node:

  sudo passwd hacluster
- On each of the nodes in the cluster, set the pcsd service and pacemaker service to run, and to start at boot, by running the following commands:

  sudo systemctl enable --now pcsd.service
  sudo systemctl enable --now pacemaker.service
Create the Cluster
- Authenticate the pcs cluster configuration tool for the hacluster user on each node in your configuration by running the following command on one of the nodes that will form part of the cluster:

  sudo pcs host auth node1 node2 -u hacluster
Replace node1 and node2 with the resolvable hostnames of the nodes that will form part of the cluster.
The tool prompts you to provide a password for the hacluster user. Provide the password that you set for this user when you installed and configured the pacemaker software on each node in one of the earlier steps.
- On one of the nodes, create the cluster by using the pcs cluster setup command. You must specify a name for the cluster, and the node names and IP addresses of each node in the cluster. For example, run the following command:

  sudo pcs cluster setup azure_cluster node1 addr=192.0.2.1 node2 addr=192.0.2.2
Replace azure_cluster with an appropriate name for the cluster.
Replace node1 and node2 with the resolvable hostnames of the nodes in the cluster.
Replace 192.0.2.1 and 192.0.2.2 with the IP addresses of each of the respective hosts in the cluster.
- On the same node where you ran the setup command, start the cluster by running the following command:

  sudo pcs cluster start --all
- Run the following command and confirm both nodes are online:

  sudo pcs status
- Still on the same server, disable stonith before creating the virtual IP address:

  sudo pcs property set stonith-enabled=false
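Before running the pcs cluster setup command above, it can help to confirm that each node hostname actually resolves. A small pre-flight check, sketched here with getent, might look like this:

```shell
# check_resolves: succeed only if every given hostname resolves.
check_resolves() {
  for host in "$@"; do
    if ! getent hosts "$host" > /dev/null; then
      echo "$host does not resolve" >&2
      return 1
    fi
  done
  echo "all hostnames resolve"
}

# Example: replace with your own cluster hostnames.
# check_resolves node1 node2
```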
Create and Set up an Apache Website with a Floating Virtual IP Address
- Continuing on the same server, create a virtual IP address that can be assigned dynamically to any of the nodes:

  sudo pcs resource create VirtualIP IPaddr2 ip=192.0.2.3 cidr_netmask=24 nic=eth1 op monitor interval=1s --group apachegroup
Replace 192.0.2.3 with the virtual IP address you are using in your setup.
Replace apachegroup with a name of your choice.
- Continuing on the same server, create the Apache website resource:

  sudo pcs resource create Website apache configfile="/etc/httpd/conf/httpd.conf" statusurl="http://192.0.2.3/server-status" --group apachegroup
Replace 192.0.2.3 with the virtual IP address you specified earlier.
Replace apachegroup with the group name you specified earlier.
- Verify the cluster status:

  sudo pcs status
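The apache resource agent monitors the statusurl you specified, which requires Apache's mod_status module to serve /server-status. If your httpd.conf does not already enable it, a fragment along these lines (placed, for example, in a file such as /etc/httpd/conf.d/status.conf, a hypothetical filename) allows local status requests:

```
<Location /server-status>
    SetHandler server-status
    Require local
</Location>
```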
Create a Fencing Device for Each Node
- Continuing on the node where you set up the pcs cluster, run the following command once for each node in your cluster:

  sudo pcs stonith create resource_stonith_azure fence_azure_arm msi=true \
    resourceGroup="Azure_resource_group" \
    subscriptionId="Azure_subscription_id" \
    pcmk_host_map="node:Azure_VM_Name" \
    power_timeout=240 \
    pcmk_reboot_timeout=900 \
    pcmk_monitor_timeout=120 \
    pcmk_monitor_retries=4 \
    pcmk_action_limit=3 \
    op monitor interval=3600 \
    --group fencegroup
Replace resource_stonith_azure with a node-specific resource name of your choice.
For example, you might specify resource name resource_stonith_azure-1 when you run the command for the first server, and resource_stonith_azure-2 when you run the command for the second server, and so on.
Replace Azure_resource_group with the name of the Azure portal resource group that holds VMs and other resources.
Replace Azure_subscription_id with your subscription ID in Azure.
Replace node with the resolvable hostname of the node you are creating the device for, and Azure_VM_Name with the name of the host in Azure.
Note: The pcmk_host_map option is only required if the hostnames and the Azure VM names are not identical.
Replace fencegroup with a group name of your choice.
- Continuing on the same node, enable stonith:

  sudo pcs property set stonith-enabled=true
- Continuing on the same node, run the following commands to check your configuration and ensure that it is set up correctly:

  sudo pcs stonith config
  sudo pcs cluster verify --full
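Because the stonith create command above is long and must be repeated once per node, it can be convenient to generate the commands for review first. The sketch below only prints them; the resource group, subscription ID, and hostname-to-VM-name pairs are placeholders.

```shell
# Dry-run sketch: print one "pcs stonith create" command per node so
# the substitutions can be reviewed before anything is run. All values
# passed in are placeholders for your own Azure names.
print_stonith_commands() {
  rg="$1"; sub="$2"; shift 2
  i=1
  for pair in "$@"; do   # each pair is "hostname:azure-vm-name"
    echo "sudo pcs stonith create resource_stonith_azure-$i fence_azure_arm" \
         "msi=true resourceGroup=$rg subscriptionId=$sub" \
         "pcmk_host_map=$pair --group fencegroup"
    i=$((i+1))
  done
}

# Example: substitute your own values.
print_stonith_commands "Azure_resource_group" "Azure_subscription_id" \
  "node1:node1-vm" "node2:node2-vm"
```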
Test That Fencing Works as Expected
- Fence one of the nodes:

  sudo pcs stonith fence node2
Replace node2 with the resolvable hostname of the node you are fencing.
The command output confirms that node2 has been fenced:
Node: node2 fenced
- Verify that the node you fenced (node2 in this example) is offline while it reboots and that the resources fail over to the node that is still online (node1 in this example):

  sudo pcs status
  Cluster name: azure_cluster
  Cluster Summary:
    * Stack: corosync (Pacemaker is running)
    * Current DC: node1 (version 2.1.6-9.1.0.1.el8_9-6fdc9deea29) - partition with quorum
    * Last updated: Sat Feb  3 22:35:58 2024 on node1
    * Last change:  Sat Feb  3 22:33:37 2024 by root via cibadmin on node1
    * 2 nodes configured
    * 4 resource instances configured

  Node List:
    * Online: [ node1 ]
    * OFFLINE: [ node2 ]

  Full List of Resources:
    * Resource Group: apachegroup:
      * VirtualIP (ocf::heartbeat:IPaddr2): Started node1
      * Website (ocf::heartbeat:apache): Started node1
    * Resource Group: fencegroup:
      * resource_stonith_azure (stonith:fence_azure_arm): Started node1
      * resource_stonith_azure-2 (stonith:fence_azure_arm): Started node1

  Daemon Status:
    corosync: active/disabled
    pacemaker: active/enabled
    pcsd: active/enabled
The preceding output shows that the fencing resource from the fenced server (resource_stonith_azure-2 from node2) and the Apache website are now both running on the online node (node1 in this example).
You can use the wget command to confirm the website is still accessible at the floating virtual IP address (replace 192.0.2.3 with the virtual IP address you are using in your setup):

  wget http://192.0.2.3/server-status
  --2024-02-07 19:47:42--  http://192.0.2.3/server-status
  Connecting to 192.0.2.3:80... connected.
  HTTP request sent, awaiting response... 200 OK
  Length: unspecified [text/html]
  Saving to: 'server-status'

  server-status          [ <=> ]   9.53K  --.-KB/s    in 0s

  2024-02-07 19:47:42 (204 MB/s) - 'server-status' saved [9757]
- Rerun the pcs status command to confirm that the node you fenced (node2 in this example) has successfully rebooted and come back online:

  sudo pcs status
  Cluster name: azure_cluster
  Cluster Summary:
    * Stack: corosync (Pacemaker is running)
    * Current DC: node1 (version 2.1.6-9.1.0.1.el8_9-6fdc9deea29) - partition with quorum
    * Last updated: Sat Feb  3 22:36:08 2024 on node1
    * Last change:  Sat Feb  3 22:33:37 2024 by root via cibadmin on node1
    * 2 nodes configured
    * 4 resource instances configured

  Node List:
    * Online: [ node1 node2 ]

  Full List of Resources:
    * Resource Group: apachegroup:
      * VirtualIP (ocf::heartbeat:IPaddr2): Started node1
      * Website (ocf::heartbeat:apache): Started node1
    * Resource Group: fencegroup:
      * resource_stonith_azure (stonith:fence_azure_arm): Started node2
      * resource_stonith_azure-2 (stonith:fence_azure_arm): Started node2

  Daemon Status:
    corosync: active/disabled
    pacemaker: active/enabled
    pcsd: active/enabled
The preceding output shows that the server fenced earlier (node2 in this example) is back online, with the fencing resource resource_stonith_azure-2 running on it.
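The pcs status checks in this section can also be scripted. For example, a small helper (a sketch) that fails if any node is reported OFFLINE in pcs status output:

```shell
# nodes_online: read "pcs status" output on stdin and succeed only if
# no node is listed as OFFLINE.
nodes_online() {
  if grep -q 'OFFLINE'; then
    echo "at least one node is offline" >&2
    return 1
  fi
  echo "all nodes online"
}

# Example against sample output; in practice: sudo pcs status | nodes_online
printf 'Node List:\n  * Online: [ node1 node2 ]\n' | nodes_online
# prints: all nodes online
```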
For More Information
More Learning Resources
Explore other labs on docs.oracle.com/learn or access more free learning content on the Oracle Learning YouTube channel. Additionally, visit education.oracle.com/learning-explorer to become an Oracle Learning Explorer.
For product documentation, visit Oracle Help Center.
F93996-02
March 2024
Copyright © 2024, Oracle and/or its affiliates.