Create a High Availability Cluster For Oracle Linux on Azure

Introduction

High availability cluster services in Oracle Linux are provided by open-source packages, most notably Pacemaker and Corosync.

Pacemaker is a high availability cluster resource manager responsible for managing the life cycle of software deployed on a cluster. Pacemaker works with compatible fencing agents to manage the fencing, or stonith (shoot the other node in the head), of faulty nodes. Pacemaker ships with the Pacemaker Command Shell (pcs), which provides a high-level command-line interface for configuring the cluster and its resources.

Corosync, which is installed with Pacemaker, is a cluster engine that manages cluster membership and provides a quorum system that can notify applications when quorum is achieved or lost.
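
For illustration, once the cluster built later in this tutorial is up and running, you can query membership and quorum state from either tool's command line. This is only a quick sketch; both commands assume the cluster has already been created and started:

    sudo pcs quorum status
    # Or query Corosync directly:
    sudo corosync-quorumtool -s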

Objectives

In this tutorial, you will:

  * Enable access to the Pacemaker and Corosync packages and install the Azure fencing agent.
  * Install and enable the Pacemaker packages and create a cluster.
  * Create and set up an Apache website with a floating virtual IP address.
  * Create a fencing device for each node and test that fencing works as expected.

Prerequisites

The example steps in the procedure that follows assume you have completed the following:

  * Deployed at least two Oracle Linux instances as Azure x86 VMs.
  * Configured the instances so that they can resolve each other's hostnames and communicate over the network.
  * Installed the Apache (httpd) web server on each instance, if you intend to follow the website example.
  * Obtained the Azure subscription ID and the name of the resource group that contains the VMs.

Note: For systems hosted on Azure, clustering with Pacemaker and Corosync is only available for Azure x86 VMs.

For more information about setting up and configuring resources in Azure, see https://learn.microsoft.com/en-us/azure/

Enable Access to the Pacemaker and Corosync Packages

  1. On each node in the cluster, enable repositories with the Oracle Linux Yum Server:

    sudo dnf config-manager --enable ol8_appstream ol8_baseos_latest ol8_addons
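
    Optionally, you can confirm that the repositories are enabled before continuing:

    sudo dnf repolist --enabled | grep -E 'ol8_(appstream|baseos_latest|addons)'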
    

Install the Azure Fencing Agent

  1. On each node in the cluster, install the package with the Azure SDK dependency:

    sudo dnf install fence-agents-azure-arm python3-azure-sdk
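
    Optionally, verify on each node that the fence agent and its Azure SDK dependency are installed:

    rpm -q fence-agents-azure-arm python3-azure-sdk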
    

Install and Enable the Pacemaker Packages

  1. On each node in the cluster, install the pcs and pacemaker software packages:

    sudo dnf install -y pcs pacemaker
    
  2. On each node, configure the firewall so that the service components are able to communicate across the network.

    For example:

    sudo firewall-cmd --permanent --add-service=high-availability
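
    Because --permanent only updates the stored firewall configuration, reload firewalld (or repeat the command without --permanent) for the rule to take effect immediately. For example, to apply and verify:

    sudo firewall-cmd --reload
    sudo firewall-cmd --list-services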
    
  3. Set a password for the hacluster user on each node:

    sudo passwd hacluster
    
  4. On each of the nodes in the cluster, set the pcsd service and pacemaker service to run, and to start at boot, by running the following commands:

    sudo systemctl enable --now pcsd.service
    sudo systemctl enable --now pacemaker.service
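
    Optionally, confirm on each node that both services are running and enabled at boot:

    systemctl is-active pcsd.service pacemaker.service
    systemctl is-enabled pcsd.service pacemaker.service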
    

Create the Cluster

  1. Authenticate the pcs cluster configuration tool for the hacluster user on each node in your configuration by running the following command on one of the nodes that will form part of the cluster:

    sudo pcs host auth node1 node2 -u hacluster
    

    Replace node1 and node2 with the resolvable hostnames of the nodes that will form part of the cluster.

    The tool prompts you to provide a password for the hacluster user. Provide the password that you set for this user when you installed and configured the pacemaker software on each node in one of the earlier steps.

  2. On one of the nodes, create the cluster by using the pcs cluster setup command. You must specify a name for the cluster and the node names and IP addresses for each node in the cluster. For example, run the following command:

    sudo pcs cluster setup azure_cluster node1 addr=192.0.2.1 node2 addr=192.0.2.2
    

    Replace azure_cluster with an appropriate name for the cluster.

    Replace node1 and node2 with the resolvable hostnames of the nodes in the cluster.

    Replace 192.0.2.1 and 192.0.2.2 with the IP addresses of each of the respective hosts in the cluster.

  3. On the same node where you ran the setup command, start the cluster by running the following command:

    sudo pcs cluster start --all
    
  4. Run the following command and confirm both nodes are online:

    sudo pcs status
    
  5. Still on the same server, disable stonith before creating the virtual IP address:

    sudo pcs property set stonith-enabled=false
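
    Optionally, you can confirm the property value (a quick check; the exact subcommand can vary between pcs releases):

    sudo pcs property show stonith-enabled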
    

Create and Set Up an Apache Website with a Floating Virtual IP Address

  1. Continuing on the same server, create a virtual IP address that can be assigned dynamically to any of the nodes:

    sudo pcs resource create VirtualIP IPaddr2 ip=192.0.2.3 cidr_netmask=24 nic=eth1 op monitor interval=1s --group apachegroup
    

    Replace 192.0.2.3 with the virtual IP address you are using in your setup.

    Replace apachegroup with a name of your choice.
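
    Once the resource is created, you can confirm that the virtual IP address is active on whichever node currently hosts it. This quick check assumes the nic=eth1 setting from the example command:

    sudo pcs status resources
    ip -brief addr show eth1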

  2. Continuing on the same server, create the Apache website resource:

    sudo pcs resource create Website apache configfile="/etc/httpd/conf/httpd.conf" statusurl="http://192.0.2.3/server-status" --group apachegroup
    

    Replace 192.0.2.3 with the virtual IP address you specified earlier.

    Replace apachegroup with the group name you specified earlier.
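
    Note: The apache resource agent monitors the web server through the statusurl, so the server-status handler must be reachable on every node that can run the Website resource. If your httpd.conf does not already expose it, the following is a minimal sketch that writes a drop-in configuration on a node (the file name status.conf is only an example):

    # Expose /server-status to local requests so the cluster can monitor httpd
    printf '<Location /server-status>\n  SetHandler server-status\n  Require local\n</Location>\n' | sudo tee /etc/httpd/conf.d/status.conf > /dev/null

    Because the cluster starts and stops httpd through the resource agent, you typically do not also enable httpd.service in systemd.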

  3. Verify the cluster status:

    sudo pcs status
    

Create a Fencing Device for Each Node

  1. Continuing on the node you set up the pcs cluster on, run the following, once for each node in your cluster:

    sudo pcs stonith create resource_stonith_azure fence_azure_arm msi=true \
    resourceGroup="Azure_resource_group" \
    subscriptionId="Azure_subscription_id" \
    pcmk_host_map="node:Azure_VM_Name" \
    power_timeout=240 \
    pcmk_reboot_timeout=900 \
    pcmk_monitor_timeout=120 \
    pcmk_monitor_retries=4 \
    pcmk_action_limit=3 \
    op monitor interval=3600 \
    --group fencegroup
    

    Replace resource_stonith_azure with a node-specific resource name of your choice.

    For example, you might specify resource name resource_stonith_azure-1 when you run the command for the first server, and resource_stonith_azure-2 when you run the command for the second server, and so on.

    Replace Azure_resource_group with the name of the Azure portal resource group that holds VMs and other resources.

    Replace Azure_subscription_id with your subscription ID in Azure.

    Replace node with the resolvable hostname of the node you are creating the device for, and Azure_VM_Name with the name of the host in Azure.

    Note: The option pcmk_host_map is only required if the hostnames and the Azure VM names are not identical.

    Replace fencegroup with a group name of your choice.
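
    For reference, here is what the command might look like for the first node, using the hypothetical values resource_stonith_azure-1 for the resource name, cluster-rg for the resource group, a placeholder subscription ID, and node1-vm for the Azure VM name. Note that with msi=true the VM's managed identity must have sufficient permissions in Azure to query and restart the cluster VMs:

    sudo pcs stonith create resource_stonith_azure-1 fence_azure_arm msi=true \
    resourceGroup="cluster-rg" \
    subscriptionId="00000000-0000-0000-0000-000000000000" \
    pcmk_host_map="node1:node1-vm" \
    power_timeout=240 \
    pcmk_reboot_timeout=900 \
    pcmk_monitor_timeout=120 \
    pcmk_monitor_retries=4 \
    pcmk_action_limit=3 \
    op monitor interval=3600 \
    --group fencegroup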

  2. Continuing on the same node, enable stonith:

    sudo pcs property set stonith-enabled=true
    
  3. Continuing on the same node, run the following commands to check your configuration and ensure that it is set up correctly:

    sudo pcs stonith config
    sudo pcs cluster verify --full
    

Test That Fencing Works as Expected

  1. Fence one of the nodes:

    sudo pcs stonith fence node2
    

    Replace node2 with the resolvable hostname of the node you are fencing.

    The command output confirms that node2 has been fenced:

    Node: node2 fenced
    
  2. Verify that the node you fenced (node2 in this example) is offline while it reboots and that the resources fail over to the node that is still online (node1 in this example):

    sudo pcs status
    
    Cluster name: azure_cluster
    Cluster Summary:
      * Stack: corosync (Pacemaker is running)
      * Current DC: node1 (version 2.1.6-9.1.0.1.el8_9-6fdc9deea29) - partition with quorum
      * Last updated: Sat Feb 3 22:35:58 2024 on node1
      * Last change: Sat Feb 3 22:33:37 2024 by root via cibadmin on node1
      * 2 nodes configured
      * 4 resource instances configured

    Node List:
      * Online: [ node1 ]
      * OFFLINE: [ node2 ]

    Full List of Resources:
      * Resource Group: apachegroup:
        * VirtualIP (ocf::heartbeat:IPaddr2): Started node1
        * Website (ocf::heartbeat:apache): Started node1
      * Resource Group: fencegroup:
        * resource_stonith_azure (stonith:fence_azure_arm): Started node1
        * resource_stonith_azure-2 (stonith:fence_azure_arm): Started node1

    Daemon Status:
      corosync: active/disabled
      pacemaker: active/enabled
      pcsd: active/enabled
    
    

    The preceding output in this example shows that the fencing resource from the fenced server (resource_stonith_azure-2 from node2) and the Apache website are now both running on the online node (node1 in our example).

    You can use the wget command to confirm the website is still accessible at the floating virtual IP address (replace 192.0.2.3 with the virtual IP address you are using in your setup):

     wget http://192.0.2.3/server-status
     --2024-02-07 19:47:42--  http://192.0.2.3/server-status
     Connecting to 192.0.2.3:80... connected.
     HTTP request sent, awaiting response... 200 OK
     Length: unspecified [text/html]
     Saving to: ‘server-status’

     server-status            [ <=>                ]   9.53K  --.-KB/s    in 0s

     2024-02-07 19:47:42 (204 MB/s) - ‘server-status’ saved [9757]
    
  3. Rerun the pcs status command to confirm that the node you fenced (node2 in this example) has successfully rebooted and come back online:

    sudo pcs status
    
    Cluster name: azure_cluster
    Cluster Summary:
      * Stack: corosync (Pacemaker is running)
      * Current DC: node1 (version 2.1.6-9.1.0.1.el8_9-6fdc9deea29) - partition with quorum
      * Last updated: Sat Feb 3 22:36:08 2024 on node1
      * Last change: Sat Feb 3 22:33:37 2024 by root via cibadmin on node1
      * 2 nodes configured
      * 4 resource instances configured

    Node List:
      * Online: [ node1 node2 ]

    Full List of Resources:
      * Resource Group: apachegroup:
        * VirtualIP (ocf::heartbeat:IPaddr2): Started node1
        * Website (ocf::heartbeat:apache): Started node1
      * Resource Group: fencegroup:
        * resource_stonith_azure (stonith:fence_azure_arm): Started node2
        * resource_stonith_azure-2 (stonith:fence_azure_arm): Started node2

    Daemon Status:
      corosync: active/disabled
      pacemaker: active/enabled
      pcsd: active/enabled
    

    The preceding output in this example shows that the server fenced earlier (node2 in our example) is back online, with the fencing resource resource_stonith_azure-2 running on it.


More Learning Resources

Explore other labs on docs.oracle.com/learn or access more free learning content on the Oracle Learning YouTube channel. Additionally, visit education.oracle.com/learning-explorer to become an Oracle Learning Explorer.

For product documentation, visit Oracle Help Center.