Backup Control Plane Nodes on Oracle Cloud Native Environment

Introduction

Oracle Cloud Native Environment ships with a module that allows an administrator to back up and restore the control plane configuration files. This tutorial covers performing a backup, inspecting the backup file, and then restoring from the backup file.

Objectives

In this lab, you’ll learn how to:

  - Back up the control plane node configuration, including the etcd database
  - Verify and inspect the contents of the backup files
  - Restore the control plane node from a backup

Prerequisites

Deploy Oracle Cloud Native Environment

Note: If running in your own tenancy, read the linux-virt-labs GitHub project README.md and complete the prerequisites before deploying the lab environment.

  1. Open a terminal on the Luna Desktop.

  2. Clone the linux-virt-labs GitHub project.

    git clone https://github.com/oracle-devrel/linux-virt-labs.git
    
  3. Change into the working directory.

    cd linux-virt-labs/ocne
    
  4. Install the required collections.

    ansible-galaxy collection install -r requirements.yml
    
  5. Deploy the lab environment.

    ansible-playbook create_instance.yml -e localhost_python_interpreter="/usr/bin/python3.6"
    

    The free lab environment requires the extra variable localhost_python_interpreter, which sets ansible_python_interpreter for plays running on localhost. This variable is needed because the environment installs the RPM package for the Oracle Cloud Infrastructure SDK for Python, which installs under the python3.6 modules.

    Important: Wait for the playbook to run successfully and reach the pause task. At this stage of the playbook, the installation of Oracle Cloud Native Environment is complete, and the instances are ready. Take note of the previous play, which prints the public and private IP addresses of the nodes it deploys and any other deployment information needed while running the lab.

Back Up the Control Plane Node

A proper backup strategy for the control plane node, especially the etcd database, is essential for cluster administration. Although high-availability clusters provide resilience through replication and failover, that does not replace the need for backups.

  1. Open a terminal and connect via SSH to the ocne-operator node.

    ssh oracle@<ip_address_of_node>
    
  2. Confirm installation of the cluster.

    olcnectl module instances --environment-name myenvironment
    
  3. Confirm the cluster is running.

    ssh ocne-control-01 kubectl get nodes
    

    You can keep the cluster up and running while performing a backup as part of your disaster recovery plan.

  4. Create a backup file.

    olcnectl module backup --environment-name myenvironment --name mycluster
    

    Important: The backup only contains the key containers required for the Kubernetes control plane node. It does not back up any application containers.

  5. Check for the new backup files.

    The backup module writes the backup files to /var/olcne/backups/<environment-name>/<module-name>/<cluster-name> in a timestamped directory.
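
    For example, list the directory on the operator node. This assumes the default kubernetes module directory; the timestamp directory name will differ in your environment:

    sudo ls -la /var/olcne/backups/myenvironment/kubernetes/mycluster/<timestamp>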

    Example:

    total 2368
    drwxr-x---. 2 olcne olcne 75 Apr 22 18:53 .
    drwxr-x---. 3 olcne olcne 28 Apr 22 18:53 ..
    -rwxr-x---. 1 olcne olcne 2304000 Apr 22 18:53 etcd.tar
    -rw-r--r--. 1 olcne olcne 1100 Apr 22 18:53 module-config.json
    -rwxr-x---. 1 olcne olcne 112640 Apr 22 18:53 ocne-control-01.tar
    

Verify the Backup File

One of the most critical parts of the backup is etcd. The etcd in Kubernetes is similar to the /etc directory on Linux but designed for distributed systems such as Oracle Cloud Native Environment. Oracle Cloud Native Environment uses etcd as the primary data store for the Kubernetes cluster, holding its configuration data and cluster settings.

Within the backup file, the etcd.tar file contains the etcd database, while the ocne-control-01.tar file contains other configurations such as certificates, endpoints, and deployment tracking.
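
You can list the node archive's contents directly to see these files (assuming the default backup path shown earlier; substitute your backup's timestamp):

    sudo tar -tvf /var/olcne/backups/myenvironment/kubernetes/mycluster/<timestamp>/ocne-control-01.tar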

  1. Change into the directory containing the backup files.
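
    For example, assuming the default path (substitute your backup's timestamp):

    cd /var/olcne/backups/myenvironment/kubernetes/mycluster/<timestamp>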

  2. List the files within the etcd.tar file.

    tar -tvf etcd.tar
    

    The output shows the backup of the etcd database and its member list.

    Example:

    -rw------- root/root 2220064 2024-04-24 12:51 var/olcne/scratch/etcd.backup
    -rw-r--r-- root/root 39 2024-04-24 12:51 var/olcne/scratch/etcd.member
    
  3. Show the contents of the etcd.member file.

    tar xfO etcd.tar var/olcne/scratch/etcd.member
    

    Example:

    ocne-control-01=https://10.0.0.54:2380
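
    Optionally, if an etcd v3 client is available (not part of this lab's steps), you can extract the snapshot to a writable directory and check its integrity:

    tar xf etcd.tar var/olcne/scratch/etcd.backup
    ETCDCTL_API=3 etcdctl snapshot status var/olcne/scratch/etcd.backup --write-out=table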
    
  4. Perform a diff between the backup and active files.

    ssh ocne-control-01 sudo cat /etc/kubernetes/manifests/etcd.yaml | diff - <(tar xfO ocne-control-01.tar etc/kubernetes/manifests/etcd.yaml)
    

    This one-liner reads the contents of the active file over SSH from the control plane node and pipes them to the diff command. On the diff side, the - is a placeholder for the piped-in contents, and the <( ) process substitution supplies the output of tar, which extracts the corresponding file from within the backup archive.

    If nothing returns from the command, no differences exist between the files.
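
    You can apply the same technique to other files in the archive, for example the kube-apiserver manifest (assuming it sits at the standard Kubernetes path and is included in the archive):

    ssh ocne-control-01 sudo cat /etc/kubernetes/manifests/kube-apiserver.yaml | diff - <(tar xfO ocne-control-01.tar etc/kubernetes/manifests/kube-apiserver.yaml)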

Restore from Backup

To show that the backup is working, we need to change the configuration of the Kubernetes cluster. We’ll do this by creating a new pod running Nginx.

  1. Check for existing pods in the default namespace.

    The SSH command runs this check on the control plane node.

    ssh ocne-control-01 kubectl get pod
    

    The result shows No resources found in default namespace.

  2. Deploy a new pod running Nginx.

    ssh ocne-control-01 kubectl run newpod --image=nginx
    
  3. Verify the new pod is running.

    ssh ocne-control-01 kubectl get pod
    

    Example:

    NAME     READY   STATUS    RESTARTS   AGE
    newpod   1/1     Running   0          16s
    

    If the STATUS does not report as Running, run the command a few times until the deployment is complete.
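
    Alternatively, you can block until the pod reports ready (kubectl wait is standard Kubernetes tooling, not a lab-specific step):

    ssh ocne-control-01 kubectl wait --for=condition=Ready pod/newpod --timeout=120s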

  4. Restore the backup.

    olcnectl module restore --environment-name myenvironment --name mycluster --log-level info
    

    Reply y to the prompt to continue restoring the backup.

    Note: Kubernetes cordons and drains the nodes during the restore, which can take 15-20 minutes to complete. Adding the --log-level option to the restore command displays more details; without it, the command runs silently and shows no progress.

  5. Recheck for existing pods in the default namespace.

    ssh ocne-control-01 kubectl get pod
    

    The result shows No resources found in default namespace, which confirms the restore succeeded because it removed the configuration changes made to the cluster after the backup.
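
    You can also confirm the nodes have returned to the Ready state after the restore:

    ssh ocne-control-01 kubectl get nodes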

Summary

This tutorial demonstrated how to back up and restore the control plane node on Oracle Cloud Native Environment. It also illustrates why administrators need regular backups after configuration changes and deployments, so they can restore to those specific points in time without losing changes.

More Learning Resources

Explore other labs on docs.oracle.com/learn or access more free learning content on the Oracle Learning YouTube channel. Additionally, visit education.oracle.com/learning-explorer to become an Oracle Learning Explorer.

For product documentation, visit Oracle Help Center.