Chapter 7 Backing up and Restoring a Kubernetes Cluster

This chapter discusses how to back up and restore a Kubernetes cluster in Oracle Linux Cloud Native Environment.

7.1 Backing up Control Plane Nodes

Adopting a backup strategy to protect your Kubernetes cluster against control plane node failures is important, particularly for clusters with only one control plane node. High availability clusters with multiple control plane nodes also need a fallback plan in case the resilience provided by replication and failover is exhausted.

You do not need to bring down the cluster to perform a backup as part of your disaster recovery plan. On the operator node, use the olcnectl module backup command to back up the key containers and manifests for all the control plane nodes in your cluster.

Important

Only the key containers required for the Kubernetes control plane nodes are backed up. No application containers are backed up.

For example:

olcnectl module backup \
--environment-name myenvironment \
--name mycluster

The backup files are stored in the /var/olcne/backups directory on the operator node. The files are saved to a timestamped folder that follows the pattern:

/var/olcne/backups/environment-name/kubernetes/module-name/timestamp

You can interact with the directory and the files it contains just like any other, for example:

sudo ls /var/olcne/backups/myenvironment/kubernetes/mycluster/20191007040013
control1.example.com.tar control2.example.com.tar control3.example.com.tar etcd.tar
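Because the folder names encode the backup timestamp, sorting them lexicographically also sorts them chronologically, which makes it easy to identify the most recent backup. The following is a minimal sketch that demonstrates this with a mock directory tree under /tmp (the paths and timestamps are illustrative; real backups live under /var/olcne/backups):

```shell
# Illustrative only: build a mock backup tree under /tmp so the commands
# can be tried anywhere. Real backups are in /var/olcne/backups.
base=/tmp/olcne-backup-demo/myenvironment/kubernetes/mycluster
mkdir -p "$base/20191007040013" "$base/20191008051520"

# Timestamped folder names sort lexicographically in chronological order,
# so the last sorted entry is the newest backup.
latest=$(ls "$base" | sort | tail -n 1)
echo "latest backup: $latest"
```

The same one-liner, pointed at the real /var/olcne/backups path, shows which backup a restore would use.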

7.2 Restoring Control Plane Nodes

These restore steps are intended for use when a Kubernetes cluster needs to be reconstructed as part of a planned disaster recovery scenario. Unless there is a total cluster failure, you do not need to manually recover individual control plane nodes in a high availability cluster that is able to self-heal with replication and failover.

To restore a control plane node, you must have a pre-existing Oracle Linux Cloud Native Environment with the Kubernetes module deployed. You cannot restore to a non-existent environment.

To restore a control plane node:
  1. Make sure the Platform Agent is running correctly on the control plane nodes before proceeding:

    systemctl status olcne-agent.service
  2. On the operator node, use the olcnectl module restore command to restore the key containers and manifests for the control plane nodes in your cluster. For example:

    olcnectl module restore \
    --environment-name myenvironment \
    --name mycluster

    The files from the latest timestamped folder from /var/olcne/backups/environment-name/kubernetes/module-name/ are used to restore the cluster to its previous state.

    You may be prompted by the Platform CLI to perform additional setup steps on your control plane nodes to fulfil the prerequisite requirements. If that happens, follow the instructions and run the olcnectl module restore command again.

  3. You can verify that the restore operation was successful using the kubectl command on a control plane node. For example:

    kubectl get nodes
    NAME                   STATUS   ROLES    AGE     VERSION
    control1.example.com   Ready    master   9m27s   v1.18.x+x.x.x.el7
    worker1.example.com    Ready    <none>   8m53s   v1.18.x+x.x.x.el7

    kubectl get pods -n kube-system
    NAME                                    READY   STATUS    RESTARTS   AGE
    coredns-5bc65d7f4b-qzfcc                1/1     Running   0          9m
    coredns-5bc65d7f4b-z64f2                1/1     Running   0          9m
    etcd-control1.example.com               1/1     Running   0          9m
    kube-apiserver-control1.example.com     1/1     Running   0          9m
    kube-controller-control1.example.com    1/1     Running   0          9m
    kube-flannel-ds-2sjbx                   1/1     Running   0          9m
    kube-flannel-ds-njg9r                   1/1     Running   0          9m
    kube-proxy-m2rt2                        1/1     Running   0          9m
    kube-proxy-tbkxd                        1/1     Running   0          9m
    kube-scheduler-control1.example.com     1/1     Running   0          9m
    kubernetes-dashboard-7646bf6898-d6x2m   1/1     Running   0          9m
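Beyond eyeballing the output, the STATUS column can be checked in a script, which is useful when automating post-restore verification. The following is a minimal sketch; kubectl is mocked with a shell function and sample output so the logic can be tried anywhere (on a real control plane node, remove the function and use the real kubectl):

```shell
# Mock kubectl for illustration; the node names and output are sample data.
kubectl() {
  printf 'control1.example.com Ready master 9m27s v1.18.x\n'
  printf 'worker1.example.com Ready <none> 8m53s v1.18.x\n'
}

# Count nodes whose STATUS column (field 2) is anything other than Ready.
not_ready=$(kubectl get nodes --no-headers | awk '$2 != "Ready"' | wc -l)
echo "nodes not Ready: $not_ready"
```

A result of 0 indicates that every node in the cluster has rejoined and reports Ready.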