7 Backing up and Restoring a Kubernetes Cluster

Important:

The software described in this documentation is either in Extended Support or Sustaining Support. See Oracle Open Source Support Policies for more information.

We recommend that you upgrade the software described by this documentation as soon as possible.

This chapter discusses how to back up and restore a Kubernetes cluster in Oracle Cloud Native Environment.

Backing up Control Plane Nodes

A backup strategy that protects your Kubernetes cluster against control plane node failures is important, particularly for clusters with only one control plane node. High availability clusters with multiple control plane nodes also need a fallback plan for failures that exceed the resilience provided by replication and failover.

You do not need to bring down the cluster to perform a backup as part of your disaster recovery plan. On the operator node, use the olcnectl module backup command to back up the key containers and manifests for all the control plane nodes in your cluster.

Important:

Only the key containers required for the Kubernetes control plane node are backed up. No application containers are backed up.

For example:

olcnectl module backup \
--environment-name myenvironment \
--name mycluster
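
Because the cluster stays up during a backup, the command can be run from a scheduled job. The following is a minimal sketch of such a wrapper and is not part of the product. The function name backup_cluster is hypothetical, only the options shown in the example above are used, and the sketch assumes that olcnectl returns a non-zero exit status when a backup fails.

```shell
# Hypothetical helper (not part of olcnectl): wraps the documented backup
# command and reports the outcome so that a scheduled job can detect failures.
backup_cluster() (
    env_name="$1"
    module_name="$2"
    if [ -z "$env_name" ] || [ -z "$module_name" ]; then
        echo "usage: backup_cluster ENVIRONMENT_NAME MODULE_NAME" >&2
        exit 2
    fi
    # Assumption: olcnectl exits with a non-zero status when the backup fails.
    if olcnectl module backup \
        --environment-name "$env_name" \
        --name "$module_name"; then
        echo "backup of $module_name in $env_name completed"
    else
        status=$?
        echo "backup of $module_name in $env_name failed (exit $status)" >&2
        exit "$status"
    fi
)
```

For example, backup_cluster myenvironment mycluster runs the same command as the example above and prints whether it completed.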

The backup files are stored in the /var/olcne/backups directory on the operator node, in a timestamped folder that follows the pattern:

/var/olcne/backups/environment-name/kubernetes/module-name/timestamp

You can work with this directory and its files as you would any other. For example, to list the contents of a backup:

sudo ls /var/olcne/backups/myenvironment/kubernetes/mycluster/20191007040013
control1.example.com.tar control2.example.com.tar control3.example.com.tar  etcd.tar
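
The folder pattern above makes it possible to locate the newest backup and check its archives from a script. The following sketch is illustrative only: the function names latest_backup_dir and check_backup_archives are hypothetical, and the sketch assumes that timestamp folder names are fixed-width numbers like 20191007040013 in the example, so that sorting them by name also sorts them by time. Reading /var/olcne/backups might require root privileges, as the sudo in the example suggests.

```shell
# Hypothetical helpers; not part of olcnectl.

# Print the newest timestamped backup folder for an environment and module.
# Assumption: folder names are fixed-width numeric timestamps, so the sorted
# order produced by glob expansion matches chronological order.
latest_backup_dir() (
    env_name="$1"
    module_name="$2"
    backup_root="${3:-/var/olcne/backups}"
    latest=""
    for dir in "$backup_root/$env_name/kubernetes/$module_name"/*/; do
        [ -d "$dir" ] || continue   # the glob matched nothing
        latest="${dir%/}"           # keep the last (newest) match
    done
    if [ -z "$latest" ]; then
        echo "no backups found for $env_name/$module_name" >&2
        exit 1
    fi
    printf '%s\n' "$latest"
)

# Confirm that every .tar file in a backup folder can be listed by tar.
# Fails if an archive is unreadable or if the folder holds no .tar files.
check_backup_archives() (
    backup_dir="$1"
    found=0
    for archive in "$backup_dir"/*.tar; do
        [ -f "$archive" ] || continue
        found=1
        if ! tar -tf "$archive" >/dev/null; then
            echo "unreadable archive: $archive" >&2
            exit 1
        fi
    done
    [ "$found" -eq 1 ]
)
```

For example, check_backup_archives "$(latest_backup_dir myenvironment mycluster)" checks the most recent backup of the cluster used in this chapter. A listing test of this kind only shows that the archives are readable, not that a restore from them would succeed.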

Restoring Control Plane Nodes

These restore steps are intended for use when a Kubernetes cluster needs to be reconstructed as part of a planned disaster recovery scenario. Unless there is a total cluster failure, you do not need to manually recover individual control plane nodes in a high availability cluster that can self-heal through replication and failover.

To restore a control plane node, you must have an existing Oracle Cloud Native Environment with the Kubernetes module deployed. You cannot restore to a nonexistent environment.

To restore a control plane node:

  1. Make sure the Platform Agent is running correctly on the control plane nodes before proceeding:

    systemctl status olcne-agent.service

  2. On the operator node, use the olcnectl module restore command to restore the key containers and manifests for the control plane nodes in your cluster. For example:

    olcnectl module restore \
    --environment-name myenvironment \
    --name mycluster

    The files in the most recent timestamped folder under /var/olcne/backups/environment-name/kubernetes/module-name/ are used to restore the cluster to its previous state.

    The Platform CLI might prompt you to perform additional setup steps on your control plane nodes to meet the prerequisites. If so, follow the instructions and run the olcnectl module restore command again.

  3. Verify that the restore operation succeeded by using the kubectl command on a control plane node. For example:

    kubectl get nodes
    NAME                    STATUS   ROLES    AGE     VERSION
    control1.example.com    Ready    master   9m27s   v1.21.x+x.x.x.el8
    worker1.example.com     Ready    <none>   8m53s   v1.21.x+x.x.x.el8
    
    kubectl get pods -n kube-system
    NAME                                           READY   STATUS    RESTARTS   AGE
    coredns-5bc65d7f4b-qzfcc                       1/1     Running   0          9m
    coredns-5bc65d7f4b-z64f2                       1/1     Running   0          9m
    etcd-control1.example.com                      1/1     Running   0          9m
    kube-apiserver-control1.example.com            1/1     Running   0          9m
    kube-controller-manager-control1.example.com   1/1     Running   0          9m
    kube-flannel-ds-2sjbx                          1/1     Running   0          9m
    kube-flannel-ds-njg9r                          1/1     Running   0          9m
    kube-proxy-m2rt2                               1/1     Running   0          9m
    kube-proxy-tbkxd                               1/1     Running   0          9m
    kube-scheduler-control1.example.com            1/1     Running   0          9m
    kubernetes-dashboard-7646bf6898-d6x2m          1/1     Running   0          9m
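
The node check in the verification step can also be scripted rather than read by eye. The following is a minimal sketch, assuming the column layout shown in the kubectl get nodes output above. The function name nodes_not_ready is hypothetical. It treats any STATUS value other than exactly Ready as a problem, so a cordoned node reported as Ready,SchedulingDisabled is also listed.

```shell
# Hypothetical helper: reads `kubectl get nodes --no-headers` output on
# standard input and prints the name of every node whose STATUS column is
# not exactly "Ready". Returns non-zero if any such node is found, or if
# no input lines arrive at all.
nodes_not_ready() {
    awk '
        $2 != "Ready" { print $1; bad = 1 }
        END { exit (bad || NR == 0) }
    '
}
```

On a control plane node, it might be used as kubectl get nodes --no-headers | nodes_not_ready, which prints nothing and returns zero when every node reports Ready.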