4.3.1 Single Master Cluster

The kubeadm-setup.sh script provides cluster backup and restore functionality so that you can protect your Kubernetes deployment against a failure of the master node in the cluster. Cluster status and configuration data are stored in the Cluster State Store, also referred to as etcd.

For the backup and restore processes to work properly, there are some basic requirements:

  • The hostname and IP address of the master node being restored must match the hostname and IP address of the master node that was backed up. The usual use case for a restore is recovery after a system failure, so the restore process expects a matching system for the master node with a fresh installation of the Docker engine and the Kubernetes packages.

  • The master node must be tainted so that it is unable to run any workloads or containers other than those that the master node requires. This is the default configuration if you used the kubeadm-setup.sh script to set up your environment (see the example check after this list). The backup process does not back up any containers running on the master node other than the containers specific to managing the Kubernetes cluster.

  • The backup command must be run on the master node.

  • Any Docker engine configuration applied to the master node prior to the backup must be replicated manually on the node on which you intend to run the restore operation. You might need to configure your Docker storage driver and proxy settings before running a restore operation.

  • The backup command checks for minimum disk space of 100 MB at the specified backup location. If the space is not available, the backup command exits with an error.

  • A restore can only function correctly using a backup file taken from a cluster running the same version of Kubernetes. For example, you cannot restore a backup file for a Kubernetes 1.7.4 cluster using the Kubernetes 1.8.4 tools.

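If you want to confirm some of these requirements before taking a backup, you can check the master node taint and the installed package versions manually. The following commands are optional checks and are not part of the kubeadm-setup.sh workflow; the node name master.example.com is an example, so substitute the name of your own master node:

    $ kubectl describe node master.example.com | grep -i taints
    $ rpm -q kubeadm kubectl kubelet

The first command should report the NoSchedule taint that kubeadm applies to the master node by default; the second lists the installed Kubernetes package versions, which must match the versions recorded in any backup that you intend to restore.
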
The backup command requires that you stop the cluster during the backup process. Containers already running on the worker nodes are unaffected while the cluster is stopped. The following steps describe how to create a backup file for the master node.

Back up the cluster configuration and state

  1. Stop the cluster.

    To back up the cluster configuration and state, the cluster must be stopped so that no changes can occur in state or configuration during the backup process. While the cluster is stopped, the worker nodes continue to run independently of the cluster, allowing the containers hosted on each of these nodes to continue to function. To stop the cluster, on the master node, run:

    # kubeadm-setup.sh stop
    Stopping kubelet now ...
    Stopping containers now ...
  2. Run kubeadm-setup.sh backup and specify the directory where the backup file should be stored.

    # kubeadm-setup.sh backup /backups
    Using container-registry.oracle.com/etcd:3.2.24
    Checking if container-registry.oracle.com/etcd:3.2.24 is available
    376ebb3701caa1e3733ef043d0105569de138f3e5f6faf74c354fa61cd04e02a 
    /var/run/kubeadm/backup/etcd-backup-1544442719.tar
    e8e528be930f2859a0d6c7b953cec4fab2465278376a59f8415a430e032b1e73 
    /var/run/kubeadm/backup/k8s-master-0-1544442719.tar
    Backup is successfully stored at /backups/master-backup-v1.12.5-2-1544442719.tar ...
    You can restart your cluster now by doing: 
    # kubeadm-setup.sh restart

    Substitute /backups with the path to a directory where you wish to store the backed up data for your cluster.

    Each run of the backup command creates a timestamped tar file, so that you can easily identify and restore the most recent backup. The backup file also contains a sha256 checksum that is used to verify its validity during a restore. The backup command instructs you to restart the cluster when the backup is complete.

  3. Restart the cluster.

    # kubeadm-setup.sh restart
    Restarting containers now ...
    Detected node is master ...
    Checking if env is ready ...
    Checking whether docker can pull busybox image ...
    Checking access to container-registry.oracle.com ...
    Trying to pull repository container-registry.oracle.com/pause ... 
    3.1: Pulling from container-registry.oracle.com/pause
    Digest: sha256:802ef89b9eb7e874a76e1cfd79ed990b63b0b84a05cfa09f0293379ac0261b49
    Status: Image is up to date for container-registry.oracle.com/pause:3.1
    Checking firewalld settings ...
    Checking iptables default rule ...
    Checking br_netfilter module ...
    Checking sysctl variables ...
    Restarting kubelet ...
    Waiting for node to restart ...
    ....
    Master node restarted. Complete synchronization between nodes may take a few minutes.

    When the cluster is restarted, checks similar to those performed during cluster setup are run to ensure that no environment changes have occurred that could prevent the cluster from functioning correctly. Once the cluster has started, it can take a few minutes for the nodes within the cluster to report their status and for the cluster to settle back to normal operation.

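Because the cluster is unavailable for the duration of the backup, you might choose to run the stop, backup, and restart commands together. The following is a minimal sketch of a wrapper script built only from the commands shown above; the script name and the /backups location are examples and are not part of kubeadm-setup.sh:

    #!/bin/bash
    # backup-cluster.sh: stop the cluster, back it up, then restart it.
    # Run as root on the master node.
    # BACKUP_DIR is an example location; substitute your own backup directory.
    set -e
    BACKUP_DIR=/backups

    kubeadm-setup.sh stop                  # stop kubelet and the cluster containers
    kubeadm-setup.sh backup "$BACKUP_DIR"  # writes a timestamped tar file to $BACKUP_DIR
    kubeadm-setup.sh restart               # bring the master node back up

Using set -e causes the script to stop if any step fails, so that you can investigate the failure before restarting the cluster.
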
A restore operation is typically performed on a freshly installed host, but it can be run on an existing setup as long as any pre-existing cluster configuration is removed. The restore process assumes that the Docker engine on the replacement host is configured in the same way as on the original master node: it must use the same storage driver, and if proxy configuration is required, you must set it up manually before you begin the restore procedure below.
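
For example, if the original master node used the overlay2 storage driver and required a web proxy to reach the container registry, the replacement host might need configuration similar to the following before you run the restore. The file contents, proxy URL, and domains shown here are illustrative only; replicate whatever was actually configured on the original master node:

    # cat /etc/docker/daemon.json
    {
      "storage-driver": "overlay2"
    }
    # cat /etc/systemd/system/docker.service.d/http-proxy.conf
    [Service]
    Environment="HTTP_PROXY=http://proxy.example.com:80"
    Environment="HTTPS_PROXY=http://proxy.example.com:80"
    Environment="NO_PROXY=localhost,127.0.0.1,.example.com"
    # systemctl daemon-reload
    # systemctl restart docker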

Restore the cluster configuration and state

  1. On the master host, ensure that the latest Docker and Kubernetes versions are installed and that the master node IP address and hostname match the IP address and hostname used before failure. The kubeadm package pulls in all of the required dependencies, including the correct version of the Docker engine.

    # yum install kubeadm kubectl kubelet
  2. Run the kubeadm-setup.sh restore command.

    # kubeadm-setup.sh restore /backups/master-backup-v1.12.5-2-1544442719.tar
    Checking sha256sum of the backup files ...
    /var/run/kubeadm/backup/etcd-backup-1544442719.tar: OK
    /var/run/kubeadm/backup/k8s-master-0-1544442719.tar: OK
    Restoring backup from /backups/master-backup-v1.12.5-2-1544442719.tar ...
    Using 3.2.24
    etcd cluster is healthy ...
    Cleaning up etcd container ...
    27148ae6765a546bf45d527d627e5344130fb453c4a532aa2f47c54946f2e665
    27148ae6765a546bf45d527d627e5344130fb453c4a532aa2f47c54946f2e665
    Restore successful ...
    You can restart your cluster now by doing: 
    # kubeadm-setup.sh restart
    

    Substitute /backups/master-backup-v1.12.5-2-1544442719.tar with the full path to the backup file that you wish to restore.

  3. Restart the cluster.

    # kubeadm-setup.sh restart
    Restarting containers now ...
    Detected node is master ...
    Checking if env is ready ...
    Checking whether docker can pull busybox image ...
    Checking access to container-registry.oracle.com ...
    Trying to pull repository container-registry.oracle.com/pause ... 
    3.1: Pulling from container-registry.oracle.com/pause
    Digest: sha256:802ef89b9eb7e874a76e1cfd79ed990b63b0b84a05cfa09f0293379ac0261b49
    Status: Image is up to date for container-registry.oracle.com/pause:3.1
    Checking firewalld settings ...
    Checking iptables default rule ...
    Checking br_netfilter module ...
    Checking sysctl variables ...
    Enabling kubelet ...
    Created symlink from /etc/systemd/system/multi-user.target.wants/kubelet.service 
    to /etc/systemd/system/kubelet.service.
    Restarting kubelet ...
    Waiting for node to restart ...
    ....+++++
    Restarting pod kube-flannel-ds-glwgx
    pod "kube-flannel-ds-glwgx" deleted
    Restarting pod kube-flannel-ds-jz8sf
    pod "kube-flannel-ds-jz8sf" deleted
    Master node restarted. Complete synchronization between nodes may take a few minutes.
    
  4. Copy the Kubernetes admin.conf file to your home directory:

    $ sudo cp /etc/kubernetes/admin.conf $HOME/ 

    Change the ownership of the file to match your regular user profile:

    $ sudo chown $(id -u):$(id -g) $HOME/admin.conf

    Export the path to the file for the KUBECONFIG environment variable:

    $ export KUBECONFIG=$HOME/admin.conf

    You cannot use the kubectl command if the path to this file is not set in this environment variable. Remember to export the KUBECONFIG variable for each subsequent login so that the kubectl and kubeadm commands use the correct admin.conf file; otherwise, you might find that these commands do not behave as expected after a reboot or a new login. For instance, append the export line to your .bashrc:

    $ echo 'export KUBECONFIG=$HOME/admin.conf' >> $HOME/.bashrc
  5. Check that your cluster has been properly restored. Use kubectl to check the status of the nodes within the cluster and to check any existing configuration; an additional check of the cluster system pods is sketched after this procedure. For example:

    $ kubectl get nodes
    NAME                  STATUS    ROLES   AGE       VERSION
    master.example.com    Ready     master  1h        v1.12.5+2.1.1.el7
    worker1.example.com   Ready     <none>  1h        v1.12.5+2.1.1.el7
    worker2.example.com   Ready     <none>  1h        v1.12.5+2.1.1.el7
    
    $ kubectl get pods
    NAME                                READY     STATUS    RESTARTS   AGE
    nginx-deployment-4234284026-g8g95   1/1       Running   0          10m
    nginx-deployment-4234284026-k1h8w   1/1       Running   0          10m
    nginx-deployment-4234284026-sbkqr   1/1       Running   0          10m
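
In addition to the nodes and your own application pods, you can optionally confirm that the cluster's system components restarted cleanly by listing the pods in the kube-system namespace:

    $ kubectl get pods -n kube-system

All of the system pods should eventually report a Running status once synchronization between the nodes has completed.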