4.3.2 High Availability Cluster

The kubeadm-ha-setup tool provides cluster backup and restore functionality so that you can protect your Kubernetes deployment from the failure of a master node in the cluster. Cluster state, configuration data, and snapshots are stored in the Cluster State Store, also referred to as etcd.

For the backup and restore processes to work properly, there are some basic requirements:

  • The hostname and IP address of the master node being restored must match the hostname and IP address of the master node that was backed up. The usual use case for a restore is recovery after a system failure, so the restore process expects a matching system for each master node with a fresh installation of the Docker engine and the Kubernetes packages.

  • A restore can only function correctly using a backup taken from a Kubernetes high availability cluster running the same version of Kubernetes. The Docker engine versions must also match.

  • There must be a dedicated shared storage directory that is accessible to all nodes in the master cluster during both the backup and restore phases.

  • All nodes in the master cluster must have root access to all other nodes in the master cluster using password-less, key-based authentication whenever kubeadm-ha-setup is used. A sketch of how these last two requirements might be met follows this list.
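
As an illustration of the last two requirements, the following sketch mounts a network share on a master node and distributes a root SSH key to the other master nodes. The NFS server name and export path are placeholders for whatever shared storage you use; the mount point and key path follow the examples used later in this section.

    # mkdir -p /backup
    # mount -t nfs nfs.example.com:/export/backups /backup
    # ssh-keygen -t rsa -N '' -f /root/.ssh/id_rsa
    # ssh-copy-id root@master2.example.com
    # ssh-copy-id root@master3.example.com

Repeat these steps on each node in the master cluster so that every master node can reach every other master node as root without a password.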

A full restore is only required if a period of downtime included more than one node in the master cluster. Note that a full restore disrupts master node availability throughout the duration of the restore process.

Back up the cluster configuration and state

  1. Run kubeadm-ha-setup backup and specify the directory where the backup file should be stored.

    # kubeadm-ha-setup backup /backup
    Disaster Recovery
    Reading configuration file /usr/local/share/kubeadm/run/kubeadm/ha.yaml ...
    CreateSSH /root/.ssh/id_rsa root
    Backup  /backup
    Checking overall clusters health ...
    Performing backup on 192.0.2.10
    Performing backup on 192.0.2.11
    Performing backup on 192.0.2.13
    {"level":"info","msg":"created temporary db file","path":"/var/lib/etcd/etcd-snap.db.part"}
    {"level":"info","msg":"fetching snapshot","endpoint":"127.0.0.1:2379"}
    {"level":"info","msg":"fetched snapshot","endpoint":"127.0.0.1:2379","took":"110.033606ms"}
    {"level":"info","msg":"saved","path":"/var/lib/etcd/etcd-snap.db"}
    [Backup is stored at /backup/fulldir-1544115826/fullbackup-1544115827.tar]

    Substitute /backup with the path to the network share directory where you wish to store the backup data for your master cluster.

    Each run of the backup command creates a tar file that is timestamped so that you can easily identify and restore the most recent backup. The backup file also contains a sha256 checksum that is used to verify its validity during a restore. The backup command instructs you to restart the cluster when you have finished backing up.
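
    If you want to inspect what a backup run produced before relying on it, you can list the contents of the archive with standard tools. This is optional; the path below is taken from the example output above.

    # tar tvf /backup/fulldir-1544115826/fullbackup-1544115827.tar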

A restore operation is typically performed on a freshly installed host, but it can also be run on an existing host as long as any pre-existing cluster configuration is removed first.

The restore process assumes that the IP address configuration for each node in the master cluster matches the configuration in the backed-up data. If you are restoring on one or more freshly installed hosts, make sure that the IP addressing matches the addresses assigned to the hosts that you are replacing.
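
As a quick sanity check before restoring, you can compare the hostname and addresses reported on each replacement host with the values recorded for the failed node, for example:

    # hostname
    # ip addr show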

The restore process assumes that the Docker engine is configured in the same way as on the original master node: it must use the same storage driver, and if proxy configuration is required, you must set this up manually before restoring. A sketch of these checks follows.
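
The following is a minimal sketch of how you might check the storage driver and supply a proxy configuration for the Docker engine through a systemd drop-in file. The proxy host, port, and NO_PROXY list are placeholders, and the drop-in is only needed if your environment requires a proxy.

    # docker info | grep -i 'storage driver'
    # mkdir -p /etc/systemd/system/docker.service.d
    # cat > /etc/systemd/system/docker.service.d/http-proxy.conf <<'EOF'
    [Service]
    Environment="HTTP_PROXY=http://proxy.example.com:3128"
    Environment="NO_PROXY=localhost,127.0.0.1,.example.com"
    EOF
    # systemctl daemon-reload
    # systemctl restart docker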

Note

A full restore of the high availability master cluster disrupts service availability for the duration of the restore operation.

Restore the cluster configuration and state

  1. On the master host, ensure that the Docker engine and Kubernetes packages are installed at the same versions used by the cluster that was backed up, and that the master node IP address and hostname match the IP address and hostname used before the failure. The kubeadm package pulls in all of the required dependencies, including the correct version of the Docker engine.

    # yum install kubeadm kubectl kubelet kubeadm-ha-setup
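
    If you wish to confirm that the installed versions match those of the cluster that was backed up, an optional check could look like the following; compare the reported versions with those of the original cluster.

    # kubeadm version -o short
    # kubelet --version
    # docker --version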
  2. Run the kubeadm-ha-setup restore command.

    # kubeadm-ha-setup restore /share/fulldir-1544115826/fullbackup-1544115827.tar
    Disaster Recovery
    Reading configuration file /usr/local/share/kubeadm/run/kubeadm/ha.yaml ...
    CreateSSH /root/.ssh/id_rsa root
    Restore  /share/fulldir-1544115826/fullbackup-1544115827.tar 
    with binary /usr/bin/kubeadm-ha-setup
    Checking etcd clusters health (this will take a few mins) ...
    Cleaning up node 10.147.25.195
    Cleaning up node 10.147.25.196
    Cleaning up node 10.147.25.197
    file to be restored from:  /share/fulldir-1544115826/backup-10.147.25.195-1544115826.tar
    Configuring keepalived for HA ...
    success
    success
    file to be restored from:  /share/fulldir-1544115826/backup-10.147.25.196-1544115826.tar
    [INFO]  /usr/local/share/kubeadm/kubeadm-ha/etcd-extract.sh 
    /share/fulldir-1544115826/fullbackup-1544115827.tar 10.147.25.196:22  retrying ...
    file to be restored from:  /share/fulldir-1544115826/backup-10.147.25.197-1544115827.tar
    [INFO]  /usr/bin/kubeadm-ha-setup etcd 
    fullrestore 10.147.25.197 10.147.25.197:22  retrying ...
    [COMPLETED] Restore completed, cluster(s) may take a few minutes to get backup!
    

    Substitute /share/fulldir-1544115826/fullbackup-1544115827.tar with the full path to the backup file that you wish to restore. Note that the backup directory and file must be accessible to all master nodes in the cluster during the restore process.

    If the script detects that all three master nodes are currently healthy, it prompts you to confirm that you wish to proceed with a full cluster restore:

    [WARNING] All nodes are healthy !!! This will perform a FULL CLUSTER RESTORE
    pressing [y] will restore cluster to the state stored 
    in /share/fulldir-1544115826/fullbackup-1544115827.tar

    Alternatively, if the script detects that more than one master node is unavailable, it prompts you before proceeding with a full cluster restore.

  3. Copy the Kubernetes admin.conf file to your home directory:

    $ sudo cp /etc/kubernetes/admin.conf $HOME/ 

    Change the ownership of the file to match your regular user profile:

    $ sudo chown $(id -u):$(id -g) $HOME/admin.conf

    Export the path to the file for the KUBECONFIG environment variable:

    $ export KUBECONFIG=$HOME/admin.conf

    You cannot use the kubectl command unless this environment variable is set to the path of this file. Remember to export the KUBECONFIG variable for each subsequent login so that the kubectl and kubeadm commands use the correct admin.conf file; otherwise, these commands might not behave as expected after a reboot or a new login. For instance, append the export line to your .bashrc:

    $ echo 'export KUBECONFIG=$HOME/admin.conf' >> $HOME/.bashrc
  4. Check that your cluster has been properly restored. Use kubectl to check the status of the nodes within the cluster and to check any existing configuration. For example:

    $ kubectl get nodes
    NAME                  STATUS    ROLES   AGE      VERSION
    master1.example.com   Ready     master  1h       v1.12.5+2.1.1.el7
    master2.example.com   Ready     master  1h       v1.12.5+2.1.1.el7
    master3.example.com   Ready     master  1h       v1.12.5+2.1.1.el7
    worker2.example.com   Ready     <none>  1h       v1.12.5+2.1.1.el7
    worker3.example.com   Ready     <none>  1h       v1.12.5+2.1.1.el7
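
    As an additional, optional check, you can confirm that the control plane pods in the kube-system namespace have returned to the Running state:

    $ kubectl get pods -n kube-system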