Configure for Disaster Recovery
Plan the Configuration
Note:
This solution assumes that both Kubernetes clusters, including the control plane and worker nodes, already exist. The recommendations and utilities provided in this playbook do not check resources, control plane, or worker node capacity and configuration.
Restore, as described here, can be applied in clusters that “mirror” primary (same number of control plane nodes, same number of worker nodes). The procedures assume that a primary Kubernetes cluster created with kubeadm exists. Host names in the secondary system are configured to mimic primary, as described in the next paragraphs. The secondary cluster is then also created with kubeadm, but only AFTER the required host name resolution is in place.
Complete the following requirements for Restore
when planning your configuration:
- Confirm that the required worker nodes and resources in the primary are
available in secondary. This includes shared storage mounts, load balancers, and databases used by the pods and systems used in the namespaces that will be restored.
- Configure your host name resolution so that the host names used by the control plane and worker nodes are valid in secondary.
For example, if your primary site resolves the cluster similar to the following:
[opc@olk8-m1 ~]$ kubectl get nodes -A
NAME      STATUS   ROLES           AGE      VERSION
olk8-m1   Ready    control-plane   552d     v1.25.12
olk8-m2   Ready    control-plane   552d     v1.25.12
olk8-m3   Ready    control-plane   2y213d   v1.25.12
olk8-w1   Ready    <none>          2y213d   v1.25.12
olk8-w2   Ready    <none>          2y213d   v1.25.12
olk8-w3   Ready    <none>          2y213d   v1.25.12
[opc@olk8-m1 ~]$ nslookup olk8-m1
Server:         169.254.169.254
Address:        169.254.169.254#53
Non-authoritative answer:
Name:   olk8-m1.k8dbfrasubnet.k8dbvcn.oraclevcn.com
Address: 10.11.0.16
Then, your secondary site must use the same node names. For the control plane node in the previous example, the host name in region 2 is the same, but it maps to a different IP.
[opc@k8dramsnewbastion ~]$ nslookup olk8-m1
Server:         169.254.169.254
Address:        169.254.169.254#53
Non-authoritative answer:
Name:   olk8-m1.sub01261629121.k8drvcnams.oraclevcn.com
Address: 10.5.176.144
The resulting configuration in secondary (after using kubeadm to create the cluster and adding the worker nodes) will use the exact same node names, even if internal IPs and other values differ.
[opc@k8dramsnewbastion ~]$ kubectl get nodes -A
NAME      STATUS   ROLES           AGE      VERSION
olk8-m1   Ready    control-plane   552d     v1.25.11
olk8-m2   Ready    control-plane   552d     v1.25.11
olk8-m3   Ready    control-plane   2y213d   v1.25.11
olk8-w1   Ready    <none>          2y213d   v1.25.11
olk8-w2   Ready    <none>          2y213d   v1.25.11
olk8-w3   Ready    <none>          2y213d   v1.25.11
- Use a similar “host name aliasing” for the kube-api front-end address.
Note:
Your primary Kubernetes cluster should NOT use IPs for the front-end kube-api. You must use a host name so that this front end can be aliased in the secondary system. See the maak8s-kube-api-alias.sh script for an example of how to add a host name alias to your existing primary kube-api system.
For example, if the primary’s kube-api address resolution is as follows:
[opc@olk8-m1 ~]$ grep server .kube/config
server: https://k8lbr.paasmaaoracle.com:6443
[opc@olk8-m1 ~]$ grep k8lbr.paasmaaoracle.com /etc/hosts
132.145.247.187 k8lbr.paasmaaoracle.com k8lbr
Then, the secondary’s kube-api should use the same host name (you can map it to a different IP):
[opc@k8dramsnewbastion ~]$ grep server .kube/config
server: https://k8lbr.paasmaaoracle.com:6443
[opc@k8dramsnewbastion ~]$ grep k8lbr.paasmaaoracle.com /etc/hosts
144.21.37.81 k8lbr.paasmaaoracle.com k8lbr
You can achieve this by using virtual hosts, local /etc/hosts resolution, or different DNS servers in each location. To determine the host name resolution method used by a particular host, search for the value of the hosts parameter in the /etc/nsswitch.conf file on the host.
- If you want to resolve host names locally on the host, then make the files entry the first entry for the hosts parameter. When files is the first entry for the hosts parameter, entries in the host's /etc/hosts file are used first to resolve host names.
Specifying the Use of Local Host Name Resolution in the /etc/nsswitch.conf file:
hosts: files dns nis
- If you want to resolve host names by using DNS on the host, then make the dns entry the first entry for the hosts parameter. When dns is the first entry for the hosts parameter, DNS server entries are used first to resolve host names.
Specifying the Use of DNS Host Name Resolution in the /etc/nsswitch.conf file:
hosts: dns files nis
For simplicity and consistency, Oracle recommends that all the hosts within a site (production site or standby site) use the same host name resolution method (resolving host names locally or resolving host names using separate DNS servers or a global DNS server).
The “host name aliasing” technique has been used for many years in Disaster Protection for Middleware systems. You can find details and examples in Oracle’s documentation, including the Oracle Fusion Middleware Disaster Recovery Guide and other documents pertaining to Oracle Cloud Disaster Protection, such as Oracle WebLogic Server for Oracle Cloud Infrastructure Disaster Recovery and SOA Suite on Oracle Cloud Infrastructure Marketplace Disaster Recovery.
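The two lookups described above (which host name the kube-api front end uses, and which source the host consults first for host names) can be sketched as small shell helpers. This is only an illustration: the function names kube_api_host and first_hosts_source are hypothetical and are not part of the playbook scripts.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of the playbook scripts.

# Print the host part of the first "server:" entry in a kubeconfig file,
# e.g. "server: https://k8lbr.paasmaaoracle.com:6443" -> k8lbr.paasmaaoracle.com
kube_api_host() {
  sed -n 's|^[[:space:]]*server:[[:space:]]*https\{0,1\}://\([^:/]*\).*|\1|p' "$1" | head -n 1
}

# Print the first lookup source listed for "hosts" in an nsswitch.conf-style
# file ("files" means /etc/hosts is consulted first, "dns" means DNS is).
first_hosts_source() {
  awk '$1 == "hosts:" { print $2; exit }' "${1:-/etc/nsswitch.conf}"
}
```

Running kube_api_host against .kube/config on both sites should print the same host name, and getent hosts on that name shows the IP each site maps it to.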
- Create the secondary cluster using the same host name for the front-end kube-api load balancer as in primary.
Perform this step after your host name resolution is ready. See the Kubernetes kubeadm tool documentation. Use the same kubeadm and Kubernetes versions as in primary. Container runtimes may differ, but you should use the same versions of Kubernetes infrastructure in both regions.
For example, if the primary cluster was created with the following:
kubeadm init --control-plane-endpoint $LBR_HN:$LBR_PORT --pod-network-cidr=10.244.0.0/16 --node-name $mnode1 --upload-certs --v=9
Then, use the exact same $LBR_HN:$LBR_PORT and CIDR values in secondary as in primary. The same applies if you use other cluster creation tools, such as kOps and Kubespray.
- When adding additional control plane or worker nodes, ensure that you use the same node names in primary and secondary.
kubeadm join $LBR_HN:$LBR_PORT --token $token --node-name $host --discovery-token-ca-cert-hash $token_ca --control-plane --certificate-key $cp_ca
- Once the secondary cluster is configured, the same host names should appear when retrieving the node information from Kubernetes.
The $host variables used in secondary for each control plane and worker node must be the same as those used in primary.
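One way to confirm that the node names match is to dump the names from each cluster into a file and compare the two lists. The helper below is a minimal sketch; the function name nodes_missing and the file names in the comments are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper for illustration; not part of the playbook scripts.

# Print the node names that appear in the first file but not in the second.
# Each file holds one node name per line, for example produced on each site with:
#   kubectl get nodes --no-headers -o custom-columns=NAME:.metadata.name > primary-nodes.txt
nodes_missing() {
  # grep exits with 1 when nothing is missing; treat that as success.
  grep -Fxv -f "$2" "$1" || [ $? -eq 1 ]
}
```

Calling nodes_missing primary-nodes.txt secondary-nodes.txt, and then again with the arguments swapped, should print nothing when both clusters use the same node names.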
Primary Cluster
Run the following command on primary to confirm the control plane and worker node status, role, age, version, internal IP, external IP, OS image, kernel version, and container runtime:
[opc@olk8-m1 ~]$ kubectl get nodes -o wide
The following is example output.
[opc@olk8-m1 ~]$ kubectl get nodes -o wide
NAME      STATUS   ROLES           AGE      VERSION    INTERNAL-IP     EXTERNAL-IP   OS-IMAGE                  KERNEL-VERSION                      CONTAINER-RUNTIME
olk8-m1   Ready    control-plane   578d     v1.25.12   10.11.0.16      <none>        Oracle Linux Server 7.9   4.14.35-1902.302.2.el7uek.x86_64    cri-o://1.26.1
olk8-m2   Ready    control-plane   578d     v1.25.12   10.11.210.212   <none>        Oracle Linux Server 7.9   5.4.17-2136.301.1.3.el7uek.x86_64   cri-o://1.26.1
olk8-m3   Ready    control-plane   2y238d   v1.25.12   10.11.0.18      <none>        Oracle Linux Server 7.9   4.14.35-2047.527.2.el7uek.x86_64    cri-o://1.26.1
olk8-w1   Ready    <none>          2y238d   v1.25.12   10.11.0.20      <none>        Oracle Linux Server 7.9   4.14.35-1902.302.2.el7uek.x86_64    cri-o://1.26.1
olk8-w2   Ready    <none>          2y238d   v1.25.12   10.11.0.21      <none>        Oracle Linux Server 7.9   4.14.35-1902.302.2.el7uek.x86_64    cri-o://1.26.1
olk8-w3   Ready    <none>          2y238d   v1.25.12   10.11.0.22      <none>        Oracle Linux Server 7.9   4.14.35-1902.302.2.el7uek.x86_64    cri-o://1.26.1
Run the following command on primary to identify where the Kubernetes control plane and CoreDNS are running.
[opc@olk8-m1 ~]$ kubectl cluster-info
Secondary Cluster
Run the following command on secondary to confirm the control plane and worker node status, role, age, version, internal IP, external IP, OS image, kernel version, and container runtime:
[opc@k8dramsnewbastion ~]$ kubectl get node -o wide
The following is example output.
[opc@k8dramsnewbastion ~]$ kubectl get node -o wide
NAME      STATUS   ROLES           AGE      VERSION    INTERNAL-IP    EXTERNAL-IP   OS-IMAGE                  KERNEL-VERSION                     CONTAINER-RUNTIME
olk8-m1   Ready    control-plane   579d     v1.25.11   10.5.176.144   <none>        Oracle Linux Server 8.7   5.15.0-101.103.2.1.el8uek.x86_64   containerd://1.6.21
olk8-m2   Ready    control-plane   579d     v1.25.11   10.5.176.167   <none>        Oracle Linux Server 8.7   5.15.0-101.103.2.1.el8uek.x86_64   containerd://1.6.21
olk8-m3   Ready    control-plane   2y239d   v1.25.11   10.5.176.154   <none>        Oracle Linux Server 8.7   5.15.0-101.103.2.1.el8uek.x86_64   containerd://1.6.21
olk8-w1   Ready    <none>          2y239d   v1.25.11   10.5.176.205   <none>        Oracle Linux Server 8.7   5.15.0-101.103.2.1.el8uek.x86_64   containerd://1.6.22
olk8-w2   Ready    <none>          2y239d   v1.25.11   10.5.176.247   <none>        Oracle Linux Server 8.7   5.15.0-101.103.2.1.el8uek.x86_64   containerd://1.6.22
olk8-w3   Ready    <none>          2y239d   v1.25.11   10.5.176.132   <none>        Oracle Linux Server 8.7   5.15.0-101.103.2.1.el8uek.x86_64   containerd://1.6.22
Run the following command on secondary to identify where the Kubernetes control plane and CoreDNS are running.
[opc@k8dramsnewbastion ~]$ kubectl cluster-info
Kubernetes control plane is running at https://k8lbr.paasmaaoracle.com:6443
CoreDNS is running at https://k8lbr.paasmaaoracle.com:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
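Because the same Kubernetes versions are recommended in both regions, it can help to reduce the node listing from each site to its distinct VERSION values and compare them. The following is a minimal sketch; the function name node_versions is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper for illustration; not part of the playbook scripts.

# Read "kubectl get nodes" (or "kubectl get nodes -o wide") output on stdin and
# print the distinct values of the VERSION column, one per line.
node_versions() {
  awk 'NR == 1 { for (i = 1; i <= NF; i++) if ($i == "VERSION") col = i; next }
       col     { print $col }' | sort -u
}
```

For example, kubectl get nodes | node_versions run on each site should print a single line per site, and the two lines should be identical if both clusters run the same Kubernetes version.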
With the default settings in kubeadm cluster creation, etcd will use the same ports in primary and secondary. If the cluster in secondary needs to use different ports, then you must modify the scripts to handle it. You can use different storage locations in primary and secondary for the etcd database. The scripts will take care of restoring in the appropriate location that the secondary cluster is using for etcd.
- Install etcdctl in both the primary and secondary locations (on the nodes executing the backup and restore scripts).
The scripts for backup and restore use etcdctl to obtain information from the cluster and to create and apply etcd snapshots. To install etcdctl, see the https://github.com/etcd-io/etcd/releases documentation.
- Ensure that the appropriate firewall and security rules are in place so that the nodes executing the backup and restore operations are enabled for this type of access.
The scripts also need to access the cluster with kubectl and reach the different nodes through SSH and HTTP (for shell commands and etcdctl operations).
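Before running a backup or restore, a quick pre-flight check on the executing node can confirm that the required client tools are present. The sketch below is an assumption about how such a check could look; the function name require_cmds is hypothetical, and it only tests whether commands are on PATH, not whether the network access described above is open.

```shell
#!/bin/sh
# Hypothetical helper for illustration; not part of the playbook scripts.

# Return 0 if every named command is found on PATH. Otherwise report each
# missing command on stderr and return 1.
require_cmds() {
  missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing command: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, require_cmds kubectl ssh etcdctl. If etcdctl is installed in a directory that is not on PATH, add that directory to PATH first or check the binary's location directly.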
Configure
Configure for disaster recovery.
The steps for a restore involve the following:
- Take an etcd backup in a primary location.
- Ship the backup to the secondary location.
- Restore that etcd backup in a secondary cluster.
Perform the following steps:
- Create an etcd backup in a primary Kubernetes cluster.
  - Download ALL of the scripts for etcd snapshot DR from the "Download Code" section of this document.
Note:
All of the scripts must be in the same path because the main scripts use other auxiliary scripts.
  - Obtain the advert_port from the etcd configuration of a control plane node.
[opc@olk8-m1 ~]$ sudo grep etcd.advertise-client-urls /etc/kubernetes/manifests/etcd.yaml | awk -F ":" '{print $NF}'
2379
And the same for the init_port:
[opc@olk8-m1 ~]$ sudo grep initial-advertise-peer-urls /etc/kubernetes/manifests/etcd.yaml | awk -F ":" '{print $NF}'
2380
These ports are the default ones and are used by all of the control plane’s etcd pods. In the rare situations where etcd has been customized to use a different init and advertise port in each node, you must customize the scripts to consider those. You can also customize the value for the infra_pod_list if other network plugins are used or other relevant pods or deployments must be restarted after restore in your particular case. However, in general, it can be defaulted to the values provided in the file.
  - Edit the maak8s.env script and update the variables according to your environment.
The following is an example maak8s.env file:
[opc@olk8-m1 ~]$ cat maak8s.env
#sudo ready user to ssh into the control plane nodes
export user=opc
#ssh key for the ssh
export ssh_key=/home/opc/KeyMAA.ppk
#etcdctl executable's location
export etcdctlhome=/scratch/etcdctl/
#etcd advertise port
export advert_port=2379
#etcd init cluster port
export init_port=2380
#infrastructure pods that will be restarted on restore
export infra_pod_list="flannel proxy controller scheduler"
  - Run the maak8-etcd-backup.sh script and provide as arguments the following fields in this order:
    - The directory where the backup will be stored
    - A “LABEL/TEXT” describing the backup
    - The location of the cluster configuration to run kubectl operations
For example:
[opc@olk8-m1 ~]$ ./maak8-etcd-backup.sh /backup-volumes/ "ETCD Snapshot after first configuration " /home/opc/.kubenew/config
The script performs the following tasks:
    - Creates an etcd snapshot from the etcd master node
    - Creates a copy of the current configuration of each control plane node (manifests and certs for each control plane node), including the signing keys for the cluster
    - Records the list of nodes, pods, services, and cluster configuration
    - Stores all the information above in a directory labeled with the date.
If the directory specified in the command line argument is /backup-volume, then the backup is stored under /backup-volume/etcd_snapshot_date, where date is the date of the backup. For example, /backup-volume/etcd_snapshot_2022-08-29_15-56-59.
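Since the backup script relies on the values exported by maak8s.env, a small sanity check after sourcing the file can catch an unset variable or a mistyped port before the backup starts. This is a sketch under the assumption that the six variables shown in the example file are the ones that matter; the function name check_maak8s_env is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper for illustration; not part of the playbook scripts.

# Check that the variables exported by the example maak8s.env are set and that
# the two port values are numeric. Reports problems on stderr and returns 1.
check_maak8s_env() {
  rc=0
  for name in user ssh_key etcdctlhome advert_port init_port infra_pod_list; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "maak8s.env: $name is not set" >&2
      rc=1
    fi
  done
  for name in advert_port init_port; do
    eval "value=\${$name:-}"
    case "$value" in
      '') ;;  # already reported above as not set
      *[!0-9]*) echo "maak8s.env: $name must be a port number" >&2; rc=1 ;;
    esac
  done
  return "$rc"
}
```

A typical use would be to source the file and then run the check, for example . ./maak8s.env && check_maak8s_env.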
- Copy the entire directory (/backup-volume/etcd_snapshot_date) to the secondary cluster.
  - Use an sftp tool or create a tar with the directory and send it to the secondary location.
  - Untar or unzip the file to make it available in the secondary system, as it was in primary.
  - Make a note of the date label in the backup (in the example above it would be 2022-08-29_15-56-59).
For example,
[opc@olk8-m1 ~]$ scp -i KeyMAA.ppk -qr /backup-volume/etcd_snapshot_2022-08-29_15-56-59 154.21.39.171:/restore-volume
[opc@olk8-m1 ~]$ ssh -i KeyMAA.ppk 154.21.39.171 "ls -lart /restore-volume"
total 4
drwxrwxrwt. 6 root root  252 Aug 30 15:11 ..
drwxrwxr-x. 3 opc  opc    47 Aug 30 15:12 .
drwxrwxr-x. 5 opc  opc  4096 Aug 30 15:12 etcd_snapshot_2022-08-29_15-56-59
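If you prefer shipping a single archive instead of copying the directory recursively, the packaging and the date label lookup can be sketched as follows. Both function names (snapshot_label and pack_snapshot) are hypothetical, and the archive name is an arbitrary choice made for this example.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of the playbook scripts.

# Print the date label of a backup directory, for example
# /backup-volume/etcd_snapshot_2022-08-29_15-56-59 -> 2022-08-29_15-56-59
snapshot_label() {
  base=$(basename "$1")
  case "$base" in
    etcd_snapshot_*) printf '%s\n' "${base#etcd_snapshot_}" ;;
    *) echo "not an etcd_snapshot_ directory: $1" >&2; return 1 ;;
  esac
}

# Pack a backup directory into <dir>.tar.gz next to it and print the archive path.
# The archive keeps the etcd_snapshot_<date> directory name as its top-level entry.
pack_snapshot() {
  dir=${1%/}
  tar -czf "$dir.tar.gz" -C "$(dirname "$dir")" "$(basename "$dir")" &&
    printf '%s\n' "$dir.tar.gz"
}
```

On the secondary side, extracting the archive with tar -xzf inside /restore-volume recreates the etcd_snapshot_date directory, and the label printed by snapshot_label is the value to note for the restore.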
- Once the backup is available in the secondary location, follow these steps to restore it:
  - Download ALL of the scripts for etcd snapshot DR from the "Download Code" section to the secondary region node that will run the restore.
Remember that this node must also have etcdctl installed and kubectl access to the secondary cluster.
Note:
Because the main scripts use other auxiliary scripts, you must have all scripts in the same path when executing the different steps.
  - Edit the maak8s.env script and update the variables according to your environment.
You can alter the user, ssh key, and etcdctl location according to your secondary nodes, but the advert and init ports should be the same as those that are used in primary.
The following is an example maak8s.env file:
[opc@olk8-m1 ~]$ cat maak8s.env
#sudo ready user to ssh into the control plane nodes
export user=opc
#ssh key for the ssh
export ssh_key=/home/opc/KeyMAA.ppk
#etcdctl executable's location
export etcdctlhome=/scratch/etcdctl/
#etcd advertise port
export advert_port=2379
#etcd init cluster port
export init_port=2380
#infrastructure pods that will be restarted on restore
export infra_pod_list="flannel proxy controller scheduler"
  - Run the restore using the maak8-etcd-restore.sh script. Provide, as arguments, the root directory where the backup was copied from primary to standby, the timestamp of the backup, and the location of the kubectl configuration for the cluster.
For example,
[opc@k8dramsnewbastion ~]$ ./maak8-etcd-restore.sh /restore-volume 2022-08-29_15-56-59 /home/opc/.kube/config
The script looks in the /restore-volume directory for a subdirectory named etcd_snapshot_date. Using the example, it will use /restore-volume/etcd_snapshot_2022-08-29_15-56-59.
The restore performs the following tasks:
    - Force stops the control plane in secondary, if it is running
    - Restores the etcd snapshot in all of the control plane nodes
    - Replaces the cluster signing keys in all of the control plane nodes
    - Starts the control plane
    - Recycles all infrastructure pods (proxy, scheduler, controllers) and deployments in the cluster (to bring it to a consistent state)
At the end of the restore, a report displays the status of the pods and etcd subsystem. For example,
NAMESPACE      NAME                                         READY   STATUS              RESTARTS       AGE
default        dnsutils                                     1/1     Running             0              27d
default        nginx-deployment-566ff9bd67-6rl7f            1/1     Running             0              19s
default        nginx-deployment-566ff9bd67-hnx69            1/1     Running             0              17s
default        nginx-deployment-566ff9bd67-hvrwq            1/1     Running             0              15s
default        test-pd                                      1/1     Running             0              26d
kube-flannel   kube-flannel-ds-4f2fz                        1/1     Running             3 (22d ago)    35d
kube-flannel   kube-flannel-ds-cvqzh                        1/1     Running             3 (22d ago)    35d
kube-flannel   kube-flannel-ds-dmbhp                        1/1     Running             3 (22d ago)    35d
kube-flannel   kube-flannel-ds-skhz2                        1/1     Running             3 (22d ago)    35d
kube-flannel   kube-flannel-ds-zgkkp                        1/1     Running             4 (22d ago)    35d
kube-flannel   kube-flannel-ds-zpbn7                        1/1     Running             3 (22d ago)    35d
kube-system    coredns-8f994fbf8-6ghs4                      0/1     ContainerCreating   0              15s
kube-system    coredns-8f994fbf8-d79h8                      1/1     Running             0              19s
kube-system    coredns-8f994fbf8-wcknd                      1/1     Running             0              12s
kube-system    coredns-8f994fbf8-zh8w4                      1/1     Running             0              19s
kube-system    etcd-olk8-m1                                 1/1     Running             22 (89s ago)   44s
kube-system    etcd-olk8-m2                                 1/1     Running             59 (88s ago)   44s
kube-system    etcd-olk8-m3                                 1/1     Running             18 (88s ago)   26s
kube-system    kube-apiserver-olk8-m1                       1/1     Running             26 (89s ago)   44s
kube-system    kube-apiserver-olk8-m2                       1/1     Running             60 (88s ago)   42s
kube-system    kube-apiserver-olk8-m3                       1/1     Running             18 (88s ago)   27s
kube-system    kube-controller-manager-olk8-m1              1/1     Running             19 (89s ago)   10s
kube-system    kube-controller-manager-olk8-m2              1/1     Running             18 (88s ago)   10s
kube-system    kube-controller-manager-olk8-m3              1/1     Running             18 (88s ago)   10s
kube-system    kube-flannel-ds-62dcq                        1/1     Running             0              19s
kube-system    kube-flannel-ds-bh5w7                        1/1     Running             0              19s
kube-system    kube-flannel-ds-cc2rk                        1/1     Running             0              19s
kube-system    kube-flannel-ds-p8kdk                        1/1     Running             0              19s
kube-system    kube-flannel-ds-vj8r8                        1/1     Running             0              18s
kube-system    kube-flannel-ds-wz2kv                        1/1     Running             0              18s
kube-system    kube-proxy-28d98                             1/1     Running             0              14s
kube-system    kube-proxy-2gb99                             1/1     Running             0              15s
kube-system    kube-proxy-4dfjd                             1/1     Running             0              14s
kube-system    kube-proxy-72l5q                             1/1     Running             0              14s
kube-system    kube-proxy-s8zbs                             1/1     Running             0              14s
kube-system    kube-proxy-tmqnm                             1/1     Running             0              14s
kube-system    kube-scheduler-olk8-m1                       0/1     Pending             0              5s
kube-system    kube-scheduler-olk8-m2                       1/1     Running             18 (88s ago)   5s
kube-system    kube-scheduler-olk8-m3                       1/1     Running             18 (88s ago)   5s
newopns        weblogic-operator-5d74f56886-mtjp6           0/1     Terminating         0              26d
newopns        weblogic-operator-webhook-768d9f6f79-tdt8b   0/1     Terminating         0              26d
soans          soaedgdomain-adminserver                     0/1     Running             0              22d
soans          soaedgdomain-soa-server1                     0/1     Running             0              22d
soans          soaedgdomain-soa-server2                     0/1     Running             0              22d
+--------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|   ENDPOINT   |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+--------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| olk8-m1:2379 | 63c63522f0be24a6 |   3.5.6 |  146 MB |      true |      false |         2 |       1195 |               1195 |        |
| olk8-m2:2379 | 697d3746d6f10842 |   3.5.6 |  146 MB |     false |      false |         2 |       1195 |               1195 |        |
| olk8-m3:2379 |  7a23c67093a3029 |   3.5.6 |  146 MB |     false |      false |         2 |       1195 |               1195 |        |
+--------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
+------------------+---------+---------+----------------------+---------------------------+------------+
|        ID        | STATUS  |  NAME   |      PEER ADDRS      |       CLIENT ADDRS        | IS LEARNER |
+------------------+---------+---------+----------------------+---------------------------+------------+
|  7a23c67093a3029 | started | olk8-m3 | https://olk8-m3:2380 | https://10.5.176.154:2379 |      false |
| 63c63522f0be24a6 | started | olk8-m1 | https://olk8-m1:2380 | https://10.5.176.144:2379 |      false |
| 697d3746d6f10842 | started | olk8-m2 | https://olk8-m2:2380 | https://10.5.176.167:2379 |      false |
+------------------+---------+---------+----------------------+---------------------------+------------+
Restore completed at 2023-08-30_15-18-22
[opc@k8dramsnewbastion ~]$
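A pod report like the one the restore prints can be long, so it may help to filter it down to the pods that still need attention. The sketch below assumes input in the column layout of kubectl get pods -A (NAMESPACE, NAME, READY, STATUS); the function name pods_not_ready is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper for illustration; not part of the playbook scripts.

# Read "kubectl get pods -A" output on stdin and print the pods that are not
# fully ready: STATUS other than Running, or READY counts that differ (0/1).
# Pods in Completed state are skipped because they are not expected to run.
pods_not_ready() {
  awk '$1 == "NAMESPACE" { next }
       NF < 4            { next }
       $4 == "Completed" { next }
       {
         split($3, ready, "/")
         if ($4 != "Running" || ready[1] != ready[2])
           print $1 "/" $2, $3, $4
       }'
}
```

For example, kubectl get pods -A | pods_not_ready can be repeated on the secondary until the list is empty or only contains pods that are also not ready in primary.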
Verify
After running the maak8DR-apply.sh script, verify that all of the artifacts that existed in the primary cluster have been replicated to the secondary cluster. Look at the secondary cluster and verify that the pods in the secondary site are running without error.
- Check the status of the secondary until the required pods match the state in primary.
By default, the pods and deployments are started in the secondary region. At the end of the restore, the status of the secondary cluster is shown. Some pods might take additional time to reach the RUNNING state.
- Check the restore log in the secondary for possible errors.
The log location is reported at the beginning of the restore. By default, the log is created under the directory where the backup itself was located, at /backup_dir/etcd_snapshot_backup-date/restore_attempted_restore-date/restore.log. Another log is created specifically for the etcd snapshot restore operation at /backup_dir/etcd_snapshot_backup-date/restore_attempted_restore-date/etcd_op.log.
- (Optional) Revert back.
In addition to the restore logs, a backup of the previous /etc/kubernetes configuration is created for each of the control plane nodes under the /backup_dir/etcd_snapshot_backup-date/restore_attempted_restore-date/current_etc_kubernetes directory. Similarly, the etcd databases in each node BEFORE the restore are copied to /backup_dir/etcd_snapshot_backup-date/restore_attempted_restore-date/node_name. You can use these to revert to the cluster configuration that existed before the restore was executed.
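The log check can be sketched as two small helpers: one that builds the restore.log path from the pattern given above, and one that lists suspicious lines. Both function names are hypothetical. The sketch assumes that the restore date uses the same format as the backup date, and the search for "error" or "fail" is a guess, because the exact wording of the log messages is not documented here.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of the playbook scripts.

# Build the restore.log path from the backup root directory, the backup date
# label, and the restore date label, following the pattern
# /backup_dir/etcd_snapshot_backup-date/restore_attempted_restore-date/restore.log
restore_log_path() {
  printf '%s/etcd_snapshot_%s/restore_attempted_%s/restore.log\n' "${1%/}" "$2" "$3"
}

# Print the lines of a log file that mention "error" or "fail" (case-insensitive),
# prefixed with their line numbers. Assumption: the real log wording may differ.
log_errors() {
  # grep exits with 1 when nothing matches; treat that as success.
  grep -inE 'error|fail' "$1" || [ $? -eq 1 ]
}
```

For example, log_errors "$(restore_log_path /restore-volume 2022-08-29_15-56-59 RESTORE_DATE)", where RESTORE_DATE stands for the date label reported at the beginning of the restore.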