Learn About Kubernetes Clusters Restore Based on etcd Snapshots
Despite the tremendous change that Kubernetes adoption implies for an IT system's architecture, a Kubernetes system presents a disaster recovery (DR) paradigm similar to that of a traditional application (such as Oracle Java SE, Oracle Java EE, or JavaScript): you must maintain a consistent and as-up-to-date-as-possible copy of your primary system in a secondary location that can resume workloads should a disaster cause downtime in the primary region.
This solution playbook provides Oracle MAA recommendations and utilities that will create a secondary Kubernetes system and enable you to recover in disaster scenarios affecting a location and forcing the redirection of workloads to a replica site.
Although this solution playbook focuses on restoring a Kubernetes cluster in a different region, you can use the same scripts and procedures to restore a cluster in-place to a previous point in time. This operation may be useful in scenarios other than disaster recovery, such as the following:
- When the control plane is misconfigured.
- When the etcd database is lost or corrupted, or when etcd is not coming up properly.
- When an incorrect deployment or user error affects multiple applications or microservices and the cluster must be reverted to a previous version or incarnation. An etcd snapshot restore will revert all of the artifacts to the point in time at which the snapshot (backup) was taken.
This document focuses on replicating Kubernetes' etcd data to a secondary location. All the information of a Kubernetes cluster is stored in etcd, a key-value store used as the backing store for the cluster's data. This solution playbook provides recommendations for replicating a Kubernetes cluster created with the Kubernetes kubeadm tool (see https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/) based on etcd restore. The setup procedures and scripts provided may require customization for other types of clusters (those not created with kubeadm), but in general they are valid as long as there is access to the etcd endpoints that the Kubernetes control plane uses. This replication solution requires a specific setup for the secondary cluster and uses a copy of etcd (also called an etcd snapshot) to bring up the exact same artifacts that existed in the primary cluster.
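As an illustration of the kind of copy involved, the following is a minimal sketch of taking and verifying an etcd snapshot on a kubeadm control plane node with etcdctl; the endpoint, certificate paths, and backup location are kubeadm defaults and example values, and the playbook's scripts may package these steps differently.
# Minimal sketch: take an etcd snapshot on a control plane node and verify it.
# The endpoint, certificate paths, and backup directory are kubeadm defaults and example values.
[prompt]$ SNAPSHOT=/backup/etcd-snapshot-`date +%y-%m-%d-%H-%M-%S`.db
[prompt]$ ETCDCTL_API=3 etcdctl snapshot save $SNAPSHOT \
            --endpoints=https://127.0.0.1:2379 \
            --cacert=/etc/kubernetes/pki/etcd/ca.crt \
            --cert=/etc/kubernetes/pki/etcd/server.crt \
            --key=/etc/kubernetes/pki/etcd/server.key
[prompt]$ ETCDCTL_API=3 etcdctl snapshot status $SNAPSHOT --write-out=table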
You can use artifact snapshots or third-party Kubernetes backup tools to replicate specific namespaces and applications across Kubernetes systems. However, they don't protect clusters from file corruptions and misconfigurations in the control plane metadata. In addition to using etcd snapshots for disaster protection, you can use them for local restores and to revert Kubernetes clusters to a previous, working point in time. If your etcd store and etcd cluster are unhealthy, then applications may keep running, but pod relocations, configuration changes, secret access, and any other operation requiring control plane availability will not work properly. For this reason, etcd preservation must be a critical part of any Kubernetes cluster lifecycle procedure.
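As a hedged illustration (the endpoint and certificate paths are kubeadm defaults and may differ in your cluster), you can check the health of the etcd endpoints that the control plane uses with etcdctl:
# Check the health and status of the etcd endpoints backing the control plane.
# The endpoint and certificate paths are kubeadm defaults; adjust them for your cluster.
[prompt]$ ETCDCTL_API=3 etcdctl endpoint health --endpoints=https://127.0.0.1:2379 \
            --cacert=/etc/kubernetes/pki/etcd/ca.crt \
            --cert=/etc/kubernetes/pki/etcd/server.crt \
            --key=/etc/kubernetes/pki/etcd/server.key
[prompt]$ ETCDCTL_API=3 etcdctl endpoint status --write-out=table --endpoints=https://127.0.0.1:2379 \
            --cacert=/etc/kubernetes/pki/etcd/ca.crt \
            --cert=/etc/kubernetes/pki/etcd/server.crt \
            --key=/etc/kubernetes/pki/etcd/server.key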
Besides the Kubernetes configuration data, applications and microservices running on Kubernetes may generate data at runtime. Runtime data disaster protection is out of the scope of this document and should be treated exactly in the same way as in traditional applications running on application servers:
- Avoid polyglot persistence: using different types of persistent stores for runtime data is an almost impossible problem to solve, per the BAC theorem.
- As much as possible, use a single store for all the different data types, microservices, and applications with dependencies.
- See Oracle MAA Best Practices for Oracle Database for disaster protection for your runtime data.
Before You Begin
Review the following for more details:
- Oracle WebLogic Server for Oracle Cloud Infrastructure Disaster Recovery
- SOA Suite on Oracle Cloud Infrastructure Marketplace Disaster Recovery
See Use artifact snapshots to protect your OCI Kubernetes Engine clusters from disaster for details on using artifact snapshots for application-specific configuration protection.
Architecture
This architecture shows the disaster recovery (DR) system's topology for the Kubernetes cluster.
All runtime, configuration, and metadata information residing in the primary database is replicated from Region 1 to Region 2 with Oracle Autonomous Data Guard. The required Kubernetes cluster configuration is replicated through etcd snapshots for control plane protection. You can use etcd copies or artifact snapshots for application-specific configuration protection. The images used by the containers are hosted in registries that are either local to each cluster or in external repositories (images are not considered a Kubernetes cluster configuration by themselves).
Note:
Setting up Oracle Autonomous Data Guard for the runtime database is out of the scope of this document.
Description of the illustration kubernetes-etcd-multiregion-dr.png
This architecture supports the following components:
- Region
An Oracle Cloud Infrastructure region is a localized geographic area that contains one or more data centers, called availability domains. Regions are independent of other regions, and vast distances can separate them (across countries or even continents).
- Load balancer
The Oracle Cloud Infrastructure Load Balancing service provides automated traffic distribution from a single entry point to multiple servers in the back end.
- Dynamic routing gateway (DRG)
The DRG is a virtual router that provides a path for private network traffic between VCNs in the same region, between a VCN and a network outside the region, such as a VCN in another Oracle Cloud Infrastructure region, an on-premises network, or a network in another cloud provider.
- Data Guard
Oracle Data Guard and Oracle Active Data Guard provide a comprehensive set of services that create, maintain, manage, and monitor one or more standby databases and that enable production Oracle databases to remain available without interruption. Oracle Data Guard maintains these standby databases as transactionally consistent copies of the production database. If the production database becomes unavailable due to a planned or an unplanned outage, Oracle Data Guard can switch any standby database to the production role, minimizing the downtime associated with the outage. Oracle Active Data Guard provides the additional ability to offload read-mostly workloads to standby databases and also provides advanced data protection features.
- Oracle Real Application Clusters (Oracle RAC)
Oracle RAC enables you to run a single Oracle Database across multiple servers to maximize availability and enable horizontal scalability, while accessing shared storage. User sessions connecting to Oracle RAC instances can failover and safely replay changes during outages, without any changes to end user applications.
- Registry
Oracle Cloud Infrastructure Registry is an Oracle-managed registry that enables you to simplify your development-to-production workflow. Registry makes it easy for you to store, share, and manage development artifacts, like Docker images. The highly available and scalable architecture of Oracle Cloud Infrastructure ensures that you can deploy and manage your applications reliably.
- Kubernetes Engine
Oracle Cloud Infrastructure Kubernetes Engine (OCI Kubernetes Engine or OKE) is a fully managed, scalable, and highly available service that you can use to deploy your containerized applications to the cloud. You specify the compute resources that your applications require, and Kubernetes Engine provisions them on Oracle Cloud Infrastructure in an existing tenancy. OKE uses Kubernetes to automate the deployment, scaling, and management of containerized applications across clusters of hosts.
- Kubernetes cluster
A Kubernetes cluster is a set of machines that run containerized applications. Kubernetes provides a portable, extensible, open source platform for managing containerized workloads and services in those nodes. A Kubernetes cluster is formed of worker nodes and control plane nodes.
- Kubernetes control plane
A Kubernetes control plane manages the resources for the worker nodes and pods within a Kubernetes cluster. The control plane components detect and respond to events, perform scheduling, and move cluster resources. The following are the control plane components:
- kube-apiserver: Runs the Kubernetes API server.
- etcd: Distributed key-value store for all cluster data.
- kube-scheduler: Determines which node new unassigned pods will run on.
- kube-controller-manager: Runs controller processes.
- cloud-controller-manager: Links your cluster with the cloud provider's API.
- Kubernetes worker node
A Kubernetes worker node is a worker machine that runs containerized applications within a Kubernetes cluster. Every cluster has at least one worker node.
- Ingress Controller
An Ingress controller is a component that runs in a Kubernetes cluster and manages the Ingress resources. It receives traffic from the external network, routes it to the correct service, and performs load balancing and SSL termination. The Ingress controller typically runs as a separate pod in the cluster and can be scaled independently from the services it manages.
- KUBE-Endpoint API
The KUBE-Endpoint API is the kube-apiserver component of the Kubernetes control plane. It runs the Kubernetes API server.
- ETCD Backup
ETCD Backup is a backup of the etcd component of the Kubernetes control plane. The etcd component contains the distributed key-value store for all cluster data. It's important to create an ETCD Backup to recover Kubernetes clusters for disaster recovery.
- YAML Snapshots
A YAML snapshot is a point-in-time copy of the (yaml) files containing the definition of the artifacts in a Kubernetes cluster. The snapshot is a tar file that you can use to restore those artifacts in the same or a different Kubernetes cluster.
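For example, a minimal sketch of capturing such a snapshot for a single namespace could look like the following; the namespace, resource types, and paths are examples only and not the exact artifact set that the playbook scripts cover.
# Export the definitions of some common artifact types in a namespace and pack them into a tar file.
# The namespace, resource types, and paths are examples only.
[prompt]$ mkdir -p /backup/yaml-snapshot
[prompt]$ kubectl get deployments,services,configmaps,secrets,ingresses -n my-namespace -o yaml > /backup/yaml-snapshot/my-namespace.yaml
[prompt]$ tar -czf /backup/yaml-snapshot-`date +%y-%m-%d-%H-%M-%S`.tar.gz -C /backup yaml-snapshot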
Considerations for Kubernetes Disaster Protection
When implementing disaster protection for Kubernetes, consider the following:
- Symmetric disaster recovery (DR): Oracle recommends using the exact same resource capacity and configuration in the primary and the secondary. The Kubernetes clusters in the two regions should have similar resources available, such as the number of worker nodes (and their hardware capacity) and other infrastructure (shared storage, load balancers, databases, and so on). The resources on which the Kubernetes cluster in the secondary region depends must be able to keep up with the same workloads as the primary. Also, the two systems must be functionally consistent: the same services on which the restored system depends, sidecars, and configuration maps (CMs) must be used in both locations.
- Host name alias and virtualization: It is important to plan the host names used by the nodes in the secondary site. The host names or alias host names for the control plane and worker nodes must be "active" in the secondary location before restoring an etcd snapshot from a primary Kubernetes cluster. Node names are stored in different artifacts of a Kubernetes cluster to identify worker nodes, to mark pod allocations, in configuration (config) maps describing the cluster itself, and in multiple configuration files and entries. Your secondary location must identify the worker, control plane, and front-end kube-api addresses with the same host names used in the primary Kubernetes cluster (a fully qualified name may differ in the domain name, but the host name must be the same). Without host name aliasing, an etcd snapshot restore will not work properly because kubelets, schedulers, controllers, and, in general, the control plane services will not be able to reach the appropriate endpoints and hosts used by the replicated configuration. Do not use IP addresses to identify nodes in Kubernetes; always use host names instead (see the /etc/hosts sketch after this list).
- Images present a similar paradigm to binaries: Images might not change as frequently as the Kubernetes configuration, and you might not need to update images with every Kubernetes cluster replication. The images used by the primary system must be the same as the ones used in the secondary system, or inconsistencies and failures may take place. However, image replication is out of the scope of this playbook. There are multiple strategies that you can use to maintain a consistent use of images between the two locations, including the following:
- Save images in the primary and load them to the secondary's worker nodes. This approach is very easy to implement but incurs management overhead. Using container registries has considerable benefits, and saving images locally makes it more difficult to manage versions and updates.
- Images can reside in totally external container registries in different regions or data centers from the ones used by the primary and standby. External products and libraries are maintained by third parties and their availability is typically implicit in their releases.
- Images can reside in container registries located in the primary and standby. Each region gets updated in parallel when a new version of an image is released. This provides better control over the software used but incurs higher management overhead. It requires duplicating images and managing the credentials to access two different registries. CI/CD tools are typically used for this approach.
- Differences between the primary and secondary cluster: It is expected (as it is in general for DR systems) that the primary and the secondary are replicas of each other in terms of the versions and configuration used. The following aspects are especially relevant:
- Kubernetes versions
- Container runtime and container runtime version
- Network plugin and network plugin versions
- podSubnet and serviceSubnet
- etcd hostpath directory and etcd image version
- Network plugin and DNS image version
- Image repository for control plane pods
Disaster protection configurations with minor differences between sites in the Kubernetes version may work, but they may leave the cluster in an inconsistent state after a restore from a primary's etcd snapshot. Information about container runtime sockets, the Kubernetes version, and so on is stored in Kubernetes itself. For example, cri-socket information is specific to each node based on the container runtime being used, and it is stored in internal config maps. A lot of the information used for upgrades by kubeadm is based on config maps in the kube-system namespace. Therefore, it's important that both the primary and the secondary use the same information in these config maps. You can use the following command to store all the relevant information from the important config maps in both the primary and the standby in yaml files:
[prompt]$ kubectl get cm -n kube-system | grep -v NAME | awk '{print $1}'| xargs -I{} sh -c 'kubectl get cm "$1" -o yaml -n kube-system > "$1-`date +%y-%m-%d-%H-%M-%S`.yaml"' -- {}
Similarly, you can make a copy of the node configuration in each site with the following command:
[prompt]$ kubectl get node |grep -v NAME | awk '{print $1}'| xargs -I{} sh -c 'kubectl get node "$1" -o yaml > "$1-`date +%y-%m-%d-%H-%M-%S`.yaml"' -- {}
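To illustrate the host name aliasing consideration above, the following is a minimal sketch of /etc/hosts entries on a secondary node. The host names and IP addresses are examples only; you can achieve the same effect with DNS in the secondary location.
# Example /etc/hosts entries on a secondary node (host names and IP addresses are illustrative).
# The secondary nodes resolve the same node and front-end kube-api host names used by the primary
# cluster, but the names point to the local (secondary region) addresses.
10.1.0.10   k8s-api.example.com k8s-api    # front-end kube-api address
10.1.0.11   k8snode-cp1                    # control plane node
10.1.0.12   k8snode-cp2                    # control plane node
10.1.0.21   k8snode-w1                     # worker node
10.1.0.22   k8snode-w2                     # worker node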
This solution playbook presents an example using Kubernetes clusters created with the kubeadm tool. The recommendations are generic to custom Kubernetes clusters installed in on-premises systems, but most of the scripts may require modifications for clusters that are not created with the kubeadm tool.
You must use the steps and scripts provided between Kubernetes clusters running the same etcd and Kubernetes versions. Restoring etcd snapshots across different Kubernetes versions may lead to inconsistencies and unstable etcd clusters.
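As an illustrative way to confirm that the versions match (the label selector below assumes kubeadm's default static pod labels), you can compare the Kubernetes server version and the etcd image version in each region:
# Compare the Kubernetes server version and the etcd image version used by the control plane.
# Run the same commands against the primary and the secondary cluster and verify that they match.
[prompt]$ kubectl version
[prompt]$ kubectl get pods -n kube-system -l component=etcd -o jsonpath='{range .items[*]}{.spec.containers[0].image}{"\n"}{end}'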
About Required Products and Roles
This solution requires the following products and roles:
- Kubernetes cluster
- A bastion host capable of managing the Kubernetes system, accessing the etcd endpoints of the cluster, and accessing the different control plane nodes with ssh
- (Optional) Oracle Cloud Infrastructure (OCI)
This playbook is based on using OCI regions and resources for the primary and secondary regions. However, this solution is also applicable for Kubernetes clusters that are located on-premises.
These are the roles needed for each service.
Service Name: Role | Required to ... |
---|---|
Kubernetes cluster (primary): administrator | run all of the scripts. |
Kubernetes (primary) nodes: user with sudo rights to root | run the following scripts: |
Kubernetes cluster (secondary): administrator | run all of the scripts. |
Kubernetes (secondary) nodes: user with sudo rights to root | run the following scripts: |
See Oracle Products, Solutions, and Services to get what you need.