Configure for Disaster Recovery

You can use the scripts provided with this solution playbook to create a YAML snapshot in a primary Kubernetes cluster and restore it in another (secondary) Kubernetes cluster. It's important to plan the configuration and understand the requirements before downloading and using the scripts to configure your YAML snapshot.

Plan the Configuration

Plan the resources and configuration on the secondary system based on the primary system. The scripts require that both Kubernetes clusters already exist. You must be able to access both clusters with the Kubernetes command-line tool, kubectl, to run commands against them.
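
For example, a minimal access check from the node where you run the scripts; the context names primary and secondary are illustrative placeholders for the contexts defined in your own kubeconfig:

  kubectl config get-contexts
  kubectl --context primary get nodes
  kubectl --context secondary get nodes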

Note:

This solution assumes that both Kubernetes clusters, including the control plane and worker nodes, already exist. The recommendations and scripts provided in this playbook do not check resources, control plane, or worker node configuration.

The following diagram shows how, once configured, you can restore the artifact snapshot in a completely different Kubernetes cluster.

Illustration: kube-api-dr.png

Complete the following requirements for restore when planning your configuration:

  1. Confirm that the required worker nodes and resources in the primary are available in the secondary.
    This includes the pods' shared storage mounts, load balancers, and databases, as well as any external systems used by the namespaces being restored.
  2. Manually create the required persistent volumes used by the namespaces involved before running the scripts.

    This is the default action. The scripts will create the persistent volume claims used in the primary. However, because persistent volumes may be shared by different claims in different namespaces, the automation scripts expect you to manually create persistent volumes on the secondary cluster before running the extract-apply scripts (see the example after this list).

    Alternatively, you can add pv to the nons_artifacts_types variable in the maak8DR-apply.env file (that is, use export nons_artifacts_types="crd clusterrole clusterrolebinding pv"). This instructs the scripts to ALSO create the persistent volumes in the secondary. In this case, it is up to you to determine whether conflicts may arise with other persistent volume claims.

  3. Confirm that the secondary cluster has appropriate access to the container images used by the namespaces being replicated:
    • Secrets used to access container registries that exist in the namespaces to be replicated are copied by the scripts provided with this playbook. If the credentials to the registries are stored in other namespaces, then you must create them manually in the secondary. Alternatively, you can label the credentials with maak8sapply: my_ns (where my_ns is the namespace that is being restored) so that the secret is also included in the YAML snapshot. For example,
      kubectl label secret regcredfra -n other_namespace maak8sapply=namespace_being_backedup
    • If you're using any images manually loaded in the primary worker nodes, you must also load them manually in the secondary worker nodes (see the example after this list).

      Note:

      The scripts provided will report the images used in the namespaces being replicated.
  4. Provide access to the primary and secondary clusters through bastion nodes capable of running kubectl operations against each cluster’s Kube API endpoints.
    It's possible to use a third node that can ssh or scp to both (primary and standby) and coordinate the DR synchronization. However, to avoid unnecessary hops and session overhead, Oracle recommends using the primary bastion as the DR coordinator. Other options require customizing the scripts provided.
  5. Use the maak8sapply: my_ns label if you want non-namespaced resources included in the backup to be applied in the secondary when restoring the my_ns namespace.

    For those artifacts residing in the cluster’s root (that is, not part of a specific namespace), the scripts look for namespace: and group: field references that contain the namespace names. If you need any other non-namespaced resources included in the backup, you can label them to be added.

    For example, the domains.weblogic.oracle custom resource definition is not part of any namespace, but you can label it to include it in the apply operation: kubectl label crd domains.weblogic.oracle maak8sapply=opns.
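
As referenced in requirement 2, the following is a minimal sketch of manually pre-creating a persistent volume on the secondary cluster before running the scripts. All values (the name my-shared-pv, the NFS server address, the capacity) are illustrative placeholders; match them to the persistent volumes backing the claims used in the primary.

  # my-pv.yaml: illustrative values only; align with the claims used in the primary
  apiVersion: v1
  kind: PersistentVolume
  metadata:
    name: my-shared-pv
  spec:
    capacity:
      storage: 100Gi
    accessModes:
      - ReadWriteMany
    persistentVolumeReclaimPolicy: Retain
    nfs:
      server: 10.0.0.5
      path: /exports/shared

  kubectl apply -f my-pv.yaml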
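
For requirement 3, one possible way to copy manually loaded images, assuming the worker nodes use the containerd runtime, is ctr's export and import commands. The image reference and tar file name below are examples only.

  # On a primary worker node: export the image to a tar file
  ctr -n k8s.io images export myimage.tar myregistry.example.com/myapp:1.0
  # Copy the tar file to the secondary worker node, then import it
  ctr -n k8s.io images import myimage.tar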

Configure

Configure YAML snapshot disaster recovery.

  1. Download all of the YAML snapshot disaster recovery scripts from "Download Code".

    Note:

    All of the scripts must be in the same path because the main scripts use other auxiliary scripts.
  2. Edit the maak8DR-apply.env file and update the addresses and ssh key required to access the secondary system.
    For example,
    export user_sec=opc
    export ssh_key_sec=/home/opc/Key.ppk
    #Secondary bastion node
    export sechost=10.10.0.23
  3. Customize the values for the exclude_list and nons_artifacts_types variables, as needed.
    • exclude_list: This is a space-separated list of namespaces that should be excluded from the backup even when backing up ALL custom namespaces. It prevents copying control plane related namespaces that would not be applicable on the secondary.
    • nons_artifacts_types: This is the list of artifact types that belong to the root tree (that is, not part of a specific namespace) but that must also be included in the snapshot. The framework looks in these artifacts for references to the namespaces being backed up.
    Generally, you can use the default values provided in the file:
    #List of namespaces that will be excluded from the backup
    export exclude_list="kube-system kube-flannel kube-node-lease kube-public"
    #Root artifacts that will be included
    export nons_artifacts_types="crd clusterrole clusterrolebinding"
    
  4. Run the maak8DR-apply.sh script, providing the selected namespaces to replicate as arguments.
    • If you don't provide arguments, then the scripts will replicate ALL namespaces excluding the namespaces provided in the exclude_list variable.

    • If you use a precise list of namespaces, then you must order them based on the dependencies with other namespaces.

      That is, if the namespace soans depends on, or uses, services in the namespace opns, then opns must appear first in the list. For example, instead of ./maak8DR-apply.sh soans opns, run the following:

      ./maak8DR-apply.sh opns soans

Verify

After running the maak8DR-apply.sh script, verify that all of the artifacts that existed in the primary cluster have been replicated to the secondary cluster, and that the pods in the secondary site are running without error.

When you run the maak8DR-apply.sh script, the framework creates the working_dir directory as /tmp/backup.date. When you run the maak8-get-all-artifacts.sh and maak8-push-all-artifacts.sh scripts individually, the working directory is provided in each case as an argument in the command line.
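
Each script defines its own exact arguments; the following is an illustrative sketch only, with /tmp/mybackup as a placeholder working directory:

  # Check each script's usage for its exact arguments
  ./maak8-get-all-artifacts.sh /tmp/mybackup   # extract the snapshot in the primary
  ./maak8-push-all-artifacts.sh /tmp/mybackup  # apply the snapshot in the secondary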

  1. Check the status of the secondary until the required pods match the state in the primary (see the sketch after this list).
    By default, the pods and deployments are started in the secondary region. At the end of the restore, the status of the secondary cluster is shown. Some pods might take additional time to reach the RUNNING state.
  2. Check the $working_dir/date/backup-operations.log file in the primary for possible errors in the extract and apply operations.
  3. Check the $working_dir/restore.log and $working_dir/date/restore-operations.log files in the secondary for possible errors in the extract and apply operations.
    The restore-operations.log file contains the detailed restore operations.
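
For example, a minimal status check, where opns is a placeholder for one of the restored namespaces:

  # Compare pod states between the primary and secondary clusters
  kubectl get pods -n opns -o wide
  # On the secondary, watch until the pods reach the Running state
  kubectl get pods -n opns -w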