Disaster Recovery

The goal of disaster recovery (DR) is to provide high availability at the level of an installation site, and to protect critical workloads hosted on a Roving Edge device against outages and data loss.

The current Roving Edge controller software provides a Disaster Recovery service that orchestrates DR operations from within the Service Enclave. This service is also called native DR because it is built directly into the infrastructure services layer.

Note

The native DR service is shared with Private Cloud Appliance. However, the previous-generation disaster recovery service, which might still be used on existing Private Cloud Appliance installations, is NOT supported in the Roving Edge controller software.

Setting up disaster recovery is the responsibility of a device administrator. The systems participating in the DR setup are fully operational installations in their own right, running in different physical locations. A mutual peer connection must be established first, so that each system can act as the other's standby or replica if an outage occurs at one of the sites. The DR services on the primary and standby systems communicate with each other through REST API calls. Local commands are sent to the appropriate cloud infrastructure service through the platform layer's internal messaging and administration services.

The DR service makes a clear distinction between resources and operations. The administrator determines which workloads and resources are under disaster recovery protection by creating and managing DR configurations. The DR configurations define which compute instances are protected against site-level incidents, and map the relevant source and target compartments and networking resources between the peered systems.
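To make the role of a DR configuration more concrete, the following sketch models one as a plain data structure: a set of protected compute instances plus the compartment and network mappings between the peered systems. It is a minimal illustration under stated assumptions only. The `DrConfiguration` class, its field names, and the example identifiers are hypothetical and do not reflect the actual schema or API of the DR service.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DrConfiguration:
    """Hypothetical model of a DR configuration (not the real service schema)."""

    name: str
    # Identifiers of the compute instances under DR protection (illustrative).
    protected_instances: list[str] = field(default_factory=list)
    # Source compartment on the primary -> target compartment on the standby.
    compartment_map: dict[str, str] = field(default_factory=dict)
    # Source network resource on the primary -> target resource on the standby.
    network_map: dict[str, str] = field(default_factory=dict)

    def target_compartment(self, source: str) -> str:
        """Return the standby compartment mapped to a primary compartment."""
        try:
            return self.compartment_map[source]
        except KeyError:
            raise KeyError(f"no target mapped for compartment {source!r}") from None


cfg = DrConfiguration(
    name="dr-config-1",
    protected_instances=["instance-a"],
    compartment_map={"prod": "prod-dr"},
    network_map={"subnet-a": "subnet-a-dr"},
)
print(cfg.target_compartment("prod"))  # prod-dr
```

Keeping the mappings explicit in the configuration, rather than deriving them at runtime, reflects the separation the DR service makes between what is protected (the configuration) and what is done with it (the DR plan).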

DR operations are defined in a DR plan, which outlines the steps to perform during a switchover, failover, or postfailover operation. DR configurations and DR plans are shared between the peered systems and can be created and maintained from either the primary or the standby system. DR operations can also be executed from either system, except for failover, which is always triggered from the standby.
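The rule for where each type of DR plan can be started is simple enough to express as a short check. A failover is the response to the primary site becoming unavailable, so it is reasonable that it can only be started from the standby. The sketch below encodes the rule as stated above. The `Role`, `PlanType`, and `can_execute` names are hypothetical and are not part of the DR service API.

```python
from enum import Enum


class Role(Enum):
    PRIMARY = "primary"
    STANDBY = "standby"


class PlanType(Enum):
    SWITCHOVER = "switchover"
    FAILOVER = "failover"
    POSTFAILOVER = "postfailover"


def can_execute(plan: PlanType, local_role: Role) -> bool:
    """Return True if a plan of this type may be started on a system with this role.

    Hypothetical encoding of the documented rule: any DR plan can be executed
    from either peer, except failover, which is always triggered from the standby.
    """
    if plan is PlanType.FAILOVER:
        return local_role is Role.STANDBY
    return True


print(can_execute(PlanType.FAILOVER, Role.PRIMARY))    # False
print(can_execute(PlanType.SWITCHOVER, Role.PRIMARY))  # True
```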

It is important to understand what is covered under disaster recovery and what is not.

Disaster recovery supports:
  • Compute instances

  • The block volumes associated with these compute instances

The following limitations apply to the disaster recovery feature:
  • File systems are not supported

  • Object storage is not supported

  • OKE clusters are not supported

  • Application and network load balancers are not supported

  • SR-IOV instances are not supported

  • Compute instances with shared block volumes are not supported
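Before adding instances to a DR configuration, an administrator might screen them against the instance-level limitations in the list above. The sketch below checks only the two limitations that are properties of a compute instance: SR-IOV instances and block volumes shared with another instance. The other items concern resource types (file systems, object storage, OKE clusters, load balancers) rather than instance properties, so they are not modeled here. The `Instance` class and `dr_blockers` function are hypothetical helpers, not part of any Roving Edge tool or API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Instance:
    """Hypothetical, simplified view of a compute instance for DR screening."""

    name: str
    sriov: bool = False
    # Identifiers of the block volumes attached to this instance.
    volumes: list[str] = field(default_factory=list)


def dr_blockers(instance: Instance, all_instances: list[Instance]) -> list[str]:
    """Return the reasons why an instance cannot be placed under DR protection."""
    reasons = []
    if instance.sriov:
        reasons.append("SR-IOV instances are not supported")
    # A volume counts as shared if any *other* instance also has it attached.
    others = {v for i in all_instances if i is not instance for v in i.volumes}
    shared = sorted(set(instance.volumes) & others)
    if shared:
        reasons.append(f"shared block volumes: {', '.join(shared)}")
    return reasons


web = Instance("web", volumes=["vol-1", "vol-2"])
db = Instance("db", volumes=["vol-2"])
edge = Instance("edge", sriov=True, volumes=["vol-3"])
fleet = [web, db, edge]
for inst in fleet:
    print(inst.name, dr_blockers(inst, fleet))
```

An empty list means neither instance-level limitation applies; it does not guarantee that the instance can be protected, since the DR service itself remains the authority on what it accepts.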