Disaster Recovery

A well-architected disaster recovery (DR) plan enables you to recover quickly from disasters and continue to provide services to your users.

DR is the process of preparing for and recovering from a disaster. A disaster can be any event that puts your applications at risk, from network outages to equipment and application failures to natural disasters. It's almost impossible to predict when you will need disaster recovery, just like you can't predict when you'll get in a car accident. If you can't control when a disaster strikes, the next best thing is to be able to control the recovery process.

As your organization moves workloads to the cloud, you need to carry what you know about building resilient on-premises systems over to the cloud so that you can maintain business continuity. Oracle Cloud Infrastructure (OCI) provides highly available, secure, and scalable infrastructure and services that let you recover your cloud workloads quickly, reliably, and securely.

Because multi-tier (often three-tier) architectures are common in traditional on-premises enterprise applications, let's use an example three-tier enterprise application to show how you can make an application more resilient to disasters by using OCI DR capabilities and best practices for reliable, resilient cloud topologies. The following diagram shows the example application in a warm standby DR configuration.

Figure: Example enterprise application in a warm standby disaster recovery configuration.

DR Concepts

The first step in planning for DR involves determining the recovery time objective (RTO) and recovery point objective (RPO).

The RTO is the target time within which a given application must be restored after a disaster occurs. Typically, the more critical the application, the lower the RTO.

The RPO is the maximum amount of data loss, measured as a period of time before a disaster occurs, that an application can tolerate before the loss begins to affect the business.

To build a cost-effective plan that reliably recovers your applications after a disaster, you must consider both the target time to recover (RTO) and the tolerance for data loss (RPO).

Figure: Timeline showing the recovery point objective before a disaster, the disaster itself, and the recovery time objective after it.
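To make the two objectives concrete, the following Python sketch measures the RPO and RTO actually achieved during an incident and compares them with targets. It assumes you can record three timestamps: the newest recoverable data point, the start of the disaster, and the time service was restored. The `Incident` class and the function names are illustrative only and are not part of any OCI API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    """Timestamps recorded for one disaster (illustrative structure, not an OCI type)."""
    last_recoverable_point: datetime  # newest data that survived, e.g. the last replicated change
    disaster_start: datetime          # when the outage began
    service_restored: datetime        # when users could use the application again


def achieved_rpo(incident: Incident) -> timedelta:
    # Data-loss window, measured backward from the disaster to the last recoverable point.
    return incident.disaster_start - incident.last_recoverable_point


def achieved_rto(incident: Incident) -> timedelta:
    # Downtime, measured forward from the disaster until service is restored.
    return incident.service_restored - incident.disaster_start


def meets_objectives(incident: Incident, rpo: timedelta, rto: timedelta) -> bool:
    return achieved_rpo(incident) <= rpo and achieved_rto(incident) <= rto


t0 = datetime(2024, 1, 1, 12, 0)
event = Incident(
    last_recoverable_point=t0 - timedelta(seconds=30),
    disaster_start=t0,
    service_restored=t0 + timedelta(minutes=20),
)
print(achieved_rpo(event))  # 0:00:30
print(achieved_rto(event))  # 0:20:00
print(meets_objectives(event, rpo=timedelta(minutes=1), rto=timedelta(minutes=30)))  # True
```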

For more information, see Best practices for protecting your cloud topology against disasters.

Choosing a DR Approach

Some applications are more critical than others. The DR solution you choose depends on many possible requirements, including availability, data durability, RTO, and RPO.

Evaluate the DR methods in the following table to decide which OCI DR capabilities to use when deploying multi-tier enterprise applications on OCI.

DR method            RPO         RTO               Cost
Backup and restore   Hours       Hours             $
Pilot light          Minutes     Minutes           $$
Warm standby         Seconds     Minutes           $$$
Active/active        Near zero   Potentially zero  $$$$
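As an illustration of how to read the table, the following sketch picks the lowest-cost method that meets a required RPO and RTO. The table gives only qualitative ranges, so the ordinal `Tier` scale, and treating "Potentially zero" as near zero, are assumptions made for this example rather than OCI definitions.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    """Assumed ordinal buckets for the table's ranges; lower means less loss or downtime."""
    NEAR_ZERO = 0
    SECONDS = 1
    MINUTES = 2
    HOURS = 3


@dataclass(frozen=True)
class DrMethod:
    name: str
    rpo: Tier
    rto: Tier
    cost: int  # number of "$" signs in the table


# Rows of the table above; "Potentially zero" RTO is treated as NEAR_ZERO.
METHODS = [
    DrMethod("Backup and restore", Tier.HOURS, Tier.HOURS, 1),
    DrMethod("Pilot light", Tier.MINUTES, Tier.MINUTES, 2),
    DrMethod("Warm standby", Tier.SECONDS, Tier.MINUTES, 3),
    DrMethod("Active/active", Tier.NEAR_ZERO, Tier.NEAR_ZERO, 4),
]


def cheapest_method(required_rpo: Tier, required_rto: Tier) -> DrMethod:
    """Lowest-cost method whose RPO and RTO are at least as good as required."""
    candidates = [m for m in METHODS if m.rpo <= required_rpo and m.rto <= required_rto]
    # Active/active always qualifies, so candidates is never empty.
    return min(candidates, key=lambda m: m.cost)


print(cheapest_method(Tier.SECONDS, Tier.MINUTES).name)  # Warm standby
print(cheapest_method(Tier.HOURS, Tier.HOURS).name)      # Backup and restore
```

Real requirements such as regional scope and budget also matter, so treat the result as a starting point for the decision, not the decision itself.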

For DR and high availability (HA) scenarios, consider both regions and the availability domains within a region. A region is a localized geographic area, and an availability domain is one or more data centers located within a region. Spreading resources across availability domains protects against the loss of a single availability domain, but not against a regional outage. If your DR plan requires that DR sites be physically far apart, use multiple regions.
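The difference between the two scopes can be expressed as a small rule: two sites in different regions can survive a regional outage, while two sites in different availability domains of the same region can survive only the loss of one availability domain. The following sketch encodes that rule; `Site` and the region and availability domain names are hypothetical.

```python
from typing import NamedTuple


class Site(NamedTuple):
    region: str
    availability_domain: str


def survivable_outage(primary: Site, secondary: Site) -> str:
    """Largest failure scope that a primary/secondary pair of sites can survive."""
    if primary.region != secondary.region:
        return "regional outage"  # separate regions: a DR pattern
    if primary.availability_domain != secondary.availability_domain:
        return "availability domain outage"  # same region, separate ADs: an HA pattern
    return "no site-level outage"  # both sites share one availability domain


# Hypothetical region and availability domain names.
print(survivable_outage(Site("region-a", "AD-1"), Site("region-b", "AD-1")))  # regional outage
print(survivable_outage(Site("region-a", "AD-1"), Site("region-a", "AD-2")))  # availability domain outage
```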

For our example enterprise application, we need to survive a regional outage but can tolerate some downtime if a region is affected. For these reasons, we chose a warm standby deployment across multiple regions, which, as the table shows, offers an RPO of seconds and an RTO of minutes.

DR Design Considerations

The design considerations depend on the DR method that you implement.

For background information about DR capabilities, see DR Capabilities of Oracle Cloud. This example uses the warm standby method and the OCI resources needed to implement it, including a second region for a cross-region deployment.

Networking

After you create the network foundation of virtual cloud networks (VCNs) and subnets in each region, peer the VCNs across the regions (remote VCN peering) so that the primary and DR sites have network connectivity.
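VCNs that you peer must not have overlapping CIDR blocks, so it helps to plan the address space of the primary and DR VCNs together. The following sketch uses Python's standard `ipaddress` module to flag overlapping ranges before you set up peering. The CIDR values are hypothetical examples, and the sketch does not call any OCI API.

```python
import ipaddress
from itertools import product


def overlapping_cidrs(primary: list[str], standby: list[str]) -> list[tuple[str, str]]:
    """Every (primary, standby) CIDR pair whose address ranges overlap; peering needs none."""
    return [
        (a, b)
        for a, b in product(primary, standby)
        if ipaddress.ip_network(a).overlaps(ipaddress.ip_network(b))
    ]


# Hypothetical address plans for the VCNs in the primary and DR regions.
print(overlapping_cidrs(["10.0.0.0/16"], ["10.1.0.0/16"]))    # []
print(overlapping_cidrs(["10.0.0.0/16"], ["10.0.128.0/20"]))  # [('10.0.0.0/16', '10.0.128.0/20')]
```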

Compute

To run applications on compute instances in both regions, the compute images must be available in both regions. In the DR region, deploy a minimal set of instances to maintain the warm standby. Then use capacity reservations to reserve the remaining capacity needed to run all the VMs when the DR region becomes the primary region. For more information, see Overview of the Compute Service and Best Practices for Your Compute Instances.
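Under the approach described above, where the warm standby runs a minimal footprint and the reservation covers the rest, sizing the reservation is simple arithmetic: for each shape, reserve the difference between what production needs and what the standby already runs. The following sketch assumes you track instance counts per shape; the shape keys and counts are hypothetical, and the function does not create reservations in OCI.

```python
def reservation_needed(full_capacity: dict[str, int], standby: dict[str, int]) -> dict[str, int]:
    """Instances per shape to reserve so the DR region can scale to full size on failover.

    full_capacity: instances per shape needed when the DR region becomes primary.
    standby:       instances per shape already running as the warm standby.
    """
    needed: dict[str, int] = {}
    for shape, total in full_capacity.items():
        running = standby.get(shape, 0)
        if running > total:
            raise ValueError(f"{shape}: standby runs more instances than production needs")
        if total > running:
            needed[shape] = total - running
    return needed


# Hypothetical shape keys and counts for the web and application tiers.
production = {"web-shape": 6, "app-shape": 4}
warm_standby = {"web-shape": 1, "app-shape": 1}
print(reservation_needed(production, warm_standby))  # {'web-shape': 5, 'app-shape': 3}
```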

Storage

Object Storage is an internet-scale, high-performance storage platform that offers reliable and cost-efficient data durability. Object Storage is a regional service that is available across all availability domains within a region, so for DR you must configure Object Storage replication to a bucket in the DR region. The example architecture uses Object Storage but not Block Volume or File Storage.
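Because Object Storage replication is asynchronous, objects written to the source bucket shortly before a disaster might not yet exist in the DR bucket, and that lag contributes to the effective RPO for object data. The following sketch compares two bucket listings and reports objects that are missing or different in the destination. It assumes you have already collected each listing as a mapping from object name to a content hash; how you collect them is outside the sketch, and the object names and hashes are hypothetical.

```python
def replication_gaps(source: dict[str, str], destination: dict[str, str]) -> list[str]:
    """Names of objects that are missing from, or differ in, the destination bucket.

    Both arguments map object name -> content hash (hypothetical inputs collected
    from listings of the source bucket and the DR bucket).
    """
    return sorted(name for name, digest in source.items() if destination.get(name) != digest)


primary_bucket = {"orders/1.json": "h1", "orders/2.json": "h2", "orders/3.json": "h3"}
dr_bucket = {"orders/1.json": "h1", "orders/2.json": "h2-old"}
print(replication_gaps(primary_bucket, dr_bucket))  # ['orders/2.json', 'orders/3.json']
```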

Database

Autonomous Database is a self-driving, self-securing, self-repairing database service. You do not need to configure or manage any hardware or install any software. Autonomous Database handles creating the database, as well as backing up, patching, upgrading, and tuning the database. For more information, see Provision an Autonomous Database and Create and Manage Dedicated Autonomous Databases.

For DR purposes, the primary database is replicated to a standby database in a remote region using Autonomous Data Guard. For more information about how to achieve HA with Autonomous Database standby databases, see About Standby Databases and Enable a Standby Database.

  • When Autonomous Data Guard is configured with a standby database in the current (local) region, Autonomous Database monitors the primary database. If the primary database becomes unavailable, the local standby database automatically assumes the primary role.

  • Autonomous Data Guard does not perform automatic failover to a cross-region standby. If the primary database is unavailable and no local standby is available, you can perform a manual failover so that the standby database in the remote region assumes the primary role. For more information, see Oracle Maximum Availability Architecture and Autonomous Database Cloud and Plan Disaster Recovery for Databases.
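The failover behavior in the two bullets above can be summarized as a small decision function. This sketch only models which path applies; it does not call Autonomous Database APIs, and the `FailoverAction` names are hypothetical.

```python
from enum import Enum


class FailoverAction(Enum):
    NONE = "primary is available; no failover needed"
    AUTOMATIC_LOCAL = "Autonomous Data Guard fails over to the local standby automatically"
    MANUAL_CROSS_REGION = "an operator performs a manual failover to the cross-region standby"
    NO_STANDBY = "no standby is available to take over"


def failover_action(primary_up: bool, local_standby_up: bool, remote_standby_up: bool) -> FailoverAction:
    """Map database availability to a failover path.

    Pass local_standby_up=False when no local standby is configured.
    """
    if primary_up:
        return FailoverAction.NONE
    if local_standby_up:
        # Automatic: Autonomous Database monitors the primary and switches roles itself.
        return FailoverAction.AUTOMATIC_LOCAL
    if remote_standby_up:
        # Not automatic across regions: a person must decide on and trigger this failover.
        return FailoverAction.MANUAL_CROSS_REGION
    return FailoverAction.NO_STANDBY


print(failover_action(primary_up=False, local_standby_up=False, remote_standby_up=True).value)
```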

Monitoring

Monitoring lets you actively and passively monitor your cloud resources to improve availability and maintain consistent service levels. Also subscribe to OCI status notifications and check the Service Health Dashboard. For an example, see End-to-End Monitoring of applications running on Oracle Cloud Infrastructure.
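A common pattern when turning monitoring signals into DR decisions is to act only after several consecutive failed health checks, so that a single transient error doesn't trigger a regional failover. The following stand-alone sketch illustrates that pattern with the Python standard library. It is not the OCI Monitoring service API, and the health endpoint URL and threshold are hypothetical.

```python
import urllib.request
from typing import Callable


def http_probe(url: str, timeout: float = 5.0) -> bool:
    """Return True if the endpoint responds with an HTTP 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # URLError, HTTPError (non-2xx), and timeouts are all subclasses of OSError.
        return False


class FailureTracker:
    """Signals only after `threshold` consecutive failed probes; any success resets the count."""

    def __init__(self, probe: Callable[[], bool], threshold: int = 3) -> None:
        self.probe = probe
        self.threshold = threshold
        self.consecutive_failures = 0

    def check(self) -> bool:
        """Run one probe and return True once the failure threshold is reached."""
        if self.probe():
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
        return self.consecutive_failures >= self.threshold


# Hypothetical endpoint: FailureTracker(lambda: http_probe("https://app.example.com/health"))
# Here a scripted probe stands in for real checks so the example runs offline.
scripted = iter([True, False, False, False])
tracker = FailureTracker(lambda: next(scripted), threshold=3)
print([tracker.check() for _ in range(4)])  # [False, False, False, True]
```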

Explore More