Disaster Recovery
A well-architected disaster recovery (DR) plan enables you to recover quickly from disasters and continue to provide services to your users.
DR is the process of preparing for and recovering from a disaster. A disaster can be any event that puts your applications at risk, from network outages to equipment and application failures to natural disasters. It's almost impossible to predict when you will need disaster recovery, just like you can't predict when you'll get in a car accident. If you can't control when a disaster strikes, the next best thing is to be able to control the recovery process.
As your organization moves workloads to the cloud, you need to apply what you know about building resilient on-premises systems to your cloud deployments so that your DR plan continues to support business continuity. Oracle Cloud Infrastructure (OCI) provides highly available, secure, and scalable infrastructure and services that enable you to recover your cloud workloads quickly, reliably, and securely.
Multi-tier architectures, such as three-tier architectures, are common in traditional on-premises enterprise applications. This section uses an example three-tier enterprise application to show how you can make an application more resilient to disasters by using OCI DR capabilities and the best practices for a reliable and resilient cloud topology. The following diagram shows an example enterprise application in a warm standby DR configuration.
DR Concepts
The first step in planning for DR involves determining the recovery time objective (RTO) and recovery point objective (RPO).
The RTO is the target time within which a given application must be restored after a disaster occurs. Typically, the more critical the application, the lower the RTO.
The RPO is the maximum window of data loss, measured back in time from the moment the disaster occurs, that an application can tolerate before the loss begins to affect the business.
To build a plan that guarantees the recovery of your applications after a disaster and is cost effective, you must consider both the target time to recover and the tolerance for data loss.
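As a minimal sketch of how the two objectives work together, the following Python snippet checks a candidate recovery design against both targets. The class name, function name, and all the durations are hypothetical values chosen for illustration, not figures from OCI.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RecoveryObjectives:
    rto: timedelta  # target time to restore the application
    rpo: timedelta  # tolerated window of lost data


def meets_objectives(objectives, worst_case_data_loss, estimated_recovery_time):
    """Return True only if both the data-loss and recovery-time targets are met."""
    return (
        worst_case_data_loss <= objectives.rpo
        and estimated_recovery_time <= objectives.rto
    )


# Hypothetical targets: restore within 1 hour, lose at most 15 minutes of data.
targets = RecoveryObjectives(rto=timedelta(hours=1), rpo=timedelta(minutes=15))

# Nightly backups can lose up to 24 hours of data, so the RPO is missed.
nightly_backup_ok = meets_objectives(
    targets, timedelta(hours=24), timedelta(hours=4)
)

# Replication with about 30 seconds of lag and a 20-minute switchover passes.
replication_ok = meets_objectives(
    targets, timedelta(seconds=30), timedelta(minutes=20)
)
```

A design has to satisfy both conditions: a fast restore does not help if the restored data is older than the RPO allows.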
For more information, see Best practices for protecting your cloud topology against disasters.
Choosing a DR Approach
Some applications are more critical than others. The DR solution you choose depends on many possible requirements, including availability, data durability, RTO, and RPO.
Evaluate the DR methods in the following table to decide which OCI DR capabilities to use when deploying multi-tier enterprise applications on OCI.
DR Method | RPO | RTO | Cost |
---|---|---|---|
Backup and restore | Hours | Hours | $ |
Pilot light | Minutes | Minutes | $$ |
Warm standby | Seconds | Minutes | $$$ |
Active/active | Near zero | Potentially zero | $$$$ |
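The comparison in the table can be sketched as a small selection helper that returns the least expensive method meeting a given RPO and RTO. The numeric bounds below are assumptions made only to turn "Hours", "Minutes", "Seconds", and "Near zero" into comparable numbers; they are not Oracle-published values, and the names are hypothetical.

```python
from typing import NamedTuple, Optional


class DrMethod(NamedTuple):
    name: str
    worst_rpo_s: int  # assumed upper bound on data loss, in seconds
    worst_rto_s: int  # assumed upper bound on recovery time, in seconds
    cost_rank: int    # number of "$" signs in the table


# Assumed bounds: "Hours" = 24 h, "Minutes" = 1 h, "Seconds" = 60 s, "Near zero" = 1 s.
METHODS = [
    DrMethod("Backup and restore", 24 * 3600, 24 * 3600, 1),
    DrMethod("Pilot light", 3600, 3600, 2),
    DrMethod("Warm standby", 60, 3600, 3),
    DrMethod("Active/active", 1, 1, 4),
]


def cheapest_method(rpo_s: int, rto_s: int) -> Optional[DrMethod]:
    """Return the lowest-cost method whose assumed bounds fit both targets."""
    candidates = [
        m for m in METHODS if m.worst_rpo_s <= rpo_s and m.worst_rto_s <= rto_s
    ]
    return min(candidates, key=lambda m: m.cost_rank, default=None)


# Example: tolerate 2 minutes of data loss and 2 hours of downtime.
choice = cheapest_method(rpo_s=120, rto_s=7200)
```

With these assumed bounds, the example call selects warm standby, because the cheaper methods exceed the 2-minute RPO.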
Consider both regions and the availability domains within a region for DR and high availability (HA) scenarios. A region is a localized geographic area, and an availability domain is one or more data centers located within a region. If your DR plan requires that DR sites are physically located far apart, using multiple regions can accomplish this goal.
For our example enterprise application, we need to be able to survive a regional outage but can handle some downtime if a region is affected. For these reasons we chose a warm standby deployment in multiple regions.
DR Design Considerations
The design considerations depend on the DR method that you implement.
For background information about DR capabilities, see DR Capabilities of Oracle Cloud. In this example, we are looking at the warm standby method and the Oracle Cloud Infrastructure resources that are needed to implement warm standby, which include a second region for a cross-region deployment.
Networking
After you create the network foundation of virtual cloud networks (VCNs) and subnets in each region, peer the VCNs across the regions to provide the network connectivity that DR requires.
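Peered VCNs generally must not use overlapping CIDR blocks, so it can help to validate the address plan before you set up peering. The following sketch uses only the Python standard library; the CIDR values and the function name are hypothetical, and you should confirm the exact peering prerequisites in the OCI networking documentation.

```python
import ipaddress


def cidrs_overlap(cidr_a: str, cidr_b: str) -> bool:
    """Return True if the two CIDR blocks share any addresses."""
    return ipaddress.ip_network(cidr_a).overlaps(ipaddress.ip_network(cidr_b))


# Hypothetical address plan for the primary and DR regions.
primary_vcn_cidr = "10.0.0.0/16"
standby_vcn_cidr = "10.1.0.0/16"

# False here means the plan is safe to peer under the non-overlap assumption.
plan_conflicts = cidrs_overlap(primary_vcn_cidr, standby_vcn_cidr)
```

Running the check while the address plan is still on paper is cheaper than renumbering a VCN after workloads are deployed.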
Compute
To run applications on compute instances in two regions, you must make the compute images available in both regions. In the region for DR, deploy a minimal setup to maintain a warm standby. Then, use capacity reservations to reserve the rest of the required capacity to run all the VMs when the DR region becomes primary. For more information, see Overview of the Compute Service and Best Practices for Your Compute Instances.
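The sizing step described above is simple arithmetic: reserve the difference between the full production footprint and what the warm standby already runs. The following sketch assumes hypothetical tier labels and instance counts; it does not call any OCI API.

```python
from collections import Counter


def capacity_to_reserve(full_footprint: dict, warm_standby: dict) -> dict:
    """Return the instance counts still needed per tier after a failover."""
    # Counter subtraction drops tiers whose remaining count is zero or negative.
    return dict(Counter(full_footprint) - Counter(warm_standby))


# Hypothetical footprint: instances per tier in the primary region.
production = {"web-tier": 4, "app-tier": 6}

# Hypothetical minimal setup that already runs in the DR region.
standby = {"web-tier": 1, "app-tier": 2}

reservation = capacity_to_reserve(production, standby)
```

The result is the set of counts to cover with capacity reservations so that the DR region can scale to the full footprint when it becomes primary.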
Storage
Object Storage is an internet-scale, high-performance storage platform that offers reliable and cost-efficient data durability. Object Storage is a regional service and is available across all availability domains within a region. For DR purposes, you must configure Object Storage replication. The example architecture uses Object Storage but does not use Block Volume or File Storage.
Database
Autonomous Database is a self-driving, self-securing, self-repairing database service. You do not need to configure or manage any hardware or install any software. Autonomous Database handles creating the database, as well as backing up, patching, upgrading, and tuning the database. For more information, see Provision an Autonomous Database and Create and Manage Dedicated Autonomous Databases.
For DR purposes, the primary database is replicated to a standby database in a remote region using Autonomous Data Guard. For more information about how to achieve HA with Autonomous Database standby databases, see About Standby Databases and Enable a Standby Database.
When Autonomous Data Guard is enabled with a standby database in the current region, Autonomous Database monitors the primary database. If the primary database becomes unavailable, the standby database automatically assumes the role of the primary database.
Autonomous Data Guard does not perform automatic failover for a cross-region standby. If both the primary database and a local standby are unavailable, you can perform a manual failover to make the standby database in a remote region assume the primary role. For more information, see Oracle Maximum Availability Architecture and Autonomous Database Cloud and Plan Disaster Recovery for Databases.
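The failover behavior described above can be summarized as a small decision function, which may be useful as the basis for a runbook. This is a sketch of the decision logic only; the enum and function names are hypothetical, and the code does not call Autonomous Database or any OCI API.

```python
from enum import Enum


class FailoverAction(Enum):
    NONE = "primary is available; no action needed"
    AUTOMATIC_LOCAL = "local standby assumes the primary role automatically"
    MANUAL_CROSS_REGION = "operator performs a manual failover to the remote standby"
    NO_STANDBY = "no standby is available"


def failover_action(
    primary_up: bool, local_standby_up: bool, remote_standby_up: bool
) -> FailoverAction:
    """Map database availability to the failover path described in the text."""
    if primary_up:
        return FailoverAction.NONE
    if local_standby_up:
        # Failover to a standby in the current region is automatic.
        return FailoverAction.AUTOMATIC_LOCAL
    if remote_standby_up:
        # Failover to a cross-region standby is never automatic.
        return FailoverAction.MANUAL_CROSS_REGION
    return FailoverAction.NO_STANDBY


# Regional outage: primary and local standby are both down.
regional_outage = failover_action(False, False, True)
```

In the regional outage case, the function returns the manual cross-region path, which is the step a DR runbook needs an operator to carry out.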
Monitoring
Monitoring enables you to actively and passively monitor your cloud resources for improved availability and consistent service levels. Also ensure that you subscribe to the OCI status notifications and check the Service Health Dashboard. For an example, see End-to-End Monitoring of applications running on Oracle Cloud Infrastructure.
Explore More
Solution Playbooks:
Learn about protecting your cloud topology against disasters
Design the infrastructure to deploy Oracle Enterprise Performance Management in the cloud (DR Architecture: Multiple Regions)
Deploy Commvault to protect your VMware SDDC in the cloud against disasters
Deploy Zerto to protect your VMware SDDC in the cloud against disasters
Deploy Veeam to protect your VMware SDDC in the cloud against disasters
Deploy Actifio to protect your VMware SDDC in the cloud against disasters
Reference Architectures:
Deploy Exadata Cloud Service with Data Guard in multiple regions
Deploy a cross-region disaster recovery solution using RackWare
Configure cross-region private connectivity between tenancies
Documentation and other resources: