Select a DR Method

Depending on your business and IT requirements, choose the disaster recovery (DR) method that's most appropriate for your deployment.

Back Up and Restore Block Volumes

The primary purpose of backups is to support business continuity, disaster recovery, and long-term archiving.

The following are common use cases for block volume backups:
  • Creating multiple copies of the same volume. Backups are useful when you need instances with many volumes that must contain the same data.
  • Taking a snapshot that you can restore to a new volume later.
  • Ensuring that you have a reliable copy of data in case something goes wrong with the primary volume.
When you define your backup plan and goals, consider the following factors:
  • Frequency: How often you want to back up your data.
  • Recovery time: How long you can wait for a backup to be restored and accessible to the applications that use it. The time for a backup to be created depends on several factors, such as the size of the data being backed up and the amount of data that has changed since the last backup.
  • Number of stored backups: How many backups you must keep available, and the deletion schedule for backups that you no longer need.
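
To illustrate the frequency and retention factors, the following minimal Python sketch assumes daily backups and a hypothetical keep-the-newest-N retention policy. It only decides which backups are deletion candidates and doesn't call any Oracle Cloud Infrastructure API.

```python
from datetime import datetime, timedelta


def daily_schedule(start: datetime, days: int) -> list[datetime]:
    """Generate backup times for a once-a-day backup frequency."""
    return [start + timedelta(days=i) for i in range(days)]


def backups_to_delete(backup_times: list[datetime], keep: int) -> list[datetime]:
    """Return backups outside a keep-the-newest-N retention policy, oldest first.

    `keep` is a hypothetical retention setting: the number of most recent
    backups that must stay available.
    """
    if keep < 0:
        raise ValueError("keep must be non-negative")
    newest_first = sorted(backup_times, reverse=True)
    # Everything after the `keep` newest backups is a deletion candidate.
    return sorted(newest_first[keep:])


if __name__ == "__main__":
    schedule = daily_schedule(datetime(2024, 1, 1, 2, 0), days=10)
    for taken_at in backups_to_delete(schedule, keep=7):
        print("delete backup taken at", taken_at.isoformat())
```

With ten daily backups and a retention count of seven, the three oldest backups become deletion candidates.
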
When creating backups and restoring from them, consider the following best practices:
  • Before creating a backup, ensure that the data is consistent: synchronize the file system, unmount the file system if possible, and save your application data. Only data that has been written to disk is backed up. After the backup state changes from REQUEST_RECEIVED to CREATING, you can resume writing data to the volume. While a backup is in progress, the volume that is being backed up can’t be deleted.
  • If you want to attach a restored volume to a compute instance that already has the original volume attached, note that some operating systems don’t support restoring identical volumes. To work around this constraint, change the partition IDs before restoring the volume. The steps to change a partition ID depend on the operating system; see the documentation for the operating system of your compute instance.
  • Don’t delete the original volume until you have verified that the backup is created successfully.
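
The REQUEST_RECEIVED to CREATING transition described above lends itself to a simple polling loop before you resume writes. This is a hypothetical sketch: `get_state` stands in for whatever call you use to read the backup's lifecycle state, and treating AVAILABLE as safe and any other unlisted state as an error are assumptions, not documented behavior.

```python
import time
from typing import Callable

# From the text: wait while REQUEST_RECEIVED, resume writes once CREATING.
# Treating AVAILABLE (backup already finished) as safe is an assumption.
WAIT_STATES = {"REQUEST_RECEIVED"}
RESUME_STATES = {"CREATING", "AVAILABLE"}


def wait_until_writes_can_resume(
    get_state: Callable[[], str],
    poll_seconds: float = 5.0,
    timeout_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll a backup's lifecycle state until writing to the volume can resume.

    `get_state` is a hypothetical callable returning the current state string.
    Returns the state that allowed writes to resume.
    """
    waited = 0.0
    while True:
        state = get_state()
        if state in RESUME_STATES:
            return state
        if state not in WAIT_STATES:
            raise RuntimeError(f"unexpected backup state: {state}")
        if waited >= timeout_seconds:
            raise TimeoutError("backup did not leave REQUEST_RECEIVED in time")
        sleep(poll_seconds)
        waited += poll_seconds


if __name__ == "__main__":
    # Simulated state sequence; a real caller would query the service instead.
    states = iter(["REQUEST_RECEIVED", "REQUEST_RECEIVED", "CREATING"])
    print(wait_until_writes_can_resume(lambda: next(states), sleep=lambda s: None))
```

Injecting `sleep` keeps the loop testable without real delays; in practice you would leave the default.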

If your application uses multiple volumes that span more than one compute instance, then use volume-group backups. Volume groups simplify the process of creating backups and clones for applications that use multiple volumes across multiple instances. You can restore an entire group of volumes from a volume-group backup, as shown in the following diagram.

Illustration: volume-backup-restore.png

Create a Pilot Light

The term pilot light refers to a small flame in a traditional gas-powered heater that is always lit and can be used to quickly restart the heater when triggered by temperature sensors in the house. In the context of DR, a pilot light consists of the critical core components of your application, deployed at the DR site and containing the latest application configuration and critical data. These core pilot-light components can then be used to restore a production-sized environment in the event of a disaster.

The following are the critical components of the pilot light at the DR site:
  • Database Tier

    The Oracle Cloud Infrastructure Database service enables you to provision your entire database at the DR site (availability domain, region, or both) without enabling production-sized resources. When DR is activated, you can enable more resources with a single REST API call to the service, without restarting the database server.

  • Application Tier

    You deploy only one application server in your DR site (availability domain, region, or both) that contains all your latest configuration. You can use the custom images feature in Oracle Cloud Infrastructure to back up your OS and applications periodically and then use these images to provision new servers when the DR site is activated.

    For example, if a production site contains eight application servers, you deploy only one application server in the DR site and keep it synchronized with the primary site by using rsync or another tool. Each day, you create a custom image from this DR server, which you can use to provision the remaining seven servers when DR is activated.

  • Networking Tier

    Use the following Oracle Cloud Infrastructure features and services in your pilot light at the DR site:
    • IP addresses (private and public)
    • DNS service
    • Load balancing service
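
The application-tier example above (eight production servers, one in the DR site) can be sketched as a small planning step. This is a minimal Python sketch, not an Oracle Cloud Infrastructure API: the `plan_app_tier_activation` helper and the image names are hypothetical, and launching instances from the chosen custom image is left out.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ActivationPlan:
    image_name: str            # custom image to provision new servers from
    servers_to_provision: int  # servers needed to reach production size


def plan_app_tier_activation(
    production_servers: int,
    dr_servers_running: int,
    custom_images: dict[str, date],
) -> ActivationPlan:
    """Plan how to grow the pilot-light application tier to production size.

    `custom_images` maps hypothetical image names to the date each image was
    captured from the DR application server. The newest image is chosen
    because it carries the latest configuration.
    """
    if not custom_images:
        raise ValueError("no custom image available to provision from")
    missing = production_servers - dr_servers_running
    if missing < 0:
        raise ValueError("DR site already runs more servers than production")
    latest = max(custom_images, key=custom_images.__getitem__)
    return ActivationPlan(image_name=latest, servers_to_provision=missing)


if __name__ == "__main__":
    images = {
        "app-dr-2024-03-01": date(2024, 3, 1),
        "app-dr-2024-03-02": date(2024, 3, 2),
    }
    # Eight production servers, one already running in the DR site.
    print(plan_app_tier_activation(8, 1, images))
```
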

Use an Active Standby

An active standby (as opposed to a mounted standby) is a standby database that is open read-only while recovery continues. An active standby requires the Active Data Guard feature and license.

With Active Data Guard, the physical standby can be used for reads and reporting, reducing the workload on the primary. Active Data Guard provides comprehensive data protection, with automatic repair of physical block corruptions and detection of other types of corruption, such as lost writes and logical block corruptions. A mounted standby provides many of the same data protection benefits, except automatic repair of physical block corruptions. Recovery time (RTO) and data loss (RPO) are typically very low when failing over to either type of standby database, whether or not it's open read-only.

When selecting a DR method, consider whether you want symmetric or asymmetric resources:

  • Symmetric resources: In this recommended architecture, the standby system is symmetric to the primary system, so application and database performance is similar or identical after a role transition. Symmetry also ensures that the standby database has sufficient resources to keep up with the production workload, which minimizes data loss at the time of a disaster. If deployed as an active standby database (which requires the Active Data Guard option), the standby is open read-only while providing DR protection, which allows you to offload reporting and queries.

  • Asymmetric resources: In this architecture, the standby environment is a scaled-down configuration. With Active Data Guard, the standby can still be open read-only, providing the same benefit of offloading work to the standby. However, after a failover, performance might not match the original primary unless you scale up the system.

    Asymmetric (smaller) standby systems cost less because they have fewer compute, CPU, and memory resources. The trade-off is that after a role transition or failover, you must either scale up (expand) the system to match your previous primary, or accept lower performance or reduced functionality.
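
To make the asymmetric trade-off concrete, the following Python sketch compares a standby's resources with the primary's and reports what you would need to add after a role transition to restore parity. The resource names and figures are illustrative assumptions, not Oracle Cloud Infrastructure shape definitions.

```python
def scale_up_needed(primary: dict[str, float], standby: dict[str, float]) -> dict[str, float]:
    """Return how much of each resource the standby lacks relative to the primary.

    An empty result means the standby is symmetric (or larger) for every
    resource the primary defines. Resource names are illustrative only.
    """
    gaps = {}
    for resource, needed in primary.items():
        have = standby.get(resource, 0.0)
        if have < needed:
            gaps[resource] = needed - have
    return gaps


if __name__ == "__main__":
    primary = {"ocpus": 16, "memory_gb": 256}
    standby = {"ocpus": 4, "memory_gb": 64}  # hypothetical scaled-down standby
    print(scale_up_needed(primary, standby))
```

A non-empty result corresponds to the choice described above: scale up by those amounts, or accept reduced performance after failover.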

Use a Cold Standby

The term cold standby describes a DR scenario in which a redundant replica of the primary environment is deployed at a DR site. The cold standby environment is activated only if the primary system fails. This approach provides production continuity with a well-defined activation time for the switchover.

Oracle Cloud Infrastructure supports automated (programmatic) deployment of a cold standby environment, which keeps the cost of maintaining such an environment to a minimum. You're billed only for the active resources and any persistent storage that you consume at the DR site.
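
The billing model above can be approximated with simple arithmetic. In this sketch, the rates are hypothetical inputs rather than Oracle Cloud Infrastructure prices, and the model assumes that compute is the only resource that stops billing while the cold standby is inactive.

```python
def monthly_dr_cost(
    storage_gb: float,
    storage_rate_per_gb_month: float,
    compute_rate_per_hour: float,
    active_hours: float,
) -> float:
    """Estimate one month of DR-site cost under a pay-for-active-resources model.

    Persistent storage is billed for the whole month; compute is billed only
    for the hours the standby environment actually runs. All rates are
    hypothetical inputs.
    """
    storage_cost = storage_gb * storage_rate_per_gb_month
    compute_cost = compute_rate_per_hour * active_hours
    return storage_cost + compute_cost


if __name__ == "__main__":
    # Cold standby activated only for a 4-hour DR test vs. running all month (~730 h).
    print("cold standby:", monthly_dr_cost(100, 0.5, 2.0, active_hours=4))
    print("always on:   ", monthly_dr_cost(100, 0.5, 2.0, active_hours=730))
```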