Protect Critical Databases from Failures and Disasters Using Autonomous Data Guard

The Autonomous Data Guard feature enables you to keep your critical production databases available to mission critical applications despite failures, disasters, human error, or data corruption. This kind of capability is often called disaster recovery.

In Autonomous Database on Dedicated Exadata Infrastructure, you configure and manage Autonomous Data Guard at the Autonomous Container Database level.

About Autonomous Data Guard

Autonomous Data Guard creates and maintains two completely separate copies of your database: a primary database that your applications connect to and use, and a standby database that is a synchronized copy of the primary database. Then, should the primary database become unavailable for any reason, Autonomous Data Guard automatically converts the standby database to the primary database and, as such, it begins servicing your applications.

The primary and standby databases are often called peer databases of each other.

Note

Applications must be configured to use Transparent Application Continuity (TAC) to gain the full benefit of the database availability features provided by Autonomous Data Guard.

The following diagram shows how the standby database is kept synchronized with the primary database.


Description of autonomous-data-guard.eps follows

Changes made to the primary database are recorded in the primary database's redo log. Autonomous Data Guard transmits these redo records as a stream over the network to the standby database's redo log. Then, the standby database applies these records to the standby database. In this way, the standby database is kept synchronized with the primary database.

Synchronization is nearly instantaneous, but, as the process just described implies, there are two operations that consume time: transporting the redo records to the standby database and applying the redo records to the standby database. The first of these is called the transport lag, and the other is called the apply lag. You can view current lag values for an Autonomous Database from the database's Details page by choosing Autonomous Data Guard Associations under Resources in the side menu. You can view current lag values across all the Autonomous Databases in a container database from the container database's Details page in a similar fashion.

Configuring Autonomous Data Guard

In Oracle Autonomous Database on Dedicated Exadata Infrastructure on dedicated infrastructure, you configure and manage Autonomous Data Guard at the Autonomous Container Database (ACD) level. You can enable Autonomous Data Guard for both new ACDs and already provisioned ACDs on Oracle Public Cloud deployments.

When you create an Autonomous Container Database, you select the Enable Autonomous Data Guard option and then provide details about the standby database, most notably the Exadata Infrastructure and Autonomous Exadata VM Cluster resources you want it created in and the protection mode you want to use. For information about the choices you have regarding these two details, see Autonomous Data Guard Configuration Options.

You can enable Autonomous Data Guard and add a standby ACD to an existing ACD from its Details page on the Oracle Cloud Infrastructure (OCI) console. However, you cannot enable Autonomous Data Guard on an ACD with an active maintenance run scheduled within the next two weeks. When an add standby ACD operation is in progress, any scheduled maintenance on that ACD will not begin until the add standby operation is complete. See Enable Autonomous Data Guard on an Autonomous Container Database for instructions.

Role Transitions and Operations

After an Autonomous Container Database is created, you can change the role of the peer databases using either a switchover or a failover operation. Additionally, should the primary database become unavailable for any reason, Autonomous Data Guard automatically performs a failover operation.

A switchover is a role reversal between the primary database and its standby database. A switchover ensures no data loss. During a switchover, the primary database transitions to the standby role, and the standby database transitions to the primary role. In general, you perform a switchover operation to restore the roles of peer databases to their original configuration after a failed primary database has been reinstated as a standby. To perform a switchover operation, see Switch Roles in an Autonomous Data Guard Configuration.

A failover is when the primary database is unavailable. The failover results in a transition of the standby database to the primary role. In general, you do not need to perform a failover operation manually. However, in the rare case when an automatic failure operation fails, you can perform a manual failover as described in Fail Over to the Standby in an Autonomous Data Guard Configuration.

The availability and status of the database after a failover operation is characterized by two recovery objectives:

  • Recovery Time Objective (RTO). The RTO the maximum amount of time required for the database to become available to applications after a failover, and is related to some degree to the apply lag at the time of the failure. For Autonomous Data Guard, the RTO is seconds up to two minutes.
  • Recovery Point Objective (RPO). The RPO is the maximum duration of potential data loss from the failed primary database, and is related to some degree to the transport lag at the time of the failure. For Autonomous Data Guard, the RPO is near-zero.

When a failed primary database becomes available again, its role is set to Disabled Standby. At this point, you re-enable it by performing a reinstate operation. To perform a reinstate operation, see Reinstate the Disabled Standby in an Autonomous Data Guard Configuration.

Automatic Failover or Fast-Start Failover

With Automatic failover, whenever the primary Autonomous Container Database becomes unavailable because of a region failure, an availability domain failure, a failure of the Exadata Infrastructure or Autonomous Exadata VM Cluster, or the failure of the Autonomous Container Database itself, it automatically fails over to the standby Autonomous Container Database. This is also known as Fast-Start Failover.

Automatic failover is not enabled by default. You can enable automatic failover by selecting the Enable automatic failover option, while configuring the Autonomous Data Guard. See Update Autonomous Data Guard Settings to enable or disable automatic failover.

Note

You can not enable Automatic Failover for databases with cross-region Autonomous Data Guard setup on Exadata Cloud@Customer deployments.
Both the maximum performance and maximum availability protection modes support automatic failover:
  • In the Maximum availability mode, automatic failover guarantees zero data loss.
  • In the Maximum performance mode, automatic failover ensures that the standby database does not fall behind the primary database beyond the value specified for Fast Start failover lag limit. By default, Fast Start failover lag limit is set to 30 seconds and is applicable only to the Maximum performance mode. In this case, automatic failover is only possible when the configured data loss guarantee can be upheld. You can modify the Fast Start failover lag limit to any value between 5 and 3600. See Update Autonomous Data Guard Settings for instructions.
    Note

    Minimum fast start lag limit value supported by ACDs with 19.17 and older versions is 30 seconds.
Besides hardware failures, availability domain outages, and regional outages, there are a few more database health conditions that can trigger a Fast-Start Failover, as listed below:
Database Health Condition Description
Corrupted Controlfile Controlfile is permanently damaged because of a disk failure.
Corrupted Dictionary Dictionary corruption of a critical database. Currently, this state can be detected only when the database is open.
Datafile Write Errors Write errors are encountered in any data files, including temp files, system data files, and undo files.

As a result of automatic failover, the role of the failed primary database becomes Disabled Standby and, after a brief period, the standby database assumes the role of the primary database. After automatic failover concludes, a message is displayed on the details page of the disabled standby database advising you that failover has occurred.

After the service resolves the former primary Autonomous Container Database issues, you can perform a manual switchover to return both databases to their initial roles. Once you provision the standby database, you can perform various management tasks related to the standby database, including:
  • Manually switching over a primary database to a standby database.
  • Manually failing over a primary database to a standby database.
  • Reinstating a primary database to standby role after failover.
  • Terminating a standby database.

Snapshot Standby Database

A snapshot standby database is a fully updateable standby database created by converting a standby Autonomous Container Database (ACD) to a snapshot standby ACD. See Convert Physical Standby to Snapshot Standby for step-by-step instructions.

A snapshot standby database receives and archives, but does not apply, redo data from the primary database. However, it increases your Recovery Time Objective (RTO) because real-time changes from the primary database are not applied.

Snapshot standby feature supports various use cases, but here are the primary use cases;
  • Connect primary and standby application instances to primary and standby databases in read-write mode to perform initial configurations.
  • Patch snapshot standby database first and test with your standby application instance to confirm patch stability.
Note

You can not convert a physical standby Autonomous Container Database to a snapshot standby with automatic failover enabled.
While converting to a snapshot standby, you can either activate new database services that are active only in snapshot mode or use the same set of services used with the primary database. However, activating primary database services on the snapshot standby database may result in snapshot standby connection requests forwarded to the primary database or vice-versa if you use incorrect database connect strings. Hence, you must be careful to use appropriate connect string while connecting to your primary and snapshot standby database.
Note

When you create new services with snapshot standby, wallets for all the Autonomous Databases in the snapshot standby ACD are updated. To access the database, reload the wallets from standby Autonomous Databases and use snapshot standby connection strings.

You can convert the snapshot standby ACD back to a physical standby ACD from the Oracle Cloud Infrastructure (OCI) manually. See Convert Snapshot Standby to Physical Standby for detailed instructions. If a snapshot standby is not converted to a physical standby manually, it will be automatically converted back to a physical standby after 7 days from its creation. In any case, converting the snapshot standby back to a physical standby will discard all the local updates to your snapshot standby databases and apply the redo data received from the primary databases.

When a standby ACD is in the snapshot standby mode, you cannot perform the following operations on the primary ACD:
  • Create or terminate Autonomous Databases
  • Scale up or scale down Autonomous Databases
  • Restore Autonomous Databases

If the situation demands, you can manually failover to a snapshot standby from the primary database. In that case, failover converts your snapshot standby database to a physical standby database by discarding all the local updates made to your snapshot standby and applying data from the primary database. See Fail Over to the Standby in an Autonomous Data Guard Configuration for step-by-step instructions.

A switchover between the primary database and its snapshot standby database is not allowed. You must manually convert your snapshot standby to a physical standby before attempting a switchover.

Accessing Standby Databases from Client Applications

In an Autonomous Data Guard configuration, your client applications normally connect to and perform operations on the primary database.

Connecting to the Physical Standby Database

In addition to this normal connectivity, Autonomous Data Guard provides you the option to connect client applications that perform only read-only operations on the standby database. To take advantage of this option, client applications connect to the database using database service names that include "_RO" (for "read only"), as described in Predefined Database Service Names for Autonomous Databases.

Connecting to the Snapshot Standby Database

Autonomous Data Guard also lets you connect client applications that perform read-write operations to the snapshot standby database. To connect to a snapshot standby database, client applications can use database service names that include "_SS" (for "snapshot standby"), as described in Predefined Database Service Names for Autonomous Databases.

Note

When the standby database is in the snapshot standby mode, all the database services that include "_RO" services in their name are inactive and can not be used for connections.

Monitoring Lag Times

As your databases that use Autonomous Data Guard are running, you can monitor transport lag and apply lag times from the database's (or container database's) Details page by choosing Autonomous Data Guard Associations under Resources in the side menu.

You should expect to see minor fluctuations over time as the workload on your database ebbs and flows. However, if you notice a continuing upward trend in lag time, you can take these actions to resolve the situation:

  • Upward Trend in Apply Lag. A continuing upward trend in apply lag indicates that the standby database doesn't have sufficient capacity to keep up with the redo records coming from the primary database. To resolve this situation, scale up the OCPUs of the database, as described in Add CPU or Storage Resources to a Dedicated Autonomous Database.
  • Upward Trend in Transport Lag. A continuing upward trend in transport lag indicates a network performance issue. Oracle Cloud operations staff constantly monitors network performance, so you should see the situation resolve itself without you taking any action. However, if you want, you can bring the situation to the operations staff by raising a service request, as described in Create a Service Request in My Oracle Support.

Autonomous Data Guard Configuration Options

When you create an Autonomous Container Database with Autonomous Data Guard enabled, you specify which Exadata Infrastructure and Autonomous Exadata VM Cluster resources you want the standby database created in, and you specify which data protection mode you want to use.

You have three choices when specifying which Exadata Infrastructure and Autonomous Exadata VM Cluster resources to use for the standby:

  • In a different region from the primary database's Exadata Infrastructure and Autonomous Exadata VM Cluster:

    This choice provides the highest level of protection against disasters, including a catastrophic loss of external network connectivity or power to an entire region.

    To make best use of this cross-region protection, your application tier also needs to be configured to support cross-region protection. Therefore, Oracle recommends that you choose this option if your application tier is already configured this way or if you are willing to reconfigure it to support cross-region protection.

    If you choose to locate the standby database in a different region, Oracle recommends that you use the Maximum Performance protection mode.

  • In a different availability domain (AD) from the primary database's Exadata Infrastructure and Autonomous Exadata VM Cluster:

    This choice provides a high level of protection against disasters, including a catastrophic loss of external network connectivity or power to an availability domain within a region.

    This choice provides a good balance between data protection and simplicity of configuration in your application tier.

    If you choose to locate the standby database in a different availability domain, Oracle recommends that you use the Maximum Availability protection mode.

  • In the same availability domain (AD) as the primary database's Exadata Infrastructure and Autonomous Exadata VM Cluster:

    This choice provides a minimum level of protection against disasters, and Oracle recommends that you not choose it.

    If the primary database's Exadata Infrastructure and Autonomous Exadata VM Cluster resources are in a region that has only one availability domain, Oracle recommends that you use the "in a different region" option.

    If you do choose to locate the standby database in the same availability domain, Oracle recommends that you use the Maximum Availability protection mode.

About Protection Modes

Autonomous Data Guard provides these data protection modes:

  • Maximum Availability. This protection mode provides the highest level of data protection that is possible without compromising the availability of the primary database.

    The primary database does not commit transactions until it receives acknowledgment that the data has been received on the standby, (not that it has been written to disk). If the primary database does not receive this acknowledgment within 30 seconds, it operates as if it were in maximum performance mode to preserve primary database availability until it again receives acknowledgments in a timely fashion.

    This protection mode ensures zero data loss except in the case of certain double faults, such as failure of a primary database after failure of the standby database.

  • Maximum Performance. This is the default protection mode. It provides the highest level of data protection that is possible without affecting the performance of the primary database.

    The primary database commits transactions as soon as all redo data generated by those transactions has been written to its online redo log. It also sends redo data to the standby database, but this is done asynchronously with respect to transaction commitment, so primary database performance is unaffected by delays in writing redo data to the standby database.

    This protection mode offers slightly less data protection than maximum availability mode and has minimal impact on primary database performance.

You can change the protection mode in an Autonomous Data Guard setup from the Oracle Cloud Infrastructure (OCI) console. See Update Autonomous Data Guard Settings for step-by-step instructions.

For more information about protection modes in Oracle Data Guard (which underlies the Autonomous Data Guard feature), see Oracle Data Guard Protection Modes in Oracle Data Guard Concepts and Administration.

How Autonomous Data Guard Affects Standard Management Operations

In some cases, the standard management operations you perform on Autonomous Container Databases work differently on the primary and standby container databases in an Autonomous Data Guard configuration as compared to standard container databases. The following list describes these differences.

  • Change the maintenance schedule

    Maintenance scheduling of a primary container database and its standby are linked: maintenance on the standby is performed a number of days before maintenance on the primary. The default is 7 days; you can choose from 1 to 7 days when you create the primary container database or later by editing its Maintenance Details.

  • Change the maintenance type

    The maintenance type of a primary container database and its standby must be the same. You choose the maintenance type for both the primary and standby when you create the primary container database or later by editing its Maintenance Details.

  • Disable automatic backups

    You can not disable automatic backups while provisioning an Autonomous Container Database (ACD) with Autonomous Data Guard.

  • Manage scheduled maintenance

    You can manage scheduled maintenance of a primary container database and its standby separately. However, because maintenance of the two are linked, you must perform scheduled maintenance on the standby before the primary if you choose to override the scheduled maintenance time.

  • Move to a different compartment

    You can move primary and standby container databases to different compartments separately and independently, just as though they were standard container databases. However, as with standard container databases, you should exercise extreme caution when moving a container database to ensure that the container database remains accessible to the appropriate groups of cloud users.

  • Restart

    You can restart primary and standby container databases separately and independently, just as though they were standard container databases.

  • Rotate the encryption key

    You can rotate the encryption keys of primary and standby container databases separately and independently, just as though they were standard container databases.

  • Terminate

    You can terminate primary and standby container databases separately. However, the consequences of terminating a primary container database and terminating a standby container database differ:

    • Terminating a primary container database terminates both the primary and standby container databases. You cannot terminate a primary container database that contains autonomous databases.
    • Terminating a standby container database terminates the standby container database and removes the Autonomous Data Guard configuration from the primary container database, making it a standard container database.