18 Oracle Maximum Availability Architecture and Oracle Autonomous Database

Oracle Maximum Availability Architecture (MAA) is a set of best practices developed by Oracle engineers over many years for the integrated use of Oracle High Availability, data protection, and disaster recovery technologies. The key goal of Oracle MAA is to meet Recovery Time Objectives (RTO) and Recovery Point Objectives (RPO) for Oracle databases and applications running on our system and database platforms using Oracle Cloud MAA architectures and solutions.

See Oracle MAA Reference Architectures for an overview of the MAA reference architectures and their associated benefits and potential RTO and RPO targets. Because Oracle Autonomous Database runs on the Oracle Exadata Cloud infrastructure (ExaInfra and ExaC@C), also see Oracle Maximum Availability Architecture in Oracle Exadata Cloud Systems for the HA and data protection benefits that are inherent to Oracle Exadata Cloud.

Oracle MAA uses Chaos Engineering throughout its testing and development life cycles to ensure that end-to-end application and database availability is preserved, or kept at its optimal level, for any fault or maintenance event in Oracle Cloud. Chaos Engineering is the discipline of experimenting on a system to build confidence in the system’s capability to withstand turbulent conditions in production. Specifically, MAA aggressively injects various faults and planned maintenance events to evaluate application and database impact throughout our development, stress, and testing cycles. From that experimentation, best practices, defects, and lessons learned are derived, and that knowledge is put back into practice to evolve and improve our cloud MAA solutions.

Oracle Autonomous Database with Default High Availability Option

High availability is suitable for all development, test, and production databases that have high uptime requirements and zero or low data loss tolerance. By default, Oracle Autonomous Database (ADB) is highly available, incorporating a multi-node configuration to protect against localized hardware failures.

Each ADB application service resides in at least one Oracle Real Application Clusters (Oracle RAC) instance, with the option to fail over to another available Oracle RAC instance for unplanned outages or planned maintenance activities, enabling zero or near-zero downtime.

ADB automatic backups are stored in Oracle Cloud Infrastructure Object Storage and are replicated to another availability domain. These backups can be used to restore the database in the event of a disaster. For Oracle Autonomous Database on Exadata Cloud@Customer (ADB-C@C), you have the option to back up to NFS or Zero Data Loss Recovery Appliance (ZDLRA); however, configuring and managing replication of those backups is your responsibility.

Major database upgrades are automated. For Oracle Autonomous Database on Shared Exadata Infrastructure (ADB-S), the downtime is minimal.

The uptime service-level agreement (SLA) per month is 99.95% (a maximum of 22 minutes of downtime per month). To achieve the application uptime SLAs where most months would be zero downtime, refer to the Maintaining Application Uptime section below.

The following table describes the recovery-time objectives and recovery-point objectives (data loss tolerance) for different outages.

Table 18-1 Default High Availability Policy Recovery Time and Recovery Point Service-level Objectives

Each entry below lists the failure and maintenance events, followed by the Database Downtime, the Service-level Downtime (RTO), and the Potential Service-level Data Loss (RPO).

Localized events, including:

  • Exadata cluster network topology failures
  • Storage (disk and flash) failures
  • Database instance failures
  • Database server failures
  • Periodic software and hardware maintenance updates

Database Downtime: Zero

Service-level Downtime (RTO): Near-zero

Potential Service-level Data Loss (RPO): Zero

Events that require restoring from backup because the standby database does not exist:

  • Data corruptions
  • Full database failures
  • Complete storage failures
  • Availability domain (AD) failures for multi-AD regions

Database Downtime: Minutes to hours (without Autonomous Data Guard)

Service-level Downtime (RTO): Minutes to hours (without Autonomous Data Guard)

Potential Service-level Data Loss (RPO): 15 minutes for Oracle Autonomous Database on Dedicated Exadata Infrastructure (ADB-D); 1 minute for ADB-S (without Autonomous Data Guard)

Events that require non-rolling software updates or database upgrades

Database Downtime: Less than 10 minutes for ADB-S; minutes to an hour for ADB-D (without Autonomous Data Guard)

Service-level Downtime (RTO): Less than 10 minutes for ADB-S; minutes to an hour for ADB-D (without Autonomous Data Guard)

Potential Service-level Data Loss (RPO): Zero

In the table above, the amount of downtime for events that require restoring from a backup varies depending on the nature of the failure. In the most optimistic case, a physical block corruption is detected and the block is repaired with block media recovery in minutes. In this case, only a small portion of the database is affected, with zero data loss. In a more pessimistic case, the entire database or cluster fails, and the database must be restored and recovered from the latest database backup, including all archived redo logs.

Data loss is limited by the last successful archive log backup, which runs every 15 minutes for ADB-D and every 1 minute for ADB-S. Archived redo logs are backed up to Oracle Cloud Infrastructure Object Storage or File Storage Service for future recovery purposes. Data loss can be seconds or, at worst, the redo generated since the last successful archive log backup to external storage.
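The relationship between archive log backup frequency and potential data loss can be sketched in a few lines of Python. This is an illustration only, assuming backups succeed on schedule; the function names and timestamps are hypothetical and are not part of any Oracle API.

```python
from datetime import datetime, timedelta

# Archive log backup intervals as stated in the text above.
ARCHIVE_BACKUP_INTERVAL = {
    "ADB-D": timedelta(minutes=15),
    "ADB-S": timedelta(minutes=1),
}


def nominal_worst_case_data_loss(deployment: str) -> timedelta:
    """Nominal bound on data loss when backups run on schedule."""
    return ARCHIVE_BACKUP_INTERVAL[deployment]


def data_loss_window(last_backup: datetime, failure: datetime) -> timedelta:
    """Redo generated in this window is at risk when restoring from backup."""
    if failure < last_backup:
        raise ValueError("failure time precedes the last successful backup")
    return failure - last_backup


if __name__ == "__main__":
    # Hypothetical timestamps, for illustration only.
    last_backup = datetime(2024, 1, 1, 12, 0, 0)
    failure = datetime(2024, 1, 1, 12, 9, 30)
    print(data_loss_window(last_backup, failure))
    print(nominal_worst_case_data_loss("ADB-D"))
```

If a scheduled archive log backup fails, the window keeps growing until the next successful backup, which is why the bound above is only nominal.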

Oracle Autonomous Database with Autonomous Data Guard Option

Enable Autonomous Data Guard for mission-critical production databases that require better uptime during disasters caused by data corruptions and database or site failures, while still reaping the benefits of the Oracle Autonomous Database High Availability option.

Additionally, the read-only standby database provides expanded application services to offload reporting, queries, and some updates. The read-only standby database is available only with Autonomous Data Guard on Dedicated Oracle Exadata Cloud Infrastructure (Dedicated ExaInfra).

Enabling Autonomous Data Guard adds one symmetric standby database with Oracle Data Guard to an Exadata rack that is located in the same availability domain, in another availability domain, or in another region. The primary and standby database systems are configured symmetrically by default to ensure that performance service levels are maintained after Data Guard role transitions.

Oracle Data Guard features asynchronous redo transport (in maximum performance mode) by default to ensure zero application performance impact. The standby database can be placed within the same availability domain, across availability domains, or across regions.

Data Guard zero data loss protection can be achieved by changing to synchronous redo transport (in maximum availability mode). However, maximum availability protection mode with synchronous redo transport is available only with Oracle Autonomous Database on Dedicated Exadata Infrastructure (ADB-D), and the standby database is typically placed within the same availability domain or across availability domains to minimize the impact on application response time. Furthermore, local and remote virtual cloud network peering provides a secure, high-bandwidth network across availability domains and regions for any traffic between the primary and standby servers.

The Database Backup Cloud Service schedules automated backups, which are stored in Oracle Cloud Infrastructure Object Storage and replicated to another availability domain. Oracle Autonomous Database on Exadata Cloud@Customer (ADB-C@C) provides you with the option to back up to NFS or Zero Data Loss Recovery Appliance; however, configuring and managing replication of those backups is your responsibility. Those backups can be used to restore databases in the event of a double disaster, where both primary and standby databases are lost.

The uptime service-level agreement (SLA) per month is 99.995% (a maximum of 132 seconds of downtime per month), and recovery-time objectives (downtime) and recovery-point objectives (data loss) are low, as described in the table below. To achieve the application uptime SLAs where most months would be zero downtime, refer to Maintaining Application Uptime. The 99.995% uptime SLA applies only to Autonomous Data Guard on Dedicated ExaInfra; Autonomous Data Guard on Shared ExaInfra supports the 99.95% uptime SLA.
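The downtime figures quoted for the two SLAs follow from simple arithmetic. The sketch below assumes an average month of 365.25/12 days; the source does not state which month length it uses, so treat that constant, and the function name, as assumptions.

```python
# Assumption: an "average" month of 365.25 / 12 days (about 30.44 days).
AVG_DAYS_PER_MONTH = 365.25 / 12


def max_monthly_downtime_seconds(sla_percent: float,
                                 days: float = AVG_DAYS_PER_MONTH) -> float:
    """Seconds of downtime allowed per month for a given uptime percentage."""
    seconds_in_month = days * 24 * 60 * 60
    return seconds_in_month * (100.0 - sla_percent) / 100.0


if __name__ == "__main__":
    for sla in (99.95, 99.995):
        budget = max_monthly_downtime_seconds(sla)
        print(f"{sla}% uptime allows about {budget:.1f} s "
              f"({budget / 60:.1f} min) of downtime per month")
```

Under this assumption, 99.95% allows roughly 21.9 minutes and 99.995% allows roughly 131.5 seconds, which agree with the quoted 22 minutes and 132 seconds after rounding up. A 30-day month gives slightly smaller budgets (21.6 minutes and 129.6 seconds).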

Automatic Data Guard failover with zero data loss on Shared ExaInfra can be initiated when the database fails and is inaccessible, but the database's last redo changes can still be sent to the standby. Automatic Data Guard failover with zero data loss can be enabled on Dedicated ExaInfra, and is initiated automatically when the primary database, cluster, or data center is not available. The target standby becomes the new primary database, and all application services are enabled automatically.

A manual Data Guard failover option is provided in the OCI Console. For the manual Data Guard failover option, the calculated downtime for the uptime SLA starts when the Data Guard failover operation is performed and ends when the new primary service is enabled.

You can choose whether your database failover site is located in the same availability domain, in a different availability domain within the same region, or in a different region, contingent upon application or business requirements and data center availability.

Table 18-2 Autonomous Data Guard Recovery Time and Recovery Point Service-level Objectives

Each entry below lists the failure and maintenance events, followed by the Service-level Downtime (RTO) and the Potential Service-level Data Loss (RPO).

Localized events, including:

  • Exadata cluster network fabric failures
  • Storage (disk and flash) failures
  • Database instance failures
  • Database server failures
  • Periodic software and hardware maintenance updates

Service-level Downtime (RTO): Zero or Near Zero

Potential Service-level Data Loss (RPO): Zero

Events that require failover to the standby database using Autonomous Data Guard, including:

  • Data corruptions (because Data Guard has automatic block repair for physical corruptions (footnote 1), a failover operation is required only for logical corruptions or extensive data corruptions)
  • Full database failures
  • Complete storage failures
  • Availability domain or region failures (footnote 3)

Service-level Downtime (RTO): Few seconds to two minutes

Potential Service-level Data Loss (RPO):

  • Zero with maximum availability protection mode (uses synchronous redo transport). Most commonly used for intra-region standby databases (footnote 2). This is available for Autonomous Data Guard on Dedicated ExaInfra.
  • Near zero for maximum performance protection mode (uses asynchronous redo transport). Most commonly used for cross-region standby databases.
  • For Autonomous Data Guard on Shared ExaInfra, the RPO is 1 minute.

1 The Active Data Guard automatic block repair for physical corruptions feature is only available for Autonomous Data Guard on Dedicated ExaInfra.

2 Active Data Guard Maximum Availability protection mode with synchronous redo transport (zero data loss protection) is only available with Autonomous Data Guard on Dedicated ExaInfra. For complete container database (CDB) failure (the most common disaster) or complete storage and data center failures, zero data loss Data Guard failover is available for Autonomous Data Guard on Dedicated ExaInfra. Autonomous Data Guard on Shared ExaInfra provides automatic zero data loss failover when the source customer database is not accessible but the source primary system is still available; otherwise, you can perform a manual failover with minimal data loss using the Cloud Console. Application and database downtime during a Data Guard role transition through the Cloud Console should be evaluated from the application level. The Data Guard Cloud Console status response may not be representative of the actual downtime.

3 Regional failure protection is only available if the standby is located across regions.
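The RPO rules for Data Guard failover in Table 18-2 can be summarized as a small lookup. The sketch below only restates the table and its footnotes; the enum and function names are hypothetical and do not correspond to any Oracle API.

```python
from enum import Enum


class Infra(Enum):
    DEDICATED = "Dedicated ExaInfra"
    SHARED = "Shared ExaInfra"


class ProtectionMode(Enum):
    MAX_AVAILABILITY = "maximum availability (synchronous redo transport)"
    MAX_PERFORMANCE = "maximum performance (asynchronous redo transport)"


def failover_rpo(infra: Infra, mode: ProtectionMode) -> str:
    """Return the potential data loss for a Data Guard failover per Table 18-2."""
    if infra is Infra.SHARED:
        if mode is ProtectionMode.MAX_AVAILABILITY:
            # Footnote 2: synchronous transport is Dedicated ExaInfra only.
            raise ValueError(
                "maximum availability mode is only available on Dedicated ExaInfra")
        return "1 minute"
    if mode is ProtectionMode.MAX_AVAILABILITY:
        return "zero"
    return "near zero"


if __name__ == "__main__":
    print(failover_rpo(Infra.DEDICATED, ProtectionMode.MAX_AVAILABILITY))
    print(failover_rpo(Infra.DEDICATED, ProtectionMode.MAX_PERFORMANCE))
    print(failover_rpo(Infra.SHARED, ProtectionMode.MAX_PERFORMANCE))
```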

Maintaining Application Uptime

Ensure that network connectivity to Oracle Cloud Infrastructure is reliable so that you can access your tenancy's Oracle Autonomous Database (ADB) resources.

Follow the guidelines to connect to your ADB (on Shared Exadata Infrastructure, or on Dedicated Exadata Infrastructure). Applications must connect to the predefined service name and download client credentials that include the proper tnsnames.ora and sqlnet.ora files. You can also change your specific application service’s drain_timeout attribute to fit your requirements.
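One quick sanity check before pointing an application at a service is to list the aliases defined in the downloaded tnsnames.ora file. The following is a minimal sketch using only the Python standard library; the sample file content, host names, and alias names are hypothetical, and real entries in your client credentials will differ.

```python
import re

# Hypothetical tnsnames.ora content, for illustration only.
SAMPLE_TNSNAMES = """
mydb_high = (description=(address=(protocol=tcps)(port=1522)(host=example.invalid))(connect_data=(service_name=mydb_high.example.invalid)))
mydb_tp = (description=
    (address=(protocol=tcps)(port=1522)(host=example.invalid))
    (connect_data=(service_name=mydb_tp.example.invalid)))
"""

# An alias is a name at the start of a line followed by "=" and "(".
ALIAS_PATTERN = re.compile(r"^\s*([A-Za-z0-9_.\-]+)\s*=\s*\(", re.MULTILINE)


def list_aliases(tnsnames_text: str) -> list[str]:
    """Return the net service aliases defined in tnsnames.ora text."""
    return ALIAS_PATTERN.findall(tnsnames_text)


if __name__ == "__main__":
    print(list_aliases(SAMPLE_TNSNAMES))
```

To check a real file, read it with `pathlib.Path(...).read_text()` and pass the result to `list_aliases`. The pattern is a simplification and is not a full parser for the Oracle Net configuration syntax.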

For more details about enabling continuous application service through planned and unplanned outages, see Application Checklist for Continuous Service for MAA Solutions. Oracle recommends that you test your application readiness by following Validating Application Failover Readiness (Doc ID 2758734.1).

For Oracle Exadata Cloud Infrastructure planned maintenance events that require restarting a database instance, Oracle automatically relocates services and drains sessions to another available Oracle RAC instance before stopping any Oracle RAC instance. For OLTP applications that follow the MAA checklist, draining and relocating services results in zero application downtime.
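To reason about which workloads are likely to drain in time, a toy model can help. The sketch below is not how Oracle implements draining; it simply assumes that a session can relocate gracefully if its current unit of work finishes within the drain timeout. All names and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Session:
    name: str
    remaining_seconds: float  # time until the current unit of work completes


def partition_by_drain_timeout(sessions: list[Session], drain_timeout: float):
    """Split sessions into those that finish within the timeout and those that do not."""
    drained = [s for s in sessions if s.remaining_seconds <= drain_timeout]
    interrupted = [s for s in sessions if s.remaining_seconds > drain_timeout]
    return drained, interrupted


if __name__ == "__main__":
    sessions = [
        Session("oltp_order_entry", 0.2),
        Session("oltp_lookup", 1.5),
        Session("nightly_batch_report", 5400.0),
    ]
    drained, interrupted = partition_by_drain_timeout(sessions, drain_timeout=300)
    print("drain gracefully:", [s.name for s in drained])
    print("at risk of interruption:", [s.name for s in interrupted])
```

In this model, short OLTP requests complete well inside the timeout, while a long-running report does not.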

Some applications, such as long-running batch jobs or reports, may not be able to drain and relocate gracefully, even with a longer drain timeout. For those applications, Oracle recommends that you schedule the planned software maintenance window to exclude these types of activities, or stop these activities before the planned maintenance window. For example, you can reschedule a planned maintenance window so that it is outside your batch windows, or stop batch jobs before a planned maintenance window.
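Checking a proposed maintenance window against known batch windows is a simple interval-overlap test. The sketch below is one possible way to do that check in your own scheduling tooling; the job names and times are hypothetical.

```python
from datetime import datetime
from typing import NamedTuple


class Window(NamedTuple):
    name: str
    start: datetime
    end: datetime


def overlaps(a: Window, b: Window) -> bool:
    """Half-open windows [start, end) overlap when each starts before the other ends."""
    return a.start < b.end and b.start < a.end


def conflicting_jobs(maintenance: Window, batch_jobs: list[Window]) -> list[str]:
    """Names of batch jobs whose windows overlap the maintenance window."""
    return [job.name for job in batch_jobs if overlaps(maintenance, job)]


if __name__ == "__main__":
    maintenance = Window("maintenance", datetime(2024, 1, 7, 2, 0), datetime(2024, 1, 7, 4, 0))
    jobs = [
        Window("nightly_etl", datetime(2024, 1, 7, 1, 0), datetime(2024, 1, 7, 2, 30)),
        Window("morning_reports", datetime(2024, 1, 7, 4, 0), datetime(2024, 1, 7, 5, 0)),
    ]
    print(conflicting_jobs(maintenance, jobs))
```

Any job reported here is a candidate for rescheduling or for stopping before the maintenance window begins.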