Oracle Maximum Availability Architecture in Oracle Cloud Infrastructure

Oracle Maximum Availability Architecture is a set of best practices developed by Oracle engineers over many years for the integrated use of Oracle High Availability technologies.

Oracle Maximum Availability Architecture and Autonomous Database Cloud

High availability is suitable for all development, test, and production databases that have high uptime requirements and low data loss tolerance. By default, Autonomous Databases are highly available, incorporating a multi-node configuration to protect against localized hardware failures that do not require fast disaster recovery. Each Autonomous Database application service resides in at least one Oracle Real Application Clusters (Oracle RAC) instance with the option to fail over to another available Oracle RAC instance for unplanned outages or planned maintenance activities, resulting in zero or near-zero downtime. Autonomous Database's automatic backups are stored in Oracle Cloud Infrastructure Object Storage, are replicated to another availability domain, and can be restored in the event of a disaster. Major database upgrades, however, require downtime.

The uptime service-level objective per month is 99.95% (a maximum of 22 minutes of downtime per month), but when you use Maximum Availability Architecture best practices for continuous service, most months would effectively have zero downtime. The uptime service-level objective does not include downtime due to customer-initiated high availability tests, disaster recovery (such as an availability domain or regional outage), database corruptions, or downtime due to planned maintenance that cannot be done online or through an Oracle RAC rolling update solution, such as major database upgrades from one release to another.
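As a rough check on the downtime figure above, the following sketch converts an uptime percentage into a monthly downtime budget. The function name and the choice of 30-day and 31-day months are assumptions made for illustration; the text does not state how the quoted figure was rounded.

```python
def max_downtime_seconds(slo_percent: float, days_in_month: int = 30) -> float:
    """Return the downtime budget, in seconds, implied by an uptime percentage.

    Hypothetical helper: the month length is an assumption, not an Oracle definition.
    """
    seconds_in_month = days_in_month * 24 * 60 * 60
    return seconds_in_month * (1 - slo_percent / 100)


# Compare the default objective with the Autonomous Data Guard objective.
for slo in (99.95, 99.995):
    for days in (30, 31):
        budget = max_downtime_seconds(slo, days)
        print(f"{slo}% over {days} days: {budget:.1f} s ({budget / 60:.1f} min)")
```

For 99.95%, the budget works out to about 21.6 minutes in a 30-day month and about 22.3 minutes in a 31-day month, which is consistent with the quoted maximum of 22 minutes. For the 99.995% objective that applies with Autonomous Data Guard, the same arithmetic gives roughly 130 to 134 seconds, which brackets the 132 seconds quoted for that configuration.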

The following table describes the recovery-time objectives and recovery-point objectives (data loss tolerance) for service-level objectives.

Default High Availability Policy Recovery Time (RTO) and Recovery Point (RPO) Service-level Objectives

Failure and Maintenance Events: Localized events, including:
  • Exadata cluster network fabric failures
  • Storage (disk and flash) failures
  • Database instance failures
  • Database server failures
  • Periodic software and hardware maintenance updates
Database Downtime: Zero
Service-Level Downtime (RTO): Near-zero
Potential Service-Level Data Loss (RPO): Zero

Failure and Maintenance Events: Events requiring restoring from backup because standby database does not exist:
  • Data corruptions
  • Full database failures
  • Complete storage failures
  • Availability domain or region failures
Database Downtime: Minutes to hours
Service-Level Downtime (RTO): Minutes to hours
Potential Service-Level Data Loss (RPO): 15 minutes

In the preceding table, the amount of downtime for events requiring restoring from a backup varies due to the nature of the failure. In the most optimistic case, a physical block corruption is detected and the block is repaired with block media recovery in minutes. In this case, only a small portion of the database is affected with zero data loss.

In a more pessimistic case, an availability domain or region fails, and a new cluster must be provisioned and restored from the latest database backup, including all archive logs, and a complete database recovery must be run. Data loss is limited by the last successful archive log backup. By default, archive log backups run every 15 minutes, and each one includes a log switch followed by an archive log backup of any redo that has not yet been backed up to Oracle Cloud Infrastructure Object Storage. Data loss can be seconds or, at worst, around 15 minutes.
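The data-loss reasoning above can be sketched as simple time arithmetic: redo written after the last successful archive log backup is what a restore from backup cannot recover. The function name and timestamps below are hypothetical; only the 15-minute default cadence comes from the text.

```python
from datetime import datetime, timedelta

# Default archive log backup cadence stated in the text.
ARCHIVE_BACKUP_INTERVAL = timedelta(minutes=15)


def potential_data_loss(last_archive_backup: datetime, failure: datetime) -> timedelta:
    """Return the window of redo that was not yet backed up when the failure hit."""
    if failure < last_archive_backup:
        raise ValueError("failure time precedes the last archive log backup")
    return failure - last_archive_backup


last_backup = datetime(2024, 1, 1, 12, 0, 0)  # illustrative timestamp
print(potential_data_loss(last_backup, datetime(2024, 1, 1, 12, 0, 20)))   # 0:00:20
print(potential_data_loss(last_backup, datetime(2024, 1, 1, 12, 14, 50)))  # 0:14:50
```

A failure just after a backup loses seconds of redo, while a failure just before the next scheduled backup loses close to the full interval.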

Autonomous Data Guard for Autonomous Databases on Dedicated Exadata Infrastructure

Enable Autonomous Data Guard for mission-critical production databases that have stricter uptime requirements and a lower data-loss tolerance than the default high-availability configuration can meet, across a wider range of potential problems, such as data corruption and database and regional site failures. Enabling Autonomous Data Guard adds one symmetric standby database with Oracle Data Guard to an Exadata rack that is located in another availability domain or in another region.

The primary and standby database systems are configured symmetrically to ensure that performance service levels are maintained after Data Guard role transitions. Oracle Data Guard features asynchronous redo transport (maximum performance mode) within the same region across availability domains, or across regions, by default. If zero data loss is required, then you can change to synchronous redo transport (maximum availability mode).

As with databases that are not Data Guard-enabled, each Autonomous Database application service resides in at least one Oracle RAC instance and will automatically fail over to another available Oracle RAC instance, as previously described. The standby database provides expanded application services to offload reporting, queries, and some updates. The Database Backup Cloud Service schedules automated backups, which are stored in Oracle Cloud Infrastructure Object Storage and replicated to another availability domain. Those backups can be used to restore databases in the event of a double disaster where both primary and standby databases are lost.

Local and remote virtual cloud network peering provides a secure, high-bandwidth network across availability domains and regions for any traffic between primary and standby servers.

The uptime service-level objective per month is 99.995% (maximum 132 seconds of downtime per month) and recovery-time objectives (downtime) and recovery-point objectives (data loss) are low, as described in the subsequent table, when a manual failover is initiated. When you use Maximum Availability Architecture best practices for continuous service, most months would have an effective downtime of zero. The uptime service-level objective does not include downtime as a result of user-initiated high availability tests, user-initiated Data Guard switchover tests, or the time it takes to initiate a manual Data Guard failover.

Users can choose whether their database failover site is located in a different availability domain within the same region or in a different region, contingent upon application or business requirements, and data center availability.

Autonomous Data Guard Recovery Time (RTO) and Recovery Point (RPO) Service-level Objectives

Failure and Maintenance Events: Localized events, including:
  • Exadata cluster network fabric failures
  • Storage (disk and flash) failures
  • Database instance failures
  • Database server failures
  • Periodic software and hardware maintenance updates
Service-Level Downtime (RTO): Near zero
Potential Service-Level Data Loss (RPO): Zero

Failure and Maintenance Events: Events requiring failover to the standby database using Autonomous Data Guard-enabled dedicated Autonomous Databases, including:
  • Data corruptions (because Data Guard has automatic block repair for physical corruptions, a failover operation is required only for logical corruptions or extensive data corruptions)
  • Full database failures
  • Complete storage failures
  • Availability domain or region failures
Service-Level Downtime (RTO): Few seconds to two minutes
Potential Service-Level Data Loss (RPO):
  • Zero for maximum availability protection mode (uses synchronous redo transport). Most commonly used for intra-region standby databases.
  • Near zero for maximum performance protection mode (uses asynchronous redo transport). Most commonly used for cross-region standby databases.

Maintaining Application Uptime

Ensure that network connectivity to Oracle Cloud Infrastructure is reliable so that you can access your tenancy's Autonomous Database resources.

Follow the guidelines in the Continuous Availability: Best Practices for Applications Using Autonomous Database - Dedicated and Application Continuity: MAA Checklist for Preparation white papers to experience application-level service uptime similar to that of the database uptime.

Oracle Maximum Availability Architecture in Exadata DB Systems

Oracle Maximum Availability Architecture in Oracle Cloud Infrastructure provides inherent high availability, data protection, and disaster recovery protection integrated with both cloud automation and lifecycle operations, enabling Oracle Cloud Infrastructure to be the best cloud solution for enterprise databases and applications.

Oracle Maximum Availability Architecture Benefits in Oracle Cloud

  • Deployment: Oracle deploys Exadata in Oracle Cloud Infrastructure using Oracle Maximum Availability Architecture best practices, including configuration best practices for storage, network, operating system, Oracle Grid Infrastructure, and Oracle Database. Exadata is optimized to run enterprise Oracle Databases with extreme scalability and availability.
  • Oracle Maximum Availability Architecture database templates: All cloud databases created with Oracle Cloud automation use Oracle Maximum Availability Architecture settings optimized for Exadata in Oracle Cloud. Oracle does not recommend that you use custom scripts to create cloud databases.
  • Backup and restore automation: When you configure automatic backup to Oracle Cloud Infrastructure Object Storage, backup copies exist across multiple availability domains for additional protection, and RMAN validates cloud database backups for any physical corruptions. Database backups occur daily with a full backup occurring once per week and incremental backups occurring on all other days. Archive log backups occur frequently to reduce potential data loss in case of disaster.
  • Exadata inherent benefits: Exadata is the best Oracle Maximum Availability Architecture platform that Oracle offers, engineered with hardware, software, database, and availability innovations to support the most mission-critical enterprise applications. Specifically, Exadata provides unique high availability, data protection, and quality-of-service capabilities that set Oracle apart from any other platform or cloud vendor.

    For a comprehensive list of Oracle Maximum Availability Architecture benefits for Exadata DB systems, see Exadata Database Machine: Maximum Availability Architecture Best Practices and Deploying Oracle Maximum Availability Architecture with Exadata Database Machine. Examples of these benefits include:

    • High availability and low brownout: Fully redundant, fault-tolerant hardware exists in the storage, network, and database servers. Resilient, highly available software, such as Oracle Real Application Clusters (Oracle RAC), Oracle Clusterware, Oracle Database, Oracle Automatic Storage Management, Oracle Linux, and Oracle Exadata Storage Server, enables applications to maintain application service levels through unplanned outages and planned maintenance events. For example, Exadata has instant failure detection that can detect and repair database node, storage server, and network failures in less than two seconds, and resume application and database service uptime and performance. Other platforms can experience 30 seconds, or even minutes, of blackout and extended application brownouts for the same type of failures. Only the Exadata platform offers a wide range of unplanned outage and planned maintenance tests to evaluate end-to-end application and database brownouts and blackouts.
    • Data protection: Exadata provides Oracle Database physical and logical block corruption prevention, detection, and, in some cases, automatic remediation. The Exadata Hardware Assisted Resilient Data (HARD) checks include support for server parameter files, control files, log files, Oracle data files, and Oracle Data Guard broker files when those files are stored in Exadata storage. This intelligent Exadata storage validation stops corrupted data from being written to disk when a HARD check fails, which eliminates a large class of failures that the database industry had previously been unable to prevent. Examples of the Exadata HARD checks include:
      • Redo and block checksum

      • Correct log sequence

      • Block type validation

      • Block number validation

      • Oracle data structures, such as block magic number, block size, sequence number, and block header and tail data structures

      Exadata HARD checks initiate from Exadata storage software (cell services) and work transparently after the database DB_BLOCK_CHECKSUM parameter is enabled, which is the default in the cloud. Exadata is the only platform that currently supports the HARD initiative.

      Furthermore, Oracle Exadata Storage Server provides non-intrusive, automatic hard disk scrub and repair. This feature periodically inspects and repairs hard disks during idle time. If bad sectors are detected on a hard disk, then Oracle Exadata Storage Server automatically sends a request to Oracle Automatic Storage Management to repair the bad sectors by reading the data from another mirror copy.

      Finally, Exadata and Oracle Automatic Storage Management can detect corruptions as data blocks are read into the buffer cache and automatically repair data corruption with a good copy of the data block on a subsequent database write. This inherent intelligent data protection makes Exadata and Exadata Cloud the best data protection storage platform for Oracle Databases.

      For comprehensive data protection, a Maximum Availability Architecture best practice is to use a standby database on a separate Exadata to detect, prevent, and automatically repair corruptions that cannot be addressed by Exadata alone. The standby database also minimizes downtime and data loss for disasters that result from site, cluster, and database failures.

    • Response time quality of service: Only Exadata has end-to-end quality-of-service capabilities to ensure that response time remains low and optimum. Database server I/O capping and Exadata storage I/O latency capping ensure that read or write I/O can be redirected to partnered cells when response time exceeds a certain threshold. If storage becomes unreliable (but not failed) because of poor and unpredictable performance, then the disk or flash cache can be quarantined, taken offline, and later brought back online if heuristics show that I/O performance is back to acceptable levels. Resource management can help prioritize key database network or I/O functionality, so that your application and database perform at an optimized level. For example, database log writes get priority over backup requests on the Exadata network and storage. Furthermore, rapid response time is maintained during storage software updates by ensuring that partner flash cache is warmed so flash misses are minimized.
    • End-to-end testing and holistic health checks: Because Oracle owns the entire Exadata Cloud infrastructure, end-to-end testing and optimizations benefit every Exadata customer around the world, whether hosted on premises or in the cloud. Validated optimizations and fixes required to run any mission-critical system are uniformly applied after rigorous testing. Health checks are designed to evaluate the entire stack. The Exadata health check utility EXACHK is Exadata cloud-aware and highlights any configuration and software alerts that may have occurred because of customer changes. No other cloud platform currently has this kind of end-to-end health check available. For Oracle Autonomous Database, EXACHK runs automatically to evaluate Maximum Availability Architecture compliance. For non-autonomous databases, Oracle recommends running EXACHK at least once a month, and before and after any software updates, to evaluate any new best practices and alerts.
  • Oracle Maximum Availability Architecture best practices paper: Oracle Maximum Availability Architecture engineering collaborates with Oracle Cloud teams to integrate Oracle Maximum Availability Architecture practices that are optimized for Oracle Cloud Infrastructure and security. See MAA Best Practices for the Oracle Cloud for additional information about continuous availability, Oracle Data Guard, Hybrid Data Guard, Oracle GoldenGate, and other Maximum Availability Architecture-related topics.
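To make the idea behind the HARD checks listed above more concrete, the following toy sketch validates a block before it is "written" and rejects it when the size, block number, or checksum does not match. Everything here is an assumption for illustration: the 8 KB block size, the header layout, the CRC32 checksum, and the function names are hypothetical and are not the Oracle block format or the Exadata implementation.

```python
import zlib

BLOCK_SIZE = 8192   # assumed block size for this demo
HEADER_SIZE = 8     # toy header: 4-byte block number + 4-byte checksum


def make_block(block_number: int, payload: bytes) -> bytes:
    """Build a toy block: header (block number, CRC32 of body) followed by the body."""
    body = payload.ljust(BLOCK_SIZE - HEADER_SIZE, b"\0")
    header = block_number.to_bytes(4, "big") + zlib.crc32(body).to_bytes(4, "big")
    return header + body


def validate_before_write(block: bytes, expected_block_number: int) -> None:
    """Raise ValueError instead of letting a damaged block reach storage."""
    if len(block) != BLOCK_SIZE:
        raise ValueError("block size mismatch")
    if int.from_bytes(block[:4], "big") != expected_block_number:
        raise ValueError("block number mismatch")
    if zlib.crc32(block[HEADER_SIZE:]) != int.from_bytes(block[4:8], "big"):
        raise ValueError("checksum mismatch")


block = make_block(7, b"row data")
validate_before_write(block, 7)  # a healthy block passes silently

# Flip one byte in the body to simulate corruption on the way to storage.
corrupted = block[:100] + bytes([block[100] ^ 0xFF]) + block[101:]
try:
    validate_before_write(corrupted, 7)
except ValueError as exc:
    print(f"write rejected: {exc}")
```

The point of the sketch is the placement of the check: validation happens at the storage boundary, so a corrupted block is refused rather than persisted and discovered later on read.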

The following table lists various software updates and the impacts associated with those updates on databases and applications.

Software Update: Network
Database Impact: Zero downtime
Application Impact: Zero to single-digit seconds
Implementation: Performed by Oracle Cloud

Software Update: Storage cells
Database Impact: Zero downtime
Application Impact: Zero to single-digit seconds
Implementation: Performed by Oracle Cloud

Software Update: Exadata Dom0
Database Impact: Zero downtime with Oracle RAC rolling updates
Application Impact: Zero downtime
Implementation: Performed by Oracle Cloud

Software Update: Exadata DomU
Database Impact: Zero downtime with Oracle RAC rolling updates
Application Impact: Zero downtime
Implementation:
  • Performed by Oracle Cloud for Autonomous Database
  • Performed by customer using cloud-assisted tools for non-Autonomous Database

Software Update: Oracle Database quarterly update or patch
Database Impact: Zero downtime with Oracle RAC rolling updates
Application Impact: Zero downtime
Implementation:
  • Performed by Oracle Cloud for Autonomous Database
  • Performed by customer using cloud-assisted tools for non-Autonomous Database

Software Update: Oracle Grid Infrastructure quarterly update, patch, or upgrade
Database Impact: Zero downtime with Oracle RAC rolling updates
Application Impact: Zero downtime
Implementation:
  • Performed by Oracle Cloud for Autonomous Database
  • Performed by customer using cloud-assisted tools for non-Autonomous Database

Software Update: Oracle Database upgrade
Database Impact: Minimal downtime with DBMS_ROLLING, Oracle GoldenGate replication, or with pluggable database relocate
Application Impact: Minimal downtime with DBMS_ROLLING, Oracle GoldenGate replication, or with pluggable database relocate
Implementation:
  • Not applicable for Autonomous Database
  • Performed by customer using generic Maximum Availability Architecture best practices

Achieving Continuous Availability for your Applications

As part of Exadata Cloud, all software updates (except for non-rolling database upgrades) can be done online or with Oracle RAC rolling updates to achieve continuous database uptime. Furthermore, any local failures of the storage, Exadata network, or Exadata database server are managed automatically, and database uptime is maintained.

To achieve continuous application uptime during Oracle RAC switchover or failover events, follow these application-configuration best practices:

  • Use non-default Oracle Clusterware-managed services to connect your application.

  • Use the recommended connection string with built-in timeouts, retries, and delays, so that incoming connections do not see errors during outages.

  • Configure your connections with Fast Application Notification.

  • Drain and relocate services prior to any planned maintenance outage on Exadata that requires restarting any of the Oracle RAC instances. Software updates to Exadata Dom0 or DomU are automatic. For Oracle Database and Oracle Grid Infrastructure software updates, Exadata Cloud-assisted tools and Autonomous Database drain and relocate services automatically.

  • Leverage Application Continuity or Transparent Application Continuity to replay in-flight uncommitted transactions transparently after failures.
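As an illustration of the connection-string guideline in the list above, the following sketch assembles an Oracle Net connect descriptor that carries timeout, retry, and delay settings. CONNECT_TIMEOUT, RETRY_COUNT, RETRY_DELAY, and TRANSPORT_CONNECT_TIMEOUT are standard Oracle Net descriptor parameters, but the values shown, the host name, the service name, the TCP protocol and port, and the helper function are assumptions chosen for the example; take the recommended settings from the white papers referenced in this section.

```python
def build_connect_descriptor(
    host: str,
    service_name: str,
    port: int = 1521,
    connect_timeout: int = 90,           # seconds, illustrative value
    retry_count: int = 50,               # illustrative value
    retry_delay: int = 3,                # seconds between retries, illustrative value
    transport_connect_timeout: int = 3,  # seconds, illustrative value
) -> str:
    """Return a connect descriptor string with built-in timeouts and retries."""
    return (
        "(DESCRIPTION="
        f"(CONNECT_TIMEOUT={connect_timeout})"
        f"(RETRY_COUNT={retry_count})"
        f"(RETRY_DELAY={retry_delay})"
        f"(TRANSPORT_CONNECT_TIMEOUT={transport_connect_timeout})"
        "(ADDRESS_LIST=(LOAD_BALANCE=on)"
        f"(ADDRESS=(PROTOCOL=TCP)(HOST={host})(PORT={port})))"
        f"(CONNECT_DATA=(SERVICE_NAME={service_name})))"
    )


# Hypothetical SCAN host and Clusterware-managed (non-default) service name.
descriptor = build_connect_descriptor("myscan.example.com", "app_svc")
print(descriptor)
```

With these illustrative values, the delays between retries alone add up to 50 × 3 = 150 seconds, which is longer than the "few seconds to two minutes" failover window quoted earlier, so a new connection attempt made during a failover would keep retrying rather than return an error to the application.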

For more information, see the Continuous Availability: Best Practices for Applications Using Autonomous Database - Dedicated and Application Continuity: MAA Checklist for Preparation white papers.

Oracle Maximum Availability Architecture Reference Architectures in the Exadata Cloud

Exadata Cloud supports all four Oracle Maximum Availability Architecture reference architectures, providing support for all Oracle Databases, regardless of their specific high availability, data protection, and disaster recovery service-level agreements. See MAA Best Practices for the Oracle Cloud for more information about Oracle Maximum Availability Architecture in the Exadata Cloud.