4 Oracle Database High Availability Solutions for Unplanned Downtime

Oracle Database offers an integrated suite of high availability solutions that increase availability.

These solutions also eliminate or minimize both planned and unplanned downtime, and help enterprises maintain business continuity 24 hours a day, 7 days a week. Oracle's high availability solutions go beyond reducing downtime: they also help improve overall performance, scalability, and manageability.

Outage Types and Oracle High Availability Solutions for Unplanned Downtime

The Oracle MAA high availability solutions for unplanned downtime are described here in an easy-to-navigate matrix.

The following table shows how the features discussed in the referenced (hyperlinked) sections can be used to address various causes of unplanned downtime. Where several Oracle solutions are listed, the MAA recommended solution is indicated in the Oracle MAA Solution column.

Table 4-1 Outage Types and Oracle High Availability Solutions for Unplanned Downtime

Outage Scope | Oracle MAA Solution | Benefits

Site failures

Oracle Data Guard and Enabling Continuous Service for Applications (MAA recommended)

  • Integrated client and application failover

  • Fastest and simplest database replication

  • Supports all data types

  • Zero data loss by eliminating propagation delay

  • Oracle Active Data Guard

    • Supports read-only services and DML on global temporary tables and sequences to off-load more work from the primary
    • Allows small updates to be redirected to the primary, enabling read-mostly reports to be offloaded to the standby
  • Database In-Memory support

Oracle GoldenGate

  • Flexible logical replication solution (target is open read/write)

  • Active-active high availability (with conflict resolution)

  • Heterogeneous platform and heterogeneous database support

  • Potential zero downtime with custom application failover

Recovery Manager, Zero Data Loss Recovery Appliance and Oracle Secure Backup

  • Fully managed database recovery and integration with Oracle Secure Backup

  • Recovery Appliance

    • Provides end-to-end data protection for backups
    • Reduces data loss for database restores
    • Non-real-time recovery

Instance or computer failures

Oracle Real Application Clusters and Oracle Clusterware and Enabling Continuous Service for Applications (MAA recommended)

  • Integrated client and application failover

  • Automatic recovery of failed nodes and instances

  • Lowest application brownout with Oracle Real Application Clusters

Oracle RAC One Node and Enabling Continuous Service for Applications

  • Integrated client and application failover

  • Online database relocation migrates connections and instances to another node

  • Better database availability than traditional cold failover solutions

Oracle Data Guard and Enabling Continuous Service for Applications

  • Integrated client and application failover

  • Fastest and simplest database replication

  • Supports all data types

  • Zero data loss by eliminating propagation delay

  • Oracle Active Data Guard

    • Supports read-only services and DML on global temporary tables and sequences to off-load more work from the primary
    • Allows small updates to be redirected to the primary, enabling read-mostly reports to be offloaded to the standby
  • Database In-Memory support

Oracle GoldenGate

  • Flexible logical replication solution (target is open read/write)

  • Active-active high availability (with conflict resolution)

  • Heterogeneous platform and heterogeneous database support

  • Potential zero downtime with custom application failover

Storage failures

Oracle Automatic Storage Management (MAA recommended)

Mirroring and online automatic rebalancing places redundant copies of the data in separate failure groups.

Oracle Data Guard (MAA recommended)

  • Integrated client and application failover

  • Fastest and simplest database replication

  • Supports all data types

  • Zero data loss by eliminating propagation delay

  • Oracle Active Data Guard supports read-only services and DML on global temporary tables and sequences to off-load more work from the primary

  • Database In-Memory support

Recovery Manager with Fast Recovery Area, and Zero Data Loss Recovery Appliance (MAA recommended)

Fully managed database recovery and managed disk and tape backups

Oracle GoldenGate

  • Flexible logical replication solution (target is open read/write)

  • Active-active high availability (with conflict resolution)

  • Heterogeneous platform and heterogeneous database support

  • Potential zero downtime with custom application failover

Data corruption

Corruption Prevention, Detection, and Repair (MAA recommended)

Database initialization settings such as DB_BLOCK_CHECKING, DB_BLOCK_CHECKSUM, and DB_LOST_WRITE_PROTECT

Different levels of data and redo block corruption prevention and detection at the database level
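As a rough illustration of the initialization settings named above, the three parameters can be changed dynamically with ALTER SYSTEM. The values shown are only one plausible combination and are an assumption of this sketch, not a recommendation taken from this table. The appropriate levels depend on the workload and on whether the database is a primary or a standby, and SCOPE = BOTH assumes the instance was started with a server parameter file.

```sql
-- Illustrative values only; measure the overhead on your own workload first.
ALTER SYSTEM SET DB_BLOCK_CHECKSUM     = FULL    SCOPE = BOTH;
ALTER SYSTEM SET DB_BLOCK_CHECKING     = MEDIUM  SCOPE = BOTH;
ALTER SYSTEM SET DB_LOST_WRITE_PROTECT = TYPICAL SCOPE = BOTH;

-- Confirm the values currently in effect for the instance.
SELECT name, value
FROM   v$parameter
WHERE  name IN ('db_block_checksum',
                'db_block_checking',
                'db_lost_write_protect');
```

Higher checking levels detect more corruption types at the cost of additional CPU, which is why the levels are usually chosen per database rather than applied uniformly.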

Data corruption

Oracle Data Guard (MAA recommended)

Oracle Active Data Guard Automatic Block Repair

DB_LOST_WRITE_PROTECT initialization parameter

  • In a Data Guard configuration with an Oracle Active Data Guard standby

    • Physical block corruptions detected by Oracle at a primary database are automatically repaired using a good copy of the block retrieved from the standby, and vice versa
    • The repair is transparent to the user and application, and data corruptions can definitely be isolated
  • With MAA recommended initialization settings, Oracle Active Data Guard and Oracle Exadata Database Machine achieve the most comprehensive full-stack corruption protection.

  • With DB_LOST_WRITE_PROTECT enabled

    • A lost write that occurred on the primary database is detected either by the physical standby database or during media recovery of the primary database; recovery is then stopped to preserve the consistency of the database
    • Failing over to the standby database using Data Guard will result in some data loss
    • Data Guard Broker's PrimaryLostWrite property supports SHUTDOWN and CONTINUE, plus FAILOVER and FORCEFAILOVER options, when lost writes are detected on the primary database. See Oracle Data Guard Broker
    • DB_LOST_WRITE_PROTECT initialization parameter provides lost write detection
  • Shadow lost write protection detects a lost write before it can result in major data corruption. You can enable shadow lost write protection for a database, a tablespace, or a data file without requiring an Oracle Data Guard standby database. Note that the impact on your workload may vary.

Dbverify, Analyze, Data Recovery Advisor and Recovery Manager, Zero Data Loss Recovery Appliance, and ASM Scrub with Fast Recovery Area (MAA recommended)

These tools allow the administrator to run manual checks that help detect and potentially repair various data corruptions.

  • Dbverify and Analyze conduct physical block and logical intra-block checks. Analyze can conduct inter-object consistency checks.

  • Data Recovery Advisor automatically detects data corruptions and recommends the best recovery plan.

  • RMAN operations can

    • Conduct both physical and inter-block logical checks
    • Run online block-media recovery using flashback logs, backups, or the standby database to help recover from physical block corruptions
  • Recovery Appliance

    • Does periodic backup validation that helps ensure that your backups are valid
    • Allows you to input your recovery window requirements, and alerts you when those SLAs cannot be met with your existing backups managed by Recovery Appliance
  • ASM Scrub detects and attempts to repair physical and logical data corruptions with the ASM pair in normal and high redundancy disk groups.

Data corruption

Oracle Exadata Database Machine and Oracle Automatic Storage Management (MAA recommended)

DIX + T10 DIF Extensions (MAA recommended where applicable)

  • If Oracle ASM detects a corruption and has a good mirror, ASM returns the good block and repairs the corruption during a subsequent write I/O.

  • Exadata provides implicit HARD enabled checks to prevent data corruptions caused by bad or misdirected storage I/O.

  • Exadata provides automatic hard disk scrub and repair, which detects and fixes bad sectors.

  • DIX + T10 DIF Extensions provide end-to-end data integrity for reads and writes through checksum validation from a vendor's host adapter to the storage device.

Oracle GoldenGate

  • Flexible logical replication solution (target is open read/write). Logical replica can be used as a failover target if partner replica is corrupted.

  • Active-active high availability (with conflict resolution)

  • Heterogeneous platform and heterogeneous database support

Human errors

Oracle security features (MAA recommended)

Restrict access to prevent human errors

Oracle Flashback Technology (MAA recommended)

  • Fine-grained error investigation of incorrect results

  • Fine-grained and database-wide or pluggable database rewind and recovery capabilities

Delays or slowdowns

Oracle Database and Oracle Enterprise Manager

Oracle Data Guard (MAA recommended) and Enabling Continuous Service for Applications

  • Oracle Database automatically monitors for instance and database delays or cluster slowdowns and attempts to remove blocking processes or instances to prevent prolonged delays or unnecessary node evictions.

  • Oracle Enterprise Manager or a customized application heartbeat can be configured to detect application or response time slowdown and react to these SLA breaches. For example, you can configure the Enterprise Manager Beacon to monitor and detect application response times. Then, after a certain threshold is exceeded, Enterprise Manager can call the Data Guard DBMS_DG.INITIATE_FS_FAILOVER PL/SQL function to initiate a failover. See the section about "Managing Fast-Start Failover" in Oracle Data Guard Broker.
  • Database In-Memory support

File system data

Oracle Replication Technologies for Non-Database Files

Enables full stack failover that includes non-database files
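The DBMS_DG.INITIATE_FS_FAILOVER call mentioned in the table above, for the case where a response time threshold is exceeded, can be sketched as the following anonymous block. It assumes that fast-start failover is already enabled in the Data Guard broker configuration and that an observer is running. In the DBMS_DG package reference the subprogram is declared as a function that takes a condition string and returns an integer status. The condition text used here is a hypothetical example, and SET SERVEROUTPUT ON is a SQL*Plus or SQLcl client command, not PL/SQL.

```sql
-- Sketch: request a fast-start failover from a monitoring script.
-- Assumes fast-start failover is enabled and an observer is running.
SET SERVEROUTPUT ON

DECLARE
  status BINARY_INTEGER;
BEGIN
  -- The condition string is free text recorded with the request
  -- (hypothetical wording chosen for this example).
  status := DBMS_DG.INITIATE_FS_FAILOVER(
              'Application response time threshold exceeded');

  -- The function returns an ORA- error number; 0 indicates that the
  -- request was posted to the observer.
  DBMS_OUTPUT.PUT_LINE('INITIATE_FS_FAILOVER status = ORA-' || status);
END;
/
```

A nonzero status typically means the configuration was not in a state that allows fast-start failover, so a monitoring script would normally log the value and alert an operator rather than retry blindly.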

Managing Unplanned Outages for MAA Reference Architectures and Multitenant Architectures

High availability solutions in each of the MAA service-level tiers for the MAA reference architectures and multitenant architectures are described in an easy-to-navigate matrix.

If you are managing many databases in DBaaS, we recommend using the MAA tiers and Oracle Multitenant as described in Oracle MAA Reference Architectures.

The following table identifies various unplanned outages that can impact a database in a multitenant architecture. It also identifies, for each MAA reference architecture, the Oracle high availability solution available to address that outage.

Table 4-2 Unplanned Outage Matrix for MAA Reference Architectures and Multitenant Architectures

Event | Solutions by MAA Architecture | Recovery Window (RTO) | Data Loss (RPO)

Instance Failure

BRONZE: Oracle Restart

Minutes if instance can restart

Zero

SILVER: Oracle RAC (see Oracle Real Application Clusters and Oracle Clusterware) or Oracle RAC One Node, and Enabling Continuous Service for Applications

Seconds with Oracle RAC, minutes with Oracle RAC One Node

Zero

GOLD: Oracle RAC (see Oracle Real Application Clusters and Oracle Clusterware) and Enabling Continuous Service for Applications

Seconds

Zero

PLATINUM: Oracle RAC (see Oracle Real Application Clusters and Oracle Clusterware) and Enabling Continuous Service for Applications

Zero Application Outage

Zero

Permanent Node Failure (but storage available)

BRONZE: Restore and recover

Hours to Days

Zero

SILVER: Oracle RAC (see Oracle Real Application Clusters and Oracle Clusterware) and Enabling Continuous Service for Applications

Seconds

Zero

SILVER: Oracle RAC One Node and Enabling Continuous Service for Applications

Minutes

Zero

GOLD: Oracle RAC (see Oracle Real Application Clusters and Oracle Clusterware) and Enabling Continuous Service for Applications

Seconds

Zero

PLATINUM: Oracle RAC (see Oracle Real Application Clusters and Oracle Clusterware) and Enabling Continuous Service for Applications

Seconds

Zero

Storage Failure

ALL: Oracle Automatic Storage Management

Zero downtime

Zero

Data corruptions

BRONZE/SILVER: Basic protection

Some corruptions require restore and recovery of the pluggable database (PDB), the entire multitenant container database (CDB), or the non-container database (non-CDB)

Hours to Days

  • Since last backup if unrecoverable

  • Zero or Near Zero with Recovery Appliance

GOLD: Comprehensive corruption protection and Auto Block Repair with Oracle Active Data Guard

  • Zero with auto block repair

  • Seconds to minutes if the corruption is due to lost writes and Data Guard Fast-Start Failover is used

Zero unless corruption due to lost writes

PLATINUM: Comprehensive corruption protection and Auto Block Repair with Oracle Active Data Guard

Oracle GoldenGate replica with custom application failover

  • Zero with auto block repair

  • Zero with Oracle GoldenGate replica

Zero when using Active Data Guard Fast-Start Failover and Oracle GoldenGate

Human error

ALL: Logical failures resolved by flashback drop, flashback table, flashback transaction, flashback query, flashback pluggable database, and undo.

Dependent on detection time but isolated to PDB and applications using those objects.

Dependent on logical failure

ALL: Comprehensive logical failures impacting an entire database and PDB that require RMAN point-in-time recovery (PDB) or flashback pluggable database

Dependent on detection time

Dependent on logical failure

Database unusable, system, site, or storage failures, widespread corruptions, or disasters

BRONZE/SILVER: Restore and recover

Hours to Days

  • Since last database and archive backup

  • Zero or near zero with Recovery Appliance

GOLD: Active Data Guard Fast-Start Failover and Enabling Continuous Service for Applications

Seconds

Zero to Near Zero

PLATINUM: Oracle GoldenGate replica with custom application failover

Zero

Zero when using Active Data Guard Fast-Start Failover and Oracle GoldenGate

Performance Degradation

ALL: Oracle Enterprise Manager for monitoring and detection, Database Resource Management for Resource Limits and ongoing Performance Tuning

No downtime but degraded service

Zero
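To make the human error row of the table above more concrete, the statements below sketch three of the flashback operations it lists: flashback query, flashback table, and flashback drop. The schema and table names (hr.employees, hr.job_history_archive), the filter, and the 15-minute window are illustrative assumptions. The statements also assume that enough undo is retained to cover the window and that the recycle bin is enabled.

```sql
-- Flashback query: inspect rows as they were 15 minutes ago
-- without changing any data.
SELECT *
FROM   hr.employees
       AS OF TIMESTAMP (SYSTIMESTAMP - INTERVAL '15' MINUTE)
WHERE  department_id = 50;

-- Flashback table: rewind one table to that point in time.
-- Row movement must be enabled on the table first.
ALTER TABLE hr.employees ENABLE ROW MOVEMENT;
FLASHBACK TABLE hr.employees
  TO TIMESTAMP (SYSTIMESTAMP - INTERVAL '15' MINUTE);

-- Flashback drop: restore a table that was dropped by mistake
-- (hypothetical table name) from the recycle bin.
FLASHBACK TABLE hr.job_history_archive TO BEFORE DROP;
```

Running the flashback query first is a low-risk way to confirm the target time before rewinding the table, which matches the table's point that recovery time depends mostly on how quickly the error is detected.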