7 High Availability Architectures

Oracle MAA provides best practice recommendations for the design, implementation, and operation of high availability architectures for the Oracle Database. This chapter provides a detailed description of the major components used by each MAA reference architecture tier and the service levels that each tier can achieve.

It includes the following sections:

7.1 Introduction to MAA Reference Architectures

Chapter 2, "High Availability and Data Protection – Getting From Requirements to Architecture," provided an overview of the four MAA reference architectures: Bronze, Silver, Gold, and Platinum. Each reference architecture, or HA tier, uses an optimal set of Oracle capabilities that, when deployed together, reliably achieve a given service level for high availability and data protection. Figure 7-1 provides an overview of the technologies used by each HA tier.

Figure 7-1 Oracle MAA Reference Architectures

Each of the architectures described above is implemented using the operational and configuration best practices described in Chapter 6, "Operational Prerequisites to Maximizing Availability," the additional best practices provided in MAA technical white papers, and Oracle Database High Availability Best Practices. See Section 1.5, "Roadmap to Implementing the Maximum Availability Architecture," for more information that will help you navigate the MAA best practices documentation.

7.2 The Bronze Tier – A Single Instance HA Architecture

The Bronze tier provides basic database service at the lowest possible cost. A reduced level of HA and data protection is accepted in exchange for reduced cost and implementation complexity. Figure 7-2 provides an overview of the Bronze tier.

Figure 7-2 Bronze Tier – Single Instance HA Architecture

Bronze uses a single-instance Oracle Database; no clustering technology provides automatic failover if there is an outage of the server on which the Oracle Database instance is running. When a server becomes unusable or the database becomes unrecoverable, RTO depends on how quickly a replacement system can be provisioned or a backup restored. In the worst case, a complete site outage, additional time is required to perform these tasks at a secondary location, and in some cases this can take days.

Oracle Recovery Manager (RMAN) is used to perform regular backups of the Oracle Database. The RPO, if there is an unrecoverable outage, is equal to the data generated since the last backup was taken. Copies of database backups are also retained at a remote location or on the Cloud for the dual purpose of archival and DR should a disaster strike the primary data center.
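The RPO arithmetic for a backup-only tier can be made concrete with a short Python sketch. It assumes, for illustration, that the database must be restored from the most recent backup completed before the failure and that no redo generated after that backup survives the outage. The function name `rpo_exposure` and the nightly backup schedule are hypothetical.

```python
from datetime import datetime

def rpo_exposure(backup_times, failure_time):
    """Return how much work is lost when restoring from the newest backup
    taken at or before failure_time (hypothetical helper).

    Assumes each backup is consistent as of its timestamp and that no redo
    generated after that backup survives the outage.
    """
    prior = [t for t in backup_times if t <= failure_time]
    if not prior:
        raise ValueError("no backup precedes the failure")
    return failure_time - max(prior)

# Hypothetical nightly backups at 02:00 for one week.
backups = [datetime(2024, 1, day, 2, 0) for day in range(1, 8)]
failure = datetime(2024, 1, 5, 23, 30)
print(rpo_exposure(backups, failure))  # 21:30:00
```

With a fixed backup interval, the worst case approaches one full interval: a failure just before the next backup loses nearly all of the work done since the previous one.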

The Bronze tier consists of the major components described in the following topics:

7.2.1 Oracle Database HA and Data Protection

Bronze utilizes HA and data protection capabilities that are included with the Oracle Database Enterprise Edition at no additional cost.

  • Oracle Restart automatically restarts the database, the listener, and other Oracle components after a hardware or software failure, or whenever a database host computer restarts.

  • Oracle corruption protection checks for physical corruption and logical intra-block corruptions. In-memory corruptions are detected and prevented from being written to disk, and in many cases can be repaired automatically. For more details see Preventing, Detecting, and Repairing Block Corruption for the Oracle Database.

  • Automatic Storage Management (ASM) is an Oracle-integrated file system and volume manager that includes local mirroring to protect against disk failure.

  • Oracle Flashback Technologies provide fast error correction at a level of granularity that is appropriate to repair an individual transaction, a table, or the full database.

  • Oracle Recovery Manager (RMAN) enables low-cost, reliable backup and recovery optimized for the Oracle Database.

  • Online maintenance includes online redefinition and reorganization for database maintenance, online file movement, and online patching.

7.2.2 Database Consolidation in the Bronze Tier

Databases deployed in the Bronze tier include development and test databases and databases supporting smaller work group and departmental applications that are often the first candidates for database consolidation and for deployment as Database as a Service (DBaaS).

Oracle Multitenant is the MAA best practice for database consolidation and virtualization from Oracle Database 12c onward. Other consolidation options include:

  • Operating System Virtualization - Virtual Machines

  • Schema Consolidation

  • Consolidation of multiple discrete databases onto a single physical machine or cluster using Oracle RAC

For a more complete discussion of the trade-offs between Oracle Multitenant and other consolidation approaches, refer to the Oracle MAA technical white paper "High Availability Best Practices for Database Consolidation."

7.2.3 Life Cycle Management and DBaaS

Oracle Enterprise Manager Cloud Control enables self-service deployment of IT resources for business users, along with resource pooling models that cater to various multitenant architectures. These capabilities are required to implement Database as a Service (DBaaS), a paradigm in which end users (database administrators, application developers, quality assurance engineers, project leads, and so on) can request database services, consume them for the lifetime of the project, and then have them automatically de-provisioned and returned to the resource pool. Cloud Control DBaaS provides:

  • A shared, consolidated platform on which to provision database services

  • A self-service model for provisioning those resources

  • Elasticity to scale out and scale back database resources

  • Chargeback based on database usage

7.2.4 Oracle Engineered Systems

Oracle Engineered Systems are an efficient deployment option for database consolidation and DBaaS at all tiers. Oracle Engineered Systems reduce lifecycle cost by standardizing on a pre-integrated and optimized platform for Oracle Database, with hardware and software supported by Oracle. Oracle Engineered Systems include:

  • Oracle Virtual Compute Appliance radically simplifies the way customers install, deploy, and manage virtual infrastructures for any Linux, Oracle Solaris, or Microsoft Windows application.

  • Oracle Database Appliance is a complete low cost package of software, server, storage, and networking engineered for simplicity, saving time and money by simplifying deployment, maintenance, and support of database and application workloads. The Oracle Database Appliance supports both physical and virtual deployments.

  • Oracle Exadata Database Machine is the highest performing, most scalable, and most available platform for running Oracle Database. Oracle Exadata Database Machine runs all types of database workloads including Online Transaction Processing (OLTP), Data Warehousing (DW), and consolidation of mixed workloads, and it is the ideal foundation for database consolidation.

  • Oracle SuperCluster engineered systems are ideal for consolidating databases and applications, private cloud deployments, and Oracle software on a single, general purpose platform. Oracle SuperCluster uses the world's fastest processors based on SPARC architecture and Exadata storage.

  • Oracle ZFS Storage Appliance provides immediate space, management, and cost benefits for customers using network-attached storage (NAS). Oracle ZFS includes a rich software suite for managing, monitoring, troubleshooting, snaps, clones, replication, and advanced data services that are a natural complement to all Oracle Engineered Systems.

7.2.5 Bronze Summary: Data Protection, RTO, and RPO

Table 7-1 summarizes the data protection capabilities of the Bronze tier. The first column of Table 7-1 indicates when validations for physical and logical corruption are performed.

  • Manual checks are initiated by the administrator or at regular intervals by a scheduled job that performs periodic checks.

  • Runtime checks are automatically executed on a continuous basis by background processes while the database is open.

  • Background checks are run on a regularly scheduled interval, but only during periods when resources would otherwise be idle.

Each check is unique to Oracle Database and uses specific knowledge of Oracle data block and redo structures.

Table 7-1 Bronze Tier Data Protection

Type | Capability | Physical Block Corruption | Logical Block Corruption
Manual | Dbverify, Analyze | Physical block checks | Logical checks for intra-block and inter-object consistency
Manual | RMAN | Physical block checks during backup and restore | Intra-block logical checks
Runtime | Database | In-memory block and redo checksum | In-memory intra-block logical checks
Runtime | ASM | Automatic corruption detection and repair using local extent pairs | -
Runtime | Exadata | HARD checks on write | HARD checks on write
Background | Exadata | Automatic Hard Disk Scrub and Repair (see Footnote 1) | -

Footnote 1 Available with Exadata 11.2.3.3 and later and Oracle Database 11g Release 2 (11.2.0.4) and later.

Note that HARD validation and Automatic Hard Disk Scrub and Repair (the last two rows of Table 7-1) are unique to Exadata storage. HARD validation ensures that Oracle Database does not write physically corrupt blocks to disk. Automatic Hard Disk Scrub and Repair periodically inspects hard disks for damaged or worn-out disk sectors (clusters of storage) and other physical or logical defects, and repairs them; it runs only when resources would otherwise be idle. Exadata sends a request to ASM to repair the bad sectors by reading the data from another mirror copy. By default, the hard disk scrub runs every two weeks.
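The scrub-and-repair flow can be illustrated with a simplified Python model. The sketch assumes two mirror copies held in dictionaries and a checksum recorded when each extent was written. It collapses the Exadata and ASM roles into a single hypothetical function, `scrub_and_repair`, and omits the idle-time scheduling; it is not how Exadata or ASM are implemented.

```python
import hashlib

def checksum(block: bytes) -> str:
    return hashlib.sha256(block).hexdigest()

def scrub_and_repair(primary, mirror, expected):
    """Verify every extent on `primary`; repair damaged ones from `mirror`.

    primary, mirror: dict of extent id -> bytes (a hypothetical model of two
    mirror copies). expected: dict of extent id -> checksum recorded at write
    time. Returns (repaired, unrecoverable) lists of extent ids.
    """
    repaired, unrecoverable = [], []
    for extent_id, good_sum in expected.items():
        if checksum(primary.get(extent_id, b"")) == good_sum:
            continue  # extent is healthy
        copy = mirror.get(extent_id)
        if copy is not None and checksum(copy) == good_sum:
            primary[extent_id] = copy  # rewrite the damaged extent from the mirror
            repaired.append(extent_id)
        else:
            unrecoverable.append(extent_id)  # both copies bad: needs backup or another replica
    return repaired, unrecoverable

data = {1: b"alpha", 2: b"bravo", 3: b"charlie"}
expected = {k: checksum(v) for k, v in data.items()}
primary = dict(data)
primary[2] = b"br@vo"  # simulate a worn-out sector
mirror = dict(data)
print(scrub_and_repair(primary, mirror, expected))  # ([2], [])
```

The key design point the sketch preserves is that a repair is trusted only after the mirror copy itself passes verification.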

Table 7-2 summarizes RTO and RPO for the Bronze tier for various unplanned and planned outages.

Table 7-2 Bronze Tier Recovery Time (RTO) and Data Loss Potential (RPO)

Type | Event | Downtime | Data Loss Potential
Unplanned | Database instance failure | Minutes | Zero
Unplanned | Recoverable server failure | Minutes to an hour | Zero
Unplanned | Data corruptions, unrecoverable server failure, database failures or site failures | Hours to days | Since last backup
Planned | Online File Move, Online Reorganization and Redefinition, Online Patching | Zero | Zero
Planned | Hardware or operating system maintenance and database patches that cannot be done online | Minutes to hours | Zero
Planned | Database upgrades: patch sets and full database releases | Minutes to hours | Zero
Planned | Platform migrations | Hours to a day | Zero
Planned | Application upgrades that modify back-end database objects | Hours to days | Zero


7.3 The Silver Tier - High Availability with Automatic Failover

The Silver tier builds upon Bronze by incorporating clustering technology for improved availability for both unplanned outages and planned maintenance. Silver uses Oracle RAC or Oracle RAC One Node for HA within a data center by providing automatic failover should there be an unrecoverable outage of a database instance or a complete failure of the server on which it runs. Oracle RAC also delivers substantial benefit for eliminating many types of planned downtime by performing maintenance in a rolling manner across Oracle RAC nodes. Figure 7-3 provides an overview of the Silver tier.

Figure 7-3 Silver Tier – High Availability with Automatic Failover

Silver includes the HA components described in the following sections.

7.3.1 Oracle RAC

Oracle RAC improves application availability within a data center should there be an outage of a database instance or of the server on which it runs. Server failover with Oracle RAC is nearly instantaneous: there is only a very brief brownout before service resumes on surviving instances and users from the failed instance are able to reconnect. Downtime is also eliminated for planned maintenance tasks that can be performed in a rolling manner across Oracle RAC nodes. Users complete their work and terminate their sessions on the node where maintenance is to be performed. When they reconnect, they are directed to a database instance already running on another node.

A quick review of how Oracle RAC works helps explain its benefits. There are two components: Oracle Database instances and the Oracle Database itself.

  • A database instance is defined as a set of server processes and memory structures running on a single node (or server) which make a particular database available to clients.

  • The database is a particular set of shared files (data files, index files, control files, and initialization files) that reside on persistent storage, and together can be opened and used to read and write data.

  • Oracle RAC uses an active-active architecture that enables multiple database instances, each running on different nodes, to simultaneously read and write to the same database.

The active-active architecture of Oracle RAC provides a number of advantages:

  • Improved high availability: If a server or database instance fails, connections to surviving instances are not affected; connections to the failed instance are quickly failed over to surviving instances that are already running and open on other servers in the cluster.

  • Scalability: Oracle RAC is ideal for high volume applications or consolidated environments where scalability and the ability to dynamically add or reprioritize capacity across more than a single server are required. An individual database may have instances running on one or more nodes of a cluster. Similarly, a database service may be available on one or more database instances. Additional nodes, database instances, and database services can be provisioned online. The ability to easily distribute workload across the cluster makes Oracle RAC the ideal complement for Oracle Multitenant.

  • Reliable performance: Oracle Quality of Service (QoS) can be used to allocate capacity for high priority database services to deliver consistent high performance in database consolidated environments. Capacity can be dynamically shifted between workloads to quickly respond to changing requirements.

  • HA during planned maintenance: High availability is maintained by implementing changes in a rolling manner across Oracle RAC nodes. This includes hardware, OS, or network maintenance that requires a server to be taken offline; software maintenance to patch the Oracle Grid Infrastructure or database; or if a database instance needs to be moved to another server to increase capacity or balance the workload.
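The failover behavior in the first bullet (connections to surviving instances are unaffected, and only connections to the failed instance move) can be sketched as a toy Python model. The instance names, the `fail_instance` function, and the round-robin placement are hypothetical simplifications; real reconnection behavior depends on how database services and clients are configured.

```python
def fail_instance(sessions, failed, surviving):
    """Redistribute sessions after an instance failure (hypothetical model).

    sessions: dict of instance name -> list of session ids.
    Sessions already on surviving instances are left untouched; sessions from
    the failed instance reconnect round-robin to the surviving instances.
    """
    if not surviving:
        raise RuntimeError("no surviving instance: service is down")
    result = {inst: list(sessions.get(inst, [])) for inst in surviving}
    for i, sid in enumerate(sessions.get(failed, [])):
        result[surviving[i % len(surviving)]].append(sid)
    return result

sessions = {"orcl1": ["s1", "s2"], "orcl2": ["s3"], "orcl3": ["s4", "s5", "s6"]}
print(fail_instance(sessions, "orcl3", ["orcl1", "orcl2"]))
# {'orcl1': ['s1', 's2', 's4', 's6'], 'orcl2': ['s3', 's5']}
```

Because every instance already has the database open, the only work at failover time in this model is reconnecting the displaced sessions, which is why the outage appears as a brief brownout rather than a restart.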

Oracle RAC is the MAA best practice for server HA.

7.3.2 Oracle RAC One Node

Oracle RAC One Node provides an alternative to Oracle RAC in the Silver tier when server HA is a requirement but scalability and instant failover are not. An Oracle RAC One Node license costs half as much as Oracle RAC, providing a lower-cost alternative if an RTO of minutes is sufficient for managing server failures.

Oracle RAC One Node is an active-passive failover technology. It is built upon an infrastructure that is identical to Oracle RAC, but in the case of Oracle RAC One Node there is only one database instance open at a time during normal operation. This can reduce memory requirements significantly, especially when consolidating a large number of databases. If the server hosting the open instance fails, Oracle RAC One Node automatically starts a new database instance on a second node to quickly resume service.

Oracle RAC One Node provides several advantages over alternative active-passive clustering technologies. In an Oracle RAC One Node configuration, Oracle Database HA Services, Grid Infrastructure, and database listeners are always running on the second node. At failover time only the database instance and database services need to start, reducing the time required to resume service, and enabling service to resume in minutes.

Oracle RAC One Node also provides the same advantages for planned maintenance as Oracle RAC. Oracle RAC One Node allows two active database instances during periods of planned maintenance to allow graceful migration of users from one node to another with zero downtime. Maintenance is performed in a rolling manner across nodes while database services remain available to users at all times.

7.3.3 Silver Tier Summary: Data Protection, RTO, and RPO

There is no change in the level of data protection compared to what is offered by the Bronze tier. All of the improvements that Silver offers compared to Bronze are related to RTO for server outages and for several frequently executed types of planned maintenance. Table 7-3 summarizes RTO and RPO enabled by the Silver tier. Areas of improvement compared to Bronze are in parentheses.

Table 7-3 Silver Tier Recovery Time (RTO) and Data Loss Potential (RPO)

Type | Event | Downtime | Data Loss Potential
Unplanned | Database instance failure | Seconds if Oracle RAC (vs. minutes) | Zero
Unplanned | Recoverable server failure | Seconds if Oracle RAC (vs. minutes to an hour); minutes if Oracle RAC One Node (vs. minutes to an hour) | Zero
Unplanned | Data corruptions, unrecoverable server failure, database or site failures | Hours to days | Since last backup
Planned | Online File Move, Online Reorganization and Redefinition, Online Patching | Zero | Zero
Planned | Hardware or operating system maintenance and database patches that cannot be done online but are qualified for Oracle RAC rolling install | Zero (vs. minutes to hours) | Zero
Planned | Database upgrades: patch sets and full database releases | Minutes to hours | Zero
Planned | Platform migrations | Hours to a day | Zero
Planned | Application upgrades that modify back-end database objects | Hours to days | Zero


7.4 The Gold Tier - Comprehensive High Availability and Disaster Recovery

The Gold tier builds upon Silver by using database replication technology to eliminate the single point of failure and provide a much higher level of data protection and HA from all types of unplanned outages, including data corruptions, database failures, and site failures. The existence of a replicated copy also provides substantial advantages for reducing downtime during periods of planned maintenance. An overview of the Gold tier is provided in Figure 7-4. RTO is reduced to seconds or minutes, with an accompanying RPO of zero or near zero depending upon configuration.

Figure 7-4 Gold Tier – Comprehensive HA and DR

Note that Gold uses Oracle RAC as the standard for server HA in place of the lesser option of Oracle RAC One Node that is available for Silver.

The Gold tier adds the advanced HA components described in the following sections to achieve improved service levels.

7.4.1 Oracle Active Data Guard - Real Time Data Protection and Availability

Oracle Active Data Guard maintains one or more synchronized physical replicas (standby databases) at a remote location that are used to eliminate the single point of failure for a production database (the primary database). Capabilities that Oracle Active Data Guard adds to the Gold tier include:

  • Choice of zero or near-zero data loss potential. Oracle Active Data Guard performs real-time replication of changes from a primary to a standby database. Changes are transmitted directly from the log buffer of the primary to minimize propagation delay and overhead, and to completely isolate replication from corruptions that can occur in the I/O stack of a production database.

    Administrators can choose synchronous transport with Maximum Availability for a guarantee of zero data loss. Alternatively they can choose asynchronous transport and Maximum Performance for near-zero data loss. Maximum Performance can achieve sub-second data loss exposure when provided sufficient network bandwidth to accommodate transport volume.

    Data Guard is the Oracle replication technology that provides zero data loss protection.

  • An Oracle Active Data Guard standby database can quickly take over production and restore service if there is a database or site outage that impacts the availability of the primary database. The standby database is always running; it does not need to be restarted to transition to the primary role, and role transitions can complete in less than 60 seconds, even on heavily loaded systems.

    Gold uses Data Guard Fast-Start Failover to automate database failover. This accelerates recovery by eliminating the delay required for an administrator to be notified of and respond to an outage. Fast-Start Failover uses role-specific database services and the Oracle client notification framework to ensure that applications quickly drop their connections to a failed primary database and automatically reconnect to the new primary. Role transitions can also be executed manually using either a command-line interface or Oracle Enterprise Manager.

  • Transparent replication. Oracle Active Data Guard performs complete, one-way physical replication of an Oracle Database. It offers high performance, is simple to manage, and supports all data types, applications, and workloads, including DML, DDL, OLTP, batch processing, data warehousing, and consolidated databases. Oracle Active Data Guard is closely integrated with Oracle RAC, ASM, RMAN, and Oracle Flashback technologies.

  • Production offload for high return on investment (ROI). Oracle Active Data Guard standby databases can be opened read-only while replication is active, and they can be used to offload ad-hoc queries and reporting workloads from the production database. The offload increases ROI in standby systems and improves performance for all workloads by utilizing capacity that would otherwise be idle. It also provides continuous application validation because the standby systems are ready to support production workloads.

  • Backup offload. Primary and standby systems are exact physical replicas, enabling backups to be offloaded from the primary to the standby database. A backup taken at the standby can be used to restore either the primary or standby database. This provides administrators with flexible recovery options without burdening production systems with the overhead of performing backups.

  • Reduced downtime for planned maintenance. Standby databases can be used to upgrade to new Oracle Patch Sets (for example, patch release 11.2.0.2 to 11.2.0.4) or new Oracle releases (for example, release 11.2 to 12.1) in a rolling manner by implementing the upgrade at the standby first then switching production to the new version. Total downtime is limited to the time required to switch a standby database to the primary production role after maintenance has been completed.

  • An Oracle Active Data Guard standby performs continuous Oracle validation to ensure that corruption is not propagated from the source database. It detects physical and logical intra-block corruptions that can occur independently at either primary or standby databases. It is also unique in enabling run-time detection of silent lost-write corruptions (lost or stray writes that are acknowledged by the I/O subsystem as successful). For more details see My Oracle Support Note 1302539.1 - Best Practices for Corruption Detection, Prevention, and Automatic Repair.

  • Automatic block repair. Oracle Active Data Guard automatically repairs block-level corruption caused by intermittent random I/O errors that can occur independently at either primary or standby databases. It does this by retrieving a good copy of the block from the opposite database. No application changes are required and the repair is transparent to the user.
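The automatic block repair described in the last bullet can be sketched as a read path that falls back to the opposite database when a local block fails verification. This is a toy Python model under stated assumptions: `ToyDatabase` and `read_with_auto_repair` are hypothetical names, CRC-32 stands in for Oracle's block checks, and the real feature operates inside the database rather than in application code.

```python
import zlib

class ToyDatabase:
    """Hypothetical stand-in for a primary or standby: block number -> (bytes, CRC)."""

    def __init__(self, name, blocks):
        self.name = name
        # Record a checksum for each block as it is "written".
        self.blocks = {n: (data, zlib.crc32(data)) for n, data in blocks.items()}

    def read_verified(self, n):
        """Return the block if its contents still match the stored checksum, else None."""
        data, crc = self.blocks[n]
        return data if zlib.crc32(data) == crc else None

def read_with_auto_repair(local, remote, n):
    """Read block n; if it is corrupt locally, fetch a good copy from the other database."""
    data = local.read_verified(n)
    if data is not None:
        return data
    good = remote.read_verified(n)
    if good is None:
        raise OSError(f"block {n} is corrupt on both {local.name} and {remote.name}")
    local.blocks[n] = (good, zlib.crc32(good))  # overwrite the bad block in place
    return good

primary = ToyDatabase("primary", {7: b"row data"})
standby = ToyDatabase("standby", {7: b"row data"})
_, crc = primary.blocks[7]
primary.blocks[7] = (b"row d@ta", crc)  # simulate an intermittent I/O error on the primary
print(read_with_auto_repair(primary, standby, 7))  # b'row data'
```

The same function works in either direction, mirroring the text's point that corruption occurring independently at the primary or the standby can be repaired from the other copy.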

The points above explain how the Gold tier uses Oracle replication technology to maintain a synchronized copy, rather than using storage remote mirroring products (for example, SRDF, Hitachi TrueCopy, and so on). For a more in-depth discussion of the differences, see Oracle Active Data Guard vs. Storage Remote Mirroring.

7.4.2 Oracle GoldenGate

Oracle GoldenGate provides the option of logical replication to maintain a synchronized copy (target database) of the production database (source database). Logical replication is a more complex process than physical replication but provides greater flexibility to handle different replication scenarios and heterogeneous platforms.

  • From a data distribution perspective, logical replication is designed to efficiently replicate subsets of a source database to distribute data to other target databases. It can also be used to consolidate data into a single target database (for example, an Operational Data Store) from multiple source databases.

  • From a high availability perspective, logical replication can be used to maintain a complete replica of a source database for high availability or disaster protection that is ready for immediate failover should the source database become unavailable. Oracle GoldenGate uses a logical replication process. It reads changes from disk at a source database, transforms the data into a platform-independent file format, transmits the file to a target database, then transforms the data into SQL (updates, inserts, and deletes) native to the target database. The target database contains the same data but is a different database from the source (for example, backups are not interchangeable).

  • Oracle GoldenGate logical replication provides increased flexibility to perform maintenance and migrations in a rolling manner that is not possible using Data Guard physical replication. For example, Oracle GoldenGate enables replication between a source running on a big-endian platform and a target running on a little-endian platform (cross-endian replication). This makes it possible to execute platform migrations, with the additional advantage of being able to reverse the replication for fast fallback to the prior version after cutover.
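The final step of the logical replication process described above, turning captured changes into SQL native to the target, can be sketched in Python. The JSON change-record format and the `to_sql` and `apply_trail` functions are hypothetical and are not the Oracle GoldenGate trail format or API. SQLite stands in for a target database that differs from the source, which is why the statements use SQLite's `?` placeholders.

```python
import json
import sqlite3

# Hypothetical platform-independent change records (JSON text). This is NOT
# the Oracle GoldenGate trail format; it only illustrates the idea.
TRAIL = [
    '{"op": "insert", "table": "orders", "key": {"id": 1}, "cols": {"status": "NEW"}}',
    '{"op": "update", "table": "orders", "key": {"id": 1}, "cols": {"status": "SHIPPED"}}',
    '{"op": "insert", "table": "orders", "key": {"id": 2}, "cols": {"status": "NEW"}}',
    '{"op": "delete", "table": "orders", "key": {"id": 2}, "cols": {}}',
]

def to_sql(record):
    """Translate one change record into a parameterized SQL statement.

    Table and column names come from the (trusted) trail; values are bound
    as parameters.
    """
    table, key, cols = record["table"], record["key"], record["cols"]
    if record["op"] == "insert":
        names = list(key) + list(cols)
        placeholders = ", ".join("?" for _ in names)
        sql = f"INSERT INTO {table} ({', '.join(names)}) VALUES ({placeholders})"
        return sql, [*key.values(), *cols.values()]
    where = " AND ".join(f"{k} = ?" for k in key)
    if record["op"] == "update":
        sets = ", ".join(f"{c} = ?" for c in cols)
        return f"UPDATE {table} SET {sets} WHERE {where}", [*cols.values(), *key.values()]
    if record["op"] == "delete":
        return f"DELETE FROM {table} WHERE {where}", list(key.values())
    raise ValueError(f"unknown operation {record['op']!r}")

def apply_trail(conn, trail):
    """Apply change records to the target in their original order."""
    for line in trail:
        sql, params = to_sql(json.loads(line))
        conn.execute(sql, params)
    conn.commit()

target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
apply_trail(target, TRAIL)
print(target.execute("SELECT id, status FROM orders").fetchall())  # [(1, 'SHIPPED')]
```

Because the target only ever sees SQL, it ends up with the same rows as the source while remaining a physically different database, which is the property the text notes when it says backups are not interchangeable.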

Oracle GoldenGate logical replication is a more sophisticated process that has a number of prerequisites that do not apply to Data Guard physical replication. In return for these prerequisites Oracle GoldenGate provides unique capabilities to address advanced replication requirements. Refer to MAA Best Practices: Oracle Active Data Guard and Oracle GoldenGate for additional insights on the tradeoffs of each replication technology and requirements that may favor the use of one versus the other, or the use of both technologies in a complementary manner.

7.4.3 Oracle Site Guard

Oracle Site Guard enables administrators to orchestrate switchover (a planned event) and failover (in response to an unplanned outage) of their Oracle environment, multiple databases, and applications, between a production site and a remote disaster recovery site. Oracle Site Guard is included with the Oracle Enterprise Manager Life-Cycle Management Pack.

Oracle Site Guard offers the following benefits:

  • Reduction of errors due to prepared response to site failure. Oracle Site Guard reduces the possibility of human error in case of disasters. Recovery strategies are mapped out, tested, and rehearsed in prepared responses within the application. Once an administrator initiates a Site Guard operation for disaster recovery, human intervention is not required.

  • Coordination across multiple applications, databases, and various replication technologies. Oracle Site Guard automatically handles dependencies between different targets while starting or stopping a site. Site Guard integrates with Oracle Active Data Guard to coordinate multiple concurrent database failovers. Site Guard also provides an easy mechanism to integrate with any storage remote mirroring product. It integrates with storage appliances to perform switchover or failover by using callouts to any user-specified storage role reversal scripts in the operation workflow.

  • Faster recovery time. Oracle Site Guard automation minimizes the manual coordination of recovery activities. This accelerates recovery time even compared to the case where all manual efforts are executed successfully. Site Guard also avoids time consuming resolution of human error that often accompanies manual implementation of complex procedures.
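The dependency handling described in the second bullet amounts to ordering targets so that each starts only after the targets it depends on. A minimal Python sketch, assuming a hypothetical site layout, can group targets into waves using the standard library's `graphlib.TopologicalSorter` (Python 3.9 or later); targets in the same wave, such as two independent database failovers, could be run concurrently. This illustrates the ordering problem only and is not how Oracle Site Guard is implemented.

```python
from graphlib import TopologicalSorter

# Hypothetical site: each target maps to the targets that must be running first.
DEPENDS_ON = {
    "storage": set(),
    "db_sales": {"storage"},
    "db_hr": {"storage"},
    "app_server": {"db_sales", "db_hr"},
    "web_tier": {"app_server"},
}

def start_waves(deps):
    """Group targets into waves; every target in a wave can start concurrently."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies are circular
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves

def stop_waves(deps):
    """Stopping a site runs the same waves in reverse: front ends first, storage last."""
    return list(reversed(start_waves(deps)))

print(start_waves(DEPENDS_ON))
# [['storage'], ['db_hr', 'db_sales'], ['app_server'], ['web_tier']]
```

Encoding the order once, rather than relying on an operator to remember it during a disaster, is the kind of prepared response the first bullet describes.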

7.4.4 Gold Summary: Data Protection, RTO, and RPO

Table 7-4 summarizes the data protection offered by the Gold tier.

Table 7-4 Gold Tier Data Protection

Type | Capability | Physical Block Corruption | Logical Block Corruption
Manual | Dbverify, Analyze | Physical block checks | Logical checks for intra-block and inter-object consistency
Manual | RMAN | Physical block checks during backup and restore | Intra-block logical checks
Runtime | Oracle Active Data Guard | Physical block checking at standby; strong isolation between primary and standby eliminates single point of failure; automatic repair of physical corruptions; automatic database failover | Detect lost write corruption, auto shutdown and failover; intra-block logical checks at standby
Runtime | Database | In-memory block and redo checksum | In-memory intra-block logical checks
Runtime | ASM | Automatic corruption detection and repair using local extent pairs | -
Runtime | Exadata | HARD checks on write | HARD checks on write
Background | Exadata | Automatic Hard Disk Scrub and Repair | -

Table 7-5 summarizes RTO and RPO for the Gold tier. Recovery time and data loss potential are dramatically reduced in the Gold tier compared to Silver. Areas of improvement compared to the Silver tier are in parentheses.

Table 7-5 Gold Tier Recovery Time (RTO) and Data Loss Potential (RPO)

Type | Event | Downtime | Data Loss Potential
Unplanned | Database instance failure | Seconds | Zero
Unplanned | Recoverable server failure | Seconds | Zero
Unplanned | Data corruptions, unrecoverable server failure, database failures or site failures | Zero to minutes (vs. hours to days) | Near-zero if using ASYNC (vs. since last backup); zero if using Data Guard synchronous transport (vs. since last backup)
Planned | Online File Move, Online Reorganization and Redefinition, Online Patching | Zero | Zero
Planned | Hardware or operating system maintenance and database patches that cannot be done online | Zero | Zero
Planned | Database upgrades: patch sets and full database releases | Seconds (vs. minutes to hours) | Zero
Planned | Platform migrations | Seconds (vs. hours to a day) | Zero
Planned | Application upgrades that modify back-end database objects | Hours to days | Zero
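The two data loss entries for site failures in Table 7-5 differ because of when a commit is acknowledged relative to redo reaching the standby. The following Python sketch is a deliberately simplified model under stated assumptions: the classes and functions are hypothetical, "SYNC" and "ASYNC" stand in for Data Guard synchronous and asynchronous transport, and transport failures, timeouts, and protection-mode rules are ignored.

```python
from dataclasses import dataclass, field

@dataclass
class Standby:
    received: list = field(default_factory=list)  # transactions whose redo has arrived

@dataclass
class Primary:
    mode: str          # "SYNC" or "ASYNC" (simplified transport modes)
    standby: Standby
    committed: list = field(default_factory=list)
    unsent: list = field(default_factory=list)    # ASYNC redo not yet shipped

    def commit(self, txn):
        if self.mode == "SYNC":
            # Commit completes only after the standby has received the redo.
            self.standby.received.append(txn)
        else:
            # ASYNC: commit completes immediately; redo is shipped in the background.
            self.unsent.append(txn)
        self.committed.append(txn)

    def ship(self):
        """Background transport catching up (nothing is queued in SYNC mode)."""
        self.standby.received.extend(self.unsent)
        self.unsent.clear()

def data_lost_on_failover(primary):
    """Committed transactions that the standby never received."""
    return [t for t in primary.committed if t not in primary.standby.received]

for mode in ("SYNC", "ASYNC"):
    p = Primary(mode, Standby())
    p.commit("t1")
    p.commit("t2")
    p.ship()        # transport catches up
    p.commit("t3")  # primary fails before t3's redo is shipped
    print(mode, data_lost_on_failover(p))
# SYNC []
# ASYNC ['t3']
```

In the ASYNC case, the exposure is whatever redo was still in flight when the primary failed, which is why the text ties near-zero data loss to sufficient network bandwidth.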


7.5 The Platinum Tier - Zero Outage for Platinum Ready Applications

The Platinum tier builds upon Gold to provide the highest level of HA and data protection for applications that have zero tolerance for outages or data loss. Platinum introduces several new Oracle Database 12c capabilities as well as previously available products that have been enhanced in the latest release. Platinum masks the impact of outages from applications and users, ensuring that even in-flight transactions are preserved following recoverable failures. It enables zero downtime maintenance, migrations, and application upgrades. It guarantees zero data loss if the primary database fails for any reason, regardless of the distance between sites. Finally, Platinum automatically manages the availability of database services and workload balancing across database replicas in multiple sites. An overview of the Platinum tier is provided in Figure 7-5.

Figure 7-5 Platinum Tier – Zero Outage

Some applications require a degree of modification to achieve zero application outage using the capabilities provided by the Platinum tier. This is why Platinum is described as providing zero application outage for Platinum-Ready Applications. Note that no application modifications are necessary to achieve zero data loss.

The Platinum tier enables the HA capabilities described in the following sections.

7.5.1 Application Continuity

Application Continuity protects applications from database session failures caused by the failure of an instance, server, storage, network, or any other related component, and even by complete database failure. Application Continuity replays affected "in-flight" requests so that the failure appears to the application as a slightly delayed execution, masking the outage from the user.

If an entire Oracle RAC cluster fails, making the database unavailable, Application Continuity replays the session, including the transaction, following an Oracle Active Data Guard failover. Using Application Continuity with a standby database requires Data Guard Maximum Availability mode (zero data loss) and Data Guard Fast-Start Failover (automatic database failover).

Although existing application code often requires some modification to use Application Continuity, Application Continuity simplifies development of new applications by transparently handling recoverable failures.
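The replay idea can be illustrated with a toy model in Python. This is not the Application Continuity API: `FlakySession`, `RecoverableError`, and `run_request` are hypothetical names. The sketch records the statements of a request and, when a recoverable error interrupts it, opens a new session and re-executes the whole request, so the caller sees only a delay. It assumes the failure happened before the request committed. A real implementation must establish the commit outcome before replaying, and this sketch does not do that.

```python
from dataclasses import dataclass, field
from typing import List


class RecoverableError(Exception):
    """Stands in for a recoverable outage such as a lost instance or network."""


@dataclass
class FlakySession:
    """Hypothetical database session that fails once, on a chosen call."""
    fail_on_call: int                 # 1-based call number that fails the first time
    calls_seen: int = 0
    failed_once: bool = False
    log: List[str] = field(default_factory=list)  # work done in the current session

    def execute(self, stmt: str) -> None:
        self.calls_seen += 1
        if not self.failed_once and self.calls_seen == self.fail_on_call:
            self.failed_once = True
            raise RecoverableError(stmt)
        self.log.append(stmt)

    def reconnect(self) -> None:
        # A new session starts clean: uncommitted work from the failed one is gone.
        self.calls_seen = 0
        self.log.clear()


def run_request(session: FlakySession, statements: List[str], max_replays: int = 1) -> int:
    """Run a recorded request, replaying it after recoverable errors.

    Returns the number of replays needed. Assumes the failure happened before
    the request committed, so re-executing it cannot apply work twice.
    """
    replays = 0
    while True:
        try:
            for stmt in statements:
                session.execute(stmt)
            return replays
        except RecoverableError:
            if replays >= max_replays:
                raise
            replays += 1
            session.reconnect()


# The failure hits the second statement; the caller only sees one replay.
session = FlakySession(fail_on_call=2)
replays = run_request(session, ["INSERT a", "UPDATE b", "COMMIT"])
```

The request is replayed from its first call because the replacement session has none of the failed session's uncommitted state.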

7.5.2 Oracle Active Data Guard Far Sync

Oracle Active Data Guard is the only Oracle-aware replication technology that offers zero data loss failover for Oracle Database. Zero data loss is achieved using synchronous transport with Data Guard Maximum Availability mode. When synchronous transport is used, network latency between the primary and standby sites affects primary database performance, and as the distance between sites increases, so do latency and its impact on performance. Because primary and secondary data centers are often separated by long distances, zero data loss failover is impractical to implement for many databases.
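To make the distance argument concrete, the following sketch estimates the minimum round-trip propagation delay that synchronous transport adds before the standby's acknowledgment can reach the primary. It assumes light travels through optical fiber at roughly 200,000 km/s. It also ignores switching, routing, and standby I/O time, so real latencies are higher.

```python
# Assumption: light in optical fiber covers roughly 200,000 km per second.
FIBER_KM_PER_SEC = 200_000.0


def round_trip_ms(distance_km: float) -> float:
    """Lower bound on the send/acknowledge round trip for synchronous redo transport.

    Counts only propagation delay; switching, routing, and standby I/O add more.
    """
    return 2.0 * distance_km / FIBER_KM_PER_SEC * 1000.0


# Minimum extra wait per synchronous acknowledgment at a few distances.
added_latency_ms = {km: round_trip_ms(km) for km in (10, 100, 1000, 4000)}
```

Under these assumptions, a standby 100 km away adds at least about 1 ms to every commit, and one 4,000 km away adds at least about 40 ms.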

Oracle Active Data Guard Far Sync with Oracle Database 12c eliminates prior limitations by enabling zero data loss failover even when primary and standby databases are hundreds or thousands of miles apart, without impacting primary database performance. It achieves this by using a light-weight forwarding mechanism that is simple to deploy and transparent to Oracle Active Data Guard failover or switchover operations. Far Sync, when used in combination with the Oracle Advanced Compression Option, also enables off-host transport compression to conserve network bandwidth.
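The benefit of Far Sync can be sketched with a simple propagation model. The primary waits only for a nearby Far Sync instance to acknowledge redo, and Far Sync forwards that redo to the distant standby without holding up the commit. The distances below are hypothetical, placing Far Sync near the primary is an assumption of the example, and the fiber-speed figure is an approximation.

```python
# Assumption: light in optical fiber covers roughly 200,000 km per second.
FIBER_KM_PER_SEC = 200_000.0


def round_trip_ms(distance_km: float) -> float:
    """Propagation-only round trip in milliseconds (a lower bound)."""
    return 2.0 * distance_km / FIBER_KM_PER_SEC * 1000.0


def commit_wait_ms(sync_destination_km: float) -> float:
    """A commit waits only for the destination that acknowledges synchronously.

    Destinations fed asynchronously (here, the remote standby behind Far Sync)
    do not add to the commit wait.
    """
    return round_trip_ms(sync_destination_km)


STANDBY_KM = 3000   # hypothetical distance from primary to the remote standby
FAR_SYNC_KM = 30    # hypothetical distance from primary to the Far Sync instance

direct_sync_ms = commit_wait_ms(STANDBY_KM)    # SYNC straight to the remote standby
via_far_sync_ms = commit_wait_ms(FAR_SYNC_KM)  # SYNC to Far Sync, which forwards onward
```

Under these assumptions, the primary waits about 30 ms per commit when shipping synchronously to the remote standby. When the nearby Far Sync instance acknowledges the redo instead, the wait drops to about 0.3 ms.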

By combining Far Sync with Data Guard Fast-Start Failover (automatic database failover), Application Continuity can mask outages for in-flight transactions regardless of the distance between primary and standby sites. Far Sync therefore enables two critical enhancements offered by the Platinum tier: zero data loss failover for any database, and the ability to use Application Continuity regardless of the distance between sites. No application modifications are required to take advantage of Far Sync.

7.5.3 Oracle GoldenGate Zero Downtime Maintenance and Active-Active Replication

The Platinum tier uses Oracle GoldenGate's advanced replication capabilities to implement zero downtime maintenance and migrations using bi-directional replication. In such a scenario:

  • Maintenance is first implemented at a target database.

  • Source and target are synchronized across versions using Oracle GoldenGate logical replication. This handles cross-endian platform migrations. It also handles complex application upgrades that modify back-end objects where the replication mechanism must be able to transform data from old to new versions and vice versa.

  • Once the new version or platform is synchronized and stable, bi-directional replication enables users to be migrated gradually to the new platform as they end sessions on the previous version and reconnect, providing a zero downtime experience. Oracle GoldenGate bi-directional replication keeps old and new versions in sync during the migration process. This also provides a quick fallback option should any unanticipated issues arise with the new version as load is added.
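The cross-version transformation that bi-directional replication must perform can be sketched as a pair of mapping functions. In this hypothetical example, the old application version stores a single `full_name` column while the new version stores `first_name` and `last_name`. Each change is mapped into the other version's shape before it is applied. The dictionaries stand in for the two databases, and the code is not GoldenGate configuration.

```python
from typing import Dict

Row = Dict[str, object]

# Hypothetical stand-ins for the databases running the old and new application versions.
old_db: dict = {}
new_db: dict = {}


def old_to_new(row: Row) -> Row:
    """Map an old-version row (full_name) to the new version (first_name, last_name)."""
    first, _, last = str(row["full_name"]).partition(" ")
    return {"id": row["id"], "first_name": first, "last_name": last}


def new_to_old(row: Row) -> Row:
    """Map a new-version row back to the old version's single column."""
    full = f"{row['first_name']} {row['last_name']}".strip()
    return {"id": row["id"], "full_name": full}


def change_on_old(row: Row) -> None:
    """A user still on the old version writes; replication carries it forward."""
    old_db[row["id"]] = row
    new_db[row["id"]] = old_to_new(row)


def change_on_new(row: Row) -> None:
    """A user already on the new version writes; replication carries it back."""
    new_db[row["id"]] = row
    old_db[row["id"]] = new_to_old(row)


change_on_old({"id": 1, "full_name": "Ada Lovelace"})
change_on_new({"id": 2, "first_name": "Alan", "last_name": "Turing"})
```

Writing these mapping functions is where the developer-level knowledge of the changed objects comes in, and a new pair is needed for each release that changes the schema.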

Active-active bi-directional replication can also be used to increase availability service levels where a continuous read-write connection to multiple copies of the same data is required.

Bi-directional replication is not application transparent. It requires conflict detection and resolution when changes are made to the same record at the same time in multiple databases. It also requires careful consideration of the impact of different failure states and replication lag. When GoldenGate bi-directional replication is used for application upgrades that modify back-end database objects, developer-level knowledge of the database objects modified or added by the new release is required in order to enable GoldenGate to replicate across versions. Implementing cross-version mapping is required for every new release of the application.
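Conflict detection and resolution can be sketched as follows. Each replicated change carries the before-image that the originating site saw. If the target row no longer matches that before-image, the change is in conflict. This example resolves conflicts with a latest-timestamp-wins rule, with site name as the tie-breaker, so both sites converge on the same value. The rule, the `Change` record, and the assumption of synchronized clocks are illustrative choices, not a description of how any particular deployment is configured.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Each table maps key -> (value, commit_timestamp, origin_site).
Table = Dict[int, Tuple[str, float, str]]


@dataclass(frozen=True)
class Change:
    key: int
    before: str   # value the originating site saw before its update
    after: str    # value the originating site wrote
    ts: float     # origin commit timestamp (assumes synchronized clocks)
    origin: str   # originating site, used as a deterministic tie-breaker


def apply_replicated(table: Table, ch: Change) -> str:
    """Apply a change arriving from the other site, detecting and resolving conflicts."""
    current = table.get(ch.key)
    if current is None or current[0] == ch.before:
        table[ch.key] = (ch.after, ch.ts, ch.origin)   # no conflict
        return "applied"
    # Conflict: the row changed locally since the origin read it.
    # Resolution rule (an assumption): latest timestamp wins, then site name.
    if (ch.ts, ch.origin) > (current[1], current[2]):
        table[ch.key] = (ch.after, ch.ts, ch.origin)
        return "conflict: incoming wins"
    return "conflict: local wins"


# Both sites start in sync, then update the same row at nearly the same time.
site_a: Table = {1: ("blue", 0.0, "A")}
site_b: Table = {1: ("blue", 0.0, "A")}
a_change = Change(1, "blue", "red", 1.0, "A")
b_change = Change(1, "blue", "green", 2.0, "B")
site_a[1] = (a_change.after, a_change.ts, a_change.origin)   # local write at A
site_b[1] = (b_change.after, b_change.ts, b_change.origin)   # local write at B

result_at_a = apply_replicated(site_a, b_change)
result_at_b = apply_replicated(site_b, a_change)
```

Because both sites apply the same deterministic rule, they both end with "green". A rule that simply kept the local value would leave the two copies permanently different.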

As GoldenGate replication is by definition an asynchronous process, it is not able to provide zero data loss protection. For this reason the Platinum tier does not use Oracle GoldenGate to replicate between sites when the remote replica must provide zero data loss protection if the primary database or primary site experiences an unplanned outage. Platinum uses GoldenGate bi-directional replication in combination with Oracle Active Data Guard to meet the zero data loss requirement. A local GoldenGate replica is used to execute planned maintenance with zero downtime while an Oracle Active Data Guard standby provides continuous zero data loss failover protection should an unplanned outage occur while maintenance is in progress.

7.5.4 Edition-Based Redefinition

Edition-Based Redefinition (EBR) enables an online application upgrade that changes back-end database objects with uninterrupted availability of the application. When an upgrade installation is complete, the pre-upgrade application and the post-upgrade application can be used at the same time. Existing sessions can continue to use the pre-upgrade application until their users decide to end them, and all new sessions can use the post-upgrade application. When there are no longer any sessions using the pre-upgrade application, it can be retired. EBR used in this manner enables hot rollover from the pre-upgrade version to the post-upgrade version.
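The hot-rollover rule can be modeled with a small session tracker. Sessions keep the edition they connected with, new sessions get the most recently published edition, and an old edition becomes retirable once no session uses it. In Oracle Database the edition is a property of the database session. This Python class, with its hypothetical `EditionRouter` name and edition labels, only illustrates the bookkeeping.

```python
from typing import Dict


class EditionRouter:
    """Hypothetical tracker for which edition each session uses during hot rollover."""

    def __init__(self, initial_edition: str) -> None:
        self.default_edition = initial_edition   # edition given to new sessions
        self.sessions: Dict[str, str] = {}       # session id -> edition in use

    def connect(self, session_id: str) -> str:
        # A session is bound to the edition that is current when it connects.
        self.sessions[session_id] = self.default_edition
        return self.default_edition

    def publish(self, edition: str) -> None:
        # Only new sessions see the post-upgrade edition; existing ones keep theirs.
        self.default_edition = edition

    def disconnect(self, session_id: str) -> None:
        self.sessions.pop(session_id, None)

    def can_retire(self, edition: str) -> bool:
        # Retirable once it is no longer the default and no session still uses it.
        return edition != self.default_edition and edition not in self.sessions.values()


router = EditionRouter("v1")
router.connect("s1")   # pre-upgrade session
router.publish("v2")   # upgrade installed
router.connect("s2")   # post-upgrade session
```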

EBR enables online application upgrades in the following manner:

  • Code changes are installed in the privacy of a new edition.

  • Data changes are made safely by writing only to new columns or new tables not seen by the old edition. An editioning view exposes a different projection of a table into each edition to allow each to see just its own columns.

  • A cross-edition trigger propagates data changes made by the old edition into the new edition's columns, or (in hot-rollover) vice-versa.
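The three mechanisms above can be modeled in a few lines. In this hypothetical example, the old edition sees a single `phone` column and the new edition sees `phone_cc` and `phone_num`. `read` plays the role of the editioning view, and the two trigger functions play the roles of forward and reverse cross-edition triggers. The column names and the split rule are assumptions, and the sketch only stands in for the SQL objects; it does not reproduce them.

```python
from typing import Dict

# Physical table: row id -> every column, old and new.
base_table: Dict[int, Dict[str, str]] = {}

# Editioning views: the projection of the table that each edition sees.
PROJECTIONS = {
    "old": ("phone",),                  # pre-upgrade edition
    "new": ("phone_cc", "phone_num"),   # post-upgrade edition
}


def forward_crossedition(row: Dict[str, str]) -> None:
    """Propagate an old-edition write into the new edition's columns."""
    cc, _, num = row.get("phone", "").partition(" ")
    row["phone_cc"], row["phone_num"] = cc, num


def reverse_crossedition(row: Dict[str, str]) -> None:
    """Hot rollover: propagate a new-edition write back into the old column."""
    row["phone"] = f"{row.get('phone_cc', '')} {row.get('phone_num', '')}".strip()


def write(edition: str, row_id: int, **values: str) -> None:
    """Write through an edition's view; columns it cannot see are rejected."""
    hidden = set(values) - set(PROJECTIONS[edition])
    if hidden:
        raise ValueError(f"edition {edition!r} cannot see columns {sorted(hidden)}")
    row = base_table.setdefault(row_id, {})
    row.update(values)
    if edition == "old":
        forward_crossedition(row)
    else:
        reverse_crossedition(row)


def read(edition: str, row_id: int) -> Dict[str, str]:
    """Read through the edition's view: only its own columns are visible."""
    return {col: base_table[row_id][col] for col in PROJECTIONS[edition]}


write("old", 1, phone="+44 2071234567")              # old application still running
write("new", 2, phone_cc="+1", phone_num="5551234")  # upgraded application
```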

As with Oracle GoldenGate zero downtime application upgrades, using EBR requires deep knowledge of the application and a non-trivial effort on the part of the developer to incorporate it. Unlike Oracle GoldenGate, however, EBR requires only a one-time investment; from that point forward, minimal effort is required to use EBR for subsequent releases of the application. EBR has been proven even in the most complex applications: for example, Oracle E-Business Suite 12.2 uses EBR for online patching. EBR is included with Oracle Database at no additional cost to encourage its adoption by application developers.

7.5.5 Global Data Services

Global Data Services (GDS) is a complete automated workload management solution for replicated databases that use Oracle Active Data Guard or Oracle GoldenGate. GDS achieves better system utilization and offers better performance, scalability, and availability for application workloads running on replicated databases. GDS provides the following capabilities for a set of replicated databases:

  • Region-based workload routing

  • Connect-time load balancing

  • Run-time load balancing advisory for Oracle integrated clients

  • Inter-database service failover

  • Replication lag based workload routing for Oracle Active Data Guard

  • Role-based global services for Oracle Active Data Guard

  • Centralized workload management framework
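A simplified routing decision that combines several of the capabilities listed above might look like the following. It is an illustrative model, not the GDS API. The `Replica` fields, the lag threshold, and the least-load choice are assumptions. Writes go only to the primary (role-based) and reads skip standbys that lag beyond a threshold (lag-based). Replicas in the client's region are preferred (region-based), and other regions are used only when no local replica qualifies (inter-database failover).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replica:
    name: str
    region: str
    role: str            # "primary" or "standby"
    lag_seconds: float   # replication lag; 0 for the primary
    load: float = 0.0    # current utilization, 0.0 to 1.0
    up: bool = True


def route(replicas: List[Replica], client_region: str, needs_write: bool,
          max_lag_seconds: float) -> Optional[Replica]:
    """Pick a database for a new connection (illustrative policy, not GDS itself)."""
    candidates = [r for r in replicas if r.up]
    if needs_write:
        # Role-based service: read-write work runs only where the primary role is.
        candidates = [r for r in candidates if r.role == "primary"]
    else:
        # Lag-based routing: skip standbys that are too far behind.
        candidates = [r for r in candidates if r.lag_seconds <= max_lag_seconds]
    # Region-based routing, falling back to other regions if no local replica qualifies.
    local = [r for r in candidates if r.region == client_region]
    pool = local or candidates
    # Connect-time load balancing: choose the least-loaded qualifying replica.
    return min(pool, key=lambda r: r.load, default=None)


replicas = [
    Replica("east_primary", "east", "primary", 0.0, load=0.9),
    Replica("east_standby", "east", "standby", 2.0, load=0.2),
    Replica("west_standby", "west", "standby", 30.0, load=0.1),
]
```

With a 10-second lag limit, a read from the west region is sent to `east_standby` because `west_standby` is too far behind. Raising the limit to 60 seconds keeps the same read in the west region.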

7.5.6 Platinum Summary: Data Protection, RTO, and RPO

The Platinum tier provides the same corruption protection as the Gold tier. The differences between the Platinum and Gold tiers are recovery time (RTO) and data loss potential (RPO) for Platinum-ready applications. RTO/RPO for the Platinum tier is summarized in Table 7-6.

Table 7-6 Platinum Tier Recovery Time (RTO) and Data Loss Potential (RPO)

Type | Event | Downtime | Data Loss Potential
Unplanned | Database instance failure | Zero application outage (vs. seconds) | Zero
Unplanned | Recoverable server failure | Zero application outage (vs. seconds) | Zero
Unplanned | Data corruptions, unrecoverable server failure, database failures or site failures | Zero application outage (vs. zero to minutes) | Zero (vs. near-zero)
Planned | Online File Move, Online Reorganization and Redefinition, Online Patching | Zero | Zero
Planned | Hardware or operating system maintenance and database patches that cannot be done online | Zero application outage | Zero
Planned | Database upgrades: patch sets and full database releases | Zero application outage (vs. seconds) | Zero
Planned | Platform migrations | Zero application outage (vs. seconds) | Zero
Planned | Application upgrades that modify back-end database objects | Zero application outage (vs. hours to days) | Zero


7.6 Integrating Oracle Fusion Middleware High Availability

Flexible and automated high availability solutions ensure that applications you deploy on Oracle WebLogic Server meet the required availability to achieve your business goals. The solutions introduced in this book are described in detail in the Oracle Fusion Middleware High Availability Guide.


See Also:

  • Oracle Fusion Middleware Disaster Recovery Guide

  • Part VII "Advanced Administration: Backup and Recovery" in Oracle Fusion Middleware Administrator's Guide

7.6.1 Oracle WebLogic Server High Availability Architectures

Oracle WebLogic Server provides high availability and disaster recovery solutions for maximum protection against any kind of failure with flexible installation, deployment, and security options. These solutions are categorized into local high availability solutions that provide high availability in a single data center deployment, and disaster-recovery solutions, which are usually geographically distributed deployments that protect your applications from disasters such as floods or regional network outages.

At a high level, Oracle WebLogic Server local high availability architectures include several active-active and active-passive architectures. Both types provide high availability, but active-active solutions generally offer higher scalability and faster failover, although they tend to be more expensive. Within either the active-active or the active-passive category, multiple solutions exist that differ in ease of installation, cost, scalability, and security.


7.6.2 Redundant Architectures

Oracle WebLogic Server provides redundancy by offering support for multiple instances supporting the same workload. These redundant configurations provide increased availability either through a distributed workload, through a failover setup, or both.

From the entry point to an Oracle WebLogic Server system (content cache) to the back-end layer (data sources), all the tiers that are crossed by a request can be configured in a redundant manner with Oracle WebLogic Server. The configuration can be an active-active configuration using Oracle WebLogic Server Cluster or an active-passive configuration using Oracle WebLogic Server Cold Cluster Failover.
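The difference between the two redundancy styles can be sketched as two routing policies. An active-active pool spreads requests round-robin across every healthy member. An active-passive pair sends everything to one member and switches to the standby only after a failure. The member names are hypothetical. In practice this routing is done by WebLogic clustering, load balancers, or cluster failover software, not by application code.

```python
import itertools
from typing import List


class ActiveActive:
    """Every healthy member serves requests; work is spread round-robin."""

    def __init__(self, members: List[str]) -> None:
        self.members = list(members)
        self.healthy = set(self.members)
        self._cycle = itertools.cycle(self.members)

    def mark_down(self, member: str) -> None:
        self.healthy.discard(member)

    def pick(self) -> str:
        # Try each member at most once per pick, skipping failed ones.
        for _ in range(len(self.members)):
            member = next(self._cycle)
            if member in self.healthy:
                return member
        raise RuntimeError("no healthy member")


class ActivePassive:
    """One member serves; the passive member takes over only after a failure."""

    def __init__(self, active: str, passive: str) -> None:
        self.order = [active, passive]
        self.healthy = set(self.order)

    def mark_down(self, member: str) -> None:
        self.healthy.discard(member)

    def pick(self) -> str:
        for member in self.order:
            if member in self.healthy:
                return member
        raise RuntimeError("no healthy member")
```

In the active-active pool, every member does useful work and a failure removes only a share of capacity. In the active-passive pair, the standby sits idle until it is needed.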

7.6.3 High Availability Services in Oracle Fusion Middleware

The Oracle Fusion Middleware High Availability Guide describes the following high availability services in Oracle Fusion Middleware in detail:

  • Process death detection and automatic restart

    Oracle WebLogic Server Node Manager monitors the Managed Servers. If a Managed Server goes down, Node Manager tries to restart it, up to a configured number of times.

  • Clustering

    Oracle Fusion Middleware uses WebLogic clustering capabilities, such as redundancy, failover, session state replication, cluster-wide JNDI services, Whole Server Migration, and cluster-wide configuration.

  • State replication and routing

    Oracle WebLogic Server can be configured for replicating the state of stateful applications.

  • Load balancing and failover

    Oracle Fusion Middleware has a comprehensive feature set around load balancing and failover to leverage availability and scalability of Oracle RAC databases. All Oracle Fusion Middleware components have built-in protection against loss of service, data or transactions as a result of Oracle RAC instance unavailability due to planned or unplanned downtime.

  • Server migration

    Oracle Fusion Middleware components leverage WebLogic Server capabilities to provide failover and automatic restart on a different cluster member.

  • Rolling patching

    Oracle WebLogic Server allows for rolling patching where a minor maintenance patch can be applied to the product binaries in a rolling fashion without having to shut down the entire cluster.

  • Configuration management

    Most Oracle Fusion Middleware component configuration can be done at the cluster level. Oracle Fusion Middleware uses WebLogic Server's cluster-wide configuration capabilities for server configuration, such as data sources, EJBs, and JMS, as well as for component application artifacts and ADF and WebCenter custom applications.

  • Backup and recovery

    Oracle Fusion Middleware backup and recovery is a simple solution based on file system copy for middle-tier components.
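The process death detection and restart behavior described above can be approximated by a bounded restart loop. This is a hypothetical sketch, not Node Manager's implementation or configuration. `start` is any callable that reports whether the server came up, and `max_restarts` stands in for the configured restart limit. A real supervisor would typically also wait between attempts and reset the count over time, and this sketch omits both.

```python
from typing import Callable


class RestartSupervisor:
    """Hypothetical bounded-restart policy for a monitored server process."""

    def __init__(self, start: Callable[[], bool], max_restarts: int) -> None:
        self.start = start              # starts the server; True if it came up
        self.max_restarts = max_restarts
        self.restarts = 0               # restart attempts made so far (never reset here)

    def on_process_death(self) -> bool:
        """Try to restart up to the configured limit; False means give up."""
        while self.restarts < self.max_restarts:
            self.restarts += 1
            if self.start():
                return True
        return False
```

Bounding the attempts stops a server that crashes on every startup from being restarted indefinitely, leaving the failure for an administrator to investigate.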

See Also:

Oracle Fusion Middleware High Availability Guide

7.7 Integrating High Availability for All Applications

A highly available and resilient application requires that every component of the application tolerate failures and changes. Designing a highly available application requires analyzing every component that affects the application, including the network topology, application server, application flow and design, systems, and the database configuration and architecture. This book focuses primarily on database high availability solutions.

See the high availability solutions and recommendations for Oracle Fusion Middleware, Oracle Fusion Applications, Oracle Enterprise Manager, and Oracle Applications Unlimited on the MAA website at:

http://www.oracle.com/goto/maa