3 Common Design Considerations for Continuous Availability

Oracle provides design considerations and best practices for the supported Maximum Availability Architecture (MAA) solutions for continuous availability. MAA architectures span data centers in distributed geographical locations. See Supported MAA Architectures for Continuous Availability. Oracle MAA is Oracle's best practices blueprint, based on proven Oracle high availability technologies, expert recommendations, and customer experiences. The goal of MAA is to achieve optimal high availability for Oracle customers at the lowest cost and complexity.

The recommendations in this chapter apply to all of the MAA architectures that support continuous availability. Recommendations that are specific to a particular architecture are provided in the subsequent chapters.

Topics in this chapter include:

Potential Failure Scenarios

Potential failure scenarios range from unexpected full and partial site failures to maintenance outages.

The design considerations and recommendations provided in this document apply to the following potential failure scenarios:

  • Full site failure - With full site failure, the database, the middle-tier application server, and all user connections fail over to a secondary site that is prepared to handle the production load.

  • Partial site failure - In the context of this document, partial failures are at the mid-tier. A partial site failure at the mid-tier can consist of a failure of the entire mid-tier (WebLogic Server and Coherence), of WebLogic Server only, of the Coherence cluster, or of one instance of Oracle Traffic Director when two instances are configured for high availability.

  • Network partition failure - The communication between sites fails.

  • Maintenance outage - During planned maintenance, all components of a site are brought down gracefully, and a switchover takes place from one site to the other.

The behavior of the continuous availability features in the different failure scenarios is described in the following sections:

Global Load Balancer

When a global load balancer is deployed in front of the production and standby sites, it provides fault detection services and performance-based routing redirection for the two sites. Additionally, the load balancer can provide authoritative DNS name server equivalent capabilities.

In the event of a primary-site disaster, and after the standby site has assumed the production role, a global load balancer is used to reroute user requests to the standby site. Global load balancers such as F5 BIG-IP Global Traffic Manager (GTM) and Cisco Global Site Selector (GSS) also handle DNS name resolution (by offloading the resolution process from the traditional DNS servers).

During normal operations, the global load balancer can be configured with the production site's load balancer name-to-IP mapping. When a DNS switchover is required, this mapping in the global load balancer is changed to map to the standby site's load balancer IP. This allows requests to be directed to the standby site, which now has the production role.

This method of DNS switchover works for both site switchover (planned) and failover (unplanned). One advantage of using a global load balancer is that a new name-to-IP mapping can take effect almost immediately. The downside is that the global load balancer requires an additional investment. For instructions on performing a DNS switchover, see Manually Changing DNS Names in Disaster Recovery Guide.

Oracle Traffic Director

In an MAA environment, you should configure an Oracle Traffic Director failover group. Combining two instances of Oracle Traffic Director using one or two virtual IP (VIP) addresses ensures high availability. Both hosts in the failover group must run the same operating system version, use identical patches and service packs, and run Oracle Traffic Director instances of the same configuration.

When two Oracle Traffic Director instances are grouped by a virtual IP address (VIP) for high availability, they are in active-passive failover mode. The VIP receives requests and routes them to the Oracle Traffic Director instance that is designated as the primary instance. If the primary instance is not reachable, requests are routed to the backup instance.

For active-active failover mode, two failover groups are required. Each failover group must have a unique VIP, and consist of the same nodes, each with the primary and backup roles reversed. Each instance in the failover group is designated as the primary instance for one VIP and the backup for the other VIP. See Configuring Oracle Traffic Director for High Availability in Administering Oracle Traffic Director.

If you are running Oracle Traffic Director on a Linux platform in high availability mode (either active-passive or active-active), ensure that Oracle Traffic Director is the only consumer of the Keepalived process on all the hosts used for configuring the failover group. Keepalived is a Linux health-check framework that implements a hot-standby protocol. No other applications that require the Keepalived process can run on these hosts.

Although Oracle HTTP Server and Apache plug-ins can also be used in Continuous Availability architectures to load balance requests to WebLogic Server instances, they do not provide the integration you receive with Oracle Traffic Director.

The following sections provide some specific guidelines for configuring Oracle Traffic Director for continuous availability.

Network Preparation

  • Allocate one virtual IP address for each site to be used for the Oracle Traffic Director failover group. The addresses must belong to the same subnet as the nodes in the failover group, and they should be DNS resolvable and accessible over the network. Ensure that the network interface on which the failover-group virtual IP is created is the same on all the Administration Node hosts. This is a requirement for smooth migration of the failover group from the primary site to the standby site.

  • At the standby site, ensure that the primary site's host names and the primary site's virtual IP resolve to the IP addresses of the corresponding peer systems. This can be set up by creating aliases for the host names in the /etc/hosts file, as in the sketch below. For both disaster recovery deployment options, make sure aliases exist for all the systems and the virtual IP names.
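For example, a minimal /etc/hosts sketch on a standby-site host (the hostnames and addresses are hypothetical) aliases the primary site's names to the corresponding standby peers:

    # /etc/hosts on a standby-site host (hypothetical names and addresses)
    # The primary site's hostnames and failover-group VIP name resolve to
    # the IP addresses of the corresponding standby-site peers.
    10.20.1.11   otdnode1.example.com   otdnode1
    10.20.1.12   otdnode2.example.com   otdnode2
    10.20.1.100  otd-vip.example.com    otd-vip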

Best Practices

  1. Oracle recommends that you test the standby site periodically by switching its role with the current primary site:

    1. Follow the site switchover procedure to switch over the standby site to the new primary site.

    2. Once testing is complete, follow the site switchback procedure to reverse the roles.

    Periodic testing validates that both the primary and standby sites are completely functional and mitigates the risk of failure at both sites. It also validates the switchover and switchback procedures.

  2. Do not configure project-level and share-level replication within the same project.

  3. Ensure that the Oracle Traffic Director setup resides on shared storage and gets replicated to the remote site, making the Oracle Traffic Director binaries and latest configuration data available at the standby site during a site failure or site maintenance event. All the Oracle Traffic Director binaries, configuration data, logs, and security data should be replicated to the remote site using existing replication technology.

  4. Use the Scheduled replication mode for projects and shares in these cases:

    1. Data does not change frequently.

    2. The Recovery Point Objective value falls within your scheduled replication window.

  5. Use the Continuous replication mode for projects and shares in these cases:

    1. The standby site is required to be as close as possible to the primary site.

    2. The Recovery Point Objective is a few seconds, only very little data loss can be tolerated, and the data is of a critical nature.

  6. Snapshots and clones can be used at the target site to offload backup, test, and development types of environments.

  7. When configuring a local standby site (disaster recovery within the data center), consider disabling SSL on the replication channel. Removing the encryption algorithm enables a higher replication throughput.

  8. Always enable SSL when replication crosses a wide area network. For the OTD instance synchronization-based standby disaster recovery option, a remote sync tool and a time-based scheduler application must be available on the Administration Server host at each site for transferring the OTD instance changes between sites. Ensure that the NIS settings are configured and the NIS service is started.

Web Tier

Configuring a web tier is optional in continuous availability MAA architectures. Web Tier products such as Oracle HTTP Server (OHS) and Oracle WebLogic Server Proxy Plug-In are designed to efficiently front-end WebLogic Server applications.

When possible, Oracle recommends using Oracle Traffic Director (OTD) to handle all load balancing to WebLogic Server instances. Oracle Traffic Director integrates with Continuous Availability features such as Zero Downtime Patching, which provides maximum availability to applications during the rollout process, and Live Partition Migration, which provides continuous availability while the applications and resources in an MT partition are migrated from one WebLogic cluster to another. You should not front end your web tier with Oracle Traffic Director in cases where there are:

  • Static contents such as HTML, images, JavaScript, and so on.

  • Middleware configurations that already make use of Oracle HTTP Server or WebLogic Server Proxy Plug-In.

Use existing replication technology or methods to keep Web Tier binaries and configuration consistent between sites.

OHS and WebLogic Server Proxy Plug-in can be used with other WebLogic Server Continuous Availability features but might require manual intervention and do not offer the same level of availability as Oracle Traffic Director.

For more information about Oracle HTTP Server and WebLogic Server Proxy Plug-Ins, see:

WebLogic Server

WebLogic Server features such as clustering, singleton services, session replication, and others can be used with the Continuous Availability features to provide the highest level of availability.

The following sections provide the design considerations for these WebLogic Server features in a continuous availability MAA architecture:

Clustering

A WebLogic Server cluster consists of multiple WebLogic Server instances running simultaneously and working together to provide increased scalability, reliability, and high availability. A cluster appears to clients as a single WebLogic Server instance. The server instances that constitute a cluster can run on the same machine, or be located on different machines. You can increase a cluster's capacity by adding additional server instances to the cluster on an existing machine, or you can add machines to the cluster to host the incremental server instances. Each server instance in a cluster must run the same version of WebLogic Server.

WebLogic Server supports two types of clusters:

  • Dynamic clusters - Dynamic clusters consist of server instances that can be dynamically scaled up to meet the resource needs of your application. When you create a dynamic cluster, the dynamic servers are preconfigured and automatically generated for you, enabling you to easily scale up the number of server instances when you need additional capacity. Dynamic clusters allow you to define and configure rules and policies to scale up or shrink the cluster.

    In dynamic clusters, the Managed Server configurations are based on a single, shared template. This greatly simplifies the configuration of clustered Managed Servers, and allows servers to be dynamically assigned to machine resources for greater utilization with minimal configuration.

    Dynamic cluster elasticity allows the cluster to be scaled up or down based on conditions identified by the user. Scaling a cluster can be performed on-demand (interactively by the administrator), at a specific date or time, or based on performance as seen through various server metrics.

    When shrinking a dynamic cluster, the Managed Servers are shut down gracefully and in-flight work and transactions are allowed to complete. If needed, singleton services are automatically migrated to another instance in the cluster.

  • Static clusters - In a static cluster the end-user must configure new servers and add them to the cluster, and start and stop them manually. The expansion and shrinking of the cluster is not automatic; it must be performed by an administrator.

In most cases, Oracle recommends the use of dynamic clusters to provide elasticity to WebLogic deployments (see the WLST sketch at the end of this section). The benefits of dynamic clusters are minimal configuration, elasticity of clusters, and proper migration of JMS and JTA singleton services when shrinking the cluster.

However, there are some instances where static clusters should be used:

  • If you need to manually migrate singleton services. Dynamic clusters do not support manual migration of singleton services.

  • If your configuration consists of Oracle Fusion Middleware upper stack products such as Oracle SOA Suite and Oracle Business Process Management. These products do not provide support for dynamic clusters in this release.
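To make the dynamic-cluster recommendation concrete, the following online WLST sketch creates a dynamic cluster from a server template. It is a minimal sketch, assuming a running Administration Server; the credentials, URL, and names (cluster1, cluster1-template) are hypothetical, and the attribute names should be verified against the DynamicServersMBean for your WebLogic Server release.

    # Minimal online WLST sketch (hypothetical names and credentials)
    connect('weblogic', 'password123', 't3://adminhost:7001')
    edit()
    startEdit()

    # Create the server template that all dynamic servers share
    tmpl = cmo.createServerTemplate('cluster1-template')
    tmpl.setListenPort(7100)

    # Create the cluster and configure its dynamic-servers settings
    cluster = cmo.createCluster('cluster1')
    dyn = cluster.getDynamicServers()
    dyn.setServerTemplate(tmpl)
    dyn.setServerNamePrefix('dyn-server-')
    dyn.setDynamicClusterSize(4)        # number of dynamic server instances
    dyn.setCalculatedListenPorts(true)  # derive each server's listen port from the template

    save()
    activate()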

Singleton Services

A singleton service is a service running on a Managed Server that is available on only one member of a cluster at a time. WebLogic Server allows you to automatically monitor and migrate singleton services from one server instance to another.

Pinned services, such as JMS-related services and user-defined singleton services, are hosted on individual server instances within a WebLogic cluster. To ensure that singleton JMS or JTA services do not introduce a single point of failure for dependent applications in the cluster, WebLogic Server can be configured to migrate them, automatically or manually, to any server instance in the cluster.

Within an application, you can define a singleton service that performs tasks that you want executed on only one member of a cluster at any given time. Automatic singleton service migration allows the automatic health monitoring and migration of user-defined singleton services, as in the sketch below.
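As an illustration, the following online WLST sketch registers a hypothetical user-defined singleton service class with a cluster so that WebLogic Server can monitor and migrate it automatically. The class, names, and credentials are assumptions for the sketch, not part of this document.

    # Minimal online WLST sketch (hypothetical names, class, and credentials)
    connect('weblogic', 'password123', 't3://adminhost:7001')
    edit()
    startEdit()

    # Register a user-defined singleton service; the class must implement
    # weblogic.cluster.singleton.SingletonService and be on the server classpath
    ss = cmo.createSingletonService('AppSingleton')
    ss.setClassName('com.example.AppSingletonService')
    ss.setCluster(getMBean('/Clusters/cluster1'))

    save()
    activate()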

Singleton services described in the following sections include:

Server and Service Migration

Oracle WebLogic Server supports two distinct types of automatic migration mechanisms:

  • Whole server migration, where a migratable server instance, and all of its services, is migrated to a different physical machine upon failure. When a failure occurs in a server that is part of a cluster that is configured with server migration, the server is restarted on any of the other machines that host members of the cluster. See Whole Server Migration in Administering Clusters for Oracle WebLogic Server.

  • Service migration, where failed services are migrated from one server instance to a different available server instance within the cluster. In some circumstances, service migration performs much better than whole server migration because only the singleton services are migrated, as opposed to the entire server. See Service Migration in Administering Clusters for Oracle WebLogic Server.

Both whole server migration and service migration require that you configure a database leasing table. See Leasing.

Instructions for configuring WebLogic Server to use server and service migration in an MAA environment are provided in Using Whole Server Migration and Service Migration in an Enterprise Deployment in Enterprise Deployment Guide for Oracle SOA Suite.

Data Stores

There are two kinds of persistent data stores for Oracle WebLogic Server transaction logs (TLogs) and Oracle WebLogic Server JMS: database-based and file-based.

Keeping persistent stores in the database provides the replication and high availability benefits inherent in the underlying database system. With JMS, TLogs and the application in the same database and replication handled by Oracle Data Guard, cross-site synchronization is simplified and the need for a shared storage sub-system such as a NAS or a SAN is alleviated in the middle tier. See Database.

However, storing TLogs and JMS stores in the database carries a system performance penalty, which increases when one site must communicate with the database on the other site. Ideally, from a performance perspective, shared storage that is local to each site should be used for both types of stores, with appropriate storage-level replication and backup strategies provisioned to guarantee zero data loss without performance degradation. Whether database stores are more suitable than shared storage for a given system depends on the criticality of the JMS and transaction data, because the level of protection that shared storage provides is much lower than what the database guarantees.

You can minimize the performance impact of database stores, especially under high concurrency, by using techniques such as global hash partitions for indexes (if Oracle Database partitioning is available). For recommendations about minimizing the performance impact, see Using Persistent Stores for TLOGs and JMS in an Enterprise Deployment in Enterprise Deployment Guide for Oracle SOA Suite.

In active-active and active-passive topologies, keeping the data stores in the database is a requirement. Oracle recommends keeping WebLogic Server stores, such as JMS and JTA stores, in a highly available database such as Oracle RAC, and connecting to the database using Active GridLink data sources for maximum performance and availability.

In the case of an active-active stretch cluster, you can choose between keeping the data stores in a shared storage sub-system such as a NAS or a SAN, or in the database.
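For example, the following online WLST sketch moves one server's transaction log into a database store. It is a minimal sketch; the data source and prefix names are hypothetical, and the attribute names should be verified against the TransactionLogJDBCStoreMBean for your release.

    # Minimal online WLST sketch (hypothetical names and credentials);
    # keeps server1's TLog in the database instead of the default file store
    connect('weblogic', 'password123', 't3://adminhost:7001')
    edit()
    startEdit()

    cd('/Servers/server1')
    tlog = cmo.getTransactionLogJDBCStore()
    tlog.setEnabled(true)
    tlog.setDataSource(getMBean('/JDBCSystemResources/TLogDataSource'))
    tlog.setPrefixName('TLOG_server1_')  # keeps each server's rows distinct in the shared table

    save()
    activate()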

Leasing

Leasing is the process WebLogic Server uses to manage services that are required to run on only one member of a cluster at a time. Leasing ensures exclusive ownership of a cluster-wide entity. Within a cluster, there is a single owner of a lease. Additionally, leases can failover in case of server or cluster failure which helps to avoid having a single point of failure. See Leasing in Administering Clusters for Oracle WebLogic Server.

For database leasing, Oracle recommends the following:

  • A highly available database such as Oracle RAC and Active GridLink (AGL).

  • A standby database, and Oracle Data Guard to provide replication between the two databases.
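The following online WLST sketch enables database leasing for a cluster. It is a minimal sketch with hypothetical names and credentials; the leasing data source must target the cluster and point to the database that holds the leasing table.

    # Minimal online WLST sketch (hypothetical names and credentials);
    # switches the cluster from consensus leasing to database leasing
    connect('weblogic', 'password123', 't3://adminhost:7001')
    edit()
    startEdit()

    cluster = getMBean('/Clusters/cluster1')
    cluster.setMigrationBasis('database')  # use the database leasing table
    cluster.setDataSourceForAutomaticMigration(getMBean('/JDBCSystemResources/LeasingDataSource'))
    cluster.setAutoMigrationTableName('ACTIVE')  # default leasing table name

    save()
    activate()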

When using database leasing, Oracle WebLogic Server instances may shut down if the database remains unavailable (during switchover or failover) for longer than their server migration fencing times. You can adjust the server migration fencing times as described in the following topics in Administering Clusters for Oracle WebLogic Server:

Session Replication

WebLogic Server provides three methods for replicating HTTP session state across servers in a cluster:

  • In-memory replication - Using in-memory replication, WebLogic Server copies the session state from one server instance to another. It creates the primary session state on the server instance to which the client first connects, and a secondary replica on another WebLogic Server instance in the cluster. The replica is kept up to date so that it can be used if the server instance that hosts the servlet fails.

  • JDBC-based persistence - WebLogic Server maintains the HTTP session state of a servlet or JSP in a database table that is accessed through a JDBC data source (file-based persistence is also available); see the descriptor sketch at the end of this section. For more information on these persistence mechanisms, see Configuring Session Persistence in Developing Web Applications, Servlets, and JSPs for Oracle WebLogic Server.

  • Coherence*Web - Coherence*Web is not a replacement for WebLogic Server's in-memory HTTP state replication services. However, you should consider using Coherence*Web when an application has large HTTP session state objects, when running into memory constraints due to storing HTTP session object data, or if you want to reuse an existing Coherence cluster. See Using Coherence*Web with WebLogic Server in Administering HTTP Session Management with Oracle Coherence*Web.

Choose the replication method that best fits your latency model, tolerance for session loss, and performance requirements.

  • When the latency is small, such as in MAN networks (stretch cluster topology), Oracle recommends WebLogic Server in-memory session replication. However, if a site experiences a failure, there is the possibility of session loss.

  • When the latency is large (WAN networks), in active-active or active-passive topologies, and when your applications cannot tolerate session loss, Oracle recommends database session replication.

In most cases, in-memory session replication performs much better than database session replication. See Failover and Replication in a Cluster in Administering Clusters for Oracle WebLogic Server.
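As a sketch, JDBC-based session persistence is enabled per web application in its weblogic.xml deployment descriptor; the data source name below is hypothetical:

    <!-- weblogic.xml fragment (hypothetical data source name) -->
    <weblogic-web-app xmlns="http://xmlns.oracle.com/weblogic/weblogic-web-app">
      <session-descriptor>
        <!-- Persist HTTP session state to a database table through a JDBC data source -->
        <persistent-store-type>jdbc</persistent-store-type>
        <persistent-store-pool>SessionsDataSource</persistent-store-pool>
      </session-descriptor>
    </weblogic-web-app>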

Data Sources

WebLogic Active GridLink data sources integrate with Oracle RAC databases and Oracle Data Guard to provide the best performance, high scalability, and the highest availability. The integration with Oracle RAC enables Active GridLink to provide Fast Connection Failover (FCF), Runtime Connection Load Balancing (RCLB), and affinity features. Active GridLink can handle planned maintenance in the database without any interruption to end users while allowing all work to complete.

Oracle recommends keeping WebLogic Server stores, such as JMS and JTA stores, and leasing tables in a highly available database such as Oracle RAC, and connecting to the database using Active GridLink data sources. Storing the stores and leasing tables in the database provides the following advantages:

  • Exploits the replication and other high availability aspects inherent in the underlying database system.

  • Enhances handling of disaster recovery scenarios. When JMS, the TLogs and the application are in the same database and the replication is handled by Data Guard, there is no need to worry about cross-site synchronization.

  • Alleviates the need for a shared storage sub-system such as a NAS or a SAN. Usage of the database also reduces overall system complexity since in most cases a database is already present for normal runtime/application work.

You can configure your Active GridLink URL to minimize the time to fail over between databases, as in the sketch below. See Supported AGL Data Source URL Formats in Administering JDBC Data Sources for Oracle WebLogic Server.
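For illustration, a long-format URL can list the primary and standby SCAN addresses with connection retries so that a role transition does not require a URL change. The hostnames, service name, and timing values below are hypothetical:

    jdbc:oracle:thin:@(DESCRIPTION=
      (CONNECT_TIMEOUT=90)(RETRY_COUNT=20)(RETRY_DELAY=3)
      (ADDRESS_LIST=(LOAD_BALANCE=on)
        (ADDRESS=(PROTOCOL=TCP)(HOST=primary-scan.example.com)(PORT=1521)))
      (ADDRESS_LIST=(LOAD_BALANCE=on)
        (ADDRESS=(PROTOCOL=TCP)(HOST=standby-scan.example.com)(PORT=1521)))
      (CONNECT_DATA=(SERVICE_NAME=app_svc)))

The service name should be a role-based database service that is active only on the database that currently has the primary role, so that connections naturally follow the primary after a switchover or failover.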

Security

It is important that you determine your security needs and make sure that you take the appropriate security measures before you deploy WebLogic Server and your Java EE applications into a production environment. See Ensuring the Security of Your Production Environment in Securing a Production Environment for Oracle WebLogic Server.

Storage

The Oracle Fusion Middleware components in a given environment are usually interdependent, so it is important that the components in the topology stay in sync. The storage artifacts that you need to take into consideration in an MAA environment are classified as static or dynamic.

  • Static artifacts are files and directories that do not change frequently. These include:

    • Middleware home: The Middleware home usually consists of an Oracle home and an Oracle WebLogic Server home.

    • Oracle Inventory: This includes oraInst.loc and oratab files, which are located in the /etc directory.

  • Dynamic or runtime artifacts are files that change frequently. Runtime artifacts include:

    • Domain home: Domain directories of the Administration Server and the Managed Servers.

    • Oracle instances: Oracle Instance home directories.

    • Application artifacts, such as .ear or .war files.

    • Database artifacts, such as the MDS repository.

    • Database metadata repositories used by Oracle Fusion Middleware.

    • Persistent stores, such as JMS providers and transaction logs. See Data Stores.

    • Deployment plans: Used for updating technology adapters such as file and JMS adapters. They need to be saved in a location that is accessible to all nodes in the cluster to which the artifacts are deployed.

For maximum availability, Oracle recommends using redundant binary installations. Each node should have its own Oracle home so that when you apply Zero Downtime Patches, only servers in one node need to come down at a time.

For recommended guidelines regarding shared storage for artifacts such as home directories and configuration files, see Using Shared Storage in High Availability Guide.

Zero Downtime Patching

Zero Downtime Patching (ZDT Patching) provides continuous application availability during the rollout of upgrades, even though the possibility of failures during the rollout process always exists. In an MAA environment, Oracle recommends patching one site at a time and staggering the update to the other site to ensure that the sites remain synchronized. In a site failure scenario, allow the failed site to recover before resuming ZDT Patching.

When using ZDT Patching, consider the following:

  • Rollout shuts down one node at a time, so the more nodes in a cluster, the less impact it has on the cluster's ability to handle traffic.

  • If a cluster has only two nodes, and one node is down for patching, then high availability cannot be guaranteed. Oracle recommends having more than two nodes in the cluster.

  • If you include a Managed Server on the same node that includes the Administration Server, then both servers must be shut down together to update the Oracle home.

  • Two clusters can have servers on the same node sharing an Oracle home, but both clusters must be shut down and patched together.

  • If your configuration contains two Oracle homes, then Oracle recommends, but does not require, that you create and patch the second Oracle home on a nonproduction machine so that you can test the patches you apply. The Oracle home on that node must be identical to the Oracle home you are using for your production domain.
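For reference, a rollout is typically driven from WLST. The sketch below is hypothetical (credentials, target, and paths), and the command arguments should be verified against the ZDT Patching documentation for your release; the patched Oracle home image is assumed to have been prepared beforehand.

    # Minimal online WLST sketch (hypothetical credentials, target, and paths);
    # rolls a previously prepared, patched Oracle home out one node at a time
    connect('weblogic', 'password123', 't3://adminhost:7001')
    progress = rolloutOracleHome('Cluster1', '/u01/images/patched_home.jar',
                                 '/u01/backup/original_home', 'FALSE')
    print progress.getProgressString()  # poll the workflow status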

Cross-Site XA Transaction Recovery

Cross-site XA transaction recovery uses the leasing framework to automate cross-site recovery. The leasing design requirements for cross-site transaction recovery are essentially the same as those defined for database leasing of the transaction recovery service, as described in Leasing. The leasing table is created automatically when the Managed Servers are started and the site is configured. There is one leasing table per site for all domains in the site.

See Transaction Recovery Spanning Multiple Sites or Data Centers in Developing JTA Applications for Oracle WebLogic Server.

Oracle recommends that you always use hostnames (not static IP addresses) to specify the listen addresses of Managed Servers, and that you configure the hostnames identically on all sites. Because hostnames between sites are identical but the IP addresses are not, this makes it possible to simply start an identically configured server or domain on the recovery site.

Table 3-1 lists the recommended settings for the JTA MBean properties that can be configured in a continuous availability MAA topology; a WLST sketch for setting them follows the table. See also JTAMBean in MBean Reference for Oracle WebLogic Server.

Table 3-1 Configurable MBean Properties for Cross-Site XA Transaction Recovery

CrossSiteRecoveryLeaseExpiration - Specifies the time, in seconds, after which a lease expires, making it eligible for recovery by another site. Recommended setting: 30 seconds (default).

CrossSiteRecoveryRetryInterval - Specifies the interval, in seconds, at which the recovery site servers check the leasing table to see if the lease has expired and if the owning servers are down. Recommended setting: 60 seconds (default).

CrossSiteRecoveryLeaseUpdate - Specifies the frequency, in seconds, at which the servers renew their leases. Recommended setting: 10 seconds (default).
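A minimal online WLST sketch for setting these JTAMBean properties (the setter names follow the attribute names above; the credentials are hypothetical, and the values shown are the defaults):

    # Minimal online WLST sketch; sets the cross-site recovery lease properties
    connect('weblogic', 'password123', 't3://adminhost:7001')
    edit()
    startEdit()

    jta = cmo.getJTA()
    jta.setCrossSiteRecoveryLeaseExpiration(30)
    jta.setCrossSiteRecoveryRetryInterval(60)
    jta.setCrossSiteRecoveryLeaseUpdate(10)

    save()
    activate()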

Coherence

Coherence features such as federated caching, persistence, and GoldenGate Hot Cache can be used together with the Continuous Availability features to provide the highest level of availability.

The following sections provide the design considerations for Coherence in a continuous availability MAA architecture.

Coherence Federated Caching

For Coherence running federated caching (but not stretch clusters), the ramifications include:

  • Coherence data reaches the other site at some point in an ordered fashion (in Coherence, ordering is per Coherence partition), even after network partition or remote cluster outage.

  • The remote site may read stale data for a period of time after the local site has been updated.

  • Update conflicts are possible; Coherence identifies them and calls out to an application-specific conflict resolver.

Coherence federated caching implements an eventual consistency model between sites for the following reasons:

  • The data center can be anywhere; the location is not constrained by latency or available bandwidth.

  • Tolerance for unavailability of the remote data center or cluster is extremely desirable. Note that it is very hard to tell the difference between communications being down and a remote cluster being down, and it is not necessary to differentiate.

  • Full consistency in active-active configurations requires some form of distributed concurrency control, as well as synchronous writes. This can have a significant impact on performance and is not desirable. Instead, where consistency matters, you can use stretch clusters with synchronous replication. In this case, it is reasonable to assert a maximum latency between data centers, with guaranteed bandwidth.
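As an illustration, a federated cache is declared in the cache configuration file with a federated scheme. This is a minimal sketch; the scheme and topology names are hypothetical, the participants and topology themselves are defined in the operational override file, and the exact elements should be verified in Administering Oracle Coherence for your release.

    <!-- coherence-cache-config.xml fragment (hypothetical names) -->
    <caching-schemes>
      <federated-scheme>
        <scheme-name>federated</scheme-name>
        <backing-map-scheme>
          <local-scheme/>
        </backing-map-scheme>
        <autostart>true</autostart>
        <!-- The named topology (for example, active-active across two sites)
             is defined in the operational override file -->
        <topologies>
          <topology>
            <name>ActiveActive</name>
          </topology>
        </topologies>
      </federated-scheme>
    </caching-schemes>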

Coherence Persistent Cache

Cached data is persisted so that it can be quickly recovered after a catastrophic failure or after a cluster restart due to planned maintenance. In multi-data center environments, Oracle recommends using Coherence persistence and federated caching together to ensure the highest level of protection during failure or planned maintenance events.

Persistence is only available for distributed caches and requires the use of a centralized partition assignment strategy. There are two persistence modes:

  • Active persistence - cache contents are automatically persisted on all mutations and are automatically recovered on cluster/service startup.

  • On-demand persistence - a cache service is manually persisted and recovered upon request using the persistence coordinator.

See Persisting Caches in Administering Oracle Coherence.

Coherence GoldenGate Hot Cache

Within a single Coherence cluster with a mix of data, where some of the data is owned by the database and some of it is owned by Coherence, you can use both Coherence Read-Through cache and Coherence GoldenGate Hot Cache.

The choice between HotCache and read-through caching comes down to (1) whether read-through may lead to stale reads because the database is updated outside of Coherence, and (2) whether the real-time nature of HotCache is preferred for other reasons. There are also situations where HotCache and read-through are used together: for example, HotCache pushes real-time updates, while read-through reloads data that was removed from the cache by eviction or expiration.

Database

Oracle Database provides several products, such as Oracle Data Guard and Oracle Real Application Clusters (Oracle RAC), that can be integrated to provide high availability of the database in MAA architectures. See Disaster Recovery for Oracle Database. Regardless of the topology, the goal is to minimize the time that switchover and failover of your databases take.

To achieve high availability of your database for both planned and unplanned outages, Oracle recommends using an active-passive configuration with a combination of the following products:

  • Oracle RAC as the highly available database.

  • Oracle Data Guard because it eliminates single points of failure for mission critical Oracle Databases. It prevents data loss and downtime by maintaining a synchronized physical replica of a production database at a remote location. If the production database is unavailable for any reason, client connections can quickly, and in some configurations transparently, failover to the synchronized replica to restore service. Applications can take advantage of Oracle Data Guard with little or no application changes required.

    In a WebLogic Server continuous availability MAA architecture, Oracle recommends using the Oracle Data Guard maximum availability protection mode. This protection mode provides the highest level of data protection that is possible without compromising the availability of a primary database. It ensures zero data loss except in the case of certain double faults, such as failure of a primary database after failure of the standby database. See Oracle Data Guard Protection Modes in Data Guard Concepts and Administration.

    Note:

    Oracle Data Guard can be used only in active-passive configurations, but it guarantees zero data loss.

  • Oracle Active Data Guard, an option built on the infrastructure of Oracle Data Guard, allows a physical standby database to be opened read-only while changes are applied to it from the primary database. This enables read-only applications to use the physical standby with minimal latency between the data on the standby database and that on the primary database, even while processing very high transaction volumes at the primary database. This is sometimes referred to as real-time query.

    An Oracle Active Data Guard standby database is used for automatic repair of data corruption detected by the primary database, transparent to the application. In the event of an unplanned outage on the primary database, high availability is maintained by quickly failing over to the standby database. An Active Data Guard standby database can also be used to off-load fast incremental backups from the primary database because it is a block-for-block physical replica of the primary database.

    Oracle Active Data Guard provides a far sync feature that improves performance in zero data loss configurations. An Oracle Data Guard far sync instance is a remote Oracle Data Guard destination that accepts redo from the primary database and then ships that redo to other members of the Oracle Data Guard configuration. Unlike a standby database, a far sync instance does not have data files, cannot be opened, and cannot apply received redo. These limitations yield the benefit of using fewer disk and processing resources. More importantly, a far sync instance provides the ability to failover to a terminal database with no data loss if it receives redo data using synchronous transport mode and the configuration protection mode is set to maximum availability. See Far Sync in Oracle Data Guard Concepts and Administration.

  • Oracle Data Guard broker, a distributed management framework that automates and centralizes the creation, maintenance, and monitoring of Data Guard configurations. The operations that the broker can perform include creating, managing, and monitoring Data Guard configurations; invoking switchover or failover to initiate and control complex role changes across all databases in the configuration; and configuring failover to occur automatically.

    You can enable Oracle Data Guard fast-start failover to fail over automatically when the primary database becomes unavailable. When fast-start failover is enabled, the Oracle Data Guard broker determines if a failover is necessary and initiates the failover to the specified target standby database automatically, with no need for database administrator intervention. See Managing Fast-Start Failover in Oracle Data Guard Broker.

    Deploying Data Guard broker with Oracle Data Guard is essential before Oracle Site Guard can be used for disaster recovery.

  • Active GridLink makes the scheduled maintenance process at the database servers transparent to applications. When an instance is brought down for maintenance at the database server, draining ensures that all work using instances at that node completes and that idle sessions are removed. Sessions are drained without impacting in-flight work.

  • Application Continuity protects you during planned and unplanned outages. Use Application Continuity and Active GridLink together for maximum availability during unplanned down events.

  • Global Data Services (GDS) or Data Guard broker with Fast Application Notification (FAN) to drain work across sites. When you use Active Data Guard, work can complete before switching over to the secondary database.
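To illustrate the Data Guard broker recommendations above, a minimal DGMGRL session that builds a configuration, sets maximum availability protection, and enables fast-start failover might look like the following. The database names and connect identifiers are hypothetical, and a fast-start failover observer must also be started; see Oracle Data Guard Broker.

    DGMGRL> CREATE CONFIGURATION 'maa_config' AS
              PRIMARY DATABASE IS 'prim' CONNECT IDENTIFIER IS prim;
    DGMGRL> ADD DATABASE 'stby' AS CONNECT IDENTIFIER IS stby MAINTAINED AS PHYSICAL;
    DGMGRL> EDIT DATABASE 'prim' SET PROPERTY LogXptMode='SYNC';
    DGMGRL> EDIT DATABASE 'stby' SET PROPERTY LogXptMode='SYNC';
    DGMGRL> ENABLE CONFIGURATION;
    DGMGRL> EDIT CONFIGURATION SET PROTECTION MODE AS MAXAVAILABILITY;
    DGMGRL> ENABLE FAST_START FAILOVER;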

If your configuration requires that you have an active-active database configuration, Oracle recommends:

  • Oracle GoldenGate, which allows for databases to be in active-active mode. Both read and write services are active-active on the databases on both sites. See Configuring Oracle GoldenGate for Active-Active High Availability in Administering Oracle GoldenGate.

    When using Oracle GoldenGate, Application Continuity and Active GridLink can be used within a site (intra-site) to handle planned and unplanned database down events. Application Continuity cannot be used to replay transactions during failover or switchover operations across sites (inter-site), and it does not support failover to a logically different database, including Oracle Logical Standby and Oracle GoldenGate, because replay strictly requires a database with verified no transaction loss.

    Note:

    Because of the asynchronous replication nature of Oracle GoldenGate, applications must tolerate data loss due to network lag.

  • Implementing conflict resolution for all applications and schemas that run fully active-active with Oracle GoldenGate.

  • Designing the environment with web-tier site affinity so that a conversation sticks to one site and does not see stale data. Global Data Services (GDS) can provide affinity to the database that is local to the site and manage global services.

When environments require an active-active database, a combination of these technologies can be used to maximize availability and minimize data loss in planned maintenance events.

Oracle Enterprise Manager and Oracle Site Guard

Oracle Site Guard, a part of Oracle Enterprise Manager, provides flexible and seamless orchestration of switchovers and failovers between disaster recovery sites, thereby minimizing downtime for enterprise deployments. The disaster recovery automation features in Oracle Site Guard eliminate the need for human intervention and prevent human-induced errors in the switchover or failover process.

For information about how to configure Oracle Site Guard for switchovers and failovers between sites, see Configuring Oracle Site Guard in Site Guard Administrator's Guide.

For Site Guard best practices, see the Automating Disaster Recovery using Oracle Site Guard for Oracle Exalogic white paper at http://www.oracle.com/technetwork/database/availability/maa-site-guard-exalogic-exadata-1978799.pdf.