Topics include:
Continuous availability is the ability of a system to provide maximum availability by employing both high availability and disaster recovery solutions to ensure that applications are available when they are needed. Typically, a high availability solution provides redundancy in one data center. Disaster recovery solutions provide the ability to safeguard against natural or unplanned outages at a production site by having a recovery strategy for applications and data to a geographically separate standby site.
Oracle WebLogic Server Continuous Availability provides an integrated solution for building maximum availability architectures (MAA) that span data centers in distributed geographical locations. Integrated components include Oracle WebLogic Server, Oracle Coherence, Oracle Traffic Director, and Oracle Site Guard. The major benefits of this integrated solution are faster failover or switchover, increased overall application availability, data integrity, reduced human error and risk, recovery of work, and local access of real-time data.
The following list describes the common terminology that applies to continuous availability:
Active-active: An active-active solution deploys two or more active servers to improve scalability and provide high availability. In active-active deployments, all instances handle requests concurrently. When an entire domain or site fails, transactions can be recovered by an active server in a different domain either collocated in the same site or on a different site.
Active-passive: An active-passive solution involves setting up and pairing a standby site at a geographically different location with an active (production) site. The standby site may have equal or fewer services and resources compared to the production site. Application data, metadata, configuration data, and security data are replicated periodically to the standby site. The standby site is normally in a passive mode; it is started when the production site is not available. This model is usually adopted when the two sites are connected over a WAN, and network latency does not allow clustering across the two sites.
WebLogic Server cluster: A WebLogic Server cluster is a collection of WebLogic Server instances running simultaneously and working together to provide increased scalability and reliability. In a cluster, most resources and services are deployed identically to each Managed Server, enabling failover and load balancing.
Coherence cluster: A Coherence cluster is a collection of Java Virtual Machine (JVM) processes, called Coherence servers, that run Coherence. A Coherence cluster consists of multiple Coherence server instances that distribute data in-memory to increase application scalability, availability, and performance. Application data is automatically and transparently distributed and backed up across cluster members.
Stretch cluster: A stretch cluster is a cluster in which nodes can span data centers within a proximate geographical range, usually with guaranteed, relatively low latency networking between the sites. Stretch clusters are also referred to as extended clusters.
High availability: High availability is the ability of a system or device to be available when it is needed. A high availability architecture ensures that users can access a system without loss of service. Deploying a high availability system minimizes the time when the system is down, or unavailable, and maximizes the time when it is running, or available.
Disaster recovery: Disaster recovery is the ability to safeguard against natural or unplanned outages at a production site by having a recovery strategy for applications and data to a geographically separate standby site.
Switchover: Switchover is the process of reversing the roles of the production site and the standby site. Switchovers are planned operations done for periodic validation or to perform planned maintenance on the current production site. During a switchover, the current standby site becomes the new production site, and the current production site becomes the new standby site.
Failover: Failover is the process of making the current standby site the new production site after the production site becomes unexpectedly unavailable (for example, due to a disaster at the production site).
Latency: Latency is the time that it takes for packets to travel from one cluster to another, and can be a factor of many things, including the length of the path between the sites and any layers in between. Typically, latency is determined by using utilities such as traceroute or ping to send test packets from one site to another. The latency, or round-trip time (RTT), has a direct effect on the response time that any one user experiences when accessing the system. The effects of high latency can be seen even with only one user on the system.
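The effect of RTT on user response time can be sketched with a back-of-the-envelope calculation. The figures below (round trips per request, server-side processing time) are illustrative assumptions, not measurements:

```python
def response_time_ms(base_processing_ms, rtt_ms, round_trips):
    """Estimate end-to-end response time for a single request.

    Every network round trip between the sites adds one full RTT,
    which is why even a single user feels high inter-site latency.
    """
    return base_processing_ms + rtt_ms * round_trips

# The same request over a 2 ms metro link vs. a 40 ms WAN link,
# assuming 3 round trips and 10 ms of server-side processing:
metro = response_time_ms(10, 2, 3)    # 16 ms
wan = response_time_ms(10, 40, 3)     # 130 ms
```

Because each additional round trip adds a full RTT, chatty request patterns amplify inter-site latency; this is one reason stretch clusters require low-latency links between sites.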
Metropolitan area network (MAN): A MAN is a telecommunications or computer network that spans an entire city or campus. The MAN standard for data communication specified in the IEEE 802.6 standard is called distributed-queue dual-bus (DQDB). With DQDB, networks can extend up to 20 miles (30 km) and operate at speeds of 34–155 Mbit/s. A stretch cluster topology is appropriate in a MAN.
Wide area network (WAN): A WAN is a telecommunications or computer network that extends over large geographical distances and connects different LANs, MANs, and other localized computer networking architectures. Wide area networks are often established with leased telecommunication circuits. The distance and latency of a WAN need to be taken into consideration when determining the type of topology you can configure.
Continuous Availability provides maximum availability, reliability, and application stability during planned upgrades or unexpected failures. It builds on the existing high availability features in Oracle WebLogic Server, Oracle Coherence, and Oracle Fusion Middleware, and supports the key features described in the following sections.
Automated cross-site XA transaction recovery provides automatic recovery of XA transactions across an entire domain, or across an entire site with servers running in a different domain or at a different site. Cross-site transaction recovery uses the leasing framework to automate cross-site recovery. The leasing design follows the existing model for database leasing of transaction recovery service (TRS) migration within a cluster.
In active/active architectures, transactions can be recovered when an entire domain or site fails by having an active server running in a different domain either collocated at the same site or at a different site. In active/passive architectures, the server at the passive (standby) site at a different location can be started when the production site is no longer available.
For more information, see "Transaction Recovery Spanning Multiple Sites or Data Centers" in Developing JTA Applications for Oracle WebLogic Server. For design considerations when using cross-site XA transaction recovery in continuous availability architectures, see Cross-Site XA Transaction Recovery.
Automated cross-site XA transaction recovery also takes advantage of the WebLogic Server high availability features described in WebLogic Server High Availability Features.
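The lease-based ownership that drives cross-site recovery can be sketched as a toy model. This is not the WebLogic TRS implementation; the names (LeaseTable, renew) are hypothetical, and a real deployment stores leases in a shared database table:

```python
import time

class LeaseTable:
    """Toy database-leasing table: maps a TLog name to (holder, expiry)."""
    def __init__(self, lease_ttl=30.0):
        self.ttl = lease_ttl
        self.leases = {}          # tlog_name -> (server, expiry_time)

    def renew(self, tlog_name, server, now=None):
        """Acquire or renew the lease; succeeds only if the lease is
        unowned, already held by this server, or expired."""
        now = time.time() if now is None else now
        holder, expiry = self.leases.get(tlog_name, (None, 0.0))
        if holder in (None, server) or expiry < now:
            self.leases[tlog_name] = (server, now + self.ttl)
            return True           # lease acquired or renewed
        return False              # another live server owns recovery

# Site 1's server keeps renewing its own lease; once the lease expires
# (site failure), a server at site 2 can claim recovery of the TLog.
table = LeaseTable(lease_ttl=30.0)
assert table.renew("site1-server1-tlog", "site1-server1", now=0.0)
assert not table.renew("site1-server1-tlog", "site2-server1", now=10.0)
assert table.renew("site1-server1-tlog", "site2-server1", now=45.0)
```

The key design point the sketch shows is that ownership transfers automatically, without human intervention, once the failed site stops renewing its lease.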
WebLogic Server Zero Downtime Patching (ZDT Patching) provides an automated mechanism to orchestrate the rollout of patches while avoiding downtime or loss of sessions. It reduces risks and downtime of mission-critical applications that require availability and predictability while applying patches.
Using workflows that you define, you can patch or update any number of nodes in a domain with little or no manual intervention. Changes are rolled out to one node at a time, allowing a load balancer such as Oracle Traffic Director to redirect incoming traffic to the remaining nodes until the node has been updated.
You can use ZDT Patching to update Coherence applications while maintaining high availability of the Coherence data during the rollout process.
For more information, see Administering Zero Downtime Patching Workflows.
In WebLogic Server Multitenant environments, you can migrate running partition resource groups from one cluster or server to another within a domain without impacting application users. A key benefit of migrating resource groups is that it eliminates application downtime for planned events.
A resource group is a collection of (typically) related deployable resources, such as Java EE applications and the data sources, Java Message Service (JMS) artifacts, and other resources that the applications use. When you migrate a resource group, you change the virtual target used by the resource group from one physical target (a cluster or server) to another. After migration, the virtual target points to the new physical target.
For more information about resource groups and migration, see the following topics in Using WebLogic Server Multitenant:
The Oracle Coherence federated caching feature replicates cache data asynchronously across multiple geographically distributed clusters. Cached data is replicated across clusters to provide redundancy, off-site backup, and multiple points of access for application users in different geographical locations.
Federated caching supports multiple replication topologies. These include:
Active-passive: Replicates data from an active cluster to a passive cluster. The passive site supports read-only operations and off-site backup.
Active-active: Replicates data between active clusters. Data that is put into one active cluster is replicated at the other active clusters. Applications at different sites have access to a local cluster instance.
Hub and spoke: Replicates data from a single hub cluster to multiple spoke clusters. The hub cluster can only send data and the spoke clusters can only receive data. This topology requires multiple geographically dispersed copies of a cluster. Each spoke cluster can be used by local applications to perform read-only operations.
For more information, see “Federating Caches Across Clusters” in Administering Oracle Coherence.
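The replication topologies above can be illustrated with a toy model (plain dictionaries, synchronous copies). This is not the Coherence federation API; real federated caching replicates asynchronously and is configured declaratively in the cache configuration file:

```python
class FederatedCluster:
    """Toy model of a federated cache topology (illustrative only)."""
    def __init__(self, name, mode="active"):
        self.name, self.mode = name, mode
        self.data, self.peers = {}, []

    def federate_with(self, *clusters):
        self.peers.extend(clusters)

    def put(self, key, value):
        if self.mode == "passive":
            raise RuntimeError(f"{self.name} is passive (read-only)")
        self.data[key] = value
        for peer in self.peers:       # real federation replicates asynchronously
            peer.data[key] = value

# Active-active: a put at either site becomes visible at both.
ny, london = FederatedCluster("NewYork"), FederatedCluster("London")
ny.federate_with(london)
london.federate_with(ny)
ny.put("order:1", "shipped")          # london.data["order:1"] == "shipped"

# Hub and spoke: the hub sends, the spokes only receive.
hub = FederatedCluster("Hub")
spoke = FederatedCluster("Spoke", mode="passive")
hub.federate_with(spoke)
hub.put("price:42", 9.99)             # spoke serves read-only local reads
```

Active-passive is the hub-and-spoke case with a single spoke: the passive cluster receives replicated data and supports read-only access and off-site backup.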
The Oracle Coherence GoldenGate HotCache feature detects and reflects database changes in cache in real time. Third-party updates to the database can cause Coherence applications to work with data that can be stale and out-of-date. Coherence GoldenGate HotCache solves this problem by monitoring the database and pushing any changes into the Coherence cache in real time. It employs an efficient push model that processes only stale data. Low latency is assured because the data is pushed when the change occurs in the database.
In Maximum Availability Architectures, when the database is replicated to a secondary site during failover, the database changes are reflected to the cache using GoldenGate HotCache.
For more information, see “Integrating with Oracle Coherence GoldenGate HotCache” in Integrating Oracle Coherence.
Oracle Traffic Director is a fast, reliable, and scalable software load balancer that routes HTTP, HTTPS, and TCP traffic to application servers and web servers on the network. It distributes the requests that it receives from clients to available servers based on the specified load-balancing method, routes the requests based on specified rules, caches frequently accessed data, prioritizes traffic, and controls the quality of service. Oracle Traffic Director is a layer-7 software load balancer and is not meant to replace a global load balancer.
The architecture of Oracle Traffic Director enables it to handle large volumes of application traffic with low latency. For high availability, you can set up pairs of Oracle Traffic Director instances for either active-passive or active-active failover. As the volume of traffic to your network grows, you can easily scale the environment by reconfiguring Oracle Traffic Director with additional back-end servers to which it can route requests.
Oracle Traffic Director integrates tightly with Continuous Availability features such as Zero Downtime Patching and Live Partition Migration to provide zero downtime to applications, whether during a rolling upgrade or during partition migration. This integration allows applications to remain highly available without requiring any application changes.
For design considerations when using Oracle Traffic Director in continuous availability architectures, see Oracle Traffic Director. For additional information, see Administering Oracle Traffic Director.
Oracle Site Guard, a component of Oracle Enterprise Manager Cloud Control, is a disaster recovery solution that enables administrators to automate complete site switchover or failover, thereby minimizing downtime for enterprise deployments. Because Oracle Site Guard operates at the site level, it eliminates the need to tediously perform manual disaster recovery for individual site components like applications, middleware, databases, and so on. The traffic of an entire production site can be redirected to a standby site in a single operation.
Administrators do not require any special skills or domain expertise in areas like databases, applications, and storage replication. Oracle Site Guard can continuously monitor disaster recovery readiness, and it can do this without disrupting the production site.
You can manage an Oracle Site Guard configuration by using either the Enterprise Manager Command-Line Interface (EMCLI) or a compatible version of Oracle Enterprise Manager Cloud Control (Cloud Control).
For more information, see Site Guard Administrator's Guide.
In addition to the features described in Continuous Availability Key Features, Oracle Continuous Availability also takes advantage of the high availability features provided with WebLogic Server and Coherence, as described in the following sections.
The following WebLogic Server features can be used with the Oracle Continuous Availability features to provide the highest level of availability:
Clustering: A WebLogic Server cluster consists of multiple WebLogic Server instances running simultaneously and working together in a domain to provide increased scalability and reliability. For more information, see Clustering.
Singleton services: Services such as server and service migration, persistent data stores, and leasing make singleton services such as JMS and JTA highly available in a WebLogic Server cluster. For more information, see Singleton Services.
Session Replication: Session replication is a feature of WebLogic Server clusters that is used to replicate the data stored in a session across different server instances in the cluster. For more information, see Session Replication.
Transaction and data source features:
Active GridLink data sources that use Fast Connection Failover to provide rapid failure detection of Oracle Real Application Clusters (Oracle RAC) nodes, and failover to remaining nodes for continuous connectivity. For design considerations when using Active GridLink in continuous availability architectures, see Data Sources. For additional information, see “Using Active GridLink Data Sources” in Administering JDBC Data Sources for Oracle WebLogic Server.
Transaction logs in the database (JDBC TLogs) that store information about committed transactions coordinated by the server that may not have been completed. WebLogic Server uses the TLogs when recovering from system crashes or network failures. For more information, see “Using Transaction Log Files to Recover Transactions” in Developing JTA Applications for Oracle WebLogic Server.
No transaction TLog writes (No TLog) where you eliminate writes of the transaction checkpoints to the TLog store. For more information, see “XA Transactions without Transaction TLog Write” in Developing JTA Applications for Oracle WebLogic Server.
Logging Last Resource (LLR) transaction optimization which is a performance enhancement option that enables one non-XA resource to participate in a global transaction with the same ACID (atomicity, consistency, isolation, durability) guarantee as XA. For more information, see “Logging Last Resource Transaction Optimization” in Developing JTA Applications for Oracle WebLogic Server.
These features work with Oracle Data Guard, which replicates databases so that the transaction logs needed for recovery are highly available. For more information about Oracle Data Guard, see Data Guard Concepts and Administration.
For more information about high availability in Oracle Fusion Middleware, see High Availability Guide.
Coherence persistence is a set of tools and technologies that manage the persistence and recovery of Coherence distributed caches. Cached data is persisted so that it can be quickly recovered after a catastrophic failure or after a cluster restart due to planned maintenance. Persistence and federated caching can be used together as required. For more information about Coherence persistence, see “Persisting Caches” in Administering Oracle Coherence.
When an application asks the Coherence cache for an entry, and the entry does not exist in the cache but does exist in the database, Coherence updates the cache with the database value. This is called read-through caching. For more information, see Read-Through Caching in Developing Applications with Oracle Coherence.
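Read-through caching can be sketched in a few lines. This is an illustrative toy, not the Coherence CacheStore/CacheLoader API; the names (ReadThroughCache, load_from_db) are hypothetical:

```python
class ReadThroughCache:
    """Minimal read-through cache: on a miss, load the value from the
    backing store and populate the cache for subsequent reads."""
    def __init__(self, load_from_db):
        self.load_from_db = load_from_db
        self.cache = {}
        self.db_reads = 0

    def get(self, key):
        if key in self.cache:
            return self.cache[key]          # cache hit
        self.db_reads += 1
        value = self.load_from_db(key)      # read through to the database
        if value is not None:
            self.cache[key] = value         # future reads served from cache
        return value

db = {"sku-100": "widget"}
cache = ReadThroughCache(db.get)
first = cache.get("sku-100")    # miss: loaded from the database
second = cache.get("sku-100")   # hit: no database access this time
# first == second == "widget"; cache.db_reads == 1
```

The design choice that matters here is that the application always talks to the cache; the cache, not the application, decides when to reach the database.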
Coherence clusters consist of multiple Coherence server instances that distribute data in-memory to increase application scalability, availability, and performance. Application data is automatically and transparently distributed and backed up across cluster members. For more information about Coherence clusters, see “Configuring and Managing Coherence Clusters” in Administering Clusters for Oracle WebLogic Server.
Oracle WebLogic Server 12c provides strong support for integrating with the high availability (HA) features of the Oracle database. Integrating with these HA features minimizes database access time while allowing transparent access to rich pooling management functions that maximize both connection performance and application availability.
Oracle Continuous Availability takes advantage of the HA database features described in this section. The integration of all these products contributes to managing and orchestrating the failover and switchover of the Oracle Database, and makes the failover of the database fast and automatic.
Oracle Data Guard ensures high availability, data protection, and disaster recovery for enterprise data. It provides a comprehensive set of services that create, maintain, manage, and monitor one or more standby databases to enable production Oracle databases to survive disasters and data corruptions. Oracle Data Guard maintains these standby databases as transactionally consistent copies of the primary database. If the primary database becomes unavailable because of a planned or an unplanned outage, then Oracle Data Guard enables you to switch any standby database to the production role, thus minimizing the downtime associated with the outage. See Data Guard Concepts and Administration.
Oracle Active Data Guard is a comprehensive solution to eliminate single points of failure for mission-critical Oracle databases. It prevents data loss and downtime by maintaining a synchronized physical replica (standby) of a production database (primary). If there is an outage, client connections quickly fail over to the standby and resume service. Active Data Guard achieves the highest level of data protection through deep integration with Oracle Database, strong fault isolation, and unique Oracle-aware data validation. System and software defects, data corruption, and administrator error that affect a primary are not mirrored to the standby. Idle redundancy is eliminated by directing read-only workloads and backups to active standby databases for high return on investment. See Data Guard Concepts and Administration.
Oracle Data Guard broker logically groups these primary and standby databases into a broker configuration that enables the broker to manage and monitor them together as an integrated unit. It sends notifications to WebLogic Active GridLink which then makes new connections to the database in the failover site, and coordinates with Oracle Clusterware to fail over role-based services. Oracle Site Guard uses Oracle Data Guard broker to perform the failover or switchover of the databases. See Data Guard Broker.
Oracle Real Application Clusters (Oracle RAC) is a clustered version of Oracle Database that allows running multiple database instances on different servers in the cluster against a shared set of data files, also known as the database. The database spans multiple hardware systems and yet appears as a single unified database to the application. See Real Application Clusters Administration and Deployment Guide.
Oracle Clusterware manages the availability of instances of an Oracle RAC database. It works to rapidly recover failed instances to keep the primary database available. If Oracle Clusterware cannot recover a failed instance, then the broker continues to run automatically with one fewer instance. If the last instance of the primary database fails, then the broker provides a way to fail over to a specified standby database. If the last instance of the primary database fails, and fast-start failover is enabled, then the broker can continue to provide high availability by automatically failing over to a pre-determined standby database. See Clusterware Administration and Deployment Guide.
Oracle GoldenGate is a high-performance software application that uses log-based bidirectional data replication for real-time capture, transformation, routing, and delivery of database transactions across heterogeneous systems. Oracle GoldenGate allows for databases to be in active-active mode. Applications that use Oracle GoldenGate must have tolerance for data loss due to the asynchronous nature of Oracle GoldenGate replication. See Administering Oracle GoldenGate for Windows and UNIX.
Oracle Database 12c Global Data Services (GDS) streamlines the delivery of database services on a global scale, which is key to deploying databases in MAA environments. These technologies oversee replication and failover while performing load balancing within and across data centers, optimizing resource utilization and streamlining database management practices in a distributed database environment. GDS works by enabling a Global Service across Oracle Real Application Clusters (RAC) and single-instance Oracle databases interconnected via Oracle Data Guard, Oracle GoldenGate, or any other replication technology. Client access to this distributed infrastructure is completely transparent. GDS implementations are easy to apply to Oracle WebLogic Server with minimal changes. See Global Data Services Concepts and Administration Guide.
Application Continuity (AC), available with the Oracle RAC, Oracle RAC One Node, and Oracle Active Data Guard options, masks outages from end users and applications by recovering in-flight database sessions following recoverable outages. Application Continuity enables replay, in a non-disruptive and rapid manner, of a database request when a recoverable error makes the database session unavailable. The request can contain transactional and nontransactional calls to the database and calls that are executed locally at the client or middle tier. After a successful replay, the application can continue where that database session left off. See Ensuring Application Continuity in Oracle Database Development Guide.
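The replay behavior can be sketched as follows. This toy retry loop stands in for Application Continuity's driver-level replay; the real feature captures and replays the full database session state, which a simple retry does not:

```python
class RecoverableError(Exception):
    """Stand-in for a recoverable database outage (hypothetical)."""

def with_replay(request, max_attempts=2):
    """Replay a database request once after a recoverable error,
    so the caller never sees the outage (illustrative sketch only)."""
    for attempt in range(1, max_attempts + 1):
        try:
            return request()
        except RecoverableError:
            if attempt == max_attempts:
                raise                     # outage was not recoverable in time

calls = {"n": 0}
def flaky_request():
    calls["n"] += 1
    if calls["n"] == 1:
        raise RecoverableError()          # first attempt hits the outage
    return "committed"

result = with_replay(flaky_request)       # "committed": the error is masked
```

From the application's point of view, the request simply succeeds; the recoverable error never propagates to the end user.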
WebLogic Server Active GridLink integrates with Oracle Database 12c features such as Application Continuity and Global Data Services to provide the highest possible availability. Application Continuity replays transactions when unplanned database outages are encountered, so end-user applications do not receive errors or even know that there have been outages. Together, Active GridLink, Application Continuity, and Data Guard provide protection for planned and unplanned database outages in highly available environments.
Continuous Availability for Oracle WebLogic Server
12c (12.2.1.1.0)
E69719-02
August 2016
Documentation that describes the features and benefits of Oracle Continuous Availability, the architectures it supports and how you can use the continuous availability features in the supported architectures, and design considerations and recommendations.
Oracle Fusion Middleware Continuous Availability for Oracle WebLogic Server, 12c (12.2.1.1.0)
E69719-02
Copyright © 2015, 2016, Oracle and/or its affiliates. All rights reserved.
This software and related documentation are provided under a license agreement containing restrictions on use and disclosure and are protected by intellectual property laws. Except as expressly permitted in your license agreement or allowed by law, you may not use, copy, reproduce, translate, broadcast, modify, license, transmit, distribute, exhibit, perform, publish, or display any part, in any form, or by any means. Reverse engineering, disassembly, or decompilation of this software, unless required by law for interoperability, is prohibited.
The information contained herein is subject to change without notice and is not warranted to be error-free. If you find any errors, please report them to us in writing.
If this is software or related documentation that is delivered to the U.S. Government or anyone licensing it on behalf of the U.S. Government, then the following notice is applicable:
U.S. GOVERNMENT END USERS: Oracle programs, including any operating system, integrated software, any programs installed on the hardware, and/or documentation, delivered to U.S. Government end users are "commercial computer software" pursuant to the applicable Federal Acquisition Regulation and agency-specific supplemental regulations. As such, use, duplication, disclosure, modification, and adaptation of the programs, including any operating system, integrated software, any programs installed on the hardware, and/or documentation, shall be subject to license terms and license restrictions applicable to the programs. No other rights are granted to the U.S. Government.
This software or hardware is developed for general use in a variety of information management applications. It is not developed or intended for use in any inherently dangerous applications, including applications that may create a risk of personal injury. If you use this software or hardware in dangerous applications, then you shall be responsible to take all appropriate fail-safe, backup, redundancy, and other measures to ensure its safe use. Oracle Corporation and its affiliates disclaim any liability for any damages caused by use of this software or hardware in dangerous applications.
Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners.
Intel and Intel Xeon are trademarks or registered trademarks of Intel Corporation. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. AMD, Opteron, the AMD logo, and the AMD Opteron logo are trademarks or registered trademarks of Advanced Micro Devices. UNIX is a registered trademark of The Open Group.
This software or hardware and documentation may provide access to or information about content, products, and services from third parties. Oracle Corporation and its affiliates are not responsible for and expressly disclaim all warranties of any kind with respect to third-party content, products, and services unless otherwise set forth in an applicable agreement between you and Oracle. Oracle Corporation and its affiliates will not be responsible for any loss, costs, or damages incurred due to your access to or use of third-party content, products, or services, except as set forth in an applicable agreement between you and Oracle.
Topics include:
The design considerations and recommendations in this section apply to all the supported MAA solutions that can be used to provide continuous availability. MAA architectures span data centers in distributed geographical locations. See Supported MAA Architectures for Continuous Availability.
Oracle Maximum Availability Architecture (MAA) is Oracle's best practices blueprint based on proven Oracle high availability technologies, expert recommendations and customer experiences. The goal of MAA is to achieve optimal high availability for Oracle customers at the lowest cost and complexity.
Topics in this section include:
The design considerations and recommendations provided in this chapter apply to the following potential failure scenarios:
Full site failure - With full site failure, the database, the middle-tier application server, and all user connections fail over to a secondary site that is prepared to handle the production load.
Partial site failure - In the context of this document, partial failures occur at the mid-tier. Partial site failures at the mid-tier can consist of a failure of the entire mid-tier (WebLogic Server and Coherence), a WebLogic Server-only failure, a Coherence cluster failure, or a failure in one instance of Oracle Traffic Director when two instances are configured for high availability.
Network partition failure - The communication between sites fails.
Maintenance outage - During planned maintenance, all components of a site are brought down gracefully, and a switchover takes place from one site to the other.
The behavior of the continuous availability features in the different failure scenarios is described in the following sections:
When a global load balancer is deployed in front of the production and standby sites, it provides fault detection services and performance-based routing redirection for the two sites. Additionally, the load balancer can provide authoritative DNS name server equivalent capabilities.
In the event of a primary-site disaster and after the standby site has assumed the production role, a global load balancer is used to reroute user requests to the standby site. Global load balancers such as F5 BIG-IP Global Traffic Manager (GTM) and Cisco Global Site Selector (GSS) also handle DNS server resolution (by offloading the resolution process from the traditional DNS servers).
During normal operations, the global load balancer can be configured with the production site's load balancer name-to-IP mapping. When a DNS switchover is required, this mapping in the global load balancer is changed to map to the standby site's load balancer IP. This allows requests to be directed to the standby site, which now has the production role.
This method of DNS switchover works for both site switchover (planned) and failover (unplanned). One advantage of using a global load balancer is that the time for a new name-to-IP mapping to take effect can be almost immediate. The downside is that an additional investment must be made for the global load balancer. For instructions for performing a DNS switchover, see “Manually Changing DNS Names” in Disaster Recovery Guide.
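The name-to-IP remapping can be sketched as a toy model. A real global load balancer does this through its DNS integration; the host names and addresses below are hypothetical (documentation-range IPs):

```python
class GlobalLoadBalancer:
    """Toy global load balancer: a name-to-IP mapping with switchover."""
    def __init__(self, name, production_ip, standby_ip):
        self.name = name
        self.production_ip, self.standby_ip = production_ip, standby_ip
        self.mapping = {name: production_ip}

    def resolve(self, name):
        return self.mapping[name]

    def switchover(self):
        """Point the site name at the standby load balancer's IP."""
        self.production_ip, self.standby_ip = self.standby_ip, self.production_ip
        self.mapping[self.name] = self.production_ip

glb = GlobalLoadBalancer("app.example.com", "192.0.2.10", "198.51.100.10")
glb.resolve("app.example.com")   # "192.0.2.10" (production site)
glb.switchover()
glb.resolve("app.example.com")   # "198.51.100.10" (standby has the production role)
```

Because the remapping is a single operation at the global load balancer, the new name-to-IP mapping can take effect almost immediately, without waiting for DNS caches to expire.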
In an MAA environment you should configure an Oracle Traffic Director failover group. Combining two instances of Oracle Traffic Director using one or two virtual IP (VIP) addresses ensures high availability. Both the hosts in the failover group must run the same operating system version, use identical patches and service packs, and run Oracle Traffic Director instances of the same configuration.
When two Oracle Traffic Director instances are grouped by a virtual IP address (VIP) for high availability, they are in active-passive failover mode. The VIP receives requests and routes them to the Oracle Traffic Director instance that is designated as the primary instance. If the primary instance is not reachable, requests are routed to the backup instance.
For active-active failover mode, two failover groups are required. Each failover group must have a unique VIP, and consist of the same nodes, each with the primary and backup roles reversed. Each instance in the failover group is designated as the primary instance for one VIP and the backup for the other VIP. For more information about failover groups, see Creating and Managing Failover Groups in Administering Oracle Traffic Director.
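The active-active arrangement can be pictured as two VIPs whose primary and backup roles are mirrored across the same two instances. The sketch below models only the routing rule; the instance names and VIP labels are invented for illustration:

```python
# Two failover groups over the same two Oracle Traffic Director
# instances, with primary and backup roles reversed per VIP.
# All identifiers are hypothetical.

failover_groups = {
    "vip-1": {"primary": "otd-a", "backup": "otd-b"},
    "vip-2": {"primary": "otd-b", "backup": "otd-a"},
}

def route(vip, healthy):
    """Return the instance that serves requests arriving on this VIP."""
    group = failover_groups[vip]
    if group["primary"] in healthy:
        return group["primary"]
    return group["backup"]

# Both instances up: each VIP is served by its own primary
# (active-active).
print(route("vip-1", {"otd-a", "otd-b"}))  # otd-a
print(route("vip-2", {"otd-a", "otd-b"}))  # otd-b

# otd-a fails: its VIP fails over, and otd-b now serves both VIPs.
print(route("vip-1", {"otd-b"}))           # otd-b
```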
If you are running Oracle Traffic Director on a Linux platform in high availability mode (either active-passive or active-active), ensure that Oracle Traffic Director is the only consumer of the Keepalived process on all of the hosts used for configuring the failover group. Keepalived is a process that runs on Linux; it is a health-check framework and implements a hot standby protocol. No other applications that require the Keepalived process can run on these hosts.
Although Oracle HTTP Server and Apache plug-ins can also be used in Continuous Availability architectures to load balance requests to WebLogic Server, they do not provide the same integration that you receive with Oracle Traffic Director.
The following sections provide some specific guidelines for configuring Oracle Traffic Director for continuous availability.
Network Preparation
Allocate one virtual IP address for each site, to be used by that site's Oracle Traffic Director failover group. The addresses must belong to the same subnet as the nodes in the failover group, and they should be DNS resolvable and accessible over the network. Ensure that the network interface on which the failover-group virtual IP is created is the same on all the Administration Node hosts; this is a requirement for smooth migration of the failover group from the primary site to the standby site.
At the standby site, ensure that the primary site's host names and the primary site's virtual IP resolve to the IP addresses of the corresponding peer systems. This can be set up by creating aliases for host names in the /etc/hosts file. For both disaster recovery deployment options, make sure aliases for all the systems and the virtual IP names exist.
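For example, the /etc/hosts file on a standby-site host might alias the primary site's names to the corresponding local peers. All host names and addresses below are hypothetical:

```
# /etc/hosts on a standby-site host (hypothetical names and addresses):
# the primary site's host names and virtual IP name resolve to the
# corresponding standby-site peer systems.
10.0.2.10   otdvip.example.com      # failover-group virtual IP
10.0.2.21   otdadmin.example.com    # Administration Server host
10.0.2.22   otdnode1.example.com    # Administration Node host
```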
Best Practices
Oracle recommends that you test the standby site periodically by switching its role with the current primary site:
Follow the site switchover procedure to switch over the standby site to the new primary site.
Once testing is complete, follow the site switchback procedure to reverse the roles.
Periodic testing validates that both the primary and standby sites are completely functional and mitigates the risk of failure at both sites. It also validates the switchover and switchback procedures.
Do not configure project-level and share-level replication within the same project.
Ensure that the Oracle Traffic Director setup resides on shared storage and gets replicated to the remote site, making the Oracle Traffic Director binaries and latest configuration data available at the standby site during a site failure or site maintenance event. All the Oracle Traffic Director binaries, configuration data, logs, and security data should be replicated to the remote site using existing replication technology.
Use the Scheduled replication mode for projects and shares in these cases:
Data does not change frequently.
The Recovery Point Objective value falls within your scheduled replication window.
Use the Continuous replication mode for projects and shares in these cases:
The standby site is required to be as close as possible to the primary site.
The Recovery Point Objective is within a few seconds, and very little data loss is acceptable.
Data is of a critical nature.
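The two rules above reduce to a simple decision driven by the Recovery Point Objective and the rate of change of the data. The sketch below is a hedged decision helper; the 60-second cutoff is an assumption made for illustration, not a value prescribed by Oracle:

```python
# Illustrative helper for choosing a replication mode from the Recovery
# Point Objective (RPO). The 60-second cutoff is an assumption for this
# sketch, not an Oracle-specified threshold.

def replication_mode(rpo_seconds, data_changes_frequently):
    if rpo_seconds <= 60 or data_changes_frequently:
        # The standby must stay as close as possible to the primary,
        # and only a few seconds of data loss is tolerable.
        return "continuous"
    # Data changes rarely and the RPO fits a scheduled window.
    return "scheduled"

print(replication_mode(5, data_changes_frequently=True))      # continuous
print(replication_mode(3600, data_changes_frequently=False))  # scheduled
```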
Snapshots and clones can be used at the target site to offload backup, test, and development types of environments.
When configuring a local standby site (disaster recovery within the data center), consider disabling SSL on the replication channel. Removing the encryption algorithm enables a higher replication throughput.
Always enable SSL when replication is across a wide area network. For the OTD instance synchronization-based standby disaster recovery option, a remote sync tool and a time-based scheduler application must be available on the Administration Server host at each site to transfer OTD instance changes between sites. Ensure that the NIS settings are configured and the NIS service is started.
Configuring a web tier is optional in continuous availability MAA architectures. Web Tier products such as Oracle HTTP Server (OHS) and Oracle WebLogic Server Proxy Plug-In are designed to efficiently front-end WebLogic Server applications.
When possible, Oracle recommends using Oracle Traffic Director (OTD) to handle all load balancing to WebLogic Server instances. Oracle Traffic Director integrates with Continuous Availability features such as Zero Downtime Patching, to provide maximum availability to applications during the rollout process, and Live Partition Migration, to provide continuous availability while the applications and resources in an MT partition are migrated from one WebLogic cluster to another. Consider front-ending with Oracle HTTP Server or WebLogic Server Proxy Plug-In instead in cases where you have:
Static content such as HTML, images, JavaScript, and so on.
Middleware configurations that already make use of Oracle HTTP Server or WebLogic Server Proxy Plug-In.
Use existing replication technology or methods to keep Web Tier binaries and configuration consistent between sites.
OHS and WebLogic Server Proxy Plug-in can be used with other WebLogic Server Continuous Availability features but might require manual intervention and do not offer the same level of availability as Oracle Traffic Director.
For more information about Oracle HTTP Server and WebLogic Server Proxy Plug-Ins, see:
The following sections provide the design considerations for WebLogic Server in a continuous availability MAA architecture:
A WebLogic Server cluster consists of multiple WebLogic Server instances running simultaneously and working together to provide increased scalability, reliability, and high availability. A cluster appears to clients as a single WebLogic Server instance. The server instances that constitute a cluster can run on the same machine or be located on different machines. You can increase a cluster's capacity by adding server instances to the cluster on an existing machine, or you can add machines to the cluster to host the additional server instances. Each server instance in a cluster must run the same version of WebLogic Server.
WebLogic Server supports two types of clusters:
Dynamic clusters - Dynamic clusters consist of server instances that can be dynamically scaled up to meet the resource needs of your application. When you create a dynamic cluster, the dynamic servers are preconfigured and automatically generated for you, enabling you to easily scale up the number of server instances when you need additional capacity. Dynamic clusters also allow you to define and configure rules and policies that scale the cluster up or down.
In dynamic clusters, the Managed Server configurations are based on a single, shared template. This greatly simplifies the configuration of clustered Managed Servers, allows servers to be assigned dynamically to machine resources, and achieves greater utilization of resources with minimal configuration.
Dynamic cluster elasticity allows the cluster to be scaled up or down based on conditions identified by the user. Scaling a cluster can be performed on-demand (interactively by the administrator), at a specific date or time, or based on performance as seen through various server metrics.
When shrinking a dynamic cluster, the Managed Servers are shut down gracefully and the work/transactions are allowed to complete. If needed, singleton services are automatically migrated to another instance in the cluster.
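The performance-based scaling described above is driven by server metrics and user-defined policies. The following sketch models only the decision rule; the metric, thresholds, and parameter names are invented for illustration and are not WebLogic's actual policy API:

```python
# Toy scaling policy for a dynamic cluster: compare a load metric
# against user-defined thresholds while respecting the cluster's size
# limits. The metric and thresholds are illustrative assumptions.

def scaling_action(avg_busy_threads, running, min_servers, max_servers,
                   scale_up_at=50, scale_down_at=10):
    if avg_busy_threads > scale_up_at and running < max_servers:
        # Start another preconfigured dynamic server instance.
        return "scale-up"
    if avg_busy_threads < scale_down_at and running > min_servers:
        # Shut an instance down gracefully; in-flight work completes
        # and singleton services are migrated if needed.
        return "scale-down"
    return "no-op"

print(scaling_action(75, running=2, min_servers=1, max_servers=4))  # scale-up
print(scaling_action(3,  running=2, min_servers=1, max_servers=4))  # scale-down
print(scaling_action(30, running=2, min_servers=1, max_servers=4))  # no-op
```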
Static clusters - In a static cluster, an administrator must configure new servers, add them to the cluster, and start and stop them manually. Expansion and shrinking of the cluster is not automatic.
In most cases, Oracle recommends the use of dynamic clusters to provide elasticity to WebLogic deployments. The benefits of dynamic clusters are minimal configuration, elasticity of clusters, and proper migration of JMS and JTA singleton services when shrinking the cluster.
However, there are some instances where static clusters should be used:
If you need to manually migrate singleton services. Dynamic clusters do not support manual migration of singleton services.
If your configuration consists of Oracle Fusion Middleware upper stack products such as Oracle SOA Suite and Oracle Business Process Management. These products do not provide support for dynamic clusters in this release.
A singleton service is a service running on a Managed Server that is available on only one member of a cluster at a time. WebLogic Server allows you to automatically monitor and migrate singleton services from one server instance to another.
Pinned services, such as JMS-related services and user-defined singleton services are hosted on individual server instances within a WebLogic cluster. To ensure that singleton JMS or JTA services do not introduce a single point of failure for dependent applications in the cluster, WebLogic Server can be configured to automatically or manually migrate them to any server instance in the cluster.
Within an application, you can define a singleton service that can be used to perform tasks that you want executed on only one member of a cluster at any given time. Automatic singleton service migration allows the automatic health monitoring and migration of user-defined singleton services.
Singleton services described in the following sections include:
Oracle WebLogic Server supports two distinct types of automatic migration mechanisms:
Whole server migration, where a migratable server instance, and all of its services, is migrated to a different physical machine upon failure. When a failure occurs in a server that is part of a cluster that is configured with server migration, the server is restarted on any of the other machines that host members of the cluster. For details about whole server migration, see “Whole Server Migration” in Administering Clusters for Oracle WebLogic Server.
Service migration, where failed services are migrated from one server instance to a different available server instance within the cluster. In some circumstances, service migration performs much better than whole server migration because only the singleton services are migrated, as opposed to the entire server. For details about service migration, see “Service Migration” in Administering Clusters for Oracle WebLogic Server.
Both whole server migration and service migration require that you configure a database leasing table. For more information, see Leasing.
Instructions for configuring WebLogic Server to use server and service migration in an MAA environment are provided in “Using Whole Server Migration and Service Migration in an Enterprise Deployment” in Enterprise Deployment Guide for Oracle SOA Suite.
There are two kinds of persistent data stores for Oracle WebLogic Server transaction logs (TLogs) and Oracle WebLogic Server JMS: database-based and file-based.
Keeping persistent stores in the database provides the replication and high availability benefits inherent in the underlying database system. With JMS messages, TLogs, and application data in the same database, and replication handled by Oracle Data Guard, cross-site synchronization is simplified and the need for a shared storage subsystem such as a NAS or a SAN in the middle tier is alleviated. See Database.
However, storing TLogs and JMS stores in the database has a penalty on system performance. This penalty is increased when one of the sites needs to cross communicate with the database on the other site. Ideally, from a performance perspective, shared storage that is local to each site should be used for both types of stores and the appropriate replication and backup strategies at storage level should be provisioned in order to guarantee zero data loss without performance degradation. Whether using database stores will be more suitable than shared storage for a system depends on the criticality of the JMS and transaction data, because the level of protection that shared storage provides is much lower than the database guarantees.
In active-active and active-passive topologies, keeping the data stores in the database is a requirement. Oracle recommends keeping WebLogic Server stores, such as JMS and JTA stores, in a highly available database such as Oracle RAC, and connecting to the database using Active GridLink data sources for maximum performance and availability.
In the case of an active-active stretch cluster, you can choose between keeping the data stores in a shared storage sub-system such as a NAS or a SAN, or in the database.
Leasing is the process WebLogic Server uses to manage services that are required to run on only one member of a cluster at a time. Leasing ensures exclusive ownership of a cluster-wide entity. Within a cluster, there is a single owner of a lease. Additionally, leases can fail over in case of server or cluster failure, which helps avoid a single point of failure. For details about leasing, see “Leasing” in Administering Clusters for Oracle WebLogic Server.
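The lease semantics described above (single owner, timed expiry, failover when the owner stops renewing) can be sketched as a table keyed by service name. This is a toy model for illustration only; the schema, names, and timeout are assumptions, not WebLogic's actual leasing table:

```python
import time

# Toy model of a database leasing table: one row per cluster-wide
# service, recording the current owner and the lease expiry time.
# The schema and timeout are illustrative assumptions.

LEASE_TIMEOUT = 30  # seconds; illustrative value

leases = {}  # service name -> (owner, expiry timestamp)

def try_acquire(service, server, now=None):
    """Acquire the lease if it is free or expired; renew if owned."""
    now = time.time() if now is None else now
    entry = leases.get(service)
    if entry is None or entry[1] <= now or entry[0] == server:
        leases[service] = (server, now + LEASE_TIMEOUT)
        return True
    return False

# server1 owns the lease; server2 cannot take it while it is valid.
assert try_acquire("singleton-jms", "server1", now=0)
assert not try_acquire("singleton-jms", "server2", now=10)

# After server1 stops renewing, the lease expires and ownership
# fails over to server2.
assert try_acquire("singleton-jms", "server2", now=31)
```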
For database leasing we recommend the following:
A highly available database such as Oracle RAC and Active GridLink (AGL).
A standby database, and Oracle Data Guard to provide replication between the two databases.
When using database leasing, Oracle WebLogic Servers may shut down if the database remains unavailable (during switchover or failover) for a period that is longer than their server migration fencing times. You can adjust the server migration fencing times as described in the following topics in Administering Clusters for Oracle WebLogic Server:
WebLogic Server provides three methods for replicating HTTP session state across servers in a cluster:
In-memory replication - Using in-memory replication, WebLogic Server copies a session state from one server instance to another. WebLogic Server creates a primary session state on the server instance to which the client first connects, and a secondary replica on another server instance in the cluster. The replica is kept up to date so that it can be used if the server instance that hosts the servlet fails.
JDBC-based persistence - With JDBC-based persistence, WebLogic Server maintains the HTTP session state of a servlet or JSP in a database. For more information on this and other persistence mechanisms, see “Configuring Session Persistence” in Developing Web Applications, Servlets, and JSPs for Oracle WebLogic Server.
Coherence*Web - Coherence*Web is not a replacement for WebLogic Server's in-memory HTTP state replication services. However, you should consider using Coherence*Web when an application has large HTTP session state objects, when running into memory constraints due to storing HTTP session object data, or if you want to reuse an existing Coherence cluster. For more information, see “Using Coherence*Web with WebLogic Server” in Administering HTTP Session Management with Oracle Coherence*Web.
Depending on the latency model, tolerance to session loss, and performance requirements, choose the method that best fits your needs.
When the latency is small, such as in MAN networks (stretch cluster topology), Oracle recommends WebLogic Server in-memory session replication. However, if a site experiences a failure there is the possibility of session loss.
When the latency is large (WAN networks), Active-Active, or Active-Passive topologies, and when your applications cannot tolerate session loss, Oracle recommends database session replication.
In most cases, in-memory session replication performs much better than database session replication. For more information, see