4 Design Considerations for Active-Active Application Tier Topology

In an active-active application tier topology, two or more active server instances at distributed geographic locations are deployed to handle requests concurrently and thereby improve scalability and provide high availability. When designing an active-active solution for Oracle WebLogic Server Continuous Availability, consider Oracle’s best practices.

In addition to the general best practices recommended for all continuous availability MAA architectures as described in Common Design Considerations for Continuous Availability, the following sections describe the design considerations and failure scenarios that apply to the MAA architecture shown in Active-Active Application Tier with an Active-Passive Database Tier.

Active-Active Application Tier With Active-Passive Database Tier Design Considerations

Consider Oracle’s best practice design recommendations for continuous availability in an active-active application tier topology with an active-passive database tier.

To take full advantage of continuous availability features in an active-active topology, consider the following:

  • Active-active domains must be configured with a symmetric topology. The domains must be similar and use the same domain configuration, such as domain and server names, port numbers, user accounts, load balancers and virtual server names, and the same version of the software. Host names (not static IPs) must be used to specify the listen address of the Managed Servers. In this topology, if you configure cross-site transaction recovery, the configurable MBean properties SiteName and RecoverySiteName are site specific. You can use any replication technology or method that you currently use to keep these sites in sync.

    Note:

    When you use cross-site XA transaction recovery, some configuration parameters (see Table 3-1), as well as the site name and recovery site name, are specific to each site. Do not overwrite these parameters and names when replicating the configuration.

  • In this topology, network latency is normally high because the sites are connected over a WAN. If applications require session replication between sites, you must choose either database session replication or Coherence*Web. See Session Replication.

  • Server and service migration only applies intra-site (within a site) in an active-active topology. See Server and Service Migration.

  • JMS is supported only intra-site in this topology. JMS recovery during failover or planned maintenance is not supported across sites.

  • Zero Downtime Patching is only supported intra-site (within a site) in an active-active topology. You can upgrade your WebLogic homes, Java, and applications in each site independently. Keep upgrade versions in sync to keep the domains symmetric at both sites.

  • You can design applications to minimize data loss during failures by combining different Continuous Availability features. For example, you can use a combination of cross-site XA transaction recovery, Coherence federated cache, Coherence HotCache or Coherence Read-Through cache.

    If the Coherence data is backed up in the database and a network partition failure occurs, federated caching cannot perform the replication, and the data on the two sites becomes inconsistent because each Coherence cluster can continue doing work independently. Once communication between the sites resumes, the backed-up data is pushed from the database to Coherence through Coherence HotCache or Coherence Read-Through cache, and the data in the Coherence cache is eventually synchronized. See Coherence.

  • In an active-active topology, Oracle Site Guard can only switch over or fail over the database. See Active-Active Application Tier With Active-Passive Database Tier Failure Scenarios.

  • Figure 4-1 and Figure 4-2 show results from benchmark testing with the default cross-site transaction recovery parameter settings. The figures illustrate the latency incurred in recovering transactions across sites when the mid-tier failed. Figure 4-1 shows that when the latency between sites was 1 second, the average transaction recovery time was between 170 and 180 ms. Figure 4-2 shows that when the latency between sites was 77 ms, the average transaction recovery time was between 90 and 100 ms.

    As latency increases, the cross-site transaction recovery parameters CrossSiteRecoveryLeaseExpiration, CrossSiteRecoveryRetryInterval, and CrossSiteRecoveryLeaseUpdate must be increased to compensate. Tuning these values is especially important if the database that holds the cross-site transaction recovery leasing table is remote to one of the sites. See Cross-Site XA Transaction Recovery.
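The tuning guidance above can be made concrete with a small sanity check. The following is a minimal sketch, not an Oracle-provided tool: the class and function names, the safety factor, and any sample values are hypothetical. The two rules it checks (the lease update interval should be shorter than the lease expiration, and each value should be several times the observed latency to the leasing table) are assumptions drawn from general lease-based design, not from the WebLogic documentation. All values are in seconds.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LeaseSettings:
    """Cross-site recovery settings in seconds (caller supplies the values)."""
    lease_expiration: float  # CrossSiteRecoveryLeaseExpiration
    retry_interval: float    # CrossSiteRecoveryRetryInterval
    lease_update: float      # CrossSiteRecoveryLeaseUpdate


def check_lease_settings(settings: LeaseSettings,
                         latency_s: float,
                         safety_factor: float = 5.0) -> List[str]:
    """Return warnings for settings that look too tight for the given latency.

    Both the rules and safety_factor are illustrative assumptions only.
    """
    warnings = []
    # Assumed rule 1: a lease must be renewed before it can expire.
    if settings.lease_update >= settings.lease_expiration:
        warnings.append(
            "lease update interval is not shorter than lease expiration")
    # Assumed rule 2: every interval should dwarf the inter-site latency.
    min_value = latency_s * safety_factor
    for name, value in (("lease_expiration", settings.lease_expiration),
                        ("retry_interval", settings.retry_interval),
                        ("lease_update", settings.lease_update)):
        if value < min_value:
            warnings.append(
                f"{name}={value}s is below {min_value}s "
                f"({safety_factor}x the {latency_s}s latency)")
    return warnings
```

Calling check_lease_settings with the values configured for a domain and the measured latency between the sites returns an empty list when nothing looks too tight under these assumed rules. The appropriate margins for a real deployment should come from testing, as in the benchmarks shown in the figures.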

Figure 4-1 Cross-Site Transaction Recovery with 1 Second Latency Between Sites


Figure 4-2 Cross-Site Transaction Recovery with 77ms Latency Between Sites

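The symmetric-topology requirement and the note on site-specific parameters in the design considerations above can be illustrated with a short sketch. It treats each site's configuration as a flat dictionary and copies everything from the source site except a set of protected keys. This is an illustration of the idea only: the dictionary representation, the function name, and the sample values are hypothetical, and the protected set lists only SiteName and RecoverySiteName. The site-specific parameters from Table 3-1 would need to be added to it.

```python
# Keys that must keep their local values when configuration is replicated.
# Only the two names given in the text are listed here; the site-specific
# parameters from Table 3-1 would have to be added.
SITE_SPECIFIC_KEYS = frozenset({"SiteName", "RecoverySiteName"})


def replicate_config(source: dict, target: dict,
                     protected=SITE_SPECIFIC_KEYS) -> dict:
    """Return a copy of target updated from source, skipping protected keys."""
    merged = dict(target)
    for key, value in source.items():
        if key not in protected:
            merged[key] = value
    return merged


# Hypothetical sample data for two sites.
site1 = {"SiteName": "site1", "RecoverySiteName": "site2",
         "ListenPort": 7003, "DomainName": "ca_domain"}
site2 = {"SiteName": "site2", "RecoverySiteName": "site1",
         "ListenPort": 7001, "DomainName": "ca_domain"}

updated_site2 = replicate_config(site1, site2)
```

In this sample, updated_site2 takes its ListenPort from site1 while keeping its own SiteName and RecoverySiteName, which is the behavior the note asks for when configuration is replicated between sites.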

Active-Active Application Tier With Active-Passive Database Tier Failure Scenarios

Learn how the Continuous Availability features are used in each of the different potential failure scenarios for an active-active application tier topology.

Table 4-1 describes the different failure scenarios and how each Continuous Availability feature applies. For an explanation of the different failure scenarios, see Potential Failure Scenarios.

Table 4-1 Active-Active Application Tier and Active-Passive Database Tier Failure Scenarios

Transaction Recovery Across Sites

  • Complete Site Failure

    Oracle Site Guard integrates with Oracle Data Guard broker to perform database failover/switchover. Site Guard calls Oracle Data Guard broker to perform the failover and Site Guard switches the roles from primary to secondary.

    JDBC TLog is replicated to the database on Site 2 by database replication technology such as Oracle Data Guard or Oracle Active Data Guard.

    Transactions are recovered on Site 2 using cross-site transaction recovery.

  • Partial Site/Mid-Tier Failure (WebLogic Server/Coherence/OTD)

    If the primary site leasing table has expired, then the servers on Site 2 automatically take ownership of the TLog tables and start transaction recovery.

  • Maintenance Outage

    The end user invokes the switchover operation in Oracle Site Guard to initiate the orchestration of the switchover.

    Oracle Site Guard integrates with Oracle Data Guard broker to perform database failover/switchover. Site Guard calls Oracle Data Guard broker to perform the failover and Site Guard switches the roles from primary to secondary.

    JDBC TLog is replicated to the database on Site 2.

    Transactions are recovered on Site 2 using cross-site transaction recovery.

  • Network Partition Failure

    Site 1 continues processing its transactions.

    If the transaction fails before writing transactions to the store, then servers on Site 2 remain alive. If the transaction fails because the transaction log store could not be reached, the server shuts itself down (default behavior).

    A server trying to connect to the leasing table writes a warning about its inability to recover for the recovery site.

Oracle Traffic Director

  • Complete Site Failure

    Oracle Traffic Director on Site 2 continues to route traffic to the servers running on its site.

  • Partial Site/Mid-Tier Failure (WebLogic Server/Coherence/OTD)

    Oracle Traffic Director on Site 2 continues to route traffic to the servers running on its site.

  • Maintenance Outage

    Oracle Traffic Director on Site 2 continues to route traffic to the servers running on its site.

  • Network Partition Failure

    Oracle Traffic Director on Site 1 and Oracle Traffic Director on Site 2 each continue to route traffic to the servers running on their own site.

Oracle Site Guard

  • Complete Site Failure

    Oracle Site Guard integrates with Oracle Data Guard broker to perform database failover/switchover. Site Guard calls Oracle Data Guard broker to perform the failover and Site Guard switches the roles in the database from primary to secondary.

  • Partial Site/Mid-Tier Failure (WebLogic Server/Coherence/OTD)

    No-op.

  • Maintenance Outage

    The end user invokes the switchover operation in Oracle Site Guard to initiate the orchestration of the switchover.

    Oracle Site Guard integrates with Oracle Data Guard broker to perform database failover/switchover. Site Guard calls Oracle Data Guard broker to perform the failover and Site Guard switches the roles in the database from primary to secondary.

  • Network Partition Failure

    No-op.

Coherence Federated Caching

  • Complete Site Failure

    Coherence cluster on Site 2 becomes active.

  • Partial Site/Mid-Tier Failure (WebLogic Server/Coherence/OTD)

    Because replication is asynchronous, the cache data eventually becomes consistent either through Coherence HotCache or Read-Through cache, or when the other site comes back up.

  • Maintenance Outage

    Because replication is asynchronous, the cache data eventually becomes consistent either through Coherence HotCache or Read-Through cache, or when the other site comes back up.

  • Network Partition Failure

    Because replication is asynchronous, the cache data eventually becomes consistent either through Coherence HotCache or Read-Through cache, or when the network connectivity is re-established between the two sites.