About Performance Results

This section presents the results of different performance tests conducted on a stretched cluster.

The system used for the tests is an Oracle Fusion Middleware (FMW) stretched cluster configured across the Frankfurt and Amsterdam Oracle Cloud Infrastructure (OCI) regions.

The WebLogic domain consists of 5 nodes distributed across different locations to enable performance comparisons of various topologies: servers running in the same availability domain as the database, servers in the same region but in different availability domains, and servers located in a different region.

These stress tests were performed using the SOA Fusion Order Demo (FOD) as a sample SOA application. The FOD application was modified to insert Java Message Service (JMS) messages into a Uniform Distributed Destination (UDD) when an order completes. This example is very database-intensive and uses multiple SOA adapters, such as the File Adapter and the JMS Adapter. It also involves different SOA components, such as Mediator, Business Process Execution Language (BPEL), and the rules engine, as well as multiple WebLogic features.

The following diagram shows the FMW stretched cluster environment used for the tests:

Figure: fmw-stretched-performance-env.png (diagram source: fmw-stretched-performance-env-oracle.zip)

The measured latency values between the networks in this environment were:

Between Hosts	Average Latency (RTT in ms)
In the same availability domain	0.3
In the same region but in a different availability domain	0.6
In different regions (Frankfurt and Amsterdam)	6.5

Review Stress Tests

Multiple stress tests were conducted using various configurations and workloads.

Some of the tested configurations were:

Stressing the cluster with two nodes up, applying different latencies between the servers and between one of the servers and the database.
Stress testing individual servers, each with different latencies to the database.
Running tests with both servers collocated with the database (local only) and with both servers located remotely from the database (remote only).
Stressing the cluster with two nodes active in each region.

For each configuration, different workloads were tested. All workload requests are first sent to the front end, where they are distributed (through the global load balancer, then the local load balancers, and then the web servers) among the Oracle WebLogic Server instances. The workload category (low, medium, high) depends on the number of active nodes in each setup and is constrained by the database's concurrency limits. For example, 80 concurrent virtual users would be considered a low workload for the WebLogic servers if four nodes are running, but a high workload if only one node is active. From the database's perspective, however, the workload remains the same. For simplicity, the following workloads are used:

Low workloads (20 concurrent virtual users per WebLogic server, with a maximum of 40 total concurrent virtual users in the system)
Medium workloads (40-60 concurrent virtual users per WebLogic server, with a maximum of 120 total concurrent virtual users in the system)
High workloads (80 concurrent virtual users per WebLogic server, with a maximum of 160 total concurrent virtual users in the system)
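The relationship between total virtual users, active servers, and the per-server workload category can be expressed as a small Python sketch. It assumes that the load balancers spread virtual users evenly across servers, and the category boundaries are an assumption interpolated from the ranges above, since the tests did not define cut-offs between categories.

```python
def per_server_users(total_users: int, active_servers: int) -> float:
    """Virtual users handled by each WebLogic server, assuming the
    front end distributes requests evenly."""
    if active_servers <= 0:
        raise ValueError("at least one server must be active")
    return total_users / active_servers


def workload_category(users_per_server: float) -> str:
    """Assumed boundaries: below 40 is low, 40 up to (but not including)
    80 is medium, and 80 or more is high."""
    if users_per_server < 40:
        return "low"
    if users_per_server < 80:
        return "medium"
    return "high"


if __name__ == "__main__":
    # The database always sees the same 80 concurrent users; only the
    # per-server load changes with the number of active nodes.
    for servers in (4, 2, 1):
        load = per_server_users(80, servers)
        print(f"{servers} server(s): {load:.0f} users each -> {workload_category(load)}")
```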

Based on the stress test results, these are the conclusions:

Overall cluster performance
The overall cluster performance, measured as WebLogic throughput in transactions per second (TX/sec), for a cluster with two servers:
- When both servers are in the same Availability Domain (AD) as the database (taken as reference): 100%
- When each server is in a different AD: ~100%
- When one server is in a different region (6.5 ms round-trip time (RTT)): 90-95%
- When both servers are in a different region than the database (6.5 ms RTT): 85-95%
Per-server performance
The per-server performance (in terms of WebLogic throughput, TX/sec):
- For the server in the same AD as DB (taken as reference): 100%
- For the server in the other AD: 98-99%
- For the server in the other region: 85-90%
Number of active data source connections
The number of active data source connections increases with higher latency to the database. Depending on the workload, the server in region 2 shows up to twice as many active connections as the server collocated with the database. Take this into account when sizing the WebLogic data sources and the database sessions.
JTA transactions
JTA transactions remain active longer on servers with higher latency to the database, and transactions that remain active longer are more likely to be affected by failures. It is therefore particularly important in these systems that transaction logs (TLOGs) use JDBC persistent stores. Service migration should be configured so that, when a server fails, its transactions are recovered automatically on another server.
Cross-region latency
For a cross-region latency of 6.5 ms RTT, and applying the best practices this document provides for FMW stretched clusters:
- The performance degradation of a stretched cluster is low (~10%).
- The degradation is similar for a cluster with one server in each region and for a cluster with both servers in the remote region, because latency also affects intra-cluster communication.
Cross-AD latency
The cross-AD latency (0.6ms) doesn’t have a significant impact on the overall performance of a SOA FOD system.
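The growth in active connections is consistent with Little's law: the average number of connections in use equals the transaction rate multiplied by the time each transaction holds a connection. Every sequential database round trip adds one RTT to that hold time, so a remote server needs more connections for the same throughput. The following Python sketch applies this reasoning. The processing time, round-trip count, and throughput are hypothetical values, not figures from these tests, and the model assumes a connection is held for the whole transaction.

```python
def connection_hold_time_s(db_work_s: float, round_trips: int, rtt_ms: float) -> float:
    """Seconds a transaction holds a connection: latency-independent
    database/processing work plus the network time of each sequential
    round trip to the database."""
    return db_work_s + round_trips * rtt_ms / 1000.0


def active_connections(tx_per_s: float, hold_time_s: float) -> float:
    """Little's law (L = lambda * W): average number of connections in use."""
    return tx_per_s * hold_time_s


if __name__ == "__main__":
    # Hypothetical profile: 100 ms of work, 10 sequential round trips,
    # and the same steady 100 TX/sec on both servers.
    for label, rtt_ms in (("same AD", 0.3), ("other region", 6.5)):
        hold = connection_hold_time_s(0.100, 10, rtt_ms)
        print(f"{label}: ~{active_connections(100, hold):.1f} active connections")
```

The same formula applies to JTA: the average number of in-flight transactions is the TX/sec rate multiplied by the average active time, so longer active times leave more transactions exposed when a server fails.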

Note:

With all of the above in mind, and given the performance penalties observed in many tests, Oracle does not support Oracle Fusion Middleware stretched clusters that exceed 10 milliseconds of latency (RTT) between the sites. Such systems may operate without issues, but transaction times will increase considerably. Latencies beyond 10 milliseconds (RTT) will also cause problems in the Oracle Coherence cluster used for deployment, as well as JTA, web services, or application timeouts. This makes the solutions presented in this playbook suitable primarily for sites or regions with low latency between them.
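Before planning a stretched cluster, it can help to compare RTT samples measured between the sites (for example, collected with ping over a representative period) against the 10 ms limit. The following Python sketch is a hypothetical helper, not an Oracle tool. Checking the 99th percentile in addition to the average is an added assumption about good practice, not part of Oracle's stated limit.

```python
import math
import statistics

# Oracle does not support FMW stretched clusters above this RTT between sites.
SUPPORTED_MAX_RTT_MS = 10.0


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile, with pct in the range (0, 100]."""
    ordered = sorted(samples)
    rank = math.ceil(pct / 100.0 * len(ordered))
    return ordered[max(rank, 1) - 1]


def check_rtt(samples_ms: list[float], limit_ms: float = SUPPORTED_MAX_RTT_MS) -> dict:
    """Summarize RTT samples and report whether both the average and the
    99th percentile stay within the limit (the p99 check is an assumed,
    stricter criterion)."""
    if not samples_ms:
        raise ValueError("no RTT samples provided")
    avg = statistics.mean(samples_ms)
    p99 = percentile(samples_ms, 99)
    return {
        "avg_ms": avg,
        "p99_ms": p99,
        "within_limit": avg <= limit_ms and p99 <= limit_ms,
    }


if __name__ == "__main__":
    # Hypothetical samples resembling the Frankfurt-Amsterdam link.
    print(check_rtt([6.4, 6.5, 6.6, 6.5, 7.1]))
```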

When stressing a cluster with 2 nodes, the following chart shows the overall performance of the cluster, depending on where the servers are located. The reference (100%) is when both servers run in the same AD as the database.

Figure: stretched-cluster-stress-2node-overall.png

When stressing a cluster with 2 nodes, the following chart shows the performance for the server that is not collocated with the database (it is in the other AD or in a remote region) compared to the performance of the server that is collocated with the database:

Figure: stretched-cluster-stress-2node-wls.png

When stressing a cluster with 2 nodes, these charts show the number of active data source connections (average) for each server. One server is always collocated with the database (site1), and the other server is at different latency values from the database (site2):

Figure: stretched-cluster-stress-2node-connections.png

When stressing a single server under medium to high load with different latencies to the database, the following performance results are observed. The reference (100%) is when the server is in the same AD as the database.

Figure: stretched-cluster-stress-single-overall.png

When stressing a single server with different latencies to the database, the following chart shows the active data source connections under medium to high load:

Figure: stretched-cluster-stress-single-connections.png

When stressing a single server with different latencies to the database, the following image shows the average JTA active time for different latencies to the database:

Figure: stretched-cluster-stress-single-jta.png

When comparing the performance of a cluster with both servers in the same region as the database (local only) vs. a cluster with both servers in a different region than the database (remote only), the following performance results are observed. The reference (100%) is the local-only cluster.

Figure: stretched-cluster-stress-xregion-overall.png

The following figure shows the average JTA TX active time for a cluster with both servers running in the same region as the database (local only) and a cluster running both servers in a different region than the database (remote only).

Figure: stretched-cluster-stress-xregion-jta.png

Review Start Times

For cluster members collocated with the database, a considerable amount of the Oracle WebLogic Server startup time is spent creating connection pools.

The delay depends on the initial capacity settings of the data sources. By default, most Oracle Fusion Middleware (FMW) data sources use an initial capacity of zero for their connection pools. However, increasing the initial pool capacity can reduce the response time of the system during normal runtime operation. In a stretched cluster, servers that reside remotely from the database take longer to start as the initial pool capacity increases.

Determining the ideal initial capacity requires balancing response times during normal operation against startup time. Because the initial capacity is configured at the data source (connection pool) level, the setting influences the startup time of all servers in the cluster, both those local to the database and those remote from it.
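The trade-off can be reasoned about with a rough model. The following Python sketch assumes, as a simplification rather than a description of WebLogic internals, that pools are filled one connection after another and that each new connection costs a fixed amount of work plus a few network round trips for connection and session setup. The per-connection cost and round-trip count are hypothetical; only the 11 data sources come from the tests described here.

```python
def pool_creation_time_s(
    data_sources: int,
    initial_capacity: int,
    rtt_ms: float,
    round_trips_per_connection: int = 10,  # hypothetical setup round trips
    fixed_cost_ms: float = 20.0,           # hypothetical latency-independent cost
) -> float:
    """Rough time to create every initial connection in every data source,
    assuming connections are created serially."""
    per_connection_ms = fixed_cost_ms + round_trips_per_connection * rtt_ms
    return data_sources * initial_capacity * per_connection_ms / 1000.0


if __name__ == "__main__":
    for capacity in (0, 5, 20):
        local = pool_creation_time_s(11, capacity, rtt_ms=0.3)
        remote = pool_creation_time_s(11, capacity, rtt_ms=6.5)
        print(f"initial capacity {capacity}: local {local:.1f} s, remote {remote:.1f} s")
```

Under this model, an initial capacity of zero makes startup insensitive to latency, while the remote penalty grows linearly with the initial capacity; the connection creation cost is then paid by the first requests instead.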

The following graph shows the WebLogic server start times as the latency to the database grows, for different initial capacity values in all the data sources (11 data sources in total):

Figure: stretched-cluster-wls-start-time.png

Review JMS Service Migration Times

When using JDBC persistent stores, Automatic Service Migration is possible across regions in stretched clusters because both Java Message Service (JMS) and transaction log (TLOG) data are accessible from each region.

However, the time taken to migrate a service from region 1 to region 2 can increase because of the latency to the database. The additional time is spent recovering the messages on the destination server, which reads them from the persistent store in the database in the other region.

The increase is larger when the persistent stores hold a large number of pending messages. For JMS messages of 2.7 KB each, the following image shows the JMS service migration times when one of the persistent stores has a high number of pending messages (around 8000) and the service migrates from a server collocated with the database to another server, for different latencies between the destination server and the database:
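One way to see why the number of pending messages matters is a simple model in which the destination server reads pending messages from the JDBC store in batches, paying one RTT per batch on top of a latency-independent base time. The following Python sketch implements that model and expresses the result relative to a same-AD reference (100%), as the following charts do. The batch size and base time are hypothetical, and WebLogic's actual recovery process is more involved; the sketch illustrates the shape of the effect, not its magnitude.

```python
import math


def migration_time_s(
    pending_messages: int,
    rtt_ms: float,
    base_s: float = 5.0,    # hypothetical latency-independent migration work
    batch_size: int = 100,  # hypothetical messages read per round trip
) -> float:
    """Base time plus one database round trip per batch of pending messages."""
    batches = math.ceil(pending_messages / batch_size)
    return base_s + batches * rtt_ms / 1000.0


def relative_to_reference(value: float, reference: float) -> float:
    """Express a time as a percentage of the reference time (reference = 100%)."""
    return 100.0 * value / reference


if __name__ == "__main__":
    for pending in (50, 8000):
        reference = migration_time_s(pending, rtt_ms=0.3)  # same AD as the database
        remote = migration_time_s(pending, rtt_ms=6.5)     # other region
        print(f"{pending} pending messages: {relative_to_reference(remote, reference):.1f}%")
```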

Figure: stretched-cluster-jms-high-time.png

The following image shows the service migration time increment (%) with a high number of pending messages (around 8000) for different latencies between the destination server and the database. The reference (100%) is when the service migrates to a server that is in the same Availability Domain (AD) as the database.

Figure: stretched-cluster-jms-high-percent.png

The following image shows the migration times for the same case but with a low number of pending messages (around 50) for different latencies between the destination server and the database.

Figure: stretched-cluster-jms-low-time.png

The following image shows the JMS service migration time increment (%) with a low number of pending messages (around 50) for different latencies between the destination server and the database. The reference (100%) is when the service migrates to a server that is in the same AD as the database.

Figure: stretched-cluster-jms-low-percentage.png

Review SOA Composite Deployment Times

For SOA, deploying and loading composites takes longer on servers with higher latency to the database.

When deploying a composite (deploying its first version or updating to a newer version), the composite may be deployed earlier to servers in region 1 than to the servers in region 2, although it will not be formally activated until it is available in all members of the cluster.

The following image shows the increase in the time taken to load a composite during server startup on servers with latency to the database, compared to the time taken on a server that resides in the same Availability Domain (AD) as the database. The composite size is 365 KB.

Figure: stretched-cluster-composite-load-server.png

The following image shows the increase in the time taken to deploy a composite with Oracle WebLogic Scripting Tool (WLST) commands, for different latencies between the server that performs the deployment and the database.

Figure: stretched-cluster-composite-load-wlst.png

Review Traffic Between Sites

The recommendations provided in this document are intended to constrain the traffic as much as possible inside each site for the most common operations.

This isolation is not guaranteed, however: in some failover scenarios, for example, a Java Message Service (JMS) invocation can take place across the two sites. That said, for a typical application, most of the traffic takes place between the Oracle WebLogic Server instances and the database, so this traffic is the key factor in the performance of the Oracle Fusion Middleware (FMW) stretched cluster topology. During a stress test, more than 90% of the traffic between a WebLogic server in region 2 and the addresses in region 1 was exchanged with the database, which is located in region 1.

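To capture the amount of traffic per IP address between the sites, you can use the iftop tool. For example, the following command listens on interface ens3, shows only flows to and from the remote site's network (-F), disables DNS resolution (-n), and prints a single text report (-t) after 900 seconds (-s 900):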

sudo iftop -i ens3 -F <remote_site_CIDR> -n -t -s 900
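Once per-address byte counts have been collected (for example, transcribed from the iftop text report), the share of each address can be computed as in the following Python sketch. The addresses and byte counts are made-up placeholders, not the values from this stress test, and parsing iftop output is left out because its text layout can differ between versions.

```python
def traffic_share(bytes_by_address: dict[str, int]) -> dict[str, float]:
    """Percentage of the total traffic exchanged with each address."""
    total = sum(bytes_by_address.values())
    if total == 0:
        return {addr: 0.0 for addr in bytes_by_address}
    return {addr: 100.0 * count / total for addr, count in bytes_by_address.items()}


if __name__ == "__main__":
    # Hypothetical remote-site addresses and cumulative bytes.
    sample = {
        "10.0.1.10 (database)": 9_200_000,
        "10.0.2.21 (WebLogic server)": 600_000,
        "10.0.2.22 (WebLogic server)": 200_000,
    }
    ranked = sorted(traffic_share(sample).items(), key=lambda kv: kv[1], reverse=True)
    for addr, pct in ranked:
        print(f"{addr}: {pct:.1f}%")
```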

The following image shows the percentage of traffic between a WebLogic server in region 2 and the different addresses in region 1 during a stress test.

Figure: stretched-cluster-traffic-sites.png