Implement Mid-tier Replication in an OCI Disaster Recovery Architecture

Implement the ongoing replication for your middleware tier in a symmetric disaster recovery (DR) system in Oracle Cloud Infrastructure (OCI), by replicating the application servers and their configurations between primary and secondary regions, ensuring minimal downtime and data loss during a failover or switchover.

This solution playbook provides an overview of mid-tier replication throughout the lifecycle of the system. It presents various replication technologies and provides details to implement them in a real scenario. It applies to active-passive mid-tier disaster recovery scenarios where both the primary and standby systems are in OCI.

The content is intended for mid-tier administrators familiar with disaster recovery (DR) topologies for middleware and with OCI. The examples and terminology refer to Oracle WebLogic Server and to PaaS services that utilize WebLogic; however, the replication technologies and implementations described apply to any mid-tier system.

Note:

This document doesn't describe the disaster recovery setup.

Architecture

This architecture shows a high-level overview of a middleware active-passive disaster recovery topology. This playbook assumes that the primary and secondary systems are already created.

Any active-passive disaster recovery solution for a mid-tier system must implement the following essential features:

Geographic separation
Primary and secondary systems are geographically separated, far enough so they can’t be affected by the same disaster event.
Symmetry
The primary and secondary systems are symmetric. The secondary system has the same number of nodes in the mid-tier and the db-tier, with similar CPU and memory capacity.
Unique front-end name
The primary and secondary systems are accessed through the same unique front-end name. Client access to the system must be agnostic to which site is currently the primary. To accomplish this, the front-end address name must be unique and must always map to the IP of the system that is the primary at that moment. This name is usually referred to as a virtual front-end or vanity URL.
Listen addresses
The listen addresses of the mid-tier processes must be host names resolvable in both systems and mapped to the IPs of the hosts of the local site.
Database replication
The data of the primary database must be replicated to the standby database using Oracle Data Guard.
Mid-tier replication
Primary and secondary mid-tiers must be in sync. They must have the same configuration, the same product version, and the same patch level. There are different approaches to achieve this. You can maintain the primary and secondary systems separately: if a change is performed in the primary, the same change is repeated in the secondary, and if a patch is installed in the primary, the same patch is installed in the secondary. However, this duplicates the work and is prone to errors. Oracle Maximum Availability Architecture (Oracle MAA) recommends implementing automatic replication to copy the mid-tier file system artifacts. This keeps the primary and standby systems in sync.
Management of the information that is specific to each site
The configuration of the secondary is an exact copy of the primary, but there may be file artifacts that contain information specific to each site, which must be different in primary and secondary. The DR topology must support this and allow site-specific information customization.

Tip:

Oracle WebLogic Server Example

In an Oracle WebLogic system, the primary mid-tier connects to the primary region’s database, and the secondary mid-tier connects to the secondary region’s database. The primary and secondary mid-tier systems have the same configuration, so there must be a mechanism to ensure that each system uses the appropriate connection string that points to its local database. Oracle Maximum Availability Architecture (Oracle MAA) recommends using TNS aliases for the data sources, with different tnsnames.ora files in each site. The mid-tier replication methods must take this into account, either skipping the file containing the database connect string (tnsnames.ora) or replacing the database connect string in the files to point to the local database.
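The "skip the site-specific file" approach described above can be sketched in Python as follows. This is a minimal illustration, not the method used by any Oracle tool: the `replicate_config` helper and its parameters are hypothetical, and it assumes that a copy of the primary configuration is already available in a local staging directory on the standby host.

```python
import shutil


def replicate_config(source_dir, target_dir, site_specific=("tnsnames.ora",)):
    """Copy source_dir into target_dir, leaving site-specific files untouched.

    Files whose names match site_specific are not copied, so any version
    already present in target_dir keeps pointing to the local database.
    """
    shutil.copytree(
        source_dir,
        target_dir,
        # Skip matching file names at every directory level.
        ignore=shutil.ignore_patterns(*site_specific),
        # Merge into an existing target tree (requires Python 3.8 or later).
        dirs_exist_ok=True,
    )
```

The sketch only overwrites and adds files. It does not delete files that were removed in the primary, and it does not cover the alternative approach of rewriting the connect string inside the copied files.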

The following image is an example of an active-passive disaster recovery solution for a mid-tier system.

[Figure: active-passive-dr-mid-tier.png. Source archive: active-passive-dr-mid-tier-oracle.zip]
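
The listen-address requirement described in this section can be illustrated with a small Python check. It verifies that both sites resolve the same set of host names and that each name maps to a different, site-local IP. The host names, IP addresses, and the `check_listen_addresses` helper are hypothetical; in a real system the mappings would come from each site's name resolution.

```python
def check_listen_addresses(primary_hosts, secondary_hosts):
    """Return a list of problems found when comparing two host-name-to-IP maps.

    Each argument maps a listen-address host name to the IP it resolves to
    in that site. An empty list means no problem was detected.
    """
    problems = []
    # Every listen host name must be resolvable in both sites.
    for name in sorted(set(primary_hosts) ^ set(secondary_hosts)):
        site = "secondary" if name in primary_hosts else "primary"
        problems.append(f"{name} is not resolvable in the {site} site")
    # A shared name must point to a local host, so the IPs should differ.
    for name in sorted(set(primary_hosts) & set(secondary_hosts)):
        if primary_hosts[name] == secondary_hosts[name]:
            problems.append(f"{name} maps to the same IP in both sites")
    return problems


# Hypothetical example values.
primary = {"wlsnode1.example.com": "10.0.1.11", "wlsnode2.example.com": "10.0.1.12"}
secondary = {"wlsnode1.example.com": "10.1.1.11", "wlsnode2.example.com": "10.1.1.12"}
print(check_listen_addresses(primary, secondary))
```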

Terminology

You should be familiar with the following concepts and terminology:

Mid-tier (also middle tier or middleware)
The mid-tier refers to the layer within a multi-tiered application architecture that sits between the user interface (front-end) and the data storage (back-end). It handles business logic, data processing, and security, acting as a bridge between the user and the database.
Disaster
A sudden, unplanned catastrophic event that causes unacceptable damage or loss in a site or geographical area. A disaster is an event that compromises an organization's ability to provide critical functions, processes, or services for an unacceptable period and causes the organization to invoke its recovery plans.
Disaster Recovery (DR)
Ability to safeguard against natural or unplanned outages at a production site by having a recovery strategy for applications and data to a geographically separate secondary site.
Disaster Recovery Topology
The production site and the secondary site hardware and software components that comprise an Oracle Fusion Middleware Disaster Recovery solution.
Oracle Maximum Availability Architecture
Oracle Maximum Availability Architecture (Oracle MAA) is the best practice blueprint for data protection and availability of Oracle products (Database, Fusion Middleware, Applications). Implementing Oracle MAA best practices is one of the key requirements for any Oracle deployment. It provides recommendations for setting up and managing an Oracle system. Oracle MAA includes the Oracle Fusion Middleware Enterprise Deployment Guide recommendations and adds disaster protection best practices to minimize planned and unplanned downtime for outages affecting an entire data center or region.
System
A System is a set of targets (hosts, databases, application servers, and so on) that work together to host your applications. For example, to monitor an application in Oracle Enterprise Manager, you would first create a System that consists of the database, listener, application server, and host targets on which the application runs.
Site
A site is the set of different components in a data center needed to run a group of applications. For example, a site could consist of Oracle Fusion Middleware instances, databases, storage, and so on.
Production or Primary site
The site that carries the system’s workload at a given point in time. It is a group of hardware, network, and storage resources, together with the processes, that is actively used to run business logic and process requests.
Secondary (or standby or DR) site
A secondary site is a backup location that can take over the business logic and requests that a primary site was processing. Secondary sites are typically also called "standby" sites because they remain in standby (inactive) mode, which means that they do not process the production workload during normal operations. However, this does not imply that the secondary site cannot be used for other purposes. In more modern models, the secondary site is also used for reporting operations and, more importantly, for validating changes before applying them to the primary site.
Recovery Point Objective (RPO)
Recovery point objective is the amount of data loss that a system can tolerate from a business point of view when an outage takes place. For example, an RPO of 15 minutes means that losing up to the last 15 minutes of data is acceptable.
Recovery Time Objective (RTO)
Recovery time objective is the amount of downtime a system can tolerate or the acceptable amount of time that an application or service can remain unavailable when an outage takes place, from a business point of view.
Oracle Cloud Infrastructure (OCI)
OCI is a set of complementary cloud services that enable you to build and run a range of applications and services in a highly available hosted environment. OCI provides high-performance compute capabilities (as physical hardware instances) and storage capacity in a flexible overlay virtual network that is securely accessible from your on-premises network.
OCI region
An OCI region is a localized geographic area, composed of one or more availability domains. Regions are independent of other regions and can be separated by vast distances, across countries or even continents. In disaster recovery terms, a region is a site.
OCI Block Volumes
OCI Block Volumes provide reliable, high performance, low-cost block storage that persists beyond the lifespan of a virtual machine, with built-in redundancy and the ability to scale.
OCI File Storage
OCI File Storage is a fully managed, elastic, enterprise-grade storage service that enables servers and applications to access data through shared file systems.
DBFS
A Database File System (DBFS) is a standard file system interface in the Oracle Database. DBFS is similar to NFS in that it provides a shared network file system that looks like a local file system and has both a server component and a client component.
WLS-HYDR Framework
The WLS-HYDR framework creates and configures a symmetric disaster recovery (DR) system for Oracle WebLogic Server (WLS) environments, specifically within Oracle Cloud Infrastructure. It automates the manual processes involved in setting up a DR environment for WLS or Fusion Middleware (FMW) domains.
Oracle WebLogic Server for Oracle Cloud Infrastructure stack
The Oracle WebLogic Server for OCI stack is a preconfigured environment, built using Oracle Resource Manager in OCI Marketplace, that provisions and manages Oracle WebLogic Server deployments on OCI. It automates the creation and configuration of various OCI resources, such as compute instances, networking, and load balancers, alongside a WebLogic domain.
Oracle SOA Suite on Marketplace stack
The Oracle SOA Suite on Marketplace stack is a preconfigured environment, built using Oracle Resource Manager in OCI Marketplace, for deploying and managing Oracle SOA Suite applications on OCI. It automates the creation and configuration of various OCI resources, such as compute instances, networking, and load balancers, alongside a SOA WebLogic domain.
TNS Alias
In Oracle, a TNS alias, also known as a Net Service Name, is a user-friendly identifier that simplifies database connections. It acts as a shortcut, mapping a human-readable name to the more complex connection details required to reach a specific Oracle database instance. These details, including protocol, host, port, and service name, are stored in a configuration file, typically named tnsnames.ora.
TNS Admin folder
The Oracle TNS Admin folder, specified by the TNS_ADMIN environment variable, is the directory where Oracle Net Services configuration files, such as tnsnames.ora, are located. A mid-tier system can use a TNS admin folder with the tnsnames.ora and other artifacts needed to connect to the database.
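
To make the TNS alias concept concrete, the following Python sketch renders a tnsnames.ora entry from its connection details. It shows how the same alias can resolve to a different database in each site. The `tns_entry` helper, the alias `MYPDB`, and all host and service names are hypothetical values chosen for illustration.

```python
def tns_entry(alias, host, service_name, port=1521):
    """Render one tnsnames.ora entry that maps alias to a database service."""
    return (
        f"{alias} =\n"
        f"  (DESCRIPTION =\n"
        f"    (ADDRESS = (PROTOCOL = TCP)(HOST = {host})(PORT = {port}))\n"
        f"    (CONNECT_DATA = (SERVICE_NAME = {service_name}))\n"
        f"  )\n"
    )


# Hypothetical values: the alias is identical in both sites, the target differs.
primary_entry = tns_entry("MYPDB", "primary-db.example.com", "mypdb.primary.example.com")
secondary_entry = tns_entry("MYPDB", "standby-db.example.com", "mypdb.standby.example.com")
print(primary_entry)
print(secondary_entry)
```

Because a data source refers only to the alias, the mid-tier configuration can stay identical in both sites while each site's tnsnames.ora points to its local database.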

About Middleware Active-Passive DR Setup Procedures in OCI

In an active-passive disaster recovery topology for middleware, the secondary system is a mirror of the primary system. When the primary and secondary systems are both in OCI, there are different ways to set up the secondary system:

Manual
Create each resource individually through the OCI Console or CLI as a mirror of the primary system.
WLS-HYDR framework
Use the WLS-HYDR framework for your mid-tier systems based on Oracle WebLogic. This framework uses the OCI SDK for Python to create all the resources in the secondary as a mirror of the primary system. See the Explore More section in this playbook for a link to the WLS-HYDR framework on GitHub.
Provision using the same Marketplace stack
If the primary system is a Marketplace stack, such as Oracle WebLogic Server for OCI or Oracle SOA Suite on Marketplace, you can provision the secondary system by using the same Marketplace stack that was used for the primary, with the standby database in snapshot standby mode.

This solution playbook applies to all of these cases, provided that they meet the features of an active-passive disaster recovery topology described in the Architecture section. It assumes that the primary and secondary systems have already been created.
