Design a multicloud disaster recovery solution with Oracle Database Service for Azure

Enterprises are increasingly deploying their solutions across multiple clouds and require high availability and disaster recovery capabilities to ensure business continuity. Oracle Data Guard is widely used to ensure high availability, data protection, and disaster recovery for enterprise databases.

This multicloud split-stack disaster recovery solution uses Microsoft Azure and Oracle Cloud Infrastructure (OCI) as an example. The concept applies to split-stack architectures across any clouds.

This multicloud solution uses Oracle Database Service for Microsoft Azure (OracleDB for Azure) to deploy the database. OracleDB for Azure enables customers to easily provision, access, and operate enterprise-grade Oracle Database services in OCI with a familiar Azure-like experience.

In this solution, we show an Exadata Database service cross-regional disaster recovery architecture. This solution is also applicable to Oracle Base Database Service deployed with OracleDB for Azure.

Architecture

The multicloud disaster recovery topology spans regions. Oracle Exadata Database Service on Dedicated Infrastructure (Oracle Exadata Database Service) runs in the OCI production and disaster recovery regions, with each region deployed separately through OracleDB for Azure. A custom application runs in the Azure production and disaster recovery regions.

We use the following OCI and Azure interconnect regions as an example: OCI-Frankfurt and Azure-Germany West Central for production, and OCI-Amsterdam and Azure-West Europe for disaster recovery. The disaster recovery replication traffic travels over the corresponding OCI and Azure private networks.

This solution uses Data Guard to replicate the Oracle Exadata Database Service databases from the production region (active) to the disaster recovery region (standby). The standby database provides protection from unplanned outages and reduces downtime for planned maintenance activities, such as database patching and upgrades. On the Azure side, the Azure VM in the production region is replicated to the Azure disaster recovery region using Azure Site Recovery.

To deploy OracleDB for Azure in a cross-region disaster recovery setup, perform the following high-level steps:

  1. Set up OracleDB for Azure by requesting a multicloud link at the following address: Request Oracle Database Service for Microsoft Azure multicloud link.

    Note:

    When you sign up for OracleDB for Azure, the service configures a private connection to your database resources as part of the account linking process. You are prompted to provide a recognized organization name or an email address.
  2. Add a secondary (disaster recovery) Azure location in the OracleDB for Azure Portal.
  3. Deploy OracleDB for Azure and establish the network link in both production and disaster recovery regions.
  4. Enable and configure Oracle Data Guard manually on the database nodes.
  5. Update the tnsnames.ora files on the application servers to add both production and disaster recovery hosts to the connection strings.
  6. Configure Microsoft's Azure Site Recovery (ASR) for application VM replication.
  7. Manage the database services from the OracleDB for Azure Portal.
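
As a rough illustration of step 4, the following Python sketch generates the Data Guard broker (DGMGRL) commands that register an existing primary and standby pair. It covers only the broker registration and assumes that the standby database has already been instantiated and that network connectivity is in place. The configuration name, database names, and connect identifiers are hypothetical placeholders, not values from this architecture.

```python
def broker_script(config_name, primary_db, primary_tns, standby_db, standby_tns):
    """Return DGMGRL commands that register a primary and a physical standby.

    All arguments are hypothetical names supplied by the caller; the function
    only formats text and does not connect to any database.
    """
    return [
        # Create the broker configuration with the production database as primary.
        f"CREATE CONFIGURATION '{config_name}' AS PRIMARY DATABASE IS '{primary_db}' "
        f"CONNECT IDENTIFIER IS {primary_tns};",
        # Register the disaster recovery database as a physical standby.
        f"ADD DATABASE '{standby_db}' AS CONNECT IDENTIFIER IS {standby_tns} "
        "MAINTAINED AS PHYSICAL;",
        # Start broker management of the configuration, then display its status.
        "ENABLE CONFIGURATION;",
        "SHOW CONFIGURATION;",
    ]


if __name__ == "__main__":
    # Placeholder names for the Frankfurt (primary) and Amsterdam (standby) databases.
    for command in broker_script(
        "dr_config", "proddb_fra", "proddb_fra_tns", "proddb_ams", "proddb_ams_tns"
    ):
        print(command)
```

You would typically review the generated commands and run them in a DGMGRL session connected to the primary database.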

The following diagram illustrates this reference architecture.



oci-azure-multicloud-dr-oracle.zip

The architecture has the following components:

Oracle Cloud Infrastructure components

  • Region

    An Oracle Cloud Infrastructure region is a localized geographic area that contains one or more data centers, called availability domains. Regions are independent of other regions, and vast distances can separate them (across countries or even continents).

  • Virtual cloud network (VCN) and subnets

    A VCN is a customizable, software-defined network that you set up in an Oracle Cloud Infrastructure region. Like traditional data center networks, VCNs give you complete control over your network environment. A VCN can have multiple non-overlapping CIDR blocks that you can change after you create the VCN. You can segment a VCN into subnets, which can be scoped to a region or to an availability domain. Each subnet consists of a contiguous range of addresses that don't overlap with the other subnets in the VCN. You can change the size of a subnet after creation. A subnet can be public or private.

  • Dynamic routing gateway (DRG)

    The DRG is a virtual router that provides a path for private network traffic between VCNs in the same region, and between a VCN and a network outside the region, such as a VCN in another Oracle Cloud Infrastructure region, an on-premises network, or a network in another cloud provider.

  • Remote peering

    Remote peering allows the VCNs' resources to communicate using private IP addresses without routing the traffic over the internet or through your on-premises network. Remote peering eliminates the need for an internet gateway and public IP addresses for the instances that need to communicate with another VCN in a different region.

  • Oracle Database Service for Microsoft Azure

    Oracle Database Service for Microsoft Azure (OracleDB for Azure) is an Oracle Cloud Infrastructure (OCI) service with your database resources residing in OCI. Your OCI account is linked to your Azure account through Oracle Database Service for Microsoft Azure Network Link, which is an Oracle-managed tunnel connection. OracleDB for Azure connects components in your Azure and OCI tenants.

    OracleDB for Azure allows you to easily integrate Oracle Cloud Infrastructure Database into your Azure cloud environment. OracleDB for Azure uses a service-based approach and is an alternative to manually creating complex cross-cloud deployments for your application stacks.

  • Exadata Database Service

    Oracle Exadata Database Service enables you to leverage the power of Exadata in the cloud. You can provision flexible X8M and X9M systems that allow you to add database compute servers and storage servers to your system as your needs grow. X8M and X9M systems offer RDMA over Converged Ethernet (RoCE) networking for high bandwidth and low latency, persistent memory (PMEM) modules, and intelligent Exadata software. You can provision X8M and X9M systems by using a shape that's equivalent to a quarter-rack X8 and X9M system, and then add database and storage servers at any time after provisioning.

    Oracle Exadata Database Service on Dedicated Infrastructure provides Oracle Exadata Database Machine as a service in an Oracle Cloud Infrastructure (OCI) data center. The Oracle Exadata Database Service on Dedicated Infrastructure instance is a virtual machine (VM) cluster that resides on Exadata racks in an OCI region.

    Oracle Exadata Database Service on Cloud@Customer provides Oracle Exadata Database Service that is hosted in your data center.

  • Data Guard

    Oracle Data Guard provides a comprehensive set of services that create, maintain, manage, and monitor one or more standby databases to enable production Oracle databases to remain available without interruption. Oracle Data Guard maintains these standby databases as copies of the production database. Then, if the production database becomes unavailable because of a planned or an unplanned outage, Oracle Data Guard can switch any standby database to the production role, minimizing the downtime associated with the outage.

Microsoft Azure components

  • Virtual network (VNet)

    Azure Virtual Network (VNet) is the fundamental building block for your private network in Azure. VNet enables many types of Azure resources, such as Azure virtual machines (VM), to securely communicate with each other, the internet, and on-premises networks.

  • VNet Peering over Azure backbone

    Virtual network peering enables you to seamlessly connect two or more Virtual Networks in Azure. The virtual networks appear as one for connectivity purposes. The traffic between virtual machines in peered virtual networks uses the Microsoft backbone infrastructure. Like traffic between virtual machines in the same network, traffic is routed through Microsoft's private network only. Global virtual network peering is used to connect virtual networks across Azure regions.

  • Site Recovery

    Azure Site Recovery replicates an Azure VM to a different Azure region directly from the Azure portal. You can minimize recovery issues by sequencing the order of multi-tier applications running on multiple virtual machines and keep applications available during a disaster.
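
The networking components described above (VCNs, subnets, and VNets) depend on address ranges that do not overlap, and peered networks generally need distinct address spaces as well. The following minimal sketch uses Python's standard ipaddress module to check a planned address layout before deployment. All network names and CIDR blocks are hypothetical examples.

```python
import ipaddress
from itertools import combinations


def overlapping_pairs(cidrs):
    """Return the name pairs of networks whose CIDR blocks overlap.

    `cidrs` maps a label (for example a VCN or VNet name) to a CIDR string.
    """
    networks = {name: ipaddress.ip_network(cidr) for name, cidr in cidrs.items()}
    return [
        (name_a, name_b)
        for (name_a, net_a), (name_b, net_b) in combinations(networks.items(), 2)
        if net_a.overlaps(net_b)
    ]


def subnets_outside_network(network_cidr, subnet_cidrs):
    """Return the subnet CIDRs that are not contained in the parent network."""
    parent = ipaddress.ip_network(network_cidr)
    return [
        cidr for cidr in subnet_cidrs
        if not ipaddress.ip_network(cidr).subnet_of(parent)
    ]


if __name__ == "__main__":
    # Hypothetical address plan for the four networks in this topology.
    plan = {
        "oci-frankfurt-vcn": "10.10.0.0/16",
        "oci-amsterdam-vcn": "10.20.0.0/16",
        "azure-germany-west-central-vnet": "10.30.0.0/16",
        "azure-west-europe-vnet": "10.40.0.0/16",
    }
    print("Overlapping networks:", overlapping_pairs(plan))
    print(
        "Subnets outside the VCN:",
        subnets_outside_network("10.10.0.0/16", ["10.10.1.0/24", "10.11.0.0/24"]),
    )
```

An empty list from overlapping_pairs suggests that the plan is safe to peer. Any reported pair should be renumbered before you establish the network links.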

Recommendations

Use the following recommendations as a starting point. Your requirements might differ from the architecture described here.

  • Disaster recovery
    • Disaster recovery testing is a standard practice of enterprise IT operations. Switch over between the production and disaster recovery sites every three to six months to verify that failover and failback work as expected.
    • Fail over the database in OCI and the application in Azure to their respective disaster recovery regions at the same time.
    • Use the recommended TNS connection string and follow Oracle Maximum Availability Architecture best practices to ensure continuous availability for your applications.
      ALIAS = (DESCRIPTION = (CONNECT_TIMEOUT=90) (RETRY_COUNT=20)(RETRY_DELAY=3)
              (TRANSPORT_CONNECT_TIMEOUT=3)
                    (ADDRESS_LIST = (LOAD_BALANCE=on)
                          ( ADDRESS = (PROTOCOL = TCP)(HOST=primary-scan)(PORT=1521)))
                    (ADDRESS_LIST = (LOAD_BALANCE=on)
                          ( ADDRESS = (PROTOCOL = TCP)(HOST=secondary-scan)(PORT=1521)))
                    (CONNECT_DATA=(SERVICE_NAME = gold-cloud)))

    The TNS connection string includes both the primary and standby hosts. You can tune specific values, but those in this example are reasonable starting points.
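
If you maintain tnsnames.ora entries for several applications, generating them from one template can help keep the primary and standby hosts consistent. The following sketch rebuilds the connection string shown above from its parameters and checks that the parentheses balance. The function names are illustrative only, and the default values are copied from the example.

```python
def build_descriptor(alias, hosts, service_name, port=1521,
                     connect_timeout=90, retry_count=20, retry_delay=3,
                     transport_connect_timeout=3):
    """Build a tnsnames.ora entry with one ADDRESS_LIST per host.

    `hosts` is an ordered list of SCAN names; list the primary site first,
    as in the example above.
    """
    address_lists = "".join(
        "(ADDRESS_LIST=(LOAD_BALANCE=on)"
        f"(ADDRESS=(PROTOCOL=TCP)(HOST={host})(PORT={port})))"
        for host in hosts
    )
    return (
        f"{alias}=(DESCRIPTION="
        f"(CONNECT_TIMEOUT={connect_timeout})(RETRY_COUNT={retry_count})"
        f"(RETRY_DELAY={retry_delay})"
        f"(TRANSPORT_CONNECT_TIMEOUT={transport_connect_timeout})"
        f"{address_lists}"
        f"(CONNECT_DATA=(SERVICE_NAME={service_name})))"
    )


def parentheses_balanced(text):
    """Return True if every '(' in the text has a matching ')'."""
    depth = 0
    for char in text:
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


if __name__ == "__main__":
    entry = build_descriptor("ALIAS", ["primary-scan", "secondary-scan"], "gold-cloud")
    # Fail loudly if the generated entry is malformed.
    if not parentheses_balanced(entry):
        raise ValueError("generated entry has unbalanced parentheses")
    print(entry)
```

The balance check is a cheap guard against a common cause of connection errors in hand-edited tnsnames.ora files. It does not validate parameter names or values.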

  • Use Oracle Data Guard Fast-Start Failover to automate the database failover and reduce the recovery time without performing any manual steps.
  • Create role-based custom database services for your application using the primary role for the primary site, and standby role for the secondary site when used for read-only workloads. Database services start and stop automatically at a site based on their role.
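
The role-based services recommendation can be sketched as a small command generator. The following example builds srvctl add service argument lists for a read-write service tied to the primary role and a read-only service tied to the physical standby role, defined on both sites so that the services follow the database role after a switchover or failover. The database unique names, instance names, and service names are hypothetical, and the script only formats the commands without running them.

```python
ROLES = {"PRIMARY", "PHYSICAL_STANDBY", "LOGICAL_STANDBY", "SNAPSHOT_STANDBY"}


def add_service_command(db_unique_name, service_name, role, preferred_instances):
    """Return the argument list for one `srvctl add service` call."""
    if role not in ROLES:
        raise ValueError(f"unsupported role: {role}")
    return [
        "srvctl", "add", "service",
        "-db", db_unique_name,
        "-service", service_name,
        "-role", role,
        "-preferred", ",".join(preferred_instances),
    ]


def role_based_service_plan(sites, read_write_service, read_only_service):
    """Define both services on every site so that they follow the database role.

    `sites` maps each database unique name to its list of instance names.
    """
    plan = []
    for db_unique_name, instances in sites.items():
        plan.append(add_service_command(
            db_unique_name, read_write_service, "PRIMARY", instances))
        plan.append(add_service_command(
            db_unique_name, read_only_service, "PHYSICAL_STANDBY", instances))
    return plan


if __name__ == "__main__":
    # Hypothetical names for the production and disaster recovery databases.
    sites = {
        "proddb_fra": ["proddb1", "proddb2"],
        "proddb_ams": ["proddb1", "proddb2"],
    }
    for command in role_based_service_plan(sites, "app_rw", "app_ro"):
        print(" ".join(command))
```

Depending on your deployment, you might need additional options, such as -pdb for a pluggable database. Treat the output as a starting point to review, not as a complete service definition.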

Considerations

Consider the following factors when deploying this reference architecture.

  • Recovery Point Objective (RPO) and Recovery Time Objective (RTO)

    Choose the proper Oracle Data Guard data protection mode based on your business requirements. Choose Maximum Performance or Maximum Availability to prioritize production database performance, or Maximum Protection to prioritize data protection.

  • Network connectivity

    Establish on-premises network connectivity to multiple OCI and Azure regions.

  • Low latency

    Place the Azure application VM in the availability zone that has the lowest latency to the Oracle Database nodes created by OracleDB for Azure.

  • Zero data loss

    If the network latency between the regions is too high for synchronous redo transport, you can use an Oracle Data Guard Far Sync instance to provide zero data loss without impacting application performance.
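
The RPO and zero data loss considerations above can be combined into a simple decision helper. The following sketch is one possible way to encode the choice under stated assumptions. The latency budget of 5 ms is a hypothetical placeholder that you would replace with a value measured for your own workload, and the sketch suggests Far Sync only together with Maximum Availability as a simplifying choice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recommendation:
    protection_mode: str
    redo_transport: str
    use_far_sync: bool


def recommend(zero_data_loss_required, primary_may_stall, round_trip_ms,
              sync_latency_budget_ms=5.0):
    """Suggest a Data Guard protection mode for a pair of regions.

    zero_data_loss_required: True if the RPO is zero.
    primary_may_stall: True if halting the primary is acceptable when no
        standby can acknowledge redo (data protection over availability).
    round_trip_ms: measured network round trip between the regions.
    sync_latency_budget_ms: hypothetical threshold above which synchronous
        transport directly to the remote region is considered too slow.
    """
    if not zero_data_loss_required:
        # Asynchronous transport keeps the primary unaffected by network latency.
        return Recommendation("Maximum Performance", "ASYNC", False)
    if primary_may_stall:
        return Recommendation("Maximum Protection", "SYNC", False)
    # Zero data loss with priority on availability: offload the synchronous
    # hop to a nearby Far Sync instance when the remote region is too far away.
    return Recommendation(
        "Maximum Availability", "SYNC", round_trip_ms > sync_latency_budget_ms)


if __name__ == "__main__":
    # Example: zero RPO, availability first, 12 ms between the two regions.
    print(recommend(True, False, 12.0))
```

The helper only documents the trade-off. Validate any result against your own RPO and RTO targets and measured latency before you change the protection mode.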

Acknowledgments

  • Authors: Thomas Van Buggenhout, Ejaz Akram, Wei Han, Sinan Petrus Toma
  • Contributor: Julien Silverston