Perform cross-regional disaster recovery for Oracle Exadata Database Service on Oracle Database@Azure

When designing applications, it is essential to ensure business continuity by establishing a robust disaster recovery mechanism for restoring operations in the event of an outage.

For many years, customers have trusted Oracle Exadata Database Service using Oracle Maximum Availability Architecture (MAA) to power mission-critical applications both on premises and on Oracle Cloud Infrastructure (OCI). Oracle Exadata Database Service on Oracle Database@Azure offers feature and price parity with Exadata on OCI and can be deployed across multiple Microsoft Azure availability zones (AZs) and regions to ensure high availability and disaster recovery.

Architecture

This architecture shows a high-availability, containerized Azure Kubernetes Service (AKS) application with Oracle Exadata Database Service on Oracle Database@Azure in a cross-regional, disaster recovery topology.

A high-availability, containerized Azure Kubernetes Service (AKS) application is deployed in two Azure regions: a primary region and a standby region. The container images are stored in Azure Container Registry and are replicated between the primary and standby regions. Users access the application externally through a public load balancer.

For data protection, the Oracle Database is running in an Exadata virtual machine (VM) cluster in the primary region, with Oracle Data Guard or Oracle Active Data Guard replicating the data to the standby database running on an Exadata VM cluster in the standby region.

The database transparent data encryption (TDE) keys are stored in Oracle Cloud Infrastructure Vault and replicated between the Azure and OCI regions. The automatic backups are in OCI for both the primary and standby regions. Customers can use Oracle Cloud Infrastructure Object Storage or Oracle Database Autonomous Recovery Service as the preferred storage solution.

The Oracle Exadata Database Service on Oracle Database@Azure network is connected to the Exadata client subnet by using a dynamic routing gateway (DRG) managed by Oracle. A DRG is also required to create a peer connection between VCNs in different regions. Because only one DRG is allowed per VCN in OCI, a second VCN with its own DRG is required to connect the primary and standby VCNs in each region. In this example:

  • The primary Exadata VM cluster is deployed in the VCN Primary VCN client subnet (10.5.0.0/24).
  • The Hub VCN Primary VCN for the transit network is 10.15.0.0/29.
  • The standby Exadata VM cluster is deployed in the VCN Standby VCN client subnet (10.6.0.0/24).
  • The Hub VCN Standby VCN for the transit network is 10.16.0.0/29.

No subnet is required for the Hub VCNs to enable transit routing; therefore, these VCNs can use a very small network. The VCNs on the OCI child site are created after the Oracle Exadata Database Service VM clusters on Oracle Database@Azure have been created for the primary and standby databases.
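
As a quick sanity check of the address plan above, the following minimal sketch uses Python's standard ipaddress module to confirm that the four example CIDR blocks do not overlap and to show how small the /29 Hub VCNs are. The dictionary labels and the find_overlaps helper are illustrative names, not part of any Oracle or Azure tooling.

```python
import ipaddress
from itertools import combinations

# CIDR blocks taken from the example above; the keys are only labels.
NETWORKS = {
    "VCN Primary client subnet": ipaddress.ip_network("10.5.0.0/24"),
    "Hub VCN Primary": ipaddress.ip_network("10.15.0.0/29"),
    "VCN Standby client subnet": ipaddress.ip_network("10.6.0.0/24"),
    "Hub VCN Standby": ipaddress.ip_network("10.16.0.0/29"),
}


def find_overlaps(networks):
    """Return the pairs of labels whose CIDR blocks overlap."""
    return [
        (label_a, label_b)
        for (label_a, net_a), (label_b, net_b) in combinations(networks.items(), 2)
        if net_a.overlaps(net_b)
    ]


if __name__ == "__main__":
    for label, net in NETWORKS.items():
        print(f"{label}: {net} ({net.num_addresses} addresses)")
    print("Overlaps:", find_overlaps(NETWORKS) or "none")
```

Peered VCNs cannot route to each other reliably if their address ranges overlap, so running a check like this before creating the Hub VCNs can catch planning mistakes early.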

The following diagram illustrates the architecture:

[Architecture diagram: exadb-dr-db-azure-oracle]

Microsoft Azure provides the following components:

  • Microsoft Azure region

    An Azure region is a geographical area in which one or more physical Azure data centers, called availability zones, reside. Regions are independent of other regions, and vast distances can separate them (across countries or even continents).

    Azure and OCI regions are localized geographic areas. For Oracle Database@Azure, an Azure region is connected to an OCI region, with availability zones (AZs) in Azure connected to availability domains (ADs) in OCI. Azure and OCI region pairs are selected to minimize distance and latency.

  • Microsoft Azure availability zone

    An availability zone is a physically separate data center within a region that is designed to be highly available and fault tolerant. Availability zones are close enough to have low-latency connections to other availability zones.

  • Microsoft Azure Virtual Network

    Microsoft Azure Virtual Network (VNet) is the fundamental building block for a private network in Azure. VNet enables many types of Azure resources, such as Azure virtual machines (VM), to securely communicate with each other, the internet, and with on-premises networks.

  • Microsoft Azure Delegated Subnet

    Subnet delegation allows you to inject a managed service, specifically a platform-as-a-service (PaaS) service, directly into your virtual network. A delegated subnet can be a home for an externally managed service inside of your virtual network so that the external service acts as a virtual network resource, even though it is an external PaaS service.

  • Microsoft Azure VNIC

    The servers in Azure data centers have physical network interface cards (NICs). Virtual machine instances communicate using virtual NICs (VNICs) associated with the physical NICs. Each instance has a primary VNIC that's automatically created and attached during launch and is available during the instance's lifetime.

  • Microsoft Azure Route table

    Virtual route tables contain rules to route traffic from subnets to destinations outside a VNet, typically through gateways. Route tables are associated with subnets in a VNet.

  • Azure Virtual Network Gateway

    Azure Virtual Network Gateway service establishes secure, cross-premises connectivity between an Azure virtual network and an on-premises network. It allows you to create a hybrid network that spans your data center and Azure.

Oracle Cloud Infrastructure provides the following components:

  • Region

    An Oracle Cloud Infrastructure region is a localized geographic area that contains one or more data centers, called availability domains. Regions are independent of other regions, and vast distances can separate them (across countries or even continents).

  • Availability domain

    Availability domains are standalone, independent data centers within a region. The physical resources in each availability domain are isolated from the resources in the other availability domains, which provides fault tolerance. Availability domains don’t share infrastructure such as power or cooling, or the internal availability domain network. So, a failure at one availability domain shouldn't affect the other availability domains in the region.

  • Virtual cloud network (VCN) and subnets

    A VCN is a customizable, software-defined network that you set up in an Oracle Cloud Infrastructure region. Like traditional data center networks, VCNs give you control over your network environment. A VCN can have multiple non-overlapping CIDR blocks that you can change after you create the VCN. You can segment a VCN into subnets, which can be scoped to a region or to an availability domain. Each subnet consists of a contiguous range of addresses that don't overlap with the other subnets in the VCN. You can change the size of a subnet after creation. A subnet can be public or private.

  • Route table

    Virtual route tables contain rules to route traffic from subnets to destinations outside a VCN, typically through gateways.

  • Security list

    For each subnet, you can create security rules that specify the source, destination, and type of traffic that must be allowed in and out of the subnet.

  • Dynamic routing gateway (DRG)

    The DRG is a virtual router that provides a path for private network traffic between VCNs in the same region, and between a VCN and a network outside the region, such as a VCN in another Oracle Cloud Infrastructure region, an on-premises network, or a network in another cloud provider.

  • Service gateway

    The service gateway provides access from a VCN to other services, such as Oracle Cloud Infrastructure Object Storage. The traffic from the VCN to the Oracle service travels over the Oracle network fabric and does not traverse the internet.

  • Local peering gateway (LPG)

    An LPG enables you to peer one VCN with another VCN in the same region. Peering means the VCNs communicate using private IP addresses, without the traffic traversing the internet or routing through your on-premises network.

  • Network security group (NSG)

    A network security group (NSG) acts as a virtual firewall for your cloud resources. With the zero-trust security model of Oracle Cloud Infrastructure, all traffic is denied by default, and you control the network traffic inside a VCN. An NSG consists of a set of ingress and egress security rules that apply to only a specified set of VNICs in a single VCN.

  • Object storage

    Object storage provides quick access to large amounts of structured and unstructured data of any content type, including database backups, analytic data, and rich content such as images and videos. You can safely and securely store and then retrieve data directly from the internet or from within the cloud platform. You can scale storage without experiencing any degradation in performance or service reliability. Use standard storage for "hot" storage that you need to access quickly, immediately, and frequently. Use archive storage for "cold" storage that you retain for long periods of time and seldom or rarely access.

  • Data Guard

    Oracle Data Guard and Oracle Active Data Guard provide a comprehensive set of services that create, maintain, manage, and monitor one or more standby databases and that enable production Oracle databases to remain available without interruption. Oracle Data Guard maintains these standby databases as copies of the production database using in-memory replication. If the production database becomes unavailable due to a planned or an unplanned outage, Oracle Data Guard can switch any standby database to the production role, minimizing the downtime associated with the outage. Oracle Active Data Guard provides the additional ability to offload read-mostly workloads to standby databases and also provides advanced data protection features.

  • Oracle Database Autonomous Recovery Service

    Oracle Database Autonomous Recovery Service is an Oracle Cloud service that protects Oracle databases. With backup automation and enhanced data protection capabilities for OCI databases, you can offload all backup processing and storage requirements to Oracle Database Autonomous Recovery Service, thereby eliminating backup infrastructure costs and manual administration overhead.

  • Exadata Database Service

    Oracle Exadata is an enterprise database platform that runs Oracle Database workloads of any scale and criticality with high performance, availability, and security. Exadata’s scale-out design employs unique optimizations that let transaction processing, analytics, machine learning, and mixed workloads run faster and more efficiently. Consolidating diverse Oracle Database workloads on Exadata platforms in enterprise data centers, on Oracle Cloud Infrastructure (OCI), and in multicloud environments helps organizations increase operational efficiency, reduce IT administration, and lower costs.

    Oracle Exadata Database Service enables you to leverage the power of Exadata in the cloud. Oracle Exadata Database Service delivers proven Oracle Database capabilities on purpose-built, optimized Oracle Exadata infrastructure in the public cloud and on Cloud@Customer. Built-in cloud automation, elastic resource scaling, security, and fast performance for all Oracle Database workloads helps you simplify management and reduce costs.

  • Oracle Database@Azure

    Oracle Database@Azure integrates Oracle technologies, such as Oracle Exadata Database Service, Oracle Autonomous Database Serverless, Oracle Real Application Clusters (Oracle RAC), and Oracle Data Guard into the Microsoft Azure platform.

    Oracle Database@Azure is the Oracle Database service running on Oracle Cloud Infrastructure (OCI), and is collocated in Microsoft Azure data centers. The service offers feature and price parity with OCI. You can purchase the service on Azure Marketplace. Oracle Database@Azure offers the same low latency as other Azure-native services and meets mission-critical workload and cloud-native development requirements. You can manage the service on the Azure console and with Azure automation tools. The service is deployed in Azure Virtual Network (VNet) and is integrated with the Azure identity and access management system. The OCI and Oracle Database metrics and audit logs are natively available in Azure. The service requires that users have an Azure tenancy and an OCI tenancy.

Recommendations

Use the following recommendations as a starting point when performing cross-regional disaster recovery for Oracle Exadata Database Service on Oracle Database@Azure. Your requirements might differ from the architecture described here.
  • Deploy the required Exadata infrastructure in both primary and standby regions. For each Exadata instance, deploy an Exadata VM cluster in the delegated subnet of a Microsoft Azure virtual network (VNet). The Oracle Real Application Clusters (RAC) database can then be instantiated on the cluster. In the same VNet, deploy Azure Kubernetes Service (AKS) in a separate subnet. Configure Oracle Data Guard to replicate data from one Oracle Database to the other, across regions.
  • When Exadata VM clusters are created in the Oracle Database@Azure child site, each is created within its own Oracle Cloud Infrastructure virtual cloud network (VCN). Oracle Data Guard requires that the databases communicate with each other to ship redo data. The VCNs must be peered to enable this communication.

Considerations

When performing cross-regional disaster recovery for Oracle Exadata Database Service on Oracle Database@Azure, consider the following.

  • Preparation for a disaster scenario requires a comprehensive approach that considers different business requirements and availability architectures and that encompasses those considerations in an actionable, high-availability (HA), disaster-recovery (DR) plan. The scenario described here provides guidelines to help select the approach that best fits your application deployment by using a simple but effective failover for the disaster recovery configuration in your Oracle Cloud Infrastructure (OCI) and Microsoft Azure environments.
  • Use Oracle Data Guard across regions for the databases provisioned in the Exadata VM Cluster on Oracle Database@Azure by using an OCI-managed network.
  • Oracle Cloud Infrastructure is the preferred network for achieving better performance, measured by latency and throughput, and for reducing cost, because the first 10 TB of data egress per month is free.
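
To relate the free 10 TB per month to a workload, the following rough sketch converts an average redo generation rate into a monthly transfer volume and subtracts the free allowance. It assumes that redo shipping is the only cross-region traffic, uses decimal units (1 TB = 1,000,000 MB), and takes a 30-day month; the function names and the sample rates are hypothetical.

```python
FREE_EGRESS_TB_PER_MONTH = 10  # free allowance mentioned above


def monthly_redo_tb(avg_redo_mb_per_s, days=30):
    """Convert an average redo rate in MB/s into decimal TB shipped per month."""
    seconds = days * 24 * 60 * 60
    return avg_redo_mb_per_s * seconds / 1_000_000


def billable_tb(total_tb, free_tb=FREE_EGRESS_TB_PER_MONTH):
    """Return the volume above the free allowance, never less than zero."""
    return max(0.0, total_tb - free_tb)


if __name__ == "__main__":
    for rate in (2, 5):  # hypothetical average redo rates in MB/s
        total = monthly_redo_tb(rate)
        print(f"{rate} MB/s -> {total:.2f} TB/month, billable {billable_tb(total):.2f} TB")
```

The average redo rate for a real database can be taken from AWR reports or similar workload statistics; peak rates matter for throughput planning, while the average drives the monthly volume.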

Deploy

To configure the network communication between regions shown in the above architecture diagram, complete the following high-level steps.

Primary Region

  1. Create a virtual cloud network (VCN), HUB VCN Primary, in the Oracle Cloud Infrastructure (OCI) primary region.
  2. Deploy two local peering gateways (LPGs), Primary-LPG and Hub-Primary-LPG, in VCN Primary and HUB VCN Primary respectively.
  3. Establish a peer LPG connection between the LPGs for HUB VCN Primary and VCN Primary.
  4. In the VCN Primary VCN, update the default route table to route traffic for the IP address associated with the VCN Standby client subnet to use LPG peering. For example:
    10.6.0.0/24 to Primary-LPG

    Note:

    To update the default route table, you currently must create a support ticket with the title Required VCN Route Table Update permission and provide the region, tenancy OCID, VCN OCID, and Service DRG OCID.
  5. Create a dynamic routing gateway (DRG), Primary-DRG, in the Hub VCN Primary VCN.
  6. In the HUB VCN Primary VCN, create the route table, primary_hub_transit_drg, and assign the destination of the VCN Primary client subnet, a target type of LPG, and the target Hub-Primary-LPG. For example:
    10.5.0.0/24 target type: LPG, Target: Hub-Primary-LPG
  7. In the HUB VCN Primary VCN, create a second route table, primary_hub_transit_lpg, and assign the destination of the VCN Standby client subnet, a target type DRG, and a target Primary-DRG. For example:
    10.6.0.0/24 target type: DRG, Target: Primary-DRG
  8. Attach the Hub VCN Primary VCN to the DRG. Edit the DRG VCN attachments, and under advanced options, edit the VCN route table to associate it with the primary_hub_transit_drg route table. This configuration permits transit routing.
  9. Associate the primary_hub_transit_lpg route table with the Hub-Primary-LPG gateway.
  10. In the Hub VCN Primary default route table, add a route rule for the VCN Primary client subnet IP Address range to use the LPG. Add another route rule for the VCN Standby client subnet IP Address range to use the DRG. For example:
    10.5.0.0/24 LPG Hub-Primary-LPG
    10.6.0.0/24 DRG Primary-DRG
  11. For Primary-DRG, select the DRG route table, Autogenerated DRG Route Table for RPC, VC, and IPSec attachments. Add a static route to the VCN Primary client subnet IP address range that uses the Hub VCN Primary VCN with a next hop attachment type of VCN and the next hop attachment name Primary Hub attachment. For example:
    10.5.0.0/24 VCN Primary Hub attachment
  12. Use the Primary-DRG remote peering connection attachments menu to create a remote peering connection, RPC.
  13. In the VCN Primary client subnet, update the network security group (NSG) to create a security rule to allow ingress for TCP port 1521. Optionally, you can add SSH port 22 for direct SSH access to the database servers.

    Note:

    For a more precise configuration, disable the import route distribution of the Autogenerated DRG Route Table for RPC, VC, and IPSec attachments route table. For Autogenerated DRG Route Table for VCN attachments, create and assign a new import route distribution including only the required RPC attachment.
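
The route rules from the primary region steps can be modeled in a few lines to check that traffic takes the intended hops. The sketch below is only a simulation with Python's ipaddress module, not an OCI API call: each route table is a dictionary of the example rules above, lookup performs a longest-prefix match, and the host addresses 10.5.0.10 and 10.6.0.10 are hypothetical.

```python
import ipaddress


def lookup(route_table, destination_ip):
    """Longest-prefix match: return the target of the most specific matching rule."""
    ip = ipaddress.ip_address(destination_ip)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in route_table.items()
        if ip in ipaddress.ip_network(cidr)
    ]
    if not matches:
        return None
    return max(matches, key=lambda match: match[0].prefixlen)[1]


# Example rules from the primary region steps.
VCN_PRIMARY_DEFAULT = {"10.6.0.0/24": "Primary-LPG"}          # step 4
PRIMARY_HUB_TRANSIT_DRG = {"10.5.0.0/24": "Hub-Primary-LPG"}  # step 6, used by the DRG attachment
PRIMARY_HUB_TRANSIT_LPG = {"10.6.0.0/24": "Primary-DRG"}      # step 7, used by Hub-Primary-LPG
PRIMARY_DRG_RPC_TABLE = {"10.5.0.0/24": "Primary Hub attachment"}  # step 11


def outbound_path(destination_ip):
    """Hops for traffic leaving the primary client subnet toward the standby region."""
    return [
        lookup(VCN_PRIMARY_DEFAULT, destination_ip),
        lookup(PRIMARY_HUB_TRANSIT_LPG, destination_ip),
    ]


def inbound_path(destination_ip):
    """Hops for traffic arriving over the remote peering connection."""
    return [
        lookup(PRIMARY_DRG_RPC_TABLE, destination_ip),
        lookup(PRIMARY_HUB_TRANSIT_DRG, destination_ip),
    ]


if __name__ == "__main__":
    print("to standby:", outbound_path("10.6.0.10"))
    print("from standby:", inbound_path("10.5.0.10"))
```

A None entry in either path would indicate a missing route rule for that destination, which is the most common reason redo transport fails after peering.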

Standby Region

  1. Create the VCN, HUB VCN Standby, in the OCI standby region.
  2. Deploy two LPGs, Standby-LPG and Hub-Standby-LPG, in the VCN Standby and the HUB VCN Standby VCNs respectively.
  3. Establish a peer LPG connection between LPGs for VCN Standby and HUB VCN Standby.
  4. In the VCN Standby VCN, update the default route table to route traffic for the IP address associated with the VCN Primary client subnet to use LPG peering. For example:
    10.5.0.0/24 to Standby-LPG

    Note:

    To update the default route table, you currently must create a support ticket with the title Required VCN Route Table Update permission and provide the region, tenancy OCID, VCN OCID, and Service DRG OCID.
  5. Create a DRG, Standby-DRG, in the Hub VCN Standby VCN.
  6. In the HUB VCN Standby VCN, create a route table, standby_hub_transit_drg, and assign the destination of the VCN Standby client subnet, a target type of LPG, and a target Hub-Standby-LPG. For example:
    10.6.0.0/24 target type: LPG, Target: Hub-Standby-LPG
  7. In the HUB VCN Standby VCN, create a second route table, standby_hub_transit_lpg, and assign the destination of the VCN Primary client subnet, a target type DRG, and a target Standby-DRG. For example:
    10.5.0.0/24 target type: DRG, Target: Standby-DRG
  8. Attach the Hub VCN Standby VCN to the DRG. Edit the DRG VCN attachments, and under advanced options, edit the VCN route table to associate it with the standby_hub_transit_drg route table. This configuration permits transit routing.
  9. In the Hub VCN Standby default route table, add route rules for the VCN Standby client subnet IP Address range to use the LPG and for the VCN Primary client subnet IP Address range to use the DRG. For example:
    10.6.0.0/24 LPG Hub-Standby-LPG
    10.5.0.0/24 DRG Standby-DRG
  10. Associate the route table, standby_hub_transit_lpg, with the Hub-Standby-LPG gateway.
  11. For Standby-DRG, select the DRG route table, Autogenerated DRG Route Table for RPC, VC, and IPSec attachments. Add a static route to the VCN Standby client subnet IP address range that uses the Hub VCN Standby VCN with a next hop attachment type of VCN and the next hop attachment name Standby Hub attachment. For example:
    10.6.0.0/24 VCN Standby Hub attachment
  12. Use the Standby-DRG remote peering connection attachments menu to create a remote peering connection, RPC.
  13. Select the remote peering connection, select Establish Connection, and provide the Primary-DRG OCID. The peering status becomes peered. Both regions are connected.
  14. In the VCN Standby client subnet, update the NSG to create a security rule to allow ingress for TCP port 1521. Optionally, you can add SSH port 22 for direct SSH access to the database servers.
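
After both regions show a peered status and the NSG rules are in place, it can be useful to confirm that the listener port is actually reachable across regions. The following minimal sketch uses only Python's standard socket module; the function name port_is_reachable is illustrative, and any address passed to it (for example, a database host in 10.6.0.0/24) must come from your own environment.

```python
import socket


def port_is_reachable(host, port=1521, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unroutable destinations.
        return False
```

Run from a database server in the primary client subnet, a call such as port_is_reachable("<standby host IP>") returning False usually points to a missing route rule or NSG ingress rule rather than to a database problem. The check only proves TCP reachability; it does not verify that the listener serves the expected database service.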

Data Guard Association

  1. To enable Oracle Data Guard or Oracle Active Data Guard for the Oracle Database, on the Oracle Database details page, click Data Guard Associations, then click Enable Data Guard.
  2. On the Enable Data Guard page:
    1. Select the standby region.
    2. Select the standby availability domain mapped to the Azure AZ.
    3. Select the standby Exadata infrastructure.
    4. Select the desired standby VM cluster.
    5. Choose Oracle Data Guard or Oracle Active Data Guard. MAA recommends Oracle Active Data Guard for auto block repair of data corruptions and the ability to offload reporting.
    6. For cross-region Oracle Data Guard associations, only the maximum performance protection mode is supported.
    7. Select an existing database home or create one. It's recommended to use the same database software image of the primary database for the standby database home, so that both have the same patches available.
    8. Enter the password for the SYS user and enable Oracle Data Guard.

    After Oracle Data Guard is enabled, the standby database will be listed in the Data Guard Associations section.

  3. (Optional) Enable automatic failover (Fast-Start Failover) to reduce the recovery time in case of failures by installing Data Guard Observer on a separate VM, preferably in a separate location or in the application network.
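
Once the association is enabled, transport and apply lag on the standby are worth monitoring. The sketch below assumes that you run the query in LAG_QUERY against the standby database with a client of your choice (for example, a python-oracledb cursor) and pass the resulting (name, value) rows to lag_exceeds. V$DATAGUARD_STATS reports lag as interval strings such as "+00 00:00:05"; the helper names and the 30-second threshold in the usage example are hypothetical.

```python
import re
from datetime import timedelta

# Query to run on the standby database; execution is left to the caller.
LAG_QUERY = (
    "SELECT name, value FROM v$dataguard_stats "
    "WHERE name IN ('transport lag', 'apply lag')"
)

# Interval format "+DD HH:MI:SS" as reported in the VALUE column.
_INTERVAL = re.compile(r"^\+(\d+) (\d{2}):(\d{2}):(\d{2})$")


def parse_lag(value):
    """Convert an interval string such as '+00 00:01:30' into a timedelta."""
    match = _INTERVAL.match(value.strip())
    if match is None:
        raise ValueError(f"unexpected lag format: {value!r}")
    days, hours, minutes, seconds = map(int, match.groups())
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


def lag_exceeds(rows, threshold):
    """Return the statistic names whose lag is greater than the threshold.

    Rows with a NULL (None) value are skipped because the lag is unknown.
    """
    return [
        name
        for name, value in rows
        if value is not None and parse_lag(value) > threshold
    ]


if __name__ == "__main__":
    sample = [("transport lag", "+00 00:00:02"), ("apply lag", "+00 00:01:30")]
    print(lag_exceeds(sample, timedelta(seconds=30)))
```

Because only the maximum performance protection mode is supported across regions, some transport lag is expected; a threshold should reflect the recovery point objective of the application rather than zero.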

Acknowledgments

  • Authors: Ricardo Anda, Srikanth Bolisetty, Julien Silverston, Andy Steinorth
  • Contributors: Tammy Bednar, Wei Han, Glen Hawkins, Gavin Parish, Sinan Petrus Toma, Lawrence To, Thomas Van Buggenhout, Robert Lies