Plan High Availability for Network Resources

One of the first steps in working with Oracle Cloud Infrastructure is to set up a virtual cloud network (VCN) for your cloud resources. Ensuring the high availability of this network is one of the most important considerations in your architecture design.

To plan for high availability of your network resources, the key design strategies you should consider are:
  • Determine the right size of your network's subnets.
  • Plan high availability configurations for these key components: Load Balancers, IPSec VPN Connections, and FastConnect Circuits.
This article describes these strategies.

Determine the Right Size of Subnets

A subnet is a subdivision of a cloud network. Establishing high availability of your network requires sizing this resource correctly.

Each subnet in a VCN consists of a contiguous range of IP addresses that do not overlap with other subnets in the VCN (for example, 172.16.1.0/24). The first two IP addresses and the last one in the subnet's CIDR are reserved by the Oracle Cloud Infrastructure Networking service. You can't change the size of the subnet after it is created, so it's important to determine the size you need before creating any subnets. Consider the future growth of your workloads and leave sufficient capacity to meet high availability requirements, such as the need to set up standby Compute instances.
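As a rough sizing aid, the following Python sketch computes how many addresses remain usable in a subnet after the three reserved addresses are subtracted, and picks a prefix length that leaves headroom for growth. The function names, the 50% default headroom, and the /16 to /30 search range are assumptions made for illustration, not values prescribed by this article.

```python
import ipaddress
import math

# Per the text: the first two addresses and the last one in the CIDR are reserved.
RESERVED_PER_SUBNET = 3


def usable_addresses(cidr: str) -> int:
    """Return the number of addresses left for your resources in the given subnet CIDR."""
    network = ipaddress.ip_network(cidr, strict=True)
    return max(network.num_addresses - RESERVED_PER_SUBNET, 0)


def smallest_prefix_for(hosts_needed: int, headroom: float = 0.5) -> int:
    """Return the longest IPv4 prefix length that fits hosts_needed plus headroom.

    headroom is an illustrative growth allowance (0.5 means 50% spare capacity),
    for example to leave room for standby Compute instances.
    """
    target = math.ceil(hosts_needed * (1 + headroom))
    for prefix in range(30, 15, -1):  # assumed search range: /30 down to /16
        if 2 ** (32 - prefix) - RESERVED_PER_SUBNET >= target:
            return prefix
    raise ValueError("requirement does not fit in a single /16 subnet")


if __name__ == "__main__":
    print(usable_addresses("172.16.1.0/24"))
    print(smallest_prefix_for(100))
```

Because a subnet cannot be resized after creation, it is usually safer to round toward the larger block when the result is borderline.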

Plan High Availability for Load Balancers

Oracle Cloud Infrastructure Load Balancing provides automated traffic distribution from one entry point to multiple servers reachable from your VCN. The service offers a load balancer with your choice of a public or private IP address and provisioned bandwidth.

The Load Balancing service improves resource utilization, facilitates scaling, and helps ensure high availability. It supports routing incoming requests to various backend sets based on virtual hostname, path route rules, or a combination of both.

To accept traffic from the internet, you create a public load balancer. The service assigns it a public IP address that serves as the entry point for incoming traffic. You can associate the public IP address with a friendly DNS name through any DNS vendor.

A public load balancer is regional in scope. It is inherently highly available across availability domains. In a region that has a single availability domain, the load balancer nodes are distributed across fault domains. To achieve high availability for your systems, you can put the systems behind a public load balancer. For instance, you can place your web server VMs in backend sets behind a public load balancer, as illustrated in the following diagram:

Description of the illustration public-lb.png

Note:

The architecture shows multiple availability domains (ADs). For a region that has a single AD, adjust the architecture to distribute your resources across the fault domains within the AD.

To isolate your load balancer from the internet and simplify your security posture, you create a private load balancer. The Load Balancing service assigns it a private IP address that serves as the entry point for incoming traffic.

When you create a private load balancer, the service requires only one subnet to host both the primary and standby load balancers. The load balancer can be regional or AD-specific, depending on the scope of its host subnet.

Description of the illustration pvt-lb.png

Note:

The architecture shows multiple availability domains (ADs). For a region that has a single AD, adjust the architecture to distribute your resources across the fault domains within the AD.

To provide high availability across availability domains, you can configure multiple private load balancers on Oracle Cloud Infrastructure and use on-premises or private DNS servers to set up a round-robin DNS configuration with the IP addresses of the private load balancers. The following is an overview of this process:
  1. Deploy two private load balancers, one in each availability domain.
  2. Configure two custom DNS VMs in the VCN.
  3. Modify the VCN Default DHCP options to use a Custom DNS Resolver and set the DNS servers to the IP addresses of the DNS VMs.
  4. Add a new round-robin DNS zone entry for the private load balancer FQDN with a low TTL.
  5. Add two A records with the IP addresses of the two private load balancers.
  6. Use the FQDN of the private load balancer when accessing the private load balancer.
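To make the effect of steps 4 and 5 concrete, the following Python sketch simulates how a round-robin DNS zone rotates successive lookups across the two A records. The IP addresses and the function name are hypothetical, and the model is a simplification: real DNS servers typically rotate the order of the full answer set, and clients cache an answer until the TTL expires.

```python
from itertools import cycle


def round_robin_answers(a_records, queries):
    """Return the address handed out first for each of `queries` successive lookups."""
    if not a_records:
        raise ValueError("at least one A record is required")
    rotation = cycle(a_records)
    return [next(rotation) for _ in range(queries)]


if __name__ == "__main__":
    # Hypothetical private IP addresses of the two private load balancers,
    # one in each availability domain.
    records = ["10.0.1.10", "10.0.2.10"]
    for answer in round_robin_answers(records, 4):
        print(answer)
```

A low TTL matters because round-robin DNS has no health checking of its own: if one load balancer becomes unavailable and its A record is removed, clients keep using the cached address until the TTL expires.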

Understand FastConnect and VPN High Availability Design

Designing your network for redundancy so that it meets the requirements of the Oracle Cloud Infrastructure IPSec VPN and FastConnect service level agreements gives you the highly available, fault-tolerant network connections that are key to well-architected systems.

An organization’s business-availability and application requirements help determine the most appropriate configuration when designing remote connections. Generally, however, you should consider using redundant hardware and network service providers between your location and Oracle’s data centers. The most robust option is to use multiple FastConnect connections with circuits from different network service providers. To achieve high availability for your network, we recommend the following best practices:
  • Plan for regular maintenance by Oracle, your provider, or your own organization.
  • Avoid single points of failure, even if you are planning to use multiple interfaces for availability. High availability connections require redundant hardware, even when connecting from the same physical location.
  • Consider a dual provider approach to ensure network diversity when selecting FastConnect providers.
  • Provision sufficient network capacity to ensure that the failure of one network connection doesn’t overwhelm and degrade redundant connections.
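The capacity guidance in the last bullet can be expressed as a simple check: after the largest single link fails, the remaining links should still carry peak demand. The following Python sketch is a minimal illustration; the function name and the example figures are assumptions, not Oracle sizing rules.

```python
def survives_single_failure(link_capacities_gbps, peak_demand_gbps):
    """Return True if the remaining links can carry peak demand after any one link fails.

    The worst case is losing the largest link, so only that case is checked.
    A design with fewer than two links has no redundancy at all.
    """
    if len(link_capacities_gbps) < 2:
        return False
    remaining = sum(link_capacities_gbps) - max(link_capacities_gbps)
    return remaining >= peak_demand_gbps


if __name__ == "__main__":
    # Two hypothetical 10 Gbps connections carrying 8 Gbps at peak.
    print(survives_single_failure([10, 10], 8))
    # The same links at 12 Gbps peak: losing one would overload the other.
    print(survives_single_failure([10, 10], 12))
```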

Plan High Availability for IPSec VPN Connections

You can choose to implement IPSec VPN connections to connect your data center to Oracle Cloud Infrastructure. An IPSec VPN connection is easy to set up and cost-effective.

To enable redundancy, each Oracle Cloud Infrastructure dynamic routing gateway (DRG) has multiple VPN endpoints so that each IPSec VPN connection consists of multiple redundant IPSec tunnels that use static routes to route traffic. To ensure high availability, you must set up VPN connections within your internal network to use either path when needed, as illustrated in the following diagram:

Description of the illustration vpn-redundancy.png

If your data centers span multiple geographical locations, we recommend using a broad CIDR (0.0.0.0/0) as a static route in addition to the CIDR of the specific geographical location. This broad CIDR provides high availability and flexibility to your network design.

For instance, the following diagram shows two networks in separate geographical areas that each connect to Oracle Cloud Infrastructure. Each area has a single on-premises router, so two IPSec VPN connections can be created. Note that each IPSec VPN connection has two static routes: one for the CIDR of the particular geographical area, and a broad 0.0.0.0/0 static route.

Description of the illustration redundancy-multiple-onprem-network.png

In one scenario, the CPE 1 router in the preceding diagram goes down. If Subnet 1 and Subnet 2 can communicate with each other, the VCN is still able to access the systems in Subnet 1 because of the 0.0.0.0/0 static route that goes to CPE 2. The following diagram illustrates this scenario:

Description of the illustration vpn-redundancy-multiple-onprem-networks-failover.png
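The failover behavior in this scenario can be sketched in Python as a longest-prefix-match lookup over the static routes of both IPSec VPN connections. This is a simplified model under stated assumptions: the CIDRs are hypothetical, the function name is invented for illustration, and routes through an unreachable CPE are assumed to be withdrawn from consideration. It is not a description of how the DRG is implemented.

```python
import ipaddress


def pick_next_hop(destination, static_routes, down=frozenset()):
    """Return the CPE chosen for `destination`, or None if no usable route matches.

    static_routes is a list of (cidr, cpe_name) pairs. Routes whose CPE is in
    `down` are ignored; among the rest, the most specific matching CIDR wins.
    """
    dest = ipaddress.ip_address(destination)
    candidates = []
    for cidr, cpe in static_routes:
        network = ipaddress.ip_network(cidr)
        if cpe not in down and dest in network:
            candidates.append((network.prefixlen, cpe))
    if not candidates:
        return None
    return max(candidates, key=lambda route: route[0])[1]


# Hypothetical CIDRs: 10.1.0.0/16 stands in for Subnet 1, 10.2.0.0/16 for Subnet 2.
ROUTES = [
    ("10.1.0.0/16", "CPE 1"),
    ("0.0.0.0/0", "CPE 1"),
    ("10.2.0.0/16", "CPE 2"),
    ("0.0.0.0/0", "CPE 2"),
]

if __name__ == "__main__":
    print(pick_next_hop("10.1.0.5", ROUTES))                  # normal operation
    print(pick_next_hop("10.1.0.5", ROUTES, down={"CPE 1"}))  # CPE 1 has failed
```

With CPE 1 down, the only remaining match for an address in Subnet 1 is the broad 0.0.0.0/0 route through CPE 2, which works only if Subnet 1 and Subnet 2 can communicate with each other on premises.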

In another scenario, you add a new geographical area with Subnet 3 and connect it to Subnet 2. You would add a route rule to your VCN’s route table for Subnet 3 so that the VCN can reach systems in Subnet 3 without creating a new VPN connection because of the 0.0.0.0/0 static route that goes to CPE 2. The following diagram illustrates this scenario:

Description of the illustration vpn-redundancy-additional-onprem-network.png

Plan High Availability for FastConnect Circuits

Oracle Cloud Infrastructure FastConnect provides an easy way to create a dedicated, private connection between your data center and Oracle Cloud Infrastructure. FastConnect provides higher-bandwidth options and a more reliable and consistent networking experience compared to internet-based connections.

With FastConnect, you can choose to use private peering, public peering, or both.
  • Use private peering to extend your existing infrastructure into a virtual cloud network (VCN) in Oracle Cloud Infrastructure (for example, to implement a hybrid cloud, or in a lift-and-shift scenario). Communication across the connection is with IPv4 private addresses (typically RFC 1918).
  • Use public peering to access public services in Oracle Cloud Infrastructure without using the internet (for example, to access the Oracle Cloud Infrastructure Console and APIs, or public load balancers in your VCN). Communication across the connection is with IPv4 public IP addresses. Without FastConnect, the traffic destined for public IP addresses would be routed over the internet. With FastConnect, that traffic goes over your private physical connection.

You can either connect directly to Oracle Cloud Infrastructure routers in provider points-of-presence (POPs) or use one of Oracle’s many partners to connect from POPs around the world to your Oracle Cloud Infrastructure Networking resources. Oracle provides features that allow you to build fault-tolerant connections, including multiple POPs per region and multiple FastConnect routers per POP.

To avoid a single point of failure for FastConnect, consider the following redundancy options:
  • Multiple FastConnect locations within each metro area
  • Multiple routers in each FastConnect location
  • Multiple physical circuits in each FastConnect location
Oracle handles the redundancy of the routers and physical circuits in the FastConnect locations. In your network design with FastConnect, we recommend considering the following redundancy configurations for your high availability requirements:
  • Availability domain redundancy: Connect to any FastConnect location and access services located in any availability domain within a region. This configuration provides availability domain resiliency via multiple POPs per region. Peering connections terminate on routers in the POP.
  • Data center location redundancy: Connect at two different FastConnect locations per region.
  • Router redundancy: Connect to two different routers per FastConnect location.
  • Circuit redundancy: Have multiple physical connections at any of the FastConnect locations. Each of these circuits can have multiple physical links in an aggregated interface/LAG, which adds another level of redundancy.
  • Partner/provider redundancy: Connect to the FastConnect locations by using single or multiple partners.
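One way to review a planned design against these redundancy dimensions is to list the attributes that every circuit has in common, because each shared attribute is a potential single point of failure. The following Python sketch assumes a hypothetical `Circuit` record with location, router, and provider fields; the names and values are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Circuit:
    """Hypothetical description of one FastConnect circuit in a design."""
    name: str
    location: str   # FastConnect location (data center)
    router: str     # router the circuit terminates on
    provider: str   # partner or provider that delivers the circuit


def shared_failure_domains(circuits):
    """Return the attributes for which all circuits share a single value."""
    shared = []
    for attribute in ("location", "router", "provider"):
        values = {getattr(circuit, attribute) for circuit in circuits}
        if len(values) < 2:
            shared.append(attribute)
    return shared


if __name__ == "__main__":
    design = [
        Circuit("vc-1", "location-a", "location-a/router-1", "provider-x"),
        Circuit("vc-2", "location-a", "location-a/router-2", "provider-x"),
    ]
    # Router redundancy is present, but location and provider are shared.
    print(shared_failure_domains(design))
```

Whether a shared attribute is acceptable depends on your availability requirements; for example, a single provider may be fine if that provider supplies diverse links.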
Based on the location of the on-premises data center, you can establish a FastConnect connection in one of the following ways:
  • Colocation (port speed of 10 Gbps): By colocating with Oracle in a FastConnect location
  • Oracle provider (port speeds in 1-Gbps and 10-Gbps increments): By connecting to an Oracle provider
In a colocation scenario, a cross-connect is the physical cable connecting your existing network to Oracle in the FastConnect location. When you are provisioning your FastConnect service, we recommend that you set up at least two cross-connects. Each cross-connect should connect to a different router, so that a failure in one router does not impact your connection to Oracle Cloud Infrastructure resources. After making the first cross-connect, you can request that the second one be provisioned on a different Oracle FastConnect router than the first one. You should provision new virtual circuits on both redundant links, which ensures connectivity between your on-premises network and Oracle Cloud Infrastructure VCNs if one router fails.

For the Oracle provider scenario, we recommend that you set up redundant circuits at two different FastConnect locations, using either the same provider or different providers. With this configuration, you have redundancy at both the circuit and the data center levels. The following diagram illustrates a FastConnect connection with two virtual circuits and two different FastConnect locations:

Description of the illustration fastconnect-multiple-fc-locations.png

Oracle’s FastConnect partners have redundant links to the Oracle network. As a customer of the partner, you should have redundant links to the partner’s network. These connections should be on different routers, both in your network and in the partner’s network. When you provision virtual circuits, provision them across your multiple provider links.

This diagram illustrates these redundant connections:

Description of the illustration fastconnect-dual-vc.png

Some additional configuration strategies you should consider are to:
  • Avoid Impact During Planned Maintenance

    When you want to perform maintenance on one of your routers, you can configure the Border Gateway Protocol (BGP) local preference on the routes learned over each virtual circuit so that the local preference is higher on the router that will stay in service. BGP local preference influences the path that outbound traffic takes from your on-premises network.

    You can modify traffic from Oracle to your network by using BGP AS prepending. On the router where the maintenance will be performed, prepend your local BGP AS number. Doing so causes the Oracle Cloud network to prefer the FastConnect virtual circuit that has the shorter AS path.

    After you modify the BGP local preference and AS prepending, monitor your router’s virtual circuit interface counters and verify that the in and out packet counter rates are very low. The only traffic remaining on the link should be BGP protocol traffic.

  • Continuously Test Redundant Paths

    During normal operation, we recommend using all available paths between your on-premises network and the Oracle Cloud. Doing so ensures that if a failure occurs, your redundant path is already known to work. By contrast, an active/backup design requires you to trust that an untested backup path will work during a failure. For this reason, you should consider using equal BGP local preference and equal BGP AS path length on all paths.
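The two strategies above rely on the same selection logic: a higher local preference wins, and among equal preferences the shorter AS path wins. The following Python sketch models only those two steps and returns every path that ties for best. The `Path` record, the router names, and the private AS number 64512 are assumptions for illustration; real BGP implementations apply further tie-breakers that are omitted here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    """Hypothetical view of one route as seen by the side making the decision."""
    via: str          # router or virtual circuit the route was learned through
    local_pref: int   # set by the receiving side; higher is preferred
    as_path: tuple    # AS numbers in the path; shorter is preferred


def best_paths(paths):
    """Return the `via` of every path that ties for best (simplified BGP decision)."""
    top_pref = max(path.local_pref for path in paths)
    preferred = [path for path in paths if path.local_pref == top_pref]
    shortest = min(len(path.as_path) for path in preferred)
    return [path.via for path in preferred if len(path.as_path) == shortest]


if __name__ == "__main__":
    # Normal operation: equal local preference and AS path length, both paths in use.
    normal = [Path("router-a", 100, (64512,)), Path("router-b", 100, (64512,))]
    print(best_paths(normal))

    # Maintenance on router-a, as seen from Oracle: router-a prepends its AS number.
    prepended = [Path("router-a", 100, (64512, 64512, 64512)), Path("router-b", 100, (64512,))]
    print(best_paths(prepended))
```

Local preference is evaluated inside your own network for outbound traffic, while AS path length is what the Oracle side compares for traffic toward you, so draining a router before maintenance involves both adjustments.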

Use Both IPSec VPN and FastConnect

To have an additional level of redundancy, you can set up both IPSec VPN and FastConnect to connect your on-premises data centers to Oracle Cloud Infrastructure.

When you set up both an IPSec VPN connection and FastConnect virtual circuits to the same DRG, remember that the IPSec VPN uses static routes, whereas FastConnect uses BGP. Oracle Cloud Infrastructure advertises a route for each of your VCN’s subnets over the FastConnect virtual circuit BGP session. It also overrides the default route selection behavior to prefer BGP routes over static routes if a static route overlaps with a route advertised by your on-premises network. The following diagram illustrates this configuration:

Description of the illustration vpn_fastconnect.png