Learn About High Availability in the Cloud

You need your applications in the cloud to be available 24/7, and their workloads must keep running even when parts of the cloud infrastructure suffer an outage. Designing a highly available service or application helps maximize uptime and accessibility.

About High Availability

To design a high availability architecture, consider three key elements: redundancy, monitoring, and failover.

  • Redundancy means that multiple components can perform the same task. The problem of a single point of failure is eliminated because redundant components can take over a task performed by a component that has failed.
  • Monitoring means checking whether or not a component is working properly.
  • Failover is the process by which a secondary component becomes primary when the primary component fails.
The best practices introduced here focus on these three key elements. Although high availability can be achieved at many different levels, including the application level and the cloud infrastructure level, here we will focus on the cloud infrastructure level.
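
The three elements can be seen working together in a small simulation. The following is a minimal Python sketch, not an Oracle Cloud API: `Component`, `FailoverGroup`, and the `probe` callables are hypothetical stand-ins for real components and their health checks. Redundancy is the list of interchangeable components, monitoring is the probe of the current primary, and failover is the promotion of the first healthy secondary.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Component:
    """A hypothetical redundant component; `probe` stands in for a real health check."""
    name: str
    probe: Callable[[], bool]


class FailoverGroup:
    """Redundancy: several components that can all perform the same task."""

    def __init__(self, components: List[Component]) -> None:
        if not components:
            raise ValueError("at least one component is required")
        self.components = list(components)
        self.primary = self.components[0]

    def monitor(self) -> bool:
        """Monitoring: report whether the current primary is working properly."""
        return self.primary.probe()

    def check_and_failover(self) -> Optional[Component]:
        """Failover: if the primary fails, promote the first healthy secondary.

        Returns the (possibly new) primary, or None if no component is healthy.
        """
        if self.monitor():
            return self.primary
        for candidate in self.components:
            if candidate is not self.primary and candidate.probe():
                self.primary = candidate
                return self.primary
        return None


# Simulated health state; n=n binds each lambda to its own component name.
state = {"web-1": True, "web-2": True}
group = FailoverGroup([Component(n, lambda n=n: state[n]) for n in state])
state["web-1"] = False  # simulate an outage of the primary
print(group.check_and_failover().name)  # web-2
```

In a real deployment the probe would be an actual health check (for example, an HTTP request to the instance) and the check would run on a schedule rather than once.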

About the High Availability Capabilities of Oracle Cloud

An Oracle Cloud Infrastructure region is a localized geographic area composed of one or more availability domains, each of which contains three fault domains.

An availability domain is one or more data centers located within a region. Availability domains are isolated from each other, fault tolerant, and unlikely to fail simultaneously. Because availability domains do not share physical infrastructure, such as power or cooling, or the internal availability domain network, a failure that impacts one availability domain is unlikely to impact the availability of others.
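
To see why isolation matters, assume (as an idealization) that availability domains fail independently and that each is available with probability a. A service replicated across n availability domains is then unavailable only when all n replicas are down at once, so its availability is 1 - (1 - a)^n. The sketch below computes this; the 99% per-domain figure is a hypothetical number for illustration, not an Oracle service level.

```python
def combined_availability(per_domain: float, domains: int) -> float:
    """Probability that at least one of `domains` replicas is up.

    Assumes each replica is up with probability `per_domain` and that
    failures in different availability domains are independent.
    """
    if not 0.0 <= per_domain <= 1.0:
        raise ValueError("per_domain must be between 0 and 1")
    if domains < 1:
        raise ValueError("domains must be at least 1")
    # The service is down only if every replica is down at the same time.
    return 1.0 - (1.0 - per_domain) ** domains


# Hypothetical figure: 99% availability per availability domain.
for n in (1, 2, 3):
    print(n, round(combined_availability(0.99, n), 6))
```

Under these assumptions, one, two, and three domains give roughly 0.99, 0.9999, and 0.999999. Real failures are rarely perfectly independent, so treat the result as an upper bound on what redundancy buys.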

A fault domain is a grouping of hardware and infrastructure within an availability domain. Each availability domain contains three fault domains. Fault domains let you distribute your instances so that they are not on the same physical hardware within a single availability domain. As a result, an unexpected hardware failure or Compute hardware maintenance that affects one fault domain does not affect instances in other fault domains. You can optionally specify the fault domain for a new instance at launch time, or you can let the system select one for you.
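
When you choose fault domains yourself, a simple policy is to assign instances round-robin so that each fault domain holds about the same number. The following Python sketch only plans the placement and does not call any Oracle Cloud API; the `FAULT-DOMAIN-n` labels are placeholders modeled on OCI fault-domain names, and `spread_across_fault_domains` is a hypothetical helper.

```python
from itertools import cycle
from typing import Dict, Sequence

# Placeholder labels modeled on OCI fault-domain names; not fetched from any API.
FAULT_DOMAINS = ("FAULT-DOMAIN-1", "FAULT-DOMAIN-2", "FAULT-DOMAIN-3")


def spread_across_fault_domains(
    instances: Sequence[str],
    fault_domains: Sequence[str] = FAULT_DOMAINS,
) -> Dict[str, str]:
    """Assign instances to fault domains round-robin, so the number of
    instances per fault domain differs by at most one."""
    if not fault_domains:
        raise ValueError("at least one fault domain is required")
    return dict(zip(instances, cycle(fault_domains)))


plan = spread_across_fault_domains(["app-1", "app-2", "app-3", "app-4"])
for instance, fd in plan.items():
    print(instance, "->", fd)
```

With four instances and three fault domains, the fourth instance wraps around to the first fault domain, so losing any single fault domain takes down at most two of the four instances.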

All the availability domains in a region are connected to each other by a low-latency, high-bandwidth network. This predictable, encrypted interconnection between availability domains provides the building blocks for both high availability and disaster recovery.

Oracle Cloud Infrastructure resources are either specific to a region, such as a virtual cloud network, or specific to an availability domain, such as a Compute instance. When you configure cloud services that are specific to an availability domain, it is important to use multiple availability domains or fault domains to ensure high availability and to protect against resource failure. By creating redundant Compute instances in other availability domains or fault domains, you can keep an issue that affects the primary Compute instance or its domain from impacting your applications. You can design solutions to span multiple regions, multiple availability domains, or multiple fault domains, depending on the class of failures you want to protect against.
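
The choice between fault domains, availability domains, and regions can be checked mechanically: a set of replicas survives the loss of any single unit at a given scope only if the replicas occupy at least two distinct units at that scope. The Python sketch below applies this rule; `Placement`, `tolerated_failures`, and the region, availability-domain, and fault-domain labels are hypothetical, and the check considers only one failure at a time.

```python
from typing import Dict, List, NamedTuple


class Placement(NamedTuple):
    """Where one replica runs; the labels are illustrative placeholders."""
    region: str
    availability_domain: str
    fault_domain: str


def tolerated_failures(replicas: List[Placement]) -> Dict[str, bool]:
    """For each failure scope, report whether the replicas survive the loss
    of any single unit at that scope (they must occupy at least two units)."""
    scopes = {
        # A fault domain is identified within its availability domain and region.
        "fault_domain": {(p.region, p.availability_domain, p.fault_domain) for p in replicas},
        "availability_domain": {(p.region, p.availability_domain) for p in replicas},
        "region": {p.region for p in replicas},
    }
    return {scope: len(units) >= 2 for scope, units in scopes.items()}


# Two replicas in different fault domains of the same availability domain.
same_ad = [
    Placement("region-a", "AD-1", "FD-1"),
    Placement("region-a", "AD-1", "FD-2"),
]
print(tolerated_failures(same_ad))
```

For this placement, the replicas tolerate a fault-domain failure but not the loss of the availability domain or the region; moving one replica to another availability domain or region extends the protection to that class of failure.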