Shared Responsibility Model for Resiliency

Resiliency in the cloud is a shared responsibility between you (the user) and Oracle. For you to build resilient workload architectures in Oracle Cloud Infrastructure (OCI), you must understand your high availability and disaster recovery requirements and responsibilities.

Oracle's Responsibility: “Resiliency of the Cloud”

OCI is responsible for the "resiliency of the cloud." OCI provides a robust, highly available and resilient global cloud infrastructure consisting of data centers, network, physical hardware, and software designed to minimize downtime and ensure that applications remain accessible and functional even in the event of failures. OCI offers end-to-end service level agreements (SLAs) covering performance, availability, and manageability of these services.

OCI is physically hosted in multiple regions. The regions are independent and geographically dispersed within a country, between countries, or among continents. Each region consists of one or more availability domains (ADs): a region with one AD is called a Single-AD region, and a region with more than one AD is called a Multi-AD region. Each AD is an independent data center, and in Multi-AD regions, each AD is isolated from the others to help reduce the risk of a failure in one AD affecting the others.

The ADs are connected by a secure, low-latency, high-bandwidth network, which lets you build resilient, highly available solutions across multiple ADs (where available). Additionally, every AD contains three fault domains (FDs). Each FD is a grouping of hardware and infrastructure distinct from the other FDs in the same AD. FDs let you distribute resources so that they don't depend on the same physical hardware within a single AD. As a result, hardware failures or maintenance events that affect one FD don't affect the resources in other FDs.
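As a rough illustration of spreading resources so that they don't share physical hardware, the sketch below assigns instances round-robin across ADs and FDs. The function name `spread_instances`, the instance names, and the AD and FD labels are hypothetical placeholders chosen for this example, not OCI API identifiers.

```python
from itertools import cycle


def spread_instances(instance_names, availability_domains, fault_domains_per_ad=3):
    """Assign each instance to an (AD, FD) slot in round-robin order.

    All names here are illustrative; this models placement only and does
    not call any OCI service.
    """
    fault_domains = [f"FAULT-DOMAIN-{i}" for i in range(1, fault_domains_per_ad + 1)]
    # Order the slots so that consecutive instances land in different ADs
    # first, and move to the next FD once every AD has been used.
    slots = cycle([(ad, fd) for fd in fault_domains for ad in availability_domains])
    return {name: slot for name, slot in zip(instance_names, slots)}


placement = spread_instances(["web-1", "web-2", "web-3", "web-4"], ["AD-1", "AD-2"])
for name, (ad, fd) in placement.items():
    print(name, ad, fd)
```

In a Single-AD region, passing one AD spreads the instances across the three FDs of that AD instead.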

OCI core infrastructure components, such as Compute, Storage, Networking, Identity, and Database services, have built-in redundancies. You can leverage ADs, FDs, and these services to build highly available applications. However, OCI doesn't automatically replicate, deploy, or fail over application resources and data provisioned in your tenancy to another AD or region in the event of a disaster or a partial or complete regional outage. It's your responsibility to deploy your application resources across ADs and regions.

For example, if an application is deployed on a compute instance (with a block volume) within one AD (for example, AD1), OCI won't automatically provision a new compute instance in a different AD or region in the event of a failure affecting the instance.

Note: Block storage has built-in redundancies.

Your Responsibility: “Resiliency in the Cloud”

To achieve "resiliency in the cloud," you are ultimately responsible for developing a comprehensive business continuity plan, including a high availability (HA) and disaster recovery (DR) strategy, risk assessments, and incident response plans. You are also responsible for deploying your applications and systems across multiple FDs, ADs, and regions for resiliency and fault tolerance using OCI best practices and Maximum Availability Architecture (MAA) frameworks. Design each component of the application for the maximum potential uptime and accessibility. To ensure high availability, identify and eliminate single points of failure so that the application remains running and available even if individual components fail.
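One way to start looking for single points of failure is to check whether every replica of a component sits in the same place. The sketch below is a minimal example under that assumption; the function `find_single_points_of_failure`, the component names, and the inventory format are hypothetical and would need to be fed from your own deployment records.

```python
from collections import defaultdict


def find_single_points_of_failure(deployments):
    """Return components whose replicas all share one (AD, FD) location.

    `deployments` is a list of (component, availability_domain, fault_domain)
    tuples, one per running replica. The format is an assumption for this sketch.
    """
    locations = defaultdict(set)
    for component, ad, fd in deployments:
        locations[component].add((ad, fd))
    # A component with fewer than two distinct locations goes down
    # together with that one location.
    return sorted(name for name, slots in locations.items() if len(slots) < 2)


inventory = [
    ("web", "AD-1", "FD-1"),
    ("web", "AD-1", "FD-2"),
    ("app", "AD-1", "FD-1"),
    ("app", "AD-1", "FD-1"),  # two replicas, but in the same fault domain
    ("db", "AD-1", "FD-3"),   # only one replica
]
print(find_single_points_of_failure(inventory))
```

This check only covers placement; shared dependencies such as a single load balancer or database would need the same kind of review.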

In the event of a disaster or full regional outage, whether it involves a Single-AD or Multi-AD region, it's your responsibility to ensure that sufficient OCI resource capacity is allocated to your tenancy in the failover AD or region before you execute a disaster recovery plan.

Resiliency is a Shared Responsibility Between OCI and You

OCI Responsibilities: Resiliency of the Cloud

Components | Description
Region, Availability Domains, Fault Domains | Oracle provisions, manages, monitors, secures, and operates a highly reliable global cloud infrastructure.
OCI Storage Services | Oracle provisions and operates storage services, providing service high availability and protecting data physically within an availability domain.
OCI Core Networking Services | Oracle provides high availability for OCI core networking services and connectivity services with global traffic shaping that ensures optimal application connectivity and performance.
OCI Database Services | Oracle creates and initiates the Database service, conducts hardware maintenance and enhancements, updates storage servers, and oversees service health.

Your Responsibilities: Resiliency in the Cloud

Components | Description
HA, DR, and failover planning and testing | Plan, configure, test, and run HA, DR, and failover solutions for data and service resiliency to ensure business continuity.
Operations and Management | You are responsible for operating and monitoring your cloud resources, implementing resilient cloud architecture best practices to minimize service disruptions.
Workload Architecture | You are responsible for using Enterprise Architecture Best Practices and Maximum Availability Architecture (MAA) frameworks for designing, building, and maintaining reliable, secure, efficient, and cost-effective cloud workloads.
Resiliency Planning | You are responsible for developing a comprehensive business continuity plan, including HA and DR strategy, risk assessments, and incident response plans.

How OCI Delivers Cloud Resiliency

The following information describes ways in which OCI delivers cloud resiliency.

OCI Responsibilities for Services

  • OCI architecture is built for resiliency, deploying multiple redundant components that can perform the same task.
  • OCI monitors the health of OCI services and manages automatic failover in case of service disruption.
  • OCI core platform services, servers, storage, networking, core Identity and Access Management (IAM), and telemetry services are designed and deployed redundantly. OCI continuously monitors their health, and in the event of a failure, automatic failover processes are executed to provide continuity.
  • OCI Storage services have built-in resiliency. OCI Block Volume provides persistent, high-performance data storage within an AD. Similarly, OCI Object Storage provides persistent, durable, high-performance data storage within an AD. Additionally, in Multi-AD regions, Object Storage replicates the data across ADs automatically. OCI File Storage maintains replicas across fault domains within an AD.
  • Oracle provides highly robust and resilient Database Services within OCI that let you select the most suitable HA and DR strategy for your needs.
  • OCI DNS is hosted across multiple geographically distributed data centers, making it highly available. It also provides low latency, a basic level of load balancing, and the resiliency to handle outages or heavy traffic with minimal impact on users.

Your Responsibilities for Achieving Resiliency

The following information describes ways in which you are responsible for achieving resiliency.

Process Recommendations

Identity Domains

  • Plan for disaster recovery and identity domains.
  • Identity domain replication is always enabled for the “default” identity domain. The “default” identity domain always replicates to all regions to which the tenancy is subscribed. When an administrator subscribes to another region, the “default” identity domain automatically replicates to that region.
  • Additional identity domains are created in the “home region” specified at creation time. They don't replicate to other subscribed regions unless replication is specifically enabled.
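The replication rules above can be modeled in a few lines to spot identity domains that would be missing in a DR region. This is only a sketch of the stated rules, not an OCI API: the dictionary keys (`is_default`, `home_region`, `replicated_to`), the domain names, and the region identifiers are assumptions chosen for the example.

```python
def regions_hosting_domain(domain, subscribed_regions):
    """Return the regions where an identity domain is available."""
    subscribed = set(subscribed_regions)
    if domain.get("is_default", False):
        # The default domain replicates to every subscribed region.
        return subscribed
    # Other domains live in their home region plus any subscribed region
    # for which replication was explicitly enabled.
    return {domain["home_region"]} | (set(domain.get("replicated_to", [])) & subscribed)


def domains_missing_in(region, domains, subscribed_regions):
    """List the domains that would be unavailable in `region`."""
    return [
        d["name"]
        for d in domains
        if region not in regions_hosting_domain(d, subscribed_regions)
    ]


subscribed = ["region-primary", "region-dr"]
domains = [
    {"name": "Default", "is_default": True, "home_region": "region-primary"},
    {"name": "partners", "home_region": "region-primary"},
    {"name": "employees", "home_region": "region-primary", "replicated_to": ["region-dr"]},
]
print(domains_missing_in("region-dr", domains, subscribed))
```

Here only the `partners` domain is flagged, because it was created in the primary region and replication was never enabled for it.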

Compute

  • Plan high availability for Compute instances, distributing them across FDs in each of the ADs, and placing them behind load balancers.
    • Enable backup for a point-in-time snapshot of your volumes.
    • Set up cross-region replication of block volumes, boot volumes, and volume groups.
    • Make the compute images available in both an active and a DR region. In the region for DR, deploy a minimal setup to maintain a warm standby. Then, use capacity reservations to reserve the rest of the required capacity to run all the VMs when the DR region becomes primary.
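The warm-standby recommendation comes down to simple arithmetic: reserve whatever the full workload needs beyond what already runs in the DR region. The sketch below illustrates that calculation; the function `capacity_to_reserve` and the shape names `small-vm` and `large-vm` are hypothetical, and the counts are made-up example figures.

```python
def capacity_to_reserve(required, warm_standby):
    """Return per-shape instance counts still to be reserved in the DR region.

    `required` maps a shape name to the instance count needed to run the
    full workload; `warm_standby` maps a shape name to the count already
    running in the DR region. Shapes that are fully covered are omitted.
    """
    return {
        shape: count - warm_standby.get(shape, 0)
        for shape, count in required.items()
        if count > warm_standby.get(shape, 0)
    }


required = {"small-vm": 12, "large-vm": 4}
warm_standby = {"small-vm": 2, "large-vm": 4}
print(capacity_to_reserve(required, warm_standby))
```

With these figures, ten more `small-vm` instances would need to be reserved, while `large-vm` is already fully covered by the warm standby.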

Storage

Database

OCI HA DR Decision Tree

[Figure: OCI HA and DR decision tree]

Explore More