Learn About the Cyber-Resilience Pillar in OCI

Cyber resilience is the logical evolution and extension of traditional backup and recovery. It assumes that the backup and recovery environment will be attacked and that the integrity of backup data is suspect, so backups must be validated before they are restored to a production environment.

The simplest use case is one in which fields in a production database are maliciously encrypted, for example by ransomware, and then backed up. From the backup's perspective, those fields were captured in their intended state, so a restore would faithfully return them in the same unusable state. The same holds true for unstructured storage: a backup can capture malware, treat it as trusted data, and then restore and re-propagate it into the recovered environment without considering whether the integrity and validity of the datasets were compromised. Cyber resilience therefore requires that every dataset be validated for integrity and security before it is used to restore data stores attached to compute systems in a Safe Room production environment.
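
The validation step can be sketched in Python. This is a minimal, hypothetical illustration, not an Oracle tool. It assumes that a manifest of SHA-256 digests was recorded when each backup was taken, which catches tampering inside the vault. It also uses a byte-entropy heuristic to flag objects that should be plaintext but look encrypted, as in the database example above. The 7.5 bits-per-byte threshold and all names are assumptions; production pipelines would rely on dedicated integrity and malware scanning tools.

```python
import hashlib
import math
from collections import Counter

# Assumed heuristic threshold in bits per byte: encrypted (or compressed) data
# sits close to the 8.0 maximum, while typical plaintext sits well below it.
ENTROPY_THRESHOLD = 7.5


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of `data` in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in Counter(data).values())


def validate_backup(objects: dict[str, bytes], manifest: dict[str, str],
                    expect_plaintext: set[str]) -> dict[str, str]:
    """Return a verdict for every object listed in the manifest.

    objects: backup contents read back from the vault (name -> bytes)
    manifest: SHA-256 hex digests recorded when each backup was taken
    expect_plaintext: objects that should never look encrypted
    """
    verdicts = {}
    for name, expected_digest in manifest.items():
        data = objects.get(name)
        if data is None:
            verdicts[name] = "missing"
        elif hashlib.sha256(data).hexdigest() != expected_digest:
            verdicts[name] = "tampered"  # altered after the backup was recorded
        elif name in expect_plaintext and shannon_entropy(data) > ENTROPY_THRESHOLD:
            verdicts[name] = "suspect-encrypted"  # likely captured already encrypted
        else:
            verdicts[name] = "ok"
    return verdicts


def safe_to_restore(verdicts: dict[str, str]) -> bool:
    """Only a fully clean verdict set may be promoted toward the Safe Room."""
    return all(v == "ok" for v in verdicts.values())


good = b"customer_id,name\n1,Alice\n" * 50
encrypted = bytes(range(256)) * 16  # stands in for ransomware-encrypted data
objects = {"customers.csv": good, "orders.csv": encrypted}
manifest = {name: hashlib.sha256(data).hexdigest() for name, data in objects.items()}
verdicts = validate_backup(objects, manifest, expect_plaintext={"customers.csv", "orders.csv"})
print(verdicts)                   # {'customers.csv': 'ok', 'orders.csv': 'suspect-encrypted'}
print(safe_to_restore(verdicts))  # False
```

A matching digest only proves that a backup was not altered after capture. It cannot prove that the data was healthy when it was captured, which is why the sketch also needs the content check.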

Traditional disaster recovery and cyber resilience are complementary, and modern computing architectures must account for both. By interlocking the principles and components of traditional disaster recovery and modern cyber resilience, enterprises can ensure data availability as well as data confidentiality and integrity. Modern architectures that address confidentiality, integrity, and availability (CIA) are therefore composite architectures. Cyber resilience overlays traditional disaster recovery to maximize flexibility in preserving CIA. This gives the enterprise full control over both its recovery point objective (RPO) and its recovery time objective (RTO).

About Enclaves

Modern resilient architectures consist of what we call enclaves. Enclaves are logically air-gapped, semi-isolated areas within an OCI tenancy. Compartments, VCNs, OCI IAM policies, and the OCI Services Network are the constructs that separate the enclaves and provide the logical air gap.

Oracle recommends that you implement VCNs in a hub-and-spoke topology connected through a dynamic routing gateway (DRG). Traffic routes through a firewall service such as OCI Network Firewall to enforce zero-trust networking across the enclaves. Another way to enforce Zero-Trust Identity is to use separate OCI Identity Domains to grant access to the various enclaves. The cyber-resilience reference architecture diagram shows a working example of this architecture using the OCI Personas pattern. Combine this Zero-Trust Identity pattern with tight control of a minimal number of break-glass OCI Tenancy Administrator accounts, and you can lock down access-related security within an OCI tenancy on a strict need-to-know basis.
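
A minimal sketch of the Zero-Trust Identity pattern, assuming each enclave is reachable only through groups in its own identity domain. The domain, group, and compartment names are hypothetical. The rendered statements follow the general OCI IAM form `Allow group '<domain>'/'<group>' to <verb> <resource-type> in compartment <name>`, and the `shared_domains` helper (also hypothetical) flags any identity domain granted access to more than one enclave.

```python
from collections import defaultdict
from dataclasses import dataclass

# The four OCI IAM policy verbs, from least to most privileged.
VERBS = ("inspect", "read", "use", "manage")


@dataclass(frozen=True)
class Grant:
    identity_domain: str  # hypothetical identity domain name
    group: str            # hypothetical group within that domain
    verb: str
    resource_type: str    # for example "object-family"
    compartment: str      # the compartment that represents the enclave

    def statement(self) -> str:
        """Render the grant as an OCI IAM policy statement."""
        if self.verb not in VERBS:
            raise ValueError(f"unknown verb: {self.verb}")
        return (f"Allow group '{self.identity_domain}'/'{self.group}' "
                f"to {self.verb} {self.resource_type} in compartment {self.compartment}")


def shared_domains(grants: list[Grant]) -> set[str]:
    """Return identity domains that reach more than one enclave (a zero-trust violation)."""
    enclaves_by_domain: dict[str, set[str]] = defaultdict(set)
    for g in grants:
        enclaves_by_domain[g.identity_domain].add(g.compartment)
    return {d for d, comps in enclaves_by_domain.items() if len(comps) > 1}


grants = [
    Grant("BackupDomain", "BackupOperators", "use", "object-family", "Backup"),
    Grant("VaultDomain", "VaultAdmins", "manage", "object-family", "Vault"),
    Grant("CleanRoomDomain", "Analysts", "read", "object-family", "CleanRoom"),
]
for g in grants:
    print(g.statement())
print(shared_domains(grants))  # set(): each domain reaches exactly one enclave
```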

An enclave is represented by a compartment located either in the same region as the production workload or in a second region. For example, in the reference architecture diagram, both the backup and vault enclaves are compartments within Region 1, where the production workload runs. Region 2 contains a second vault enclave along with the Clean Room and Safe Production Restore enclaves, each represented by its own primary compartment. The OCI Services enclave is an air-gapped enclave, isolated from the customer tenancy, where the Oracle backup and recovery services reside.

The following resources are accounted for in an enclave:

  • Oracle Cloud Infrastructure Object Storage
  • Oracle Cloud Infrastructure Compute
  • Oracle Database
  • Oracle Cloud Infrastructure Networking
  • Oracle Cloud Infrastructure File Storage
  • Block Storage including Oracle Cloud Infrastructure Block Volumes
  • Malware and integrity checking solutions

The following are the different types of enclaves:

  1. Production enclave: The production workload runs here. Its structure is based on the solution "Deploy a secure landing zone that meets the CIS Foundations Benchmark for Oracle Cloud," which is linked in the Before You Begin section. The Backup and Vault compartments are key to this architecture.
  2. Backup enclave: The Backup compartment contains all unstructured Object, Block, and File Storage backups. This compartment has restrictive OCI IAM policies that control access to and prevent deletion of the backups.
  3. Production Vault enclave: The Vault compartment contains replicated copies of the storage backups, and these copies are rendered immutable. They are further secured by highly restrictive OCI IAM policies. These policies derive from a separate identity domain and provide an administrative air gap from production identities.
  4. Secondary Region Vault enclave: Contains cross-region replicas of the Production Vault storage backups, access-controlled through yet another separate identity domain.
  5. Clean Room enclave: All backup automation testing is done here, along with inspection of the unstructured data for malware and corruption. Data integrity checking and malware detection for databases are built into the Oracle backup and recovery services. This enclave is also where database backups taken by the Oracle backup and recovery services are mounted and tested for final data integrity and operational readiness. Known good copies of all data backups are stored here until a recovery event.
  6. Safe Room Production enclave: A production-equivalent environment where known good backups are set up for testing and operational readiness. It can be implemented using infrastructure as code (IaC) and, for the application tier, potentially the Full Stack Disaster Recovery service.

    Note:

    This is not intended to be a cold disaster recovery environment, but rather one where continuous recovery can be tested. The result can be an instant-on production environment that is ready for use when the primary production environment is impaired by an attack.
  7. OCI Services enclave: Contains air-gapped managed services, including Oracle Database Zero Data Loss Autonomous Recovery Service (ZDLRS) and Oracle Database Autonomous Recovery Service.
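
The restrictive policies described for the Backup enclave can be sketched as a permission allowlist. The group and compartment names are hypothetical. The statement uses the OCI IAM `where any {request.permission='...'}` condition form with Object Storage permission names. Deletion and overwrite are simply left out, so IAM's default deny refuses them. The `is_permitted` helper is a toy stand-in for that evaluation, not the IAM engine.

```python
# Permissions that backup writers may exercise. OBJECT_DELETE and
# OBJECT_OVERWRITE are deliberately absent, so default-deny refuses them.
ALLOWED = ("OBJECT_INSPECT", "OBJECT_READ", "OBJECT_CREATE")


def backup_writer_policy(group: str, compartment: str) -> str:
    """Render a policy that lets `group` add and read backups but not delete them."""
    conditions = ", ".join(f"request.permission='{p}'" for p in ALLOWED)
    return (f"Allow group {group} to manage objects in compartment {compartment} "
            f"where any {{{conditions}}}")


def is_permitted(permission: str) -> bool:
    """Toy evaluator: a request succeeds only if its permission is explicitly allowed."""
    return permission in ALLOWED


print(backup_writer_policy("BackupWriters", "Backup"))
for perm in ("OBJECT_CREATE", "OBJECT_DELETE", "OBJECT_OVERWRITE"):
    print(perm, "->", "allowed" if is_permitted(perm) else "denied")
```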

The following shows the reference architecture for implementing cyber resilience in OCI.

Reference architecture diagram (editable version: oci-tenancy-cyber-resilience-capabilities-oracle.zip)

Recommendations for Backup and Vaulting

When implementing this architecture, consider the following backup and recovery options. When evaluating backups, take your RPO into consideration, because backup frequency bounds how much data you can lose. Protecting the backup infrastructure itself is equally important, because the resilience of the backup data depends on it.

Backup Infrastructure

Object Versioning and Cross-Region Replication

Enable object versioning when an object will have multiple versions under the same name and every version must be preserved for change control. Use retention rules when an object has only one version, such as a compressed backup stamped with a unique date and time that never needs to be updated.

When object versioning is enabled for a bucket, each upload that changes an object is captured as a separate version, and previous versions are retained indefinitely. Source buckets with object versioning can be configured to replicate to target buckets in another region. However, previous versions are not replicated; only the latest version of each object is replicated to the target region.
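
A toy model makes the replication caveat concrete. The `VersionedBucket` class and the `replicate` function are hypothetical. They only mimic the behavior described above and do not use the OCI SDK.

```python
from collections import defaultdict


class VersionedBucket:
    """Toy model of a source bucket with object versioning enabled (not the OCI API)."""

    def __init__(self) -> None:
        self.versions: dict[str, list[bytes]] = defaultdict(list)  # oldest first

    def put(self, name: str, data: bytes) -> None:
        self.versions[name].append(data)  # each upload is kept as a new version

    def latest(self, name: str) -> bytes:
        return self.versions[name][-1]


def replicate(source: VersionedBucket, target: dict[str, bytes]) -> None:
    """Copy only the latest version of each object to the target region's bucket."""
    for name in source.versions:
        target[name] = source.latest(name)


src = VersionedBucket()
dst: dict[str, bytes] = {}
src.put("config.json", b'{"v": 1}')
src.put("config.json", b'{"v": 2}')
replicate(src, dst)
print(len(src.versions["config.json"]))  # 2 versions in the source region
print(dst["config.json"])                # b'{"v": 2}': only the latest crossed regions
```

The practical consequence is that earlier versions kept for change control exist only in the source region.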

Retention Rule Locks and Cross-Region Replication

Retention rules and retention rule locks can be enabled on source buckets to keep the bucket's contents immutable for the period specified in the retention rule and its associated retention rule lock, if one is defined. The retention rule's duration specifies how long each object must be retained. The retention rule lock makes the rule itself immutable by preventing any modification of the retention period.

Note:

Object Versioning and Retention Rules are mutually exclusive configurations for a given bucket. You can configure buckets with either Object Versioning or Retention Rules, but not both simultaneously.
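
The rules above can be modeled in a few lines. The `ImmutableBucket` class is hypothetical and does not call the OCI Object Storage API. It only encodes the behavior this section describes:

- Retention blocks deletion until it expires.
- A locked rule cannot be modified or removed.
- Retention rules and versioning cannot coexist on one bucket.

```python
from datetime import datetime, timedelta
from typing import Optional


class ImmutableBucket:
    """Toy model of retention rules, rule locks, and their exclusivity with versioning."""

    def __init__(self) -> None:
        self.versioning = False
        self.retention: Optional[timedelta] = None
        self.locked = False
        self.objects: dict[str, datetime] = {}  # object name -> upload time

    def enable_versioning(self) -> None:
        if self.retention is not None:
            raise ValueError("versioning and retention rules are mutually exclusive")
        self.versioning = True

    def set_retention(self, duration: timedelta, lock: bool = False) -> None:
        if self.versioning:
            raise ValueError("versioning and retention rules are mutually exclusive")
        if self.locked:
            # Per the description above, a locked rule cannot be modified.
            raise PermissionError("a locked retention rule cannot be modified")
        self.retention, self.locked = duration, lock

    def remove_retention(self) -> None:
        if self.locked:
            raise PermissionError("a locked retention rule cannot be removed")
        self.retention = None

    def delete(self, name: str, now: datetime) -> None:
        uploaded = self.objects[name]
        if self.retention is not None and now < uploaded + self.retention:
            raise PermissionError(f"{name} is retained until {uploaded + self.retention}")
        del self.objects[name]


t0 = datetime(2024, 1, 1)
bucket = ImmutableBucket()
bucket.set_retention(timedelta(days=30), lock=True)
bucket.objects["backup-2024-01-01.tar.gz"] = t0  # stands in for an upload
try:
    bucket.delete("backup-2024-01-01.tar.gz", now=t0 + timedelta(days=10))
except PermissionError as err:
    print(err)  # still inside the 30-day retention window
bucket.delete("backup-2024-01-01.tar.gz", now=t0 + timedelta(days=31))  # allowed
```

Without the lock, an attacker with sufficient privileges could remove the rule and then delete the backups. With the lock, the only remaining path is to wait out the retention period.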

Use the following backup controls as a starting point for the cyber-resilience pillar:

OCI Object Storage: In OCI Object Storage, an immutable bucket is a storage location governed by time-bound retention rules that protect data from modification or deletion for a specified duration. Enabling object versioning on a bucket is another way to achieve immutability in OCI Object Storage, because earlier versions of overwritten or deleted objects are preserved.

OCI File Storage: OCI File Storage natively supports policy-based snapshots. You can create as many point-in-time snapshots as you need to protect against accidental or unintended file deletions and modifications. Combine policy-based snapshots with cloning and file system-consistent File Storage replication, and you can automatically create, replicate, and maintain snapshots in a different geographic location. You can also create file system clones from a policy-based snapshot to provide a separate, writable copy of the data on either the source or the target side of the replication.
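
The trade-offs in a snapshot policy can be sketched as a schedule with a period and a retention window. The `SnapshotSchedule` type and its values are hypothetical illustrations, not the OCI File Storage snapshot policy API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class SnapshotSchedule:
    """Hypothetical schedule in the spirit of a policy-based snapshot policy."""
    period: timedelta     # how often a snapshot is taken
    retention: timedelta  # how long each snapshot is kept before it expires


def snapshot_times(s: SnapshotSchedule, start: datetime, now: datetime) -> list[datetime]:
    """All snapshot times from `start` up to and including `now`."""
    times, t = [], start
    while t <= now:
        times.append(t)
        t += s.period
    return times


def retained(s: SnapshotSchedule, times: list[datetime], now: datetime) -> list[datetime]:
    """Snapshots that have not yet expired under the retention setting."""
    return [t for t in times if now - t < s.retention]


sched = SnapshotSchedule(period=timedelta(hours=6), retention=timedelta(days=1))
start, now = datetime(2024, 1, 1), datetime(2024, 1, 3)
kept = retained(sched, snapshot_times(sched, start, now), now)
print([t.isoformat() for t in kept])
```

The period bounds the worst-case data loss for the file system, at most six hours in this example. The retention window bounds how far back you can roll to find a clean copy, which matters if corruption goes undetected for longer than a day.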

OCI Boot and Block Volumes: Use OCI volume snapshots and replication to back up boot and block volumes.

Oracle Backup and Recovery Service: Review and consider using Oracle-native backup services for:

  • Attached NVMe devices
  • OCI Block Volumes
  • OCI Boot Volumes
  • OCI Object Storage
  • OCI File Storage
  • OCI bare metal and virtual machine database systems
  • Oracle Autonomous Database

The OCI Cyber-Resilience architecture represents the full breadth and depth of the capabilities and best practices associated with addressing data integrity threats across structured and unstructured data in OCI. As such, it can be overwhelming to determine where to start and what the journey looks like.

Network Architecture and VCN Deployment Variations Across Regions

In the reference architecture, certain compartments are shown with VCN associations in a particular region. In general, compartments are tenancy-wide constructs that span regions and can exist in every region of the tenancy, whereas VCNs are regional resources. The Cyber-Resilience reference architecture denotes certain VCNs and their compartment associations as region specific; this is intended as a starting point for the design. As the Cyber-Resilience architect for your tenancy, you can deploy VCNs for the various Cyber-Resilience enclaves in any region, according to your business requirements. Whichever region you choose, ensure that the VCNs are isolated spokes of a hub-and-spoke network architecture within that region. Also ensure that access is controlled through a firewall appliance or service such as OCI Network Firewall.
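
A simplified check of the hub-and-spoke requirement traces the next hops from one spoke to another and confirms that the path crosses the hub firewall. The VCN names, CIDR blocks, and route entries are hypothetical. The model also collapses the separate DRG attachment and hub VCN route tables of a real OCI deployment into single tables.

```python
import ipaddress

Net = ipaddress.IPv4Network

# Simplified, hypothetical routing: spokes send non-local traffic to the DRG,
# the DRG steers spoke-bound traffic to the hub firewall, and the firewall
# forwards permitted traffic to the destination spoke.
SPOKES: dict[str, Net] = {
    "backup-vcn": ipaddress.ip_network("10.1.0.0/16"),
    "vault-vcn": ipaddress.ip_network("10.2.0.0/16"),
}
ROUTES: dict[str, list[tuple[Net, str]]] = {
    "backup-vcn": [(ipaddress.ip_network("0.0.0.0/0"), "drg")],
    "vault-vcn": [(ipaddress.ip_network("0.0.0.0/0"), "drg")],
    "drg": [(ipaddress.ip_network("10.0.0.0/8"), "hub-firewall")],
    "hub-firewall": [(SPOKES["backup-vcn"], "backup-vcn"), (SPOKES["vault-vcn"], "vault-vcn")],
}


def next_hop(node: str, dest, routes) -> str:
    """Longest-prefix match in the node's route table."""
    matches = [(net, hop) for net, hop in routes.get(node, []) if dest in net]
    if not matches:
        raise LookupError(f"no route from {node} to {dest}")
    return max(matches, key=lambda m: m[0].prefixlen)[1]


def trace(src: str, dest_ip: str, routes=ROUTES, spokes=SPOKES, max_hops: int = 8) -> list[str]:
    """Follow next hops from `src` until the destination VCN is reached."""
    dest = ipaddress.ip_address(dest_ip)
    path, node = [src], src
    for _ in range(max_hops):
        if node in spokes and dest in spokes[node]:
            return path  # delivered inside the destination VCN
        node = next_hop(node, dest, routes)
        path.append(node)
    raise RuntimeError("routing loop or too many hops")


path = trace("backup-vcn", "10.2.0.5")
print(path)
assert "hub-firewall" in path, "spoke-to-spoke traffic bypassed the firewall"
```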

Recommendations for Recovery Controls

Recovery emphasizes RTO: how quickly you can recover to a known good recovery point. Learn about recovery automation and orchestration techniques, tooling, and best practices.
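
The relationship between RPO, RTO, and validation can be worked through with simple arithmetic. This sketch uses hypothetical figures. It assumes constant restore throughput and validation that completes before the restore starts. It shows why a cyber event can stretch the effective RPO beyond the backup interval when the newest backups fail validation.

```python
from datetime import timedelta


def effective_rpo(backup_interval: timedelta, suspect_backups: int) -> timedelta:
    """Worst-case data loss when the newest `suspect_backups` backups fail validation.

    Recovery falls back to the last known good backup; in the worst case the
    incident occurs just before the next backup would have been taken.
    """
    return backup_interval * (suspect_backups + 1)


def estimated_rto(data_gb: float, restore_gb_per_hour: float,
                  validation_hours: float) -> timedelta:
    """Time to validate the known good copy and then restore it (no parallelism assumed)."""
    return timedelta(hours=validation_hours + data_gb / restore_gb_per_hour)


# Hypothetical figures: 4-hour backups, two infected backups, 2 TB at 500 GB/hour.
print(effective_rpo(timedelta(hours=4), suspect_backups=2))  # 12:00:00
print(estimated_rto(2000, 500, validation_hours=3))          # 7:00:00
```

With no failed backups, the effective RPO equals the 4-hour interval. Each backup that fails clean-room validation adds another interval of potential data loss.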

The recovery process can be surgical (a single block volume) or all-encompassing, orchestrating multiple block volumes, database backups, File Storage snapshots, and Object Storage repositories across the gates of the enclaves defined in the OCI Cyber-Resilience architecture. These enclaves include production, vaults, clean room, and safe production, and they may span regions. Each resource is recovered using the process that corresponds to the backup method used for it.
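
A minimal sketch of gate-by-gate orchestration, using hypothetical artifact names. It assumes a simplified gate order of vault, clean room, and safe production. It also assumes an all-or-nothing promotion rule for multi-resource recoveries; that rule is a design choice for this illustration, not a documented OCI behavior.

```python
from dataclasses import dataclass
from enum import IntEnum


class Gate(IntEnum):
    """Enclave gates in the order recovered data passes through them."""
    VAULT = 1
    CLEAN_ROOM = 2
    SAFE_PRODUCTION = 3


@dataclass
class Artifact:
    name: str        # a block volume, database backup, file system snapshot, or bucket
    validated: bool  # outcome of clean-room integrity and malware checks
    gate: Gate = Gate.VAULT


def orchestrate(artifacts: list[Artifact]) -> bool:
    """Move artifacts gate by gate; promote to safe production only if every one passed.

    A surgical recovery passes a single artifact; a full recovery passes many,
    and they are promoted together so the restored environment stays consistent.
    """
    for a in artifacts:
        a.gate = Gate.CLEAN_ROOM  # every artifact is inspected before going further
    if not all(a.validated for a in artifacts):
        return False  # nothing leaves the clean room
    for a in artifacts:
        a.gate = Gate.SAFE_PRODUCTION
    return True


batch = [Artifact("app-block-volume", True), Artifact("orders-db-backup", False)]
print(orchestrate(batch), [a.gate.name for a in batch])  # False ['CLEAN_ROOM', 'CLEAN_ROOM']
```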

Oracle Zero Data Loss Recovery Appliance: Implement the ZDLRA "incremental forever" backup strategy. After the initial full backup, only changed data is transmitted, which keeps processing on production systems to a minimum. The ZDLRA appliance archives the recovery logs. Vault the archive logs to immutable storage. Options include ZFS appliances or, in the cloud, OCI Object Storage buckets configured with either object versioning or retention rule locks; the two cannot be combined on the same bucket.
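
The "incremental forever" idea can be illustrated with a generic changed-block sketch. This is not ZDLRA's internal format or protocol, and the tiny block size and the data are hypothetical. After an initial full (level 0) image, each incremental carries only changed or new blocks. A full image for any point can then be rebuilt by applying incrementals in order, conceptually similar to the virtual full backups that ZDLRA maintains.

```python
BLOCK_SIZE = 4  # tiny block size so the example is easy to follow


def blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Split data into fixed-size blocks; only the last block may be shorter."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def incremental(prev: bytes, curr: bytes) -> tuple[dict[int, bytes], int]:
    """Return the changed or new blocks of `curr` relative to `prev`, plus its block count."""
    p, c = blocks(prev), blocks(curr)
    changed = {i: b for i, b in enumerate(c) if i >= len(p) or p[i] != b}
    return changed, len(c)


def apply(base: bytes, delta: dict[int, bytes], count: int) -> bytes:
    """Rebuild a full image from the previous full image and one incremental."""
    out = blocks(base)[:count]
    out += [b""] * (count - len(out))  # placeholders for appended blocks
    for i, b in delta.items():
        out[i] = b
    return b"".join(out)


level0 = b"AAAABBBBCCCC"    # initial full backup
day1 = b"AAAAXXXXCCCC"      # one block changed
day2 = b"AAAAXXXXCCCCDD"    # data appended
d1, n1 = incremental(level0, day1)
d2, n2 = incremental(day1, day2)
print(d1, d2)  # only the changed blocks are "transmitted"
virtual_full = apply(apply(level0, d1, n1), d2, n2)
print(virtual_full == day2)  # True
```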

Oracle Database Autonomous Recovery Service: Configure Oracle Database Autonomous Recovery Service for your tenancy by creating groups and users for the service. Add policies that allow those users to manage or use the service according to their job function. Then configure a private subnet for the recovery service and review the protection policies that govern database backup retention.