Learn About Cyber Resilience Pillar in OCI

Cyber resilience is the evolution and extension of traditional backup and recovery. In the event of a cyber attack, cyber resilience anticipates that the backup and recovery environment will also be attacked. The integrity of the data backup is suspect and must be validated before restoring your data to a production environment.

You can use OCI native control features to protect your tenancy. Alternatively, you can use third-party backup and recovery vendors. Oracle recommends that you have both operational backups and immutable backups to complement your standard backup and recovery runbooks.

Use the OCI cyber resilience reference architecture as a template to ensure business continuity during data integrity threats and breaches, and to complement and enhance existing or standard disaster recovery architectures.

The following shows the reference architecture for implementing cyber resilience in OCI:



[Figure: OCI cyber resilience reference architecture diagram (cyber-resilience-mandatory-arch-oracle.zip)]

This architecture showcases a Production enclave which consists of Network, Application, and Database compartments. The Vault enclave contains a single compartment which hosts the immutable backups for unstructured data. Inside the Vault enclave we have an Immutable Object Storage bucket, an Orchestration server, and worker nodes. The Orchestration server coordinates the backup process by finding all the resources that should be backed up and then requesting the worker nodes to do the actual backup operations. Data can be copied from multiple Application compartments into the bucket in the Vault compartment.
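The coordination pattern described above (an orchestrator that discovers resources and hands them to worker nodes) could be sketched roughly as follows. This is a minimal illustration, not Oracle's implementation: the `Resource` type, the inventory layout, and the `back_up` placeholder are all hypothetical, and a real worker would copy data into the immutable bucket instead of returning a string.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    compartment: str  # source Application compartment (hypothetical name)
    name: str         # file share, VM data directory, and so on


def discover_resources(inventory):
    """Orchestrator step 1: flatten a per-compartment inventory into a work list."""
    return [
        Resource(compartment, name)
        for compartment, names in sorted(inventory.items())
        for name in names
    ]


def back_up(resource, bucket):
    """Worker step: placeholder for the real copy into the immutable bucket."""
    # A real worker would stream the data to Object Storage here.
    return f"{bucket}/{resource.compartment}/{resource.name}"


def orchestrate(inventory, bucket, workers=4):
    """Orchestrator step 2: hand each resource to a pool of worker threads."""
    resources = discover_resources(inventory)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so results line up with resources.
        return list(pool.map(lambda r: back_up(r, bucket), resources))
```

Threads stand in for worker nodes here only to keep the sketch self-contained; in the architecture above, the workers are separate compute instances.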

The OCI Vault enclave is used for storing unstructured data, which includes local application data within Virtual Machines and/or data stored on an NFS share using OCI File Storage.

For structured data which includes Oracle databases, the Oracle Database Zero Data Loss Autonomous Recovery Service provides immutable backups and ransomware protection.

About Enclaves

Modern resilient cloud architectures use a concept called Enclaves.

An enclave is a logically air-gapped and semi-isolated area within your Oracle Cloud Infrastructure (OCI) tenancy. Enclaves are separated using compartments, VCNs, OCI IAM policies, and the Oracle Services Network, creating logical boundaries for security and management. An administrative air gap between enclaves allows you to use separate Identity Domains, isolate VCNs, or separate tenancies.



There are three types of enclaves commonly used:

  1. Production enclave: The Production enclave consists of one or more compartments that host your production workloads. The structure is based on the Deploy a secure landing zone that meets the CIS OCI Foundations Benchmark reference architecture linked in the Before You Begin section. Keep your operational backups readily accessible in the Production enclave. For normal backup and recovery operations, this should meet most requirements for low latency and quick recovery using native features. You can also monitor canary data in the Production enclave. Use the Production enclave for:
    • Storing and accessing operational backups
    • Implementing defense in depth
    • (Optional) Observing canary datasets
  2. OCI Vault enclave: The OCI Vault enclave compartment stores all your unstructured Object, Block, and File Storage backups in an "immutable" state to prevent them from being altered or deleted. The Vault enclave network is isolated and has no direct connectivity with the Production enclave. It is secured with very restrictive OCI IAM policies derived from a separate identity domain that provides an administrative air gap from production identities.

    All backup automation testing and malware or corruption inspection for these backups is performed here. Known good copies of all unstructured Object, Block and File Storage backups are stored in this enclave waiting for a recovery event. Data integrity and malware detection for databases is built into the Oracle Database Zero Data Loss Autonomous Recovery Service.

    Database backups are immutably stored in a separate, OCI controlled tenancy with no direct access from the Production enclave, except for restore operations. Database backups can be mounted and tested for final data integrity and operational readiness. Use the OCI Vault enclave for:

    • Storing periodic backups
    • Immutable vaulting
    • Automating backup testing
    • Detecting data corruption in backups
    • Detecting malware in backup data
  3. Safe Restore enclave (Optional): The Safe Restore enclave is separated from the OCI Vault enclave by another administrative air gap, using an identity domain and tenancy that are separate from those of your Production enclave. Safe Restore enclaves can be created using Infrastructure as Code tools like Terraform to quickly deploy a production-equivalent environment. Here, you continuously restore known-good backups to verify operational readiness, guided by your Recovery Time Objective (RTO) and Recovery Point Objective (RPO) values.

    In the event of a major incident, you could temporarily use the Safe Restore enclave as your new production environment until threats are removed from your original production space. Use the Safe Restore enclave for:

    • Ongoing incremental restores of backup data
    • Testing backups to meet RTO and RPO objectives
    • Rapidly switching to a new production environment if needed
    • Scaling the environment from minimal to full production using Terraform
    • Cutting back to the original production environment after forensic analysis and recovery

    Caution:

    The Safe Restore enclave is not intended as a traditional cold disaster recovery site. Instead, it allows continuous testing of your recovery strategy and provides the ability to quickly create a new Production environment if the main one is compromised. If your Production enclave is attacked by ransomware, law enforcement may need to investigate and gather forensics, which can cause unexpected downtime and delay your recovery efforts. If your organization can't afford downtime for critical applications, consider implementing a Safe Restore enclave.
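Testing restores against RTO and RPO targets in the Safe Restore enclave comes down to two comparisons, sketched below. The function name and the returned keys are hypothetical. The sketch assumes you record when the backup was taken and when each restore test started and finished.

```python
from datetime import datetime, timedelta


def check_restore_test(backup_taken_at, restore_started_at, restore_finished_at, rto, rpo):
    """Compare one restore test against RTO and RPO targets.

    RTO is checked against how long the restore took. RPO is checked against
    how old the restored backup was when the restore started.
    """
    restore_duration = restore_finished_at - restore_started_at
    data_age = restore_started_at - backup_taken_at
    return {
        "restore_duration": restore_duration,
        "data_age": data_age,
        "meets_rto": restore_duration <= rto,
        "meets_rpo": data_age <= rpo,
    }
```

Tracking these results over time shows whether recovery is drifting away from the targets before a real incident exposes the gap.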

Recommendations for Backup and Recovery Operations

When planning to roll out technology solutions, keep in mind that multiple teams must collaborate to coordinate the backup and recovery process. While most backup actions are typically automated, recovery can be more tedious and require human intervention. Organizations may develop a runbook containing standard operating procedures (SOPs) to follow when specific actions are needed. In this solution playbook, the scope of Backup and Recovery is limited to the current region where the primary workload exists. Disaster recovery solves for availability and focuses on recovering a backup from a known good or untampered pristine copy.

For OCI native virtual machines, Oracle recommends that you create custom images periodically and export them to an immutable Object Storage bucket. This allows you to rebuild and restore the boot volume. Consider developing a runbook in the event of a total virtual machine failure. If your operational backups fail to restore the virtual machine, you should test the process of rebuilding a custom image from your immutable bucket. Review the Importing and Exporting Custom Images page in OCI documentation.

With operational backups, you can usually restore safely back into production. If you suspect a cybersecurity incident, you will need to restore to the OCI Vault compartment or Safe Restore area where you have implemented controls in the zero-trust domain. After you inspect the restored data, you must scan and mitigate any remaining security risks on the virtual machine or raw data (such as malware, viruses, and so on).

After you verify that no cybersecurity risks remain, restore the untampered pristine copy back to production. Document, test, and validate this process at least twice a year. In OCI, you will have many boot volumes, block volumes, file systems, and other unstructured data sources. Catalog all the associated resource mappings, such as drive mappings, mount points, compute instances, and other raw objects in OCI. Use third-party backup products, open-source tools, or the OCI CLI to help you snapshot the associations at a given point in time. Recording this data can help you answer critical questions and determine a course of action. For example, if a block volume failed restoration, you can identify the virtual machines that are in a degraded state.
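A resource-mapping catalog of this kind can be quite small. The following sketch assumes attachment records with hypothetical `instance`, `volume`, and `mount_point` keys (populated, for example, from your backup product or from OCI CLI output) and shows how such a snapshot answers the "which virtual machines are degraded?" question.

```python
import json
from datetime import datetime, timezone


def snapshot_catalog(attachments, taken_at=None):
    """Record volume-to-instance associations at one point in time.

    `attachments` is a list of dicts with hypothetical keys:
    instance, volume, and mount_point.
    """
    taken_at = taken_at or datetime.now(timezone.utc)
    return {"taken_at": taken_at.isoformat(), "attachments": list(attachments)}


def degraded_instances(catalog, failed_volumes):
    """Return the instances that depend on any volume that failed to restore."""
    failed = set(failed_volumes)
    return sorted({a["instance"] for a in catalog["attachments"] if a["volume"] in failed})


def save_catalog(catalog, path):
    """Persist the snapshot as JSON so it can be stored alongside the backups."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(catalog, fh, indent=2)
```

Storing each snapshot next to the backup it describes keeps the mapping consistent with the data you would actually restore.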

Summary of Backup and Recovery Operations Controls

  • BR-1: Back up OCI custom images to an immutable bucket.
  • BR-2: Implement Backup and Restore operations to OCI Vault or Safe Restore area.
  • BR-3: Catalog associated resource mappings (such as drive mappings, mounts, compute instances, and so on).
  • BR-4: Create an OCI Vault enclave and/or Safe Restore area environment.

Recommendations for Immutability

Immutable backup data can't be modified or deleted during the customer-defined retention period. To implement the cyber resilience architecture, Oracle recommends that you have both operational backups and immutable backups. Use Operational backups for normal backup and recovery operations. In the event of data corruption, malware, or other cyber risks, the immutable backup is your pristine copy which is free of data corruption or tampering.

Even when your backups are immutable, it is possible that the backup source data contains malicious code or malware. When restoring data from operational or immutable backups, consider using an OCI Vault or Safe Restore environment to validate that the backup is clear of any cyber threats and to prevent any further damage to your production environments.

For most OCI databases, such as Oracle Base Database Service, Oracle Exadata Database Service on Dedicated Infrastructure, and Oracle Autonomous Database on Dedicated Exadata Infrastructure, Oracle recommends using the Oracle Database Zero Data Loss Autonomous Recovery Service, which offers fully managed data protection running on OCI. The Recovery Service has automated capabilities to protect Oracle Database changes in real time, validate backups without production database overhead, and enable a fast and predictable recovery to any point in time. When you enable real-time data protection, you can recover a protected database to within less than a second of when an outage or ransomware attack occurred. The Recovery Service includes immutability and anomaly detection built into the platform. These features give you visibility into the status of your backups, and you can configure the service to alert you about issues that may affect your ability to recover.

You can also use Oracle Autonomous Database Serverless which natively supports immutable backup retention. Ensure you turn the feature on.

OCI Object Storage can implement WORM-compatible (Write-Once, Read-Many) immutability controls that prevent your data from being modified or deleted. Features such as Object Storage retention rules define how long the data must be retained before it is allowed to be deleted. After the retention period, you can use Object Storage Lifecycle policies to archive or delete your data. Oracle recommends that you test the backup process. After you are convinced that the retention period meets your business requirements, you must lock the retention rule to prevent any further modification by tenancy administrators. There's a mandatory 14-day delay before a rule can be locked. This delay lets you thoroughly test, modify, or delete the rule or the rule lock before the rule is permanently locked.

Caution:

Locking a retention rule is an irreversible operation. Even a tenancy administrator or Oracle Support can't delete a locked rule.
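Because a lock cannot be undone, it can help to validate the lock time in your own tooling before submitting a rule. The sketch below builds a request body and rejects any lock time that falls inside the 14-day delay. The field names (`displayName`, `duration`, `timeAmount`, `timeUnit`, `timeRuleLocked`) are modeled on the Object Storage retention rule API as best understood here and should be treated as assumptions to verify against the current API reference. The helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

MIN_LOCK_DELAY = timedelta(days=14)  # mandatory delay stated in the text above


def earliest_lock_time(now=None):
    """Earliest time at which a lock requested now could take effect."""
    now = now or datetime.now(timezone.utc)
    return now + MIN_LOCK_DELAY


def build_retention_rule(display_name, retention_days, lock_at=None, now=None):
    """Build a retention rule request body (field names are assumptions).

    Raises ValueError if the requested lock time falls inside the 14-day delay.
    """
    now = now or datetime.now(timezone.utc)
    if retention_days <= 0:
        raise ValueError("retention_days must be positive")
    rule = {
        "displayName": display_name,
        "duration": {"timeAmount": retention_days, "timeUnit": "DAYS"},
    }
    if lock_at is not None:
        if lock_at < earliest_lock_time(now):
            raise ValueError("lock time must be at least 14 days in the future")
        rule["timeRuleLocked"] = lock_at.isoformat()
    return rule
```

Leaving `lock_at` unset produces an unlocked rule, which matches the recommendation to test the backup process before committing to a lock.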

Virtual Machines are a combination of boot volumes and block volumes in OCI. To protect your OCI boot volume, create a custom image of your Virtual Machine and then export the custom image (.oci is the default format, but .qcow2 or other formats are supported) into an OCI Object Storage bucket.

Any critical data sitting on a block volume should be backed up using a custom script into the Immutable Object Storage bucket.

OCI File Storage allows users to create snapshots, but those snapshots are not immutable by default as any OCI administrator with the right IAM privileges can delete a snapshot. To protect OCI File Storage, Oracle recommends that you copy the data directly into an Immutable bucket on a periodic basis.
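A periodic copy job for block volume or File Storage data might look like the following sketch. It walks a mounted directory, computes a checksum for each file, and assigns a timestamped object name so that each run writes new objects instead of trying to overwrite retained ones. The `upload` callable is a hypothetical stand-in for whatever client you use to write to the immutable bucket. The object naming scheme is also an assumption.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def plan_copy(source_dir, run_time=None):
    """List every file under source_dir with a timestamped object name and checksum."""
    run_time = run_time or datetime.now(timezone.utc)
    prefix = run_time.strftime("%Y%m%dT%H%M%SZ")
    source = Path(source_dir)
    plan = []
    for path in sorted(p for p in source.rglob("*") if p.is_file()):
        relative = path.relative_to(source).as_posix()
        plan.append(
            {"source": path, "object_name": f"{prefix}/{relative}", "sha256": sha256_of(path)}
        )
    return plan


def run_copy(plan, upload):
    """Call upload(source_path, object_name) for each entry and return a manifest."""
    for entry in plan:
        upload(entry["source"], entry["object_name"])
    return {entry["object_name"]: entry["sha256"] for entry in plan}
```

The returned manifest of checksums can be stored with the backup and compared later when you validate the data in the OCI Vault or Safe Restore enclave.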

Summary of Immutability Controls

  • IM-1: Configure an immutable bucket for unstructured data.
  • IM-2: Protect your data using Oracle Database Zero Data Loss Autonomous Recovery Service for OCI databases.
  • IM-3: If using OCI File Storage, copy OCI File Storage to an immutable Object Storage bucket.

Recommendations for Zero-Trust Security Controls

To implement zero-trust security, Oracle recommends that you evaluate tenancy controls that:
  • Restrict Identities and Permissions: Limit the identities (IAM domains, groups, users, and policies) that can access your backups, and limit their permissions.
  • Strengthen Network Segmentation: Reevaluate network segmentation along with implementing virtual air gaps for immutable backups.

Combine these two concepts to make it much more difficult for threat actors to get access to your data.

Compartment design is critical to implementing zero-trust security in the Cyber Resilience architecture. Create a nested compartment architecture with the Backup Compartment at the top level, and include at least two child compartments—for example, one for Immutable Vault backups and another for Safe Restore. This setup allows you to apply IAM policies closer to individual resources and enforce separation of duties.

For tighter access control, create specific users and groups to access your immutable Object Storage buckets. Based on your security requirements, you can further separate access by identity domain, compartment, users, groups, and IAM policies to restrict who can access a specific bucket. In existing tenancies where multiple groups may have access to a bucket, review and reduce access so that only backup storage administrators can manage backups.

Oracle Access Governance provides a Who has access to what - Enterprise-wide Browser page which tracks and monitors users who have access to different systems, data, and applications, their permission levels, and purpose of the access, to make informed decisions and detect potential security risks for effective governance. Use this information to ensure that your IAM policies align with the principles of separation of duties and least privilege.

If you operate virtual machines or other IaaS infrastructure critical to backup, consider adding them to an OCI dynamic group. This allows you to target these nodes with IAM policies that grant the necessary access to the backup storage tier.
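As an illustration of the dynamic group approach, the sketch below renders a matching rule and a narrowly scoped policy statement as strings. The statement syntax follows the OCI policy language as best understood here and should be verified against the IAM documentation before use. The group, compartment, and bucket names are hypothetical.

```python
def matching_rule(compartment_ocid):
    """Dynamic group rule that matches every instance in one compartment."""
    return f"ALL {{instance.compartment.id = '{compartment_ocid}'}}"


def write_only_policy(dynamic_group, compartment, bucket):
    """Policy statement that lets the group create objects in one bucket only.

    The permission filter grants object creation and inspection, and it does
    not grant delete or overwrite permissions.
    """
    return (
        f"Allow dynamic-group {dynamic_group} to manage objects "
        f"in compartment {compartment} "
        f"where all {{target.bucket.name = '{bucket}', "
        f"any {{request.permission = 'OBJECT_CREATE', "
        f"request.permission = 'OBJECT_INSPECT'}}}}"
    )
```

Generating statements from a small set of reviewed templates, instead of writing each one by hand, makes it easier to audit that backup nodes never receive delete permissions on the vault bucket.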

Harden network access in your zero-trust environment. Follow these recommendations to prevent a restored virtual machine from reinfecting production, reopening a security backdoor, or providing a command-and-control access point for attackers:

  • In your zero-trust environment, restrict network access where possible. For example, in an OCI Vault or Safe Restore enclave, avoid using a DRG which may allow malware to leak out to the rest of your OCI environment. Instead, consider using the OCI-managed Bastion service (or customer-managed jump hosts), private endpoint, or a service gateway to permit access to the OCI control plane.
  • Don't allow routing between your various backup networks. If you require network connectivity between your backup networks, implement OCI Network Firewall and NSGs to allow only tightly controlled traffic patterns. This creates a virtual network air gap between your production networks and backup compartments, and prevents restored virtual machines from reinfecting production environments or reopening vulnerabilities in the environment.
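One way to keep the virtual network air gap honest is to audit your security rules for anything that would bridge the vault and production address ranges. The sketch below is a simplified check that assumes rules have been exported as dicts with hypothetical `source` and `destination` CIDR keys. It ignores ports and protocols, so treat it as a first-pass filter only.

```python
import ipaddress


def crosses_air_gap(rule, vault_cidr, production_cidr):
    """True if a rule permits traffic between the vault and production ranges.

    `rule` is a dict with hypothetical keys "source" and "destination",
    each holding a CIDR string.
    """
    vault = ipaddress.ip_network(vault_cidr)
    prod = ipaddress.ip_network(production_cidr)
    src = ipaddress.ip_network(rule["source"])
    dst = ipaddress.ip_network(rule["destination"])
    return (src.overlaps(vault) and dst.overlaps(prod)) or (
        src.overlaps(prod) and dst.overlaps(vault)
    )


def audit_rules(rules, vault_cidr, production_cidr):
    """Return the rules that would bridge the two networks."""
    return [r for r in rules if crosses_air_gap(r, vault_cidr, production_cidr)]
```

Note that an overly broad source such as 0.0.0.0/0 is flagged because it includes the production range.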

Summary of Zero-Trust Security Controls

  • ZT-1: Configure private, secure object storage buckets with IAM permissions limited to specialized recovery accounts. Optionally, leverage Oracle Access Governance to determine effective permissions on immutable buckets.
  • ZT-2: Apply robust network segmentation. Use Bastions, private endpoints, service gateways, NSGs, and network firewalls.
  • ZT-3: Enhance IAM by using dynamic group memberships and IAM policies for scripted immutable vaulting activities.
  • ZT-4: Design compartment structures as outlined in the OCI cyber resilience reference architecture. Use nested compartments to apply IAM policies close to resources and enforce separation of duties.

Recommendations for Threat Detection Controls

One of the biggest challenges in cybersecurity is detecting when a threat actor has infiltrated your environment. Even if you have implemented basic security controls—such as event logging and forensic analysis—it can still be difficult to determine if your cloud resources have already been compromised.

Consider using Cloud Security Posture Management (CSPM) tools to enhance cloud protection. Oracle Cloud Guard is a built-in CSPM tool in OCI that you can use to implement your cyber resilience architecture. There are also third-party solutions available that offer features like intrusion detection, anomaly detection, and alerting. For example, with Oracle Cloud Guard, you can configure policies to prevent OCI Object Storage buckets from being exposed publicly on the internet. Additionally, configure your CSPM tool to monitor critical services, such as the Oracle Database Zero Data Loss Autonomous Recovery Service, and to detect whether the service was disabled or whether anyone attempted to disable backups or modify backup policies.
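Cloud Guard and similar CSPM tools perform these checks natively. The sketch below only illustrates the kind of logic involved, in case you want a supplementary check in your own scripts. It assumes bucket metadata and audit events are available as dicts. The `publicAccessType` values are modeled on the Object Storage API as best understood here, and the `action` key and event names are purely hypothetical.

```python
def public_buckets(buckets):
    """Return names of buckets whose access type is anything but private.

    Each bucket is a dict with "name" and "publicAccessType" keys. A missing
    access type is treated as a finding so that gaps in the data stay visible.
    """
    return [b["name"] for b in buckets if b.get("publicAccessType") != "NoPublicAccess"]


def backup_tampering_events(audit_events, watched_actions):
    """Return audit events whose action is on the caller-supplied watch list."""
    watched = set(watched_actions)
    return [e for e in audit_events if e.get("action") in watched]
```

The watch list is passed in by the caller because the exact event names depend on the services you run and should come from your own audit log review.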

Pair CSPM with endpoint security solutions to address both IaaS security policies and endpoint vulnerabilities. When you send audit logs, event logs, VCN flow logs, and other data to a third-party SIEM or XDR (Extended Detection and Response) platform, cloud administrators gain valuable insights for event correlation and advanced forensics.

For more information, see the Design Guidance for SIEM Integration on OCI linked in the Explore More section and the Overview of Security Best Practices in OCI Tenancy blog linked in the Before you Begin section.

Internal Honeypots

Another valuable threat detection tactic is deploying "internal honeypots": decoy compute instances designed to attract malicious actors. These honeypots typically run services that are intentionally easy to detect or exploit, making them visible to common network scanning tools like Nmap. On a private network, no one should access these decoys, so any interaction is a strong indicator of suspicious behavior, such as threat actors searching for "file servers" or other targets. Both commercial and open-source honeypot solutions are available. In well-secured environments, honeypots should detect minimal suspicious activity, making them a reliable early warning system and a way to validate your existing controls.

Note:

Don't deploy honeypots on instances with public IP addresses. Honeypots exposed to the internet are likely to be attacked, potentially introducing additional risk.
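Because any connection to a decoy is suspicious, detection can be as simple as filtering network flow records for decoy addresses. The sketch below assumes flow records have been parsed into dicts with hypothetical `src` and `dst` keys, and it includes a guard for the note above about public addresses. Dedicated honeypot products do considerably more than this.

```python
import ipaddress


def honeypot_hits(flow_records, decoy_ips, allowed_sources=()):
    """Return flow records addressed to a decoy.

    `flow_records` are dicts with hypothetical "src" and "dst" keys.
    `allowed_sources` lists monitoring hosts that may legitimately probe decoys.
    """
    decoys = {ipaddress.ip_address(ip) for ip in decoy_ips}
    allowed = {ipaddress.ip_address(ip) for ip in allowed_sources}
    hits = []
    for record in flow_records:
        src = ipaddress.ip_address(record["src"])
        dst = ipaddress.ip_address(record["dst"])
        if dst in decoys and src not in allowed:
            hits.append(record)
    return hits


def has_public_decoy(decoy_ips):
    """Guard for the note above: decoys should use private addresses only."""
    return any(not ipaddress.ip_address(ip).is_private for ip in decoy_ips)
```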

Canary Data

Canary data is a threat detection technique applicable to both structured data (such as database tables) and unstructured data (such as file servers). Like honeypots, canary data acts as a targeted trap. For example, you can create a dedicated table or inject specific canary records into a production database. Any unexpected access, modification, or deletion of these records may indicate unauthorized or malicious activity, such as a threat actor trying to access or tamper with sensitive data like customer information or order details.

For file systems, canary data might involve monitored files or folders within NFS shares. Any unauthorized changes could signal a security compromise. Using canary data typically requires commercial or open-source third-party tools.
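For file-based canaries, the core of the technique is comparing a stored fingerprint with the current state. The following is a minimal sketch with hypothetical function names. It detects modification and deletion only, not read access, which is one reason dedicated tools are typically used in practice.

```python
import hashlib
from pathlib import Path


def fingerprint(paths):
    """Record a SHA-256 digest for each canary file (None if it is missing)."""
    result = {}
    for p in map(Path, paths):
        result[str(p)] = hashlib.sha256(p.read_bytes()).hexdigest() if p.is_file() else None
    return result


def canary_alerts(baseline, current):
    """Compare two fingerprints and describe every canary that changed."""
    alerts = []
    for path, old in sorted(baseline.items()):
        new = current.get(path)
        if new is None and old is not None:
            alerts.append((path, "deleted"))
        elif new != old:
            alerts.append((path, "modified"))
    return alerts
```

Keep the baseline fingerprint somewhere the monitored hosts cannot write to, so that an attacker who alters a canary cannot also update the baseline.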

Summary of Threat Detection Controls

  • TD-1: Use Oracle Cloud Guard bucket policies to control public and private access, enable bucket logging, and activate threat detection rules.
  • TD-2: Implement canary data within both structured (such as databases) and unstructured (such as file storage) datasets to detect unauthorized access or tampering.
  • TD-3: Deploy internal honeypots with monitoring sensors close to backup and production systems to attract and identify potential threats.
  • TD-4: Integrate your environment's telemetry data with XDR/SIEM to enable comprehensive forensics and advanced threat analysis.