Implement a cyber recovery solution on Oracle Cloud Infrastructure

Cyber security has become an increasingly critical topic as malware and ransomware attacks continue to occur around the world. For mission-critical databases, the data loss and system downtime that such attacks cause can have far-reaching impacts throughout the business in terms of revenue, operations, reputation, and even penalties.

You can protect your Oracle applications hosted on Oracle Cloud Infrastructure (OCI) from cyber attacks by creating and storing immutable backups. In the event of a cyber attack, such as a ransomware encryption attack, you can use the immutable backups to restore the Oracle application to a previous state, minimizing disruption to business operations.

Architecture

This reference architecture shows how you can implement an automated backup and restore solution that is deployed on OCI.

Oracle E-Business Suite (EBS) is used as the example application for this solution, but you can adapt it for other Oracle applications.

The following diagram illustrates this reference architecture.



oci-cyber-recovery-arch-oracle.zip

  1. To deploy this reference architecture, create a new OCI tenancy (named Cyber Recovery Tenancy) to store immutable copies of backups.
    1. The Cyber Recovery Tenancy will provide a platform for safely testing and recovering systems in the event of a cyber attack.
    2. The Cyber Recovery Tenancy will be "air-gapped" from the Production network, with inbound connections only permitted during a Cyber Recovery invocation or periodic test.
    3. Access management in the OCI Cyber Recovery Tenancy will differ from production, with no integration with Active Directory. Every copy of data into the OCI Cyber Recovery Tenancy will be initiated from within that tenancy and will travel over OCI's network backbone rather than the public internet.
  2. Back up all production entities (Databases, Servers, File Storage, and so on) to a set of Cyber Recovery Object Storage Buckets within the Production Tenancy, in accordance with a defined backup schedule.
    1. The Cyber Recovery Object Storage Buckets must have their retention and lifecycle policies set in alignment with your requirement (for example, there will be a daily cycle bucket with its own retention policy).
    2. The Cyber Recovery Object Storage Buckets must have permissions set, such that they can be read by a synchronization script initiated from the Cyber Recovery Tenancy.
    3. Repurpose existing production backups, and use additional backup solutions where necessary.
    4. Different storage types in the Production Tenancy will need different backup techniques, each with its own technical solution but following the same logical sequence of creating a Cyber Recovery data set.
  3. The synchronization script from the Cyber Recovery Tenancy’s CRS compartment will run on a regular basis (for example, every 15 minutes during the backup window).
    1. Apply the admin policy so that access to the Production Tenancy is permitted.
    2. Copy any new objects identified in the Production Cyber Recovery Object Storage Buckets into the equivalent Cyber Recovery Tenancy Object Storage Buckets.
    3. Disable the admin policy in order to prevent access to the Production Tenancy.
    4. Data will be copied to the Cyber Recovery Tenancy via OCI's network backbone, and all inbound access will be restricted to OCI Console access.
  4. Once an object has been synchronized to the Cyber Recovery Tenancy, its lifecycle is no longer linked to its Production equivalent. For example, if a Production backup object is deleted (an unlikely event caused by, for example, a mismatch in lifecycle policies), this will NOT result in the deletion of the Cyber Recovery Tenancy backup object.
  5. The Cyber Recovery object bucket will be placed in a CRS compartment that has a Security Zone policy to prevent unauthorized changes to it.
  6. Set permissions on the Cyber Recovery Tenancy Object Storage buckets such that they can be read only by authorized users within the tenancy and written to only by the synchronization script.
  7. Set retention policies within the Cyber Recovery object buckets to manage the object lifecycle for deletion (for example, objects will be deleted once they are 15 days old) and archival (for example, archive objects which are 10 days old).
  8. When a new object is created in the Cyber Recovery object bucket, it will be scanned for viruses, and its SHA-2 checksum or digital signature will be checked to ensure that the backup file has not been tampered with. Depending on the backup size and the recovery point objective (RPO) and recovery time objective (RTO) of the solution, consider whether to perform the SHA-2 checksum or digital signature check on the database backups.
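The synchronization cycle in step 3 (enable access, copy new objects, disable access) and the lifecycle independence in step 4 can be sketched in Python as follows. This is a minimal in-memory model under stated assumptions, not the script from the repository: the `AccessPolicy` class, the `temporary_access` helper, and the dictionary-based buckets are hypothetical stand-ins for the real admin policy and Object Storage buckets.

```python
from contextlib import contextmanager


class AccessPolicy:
    """Hypothetical stand-in for the admin policy granting access to Production."""

    def __init__(self):
        self.enabled = False


@contextmanager
def temporary_access(policy):
    """Enable the policy for the duration of a sync run, then always disable it."""
    policy.enabled = True
    try:
        yield policy
    finally:
        # Runs even if the copy raises, so access never stays open by accident.
        policy.enabled = False


def sync_new_objects(production_bucket, recovery_bucket, policy):
    """Copy objects missing from the recovery bucket; never delete anything."""
    copied = []
    with temporary_access(policy):
        for name, data in production_bucket.items():
            if name not in recovery_bucket:
                recovery_bucket[name] = data
                copied.append(name)
    return copied


# Buckets modeled as dictionaries of object name -> content (illustrative only).
production = {"db_backup_day1": b"aaa", "db_backup_day2": b"bbb"}
recovery = {"db_backup_day1": b"aaa"}
policy = AccessPolicy()

first_run = sync_new_objects(production, recovery, policy)
print(first_run)

# Deleting a Production object does not remove the synchronized copy.
del production["db_backup_day1"]
second_run = sync_new_objects(production, recovery, policy)
print(second_run, sorted(recovery), policy.enabled)
```

Wrapping the copy in a context manager means that access is revoked even if the copy fails partway through, which matches the intent of disabling the admin policy after every run.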

Note:

You can automate most of the process by using Terraform scripts, Ansible playbooks, and shell and RMAN scripts, which are described in the Deploy section of this reference architecture.
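As one illustration of the integrity check in step 8, the following Python sketch computes a SHA-256 digest (a member of the SHA-2 family) over a backup stream and compares it with an expected value. It assumes that the expected digest is recorded when the backup is created and is made available to the Cyber Recovery Tenancy; this architecture does not specify how the digest is transported, and the function names here are hypothetical.

```python
import hashlib
import hmac
import io


def sha256_of_stream(stream, chunk_size=1024 * 1024):
    """Hash a binary stream in chunks so large backups need not fit in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def is_untampered(stream, expected_hex):
    """Return True when the stream's SHA-256 digest matches the expected digest."""
    return hmac.compare_digest(sha256_of_stream(stream), expected_hex.lower())


# Illustrative data only; a real backup would be read from a file or object stream.
backup = b"example backup contents"
expected = hashlib.sha256(backup).hexdigest()

print(is_untampered(io.BytesIO(backup), expected))
print(is_untampered(io.BytesIO(backup + b"!"), expected))
```

Because the hash must read every byte, the check time grows with backup size, which is why the trade-off against RPO and RTO matters for large database backups.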

The following diagram illustrates a sample deployment topology for the Cyber Recovery Tenancy.
(Illustration: oci_cyber_recovery_deploy.png)

oci-cyber-recovery-deploy-oracle.zip

In this example, the EBS_BlueRoom and EBS_RedRoom VCNs use the same IP ranges as the production application VCN. This approach allows automated testing of backups with minimal changes required to restore the application in EBS_RedRoom. It also constrains the network design: because their IP ranges overlap, the EBS_BlueRoom VCN and EBS_RedRoom VCN cannot be connected to the dynamic routing gateway (DRG) at the same time.
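The constraint that the two rooms cannot be attached to the DRG together follows from their overlapping address ranges. A minimal Python sketch of that check, using the standard `ipaddress` module, is shown below. The CIDR values and the `can_attach` helper are hypothetical and chosen only for illustration; the actual ranges depend on your production VCN.

```python
import ipaddress


def can_attach(candidate_cidr, attached_cidrs):
    """Return True if the candidate range overlaps none of the attached ranges."""
    candidate = ipaddress.ip_network(candidate_cidr)
    return not any(
        candidate.overlaps(ipaddress.ip_network(cidr)) for cidr in attached_cidrs
    )


# Hypothetical ranges: both rooms mirror the production application VCN.
BLUE_ROOM = "10.0.0.0/16"
RED_ROOM = "10.0.0.0/16"
ORCHESTRATION = "192.168.10.0/24"

attached = [ORCHESTRATION, BLUE_ROOM]
print(can_attach(RED_ROOM, attached))  # Blue Room is attached, so Red Room conflicts

attached.remove(BLUE_ROOM)
print(can_attach(RED_ROOM, attached))  # Blue Room detached, so Red Room can attach
```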

Segregation of tenancy ownership can be considered with:

  1. A dedicated team supporting the Cyber Recovery Tenancy (with no access to the Production Tenancy).

    and

  2. A production application support team with access only to the Red Room compartments during the quarterly application testing exercise. This team can access the CRS network from the corporate on-premises network.

The architecture has the following components:

  • Tenancy

    A tenancy is a secure and isolated partition that Oracle sets up within Oracle Cloud when you sign up for Oracle Cloud Infrastructure. You can create, organize, and administer your resources in Oracle Cloud within your tenancy. A tenancy is synonymous with a company or organization. Usually, a company will have a single tenancy and reflect its organizational structure within that tenancy. A single tenancy is usually associated with a single subscription, and a single subscription usually only has one tenancy.

  • Region

    An Oracle Cloud Infrastructure region is a localized geographic area that contains one or more data centers, called availability domains. Regions are independent of other regions, and vast distances can separate them (across countries or even continents).

  • Compartment

    Compartments are cross-region logical partitions within an Oracle Cloud Infrastructure tenancy. Use compartments to organize your resources in Oracle Cloud, control access to the resources, and set usage quotas. To control access to the resources in a given compartment, you define policies that specify who can access the resources and what actions they can perform.

    Users (both local and federated) are added to one or more groups, which are in turn attached to IAM policies governing the access to the OCI assets per Compartment.

    Compartments provide the ability to organize and isolate resources in an Oracle Cloud Infrastructure tenancy. They play an important role in setting a foundation for deploying new workloads into the tenancy. Although they may appear to be only a logical grouping of OCI resources, they serve as policy enforcement points, so they are of paramount importance to the tenancy’s security.

    Compartments can be deployed according to a functional, operational, or project hierarchy. This lets you maintain isolation between resources for different roles, functions, and organizational hierarchies. A compartment hierarchy can have up to six levels, based on the requirements. Access control is defined by policies.

    Each compartment should have specific permissions assigned to the related groups. As a general rule, users will not be able to elevate their permissions to other compartments. The following Compartment hierarchy will be used in the Cyber Recovery Tenancy to provide separation of concerns between applications and environments.



    oci-cyber-recovery-compartments-oracle.zip

  • Availability domains

    Availability domains are standalone, independent data centers within a region. The physical resources in each availability domain are isolated from the resources in the other availability domains, which provides fault tolerance. Availability domains don’t share infrastructure such as power or cooling, or the internal availability domain network. So, a failure at one availability domain is unlikely to affect the other availability domains in the region.

  • Fault domains

    A fault domain is a grouping of hardware and infrastructure within an availability domain. Each availability domain has three fault domains with independent power and hardware. When you distribute resources across multiple fault domains, your applications can tolerate physical server failure, system maintenance, and power failures inside a fault domain.

  • Virtual cloud network (VCN) and subnets

    A VCN is a customizable, software-defined network that you set up in an Oracle Cloud Infrastructure region. Like traditional data center networks, VCNs give you complete control over your network environment. A VCN can have multiple non-overlapping CIDR blocks that you can change after you create the VCN. You can segment a VCN into subnets, which can be scoped to a region or to an availability domain. Each subnet consists of a contiguous range of addresses that don't overlap with the other subnets in the VCN. You can change the size of a subnet after creation. A subnet can be public or private.

  • Load balancer

    The Oracle Cloud Infrastructure Load Balancing service provides automated traffic distribution from a single entry point to multiple servers in the back end.

  • Security list

    For each subnet, you can create security rules that specify the source, destination, and type of traffic that must be allowed in and out of the subnet.

  • Cloud Guard

    You can use Oracle Cloud Guard to monitor and maintain the security of your resources in Oracle Cloud Infrastructure. Cloud Guard uses detector recipes that you can define to examine your resources for security weaknesses and to monitor operators and users for risky activities. When any misconfiguration or insecure activity is detected, Cloud Guard recommends corrective actions and assists with taking those actions, based on responder recipes that you can define.

  • Object storage

    Object storage provides quick access to large amounts of structured and unstructured data of any content type, including database backups, analytic data, and rich content such as images and videos. You can safely and securely store and then retrieve data directly from the internet or from within the cloud platform. You can seamlessly scale storage without experiencing any degradation in performance or service reliability. Use standard storage for "hot" storage that you need to access quickly, immediately, and frequently. Use archive storage for "cold" storage that you retain for long periods of time and rarely access.

  • FastConnect

    Oracle Cloud Infrastructure FastConnect provides an easy way to create a dedicated, private connection between your data center and Oracle Cloud Infrastructure. FastConnect provides higher-bandwidth options and a more reliable networking experience when compared with internet-based connections.

  • Local peering gateway (LPG)

    An LPG enables you to peer one VCN with another VCN in the same region. Peering means the VCNs communicate using private IP addresses, without the traffic traversing the internet or routing through your on-premises network.

  • Exadata DB system

    Oracle Exadata Database Service enables you to leverage the power of Exadata in the cloud. You can provision flexible X8M systems that allow you to add database compute servers and storage servers to your system as your needs grow. X8M systems offer RoCE (RDMA over Converged Ethernet) networking for high bandwidth and low latency, persistent memory (PMEM) modules, and intelligent Exadata software. You can provision X8M systems by using a shape that's equivalent to a quarter-rack X8 system, and then add database and storage servers at any time after provisioning.
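The standard and archive storage tiers described under Object storage pair with the example age thresholds given in the architecture steps (archive at 10 days, delete at 15 days). The following Python sketch shows only the age-based decision; it is not how the rules are configured on a bucket, and the `lifecycle_action` function is a hypothetical name used for illustration.

```python
from datetime import datetime, timedelta, timezone

# Example thresholds taken from the architecture steps; adjust to your requirements.
ARCHIVE_AFTER = timedelta(days=10)
DELETE_AFTER = timedelta(days=15)


def lifecycle_action(created_at, now):
    """Return 'delete', 'archive', or 'keep' for an object of the given age."""
    age = now - created_at
    if age >= DELETE_AFTER:
        return "delete"
    if age >= ARCHIVE_AFTER:
        return "archive"
    return "keep"


# A fixed reference date keeps the example repeatable.
now = datetime(2024, 1, 31, tzinfo=timezone.utc)
for days_old in (3, 12, 20):
    created = now - timedelta(days=days_old)
    print(days_old, lifecycle_action(created, now))
```

Checking the deletion threshold first ensures that an object old enough for both rules is deleted rather than archived.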

Recommendations

Use the following recommendations as a starting point. Your requirements might differ from the architecture described here.
  • VCN

    When you create a VCN, determine the number of CIDR blocks required and the size of each block based on the number of resources that you plan to attach to subnets in the VCN. Use CIDR blocks that are within the standard private IP address space.

    Select CIDR blocks that don't overlap with any other network (in Oracle Cloud Infrastructure, your on-premises data center, or another cloud provider) to which you intend to set up private connections.

    After you create a VCN, you can change, add, and remove its CIDR blocks.

    When you design the subnets, consider your traffic flow and security requirements. Attach all the resources within a specific tier or role to the same subnet, which can serve as a security boundary.

  • Security

    The production EBS and SOA systems are hosted within a customer managed OCI compartment, with the security posture shared by the customer. The intention is to replicate the security posture of the environment, including security lists and network topology, in the Cyber Recovery Tenancy. Access management in the Cyber Recovery Tenancy will differ from the production tenancy, with no IAM integration with the customer's Active Directory solution.

    Access to the clean rooms, Blue and Red, will be restricted, and all ingress will be routed via Bastion/Jump servers that are dedicated to that role. The Blue Room can be accessed daily, whereas access to the Red Room will be enabled only during a test window and removed once testing is completed. Access to the Blue Room will be restricted to the nominated Cyber Recovery Managed Service Partner, while access to the Red Room will be restricted to the nominated Managed Service Partner and key staff.

    Inbound access via the IPsec/VPN tunnel to the Cyber Recovery Tenancy will be available for the Blue Room on a daily basis, whereas access to the Red Room will be completely blocked during normal operation and will be opened only to predefined IP addresses during a Red Room testing event.

    Outbound access via the IPsec/VPN Tunnel from the Cyber Recovery Tenancy will be completely blocked. Access from the Cyber Recovery Tenancy to the Production Tenancy will only be in place during the synchronization of backup files and outside these periods this access will be prevented by removal of the policy. There will be no inbound or outbound internet access to the Cyber Recovery Tenancy.

  • Cloud Guard

    Clone and customize the default recipes provided by Oracle to create custom detector and responder recipes. These recipes enable you to specify what type of security violations generate a warning and what actions are allowed to be performed on them. For example, you might want to detect Object Storage buckets that have visibility set to public.

    Apply Cloud Guard at the tenancy level to cover the broadest scope and to reduce the administrative burden of maintaining multiple configurations.

    You can also use the Managed List feature to apply certain configurations to detectors.

  • Network security groups (NSGs)

    You can use NSGs to define a set of ingress and egress rules that apply to specific VNICs. We recommend using NSGs rather than security lists, because NSGs enable you to separate the VCN's subnet architecture from the security requirements of your application.

  • Load balancer bandwidth

    While creating the load balancer, you can either select a predefined shape that provides a fixed bandwidth, or specify a custom (flexible) shape where you set a bandwidth range and let the service scale the bandwidth automatically based on traffic patterns. With either approach, you can change the shape at any time after creating the load balancer.
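The inbound access rules in the Security recommendation above (Blue Room reachable daily, Red Room reachable only from predefined addresses during a test window) can be modeled as a small decision function. This Python sketch is illustrative only: the `RoomAccess` class, the `ingress_allowed` function, and the address ranges are hypothetical, and applying an allow-list to the Blue Room as well is an assumption made here.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class RoomAccess:
    """Hypothetical description of who may reach a clean room and when."""

    name: str
    always_open: bool  # True for the Blue Room (daily access)
    allowed_sources: list = field(default_factory=list)
    test_window_open: bool = False  # toggled for Red Room test events


def ingress_allowed(room, source_ip):
    """Allow ingress only when the room is open and the source is on its allow-list."""
    if not (room.always_open or room.test_window_open):
        return False
    address = ipaddress.ip_address(source_ip)
    return any(
        address in ipaddress.ip_network(cidr) for cidr in room.allowed_sources
    )


# Hypothetical source ranges reached through the IPsec/VPN tunnel.
blue = RoomAccess("EBS_BlueRoom", always_open=True, allowed_sources=["172.16.1.0/24"])
red = RoomAccess("EBS_RedRoom", always_open=False, allowed_sources=["172.16.2.10/32"])

print(ingress_allowed(blue, "172.16.1.5"))
print(ingress_allowed(red, "172.16.2.10"))  # blocked during normal operation

red.test_window_open = True
print(ingress_allowed(red, "172.16.2.10"))  # allowed during the test window
print(ingress_allowed(red, "172.16.2.11"))  # not a predefined address
```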

Considerations

Consider the following points when deploying this reference architecture.

  • Performance

    Object storage transfer: Production database and compute backups can be large. Test how long it takes to transfer a daily backup between the tenancies. Consider deploying the CRS Tenancy in the same OCI region as the Production Tenancy for the best data transfer performance.

    Anti-virus and signature scanning in the CRS Tenancy: The CRS process can include anti-virus and signature scans of the backup files. Provide sufficient resources on the orchestration server to complete the scans in a timely manner.

  • Security

    Configure Oracle Cloud Guard and Maximum Security Zones to verify that the object storage policies are in place and match the Cyber Recovery security posture, and to prevent any attempt to alter them.

    Additional security measures should be implemented to protect the cloud environment and backup solution from potential attacks.

  • Availability

    This solution can be deployed in any OCI region. You can achieve high availability by deploying redundant orchestration and Bastion servers.

  • Cost

    Consider the following elements when estimating the cost:

    • Object storage: The solution assumes storage of daily database and block volume backups of the production environment for multiple days.
    • Compute cost: Testing backups requires starting up compute and database resources.
    • Software license cost: If the application test is included, then the appropriate software license must be obtained for the application services that will be started during the test.
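Before measuring real transfers, a rough estimate of transfer time and retained storage can help size the backup window and the Object Storage cost. The Python sketch below is simple arithmetic under stated assumptions: the backup size, sustained throughput, and retention period are hypothetical inputs, and real throughput between tenancies must be measured.

```python
def transfer_hours(backup_gib, throughput_gbit_per_s):
    """Estimate hours needed to copy a backup at a sustained throughput."""
    bits = backup_gib * 1024**3 * 8
    seconds = bits / (throughput_gbit_per_s * 1e9)
    return seconds / 3600


def retained_storage_gib(daily_backup_gib, retention_days):
    """Estimate storage held when one full daily backup is kept per retained day."""
    return daily_backup_gib * retention_days


# Hypothetical inputs: a 2 TiB daily backup, 1 Gbit/s sustained, 15 days retained.
daily_backup_gib = 2048
hours = transfer_hours(daily_backup_gib, throughput_gbit_per_s=1.0)
storage = retained_storage_gib(daily_backup_gib, retention_days=15)

print(f"Estimated transfer time: {hours:.2f} hours")
print(f"Estimated retained storage: {storage} GiB")
```

The storage estimate assumes full daily backups with no compression or incremental strategy, so treat it as an upper bound for planning.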

Deploy

You can deploy this reference architecture by downloading the code from GitHub and customizing it for your specific requirements.

  1. Set up the OCI CIS Landing Zone. For more information, see the Deploy a secure landing zone that meets the CIS Foundations Benchmark for Oracle Cloud link in the Explore More section.
  2. Provision two VMs as orchestration servers, one in the Production Tenancy and one in the Cyber Recovery Tenancy, to execute the scripts.
  3. Go to GitHub.
  4. Clone the Production script (Ansible scripts) from the repository to the Production orchestration server.
  5. Clone the Cyber Recovery script (Ansible, Terraform and Shell scripts) from the repository to the Cyber Recovery Tenancy orchestration server.
  6. Follow the instructions in the README document to execute the backup script (Production Tenancy) and the sync and restore scripts (Cyber Recovery Tenancy).

Acknowledgments

  • Authors: Grzegorz Reizer, Peter Deakin, Anand Singh
  • Contributors: Bala Sunil, Elwyne Mabanglo, Bhaskar Ivaturi, Hiren Mehta, Chandan Raychaudhury