Design an Oracle Hyperion EPM System with an OCI Full Stack Disaster Recovery topology

Oracle Cloud Infrastructure Full Stack Disaster Recovery Service orchestrates the transition of compute, database, and applications between Oracle Cloud Infrastructure (OCI) regions around the globe with a single click. You can automate the steps needed to recover one or more business systems without redesigning or rearchitecting existing infrastructure, databases, or applications. The disaster recovery (DR) strategy replicates the application's boot and block volumes, and uses Oracle Data Guard for the database, from the production environment to the standby site, which greatly simplifies the configuration of the standby location. This method aligns with the DR guidelines outlined in the Oracle Enterprise Performance Management System Deployment Options Guide, which adheres to the recommendations for disaster recovery provided for Oracle. Oracle Fusion Cloud Enterprise Performance Management (Oracle Cloud Enterprise Performance Management) and Oracle Hyperion Enterprise Performance Management System (EPM) are used interchangeably in this architecture.

Architecture

This architecture shows a full-stack disaster recovery (DR) architecture for an Oracle Enterprise Performance Management (EPM) system across two OCI regions: a primary region and a standby region. Each region contains virtual cloud networks (VCNs), load balancers, virtual machines, boot volumes, block volumes, file storage, and databases.

The following diagram illustrates this reference architecture.

[Figure: epm-fsdr-architecture.png; download: epm-fsdr-architecture-oracle.zip]

Key features include:

  • Cross-Region Replication: Boot volumes, block volumes, and file storage are replicated across regions to ensure data synchronization.
  • Data Guard: Databases use Oracle Data Guard for continuous data replication, ensuring that the standby region has an up-to-date copy of the primary database.
  • Remote Peering: Dynamic routing gateways (DRGs) in both regions are connected through remote peering, allowing network traffic and resource connectivity between regions.

This setup enables a robust disaster recovery solution, ensuring high availability and business continuity for Oracle EPM systems.
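
Because each layer replicates through a different mechanism, the recovery point you can achieve is bounded by the layer that lags the most. The following is a minimal sketch of that reasoning in Python. The lag figures are hypothetical placeholders, not measurements or service guarantees, and the function name worst_case_rpo is introduced here only for illustration.

```python
from datetime import timedelta

# Hypothetical replication lag per layer (assumed values for illustration).
# Measure the real figures in your own environment.
replication_lag = {
    "boot and block volumes": timedelta(minutes=30),
    "file storage": timedelta(minutes=60),
    "database (Data Guard)": timedelta(seconds=30),
}


def worst_case_rpo(lags):
    """Return (layer, lag) for the layer with the largest replication lag."""
    layer = max(lags, key=lags.get)
    return layer, lags[layer]


layer, rpo = worst_case_rpo(replication_lag)
print(f"Worst-case RPO is {rpo} (bounded by {layer})")
```

With these assumed values, the file storage layer sets the recovery point, even though the database is only seconds behind.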

The architecture has the following components:

  • Region

    An Oracle Cloud Infrastructure region is a localized geographic area that contains one or more data centers, called availability domains. Regions are independent of other regions, and vast distances can separate them (across countries or even continents).

  • Tenancy

    A tenancy is a secure and isolated partition that Oracle sets up within Oracle Cloud when you sign up for Oracle Cloud Infrastructure. You can create, organize, and administer your resources in Oracle Cloud within your tenancy. A tenancy is synonymous with a company or organization. Usually, a company will have a single tenancy and reflect its organizational structure within that tenancy. A single tenancy is usually associated with a single subscription, and a single subscription usually only has one tenancy.

  • Virtual cloud network (VCN) and subnets

    A VCN is a customizable, software-defined network that you set up in an Oracle Cloud Infrastructure region. Like traditional data center networks, VCNs give you control over your network environment. A VCN can have multiple non-overlapping CIDR blocks that you can change after you create the VCN. You can segment a VCN into subnets, which can be scoped to a region or to an availability domain. Each subnet consists of a contiguous range of addresses that don't overlap with the other subnets in the VCN. You can change the size of a subnet after creation. A subnet can be public or private.

  • Remote peering

    Remote peering allows the VCNs' resources to communicate using private IP addresses without routing the traffic over the internet or through your on-premises network. Remote peering eliminates the need for an internet gateway and public IP addresses for the instances that need to communicate with another VCN in a different region.

  • Dynamic routing gateway (DRG)

    The DRG is a virtual router that provides a path for private network traffic between VCNs in the same region, and between a VCN and a network outside the region, such as a VCN in another Oracle Cloud Infrastructure region, an on-premises network, or a network in another cloud provider.

  • Load balancer

    The Oracle Cloud Infrastructure Load Balancing service provides automated traffic distribution from a single entry point to multiple servers in the back end.

  • Application server

    Application servers use a secondary peer that, like the database, takes over processing in the event of a disaster. Application servers use configuration and metadata that are stored in both the database and the file system. Application server clustering provides protection within a single region, but modifications and new deployments must be replicated continuously to the secondary location to keep disaster recovery consistent.

  • Block volume

    With Oracle Cloud Infrastructure Block Volumes, you can create, attach, connect, and move storage volumes, and change volume performance to meet your storage, performance, and application requirements. After you attach and connect a volume to an instance, you can use the volume like a regular hard drive. You can also disconnect a volume and attach it to another instance without losing data.

  • File storage

    The Oracle Cloud Infrastructure File Storage service provides a durable, scalable, secure, enterprise-grade network file system. You can connect to a File Storage service file system from any bare metal, virtual machine, or container instance in a VCN. You can also access a file system from outside the VCN by using Oracle Cloud Infrastructure FastConnect and IPSec VPN.

  • Object storage

    Oracle Cloud Infrastructure Object Storage provides quick access to large amounts of structured and unstructured data of any content type, including database backups, analytic data, and rich content such as images and videos. You can safely and securely store and then retrieve data directly from the internet or from within the cloud platform. You can scale storage without experiencing any degradation in performance or service reliability. Use standard storage for "hot" storage that you need to access quickly and frequently. Use archive storage for "cold" storage that you retain for long periods of time and rarely access.

  • Compute

    With Oracle Cloud Infrastructure Compute, you can provision and manage compute hosts in the cloud. You can launch compute instances with shapes that meet your resource requirements for CPU, memory, network bandwidth, and storage. After creating a compute instance, you can access it securely, restart it, attach and detach volumes, and terminate it when you no longer need it.

  • Data Guard

    Oracle Data Guard and Oracle Active Data Guard provide a comprehensive set of services that create, maintain, manage, and monitor one or more standby databases and that enable production Oracle databases to remain available without interruption. Oracle Data Guard maintains these standby databases as copies of the production database by transmitting redo data from the primary database and applying it to each standby database. If the production database becomes unavailable due to a planned or an unplanned outage, Oracle Data Guard can switch any standby database to the production role, minimizing the downtime associated with the outage. Oracle Active Data Guard provides the additional ability to offload read-mostly workloads to standby databases and also provides advanced data protection features.

  • DNS

    The Oracle Cloud Infrastructure Domain Name System (DNS) service is a highly scalable, global anycast DNS network that offers enhanced DNS performance, resiliency, and scalability, so that end users connect to your applications as quickly as possible, from wherever they are.

  • Oracle Base Database Service

    Oracle Base Database Service is an Oracle Cloud Infrastructure (OCI) database service that enables you to build, scale, and manage full-featured Oracle databases on virtual machines. Oracle Base Database Service uses OCI Block Volumes storage instead of local storage and can run Oracle Real Application Clusters (Oracle RAC) to improve availability.

  • Full Stack Disaster Recovery Service

    Oracle Cloud Infrastructure Full Stack Disaster Recovery Service is an OCI disaster recovery orchestration and management service that provides comprehensive disaster recovery capabilities for all layers of an application stack, including infrastructure, middleware, database, and application.

Recommendations

Use the following recommendations as a starting point. Your requirements might differ from the architecture described here.

  • VCN

    When you create a VCN, determine the number of CIDR blocks required and the size of each block based on the number of resources that you plan to attach to subnets in the VCN. Use CIDR blocks that are within the standard private IP address space.

    Select CIDR blocks that don't overlap with any other network (in Oracle Cloud Infrastructure, your on-premises data center, or another cloud provider) to which you intend to set up private connections.

    After you create a VCN, you can change, add, and remove its CIDR blocks.

    When you design the subnets, consider your traffic flow and security requirements. Attach all the resources within a specific tier or role to the same subnet, which can serve as a security boundary.

  • Security

    Use Oracle Cloud Guard to monitor and maintain the security of your resources in Oracle Cloud Infrastructure (OCI) proactively. Oracle Cloud Guard uses detector recipes that you can define to examine your resources for security weaknesses and to monitor operators and users for risky activities. When any misconfiguration or insecure activity is detected, Oracle Cloud Guard recommends corrective actions and assists with taking those actions, based on responder recipes that you can define.

  • Cloud Guard

    Clone and customize the default recipes provided by Oracle to create custom detector and responder recipes. These recipes enable you to specify which types of security violations generate a warning and which actions can be taken in response. For example, you might want to detect Object Storage buckets that have visibility set to public.

    Apply Cloud Guard at the tenancy level to cover the broadest scope and to reduce the administrative burden of maintaining multiple configurations.

    You can also use the Managed List feature to apply certain configurations to detectors.

  • Security Zones

    For resources that require maximum security, Oracle recommends that you use security zones. A security zone is a compartment associated with an Oracle-defined recipe of security policies that are based on best practices. For example, the resources in a security zone must not be accessible from the public internet and they must be encrypted using customer-managed keys. When you create and update resources in a security zone, Oracle Cloud Infrastructure validates the operations against the policies in the security-zone recipe, and denies operations that violate any of the policies.

  • Network security groups (NSGs)

    You can use NSGs to define a set of ingress and egress rules that apply to specific VNICs. We recommend using NSGs rather than security lists, because NSGs enable you to separate the VCN's subnet architecture from the security requirements of your application.

  • Load balancer bandwidth

    While creating the load balancer, you can either select a predefined shape that provides a fixed bandwidth, or specify a custom (flexible) shape where you set a bandwidth range and let the service scale the bandwidth automatically based on traffic patterns. With either approach, you can change the shape at any time after creating the load balancer.

  • DNS resolution

    By default, the Internet and VCN Resolver does not let instances resolve the host names of hosts in your on-premises network that is connected to your VCN by Site-to-Site VPN or OCI FastConnect. To enable that resolution, use a custom resolver or configure the VCN's private DNS resolver.
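
The VCN recommendation above calls for CIDR blocks that sit in the private address space and don't overlap with any network you plan to connect to privately. The following is a minimal sketch of that check using Python's standard ipaddress module. The network names and CIDR values are hypothetical, and the private-space test assumes IPv4 and the RFC 1918 ranges.

```python
import ipaddress
from itertools import combinations

# Hypothetical CIDR plan (assumed values); replace with your own blocks.
networks = {
    "primary-vcn": "10.0.0.0/16",
    "standby-vcn": "10.1.0.0/16",
    "on-premises": "10.0.128.0/20",
}

# RFC 1918 private IPv4 ranges.
RFC1918 = [
    ipaddress.ip_network(block)
    for block in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def find_overlaps(cidrs):
    """Return name pairs (sorted by name) whose CIDR blocks overlap."""
    parsed = {name: ipaddress.ip_network(cidr) for name, cidr in cidrs.items()}
    return [
        (a, b)
        for a, b in combinations(sorted(parsed), 2)
        if parsed[a].overlaps(parsed[b])
    ]


def in_private_space(cidr):
    """Return True if the IPv4 block lies entirely inside an RFC 1918 range."""
    network = ipaddress.ip_network(cidr)
    return any(network.subnet_of(block) for block in RFC1918)


print(find_overlaps(networks))  # [('on-premises', 'primary-vcn')]
print(all(in_private_space(cidr) for cidr in networks.values()))  # True
```

In this assumed plan, the on-premises block falls inside the primary VCN's range, so one of the two would need to change before you set up a private connection between them.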

Considerations

Consider the following points when deploying this reference architecture:

  • Compute Instances

    This OCI Full Stack Disaster Recovery Service architecture uses moving compute instances, a pattern commonly known as a cold virtual machine (VM) or pilot light DR topology. Application VMs are deployed only in the primary region. When a DR operation runs, the VMs are created in the standby region. The Oracle DB system with Oracle Data Guard must be created in both the primary and standby regions. Before implementing the OCI Full Stack Disaster Recovery Service solution, the primary Oracle Hyperion Enterprise Performance Management System must be installed and fully configured in one OCI region.

  • Protection Groups

    Create two OCI Full Stack Disaster Recovery Service protection groups, one in each region. Each group should include the database, OCI Compute instances, block storage, file system storage, and the load balancer.

  • Load Balancer

    Create the load balancer in the standby region manually, but do not configure it. OCI Full Stack Disaster Recovery Service copies the load balancer configuration from the primary region to the standby region during failover.

  • Performance

    When planning the recovery point objective (RPO) and recovery time objective (RTO), consider the time required for storage backups to be replicated across regions.

  • Availability

    You can leverage custom DNS domain settings to redirect client traffic to the new production region after a failover. By updating the DNS entries to point to the IP addresses of the application hosts in the standby region, client requests will automatically be routed to the newly active region. This ensures seamless redirection of traffic without needing manual intervention at the client side, minimizing downtime and maintaining service availability during and after the failover process.

  • Database

    Your source databases are synchronized using Oracle Data Guard, which ensures continuous replication between the primary and standby databases. During a failover, OCI Full Stack Disaster Recovery Service automatically handles the role switch, promoting the standby database to become the new primary. To ensure smooth failover and application continuity, both the primary and standby databases must use the same database service name. This allows applications and services to seamlessly connect to the new primary database after failover without requiring any changes to connection configurations, reducing downtime and complexity during the recovery process.

  • Compute

    After a failover, the IP addresses of the application layer hosts in the standby region need to be mapped to the original host names from the production region. This ensures that any systems, users, or services trying to connect using the original production host names are redirected to the corresponding hosts in the standby region, now functioning as the new active environment. By updating the DNS records or reconfiguring any relevant network settings to point to the new IP addresses in the standby region, the transition becomes seamless, allowing for minimal disruption to application availability and user access.
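
The host-name mapping described in the Compute consideration can be sketched as follows. This is a minimal illustration in Python that assumes the mapping is expressed as hosts-file style lines; the same pairs could instead drive DNS record updates. The host names, IP addresses, and the function name build_hosts_entries are all hypothetical.

```python
import ipaddress

# Hypothetical original host names from the production region.
primary_hosts = ["epm-web1.example.com", "epm-app1.example.com"]

# Hypothetical private IPs assigned to the recovered hosts in the standby region.
standby_ips = {
    "epm-web1.example.com": "10.1.1.10",
    "epm-app1.example.com": "10.1.2.10",
}


def build_hosts_entries(host_names, ip_by_host):
    """Map each original host name to its standby IP as hosts-file lines."""
    missing = [host for host in host_names if host not in ip_by_host]
    if missing:
        raise ValueError(f"no standby IP for: {', '.join(missing)}")
    lines = []
    for host in host_names:
        # ip_address() raises ValueError if the string is not a valid address.
        ip = ipaddress.ip_address(ip_by_host[host])
        short_name = host.split(".")[0]
        lines.append(f"{ip}\t{host}\t{short_name}")
    return lines


for line in build_hosts_entries(primary_hosts, standby_ips):
    print(line)
```

Failing early when a host has no standby address keeps a partially mapped environment from going live unnoticed.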

Acknowledgments

  • Author: Grzegorz Reizer - EPM Specialist, OCI Specialist Team