Note:
- This tutorial requires access to Oracle Cloud. To sign up for a free account, see Get started with Oracle Cloud Infrastructure Free Tier.
- It uses example values for Oracle Cloud Infrastructure credentials, tenancy, and compartments. When completing your lab, substitute these values with ones specific to your cloud environment.
Automate Recovery for Oracle Enterprise Performance Management using OCI Full Stack Disaster Recovery
Part 1: Introduction
Oracle Cloud Infrastructure Full Stack Disaster Recovery (OCI Full Stack DR) orchestrates the transition of compute, database, and applications between Oracle Cloud Infrastructure (OCI) regions around the globe with a single click. Customers can automate the steps needed to recover one or more business systems without redesigning or rearchitecting existing infrastructure, databases, or applications.
This tutorial outlines the procedures for using the OCI Full Stack DR service to manage switchover and failover processes for an Oracle Enterprise Performance Management System environment (release 11.1.2.x or 11.2.x) within an OCI disaster recovery framework. Note that the configuration of the system topology and other lifecycle activities (such as patching, testing, and expanding) fall outside the scope of OCI Full Stack DR.
The disaster recovery (DR) strategy employs a comprehensive replication of both boot and block volumes for the application, and Oracle Data Guard for the database, from the production environment to the standby site, greatly simplifying the configuration of the standby location. This method aligns with the DR guidelines outlined in the EPM System Deployment Options Guide, which adheres to the recommendations for disaster recovery provided for Fusion Middleware.
Oracle Enterprise Performance Management (Oracle EPM) and Hyperion EPM are used interchangeably in this tutorial.
Oracle Enterprise Performance Management is Normally Part of a Larger System
This tutorial assumes that Oracle EPM is the only application being added to the DR protection groups, which is not typical of real deployments.
This tutorial focuses exclusively on Oracle EPM System for the sake of clarity. In practice, Oracle EPM is typically a component of a larger business system that includes various services and applications within a single OCI Full Stack DR protection group and set of DR plans. You will likely follow similar Oracle Help Center (OHC) tutorials for other applications and services, such as PeopleSoft, Oracle WebLogic Server, Oracle Analytics Cloud, and Oracle Integration.
Caution About Implementing Incrementally
Adding more members to a DR protection group after creating DR plans will delete all existing DR plans in the protection groups at both regions.
OCI Full Stack DR is designed with the assumption that the entire application stack for a given business system is already deployed across OCI regions and manual DR has already been proven to work. If your business system includes more than Oracle EPM, then add all members for all other applications or OCI services to the DR Protection Groups before creating any DR plans.
How the Recovery Works
The recovery solution for Oracle EPM requires OCI Full Stack DR to execute a series of custom shell scripts during a recovery operation such as a failover or switchover. The scripts referenced in this tutorial are provided by the EMEA Cloud Architecture Specialists team and are available in an Oracle EPM GitHub repository specifically tailored for this Hyperion EPM DR solution. The scripts are downloaded to a compute instance that is part of the application stack that OCI Full Stack DR will manage during a recovery operation.
This tutorial explains how to download the scripts and how to use them in a later step.
These scripts are provided as generic guidance. You can use your own scripts or customize the provided scripts according to your corporate policy and security requirements.
Oracle EPM Deployment Architecture
In this tutorial, we will use a moving-instances topology for the Oracle EPM application. In general DR terminology, moving instances are known as a cold VM or pilot-light topology. Application VMs are deployed only in the primary region; during a DR operation, the VMs are created at the standby region. An Oracle Base Database Service system with Oracle Data Guard must be created in both the primary and standby regions. Before the OCI Full Stack DR solution can be implemented, the primary Hyperion EPM System must be installed and fully configured in one OCI region.
This design is based on the reference DR architecture for Hyperion on OCI. For more information, see Design the infrastructure to deploy Oracle Enterprise Performance Management in the cloud.
Private OCI Load Balancer
Traffic from your internal and on-premises users flows through IPSec VPN tunnels or FastConnect virtual circuits to the dynamic routing gateway (DRG) that is attached to the VCN. A private load balancer intercepts the requests and distributes them to the private web tier.
The web tier is hosted on a compute instance that is attached to a private subnet.
Application Tier
All the compute instances in the application tier are attached to a private subnet. This isolation at the network level shields the applications from unauthorized network access and other resources in the topology.
The service gateway enables the private compute instances in the application tier to access Yum and WSUS servers within the region to get operating system updates and additional packages. Additionally, the service gateway allows you to back up the applications to OCI Object Storage within the region, without traversing the public internet.
Data stored in block volumes and file systems is replicated to the standby region using Cross-Region Replication (CRR).
Database Tier
Oracle Base Database Service hosts the EPM System schemas. Data is synchronized to the standby region using Data Guard.
Figure 1: Oracle EPM reference architecture
Becoming Familiar with the Entire Process
The EMEA OCI Specialist and OCI Full Stack DR engineering teams have created a series of companion videos for this tutorial to help you understand the entire process flow. These videos are part of an OCI Full Stack DR Oracle EPM playlist on YouTube that can be accessed using the following links:
- Video 1: Deploy Oracle EPM for Disaster Recovery
- Video 2: Automate Disaster Recovery operations for Oracle EPM using OCI Full Stack DR
- Video 3: Scripts used to Automate Recovery for Oracle EPM
Part 2: Step-by-Step Instructions
This part begins the step-by-step instructions needed to add Oracle EPM to OCI Full Stack DR.
Objectives
The following steps will be covered in this tutorial explaining how to automate recovery for Oracle EPM using Full Stack DR:
- Task 1: Deploy Oracle EPM for Disaster Recovery
- Configure a cross-region Oracle Data Guard for Oracle Base Database Service
- Prepare the DR Control Node to run custom automation
- Create block volume group
- Create Oracle Cloud Infrastructure Identity and Access Management (OCI IAM) policies for Full Stack DR
- Create OCI IAM policies for other OCI services
- Create OCI Object Storage buckets for logs
- Create standby load balancer (optional)
- Task 2: Create DR Protection Groups (DRPG)
- Task 3: Add members to Region 1 and Region 2 DRPGs
- Task 4: Create basic DR plans in Region 2 (Newport)
- Create switchover plan
- Create failover plan
- Task 5: Customize the switchover plan in Region 2 (Newport)
- Task 6: Customize the failover plan in Region 2 (Newport)
- Task 7: Execute the switchover plan in Region 2 (Newport)
- Task 8: Create basic DR plans in Region 1 (London)
- Create switchover plan
- Create failover plan
- Task 9: Customize the switchover plan in Region 1 (London)
- Task 10: Customize the failover plan in Region 1 (London)
Definitions and Assumptions throughout the Tutorial
Regions
Region 1 is London
- London will start out as the primary region. This role will eventually change to standby after you are instructed to perform a switchover in later steps.
Region 2 is Newport
- Newport will start out as the standby region. This role will eventually change to primary after you are instructed to perform a switchover in later steps.
Compartments
You are free to organize Oracle EPM and OCI Full Stack DR into any compartment scheme that works within your standards for IT governance. We have chosen to organize applications into their own individual compartments, then organize all DR Protection Groups into a single compartment where completely different business systems can all be seen at a glance.
DR Control Node
The DR Control Node is any compute instance that you designate to host the custom scripts that perform specific tasks to recover the EPM System. The scripts are called by OCI Full Stack DR during a recovery operation. Any existing compute instance that is a member of a DR protection group (DRPG) can be the control node. In this tutorial, the EPM System application node also serves as the DR Control Node and hosts all the custom scripts used in the recovery process.
Prerequisites
- A fully configured EPM System should be deployed in one OCI region. It should use an Oracle Base Database Service with a license that allows the deployment of cross-region Oracle Data Guard.
- VCN and subnets in the standby region. Mirroring the network topology from the primary region in the standby region is recommended. For cross-region Oracle Data Guard replication, VCN peering between the primary and standby regions must be enabled; therefore, the VCNs must not have overlapping CIDR ranges.
Task 1: Deploy Oracle EPM for Disaster Recovery
OCI Full Stack DR is not involved in any part of this step.
Task 1.1: Configure a Cross-Region Oracle Data Guard for Oracle Base Database Service
To deploy cross-region Oracle Data Guard for Oracle Base Database Service, see Use Oracle Data Guard on a DB System.
When synchronizing databases in your system with Oracle Data Guard, it is crucial to use the same database service name or TNS alias for both the primary and standby databases. This practice minimizes the changes needed at the application layer after a switchover, ensuring a smooth transition. For detailed guidance, refer to the “Database Considerations” section in the Fusion Middleware DR documentation, where various approaches are thoroughly described.
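For illustration only, the sketch below shows the kind of tnsnames.ora entry you would keep identical on both sides; the alias name, hostname, and service name are placeholders rather than values from this tutorial.

```
# Hypothetical tnsnames.ora entry kept identical on the primary and standby
# application nodes. The alias EPMDB never changes across regions, so the EPM
# data sources never need to be edited after a switchover; only the host behind
# the alias differs (or stays the same when host file aliasing is used).
EPMDB =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = epmdb.example.com)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = epm_svc.example.com)
    )
  )
```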
Task 1.2: Prepare the DR Control Node to Run Custom Automation
Designate a compute instance to act as a DR Control Node for EPM Full Stack DR. This can be an existing compute instance, or it can be a compute instance created just for this purpose. See the options below for more details. Ensure the compute instance(s) acting as the DR Control Node has been configured to run commands using the Oracle Cloud Agent: Running Commands on an Instance.
It is best to use a movable compute instance, but you can also designate a non-movable compute instance in region 1 and another one in region 2 if you do not have any movable compute as part of your DR solution. You will need to maintain any changes you make to scripts or the guest OS in both regions if non-movable compute is used for this role.
Download the custom scripts from the Oracle EPM GitHub repository to the DR Control Node; they were written specifically for this EPM System DR example. Copy the scripts to any subdirectory on the compute instance acting as the DR Control Node. In this tutorial, the EPM application node is also used as the DR Control Node, so any reference to the DR Control Node means the same EPM VM. The scripts were created specifically for this example of EPM System recovery and will have to be modified for use in your recovery solution. In this tutorial, the EPM application runs on a Windows VM, so the PowerShell (.ps1) scripts are used. If you use a Linux VM, equivalent shell scripts are available in the same GitHub repository.
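A minimal PowerShell sketch of that download step is shown below; the repository URL is a placeholder, so substitute the raw URL of the Oracle EPM GitHub repository linked in this tutorial, and adjust the script list to what your solution needs.

```powershell
# Hedged sketch: copy the sample scripts onto the DR Control Node.
# $baseUrl is a placeholder for the raw URL of the Oracle EPM GitHub repository.
$scriptDir = "c:\scripts"
New-Item -ItemType Directory -Path $scriptDir -Force | Out-Null

$baseUrl = "https://raw.githubusercontent.com/<org>/<repo>/main"
$scripts = "stop_services.ps1", "start_services.ps1",
           "host_switch_failover.ps1", "host_switch_failback.ps1"
foreach ($script in $scripts) {
    Invoke-WebRequest -Uri "$baseUrl/$script" -OutFile (Join-Path $scriptDir $script)
}
```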
Task 1.3: Create Block Volume Group
Create a block volume group in region 1 and ensure it is replicated in region 2. Ensure the boot volume for the DR Control Node is a member of a block volume group and the block volume group is replicated to region 2. For more information, see Creating a Volume Group.
Ensure that any other boot and block volumes belonging to any other movable compute instances for this OCI Full Stack DR project also belong to a block volume group replicated to region 2.
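If you prefer the OCI CLI to the console, a hedged sketch of the volume group creation follows; the OCIDs, availability domain names, and region identifiers are placeholders, and the parameter names should be verified against your installed CLI version.

```powershell
# Hedged sketch: create a volume group in region 1 (London) with a cross-region
# replica in region 2 (Newport, uk-cardiff-1). All OCIDs and AD names are placeholders.
$sourceDetails = '{"type": "volumeIds", "volumeIds": ["ocid1.bootvolume.oc1.uk-london-1.aaaa..."]}'
$replicas      = '[{"availabilityDomain": "XXXX:UK-CARDIFF-1-AD-1", "displayName": "epm-vg-replica"}]'

oci bv volume-group create `
  --compartment-id "ocid1.compartment.oc1..aaaa..." `
  --availability-domain "XXXX:UK-LONDON-1-AD-1" `
  --display-name "epm-app-vg" `
  --source-details $sourceDetails `
  --volume-group-replicas $replicas
```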
Task 1.4: Create OCI IAM Policies for OCI Full Stack DR
Configure the required OCI IAM policies for OCI Full Stack DR as outlined in the following documents.
- Policies for OCI Full Stack DR
- Configuring Identity and Access Management (IAM) policies to use Full Stack DR
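As a hedged illustration only, a minimal statement could look like the following, assuming an administrator group named FSDR-Admins and a compartment named epm-dr; verify the exact resource-type names in the documents above.

```
Allow group FSDR-Admins to manage disaster-recovery-family in compartment epm-dr
```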
Task 1.5: Create OCI IAM Policies for other Services managed by OCI Full Stack DR
OCI Full Stack DR must have the ability to control and manage other key OCI services such as compute, networking, storage, vaults, databases and other miscellaneous services. Configure the required OCI IAM policies for other services explained here: Policies for Other Services Managed by Full Stack Disaster Recovery.
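The statements below are a hedged sketch of what such policies can look like, again assuming the hypothetical group FSDR-Admins and compartment epm-dr; the definitive list of required statements is in the document linked above.

```
Allow group FSDR-Admins to manage instance-family in compartment epm-dr
Allow group FSDR-Admins to manage volume-family in compartment epm-dr
Allow group FSDR-Admins to manage database-family in compartment epm-dr
Allow group FSDR-Admins to manage load-balancers in compartment epm-dr
Allow group FSDR-Admins to manage object-family in compartment epm-dr
Allow group FSDR-Admins to read secret-family in compartment epm-dr
```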
Task 1.6: Create OCI Object Storage Buckets for DRPG Logs
Task 1.6.1: Navigate to OCI Object Storage
Begin by navigating to Object Storage & Archive Storage as shown in Figure 1.1.
- Ensure the browser context is set to region 1 (London).
- Select Storage.
- Select Buckets.
Figure 1.1: Navigate to object storage
Task 1.6.2: OCI Object Storage Bucket in Region 1
Create an OCI Object Storage bucket in region 1. In a later step, the bucket will be assigned to the DR protection group in region 1.
- Select the compartment that contains EPM system related resources.
- Click Create Bucket.
- Give the bucket a meaningful name that easily identifies which application and purpose it serves.
- Use the default value for Default Storage Tier and Encryption.
- Click Create to create the bucket.
Figure 1.2: Create an object storage bucket in region 1
Task 1.6.3: OCI Object Storage Bucket in Region 2
Follow the same process to create an object storage bucket in region 2 (Newport). Make sure to select the Newport region. In a later step, the bucket will be assigned to the DR protection group in region 2.
- Change the context to region 2.
- Select the compartment that contains EPM System related resources in region 2.
- Click Create to create the bucket.
Figure 1.3: Create an object storage bucket in region 2
Task 1.7: (Optional) Create OCI Load Balancer in Standby Region
Use of OCI Load Balancer is optional; if your EPM System does not include one, skip this task.
Create an OCI Load Balancer in the Standby Region:
- Mirror the configuration of the primary load balancer.
- Create an empty backend set within this load balancer. At this point in the standby region, there are no backends to include in the configuration, so only an empty backend set should be created.
Configuration During Switchover or Failover:
- The configuration of the primary EPM System backend set will be copied to the empty standby backend set.
Certificates and Listeners:
- If you use your own certificates, load them onto the standby load balancer.
- Configure listeners to match the primary load balancer’s configuration.
Post-Switchover Update:
- After the switchover, update the DNS information to point to the IP address of the standby load balancer (see the sketch after this task).
For more information, see Overview of Load Balancer.
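If the DNS zone is hosted in OCI DNS, the post-switchover record update can be scripted; the following is a hedged sketch in which the zone name, record name, and IP address are placeholders.

```powershell
# Hedged sketch: repoint the application A record at the standby load balancer IP.
# Zone, domain, and rdata values are placeholders; verify parameters against your
# installed OCI CLI version.
oci dns record rrset update `
  --zone-name-or-id "example.com" `
  --domain "epm.example.com" `
  --rtype "A" `
  --items '[{"domain": "epm.example.com", "rdata": "10.1.0.50", "rtype": "A", "ttl": 60}]' `
  --force
```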
Task 2: Create DR Protection Groups (DRPG) in Both Regions
Note: Skip Task 2 entirely if Oracle EPM is being added to existing DR Protection Groups.
Create DR protection groups in region 1 and region 2 if the protection groups for this application stack do not exist yet.
Task 2.1: Navigate to DR Protection Groups
Begin by navigating to DR Protection Groups (OCI Full Stack DR) as shown in Figure 2-1 below.
- Ensure the OCI region context is set to region 1 (London).
- Click Migration & Disaster Recovery.
- Click DR Protection Groups.
Figure 2-1: Navigate to DR protection groups
Task 2.2: Create a Protection Group in Region 1
Create a basic DR protection group (DRPG) in region 1 as shown in Figure 2-2 below. The peer, role and members will be assigned in later steps.
- Select the compartment where you want the DRPG to be created. This can be the same compartment where EPM System resources exist.
- Click Create DR protection group to open the dialog.
- Use a meaningful name for the DRPG.
- Select the object storage bucket created in Task 1.6 for region 1.
- Click Create.
Figure 2.2: Parameters needed to create DR protection group in region 1
Task 2.3: Create a Protection Group in Region 2
Create a basic DR protection group (DRPG) in region 2 as shown in Figure 2-3 below. The peer, role and members will be assigned in later steps.
- Change the OCI region context to region 2 (Newport).
- Select the compartment where you want the DRPG to be created. This can be the same compartment where EPM System resources exist.
- Click Create DR protection group to open the dialog.
- Use a meaningful name for the DRPG.
- Select the object storage bucket created in Task 1.6 for region 2.
- Click Create.
Figure 2-3: Parameters needed to create DR protection group in region 2
Task 2.4: Associate Protection Groups in Region 1 and Region 2
Associate the DRPGs in each region as peers of each other and assign the peer roles of primary and standby. This is how OCI Full Stack DR will know which two regions work together for Oracle EPM System recovery. The roles of primary and standby are automatically changed by OCI Full Stack DR as part of any DR operation/DR plan execution; there is no need to manage the roles manually at any time.
Task 2.4.1: Begin the Association
- Ensure OCI region context is set to region 1 (London).
- Click Associate to begin the process.
Figure 2.4.1: Begin DRPG association
Task 2.4.2: Associate Protection Groups in Region 1 and Region 2
Provide the parameters as shown in Figure 2.4.2.
- Select primary role. OCI Full Stack DR will assign the standby role to region 2 automatically.
- Select region 2 (Newport), where the other DRPG was created.
- Select the peer DRPG that was created in region 2.
- Click Associate.
Figure 2.4.2: Parameters needed to associate the DRPGs
Task 2.4.3: What You Should See After the Association is Complete
Once the association is complete, OCI Full Stack DR will show something like Figure 2.4.3.
- The current primary peer DRPG is London (region 1).
- The current standby peer DRPG is Newport (region 2).
Figure 2.4.3: Showing the peer relationship from the individual DRPG perspective
The same information can be found whenever the context/view is from a global perspective showing all DR protection groups as shown in Figure 2.4.4 below.
- The current primary peer DRPG is London (region 1).
- The current standby peer DRPG is Newport (region 2).
Figure 2.4.4: Showing the peer relationship from the global DRPG perspective
Task 3: Add Members to the DR Protection Group
Note: This task will delete any existing DR plans in both regions when adding members to existing DR Protection Groups. OCI Full Stack DR cannot save copies or make backups of DR protection groups at the time of this writing. Make sure you have recorded all the information about any DR plan groups and steps in a text file or spreadsheet to help recreate the custom, user-defined plan groups and steps. You can also create bash scripts that call OCI Full Stack DR CLI commands to recreate the custom, user-defined plan groups and steps (this is beyond the scope of this tutorial).
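As a hedged starting point for such a record, the sketch below lists the DR plans of a protection group and saves each plan definition to a JSON file; the DRPG OCID is a placeholder, and the command names should be verified against your installed OCI CLI version.

```powershell
# Hedged sketch: export each DR plan in a protection group to a JSON file so the
# custom plan groups and steps can be recreated after members are added.
$drpgId = "ocid1.drprotectiongroup.oc1.uk-cardiff-1.aaaa..."   # placeholder OCID
$plans  = oci disaster-recovery dr-plan list --dr-protection-group-id $drpgId --all |
          Out-String | ConvertFrom-Json
foreach ($plan in $plans.data.items) {
    oci disaster-recovery dr-plan get --dr-plan-id $plan.id |
        Out-File -FilePath ("{0}.json" -f $plan.'display-name') -Encoding utf8
}
```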
Add the database and Oracle EPM System application compute node(s) as members of the DR protection groups. The DR Control Node is either a compute instance you created just to support the DR orchestration or a compute instance that is part of the application stack you want to manage with Full Stack DR. In this example, the EPM System application node also performs the function of the DR Control Node.
You will add the following resources to the primary DRPG in region 1:
- The EPM System application compute node, which also performs the function of the DR Control node.
- The volume group containing the boot volume of the EPM System application compute node. If present, any additional block volumes attached to the compute nodes should be included in the volume group.
- The primary Oracle Base Database Service.
- The primary load balancer.
Task 3.1: Begin Adding Members to DRPG in Region 1
Begin by selecting the DRPG in region 1 as shown in the Figure 3-1.
- Ensure the OCI region context is region 1 (London).
- Select the DRPG in region 1.
- Select Members.
- Click Add Member to begin the process.
Figure 3-1: How to begin adding members to DR protection group in region 1
Task 3.1.1: Add Compute Instance for DR Control Node
Add a compute instance for the EPM System, which also serves as the DR Control Node, as illustrated in Figure 3.1.1. In this example, a single compute instance hosts all modules of the EPM System. If your EPM System is deployed in a distributed environment with multiple compute nodes, ensure that each node is included in this step.
- Acknowledge warning about DR plans.
- Enter Compute as a member Resource type.
- Select the EPM System application compute instance. This same compute instance will also be used for running user-defined scripts.
- Select Moving instance.
- Tell OCI Full Stack DR which VCN and subnet to assign to the VNIC(s) at region 2 during a recovery. Figure 3.1.1 shows a single VNIC. OCI Full Stack DR does not care how many VNICs you have or how they are configured at either region; specify whatever fits your requirements. Make sure to provide a valid IP address from the target subnet in the standby region. This will simplify updating host files after the switchover, as the compute instance will consistently use the same, known IP address.
Figure 3.1.1: Parameters needed to add DR Control Node
Task 3.1.2: Add Block Volume Group for DR Control Node
Add the block volume group containing boot and block volumes attached to the EPM System application node. The block volume group must already have cross-region replication configured between the two regions before adding it to the DR protection group.
- Select Volume group as member Resource type.
- Ensure the correct compartment containing the volume group is selected, then select the volume group.
Figure 3.1.2: Parameters needed to add boot volume group for EPM Compute
Task 3.1.3: Add Primary Oracle Base Database Service
At this point, Oracle Data Guard should already be configured for the Oracle Base Database service system as part of Task 1. Add the primary DB as a member of the DRPG in region 1.
- Select Database as member Resource type.
- Ensure that the correct compartment for the database is selected.
- Provide details of the secret in the OCI Vault containing the EPM database SYS user password. You created this secret during the configuration of Oracle Data Guard in Task 1.
Figure 3.1.3: Parameters needed to add Primary DB running from Base DB service
Task 3.1.4: Add OCI Load Balancer
In this example, we are adding the load balancer as a member of the DRPG in region 1.
- Select Load Balancer as member Resource type.
- Ensure the correct compartment for the load balancer is selected.
- Select Source Backend Set: This is the backend set used by your EPM System application. An OCI Load Balancer can be shared among multiple applications and may have multiple backend sets configured. During a DR switchover, only the backend sets specified here will have their configuration moved to the standby region.
- Select Destination Backend Set: This is the empty backend set created in Task 1.7 in region 2.
Figure 3.1.4: Parameters needed to add Load Balancer
Task 3.1.5: Verify Member Resources for Region 1
The DRPG for region 1 should now have four member resources as shown in Figure 3.1.5. The names of your member resources will be different.
- The primary database.
- The movable compute instance.
- The block volume group for the compute instance.
- The OCI Load Balancer.
Figure 3.1.5: Showing members of DRPG in region 1
Task 3.2: Begin Adding Members to DRPG in Region 2
Begin by selecting the DRPG in region 2.
- Ensure the OCI region context is region 2 (Newport).
- Select the DRPG in region 2.
- Click Members.
- Click Add Member to begin the process.
You will add the following resources to the standby DRPG in region 2 by following similar steps as in the primary region:
- The standby/remote Oracle Base Database Service system.
- Standby OCI Load Balancer.
Once the task is completed, the DRPG for region 2 should have two member resources as shown in Figure 3.2 below.
Figure 3.2: Showing members of DRPG in region 2
Task 4: Create Basic DR Plans in Region 2 (Newport)
This step creates basic switchover and failover plans associated with the standby DR Protection Group in region 2 (Newport).
The purpose of each plan is to transition the workload from primary region 1 to standby region 2. The roles of the DR protection groups in both regions are automatically reversed as part of any DR operation, so the protection group in region 1 will become the standby and the protection group in region 2 will become primary after a failover or switchover.
OCI Full Stack DR will pre-populate both plans with built-in steps based on the member resources added in the previous tasks. The plans will be customized in later steps to handle all the tasks related to EPM System during a recovery operation.
The switchover plans are always created in the protection group with the standby role; region 2 is currently the standby protection group, so we will begin in Newport.
Task 4.1: Create DR Plans
Create a basic plan by selecting the DRPG in region 2 (Newport).
- Ensure the OCI region context is region 2 (Newport).
- Select the standby DRPG in region 2.
- Select Plans.
- Click Create Plan to begin the process.
Figure 4-1: How to begin creating basic DR plans in region 2
Task 4.1.1: Create a Switchover Plan
Creating a DR plan is simple, as shown in Figure 4.1.1 below.
- Make the name of the switchover plan simple but meaningful. The name should be as short as possible but easy to understand at a glance to help reduce confusion and human error during a crisis.
- Select the plan type as Switchover (planned). There are only four plan types at the time of this writing.
Figure 4.1.1: The parameters needed to create DR switchover plan
Task 4.1.2: Create a Failover Plan
Follow the same process to create a basic failover plan as shown in Figure 4.1.2 below.
- Make the name of the failover plan simple but meaningful. The name should be as short as possible but easy to understand at a glance to help reduce confusion and human error during a crisis.
- Select the plan type as Failover (unplanned). There are only four plan types at the time of this writing.
Figure 4.1.2: The parameters needed to create DR failover plan
The standby DR Protection Group in region 2 should now have the two DR plans as shown in the following image. These will handle transitioning workloads from region 1 to region 2. You will create similar plans at region 1 to transition workloads from region 2 back to region 1 in a later task.
Figure 4.1.3: Showing the two basic DR plans that must exist in region 2 before proceeding any further
Task 5: Customize the Switchover Plan in Region 2 (Newport)
The basic DR plans created in Task 4 contain prepopulated steps for recovery tasks that are built into Full Stack DR and do not contain anything to manage recovery tasks specific to the EPM System application. This step explains how to add custom, user-defined DR Plan Groups and steps to manage the tasks that need to be accomplished during a switchover for the EPM System:
- Stop EPM System services at the current primary region 1 before stopping any VMs.
- Update the host files on compute node to map standby IP addresses to primary region hostnames.
- Start EPM System services at the current standby region 2 after launching any VMs.
Task 5.1: Select the Switchover Plan
Navigate to the switchover plan created in Task 4.
Figure 5.1: How to begin customizing the switchover plan in region 2
Task 5.2: (Optional) Enable DR Plan Groups that Terminate Artifacts
There are two plan groups that are disabled by default in switchover plans as shown in the following screenshot. They are disabled to provide a level of comfort during testing that nothing is actually being deleted and you still have a viable copy of the artifacts as a backup in case something goes wrong during testing.
However, these two plan groups terminate (delete) artifacts that will never be used again as part of any DR operation in the future. The artifacts will simply continue to accumulate over time as you switch back and forth between the two regions causing confusion about which compute instances and volume groups are the ones that should actually be active.
These plan groups should be enabled once OCI Full Stack DR goes into production. Any artifacts that were left in place during testing switchovers and switchbacks while these two plan groups were disabled should be terminated and cleaned up before going into production to reduce confusion and the likelihood of human error during normal operations.
Optionally, these plan groups can be enabled now to avoid having to manually clean up the superfluous artifacts before going into production.
Figure 5.2: Plan groups disabled by default
Here is what the disabled plan groups do when they are enabled:
- Terminate compute instances: this plan group terminates artifacts of compute instances that are left behind at region 1 after the replicated versions of the VMs have been launched at region 2 during the OCI Block Storage operation that reverses the replication from region 2 back to region 1 as part of the switchover. The leftover VMs are not used during a switchback because the operation to reverse block volume replication creates all new VMs in completely new block volume groups.
- Terminate volume groups: this plan group terminates artifacts of block volume groups (VGs) that are left behind at region 1 after the replicated versions of the VGs have been activated at region 2 and volume group replication has been reversed during the switchover. The leftover block volume groups are never used again, not even as part of a switchover from region 2 back to region 1.
Task 5.2.1: Enable Terminate Compute Plan Group
Enable the plan group.
- Select Enable all steps from the context menu to the right of the plan group name.
Figure 5.2.1: How to enable terminate compute instances
Task 5.2.2: Enable Terminate Volume Groups Plan Group
Enable the plan group.
- Select Enable all steps from the context menu to the right of the plan group name.
Figure 5.2.2: How to enable terminate volume groups
Task 5.3: Create a Plan Group to Execute Custom Scripts at Region 1 (Primary)
Begin adding custom, user-defined DR Plan Groups.
The first user-defined plan group will execute custom scripts to stop the EPM System services running at primary region 1. This plan group will contain a single step that calls the Windows PowerShell script stop_services.ps1, which was downloaded to the folder c:/scripts on the EPM Application node in Task 1.2.
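The actual stop_services.ps1 comes from the GitHub repository referenced in Task 1.2; the sketch below is only a hedged illustration of the kind of work it performs, assuming EPM Windows services whose names start with HyS9 and the stop <region-id> argument convention used later in this task.

```powershell
# Hedged illustration only, not the repository script. Assumes Hyperion Windows
# services named HyS9*; the plan step passes the action ("stop") and the OCI region ID.
param(
    [Parameter(Mandatory)][ValidateSet("start", "stop")][string]$Action,
    [Parameter(Mandatory)][string]$RegionId
)

$services = Get-Service -Name "HyS9*" -ErrorAction SilentlyContinue
foreach ($svc in $services) {
    if ($Action -eq "stop") { Stop-Service -Name $svc.Name -Force }
    else                    { Start-Service -Name $svc.Name }
}
Write-Output ("{0} completed for {1} EPM services (region {2})" -f $Action, @($services).Count, $RegionId)
```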
Task 5.3.1: Select Add Plan Group
Begin the process of adding a plan group.
- Click Add group to begin.
- Give the plan group a simple but descriptive name. Optionally, as a best practice, add a note indicating the region in which the plan group will execute its steps.
- Select a position where the plan group will be inserted into the DR plan. In this case, we are going to insert our user-defined plan group before the built-in plan group that stops the VMs at region 1.
- Select the built-in Stop Compute Instances (primary) plan group.
- Click Add Step to open the dialog where we will specify the script to stop the EPM System.
Figure 5.3.1: Parameters to create plan group and add step to stop EPM
Task 5.3.2: Provide Step Name and Local Script Parameters
The Add plan group step dialog allows us to specify parameters about what this one step will perform and how it will behave during recovery. In this case, it will stop EPM System services in region 1.
We will explain all fields in this dialog, but leave out this detail in all the remaining screenshots in subsequent steps since we are just performing the same process repeatedly.
- A descriptive Step name explaining what task this step performs.
- Always select the region where the EPM App node is running right now, not where it will be running during a switchover. OCI Full Stack DR will keep track of where the VM is running, so you just need to specify where it is right now. In this case, the EPM App node is running in region 1 (London).
- Select the correct compartment that contains the DR Control Node. Then select the compute instance designated as the DR Control Node; in this example, it is EPM system application compute.
- Select Run local script to inform OCI Full Stack DR that the script will be found on a compute instance. The Windows PowerShell scripts were downloaded to the DR Control Node in Task 1.2.
- Paste in the absolute path where you installed the stop_services.ps1 script on the DR Control Node. Add stop as the first parameter and the OCI region ID as the second.
- The DR plan should stop if the script fails to stop EPM Services. This will allow anyone to see there is a problem and fix it. OCI Full Stack DR provides the opportunity to continue running the switchover plan after fixing the problem.
- The default value before Full Stack DR declares a failure is one hour. This value can be changed to 30 minutes or whatever is felt to be a more realistic timeout value.
- Click Add step to add this step to the plan group.
Figure 5.3.2: Parameters to create the plan step to stop EPM
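Putting the parameters together, the script command entered for this step might look like the following, assuming the scripts were copied to c:\scripts and the current primary region is uk-london-1.

```powershell
# Hedged example of the "Run local script" command for this plan step.
powershell.exe -File c:\scripts\stop_services.ps1 stop uk-london-1
```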
Task 5.3.3: Complete Adding Plan Group and Step
The step to stop the EPM System has now been added to the DR plan group, as shown in Figure 5.3.3 below.
This shows the plan step that was just added. It is possible to add additional steps to a DR plan group, but this plan group will only include the step to stop EPM Services. Click Add to add the DR plan group and step to the DR plan.
Figure 5.3.3: Finalize adding plan group and step to stop EPM
Task 5.4: Create Plan Group to Execute Custom Scripts at Region 2 (Standby)
The second user-defined plan group will update host files on compute nodes and start EPM System services after the DR Control Node is launched at the standby region 2. This plan group will contain two steps that call the host_switch_failover.ps1 and start_services.ps1 PowerShell scripts that were downloaded to the DR Control Node in Task 1.2.
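The actual host_switch_failover.ps1 comes from the GitHub repository referenced in Task 1.2; below is a hedged sketch of the kind of host file update it performs, with placeholder hostnames and IP addresses.

```powershell
# Hedged illustration only, not the repository script. Maps the original region 1
# hostnames to the standby region 2 IP addresses in the Windows hosts file.
$hostsFile = "C:\Windows\System32\drivers\etc\hosts"
$mappings  = @{
    "epmapp.example.com" = "10.1.1.10"   # EPM application node IP in region 2
    "epmdb.example.com"  = "10.1.2.10"   # database listener IP in region 2
}

$lines = Get-Content $hostsFile
foreach ($name in $mappings.Keys) {
    # Remove any existing entry for the hostname, then append the new mapping.
    $lines = $lines | Where-Object { $_ -notmatch [regex]::Escape($name) }
    $lines += "{0}`t{1}" -f $mappings[$name], $name
}
Set-Content -Path $hostsFile -Value $lines
```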
Task 5.4.1: Create a DR Plan Group to Update the Host File After the Switchover to the Standby Region
- Give the plan group a simple but descriptive Group name.
- Select a position where the plan group will be inserted into the DR plan. In this case, we are going to insert our user-defined plan group after the built-in plan group that launches the replicated version of the EPM System application node, which also performs the function of the DR control node at region 2.
- Select the built-in Launch Compute Instances plan group.
- Click Add step to open the dialog where we will specify the script to update the host file.
Figure 5.4.1: Parameters to create the plan group to update the host file
Task 5.4.2: Provide Step Name and Local Script Parameters for Host File Update Script
The Add plan group step dialog allows us to specify parameters about what this one step will perform and how it will behave during recovery. The host_switch_failover.ps1 script updates the host file on the compute node so that the new IP addresses for the compute and database instances in region 2 are mapped to the original region 1 hostnames. This will allow the application to be started without any further modifications in the application layer.
This step is the same as Task 5.3.2 except for the items shown in Figure 5.4.2 below.
- A descriptive Step name explaining what task this step performs.
- Paste in the absolute path to PowerShell.exe and to where you installed the host_switch_failover.ps1 script on the EPM App node.
- Click Add step to add this step to the plan group.
Figure 5.4.2: Parameters to update host file
Task 5.4.3: Provide Step Name and Local Script Parameters for EPM System Service Start Script
The Add plan group step dialog allows us to specify parameters about what this one step will perform and how it will behave during recovery. In this case, EPM system services will start at region 2.
- A descriptive Step name explaining what task this step performs.
- Paste in the absolute path to PowerShell.exe and to where you installed the start_services.ps1 script on the EPM App node.
- Click Add step to add this step to the plan group.
- Click Add to add the plan group, which now contains two steps to execute two custom scripts.
Figure 5.4.3: Parameters to start EPM
The switchover plan should now include both DR Plan Groups as shown in the following screenshot.
Figure 5.4.4: Switchover plan showing the user-defined plan groups
Task 6: Customize the Failover Plan in Region 2 (Newport)
This task explains how to add custom, user-defined DR Plan Groups and steps to manage the tasks that need to be accomplished during a failover for the EPM System at region 2 during an actual outage or loss of access to region 1. These steps will be a subset of the same steps that were just added to the switchover plan in Task 5 above. However, only steps that are executed at standby region 2 will be added to the failover plan since it is assumed region 1 is completely inaccessible during a failover.
Task 6.1: Select the Failover Plan
Begin by navigating to the failover plan created in Task 4.
- Ensure standby region 2 is still the current region context in the console.
- Select the failover plan.
Figure 6-1: How to begin customizing the failover plan in region 2
Task 6.1.2: Add Steps to the New User-Defined Plan Group
- Click Add Group.
Figure 6.1.2: Parameters to create the plan group to start EPM
- Follow the instructions from Task 5.4 to add two steps to the user-defined plan group to execute the custom scripts host_switch_failover.ps1 and start_services.ps1.
- After adding the steps and the user-defined plan group, your failover plan should look like this:
Figure 6.1.3: Parameters to create the plan steps to start EPM and update hosts
Task 7: Execute the Switchover Plan in Region 2 (Newport)
Both switchover and failover DR plans have been completed in the standby region 2 (Newport). The DR plans in region 2 allow OCI Full Stack DR to transition workloads from region 1 to region 2. The next task is to create switchover and failover plans in the protection group for region 1 (London) so OCI Full Stack DR can transition workloads from region 2 back to region 1.
However, DR plans can only be created and modified in the protection group with the standby role. The DR protection group in region 1 is currently the primary, which means DR plans cannot be created in region 1.
Therefore, we need to reverse the roles of the protection groups so region 1 is the standby and region 2 is the primary. Execute the switchover plan that was just created to transition the workload from region 1 (London) to region 2 (Newport).
Task 7.1: Begin Plan Execution
Execute the DR plan to begin the process of transitioning the EPM System workload from region 1 to region 2.
- Ensure the region context is still set to standby region 2 (Newport).
- Use the breadcrumbs at the top of the console to ensure the DR protection group details page is the current context.
- Ensure the correct DR protection group in region 2 is selected; it should be the standby role.
- Before proceeding, ensure both the failover and switchover plans exist; if not, go back to the previous tasks to create both DR plans.
- Click Execute DR plan.
Figure 7-1: Showing how to execute a switchover to standby region
Task 7.2: Select Switchover Plan and Execute
This task executes the switchover plan in region 2.
- Select the switchover plan.
- Select Enable prechecks.
- Click Execute DR plan to begin.
Figure 7.2: Choose and execute the switchover plan
Task 7.3: Next Steps
Monitor the switchover plan until the EPM System workload has been fully transitioned from region 1 to region 2. OCI Full Stack DR will clean up artifacts and change the primary and standby roles between the regions. If the switchover plan execution fails, review the logs, fix the underlying issue, and rerun the plan until it completes successfully.
Once Full Stack DR has completed the switchover, region 2 (Newport) will become the primary region, and region 1 (London) will be the standby region.
Task 8: Create Basic DR Plans in Region 1 (London)
Create the same basic switchover and failover plans in the DR Protection Group for region 1 (London), which is now the standby peer.
Each plan aims to transition the workload from region 2 to region 1 whenever region 2 is the primary peer. As part of any DR operation, the roles of the DR protection groups in both regions are automatically reversed, so the DR protection group in region 2 will become the standby, and the DR protection group in region 1 will become the primary after a failover or switchover.
OCI Full Stack DR will pre-populate both plans with built-in steps based on the member resources added in the previous task. In later steps, the plans will be customized to handle all tasks related to the EPM System during a recovery operation.
The switchover plans are always created in the protection group with the standby role; region 1 is currently the standby protection group after executing the switchover plan in Task 7.
Task 8.1: Create DR Plans
Create a basic plan by selecting the DRPG in region 1 as shown in Figure 8.1.
- Ensure the OCI region context is region 1 (London).
- Select the standby DRPG in region 1.
- Select Plans.
- Click Create Plan to begin the process.
- Make the name of the switchover plan simple but meaningful. The name should be as short as possible but easy to understand at a glance to help reduce confusion and human error during a crisis.
- Select Plan type as Switchover (planned). There are only four plan types at the time of this writing.
- Click Create to create a basic switchover plan pre-populated with basic built-in steps.
Figure 8.1: The parameters needed to create DR switchover plan
Task 8.2: Create a Failover Plan
Follow the same process to create a basic failover plan as shown in Figure 8.2.
- Make the Name of the failover plan simple but meaningful. The name should be as short as possible but easy to understand at a glance to help reduce confusion and human error during a crisis.
- Select Plan type as Failover (unplanned). There are four plan types at the time of this writing.
- Click Create to create a basic failover plan prepopulated with basic built-in steps.
Figure 8.2: The parameters needed to create DR failover plan
The standby DR Protection Group in region 1 should now have the two DR plans as shown below. These will handle transitioning workloads from region 2 back to region 1.
Figure 8.3: Showing the two basic DR plans that must exist in region 1 before proceeding any further
Task 9: Customize the Switchover Plan in Region 1 (London)
Everything about this task is almost exactly the same as what we did in Task 5 for region 2, except this is being done in region 1.
The basic DR plans created in Task 8 contain prepopulated steps for recovery tasks that are built into OCI Full Stack DR and do not contain anything to manage recovery tasks specific to the EPM System application. This step explains how to add custom, user-defined DR Plan Groups and steps to manage the tasks that need to be accomplished during a switchover for the EPM System:
- Stop EPM System services at the current primary region 2 before stopping any VMs.
- Update the host files on the compute node to map standby IP addresses to primary region hostnames.
- Start EPM System services at the current standby region 1 after launching any VMs.
Task 9.1: Select the Switchover Plan
Navigate to the switchover plan created in the previous task.
Figure 9-1: How to begin customizing the switchover plan in region 1
Task 9.2: (Optional) Enable DR Plan Groups that Terminate Artifacts
These are the same steps performed for region 2 in an earlier task; the same process needs to be followed for region 1.
Two plan groups are disabled by default in switchover plans as shown in the screenshot below. They are disabled to provide a level of comfort during testing that nothing is actually being deleted, and you still have a viable copy of the artifacts as a backup in case something goes wrong during testing.
However, these two plan groups terminate (delete) artifacts that will never be used again as part of any DR operation in the future. The artifacts will simply continue to accumulate over time as you switch back and forth between the two regions, causing confusion about which compute instances and volume groups are the ones that should actually be active.
These plan groups should be enabled once OCI Full Stack DR goes into production. Any artifacts that were left in place during testing switchovers and switchbacks while these two plan groups were disabled should be terminated and cleaned up before going into production to reduce confusion and the likelihood of human error during normal operations.
Optionally, these plan groups can be enabled now to avoid having to manually clean up the superfluous artifacts before going into production.
Figure 9-2: Plan groups disabled by default
Here is what the disabled plan groups do when they are enabled:
- Terminate compute instances: this plan group terminates artifacts of compute instances that are left behind at region 2 after the replicated versions of the VMs have been launched at region 1 during the OCI Block Storage operation that reverses the replication from region 1 back to region 2 as part of the switchover. The leftover VMs are not used during a switchback because the operation to reverse block volume replication creates all new VMs in completely new block volume groups.
- Terminate volume groups: this plan group terminates artifacts of block volume groups (VGs) that are left behind at region 2 after the replicated versions of the VGs have been activated at region 1 and volume group replication has been reversed during the switchover. The leftover block volume groups are never used again, not even as part of a switchover from region 1 back to region 2.
Task 9.2.1: Enable Terminate Compute Plan Group
Enable the plan group.
- Select Enable all steps from the context menu to the right of the plan group name.
Figure 9.2.1: How to enable terminate compute instances
Task 9.2.2: Enable Terminate Volume Groups Plan Group
Enable the plan group.
- Select Enable all steps from the context menu to the right of the plan group name.
Figure 9.2.2: How to enable terminate volume groups
Task 9.3: Create Plan Group to Execute Custom Scripts at Region 2 (Primary)
Begin adding custom, user-defined DR Plan Groups.
The first user-defined plan group will execute custom scripts to stop EPM System services running at the primary region 2. This plan group will contain a single step that calls the Windows PowerShell script stop_services.ps1, which was downloaded to the c:/scripts folder on the DR Control Node in Task 1.2.
Task 9.3.1: Select Add Plan Group
Begin the process to add a plan group.
- Click Add group to begin.
- Give the plan group a simple but descriptive name.
- Select a position where the plan group will be inserted into the DR plan. In this case, we are going to insert our user-defined plan group before the built-in plan group that stops the VMs at region 2.
- Select the built-in Stop Compute Instances (Primary) plan group.
- Click Add step to open the dialog where we will specify the script to stop EPM System.
Figure 9.3.1: Parameters to create plan group and add step to stop EPM system services
Task 9.3.2: Provide Step Name and Local Script Parameters
The Add plan group step dialog allows us to specify parameters about what this one step will perform and how it will behave during recovery. In this case, it will stop EPM System services in region 2.
We will explain all fields in this dialog, but leave out this detail in all the remaining screenshots in subsequent steps since we are just performing the same process repeatedly.
- A descriptive Step name explaining what task this step performs.
- The DR plan should stop if the script fails to stop EPM Services. This will allow anyone to see there is a problem and fix it. Full Stack DR provides the opportunity to continue running the switchover plan after fixing the problem.
- The default value before OCI Full Stack DR declares a failure is one hour. This value can be changed to 30 minutes or whatever is felt to be a more realistic timeout value.
- Always select the region where the DR Control Node is running right now, not where it will be running during a switchover. OCI Full Stack DR will keep track of where the VM is running, so you just need to specify where it is right now. In this case, the DR Control Node is running in region 2 (Newport).
- Select Run local script to inform Full Stack DR that the script will be found on a compute instance. The Windows PowerShell scripts were downloaded to the DR Control Node in Task 1.2.
- Select the correct compartment that contains the DR Control Node. Then select the compute instance that was designated as the DR Control Node, in this example it is EPM system application compute.
- Paste in the absolute path where you installed the stop_services.ps1 script on the DR Control Node. Add stop as the first parameter and the OCI region ID as the second.
- Click Add step to add this step to the plan group.
Figure 9.3.2: Parameters to create the plan group and add the step to stop EPM System services
Task 9.3.3: Complete Adding Plan Group and Step
- The step to stop EPM System is now added to the DR plan group, as shown in Figure 9.3.3.
Figure 9.3.3: Parameters to create plan group and add step to stop EPM System services
- This shows the plan step that was just added. It is possible to add additional steps to a DR plan group, but this plan group will only include the step to stop EPM Services.
- Click Add to add the DR plan group and step to the DR plan.
Figure 9.3.4: Parameters to create plan group and group added to stop EPM System services
Task 9.4: Create Plan Group to Execute Custom Scripts at Region 1 (Standby)
The second user-defined plan group will update host files on compute nodes and start EPM System services after the DR Control Node is launched at the standby region 1. This plan group will contain two steps that call the host_switch_failback.ps1 and start_services.ps1 PowerShell scripts that were downloaded to the DR Control Node in Task 1.2. The host_switch_failback.ps1 script reverses the changes introduced by the host_switch_failover.ps1 script in the Newport region and restores the original host files on the compute nodes after they have been moved back to the original primary region, London.
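The actual host_switch_failback.ps1 comes from the GitHub repository referenced in Task 1.2; one hedged sketch of a failback approach is to restore a backup of the original region 1 hosts file, as below, where the backup path is an assumption.

```powershell
# Hedged illustration only, not the repository script. Restores the hosts file
# that was saved (by assumption, to c:\scripts) before the failover changes.
$hostsFile  = "C:\Windows\System32\drivers\etc\hosts"
$backupFile = "c:\scripts\hosts.region1.bak"

if (Test-Path $backupFile) {
    Copy-Item -Path $backupFile -Destination $hostsFile -Force
    Write-Output "Original region 1 hosts file restored."
} else {
    Write-Warning "Backup hosts file not found; update $hostsFile manually."
}
```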
Task 9.4.1: Create a DR Plan Group to Update the Host File After the Switchover to the Standby Region
- Give the plan group a simple but descriptive name.
- Select a position where the plan group will be inserted into the DR plan. In this case, we are going to insert our user-defined plan group after the built-in plan group that launches the replicated version of EPM System application node, which also performs the function of the DR control node at region 1.
- Select the built-in Launch Compute Instances (Standby) plan group.
- Click Add step to open the dialog where we will specify the script to update the host file.
Figure 9.4.1: Parameters to create plan group and group added to update host file
Task 9.4.2: Provide Step Name and Local Script Parameters for the Host File Update Script
The Add plan group step dialog allows us to specify parameters about what this one step will perform and how it will behave during recovery. The host_switch_failback.ps1 script updates the host file on the compute node. It reverses the changes introduced by the host_switch_failover.ps1 script in the Newport region and restores the original host file for region 1 (London). This will allow the application to start without any further modifications in the application layer.
This step is the same as Task 9.3.2 except for the items shown in Figure 9.4.2.
- A descriptive name explaining what task this step performs.
- Paste in the absolute path to PowerShell.exe and to where you installed the host_switch_failback.ps1 script on the DR Control Node.
- Click Add step to add this step to the plan group.
Figure 9.4.2: Parameters to create plan group and step details added to update host file
Task 9.4.3: Provide Step Name and Local Script Parameters for EPM System Service Start Script
The Add plan group step dialog allows us to specify parameters about what this one step will perform and how it will behave during recovery. In this case, EPM System services will start at region 1.
- A descriptive name explaining what task this step performs.
- Paste in the absolute path to PowerShell.exe and to where you installed the start_services.ps1 script on the DR Control Node.
- Click Add step to add this step to the plan group.
- Click Add to add the plan group, which now contains two steps to execute two custom scripts.
The switchover plan should now include both DR Plan Groups as shown in the following screenshot.
Figure 9.4.3: Switchover plan showing the user-defined plan groups
Task 10: Customize the Failover Plan in Region 1 (London)
This task explains how to add custom, user-defined DR Plan Groups and steps to manage the tasks that need to be accomplished during a failover for the EPM System at region 1 during an actual outage or loss of access to region 2. These will be a subset of the same steps that were just added to the switchover plan in Task 9 above. However, only steps that are executed at standby region 1 will be added to the failover plan since it is assumed region 2 is completely inaccessible during a failover.
Task 10.1: Add User Defined Plan Group to Failover Plan
Navigate to the failover plan created in Task 8.
Figure 10.1: Failover plan in Region 1
Task 10.1.1: Add Plan Group
- Ensure standby region 1 is still the current region context in the console.
- Select the failover plan.
- Click Add group.
- Specify the name of the group.
- Add it to the plan after the built-in Launch Compute Instances plan group.
Figure 10.2: Parameters to create the plan group to execute custom scripts after failover to region 1.
Task 10.1.2: Add Steps to the New User-Defined Plan Group
- Follow the instructions from Task 9.4 to add a step to the user-defined plan group to execute the custom script host_switch_failback.ps1.
Figure 10.3: Parameters to create the plan group step for the script updating the host file.
- Add a second step in the plan group to start the services using the start_services.ps1 script.
Figure 10.4: Parameters to create the plan group step for the script starting the services.
- After adding the steps, the user-defined plan group should look like this. Click Add.
Figure 10.5: Plan group showing configured steps to execute two local scripts after the compute instance launch.
- The failover plan should now include the user-defined DR Plan Group for the EPM System, as shown in the following screenshot. You may have additional plan groups if your protection group includes other applications or OCI services along with the EPM System.
Figure 10.6: Failover plan showing user-defined plan groups
Next Steps
OCI Full Stack DR for the EPM System should be fully implemented at this point. However, full functionality should be validated before using it in production. All failover and switchover plans should be executed to validate that everything works as expected and the recovery team fully understands the entire process.
Testing Switchover Plans
Switchover plans are designed to clean up all artifacts and ensure the roles of resources with built-in recovery steps, such as load balancer, block storage, file systems, BaseDB, ExaCS, and Autonomous Database, are ready for recovery from the standby region without human intervention.
Testing Failover Plans
Failovers are different. Failovers by their very nature cannot clean up artifacts or ensure services and databases at the failed region are ready to transition workloads back to region 1. The recovery team needs to understand and perform tasks to ensure Oracle Data Guard is in the correct state, artifacts for storage and compute instances have been terminated, and so on. For more information, see Resetting DR Configuration After a Failover.
Validate All DR Plans for Final Acceptance
The recovery team needs to perform a final validation to demonstrate the readiness of OCI Full Stack DR protection groups and plans for production workloads. Region 2 (Newport) should be the primary region at this point in the process. Begin the final validation of all plans by completing the following steps:
- Test switchover from region 2 (primary) back to region 1 (standby).
- Test failover from region 1 (primary) to region 2 (standby).
- Prepare region 1 (primary) for failover from region 2.
- Test failover from region 2 (primary) to region 1 (standby).
- Prepare region 2 (primary) for either a failover or switchover to region 2.
- Optionally, you can also create and test Start Drill and Stop Drill plans based on your requirements.
The DR protection groups and application stack should be in a normal operational state and ready for a failover or switchover at this point.
Related Links
- Design the infrastructure to deploy Oracle Enterprise Performance Management in the cloud
- Deployment Options Guide: Disaster Recovery for Hyperion EPM System
- Database Considerations: Disaster Recovery for Fusion Middleware
- Sample scripts location 1: Download EPM sample scripts from GitHub
- Sample scripts location 2: Download EPM sample scripts from GitHub
- Oracle Cloud Infrastructure (OCI) Full Stack Disaster Recovery
- OCI Full Stack Disaster Recovery - User defined group scripts
Acknowledgments
- Author - Grzegorz Reizer (Oracle EPM Specialist)
- Contributor - Suraj Ramesh (Product Manager for OCI Full Stack DR)
More Learning Resources
Explore other labs on docs.oracle.com/learn or access more free learning content on the Oracle Learning YouTube channel. Additionally, visit education.oracle.com/learning-explorer to become an Oracle Learning Explorer.
For product documentation, visit Oracle Help Center.
Automate Recovery for Oracle Enterprise Performance Management using OCI Full Stack Disaster Recovery
G11330-01
July 2024