Note:
- This tutorial requires access to Oracle Cloud. To sign up for a free account, see Get started with Oracle Cloud Infrastructure Free Tier.
- It uses example values for Oracle Cloud Infrastructure credentials, tenancy, and compartments. When completing your lab, substitute these values with ones specific to your cloud environment.
Enhanced Disaster Recovery Plan Management with OCI Full Stack Disaster Recovery
Introduction
Oracle Cloud Infrastructure Full Stack Disaster Recovery (OCI Full Stack DR) orchestrates the transition of compute, database, and applications between Oracle Cloud Infrastructure (OCI) regions from around the globe with a single click. Customers can automate the steps needed to recover one or more business systems without redesigning or re-architecting existing infrastructure, databases, or applications and without needing specialized management or conversion servers.
The recent update to the OCI Full Stack DR service has significantly improved the management of DR plans. Plans are now preserved instead of being deleted when members are updated, added, or removed, allowing users to refresh the plans and verify them. Let us explore how these changes enhance the user experience and simplify DR management.
Initial Deployment Architecture
- 2 x Moving compute on the primary region (vmapp01 and vmapp02).
- 1 x Volume group in primary region containing boot volumes for vmapp01 and vmapp02.
Target Deployment Architecture
- 1 x Moving instance on the primary region (vmapp01).
- 1 x Non-moving instance on the primary region (vmapp03).
- 1 x Non-moving instance on the standby region (vmapp03dr).
- 1 x Volume group in primary region containing boot volume for vmapp01 only.
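The difference between the initial and target architectures can be expressed as a simple set comparison. The following is a minimal sketch in plain Python: the dictionary layout and the `membership_changes` helper are hypothetical illustrations and are not part of any OCI API. It also assumes the standby region holds no compute instances in the initial architecture, since none are listed.

```python
# Hypothetical model of the compute instances in each architecture, keyed by region role.
# Assumption: the standby region starts with no compute instances.
initial = {"primary": {"vmapp01", "vmapp02"}, "standby": set()}
target = {"primary": {"vmapp01", "vmapp03"}, "standby": {"vmapp03dr"}}


def membership_changes(before, after):
    """Return, per region role, which instances are removed and which are added."""
    changes = {}
    for region in sorted(set(before) | set(after)):
        old = before.get(region, set())
        new = after.get(region, set())
        changes[region] = {"removed": sorted(old - new), "added": sorted(new - old)}
    return changes


changes = membership_changes(initial, target)
for region, delta in changes.items():
    print(region, delta)
```

The result lists vmapp02 as removed and vmapp03 as added in the primary region, and vmapp03dr as added in the standby region, which matches the membership changes made in Tasks 1 and 2.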
Objectives
Modify existing Full Stack DR Protection Group member resources without deleting any existing DR plans. This tutorial demonstrates the plan refresh workflow by removing one moving compute instance and adding two non-moving compute instances (one in each region) to existing primary and standby DR Protection Groups that are already peers between two OCI regions.
The primary region is Ashburn and the standby region is Phoenix.
The following tasks will be covered in this tutorial.
- Task 1: Remove members from primary DRPG.
- Task 2: Add new members to primary and standby DRPG.
- Task 3: Refresh plans in standby DRPG.
- Task 4: Verify plans in standby DRPG.
- Task 5: Make final adjustments to plans in standby DRPG.
- Task 6: Execute switchover plan in standby DRPG.
- Task 7: Refresh and verify DR plans after switchover.
Prerequisites
- This tutorial assumes the DR Protection Groups (DRPG) already exist, and you have existing DR plans in both regions.
- This tutorial assumes the reader has administrator privileges and the required Oracle Cloud Infrastructure Identity and Access Management (OCI IAM) policies for OCI Full Stack DR are already in place. For more information, see Configuring Identity and Access Management (IAM) policies to use Full Stack DR and Policies for Full Stack Disaster Recovery.
- The boot volume for the moving compute instance (vmapp02) being removed in this tutorial has already been removed from the existing volume group (vgapp01). DR plan updates will fail if the boot volume for vmapp02 is still contained in vgapp01. For more information, see Removing Volumes from a Group.
- One new compute instance already exists in the primary region and OCI Full Stack DR is able to run commands on the guest OS. For more information, see Running Commands on an Instance.
- One new compute instance already exists in the standby region and OCI Full Stack DR is able to run commands on the guest OS. For more information, see Running Commands on an Instance.
Note: The two new compute instances (one in each region) will be added as non-moving compute. This means their boot volumes do not need to be added to a volume group, do not need to be replicated, and are not added as members of the DRPG in either region.
Task 1: Remove Members from the Primary DRPG
- In the primary DRPG (DRPG_Refresh_IAD), select Members.
- Select compute VM (vmapp02) and click Remove members.
- Select I understand that I must refresh and verify all the existing plans and click Remove.
Task 2: Add New Members to Primary and Standby DRPG
- In the primary DRPG (DRPG_Refresh_IAD), select Members and add the compute VM (vmapp03) as a member.
- In the standby DRPG (DRPG_Refresh_PHX), select Members and add the compute VM (vmapp03dr) as a member.
All DR plans in the standby DR Protection Group (DRPG) are set to Needs Attention (Needs refresh) whenever changes are made to members of the primary or standby DRPG. From that point on, DR plans in both the standby and primary regions cannot be modified. Additional changes to DRPG membership can still be made in either region, but plan groups and steps cannot be added, removed, or modified until the refresh and verify workflow has been completed.
You should see something like the following screenshot after completing any changes to members in a protection group. This screenshot shows three of the four DR plan types that should exist in the standby protection group as a best practice. You may or may not have created all three plan types; this is simply an example.
Task 3: Refresh DR Plans in Standby DRPG
Refresh the DR plans that are in a Needs Attention (Needs Refresh) state to see the plan groups and plan steps that will be added or removed as a result of the changes made to members of the protection groups in both regions. This is a critical step that allows you to visually review the DR plans before committing to the planned changes as part of Task 4.
Only DR plans contained in the standby DRPG can be refreshed and verified, since they are in a Needs Attention (Needs refresh) state. The DR plans in the primary DRPG, which are in an Inactive state, cannot be refreshed until that DRPG takes on the standby role. Manually switching the roles in the DR Protection Group detail page will not work for the refresh process, so the only valid way of changing the role of the primary DRPG to standby is to execute a switchover plan in the standby DRPG. The switchover is explained in Task 6.
The purpose of the refresh is to give people a chance to review everything that will be added or removed from the DR plans before committing the changes. The plan groups and steps impacted by the membership changes will be tagged after the plan refresh is completed. The following list shows the various tags that call out the modified plan groups and steps.
- Group modified: Some steps have been added to or removed from the group.
- Group added: A new group has been added.
- Group deleted: An existing group will be deleted after verification.
- Step added: A new step has been added.
- Step deleted: An existing step will be deleted after verification.
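The tagging rules in the list above can be illustrated by comparing a plan before and after a refresh. This is a minimal sketch in plain Python, not how the service implements refresh: the `tag_plan_changes` helper, the dictionary data model, and the group and step names are all hypothetical. It also assumes that steps inside a newly added group are not tagged individually.

```python
def tag_plan_changes(current, refreshed):
    """Compare two plan layouts and return (group, step, tag) tuples.

    Both arguments map a plan group name to its ordered list of step names.
    This data model is a hypothetical simplification for illustration.
    """
    tags = []
    # Groups that exist today but will disappear after verification.
    for group in current:
        if group not in refreshed:
            tags.append((group, None, "Group deleted"))
    for group, steps in refreshed.items():
        if group not in current:
            # Assumption: steps of a brand-new group are not tagged one by one.
            tags.append((group, None, "Group added"))
            continue
        added = [s for s in steps if s not in current[group]]
        deleted = [s for s in current[group] if s not in steps]
        if added or deleted:
            tags.append((group, None, "Group modified"))
        tags.extend((group, s, "Step added") for s in added)
        tags.extend((group, s, "Step deleted") for s in deleted)
    return tags


# Hypothetical group and step names, loosely based on this tutorial's example.
current_plan = {"moving-compute": ["handle vmapp01", "handle vmapp02"]}
refreshed_plan = {
    "moving-compute": ["handle vmapp01"],
    "non-moving-compute": ["handle vmapp03dr"],
}

plan_tags = tag_plan_changes(current_plan, refreshed_plan)
for group, step, tag in plan_tags:
    print(f"{tag}: {group}" + (f" / {step}" if step else ""))
```

In this example the existing group is tagged as modified because the vmapp02 step is deleted, and the group for the new non-moving instance is tagged as added.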
Follow the steps:
- To begin, select a DR plan that is in a Needs Attention (Needs refresh) state.
- Click Refresh as shown in the following screenshot.
- A confirmation box will pop up. Click Refresh in the confirmation box to continue.
The DR plans will look something like the screenshot below once the refresh is completed. The refresh process introspects all the changes that were made to member resources in both regions, then modifies the plan groups and steps to show what adjustments will be made based on the changes in membership. The state for refreshed DR plans will change to Needs Attention (Needs verification) once the refresh is completed. Notice in the screenshot below the label of the Refresh button has changed to Verify.
Expanding all the plan groups as shown in the screenshot below will reveal all the individual plan steps that will be added or removed as part of the verify task. The updated plan groups and corresponding steps are temporarily labelled using the tags in the list above.
Refresh and visually review all remaining DR plans in the standby DRPG that are in a Needs Attention (Needs refresh) state, then move to the next task.
Task 4: Verify DR Plans in Standby DRPG
Verify the refreshed DR plans after visually reviewing them. This is another critical step that commits the planned changes in modified DR plans.
- To begin, select any plan in a Needs Attention (Needs verification) state.
- Click Verify as shown in the following screenshot.
- A confirmation box will pop up. Click Verify in the confirmation box to continue.
The verify process removes all modification tags from the plan and enables the Run prechecks and Execute plan buttons as shown in the following screenshot. The state of the plan will change to Active after verification has completed.
Verify all remaining DR plans in the standby DRPG that are in a Needs Attention (Needs verification) state until all plans have been changed to Active, then move to the next task.
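The refresh and verify workflow described in Tasks 3 and 4 behaves like a small state machine. The sketch below models it in plain Python under stated assumptions: the `DrPlan` class and its methods are hypothetical and do not represent the OCI SDK or console; only the state labels and the rule that plans can be refreshed and verified only in the standby DRPG come from this tutorial.

```python
NEEDS_REFRESH = "Needs Attention (Needs refresh)"
NEEDS_VERIFICATION = "Needs Attention (Needs verification)"
ACTIVE = "Active"


class DrPlan:
    """Hypothetical model of a DR plan's refresh and verify lifecycle."""

    def __init__(self, name, group_role, state=NEEDS_REFRESH):
        self.name = name
        self.group_role = group_role  # "standby" or "primary"
        self.state = state

    def _require_standby(self):
        # Plans in a DRPG with the primary role cannot be refreshed or verified.
        if self.group_role != "standby":
            raise PermissionError(f"{self.name}: only standby DRPG plans can be changed")

    def refresh(self):
        self._require_standby()
        if self.state != NEEDS_REFRESH:
            raise ValueError(f"{self.name}: nothing to refresh in state {self.state}")
        self.state = NEEDS_VERIFICATION

    def verify(self):
        self._require_standby()
        if self.state != NEEDS_VERIFICATION:
            raise ValueError(f"{self.name}: refresh the plan before verifying it")
        self.state = ACTIVE

    def can_execute(self):
        # Prechecks and plan execution are only enabled for Active plans.
        return self.state == ACTIVE


plan = DrPlan("switchover-plan", "standby")
plan.refresh()
plan.verify()
print(plan.name, plan.state, plan.can_execute())
```

Calling `refresh()` on a plan whose `group_role` is `"primary"` raises an error in this model, which mirrors the reason a switchover is needed before the plans in the other region can be refreshed.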
Task 5: Make Final Adjustments to Plans in Standby DRPG
The example DR plans shown in this tutorial do not have any user-defined plan groups or steps. However, you may want to experiment with adding a user-defined plan group and steps if none exist.
If you are using this tutorial to update existing DR Protection Groups and plans in your tenancy, then use this opportunity to make appropriate changes to the refreshed DR plans. The following list shows a few examples of things you might want to adjust in existing plans:
- New groups may have been added if completely new resource types were added as members of the DR Protection Group in either region. Ensure that the groups are in the correct order.
- You may need to create new user-defined plan groups and steps for something completely new.
- You may need to add new steps to existing user-defined plan groups.
- You may need to reorder existing plan groups to improve or fix the order of operations.
Ensure all existing DR plans have been adjusted before moving on to the next task.
Task 6: Execute Switchover Plan in Standby DRPG
Note:
The DR plans in the standby region should all be active at this point, which means OCI Full Stack DR can execute the active Failover, Switchover and DR Drill plans even if a catastrophic event causes an outage at the primary region. Switchovers are disruptive and require an outage. Therefore, this task can be performed at a later point in time when an outage can be scheduled to execute the switchover plan in the current standby region.
If you cannot complete this step now, do not forget to complete this task at some point in the future.
Run the prechecks for the switchover plan you just refreshed and verified in the current standby region, then execute the switchover plan if the prechecks succeed. Once the switchover has completed successfully, step through Tasks 3 and 4 for all DR plans contained in the peer DR Protection Group in the other region.
The DR plans in the primary region will still be in an Inactive (Needs refresh) state and will also need to be refreshed. However, DR plans contained in protection groups with the primary role cannot be modified, which includes refreshing and verifying them. You will need to transition the workload to the current standby region to complete the full DR plan refresh lifecycle and ensure the integrity of your disaster recovery plans.
Execute prechecks as an independent operation first as a best practice.
- To begin, open the switchover plan in the standby region.
- Click Run Prechecks.
- A confirmation box will pop up. Click Run Prechecks in the confirmation box to continue.
Ensure the prechecks complete successfully as shown in the screenshot below. You may need to remediate any failed precheck steps at this point and then run the precheck again until all steps succeed.
Execute the Switchover plan.
- To begin, click Execute plan.
- A confirmation box will pop up. Click Execute plan in the confirmation box to continue.
- Monitor the plan execution to ensure all steps in the plan succeed.
The following screenshot shows the successful completion of the switchover plan. However, steps can still fail during the actual execution even though the prechecks completed successfully, because the recovery steps are now being performed for real rather than only checked. Remediate any failed steps and try again.
Task 7: Refresh and Verify DR Plans After Switchover
The roles of the DR Protection Groups will be reversed automatically once the switchover is complete. Continuing our example, Phoenix will now have the primary role and Ashburn will have the standby role.
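The precheck gate and the role reversal can be sketched as follows. This is a simplified illustration in plain Python, not the service's behavior in detail: the `switchover` function, the role dictionary, and the precheck step name are hypothetical.

```python
def switchover(roles, precheck_results):
    """Swap primary and standby roles, but only if every precheck step succeeded.

    roles maps a region name to "primary" or "standby".
    precheck_results maps a precheck step name to True (passed) or False (failed).
    """
    failed = [step for step, ok in precheck_results.items() if not ok]
    if failed:
        raise RuntimeError(f"remediate failed prechecks first: {failed}")
    swap = {"primary": "standby", "standby": "primary"}
    return {region: swap[role] for region, role in roles.items()}


# Roles before the switchover, as used in this tutorial.
roles = {"Ashburn": "primary", "Phoenix": "standby"}
# Hypothetical precheck step name; a real plan reports its own steps.
new_roles = switchover(roles, {"example precheck step": True})
print(new_roles)  # {'Ashburn': 'standby', 'Phoenix': 'primary'}
```

After the swap, Ashburn holds the standby role, so its plans are the ones that can now be refreshed and verified.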
At this point, all the DR plans in Ashburn will now be in an Inactive (Needs refresh) state since it is now the standby DRPG. You will need to repeat the following tasks in the new standby region:
- Task 3: Refresh plans in standby DRPG.
- Task 4: Verify plans in standby DRPG.
- Task 5: Make final adjustments to plans in standby DRPG.
Next Steps
There are two best practices that should be incorporated into the normal day-to-day operations to help ensure the readiness of your DR plans.
- Regular periodic execution of prechecks.
- Regular periodic execution of DR Drills.
Think about scheduling weekly prechecks of all DR plans in the standby DR Protection Group. Prechecks can be run at any time and have zero impact on production workloads. This will help ensure integrity of your DR plans, catching missing member resources, missing networks, the inability to find expected scripts called by user-defined steps, etc.
Another very important way of validating the readiness of your disaster recovery is to schedule periodic DR Drills once a month or quarter. DR Drills also have zero impact on production workloads, but give you the ability to validate recovery of compute, storage, Oracle databases and backend sets for load balancers in the standby region with the click of a single button. Learn more about Full Stack DR Drills.
Related Links
- Oracle Cloud Infrastructure (OCI) Full Stack Disaster Recovery
- Join #full-stack-dr slack channel
Acknowledgments
- Author - Raphael Teixeira (Principal member of technical staff for Full Stack DR engineering)
G23211-01
December 2024