Implement OCI Block Volumes Replication

This implementation uses the OCI Block Volumes cross-region replication feature to replicate the block volumes.

The following are advantages of implementing OCI Block Volumes replication:

  • There is no need to create and run scripts periodically, as other replication methods require. After you set up the replication, Oracle Cloud Infrastructure performs it automatically.
  • It is a general-purpose solution applicable to any block volume of any compute instance (except for the boot volumes). If you have multiple systems, then you can use the same approach in all of them.
  • The information on the replicated block volumes is an exact copy of the primary block volumes; all the files in the block volume are replicated.

Consider the following before using OCI Block Volumes replication:

  • It requires extra steps to mount the replicated block volumes in the secondary system. You can’t mount the block volume replicas directly; you must first activate them, which creates cloned block volumes that can be mounted. This is not complex in systems with few nodes, but the complexity grows with the number of nodes, especially in systems where the nodes are not distributed across the availability domains in the same way in the primary and standby.

    However, you can overcome this complexity using the Oracle Cloud Infrastructure Full Stack Disaster Recovery service to automate these steps in the switchover, failover, and validation operations.

  • This technology alone may not be enough for many systems. If the system uses other types of storage (for example, shared OCI File Storage file systems), then you need a different replication technology for them.

Set Up Replication for OCI Block Volumes

To implement OCI Block Volumes replication, the following steps are required:

  • Use the OCI Console to define the Volume Groups in the primary region, grouping the block volumes that you need to replicate.

    A volume group can contain only block volumes that are in the same availability domain (AD), and all the block volumes in the group are replicated to one destination AD only. If your block volumes are located in more than one AD, then create a Block Volume Group for each combination of source and destination ADs.

  • Enable cross-region replication in the Volume Groups, with the appropriate ADs of the secondary region as destinations.
  • Connect to the mid-tier hosts in the secondary system and unmount the block volumes that the replicas from the primary will replace.
  • Use the OCI Console to detach and discard those original block volumes of the secondary system. They will no longer be used.
  • Implement a way to manage the site-specific information that resides in the block volumes, so that you can update it with the appropriate values after the replica is activated.

This implementation applies to any block volume except boot volumes. Boot volume replication has other implications and is out of the scope of this implementation.

Example 1: Use OCI Block Volumes replication to replicate the mid-tier configuration block volume

Note:

This example applies to any mid-tier system. As a reference, it explains how to replicate the block volumes that contain the Oracle WebLogic configuration of an Oracle WebLogic Server for OCI stack. You can follow the same steps to replicate other block volumes in a mid-tier system, except for the boot volumes.

The following image is an example of an Oracle WebLogic Server system with OCI Block Volumes cross-region replica.

To set up the cross-region replica for the block volumes, follow these steps:

  1. Back up the information that is specific to each site.

    The block volume can contain files with information that is specific to each site, for example, connection strings to databases or to LDAP servers.

    When using block volume replica, the replicated block volumes are an exact copy of the primary block volumes; you can’t skip specific files or folders from the replica. Hence, you must manage these differences by adapting the information on each site. There are various approaches:

    • You can perform a string search and replacement in the files with site-specific information.
    • You can back up this information before the replica and restore it afterward.

    At this point, before enabling the replica, identify and back up any file with site-specific information that resides in the block volumes that are replicated. Make the backup copy in a location that is not under the replicated block volume; otherwise, it will be overwritten.

    Tip:

    Oracle WebLogic Server Example

    For example, when replicating block volumes that contain a WebLogic domain, there are files with information to connect to the database. This information is in the TNS admin folder. Check the tns_admin property in the WebLogic data sources to identify the folder. This document provides scripts to manage this, following the appropriate approach depending on the scenario:

    • If the system connects to an Oracle Base Database Service or Oracle Exadata Database Service, then you can just update the database connection string in the tnsnames.ora file of the secondary mid-tier system during the switchover and failover operations. This document provides an example script for this.
    • If the system connects to an Oracle Autonomous Database, then the TNS admin folder contains more artifacts (a trust store and a keystore). They are different in primary and standby, and they can’t be updated with a simple string replacement. This document provides a script that restores the backup copy of the TNS folder.

    At this point, you only need to perform a backup of the TNS folder information.
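
    A minimal sketch of this backup follows, assuming the mid-tier hosts have GNU tar and coreutils (realpath -m) and that /u01/data is the replicated mount point. The function name backup_site_folder and any paths are hypothetical; this is not one of the scripts that this document references.

```shell
# Hypothetical helper: archive a folder with site-specific files (for example,
# the tns_admin folder) to a location OUTSIDE the replicated block volume.
backup_site_folder() {
  local src_dir="$1" backup_dir="$2" replicated_mount="${3:-/u01/data}"
  local archive

  # Refuse a backup location under the replicated mount point: the replica
  # would overwrite it.
  case "$(realpath -m "$backup_dir")/" in
    "$(realpath -m "$replicated_mount")"/*)
      echo "ERROR: $backup_dir is under the replicated mount $replicated_mount" >&2
      return 1 ;;
  esac

  mkdir -p "$backup_dir" || return 1
  archive="$backup_dir/$(basename "$src_dir")_$(date +%Y%m%d%H%M%S).tar.gz"
  # -C stores relative paths, so the archive can be restored into any parent folder.
  tar -czf "$archive" -C "$(dirname "$src_dir")" "$(basename "$src_dir")" || return 1
  echo "$archive"
}
```

    The function prints the path of the created archive, so you can record it together with the site it belongs to.
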

  2. Identify the Block Volumes of the primary mid-tier hosts.
    1. Go to the OCI Console, select your Primary region, and choose your compartment.
    2. Navigate to Storage, and then Block Volumes. Identify the block volumes and the mount points.
    3. Make a note of the names, the AD where they are located, the host they are attached to, and the mount point.

    Tip:

    Oracle WebLogic Example

    For example, the block volumes that contain the WebLogic domain in the Oracle WebLogic Server for OCI and Oracle SOA Suite on Marketplace stacks are the data block volumes. They are named prefix-data-block-N (where N is the host node number) and are mounted at /u01/data on each host.

    Block Volume in Primary   AD    Host           Mount Point
    prefix-data-block-0       AD1   prefix-wls-0   /u01/data
    prefix-data-block-1       AD2   prefix-wls-1   /u01/data

    You may have additional block volumes to store the Oracle product homes. For example, in an Oracle WebLogic Server for OCI stack, the compute instances also have the prefix-mw-block-N block volumes, mounted at /u01/app.

    If you create the secondary for a primary stack with the WLS-HYDR framework, then the secondary has two redundant OCI File Storage file systems to store the Oracle products instead of block volumes. Hence, the primary stores the products on block volumes and the secondary on OCI File Storage. Optionally, you can also configure ongoing replication for the “mw” block volumes: configure the Block Volume replica for them in the primary, and stop using the product OCI File Storage file systems of the secondary. However, because these volumes contain the Oracle product homes, it is not mandatory to replicate them on an ongoing basis. See "Mid-Tier File Artifacts" for more information.
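
    On each host, you can also read the mount points of the iSCSI-attached block volumes from /etc/fstab. The following sketch assumes, as in the fstab examples in this document, that these entries carry the _netdev option; the function name is hypothetical. It complements, but does not replace, the names and ADs that you note in the OCI Console.

```shell
# Hypothetical helper: print "source mount-point fs-type" for the active
# /etc/fstab entries that use the _netdev option (network/iSCSI devices).
list_netdev_mounts() {
  local fstab="${1:-/etc/fstab}"
  # Skip commented lines; match _netdev as a whole option in field 4.
  awk '!/^[ \t]*#/ && $4 ~ /(^|,)_netdev(,|$)/ { print $1, $2, $3 }' "$fstab"
}
```

    On a WebLogic host of the stacks described above, the output would typically include the UUID of the data block volume and /u01/data.
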

  3. Identify the Block Volumes in the secondary mid-tier hosts.
    Repeat the steps described in the previous step to get the names and Availability Domains (ADs) of the block volumes of the secondary mid-tier hosts.

    Tip:

    Oracle WebLogic Example

    If you create the secondary system with the WLS-HYDR framework, then the host and block volume names can have different suffix numbering than in the primary. The Marketplace stacks use suffixes 0, 1, 2, 3, while a system created with the WLS-HYDR framework uses suffixes 1, 2, 3, 4. Make sure you correctly identify the peer nodes and volumes. For example:

    Block Volume in Secondary   AD    Host           Mount Point
    prefixBV1                   AD1   prefixhost-1   /u01/data
    prefixBV2                   AD2   prefixhost-2   /u01/data
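
    Because of this offset, in the examples of this document a primary node with suffix N pairs with the secondary node with suffix N+1. The following sketch lists the pairs under that assumption; the function name is hypothetical, and you should still confirm each pair against the ADs noted in the OCI Console.

```shell
# Hypothetical helper: list "primary-host secondary-host" pairs, assuming the
# primary uses suffixes 0..count-1 and the WLS-HYDR secondary uses 1..count.
pair_peer_nodes() {
  local primary_prefix="$1" secondary_prefix="$2" count="$3"
  local i=0
  while [ "$i" -lt "$count" ]; do
    printf '%s%d %s%d\n' "$primary_prefix" "$i" "$secondary_prefix" "$((i + 1))"
    i=$((i + 1))
  done
}
```

    With the host names from the tables above, pair_peer_nodes prefix-wls- prefixhost- 2 prints prefix-wls-0 prefixhost-1 and prefix-wls-1 prefixhost-2.
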
  4. Create Block Volume Groups in primary and enable the cross-region replica.
    Create Block Volume Groups in the primary to group all the block volumes that are going to be replicated from a particular AD to a particular AD in the secondary. The replica is enabled at the Volume Group level, so it applies to all the Block Volumes in that group. A volume group can contain only block volumes that are in the same AD, and all the block volumes in the group are replicated to one destination AD only. So, if your compute instances are located in more than one AD, create a Block Volume Group for each combination of source and destination ADs.

    Perform the following steps to create a Block Volume Group and enable the cross-region replica:

    1. Log on to the OCI Console in the primary region.
    2. Navigate to Storage, then Volume Groups.
    3. Create a block volume group.
      For example: prefix-BVGroup-region1AD1-region2AD1
    4. Add the block volumes that you will replicate within the volume group.

      Note:

      Do not add Boot Volumes. They are not replicated.
    5. Enable cross-region replication in the Volume group.
      • Target region: Select the secondary region.
      • Availability domain: Select the AD in the secondary region where the compute instances that will mount the replicated volumes are located.
      • Volume Group Replica Name: Enter the name for the replica Block Volume group. For clarity, use the same name as the Block Volume group in the primary.
    6. Save the changes.
  5. Verify that the replicas are created in the secondary region.
    1. In the OCI Console, select the secondary region.
    2. Navigate to Storage, then click Block Storage, and then Volume Group Replicas.
  6. Repeat the steps to create additional Block Volume Groups if your primary compute instances reside in more than one AD.

    Tip:

    Oracle WebLogic Examples

    When the primary is a Marketplace stack and the secondary is created with WLS-HYDR:

    In Oracle WebLogic Server for OCI and Oracle SOA Suite on Marketplace stacks: if the region has three availability domains, then the stack distributes the compute instances across them. For example, node0 in AD1, node1 in AD2, node2 in AD3, node3 in AD1.

    In a system created with the WLS-HYDR framework: if the region has three availability domains, then you can choose whether to distribute the compute instances across them. If you do, the framework distributes the compute instances across two ADs. For example, node1 in AD1, node2 in AD2, node3 in AD1, node4 in AD2.

    You must define the BV groups so that each one groups the block volumes that are replicated to the same AD in the destination. A volume group can contain only block volumes that are in the same AD, and all the block volumes in the group can be replicated to one destination AD only. If there are mixed combinations (OCI Block Volumes in the same source AD but different destination ADs, and vice versa), then you need to create as many Block Volume Groups as needed to cover all the replica combinations. Here are some example scenarios:

    • Example 2, Two nodes, only 1 AD in primary and secondary
      • primary region: node0 in AD1, node1 in AD1
      • secondary region: node1 in AD1, node2 in AD1

      Solution:

      1 Volume Group in primary, replicating to 1 Volume Group in secondary

    • Example 3, Two nodes, more than 1 AD in primary and secondary
      • In primary region: node0 in AD1, node1 in AD2
      • In secondary region: node1 in AD1, node2 in AD2

      Solution:

      Primary will have these volume groups:

      • volume-group-AD1 (with node0's BV) replicated to secondary AD1 (for secondary node1)
      • volume-group-AD2 (with node1's BV) replicated to secondary AD2 (for secondary node2)
    • Example 4, Six nodes, more than 1 AD in primary and secondary
      • In primary region: node0 in AD1, node1 in AD2, node2 in AD3, node3 in AD1, node4 in AD2, node5 in AD3
      • In secondary region: node1 in AD1, node2 in AD2, node3 in AD1, node4 in AD2, node5 in AD1, node6 in AD2

      Solution:

      The primary needs multiple volume groups (and the same groups in the opposite direction after a switchover):

      • volume-group-reg1AD1-reg2AD1 with node0's BV replicated to secondary AD1 (for secondary node1)
      • volume-group-reg1AD2-reg2AD2 with node1's BV replicated to secondary AD2 (for secondary node2)
      • volume-group-reg1AD3-reg2AD1 with node2's BV replicated to secondary AD1 (for secondary node3)
      • volume-group-reg1AD1-reg2AD2 with node3's BV replicated to secondary AD2 (for secondary node4)
      • volume-group-reg1AD2-reg2AD1 with node4's BV replicated to secondary AD1 (for secondary node5)
      • volume-group-reg1AD3-reg2AD2 with node5's BV replicated to secondary AD2 (for secondary node6)
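
    The rule behind these examples is that you need one Block Volume Group for each distinct (source AD, destination AD) pair. The following sketch derives the groups from the AD of each primary node and the AD of its peer node in the secondary, both listed in the same node order. It assumes a POSIX awk and reuses the naming pattern of Example 4; the function name is hypothetical.

```shell
# Hypothetical helper: $1 = ADs of the primary nodes (node0, node1, ...),
# $2 = ADs of their peer nodes in the secondary, in the same order.
# Prints one line per required volume group with the primary nodes it contains.
plan_volume_groups() {
  printf '%s\n%s\n' "$1" "$2" | awk '
    NR == 1 { n = split($0, src, " ") }
    NR == 2 {
      if (split($0, dst, " ") != n) {
        print "ERROR: primary and secondary node counts differ" > "/dev/stderr"
        exit 1
      }
      for (i = 1; i <= n; i++) {
        key = "volume-group-reg1" src[i] "-reg2" dst[i]
        if (key in members) {
          members[key] = members[key] " node" (i - 1)
        } else {
          order[++k] = key            # keep first-seen order for stable output
          members[key] = "node" (i - 1)
        }
      }
      for (j = 1; j <= k; j++) print order[j] ": " members[order[j]]
    }'
}
```

    For Example 4, plan_volume_groups "AD1 AD2 AD3 AD1 AD2 AD3" "AD1 AD2 AD1 AD2 AD1 AD2" prints six groups, from volume-group-reg1AD1-reg2AD1: node0 to volume-group-reg1AD3-reg2AD2: node5, the same combinations as in the list above. For Example 2, it prints a single group that contains node0 and node1.
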
  7. Detach the original Block Volumes from the secondary mid-tier hosts.

    Note:

    Boot Volumes must NOT be unmounted or detached.

    Perform the following for each mid-tier host in the secondary:
    1. Unmount the data block volume that is replicated from the primary.
      Ensure that no Oracle processes are running; otherwise, the unmount will fail.
      For example,
      [opc@host ~]$ sudo umount /u01/data
    2. As the root user, edit the /etc/fstab file and remove the entry for the unmounted block volume.
      This prevents the host from trying to mount the original block volume at the next reboot. Example entry for the volume mounted in /u01/data:
      ..
      #Remove this entry:
      #UUID=9e87cf72-a75c-4dff-9825-432f1668d8f9 /u01/data ext4 auto,defaults,_netdev,nofail 0 2
    3. Detach the block volume from the OCI Console.
      Go to each Block Volume, then Attached Instances, then Detach from Instance. The OCI Console asks you to run some iSCSI commands before completing the detachment.
    4. Repeat these steps in all the mid-tier nodes in the secondary.
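
    The /etc/fstab change in step 2 can be scripted. The following sketch comments out the entry instead of deleting it, which has the same effect at boot and keeps the line for reference, and it saves a timestamped copy of the file first. The function name is hypothetical; run it as root and review the file before the next reboot.

```shell
# Hypothetical helper: comment out the active /etc/fstab entry for a mount point.
disable_fstab_entry() {
  local mount_point="$1" fstab="${2:-/etc/fstab}" tmp rc
  cp -p "$fstab" "$fstab.bak.$(date +%Y%m%d%H%M%S)" || return 1
  tmp=$(mktemp) || return 1
  # Prefix with '#' each non-comment line whose mount point (field 2) matches exactly.
  awk -v mp="$mount_point" '!/^[ \t]*#/ && $2 == mp { print "#" $0; next } { print }' \
    "$fstab" > "$tmp" && cat "$tmp" > "$fstab"   # overwrite in place: keeps owner and mode
  rc=$?
  rm -f "$tmp"
  return "$rc"
}
```

    For example: disable_fstab_entry /u01/data
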
  8. Delete or rename the detached OCI Block Volumes in the secondary.
    The original data block volumes detached from the secondary mid-tier hosts are no longer used. You can delete them now, or rename them and delete them later.
  9. Restart the systemd daemon in the secondary mid-tier hosts.
    To refresh any cached references to the previously mounted devices, run this command:
    sudo systemctl daemon-reload
  10. If required, prepare the scripts to replace the information specific to each site.

    This action applies only when the Block Volumes contain information specific to each site. Otherwise, no action is required.

    Create scripts to replace the local site information according to your specific requirements, for example, by performing a search and replace or by restoring a backup copy of the site-specific data. Make sure you store these scripts in a folder that is NOT replicated.

    Do not run the scripts at this point. You'll use the scripts the next time you perform a validation, a switchover, or a failover.

    Tip:

    Oracle WebLogic Example

    For example, when you replicate Block Volumes that contain a WebLogic domain, during a switchover or failover you need to perform a replacement in the configuration so that it points to the local database. This document provides example scripts to automate this replacement.

    Database Type: Oracle Base Database Service or Oracle Exadata Database Service

    Replacement script: replacement_script_BVmodel.sh

    Download steps:

    1. Go to the MAA repository in GitHub https://github.com/oracle-samples/maa
    2. Download all the scripts in the wls_mp_dr directory.

      The script is located in the folder wls_mp_dr/Block_Volume_Replica_Method

    3. Copy the scripts to all the mid-tier hosts.

    Prepare steps:

    This script replaces the database connection strings. It also cleans up the state files of the WebLogic servers (.lck and .state) for a clean startup.

    Edit and customize it on each host by providing the local and remote values for the database in each site.

    Note that the values are different depending on the site. When you customize it in the site1 hosts, the “LOCAL” values refer to the site1 values, and the “REMOTE” values refer to the site2 values. When you customize the script in the site2 hosts, the “LOCAL” values refer to site2 and the “REMOTE” values to site1.

    Database Type: Autonomous Database

    Replacement script: fmwadb_switch_db_conn.sh

    Download steps:

    1. Go to the MAA repository in GitHub https://github.com/oracle-samples/maa
    2. Download all the scripts in the app_dr_common directory and all the scripts in the fmw-wls-with-adb-dr directory.
    3. Copy the scripts to all the mid-tier hosts.

      The scripts make calls to each other.

    4. Place all the scripts of both directories in the same folder.

    Prepare steps:

    You don’t need to edit the script. The wallet folder and the wallet password are passed as inputs.

    To run the script:
    ./fmwadb_switch_db_conn.sh WALLET_DIR WALLET_PASSWORD

    Where WALLET_DIR is a folder that contains the tnsnames.ora, keystore, and truststore files to connect to the local database.

    Ensure that the WALLET_DIR folder is not overwritten by the replica.

    Do not run the script at this point.
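
    To illustrate the search-and-replace approach only, the following sketch replaces every literal occurrence of a remote value (for example, the remote database host name) with the local value in one file, keeping a .bak copy. It is not the replacement_script_BVmodel.sh script from the MAA repository; the function name, file path, and values are hypothetical.

```shell
# Hypothetical helper: replace every literal occurrence of OLD with NEW in FILE,
# keeping FILE.bak as a copy of the original.
replace_site_value() {
  local file="$1" old="$2" new="$3" tmp rc
  [ -n "$old" ] || { echo "ERROR: empty value to replace" >&2; return 1; }
  cp -p "$file" "$file.bak" || return 1
  tmp=$(mktemp) || return 1
  # index()/substr() do a plain string match, so dots in host names are not wildcards.
  OLD="$old" NEW="$new" awk '{
    line = $0; out = ""
    while ((i = index(line, ENVIRON["OLD"])) > 0) {
      out = out substr(line, 1, i - 1) ENVIRON["NEW"]
      line = substr(line, i + length(ENVIRON["OLD"]))
    }
    print out line
  }' "$file" > "$tmp" && cat "$tmp" > "$file"
  rc=$?
  rm -f "$tmp"
  return "$rc"
}
```

    For example: replace_site_value /path/to/tnsnames.ora db-site1.example.com db-site2.example.com. A literal match is used on purpose: with a regular expression (for example, a plain sed substitution), the dots in host names would match any character.
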

Validate Replication for OCI Block Volumes

In a switchover or failover operation, the replicated information must be available and usable in the standby site before the processes are started. This is also required when you validate the secondary system (by opening the standby database in snapshot mode).

The image shows how activation creates attachable OCI Block Volumes from the replicas.

Perform the following to make the replicated volumes available and usable in the standby system:

  1. Activate the replicas in the standby site.
    OCI Block Volumes replicas can’t be mounted directly; you must activate them first. When you activate a block volume (BV) replica, an "attachable" BV is created as a clone of the replicated BV. Then, you can attach the cloned BV to the compute instances.
    Perform the following steps to activate the replicas in the standby site:
    1. In the OCI Console, go to the region of the standby site. Select Block Storage, then Volume Group Replicas.
    2. Click the Volume Group replica and then click Activate.
    3. Name the Volume Group created as a result of this activation. For simplicity, use the same name as in the primary region.
    4. Repeat the same steps for all the Volume Group replicas in the standby site.
  2. Attach the replicated block volumes to mid-tier hosts in the standby site.
    1. In the OCI Console, select Storage, then Block Volume to locate the attachable OCI Block Volumes created as a result of the activation in the standby site.
    2. Attach the appropriate Block Volume to the appropriate host. Click Block Volume, then Attached Instances, then Attach to Instance. To simplify the procedure, select the option to use Oracle Cloud Agent to automatically connect to iSCSI-attached volumes.
      The Oracle Cloud Agent then runs the iSCSI commands automatically, so you don’t have to run them. To use this option, make sure that the Block Volume Management plugin is enabled on the host.
    3. If you don’t use the Oracle Cloud Agent, run the iSCSI commands manually. Click iSCSI Commands & Information for the attached block volume and run the iSCSI commands provided in “Commands for connecting” on the mid-tier host.
  3. Mount the replicated block volumes in the standby hosts.
    Perform the following for each block volume:
    1. Get the UUID of the new attached block volume.
      It is the same UUID that the Block Volume has in the primary site. For example:
      [root@prefix-wls-0 opc]# sudo blkid
      /dev/sda3: UUID="974147f5-d731-41de-bba8-56ff78ed1c9c" TYPE="xfs"    PARTUUID="4a95c68a-bc70-4be9-bce8-b15e995fcf46"
      /dev/sda1: SEC_TYPE="msdos" UUID="593B-B893" TYPE="vfat" PARTLABEL="EFI System Partition" PARTUUID="c5ac3089-6a91-40e0-bcc1-212ba0b43418"
      /dev/sda2: UUID="9ca12daa-d7ea-44a2-8680-5b676488b054" TYPE="swap" PARTUUID="682a63d1-d3ec-4019-b372-43720aaae717"
      /dev/sdb: UUID="35e72262-979a-4d84-85ce-a6f91e3b1250" TYPE="ext4" 
      /dev/sdc: UUID="c293b5b5-005c-43e9-8c2f-02e873b76926" TYPE="ext4" 
    2. If it is not already present, add an entry for the appropriate UUID to the /etc/fstab file of the host to mount the volume and to persist the mount across reboots.
      Ensure you use the same file system type (for example, ext4) as in the primary site. For example:
      UUID=c293b5b5-005c-43e9-8c2f-02e873b76926 /u01/data ext4  auto,defaults,_netdev,nofail
      Because the UUID of each replicated block volume stays the same, Oracle recommends keeping the newly added entry in the /etc/fstab file. Then, the systemd daemon automatically mounts the block volume the next time it is attached during a switchover or failover operation.
    3. Mount the newly attached block volume. If the appropriate entry already exists in the /etc/fstab file when the device is attached, then the block volume is mounted automatically after being attached.
      The following example shows how to mount the newly attached block volume and verify the mount:
      [root@prefix-wls-0 opc]# mount -a
      [root@prefix-wls-0 opc]# df -h| grep /u01/data
      /dev/sdb 49G 1.4G 46G 3% /u01/data
    4. Repeat these steps for all the activated block volumes.
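
    Steps 1 and 2 can be combined in a small script, sketched below under the assumption that blkid prints KEY="value" pairs as in the example output above. The function names are hypothetical, and the fstab options match the example entry in step 2.

```shell
# Hypothetical helper: read `blkid` output on stdin and print the UUID of one device.
uuid_of_device() {
  awk -v dev="$1:" '$1 == dev {
    for (i = 2; i <= NF; i++)
      if ($i ~ /^UUID="/) {      # PARTUUID= fields do not match this pattern
        v = $i; sub(/^UUID="/, "", v); sub(/"$/, "", v)
        print v; exit
      }
  }'
}

# Hypothetical helper: append a persistent mount entry for the UUID unless
# the fstab file already has one, so it is safe to run on every activation.
ensure_fstab_entry() {
  local uuid="$1" mount_point="$2" fs_type="${3:-ext4}" fstab="${4:-/etc/fstab}"
  [ -n "$uuid" ] || { echo "ERROR: empty UUID" >&2; return 1; }
  if grep -Eq "^[[:space:]]*UUID=${uuid}[[:space:]]" "$fstab"; then
    return 0
  fi
  printf 'UUID=%s %s %s auto,defaults,_netdev,nofail\n' "$uuid" "$mount_point" "$fs_type" >> "$fstab"
}
```

    For example, as root: ensure_fstab_entry "$(blkid | uuid_of_device /dev/sdc)" /u01/data ext4, followed by mount -a. The device name can differ between hosts, so check it in the blkid output first.
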
  4. Replace the information that is site-specific in the secondary mid-tier hosts.
    The replacement script replaces the site-specific information in the secondary mid-tier hosts.

    Tip:

    Example for block volumes that contain the WebLogic domain

    Update the database connection information to point to the local database by running the replacement script on all the standby mid-tier hosts:

    1. If the system uses Oracle Base Database Service or Oracle Exadata Database Service, then run the replacement_script_BVmodel.sh script.

      Make sure it uses the appropriate values.

    2. If the system uses Oracle Autonomous Database, then run the fmwadb_switch_db_conn.sh script.

      The script requires, as inputs, the path to the secondary's original wallet folder and the wallet password.

      If the tns_admin folder is under the DOMAIN_HOME/config folder, then you can run the script only on the administration host. The rest of the nodes will download the updated tnsnames.ora when the managed servers start. Otherwise, run the script on all the mid-tier hosts.
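
    When you follow the backup-and-restore approach for a site-specific folder, the restore can look like the following generic sketch. It assumes a tar.gz backup taken with relative paths (tar -C parent folder) and stored outside the replicated block volume. It is not the fmwadb_switch_db_conn.sh script; the function name and arguments are hypothetical.

```shell
# Hypothetical helper: replace a replicated folder with the local site's backup copy.
restore_site_folder() {
  local archive="$1" target_parent="$2" folder="$3"
  [ -f "$archive" ] || { echo "ERROR: archive $archive not found" >&2; return 1; }
  # Move the copy that came with the replica aside instead of deleting it.
  if [ -e "$target_parent/$folder" ]; then
    mv "$target_parent/$folder" "$target_parent/$folder.replicated.$(date +%Y%m%d%H%M%S)" || return 1
  fi
  # Extract the local site's version (archive entries are relative to the parent folder).
  tar -xzf "$archive" -C "$target_parent"
}
```

    Keeping the moved-aside copy until the servers connect to the local database makes it easy to compare the two versions if something fails.
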

  5. Clean up the servers’ lock files.
    The replicated block volumes may contain lock files of the mid-tier processes, because the replica runs while the primary processes are up. Before starting the processes in the secondary, you may need to clean up these files; otherwise, they can prevent the mid-tier processes from starting.

    Tip:

    Example for block volumes that contain the WebLogic domain

    There may be .lck, .pid, or .state files in the ${DOMAIN_HOME}/servers/*/data/nodemanager folders carried over from the primary. Make sure that these files are cleaned up before trying to start the Node Manager and the servers. For example:

    rm -f "${DOMAIN_HOME}"/servers/*/data/nodemanager/*.lck
    rm -f "${DOMAIN_HOME}"/servers/*/data/nodemanager/*.state
    rm -f "${DOMAIN_HOME}"/servers/*/data/nodemanager/*.pid
    

    You can include this action in the replacement scripts or as a step that runs before Oracle WebLogic Server starts.

    The activation creates attachable Block Volumes from the replicas, as shown in the previous image.
  6. When the switchover or failover has finished, the Block Volumes of the site with the standby role must be detached and deleted. This is also required when you have completed a validation on the standby site (by opening the standby database in snapshot mode) and want to revert it to the standby role.
    1. Unmount all the block volumes in the standby site that are replicated from the primary.
      [root@prefix-wls-0 opc]# umount /u01/data
    2. Detach block volumes in standby.
      Use the OCI Console UI (or the API) to detach the unmounted block volumes from the standby mid-tier hosts to prepare them for the future. If you used Oracle Cloud Agent to attach the block volume, then the agent runs the iSCSI commands to log off the iSCSI targets.
    3. Delete block volumes and groups in standby.

      Delete or rename the detached volumes from the standby mid-tier hosts to prevent mounting them by mistake.

      Delete the unused Volume groups in the standby site. They will not be used anymore.

Perform Ongoing Replication for OCI Block Volumes

Follow these recommendations for the ongoing replication when using this implementation:

  • OCI automatically performs OCI Block Volumes replication in the background. The only thing you need to do during the lifecycle of the system is ensure that the Volume Groups of the system with the primary role have the cross-region replica enabled.
  • Consider using OCI Full Stack Disaster Recovery to automate the switchover and failover tasks. It lets you run a switchover or failover plan with one click in the OCI Console, which simplifies the execution of all the tasks related to Block Volume replication.
  • The replication feature is complementary to the backup feature, not a replacement. Make sure you enable a backup policy for the block volumes that you replicate. This will provide data protection in addition to the cross-region replica, allowing you to restore to a point-in-time.
  • Maintain the information that is specific to each site and keep it up-to-date. For example, if the file system contains a folder with the artifacts to connect to an Oracle Autonomous Database, maintain a backup copy of this folder. Ensure that you update the backup of the folder when you perform an update in the wallet. This way, it is correctly restored in subsequent switchovers and failovers.
  • After a switchover or a failover operation, change the replica direction. For this:
    • Enable the replica in the OCI Block Volumes groups of the new primary to the new standby site.
    • Disable the previous replication from the original primary and delete the unused block volumes.
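
One way to know whether the backup copy of a site-specific folder (for example, the Autonomous Database wallet folder) is still current is to save a checksum manifest when you take the backup and compare it periodically with the live folder. The following sketch assumes sha256sum from GNU coreutils; the function names are hypothetical.

```shell
# Hypothetical helper: print "sha256  ./relative/path" for every file in a folder,
# sorted so that the output is stable.
folder_manifest() {
  ( cd "$1" || exit 1
    find . -type f | LC_ALL=C sort | while IFS= read -r f; do sha256sum "$f"; done )
}

# Hypothetical helper: succeed only if the folder still matches the manifest
# saved when the backup was taken.
backup_is_current() {
  local dir="$1" saved_manifest="$2" tmp rc
  [ -f "$saved_manifest" ] || return 1
  tmp=$(mktemp) || return 1
  folder_manifest "$dir" > "$tmp"
  cmp -s "$tmp" "$saved_manifest"
  rc=$?
  rm -f "$tmp"
  return "$rc"
}
```

For example, save the output of folder_manifest next to the backup archive; when backup_is_current fails after a wallet update, refresh both the backup and the manifest.
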