Active-Active Disaster Recovery

Oracle Linux Virtualization Manager supports an active-active disaster recovery failover configuration that can span two sites, both of which are active. If the primary site becomes unavailable, the Oracle Linux Virtualization Manager environment smoothly transitions to the secondary site to ensure business continuity.

To support active-active failover, you must configure a stretch cluster in which each of the primary and secondary sites has hosts capable of running all the virtual machines in the cluster. All the hosts belong to the same Oracle Linux Virtualization Manager cluster. You can implement a stretch cluster configuration using a self-hosted engine environment or a standalone Engine environment.

With active-active disaster recovery you must also have replicated storage that is writable on both sites. This enables virtual machines to migrate between sites and continue running on the site’s storage.

Virtual machines migrate to the secondary site if the primary site becomes unavailable. When the primary site becomes available again and the storage is replicated in both sites, virtual machines automatically fail back.

To ensure virtual machine failover and failback work, you must configure:

  • virtual machines for high availability, and each virtual machine must have a lease on a target storage domain to ensure the virtual machine can start even without power management.

  • soft-enforced virtual machine to host affinity to ensure the virtual machines only start on the selected hosts.
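
The two requirements above can be expressed as a small pre-flight check. The following is a minimal sketch, not part of the product: the VmSettings class, its field names, and the sample virtual machines are hypothetical, and you would need to populate them from your own inventory.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VmSettings:
    """Hypothetical snapshot of the settings that matter for failover."""
    name: str
    highly_available: bool
    lease_storage_domain: Optional[str]  # None: no lease is configured
    affinity_enforcing: Optional[bool]   # None: no host affinity; False: soft; True: hard


def failover_problems(vm: VmSettings) -> List[str]:
    """Return the reasons a virtual machine does not meet the requirements above."""
    problems = []
    if not vm.highly_available:
        problems.append("not configured as highly available")
    if vm.lease_storage_domain is None:
        problems.append("no lease on a target storage domain")
    if vm.affinity_enforcing is None:
        problems.append("no virtual machine to host affinity group")
    elif vm.affinity_enforcing:
        problems.append("host affinity is hard, expected soft")
    return problems


# Sample data, invented for illustration only.
vms = [
    VmSettings("db01", True, "replicated_data", False),
    VmSettings("web01", True, None, True),
]
for vm in vms:
    issues = failover_problems(vm)
    print(vm.name, "ready" if not issues else "; ".join(issues))
```

An empty list means the virtual machine satisfies both requirements. Any other result names the setting to correct before relying on failover.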

Network Considerations

All hosts in the stretch cluster must be on the same broadcast domain over a Layer 2 (L2) network, which means that connectivity between the two sites needs to be L2.

The maximum latency allowed between the sites across the L2 network differs for the standalone Engine environment and the self-hosted engine environment:

  • A maximum latency of 100ms is required for the standalone Engine environment

  • A maximum latency of 7ms is required for the self-hosted engine environment
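
As an illustration of the limits above, the following sketch compares measured inter-site latency samples with the stated maximum for each environment. The function name and the sample values are hypothetical. Comparing the worst sample, rather than an average, is a conservative assumption made here and is not a documented measurement method.

```python
# Maximum latency between sites, in milliseconds, as stated above.
MAX_LATENCY_MS = {
    "standalone": 100.0,
    "self-hosted": 7.0,
}


def latency_ok(samples_ms, environment):
    """Return True if the worst measured sample is within the limit.

    samples_ms: latency samples in milliseconds, measured between the sites.
    environment: "standalone" or "self-hosted".
    """
    if not samples_ms:
        raise ValueError("at least one latency sample is required")
    return max(samples_ms) <= MAX_LATENCY_MS[environment]


# Sample measurements, invented for illustration only.
samples = [4.8, 5.6, 9.1]
print(latency_ok(samples, "standalone"))   # True
print(latency_ok(samples, "self-hosted"))  # False
```

With these samples the link is acceptable for a standalone Engine environment but not for a self-hosted engine environment, because one sample exceeds 7ms.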

Storage Considerations

The storage domain for Oracle Linux Virtualization Manager can be either block devices (iSCSI or FCP) or a file system (NAS/NFS or GlusterFS).

Both sites require synchronously replicated storage that is writable with shared L2 network connectivity to allow virtual machines to migrate between sites and continue running on the site’s storage. All storage replication options supported by Oracle Linux 8 and later can be used in the stretch cluster.

For more information, see the storage topics in the Administration Guide and the Architecture and Planning Guide.

Configuring a Standalone Engine Stretch Cluster Environment

Before you begin configuring your standalone Engine environment for a stretch cluster, review the following prerequisites and limitations:

  • A writable storage server in both sites with L2 network connectivity.

  • Real-time storage replication service to duplicate the storage.

  • Maximum 100ms latency between sites.

    The Engine must be highly available for virtual machines to fail over and fail back between sites. If the Engine goes down with the site, the virtual machines do not fail over.

  • The standalone Engine is only highly available when managed externally, for example:

    • As a highly available virtual machine in a separate virtualization environment

    • In a public cloud

To configure a standalone Engine stretch cluster:

  1. Install and configure the Oracle Linux Virtualization Manager engine.

    For more information, see Installation and Configuration in the Oracle Linux Virtualization Manager: Getting Started Guide.

  2. Install hosts in each site and add them to the cluster.

    For more information, see Configuring a KVM Host in the Oracle Linux Virtualization Manager: Getting Started Guide.

  3. Configure the storage pool manager (SPM) priority to be higher on all hosts in the primary site to ensure SPM failover to the secondary site occurs only when all hosts in the primary site are unavailable.

    For more information, see Storage Pool Manager in the Oracle Linux Virtualization Manager: Architecture and Planning Guide.

  4. Configure all virtual machines that need to fail over as highly available and ensure that each virtual machine has a lease on the target storage domain.

    For more information, see Optimizing Clusters, Hosts and Virtual Machines.

  5. Configure virtual machine to host soft affinity and define the behavior you expect from the affinity group.

    For more information, see Affinity Groups in the oVirt Virtual Machine Management Guide.

    Important:

    With VM Affinity Rule Enforcing enabled (shown as Hard in the list of Affinity Groups), the system does not migrate a virtual machine to a host different from where the other virtual machines in its affinity group are running. For more information, see Virtual Machine Issues in the Oracle Linux Virtualization Manager: Release Notes.

You can perform the active-active failover manually by placing the primary site’s hosts into maintenance mode.
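
The SPM priority rule in step 3 can be sketched as a simple plan: every host in the primary site gets a higher priority than every host in the secondary site. This is a minimal sketch under stated assumptions. The function name, host names, and site labels are hypothetical, and the numeric values 8 and 2 are placeholders chosen for illustration, so check which values your release accepts. Applying the plan to the hosts is deliberately left out.

```python
def spm_priorities(hosts_by_site, primary_site,
                   primary_priority=8, secondary_priority=2):
    """Build a host -> SPM priority plan that favors the primary site.

    hosts_by_site: mapping of site label to a list of host names.
    primary_site: the site label whose hosts should hold the SPM role.
    The default priority values are placeholders, not documented settings.
    """
    if primary_site not in hosts_by_site:
        raise ValueError(f"unknown site: {primary_site}")
    if primary_priority <= secondary_priority:
        raise ValueError("primary priority must be higher than secondary")
    plan = {}
    for site, hosts in hosts_by_site.items():
        priority = primary_priority if site == primary_site else secondary_priority
        for host in hosts:
            plan[host] = priority
    return plan


# Host and site names invented for illustration only.
plan = spm_priorities(
    {"primary": ["host-a1", "host-a2"], "secondary": ["host-b1"]},
    primary_site="primary",
)
print(plan)  # {'host-a1': 8, 'host-a2': 8, 'host-b1': 2}
```

Because no secondary-site host shares the higher value, the SPM role is expected to move to the secondary site only when no primary-site host is available.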

Configuring a Self-Hosted Engine Stretch Cluster Environment

Before you begin configuring your self-hosted engine environment for a stretch cluster, review the following prerequisites and limitations:

  • A writable storage server in both sites with L2 network connectivity

  • Real-time storage replication service to duplicate the storage

  • Maximum 7ms latency between sites

To configure a self-hosted engine stretch cluster:

  1. Deploy the Oracle Linux Virtualization Manager self-hosted engine.

    For more information, see Self-hosted Engine Deployment in the Oracle Linux Virtualization Manager: Getting Started Guide.

  2. Optionally, install additional hosts in each site and add them to the cluster.

    For more information, see Adding a KVM Host in the Oracle Linux Virtualization Manager: Getting Started Guide.

  3. Configure the storage pool manager (SPM) priority to be higher on all hosts in the primary site to ensure SPM failover to the secondary site occurs only when all hosts in the primary site are unavailable.

    For more information, see Storage Pool Manager in the Oracle Linux Virtualization Manager: Architecture and Planning Guide.

  4. Configure all virtual machines that need to fail over as highly available and ensure that each virtual machine has a lease on the target storage domain.

    For more information, see Optimizing Clusters, Hosts and Virtual Machines.

  5. Configure virtual machine to host soft affinity and define the behavior you expect from the affinity group.

    For more information, see Affinity Groups in the oVirt Virtual Machine Management Guide.

    Important:

    With VM Affinity Rule Enforcing enabled (shown as Hard in the list of Affinity Groups), the system does not migrate a virtual machine to a host different from where the other virtual machines in its affinity group are running. For more information, see Virtual Machine Issues in the Oracle Linux Virtualization Manager: Release Notes.

You can perform the active-active failover manually by placing the primary site’s hosts into maintenance mode.
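
Why step 5 asks for soft rather than hard host affinity can be seen in a toy model of host selection. The sketch below is not the Engine's scheduler. The function and host names are hypothetical, and it assumes only that soft affinity expresses a preference while hard affinity acts as a strict filter.

```python
def candidate_hosts(preferred, available, enforcing):
    """Toy model: order the hosts a virtual machine could start on.

    preferred: set of hosts named in the affinity group.
    available: list of hosts that are currently up.
    enforcing: True models hard affinity, False models soft affinity.
    """
    preferred_up = [h for h in available if h in preferred]
    if enforcing:
        # Hard affinity: only preferred hosts are acceptable.
        return preferred_up
    # Soft affinity: preferred hosts first, then any other available host.
    others = [h for h in available if h not in preferred]
    return preferred_up + others


primary = {"host-a1", "host-a2"}  # invented host names

# Primary site partly up: the preferred host is tried first.
print(candidate_hosts(primary, ["host-a1", "host-b1"], enforcing=False))
# ['host-a1', 'host-b1']

# Primary site down: soft affinity still leaves a secondary-site host.
print(candidate_hosts(primary, ["host-b1"], enforcing=False))
# ['host-b1']

# Primary site down with hard affinity: no host qualifies.
print(candidate_hosts(primary, ["host-b1"], enforcing=True))
# []
```

In this model, hard affinity leaves no candidate host once the primary site is down, which is the situation a stretch cluster has to avoid.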