Note:

Deploy Microsoft SQL Server Distributed Always On Availability Groups for DR on OCI

Introduction

Distributed Always On availability group (Distributed availability group) is a powerful feature in Microsoft SQL Server that extends the capabilities of traditional availability group for SQL Server.

Distributed availability groups allows you to create a Disaster Recovery (DR) solution that spans multiple Windows Server Failover Clusters (WSFCs) running in different Oracle Cloud Infrastructure (OCI) regions.

This enables you to achieve higher levels of availability, disaster recovery capabilities, and geographic distribution for your critical SQL Server databases running on OCI.

Exclusions for this Tutorial

In this tutorial, we will not cover step by step creation of the single Microsoft SQL Server Always On availability groups. For more information, see Deploy Microsoft SQL Server Always On Availability Groups for HA and DR on OCI.

Please refer the following official Microsoft documentation:

Objectives

Prerequisites

Task 1: Create Distributed Availability Group

Create distributed availability group (distributed-aoag) composed by the two underlying already running Always On availability groups.

As already mentioned, we assume that two independent Always On availability groups are already up and running in two different OCI regions.

The second Always On availability group (marseille-aoag), the standby one, needs to have no databases associated, so practically the second Always On availability group needs to be empty before the distributed availability group creation, so without any availability database associated. You can create the second Always On availability group as usual with an initial database associated and after that you can remove this database that has been used only to create the second Always On availability group. This is because the graphical interface is not possible to create an Always On availability group with any database associated.

  1. Create distributed availability group on the first Always On availability group.

    Connect to SQL Server on the first server (Paris site sql-srv1 node in this example), and run the following SQL commands.

    Note:

    • The listener names are paris-sql-list and mars-sql-list.

    • TCP port to be used, 5022 is the port of the endpoint, and must be used. This is usually different from the listener port (1433).

    • Availability group names, must be exactly the names used by the already running Always On availability group.

    USE MASTER;
    CREATE AVAILABILITY GROUP [distributed-aoag]  
       WITH (DISTRIBUTED)   
       AVAILABILITY GROUP ON  
          'paris-aoag' WITH    
          (   
             LISTENER_URL = 'tcp://paris-sql-list.acme.corp:5022',    
             AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,   
             FAILOVER_MODE = MANUAL,   
             SEEDING_MODE = AUTOMATIC   
          ),   
          'marseille-aoag' WITH    
          (   
             LISTENER_URL = 'tcp://mars-sql-list.acme.corp:5022',   
             AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,   
             FAILOVER_MODE = MANUAL,   
             SEEDING_MODE = AUTOMATIC   
          );    
    GO
    
  2. Join to the distributed availability group on the second Always On availability group.

    Connect to SQL Server on the first server of the second Always On availability group (Marseille site sql-srv3 server) and run the following SQL commands.

    USE MASTER;
    ALTER AVAILABILITY GROUP [distributed-aoag]   
       JOIN   
       AVAILABILITY GROUP ON  
          'paris-aoag' WITH    
          (   
             LISTENER_URL = 'tcp://paris-sql-list.acme.corp:5022',    
             AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,   
             FAILOVER_MODE = MANUAL,   
             SEEDING_MODE = AUTOMATIC   
          ),   
          'marseille-aoag' WITH    
          (   
             LISTENER_URL = 'tcp://mars-sql-list.acme.corp:5022',   
             AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,   
             FAILOVER_MODE = MANUAL,   
             SEEDING_MODE = AUTOMATIC   
          );    
    GO 
    
  3. It is important to understand that unlike traditional availability groups, distributed availability groups do not require resource groups or roles in the WSFC. All metadata is managed within SQL Server. This means that even SQL Server Management Studio does not directly display the names of databases in the distributed availability group.

    To view this information, run the following Transact-SQL script.

    --View metadata and status of the Distributed Availability Group
    SELECT r.replica_server_name, r.endpoint_url,
    rs.connected_state_desc, rs.role_desc, rs.operational_state_desc,
    rs.recovery_health_desc,rs.synchronization_health_desc,
    r.availability_mode_desc, r.failover_mode_desc
    FROM sys.dm_hadr_availability_replica_states rs 
    INNER JOIN sys.availability_replicas r
    ON rs.replica_id=r.replica_id
    ORDER BY r.replica_server_name   
    

    image

Note: To accommodate potential network latency between regions, we have configured the primary and secondary Always On availability group with asynchronous-commit replication. This minimizes performance overhead on the primary database. Within each Always On availability group, we have opted for synchronous-commit replication among replicas to ensure high availability. However, for failover between asynchronous-commit replicas (in case of distributed availability group), reducing data loss requires a temporary switch to synchronous-commit mode before initiating the failover. For the failover_mode, the only available mode for distributed availability group is manual.

Task 2: Failover Procedure for Distributed Availability Group

In this task, we will talk about failover between the two Always On availability groups. Failover procedure of the database is composed by the following easy steps and checks.

Follow the steps:

  1. Run the following scripts to check if synchronization is fine, first on the currently primary SQL of the currently primary site and then on the primary SQL of the secondary site.

    The executed results needs to be CONNECTED_STATE = CONNECTED and SYNCHRONIZATION_HEALTH = HEALTHY.

    select ag.name, ag.is_distributed, ar.replica_server_name, ar.availability_mode_desc, ars.connected_state_desc, ars.role_desc, 
       ars.operational_state_desc, ars.synchronization_health_desc from sys.availability_groups ag  
       join sys.availability_replicas ar on ag.group_id=ar.group_id
       left join sys.dm_hadr_availability_replica_states ars
       on ars.replica_id=ar.replica_id
       where ag.is_distributed=1
    GO
    

    image

    image

  2. Change the availability mode. Run the following script first on the currently primary SQL Server of the currently primary site and then on the primary SQL Server of the secondary site.

    --Run this first on the primary replica of the primary AOAG and then again on the secondary AOAG
    USE MASTER;     
    ALTER AVAILABILITY GROUP [distributed-aoag]     
    MODIFY 
    AVAILABILITY GROUP ON
    'paris-aoag' WITH 
    ( 
       AVAILABILITY_MODE = SYNCHRONOUS_COMMIT 
    ), 
    'marseille-aoag' WITH     
    ( 
       AVAILABILITY_MODE = SYNCHRONOUS_COMMIT 
    );
    
  3. To ensure no data loss, verify the results of this step. The status must be SYNCHRONIZED, and the last_hardened_lsn should match for each database on both the global primary and the forwarder.

    -- Run this query on the Global Primary and the forwarder
    -- Check the results to see if synchronization_state_desc is SYNCHRONIZED, and the last_hardened_lsn is the same per database on both the global primary and forwarder 
    -- If not rerun the query on both side every 5 seconds until it is the case
    --
    SELECT ag.name
             , drs.database_id
             , db_name(drs.database_id) as database_name
             , drs.group_id
             , drs.replica_id
             , drs.synchronization_state_desc
             , drs.last_hardened_lsn  
    FROM sys.dm_hadr_database_replica_states drs 
    INNER JOIN sys.availability_groups ag on drs.group_id = ag.group_id;
    

    image

    image

  4. You are now ready to set REQUIRED_SYNCHRONIZED_SECONDARIES_TO_COMMIT on the current primary SQL Server at the current primary site.

    --Run this script into Primary AOAG
    USE MASTER;
    ALTER AVAILABILITY GROUP [distributed-aoag]
    SET (REQUIRED_SYNCHRONIZED_SECONDARIES_TO_COMMIT = 1);
    

    And now you are ready to change the role of the primary Always On availability group from primary to secondary.

    --Run this script into Primary AOAG, this will terminate client applications connected to the primary replica of the primary AOAG
    USE MASTER;
    ALTER AVAILABILITY GROUP [distributed-aoag] SET (ROLE = SECONDARY);
    

    image

  5. Change the role of the secondary Always On availability group from secondary to primary. The following script will perform this role change, enabling the database for read or write operations. Additionally, the roles of the Always On availability groups within the distributed availability group will also be updated accordingly.

    --Run this script into Secondary AOAG, this will terminate client applications connected to the primary replica of the primary AOAG
    ALTER AVAILABILITY GROUP [distributed-aoag] FORCE_FAILOVER_ALLOW_DATA_LOSS;
    

    Run the following script to check the status.

    --check the status on the new primary (formerly standby site)
    select ag.name, ag.is_distributed, ar.replica_server_name, ar.availability_mode_desc, ars.connected_state_desc, ars.role_desc, 
       ars.operational_state_desc, ars.synchronization_health_desc from sys.availability_groups ag  
       join sys.availability_replicas ar on ag.group_id=ar.group_id
       left join sys.dm_hadr_availability_replica_states ars
       on ars.replica_id=ar.replica_id
       where ag.is_distributed=1
    GO
    

    image

    image

  6. Now on the new primary SQL Server, unset REQUIRED_SYNCHRONIZED_SECONDARIES_TO_COMMIT.

    --Run this script into Primary AOAG
    ALTER AVAILABILITY GROUP [distributed-aoag] 
    SET (REQUIRED_SYNCHRONIZED_SECONDARIES_TO_COMMIT = 0);
    

    Change the availability mode back to the standard mode by running the following script on both the primary and secondary sites.

    --Run this first on the primary replica of the primary AOAG and then again on the secondary AOAG
    ALTER AVAILABILITY GROUP [distributed-aoag]     
    MODIFY 
    AVAILABILITY GROUP ON
    'paris-aoag' WITH 
    ( 
       AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT 
    ), 
    'marseille-aoag' WITH     
    ( 
       AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT 
    );
    

Task 3: Failback Procedure for Distributed Always On Availability Group

To restore Paris as the primary site and Marseille as the secondary site, simply perform a new switchover to reverse the synchronization, as outlined in Task 2.

Acknowledgments

More Learning Resources

Explore other labs on docs.oracle.com/learn or access more free learning content on the Oracle Learning YouTube channel. Additionally, visit education.oracle.com/learning-explorer to become an Oracle Learning Explorer.

For product documentation, visit Oracle Help Center.