Note:
- This tutorial requires access to Oracle Cloud. To sign up for a free account, see Get started with Oracle Cloud Infrastructure Free Tier.
- It uses example values for Oracle Cloud Infrastructure credentials, tenancy, and compartments. When completing your lab, substitute these values with ones specific to your cloud environment.
Deploy Microsoft SQL Server Distributed Always On Availability Groups for DR on OCI
Introduction
Distributed Always On availability group (Distributed availability group) is a powerful feature in Microsoft SQL Server that extends the capabilities of traditional availability group for SQL Server.
Distributed availability groups allows you to create a Disaster Recovery (DR) solution that spans multiple Windows Server Failover Clusters (WSFCs) running in different Oracle Cloud Infrastructure (OCI) regions.
This enables you to achieve higher levels of availability, disaster recovery capabilities, and geographic distribution for your critical SQL Server databases running on OCI.
Exclusions for this Tutorial
In this tutorial, we will not cover step by step creation of the single Microsoft SQL Server Always On availability groups. For more information, see Deploy Microsoft SQL Server Always On Availability Groups for HA and DR on OCI.
Please refer the following official Microsoft documentation:
Objectives
- Create Microsoft SQL Server distributed Always On availability group solution on OCI.
Prerequisites
-
The building blocks of a distributed availability group are:
-
VCNs: Create OCI Virtual Cloud Networks (VCNs) into two separate OCI regions and connect through Dynamic Routing Gateways (DRGs) remote peering.
-
AOAG 1: This is running in OCI Region 1. This is where the database to be replicated runs normally. This is based on a WSFC running on SQL Server #1 and SQL Server #2 in the following example (Paris OCI region).
-
AOAG 2: This is running in OCI Region 2. This is completely independent Always On availability group running on a WSFC composed by SQL Server #3 and SQL Server #4 in the following example (Marseille OCI region).
-
Distributed AOAG: This is a logical construct created in the SQL database to be replicated.
The following image shows the logical representation of a distributed availability group.
-
-
Create two independent Always On availability group (one in the first region and the other in the second region). For more information, see Deploy Microsoft SQL Server Always On Availability Groups for HA and DR on OCI.
Now, we have two independent Always On availability groups running in two different OCI regions peered, in this example, the OCI regions are Paris and Marseille.
-
We have first WSFC cluster (
paris-wsfc
) in the first region with first SQL Always On availability group (paris-aoag
) and the SQL listener (paris-sql-list
) for the SQL Always On availability group.The two Windows nodes are
sql-srv1
andsql-srv2
. -
In the second region, we have second WSFC cluster (
marseille-wsfc
) with second SQL Always On availability group (marseille-aoag
) and the SQL listener (mars-sql-list
) for the second SQL Always On availability group.The two Windows nodes are
sql-srv3
andsql-srv4
. -
From a SQL Server perspective, starting from
sql-srv1
(paris-aoag
), we can see in this example the DemoDB that is the database replicated with the first Always On availability group and the newly createddistributed-aoag
. -
So connecting with
sql-srv3
(marseille-aoag
), we can see in this example also the DemoDB that is the database replicated with the first Always On availability group, the newly createddistributed-aoag
, and themarseille-aoag
that is the second Always On availability group created in the second site (Marseille).
-
Task 1: Create Distributed Availability Group
Create distributed availability group (distributed-aoag
) composed by the two underlying already running Always On availability groups.
As already mentioned, we assume that two independent Always On availability groups are already up and running in two different OCI regions.
The second Always On availability group (marseille-aoag
), the standby one, needs to have no databases associated, so practically the second Always On availability group needs to be empty before the distributed availability group creation, so without any availability database associated. You can create the second Always On availability group as usual with an initial database associated and after that you can remove this database that has been used only to create the second Always On availability group. This is because the graphical interface is not possible to create an Always On availability group with any database associated.
-
Create distributed availability group on the first Always On availability group.
Connect to SQL Server on the first server (Paris site
sql-srv1
node in this example), and run the following SQL commands.Note:
-
The listener names are
paris-sql-list
andmars-sql-list
. -
TCP port to be used,
5022
is the port of the endpoint, and must be used. This is usually different from the listener port (1433
). -
Availability group names, must be exactly the names used by the already running Always On availability group.
USE MASTER; CREATE AVAILABILITY GROUP [distributed-aoag] WITH (DISTRIBUTED) AVAILABILITY GROUP ON 'paris-aoag' WITH ( LISTENER_URL = 'tcp://paris-sql-list.acme.corp:5022', AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT, FAILOVER_MODE = MANUAL, SEEDING_MODE = AUTOMATIC ), 'marseille-aoag' WITH ( LISTENER_URL = 'tcp://mars-sql-list.acme.corp:5022', AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT, FAILOVER_MODE = MANUAL, SEEDING_MODE = AUTOMATIC ); GO
-
-
Join to the distributed availability group on the second Always On availability group.
Connect to SQL Server on the first server of the second Always On availability group (Marseille site
sql-srv3
server) and run the following SQL commands.USE MASTER; ALTER AVAILABILITY GROUP [distributed-aoag] JOIN AVAILABILITY GROUP ON 'paris-aoag' WITH ( LISTENER_URL = 'tcp://paris-sql-list.acme.corp:5022', AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT, FAILOVER_MODE = MANUAL, SEEDING_MODE = AUTOMATIC ), 'marseille-aoag' WITH ( LISTENER_URL = 'tcp://mars-sql-list.acme.corp:5022', AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT, FAILOVER_MODE = MANUAL, SEEDING_MODE = AUTOMATIC ); GO
-
It is important to understand that unlike traditional availability groups, distributed availability groups do not require resource groups or roles in the WSFC. All metadata is managed within SQL Server. This means that even SQL Server Management Studio does not directly display the names of databases in the distributed availability group.
To view this information, run the following Transact-SQL script.
--View metadata and status of the Distributed Availability Group SELECT r.replica_server_name, r.endpoint_url, rs.connected_state_desc, rs.role_desc, rs.operational_state_desc, rs.recovery_health_desc,rs.synchronization_health_desc, r.availability_mode_desc, r.failover_mode_desc FROM sys.dm_hadr_availability_replica_states rs INNER JOIN sys.availability_replicas r ON rs.replica_id=r.replica_id ORDER BY r.replica_server_name
Note: To accommodate potential network latency between regions, we have configured the primary and secondary Always On availability group with asynchronous-commit replication. This minimizes performance overhead on the primary database. Within each Always On availability group, we have opted for synchronous-commit replication among replicas to ensure high availability. However, for failover between asynchronous-commit replicas (in case of distributed availability group), reducing data loss requires a temporary switch to synchronous-commit mode before initiating the failover. For the failover_mode, the only available mode for distributed availability group is manual.
Task 2: Failover Procedure for Distributed Availability Group
In this task, we will talk about failover between the two Always On availability groups. Failover procedure of the database is composed by the following easy steps and checks.
-
Initial checks.
-
Change the availability mode from asynchronous to synchronous, for the primary Always On availability group and the secondary Always On availability group.
-
Run scripts to check if the synchronization is fine.
-
Change the role of the primary Always On availability group from primary to secondary.
-
Failover to the secondary Always On availability group.
-
Change the availability mode from synchronous to asynchronous, for the primary Always On availability group and the secondary Always On availability group.
Follow the steps:
-
Run the following scripts to check if synchronization is fine, first on the currently primary SQL of the currently primary site and then on the primary SQL of the secondary site.
The executed results needs to be
CONNECTED_STATE
=CONNECTED
andSYNCHRONIZATION_HEALTH
=HEALTHY
.select ag.name, ag.is_distributed, ar.replica_server_name, ar.availability_mode_desc, ars.connected_state_desc, ars.role_desc, ars.operational_state_desc, ars.synchronization_health_desc from sys.availability_groups ag join sys.availability_replicas ar on ag.group_id=ar.group_id left join sys.dm_hadr_availability_replica_states ars on ars.replica_id=ar.replica_id where ag.is_distributed=1 GO
-
Change the availability mode. Run the following script first on the currently primary SQL Server of the currently primary site and then on the primary SQL Server of the secondary site.
--Run this first on the primary replica of the primary AOAG and then again on the secondary AOAG USE MASTER; ALTER AVAILABILITY GROUP [distributed-aoag] MODIFY AVAILABILITY GROUP ON 'paris-aoag' WITH ( AVAILABILITY_MODE = SYNCHRONOUS_COMMIT ), 'marseille-aoag' WITH ( AVAILABILITY_MODE = SYNCHRONOUS_COMMIT );
-
To ensure no data loss, verify the results of this step. The status must be
SYNCHRONIZED
, and thelast_hardened_lsn
should match for each database on both the global primary and the forwarder.-- Run this query on the Global Primary and the forwarder -- Check the results to see if synchronization_state_desc is SYNCHRONIZED, and the last_hardened_lsn is the same per database on both the global primary and forwarder -- If not rerun the query on both side every 5 seconds until it is the case -- SELECT ag.name , drs.database_id , db_name(drs.database_id) as database_name , drs.group_id , drs.replica_id , drs.synchronization_state_desc , drs.last_hardened_lsn FROM sys.dm_hadr_database_replica_states drs INNER JOIN sys.availability_groups ag on drs.group_id = ag.group_id;
-
You are now ready to set
REQUIRED_SYNCHRONIZED_SECONDARIES_TO_COMMIT
on the current primary SQL Server at the current primary site.--Run this script into Primary AOAG USE MASTER; ALTER AVAILABILITY GROUP [distributed-aoag] SET (REQUIRED_SYNCHRONIZED_SECONDARIES_TO_COMMIT = 1);
And now you are ready to change the role of the primary Always On availability group from primary to secondary.
--Run this script into Primary AOAG, this will terminate client applications connected to the primary replica of the primary AOAG USE MASTER; ALTER AVAILABILITY GROUP [distributed-aoag] SET (ROLE = SECONDARY);
-
Change the role of the secondary Always On availability group from secondary to primary. The following script will perform this role change, enabling the database for read or write operations. Additionally, the roles of the Always On availability groups within the distributed availability group will also be updated accordingly.
--Run this script into Secondary AOAG, this will terminate client applications connected to the primary replica of the primary AOAG ALTER AVAILABILITY GROUP [distributed-aoag] FORCE_FAILOVER_ALLOW_DATA_LOSS;
Run the following script to check the status.
--check the status on the new primary (formerly standby site) select ag.name, ag.is_distributed, ar.replica_server_name, ar.availability_mode_desc, ars.connected_state_desc, ars.role_desc, ars.operational_state_desc, ars.synchronization_health_desc from sys.availability_groups ag join sys.availability_replicas ar on ag.group_id=ar.group_id left join sys.dm_hadr_availability_replica_states ars on ars.replica_id=ar.replica_id where ag.is_distributed=1 GO
-
Now on the new primary SQL Server, unset
REQUIRED_SYNCHRONIZED_SECONDARIES_TO_COMMIT
.--Run this script into Primary AOAG ALTER AVAILABILITY GROUP [distributed-aoag] SET (REQUIRED_SYNCHRONIZED_SECONDARIES_TO_COMMIT = 0);
Change the availability mode back to the standard mode by running the following script on both the primary and secondary sites.
--Run this first on the primary replica of the primary AOAG and then again on the secondary AOAG ALTER AVAILABILITY GROUP [distributed-aoag] MODIFY AVAILABILITY GROUP ON 'paris-aoag' WITH ( AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT ), 'marseille-aoag' WITH ( AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT );
Task 3: Failback Procedure for Distributed Always On Availability Group
To restore Paris as the primary site and Marseille as the secondary site, simply perform a new switchover to reverse the synchronization, as outlined in Task 2.
Related Links
Acknowledgments
- Authors - Alessandro Volpi (Cloud Solution Specialist)
More Learning Resources
Explore other labs on docs.oracle.com/learn or access more free learning content on the Oracle Learning YouTube channel. Additionally, visit education.oracle.com/learning-explorer to become an Oracle Learning Explorer.
For product documentation, visit Oracle Help Center.
Deploy Microsoft SQL Server Distributed Always On Availability Groups for DR on OCI
G23214-01
December 2024