Chapter 6 Configuring Geo-replication

Geo-replication provides a facility to mirror data across geographically distributed clusters for the purpose of disaster recovery. The mirroring process is asynchronous, in that it runs periodically and only files that have been modified are replicated. This feature can be used across a LAN, a WAN, or the Internet. Since replication is asynchronous, the architecture follows a master-slave model, where changes on the master volume are replicated to a slave volume. In the event of a disaster, data can be restored from a slave volume.

This chapter describes how geo-replication works, the minimum requirements to implement it and what you need to do to configure your environment to achieve it. Some general usage commands are also covered here.

Note

Version 6 of Gluster uses the terms Master and Slave, which are replaced with the less divisive terms Primary and Secondary in later versions of Gluster. To avoid confusion, this document continues to use the terms Master and Slave until the new versions of Gluster become available.

6.1 About Geo-replication

Geo-replication's primary use case is for disaster recovery. It performs asynchronous copies of data from one Gluster volume to another hosted in a separate geographically distinct cluster. Copies are performed using a backend rsync process over an SSH connection.

Geo-replication assumes that there is a master cluster where data is hosted in a configured volume. Changes made to data on the master cluster are copied, periodically, to a volume configured on a slave cluster.

In the event of failure of the master cluster, data can be restored from the slave, or the slave can be promoted to replace the master cluster.

Gluster is agnostic about the actual physical location or network between separate clusters where geo-replication is configured. You can configure geo-replication on the same LAN, on a WAN, across the Internet, or even between virtual machines hosted on the same system. As long as the clusters involved can access each other via SSH across a network, geo-replication can be configured.

Geo-replication also supports multi-site mirroring, where data from a master cluster is copied to multiple slave clusters, and cascading mirroring, where data that is replicated to a slave cluster can in turn be mirrored to another slave cluster.

In this documentation, we focus on a basic, tested configuration where data on a standard Gluster volume is geo-replicated to a volume hosted on a single slave cluster.

For more information on geo-replication, see the upstream documentation at https://docs.gluster.org/en/latest/Administrator%20Guide/Geo%20Replication/

6.2 General Requirements for Geo-replication

Geo-replication requires at least two Gluster clusters to be configured to the level where a volume is accessible on each cluster. One cluster is treated as the master cluster, and the second is treated as the slave. The master cluster may already be in use prior to the configuration of a slave cluster.

Clusters must be set up and configured following the requirements set out in Chapter 2, Installing Gluster Storage for Oracle Linux. Notably, all nodes in both clusters must have network synchronized time using an NTP service, such as chrony.
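For example, if you are using chrony, a quick check on each node is to confirm that the chronyd service is running and that the clock is tracking an NTP source. This is a general sanity check rather than a required step:

$ sudo systemctl is-active chronyd
$ chronyc tracking

The chronyc tracking output reports the current reference source and the system clock offset; correct any node that is not synchronized before configuring geo-replication.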

Geo-replication is performed over SSH between the master cluster and the slave cluster. The two clusters should be able to connect to each other via SSH. Access should be available for all nodes in all clusters that take part in geo-replication, so that failover between nodes is handled appropriately. Networking and firewall configuration must facilitate this.
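For example, a simple way to confirm basic connectivity is to open an SSH session from a node on the master cluster to a node on the slave cluster, using any account that can currently log in. The hostname slave-node1.example.com matches the example slave node used later in this chapter; substitute your own:

$ ssh slave-node1.example.com exit
$ echo $?

An exit status of 0 confirms that an SSH session could be established. If the connection fails, check that the sshd service is running on the slave node and that the firewall on that node permits SSH, for example by reviewing the output of sudo firewall-cmd --list-services.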

Ensure that the slave cluster volume that you are using for mirroring is clean of data and that it has sufficient capacity to mirror the content on the master cluster.

Install the glusterfs-geo-replication package on all nodes in both clusters. For example:

$ sudo yum install glusterfs-geo-replication

Other general software requirements are met by default on most Oracle Linux systems, but you should check that you have the latest available versions of the following packages (see the example check after this list):

  • rsync

  • openssh

  • glusterfs

  • glusterfs-fuse
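
For example, you can query the installed versions with rpm and update the packages if newer versions are available in the configured repositories:

$ rpm -q rsync openssh glusterfs glusterfs-fuse glusterfs-geo-replication
$ sudo yum update rsync openssh glusterfs glusterfs-fuse glusterfs-geo-replication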

Important

When using geo-replication in conjunction with the Gluster snapshot functionality, you must ensure that snapshots on the master and slave nodes do not get out of order when they are restored. Therefore, when taking a snapshot of a volume on the master node, pause geo-replication first and take the snapshot on both the master node and on all slave nodes before resuming geo-replication. Equally, when restoring from a snapshot, pause geo-replication and perform the restore on all nodes before resuming geo-replication.
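
The following is a sketch of that ordering, using the myvolume, slavevol, georep and slave-node1.example.com names that appear later in this chapter; the snapshot names snap1-master and snap1-slave are only examples. On a node in the master cluster:

$ sudo gluster volume geo-replication myvolume georep@slave-node1.example.com::slavevol \
   pause
$ sudo gluster snapshot create snap1-master myvolume

On a node in the slave cluster:

$ sudo gluster snapshot create snap1-slave slavevol

Then, back on the master cluster, resume geo-replication:

$ sudo gluster volume geo-replication myvolume georep@slave-node1.example.com::slavevol \
   resume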

6.3 Setting up Slave Nodes for Geo-replication

It is possible for the geo-replication service to connect from the master cluster to the slave cluster using the root account. However, for this to happen, the SSH connections that are used to synchronize the data must connect as the root user, and exposing SSH access for the root user is not good security practice. On production systems, it is preferable to create a user and group specifically for the purpose of handling these connections on each of the slave node systems. For example:

$ sudo useradd georep

Substitute georep with a username that you intend to use for this purpose. You can either rely on the group that is created for this user or, if you want to use an alternate group, create a separate group and add the new user to it.

Note that on at least one slave host you should set the password for this user, so that you are able to copy an ssh key to the host later during the configuration.
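
For example, to set a password for the new user and, optionally, to create a dedicated group for it (the group name georep-users below is only an example):

$ sudo passwd georep
$ sudo groupadd georep-users
$ sudo usermod -a -G georep-users georep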

You can now configure the gluster mountbroker to automatically handle mounting the volume that you intend to use for mirroring with the appropriate permissions. On any single slave node system, you can run the gluster-mountbroker command to do this:

  1. Set up the gluster mountbroker for the new account that you have created:

    $ sudo gluster-mountbroker setup /var/mountbroker-root georep

    This command sets up a root folder for all volumes handled by the mountbroker. Typically, this is set to /var/mountbroker-root, but you can set it to any location on your slave nodes. Note that the directory is created on all nodes when the command is run. Substitute georep with the group that should have permission to this folder. Usually, this matches the username that you created for this purpose, but if more than one user may access this data, you may want to define a broader group.

  2. Add a volume on your existing slave cluster to the mountbroker.

    $ sudo gluster-mountbroker add slavevol georep

    Substitute slavevol with the name of the volume that you intend to use on your slave cluster. Substitute georep with the name of the user that you created for this purpose.

  3. Check the status of the mountbroker to determine whether everything is set up correctly:

    $ sudo gluster-mountbroker status

Once the mountbroker is configured, restart the glusterd service on all of the slave cluster nodes:

$ sudo systemctl restart glusterd

This step ensures that glusterd becomes aware of the mountbroker configuration.
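
As a quick check after the restart, you can confirm on each slave node that the service is active again and that the mountbroker root directory exists. The path below assumes the /var/mountbroker-root location used in the earlier example:

$ sudo systemctl is-active glusterd
$ ls -ld /var/mountbroker-root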

6.4 Configuring the Geo-replication Session

On the master cluster, you must create the ssh key and ensure that it is added to the authorized_keys on each node on the slave cluster.

For this to work smoothly, an initial ssh key should be created without a passphrase on a node in the master cluster. This key should be copied to one node on the slave cluster. On a node in the master cluster, run the following command:

$ sudo ssh-keygen

When prompted for a passphrase, leave this field empty and press Enter.

Once the key is created, copy it to a node in your slave cluster for the georep user. For example:

$ sudo ssh-copy-id georep@slave-node1.example.com

On the same master cluster node where you created the ssh key, run the following command:

$ sudo gluster-georep-sshkey generate

This step creates a separate key that is copied to all of the master nodes and that will be used for subsequent communications with any slave nodes during geo-replication.
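
If you want to confirm that the keys were generated, they are typically stored under the glusterd working directory on each master node, although the exact file names can vary between Gluster releases:

$ sudo ls /var/lib/glusterd/geo-replication/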

To finally configure a geo-replication session, run the following command on the same node on the master cluster:

$ sudo gluster volume geo-replication myvolume georep@slave-node1.example.com::slavevol \
   create push-pem

Substitute myvolume with the name of the volume on the master cluster that you wish to replicate. Substitute georep with the username that you have configured for SSH access on the slave cluster. Substitute slave-node1.example.com with the hostname, domain name or IP address of the node on the slave cluster where you copied an SSH key. Substitute slavevol with the name of the volume on the slave cluster that you intend to use for replication.

When the command runs, it copies the public keys for the master cluster nodes to the slave cluster node. On the slave node, you must run the following command to configure the environment to use these keys for the user that you have created there:

$ sudo /usr/libexec/glusterfs/set_geo_rep_pem_keys.sh georep myvolume slavevol
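
At this point the session exists but has not yet been started. You can confirm this from any node in the master cluster; a newly created session is reported with a Created status:

$ sudo gluster volume geo-replication myvolume georep@slave-node1.example.com::slavevol \
   status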

6.5 Starting, Stopping and Checking Status of Geo-replication

Once a geo-replication session has been set up and configured, you can start replication. To do this, for example, run:

$ sudo gluster volume geo-replication myvolume georep@slave-node1.example.com::slavevol \
   start

Run the same command but substitute start with stop to stop geo-replication for the volume.

To see the status of all geo-replication sessions, you can run:

$ sudo gluster volume geo-replication status
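
You can also limit the output to a single session, and optionally request more detail, by naming the master volume and the slave. For example:

$ sudo gluster volume geo-replication myvolume georep@slave-node1.example.com::slavevol \
   status detail

The detail variant adds further columns, such as the crawl status and last-synced time, which can be useful when investigating a session that appears to be lagging.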

The status for any geo-replication session can indicate potential issues and shows which nodes in each cluster are being used for the geo-replication synchronization. The status may be set to one of the following:

  • Initializing: The session is starting and the first connections are being made.

  • Created: The geo-replication session is created, but not started.

  • Active: The gsync daemon on this node is active and syncing the data. Within each replica set on the master cluster, only one node is ever in the Active state and handling data synchronization. If the Active node fails, a Passive node is promoted to Active.

  • Passive: The node is part of the same replica set as an Active node. Passive nodes in the cluster are on standby to be promoted to Active status if the currently Active node fails.

  • Faulty: Faulty status can indicate a temporary or critical problem in the geo-replication session. In some cases, the Faulty status may rectify itself as the geo-replication service attempts to restart on a node. If a Faulty status persists, check the log files to try to resolve the issue. You can determine which log file you should look at using the config command. For example:

    $ sudo gluster volume geo-replication myvolume georep@slave-node1.example.com::slavevol \
       config log-file

  • Stopped: The geo-replication session is stopped.