MySQL Utilities

3.4.6 How Can you determine if data was replicated correctly?

Once the replication system is setup and running, it is not uncommon that one might want to verify if the data is being replicated correctly on the slaves. In normal circumstances, the same data is expected on the master and its slaves (excluding the use of filtering rules). Nevertheless, faults at the data level can introduce inconsistent changes on servers without raising any kind of error. These data inconsistencies can result from bugs, hardware malfunction, human errors, or unauthorized access.

It is desirable to detect these issues, in order to fix them and ultimately prevent them from happening again. Determining the cause of such issues might not be an easy task since it might be caused by byzantine failures at distinct levels. However, the first big step toward a solution to this kind of problem is being able to detect data inconsistency and make sure that the data among the replication servers is synchronized.


The goal is to execute the mysqlrplsync utility to detect data consistency issues on an active replication system making sure that the master and its slaves are synchronized.

Executing this task manually on an active system is difficult and sometimes tedious since changes may be continuously happening on all servers (in an asynchronous way) and the same data needs to be compared between servers. Moreover, it can introduce an undesirable and uncontrolled impact on the system performance if you lock the tables or stop replication.

Fortunately, the mysqlrplsync utility allows us to perform this task in an easy and optimized way with a controlled impact on the running system (limiting the execution time of all operations).

Let's assume that a replication topology with one master (server1:3310) and two slaves (server2:3311, server3:3312) was previously setup and it is running without errors.

Example Execution

Start the mysqlrplsync utility, specifying the servers you want to check.

shell> mysqlrplsync --master=user:pass@localhost:3310 \
# GTID differences between Master and Slaves:
# - Slave 'localhost@3311' is 15 transactions behind Master.
# - Slave 'localhost@3312' is 12 transactions behind Master.
# Checking data consistency.
# Using Master 'localhost@3310' as base server for comparison.
# Checking 'test_rplsync_db' database...
# - Checking 't0' table data...
#   [OK] `test_rplsync_db`.`t0` checksum for server 'localhost@3311'.
#   [OK] `test_rplsync_db`.`t0` checksum for server 'localhost@3312'.
# - Checking 't1' table data...
#   [OK] `test_rplsync_db`.`t1` checksum for server 'localhost@3311'.
#   [OK] `test_rplsync_db`.`t1` checksum for server 'localhost@3312'.
# Checking 'test_db' database...
# - Checking 't0' table data...
#   [OK] `test_db`.`t0` checksum for server 'localhost@3311'.
#   [OK] `test_db`.`t0` checksum for server 'localhost@3312'.
# - Checking 't1' table data...
#   [OK] `test_db`.`t1` checksum for server 'localhost@3311'.
#   [OK] `test_db`.`t1` checksum for server 'localhost@3312'.
# SUMMARY: No data consistency issue found.


The above example illustrates how to start the mysqlrplsync utility to check if all data on the specified replication topology is synchronized.

To do this, we simply need to specify the master's connection with the --master option, and the list of slaves with the --slaves option. As an alternative to the --slaves option, one can use the --discover-slaves-login specifying a user and password (or login-path) to connect to the slaves and the utility attempts to discover all of the slaves connected to the master using the specified login. For example, '--discover-slaves-login=root:secret' is used to discover all of the slaves and login to each using the 'root' user id and the password 'secret'.

The --discover-slaves-login can be very handy especially if there is a huge number of slaves in the topology, but bear in mind that the explicit specification of slaves is safer and that discovery can fail to find some servers. In particular, it is important to note that in order for slaves to be discovered, they must be started with the '--report-host' and '--report-port' options with appropriate values and they must be correctly connected to the master (IO thread running) otherwise discovery may fail to identify the slave.

In the above example, no data consistency issues were found. In case any data difference are found, each is clearly identified by the '[DIFF]' prefix followed by concise information of where and what is the difference. Additionally, at the end the utility displays a summary of the number of issues found.

The utility also allows users to check consistency on the slaves without specifying the master. However, be advised that only checking the slaves does not guarantee that there is no data consistency issue between the master and the slaves. Also keep in mind that the results provided by the utility are valid at the time the checks are actually performed for each table. This is because in an active system with data continuously changing, inconstancy issues might be introduced in the immediate instance after the check is completed.

Permissions Required

The user for the master must have permissions to lock tables, perform the checksum, and get information about the master status. Specifically, the user used to connect to the master requires the following privileges: SUPER or REPLICATION CLIENT, LOCK TABLES and SELECT.

The user for the slaves must have permissions to start/stop the slave, perform the checksum, and get information about the slave status. More specifically, the login user to connect to slaves requires the following privileges: SUPER and SELECT.

Tips and Tricks

In the above example, the mysqlrplsync utility was used to check all the data on the servers. However, it is possible to check only specific databases and tables. For that purpose, the user only need specify the target database and tables as arguments when invoking the utility. It is also possible to exclude specific database and tables from the check using the --exclude option. For example, '--exclude=test_rplsync_db,test_db.t0' excludes the database 'test_rplsync_db' and table 'test_db.t0' from the check performed by the utility.

The utility provides important options to control the execution time of the checksum queries performed on each table and the waiting time for slaves to reach an established synchronization point, namely: the --checksum-timeout and --rpl-timeout options. A polling process is applied on each slave to periodically check if replication has caught up with the defined sync point (all transactions have been processed).

The periodic interval to perform this check can be adjusted with the --interval option. These options are fundamental to control the impact of the execution of the utility on the replication system allow you to limit the execution time of the checksum queries for large tables and the time slaves wait for replication to catch up. When the timeouts defined by those options are reached, the check is skipped. Nevertheless, the user can always execute the utility later only for the skipped tables using higher timeout values.

The utility provides the flexibility to be executed separately for different set of servers, only affecting different parts of the replication system at each time. For example, consider a heterogeneous system (where slaves have a different performance characteristics) with one master 'M' and three slaves 'S1', 'S2' and 'S3'. To minimize the impact on the master, the user can run the utility first for the master 'M' and the fastest slave 'S1', and then run it again only for the slaves 'S1', 'S2' and 'S3'. If no consistency issues are found in the first execution (M = S1) or in the second execution (S1 = S2 = S3), then by transitivity and due to the inclusion of the same server 'S1' in both checks, it can be said that there is no consistency issues in the topology (M = S1 = S2 = S3) at the time the first check was completed. This kind of execution must be performed sequentially and not concurrently, otherwise the synchronization process of each instance may affect the other and it may not work properly.