5.19 mysqlrplsync — Replication synchronization checker

This utility permits you to check replication servers for synchronization. It checks data consistency between a master and slaves or between two slaves. The utility reports missing objects as well as missing data.

The utility can operate on an active replication topology, applying a synchronization process to check the data. Those servers where replication is not active can still be checked but the synchronization process will be skipped. In that case, it is up to the user to manually synchronize the servers.

The user must provide connection parameters for the servers. That is, the utility requires the master and slaves using the --master and --slaves options. To compare only slaves, the user need only provide the --slaves option.

The utility also provides a feature to discover slaves connected to the master using the --discover-slaves-login and --master options. To use the discover slaves feature, all slaves must use the following startup options; --report-host and --report-port to specify the correct hostname and port of the server. If these are missing or report the incorrect information, the slave may not be discovered and therefore not included in the synchronization check. The discover slaves feature ignores slaves that cannot be reached.

By default, all data is included in the comparison. To check specific databases or tables, list each element as a separated argument for the utility using fully qualified names. The user can also choose to exclude some databases or tables from the check using the --exclude option.

The utility also provides some important features that allow users to adjust the execution of the consistency check to their system. For example, the user may wish the utility to minimize execution of the synchronization process. To do so, the user uses the --rpl-timeout to define the maximum time for each slave to synchronize. More specifically, allow slaves to catch up with the master in order to compare the data. During this waiting step, the slaves status is periodically polled according to a predefined time interval. This polling interval to verify if the slaves are synced can be adjusted with the --interval option. A checksum query is used to compare the data of each table between servers. The checksum calculation step is skipped if its execution exceeds a predefined time, avoiding undesirable performance impacts on the target system if it takes too long to execute. The user can change the checksum timeout using the --checksum-timeout option.

Users can also use the --verbose option to see additional information when the utility executes.

This utility is designed to work exclusively for servers that support global transaction identifiers (GTIDs) and have gtid_mode=ON. Servers with GTID disabled will be skipped by the utility. See Replication with Global Transaction Identifiers, for more information about GTID.

The utility takes into consideration the use of replication filtering rules on the servers skipping the check for filtered databases and tables according to the defined options. Nevertheless, the use of replication filters can still lead to data consistency issues depending on how statements are evaluated. See How Servers Evaluate Replication Filtering Rules, for more information.

OPTIONS

mysqlrplsync accepts the following command-line options:

NOTES

The data consistency check is performed per table using a checksum table. If the calculated checksum differs, it indicates the tables are not synchronized. Nevertheless, since the checksum operation is not collision free, there is a very small probability that two different tables can produce the same checksum.

Mixing IP and hostnames is not recommended. The replication-specific utilities will attempt to compare hostnames and IP addresses as aliases for checking slave connectivity to the master. However, if your installation does not support reverse name lookup, the comparison could fail. Without the ability to do a reverse name lookup, the replication utilities could report a false negative that the slave is (not) connected to the master.

For example, if you setup replication using MASTER_HOST=ubuntu.net on the slave and later connect to the slave with mysqlrplcheck and have the master specified as --master=192.168.0.6 using the valid IP address for ubuntu.net, you must have the ability to do a reverse name lookup to compare the IP (192.168.0.6) and the hostname (ubuntu.net) to determine if they are the same machine.

Similarly, in order to avoid issues mixing local IP '127.0.0.1' with 'localhost', all the addresses '127.0.0.1' will be internally converted to 'localhost' by the utility.

The path to the MySQL client tools should be included in the PATH environment variable in order to use the authentication mechanism with login-paths. This will allow the utility to use the my_print_defaults tools which is required to read the login-path values from the login configuration file (.mylogin.cnf).

LIMITATIONS

This utility is designed to work exclusively for servers that support global transaction identifiers (GTIDs) and have gtid_mode=ON. Due to known server issues with some operations required for the synchronization process, only MySQL Server versions 5.6.14 and higher are supported by this utility.

Some replication filtering options are not supported by this utility due to known issues on the server side, namely: replicate_do_db, replicate_ignore_db, and replicate_wild_do_table. In case a non supported replication filtering option is detected on a server, the utility issues an appropriate error and exits. This check is performed at the beginning when the utility starts.

EXAMPLES

To check the data consistency on an active replication system explicitly specifying the master and slaves:

shell> mysqlrplsync --master=user:pass@localhost:3310 \
          --slaves=rpl:pass@localhost:3311,rpl:pass@localhost:3312
#
# GTID differences between Master and Slaves:
# - Slave 'localhost@3311' is 15 transactions behind Master.
# - Slave 'localhost@3312' is 12 transactions behind Master.
#
# Checking data consistency.
#
# Using Master 'localhost@3310' as base server for comparison.
# Checking 'test_rplsync_db' database...
# - Checking 't0' table data...
#   [OK] `test_rplsync_db`.`t0` checksum for server 'localhost@3311'.
#   [OK] `test_rplsync_db`.`t0` checksum for server 'localhost@3312'.
# - Checking 't1' table data...
#   [OK] `test_rplsync_db`.`t1` checksum for server 'localhost@3311'.
#   [OK] `test_rplsync_db`.`t1` checksum for server 'localhost@3312'.
# Checking 'test_db' database...
# - Checking 't0' table data...
#   [OK] `test_db`.`t0` checksum for server 'localhost@3311'.
#   [OK] `test_db`.`t0` checksum for server 'localhost@3312'.
# - Checking 't1' table data...
#   [OK] `test_db`.`t1` checksum for server 'localhost@3311'.
#   [OK] `test_db`.`t1` checksum for server 'localhost@3312'.
#
#...done.
#
# SUMMARY: No data consistency issue found.
#

To check the data consistency on an active replication system using slave discovery:

shell> mysqlrplsync --master=user:pass@localhost:3310 \
          --discover-slaves-login=rpl:pass
# Discovering slaves for master at localhost:3310
# Discovering slave at localhost:3311
# Found slave: localhost:3311
# Discovering slave at localhost:3312
# Found slave: localhost:3312
#
# GTID differences between Master and Slaves:
# - Slave 'localhost@3311' is 15 transactions behind Master.
# - Slave 'localhost@3312' is 15 transactions behind Master.
#
# Checking data consistency.
#
# Using Master 'localhost@3310' as base server for comparison.
# Checking 'test_rplsync_db' database...
# - Checking 't0' table data...
#   [OK] `test_rplsync_db`.`t0` checksum for server 'localhost@3311'.
#   [OK] `test_rplsync_db`.`t0` checksum for server 'localhost@3312'.
# - Checking 't1' table data...
#   [OK] `test_rplsync_db`.`t1` checksum for server 'localhost@3311'.
#   [OK] `test_rplsync_db`.`t1` checksum for server 'localhost@3312'.
# Checking 'test_db' database...
# - Checking 't0' table data...
#   [OK] `test_db`.`t0` checksum for server 'localhost@3311'.
#   [OK] `test_db`.`t0` checksum for server 'localhost@3312'.
# - Checking 't1' table data...
#   [OK] `test_db`.`t1` checksum for server 'localhost@3311'.
#   [OK] `test_db`.`t1` checksum for server 'localhost@3312'.
#
#...done.
#
# SUMMARY: No data consistency issue found.
#

To check the data consistency on an active replication system, but only between specific slaves:

shell> mysqlrplsync --slaves=rpl:pass@localhost:3311,rpl:pass@localhost:3312
#
# Checking data consistency.
#
# Using Slave 'localhost@3311' as base server for comparison.
# Checking 'test_rplsync_db' database...
# - Checking 't0' table data...
#   [OK] `test_rplsync_db`.`t0` checksum for server 'localhost@3312'.
# - Checking 't1' table data...
#   [OK] `test_rplsync_db`.`t1` checksum for server 'localhost@3312'.
# Checking 'test_db' database...
# - Checking 't0' table data...
#   [OK] `test_db`.`t0` checksum for server 'localhost@3312'.
# - Checking 't1' table data...
#   [OK] `test_db`.`t1` checksum for server 'localhost@3312'.
#
#...done.
#
# SUMMARY: No data consistency issue found.
#

To check the data consistency of a specific database and table on an active replication system:

shell> mysqlrplsync --master=user:pass@localhost:3310 \
          --slaves=rpl:pass@localhost:3311,rpl:pass@localhost:3312 \
          test_rplsync_db test_db.t1
#
# GTID differences between Master and Slaves:
# - Slave 'localhost@3311' is 15 transactions behind Master.
# - Slave 'localhost@3312' is 12 transactions behind Master.
#
# Checking data consistency.
#
# Using Master 'localhost@3310' as base server for comparison.
# Checking 'test_rplsync_db' database...
# - Checking 't0' table data...
#   [OK] `test_rplsync_db`.`t0` checksum for server 'localhost@3311'.
#   [OK] `test_rplsync_db`.`t0` checksum for server 'localhost@3312'.
# - Checking 't1' table data...
#   [OK] `test_rplsync_db`.`t1` checksum for server 'localhost@3311'.
#   [OK] `test_rplsync_db`.`t1` checksum for server 'localhost@3312'.
# Checking 'test_db' database...
# - Checking 't1' table data...
#   [OK] `test_db`.`t1` checksum for server 'localhost@3311'.
#   [OK] `test_db`.`t1` checksum for server 'localhost@3312'.
#
#...done.
#
# SUMMARY: No data consistency issue found.
#

To check the data consistency on an active replication system excluding a specific database and table:

shell> mysqlrplsync --master=user:pass@localhost:3310 \
          --slaves=rpl:pass@localhost:3311,rpl:pass@localhost:3312 \
          --exclude=test_rplsync_db,test_db.t1
#
# GTID differences between Master and Slaves:
# - Slave 'localhost@3311' is 15 transactions behind Master.
# - Slave 'localhost@3312' is 12 transactions behind Master.
#
# Checking data consistency.
#
# Using Master 'localhost@3310' as base server for comparison.
# Checking 'test_db' database...
# - Checking 't0' table data...
#   [OK] `test_db`.`t0` checksum for server 'localhost@3311'.
#   [OK] `test_db`.`t0` checksum for server 'localhost@3312'.
#
#...done.
#
# SUMMARY: No data consistency issue found.
#

The following is an example of a replication check that has data inconsistencies:

shell> mysqlrplsync --master=user:pass@localhost:3310 \
          --slaves=rpl:pass@localhost:3311,rpl:pass@localhost:3312
#
# GTID differences between Master and Slaves:
# - Slave 'localhost@3311' is up-to-date.
# - Slave 'localhost@3312' is up-to-date.
#
# Checking data consistency.
#
# Using Master 'localhost@3310' as base server for comparison.
# [DIFF] Database NOT on base server but found on 'localhost@3311': only_on_slave_db
# Checking 'test_rplsync_db' database...
#   [DIFF] Table NOT on base server but found on 'localhost@3311': t3
#   [DIFF] Table NOT on base server but found on 'localhost@3312': t3
#   [DIFF] Table 'test_rplsync_db.t0' NOT on server 'localhost@3311'.
# - Checking 't0' table data...
#   [DIFF] `test_rplsync_db`.`t0` checksum for server 'localhost@3312'.
# - Checking 't1' table data...
#   WARNING: Slave not active 'localhost@3311' - Sync skipped.
#   [DIFF] `test_rplsync_db`.`t1` checksum for server 'localhost@3311'.
#   [OK] `test_rplsync_db`.`t1` checksum for server 'localhost@3312'.
# - Checking 't2' table data...
#   WARNING: Slave not active 'localhost@3311' - Sync skipped.
#   [OK] `test_rplsync_db`.`t2` checksum for server 'localhost@3311'.
#   [OK] `test_rplsync_db`.`t2` checksum for server 'localhost@3312'.
# Checking 'only_on_master_db' database...
#   [DIFF] Database 'only_on_master_db' NOT on server 'localhost@3311'.
#   [DIFF] Database 'only_on_master_db' NOT on server 'localhost@3312'.
#
#...done.
#
# SUMMARY: 8 data consistency issues found.
#

Check a replication topology with filtering:

shell> mysqlrplsync --master=user:pass@localhost:3310 \
          --slaves=rpl:pass@localhost:3311,rpl:pass@localhost:3312 \
          --verbose
# Checking users permission to perform consistency check.
#
# WARNING: Replication filters found on checked servers. This can lead data consistency issues depending on how statements are evaluated.
# More information: http://dev.mysql.com/doc/en/replication-rules.html
# Master 'localhost@3310':
#   - binlog_do_db: test_rplsync_db1
# Slave 'localhost@3311':
#   - replicate_do_table: test_rplsync_db1.t1
# Slave 'localhost@3312':
#   - replicate_ignore_table: test_rplsync_db1.t2
#   - replicate_wild_ignore_table: test\_rplsync\_db1.%3
#
# GTID differences between Master and Slaves:
# - Slave 'localhost@3311' is up-to-date.
# - Slave 'localhost@3312' is up-to-date.
#
# Checking data consistency.
#
# Using Master 'localhost@3310' as base server for comparison.
# Checking 'test_rplsync_db1' database...
# [SKIP] Table 't0' check for 'localhost@3311' - filtered by replication rule.
# - Checking 't0' table data...
#   Setting data synchronization point for slaves.
#   Compute checksum on slaves (wait to catch up and resume replication).
#   [OK] `test_rplsync_db1`.`t0` checksum for server 'localhost@3312'.
# - Checking 't1' table data...
#   Setting data synchronization point for slaves.
#   Compute checksum on slaves (wait to catch up and resume replication).
#   [OK] `test_rplsync_db1`.`t1` checksum for server 'localhost@3311'.
#   [OK] `test_rplsync_db1`.`t1` checksum for server 'localhost@3312'.
# [SKIP] Table 't2' check for 'localhost@3311' - filtered by replication rule.
# [SKIP] Table 't2' check for 'localhost@3312' - filtered by replication rule.
# [SKIP] Table 't3' check for 'localhost@3311' - filtered by replication rule.
# [SKIP] Table 't3' check for 'localhost@3312' - filtered by replication rule.
# [SKIP] Database 'test_rplsync_db0' check - filtered by replication rule.
# [SKIP] Database 'test_rplsync_db2' check - filtered by replication rule.
# [SKIP] Database 'test_rplsync_db3' check - filtered by replication rule.
#
#...done.
#
# SUMMARY: No data consistency issue found.
#

PERMISSIONS REQUIRED

The user for the master must have permissions to lock tables, perform the checksum, and get information about the master status. Specifically, the user used to connect to the master requires the following privileges: SUPER or REPLICATION CLIENT, LOCK TABLES and SELECT.

The user for the slaves must have permissions to start/stop the slave, perform the checksum, and get information about the slave status. More specifically, the login user to connect to slaves requires the following privileges: SUPER and SELECT.