MySQL 8.0 Reference Manual Including MySQL NDB Cluster 8.0

19.5.1.34 Replication and Transaction Inconsistencies

Inconsistencies in the sequence of transactions that have been executed from the relay log can occur depending on your replication configuration. This section explains how to avoid inconsistencies and solve any problems they cause.

The following types of inconsistencies can exist:

The following scenarios are relevant to the existence of half-applied transactions, gaps, and source binary log position lag:

  1. While replication threads are running, there may be gaps and half-applied transactions.

  2. mysqld shuts down. Both clean and unclean shutdown abort ongoing transactions and may leave gaps and half-applied transactions.

  3. KILL of replication threads (the SQL thread when using a single-threaded replica, the coordinator thread when using a multithreaded replica). This aborts ongoing transactions and may leave gaps and half-applied transactions.

  4. Error in applier threads. This may leave gaps. If the error is in a mixed transaction, that transaction is half-applied. When using a multithreaded replica, workers which have not received an error complete their queues, so it may take time to stop all threads.

  5. STOP REPLICA when using a multithreaded replica. After issuing STOP REPLICA, the replica waits for any gaps to be filled and then updates Exec_master_log_pos. This ensures it never leaves gaps or source binary log position lag, unless any of the cases above applies, in other words, before STOP REPLICA completes, either an error happens, or another thread issues KILL, or the server restarts. In these cases, STOP REPLICA returns successfully.

  6. If the last transaction in the relay log is only half-received and the multithreaded replica's coordinator thread has started to schedule the transaction to a worker, then STOP REPLICA waits up to 60 seconds for the transaction to be received. After this timeout, the coordinator gives up and aborts the transaction. If the transaction is mixed, it may be left half-completed.

  7. STOP REPLICA when the ongoing transaction updates transactional tables only, in which case it is rolled back and STOP REPLICA stops immediately. If the ongoing transaction is mixed, STOP REPLICA waits up to 60 seconds for the transaction to complete. After this timeout, it aborts the transaction, so it may be left half-completed.

The global setting for the system variable rpl_stop_replica_timeout (from MySQL 8.0.26) or rpl_stop_slave_timeout (before MySQL 8.0.26) is unrelated to the process of stopping the replication threads. It only makes the client that issues STOP REPLICA return to the client, but the replication threads continue to try to stop.

If a replication channel has gaps, it has the following consequences:

  1. The replica database is in a state that may never have existed on the source.

  2. The field Exec_master_log_pos in SHOW REPLICA STATUS is only a low-water mark. In other words, transactions appearing before the position are guaranteed to have committed, but transactions after the position may have committed or not.

  3. CHANGE REPLICATION SOURCE TO and CHANGE MASTER TO statements for that channel fail with an error, unless the applier threads are running and the statement only sets receiver options.

  4. If mysqld is started with --relay-log-recovery, no recovery is done for that channel, and a warning is printed.

  5. If mysqldump is used with --dump-replica or --dump-slave, it does not record the existence of gaps; thus it prints CHANGE REPLICATION SOURCE TO | CHANGE MASTER TO with RELAY_LOG_POS set to the low-water mark position in Exec_master_log_pos.

    After applying the dump on another server, and starting the replication threads, transactions appearing after the position are replicated again. Note that this is harmless if GTIDs are enabled (however, in that case it is not recommended to use --dump-replica or --dump-slave).

If a replication channel has source binary log position lag but no gaps, cases 2 to 5 above apply, but case 1 does not.

The source binary log position information is persisted in binary format in the internal table mysql.slave_worker_info. START REPLICA [SQL_THREAD] always consults this information so that it applies only the correct transactions. This remains true even if replica_parallel_workers or slave_parallel_workers has been changed to 0 before START REPLICA, and even if START REPLICA is used with UNTIL clauses. START REPLICA UNTIL SQL_AFTER_MTS_GAPS only applies as many transactions as needed in order to fill in the gaps. If START REPLICA is used with UNTIL clauses that tell it to stop before it has consumed all the gaps, then it leaves remaining gaps.

Warning

RESET REPLICA removes the relay logs and resets the replication position. Thus issuing RESET REPLICA on a multithreaded replica with gaps means the replica loses any information about the gaps, without correcting the gaps. In this situation, if binary log position based replication is in use, the recovery process fails.

When GTID-based replication is in use (GTID_MODE=ON) and SOURCE_AUTO_POSITION is set for the replication channel using the CHANGE REPLICATION SOURCE TO statement, the old relay logs are not required for the recovery process. Instead, the replica can use GTID auto-positioning to calculate what transactions it is missing compared to the source. From MySQL 8.0.26, the process used for binary log position based replication to resolve gaps on a multithreaded replica is skipped entirely when GTID-based replication is in use. When the process is skipped, a START REPLICA UNTIL SQL_AFTER_MTS_GAPS statement behaves differently, and does not attempt to check for gaps in the sequence of transactions. You can also issue CHANGE REPLICATION SOURCE TO statements, which are not permitted on a non-GTID replica where there are gaps.