MySQL 5.7 Reference Manual Including MySQL NDB Cluster 7.5 and NDB Cluster 7.6
Inconsistencies in the sequence of transactions that have been executed from the relay log can occur depending on your replication configuration. This section explains how to avoid inconsistencies and solve any problems they cause.
The following types of inconsistencies can exist:
Half-applied transactions. A transaction which updates non-transactional tables has applied some but not all of its changes.
Gaps. A gap is a transaction that has
not been fully applied, even though some transaction later
in the sequence has been applied. Gaps can only appear when
using a multithreaded replica. To avoid gaps occurring, set
slave_preserve_commit_order=1
,
which requires
slave_parallel_type=LOGICAL_CLOCK
,
and that log-bin
and
log-slave-updates
are also
enabled. Note that
slave_preserve_commit_order=1
does not preserve the order of non-transactional DML
updates, so these might commit before transactions that
precede them in the relay log, which might result in gaps.
Source binary log position lag. Even in
the absence of gaps, it is possible that transactions after
Exec_master_log_pos
have been applied.
That is, all transactions up to point N
have been applied, and no transactions after
N
have been applied, but
Exec_master_log_pos
has a value smaller
than N
. In this situation,
Exec_master_log_pos
is a “low-water
mark” of the transactions applied, and lags behind
the position of the most recently applied transaction. This
can only happen on multithreaded replicas. Enabling
slave_preserve_commit_order
does not prevent source binary log position lag.
The following scenarios are relevant to the existence of half-applied transactions, gaps, and source binary log position lag:
While replication threads are running, there may be gaps and half-applied transactions.
mysqld shuts down. Both clean and unclean shutdown abort ongoing transactions and may leave gaps and half-applied transactions.
KILL
of replication threads
(the SQL thread when using a single-threaded replica, the
coordinator thread when using a multithreaded replica). This
aborts ongoing transactions and may leave gaps and
half-applied transactions.
Error in applier threads. This may leave gaps. If the error is in a mixed transaction, that transaction is half-applied. When using a multithreaded replica, workers which have not received an error complete their queues, so it may take time to stop all threads.
STOP SLAVE
when using a
multithreaded replica. After issuing
STOP SLAVE
, the replica waits
for any gaps to be filled and then updates
Exec_master_log_pos
. This ensures it
never leaves gaps or source binary log position lag, unless
any of the cases above applies, in other words, before
STOP SLAVE
completes, either
an error happens, or another thread issues
KILL
, or the server restarts.
In these cases, STOP SLAVE
returns successfully.
If the last transaction in the relay log is only
half-received and the multithreaded replica coordinator has
started to schedule the transaction to a worker, then
STOP SLAVE
waits up to 60
seconds for the transaction to be received. After this
timeout, the coordinator gives up and aborts the
transaction. If the transaction is mixed, it may be left
half-completed.
STOP SLAVE
when the ongoing
transaction updates transactional tables only, in which case
it is rolled back and STOP
SLAVE
stops immediately. If the ongoing
transaction is mixed, STOP
SLAVE
waits up to 60 seconds for the transaction
to complete. After this timeout, it aborts the transaction,
so it may be left half-completed.
The global variable
rpl_stop_slave_timeout
is
unrelated to the process of stopping the replication threads. It
only makes the client that issues STOP
SLAVE
return to the client, but the replication
threads continue to try to stop.
If a replication channel has gaps, it has the following consequences:
The replica database is in a state that may never have existed on the source.
The field Exec_master_log_pos
in
SHOW SLAVE STATUS
is only a
“low-water mark”. In other words, transactions
appearing before the position are guaranteed to have
committed, but transactions after the position may have
committed or not.
CHANGE MASTER TO
statements
for that channel fail with an error, unless the applier
threads are running and the CHANGE
MASTER TO
statement only sets receiver options.
If mysqld is started with
--relay-log-recovery
, no
recovery is done for that channel, and a warning is printed.
If mysqldump is used with
--dump-slave
, it does not
record the existence of gaps; thus it prints
CHANGE MASTER TO
with
RELAY_LOG_POS
set to the “low-water
mark” position in
Exec_master_log_pos
.
After applying the dump on another server, and starting the
replication threads, transactions appearing after the
position are replicated again. Note that this is harmless if
GTIDs are enabled (however, in that case it is not
recommended to use
--dump-slave
).
If a replication channel has source binary log position lag but no gaps, cases 2 to 5 above apply, but case 1 does not.
The source binary log position information is persisted in
binary format in the internal table
mysql.slave_worker_info
.
START SLAVE
[SQL_THREAD]
always consults this information so that
it applies only the correct transactions. This remains true even
if slave_parallel_workers
has
been changed to 0 before START
SLAVE
, and even if START
SLAVE
is used with UNTIL
clauses.
START SLAVE UNTIL
SQL_AFTER_MTS_GAPS
only applies as many transactions
as needed in order to fill in the gaps. If
START SLAVE
is used with
UNTIL
clauses that tell it to stop before it
has consumed all the gaps, then it leaves remaining gaps.
RESET SLAVE
removes the relay
logs and resets the replication position. Thus issuing
RESET SLAVE
on a replica with
gaps means the replica loses any information about the gaps,
without correcting the gaps.
When GTID-based replication is in use, from MySQL 5.7.28 a
multithreaded replica checks first whether
MASTER_AUTO_POSITION
is set to
ON
, and if it is, omits the step of
calculating the transactions that should be skipped or not
skipped. In that situation, the old relay logs are not
required for the recovery process.