MySQL 5.6 Reference Manual Including MySQL NDB Cluster 7.3-7.4 Reference Guide
For replication to be resilient to unexpected halts of the server (sometimes described as crash-safe), it must be possible for the replica to recover its state before halting. This section describes the impact of an unexpected halt of a replica during replication, and how to configure a replica for the best chance of recovery so that replication can continue.
After an unexpected halt of a replica, upon restart the replica's SQL thread must recover which transactions have already been executed. The information required for recovery is stored in the replica's applier metadata repository. In older MySQL Server versions, this log could only be created as a file in the data directory that was updated after the transaction had been applied. This carried the risk of losing synchrony with the source, depending on the stage of processing a transaction at which the replica halted, or even of corruption of the file itself. In MySQL 5.6 you can instead use an InnoDB table to store the applier metadata repository. Because this storage engine is transactional, the information is always recoverable upon restart. As a table, updates to the applier metadata repository are committed together with the transactions, meaning that the replica's progress information recorded in that repository is always consistent with what has been applied to the database, even in the event of an unexpected server halt.
To configure MySQL 5.6 to store the applier metadata repository as an InnoDB table, set the system variable relay_log_info_repository to TABLE. The server then stores the information required for the recovery of the replication SQL thread in the mysql.slave_relay_log_info table. For further information on the replication metadata repositories, see Section 17.2.2, “Relay Log and Replication Metadata Repositories”.
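As a sketch, the table-based repository can be selected in the server configuration file; the option names below are as documented for MySQL 5.6, and the master_info_repository line is an optional companion setting, not required by the recovery behavior described here:

```ini
# my.cnf sketch: store the applier metadata repository in the
# transactional mysql.slave_relay_log_info InnoDB table
[mysqld]
relay_log_info_repository = TABLE
master_info_repository    = TABLE   # optional: also store connection metadata in a table
```

Because relay_log_info_repository is a dynamic variable, it can also be changed at runtime while the replication SQL thread is stopped, for example with `STOP SLAVE SQL_THREAD;` followed by `SET GLOBAL relay_log_info_repository = 'TABLE';` and `START SLAVE SQL_THREAD;`.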
Exactly how a replica recovers from an unexpected halt is influenced by the chosen method of replication, whether the replica is single-threaded or multithreaded, the setting of variables such as relay_log_recovery, and whether features such as MASTER_AUTO_POSITION are being used.
The following table shows the impact of these different factors on how a single-threaded replica recovers from an unexpected halt.
Table 17.1 Factors Influencing Single-threaded Replica Recovery
| GTID | MASTER_AUTO_POSITION | relay_log_recovery | relay_log_info_repository | Crash type | Recovery guaranteed | Relay log impact |
|---|---|---|---|---|---|---|
| OFF | Not relevant | 1 | TABLE | Server | Yes | Lost |
| OFF | Not relevant | 1 | Any | OS | No | Lost |
| OFF | Not relevant | 0 | TABLE | Server | Yes | Remains |
| OFF | Not relevant | 0 | TABLE | OS | No | Remains |
| ON | ON | 1 | Not relevant | Not relevant | Yes | Lost |
| ON | OFF | 0 | TABLE | Server | Yes | Remains |
| ON | OFF | 0 | Any | OS | No | Remains |
As the table shows, when using a single-threaded replica the following configurations are most resilient to unexpected halts:
- When using GTIDs and MASTER_AUTO_POSITION, set relay_log_recovery=1. With this configuration the setting of relay_log_info_repository and other variables does not impact recovery. Note that to guarantee recovery, sync_binlog=1 must also be set on the replica, so that the replica's binary log is synchronized to disk at each write. Otherwise, committed transactions might not be present in the replica's binary log.

- When using file position based replication, set relay_log_recovery=1 and relay_log_info_repository=TABLE. During recovery the relay log is lost.
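As an illustration, the two recommended single-threaded setups might look like this in my.cnf. The GTID-related options in variant A reflect the MySQL 5.6 requirements for enabling gtid_mode=ON (binary logging with log_slave_updates and enforce_gtid_consistency); MASTER_AUTO_POSITION itself is enabled per channel with CHANGE MASTER TO ... MASTER_AUTO_POSITION = 1, not in the configuration file:

```ini
# Variant A: GTID-based replication (pair with MASTER_AUTO_POSITION = 1)
[mysqld]
gtid_mode                 = ON
enforce_gtid_consistency  = ON
log_bin                   = binlog
log_slave_updates         = ON   # required by gtid_mode=ON in MySQL 5.6
relay_log_recovery        = 1
sync_binlog               = 1    # ensure committed transactions reach the binary log

# Variant B: file position based replication (uncomment instead of the GTID options)
# relay_log_recovery        = 1
# relay_log_info_repository = TABLE
```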
The following table shows the impact of these different factors on how a multithreaded replica recovers from an unexpected halt.
Table 17.2 Factors Influencing Multithreaded Replica Recovery
| GTID | sync_relay_log | MASTER_AUTO_POSITION | relay_log_recovery | relay_log_info_repository | Crash type | Recovery guaranteed | Relay log impact |
|---|---|---|---|---|---|---|---|
| OFF | 1 | Not relevant | 1 | TABLE | Any | Yes | Lost |
| OFF | >1 | Not relevant | 1 | TABLE | Server | Yes | Lost |
| OFF | >1 | Not relevant | 1 | Any | OS | No | Lost |
| OFF | 1 | Not relevant | 0 | TABLE | Server | Yes | Remains |
| OFF | 1 | Not relevant | 0 | TABLE | OS | No | Remains |
| ON | Any | ON | 1 | Any | Any | Yes | Lost |
| ON | 1 | OFF | 0 | TABLE | Server | Yes | Remains |
| ON | 1 | OFF | 0 | Any | OS | No | Remains |
As the table shows, when using a multithreaded replica the following configurations are most resilient to unexpected halts:
- When using GTIDs and MASTER_AUTO_POSITION, set relay_log_recovery=1. With this configuration the setting of relay_log_info_repository and other variables does not impact recovery.

- When using file position based replication, set relay_log_recovery=1, sync_relay_log=1, and relay_log_info_repository=TABLE. During recovery the relay log is lost.
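A sketch of the recommended file position based settings for a multithreaded replica follows; in MySQL 5.6 it is slave_parallel_workers > 0 that makes the replica multithreaded, and the value 4 below is only an example:

```ini
[mysqld]
slave_parallel_workers    = 4   # example value; any value > 0 enables the multithreaded applier
relay_log_recovery        = 1
relay_log_info_repository = TABLE
sync_relay_log            = 1   # synchronize the relay log to disk frequently; see the note on storage load
```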
It is important to note the impact of sync_relay_log=1, which requires a write to the relay log for each transaction. Although this setting is the most resilient to an unexpected halt, with at most one unwritten transaction being lost, it also has the potential to greatly increase the load on storage. Without sync_relay_log=1, the effect of an unexpected halt depends on how the relay log is handled by the operating system. Also note that when relay_log_recovery=0, the next time the replica is started after an unexpected halt, the relay log is processed as part of recovery. After this process completes, the relay log is deleted.
An unexpected halt of a multithreaded replica using the recommended file position based replication configuration above may result in a relay log with transaction inconsistencies (gaps in the sequence of transactions) caused by the unexpected halt. See Replication and Transaction Inconsistencies. In MySQL 5.7.13 and later, if the relay log recovery process encounters such transaction inconsistencies, they are filled and the recovery process continues automatically. In MySQL versions prior to MySQL 5.7.13, this process is not automatic: it requires starting the server with relay_log_recovery=0, starting the replica with START SLAVE UNTIL SQL_AFTER_MTS_GAPS to fix any transaction inconsistencies, and then restarting the replica with relay_log_recovery=1.
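For those older versions, the manual procedure might be sketched as follows (statement names as in MySQL 5.6; the SHOW SLAVE STATUS check is one way, not the only way, to confirm the applier has stopped):

```sql
-- Sketch: manual gap-fix after an unexpected halt of a multithreaded
-- replica, on versions without automatic gap filling.
-- 1. Restart the server with relay_log_recovery=0, then run:
START SLAVE UNTIL SQL_AFTER_MTS_GAPS;
-- 2. Wait for the applier to stop at a consistent point, e.g.:
--    SHOW SLAVE STATUS\G   -- check that Slave_SQL_Running is No
-- 3. Restart the server with relay_log_recovery=1 and resume replication:
START SLAVE;
```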
When you are using multi-source replication and relay_log_recovery=1, after a restart due to an unexpected halt, all replication channels go through the relay log recovery process. Any inconsistencies found in the relay log due to an unexpected halt of a multithreaded replica are filled.