MySQL Utilities

5.12 mysqlfailover — Automatic replication health monitoring and failover

This utility permits users to perform replication health monitoring and automatic failover on a replication topology consisting of a single master and its slaves. The utility is designed to run interactively or continuously refreshing the health information and checking the master status at periodic intervals. Its primary mission is to monitor the master for failure and when a failure occurs, execute failover to one of the slaves that is in a valid state. The utility accepts an optional list of slaves to be considered for the candidate slave.

This utility is designed to work exclusively for servers that support global transaction identifiers (GTIDs) and have gtid_mode=ON. MySQL server versions 5.6.5 and higher support GTIDs. See Replication with Global Transaction Identifiers for more information. Thus, this utility does not work with anonymous replication servers (binary log + position).

The user can specify the interval in seconds to use for detecting the master status and generating the health report using the --interval option. At each interval, the utility checks to see if the server is alive via a ping operation followed by a check of the connector to detect if the server is still reachable. The ping operation can be controlled with the --ping option (see below).

If the master is found to be offline or unreachable, the utility executes one of the following actions based on the --failover-mode option value. The available values are:

For all options that permit specifying multiple servers, the options require a comma-separated list of connection parameters in the following form (where the password, port, and socket are optional).:

*user*[:*passwd*]@*host*[:*port*][:*socket*] or
*login-path*[:*port*][:*socket*]

The utility permits users to discover slaves connected to the master. The discover slaves feature is run automatically on each interval. Furthermore, it is required that slaves use the --master-info-repository=TABLE startup setting.

The discover slaves option requires all slaves use the --report-host and --report-port server startup variables with the correct hostname and port. If these are missing or report the incorrect information, the slave may not be detected and thus not included in the operation of the utility. The discover slaves option ignores any slaves to which it cannot connect.

Note

If you have one or more slaves which do not report their hostname and port and should a failover event occur, those slaves are not included in the resulting topology. That is, they are not a slave of the new master. Be sure to check that all of your slaves are accounted for in the health report before relying on the utility for complete automatic failover.

The utility permits the user to specify an external script to execute before and after the switchover and failover commands. The user can specify these with the --exec-before and --exec-after options. The return code of the script is used to determine success. Each script must report 0 (success) to be considered successful. If a script returns a value other than 0, the result code is presented in an error message.

The utility also permits the user to specify a script to be used for detecting a downed master or an application-level event to trigger failover. This can be specified using the --exec-fail-check option. The return code for the script is used to invoke failover. A return code of 0 indicates failover should not take place. A return code other than 0 indicates failover should take place. This is checked at the start of each interval if a script is supplied. The timeout option is not used in this case and the script is run once at the start of each interval.

The utility permits the user to log all actions taken during the commands. The --log option requires a valid path and filename of the file to use for logging operations. The log is active only when this option is specified. The option --log-age specifies the age in days that log entries are kept. The default is seven (7) days. Older entries are automatically deleted from the log file (but only if the --log option is specified).

The format of the log file includes the date and time of the event, the level of the event (informational - INFO, warning - WARN, error - ERROR, critical failure - CRITICAL), and the message reported by the utility.

The interface provides a number of options for displaying additional information. You can choose to view the replication health report (default), or choose to view the list of GTIDs in use, the UUIDs in use, or view the log file contents if logging is enabled. Each of these reports is described below.

The user interface is designed to match the size of the terminal window in which it is run. A refresh option is provided to permit users to resize their terminal windows or refresh the display at any time. However, the interface automatically resizes to the terminal window on each interval.

The interface displays the name of the utility, the master's status including binary log file, position, and filters as well as the date and time of the next interval event.

The interface also permits the user to scroll up or down through a list longer than what the terminal window permits. When a long list is presented, the scroll options become enabled. The user can scroll the list up with the up arrow key and down with the down arrow key.

Use the --verbose option to see additional information in the health report and additional messages during failover.

MODES OF OPERATION

The utility supports two modes of operation. The default mode, running as a console, works as described above. An additional mode that permits you to run the utility as a daemon is provided for POSIX platforms.

When run as a daemon, the utility does not have interactivity. However, all events are written to the log file. You can control what is written to the log by using the --report-values option.

To run the utility as a daemon, use the --daemon option. There are four commands that can be used in --daemon option. These include:

OPTIONS

mysqlfailover accepts the following command-line options:

NOTES

The login user must have the appropriate permissions for the utility to check servers and monitor their status (e.g., SHOW SLAVE STATUS, SHOW MASTER STATUS). The user must also have permissions to execute the failover procedure (e.g., STOP SLAVE, START SLAVE, WAIT_UNTIL_SQL_THREAD_AFTER_GTIDS, CHANGE MASTER TO ...). Lastly, the user must have the REPLICATE SLAVE privilege for slaves to connect to their master. The same permissions are required by the failover utility for master and slaves in order to run successfully. In particular, users connected to slaves, candidates and master require SUPER, GRANT OPTION, REPLICATION SLAVE, RELOAD, DROP, CREATE, INSERT and SELECT privileges.

The DROP, CREATE, INSERT and SELECT privileges are required to register the failover instance on the initial master or the new master (after a successful failover). Therefore, since any slave can become the new master, slaves and candidates also require those privileges. The utility checks permissions for the master, slaves, and candidates at startup.

While this utility can be used to perform failover should the master fail, care should be taken when designing your high availability solution. You must design your topology so that it can tolerate a number of failures to match your risk assessment.

In this case, the number of candidate slaves determines the number of times the utility can safely failover. For example, having a master and four candidate slaves allows up to four failures; one for the original master and up to three candidates. That is, should the first new master fail, the utility will try to failover to the next available candidate slave (or simply those listed as slaves).

When the utility has failed over to the last slave in the topology (and there are no other slaves), the utility will continue to monitor that master but when that master fails, the utility will stop with an error. In this case, no user supplied scripts will run (for example, --exec-before, --exec-after, --exec-post-failover).

However, if the utility is started with the auto failover mode (--failover-mode=auto), you can add slaves at any time and those new slaves will become candidates for failover. In that case, so long as there are slaves to failover to, the utility will continue to monitor the master. Slaves can be added while the utility is running using the normal replication procedure for adding a slave to a master. You do not need to stop the utility to add a new slave.

If the failover mode is set to elect (--failover-mode=elect), when the last candidate slave is chosen as the master, the utility will not failover to any other slaves even if they were added to the topology after the utility was started. This is because the elect mode limits candidate selection to the list specified with the --candidates option. In this case, when the last master (candidate slave) fails, the utility will stop with an error.

Note

While a topology with one master and one candidate slave is a basic replication topology, it can tolerate at most one failure and therefore is not a recommended setup for critical high availability scenarios.

Mixing IP and hostnames is not recommended. The replication-specific utilities attempt to compare hostnames and IP addresses as aliases for checking slave connectivity to the master. However, if your installation does not support reverse name lookup, the comparison could fail. Without the ability to do a reverse name lookup, the replication utilities could report a false negative that the slave is (not) connected to the master.

For example, if you set up replication using MASTER_HOST=ubuntu.net on the slave and later connect to the slave with mysqlrplcheck and have the master specified as --master=198.51.100.6 using the valid IP address for ubuntu.net, you must have the ability to do a reverse name lookup to compare the IP (198.51.100.6) and the hostname (ubuntu.net) to determine if they are the same machine.

Similarly, in order to avoid issues mixing local IP '127.0.0.1' with 'localhost', all the addresses '127.0.0.1' are internally converted to 'localhost' by the utility. Nevertheless, it is best to use the actual hostname of the master when connecting or setting up replication.

The utility checks to see if the slaves are using the option --master-info-repository=TABLE. If they are not, the utility stops with an error.

The path to the MySQL client tools should be included in the PATH environment variable in order to use the authentication mechanism with login-paths. This allows the utility to use the my_print_defaults tool which is required to read the login-path values from the login configuration file (.mylogin.cnf).

At startup, the console attempts to register itself with the master. If another console is already registered, and the failover mode is auto or elect, the console is blocked from running failover. When a console quits, it unregisters itself from the master. If this process is broken, the user may override the registration check by using the --force option.

The console creates a special table in the mysql database that is used to keep track of which instance is communicating with the master. If you use the --force option, the console removes the rows in this table. The table is constructed with:

CREATE TABLE IF NOT EXISTS mysql.failover_console (host char(30), port char(10))

When the console starts, a row is inserted containing the hostname and port of the master. On startup, if a row matches these values, the console does not start. If you use the --force option, the row is deleted.

When running the utility using the --daemon=nodetach option, the --pidfile option can be omitted. It is ignored if used.

When using the external scripts, the following parameters are passed in the order shown.

Suppose you have a script run_before.sh and you specify that you want it executing before the failover is performed (using the --exec-before option). Further, let us assume the master MySQL Server is using port 3306 on the host 'host1' and the MySQL Server that becomes the new master is using port 3308 on host 'can_host2'. The script would therefore be invoked in the following manner.

% run_before.sh host1 3306 can_host2 3308

Table 5.3 External Script Parameters

MySQL Failover Option Parameters Passed to External Script
--exec-before master host, master port, candidate host, candidate port
--exec-after new master host, new master port
--exec-fail-check master host, master port
--exec-post-failover (no errors during failover) old master host, old master port, new master host, new master port
--exec-post-failover (errors during failover) old master host, old master port

EXAMPLES

To launch the utility, you must specify at a minimum the --master option and either the --discover-slaves-login option or the --slaves option. The --discover-slaves-login option can be used in conjunction with the --slaves option to specify a list of known slaves (or slaves that do not report their host and IP) and to discover any other slaves connected to the master.

An example of the user interface and some of the report views are shown in the following examples.

Note

The "GTID Executed Set" displays the first GTID listed in the SHOW MASTER STATUS view. If there are multiple GTIDs listed, the utility displays [...] to indicate there are additional GTIDs to view. You can view the complete list of GTIDs on the GTID display screens.

The default interface displays the replication health report like the following. In this example the log file is enabled. A sample startup command is shown below:

shell> mysqlfailover --master=root@localhost:3331 --discover-slaves-login=root --log=log.txt

MySQL Replication Monitor and Failover Utility
Failover Mode = auto     Next Interval = Mon Mar 19 15:56:03 2012

Master Information
------------------
Binary Log File   Position  Binlog_Do_DB  Binlog_Ignore_DB
mysql-bin.000001  571

GTID Executed Set
2A67DE00-2DA1-11E2-A711-00764F2BE90F:1-7 [...]

Replication Health Status
+------------+-------+---------+--------+------------+---------+
| host       | port  | role    | state  | gtid_mode  | health  |
+------------+-------+---------+--------+------------+---------+
| localhost  | 3331  | MASTER  | UP     | ON         | OK      |
| localhost  | 3332  | SLAVE   | UP     | ON         | OK      |
| localhost  | 3333  | SLAVE   | UP     | ON         | OK      |
| localhost  | 3334  | SLAVE   | UP     | ON         | OK      |
+------------+-------+---------+--------+------------+---------+
Q-quit R-refresh H-health G-GTID Lists U-UUIDs L-log entries

Press Q to exit the utility, R to refresh the current display, and H returns to the replication health report.

Press the G key to show a GTID report similar to the following. The first page shown is the master's executed GTID set:

MySQL Replication Monitor and Failover Utility
Failover Mode = auto     Next Interval = Mon Mar 19 15:59:33 2012

Master Information
------------------
Binary Log File   Position  Binlog_Do_DB  Binlog_Ignore_DB
mysql-bin.000001  571

GTID Executed Set
2A67DE00-2DA1-11E2-A711-00764F2BE90F:1-7 [...]

Master GTID Executed Set
+-------------------------------------------+
| gtid                                      |
+-------------------------------------------+
| 2A67DE00-2DA1-11E2-A711-00764F2BE90F:1-7  |
| 5503D37E-2DB2-11E2-A781-8077D4C14B33:1-3  |
+-------------------------------------------+

Q-quit R-refresh H-health G-GTID Lists U-UUIDs L-log entries Up|Down-scroll

Continuing to press G cycles through the three GTID lists.

If the list is longer than the screen permits as shown in the example above, the scroll up and down help is also shown. In this case, press the down arrow key to scroll down.

Press U to view the list of UUIDs used in the topology, for example:

MySQL Replication Monitor and Failover Utility
Failover Mode = auto     Next Interval = Mon Mar 19 16:02:34 2012

Master Information
------------------
Binary Log File   Position  Binlog_Do_DB  Binlog_Ignore_DB
mysql-bin.000001  571

GTID Executed Set
2A67DE00-2DA1-11E2-A711-00764F2BE90F:1-7 [...]

UUIDs
+------------+-------+---------+---------------------------------------+
| host       | port  | role    | uuid                                  |
+------------+-------+---------+---------------------------------------+
| localhost  | 3331  | MASTER  | 55c65a00-71fd-11e1-9f80-ac64ef85c961  |
| localhost  | 3332  | SLAVE   | 5dd30888-71fd-11e1-9f80-dc242138b7ec  |
| localhost  | 3333  | SLAVE   | 65ccbb38-71fd-11e1-9f80-bda8146bdb0a  |
| localhost  | 3334  | SLAVE   | 6dd6abf4-71fd-11e1-9f80-d406a0117519  |
+------------+-------+---------+---------------------------------------+
Q-quit R-refresh H-health G-GTID Lists U-UUIDs L-log entries

If, once the master is detected as down and failover mode is auto or elect and there are viable candidate slaves, the failover feature engages automatically and the user sees the failover messages appear. When failover is complete, the interface returns to monitoring replication health after 5 seconds. The following shows an example of failover occurring:

Failover starting...
# Candidate slave localhost:3332 will become the new master.
# Preparing candidate for failover.
# Creating replication user if it does not exist.
# Stopping slaves.
# Performing STOP on all slaves.
# Switching slaves to new master.
# Starting slaves.
# Performing START on all slaves.
# Checking slaves for errors.
# Failover complete.
# Discovering slaves for master at localhost:3332

Failover console will restart in 5 seconds.

After the failover event, the new topology is shown in the replication health report:

MySQL Replication Monitor and Failover Utility
Failover Mode = auto     Next Interval = Mon Mar 19 16:05:12 2012

Master Information
------------------
Binary Log File   Position  Binlog_Do_DB  Binlog_Ignore_DB
mysql-bin.000001  1117

GTID Executed Set
2A67DE00-2DA1-11E2-A711-00764F2BE90F:1-7 [...]

UUIDs
+------------+-------+---------+--------+------------+---------+
| host       | port  | role    | state  | gtid_mode  | health  |
+------------+-------+---------+--------+------------+---------+
| localhost  | 3332  | MASTER  | UP     | ON         | OK      |
| localhost  | 3333  | SLAVE   | UP     | ON         | OK      |
| localhost  | 3334  | SLAVE   | UP     | ON         | OK      |
+------------+-------+---------+--------+------------+---------+

Q-quit R-refresh H-health G-GTID Lists U-UUIDs L-log entries

Pressing L with the --log option specified causes the interface to show the entries in the log file, such as:

MySQL Replication Monitor and Failover Utility
Failover Mode = auto     Next Interval = Mon Mar 19 16:06:13 2012

Master Information
------------------
Binary Log File   Position  Binlog_Do_DB  Binlog_Ignore_DB
mysql-bin.000001  1117

GTID Executed Set
2A67DE00-2DA1-11E2-A711-00764F2BE90F:1-7 [...]

Log File
+-------------------------+----------------------------------------- ... --+
| Date                    | Entry                                    ...   |
+-------------------------+----------------------------------------- ... --+
| 2012-03-19 15:55:33 PM  | INFO Failover console started.           ...   |
| 2012-03-19 15:55:33 PM  | INFO Failover mode = auto.               ...   |
| 2012-03-19 15:55:33 PM  | INFO Getting health for master: localhos ...   |
| 2012-03-19 15:55:33 PM  | INFO Master status: binlog: mysql-bin.00 ...   |
+-------------------------+----------------------------------------- ... --+
Q-quit R-refresh H-health G-GTID Lists U-UUIDs L-log entries Up|Down-scroll\

LIMITATIONS

When the --master-fail-retry option is used, and the value specified exceeds the value of the --interval option, it is possible for an interval event to be delayed if the master retry check is still processing. In this case, should the master not be down, the next interval occurs immediately after the check of the master is complete.

PERMISSIONS REQUIRED

The user must have permissions to monitor the servers on the topology and configure replication to successfully perform the failover operation. Additional permissions are also required to register and unregister the running mysqlfailover instance on the master and slaves. Specifically, the login user must have the following privileges: SUPER, GRANT OPTION, REPLICATION SLAVE, RELOAD, DROP, CREATE, INSERT, and SELECT. (See Privileges Provided by MySQL, for more information about these.)

The referred permissions are required for the login users used for all servers (master, slaves, and candidates).