18.6.11 MySQL Cluster Replication Conflict Resolution

When using a replication setup involving multiple masters (including circular replication), it is possible that different masters may try to update the same row on the slave with different data. Conflict resolution in MySQL Cluster Replication provides a means of resolving such conflicts by permitting a user-defined resolution column to be used to determine whether or not an update on a given master should be applied on the slave.

Some types of conflict resolution supported by MySQL Cluster (NDB$OLD(), NDB$MAX(), NDB$MAX_DELETE_WIN()) implement this user-defined column as a timestamp column (although its type cannot be TIMESTAMP, as explained later in this section). These types of conflict resolution are always applied a row-by-row basis rather than a transactional basis. The epoch-based conflict resolution functions NDB$EPOCH() and NDB$EPOCH_TRANS() compare the order in which epochs are replicated (and thus these functions are transactional). Different methods can be used to compare resolution column values on the slave when conflicts occur, as explained later in this section; the method used can be set on a per-table basis.

You should also keep in mind that it is the application's responsibility to ensure that the resolution column is correctly populated with relevant values, so that the resolution function can make the appropriate choice when determining whether to apply an update.

Requirements.  Preparations for conflict resolution must be made on both the master and the slave. These tasks are described in the following list:

When using the functions NDB$OLD(), NDB$MAX(), and NDB$MAX_DELETE_WIN() for timestamp-based conflict resolution, we often refer to the column used for determining updates as a timestamp column. However, the data type of this column is never TIMESTAMP; instead, its data type should be INT (INTEGER) or BIGINT. The timestamp column should also be UNSIGNED and NOT NULL.

The NDB$EPOCH() and NDB$EPOCH_TRANS() functions discussed later in this section work by comparing the relative order of replication epochs applied on a primary and secondary MySQL Cluster, and do not make use of timestamps.

Master column control.  We can see update operations in terms of before and after images—that is, the states of the table before and after the update is applied. Normally, when updating a table with a primary key, the before image is not of great interest; however, when we need to determine on a per-update basis whether or not to use the updated values on a replication slave, we need to make sure that both images are written to the master's binary log. This is done with the --ndb-log-update-as-write option for mysqld, as described later in this section.

Important

Whether logging of complete rows or of updated columns only is done is decided when the MySQL server is started, and cannot be changed online; you must either restart mysqld, or start a new mysqld instance with different logging options.

Logging Full or Partial Rows (--ndb-log-updated-only Option)

Command-Line Format--ndb-log-updated-only
System VariableNamendb_log_updated_only
Variable ScopeGlobal
Dynamic VariableYes
Permitted ValuesTypeboolean
DefaultON

For purposes of conflict resolution, there are two basic methods of logging rows, as determined by the setting of the --ndb-log-updated-only option for mysqld:

It is usually sufficient—and more efficient—to log updated columns only; however, if you need to log full rows, you can do so by setting --ndb-log-updated-only to 0 or OFF.

--ndb-log-update-as-write Option: Logging Changed Data as Updates

Command-Line Format--ndb-log-update-as-write
System VariableNamendb_log_update_as_write
Variable ScopeGlobal
Dynamic VariableYes
Permitted ValuesTypeboolean
DefaultON

The setting of the MySQL Server's --ndb-log-update-as-write option determines whether logging is performed with or without the before image. Because conflict resolution is done in the MySQL Server's update handler, it is necessary to control logging on the master such that updates are updates and not writes; that is, such that updates are treated as changes in existing rows rather than the writing of new rows (even though these replace existing rows). This option is turned on by default; in other words, updates are treated as writes. (That is, updates are by default written as write_row events in the binary log, rather than as update_row events.)

To turn off the option, start the master mysqld with --ndb-log-update-as-write=0 or --ndb-log-update-as-write=OFF. You must do this when replicating from NDB tables to tables using a different storage engine; see Replication from NDB to other storage engines, and Replication from NDB to a nontransactional storage engine, for more information.

Conflict resolution control.  Conflict resolution is usually enabled on the server where conflicts can occur. Like logging method selection, it is enabled by entries in the mysql.ndb_replication table.

The ndb_replication system table.  To enable conflict resolution, it is necessary to create an ndb_replication table in the mysql system database on the master, the slave, or both, depending on the conflict resolution type and method to be employed. This table is used to control logging and conflict resolution functions on a per-table basis, and has one row per table involved in replication. ndb_replication is created and filled with control information on the server where the conflict is to be resolved. In a simple master-slave setup where data can also be changed locally on the slave this will typically be the slave. In a more complex master-master (2-way) replication schema this will usually be all of the masters involved. Each row in mysql.ndb_replication corresponds to a table being replicated, and specifies how to log and resolve conflicts (that is, which conflict resolution function, if any, to use) for that table. The definition of the mysql.ndb_replication table is shown here:

CREATE TABLE mysql.ndb_replication  (
    db VARBINARY(63),
    table_name VARBINARY(63),
    server_id INT UNSIGNED,
    binlog_type INT UNSIGNED,
    conflict_fn VARBINARY(128),
    PRIMARY KEY USING HASH (db, table_name, server_id)
)   ENGINE=NDB
PARTITION BY KEY(db,table_name);

The columns in this table are described in the next few paragraphs.

db.  The name of the database containing the table to be replicated. You may employ either or both of the wildcards _ and % as part of the database name. Matching is similar to what is implemented for the LIKE operator.

table_name.  The name of the table to be replicated. The table name may include either or both of the wildcards _ and %. Matching is similar to what is implemented for the LIKE operator.

server_id.  The unique server ID of the MySQL instance (SQL node) where the table resides.

binlog_type.  The type of binary logging to be employed. This is determined as shown in the following table:

ValueInternal ValueDescription
0NBT_DEFAULTUse server default
1NBT_NO_LOGGINGDo not log this table in the binary log
2NBT_UPDATED_ONLYOnly updated attributes are logged
3NBT_FULLLog full row, even if not updated (MySQL server default behavior)
4NBT_USE_UPDATE(For generating NBT_UPDATED_ONLY_USE_UPDATE and NBT_FULL_USE_UPDATE values only—not intended for separate use)
5[Not used]---
6NBT_UPDATED_ONLY_USE_UPDATE (equal to NBT_UPDATED_ONLY | NBT_USE_UPDATE)Use updated attributes, even if values are unchanged
7NBT_FULL_USE_UPDATE (equal to NBT_FULL | NBT_USE_UPDATE)Use full row, even if values are unchanged

conflict_fn.  The conflict resolution function to be applied. This function must be specified as one of those shown in the following list:

These functions are described in the next few paragraphs.

NDB$OLD(column_name).  If the value of column_name is the same on both the master and the slave, then the update is applied; otherwise, the update is not applied on the slave and an exception is written to the log. This is illustrated by the following pseudocode:

if (master_old_column_value == slave_current_column_value)
  apply_update();
else
  log_exception();

This function can be used for same value wins conflict resolution. This type of conflict resolution ensures that updates are not applied on the slave from the wrong master.

Important

The column value from the master's before image is used by this function.

NDB$MAX(column_name).  If the timestamp column value for a given row coming from the master is higher than that on the slave, it is applied; otherwise it is not applied on the slave. This is illustrated by the following pseudocode:

if (master_new_column_value > slave_current_column_value)
  apply_update();

This function can be used for greatest timestamp wins conflict resolution. This type of conflict resolution ensures that, in the event of a conflict, the version of the row that was most recently updated is the version that persists.

Important

The column value from the master's after image is used by this function.

NDB$MAX_DELETE_WIN(column_name).  This is a variation on NDB$MAX(). Due to the fact that no timestamp is available for a delete operation, a delete using NDB$MAX() is in fact processed as NDB$OLD. However, for some use cases, this is not optimal. For NDB$MAX_DELETE_WIN(), if the timestamp column value for a given row adding or updating an existing row coming from the master is higher than that on the slave, it is applied. However, delete operations are treated as always having the higher value. This is illustrated in the following pseudocode:

if ( (master_new_column_value > slave_current_column_value)
        ||
      operation.type == "delete")
  apply_update();

This function can be used for greatest timestamp, delete wins conflict resolution. This type of conflict resolution ensures that, in the event of a conflict, the version of the row that was deleted or (otherwise) most recently updated is the version that persists.

Note

As with NDB$MAX(), the column value from the master's after image is the value used by this function.

NDB$EPOCH() and NDB$EPOCH_TRANS().  The NDB$EPOCH() function tracks the order in which replicated epochs are applied on a slave MySQL Cluster relative to changes originating on the slave. This relative ordering is used to determine whether changes originating on the slave are concurrent with any changes that originate locally, and are therefore potentially in conflict.

Most of what follows in the description of NDB$EPOCH() also applies to NDB$EPOCH_TRANS(). Any exceptions are noted in the text.

NDB$EPOCH() is asymmetric, operating on one MySQL Cluster in a two-cluster circular replication configuration (sometimes referred to as active-active replication). We refer here to cluster on which it operates as the primary, and the other as the secondary. The slave on the primary is responsible for detecting and handling conflicts, while the slave on the secondary is not involved in any conflict detection or handling.

When the slave on the primary detects conflicts, it injects events into its own binary log to compensate for these; this ensures that the secondary MySQL Cluster eventually realigns itself with the primary and so keeps the primary and secondary from diverging. This compensation and realignment mechanism requires that the primary MySQL Cluster always wins any conflicts with the secondary—that is, that the primary's changes are always used rather than those from the secondary in event of a conflict. This primary always wins rule has the following implications:

NDB$EPOCH() and NDB$EPOCH_TRANS() do not require any user schema modifications, or application changes to provide conflict detection. However, careful thought must be given to the schema used, and the access patterns used, to verify that the complete system behaves within specified limits.

Each of the NDB$EPOCH() and NDB$EPOCH_TRANS() functions can take an optional parameter; this is the number of bits to use to represent the lower 32 bits of the epoch, and should be set to no less than

CEIL( LOG2( TimeBetweenGlobalCheckpoints / TimeBetweenEpochs ), 1) 

For the default values of these configuration parameters (2000 and 100 milliseconds, respectively), this gives a value of 5 bits, so the default value (6) should be sufficient, unless other values are used for TimeBetweenGlobalCheckpoints, TimeBetweenEpochs, or both. A value that is too small can result in false positives, while one that is too large could lead to excessive wasted space in the database.

Both NDB$EPOCH() and NDB$EPOCH_TRANS() insert entries for conflicting rows into the relevant exceptions tables, provided that these tables have been defined according to the same exception table schema rules as described elsewhere in this section (see NDB$OLD(column_name)). Note that you need to create any exceptions table before creating the table with which it is to be used.

As with the other conflict detection functions discussed in this section, NDB$EPOCH() and NDB$EPOCH_TRANS() are activated by including relevant entries in the mysql.ndb_replication table (see The ndb_replication system table). The roles of the primary and secondary MySQL Clusters in this scenario are fully determined by mysql.ndb_replication table entries.

Because the conflict detection algorithms employed by NDB$EPOCH() and NDB$EPOCH_TRANS() are asymmetric, you must use different values for the primary slave's and secondary slave's server_id entries.

Prior to MySQL Cluster NDB 7.3.6, conflict between DELETE operations were handled like those for UPDATE operations, and within the same epoch were considered in conflict. In MySQL Cluster NDB 7.3.6 and later, a conflict between DELETE operations alone is not sufficient to trigger a conflict using NDB$EPOCH() or NDB$EPOCH_TRANS(), and the relative placement within epochs does not matter. (Bug "18454499)

NDB$EPOCH() and NDB$EPOCH_TRANS() status variables.  Several status variables can be used to monitor NDB$EPOCH() and NDB$EPOCH_TRANS() conflict detection. You can see how many rows have been found in conflict by NDB$EPOCH() since this slave was last restarted from the current value of the Ndb_conflict_fn_epoch system status variable.

Ndb_conflict_fn_epoch_trans provides the number of rows that have been found directly in conflict by NDB$EPOCH_TRANS(); the number of rows actually realigned, including those affected due to their membership in or dependency on the same transactions as other conflicting rows, is given by Ndb_conflict_trans_row_reject_count.

For more information, see Section 18.3.4.4, “MySQL Cluster Status Variables”.

Limitations on NDB$EPOCH().  The following limitations currently apply when using NDB$EPOCH() to perform conflict detection:

NDB$EPOCH_TRANS().  NDB$EPOCH_TRANS() extends the NDB$EPOCH() function. Conflicts are detected and handled in the same way using the primary wins all rule (see NDB$EPOCH() and NDB$EPOCH_TRANS()) but with the extra condition that any other rows updated in the same transaction in which the conflict occurred are also regarded as being in conflict. In other words, where NDB$EPOCH() realigns individual conflicting rows on the secondary, NDB$EPOCH_TRANS() realigns conflicting transactions.

In addition, any transactions which are detectably dependent on a conflicting transaction are also regarded as being in conflict, these dependencies being determined by the contents of the secondary cluster's binary log. Since the binary log contains only data modification operations (inserts, updates, and deletes), only overlapping data modifications are used to determine dependencies between transactions.

NDB$EPOCH_TRANS() is subject to the same conditions and limitations as NDB$EPOCH(), and in addition requires that Version 2 binary log row events are used (--log-bin-use-v1-row-events equal to 0), which adds a storage overhead of 2 bytes per event in the binary log. In addition, all transaction IDs must be recorded in the secondary's binary log (--ndb-log-transaction-id option), which adds a further variable overhead (up to 13 bytes per row).

See NDB$EPOCH() and NDB$EPOCH_TRANS().

NULL.  Indicates that conflict resolution is not to be used for the corresponding table.

Status information.  A server status variable Ndb_conflict_fn_max provides a count of the number of times that a row was not applied on the current SQL node due to greatest timestamp wins conflict resolution since the last time that mysqld was started.

The number of times that a row was not applied as the result of same timestamp wins conflict resolution on a given mysqld since the last time it was restarted is given by the global status variable Ndb_conflict_fn_old. In addition to incrementing Ndb_conflict_fn_old, the primary key of the row that was not used is inserted into an exceptions table, as explained later in this section.

Conflict resolution exceptions table.  To use the NDB$OLD() conflict resolution function, it is also necessary to create an exceptions table corresponding to each NDB table for which this type of conflict resolution is to be employed. This is also true when using NDB$EPOCH() or NDB$EPOCH_TRANS(). The name of this table is that of the table for which conflict resolution is to be applied, with the string $EX appended. (For example, if the name of the original table is mytable, the name of the corresponding exceptions table name should be mytable$EX.) Prior to MySQL Cluster NDB 7.4.1, this table is created as shown:

CREATE TABLE original_table$EX  (
    server_id INT UNSIGNED,
    master_server_id INT UNSIGNED,
    master_epoch BIGINT UNSIGNED,
    count INT UNSIGNED,

    original_table_pk_columns,

    [additional_columns,]

    PRIMARY KEY(server_id, master_server_id, master_epoch, count)
) ENGINE=NDB;

MySQL Cluster NDB 7.4.1 and later support an extended exceptions table definition that includes optional columns providing information about an exception's type, cause, and originating transaction. In these versions, the syntax for creating the exceptions table is as shown here:

CREATE TABLE original_table$EX  (
    [NDB$]server_id INT UNSIGNED,
    [NDB$]master_server_id INT UNSIGNED,
    [NDB$]master_epoch BIGINT UNSIGNED,
    [NDB$]count INT UNSIGNED,

    [NDB$OP_TYPE ENUM('WRITE_ROW','UPDATE_ROW', 'DELETE_ROW', 
      'REFRESH_ROW', 'READ_ROW') NOT NULL,]
    [NDB$CFT_CAUSE ENUM('ROW_DOES_NOT_EXIST', 'ROW_ALREADY_EXISTS', 
      'DATA_IN_CONFLICT', 'TRANS_IN_CONFLICT') NOT NULL,]
    [NDB$ORIG_TRANSID BIGINT UNSIGNED NOT NULL,]

    original_table_pk_columns,

    [orig_table_column|orig_table_column$OLD|orig_table_column$NEW,]

    [additional_columns,]

    PRIMARY KEY([NDB$]server_id, [NDB$]master_server_id, [NDB$]master_epoch, [NDB$]count)
) ENGINE=NDB;

The first four columns are required. The names of the first four columns and the columns matching the original table's primary key columns are not critical; however, we suggest for reasons of clarity and consistency, that you use the names shown here for the server_id, master_server_id, master_epoch, and count columns, and that you use the same names as in the original table for the columns matching those in the original table's primary key.

Starting with MySQL Cluster NDB 7.4.1, if the exceptions table uses one or more of the optional columns NDB$OP_TYPE, NDB$CFT_CAUSE, or NDB$ORIG_TRANSID discussed later in this section, then each of the required columns must also be named using the prefix NDB$. If desired, you can use the NDB$ prefix to name the required columns even if you do not define any optional columns, but in this case, all four of the required columns must be named using the prefix.

Following these columns, the columns making up the original table's primary key should be copied in the order in which they are used to define the primary key of the original table. The data types for the columns duplicating the primary key columns of the original table should be the same as (or larger than) those of the original columns. In MySQL Cluster NDB 7.3 and earlier, the exceptions table's primary key must be reproduced column for column. Beginning with MySQL Cluster NDB 7.4.1, a subset of the primary key columns may be used instead.

Regardless of the MySQL Cluster version employed, the exceptions table must use the NDB storage engine. (An example that uses NDB$OLD() with an exceptions table is shown later in this section.)

Additional columns may optionally be defined following the copied primary key columns, but not before any of them; any such extra columns cannot be NOT NULL. In MySQL Cluster NDB 7.4.1 and later, support is provided for three additional, predefined optional columns NDB$OP_TYPE, NDB$CFT_CAUSE, and NDB$ORIG_TRANSID, which are described in the next few paragraphs.

NDB$OP_TYPE: This column can be used to obtain the type of operation causing the conflict. If you use this column, define it as shown here:

NDB$OP_TYPE ENUM('WRITE_ROW', 'UPDATE_ROW', 'DELETE_ROW', 
    'REFRESH_ROW', 'READ_ROW') NOT NULL

The WRITE_ROW, UPDATE_ROW, and DELETE_ROW operation types represent user-initiated operations. REFRESH_ROW operations are operations generated by conflict resolution in compensating transactions sent back to the originating cluster from the cluster that detected the conflict. READ_ROW operations are user-initiated read tracking operations defined with exclusive row locks.

NDB$CFT_CAUSE: You can define an optional column NDB$CFT_CAUSE which provides the cause of the registered conflict. This column, if used, is defined as shown here:

NDB$CFT_CAUSE ENUM('ROW_DOES_NOT_EXIST', 'ROW_ALREADY_EXISTS',
    'DATA_IN_CONFLICT', 'TRANS_IN_CONFLICT') NOT NULL

ROW_DOES_NOT_EXIST can be reported as the cause for UPDATE_ROW and WRITE_ROW operations; ROW_ALREADY_EXISTS can be reported for WRITE_ROW events. DATA_IN_CONFLICT is reported when a row-based conflict function detects a conflict; TRANS_IN_CONFLICT is reported when a transactional conflict function rejects all of the operations belonging to a complete transaction.

NDB$ORIG_TRANSID: The NDB$ORIG_TRANSID column, if used, contains the ID of the originating transaction. This column should be defined as follows:

NDB$ORIG_TRANSID BIGINT UNSIGNED NOT NULL

NDB$ORIG_TRANSID is a 64-bit value generated by NDB. This value can be used to correlate multiple exceptions table entries belonging to the same conflicting transaction from the same or different exceptions tables.

In MySQL Cluster NDB 7.4.1 and later, additional reference columns which are not part of the original table's primary key can be named colname$OLD or colname$NEW. colname$OLD references old values in update and delete operations—that is, operations containing DELETE_ROW events. colname$NEW can be used to reference new values in insert and update operations—in other words, operations using WRITE_ROW events, UPDATE_ROW events, or both types of events. Where a conflicting operation does not supply a value for a given non-primary-key reference column, the exceptions table row contains either NULL, or a defined default value for that column.

Important

The mysql.ndb_replication table is read when a data table is set up for replication, so the row corresponding to a table to be replicated must be inserted into mysql.ndb_replication before the table to be replicated is created.

Examples

The following examples assume that you have already a working MySQL Cluster replication setup, as described in Section 18.6.5, “Preparing the MySQL Cluster for Replication”, and Section 18.6.6, “Starting MySQL Cluster Replication (Single Replication Channel)”.

NDB$MAX() example.  Suppose you wish to enable greatest timestamp wins conflict resolution on table test.t1, using column mycol as the timestamp. This can be done using the following steps:

  1. Make sure that you have started the master mysqld with --ndb-log-update-as-write=OFF.

  2. On the master, perform this INSERT statement:

    INSERT INTO mysql.ndb_replication
        VALUES ('test', 't1', 0, NULL, 'NDB$MAX(mycol)');
    

    Inserting a 0 into the server_id indicates that all SQL nodes accessing this table should use conflict resolution. If you want to use conflict resolution on a specific mysqld only, use the actual server ID.

    Inserting NULL into the binlog_type column has the same effect as inserting 0 (NBT_DEFAULT); the server default is used.

  3. Create the test.t1 table:

    CREATE TABLE test.t1 (
        columns
        mycol INT UNSIGNED,
        columns
    ) ENGINE=NDB;
    

    Now, when updates are done on this table, conflict resolution is applied, and the version of the row having the greatest value for mycol is written to the slave.

Note

Other binlog_type options—such as NBT_UPDATED_ONLY_USE_UPDATE should be used to control logging on the master using the ndb_replication table rather than by using command-line options.

NDB$OLD() example.  Suppose an NDB table such as the one defined here is being replicated, and you wish to enable same timestamp wins conflict resolution for updates to this table:

CREATE TABLE test.t2  (
    a INT UNSIGNED NOT NULL,
    b CHAR(25) NOT NULL,
    columns,
    mycol INT UNSIGNED NOT NULL,
    columns,
    PRIMARY KEY pk (a, b)
)   ENGINE=NDB;

The following steps are required, in the order shown:

  1. First—and prior to creating test.t2—you must insert a row into the mysql.ndb_replication table, as shown here:

    INSERT INTO mysql.ndb_replication
        VALUES ('test', 't2', 0, NULL, 'NDB$OLD(mycol)');
    

    Possible values for the binlog_type column are shown earlier in this section. The value 'NDB$OLD(mycol)' should be inserted into the conflict_fn column.

  2. Create an appropriate exceptions table for test.t2. The table creation statement shown here includes all required columns; any additional columns must be declared following these columns, and before the definition of the table's primary key.

    CREATE TABLE test.t2$EX  (
        server_id SMALLINT UNSIGNED,
        master_server_id INT UNSIGNED,
        master_epoch BIGINT UNSIGNED,
        count BIGINT UNSIGNED,
        a INT UNSIGNED NOT NULL,
        b CHAR(25) NOT NULL,
    
        [additional_columns,]
    
        PRIMARY KEY(server_id, master_server_id, master_epoch, count)
    )   ENGINE=NDB;
    

    In MySQL Cluster NDB 7.4.1 and later, we can include additional columns for information about the type, cause, and originating transaction ID for a given conflict. We are also not required to supply matching columns for all primary key columns in the original table. In these versions, you can create the exceptions table like this:

    CREATE TABLE test.t2$EX  (
        NDB$server_id SMALLINT UNSIGNED,
        NDB$master_server_id INT UNSIGNED,
        NDB$master_epoch BIGINT UNSIGNED,
        NDB$count BIGINT UNSIGNED,
        a INT UNSIGNED NOT NULL,
        
        NDB$OP_TYPE ENUM('WRITE_ROW','UPDATE_ROW', 'DELETE_ROW', 
          'REFRESH_ROW', 'READ_ROW') NOT NULL,
        NDB$CFT_CAUSE ENUM('ROW_DOES_NOT_EXIST', 'ROW_ALREADY_EXISTS', 
          'DATA_IN_CONFLICT', 'TRANS_IN_CONFLICT') NOT NULL,
        NDB$ORIG_TRANSID BIGINT UNSIGNED NOT NULL,
    
        [additional_columns,]
    
        PRIMARY KEY(NDB$server_id, NDB$master_server_id, NDB$master_epoch, NDB$count)
    )   ENGINE=NDB;
    

    Note that the NDB$ prefix is required for the four required columns since we included at least one of the columns NDB$OP_TYPE, NDB$CFT_CAUSE, or NDB$ORIG_TRANSID in the table definition.

  3. Create the table test.t2 as shown previously.

These steps must be followed for every table for which you wish to perform conflict resolution using NDB$OLD(). For each such table, there must be a corresponding row in mysql.ndb_replication, and there must be an exceptions table in the same database as the table being replicated.

Read conflict detection and resolution.  MySQL Cluster NDB 7.4.1 and later support tracking of read operations, which makes it possible in circular replication setups to manage conflicts between reads of a given row in one cluster and updates or deletes of the same row in another. This example uses employee and department tables to model a scenario in which an employee is moved from one department to another on the master cluster (which we refer to hereafter as cluster A) while the slave cluster (hereafter B) updates the employee count of the employee's former department in an interleaved transaction.

The data tables have been created using the following SQL statements:

# Employee table
CREATE TABLE employee (
    id INT PRIMARY KEY,
    name VARCHAR(2000),
    dept INT NOT NULL
)   ENGINE=NDB;  

# Department table
CREATE TABLE department (
    id INT PRIMARY KEY,
    name VARCHAR(2000),
    members INT
)   ENGINE=NDB;

The contents of the two tables include the rows shown in the (partial) output of the following SELECT statements:

mysql> SELECT id, name, dept FROM employee;
+---------------+------+
| id   | name   | dept |
+------+--------+------+
...
| 998  |  Mike  | 3    |
| 999  |  Joe   | 3    |
| 1000 |  Mary  | 3    |
...
+------+--------+------+

mysql> SELECT id, name, members FROM department;
+-----+-------------+---------+
| id  | name        | members |
+-----+-------------+---------+
...
| 3   | Old project | 24      |
...
+-----+-------------+---------+

We assume that we are already using an exceptions table that includes the four required columns (and these are used for this table's primary key), the optional columns for operation type and cause, and the original table's primary key column, created using the SQL statement shown here:

CREATE TABLE employee$EX  (
    NDB$server_id INT UNSIGNED,
    NDB$master_server_id INT UNSIGNED,
    NDB$master_epoch BIGINT UNSIGNED,
    NDB$count INT UNSIGNED,

    NDB$OP_TYPE ENUM( 'WRITE_ROW','UPDATE_ROW', 'DELETE_ROW',
                      'REFRESH_ROW','READ_ROW') NOT NULL,
    NDB$CFT_CAUSE ENUM( 'ROW_DOES_NOT_EXIST',
                        'ROW_ALREADY_EXISTS',
                        'DATA_IN_CONFLICT',
                        'TRANS_IN_CONFLICT') NOT NULL,

    id INT NOT NULL,
                    
    PRIMARY KEY(NDB$server_id, NDB$master_server_id, NDB$master_epoch, NDB$count)
)   ENGINE=NDB;

Suppose there occur the two simultaneous transactions on the two clusters. On cluster A, we create a new department, then move employee number 999 into that department, using the following SQL statements:

BEGIN;
  INSERT INTO department VALUES (4, "New project", 1);      
  UPDATE employee SET dept = 4 WHERE id = 999;
COMMIT;

At the same time, on cluster B, another transaction reads from employee, as shown here:

BEGIN;
  SELECT name FROM employee WHERE id = 999;
  UPDATE department SET members = members - 1  WHERE id = 3;
commit;

The conflicting transactions are not normally detected by the conflict resolution mechanism, since the conflict is between a read (SELECT) and an update operation. Beginning with MySQL Cluster NDB 7.4.1, we can circumvent this issue by executing SET ndb_log_exclusive_reads = 1 on the slave cluster. Acquiring exclusive read locks in this way causes any rows read on the master to be flagged as needing conflict resolution on the slave cluster. If we enable exclusive reads in this way prior to the logging of these transactions, the read on cluster B is tracked and sent to cluster A for resolution; the conflict on the employee row will be detected and the transaction on cluster B is aborted.

The conflict is registered in the exceptions table (on cluster A) as a READ_ROW operation (see Conflict resolution exceptions table, for a description of operation types), as shown here:

mysql> SELECT id, NDB$OP_TYPE, NDB$CFT_CAUSE FROM employee$EX;
+-------+-------------+-------------------+
| id    | NDB$OP_TYPE | NDB$CFT_CAUSE     |
+-------+-------------+-------------------+
...
| 999   | READ_ROW    | TRANS_IN_CONFLICT |
+-------+-------------+-------------------+

Any existing rows found in the read operation are flagged. This means that multiple rows resulting from the same conflict may be logged in the exception table, as shown by examining the effects a conflict between an update on cluster A and a read of multiple rows on cluster B from the same table in simultaneous transactions. The transaction executed on cluster A is shown here:

BEGIN;
  INSERT INTO department VALUES (4, "New project", 0);      
  UPDATE employee SET dept = 4 WHERE dept = 3;
  SELECT COUNT(*) INTO @count FROM employee WHERE dept = 4; 
  UPDATE department SET members = @count WHERE id = 4;
COMMIT;

Concurrently a transaction containing the statements shown here runs on cluster B:

SET ndb_log_exclusive_reads = 1;  # Must be set if not already enabled
...
BEGIN;
  SELECT COUNT(*) INTO @count FROM employee WHERE dept = 3 FOR UPDATE; 
  UPDATE department SET members = @count WHERE id = 3;
COMMIT;

In this case, all three rows matching the WHERE condition in the second transaction's SELECT are read, and are thus flagged in the exceptions table, as shown here:

mysql> SELECT id, NDB$OP_TYPE, NDB$CFT_CAUSE FROM employee$EX;
+-------+-------------+-------------------+
| id    | NDB$OP_TYPE | NDB$CFT_CAUSE     |
+-------+-------------+-------------------+
...
| 998   | READ_ROW    | TRANS_IN_CONFLICT | 
| 999   | READ_ROW    | TRANS_IN_CONFLICT | 
| 1000  | READ_ROW    | TRANS_IN_CONFLICT |
...
+-------+-------------+-------------------+ 

Read tracking is performed on the basis of existing rows only. A read based on a given condition track conflicts only of any rows that are found and not of any rows that are inserted in an interleaved transaction. This is similar to how exclusive row locking is performed in a single instance of MySQL Cluster.