22 Changes in MySQL NDB Cluster 7.2.17 (5.5.37-ndb-7.2.17) (2014-07-11, General Availability)

MySQL NDB Cluster 7.2.17 is a new release of NDB Cluster, incorporating new features in the NDB storage engine, and fixing recently discovered bugs in previous MySQL NDB Cluster 7.2 development releases.

Obtaining MySQL NDB Cluster 7.2. MySQL NDB Cluster 7.2 source code and binaries can be obtained from https://dev.mysql.com/downloads/cluster/.

This release also incorporates all bug fixes and changes made in previous NDB Cluster releases, as well as all bug fixes and feature changes which were added in mainline MySQL 5.5 through MySQL 5.5.37 (see Changes in MySQL 5.5.37 (2014-03-27, General Availability)).

Functionality Added or Changed

NDB Cluster APIs: Added as an aid to debugging the ability to specify a human-readable name for a given Ndb object and later to retrieve it. These operations are implemented, respectively, as the setNdbObjectName() and getNdbObjectName() methods.
To make tracing of event handling between a user application and NDB easier, you can use the reference (from getReference() followed by the name (if provided) in printouts; the reference ties together the application Ndb object, the event buffer, and the NDB storage engine's SUMA block. (Bug #18419907)

Bugs Fixed

NDB Replication: When using NDB$EPOCH_TRANS, conflicts between DELETE operations were handled like conflicts between updates, with the primary rejecting the transaction and dependents, and realigning the secondary. This meant that their behavior with regard to subsequent operations on any affected row or rows depended on whether they were in the same epoch or a different one: within the same epoch, they were considered conflicting events; in different epochs, they were not considered in conflict.
This fix brings the handling of conflicts between deletes by NDB$EPOCH_TRANS with that performed when using NDB$EPOCH for conflict detection and resolution, and extends testing with NDB$EPOCH and NDB$EPOCH_TRANS to include “delete-delete” conflicts, and encapsulate the expected result, with transactional conflict handling modified so that a conflict between DELETE operations alone is not sufficient to cause a transaction to be considered in conflict. (Bug #18459944)
NDB Cluster APIs: When an NDB data node indicates a buffer overflow via an empty epoch, the event buffer places an inconsistent data event in the event queue. When this was consumed, it was not removed from the event queue as expected, causing subsequent nextEvent() calls to return 0. This caused event consumption to stall because the inconsistency remained flagged forever, while event data accumulated in the queue.
Event data belonging to an empty inconsistent epoch can be found either at the beginning or somewhere in the middle. pollEvents() returns 0 for the first case. This fix handles the second case: calling nextEvent() call dequeues the inconsistent event before it returns. In order to benefit from this fix, user applications must call nextEvent() even when pollEvents() returns 0. (Bug #18716991)
NDB Cluster APIs: The pollEvents() method returned 1, even when called with a wait time equal to 0, and there were no events waiting in the queue. Now in such cases it returns 0 as expected. (Bug #18703871)
MySQL NDB ClusterJ: Writing a value failed when read from a fixed-width char column using utf8 to another column of the same type and length but using latin1. The data was returned with extra spaces after being padded during its insertion. The value is now trimmed before returning it.
This fix also corrects Data length too long errors during the insertion of valid utf8 characters of 2 or more bytes. This was due to padding of the data before encoding it, rather than after. (Bug #71435, Bug #18283369)
Processing a NODE_FAILREP signal that contained an invalid node ID could cause a data node to fail. (Bug #18993037, Bug #73015)
References: This issue is a regression of: Bug #16007980.
When building out of source, some files were written to the source directory instead of the build dir. These included the manifest.mf files used for creating ClusterJ jars and the pom.xml file used by mvn_install_ndbjtie.sh. In addition, ndbinfo.sql was written to the build directory, but marked as output to the source directory in CMakeLists.txt. (Bug #18889568, Bug #72843)
It was possible for a data node restart to become stuck indefinitely in start phase 101 (see Summary of NDB Cluster Start Phases) when there were connection problems between the node being restarted and one or more subscribing API nodes.
To help prevent this from happening, a new data node configuration parameter RestartSubscriberConnectTimeout has been introduced, which can be used to control how long a data node restart can stall in start phase 101 before giving up and attempting to restart again. The default is 12000 ms. (Bug #18599198)
Executing ALTER TABLE ... REORGANIZE PARTITION after increasing the number of data nodes in the cluster from 4 to 16 led to a crash of the data nodes. This issue was shown to be a regression caused by previous fix which added a new dump handler using a dump code that was already in use (7019), which caused the command to execute two different handlers with different semantics. The new handler was assigned a new DUMP code (7024). (Bug #18550318)
References: This issue is a regression of: Bug #14220269.
Following a long series of inserts, when running with a relatively small redo log and an insufficient large value for MaxNoOfConcurrentTransactions, there remained transactions that were blocked by the lack of redo log and were thus not aborted in the correct state (waiting for prepare log to be sent to disk, or LOG_QUEUED state). This caused the redo log to remain blocked until unblocked by a completion of a local checkpoint. This could lead to a deadlock, when the blocked aborts in turned blocked global checkpoints, and blocked GCPs block LCPs. To prevent this situation from arising, we now abort immediately when we reach the LOG_QUEUED state in the abort state handler. (Bug #18533982)
ndbmtd supports multiple parallel receiver threads, each of which performs signal reception for a subset of the remote node connections (transporters) with the mapping of remote_nodes to receiver threads decided at node startup. Connection control is managed by the multi-instance TRPMAN block, which is organized as a proxy and workers, and each receiver thread has a TRPMAN worker running locally.
The QMGR block sends signals to TRPMAN to enable and disable communications with remote nodes. These signals are sent to the TRPMAN proxy, which forwards them to the workers. The workers themselves decide whether to act on signals, based on the set of remote nodes they manage.
The current issue arises because the mechanism used by the TRPMAN workers for determining which connections they are responsible for was implemented in such a way that each worker thought it was responsible for all connections. This resulted in the TRPMAN actions for OPEN_COMORD, ENABLE_COMREQ, and CLOSE_COMREQ being processed multiple times.
The fix keeps TRPMAN instances (receiver threads) executing OPEN_COMORD, ENABLE_COMREQ and CLOSE_COMREQ requests. In addition, the correct TRPMAN instance is now chosen when routing from this instance for a specific remote connection. (Bug #18518037)
During data node failure handling, the transaction coordinator performing takeover gathers all known state information for any failed TC instance transactions, determines whether each transaction has been committed or aborted, and informs any involved API nodes so that they can report this accurately to their clients. The TC instance provides this information by sending TCKEY_FAILREF or TCKEY_FAILCONF signals to the API nodes as appropriate top each affected transaction.
In the event that this TC instance does not have a direct connection to the API node, it attempts to deliver the signal by routing it through another data node in the same node group as the failing TC, and sends a GSN_TCKEY_FAILREFCONF_R signal to TC block instance 0 in that data node. A problem arose in the case of multiple transaction cooridnators, when this TC instance did not have a signal handler for such signals, which led it to fail.
This issue has been corrected by adding a handler to the TC proxy block which in such cases forwards the signal to one of the local TC worker instances, which in turn attempts to forward the signal on to the API node. (Bug #18455971)
When running with a very slow main thread, and one or more transaction coordinator threads, on different CPUs, it was possible to encounter a timeout when sending a DIH_SCAN_GET_NODESREQ signal, which could lead to a crash of the data node. Now in such cases the timeout is avoided. (Bug #18449222)
Failure of multiple nodes while using ndbmtd with multiple TC threads was not handled gracefully under a moderate amount of traffic, which could in some cases lead to an unplanned shutdown of the cluster. (Bug #18069334)
A local checkpoint (LCP) is tracked using a global LCP state (c_lcpState), and each NDB table has a status indicator which indicates the LCP status of that table (tabLcpStatus). If the global LCP state is LCP_STATUS_IDLE, then all the tables should have an LCP status of TLS_COMPLETED.
When an LCP starts, the global LCP status is LCP_INIT_TABLES and the thread starts setting all the NDB tables to TLS_ACTIVE. If any tables are not ready for LCP, the LCP initialization procedure continues with CONTINUEB signals until all tables have become available and been marked TLS_ACTIVE. When this initialization is complete, the global LCP status is set to LCP_STATUS_ACTIVE.
This bug occurred when the following conditions were met:
- An LCP was in the LCP_INIT_TABLES state, and some but not all tables had been set to TLS_ACTIVE.
- The master node failed before the global LCP state changed to LCP_STATUS_ACTIVE; that is, before the LCP could finish processing all tables.
- The NODE_FAILREP signal resulting from the node failure was processed before the final CONTINUEB signal from the LCP initialization process, so that the node failure was processed while the LCP remained in the LCP_INIT_TABLES state.
Following master node failure and selection of a new one, the new master queries the remaining nodes with a MASTER_LCPREQ signal to determine the state of the LCP. At this point, since the LCP status was LCP_INIT_TABLES, the LCP status was reset to LCP_STATUS_IDLE. However, the LCP status of the tables was not modified, so there remained tables with TLS_ACTIVE. Afterwards, the failed node is removed from the LCP. If the LCP status of a given table is TLS_ACTIVE, there is a check that the global LCP status is not LCP_STATUS_IDLE; this check failed and caused the data node to fail.
Now the MASTER_LCPREQ handler ensures that the tabLcpStatus for all tables is updated to TLS_COMPLETED when the global LCP status is changed to LCP_STATUS_IDLE. (Bug #18044717)
When performing a copying ALTER TABLE operation, mysqld creates a new copy of the table to be altered. This intermediate table, which is given a name bearing the prefix #sql-, has an updated schema but contains no data. mysqld then copies the data from the original table to this intermediate table, drops the original table, and finally renames the intermediate table with the name of the original table.
mysqld regards such a table as a temporary table and does not include it in the output from SHOW TABLES; mysqldump also ignores an intermediate table. However, NDB sees no difference between such an intermediate table and any other table. This difference in how intermediate tables are viewed by mysqld (and MySQL client programs) and by the NDB storage engine can give rise to problems when performing a backup and restore if an intermediate table existed in NDB, possibly left over from a failed ALTER TABLE that used copying. If a schema backup is performed using mysqldump and the mysql client, this table is not included. However, in the case where a data backup was done using the ndb_mgm client's BACKUP command, the intermediate table was included, and was also included by ndb_restore, which then failed due to attempting to load data into a table which was not defined in the backed up schema.
To prevent such failures from occurring, ndb_restore now by default ignores intermediate tables created during ALTER TABLE operations (that is, tables whose names begin with the prefix #sql-). A new option --exclude-intermediate-sql-tables is added that makes it possible to override the new behavior. The option's default value is TRUE; to cause ndb_restore to revert to the old behavior and to attempt to restore intermediate tables, set this option to FALSE. (Bug #17882305)
The logging of insert failures has been improved. This is intended to help diagnose occasional issues seen when writing to the mysql.ndb_binlog_index table. (Bug #17461625)
Employing a CHAR column that used the UTF8 character set as a table's primary key column led to node failure when restarting data nodes. Attempting to restore a table with such a primary key also caused ndb_restore to fail. (Bug #16895311, Bug #68893)
The --order (-o) option for the ndb_select_all utility worked only when specified as the last option, and did not work with an equals sign.
As part of this fix, the program's --help output was also aligned with the --order option's correct behavior. (Bug #64426, Bug #16374870)