MySQL NDB Cluster 7.2 Release Notes

24 Changes in MySQL NDB Cluster 7.2.15 (5.5.35-ndb-7.2.15) (2014-02-05, General Availability)

MySQL NDB Cluster 7.2.15 is a new release of NDB Cluster, incorporating new features in the NDB storage engine, and fixing recently discovered bugs in previous MySQL NDB Cluster 7.2 development releases.

Obtaining MySQL NDB Cluster 7.2.  MySQL NDB Cluster 7.2 source code and binaries can be obtained from https://dev.mysql.com/downloads/cluster/.

This release also incorporates all bug fixes and changes made in previous NDB Cluster releases, as well as all bug fixes and feature changes which were added in mainline MySQL 5.5 through MySQL 5.5.35 (see Changes in MySQL 5.5.35 (2013-12-03, General Availability)).

Bugs Fixed

  • Packaging; Solaris: Compilation of ndbmtd failed on Solaris 10 and 11 for 32-bit x86, and the binary was not included in the binary distributions for these platforms. (Bug #16620938)

  • Microsoft Windows: Timers used in timing scheduler events in the NDB kernel have been refactored, in part to ensure that they are monotonic on all platforms. In particular, on Windows, event intervals were previously calculated using values obtained from GetSystemTimeAsFileTime(), which reads directly from the system time (wall clock), and which may arbitrarily be reset backward or forward, leading to false watchdog or heartbeat alarms, or even node shutdown. Lack of timer monotonicity could also cause slow disk writes during backups and global checkpoints. To fix this issue, the Windows implementation now uses QueryPerformanceCounter() instead of GetSystemTimeAsFileTime(). In the event that a monotonic timer is not found on startup of the data nodes, a warning is logged.

    In addition, on all platforms, a check is now performed at compile time for available system monotonic timers, and the build fails if one cannot be found; note that CLOCK_HIGHRES is now supported as an alternative to CLOCK_MONOTONIC if the latter is not available. (Bug #17647637)
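
    The difference between wall-clock and monotonic time sources described above can be illustrated with a minimal Python sketch. This is not NDB code: the helper name measure_interval() is hypothetical, and Python's time.monotonic() merely stands in for platform timers such as CLOCK_MONOTONIC.

```python
import time


def measure_interval(work, clock=time.monotonic):
    """Return the seconds taken by work(), measured with the given clock.

    time.monotonic() cannot go backward, so the result is never negative.
    Passing time.time (wall clock) instead would make the result sensitive
    to system clock adjustments made while work() runs.
    """
    start = clock()
    work()
    return clock() - start


if __name__ == "__main__":
    elapsed = measure_interval(lambda: time.sleep(0.01))
    print(f"elapsed: {elapsed:.4f} s")
```

    Injecting the clock as a parameter is a design choice for this sketch only; it makes it easy to simulate a clock that is reset backward and observe the negative interval that results.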

  • NDB Disk Data: When using Disk Data tables and ndbmtd data nodes, it was possible for the undo buffer to become overloaded, leading to a crash of the data nodes. This issue was more likely to be encountered when using Disk Data columns whose size was approximately 8K or larger. (Bug #16766493)

  • NDB Cluster APIs: UINT_MAX64 was treated as a signed value by Visual Studio 2010. To prevent this from happening, the value is now explicitly defined as unsigned. (Bug #17947674)

    References: See also: Bug #17647637.

  • NDB Cluster APIs: It was possible for an Ndb object to receive signals for handling before it was initialized, leading to thread interleaving and possible data node failure when executing a call to Ndb::init(). To guard against this happening, a check is now made, when the Ndb object starts to receive signals, that it is properly initialized before any signals are actually handled. (Bug #17719439)

  • NDB Cluster APIs: Compilation of example NDB API program files failed due to missing include directives. (Bug #17672846, Bug #70759)

  • ndbmemcache: When an attempt to start memcached with a cache_size larger than the available memory and with preallocate=true failed, the error message provided only a numeric code, and did not indicate the actual source of the error. (Bug #17509293, Bug #70403)

  • ndbmemcache: A memcached server running the NDB engine could crash after being disconnected from a cluster. (Bug #14055851)

  • MySQL NDB ClusterJ: A call to setPartitionKey() failed after a previous transaction had failed. This occurred because the state of the transaction was not properly cleaned up after the transaction failure. This fix adds the cleanup that is needed in this situation. (Bug #17885485)

  • Monotonic timers on several platforms can experience issues which might result in the monotonic clock doing small jumps back in time. This is due to imperfect synchronization of clocks between multiple CPU cores and does not normally have an adverse effect on the scheduler and watchdog mechanisms; so we handle some of these cases by making backtick protection less strict, although we continue to ensure that the backtick is less than 10 milliseconds. This fix also removes several checks for backticks which are thereby made redundant. (Bug #17973819)

  • Under certain specific circumstances, in a cluster having two SQL nodes, one of these could hang, and could not be accessed again even after killing the mysqld process and restarting it. (Bug #17875885, Bug #18080104)

    References: See also: Bug #17934985.

  • Poor support or lack of support on some platforms for monotonic timers caused issues with delayed signal handling by the job scheduler for the multithreaded data node. Variances (timer leaps) on such platforms are now handled by the multithreaded data node process in the same way that they are by the single-threaded version. (Bug #17857442)

    References: See also: Bug #17475425, Bug #17647637.

  • In some cases, with ndb_join_pushdown enabled, it was possible to obtain from a valid query the error Got error 290 'Corrupt key in TC, unable to xfrm' from NDBCLUSTER even though the data was not actually corrupted.

    It was determined that a NULL in a VARCHAR column could be used to construct a lookup key, but since NULL is never equal to any other value, such a lookup could simply have been eliminated instead. This NULL lookup in turn led to the spurious error message.

    This fix takes advantage of the fact that a key lookup with NULL never finds any matching rows, and so NDB does not try to perform the lookup that would have led to the error. (Bug #17845161)
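
    The idea behind this fix can be sketched in Python as follows. This is an illustration only, not the NDB implementation: the function lookup_rows() and the dictionary-based index are hypothetical, and None stands in for SQL NULL.

```python
from typing import Optional, Sequence


def lookup_rows(index: dict, key: Sequence[Optional[str]]) -> list:
    """Return rows matching key, skipping the lookup if any key part is NULL.

    In SQL, NULL never compares equal to any value, so a key containing
    NULL (None here) cannot match any row and no lookup is attempted.
    """
    if any(part is None for part in key):
        return []
    return index.get(tuple(key), [])
```

    With an index such as {("a",): [1]}, a lookup for ["a"] consults the index, while a lookup for [None] returns an empty result without touching it.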

  • It was theoretically possible in certain cases for a number of output functions internal to the NDB code to supply an uninitialized buffer as output. Now in such cases, a newline character is printed instead. (Bug #17775602, Bug #17775772)

  • Use of the localtime() function in NDB multithreading code led to nondeterministic failures in ndbmtd. This fix replaces this function, which on many platforms uses a buffer shared among multiple threads, with localtime_r(), which stores its result in a buffer supplied by the caller. (Bug #17750252)

  • When using single-threaded (ndbd) data nodes with RealTimeScheduler enabled, the CPU did not, as intended, temporarily lower its scheduling priority to normal every 10 milliseconds to give other, non-realtime threads a chance to run. (Bug #17739131)

  • During arbitrator selection, QMGR (see The QMGR Block) runs through a series of states, the first few of which are (in order) NULL, INIT, FIND, PREP1, PREP2, and START. A check for an arbitration selection timeout occurred in the FIND state, even though the corresponding timer was not set until QMGR reached the PREP1 and PREP2 states. Attempting to read the resulting uninitialized timestamp value could lead to false Could not find an arbitrator, cluster is not partition-safe warnings.

    This fix moves the setting of the timer for arbitration timeout to the INIT state, so that the value later read during FIND is always initialized. (Bug #17738720)
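
    The ordering problem can be modeled with a small Python sketch, assuming a simplified state machine. The class ArbitSelection, its methods, and the millisecond timestamps are hypothetical and are not taken from the QMGR source; only the state names come from the description above.

```python
from enum import Enum
from typing import Optional


class ArbitState(Enum):
    NULL = 0
    INIT = 1
    FIND = 2
    PREP1 = 3
    PREP2 = 4
    START = 5


class ArbitSelection:
    """Hypothetical model: the timeout timer is set on entering INIT."""

    def __init__(self, timeout_ms: int):
        self.state = ArbitState.NULL
        self.timeout_ms = timeout_ms
        self.started_ms: Optional[int] = None  # unset until INIT

    def enter(self, state: ArbitState, now_ms: int) -> None:
        self.state = state
        if state is ArbitState.INIT:
            self.started_ms = now_ms  # timer set before FIND can read it

    def timed_out(self, now_ms: int) -> bool:
        if self.started_ms is None:
            raise RuntimeError("arbitration timer read before it was set")
        return now_ms - self.started_ms > self.timeout_ms
```

    In this model, a timeout check made in FIND always reads a timestamp recorded in INIT. Had the timer been set only in PREP1 or PREP2, the same check in FIND would raise the error shown rather than silently reading an unset value.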

  • The global checkpoint lag watchdog tracked the number of times a check for GCP lag was performed using the system scheduler and used this count to check for a timeout condition, but this caused a number of issues. To overcome these limitations, the GCP watchdog has been refactored to keep track of its own start times, and to calculate elapsed time by reading the (real) clock every time it is called.

    In addition, any backticks (rare in any case) are now handled by taking the backward time as the new current time and calculating the elapsed time for this round as 0. Finally, any ill effects of a forward leap, which possibly could expire the watchdog timer immediately, are reduced by never calculating an elapsed time longer than the requested delay time for the watchdog timer. (Bug #17647469)

    References: See also: Bug #17842035.
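
    The elapsed-time rules described for this fix can be expressed as a short Python sketch. The function name advance_watchdog() and the use of integer milliseconds are assumptions made for illustration; the actual watchdog code may differ.

```python
def advance_watchdog(last_ms: int, now_ms: int, delay_ms: int):
    """Return (elapsed_ms, new_last_ms) for one watchdog round.

    - Backtick (now < last): count 0 elapsed, adopt the earlier time.
    - Forward leap: never count more than the requested delay.
    """
    if now_ms < last_ms:
        return 0, now_ms
    return min(now_ms - last_ms, delay_ms), now_ms
```

    For example, with a requested delay of 100 ms, a clock reading that moves from 1000 to 990 yields an elapsed time of 0, and a reading that leaps from 1000 to 5000 yields an elapsed time of 100 rather than 4000.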

  • The length of the interval (intended to be 10 seconds) between warnings for GCP_COMMIT when the GCP progress watchdog did not detect progress in a global checkpoint was not always calculated correctly. (Bug #17647213)

  • In certain rare cases on commit of a transaction, an Ndb object was released before the transaction coordinator (DBTC kernel block) sent the expected COMMIT_CONF signal; NDB failed to send a COMMIT_ACK signal in response, which caused a memory leak in the NDB kernel that could later lead to node failure.

    Now an Ndb object is not released until the COMMIT_CONF signal has actually been received. (Bug #16944817)

  • After restoring the database metadata (but not any data) by running ndb_restore --restore-meta (or -m), SQL nodes would hang while trying to SELECT from a table in the database to which the metadata was restored. In such cases the attempt to query the table now fails as expected, since the table does not actually exist until ndb_restore is executed with --restore-data (-r). (Bug #16890703)

    References: See also: Bug #21184102.

  • Losing its connections to the management node or data nodes while a query against the ndbinfo.memoryusage table was in progress caused the SQL node where the query was issued to fail. (Bug #14483440, Bug #16810415)

  • The ndbd_redo_log_reader utility now supports a --help option. Using this option causes the program to print basic usage information, and then to exit. (Bug #11749591, Bug #36805)