MySQL NDB Cluster 7.2 Release Notes

33 Changes in MySQL NDB Cluster 7.2.6 (5.5.22-ndb-7.2.6) (2012-05-21, General Availability)

MySQL NDB Cluster 7.2.6 is a new release of NDB Cluster, incorporating new features in the NDB storage engine, and fixing recently discovered bugs in previous MySQL NDB Cluster 7.2 releases.

Obtaining MySQL NDB Cluster 7.2.  MySQL NDB Cluster 7.2 source code and binaries can be obtained from https://dev.mysql.com/downloads/cluster/.

This release also incorporates all bug fixes and changes made in previous NDB Cluster releases, as well as all bug fixes and feature changes which were added in mainline MySQL 5.5 through MySQL 5.5.22 (see Changes in MySQL 5.5.22 (2012-03-21, General Availability)).

Bugs Fixed

  • Important Change: The ALTER ONLINE TABLE ... REORGANIZE PARTITION statement can be used to create new table partitions after new empty nodes have been added to a MySQL NDB Cluster. Usually, the number of partitions to create is determined automatically, such that, if no new partitions are required, then none are created. This behavior can be overridden by creating the original table using the MAX_ROWS option, which indicates that extra partitions should be created to store a large number of rows. However, in this case ALTER ONLINE TABLE ... REORGANIZE PARTITION simply uses the MAX_ROWS value specified in the original CREATE TABLE statement to determine the number of partitions required; since this value remains constant, so does the number of partitions, and so no new ones are created. This means that the table is not rebalanced, and the new data nodes remain empty.

    To solve this problem, support is added for ALTER ONLINE TABLE ... MAX_ROWS=newvalue, where newvalue is greater than the value used with MAX_ROWS in the original CREATE TABLE statement. This larger MAX_ROWS value implies that more partitions are required; these are allocated on the new data nodes, which restores the balanced distribution of the table data.

    For more information, see ALTER TABLE Statement, and Adding NDB Cluster Data Nodes Online. (Bug #13714648)
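
    As an illustrative sketch only (the table name t1, its columns, and the MAX_ROWS values shown here are hypothetical, not taken from the bug report), redistributing such a table after adding new data nodes might look similar to the following:

        -- Original table created with an explicit MAX_ROWS value,
        -- causing extra partitions to be allocated
        CREATE TABLE t1 (
            id   INT NOT NULL PRIMARY KEY,
            data VARCHAR(255)
        ) ENGINE=NDB MAX_ROWS=100000000;

        -- After new data nodes have been added, specify a larger MAX_ROWS
        -- so that additional partitions are created on the new nodes
        ALTER ONLINE TABLE t1 MAX_ROWS=200000000;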

  • NDB Replication: Error handling in conflict detection and resolution has been improved to include errors generated for reasons other than operation execution errors, and to distinguish better between permanent errors and transient errors. Transactions failing due to transient problems are now retried rather than leading to SQL node shutdown as occurred in some cases. (Bug #13428909)

  • NDB Replication: DDL statements could sometimes be missed during replication channel cutover, due to the fact that there may not be any epochs following the last applied epoch when the slave is up to date and no new epoch has been finalized on the master. Because epochs are not consecutively numbered, there may be a gap between the last applied epoch and the next epoch; thus it is not possible to determine the number assigned to the next epoch. This meant that, if the new master did not have all epochs, it was possible for those epochs containing only DDL statements to be skipped over.

    The fix for this problem includes modifications to the mysqld binary logging code so that the next position in the binary log following the COMMIT event at the end of an epoch transaction is recorded, as well as the addition of two new columns next_file and next_position to the mysql.ndb_binlog_index table. In addition, a new replication channel cutover mechanism is defined that employs these new columns. To make use of the new cutover mechanism, it is necessary to modify the query used to obtain the start point; to simplify prevention of possible errors caused by duplication of DDL statements, a new shorthand value ddl_exist_errors is also implemented for use with the mysqld option --slave-skip-errors. It is highly recommended that you use this option and value on the new replication slave when using the modified query.

    For more information, see Implementing Failover with NDB Cluster Replication.

    Note that the existing replication channel cutover mechanism continues to function as before, including the same limitations described previously. (Bug #11762277, Bug #54854)
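
    As a rough sketch only (the authoritative procedure and query are given in Implementing Failover with NDB Cluster Replication), the modified start-point query makes use of the new columns; assuming that the user variable @latest already holds the most recently applied epoch, as obtained from mysql.ndb_apply_status, it takes a form similar to this:

        -- Find the binary log file and position immediately following the
        -- last applied epoch, using the new next_file and next_position columns
        SELECT @file := SUBSTRING_INDEX(next_file, '/', -1),
               @pos  := next_position
        FROM mysql.ndb_binlog_index
        WHERE epoch >= @latest
        ORDER BY epoch ASC
        LIMIT 1;

    When using a query of this kind, the new slave's mysqld should also be started with --slave-skip-errors=ddl_exist_errors, as recommended above.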

  • NDB Cluster APIs: An assert in memcache/include/Queue.h used by NDB could cause memcached to fail. (Bug #13874027)

  • An error handling routine in the local query handler (DBLQH) used the wrong code path, which could corrupt the transaction ID hash, causing the data node process to fail. This could in some cases possibly lead to failures of other data nodes in the same node group when the failed node attempted to restart. (Bug #14083116)

  • When a fragment scan occurring as part of a local checkpoint (LCP) stopped progressing, it kept the entire LCP from completing, which could result in redo log exhaustion, a write service outage, an inability to recover nodes, and longer system recovery times. To help keep this from occurring, MySQL NDB Cluster now implements an LCP watchdog mechanism, which monitors the fragment scans making up the LCP and takes action if the LCP is observed to be delinquent.

    This is intended to guard against any scan related system-level I/O errors or other issues causing problems with LCP and thus having a negative impact on write service and recovery times. Each node independently monitors the progress of local fragment scans occurring as part of an LCP. If no progress is made for 20 seconds, warning logs are generated every 10 seconds thereafter for up to 1 minute. At this point, if no progress has been made, the fragment scan is considered to have hung, and the node is restarted to enable the LCP to continue.

    In addition, a new ndbd exit code NDBD_EXIT_LCP_SCAN_WATCHDOG_FAIL is added to identify when this occurs. See LQH Errors, for more information. (Bug #14075825)

  • A query pushed down to the data nodes could sometimes refer to buffered rows that had already been released, and possibly overwritten by other rows. Such rows, if overwritten, could lead to incorrect results from a pushed query, and possibly even to failure of one or more data nodes or SQL nodes. (Bug #14010406)

  • DUMP 2303 in the ndb_mgm client now includes the status of the single fragment scan record reserved for a local checkpoint. (Bug #13986128)

  • Pushed joins performed as part of a stored procedure or trigger could cause spurious Out of memory errors on the SQL node where they were executed. (Bug #13945264)

    References: See also: Bug #13944272.

  • INSERT ... SELECT executed inside a trigger or stored procedure with ndb_join_pushdown enabled could lead to a crash of the SQL node on which it was run. (Bug #13901890)

    References: See also: Bug #13945264.
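
    As a purely illustrative sketch of the affected pattern (the table and trigger names here are hypothetical, and the tables are assumed to use the NDB storage engine), such a statement might appear in a trigger like this one, executed with ndb_join_pushdown enabled:

        -- Single-statement trigger whose body is an INSERT ... SELECT
        -- joining two NDB tables, making it a candidate for join pushdown
        CREATE TRIGGER audit_new_order AFTER INSERT ON orders
        FOR EACH ROW
            INSERT INTO order_audit (order_id, customer_name, city)
            SELECT NEW.id, c.name, a.city
            FROM customers AS c
            JOIN customer_addresses AS a ON a.customer_id = c.id
            WHERE c.id = NEW.customer_id;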

  • When upgrading or downgrading between a MySQL NDB Cluster version supporting distributed pushdown joins (MySQL NDB Cluster 7.2 and later) and one that did not, queries that the later MySQL NDB Cluster version tried to push down could cause data nodes still running the earlier version to fail. Now the SQL nodes check the version of the software running on the data nodes, so that queries are not pushed down if there are any data nodes in the cluster that do not support pushdown joins. (Bug #13894817)

  • ndbmemcached exited unexpectedly when more than 128 clients attempted to connect concurrently using prefixes. In addition, a NOT FOUND error was returned when the memcached engine encountered a temporary error from NDB; now the error No Ndb Instances in freelist is returned instead. (Bug #13890064, Bug #13891085)

  • The performance of ndbmemcache was degraded under a workload consisting mostly of primary key reads. (Bug #13868787, Bug #64713)

  • When the --skip-config-cache and --initial options were used together, ndb_mgmd failed to start. (Bug #13857301)

  • The memcached server failed to build correctly on 64-bit Solaris/SPARC. (Bug #13854122)

  • ALTER ONLINE TABLE failed when a DEFAULT option was used. (Bug #13830980)

  • In some cases, a restarting data node spent a very long time in Start Phase 101, in which API nodes must connect to the starting node (using NdbEventOperation), because the API nodes trying to connect failed in a live-lock scenario. This connection process uses a handshake during which a small number of messages are exchanged, with a timeout used to detect failures during the handshake.

    Prior to this fix, this timeout was set such that, if one API node encountered the timeout, all other connecting nodes would do the same. The fix also decreases this timeout. This issue (and the effect of the fix) is most likely to be observed on relatively large configurations having 10 or more data nodes and 200 or more API nodes. (Bug #13825163)

  • ndbmtd failed to restart when the size of a table definition exceeded 32K. (The size of a table definition is dependent upon a number of factors, but in general the 32K limit is encountered when a table has 250 to 300 columns.) (Bug #13824773)

  • An initial start using ndbmtd could sometimes hang. This was due to a state which occurred when several threads tried to flush a socket buffer to a remote node. In such cases, to minimize flushing of socket buffers, only one thread actually performs the send, on behalf of all threads. However, it was possible in certain cases for there to be data in the socket buffer waiting to be sent with no thread ever being chosen to perform the send. (Bug #13809781)

  • When trying to use ndb_size.pl --hostname=host:port to connect to a MySQL server running on a nonstandard port, the port argument was ignored. (Bug #13364905, Bug #62635)

  • The transaction_allow_batching server system variable was inadvertently removed from the NDB 7.2 codebase prior to General Availability. This fix restores the variable. (Bug #64697, Bug #13891116, Bug #13947227)
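
    For reference, and assuming only that the restored variable behaves as a session-scope boolean as in earlier releases, it can be enabled for the current session as shown here:

        -- Allow batching of statements within a transaction for this session
        SET SESSION transaction_allow_batching = ON;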