MySQL NDB Cluster 7.2 Release Notes

29 Changes in MySQL NDB Cluster 7.2.10 (5.5.29-ndb-7.2.10) (2013-01-02, General Availability)

MySQL NDB Cluster 7.2.10 is a new release of NDB Cluster, incorporating new features in the NDB storage engine, and fixing recently discovered bugs in previous MySQL NDB Cluster 7.2 releases.

Obtaining MySQL NDB Cluster 7.2.  MySQL NDB Cluster 7.2 source code and binaries can be obtained from https://dev.mysql.com/downloads/cluster/.

This release also incorporates all bug fixes and changes made in previous NDB Cluster releases, as well as all bug fixes and feature changes which were added in mainline MySQL 5.5 through MySQL 5.5.29 (see Changes in MySQL 5.5.29 (2012-12-21, General Availability)).

Functionality Added or Changed

  • Added several new columns to the transporters table and new counters to the counters table of the ndbinfo information database. The information provided can help in troubleshooting transporter overloads and problems with send buffer memory allocation. For more information, see the descriptions of these tables. (Bug #15935206)
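
    These columns and counters can be read with ordinary SELECT statements against the ndbinfo database. The following queries are a minimal sketch; because the exact column sets differ between releases, no specific column names are assumed here:

      SELECT * FROM ndbinfo.transporters;

      SELECT * FROM ndbinfo.counters ORDER BY node_id;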

  • To provide information which can help in assessing the current state of arbitration in a MySQL NDB Cluster as well as in diagnosing and correcting arbitration problems, 3 new tables—membership, arbitrator_validity_detail, and arbitrator_validity_summary—have been added to the ndbinfo information database. (Bug #13336549)
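
    For example, the arbitration state as seen by each node can be inspected with simple queries such as the following (a minimal sketch; see the table descriptions for the columns available in a given release):

      SELECT * FROM ndbinfo.membership;

      SELECT * FROM ndbinfo.arbitrator_validity_detail;

      SELECT * FROM ndbinfo.arbitrator_validity_summary;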

Bugs Fixed

  • NDB Replication: Setting slave_allow_batching had no effect. (Bug #15953730)
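
    Following this fix, the variable can be enabled on the slave mysqld in the usual way. slave_allow_batching is an ordinary global system variable, so a minimal example is:

      SET GLOBAL slave_allow_batching = ON;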

  • When an NDB table grew to contain approximately one million rows or more per partition, it became possible to insert rows having duplicate primary or unique keys into it. In addition, primary key lookups began to fail, even when matching rows could be found in the table by other means.

    This issue was introduced in MySQL NDB Cluster 7.0.36, MySQL NDB Cluster 7.1.26, and MySQL NDB Cluster 7.2.9. Signs that you may have been affected include the following:

    • Rows left over that should have been deleted

    • Rows unchanged that should have been updated

    • Rows with duplicate unique keys due to inserts or updates (which should have been rejected) that failed to find an existing row and thus (wrongly) inserted a new one

    This issue does not affect simple scans, so you can see all rows in a given table using SELECT * FROM table and similar queries that do not depend on a primary or unique key.

    Upgrading to or downgrading from an affected release can be troublesome if there are rows with duplicate primary or unique keys in the table; such rows should be merged, but the best means of doing so is application dependent.

    In addition, since the key operations themselves are faulty, a merge can be difficult to achieve without taking the MySQL NDB Cluster offline, and it may be necessary to dump, purge, process, and reload the data. Depending on the circumstances, you may want or need to process the dump with an external application, or merely to reload the dump while ignoring duplicates if the result is acceptable.

    Another possibility is to copy the data into another table without the original table's unique key constraints or primary key (recall that CREATE TABLE t2 SELECT * FROM t1 does not by default copy t1's primary or unique key definitions to t2). Following this, you can remove the duplicates from the copy, then add back the unique constraints and primary key definitions. Once the copy is in the desired state, you can either drop the original table and rename the copy, or make a new dump (which can be loaded later) from the copy. (Bug #16023068, Bug #67928)
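
    The copy-and-rebuild approach just described might look like the following sketch, in which t1, t2, and the key columns id and uk are hypothetical names, and the duplicate-removal step is left as a placeholder because the correct merge logic is application dependent:

      CREATE TABLE t2 ENGINE=NDBCLUSTER SELECT * FROM t1;
      -- remove duplicate rows from t2 here (application dependent)
      ALTER TABLE t2 ADD PRIMARY KEY (id), ADD UNIQUE KEY (uk);
      DROP TABLE t1;
      RENAME TABLE t2 TO t1;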

  • The management client command ALL REPORT BackupStatus failed with an error when used with data nodes having multiple LQH worker threads (ndbmtd data nodes). The issue did not affect the node_id REPORT BackupStatus form of this command. (Bug #15908907)

  • The multithreaded job scheduler could be suspended prematurely when there were insufficient free job buffers to allow the threads to continue. The general rule in the job thread is that any queued messages should be sent before the thread is allowed to suspend itself, which guarantees that no other threads or API clients are kept waiting for operations which have already completed. However, the number of messages in the queue was specified incorrectly, leading to increased latency in delivering signals, sluggish response, or otherwise suboptimal performance. (Bug #15908684)

  • The setting for the DefaultOperationRedoProblemAction API node configuration parameter was ignored, and the default value used instead. (Bug #15855588)
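
    For reference, this parameter is set in the [mysqld] (or [api]) section of the cluster's config.ini global configuration file; the fragment below is an illustrative sketch showing the QUEUE setting (ABORT is the other permitted value):

      [mysqld]
      DefaultOperationRedoProblemAction=QUEUE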

  • Node failure during the dropping of a table could lead to the node hanging when attempting to restart.

    When this happened, the NDB internal dictionary (DBDICT) lock taken by the drop table operation was held indefinitely, and the logical global schema lock taken by the SQL statement from which the drop operation originated was held until the NDB internal operation timed out. To aid in debugging such occurrences, a new dump code, DUMP 1228 (or DUMP DictDumpLockQueue), which dumps the contents of the DICT lock queue, has been added in the ndb_mgm client. (Bug #14787522)

  • Job buffers act as the internal queues for work requests (signals) between block threads in ndbmtd and could be exhausted if too many signals are sent to a block thread.

    When performing pushed joins, the DBSPJ kernel block can execute multiple branches of the query tree in parallel, which means that the number of signals being sent can increase as more branches are executed. If DBSPJ execution cannot be completed before the job buffers are filled, the data node can fail.

    This problem could be identified by multiple instances of the message sleeploop 10!! in the cluster out log, possibly followed by job buffer full. If the job buffers overflowed more gradually, there could also be failures due to error 1205 (Lock wait timeout exceeded), shutdowns initiated by the watchdog timer, or other timeout related errors. These were due to the slowdown caused by the 'sleeploop'.

    Normally, a fanout ratio of up to 1:4 between consumed and produced signals is permitted. However, since there can be a potentially unlimited number of rows returned from the scan (and multiple scans of this type executing in parallel), any ratio greater than 1:1 in such cases makes it possible to overflow the job buffers.

    The fix for this issue defers execution of any lookup child which otherwise would have been executed in parallel with another, resuming it when the parallel child completes one of its own requests. This restricts the fanout ratio for bushy scan-lookup joins to 1:1. (Bug #14709490)

    References: See also: Bug #14648712.

  • During an online upgrade, certain SQL statements could cause the server to hang, resulting in the error Got error 4012 'Request ndbd time-out, maybe due to high load or communication problems' from NDBCLUSTER. (Bug #14702377)

  • The recently added LCP fragment scan watchdog occasionally reported problems with LCP fragment scans having very high table id, fragment id, and row count values.

    This was due to the watchdog not accounting for the time spent draining the backup buffer used to buffer rows before writing to the fragment checkpoint file.

    Now, in the final stage of an LCP fragment scan, the watchdog switches from monitoring rows scanned to monitoring the buffer size in bytes. The buffer size should decrease as data is written to the file, after which the file should be promptly closed. (Bug #14680057)

  • Under certain rare circumstances, MySQL NDB Cluster data nodes could crash in conjunction with a configuration change on the data nodes from a single-threaded to a multithreaded transaction coordinator (using the ThreadConfig configuration parameter for ndbmtd). The problem occurred when a mysqld that had been started prior to the change was shut down following the rolling restart of the data nodes required to effect the configuration change. (Bug #14609774)

  • On Microsoft Windows with CMake 2.6, the build process would not stop if the create_initial_db step failed. (Bug #13713525)