MySQL Cluster NDB 7.1.26 is a new release of MySQL Cluster,
incorporating new features in the
NDBCLUSTER storage engine and
fixing recently discovered bugs in previous MySQL Cluster NDB
7.1 releases.
Obtaining MySQL Cluster NDB 7.1. The latest MySQL Cluster NDB 7.1 binaries for supported platforms can be obtained from http://dev.mysql.com/downloads/cluster/. Source code for the latest MySQL Cluster NDB 7.1 release can be obtained from the same location. You can also access the MySQL Cluster NDB 7.1 development source tree at https://code.launchpad.net/~mysql/mysql-server/mysql-cluster-7.1.
This release also incorporates all bugfixes and changes made in previous MySQL Cluster releases, as well as all bugfixes and feature changes which were added in mainline MySQL 5.1 through MySQL 5.1.67 (see Changes in MySQL 5.1.67 (2012-12-21)).
Functionality Added or Changed
Added several new columns to the
transporters table and
counters for the counters
table of the ndbinfo
information database. The information provided may help in
troubleshooting transport overloads and problems with send
buffer memory allocation. For more information, see the
descriptions of these tables.
(Bug #15935206)
To provide information which can help in assessing the current
state of arbitration in a MySQL Cluster as well as in diagnosing
and correcting arbitration problems, three new
tables—membership,
arbitrator_validity_detail,
and
arbitrator_validity_summary—have
been added to the ndbinfo
information database.
(Bug #13336549)
Bugs Fixed
When an NDB table grew to contain
approximately one million rows or more per partition, it became
possible to insert rows having duplicate primary or unique keys
into it. In addition, primary key lookups began to fail, even
when matching rows could be found in the table by other means.
This issue was introduced in MySQL Cluster NDB 7.0.36, MySQL Cluster NDB 7.1.26, and MySQL Cluster NDB 7.2.9. Signs that you may have been affected include the following:
Rows left over that should have been deleted
Rows unchanged that should have been updated
Rows with duplicate unique keys due to inserts or updates (which should have been rejected) that failed to find an existing row and thus (wrongly) inserted a new one
This issue does not affect simple scans, so you can see all rows
in a given table using
SELECT * FROM table
and similar queries
that do not depend on a primary or unique key.
Upgrading to or downgrading from an affected release can be troublesome if there are rows with duplicate primary or unique keys in the table; such rows should be merged, but the best means of doing so is application dependent.
In addition, since the key operations themselves are faulty, a merge can be difficult to achieve without taking the MySQL Cluster offline, and it may be necessary to dump, purge, process, and reload the data. Depending on the circumstances, you may want or need to process the dump with an external application, or merely to reload the dump while ignoring duplicates if the result is acceptable.
Another possibility is to copy the data into another table
without the original table's unique key constraints or
primary key (recall that
CREATE
TABLE t2 SELECT * FROM t1 does not by default copy
t1's primary or unique key definitions
to t2). Following this, you can remove the
duplicates from the copy, then add back the unique constraints
and primary key definitions. Once the copy is in the desired
state, you can either drop the original table and rename the
copy, or make a new dump (which can be loaded later) from the
copy.
(Bug #16023068, Bug #67928)
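The copy-and-deduplicate approach described above can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in database so that it runs anywhere, the column names (id, val) are hypothetical, and the merge rule (keep the first-inserted row for each key) is an arbitrary choice. The right merge rule is application dependent, and the statements needed against an NDB table will differ in detail.

```python
import sqlite3

# In-memory SQLite database used purely as a stand-in for illustration.
conn = sqlite3.connect(":memory:")

# t2 plays the role of the copy made without primary or unique keys,
# so it can hold the duplicate "id" values left behind by the bug.
# Column names are hypothetical.
conn.execute("CREATE TABLE t2 (id INTEGER, val TEXT)")
conn.executemany(
    "INSERT INTO t2 (id, val) VALUES (?, ?)",
    [(1, "a"), (2, "b"), (2, "b-dup"), (3, "c"), (1, "a-dup")],
)

# Assumed merge rule: keep the first-inserted row for each id.
# (rowid is SQLite-specific; another engine needs a different tiebreaker.)
conn.execute(
    "DELETE FROM t2 WHERE rowid NOT IN "
    "(SELECT MIN(rowid) FROM t2 GROUP BY id)"
)

# With the duplicates removed, the unique constraint can be added back.
conn.execute("CREATE UNIQUE INDEX t2_id ON t2 (id)")

rows = conn.execute("SELECT id, val FROM t2 ORDER BY id").fetchall()
print(rows)
```

Once the unique index is in place, any further attempt to insert a duplicate id into the copy is rejected, which is the behavior the original table should have had.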
The multithreaded job scheduler could be suspended prematurely when there were insufficient free job buffers to allow the threads to continue. The general rule in the job thread is that any queued messages should be sent before the thread is allowed to suspend itself, which guarantees that no other threads or API clients are kept waiting for operations which have already completed. However, the number of messages in the queue was specified incorrectly, leading to increased latency in delivering signals, sluggish response, or otherwise suboptimal performance. (Bug #15908684)
The management client command ALL REPORT
BackupStatus failed with an error when used with data
nodes having multiple LQH worker threads
(ndbmtd data nodes). The issue did not affect
the node_id REPORT
BackupStatus form of this command.
(Bug #15908907)
The setting for the
DefaultOperationRedoProblemAction
API node configuration parameter was ignored, and the default
value used instead.
(Bug #15855588)
Node failure during the dropping of a table could lead to the node hanging when attempting to restart. (Bug #14787522)
During an online upgrade, certain SQL statements could cause the server to hang, resulting in the error Got error 4012 'Request ndbd time-out, maybe due to high load or communication problems' from NDBCLUSTER. (Bug #14702377)
The recently added LCP fragment scan watchdog occasionally reported problems with LCP fragment scans having very high table id, fragment id, and row count values.
This was due to the watchdog not accounting for the time spent draining the backup buffer used to buffer rows before writing to the fragment checkpoint file.
Now, in the final stage of an LCP fragment scan, the watchdog switches from monitoring rows scanned to monitoring the buffer size in bytes. The buffer size should decrease as data is written to the file, after which the file should be promptly closed. (Bug #14680057)
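The change in what the watchdog monitors can be illustrated with a toy model. The sketch below is not the NDB implementation; the class name, method names, and the stall threshold are all hypothetical. It only shows why watching rows scanned produces a false alarm while the buffer drains, and why switching to the buffer size avoids it.

```python
class LcpScanWatchdog:
    """Toy watchdog (hypothetical): flags a stall when the watched value
    stays unchanged for max_stalled_checks consecutive checks."""

    def __init__(self, max_stalled_checks=3):
        self.max_stalled_checks = max_stalled_checks
        self.draining = False      # False: watch rows scanned; True: watch buffer bytes
        self.last_value = None
        self.stalled_checks = 0

    def start_draining(self):
        # Final stage of the scan: progress is now measured by the
        # buffer emptying, not by the row count increasing.
        self.draining = True
        self.last_value = None
        self.stalled_checks = 0

    def check(self, rows_scanned, buffer_bytes):
        """Record one observation; return True if progress looks stalled."""
        value = buffer_bytes if self.draining else rows_scanned
        if value == self.last_value:
            self.stalled_checks += 1
        else:
            self.stalled_checks = 0
            self.last_value = value
        return self.stalled_checks >= self.max_stalled_checks
```

In this model, a watchdog that keeps watching the row count sees no change while the buffer is being written out and eventually reports a stall. One that calls start_draining() first sees the byte count fall on every check and stays quiet.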
Job buffers act as the internal queues for work requests (signals) between block threads in ndbmtd and could be exhausted if too many signals are sent to a block thread.
Performing pushed joins in the DBSPJ kernel
block can execute multiple branches of the query tree in
parallel, which means that the number of signals being sent can
increase as more branches are executed. If
DBSPJ execution cannot be completed before
the job buffers are filled, the data node can fail.
This problem could be identified by multiple instances of the message sleeploop 10!! in the cluster out log, possibly followed by job buffer full. If the job buffers overflowed more gradually, there could also be failures due to error 1205 (Lock wait timeout exceeded), shutdowns initiated by the watchdog timer, or other timeout related errors. These were due to the slowdown caused by the 'sleeploop'.
Normally up to a 1:4 fanout ratio between consumed and produced signals is permitted. However, since there can be a potentially unlimited number of rows returned from the scan (and multiple scans of this type executing in parallel), any ratio greater than 1:1 in such cases makes it possible to overflow the job buffers.
The fix for this issue defers any lookup child which otherwise would have been executed in parallel with another, to resume when its parallel child completes one of its own requests. This restricts the fanout ratio for bushy scan-lookup joins to 1:1. (Bug #14709490)
References: See also Bug #14648712.
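The effect of the fanout ratio on queue depth can be seen in a simple arithmetic model. This is a sketch only: the function name, the per-round consumption limit, and the starting queue depth are assumptions chosen for illustration, and do not reflect the actual sizes or scheduling of ndbmtd job buffers.

```python
def queued_signals(fanout, rounds, consumed_per_round=100, start=100):
    """Return the queue depth after each round in a toy model.

    Each round the thread consumes up to consumed_per_round signals,
    and every consumed signal produces `fanout` new ones.
    """
    queued = start
    history = []
    for _ in range(rounds):
        consumed = min(queued, consumed_per_round)
        queued = queued - consumed + consumed * fanout
        history.append(queued)
    return history


# 1:1 fanout: every consumed signal is replaced by exactly one new signal.
steady = queued_signals(fanout=1, rounds=3)

# 1:4 fanout: the queue gains three signals for each one consumed.
growing = queued_signals(fanout=4, rounds=3)
```

With a 1:1 ratio the queue depth in this model stays constant, while with 1:4 it grows every round for as long as input keeps arriving, so any fixed-size buffer is eventually exceeded.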
Under certain rare circumstances, MySQL Cluster data nodes could
crash in conjunction with a configuration change on the data
nodes from a single-threaded to a multi-threaded transaction
coordinator (using the
ThreadConfig
configuration parameter for ndbmtd). The
problem occurred when a mysqld that had been
started prior to the change was shut down following the rolling
restart of the data nodes required to effect the configuration
change.
(Bug #14609774)