MySQL NDB Cluster 7.2 Release Notes
Added several new columns to the transporters table, and new counters to the counters table, of the ndbinfo information database. The information provided may help in troubleshooting transporter overloads and problems with send buffer memory allocation. For more information, see the descriptions of these tables.
(Bug #15935206)
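For example, the new columns might be inspected with a query along these lines (a minimal sketch; the overload_count and slowdown_count column names are assumed from the troubleshooting uses described above):

    SELECT node_id, remote_node_id, status,
           overload_count, slowdown_count
    FROM ndbinfo.transporters
    WHERE overload_count > 0 OR slowdown_count > 0;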
To provide information that can help in assessing the current state of arbitration in a MySQL NDB Cluster, as well as in diagnosing and correcting arbitration problems, three new tables (membership, arbitrator_validity_detail, and arbitrator_validity_summary) have been added to the ndbinfo information database.
(Bug #13336549)
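As a quick illustration, the state of arbitration could be checked with queries such as the following:

    SELECT * FROM ndbinfo.arbitrator_validity_summary;
    SELECT * FROM ndbinfo.arbitrator_validity_detail;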
When an NDB table grew to contain approximately one million rows or more per partition, it became possible to insert rows having duplicate primary or unique keys into it. In addition, primary key lookups began to fail, even when matching rows could be found in the table by other means.
This issue was introduced in MySQL NDB Cluster 7.0.36, MySQL NDB Cluster 7.1.26, and MySQL NDB Cluster 7.2.9. Signs that you may have been affected include the following:
- Rows left over that should have been deleted
- Rows unchanged that should have been updated
- Rows with duplicate unique keys due to inserts or updates (which should have been rejected) that failed to find an existing row and thus (wrongly) inserted a new one
This issue does not affect simple scans, so you can see all rows in a given table using SELECT * FROM table and similar queries that do not depend on a primary or unique key.
Upgrading to or downgrading from an affected release can be troublesome if there are rows with duplicate primary or unique keys in the table; such rows should be merged, but the best means of doing so is application dependent.
In addition, since the key operations themselves are faulty, a merge can be difficult to achieve without taking the MySQL NDB Cluster offline, and it may be necessary to dump, purge, process, and reload the data. Depending on the circumstances, you may want or need to process the dump with an external application, or merely to reload the dump while ignoring duplicates if the result is acceptable.
Another possibility is to copy the data into another table without the original table's unique key constraints or primary key (recall that CREATE TABLE t2 SELECT * FROM t1 does not by default copy t1's primary or unique key definitions to t2). Following this, you can remove the duplicates from the copy, then add back the unique constraints and primary key definitions. Once the copy is in the desired state, you can either drop the original table and rename the copy, or make a new dump (which can be loaded later) from the copy.
(Bug #16023068, Bug #67928)
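A minimal sketch of this copy-and-deduplicate workaround follows; the table name t1 and the key column id are hypothetical, and the deduplication step itself is application dependent:

    CREATE TABLE t2 ENGINE=NDB SELECT * FROM t1;  -- key definitions are not copied
    -- remove duplicate rows from t2 here (application dependent)
    ALTER TABLE t2 ADD PRIMARY KEY (id);          -- restore the key definitions
    DROP TABLE t1;
    RENAME TABLE t2 TO t1;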
The management client command ALL REPORT BackupStatus failed with an error when used with data nodes having multiple LQH worker threads (ndbmtd data nodes). The issue did not affect the node_id REPORT BackupStatus form of this command.
(Bug #15908907)
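For illustration, the failing form and the unaffected single-node form of the command, as issued in the management client (the node ID 5 here is an arbitrary example):

    ndb_mgm> ALL REPORT BackupStatus
    ndb_mgm> 5 REPORT BackupStatus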
The multithreaded job scheduler could be suspended prematurely when there were insufficient free job buffers to allow the threads to continue. The general rule in the job thread is that any queued messages should be sent before the thread is allowed to suspend itself, which guarantees that no other threads or API clients are kept waiting for operations which have already completed. However, the number of messages in the queue was specified incorrectly, leading to increased latency in delivering signals, sluggish response, or otherwise suboptimal performance. (Bug #15908684)
The setting for the DefaultOperationRedoProblemAction API node configuration parameter was ignored, and the default value used instead.
(Bug #15855588)
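As a sketch only, the parameter is set in an API node section of config.ini; QUEUE is one of its documented values and is shown here purely as an example:

    [mysqld]
    DefaultOperationRedoProblemAction=QUEUE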
Node failure during the dropping of a table could lead to the node hanging when attempting to restart.
When this happened, the NDB internal dictionary (DBDICT) lock taken by the drop table operation was held indefinitely, and the logical global schema lock taken by the SQL statement from which the drop operation originated was held until the NDB internal operation timed out. To aid in debugging such occurrences, a new dump code, DUMP 1228 (or DUMP DictDumpLockQueue), which dumps the contents of the DICT lock queue, has been added in the ndb_mgm client.
(Bug #14787522)
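A brief sketch of issuing the new dump code in the management client (ALL targets every data node; a single node ID may be used instead):

    ndb_mgm> ALL DUMP 1228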
Job buffers act as the internal queues for work requests (signals) between block threads in ndbmtd and could be exhausted if too many signals are sent to a block thread.
Executing pushed joins in the DBSPJ kernel block can cause multiple branches of the query tree to be executed in parallel, which means that the number of signals being sent can increase as more branches are executed. If DBSPJ execution cannot be completed before the job buffers are filled, the data node can fail.
This problem could be identified by multiple instances of the message sleeploop 10!! in the cluster out log, possibly followed by job buffer full. If the job buffers overflowed more gradually, there could also be failures due to error 1205 (Lock wait timeout exceeded), shutdowns initiated by the watchdog timer, or other timeout-related errors, all of them due to the slowdown caused by the sleeploop.
Normally, a fanout ratio of up to 1:4 between consumed and produced signals is permitted. However, since a potentially unlimited number of rows can be returned from the scan (and multiple scans of this type can execute in parallel), any ratio greater than 1:1 in such cases makes it possible to overflow the job buffers.
The fix for this issue defers any lookup child that would otherwise have been executed in parallel with another, resuming it when the parallel child completes one of its own requests. This restricts the fanout ratio for bushy scan-lookup joins to 1:1. (Bug #14709490)
References: See also: Bug #14648712.
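For context, the following is a hedged sketch of the sort of bushy scan-lookup join that could exercise this code path; the table and column names are hypothetical, and ndb_join_pushdown is the server variable that enables pushed joins:

    SET ndb_join_pushdown = ON;
    -- Two lookup children share the same parent scan of t1, so their
    -- branches could be executed in parallel by DBSPJ.
    SELECT *
    FROM t1
    JOIN t2 ON t2.pk = t1.ref_a
    JOIN t3 ON t3.pk = t1.ref_b;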
During an online upgrade, certain SQL statements could cause the server to hang, resulting in the error Got error 4012 'Request ndbd time-out, maybe due to high load or communication problems' from NDBCLUSTER. (Bug #14702377)
The recently added LCP fragment scan watchdog occasionally reported problems with LCP fragment scans having very high table id, fragment id, and row count values.
This was due to the watchdog not accounting for the time spent draining the backup buffer used to buffer rows before writing to the fragment checkpoint file.
Now, in the final stage of an LCP fragment scan, the watchdog switches from monitoring rows scanned to monitoring the buffer size in bytes. The buffer size should decrease as data is written to the file, after which the file should be promptly closed. (Bug #14680057)
Under certain rare circumstances, MySQL NDB Cluster data nodes could crash in conjunction with a configuration change on the data nodes from a single-threaded to a multithreaded transaction coordinator (using the ThreadConfig configuration parameter for ndbmtd). The problem occurred when a mysqld that had been started prior to the change was shut down following the rolling restart of the data nodes required to effect the configuration change.
(Bug #14609774)
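Purely as an illustrative sketch, a ThreadConfig value specifying multiple transaction coordinator (tc) threads might look like the following in config.ini; the thread counts shown are arbitrary:

    [ndbd default]
    ThreadConfig=main={count=1},ldm={count=4},tc={count=2},recv={count=1},send={count=1}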