MySQL NDB Cluster 7.2 Release Notes
MySQL NDB Cluster 7.2.16 is a new release of NDB Cluster, incorporating new features in the NDB storage engine, and fixing recently discovered bugs in previous MySQL NDB Cluster 7.2 development releases.
Obtaining MySQL NDB Cluster 7.2. MySQL NDB Cluster 7.2 source code and binaries can be obtained from https://dev.mysql.com/downloads/cluster/.
This release also incorporates all bug fixes and changes made in previous NDB Cluster releases, as well as all bug fixes and feature changes which were added in mainline MySQL 5.5 through MySQL 5.5.37 (see Changes in MySQL 5.5.37 (2014-03-27, General Availability)).
Handling of LongMessageBuffer shortages and statistics has been improved as follows:
The default value of LongMessageBuffer has been increased from 4 MB to 64 MB.
When this resource is exhausted, a suitable informative message is now printed in the data node log describing possible causes of the problem and suggesting possible solutions.
LongMessageBuffer usage information is now shown in the ndbinfo.memoryusage table. See the description of this table for an example and additional information.
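For monitoring purposes, the new usage information can be read with an ordinary query against the ndbinfo database through any SQL node. The following is a minimal sketch using the MySQL C API (compiled as C++); the host, user, and password shown are placeholders for illustration:

    #include <mysql.h>
    #include <cstdio>

    /* Minimal sketch: print the contents of ndbinfo.memoryusage, which as of
       this release also reports long message buffer usage. Connection
       parameters are placeholders for illustration. */
    int main()
    {
        MYSQL *conn = mysql_init(NULL);
        if (conn == NULL ||
            mysql_real_connect(conn, "localhost", "user", "password",
                               "ndbinfo", 0, NULL, 0) == NULL)
        {
            std::fprintf(stderr, "connect failed: %s\n",
                         conn ? mysql_error(conn) : "out of memory");
            if (conn != NULL) mysql_close(conn);
            return 1;
        }
        if (mysql_query(conn, "SELECT * FROM memoryusage") == 0)
        {
            MYSQL_RES *res = mysql_store_result(conn);
            if (res != NULL)
            {
                unsigned int ncols = mysql_num_fields(res);
                MYSQL_ROW row;
                while ((row = mysql_fetch_row(res)) != NULL)
                {
                    for (unsigned int i = 0; i < ncols; i++)
                        std::printf("%s%c", row[i] ? row[i] : "NULL",
                                    i + 1 < ncols ? '\t' : '\n');
                }
                mysql_free_result(res);
            }
        }
        else
        {
            std::fprintf(stderr, "query failed: %s\n", mysql_error(conn));
        }
        mysql_close(conn);
        return 0;
    }

The long message buffer appears in this table as an additional memory resource alongside data memory and index memory.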
Important Change: The server system variables ndb_index_cache_entries and ndb_index_stat_freq, which had been deprecated in a previous MySQL NDB Cluster release series, have now been removed.
(Bug #11746486, Bug #26673)
NDB Replication: A slave in MySQL NDB Cluster Replication now monitors the progression of epoch numbers received from its immediate upstream master, which can both serve as a useful check on the low-level functioning of replication, and provide a warning in the event replication is restarted accidentally at an already-applied position.
As a result of this enhancement, an epoch ID collision has the following results, depending on the state of the slave SQL thread:
Following a RESET SLAVE statement, no action is taken, in order to allow the execution of this statement without spurious warnings.
Following START SLAVE, a warning is produced that the slave is being positioned at an epoch that has already been applied.
In all other cases, the slave SQL thread is stopped as a safeguard against the possibility that a system malfunction has resulted in the re-application of an existing epoch.
(Bug #17461576)
References: See also: Bug #17369118.
NDB Cluster APIs: When an NDB API client application received a signal with an invalid block or signal number, NDB provided only a very brief error message that did not accurately convey the nature of the problem. Now in such cases, appropriate printouts are provided when a bad signal or message is detected. In addition, the message length is now checked to make certain that it matches the size of the embedded signal.
(Bug #18426180)
NDB Cluster APIs: Refactoring that was performed in MySQL NDB Cluster 7.2.15 inadvertently introduced a dependency in Ndb.hpp on a file that is not included in the distribution, which caused NDB API applications to fail to compile. The dependency has been removed.
(Bug #18293112, Bug #71803)
References: This issue is a regression of: Bug #17647637.
NDB Cluster APIs: An NDB API application sends a scan query to a data node; the scan is processed by the transaction coordinator (TC). The TC forwards a LQHKEYREQ request to the appropriate LDM, and aborts the transaction if it does not receive a LQHKEYCONF response within the specified time limit. After the transaction is successfully aborted, the TC sends a TCROLLBACKREP signal to the NDB API client, and the NDB API client processes this message by cleaning up any Ndb objects associated with the transaction.
The client receives the data which it has requested in the form of TRANSID_AI signals; these are buffered for sending at the data node, and so may be delivered after a delay. On receiving such a signal, NDB checks the transaction state and ID: if these are as expected, it processes the signal using the Ndb objects associated with that transaction.
The current bug occurs when all the following conditions are fulfilled:
The transaction coordinator aborts a transaction due to delays and sends a TCROLLBACKREP signal to the client, while at the same time a TRANSID_AI which has been buffered for delivery at an LDM is delivered to the same client.
The NDB API client considers the transaction complete on receipt of a TCROLLBACKREP signal, and immediately closes the transaction.
The client has a separate receiver thread running concurrently with the thread that is engaged in closing the transaction.
The arrival of the late TRANSID_AI interleaves with the closing of the user thread's transaction such that TRANSID_AI processing passes normal checks before closeTransaction() resets the transaction state and invalidates the receiver.
When these conditions are all met, the receiver thread continues working on the TRANSID_AI signal using the invalidated receiver; since the receiver is already invalidated, this results in a node failure.
Now the Ndb object cleanup done for TCROLLBACKREP includes invalidation of the transaction ID, so that, for a given transaction, any signal which is received after the TCROLLBACKREP arrives does not pass the transaction ID check and is silently dropped. This fix is also implemented for the TC_COMMITREF, TCROLLBACKREF, TCKEY_FAILCONF, and TCKEY_FAILREF signals.
See also Operations and Signals, for additional information about NDB messaging. (Bug #18196562)
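For reference, the following is a minimal client-side sketch (not taken from the NDB sources) of the pattern described above: an NDB API scan whose transaction is closed as soon as an abort is reported. The connect string, table name t1, and column name c1 are placeholders. With this fix, the closeTransaction() call also invalidates the transaction ID, so a TRANSID_AI signal arriving afterward is silently dropped rather than being processed with an invalidated receiver.

    #include <NdbApi.hpp>
    #include <cstdio>

    /* Minimal sketch of the client-side pattern described above; table t1 and
       column c1 are placeholders for illustration. */
    int main()
    {
        ndb_init();
        {
            Ndb_cluster_connection conn("localhost:1186");
            if (conn.connect(4, 5, 1) != 0 || conn.wait_until_ready(30, 0) < 0)
            {
                std::fprintf(stderr, "could not connect to cluster\n");
                ndb_end(0);
                return 1;
            }
            Ndb ndb(&conn, "test");
            if (ndb.init() != 0)
            {
                std::fprintf(stderr, "Ndb initialization failed\n");
                ndb_end(0);
                return 1;
            }

            NdbTransaction *trans = ndb.startTransaction();
            NdbScanOperation *scan =
                trans ? trans->getNdbScanOperation("t1") : NULL;
            if (scan == NULL ||
                scan->readTuples(NdbOperation::LM_CommittedRead) != 0)
            {
                std::fprintf(stderr, "could not set up scan\n");
                if (trans != NULL) ndb.closeTransaction(trans);
                ndb_end(0);
                return 1;
            }
            NdbRecAttr *c1 = scan->getValue("c1");

            if (trans->execute(NdbTransaction::NoCommit) == 0)
            {
                int rc;
                while ((rc = scan->nextResult(true)) == 0)
                    std::printf("c1 = %d\n", c1 ? c1->int32_value() : 0);
                if (rc == -1)  /* scan failed, for example aborted by the TC */
                    std::fprintf(stderr, "scan error: %s\n",
                                 trans->getNdbError().message);
            }
            else
            {
                /* The transaction was aborted (for example, TCROLLBACKREP was
                   received after an LQHKEYCONF timeout). */
                std::fprintf(stderr, "transaction aborted: %s\n",
                             trans->getNdbError().message);
            }

            /* Closing the transaction cleans up the associated Ndb objects
               and, with this fix, invalidates the transaction ID as well. */
            ndb.closeTransaction(trans);
        }
        ndb_end(0);
        return 0;
    }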
NDB Cluster APIs: The example ndbapi-examples/ndbapi_blob_ndbrecord/main.cpp included an internal header file (ndb_global.h) not found in the MySQL NDB Cluster binary distribution. The example now uses stdlib.h and string.h instead of this file.
(Bug #18096866, Bug #71409)
NDB Cluster APIs: ndb_restore could sometimes report Error 701 System busy with other schema operation unnecessarily when restoring in parallel. (Bug #17916243)
When an ALTER TABLE statement changed the table schema without causing a change in the table's partitioning, the new table definition did not copy the hash map from the old definition, but used the current default hash map instead. However, the table data was not reorganized according to the new hash map, which made some rows inaccessible using a primary key lookup if the two hash maps had incompatible definitions.
To keep this situation from occurring, any ALTER TABLE that entails a hash map change now triggers a reorganization of the table. In addition, when copying a table definition in such cases, the hash map is now also copied.
(Bug #18436558)
When certain queries generated signals having more than 18 data words prior to a node failure, such signals were not written correctly in the trace file. (Bug #18419554)
Checking of timeouts is handled by the signal TIME_SIGNAL. Previously, this signal was generated by the QMGR NDB kernel block in the main thread, and sent to the QMGR, DBLQH, and DBTC blocks (see NDB Kernel Blocks) as needed to check (respectively) heartbeats, disk writes, and transaction timeouts. In ndbmtd (as opposed to ndbd), these blocks all execute in different threads. This meant that if, for example, QMGR was actively working and some other thread was put to sleep, the previously sleeping thread received a large number of TIME_SIGNAL messages simultaneously when it was woken up again, with the effect that effective times moved very quickly in DBLQH as well as in DBTC. In DBLQH, this had no noticeable adverse effects, but this was not the case in DBTC; the latter block could not work on transactions even though time was still advancing, leading to a situation in which many operations appeared to time out because the transaction coordinator (TC) thread was comparatively slow in answering requests.
In addition, when the TC thread slept for longer than 1500 milliseconds, the data node crashed due to detecting that the timeout handling loop had not yet stopped. To rectify this problem, the generation of the TIME_SIGNAL has been moved into the local threads instead of QMGR; this provides for better control over how quickly TIME_SIGNAL messages are allowed to arrive.
(Bug #18417623)
The ServerPort and TcpBind_INADDR_ANY configuration parameters were not included in the output of ndb_mgmd --print-full-config.
(Bug #18366909)
After dropping an NDB table, neither the cluster log nor the output of the REPORT MemoryUsage command showed that the IndexMemory used by that table had been freed, even though the memory had in fact been deallocated. This issue was introduced in MySQL NDB Cluster 7.2.14.
(Bug #18296810)
ndb_show_tables sometimes failed with the error message Unable to connect to management server and immediately terminated, without providing the underlying reason for the failure. To provide more useful information in such cases, this program now also prints the most recent error from the Ndb_cluster_connection object used to instantiate the connection.
(Bug #18276327)
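The following minimal sketch (the connect string is a placeholder) shows how an NDB API application can surface the same information itself, using Ndb_cluster_connection::get_latest_error() and get_latest_error_msg():

    #include <NdbApi.hpp>
    #include <cstdio>

    /* Minimal sketch: report the most recent connection error from an
       Ndb_cluster_connection, much as ndb_show_tables now does when it
       cannot reach the management server. */
    int main()
    {
        ndb_init();
        {
            Ndb_cluster_connection conn("localhost:1186");
            if (conn.connect(0, 0, 1) != 0)
            {
                /* get_latest_error()/get_latest_error_msg() supply the
                   underlying reason rather than a generic failure message. */
                std::fprintf(stderr,
                             "Unable to connect to management server: "
                             "error %d: %s\n",
                             conn.get_latest_error(),
                             conn.get_latest_error_msg());
                ndb_end(0);
                return 1;
            }
            std::printf("connected to management server\n");
        }
        ndb_end(0);
        return 0;
    }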
-DWITH_NDBMTD=0 did not function correctly, which could cause the build to fail on platforms such as ARM and Raspberry Pi, which do not define the memory barrier functions required to compile ndbmtd.
(Bug #18267919)
References: See also: Bug #16620938.
The block threads managed by the multithreading scheduler communicate by placing signals in an out queue or job buffer which is set up between all block threads. This queue has a fixed maximum size, such that when it is filled up, the worker thread must wait for the consumer to drain the queue. In a highly loaded system, multiple threads could end up in a circular wait lock due to full out buffers, such that they were preventing each other from performing any useful work. This condition eventually led to the data node being declared dead and killed by the watchdog timer.
To fix this problem, we detect situations in which a circular wait lock is about to begin, and cause buffers which are otherwise held in reserve to become available for signal processing by queues which are highly loaded. (Bug #18229003)
The ndb_mgm client START BACKUP command (see Commands in the NDB Cluster Management Client) could experience occasional random failures when a ping was received prior to an expected BackupCompleted event. Now the connection established by this command is not checked until it has been properly set up.
(Bug #18165088)
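For completeness, a backup can also be started programmatically through the MGM API; the following is a minimal sketch (the connect string is a placeholder) in which ndb_mgm_start_backup() is asked to wait for completion rather than relying on intermediate replies:

    #include <NdbApi.hpp>   /* for ndb_init() / ndb_end() */
    #include <mgmapi.h>
    #include <cstdio>

    /* Minimal sketch: start a backup through the MGM API and wait until it
       has completed, the programmatic equivalent of the ndb_mgm START BACKUP
       command. */
    int main()
    {
        ndb_init();
        NdbMgmHandle handle = ndb_mgm_create_handle();
        if (handle == NULL)
        {
            ndb_end(0);
            return 1;
        }
        ndb_mgm_set_connectstring(handle, "localhost:1186");
        if (ndb_mgm_connect(handle, 0, 0, 1) != 0)
        {
            std::fprintf(stderr, "connect failed: %s\n",
                         ndb_mgm_get_latest_error_msg(handle));
            ndb_mgm_destroy_handle(&handle);
            ndb_end(0);
            return 1;
        }

        unsigned int backup_id = 0;
        struct ndb_mgm_reply reply;
        /* wait_completed = 2: do not return until the backup has completed */
        if (ndb_mgm_start_backup(handle, 2, &backup_id, &reply) != 0)
            std::fprintf(stderr, "backup failed: %s\n",
                         ndb_mgm_get_latest_error_msg(handle));
        else
            std::printf("backup %u completed\n", backup_id);

        ndb_mgm_disconnect(handle);
        ndb_mgm_destroy_handle(&handle);
        ndb_end(0);
        return 0;
    }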
The local checkpoint (LCP) lag watchdog counted the number of times a check for LCP timeout was performed by the system scheduler, and used this count to check for a timeout condition, but this caused a number of issues. To overcome these limitations, the LCP watchdog has been refactored to keep track of its own start times, and to calculate elapsed time by reading the (real) clock every time it is called. (Bug #17842035)
References: See also: Bug #17647469.
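As a rough illustration of the general technique described here (and not NDB's actual implementation), a watchdog that records its own start time and derives elapsed time from a monotonic clock on each check is unaffected by delayed or missed polls:

    #include <chrono>
    #include <cstdio>

    /* Rough illustration of the general technique (not NDB's implementation):
       the watchdog remembers its own start time and derives elapsed time from
       a monotonic clock on every check, so that delayed or missed polls do
       not distort the timeout calculation. */
    class ElapsedTimeWatchdog
    {
        typedef std::chrono::steady_clock Clock;
    public:
        explicit ElapsedTimeWatchdog(std::chrono::milliseconds limit)
            : m_limit(limit), m_start(Clock::now()) {}

        void restart() { m_start = Clock::now(); }

        /* Called periodically; returns true once the limit has been exceeded,
           regardless of how many checks were actually performed. */
        bool timed_out() const { return Clock::now() - m_start >= m_limit; }

    private:
        std::chrono::milliseconds m_limit;
        Clock::time_point m_start;
    };

    int main()
    {
        ElapsedTimeWatchdog wd(std::chrono::milliseconds(1500));
        /* ... poll the monitored activity here, calling wd.timed_out() on
           each pass and wd.restart() whenever progress is observed ... */
        if (wd.timed_out())
            std::printf("activity exceeded its time limit\n");
        return 0;
    }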
Data nodes running ndbmtd could stall while performing an online upgrade of a MySQL NDB Cluster containing a great many tables from a version prior to NDB 7.2.5 to version 7.2.5 or later. (Bug #16693068)