MySQL NDB Cluster 7.3 Release Notes
Packaging; Solaris: Compilation of ndbmtd failed on Solaris 10 and 11 for 32-bit x86, and the binary was not included in the binary distributions for these platforms. (Bug #16620938)
Microsoft Windows: Timers used in timing scheduler events in the NDB kernel have been refactored, in part to ensure that they are monotonic on all platforms. In particular, on Windows, event intervals were previously calculated using values obtained from GetSystemTimeAsFileTime(), which reads directly from the system time (“wall clock”), and which may arbitrarily be reset backward or forward, leading to false watchdog or heartbeat alarms, or even node shutdown. Lack of timer monotonicity could also cause slow disk writes during backups and global checkpoints. To fix this issue, the Windows implementation now uses QueryPerformanceCounter() instead of GetSystemTimeAsFileTime(). In the event that a monotonic timer is not found on startup of the data nodes, a warning is logged.
In addition, on all platforms, a check is now performed at compile time for available system monotonic timers, and the build fails if one cannot be found; note that CLOCK_HIGHRES is now supported as an alternative for CLOCK_MONOTONIC if the latter is not available. (Bug #17647637)
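The compile-time check and clock fallback described above might look roughly like the following on a POSIX system. This is a minimal sketch, not the NDB implementation: the names kMonotonicClock and monotonic_nanos() are hypothetical, and the sketch assumes that the clock identifiers are visible to the preprocessor as macros.

```cpp
#include <cassert>
#include <cstdint>
#include <ctime>

// Select a monotonic clock at compile time; fail the build if none is found.
// Assumption: the platform exposes the clock ids as preprocessor macros.
#if defined(CLOCK_MONOTONIC)
static const clockid_t kMonotonicClock = CLOCK_MONOTONIC;
#elif defined(CLOCK_HIGHRES)
static const clockid_t kMonotonicClock = CLOCK_HIGHRES;
#else
#error "No monotonic system timer available"
#endif

// Hypothetical helper: returns the monotonic clock reading in nanoseconds.
// Unlike wall-clock time, this value is not affected by resets of the
// system time.
inline std::uint64_t monotonic_nanos() {
  timespec ts;
  const int rc = clock_gettime(kMonotonicClock, &ts);
  assert(rc == 0);  // the selected clock id is expected to be valid
  (void)rc;
  return static_cast<std::uint64_t>(ts.tv_sec) * 1000000000ULL +
         static_cast<std::uint64_t>(ts.tv_nsec);
}
```

An interval is then computed as the difference between two readings of monotonic_nanos(), which cannot be negative as long as the underlying clock behaves monotonically.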
NDB Disk Data: When using Disk Data tables and ndbmtd data nodes, it was possible for the undo buffer to become overloaded, leading to a crash of the data nodes. This issue was more likely to be encountered when using Disk Data columns whose size was approximately 8K or larger. (Bug #16766493)
NDB Cluster APIs: UINT_MAX64 was treated as a signed value by Visual Studio 2010. To prevent this from happening, the value is now explicitly defined as unsigned. (Bug #17947674)
References: See also: Bug #17647637.
NDB Cluster APIs: It was possible for an Ndb object to receive signals for handling before it was initialized, leading to thread interleaving and possible data node failure when executing a call to Ndb::init(). To guard against this, a check is now made, when the Ndb object starts to receive signals, that it is properly initialized before any signals are actually handled. (Bug #17719439)
NDB Cluster APIs: Compilation of example NDB API program files failed due to missing include directives. (Bug #17672846, Bug #70759)
NDB Cluster APIs: An application, having opened two distinct instances of Ndb_cluster_connection, attempted to use the second connection object to send signals to itself, but these signals were blocked until the destructor was explicitly called for that connection object. (Bug #17626525)
References: This issue is a regression of: Bug #16595838.
Interrupting a drop of a foreign key could cause the underlying table to become corrupt. (Bug #18041636)
Monotonic timers on several platforms can experience issues which might result in the monotonic clock doing small jumps back in time. This is due to imperfect synchronization of clocks between multiple CPU cores and does not normally have an adverse effect on the scheduler and watchdog mechanisms; so we handle some of these cases by making backtick protection less strict, although we continue to ensure that the backtick is less than 10 milliseconds. This fix also removes several checks for backticks which are thereby made redundant. (Bug #17973819)
Under certain specific circumstances, in a cluster having two SQL nodes, one of these could hang, and could not be accessed again even after killing the mysqld process and restarting it. (Bug #17875885, Bug #18080104)
References: See also: Bug #17934985.
Poor support or lack of support on some platforms for monotonic timers caused issues with delayed signal handling by the job scheduler for the multithreaded data node. Variances (timer leaps) on such platforms are now handled by the multithreaded data node process in the same way that they are by the single-threaded version. (Bug #17857442)
References: See also: Bug #17475425, Bug #17647637.
In some cases, with ndb_join_pushdown enabled, it was possible to obtain from a valid query the error Got error 290 'Corrupt key in TC, unable to xfrm' from NDBCLUSTER even though the data was not actually corrupted.
It was determined that a NULL in a VARCHAR column could be used to construct a lookup key, but since NULL is never equal to any other value, such a lookup could simply have been eliminated instead. This NULL lookup in turn led to the spurious error message.
This fix takes advantage of the fact that a key lookup with NULL never finds any matching rows, and so NDB does not try to perform the lookup that would have led to the error. (Bug #17845161)
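The reasoning behind this fix can be illustrated with a minimal sketch. The KeyPart type and the lookup_can_match() function are hypothetical and are not part of the NDB code; they only show why a lookup key containing NULL can be skipped without changing the query result.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical representation of one column value in a lookup key.
struct KeyPart {
  bool is_null;       // true if the column value is SQL NULL
  std::string value;  // column value; ignored when is_null is true
};

// Returns false if the key contains a NULL part. Since NULL is never equal
// to any other value, such a key cannot match any row, so the caller can
// skip the lookup and treat the result as empty.
inline bool lookup_can_match(const std::vector<KeyPart>& key) {
  assert(!key.empty());  // a lookup key has at least one part
  for (const KeyPart& part : key) {
    if (part.is_null) {
      return false;
    }
  }
  return true;
}
```

A caller would perform the key lookup only when lookup_can_match() returns true, and otherwise produce no rows for that lookup.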
The local checkpoint lag watchdog tracked the number of times a check for LCP timeout was performed using the system scheduler and used this count to check for a timeout condition, but this caused a number of issues. To overcome these limitations, the LCP watchdog has been refactored to keep track of its own start times, and to calculate elapsed time by reading the (real) clock every time it is called. (Bug #17842035)
References: See also: Bug #17647469.
It was theoretically possible in certain cases for a number of output functions internal to the NDB code to supply an uninitialized buffer as output. Now in such cases, a newline character is printed instead. (Bug #17775602, Bug #17775772)
Use of the localtime() function in NDB multithreading code led to nondeterministic failures in ndbmtd. This fix replaces this function, which on many platforms uses a buffer shared among multiple threads, with localtime_r(), which writes its result to a buffer supplied by the caller. (Bug #17750252)
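The difference between the two functions can be seen in a short sketch. The helper format_local_time() is hypothetical and is not taken from the NDB source; it only shows the POSIX localtime_r() calling pattern, in which each call owns its result buffer.

```cpp
#include <cassert>
#include <ctime>
#include <string>

// Hypothetical helper: formats t as local time, "YYYY-MM-DD HH:MM:SS".
// localtime_r() stores its result in the caller-supplied struct, so
// concurrent calls from different threads do not overwrite each other's
// results, as can happen with the shared buffer used by localtime().
inline std::string format_local_time(std::time_t t) {
  std::tm buf;  // per-call buffer owned by this thread
  if (localtime_r(&t, &buf) == nullptr) {
    return std::string();  // conversion failed
  }
  char out[32];
  const std::size_t n =
      std::strftime(out, sizeof(out), "%Y-%m-%d %H:%M:%S", &buf);
  assert(n > 0);  // 32 bytes is enough for this format
  return std::string(out, n);
}
```

Because the only state involved lives on the calling thread's stack, this function can be called from several threads at once without additional locking.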
When using single-threaded (ndbd) data nodes with RealTimeScheduler enabled, the CPU did not, as intended, temporarily lower its scheduling priority to normal every 10 milliseconds to give other, non-realtime threads a chance to run. (Bug #17739131)
During arbitrator selection, QMGR (see The QMGR Block) runs through a series of states, the first few of which are (in order) NULL, INIT, FIND, PREP1, PREP2, and START. A check for an arbitration selection timeout occurred in the FIND state, even though the corresponding timer was not set until QMGR reached the PREP1 and PREP2 states. Attempting to read the resulting uninitialized timestamp value could lead to false Could not find an arbitrator, cluster is not partition-safe warnings.
This fix moves the setting of the timer for arbitration timeout to the INIT state, so that the value later read during FIND is always initialized. (Bug #17738720)
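The ordering problem and its fix can be pictured with a simplified sketch. All names here (ArbitState, Arbitration, enter_state(), selection_timed_out()) are hypothetical and do not reflect the actual QMGR code; the sketch only shows the timer being set on entry to INIT so that a later read in FIND sees an initialized value.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the first arbitration states, in order.
enum class ArbitState { Null, Init, Find, Prep1, Prep2, Start };

struct Arbitration {
  ArbitState state = ArbitState::Null;
  bool timer_set = false;            // whether timer_start_ms is valid
  std::uint64_t timer_start_ms = 0;  // start of the arbitration timeout
};

// Moves to the next state. The timeout timer is started on entry to Init,
// before any state that checks it.
inline void enter_state(Arbitration& a, ArbitState next,
                        std::uint64_t now_ms) {
  if (next == ArbitState::Init) {
    a.timer_start_ms = now_ms;
    a.timer_set = true;
  }
  a.state = next;
}

// Timeout check, as might be performed in the Find state.
inline bool selection_timed_out(const Arbitration& a, std::uint64_t now_ms,
                                std::uint64_t timeout_ms) {
  assert(a.timer_set);  // guaranteed once Init has been entered
  return now_ms >= a.timer_start_ms &&
         now_ms - a.timer_start_ms >= timeout_ms;
}
```

Had the timer been started only in Prep1, the check in Find would have compared the current time against an arbitrary value, which corresponds to the false warnings described above.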
The global checkpoint lag watchdog tracked the number of times a check for GCP lag was performed using the system scheduler and used this count to check for a timeout condition, but this caused a number of issues. To overcome these limitations, the GCP watchdog has been refactored to keep track of its own start times, and to calculate elapsed time by reading the (real) clock every time it is called.
In addition, any backticks (rare in any case) are now handled by taking the backward time as the new current time and calculating the elapsed time for this round as 0. Finally, any ill effects of a forward leap, which possibly could expire the watchdog timer immediately, are reduced by never calculating an elapsed time longer than the requested delay time for the watchdog timer. (Bug #17647469)
References: See also: Bug #17842035.
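The handling of backticks and forward leaps described for the GCP watchdog can be sketched as follows. The WatchdogTimer struct and elapsed_since_last_check() function are hypothetical names used for illustration only, and times are assumed to be in milliseconds.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical watchdog timer state; all times are in milliseconds.
struct WatchdogTimer {
  std::uint64_t last_ms;   // clock reading taken at the previous check
  std::uint64_t delay_ms;  // requested delay for the watchdog timer
};

// Returns the elapsed time for this round and records now_ms as the new
// reference point.
//  - Backtick (now_ms < last_ms): the backward time becomes the new current
//    time and the elapsed time for this round is 0.
//  - Forward leap: the elapsed time is never longer than the requested
//    delay, so a single leap cannot expire the watchdog immediately.
inline std::uint64_t elapsed_since_last_check(WatchdogTimer& t,
                                              std::uint64_t now_ms) {
  assert(t.delay_ms > 0);
  std::uint64_t elapsed = 0;
  if (now_ms >= t.last_ms) {
    elapsed = std::min(now_ms - t.last_ms, t.delay_ms);
  }
  t.last_ms = now_ms;
  return elapsed;
}
```

A watchdog built this way would add the returned value to its own running total on each call, rather than counting how many times the scheduler invoked it.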
The length of the interval (intended to be 10 seconds) between warnings for GCP_COMMIT when the GCP progress watchdog did not detect progress in a global checkpoint was not always calculated correctly. (Bug #17647213)
Trying to drop an index used by a foreign key constraint caused data node failure. Now in such cases, the statement used to perform the drop fails. (Bug #17591531)
In certain rare cases on commit of a transaction, an Ndb object was released before the transaction coordinator (DBTC kernel block) sent the expected COMMIT_CONF signal; NDB failed to send a COMMIT_ACK signal in response, which caused a memory leak in the NDB kernel that could later lead to node failure.
Now an Ndb object is not released until the COMMIT_CONF signal has actually been received. (Bug #16944817)
Losing its connections to the management node or data nodes while a query against the ndbinfo.memoryusage table was in progress caused the SQL node where the query was issued to fail. (Bug #14483440, Bug #16810415)
The ndbd_redo_log_reader utility now supports a --help option. Using this option causes the program to print basic usage information, and then to exit. (Bug #11749591, Bug #36805)