MySQL NDB Cluster 7.3 Release Notes
A number of improvements, listed here, have been made in the handling of issues that could arise when a large number of inserts performed during a local checkpoint (LCP) caused an overload:
Failures sometimes occurred during restart processing when trying to execute the undo log, due to a problem with finding the end of the log. This happened when unwritten pages remained at the end of the first undo file at the point where writing had moved on to the second undo file; because the undo log is executed in reverse order, this could cause old or even nonexistent log records to be executed.
This is fixed by ensuring that execution of the undo log begins at the proper end of the log and, if execution is started earlier, that any unwritten or faulty pages are ignored.
It was possible to fail during an LCP, or when performing a COPY_FRAGREQ, due to running out of operation records. We fix this by making sure that LCPs and COPY_FRAG use resources reserved for operation records, as was already the case with scan records. In addition, old code for ACC operations that was no longer required, but that could lead to failures, was removed.
When an LCP was performed while loading a table, it was possible to hit a livelock during LCP scans, due to the fact that each record inserted into new pages after the LCP had started had its LCP_SKIP flag set. Such records were discarded as intended by the LCP scan, but when inserts occurred faster than the LCP scan could discard records, the scan appeared to hang. As part of this issue, the scan failed to report any progress to the LCP watchdog, which killed the process after 70 seconds of livelock. This issue was observed when performing on the order of 250000 inserts per second over an extended period of time (120 seconds or more), using a single LDM.
This part of the fix makes a number of changes, listed here:
We now ensure that pages created after the LCP has started are not included in LCP scans; we also ensure that no records inserted into those pages have their LCP_SKIP flag set.
Handling of the scan protocol is changed such that a certain amount of progress is made by the LCP regardless of load; we now report progress to the LCP watchdog so that we avoid failure in the event that an LCP is making progress but not writing any records.
We now take steps to guarantee that LCP scans proceed more quickly than inserts can occur, by ensuring that this scanning activity is given priority, and thus that the LCP is in fact (eventually) completed.
In addition, scanning is made more efficient by prefetching tuples; this helps avoid stalls while fetching memory in the CPU. (A generic sketch of this technique follows this entry.)
Row checksums for preventing data corruption now include the tuple header bits.
(Bug #76373, Bug #20727343, Bug #76741, Bug #69994, Bug #20903880, Bug #76742, Bug #20904721, Bug #76883, Bug #20980229)
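The prefetching mentioned above is a standard technique. The following is a minimal, generic sketch of the idea, not the actual LCP scan code; it assumes the GCC/Clang __builtin_prefetch hint and a hypothetical fixed-size tuple layout.

    #include <cstddef>

    struct Tuple {                  // hypothetical fixed-size tuple
      unsigned char data[128];
    };

    void scanTuples(Tuple *tuples, std::size_t count,
                    void (*process)(const Tuple &))
    {
      for (std::size_t i = 0; i < count; i++) {
        if (i + 1 < count) {
          // GCC/Clang builtin: hint the CPU to begin loading the next
          // tuple while the current one is being processed.
          __builtin_prefetch(&tuples[i + 1]);
        }
        process(tuples[i]);
      }
    }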
Important Change; NDB Cluster APIs: Added the method Ndb::isExpectingHigherQueuedEpochs() to the NDB API for detecting when additional, newer event epochs were detected by pollEvents2().
The behavior of Ndb::pollEvents() has also been modified such that it now returns NDB_FAILURE_GCI (equal to ~(Uint64) 0) when a cluster failure has been detected.
(Bug #18753887)
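A minimal sketch of one way an event consumer might use the new method is shown below. It assumes the documented pollEvents2(), nextEvent2(), and isExpectingHigherQueuedEpochs() calls; event-operation setup, epoch bookkeeping, and error handling are omitted.

    #include <NdbApi.hpp>

    void drainEvents(Ndb *ndb)
    {
      // Wait up to 1000 ms for event data to become available.
      if (ndb->pollEvents2(1000) > 0)
      {
        while (NdbEventOperation *op = ndb->nextEvent2())
        {
          // ... consume the event data carried by op ...
          (void) op;
        }
      }

      // After a cluster failure has been reported, this indicates whether
      // newer event epochs are still expected to be queued; if not, it is
      // reasonable to tear down and recreate the event operations.
      if (!ndb->isExpectingHigherQueuedEpochs())
      {
        // ... clean up / resubscribe here ...
      }
    }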
NDB Cluster APIs: Added the Column::getSizeInBytesForRecord() method, which returns the size required for a column by an NdbRecord, depending on the column's type (text/blob, or other).
(Bug #21067283)
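The following is a minimal sketch of using the new method to compute the per-row space an NdbRecord needs for all columns of a table; summing over every column and the error handling shown are illustrative assumptions, not part of the fix.

    #include <NdbApi.hpp>

    int rowSizeForRecord(Ndb *ndb, const char *tableName)
    {
      NdbDictionary::Dictionary *dict = ndb->getDictionary();
      const NdbDictionary::Table *tab = dict->getTable(tableName);
      if (tab == NULL)
        return -1;                       // table not found

      int total = 0;
      for (int i = 0; i < tab->getNoOfColumns(); i++)
      {
        // The size returned depends on the column's type: text/blob
        // columns are handled differently from other column types.
        total += tab->getColumn(i)->getSizeInBytesForRecord();
      }
      return total;
    }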
NDB Cluster APIs: Creation and destruction of Ndb_cluster_connection objects by multiple threads could make use of the same application lock, which in some cases led to failures in the global dictionary cache. To alleviate this problem, the creation and destruction of several internal NDB API objects have been serialized.
(Bug #20636124)
NDB Cluster APIs: A number of timeouts were not handled correctly in the NDB API. (Bug #20617891)
NDB Cluster APIs: When an Ndb object created prior to a failure of the cluster was reused, the event queue of this object could still contain data node events originating from before the failure. These events could reference “old” epochs (from before the failure occurred), which in turn could violate the assumption made by the nextEvent() method that epoch numbers always increase. This issue is addressed by explicitly clearing the event queue in such cases.
(Bug #18411034)
References: See also: Bug #20888668.
After restoring the database metadata (but not any data) by running ndb_restore --restore-meta (or -m), SQL nodes would hang while trying to SELECT from a table in the database to which the metadata was restored. In such cases the attempt to query the table now fails as expected, since the table does not actually exist until ndb_restore is executed with --restore-data (-r).
(Bug #21184102)
References: See also: Bug #16890703.
When a great many threads opened and closed blocks in the NDB API in rapid succession, the internal close_clnt() function synchronizing the closing of the blocks waited an insufficiently long time for a self-signal indicating potential additional signals needing to be processed. This led to excessive CPU usage by ndb_mgmd, and prevented other threads from opening or closing other blocks. This issue is fixed by changing the polling call in this function to wait on a specific condition to be woken up (that is, until a signal has in fact been executed).
(Bug #21141495)
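The fix is internal to the NDB API, but the underlying pattern of waiting on an explicit condition instead of polling with a guessed timeout can be shown generically. A minimal sketch using standard C++ primitives, not the actual close_clnt() code:

    #include <condition_variable>
    #include <mutex>

    struct BlockCloser {
      std::mutex m;
      std::condition_variable cv;
      bool signal_executed = false;

      // Called by the thread that actually executes the self-signal.
      void on_signal_executed() {
        { std::lock_guard<std::mutex> g(m); signal_executed = true; }
        cv.notify_one();
      }

      // Called by the closing code: block until the signal has really been
      // executed, rather than sleeping for a fixed (possibly too short) time.
      void wait_for_signal() {
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [this] { return signal_executed; });
      }
    };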
Previously, multiple send threads could be invoked for handling sends to the same node; these threads then competed for the same send lock. While the send lock blocked the additional send threads, work threads could be passed to other nodes.
This issue is fixed by ensuring that new send threads are not activated while there is already an active send thread assigned to the same node. In addition, a node already having an active send thread assigned to it is no longer visible to other, already active, send threads; that is, such a node is no longer added to the node list when a send thread is currently assigned to it. (Bug #20954804, Bug #76821)
Queueing of pending operations when the redo log was overloaded (the DefaultOperationRedoProblemAction API node configuration parameter) could lead to timeouts when data nodes ran out of redo log space (P_TAIL_PROBLEM errors). Now when the redo log is full, the node aborts requests instead of queuing them.
(Bug #20782580)
References: See also: Bug #20481140.
NDB statistics queries could be delayed by the error delay set for ndb_index_stat_option (default 60 seconds) when the index that was queried had been marked with an internal error. The same underlying issue could also cause ANALYZE TABLE to hang when executed against an NDB table having multiple indexes where an internal error occurred on one or more, but not all, indexes.
Now in such cases, any existing statistics are returned immediately, without waiting for any additional statistics to be discovered. (Bug #20553313, Bug #20707694, Bug #76325)
The multithreaded scheduler sends to remote nodes either directly from each worker thread or from dedicated send threads, depending on the cluster's configuration. This send might transmit all, part, or none of the available data from the send buffers. While there remained pending send data, the worker or send threads continued trying to send in a loop. The actual size of the data sent in the most recent attempt to perform a send is now tracked, and used to detect lack of send progress by the send or worker threads. When no progress has been made, and there is no other work outstanding, the scheduler takes a 1 millisecond pause to free up the CPU for use by other threads. (Bug #18390321)
References: See also: Bug #20929176, Bug #20954804.
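The progress-tracking and back-off behavior can be illustrated generically. The sketch below is not the actual scheduler code; SendPort and its members are hypothetical stand-ins for a node's send buffers.

    #include <chrono>
    #include <cstddef>
    #include <thread>

    struct SendPort {                    // hypothetical per-node send buffers
      std::size_t pending = 0;           // bytes waiting to be sent
      bool otherWork = false;            // any non-send work outstanding?

      // Attempt one send; returns the number of bytes actually written
      // (a real transporter may return 0 when it cannot accept more data).
      std::size_t trySend() {
        const std::size_t sent = pending < 256 ? pending : 256;
        pending -= sent;
        return sent;
      }
    };

    void sendLoop(SendPort &port)
    {
      while (port.pending > 0) {
        // Track how much the most recent attempt actually transmitted.
        const std::size_t sent = port.trySend();
        if (sent == 0 && !port.otherWork) {
          // No send progress and nothing else to do: pause ~1 ms so the
          // CPU is freed for other threads.
          std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
      }
    }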
In some cases, attempting to restore a table that was previously backed up failed with a File Not Found error due to a missing table fragment file. This occurred as a result of the NDB kernel BACKUP block receiving a Busy error while trying to obtain the table description, due to other traffic from external clients, and not retrying the operation.
The fix for this issue creates two separate queues for such requests: one for internal clients such as the BACKUP block or ndb_restore, and one for external clients such as API nodes. The internal queue is given priority.
Note that it has always been the case that external client applications using the NDB API (including MySQL applications running against an SQL node) are expected to handle Busy errors by retrying transactions at a later time; this expectation is not changed by the fix for this issue. (Bug #17878183)
References: See also: Bug #17916243.
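The retry expectation described above can be sketched as follows. This is a minimal illustration, with doWork as a hypothetical callback that defines the transaction's operations; a production client would typically also add a backoff delay between attempts.

    #include <NdbApi.hpp>

    bool runWithRetry(Ndb *ndb, bool (*doWork)(NdbTransaction *), int maxRetries)
    {
      for (int attempt = 0; attempt <= maxRetries; attempt++)
      {
        NdbTransaction *trans = ndb->startTransaction();
        if (trans == NULL)
        {
          if (ndb->getNdbError().status == NdbError::TemporaryError)
            continue;                    // e.g. Busy: retry later
          return false;                  // permanent failure
        }

        const bool ok = doWork(trans) &&
                        trans->execute(NdbTransaction::Commit) == 0;
        const bool temporary =
            trans->getNdbError().status == NdbError::TemporaryError;
        ndb->closeTransaction(trans);

        if (ok)
          return true;                   // committed
        if (!temporary)
          return false;                  // permanent error: give up
        // Temporary error such as Busy: loop and retry the transaction.
      }
      return false;
    }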
In some cases, the DBDICT block failed to handle repeated GET_TABINFOREQ signals after the first one, leading to possible node failures and restarts. This could be observed after setting a sufficiently high value for MaxNoOfExecutionThreads and a low value for LcpScanProgressTimeout.
(Bug #77433, Bug #21297221)
Client lookup for delivery of API signals to the correct client by the internal TransporterFacade::deliver_signal() function had no mutex protection, which could cause issues such as timeouts encountered during testing, when other clients connected to the same TransporterFacade.
(Bug #77225, Bug #21185585)
It was possible to end up with a lock on the send buffer mutex when send buffers became a limiting resource, due either to insufficient send buffer resource configuration, problems with slow or failing communications such that all send buffers became exhausted, or slow receivers failing to consume what was sent. In this situation worker threads failed to allocate send buffer memory for signals, and attempted to force a send in order to free up space, while at the same time the send thread was busy trying to send to the same node or nodes. All of these threads competed for taking the send buffer mutex, which resulted in the lock already described, reported by the watchdog as Stuck in Send. This fix is made in two parts, listed here:
The send thread no longer holds the global send thread mutex while getting the send buffer mutex; it now releases the global mutex prior to locking the send buffer mutex. This keeps worker threads from getting stuck in send in such cases.
Locking of the send buffer mutex done by the send threads now uses a try-lock. If the try-lock fails, the node to which the send was to be made is reinserted at the end of the list of send nodes so that it can be retried later. This removes the Stuck in Send condition for the send threads.
(Bug #77081, Bug #21109605)
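The try-lock part of this fix follows a common pattern that can be shown generically. The sketch below is illustrative only, not the actual kernel code; NodeSend is a hypothetical per-node send state.

    #include <deque>
    #include <mutex>

    struct NodeSend {                    // hypothetical per-node send state
      std::mutex buffer_mutex;
      void flushBuffers() { /* perform the actual send here */ }
    };

    void processSendList(std::deque<NodeSend *> &nodes)
    {
      while (!nodes.empty()) {
        NodeSend *node = nodes.front();
        nodes.pop_front();

        if (node->buffer_mutex.try_lock()) {
          node->flushBuffers();
          node->buffer_mutex.unlock();
        } else {
          // The buffer mutex is held elsewhere (for example by a worker
          // thread forcing a send): reinsert the node at the end of the
          // list and retry it later instead of blocking on the lock.
          nodes.push_back(node);
        }
      }
    }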