MySQL NDB Cluster 7.3 Release Notes
Performance: Recent improvements made to the multithreaded scheduler were intended to optimize the cache behavior of its internal data structures, with members of these structures placed such that those local to a given thread do not overflow into a cache line which can be accessed by another thread. Where required, extra padding bytes are inserted to isolate cache lines owned (or shared) by other threads, thus avoiding invalidation of the entire cache line if another thread writes into a cache line not entirely owned by itself. This optimization improved MT Scheduler performance by several percent.
It has since been found that the optimization just described
depends on the global instance of struct
thr_repository
starting at a cache line
aligned base address as well as the compiler not rearranging or
adding extra padding to the scheduler struct; it was also found
that these prerequisites were not guaranteed (or even checked).
Thus this cache line optimization previously worked only when
g_thr_repository (that is, the global instance) happened to be
cache line aligned by accident. In addition, on 64-bit platforms,
the compiler added extra padding words in struct thr_safe_pool
such that attempts to pad it to a cache line aligned size failed.
The current fix ensures that g_thr_repository is constructed on a
cache line aligned address, and modifies the constructors to
verify cache line aligned addresses where these are assumed by
design.
Results from internal testing show improvements in MT Scheduler read performance of up to 10% in some cases, following these changes. (Bug #18352514)
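The alignment requirements described above can be illustrated with a minimal C++ sketch. It assumes 64-byte cache lines, and the names PerThreadSlot, Repository, g_repo, and is_cache_aligned() are hypothetical stand-ins rather than the actual scheduler code. The sketch uses alignas so that the global instance is line aligned by construction, static_assert to catch compiler padding that breaks the size assumption, and a constructor that checks alignment at run time.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: 64-byte cache lines (common on x86-64; other CPUs differ).
constexpr std::size_t kCacheLine = 64;

// Hypothetical per-thread state, padded to a whole number of cache lines so
// a write by its owner never invalidates a line holding another thread's data.
struct alignas(kCacheLine) PerThreadSlot {
  std::uint64_t counter = 0;
  void* free_list = nullptr;
};

// Compile-time checks replace the old, unchecked assumptions about layout.
static_assert(alignof(PerThreadSlot) == kCacheLine, "slot must be line aligned");
static_assert(sizeof(PerThreadSlot) % kCacheLine == 0,
              "slot must fill whole cache lines");

inline bool is_cache_aligned(const void* p) {
  return reinterpret_cast<std::uintptr_t>(p) % kCacheLine == 0;
}

// Hypothetical stand-in for the global scheduler repository.
struct alignas(kCacheLine) Repository {
  PerThreadSlot slots[4];
  // Run-time check where alignment is assumed by design.
  Repository() { assert(is_cache_aligned(this)); }
};

// alignas makes the global instance line aligned by construction,
// not by accident.
Repository g_repo;
```

Because every slot is both aligned to and sized in whole cache lines, consecutive slots in the array can never share a line.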
NDB Cluster APIs:
Two new example programs, demonstrating reads and writes of CHAR, VARCHAR, and VARBINARY column values, have been added to storage/ndb/ndbapi-examples in the MySQL NDB Cluster source tree. For more information about these programs, including source code listings, see NDB API Simple Array Example and NDB API Simple Array Example Using Adapter.
NDB Disk Data:
An update on many rows of a large Disk Data table could in some
rare cases lead to node failure. In the event that such problems
are observed with very large transactions on Disk Data tables
you can now increase the number of page entries allocated for
disk page buffer memory by raising the value of the
DiskPageBufferEntries
data node configuration parameter added in this release.
(Bug #19958804)
NDB Disk Data:
In some cases, during DICT
master takeover,
the new master could crash while attempting to roll forward an
ongoing schema transaction.
(Bug #19875663, Bug #74510)
NDB Disk Data:
When a node acting as a DICT
master fails,
the arbitrator selects another node to take over in place of the
failed node. During the takeover procedure, which includes
cleaning up any schema transactions which are still open when
the master failed, the disposition of the uncommitted schema
transaction is decided. Normally this transaction is rolled
back, but if it has completed a sufficient portion of a commit
request, the new master finishes processing the commit. Until
the fate of the transaction has been decided, no new
TRANS_END_REQ
messages from clients can be
processed. In addition, since multiple concurrent schema
transactions are not supported, takeover cleanup must be
completed before any new transactions can be started.
A similar restriction applies to any schema operations which are performed in the scope of an open schema transaction. The counter used to coordinate schema operations across all nodes is employed both during takeover processing and when executing any non-local schema operations. This means that starting a schema operation while its schema transaction is in the takeover phase causes this counter to be overwritten by concurrent uses, with unpredictable results.
The scenarios just described were handled previously using a pseudo-random delay when recovering from a node failure. Now we check whether the new master has rolled forward or back any schema transactions remaining after the failure of the previous master, and avoid starting new schema transactions or performing operations using old transactions until takeover processing has cleaned up after the abandoned transaction. (Bug #19874809, Bug #74503)
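The takeover rule can be sketched as a small state machine, assuming a single master that tracks whether takeover cleanup is still pending. SchemaTxGate and its methods are hypothetical. The real DICT block works through signals and its own state, not a class like this. In the sketch, requests are refused and can be retried, rather than delayed by a pseudo-random interval.

```cpp
#include <cassert>

// Hypothetical model (not NDB source): while the new master resolves an
// abandoned schema transaction, it neither starts new schema transactions
// nor runs operations in old ones, so nothing else can overwrite the shared
// coordination counter.
class SchemaTxGate {
 public:
  void begin_takeover() { taking_over_ = true; }

  // Called once the abandoned transaction has been rolled forward or back.
  void takeover_cleanup_done() {
    taking_over_ = false;
    tx_open_ = false;  // the abandoned transaction no longer exists
  }

  // Only one schema transaction at a time, and none during takeover.
  bool try_begin_transaction() {
    if (taking_over_ || tx_open_) return false;  // caller retries later
    tx_open_ = true;
    return true;
  }

  bool may_run_schema_operation() const { return !taking_over_ && tx_open_; }

  void end_transaction() {
    assert(tx_open_);
    tx_open_ = false;
  }

 private:
  bool taking_over_ = false;
  bool tx_open_ = false;
};
```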
NDB Disk Data:
When a node acting as DICT
master fails, it
is still possible to request that any open schema transaction be
either committed or aborted by sending this request to the new
DICT
master. In this event, the new master
takes over the schema transaction and reports back on whether
the commit or abort request succeeded. In certain cases, it was
possible for the new master to be misidentified—that is,
the request was sent to the wrong node, which responded with an
error that was interpreted by the client application as an
aborted schema transaction, even in cases where the transaction
could have been successfully committed, had the correct node
been contacted.
(Bug #74521, Bug #19880747)
NDB Cluster APIs:
It was possible to delete an
Ndb_cluster_connection
object
while there remained instances of
Ndb
using references to it. Now
the Ndb_cluster_connection
destructor waits
for all related Ndb
objects to be released
before completing.
(Bug #19999242)
References: See also: Bug #19846392.
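A destructor that waits for its dependents can be sketched with a reference count and a condition variable. Connection and User are hypothetical stand-ins for Ndb_cluster_connection and Ndb. This is not how the NDB API is implemented internally; it only illustrates the waiting behavior.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical sketch, not the NDB API: a connection that counts the objects
// using it and whose destructor blocks until every one has been released.
class Connection {
 public:
  Connection() = default;
  Connection(const Connection&) = delete;
  Connection& operator=(const Connection&) = delete;

  ~Connection() {
    std::unique_lock<std::mutex> lk(mu_);
    cv_.wait(lk, [this] { return users_ == 0; });  // wait for all users
  }

  void acquire() {
    std::lock_guard<std::mutex> lk(mu_);
    ++users_;
  }

  void release() {
    std::lock_guard<std::mutex> lk(mu_);
    assert(users_ > 0);
    --users_;
    cv_.notify_all();  // may wake a destructor waiting for users_ == 0
  }

  int users() {
    std::lock_guard<std::mutex> lk(mu_);
    return users_;
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  int users_ = 0;
};

// Plays the role of an Ndb object: registers on construction and
// deregisters on destruction (RAII).
class User {
 public:
  explicit User(Connection& c) : conn_(c) { conn_.acquire(); }
  ~User() { conn_.release(); }
  User(const User&) = delete;
  User& operator=(const User&) = delete;

 private:
  Connection& conn_;
};
```

If a User is still alive on another thread, destroying the Connection blocks until that thread destroys it. This trades a possible wait for the absence of dangling references.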
NDB Cluster APIs:
The buffer allocated by an NdbScanOperation for receiving scanned rows was not released until the NdbTransaction owning the scan operation was closed. This could lead to excessive memory usage in an application where multiple scans were created within the same transaction, even if these scans were closed at the end of their lifecycle, unless NdbScanOperation::close() was invoked with the releaseOp argument equal to true. Now the buffer is released whenever the cursor navigating the result set is closed with NdbScanOperation::close(), regardless of the value of this argument.
(Bug #75128, Bug #20166585)
The global checkpoint commit and save protocols can be delayed
by various causes, including slow disk I/O. The
DIH
master node monitors the progress of both
of these protocols, and can enforce a maximum lag time during
which the protocols are stalled by killing the node responsible
for the lag when it reaches this maximum. This
DIH
master GCP monitor mechanism did not
perform its task more than once per master node; that is, it
failed to continue monitoring after detecting and handling a GCP
stop.
(Bug #20128256)
References: See also: Bug #19858151, Bug #20069617, Bug #20062754.
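The monitoring defect can be illustrated with a small state machine. GcpMonitor, check(), and the millisecond bookkeeping are hypothetical, and the real limit is not given in the text. The key point is that the lag counter is re-armed after a stop is handled, so a later stall is detected as well.

```cpp
#include <cassert>

// Hypothetical model of a master-side GCP monitor (not NDB source).
class GcpMonitor {
 public:
  explicit GcpMonitor(unsigned max_lag_ms) : max_lag_ms_(max_lag_ms) {
    assert(max_lag_ms_ > 0);
  }

  // Called periodically. `progressed` is true if the protocol advanced since
  // the previous check. Returns true if the lagging node should be killed.
  bool check(bool progressed, unsigned elapsed_ms) {
    if (progressed) {
      lag_ms_ = 0;
      return false;
    }
    lag_ms_ += elapsed_ms;
    if (lag_ms_ < max_lag_ms_) return false;
    ++stops_handled_;
    lag_ms_ = 0;  // re-arm: keep monitoring after handling a GCP stop
    return true;
  }

  unsigned stops_handled() const { return stops_handled_; }

 private:
  unsigned max_lag_ms_;
  unsigned lag_ms_ = 0;
  unsigned stops_handled_ = 0;
};
```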
When running mysql_upgrade on a MySQL NDB
Cluster SQL node, the expected drop of the
performance_schema
database on this node was
instead performed on all SQL nodes connected to the cluster.
(Bug #20032861)
A number of problems relating to the fired triggers pool have been fixed, including the following issues:
When the fired triggers pool was exhausted, NDB returned Error 218 (Out of LongMessageBuffer). A new error code 221 is added to cover this case.
An additional, separate case in which Error 218 was wrongly reported now returns the correct error.
Setting low values for MaxNoOfFiredTriggers led to an error when no memory was allocated if there was only one hash bucket.
An aborted transaction now releases any fired trigger records it held. Previously, these records were held until its ApiConnectRecord was reused by another transaction.
In addition, for the Fired Triggers pool in the internal ndbinfo.ndb$pools table, the high value always equalled the total, due to the fact that all records were momentarily seized when initializing them. Now the high value shows the maximum following completion of initialization.
(Bug #19976428)
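The fix to the high value amounts to resetting a high-water mark once initialization is complete. RecordPool is a hypothetical class, not NDB's fired triggers pool. In the sketch, seize() returning false marks the exhaustion point at which, according to the text above, NDB now reports Error 221.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical record pool with a high-water mark ("high" in ndb$pools terms).
class RecordPool {
 public:
  explicit RecordPool(std::size_t total) : total_(total) {
    // Initialization touches every record once, momentarily seizing it...
    for (std::size_t i = 0; i < total_; ++i) seize();
    for (std::size_t i = 0; i < total_; ++i) release();
    // ...so the peak is reset afterwards to reflect real usage only.
    high_ = in_use_;
  }

  bool seize() {
    if (in_use_ == total_) return false;  // pool exhausted
    ++in_use_;
    if (in_use_ > high_) high_ = in_use_;
    return true;
  }

  void release() {
    assert(in_use_ > 0);
    --in_use_;
  }

  std::size_t in_use() const { return in_use_; }
  std::size_t high() const { return high_; }

 private:
  std::size_t total_;
  std::size_t in_use_ = 0;
  std::size_t high_ = 0;
};
```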
Online reorganization when using ndbmtd data
nodes and with binary logging by mysqld
enabled could sometimes lead to failures in the
TRIX
and DBLQH
kernel
blocks, or in silent data corruption.
(Bug #19903481)
References: See also: Bug #19912988.
The local checkpoint scan fragment watchdog and the global checkpoint monitor can each exclude a node that is too slow to participate in their respective protocols. This exclusion was implemented by simply asking the failing node to shut down, which, if the shutdown was delayed for whatever reason, could prolong the duration of the GCP or LCP stall for other, unaffected nodes.
To minimize this time, an isolation mechanism has been added to both protocols whereby any other live nodes forcibly disconnect the failing node after a predetermined amount of time. This gives the failing node the opportunity to shut down gracefully (after logging debugging and other information) if possible, but limits the time that other nodes must wait for this to occur. Now, once the remaining live nodes have processed the disconnection of any failing nodes, they can commence failure handling and restart the related protocol or protocols, even if the failed node takes an excessively long time to shut down. (Bug #19858151)
References: See also: Bug #20128256, Bug #20069617, Bug #20062754.
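The isolation decision can be sketched as a pure function, assuming the live nodes record when they asked the slow node to shut down. isolation_step(), Action, and the grace parameter are hypothetical, and the text does not give the actual timeout value.

```cpp
#include <cassert>
#include <chrono>

using Clock = std::chrono::steady_clock;

enum class Action { KeepWaiting, StartFailureHandling, ForceDisconnect };

// Hypothetical sketch: after asking a slow node to shut down, live nodes wait
// at most `grace` before disconnecting it themselves. Failure handling starts
// once the node is disconnected, whichever way that happens.
Action isolation_step(Clock::time_point shutdown_requested,
                      Clock::time_point now,
                      std::chrono::milliseconds grace,
                      bool node_disconnected) {
  assert(now >= shutdown_requested);
  if (node_disconnected) return Action::StartFailureHandling;
  if (now - shutdown_requested >= grace) return Action::ForceDisconnect;
  return Action::KeepWaiting;  // give the node a chance to shut down cleanly
}
```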
A watchdog failure resulted from a hang while freeing a disk page in TUP_COMMITREQ, due to use of an uninitialized block variable.
(Bug #19815044, Bug #74380)
Multiple threads crashing led to multiple sets of trace files being printed and possibly to deadlocks. (Bug #19724313)
When a client retried, against a new master, a schema transaction that had previously failed against the old master while the latter was restarting, the lock that this transaction obtained on the new master prevented the old master from progressing past start phase 3 until the client was terminated and the resources it held were cleaned up. (Bug #19712569, Bug #74154)
When using the NDB storage engine, the maximum possible length of a database or table name is 63 characters, but this limit was not always strictly enforced. This meant that a statement using a name having 64 characters, such as CREATE DATABASE, DROP DATABASE, or ALTER TABLE RENAME, could cause the SQL node on which it was executed to fail. Now such statements fail with an appropriate error message.
(Bug #19550973)
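A length check of the kind this fix implies can be sketched as follows. ndb_name_fits() and check_identifier() are hypothetical, and the error text is invented for illustration rather than taken from MySQL. The helper counts bytes, so it is exact only when each character occupies one byte.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// The 63-character limit comes from the text above.
constexpr std::size_t kMaxNdbIdentifierLength = 63;

bool ndb_name_fits(const std::string& name) {
  return name.size() <= kMaxNdbIdentifierLength;  // counts bytes
}

// Hypothetical front-end check: return an error message instead of passing
// an over-long name on to the storage engine. Empty string means "OK".
std::string check_identifier(const std::string& kind, const std::string& name) {
  if (ndb_name_fits(name)) return "";
  return kind + " name exceeds " +
         std::to_string(kMaxNdbIdentifierLength) + " characters";
}
```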
When a new data node started, API nodes were allowed to attempt to register themselves with the data node for executing transactions before the data node was ready. This forced the API node to wait an extra heartbeat interval before trying again.
To address this issue, a number of HA_ERR_NO_CONNECTION errors (Error 4009) that could be issued during this time have been changed to Cluster temporarily unavailable errors (Error 4035), which should allow API nodes to use new data nodes more quickly than before. As part of this fix, some errors which were incorrectly categorised have been moved into the correct categories, and some errors which are no longer used have been removed. (Bug #19524096, Bug #73758)
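Client code that treats Error 4035 as a temporary condition can simply retry. The helper below is a hypothetical sketch. It retries only on 4035 and returns every other code, including 4009, to the caller, and it does not use the NDB API's own error classification.

```cpp
#include <cassert>
#include <functional>

// Error number taken from the text above ("Cluster temporarily unavailable").
constexpr int kClusterTemporarilyUnavailable = 4035;

// Hypothetical retry helper: `attempt` returns 0 on success or an error code.
int run_with_retry(const std::function<int()>& attempt, int max_attempts) {
  assert(max_attempts > 0);
  int rc = 0;
  for (int i = 0; i < max_attempts; ++i) {
    rc = attempt();
    if (rc != kClusterTemporarilyUnavailable) return rc;  // success or hard error
  }
  return rc;  // still unavailable after max_attempts tries
}
```

Real code would normally pause between attempts rather than retrying immediately.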
When executing very large pushdown joins involving one or more
indexes each defined over several columns, it was possible in
some cases for the DBSPJ
block (see
The DBSPJ Block) in the
NDB
kernel to generate
SCAN_FRAGREQ
signals that were excessively
large. This caused data nodes to fail when these could not be
handled correctly, due to a hard limit in the kernel on the size
of such signals (32K). This fix bypasses that limitation by
breaking up SCAN_FRAGREQ
data that is too
large for one such signal, and sending the
SCAN_FRAGREQ
as a chunked or fragmented
signal instead.
(Bug #19390895)
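The idea of sending an oversized payload as a fragmented signal can be sketched as simple chunking with in-order reassembly. fragment() and reassemble() are hypothetical, and "32K" is taken here as 32 * 1024 bytes, which is an assumption. Real NDB fragmented signals also carry headers and section identifiers, which this sketch omits.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Chunk = std::vector<std::uint8_t>;

constexpr std::size_t kMaxSignalBytes = 32 * 1024;  // assumed meaning of "32K"

// Split `data` into consecutive chunks no larger than `limit`.
// A payload that already fits is returned as a single chunk.
std::vector<Chunk> fragment(const Chunk& data, std::size_t limit = kMaxSignalBytes) {
  assert(limit > 0);
  std::vector<Chunk> parts;
  for (std::size_t off = 0; off < data.size(); off += limit) {
    const std::size_t n = std::min(limit, data.size() - off);
    const auto first = data.begin() + static_cast<std::ptrdiff_t>(off);
    parts.emplace_back(first, first + static_cast<std::ptrdiff_t>(n));
  }
  return parts;
}

// The receiver concatenates the chunks in the order they were sent.
Chunk reassemble(const std::vector<Chunk>& parts) {
  Chunk out;
  for (const auto& p : parts) out.insert(out.end(), p.begin(), p.end());
  return out;
}
```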
ndb_index_stat sometimes failed when used against a table containing unique indexes. (Bug #18715165)
Queries against tables containing a CHAR(0) column failed with ERROR 1296 (HY000): Got error 4547 'RecordSpecification has overlapping offsets' from NDBCLUSTER. (Bug #14798022)
In the NDB
kernel, it was possible for a
TransporterFacade
object to reset a buffer
while the data contained by the buffer was being sent, which
could lead to a race condition.
(Bug #75041, Bug #20112981)
mysql_upgrade failed to drop and recreate the
ndbinfo
database and its
tables as expected.
(Bug #74863, Bug #20031425)
Due to a lack of memory barriers, MySQL NDB Cluster programs
such as ndbmtd did not compile on
POWER
platforms.
(Bug #74782, Bug #20007248)
In some cases, when run against a table having an AFTER
DELETE
trigger, a
DELETE
statement that matched no
rows still caused the trigger to execute.
(Bug #74751, Bug #19992856)
A basic requirement of the NDB storage engine's design is that the transporter registry not attempt to receive data (TransporterRegistry::performReceive()) from and update the connection status (TransporterRegistry::update_connections()) of the same set of transporters concurrently, due to the fact that the updates perform final cleanup and reinitialization of buffers used when receiving data. Changing the contents of these buffers while reading or writing to them could lead to "garbage" or inconsistent signals being read or written.
During the course of work done previously to improve the implementation of the transporter facade, a mutex intended to protect against the concurrent use of the performReceive() and update_connections() methods on the same transporter was inadvertently removed. This fix adds a watchdog check for concurrent usage. In addition, update_connections() and performReceive() calls are now serialized together while polling the transporters.
(Bug #74011, Bug #19661543)
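The serialization plus watchdog described above can be sketched with one mutex and an atomic flag. TransporterSet, perform_receive(), update_connections(), and ConcurrencyWatchdog are hypothetical names; the counters stand in for the real receive and cleanup work.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical model: receiving and updating connections on the same set of
// transporters never overlap. The mutex serializes the two calls; the atomic
// flag is a watchdog that trips if the lock is ever bypassed.
class TransporterSet {
 public:
  void perform_receive() {
    std::lock_guard<std::mutex> lk(mu_);
    ConcurrencyWatchdog w(busy_);
    ++receives_;  // stands in for reading and unpacking received signals
  }

  void update_connections() {
    std::lock_guard<std::mutex> lk(mu_);
    ConcurrencyWatchdog w(busy_);
    ++updates_;  // stands in for cleaning up and reinitializing buffers
  }

  int receives() {
    std::lock_guard<std::mutex> lk(mu_);
    return receives_;
  }

  int updates() {
    std::lock_guard<std::mutex> lk(mu_);
    return updates_;
  }

 private:
  struct ConcurrencyWatchdog {
    explicit ConcurrencyWatchdog(std::atomic<bool>& flag) : flag_(flag) {
      const bool was_busy = flag_.exchange(true);
      assert(!was_busy && "concurrent receive/update on one transporter set");
      (void)was_busy;  // silence unused-variable warnings when NDEBUG is set
    }
    ~ConcurrencyWatchdog() { flag_.store(false); }
    std::atomic<bool>& flag_;
  };

  std::mutex mu_;
  std::atomic<bool> busy_{false};
  int receives_ = 0;
  int updates_ = 0;
};
```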
ndb_restore failed while restoring a table
which contained both a built-in conversion on the primary key
and a staging conversion on a
TEXT
column.
During staging, a BLOB
table is
created with a primary key column of the target type. However, a
conversion function was not provided to convert the primary key
values before loading them into the staging blob table, which
resulted in corrupted primary key values in the staging
BLOB
table. While moving data from the
staging table to the target table, the BLOB
read failed because it could not find the primary key in the
BLOB
table.
Now all BLOB
tables are checked to see
whether there are conversions on primary keys of their main
tables. This check is done after all the main tables are
processed, so that conversion functions and parameters have
already been set for the main tables. Any conversion functions
and parameters used for the primary key in the main table are
now duplicated in the BLOB
table.
(Bug #73966, Bug #19642978)
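The second pass described above can be sketched as copying each main table's primary-key conversion to its BLOB tables. Table, Convert, and propagate_pk_conversions() are hypothetical and do not reflect ndb_restore's internal structures.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

using Convert = std::function<std::string(const std::string&)>;

struct Table {
  std::string name;
  std::string main_table;  // empty for a main table; set for a BLOB table
  Convert pk_convert;      // empty if the primary key needs no conversion
};

// Run after all main tables have their conversions set: each BLOB table
// inherits the primary-key conversion of its main table.
void propagate_pk_conversions(std::vector<Table>& tables) {
  std::map<std::string, Convert> main_conv;
  for (const auto& t : tables)
    if (t.main_table.empty()) main_conv[t.name] = t.pk_convert;
  for (auto& t : tables) {
    if (t.main_table.empty()) continue;
    auto it = main_conv.find(t.main_table);
    if (it != main_conv.end()) t.pk_convert = it->second;
  }
}
```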
Corrupted messages to data nodes sometimes went undetected, causing a bad signal to be delivered to a block which aborted the data node. This failure in combination with disconnecting nodes could in turn cause the entire cluster to shut down.
To keep this from happening, additional checks are now made when unpacking signals received over TCP, including checks for byte order, compression flag (which must not be used), and the length of the next message in the receive buffer (if there is one).
Whenever two consecutive unpacked messages fail the checks just described, the current message is assumed to be corrupted. In this case, the transporter is marked as having bad data and no more unpacking of messages occurs until the transporter is reconnected. In addition, an entry is written to the cluster log containing the error as well as a hex dump of the corrupted message. (Bug #73843, Bug #19582925)
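One reading of these checks is sketched below. MsgHeader, ReceiveChecker, and on_reconnect() are hypothetical, and the real signal header packs these fields into bits whose layout is not reproduced here. Two choices are assumptions: a single failing message is simply not delivered, and a second consecutive failure marks the transporter as having bad data until it reconnects.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical decoded header.
struct MsgHeader {
  bool byte_order_ok;    // byte order matches what the receiver expects
  bool compressed;       // compression flag, which must not be set
  std::uint32_t length;  // message length in bytes
};

class ReceiveChecker {
 public:
  // Returns true if the message passes the checks and may be delivered.
  // `available` is the number of bytes left in the receive buffer.
  bool accept(const MsgHeader& h, std::size_t available) {
    if (bad_data_) return false;  // no unpacking until reconnect
    const bool ok = h.byte_order_ok && !h.compressed && h.length <= available;
    if (ok) {
      consecutive_failures_ = 0;
      return true;
    }
    if (++consecutive_failures_ >= 2) bad_data_ = true;  // assume corruption
    return false;
  }

  bool bad_data() const { return bad_data_; }

  void on_reconnect() {
    bad_data_ = false;
    consecutive_failures_ = 0;
  }

 private:
  bool bad_data_ = false;
  int consecutive_failures_ = 0;
};
```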
ndb_restore
--print-data
truncated
TEXT
and
BLOB
column values to 240 bytes
rather than 256 bytes.
(Bug #65467, Bug #14571512)
Transporter send buffers were not updated properly following a failed send. (Bug #45043, Bug #20113145)