MySQL NDB Cluster 7.2 Release Notes

40.14 Changes in MySQL NDB Cluster 7.2.19 (5.5.41-ndb-7.2.19) (2015-01-25, General Availability)

Bugs Fixed

  • NDB Disk Data: An update on many rows of a large Disk Data table could in some rare cases lead to node failure. In the event that such problems are observed with very large transactions on Disk Data tables you can now increase the number of page entries allocated for disk page buffer memory by raising the value of the DiskPageBufferEntries data node configuration parameter added in this release. (Bug #19958804)

  • NDB Disk Data: In some cases, during DICT master takeover, the new master could crash while attempting to roll forward an ongoing schema transaction. (Bug #19875663, Bug #74510)

  • NDB Disk Data: When a node acting as a DICT master fails, the arbitrator selects another node to take over in place of the failed node. During the takeover procedure, which includes cleaning up any schema transactions which are still open when the master failed, the disposition of the uncommitted schema transaction is decided. Normally this transaction be rolled back, but if it has completed a sufficient portion of a commit request, the new master finishes processing the commit. Until the fate of the transaction has been decided, no new TRANS_END_REQ messages from clients can be processed. In addition, since multiple concurrent schema transactions are not supported, takeover cleanup must be completed before any new transactions can be started.

    A similar restriction applies to any schema operations which are performed in the scope of an open schema transaction. The counter used to coordinate schema operation across all nodes is employed both during takeover processing and when executing any non-local schema operations. This means that starting a schema operation while its schema transaction is in the takeover phase causes this counter to be overwritten by concurrent uses, with unpredictable results.

    The scenarios just described were handled previously using a pseudo-random delay when recovering from a node failure. Now we check before the new master has rolled forward or backwards any schema transactions remaining after the failure of the previous master and avoid starting new schema transactions or performing operations using old transactions until takeover processing has cleaned up after the abandoned transaction. (Bug #19874809, Bug #74503)

  • NDB Disk Data: When a node acting as DICT master fails, it is still possible to request that any open schema transaction be either committed or aborted by sending this request to the new DICT master. In this event, the new master takes over the schema transaction and reports back on whether the commit or abort request succeeded. In certain cases, it was possible for the new master to be misidentified—that is, the request was sent to the wrong node, which responded with an error that was interpreted by the client application as an aborted schema transaction, even in cases where the transaction could have been successfully committed, had the correct node been contacted. (Bug #74521, Bug #19880747)

  • NDB Cluster APIs: The buffer allocated by an NdbScanOperation for receiving scanned rows was not released until the NdbTransaction owning the scan operation was closed. This could lead to excessive memory usage in an application where multiple scans were created within the same transaction, even if these scans were closed at the end of their lifecycle, unless NdbScanOperation::close() was invoked with the releaseOp argument equal to true. Now the buffer is released whenever the cursor navigating the result set is closed with NdbScanOperation::close(), regardless of the value of this argument. (Bug #75128, Bug #20166585)

  • The global checkpoint commit and save protocols can be delayed by various causes, including slow disk I/O. The DIH master node monitors the progress of both of these protocols, and can enforce a maximum lag time during which the protocols are stalled by killing the node responsible for the lag when it reaches this maximum. This DIH master GCP monitor mechanism did not perform its task more than once per master node; that is, it failed to continue monitoring after detecting and handling a GCP stop. (Bug #20128256)

    References: See also: Bug #19858151, Bug #20069617, Bug #20062754.

  • A number of problems relating to the fired triggers pool have been fixed, including the following issues:

    • When the fired triggers pool was exhausted, NDB returned Error 218 (Out of LongMessageBuffer). A new error code 221 is added to cover this case.

    • An additional, separate case in which Error 218 was wrongly reported now returns the correct error.

    • Setting low values for MaxNoOfFiredTriggers led to an error when no memory was allocated if there was only one hash bucket.

    • An aborted transaction now releases any fired trigger records it held. Previously, these records were held until its ApiConnectRecord was reused by another transaction.

    • In addition, for the Fired Triggers pool in the internal ndbinfo.ndb$pools table, the high value always equalled the total, due to the fact that all records were momentarily seized when initializing them. Now the high value shows the maximum following completion of initialization.

    (Bug #19976428)

  • Online reorganization when using ndbmtd data nodes and with binary logging by mysqld enabled could sometimes lead to failures in the TRIX and DBLQH kernel blocks, or in silent data corruption. (Bug #19903481)

    References: See also: Bug #19912988.

  • The local checkpoint scan fragment watchdog and the global checkpoint monitor can each exclude a node when it is too slow when participating in their respective protocols. This exclusion was implemented by simply asking the failing node to shut down, which in case this was delayed (for whatever reason) could prolong the duration of the GCP or LCP stall for other, unaffected nodes.

    To minimize this time, an isolation mechanism has been added to both protocols whereby any other live nodes forcibly disconnect the failing node after a predetermined amount of time. This allows the failing node the opportunity to shut down gracefully (after logging debugging and other information) if possible, but limits the time that other nodes must wait for this to occur. Now, once the remaining live nodes have processed the disconnection of any failing nodes, they can commence failure handling and restart the related protocol or protocol, even if the failed node takes an excessively long time to shut down. (Bug #19858151)

    References: See also: Bug #20128256, Bug #20069617, Bug #20062754.

  • A watchdog failure resulted from a hang while freeing a disk page in TUP_COMMITREQ, due to use of an uninitialized block variable. (Bug #19815044, Bug #74380)

  • Multiple threads crashing led to multiple sets of trace files being printed and possibly to deadlocks. (Bug #19724313)

  • When a client retried against a new master a schema transaction that failed previously against the previous master while the latter was restarting, the lock obtained by this transaction on the new master prevented the previous master from progressing past start phase 3 until the client was terminated, and resources held by it were cleaned up. (Bug #19712569, Bug #74154)

  • When a new data node started, API nodes were allowed to attempt to register themselves with the data node for executing transactions before the data node was ready. This forced the API node to wait an extra heartbeat interval before trying again.

    To address this issue, a number of HA_ERR_NO_CONNECTION errors (Error 4009) that could be issued during this time have been changed to Cluster temporarily unavailable errors (Error 4035), which should allow API nodes to use new data nodes more quickly than before. As part of this fix, some errors which were incorrectly categorised have been moved into the correct categories, and some errors which are no longer used have been removed. (Bug #19524096, Bug #73758)

  • When executing very large pushdown joins involving one or more indexes each defined over several columns, it was possible in some cases for the DBSPJ block (see The DBSPJ Block) in the NDB kernel to generate SCAN_FRAGREQ signals that were excessively large. This caused data nodes to fail when these could not be handled correctly, due to a hard limit in the kernel on the size of such signals (32K). This fix bypasses that limitation by breaking up SCAN_FRAGREQ data that is too large for one such signal, and sending the SCAN_FRAGREQ as a chunked or fragmented signal instead. (Bug #19390895)

  • ndb_index_stat sometimes failed when used against a table containing unique indexes. (Bug #18715165)

  • Queries against tables containing a CHAR(0) columns failed with ERROR 1296 (HY000): Got error 4547 'RecordSpecification has overlapping offsets' from NDBCLUSTER. (Bug #14798022)

  • ndb_restore failed while restoring a table which contained both a built-in conversion on the primary key and a staging conversion on a TEXT column.

    During staging, a BLOB table is created with a primary key column of the target type. However, a conversion function was not provided to convert the primary key values before loading them into the staging blob table, which resulted in corrupted primary key values in the staging BLOB table. While moving data from the staging table to the target table, the BLOB read failed because it could not find the primary key in the BLOB table.

    Now all BLOB tables are checked to see whether there are conversions on primary keys of their main tables. This check is done after all the main tables are processed, so that conversion functions and parameters have already been set for the main tables. Any conversion functions and parameters used for the primary key in the main table are now duplicated in the BLOB table. (Bug #73966, Bug #19642978)

  • Corrupted messages to data nodes sometimes went undetected, causing a bad signal to be delivered to a block which aborted the data node. This failure in combination with disconnecting nodes could in turn cause the entire cluster to shut down.

    To keep this from happening, additional checks are now made when unpacking signals received over TCP, including checks for byte order, compression flag (which must not be used), and the length of the next message in the receive buffer (if there is one).

    Whenever two consecutive unpacked messages fail the checks just described, the current message is assumed to be corrupted. In this case, the transporter is marked as having bad data and no more unpacking of messages occurs until the transporter is reconnected. In addition, an entry is written to the cluster log containing the error as well as a hex dump of the corrupted message. (Bug #73843, Bug #19582925)

  • ndb_restore --print-data truncated TEXT and BLOB column values to 240 bytes rather than 256 bytes. (Bug #65467, Bug #14571512)

  • Transporter send buffers were not updated properly following a failed send. (Bug #45043, Bug #20113145)