MySQL NDB Cluster 7.2 Release Notes

19 Changes in MySQL NDB Cluster 7.2.20 (5.5.43-ndb-7.2.20) (2015-04-13, General Availability)

MySQL NDB Cluster 7.2.20 is a new release of NDB Cluster, incorporating new features in the NDB storage engine, and fixing recently discovered bugs in previous MySQL NDB Cluster 7.2 development releases.

Obtaining MySQL NDB Cluster 7.2.  MySQL NDB Cluster 7.2 source code and binaries can be obtained from https://dev.mysql.com/downloads/cluster/.

This release also incorporates all bug fixes and changes made in previous NDB Cluster releases, as well as all bug fixes and feature changes which were added in mainline MySQL 5.5 through MySQL 5.5.43 (see Changes in MySQL 5.5.43 (2015-04-06, General Availability)).

Bugs Fixed

  • Important Change: The maximum failure time calculation used to ensure that normal node failure handling mechanisms are given time to handle survivable cluster failures (before global checkpoint watchdog mechanisms start to kill nodes due to GCP delays) was excessively conservative, and neglected to consider that there can be at most number_of_data_nodes / NoOfReplicas node failures before the cluster can no longer survive. Now the value of NoOfReplicas is properly taken into account when performing this calculation.

    This fix adds the TimeBetweenGlobalCheckpointsTimeout data node configuration parameter, which makes the minimum timeout between global checkpoints settable by the user. This timeout was previously fixed internally at 120000 milliseconds, which is now the default value for this parameter. (Bug #20069617, Bug #20069624)

    References: See also: Bug #19858151, Bug #20128256, Bug #20135976.
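
    The sketch below mirrors the corrected calculation described in this item; it is illustrative only and not the actual NDB watchdog code, and the per-failure time allowance is a hypothetical parameter:

      // Illustrative only: the shape of the corrected calculation, in which
      // the number of survivable failures is bounded by
      // number_of_data_nodes / NoOfReplicas rather than by the total node count.
      unsigned max_failure_handling_time_ms(unsigned number_of_data_nodes,
                                            unsigned no_of_replicas,
                                            unsigned time_per_failure_ms) // hypothetical
      {
        // At most this many node failures can occur while the cluster survives.
        unsigned max_survivable_failures = number_of_data_nodes / no_of_replicas;
        return max_survivable_failures * time_per_failure_ms;
      }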

  • NDB Cluster APIs: When a transaction is started from a cluster connection, Table and Index schema objects may be passed to this transaction for use. If these schema objects have been acquired from a different connection (Ndb_cluster_connection object), they can be deleted at any point by the deletion or disconnection of the owning connection. This can leave a connection with invalid schema objects, which causes an NDB API application to fail when these are dereferenced.

    To avoid this problem, if your application uses multiple connections, you can now enable a check that detects sharing of schema objects between connections when a schema object is passed to a transaction, using the NdbTransaction::setSchemaObjectOwnerChecks() method added in this release. When this check is enabled, schema objects having the same names are acquired from the transaction's connection and compared with the schema objects passed to the transaction; failure to match causes the application to fail with an error. (Bug #19785977)
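
    A minimal usage sketch follows; it assumes that setSchemaObjectOwnerChecks() takes a single boolean argument (consult the NDB API documentation for the exact signature), and the database and table names are placeholders:

      #include <NdbApi.hpp>

      int run_with_owner_checks(Ndb_cluster_connection &conn)
      {
        Ndb ndb(&conn, "test");                   // placeholder database name
        if (ndb.init() != 0)
          return -1;

        // Schema object acquired from the same connection that owns the transaction.
        const NdbDictionary::Table *tab = ndb.getDictionary()->getTable("t1");
        if (tab == NULL)
          return -1;

        NdbTransaction *trans = ndb.startTransaction();
        if (trans == NULL)
          return -1;

        // Enable the new check: schema objects passed to this transaction are
        // verified to have been acquired from the same cluster connection.
        trans->setSchemaObjectOwnerChecks(true);  // assumed boolean parameter

        // ... define and execute operations using tab ...

        ndb.closeTransaction(trans);
        return 0;
      }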

  • NDB Cluster APIs: The increase in the default number of hashmap buckets (DefaultHashMapSize API node configuration parameter) from 240 to 3840 in MySQL NDB Cluster 7.2.11 increased the size of the internal DictHashMapInfo::HashMap type considerably. This type was allocated on the stack in some getTable() calls, which could lead to stack overflow issues for NDB API users.

    To avoid this problem, the hashmap is now dynamically allocated from the heap. (Bug #19306793)
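
    The general pattern behind this fix is sketched below; the type name and size are hypothetical stand-ins rather than the actual DictHashMapInfo::HashMap definition:

      #include <memory>

      // Hypothetical stand-in for a type whose size grows with the configured
      // number of hashmap buckets.
      struct HashMapInfo {
        unsigned short buckets[3840];
      };

      void get_table_metadata()
      {
        // Before the fix (illustrative): a large automatic variable could
        // overflow the caller's stack.
        //   HashMapInfo info;
        //
        // After the fix (illustrative): the object is allocated from the heap.
        std::unique_ptr<HashMapInfo> info(new HashMapInfo());
        // ... use *info ...
      }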

  • NDB Cluster APIs: A scan operation, whether it is a single table scan or a query scan used by a pushed join, stores the result set in a buffer. The maximum size of this buffer is calculated and preallocated before the scan operation is started. This buffer may consume a considerable amount of memory; in some cases we observed a 2 GB buffer footprint in tests that executed 100 parallel scans with 2 single-threaded (ndbd) data nodes. This memory consumption was found to scale linearly with additional fragments.

    A number of root causes leading to this problem were discovered; these are listed here:

    • Result rows were unpacked to full NdbRecord format before they were stored in the buffer. If only some of a table's columns were selected, the buffer therefore contained empty, wasted space.

    • Because the buffer format was unpacked, VARCHAR and VARBINARY columns always had to be allocated at the maximum size defined for such columns.

    • BatchByteSize and MaxScanBatchSize values were not taken into consideration as a limiting factor when calculating the maximum buffer size.

    These issues became more evident in the NDB 7.2 and later MySQL NDB Cluster release series, because the buffer size scales with BatchSize, and because the default value of this parameter was increased fourfold (from 64 to 256) beginning with MySQL NDB Cluster 7.2.1.

    This fix causes result rows to be buffered using the packed format instead of the unpacked format; a buffered scan result row is now not unpacked until it becomes the current row. In addition, BatchByteSize and MaxScanBatchSize are now used as limiting factors when calculating the required buffer size.

    Also as part of this fix, refactoring has been done to separate handling of buffered (packed) from handling of unbuffered result sets, and to remove code that had been unused since NDB 7.0 or earlier. The NdbRecord class declaration has also been cleaned up by removing a number of unused or redundant member variables. (Bug #73781, Bug #75599, Bug #19631350, Bug #20408733)
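
    The sketch below is a hypothetical illustration of the sizing behavior described in this item, not the actual NDB allocation code; it shows why the pre-fix estimate grew with BatchSize and the number of fragments, and how the byte-size limits now cap it:

      #include <cstddef>
      #include <algorithm>

      // Illustrative only; parameter names follow the configuration options
      // discussed above, and the formula is a simplification.
      std::size_t estimated_scan_buffer_bytes(std::size_t max_unpacked_row_bytes,
                                              std::size_t batch_size,
                                              std::size_t fragment_count,
                                              std::size_t batch_byte_size,
                                              std::size_t max_scan_batch_size)
      {
        // Pre-fix (illustrative): rows were buffered unpacked at their maximum
        // size, giving roughly max_unpacked_row_bytes * batch_size * fragment_count.

        // Post-fix (illustrative): BatchByteSize limits each fragment's batch and
        // MaxScanBatchSize limits the total.
        std::size_t per_fragment =
            std::min(max_unpacked_row_bytes * batch_size, batch_byte_size);
        return std::min(per_fragment * fragment_count, max_scan_batch_size);
      }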

  • It was found during testing that problems could arise when the node registered as the arbitrator disconnected or failed during the arbitration process.

    In this situation, the node requesting arbitration could never receive a positive acknowledgement from the registered arbitrator; this node also lacked a stable set of members and could not initiate selection of a new arbitrator.

    Now in such cases, when the arbitrator fails or loses contact during arbitration, the requesting node immediately fails rather than waiting to time out. (Bug #20538179)

  • When a data node fails or is being restarted, the remaining nodes in the same node group resend to subscribers any data which they determine has not already been sent by the failed node. Normally, when a data node (actually, the SUMA kernel block) has sent all data belonging to an epoch for which it is responsible, it sends a SUB_GCP_COMPLETE_REP signal, together with a count, to all subscribers, each of which responds with a SUB_GCP_COMPLETE_ACK. When SUMA receives this acknowledgement from all subscribers, it reports this to the other nodes in the same node group, so that they know there is no need to resend this data in case of a subsequent node failure. If a node failed after all subscribers had sent this acknowledgement, but before all the other nodes in the same node group had received it from the failing node, data for some epochs could be sent (and reported as complete) twice, which could lead to an unplanned shutdown.

    The fix for this issue adds to the count reported by SUB_GCP_COMPLETE_ACK a list of identifiers which the receiver can use to keep track of which buckets are completed and to ignore any duplicate reported for an already completed bucket. (Bug #17579998)
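
    The receiver-side effect of the change can be pictured with the simplified sketch below; the structure and identifier type are hypothetical and do not reflect the actual SUMA signal layout:

      #include <set>
      #include <vector>
      #include <cstdint>

      // Illustrative only: tracks which buckets have already been reported
      // complete so that duplicate reports can be ignored.
      struct BucketCompletionTracker {
        std::set<std::uint32_t> completed;

        void handle_complete_report(const std::vector<std::uint32_t> &bucket_ids)
        {
          for (std::size_t i = 0; i < bucket_ids.size(); ++i) {
            if (!completed.insert(bucket_ids[i]).second) {
              // Duplicate report for an already completed bucket; ignore it
              // rather than treating the bucket as needing a resend.
              continue;
            }
            // First report for this bucket: it is now recorded as complete.
          }
        }
      };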

  • When performing a restart, it was sometimes possible to find a log end marker which had been written by a previous restart and should have been invalidated. Now when searching for the last page to invalidate, the same search algorithm is used as when searching for the last page of the log to read. (Bug #76207, Bug #20665205)

  • When reading and copying transporter short signal data, it was possible for the data to be copied back to the same signal with overlapping memory. (Bug #75930, Bug #20553247)
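
    The underlying hazard is the classic overlapping copy: copying with memcpy() when the source and destination ranges overlap is undefined behavior, whereas memmove() handles overlap correctly. The sketch below illustrates only this general hazard and is not the actual transporter code:

      #include <cstring>

      // Illustrative only: shifting data within the same buffer.
      void shift_within_buffer(char *buf, std::size_t len, std::size_t offset)
      {
        // Wrong when the ranges overlap: memcpy() has undefined behavior for
        // overlapping source and destination.
        //   std::memcpy(buf, buf + offset, len);

        // Safe: memmove() is specified to handle overlapping ranges.
        std::memmove(buf, buf + offset, len);
      }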

  • When a bulk delete operation was committed early in order to save an additional round trip while still returning the number of affected rows, but then failed with a timeout error, an SQL node performed no verification that the transaction was actually in the Committed state. (Bug #74494, Bug #20092754)

    References: See also: Bug #19873609.