MySQL 8.4 Reference Manual Including MySQL NDB Cluster 8.4
In MySQL, a Batched Key Access (BKA) Join algorithm is available that uses both index access to the joined table and a join buffer. The BKA algorithm supports inner join, outer join, and semijoin operations, including nested outer joins. Benefits of BKA include improved join performance due to more efficient table scanning. Also, the Block Nested-Loop (BNL) Join algorithm previously used only for inner joins is extended and can be employed for outer join and semijoin operations, including nested outer joins.
The following sections discuss the join buffer management that underlies the extension of the original BNL algorithm, the extended BNL algorithm, and the BKA algorithm. For information about semijoin strategies, see Optimizing IN and EXISTS Subquery Predicates with Semijoin Transformations
MySQL can employ join buffers to execute not only inner joins without index access to the inner table, but also outer joins and semijoins that appear after subquery flattening. Moreover, a join buffer can be effectively used when there is an index access to the inner table.
The join buffer management code slightly more efficiently
utilizes join buffer space when storing the values of the
interesting row columns: No additional bytes are allocated
in buffers for a row column if its value is
NULL
, and the minimum number of bytes is
allocated for any value of the
VARCHAR
type.
The code supports two types of buffers, regular and
incremental. Suppose that join buffer B1
is employed to join tables t1
and
t2
and the result of this operation is
joined with table t3
using join buffer
B2
:
A regular join buffer contains columns from each join
operand. If B2
is a regular join
buffer, each row r
put into
B2
is composed of the columns of a
row r1
from
B1
and the interesting columns of a
matching row r2
from table
t3
.
An incremental join buffer contains only columns from
rows of the table produced by the second join operand.
That is, it is incremental to a row from the first
operand buffer. If B2
is an
incremental join buffer, it contains the interesting
columns of the row r2
together with a link to the row
r1
from
B1
.
Incremental join buffers are always incremental relative to
a join buffer from an earlier join operation, so the buffer
from the first join operation is always a regular buffer. In
the example just given, the buffer B1
used to join tables t1
and
t2
must be a regular buffer.
Each row of the incremental buffer used for a join operation
contains only the interesting columns of a row from the
table to be joined. These columns are augmented with a
reference to the interesting columns of the matched row from
the table produced by the first join operand. Several rows
in the incremental buffer can refer to the same row
r
whose columns are stored in the
previous join buffers insofar as all these rows match row
r
.
Incremental buffers enable less frequent copying of columns from buffers used for previous join operations. This provides a savings in buffer space because in the general case a row produced by the first join operand can be matched by several rows produced by the second join operand. It is unnecessary to make several copies of a row from the first operand. Incremental buffers also provide a savings in processing time due to the reduction in copying time.
The block_nested_loop
flag
of the optimizer_switch
system variable controls hash joins.
The batched_key_access
flag controls how the optimizer uses the Batched Key Access
join algorithms.
By default,
block_nested_loop
is
on
and
batched_key_access
is
off
. See
Section 10.9.2, “Switchable Optimizations”. Optimizer hints
may also be applied; see
Optimizer Hints for Block Nested-Loop and Batched Key Access Algorithms.
For information about semijoin strategies, see Optimizing IN and EXISTS Subquery Predicates with Semijoin Transformations
The original implementation of the MySQL BNL algorithm was extended to support outer join and semijoin operations (and was later superseded by the hash join algorithm; see Section 10.2.1.4, “Hash Join Optimization”).
When these operations are executed with a join buffer, each row put into the buffer is supplied with a match flag.
If an outer join operation is executed using a join buffer,
each row of the table produced by the second operand is
checked for a match against each row in the join buffer.
When a match is found, a new extended row is formed (the
original row plus columns from the second operand) and sent
for further extensions by the remaining join operations. In
addition, the match flag of the matched row in the buffer is
enabled. After all rows of the table to be joined have been
examined, the join buffer is scanned. Each row from the
buffer that does not have its match flag enabled is extended
by NULL
complements
(NULL
values for each column in the
second operand) and sent for further extensions by the
remaining join operations.
The block_nested_loop
flag
of the optimizer_switch
system variable controls hash joins.
See Section 10.9.2, “Switchable Optimizations”, for more information. Optimizer hints may also be applied; see Optimizer Hints for Block Nested-Loop and Batched Key Access Algorithms.
In EXPLAIN
output, use of BNL
for a table is signified when the Extra
value contains Using join buffer (Block Nested
Loop)
and the type
value is
ALL
,
index
, or
range
.
For information about semijoin strategies, see Optimizing IN and EXISTS Subquery Predicates with Semijoin Transformations
MySQL implements a method of joining tables called the Batched Key Access (BKA) join algorithm. BKA can be applied when there is an index access to the table produced by the second join operand. Like the BNL join algorithm, the BKA join algorithm employs a join buffer to accumulate the interesting columns of the rows produced by the first operand of the join operation. Then the BKA algorithm builds keys to access the table to be joined for all rows in the buffer and submits these keys in a batch to the database engine for index lookups. The keys are submitted to the engine through the Multi-Range Read (MRR) interface (see Section 10.2.1.11, “Multi-Range Read Optimization”). After submission of the keys, the MRR engine functions perform lookups in the index in an optimal way, fetching the rows of the joined table found by these keys, and starts feeding the BKA join algorithm with matching rows. Each matching row is coupled with a reference to a row in the join buffer.
When BKA is used, the value of
join_buffer_size
defines
how large the batch of keys is in each request to the
storage engine. The larger the buffer, the more sequential
access is made to the right hand table of a join operation,
which can significantly improve performance.
For BKA to be used, the
batched_key_access
flag of
the optimizer_switch
system
variable must be set to on
. BKA uses MRR,
so the mrr
flag must also
be on
. Currently, the cost estimation for
MRR is too pessimistic. Hence, it is also necessary for
mrr_cost_based
to be
off
for BKA to be used. The following
setting enables BKA:
mysql> SET optimizer_switch='mrr=on,mrr_cost_based=off,batched_key_access=on';
There are two scenarios by which MRR functions execute:
The first scenario is used for conventional disk-based
storage engines such as
InnoDB
and
MyISAM
. For these engines,
usually the keys for all rows from the join buffer are
submitted to the MRR interface at once. Engine-specific
MRR functions perform index lookups for the submitted
keys, get row IDs (or primary keys) from them, and then
fetch rows for all these selected row IDs one by one by
request from BKA algorithm. Every row is returned with
an association reference that enables access to the
matched row in the join buffer. The rows are fetched by
the MRR functions in an optimal way: They are fetched in
the row ID (primary key) order. This improves
performance because reads are in disk order rather than
random order.
The second scenario is used for remote storage engines
such as NDB
. A package of
keys for a portion of rows from the join buffer,
together with their associations, is sent by a MySQL
Server (SQL node) to MySQL Cluster data nodes. In
return, the SQL node receives a package (or several
packages) of matching rows coupled with corresponding
associations. The BKA join algorithm takes these rows
and builds new joined rows. Then a new set of keys is
sent to the data nodes and the rows from the returned
packages are used to build new joined rows. The process
continues until the last keys from the join buffer are
sent to the data nodes, and the SQL node has received
and joined all rows matching these keys. This improves
performance because fewer key-bearing packages sent by
the SQL node to the data nodes means fewer round trips
between it and the data nodes to perform the join
operation.
With the first scenario, a portion of the join buffer is reserved to store row IDs (primary keys) selected by index lookups and passed as a parameter to the MRR functions.
There is no special buffer to store keys built for rows from the join buffer. Instead, a function that builds the key for the next row in the buffer is passed as a parameter to the MRR functions.
In EXPLAIN
output, use of BKA
for a table is signified when the Extra
value contains Using join buffer (Batched Key
Access)
and the type
value is
ref
or
eq_ref
.
In addition to using the
optimizer_switch
system
variable to control optimizer use of the BNL and BKA
algorithms session-wide, MySQL supports optimizer hints to
influence the optimizer on a per-statement basis. See
Section 10.9.3, “Optimizer Hints”.
To use a BNL or BKA hint to enable join buffering for any inner table of an outer join, join buffering must be enabled for all inner tables of the outer join.