MySQL 5.6 Reference Manual Including MySQL NDB Cluster 7.3-7.4 Reference Guide

The syntax for expressing joins permits nested joins. The following discussion refers to the join syntax described in Section 13.2.9.2, “JOIN Clause”.

The syntax of * table_factor* is
extended in comparison with the SQL Standard. The latter
accepts only

`table_reference`

`table_reference`

SELECT * FROM t1 LEFT JOIN (t2, t3, t4) ON (t2.a=t1.a AND t3.b=t1.b AND t4.c=t1.c)

Is equivalent to:

SELECT * FROM t1 LEFT JOIN (t2 CROSS JOIN t3 CROSS JOIN t4) ON (t2.a=t1.a AND t3.b=t1.b AND t4.c=t1.c)

In MySQL, `CROSS JOIN`

is syntactically
equivalent to `INNER JOIN`

; they can replace
each other. In standard SQL, they are not equivalent.
`INNER JOIN`

is used with an
`ON`

clause; `CROSS JOIN`

is
used otherwise.

In general, parentheses can be ignored in join expressions containing only inner join operations. Consider this join expression:

t1 LEFT JOIN (t2 LEFT JOIN t3 ON t2.b=t3.b OR t2.b IS NULL) ON t1.a=t2.a

After removing parentheses and grouping operations to the left, that join expression transforms into this expression:

(t1 LEFT JOIN t2 ON t1.a=t2.a) LEFT JOIN t3 ON t2.b=t3.b OR t2.b IS NULL

Yet, the two expressions are not equivalent. To see this,
suppose that the tables `t1`

,
`t2`

, and `t3`

have the
following state:

Table

`t1`

contains rows`(1)`

,`(2)`

Table

`t2`

contains row`(1,101)`

Table

`t3`

contains row`(101)`

In this case, the first expression returns a result set
including the rows `(1,1,101,101)`

,
`(2,NULL,NULL,NULL)`

, whereas the second
expression returns the rows `(1,1,101,101)`

,
`(2,NULL,NULL,101)`

:

mysql>`SELECT *`

`FROM t1`

`LEFT JOIN`

`(t2 LEFT JOIN t3 ON t2.b=t3.b OR t2.b IS NULL)`

+------+------+------+------+ | a | a | b | b | +------+------+------+------+ | 1 | 1 | 101 | 101 | | 2 | NULL | NULL | NULL | +------+------+------+------+ mysql>`ON t1.a=t2.a;`

`SELECT *`

`FROM (t1 LEFT JOIN t2 ON t1.a=t2.a)`

`LEFT JOIN t3`

+------+------+------+------+ | a | a | b | b | +------+------+------+------+ | 1 | 1 | 101 | 101 | | 2 | NULL | NULL | 101 | +------+------+------+------+`ON t2.b=t3.b OR t2.b IS NULL;`

In the following example, an outer join operation is used together with an inner join operation:

t1 LEFT JOIN (t2, t3) ON t1.a=t2.a

That expression cannot be transformed into the following expression:

t1 LEFT JOIN t2 ON t1.a=t2.a, t3

For the given table states, the two expressions return different sets of rows:

mysql>`SELECT *`

+------+------+------+------+ | a | a | b | b | +------+------+------+------+ | 1 | 1 | 101 | 101 | | 2 | NULL | NULL | NULL | +------+------+------+------+ mysql>`FROM t1 LEFT JOIN (t2, t3) ON t1.a=t2.a;`

`SELECT *`

+------+------+------+------+ | a | a | b | b | +------+------+------+------+ | 1 | 1 | 101 | 101 | | 2 | NULL | NULL | 101 | +------+------+------+------+`FROM t1 LEFT JOIN t2 ON t1.a=t2.a, t3;`

Therefore, if we omit parentheses in a join expression with outer join operators, we might change the result set for the original expression.

More exactly, we cannot ignore parentheses in the right operand of the left outer join operation and in the left operand of a right join operation. In other words, we cannot ignore parentheses for the inner table expressions of outer join operations. Parentheses for the other operand (operand for the outer table) can be ignored.

The following expression:

(t1,t2) LEFT JOIN t3 ON P(t2.b,t3.b)

Is equivalent to this expression for any tables
`t1,t2,t3`

and any condition
`P`

over attributes `t2.b`

and `t3.b`

:

t1, t2 LEFT JOIN t3 ON P(t2.b,t3.b)

Whenever the order of execution of join operations in a join
expression (* joined_table*) is not
from left to right, we talk about nested joins. Consider the
following queries:

SELECT * FROM t1 LEFT JOIN (t2 LEFT JOIN t3 ON t2.b=t3.b) ON t1.a=t2.a WHERE t1.a > 1 SELECT * FROM t1 LEFT JOIN (t2, t3) ON t1.a=t2.a WHERE (t2.b=t3.b OR t2.b IS NULL) AND t1.a > 1

Those queries are considered to contain these nested joins:

t2 LEFT JOIN t3 ON t2.b=t3.b t2, t3

In the first query, the nested join is formed with a left join operation. In the second query, it is formed with an inner join operation.

In the first query, the parentheses can be omitted: The
grammatical structure of the join expression dictates the same
order of execution for join operations. For the second query,
the parentheses cannot be omitted, although the join
expression here can be interpreted unambiguously without them.
In our extended syntax, the parentheses in ```
(t2,
t3)
```

of the second query are required, although
theoretically the query could be parsed without them: We still
would have unambiguous syntactical structure for the query
because `LEFT JOIN`

and `ON`

play the role of the left and right delimiters for the
expression `(t2,t3)`

.

The preceding examples demonstrate these points:

For join expressions involving only inner joins (and not outer joins), parentheses can be removed and joins evaluated left to right. In fact, tables can be evaluated in any order.

The same is not true, in general, for outer joins or for outer joins mixed with inner joins. Removal of parentheses may change the result.

Queries with nested outer joins are executed in the same
pipeline manner as queries with inner joins. More exactly, a
variation of the nested-loop join algorithm is exploited.
Recall the algorithm by which the nested-loop join executes a
query (see Section 8.2.1.6, “Nested-Loop Join Algorithms”). Suppose that
a join query over 3 tables `T1,T2,T3`

has
this form:

SELECT * FROM T1 INNER JOIN T2 ON P1(T1,T2) INNER JOIN T3 ON P2(T2,T3) WHERE P(T1,T2,T3)

Here, `P1(T1,T2)`

and
`P2(T3,T3)`

are some join conditions (on
expressions), whereas `P(T1,T2,T3)`

is a
condition over columns of tables `T1,T2,T3`

.

The nested-loop join algorithm would execute this query in the following manner:

FOR each row t1 in T1 { FOR each row t2 in T2 such that P1(t1,t2) { FOR each row t3 in T3 such that P2(t2,t3) { IF P(t1,t2,t3) { t:=t1||t2||t3; OUTPUT t; } } } }

The notation `t1||t2||t3`

indicates a row
constructed by concatenating the columns of rows
`t1`

, `t2`

, and
`t3`

. In some of the following examples,
`NULL`

where a table name appears means a row
in which `NULL`

is used for each column of
that table. For example, `t1||t2||NULL`

indicates a row constructed by concatenating the columns of
rows `t1`

and `t2`

, and
`NULL`

for each column of
`t3`

. Such a row is said to be
`NULL`

-complemented.

Now consider a query with nested outer joins:

SELECT * FROM T1 LEFT JOIN (T2 LEFT JOIN T3 ON P2(T2,T3)) ON P1(T1,T2) WHERE P(T1,T2,T3)

For this query, modify the nested-loop pattern to obtain:

FOR each row t1 in T1 { BOOL f1:=FALSE; FOR each row t2 in T2 such that P1(t1,t2) { BOOL f2:=FALSE; FOR each row t3 in T3 such that P2(t2,t3) { IF P(t1,t2,t3) { t:=t1||t2||t3; OUTPUT t; } f2=TRUE; f1=TRUE; } IF (!f2) { IF P(t1,t2,NULL) { t:=t1||t2||NULL; OUTPUT t; } f1=TRUE; } } IF (!f1) { IF P(t1,NULL,NULL) { t:=t1||NULL||NULL; OUTPUT t; } } }

In general, for any nested loop for the first inner table in
an outer join operation, a flag is introduced that is turned
off before the loop and is checked after the loop. The flag is
turned on when for the current row from the outer table a
match from the table representing the inner operand is found.
If at the end of the loop cycle the flag is still off, no
match has been found for the current row of the outer table.
In this case, the row is complemented by
`NULL`

values for the columns of the inner
tables. The result row is passed to the final check for the
output or into the next nested loop, but only if the row
satisfies the join condition of all embedded outer joins.

In the example, the outer join table expressed by the following expression is embedded:

(T2 LEFT JOIN T3 ON P2(T2,T3))

For the query with inner joins, the optimizer could choose a different order of nested loops, such as this one:

FOR each row t3 in T3 { FOR each row t2 in T2 such that P2(t2,t3) { FOR each row t1 in T1 such that P1(t1,t2) { IF P(t1,t2,t3) { t:=t1||t2||t3; OUTPUT t; } } } }

For queries with outer joins, the optimizer can choose only
such an order where loops for outer tables precede loops for
inner tables. Thus, for our query with outer joins, only one
nesting order is possible. For the following query, the
optimizer evaluates two different nestings. In both nestings,
`T1`

must be processed in the outer loop
because it is used in an outer join. `T2`

and
`T3`

are used in an inner join, so that join
must be processed in the inner loop. However, because the join
is an inner join, `T2`

and
`T3`

can be processed in either order.

SELECT * T1 LEFT JOIN (T2,T3) ON P1(T1,T2) AND P2(T1,T3) WHERE P(T1,T2,T3)

One nesting evaluates `T2`

, then
`T3`

:

FOR each row t1 in T1 { BOOL f1:=FALSE; FOR each row t2 in T2 such that P1(t1,t2) { FOR each row t3 in T3 such that P2(t1,t3) { IF P(t1,t2,t3) { t:=t1||t2||t3; OUTPUT t; } f1:=TRUE } } IF (!f1) { IF P(t1,NULL,NULL) { t:=t1||NULL||NULL; OUTPUT t; } } }

The other nesting evaluates `T3`

, then
`T2`

:

FOR each row t1 in T1 { BOOL f1:=FALSE; FOR each row t3 in T3 such that P2(t1,t3) { FOR each row t2 in T2 such that P1(t1,t2) { IF P(t1,t2,t3) { t:=t1||t2||t3; OUTPUT t; } f1:=TRUE } } IF (!f1) { IF P(t1,NULL,NULL) { t:=t1||NULL||NULL; OUTPUT t; } } }

When discussing the nested-loop algorithm for inner joins, we
omitted some details whose impact on the performance of query
execution may be huge. We did not mention so-called
“pushed-down” conditions. Suppose that our
`WHERE`

condition
`P(T1,T2,T3)`

can be represented by a
conjunctive formula:

P(T1,T2,T2) = C1(T1) AND C2(T2) AND C3(T3).

In this case, MySQL actually uses the following nested-loop algorithm for the execution of the query with inner joins:

FOR each row t1 in T1 such that C1(t1) { FOR each row t2 in T2 such that P1(t1,t2) AND C2(t2) { FOR each row t3 in T3 such that P2(t2,t3) AND C3(t3) { IF P(t1,t2,t3) { t:=t1||t2||t3; OUTPUT t; } } } }

You see that each of the conjuncts `C1(T1)`

,
`C2(T2)`

, `C3(T3)`

are
pushed out of the most inner loop to the most outer loop where
it can be evaluated. If `C1(T1)`

is a very
restrictive condition, this condition pushdown may greatly
reduce the number of rows from table `T1`

passed to the inner loops. As a result, the execution time for
the query may improve immensely.

For a query with outer joins, the `WHERE`

condition is to be checked only after it has been found that
the current row from the outer table has a match in the inner
tables. Thus, the optimization of pushing conditions out of
the inner nested loops cannot be applied directly to queries
with outer joins. Here we must introduce conditional
pushed-down predicates guarded by the flags that are turned on
when a match has been encountered.

Recall this example with outer joins:

P(T1,T2,T3)=C1(T1) AND C(T2) AND C3(T3)

For that example, the nested-loop algorithm using guarded pushed-down conditions looks like this:

FOR each row t1 in T1 such that C1(t1) { BOOL f1:=FALSE; FOR each row t2 in T2 such that P1(t1,t2) AND (f1?C2(t2):TRUE) { BOOL f2:=FALSE; FOR each row t3 in T3 such that P2(t2,t3) AND (f1&&f2?C3(t3):TRUE) { IF (f1&&f2?TRUE:(C2(t2) AND C3(t3))) { t:=t1||t2||t3; OUTPUT t; } f2=TRUE; f1=TRUE; } IF (!f2) { IF (f1?TRUE:C2(t2) && P(t1,t2,NULL)) { t:=t1||t2||NULL; OUTPUT t; } f1=TRUE; } } IF (!f1 && P(t1,NULL,NULL)) { t:=t1||NULL||NULL; OUTPUT t; } }

In general, pushed-down predicates can be extracted from join
conditions such as `P1(T1,T2)`

and
`P(T2,T3)`

. In this case, a pushed-down
predicate is guarded also by a flag that prevents checking the
predicate for the `NULL`

-complemented row
generated by the corresponding outer join operation.

Access by key from one inner table to another in the same
nested join is prohibited if it is induced by a predicate from
the `WHERE`

condition.