MySQL 8.0 Reference Manual Including MySQL NDB Cluster 8.0

10.2.2.4 Optimizing Derived Tables, View References, and Common Table Expressions with Merging or Materialization

The optimizer can handle derived table references using two strategies (which also apply to view references and common table expressions):

Example 1:

SELECT * FROM (SELECT * FROM t1) AS derived_t1;

With merging of the derived table derived_t1, that query is executed similar to:

SELECT * FROM t1;

Example 2:

SELECT *
  FROM t1 JOIN (SELECT t2.f1 FROM t2) AS derived_t2 ON t1.f2=derived_t2.f1
  WHERE t1.f1 > 0;

With merging of the derived table derived_t2, that query is executed similar to:

SELECT t1.*, t2.f1
  FROM t1 JOIN t2 ON t1.f2=t2.f1
  WHERE t1.f1 > 0;

With materialization, derived_t1 and derived_t2 are each treated as a separate table within their respective queries.

The optimizer handles derived tables, view references, and common table expressions the same way: It avoids unnecessary materialization whenever possible, which enables pushing down conditions from the outer query to derived tables and produces more efficient execution plans. (For an example, see Section 10.2.2.2, “Optimizing Subqueries with Materialization”.)

If merging would result in an outer query block that references more than 61 base tables, the optimizer chooses materialization instead.

The optimizer propagates an ORDER BY clause in a derived table or view reference to the outer query block if these conditions are all true:

Otherwise, the optimizer ignores the ORDER BY clause.

The following means are available to influence whether the optimizer attempts to merge derived tables, view references, and common table expressions into the outer query block:

If the optimizer chooses the materialization strategy rather than merging for a derived table, it handles the query as follows:

Consider the following EXPLAIN statement, for a SELECT query that contains a derived table:

EXPLAIN SELECT * FROM (SELECT * FROM t1) AS derived_t1;

The optimizer avoids materializing the derived table by delaying it until the result is needed during SELECT execution. In this case, the query is not executed (because it occurs in an EXPLAIN statement), so the result is never needed.

Even for queries that are executed, delay of derived table materialization may enable the optimizer to avoid materialization entirely. When this happens, query execution is quicker by the time needed to perform materialization. Consider the following query, which joins the result of a derived table to another table:

SELECT *
  FROM t1 JOIN (SELECT t2.f1 FROM t2) AS derived_t2
          ON t1.f2=derived_t2.f1
  WHERE t1.f1 > 0;

If the optimization processes t1 first and the WHERE clause produces an empty result, the join must necessarily be empty and the derived table need not be materialized.

For cases when a derived table requires materialization, the optimizer may add an index to the materialized table to speed up access to it. If such an index enables ref access to the table, it can greatly reduce amount of data read during query execution. Consider the following query:

SELECT *
 FROM t1 JOIN (SELECT DISTINCT f1 FROM t2) AS derived_t2
         ON t1.f1=derived_t2.f1;

The optimizer constructs an index over column f1 from derived_t2 if doing so would enable use of ref access for the lowest cost execution plan. After adding the index, the optimizer can treat the materialized derived table the same as a regular table with an index, and it benefits similarly from the generated index. The overhead of index creation is negligible compared to the cost of query execution without the index. If ref access would result in higher cost than some other access method, the optimizer creates no index and loses nothing.

For optimizer trace output, a merged derived table or view reference is not shown as a node. Only its underlying tables appear in the top query's plan.

What is true for materialization of derived tables is also true for common table expressions (CTEs). In addition, the following considerations pertain specifically to CTEs.

If a CTE is materialized by a query, it is materialized once for the query, even if the query references it several times.

A recursive CTE is always materialized.

If a CTE is materialized, the optimizer automatically adds relevant indexes if it estimates that indexing can speed up access by the top-level statement to the CTE. This is similar to automatic indexing of derived tables, except that if the CTE is referenced multiple times, the optimizer may create multiple indexes, to speed up access by each reference in the most appropriate way.

The MERGE and NO_MERGE optimizer hints can be applied to CTEs. Each CTE reference in the top-level statement can have its own hint, permitting CTE references to be selectively merged or materialized. The following statement uses hints to indicate that cte1 should be merged and cte2 should be materialized:

WITH
  cte1 AS (SELECT a, b FROM table1),
  cte2 AS (SELECT c, d FROM table2)
SELECT /*+ MERGE(cte1) NO_MERGE(cte2) */ cte1.b, cte2.d
FROM cte1 JOIN cte2
WHERE cte1.a = cte2.c;

The ALGORITHM clause for CREATE VIEW does not affect materialization for any WITH clause preceding the SELECT statement in the view definition. Consider this statement:

CREATE ALGORITHM={TEMPTABLE|MERGE} VIEW v1 AS WITH ... SELECT ...

The ALGORITHM value affects materialization only of the SELECT, not the WITH clause.

Prior to MySQL 8.0.16, if internal_tmp_disk_storage_engine=MYISAM, an error occurred for any attempt to materialize a CTE using an on-disk temporary table, since for CTEs, the storage engine used for on-disk internal temporary tables could not be MyISAM. Beginning with MySQL 8.0.16, this is no longer an issue, since TempTable now always uses InnoDB for on-disk internal temporary tables.

As mentioned previously, a CTE, if materialized, is materialized once, even if referenced multiple times. To indicate one-time materialization, optimizer trace output contains an occurrence of creating_tmp_table plus one or more occurrences of reusing_tmp_table.

CTEs are similar to derived tables, for which the materialized_from_subquery node follows the reference. This is true for a CTE that is referenced multiple times, so there is no duplication of materialized_from_subquery nodes (which would give the impression that the subquery is executed multiple times, and produce unnecessarily verbose output). Only one reference to the CTE has a complete materialized_from_subquery node with the description of its subquery plan. Other references have a reduced materialized_from_subquery node. The same idea applies to EXPLAIN output in TRADITIONAL format: Subqueries for other references are not shown.