Join Methods

The optimizer can select from multiple join methods. When the rows from two tables are joined, one table is designated the outer table and the other the inner table. The optimizer decides which of the tables should be the outer table and which should be the inner table. During a join, the optimizer scans the rows in the outer and inner tables to locate the rows that match the join condition.

The optimizer analyzes the statistics for each table and, for example, might identify the smallest table or the table with the best selectivity for the query as outer table. If indexes exist for one or more of the tables to be joined, the optimizer takes them into account when selecting the outer and inner tables.

If more than two tables are to be joined, the optimizer analyzes the various combinations of joins on table pairs to determine which pair to join first, which table to join with the result of the join, and so on for the optimum sequence of joins.

The cost of a join is largely influenced by the method in which the inner and outer tables are accessed to locate the rows that match the join condition. The optimizer selects from two join methods when determining the query optimizer plan:

Nested Loop Join

In a nested loop join with no indexes, a row in the outer table is selected one at a time and matched against every row in the inner table. All the rows in the inner table are scanned as many times as the number of rows in the outer table.

If the inner table has an index on the join column, that index is used to select the rows that meet the join condition. The rows from each table that satisfy the join condition are returned. Indexes may be created on the fly for inner tables in nested loops, and the results from inner scans may be materialized before the join.

Figure 5-2 shows an example of a nested loop join. The join condition is:

WHERE t1.a=t2.a

For this example, the optimizer has decided that t1 is the outer table and t2 is the inner table. Values in column a in table t1 that match values in column a in table t2 are 1 and 7. The join results concatenate the rows from t1 and t2. For example, the first join result is the following row:

7 50 43.54 21 13.69

It concatenates a row from t1:

7 50 43.54

with the first row from t2 in which the values in column a match:

7 21 13.69

Merge Join

A merge join is used only when the join columns are sorted by range indexes. In a merge join, a cursor advances through each index one row at a time. Because the rows are already sorted on the join columns in each index, a simple formula is applied to efficiently advance the cursors through each row in a single scan. The formula looks something like:

  • If Inner.JoinColumn < Outer.JoinColumn, then advance inner cursor

  • If Inner.JoinColumn = Outer.JoinColumn, then read match

  • If Inner.JoinColumn > Outer.JoinColumn, then advance outer cursor

Unlike a nested loop join, there is no need to scan the entire inner table for each row in the outer table. A merge join can be used when range indexes have been created for the tables before preparing the query. If no range indexes exist for the tables being joined before preparing the query, the optimizer may in some situations create temporary range indexes in order to use a merge join.

Figure 5-3 shows an example of a merge join. The join condition is:

WHERE t1.a=t2.a

x1 is the index on table t1, sorting on column a. x2 is the index on table t2, sorting on column a. The merge join results concatenate the rows in x1 with rows in x2 in which the values in column a match. For example, the first merge join result is:

1 20 23.09 20 43.59

It concatenates a row in x1:

1 20 23.09

with the first row in x2 in which the values in column a match:

1 20 43.59