The Derby optimizer usually makes a good
choice about join order. This section discusses the performance implications
of join order.
In a join operation involving two tables, Derby scans the tables
in a particular order. Derby accesses rows in one table first, and this
table is now called the outer table.
Then, for each qualifying row in the outer table, Derby looks for
matching rows in the second table, which is called the inner table.
Derby accesses the outer table once, and the inner table probably
many times (depending on how many rows in the outer table qualify).
This leads to a few general rules of thumb about join order:
- If the join has no restrictions in the WHERE clause that would limit the
number of rows returned from one of the tables to just a few, the following
rules apply:
- If only one table has an index on the joined column
or columns, it is much better for that table to be the inner table. This is
because for each of the many inner table lookups, Derby can use an index
instead of scanning the entire table.
- Since indexes on inner tables are accessed many times, if the index on
one table is smaller than the index on another, the table with the smaller
one should probably be the inner table. That is because smaller indexes (or
tables) can be cached (kept in Derby's memory, allowing Derby to
avoid expensive I/O for each iteration).
- On the other hand, if a query has restrictions in the WHERE clause for
one table that would cause it to return only a few rows from that table (for
example, WHERE flight_id = 'AA1111'), it is better for the restricted table
to be the outer table. Derby will have to go to the inner table only
a few times anyway.
Consider:
SELECT *
FROM huge_table, small_table
WHERE huge_table.unique_column = 1
AND huge_table.other_column = small_table.non_unique_column
- In this case, the qualification huge_table.unique_column
= 1 (assuming a unique index on the column) qualifies only one row, so
it is better for huge_table to be the outer table
in the join.