A Optimizing SQL Queries

This document provides tips on improving the performance of your SQL queries. Topics include:

Section A.1, "Optimizing Single-Table Queries"
Section A.2, "Optimizing Join Queries"
Section A.3, "Optimizing with Order By and Group By Clauses"

The tip examples use the database schema listed in Table A-1:

Table A-1 Database Schema Examples

Tables	Columns	Primary Keys	Foreign Keys
LOCATION	LOC# LOC_NAME	LOC#
EMP	SS# NAME JOB_TITLE WORKS_IN	SS#	WORKS_IN references DEPT (DEPT#)
DEPT	DEPT# NAME BUDGET LOC MGR	DEPT#	LOC references LOCATION (LOC#) MGR references EMP (SS#)

Tables

Columns

Primary Keys

Foreign Keys

LOCATION

LOC#

LOC_NAME

LOC#

EMP

SS#

NAME

JOB_TITLE

WORKS_IN

SS#

WORKS_IN references DEPT (DEPT#)

DEPT

DEPT#

NAME

BUDGET

LOC

MGR

DEPT#

LOC references LOCATION (LOC#)

MGR references EMP (SS#)

A.1 Optimizing Single-Table Queries

To improve the performance of a query that selects rows of a table based on a specific column value, create an index on that column. For example, the following query performs better if the NAME column of the EMP table has an index.

SELECT * 

FROM EMP 

WHERE NAME = 'Smith';

An index may ruin performance if selecting more than 10% of the rows of the indexing columns is poor. For example, an index on JOB_TITLE may not be a good choice even if the query is as follows.

SELECT *
FROM EMP
WHERE JOB_TITLE='CLERK'

A.2 Optimizing Join Queries

The following can improve the performance of a join query (a query with more than one table reference in the FROM clause).

A.2.1 Create an Index on the Join Column(s) of the Inner Table

In the following example, the inner table of the join query is DEPT and the join column of DEPT is DEPT#. An index on DEPT.DEPT# improves the performance of the query. In this example, since DEPT# is the primary key of DEPT, an index is implicitly created for it. The optimizer will detect the presence of the index and decide to use DEPT as the inner table. In case there is also an index on EMP.WORKS_IN column the optimizer evaluates the cost of both orders of execution; DEPT followed by EMP (where EMP is the inner table) and EMP followed by DEPT (where DEPT is the inner table) and picks the least expensive execution plan.

SELECT e.SS#, e.NAME, d.BUDGET

FROM EMP e, DEPT d 

WHERE e.WORKS_IN = DEPT.DEPT# 

AND e.JOB_TITLE = 'Manager';

A.2.2 Bypassing the Query Optimizer

Normally optimizer picks the best execution plan, an optimal order of tables to be joined. In case the optimizer is not producing a good execution plan you can control the order of execution using the HINTS feature SQL. For more information see the Oracle Database Lite SQL Reference.

For example, if you want to select the name of each department along with the name of its manager, you can write the query in one of two ways. In the first example which follows, the hint /*+ordered*/ says to do the join in the order the tables appear in the FROM clause.

SELECT /*+ordered*/ d.NAME, e.NAME

FROM DEPT d, EMP e

WHERE d.MGR = e.SS#

or:

SELECT /*+ordered*/ d.NAME, e.NAME 

FROM EMP e, DEPT d 

WHERE d.MGR = e.SS#

Suppose that there are 10 departments and 1000 employees, and that the inner table in each query has an index on the join column. In the first query, the first table produces 10 qualifying rows (in this case, the whole table). In the second query, the first table produces 1000 qualifying rows. The first query will access the EMP table 10 times and scan the DEPT table once. The second query will scan the EMP table once but will access the DEPT table 1000 times. Therefore the first query will perform much better. As a rule of thumb, tables should be arranged from smallest effective number of rows to largest effective number of rows. The effective row size of a table in a query is obtained by applying the logical conditions that are resolved entirely on that table.

In another example, consider a query to retrieve the social security numbers and names of employees in a given location, such as New York. According to the sample schema, the query would have three table references in the FROM clause. The three tables could be ordered in six different ways. Although the result is the same regardless of which order you choose, the performance could be quite different.

Suppose the effective row size of the LOCATION table is small, for example select count(*) from LOCATION where LOC_NAME = 'New York' is a small set. Based on the above rules, the LOCATION table should be the first table in the FROM clause. There should be an index on LOCATION.LOC_NAME. Since LOCATION must be joined with DEPT, DEPT should be the second table and there should be an index on the LOC column of DEPT. Similarly, the third table should be EMP and there should be an index on EMP#. You could write this query as:

SELECT /*+ordered*/ e.SS#, e.NAME 

FROM LOCATION l, DEPT d, EMP e 

WHERE l.LOC_NAME = 'New York' AND 

l.LOC# = d.LOC AND 

d.DEPT# = e.WORKS_IN;

A.3 Optimizing with Order By and Group By Clauses

Various performance improvements have been made so that SELECT statements run faster and consume less memory cache. Group by and Order by clauses attempt to avoid sorting if a suitable index is available.

A.3.1 IN Subquery Conversion

Converts IN subquery to a join when the select list in the subquery is uniquely indexed.

For example, the following IN subquery statement is converted to its corresponding join statement. This assumes that c1 is the primary key of table t2:

SELECT c2 FROM t1 WHERE 

c2 IN (SELECT c1 FROM t2);

becomes:

SELECT c2 FROM t1, t2 WHERE t1.c2 = t2.c1;

A.3.2 ORDER BY Optimization with No GROUP BY

This eliminates the sorting step for an ORDER BY clause in a select statement if ALL of the following conditions are met:

All ORDER BY columns are in ascending order or in descending order.
Only columns appear in the ORDER BY clause. That is, no expressions are used in the ORDER BY clause.
ORDER BY columns are a prefix of some base table index.
The estimated cost of accessing by the index is less than the estimated cost of sorting the result set.

A.3.3 GROUP BY Optimization with No ORDER BY

This eliminates the sorting step for the grouping operation if GROUP BY columns are the prefix of some base table index.

A.3.4 ORDER BY Optimization with GROUP BY

When ORDER BY columns are the prefix of GROUP BY columns, and all columns are sorted in either ascending or in descending order, the sorting step for the query result is eliminated. If GROUP BY columns are the prefix of a base table index, the sorting step in the grouping operation is also eliminated.

A.3.5 Cache Subquery Results

If the optimizer determines that the number of rows returned by a subquery is small and the query is non-correlated, then the query result will be cached in memory for better performance. For example:

select * from t1 where 

t1.c1 = (select sum(salary) 

from t2 where t2.deptno = 100);