Oracle7 Server Concepts Manual Go to Product Documentation Library
Go to books for this product
Go to Contents for this book
Go to Index

Go to previous file in sequence Go to next file in sequence

The Optimizer

I do the very best I know how--the very best I can; and I mean to keep doing so until the end.

Abraham Lincoln

This chapter discusses how the Oracle optimizer chooses how to execute SQL statements. It includes:

For more information on the Oracle optimizer, see Chapter 5 of Oracle7 Server Tuning.

What Is Optimization?

Optimization is the process of choosing the most efficient way to execute a SQL statement. This is an important step in the processing of any Data Manipulation Language statement (SELECT, INSERT, UPDATE, or DELETE). There may be many different ways for Oracle to execute such a statement, varying, for example, which tables or indexes are accessed in which order. The procedure used to execute a statement can greatly affect how quickly the statement executes. A part of Oracle called the optimizer chooses the way that it believes to be the most efficient.

The optimizer considers a number of factors to make what is usually the best choice among its alternatives. However, an application designer usually knows more about a particular application's data than the optimizer could know. Despite the best efforts of the optimizer, in some situations a developer can choose a more effective way to execute a SQL statement than the optimizer can.

Note: The optimizer may not make the same decisions from one version of Oracle to the next. In future versions of Oracle, the optimizer may make different decisions based on better, more sophisticated information available to it.

Execution Plans

To execute a Data Manipulation Language statement, Oracle may have to perform many steps. Each of these steps either physically retrieves rows of data from the database or prepares them in some way for the user issuing the statement. The combination of the steps Oracle uses to execute a statement is called an execution plan.


This example shows an execution plan for this SQL statement:

SELECT ename, job, sal, dname 
	FROM emp, dept 
 WHERE emp.deptno = dept.deptno 
		(SELECT * 
			FROM salgrade         
		  WHERE emp.sal BETWEEN losal AND hisal);

This statement selects the name, job, salary, and department name for all employees whose salaries do not fall into any recommended salary range.

Figure 13 - 1 shows a graphical representation of the execution plan.

Figure 13 - 1. An Execution Plan

Steps of Execution Plan

Each step of the execution plan returns a set of rows that either are used by the next step or, in the last step, returned to the user or application issuing the SQL statement. A set of rows returned by a step is called a row source. Figure 13 - 1 is a hierarchical diagram showing the flow of rows from one step to another. The numbering of the steps reflects the order in which they are shown when you view the execution plan, as described in the section "The EXPLAIN PLAN Command", on 13-5. This generally is not the order in which the steps are executed.

Each step of the execution plan either retrieves rows from the database or accepts rows from one or more row sources as input:

Access paths are discussed in the section "Choosing Access Paths" [*]. Methods by which Oracle joins row sources are discussed in Oracle7 Server Tuning.

Order of Performing Execution Plan Steps

The steps of the execution plan are not performed in the order in which they are numbered. Oracle first performs the steps that appear as leaf nodes in the tree-structured graphical representation in Figure 13 - 1. The rows returned by each step become the row sources of its parent step. Then Oracle performs the parent steps.

To execute the statement for Figure 13 - 1, for example, Oracle performs the steps in this order:

Note that Oracle performs Steps 5, 4, 2, 6, and 1 once for each row returned by Step 3. Many parent steps require only a single row from their child steps before they can be executed. For such a parent step, Oracle performs the parent step (and possibly the rest of the execution plan) as soon as a single row has been returned from the child step. If the parent of that parent step also can be activated by the return of a single row, then it is executed as well. Thus the execution can cascade up the tree possibly to encompass the rest of the execution plan. Oracle performs the parent step and all cascaded steps once for each row in turn retrieved by the child step. The parent steps that are triggered for each row returned by a child step include table accesses, index accesses, nested loops joins, and filters.

Some parent steps require all rows from their child steps before they can be performed. For such a parent step, Oracle cannot perform the parent step until all rows have been returned from the child step. Such parent steps include sorts, sort-merge joins, group functions, and aggregates.


You can examine the execution plan chosen by the optimizer for a SQL statement by using the EXPLAIN PLAN command. This command causes the optimizer to choose the execution plan and then inserts data describing the plan into a database table. The following is such a description for the statement examined in the previous section:

1		  FILTER 

You can obtain such a listing by using the EXPLAIN PLAN command and then querying the output table. For information on how to use this command and produce and interpret its output, see Appendix A "Performance Diagnostic Tools" of Oracle7 Server Tuning.

Each box in Figure 13 - 1 and each row in the output table corresponds to a single step in the execution plan. For each row in the listing, the value in the ID column is the value shown in the corresponding box in Figure 13 - 1.

Oracle's Approaches to Optimization

To choose an execution plan for a SQL statement, the optimizer uses one of these approaches:


Using the rule-based approach, the optimizer chooses an execution plan based on the access paths available and the ranks of these access paths in Table 13 - 1 [*].


Using the cost-based approach, the optimizer considers available access paths and factors in information based on statistics in the data dictionary for the objects (tables, clusters, or indexes) accessed by the statement to determine which execution plan is most efficient. The cost-based approach also considers hints, or optimization suggestions in the statement placed in a comment.

The Rule-Based Approach

The rule-based approach chooses execution plans based on heuristically ranked operations. If there is more than one way to execute a SQL statement, the rule-based approach always uses the operation with the lower rank. Usually, operations of lower rank execute faster than those associated with constructs of higher rank.

The Cost-Based Approach

Conceptually, the cost-based approach consists of these steps:

Goal of the Cost-Based Approach By default, the goal of the cost-based approach is the best throughput, or minimal elapsed time necessary to process all rows accessed by the statement.

Oracle can also optimize a statement with the goal of best response time, or minimal elapsed time necessary to process the first row accessed by a SQL statement. For information on how the optimizer chooses an optimization approach and goal, see Oracle7 Server Tuning.

Statistics Used for the Cost-Based Approach The cost-based approach uses statistics to estimate the cost of each execution plan. These statistics quantify the data distribution and storage characteristics of tables, columns, and indexes. These statistics are generated using the ANALYZE command. Using these statistics, the optimizer estimates how much I/O, CPU time, and memory are required to execute a SQL statement using a particular execution plan.

The statistics are visible through these data dictionary views:

For information on these statistics, see the Oracle7 Server Reference.


Oracle's cost based optimizer (CBO) uses data value histograms to get accurate estimates of the distribution of column data. Histograms provide improved selectivity estimates in the presence of data skew, resulting in optimal execution plans with non-uniform data distributions. You generate histograms by using the ANALYZE command.

One of the fundamental capabilities of any cost-based optimizer is determining the selectivity of predicates that appear in queries. In releases earlier than 7.3, Oracle's cost-based optimizer supported accurate selectivity estimates, assuming that the attribute domains (a table's columns) were uniformly distributed. However, most attribute domains are not uniformly distributed.

Histograms enable the cost-based optimizer to describe the distributions of non-uniform domains by using height balanced histograms on specified attributes. Selectivity estimates are used to decide when to use an index and to choose the order that tables are joined.


Consider a column C with values between 1 and 100 and a histogram with 10 buckets. If the data in C is uniformly distributed, this histogram would look like this, where the numbers are the endpoint values.

The number of rows in each bucket is one tenth the total number of rows in the table.

If the data is not uniformly distributed, the histogram might look like this:

In this case, most of the rows have the value 5 for the column. In the uniform example, 4/10 of the rows had values between 60 and 100, in the non-uniform example, only 1/10 of the rows have values between 60 and 100.

When to Use Histograms

For many users, it is appropriate to use the FOR ALL INDEXED COLUMNS option for creating histograms because indexed columns are typically the columns most often used in WHERE clauses.

You can view histograms by using the following views:

Histograms are useful only when they reflect the current data distribution of a given column. If the data distribution is not static, the histogram should be updated frequently. The data need not be static as long as the distribution remains constant.

Histograms can affect performance and should be used only when they substantially improve query plans. Histograms are not useful for columns with the following characteristics:

For More Information

See Oracle7 Server Tuning.

How Oracle Optimizes SQL Statements

This section explains how Oracle optimizes SQL statements. For any SQL statement processed by Oracle, the optimizer does the following:

evaluation of expressions and conditions

The optimizer first evaluates expressions and conditions containing constants as fully as possible.

statement transformation

For a complex statement involving, for example, correlated subqueries, the optimizer may transform the original statement into an equivalent join statement.

view merging

For a SQL statement that accesses a view, the optimizer often merges the query in the statement with that in the view and then optimizes the result.

choice of optimization approaches

The optimizer chooses either a rule-based or cost-based approach to optimization.

choice of access paths

For each table accessed by the statement, the optimizer chooses one or more of the available access paths to obtain the table's data.

choice of join orders

For a join statement that joins more than two tables, the optimizer chooses which pair of tables is joined first, and then which table is joined to the result, on so on.

choice of join operations

For any join statement, the optimizer chooses an operation to use to perform the join.

Types of SQL Statements

Oracle optimizes these different types of SQL statements:

simple statements

A simple statement is an INSERT, UPDATE, DELETE, or SELECT statement that only involves a single table.

simple queries

A query is another name for a SELECT statement.


A join is a query that selects data from more than one table. A join is characterized by multiple tables in the FROM clause. Oracle pairs the rows from these tables using the condition specified in the WHERE clause and returns the resulting rows. This condition is called the join condition and usually compares columns of all the joined tables.


An equijoin is characterized by a join condition containing an equality operator.


A nonequijoin is characterized by a join condition containing something other than an equality operator.

outer joins

An outer join is characterized by a join condition that uses the outer join operator (+) with one or more of the columns of one of the tables. Oracle returns all rows that meet the join condition. Oracle also returns all rows from the table without the outer join operator for which there are no matching rows in the table with the outer join operator.

cartesian products

A join with no join condition results in a cartesian product, or a cross product. A cartesian product is the set of all possible combinations of rows drawn one from each table. In other words, for a join of two tables, each row in one table is matched in turn with every row in the other. A cartesian product for more than two tables is the result of pairing each row of one table with every row of the cartesian product of the remaining tables. All other kinds of joins are subsets of cartesian products effectively created by deriving the cartesian product and then excluding rows that fail the join condition.

complex statements

A complex statement is an INSERT, UPDATE, DELETE, or SELECT statement that contains a form of the SELECT statement called a subquery. This is a query within another statement that produces a set of values for further processing within the statement. The outer portion of the complex statement that contains a subquery is called the parent statement.

compound queries

A compound query is a query that uses set operators (UNION, UNION ALL, INTERSECT, or MINUS) to combine two or more simple or complex statements. Each simple or complex statement in a compound query is called a component query.

statements accessing views

You can also write simple, join, complex, and compound statements that access views as well as tables.

distributed statements

A distributed statement is one that accesses data on a remote database.

Choosing Access Paths

One of the most important choices the optimizer makes when formulating an execution plan is how to retrieve the data from the database. For any row in any table accessed by a SQL statement, there may be many access paths by which that row can be located and retrieved. The optimizer chooses one of them.

This section discusses these topics:

Access Methods

This section describes basic methods by which Oracle can access data.

Full Table Scans A full table scan retrieves rows from a table. To perform a full table scan, Oracle reads all rows in the table, examining each row to determine whether it satisfies the statement's WHERE clause. Oracle reads every data block allocated to the table sequentially, so a full table scan can be performed very efficiently using multiblock reads. Oracle reads each data block only once.

Table Access by ROWID A table access by ROWID also retrieves rows from a table. The ROWID of a row specifies the datafile and data block containing the row and the location of the row in that block. Locating a row by its ROWID is the fastest way for Oracle to find a single row.

To access a table by ROWID, Oracle first obtains the ROWIDs of the selected rows, either from the statement's WHERE clause or through an index scan of one of more of the table's indexes. Oracle then locates each selected row in the table based on its ROWID.

Cluster Scans From a table stored in an indexed cluster, a cluster scan retrieves rows that have the same cluster key value. In an indexed cluster, all rows with the same cluster key value are stored in the same data blocks. To perform a cluster scan, Oracle first obtains the ROWID of one of the selected rows by scanning the cluster index. Oracle then locates the rows based on this ROWID.

Hash Scans Oracle can use a hash scan to locate rows in a hash cluster based on a hash value. In a hash cluster, all rows with the same hash value are stored in the same data blocks. To perform a hash scan, Oracle first obtains the hash value by applying a hash function to a cluster key value specified by the statement. Oracle then scans the data blocks containing rows with that hash value.

Index Scans An index scan retrieves data from an index based on the value of one or more columns of the index. To perform an index scan, Oracle searches the index for the indexed column values accessed by the statement. If the statement accesses only columns of the index, Oracle reads the indexed column values directly from the index, rather than from the table.

In addition to each indexed value, an index also contains the ROWIDs of rows in the table having that value. If the statement accesses other columns in addition to the indexed columns, Oracle then finds the rows in the table with a table access by ROWID or a cluster scan.

An index scan can be one of these types:


A unique scan of an index returns only a single ROWID. Oracle can only perform a unique scan in cases in which only a single ROWID, rather than many ROWIDs, is required. For example, Oracle performs a unique scan if there is a UNIQUE or a PRIMARY KEY constraint that guarantees that the statement accesses only a single row.


A range scan of an index can return one or more ROWIDs depending on how many rows are accessed by the statement.

Access Paths

Table 13 - 1 lists the access paths. The rank of each path is used by the rule-based approach to choose a path when more than one path is available.

Rank Access Path
1 Single row by ROWID
2 Single row by cluster join
3 Single row by hash cluster key with unique or primary key
4 Single row by unique or primary key
5 Cluster join
6 Hash cluster key
7 Indexed cluster key
8 Composite key
9 Single-column indexes
10 Bounded range search on indexed columns
11 Unbounded range search on indexed columns
12 Sort-merge join
13 MAX or MIN of indexed column
14 ORDER BY on indexed columns
15 Full table scan
Table 13 - 1. Access Paths

The optimizer can only choose to use a particular access path for a table if the statement contains a WHERE clause condition or other construct that makes that access path available. Each of the following sections describes an access path and discusses

Choosing Among Access Paths

This section describes how the optimizer chooses among available access paths:

Choosing Among Access Paths with the Rule-Based Approach With the rule-based approach, the optimizer chooses whether to use an access path based on these factors:

To choose an access path, the optimizer first examines the conditions in the statement's WHERE clause to determine which access paths are available. The optimizer then chooses the most highly ranked available access path. Note that the full table scan is the lowest ranked access path on the list. This means that the rule-based approach always chooses an access path that uses an index if one is available, even if a full table scan might execute faster.

The order of the conditions in the WHERE clause does not normally affect the optimizer's choice among access paths.


Consider this SQL statement that selects the employee numbers of all employees in the EMP table with an ENAME value of 'CHUNG' and with a SAL value greater than 2000:

SELECT empno 
	FROM emp 
	WHERE ename = 'CHUNG'  
	  AND sal > 2000;

Consider also that the EMP table has these integrity constraints and indexes:

Based on the conditions in the WHERE clause of the SQL statement, the integrity constraints, and the indexes, these access paths are available:

Note that the PK_EMPNO index does not make the single row by primary key access path available because the indexed column does not appear in a condition in the WHERE clause.

Using the rule-based approach, the optimizer chooses the access path that uses the ENAME_IND index to execute this statement. The optimizer chooses this path because it is the most highly ranked path available.

Choosing Among Access Paths with the Cost-Based Approach With the cost-based approach, the optimizer chooses an access path based on these factors:

To choose an access path, the optimizer first determines which access paths are available by examining the conditions in the statement's WHERE clause. The optimizer then generates a set of possible execution plans using available access paths and estimates the cost of each plan using the statistics for the index, columns, and tables accessible to the statement. The optimizer then chooses the execution plan with the lowest estimated cost.

The optimizer's choice among available access paths can be overridden with hints. For information on hints, see Oracle7 Server Tuning.

To choose among available access paths, the optimizer considers these factors:


Consider this query that uses an equality condition in its WHERE clause to select all employees name Jackson:

	FROM emp 	
	WHERE ename = 'JACKSON';

If the ENAME column is a unique or primary key, the optimizer determines that there is only one employee named Jackson, and the query returns only one row. In this case, the query is very selective, and the optimizer is most likely to access the table using a unique scan on the index that enforces the unique or primary key (access path 4).


Consider again the query in the previous example. If the ENAME column is not a unique or primary key, the optimizer can use these statistics to estimate the query's selectivity:


This statistic is the number of values for each column in the table.


This statistic is the number of rows in each table.

By dividing the number of rows in the EMP table by the number of distinct values in the ENAME column, the optimizer estimates what percentage of employees have the same name. By assuming that the ENAME values are uniformly distributed, the optimizer uses this percentage as the estimated selectivity of the query.


Consider this query that selects all employees with employee ID numbers less than 7500:

	FROM emp 	
	WHERE empno < 7500;

To estimate the selectivity of the query, the optimizer uses the boundary value of 7500 in the WHERE clause condition and the values of the HIGH_VALUE and LOW_VALUE statistics for the EMPNO column if available. These statistics can be found in the USER_TAB_COLUMNS view. The optimizer assumes that EMPNO values are evenly distributed in the range between the lowest value and highest value. The optimizer then determines what percentage of this range is less than the value 7500 and uses this value as the estimated selectivity of the query.

Optimizing Distributed Statements

The optimizer chooses execution plans for SQL statements that access data on remote databases in much the same way it chooses executions for statements that access only local data:

When choosing the execution plan for a distributed statement, the optimizer considers the available indexes on remote databases just as it does indexes on the local database. If the statement uses the cost-based approach, the optimizer also considers statistics on remote databases. Furthermore, the optimizer considers the location of data when estimating the cost of accessing it. For example, a full scan of a remote table has a greater estimated cost than a full scan of an identical local table.

Go to previous file in sequence Go to next file in sequence
Prev Next
Copyright © 1996 Oracle Corporation.
All Rights Reserved.
Go to Product Documentation Library
Go to books for this product
Go to Contents for this book
Go to Index