|Oracle7 Server Concepts Manual||
This chapter discusses how the Oracle optimizer chooses how to execute SQL statements. It includes:
The optimizer considers a number of factors to make what is usually the best choice among its alternatives. However, an application designer usually knows more about a particular application's data than the optimizer could know. Despite the best efforts of the optimizer, in some situations a developer can choose a more effective way to execute a SQL statement than the optimizer can.
Note: The optimizer may not make the same decisions from one version of Oracle to the next. In future versions of Oracle, the optimizer may make different decisions based on better, more sophisticated information available to it.
This example shows an execution plan for this SQL statement:
SELECT ename, job, sal, dname
FROM emp, dept
WHERE emp.deptno = dept.deptno
AND NOT EXISTS
WHERE emp.sal BETWEEN losal AND hisal);
This statement selects the name, job, salary, and department name for all employees whose salaries do not fall into any recommended salary range.
Figure 13 - 1 shows a graphical representation of the execution plan.
Figure 13 - 1. An Execution Plan
Each step of the execution plan either retrieves rows from the database or accepts rows from one or more row sources as input:
To execute the statement for Figure 13 - 1, for example, Oracle performs the steps in this order:
Some parent steps require all rows from their child steps before they can be performed. For such a parent step, Oracle cannot perform the parent step until all rows have been returned from the child step. Such parent steps include sorts, sort-merge joins, group functions, and aggregates.
ID OPERATION OPTIONS OBJECT_NAME
0 SELECT STATEMENT
2 NESTED LOOPS
3 TABLE ACCESS FULL EMP
4 TABLE ACCESS BY ROWID DEPT
5 INDEX UNIQUE SCAN PK_DEPTNO
6 TABLE ACCESS FULL SALGRADE
You can obtain such a listing by using the EXPLAIN PLAN command and then querying the output table. For information on how to use this command and produce and interpret its output, see Appendix A "Performance Diagnostic Tools" of Oracle7 Server Tuning.
Each box in Figure 13 - 1 and each row in the output table corresponds to a single step in the execution plan. For each row in the listing, the value in the ID column is the value shown in the corresponding box in Figure 13 - 1.
Using the rule-based approach, the optimizer chooses an execution plan based on the access paths available and the ranks of these access paths in Table 13 - 1 .
Using the cost-based approach, the optimizer considers available access paths and factors in information based on statistics in the data dictionary for the objects (tables, clusters, or indexes) accessed by the statement to determine which execution plan is most efficient. The cost-based approach also considers hints, or optimization suggestions in the statement placed in a comment.
Oracle can also optimize a statement with the goal of best response time, or minimal elapsed time necessary to process the first row accessed by a SQL statement. For information on how the optimizer chooses an optimization approach and goal, see Oracle7 Server Tuning.
Statistics Used for the Cost-Based Approach The cost-based approach uses statistics to estimate the cost of each execution plan. These statistics quantify the data distribution and storage characteristics of tables, columns, and indexes. These statistics are generated using the ANALYZE command. Using these statistics, the optimizer estimates how much I/O, CPU time, and memory are required to execute a SQL statement using a particular execution plan.
The statistics are visible through these data dictionary views:
One of the fundamental capabilities of any cost-based optimizer is determining the selectivity of predicates that appear in queries. In releases earlier than 7.3, Oracle's cost-based optimizer supported accurate selectivity estimates, assuming that the attribute domains (a table's columns) were uniformly distributed. However, most attribute domains are not uniformly distributed.
Histograms enable the cost-based optimizer to describe the distributions of non-uniform domains by using height balanced histograms on specified attributes. Selectivity estimates are used to decide when to use an index and to choose the order that tables are joined.
The number of rows in each bucket is one tenth the total number of rows in the table.
If the data is not uniformly distributed, the histogram might look like this:
In this case, most of the rows have the value 5 for the column. In the uniform example, 4/10 of the rows had values between 60 and 100, in the non-uniform example, only 1/10 of the rows have values between 60 and 100.
You can view histograms by using the following views:
Histograms can affect performance and should be used only when they substantially improve query plans. Histograms are not useful for columns with the following characteristics:
evaluation of expressions and conditions
The optimizer first evaluates expressions and conditions containing constants as fully as possible.
For a complex statement involving, for example, correlated subqueries, the optimizer may transform the original statement into an equivalent join statement.
For a SQL statement that accesses a view, the optimizer often merges the query in the statement with that in the view and then optimizes the result.
choice of optimization approaches
The optimizer chooses either a rule-based or cost-based approach to optimization.
choice of access paths
For each table accessed by the statement, the optimizer chooses one or more of the available access paths to obtain the table's data.
choice of join orders
For a join statement that joins more than two tables, the optimizer chooses which pair of tables is joined first, and then which table is joined to the result, on so on.
choice of join operations
For any join statement, the optimizer chooses an operation to use to perform the join.
A simple statement is an INSERT, UPDATE, DELETE, or SELECT statement that only involves a single table.
A query is another name for a SELECT statement.
A join is a query that selects data from more than one table. A join is characterized by multiple tables in the FROM clause. Oracle pairs the rows from these tables using the condition specified in the WHERE clause and returns the resulting rows. This condition is called the join condition and usually compares columns of all the joined tables.
An equijoin is characterized by a join condition containing an equality operator.
A nonequijoin is characterized by a join condition containing something other than an equality operator.
An outer join is characterized by a join condition that uses the outer join operator (+) with one or more of the columns of one of the tables. Oracle returns all rows that meet the join condition. Oracle also returns all rows from the table without the outer join operator for which there are no matching rows in the table with the outer join operator.
A join with no join condition results in a cartesian product, or a cross product. A cartesian product is the set of all possible combinations of rows drawn one from each table. In other words, for a join of two tables, each row in one table is matched in turn with every row in the other. A cartesian product for more than two tables is the result of pairing each row of one table with every row of the cartesian product of the remaining tables. All other kinds of joins are subsets of cartesian products effectively created by deriving the cartesian product and then excluding rows that fail the join condition.
A complex statement is an INSERT, UPDATE, DELETE, or SELECT statement that contains a form of the SELECT statement called a subquery. This is a query within another statement that produces a set of values for further processing within the statement. The outer portion of the complex statement that contains a subquery is called the parent statement.
A compound query is a query that uses set operators (UNION, UNION ALL, INTERSECT, or MINUS) to combine two or more simple or complex statements. Each simple or complex statement in a compound query is called a component query.
statements accessing views
You can also write simple, join, complex, and compound statements that access views as well as tables.
A distributed statement is one that accesses data on a remote database.
This section discusses these topics:
Full Table Scans A full table scan retrieves rows from a table. To perform a full table scan, Oracle reads all rows in the table, examining each row to determine whether it satisfies the statement's WHERE clause. Oracle reads every data block allocated to the table sequentially, so a full table scan can be performed very efficiently using multiblock reads. Oracle reads each data block only once.
Table Access by ROWID A table access by ROWID also retrieves rows from a table. The ROWID of a row specifies the datafile and data block containing the row and the location of the row in that block. Locating a row by its ROWID is the fastest way for Oracle to find a single row.
To access a table by ROWID, Oracle first obtains the ROWIDs of the selected rows, either from the statement's WHERE clause or through an index scan of one of more of the table's indexes. Oracle then locates each selected row in the table based on its ROWID.
Cluster Scans From a table stored in an indexed cluster, a cluster scan retrieves rows that have the same cluster key value. In an indexed cluster, all rows with the same cluster key value are stored in the same data blocks. To perform a cluster scan, Oracle first obtains the ROWID of one of the selected rows by scanning the cluster index. Oracle then locates the rows based on this ROWID.
Hash Scans Oracle can use a hash scan to locate rows in a hash cluster based on a hash value. In a hash cluster, all rows with the same hash value are stored in the same data blocks. To perform a hash scan, Oracle first obtains the hash value by applying a hash function to a cluster key value specified by the statement. Oracle then scans the data blocks containing rows with that hash value.
Index Scans An index scan retrieves data from an index based on the value of one or more columns of the index. To perform an index scan, Oracle searches the index for the indexed column values accessed by the statement. If the statement accesses only columns of the index, Oracle reads the indexed column values directly from the index, rather than from the table.
In addition to each indexed value, an index also contains the ROWIDs of rows in the table having that value. If the statement accesses other columns in addition to the indexed columns, Oracle then finds the rows in the table with a table access by ROWID or a cluster scan.
An index scan can be one of these types:
A unique scan of an index returns only a single ROWID. Oracle can only perform a unique scan in cases in which only a single ROWID, rather than many ROWIDs, is required. For example, Oracle performs a unique scan if there is a UNIQUE or a PRIMARY KEY constraint that guarantees that the statement accesses only a single row.
A range scan of an index can return one or more ROWIDs depending on how many rows are accessed by the statement.
|1||Single row by ROWID|
|2||Single row by cluster join|
|3||Single row by hash cluster key with unique or primary key|
|4||Single row by unique or primary key|
|6||Hash cluster key|
|7||Indexed cluster key|
|10||Bounded range search on indexed columns|
|11||Unbounded range search on indexed columns|
|13||MAX or MIN of indexed column|
|14||ORDER BY on indexed columns|
|15||Full table scan|
The order of the conditions in the WHERE clause does not normally affect the optimizer's choice among access paths.
Consider this SQL statement that selects the employee numbers of all employees in the EMP table with an ENAME value of 'CHUNG' and with a SAL value greater than 2000:
WHERE ename = 'CHUNG'
AND sal > 2000;
Consider also that the EMP table has these integrity constraints and indexes:
Using the rule-based approach, the optimizer chooses the access path that uses the ENAME_IND index to execute this statement. The optimizer chooses this path because it is the most highly ranked path available.
Choosing Among Access Paths with the Cost-Based Approach With the cost-based approach, the optimizer chooses an access path based on these factors:
The optimizer's choice among available access paths can be overridden with hints. For information on hints, see Oracle7 Server Tuning.
To choose among available access paths, the optimizer considers these factors:
Consider this query that uses an equality condition in its WHERE clause to select all employees name Jackson:
FROM emp WHERE ename = 'JACKSON';
If the ENAME column is a unique or primary key, the optimizer determines that there is only one employee named Jackson, and the query returns only one row. In this case, the query is very selective, and the optimizer is most likely to access the table using a unique scan on the index that enforces the unique or primary key (access path 4).
Consider again the query in the previous example. If the ENAME column is not a unique or primary key, the optimizer can use these statistics to estimate the query's selectivity:
This statistic is the number of values for each column in the table.
This statistic is the number of rows in each table.
By dividing the number of rows in the EMP table by the number of distinct values in the ENAME column, the optimizer estimates what percentage of employees have the same name. By assuming that the ENAME values are uniformly distributed, the optimizer uses this percentage as the estimated selectivity of the query.
Consider this query that selects all employees with employee ID numbers less than 7500:
WHERE empno < 7500;
To estimate the selectivity of the query, the optimizer uses the boundary value of 7500 in the WHERE clause condition and the values of the HIGH_VALUE and LOW_VALUE statistics for the EMPNO column if available. These statistics can be found in the USER_TAB_COLUMNS view. The optimizer assumes that EMPNO values are evenly distributed in the range between the lowest value and highest value. The optimizer then determines what percentage of this range is less than the value 7500 and uses this value as the estimated selectivity of the query.
Copyright © 1996 Oracle Corporation.
All Rights Reserved.