Oracle8 Concepts
Release 8.0

A58227-01

Library

Product

Contents

Index

Prev Next

20
The Optimizer

I do the very best I know how - the very best I can; and I mean to keep doing so until the end.

Abraham Lincoln

This chapter discusses how the Oracle optimizer chooses how to execute SQL statements. It includes:

What Is Optimization?

Optimization is the process of choosing the most efficient way to execute a SQL statement. This is an important step in the processing of any data manipulation language (DML) statement: SELECT, INSERT, UPDATE, or DELETE. Many different ways to execute a SQL statement often exist, for example, by varying the order in which tables or indexes are accessed. The procedure Oracle uses to execute a statement can greatly affect how quickly the statement executes.

A part of Oracle called the optimizer chooses what it believes to be the most efficient way. The optimizer evaluates a number of factors to select among alternative access paths. Sometimes the application designer, who has more information about a particular application's data than is available to the optimizer, can choose a more effective way to execute a SQL statement. The application designer can use hints in SQL statements to specify how the statement should be executed.


Note:

The optimizer may not make the same decisions from one version of Oracle to the next. In more recent versions, the optimizer may make different decisions based on better, more sophisticated information available to it.

 

Additional Information:

See Oracle8 Tuning for information about using hints in SQL statements.

 

Execution Plans

To execute a DML statement, Oracle may have to perform many steps. Each of these steps either retrieves rows of data physically from the database or prepares them in some way for the user issuing the statement. The combination of the steps Oracle uses to execute a statement is called an execution plan.

Figure 20-1 shows a graphical representation of the execution plan for the following SQL statement, which selects the name, job, salary, and department name for all employees whose salaries do not fall into a recommended salary range:

SELECT ename, job, sal, dname 
    FROM emp, dept 
  WHERE emp.deptno = dept.deptno 
    AND NOT EXISTS 
      (SELECT * 
           FROM salgrade 
         WHERE emp.sal BETWEEN losal AND hisal); 

Figure 20-1 An Execution Plan

Steps of Execution Plan

Each step of the execution plan returns a set of rows that either are used by the next step or, in the last step, are returned to the user or application issuing the SQL statement. A set of rows returned by a step is called a row source.

Figure 20-1 is a hierarchical diagram showing the flow of row sources from one step to another. The numbering of the steps reflects the order in which they are displayed in response to the EXPLAIN PLAN command (described in the next section). This generally is not the order in which the steps are executed (see "Execution Order" on page 20-5).

Each step of the execution plan either retrieves rows from the database or accepts rows from one or more row sources as input:

Access paths are discussed further in the section "Choosing Access Paths" on page 20-43. Methods by which Oracle joins row sources are discussed in "Join Operations" on page 20-67.

The EXPLAIN PLAN Command

You can examine the execution plan chosen by the optimizer for a SQL statement by using the EXPLAIN PLAN command, which causes the optimizer to choose the execution plan and then inserts data describing the plan into a database table.

For example, the following output table is such a description for the statement examined in the previous section:

ID    OPERATION           OPTIONS        OBJECT_NAME 
------------------------------------------------------------ 
0     SELECT STATEMENT 
1       FILTER 
2         NESTED LOOPS 
3           TABLE ACCESS  FULL           EMP 
4           TABLE ACCESS  BY ROWID       DEPT 
5             INDEX       UNIQUE SCAN    PK_DEPTNO 
6           TABLE ACCESS  FULL           SALGRADE 

Each box in Figure 20-1 and each row in the output table corresponds to a single step in the execution plan. For each row in the listing, the value in the ID column is the value shown in the corresponding box in Figure 20-1.

You can obtain such a listing by using the EXPLAIN PLAN command and then querying the output table.

Additional Information:

See Oracle8 Tuning for information on how to use EXPLAIN PLAN and produce and interpret its output.

 

Execution Order

The steps of the execution plan are not performed in the order in which they are numbered. Rather, Oracle first performs the steps that appear as leaf nodes in the tree-structured graphical representation of the execution plan (Steps 3, 5, and 6 in Figure 20-1). The rows returned by each step become the row sources of its parent step. Then Oracle performs the parent steps.

To execute the statement for Figure 20-1, for example, Oracle performs the steps in this order:

Note that Oracle performs Steps 5, 4, 2, 6, and 1 once for each row returned by Step 3. If a parent step requires only a single row from its child step before it can be executed, Oracle performs the parent step (and possibly the rest of the execution plan) as soon as a single row has been returned from the child step. If the parent of that parent step also can be activated by the return of a single row, then it is executed as well.

Thus the execution can cascade up the tree, possibly to encompass the rest of the execution plan. Oracle performs the parent step and all cascaded steps once for each row in turn retrieved by the child step. The parent steps that are triggered for each row returned by a child step include table accesses, index accesses, nested loops joins, and filters.

If a parent step requires all rows from its child step before it can be executed, Oracle cannot perform the parent step until all rows have been returned from the child step. Such parent steps include sorts, sort-merge joins, group functions, and aggregates.

Cost-Based and Rule-Based Optimization

To choose an execution plan for a SQL statement, the optimizer uses one of two approaches: cost-based or rule-based.

The Cost-Based Approach

Using the cost-based approach, the optimizer determines which execution plan is most efficient by considering available access paths and factoring in information based on statistics in the data dictionary for the schema objects (tables, clusters, or indexes) accessed by the statement. The cost-based approach also considers hints, or optimization suggestions placed in a Comment in the statement.

Conceptually, the cost-based approach consists of these steps:

  1. The optimizer generates a set of potential execution plans for the statement based on its available access paths and hints.
  2. The optimizer estimates the cost of each execution plan based on the data distribution and storage characteristics statistics for the tables, clusters, and indexes in the data dictionary.

    The cost is an estimated value proportional to the expected resource use needed to execute the statement using the execution plan. The optimizer calculates the cost based on the estimated computer resources, including (but not limited to) I/O, CPU time, and memory, that are required to execute the statement using the plan.

    Serial execution plans with greater costs take more time to execute than those with smaller costs. When using a parallel execution plan, however, resource use is not directly related to elapsed time.

  3. The optimizer compares the costs of the execution plans and chooses the one with the smallest cost.

Goal of the Cost-Based Approach

By default, the goal of the cost-based approach is the best throughput, or minimal resource use necessary to process all rows accessed by the statement.

Oracle can also optimize a statement with the goal of best response time, or minimal resource use necessary to process the first row accessed by a SQL statement. For information on how the optimizer chooses an optimization approach and goal, see "Choosing an Optimization Approach and Goal" on page 20-41.


Note:

For parallel execution, the optimizer can choose to minimize elapsed time at the expense of resource consumption. Use the initialization parameter OPTIMIZER_PERCENT_PARALLEL to specify how much the optimizer attempts to parallelize.

 

Additional Information:

See Oracle8 Tuning for information about using the OPTIMIZER_PERCENT_PARALLEL parameter.

 

Statistics for the Cost-Based Approach

The cost-based approach uses statistics to estimate the cost of each execution plan. These statistics quantify the data distribution and storage characteristics of tables, columns, indexes, and partitions. You can generate these statistics using the ANALYZE command. The optimizer uses these statistics to estimate how much I/O, CPU time, and memory are required to execute a SQL statement using a particular execution plan.

You can view the statistics with these data dictionary views:

Histograms

Oracle's cost-based optimizer uses data value histograms to get accurate estimates of the distribution of column data. Histograms provide improved selectivity estimates in the presence of data skew, resulting in optimal execution plans with nonuniform data distributions. You generate histograms by using the ANALYZE command.

One of the fundamental capabilities of any cost-based optimizer is determining the selectivity of predicates that appear in queries. Selectivity estimates are used to decide when to use an index and the order in which to join tables. Most attribute domains (a table's columns) are not uniformly distributed. The Oracle cost-based optimizer uses height-balanced histograms on specified attributes to describe the distributions of nonuniform domains.

Histogram Examples

Consider a column C with values between 1 and 100 and a histogram with 10 buckets. If the data in C is uniformly distributed, this histogram would look like this, where the numbers are the endpoint values.

The number of rows in each bucket is one tenth the total number of rows in the table. Four-tenths of the rows have values between 60 and 100 in this example of uniform distribution.

If the data is not uniformly distributed, the histogram might look like this:

In this case, most of the rows have the value 5 for the column. In this example, only 1/10 of the rows have values between 60 and 100.

Height-Balanced Histograms

Oracle uses height-balanced histograms (as opposed to width-balanced).

For example, suppose that the values in a single column of a 1000-row table range between 1 and 100, and suppose that you want a 10-bucket histogram (ranges in a histogram are called buckets). In a width-balanced histogram, the buckets would be of equal width (1-10, 11-20, 21-30, and so on) and each bucket would count the number of rows that fall into that bucket's range. In a height-balanced histogram, each bucket has the same height (in this case 100 rows) and the endpoints for each bucket are determined by the density of the distinct values in the column.

Advantages of Height-Balanced Histograms

The advantage of the height-balanced approach is clear when the data is highly skewed. Suppose that 800 rows of a 1000-row table have the value 5, and the remaining 200 rows are evenly distributed between 1 and 100. A width-balanced histogram would have 820 rows in the bucket labeled 1-10 and approximately 20 rows in each of the other buckets. The height-based histogram would have one bucket labeled 1-5, seven buckets labeled 5-5, one bucket labeled 5-50, and one bucket labeled 50-100.

If you want to know how many rows in the table contain the value 5, it is apparent from the height-balanced histogram that approximately 80% of the rows contain this value. However, the width-balanced histogram does not provied a mechanism for differentiating between the value 5 and the value 6. You would compute only 8% of the rows contain the value 5 in a width-balanced histogram. Therefore height-based histograms are more appropriate for determining the selectivity of column values.

When to Use Histograms

For many users, it is appropriate to use the FOR ALL INDEXED COLUMNS option of the ANALYZE command to create histograms because indexed columns are typically the columns most often used in WHERE clauses.

You can view histograms with the following views:

Histograms are useful only when they reflect the current data distribution of a given column. If the data distribution is not static, the histogram should be updated frequently. (The data need not be static as long as the distribution remains constant.)

Histograms can affect performance and should be used only when they substantially improve query plans. Histograms are not useful for columns with the following characteristics:

When to Use the Cost-Based Approach

In general, you should use the cost-based approach for all new applications; the rule-based approach is provided for applications that were written before cost-based optimization was available. Cost-based optimization can be used for both relational data and object types.

The following features can only use cost-based optimization:

The Rule-Based Approach

Using the rule-based approach, the optimizer chooses an execution plan based on the access paths available and the ranks of these access paths (shown in Table 20-1 "Access Paths" on page 20-47). You can use rule-based optimization to access both relational data and object types.

Oracle's ranking of the access paths is heuristic. If there is more than one way to execute a SQL statement, the rule-based approach always uses the operation with the lower rank. Usually, operations of lower rank execute faster than those associated with constructs of higher rank.

For more information, see "Choosing Among Access Paths with the Rule-Based Approach" on page 20-66.

Overview of Optimizer Operations

This section summarizes the operations performed by the Oracle optimizer and describes the types of SQL statements that can be optimized.

Optimizer Operations

For any SQL statement processed by Oracle, the optimizer does the following:

evaluation of expressions and conditions  

The optimizer first evaluates expressions and conditions containing constants as fully as possible. (See "Evaluation of Expressions and Conditions" on page 20-14.)  

statement transformation  

For a complex statement involving, for example, correlated subqueries, the optimizer may transform the original statement into an equivalent join statement. (See "Transforming and Optimizing Statements" on page 20-20.)  

view merging  

For a SQL statement that accesses a view, the optimizer often merges the query in the statement with that in the view and then optimizes the result. (See "Optimizing Statements That Access Views" on page 20-25.)  

choice of optimization approaches  

The optimizer chooses either a cost-based or rule-based approach to optimization and determines the goal of optimization. (See "Choosing an Optimization Approach and Goal" on page 20-41.)  

choice of access paths  

For each table accessed by the statement, the optimizer chooses one or more of the available access paths to obtain the table's data. (See "Choosing Access Paths" on page 20-43.)  

choice of join orders  

For a join statement that joins more than two tables, the optimizer chooses which pair of tables is joined first, and then which table is joined to the result, and so on. (See "Optimizing Join Statements" on page 20-67.)  

choice of join operations  

For any join statement, the optimizer chooses an operation to use to perform the join. (See "Optimizing Join Statements" on page 20-67.)  

Types of SQL Statements

Oracle optimizes these different types of SQL statements:

simple statement  

An INSERT, UPDATE, DELETE, or SELECT statement that involves only a single table.  

simple query  

Another name for a SELECT statement.  

join  

A query that selects data from more than one table. A join is characterized by multiple tables in the FROM clause. Oracle pairs the rows from these tables using the condition specified in the WHERE clause and returns the resulting rows. This condition is called the join condition and usually compares columns of all the joined tables.  

equijoin  

A join condition containing an equality operator.  

nonequijoin  

A join condition containing something other than an equality operator.  

outer join  

A join condition using the outer join operator (+) with one or more columns of one of the tables. Oracle returns all rows that meet the join condition. Oracle also returns all rows from the table without the outer join operator for which there are no matching rows in the table with the outer join operator.  

Cartesian product  

A join with no join condition results in a Cartesian product, or a cross product. A Cartesian product is the set of all possible combinations of rows drawn one from each table. In other words, for a join of two tables, each row in one table is matched in turn with every row in the other. A Cartesian product for more than two tables is the result of pairing each row of one table with every row of the Cartesian product of the remaining tables.

All other kinds of joins are subsets of Cartesian products effectively created by deriving the Cartesian product and then excluding rows that fail the join condition.  

complex statement  

An INSERT, UPDATE, DELETE, or SELECT statement that contains a subquery, which is a form of the SELECT statement within another statement that produces a set of values for further processing within the statement. The outer portion of the complex statement that contains a subquery is called the parent statement.  

compound query  

A query that uses set operators (UNION, UNION ALL, INTERSECT, or MINUS) to combine two or more simple or complex statements. Each simple or complex statement in a compound query is called a component query.  

statement accessing views  

Simple, join, complex, or compound statement that accesses one or more views as well as tables.  

distributed statement  

A statement that accesses data on a remote database.  

Evaluation of Expressions and Conditions

The optimizer fully evaluates expressions whenever possible and translates certain syntactic constructs into equivalent constructs. The reason for this is either that Oracle can more quickly evaluate the resulting expression than the original expression, or that the original expression is merely a syntactic equivalent of the resulting expression. Different SQL constructs can sometimes operate identically (for example, = ANY (subquery) and IN (subquery)); Oracle maps these to a single construct.

Constants

Computation of constants is performed only once, when the statement is optimized, rather than each time the statement is executed.

Consider these conditions that test for monthly salaries greater than 2000:

sal > 24000/12 

sal > 2000 

sal*12 > 24000 

If a SQL statement contains the first condition, the optimizer simplifies it into the second condition.

Note that the optimizer does not simplify expressions across comparison operators: in the examples above, the optimizer does not simplify the third expression into the second. For this reason, application developers should write conditions that compare columns with constants whenever possible, rather than conditions with expressions involving columns.

LIKE Operator

The optimizer simplifies conditions that use the LIKE comparison operator to compare an expression with no wildcard characters into an equivalent condition that uses an equality operator instead. For example, the optimizer simplifies the first condition below into the second:

ename LIKE 'SMITH' 

ename = 'SMITH' 

The optimizer can simplify these expressions only when the comparison involves variable-length datatypes. For example, if ENAME was of type CHAR(10), the optimizer cannot transform the LIKE operation into an equality operation due to the equality operator following blank-padded semantics and LIKE not following blank-padded semantics.

IN Operator

The optimizer expands a condition that uses the IN comparison operator to an equivalent condition that uses equality comparison operators and OR logical operators. For example, the optimizer expands the first condition below into the second:

ename IN ('SMITH', 'KING', 'JONES') 

ename = 'SMITH' OR ename = 'KING' OR ename = 'JONES' 

See "Example 2: IN Subquery" on page 20-28 for more information.

ANY or SOME Operator

The optimizer expands a condition that uses the ANY or SOME comparison operator followed by a parenthesized list of values into an equivalent condition that uses equality comparison operators and OR logical operators. For example, the optimizer expands the first condition below into the second:

sal > ANY (:first_sal, :second_sal) 

sal > :first_sal OR sal > :second_sal 

The optimizer transforms a condition that uses the ANY or SOME operator followed by a subquery into a condition containing the EXISTS operator and a correlated subquery. For example, the optimizer transforms the first condition below into the second:

x > ANY (SELECT sal 
    FROM emp 
    WHERE job = 'ANALYST') 

EXISTS (SELECT sal 
    FROM emp 
    WHERE job = 'ANALYST'  
      AND x > sal) 

ALL Operator

The optimizer expands a condition that uses the ALL comparison operator followed by a parenthesized list of values into an equivalent condition that uses equality comparison operators and AND logical operators. For example, the optimizer expands the first condition below into the second:

sal > ALL (:first_sal, :second_sal) 

sal > :first_sal AND sal > :second_sal 

The optimizer transforms a condition that uses the ALL comparison operator followed by a subquery into an equivalent condition that uses the ANY comparison operator and a complementary comparison operator. For example, the optimizer transforms the first condition below into the second:

x > ALL (SELECT sal 
    FROM emp 
    WHERE deptno = 10) 

NOT (x <= ANY (SELECT sal 
      FROM emp 
      WHERE deptno = 10) ) 

The optimizer then transforms the second query into the following query using the rule for transforming conditions with the ANY comparison operator followed by a correlated subquery:

NOT EXISTS (SELECT sal 
      FROM emp 
      WHERE deptno = 10 
        AND x <= sal) 

BETWEEN Operator

The optimizer always replaces a condition that uses the BETWEEN comparison operator with an equivalent condition that uses the >= and <= comparison operators. For example, the optimizer replaces the first condition below with the second:

sal BETWEEN 2000 AND 3000 

sal >= 2000 AND sal <= 3000 

NOT Operator

The optimizer simplifies a condition to eliminate the NOT logical operator. The simplification involves removing the NOT logical operator and replacing a comparison operator with its opposite comparison operator. For example, the optimizer simplifies the first condition below into the second one:

NOT deptno = (SELECT deptno FROM emp WHERE ename = 'TAYLOR') 

deptno <> (SELECT deptno FROM emp WHERE ename = 'TAYLOR') 

Often a condition containing the NOT logical operator can be written many different ways. The optimizer attempts to transform such a condition so that the subconditions negated by NOTs are as simple as possible, even if the resulting condition contains more NOTs. For example, the optimizer simplifies the first condition below into the second and dhen into the third.

NOT (sal < 1000 OR comm IS NULL) 

NOT sal < 1000 AND comm IS NOT NULL 

sal >= 1000 AND comm IS NOT NULL 

Transitivity

If two conditions in the WHERE clause involve a common column, the optimizer can sometimes infer a third condition using the transitivity principle. The optimizer can then use the inferred condition to optimize the statement. The inferred condition could potentially make available an index access path that was not made available by the original conditions.


Note:

Transitivity is used only by the cost-based approach.

 

Imagine a WHERE clause containing two conditions of these forms:

WHERE column1 comp_oper constant 
  AND column1 = column2 

In this case, the optimizer infers the condition:

column2 comp_oper constant 

where:

comp_oper  

is any of the comparison operators =, !=, ^=, <, <>, >, <=, or >=.  

constant  

is any constant expression involving operators, SQL functions, literals, bind variables, and correlation variables.  

Example:

Consider this query in which the WHERE clause contains two conditions, each or which uses the EMP.DEPTNO column:

SELECT * 
  FROM emp, dept 
  WHERE emp.deptno = 20 
    AND emp.deptno = dept.deptno; 

Using transitivity, the optimizer infers this condition:

dept.deptno = 20 

If an index exists on the DEPT.DEPTNO column, this condition makes available access paths using that index.


Note:

The optimizer only infers conditions that relate columns to constant expressions, rather than columns to other columns. Imagine a WHERE clause containing two conditions of these forms:

WHERE column1 comp_oper column3 
  AND column1 = column2 

In this case, the optimizer does not infer this condition:

column2 comp_oper column3 
 

Transforming and Optimizing Statements

SQL is a very flexible query language; there are often many statements you could formulate to achieve the same goal. Sometimes the optimizer transforms one such statement into another that achieves the same goal if the second statement can be executed more efficiently.

This section discusses the following topics:

For additional information about optimizing statements, see "Optimizing Join Statements" on page 20-67 and "Optimizing "Star" Queries" on page 20-80.

Transforming ORs into Compound Queries

If a query contains a WHERE clause with multiple conditions combined with OR operators, the optimizer transforms it into an equivalent compound query that uses the UNION ALL set operator if this makes it execute more efficiently:

For information on access paths and how indexes make them available, see Table 20-1 "Access Paths" on page 20-47 and the sections that follow it.

Example:

Consider this query with a WHERE clause that contains two conditions combined with an OR operator:

SELECT * 
  FROM emp 
  WHERE job = 'CLERK' 
    OR deptno = 10; 

If there are indexes on both the JOB and DEPTNO columns, the optimizer may transform this query into the equivalent query below:

SELECT * 
  FROM emp 
  WHERE job = 'CLERK' 
UNION ALL 
SELECT * 
  FROM emp 
  WHERE deptno = 10 
    AND job <> 'CLERK'; 

If you are using the cost-based approach, the optimizer compares the cost of executing the original query using a full table scan with that of executing the resulting query when deciding whether to make the transformation.

If you are using the rule-based approach, the optimizer makes this UNION ALL transformation because each component query of the resulting compound query can be executed using an index. The rule-based approach assumes that executing the compound query using two index scans is faster than executing the original query using a full table scan.

The execution plan for the transformed statement might look like the illustration in Figure 20-2:

Figure 20-2 Execution Plan for a Transformed Query Containing OR

To execute the transformed query, Oracle performs the following steps:

If either of the JOB or DEPTNO columns is not indexed, the optimizer does not even consider the transformation, because the resulting compound query would require a full table scan to execute one of its component queries. Executing the compound query with a full table scan in addition to an index scan could not possibly be faster than executing the original query with a full table scan.

Example:

Consider this query and assume that there is an index on the ENAME column only:

SELECT * 
  FROM emp 
  WHERE ename = 'SMITH' 
    OR sal > comm; 

Transforming the query above would result in the compound query below:

SELECT * 
  FROM emp 
  WHERE ename = 'SMITH' 
UNION ALL 
SELECT * 
  FROM emp 
  WHERE sal > comm; 

Since the condition in the WHERE clause of the second component query (SAL > COMM) does not make an index available, the compound query requires a full table scan. For this reason, the optimizer does not make the transformation and it chooses a full table scan to execute the original statement.

Transforming Complex Statements into Join Statements

To optimize a complex statement, the optimizer chooses one of these alternatives:

The optimizer transforms a complex statement into a join statement whenever the resulting join statement is guaranteed to return exactly the same rows as the complex statement. This transformation allows Oracle to execute the statement by taking advantage of join optimization techniques described in "Optimizing Join Statements" on page 20-67.

Consider this complex statement that selects all rows from the ACCOUNTS table whose owners appear in the CUSTOMERS table:

SELECT * 
  FROM accounts 
  WHERE custno IN 
    (SELECT custno FROM customers); 

If the CUSTNO column of the CUSTOMERS table is a primary key or has a UNIQUE constraint, the optimizer can transform the complex query into this join statement that is guaranteed to return the same data:

SELECT accounts.* 
  FROM accounts, customers 
  WHERE accounts.custno = customers.custno; 

The execution plan for this statement might look like Figure 20-3.

Figure 20-3 Execution Plan for a Nested Loops Join

To execute this statement, Oracle performs a nested-loops join operation. For information on nested loops joins, see "Join Operations" on page 20-67.

If the optimizer cannot transform a complex statement into a join statement, the optimizer chooses execution plans for the parent statement and the subquery as though they were separate statements. Oracle then executes the subquery and uses the rows it returns to execute the parent query.

Consider this complex statement that returns all rows from the ACCOUNTS table that have balances greater than the average account balance:

SELECT * 
  FROM accounts 
  WHERE accounts.balance > 
    (SELECT AVG(balance) FROM accounts); 

No join statement can perform the function of this statement, so the optimizer does not transform the statement. Note that complex queries whose subqueries contain group functions such as AVG cannot be transformed into join statements.

Optimizing Statements That Access Views

To optimize a statement that accesses a view, the optimizer chooses one of these alternatives:

Merging the View's Query into the Statement

To merge the view's query into a referencing query block in the accessing statement, the optimizer replaces the name of the view with the names of its base tables in the query block and adds the condition of the view's query's WHERE clause to the accessing query block's WHERE clause.

This optimization applies to select-project-join views, which are views that contain only selections, projections, and joins - that is, views that do not contain set operators, group functions, DISTINCT, GROUP BY, CONNECT BY, and so on (as described in "Mergeable and Unmergeable Views" on page 20-26).

Example:

Consider this view of all employees who work in department 10:

CREATE VIEW emp_10 
  AS SELECT empno, ename, job, mgr, hiredate, sal, comm, deptno 
    FROM emp 
    WHERE deptno = 10; 

Consider this query that accesses the view. The query selects the IDs greater than 7800 of employees who work in department 10:

SELECT empno 
  FROM emp_10 
  WHERE empno > 7800; 

The optimizer transforms the query into the following query that accesses the view's base table:

SELECT empno 
  FROM emp 
  WHERE deptno = 10 
    AND empno > 7800; 

If there are indexes on the DEPTNO or EMPNO columns, the resulting WHERE clause makes them available.

Mergeable and Unmergeable Views

The optimizer can merge a view into a referencing query block when the view has one or more base tables, provided the view does not contain:

When a view contains one of the following structures, it can be merged into a referencing query block only if complex view merging is enabled (as described below):

View merging is not possible for a view that has multiple base tables if it is on the right side of an outer join. If a view on the right side of an outer join has only one base table, however, the optimizer can use complex view merging even if an expression in the view can return a non-null value for a NULL. See "Views in Outer Joins" on page 20-77 for more information.

Complex View Merging

If a view's query contains a GROUP BY clause or DISTINCT operator in the select list, then the optimizer can merge the view's query into the accessing statement only if complex view merging is enabled. Complex merging can also be used to merge an IN subquery into the accessing statement, if the subquery is uncorrelated (see "Example 2: IN Subquery" on page 20-28).

Complex merging is not cost-based - it must be enabled with the initialization parameter COMPLEX_VIEW_MERGING or the MERGE hint, that is, either the COMPLEX_VIEW_MERGING parameter must be set to TRUE or the accessing query block must include the MERGE hint. Without this hint or parameter setting, the optimizer uses another approach (see "Pushing the Predicate into the View" on page 20-28).

Additional Information:

See Oracle8 Tuning for details about the MERGE and NO_MERGE hints.

 

Example 1: View with a GROUP BY Clause

Consider the view AVG_SALARY_VIEW, which contains the average salaries for each department:

CREATE VIEW avg_salary_view AS 
  SELECT deptno, AVG(sal) AS avg_sal_dept, 
    FROM emp 
    GROUP BY deptno; 

If complex view merging is enabled then the optimizer can transform this query, which finds the average salaries of departments in London:

SELECT dept.deptloc, avg_sal_dept 
  FROM dept, avg_salary_view 
  WHERE dept.deptno = avg_salary_view.deptno 
    AND dept.deptloc = 'London'; 

into this query:

SELECT dept.deptloc, AVG(sal) 
  FROM dept, emp 
  WHERE dept.deptno = emp.deptno 
    AND dept.deptloc = 'London' 
  GROUP BY dept.rowid, dept.deptloc; 

The transformed query accesses the view's base table, selecting only the rows of employees who work in London and grouping them by department.

Example 2: IN Subquery

Complex merging can be used for an IN clause with a noncorrelated subquery, as well as for views. Consider the view MIN_SALARY_VIEW, which contains the minimum salaries for each department:

SELECT deptno, MIN(sal) 
  FROM emp 
  GROUP BY deptno; 

If complex merging is enabled then the optimizer can transform this query, which finds all employees who earn the minimum salary for their department in London:

SELECT emp.ename, emp.sal 
  FROM emp, dept 
  WHERE (emp.deptno, emp.sal) IN min_salary_view 
    AND emp.deptno = dept.deptno 
    AND dept.deptloc = 'London'; 

into this query (where E1 and E2 represent the EMP table as it is referenced in the accessing query block and the view's query block, respectively):

SELECT e1.ename, e1.sal 
  FROM emp e1, dept, emp e2 
  WHERE e1.deptno = dept.deptno 
    AND dept.deptloc = 'London' 
    AND e1.deptno = e2.deptno 
  GROUP BY e1.rowid, dept.rowid, e1.ename, e1.sal 
  HAVING e1.sal = MIN(e2.sal); 

Pushing the Predicate into the View

The optimizer can transform a query block that accesses an unmergeable view by pushing the query block's predicates inside the view's query.

Example 1:

Consider the TWO_EMP_TABLES view, which is the union of two employee tables. The view is defined with a compound query that uses the UNION set operator:

CREATE VIEW two_emp_tables 
  (empno, ename, job, mgr, hiredate, sal, comm, deptno) AS 
  SELECT empno, ename, job, mgr, hiredate, sal, comm, deptno 
    FROM emp1 
  UNION  
  SELECT empno, ename, job, mgr, hiredate, sal, comm, deptno 
    FROM emp2; 

Consider this query that accesses the view. The query selects the IDs and names of all employees in either table who work in department 20:

SELECT empno, ename 
  FROM two_emp_tables 
  WHERE deptno = 20; 

Because the view is defined as a compound query, the optimizer cannot merge the view's query into the accessing query block. Instead, the optimizer can transform the accessing statement by pushing its predicate, the WHERE clause condition (DEPTNO = 20), into the view's compound query.

The resulting statement looks like this:

SELECT empno, ename 
  FROM ( SELECT empno, ename, job, mgr, hiredate, sal, comm, deptno 
           FROM emp1 
           WHERE deptno = 20 
         UNION 
         SELECT empno, ename, job, mgr, hiredate, sal, comm, deptno 
           FROM emp2 
           WHERE deptno = 20 ); 

If there is an index on the DEPTNO column, the resulting WHERE clauses make it available.

Figure 20-4, "Accessing a View Defined with the UNION Set Operator", shows the execution plan of the resulting statement.

Figure 20-4 Accessing a View Defined with the UNION Set Operator

To execute this statement, Oracle performs these steps:

Example 2:

Consider the view EMP_GROUP_BY_DEPTNO, which contains the department number, average salary, minimum salary, and maximum salary of all departments that have employees:

CREATE VIEW emp_group_by_deptno 
  AS SELECT deptno, 
         AVG(sal) avg_sal, 
         MIN(sal) min_sal, 
         MAX(sal) max_sal 
    FROM emp 
    GROUP BY deptno; 

Consider this query, which selects the average, minimum, and maximum salaries of department 10 from the EMP_GROUP_BY_DEPTNO view:

SELECT * 
  FROM emp_group_by_deptno 
  WHERE deptno = 10; 

The optimizer transforms the statement by pushing its predicate (the WHERE clause condition) into the view's query. The resulting statement looks like this:

SELECT deptno, 
    AVG(sal) avg_sal, 
    MIN(sal) min_sal, 
    MAX(sal) max_sal, 
  FROM emp 
  WHERE deptno = 10 
  GROUP BY deptno; 

If there is an index on the DEPTNO column, the resulting WHERE clause makes it available.

Figure 20-5, "Accessing a View Defined with a GROUP BY Clause", shows the execution plan for the resulting statement. The execution plan uses an index on the DEPTNO column.

Figure 20-5 Accessing a View Defined with a GROUP BY Clause

To execute this statement, Oracle performs these operations:

Applying a Group Function to the View

The optimizer can transform a query that contains a group function (AVG, COUNT, MAX, MIN, SUM) by applying the function to the view's query.

Example:

Consider a query that accesses the EMP_GROUP_BY_DEPTNO view defined in the previous example. This query derives the averages for the average department salary, the minimum department salary, and the maximum department salary from the employee table:

SELECT AVG(avg_sal), AVG(min_sal), AVG(max_sal) 
    FROM emp_group_by_deptno; 

The optimizer transforms this statement by applying the AVG group function to the select list of the view's query:

SELECT AVG(AVG(sal)), AVG(MIN(sal)), AVG(MAX(sal)) 
    FROM emp 
    GROUP BY deptno; 

Figure 20-6 shows the execution plan of the resulting statement.

Figure 20-6 Applying Group Functions to a View Defined with GROUP BY Clause

To execute this statement, Oracle performs these operations:

Accessing the View's Rows with the Original Statement

The optimizer cannot transform all statements that access views into equivalent statements that access base table(s). For example, if a query accesses a ROWNUM pseudocolumn in a view, the view cannot be merged into the query and the query's predicate cannot be pushed into the view.

To execute a statement that cannot be transformed into one that accesses base tables, Oracle issues the view's query, collects the resulting set of rows, and then accesses this set of rows with the original statement as though it were a table.

Example:

Consider the EMP_GROUP_BY_DEPTNO view defined in the previous section:

CREATE VIEW emp_group_by_deptno 
  AS SELECT deptno, 
        AVG(sal) avg_sal,
        MIN(sal) min_sal, 
        MAX(sal) max_sal 
    FROM emp 
    GROUP BY deptno; 

Consider this query, which accesses the view. The query joins the average, minimum, and maximum salaries from each department represented in this view and to the name and location of the department in the DEPT table:

SELECT emp_group_by_deptno.deptno, avg_sal, min_sal,
  max_sal, dname, loc 
    FROM emp_group_by_deptno, dept 
    WHERE emp_group_by_deptno.deptno = dept.deptno; 

Since there is no equivalent statement that accesses only base tables, the optimizer cannot transform this statement. Instead, the optimizer chooses an execution plan that issues the view's query and then uses the resulting set of rows as it would the rows resulting from a table access.

Figure 20-7, "Joining a View Defined with a GROUP BY Clause to a Table", shows the execution plan for this statement. For more information on how Oracle performs a nested loops join operation, see "Join Operations" on page 20-67.

Figure 20-7 Joining a View Defined with a GROUP BY Clause to a Table

To execute this statement, Oracle performs these operations:

Optimizing Compound Queries

To choose the execution plan for a compound query, the optimizer chooses an execution plan for each of its component queries and then combines the resulting row sources with the union, intersection, or minus operation, depending on the set operator used in the compound query.

Figure 20-8, "Compound Query with UNION ALL Set Operator", shows the execution plan for this statement, which uses the UNION ALL operator to select all occurrences of all parts in either the ORDERS1 table or the ORDERS2 table:

SELECT part FROM orders1 
UNION ALL 
SELECT part FROM orders2; 

Figure 20-8 Compound Query with UNION ALL Set Operator

To execute this statement, Oracle performs these steps:

Figure 20-9, "Compound Query with UNION Set Operator", shows the execution plan for the following statement, which uses the UNION operator to select all parts that appear in either the ORDERS1 or ORDERS2 table:

SELECT part FROM orders1 
UNION 
SELECT part FROM orders2; 

Figure 20-9 Compound Query with UNION Set Operator

This execution plan is identical to the one for the UNION-ALL operator shown in Figure 20-8 "Compound Query with UNION ALL Set Operator", except that in this case Oracle uses the SORT operation to eliminate the duplicates returned by the UNION-ALL operation.

Figure 20-10, "Compound Query with INTERSECT Set Operator", shows the execution plan for this statement, which uses the INTERSECT operator to select only those parts that appear in both the ORDERS1 and ORDERS2 tables:

SELECT part FROM orders1 
INTERSECT 
SELECT part FROM orders2; 

Figure 20-10 Compound Query with INTERSECT Set Operator

To execute this statement, Oracle performs these steps:

Optimizing Distributed Statements

The optimizer chooses execution plans for SQL statements that access data on remote databases in much the same way it chooses executions for statements that access only local data:

When choosing a cost-based execution plan for a distributed statement, the optimizer considers the available indexes on remote databases just as it does indexes on the local database. The optimizer also considers statistics on remote databases for cost-based optimization. Furthermore, the optimizer considers the location of data when estimating the cost of accessing it. For example, a full scan of a remote table has a greater estimated cost than a full scan of an identical local table.

For a rule-based execution plan, the optimizer does not consider indexes on remote tables.

Choosing an Optimization Approach and Goal

The optimizer's behavior when choosing an optimization approach and goal for a SQL statement is affected by these factors:

The OPTIMIZER_MODE Initialization Parameter

The OPTIMIZER_MODE initialization parameter establishes the default behavior for choosing an optimization approach for the instance. This parameter can have these values:

CHOOSE  

The optimizer chooses between a cost-based approach and a rule-based approach based on whether the statistics are available for the cost-based approach. If the data dictionary contains statistics for at least one of the accessed tables, the optimizer uses a cost-based approach and optimizes with a goal of best throughput. If the data dictionary contains no statistics for any of the accessed tables, the optimizer uses a rule-based approach. This is the default value for the parameter.  

ALL_ROWS  

The optimizer uses a cost-based approach for all SQL statements in the session regardless of the presence of statistics and optimizes with a goal of best throughput (minimum resource use to complete the entire statement).  

FIRST_ROWS  

The optimizer uses a cost-based approach for all SQL statements in the session regardless of the presence of statistics and optimizes with a goal of best response time (minimum resource use to return the first row of the result set).  

RULE  

The optimizer chooses a rule-based approach for all SQL statements issued to the instance regardless of the presence of statistics.  

If the optimizer uses the cost-based approach for a SQL statement and some tables accessed by the statement have no statistics, the optimizer uses internal information (such as the number of data blocks allocated to these tables) to estimate other statistics for these tables.

Statistics in the Data Dictionary

Oracle stores statistics about columns, tables, clusters, indexes, and partitions in the data dictionary for use by the cost-based optimizer (see "Statistics for the Cost-Based Approach" on page 20-7).

Two options of the ANALYZE command generate statistics:

The OPTIMIZER_GOAL Parameter of the ALTER SESSION Command

The OPTIMIZER_GOAL parameter of the ALTER SESSION command can override the optimization approach and goal established by the OPTIMIZER_MODE initialization parameter for an individual session.

The value of this parameter affects the optimization of SQL statements issued by stored procedures and functions called during the session, but it does not affect the optimization of recursive SQL statements that Oracle issues during the session. The optimization approach for recursive SQL statements is affected only by the value of the OPTIMIZER_MODE initialization parameter.

The OPTIMIZER_GOAL parameter can have these values:

CHOOSE  

The optimizer chooses between a cost-based approach and a rule-based approach based on whether statistics are available for the cost-based approach. If the data dictionary contains statistics for at least one of the accessed tables, the optimizer uses a cost-based approach and optimizes with a goal of best throughput. If the data dictionary contains no statistics for any of the accessed tables, the optimizer uses a rule-based approach.  

ALL_ROWS  

The optimizer uses a cost-based approach for all SQL statements in the session regardless of the presence of statistics and optimizes with a goal of best throughput (minimum resource use to complete the entire statement).  

FIRST_ROWS  

The optimizer uses a cost-based approach for all SQL statements in the session regardless of the presence of statistics and optimizes with a goal of best response time (minimum resource use to return the first row of the result set).  

RULE  

The optimizer chooses a rule-based approach for all SQL statements issued to the Oracle instance regardless of the presence of statistics.  

The FIRST_ROWS, ALL_ROWS, CHOOSE, and RULE Hints

A FIRST_ROWS, ALL_ROWS, CHOOSE, or RULE hint in an individual SQL statement can override the effects of both the OPTIMIZER_MODE initialization parameter and the OPTIMIZER_GOAL parameter of the ALTER SESSION command.

Additional Information:

See Oracle8 Tuning for information on how to use hints.

 

PL/SQL and the Optimizer Goal

The optimizer goal applies only to queries submitted directly, not queries submitted from within PL/SQL.

You can use hints to determine the access path for SQL statements submitted from within PL/SQL.

Choosing Access Paths

One of the most important choices the optimizer makes when formulating an execution plan is how to retrieve data from the database. For any row in any table accessed by a SQL statement, there may be many access paths by which that row can be located and retrieved. The optimizer chooses one of them.

This section discusses:

Access Methods

This section describes basic methods by which Oracle can access data.

Full Table Scans

A full table scan retrieves rows from a table. To perform a full table scan, Oracle reads all rows in the table, examining each row to determine whether it satisfies the statement's WHERE clause. Oracle reads every data block allocated to the table sequentially, so a full table scan can be performed very efficiently using multiblock reads. Oracle reads each data block only once.

Table Access by ROWID

A table access by ROWID also retrieves rows from a table. The ROWID of a row specifies the datafile and data block containing the row and the location of the row in that block. Locating a row by its ROWID is the fastest way for Oracle to find a single row.

To access a table by ROWID, Oracle first obtains the ROWIDs of the selected rows, either from the statement's WHERE clause or through an index scan of one or more of the table's indexes. Oracle then locates each selected row in the table based on its ROWID.

Cluster Scans

From a table stored in an indexed cluster, a cluster scan retrieves rows that have the same cluster key value. In an indexed cluster, all rows with the same cluster key value are stored in the same data blocks. To perform a cluster scan, Oracle first obtains the ROWID of one of the selected rows by scanning the cluster index. Oracle then locates the rows based on this ROWID.

Hash Scans

Oracle can use a hash scan to locate rows in a hash cluster based on a hash value. In a hash cluster, all rows with the same hash value are stored in the same data blocks. To perform a hash scan, Oracle first obtains the hash value by applying a hash function to a cluster key value specified by the statement. Oracle then scans the data blocks containing rows with that hash value.

Index Scans

An index scan retrieves data from an index based on the value of one or more columns of the index. To perform an index scan, Oracle searches the index for the indexed column values accessed by the statement. If the statement accesses only columns of the index, Oracle reads the indexed column values directly from the index, rather than from the table.

The index contains not only the indexed value, but also the ROWIDs of rows in the table having that value. Therefore, if the statement accesses other columns in addition to the indexed columns, Oracle can find the rows in the table with a table access by ROWID or a cluster scan.

An index scan can be one of these types:

unique scan  

A unique scan of an index returns only a single ROWID. Oracle performs a unique scan only in cases in which a single ROWID is required, rather than many ROWIDs. For example, Oracle performs a unique scan if there is a UNIQUE or a PRIMARY KEY constraint that guarantees that the statement accesses only a single row.  

range scan  

A range scan of an index can return zero or more ROWIDs depending on how many rows the statement accesses.  

full scan  

Full index scan is available if a predicate references one of the columns in the index. The predicate does not have to be an index driver. Full scan is also available when there is no predicate if all of the columns in the table referenced in the query are included in the index and at least one of the index columns is not nullable. Full scan can be used to eliminate a sort operation. It reads the blocks singly.  

fast full scan  

Fast full index scan is an alternative to a full table scan when the index contains all the columns that are needed for the query and at least one column in the index key has the NOT NULL constraint. Fast full scan accesses the data in the index itself, without accessing the table. It cannot be used to eliminate a sort operation. It reads the entire index using multiblock reads (unlike a full index scan) and can be parallelized.

Fast full scan is available only with cost-based optimization. You can specify it with the initialization parameter FAST_FULL_SCAN_ENABLED or the INDEX_FFS hint.  

bitmap  

Bitmap indexes use a bitmap for key values and a mapping function that converts each bit position to a ROWID. Bitmaps efficiently merge indexes that correspond to several conditions in a WHERE clause, using Boolean operations to resolve AND and OR conditions. Bitmap access is available only with cost-based optimization. See "Bitmap Indexes" on page 8-23.  


Attention:

Bitmap indexes are available only if you have purchased the Oracle8 Enterprise Edition. See Getting to Know Oracle8 and the Oracle8 Enterprise Edition for more information.

 

Access Paths

Table 20-1 lists the data access paths. The optimizer can only choose to use a particular access path for a table if the statement contains a WHERE clause condition or other construct that makes that access path available.

Each of the following sections describes an access path and discusses:

Path 1: Single Row by ROWID

This access path is available only if the statement's WHERE clause identifies the selected rows by ROWID or with the CURRENT OF CURSOR embedded SQL syntax supported by the Oracle Precompilers. To execute the statement, Oracle accesses the table by ROWID.

Example:

This access path is available in the following statement:

SELECT * FROM emp WHERE ROWID = 'AAAA7bAA5AAAA1UAAA'; 

The EXPLAIN PLAN output for this statement might look like this:

OPERATION                 OPTIONS        OBJECT_NAME 
----------------------------------------------------- 
SELECT STATEMENT 
    TABLE ACCESS          BY ROWID       EMP 

Path 2: Single Row by Cluster Join

This access path is available for statements that join tables stored in the same cluster if both of these conditions are true:

These conditions must be combined with AND operators. To execute the statement, Oracle performs a nested loops operation. (For information on the nested loops operation, see "Join Operations" on page 20-67.)

Example:

This access path is available for the following statement in which the EMP and DEPT tables are clustered on the DEPTNO column and the EMPNO column is the primary key of the EMP table:

SELECT * 
  FROM emp, dept 
  WHERE emp.deptno = dept.deptno 
    AND emp.empno = 7900; 

The EXPLAIN PLAN output for this statement might look like this:

OPERATION                 OPTIONS        OBJECT_NAME 
----------------------------------------------------- 
SELECT STATEMENT 
    NESTED LOOPS 
       TABLE ACCESS       BY ROWID       EMP 
       INDEX              UNIQUE SCAN    PK_EMP    
       TABLE ACCESS       CLUSTER        DEPT 

PK_EMP is the name of an index that enforces the primary key.

Path 3: Single Row by Hash Cluster Key with Unique or Primary Key

This access path is available if both of these conditions are true:

To execute the statement, Oracle applies the cluster's hash function to the hash cluster key value specified in the statement to obtain a hash value. Oracle then uses the hash value to perform a hash scan on the table.

Example:

This access path is available in the following statement in which the ORDERS and LINE_ITEMS tables are stored in a hash cluster, and the ORDERNO column is both the cluster key and the primary key of the ORDERS table:

SELECT * 
    FROM orders 
    WHERE orderno = 65118968; 

The EXPLAIN PLAN output for this statement might look like this:

OPERATION                 OPTIONS        OBJECT_NAME 
----------------------------------------------------- 
SELECT STATEMENT 
    TABLE ACCESS          HASH           ORDERS 

Path 4: Single Row by Unique or Primary Key

This access path is available if the statement's WHERE clause uses all columns of a unique or primary key in equality conditions. For composite keys, the equality conditions must be combined with AND operators. To execute the statement, Oracle performs a unique scan on the index on the unique or primary key to retrieve a single ROWID and then accesses the table by that ROWID.

Example:

This access path is available in the following statement in which the EMPNO column is the primary key of the EMP table:

SELECT * 
    FROM emp 
    WHERE empno = 7900; 

The EXPLAIN PLAN output for this statement might look like this:

OPERATION                 OPTIONS        OBJECT_NAME 
----------------------------------------------------- 
SELECT STATEMENT 
  TABLE ACCESS            BY ROWID       EMP 
    INDEX                 UNIQUE SCAN    PK_EMP 

PK_EMP is the name of the index that enforces the primary key.

Path 5: Clustered Join

This access path is available for statements that join tables stored in the same cluster if the statement's WHERE clause contains conditions that equate each column of the cluster key in one table with the corresponding column in the other table. For a composite cluster key, the equality conditions must be combined with AND operators. To execute the statement, Oracle performs a nested loops operation. (For information on nested loops operations, see "Join Operations" on page 20-67.)

Example:

This access path is available in the following statement in which the EMP and DEPT tables are clustered on the DEPTNO column:

SELECT * 
    FROM emp, dept 
    WHERE emp.deptno = dept.deptno; 

The EXPLAIN PLAN output for this statement might look like this:

OPERATION                 OPTIONS        OBJECT_NAME 
----------------------------------------------------- 
SELECT STATEMENT 
  NESTED LOOPS 
    TABLE ACCESS          FULL           DEPT 
    TABLE ACCESS          CLUSTER        EMP 

Path 6: Hash Cluster Key

This access path is available if the statement's WHERE clause uses all the columns of a hash cluster key in equality conditions. For a composite cluster key, the equality conditions must be combined with AND operators. To execute the statement, Oracle applies the cluster's hash function to the hash cluster key value specified in the statement to obtain a hash value. Oracle then uses this hash value to perform a hash scan on the table.

Example:

This access path is available for the following statement in which the ORDERS and LINE_ITEMS tables are stored in a hash cluster and the ORDERNO column is the cluster key:

SELECT * 
    FROM line_items 
    WHERE orderno = 65118968; 

The EXPLAIN PLAN output for this statement might look like this:

OPERATION                 OPTIONS        OBJECT_NAME 
----------------------------------------------------- 
SELECT STATEMENT 
    TABLE ACCESS          HASH           LINE_ITEMS 

Path 7: Indexed Cluster Key

This access path is available if the statement's WHERE clause uses all the columns of an indexed cluster key in equality conditions. For a composite cluster key, the equality conditions must be combined with AND operators. To execute the statement, Oracle performs a unique scan on the cluster index to retrieve the ROWID of one row with the specified cluster key value. Oracle then uses that ROWID to access the table with a cluster scan. Since all rows with the same cluster key value are stored together, the cluster scan requires only a single ROWID to find them all.

Example:

This access path is available in the following statement in which the EMP table is stored in an indexed cluster and the DEPTNO column is the cluster key:

SELECT *  FROM emp 
  WHERE deptno = 10; 

The EXPLAIN PLAN output for this statement might look like this:

OPERATION                 OPTIONS        OBJECT_NAME 
----------------------------------------------------- 
SELECT STATEMENT 
  TABLE ACCESS            CLUSTER        EMP 
    INDEX                 UNIQUE SCAN    PERS_INDEX 

PERS_INDEX is the name of the cluster index.

Path 8: Composite Index

This access path is available if the statement's WHERE clause uses all columns of a composite index in equality conditions combined with AND operators. To execute the statement, Oracle performs a range scan on the index to retrieve ROWIDs of the selected rows and then accesses the table by those ROWIDs.

Example:

This access path is available in the following statement in which there is a composite index on the JOB and DEPTNO columns:

SELECT * 
    FROM emp 
    WHERE job = 'CLERK' 
      AND deptno = 30; 

The EXPLAIN PLAN output for this statement might look like this:

OPERATION                 OPTIONS        OBJECT_NAME 
----------------------------------------------------- 
SELECT STATEMENT 
  TABLE ACCESS            BY ROWID       EMP 
    INDEX                 RANGE SCAN     JOB_DEPTNO_INDEX 

JOB_DEPTNO_INDEX is the name of the composite index on the JOB and DEPTNO columns.

Path 9: Single-Column Indexes

This access path is available if the statement's WHERE clause uses the columns of one or more single-column indexes in equality conditions. For multiple single-column indexes, the conditions must be combined with AND operators.

If the WHERE clause uses the column of only one index, Oracle executes the statement by performing a range scan on the index to retrieve the ROWIDs of the selected rows and then accessing the table by these ROWIDs.

Example:

This access path is available in the following statement in which there is an index on the JOB column of the EMP table:

SELECT * 
    FROM emp 
    WHERE job = 'ANALYST'; 

The EXPLAIN PLAN output for this statement might look like this:

OPERATION                 OPTIONS        OBJECT_NAME 
----------------------------------------------------- 
SELECT STATEMENT 
  TABLE ACCESS            BY ROWID       EMP 
    INDEX                 RANGE SCAN     JOB_INDEX 

JOB_INDEX is the index on EMP.JOB.

If the WHERE clauses uses columns of many single-column indexes, Oracle executes the statement by performing a range scan on each index to retrieve the ROWIDs of the rows that satisfy each condition. Oracle then merges the sets of ROWIDs to obtain a set of ROWIDs of rows that satisfy all conditions. Oracle then accesses the table using these ROWIDs.

Oracle can merge up to five indexes. If the WHERE clause uses columns of more than five single-column indexes, Oracle merges five of them, accesses the table by ROWID, and then tests the resulting rows to determine whether they satisfy the remaining conditions before returning them.

Example:

This access path is available in the following statement in which there are indexes on both the JOB and DEPTNO columns of the EMP table:

SELECT * 
    FROM emp 
    WHERE job = 'ANALYST' 
      AND deptno = 20; 

The EXPLAIN PLAN output for this statement might look like this:

OPERATION                 OPTIONS        OBJECT_NAME 
----------------------------------------------------- 
SELECT STATEMENT 
  TABLE ACCESS            BY ROWID       EMP 
    AND-EQUAL 
      INDEX               RANGE SCAN     JOB_INDEX 
      INDEX               RANGE SCAN     DEPTNO_INDEX 

The AND-EQUAL operation merges the ROWIDs obtained by the scans of the JOB_INDEX and the DEPTNO_INDEX, resulting in a set of ROWIDs of rows that satisfy the query.

Path 10: Bounded Range Search on Indexed Columns

This access path is available if the statement's WHERE clause contains a condition that uses either the column of a single-column index or one or more columns that make up a leading portion of a composite index:

column = expr 

column >[=] expr AND column <[=] expr 

column BETWEEN expr AND expr 

column LIKE 'c%' 

Each of these conditions specifies a bounded range of indexed values that are accessed by the statement. The range is said to be bounded because the conditions specify both its least value and its greatest value. To execute such a statement, Oracle performs a range scan on the index and then accesses the table by ROWID.

This access path is not available if the expression expr references the indexed column.

Example:

This access path is available in this statement in which there is an index on the SAL column of the EMP table:

SELECT * 
    FROM emp 
    WHERE sal BETWEEN 2000 AND 3000; 

The EXPLAIN PLAN output for this statement might look like this:

OPERATION                 OPTIONS        OBJECT_NAME 
----------------------------------------------------- 
SELECT STATEMENT 
  TABLE ACCESS            BY ROWID       EMP 
    INDEX                 RANGE SCAN     SAL_INDEX 

SAL_INDEX is the name of the index on EMP.SAL.

Example:

This access path is also available in the following statement in which there is an index on the ENAME column of the EMP table:

SELECT * 
    FROM emp 
    WHERE ename LIKE 'S%'; 

Path 11: Unbounded Range Search on Indexed Columns

This access path is available if the statement's WHERE clause contains one of these conditions that use either the column of a single-column index or one or more columns of a leading portion of a composite index:

WHERE column >[=] expr 

WHERE column <[=] expr 

Each of these conditions specifies an unbounded range of index values accessed by the statement. The range is said to be unbounded because the condition specifies either its least value or its greatest value, but not both. To execute such a statement, Oracle performs a range scan on the index and then accesses the table by ROWID.

Example:

This access path is available in the following statement in which there is an index on the SAL column of the EMP table:

SELECT * 
    FROM emp 
    WHERE sal > 2000; 

The EXPLAIN PLAN output for this statement might look like this:

OPERATION                 OPTIONS        OBJECT_NAME 
----------------------------------------------------- 
SELECT STATEMENT 
  TABLE ACCESS            BY ROWID       EMP 
    INDEX                 RANGE SCAN     SAL_INDEX 

Example:

This access path is available in the following statement in which there is a composite index on the ORDER and LINE columns of the LINE_ITEMS table:

SELECT * 
    FROM line_items 
    WHERE order > 65118968; 

The access path is available because the WHERE clause uses the ORDER column, a leading portion of the index.

Example:

This access path is not available in the following statement in which there is an index on the ORDER and LINE columns:

SELECT * 
    FROM line_items 
    WHERE line < 4; 

The access path is not available because the WHERE clause only uses the LINE column, which is not a leading portion of the index.

Path 12: Sort-Merge Join

This access path is available for statements that join tables that are not stored together in a cluster if the statement's WHERE clause uses columns from each table in equality conditions. To execute such a statement, Oracle uses a sort-merge operation. Oracle can also use a nested loops operation to execute a join statement. (For information on these operations, see "Optimizing Join Statements" on page 20-67.)

Example:

This access path is available for the following statement in which the EMP and DEPT tables are not stored in the same cluster:

SELECT * 
    FROM emp, dept 
    WHERE emp.deptno = dept.deptno; 

The EXPLAIN PLAN output for this statement might look like this:

OPERATION                 OPTIONS        OBJECT_NAME 
----------------------------------------------------- 
SELECT STATEMENT 
  MERGE JOIN 
    SORT                  JOIN 
      TABLE ACCESS        FULL           EMP 
    SORT                  JOIN 
  TABLE ACCESS            FULL           DEPT 

Path 13: MAX or MIN of Indexed Column

This access path is available for a SELECT statement for which all of these conditions are true:

To execute the query, Oracle performs a range scan of the index to find the maximum or minimum indexed value. Since only this value is selected, Oracle need not access the table after scanning the index.

Example:

This access path is available for the following statement in which there is an index on the SAL column of the EMP table:

SELECT MAX(sal) FROM emp; 

The EXPLAIN PLAN output for this statement might look like this:

OPERATION                 OPTIONS        OBJECT_NAME 
----------------------------------------------------- 
SELECT STATEMENT 
  AGGREGATE               GROUP BY 
    INDEX                 RANGE SCAN     SAL_INDEX 

Path 14: ORDER BY on Indexed Column

This access path is available for a SELECT statement for which all of these conditions are true:

To execute the query, Oracle performs a range scan of the index to retrieve the ROWIDs of the selected rows in sorted order. Oracle then accesses the table by these ROWIDs.

Example:

This access path is available for the following statement in which there is a primary key on the EMPNO column of the EMP table:

SELECT * 
    FROM emp 
    ORDER BY empno; 

The EXPLAIN PLAN output for this statement might look like this:

OPERATION                 OPTIONS        OBJECT_NAME 
----------------------------------------------------- 
SELECT STATEMENT 
  TABLE ACCESS            BY ROWID       EMP 
    INDEX                 RANGE SCAN     PK_EMP 

PK_EMP is the name of the index that enforces the primary key. The primary key ensures that the column does not contain nulls.

Path 15: Full Table Scan

This access path is available for any SQL statement, regardless of its WHERE clause conditions.

Note that these conditions make index access paths unavailable:

where column1 and column2 are in the same table.

regardless of whether column is indexed.

where expr is an expression that operates on a column with an operator or function, regardless of whether the column is indexed.

Any SQL statement that contains only these constructs and no others that make index access paths available must use full table scans.

Example:

This statement uses a full table scan to access the EMP table:

SELECT * 
    FROM emp; 

The EXPLAIN PLAN output for this statement might look like this:

OPERATION                 OPTIONS        OBJECT_NAME 
----------------------------------------------------- 
SELECT STATEMENT 
  TABLE ACCESS            FULL           EMP 

Choosing Among Access Paths

This section describes how the optimizer chooses among available access paths:

Choosing Among Access Paths with the Cost-Based Approach

With the cost-based approach, the optimizer chooses an access path based on these factors:

To choose an access path, the optimizer first determines which access paths are available by examining the conditions in the statement's WHERE clause. The optimizer then generates a set of possible execution plans using available access paths and estimates the cost of each plan using the statistics for the index, columns, and tables accessible to the statement. The optimizer then chooses the execution plan with the lowest estimated cost.

The optimizer's choice among available access paths can be overridden with hints.

Additional Information:

See Oracle8 Tuning for information about hints in SQL statements.

 

To choose among available access paths, the optimizer considers these factors:

Example:

Consider this query, which uses an equality condition in its WHERE clause to select all employees named Jackson:

SELECT * 
  FROM emp 
  WHERE ename = 'JACKSON'; 

If the ENAME column is a unique or primary key, the optimizer determines that there is only one employee named Jackson, and the query returns only one row. In this case, the query is very selective, and the optimizer is most likely to access the table using a unique scan on the index that enforces the unique or primary key (access path 4).

Example:

Consider again the query in the previous example. If the ENAME column is not a unique or primary key, the optimizer can use these statistics to estimate the query's selectivity:

By dividing the number of rows in the EMP table by the number of distinct values in the ENAME column, the optimizer estimates what percentage of employees have the same name. By assuming that the ENAME values are uniformly distributed, the optimizer uses this percentage as the estimated selectivity of the query.

Example:

Consider this query, which selects all employees with employee ID numbers less than 7500:

SELECT * 
  FROM emp 
  WHERE empno < 7500; 

To estimate the selectivity of the query, the optimizer uses the boundary value of 7500 in the WHERE clause condition and the values of the HIGH_VALUE and LOW_VALUE statistics for the EMPNO column if available. These statistics can be found in the USER_TAB_COLUMNS view. The optimizer assumes that EMPNO values are evenly distributed in the range between the lowest value and highest value. The optimizer then determines what percentage of this range is less than the value 7500 and uses this value as the estimated selectivity of the query.

Example:

Consider this query, which uses a bind variable rather than a literal value for the boundary value in the WHERE clause condition:

SELECT * 
  FROM emp 
  WHERE empno < :e1; 

The optimizer does not know the value of the bind variable E1. Indeed, the value of E1 may be different for each execution of the query. For this reason, the optimizer cannot use the means described in the previous example to determine selectivity of this query. In this case, the optimizer heuristically guesses a small value for the selectivity of the column (because it is indexed). The optimizer makes this assumption whenever a bind variable is used as a boundary value in a condition with one of the operators <, >, <=, or >=.

The optimizer's treatment of bind variables can cause it to choose different execution plans for SQL statements that differ only in the use of bind variables rather than constants. In one case in which this difference may be especially apparent, the optimizer may choose different execution plans for an embedded SQL statement with a bind variable in an Oracle Precompiler program and the same SQL statement with a constant in SQL*Plus.

Example:

Consider this query, which uses two bind variables as boundary values in the condition with the BETWEEN operator:

SELECT * 
  FROM emp 
  WHERE empno BETWEEN :low_e AND :high_e; 

The optimizer decomposes the BETWEEN condition into these two conditions:

empno >= :low_e 
empno <= :high_e 

The optimizer heuristically estimates a small selectiviy for indexed columns in order to favor the use of the index.

Example:

Consider this query, which uses the BETWEEN operator to select all employees with employee ID numbers between 7500 and 7800:

SELECT * 
  FROM emp 
  WHERE empno BETWEEN 7500 AND 7800; 

To determine the selectivity of this query, the optimizer decomposes the WHERE clause condition into these two conditions:

empno >= 7500 
empno <= 7800 

The optimizer estimates the individual selectivity of each condition using the means described in a previous example. The optimizer then uses these selectivities (S1 and S2) and the absolute value function (ABS) in this formula to estimate the selectivity (S) of the BETWEEN condition:

S = ABS( S1 + S2 - 1 )

Choosing Among Access Paths with the Rule-Based Approach

With the rule-based approach, the optimizer chooses whether to use an access path based on these factors:

To choose an access path, the optimizer first examines the conditions in the statement's WHERE clause to determine which access paths are available. The optimizer then chooses the most highly ranked available access path.

Note that the full table scan is the lowest ranked access path on the list. This means that the rule-based approach always chooses an access path that uses an index if one is available, even if a full table scan might execute faster.

The order of the conditions in the WHERE clause does not normally affect the optimizer's choice among access paths.

Example:

Consider this SQL statement, which selects the employee numbers of all employees in the EMP table with an ENAME value of 'CHUNG' and with a SAL value greater than 2000:

SELECT empno 
  FROM emp 
  WHERE ename = 'CHUNG' 
    AND sal > 2000; 

Consider also that the EMP table has these integrity constraints and indexes:

Based on the conditions in the WHERE clause of the SQL statement, the integrity constraints, and the indexes, these access paths are available:

Note that the PK_EMPNO index does not make the single row by primary key access path available because the indexed column does not appear in a condition in the WHERE clause.

Using the rule-based approach, the optimizer chooses the access path that uses the ENAME_IND index to execute this statement. The optimizer chooses this path because it is the most highly ranked path available.

Optimizing Join Statements

To choose an execution plan for a join statement, the optimizer must make these interrelated decisions:

access paths  

As for simple statements, the optimizer must choose an access path to retrieve data from each table in the join statement. (See "Choosing Access Paths" on page 20-43.)  

join operations  

To join each pair of row sources, Oracle must perform one of these operations:

  • nested loops
  • sort-merge
  • cluster
  • hash join (not available with rule-based optimization)
 

join order  

To execute a statement that joins more than two tables, Oracle joins two of the tables, and then joins the resulting row source to the next table. This process is continued until all tables are joined into the result.  

Join Operations

The optimizer can use the following operations to join two row sources:

Nested Loops Join

To perform a nested loops join, Oracle follows these steps:

  1. The optimizer chooses one of the tables as the outer table, or the driving table. The other table is called the inner table.
  2. For each row in the outer table, Oracle finds all rows in the inner table that satisfy the join condition.
  3. Oracle combines the data in each pair of rows that satisfy the join condition and returns the resulting rows.

Figure 20-11 shows the execution plan for this statement using a nested loops join:

SELECT * 
  FROM emp, dept 
  WHERE emp.deptno = dept.deptno; 

Figure 20-11 Nested Loops Join

To execute this statement, Oracle performs these steps:

Sort-Merge Join

Oracle can only perform a sort-merge join for an equijoin. To perform a sort-merge join, Oracle follows these steps:

  1. Oracle sorts each row source to be joined if they have not been sorted already by a previous operation. The rows are sorted on the values of the columns used in the join condition.
  2. Oracle merges the two sources so that each pair of rows, one from each source, that contain matching values for the columns used in the join condition are combined and returned as the resulting row source.

Figure 20-12 shows the execution plan for this statement using a sort-merge join:

SELECT * 
  FROM emp, dept 
  WHERE emp.deptno = dept.deptno; 

Figure 20-12 Sort-Merge Join

To execute this statement, Oracle performs these steps:

Cluster Join

Oracle can perform a cluster join only for an equijoin that equates the cluster key columns of two tables in the same cluster. In a cluster, rows from both tables with the same cluster key values are stored in the same blocks, so Oracle only accesses those blocks.

Additional Information:

Oracle8 Tuning provides guidelines for deciding which tables to cluster for best performance.

 

Figure 20-13 shows the execution plan for this statement in which the EMP and DEPT tables are stored together in the same cluster:

SELECT * 
  FROM emp, dept 
  WHERE emp.deptno = dept.deptno; 

Figure 20-13 Cluster Join

To execute this statement, Oracle performs these steps:

A cluster join is nothing more than a nested loops join involving two tables that are stored together in a cluster. Since each row from the DEPT table is stored in the same data blocks as the matching rows in the EMP table, Oracle can access matching rows most efficiently.

Hash Join

Oracle can only perform a hash join for an equijoin. Hash join is not available with rule-based optimization. You must enable hash join optimization, using the initialization parameter HASH_JOIN_ENABLED (which can be set with the ALTER SESSION command) or the USE_HASH hint.

To perform a hash join, Oracle follows these steps:

  1. Oracle performs a full table scan on each of the tables and splits each into as many partitions as possible based on the available memory.
  2. Oracle builds a hash table from one of the partitions (if possible, Oracle will select a partition that fits into available memory). Oracle then uses the corresponding partition in the other table to probe the hash table. All partition pairs that do not fit into memory are placed onto disk.
  3. For each pair of partitions (one from each table), Oracle uses the smaller one to build a hash table and the larger one to probe the hash table.

Figure 20-14 shows the execution plan for this statement using a hash join:

SELECT * 
  FROM emp, dept 
  WHERE emp.deptno = dept.deptno; 

Figure 20-14 Hash Join

To execute this statement, Oracle performs these steps:

The initialization parameter HASH_AREA_SIZE controls the amount of memory used for hash join operations and the initialization parameter HASH_MULTIBLOCK_IO_COUNT controls the number of blocks a hash join operation should read and write concurrently.

Additional Information:

See Oracle8 Tuning for more information about these initialization parameters and the USE_HASH hint.

 

Choosing Execution Plans for Join Statements

This section describes how the optimizer chooses an execution plan for a join statement:

Note these considerations that apply to the cost-based and rule-based approaches:

Choosing Execution Plans for Joins with the Cost-Based Approach

With the cost-based approach, the optimizer generates a set of execution plans based on the possible join orders, join operations, and available access paths. The optimizer then estimates the cost of each plan and chooses the one with the lowest cost. The optimizer estimates costs in these ways:

With the cost-based approach, the optimizer's choice of join orders can be overridden with the ORDERED hint. If the ORDERED hint specifies a join order that violates the rule for outer join, the optimizer ignores the hint and chooses the order. You can also override the optimizer's choice of join operations with hints.

Additional Information:

See Oracle8 Tuning for information on using hints.

 

Choosing Execution Plans for Joins with the Rule-Based Approach

With the rule-based approach, the optimizer follows these steps to choose an execution plan for a statement that joins R tables:

  1. The optimizer generates a set of R join orders, each with a different table as the first table. The optimizer generates each potential join order using this algorithm:
    1. To fill each position in the join order, the optimizer chooses the table with the most highly ranked available access path according to the ranks for access paths in Table 20-1 "Access Paths" on page 20-47. The optimizer repeats this step to fill each subsequent position in the join order.
    2. For each table in the join order, the optimizer also chooses the operation with which to join the table to the previous table or row source in the order. The optimizer does this by "ranking" the sort-merge operation as access path 12 and applying these rules:
    • If the access path for the chosen table is ranked 11 or better, the optimizer chooses a nested loops operation using the previous table or row source in the join order as the outer table.
    • If the access path for the table is ranked lower than 12, and there is an equijoin condition between the chosen table and the previous table or row source in join order, the optimizer chooses a sort-merge operation.
    • If the access path for the chosen table is ranked lower than 12, and there is not an equijoin condition, the optimizer chooses a nested loops operation with the previous table or row source in the join order as the outer table.
  2. The optimizer then chooses among the resulting set of execution plans. The goal of the optimizer's choice is to maximize the number of nested loops join operations in which the inner table is accessed using an index scan. Since a nested loops join involves accessing the inner table many times, an index on the inner table can greatly improve the performance of a nested loops join.

    Usually, the optimizer does not consider the order in which tables appear in the FROM clause when choosing an execution plan. The optimizer makes this choice by applying the following rules in order:

    1. The optimizer chooses the execution plan with the fewest nested-loops operations in which the inner table is accessed with a full table scan.
    2. If there is a tie, the optimizer chooses the execution plan with the fewest sort-merge operations.
    3. If there is still a tie, the optimizer chooses the execution plan for which the first table in the join order has the most highly ranked access path:
    • If there is a tie among multiple plans whose first tables are accessed by the single-column indexes access path, the optimizer chooses the plan whose first table is accessed with the most merged indexes.
    • If there is a tie among multiple plans whose first tables are accessed by bounded range scans, the optimizer chooses the plan whose first table is accessed with the greatest number of leading columns of the composite index.
    • If there is still a tie, the optimizer chooses the execution plan for which the first table appears later in the query's FROM clause.

Views in Outer Joins

For a view that is on the right side of an outer join, the optimzer can use one of two methods, depending on how many base tables the view accesses:

Merging a View That Has a Single Base Table

A view that has one base table and is on the right side of an outer join can be merged into the query block of an accessing statement. (See "Merging the View's Query into the Statement" on page 20-25.) View merging is possible even if an expression in the view can return a non-null value for a NULL.

Example:

Consider the view NAME_VIEW, which concatenates first and last names from the EMP table:

CREATE VIEW name_view 
  AS SELECT emp.firstname || emp.lastname AS emp_fullname, emp.deptno 
       FROM emp; 

and consider this outer join statement, which finds the names of all employees in London and their departments, as well as any departments that have no employees:

SELECT dept.deptno, name_view.emp_fullname 
  FROM emp_fullname, dept 
  WHERE dept.deptno = name_view.deptno(+) 
    AND dept.deptloc = 'London'; 

The optimizer merges the view's query into the outer join statement. The resulting statement looks like this:

SELECT dept.deptno, DECODE(emp.rowid, NULL, NULL, emp.firstname || emp.lastname) 
  FROM emp, dept 
  WHERE dept.deptno = emp.deptno(+) 
    AND dept.deptloc = 'London'; 

The transformed statement selects only the employees who work in London.

Pushing the Join Predicate into a View That Has Multiple Base Tables

For a view with multiple base tables on the right side of an outer join, the optimizer can push the join predicate into the view (see "Pushing the Predicate into the View" on page 20-28) if the initialization parameter PUSH_JOIN_PREDICATE is set to TRUE or the accessing query contains the PUSH_JOIN_PRED hint.

Pushing a join predicate is a cost-based transformation that can enable more efficient access path and join methods, such as transforming hash joins into nested loop joins, and full table scans to index scans.

Additional Information:

See Oracle8 Tuning for information about optimizer hints.

 

Example:

Consider the view LONDON_EMP, which selects the employees who work in London:

CREATE VIEW london_emp 
  AS SELECT emp.ename 
       FROM emp, dept 
       WHERE emp.deptno = dept.deptno 
         AND dept.deptloc = 'London'; 

and consider this outer join statement, which finds the engineers and accountants working in London who received bonuses:

SELECT bonus.job, london_emp.ename 
  FROM bonus, london_emp 
  WHERE bonus.job IN ('engineer', 'accountant') 
    AND bonus.ename = london_emp.ename(+); 

The optimizer pushes the outer join predicate into the view. The resulting statement (which does not conform to standard SQL syntax) looks like this:

SELECT bonus.job, london_emp.ename 
  FROM bonus, (SELECT emp.ename FROM emp, dept 
                   WHERE bonus.ename = london_emp.ename(+) 
                     AND emp.deptno = dept.deptno 
                     AND dept.deptloc = 'London') 
  WHERE bonus.job IN ('engineer', 'accountant'); 

Optimizing Anti-Joins and Semi-Joins

An anti-join returns rows from the left side of the predicate for which there is no corresponding row on the right side of the predicate. That is, it returns rows that fail to match (NOT IN) the subquery on the right side. For example, an anti-join can select a list of employees who are not in a particular set of departments:

SELECT * FROM emp 
  WHERE deptno NOT IN 
    (SELECT deptno FROM dept 
      WHERE loc = 'HEADQUARTERS'); 

The optimizer uses a nested loops algorithm for NOT IN subqueries by default, unless the initialization parameter ALWAYS_ANTI_JOIN is set to MERGE or HASH and various required conditions are met that allow the transformation of the NOT IN subquery into a sort-merge or hash anti-join. You can place a MERGE_AJ or HASH_AJ hint in the NOT IN subquery to specify which algorithm the optimizer should use.

A semi-join returns rows that match an EXISTS subquery, without duplicating rows from the left side of the predicate when multiple rows on the right side satisfy the criteria of the subquery. For example:

SELECT * FROM dept 
  WHERE EXISTS 
    (SELECT * FROM emp 
      WHERE dept.ename = emp.ename 
        AND emp.bonus > 5000); 

In this query, only one row needs to be returned from DEPT even though many rows in EMP might match the subquery. If there is no index on the BONUS column in EMP, a semi-join can be used to improve query performance.

The optimizer uses a nested loops algorithm for EXISTS subqueries by default, unless the initialization parameter ALWAYS_SEMI_JOIN is set to MERGE or HASH and various required conditions are met. You can place a MERGE_SJ or HASH_SJ hint in the EXISTS subquery to specify which algorithm the optimizer should use.

Additional Information:

See Oracle8 Tuning for information about optimizer hints.

 

Optimizing "Star" Queries

One type of data warehouse design centers around what is known as a "star" schema, which is characterized by one or more very large fact tables that contain the primary information in the data warehouse and a number of much smaller dimension tables (or "lookup" tables), each of which contains information about the entries for a particular attribute in the fact table.

A star query is a join between a fact table and a number of lookup tables. Each lookup table is joined to the fact table using a primary-key to foreign-key join, but the lookup tables are not joined to each other.

The Oracle cost-based optimizer recognizes star queries and generates efficient execution plans for them. (Star queries are not recognized by the rule-based optimizer.)

A typical fact table contains keys and measures. For example, a simple fact table might contain the measure Sales, and keys Time, Product, and Market. In this case there would be corresponding dimension tables for Time, Product, and Market. The Product dimension table, for example, would typically contain information about each product number that appears in the fact table.

A star join is a primary-key to foreign-key join of the dimension tables to a fact table. The fact table normally has a concatenated index on the key columns to facilitate this type of join.

Star Query Example

This section discusses star queries with reference to the following example:

SELECT SUM(dollars) 
  FROM facts, time, product, market 
  WHERE market.stat = 'New York' 
    AND product.brand = 'MyBrand' 
    AND time.year = 1995 
    AND time.month = 'March' 
    /* Joins*/ 
    AND time.key = facts.tkey 
    AND product.pkey = facts.pkey 
    AND market.mkey = facts.mkey; 

Tuning Star Queries

To execute star queries efficiently, you must use the cost based optimizer. Begin by using the ANALYZE command to gather statistics for each of the tables accessed by the query.

Indexing

In the example above, you would construct a concatenated index on the columns tkey, pkey, and mkey. The order of the columns in the index is critical to performance. the columns in the index should take advantage of any ordering of the data. If rows are added to the large table in time order, then tkey should be the first key in the index. When the data is a static extract from another database, it is worthwhile to sort the data on the key columns before loading it.

If all queries specify predicates on each of the small tables, a single concatenated index suffices. If queries that omit leading columns of the concatenated index are frequent, additional indexes may be useful. In this example, if there are frequent queries that omit the time table, an index on pkey and mkey can be added.

Hints

Usually, if you analyze the tables the optimizer will choose an efficient star plan. You can also use hints to improve the plan. The most precise method is to order the tables in the FROM clause in the order of the keys in the index, with the large table last. Then use the following hints:

/*+ ORDERED USE_NL(facts) INDEX(facts fact_concat) */ 

A more general method is to use the STAR hint /*+ STAR */.

Extended Star Schemas

Each of the small tables can be replaced by a join of several smaller tables. For example, the product table could be normalized into brand and manufacturer tables. Normalization of all of the small tables can cause performance problems. One problem is caused by the increased number of permutations that the optimizer must consider. The other problem is the result of multiple executions of the small table joins. You can solve both of these problems by using denormalized views. For example:

CREATE VIEW prodview AS SELECT /*+ NO_MERGE */ * 
    FROM brands, mfgrs WHERE brands.mfkey = mfgrs.mfkey; 

This hint reduces the optimizer's search space and causes caching of the result of the view.

Star Transformation

The star transformation is a cost-based query transformation aimed at executing star queries efficiently. Whereas the star optimization works well for schemas with a small number of dimensions and dense fact tables, the star transformation may be considered as an alternative if any of the following holds true:

The star transformation does not rely on computing a Cartesian product of the dimension tables, which makes it better suited for cases where fact table sparsity and/or a large number of dimensions would lead to a large Cartesian product with few rows having actual matches in the fact table. In addition, rather than relying on concatenated indexes, the star transformation is based on combining bitmap indexes on individual fact table columns.

The transformation can thus choose to combine indexes corresponding precisely to the constrained dimensions. There is no need to create many concatenated indexes where the different column orders match different patterns of constrained dimensions in different queries.


Attention:

Bitmap indexes are available only if you have purchased the Oracle8 Enterprise Edition. In Oracle8, bitmap indexes are not available and star query processing uses B*-tree indexes. In the Oracle8 Enterprise Edition, the parallel bitmap index join algorithm is also available for star query processing.

See Getting to Know Oracle8 and the Oracle8 Enterprise Edition for more information about the features available in Oracle8 and Oracle8 Enterprise Edition.

 

The star transformation works by generating new subqueries that can be used to drive a bitmap index access path for the fact table.

Consider a simple case with three dimension tables, "d1", "d2", and "d3", and a fact table, "fact". The following query:

EXPLAIN PLAN FOR 
  SELECT * FROM fact, d1, d2, d3  
    WHERE fact.c1 = d1.c1 AND fact.c2 = d2.c1 AND fact.c3 = d3.c1 
    AND d1.c2 IN (1, 2, 3, 4) 
    AND d2.c2 < 100 
    AND d3.c2 = 35 

gets transformed by adding three subqueries:

SELECT * FROM fact, d1, d2 
  WHERE fact.c1 = d1.c1 AND fact.c2 = d2.c1 
    AND d1.c2 IN (1, 2, 3, 4) 
    AND d2.c2 < 100 
    AND fact.c1 IN (SELECT d1.c1 FROM d1 WHERE d1.c2 IN (1, 2, 3, 4)) 
    AND fact.c2 IN (select d2.c1 FROM d2 WHERE d2.c2 < 100) 
    AND fact.c3 IN (SELECT d3.c1 FROM d3 WHERE d3.c2 = 35) 

Given that there are bitmap indexes on fact.c1, fact.c2, and fact.c3, the newly generated subqueries can be used to drive a bitmap index access path in the following way. For each value of d1.c1 that is retrieved from the first subquery, the bitmap for that value is retrieved from the index on fact.c1 and these bitmaps are merged. The result is a bitmap for precisely those rows in fact that match the condition on d1 in the subquery WHERE-clause.

Similarly, the values from the second subquery are used together with the bitmap index on fact.c2 to produce a merged bitmap corresponding to the rows in fact that match the condition on d2 in the second subquery. The same operations apply to the third subquery. The three merged bitmaps can then be ANDed, resulting in a bitmap corresponding to those rows in fact that meet the conditions in all three subqueries simultaneously.

This bitmap can be used to access fact and retrieve the relevant rows. These are then joined to d1, d2, and d3 to produce the answer to the query. No Cartesian product is needed.

Execution Plan

The following execution plan might result from the query above:

 SELECT STATEMENT 
  HASH JOIN 
   HASH JOIN 
    HASH JOIN 
     TABLE ACCESS              FACT      BY ROWID 
      BITMAP CONVERSION                  TO ROWIDS 
       BITMAP AND 
        BITMAP MERGE 
         BITMAP KEY ITERATION 
          TABLE ACCESS         D3        FULL 
          BITMAP INDEX         FACT_C3   RANGE SCAN 
        BITMAP MERGE 
         BITMAP KEY ITERATION 
          TABLE ACCESS         D1        FULL 
          BITMAP INDEX         FACT_C1   RANGE SCAN 
        BITMAP MERGE 
         BITMAP KEY ITERATION 
          TABLE ACCESS         D2        FULL 
          BITMAP INDEX         FACT_C2   RANGE SCAN 
     TABLE ACCESS              D1        FULL 
    TABLE ACCESS               D2        FULL 
   TABLE ACCESS                D3        FULL 

In this plan the fact table is accessed through a bitmap access path based on a bitmap AND of three merged bitmaps. The three bitmaps are generated by the BITMAP MERGE row source being fed bitmaps from row source trees underneath it. Each such row source tree consists of a BITMAP KEY ITERATION row source which fetches values from the subquery row source tree, which in this example is just a full table access, and for each such value retrieves the bitmap from the bitmap index. After the relevant fact table rows have been retrieved using this access path, they are joined with the dimension tables to produce the answer to the query.

The star transformation is a cost-based transformation in the following sense. The optimizer generates and saves the best plan it can produce without the transformation. If the transformation is enabled, the optimizer then tries to apply it to the query and if applicable, generates the best plan using the transformed query. Based on a comparison of the cost estimates between the best plans for the two versions of the query, the optimizer will then decide whether to use the best plan for the transformed or untransformed version.

If the query requires accessing a large percentage of the rows in the fact table, it may well be better to use a full table scan and not use the tranformations. However, if the constraining predicates on the dimension tables are sufficiently selective that only a small portion of the fact table needs to be retrieved, the plan based on the transformation will probably be superior.

Note that the optimizer will generate a subquery for a dimension table only if it decides that it is reasonable to do so based on a number of criteria. There is no guarantee that subqueries will be generated for all dimension tables. The optimizer may also decide, based on the properties of the tables and the query, that the transformation does not merit being applied to a particular query. In this case the best regular plan will be used.

Using Star Transformation

You can enable star transformation by setting the value of the initialization parameter STAR_TRANSFORMATION_ENABLED to TRUE. Use the STAR_TRANSFORMATION hint to make the optimizer use the best plan in which the transformation has been used.

Restrictions on Star Transformation

Star transformation is not supported for tables with any of the following characteristics:




Prev

Next
Oracle
Copyright © 1997 Oracle Corporation.

All Rights Reserved.

Library

Product

Contents

Index