Oracle8i Designing and Tuning for Performance
Release 2 (8.1.6)
Part Number A76992-01
This chapter describes how Oracle optimizes Structured Query Language (SQL) using the cost-based optimizer (CBO).
This chapter contains the following sections:
This section describes five ways you can improve SQL statement efficiency:
Restructuring the indexes is a good starting point, because it has more impact on the application than does restructuring the statement or the data.
Do not use indexes as a panacea. Application developers sometimes think that performance will improve if they write more indexes. If a single programmer creates an appropriate index, then this might indeed improve the application's performance. However, if 50 programmers each create an index, then application performance will probably be hampered!
After restructuring the indexes, you can try restructuring the statement. Rewriting an inefficient SQL statement is often easier than repairing it. If you understand the purpose of a given statement, then you may be able to quickly and easily write a new statement that meets the requirement.
Because SQL is a flexible language, more than one SQL statement may meet the needs of your application. Although two SQL statements may produce the same result, Oracle may process one faster than the other. You can use the results of the EXPLAIN PLAN statement to compare the execution plans and costs of the two statements and determine which is more efficient.
This example shows the execution plans for two SQL statements that perform the same function. Both statements return all the departments in the dept table that have no employees in the emp table. Each statement searches the emp table with a subquery. Assume there is an index, deptno_index, on the deptno column of the emp table.
The first statement and its execution plan:
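Based on the discussion that follows (an IN-style subquery whose full table scan of emp appears at step 3 of the plan), the first statement can be sketched as follows. This is a reconstruction consistent with that description, not the original listing:

```sql
SELECT dname, deptno
FROM dept
WHERE deptno NOT IN
  (SELECT deptno FROM emp);
```

A plan of the following shape would match the description, with the full table scan of emp at step 3:

0  SELECT STATEMENT
1    FILTER
2      TABLE ACCESS FULL DEPT
3      TABLE ACCESS FULL EMP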
Step 3 of the output indicates that Oracle executes this statement by performing a full table scan of the emp table, despite the index on the deptno column. This full table scan can be a time-consuming operation. Oracle does not use the index, because the subquery that searches the emp table does not have a WHERE clause that makes the index available.
However, this SQL statement selects the same rows by accessing the index:

SELECT dname, deptno
FROM dept
WHERE NOT EXISTS
  (SELECT deptno
   FROM emp
   WHERE emp.deptno = dept.deptno);
The WHERE clause of the subquery refers to the deptno column of the emp table, so the index deptno_index is used. The use of the index is reflected in step 3 of the execution plan. The index range scan of deptno_index takes less time than the full scan of the emp table in the first statement. Furthermore, the first query performs one full scan of the emp table for every deptno in the dept table. For these reasons, the second SQL statement is faster than the first.
If you have statements in your applications that use the IN operator, as the first query in this example does, then consider rewriting them to use the EXISTS operator. This allows such statements to use an index, if one exists.
Use equijoins whenever possible. Without exception, statements that perform equijoins on untransformed column values are the easiest to tune.
Join order can have a significant effect on performance. The main objective of SQL tuning is to avoid performing unnecessary work to access rows that do not affect the result. This leads to three general rules:
The following example shows how to tune join order effectively:
SELECT info
FROM taba a, tabb b, tabc c
WHERE a.acol BETWEEN :alow AND :ahigh
AND b.bcol BETWEEN :blow AND :bhigh
AND c.ccol BETWEEN :clow AND :chigh
AND a.key1 = b.key1
AND a.key2 = c.key2;
The first three conditions in the example above are filter conditions applying to only a single table each. The last two conditions are join conditions.
Filter conditions dominate the choice of driving table and index. In general, the driving table should be the one containing the filter condition that eliminates the highest percentage of the table. Thus, if the range of :alow to :ahigh is narrow compared with the range of acol, but the ranges of :b* and :c* are relatively large, then taba should be the driving table, all else being equal.
After you know your driving table, choose the most selective index available to drive into that table. Alternatively, choose a full table scan if that would be more efficient. From there, the joins should all happen through the join indexes: the indexes on the primary or foreign keys used to connect that table to an earlier table in the join tree. Rarely should you use the indexes on the non-join conditions, except for the driving table. Thus, after taba is chosen as the driving table, you should use the indexes on b.key1 and c.key2 to drive into tabb and tabc, respectively.
The work of the following join can be reduced by first joining to the table with the best still-unused filter. Thus, if "b.bcol BETWEEN :blow AND :bhigh" is more restrictive (rejects a higher percentage of the rows seen) than "c.ccol BETWEEN :clow AND :chigh", then the last join can be made easier (with fewer rows) if tabb is joined before tabc.
Use untransformed column values. For example, use:

WHERE a.order_no = b.order_no

rather than:

WHERE TO_NUMBER(SUBSTR(a.order_no, INSTR(a.order_no, '.') - 1)) =
      TO_NUMBER(SUBSTR(b.order_no, INSTR(b.order_no, '.') - 1))
Do not use SQL functions in predicate clauses or WHERE clauses. The use of an aggregate function, especially in a subquery, often indicates that you could have held a derived value on a master record.
Avoid mixed-mode expressions, and beware of implicit type conversions. Suppose that you want to use an index on the character column charcol, but the WHERE clause looks like this:

charcol = numexpr

where numexpr is an expression of number type (for example, 1, numcol+0, ...). Oracle translates that expression into:

TO_NUMBER(charcol) = numexpr
This has the following consequences: the index on charcol is not used, because the conversion is applied to the column rather than to the expression; and if charcol contains a string of characters that does not translate to a number, then an error is returned.
You can avoid this problem by replacing the top expression with the explicit conversion:

charcol = TO_CHAR(numexpr)
Alternatively, make all type conversions explicit. The statement:

numcol = charexpr

allows use of an index on numcol, because the default conversion is always character-to-number. This behavior, however, is subject to change. Making type conversions explicit also makes it clear that charexpr should always translate to a number.
SQL is not a procedural language. Using one piece of SQL to do many different things is not a good idea: it usually results in a less-than-optimal result for each task. If you want SQL to accomplish different things, then write two different statements rather than writing one statement that will do different things depending on the parameters you give it.
Optimization (determining the execution plan) takes place before the database knows what values will be substituted into the query. An execution plan should not, therefore, depend on what those values are. For example:
SELECT info
FROM tables
WHERE ...
AND somecolumn BETWEEN DECODE(:loval, 'ALL', somecolumn, :loval)
               AND DECODE(:hival, 'ALL', somecolumn, :hival);
Written as shown, the database cannot use an index on the somecolumn column, because the expression involving that column uses the same column on both sides of the BETWEEN.
This is not a problem if there is some other highly selective, indexable condition you can use to access the driving table. Often, however, this is not the case. Frequently, you may want to use an index on a condition like that shown, but you need to know the values of :loval and :hival in advance. With this information, you can rule out the ALL case, which should not use the index.
If you want to use the index whenever real values are given for :loval and :hival (that is, if you expect narrow ranges, even ranges where :loval often equals :hival), then you can rewrite the example in the following logically equivalent form:
SELECT /* change this half of union all if other half changes */ info
FROM tables
WHERE ...
UNION ALL
SELECT /* Change this half of union all if other half changes. */ info
FROM tables
WHERE ...
If you run EXPLAIN PLAN on the new query, then you seem to get both a desirable and an undesirable execution plan. However, the first condition the database evaluates for either half of the UNION ALL is the combined condition on whether :hival and :loval are ALL. The database evaluates this condition before actually getting any rows from the execution plan for that part of the query.

When the condition comes back false for one part of the UNION ALL query, that part is not evaluated further. Only the part of the execution plan that is optimum for the values provided is actually carried out. Because the final conditions on :hival and :loval are guaranteed to be mutually exclusive, only one half of the UNION ALL actually returns rows. (The ALL in UNION ALL is logically valid because of this exclusivity. It allows the plan to be carried out without an expensive sort to rule out duplicate rows for the two halves of the query.)
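Leaving the elided WHERE clauses as they are, the rewritten query pairs each half of the UNION ALL with a guard on the bind values. A sketch of the intended form (the guard conditions are an assumption based on the surrounding discussion):

```sql
SELECT /* change this half of union all if other half changes */ info
FROM tables
WHERE ...
  AND somecolumn BETWEEN :loval AND :hival
  AND (:hival != 'ALL' AND :loval != 'ALL')
UNION ALL
SELECT /* Change this half of union all if other half changes. */ info
FROM tables
WHERE ...
  AND (:hival = 'ALL' OR :loval = 'ALL');
```

The guards on :loval and :hival are what make the two halves mutually exclusive, so only one branch ever returns rows for a given set of bind values.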
Use optimizer hints, such as /*+ ORDERED */, to control access paths. This is a better approach than using traditional techniques or "tricks of the trade" such as CUST_NO + 0.
For more information on hints, see Chapter 7, "Using Optimizer Hints".
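As an illustrative sketch of the difference (the customers table, cust_no column, and bind name are hypothetical, not from this manual): rather than disabling an index with an arithmetic trick, state the intent with a hint:

```sql
-- Old trick: adding 0 to cust_no prevents use of the index on that column.
SELECT cust_name FROM customers c WHERE c.cust_no + 0 = :b1;

-- Preferred: request the full table scan explicitly with a hint.
SELECT /*+ FULL(c) */ cust_name FROM customers c WHERE c.cust_no = :b1;
```

The hinted form survives schema and optimizer changes, and its intent is visible to anyone maintaining the statement.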
EXISTS is a useful alternative.
Data value lists are generally a sign that an entity is missing. For example:
The real objective in the WHERE clause above is to determine whether the mode of transport is an automobile, and not to identify a particular make. A reference table should be available in which transport type = 'AUTOMOBILE'.
Minimize the use of DISTINCT. DISTINCT always creates a sort; all the data must be instantiated before your results can be returned.
When appropriate, use INSERT, UPDATE, or DELETE ... RETURNING to select and modify data with a single call. This technique improves performance by reducing the number of calls to the database. For syntax information on the RETURNING clause, see the Oracle8i SQL Reference.
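For illustration (the :new_sal bind variable is hypothetical), a single call can apply a raise and return the new salary without a separate SELECT:

```sql
UPDATE emp
SET sal = sal * 1.1
WHERE empno = :empno
RETURNING sal INTO :new_sal;
```

Without RETURNING, the same work requires a second round-trip to read back the updated value.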
Be careful when joining views, when performing outer joins to views, and when you consider recycling views.
The shared SQL area in Oracle reduces the cost of parsing queries that reference views. In addition, optimizer improvements make the processing of predicates against views very efficient. Together, these factors make possible the use of views for ad hoc queries. Despite this, joins to views are not recommended, particularly joins from one complex view to another.
The following example shows a query on a column that is the result of a GROUP BY. The entire view is first instantiated, and then the query is run against the view data.
CREATE VIEW dx(deptno, dname, totsal)
AS SELECT d.deptno, d.dname, SUM(e.sal)
   FROM emp e, dept d
   WHERE e.deptno = d.deptno
   GROUP BY d.deptno, d.dname;

SELECT * FROM dx WHERE deptno=10;
Setting the UNNEST_SUBQUERY session parameter to TRUE enables subquery unnesting. Subquery unnesting unnests and merges the body of the subquery into the body of the statement that contains it, allowing the optimizer to consider them together when evaluating access paths and joins.
This parameter is not cost based, and it is not set by default.
With UNNEST_SUBQUERY set, Oracle first verifies whether the statement is valid. If the statement is not valid, then subquery unnesting cannot proceed. The statement must then pass a heuristic test.
The UNNEST hint checks the subquery block for validity only. If it is valid, then subquery unnesting is enabled without Oracle checking the heuristics. If you enabled subquery unnesting with the UNNEST_SUBQUERY parameter, then the NO_UNNEST hint turns it off for specific subquery blocks.
Because subquery unnesting generates views, some views will be merged in the main query block with complex view merging. When the subquery contains an aggregate function, it is a good idea to have complex view merging enabled. This allows the inline view generated by unnesting to be merged in the main query block.
An outer join to a multi-table view can be problematic. For example, you may start with the usual emp and dept tables, each with an index on deptno, and create the following view:
CREATE VIEW empdept (empno, deptno, ename, dname)
AS SELECT e.empno, e.deptno, e.ename, d.dname
   FROM emp e, dept d
   WHERE e.deptno = d.deptno (+);
You may then construct the simplest possible query to do an outer join into this view on an indexed column (deptno) of a table underlying the view:
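The query itself can be sketched as follows. This is a reconstruction consistent with the execution plan shown next, in which dept is probed through its unique deptno index and then outer-joined to the view; the :b1 bind name is illustrative:

```sql
SELECT d.dname, ed.ename
FROM dept d, empdept ed
WHERE d.deptno = ed.deptno (+)
AND d.deptno = :b1;
```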
The following execution plan results:
QUERY_PLAN
--------------------------------------------
MERGE JOIN OUTER
  TABLE ACCESS BY ROWID DEPT
    INDEX UNIQUE SCAN DEPT_U1: DEPTNO
  FILTER
    VIEW EMPDEPT
      NESTED LOOPS OUTER
        TABLE ACCESS FULL EMP
        TABLE ACCESS BY ROWID DEPT
          INDEX UNIQUE SCAN DEPT_U1: DEPTNO
Until both tables of the view are joined, the optimizer does not know whether the view will generate a matching row. The optimizer must therefore generate all the rows of the view and perform a MERGE JOIN OUTER with all the rows returned from the rest of the query. This approach would be extremely inefficient if all you want is a few rows from a multi-table view with at least one very large table.
Solving the problem in the preceding example is relatively easy. The second reference to dept is not needed, so you can do an outer join straight to emp. In other cases, the join need not be an outer join. You can still use the view simply by getting rid of the (+) on the join into the view.
Beware of writing a view for one purpose and then using it for other purposes, to which it may be ill-suited. Consider this example:
You can obtain dname and deptno directly from the dept table. It would be inefficient to obtain this information by querying the dx view (which was declared earlier in the present example). To answer the query, the view would perform a join of the emp and dept tables, even though you do not need any data from the emp table.
Using triggers consumes system resources. If you use too many triggers, then you may find that performance is adversely affected and you may need to modify or disable them.
After restructuring the indexes and the statement, you can consider restructuring the data.
Avoid unnecessary sorting, such as ORDER BY or GROUP BY, in response-critical code.
The overall purpose of any strategy for data distribution is to locate each data attribute such that its value makes the minimum number of network journeys. If the current number of journeys is excessive, then moving (migrating) the data is a natural solution.
Often, however, no single location of the data reduces the network load (or message transmission delays) to an acceptable level. In this case, consider either holding multiple copies (replicating the data) or holding different parts of the data in different places (partitioning the data).
Where distributed queries are necessary, it may be effective to code the required joins with procedures either in PL/SQL within a stored procedure, or within the user interface code.
When considering a cross-network join, you can either bring the data in from a remote node and perform the join locally, or you can perform the join remotely. The option you choose should be determined by the relative volume of data on the different nodes.
After you have tuned your application's SQL statements, consider maintaining statistics with the useful procedures of the DBMS_STATS package. Also consider implementing plan stability to maintain application performance characteristics despite system changes.
Structured Query Language (SQL) is used to perform all database operations, although some Oracle tools and applications simplify or mask its use. This chapter provides an overview of the issues involved in tuning database operations from the SQL point-of-view.
For more information about tuning PL/SQL statements, see PL/SQL User's Guide and Reference.
This section introduces:
Always approach the tuning of database operations from the standpoint of the particular goals of your application. Are you tuning serial SQL statements or parallel operations? Do you have an online transaction processing (OLTP) application or a data warehousing application?
These two divergent types of applications have contrasting tuning goals, as described in Table 9-1.
|Tuning Situation||Goal|
|Serial SQL statement||Minimize resource use by the operation.|
|Parallel operations||Maximize throughput for the hardware.|
The goal of tuning one SQL statement in isolation is: Minimize resource use by the operation being performed.
You can experiment with alternative SQL syntax without actually modifying your application. To do this, use the EXPLAIN PLAN statement with the alternative statement that you are considering, and compare the alternative statement's execution plan and cost with that of the existing one. The cost of a SQL statement appears in the POSITION column of the first row generated by EXPLAIN PLAN. However, you must run the application to see which statement can actually be executed more quickly.
For more information, see "Approaches to SQL Statement Tuning".
The goal of tuning parallel execution is: Maximize throughput for the given hardware.
If you have a powerful system and a massive, high-priority SQL statement to run, then parallelize the statement so that it uses all available resources.
Oracle can perform the following operations in parallel:
(using the APPEND hint, parallel index scans)
Look for opportunities to parallelize operations in the following situations:
Whenever an operation you are performing in the database takes a long time, whether it is a query or a batch job, you may be able to reduce the elapsed time by using parallel operations.
You can split rows so that they are not all accessed by a single process.
For more information about parallel execution, see Oracle8i Concepts and your platform-specific Oracle documentation.
For information on using the following features, see Oracle8i Data Warehousing Guide:
You can also use parallel execution to access object types within an Oracle database. For example, you can use parallel execution to access Large Binary Objects (LOBs).
Parallel execution benefits systems if they have all of the following characteristics:
If your system lacks any of these characteristics, then parallel execution may not significantly improve performance. In fact, parallel execution can reduce system performance on over-utilized systems or systems with small I/O bandwidth.
Parallel execution provides the best performance improvements in decision support systems (DSS). However, online transaction processing (OLTP) systems also benefit from parallel execution; for example, parallel index creation greatly benefits e-commerce businesses where there is little scheduled downtime.
During the day, most OLTP systems should probably not use parallel execution. During off-hours, however, parallel execution can effectively process high-volume batch operations. For example, a bank might use parallelized batch programs to perform millions of updates to apply interest to accounts.
Tuning OLTP applications mostly involves tuning serial SQL statements. You should consider two design issues: use of SQL and shared PL/SQL, and use of different transaction modes.
For more information on tuning data warehouse applications, see Oracle8i Data Warehousing Guide.
To minimize parsing, use bind variables in SQL statements within OLTP applications. This way, all users can share the same SQL statements while using fewer resources for parsing.
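For example, the first of the following statements can be shared by every user regardless of which employee is being looked up, while the second creates a distinct cursor for each literal value (the :empno bind name is illustrative):

```sql
-- Shared: parsed once, executed many times with different bind values.
SELECT ename FROM emp WHERE empno = :empno;

-- Not shared: each distinct literal forces a new parse.
SELECT ename FROM emp WHERE empno = 7369;
```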
Sophisticated users can use discrete transactions if performance is of the utmost importance, and if the users are willing to design the application accordingly.
Serializable transactions can be used if the application must be ANSI compatible. Because of the overhead inherent in serializable transactions, Oracle strongly recommends the use of read-committed transactions instead.
For more information, see Chapter 17, "Transaction Modes".
If excessive use of triggers degrades system performance, then modify the conditions under which triggers fire by executing CREATE OR REPLACE TRIGGER statements. You can also turn off triggers with the ALTER TRIGGER ... DISABLE statement.
This section documents the best practices for developing and tuning SQL with the cost-based optimizer (CBO). This includes the following:
The traditional RBO tuning techniques include:
Because the CBO is cost based, it is not necessary to force or disable a particular index. The CBO chooses the access path with the best cost.
The CBO chooses the most efficient join order based on cost after permuting the possible join graphs. Hence, there is no need for, and no benefit in, ordering the FROM clause under the CBO.
In the following example, the CBO may choose a full table scan if the index probe on employee_num is too costly (for example, if the estimated cardinality of employees whose employee numbers begin with 20 is high).
SELECT employee_num, full_name NAME, employee_id
FROM mtl_employees_current_view
WHERE (employee_num LIKE '20%')
AND (organization_id = :1)
ORDER BY employee_num;
The object statistics include the following:
The following example illustrates the cost model and selectivity of a query which, under the RBO, used an inefficient index. The CBO chooses a more efficient plan.
SELECT item.expenditure_item_id
FROM pa_tasks t, pa_expenditures exp, pa_expenditure_types etype, pa_expenditure_items item
WHERE TRUNC(exp.expenditure_ending_date) <= TRUNC(NVL(TO_DATE(:b0), exp.expenditure_ending_date))
AND exp.expenditure_status_code||'' = 'APPROVED'
AND exp.expenditure_group = NVL(:b1, exp.expenditure_group)
AND exp.expenditure_id = item.expenditure_id
AND (NVL(item.request_id, (:b2+1)) <> :b2 OR item.cost_dist_rejection_code IS NULL)
AND item.cost_distributed_flag = 'N'
AND t.task_id = item.task_id
AND t.project_id = DECODE(:b4, 0, t.project_id, :b4)
AND item.expenditure_type = etype.expenditure_type
AND etype.system_linkage_function||'' = :b6
ORDER BY item.expenditure_item_date;

The distribution of values in the COST_DISTRIBUTED_FLAG column:

COST_DISTRIBUTED_FLAG           ROWS
C                                  7
N                             80,251
Y                         16,534,822
Cost=     SELECT STATEMENT
Cost=       SORT ORDER BY
Cost=         NESTED LOOPS
Cost=           NESTED LOOPS
Cost=             NESTED LOOPS
Cost=               TABLE ACCESS BY INDEX ROWID PA_EXPENDITURE_ITEMS_ALL
Cost=                 INDEX RANGE SCAN PA_EXPENDITURE_ITEMS_N3: COST_DISTRIBUTED_FLAG
Cost=               TABLE ACCESS BY INDEX ROWID PA_EXPENDITURE_TYPES
Cost=                 INDEX UNIQUE SCAN PA_EXPENDITURE_TYPES_U1: EXPENDITURE_TYPE
Cost=             TABLE ACCESS BY INDEX ROWID PA_EXPENDITURES_ALL
Cost=               INDEX UNIQUE SCAN PA_EXPENDITURES_U1: EXPENDITURE_ID
Cost=           TABLE ACCESS BY INDEX ROWID PA_TASKS
Cost=             INDEX UNIQUE SCAN PA_TASKS_U1: TASK_ID
Cost=6503 SELECT STATEMENT
Cost=6503   SORT ORDER BY
Cost=6489     NESTED LOOPS
Cost=6487       NESTED LOOPS
Cost=6478         MERGE JOIN CARTESIAN
Cost=6477           TABLE ACCESS FULL PA_EXPENDITURES_ALL
Cost=1              SORT JOIN
Cost=1                TABLE ACCESS FULL PA_EXPENDITURE_TYPES
Cost=9            TABLE ACCESS BY INDEX ROWID PA_EXPENDITURE_ITEMS_ALL
Cost=4              INDEX RANGE SCAN PA_EXPENDITURE_ITEMS_N1: EXPENDITURE_ID
Cost=2          TABLE ACCESS BY INDEX ROWID PA_TASKS
Cost=1            INDEX UNIQUE SCAN PA_TASKS_U1: TASK_ID
This illustrates that the cost of the RBO plan is significantly higher than that of the default CBO-generated plan.
Cost=592532 SELECT STATEMENT
Cost=592532   SORT ORDER BY
Cost=592518     NESTED LOOPS
Cost=592516       NESTED LOOPS
Cost=587506         NESTED LOOPS
Cost=504831           TABLE ACCESS BY INDEX ROWID PA_EXPENDITURE_ITEMS_ALL
Cost=32573              INDEX RANGE SCAN PA_EXPENDITURE_ITEMS_N3:
Cost=1                TABLE ACCESS BY INDEX ROWID PA_EXPENDITURE_TYPES
Cost=                   INDEX UNIQUE SCAN PA_EXPENDITURE_TYPES_U1:
Cost=2              TABLE ACCESS BY INDEX ROWID PA_EXPENDITURES_ALL
Cost=1                INDEX UNIQUE SCAN PA_EXPENDITURES_U1:
Cost=2            TABLE ACCESS BY INDEX ROWID PA_TASKS
Cost=1              INDEX UNIQUE SCAN PA_TASKS_U1:
To avoid the full table scan, the query can be rewritten to use a more selective filter. In this case, the expenditure group is rather selective, but the NVL() function prevented an index from being used.
SELECT item.expenditure_item_id
FROM pa_tasks t, pa_expenditures exp, pa_expenditure_types etype, pa_expenditure_items item
WHERE TRUNC(exp.expenditure_ending_date) <= TRUNC(NVL(TO_DATE(:b0), exp.expenditure_ending_date))
AND exp.expenditure_status_code||'' = 'APPROVED'
AND exp.expenditure_group = :b1
AND exp.expenditure_id = item.expenditure_id
AND (NVL(item.request_id, (:b2+1)) <> :b2 OR item.cost_dist_rejection_code IS NULL)
AND item.cost_distributed_flag = 'N'
AND t.task_id = item.task_id
AND t.project_id = DECODE(:b4, 0, t.project_id, :b4)
AND item.expenditure_type = etype.expenditure_type
AND etype.system_linkage_function||'' = :b6
ORDER BY item.expenditure_item_date
Cost=32 SELECT STATEMENT
Cost=32   SORT ORDER BY
Cost=18     NESTED LOOPS
Cost=16       NESTED LOOPS
Cost=7          MERGE JOIN CARTESIAN
Cost=1            TABLE ACCESS FULL PA_EXPENDITURE_TYPES
Cost=6            SORT JOIN
Cost=6              TABLE ACCESS BY INDEX ROWID PA_EXPENDITURES_ALL
Cost=2                INDEX RANGE SCAN PA_EXPENDITURES_N3: EXPENDITURE_GROUP
Cost=9          TABLE ACCESS BY INDEX ROWID PA_EXPENDITURE_ITEMS_ALL
Cost=4            INDEX RANGE SCAN PA_EXPENDITURE_ITEMS_N1: EXPENDITURE_ID
Cost=2        TABLE ACCESS BY INDEX ROWID PA_TASKS
Cost=1          INDEX UNIQUE SCAN PA_TASKS_U1: TASK_ID
Avoid the following kinds of complex expressions:
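Typical forms of the kind meant here (the column and bind names are illustrative):

```sql
NVL(col1, -999) = :b1
TO_DATE(char_date_col) = :b2
TO_NUMBER(char_num_col) = :b3
```

Each wraps a column in a function or default, which hides the column's true value distribution from the optimizer.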
These expressions prevent the optimizer from assigning valid cardinality or selectivity estimates, and can in turn affect the overall plan and the join method.
Add the predicate explicitly rather than relying on an NVL() default. For example, instead of:

SELECT employee_num, full_name NAME, employee_id
FROM mtl_employees_current_view
WHERE (employee_num = NVL(:b1, employee_num))
AND (organization_id = :1)
ORDER BY employee_num;

use:

SELECT employee_num, full_name NAME, employee_id
FROM mtl_employees_current_view
WHERE (employee_num = :b1)
AND (organization_id = :1)
ORDER BY employee_num;
The balloon tactic is when a developer chooses to write a single complex SQL statement which incorporates complex application and business logic, as opposed to writing a few simple queries to achieve the same results. Developing a very large complex SQL statement has performance implications in terms of sharable memory and optimization. Coding a few simple queries in place of a single complex query is a better approach, because the individual SQL statements are easier to optimize and maintain.
Oracle Forms and Reports are powerful development tools which allow application logic to be coded using PL/SQL (triggers or program units). This helps reduce the complexity of SQL by allowing complex logic to be handled in the Forms or Reports. In addition, you can also invoke a server side PL/SQL package which performs the few SQL statements in place of a single large complex SQL statement. Because the package is a server-side unit, there are no issues surrounding client to database round-trips and network traffic.
Complex logic should be handled in the application via Oracle Forms triggers, PL/SQL logic, or C-Code.
SELECT * FROM ar_addresses_v WHERE (customer_id = :1)

The view ar_addresses_v is defined over many joined tables:

AR_ADDRESSES_V:
SELECT *
FROM AR_LOOKUPS L_CAT,
     FND_TERRITORIES_VL TERR,
     FND_LANGUAGES_VL LANG,
     RA_SITE_USES SU_SHIP,
     RA_SITE_USES SU_STMT,
     RA_SITE_USES SU_DUN,
     RA_SITE_USES SU_LEGAL,
     RA_SITE_USES SU_BILL,
     RA_SITE_USES SU_MARKET,
     RA_ADDRESSES ADDR
The following steps were taken to improve the above query, which accessed a complex view with many outer joins:
Table 9-2 lists recommended tuning tips you should implement during your SQL statement design phase:
|SQL Tuning Tip||Notes|
Do the same work faster, or do less work. Tune by selectivity.
Aim to have the least rows selected. This leads to less work and less time taken by SQL execution. It also reduces parse times.
Decompose join layers.
Analyze the joins one by one and check that their use makes sense in each circumstance. See Chapter 4, "The Optimizer" .
Examine the underlying views.
If your query accesses a view, or joins with a view, then you should examine the view thoroughly to determine if the view is optimized, or if your query even needs all the complexity from the view.
Do not be afraid of full table scans, especially for small tables.
Full table scans may make sense and be cheaper than index scans in certain situations, like with smaller tables or non-selective indexes.
Examine the execution plan in detail.
Index access and NL joins may not be optimal. For example, the query could be returning too many rows for this particular join type.
Do the math for long-running queries:
Verify the following:
In other words, your expectations of needing the query to run in 3 minutes could be too high, depending on the system configuration.
Monitor disk reads and buffer gets
For instructions on how to do this, see "Disk Reads and Buffer Gets".
For advice on how to do this, see "Choose an Advantageous Join Order".
For advice on how to decide, see "Using EXISTS versus IN".
Tune for the typical case
It is important that you generate and review execution plans for all your SQL statements to ensure optimal performance.
For more information on execution plans, see Chapter 5, "Using EXPLAIN PLAN".
Predicate collapsing occurs when a column predicate involves more than one bind variable. An expression of the form col = DECODE(:b1, '', :b3, col) is an example of predicate collapsing. This implies that if bind variable :b1 is null, then bind variable :b3 should be used; otherwise, the expression reduces to col = col. This prevents the optimizer from utilizing the index on the col column, due to the DECODE construct.
The following example demonstrates how predicate collapsing is used to collapse a name bind variable with the delivery_id bind variable in a single filter. As can be seen from the EXPLAIN PLAN, this results in a full table scan on the wsh_deliveries table, because of the NVL() construct on the delivery_id column as well as the DECODE() construct on the name column.
SELECT delivery_id, planned_departure_id, organization_id, status_code
FROM wsh_deliveries
WHERE delivery_id = NVL(:b1, delivery_id)
AND name = DECODE(:b1, '', :b3, name)

PLAN:
Cost=2090 SELECT STATEMENT
Cost=2090   TABLE ACCESS FULL WSH_DELIVERIES
This query can be rewritten using a UNION to short-circuit one side of the UNION based on the bind variable values. For example, if the delivery_id bind is supplied, then only the first branch of the UNION is executed.
If a value for the name bind variable is supplied, then the second branch of the UNION is executed. In either case, both sides of the UNION use rather selective indexes on either the delivery_id column or the name column. This is much more efficient than the original query, which performed a full table scan.
SELECT delivery_id, planned_departure_id, organization_id, status_code
FROM wsh_deliveries
WHERE delivery_id = :b1
AND (:b1 IS NOT NULL)
UNION
SELECT delivery_id, planned_departure_id, organization_id, status_code
FROM wsh_deliveries
WHERE name = :b2
AND (:b1 IS NULL)

Cost=34 SELECT STATEMENT
Cost=34   SORT UNIQUE
Cost=       UNION-ALL
Cost=         FILTER
Cost=3          TABLE ACCESS BY INDEX ROWID WSH_DELIVERIES
Cost=2            INDEX UNIQUE SCAN WSH_DELIVERIES_U1: DELIVERY_ID
Cost=         FILTER
Cost=3          TABLE ACCESS BY INDEX ROWID WSH_DELIVERIES
Cost=2            INDEX UNIQUE SCAN WSH_DELIVERIES_U2: NAME
The following example illustrates how a query can be optimized for the typical case. Specifically, this purchasing query determines the list of approvers who can approve a purchase order for a given organizational structure. In most cases, however, the end user supplies the approver name as a name pattern, so it is not necessary to scan all the approvers.
SELECT COUNT(*), COUNT(DISTINCT HR.employee_id), HR.full_name, HR.employee_num, HR.employee_id
FROM hr_employees_current_v HR,
     (SELECT DISTINCT PEH.superior_id
      FROM po_employee_hierarchies PEH
      WHERE PEH.position_structure_id = :1
      AND PEH.employee_id > 0) PEHV
WHERE PEHV.superior_id = HR.employee_id
AND (:2 = 'Y' OR (:3 = 'N' AND HR.employee_id != :4))
GROUP BY full_name, employee_num, employee_id
ORDER BY full_name
call     count       cpu    elapsed       disk      query    current       rows
------- ------  -------- ---------- ---------- ---------- ---------- ----------
Parse        1      0.00       0.00          0          0          0          0
Execute      1      0.00       0.00          0          0          0          0
Fetch       42     39.34      39.51       3756       7752          3         82
------- ------  -------- ---------- ---------- ---------- ---------- ----------
total       44     39.34      39.51       3756       7752          3         82
SELECT STATEMENT GOAL: ALL_ROWS
  SORT (GROUP BY)
    FILTER
      NESTED LOOPS
        NESTED LOOPS
          VIEW
            SORT (UNIQUE)
              INDEX GOAL: ANALYZED (RANGE SCAN) OF 'PO_EMPLOYEE_HIERARCHIES_U1' (UNIQUE)
          TABLE ACCESS GOAL: ANALYZED (BY INDEX ROWID) OF 'PER_ALL_PEOPLE_F'
            INDEX (RANGE SCAN) OF 'PER_PEOPLE_F_PK' (UNIQUE)
        TABLE ACCESS GOAL: ANALYZED (BY INDEX ROWID) OF 'PER_ALL_ASSIGNMENTS_F'
          INDEX GOAL: ANALYZED (RANGE SCAN) OF 'PER_ASSIGNMENTS_F_N12' (NON-UNIQUE)
      SORT (AGGREGATE)
        TABLE ACCESS GOAL: ANALYZED (FULL) OF 'FINANCIALS_SYSTEM_PARAMS_ALL'
SELECT COUNT(*), COUNT(DISTINCT HR.employee_id), HR.full_name,
       HR.employee_num, HR.employee_id
FROM hr_employees_current_v HR
WHERE (full_name LIKE NVL(:1,'')||'%')
AND (NVL(:2,'N') = 'Y' OR (NVL(:3,'N') = 'N' AND HR.employee_id != :4))
AND EXISTS (SELECT PEH.superior_id
            FROM po_employee_hierarchies PEH
            WHERE PEH.position_structure_id = :5
            AND PEH.superior_id = HR.employee_id)
GROUP BY full_name, employee_num, employee_id
ORDER BY full_name
call     count       cpu    elapsed       disk      query    current       rows
------- ------  -------- ---------- ---------- ---------- ----------  ---------
Parse        0      0.00       0.00          0          0          0          0
Execute      1      0.00       0.01          0          0          0          0
Fetch        1      0.03       0.09         29         39          3          2
------- ------  -------- ---------- ---------- ---------- ----------  ---------
total        2      0.03       0.10         29         39          3          2
SELECT STATEMENT GOAL: ALL_ROWS
 SORT (GROUP BY)
  FILTER
   NESTED LOOPS
    TABLE ACCESS GOAL: ANALYZED (BY INDEX ROWID) OF 'PER_ALL_PEOPLE_F'
     INDEX GOAL: ANALYZED (RANGE SCAN) OF 'PER_PEOPLE_F_N54' (NON-UNIQUE)
    TABLE ACCESS GOAL: ANALYZED (BY INDEX ROWID) OF 'PER_ALL_ASSIGNMENTS_F'
     INDEX GOAL: ANALYZED (RANGE SCAN) OF 'PER_ASSIGNMENTS_F_N12' (NON-UNIQUE)
   TABLE ACCESS GOAL: ANALYZED (BY INDEX ROWID) OF 'PO_EMPLOYEE_HIERARCHIES_ALL'
    INDEX GOAL: ANALYZED (RANGE SCAN) OF 'PO_EMPLOYEE_HIERARCHIES_N2' (NON-UNIQUE)
  SORT (AGGREGATE)
   TABLE ACCESS GOAL: ANALYZED (FULL) OF 'FINANCIALS_SYSTEM_PARAMS_ALL'
Monitor disk reads and buffer gets by executing the following statement:
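In SQL*Plus, for example, statistics of this kind can be gathered with the AUTOTRACE facility. The sketch below is illustrative only: it assumes the PLUSTRACE role has been granted, and the query shown is a placeholder, not the statement from this example.

```sql
-- Enable statistics-only reporting in SQL*Plus (assumes the PLUSTRACE role).
SET AUTOTRACE TRACEONLY STATISTICS

-- Placeholder query; substitute the statement being tuned.
SELECT * FROM emp WHERE deptno = 10;
```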
Typical results returned are shown as follows:
Statistics
----------------------------------------------------------
         70  recursive calls
          0  db block gets
        591  consistent gets
        404  physical reads
          0  redo size
        315  bytes sent via SQL*Net to client
        850  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          3  sorts (memory)
          0  sorts (disk)
          0  rows processed
If 'consistent gets' or 'physical reads' is high relative to the amount of data returned, then this is a sign that the query is expensive and needs to be reviewed for optimization.
For example, if you are expecting less than 1,000 rows back and 'consistent gets' is 1,000,000 and 'physical reads' is 10,000, then this query needs to be further optimized.
This section describes when to use EXISTS and when to use the IN clause in subqueries.
SELECT COUNT(*)
FROM so_picking_lines_all pl
WHERE (EXISTS (SELECT pld.picking_line_id
               FROM so_picking_line_details pld
               WHERE pld.picking_line_id = pl.picking_line_id
               AND pld.delivery_id = :b1)
AND NVL(pl.shipped_quantity,0) > 0)
Cost=97740  SELECT STATEMENT
Cost=        SORT AGGREGATE
Cost=         FILTER
Cost=97740     TABLE ACCESS FULL SO_PICKING_LINES_ALL
Cost=4         TABLE ACCESS BY INDEX ROWID SO_PICKING_LINE_DETAILS
Cost=3          INDEX RANGE SCAN SO_PICKING_LINE_DETAILS_N3:

In this example, the use of EXISTS results in a full table scan because there are no selective criteria on the outer query. In this case, an IN operator is more appropriate. The IN operator enables Oracle to drive off of the delivery_id index, which is rather selective.
SELECT COUNT(*)
FROM so_picking_lines_all pl
WHERE pl.picking_line_id IN (SELECT pld.picking_line_id
                             FROM so_picking_line_details pld
                             WHERE pld.delivery_id = :b1)
AND pl.shipped_quantity > 0
Cost=265  SELECT STATEMENT
Cost=      SORT AGGREGATE
Cost=265    NESTED LOOPS
Cost=19      VIEW
Cost=19       SORT UNIQUE
Cost=4         TABLE ACCESS BY INDEX ROWID SO_PICKING_LINE_DETAILS
Cost=3          INDEX RANGE SCAN SO_PICKING_LINE_DETAILS_N3:
Cost=2       TABLE ACCESS BY INDEX ROWID SO_PICKING_LINES_ALL
Cost=1        INDEX UNIQUE SCAN SO_PICKING_LINES_U1:
This is another example where IN is more appropriate than EXISTS:
UPDATE so_sales_credits_interface sc
SET request_id = :b0
WHERE request_id IS NULL
AND error_flag IS NULL
AND interface_status IS NULL
AND EXISTS (SELECT NULL
            FROM so_headers_interface i
            WHERE sc.original_system_reference = i.original_system_reference
            AND sc.order_source_id = i.order_source_id
            AND i.request_id = :b0)
Cost=1459  UPDATE STATEMENT
Cost=       UPDATE SO_SALES_CREDITS_INTERFACE
Cost=        FILTER
Cost=1459     TABLE ACCESS FULL SO_SALES_CREDITS_INTERFACE
Cost=2        TABLE ACCESS BY INDEX ROWID SO_HEADERS_INTERFACE_ALL
Cost=1         INDEX UNIQUE SCAN SO_HEADERS_INTERFACE_U1:
In this example, the use of EXISTS results in a full table scan because there are no selective criteria on the outer query. In this case, an IN operator is more appropriate. The IN operator enables Oracle to drive off of the request_id index, which is rather selective.
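A rewrite along the following lines lets Oracle drive off the request_id index by turning the correlated EXISTS into an IN over the interface headers. This is an illustrative sketch, not a statement taken from the application:

```sql
UPDATE so_sales_credits_interface sc
SET sc.request_id = :b0
WHERE sc.request_id IS NULL
AND sc.error_flag IS NULL
AND sc.interface_status IS NULL
AND (sc.original_system_reference, sc.order_source_id) IN
    (SELECT i.original_system_reference, i.order_source_id
     FROM so_headers_interface i
     WHERE i.request_id = :b0)
```

With this form, the subquery can be evaluated first against the selective request_id index, and its results drive the access to so_sales_credits_interface.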
This section documents the steps and procedures involved in diagnosing a CBO execution plan for a given SQL statement:
Oracle supports transparent distributed queries to access data from multiple databases. It also provides many other distributed features, such as transparent distributed transactions and a transparent, fully automatic two-phase commit. This section explains how the Oracle8i optimizer decomposes SQL statements and how this affects the performance of distributed queries. The section also provides guidelines on how to influence the optimizer and avoid performance bottlenecks.
This section contains the following sections:
If a SQL statement references one or more remote tables, then the optimizer first determines whether all remote tables are located at the same site. If all tables are located at the same remote site, then Oracle sends the entire query to the remote site for execution. The remote site sends the resulting rows back to the local site. This is called a remote SQL statement. If the tables are located at more than one site, then the optimizer decomposes the query into separate SQL statements to access each of the remote tables. This is called a distributed SQL statement. The site where the query is executed, called the driving site, is usually the local site.
This section describes:
If a SQL statement references multiple tables, then the optimizer must determine which columns belong to which tables before it can decompose the SQL statement. For example:
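A statement of the kind discussed here might look as follows. This is a sketch assuming the dept/emp schema used throughout this chapter, with emp on a remote database reached through a database link named remote:

```sql
SELECT dname, ename
FROM dept, emp@remote
WHERE dept.deptno = emp.deptno;
```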
The optimizer must first determine that the dname column belongs to the dept table and the ename column to the emp table. After the optimizer has the data dictionary information of all remote tables, it can build the decomposed SQL statements.
Column and table names in decomposed SQL statements appear between double quotes. You must enclose in double quotes any column and table names that contain special characters, reserved words, or spaces.
This mechanism also replaces an asterisk (*) in the select list with the actual column names. For example:
Results in the decomposed SQL statement
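As an illustration (a sketch assuming the standard emp columns), a statement such as:

```sql
SELECT * FROM emp@remote;
```

might be decomposed into a remote query in which the asterisk has been expanded:

```sql
SELECT A1."EMPNO", A1."ENAME", A1."JOB", A1."MGR", A1."HIREDATE",
       A1."SAL", A1."COMM", A1."DEPTNO"
FROM "EMP" A1
```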
If the entire SQL statement is sent to the remote database, then the optimizer uses table aliases A1, A2, and so on, for all tables and columns in the query, in order to avoid possible naming conflicts. For example:
This is sent to the remote database as the following:
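For instance, a query against two tables on the same remote database (again a sketch using the chapter's dept/emp schema) such as:

```sql
SELECT dname, ename
FROM dept@remote, emp@remote
WHERE dept.deptno = emp.deptno;
```

might be sent to the remote database as:

```sql
SELECT A1."DNAME", A2."ENAME"
FROM "DEPT" A1, "EMP" A2
WHERE A1."DEPTNO" = A2."DEPTNO"
```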
When a query accesses data on one or more databases, one site drives the execution of the query. This is known as the driving site; it is here that the data is joined, grouped, and ordered. By default, the local Oracle server is the driving site. A hint called DRIVING_SITE enables you to manually specify the driving site.
The decomposition of SQL statements is important, because it determines the number of records or even tables that must be sent through the network. A knowledge of how the optimizer decomposes SQL statements can help you achieve optimum performance for distributed queries.
If a SQL statement references one or more remote tables, then the optimizer must decompose the SQL statement into separate queries to be executed on the different databases. For example:
This could be decomposed into the following:
Which is executed locally, and:
Which is sent to the remote database. The data from both tables is joined locally. All this is done automatically and transparently for the user or application.
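Under the assumptions already used in this chapter (a local dept table and a remote emp table), the decomposition described above might look like this sketch:

```sql
-- Original distributed statement:
SELECT d.dname, e.ename
FROM dept d, emp@remote e
WHERE d.deptno = e.deptno;

-- Executed locally:
SELECT deptno, dname FROM dept;

-- Sent to the remote database:
SELECT deptno, ename FROM emp;
```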
In some cases, however, it might be better to send the local table to the remote database and join the two tables on the remote database. This can be achieved either by creating a view or by using the DRIVING_SITE hint. If you decide to create a view on the remote database, then a database link from the remote database to the local database is also needed.
For example (on the remote database):
Next, select from the remote view instead of the local and remote tables:
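A sketch of this approach follows; it assumes a database link named local from the remote database back to the local one, and the view name dept_emp is illustrative only:

```sql
-- On the remote database:
CREATE VIEW dept_emp AS
SELECT d.deptno, d.dname, e.empno, e.ename
FROM dept@local d, emp e
WHERE d.deptno = e.deptno;

-- On the local database, select from the remote view:
SELECT dname, ename FROM dept_emp@remote;
```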
Now, the local dept table is sent through the network to the remote database, joined on the remote database with the emp table, and the result is sent back to the local database.
For details about the DRIVING_SITE hint, see Chapter 7, "Using Optimizer Hints".
The rule-based optimizer does not have information about indexes for remote tables. Therefore, it never generates a nested loops join between a local table and a remote table with the local table as the outer table in the join. It uses either a nested loops join with the remote table as the outer table or a sort merge join, depending on the indexes available for the local table.
The cost-based optimizer can consider more execution plans than the rule-based optimizer. The cost-based optimizer knows whether indexes on remote tables are available, and in which cases it makes sense to use them. The cost-based optimizer considers index access of the remote tables as well as full table scans, whereas the rule-based optimizer considers only full table scans.
The particular execution plan and table access that the cost-based optimizer chooses depends on the table and index statistics. For example:
Here, the optimizer might choose the local dept table as the driving table and access the remote emp table using an index; so the decomposed SQL statement becomes the following:
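The decomposed statement might, for example, take this form. This is a sketch; the bind variable stands for the deptno value supplied from each row of the driving dept table:

```sql
SELECT ename FROM emp WHERE deptno = :1
```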
This decomposed SQL statement is used for a nested loops operation.
If tables are on more than one remote site, then it can be more effective to create a view than to use the DRIVING_SITE hint. If not all tables are on the same remote database, then the optimizer accesses each remote table separately. For example:
SELECT d.dname, e1.ename, e2.job
FROM dept d, emp@remote e1, emp@remote e2
WHERE d.deptno = e1.deptno
This results in the decomposed SQL statements:
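The decomposed statements might look like the following sketch, with one query per remote table reference; the exact select lists depend on the full statement:

```sql
SELECT A1."DEPTNO", A1."ENAME" FROM "EMP" A1
SELECT A2."JOB" FROM "EMP" A2
```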
To join the two emp tables remotely, create a view with the join of the remote tables on the remote database. For example (on the remote database):
Now, select from the remote view, instead of the remote tables:
This results in the decomposed SQL statement:
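A sketch of the remote view and the resulting access follows; the view name and the join condition between the two emp references are hypothetical, chosen only to illustrate the technique:

```sql
-- On the remote database:
CREATE VIEW emp_emp AS
SELECT e1.deptno, e1.ename, e2.job
FROM emp e1, emp e2
WHERE e1.empno = e2.empno;

-- On the local database:
SELECT d.dname, e.ename, e.job
FROM dept d, emp_emp@remote e
WHERE d.deptno = e.deptno;
```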
In a distributed query, all hints are supported for local tables. For remote tables, however, you can use only join order and join operation hints. (Hints for access methods, parallel hints, and so on, have no effect.) For remote mapped queries, all hints are supported.
For more information on hints for join orders and hints for join operations, see Chapter 7, "Using Optimizer Hints".
EXPLAIN PLAN gives information not only about the overall execution plan of SQL statements, but also about the way in which the optimizer decomposes SQL statements. EXPLAIN PLAN stores information in the PLAN_TABLE table. If remote tables are used in a SQL statement, then the OPERATION column contains the value REMOTE to indicate that a remote table is referenced, and the OTHER column contains the decomposed SQL statement that will be sent to the remote database. For example:
EXPLAIN PLAN FOR SELECT DNAME FROM DEPT@REMOTE

SELECT OPERATION, OTHER FROM PLAN_TABLE

OPERATION OTHER
--------- -------------------------------------
REMOTE    SELECT A1."DNAME" FROM "DEPT" A1
Note the table alias and the double quotes around the column and table names.
For more information on EXPLAIN PLAN, see "Using EXPLAIN PLAN".
Partition views coalesce tables that have the same structure, but that contain different partitions of data. Partition views are supported for distributed databases where each partition resides on a separate database and the data in each partition has common geographical properties.

When a query is executed on a partition view, and the query contains a predicate that constrains the result set to a subset of the view's partitions, the optimizer chooses a plan that skips the partitions not needed for the query. This partition elimination takes place at run time, even though the execution plan references all partitions.
Partition views were the only form of partitioning available in Oracle7 Release 7.3. They are not recommended for new applications in Oracle8i. Partition views that were created for Oracle7 databases can be converted to partitioned tables by using the EXCHANGE PARTITION option of the ALTER TABLE statement.
There are circumstances under which a UNION ALL view enables the optimizer to skip partitions. The Oracle server that contains the partition view must conform to the following rules:

The PARTITION_VIEW_ENABLED initialization parameter is set to TRUE.
In a UNION ALL view, there are multiple select statements, and each of these is called a branch. A UNION ALL view is a partition view if each select statement it defines conforms to the following rules:

Each branch has a WHERE clause that defines the subset of data from the partition that is contained in the view.
The WHERE clause defines the subset of data from the partition that is contained in the view.
The branch contains no WHERE clause with a subquery, no GROUP BY clause, and no aggregate functions.
The SELECT list of each branch is * or an explicit expansion of "*". The FROM clause should be either the base table or a view of the base table that contains all the columns in the base table.
The column names and column datatypes for all branches in the UNION ALL view are exactly the same.
Partition elimination is based on column transitivity with constant predicates. The WHERE clause used in the query that accesses the partition view is pushed down to the WHERE clause of each of the branches in the UNION ALL view definition. For example:
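The query in question, inferred from the pushed-down deptno predicate discussed below, would be of this form:

```sql
SELECT * FROM emp_view WHERE deptno = 30;
```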
Where the view emp_view is defined as the following:

SELECT * FROM emp@d10 WHERE deptno=10
UNION ALL
SELECT * FROM emp@d20 WHERE deptno=20
UNION ALL
SELECT * FROM emp@d30 WHERE deptno=30
UNION ALL
SELECT * FROM emp@d40 WHERE deptno=40
The "deptno=30" predicate used in the query is pushed down to the queries in the UNION ALL view. For a resulting WHERE clause such as "deptno=10 AND deptno=30", the optimizer applies transitivity rules to generate an extra predicate of "10=30". This extra predicate is always false; thus, the table (emp@d10) need not be accessed.
Transitivity applies to predicates that conform to the following rules:

The predicates in the WHERE clause for each branch are of the form:

relation AND relation AND ...

where relation is of the form

column_name relop constant_expression

and relop is one of =, !=, >, >=, <, <=
To confirm that the system recognizes a partition view, check the EXPLAIN PLAN output. The following operations appear in the OPERATIONS column of the EXPLAIN PLAN output, if a query was executed on a partition view:
VIEW: this should include the optimizer cost in the COST column.

UNION-ALL: this should specify PARTITION in the OPTIONS column.

When an operation is a child of the UNION-ALL operation, and PARTITION does not appear in the OPTIONS column of the UNION-ALL operation, then the partition view was not recognized, and no partitions were eliminated. Make sure that the UNION ALL view adheres to the rules defined in "Using UNION ALL to Skip Partitions".
The following example shows the partition view customer partitioned into two partitions: the east database contains the East Coast customers, and the west database contains the West Coast customers.

The west database contains the following table:
CREATE TABLE customer_west (
  cust_no  NUMBER CONSTRAINT CUSTOMER_WEST_PK PRIMARY KEY,
  cname    VARCHAR2(10),
  location VARCHAR2(10)
);
The east database contains the following table:
CREATE TABLE customer_east (
  cust_no  NUMBER CONSTRAINT CUSTOMER_EAST_PK PRIMARY KEY,
  cname    VARCHAR2(10),
  location VARCHAR2(10)
);
The following partition view is created at the east database (you could create a similar view at the west database):
CREATE VIEW customer AS
SELECT * FROM customer_east WHERE location='EAST'
UNION ALL
SELECT * FROM customer_west@west WHERE location='WEST';
If you execute the following statement, then notice that the customer_west table in the west database is not accessed:
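Based on the plan output that follows, which shows the pushed-down predicate LOCATION='EAST', the statement executed was presumably of this form:

```sql
SELECT * FROM customer WHERE location = 'EAST';
```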
As shown in the EXPLAIN PLAN output, the optimizer recognizes that the customer_west partition need not be accessed:
SELECT LPAD(' ',LEVEL*3-3)||OPERATION OPERATION, COST, OPTIONS, OBJECT_NODE, OTHER
FROM PLAN_TABLE
CONNECT BY PARENT_ID = PRIOR ID
START WITH PARENT_ID IS NULL

OPERATION                 COST OPTIONS    OBJECT_NOD OTHER
------------------------- ---- ---------- ---------- -------------------------
SELECT STATEMENT             1
   VIEW                      1
      UNION-ALL                   PARTITION
         TABLE ACCESS        1    FULL
         FILTER
            REMOTE           1               WEST.WORLD SELECT "CUST_NO","CNAME",
                                             "LOCATION" FROM "CUSTOMER_WEST"
                                             "CUSTOMER_WEST" WHERE
                                             "LOCATION"='EAST' AND
                                             "LOCATION"='WEST'
Distributed queries within the same version of Oracle have the following restrictions:
Certain operations cannot be performed in a SELECT if all the tables in the select list are remote. In this case, you should create a view for the SELECT statement at the remote site.
The Transparent Gateways transparently access data from a non-Oracle system (relational databases, hierarchical databases, file systems, and so on), just as if it were another Oracle database.
When a SQL statement accesses data from non-Oracle systems, it is said to be a heterogeneous distributed SQL statement. To optimize heterogeneous distributed SQL statements, follow the same guidelines as for optimizing distributed SQL statements that access Oracle databases only. However, you must consider that the non-Oracle system usually does not support all the functions and operators that Oracle8i supports.
The Transparent Gateways tell Oracle (at connect time) which functions and operators they do support. If the other data source does not support a function or operator, then Oracle performs that function or operator. In this case, Oracle obtains the data from the other data source and applies the function or operator locally. This affects the way in which the SQL statements are decomposed and can affect performance, especially if Oracle is not on the same machine as the other data source.
You can use partition views with Oracle Transparent Gateways release 8 or higher. Make sure you adhere to the rules that are defined in "Using UNION ALL to Skip Partitions". In particular:
The datatypes of the columns in each branch of the UNION ALL view must be the same. Non-Oracle system datatypes are mapped onto Oracle datatypes. Make sure that the datatypes of each partition that resides in the different non-Oracle systems all map to the same Oracle datatype. To see how datatypes are mapped onto Oracle datatypes, execute a DESCRIBE statement in SQL*Plus.
You can improve performance of distributed queries in several ways:
In many cases, there are several SQL statements which can achieve the same result. If all tables are on the same database, then the difference in performance between these SQL statements might be minimal; but, if the tables are located on different databases, then the difference in performance might be more significant.
The cost-based optimizer uses indexes on remote tables, considers more execution plans than the rule-based optimizer, and generally gives better results. With the cost-based optimizer, performance of distributed queries is generally satisfactory. Only on rare occasions is it necessary to change SQL statements, create views, or use procedural code.
In some situations, views can be used to improve performance of distributed queries. For example:
On rare occasions, it can be more efficient to replace a distributed query with procedural code, such as a PL/SQL procedure or a precompiler program. This option is mentioned here only for completeness, not because it is often needed.