Oracle® Database Lite Oracle Lite Client Guide
Release 10.3

Part Number E12548-02

16 Improving SQL Query Performance for the Oracle Lite Database

The following sections describe the methods you can use to manage the performance of your SQL queries against the Oracle Lite database:

16.1 Determining Performance of Client SQL Queries With the EXPLAIN PLAN

To improve performance when accessing data in the local client Oracle Lite database, use the EXPLAIN PLAN command to determine how your SQL queries execute against the Oracle Lite database. To execute a SQL statement, Oracle might need to perform several steps. Each of these steps either physically retrieves rows of data from the database or prepares them in some way for the user issuing the statement. The combination of steps Oracle uses to execute a statement is called an execution plan, which includes an access path for each table that the statement accesses and an ordering of the tables (the join order) with the appropriate join method. The execution plan shows you exactly how Oracle Database Lite executes your SQL statement.

The components of an execution plan include the following:

The EXPLAIN PLAN command stores the execution plan chosen by the Oracle Database Lite optimizer for SELECT, UPDATE, INSERT, and DELETE statements.

For information on how to generate an explain plan, see Section 1.11, "Tuning SQL Statement Execution with the EXPLAIN PLAN", in the Oracle Database Lite SQL Reference.

16.2 Determine SQL Query Execution Through Oracle Database Lite Tracing

By setting the parameter OLITE_SQL_TRACE = YES in the polite.ini or polite.txt file on the client device, Oracle Database Lite generates a trace file named oldb_trc.txt that shows the following:

See Section 7.8, "Enable Tracing for the Oracle Lite Database" for full details.

16.3 Optimizing SQL Queries for the Oracle Lite Database

You can optimize the following SQL queries in your application:

The following sections provide tips on improving the performance of your application's SQL queries against the Oracle Lite database:

The tip examples use the database schema listed in Table 16-1:

Table 16-1 Database Schema Examples

Tables     Columns                          Primary Keys   Foreign Keys
LOCATION   LOC#, LOC_NAME                   LOC#           -
EMP        SS#, NAME, JOB_TITLE, WORKS_IN   SS#            WORKS_IN references DEPT (DEPT#)
DEPT       DEPT#, NAME, BUDGET, LOC, MGR    DEPT#          LOC references LOCATION (LOC#); MGR references EMP (SS#)

16.3.1 Optimizing Single-Table Queries

To improve the performance of a query that selects rows of a table based on a specific column value, create an index on that column. For example, the following query performs better if the NAME column of the EMP table has an index.

SELECT * 

FROM EMP 

WHERE NAME = 'Smith'; 

An index may hurt performance if the query selects more than about 10% of the rows, or if the selectivity of the indexed column is poor. For example, an index on JOB_TITLE may not be a good choice for a query such as the following.

SELECT *
FROM EMP
WHERE JOB_TITLE='CLERK'
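The effect of an index on a selective column can be observed directly. The sketch below uses SQLite as a stand-in for Oracle Lite (Oracle Lite's own EXPLAIN PLAN is not available here); SQLite's EXPLAIN QUERY PLAN plays an analogous role, and the table definition is a minimal assumption based on the sample schema.

```python
import sqlite3

# Illustrative sketch using SQLite (not Oracle Lite): an index on a selective
# column such as NAME lets the engine avoid a full table scan.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE EMP ("SS#" INTEGER PRIMARY KEY, NAME TEXT, '
             'JOB_TITLE TEXT, WORKS_IN INTEGER)')
conn.execute("CREATE INDEX emp_name ON EMP (NAME)")

# SQLite's EXPLAIN QUERY PLAN stands in for Oracle Lite's EXPLAIN PLAN here.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM EMP WHERE NAME = 'Smith'").fetchall()
print(plan)  # the plan references the emp_name index rather than a full scan
```

Dropping the index and re-running the EXPLAIN QUERY PLAN shows the plan fall back to a scan of EMP, which is the behavior the 10% rule of thumb above is guarding against.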

16.3.2 Optimizing Join Queries

The following can improve the performance of a join query (a query with more than one table reference in the FROM clause).

16.3.2.1 Create an Index on the Join Column(s) of the Inner Table

In the following example, the inner table of the join query is DEPT and the join column of DEPT is DEPT#. An index on DEPT.DEPT# improves the performance of the query. In this example, since DEPT# is the primary key of DEPT, an index is implicitly created for it. The optimizer detects the presence of the index and decides to use DEPT as the inner table. If there is also an index on the EMP.WORKS_IN column, the optimizer evaluates the cost of both orders of execution: DEPT followed by EMP (where EMP is the inner table) and EMP followed by DEPT (where DEPT is the inner table), and picks the less expensive execution plan.

SELECT e.SS#, e.NAME, d.BUDGET

FROM EMP e, DEPT d 

WHERE e.WORKS_IN = d.DEPT# 

AND e.JOB_TITLE = 'Manager'; 

16.3.2.2 Bypassing the Query Optimizer

Normally, the optimizer selects the best execution plan, an optimal order of tables to be joined. In case the optimizer is not producing the best execution plan, you can control the order of execution using the HINTS feature. For more information, see the Oracle Database Lite SQL Reference.

For example, if you want to select the name of each department along with the name of its manager, you can write the query in one of two ways. In the first example which follows, the hint /*+ordered*/ says to do the join in the order the tables appear in the FROM clause.

SELECT /*+ordered*/ d.NAME, e.NAME

FROM DEPT d, EMP e

WHERE d.MGR = e.SS# 


or:

SELECT /*+ordered*/ d.NAME, e.NAME 

FROM EMP e, DEPT d 

WHERE d.MGR = e.SS# 


Suppose that there are 10 departments and 1000 employees, and that the inner table in each query has an index on the join column. In the first query, the first table produces 10 qualifying rows (in this case, the whole table). In the second query, the first table produces 1000 qualifying rows. The first query accesses the EMP table 10 times and scans the DEPT table once. The second query scans the EMP table once but accesses the DEPT table 1000 times. Therefore, the first query performs better. As a rule of thumb, tables should be arranged from the smallest effective number of rows to the largest. The effective number of rows of a table in a query is obtained by applying the logical conditions that are resolved entirely on that table.
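The 10-versus-1000 arithmetic can be sketched as a back-of-envelope probe count. This is a hypothetical illustration of the driving-table rule, not Oracle Lite's actual cost model:

```python
# Back-of-envelope probe counts for the two join orders above, assuming
# 10 departments, 1000 employees, and an index on each inner table's
# join column (hypothetical numbers from the surrounding text).
n_dept, n_emp = 10, 1000

# Order 1: DEPT drives; each of its 10 rows probes the EMP index once.
probes_order1 = n_dept   # 10 index accesses into EMP
# Order 2: EMP drives; each of its 1000 rows probes the DEPT index once.
probes_order2 = n_emp    # 1000 index accesses into DEPT

print(probes_order1, probes_order2)  # 10 1000 -> order 1 is far cheaper
```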

In another example, consider a query to retrieve the social security numbers and names of employees in a given location, such as New York. According to the sample schema, the query would have three table references in the FROM clause. The three tables could be ordered in six different ways. Although the result is the same regardless of which order you choose, the performance could be quite different.

Suppose the effective number of rows of the LOCATION table is small; for example, select count(*) from LOCATION where LOC_NAME = 'New York' returns a small number. Based on the above rules, the LOCATION table should be the first table in the FROM clause, and there should be an index on LOCATION.LOC_NAME. Since LOCATION must be joined with DEPT, DEPT should be the second table, with an index on the LOC column of DEPT. Similarly, the third table should be EMP, with an index on the WORKS_IN column of EMP. You could write this query as:

SELECT /*+ordered*/ e.SS#, e.NAME 

FROM LOCATION l, DEPT d, EMP e 

WHERE l.LOC_NAME = 'New York' AND 

l.LOC# = d.LOC AND 

d.DEPT# = e.WORKS_IN;
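The three-table query can be exercised on sample data. The sketch below uses SQLite as a stand-in for Oracle Lite; SQLite has no /*+ordered*/ hint, but its CROSS JOIN syntax similarly constrains the planner to the written table order. The sample rows are invented for illustration:

```python
import sqlite3

# SQLite sketch of the LOCATION/DEPT/EMP query above. CROSS JOIN fixes
# the join order to the order the tables are written in, loosely mirroring
# the /*+ordered*/ hint. All data below is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE LOCATION ("LOC#" INTEGER PRIMARY KEY, LOC_NAME TEXT);
CREATE TABLE DEPT ("DEPT#" INTEGER PRIMARY KEY, NAME TEXT, LOC INTEGER);
CREATE TABLE EMP ("SS#" INTEGER PRIMARY KEY, NAME TEXT, WORKS_IN INTEGER);
CREATE INDEX loc_name ON LOCATION (LOC_NAME);
INSERT INTO LOCATION VALUES (1, 'New York'), (2, 'Dallas');
INSERT INTO DEPT VALUES (10, 'Sales', 1), (20, 'HR', 2);
INSERT INTO EMP VALUES (100, 'Smith', 10), (101, 'Jones', 20);
""")
rows = conn.execute("""
    SELECT e."SS#", e.NAME
    FROM LOCATION l CROSS JOIN DEPT d CROSS JOIN EMP e
    WHERE l.LOC_NAME = 'New York' AND l."LOC#" = d.LOC
      AND d."DEPT#" = e.WORKS_IN
""").fetchall()
print(rows)  # only the New York employee qualifies
```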

16.3.3 Optimizing with Order By and Group By Clauses

Various performance improvements have been made so that SELECT statements run faster and consume less cache memory. GROUP BY and ORDER BY clauses avoid sorting if a suitable index is available.

16.3.3.1 IN Subquery Conversion

The optimizer converts an IN subquery to a join when the select list in the subquery is uniquely indexed.

For example, the following IN subquery statement is converted to its corresponding join statement. This assumes that c1 is the primary key of table t2:

SELECT c2 FROM t1 WHERE 

c2 IN (SELECT c1 FROM t2);

becomes:

SELECT c2 FROM t1, t2 WHERE t1.c2 = t2.c1;
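The equivalence of the two forms can be checked on sample data. The sketch below uses SQLite in place of Oracle Lite, with invented rows; because c1 is a primary key (uniquely indexed), the join cannot introduce duplicates and both statements return the same rows:

```python
import sqlite3

# Sketch: the IN subquery and its join rewrite return the same rows when
# t2.c1 is uniquely indexed (here, a primary key). SQLite stands in for
# Oracle Lite; the data is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (c2 INTEGER);
CREATE TABLE t2 (c1 INTEGER PRIMARY KEY);
INSERT INTO t1 VALUES (1), (2), (2), (5);
INSERT INTO t2 VALUES (1), (2), (3);
""")
subquery = conn.execute(
    "SELECT c2 FROM t1 WHERE c2 IN (SELECT c1 FROM t2)").fetchall()
join = conn.execute(
    "SELECT c2 FROM t1, t2 WHERE t1.c2 = t2.c1").fetchall()
print(sorted(subquery), sorted(join))  # identical because c1 is unique
```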

16.3.3.2 ORDER BY Optimization with No GROUP BY

This eliminates the sorting step for an ORDER BY clause in a select statement if ALL of the following conditions are met:

  1. All ORDER BY columns are in ascending order or in descending order.

  2. Only columns appear in the ORDER BY clause. That is, no expressions are used in the ORDER BY clause.

  3. ORDER BY columns are a prefix of some base table index.

  4. The estimated cost of accessing by the index is less than the estimated cost of sorting the result set.

16.3.3.3 GROUP BY Optimization with No ORDER BY

This eliminates the sorting step for the grouping operation if GROUP BY columns are the prefix of some base table index.

16.3.3.4 ORDER BY Optimization with GROUP BY

When ORDER BY columns are the prefix of GROUP BY columns, and all columns are sorted in either ascending or in descending order, the sorting step for the query result is eliminated. If GROUP BY columns are the prefix of a base table index, the sorting step in the grouping operation is also eliminated.

16.3.3.5 Cache Subquery Results

If the optimizer determines that the number of rows returned by a subquery is small and the query is non-correlated, then the query result is cached in memory for better performance. For example:

select * from t1 where 

t1.c1 = (select sum(salary) 

from t2 where t2.deptno = 100);

16.3.4 Advanced Optimization Techniques for SQL Queries in Oracle Database Lite

Note:

This section is provided for administrators and developers who are already very familiar with optimization techniques for SQL queries. The material is advanced and not intended for beginners.

Unlike procedural languages—such as Java or C—SQL is a declarative language. It states what to do, but does not tell how to do it. This frees developers from writing navigation code to retrieve data. The responsibility of navigation falls on the database management system (DBMS).

The query optimizer—a component of the DBMS—is responsible to come up with an efficient plan to execute the query. Since there are several ways to perform a query, the query optimizer and query execution engine decide how to deliver the result in the quickest time. In a perfect world, the query optimizer will always be right and the database will always be infallible. However, this is not the case. The developer needs to think about the characteristics and peculiarities of the query optimizer. When you do run into performance issues, you can improve the performance as simply as creating some indexes, dropping additional indexes, or re-writing the query. Oracle Database Lite constantly improves the optimizer, so that you do not have to re-write the query.

This section introduces you to the Oracle Database Lite Query Optimizer, briefly covers the architecture of Oracle Lite database, and then provides details of the query compilation and optimization. Lastly, we provide tips on improving query performance. For further information, there are several excellent articles in technical journals that cover SQL query optimization in great technical detail. Some of these journals are listed in the reference section of this document.

16.3.4.1 Oracle Lite Database Application Architecture

The basic database architecture components from the point of view of the application developer are outlined below:

Figure 16-1 Components in applications using ODBC or JDBC

Components in applications using ODBC or JDBC
Description of "Figure 16-1 Components in applications using ODBC or JDBC"

Table 16-2 Architecture Components

Component          Description
<<                 Indicates a request and response information flow.
ODBC application   Typically a C or C++ application that issues ODBC API calls.
Java application   A piece of code written in Java that uses the JDBC API to manipulate the database.
ODBC driver        The driver that implements the ODBC API. It calls into the SQL runtime engine (SQLRT).
SQLRT              The SQL Runtime Engine, which implements SQL functionality using the capabilities of the underlying database engine.
DB engine and DB   The database engine and the database itself.


16.3.4.1.1 ODBC Application

The ODBC application is usually written in C, C++, or Visual Basic. Third-party tools, such as PowerBuilder, can also generate code that invokes ODBC. The ODBC driver implements ODBC API semantics and uses internal SQLRT APIs to call into SQLRT.

16.3.4.1.2 SQLRT

SQLRT, the Oracle Lite SQL engine, is implemented in olsql40.dll. SQLRT implements SQL functionality using the capabilities of the underlying database engine. This is covered in some detail in the following sections.

16.3.4.1.3 DB Engine

The Oracle Lite database engine implements the object kernel API (also known as OKAPI). The database engine implements an object view of the world; that is, it implements classes, attributes, and iterators.

  • Instead of creating a table—which contains a set of columns—you create a class containing a set of attributes.

  • Instead of creating a cursor on a table, you create an iterator on a class or a group.

All classes belong to a group, which is a collection of classes.

The DB Engine maintains its own set of Meta catalogs (Meta classes) to store declarative information about classes, attributes, and indexes. For example, see the table below:

Table 16-3 Database Engine Meta Classes

Class         Description
okClass       Information about every class
okAttribute   Information about every attribute in all classes
okIndex       Information about every index created in the database
okGroup       Information about every group in the database


Note:

All object kernel Meta classes belong to the MetaGroup group, which is case sensitive. The DB Engine is responsible for implementing the ACID properties of a transaction.

16.3.4.2 Overview of SQL Runtime

The SQL Runtime is responsible for providing a SQL interface to the database. It maps SQL entities to the appropriate object kernel entities and translates all SQL operations into a sequence of basic object kernel primitives. For example, a table is mapped to a class; its table columns are mapped to attributes within the class. The mapping between SQL operations and object kernel APIs is not defined here, as this is not the focus of the document.

Execution of a SQL statement involves the following steps:

  1. Compile

    The SQL statement is compiled into an internal representation that is easy and efficient to execute. A SQL statement can be one of the following:

    • DDL (data definition language): An example of a DDL statement is "CREATE INDEX emp_last_name ON employee (last_name, first_name)".

    • DML (data manipulation language) statement: Examples of DML statements are the SELECT, INSERT, UPDATE, DELETE, and COMMIT statements.

  2. Bind

    A SQL statement may contain markers (such as "?"), which are used as placeholders for parameters that can be supplied before execution of the statement. Binding sets the value for each marker in the SQL statement.

  3. Execute

    This is when a previously compiled statement is executed. Execution involves interpretation of the internal representation of the SQL statement and making all calls into the database engine to achieve the desired result. The following are examples of what the execution means for particular statements:

    • For an index creation statement, the index is created.

    • For an INSERT statement, the row is inserted into the table. In the object terminology, a new object is created.

    • For a SELECT statement, execution produces a result set, which is a set of rows available for retrieval. It is not necessary that all rows be materialized at this point. However, for a READ COMMITTED isolation level transaction, all rows are materialized in this step.

  4. Fetch

    This step is required for a SELECT statement. Every fetch call returns one row to the caller.

  5. Close

    Close the result set created in the execute step. Any remaining rows are discarded, and any resources tied to the processing of the statement are freed.
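The five steps above map naturally onto any call-level interface. A sketch using Python's DB-API over SQLite follows; in ODBC the corresponding calls would be SQLPrepare, SQLBindParameter, SQLExecute, SQLFetch, and SQLFreeStmt, while sqlite3 folds compile, bind, and execute into one call. The table and data are invented for illustration:

```python
import sqlite3

# Sketch of the compile/bind/execute/fetch/close life cycle using Python's
# DB-API over SQLite (a stand-in for the ODBC calls named in the text).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, job_title TEXT)")
conn.execute("INSERT INTO emp VALUES ('Smith', 'CLERK')")

cur = conn.cursor()
# Compile + bind + execute: the '?' marker is a placeholder bound to 'CLERK'.
cur.execute("SELECT name FROM emp WHERE job_title = ?", ("CLERK",))
# Fetch: each fetch call returns one row to the caller.
row = cur.fetchone()
# Close: discard any remaining rows and free statement resources.
cur.close()
print(row)  # ('Smith',)
```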

16.3.4.2.1 Compilation

Compilation is somewhat like translating a Java program into bytecode. In SQLRT, we translate a SQL statement into an internal data structure called a tree. The following are the steps SQLRT goes through to generate the execution tree, which determines the best method to execute that statement:

  1. Parsing: The input statement is scanned and is converted into an abstract syntax tree. Grammatically incorrect statements are flagged and any syntax error is returned.

  2. Normalization: The tree is walked, normalized and decorated. Transformation is carried out and semantic information is added to the tree nodes. Any tautologies are evaluated.

    For example, ((1 = 0) AND C1 > 5 OR C2 = 7) is reduced to (C2 = 7). Any semantic errors are caught during the tree traversal, such as data type mismatches in expressions or SQL operations, references to nonexistent tables or columns, unsupported SQL operations, and so on.

  3. View expansion: Any references to views are expanded in line and the view tree is walked.

  4. View Optimization: If possible, the view expansions are collapsed into the main queries. For example, the statement "SELECT * FROM v1,v2 where v1.c1=v2.c2" is resolved to a query on the base tables in v1 and v2. The transformation takes place on the query tree. This merging may not be possible. For example, if a view selects aggregate functions (COUNT, AVG, and so on.) or contains UNION or MINUS operators, it cannot be collapsed.

  5. Subquery optimization: The query may be re-written to eliminate the subquery. This technique is called subquery un-nesting. The tables and filter conditions in the WHERE clause are moved to the parent query block. This is possible only when the subquery does not contain any aggregates, UNION, or MINUS operations and SQLRT can make sure that the subquery does not return any duplicate rows.

  6. Transitive Closure of Predicates: Predicates are analyzed and extra inferences are added to the WHERE clause, which helps the optimizer in selecting the best execution plan.

  7. Predicate Push: The predicates are pushed down from top to bottom, which helps queries on top of views. When a view contains UNION, MINUS, or GROUP BY clauses, it helps to push the filtering condition down to the source of the data, the base tables.

  8. Execution Plan Generation: The query is now analyzed to generate the best execution plan, which is based on a cost-based approach.

  9. Query Execution: The execution plan generated is used to execute the query.

16.3.4.2.2 Query Tree Transformations or Query Re-write Examples

These are examples of query tree transformations or query re-writes.

View Optimization Example for View Replacement or Merging

Consider the following statements:

  1. SQL> CREATE VIEW v_dept_emp AS SELECT emp.*, dept.dname, loc FROM emp, dept WHERE emp.deptno=dept.deptno;

  2. SELECT * FROM v_dept_emp WHERE loc = 'DALLAS';

The query tree transformation process substitutes the definition of view v_dept_emp into the select query and collapses the query into single level query. The query then becomes as follows:

SELECT  emp.*, dept.dname, dept.loc FROM emp, dept WHERE emp.deptno=dept.deptno and loc = 'DALLAS'

Note:

The final query does not refer to the view.

View Expansion and Predicate Push

Consider the following example:

SQL> CREATE VIEW v_sal_expense (dno, name, total_sal) AS SELECT dept.deptno, dept.dname, sum(sal)  FROM emp, dept WHERE emp.deptno=dept.deptno group by dept.deptno, dname;

SELECT  * FROM v_sal_expense WHERE  total_sal > 10000;

Since the query involves aggregation, it cannot be collapsed into the main query and the query after re-write is as follows:

SELECT  * FROM (
   SELECT dept.deptno, dept.dname, sum(sal)  total_sal
   FROM emp, dept 
   WHERE emp.deptno=dept.deptno 
   group by dept.deptno, dname)  temp_view
WHERE temp_view.total_sal > 10000;

Note:

The predicate total_sal > 10000 was not pushed into the inner query block as total_sal refers to an aggregate sum(sal) column in the view definition.

Consider the following query on the same view:

SELECT  * FROM v_sal_expense WHERE  dno = 10;

The query after the re-write is as follows:

SELECT  * FROM (
   SELECT dept.deptno, dept.dname, sum(sal)  total_sal
   FROM emp, dept 
   WHERE emp.deptno=dept.deptno and
   dept.deptno = 10
   group by dept.deptno, dname)  temp_view
WHERE dno=10;

The predicate dept.deptno = 10 is pushed down into the nested view expansion, which demonstrates the Predicate Push optimization. The aggregation is performed for the department number 10 only; therefore, this query performs better.
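The equivalence of the original view query and its predicate-pushed rewrite can be checked on sample data. The sketch below uses SQLite in place of Oracle Lite, with invented rows:

```python
import sqlite3

# Sketch: the query on the aggregating view and its predicate-pushed
# rewrite return the same result. SQLite stands in for Oracle Lite;
# the data is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dept (deptno INTEGER PRIMARY KEY, dname TEXT);
CREATE TABLE emp (ename TEXT, sal INTEGER, deptno INTEGER);
INSERT INTO dept VALUES (10, 'SALES'), (20, 'HR');
INSERT INTO emp VALUES ('A', 100, 10), ('B', 200, 10), ('C', 300, 20);
CREATE VIEW v_sal_expense (dno, name, total_sal) AS
  SELECT dept.deptno, dept.dname, SUM(sal)
  FROM emp, dept WHERE emp.deptno = dept.deptno
  GROUP BY dept.deptno, dname;
""")
via_view = conn.execute(
    "SELECT * FROM v_sal_expense WHERE dno = 10").fetchall()
pushed = conn.execute("""
    SELECT * FROM (
        SELECT dept.deptno, dept.dname, SUM(sal) total_sal
        FROM emp, dept
        WHERE emp.deptno = dept.deptno AND dept.deptno = 10
        GROUP BY dept.deptno, dname) temp_view
""").fetchall()
print(via_view, pushed)  # both: [(10, 'SALES', 300)]
```

Aggregating only department 10's rows, rather than every department and then filtering, is exactly the saving the Predicate Push optimization buys.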

Subquery Transformation

Consider the following query:

SELECT * FROM emp WHERE emp.deptno IN (SELECT deptno 
          FROM dept WHERE loc = 'DALLAS');

Since the subquery selects a unique key column (deptno), it can be converted into a join query. In general, a join query provides more flexibility in optimization and performs better. This query could be transformed as follows:

SELECT emp.* FROM emp, dept 
   WHERE emp.deptno = dept.deptno AND dept.loc = 'DALLAS';

Note:

The above subquery is a non-correlated subquery; that is, the subquery does not make a reference to columns from the tables in the outer query. For a non-correlated query, Oracle Database Lite does not always transform it to a join query. Instead, sometimes it decides to cache the query result into memory and perform the IN operation at run-time. A correlated subquery, if it meets the correctness requirements, is always transformed into a join query, as follows:
SELECT * FROM emp WHERE emp.deptno IN (SELECT deptno FROM dept WHERE loc = 'DALLAS' AND emp.deptno = dept.deptno);

Which is transformed into the following:

SELECT emp.* FROM emp, dept WHERE emp.deptno = dept.deptno AND dept.loc = 'DALLAS';

16.3.4.3 Execution Plan Generation

Execution plan generation is the last step of query compilation. It is the responsibility of the query optimizer to find the least expensive plan. It generates all plausible execution plans and picks the least expensive plan. As the number of tables in a query increases, the cost of evaluating all possible orders of execution increases exponentially. The optimizer uses its own heuristics to reduce the search space. The query optimizer considers only I/O costs for comparing the different execution plans. It does not consider the CPU time used to perform different operations. The I/O cost is computed based on the statistical information available to it; therefore, the quality of cost estimation depends upon the quality of statistics available.

16.3.4.3.1 Statistics

The Oracle Database Lite engine maintains the following statistics at all times. You do not have to run a separate command to update the statistics.

Table 16-4 Oracle Database Lite Engine Statistic Parameters

Parameter Description

npg

Number of data pages allocated to each table.

nrows

Number of rows in the table.

ndk

For each index, number of distinct keys.

nrangeSize

For each index, OKAPI supports an API to estimate the number of rows selected for a given range of key values.

nMaxKey

Maximum value of a key (an OKAPI call is used to estimate it).

nMinKey

Minimum value of a key (an OKAPI call is used to estimate it).


Selectivity Factor

To estimate I/O cost, the optimizer estimates the number of pages that will be read or written to satisfy the query. It evaluates the disk I/O costs for different execution plans before selecting the best one. It assigns a selectivity factor to each predicate (a predicate is also called a factor in Boolean algebra), which is the expected fraction of rows that satisfy the predicate. That is, the selectivity factor is defined as follows:

Selectivity factor = (expected-number-rows)/(total-number-of-rows)

The current values of the selectivity factor are as follows:

Note:

The values are subject to change without any notice.

Table 16-5 Selectivity Factor Values

Condition   Example                                      Default   With Index
Equality    Name = 'Smith'                               1/5       1/ndk
Range       C1 > 5                                       1/2       Pretty good estimate
Between     C1 between 4 and 10, or C1 > 4 and C1 < 10   1/3       Pretty good estimate
Is Null     C1 is NULL                                   1/10      1/10
Like        Name like 'Sm%'                              1/3       Estimate*


Like: A like predicate is transformed into a range and like. The range predicate is then appropriately optimized. For example, Name like 'S%' is converted into Name like 'S%' AND Name >= 'S' AND Name < 'T'. Now the range ('S', 'T') for Name can be used to calculate the selectivity.
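The LIKE-to-range transformation can be checked on sample data: the derived range selects exactly the rows the prefix pattern matches (assuming plain ASCII data). SQLite stands in for Oracle Lite below, with invented names:

```python
import sqlite3

# Sketch: a LIKE 'S%' predicate selects the same rows as the derived range
# Name >= 'S' AND Name < 'T' (ASCII data assumed); it is this range form
# that lets an index on Name be used. Data is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT)")
conn.executemany("INSERT INTO emp VALUES (?)",
                 [("Smith",), ("Scott",), ("Adams",), ("Turner",)])
like_rows = conn.execute(
    "SELECT name FROM emp WHERE name LIKE 'S%'").fetchall()
range_rows = conn.execute(
    "SELECT name FROM emp WHERE name >= 'S' AND name < 'T'").fetchall()
print(sorted(like_rows), sorted(range_rows))  # same two rows
```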

Not Equal: Selectivity factor for not equal is as follows: (1-Selectivity factor for the equal operator).

Caveat With Bind Variables

When bind variables are present, then the selectivity factor for "range", "between" and "like" cannot be correctly estimated and the default selectivity factor is used.

16.3.4.3.2 Access Methods

An important component of an execution plan is the "access method" used for each table involved in the execution plan. The Oracle Lite database engine supports the following access methods:

  1. Full table scan: All pages of the table are searched. Therefore, the cost of retrieval is equal to npg, the number of pages in the table.

  2. Index access method: A key value or key range—such as, price between (10,15)—is used to retrieve the qualifying rows. The key or key range is used to find the row-ids of the matching rows. Each row-id uniquely identifies the location of the row in the database. The rows are fetched in increasing or decreasing order of the key, which is useful when optimizing queries containing order by or group by clauses.

Cost of Access Methods

The I/O Cost can be computed in terms of the following parameters:

Table 16-6 I/O Access Method Cost

Parameter   Description
npg         Number of data pages.
nrows       Number of rows in the table.
h           Height of the index tree (also called its depth).
nlf         Number of leaf pages in the index tree.
sf          Expected fraction of rows selected; between 0 and 1.


Since the values for "h" and "nlf" are not available, they are estimated based on nrows and the estimated key size.

Cost of a Full Table Scan

The cost of a full table scan is the number of data pages, as follows: Cost = npg.

Cost of an Index Scan

The cost of an index scan is approximated to be as follows:

Cost = the number of index pages read + the number of data pages read

Where: number of index pages read = (h-2) + ceil(nlf * sf). The value for h is calculated based on the estimated key size and number of rows.

It is assumed that the root of the index tree is always in memory; thus, the cost of reading the root page is ignored. Assuming that only a small number of rows are selected by the index access method, we approximate the number of leaf pages read to be one. This is done since we do not have information about nlf. Even for a range scan, we approximate it to be one.

For a primary key index or for an index with ndk/nrows close to one, we assume the data to be clustered on the key column values and we estimate the number of data pages read as follows:

Number of data pages read = ceil(sf * npg)

If the index is not a primary key index, then there is a good chance that consecutive key values will not be on the same data page. Each new row access can potentially read a new page. The number of data pages read will be between sf * npg and sf * nrows. We use the following formula, the geometric mean of the two bounds, as an approximation to the actual number of data pages read:

Number of data pages read = ceil(sf * sqrt(npg * nrows))

Therefore, the cost of index access is as follows:

  • For a clustered index, the cost is (h-1) + ceil(sf * npg).

  • For a non-clustered index, the cost is (h-1) + ceil(sf * sqrt(npg * nrows)).
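The two cost formulas can be written out directly. The sketch below reads the source's sqrt(npg, nrows) notation as the geometric mean sqrt(npg * nrows), which does lie between npg and nrows as the text requires; the example numbers are hypothetical:

```python
from math import ceil, sqrt

# Sketch of the index-access cost formulas above; h, npg, nrows, sf are as
# defined in Table 16-6. sqrt(npg, nrows) is interpreted here as the
# geometric mean sqrt(npg * nrows) -- an assumption, not a confirmed detail.
def clustered_index_cost(h, sf, npg):
    # (h-1) index pages plus the clustered data pages actually read
    return (h - 1) + ceil(sf * npg)

def nonclustered_index_cost(h, sf, npg, nrows):
    # data-page estimate between sf*npg and sf*nrows via the geometric mean
    return (h - 1) + ceil(sf * sqrt(npg * nrows))

# Example: a 2-level index, 1% selectivity, 100 pages, 10,000 rows.
print(clustered_index_cost(2, 0.01, 100))            # 1 + ceil(1)  = 2
print(nonclustered_index_cost(2, 0.01, 100, 10000))  # 1 + ceil(10) = 11
```

The gap between the two results (2 versus 11 page reads) is why the optimizer treats clustered and non-clustered indexes so differently.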

16.3.4.3.3 Single Table I/O Cost

To find the optimal execution plan for a single table query, the cost of each possible access method is evaluated and the least expensive access method is picked. For example:

SELECT * FROM T1 where C1 between 1 and 5 AND C2 > 5 and C2 < 100;

Assuming that the indexes exist on C1 and C2, then the optimizer estimates the selectivity for predicates "C1 between 1 and 5" and "C2 > 5 and C2 < 100". It then computes the I/O cost for retrieving the rows using a full table scan, an index scan on C1, and an index scan on C2. The access method that produces the least amount of I/O is chosen.

Interesting Order Optimization

For a single table query that contains an order by or group by clause, the interesting order optimization technique is used to influence which access method is chosen. The result set size and sorting cost are estimated. Sorting can be avoided if an index is available that can return the rows in the right order. If an execution plan that avoids the sort is less costly, it is chosen. The size of the result set is given by the following:

Number of rows in the result set = nrows * min(selectivity values for each predicate in the where clause)

16.3.4.3.4 Join Query Optimization

The join query optimization involves evaluation of a large number of query execution plans. The number of possible plans increases exponentially with the number of tables. The following query illustrates this:

SELECT e.empno, e.ENAME, d.dname
FROM EMP e, DEPT d
WHERE e.deptno = d.DEPTNO
AND e.JOB = 'MANAGER'
AND e.sal > 2000;

Here, both possible orders of execution exist: (EMP, DEPT) and (DEPT, EMP). If EMP is chosen as the driving table, then the rows qualifying (e.JOB = 'MANAGER' AND e.sal > 2000) are retrieved one by one from the EMP table. The optimizer considers the three possible access methods for the EMP table, as follows:

  1. Sequential scan of EMP table.

  2. Index access using an index on EMP.JOB, if one exists.

  3. Index access using index on EMP.SAL if one exists.

The optimizer picks the method that produces the least amount of I/O. Based on the selectivity factor assigned to each predicate, it estimates the number of rows selected from the EMP table. Then, it estimates the cost of retrieving a set of matching rows for each outer row in the EMP table. The total cost of execution using this order is as follows:

Cost = npg_emp + est_row_emp * cost_per_row_dept

Where:

Table 16-7

Parameter           Description
est_row_emp         Estimated number of rows fetched from the EMP table.
cost_per_row_dept   Cost of index access into DEPT to retrieve the matching department rows for each row fetched from EMP.


The same calculation is repeated for the order DEPT, EMP. Whichever order produces the lowest cost is chosen. As the number of predicates and tables increases, the cost of computing the different possibilities grows exponentially. To reduce the compilation time, Oracle Lite uses aggressive heuristics to prune the search space, thereby sometimes landing on a sub-optimal execution plan. Also, unreliable statistics values, skewed data, and the unavailability of selectivity factors for non-index columns can contribute to sub-optimal execution plan generation.

The following are the main tasks performed during a join query optimization:

  1. The optimizer isolates local predicates (predicates that refer to columns from one table only) from join predicates. In addition, the optimizer estimates the effective table sizes by applying the selectivity factors of the local predicates to each table. Whenever an index is available, the calculation of the selectivity factor is fairly accurate. Oracle Lite assumes that the data is uniformly distributed; when the data is skewed, the estimate can go wrong and the execution plan chosen may not be optimal. When an index is not available, it uses the default selectivity for the computation.

  2. A driving table—the table with the smallest effective cardinality—is picked first. Its optimal access method is picked. The table is put in the set of "outer" tables.

  3. The query is examined to discover which possible tables can be joined to the tables in the current outer tables. The cross product is not considered. The I/O cost is estimated for all possible joins. The least costly join is chosen and is added, along with the chosen table, to the outer table set. The step is repeated until it has selected all tables in the query. By the end, it has computed the execution order and access methods for each table in the query.

  4. The optimizer saves the current execution plan and picks a new driving table, whose effective cardinality is the second lowest. It repeats step 3 and selects the least expensive execution plan of the two plans. Again, it repeats step 3 with the third, fourth and fifth smallest table—always keeping a copy of the current least expensive execution plan.

  5. The optimizer creates hints for when to create an index for intermediate results of a view. This is useful when joining a view that is not collapsed to another table or view.

  6. When two tables are outer joined, the master table has to be scanned before the slave table (the table whose joining column is marked with "+").
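The greedy search in steps 2 through 4 can be sketched as follows. This is a minimal illustration, not Oracle Lite code: the table names, cardinalities, and cost function are hypothetical, and the retry with alternative driving tables from step 4 is omitted.

```python
# Illustrative sketch of greedy join-order selection (steps 2 and 3 above).
# All statistics and the cost function are hypothetical.

def greedy_join_order(tables, join_cost):
    """tables: dict of name -> effective cardinality.
    join_cost: function(outer_list, candidate) -> estimated I/O cost."""
    # Step 2: the driving table is the one with the smallest effective cardinality.
    order = [min(tables, key=tables.get)]
    remaining = set(tables) - set(order)
    # Step 3: repeatedly add the cheapest join until every table is placed.
    while remaining:
        best = min(remaining, key=lambda t: join_cost(order, t))
        order.append(best)
        remaining.remove(best)
    return order

# Hypothetical schema: EMP joins DEPT on deptno; BONUS joins EMP.
cards = {"EMP": 50, "DEPT": 4, "BONUS": 10}
joinable = {("DEPT", "EMP"), ("EMP", "DEPT"), ("EMP", "BONUS"), ("BONUS", "EMP")}

def cost(outer, cand):
    # A candidate with no join predicate to the outer set would be a
    # cross product; rule it out with an infinite cost.
    if any((o, cand) in joinable for o in outer):
        return cards[cand]
    return float("inf")

print(greedy_join_order(cards, cost))  # ['DEPT', 'EMP', 'BONUS']
```

DEPT is picked first because it has the smallest effective cardinality, and BONUS can only join once EMP is in the outer set.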

Interesting Order Optimization

An interesting order optimization eliminates the final sorting step for queries containing ORDER BY or GROUP BY clauses. If a suitable index exists that can eliminate the sorting step, then the cost is estimated in the following two ways:

  1. Sorting + the best execution plan.

  2. Pick a driving table that has columns from the ORDER BY or GROUP BY clause, such that an index can be used to retrieve the data in the required order, and estimate that execution plan's cost.

The least expensive plan is then chosen.
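As a worked example, the comparison might look like this (the plan costs below are illustrative assumptions, not values Oracle Lite would produce):

```python
# Comparing the two cost estimates for an ORDER BY query, as described
# above. All numbers are hypothetical.

best_plan_io = 200            # cheapest plan ignoring the ORDER BY
sort_cost = 350               # cost of sorting that plan's result
index_ordered_plan_io = 420   # plan driven by an index on the ORDER BY columns

cost_with_sort = best_plan_io + sort_cost   # way 1: best plan + final sort
cost_index_order = index_ordered_plan_io    # way 2: index supplies the order

chosen = min(cost_with_sort, cost_index_order)
print(chosen)  # 420: the index-ordered plan wins for these numbers
```

Even though the index-ordered plan reads more pages, it wins here because it avoids the 350-unit sort.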

16.3.4.4 Query Execution Engine

The SQL Runtime engine relies on the database engine to implement simple data filtering, access methods, and sorting. Execution of a single-table query involves the following steps:

  1. Decide if a temporary table is necessary to store the result. For a transaction at the READ COMMITTED isolation level, a temporary table is required to store the result. While the result is being materialized, all other transactions are blocked from committing, to preserve read-committed semantics. A temporary table is also necessary when sorting is required, because the DBE can only sort full tables.

  2. Create the iterator on the table, pushing the maximum number of filter conditions down to the database engine. This way, a smaller result set is returned to SQLRT.

  3. If there are any complex filters that cannot be evaluated by the DBE, evaluate them now and reject any rows that do not qualify. Examples of complex filters are SQL functions and subquery operators, such as UPPER(NAME) = 'SMITH' or zipcode IN (SELECT zipcode FROM TX WHERE ….).

  4. If a temporary table was created, store all qualifying rows in it. Once all rows are inserted into the temporary table, the result is returned from that table.
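The steps above can be sketched as follows. This is an illustrative Python sketch, not the Oracle Lite implementation; the rows, filters, and sort key are made up.

```python
# Sketch of single-table execution: simple filters are pushed down to the
# scan, complex residual filters are applied by the runtime, and a
# temporary table materializes the result when sorting is required.

rows = [
    {"name": "smith", "zipcode": 75001},
    {"name": "JONES", "zipcode": 10001},
    {"name": "SMITH", "zipcode": 75002},
]

def table_scan(rows, pushed_filter):
    # Stands in for the database-engine iterator with pushed-down filters.
    return (r for r in rows if pushed_filter(r))

pushed = lambda r: r["zipcode"] >= 70000            # simple: engine-evaluable
residual = lambda r: r["name"].upper() == "SMITH"   # complex: SQL function

# Steps 2 and 3: scan with pushdown, then apply the residual filter.
result = [r for r in table_scan(rows, pushed) if residual(r)]

# Steps 1 and 4: sorting forces materialization into a "temporary table",
# since the DBE can only sort full tables.
needs_sort = True
if needs_sort:
    result = sorted(result, key=lambda r: r["zipcode"])

print([r["zipcode"] for r in result])  # [75001, 75002]
```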

16.3.4.4.1 Join Query Execution

The SQLRT implements the join operation by executing the query in a nested loop fashion. The optimizer has already picked the optimal order of tables. Execution begins with the first (outermost) table in the list: an iterator is created on this table and a qualifying row is retrieved. The next table is picked from the list and a new iterator is created using the qualifying values from the row already fetched, and a qualifying row is retrieved from this second table. If there are more tables in the list, the process continues until the end of the list is reached, which produces the first matching row for the SELECT statement. The next matching row is then sought from the last table; if no further qualifying rows are found, execution returns to the previous table in the list and the process repeats. Each advance to the next table in the list creates a new iterator, and each time a table yields no more matching rows, its iterator is closed and execution returns to the previous table. When all rows in the outermost table are exhausted, all result rows have been found. The execution is analogous to nested loops in a programming language, which is why it is called a nested loop join.
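A minimal sketch of the nested loop join described above (illustrative only; the tables, rows, and join condition are invented):

```python
# Nested-loop join: for each row of the driving (outer) table, open an
# iterator on the next table and emit every matching combination.

emp = [("Smith", 10), ("Jones", 20), ("Adams", 10)]
dept = [(10, "SALES"), (20, "RESEARCH")]

def nested_loop_join(outer, inner, match):
    for o in outer:          # driving table, in optimizer-chosen order
        for i in inner:      # a new "iterator" per qualifying outer row
            if match(o, i):
                yield o + i  # a fully matched row

result = list(nested_loop_join(emp, dept, lambda e, d: e[1] == d[0]))
print(result)
# [('Smith', 10, 10, 'SALES'), ('Jones', 20, 20, 'RESEARCH'),
#  ('Adams', 10, 10, 'SALES')]
```

With more than two tables, the inner loop body would itself open an iterator on the third table, and so on, exactly as the prose describes.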

16.3.4.4.2 Nested View Execution

Oracle Database Lite does not distinguish between dynamic views (a query block in the FROM clause) and view tables used in the FROM clause; both are processed in the same way. If a nested view cannot be merged with the containing query and is not the first to be picked in the execution order, then SQLRT materializes the view into a temporary table and creates a temporary index on the joining column(s). The index is used for joining outer tables with the view. Since the index is created at runtime, the optimizer does not have access to selectivity factors for view columns; the order chosen by the optimizer is based on default selectivity factors and the estimated number of rows in the view.

16.3.4.5 Optimization Tips

This section provides guidelines for debugging performance problems and helps you design better queries for the Oracle Lite database. Query optimization refers to the techniques used to find the best execution plan.

16.3.4.5.1 Index Access Method

An index access method can be used for equality predicates as well as range predicates. A predicate must have one of the following forms to be considered for index access:

  • column_1 = value1

  • column_1 rel-op value

  • column_1 = value1 AND column_2 = value2 AND …

  • column_1 = value1 AND column_2 = value2 AND … column_n rel-op value-n

Where:

  • rel-op—One of "=", ">", "<", ">=", "<="

  • column_n—Prefix columns of an index key. The value is an expression that can be evaluated to a constant. For example, UPPER(name) = 'TOM' cannot use an index access method, because UPPER(name) is not a column name but an expression on the column name. In contrast, name = UPPER('TOM') can be used as an index predicate, because its right-hand side is a constant expression.

Note:

You should not create indexes on a column that has many duplicate values; that is, a column for which the ratio of nrows to ndk (the number of distinct keys) is large.
16.3.4.5.2 Identifying The Bottleneck

The hardest part of solving a query optimization problem is identifying the performance bottleneck. Where is the CPU spending its time? A typical customer query contains multiple tables, predicates, and subqueries, and may also contain outer joins and UNION operations. We recommend the following steps to debug the problem:

  1. Replace all synonyms with base objects. Expand all views with their corresponding view definitions. Consider how SQLRT processes the query and carries out all possible transformations. Identify all query blocks, where each query block contains one SELECT statement.

  2. Experiment with different query blocks one by one and find the slowest performing query block.

  3. Optimize the problematic query block by examining the indexes that already exist on the columns involved in the query block. Determine whether creating new indexes, or dropping some, improves performance. Check the order of tables selected by the optimizer (see the "Tools" section) and consider whether the query would run faster with a different execution order. You can use a HINT to force the execution order of tables.

  4. Once the bottleneck is resolved, repeat the process for the next bottleneck.

Tools

In Oracle Database Lite, you can dump query execution plan information by enabling SQL tracing, which you do by including the following line in the polite.ini configuration file:

OLITE_SQL_TRACE = yes

This creates an oldb_trc.txt file in the current working directory of the process accessing the database. If the file already exists, it is opened and the dump information is appended. The dump contains the following basic information:

  1. The text of the SQL statement and of every view in the SQL statement.

  2. The time taken to compile the query.

  3. The value of each bind variable.

  4. Order of joining the tables.

  5. Temporary tables created.

  6. Access method used for each table. For an index access method, it prints the index name and index number. If the index name is blank, then you can use idxinfo.exe to discover the index information.

16.3.4.5.3 Single Table Query Blocks

For a single-table query, the query optimizer does not select the best available access method; however, it does collect statistics for all available indexes. The job of selecting the best index is left to the DBEngine, which uses a rule-based approach to select the appropriate index. Ensure that an index is available on the most selective columns, as shown in the following example:

SELECT * FROM EMP WHERE NAME = 'Smith' AND EmpNo BETWEEN 1 AND 1000;

Assuming that the total number of employees is a few thousand, we would expect the predicate NAME = 'Smith' to return fewer rows than the predicate EmpNo BETWEEN 1 AND 1000. Therefore, we should create an index on the NAME column.
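The back-of-envelope estimate behind this recommendation can be made concrete. The statistics below (row count, distinct names, range bounds) are assumptions for illustration, not values computed by Oracle Lite:

```python
# Rough selectivity estimates for the example query, assuming uniformly
# distributed data. All statistics are hypothetical.

nrows = 5_000        # total EMP rows (a few thousand)
ndk_name = 4_000     # distinct values in the NAME column

# Equality on an indexed column: roughly nrows / ndk matching rows.
est_name = nrows / ndk_name           # NAME = 'Smith'

# Range predicate over empnos assumed to span 1..5000.
est_empno = nrows * (1000 / 5000)     # EmpNo BETWEEN 1 AND 1000

print(round(est_name, 2), round(est_empno))  # 1.25 1000
```

NAME = 'Smith' is estimated to match about one row versus roughly a thousand for the range predicate, which is why the NAME index is the one worth creating.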

Note:

Since the DBEngine is following a rule-based approach and EMPNO is a primary key column, it may not select the index on the NAME column.
16.3.4.5.4 Query Blocks Containing Multiple Tables

Due to limitations in the availability of statistics, and the inherent assumptions made about the distribution of data, the execution plan chosen is not always optimal. Also, when suitable indexes are not present, the Oracle Lite database engine uses a sequential scan instead of an index access method. To illustrate the importance of the index, consider the following query:

SELECT e.empno, e.ENAME, d.dname FROM EMP e, DEPT d
WHERE e.deptno = d.DEPTNO AND e.JOB = 'MANAGER' AND e.sal > 2000;

cost = npg_emp + est_row_emp * cost_per_row_dept

Let us assume that EMP has 1000 rows with 50 rows per page; that is, npg_emp = 20. Let us also assume that est_row_emp is 50, npg_dept = 10, and the cost of an index access into DEPT is 2. The cost calculation is tabulated as follows:

Table 16-8 Cost Calculation Tabulation

npg_emp   est_row_emp   npg_dept   Access Method (dept)   cost_per_row_dept   cost
-------   -----------   --------   --------------------   -----------------   ---------
20        50            10         Sequential             10                  520 pages
20        50            10         Indexed                2                   120 pages
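The entries in Table 16-8 follow directly from the cost formula; a simple arithmetic check using the assumed statistics above:

```python
# cost = npg_emp + est_row_emp * cost_per_row_dept, for both DEPT access
# methods from Table 16-8.

npg_emp = 20       # EMP pages (1000 rows at 50 rows per page)
est_row_emp = 50   # estimated qualifying EMP rows

for access, cost_per_row_dept in [("Sequential", 10), ("Indexed", 2)]:
    cost = npg_emp + est_row_emp * cost_per_row_dept
    print(access, cost)
# Sequential 520
# Indexed 120
```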


The cost of execution changes dramatically when an index is present. Therefore, the biggest challenges in improving the performance of a query in an Oracle Lite database are as follows:

  1. Find the right set of indexes.

  2. Determine the optimal execution order for the tables.

There is no easy answer to the above tasks; they require a deep understanding of the query that you are writing. The first task is to identify the driving table, which is usually a table with many conditions involving constant values. In the example above, it is most likely the EMP table. On the other hand, if there are only a couple of rows in the DEPT table, then DEPT is a good candidate for the driving table. Once you select a driving table, the next task is to determine which tables can be joined to it, and which indexes will help in joining the current result set to the new table. Try joining these two tables and check whether the execution time is reasonable. Then add the third table and repeat the process. To force a specific join order, you can use the HINT clause supported by Oracle Database Lite. Refer to the Oracle Database Lite SQL Reference for more information.

16.3.4.5.5 Known Limitations

  1. In the process of finding the maximum and minimum values for an index key, the optimizer can spend too much time if there are a large number of duplicate values near the maximum and minimum values.

  2. Sorting cost calculation is arbitrary.

  3. In the presence of host variables, the selectivity for a range or LIKE predicate cannot be correctly estimated.

16.3.4.6 Glossary

  • API - Application Programming Interface.

  • ACID - The transaction properties of atomicity, consistency, isolation, and durability.

  • Correlated Subquery - A subquery that references columns of tables that are not present in its own FROM clause; that is, tables from an enclosing query.

  • Cross Product - Joining two tables without any joining condition produces a cross product. The cross product of a table containing m rows with another table containing n rows produces (m x n) rows.

  • OKAPI - Object Kernel Application Program Interface, implemented by the Oracle Lite database engine, which you can use to program your database application.

  • Predicate - A boolean condition that evaluates to "true", "false", or unknown. Examples: (Emp.Salary > 50000), (Dept.DepNo = 10 AND Emp.HireDate > '17-Nov-2001').

  • SQLRT - The Oracle Lite SQL Runtime engine, responsible for implementing SQL functionality on top of the Oracle Lite database engine.

16.3.4.7 References

  1. Selinger, P.G., Astrahan, M.M., Chamberlin, D.D., Lorie, R.A., Price, T.G. "Access Path Selection in a Relational Database System." In Readings in Database Systems, Morgan Kaufmann. This is a classic paper and a must-read for anyone who wants to learn about query optimization.

  2. Chaudhuri, Surajit. "An Overview of Query Optimization in Relational Systems." Microsoft Research.