5 Query Optimization

TimesTen has a cost-based query optimizer that ensures efficient data access by automatically determining the most efficient way to execute a given query by considering possible query plans. Optimization is performed in the third stage of the compilation process. The stages of compilation are shown in Figure 5-1.

Figure 5-1 Compilation stages

Description of ''Figure 5-1 Compilation stages''

TimesTen invokes the optimizer for SQL statements when more than one execution plan is possible. The optimizer chooses what it thinks is the optimum plan. This plan persists until the statement is either invalidated or dropped by the application.

The optimizer determines the cost of a plan based on:

Table and column statistics
Metadata information (such as referential integrity, primary key)
The presence or absence of indexes. Index choices, including the creation of temporary indexes.
The volume of data
The number of unique values
The selectivity of predicates
Scan methods (full table scan, rowid lookup, range index scan, hash index lookup)
Join algorithm choice (nested loop join, nested loop join with indexes, or merge join)

This chapter includes the following topics:

Optimization time and memory usage
Statistics
Optimizer hints
Indexes
Scan methods
Join methods
Optimizer plan

For more information about the query optimizer, see "The TimesTen Query Optimizer" in Oracle TimesTen In-Memory Database Operations Guide.

Optimization time and memory usage

The optimizer is designed to generate the best possible plan within reasonable time and memory constraints. Because of the size of the search space (which grows exponentially with the number of tables in a join), the optimizer cannot always choose an optimal plan for every query. Instead, the goal of the optimizer is to choose a good plan from among a set of plans generated by using strategies for finding the most promising areas within the search-space of plans. Since query optimization is relatively expensive relative to execution, the optimizer is designed to give precedence to execution time over optimization time.

The plans generated by the optimizer emphasize performance over memory usage. The optimizer may designate the use of significant amounts of temporary memory space in order to speed up execution time. In memory-constrained environments, applications can use the optimizer hints described in "Optimizer hints" to disable the use of temporary indexes and tables in order to create plans that trade maximum performance for reduced memory usage.

Statistics

When determining the execution path for a query, the optimizer examines statistics about the data referenced by the query, such as the number of rows in the tables, the minimum and maximum values and the number of unique values in interval statistics of columns used in predicates, the existence of primary keys and unique constraints within a table, and the size and configuration of any existing indexes.

Table-level and column-level statistics are stored in the SYS.TBL_STATS and SYS.COL_STATS tables. These tables are populated when an application calls the ttOptUpdateStats or ttOptEstimateStats built-in procedures, or uses the ttIsql statsupdate or statsestimate commands.

The optimizer uses the statistics for each table to calculate the selectivity of predicates, such as t1.a=4, or a combination of predicates, such as t1.a=4 AND t1.b<10. Selectivity is an estimate of the number of rows in a table. If a predicate selects a small percentage of rows, it is said to have high selectivity, while a predicate that selects a large percentage of rows has low selectivity.

Optimizer hints

You can apply hints to pass instructions to the TimesTen query optimizer. The optimizer considers these hints when choosing the best execution plan for your query. You can apply hints as follows:

To apply a hint only for a particular SQL statement, use a statement level optimizer hint.
To apply a hint for an entire transaction, use a transaction level optimizer hint with the appropriate TimesTen built-in procedure.
To apply a hint for every SQL statement in the user application, use a connection level optimizer hint with the OptimizerHint connection attribute.

The order of precedence for optimizer hints is statement level hints, transaction level hints and then connection level hints.

See "Use optimizer hints to modify the execution plan" in the Oracle TimesTen In-Memory Database Operations Guide for more information on all optimizer hints.

Indexes

The query optimizer uses indexes to speed up the execution of a query. The optimizer uses existing indexes or creates temporary indexes to generate an execution plan. An index is a map of keys to row locations in a table. Strategic use of indexes is essential to obtain maximum performance from a TimesTen system.

TimesTen uses these types of indexes:

TimesTen uses two types of indexes, each of which can be defined as unique or not unique (the default):

Hash index: Hash indexes are useful for finding rows with an exact match on one or more columns. You create a hash index with the CREATE HASH INDEX SQL statement. In general, hash indexes are faster than range indexes for exact match lookups and equijoins. However, hash indexes cannot be used for lookups involving ranges or the prefix of a key and can require more space than range indexes.
Range index: Range indexes are useful for finding rows with column values within a range specified as an equality or inequality. Range indexes can be created over one or more columns of a table. You create a range index with the CREATE INDEX SQL statement.

For more information about indexing, see "Understanding indexes" in Oracle TimesTen In-Memory Database Operations Guide. Also, see descriptions for the "CREATE INDEX" statement in Oracle TimesTen In-Memory Database SQL Reference.

The Index Advisor tool can be used to recommend a set of indexes that could improve the performance of a specific SQL workload. For more details, see "Using the Index Advisor to recommend indexes" in the Oracle TimesTen In-Memory Database Operations Guide.

Scan methods

The optimizer can select from multiple types of scan methods. The most common scan methods are:

Full table scan: A full table scan examines every row in a table. Because it is typically the least efficient way to evaluate a query predicate, a full scan is only used when no other method is available.
Rowid lookup: A rowid is an address (a unique ID) of a physical tuple in a table to each row stored in a table. A rowid lookup is applicable if, for example, an application has previously selected a rowid and then uses a WHERE ROWID= clause to fetch that same row. Rowid lookups are faster than index lookups.

However, since TimesTen Scaleout may be configured to have multiple copies, then each copy would have a different ROWID. Since ROWID is the identifier of a specific copy of a row, if that copy is not available, you cannot access the row by ROWID. In this case, you should access the row by primary key.

See "Understanding ROWID in data distribution" in the Oracle TimesTen In-Memory Database Scaleout User's Guide for more information.
Range index scan (on either a permanent or temporary index): A range index scan uses a range index to access a table. Such a scan is applicable to exact match predicates such as t1.a=2 or to range predicates such as t1.a>2 and t1.a<10 as long as the column used in the predicate has a range index defined over it. If a range index is defined over multiple columns, it can be used for multiple column predicates. For example, the predicates t1.b=100 and t1.c>'ABC' result in a range index scan if a range index is defined over columns t1.b and t1.c. The index can be used if it is defined over more columns. For example, if a range index is defined over t1.b, t1.c and t1.d, the optimizer uses the index prefix over columns b and c and returns all the values for column d that match the stated predicate over columns b and c.
Hash index lookup (on either a permanent or temporary index): A hash index lookup uses a hash index to find rows with an exact match on one or more columns. Such lookups are applicable for equality searches over one or more specified columns.

TimesTen performs fast exact matches through hash indexes and rowid lookups. They perform range matches through range indexes. Optimizer hints can be used to allow or disallow the optimizer from considering certain scan methods when choosing a query plan.

Join methods

The optimizer can select from multiple join methods. When the rows from two tables are joined, one table is designated the outer table and the other the inner table. The optimizer decides which of the tables should be the outer table and which should be the inner table. During a join, the optimizer scans the rows in the outer and inner tables to locate the rows that match the join condition.

The optimizer analyzes the statistics for each table and, for example, might identify the smallest table or the table with the best selectivity for the query as outer table. If indexes exist for one or more of the tables to be joined, the optimizer takes them into account when selecting the outer and inner tables.

If more than two tables are to be joined, the optimizer analyzes the various combinations of joins on table pairs to determine which pair to join first, which table to join with the result of the join, and so on for the optimum sequence of joins.

The cost of a join is largely influenced by the method in which the inner and outer tables are accessed to locate the rows that match the join condition. The optimizer selects from two join methods when determining the query optimizer plan:

Nested loop join
Merge join

Nested loop join

In a nested loop join with no indexes, a row in the outer table is selected one at a time and matched against every row in the inner table. All of the rows in the inner table are scanned as many times as the number of rows in the outer table. If the inner table has an index on the join column, that index is used to select the rows that meet the join condition. The rows from each table that satisfy the join condition are returned. Indexes may be created on the fly for inner tables in nested loops, and the results from inner scans may be materialized before the join.

Figure 5-2 shows an example of a nested loop join. The join condition is:

WHERE t1.a=t2.a

For this example, the optimizer has decided that t1 is the outer table and t2 is the inner table. Values in column a in table t1 that match values in column a in table t2 are 1 and 7. The join results concatenate the rows from t1 and t2. For example, the first join result is the following row:

7 50 43.54 21 13.69

It concatenates a row from t1:

7 50 43.54

with the first row from t2 in which the values in column a match:

7 21 13.69

Figure 5-2 Nested loop join

Description of ''Figure 5-2 Nested loop join''

Merge join

A merge join is used only when the join columns are sorted by range indexes. In a merge join, a cursor advances through each index one row at a time. Because the rows are already sorted on the join columns in each index, a simple formula is applied to efficiently advance the cursors through each row in a single scan. The formula looks something like:

If Inner.JoinColumn < Outer.JoinColumn, then advance inner cursor
If Inner.JoinColumn = Outer.JoinColumn, then read match
If Inner.JoinColumn > Outer.JoinColumn, then advance outer cursor

Unlike a nested loop join, there is no need to scan the entire inner table for each row in the outer table. A merge join can be used when range indexes have been created for the tables before preparing the query. If no range indexes exist for the tables being joined before preparing the query, the optimizer may in some situations create temporary range indexes in order to use a merge join.

Figure 5-3 shows an example of a merge join. The join condition is:

WHERE t1.a=t2.a

x1 is the index on table t1, sorting on column a. x2 is the index on table t2, sorting on column a. The merge join results concatenate the rows in x1 with rows in x2 in which the values in column a match. For example, the first merge join result is:

1 20 23.09 20 43.59

It concatenates a row in x1:

1 20 23.09

with the first row in x2 in which the values in column a match:

1 20 43.59

Figure 5-3 Merge join

Description of ''Figure 5-3 Merge join''

Optimizer plan

Like the optimizer in most other relational databases, the query optimizer stores the details on how to most efficiently perform SQL operations in a query execution plan, which can be examined and customized by application developers and administrators.

The execution plan data is stored in the TimesTen SYS.PLAN table and includes information about which tables are to be accessed and in what order, which tables are to be joined, and which indexes are to be used. You can use either the ttSqlCmdQueryPlan built-in procedure or the ttIsql explain command to display the query plans for recently executed SQL statements. Users can direct the query optimizer to enable or disable the creation of an execution plan in the SYS.PLAN table with the generate plan optimizer hint. (For transaction level hints, use the GenPlan optimizer flag in the ttOptSetFlag built-in procedure; for statement level hints, use the TT_GenPlan hint in the SQL statement.) See "The TimesTen Query Optimizer" in the Oracle TimesTen In-Memory Database Operations Guide for more information.

The execution plan designates a separate step for each database operation to be performed to execute the query. The steps in the plan are organized into levels that designate which steps must be completed to generate the results required by the step or steps at the next level.

Consider this query:

SELECT COUNT(*)
  FROM t1, t2, t3
    WHERE t3.b/t1.b > 1
      AND t2.b <> 0
      AND t1.a = -t2.a
      AND t2.a = t3.a;

In this example, the optimizer breaks down the query into its individual operations and generates a 5-step execution plan to be performed at three levels, as shown in Figure 5-4.

Figure 5-4 Example execution plan

Description of ''Figure 5-4 Example execution plan''