Skip Headers
Oracle® Database SQL Tuning Guide
12c Release 1 (12.1)

E15858-16
Go to Documentation Home
Home
Go to Book List
Book List
Go to Table of Contents
Contents
Go to Index
Index
Go to Master Index
Master Index
Go to Feedback page
Contact Us

Go to previous page
Previous
Go to next page
Next
PDF · Mobi · ePub

10 Optimizer Statistics Concepts

This chapter explains basic concepts relating to optimizer statistics.

This chapter includes the following topics:

Introduction to Optimizer Statistics

Oracle Database optimizer statistics describe details about the database and its objects. The optimizer cost model relies on statistics collected about the objects involved in a query, and the database and host where the query runs. Statistics are critical to the optimizer's ability to pick the best execution plan for a SQL statement.

Optimizer statistics include the following:

  • Table statistics

    • Number of rows

    • Number of blocks

    • Average row length

  • Column statistics

    • Number of distinct values (NDV) in a column

    • Number of nulls in a column

    • Data distribution (histogram)

    • Extended statistics

  • Index statistics

    • Number of leaf blocks

    • Number of levels

    • Index clustering factor

  • System statistics

    • I/O performance and utilization

    • CPU performance and utilization

As shown in Figure 10-1, the database stores optimizer statistics for tables, columns, indexes, and the system in the data dictionary. You can access these statistics using data dictionary views.

Note:

The optimizer statistics are different from the performance statistics visible through V$ views.

Figure 10-1 Optimizer Statistics

Description of Figure 10-1 follows
Description of "Figure 10-1 Optimizer Statistics"

About Optimizer Statistics Types

The optimizer collects statistics on different types of database objects and characteristics of the database environment. This section contains the following topics:

Table Statistics

In Oracle Database, table statistics include information about rows and blocks. The optimizer uses these statistics to determine the cost of table scans and table joins. DBMS_STATS can gather statistics for both permanent and temporary tables.

The database tracks all relevant statistics about permanent tables. DBMS_STATS.GATHER_TABLE_STATS commits before gathering statistics on permanent tables. For example, table statistics stored in DBA_TAB_STATISTICS track the following:

  • Number of rows and average row length

    The database uses the row count stored in DBA_TAB_STATISTICS when determining cardinality.

  • Number of data blocks

    The optimizer uses the number of data blocks with the DB_FILE_MULTIBLOCK_READ_COUNT initialization parameter to determine the base table access cost.

Example 10-1 Table Statistics

This example queries some table statistics for the sh.customers table.

sys@PROD> SELECT NUM_ROWS, AVG_ROW_LEN, BLOCKS, LAST_ANALYZED
  2  FROM   DBA_TAB_STATISTICS
  3  WHERE  OWNER='SH'
  4  AND    TABLE_NAME='CUSTOMERS';
 
  NUM_ROWS AVG_ROW_LEN     BLOCKS LAST_ANAL
---------- ----------- ---------- ---------
     55500         181       1486 14-JUN-10

See Also:

Column Statistics

Column statistics track information about column values and data distribution. The optimizer uses these statistics to generate accurate cardinality estimates and make better decisions about index usage, join orders, join methods, and so on.

For example, index statistics in DBA_TAB_COL_STATISTICS track the following:

  • Number of distinct values (NDV)

  • Number of nulls

  • High and low values

  • Histogram-related information (see "Histograms")

The optimizer can use extended statistics, which are a special type of column statistics. These statistics are useful for informing the optimizer of logical relationships among columns.

See Also:

Index Statistics

The index statistics include information about the number of index levels, the number of index blocks, and the relationship between the index and the data blocks. The optimizer uses these statistics to determine the cost of index scans.

For example, index statistics stored in the DBA_IND_STATISTICS view track the following:

  • Levels

    The BLEVEL column shows the number of blocks required to go from the root block to a leaf block. A B-tree index has two types of blocks: branch blocks for searching and leaf blocks that store values. See Oracle Database Concepts for a conceptual overview of B-tree indexes.

  • Distinct keys

    This columns tracks the number of distinct indexed values. If a unique constraint is defined, and if no NOT NULL constraint is defined, then this value equals the number of non-null values.

  • Average number of leaf blocks for each distinct indexed key

  • Average number of data blocks pointed to by each distinct indexed key

Example 10-2 Index Statistics

This example queries some index statistics for the cust_lname_ix and customers_pk indexes on the sh.customers table (sample output included):

SELECT INDEX_NAME, BLEVEL, LEAF_BLOCKS AS "LEAFBLK", DISTINCT_KEYS AS "DIST_KEY",
       AVG_LEAF_BLOCKS_PER_KEY AS "LEAFBLK_PER_KEY",
       AVG_DATA_BLOCKS_PER_KEY AS "DATABLK_PER_KEY"
FROM   DBA_IND_STATISTICS
WHERE  OWNER = 'SH'
AND    INDEX_NAME IN ('CUST_LNAME_IX','CUSTOMERS_PK');

INDEX_NAME     BLEVEL LEAFBLK DIST_KEY LEAFBLK_PER_KEY DATABLK_PER_KEY
-------------- ------ ------- -------- --------------- ---------------
CUSTOMERS_PK        1     115    55500               1               1
CUST_LNAME_IX       1     141      908               1              10

See Also:

Oracle Database Reference for a description of the DBA_IND_STATISTICS view

Index Clustering Factor

For a B-tree index, the index clustering factor measures the physical grouping of rows in relation to an index value, such as last name (see Oracle Database Concepts for an overview).

A clustering factor that is close to the number of blocks in a table indicates that the rows are physically ordered in the table blocks by the index key. If the database performs a full table scan, then the database tends to retrieve the rows as they are stored on disk sorted by the index key. A clustering factor that is close to the number of rows indicates that the rows are scattered randomly across the database blocks in relation to the index key. If the database performs a full table scan, then the database would not retrieve rows in any sorted order by this index key.

The index clustering factor helps the optimizer decide whether an index scan or full table scan is more efficient for certain queries. A low clustering factor indicates an efficient index scan.

The clustering factor is a property of a specific index, not a table. If multiple indexes exist on a table, then the clustering factor for one index might be small while the factor for another index is large. An attempt to reorganize the table to improve the clustering factor for one index may degrade the clustering factor of the other index.

Example 10-3 Index Clustering Factor

This example shows how the optimizer uses the index clustering factor to determine whether using an index is more effective than a full table scan.

  1. Start SQL*Plus and connect to a database as sh, and then query the number of rows and blocks in the sh.customers table (sample output included):

    SELECT  table_name, num_rows, blocks
    FROM    user_tables
    WHERE   table_name='CUSTOMERS';
     
    TABLE_NAME                       NUM_ROWS     BLOCKS
    ------------------------------ ---------- ----------
    CUSTOMERS                           55500       1486
    
  2. Create an index on the customers.cust_last_name column.

    For example, execute the following statement:

    CREATE INDEX CUSTOMERS_LAST_NAME_IDX ON customers(cust_last_name);
     
    
  3. Query the index clustering factor of the newly created index.

    The following query shows that the customers_last_name_idx index has a high clustering factor because the clustering factor is significantly more than the number of blocks in the table:

    SELECT index_name, blevel, leaf_blocks, clustering_factor
    FROM   user_indexes
    WHERE  table_name='CUSTOMERS'
    AND    index_name= 'CUSTOMERS_LAST_NAME_IDX';
     
    INDEX_NAME                         BLEVEL LEAF_BLOCKS CLUSTERING_FACTOR
    ------------------------------ ---------- ----------- -----------------
    CUSTOMERS_LAST_NAME_IDX                 1         141              9859
    
  4. Create a new copy of the customers table, with rows ordered by cust_last_name.

    For example, execute the following statements:

    DROP TABLE customers3 PURGE;
    CREATE TABLE customers3 AS 
      SELECT * 
      FROM   customers 
      ORDER BY cust_last_name;
    
  5. Gather statistics on the customers3 table.

    For example, execute the GATHER_TABLE_STATS procedure as follows:

    EXEC DBMS_STATS.GATHER_TABLE_STATS(null,'CUSTOMERS3');
    
  6. Query the number of rows and blocks in the customers3 table .

    For example, enter the following query (sample output included):

    SELECT    TABLE_NAME, NUM_ROWS, BLOCKS
    FROM      USER_TABLES
    WHERE     TABLE_NAME='CUSTOMERS3';
     
    TABLE_NAME                       NUM_ROWS     BLOCKS
    ------------------------------ ---------- ----------
    CUSTOMERS3                          55500       1485 
    
  7. Create an index on the cust_last_name column of customers3.

    For example, execute the following statement:

    CREATE INDEX CUSTOMERS3_LAST_NAME_IDX ON customers3(cust_last_name);
     
    
  8. Query the index clustering factor of the customers3_last_name_idx index.

    The following query shows that the customers3_last_name_idx index has a lower clustering factor:

    SELECT INDEX_NAME, BLEVEL, LEAF_BLOCKS, CLUSTERING_FACTOR
    FROM   USER_INDEXES
    WHERE  TABLE_NAME = 'CUSTOMERS3'
    AND    INDEX_NAME = 'CUSTOMERS3_LAST_NAME_IDX';
     
    INDEX_NAME                         BLEVEL LEAF_BLOCKS CLUSTERING_FACTOR
    ------------------------------ ---------- ----------- -----------------
    CUSTOMERS3_LAST_NAME_IDX                1         141              1455
    

    The table customers3 has the same data as the original customers table, but the index on customers3 has a much lower clustering factor because the data in the table is ordered by the cust_last_name. The clustering factor is now about 10 times the number of blocks instead of 70 times.

  9. Query the customers table.

    For example, execute the following query (sample output included):

    SELECT cust_first_name, cust_last_name
    FROM   customers
    WHERE  cust_last_name BETWEEN 'Puleo' AND 'Quinn';
     
    CUST_FIRST_NAME      CUST_LAST_NAME
    -------------------- ----------------------------------------
    Vida                 Puleo
    Harriett             Quinlan
    Madeleine            Quinn
    Caresse              Puleo 
    
  10. Display the cursor for the query.

    For example, execute the following query (partial sample output included):

    SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR());
    
    -------------------------------------------------------------------------------
    | Id | Operation                 | Name      | Rows | Bytes | Cost (%CPU)|Time|
    -------------------------------------------------------------------------------
    |   0| SELECT STATEMENT          |           |      |     | 405 (100)|        |
    |*  1|  TABLE ACCESS STORAGE FULL| CUSTOMERS |  2335|35025| 405   (1)|00:00:01|
    -------------------------------------------------------------------------------
    

    The preceding plan shows that the optimizer did not use the index on the original customers tables.

  11. Query the customers3 table.

    For example, execute the following query (sample output included):

    SELECT cust_first_name, cust_last_name
    FROM   customers3
    WHERE  cust_last_name BETWEEN 'Puleo' AND 'Quinn';
     
    CUST_FIRST_NAME      CUST_LAST_NAME
    -------------------- ----------------------------------------
    Vida                 Puleo
    Harriett             Quinlan
    Madeleine            Quinn
    Caresse              Puleo 
    
  12. Display the cursor for the query.

    For example, execute the following query (partial sample output included):

    SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR());
    
    -------------------------------------------------------------------------------
    |Id| Operation                   | Name            |Rows|Bytes|Cost(%CPU)|Time|
    -------------------------------------------------------------------------------
    | 0| SELECT STATEMENT            |                  |    |     |69(100)|      |
    | 1|  TABLE ACCESS BY INDEX ROWID|CUSTOMERS3        |2335|35025|69(0)|00:00:01|
    |*2|   INDEX RANGE SCAN          |CUSTOMERS3_LAST_NAME_IDX|2335| 7(0)|00:00:01|
    -------------------------------------------------------------------------------
    

    The result set is the same, but the optimizer chooses the index. The plan cost is much less than the cost of the plan used on the original customers table.

  13. Query customers with a hint that forces the optimizer to use the index.

    For example, execute the following query (partial sample output included):

    SELECT /*+ index (Customers CUSTOMERS_LAST_NAME_IDX) */ cust_first_name, 
           cust_last_name 
    FROM   customers 
    WHERE  cust_last_name BETWEEN 'Puleo' and 'Quinn';
     
    CUST_FIRST_NAME      CUST_LAST_NAME
    -------------------- ----------------------------------------
    Vida                 Puleo
    Caresse              Puleo
    Harriett             Quinlan
    Madeleine            Quinn 
    
  14. Display the cursor for the query.

    For example, execute the following query (partial sample output included):

    SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR());
    
    -------------------------------------------------------------------------------
    | Id  | Operation                | Name           |Rows|Bytes|Cost (%CPU)|Time|
    -------------------------------------------------------------------------------
    | 0| SELECT STATEMENT            |                  |   |     |422(100)|      |
    | 1|  TABLE ACCESS BY INDEX ROWID|CUSTOMERS         |335|35025|422(0)|00:00:01|
    |*2|   INDEX RANGE SCAN          |CUSTOMERS_LAST_NAME_IDX|2335| |7(0)|00:00:01|
    -------------------------------------------------------------------------------
    

    The preceding plan shows that the cost of using the index on customers is higher than the cost of a full table scan. Thus, using an index does not necessarily improve performance. The index clustering factor is an effective measure of whether an using an index is more effective than a full table scan.

Example: Effect of Index Clustering Factor on Cost

To illustrate how the index clustering factor can influence the cost of table access, consider the following scenario:

  • A table contains 9 rows that are stored in 3 data blocks.

  • The col1 column currently stores the values A, B, and C.

  • A nonunique index named col1_idx exists on col1 for this table.

Assume that the rows are stored in the blocks as shown in Example 10-4.

Example 10-4 Collocated Data

Block 1       Block 2        Block 3
-------       -------        -------
A  A  A       B  B  B        C  C  C

In Example 10-4, the index clustering factor for col1_idx is low. The rows that have the same indexed column values for col1 are in the same data blocks in the table. Thus, the cost of using an index range scan to return all rows with value A is low because only one block in the table must be read.

Assume that the same rows are scattered across data blocks as shown in Example 10-5.

Example 10-5 Scattered Data

Block 1       Block 2        Block 3
-------       -------        -------
A  B  C       A  C  B        B  A  C

In Example 10-5, the index clustering factor for col1_idx is higher. The database must read all three blocks in the table to retrieve all rows with the value A in col1.

See Also:

Oracle Database Reference for a description of the DBA_INDEXES view

Session-Specific Statistics for Global Temporary Tables

A global temporary table is a special table that stores intermediate session-private data for a specific duration. The ON COMMIT clause of CREATE GLOBAL TEMPORARY TABLE indicates whether the table is transaction-specific (DELETE ROWS) or session-specific (PRESERVE ROWS). Thus, a temporary table holds intermediate result sets for the duration of either a transaction or a session.

When you create a global temporary table, you create a definition that is visible to all sessions. No physical storage is allocated. When a session first puts data into the table, the database allocates storage space. The data in the temporary table is only visible to the current session.

Shared and Session-Specific Statistics for Global Temporary Tables

In releases before Oracle Database 12c, the database did not maintain statistics for global temporary tables and non-global temporary tables differently. The database maintained one version of the statistics shared by all sessions, even though data in different sessions could differ.

Starting in Oracle Database 12c, you can set the table-level preference GLOBAL_TEMP_TABLE_STATS to make statistics on a global temporary table shared or session-specific. When set to session-specific, you can gather statistics for a global temporary table in one session, and then use the statistics for this session only. Meanwhile, users can continue to maintain a shared version of the statistics. During optimization, the optimizer first checks whether a global temporary table has session-specific statistics. If yes, the optimizer uses them. Otherwise, the optimizer uses shared statistics if they exist.

Session-specific statistics have the following characteristics:

  • Dictionary views that track statistics show both the shared statistics and the session-specific statistics in the current session.

    The views are DBA_TAB_STATISTICS, DBA_IND_STATISTICS, DBA_TAB_HISTOGRAMS, and DBA_TAB_COL_STATISTICS (each view has a corresponding USER_ and ALL_ version). The SCOPE column shows whether statistics are session-specific or shared.

  • Other sessions do not share the cursor using the session-specific statistics.

    Different sessions can share the cursor using shared statistics, as in previous releases. The same session can share the cursor using session-specific statistics.

  • Pending statistics are not supported for session-specific statistics.

  • When the GLOBAL_TEMP_TABLE_STATS preference is set to SESSION, by default GATHER_TABLE_STATS immediately invalidates previous cursors compiled in the same session. However, this procedure does not invalidate cursors compiled in other sessions.

Effect of DBMS_STATS on Transaction-Specific Temporary Tables

DBMS_STATS commits changes to session-specific global temporary tables, but not to transaction-specific global temporary tables. Before Oracle Database 12c, running DBMS_STATS.GATHER_TABLE_STATS on a transaction-specific temporary table (ON COMMIT DELETE ROWS) would delete all rows, making the statistics show the table as empty. Starting in Oracle Database 12c, the following procedures do not commit for transaction-specific temporary tables, so that rows in these tables are not deleted:

  • GATHER_TABLE_STATS

  • DELETE_TABLE_STATS

  • DELETE_COLUMN_STATS

  • DELETE_INDEX_STATS

  • SET_TABLE_STATS

  • SET_COLUMN_STATS

  • SET_INDEX_STATS

  • GET_TABLE_STATS

  • GET_COLUMN_STATS

  • GET_INDEX_STATS

The preceding program units observe the GLOBAL_TEMP_TABLE_STATS preference. For example, if the table preference is set to SESSION, then SET_TABLE_STATS sets the session statistics, and GATHER_TABLE_STATS preserves all rows in a transaction-specific temporary table. If the table preference is set to SHARED, then SET_TABLE_STATS sets the shared statistics, and GATHER_TABLE_STATS deletes all rows from a transaction-specific temporary table.

See Also:

System Statistics

The system statistics describe hardware characteristics such as I/O and CPU performance and utilization. System statistics enable the query optimizer to more accurately estimate I/O and CPU costs when choosing execution plans.

The database does not invalidate previously parsed SQL statements when updating system statistics. The database parses all new SQL statements using new statistics.

User-Defined Optimizer Statistics

The extensible optimizer enables authors of user-defined functions and indexes to create statistics collection, selectivity, and cost functions for the optimizer to use when choosing a execution plan. The optimizer cost model is extended to integrate information supplied by the user to assess CPU and the I/O cost.

Statistics types act as interfaces for user-defined functions that influence the choice of execution plan by the optimizer. However, to use a statistics type, the optimizer requires a mechanism to bind the type to a database object such as a column, standalone function, object type, index, indextype, or package. The SQL statement ASSOCIATE STATISTICS creates this association.

Functions for user-defined statistics are relevant for columns that use both standard SQL data types and object types, and for domain indexes. When you associate a statistics type with a column or domain index, the database calls the statistics collection method in the statistics type whenever DBMS_STATS gathers statistics for database objects.

See Also:

How the Database Gathers Optimizer Statistics

Oracle Database provides several mechanisms to gather statistics. This section contains the following topics:

DBMS_STATS Package

The DBMS_STATS PL/SQL package collects and manages optimizer statistics. This package enables you to control what and how statistics are collected, including the degree of parallelism for statistics collection, sampling methods, granularity of statistics collection in partitioned tables, and so on.

Note:

Do not use the COMPUTE and ESTIMATE clauses of the ANALYZE statement to collect optimizer statistics. These clauses have been deprecated. Instead, use DBMS_STATS.

Statistics gathered with the DBMS_STATS package are required for the creation of accurate execution plans. For example, table statistics gathered by DBMS_STATS include the number of rows, number of blocks, and average row length.

By default, Oracle Database uses automatic optimizer statistics collection. In this case, the database automatically runs DBMS_STATS to collect optimizer statistics for all schema objects for which statistics are missing or stale. The process eliminates many manual tasks associated with managing the optimizer, and significantly reduces the risks of generating suboptimal execution plans because of missing or stale statistics. You can also update and manage optimizer statistics by manually executing DBMS_STATS.

Dynamic Statistics

By default, when optimizer statistics are missing, stale, or insufficient, the database automatically gathers dynamic statistics during a parse. The database uses recursive SQL to scan a small random sample of table blocks.

Note:

Dynamic statistics augment statistics rather than providing an alternative to them.

Dynamic statistics can supplement statistics such as table and index block counts, table and join cardinalities (estimated number of rows), join column statistics, and GROUP BY statistics. This information helps the optimizer improve plans by making better estimates for predicate selectivity.

Dynamic statistics are beneficial in the following situations:

  • An execution plan is suboptimal because of complex predicates.

  • The sampling time is a small fraction of total execution time for the query.

  • The query is executed many times so that the sampling time is amortized.

Online Statistics Gathering for Bulk Loads

Starting in Oracle Database 12c, the database can gather table statistics automatically during the following types of bulk load operations:

  • CREATE TABLE AS SELECT

  • INSERT INTO ... SELECT into an empty table using a direct path insert

Note:

By default, a parallel insert uses a direct path insert. You can force a direct path insert by using the /*+APPEND */ hint.

This section contains the following topics:

See Also:

Oracle Database Data Warehousing Guide to learn more about bulk loads

Purpose of Online Statistics Gathering for Bulk Loads

Data warehouses typically load large amounts of data into the database. For example, a sales data warehouse might load sales data nightly.

In releases earlier than Oracle Database 12c, to avoid the possibility of a suboptimal plan caused by stale statistics, you needed to gather statistics manually after a bulk load. The ability to gather statistics automatically during bulk loads has the following benefits:

  • Improved performance

    Gathering statistics during the load avoids an additional table scan to gather table statistics.

  • Improved manageability

    No user intervention is required to gather statistics after a bulk load.

Global Statistics During Inserts into Empty Partitioned Tables

When inserting rows into an empty partitioned table, the database gathers global statistics during the insert. For example, if you run INSERT INTO sales SELECT, and if sales is an empty partitioned table, then the database gathers global statistics for sales, but does not gather partition-level statistics.

If you insert rows into a empty partitioned table using extended syntax, and if the specified partition or subpartition is empty, then the database gathers the statistics on the specified partition or subpartition during the insert. No global level statistics are gathered. For example, if you run INSERT INTO sales PARTITION (sales_q4_2000) SELECT, and if partition sales_q4_2000 is empty before the insert (other partitions need not be empty), then the database gathers statistics during the insert. Moreover, if the INCREMENTAL preference is enabled for sales, then the database also gathers synopses for sales_q4_2000. Statistics are immediately available after the INSERT statement. However, if you roll back the transaction, then the database automatically deletes statistics gathered during the bulk load.

Index Statistics and Histograms During Bulk Loads

While gathering online statistics, the database does not gather index statistics or create histograms. If these statistics are required, then Oracle recommends running DBMS_STATS.GATHER_TABLE_STATS with the options parameter set to GATHER AUTO after the bulk load. For example, the following command gathers statistics for the bulk-loaded sh_ctas table:

EXEC DBMS_STATS.GATHER_TABLE_STATS( user, 'SH_CTAS', options => 'GATHER AUTO' );

The preceding example only gathers missing or stale statistics. The database does not gather table and basic column statistics collected during the bulk load.

Note:

You can set the table preference options to GATHER AUTO on the tables that you plan to bulk load. In this way, you need not explicitly set the options parameter when running GATHER_TABLE_STATS.

Restrictions for Online Statistics Gathering for Bulk Loads

Currently, statistics gathering does not occur automatically for bulk loads when any of the following conditions apply to the target table, partition, or subpartition:

  • It is not empty, and you perform an INSERT INTO ... SELECT.

    In this case, an OPTIMIZER STATISTICS GATHERING row source appears in the plan, but this row source is only a pass-through. The database does not actually gather optimizer statistics.

    Note:

    The DBA_TAB_COL_STATISTICS.NOTES column is set to STATS_ON_LOAD by a bulk load into an empty table. However, subsequent bulk loads into the non-empty table do not reset the NOTES column. One method for determining whether the database gathered statistics is to execute DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO, and then query USER_TAB_MODIFICATIONS.INSERTS. If the query returns a row indicating the number of rows loaded, then the statistics were not gathered automatically during the most recent bulk load.
  • It is in an Oracle-owned schema such as SYS.

  • It is a nested table.

  • It is an index-organized table (IOT).

  • It is an external table.

  • It is a global temporary table defined as ON COMMIT DELETE ROWS.

  • It has virtual columns.

  • It has a PUBLISH preference set to FALSE.

  • Its statistics are locked.

  • It is partitioned, INCREMENTAL is set to true, and extended syntax is not used.

    For example, assume that you execute DBMS_STATS.SET_TABLE_PREFS(null, 'sales', incremental', 'true'). In this case, the database does not gather statistics for INSERT INTO sales SELECT, even when sales is empty. However, the database does gather statistics automatically for INSERT INTO sales PARTITION (sales_q4_2000) SELECT.

Hints for Online Statistics Gathering for Bulk Loads

By default, the database gathers statistics during bulk loads. You can disable the feature at the statement level by using the NO_GATHER_OPTIMIZER_STATISTICS hint, and enable the feature at the statement level by using the GATHER_OPTIMIZER_STATISTICS hint. For example, the following statement disables online statistics gathering for bulk loads:

CREATE TABLE employees2 AS
  SELECT /*+NO_GATHER_OPTIMIZER_STATISTICS */* FROM employees

See Also:

Oracle Database SQL Language Reference to learn about the GATHER_OPTIMIZER_STATISTICS and NO_GATHER_OPTIMIZER_STATISTICS hints

When the Database Gathers Optimizer Statistics

The database collects optimizer statistics at various times and from various sources. The database uses the following sources:

  • DBMS_STATS execution, automatic or manual

    This PL/SQL package is the primary means of gathering optimizer statistics. See Oracle Database PL/SQL Packages and Types Reference to learn about the DBMS_STATS.GATHER_TABLE_STATS procedure.

  • SQL compilation

    During SQL compilation, the database can augment the statistics previously gathered by DBMS_STATS. In this stage, the database runs additional queries to obtain more accurate information on how many rows in the tables satisfy the WHERE clause predicates in the SQL statement (see "When the Database Samples Data").

  • SQL execution

    During execution, the database can further augment previously gathered statistics. In this stage, Oracle Database collects the number of rows produced by every row source during the execution of a SQL statement. At the end of execution, the optimizer determines whether the estimated number of rows is inaccurate enough to warrant reparsing at the next statement execution. If the cursor is marked for reparsing, then the optimizer uses actual row counts from the previous execution instead of estimates.

  • SQL profiles

    A SQL profile is a collection of auxiliary statistics on a query. The profile stores these supplemental statistics in the data dictionary. The optimizer uses SQL profiles during optimization to determine the most optimal plan (see "About SQL Profiles").

The database stores optimizer statistics in the data dictionary and updates or replaces them as needed. You can query statistics in data dictionary views.

This section contains the following topics:

SQL Plan Directives

A SQL plan directive is additional information and instructions that the optimizer can use to generate a more optimal plan. For example, a SQL plan directive can instruct the optimizer to record a missing extension.

About SQL Plan Directives

During SQL execution, if a cardinality misestimate occurs, then the database creates SQL plan directives. During SQL compilation, the optimizer examines the query corresponding to the directive to determine whether missing extensions or histograms exist (see "Managing Extended Statistics"). The optimizer records any missing extensions. Subsequent DBMS_STATS calls collect statistics for the extensions.

The optimizer uses dynamic statistics whenever it does not have sufficient statistics corresponding to the directive. For example, the optimizer gathers dynamic statistics until the creation of column group statistics, and also after this point when misestimates occur. Currently, the optimizer monitors only column groups. The optimizer does not create an extension on expressions.

SQL plan directives are not tied to a specific SQL statement or SQL ID. The optimizer can use directives for statements that are nearly identical because directives are defined on a query expression. For example, directives can help the optimizer with queries that use similar patterns, such as queries that are identical except for a select list item.

The database automatically manages SQL plan directives. The database initially creates directives in the shared pool. The database periodically writes the directives to the SYSAUX tablespace. You can manage directives with the APIs available in the DBMS_SPD package.

How the Optimizer Uses SQL Plan Directives: Example

This example shows how the database automatically creates and uses SQL plan directives for SQL statements.

Assumptions

You plan to run queries against the sh schema, and you have privileges on this schema and on data dictionary and V$ views.

To see how the database uses a SQL plan directive: 

  1. Query the sh.customers table.

    SELECT /*+gather_plan_statistics*/ * 
    FROM   customers 
    WHERE  cust_state_province='CA' 
    AND    country_id='US';
    

    The gather_plan_statistics hint shows the actual number of rows returned from each operation in the plan. Thus, you can compare the optimizer estimates with the actual number of rows returned.

  2. Query the plan for the preceding query.

    Example 10-6 shows the execution plan (sample output included).

    Example 10-6 Execution Plan

    SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(FORMAT=>'ALLSTATS LAST'));
     
    PLAN_TABLE_OUTPUT
    -------------------------------------
    SQL_ID  b74nw722wjvy3, child number 0
    -------------------------------------
    select /*+gather_plan_statistics*/ * from customers where
    CUST_STATE_PROVINCE='CA' and country_id='US'
     
    Plan hash value: 1683234692
     
    --------------------------------------------------------------------------------------------------
    | Id  | Operation         | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
    --------------------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT  |           |      1 |        |     29 |00:00:00.01 |      17 |     14 |
    |*  1 |  TABLE ACCESS FULL| CUSTOMERS |      1 |      8 |     29 |00:00:00.01 |      17 |     14 |
    --------------------------------------------------------------------------------------------------
     
    Predicate Information (identified by operation id):
    ---------------------------------------------------
     
       1 - filter(("CUST_STATE_PROVINCE"='CA' AND "COUNTRY_ID"='US'))
    

    The actual number of rows (A-Rows) returned by each operation in the plan varies greatly from the estimates (E-Rows). This statement is a candidate for automatic reoptimization (see "Automatic Reoptimization").

  3. Check whether the customers query can be reoptimized.

    The following statement queries the V$SQL.IS_REOPTIMIZABLE value (sample output included):

    SELECT SQL_ID, CHILD_NUMBER, SQL_TEXT, IS_REOPTIMIZABLE
    FROM   V$SQL
    WHERE  SQL_TEXT LIKE 'SELECT /*+gather_plan_statistics*/%';
     
    SQL_ID        CHILD_NUMBER SQL_TEXT    I
    ------------- ------------ ----------- -
    b74nw722wjvy3            0 select /*+g Y
                               ather_plan_
                               statistics*
                               / * from cu
                               stomers whe
                               re CUST_STA
                               TE_PROVINCE
                               ='CA' and c
                               ountry_id='
                               US'
    

    The IS_REOPTIMIZABLE column is marked Y, so the database will perform a hard parse of the customers query on the next execution. The optimizer uses the execution statistics from this initial execution to determine the plan. The database persists the information learned from reoptimization as a SQL plan directive.

  4. Display the directives for the sh schema.

    Example 10-7 uses DBMS_SPD to write the SQL plan directives to disk, and then shows the directives for the sh schema only.

    Example 10-7 Displaying SQL Plan Directives for the sh Schema

    EXEC DBMS_SPD.FLUSH_SQL_PLAN_DIRECTIVE;
     
    SELECT TO_CHAR(d.DIRECTIVE_ID) dir_id, o.OWNER, o.OBJECT_NAME, 
           o.SUBOBJECT_NAME col_name, o.OBJECT_TYPE, d.TYPE, d.STATE, d.REASON
    FROM   DBA_SQL_PLAN_DIRECTIVES d, DBA_SQL_PLAN_DIR_OBJECTS o
    WHERE  d.DIRECTIVE_ID=o.DIRECTIVE_ID
    AND    o.OWNER IN ('SH')
    ORDER BY 1,2,3,4,5;
     
    DIR_ID                  OWNER OBJECT_NAME   COL_NAME    OBJECT TYPE             STATE REASON
    ----------------------- ----- ------------- ----------- ------ ---------------- ----- ------------------------
    1484026771529551585     SH    CUSTOMERS     COUNTRY_ID  COLUMN DYNAMIC_SAMPLING NEW   SINGLE TABLE CARDINALITY 
                                                                                          MISESTIMATE
    1484026771529551585     SH    CUSTOMERS     CUST_STATE_ COLUMN DYNAMIC_SAMPLING NEW   SINGLE TABLE CARDINALITY 
                                                PROVINCE                                  MISESTIMATE        
    1484026771529551585     SH    CUSTOMERS                 TABLE  DYNAMIC_SAMPLING NEW   SINGLE TABLE CARDINALITY 
                                                                                          MISESTIMATE
    

    Initially, the database stores SQL plan directives in memory, and then writes them to disk every 15 minutes. Thus, the preceding example calls DBMS_SPD.FLUSH_SQL_PLAN_DIRECTIVE to force the database to write the directives to the SYSAUX tablespace.

    Monitor directives using the views DBA_SQL_PLAN_DIRECTIVES and DBA_SQL_PLAN_DIR_OBJECTS. Three entries appear in the views, one for the customers table itself, and one for each of the correlated columns. Because the customers query has the IS_REOPTIMIZABLE value of Y, if you reexecute the statement, then the database will hard parse it again, and then generate a plan based on the previous execution statistics.

  5. Query the customers table again.

    For example, enter the following statement:

    SELECT /*+gather_plan_statistics*/ * 
    FROM   customers 
    WHERE  cust_state_province='CA' 
    AND    country_id='US';
    
  6. Query the plan in the cursor.

    Example 10-8 shows the execution plan (sample output included).

    Example 10-8 Execution Plan

    SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(FORMAT=>'ALLSTATS LAST'));
    
    PLAN_TABLE_OUTPUT
    -------------------------------------
    SQL_ID  b74nw722wjvy3, child number 1
    -------------------------------------
    select /*+gather_plan_statistics*/ * from customers where
    CUST_STATE_PROVINCE='CA' and country_id='US'
     
    Plan hash value: 1683234692
     
    -----------------------------------------------------------------------------------------
    | Id  | Operation         | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
    -----------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT  |           |      1 |        |     29 |00:00:00.01 |      17 |
    |*  1 |  TABLE ACCESS FULL| CUSTOMERS |      1 |     29 |     29 |00:00:00.01 |      17 |
    -----------------------------------------------------------------------------------------
     
    Predicate Information (identified by operation id):
    ---------------------------------------------------
     
       1 - filter(("CUST_STATE_PROVINCE"='CA' AND "COUNTRY_ID"='US'))
     
    Note
    -----
       - cardinality feedback used for this statement
    

    The Note section indicates that the database used reoptimization for this statement. The estimated number of rows (E-Rows) is now correct. The SQL plan directive has not been used yet.

  7. Query the cursors for the customers query.

    For example, run the following query (sample output included):

    SELECT SQL_ID, CHILD_NUMBER, SQL_TEXT, IS_REOPTIMIZABLE
    FROM   V$SQL
    WHERE  SQL_TEXT LIKE 'SELECT /*+gather_plan_statistics*/%';
     
    SQL_ID        CHILD_NUMBER SQL_TEXT    I
    ------------- ------------ ----------- -
    b74nw722wjvy3            0 select /*+g Y
                               ather_plan_
                               statistics*
                               / * from cu
                               stomers whe
                               re CUST_STA
                               TE_PROVINCE
                               ='CA' and c
                               ountry_id='
                               US'
     
    b74nw722wjvy3            1 select /*+g N
                               ather_plan_
                               statistics*
                               / * from cu
                               stomers whe
                               re CUST_STA
                               TE_PROVINCE
                               ='CA' and c
                               ountry_id='
                               US'
    

    A new plan exists for the customers query, and also a new child cursor.

  8. Confirm that a SQL plan directive exists and is usable for other statements.

    For example, run the following query, which is similar but not identical to the original customers query (the state is MA instead of CA):

    SELECT /*+gather_plan_statistics*/ CUST_EMAIL
    FROM   CUSTOMERS
    WHERE  CUST_STATE_PROVINCE='MA'
    AND    COUNTRY_ID='US';
    
  9. Query the plan in the cursor.

    The following statement queries the cursor (sample output included).:

    SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(FORMAT=>'ALLSTATS LAST'));
    
    PLAN_TABLE_OUTPUT
    -------------------------------------
    SQL_ID  3tk6hj3nkcs2u, child number 0
    -------------------------------------
    Select /*+gather_plan_statistics*/ cust_email From   customers Where
    cust_state_province='MA' And    country_id='US'
     
    Plan hash value: 1683234692
    
    -------------------------------------------------------------------------------
    |Id | Operation         | Name      | Starts|E-Rows|A-Rows| A-Time    |Buffers|
    -------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT  |           |     1 |      |    2 |00:00:00.01|    16 |
    |*1 |  TABLE ACCESS FULL| CUSTOMERS |     1 |     2|    2 |00:00:00.01|    16 |
    -----------------------------------------------------------------------------
     
    Predicate Information (identified by operation id):
    ---------------------------------------------------
     
       1 - filter(("CUST_STATE_PROVINCE"='MA' AND "COUNTRY_ID"='US'))
     
    Note
    -----
       - dynamic sampling used for this statement (level=2)
       - 1 Sql Plan Directive used for this statement
    

    The Note section of the plan shows that the optimizer used the SQL directive for this statement, and also used dynamic statistics.

  10. Query the state of the directive to determine whether it has changed.

    Example 10-9 queries the data dictionary for information about the directive, which the optimizer has now used.

    Example 10-9 Display Directives for sh Schema

    EXEC DBMS_SPD.FLUSH_SQL_PLAN_DIRECTIVE;
     
    SELECT TO_CHAR(d.DIRECTIVE_ID) dir_id, o.OWNER, o.OBJECT_NAME, 
           o.SUBOBJECT_NAME col_name, o.OBJECT_TYPE, d.TYPE, d.STATE, d.REASON
    FROM   DBA_SQL_PLAN_DIRECTIVES d, DBA_SQL_PLAN_DIR_OBJECTS o
    WHERE  d.DIRECTIVE_ID=o.DIRECTIVE_ID
    AND    o.OWNER IN ('SH')
    ORDER BY 1,2,3,4,5;
     
    DIR_ID              OW OBJECT_NA COL_NAME    OBJECT  TYPE            STATE         REASON
    ------------------- -- --------- ---------- ------- ---------------  ------------- ------------------------
    1484026771529551585 SH CUSTOMERS COUNTRY_ID  COLUMN DYNAMIC_SAMPLING MISSING_STATS SINGLE TABLE CARDINALITY 
                                                                                       MISESTIMATE
    1484026771529551585 SH CUSTOMERS CUST_STATE_ COLUMN DYNAMIC_SAMPLING MISSING_STATS SINGLE TABLE CARDINALITY 
                                     PROVINCE                                          MISESTIMATE
    1484026771529551585 SH CUSTOMERS             TABLE  DYNAMIC_SAMPLING MISSING_STATS SINGLE TABLE CARDINALITY
                                                                                       MISESTIMATE
    

    The SQL plan directive now has a state of MISSING_STATS. This state means that the database has automatically determined that extended statistics, in the form of column group statistics, can address this problem. So, if you gather statistics on the customers table, then you should also see that a new column group is automatically created for the cust_state_province and country_id columns.

See Also:

How the Optimizer Uses Extensions and SQL Plan Directives: Example

This example is a continuation of "How the Optimizer Uses SQL Plan Directives: Example". The example shows how the database uses a SQL plan directive until the optimizer verifies that an extension exists and the statistics are applicable. At this point, the directive changes its status to HAS_STATS. Subsequent compilations use the statistics instead of the directive.

Assumptions

This example assumes you have already followed the steps in "How the Optimizer Uses SQL Plan Directives: Example".

To see how the optimizer uses an extension and SQL plan directive: 

  1. Gather statistics for the sh.customers table.

    For example, execute the following PL/SQL program:

    BEGIN
      DBMS_STATS.GATHER_TABLE_STATS('SH','CUSTOMERS');
    END;
    /
    
  2. Check whether an extension exists on the customers table.

    For example, execute the following query (sample output included):

    SELECT TABLE_NAME, EXTENSION_NAME, EXTENSION 
    FROM   DBA_STAT_EXTENSIONS
    WHERE  OWNER='SH' 
    AND    TABLE_NAME='CUSTOMERS';
     
    TABLE_NAM EXTENSION_NAME                 EXTENSION
    --------- ------------------------------ -----------------------------------
    CUSTOMERS SYS_STU#S#WF25Z#QAHIHE#MOFFMM_ ("CUST_STATE_PROVINCE","COUNTRY_ID")
    

    The preceding output indicates that a column group extension exists on the cust_state_province and country_id columns.

  3. Query the state of the SQL plan directive.

    Example 10-10 queries the data dictionary for information about the directive.

    Example 10-10 Display Directives for sh Schema

    EXEC DBMS_SPD.FLUSH_SQL_PLAN_DIRECTIVE;
     
    SELECT TO_CHAR(d.DIRECTIVE_ID) dir_id, o.OWNER, o.OBJECT_NAME, 
           o.SUBOBJECT_NAME col_name, o.OBJECT_TYPE, d.TYPE, d.STATE, d.REASON
    FROM   DBA_SQL_PLAN_DIRECTIVES d, DBA_SQL_PLAN_DIR_OBJECTS o
    WHERE  d.DIRECTIVE_ID=o.DIRECTIVE_ID
    AND    o.OWNER IN ('SH')
    ORDER BY 1,2,3,4,5;
     
    DIR_ID              OWN OBJECT_NA COL_NAME   OBJECT  TYPE             STATE         REASON
    ------------------- --- --------- ---------- ------- ---------------- ------------- --------------------
    1484026771529551585  SH CUSTOMERS COUNTRY_ID  COLUMN DYNAMIC_SAMPLING MISSING_STATS SINGLE TABLE CARDINALITY 
                                                                                        MISESTIMATE
    1484026771529551585  SH CUSTOMERS CUST_STATE_ COLUMN DYNAMIC_SAMPLING MISSING_STATS SINGLE TABLE CARDINALITY 
                                      PROVINCE                                          MISESTIMATE 
    1484026771529551585  SH CUSTOMERS              TABLE DYNAMIC_SAMPLING MISSING_STATS SINGLE TABLE CARDINALITY 
                                                                                        MISESTIMATE
    

    Although column group statistics exist, the directive has a state of MISSING_STATS because the database has not yet recompiled the statement. During the next compilation, the optimizer verifies that the statistics are applicable. If they are applicable, then the status of the directive changes to HAS_STATS. Subsequent compilations use the statistics instead of the directive.

  4. Query the sh.customers table.

    SELECT /*+gather_plan_statistics*/ * 
    FROM   customers 
    WHERE  cust_state_province='CA' 
    AND    country_id='US';
    
  5. Query the plan in the cursor.

    Example 10-11 shows the execution plan (sample output included).

    Example 10-11 Execution Plan

    SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(FORMAT=>'ALLSTATS LAST'));
     
    PLAN_TABLE_OUTPUT
    -------------------------------------
    SQL_ID  b74nw722wjvy3, child number 0
    -------------------------------------
    select /*+gather_plan_statistics*/ * from customers where
    CUST_STATE_PROVINCE='CA' and country_id='US'
     
    Plan hash value: 1683234692
     
    -----------------------------------------------------------------------------------------
    | Id  | Operation         | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
    -----------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT  |           |      1 |        |     29 |00:00:00.01 |      16 |
    |*  1 |  TABLE ACCESS FULL| CUSTOMERS |      1 |     29 |     29 |00:00:00.01 |      16 |
    -----------------------------------------------------------------------------------------
     
    Predicate Information (identified by operation id):
    ---------------------------------------------------
     
       1 - filter(("CUST_STATE_PROVINCE"='CA' AND "COUNTRY_ID"='US'))
     
    Note
    -----
       - dynamic sampling used for this statement (level=2)
       - 1 Sql Plan Directive used for this statement
    

    The Note section shows that the optimizer used the directive and not the extended statistics. During the compilation, the database verified the extended statistics.

  6. Query the state of the SQL plan directive.

    Example 10-12 queries the data dictionary for information about the directive.

    Example 10-12 Display Directives for sh Schema

    EXEC DBMS_SPD.FLUSH_SQL_PLAN_DIRECTIVE;
     
    SELECT TO_CHAR(d.DIRECTIVE_ID) dir_id, o.OWNER, o.OBJECT_NAME, 
           o.SUBOBJECT_NAME col_name, o.OBJECT_TYPE, d.TYPE, d.STATE, d.REASON
    FROM   DBA_SQL_PLAN_DIRECTIVES d, DBA_SQL_PLAN_DIR_OBJECTS o
    WHERE  d.DIRECTIVE_ID=o.DIRECTIVE_ID
    AND    o.OWNER IN ('SH')
    ORDER BY 1,2,3,4,5;
     
    DIR_ID                 OWN OBJECT_NAME   COL_NAME   OBJECT  TYPE             STATE     REASON
    -------------------    --- -----------   ---------- ------- ---------------- --------- ------------------------
    1484026771529551585     SH   CUSTOMERS   COUNTRY_ID  COLUMN DYNAMIC_SAMPLING HAS_STATS SINGLE TABLE CARDINALITY 
                                                                                           MISESTIMATE
    1484026771529551585     SH   CUSTOMERS   CUST_STATE_ COLUMN DYNAMIC_SAMPLING HAS_STATS SINGLE TABLE CARDINALITY 
                                             PROVINCE                                      MISESTIMATE
    1484026771529551585     SH   CUSTOMERS                TABLE DYNAMIC_SAMPLING HAS_STATS SINGLE TABLE CARDINALITY 
                                                                                           MISESTIMATE
    

    The state of the directive, which has changed to HAS_STATS, shows that the database has verified the extended statistics.

  7. Query the sh.customers table again, using a slightly different form of the statement.

    For example, run the following query:

    SELECT /*+gather_plan_statistics*/ /* force reparse */ * 
    FROM   customers 
    WHERE  cust_state_province='CA' 
    AND    country_id='US';
    

    If the cursor is in the shared SQL area, then the database typically shares the cursor. To force a reparse, this step changes the SQL text slightly by adding a comment.

  8. Query the plan in the cursor.

    Example 10-13 shows the execution plan (sample output included).

    Example 10-13 Execution Plan

    SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(FORMAT=>'ALLSTATS LAST'));
     
    PLAN_TABLE_OUTPUT
    -------------------------------------
    SQL_ID  b74nw722wjvy3, child number 0
    -------------------------------------
    select /*+gather_plan_statistics*/ * from customers where
    CUST_STATE_PROVINCE='CA' and country_id='US'
     
    Plan hash value: 1683234692
     
    -----------------------------------------------------------------------------------------
    | Id  | Operation         | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
    -----------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT  |           |      1 |        |     29 |00:00:00.01 |      17 |
    |*  1 |  TABLE ACCESS FULL| CUSTOMERS |      1 |     29 |     29 |00:00:00.01 |      17 |
    -----------------------------------------------------------------------------------------
     
    Predicate Information (identified by operation id):
    ---------------------------------------------------
     
       1 - filter(("CUST_STATE_PROVINCE"='CA' AND "COUNTRY_ID"='US'))
     
    19 rows selected.
    

    The absence of a Note shows that the optimizer used the extended statistics instead of the SQL plan directive. If the directive is not used for 53 weeks, then the database automatically purges it.

See Also:

When the Database Samples Data

In releases earlier than Oracle Database 12c, the database always gathered dynamic statistics (formerly called dynamic sampling) during optimization, and only when a table in the query had no statistics. Starting in Oracle Database 12c, the optimizer automatically decides whether dynamic statistics are useful and which statistics level to use for all SQL statements.

The primary factor in the decision to use dynamic statistics is whether available statistics are sufficient to generate an optimal plan. If statistics are insufficient, then the optimizer uses dynamic statistics.

Automatic dynamic statistics are enabled when any of the following conditions is true:

  • The OPTIMIZER_DYNAMIC_SAMPLING initialization parameter uses its default value, which means that it is not explicitly set.

  • The dynamic statistics level is set to 11 either through the OPTIMIZER_DYNAMIC_SAMPLING initialization parameter or a SQL hint (see "Controlling Dynamic Statistics").

In general, the optimizer uses default statistics rather than dynamic statistics to compute statistics needed during optimizations on tables, indexes, and columns. The optimizer decides whether to use dynamic statistics based on several factors. For example, the database uses automatic dynamic statistics in the following situations:

  • The SQL statement uses parallel execution.

  • A SQL plan directive exists.

  • The SQL statement is known to the database, which means that it was captured in SQL Plan Management or Automatic Workload Repository, or is currently in the shared SQL area.

Figure 10-2 illustrates the process of gathering dynamic statistics.

Figure 10-2 Dynamic Statistics

Description of Figure 10-2 follows
Description of "Figure 10-2 Dynamic Statistics"

As shown in Figure 10-2, the optimizer automatically gathers dynamic statistics in the following cases:

  • Missing statistics

    When tables in a query have no statistics, the optimizer gathers basic statistics on these tables before optimization. Statistics can be missing because the application creates new objects without a follow-up call to DBMS_STATS to gather statistics, or because statistics were locked on an object before statistics were gathered.

    In this case, the statistics are not as high-quality or as complete as the statistics gathered using the DBMS_STATS package. This trade-off is made to limit the impact on the compile time of the statement.

  • Stale statistics

    Statistics gathered by DBMS_STATS can become out-of-date. Typically, statistics are stale when 10% or more of the rows in the table have changed since the last time statistics were gathered.

    For an example of the problem posed by stale statistics, consider a sales table that includes the sales date. After an application inserts new rows, the maximum statistics on the sales date column becomes stale because new rows have a higher sales date than the maximum value seen during the last statistics gathering. For any query that fetches the most recently added sales data, the optimizer assumes that table access will return very few or no rows, which leads to the selection of a suboptimal access path to the sales table (for example, the index on the sales date column), a suboptimal join method (typically a cartesian product), or an inefficient join order. This is commonly known as the out-of-range condition: the value specified in the predicate on the sales date column is outside the column statistics value domain.

  • Insufficient statistics

    Statistics can be insufficient whenever the optimizer estimates the selectivity of predicates (filter or join) or the GROUP BY clause without taking into account correlation between columns, skew in the column data distribution, statistics on expressions, and so on.

    Extended statistics help the optimizer obtain accurate quality cardinality estimates for complex predicate expressions (see "About Statistics on Column Groups"). The optimizer can use dynamic statistics to compensate for the lack of extended statistics or when it cannot use extended statistics, for example, for non-equality predicates.

Note:

The database does not use dynamic statistics for queries that contain the AS OF clause.

See Also:

Oracle Database Reference to learn about the OPTIMIZER_DYNAMIC_SAMPLING initialization parameter

How the Database Samples Data

At the beginning of optimization, when deciding whether a table is a candidate for dynamic statistics, the optimizer checks for the existence of persistent SQL plan directives on the table (see Figure 10-2). For each directive, the optimizer registers a statistics expression that the optimizer computes when it must determine the selectivity of a predicate involving the table.

When sampling is necessary, the database must determine the sample size (see Figure 10-2). Starting in Oracle Database 12c, if the OPTIMIZER_DYNAMIC_SAMPLING initialization parameter is not explicitly set to a value other than 11, then the optimizer automatically decides whether to use dynamic statistics and which level to use.

In Figure 10-2, the database issues a recursive SQL statement to scan a small random sample of the table blocks. The database applies the relevant single-table predicates and joins to estimate predicate selectivities.

The database persists the results of dynamic statistics as sharable statistics. The database can share the results during the SQL compilation of one query with recompilations of the same query. The database can also reuse the results for queries that have the same patterns. If no rows have been inserted, deleted, or updated in the table being sampled, then the use of dynamic statistics is repeatable.

See Also: