14 Attribute Clustering

Attribute clustering is a table-level directive that clusters data in close physical proximity based on the content of certain columns. Storing data that logically belongs together in close physical proximity can greatly reduce the amount of data to be processed and can lead to better performance of certain queries in the workload.

This chapter includes the following sections:

14.1 About Attribute Clustering

An attribute-clustered table stores data in close proximity on disk in an ordered way based on the values of a certain set of columns in the table or a set of columns in the other tables.

You can cluster according to the linear order of specified columns or by using a function that permits multi-dimensional clustering (also known as interleaved clustering). Attribute clustering improves the effectiveness of zone maps, Exadata Storage Indexes, and In-memory min/max pruning. Queries that qualify clustered columns will access only the clustered regions. When attribute clustering is defined on a partitioned table, the clustering applies to all partitions.

Attribute clustering is a directive property of a table. It is not enforced for every DML operation, but only affects direct-path insert operations, data movement, or table creation. Conventional DML operations on the table are not affected by attribute clustering. This means that whatever is done to cluster the data is an operation that is only done on the current working data set. This is in contrast to a manually-applied ORDER BY command, such as what occurs as part of a CTAS operation.

This section contains the following topics:

14.1.1 Methods of Clustering Data

You can cluster data using the following methods:

  • Clustering based on one or more columns of the table on which attribute clustering is defined.

  • Clustering based on one or more columns that are joined with the table on which attribute clustering is defined. Clustering based on joined columns is called join attribute clustering. The tables should be connected through a primary key-foreign key relationship but foreign keys do not have to be enforced.

    Because star queries typically qualify dimension hierarchies, it can be beneficial if fact tables are clustered based on columns (attributes) of one or more dimension tables. With join attribute clustering, you can join one or more dimension tables with a fact table and then cluster the fact table data by dimension hierarchy columns. To cluster a fact table on columns from one or more dimension tables, the join to the dimension tables must be on a primary or unique key of the dimension tables. Join attribute clustering in the context of star queries is also known as hierarchical clustering because the table data is clustered by dimension hierarchies, each made up of an ordered list of hierarchical columns (for example, the nation, state, and city columns forming a location hierarchy).

    Note: In contrast with Oracle Table Clusters, join attribute clustered tables do not store data from a group of tables in the same database blocks. For example, consider an attribute clustered table sales joined with a dimension table products. The sales table will only contain rows from the sales table, but the ordering of the rows will be based on the values of columns joined from products table. The appropriate join will be executed during data movement, direct path insert and CTAS operations.

14.1.2 Types of Attribute Clustering

Attribute clustering is a user-defined table directive that provides data clustering on one or more columns in a table. The directives can be specified when the table is created or modified.

Oracle Database provides the following types of attribute clustering:

Regardless of the type of attribute clustering used, you can either cluster data based on a single table or by joining multiple tables (join attribute clustering).

14.1.2.1 Attribute Clustering with Linear Ordering

Linear ordering stores the data according to the order of specified columns. This is the default type of clustering. For example, linear ordering on the (prod_id, channel_id) columns of the table SALES sorts the data by prod_id first and then by channel_id. The sorted data is stored on disk with the data for clustered columns being in close proximity.

Linear ordering can be defined on single tables or multiple tables that are connected through a primary key-foreign key relationship.

Use the CLUSTERING ... BY LINEAR ORDER directive to perform attribute clustering based on the order of specified columns.

Attribute clustering based on linear ordering of columns is best used in the following scenarios:

  • Queries specify the prefix of columns included in the CLUSTERING clause in a single table

    For example, if queries on sales often specify either a customer ID or a combination of customer ID and product ID, then you could cluster data in the table using the column order cust_id, prod_id.

  • Columns used in the CLUSTERING clause have an acceptable level of cardinality

    The potential data reduction that can be obtained in the scenarios described in "Advantages of Attribute-Clustered Tables" increases in direct proportion to the data reduction obtained from a predicate on a column.

Linear clustering combined with zone maps is very effective in I/O reduction.

14.1.2.2 Attribute Clustering with Interleaved Ordering

Interleaved ordering uses a special multidimensional clustering technique based on Z-order curve fitting. It maps multiple column attribute values (multidimensional data points) to a single one-dimensional value while preserving the multidimensional locality of column values (data points). Interleaved ordering is supported on single tables or multiple tables. Unlike linear ordering, this method does not require the leading columns of the clustering definition to be present to achieve I/O pruning benefits for the scenarios described in "Advantages of Attribute-Clustered Tables".

Columns can be used individually or grouped together into column groups. Each individual column or column group will be used to constitute one of the multidimensional data points in the cluster. Grouped columns are bracketed by '('..')', and must follow the dimensional hierarchy from the coarsest to the finest level of granularity. For example, (product_category, product_subcategory).

Use the CLUSTERING ... BY INTERLEAVED ORDER directive to perform clustering by interleaved ordering.

Interleaved clustering is most beneficial for SQL operations with varying predicates on multiple columns. This is often the case for star queries against a dimensional model, where the query predicates are on dimension tables and the number of predicates vary. Using interleaved join attribute clustering is most common in environments where the fact table is clustered based on columns from the dimension tables. The columns from a dimension table will likely contain a hierarchy, for example, the hierarchy of a product category and sub-category. In this case, clustering of the fact table would occur on dimension columns forming a hierarchy. This is the reason join attribute clustering for star schemas is sometimes referred to as hierarchical clustering. For example, if queries on sales specify columns from different dimensions, then you could cluster data in the sales table according to columns in these dimensions.

Interleaved clustering combined with zone maps is very effective in I/O pruning for star schema queries. In addition, it enables you to provide a very efficient I/O pruning for queries using zone maps, and enhances compression because the same column values are close to each other and can be easily compressed.

14.1.3 Example: Attribute Clustered Table

An example of how a clustered table looks is illustrated in Figure 14-1. Assume you have a table sales with columns (category, country). The table on the left is clustered using linear ordering, and the table on the right is clustered using interleaved ordering. Observe that, in the interleaved-ordered table, there are contiguous regions on disk that contain data with a given category and country.

Figure 14-1 Attribute-Clustered Tables

Description of Figure 14-1 follows
Description of "Figure 14-1 Attribute-Clustered Tables"

14.1.4 Guidelines for Using Attribute Clustering

The following are some considerations when defining an attribute clustered table:

  • Use attribute clustering in combination with zone maps to facilitate zone pruning and its associated I/O reduction.

  • Consider large tables that are frequently queried with predicates on medium to low cardinality columns.

  • Consider fact tables that are frequently queried by dimensional hierarchies.

  • For a partitioned table, consider including columns that correlate with partition keys (to facilitate zone map partition pruning).

  • For linear ordering, list columns in prefix-to-suffix order.

  • Group together columns that form a dimensional hierarchy. This constitutes a column group. Within each column group, list columns in order of coarsest to finest granularity.

  • If there are more than four dimension tables, include the dimensions that are most commonly specified with filters. Limit the number of dimensions to two or three for better clustering effect.

  • Consider using attribute clustering instead of indexes on low to medium cardinality columns.

  • If the primary key of a dimension table is composed of dimension hierarchy values (for example, the primary key is made up of year, quarter, month, day values), make the corresponding foreign key as clustering column instead of dimension hierarchy.

14.1.5 Advantages of Attribute-Clustered Tables

  • Eliminates storage costs associated with using indexes

  • Enables the accessing of clustered regions rather than performing random I/O or full table scans when used in conjunction with zone maps

  • Provides I/O reduction when used in conjunction with any of the following:

    • Oracle Exadata Storage Indexes

    • Oracle In-memory min/max pruning

    • Zone maps

    Attribute clustering provides data clustering based on the attributes that are used as filter predicates. Because both Exadata Storage Indexes and Oracle In-memory min/max pruning track the minimum and maximum values of columns stored in each physical region, clustering reduces the I/O required to access data.

    I/O pruning using zone maps can significantly reduce I/O costs and CPU cost of table scans and index scans.

  • Enables clustering of fact tables based on dimension columns in star schemas

    Techniques such as traditional table clusters do not provide for ordering by columns of other tables. In star schemas, most queries qualify dimension tables and not fact tables, so clustering by fact table columns is not effective. Oracle Database supports clustering on columns in dimension tables.

  • Improves data compression ratios and in this way indirectly improves table scan costs

    Compression can be improved because, with clustering, there is a high probability that clustered columns with the same values are close to each other on disk, hence the database can more easily compress them.

  • Minimizes table lookup and single block I/O operations for index range scan operations when the attribute clustering is on the index selection criteria.

  • Enables I/O reduction in OLTP applications for queries that qualify a prefix in and use attribute clustering with linear order

  • Enables I/O reduction on a subset of the clustering columns for attribute clustering with interleaved ordering

    If table data is ordered on multiple columns, as in an index-organized table, then a query must specify a prefix of the columns to gain I/O savings. In contrast, a BY INTERLEAVED table permits queries to benefit from I/O pruning when they specify columns from multiple tables in a non-prefix order.

14.1.6 About Defining Attribute Clustering for Tables

Attribute clustering information is part of the table metadata. You can define attribute clustering for a table either when table is first created or subsequently, by altering the table definition.

Use the CLUSTERING clause of the CREATE TABLE statement to define attribute clustering for a table. The type of attribute clustering is specified by including BY LINEAR ORDER or BY INTERLEAVED ORDER.

If attribute clustering was not defined when the table was created, you can modify the table definition and add clustering. Use the ALTER TABLE ... ADD CLUSTERING statement to define attribute clustering for an existing table.

14.1.7 About Specifying When Attribute Clustering Must be Performed

Performing clustering may be expensive because it involves reorganization of the table and clustering data during DML operations. Oracle Database does not enforce the clustering of data on conventional DML, conventional insert, update, and merge.

Clustering can be performed in two ways. The first is to automatically perform clustering for certain DML operations on the table. This is done by defining, as part of the table metadata, the operations for which clustering is triggered. The second is to explicitly specify that clustering must be performed as described in "Using Hints to Control Attribute Clustering for DML Operations" and "Overriding Table-level Settings for Attribute Clustering During DDL Operations". In this case, you can perform clustering for a table even if its metadata definition does not include clustering.

As part of the table definition, you can specify that attribute clustering must be performed when the following operations are triggered:

  • Direct-path insert operations

    Set the ON LOAD option to YES to specify that attribute clustering must be performed during direct-path insert operations.

  • Data movement operations

    Set the ON DATA MOVEMENT option to YES to specify clustering must be performed during data movement operations. This includes online table redefinition and the following partition operations: MOVE, MERGE, SPLIT, and COALESCE.

The ON LOAD and ON DATA MOVEMENT options can be included in a CREATE TABLE or ALTER TABLE statement. If neither YES ON LOAD nor YES ON DATA MOVEMENT is specified, then clustering is not enforced automatically.

It will serve only as metadata defining natural clustering of the table that may be used later for zone map creation. In this case, it is up to the user to enforce clustering during loads.

See Also:

"Adding Attribute Clustering to an Existing Table" for an example on using the ON LOAD and ON DATA MOVEMENT options

14.2 Attribute Clustering Operations

This section describes common tasks involving attribute clustering and includes:

14.2.1 Privileges for Attribute-Clustered Tables

To define attribute clustering for a table, you must have the CREATE or ALTER privilege on the table. Additionally, for join attribute clustering, you must also have the SELECT or READ privilege on the joined table or tables.

See Also:

Oracle Database SQL Language Reference for syntax and semantics of the CLUSTERING clause of CREATE TABLE

14.2.2 Creating Attribute-Clustered Tables with Linear Ordering

Linear ordering stores the data according to the order of specified columns, equivalent to an ORDER BY clause. Linear ordering is supported on columns of a single table or multiple tables in a star schema. Examples of Attribute Clustering with Linear Ordering contains examples of attribute-clustered tables with linear ordering.

See Also:

Oracle Database SQL Language Reference for information about attribute clustering restrictions

14.2.2.1 Examples of Attribute Clustering with Linear Ordering

Example 14-1 and Example 14-2 illustrate linear ordering.

Example 14-1 Creating a Table with Linear Ordering

Assume that queries on sales often specify either a customer ID or a combination of a customer ID and product ID. You can create an attribute-clustered table so that such queries benefit from I/O reduction for the scenarios described in "Advantages of Attribute-Clustered Tables".

The following statement creates the sales table with linear ordering:

CREATE TABLE sales (
  prod_id        NUMBER(6) NOT NULL,
  cust_id        NUMBER NOT NULL,
  time_id        DATE NOT NULL,
  channel_id     CHAR(1) NOT NULL,
  promo_id       NUMBER(6) NOT NULL,
  quantity_sold  NUMBER(3) NOT NULL,
  amount_sold    NUMBER(10,2) NOT NULL
)
CLUSTERING
  BY LINEAR ORDER (cust_id, prod_id);

This clustered table is useful for queries containing a predicate on cust_id or predicates on both cust_id and prod_id.

Example 14-2 Creating a Table with Linear Ordering and a Join

Assume that the products dimension table has a unique key or primary key on the prod_id column. Other columns in this table include, but are not limited to, prod_name, prod_desc, prod_category, prod_subcategory, and prod_status. Queries on the my_sales fact table often contain one of the following:

  • a predicate on cust_id

  • predicates on cust_id and prod_category

  • predicates on cust_id, prod_category, and prod_subcategory

Defining attribute clustering for the my_sales table is useful for queries that contain the predicates included in the CLUSTERING clause.

CREATE TABLE my_sales (
  prod_id        NUMBER(6) NOT NULL,
  cust_id        NUMBER NOT NULL,
  time_id        DATE NOT NULL,
  channel_id     CHAR(1) NOT NULL,
  promo_id       NUMBER(6) NOT NULL,
  quantity_sold  NUMBER(3) NOT NULL,
  amount_sold    NUMBER(10,2) NOT NULL
)
CLUSTERING
  my_sales JOIN products ON (my_sales.prod_id = products.prod_id)
  BY LINEAR ORDER (cust_id, prod_category, prod_subcategory);

See Also:

Oracle Database SQL Language Reference for syntax and semantics of the BY LINEAR ORDER clause

14.2.3 Creating Attribute-Clustered Tables with Interleaved Ordering

Interleaved ordering uses a special multidimensional clustering technique similar to a Z-order sort. It is especially beneficial when you have a specific set of predicates that are commonly used most of the time, but do not always use all of them. Interleaved ordering is useful for dimensional hierarchies of star schemas in a data warehouse. "Examples of Attribute Clustering with Interleaved Ordering" contains examples of attribute-clustered tables with interleaved ordering.

See Also:

Oracle Database SQL Language Reference for information about attribute clustering restrictions

14.2.3.1 Examples of Attribute Clustering with Interleaved Ordering

Example 14-3 and Example 14-4 illustrate interleaved ordering.

You can also create an attribute clustered table so that queries benefit from pruning with zone maps. "Creating Zone Maps with Attribute Clustering" contains examples of defining zone maps with attribute clustering.

Example 14-3 Creating a Table with Interleaved Ordering

Assume that queries on sales often specify either a time ID or a combination of time ID and product ID. You can create sales with interleaved attribute clustering using the following command:

CREATE TABLE sales (
  prod_id        NUMBER(6) NOT NULL,
  cust_id        NUMBER NOT NULL,
  time_id        DATE NOT NULL,
  channel_id     CHAR(1) NOT NULL,
  promo_id       NUMBER(6) NOT NULL,
  quantity_sold  NUMBER(3) NOT NULL,
  amount_sold    NUMBER(10,2) NOT NULL
)
CLUSTERING
  BY INTERLEAVED ORDER (time_id, prod_id);

This clustered table is useful for queries containing one of the following:

  • a predicate on time_id

  • a predicate on prod_id

  • predicates on time_id and prod_id

Example 14-4 Creating a Table with Interleaved Ordering and a Join

Large data warehouses frequently organize data in star schemas. A dimension table uses a parent-child hierarchy and is connected to a fact table by a foreign key. Clustering a fact table with interleaved ordering enables the database to use a special function to skip values in dimension columns during table scans. Note that clustering does not require an enforced foreign key relationship. However, Oracle Database does require primary or unique keys on the dimension tables.

The following command defines attribute clustering using interleaved ordering for the sales fact table:

CREATE TABLE sales (
  prod_id        NUMBER(6) NOT NULL,
  cust_id        NUMBER NOT NULL,
  time_id        DATE NOT NULL,
  channel_id     CHAR(1) NOT NULL,
  promo_id       NUMBER(6) NOT NULL,
  quantity_sold  NUMBER(3) NOT NULL,
  amount_sold    NUMBER(10,2) NOT NULL
)
CLUSTERING
  sales JOIN products ON (sales.prod_id = products.prod_id)
  BY INTERLEAVED ORDER ((time_id), (prod_category, prod_subcategory));

This clustered table is useful for queries containing one of the following:

  • a predicate on time_id

  • a predicate on prod_category

  • predicates on prod_category and prod_subcategory

  • predicates on time_id and prod_category

  • predicates on time_id, prod_category, and prod_subcategory

See Also:

Oracle Database SQL Language Reference for information on the CREATE TABLE statement and CLUSTERING clause

14.2.4 Maintaining Attribute Clustering

You can add, drop, and update the attribute clustering definition of a table at any point in time. The modified definition does not affect existing table data, but can only be used as directive for future operations.

The following maintenance operations modify table metadata:

You can also override the attribute clustering definitions on a table at runtime. The maintenance operations that influence attribute clustering behavior at runtime are:

14.2.4.1 Adding Attribute Clustering to an Existing Table

When you create a table with clustering, it is created with a zone map by default. You can, however, explicitly prevent this by using WITHOUT ZONEMAP. This could be done for several reasons, such as wanting to create a zone map on clustering columns plus additional columns that correlate to clustering columns, or to use specific zone map storage options instead of the defaults.

Use the ALTER TABLE ... ADD CLUSTERING command to add attribute clustering to an existing table that does not currently use attribute clustering.

The following command adds attribute clustering to the SALES fact table. The modified table will use interleaved clustering that is based on the joined dimension tables CUSTOMERS and PRODUCTS.

ALTER TABLE sales 
ADD CLUSTERING sales JOIN customers ON (sales.cust_id = customers.cust_id)
                     JOIN products ON (sales.prod_id = products.prod_id)
    BY INTERLEAVED ORDER ((prod_category, prod_subcategory),
                          (country_id, cust_state_province, cust_city))
    YES ON LOAD YES ON DATA MOVEMENT
    WITHOUT MATERLALIZED ZONEMAP;

When you add clustering to a table, the existing data is not clustered. To force the existing data to be clustered, you need to move the content of the table using an ALTER TABLE...MOVE statement. You can do this partition by partition.

The following command clusters data in the sales table:

ALTER TABLE sales MOVE PARTITION sales_1995 UPDATE INDEXES ALLOW CLUSTERING;

For more information about zone maps, see "About Zone Maps".

14.2.4.2 Modifying Attribute Clustering Definitions

Use the ALTER TABLE ... MODIFY CLUSTERING statement to modify when attribute clustering is triggered for a table. Modifying clustering definitions does not affect the existing table data. The modified definitions are applicable only to future data movement or direct-path insert operations.

The following command modifies the clustering definition of the SALES table and enables clustering during data movement.

ALTER TABLE sales MODIFY CLUSTERING YES ON DATA MOVEMENT;

You can also modify a table definition and create or drop a zone map that is based on the attribute clustering. The following statement modifies the definition of the SALES table and adds a zone map:

ALTER TABLE sales MODIFY CLUSTERING WITH MATERIALIZED ZONEMAP;

Use the following statement to modify the definition of the attribute-clustered table SALES and remove zone maps.

ALTER TABLE sales MODIFY CLUSTERING WITHOUT MATERIALIZED ZONEMAP;
14.2.4.3 Dropping Attribute Clustering for an Existing Table

If attribute clustering is defined for an existing table, use the ALTER TABLE ... DROP CLUSTERING statement to remove attribute clustering. Dropping a clustering definition does not have any impact on the existing table data.

The following command removes attribute clustering for the SALES table:

ALTER TABLE sales DROP CLUSTERING;
14.2.4.4 Using Hints to Control Attribute Clustering for DML Operations

You can use hints to enforce the use of clustering or to prevent its use during direct-path insert operations. Use the CLUSTERING hint to enforce clustering for a table and NO_CLUSTERING hint to prevent the use of clustering.

The following command disables attribute clustering while inserting data into the SALES table. This table was created with the YES ON LOAD option.

INSERT /*+ APPEND NO_CLUSTERING */ INTO sales SELECT * FROM external_sales;

See "Controlling the Use of Zone Maps" for more information about hints.

14.2.4.5 Overriding Table-level Settings for Attribute Clustering During DDL Operations

You can override the attribute clustering definition during data movement DDL operations such as partition maintenance that creates new data segments (split or merge operations) or moving a table, partition, or subpartition. For example, if a table was defined using the NO ON DATA MOVEMENT option, then you can cluster data for this table during a data movement operation by using the ALTER TABLE ... ALLOW CLUSTERING statement.

The following command allows clustering during data movement for the sales_2010 partition of the SALES tables that was defined using the NO ON DATA MOVEMENT option:

ALTER TABLE sales MOVE PARTITION sales_2010 UPDATE INDEXES ALLOW CLUSTERING;

Similarly, you can disable clustering during data movement for a table that was defined using the YES ON DATA MOVEMENT option by including the DISALLOW CLUSTERING clause in the ALTER TABLE command that is used to move data.

14.2.4.6 Clustering Table Data During Online Table Redefinition

Online table redefinition enables you to modify the logical or physical structure of a table without significantly affecting its availability. The table is accessible to both queries and DML during much of the redefinition process.

You can redefine a table online and add attribute clustering to a table that did not previously use attribute clustering. The DBMS_REDEFINITION package enables you redefine tables online and add attribute clustering to them.

See Also:

Oracle Database PL/SQL Packages and Types Reference for more information about the DBMS_REDEFINITION package

Example 14-5 Redefining an Attribute-Clustered Table Online

Assume that you want to redefine the sales table to change the data type of amount_sold from a number to a float, add attribute clustering to the table, and cluster the data during online redefinition.

Use the following steps to redefine the sales table in the SH schema and cluster its data during online table redefinition:

  1. Verify that the table can be redefined online by invoking the CAN_REDEF_TABLE procedure of the DBMS_REDEFINITION package.

    The following command verifies that the sales table can be redefined online:

    exec DBMS_REDEFINITION.CAN_REDEF_TABLE('SH','SALES');
    
  2. Create the interim table in the SH schema with the desired physical and logical attributes that you want to use for the redefined table.

    The following command creates the interim table sales_interim. The data type of the amount_sold column is binary_double and the CLUSTERING clause specifies how attribute clustering must be performed.

    CREATE TABLE sales_interim
    (
     PROD_ID        NUMBER(6) PRIMARY KEY,
     CUST_ID        NUMBER NOT NULL,
     TIME_ID        DATE NOT NULL,
     CHANNEL_ID     CHAR(1) NOT NULL,
     PROMO_ID       NUMBER(6),
     QUANTITY_SOLD  NUMBER(3) NOT NULL,
     AMOUNT_SOLD    binary_double
    )
    CLUSTERING sales_interim JOIN customers ON 
              (sales_interim.cust_id = customers.cust_id)
                     JOIN products ON (sales_interim.prod_id = products.prod_id)
        BY INTERLEAVED ORDER ( (prod_category, prod_subcategory),
                                (country_id, cust_state_province, cust_city));
    
  3. Start the online table redefinition process using the DBMS_REDEFINITON.START_REDEF_TABLE procedure. The sales table is available for queries and DML during this process.

    The following command starts the redefinition process for the sales table:

    exec DBMS_REDEFINITION.START_REDEF_TABLE(uname => 'SH',orig_table  => 'SALES', int_table => 'SALES_INTERIM', options_flag => DBMS_REDEFINITION.CONS_USE_ROWID);
     
  4. Optionally synchronize the interim table with the original table.

    Synchronization is recommended if a large number of DML statements may have been executed on the original table after the redefinition was started. This step reduces the time taken to finish the redefinition process.

    The following command synchronizes the sales_interim table with the original sales table:

    exec DBMS_REDEFINITION.SYNC_INTERIM_TABLE('SH', 'SALES', 'SALES_INTERIM');
    
  5. Complete the online table redefinition using the DBMS_REDEFINITION.FINISH_REDEF_TABLE procedure.

    The following command completes the online redefinition of the sales table:

    exec DBMS_REDEFINITION.FINISH_REDEF_TABLE('SH', 'SALES', 'SALES_INTERIM');
    

14.3 Viewing Attribute Clustering Information

Oracle Database provides a set of data dictionary views that contain information about attribute clustering. This section describes how you can use these views to obtain information about attribute clustering.

This section contains the following topics:

14.3.1 Determining if Attribute Clustering is Defined for Tables

The CLUSTERING column in the views DBA_TABLES, USER_TABLES, and ALL_TABLES specifies if attribute clustering is defined for the tables. The CLUSTERING column displays YES if attribute clustering is defined for the table and NO otherwise.

The following query displays the names of tables in the SH schema and indicates if they use attribute clustering.

SELECT TABLE_NAME, CLUSTERING FROM DBA_TABLES WHERE OWNER='SH';

TABLE_NAME      CLUSTERING
-----------    ------------
SALES             YES
PRODUCTS          NO
MY_SALES          YES

14.3.2 Viewing Attribute-Clustering Information for Tables

Use one of the following data dictionary views to obtain details about attribute clustering for tables:

  • DBA_CLUSTERING_TABLES to describe all attribute-clustered tables in the database

  • ALL_CLUSTERING_TABLES to describe attribute-clustered table accessible to the user

  • USER_CLUSTERING_TABLES to describe attribute-clustered tables owned by the user

The following query displays details about the attribute clustering for the SALES table. The details include the type of attribute clustering and the operations for which clustering is enabled for the table. The output has been formatted to fit on the page.

SELECT owner, table_name, clustering_type, on_load, on_datamovement, with_zonemap
FROM DBA_CLUSTERING_TABLES WHERE table_name='SALES';

OWNER    TABLE_NAME CLUSTERING_TYPE  ON_LOAD  ON_DATAMOVEMENT  WITH_ZONEMAP
------   ---------- ---------------  -------- ---------------  -------------
SH       SALES       LINEAR          YES      YES              YES
SELECT owner, table_name, clustering_type, on_load, on_datamovement
FROM DBA_CLUSTERING_TABLES WHERE table_name='SALES';

OWNER    TABLE_NAME CLUSTERING_TYPE  ON_LOAD  ON_DATAMOVEMENT 
------   ---------- ---------------  -------- --------------- 
SH       SALES       LINEAR          YES      YES            

14.3.3 Viewing Information About the Columns on Which Attribute Clustering is Performed

Use one of the following data dictionary views to obtain information about the columns on which attribute clustering is defined for tables:

  • DBA_CLUSTERING_KEYS

  • ALL_CLUSTERING_KEYS

  • USER_CLUSTERING_KEYS

For example, the data in the table SALES is clustered using linear ordering. Use the following command to display the columns on which table is clustered. The output has been formatted to fit in the page.

SELECT detail_owner, detail_name, detail_column, position
FROM DBA_CLUSTERING_KEYS
WHERE table_name='SALES';

DETAIL_OWNER    DETAIL_NAME       DETAIL_COLUMN      POSITION
------------   --------------   -----------------   ---------
SH             SALES             PROD_ID                2       
SH             SALES             TIME_ID                1        

14.3.4 Viewing Information About Dimensions and Joins on Which Attribute Clustering is Performed

To view information about the dimension tables by which a fact table is clustered, query the DBA_CLUSTERING_DIMENSIONS, ALL_CLUSTERING_DIMENSIONS, or USER_CLUSTERING_DIMENSIONS data dictionary views.

To view details about the joins of the fact table and dimension tables, query the DBA_CLUSTERING_JOINS, ALL_CLUSTERING_JOINS, or USER_CLUSTERING_JOINS views. The output has been formatted to fit in the page.

The following query displays the dimension tables by which the fact table SALES is attribute-clustered.

SELECT * FROM DBA_CLUSTERING_DIMENSIONS WHERE table_name='MY_SALES';

OWNER      TABLE_NAME      DIMENSION_OWNER        DIMENSION_NAME
------   --------------   -------------------  ---------------------
SH         MY_SALES           SH                   PRODUCTS

The following query displays the columns used to join the fact table my_sales with dimension table products. The output has been formatted to fit in the page.

SELECT tab1_owner, tab1_name,tab1_column
FROM DBA_CLUSTERING_JOINS 
WHERE table_name='MY_SALES';

TAB1_OWNER       TAB1_NAME     TAB1_COLUMN
-----------   -------------   --------------------
SH               MY_SALES        PROD_ID

14.4 About Automatic Data Clustering

Oracle Database automatically and transparently clusters storage-based data in response to the type of queries used by the application workload. You can make configuration changes to this process.

Automatic data clustering allows the workload to make more efficient use of data access optimizations, such as storage indexes, zone maps, and join zone maps. With little user interaction, the Automatic Data Clustering feature monitors and analyzes workload query execution, recommends the tables to cluster, how to cluster, and what zone maps to create. The impact of clustering is verified before it is implemented.

The steps are as follows:

  • Workload capture.
  • Workload analysis.
  • You generate a clustering and zone map recommendation.
  • You verify the recommendation.
  • You apply the recommendation.

Workload Capture and Analysis

Workload SQL statements can be captured using the automatic SQL tuning set. Over time, workload queries are captured without manual intervention. By default, automatic clustering analyzes a workload captured in this system-maintained SQL tuning set. Alternatively, automatic clustering can be directed to base and analysis and recommendation on a workload captured in any manually populated SQL tuning set.

Clustering and Zone Map Recommendation

Automatic clustering identifies fact and dimension tables and analyzes joins between them using workload SQL statements captured in the automatic SQL tuning set or in a user-specified SQL tuning set. Certain criteria must be met before clustering is considered. For example, fact tables must be a minimum of 64GB, optimizer statistics must be up-to-date, and there should be no existing manual clustering clause or zone map.

The following may be returned to the fact table:
  • a single table or join clustering with dimension tables to join
  • a clustering method (linear or interleaved clustering) with clustering columns and their ordering
  • zone map columns

14.4.1 User Controls for Automatic Data Clustering

The workflow for Automatic Data Clustering is to reconfigure the process, generate a recommendation, verify the recommendation, and apply the recommendation.

Reconfigure the Process

DBMS_AUTO_CLUSTERING.CONFIGURE (
     parameter_name      IN VARCHAR2, 
     parameter_value     IN VARCHAR2,
     allow               IN BOOLEAN DEFAULT TRUE);
Example:
exec dbms_auto_clustering.configure('AUTO_CLUSTERING_SCHEMA', 'DW_SCHEMA', allow=> TRUE);

select parameter_name,parameter_value 
from   DBA_AUTO_CLUSTERING_CONFIG
where  parameter_name = 'AUTO_CLUSTERING_SCHEMA_INCLUDE';

PARAMETER_NAME                                               PARAMETER_VALUE
------------------------------------------------------------ ---------------------
AUTO_CLUSTERING_SCHEMA_INCLUDE                               DW_SCHEMA

Generate Recommendations

DBMS_AUTO_CLUSTERING.RECOMMEND_CLUSTERING_METHOD (
     sqlset_owner      IN  VARCHAR2 DEFAULT ‘SYS’,    
     sqlset_name       IN  VARCHAR2 DEFAULT ‘SYS_AUTO_STS’, 
     table_owner       IN  VARCHAR2 DEFAULT  NULL,    
     table_name        IN  VARCHAR2 DEFAULT  NULL, 
     report_type       IN  VARCHAR2 DEFAULT ‘TEXT’,
     report_section    IN  VARCHAR2 DEFAULT ‘SUMMARY’,
     report_level      IN  VARCHAR2 DEFAULT ‘TYPICAL’,
     verification      IN  BOOLEAN  DEFAULT  TRUE)
RETURN RAW; 
Example:

set serveroutput on
declare 
   rec_id raw(16);
begin
     rec_id := dbms_auto_clustering.recommend_clustering_method(table_owner => user,
               table_name => 'MYFACT', 
               verification=>false, 
               report_level=>'ALL', report_section=>'ALL');
     dbms_output.put_line('Rec ID: '||rec_id);
end;
/
Rec ID: F76780956B8AACA1E053624F46641292

set linesize 200
set pagesize 1000
column table_name format a20
column clustering_ddl format a80
column zonemap_ddl format a80
select table_name,clustering_ddl,zonemap_ddl,status
from  dba_auto_clustering_recommendations
where recommendation_id = 'F767C7BD8F92BF4AE053624F4664E9A2';

TABLE_NAME   CLUSTERING_DDL                                      ZONEMAP_DDL                                                       STATUS
------------ --------------------------------------------------- ----------------------------------------------------------------- ----------
MYFACT       ALTER TABLE "DW_SCHEMA"."MYFACT" ADD CLUSTERING  …  CREATE MATERIALIZED ZONEMAP "DW_SCHEMA"."ACZMAP$_F" AS SELECT …   UNVERIFIED
 

Verify Recommendations

The verification step measures compares the performance of the application workload with and without the recommended clustering method and zone maps. If performance is improved, the recommendations are marked as VERIFIED and they can be accepted. If performance is not improved significantly, the recommendation is marked REJECTED.

DBMS_AUTO_CLUSTERING.VERIFY_RECOMMENDATION (
     recommendation_id,    
     table_owner,    
     table_name );
Example:
exec dbms_auto_clustering.verify_recommendation(' F76780956B8AACA1E053624F46641292')

set long 10000
column report format a100
var report clob
begin
   :report := dbms_auto_clustering.report_last_activity(
                 type=>'TEXT',
                 section=>'ALL',
                 level=>'ALL');
end;
/

select :report report
from dual;

REPORT
----------------------------------------------------------------------------------------------------
GENERAL INFORMATION
-------------------------------------------------------------------------------
 Activity start             : <Date>
 Activity end               : <Date>
 Recommendations generated  : 1
 Recommendation verified    : 1
 Recommendation applied     : 0
-------------------------------------------------------------------------------

CLUSTERING DETAILS
-------------------------------------------------------------------------------
1. The following clusterings were created:
-------------------------------------------------------------------------------
------------------------------------------------------------------------
| Owner     | Table      | Recommendation ID                | Status   | Method |
------------------------------------------------------------------------
| DW_SCHEMA | MYFACT     | F76780956B8AACA1E053624F46641292 | ACCEPTED | SINGLE |
-------------------------------------------------------------------------------
* Clustering DDLs

ALTER TABLE "ADHOC"."F" ADD CLUSTERING BY ORDER ( "DW_SCHEMA"."MYFACT"."Y")

-------------------------------------------------------------------------------

VERIFICATION DETAILS

Apply Recommendations

DBMS_AUTO_CLUSTERING.APPLY_RECOMMENDATION (
     table_owner,    
     table_name,
     recommendation_id, 
     clustering_ddl, 
     zonemap_ddl,
     apply_mode ); 
Example:
declare
  rec_id raw(16) := null;
  begin	
    rec_id := dbms_auto_clustering.get_recommendation('DW_SCHEMA', 'MYFACT');
    dbms_auto_clustering.apply_recommendation(rec_id, 'DW_SCHEMA', 'MYFACT');
  end;
/

select table_name,
       clustering_type,
       on_load,
       on_datamovement,
       valid,
       with_zonemap,
       automatic 
from user_clustering_tables;

TABLE_NAME           CLUSTERING_ ON_ ON_ VAL WIT AUT
-------------------- ----------- --- --- --- --- ---
MYFACT               LINEAR      YES YES YES NO  YES

select zonemap_name,
       query,
       automatic 
from user_zonemaps;

ZONEMAP_NAME                   QUERY                                       AUTOMATIC
------------------------------ ------------------------------------------- ---------
ACZMAP$_MYFACT                 SELECT SYS_OP_ZONE_ID("MYFACT"."ROWID") …   YES

See Also:

  • In the Oracle Database Reference, the view DBA_AUTO_CLUSTERING_CONFIG displays the current configuration parameter settings for automatic clustering.. DBA_AUTO_CLUSTERING_RECOMMENDATIONS displays recommendations associated with automatic clustering.
  • The Oracle Database PL/SQL Packages and Types Reference describes the DBMS_AUTO_CLUSTERING package, which includes the APIs for configuring automatic data clustering as well as generating, verifying, and applying recommendations.