5
Selecting an Index Strategy

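This chapter discusses the procedures necessary to create and manage the different types of objects contained in a user's schema. The topics include:

Managing Indexes
Function-Based Indexes
Managing Clusters, Clustered Tables, and Cluster Indexes
Managing Hash Clusters and Clustered Tables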

Managing Indexes

Indexes are used in Oracle to provide quick access to rows in a table. Indexes provide faster access to data for operations that return a small portion of a table's rows.

Oracle does not limit the number of indexes you can create on a table. However, you should consider the performance benefits of indexes and the needs of your database applications to determine which columns to index.

The following sections explain how to create, alter, and drop indexes using SQL commands, and include some simple guidelines for managing indexes.

See Also:
See Oracle8i Designing and Tuning for Performance for performance implications of index creation.

Create Indexes After Inserting Table Data

With one notable exception, you should usually create indexes after you have inserted or loaded (using SQL*Loader or Import) data into a table. It is more efficient to insert rows of data into a table that has no indexes and create the indexes later. If you create indexes before table data is loaded, then every index must be updated every time you insert a row into the table. The exception to this rule is that you must create an index for a cluster before you insert any data into the cluster.

When you create an index on a table that already has data, Oracle must use sort space to create the index. Oracle uses the sort space in memory allocated for the creator of the index (the amount per user is determined by the initialization parameter SORT_AREA_SIZE) and, when the sort does not fit in memory, also swaps sort information to and from temporary segments allocated on behalf of the index creation. If the index is extremely large, it might be beneficial to complete the following steps:

Create a new temporary tablespace using the CREATE TABLESPACE command.
Use the TEMPORARY TABLESPACE option of the ALTER USER command to make this your new temporary tablespace.
Create the index using the CREATE INDEX command.
Use the ALTER USER command to reset your temporary tablespace to your original temporary tablespace. Then drop the new tablespace using the DROP TABLESPACE command.

Under certain conditions, you can load data into a table with the SQL*Loader "direct path load", and an index can be created as data is loaded.

See Also:
Oracle8i Utilities

Index the Correct Tables and Columns

Use the following guidelines for determining when to create an index:

Create an index if you frequently want to retrieve less than 15% of the rows in a large table. The percentage varies greatly according to the relative speed of a table scan and how closely the row data is clustered on the index key. The faster the table scan, the lower the percentage; the more clustered the row data, the higher the percentage.
Index columns used for joins to improve performance on joins of multiple tables.

See Also:
Primary and unique keys automatically have indexes, but you might want to create an index on a foreign key; see Chapter 4, "Maintaining Data Integrity" for more information.
Small tables do not require indexes; if a query is taking too long, then the table might have grown from small to large.
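The 15% guideline above can be expressed as a simple check. The following Python sketch is illustrative only: the function name worth_indexing is hypothetical, and the default threshold is the rough figure from the text, which you would adjust for the relative speed of table scans and the clustering of the data.

```python
# Rough rule of thumb from the text: consider an index when queries typically
# return less than about 15% of the rows of a large table. The threshold is a
# tunable assumption, not a fixed Oracle constant.

def worth_indexing(rows_returned: int, total_rows: int, threshold: float = 0.15) -> bool:
    """Return True if the fraction of rows a query returns is under the threshold."""
    if total_rows <= 0:
        raise ValueError("total_rows must be positive")
    return rows_returned / total_rows < threshold


# Lower the threshold when full table scans are fast; raise it when the rows
# are well clustered on the index key.
print(worth_indexing(5_000, 100_000))   # True  (5% of the rows)
print(worth_indexing(40_000, 100_000))  # False (40% of the rows)
```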

Columns with one or more of the following characteristics are strong candidates for indexing:

Values are relatively unique in the column.
There is a wide range of values.
The column contains many nulls, but queries often select all rows having a value. In this case, the following predicate:
```
WHERE COL_X > -9.99 *power(10,125)
```
is preferable to
```
WHERE COL_X IS NOT NULL
```
The first predicate is preferable because the optimizer can use an index on COL_X for it (assuming that COL_X is a numeric column).
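The following sketch uses Python's built-in sqlite3 module as a stand-in for Oracle to show that the range predicate selects exactly the rows that have a value, because a comparison with a null is never true. It does not demonstrate Oracle's index behavior, and it writes the bound as the literal -9.99e125 because SQLite does not necessarily provide a POWER function. The table T and index T_COL_X are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col_x REAL)")
conn.executemany("INSERT INTO t VALUES (?)",
                 [(None,), (None,), (-5.0,), (0.0,), (1e100,), (None,)])
conn.execute("CREATE INDEX t_col_x ON t (col_x)")

# A comparison with NULL is never true, so this range predicate returns
# exactly the rows whose COL_X has a value.
range_rows = conn.execute(
    "SELECT col_x FROM t WHERE col_x > -9.99e125 ORDER BY col_x").fetchall()
not_null_rows = conn.execute(
    "SELECT col_x FROM t WHERE col_x IS NOT NULL ORDER BY col_x").fetchall()

print(range_rows == not_null_rows)  # True
print(len(range_rows))              # 3
```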

Columns with the following characteristics are less suitable for indexing:

The column has few distinct values (for example, a column for the sex of employees).
There are many nulls in the column and you do not search on the non-null values.
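To check a column against these guidelines, you can profile a sample of its values. This Python sketch uses a hypothetical column_profile helper that reports the fraction of nulls and the ratio of distinct non-null values to non-null rows; it deliberately applies no cutoffs, because the guidelines above are qualitative.

```python
from typing import Optional, Sequence


def column_profile(values: Sequence[Optional[object]]) -> dict:
    """Summarize a column sample: null fraction and distinct non-null ratio."""
    total = len(values)
    non_null = [v for v in values if v is not None]
    distinct = len(set(non_null))
    return {
        "null_fraction": (total - len(non_null)) / total if total else 0.0,
        # Near 1.0 means the values are relatively unique; near 0 means few
        # distinct values (for example, a column for the sex of employees).
        "distinct_ratio": distinct / len(non_null) if non_null else 0.0,
    }


empno = list(range(1000))   # hypothetical employee numbers
sex = ["M", "F"] * 500      # hypothetical low-cardinality column
print(column_profile(empno)["distinct_ratio"])  # 1.0
print(column_profile(sex)["distinct_ratio"])    # 0.002
```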

LONG and LONG RAW columns cannot be indexed.

The size of a single index entry cannot exceed roughly one-half (minus some overhead) of the available space in the data block. Consult with the database administrator for assistance in determining the space required by an index.

Limit the Number of Indexes per Table

A table can have any number of indexes. However, the more indexes, the more overhead is incurred as the table is altered. When rows are inserted or deleted, all indexes on the table must be updated. When a column is updated, all indexes on the column must be updated.

Thus, there is a trade-off between speed of retrieval for queries on a table and speed of accomplishing updates on the table. For example, if a table is primarily read-only, then more indexes might be useful; but, if a table is heavily updated, then fewer indexes might be preferable.
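The maintenance rule described above (inserts and deletes touch every index; updates touch only the indexes on the changed columns) can be sketched as follows. The helper indexes_to_maintain and the index definitions are hypothetical examples, not Oracle data dictionary objects.

```python
from typing import Dict, Iterable, List, Set


def indexes_to_maintain(operation: str,
                        indexes: Dict[str, List[str]],
                        updated_columns: Iterable[str] = ()) -> Set[str]:
    """Return the names of the indexes that one DML operation must update.

    INSERT and DELETE touch every index on the table; UPDATE touches only
    the indexes that contain at least one updated column.
    """
    op = operation.upper()
    if op in ("INSERT", "DELETE"):
        return set(indexes)
    if op == "UPDATE":
        changed = set(updated_columns)
        return {name for name, cols in indexes.items() if changed & set(cols)}
    raise ValueError(f"unsupported operation: {operation}")


# Hypothetical indexes on EMP_TAB: index name -> indexed columns.
emp_indexes = {
    "emp_ename": ["ename"],
    "emp_deptno": ["deptno"],
    "emp_dept_sal": ["deptno", "sal"],
}
print(indexes_to_maintain("INSERT", emp_indexes))          # all three indexes
print(indexes_to_maintain("UPDATE", emp_indexes, ["sal"])) # {'emp_dept_sal'}
```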

Order Index Columns for Performance

The order in which columns are named in the CREATE INDEX command does not need to correspond to the order in which they appear in the table. However, the order of columns in the CREATE INDEX statement is significant because query performance can be affected by the order chosen. In general, you should put the column expected to be used most often first in the index.

For example, assume the columns of the VENDOR_PARTS table are as shown in Figure 5-1.

Figure 5-1 The VENDOR_PARTS Table

Assume that there are five vendors, and each vendor has about 1000 parts.

Suppose that the VENDOR_PARTS table is commonly queried by SQL statements such as the following:

SELECT * FROM vendor_parts
    WHERE part_no = 457 AND vendor_id = 1012;

To increase the performance of such queries, you might create a composite index putting the most selective column first; that is, the column with the most distinct values:

CREATE INDEX ind_vendor_id
    ON vendor_parts (part_no, vendor_id);

Indexes speed retrieval on any query using the leading portion of the index. So in the above example, queries with WHERE clauses using only the PART_NO column also benefit. Because VENDOR_ID has only five distinct values, a separate index on VENDOR_ID would serve no purpose.
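The leading-portion rule can be observed with SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module. This is a sketch under the assumption that SQLite's planner, which differs from Oracle's, behaves typically for an unanalyzed table; the DESCRIPTION column is a hypothetical addition so that the index does not cover SELECT *.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE vendor_parts (
                   vendor_id   INTEGER,
                   part_no     INTEGER,
                   description TEXT)""")
conn.execute("CREATE INDEX ind_vendor_id ON vendor_parts (part_no, vendor_id)")


def plan(sql: str) -> str:
    """Join the 'detail' column (the last one) of EXPLAIN QUERY PLAN output."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


both = plan("SELECT * FROM vendor_parts WHERE part_no = 457 AND vendor_id = 1012")
leading = plan("SELECT * FROM vendor_parts WHERE part_no = 457")
trailing = plan("SELECT * FROM vendor_parts WHERE vendor_id = 1012")

print(both)      # both key columns: a SEARCH using ind_vendor_id
print(leading)   # leading column alone can still use the index
print(trailing)  # no leading column: the planner falls back to a SCAN
```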

Creating Indexes

You can create an index for a table to improve the performance of queries issued against the corresponding table. You can also create an index for a cluster. A composite index can include up to 32 columns. A composite index key cannot exceed roughly one-half (minus some overhead) of the available space in the data block.

Oracle automatically creates an index to enforce a UNIQUE or PRIMARY KEY integrity constraint. In general, it is better to create such constraints to enforce uniqueness and not explicitly use the obsolete CREATE UNIQUE INDEX syntax.

Use the SQL command CREATE INDEX to create an index. For example, the following statement creates an index named EMP_ENAME on the ENAME column of the EMP_TAB table:

CREATE INDEX emp_ename ON Emp_tab(ename)
    TABLESPACE users
    STORAGE (INITIAL     20K
             NEXT        20K
             PCTINCREASE 75)
    PCTFREE  0;

Notice that several storage settings are explicitly specified for the index.
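With PCTINCREASE 75, each extent after the second is 75% larger than the one before it. The Python sketch below computes the nominal sizes for the statement above; it is a simplified model that ignores Oracle's rounding of extents up to whole data blocks, so actual allocations can be somewhat larger.

```python
from typing import List


def nominal_extent_sizes_kb(initial_kb: float, next_kb: float,
                            pctincrease: float, count: int) -> List[float]:
    """Nominal extent sizes before rounding to whole blocks.

    Extent 1 is INITIAL, extent 2 is NEXT, and each later extent is the
    previous one grown by PCTINCREASE percent.
    """
    sizes: List[float] = []
    for n in range(1, count + 1):
        if n == 1:
            sizes.append(initial_kb)
        elif n == 2:
            sizes.append(next_kb)
        else:
            sizes.append(sizes[-1] * (1 + pctincrease / 100))
    return sizes


# INITIAL 20K, NEXT 20K, PCTINCREASE 75, first four extents
print(nominal_extent_sizes_kb(20, 20, 75, 4))  # [20, 20, 35.0, 61.25]
```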

Privileges Required to Create an Index

To create a new index, you must own, or have the INDEX object privilege for, the corresponding table. The schema that contains the index must also have a quota for the tablespace intended to contain the index, or the UNLIMITED TABLESPACE system privilege. To create an index in another user's schema, you must have the CREATE ANY INDEX system privilege.

Dropping Indexes

You might drop an index for the following reasons:

The index is not providing anticipated performance improvements for queries issued against the associated table (the table is very small, or there are many rows in the table but very few index entries, etc.).
Applications do not contain queries that use the index.
The index is no longer needed.
The index must be dropped before it can be rebuilt.

When you drop an index, all extents of the index's segment are returned to the containing tablespace and become available for other objects in the tablespace.

Use the SQL command DROP INDEX to drop an index. For example, to drop the EMP_ENAME index, enter the following statement:

DROP INDEX Emp_ename;

If you drop a table, then all associated indexes are dropped.

Privileges Required to Drop an Index

To drop an index, the index must be contained in your schema or you must have the DROP ANY INDEX system privilege.

Function-Based Indexes

A function-based index is an index built on an expression. It extends your indexing capabilities beyond indexing on a column. A function-based index increases the variety of ways in which you can access data.

Note:
You can create function-based indexes only in Oracle8i or a later release.

The expression used in a function-based index can be an arithmetic expression or an expression that contains a PL/SQL function, package function, C callout, or SQL function. Function-based indexes also support linguistic sorts based on linguistic sort keys (collation), efficient linguistic collation of SQL statements, and case-insensitive sorts.

Like other indexes, function-based indexes improve query performance. For example, if you need to access a computationally complex expression often, then you can store it in an index. Then when you need to access the expression, it is already computed. You can find a detailed description of the advantages of function-based indexes in "Using Function-Based Indexes".

Function-based indexes have all of the same properties as indexes on columns. However, unlike indexes on columns, which can be used by both cost-based and rule-based optimization, function-based indexes can be used only by cost-based optimization. Other restrictions on function-based indexes are described in "Requirements and Restrictions for Function-Based Indexes".

See Also:
For more information on function-based indexes, see Oracle8i Concepts. For information on creating function-based indexes, see Oracle8i Administrator's Guide.

Using Function-Based Indexes

The following list describes the advantages of function-based indexes in greater detail:

Increase the number of situations where the optimizer can perform a range scan instead of a full table scan. For example, consider the expression in the WHERE clause of the following query:
```
CREATE INDEX Idx ON Example_tab(Column_a + Column_b);
SELECT * FROM Example_tab WHERE Column_a + Column_b < 10; 
```
In the CREATE INDEX statement, idx is the name of the index, Example_tab is the name of the table, and column_a and column_b represent columns. The optimizer can use a range scan for this query because the index is built on (column_a + column_b). Range scans typically produce fast response times if the predicate has low selectivity (that is, if the predicate selects less than 15% of the rows of a large table). In addition, the optimizer can estimate selectivity of predicates involving expressions more accurately if the expressions are materialized in a function-based index (expressions of function-based indexes are represented as virtual columns and ANALYZE can build histograms on such columns).
Precompute the value of a computationally intensive function and store it in the index. If you have a computationally intensive expression that you access often, then you can store it in an index. When you need to access it, the value is already computed. This can greatly improve query execution performance.
Create indexes on object columns and REF columns. Methods that describe objects can be used as functions on which to build indexes. For example, you can use the MAP method to build indexes on an object type column.

Create more powerful sorts. You can perform case-insensitive sorts with the UPPER and LOWER functions, descending order sorts with the DESC keyword, and linguistic-based sorts with the NLSSORT function.
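The first advantage in the list, a range scan on an indexed expression, can be tried out in SQLite, which also supports indexes on expressions (since SQLite 3.9.0). The sketch below assumes a Python build whose bundled SQLite is at least that version; SQLite's optimizer is not Oracle's, so treat the plan as an illustration of the idea rather than of Oracle behavior. The sample data is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE example_tab (column_a INTEGER, column_b INTEGER)")
conn.executemany("INSERT INTO example_tab VALUES (?, ?)",
                 [(i, i % 7) for i in range(100)])

# Index on an expression rather than on a plain column.
conn.execute("CREATE INDEX idx ON example_tab (column_a + column_b)")

# The WHERE clause repeats the indexed expression exactly, so the planner
# can use a range search on idx instead of scanning the table.
query = "SELECT * FROM example_tab WHERE column_a + column_b < 10"
detail = " | ".join(r[-1] for r in conn.execute("EXPLAIN QUERY PLAN " + query))
rows = conn.execute(query).fetchall()

print(detail)     # plan text mentions index idx
print(len(rows))  # 7
```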

Note:
The DESC keyword in the CREATE INDEX statement is no longer ignored. Oracle sorts columns with the DESC keyword in descending order. Such indexes are treated as function-based indexes. Descending indexes cannot be bitmapped or reverse, and cannot be used in bitmapped optimizations. To get the pre-Oracle 8.1 release DESC functionality, remove the DESC keyword from the CREATE INDEX statement.

See Also:
For examples of how to use function-based indexes, see the Oracle8i Administrator's Guide.

Example

As an example, consider a weather research institute that maintains tables of weather data for various cities. Some of their projects include tracking daily temperature fluctuations throughout the year. Other projects include tracking fluctuations as a function of the city's distance from the equator. By building indexes on the complex functions that they want to calculate, the institute can optimize the execution of the queries they submit. The following section contains examples of indexes that could be created and the queries that could use them.

The table, Weatherdata_tab, contains columns for the minimum daily temperature (Mintemp), maximum daily temperature (Maxtemp), the day the temperature was recorded (Day), and the Region (Region_Obj). Region_Obj is an object column that contains columns for country (Country) and city (Cityname). Figure 5-2 illustrates the Weatherdata_tab schema.

Figure 5-2 WEATHERDATA_TAB Schema Design

An index is created that calculates the difference in temperature for the cities in the tables. A query that could use the delta_index index returns the contents of the table for temperature differences less than 20:

Note:
You may need to set up data structures similar to the following for certain examples to work:
CREATE OR REPLACE FUNCTION distance_from_equator(input NUMBER) RETURN NUMBER
    DETERMINISTIC IS
    distance NUMBER;
BEGIN
    distance := 100000;
    RETURN (distance);
END;

CREATE INDEX Delta_index 
ON Weatherdata_tab (Maxtemp - Mintemp);

SELECT * 
FROM Weatherdata_tab 
WHERE (Maxtemp - Mintemp) < 20;

An index is created that calls the object method distance_from_equator to calculate the distance from the equator for each city in the table. The method is applied to the object column Region_Obj. A query that could use the distance_index index returns the names of the cities that are at a distance greater than 1000 miles from the equator:

CREATE INDEX Distance_index 
ON Weatherdata_tab (Distance_from_equator (Region_Obj));

SELECT * 
FROM Weatherdata_tab 
WHERE (Distance_from_equator (Region_Obj)) > 1000;

An index is created to satisfy queries from German-speaking users that sort temperature data by city name. A query that could use the City_index index returns the contents of the table ordered by city name, using the German sort order. Note that the SELECT statement does not need to call NLSSORT explicitly: in a German session, NLS_SORT is set to German and NLS_COMP is set to ANSI, so ORDER BY Cityname sorts by the German collation. The WHERE Cityname IS NOT NULL predicate ensures that the query cannot fetch null values, which are not stored in the index.

CREATE INDEX City_index 
ON Weatherdata_tab (NLSSORT(Cityname, 'NLS_SORT=German'));

SELECT * 
FROM Weatherdata_tab WHERE Cityname IS NOT NULL
ORDER BY Cityname;

An index is created on the difference between the maximum and minimum temperatures, and on the maximum temperature. The result of the difference is sorted in descending order. A query that could use the compare_index index returns the contents of the table that satisfy the condition where the difference is less than 20 and the maximum temperature is greater than 75.

CREATE INDEX compare_index 
ON Weatherdata_tab ((Maxtemp - Mintemp) DESC, Maxtemp);

SELECT * 
FROM Weatherdata_tab WHERE ((Maxtemp - Mintemp) < 20 AND Maxtemp > 75);

Example Function-Based Indexes

Example 1:

The following command creates a function-based index IDX on table EMP_TAB, for efficient case-insensitive searches.

CREATE INDEX Idx ON Emp_tab (UPPER(Ename));

The SELECT command uses the function-based index on UPPER(Ename) to return all of the employees whose names match the pattern supplied in the bind variable :KEYCOL.

SELECT * 
FROM Emp_tab 
WHERE UPPER(Ename) LIKE :KEYCOL;
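A similar case-insensitive lookup can be sketched with SQLite through Python's sqlite3 module. The sketch uses an equality predicate instead of LIKE, because SQLite's rules for using an index with LIKE differ from Oracle's; also note that SQLite's UPPER folds only ASCII letters. The index name emp_upper_ename and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp_tab (empno INTEGER PRIMARY KEY, ename TEXT)")
conn.executemany("INSERT INTO emp_tab (ename) VALUES (?)",
                 [("Smith",), ("SMITH",), ("smithers",), ("Jones",)])

# Function-based (expression) index for case-insensitive searches.
conn.execute("CREATE INDEX emp_upper_ename ON emp_tab (UPPER(ename))")

sql = "SELECT ename FROM emp_tab WHERE UPPER(ename) = ? ORDER BY ename"
names = [r[0] for r in conn.execute(sql, ("SMITH",))]
detail = " | ".join(
    r[-1] for r in conn.execute("EXPLAIN QUERY PLAN " + sql, ("SMITH",)))

print(names)   # ['SMITH', 'Smith']
print(detail)  # plan text mentions emp_upper_ename
```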

Example 2:

The following command creates a function-based index IDX on table Fbi_tab where A, B, and C represent columns.

CREATE INDEX Idx
ON Fbi_tab (A + B * (C - 1), A, B);

The SELECT statement can use either an index range scan (the expression is a prefix of index IDX) or an index fast full scan (which may be preferable if the index has been given a high degree of parallelism).

SELECT a 
FROM Fbi_tab 
WHERE A + B * (C - 1) < 100;

Example 3:

This example demonstrates how a function-based index can be used to support an NLS sort index. Given a string, the NLSSORT function returns a sort key. The following CREATE INDEX statement creates an index on table NLS_TAB that uses the GERMAN collation sequence.

CREATE INDEX Nls_index 
ON Nls_tab (NLSSORT(Name, 'NLS_SORT = German'));

The SELECT statement selects all of the contents of the table and orders it by NAME. The rows are ordered using the German collation sequence.

SELECT * 
FROM Nls_tab WHERE Name IS NOT NULL
ORDER BY Name;

Requirements and Restrictions for Function-Based Indexes

Note the following requirements and restrictions for function-based indexes:

Only cost-based optimization can use function-based indexes.
A PL/SQL function, either a top-level function or a package-level function, used in the index expression must be declared as DETERMINISTIC. Oracle does not check whether a subprogram declared as DETERMINISTIC really is deterministic; you must ensure that it is.

The following semantic rules demonstrate how to use the keyword DETERMINISTIC:
- A top level subprogram can be declared as DETERMINISTIC.
- A PACKAGE level subprogram can be declared as DETERMINISTIC in the PACKAGE specification but not in the PACKAGE BODY. Errors are raised if DETERMINISTIC is used inside a PACKAGE BODY.
- A private subprogram (declared inside another subprogram or a PACKAGE BODY) cannot be declared as DETERMINISTIC.
- A DETERMINISTIC subprogram can call another subprogram whether the called program is declared as DETERMINISTIC or not.
Function-based indexes cannot be built on LOB columns, nested tables, or varrays.
Expressions used in a function-based index should reference only columns in a row in the table. Hence, these expressions cannot contain any aggregate functions.
You must set the initialization parameters COMPATIBLE to 8.1.0.0.0 or higher, QUERY_REWRITE_ENABLED to TRUE, and QUERY_REWRITE_INTEGRITY to TRUSTED.
You must analyze the table or index before the index is used.
Bitmap optimizations cannot use descending indexes.
Function-based indexes are not used when OR-expansion is done.
The index function cannot be marked NOT NULL. To avoid a full table scan, you must ensure that the query cannot fetch null values.
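The DETERMINISTIC requirement exists because the index stores each expression value when the row is indexed, not when a query runs. The following Python analogy (not Oracle code; all names are hypothetical) builds a precomputed lookup table from a deterministic function and from one that also reads outside state, then shows that the second table's stored keys go stale.

```python
from typing import Callable, Dict, List


def build_expression_index(rows: List[dict],
                           fn: Callable[[dict], object]) -> Dict[object, List[int]]:
    """Precompute fn(row) once per row, as a function-based index does."""
    index: Dict[object, List[int]] = {}
    for pos, row in enumerate(rows):
        index.setdefault(fn(row), []).append(pos)
    return index


rows = [{"maxtemp": 80, "mintemp": 65}, {"maxtemp": 70, "mintemp": 40}]


def delta(row: dict) -> int:
    # Deterministic: the result depends only on the row.
    return row["maxtemp"] - row["mintemp"]


offset = 0  # hypothetical outside state


def shifted_delta(row: dict) -> int:
    # NOT deterministic: the result also depends on the global offset.
    return row["maxtemp"] - row["mintemp"] + offset


good = build_expression_index(rows, delta)
bad = build_expression_index(rows, shifted_delta)

offset = 5  # outside state changes after the "index" was built
print(sorted(good) == sorted(delta(r) for r in rows))         # True
print(sorted(bad) == sorted(shifted_delta(r) for r in rows))  # False: stale keys
```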

Function-based indexes that return VARCHAR2 or RAW data types from a PL/SQL function are not permitted due to length restrictions. A possible workaround is to use SUBSTR to limit the size of the function's output. For example:

Note:
You may need to set up data structures similar to the following for certain examples to work:
CREATE OR REPLACE FUNCTION x(input IN VARCHAR2) RETURN VARCHAR2 AS
    output VARCHAR2(12);
BEGIN
    output := input;
    RETURN (output);
END;

SELECT SUBSTR(x('hello'),1,100) FROM DUAL;

SUBSTR (F(X), 1, 100)

Here, F(X) represents the PL/SQL function. You must use the same SUBSTR expression both when creating the index and when referencing the function in queries.

Managing Clusters, Clustered Tables, and Cluster Indexes

Because clusters store related rows of different tables together in the same data blocks, two primary benefits are achieved when clusters are properly used:

Disk I/O is reduced and access time improves for joins of clustered tables.
In a cluster, a cluster key value (the related value) is only stored once, no matter how many rows of different tables contain the value. Therefore, less storage may be required to store related table data in a cluster than is necessary in non-clustered table format.

Guidelines for Creating Clusters

Some guidelines for creating clusters are outlined below.

See Also:
For performance characteristics, see Oracle8i Designing and Tuning for Performance.

Choose Appropriate Tables to Cluster

Use clusters to store one or more tables that are primarily queried (not predominantly inserted into or updated), and for which queries often join data of multiple tables in the cluster or retrieve related data from a single table.

Choose Appropriate Columns for the Cluster Key

Choose cluster key columns carefully. If multiple columns are used in queries that join the tables, then make the cluster key a composite key. In general, the same column characteristics that make a good index apply for cluster indexes.

See Also:
"Index the Correct Tables and Columns" has more information about these guidelines.

A good cluster key has enough unique values so that the group of rows corresponding to each key value fills approximately one data block. Too few rows per cluster key value can waste space and result in negligible performance gains. Cluster keys that are so specific that only a few rows share a common value can cause wasted space in blocks, unless a small SIZE was specified at cluster creation time.
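The "approximately one data block" target can be checked with simple arithmetic: rows per key times average row size, divided by the usable space in a block. The following Python sketch uses hypothetical figures and ignores block header overhead and the fact that the cluster key is stored only once per block.

```python
def block_fill_ratio(avg_rows_per_key: float, avg_row_bytes: float,
                     usable_block_bytes: float) -> float:
    """Fraction of one data block occupied by the rows sharing one cluster key.

    Near 1.0 is the target described in the text; much lower wastes space,
    and much higher means the rows for a key spill into extra blocks.
    """
    return avg_rows_per_key * avg_row_bytes / usable_block_bytes


# Hypothetical figures: 2000 usable bytes per block, 100-byte rows.
print(block_fill_ratio(20, 100, 2000))   # 1.0   (about one block per key)
print(block_fill_ratio(2, 100, 2000))    # 0.1   (key too specific)
print(block_fill_ratio(500, 100, 2000))  # 25.0  (key too general)
```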

Too many rows per cluster key value can cause extra searching to find rows for that key. Cluster keys on values that are too general (for example, MALE and FEMALE) result in excessive searching and can result in worse performance than with no clustering.

A cluster index cannot be unique or include a column defined as LONG.

Performance Considerations

Also note that clusters can reduce the performance of DML statements (INSERTs, UPDATEs, and DELETEs) as compared to storing a table separately with its own index. These disadvantages relate to the use of space and the number of blocks that must be visited to scan a table. Because multiple tables share each block, more blocks must be used to store a clustered table than if that same table were stored non-clustered. You should decide about using clusters with these trade-offs in mind.

To identify data that would be better stored in clustered form than in non-clustered form, look for tables that are related via referential integrity constraints, and tables that are frequently accessed together using SELECT statements that join data from two or more tables. If you cluster tables on the columns used to join table data, then you reduce the number of data blocks that must be accessed to process the query; all the rows needed for a join on a cluster key are in the same block. Therefore, query performance for joins is improved.

Similarly, it may be useful to cluster an individual table. For example, the EMP_TAB table could be clustered on the DEPTNO column to cluster the rows for employees in the same department. This would be advantageous if applications commonly process rows department by department.
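The I/O benefit of clustering on DEPTNO can be modeled by counting how many blocks hold rows for one department. In this Python sketch, the block capacity of four rows and the sample employees are hypothetical, and sorting the rows stands in for the way a cluster stores rows with the same key together.

```python
from typing import List, Tuple

ROWS_PER_BLOCK = 4  # hypothetical block capacity, in rows


def blocks_touched(rows: List[Tuple[int, str]], deptno: int) -> int:
    """Count the blocks holding at least one row for deptno, assuming rows are
    stored in list order, ROWS_PER_BLOCK rows per block."""
    return len({i // ROWS_PER_BLOCK
                for i, (d, _) in enumerate(rows) if d == deptno})


# Employees in arrival (insertion) order: departments 10, 20, 30 interleave.
heap = [(10 + 10 * (i % 3), f"emp{i}") for i in range(12)]
# Clustered layout: the same rows stored together by DEPTNO.
clustered = sorted(heap, key=lambda r: r[0])

print(blocks_touched(heap, 10))       # 3
print(blocks_touched(clustered, 10))  # 1
```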

Like indexes, clusters do not affect application design. The existence of a cluster is transparent to users and to applications. Data stored in a clustered table is accessed via SQL just like data stored in a non-clustered table.

Creating Clusters, Clustered Tables, and Cluster Indexes

Use a cluster to store one or more tables that are frequently joined in queries. Do not use a cluster to cluster tables that are frequently accessed individually.

Once you create a cluster, tables can be created in the cluster. However, before you can insert any rows into the clustered tables, you must create a cluster index. The use of clusters does not affect the creation of additional indexes on the clustered tables; you can create and drop them as usual.

Use the SQL command CREATE CLUSTER to create a cluster. The following statement creates a cluster named EMP_DEPT, which stores the EMP_TAB and DEPT_TAB tables, clustered by the DEPTNO column:

CREATE CLUSTER Emp_dept (Deptno NUMBER(3))
        PCTUSED 80
        PCTFREE 5;

Create a table in a cluster using the SQL command CREATE TABLE with the CLUSTER option. For example, the EMP_TAB and DEPT_TAB tables can be created in the EMP_DEPT cluster using the following statements:

CREATE TABLE Dept_tab (
    Deptno NUMBER(3) PRIMARY KEY,
    . . . )
    CLUSTER Emp_dept (Deptno);

CREATE TABLE Emp_tab (
    Empno NUMBER(5) PRIMARY KEY,
    Ename VARCHAR2(15) NOT NULL,
    . . .
    Deptno NUMBER(3) REFERENCES Dept_tab)
    CLUSTER Emp_dept (Deptno);

A table created in a cluster is contained in the schema specified in the CREATE TABLE statement; a clustered table need not be in the same schema that contains the cluster.

You must create a cluster index before any rows can be inserted into any clustered table. For example, the following statement creates a cluster index for the EMP_DEPT cluster:

CREATE INDEX Emp_dept_index
    ON CLUSTER Emp_dept
    INITRANS   2
    MAXTRANS   5
    PCTFREE    5;

Note:
A cluster index cannot be unique. Furthermore, Oracle is not guaranteed to enforce uniqueness of columns in the cluster key if they have UNIQUE or PRIMARY KEY constraints.

The cluster key establishes the relationship of the tables in the cluster.

Privileges Required to Create a Cluster, Clustered Table, and Cluster Index

To create a cluster in your schema, you must have the CREATE CLUSTER system privilege and a quota for the tablespace intended to contain the cluster or the UNLIMITED TABLESPACE system privilege. To create a cluster in another user's schema, you must have the CREATE ANY CLUSTER system privilege, and the owner must have a quota for the tablespace intended to contain the cluster or the UNLIMITED TABLESPACE system privilege.

To create a table in a cluster, you must have either the CREATE TABLE or CREATE ANY TABLE system privilege. You do not need a tablespace quota or the UNLIMITED TABLESPACE system privilege to create a table in a cluster.

To create a cluster index, either your schema must contain the cluster or you must have the CREATE ANY INDEX system privilege. In either case, you must also have one of the following:

A quota for the tablespace intended to contain the cluster index
The UNLIMITED TABLESPACE system privilege

Manually Allocating Storage for a Cluster

Oracle dynamically allocates additional extents for the data segment of a cluster, as required. In some circumstances, you might want to explicitly allocate an additional extent for a cluster. For example, when using the Oracle Parallel Server, an extent of a cluster can be allocated explicitly for a specific instance.

You can allocate a new extent for a cluster using the SQL command ALTER CLUSTER with the ALLOCATE EXTENT option.

Dropping Clusters, Clustered Tables, and Cluster Indexes

Drop a cluster if the tables currently within the cluster are no longer necessary. When you drop a cluster, the tables within the cluster and the corresponding cluster index are dropped; all extents belonging to both the cluster's data segment and the index segment of the cluster index are returned to the containing tablespace and become available for other segments within the tablespace.

You can individually drop clustered tables without affecting the table's cluster, other clustered tables, or the cluster index. Drop a clustered table in the same manner as a non-clustered table: use the SQL command DROP TABLE.

See "Dropping Tables" for more information about individually dropping tables.

Note:
When you drop a single clustered table from a cluster, each row of the table must be deleted from the cluster. To maximize efficiency, if you intend to drop the entire cluster including all tables, then use the DROP CLUSTER command with the INCLUDING TABLES option.
You should only use the DROP TABLE command to drop an individual table from a cluster when the rest of the cluster is going to remain.

You can drop a cluster index without affecting the cluster or its clustered tables. However, you cannot use a clustered table if it does not have a cluster index. Cluster indexes are sometimes dropped as part of the procedure to rebuild a fragmented cluster index.

See Also:
"Dropping Indexes"

To drop a cluster that contains no tables, as well as its cluster index, if present, use the SQL command DROP CLUSTER. For example, the following statement drops the empty cluster named EMP_DEPT:

DROP CLUSTER Emp_dept;

If the cluster contains one or more clustered tables, and if you intend to drop the tables as well, then add the INCLUDING TABLES option of the DROP CLUSTER command. For example:

DROP CLUSTER Emp_dept INCLUDING TABLES;

If you do not include the INCLUDING TABLES option, and if the cluster contains tables, then an error is returned.

If one or more tables in a cluster contain primary or unique keys that are referenced by FOREIGN KEY constraints of tables outside the cluster, then you cannot drop the cluster unless you also drop the dependent FOREIGN KEY constraints. Use the CASCADE CONSTRAINTS option of the DROP CLUSTER command, as in

DROP CLUSTER Emp_dept INCLUDING TABLES CASCADE CONSTRAINTS;

If such dependent FOREIGN KEY constraints exist and you omit the CASCADE CONSTRAINTS option, an error is returned.

Privileges Required to Drop a Cluster

To drop a cluster, your schema must contain the cluster, or you must have the DROP ANY CLUSTER system privilege. You do not need any additional privileges to drop a cluster that contains tables, even if the clustered tables are not owned by the owner of the cluster.

Managing Hash Clusters and Clustered Tables

The following sections explain how to create, alter, and drop hash clusters and clustered tables using SQL commands.

Creating Hash Clusters and Clustered Tables

A hash cluster is used to store individual tables or a group of clustered tables that are static and often queried by equality queries. Once you create a hash cluster, you can create tables. To create a hash cluster, use the SQL command CREATE CLUSTER. The following statement creates a cluster named TRIAL_CLUSTER that is used to store the TRIAL_TAB table, clustered by the TRIALNO column:

Note:
You may need to use a setup similar to the following for certain examples to work:
ALTER TABLESPACE SYSTEM ADD DATAFILE 'disk1:moredata1' SIZE 50K AUTOEXTEND ON;

CREATE CLUSTER Trial_cluster (
    Trialno NUMBER(5,0))
    PCTUSED 80     
    PCTFREE 5
    SIZE    2K
    HASH IS Trialno HASHKEYS 100000;

CREATE TABLE Trial_tab (
    Trialno NUMBER(5) PRIMARY KEY,
    ...)
    CLUSTER Trial_cluster (Trialno);

Controlling Space Usage Within a Hash Cluster

When you create a hash cluster, it is important that you correctly choose the cluster key and set the HASH IS, SIZE, and HASHKEYS parameters to achieve the desired performance and space usage for the cluster. The following sections provide guidance, as well as examples of setting these parameters.
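The following Python sketch estimates how much space a hash cluster preallocates. It relies on two points that are assumptions to verify against your release's SQL reference: Oracle rounds HASHKEYS up to a prime number, and the number of hash values stored per block is roughly the usable block space divided by SIZE (at least one per block). The usable block size is a hypothetical figure.

```python
import math


def next_prime(n: int) -> int:
    """Smallest prime >= n (trial division; adequate for HASHKEYS-sized values)."""
    def is_prime(k: int) -> bool:
        if k < 2:
            return False
        if k % 2 == 0:
            return k == 2
        for d in range(3, math.isqrt(k) + 1, 2):
            if k % d == 0:
                return False
        return True

    while not is_prime(n):
        n += 1
    return n


def preallocated_blocks(hashkeys: int, size_bytes: int,
                        usable_block_bytes: int) -> int:
    """Rough number of blocks needed to hold every hash value of SIZE bytes."""
    keys_per_block = max(1, usable_block_bytes // size_bytes)
    return math.ceil(next_prime(hashkeys) / keys_per_block)


print(next_prime(100))                      # 101
print(next_prime(100_000))                  # 100003
print(preallocated_blocks(100, 100, 1900))  # 101 values, 19 per block -> 6
```

For Trial_cluster (SIZE 2K, HASHKEYS 100000), SIZE does not fit more than once in a 2K block, so the model gives one hash value per block and roughly 100003 blocks.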

Choosing the Key

Choosing the correct cluster key is dependent on the most common types of queries issued against the clustered tables. For example, consider the EMP_TAB table in a hash cluster. If queries often select rows by employee number, then the EMPNO column should be the cluster key; if queries often select rows by department number, then the DEPTNO column should be the cluster key. For hash clusters that contain a single table, the cluster key is typically the entire primary key of the contained table. A hash cluster with a composite key must use Oracle's internal hash function.

Setting HASH IS

Only specify the HASH IS parameter if the cluster key is a single column of the NUMBER datatype and contains uniformly distributed integers. If the above conditions apply, then you can distribute rows in the cluster such that each unique cluster key value hashes to a unique hash value (with no collisions). If the above conditions do not apply, you should use the internal hash function.
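The following Python sketch models why HASH IS works best with uniformly distributed integers. It assumes, for illustration only, that the hash value is the key value modulo the number of hash keys; under that model, consecutive integers produce no collisions, while keys spaced at a multiple of the number of hash keys all collide.

```python
from collections import Counter
from typing import Iterable


def collisions(keys: Iterable[int], hashkeys: int) -> int:
    """Count keys that share a hash value with an earlier key, assuming the
    hash value is simply key mod hashkeys (an illustrative model)."""
    per_value = Counter(k % hashkeys for k in keys)
    return sum(n - 1 for n in per_value.values())


# Uniformly distributed, consecutive integer keys: one key per hash value.
print(collisions(range(100), 101))                    # 0
# Keys spaced exactly hashkeys apart: all map to hash value 0.
print(collisions([101 * i for i in range(10)], 101))  # 9
```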

Dropping Hash Clusters

Drop a hash cluster using the SQL command DROP CLUSTER:

DROP CLUSTER Emp_dept;

Drop a table in a hash cluster using the SQL command DROP TABLE. The implications of dropping hash clusters and tables in hash clusters are the same as for index clusters.

When to Use Hashing

Storing a table in a hash cluster is an alternative to storing the same table with an index. Hashing is useful in the following situations:

Most queries are equality queries on the cluster key. For example:
```
SELECT . . . WHERE Cluster_key = . . . ;
```
In such cases, the cluster key in the equality condition is hashed, and the corresponding hash key is usually found with a single read. With an indexed table, the key value must first be found in the index (usually several reads), and then the row is read from the table (another read).
The table or tables in the hash cluster are primarily static in size such that you can determine the number of rows and amount of space required for the tables in the cluster. If tables in a hash cluster require more space than the initial allocation for the cluster, then performance degradation can be substantial because overflow blocks are required.
A hash cluster with the HASH IS col, HASHKEYS n, and SIZE m clauses is an ideal representation for an array (table) of n items (rows) where each item consists of m bytes of data. For example:
```
ARRAY X[100] OF NUMBER(8)
```
This could be represented as the following:
```
CREATE CLUSTER C(Subscript INTEGER)
    HASH IS Subscript HASHKEYS 100 SIZE 100;

CREATE TABLE X(Subscript NUMBER(2), Value NUMBER(8))
    CLUSTER C(Subscript);
```
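The read counts described in the first item can be compared with a simple cost model. In this Python sketch, the B-tree height and the number of overflow blocks are hypothetical inputs, and buffer cache hits are ignored.

```python
def index_lookup_reads(btree_height: int) -> int:
    """Equality lookup through a B-tree index: one read per index level to
    reach the leaf entry, plus one read for the table block."""
    return btree_height + 1


def hash_lookup_reads(overflow_blocks: int = 0) -> int:
    """Equality lookup in a hash cluster: hash the key and read its block,
    plus any overflow blocks chained to it."""
    return 1 + overflow_blocks


print(index_lookup_reads(3))  # 4
print(hash_lookup_reads())    # 1
print(hash_lookup_reads(2))   # 3 (cluster outgrew its initial allocation)
```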

Alternatively, hashing is not advantageous in the following situations:

Most queries on the table retrieve rows over a range of cluster key values, for example, full table scans or queries such as the following:
```
SELECT . . . WHERE Cluster_key < . . . ;
```
A hash function cannot be used to locate the rows for a range of key values; instead, the equivalent of a full table scan must be done to fetch the rows for the query. With an index, key values are ordered in the index, so cluster key values that satisfy the WHERE clause of a query can be found with relatively few I/Os.
A table is not static, but is continually growing. If a table grows without limit, then the space required over the life of the table (thus, of its cluster) cannot be predetermined.
Applications frequently perform full table scans on the table and the table is sparsely populated. A full table scan in this situation takes longer under hashing.
You cannot afford to preallocate the space the hash cluster will eventually need.
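The range-query limitation in the first item can be illustrated in Python by storing the same keys in hash buckets and in a sorted list. The bucket layout (key modulo 101) and the key set are hypothetical; the point is that the hashed layout must examine every entry, while the ordered layout reads only the qualifying prefix.

```python
import bisect
from typing import Dict, List, Tuple


def range_via_hash(buckets: Dict[int, List[int]], upper: int) -> Tuple[List[int], int]:
    """Keys < upper from hash buckets. Hash values carry no ordering, so every
    entry in every bucket must be examined."""
    examined = 0
    found: List[int] = []
    for bucket in buckets.values():
        for key in bucket:
            examined += 1
            if key < upper:
                found.append(key)
    return sorted(found), examined


def range_via_sorted_index(sorted_keys: List[int], upper: int) -> Tuple[List[int], int]:
    """Keys < upper from an ordered index: binary-search the boundary, then
    read only the qualifying prefix."""
    end = bisect.bisect_left(sorted_keys, upper)
    return sorted_keys[:end], end


keys = list(range(1000))
buckets: Dict[int, List[int]] = {}
for k in keys:
    buckets.setdefault(k % 101, []).append(k)

via_hash, hash_examined = range_via_hash(buckets, 10)
via_index, index_examined = range_via_sorted_index(keys, 10)
print(via_hash == via_index)          # True
print(hash_examined, index_examined)  # 1000 10
```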

In most cases, you should decide (based on the above information) whether to use hashing or indexing. If you use indexing, consider whether it is best to store a table individually or as part of a cluster.

If you decide to use hashing, then a table can still have separate indexes on any columns, including the cluster key.

See Also:
For additional guidelines on the performance characteristics of hash clusters, see Oracle8i Designing and Tuning for Performance.
