|Oracle8i Application Developer's Guide - Fundamentals
This chapter discusses the procedures necessary to create and manage the different types of objects contained in a user's schema. The topics include:
Specific information is described in the following locations:
Specific information is described in the following locations:
Indexes are used in Oracle to provide quick access to rows in a table. Indexes provide faster access to data for operations that return a small portion of a table's rows.
Oracle does not limit the number of indexes you can create on a table. However, you should consider the performance benefits of indexes and the needs of your database applications to determine which columns to index.
The following sections explain how to create, alter, and drop indexes using SQL commands. Some simple guidelines to follow when managing indexes are included.
With one notable exception, you should usually create indexes after you have inserted or loaded (using SQL*Loader or Import) data into a table. It is more efficient to insert rows of data into a table that has no indexes and then to create the indexes for subsequent queries, etc. If you create indexes before table data is loaded, then every index must be updated every time you insert a row into the table. The exception to this rule is that you must create an index for a cluster before you insert any data into the cluster.
When you create an index on a table that already has data, Oracle must use sort space to create the index. Oracle uses the sort space in memory allocated for the creator of the index (the amount per user is determined by the initialization parameter
SORT_AREA_SIZE), but must also swap sort information to and from temporary segments allocated on behalf of the index creation. If the index is extremely large, it might be beneficial to complete the following steps:
TEMPORARY TABLESPACEoption of the
ALTER USERcommand to make this your new temporary tablespace.
DROP TABLESPACEcommand. Then use the
ALTER USERcommand to reset your temporary tablespace to your original temporary tablespace.
Under certain conditions, you can load data into a table with the SQL*Loader "direct path load", and an index can be created as data is loaded.
Use the following guidelines for determining when to create an index:
Some columns are strong candidates for indexing. Columns with one or more of the following characteristics are candidates for indexing:
is preferable to
This is because the first uses an index on
COL_X (assuming that
COL_X is a numeric column).
Columns with the following characteristics are less suitable for indexing:
RAW columns cannot be indexed.
The size of a single index entry cannot exceed roughly one-half (minus some overhead) of the available space in the data block. Consult with the database administrator for assistance in determining the space required by an index.
A table can have any number of indexes. However, the more indexes, the more overhead is incurred as the table is altered. When rows are inserted or deleted, all indexes on the table must be updated. When a column is updated, all indexes on the column must be updated.
Thus, there is a trade-off between speed of retrieval for queries on a table and speed of accomplishing updates on the table. For example, if a table is primarily read-only, then more indexes might be useful; but, if a table is heavily updated, then fewer indexes might be preferable.
The order in which columns are named in the
INDEX command does not need to correspond to the order in which they appear in the table. However, the order of columns in the
INDEX statement is significant because query performance can be affected by the order chosen. In general, you should put the column expected to be used most often first in the index.
For example, assume the columns of the
VENDOR_PARTS table are as shown in Figure 6-1.
Assume that there are five vendors, and each vendor has about 1000 parts.
Suppose that the
VENDOR_PARTS table is commonly queried by SQL statements such as the following:
To increase the performance of such queries, you might create a composite index putting the most selective column first; that is, the column with the most values:
Indexes speed retrieval on any query using the leading portion of the index. So in the above example, queries with
WHERE clauses using only the
PART_NO column also note a performance gain. Because there are only five distinct values, placing a separate index on
VENDOR_ID would serve no purpose.
You can create an index for a table to improve the performance of queries issued against the corresponding table. You can also create an index for a cluster. You can create a composite index on multiple columns up to a maximum of 16 columns. A composite index key cannot exceed roughly one-half (minus some overhead) of the available space in the data block.
Oracle automatically creates an index to enforce a
KEY integrity constraint. In general, it is better to create such constraints to enforce uniqueness and not explicitly use the obsolete
Use the SQL command
INDEXto create an index. The following statement CREATE INDEX emp_ename ON Emp_tab(ename)
Notice that several storage settings are explicitly specified for the index.
To create a new index, you must own, or have the
INDEX object privilege for, the corresponding table. The schema that contains the index must also have a quota for the tablespace intended to contain the index, or the
TABLESPACE system privilege. To create an index in another user's schema, you must have the
INDEX system privilege.
You might drop an index for the following reasons:
When you drop an index, all extents of the index's segment are returned to the containing tablespace and become available for other objects in the tablespace.
Use the SQL command
INDEX to drop an index. For example, to drop the
EMP_ENAME index, enter the following statement:
If you drop a table, then all associated indexes are dropped.
To drop an index, the index must be contained in your schema or you must have the
INDEX system privilege.
A function-based index is an index built on an expression. It extends your indexing capabilities beyond indexing on a column. A function-based index increases the variety of ways in which you can access data.
The expression used in a function-based index can be an arithmetic expression or an expression that contains a PL/SQL function, package function, C callout, or SQL function. Function-based indexes also support linguistic sorts based on linguistic sort keys (collation), efficient linguistic collation of SQL statements, and case-insensitive sorts.
Like other indexes, function-based indexes improve query performance. For example, if you need to access a computationally complex expression often, then you can store it in an index. Then when you need to access the expression, it is already computed. You can find a detailed description of the advantages of function-based indexes in "Using Function-Based Indexes".
Function-based indexes have all of the same properties as indexes on columns. However, unlike indexes on columns which can be used by both cost-based and rule-based optimization, function-based indexes can be used by only by cost-based optimization. Other restrictions on function-based indexes are described in "Restrictions on Function-Based Indexes".
The following list describes the advantages of function-based indexes in greater detail:
CREATE INDEX Idx ON Example_tab(Column_a + Column_b); SELECT * FROM Example_tab WHERE Column_a + Column_b < 10;
idx is the name of the index,
Example_tab is the name of the table, and
column_b represent columns. The optimizer can use a range scan for this query because the index is built on (
column_b). Range scans typically produce fast response times if the predicate has low selectivity (that is, if the predicate selects less than 15% of the rows of a large table). In addition, the optimizer can estimate selectivity of predicates involving expressions more accurately if the expressions are materialized in a function-based index (expressions of function-based indexes are represented as virtual columns and
ANALYZE can build histograms on such columns).
MAPmethod to build indexes on an object type column.
LOWERfunctions, descending order sorts with the
DESCfunction, and linguistic-based sorts with the
DESC keyword in the
INDEX statement is no longer ignored. Oracle sorts columns with the
DESC keyword in descending order. Such indexes are treated as function-based indexes. Descending indexes cannot be bitmapped or reverse, and cannot be used in bitmapped optimizations. To get the pre-Oracle 8.1 release
DESC functionality, remove the
DESC keyword from the
For examples of how to use function-based indexes, see the Oracle8i Administrator's Guide.
As an example, consider a weather research institute that maintains tables of weather data for various cities. Some of their projects include tracking daily temperature fluctuations throughout the year. Other projects include tracking fluctuations as a function of the city's distance from the equator. By building indexes on the complex functions that they want to calculate, the institute can optimize the execution of the queries they submit. The following section contains examples of indexes that could be created and the queries that could use them.
Weatherdata_tab, contains columns for the minimum daily temperature (
Mintemp), maximum daily temperature (
Maxtemp), the day the temperature was recorded (
Day), and the Region (
Region_Obj is an object column that contains columns for country (
Country) and city (
Cityname). Figure 6-2 illustrates the
An index is created that calculates the difference in temperature for the cities in the tables. A query that could use the
delta_index index returns the contents of the table for temperature differences less than 20:
CREATE INDEX Delta_index ON Weatherdata_tab (Maxtemp - Mintemp); SELECT * FROM Weatherdata_tab WHERE (Maxtemp - Mintemp) < '20';
An index is created that calls the object method
distance_from_equator to calculate the distance from the equator for each city in the table. The method is applied to the object column
Region_Obj. A query that could use the
distance_index index returns the names of the cities that are at a distance greater than 1000 miles from the equator:
CREATE INDEX Distance_index ON Weatherdata_tab (Distance_from_equator (Reg_obj)); SELECT * FROM Weatherdata_tab WHERE (Distance_from_equator (Reg_Obj)) > '1000';
An index is created that satisfies the queries of German-speaking users that sorts temperature data by city name. A query that could use the
City_index index returns the contents of the table, ordered by city name. The German sort order for city name is used. Note that in the
SELECT statement, a
WHERE clause is not needed. This is because in a German session,
NLSSORT is set to
CREATE INDEX City_index ON Weatherdata_tab (NLSSORT(Cityname, 'NLS_SORT=German')); SELECT * FROM Weatherdata_tab ORDER BY Cityname;
An index is created on the difference between the maximum and minimum temperatures, and on the maximum temperature. The result of the difference is sorted in descending order. A query that could use the
compare_index index returns the contents of the table that satisfy the condition where the difference is less than 20 and the maximum temperature is greater than 75.
CREATE INDEX compare_index ON Weatherdata_tab ((Maxtemp - Mintemp) DESC, Maxtemp); SELECT * FROM Weatherdata_tab WHERE ((Maxtemp - Mintemp) < '20' AND Maxtemp > '75');
The following command creates a function-based index
IDX on table
SELECT command uses the function-based index on
e_name) to return all of the employees with name like :
The following command creates a function-based index
IDX on table Fbi_tab where A, B, and C represent columns.
SELECT statement can either use index range scan (notice that the expression is a prefix of index
IDX) or index fast full scan (which may be preferable if the index has specified a high parallel degree).
This example demonstrates how a function-based index can be used to support an NLS Sort Index. Given a string, the
NLSSORT function returns a sort key. The following
INDEX statement creates an
NLS_SORT sort on table
NLS_TAB with collation sequence
SELECT statement selects all of the contents of the table and orders it by
NAME. The rows are ordered using the German collation sequence.
This example demonstrates a case-insensitive search. The
UPPER function converts the
ENAMEs to all uppercase letters:
An example query which would use this index is:
Note the following restrictions on function-based indexes:
DETERMINISTIC. There is no error checking whether or not a subprogram is qualified as
DETERMINISTIC. It is up to you to ensure that a subprogram is
The following semantic rules demonstrate how to use the keyword
PACKAGElevel subprogram can be declared as DETERMINISTIC in the
PACKAGEspecification but not in the
BODY. Errors will be raised if DETERMINISTIC is used inside a
BODY) cannot be declared as DETERMINISTIC.
DETERMINISTICsubprogram can call another subprogram whether the called program is declared as
RAWdata types from a PL/SQL function are not permitted due to length restrictions. A possible work around is to use substrings to limit the size of the function's output. For example:
F(X) represents the PL/SQL function. The
SUBSTR command would need to be used for the function when creating the index and when referencing the function in queries.
Because clusters store related rows of different tables together in the same data blocks, two primary benefits are achieved when clusters are properly used:
Some guidelines for creating clusters are outlined below.
For performance characteristics, see Oracle8i Tuning.
Use clusters to store one or more tables that are primarily queried (not predominantly inserted into or updated), and for which queries often join data of multiple tables in the cluster or retrieve related data from a single table.
Choose cluster key columns carefully. If multiple columns are used in queries that join the tables, then make the cluster key a composite key. In general, the same column characteristics that make a good index apply for cluster indexes.
"Index the Correct Tables and Columns" has more information about these guidelines.
A good cluster key has enough unique values so that the group of rows corresponding to each key value fills approximately one data block. Too few rows per cluster key value can waste space and result in negligible performance gains. Cluster keys that are so specific that only a few rows share a common value can cause wasted space in blocks, unless a small
SIZE was specified at cluster creation time.
Too many rows per cluster key value can cause extra searching to find rows for that key. Cluster keys on values that are too general (for example,
FEMALE) result in excessive searching and can result in worse performance than with no clustering.
A cluster index cannot be unique or include a column defined as
Also note that clusters can reduce the performance of DML statements (
DELETEs) as compared to storing a table separately with its own index. These disadvantages relate to the use of space and the number of blocks that must be visited to scan a table. Because multiple tables share each block, more blocks must be used to store a clustered table than if that same table were stored non-clustered. You should decide about using clusters with these trade-offs in mind.
To identify data that would be better stored in clustered form than in non-clustered form, look for tables that are related via referential integrity constraints, and tables that are frequently accessed together using
SELECT statements that join data from two or more tables. If you cluster tables on the columns used to join table data, then you reduce the number of data blocks that must be accessed to process the query; all the rows needed for a join on a cluster key are in the same block. Therefore, query performance for joins is improved.
Similarly, it may be useful to cluster an individual table. For example, the
EMP_TAB table could be clustered on the
DEPTNO column to cluster the rows for employees in the same department. This would be advantageous if applications commonly process rows, department by department.
Like indexes, clusters do not affect application design. The existence of a cluster is transparent to users and to applications. Data stored in a clustered table is accessed via SQL just like data stored in a non-clustered table.
Use a cluster to store one or more tables that are frequently joined in queries. Do not use a cluster to cluster tables that are frequently accessed individually.
Once you create a cluster, tables can be created in the cluster. However, before you can insert any rows into the clustered tables, you must create a cluster index. The use of clusters does not affect the creation of additional indexes on the clustered tables; you can create and drop them as usual.
Use the SQL command
CLUSTER to create a cluster. The following statement creates a cluster named
EMP_DEPT, which stores the
DEPT_TAB tables, clustered by the
Create a table in a cluster using the SQL command
TABLE with the
CLUSTER option. For example, the
DEPT_TAB tables can be created in the
EMP_DEPT cluster using the following statements:
CREATE TABLE Dept_tab ( CREATE TABLE Emp_tab (Empno NUMBER(5) PRIMARY KEY, Ename VARCHAR2(15) NOT NULL, . . . Deptno NUMBER(3) REFERENCES Dept_tab) CLUSTER Emp_dept (Deptno);
A table created in a cluster is contained in the schema specified in the
TABLE statement; a clustered table might not be in the same schema that contains the cluster.
You must create a cluster index before any rows can be inserted into any clustered table. For example, the following statement creates a cluster index for the
The cluster key establishes the relationship of the tables in the cluster.
To create a cluster in your schema, you must have the
CLUSTER system privilege and a quota for the tablespace intended to contain the cluster or the
TABLESPACE system privilege. To create a cluster in another user's schema, you must have the
CLUSTER system privilege, and the owner must have a quota for the tablespace intended to contain the cluster or the
TABLESPACE system privilege.
To create a table in a cluster, you must have either the
TABLE system privilege. You do not need a tablespace quota or the
TABLESPACE system privilege to create a table in a cluster.
To create a cluster index, your schema must contain the cluster, and you must have the following privileges:
INDEXsystem privilege or, if you own the cluster, the
Oracle dynamically allocates additional extents for the data segment of a cluster, as required. In some circumstances, you might want to explicitly allocate an additional extent for a cluster. For example, when using the Oracle Parallel Server, an extent of a cluster can be allocated explicitly for a specific instance.
You can allocate a new extent for a cluster using the SQL command
CLUSTER with the
Drop a cluster if the tables currently within the cluster are no longer necessary. When you drop a cluster, the tables within the cluster and the corresponding cluster index are dropped; all extents belonging to both the cluster's data segment and the index segment of the cluster index are returned to the containing tablespace and become available for other segments within the tablespace.
You can individually drop clustered tables without affecting the table's cluster, other clustered tables, or the cluster index. Drop a clustered table in the same manner as a non-clustered table--use the SQL command
See "Dropping Tables" for more information about individually dropping tables.
When you drop a single clustered table from a cluster, each row of the table must be deleted from the cluster. To maximize efficiency, if you intend to drop the entire cluster including all tables, then use the
You should only use the
You can drop a cluster index without affecting the cluster or its clustered tables. However, you cannot use a clustered table if it does not have a cluster index. Cluster indexes are sometimes dropped as part of the procedure to rebuild a fragmented cluster index.
To drop a cluster that contains no tables, as well as its cluster index, if present, use the SQL command
CLUSTER. For example, the following statement drops the empty cluster named
If the cluster contains one or more clustered tables, and if you intend to drop the tables as well, then add the
TABLES option of the
CLUSTER command. For example:
If you do not include the
TABLES option, and if the cluster contains tables, then an error is returned.
If one or more tables in a cluster contain primary or unique keys that are referenced by
KEY constraints of tables outside the cluster, then you cannot drop the cluster unless you also drop the dependent
FOREIGN KEY constraints. Use the
CONSTRAINTS option of the
CLUSTER command, as in
An error is returned if the above option is not used in the appropriate situation.
To drop a cluster, your schema must contain the cluster, or you must have the
CLUSTER system privilege. You do not have to have any special privileges to drop a cluster that contains tables, even if the clustered tables are not owned by the owner of the cluster.
The following sections explain how to create, alter, and drop hash clusters and clustered tables using SQL commands.
A hash cluster is used to store individual tables or a group of clustered tables that are static and often queried by equality queries. Once you create a hash cluster, you can create tables. To create a hash cluster, use the SQL command
CLUSTER. The following statement creates a cluster named
TRIAL_CLUSTER that is used to store the
TRIAL_TAB table, clustered by the
CREATE CLUSTER Trial_cluster ( Trialno NUMBER(5,0)) CREATE TABLE Trial_tab (
Controlling Space Usage Within a Hash Cluster
When you create a hash cluster, it is important that you correctly choose the cluster key and set the
HASHKEYSparameters to achieve the desired performance and space usage for the cluster. The following sections provide guidance, as well as examples of setting these parameters.
Choosing the Key
Choosing the correct cluster key is dependent on the most common types of queries issued against the clustered tables. For example, consider the
EMP_TABtable in a hash cluster. If queries often select rows by employee number, then the
EMPNOcolumn should be the cluster key; if queries often select rows by department number, then the
DEPTNOcolumn should be the cluster key. For hash clusters that contain a single table, the cluster key is typically the entire primary key of the contained table. A hash cluster with a composite key must use Oracle's internal hash function.
Setting HASH IS
Only specify the
ISparameter if the cluster key is a single column of the
NUMBERdatatype and contains uniformly distributed integers. If the above conditions apply, then you can distribute rows in the cluster such that each unique cluster key value hashes to a unique hash value (with no collisions). If the above conditions do not apply, you should use the internal hash function.
Dropping Hash Clusters
Drop a hash cluster using the SQL command
Drop a table in a hash cluster using the SQL command
TABLE. The implications of dropping hash clusters and tables in hash clusters are the same as for index clusters.
When to Use Hashing
Storing a table in a hash cluster is an alternative to storing the same table with an index. Hashing is useful in the following situations:
In such cases, the cluster key in the equality condition is hashed, and the corresponding hash key is usually found with a single read. With an indexed table, the key value must first be found in the index (usually several reads), and then the row is read from the table (another read).
SIZEm clauses is an ideal representation for an array (table) of n items (rows) where each item consists of m bytes of data. For example:
This could be represented as the following:
Alternatively, hashing is not advantageous in the following situations:
A hash function cannot be used to determine the location of specific hash keys; instead, the equivalent of a full table scan must be done to fetch the rows for the query. With an index, key values are ordered in the index, so cluster key values that satisfy the
WHERE clause of a query can be found with relatively few I/Os.
In most cases, you should decide (based on the above information) whether to use hashing or indexing. If you use indexing, consider whether it is best to store a table individually or as part of a cluster.
If you decide to use hashing, then a table can still have separate indexes on any columns, including the cluster key.
For additional guidelines on the performance characteristics of hash clusters, see Oracle8i Tuning.