|Oracle7 Server Administrator's Guide||
Oracle uses a hash function to generate a distribution of numeric values, called hash values, which are based on specific cluster key values. The key of a hash cluster, like the key of an index cluster, can be a single column or composite key (multiple column key). To find or store a row in a hash cluster, Oracle applies the hash function to the row's cluster key value; the resulting hash value corresponds to a data block in the cluster, which Oracle then reads or writes on behalf of the issued statement.
To find or store a row in an indexed table or cluster, a minimum of two (there are usually more) I/Os must be performed:
Hashing is most advantageous when you have the following conditions:
SELECT . . . WHERE cluster_key = . . . ;
SELECT . . . WHERE cluster_key < . . . ;
For information about hash functions and specifying user-defined hash functions, see the Oracle7 Server Concepts manual.
Even if you decide to use hashing, a table can still have separate indexes on any columns, including the cluster key. See the Oracle7 Server Application Developer's Guide for additional recommendations.
Oracle guarantees that the initial allocation of space is sufficient to store the hash table according to the settings SIZE and HASHKEYS. If settings for the storage parameters INITIAL, NEXT, and MINEXTENTS do not account for the hash table size, incremental (additional) extents are allocated until at least SIZE*HASHKEYS is reached. For example, assume that the data block size is 2K, the available data space per block is approximately 1900 bytes (data block size minus overhead), and that the STORAGE and HASH parameters are specified in the CREATE CLUSTER command as follows:
STORAGE (INITIAL 100K
In this example, only one hash key can be assigned per data block. Therefore, the initial space required for the hash cluster is at least 100*2K or 200K. The settings for the storage parameters do not account for this requirement. Therefore, an initial extent of 100K and a second extent of 150K are allocated to the hash cluster.
Alternatively, assume the HASH parameters are specified as follows:
SIZE 500 HASHKEYS 100
In this case, three hash keys are assigned to each data block. Therefore, the initial space required for the hash cluster is at least 34*2K or 68K. The initial settings for the storage parameters are sufficient for this requirement (an initial extent of 100K is allocated to the hash cluster).
See Also: To estimate the size of a hash cluster, use the procedure given in "Estimating Space Required by Clusters" , along with the supplemental information in "Estimating Space Required by Hash Clusters" .
CREATE CLUSTER trial_cluster (trialno NUMBER(5,0))
STORAGE (INITIAL 250K NEXT 50K
MINEXTENTS 1 MAXEXTENTS 3
HASH IS trialno HASHKEYS 150;
CREATE TABLE trial (
trialno NUMBER(5,0) PRIMARY KEY,
CLUSTER trial_cluster (trialno);
The following sections explain setting the parameters of the CREATE CLUSTER command specific to hash clusters.
See Also: For additional information about creating tables in a cluster, guidelines for setting other parameters of the CREATE CLUSTER command, and the privileges required to create a hash cluster, see "Creating Clusters" .
The key of a hash cluster, like that of an index cluster, can be a single column or a composite key (multiple column key). A hash cluster with a composite key must use Oracle's internal hash function.
However, if the value of SIZE is large (fewer than five hash keys can be assigned per data block), you should also consider the expected frequency of collisions and whether performance of data retrieval or efficiency of space usage is more important to you:
|Available Space per Block/Calc`d SIZE||Setting for SIZE|
|2||Calculated SIZE + 15%|
|3||Calculated SIZE + 12%|
|4||Calculated SIZE + 8%|
For example, suppose you cluster the EMP table by DEPTNO, and there are 100 DEPTNOs, with values 10, 20, . . ., 1000. Assuming you bypass the internal hash function and you create a cluster with HASHKEYS of 100, then department 10 will hash to 10, department 20 to 20, . . ., department 110 to 10 (110 mod 100), department 120 to 20, and so on. Notice that there are 10 entries for hash values of 10, 20, . . ., but none for 1, 2, . . ., and so on. As a result, there is a lot of wasted space and possibly a lot of overflow blocks because of collisions. Alternatively, if HASHKEYS is set to 101, then each department number hashes to a unique hash key value.
You decide to load the EMP table into a hash cluster. Most queries retrieve employee records by their employee number. You estimate that the maximum number of rows in the EMP table at any given time is 10000 and that the average row size is 55 bytes.
In this case, EMPNO should be the cluster key. Since this column contains integers that are unique, the internal hash function can be bypassed. SIZE can be set to the average row size, 55 bytes; note that 34 hash keys are assigned per data block. HASHKEYS can be set to the number of rows in the table, 10000, rounded up to the next highest prime number, 10001:
CREATE CLUSTER emp_cluster (empno NUMBER)
. . .
HASH IS empno HASHKEYS 10001;
Conditions similar to the previous example exist. In this case, however, rows are usually retrieved by department number. At most, there are 1000 departments with an average of 10 employees per department. Note that department numbers increment by 10 (0, 10, 20, 30, . . . ).
In this case, DEPTNO should be the cluster key. Since this column contains integers that are uniformly distributed, the internal hash function can be bypassed. A pre-estimated SIZE (the average amount of space required to hold all rows per department) is 55 bytes * 10, or 550 bytes. Using this value for SIZE, only three hash keys can be assigned per data block. If you expect some collisions and want maximum performance of data retrieval, slightly alter your estimated SIZE to prevent collisions from requiring overflow blocks. By adjusting SIZE by 12%, to 620 bytes (see previous section about setting SIZE for clarification), only three hash keys are assigned per data block, leaving more space for rows from expected collisions.
HASHKEYS can be set to the number of unique department numbers, 1000, rounded up to the next highest prime number, 1009:
CREATE CLUSTER emp_cluster (deptno NUMBER)
. . .
HASH IS deptno HASHKEYS 1009;
ALTER CLUSTER emp_dept . . . ;
The implications for altering a hash cluster are identical to those for altering an index cluster. However, note that the SIZE, HASHKEYS, and HASH IS parameters cannot be specified in an ALTER CLUSTER statement. You must re-create the cluster to change these parameters and then copy the data from the original cluster.
See Also: For more information about altering an index cluster, see "Altering Clusters" .
DROP CLUSTER emp_dept;
A table in a hash cluster is dropped using the SQL command DROP TABLE. The implications of dropping hash clusters and tables in hash clusters are the same for index clusters.
See Also: For more information about dropping clusters, see "Dropping Clusters" .
Copyright © 1996 Oracle Corporation.
All Rights Reserved.