|Oracle9i Data Cartridge Developer's Guide
Release 2 (9.2)
Part Number A96595-01
This chapter describes extensible indexing, including:
What is extensible indexing? Why is it important to you as a cartridge developer? How should you go about implementing it?
To answer these questions we first need to understand the modes of indexing provided by the Oracle, which in turn requires that we first consider the role of indexing in information management systems.
The impetus to index data arises because of the need to locate specific information and then to retrieve it as efficiently as possible. If you could keep the entire dataset in main memory (equivalent to a person memorizing a book), there would be no need for indexing. Since this is not possible, and since disk access times are much slower than main memory access times, you are forced to wrestle with the art of indexing.
If you think of the form of indexing with which we are most familiar -- the index at the back of a technical book -- you will note that every index token has three characteristics which refer to the item being indexed:
This has many implications. For one, it means that the same data can be subject to different indexing schemes. For another, it means that the indexing scheme provides a pathway of access to the information. The index in the back of the book gives you access to the entire range of topics covered in the book. Provided that its structure meets your needs, its presorting of the data means that you do not have to sift through every iota of information.
The upshot is that you can retrieve the information much quicker than if you had to page through the entire book (equivalent to sequential scanning of a file)! However, note that while indexing speeds up retrieval, it slows down inserts because you have to update the index.
An index can be any structure which can be used to represent information that can be used to efficiently evaluate a query.
There is no single structure that is optimal for all applications.
Regionscontain a city named Metropolis, you will deploy an equality operator that will return an exact match (or not).
In each case, you will want to organize the data in a different index structure since different queries require that information be indexed in different ways. As we will discuss in the following sections, a Hash structure is best suited for determining exact match, whereas a B-tree is much better suited for range queries.
Moreover, these are not the only kind of queries. What if you want to discover whether Power Station A or B can best service Quadrant 3, or to determine the overlapping coverage zones derived from different distributions of power stations? In these cases, you will want to create operators (
servesArea, and so on) that meet your specific requirements. Unfortunately, you cannot do this by means of either Hash or B-tree indexes.
The limitation of Hash and B-tree indexes is important because one criterion that distinguishes cartridges from other database applications is that data often incorporates many different kinds of information. While database systems are accomplished in processing scalar values, they cannot encompass the domain-specific data of interest to cartridge developers. Information in these contexts may be made up of text, images, audio, video -- and combinations of these that comprise domain-specific datatypes.
One way to resolve this problem is to create an index that serves as an intermediate structure. This is a logical extension of the basic idea underlying software-based indexing, namely that pointers refer to data (records, pages, files). In this scheme, keywords used to index video may be stored as an index. Going one step further, an intermediate structure may itself be indexed, as you might index abstracts (capsule text descriptions) of films.The advantage of this approach is that it may be easier to construct an index based on textual description of film than it is to index video footage. Employing this strategy you can scan the index without ever referring to the primary data (the film).
Unfortunately, intermediate structures in which text or scalars are used to represent unstructured data cannot satisfy all requirements. For one thing, they are always slower than direct indexing of the data because they introduce a level of indirection. More importantly, if the task is to analyze the density of bone in x-rays, or to categorize primate gestures, or to record the radio emissions of stars, there is no efficient substitute for direct indexing of unstructured data.
While there is no single kind of index that can satisfy all needs, the B-tree index comes closest to meeting the requirement. Here we describe the Knuth variation in which the index consists of two parts: a sequence set that provides fast sequential access to the data, and an index set that provides direct access to the sequence set.
While the nodes of a B-tree will generally not contain the same number of data values, and will usually contain a certain amount of unused space, the B-tree algorithm ensures that it remains balanced (the leaf nodes will all be at the same level).
Hashing gives fast direct access to a specific stored record based on a given field value. Each record is placed at a location whose address is computed as some function of some field of that record. The same function is used both at the time of insertion and retrieval.
The problem with hashing is that the physical ordering of records has little if any relation to their logical ordering. Also, there may be large unused areas on the disk.
Our sample scenario integrates geographic data with other kinds of data. Insofar as we are interested in points that can be defined with two dimensions (latitude and longitude), such as geographic location of power stations, we can use a variation on the k-d tree known as the 2-d tree.
In this structure, each node is a datatype with fields for information, the two co-ordinates, a left-link and a right-link which can point to two children.
The structure allows for range queries. That is, if the user specifies a point (xx, xx) and a distance, the query will return the set of all points within the specified distance of the point.
2-d trees are very easy to implement. However,the fact that a 2-d tree containing k nodes may have a height of k means that insertion and querying may be complex.
The point quadtree is also used to represent point data in a two dimensional spaces. But these structures divide regions into four parts while 2-d trees divide regions into two. The fields of the record type for this node are comprised of an attribute for information, two co-ordinates, and four compass points (NW, SW, NE, SE) that can therefore point to four children.
Like 2-d trees, point quadtrees are very easy to implement. Also like 2-d trees, the fact that a point quadtree containing k nodes may have a height of k means that insertion and querying may be complex. Each comparison requires comparisons on at least two co-ordinates. However, in practice the lengths from root to leaf tend to be shorter in point quadtrees.
The fact is that Oracle provides a limited number of kinds of indexes, so that if (for instance) you wish to utilize either a k-d tree or the point quadtree, you will have to implement this yourself. As you consider your need to access your data, you need to keep in mind the following restrictions that pertain to the standard kinds of indexes:
Oracle's standard modes of indexing do not permit indexing a column that contains
You may not be able to index a column object using Oracle's standard indexing schemes or the elements of a collection type.
Oracle object types may be compared using either a map function or an order function. If the object utilizes a map function, then you can define a function-based index that can be used implicitly to evaluate relational predicates. However, if an order function is used, you will not be able to use this to construct an index.
Further, you cannot utilize functions in predicates in which the range of the parameters is infinite. Function-based indexes allow you to include a function in a predicate, provided you can precompute the function values for all the rows. Typically the index would store the rowid and the functional value. Queries that apply relational operators to values based on derived values utilize the index.
However, you can use function-based indexes only if the function is so designed that there are a finite number of input combinations. Put another way: you cannot use function-based indexes in cases in which the input parameters do not have a limited cardinality.
This SQL-based interface lets you define domain-specific operators and indexing schemes, and integrate these into the Oracle server.
Oracle provides a set of pre-defined operators which include arithmetic operators (+, -, *, /), comparison operators (=, >, <) and logical operators (
OR). These operators take as input one or more arguments (or operands) and return a result. They are represented by special characters (+) or keywords (
Like built-in operators, user-defined operators (such as
Contains) take a set of operands as input and return a result. The implementation of the operator is provided by the user. After a user has defined a new operator, it can be used in SQL statements like any other built-in operator.
For instance, suppose you define a new operator
Contains, which takes as input a text document and a keyword, and returns
1 if the document contains the specified keyword. You can then write an SQL query as:
Oracle uses indexes to efficiently evaluate some built-in operators. For example, a B-tree index can be used to evaluate the comparison operators =, > and <. Similarly, user-defined domain indexes can be used to efficiently evaluate user-defined operators.
Typical database management systems support a few types of access methods (B+Trees, Hash Index) on some set of data types (numbers, strings, and so on). In recent years, databases are more and more being used to store different types of data, such as text, spatial, image, video and audio. In these complex domains, there is a need for indexing complex data types and also specialized indexing techniques. For instance, R-trees are an efficient method of indexing spatial data. No database server can be built with support for all possible kinds of complex data and indexing. The solution is to provide an extensible server which lets the user define new index types.
The framework to develop new index types is based on the concept of cooperative indexing where an application and the Oracle server cooperate to build and maintain indexes for data types such as text, spatial and On-line-Analytical Processing (OLAP). The application software, in the form of a cartridge, is responsible for defining the index structure, maintaining the index content during load and update operations, and searching the index during query processing. The index structure itself can either be stored in an Oracle database as an Index-Organized Table, or externally as a file.
The extensible indexing framework consists of the following components:
Overlapsoperator in the spatial domain. In general, user-defined operators can be bound to functions. However, operators can also be evaluated using indexes. For instance, the equality operator can be evaluated using a hash index. An indextype provides index-based implementation for the operators listed in the indextype definition.
To illustrate the role of each of these components, let us consider a text domain application. Suppose a new indextype
TextIndexType be defined as part of the text cartridge. It contains routines for managing and accessing the text index. The text index is an inverted index storing the occurrence list for each token in each of the text documents. The text cartridge also defines the
Contains operator for performing content-based search on textual data. It provides both a functional implementation (a simple number function) and an index implementation (using the text index) for the Contains operator.
Employees be an employee table with a
resume column containing textual data.
A domain index can be created on resume column as follows:
The Oracle server invokes the routine corresponding to the create method in the
TextIndexType, which results in the creation of an index-organized table to store the occurrence list of all tokens in the resumes (essentially, the inverted index data). The inverted index modeled by
ResumeTextIndex is automatically maintained by invoking routines defined in
TextIndexType, whenever an
Employees row is inserted, updated, or deleted.
Content-based search on the resume column can be performed as follows:
Index-based implementation of the
Contains operator can take advantage of the previously built inverted index. Specifically, the Oracle server can invoke routines specified in
TextIndexType to search the domain index for identifying candidate rows, and then do further processing such as filtering, selection, and fetching of rows. Note that the preceding query can also be evaluated using the non-index implementation of the
Contains operator, if the Oracle server chooses to not use the index defined on resume column. In such a case, the filtering of rows will be done by applying the non-index implementation on each resume instance of the table.
In summary, the extensible indexing interface will
This interface will enable a domain index to operate essentially the same way as any other Oracle Server index, the primary difference being that the Oracle Server will invoke application code specified as part of the indextype to create, drop, truncate, modify, and search a domain index.
It should be noted that an index designer may choose to store the index data in files, rather than in index-organized tables. The SQL interface for extensible indexing makes no restrictions on the location of the index data, only that the application adhere to the protocol for index definition, maintenance and search operations.
This section describes the key concepts of the Extensible Indexing Framework.
For simple data types such as integers and small strings, all aspects of indexing can be easily handled by the database system. This is not the case for documents, images, video clips and other complex data types that require content-based retrieval (CBR). The essential reason is that complex data types have application specific formats, indexing requirements, and selection predicates. For example, there are many different document encodings (such as ODA, SGML, plain text) and information retrieval (IR) techniques (keyword, full-text boolean, similarity, probabilistic, and so on). To effectively accommodate the large and growing number of complex data objects, the database system must support application specific indexing. The approach that we employ to satisfy this requirement is termed extensible indexing.
With Extensible indexing,
In effect, the application controls the structure and semantic content of the domain index. The database system interacts with the application to build, maintain, and employ the domain index. It is highly desirable for the database to handle the physical storage of domain indexes. In the following discussion, we implicitly make the assumption that the index is stored in an index-organized table. Note however, that the extensible indexing paradigm does not impose this requirement. The index could be stored in one or more external files.
To illustrate the notion of extensible indexing, we consider a textual database application with IR functionality. For such applications, document indexing involves parsing the text and inserting the words, or tokens, into an inverted index. Such index entries typically have the following logical form
(token, <docid, data>)
where token is the key, docid is a unique identifier (such as object identification) for the related document, and data is a segment containing IR specific quantities. For example, a probabilistic IR scheme could have a data segment with token frequency and occurrence list attributes. The occurrence list identifies all locations within the related document where the token appears. Assuming an IR scheme such as this, each index entry would be of the form:
(token, <docid, frequency, occlist> ..)
The following sample index entry for the token Archimedes illustrates the associated logical content.
In this sample index entry, the token "Archimedes" appears in document 5 at 3 locations(7, 62, and 225), and in document 26 at 2 locations(33 and 49). Note that the index would contain one entry for every document with the word "Archimedes".
IR applications can use domain indexes to locate documents that satisfy some given selection criteria. After consulting the index, the documents of interest are retrieved with the related docid values. It should be noted that the occurrence lists are required for queries that contain proximity expressions (for example, the phrase "Oracle Corporation").
When the database system handles the physical storage of domain indexes, applications must be able to:
In the following section, we illustrate the extensible indexing framework by building a text domain index.
This section presents an example of adding a text indexing scheme to Oracle RDBMS using the extensible indexing framework. It describes:
'The sequence of steps required to define the Text Indextype are:
The text cartridge intends to support an operator
Contains, that takes as parameters a text value and a key and returns a number value indicating whether the text contained the key. The functional implementation of this operator is a regular function defined as:
CREATE FUNCTION TextContains(Text IN VARCHAR2, Key IN VARCHAR2) RETURN NUMBER AS BEGIN ....... END TextContains;
ODCIIndex. This involves implementing routines for index definition, index maintenance, and index scan operations.
The index definition routines (
ODCIIndexTruncate) build the text index when index is created, alter the index information when index is altered, remove the index information when the index is dropped, and truncate the text index when the base table is truncated.
The index maintenance routines (
ODCIIndexUpdate) maintain the text index when the table rows are inserted, deleted, or updated.
The index scan routines (
ODCIIndexClose) implement access to the text index to retrieve rows of the base table that satisfy the operator predicate. In this case, the
Contains(...) =1, whose arguments are passed to the index scan routines. The index scan routines scan the text index and return the qualifying rows to the system.
Indextypeschema object. The Indextype definition also specifies all the operators supported by the new indextype and specifies the type that implements the index interface.
Suppose that the text indextype presented in the previous section has been defined in the system. You can define text indexes on text columns and use the associated
Contains operator to query text data.
Employees table defined as follows:
A text domain index can be built on the resume column as follows:
The text data in the resume column can be queried as:
The query execution will use the text index on
resume to efficiently evaluate the
The following sections describe the concepts of indextypes, domain indexes and operators in greater detail.
The purpose of an indextype is to enable efficient search and retrieval functions for complex domains such as text, spatial, image, and OLAP using external software. An indextype is analogous to the sorted or bit-mapped indextype that are supplied internally within the Oracle Server. The essential difference is that the implementation for an indextype is provided by application software, as opposed to the Oracle Server internal routines.
A set of routine specifications. It does not refer to a separate schema object but rather a logical set of documented method specifications.
The set of index definition, maintenance and scan routine specifications.
The interface specifies all the routines which have to be implemented by the index designer. The routines are implemented as type methods.
After the type implementing the
ODCIIndex interface has been defined, a new indextype can be created by specifying the list of operators supported by the indextype and referring to the type that implements the index interface.
Using the information retrieval example, the DDL statement for defining the new indextype
TextIndexType which supports the
Contains operator and whose implementation is provided by the type
TextIndexMethods (implemented in the previous section) is as follows:
In addition to the
ODCIIndex interface routines, the implementation type must always implement the
ODCIGetInterfaces routine. This function returns the list of names of the interface routines implemented by the type and tells the server the version of these routines. The
ODCIGetInterfaces routine is invoked by Oracle when
CREATE INDEXTYPE is executed. If the indextype implements the Oracle9i version of the routines,
ODCIGetInterfaces must specify
'SYS.ODCIINDEX2' in the
OUT parameter. If the indextype implements the Oracle8i version of the routines,
ODCIGetInterfaces must specify
'SYS.ODCIINDEX1' in the
OUT parameter. (The Oracle8i routines lack the
ODCIEnv parameter added to many of the routines in Oracle9i.)
DROP statement is supported to remove the definition of an indextype. For our example, this statement would be of the following form:
DROP behavior is
DROP RESTRICT semantics, that is, if one or more domain indexes exist that uses the indextype then the
DROP operation is disallowed. User can override the default behavior with the
FORCE option, which drops the indextype and marks dependent domain indexes (if any) invalid. For more details on object dependencies and drop semantics see "Object Dependencies, Drop Semantics, and Validation".
COMMENT statement can be used to supply information about an indextype or operator. For example:
COMMENT ON INDEXTYPE Ordsys.TextIndexType IS 'implemented by the type TextIndexMethods to support the Contains operator';
Comments on indextypes can be viewed in these data dictionary views:
To place a comment on an indextype, the indextype must be in your own schema or you must have the
COMMENT ANY INDEXTYPE privilege.
ODCIIndex (Oracle Data Cartridge Interface Index) interface consists of the following classes of methods:
Index definition methods allow specification of
ODCIIndexCreate procedure is called when a
INDEX statement is issued that references the indextype. Upon invocation, any physical parameters specified as part of the
PARAMETERS (...) statement are passed in along with the description of the index.
A typical action of this procedure is to create tables/files to store index data. Further, if the base table is not empty, this routine should build the index for the existing data in the indexed columns.
ODCIIndexAlter procedure is invoked when a domain index is altered using an
INDEX statement. The description of the domain index to be altered is passed in along with any specified physical parameters.
In addition, this procedure is allowed to handle
REBUILD option, which supports rebuilding of domain index. The precise behavior in these two cases is defined by the person who implements indextype.
ODCIIndexAlter routine is also invoked when a domain index is renamed using the
ODCIIndexTruncate procedure is called when a
TRUNCATE statement is issued against a table that contains a column or
OBJECT type attribute indexed by the indextype. After this procedure executes, the domain index should be empty.
ODCIIndexDrop procedure is invoked when a domain index is destroyed using a
Index maintenance methods allow specification of index
ODCIIndexInsert procedure in the indextype is called when a record is inserted in a table that contains columns or
OBJECT attributes indexed by the indextype. The new values in the indexed columns are passed in as arguments along with the corresponding row identifier.
ODCIIndexDelete procedure in the indextype is called when a record is deleted from a table that contains columns or
OBJECT attributes indexed by the indextype. The old values in the indexed columns are passed in as arguments along with the corresponding row identifier.
ODCIIndexUpdate procedure in the indextype is called when a record is updated in a table that contains columns or
OBJECT attributes indexed by the indextype. The old and new values in the indexed columns are passed in as arguments along with the row identifier.
Index scan methods allow specification of an index-based implementation for evaluating predicates containing operators.
An index scan is specified through three routines,
ODCIIndexClose. These perform initialization, fetch rows (or identifiers of rows) that satisfy the predicate, and clean up after all rows are returned.
ODCIIndexStart() is invoked to initialize any data structures and start an index scan. The index related information and the operator related information are passed in as arguments.
A typical action performed when
ODCIIndexStart() is invoked is to parse and execute SQL statements that query the tables storing the index data. It could also generate some set of result rows to be returned later when
ODCIIndexFetch() is invoked.
Since the index and operator related information are passed in as arguments to
ODCIIndexStart() and not to the other index scan routines (
ODCIIndexClose()), any information needed in the later routines must be saved. This is referred to as the state that has to be shared among the index scan routines. There are two ways of doing this:
The chapter on Cartridge Services in the Oracle Call Interface Programmer's Guide for information on memory services and maintaining context
In both cases, Oracle RDBMS will pass the
SELF value to subsequent
ODCIIndexClose() calls which can then use the value to access the relevant context information.
There are two modes of evaluating the operator predicate to return the result set of rows.
ODCIIndexStart(). Iterate over the results returning a row at a time in
ODCIIndexFetch(). This mode is required for operators involving some sort of ranking over the entire collection. Evaluating such operators would require looking at the entire result set to compute the ranking, relevance, and so on for each candidate row.
ODCIIndexFetch(). This mode is applicable for operators which can determine the candidate rows one at a time without having to look at the entire result set.
The choice of evaluating modes as well as what gets saved is left to the index designer. In either case, the Oracle RDBMS simply executes the
ODCIIndexStart() routine as part of processing query containing operators which returns the context as an output
SELF value.The returned value is passed back to subsequent
ODCIIndexFetch() returns the "next" row identifier of the row that satisfies the operator predicate.The operator predicate is specified in terms of the operator expression (name and arguments) and a lower and upper bound on the operator return values. Thus, a
ODCIIndexFetch() call returns the row identifier of the rows for which the operator return value falls within the specified bounds. A
NULL is returned to indicate end of index scan. The fetch method supports returning a batch of rows in each call. The state returned by
ODCIIndexStart() or a previous call to
ODCIIndexFetch() is passed in as an argument.
ODCIIndexClose() is invoked when the cursor is closed or reused. In this call the Indextype can perform any clean-ups or other needed functions. The current state is passed in as an argument.
ODCIIndexGetMetadata routine, if it is implemented, is called by the export utility to write implementation-specific metadata into the export dump file. This metadata might be policy information, version information, individual user settings, and so on, which are not stored in the system catalogs. The metadata is written to the dump files as anonymous PL/SQL blocks that get executed at import time immediately prior to the creation of the associated index.
The index interface routines (with the exception of index definition methods, namely,
ODCIIndexDrop()) are invoked under the same transaction that triggered these actions. Thus, the changes made by these routines are atomic and are committed or aborted based on the parent transaction. To achieve this, there are certain restrictions on the nature of the actions that can be performed in the different indextype routines.
For example, if an
INSERT statement caused the
ODCIIndexInsert() routine to be invoked,
ODCIIndexInsert() runs under the same transaction as
ODCIIndexInsert() routine can execute any number of DML statements (for example, insert into index-organized tables). If the original transaction aborts, all the changes made by the indextype routines are rolled back.
However, if the indextype routines cause changes external to the database (like writing to external files), transaction semantics are not assured.
The index definition routines do not have any restrictions on the nature of actions within them. Consider
ODCIIndexCreate() to understand this difference. A typical set of actions to be performed in
ODCIIndexCreate() could be:
ODCIIndexCreate() to execute an arbitrary sequence of DDL and DML statements, we consider each statement to be an independent operation. Consequently, the changes made by
ODCIIndexCreate() are not guaranteed to be atomic. The same is true for other index-definition routines.
The index maintenance (and scan routines) execute with the same snapshot as the top level SQL statement performing the DML (or query) operation. This enables the index data processed by the index method to be consistent with the data in the base tables.
Indextype routines always execute as the owner of the index. To support this, the index access driver will dynamically change user mode to index owner before invoking the indextype routines.
For certain operations, indextype routines may require to store information in tables owned by indextype designer. Indextype implementation must code those actions in a separate routine which will be executed using definer's privileges. For more information on syntax, see
TYPE in the Oracle9i SQL Reference.
This section describes the domain index operations and how metadata associated with the domain index can be obtained.
A domain index can be created on a column of a table just like a B-tree index. However, an indextype must be explicitly specified. For example:
CREATE INDEX ResumeTextIndex ON Employees(resume) INDEXTYPE IS TextIndexType PARAMETERS (':Language English :Ignore the a an');
INDEXTYPE clause specifies the indextype to be used. The
PARAMETERS clause identifies any parameters for the domain index, specified as a string. This string is passed uninterpreted to the
ODCIIndexCreate routine for creating the domain index. In the preceding example, the parameters string identifies the language of the text document (thus identifying the lexical analyzer to use) and the list of stop words which are to be ignored while creating the text index.
A domain index can be altered using
INDEX statement. For example:
The parameter string is passed uninterpreted to
ODCIIndexAlter() routine, which takes appropriate actions to alter the domain index. In the preceding example, additional stop words to ignore in the text index are specified.
ALTER statement can be used to rename a domain index.
ODCIIndexAlter() routine is invoked, which takes appropriate actions to rename the domain index.
In addition, the
ALTER statement can be used to rebuild a domain index.
ODCIIndexAlter() routine is called but with additional information about the
When the end user executes an
ALTER INDEX <domain_index>
UPDATE BLOCK REFERENCES for a domain index on an index-organized table (IOT),
ODCIIndexAlter() is called with the
AlterIndexUpdBlockRefs bit set. This gives the cartridge developer the opportunity to update guesses (as to the block locations of rows) stored in the domain index in logical rowids.
There is no explicit statement for truncating a domain index. However, when the corresponding table is truncated the truncate procedure specified as part of the indextype is invoked. For example:
will result in truncating
ResumeTextIndex by calling
To drop an instance of a domain index, the
INDEX statement is used. For our example, this statement would be of the form:
This results in calling the
ODCIIndexDrop() routine and passing information about the index.
When the base table of a domain index is an index-organized table, and you want to store rowids for the base table in a table of your own, you should store the rowids in a
UROWID column if you will be testing rowids for equality.
If the rowids are stored in a
VARCHAR column instead, comparisons for equality of the text of the rowids from the base table and your own table will fail even when the rowids pick out the same row. This is because a domain index on an index-organized table uses logical instead of physical rowids, and, unlike physical rowids, logical rowids for the same row can have different textual representations. (Two logical rowids are equivalent when they have the same primary key, regardless of the guesses stored with them.)
UROWID (univeral rowid) column can contain both physical and logical rowids. Storing rowids for an IOT in a
UROWID column ensures that the equality operator will succeed on two logical rowids that have the same primary key information but different primary keys.
If you create an index storage table with a rowid column by performing a
CREATE TABLE AS SELECT from the IOT base table, then a
UROWID column of the correct size is created for you in your index table. If you create a table with a rowid column, then you need to explicitly declare your rowid column to be of type
x is the size of the
UROWID column. The size chosen should be large enough to hold any rowid from the base table and so should be a function of the primary key from the base table. The following query can be used to determine a suitable size for the
SELECT (SUM(column_length + 3) + 7) FROM user_ind_columns ic, user_indexes i WHERE ic.index_name = i.index_name AND i.index_type = IOT - TOP AND ic.table_ name = <base_table>;
You can use the
IndexOnIOT bit of
IndexInfoFlags in the
ODCIIndexInfo structure to determine if the base table is an IOT.
ALTER INDEX REBUILD on index storage tables raises the same issues as doing a
CREATE TABLE if you drop your storage tables and re-create them. If, on the other hand, you reuse your storage tables, no additional work should be necessary if your base table is an IOT.
UROWID column is maintained in the index storage table, then you may need to change the type of the rowid bind variable in DML
DELETE statements so that it will work for all kinds of rowids. Converting the rowid argument passed in to a character array and then binding it as a
SQLT_STR works well for both physical and universal rowids. This strategy may enable you to more easily code your indextype to work with both regular tables and IOTs.
If you use an index scan-context structure to pass context between
Close, you will need to alter this structure. In particular, if you store the rowid define variable for the query in a buffer in this structure, then you will need to allocate the maximum size for a
UROWID in this buffer (3800 bytes for universal rowids in byte format, 5072 for universal rowids in character format) unless you know the size of the primary key of the base table in advance or wish to determine it at run-time. You will also need to store a bit in the context to indicate if the base table is an IOT, since
ODCIIndexInfo is not available in
As with DML operations, setting up the define variable as a
SQLT_STR works well for both physical and universal rowids. When physical rowids are fetched from the index table, you can be sure that their length is 18 characters. Universal rowids, however, may be up to 5072 characters long, and so a string length function must be used to correctly determine the actual length of a fetched universal rowid.
All values of a primary key column must be unique, so a domain index defined upon a non-unique column of a table cannot use this column as the primary key of an underlying IOT used to store the index. To work around this, you can add a column in the IOT, holding the index data, to hold a unique sequence number. Whenever a column value is inserted in the table, you should generate a unique sequence number to go with it. The indexed column together with the sequence number can be used as the primary key of the IOT. (Note that the sequence-number column cannot be a
UROWID columns cannot be part of a primary key for an IOT.) This approach also preserves the fast access to primary key column values that is a major benefit of IOTs.
For B-tree indexes, users can query the
USER_INDEXES view to get index information. To provide similar support for domain indexes, indextype designers can add any domain-specific metadata in the following manner:
schema.index). The remainder of the column definitions are at the discretion of the index designer.
Like B-tree and bitmap indexes, domain indexes are exported and subsequently imported when their base tables are exported. However, domain indexes can have implementation-specific metadata associated with them that are not stored in the system catalogs. For example, a text domain index can have associated policy information, a list of irrelevant words, and so on. Export/Import provides a mechanism to opaquely move this metadata from the source platform to target platform.
To move the domain index metadata, the indextype needs to implement the
ODCIIndexGetMetadata interface routine (see the reference chapters for details). This interface routine gets invoked when a domain index is being exported. The domain index information is passed in as a parameter. It can return any number of anonymous PL/SQL blocks that are written into the dump file and executed on import. If present, these anonymous PL/SQL blocks are executed immediately before the creation of the associated domain index.
Note that the
ODCIIndexGetMetadata is an optional interface routine. It is needed only if the domain index has extra metadata to be moved.
The transportable tablespaces feature enables you to move tablespaces from one Oracle database into another. You can use transportable tablespaces to move domain index data as an alternative to exporting and importing it.
Moving data using transportable tablespaces can be much faster than performing either an export/import or unload/load of the data because transporting a tablespace only requires copying datafiles and integrating tablespace structural information. Also, you do not need to rebuild the index afterward as you do when loading or importing.
Oracle9i Database Administrator's Guide for information about using transportable tablespaces
A user-defined operator is a top-level schema object. It is identified by a name which is in the same namespace as tables, views, types and standalone functions.
An operator binding associates an operator with the signature of a function that implements the operator. A signature consists of a list of the datatypes of the arguments of the function, in order of occurrence, and the function's return type. Binding an operator to a certain signature enables Oracle to pick out the function to execute when the operator is invoked. An operator can be implemented by more than one function as long as each function has a different signature. For each such function, you must define a corresponding binding.
Thus, any operator has an associated set of one or more bindings. Each binding can be evaluated using a user-defined function of any of these kinds:
An operator created in a schema can be evaluated using functions defined in the same or different schemas. The operator bindings can be specified at the time of creating the operator. It is ensured that the signatures of the bindings are unique.
An operator can be created by specifying the operator name and its bindings.
For example, an operator
Contains can be created in the
Ordsys schema with two bindings and the corresponding functions that provide the implementation in the Text and Spatial domains.
CREATE OPERATOR Ordsys.Contains BINDING (VARCHAR2, VARCHAR2) RETURN NUMBER USING text.contains, (Spatial.Geo, Spatial.Geo) RETURN NUMBER USING Spatial.contains;
Although the return data type is specified as part of operator binding declaration, it is not considered to determine the uniqueness of the binding, However, the specified function must have the same argument and return datatypes as the operator binding.
An existing operator and all its bindings can be dropped using the
OPERATOR statement. For example:
DROP behavior is
RESTRICT semantics. Namely, if there are any dependent indextypes for any of the operator bindings, then the
DROP operation is disallowed.
However, users can override the default behavior by using the
FORCE option. For example,
Contains and all its bindings and marks any dependent indextype objects invalid
COMMENT statement can be used to supply information about an indextype or operator. For example:
COMMENT ON OPERATOR Ordsys.TextIndexType IS 'a number indicating whether the text contains the key';
Comments on operators can be viewed in these views in the data dictionary:
To place a comment on an operator, the operator must be in your own schema or you must have the
COMMENT ANY OPERATOR privilege.
User-defined operators can be invoked anywhere built-in operators can be used, that is, wherever expressions can occur. For example, user-defined operators can be used in the following:
When an operator is invoked, the evaluation of the operator is accomplished by executing a function bound to it. The function is selected based on the datatypes of the arguments to the operator. If no function bound to the operator matches the signature with which the operator is invoked (perhaps after some implicit type conversions), an error is raised.
Consider the operator created with the following statement:
CREATE OPERATOR Ordsys.Contains BINDING (VARCHAR2, VARCHAR2) RETURN NUMBER USING text.contains, (spatial.geo, spatial.geo) RETURN NUMBER USING spatial.contains;
Consider the operator
Contains being used in the following SQL statements:
The invocation of the operator
'Oracle') is transformed into the execution of the function
'Oracle') since the signature of the function matches the datatypes of the operator arguments. Similarly, the invocation of the operator
:bay_area) is transformed into the execution of the function
The following statement would raise an error since none of the operator bindings satisfy the argument datatypes:
System privileges for operator schema objects are:
See the Oracle9i SQL Reference for details.
To use a user-defined operator in an expression, you must own the operator or have
EXECUTE privilege on it.
An operator can be optionally supported by one or more user-defined indextypes. An indextype can support one or more operators. This means that a domain index of this indextype can be used in efficiently evaluating these operators. For example, B-tree indexes can be used to evaluate the relational operators like =, < and >. Operators can also be bound to regular functions. For example, an operator Equal can be bound to a function
eq(number, number) that compares two numbers. The DDL for this would be:
Thus, an indextype designer should first design the set of operators to be supported by the indextype. For each of these operators, a functional implementation should be provided.
The list of operators supported by an indextype are specified when the indextype schema object is created (as described previously).
Operators that occur in a
WHERE clause are evaluated differently than operators occurring elsewhere in a SQL statement. Both kinds of cases are considered in the following sections.
Operators appearing in the
WHERE clause can be evaluated efficiently by performing an index scan using the scan methods provided as part of the implementation of an indextype. This involves recognizing operator predicates of a certain form, selection of a domain index, setting up an appropriate index scan, and finally, executing index scan methods. Let's consider each one of these steps in detail.
An indextype supports efficient evaluation of operator predicates that can be represented by a range of lower and upper bounds on the operator return values. Specifically, predicates of the form:
are possible candidates for index scan-based evaluation.
Use of the operators in any expression, for example
precludes index scan-based evaluation.
Predicates of the form,
are not evaluated using an index scan. They are evaluated using the functional implementation.
Finally, any other operator predicates which can internally be converted into one of the preceding forms by Oracle can also make use of the index scan based evaluation.
An index scan-based evaluation of an operator is a possible candidate for predicate evaluation only if the operator occurring in the predicate (as described in the preceding section) operates on a column or object attribute indexed using an indextype. The final decision to choose between the indexed implementation and the functional implementation is made by the optimizer. The optimizer takes into account the selectivity and cost while generating the query execution plan.
For example, consider the query
The optimizer can choose to use a domain index in evaluating the
Contains operator if
TextIndexTypesupports the appropriate
If any of these conditions do not hold, a complete scan of the
Employees table is performed and the functional implementation of
Contains is applied as a post-filter. If these conditions are met, the optimizer uses selectivity and cost functions to compare the cost of index-based evaluation with the full table scan and appropriately generates the execution plan.
Consider a slightly different query,
In this query, the
Employees table could be accessed through an index on the
id column or one on the
resume column. The optimizer estimates the costs of the two plans and picks the cheaper one, which could be to use the index on
id and apply the
Contains operator on the resulting rows. In this case, the functional implementation of
Contains() is used and the domain index is not used.
If a domain index is selected for the evaluation of an operator predicate, an index scan is set-up. The index scan is performed by the scan methods (
lose()) specified as part of the corresponding indextype implementation. The
ODCIIndexStart() method is invoked with the operator related information including name and arguments and the lower and upper bounds describing the predicate. After the
ODCIIndexStart() call, a series of fetches are performed to obtain row identifiers of rows satisfying the predicate, and finally the
lose() is called when the SQL cursor is destroyed.
The index scan routines must be implemented with an understanding of how the routines' invocations are ordered and how multiple sets of invocations can be interleaved.
As an example, consider the following query:
SELECT * FROM Emp1, Emp2 WHERE Contains(Emp1.resume, 'Oracle') =1 AND Contains(Emp2.resume, 'Unix') =1 AND Emp1.id = Emp2.id;
If the optimizer decides to use the domain indexes on the resume columns of both tables, the indextype routines may be invoked in the following sequence:
start(ctx1, ...); /* corr. to Contains(Emp1.resume, 'Oracle') */ start(ctx2, ...); /* corr. to Contains(Emp2.resume, 'Unix'); fetch(ctx1, ...); fetch(ctx2, ...); fetch(ctx1, ...); ... close(ctx1); close(ctx2);
Thus, the same indextype routine may be invoked but for different instances of operators. At any time, many operators are being evaluated through the same indextype routines. In case of a routine that does not need to maintain any state across calls because all the information is obtained through its parameters (as with the create routine), this is not a problem. However, in case of routines needing to maintain state across calls (like the fetch routine, which needs to know which row to return next), the state should be maintained in the
SELF parameter that is passed in to each call. The
SELF parameter (which is an instance of the implementation type) can be used to store either the entire state (if it is not too big) or a handle to the cursor-duration memory that stores the state.
Operators occurring in expressions other than in the
WHERE clause are evaluated using the functional implementation. For example,
would be executed by scanning the
Employee table and invoking the functional implementation for
Contains on each instance of resume. The function is invoked by passing it the actual value of the resume (text data) in the current row. Note that this function would not make use of any domain indexes that may have been built on the resume column.
However, it is possible to have a functional implementation for an operator that makes use of a domain index. The following sections discuss how functions that use domain indexes can be written and how they are invoked by the system.
For many domain-specific operators, such as
Contains, the functional implementation can work in two ways:
OBJECTattribute) that has a domain index of a particular indextype, the function can evaluate the operator by looking at the index data rather than the actual argument value.
For example, when
'Oracle') is invoked on a particular row of the
Employee table, it is easier for the function to look up the text domain index defined on the resume column and evaluate the operator based on the row identifier for the row containing the resume - rather than work on the resume text data argument.
To achieve both the behaviors of (1) and (2), the functional implementation is provided using a regular function which has three additional arguments--that is, additional to the original arguments to the operator. The additional arguments are:
For example, the index-based functional implementation for the
Contains operator is provided by the following function.
CREATE FUNCTION TextContains (Text IN VARCHAR2, Key IN VARCHAR2, indexctx IN ODCIIndexCtx, scanctx IN OUT TextIndexMethods, scanflg IN NUMBER) RETURN NUMBER AS BEGIN ....... END TextContains;
Contains operator is bound to the functional implementation as follows:
CREATE OPERATOR Contains BINDING (VARCHAR2, VARCHAR2) RETURN NUMBER WITH INDEX CONTEXT, SCAN CONTEXT TextIndexMethods USING TextContains;
WITH INDEX CONTEXT clause specifies that the functional implementation can make use of any applicable domain indexes. The
SCAN CONTEXT specifies the datatype of the scan context argument. It must be the same as the implementation type of the relevant indextype that supports this operator.
Oracle invokes the functional implementation for the operator if the operator appears outside the
WHERE clause. If the functional implementation is index-based (that is, defined to use an indextype), the additional index information is passed in as arguments only if the operator's first argument is a column (or object attribute) with a domain index of the appropriate indextype defined on it.
For example, in the query
Contains is evaluated using the index-based functional implementation by passing the index information about the domain index on the
resume column instead of the resume data.
To execute the index-based functional implementation, Oracle RDBMS sets up the arguments in the following manner:
ODCIIndexCtxattributes are set to
ODCIIndexCtxattributes are set up as follows.
NULLto the first invocation of the operator. Since it is an
OUTparameter, the return value from the first invocation is passed in to the second invocation and so on.
RegularCallfor all normal invocations of the operator. After the last invocation, the functional implementation is invoked once more, at which time any cleanup actions can be performed. During this call, the scan flag is set to
CleanupCalland all other arguments except the scan context are set to
When index information is passed in, the implementation can compute the operator value by doing a domain index lookup using the row identifier as key. The index metadata is used to identify the index structures associated with the domain index. The scan context is typically used to share state with the subsequent invocations of the same operator.
Apart from filtering rows, an operator occurring in the
WHERE clause might need to support returning ancillary data. Ancillary data is modeled as an operator (or multiple operators) with a single literal number argument. It has a functional implementation that has access to state generated by the index scan-based implementation of the primary operator occurring in the
For example, in the following query,
Contains is the primary operator and can be evaluated using an index scan which, in addition to determining the rows that satisfy the predicate, also computes a score value for each row. The functional implementation for the
Score operator simply accesses the state generated by the index scan to obtain the score for a given row identified by its row identifier. The literal argument
1 associates the ancillary operator
Score to the corresponding primary operator
Contains which generates the ancillary data.
In summary, ancillary data is modeled as independent operator(s) and is invoked by the user with a single number argument that ties it with the corresponding primary operator. Its functional implementation makes use of either the domain index or the state generated by the primary operator occurring in the
WHERE clause. The functional implementation is invoked with extra arguments: the index context containing the domain index information, and the scan context which provides access to the state generated by the primary operator. The following sections discuss how operators modeling ancillary data are defined and invoked.
An indextype designer needs to specify that an operator binding computes ancillary data. Such a binding is referred to as a primarybinding. For example, a primary binding for Contains can be defined as follows:
CREATE OPERATOR Contains BINDING (VARCHAR2, VARCHAR2) RETURN NUMBER WITH INDEX CONTEXT, SCAN TextIndexMethods COMPUTE ANCILLARY DATA USING TextContains;
This definition registers two bindings for
VARCHAR2)--This can be used as before.
NUMBER)--When ancillary data is required elsewhere in SQL query, the operator can be invoked with the preceding signature. The
NUMBERargument is used to associate the corresponding ancillary operator binding.
However, the indextype designer needs to define a single functional implementation:
An indextype designer has to implement the functional implementation for ancillary data operators in a manner similar to the index-based functional implementation. As discussed earlier, the function takes extra arguments. After the function is defined, the indextype designer can bind it to the operator with an additional
TO attribute, which indicates that the functional implementation needs to share state with the primary operator binding. The binding that is used for modeling ancillary data is referred to as the ancillary operator binding.
For example, let the
TextScore() function contain code to evaluate the
Score ancillary operator.
CREATE FUNCTION TextScore (Text IN VARCHAR2, Key IN VARCHAR2, indexctx IN ODCIIndexCtx, scanctx IN OUT TextIndexMethods, scanflg IN NUMBER) RETURN NUMBER AS BEGIN ....... END TextScore;
An ancillary operator binding can be created as follows:
CREATE OPERATOR Score BINDING (NUMBER) RETURN NUMBER ANCILLARY TO Contains(VARCHAR2, VARCHAR2) USING TextScore;
TOclause specifies that it shares state with the implementation of corresponding primary operator binding
Score(2), and so on.
The operators corresponding to ancillary data are invoked by the user with a single number argument.
The number argument must be a literal in both the ancillary operation and the primary operator invocation. This is required so that the operator association can be done at the query compilation time.
The corresponding primary operator invocation in the query is determined by matching it with the number passed in as the last argument to the primary operator. After the matching primary operator invocation is found (it is an error to find zero or more than one matching primary operator invocation):
For example, consider the query
The invocation of
Score is determined to be ancillary to
Contains based on the number argument
1, and the functional implementation for
Score gets the following operands:
scanctx, scanflg), where
scanctx is shared with the invocation of
The execution involves using an index scan to process the
Contains operator. For each of the rows returned by the
fetch() call of the index scan, the functional implementation of
Score is invoked by passing it the
ODCIIndexCtx argument, which contains the index information, row identifier, and a handle to the index scan state. The functional implementation can use the handle to the index scan state to compute the score.
The dependencies among various objects are as follows:
Thus, the order in which these objects must be created, or their definitions exported for future Import are:
The drop behavior for an object is as follows:
RESTRICTsemantics: If there are any dependent objects the drop operation is disallowed.
FORCEsemantics: The object is dropped even in the presence of dependent objects and the dependent objects if any are recursively marked invalid.
The following table shows the default and explicit drop options supported for operators and indextypes. The other schema objects are included for completeness and the corresponding drop behavior already available in Oracle.
|Schema Object||Default Drop Behavior||Explicit Options Supported|
Invalid objects are automatically revalidated, if possible, the next time they are referenced.
EXECUTEprivilege on the function, operator, package, or the type referenced in addition to
EXECUTEprivilege on the type that implements the indextype in addition to
INDEXTYPEprivilege. Also, you must have
EXECUTEprivileges on the operators that the indextype supports.
CREATE INDEXTYPEsystem privilege.
ALTER ANY INDEXTYPEor
ALTER ANY OPERATORsystem privilege.
EXECUTEprivilege on the indextype in addition to
EXECUTEprivilege on the indextype.
EXECUTEprivilege on the operator and the associated function/package/type.
EXECUTEprivilege on the new implementation type.
A domain index can be built to have discrete index partitions that correspond to the partitions of a range-partitioned table. Such an index is called a local domain index, as opposed to a global domain index, which has no index partitions. The term local domain index refers to a partitioned index as a whole, not to the partitions that comprise a local domain index.
A local domain index is equipartitioned with the underlying table: all keys in a local domain index refer only to rows stored in its corresponding table partition; none refer to rows in other partitions.
Currently, local domain indexes can be created only for range-partitioned tables. Local domain indexes cannot be built for hash-partitioned tables or IOTs.
A local (as opposed to a global) domain index can index only a single column; it cannot index an expression.
You provide for using local domain indexes in the indextype, with the
CREATE INDEXTYPE statement. For example:
CREATE INDEXTYPE TextIndexType FOR Contains (VARCHAR2, VARCHAR2) USING TextIndexMethods WITH LOCAL RANGE PARTITION;
This statement specifies that the implementation type
TextIndexType is capable of creating/maintaining local domain indexes. The clause
WITH LOCAL RANGE PARTITION specifies the partitioning method for the base table.
CREATE INDEX statement creates and partitions the index. Here is the syntax:
CREATE INDEX [schema.]index ON [schema.]table [t.alias] (indexed_column) INDEXTYPE IS indextype [LOCAL [PARTITION [partition [PARAMETERS ('string')]]] [...] ] [PARAMETERS ('string')];
The given syntax for
LOCAL [PARTITION] clause indicates that the index is a local index on a partitioned table. You can specify partition names or allow Oracle to generate them.
PARAMETERS clause, specify the parameter string that is passed uninterpreted to the appropriate ODCI indextype routine. The maximum length of the parameter string is 1000 characters.
When you specify this clause at the top level of the syntax, the parameters become the default parameters for the index partitions. If you specify this clause as part of the
LOCAL [PARTITION] clause, you override any default parameters with parameters for the individual partition. The
LOCAL [PARTITION] clause can specify multiple partitions.
Once the domain index is created, Oracle invokes the appropriate ODCI routine. If the routine does not return successfully, the domain index is marked
FAILED. The only operations supported on an failed domain index are
DROP INDEX and (for non-local indexes)
The following example creates local domain index
There are these restrictions on creating a local domain index:
A specified index partition cannot be dropped explicitly. To drop a local index partition, the entire local domain index must be dropped:
ALTER INDEX statement can be used to perform the following operations on a local domain index:
ALTER INDEXTYPE statement enables you to change properties and the implementation type of an indextype without having to drop and re-create the indextype and then rebuild all dependent indexes.
Oracle9i SQL Reference for complete syntax of the SQL statements mentioned in this section
Like a domain index, a partition of a local domain index can be in one or more of several states:
The index or the index partition is in this state before and during the execution of the
Same as for regular indexes: An index on a partitioned table is marked
An index gets marked
DML operations cannot be performed on the underlying table if an index partition of a local domain index is in any of these states:
The following tables list operations that can be performed on the underlying table of an index and describe the effect, if any, on the index.
Drops the table. Drops all the indexes and their corresponding partitions
Truncates the table. Truncates all the indexes and the index partitions
|ALTER TABLE Operation||Description|
Base table operations that do not involve partition maintenance
Modify Partition Unusable local indexes
Marks the local index partition associated with the table partition as
Modify Partition Rebuild Unusable local indexes
Rebuilds the local index partitions that are marked
Adds a new table partition. Also adds a new local index partition.
Drops a range table partition. Also drops the associated local index partition
Truncate the table partition. Also truncates the associated local index partition
Base table operations that involve partition maintenance
Moves the base table partition to another tablespace. Corresponding local index partitions are marked
Splits a table partition into two partitions. Corresponding local index partition is also split. If the resulting partitions are non-empty, the index partitions are marked
Merges two table partitions into one partition. Corresponding local index partitions should also merge. If the resulting partition contains data, the index partition is marked
Exchange Partition Excluding Indexes
Exchanges a table partition with a non-partitioned table. Local index partitions and global indexes are marked
Exchange Partition Including Indexes
Exchanges a table partition with a non-partitioned table. Local index partition is exchanged with global index on the non-partitioned table. Index partitions remain
The set of
ODCIIndex interfaces that needs to be implemented for a domain index depends on whether the index is to be partitioned and, if so, in what way. There are two possibilities:
ODCIIndex interfaces that must be implemented for each option are listed in the following sections. Those in the first group must be implemented for any domain index, partitioned or not. Those in the other group need be implemented only to provide support for local range-partitioned indexes.
SQL*Loader conventional path loads are supported for tables on which domain indexes are defined, but direct path loads are not. To do a direct path load, first drop the domain index, do the direct path load in SQL*Loader, and then re-create the domain indexes.