7
Building Domain Indexes

This chapter describes extensible indexing, including:

Introduction to Extensible Indexing

What is extensible indexing? Why is it important to you as a cartridge developer? How should you go about implementing it?

To answer these questions we first need to understand the modes of indexing provided by the Oracle, which in turn requires that we first consider the role of indexing in information management systems.

What is Indexing?

The impetus to index data arises because of the need to locate specific information and then to retrieve it as efficiently as possible. If you could keep the entire dataset in main memory (equivalent to a person memorizing a book), there would be no need for indexing. Since this is not possible, and since disk access times are much slower than main memory access times, you are forced to wrestle with the art of indexing.

If you think of the form of indexing with which we are most familiar -- the index at the back of a technical book -- you will note that every index token has three characteristics which refer to the item being indexed:

Identity -- the token must allows us to identify the item in such a way that it is distinguished from the rest of the mass of the data. But this is not simply a representative relationship. By defining an index item you filter the information, implicitly providing a logical structure for the indexed information.

This has many implications. For one, it means that the same data can be subject to different indexing schemes. For another, it means that the indexing scheme provides a pathway of access to the information. The index in the back of the book gives you access to the entire range of topics covered in the book. Provided that its structure meets your needs, its presorting of the data means that you do not have to sift through every iota of information.
Location --the token must allow us to locate the information. In the case of a book, this is a page number, and may also include a chapter designation. This is not very precise since we still have to search the page for the item. In contrast to the normal index, conversation analysis makes use of line numbers because of the need for greater precision in locating the item:
```
10296   HELEN: If you really loved me you wouldn't go to war.
10297   PARIS: If you really loved me you wouldn't stand in the way of my
               duty.
```
Storage -- the index token has to be located somewhere, and the information that it maps also has to be stored. In the case of books, a page is normally the unit of storage in both cases, but the nature of the storage is different. While the body text is stored as sentences, the index tokens have an altogether different structure.

The upshot is that you can retrieve the information much quicker than if you had to page through the entire book (equivalent to sequential scanning of a file)! However, note that while indexing speeds up retrieval, it slows down inserts because you have to update the index.

Index Structures

An index can be any structure which can be used to represent information that can be used to efficiently evaluate a query.

The Relationship between Logical and Physical Structures

There is no single structure that is optimal for all applications.

If you want to discover if any Regions contain a city named Metropolis, you will deploy an equality operator that will return an exact match (or not).
If you are interested how many time-periods have power demands between two stipulated numbers, you will use an operator that can process a range of data.

In each case, you will want to organize the data in a different index structure since different queries require that information be indexed in different ways. As we will discuss below, a Hash structure is best suited for determining exact match, whereas a B-tree is much better suited for range queries.

Moreover, these are not the only kind of queries. What if you want to discover whether Power Station A or B can best service Quadrant 3, or to determine the overlapping coverage zones derived from different distributions of power stations? In these cases, you will want to create operators (inRangeOf, servesArea, etc.) that meet your specific requirements. Unfortunately, you cannot do this by means of either Hash or B-tree indexes.

The Need for Index Structures that Encompass Unstructured Data

The limitation of Hash and B-tree indexes is important because one criterion that distinguishes cartridges from other database applications is that data often incorporates many different kinds of information. While database systems are accomplished in processing scalar values, they cannot encompass the domain-specific data of interest to cartridge developers. Information in these contexts may be made up of text, images, audio, video -- and combinations of these that comprise domain-specific datatypes.

One way to resolve this problem is to create an index that serves as an intermediate structure. This is a logical extension of the basic idea underlying software-based indexing, namely that pointers refer to data (records, pages, files). In this scheme, keywords used to index video may be stored as an index. Going one step further, an intermediate structure may itself be indexed, as you might index abstracts (capsule text descriptions) of films.The advantage of this approach is that it may be easier to construct an index based on textual description of film than it is to index video footage. Employing this strategy you can scan the index without ever referring to the primary data (the film).

Unfortunately, intermediate structures in which text or scalars are used to represent unstructured data cannot satisfy all requirements. For one thing, they are always slower than direct indexing of the data because they introduce a level of indirection. More importantly, if the task is to analyze the density of bone in x-rays, or to categorize primate gestures, or to record the radio emissions of stars, there is no efficient substitute for direct indexing of unstructured data.

Kinds of Indexes

B-tree

While there is no single kind of index that can satisfy all needs, the B-tree index comes closest to meeting the requirement. Here we describe the Knuth variation in which the index consists of two parts: a sequence set that provides fast sequential access to the data, and an index set that provides direct access to the sequence set.

Figure 7-1 B-tree Index Structure

Text description of dci81034.gif follows

Text description of the illustration dci81034.gif

While the nodes of a B-tree will generally not contain the same number of data values, and will usually contain a certain amount of unused space, the B-tree algorithm ensures that it remains balanced (the leaf nodes will all be at the same level).

Hash

Hashing gives fast direct access to a specific stored record based on a given field value. Each record is placed at a location whose address is computed as some function of some field of that record. The same function is used both at the time of insertion and retrieval.

The problem with hashing is that the physical ordering of records has little if any relation to their logical ordering. Also, there may be large unused areas on the disk.

Figure 7-2 Hash Index Structure

Text description of dci81035.gif follows

Text description of the illustration dci81035.gif

k-d tree

Our sample scenario integrates geographic data with other kinds of data. Insofar as we are interested in points that can be defined with two dimensions (latitude and longitude), such as geographic location of power stations, we can use a variation on the k-d tree known as the 2-d tree.

In this structure, each node is a datatype with fields for information, the two co-ordinates, a left-link and a right-link which can point to two children.

Figure 7-3 2-d Index Structure

Text description of dci81036.gif follows

Text description of the illustration dci81036.gif

The structure allows for range queries. That is, if the user specifies a point (xx, xx) and a distance, the query will return the set of all points within the specified distance of the point.

2-d trees are very easy to implement. However,the fact that a 2-d tree containing k nodes may have a height of k means that insertion and querying may be complex.

Point Quadtree

Figure 7-4 Point Quadtree Index Structure

Text description of dci81038.gif follows

Text description of the illustration dci81038.gif

The point quadtree is also used to represent point data in a two dimensional spaces. But these structures divide regions into four parts while 2-d trees divide regions into two. The fields of the record type for this node are comprised of an attribute for information, two co-ordinates, and four compass points (NW, SW, NE, SE) that can therefore point to four children.

Like 2-d trees, point quadtrees are very easy to implement. Also like 2-d trees, the fact that a point quadtree containing k nodes may have a height of k means that insertion and querying may be complex. Each comparison requires comparisons on at least two co-ordinates. However, in practice the lengths from root to leaf tend to be shorter in point quadtrees.

Why is Extensible Indexing Necessary?

The fact is that Oracle provides a limited number of kinds of indexes, so that if (for instance) you wish to utilize either a k-d tree or the point quadtree, you will have to implement this yourself. As you consider your need to access your data, you need to keep in mind the following restrictions that pertain to the standard kinds of indexes:

Inability to Index Unstructured Data

Oracle's standard modes of indexing do not permit indexing a column that contains LONG or LOB values.

Inability to Index Attributes of Column Objects

You may not be able to index a column object using Oracle's standard indexing schemes or the elements of a collection type.

Inability to Index Values Derived from Domain-specific Operations

Oracle object types may be compared using either a map function or an order function. If the object utilizes a map function, then you can define a function-based index that can be used implicitly to evaluate relational predicates. However, if an order function is used, you will not be able to use this to construct an index.

Further, you cannot utilize functions in predicates in which the range of the parameters is infinite. Function-based indexes allow you to include a function in a predicate, provided you can precompute the function values for all the rows. Typically the index would store the rowid and the functional value. Queries that apply relational operators to values based on derived values utilize the index.

However, you can use function-based indexes only if the function is so designed that there are a finite number of input combinations. Put another way: you cannot use function-based indexes in cases in which the input parameters do not have a limited cardinality.

The Extensible Indexing API

This SQL-based interface allows you to define domain-specific operators and indexing schemes, and integrate these into the Oracle server.

Oracle provides a set of pre-defined operators which include arithmetic operators (+, -, *, /), comparison operators (=, >, <) and logical operators (NOT, AND, OR). These operators take as input one or more arguments (or operands) and return a result. They are represented by special characters (+) or keywords (AND).

Like built-in operators, user-defined operators (e.g., Contains) take a set of operands as input and return a result. The implementation of the operator is provided by the user. After a user has defined a new operator, it can be used in SQL statements like any other built-in operator.

For instance, suppose you define a new operator Contains, which takes as input a text document and a keyword, and returns 1 if the document contains the specified keyword. You can then write an SQL query as:

SELECT * FROM Employees WHERE Contains(resume, 'Oracle and UNIX')=1;

Oracle uses indexes to efficiently evaluate some built-in operators. For example, a B-tree index can be used to evaluate the comparison operators =, > and <. Similarly, user-defined domain indexes can be used to efficiently evaluate user-defined operators.

Typical database management systems support a few types of access methods (B+Trees, Hash Index) on some set of data types (numbers, strings, etc.). In recent years, databases are more and more being used to store different types of data, such as text, spatial, image, video and audio. In these complex domains, there is a need for indexing complex data types and also specialized indexing techniques. For instance, R-trees are an efficient method of indexing spatial data. No database server can be built with support for all possible kinds of complex data and indexing. The solution is to provide an extensible server which allows the user to define new index types.

The framework to develop new index types is based on the concept of cooperative indexing where an application and the Oracle server cooperate to build and maintain indexes for data types such as text, spatial and On-line-Analytical Processing (OLAP). The application software, in the form of a cartridge, is responsible for defining the index structure, maintaining the index content during load and update operations, and searching the index during query processing. The index structure itself can either be stored in an Oracle database as an Index-Organized Table, etc., or externally as a file.

The extensible indexing framework consists of the following components:

Indextype: A schema object Indextype specifies the routines that manage all aspects of an application-specific index, namely, index definition, index maintenance, and index scan operations. This schema object enables the Oracle Server to establish a user-defined index on a column of a table or attribute of an Object. It encapsulates the set of routines that together manage and access the user-defined index.
Domain Index: Using the Indextype schema object, an application-specific index can be created. Such an index is called a domain index since it is used for indexing data in application-specific domains. A domain index is an instance of an index which is created, managed, and accessed by routines supplied by an indextype. This is in contrast to B-tree indexes maintained by Oracle internally, which are simply referred to as indexes.
Operators: Queries and data manipulation statements can involve application-specific operators, like the Overlaps operator in the spatial domain. In general, user-defined operators can be bound to functions. However, operators can also be evaluated using indexes. For instance, the equality operator can be evaluated using a hash index. An indextype provides index-based implementation for the operators listed in the indextype definition.
Index-Organized tables: This feature enables applications to define, build, maintain, and access indexes for complex objects using a table metaphor. To the application, an index is modeled as a table, where each row is an index entry. In addition, this feature extends the current sorted access method to handle indexing content-rich objects by providing improved handling of duplicate index entries. For detailed information on index-organized tables see Oracle9i Database Administrator's Guide.

To illustrate the role of each of these components, let us consider a text domain application. Suppose a new indextype TextIndexType be defined as part of the text cartridge. It contains routines for managing and accessing the text index. The text index is an inverted index storing the occurrence list for each token in each of the text documents. The text cartridge also defines the Contains operator for performing content-based search on textual data. It provides both a functional implementation (a simple number function) and an index implementation (using the text index) for the Contains operator.

Now, let Employees be an employee table with a resume column containing textual data.

CREATE TABLE Employees 
(name VARCHAR(128), id INTEGER, resume VARCHAR2(1024));

A domain index can be created on resume column as follows:

CREATE INDEX ResumeTextIndex ON Employees(resume)
INDEXTYPE IS TextIndexType;

The Oracle server invokes the routine corresponding to the create method in the TextIndexType, which results in the creation of an index-organized table to store the occurrence list of all tokens in the resumes (essentially, the inverted index data). The inverted index modeled by ResumeTextIndex is automatically maintained by invoking routines defined in TextIndexType, whenever an Employees row is inserted, updated, or deleted.

Content-based search on the resume column can be performed as follows:

SELECT * FROM Employees WHERE Contains(resume, 'Oracle and UNIX')=1;

Index-based implementation of the Contains operator can take advantage of the previously built inverted index. Specifically, the Oracle server can invoke routines specified in TextIndexType to search the domain index for identifying candidate rows, and then do further processing such as filtering, selection, and fetching of rows. Note that the above query can also be evaluated using the non-index implementation of the Contains operator, if the Oracle server chooses to not use the index defined on resume column. In such a case, the filtering of rows will be done by applying the non-index implementation on each resume instance of the table.

In summary, the extensible indexing interface will

Allow encapsulating application-specific index management routines as an indextype schema object,
Support defining a domain index (an application-specific index) on table columns, and
Provide efficient processing of application-specific operators.

This interface will enable a domain index to operate essentially the same way as any other Oracle Server index, the primary difference being that the Oracle Server will invoke application code specified as part of the indextype to create, drop, truncate, modify, and search a domain index.

It should be noted that an index designer may choose to store the index data in files, rather than in index-organized tables. The SQL interface for extensible indexing makes no restrictions on the location of the index data, only that the application adhere to the protocol for index definition, maintenance and search operations.

Concepts: Extensible Indexing

This section describes the key concepts of the Extensible Indexing Framework.

Overview

For simple data types such as integers and small strings, all aspects of indexing can be easily handled by the database system. This is not the case for documents, images, video clips and other complex data types that require content-based retrieval (CBR). The essential reason is that complex data types have application specific formats, indexing requirements, and selection predicates. For example, there are many different document encodings (e.g., ODA, SGML, plain text) and information retrieval (IR) techniques (e.g., keyword, full-text boolean, similarity, probabilistic, and so on). To effectively accommodate the large and growing number of complex data objects, the database system must support application specific indexing. The approach that we employ to satisfy this requirement is termed extensible indexing.

With Extensible indexing,

The application defines the structure of the domain index
The application stores the index data either inside the Oracle database (e.g., in form of index-organized tables) or outside the Oracle database
The application manages, retrieves and uses the index data to evaluate user queries

In effect, the application controls the structure and semantic content of the domain index. The database system interacts with the application to build, maintain, and employ the domain index. It is highly desirable for the database to handle the physical storage of domain indexes. In the following discussion, we implicitly make the assumption that the index is stored in an index-organized table. Note however, that the extensible indexing paradigm does not impose this requirement. The index could be stored in one or more external files.

To illustrate the notion of extensible indexing, we consider a textual database application with IR functionality. For such applications, document indexing involves parsing the text and inserting the words, or tokens, into an inverted index. Such index entries typically have the following logical form

(token, <docid, data>)

where token is the key, docid is a unique identifier (e.g., object identification) for the related document, and data is a segment containing IR specific quantities. For example, a probabilistic IR scheme could have a data segment with token frequency and occurrence list attributes. The occurrence list identifies all locations within the related document where the token appears. Assuming an IR scheme such as this, each index entry would be of the form:

(token, <docid, frequency, occlist> ..)

The following sample index entry for the token Archimedes illustrates the associated logical content.

(Archimedes, <5, 3, [7 62 225]>, <26, 2, [33, 49]>, ...);

In this sample index entry, the token "Archimedes" appears in document 5 at 3 locations(7, 62, and 225), and in document 26 at 2 locations(33 and 49). Note that the index would contain one entry for every document with the word "Archimedes".

IR applications can use domain indexes to locate documents that satisfy some given selection criteria. After consulting the index, the documents of interest are retrieved with the related docid values. It should be noted that the occurrence lists are required for queries that contain proximity expressions (e.g., the phrase "Oracle Corporation").

When the database system handles the physical storage of domain indexes, applications must be able to:

Define the format and content of an index. This enables applications to define an index structure that can accommodate a complex data object.
Build, delete, and update a domain index. With this capability, the application software handles building and maintaining the index structures. Note that this is a significant departure from the "automatic" indexing features provided for simple SQL data types. Also, since an index is modeled as a collection of tuples, in-place updating is directly supported.
Access and interpret the content of an index. This capability enables the application software to become an integral component of query processing. That is, the content-related clauses for database queries are handled by application software.

In the following section, we illustrate the extensible indexing framework by building a text domain index.

Example: A Text Indextype

This section presents an example of adding a text indexing scheme to Oracle RDBMS using the extensible indexing framework. It describes:

Defining a new indexing scheme using text indextype.
Use of text indextype by the end user to index and operate on textual data.

Text Indextype Designer

'The sequence of steps required to define the Text Indextype are:

Define and code functions to support functional implementation of operators which would eventually be supported by the text indextype.

The text cartridge intends to support an operator Contains, that takes as parameters a text value and a key and returns a number value indicating whether the text contained the key. The functional implementation of this operator is a regular function defined as:

CREATE FUNCTION TextContains(Text IN VARCHAR2, Key IN VARCHAR2)
RETURN NUMBER AS
BEGIN
.......
END TextContains;

Create a new operator, and define its specification, namely, the argument and return datatypes, and the functional implementation
```
CREATE OPERATOR Contains
 BINDING (VARCHAR2, VARCHAR2) RETURN NUMBER USING TextContains;
```
Define a type that implements the index interface ODCIIndex. This involves implementing routines for index definition, index maintenance, and index scan operations.

The index definition routines (ODCIIndexCreate, ODCIIndexAlter, ODCIIndexDrop, ODCIIndexTruncate) build the text index when index is created, alter the index information when index is altered, remove the index information when the index is dropped, and truncate the text index when the base table is truncated.

The index maintenance routines (ODCIIndexInsert, ODCIIndexDelete, ODCIIndexUpdate) maintain the text index when the table rows are inserted, deleted, or updated.

The index scan routines (ODCIIndexStart, ODCIIndexFetch, ODCIIndexClose) implement access to the text index to retrieve rows of the base table that satisfy the operator predicate. In this case, the Contains(...) =1, whose arguments are passed to the index scan routines. The index scan routines scan the text index and return the qualifying rows to the system.
```
CREATE TYPE TextIndexMethods
(
FUNCTION ODCIIndexCreate(...)
...
);
CREATE TYPE BODY TextIndexMethods
(
...
);
```
Create the Text Indextype schema object. The Indextype definition also specifies all the operators supported by the new indextype and specifies the type that implements the index interface.
```
CREATE INDEXTYPE TextIndexType
FOR Contains(VARCHAR2, VARCHAR2)
USING TextIndexMethods;
```

End User of Text Indextype

Suppose that the text indextype presented in the previous section has been defined in the system. You can define text indexes on text columns and use the associated Contains operator to query text data.

Consider the Employees table defined as follows:

CREATE TABLE Employees
(name VARCHAR2(64), id INTEGER, resume VARCHAR2(2000));

A text domain index can be built on the resume column as follows:

CREATE INDEX ResumeIndex ON Employees(resume) INDEXTYPE IS TextIndexType;

The text data in the resume column can be queried as:

SELECT * FROM Employees WHERE Contains(resume, 'Oracle') =1;

The query execution will use the text index on resume to efficiently evaluate the Contains predicate.

The following sections describe the concepts of indextypes, domain indexes and operators in greater detail.

Indextypes

The purpose of an indextype is to enable efficient search and retrieval functions for complex domains such as text, spatial, image, and OLAP using external software. An indextype is analogous to the sorted or bit-mapped indextype that are supplied internally within the Oracle Server. The essential difference is that the implementation for an indextype is provided by application software, as opposed to the Oracle Server internal routines.

Interface

A set of routine specifications. It does not refer to a separate schema object but rather a logical set of documented method specifications.

ODCIIndex Interface

The set of index definition, maintenance and scan routine specifications.

The interface specifies all the routines which have to be implemented by the index designer. The routines are implemented as type methods.

Creating Indextypes

After the type implementing the ODCIIndex interface has been defined, a new indextype can be created by specifying the list of operators supported by the indextype and referring to the type that implements the index interface.

Using the information retrieval example, the DDL statement for defining the new indextype TextIndexType which supports the Contains operator and whose implementation is provided by the type TextIndexMethods (implemented in the previous section) is as follows:

CREATE INDEXTYPE TextIndexType

FOR Contains (VARCHAR2, VARCHAR2)
USING TextIndexMethods;

In addition to the ODCIIndex interface routines, the implementation type must always implement the ODCIGetInterfaces routine. This function returns the list of names of the interface routines implemented by the type and tells the server the version of these routines. The ODCIGetInterfaces routine is invoked by Oracle when CREATE INDEXTYPE is executed. If the indextype implements the Oracle9i version of the routines, ODCIGetInterfaces must specify 'SYS.ODCIINDEX2' in the OUT parameter. If the indextype implements the Oracle8i version of the routines, ODCIGetInterfaces must specify 'SYS.ODCIINDEX1' in the OUT parameter. (The Oracle8i routines lack the ODCIEnv parameter added to many of the routines in Oracle9i.)

Dropping Indextypes

A corresponding DROP statement is supported to remove the definition of an indextype. For our example, this statement would be of the following form:

DROP INDEXTYPE TextIndexType;

The default DROP behavior is DROP RESTRICT semantics, that is, if one or more domain indexes exist that uses the indextype then the DROP operation is disallowed. User can override the default behavior with the FORCE option, which drops the indextype and marks dependent domain indexes (if any) invalid. For more details on object dependencies and drop semantics see "Object Dependencies, Drop Semantics, and Validation".

Commenting on Indextypes

The COMMENT statement can be used to supply information about an indextype or operator. For example:

COMMENT ON INDEXTYPE
Ordsys.TextIndexType IS 'implemented by the type TextIndexMethods to support the 
Contains operator';

Comments on indextypes can be viewed in these data dictionary views:

USER_INDEXTYPE_COMMENTS
ALL_INDEXTYPE_COMMENTS
DBA_INDEXTYPE_COMMENTS

To place a comment on an indextype, the indextype must be in your own schema or you must have the COMMENT ANY INDEXTYPE privilege.

ODCI Index Interface

The ODCIIndex (Oracle Data Cartridge Interface Index) interface consists of the following classes of methods:

Index Definition methods
Index Maintenance methods
Index Scan methods
Index Metadata method

Index Definition Methods

Index definition methods allow specification of CREATE, ALTER, DROP, and TRUNCATE behaviors.

ODCIIndexCreate

The ODCIIndexCreate procedure is called when a CREATE INDEX statement is issued that references the indextype. Upon invocation, any physical parameters specified as part of the CREATE INDEX... PARAMETERS (...) statement are passed in along with the description of the index.

A typical action of this procedure is to create tables/files to store index data. Further, if the base table is not empty, this routine should build the index for the existing data in the indexed columns.

ODCIIndexAlter

The ODCIIndexAlter procedure is invoked when a domain index is altered using an ALTER INDEX statement. The description of the domain index to be altered is passed in along with any specified physical parameters.

In addition, this procedure is allowed to handle ALTER with REBUILD option, which supports rebuilding of domain index. The precise behavior in these two cases is defined by the person who implements indextype.

The ODCIIndexAlter routine is also invoked when a domain index is renamed using the ALTER INDEX ... RENAME command.

ODCIIndexTruncate

The ODCIIndexTruncate procedure is called when a TRUNCATE statement is issued against a table that contains a column or OBJECT type attribute indexed by the indextype. After this procedure executes, the domain index should be empty.

ODCIIndexDrop

The ODCIIndexDrop procedure is invoked when a domain index is destroyed using a DROP INDEX statement.

Index Maintenance Methods

Index maintenance methods allow specification of index INSERT, UPDATE, and DELETE behaviors.

ODCIIndexInsert

The ODCIIndexInsert procedure in the indextype is called when a record is inserted in a table that contains columns or OBJECT attributes indexed by the indextype. The new values in the indexed columns are passed in as arguments along with the corresponding row identifier.

ODCIIndexDelete

The ODCIIndexDelete procedure in the indextype is called when a record is deleted from a table that contains columns or OBJECT attributes indexed by the indextype. The old values in the indexed columns are passed in as arguments along with the corresponding row identifier.

ODCIIndexUpdate

The ODCIIndexUpdate procedure in the indextype is called when a record is updated in a table that contains columns or OBJECT attributes indexed by the indextype. The old and new values in the indexed columns are passed in as arguments along with the row identifier.

Index Scan Methods

Index scan methods allow specification of an index-based implementation for evaluating predicates containing operators.

An index scan is specified through three routines, ODCIIndexStart, ODCIIndexFetch, and ODCIIndexClose. These perform initialization, fetch rows (or identifiers of rows) that satisfy the predicate, and clean up after all rows are returned.

ODCIIndexStart

ODCIIndexStart() is invoked to initialize any data structures and start an index scan. The index related information and the operator related information are passed in as arguments.

A typical action performed when ODCIIndexStart() is invoked is to parse and execute SQL statements that query the tables storing the index data. It could also generate some set of result rows to be returned later when ODCIIndexFetch() is invoked (see below).

Since the index and operator related information are passed in as arguments to ODCIIndexStart() and not to the other index scan routines (ODCIIndexFetch() and ODCIIndexClose()), any information needed in the later routines must be saved. This is referred to as the state that has to be shared among the index scan routines. There are two ways of doing this:

Return State: If the state to be maintained is small, it can be returned back to Oracle RDBMS through an output SELF argument.
Return Handle: If the state to be maintained is large (for example, a subset of the results), cursor-duration memory can be allocated to save the state. In this case, a handle to the memory can be returned to Oracle RDBMS through the output SELF parameter.

See Also:
The chapter on Cartridge Services in the Oracle Call Interface Programmer's Guide for information on memory services and maintaining context

In both cases, Oracle RDBMS will pass the SELF value to subsequent ODCIIndexFetch() and ODCIIndexClose() calls which can then use the value to access the relevant context information.

There are two modes of evaluating the operator predicate to return the result set of rows.

Precompute All: Compute the entire result set in ODCIIndexStart(). Iterate over the results returning a row at a time in ODCIIndexFetch(). This mode is required for operators involving some sort of ranking over the entire collection, etc. Evaluating such operators would require looking at the entire result set to compute the ranking, relevance, etc. for each candidate row.
Incremental Computation: Compute one result row at a time as part of ODCIIndexFetch(). This mode is applicable for operators which can determine the candidate rows one at a time without having to look at the entire result set.

The choice of evaluating modes as well as what gets saved is left to the index designer. In either case, the Oracle RDBMS simply executes the ODCIIndexStart() routine as part of processing query containing operators which returns the context as an output SELF value.The returned value is passed back to subsequent ODCIIndexFetch() and ODCIIndexClose() calls.

ODCIIndexFetch

ODCIIndexFetch() returns the "next" row identifier of the row that satisfies the operator predicate.The operator predicate is specified in terms of the operator expression (name and arguments) and a lower and upper bound on the operator return values. Thus, a ODCIIndexFetch() call returns the row identifier of the rows for which the operator return value falls within the specified bounds. A NULL is returned to indicate end of index scan. The fetch method supports returning a batch of rows in each call. The state returned by ODCIIndexStart() or a previous call to ODCIIndexFetch() is passed in as an argument.

ODCIIndexClose

ODCIIndexClose() is invoked when the cursor is closed or reused. In this call the Indextype can perform any clean-ups, etc. The current state is passed in as an argument.

Index Metadata Method

The optional ODCIIndexGetMetadata routine, if it is implemented, is called by the export utility to write implementation-specific metadata into the export dump file. This metadata might be policy information, version information, individual user settings, and so on, which are not stored in the system catalogs. The metadata is written to the dump files as anonymous PL/SQL blocks that get executed at import time immediately prior to the creation of the associated index.

Transaction Semantics during Index Method Execution

The index interface routines (with the exception of index definition methods, namely, ODCIIndexCreate(), ODCIIndexAter(), ODCIIndexTruncate(), ODCIIndexDrop()) are invoked under the same transaction that triggered these actions. Thus, the changes made by these routines are atomic and are committed or aborted based on the parent transaction. To achieve this, there are certain restrictions on the nature of the actions that can be performed in the different indextype routines.

Index definition routines have no restrictions.
Index maintenance routines can only execute DML statements. However, the DML statements cannot update the base table on which the domain index is created.
Index scan routines can only execute SQL query statements.

For example, if an INSERT statement caused the ODCIIndexInsert() routine to be invoked, ODCIIndexInsert() runs under the same transaction as INSERT. The ODCIIndexInsert() routine can execute any number of DML statements (for example, insert into index-organized tables). If the original transaction aborts, all the changes made by the indextype routines are rolled back.

However, if the indextype routines cause changes external to the database (like writing to external files), transaction semantics are not assured.

Transaction Semantics for Index Definition Routines

The index definition routines do not have any restrictions on the nature of actions within them. Consider ODCIIndexCreate() to understand this difference. A typical set of actions to be performed in ODCIIndexCreate() could be:

Create an index-organized table
Insert data into the index-organized table
Create a secondary index on a column of the index-organized table

To allow ODCIIndexCreate() to execute an arbitrary sequence of DDL and DML statements, we consider each statement to be an independent operation. Consequently, the changes made by ODCIIndexCreate() are not guaranteed to be atomic. The same is true for other index-definition routines.

Consistency Semantics during Index Method Execution

The index maintenance (and scan routines) execute with the same snapshot as the top level SQL statement performing the DML (or query) operation. This enables the index data processed by the index method to be consistent with the data in the base tables.

Privileges During Index Method Execution

Indextype routines always execute as the owner of the index. To support this, the index access driver will dynamically change user mode to index owner before invoking the indextype routines.

For certain operations, indextype routines may require to store information in tables owned by indextype designer. Indextype implementation must code those actions in a separate routine which will be executed using definer's privileges. For more information on syntax, see CREATE TYPE in the Oracle9i SQL Reference.

Domain Indexes

This section describes the domain index operations and how metadata associated with the domain index can be obtained.

Domain Index Operations

Creating a Domain Index

A domain index can be created on a column of a table just like a B-tree index. However, an indextype must be explicitly specified. For example:

CREATE INDEX ResumeTextIndex ON Employees(resume)
INDEXTYPE IS TextIndexType
PARAMETERS (':Language English :Ignore the a an');

The INDEXTYPE clause specifies the indextype to be used. The PARAMETERS clause identifies any parameters for the domain index, specified as a string. This string is passed uninterpreted to the ODCIIndexCreate routine for creating the domain index. In the example above, the parameters string identifies the language of the text document (thus identifying the lexical analyzer to use) and the list of stop words which are to be ignored while creating the text index.

Altering a Domain Index

A domain index can be altered using ALTER INDEX statement. For example:

ALTER INDEX ResumeTextIndex PARAMETERS (':Ignore on');

The parameter string is passed uninterpreted to ODCIIndexAlter() routine, which takes appropriate actions to alter the domain index. In the above example, additional stop words to ignore in the text index are specified.

The ALTER statement can be used to rename a domain index.

ALTER INDEX ResumeTextIndex RENAME TO ResumeTIdx;

The ODCIIndexAlter() routine is invoked, which takes appropriate actions to rename the domain index.

In addition, the ALTER statement can be used to rebuild a domain index.

ALTER INDEX ResumeTextIndex REBUILD PARAMETERS (':Ignore of');

The same ODCIIndexAlter() routine is called but with additional information about the ALTER option.

When the end user executes an ALTER INDEX <domain_index> UPDATE BLOCK REFERENCES for a domain index on an index-organized table (IOT), ODCIIndexAlter() is called with the AlterIndexUpdBlockRefs bit set. This gives the cartridge developer the opportunity to update guesses (as to the block locations of rows) stored in the domain index in logical rowids.

Truncating a Domain Index

There is no explicit statement for truncating a domain index. However, when the corresponding table is truncated the truncate procedure specified as part of the indextype is invoked. For example:

TRUNCATE TABLE Employees;

will result in truncating ResumeTextIndex by calling ODCIIndexTruncate() routine.

Dropping a Domain Index

To drop an instance of a domain index, the DROP INDEX statement is used. For our example, this statement would be of the form:

DROP INDEX ResumeTextIndex;

This results in calling the ODCIIndexDrop() routine and passing information about the index.

Domain Indexes on Index-Organized Tables

Storing rowids in a UROWID column

When the base table of a domain index is an index-organized table, and you want to store rowids for the base table in a table of your own, you should store the rowids in a UROWID column if you will be testing rowids for equality.

If the rowids are stored in a VARCHAR column instead, comparisons for equality of the text of the rowids from the base table and your own table will fail even when the rowids pick out the same row. This is because a domain index on an index-organized table uses logical instead of physical rowids, and, unlike physical rowids, logical rowids for the same row can have different textual representations. (Two logical rowids are equivalent when they have the same primary key, regardless of the guesses stored with them.)

A UROWID (univeral rowid) column can contain both physical and logical rowids. Storing rowids for an IOT in a UROWID column ensures that the equality operator will succeed on two logical rowids that have the same primary key information but different primary keys.

If you create an index storage table with a rowid column by performing a CREATE TABLE AS SELECT from the IOT base table, then a UROWID column of the correct size is created for you in your index table. If you create a table with a rowid column, then you need to explicitly declare your rowid column to be of type UROWID(x), where x is the size of the UROWID column. The size chosen should be large enough to hold any rowid from the base table and so should be a function of the primary key from the base table. The following query can be used to determine a suitable size for the UROWID column:

  SELECT (SUM(column_length + 3) + 7) 
  FROM user_ind_columns ic, user_indexes i 
  WHERE ic.index_name = i.index_name 
  AND i.index_type = IOT - TOP 
  AND ic.table_
  name = <base_table>;

You can use the IndexOnIOT bit of IndexInfoFlags in the ODCIIndexInfo structure to determine if the base table is an IOT.

Doing an ALTER INDEX REBUILD on index storage tables raises the same issues as doing a CREATE TABLE if you drop your storage tables and recreate them. If, on the other hand, you reuse your storage tables, no additional work should be necessary if your base table is an IOT.

DML on Index Storage Tables

If a UROWID column is maintained in the index storage table, then you may need to change the type of the rowid bind variable in DML INSERT, UPDATE, DELETE statements so that it will work for all kinds of rowids. Converting the rowid argument passed in to a character array and then binding it as a SQLT_STR works well for both physical and universal rowids. This strategy may enable you to more easily code your indextype to work with both regular tables and IOTs.

Start, Fetch, and Close Operations on Index Storage Tables

If you use an index scan-context structure to pass context between Start, Fetch, and Close, you will need to alter this structure. In particular, if you store the rowid define variable for the query in a buffer in this structure, then you will need to allocate the maximum size for a UROWID in this buffer (3800 bytes for universal rowids in byte format, 5072 for universal rowids in character format) unless you know the size of the primary key of the base table in advance or wish to determine it at run-time. You will also need to store a bit in the context to indicate if the base table is an IOT, since ODCIIndexInfo is not available in Fetch.

As with DML operations, setting up the define variable as a SQLT_STR works well for both physical and universal rowids. When physical rowids are fetched from the index table, you can be sure that their length is 18 characters. Universal rowids, however, may be up to 5072 characters long, and so a string length function must be used to correctly determine the actual length of a fetched universal rowid.

Indexes on Non-Unique Columns

All values of a primary key column must be unique, so a domain index defined upon a non-unique column of a table cannot use this column as the primary key of an underlying IOT used to store the index. To work around this, you can add a column in the IOT, holding the index data, to hold a unique sequence number. Whenever a column value is inserted in the table, you should generate a unique sequence number to go with it. The indexed column together with the sequence number can be used as the primary key of the IOT. (Note that the sequence-number column cannot be a UROWID because UROWID columns cannot be part of a primary key for an IOT.) This approach also preserves the fast access to primary key column values that is a major benefit of IOTs.

Domain Index Metadata

For B-tree indexes, users can query the USER_INDEXES view to get index information. To provide similar support for domain indexes, indextype designers can add any domain-specific metadata in the following manner:

The indextype designer can define one or more tables that will contain this meta information. The key column of this table must be a unique identifier for the index. This unique key could be the index name (schema.index). The remainder of the column definitions are at the discretion of the index designer.
Views can be created that join the system defined metadata tables with the index meta tables to provide a comprehensive set of information for each instance of a domain index. It is the responsibility of the indextype designer to provide the view definition.

Export/Import of Domain Indexes

Like B-tree and bitmap indexes, domain indexes are exported and subsequently imported when their base tables are exported. However, domain indexes can have implementation-specific metadata associated with them that are not stored in the system catalogs. For example, a text domain index can have associated policy information, a list of irrelevant words, and so on. Export/Import provides a mechanism to opaquely move this metadata from the source platform to target platform.

To move the domain index metadata, the indextype needs to implement the ODCIIndexGetMetadata interface routine (see the reference chapters for details). This interface routine gets invoked when a domain index is being exported. The domain index information is passed in as a parameter. It can return any number of anonymous PL/SQL blocks that are written into the dump file and executed on import. If present, these anonymous PL/SQL blocks are executed immediately before the creation of the associated domain index.

Note that the ODCIIndexGetMetadata is an optional interface routine. It is needed only if the domain index has extra metadata to be moved.

Moving Domain Indexes Using Transportable Tablespaces

The transportable tablespaces feature enables you to move tablespaces from one Oracle database into another. You can use transportable tablespaces to move domain index data as an alternative to exporting and importing it.

Moving data using transportable tablespaces can be much faster than performing either an export/import or unload/load of the data because transporting a tablespace only requires copying datafiles and integrating tablespace structural information. Also, you do not need to rebuild the index afterwards as you do when loading or importing.

See Also:

Oracle9i Database Administrator's Guide for information about using transportable tablespaces

Operators

A user-defined operator is a top-level schema object. It is identified by a name which is in the same namespace as tables, views, types and stand-alone functions.

Operator Bindings

An operator binding associates an operator with the signature of a function that implements the operator. A signature consists of a list of the datatypes of the arguments of the function, in order of occurrence, and the function's return type. Binding an operator to a certain signature enables Oracle to pick out the function to execute when the operator is invoked. An operator can be implemented by more than one function as long as each function has a different signature. For each such function, you must define a corresponding binding.

Thus, any operator has an associated set of one or more bindings. Each binding can be evaluated using a user-defined function of any of these kinds:

Stand-alone function
Package function
OBJECT member method

An operator created in a schema can be evaluated using functions defined in the same or different schemas. The operator bindings can be specified at the time of creating the operator. It is ensured that the signatures of the bindings are unique.

Creating operators

An operator can be created by specifying the operator name and its bindings.

For example, an operator Contains can be created in the Ordsys schema with two bindings and the corresponding functions that provide the implementation in the Text and Spatial domains.

CREATE OPERATOR Ordsys.Contains
BINDING
(VARCHAR2, VARCHAR2) RETURN NUMBER USING text.contains, 
(Spatial.Geo, Spatial.Geo) RETURN NUMBER USING Spatial.contains;

Note:

Although the return data type is specified as part of operator binding declaration, it is not considered to determine the uniqueness of the binding, However, the specified function must have the same argument and return datatypes as the operator binding.

Dropping Operators

An existing operator and all its bindings can be dropped using the DROP OPERATOR statement. For example:

DROP OPERATOR Contains;

The default DROP behavior is DROP RESTRICT semantics. Namely, if there are any dependent indextypes for any of the operator bindings, then the DROP operation is disallowed.

However, users can override the default behavior by using the FORCE option. For example,

DROP OPERATOR Contains FORCE;

drops operator Contains and all its bindings and marks any dependent indextype objects invalid

Commenting on Operators

The COMMENT statement can be used to supply information about an indextype or operator. For example:

COMMENT ON OPERATOR
Ordsys.TextIndexType IS 'a number indicating whether the text contains the key';

Comments on operators can be viewed in these views in the data dictionary:

USER_OPERATOR_COMMENTS
ALL_OPERATOR_COMMENTS
DBA_OPERATOR_COMMENTS

To place a comment on an operator, the operator must be in your own schema or you must have the COMMENT ANY OPERATOR privilege.

Invoking Operators

Operator Usage

User-defined operators can be invoked anywhere built-in operators can be used. i.e. wherever expressions can occur. For example, user-defined operators can be used in the following:

the select list of a SELECT command
the condition of a WHERE clause
the ORDER BY and GROUP BY clauses

Operator Execution

When an operator is invoked, the evaluation of the operator is accomplished by executing a function bound to it. The function is selected based on the datatypes of the arguments to the operator. If no function bound to the operator matches the signature with which the operator is invoked (perhaps after some implicit type conversions), an error is raised.

Examples

Consider the operator created with the following statement:

CREATE OPERATOR Ordsys.Contains
BINDING 
(VARCHAR2, VARCHAR2) RETURN NUMBER 
USING text.contains, 
(spatial.geo, spatial.geo) RETURN NUMBER 
USING spatial.contains;

Consider the operator Contains being used in the following SQL statements:

SELECT * FROM Employee
WHERE Contains(resume, 'Oracle')=1 AND Contains(location, :bay_area)=1;

The invocation of the operator Contains(resume, 'Oracle') is transformed into the execution of the function text.contains(resume, 'Oracle') since the signature of the function matches the datatypes of the operator arguments. Similarly, the invocation of the operator Contains(location, :bay_area) is transformed into the execution of the function spatial.contains(location, :bay_area).

The following statement would raise an error since none of the operator bindings satisfy the argument datatypes:

SELECT * FROM Employee
WHERE Contains(salary, 10000)=1;

Operator Privileges

System privileges for operator schema objects are:

CREATE OPERATOR
CREATE ANY OPERATOR
DROP ANY OPERATOR

See the Oracle9i SQL Reference for details.

To use a user-defined operator in an expression, you must own the operator or have EXECUTE privilege on it.

Operators and Indextypes

An operator can be optionally supported by one or more user-defined indextypes. An indextype can support one or more operators. This means that a domain index of this indextype can be used in efficiently evaluating these operators. For example, B-tree indexes can be used to evaluate the relational operators like =, < and >. Operators can also be bound to regular functions. For example, an operator Equal can be bound to a function eq(number, number) that compares two numbers. The DDL for this would be:

CREATE OPERATOR Equal 
BINDING(NUMBER, NUMBER) RETURN NUMBER USING eq;

Thus, an indextype designer should first design the set of operators to be supported by the indextype. For each of these operators, a functional implementation should be provided.

The list of operators supported by an indextype are specified when the indextype schema object is created (as described above).

Operators that occur in a WHERE clause are evaluated differently than operators occurring elsewhere in a SQL statement. Both kinds of cases are considered below.

Operators in the WHERE Clause

Operators appearing in the WHERE clause can be evaluated efficiently by performing an index scan using the scan methods provided as part of the implementation of an indextype. This involves recognizing operator predicates of a certain form, selection of a domain index, setting up an appropriate index scan, and finally, executing index scan methods. Let's consider each one of these steps in detail.

Operator Predicates

An indextype supports efficient evaluation of operator predicates that can be represented by a range of lower and upper bounds on the operator return values. Specifically, predicates of the form:

op(...) relop <value expression>, where relop in {<, <=, =, >=,>}

op(...) LIKE <value_expression>

are possible candidates for index scan-based evaluation.

Use of the operators in any expression, for example

op(...) + 2 = 3

precludes index scan-based evaluation.

Predicates of the form,

op() is NULL

are not evaluated using an index scan. They are evaluated using the functional implementation.

Finally, any other operator predicates which can internally be converted into one of the above forms by Oracle can also make use of the index scan based evaluation.

Operator Resolution

An index scan-based evaluation of an operator is a possible candidate for predicate evaluation only if the operator occurring in the predicate (as described above) operates on a column or object attribute indexed using an indextype. The final decision to choose between the indexed implementation and the functional implementation is made by the optimizer. The optimizer takes into account the selectivity and cost while generating the query execution plan.

For example, consider the query

SELECT * FROM Employees WHERE Contains(resume, 'Oracle') = 1;

The optimizer can choose to use a domain index in evaluating the Contains operator if

The resume column has an index defined on it
The index is of type TextIndexType
TextIndexType supports the appropriate Contains() operator

If any of the above conditions do not hold, a complete scan of the Employees table is performed and the functional implementation of Contains is applied as a post-filter. If the above conditions are met, the optimizer uses selectivity and cost functions to compare the cost of index-based evaluation with the full table scan and appropriately generates the execution plan.

Consider a slightly different query,

SELECT * FROM Employees WHERE Contains(resume, 'Oracle') =1 AND id =100;

In this query, the Employees table could be accessed through an index on the id column or one on the resume column. The optimizer estimates the costs of the two plans and picks the cheaper one, which could be to use the index on id and apply the Contains operator on the resulting rows. In this case, the functional implementation of Contains() is used and the domain index is not used.

Index Scan Setup

If a domain index is selected for the evaluation of an operator predicate, an index scan is set-up. The index scan is performed by the scan methods (ODCIIndexStart(), ODCIIndexFetch(), ODCIIndexClose()) specified as part of the corresponding indextype implementation. The ODCIIndexStart() method is invoked with the operator related information including name and arguments and the lower and upper bounds describing the predicate. After the ODCIIndexStart() call, a series of fetches are performed to obtain row identifiers of rows satisfying the predicate, and finally the ODCIIndexClose() is called when the SQL cursor is destroyed.

Execution Model for Index Scan Methods

The index scan routines must be implemented with an understanding of how the routines' invocations are ordered and how multiple sets of invocations can be interleaved.

As an example, consider the following query:

SELECT * FROM Emp1, Emp2 WHERE 
Contains(Emp1.resume, 'Oracle') =1 AND Contains(Emp2.resume, 'Unix') =1
AND Emp1.id = Emp2.id;

If the optimizer decides to use the domain indexes on the resume columns of both tables, the indextype routines may be invoked in the following sequence:

start(ctx1, ...); /* corr. to Contains(Emp1.resume, 'Oracle') */
start(ctx2, ...); /* corr. to Contains(Emp2.resume, 'Unix');
fetch(ctx1, ...);
fetch(ctx2, ...);
fetch(ctx1, ...);
...
close(ctx1);
close(ctx2);

Thus, the same indextype routine may be invoked but for different instances of operators. At any time, many operators are being evaluated through the same indextype routines. In case of a routine that does not need to maintain any state across calls because all the information is obtained through its parameters (as with the create routine), this is not a problem. However, in case of routines needing to maintain state across calls (like the fetch routine, which needs to know which row to return next), the state should be maintained in the SELF parameter that is passed in to each call. The SELF parameter (which is an instance of the implementation type) can be used to store either the entire state (if it is not too big) or a handle to the cursor-duration memory that stores the state.

Operators Outside the WHERE Clause

Using a Functional Implementation

Operators occurring in expressions other than in the WHERE clause are evaluated using the functional implementation. For example,

SELECT Contains(resume, 'Oracle') FROM Employee;

would be executed by scanning the Employee table and invoking the functional implementation for Contains on each instance of resume. The function is invoked by passing it the actual value of the resume (text data) in the current row. Note that this function would not make use of any domain indexes that may have been built on the resume column.

However, it is possible to have a functional implementation for an operator that makes use of a domain index. The following sections discuss how functions that use domain indexes can be written and how they are invoked by the system.

Creating Index-based Functional Implementation

For many domain-specific operators, such as Contains, the functional implementation can work in two ways:

If the operator is operating on a column (or OBJECT attribute) that has a domain index of a particular indextype, the function can evaluate the operator by looking at the index data rather than the actual argument value.

For example, when Contains(resume, 'Oracle') is invoked on a particular row of the Employee table, it is easier for the function to look up the text domain index defined on the resume column and evaluate the operator based on the row identifier for the row containing the resume - rather than work on the resume text data argument.
If the operator is operating on a column that does not have an appropriate domain index defined on it or if the operator is invoked with literal values (non-columns), the functional implementation evaluates the operator based on only the argument values. This is the default behavior for all operator bindings.

To achieve both the behaviors of (1) and (2) above, the functional implementation is provided using a regular function which has three additional arguments--that is, additional to the original arguments to the operator. The additional arguments are:

Index context--containing domain index information and the row identifier of the row on which the operator is being evaluated
Scan context--a context value to share state with subsequent invocations of the same operator (operating on other rows of the table)
Scan flag--indicates whether the current call is the last invocation during which all clean up operations should be done

For example, the index-based functional implementation for the Contains operator is provided by the following function.

CREATE FUNCTION TextContains (Text IN VARCHAR2, Key IN VARCHAR2,
indexctx IN ODCIIndexCtx, scanctx IN OUT TextIndexMethods, scanflg IN NUMBER)
RETURN NUMBER AS
BEGIN
.......
END TextContains;

The Contains operator is bound to the above functional implementation as follows:

CREATE OPERATOR Contains
BINDING (VARCHAR2, VARCHAR2) RETURN NUMBER 
WITH INDEX CONTEXT, SCAN CONTEXT TextIndexMethods
USING TextContains;

The WITH INDEX CONTEXT clause specifies that the functional implementation can make use of any applicable domain indexes. The SCAN CONTEXT specifies the datatype of the scan context argument. It must be the same as the implementation type of the relevant indextype that supports this operator.

Operator Resolution

Oracle invokes the functional implementation for the operator if the operator appears outside the WHERE clause. If the functional implementation is index-based (that is, defined to use an indextype), the additional index information is passed in as arguments only if the operator's first argument is a column (or object attribute) with a domain index of the appropriate indextype defined on it.

For example, in the query

SELECT Contains(resume, 'Oracle & Unix') FROM Employees;

the Operator Contains is evaluated using the index-based functional implementation by passing the index information about the domain index on the resume column instead of the resume data.

Operator Execution

To execute the index-based functional implementation, Oracle RDBMS sets up the arguments in the following manner:

The initial set of arguments are the same as those specified by the user for the operator.
If the first argument is not a column, the ODCIIndexCtx attributes are set to NULL.
If the first argument is a column, the ODCIIndexCtx attributes are set up as follows.
- If there are no applicable domain indexes, the ODCIIndexInfo attribute is set to NULL, else it is set up with the information about the domain index.
- The rowid attribute holds the row identifier of the row being operated on.
The scan context is passed as NULL to the first invocation of the operator. Since it is an IN/OUT parameter, the return value from the first invocation is passed in to the second invocation and so on.
The scan flag is set to RegularCall for all normal invocations of the operator. After the last invocation, the functional implementation is invoked once more, at which time any cleanup actions can be performed. During this call, the scan flag is set to CleanupCall and all other arguments except the scan context are set to NULL.

When index information is passed in, the implementation can compute the operator value by doing a domain index lookup using the row identifier as key. The index metadata is used to identify the index structures associated with the domain index. The scan context is typically used to share state with the subsequent invocations of the same operator.

Ancillary Data

Apart from filtering rows, an operator occurring in the WHERE clause might need to support returning ancillary data. Ancillary data is modeled as an operator (or multiple operators) with a single literal number argument. It has a functional implementation that has access to state generated by the index scan-based implementation of the primary operator occurring in the WHERE clause.

For example, in the following query,

SELECT Score(1) FROM Employees 
WHERE Contains(resume, 'OCI & UNIX', 1) =1;

Contains is the primary operator and can be evaluated using an index scan which, in addition to determining the rows that satisfy the predicate, also computes a score value for each row. The functional implementation for the Score operator simply accesses the state generated by the index scan to obtain the score for a given row identified by its row identifier. The literal argument 1 associates the ancillary operator Score to the corresponding primary operator Contains which generates the ancillary data.

In summary, ancillary data is modeled as independent operator(s) and is invoked by the user with a single number argument that ties it with the corresponding primary operator. Its functional implementation makes use of either the domain index or the state generated by the primary operator occurring in the WHERE clause. The functional implementation is invoked with extra arguments: the index context containing the domain index information, and the scan context which provides access to the state generated by the primary operator. The following sections discuss how operators modeling ancillary data are defined and invoked.

Creating Operator Binding that Computes Ancillary Data

An indextype designer needs to specify that an operator binding computes ancillary data. Such a binding is referred to as a primary binding. For example, a primary binding for Contains can be defined as follows:

CREATE OPERATOR Contains
BINDING (VARCHAR2, VARCHAR2) RETURN NUMBER
WITH INDEX CONTEXT, SCAN TextIndexMethods COMPUTE ANCILLARY DATA
USING TextContains;

The above definition registers two bindings for Contains, namely:

CONTAINS(VARCHAR2, VARCHAR2) -- This can be used as before.
CONTAINS(VARCHAR2, VARCHAR2, NUMBER) -- When ancillary data is required elsewhere in SQL query, the operator can be invoked with the above signature. The NUMBER argument is used to associate the corresponding ancillary operator binding.

However, the indextype designer needs to define a single functional implementation:

TextContains(VARCHAR2, VARCHAR2, ODCIIndexCtx, TextIndexMethods, NUMBER).

Creating Operator Binding that Models Ancillary Data

An indextype designer has to implement the functional implementation for ancillary data operators in a manner similar to the index-based functional implementation. As discussed earlier, the function takes extra arguments. After the function is defined, the indextype designer can bind it to the operator with an additional ANCILLARY TO attribute, which indicates that the functional implementation needs to share state with the primary operator binding. The binding that is used for modeling ancillary data is referred to as the ancillary operator binding.

For example, let the TextScore() function contain code to evaluate the Score ancillary operator.

CREATE FUNCTION TextScore (Text IN VARCHAR2, Key IN VARCHAR2,
indexctx IN ODCIIndexCtx, scanctx IN OUT TextIndexMethods, scanflg IN NUMBER)
RETURN NUMBER AS
BEGIN
.......
END TextScore;

An ancillary operator binding can be created as follows:

CREATE OPERATOR Score
BINDING (NUMBER) RETURN NUMBER
ANCILLARY TO Contains(VARCHAR2, VARCHAR2) 
USING TextScore;

The ANCILLARY TO clause specifies that it shares state with the implementation of corresponding primary operator binding CONTAINS(VARCHAR2, VARCHAR2).
Note that the functional implementation for the ancillary operator binding must have the same signature as the functional implementation for the primary operator binding.
The ancillary operator binding is invoked with a single literal number argument, e.g Score(1), Score(2), etc.

Operator Resolution

The operators corresponding to ancillary data are invoked by the user with a single number argument.

Note:

The number argument must be a literal in both the ancillary operation and the primary operator invocation. This is required so that the operator association can be done at the query compilation time.

The corresponding primary operator invocation in the query is determined by matching it with the number passed in as the last argument to the primary operator. After the matching primary operator invocation is found (it is an error to find zero or more than one matching primary operator invocation):

The arguments to the primary operator are also made operands to the ancillary operator.
The ancillary and primary operator executions are passed in the same scan context.

For example, consider the query

SELECT Score(1) FROM Employees
WHERE Contains(resume, ' Oracle & Unix', 1) =1;

The invocation of Score is determined to be ancillary to Contains based on the number argument 1, and the functional implementation for Score gets the following operands: (resume, 'Oracle&Unix', indexctx, scanctx, scanflg), where scanctx is shared with the invocation of Contains.

Operator Execution

The execution involves using an index scan to process the Contains operator. For each of the rows returned by the fetch() call of the index scan, the functional implementation of Score is invoked by passing it the ODCIIndexCtx argument, which contains the index information, row identifier, and a handle to the index scan state. The functional implementation can use the handle to the index scan state to compute the score.

Object Dependencies, Drop Semantics, and Validation

Dependencies

The dependencies among various objects are as follows:

Functions, Packages, and Object Types: Referenced by Operators and Indextypes.
Operators: Referenced by Indextypes, DML and Query SQL Statements.
Indextypes: Referenced by Domain Indexes.
Domain Indexes: Referenced (used implicitly) by DML and Query SQL Statements

Thus, the order in which these objects must be created, or their definitions exported for future Import are:

Functions, Packages, and Object Types, followed by Operators, followed by Indextypes.

Drop Semantics

The drop behavior for an object is as follows:

RESTRICT semantics: If there are any dependent objects the drop operation is disallowed.
FORCE semantics: The object is dropped even in the presence of dependent objects and the dependent objects if any are recursively marked invalid.

The table below shows the default and explicit drop options supported for operators and indextypes. The other schema objects are included for completeness and the corresponding drop behavior already available in Oracle.

Schema Object	Default Drop Behavior	Explicit Options Supported
Function	`FORCE`	None
Package	`FORCE`	None
Object Types	`RESTRICT`	`FORCE`
Operator	`RESTRICT`	`FORCE`
Indextype	`RESTRICT`	`FORCE`

Object Validation

Invalid objects are automatically revalidated, if possible, the next time they are referenced.

Privileges

To create an operator and its bindings, you must have EXECUTE privilege on the function, operator, package, or the type referenced in addition to CREATE OPERATOR or CREATE ANY OPERATOR privilege.
To create an indextype, you must have EXECUTE privilege on the type that implements the indextype in addition to CREATE INDEXTYPE or CREATE ANY INDEXTYPE privilege. Also, you must have EXECUTE privileges on the operators that the indextype supports.
To alter an indextype in your own schema, you must have CREATE INDEXTYPE system privilege.
To alter an indextype or operator in another user's schema, you must have the ALTER ANY INDEXTYPE or ALTER ANY OPERATOR system privilege.
To create a domain index, you must have EXECUTE privilege on the indextype in addition to CREATE INDEX or CREATE ANY INDEX privileges.
To alter a domain index, you must have EXECUTE privilege on the indextype.
To use the operators in queries and/or DML statements, you must have EXECUTE privilege on the operator and the associated function/package/type.
To change the implementation type, you must have EXECUTE privilege on the new implementation type.

Partitioned Domain Indexes

Discrete domain indexes can be built on the partitions of a range-partitioned table. Such indexes, called local domain indexes, are equipartitioned with the underlying table: all keys in a local domain index refer only to rows stored in its corresponding table partition; none refer to rows in other partitions.

Currently, local domain indexes can be created only for range-partitioned tables. Local domain indexes cannot be built for hash-partitioned tables or IOTs.

A local (as opposed to a global) domain index can index only a single column; it cannot index an expression.

You provide for using local domain indexes in the indextype, with the CREATE INDEXTYPE statement. For example:

CREATE INDEXTYPE TextIndexType
  FOR Contains (VARCHAR2, VARCHAR2)
  USING TextIndexMethods
  WITH LOCAL RANGE PARTITION;

The statement above specifies that the implementation type TextIndexType is capable of creating/maintaining local domain indexes. The clause WITH LOCAL RANGE PARTITION specifies the partitioning method for the base table.

The CREATE INDEX statement creates and partitions the index. Here is the syntax:

CREATE INDEX [schema.]index 
  ON [schema.]table [t.alias] (indexed_column)
  INDEXTYPE IS indextype
  [LOCAL [PARTITION [partition [PARAMETERS ('string')]]] [...] ]
  [PARAMETERS ('string')];

Note:

The syntax for CREATE INDEX given above differs from the syntax shown in Oracle9i SQL Reference, which omits the LOCAL [PARTITION] clause. Use the syntax given above to create a local domain index.

The LOCAL [PARTITION] clause indicates that the index is a local index on a partitioned table. You can specify partition names or allow Oracle to generate them.

In the PARAMETERS clause, specify the parameter string that is passed uninterpreted to the appropriate ODCI indextype routine. The maximum length of the parameter string is 1000 characters.

When you specify this clause at the top level of the syntax, the parameters become the default parameters for the index partitions. If you specify this clause as part of the LOCAL [PARTITION] clause, you override any default parameters with parameters for the individual partition. The LOCAL [PARTITION] clause can specify multiple partitions.

Once the domain index is created, Oracle invokes the appropriate ODCI routine. If the routine does not return successfully, the domain index is marked FAILED. The only operations supported on an failed domain index are DROP INDEX and (for non-local indexes) REBUILD INDEX.

The following example creates local domain index ResumeIndex:

CREATE INDEX ResumeIndex ON Employees(Resume)
  INDEXTYPE IS TextIndexType LOCAL;

There are these restrictions on creating a local domain index:

The index_expr list can specify only a single column.
You cannot specify a bitmap or unique domain index.

Dropping a Local Domain Index

A specified index partition cannot be dropped explicitly. To drop a local index partition (a local domain index), the entire domain index must be dropped:

DROP INDEX ResumeIndex;

Altering a Local Domain Index

The ALTER INDEX statement can be used to perform the following operations on a local domain index:

Rename the top level index
Modify the default parameter string for all the index objects
Modify the parameter string associated with a specific partition
Rename an index partition
Rebuild an index partition

The ALTER INDEXTYPE statement enables you to change properties and the implementation type of an indextype without having to drop and recreate the indextype and then rebuild all dependent indexes.

See Also:

Oracle9i SQL Reference for complete syntax of the SQL statements mentioned in this section

Summary of Index States

Like a domain index, a local domain index partition can be in one or more of the following states:

`IN_PROGRESS`	The index or the index partition is in this state before and during the execution of the `ODCIndex` DDL interface routines. The state is generally transitional and temporary. However, if the routine ends prematurely, the index could remain marked `IN_PROGRESS`.
`FAILED`	If the `ODCIIndex` interface routine doing DDL operations on the index returns an error, the index or index partition is marked `FAILED`.
`UNUSABLE`	Same as for regular indexes: an index on a partitioned table is marked `UNUSABLE` as a result of certain partition maintenance operations. Note that, for partitioned indexes, `UNUSABLE` is associated only with an index partition, not with the index as a whole.
`INVALID` or `VALID`	An index gets marked `INVALID` if an object that the index directly or indirectly depends upon is dropped or invalidated. This property is associated only with an index, never with an index partition.

Table Operations That Affect Indexes

The following tables list operations that can be performed on the underlying table of an index and describe the effect, if any, on the index.

Table Operation	Description
DROP table	Drops the table. Drops all the indexes and their corresponding partitions
TRUNCATE table	Truncates the table. Truncates all the indexes and the index partitions

ALTER TABLE Operation	Description
Base table operations that do not involve partition maintenance
Modify Partition Unusable local indexes	Marks the local index partition associated with the table partition as `UNUSABLE`
Modify Partition Rebuild Unusable local indexes	Rebuilds the local index partitions that are marked `UNUSABLE` and are associated with this table partition
Add Partition	Adds a new table partition. Also adds a new local index partition.
Drop Partition	Drops a range table partition. Also drops the associated local index partition
Truncate Partition	Truncate the table partition. Also truncates the associated local index partition
Base table operations that involve partition maintenance
Move Partition	Moves the base table partition to another tablespace. Corresponding local index partitions are marked `UNUSABLE`
Split Partition	Splits a table partition into two partitions. Corresponding local index partition is also split. If the resulting partitions are non-empty, the index partitions are marked `UNUSABLE`
Merge Partition	Merges two table partitions into one partition. Corresponding local index partitions should also merge. If the resulting partition contains data, the index partition is marked `UNUSABLE`
Exchange Partition Excluding Indexes	Exchanges a table partition with a non-partitioned table. Local index partitions and global indexes are marked `UNUSABLE`
Exchange Partition Including Indexes	Exchanges a table partition with a non-partitioned table. Local index partition is exchanged with global index on the non-partitioned table. Index partitions remain `USABLE`

ODCIIndex Interfaces for Partitioning Domain Indexes

The set of ODCIIndex interfaces that needs to be implemented for a domain index depends on whether the index is to be partitioned and, if so, in what way. There are two possibilities:

Non-partitioned domain index
Local range-partitioned indexes

The ODCIIndex interfaces that must be implemented for each option are listed below. Those in the first group must be implemented for any domain index, partitioned or not. Those in the other group need be implemented only to provide support for local range-partitioned indexes.

ODCIIndex Interfaces Required for any Domain Index

ODCIIndexGetInterface()

ODCIIndexAlter()

ODCIIndexCreate()

ODCIIndexDrop()

ODCIIndexTruncate()

ODCIIndexInsert()

ODCIIndexDelete()

ODCIIndexUpdate()

ODCIIndexStart()

ODCIIndexFetch()

ODCIIndexClose()

ODCIIndex Interfaces Required for Local Range-Partitioned Indexes

ODCIIndexExchangePartition()

ODCIIndexMergePartition()

ODCIIndexSplitPartition()

Domain Indexes and SQL*Loader

SQL*Loader conventional path loads are supported for tables on which domain indexes are defined, but direct path loads are not. To do a direct path load, first drop the domain index, do the direct path load in SQL*Loader, and then recreate the domain indexes.

7 Building Domain Indexes

Introduction to Extensible Indexing

What is Indexing?

Index Structures

The Relationship between Logical and Physical Structures

The Need for Index Structures that Encompass Unstructured Data

Kinds of Indexes

B-tree

Figure 7-1 B-tree Index Structure

Hash

Figure 7-2 Hash Index Structure

k-d tree

Figure 7-3 2-d Index Structure

Point Quadtree

Figure 7-4 Point Quadtree Index Structure

Why is Extensible Indexing Necessary?

Inability to Index Unstructured Data

Inability to Index Attributes of Column Objects

Inability to Index Values Derived from Domain-specific Operations

The Extensible Indexing API

Concepts: Extensible Indexing

Overview

Example: A Text Indextype

Indextypes

Interface

ODCIIndex Interface

Creating Indextypes

Dropping Indextypes

Commenting on Indextypes

ODCI Index Interface

Index Definition Methods

ODCIIndexCreate

ODCIIndexAlter

ODCIIndexTruncate

ODCIIndexDrop

Index Maintenance Methods

ODCIIndexInsert

ODCIIndexDelete

ODCIIndexUpdate

Index Scan Methods

ODCIIndexStart

ODCIIndexFetch

ODCIIndexClose

Index Metadata Method

Transaction Semantics during Index Method Execution

Transaction Semantics for Index Definition Routines

Consistency Semantics during Index Method Execution

Privileges During Index Method Execution

Domain Indexes

Domain Index Operations

Creating a Domain Index

Altering a Domain Index

Truncating a Domain Index

Dropping a Domain Index

Domain Indexes on Index-Organized Tables

Storing rowids in a UROWID column

DML on Index Storage Tables

Start, Fetch, and Close Operations on Index Storage Tables

Indexes on Non-Unique Columns

Domain Index Metadata

Export/Import of Domain Indexes

Moving Domain Indexes Using Transportable Tablespaces

Operators

Operator Bindings

Creating operators

Dropping Operators

Commenting on Operators

Invoking Operators

Operator Usage

Operator Execution

Examples

Operator Privileges

Operators and Indextypes

Operators in the WHERE Clause

Operator Predicates

Operator Resolution

Index Scan Setup

Operators Outside the WHERE Clause

Operator Resolution

Operator Execution

7
Building Domain Indexes