Skip Headers

Oracle® Data Cartridge Developer's Guide
10g Release 1 (10.1)

Part Number B10800-01
Go to Documentation Home
Go to Book List
Book List
Go to Table of Contents
Go to Index
Go to Master Index
Master Index
Go to Feedback page

Go to previous page
Go to next page
View PDF

1 What Is a Data Cartridge?

In addition to the efficient and secure management of data ordered under the relational model, Oracle provides support for data organized under the object model. Object types and other features such as large objects (LOBs), external procedures, extensible indexing, and query optimization can be used to build powerful, reusable server-based components called data cartridges.

This chapter contains these topics:

What Are Data Cartridges?

Data cartridges extend the capabilities of the Oracle server. To do this, they take advantage of the framework of the Oracle Extensibility Architecture.

The framework lets you capture business logic and processes associated with specialized, or domain-specific, data in user-defined datatypes. Data cartridges that provide new behavior without needing new attributes have the option of using packages rather than user-defined types. Either way, you determine how the server interprets, stores, retrieves, and indexes the data. Data cartridges package this functionality, resulting in software components that plug into a server, extending its capabilities into a new domain.

In other words, the database is extensible. You can customize the indexing and query optimization mechanisms of the database management system, providing specialized services or more efficient processing for user-defined business objects and rich types. You use the extensibility interfaces to register your implementations with the server. Registering them causes the server to use your implementations instead its own when doing your specialized processing.

The extensibility interfaces consist of functions that the server calls to execute the custom indexing or optimizing behavior implemented for a data cartridge. The interfaces are defined by Oracle; as the cartridge developer, you must implement the functions (also frequently called interfaces) to embody the specialized behavior you require. In general, you implement the functions as static methods of an object type. An object type that implements the extensible indexing interface is called an indextype; an object type that implements the extensible optimizing interface is called a statistics type.

The key characteristics of data cartridges are:

Why Build Data Cartridges?

The Need to Handle Complex Data Objects

Most industries have evolved sophisticated models to handle complex data objects that make up the essence of their business. By data objects, we mean both the structures that relate different units of information and the operations that are performed on them.

The simple names given to these data objects often conceal considerable complexity. For example, the banking industry has many different types of bank accounts. Each bank account has customer demographic information, balance information, transaction information, and rules that embody its behavior (deposit, withdrawal, interest accrual, and so forth).

Data cartridges let you encapsulate this business logic in software components that integrate with the Oracle server. For some time it has been possible to add logic to a database using stored procedures. With the addition of the object-relational extension, application programmers and independent software vendors can enhance the Oracle server to support data types, processes, and logic to model business objects.

The Need to Operate on Complex and Multimedia Datatypes

While business models have developed increasingly complex data objects, information technology has made it necessary to work with new kinds of data, such as satellite images, X-rays, animal sounds, seismic vibrations, chemical models. Complex and multimedia datatypes are now frequently stored and retrieved, queried and analyzed.

Web-based applications routinely include many different kinds of complex data. Including application-specific data types and the associated business logic requires a new class of networked, content-rich, multitiered, distributed applications. Data cartridges help you meet this need by combining scalar and unstructured datatypes in domain-specific components.

Data Cartridge Domains

Data cartridges are typically domain-specific. Domains are characterized by content and scope.

In terms of content, a data cartridge can accommodate scalar, complex, and multimedia data. Scalar data can be modeled using native SQL types such as INTEGER, NUMBER, or CHAR. Complex data includes matrices, temperature and magnetic grids, and compound documents. Unstructured multimedia data includes such information as video, voice, and image data.

In terms of scope, a data cartridge can have broad horizontal (cross-industry) coverage or it can be specialized for a specific type of business. For example, a data cartridge for general storage and retrieval of textual data is cross-industry in scope; a data cartridge for the storage and retrieval of legal documents for litigation support is industry-specific.

Table 1-1 shows a way of classifying data cartridge domains according to their content (type of data) and scope (cross-industry or industry-specific), with some examples.

Table 1-1 Data Cartridge Domains by Content and Scope

Content Scope: Cross-Industry Uses Scope: Industry-Specific Extensions
Scalar Data Statistical conversion Financial and Petroleum
Multimedia and Complex Unstructured Data Text Image
Audio/Video Spatial Legal
Medical Broadcasting Utilities

Oracle lets you use built-in scalar datatypes to construct more complex user-defined types. The Object-Relational Database Management System provides foundational cartridges that package multimedia and complex data, which can be used as bases for applications across many different industries:

Table 1-2 Oracle Cartridges as Bases for Development

Cartridge Database Model Behavior
Text Tokenized serial byte stream Display, Compress, Reformat, Index...
Image Structured large object Compress, Crop, Scale, Rotate, Reformat...
Spatial Geometric objects such as points, lines, polygons Project, Rotate, Transform, Map...
Video Structured large object of serial (dynamic) image data Compress, Play, Rewind, Pause...

Another way of viewing the relationship of cartridges to domains is to view basic multimedia datatypes as forming a foundation that can be extended in specific ways by specific industries. For example, Table 1-3 shows cartridges that could be built for medical applications:

Table 1-3 Medicine-Specific Extensions to Basic Cartridges

Text Image Audio Video Spatial
Records MRI Heartbeat Teaching Demographic Analysis

A cartridge providing basic services can be deployed across many industries; for example, a text cartridge can be utilized within both law and medicine. A cartridge can also leverage domain expertise across an industry; for example, an image cartridge can provide basic functionality for both X-rays and Sonar within medicine. These cartridges can be further extended for more specialized vertical applications. For instance, medical cartridges could be extended by other cartridges:

Table 1-4 Examples of Extensions to a Basic Cartridge

Image -> MRI -> Brain MRI -> Neonatal Brain MRI

Extending the Server: Services and Interfaces

The Oracle server provides services for basic data storage, query processing, optimization, and indexing. Various applications use these services to access database capabilities. However, data cartridges have specialized needs because they incorporate domain-specific data. To accommodate these specialized applications, the basic services have been made extensible.

That is, where standard Oracle services are not adequate for the processing a data cartridge requires, you as the data cartridge developer can provide services that are specially tuned to your cartridge. Every data cartridge can provide its own implementations of these services.

For example, suppose you want to build a spatial data cartridge for geographic information systems (GIS) applications. In this case, you might need to implement routines that create a spatial index, insert an entry into the index, update the index, delete from the index, and perform other required operations. Thus, you extend the indexing service of the server.

Extensibility Services

Figure 1-1 shows the standard services implemented by the Oracle server. This section describes some of these services, highlighting major Oracle capabilities as they relate to data cartridge development.

Extensible Type System

The Oracle universal data server provides both native and extensible type system services. Historically, most applications have focused on accessing and modifying corporate data that is stored in tables composed of native SQL datatypes, such as INTEGER, NUMBER, DATE, and CHAR. Oracle adds support for new types, including:

  • User-defined object types

  • Collections:

    • VARRAY (varying length array)

    • Nested table

  • REF (relationship)

  • Internal large object types:

    • BLOB (binary large object)

    • CLOB (character large object)

  • BFILE (external file)

This section discusses these types.

User-Defined Types

A user-defined type differs from native SQL datatypes in two ways:

  • it is defined by a user, usually a cartridge developer

  • it specifies both the underlying persistent data, attributes, and the related behaviors, methods,

User-defined types extend the modeling capabilities of the native datatypes. With user-defined types, you can make better models of complex entities in the real world by binding data attributes to semantic behaviors.

A user-defined type can have one or more attributes. Each attribute has a name and a type. The type of an attribute can be a native SQL type, a LOB, a collection, another object type, or a REF type. The syntax for defining such types is discussed in Chapter 3, " Defining Object Types ".

See Also:

Oracle Database Application Developer's Guide - Object-Relational Features for more information on user-defined types

A method is a procedure or a function that is part of a user-defined type definition. Methods can access and manipulate attributes of the related type. Methods can run within the execution environment of the Oracle server. Methods can also be dispatched outside the server as part of the extensible server execution environment.

Collection Types

Collections are SQL datatypes that contain multiple elements. Elements, or values, of a collection are all from the same type hierarchy. In Oracle, collections of complex types can be VARRAYs or nested tables.

A VARRAY contains a variable number of ordered elements. The VARRAY datatype can be used for a column of a table or an attribute of an object type. The element type of a VARRAY can be either a native datatype, such as NUMBER, or a user-defined type.

A nested table can be created using Oracle SQL to provide the semantics of an unordered collection. As with a VARRAY, a nested table can define a column of a table or an attribute of a user-defined type.

Reference Types (REF)

If you create an object table in Oracle, you can obtain a reference that acts as a database pointer to an associated row object. References are important for navigating among object instances.

The REF operator obtains a reference to a row object. Because REFs rely on the underlying object identity, you can only use REF with an object stored as a row in an object table or objects composed from an object view.

See Also:

PL/SQL User's Guide and Reference for details of the REF operator, and Oracle Database Application Developer's Guide - Object-Relational Features for more information about objects

Large Objects

Oracle provides large object (LOB) types to handle the storage demands of images, video clips, documents, and other forms of unstructured data. Large objects are stored in a way that optimizes space utilization and provides efficient access. Large objects are composed of locators and the related binary or character data. The LOB locators are stored in-line with other table columns and, for internal LOBs (BLOB, CLOB, and NCLOB), the data can be in a separate database storage area. For external LOBs (BFILE), the data is stored outside the database tablespaces in operating system files. A table can contain multiple LOB columns (in contrast to the limit of one LONG RAW column for each table). Each LOB column can be stored in a separate tablespace, and even on different secondary storage devices.

Oracle SQL data definition language (DDL) extensions let you create, modify, and delete tables and object types that contain LOBs. The Oracle SQL data manipulation language (DML) includes statements to insert and delete complete LOBs. There is also an extensive set of statements for piece-wise reading, writing, and manipulating of LOBs with Java, PL/SQL, and the Oracle Call Interface (OCI) software.

For internal LOB types, both the locators and related data participate fully in the transactional model of the Oracle server. The data for BFILEs does not participate in transactions; however, BFILE locators are fully supported by Oracle server transactions.

Unlike scalar quantities, a LOB value cannot be indexed using built-in indexing schemes. However, you can use the various LOB APIs to build modules, including methods of user-defined types, to access and manipulate LOB content. The extensible indexing framework lets you define the semantics of data residing in LOBs and manipulate the data using these semantics.

Oracle provides a variety of interfaces and environments to access and manipulate LOBs. The use of LOBs to store and manipulate binary and character data to represent your domain is discussed Chapter 6, " Working with Multimedia Datatypes".

See Also:

Oracle Database Application Developer's Guide - Large Objects for detailed discussions of large objects

Extensible Server Execution Environment

The Oracle type system decouples the implementation of a member method for a user-defined type from the specification of the method. Components of an Oracle data cartridge can be implemented using any of the popular programming languages. In Oracle, methods, functions, and procedures can be developed using PL/SQL, external C/C++ language routines, or Java. Thus, the database server runtime environment can be extended by user-defined methods, functions, and procedures.

In Oracle, Java offers data cartridge developers a powerful implementation choice for data cartridge behavior. In addition, PL/SQL offers a data cartridge developer a powerful procedural language that supports all the object extensions for SQL. With PL/SQL, program logic can execute on the server and perform traditional procedural language operations such as loops, if-then-else clauses, and array access.

While PL/SQL and Java are powerful, certain computation-intensive operations such as a Fast Fourier Transform or an image format conversion are handled more efficiently by C programs. With the Oracle Server, you can call C language programs from the server. Such programs are executed in a separate address space from the server. Thus, the database server is insulated from any program failures that might occur in external procedures, preventing the Oracle database from being corrupted by such failures.

With certain reasonable restrictions, external procedures can call back to the Oracle Server using OCI. Callbacks are particularly useful for processing LOBs. For example, an external procedure can use callbacks to perform piece-wise reads or writes of LOBs stored in the database. External procedures can also use callbacks to manipulate domain indexes stored as Index-Organized Tables in the database.

Figure 1-2 External Program Executing in Separate Address Space

Description of addci010.gif follows
Description of the illustration addci010.gif

Extensible Indexing

Basic database management systems support a few types of access methods, such as B+trees and hash indexes, on a limited set of data types, such as numbers and strings. For simple data types such as integers and small strings, all aspects of indexing can easily be handled by the database system. However, as data becomes more complex, including information such as text, spatial, image, video, and audio, it requires complex data types and specialized indexing techniques.

Complex data types have application-specific formats, indexing requirements, and selection predicates. For example, there are many different means of document encoding (ODA, XML, plain text) and information retrieval techniques (keyword, full-text boolean, similarity, probabilistic, and so on). Similarly, R-trees are an efficient method of indexing spatial data. No database server can be built with support for all possible kinds of complex data and indexing. Oracle's solution is to build an extensible server that lets you define the index types you require.

Such user-defined indexes are called domain indexes because they index data in an application-specific domain. The cartridge is responsible for defining the index structure, maintaining the index content during load and update operations, and searching the index during query processing. The physical index can be stored in the Oracle database as tables or externally as a file.

A domain index is a schema object. It is created, managed, and accessed by routines implemented as methods of a user-defined type called an indextype. The routines that an indextype must implement, and the operations the routines must perform, are described in Chapter 8, " Building Domain Indexes". Implementation of the routines is specific to an application and must be done by the cartridge developer.

With extensible indexing, the application

  • Defines the structure of the domain index

  • Stores the index data, either inside or outside the Oracle database

  • Manages, retrieves, and uses the index data to evaluate user queries

When the database system handles the physical storage of domain indexes, data cartridges

  • Define the format and content of an index. Cartridges define an index structure that can accommodate a complex data object.

  • Build, delete, and update a domain index. Cartridges handle building and maintaining the index structures. Because indexes are modeled as collections of tuples, in-place updating is directly supported.

  • Access and interpret the content of an index. Cartridges become an integral component of query processing: content-related clauses for database queries are handled by the data cartridge.

Typical relational and object-relational database management systems do not support extensible indexing. Consequently, many applications maintain file-based indexes for complex data residing in relational database tables. A considerable amount of code and effort is required to:

  • maintain consistency between external indexes and the related relational data

  • support compound queries involving tabular values and external indexes

  • manage the system, performing backup, recovery, storage allocation, and so on, with multiple forms of persistent storage, such as files and databases

By supporting extensible indexes, the Oracle server significantly reduces the level of effort needed to develop solutions involving high-performance access to complex datatypes.

Extensible Optimizer

The extensible optimizer lets user-defined functions and indexes collect statistical information such as selectivity and cost functions. This information is used by the optimizer in choosing a query plan. The cost-based optimizer is thus extended to use the user-supplied information; the rule-based optimizer is unchanged.

The optimizer generates an execution plan for a SQL statement. An execution plan includes an access method for each table in the FROM clause, and an ordering (called the join order) of the tables in the FROM clause. System-defined access methods include indexes, hash clusters, and table scans. The optimizer chooses a plan by generating a set of join orders or permutations, computing the cost of each, and selecting the one with the lowest cost.

For each table in the join order, the optimizer computes the cost of each possible access method and join method and chooses the one with the lowest cost. The cost of the join order is the sum of the access method and join method costs. The costs are calculated using algorithms that together compose the cost model. A cost model can include varying levels of detail about the physical environment in which the query is executed. The current cost model includes the number of disk accesses and estimates of network costs, with minor adjustments to compensate for the lack of detail.

The optimizer uses statistics about the objects referenced in the query to compute the costs. The statistics are gathered using the DBMS_STATS package. The optimizer uses these statistics to calculate cost and selectivity. The selectivity of a predicate is the fraction of rows in a table that will be chosen by the predicate. It is a number between 0 and 100 (expressed as percentage).

Extensibility allows users to define new operators, index types, and domain indexes. For such user-defined operators and domain indexes, the extensible optimizer lets users control the three main components used by the optimizer to select an execution plan: statistics, selectivity, and cost.

See Also:

PL/SQL Packages and Types Reference for information about DBMS_STATS

Extensibility Interfaces

There are three classes of extensibility interfaces:

DBMS Interfaces

The DBMS interfaces are the simplest kind of extensibility services. DBMS interfaces are made available through extensions to SQL or to the Oracle Call Interface (OCI). For example, the extensible type manager uses the CREATE TYPE syntax in SQL. Similarly, extensible indexing uses DDL and DML support for specifying and manipulating indexes.

Cartridge Basic Service Interfaces

Generic interfaces provide basic services like memory management, context management, internationalization, and cartridge-specific management. These cartridge basic interface services are used by data cartridges to implement behavior for new datatypes in the context of the server's execution environment. These services provide helper routines that make it easy for data cartridge developers to write robust, portable server-side methods.

Data Cartridge Interfaces

When processing user-defined indextypes, Oracle calls data cartridge functions to perform operations such as index searches or fetch operations. For user-defined query optimization, the query optimizer calls functions implemented by the data cartridge to compute the cost of user-defined operators or functions.