This chapter discusses various design considerations related to data cartridges.
This chapter includes these topics:
You can implement methods for object types in PL/SQL, C/C++, or Java. PL/SQL and Java methods run in the address space of the server. C/C++ methods are dispatched as external procedures and run outside the address space of the server.
The best implementation choice depends on the situation. Here are some guidelines:
A callout involving C or C++ is generally fastest if the processing is substantially CPU-bound. However, callouts incur the cost of dispatch, which might be important for small amounts of processing in C/C++.
PL/SQL is most efficient for methods that are not computation-intensive. The other implementation options are typically favored over PL/SQL if you have a large body of code already implemented in another language that you want to use a part of the data cartridge, or if you need to perform extensive computations.
Java is a relatively open implementation choice. Although Java is usually interpreted, high-performance applications might benefit from pre-compilation of methods or just-in-time compilers.
The invoker's rights mechanism lets a function execute with the privileges of the invoker. Thus, a cartridge can live within a schema dedicated to it, which can be used by other schemas without privileges for operating on objects in the schema where the cartridge resides.
LOBs with callouts, consider the following:
It can be to your advantage to code your callout so that it is independent of
LOB types (
The PL/SQL layer of your cartridge can open your
BFILE so that no
BFILE-specific logic is required in your callout (other than error recovery from
OCILob calls that do not operate on
With the advent of temporary
LOBs, you need to be aware of the deep copy that can occur when assignments and calls are done with temporary
BY REFERENCE) on
BLOB parameters as appropriate.
Traditionally, external procedures have a state-less model. All statement handles opened during the invocation of an external procedure are closed implicitly at the end of the call.
Oracle Database allows state information, such as OCI statement handles and associated state in the database, to be saved and used across invocations of external procedures in a session. By default, cartridges are stateless; however, you can use
OCIMemory services and
OCIContext services with
OCI_DURATION_SESSION or other appropriate duration to save state. Statement handles created in one external procedure invocation can be re-used in another. As the data cartridge developer, you must explicitly free these handles. Oracle recommends that you do this as soon as the statement handle is no longer needed. All state maintained for the statement in the OCI handles and in the database is freed as a result. This helps to improve the scalability of your data cartridge.
See Also:Oracle Database PL/SQL Language Reference
This section discusses some factors you should consider when designing indexes for your data cartridge.
Creating a domain index is not always the best course. If you decide to create a domain index, keep the following factors in mind:
For complex domain indexes, the functional implementation works better with small data size and when results are a large percentage of the total data size
Judicious use of the extensible optimizer can improve performance
Naming internal components for a domain index implementation can be an issue. Names of internal data objects are typically based on names you provide for table and indexes. The problem is that the derived names for the internal objects must not conflict with any other user-defined object or system object. To avoid this problem, develop some policy that restricts names, or implement some metadata management scheme to avoid errors during
CREATE, and so on.
You can create secondary indexes on IOT because using them is more efficient than storing data in a table and a separate index, particularly if most of your data is in the index. This offers a big advantage if you are accessing the data in multiple ways. Note that prior to the Oracle9i release, you could create only one index on IOTs.
Index structures can be stored in
LOBs, but take care to tune the
LOB for best performance. If you are accessing a particular
LOB frequently, create your table with the
CACHE option and place the
LOB index in a separate tablespace. If you are updating a
LOGGING and read/write in multiples of
CHUNK size. If you are accessing a particular portion of a
LOB frequently, buffer your reads/writes using
LOB buffering or your own buffering scheme.
With the extensible indexing framework, the meaning and representation of a user-defined index is left to the cartridge developer. Oracle provides basic index implementations such as IOTs. In certain cases, binary or character
LOBs can also be used to store complex index structures. IOTs,
CLOBs all live within the database. In addition to them, you may also store a user-defined index as a structure external to the database, for example in a
The external index structure gives you the most flexibility in representing your index. An external index structure is particularly useful if you have already invested in the development of in-memory indexing structures. For example, an operating system file may store index data, which is read into a memory mapped file at runtime. Such a case can be handled as a
BFILE in the external index routines.
External index structures may also provide superior performance, although this gain comes at some cost. Index structures external to the database do not participate in the transaction semantics of the database, which, in the case of index structures inside the database, make data and concomitant index updates atomic. This means that if an update to the data causes an update for the external index to be invoked through the extensible indexing interface, failures can cause the data updates to be rolled back but not the index updates. The database can only roll back what is internal to it: external index structures cannot be rolled back in synchronization with a database rollback. External index structures are perhaps most useful for read-only access. Their semantics become complex if updates to data are involved.
When the ODCIIndexFetch() routine is called, the rowids of all the rows that satisfy the operator predicate are returned. The maximum number of rows that can be returned by the ODCIIndexFetch() routine is
nrows being an argument to the ODCIIndexFetch() routine). The value of
nrows is decided by Oracle based on some internal factors. If you have a better idea of the number of rows that ought to be returned to achieve optimal query performance, you can determine that this number of rows is returned in the
VARRAY instead of
nrows. Note that the number of values in the
ODCIRidList must be less than or equal to
As the cartridge designer, you are in the best position to make a judgement regarding the number of rows to be returned. For example, if in the index
1500 rowids are stored together and
nrows = 2000, then it may be optimal to return
1500 rows instead of
2000 rows. Otherwise, the user would have to retrieve
3000 rowids, return
2000 of them, and note which
1000 rowids were not returned.
If you do not have any specific optimization in mind, you can use the value of
nrows to determine the number of rows to be returned. Currently the value of
nrows has been set to
If you implement indexes that use callouts, use multirow fetch to fetch the largest number of rows back to the server. This offsets the cost of making the callout.
All domain indexes should contain both indexed and functional implementations of operators, in case the optimizer chooses not to use the indexed implementation. You can, however, use the indexing structures to produce the functional result.
Data cartridges can be more efficient if they are designed with the extensible optimizer in mind. This section discusses topics that help you create such a design.
When estimating cost, Oracle considers the costs associated with CPU, I/O, and Network.
You can determine the cost of executing a C function using common profilers or tools. For SQL queries, an explain plan of the query gives a rough estimate of the cost of the query. In addition, the
tkprof utility helps you gather information about the CPU and the I/O cost involved in the operation. You can also determine the cost of executing a callout by using it in a SQL query which "selects from dual" and then estimating its cost using
The selectivity of a predicate is the number of rows returned by the predicate divided by the total number of rows in the tables. Selectivity refers to the fraction of rows of the table returned by the predicate.
The selectivity function should use the statistics collected for the table to determine what percentage of rows of the table will be returned by the predicate with the given list of arguments. For example, to compute the selectivity of a predicate
SelectedImage) which determines the images that are greater than the
SelectedImage, you might use a histogram of the sizes of the images in the database to compute the selectivity.
Statistics can affect the calculation of selectivity for predicates, as well as the cost of domain indexes.
The statistics collected for a table can affect the computation of selectivity of a predicate. Thus, statistics that help the user make a better judgement about the selectivity of a predicate should be collected for tables and columns. Knowing the predicates that can operate on the data is helpful in determining what statistics to collect.
For example, in a spatial domain the average, minimum, and maximum number of elements in a
VARRAY that contains the nodes of the spatial objects is a useful statistic to collect.
When a domain index is analyzed, statistics for the underlying objects that constitute the domain index should be analyzed. For example, if the domain index is composed of tables, the statistics collection function should analyze the tables when the domain index is analyzed. The cost of accessing the domain index can be influenced by the statistics that have been collected for the index. For instance, the cost of accessing a domain index could be approximated as the selectivity times the total number of data blocks in the various tables being accessed when the domain index is accessed.
To define cost, selectivity and statistics functions accurately requires a good understanding of the domain. The preceding guidelines are meant to help you understand some of the issues you need to take into account while working on the cost, selectivity and statistics functions. In general it may be a good idea to start by using the default cost and selectivity, and observing how queries of interest behave.
When you design a data cartridge, keep in mind the issues regarding maintenance.
In particular, if your cartridge maintains a large number of objects, views, tables, and so on, consider making a metadata table to maintain the relationships among the objects for the user. This reduces the complexity of developing and maintaining the cartridge when it is in use.
README with your cartridge to tell users how to install the cartridge
Make the cartridge installable in one step in the database, if possible, such as in
Tell users how to start the
listener if you are using callouts.
Tell users how to setup
extproc. Most users have never heard of
extproc and many users have never set up a listener. This is the primary problem when deploying cartridges.
With the Oracle Software Packager, you can easily create custom SQL install scripts using the
instantiate_file action. This feature lets you substitute variables in your files when they are installed and it leaves your user with scripts and files that are customized for their installation.
See Also:Oracle Database Advanced Application Developer's Guide for information on setting up the
To make your data cartridge more portable, consider the following:
Use the datatypes in
Use OCI calls where ever possible.
Use the switches that enforce ANSI C conformance when possible.
Use ANSI C function prototypes.
Build and test on your target platforms as early in your development cycle as possible. This helps you locate platform-specific code and provides the maximum amount of time to redesign.
Portability is reduced by:
Storing endian (big/little) specific data
Storing floating point data (IEEE/VAX/other)
Operating system-specific calls (if you must use them, isolate them in a layer specific to the operating system; however, if the calls you require are not in the OCI, and also are not in POSIX, then you are likely to encounter intractable problems)
size_t on a 64-bit platform