2 Getting Started with Oracle XML DB

This chapter provides some preliminary design criteria for consideration when planning your Oracle XML DB solution.

This chapter contains these topics:

Oracle XML DB Installation

Oracle XML DB is installed automatically in the following situations:

  • If Database Configuration Assistant (DBCA) is used to build Oracle Database using the general-purpose template

  • If you use SQL script catqm to install Oracle Database

You can determine whether or not Oracle XML DB is already installed. If it is installed, then the following are true:

  • User XDB exists. To check: SELECT * FROM ALL_USERS;


For a manual installation or de-installation of Oracle XML DB, see Chapter 28, "Administering Oracle XML DB".

When to Use Oracle XML DB

Oracle XML DB is suited for any application where some or all of the data processed by the application is represented using XML. Oracle XML DB provides for high performance ingestion, storage, processing and retrieval of XML data. Additionally, it also provides the ability to quickly and easily generate XML from existing relational data.

The type of applications that Oracle XML DB is particularly suited to include:

  • Business-to-Business (B2B) and Application-to-Application (A2A) integration

  • Internet applications

  • Content-management applications

  • Messaging

  • Web Services

A typical Oracle XML DB application has one or more of the following requirements and characteristics:

  • Large numbers of XML documents must be ingested or generated

  • Large XML documents need to be processed or generated

  • High performance searching, both within a document and across a large collections of documents

  • High Levels of security. Fine grained control of security

  • Data processing must be contained in XML documents and data contained in traditional relational tables

  • Uses languages such as Java that support open standards such as SQL, XML, XPath, and XSLT

  • Accesses information using standard Internet protocols such as FTP, HTTP(S)/WebDAV, or Java Database Connectivity (JDBC)

  • Full queriability from SQL and integration with analytic capabilities

  • Validation of XML documents is critical

Designing Your XML Application

Oracle XML DB provides you with the ability to fine tune how XML documents will be stored and processed in Oracle Database. Depending on the nature of the application being developed, XML storage must have at least one of the following features

  • High performance ingestion and retrieval of XML documents

  • High performance indexing and searching of XML documents

  • Be able to update sections of an XML document

  • Manage highly either or both structured and unstructured XML documents

Oracle XML DB Design Issues: Introduction

This section discusses the preliminary design criteria you can consider when planning your Oracle XML DB application. Figure 2-1 provides an overview of your main design options for building Oracle XML DB applications.

A. Data

Will your data be highly structured (mostly XML), semistructured, or mostly unstructured? If highly structured, will your tables be XML schema-based or non-schema-based? See "Oracle XML DB Application Design: A. How Structured Is Your Data?" and Chapter 3, "Using Oracle XML DB".

B. Access

How will other applications and users access your XML and other data? How secure must the access be? Do you need versioning? See "Oracle XML DB Application Design: B. Access Models".

C. Application Language

In which language(s) will you be programming your application? See "Oracle XML DB Application Design: C. Application Language".

D. Processing

Will you need to generate XML? See Chapter 16, "Generating XML Data from the Database".

How often will XML documents be accessed, updated, and manipulated? Will you need to update fragments or the whole document?

Will you need to transform the XML to HTML, WML, or other languages, and how will your application transform the XML? See Chapter 9, "Transforming and Validating XMLType Data".

Does your application need to be primarily database resident or work in both database and middle tier?

Is your application data-centric, document- and content-centric, or integrated (is both data- and document-centric). "Oracle XML DB Application Design: D. Processing Models".

Will you be exchanging XML data with other applications, across gateways? Will you need Advanced Queuing (AQ) or SOAP compliance? See Chapter 31, "Exchanging XML Data with Oracle Streams AQ".

E. Storage

How and where will you store the data, XML data, XML schema, and so on? See "Oracle XML DB Application Design: F. Storage Models".


The choices you make for A–D are typically interdependent, but they are not dependent on the storage model you choose (E).

Figure 2-1 Oracle XML DB Design Options

Description of Figure 2-1 follows
Description of "Figure 2-1 Oracle XML DB Design Options"

Oracle XML DB Application Design: A. How Structured Is Your Data?

Figure 2-2 shows the following data-structure categories and associated suggested storage options:

  • Structured data. Is your data mostly XML data?

  • Semi/pseudo-structured data. Does your data include some XML data?

  • Unstructured data. Is most of your data not XML data?

XML Schema-Based or Non-Schema-Based

Also consider the following data modeling questions:

  • If your application is XML schema-based:

    • For structured data, you can use either Character Large Object (CLOB) or structured storage.

    • For semistructured data, you can use either CLOB, structured, or hybrid storage. Here your XML schema can be more loosely coupled. See also "Oracle XML DB Application Design: F. Storage Models".

    • For unstructured data, an XML schema design is not applicable.

  • If your application is non-schema-based. For structured, semi/ pseudo-structured, and unstructured data, you can store your data in either CLOB values in XMLType tables or views or in files in Oracle XML DB Repository folders. With this design you have many access options including path- and query-based access through resource views.

Figure 2-2 Data Storage Models: How Structured Is Your Data?

Description of Figure 2-2 follows
Description of "Figure 2-2 Data Storage Models: How Structured Is Your Data?"

Oracle XML DB Application Design: B. Access Models

Figure 2-3 shows the two main data access modes to consider when designing your Oracle XML DB applications:

These options for accessing Oracle XML DB Repository data are also discussed in Chapter 20, "Accessing Oracle XML DB Repository Data".

You can also consider the following access model criteria:

Figure 2-3 Data Access Models: How Will Users or Applications Access the Data?

Description of Figure 2-3 follows
Description of "Figure 2-3 Data Access Models: How Will Users or Applications Access the Data?"

Oracle XML DB Application Design: C. Application Language

You can program your Oracle XML DB applications in the following languages:

Oracle XML DB Application Design: D. Processing Models

The following processing options are available and should be considered when designing your Oracle XML DB application:

How often will XML documents be accessed, updated, and manipulated? See Chapter 4, "XMLType Operations" and Chapter 22, "SQL Access Using RESOURCE_VIEW and PATH_VIEW".

Will you need to update fragments or the whole document? You can use XPath to specify individual elements and attributes of your document during updates, without rewriting the entire document. This is more efficient, especially for large XML documents. Chapter 5, "XML Schema Storage and Query: Basic".

Is your application data-centric, document- and content-centric, or integrated (is both data- and document-centric)? See Chapter 3, "Using Oracle XML DB".

Messaging Options

Advanced Queuing (AQ) supports XML and XMLType applications. You can create queues with payloads that contain XMLType attributes. These can be used for transmitting and storing messages that contain XML documents. By defining Oracle Database objects with XMLType attributes, you can do the following:

Oracle XML DB Application Design: F. Storage Models

Figure 2-4 summarizes the Oracle XML DB storage options with regards to using XMLType tables or views. If you have existing or legacy relational data, use XMLType views.

Regardless of which storage options you choose for your Oracle XML DB application, Oracle XML DB provides the same functionality. However, the option you choose will affect your application performance and the data fidelity (data accuracy).

Currently, the three main storage options for Oracle XML DB applications are:

  • LOB-based storage – LOB-based storage assures complete textual (document) fidelity, including preservation of whitespace. This means that if you store your XML documents as CLOB values, when the XML documents are retrieved there will be no data loss. Data integrity is high, and the cost of regeneration is low.

  • Structured storage – Structured storage loses whitespace information but maintains fidelity to the XML DOM, namely DOM stored = DOM retrieved. This provides:

    • Better SQL 'queriability' with improved performance

    • Piece-wise updatability

  • Semistructured storage – Semistructured, or hybrid, storage is a special case of structured storage in which a portion of the XML data is broken up into a structured format and the remainder of the data is stored as a CLOB value.

The storage options are totally independent of the following criteria:

Using XMLType Tables

If you are using XMLType tables you can store your data in:

  • CLOB (unstructured) storage

  • Structured storage

  • Semistructured storage


Use the thick JDBC driver with schema-based XMLType values stored object-relationally. (You can use either the thin or the thick driver with CLOB storage of XMLType values.)

Using XMLType Views

Use XMLType views if you have existing relational data. You can use the following options to define the XMLType views:

  • SQL/XML SQL functions. Using these functions you can store the data in relational tables and also generate/regenerate the XML. See Chapter 16, "Generating XML Data from the Database".

  • Object Types:

    • Object tables

    • Object constructors. You can store the data in relational tables using object constructors.

    • Object views

Figure 2-4 Structured Storage Options

Description of Figure 2-4 follows
Description of "Figure 2-4 Structured Storage Options"

Oracle XML DB Performance

One objection to using XML to represent data is that it generates higher overhead than other representations. Oracle XML DB incorporates a number of features specifically designed to address this issue by significantly improving the performance of XML processing. These are described in the following sections:

XML Storage Requirements

Surveys show that data represented in XML and stored in a text file is three times the size of the same data in a Java object or in relational tables. There are two main reasons for this:

  • Tag names (metadata describing the data) and white space (formatting characters) take up a significant amount of space in the document, particularly for highly structured, data-centric XML.

  • All data in an XML file is represented in human readable (string) format.

Storing Structured Documents in Oracle XML DB Saves Space

The string representation of a numeric value needs about twice as many bytes as the native (binary) representation When XML documents are stored in Oracle XML DB using the structured storage option, the shredding process discards all tags and white space in the document.

The amount of space saved by this optimization depends on the ratio of tag names to data, and the number of collections in the document. For highly-structured, data-centric XML the savings can be significant. When a document is printed, or when node-based operations such as XPath evaluations take place, Oracle XML DB uses the information contained in the associated XML schema to dynamically reconstruct any necessary tag information.

XML Memory Management

Document Object Model (DOM) is the dominant programming model for XML documents. DOM APIs are easy to use but the DOM Tree that underpins them is expensive to generate, in terms of memory. A typical DOM implementation maintains approximately 80 to 120 bytes of system overhead for each node in the DOM tree. This means that for highly structured data, the DOM tree can require 10 to 20 times more memory than the document on which it is based.

A conventional DOM implementation requires the entire contents of an XML document to be loaded into the DOM tree before any operations can take place. If an application only needs to process a small percentage of the nodes in the document, this is extremely inefficient in terms of memory and processing overhead. The alternative SAX approach reduces the amount of memory required to process an XML document, but its disadvantage is that it only allows linear processing of nodes in the XML Document.

Oracle XML DB Reduces Memory Overhead for XML Schema-Based Documents by Using XML Objects (XOBs)

Oracle XML DB reduces memory overhead associated with DOM programming by managing XML schema-based XML documents using an internal in-memory structure called an XML Object (XOB). A XOB is much smaller than the equivalent DOM since it does not duplicate information like tag names and node types, that can easily be obtained from the associated XML schema. Oracle XML DB automatically uses a XOB whenever an application works with the contents of a schema-based XMLType. The use of the XOB is transparent to you. It is hidden behind the XMLType datatype and the C, PL/SQL, and Java APIs.

XOB Uses Lazily-Loaded Virtual DOM

The XOB can also reduce the amount of memory required to work with an XML document using the Lazily-Loaded Virtual DOM feature. This allows Oracle XML DB to defer loading in-memory representation of nodes that are part of sub-elements or collection until methods attempt to operate on a node in that object. Consequently, if an application only operates on a few nodes in a document, only those nodes and their immediate siblings are loaded into memory.The XOB can only used when an XML document is based on an XML schema. If the contents of the XML document are not based on an XML schema, a traditional DOM is used instead of the XOB.

XML Parsing Optimizations

To populate a DOM tree the application must parse the XML document. The process of creating a DOM tree from an XML file is very CPU- intensive. In a typical DOM-based application, where the XML documents are stored as text, every document has to be parsed and loaded into the DOM tree before the application can work with it. If the contents of the DOM tree are updated the whole tree has to be serialized back into a text format and written out to disk.

With Oracle XML DB No Re-Parsing is Needed

Oracle XML DB eliminates the need to keep re-parsing documents. Once an XML document has been stored using structured storage techniques no further parsing is required when the document is loaded from disk into memory. Oracle XML DB is able to map directly between the on disk format and in-memory format using information derived from the associated XML schema. When changes are made to the contents of a schema-based XMLType, Oracle XML DB is able to write just the updated data back to disk.Again, when the contents of the XMLType are not based on an XML schema a traditional DOM is used instead.

Node-Searching Optimizations

Most DOM implementations use string comparisons when searching for a particular node in the DOM tree. Even a simple search of a DOM tree can require hundreds or thousands of instruction cycles. Searching for a node in a XOB is much more efficient than searching for a node in a DOM. A XOB is based on a computed offset model, similar to a C/C++ object, and uses dynamic hashtables rather than string comparisons to perform node searches.

XML Schema Optimizations

Making use of the powerful features associated with XML schema in a conventional XML application can generate significant amounts of additional overhead. For example, before an XML document can be validated against an XML schema, the schema itself must be located, parsed, and validated.

Minimizing XML Schema Overhead After a Schema Is Registered

Oracle XML DB minimizes the overhead associated with using XML schema. When an XML schema is registered with the database it is loaded in the Oracle XML DB schema cache, along with all of the metadata required to map between the XML, XOB and on disk representations of the data. This means that once the XML schema has been registered with the database, no additional parsing or validation of the XML schema is required before it can be used. The schema cache is shared by all users of the database. Whenever an Oracle XML DB operation requires information contained in the XML schema it can access the required information directly from the cache.

Load Balancing Through Cached XML Schema

Some operations, such as performing a full schema validation, or serializing an XML document back into text form can still require significant memory and CPU resources. Oracle XML DB allows these operations to be off-loaded to the client or middle tier processor. Oracle Call Interface (OCI) interface and thick Java Database Connectivity (JDBC) driver both allow the XOB to be managed by the client.The cached representation of the XML schema can also be downloaded to the client. This allows operations such as XML printing, and XML schema validation to be performed using client or middle tier resources, rather than server resources.

Reduced Bottlenecks From Code That Is Not Native

Another bottleneck for XML-based Java applications happens when parsing an XML file. Even natively compiled or JIT compiled Java performs XML parsing operations twice as slowly compared to using native C language. One of the major performance bottlenecks in implementing XML applications is the cost of transforming data in an XML document between text, Java, and native server representations. The cost of performing these transformations is proportional to the size and complexity of the XML file and becomes severe even in moderately sized files.

Oracle XML DB Implements Java and PL/SQL APIs Over Native C

Oracle XML DB addresses these issues by implementing all of the Java and PL/SQL interfaces as very thin facades over a native 'C' implementation. This provides for language-neutral XML support (Java, C, PL/SQL, and SQL all use the same underlying implementation), as well as the higher performance XML parsing and DOM processing.

Reduced Java Type Conversion Bottlenecks

One of the biggest bottlenecks when using Java and XML is with type conversions. Internally Java uses UCS-2 to represent character data. Most XML files and databases do not contain UCS-2 encoded data. This means that all data contained in an XML file has to be converted from 8-Bit or UTF-8 encoding to UCS-2 encoding before it can be manipulated in a Java program.

Oracle XML DB Uses Lazy Type Conversion to Avoid Unneeded Type Conversions

Oracle XML DB addresses these problems with lazy type conversions. With lazy type conversions the content of a node is not converted into the format required by Java until the application attempts to access the contents of the node. Data remains in the internal representation till the last moment. Avoiding unnecessary type conversions can result in significant performance improvements when an application only needs to access a few nodes in an XML document.

Consider a JSP that loads a name from the Oracle Database and prints it out in the generated HTML output. Typical JSP implementations read the name from the database (that probably contains data in the ASCII or ISO8859 character sets) convert the data to UCS-2, and return it to Java as a string. The JSP would not look at the string content, but only print it out after printing the enclosing HTML, probably converting back to the same ASCII or ISO8859 for the client browser. Oracle XML DB provides a write interface on XMLType so that any element can write itself directly to a stream (such as a ServletOutputStream) without conversion through Java character sets. Figure 2-5 shows the Oracle XML DB Application Program Interface (API) stack.

Figure 2-5 Oracle XML DB Application Program Interface (API) Stack

Description of Figure 2-5 follows
Description of "Figure 2-5 Oracle XML DB Application Program Interface (API) Stack"