6 XML Programming Best Practices

This chapter describes the best programming practices to use when creating Java applications that process XML data.

This chapter includes the following sections:

When to Use the DOM, SAX, and StAX APIs
Increasing Performance of XML Validation
When to Use XML Schemas or DTDs
Configuring External Entity Resolution for Maximum Performance
Using SAX InputSources
Improving Performance of Transformations

When to Use the DOM, SAX, and StAX APIs

You can parse an XML document with the DOM, SAX, or StAX APIs. This section describes the pros and cons of each API.

The DOM API is good for small documents, or those that contain under a thousand elements. Because DOM constructs a tree of your XML data, it is ideal for editing the structure of your XML document by adding or deleting elements.

The DOM API parses the entire XML document and converts it into a DOM tree before you can begin processing it. This cost might be beneficial if you know that you need to access the entire document. If you occasionally need to access only part of the XML document, the cost could decrease the performance of your application with no added benefit. In this case the SAX or StAX API is preferable.

The SAX API is the most lightweight of the APIs. It is ideal for parsing shallow documents (documents that do not contain much nesting of elements) with unique element names. SAX uses a callback structure; this means that programmers handle parsing events as the API is reading an XML document, which is a relatively efficient and quick way to parse.

However, the callback nature of SAX also means that it is not the best API to use if you want to modify the structure of your XML data. Additionally, programming your application to handle callbacks is not always straight-forward and intuitive.

The StAX API is based on SAX, so all the reasons for using SAX also apply to the StAX API. In addition, the StAX API is more intuitive to use than SAX, because programmers ask for events rather than react to them as they happen. The StAX API is also best if you plan to pass the entire XML document as a parameter; it is easier to pass an input stream than it is to pass SAX events. Finally, the StAX API was designed to be used when binding XML data to Java objects.

Increasing Performance of XML Validation

If the performance of your XML application decreases due to a parser validation issue, and you need to validate your XML documents, you might improve the performance of your application by writing your own customized code that validates the data as it is being received or parsed, rather than using the setValidating() method of the DocumentBuilderFactory or SaxParserFactory.

When you turn on validation while parsing an XML document with SAX or DOM, the parser might do more validation of the document than you really need, thus decreasing the overall performance of the application. Instead, consider choosing certain points during the parsing of the document when you want to check that the XML document is valid, and add your own Java code at those points.

For example, assume you are writing an application that uses the WebLogic XML Streaming API to processes an XML purchase order. Because you know that the first element of the document should be <purchase_order>, you can quickly verify that the document appears to be valid by pulling the first element off the stream and checking its name. This check does not ensure that the entire XML document is valid, of course, but you can continue checking for known elements as you pull elements from the stream. These quick checks are much faster than using the standard setValidating() methods.

When to Use XML Schemas or DTDs

There are two ways to describe the structure of an XML document: DTDs and XML Schemas.

The current trend is to use Schemas to describe XML documents. Schemas are much more expressive than DTDs because the set of available data types to describe XML elements is much richer and you can describe more specifically what is valid in an XML document. In addition, you can only use Schemas, and not DTDs, in SOAP messages. Because SOAP is the main messaging protocol used in Web services, consider using Schemas to describe any XML documents that might be used as either input or output parameters to Web services.

Still, DTDs have a few advantages. DTDs are more widely supported than Schemas, although that is changing rapidly. Because DTDs are less expressive than Schemas, they are easier to write and manage.

However, Oracle recommends that you use Schemas to describe your XML documents.

Configuring External Entity Resolution for Maximum Performance

Oracle highly recommends you store external entities locally whenever possible rather than always retrieving the entity over the network. Storage improves the performance of your applications because it is much faster to look up an entity on the same machine as WebLogic Server than it is to look it up over a network connection.

For detailed information on configuring external entity resolution for WebLogic Server, see External Entity Configuration Tasks.

Using SAX InputSources

When you use the SAX API to parse an XML document, you first create an InputSource object from the XML document and then pass the InputSource object to the parse() method. You can create the InputSource object from either a java.io.InputStream or java.io.Reader object based on your XML data.

Oracle recommends that you create an InputSource from a java.io.InputStream object whenever possible. When passed an InputStream object, the SAX parser auto-detects the character encoding of the XML data and automatically instantiates an InputStreamReader object for you, using the correct character encoding. In other words, the parser does all the character encoding work for you, which is more likely to be error-free at runtime than if you decide to specify the character encoding yourself.

Improving Performance of Transformations

XSLT is a language for transforming an XML document into a different format, such as another XML document, HTML, WML, and so on. To use XSLT, you create a stylesheet that defines how each element in the input XML document should be transformed in the output document.

Although XSLT is a powerful language, creating stylesheets for complex transformations can be very complicated. In addition, the actual transformation requires a lot of resources and might decrease the performance of your application.

Therefore, if your transformations are complex, consider writing your own transformation code in your application rather than using XSLT stylesheets. Also consider using the DOM API. First parse the XML document, manipulate the resulting DOM tree as needed, then write out the new document, using custom Java code to transform it into its final format.