The Java EE 5 Tutorial

Iterator API

The StAX iterator API represents an XML document stream as a set of discrete event objects. These events are pulled by the application and provided by the parser in the order in which they are read in the source XML document.

The base iterator interface is called XMLEvent, and there are subinterfaces for each event type listed in Table 18–2. The primary parser interface for reading iterator events is XMLEventReader, and the primary interface for writing iterator events is XMLEventWriter. The XMLEventReader interface declares several methods, the most important of which is nextEvent, which returns the next event in an XML stream. XMLEventReader extends java.util.Iterator, which means that the events it returns can be cached and that the reader itself can be passed into routines that work with the standard Java Iterator; for example:

public interface XMLEventReader extends Iterator {
    public XMLEvent nextEvent() throws XMLStreamException;
    public boolean hasNext();
    public XMLEvent peek() throws XMLStreamException;
    ...
}
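As a minimal sketch of the pull model described above, the following class drives an XMLEventReader with hasNext and nextEvent and collects the local names of the start tags it encounters. The class name EventReaderSketch and the method startTagNames are hypothetical names chosen for this illustration; the StAX calls are the standard javax.xml.stream ones.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical class name, used only for this illustration.
class EventReaderSketch {

    // Pulls every event from the reader and returns the local names of
    // the start tags in document order.
    static List<String> startTagNames(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLEventReader reader = factory.createXMLEventReader(new StringReader(xml));
        List<String> names = new ArrayList<String>();
        try {
            // The application asks for each event; the parser never pushes.
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartElement()) {
                    names.add(event.asStartElement().getName().getLocalPart());
                }
            }
        } finally {
            reader.close();
        }
        return names;
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(startTagNames("<a><b>text</b><c/></a>"));
    }
}
```

Calling startTagNames on the string <a><b>text</b><c/></a> yields the names a, b, and c, in that order.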

Similarly, on the output side of the iterator API, you have:

public interface XMLEventWriter {
    public void flush() throws XMLStreamException;
    public void close() throws XMLStreamException;
    public void add(XMLEvent e) throws XMLStreamException;
    public void add(XMLEventReader reader) throws XMLStreamException;
    ...
}
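The output side can be sketched in the same spirit. The example below assumes events are created with the standard XMLEventFactory and then handed to an XMLEventWriter one at a time; an Attribute is itself an XMLEvent, so it is passed to the same add call as any other event. The class name EventWriterSketch and the method writeCost are hypothetical, and the exact serialized form may vary between StAX implementations.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;

// Hypothetical class name, used only for this illustration.
class EventWriterSketch {

    // Writes a single Cost element with a currency attribute and returns
    // the serialized XML as a string.
    static String writeCost(String currency, String amount) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);
        XMLEventFactory events = XMLEventFactory.newInstance();

        // Arguments are prefix, namespace URI, local name; no namespace here.
        writer.add(events.createStartElement("", "", "Cost"));
        // The attribute event follows its start element, before any content.
        writer.add(events.createAttribute("currency", currency));
        writer.add(events.createCharacters(amount));
        writer.add(events.createEndElement("", "", "Cost"));

        writer.flush();
        writer.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(writeCost("INR", "11.50"));
    }
}
```

The standalone attribute event is added immediately after its StartElement because the start tag is still open at that point; once character data has been added, the tag is closed and further attributes cannot be attached to it.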

Iterator Event Types

Table 18–2 lists the XMLEvent types defined in the event iterator API.

Table 18–2 XMLEvent Types

Event Type              Description

StartDocument           Reports the beginning of a set of XML events, including encoding, XML version, and standalone properties.

StartElement            Reports the start of an element, including any attributes and namespace declarations; also provides access to the prefix, namespace URI, and local name of the start tag.

EndElement              Reports the end tag of an element. Namespaces that have gone out of scope can be recalled here if they have been explicitly set on their corresponding StartElement.

Characters              Corresponds to XML CData sections and CharacterData entities. Note that ignorable white space and significant white space are also reported as Characters events.

EntityReference         Character entities can be reported as discrete events, which an application developer can then choose to resolve or pass through unresolved. By default, entities are resolved. Alternatively, if you do not want to report the entity as an event, replacement text can be substituted and reported as Characters.

ProcessingInstruction   Reports the target and data for an underlying processing instruction.

Comment                 Returns the text of a comment.

EndDocument             Reports the end of a set of XML events.

DTD                     Reports as java.lang.String information about the DTD, if any, associated with the stream, and provides a method for returning custom objects found in the DTD.

Attribute               Attributes are generally reported as part of a StartElement event. However, there are times when it is desirable to return an attribute as a standalone Attribute event; for example, when a namespace is returned as the result of an XQuery or XPath expression.

Namespace               As with attributes, namespaces are usually reported as part of a StartElement, but there are times when it is desirable to report a namespace as a discrete Namespace event.

Note that the DTD, EntityDeclaration, EntityReference, NotationDeclaration, and ProcessingInstruction events are only created if the document being processed contains a DTD.
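One way an application might tell these event types apart is to switch on the integer returned by XMLEvent.getEventType, using the constants defined in XMLStreamConstants. The sketch below covers only a few of the types from the table; the class name EventTypeSketch and its methods are hypothetical, and the factory is set to coalesce adjacent character data so that each text run arrives as one Characters event.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical class name, used only for this illustration.
class EventTypeSketch {

    // Returns a short label for an event based on its type constant.
    static String describe(XMLEvent event) {
        switch (event.getEventType()) {
            case XMLStreamConstants.START_DOCUMENT:
                return "StartDocument";
            case XMLStreamConstants.START_ELEMENT:
                return "StartElement " + event.asStartElement().getName().getLocalPart();
            case XMLStreamConstants.CHARACTERS:
                return "Characters \"" + event.asCharacters().getData() + "\"";
            case XMLStreamConstants.END_ELEMENT:
                return "EndElement " + event.asEndElement().getName().getLocalPart();
            case XMLStreamConstants.COMMENT:
                return "Comment";
            case XMLStreamConstants.PROCESSING_INSTRUCTION:
                return "ProcessingInstruction";
            case XMLStreamConstants.END_DOCUMENT:
                return "EndDocument";
            default:
                // Types not handled in this sketch (DTD, EntityReference, ...).
                return "Other (type " + event.getEventType() + ")";
        }
    }

    // Reads the whole document and labels each event in order.
    static List<String> describeAll(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Merge adjacent character data into a single Characters event.
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        XMLEventReader reader = factory.createXMLEventReader(new StringReader(xml));
        List<String> labels = new ArrayList<String>();
        try {
            while (reader.hasNext()) {
                labels.add(describe(reader.nextEvent()));
            }
        } finally {
            reader.close();
        }
        return labels;
    }

    public static void main(String[] args) throws XMLStreamException {
        for (String label : describeAll("<Cost currency=\"INR\">11.50</Cost>")) {
            System.out.println(label);
        }
    }
}
```

The convenience methods isStartElement, isCharacters, and so on offer an alternative to the switch when only one or two event types are of interest.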

Example of Event Mapping

As an example of how the event iterator API maps an XML stream, consider the following XML document:

<?xml version="1.0"?>
<BookCatalogue xmlns="http://www.publishing.org">
    <Book>
        <Title>Yogasana Vijnana: the Science of Yoga</Title>
        <ISBN>81-40-34319-4</ISBN>
        <Cost currency="INR">11.50</Cost>
    </Book>
</BookCatalogue>

This document would be parsed into eighteen primary and secondary events, as shown in Table 18–3. Note that secondary events, shown in curly braces ({}), are typically accessed from a primary event rather than directly.

Table 18–3 Example of Iterator API Event Mapping

#    Element/Attribute                                             Event

1    version="1.0"                                                 StartDocument

2    isCData = false                                               Characters
     data = "\n"
     IsWhiteSpace = true

3    qname = BookCatalogue:http://www.publishing.org               StartElement
     attributes = null
     namespaces = {"BookCatalogue" -> http://www.publishing.org}

4    qname = Book                                                  StartElement
     attributes = null
     namespaces = null

5    qname = Title                                                 StartElement
     attributes = null
     namespaces = null

6    isCData = false                                               Characters
     data = "Yogasana Vijnana: the Science of Yoga\n\t"
     IsWhiteSpace = false

7    qname = Title                                                 EndElement
     namespaces = null

8    qname = ISBN                                                  StartElement
     attributes = null
     namespaces = null

9    isCData = false                                               Characters
     data = "81-40-34319-4\n\t"
     IsWhiteSpace = false

10   qname = ISBN                                                  EndElement
     namespaces = null

11   qname = Cost                                                  StartElement
     attributes = {"currency" -> INR}
     namespaces = null

12   isCData = false                                               Characters
     data = "11.50\n\t"
     IsWhiteSpace = false

13   qname = Cost                                                  EndElement
     namespaces = null

14   isCData = false                                               Characters
     data = "\n"
     IsWhiteSpace = true

15   qname = Book                                                  EndElement
     namespaces = null

16   isCData = false                                               Characters
     data = "\n"
     IsWhiteSpace = true

17   qname = BookCatalogue:http://www.publishing.org               EndElement
     namespaces = {"BookCatalogue" -> http://www.publishing.org}

18                                                                 EndDocument
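The attributes and namespaces columns in the mapping above correspond to secondary events that are reached through a StartElement. As a rough sketch, the class below parses the same catalogue document and, for each start tag, lists the attributes and namespace declarations obtained from StartElement.getAttributes and StartElement.getNamespaces. The class name SecondaryEventSketch, the CATALOGUE constant, and the output format are assumptions made for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.Namespace;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

// Hypothetical class name, used only for this illustration.
class SecondaryEventSketch {

    // The sample document from this section.
    static final String CATALOGUE =
        "<?xml version=\"1.0\"?>\n"
        + "<BookCatalogue xmlns=\"http://www.publishing.org\">\n"
        + "    <Book>\n"
        + "        <Title>Yogasana Vijnana: the Science of Yoga</Title>\n"
        + "        <ISBN>81-40-34319-4</ISBN>\n"
        + "        <Cost currency=\"INR\">11.50</Cost>\n"
        + "    </Book>\n"
        + "</BookCatalogue>\n";

    // For each start tag, lists the secondary Attribute and Namespace
    // events that hang off the primary StartElement event.
    static List<String> secondaryEvents(String xml) throws XMLStreamException {
        XMLEventReader reader =
            XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
        List<String> lines = new ArrayList<String>();
        try {
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (!event.isStartElement()) {
                    continue;
                }
                StartElement start = event.asStartElement();

                List<String> attributes = new ArrayList<String>();
                for (Iterator<?> it = start.getAttributes(); it.hasNext();) {
                    Attribute a = (Attribute) it.next();
                    attributes.add(a.getName().getLocalPart() + "=" + a.getValue());
                }

                // Only namespaces declared on this element are returned.
                List<String> namespaces = new ArrayList<String>();
                for (Iterator<?> it = start.getNamespaces(); it.hasNext();) {
                    Namespace ns = (Namespace) it.next();
                    namespaces.add(ns.getNamespaceURI());
                }

                lines.add(start.getName().getLocalPart()
                    + " attributes=" + attributes
                    + " namespaces=" + namespaces);
            }
        } finally {
            reader.close();
        }
        return lines;
    }

    public static void main(String[] args) throws XMLStreamException {
        for (String line : secondaryEvents(CATALOGUE)) {
            System.out.println(line);
        }
    }
}
```

With the catalogue document, only BookCatalogue carries a namespace declaration and only Cost carries an attribute; the other start tags report empty lists for both.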

There are several important things to note in this example: