2 XML Overview

This chapter describes Extensible Markup Language (XML) technology and the WebLogic Server XML subsystem.

This chapter includes the following sections:

What Is XML?
How Do You Describe an XML Document?
Why Use XML?
What Are XSL and XSLT?
What Are DOM and SAX?
What Is the Streaming API for XML (StAX)?
What Is JAXP?
Common Uses of XML and XSLT

What Is XML?

Extensible Markup Language (XML) is a markup language used to describe the content and structure of data in a document. It is a simplified version of Standard Generalized Markup Language (SGML). XML is an industry standard for delivering content on the Internet. Because it provides a facility to define new tags, XML is also extensible.

Like HTML, XML uses tags to describe content. However, rather than focusing on the presentation of content, the tags in XML describe the meaning and hierarchical structure of data. This functionality allows for the sophisticated data types that are required for efficient data interchange between different programs and systems. Further, because XML enables separation of content and presentation, the content, or data, is portable across heterogeneous systems.

The XML syntax uses matching start and end tags (such as <name> and </name>) to mark up information. Information delimited by tags is called an element. Every XML document has a single root element, which is the top-level element that contains all the other elements. Elements that are contained by other elements are often referred to as sub-elements. An element can optionally have attributes, structured as name-value pairs, that are part of the element and are used to further define it.

The following sample XML file describes the contents of an address book:

<?xml version="1.0"?>

<address_book>
  <person gender="f">
    <name>Jane Doe</name>
    <address>
      <street>123 Main St.</street>
      <city>San Francisco</city>
      <state>CA</state>
      <zip>94117</zip>
    </address>
    <phone area_code=415>555-1212</phone>
  </person>
  <person gender="m">
    <name>John Smith</name>
    <phone area_code=510>555-1234</phone>
    <email>johnsmith@somewhere.com</email>
  </person>
</address_book>

The root element of the XML file is address_book. The address book currently contains two entries in the form of person elements: Jane Doe and John Smith. Jane Doe's entry includes her address and phone number; John Smith's includes his phone and email address. Note that the structure of the XML document defines the phone element as storing the area code using the area_code attribute rather than a sub-element in the body of the element. Also note that not all sub-elements are required for the person element.

How Do You Describe an XML Document?

There are two ways to describe an XML document: XML Schemas and DTDs.

XML Schemas define the basic requirements for the structure of a particular XML document. A Schema describes the elements and attributes that are valid in an XML document, and the contexts in which they are valid. In other words, a Schema specifies which tags are allowed within certain other tags, and which tags and attributes are optional. Schemas are themselves XML files.

The schema specification is a product of the World Wide Web Consortium (W3C). For detailed information on XML schemas, see http://www.w3.org/TR/xmlschema-0/.

The following example shows a schema that describes the preceding address book sample XML document:

<xsd:schema xmlns:xsd="http://www.w3.org/1999/XMLSchema">

<xsd:complexType name="personType">
  <xsd:element   name="name"     type="xsd:string"/>
  <xsd:element   name="address"  type="addressType"/>
  <xsd:element   name="phone"    type="phoneType"/>
  <xsd:element   name="email"    type="xsd:string"/>
  <xsd:attribute name="gender"   type="xsd:string"/> 
</xsd:complexType>

<xsd:complexType name="addressType">
  <xsd:element   name="street"  type="xsd:string"/>
  <xsd:element   name="city"    type="xsd:string"/>
  <xsd:element   name="state"   type="xsd:string"/>
  <xsd:element   name="zip"     type="xsd:string"/>
</xsd:complexType>

<xsd:simpleType name="phoneType">
  <xsd:restriction base="xsd:string"/>
  <xsd:attribute name="area_code" type="xsd:string"/>
</xsd:simpleType>

</xsd:schema>

You can also describe XML documents using Document Type Definition (DTD) files, a technology older than XML Schemas. DTDs are not XML files.

The following example shows a DTD that describes the preceding address book sample XML document:

<!DOCTYPE address_book [
<!ELEMENT person (name, address?, phone?, email?)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT address (street, city, state, zip)>
<!ELEMENT phone (#PCDATA)>
<!ELEMENT email (#PCDATA)>
<!ELEMENT street (#PCDATA)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT state (#PCDATA)>
<!ELEMENT zip (#PCDATA)> 
<!ATTLIST person gender CDATA #REQUIRED>
<!ATTLIST phone area_code CDATA #REQUIRED>
]>

An XML document can include a Schema or DTD as part of the document itself, reference an external Schema or DTD, or not include or reference a Schema or DTD at all. The following excerpt from an XML document shows how to reference an external DTD called address.dtd:

<?xml version=1.0?>
<!DOCTYPE address_book SYSTEM "address.dtd">
<address_book>
...

XML documents only need to be accompanied by Schema or DTD if they need to be validated by a parser or if they contain complex types. An XML document is considered valid if 1) it has an associated Schema or DTD, and 2) it complies with the constraints expressed in the associated Schema or DTD. If, however, an XML document only needs to be well-formed, then the document does not have to be accompanied by a Schema or DTD. A document is considered well-formed if it follows all the rules in the W3C Recommendation for XML 1.0. For the full XML 1.0 specification, see http://www.w3.org/XML/.

Why Use XML?

An industry typically uses data exchange methods that are meaningful and specific to that industry. With the advent of e-commerce, businesses conduct an increasing number of relationships with a variety of industries and, therefore, must develop expert knowledge of the various protocols used by those industries for electronic communication.

The extensibility of XML makes it a very effective tool for standardizing the format of data interchange among various industries. For example, when message brokers and workflow engines must coordinate transactions among multiple industries or departments within an enterprise, they can use XML to combine data from disparate sources into a format that is understandable by all parties.

What Are XSL and XSLT?

The Extensible Stylesheet Language (XSL) is a W3C standard for describing presentation rules that apply to XML documents. XSL includes both a transformation language, (XSLT), and a formatting language. These two languages function independently of each other. XSLT is an XML-based language and W3C specification that describes how to transform an XML document into another XML document, or into HTML, PDF, or some other document format.

An XSLT transformer accepts as input an XML document and an XSLT document. The template rules contained in an XSLT document include patterns that specify the XML tree to which the rule applies. The XSLT transformer scans the XML document for patterns that match the rule, and then it applies the template to the appropriate section of the original XML document.

What Are DOM and SAX?

DOM and SAX are two standard Java application programming interfaces (APIs) for parsing XML data. Both are supported by the WebLogic Server default parser. The two APIs differ in their approach to parsing, with each API having its strengths and weaknesses.

SAX

SAX stands for the Simple API for XML. It is a platform-independent language neutral standard interface for event-based XML parsing. SAX defines events that can occur as a parser is reading through an XML document, such as the start or the end of an element. Programmers provide handlers to deal with different events as the document is parsed.

Programmers that use the SAX API to parse XML documents have full control over what happens when these events occur and can, as a result, customize the parsing process extensively. For example, a programmer might decide to stop parsing an XML document as soon as the parser encounters an error that indicates that the document is invalid, rather than waiting until the entire document is parsed, thus improving performance.

The WebLogic Server default parser (the parser included in the JDK 5.0) supports SAX Version 2.0. Programmers who have created programs that use Version 1.0 of SAX to parse XML documents should read about the changes between the two versions and update their programs accordingly. For detailed information about the differences between the two versions, refer to http://www.saxproject.org/.

DOM

DOM stands for the Document Object Model. It is platform- and language-neutral interface that allows programs and scripts to access and update the content, structure, and style of XML documents dynamically. DOM reads an XML document into memory and represents it as a tree; each node of the tree represents a particular piece of data from the original XML document. Because the tree structure is a standard programming mechanism for representing data, traversing and manipulating the tree using Java is relatively easy, fast, and efficient. The main drawback, however, is that the entire XML document has to be read into memory for DOM to create the tree, which might decrease the performance of an application as the XML documents get larger.

The WebLogic Server default parser (the parser included in the JDK 5.0) supports DOM Level 2.0 Core. Programmers who have created programs that use Level 1.0 of DOM to parse XML documents should read about the changes between the two versions and update their programs accordingly. For detailed information about the differences, refer to http://www.w3.org/DOM/DOMTR.

What Is the Streaming API for XML (StAX)?

In addition to SAX and DOM, you can also parse and generate an XML document using the Streaming API for XML (StAX).

StAX is Java Community Process specification that describes a bi-directional API for reading and writing XML. StAX gives parsing control to the programmer by exposing a simple iterator-based API and an underlying stream of events; the API includes methods such as next() and hasNext() that allow the programmer to ask for the next event rather than handle the event in a callback. This gives the programmer more procedural control over the processing of the XML document.

Unlike DOM and SAX, StAX is not yet part of the Java API for XML Processing (JAXP).

Note:

Previous versions of WebLogic Server included a similar proprietary API called WebLogic XML Streaming API. This API was a basis for StAX. Although the WebLogic XML Streaming API is still accessible in this release of WebLogic Server, its functionality has been deprecated as of release 9.0 of WebLogic Server. Programmers should use StAX instead.

For detailed information about using StAX, see Chapter 4, "Using the Streaming API for XML (StAX)."

What Is JAXP?

The previous section discusses two APIs, SAX and DOM, that programmers can use to parse XML data. The Java API for XML Processing (JAXP) provides a means to get to these parsers. JAXP also defines a pluggability layer that allows programmers to use any compliant parser or transformer.

WebLogic Server implements JAXP to facilitate XML application development and the work required to move XML applications built on WebLogic Server to other Web application servers. JAXP was developed to make XML applications portable; it provides basic support for parsing and transforming XML documents through a standardized set of Java platform APIs. JAXP 1.2, included in the WebLogic Server distribution, is configured to use the default parser. Therefore, by default, XML applications built using WebLogic Server use JAXP.

The WebLogic Server distribution contains the interfaces and classes needed for JAXP 1.2. JAXP 1.2 contains explicit support for SAX Version 2 and DOM Level 2.

JAXP Packages

JAXP contains the following two packages:

javax.xml.parsers
javax.xml.transform

The javax.xml.parsers package contains the classes to parse XML data in SAX Version 2.0 and DOM Level 2.0 mode. To parse an XML document in SAX mode, a programmer first instantiates a new SaxParserFactory object with the newInstance() method. This method looks up the specific implementation of the parser to load based on a well-defined list of locations. The programmer then obtains a SaxParser instance from the SaxParserFactory and executes its parse() method, passing it the XML document to be parsed. Parsing an XML document in DOM mode is similar, except that the programmer uses the DocumentBuilder and DocumentBuilderFactory classes instead.

For detailed information on using JAXP to parse XML documents, see Parsing XML Documents.

The javax.xml.transform package contains classes to transform XML data, such as an XML document, a DOM tree, or SAX events, into a different format. The transformer classes work similarly to the parser classes. To transform an XML document, a programmer first instantiates a TransformerFactory object with the newInstance() method. This method looks up the specific implementation of the XSLT transformer to load based on a well-defined list of locations. The programmer then instantiates a new Transformer object based on a specific XSLT style sheet and executes its transform() method, passing it the XML object to transform. The XML object might be an XML file, a DOM tree, and so on.

For detailed information on using JAXP to transform XML objects, see Using JAXP to Transform XML Data.

New Feature of JAXP 1.2

Java EE 1.4 includes version 1.2 of JAXP, which is based on the 1.1 version, but adds support for W3C XML Schema. In particular, the 1.2 version specifies new property strings to enable XML Schema validation, specify the schema language being used, and the location of the particular XML Schema file that should be used when validating an XML file.

The following code snippet shows how to set Schema validation when using a SAX parser:

try {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        spf.setValidating(true);
        SAXParser sp = spf.newSAXParser();
sp.setProperty("http://java.sun.com/xml/jaxp/properties/schemaLanguage",
                       "http://www.w3.org/2001/XMLSchema");
sp.setProperty("http://java.sun.com/xml/jaxp/properties/schemaSource",
                       "http://www.example.com/Report.xsd");
        DefaultHandler dh = new DefaultHandler();
        sp.parse("http://www.wombats.com/foo.xml", dh);
    } catch(SAXException se) {
        se.printStackTrace();
    }

For more information, see http://download.oracle.com/javase/6/docs/technotes/guides/xml/jaxp/index.html.

Common Uses of XML and XSLT

How you use XML and XSLT depends on your particular business needs.

Using XML and XSLT to Separate Content from Presentation

XML and XSLT are often used in applications that support multiple client types. For example, suppose you have a Web-based application that supports both browser-based clients and Wireless Application Protocol (WAP) clients. These clients understand different markup languages, HTML and Wireless Markup Language (WML), respectively, but your application must deliver content that is appropriate for both.

To accomplish this goal, you can write your application to first produce an XML document that represents the data it is sending to the client. Then the application can transform the XML document that represents the data into HTML or WML, depending on the client's browser type. Your application can determine the client browser type by examining the User-Agent request header of an HTTP request. Once the application knows the client browser type, it uses the appropriate XSLT style sheet to transform the document into the correct markup language. See the SnoopServlet example included in the examples/servlets directory of your WebLogic Server distribution for an example of how to access this type of header information.

This method of rendering the same XML document using different markup languages in respective client types helps concentrate the effort required to support multiple client types into the development of the appropriate XSLT style sheets. Additionally, it allows your application to adapt to other clients types easily, if necessary.

For additional information about XSLT, see Other XML Specifications and Information.

XML as a Message Format for Business-to-Business Communication

In a business-to-business (B2B) environment, Company A and Company B want to exchange information about e-commerce transactions in which both are involved. Company A is a major e-commerce site. Company B is a small affiliate that sells Company A's products to a niche group of customers. When Company B sends customers to Company A, Company B is compensated in two ways: it receives, from Company A, both money and information about other customers that make the same sort of purchases as those made by the customers referred by Company B. To exchange information, Company A and Company B must agree on a data format for information that is machine readable and that operates with systems from both companies easily. XML is the logical data format to use in this scenario, but selecting this format is only the first step. The companies must then agree on the format of the XML messages to be exchanged. Because Company A has a one-to-many relationship with its affiliates, Company A must define the format of the XML messages that will be exchanged.

To define the format of XML messages, or XML documents, Company A creates two document type definitions (DTDs): one that describes the information that A will provide about customers and one that describes the information that A wants to receive about a newly affiliated company. Company B must also create two DTDs: one to process the XML documents received from Company A and one to prepare an XML document in a format that can be processed by Company A.