XML Overview

The following sections provide an overview of XML technology and the WebLogic Server XML subsystem.

Extensible Markup Language (XML) is a markup language used to describe the content and structure of data in a document. It is a simplified version of Standard Generalized Markup Language (SGML). XML is an industry standard for delivering content on the Internet. Because it provides a facility to define new tags, XML is also extensible.

Like HTML, XML uses tags to describe content. However, rather than focusing on the presentation of content, the tags in XML describe the meaning and hierarchical structure of data. This functionality allows for the sophisticated data types that are required for efficient data interchange between different programs and systems. Further, because XML enables separation of content and presentation, the content, or data, is portable across heterogeneous systems.

The XML syntax uses matching start and end tags (such as <name> and </name>) to mark up information. Information delimited by tags is called an element. Every XML document has a single root element, which is the top-level element that contains all the other elements. Elements that are contained by other elements are often referred to as sub-elements. An element can optionally have attributes, structured as name-value pairs, that are part of the element and are used to further define it.

Consider a sample XML document that describes an address book. Its root element is address_book, and it currently contains two entries in the form of person elements: Jane Doe and John Smith. Jane Doe's entry includes her address and phone number; John Smith's includes his phone number and email address. Note that the structure of the document stores the area code of the phone element in an area_code attribute rather than in a sub-element in the body of the element. Also note that not all sub-elements are required for the person element.
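To make the description concrete, the following minimal Java sketch embeds an address book document with the shape described above and reads it with the JAXP DOM API. The address_book, person, and phone elements and the area_code attribute come from the description. The name, address, and email element names and all addresses and numbers are assumptions for illustration, not the original sample. The sketch assumes a current JDK, because getTextContent() is a DOM Level 3 method.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class AddressBookDemo {
    // Hypothetical reconstruction: the name, address, and email element names
    // and all addresses and numbers are illustrative, not the original sample.
    static final String XML =
        "<?xml version=\"1.0\"?>\n"
        + "<address_book>\n"
        + "  <person>\n"
        + "    <name>Jane Doe</name>\n"
        + "    <address>101 Example Street, Anytown</address>\n"
        + "    <phone area_code=\"212\">555-0100</phone>\n"
        + "  </person>\n"
        + "  <person>\n"
        + "    <name>John Smith</name>\n"
        + "    <phone area_code=\"415\">555-0199</phone>\n"
        + "    <email>jsmith@example.com</email>\n"
        + "  </person>\n"
        + "</address_book>\n";

    /** Parses an XML string into a DOM Document, wrapping checked exceptions. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Element root = parse(XML).getDocumentElement();
        System.out.println("Root element: " + root.getTagName());

        NodeList people = root.getElementsByTagName("person");
        for (int i = 0; i < people.getLength(); i++) {
            Element person = (Element) people.item(i);
            String name = person.getElementsByTagName("name").item(0).getTextContent();
            Element phone = (Element) person.getElementsByTagName("phone").item(0);
            // The area code is read from an attribute, not from a sub-element.
            System.out.println(name + ": (" + phone.getAttribute("area_code") + ") "
                + phone.getTextContent());
            // Sub-elements are optional: only John Smith's entry has an email.
            boolean hasEmail = person.getElementsByTagName("email").getLength() > 0;
            System.out.println("  email present: " + hasEmail);
        }
    }
}
```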

Document Type Definitions (DTDs) define the basic requirements for the structure of a particular XML document. A DTD describes the elements and attributes that are valid in an XML document, and the contexts in which they are valid. In other words, a DTD specifies which tags are allowed within certain other tags, and which tags and attributes are optional.

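A DTD that describes the address book document declares each element the document can contain, such as address_book, person, and phone, along with the area_code attribute of the phone element. It also marks as optional the person sub-elements that can be omitted.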

XML Schemas are a more recent development in XML specifications and are intended to supersede DTDs. Schemas describe XML documents with more flexibility and detail than DTDs do (for example, they can constrain element content to specific data types). Unlike DTDs, they are XML documents themselves. The XML Schema specification is a World Wide Web Consortium (W3C) Recommendation and addresses many limitations of DTDs. For detailed information on XML schemas, see http://www.w3.org/TR/xmlschema-0/.

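A schema that describes the same address book document defines the same elements and attributes as the DTD. However, it expresses them in XML syntax and can additionally constrain the data type of each element and attribute.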
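The following minimal Java sketch compiles a hypothetical schema for the address book and uses it to validate documents. The schema's element names (beyond address_book, person, and phone) and the choice of which sub-elements are optional are assumptions. The sketch also assumes a current JDK: the javax.xml.validation package it uses was added in JAXP 1.3, so it is not part of the JAXP 1.1 API described later in this overview.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class SchemaCheck {
    // Hypothetical schema: person requires a name; address, phone, and email
    // are optional; phone carries an optional area_code attribute.
    static final String XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
        + "<xs:element name=\"address_book\"><xs:complexType><xs:sequence>"
        + "<xs:element name=\"person\" minOccurs=\"0\" maxOccurs=\"unbounded\">"
        + "<xs:complexType><xs:sequence>"
        + "<xs:element name=\"name\" type=\"xs:string\"/>"
        + "<xs:element name=\"address\" type=\"xs:string\" minOccurs=\"0\"/>"
        + "<xs:element name=\"phone\" minOccurs=\"0\"><xs:complexType><xs:simpleContent>"
        + "<xs:extension base=\"xs:string\">"
        + "<xs:attribute name=\"area_code\" type=\"xs:string\"/>"
        + "</xs:extension></xs:simpleContent></xs:complexType></xs:element>"
        + "<xs:element name=\"email\" type=\"xs:string\" minOccurs=\"0\"/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    // Compile the schema once; an exception here means the schema itself is broken.
    static final Schema SCHEMA = compile();

    private static Schema compile() {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("Invalid schema", e);
        }
    }

    /** Returns true if xml is well-formed and valid against the schema. */
    static boolean isValid(String xml) {
        try {
            SCHEMA.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // a validity or well-formedness error was reported
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<address_book><person><name>Jane Doe</name>"
            + "<phone area_code=\"212\">555-0100</phone></person></address_book>"));
        // Not valid: the required name element is missing.
        System.out.println(isValid(
            "<address_book><person><email>x@example.com</email></person></address_book>"));
    }
}
```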

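An XML document can include a DTD as part of the document itself, reference an external DTD or schema, or neither include nor reference one at all. An external DTD is referenced with the DOCTYPE declaration. An external schema is referenced with the xsi:schemaLocation or xsi:noNamespaceSchemaLocation attribute on the root element. For example, the following declaration in an XML document references an external DTD called address.dtd: <!DOCTYPE address_book SYSTEM "address.dtd">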

XML documents only need to be accompanied by a DTD or Schema if they need to be validated by a parser or if they contain complex types. An XML document is considered valid if 1) it has an associated DTD or Schema, and 2) it complies with the constraints expressed in the associated DTD or Schema. If, however, an XML document only needs to be well-formed, then the document does not have to be accompanied by a DTD or Schema. A document is considered well-formed if it follows all the rules in the W3C Recommendation for XML 1.0. For the full XML 1.0 specification, see http://www.w3.org/XML/.
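The difference between a well-formed document and a valid one can be sketched in Java with the standard JAXP DocumentBuilderFactory.setValidating() switch. The internal DTD subset below is a hypothetical DTD for the address book, not the original. The sketch installs an ErrorHandler because, without one, a validating parser typically just reports validity errors and keeps going.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class ValidityCheck {
    // Hypothetical internal DTD subset for the address book.
    static final String DTD =
        "<!DOCTYPE address_book [\n"
        + "  <!ELEMENT address_book (person*)>\n"
        + "  <!ELEMENT person (name, address?, phone?, email?)>\n"
        + "  <!ELEMENT name (#PCDATA)>\n"
        + "  <!ELEMENT address (#PCDATA)>\n"
        + "  <!ELEMENT phone (#PCDATA)>\n"
        + "  <!ATTLIST phone area_code CDATA #IMPLIED>\n"
        + "  <!ELEMENT email (#PCDATA)>\n"
        + "]>\n";

    /**
     * Returns true if the parser accepts xml. With validate == false only
     * well-formedness is checked; with validate == true the DTD is enforced too.
     */
    static boolean accepts(String xml, boolean validate) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(validate);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                // error(): the document breaks a DTD constraint (not valid).
                @Override public void error(SAXParseException e) throws SAXException { throw e; }
                // fatalError(): the document breaks an XML 1.0 rule (not well-formed).
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String valid = DTD + "<address_book><person><name>Jane Doe</name></person></address_book>";
        String invalid = DTD + "<address_book><person><age>40</age></person></address_book>";
        String malformed = "<address_book><person></address_book>";
        System.out.println(accepts(valid, true));      // input is well-formed and valid
        System.out.println(accepts(invalid, false));   // input is well-formed only
        System.out.println(accepts(invalid, true));    // input fails validation
        System.out.println(accepts(malformed, false)); // input is not well-formed
    }
}
```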

An industry typically uses data exchange methods that are meaningful and specific to that industry. With the advent of e-commerce, businesses conduct an increasing number of relationships with a variety of industries and, therefore, must develop expert knowledge of the various protocols used by those industries for electronic communication.

The extensibility of XML makes it a very effective tool for standardizing the format of data interchange among various industries. For example, when message brokers and workflow engines must coordinate transactions among multiple industries or departments within an enterprise, they can use XML to combine data from disparate sources into a format that is understandable by all parties.

The Extensible Stylesheet Language (XSL) is a W3C standard for describing presentation rules that apply to XML documents. XSL includes both a transformation language (XSLT) and a formatting language (XSL Formatting Objects, or XSL-FO). These two languages function independently of each other. XSLT is an XML-based language and W3C specification that describes how to transform an XML document into another XML document, or into HTML, PDF, or some other document format.

An XSLT transformer accepts as input an XML document and an XSLT style sheet. Each template rule in an XSLT style sheet includes a pattern that specifies the part of the XML tree to which the rule applies. The XSLT transformer scans the XML document for nodes that match each rule's pattern, and then applies the rule's template to the matching section of the original XML document.
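The following minimal Java sketch applies two template rules to a small address book using the standard JAXP TransformerFactory. The style sheet is hypothetical. In each xsl:template element, the match attribute is the pattern, and the body is the template written to the output for every matching node. The output is an HTML-style list serialized as XML.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XsltDemo {
    // Hypothetical style sheet with two template rules for the address book.
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
        // Rule 1: the pattern "/address_book" matches the root element.
        + "<xsl:template match=\"/address_book\">"
        + "<ul><xsl:apply-templates select=\"person\"/></ul>"
        + "</xsl:template>"
        // Rule 2: the pattern "person" matches every person element.
        + "<xsl:template match=\"person\">"
        + "<li><xsl:value-of select=\"name\"/></li>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies XSL to the given XML document and returns the result as text. */
    static String transform(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<address_book>"
            + "<person><name>Jane Doe</name></person>"
            + "<person><name>John Smith</name></person>"
            + "</address_book>";
        System.out.println(transform(xml));
    }
}
```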

DOM and SAX are two standard application programming interfaces (APIs) for parsing XML data in Java. Both are supported by the WebLogic Server built-in parser. The two APIs take different approaches to parsing, and each has its own strengths and weaknesses.

SAX stands for the Simple API for XML. It is a platform-independent, language-neutral standard interface for event-based XML parsing. SAX defines events that can occur as a parser reads through an XML document, such as the start or the end of an element. Programmers provide handlers to deal with different events as the document is parsed.

Programmers who use the SAX API to parse XML documents have full control over what happens when these events occur and can, as a result, customize the parsing process extensively. For example, a programmer might decide to stop parsing an XML document as soon as the parser encounters an error indicating that the document is invalid. This avoids parsing the rest of the document and thus improves performance.
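The following minimal Java sketch shows a SAX handler that stops early. It throws a SAXException from startElement() as soon as it sees an element outside a hypothetical list of allowed names, which aborts the parse so that no further events are delivered. The allowed-name list stands in for a real validity rule and is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SaxDemo {
    // Hypothetical rule: only these element names are acceptable.
    static final Set<String> ALLOWED = new HashSet<>(Arrays.asList(
        "address_book", "person", "name", "address", "phone", "email"));

    /** Records each element name; aborts on the first unexpected one. */
    static class CheckingHandler extends DefaultHandler {
        final List<String> seen = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts)
                throws SAXException {
            if (!ALLOWED.contains(qName)) {
                // Throwing from a callback ends the parse here;
                // no further events are delivered to this handler.
                throw new SAXException("Unexpected element: " + qName);
            }
            seen.add(qName);
        }
    }

    /** Returns the element names seen before parsing finished or stopped. */
    static List<String> scan(String xml) {
        CheckingHandler handler = new CheckingHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (SAXException e) {
            System.out.println("Stopped early: " + e.getMessage());
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.seen;
    }

    public static void main(String[] args) {
        List<String> seen = scan("<address_book><person><name>Jane Doe</name>"
            + "<age>40</age><email>jdoe@example.com</email></person></address_book>");
        System.out.println("Elements before stopping: " + seen);
    }
}
```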

The WebLogic Server built-in parser (Apache Xerces) supports SAX Version 2.0. Programmers who have created programs that use Version 1.0 of SAX to parse XML documents should read about the changes between the two versions and update their programs accordingly. For detailed information about the differences between the two versions, refer to http://www.saxproject.org/.

DOM stands for the Document Object Model. It is a platform- and language-neutral interface that allows programs and scripts to access and update the content, structure, and style of XML documents dynamically. DOM reads an XML document into memory and represents it as a tree; each node of the tree represents a particular piece of data from the original XML document. Because the tree structure is a standard programming mechanism for representing data, traversing and manipulating the tree using Java is relatively easy, fast, and efficient. The main drawback is that DOM must read the entire XML document into memory to create the tree. As XML documents grow larger, this can increase memory use and degrade application performance.
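The following minimal Java sketch works with the in-memory DOM tree. One method walks the tree recursively to count element nodes. Another updates the tree by appending an email sub-element to each person that lacks one. The element names and the placeholder value are illustrative assumptions.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class DomDemo {
    /** Reads the whole document into an in-memory DOM tree. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Walks the tree recursively and counts element nodes. */
    static int countElements(Node node) {
        int count = node.getNodeType() == Node.ELEMENT_NODE ? 1 : 0;
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            count += countElements(children.item(i));
        }
        return count;
    }

    /** Updates the tree in place: gives every person without an email one. */
    static void addMissingEmail(Document doc, String placeholder) {
        NodeList people = doc.getElementsByTagName("person");
        for (int i = 0; i < people.getLength(); i++) {
            Element person = (Element) people.item(i);
            if (person.getElementsByTagName("email").getLength() == 0) {
                Element email = doc.createElement("email");
                email.setTextContent(placeholder);
                person.appendChild(email);
            }
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<address_book>"
            + "<person><name>Jane Doe</name></person>"
            + "<person><name>John Smith</name><email>jsmith@example.com</email></person>"
            + "</address_book>");
        System.out.println("Elements before: " + countElements(doc));
        addMissingEmail(doc, "unknown@example.com");
        System.out.println("Elements after: " + countElements(doc));
    }
}
```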

The WebLogic Server built-in parser (Apache Xerces) supports DOM Level 2.0 Core. Programmers who have created programs that use Level 1.0 of DOM to parse XML documents should read about the changes between the two versions and update their programs accordingly. For detailed information about the differences, refer to http://www.w3.org/DOM/DOMTR.

In addition to SAX and DOM, you can also parse an XML document using the XML streaming API.

The WebLogic XML Streaming API provides an easy and intuitive way to parse and generate XML documents. It is based upon the SAX API, but enables procedural, stream-based handling of XML documents. You do not have to write SAX event handlers, which can get complicated when you work with complex XML documents. Your code requests each event from the stream when it is ready, rather than having the parser push events to callbacks, so the streaming API gives you more control over parsing than the SAX API does.

Note: Unlike DOM and SAX, XML Streaming is not yet part of the Java API for XML Processing (JAXP).
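This overview does not show the WebLogic XML Streaming API classes themselves. The following minimal Java sketch therefore illustrates the same pull-based style with a swapped-in API: the standard StAX API (javax.xml.stream), part of Java SE 6 and later. It is not the WebLogic XML Streaming API. The loop asks the reader for the next event when it is ready, instead of receiving callbacks.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class PullDemo {
    /** Pulls parsing events one at a time and collects the text of each name element. */
    static List<String> names(String xml) {
        List<String> result = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                int event = reader.next(); // the application decides when to advance
                if (event == XMLStreamConstants.START_ELEMENT
                        && "name".equals(reader.getLocalName())) {
                    result.add(reader.getElementText()); // consumes up to </name>
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(names("<address_book>"
            + "<person><name>Jane Doe</name></person>"
            + "<person><name>John Smith</name></person>"
            + "</address_book>"));
    }
}
```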

The preceding sections discuss two APIs, SAX and DOM, that programmers can use to parse XML data. The Java API for XML Processing (JAXP) provides a standard way to obtain these parsers. JAXP also defines a pluggability layer that allows programmers to use any compliant parser or transformer.

WebLogic Server implements JAXP to simplify XML application development and to reduce the work required to move XML applications built on WebLogic Server to other Web application servers. JAXP was developed by Sun Microsystems to make XML applications portable; it provides basic support for parsing and transforming XML documents through a standardized set of Java platform APIs. JAXP 1.1, included in the WebLogic Server distribution, is configured to use the built-in parser. Therefore, by default, XML applications that use JAXP on WebLogic Server use the built-in parser.

The WebLogic Server distribution contains the interfaces and classes needed for JAXP 1.1. JAXP 1.1 contains explicit support for SAX Version 2 and DOM Level 2. The Javadoc for JAXP is included with the WebLogic Server online reference documentation.

The javax.xml.parsers package contains the classes to parse XML data in SAX Version 2.0 and DOM Level 2.0 mode. To parse an XML document in SAX mode, a programmer first obtains a SAXParserFactory object by calling its static newInstance() method. This method looks up the specific implementation of the parser to load based on a well-defined list of locations. The programmer then obtains a SAXParser instance by calling the factory's newSAXParser() method. Finally, the programmer executes the parser's parse() method, passing it the XML document to be parsed and a handler for the parsing events. Parsing an XML document in DOM mode is similar, except that the programmer uses the DocumentBuilderFactory and DocumentBuilder classes instead, and the parse() method returns the resulting DOM tree.
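The SAX and DOM steps above can be sketched in Java as follows. Both methods count the elements with a given tag name, so the two modes can be compared on the same input. This is a minimal illustration that assumes the default JAXP implementation found by newInstance(); the element names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class JaxpParsing {
    static InputStream stream(String xml) {
        return new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
    }

    /** SAX mode: factory, then parser, then parse() with an event handler. */
    static int countWithSax(String xml, String tag) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance(); // locates an implementation
            SAXParser parser = factory.newSAXParser();
            int[] count = {0};
            parser.parse(stream(xml), new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName, String qName, Attributes atts) {
                    if (qName.equals(tag)) {
                        count[0]++;
                    }
                }
            });
            return count[0];
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** DOM mode: factory, then builder, then parse(), which returns the whole tree. */
    static int countWithDom(String xml, String tag) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(stream(xml));
            return doc.getElementsByTagName(tag).getLength();
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<address_book><person/><person/></address_book>";
        System.out.println("SAX count: " + countWithSax(xml, "person"));
        System.out.println("DOM count: " + countWithDom(xml, "person"));
    }
}
```

The SAX version only keeps a counter, while the DOM version holds the whole tree in memory before counting. This is the same trade-off between the two APIs described earlier.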

For detailed information on using JAXP to parse XML documents, see Parsing XML Documents.

The javax.xml.transform package contains classes to transform XML data, such as an XML document, a DOM tree, or SAX events, into a different format. The transformer classes work similarly to the parser classes. To transform an XML document, a programmer first instantiates a TransformerFactory object with the newInstance() method. This method looks up the specific implementation of the XSLT transformer to load based on a well-defined list of locations. The programmer then instantiates a new Transformer object based on a specific XSLT style sheet and executes its transform() method, passing it the XML object to transform. The XML object might be an XML file, a DOM tree, and so on.
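The following minimal Java sketch walks through these steps with a DOM tree as the input object. The tree is built in memory, wrapped in a DOMSource, and transformed with a hypothetical style sheet that turns each person into a contact element. The element names and the style sheet are illustrative assumptions.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class DomTransform {
    // Hypothetical style sheet: each person becomes a contact element.
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:template match=\"/address_book\"><contacts>"
        + "<xsl:for-each select=\"person\"><contact name=\"{name}\"/></xsl:for-each>"
        + "</contacts></xsl:template>"
        + "</xsl:stylesheet>";

    /** Builds a small DOM tree in memory instead of reading it from a file. */
    static Document buildTree() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("address_book");
            doc.appendChild(root);
            for (String personName : new String[] {"Jane Doe", "John Smith"}) {
                Element person = doc.createElement("person");
                Element name = doc.createElement("name");
                name.setTextContent(personName);
                person.appendChild(name);
                root.appendChild(person);
            }
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Factory, then a Transformer for one style sheet, then transform(). */
    static String transform(Document doc) {
        try {
            TransformerFactory factory = TransformerFactory.newInstance();
            Transformer transformer = factory.newTransformer(new StreamSource(new StringReader(XSL)));
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            // The input object is a DOM tree, so it is wrapped in a DOMSource.
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(buildTree()));
    }
}
```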

XML and XSLT are often used in applications that support multiple client types. For example, suppose you have a Web-based application that supports both browser-based clients and Wireless Application Protocol (WAP) clients. These clients understand different markup languages, HTML and Wireless Markup Language (WML), respectively, but your application must deliver content that is appropriate for both.

To accomplish this goal, you can write your application to first produce an XML document that represents the data it is sending to the client. Then the application can transform the XML document that represents the data into HTML or WML, depending on the client's browser type. Your application can determine the client browser type by examining the User-Agent request header of an HTTP request. Once the application knows the client browser type, it uses the appropriate XSLT style sheet to transform the document into the correct markup language. See the SnoopServlet example included in the examples/servlets directory of your WebLogic Server distribution for an example of how to access this type of header information.
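The following minimal Java sketch shows the style-sheet selection step. It looks for a few hypothetical User-Agent fragments to detect WAP clients; real device detection is more involved, and the style sheet file names are placeholders. In a servlet, the header value would come from HttpServletRequest.getHeader("User-Agent").

```java
import java.util.Locale;

public class StylesheetChooser {
    // Placeholder style sheet names; substitute your application's files.
    static final String HTML_XSL = "to-html.xsl";
    static final String WML_XSL = "to-wml.xsl";

    // Hypothetical User-Agent fragments treated as signs of a WAP client.
    static final String[] WAP_MARKERS = {"up.browser", "wap"};

    /** Picks a style sheet from the client's User-Agent header value. */
    static String chooseStylesheet(String userAgent) {
        if (userAgent != null) {
            String ua = userAgent.toLowerCase(Locale.ROOT);
            for (String marker : WAP_MARKERS) {
                if (ua.contains(marker)) {
                    return WML_XSL;
                }
            }
        }
        return HTML_XSL; // default: assume an HTML browser
    }

    public static void main(String[] args) {
        System.out.println(chooseStylesheet("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"));
        System.out.println(chooseStylesheet("Nokia7110/1.0 UP.Browser/4.1"));
    }
}
```

The chosen name would then identify the style sheet passed to TransformerFactory.newTransformer() when the application builds its Transformer.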

Rendering the same XML document in a different markup language for each client type concentrates the effort required to support multiple client types in the development of the appropriate XSLT style sheets. It also allows your application to adapt easily to other client types, if necessary.

In a business-to-business (B2B) environment, Company A and Company B want to exchange information about e-commerce transactions in which both are involved. Company A is a major e-commerce site. Company B is a small affiliate that sells Company A's products to a niche group of customers. When Company B sends customers to Company A, Company B is compensated in two ways. It receives from Company A both money and information about other customers who make the same sort of purchases as the customers referred by Company B. To exchange information, Company A and Company B must agree on a machine-readable data format that works easily with the systems of both companies. XML is the logical data format to use in this scenario, but selecting this format is only the first step. The companies must then agree on the format of the XML messages to be exchanged. Because Company A has a one-to-many relationship with its affiliates, Company A must define the format of the XML messages that will be exchanged.

To define the format of XML messages, or XML documents, Company A creates two document type definitions (DTDs): one that describes the information that A will provide about customers and one that describes the information that A wants to receive about a newly affiliated company. Company B must also create two DTDs: one to process the XML documents received from Company A and one to prepare an XML document in a format that can be processed by Company A.

WebLogic Server consolidates the XML technologies that apply to WebLogic Server and to XML applications built on WebLogic Server. The WebLogic Server XML subsystem allows customers to use standard parsers, the WebLogic FastParser, XSLT transformers, and DTDs and XML Schemas to process and convert XML files.

Table 1-1 Parsers Included With WebLogic Server

Parser	Description
Built-in	A validating parser based on the Apache Xerces parser version 1.4.4. You can use the built-in parser in either Simple API for XML (SAX) mode or Document Object Model (DOM) mode using the JAXP API.
WebLogic FastParser	A high-performance, non-validating XML parser specifically designed for processing small to medium-sized documents, such as SOAP and WSDL files associated with WebLogic Web services. The FastParser supports SAX-style parsing only. Configure WebLogic Server to use FastParser if your application mostly handles small to medium-sized XML documents (up to 10,000 elements). For detailed information on using WebLogic FastParser, refer to Using the WebLogic FastParser.


You can also use any other XML parser of your choice by using the Administration Console to configure it in the XML Registry. You can configure a single instance of WebLogic Server to use one parser for a particular application and use another parser for a different application.

WebLogic Server includes a built-in XSLT transformer that is based on the Apache Xalan XSLT transformer version 2.2. You can use this built-in XSLT transformer or other XSLT transformers in your XML application to transform XML documents into other XML documents, HTML, and so on. For more information about transforming XML documents, see Using JAXP to Transform XML Data.

The WebLogic XML Streaming API, described earlier in this overview, provides a procedural, stream-based way to parse and generate XML documents as an alternative to writing SAX event handlers.

Java API for XML Processing (JAXP) 1.1 is a Java-standard, parser-independent API for XML. For more information on JAXP, see What Is JAXP?.

Note: WebLogic Server uses the XML Registry, accessed through the Administration Console, to plug in parsers and transformers. This differs from the JAXP 1.1 specification, which specifies the use of system properties to plug in parsers and transformers.

Calling the setAttribute method (for SAX parsing) or the getAttribute method (for DOM parsing) on a ServletRequest object with the appropriate WebLogic-defined attribute names parses an XML document.

Note: The setAttribute and getAttribute methods are provided for convenience only; they are not required to parse XML from a Servlet.

The JSP tag library provides a simple tag that enables access to the built-in XSLT transformer from within a JavaServer Pages (JSP) page running on WebLogic Server. Currently, this tag supports the built-in XSLT transformer only; you cannot use the tag to transform an XML document from within a JSP page using a different transformer.

The JSP tag library is included in xmlx-tags.jar, which is installed when you install your WebLogic Server distribution.

Note: The JSP tag library is provided for convenience only; it is not required to access XSLT transformers from within a JSP.

The XML Registry simplifies administration and configuration tasks by separating these tasks from the XML application. Use the Administration Console (a graphical user interface, or GUI, for WebLogic Server administration) to configure the parsers and transformers for an instance of WebLogic Server.

Note: Each WebLogic Server domain can include any number of registries; each WebLogic Server instance in a domain can be assigned zero or one registry.

All the preceding capabilities are available if your application uses the standard Java API for XML Processing (JAXP), which is included in this version of WebLogic Server. These capabilities are for use on the server side only.

WebLogic XML supports external entity resolution through the XML Registry. External entities are chunks of text that are not literally part of an XML document, but are referenced inside the XML document. The actual text might reside anywhere: in another file on the same computer or even somewhere on the Web. An example of an external entity is a DTD file that is used to validate an XML document. To use this feature, open the Administration Console and use the XML Registry to enter the Public ID or System ID associated with the external entity.

In addition to storing external entities locally, you can configure WebLogic Server to retrieve and cache external entities from external repositories that support an HTTP interface, such as a Web server identified by a URL. You can configure WebLogic Server to cache the external entity in memory or on disk and specify how long the entity should remain cached before it is considered out of date.
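In plain JAXP code, the standard SAX EntityResolver interface plays a role similar to the XML Registry entries described above. It lets the application supply a local copy of an external entity instead of having the parser fetch it. The following minimal Java sketch keeps local copies in a map keyed by system ID. It does not model WebLogic's caching or expiry, and the system ID URL is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

public class LocalEntityResolver implements EntityResolver {
    // Locally stored entity text, keyed by system ID.
    private final Map<String, String> localCopies = new HashMap<>();

    void register(String systemId, String content) {
        localCopies.put(systemId, content);
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        String content = localCopies.get(systemId);
        if (content == null) {
            return null; // null means: let the parser fetch the entity itself
        }
        InputSource source = new InputSource(new StringReader(content));
        source.setSystemId(systemId);
        return source;
    }

    /** Parses xml, consulting the resolver for external entities such as the DTD. */
    static Document parse(String xml, EntityResolver resolver) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setEntityResolver(resolver);
            return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical system ID; the DTD is served from memory, not the network.
        String systemId = "http://dtds.example.com/address.dtd";
        LocalEntityResolver resolver = new LocalEntityResolver();
        resolver.register(systemId, "<!ENTITY company \"Example Corp\">");
        Document doc = parse("<!DOCTYPE address_book SYSTEM \"" + systemId + "\">"
            + "<address_book>&company;</address_book>", resolver);
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```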

The examples are located in the WL_HOME\samples\server\src\examples\xml directory, where WL_HOME refers to the top-level WebLogic Platform directory.

For detailed instructions on how to build and run the examples, open the Web page WL_HOME\samples\server\src\examples\xml\package-summary.html in your browser.

To edit XML files, use the BEA XML Editor, an entirely Java-based stand-alone XML editor. It is a simple, user-friendly tool for creating and editing XML files. It displays XML file contents both as a hierarchical XML tree structure and as raw XML code, so you can choose to edit an XML document either as a tree or as raw XML.

BEA XML Editor can validate XML code according to a specified DTD or XML schema.

To learn more about XML, see XML Reference, which provides links to online courses, tutorials, and other information.