An XML Primer
What is XML?
XML, eXtensible Markup Language, is the standard way to identify and describe data on the web. It is widely implementable and easy to deploy.
XML is a human-readable, machine-understandable, general syntax for describing hierarchical data. It is applicable to a wide range of uses, including databases, e-commerce, Java, web development, and searching.
Custom tags enable the definition, transmission, validation, and interpretation of data between applications and between organizations.
XML elements are marked up with start tags and end tags. A start tag is the element name enclosed in angle brackets, for example, <author>; the matching end tag adds a slash before the name, </author>. You can name tags whatever you want, provided the name is a legal XML name.
Attributes add more information about each XML element. Attributes can be used to describe how the data is encoded or represented, to indicate where the links or external resources are located, to identify and call external processes such as applets and servlets, and to identify element instances in documents so that you can find them rapidly during a document search. Attributes also can provide extra information about the XML document's content or other elements. Attributes are not used to specify fonts, colors, or other style or formatting.
XML attributes are held in the start tag of a start-end tag pair, or in an empty tag. They are name-value pairs. For example, <image src="adx10.jpg" ada_txt="XSQL Description"/>. Attribute values must always be in quotes.
Attributes and their content are defined in DTDs or XML Schema.
An example of an element in an XML document is <author>charles kopman</author>. The element includes the start tag, the end tag, and the text between them.
Every XML document must have a root or top-level element. This is the outermost element and contains all the other elements. You can select any name for your root element. In HTML, the root element was always <html>....</html>.
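As a minimal sketch of the terms introduced so far, the following Python snippet uses the standard library's xml.etree.ElementTree module to parse a small document and inspect its root element, a child element, and an attribute. The <book> root and its isbn attribute are hypothetical names chosen for illustration; only the <author> element comes from the text above.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: <book> and its isbn attribute are illustrative names.
doc = '<book isbn="0000000000"><author>charles kopman</author></book>'

root = ET.fromstring(doc)      # the root (top-level) element
author = root.find("author")   # first child element named "author"

print(root.tag)                # name of the root element
print(root.attrib["isbn"])     # attribute value taken from the start tag
print(author.text)             # text between the start and end tags
```

The parser returns the root element directly, which reflects the rule that every other element is contained in it.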
Entities are virtual storage units that can contain graphics, text, sound files, or binary data. In XML, entities are represented by character strings. You can create your own entities. Five internal entities are already defined for you to use in XML:
less than sign, < uses &lt;
greater than sign, > uses &gt;
ampersand, & uses &amp;
single quote or apostrophe, ' uses &apos;
double quotation mark, " uses &quot;
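The five predefined entities can be tried out with Python's standard xml.sax.saxutils helpers. This is only a sketch: escape() and unescape() handle &, <, and > by default, so the two quote characters are supplied through an explicit mapping. The sample string and the QUOTES and UNQUOTES names are invented for illustration.

```python
from xml.sax.saxutils import escape, unescape

# escape() handles &, <, and > by default; quotes need an explicit mapping.
QUOTES = {'"': "&quot;", "'": "&apos;"}
UNQUOTES = {entity: char for char, entity in QUOTES.items()}

raw = 'if a < b && b > c then say "it\'s ok"'
encoded = escape(raw, QUOTES)          # replace special characters with entities
decoded = unescape(encoded, UNQUOTES)  # turn the entities back into characters

print(encoded)
print(decoded == raw)
```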
Basic Rules for XML Markup
Here are the basic XML markup rules:
- Declare XML first. Your XML document should begin with an XML declaration, which identifies the document as XML and states the version of the W3C XML recommendation it follows. When present, the declaration must be the very first thing in the document. For example, <?xml version="1.0" standalone="yes" ?>
- Use one top-level tag, or "document element" or root tag. All tags and XML content are contained in (under) this top-level tag.
- Every element must have a start and end tag, for example, <author>charles kopman</author>
- Empty elements must end with />. For example, <author name="charles kopman" />
- Ensure that your elements are well nested in the correct hierarchy.
- All attribute values must be quoted with single or double quotes. For example, <author name = "charles kopman">
- Every XML tag begins with <. Every XML entity begins with & and ends with ;
- Remember the five internal entities, listed in the previous paragraph. See "Entity".
- You can tell the XML Parser which character encoding you are using in the XML declaration at the top of your XML document. For example: <?xml version="1.0" encoding="ISO-8859-9" ?>
For a comprehensive list of encodings, see: http://www.isi.edu/in-notes/iana/assignments/character-sets
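Several of the rules above can be checked mechanically, because a conforming parser rejects documents that break them. The sketch below assumes Python's standard xml.etree.ElementTree parser; the is_well_formed helper and the sample documents are hypothetical and exist only to illustrate the rules.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

samples = {
    "good": '<?xml version="1.0"?><book><author name="charles kopman"/></book>',
    "two roots": "<a/><b/>",                                 # only one top-level tag allowed
    "missing end tag": "<book><author>charles kopman</book>",
    "bad nesting": "<a><b></a></b>",                         # elements overlap
    "unquoted attribute": "<author name=kopman/>",
    "unknown entity": "<a>&nbsp;</a>",                       # not one of the five predefined
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(f"{label}: {ok}")
```

Only the first sample is accepted; each of the others violates one of the listed rules.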
W3C XML Recommendations
The World Wide Web Consortium (W3C) XML recommendations are an ever-growing set of interlocking specifications.
- XML 1.0 was recommended by W3C in February 1998. It has resulted in numerous additional W3C Working Groups, a Java Platform Extension Expert Group, and the XML conversion of numerous data interchange standards such as Electronic Data Interchange (EDI). The next version of HTML will be an XML application known as XHTML.
- XML Namespaces. Another W3C recommendation aimed at removing element ambiguity in multi-namespace well-formed XML applications.
- XML Query. The W3C standards effort to specify a query language for XML documents.
- XML Schema. The W3C standards effort to add simple and complex datatypes to XML documents and replace the functionality of DTDs with an XML Schema definition XML document.
- XSL. XSL consists of two W3C recommendations:
- XSL Transformations for transforming one XML document into another
- XSL Formatting Objects for specifying the presentation of an XML document
- XPath. XPath is the W3C recommendation that specifies the data model and grammar for navigating an XML document utilized by XSL-T, XLink, and XML Query.
- XPointer. XPointer is the W3C recommendation that specifies the identification of individual entities or fragments within an XML document using XPath navigation. This W3C proposed recommendation is defined at http://www.w3.org/TR/WD-xptr
- DOM. The W3C recommendation that specifies the Document Object Model of an XML Document including APIs for programmatic access.
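To make the namespace and XPath entries above more concrete, here is a small sketch using Python's standard xml.etree.ElementTree, which supports only a limited subset of XPath. The document, both namespace URIs, and the prefixes o and b are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document mixing two vocabularies; both namespace URIs are made up.
doc = """
<order xmlns="http://example.com/ns/order"
       xmlns:book="http://example.com/ns/book">
  <title>Rush order</title>
  <book:item>
    <book:title>XML Primer</book:title>
  </book:item>
</order>
"""

# Prefixes used in the path expressions; they need not match those in the document.
ns = {"o": "http://example.com/ns/order", "b": "http://example.com/ns/book"}
root = ET.fromstring(doc)

# The same local name, title, is unambiguous once it is namespace-qualified.
order_title = root.find("o:title", ns).text      # direct child of the root
book_title = root.find(".//b:title", ns).text    # any descendant of the root

print(order_title)
print(book_title)
```

Both elements are named title, yet each path selects exactly one of them because the namespace is part of the element's name.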
The XML family of applications is illustrated in Figure A-1.
Figure A-1 The XML Family of Applications ('Including XML-Based Standards')
The following bullets describe XML features:
- Data Exchange, From Structured to Unstructured Data: "XML enables a universal standard syntax for exchanging data. XML specifies a rigorous, text-based way to represent the structure inherent in data so that it can be authored and interpreted unambiguously. Its simple, tag-based approach leverages developers' familiarity of HTML but provides a flexible, extensible mechanism that can handle the gamut of "digital assets" from highly structured database records to unstructured documents and everything in between." (W3C)
- SGML Was Designed Specifically for Documents - XML is Designed for Potentially Any Data: The SGML markup language was specifically designed for documents. Web-centric XML is like a toolkit that can be used to write other languages. It is not designed for documents only. Any data that can be described as a tree can be represented in XML.
- A Class of Data Objects - A Restricted Form of SGML: www.oasis-open.org describes XML as follows: "... XML, describes a class of data objects called XML documents and partially describes the behavior of computer programs which process them. XML is an application profile or restricted form of SGML, the Standard Generalized Markup Language. By construction, XML documents are conforming SGML documents."
- XML's Many Uses...: A W3C.org press release describes XML as follows: "... XML is primarily intended to meet the requirements of large-scale Web content providers for industry-specific markup, vendor-neutral data exchange, media-independent publishing, one-on-one marketing, workflow management in collaborative authoring environments, and the processing of Web documents by intelligent clients..."
- Metadata. XML is also finding use in certain metadata applications.
- Internationalization. "XML is fully internationalized for both European and Asian languages, with all conforming processors required to support the Unicode character set in both its UTF-8 and UTF-16 encoding... Its primary use is for electronic publishing and data interchange..."
- Parsed or Unparsed Storage Entities: From the W3C.org XML specification proposal: "... XML documents are made up of storage units called entities, which contain either parsed or unparsed data. Parsed data is made up of characters, some of which form the character data in the document, and some of which form markup. Markup encodes a description of the document's storage layout and logical structure..."
- XML Processor Reads XML Documents. "... XML provides a mechanism to impose constraints on the storage layout and logical structure. A software module called an XML processor is used to read XML documents and provide access to their content and structure. It is assumed that an XML processor is doing its work on behalf of another module, called the application...."
- Open Internet Standard. XML is gaining wide industry support from vendors such as IBM, Sun, Microsoft, Netscape, SAP, Cisco, and others as a platform- and application-neutral format for exchanging information.
Although this manual is not intended to expound on XML syntax, a brief overview of some key XML topics is presented here. You can refer to the many excellent resources listed in "Additional XML Resources" for more information on XML syntax.
How XML Differs From HTML
Like HTML, XML derives from SGML (Standard Generalized Markup Language). XML is a subset of SGML, optimized for delivery over the web.
Unlike HTML, which tags elements in web pages for presentation by a browser, for example, <b>Oracle</b>, XML tags elements as data, for example, <company>Oracle</company>. You can use XML to give context to words and values in web pages, identifying them as data instead of simple textual or numeric elements.
The following example is in HTML code. This is followed by the corresponding XML example. The examples show employee data:
- Employee number
- Name
- Job
- Salary
HTML Example 1
XML Example 1
In the XML code, note the addition of XML data tags and the nested structure of the elements.
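The nested structure can be illustrated with a short Python sketch. The element names are taken from the DTD shown later in this appendix; the employee values are invented, and the document here is a stand-in for illustration, not the original example listing.

```python
import xml.etree.ElementTree as ET

# Hypothetical employee data; the values are invented for illustration.
doc = """
<EMPLIST>
  <EMP>
    <EMPNO>1001</EMPNO>
    <ENAME>EXAMPLE_A</ENAME>
    <JOB>CLERK</JOB>
    <SAL>1000</SAL>
  </EMP>
  <EMP>
    <EMPNO>1002</EMPNO>
    <ENAME>EXAMPLE_B</ENAME>
    <JOB>ANALYST</JOB>
    <SAL>2000</SAL>
  </EMP>
</EMPLIST>
"""

# Each EMP element nests one child element per field, so the tag names
# themselves say what each value means.
employees = [
    {field.tag: field.text for field in emp}
    for emp in ET.fromstring(doc).findall("EMP")
]

for emp in employees:
    print(emp["EMPNO"], emp["ENAME"], emp["JOB"], emp["SAL"])
```

Because the tags name the data, a program can pick out the salary or job of each employee without guessing from position or formatting.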
HTML Example 2
Consider the following HTML that uses tags to present data in a row of a table. Is "Java Programming" the name of a book? A university course? A job skill? You cannot be sure by looking at the data and tags on the page. Imagine a computer program trying to figure this out!
The analogous XML example has the same data, but the tags indicate what information the data represents, not how it should be displayed. It's clear that "Java Programming" is the Name of a Course, but it says nothing about how it should be displayed.
XML Example 2
XML and HTML both represent information:
- XML represents information content
- HTML represents the presentation of that content
Summary of Differences Between XML and HTML
Table 29-2 summarizes how XML differs from HTML.
Table 29-2 XML and HTML Differences
| XML | HTML |
| --- | --- |
| Represents information content | Represents the presentation of the content |
| Has user-defined tags | Has a fixed set of tags defined by standards |
| All start tags must have end tags | Current browsers relax this requirement on tags <P>, <B>, and so on |
| Attributes must be single or double quoted | Current browsers relax this requirement on tags |
| Empty elements are clearly indicated | Current browsers relax this requirement on tags |
| Element names and attributes are case sensitive | Element names and attributes are not case sensitive |
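The difference in strictness can be observed with Python's standard library, which includes both a lenient HTML parser and a strict XML parser. This is a sketch only; the TagCollector class and the sample markup are hypothetical, and html.parser stands in for the behavior of a browser.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

class TagCollector(HTMLParser):
    """Record the start tags and attributes an HTML parser reports."""
    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        self.seen.append((tag, attrs))

snippet = "<P align=left>first<P>second"   # unquoted value, no end tags

# The HTML parser accepts the snippet and folds tag names to lower case.
collector = TagCollector()
collector.feed(snippet)
collector.close()
print(collector.seen)

# The XML parser rejects the same snippet as not well-formed.
try:
    ET.fromstring(snippet)
    xml_ok = True
except ET.ParseError:
    xml_ok = False
print(xml_ok)

# XML names are case sensitive, so <Name> cannot be closed by </name>.
try:
    ET.fromstring("<Name>Java Programming</name>")
    case_ok = True
except ET.ParseError:
    case_ok = False
print(case_ok)
```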
Presenting XML Using Stylesheets
A key advantage of using XML as a datasource is that its presentation (such as a web page) can be separate from its structure and content.
- Presentation. Applied stylesheets define its presentation. XML data can be presented in various ways, both in appearance and organization, simply by applying different stylesheets.
- Structure and content: XML data defines the structure and content.
Consider these ways of using stylesheets:
- A different interface can be presented to different users based on user profile, browser type, or other criteria by defining a different stylesheet for each presentation style.
- Stylesheets can be used to transform XML data into a format tailored to the specific application that receives and processes the data.
Stylesheets can be applied on the server or client side. The XSL-Transformation Processor (XSL-T Processor) transforms one XML format into XML or any other text-based format such as HTML. Oracle XML Parsers all include an XSL-T Processor.
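The separation of content from presentation can be imitated in a few lines of Python. This sketch does not use XSL or an XSL-T Processor (Python's standard library does not include one); it only shows the idea of applying two different presentations to the same XML data. The <course> document and both render functions are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical course data; element names are illustrative.
doc = "<course><name>Java Programming</name><dept>EECS</dept></course>"
course = ET.fromstring(doc)

def render_html_row(elem):
    """One presentation: the child values as cells of an HTML table row."""
    cells = "".join(f"<td>{escape(child.text or '')}</td>" for child in elem)
    return f"<tr>{cells}</tr>"

def render_plain_text(elem):
    """Another presentation: the same data as labeled plain text."""
    return ", ".join(f"{child.tag}: {child.text}" for child in elem)

html_row = render_html_row(course)
plain = render_plain_text(course)
print(html_row)
print(plain)
```

The XML data is never modified; only the function applied to it changes, which is the role a stylesheet plays.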
How to apply stylesheets and use the XSL-T Processor is described in the following sections:
eXtensible Stylesheet Language (XSL)
eXtensible Stylesheet Language (XSL), the stylesheet language of XML, is another W3C recommendation. XSL provides for stylesheets that allow you to do the following:
- Transform XML into XML or other text-based formats such as HTML
- Rearrange or filter data
- Convert XML data to XML that conforms with another Document Type Definition (DTD), an important capability for allowing different applications to share data
Cascading Style Sheets (CSS)
Cascading Style Sheets (CSS1), a W3C specification, was originally created for use with HTML documents. With CSS you can control the following aspects of your document's appearance:
- Spacing. Element visibility, position, and size
- Colors and background
- Fonts and text
CSS2 was published by W3C in 1998 and includes the following additional features:
- System fonts and colors
- Automatic numbering
- Paged media support
- Tables and aural stylesheets
'Cascading' here implies that you can apply several stylesheets to any one document. On a web page deploying CSS, for example, three stylesheets can apply or cascade:
- User's preferred stylesheet takes precedence
- Cascading stylesheet
- Browser stylesheet
Extensibility and Document Type Definitions (DTD)
Another key advantage of XML over HTML is that it leaves the specification of the tags and how they can be used to the user. You construct an XML document by creating your own tags to represent the meaning and structure of your data.
Tags may be defined by using them in an XML document or they may be formally defined in a Document Type Definition (DTD). As your data or application requirements change, you can change or add tags to reflect new data contexts or extend existing ones.
The following is a simple DTD for the previous XML example:
<!ELEMENT EMPLIST (EMP)*>
<!ELEMENT EMP (EMPNO, ENAME, JOB, SAL)>
<!ELEMENT EMPNO (#PCDATA)>
<!ELEMENT ENAME (#PCDATA)>
<!ELEMENT JOB (#PCDATA)>
<!ELEMENT SAL (#PCDATA)>
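A DOCTYPE declaration wrapping these element declarations is needed only when the DTD is embedded in the XML document itself. When the DTD is kept in a separate file, the file contains just the declarations, and the XML document refers to it from its DOCTYPE declaration.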
Well-Formed and Valid XML Documents
Well-Formed XML Documents
An XML document that conforms to the structural and notational rules of XML is considered well-formed. A well-formed XML document does not have to contain or reference a DTD, but rather can implicitly define its data elements and their relationships. Well-formed XML documents must follow these rules:
- Document starts with the XML declaration, for example, <?xml version="1.0"?> (XML 1.0 recommends the declaration but does not strictly require it)
- All elements must be contained within one root element
- All elements must be nested in a tree structure without overlapping
- All non-empty elements must have start and end tags
Valid XML Documents
Well-formed XML documents that also conform to a DTD are considered valid. When an XML document containing or referencing a DTD is parsed, the parsing application can verify that the XML conforms to the DTD and is therefore valid, which allows the parsing application to process it with the assurance that all data elements and their content follow rules defined in the DTD.
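The distinction between well-formed and valid can be sketched in Python. The standard library parser checks well-formedness only and does not validate against a DTD, so the function below is a hand-written check that mirrors the EMPLIST DTD shown earlier; it is not a general DTD validator, and it ignores details a real validator would check, such as undeclared attributes and stray text between elements. The function name and the sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET

EMP_FIELDS = ["EMPNO", "ENAME", "JOB", "SAL"]   # order required by the DTD

def conforms_to_emplist_dtd(text):
    """Hand-written check mirroring the EMPLIST DTD; not a general validator."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        return False                      # not even well-formed
    if root.tag != "EMPLIST":
        return False
    for emp in root:
        if emp.tag != "EMP":              # EMPLIST may contain only EMP
            return False
        if [child.tag for child in emp] != EMP_FIELDS:
            return False                  # wrong, missing, or reordered fields
        if any(len(child) for child in emp):
            return False                  # #PCDATA fields hold text only
    return True

valid = ("<EMPLIST><EMP><EMPNO>1</EMPNO><ENAME>A</ENAME>"
         "<JOB>CLERK</JOB><SAL>1000</SAL></EMP></EMPLIST>")
well_formed_only = "<EMPLIST><EMP><ENAME>A</ENAME><EMPNO>1</EMPNO></EMP></EMPLIST>"
empty_list = "<EMPLIST/>"                 # (EMP)* allows zero employees

print(conforms_to_emplist_dtd(valid))
print(conforms_to_emplist_dtd(well_formed_only))
print(conforms_to_emplist_dtd(empty_list))
```

The second document parses without error, so it is well-formed, but it is not valid because its EMP element does not match the content model in the DTD.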
Why Use XML?
XML, the internet standard for information exchange, is useful for the following reasons:
- Solves Data Interchange Problems. It facilitates efficient data communication where the data:
- Is in many different formats and platforms
- Must be sent to different platforms
- Must appear in different formats and presentations
- Must appear on many different end devices
In short, XML solves application data interchange problems. Businesses can now easily communicate with other businesses and workflow components using XML. See Chapters 2 through 20 for more information and examples of how XML solves data interchange problems.
Web-based applications can be built using XML which helps the interoperation of web, database, networking, and middleware. XML provides a structured format for data transmission.
- Industry-Specific Data Objects are Being Designed Using XML. Organizations such as OAG and XML.org are using XML to standardize data objects on a per-industry basis. This will further facilitate business-to-business data interchange.
- Database-Resident Data is Easily Accessed, Converted, and Stored Using XML. Large amounts of business data reside in relational and object-relational tables because the database provides excellent data queryability, scalability, and availability. This data can be converted from XML format and stored in object-relational and pure relational database structures, or generated from them back to XML for further processing.
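The round trip between relational tables and XML described in the last bullet can be sketched with Python's standard library. This example uses an in-memory SQLite database purely as a stand-in for a relational database; the emp and emp_copy tables and their rows are invented for illustration, and the element names follow the EMPLIST DTD used elsewhere in this appendix.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (empno INTEGER, ename TEXT, job TEXT, sal INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?, ?)",
    [(1001, "EXAMPLE_A", "CLERK", 1000), (1002, "EXAMPLE_B", "ANALYST", 2000)],
)

# Relational rows -> XML: one <EMP> element per row, one child per column.
cursor = conn.execute("SELECT empno, ename, job, sal FROM emp ORDER BY empno")
columns = [d[0].upper() for d in cursor.description]
emplist = ET.Element("EMPLIST")
for row in cursor:
    emp = ET.SubElement(emplist, "EMP")
    for column, value in zip(columns, row):
        ET.SubElement(emp, column).text = str(value)
xml_text = ET.tostring(emplist, encoding="unicode")

# XML -> relational rows: parse the document and insert into a second table.
conn.execute("CREATE TABLE emp_copy (empno INTEGER, ename TEXT, job TEXT, sal INTEGER)")
for emp in ET.fromstring(xml_text).findall("EMP"):
    conn.execute(
        "INSERT INTO emp_copy VALUES (?, ?, ?, ?)",
        [emp.findtext(tag) for tag in ("EMPNO", "ENAME", "JOB", "SAL")],
    )
round_trip = conn.execute("SELECT * FROM emp_copy ORDER BY empno").fetchall()

print(xml_text)
print(round_trip)
```

The XML document carries everything needed to rebuild the rows, so it can serve as the exchange format between two databases that share nothing but the element names.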
Other Advantages of Using XML
Other advantages of using XML include the following:
- You can make your own tags
- Many tools support XML
- XML is an Open standard
- XML parsers built according to the Open standard are interoperable parsers and avoid vendor lock-in. XML specifications are widely industry approved.
- In XML the presentation of data is separate from the data's structure and content. It is simple to customize the data's presentation. See "Presenting XML Using Stylesheets" and "Customizing Your Data Presentation".
- Universality -- XML enables the representation of data in a manner that can be self-describing and thus universally used
- Persistence -- Through the materialization of data as an XML document this data can persist while still allowing programmatic access and manipulation.
- Platform and application independence
Additional XML Resources
Here are some additional resources for information about XML:
- The Oracle XML Handbook, Ben Chang, Mark Scardina, et al., Oracle Press
- Building Oracle XML Applications, Steve Muench, O'Reilly
- XML Bible, Elliotte Rusty Harold, IDG Books Worldwide
- XML Unleashed, Morrison et al., SAMS
- Building XML Applications, St.Laurent and Cerami, McGraw-Hill
- Building Web Sites with XML, Michael Floyd, Prentice Hall PTR
- Building Corporate Portals with XML, Finkelstein and Aiken, McGraw-Hill
- XML in a Nutshell, O'Reilly
- Learning XML - (Guide to) Creating Self-Describing Data, Ray, O'Reilly
- http://www.w3.org/TR lists W3C technical reports
- http://www.w3.org/xml is the W3C XML activity overview page
- http://www.xml.com includes the latest industry news about XML
- http://www.xml-cml.org has information about Chemical Markup Language (CML). CML documents can be viewed and edited on the Jumbo browser.
- http://www.loc.gov/ead/ Encoded Archival Description (EAD) information developed for the US Library of Congress.
- http://www.docuverse.com/xlf for information about Extensible Log Format (XLF) a project to convert log files into XML log files to simplify log file administration.
- http://www.w3.org/Math for information about MathML which provides a way of interchanging equations between applications.
- http://www.naa.org Newspaper Association of America (naa) classified ads format for easy exchange of classified ads.
- http://www.w3.org/AudioVideo/ for information about Synchronized Multimedia Integration Language (SMIL).
- Oracle is an official sponsor of OASIS. OASIS, http://www.oasis-open.org, is the world's largest independent, non-profit organization dedicated to the standardization of XML applications. It promotes participation from all industry, and brings together both competitors and overlapping standards bodies.