| Oracle9i XML Database Developer's Guide - Oracle XML DB Release 2 (9.2) Part Number A96620-01 | 
This appendix describes introductory information about the W3C XPath Recommendation, Namespace Recommendation, and the Information Set (infoset). It contains the following sections:
XML Path Language (XPath) is a language for addressing parts of an XML document, designed to be used by both XSLT and XPointer. It can be used as a searching or query language as well as in hypertext linking. Parts of this brief XPath primer are extracted from the W3C XPath Recommendation.
XPath also facilitates the manipulation of strings, numbers, and booleans.
XPath uses a compact, non-XML syntax to facilitate use of XPath in URIs and XML attribute values. XPath operates on the abstract, logical structure of an XML document, rather than its surface syntax. It gets its name from its use of a path notation as in URLs for navigating through the hierarchical structure of an XML document.
In addition to its use for addressing, XPath is also designed so that it has a natural subset that can be used for matching, that is, testing whether or not a node matches a pattern. This use of XPath is described in the W3C XSLT Recommendation.
XPath models an XML document as a tree of nodes. There are different types of nodes, including element nodes, attribute nodes, and text nodes. XPath defines a way to compute a string-value for each type of node. Some types of nodes also have names. XPath fully supports XML Namespaces. Thus, the name of a node is modeled as a pair consisting of a local part and a possibly null namespace URI; this is called an expanded-name. The data model is described in detail in "XPath 1.0 Data Model". A summary of XML Namespaces is provided in "Introducing the W3C Namespaces in XML Recommendation".
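The primary syntactic construct in XPath is the expression. An expression matches the production Expr. An expression is evaluated to yield an object, which has one of the following four basic types: node-set (an unordered collection of nodes without duplicates), boolean (true or false), number (a floating-point number), or string (a sequence of UCS characters).
Expression evaluation occurs with respect to a context. XSLT and XPointer specify how the context is determined for XPath expressions used in XSLT and XPointer respectively. The context consists of the following: a node (the context node), a pair of non-zero positive integers (the context position and the context size), a set of variable bindings, a function library, and the set of namespace declarations in scope for the expression.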
Both XSLT and XPointer extend XPath by defining additional functions; some of these functions operate on the four basic types; others operate on additional data types defined by XSLT and XPointer.
The variable bindings, function library, and namespace declarations used to evaluate a subexpression are always the same as those used to evaluate the containing expression.
The context node, context position, and context size used to evaluate a subexpression are sometimes different from those used to evaluate the containing expression. Several kinds of expressions change the context node; only predicates change the context position and context size. When the evaluation of a kind of expression is described, it will always be explicitly stated if the context node, context position, and context size change for the evaluation of subexpressions; if nothing is said about the context node, context position, and context size, they remain unchanged for the evaluation of subexpressions of that kind of expression.
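The effect of a predicate on context position and context size can be sketched with Python's standard xml.etree.ElementTree module, which implements only a small subset of XPath. The sample document is hypothetical and not taken from this guide.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document used only for illustration.
doc = ET.fromstring(
    "<doc>"
    "<div><para>a</para><para>b</para></div>"
    "<div><para>c</para><para>d</para><para>e</para></div>"
    "</doc>"
)

# Inside the predicate, position and size are counted separately
# among the para children of each div, not across the whole document.
first_paras = [p.text for p in doc.findall("div/para[1]")]
last_paras = [p.text for p in doc.findall("div/para[last()]")]

print(first_paras)
print(last_paras)
```

Each div contributes its own first and last para, which is why the position is said to be relative to the node-set being filtered by the predicate.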
The grammar specified here applies to the attribute value after XML 1.0 normalization. So, for example, if the grammar uses the character <, this must not appear in the XML source as < but must be quoted according to XML 1.0 rules by, for example, entering it as &lt;.
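As a minimal sketch of this normalization rule, the following Python fragment parses a hypothetical element (the names rule and test are assumptions) whose attribute carries an XPath expression. The XML source contains &lt;, and the expression seen after parsing contains the < character.

```python
import xml.etree.ElementTree as ET

# Hypothetical element whose "test" attribute holds an XPath expression.
# In the XML source the less-than sign must be written as &lt;.
rule = ET.fromstring('<rule test="price &lt; 10"/>')

# After XML attribute-value normalization the expression contains "<".
expression = rule.get("test")
print(expression)
```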
Within expressions, literal strings are delimited by single or double quotation marks, which are also used to delimit XML attributes. To avoid a quotation mark in an expression being interpreted by the XML processor as terminating the attribute value, the quotation mark can be entered as a character reference (&quot; or &apos;). Alternatively, the expression can use single quotation marks if the XML attribute is delimited with double quotation marks, or vice versa.
One important kind of expression is a location path. A location path is the 'route' to be taken. The route can consist of directions and several steps, each step being separated by a '/'.
A location path selects a set of nodes relative to the context node. The result of evaluating an expression that is a location path is the node-set containing the nodes selected by the location path.
Location paths can recursively contain expressions used to filter sets of nodes. A location path matches the production LocationPath.
Expressions are parsed by first dividing the character string to be parsed into tokens and then parsing the resulting sequence of tokens. Whitespace can be freely used between tokens.
Although location paths are not the most general grammatical construct in the XPath language (a LocationPath is a special case of an Expr), they are the most important construct.
Every location path can be expressed using a straightforward but rather verbose syntax. There are also a number of syntactic abbreviations that allow common cases to be expressed concisely. The next sections:
Table C-1 lists examples of location paths using the unabbreviated syntax.
Table C-2 lists examples of location paths using abbreviated syntax.
The most important abbreviation is that child:: can be omitted from a location step. In effect, child is the default axis. For example, a location path div/para is short for child::div/child::para.
There is also an abbreviation for attributes: attribute:: can be abbreviated to @.
For example, a location path para[@type="warning"] is short for child::para[attribute::type="warning"] and so selects para children with a type attribute with value equal to warning.
// is short for /descendant-or-self::node()/. For example, //para is short for /descendant-or-self::node()/child::para and so will select any para element in the document (even a para element that is a document element will be selected by //para, since the document element node is a child of the root node).
div//para is short for div/descendant-or-self::node()/child::para and so will select all para descendants of div children.
A location step of . is short for self::node(). This is particularly useful in conjunction with //. For example, the location path .//para is short for:
self::node()/descendant-or-self::node()/child::para
and so will select all para descendant elements of the context node.
Similarly, a location step of .. is short for parent::node(). For example, ../title is short for:
parent::node()/child::title
and so will select the title children of the parent of the context node.
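Several of the abbreviated forms above can be tried with Python's standard xml.etree.ElementTree module. This is only a sketch: ElementTree accepts the abbreviated syntax but not the unabbreviated axis syntax (child::, attribute::), and the sample document is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document used only for illustration.
doc = ET.fromstring(
    "<doc>"
    "<title>Guide</title>"
    "<div>"
    "<para type='warning'>Hot</para>"
    "<para>Plain</para>"
    "<section><para>Nested</para></section>"
    "</div>"
    "<para>Top</para>"
    "</doc>"
)
div = doc.find("div")

# div/para: para children of the div children of the context node.
child_paras = [p.text for p in doc.findall("div/para")]

# para[@type='warning']: para children whose type attribute equals "warning".
warnings = [p.text for p in div.findall("para[@type='warning']")]

# .//para: all para descendants of the context node.
all_paras = [p.text for p in doc.findall(".//para")]

# div//para: all para descendants of div children.
div_paras = [p.text for p in doc.findall("div//para")]

print(child_paras, warnings, all_paras, div_paras)
```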
AbbreviatedAbsoluteLocationPath ::= '//' RelativeLocationPath
AbbreviatedRelativeLocationPath ::= RelativeLocationPath '//' Step
AbbreviatedStep ::= '.' | '..'
AbbreviatedAxisSpecifier ::= '@'?
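There are two kinds of location path: relative location paths and absolute location paths. A relative location path consists of a sequence of one or more location steps separated by /. An absolute location path consists of / optionally followed by a relative location path.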
For example, child::div/child::para selects the para element children of the div element children of the context node, or, in other words, the para element grandchildren that have div parents.
Location path provides a means to search for target nodes. Here is the general syntax for location path:
axisname :: nodetest expr1 expr2 ...
LocationPath ::= RelativeLocationPath | AbsoluteLocationPath
AbsoluteLocationPath ::= '/' RelativeLocationPath? | AbbreviatedAbsoluteLocationPath
RelativeLocationPath ::= Step | RelativeLocationPath '/' Step | AbbreviatedRelativeLocationPath
XPath operates on an XML document as a tree. This section describes how XPath models an XML document as a tree. The relationship of this model to the XML documents operated on by XPath must conform to the XML Namespaces Recommendation.
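The tree contains nodes. There are seven types of node: root nodes, element nodes, text nodes, attribute nodes, namespace nodes, processing instruction nodes, and comment nodes.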
The root node is the root of the tree. It does not occur except as the root of the tree. The element node for the document element is a child of the root node. The root node also has as children processing instruction and comment nodes for processing instructions and comments that occur in the prolog and after the end of the document element. The string-value of the root node is the concatenation of the string-values of all text node descendants of the root node in document order. The root node does not have an expanded-name.
There is an element node for every element in the document. An element node has an expanded-name computed by expanding the QName of the element specified in the tag in accordance with the XML Namespaces Recommendation. The namespace URI of the element's expanded-name will be null if the QName has no prefix and there is no applicable default namespace.
| Note: In the notation of Appendix A.3 of  | 
The children of an element node are the element nodes, comment nodes, processing instruction nodes and text nodes for its content. Entity references to both internal and external entities are expanded. Character references are resolved. The string-value of an element node is the concatenation of the string-values of all text node descendants of the element node in document order.
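The string-value of an element node can be approximated in Python by joining the text of all descendants in document order. The helper name string_value and the sample element are assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET

def string_value(element):
    """Concatenate all descendant text in document order (hypothetical helper)."""
    return "".join(element.itertext())

# The entity reference &amp; is expanded by the parser before the text is joined.
para = ET.fromstring("<para>Fish &amp; <b>chips</b> today</para>")
value = string_value(para)
print(value)
```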
Unique IDs. An element node may have a unique identifier (ID). This is the value of the attribute that is declared in the DTD as type ID. No two elements in a document may have the same unique ID. If an XML processor reports two elements in a document as having the same unique ID (which is possible only if the document is invalid) then the second element in document order must be treated as not having a unique ID.
Character data is grouped into text nodes. As much character data as possible is grouped into each text node: a text node never has an immediately following or preceding sibling that is a text node. The string-value of a text node is the character data. A text node always has at least one character of data. Each character within a CDATA section is treated as character data. Thus, <![CDATA[<]]> in the source document will be treated the same as &lt;. Both will result in a single < character in a text node in the tree. Thus, a CDATA section is treated as if the <![CDATA[ and ]]> were removed and every occurrence of < and & were replaced by &lt; and &amp; respectively.
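The equivalence between a CDATA section and escaped character data can be checked with a short Python sketch. It relies on the standard xml.etree.ElementTree parser, and the element name a is an arbitrary choice.

```python
import xml.etree.ElementTree as ET

# The same single character, written as a CDATA section and as an entity reference.
from_cdata = ET.fromstring("<a><![CDATA[<]]></a>").text
from_entity = ET.fromstring("<a>&lt;</a>").text

# Adjacent character data and CDATA content end up in one run of text.
merged = ET.fromstring("<a>x <![CDATA[< y &]]> z</a>").text

print(repr(from_cdata), repr(from_entity), repr(merged))
```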
Each element node has an associated set of attribute nodes; the element is the parent of each of these attribute nodes; however, an attribute node is not a child of its parent element.
| Note: This is different from the DOM, which does not treat the element bearing an attribute as the parent of the attribute. | 
Elements never share attribute nodes: if one element node is not the same node as another element node, then none of the attribute nodes of the one element node will be the same node as the attribute nodes of another element node.
A defaulted attribute is treated the same as a specified attribute. If an attribute was declared for the element type in the DTD, but the default was declared as #IMPLIED, and the attribute was not specified on the element, then the element's attribute set does not contain a node for the attribute.
Some attributes, such as xml:lang and xml:space, have the semantics that they apply to all elements that are descendants of the element bearing the attribute, unless overridden with an instance of the same attribute on another descendant element. However, this does not affect where attribute nodes appear in the tree: an element has attribute nodes only for attributes that were explicitly specified in the start-tag or empty-element tag of that element or that were explicitly declared in the DTD with a default value.
An attribute node has an expanded-name and a string-value. The expanded-name is computed by expanding the QName specified in the tag in the XML document in accordance with the XML Namespaces Recommendation. The namespace URI of the attribute's name will be null if the QName of the attribute does not have a prefix.
An attribute node has a string-value. The string-value is the normalized value as specified by the XML Recommendation. An attribute whose normalized value is a zero-length string is not treated specially: it results in an attribute node whose string-value is a zero-length string.
There are no attribute nodes corresponding to attributes that declare namespaces.
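Python's xml.etree.ElementTree follows the same convention for namespace declarations, which makes it convenient for a small illustration. The element below is hypothetical. It carries one namespace declaration, one attribute with a zero-length value, and one prefixed attribute.

```python
import xml.etree.ElementTree as ET

# Hypothetical element used only for illustration.
item = ET.fromstring(
    "<item xmlns:edi='http://ecommerce.org/schema' "
    "code='' edi:taxClass='exempt'/>"
)

# The xmlns:edi declaration does not appear among the attributes.
attribute_names = sorted(item.attrib)

# A zero-length value still yields an attribute, with an empty string-value.
code_value = item.get("code")

print(attribute_names, repr(code_value))
```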
Each element has an associated set of namespace nodes, one for each distinct namespace prefix that is in scope for the element (including the xml prefix, which is implicitly declared by the XML Namespaces Recommendation) and one for the default namespace if one is in scope for the element. The element is the parent of each of these namespace nodes; however, a namespace node is not a child of its parent element.
Elements never share namespace nodes: if one element node is not the same node as another element node, then none of the namespace nodes of the one element node will be the same node as the namespace nodes of another element node. This means that an element will have a namespace node for every attribute on the element whose name starts with xmlns:; for every attribute on an ancestor element whose name starts with xmlns:, unless the element itself or a nearer ancestor redeclares the prefix; and for an xmlns attribute, if the element or some ancestor has an xmlns attribute, and the value of the xmlns attribute for the nearest such element is non-empty.
A namespace node has an expanded-name: the local part is the namespace prefix (this is empty if the namespace node is for the default namespace); the namespace URI is always NULL.
The string-value of a namespace node is the namespace URI that is being bound to the namespace prefix; if it is relative, it must be resolved just like a namespace URI in an expanded-name.
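ElementTree does not expose namespace nodes, but the set of bindings in scope for each element can be reconstructed from the parser's start-ns and end-ns events. The following is a minimal sketch under that assumption. The function name in_scope_namespaces and the sample document are hypothetical, and results are keyed by element tag, so the sketch suits only documents whose tags are distinct.

```python
import io
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"

def in_scope_namespaces(xml_text):
    """Map each element tag to the prefix-to-URI bindings in scope for it."""
    stack = [("xml", XML_NS)]  # the xml prefix is implicitly declared
    result = {}
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, payload in ET.iterparse(source, events=("start", "start-ns", "end-ns")):
        if event == "start-ns":
            stack.append(payload)      # payload is (prefix, uri)
        elif event == "end-ns":
            stack.pop()                # a declaration goes out of scope
        else:
            bindings = dict(stack)     # nearer declarations override earlier ones
            if bindings.get("") == "":
                del bindings[""]       # xmlns="" means no default namespace
            result[payload.tag] = bindings
    return result

scopes = in_scope_namespaces("<a xmlns:p='urn:p'><b xmlns='urn:d'/><e/></a>")
print(scopes)
```

The empty string stands for the default namespace, mirroring the empty local part of the corresponding namespace node.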
There is a processing instruction node for every processing instruction, except for any processing instruction that occurs within the document type declaration. A processing instruction has an expanded-name: the local part is the processing instruction's target; the namespace URI is NULL. The string-value of a processing instruction node is the part of the processing instruction following the target and any whitespace. It does not include the terminating ?>.
| Note: The XML declaration is not a processing instruction. Therefore, there is no processing instruction node corresponding to the XML declaration. | 
There is a comment node for every comment, except for any comment that occurs within the document type declaration. The string-value of a comment node is the content of the comment, not including the opening <!-- or the closing -->. A comment node does not have an expanded-name.
For every type of node, there is a way of determining a string-value for a node of that type. For some types of node, the string-value is part of the node; for other types of node, the string-value is computed from the string-value of descendant nodes.
| Note: For element nodes and root nodes, the string-value of a node is not the same as the string returned by the DOM nodeValue method. | 
Some types of node also have an expanded-name, which is a pair consisting of a local part, which is a string, and a namespace URI, which is either null or a string.
Two expanded-names are equal if they have the same local part, and either both have a null namespace URI or both have non-null namespace URIs that are equal.
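The equality rule can be sketched in Python. ElementTree stores an element name as {uri}local, or as the bare local part when there is no namespace. The helper expanded_name and the urn:example:x namespace are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

def expanded_name(tag):
    """Split an ElementTree tag into (namespace URI or None, local part)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return None, tag

# Two different prefixes bound to the same URI, plus an unqualified element.
root = ET.fromstring(
    "<r xmlns:a='urn:example:x' xmlns:b='urn:example:x'>"
    "<a:item/><b:item/><item/>"
    "</r>"
)
names = [expanded_name(child.tag) for child in root]
print(names)
```

The first two names compare equal even though their prefixes differ. The third has the same local part but a null namespace URI, so it is not equal to them.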
There is an ordering, document order, defined on all the nodes in the document corresponding to the order in which the first character of the XML representation of each node occurs in the XML representation of the document after expansion of general entities. Thus, the root node will be the first node.
Element nodes occur before their children. Thus, document order orders element nodes in order of the occurrence of their start-tag in the XML (after expansion of entities). The attribute nodes and namespace nodes of an element occur before the children of the element. The namespace nodes are defined to occur before the attribute nodes.
The relative order of namespace nodes is implementation-dependent.
The relative order of attribute nodes is implementation-dependent.
Reverse document order is the reverse of document order.
Root nodes and element nodes have an ordered list of child nodes. Nodes never share children: if one node is not the same node as another node, then none of the children of the one node will be the same node as any of the children of another node.
Every node other than the root node has exactly one parent, which is either an element node or the root node. A root node or an element node is the parent of each of its child nodes. The descendants of a node are the children of the node and the descendants of the children of the node.
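The recursive definition of descendants translates directly into code. The sketch below covers element nodes only and uses a hypothetical sample document. The nodes are collected in document order, since each element appears before its children.

```python
import xml.etree.ElementTree as ET

def descendants(node):
    """Children of the node, each followed by its own descendants."""
    found = []
    for child in node:
        found.append(child)
        found.extend(descendants(child))
    return found

doc = ET.fromstring("<a><b><c/><d/></b><e><f/></e></a>")
tags = [n.tag for n in descendants(doc)]
print(tags)
```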
XPath 2.0 is the result of joint work by the W3C XSL and XML Query Working Groups. XPath 2.0 is a language derived from both XPath 1.0 and XQuery. The XPath 2.0 and XQuery 1.0 Working Drafts are generated from a common source. These languages are closely related and share much of the same expression syntax and semantics. The two Working Drafts are identical in places.
XPath is designed to be embedded in a host language such as XSLT or XQuery. XPath has a natural subset that can be used for matching, that is, testing whether or not a node matches a pattern.
XQuery Version 1.0 contains XPath Version 2.0 as a subset. Any expression that is syntactically valid and executes successfully in both XPath 2.0 and XQuery 1.0 will return the same result in both languages.
XPath also depends on and is closely related to the following specifications:
The basic building block of XPath is the expression. The language provides several kinds of expressions which may be constructed from keywords, symbols, and operands. In general, the operands of an expression are other expressions.
XPath is a functional language which allows various kinds of expressions to be nested with full generality. It is also a strongly-typed language in which the operands of various expressions, operators, and functions must conform to designated types.
Like XML, XPath is a case-sensitive language. All keywords in XPath use lower-case characters.
Expr ::= OrExpr | AndExpr | ForExpr | QuantifiedExpr | IfExpr | GeneralComp | ValueComp | NodeComp | OrderComp | InstanceofExpr | RangeExpr | AdditiveExpr | MultiplicativeExpr | UnionExpr | IntersectExceptExpr | UnaryExpr | CastExpr | PathExpr
Software modules must recognize tags and attributes which they are designed to process, even in the face of "collisions" occurring when markup intended for some other software package uses the same element type or attribute name.
Document constructs should have universal names, whose scope extends beyond their containing document. The W3C Namespaces in XML Recommendation describes the mechanism, XML namespaces, which accomplishes this.
An XML namespace is a collection of names, identified by a URI reference [RFC2396], which are used in XML documents as element types and attribute names. XML namespaces differ from the "namespaces" conventionally used in computing disciplines in that the XML version has internal structure and is not, mathematically speaking, a set. These issues are discussed in the W3C Namespace Recommendation, appendix, "A. The Internal Structure of XML Namespaces".
URI references which identify namespaces are considered identical when they are exactly the same character-for-character. Note that URI references which are not identical in this sense may in fact be functionally equivalent. Examples include URI references which differ only in case, or which are in external entities which have different effective base URIs.
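A short Python sketch shows the character-for-character comparison in practice. The two namespace names below are hypothetical and differ only in case, so a namespace-aware parser treats them as different namespaces.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace names that differ only in the case of "NS".
root = ET.fromstring(
    "<r xmlns:a='http://example.org/NS' xmlns:b='http://example.org/ns'>"
    "<a:x/><b:x/>"
    "</r>"
)
first, second = root[0].tag, root[1].tag
print(first, second, first == second)
```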
Names from XML namespaces may appear as qualified names, which contain a single colon, separating the name into a namespace prefix and a local part.
The prefix, which is mapped to a URI reference, selects a namespace. The combination of the universally managed URI namespace and the document's own namespace produces identifiers that are universally unique. Mechanisms are provided for prefix scoping and defaulting.
URI references can contain characters not allowed in names, so cannot be used directly as namespace prefixes. Therefore, the namespace prefix serves as a proxy for a URI reference. An attribute-based syntax described in the following section is used to declare the association of the namespace prefix with a URI reference; software which supports this namespace proposal must recognize and act on these declarations and prefixes.
Many of the nonterminals in the productions in this specification are defined not here but in the W3C XML Recommendation. When nonterminals defined here have the same names as nonterminals defined in the W3C XML Recommendation, the productions here in all cases match a subset of the strings matched by the corresponding ones there.
In this document's productions, the NSC is a "Namespace Constraint", one of the rules that documents conforming to this specification must follow.
All Internet domain names used in examples, with the exception of w3.org, are selected at random and should not be taken as having any import.
A namespace is declared using a family of reserved attributes. Such an attribute's name must either be xmlns or have xmlns: as a prefix. These attributes, like any other XML attributes, can be provided directly or by default.
[1] NSAttName ::= PrefixedAttName | DefaultAttName
[2] PrefixedAttName ::= 'xmlns:' NCName [NSC: Leading "XML" ]
[3] DefaultAttName ::= 'xmlns'
[4] NCName ::= (Letter | '_') (NCNameChar)* /* An XML Name, minus the ":" */
[5] NCNameChar ::= Letter | Digit | '.' | '-' | '_' | CombiningChar | Extender
The attribute's value, a URI reference, is the namespace name identifying the namespace. The namespace name, to serve its intended purpose, should have the characteristics of uniqueness and persistence. It is not a goal that it be directly usable for retrieval of a schema (if any exists). An example of a syntax that is designed with these goals in mind is that for Uniform Resource Names [RFC2141]. However, it should be noted that ordinary URLs can be managed in such a way as to achieve these same goals.
If the attribute name matches PrefixedAttName, then the NCName gives the namespace prefix, used to associate element and attribute names with the namespace name in the attribute value in the scope of the element to which the declaration is attached. In such declarations, the namespace name may not be empty.
If the attribute name matches DefaultAttName, then the namespace name in the attribute value is that of the default namespace in the scope of the element to which the declaration is attached. In such a default declaration, the attribute value may be empty. Default namespaces and overriding of declarations are discussed in section "Applying Namespaces to Elements and Attributes" of the W3C Namespace Recommendation.
The following example namespace declaration associates the namespace prefix edi with the namespace name http://ecommerce.org/schema:
<x xmlns:edi='http://ecommerce.org/schema'>
  <!-- the "edi" prefix is bound to http://ecommerce.org/schema for the "x" element and contents -->
</x>
Prefixes beginning with the three-letter sequence x, m, l, in any case combination, are reserved for use by XML and XML-related specifications.
In XML documents conforming to the W3C Namespace Recommendation, some names (constructs corresponding to the nonterminal Name) may be given as qualified names, defined as follows:
[6] QName ::= (Prefix ':')? LocalPart
[7] Prefix ::= NCName
[8] LocalPart ::= NCName
The Prefix provides the namespace prefix part of the qualified name, and must be associated with a namespace URI reference in a namespace declaration.
The LocalPart provides the local part of the qualified name. Note that the prefix functions only as a placeholder for a namespace name. Applications should use the namespace name, not the prefix, in constructing names whose scope extends beyond the containing document.
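The QName production and the advice to use the namespace name rather than the prefix can be sketched as two small Python helpers. Both split_qname and expand are hypothetical names. The sketch checks only the colon structure, not the NCName character classes, and leaves default-namespace handling out.

```python
def split_qname(qname):
    """Split a qualified name into (prefix or None, local part)."""
    prefix, colon, local = qname.rpartition(":")
    if not colon:
        return None, qname            # unprefixed name
    if ":" in prefix or not prefix or not local:
        raise ValueError(f"not a QName: {qname!r}")
    return prefix, local

def expand(qname, bindings):
    """Replace the prefix with the namespace name it is bound to."""
    prefix, local = split_qname(qname)
    if prefix is None:
        return None, local            # default namespaces are not handled here
    return bindings[prefix], local    # KeyError models an undeclared prefix

bindings = {"edi": "http://ecommerce.org/schema"}
print(expand("edi:price", bindings))
```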
In XML documents conforming to the W3C Namespace Recommendation, element types are given as qualified names, as follows:
[9] STag ::= '<' QName (S Attribute)* S? '>' [NSC: Prefix Declared ]
[10] ETag ::= '</' QName S? '>' [NSC: Prefix Declared ]
[11] EmptyElemTag ::= '<' QName (S Attribute)* S? '/>' [NSC: Prefix Declared ]
The following is an example of a qualified name serving as an element type:
<x xmlns:edi='http://ecommerce.org/schema'>
  <!-- the 'price' element's namespace is http://ecommerce.org/schema -->
  <edi:price units='Euro'>32.18</edi:price>
</x>
Attributes are either namespace declarations or their names are given as qualified names:
[12] Attribute ::= NSAttName Eq AttValue | QName Eq AttValue [NSC: Prefix Declared ]
The following is an example of a qualified name serving as an attribute name:
<x xmlns:edi='http://ecommerce.org/schema'>
  <!-- the 'taxClass' attribute's namespace is http://ecommerce.org/schema -->
  <lineItem edi:taxClass="exempt">Baby food</lineItem>
</x>
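The two preceding examples can be combined and inspected with Python's standard xml.etree.ElementTree module. In this sketch the query uses the prefix e rather than edi, an arbitrary choice that illustrates that only the namespace name matters, not the prefix.

```python
import xml.etree.ElementTree as ET

EDI = "http://ecommerce.org/schema"

doc = ET.fromstring(
    "<x xmlns:edi='http://ecommerce.org/schema'>"
    "<edi:price units='Euro'>32.18</edi:price>"
    "<lineItem edi:taxClass='exempt'>Baby food</lineItem>"
    "</x>"
)

# The prefix in the query need not match the prefix used in the document.
price = doc.find("e:price", {"e": EDI})
line_item = doc.find("lineItem")

# The unprefixed units attribute has no namespace; taxClass is qualified.
tax_class = line_item.get("{%s}taxClass" % EDI)
print(price.tag, price.attrib, tax_class)
```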
The namespace prefix, unless it is xml or xmlns, must have been declared in a namespace declaration attribute in either the start-tag of the element where the prefix is used or in an ancestor element, that is, an element in whose content the prefixed markup occurs.
The prefix xml is by definition bound to the namespace name http://www.w3.org/XML/1998/namespace.
The prefix xmlns is used only for namespace bindings and is not itself bound to any namespace name.
This constraint may lead to operational difficulties in the case where the namespace declaration attribute is provided, not directly in the XML document entity, but through a default attribute declared in an external entity. Such declarations may not be read by software which is based on a non-validating XML processor.
Many XML applications, presumably including namespace-sensitive ones, fail to require validating processors. For correct operation with such applications, namespace declarations must be provided either directly or through default attributes declared in the internal subset of the DTD.
Element names and attribute types are also given as qualified names when they appear in declarations in the DTD:
[13] doctypedecl ::= '<!DOCTYPE' S QName (S ExternalID)? S? ('[' (markupdecl | PEReference | S)* ']' S?)? '>'
[14] elementdecl ::= '<!ELEMENT' S QName S contentspec S? '>'
[15] cp ::= (QName | choice | seq) ('?' | '*' | '+')?
[16] Mixed ::= '(' S? '#PCDATA' (S? '|' S? QName)* S? ')*' | '(' S? '#PCDATA' S? ')'
[17] AttlistDecl ::= '<!ATTLIST' S QName AttDef* S? '>'
[18] AttDef ::= S (QName | NSAttName) S AttType S DefaultDecl
The namespace declaration is considered to apply to the element where it is specified and to all elements within the content of that element, unless overridden by another namespace declaration with the same NSAttName part:
<?xml version="1.0"?>
<!-- all elements here are explicitly in the HTML namespace -->
<html:html xmlns:html='http://www.w3.org/TR/REC-html40'>
  <html:head><html:title>Frobnostication</html:title></html:head>
  <html:body><html:p>Moved to
    <html:a href='http://frob.com'>here.</html:a></html:p></html:body>
</html:html>
Multiple namespace prefixes can be declared as attributes of a single element, as shown in this example:
<?xml version="1.0"?>
<!-- both namespace prefixes are available throughout -->
<bk:book xmlns:bk='urn:loc.gov:books'
         xmlns:isbn='urn:ISBN:0-395-36341-6'>
  <bk:title>Cheaper by the Dozen</bk:title>
  <isbn:number>1568491379</isbn:number>
</bk:book>
A default namespace is considered to apply to the element where it is declared (if that element has no namespace prefix), and to all elements with no prefix within the content of that element. If the URI reference in a default namespace declaration is empty, then unprefixed elements in the scope of the declaration are not considered to be in any namespace. Note that default namespaces do not apply directly to attributes.
<?xml version="1.0"?>
<!-- elements are in the HTML namespace, in this case by default -->
<html xmlns='http://www.w3.org/TR/REC-html40'>
  <head><title>Frobnostication</title></head>
  <body><p>Moved to
    <a href='http://frob.com'>here</a>.</p></body>
</html>

<?xml version="1.0"?>
<!-- unprefixed element types are from "books" -->
<book xmlns='urn:loc.gov:books'
      xmlns:isbn='urn:ISBN:0-395-36341-6'>
  <title>Cheaper by the Dozen</title>
  <isbn:number>1568491379</isbn:number>
</book>
A larger example of namespace scoping:
<?xml version="1.0"?>
<!-- initially, the default namespace is "books" -->
<book xmlns='urn:loc.gov:books'
      xmlns:isbn='urn:ISBN:0-395-36341-6'>
  <title>Cheaper by the Dozen</title>
  <isbn:number>1568491379</isbn:number>
  <notes>
    <!-- make HTML the default namespace for some commentary -->
    <p xmlns='urn:w3-org-ns:HTML'>
      This is a <i>funny</i> book!
    </p>
  </notes>
</book>
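The scoping in this larger example can be made visible by listing the expanded name of every element. The sketch below uses Python's standard xml.etree.ElementTree module, which writes an expanded name as {namespace-name}local-part. The XML declaration and comments are omitted for brevity.

```python
import xml.etree.ElementTree as ET

book = ET.fromstring("""<book xmlns='urn:loc.gov:books'
      xmlns:isbn='urn:ISBN:0-395-36341-6'>
  <title>Cheaper by the Dozen</title>
  <isbn:number>1568491379</isbn:number>
  <notes>
    <p xmlns='urn:w3-org-ns:HTML'>This is a <i>funny</i> book!</p>
  </notes>
</book>""")

# Elements in document order, each with the namespace that is in effect for it.
tags = [element.tag for element in book.iter()]
for tag in tags:
    print(tag)
```

The p element and its i child pick up the overriding default namespace, while notes, which lies outside that declaration's scope, stays in the books namespace.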
The default namespace can be set to the empty string. This has the same effect, within the scope of the declaration, as there being no default namespace.
<?xml version='1.0'?>
<Beers>
  <!-- the default namespace is now that of HTML -->
  <table xmlns='http://www.w3.org/TR/REC-html40'>
    <th><td>Name</td><td>Origin</td><td>Description</td></th>
    <tr>
      <!-- no default namespace inside table cells -->
      <td><brandName xmlns="">Huntsman</brandName></td>
      <td><origin xmlns="">Bath, UK</origin></td>
      <td>
        <details xmlns=""><class>Bitter</class><hop>Fuggles</hop>
          <pro>Wonderful hop, light alcohol, good summer beer</pro>
          <con>Fragile; excessive variance pub to pub</con>
        </details>
      </td>
    </tr>
  </table>
</Beers>
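A shortened version of this example, parsed with Python's standard xml.etree.ElementTree module, shows the effect of the empty default namespace. Only one table cell is kept in this sketch.

```python
import xml.etree.ElementTree as ET

# Shortened form of the example above: one row, one cell.
beers = ET.fromstring(
    "<Beers>"
    "<table xmlns='http://www.w3.org/TR/REC-html40'>"
    "<tr><td><brandName xmlns=''>Huntsman</brandName></td></tr>"
    "</table>"
    "</Beers>"
)

# Elements in document order with their expanded names.
tags = [element.tag for element in beers.iter()]
for tag in tags:
    print(tag)
```

Beers and brandName carry no namespace, whereas table, tr, and td are in the HTML namespace.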
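In XML documents conforming to this specification, no tag may contain two attributes which have identical names, or which have qualified names with the same local part and with prefixes that have been bound to namespace names that are identical.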
For example, each of the bad start-tags is illegal in the following:
<!-- http://www.w3.org is bound to n1 and n2 -->
<x xmlns:n1="http://www.w3.org"
   xmlns:n2="http://www.w3.org" >
  <bad a="1" a="2" />
  <bad n1:a="1" n2:a="2" />
</x>
However, each of the following is legal, the second because the default namespace does not apply to attribute names:
<!-- http://www.w3.org is bound to n1 and is the default -->
<x xmlns:n1="http://www.w3.org"
   xmlns="http://www.w3.org" >
  <good a="1" b="2" />
  <good a="1" n1:a="2" />
</x>
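The uniqueness rule can be exercised with a namespace-aware parser. The sketch below assumes that the expat parser behind Python's xml.etree.ElementTree rejects both kinds of duplicate attribute. The helper name parses is hypothetical, and all three bindings are declared together only to keep the test strings short.

```python
import xml.etree.ElementTree as ET

def parses(xml_text):
    """Return True if the namespace-aware parser accepts the document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

DECLS = (
    'xmlns:n1="http://www.w3.org" '
    'xmlns:n2="http://www.w3.org" '
    'xmlns="http://www.w3.org"'
)

# Two attributes with identical names.
same_name = parses(f'<x {DECLS}><bad a="1" a="2"/></x>')

# Same local part, different prefixes bound to the same namespace name.
same_expanded = parses(f'<x {DECLS}><bad n1:a="1" n2:a="2"/></x>')

# The default namespace does not apply to the unprefixed attribute a.
default_ignored = parses(f'<x {DECLS}><good a="1" n1:a="2"/></x>')

print(same_name, same_expanded, default_ignored)
```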
In XML documents which conform to the W3C Namespace Recommendation, element types and attribute names must match the production for QName and must satisfy the "Namespace Constraints".
An XML document conforms to this specification if all other tokens in the document which are required, for XML conformance, to match the XML production for Name, match this specification's production for NCName.
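The effect of conformance is that in such a document all element types and attribute names contain either zero or one colon, and no entity names, PI targets, or notation names contain any colons.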
Strictly speaking, attribute values declared to be of types ID, IDREF(S), ENTITY(IES), and NOTATION are also Names, and thus should be colon-free.
However, the declared type of attribute values is only available to processors which read markup declarations, for example validating processors. Thus, unless the use of a validating processor has been specified, there can be no assurance that the contents of attribute values have been checked for conformance to this specification.
The following W3C Namespace Recommendation Appendixes are not included in this primer:
The W3C XML Information Set specification defines an abstract data set called the XML Information Set (Infoset). It provides a consistent set of definitions for use in other specifications that need to refer to the information in a well-formed XML document.
The primary criterion for inclusion of an information item or property has been that of expected usefulness in future specifications. It does not constitute a minimum set of information that must be returned by an XML processor.
An XML document has an information set if it is well-formed and satisfies the namespace constraints described in the following section.
There is no requirement for an XML document to be valid in order to have an information set.
Information sets may be created by methods (not described in this specification) other than parsing an XML document. See "Synthetic Infosets".
An XML document's information set consists of a number of information items; the information set for any well-formed XML document will contain at least a document information item and several others. An information item is an abstract description of some part of an XML document: each information item has a set of associated named properties. In this specification, the property names are shown in square brackets, [thus]. The types of information item are listed in section 2.
The XML Information Set does not require or favor a specific interface or class of interfaces. This specification presents the information set as a modified tree for the sake of clarity and simplicity, but there is no requirement that the XML Information Set be made available through a tree structure; other types of interfaces, including (but not limited to) event-based and query-based interfaces, are also capable of providing information conforming to the XML Information Set.
The terms "information set" and "information item" are similar in meaning to the generic terms "tree" and "node", as they are used in computing. However, the former terms are used in this specification to reduce possible confusion with other specific data models. Information items do not map one-to-one with the nodes of the DOM or the "tree" and "nodes" of the XPath data model.
In this specification, the words "must", "should", and "may" assume the meanings specified in [RFC2119], except that the words do not appear in uppercase.
XML 1.0 documents that do not conform to the W3C Namespace Recommendation, though technically well-formed, are not considered to have meaningful information sets. That is, this specification does not define an information set for documents that have element or attribute names containing colons that are used in other ways than as prescribed by the W3C Namespace Recommendation.
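A minimal sketch of such a document, assuming Python's standard `xml.parsers.expat` module: a name with an undeclared prefix is accepted when the parser ignores namespaces and rejected when namespace processing is enabled. The helper name `parses` and the sample documents are hypothetical.

```python
import xml.parsers.expat as expat


def parses(data, namespace_aware):
    """Return True if data parses; namespace processing is optional."""
    # Passing a separator string switches on Expat's namespace processing.
    separator = " " if namespace_aware else None
    parser = expat.ParserCreate(namespace_separator=separator)
    try:
        parser.Parse(data, True)
        return True
    except expat.ExpatError:
        return False


# The prefix "a" is never declared: well-formed XML 1.0, but it does not
# conform to the Namespace Recommendation.
UNBOUND = b'<a:b/>'
# The same element with the prefix declared.
BOUND = b'<a:b xmlns:a="http://example.org/ns"/>'

print(parses(UNBOUND, namespace_aware=False))
print(parses(UNBOUND, namespace_aware=True))
print(parses(BOUND, namespace_aware=True))
```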
Also, the XML Infoset specification does not define an information set for documents which use relative URI references in namespace declarations. This is in accordance with the decision of the W3C XML Plenary Interest Group described in Relative Namespace URI References in the W3C Namespace Recommendation.
The value of a namespace name property is the normalized value of the corresponding namespace attribute; no additional URI escaping is applied to it by the processor.
An information set describes its XML document with entity references already expanded, that is, represented by the information items corresponding to their replacement text. However, there are various circumstances in which a processor may not perform this expansion. An entity may not be declared, or may not be retrievable. A non-validating processor may choose not to read all declarations, and even if it does, may not expand all external entities. In these cases an unexpanded entity reference information item is used to represent the entity reference.
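The expanded view can be seen in a small sketch, assuming Python's standard `xml.etree.ElementTree` module and a hypothetical document with one internal entity. The parser reports the replacement text in place of the reference, which is analogous to (but not the same as) the character information items that the Infoset would contain.

```python
import xml.etree.ElementTree as ET

# The internal subset declares the entity "co"; the body references it.
DOC = (
    '<!DOCTYPE note [<!ENTITY co "Example Corp">]>'
    '<note>Sold by &co;.</note>'
)

root = ET.fromstring(DOC)
# The reference has been replaced by the entity's replacement text.
expanded_text = root.text
print(expanded_text)
```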
The values of all properties in the Infoset take account of the end-of-line normalization described in the XML Recommendation, 2.11 "End-of-Line Handling".
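End-of-line normalization can be observed with a short sketch, assuming Python's standard `xml.etree.ElementTree` module and a hypothetical byte string that mixes line-ending styles. The input is passed as bytes so that nothing other than the XML parser touches the line endings.

```python
import xml.etree.ElementTree as ET

# Mixed line endings: CR LF, a lone CR, and a plain LF.
RAW = b'<d>one\r\ntwo\rthree\nfour</d>'

text = ET.fromstring(RAW).text
# Each line break is reported to the application as a single LF.
print(repr(text))
```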
Several information items have a base URI or declaration base URI property. These are computed according to XML Base. Note that retrieval of a resource may involve redirection at the parser level (for example, in an entity resolver) or at a lower level; in this case the base URI is the final URI used to retrieve the resource after all redirection.
The value of these properties does not reflect any URI escaping that may be required for retrieval of the resource, but it may include escaped characters if these were specified in the document, or returned by a server in the case of redirection.
In some cases (such as a document read from a string or a pipe) the rules in XML Base may result in a base URI being application dependent. In these cases this specification does not define the value of the base URI or declaration base URI property.
When resolving relative URIs the base URI property should be used in preference to the values of xml:base attributes; they may be inconsistent in the case of Synthetic Infosets.
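For a parsed document, the XML Base computation can be sketched as follows. This is a simplified illustration that assumes Python's standard `urllib.parse.urljoin` and `xml.etree.ElementTree`; the function name `base_uris`, the sample document, and the starting document URI are hypothetical, and redirection and external entities are not handled.

```python
from urllib.parse import urljoin
import xml.etree.ElementTree as ET

# The xml prefix is always bound to this namespace name.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def base_uris(elem, inherited, out=None):
    """Map each element name to its base URI (names are unique in DOC)."""
    if out is None:
        out = {}
    own = elem.get(XML_BASE)
    # An xml:base attribute is resolved against the inherited base URI;
    # without one, the element simply inherits.
    base = urljoin(inherited, own) if own is not None else inherited
    out[elem.tag] = base
    for child in elem:
        base_uris(child, base, out)
    return out


DOC = (
    '<doc xml:base="http://example.org/a/">'
    '<sec xml:base="b/"><p/></sec><fig/></doc>'
)
result = base_uris(ET.fromstring(DOC), "http://example.org/start.xml")
for name, uri in result.items():
    print(name, uri)
```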
Some properties may sometimes have the value unknown or no value, and it is said that a property value is unknown or that a property has no value respectively. These values are distinct from each other and from all other values. In particular they are distinct from the empty string, the empty set, and the empty list, each of which simply has no members. This specification does not use the term null since in some communities it has particular connotations which may not match those intended here.
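One way an application might model these two special values is with dedicated sentinel objects, so that they cannot be confused with each other or with empty values. The sketch below is an assumption about application design, not something the specification prescribes; the names `Special` and `describe` are hypothetical.

```python
from enum import Enum


class Special(Enum):
    """Sentinels for the two special property values."""
    UNKNOWN = "unknown"
    NO_VALUE = "no value"


def describe(value):
    """Report which kind of property value was supplied."""
    if value is Special.UNKNOWN:
        return "unknown"
    if value is Special.NO_VALUE:
        return "no value"
    return "value: %r" % (value,)


# The empty string and the empty list are ordinary values, not sentinels.
print(describe(Special.UNKNOWN))
print(describe(Special.NO_VALUE))
print(describe(""))
print(describe([]))
```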
This specification describes the information set resulting from parsing an XML document. Information sets may be constructed by other means, for example by use of an API such as the DOM or by transforming an existing information set.
An information set corresponding to a real document will necessarily be consistent in various ways; for example the [in-scope namespaces] property of an element will be consistent with the [namespace attributes] properties of the element and its ancestors. This may not be true of an information set constructed by other means; in such a case there will be no XML document corresponding to the information set, and to serialize it will require resolution of the inconsistencies (for example, by outputting namespace declarations that correspond to the namespaces in scope).
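A loosely analogous situation can be sketched with Python's standard `xml.etree.ElementTree` module, used here as an assumption for illustration only. A tree built through the API carries namespace names on its elements but no namespace declarations, so the serializer has to generate declarations to produce a document. The namespace name and element names are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = "http://example.org/ns"

# Build the tree through the API: no document and no xmlns attributes exist.
root = ET.Element("{%s}order" % NS)
ET.SubElement(root, "{%s}item" % NS).text = "pen"
has_declarations_before = any(k.startswith("xmlns") for k in root.attrib)

# Serializing resolves the gap by emitting a generated namespace declaration.
serialized = ET.tostring(root, encoding="unicode")
print(serialized)

# Parsing the output yields elements with the same expanded names.
reparsed = ET.fromstring(serialized)
```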
Copyright © 2002 Oracle Corporation. All Rights Reserved.