Working with imperative XML processing

With imperative processing, you write RPL programs that walk the DOM tree to find the different types of nodes. The nodes are used as RPL elements to obtain data.

This section uses the DOM tree of the example document and the variable doc created in the previous example.

The examples in this section assume that you put the XML document into the data model as the variable doc. The doc variable corresponds to the root of the DOM tree, document. This section shows an example of how to use the doc variable.

Accessing elements by name

The following example prints the title of the book:

<h1>${doc.book.title}</h1>  

and produces this output:

<h1>Test Book</h1>  

Both doc and book can be used as hashes; you get their child nodes as sub-variables. You describe the path by which you reach the target (element title) in the DOM tree. You might notice that ${doc.book.title} seems to instruct RPL to print the title element itself, but the example prints its child text node. This works because elements are string variables as well as hash variables. The scalar value of an element node is the string resulting from the concatenation of all its text child nodes. However, trying to use an element as a scalar will cause an error if the element has child elements. For example ${doc.book} will produce an error.

The following example prints the titles of the two chapters:

<h2>${doc.book.chapter[0].title}</h2>
<h2>${doc.book.chapter[1].title}</h2>  

Here, since book has 2 chapter element children, doc.book.chapter is a sequence that stores the two element nodes. Thus, we can generalize the above example, so that it works with any number of chapters:

<#list doc.book.chapter as ch>
  <h2>${ch.title}</h2>
</#list>  

When you access an element as a hash sub-variable, it is always a sequence as well as a hash and string. If the sequence contains only one item, then the variable also acts as that item. So, returning to the first example, the following code will print the book title as well:

<h1>${doc.book[0].title[0]}</h1>  

If there is only one book element, and that book has only one title, you can omit the [0]. ${doc.book.chapter.title} will work as well if the book has only one chapter. If the book has more than one chapter, that expression will produce an error. If the element book has no chapter child, then doc.book.chapter will be a sequence of length 0, so the code with <#list ...> will still work.

It is important to realize that if book has no chapters, then doc.book.chapter is an empty sequence, so doc.book.chapter?? will still return true. To check whether a child node exists, use doc.book.chapter[0]?? (or doc.book.chapter?size != 0). You can use any missing value handler operator (e.g. doc.book.author[0]!"Anonymous"), but make sure to include [0].
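The checks described above can be sketched as follows. This is a minimal illustration that assumes the example document from the previous section; the author element is hypothetical and is not part of that document.

```rpl
<#-- [0]?? tests for the first node, not for the sequence (which always exists) -->
<#if doc.book.chapter[0]??>
  <p>First chapter: ${doc.book.chapter[0].title}</p>
<#else>
  <p>This book has no chapters.</p>
</#if>

<#-- author is a hypothetical element; [0] is required before the ! operator -->
<p>Author: ${doc.book.author[0]!"Anonymous"}</p>
```

Without the [0], the default value would never be used, because the empty sequence itself is not a missing value.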

NOTE: The rule for sequences of size one is a convenience feature of the XML wrapper (implemented via multi-type RPL variables). It does not work with other sequences.

To finish the example, print all paragraphs of each chapter:

<h1>${doc.book.title}</h1>
<#list doc.book.chapter as ch>
  <h2>${ch.title}</h2>
  <#list ch.para as p>
    <p>${p}
  </#list>
</#list>  

produces this output:

<h1>Test Book</h1>
  <h2>Ch1</h2>
    <p>p1.1
    <p>p1.2
    <p>p1.3
  <h2>Ch2</h2>
    <p>p2.1
    <p>p2.2  

The above example can also be written as:

<#assign book = doc.book>
<h1>${book.title}</h1>
<#list book.chapter as ch>
  <h2>${ch.title}</h2>
  <#list ch.para as p>
    <p>${p}
  </#list>
</#list>  

Finally, to illustrate general usage of the child selector mechanism, the following example lists all paragraphs of the example XML document:

 
<#list doc.book.chapter.para as p>
  <p>${p}
</#list>  

produces this output:

  <p>p1.1
  <p>p1.2
  <p>p1.3
  <p>p2.1
  <p>p2.2

This example shows that hash sub-variables select the children of a sequence of nodes (in the earlier examples, the sequence had only one item). In this case, the sub-variable chapter returns a sequence of size 2 (since there are two chapters), and then the sub-variable para selects the para child nodes of all nodes in that sequence.

A negative consequence of this mechanism is that an expression such as doc.anything.anythingElse evaluates to an empty sequence and does not produce an error, so misspelled element names go unnoticed.

Accessing attributes

In this section, the XML is the same as in the previous section, except that it uses attributes for titles instead of elements:

<!-- THIS XML IS USED FOR THE "Accessing attributes" CHAPTER ONLY! -->
<!-- Outside this chapter examples use the XML from earlier.       -->
 
<book title="Test Book">
  <chapter title="Ch1">
    <para>p1.1</para>
    <para>p1.2</para>
    <para>p1.3</para>
  </chapter>
  <chapter title="Ch2">
    <para>p2.1</para>
    <para>p2.2</para>
  </chapter>
</book>  

You can access the attributes of an element in the same way as the child elements of an element, except that you use an at sign (@) before the name of the attribute as shown in the following example:

<#assign book = doc.book>
<h1>${book.@title}</h1>
<#list book.chapter as ch>
  <h2>${ch.@title}</h2>
  <#list ch.para as p>
    <p>${p}
  </#list>
</#list>  

This will produce the same output as the previous example.

Getting attributes follows the same logic as getting child elements, so the result of ch.@title above is a sequence of size one. If there were no title attribute, the result would be a sequence of size 0. Using built-ins here is tricky: to find out whether foo has an attribute called bar, you must write foo.@bar[0]??. (foo.@bar)?? is incorrect because it always returns true. Similarly, if you want a default value for the bar attribute, write foo.@bar[0]!"theDefaultValue".
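The attribute checks above can be sketched as follows. This is an illustration only: it assumes the attribute-based XML shown in this section, and the id attribute is hypothetical (it does not appear in that XML).

```rpl
<#list doc.book.chapter as ch>
  <#-- Falls back to a default when the chapter has no title attribute -->
  <h2>${ch.@title[0]!"Untitled chapter"}</h2>
  <#-- id is a hypothetical attribute, used only to illustrate the existence check -->
  <#if ch.@id[0]??>
    <p>Chapter id: ${ch.@id}</p>
  </#if>
</#list>
```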

As with child elements, you can select attributes of multiple nodes. For example, the following code prints the titles of all chapters:

<#list doc.book.chapter.@title as t>
  ${t}
</#list>  

Exploring the DOM tree

The following example enumerates all child nodes of the book element:

<#list doc.book?children as c>
- ${c?node_type} <#if c?node_type = 'element'>${c?node_name}</#if>
</#list>  

produces this output:

- text
- element title
- text
- element chapter
- text
- element chapter
- text  

?node_name returns the name of the element for element nodes. For other node types, it also returns a value, but that is mainly useful for declarative XML processing, which is discussed later in this chapter.

If the book element had attributes, they would not appear in the above list. You can get a list that contains all attributes of the element, with the sub-variable @@ of the element variable. If you modify the first line of the XML to this:

<book foo="Foo" bar="Bar" baaz="Baaz">  

then the following example:

<#list doc.book.@@ as attr>
- ${attr?node_name} = ${attr}
</#list>  

produces this output:

- baaz = Baaz
- bar = Bar
- foo = Foo  

RPL provides a convenience sub-variable, *, that lists only the element children of an element (text nodes and other non-element children are skipped). For example:

<#list doc.book.* as c>
- ${c?node_name}
</#list>  

produces this output:

- title
- chapter
- chapter  
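Building on the * sub-variable, a recursive walk over the element tree might look like the sketch below. It assumes that the macro directive is available in your RPL environment; outline is a hypothetical macro name chosen for this illustration.

```rpl
<#-- outline is a hypothetical macro; it prints the element tree as nested lists -->
<#macro outline node>
  <#-- node["*"] is the sequence of child elements; skip the list when it is empty -->
  <#if node["*"]?size != 0>
    <ul>
    <#list node["*"] as child>
      <li>${child?node_name}<@outline node=child /></li>
    </#list>
    </ul>
  </#if>
</#macro>

<@outline node=doc />
```

The square-bracket form node["*"] is used here instead of node.* only to keep the expression easy to read next to ?size; both forms select the same child elements.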

To get the parent of an element, use the parent built-in as shown in the following example:

<#assign e = doc.book.chapter[0].para[0]>
<#-- Now e is the first para of the first chapter -->
${e?node_name}
${e?parent?node_name}
${e?parent?parent?node_name}
${e?parent?parent?parent?node_name}  

produces this output:

para
chapter
book
@document  

The last line in the example reaches the root of the DOM tree, the document node. This is not an element, therefore it appears differently in the output.

To return to the document node, use the root built-in as shown in the following example:

<#assign e = doc.book.chapter[0].para[0]>
${e?root?node_name}
${e?root.book.title}  

produces this output:

@document
Test Book  

For a complete list of built-ins you can use to navigate in the DOM tree, see Built-Ins Reference.

Using XPath expressions

If a hash key used with a node variable cannot be interpreted otherwise (see the next section for the precise definition), then it will be interpreted as an XPath expression. For more information about XPath, please visit http://www.w3.org/TR/xpath.

For example, to list the para elements of the chapter called Ch1:

<#list doc["book/chapter[title='Ch1']/para"] as p>
  <p>${p}
</#list>  

produces this output:

  <p>p1.1
  <p>p1.2
  <p>p1.3  

The rule for sequences of length one (explained in an earlier section) applies to XPath results as well. That is, if the resulting sequence contains exactly one node, it also acts as the node itself. For example, to print the first paragraph of chapter Ch1:

${doc["book/chapter[title='Ch1']/para[1]"]}  

The above example produces the same output as:

${doc["book/chapter[title='Ch1']/para[1]"][0]}  

The context node of the XPath expression is the node (or sequence of nodes) whose hash sub-variable is used to issue the XPath expression. Thus, the following example produces the same output as the previous one:

${doc.book["chapter[title='Ch1']/para[1]"]}  

Also note that XPath indexes sequence items from one. Thus, to select the first chapter, the XPath expression is "/book/chapter[1]". RPL variables are visible in XPath expressions through XPath variable references, as shown in the following example:

<#assign currentTitle = "Ch1">
<#list doc["book/chapter[title=$currentTitle]/para"] as p>
...  

Note that $currentTitle is an XPath variable reference, not an RPL interpolation, because it is not enclosed in curly braces ({ }).
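The difference between the two indexing conventions can be sketched as follows, assuming the example document used throughout this section. Both lines should select the title of the first chapter.

```rpl
<#-- RPL sequence index: zero-based -->
${doc.book.chapter[0].title}

<#-- XPath position predicate: one-based -->
${doc["book/chapter[1]/title"]}
```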

The result of some XPath expressions is not a node set, but a string, a number, or a boolean. For those XPath expressions, the result is an RPL string, number, or boolean variable, respectively. For example, the following code counts the total number of para elements in the XML document, so the result is a number:

${doc["count(//para)"]}  

and the output is:

5
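String and boolean results can be used in the same way. The following sketch assumes the example document and uses only standard XPath 1.0 functions:

```rpl
<#-- string() returns a string, so the result is an RPL string -->
<p>Title: ${doc["string(book/title)"]}</p>

<#-- A comparison returns a boolean, so the result can be used directly in #if -->
<#if doc["count(//chapter) = 2"]>
  <p>The book has exactly two chapters.</p>
</#if>
```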

About XML namespaces

By default, when you write doc.book, RPL selects the element with the name book that does not belong to any XML namespace (similarly to XPath). To select an element inside an XML namespace, you must register a prefix and use that prefix. For example, if the book element is in the XML namespace http://example.com/ebook, you have to associate a prefix with it at the top of the template with the ns_prefixes parameter of the rpl directive as shown in the following example:

<#rpl ns_prefixes={"e":"http://example.com/ebook"}>  

Now, you can write expressions such as doc["e:book"]. Note that in this case, the usage of square brackets ([]) is required.
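As a sketch of how the registered prefix is used further down the tree, the following template assumes that the book, title, and chapter elements all belong to the http://example.com/ebook namespace:

```rpl
<#rpl ns_prefixes={"e":"http://example.com/ebook"}>
<#-- Every prefixed name must be written with square brackets -->
<h1>${doc["e:book"]["e:title"]}</h1>
<#list doc["e:book"]["e:chapter"] as ch>
  <h2>${ch["e:title"]}</h2>
</#list>
```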

As the value of ns_prefixes is a hash, you can register multiple prefixes as shown in the following example:

<#rpl ns_prefixes={
    "e":"http://example.com/ebook",
    "f":"http://example.com/form",
    "vg":"http://example.com/vectorGraphics"}
>

The ns_prefixes parameter affects the entire RPL namespace. This means that the prefixes you register in the main template are visible in all templates included with <#include ...>, but not in templates imported with <#import ...> (often referred to as RPL libraries). An RPL library can register XML namespace prefixes for its own use, without interfering with the prefix registrations of the main template and other libraries.

Note that you can set a default namespace for an input document. In this case, if you do not use a prefix, as in doc.book, RPL selects the element that belongs to the default namespace. You set the default namespace using the reserved prefix D, for example:

<#rpl ns_prefixes={"D":"http://example.com/ebook"}>  

Now, the expression doc.book will select the book element that belongs to the XML namespace http://example.com/ebook.

Note that XPath does not support default namespaces. Thus, in XPath expressions, element names without a prefix always apply to the elements that do not belong to any XML namespace.

However, to access elements in the default namespace, you can use the prefix D, for example:

doc["D:book/D:chapter[title='Ch1']"]

Note that when you use a default namespace, you can select elements that do not belong to any XML namespace using the reserved prefix N, for example:

doc.book["N:foo"]

The above example does not work for XPath expressions. The equivalent for an XPath expression is:

doc["D:book/foo"]
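The two ways of addressing the default namespace can be compared side by side. This sketch assumes that the book and title elements both belong to the http://example.com/ebook namespace:

```rpl
<#rpl ns_prefixes={"D":"http://example.com/ebook"}>
<#-- Hash keys: names without a prefix select from the default namespace -->
${doc.book.title}

<#-- XPath: the D prefix must be written at every step -->
${doc["D:book/D:title"]}
```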

Escaping

In HTML, certain characters such as < and & are reserved. This means that to print plain text in HTML output, you must use escaping as shown in the following example:

<#escape x as x?html>
<#assign book = doc.book>
<h1>${book.title}</h1>
<#list book.chapter as ch>
  <h2>${ch.title}</h2>
  <#list ch.para as p>
    <p>${p}
  </#list>
</#list>
</#escape>  

With this escaping in place, if the book title is Romeo & Julia, the HTML output will contain it correctly escaped:

...
<h1>Romeo &amp; Julia</h1>
...  
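When only a few values need escaping, the same html built-in can also be applied to individual interpolations instead of wrapping the whole template in an escape block. A minimal sketch:

```rpl
<#-- Escapes only this interpolation; other interpolations are left as they are -->
<h1>${doc.book.title?html}</h1>
```

The escape block is usually the safer choice for templates that print many values, because it cannot be forgotten on a single interpolation.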

 

Formal description

Every variable that corresponds to a single node in the DOM tree is a multi-type variable of type node and type hash. Thus, you can use the node built-ins with them. Hash keys are interpreted as XPath expressions, with the exception of the special keys shown in the table below. Some node variables have a string type as well, so you can use them as string variables (they implement TemplateScalarModel).

Node type (?node_type): "document"
Node name (?node_name): "@document"
String value (e.g. <p>${node}): No string value. (Error when you try to use it as string.)
Special hash keys: "elementName", "prefix:elementName", "*", "**", "@@markup", "@@nested_markup", "@@text"

Node type (?node_type): "element"
Node name (?node_name): "name": the name of the element. This is the local name (i.e. name without namespace prefix).
String value (e.g. <p>${node}): If it has no element children, the text of all text node children concatenated together. Error otherwise, when you try to use it as string.
Special hash keys: "elementName", "prefix:elementName", "*", "**", "@attrName", "@prefix:attrName", "@@", "@*", "@@start_tag", "@@end_tag", "@@attributes_markup", "@@markup", "@@nested_markup", "@@text", "@@qname"

Node type (?node_type): "text"
Node name (?node_name): "@text"
String value (e.g. <p>${node}): The text itself.
Special hash keys: "@@markup", "@@nested_markup", "@@text"

Node type (?node_type): "pi"
Node name (?node_name): "@pi$target"
String value (e.g. <p>${node}): The part between the target name and the ?>.
Special hash keys: "@@markup", "@@nested_markup", "@@text"

Node type (?node_type): "comment"
Node name (?node_name): "@comment"
String value (e.g. <p>${node}): The text of the comment, without the delimiters <!-- and -->.
Special hash keys: "@@markup", "@@nested_markup", "@@text"

Node type (?node_type): "attribute"
Node name (?node_name): "name": the name of the attribute. This is the local name (i.e. name without namespace prefix).
String value (e.g. <p>${node}): The value of the attribute.
Special hash keys: "@@markup", "@@nested_markup", "@@text", "@@qname"

Node type (?node_type): "document_type"
Node name (?node_name): "@document_type$name": name is the name of the document element.
String value (e.g. <p>${node}): No string value. (Error when you try to use it as string.)
Special hash keys: "@@markup", "@@nested_markup", "@@text"

NOTES:

  • There is no CDATA type. CDATA nodes are transparently considered as text nodes.

  • These node variables do not support ?keys and ?values.

  • Element and attribute node names are local names, that is, they do not contain the namespace prefix. The URI of the namespace to which a node belongs can be queried with the ?node_namespace built-in.

  • Variables are visible with XPath variable references (e.g. doc["book/chapter[title=$currentTitle]"]).

Meaning of special hash keys:

"elementName", "prefix:elementName"
Returns the sequence of child nodes that are elements named elementName. The selection is XML namespace-aware, unless the XML document was parsed with an XML parser that was not namespace-aware. In XML namespace-aware mode, names without a prefix (for example, elementName) select only elements that do not belong to any XML namespace (unless you have registered a default XML namespace), and names with a prefix (for example, prefix:elementName) select only elements that belong to the XML namespace denoted by the prefix. You register prefixes and set the default XML namespace using the ns_prefixes parameter of the rpl directive.

"*"
Returns the sequence of all child element nodes. The sequence will contain the elements in the document order, that is, in the order in which the first character of the XML representation of each node occurs (after expansion of general entities).

"**"
Returns the sequence of all descendant element nodes. The sequence will contain the elements in the document order.

"@attrName", "@prefix:attrName"
Returns the attribute attrName of the element as a sequence of size one that contains the attribute node, or as an empty sequence if the attribute does not exist. To check whether an attribute exists, use foo.@attrName[0]??, not foo.@attrName??. As with the special key "elementName", if the length of the sequence is 1, then it also acts as its first sub-variable. If no prefix is used, then it returns only an attribute that does not belong to any XML namespace (even if you have set a default XML namespace). If a prefix is used, it returns only the attribute that belongs to the XML namespace associated with the prefix. The registration of prefixes is done with the ns_prefixes parameter of the rpl directive.

"@@" or "@*"
Returns the sequence of attribute nodes belonging to the parent element. This is the same as XPath @*.

"@@qname"
Returns the fully qualified name of the element (such as e:book, in contrast to the local name returned by ?node_name). The prefix used is chosen based on the prefix registered in the current namespace with the ns_prefixes parameter of the rpl directive, and is not influenced by the prefix used in the source XML document. If you have set a default XML namespace, the prefix D is used for nodes that use that namespace. For nodes that do not belong to an XML namespace, no prefix is used even if a default namespace is set. If there is no prefix registered for the namespace of the node, the result is a non-existent variable (node.@@qname?? is false).

"@@markup"
Returns the full XML markup of a node, as a string. Full XML markup means that it also contains the markup of the child nodes, and the markup of the children of the child nodes, and so on. The markup is not necessarily the same as the markup in the source XML file, but it is semantically identical. Note that CDATA sections will be converted to plain text. Also note that, depending on how the original XML document was wrapped for RPL, comment or processing instruction nodes might be removed and will be missing from the output. The first start tag in the output will contain xmlns:prefix attributes for each XML namespace used in the output XML fragment, and those prefixes will be used in the output element and attribute names. These prefixes will be the same as the prefixes registered with the ns_prefixes parameter of the rpl directive (no prefix will be used for D, as it will be registered as the default namespace with an xmlns attribute). If no prefix was assigned for an XML namespace, an arbitrarily chosen unused prefix will be used.

"@@nested_markup"
This is similar to @@markup, but it returns the XML markup of an element without its opening and closing tags. For the document node, it returns the same as @@markup. For other node types, it returns an empty string. Unlike with @@markup, no xmlns:prefix attributes will be placed into the output. The rules regarding the prefixes used in element and attribute names are the same as for @@markup.

"@@text"
Returns the value of all text nodes that occur within the node (all descendant text nodes, not only direct children), concatenated into a single string. If the node has no text node children, the result is an empty string.

"@@start_tag"
Returns the markup of the start tag of the element node. As with @@markup, the output is the semantic equivalent of the original XML document. For XML namespaces (xmlns:prefix attributes in the output, etc.), the rules are the same as for "@@markup".

"@@end_tag"
Returns the markup of the end tag of the element node. As with @@markup, the output is the semantic equivalent of the original XML document.

"@@attributes_markup"
Returns the markup of the attributes of the element node. As with @@markup, the output is the semantic equivalent of the original XML document.
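A few of these keys, applied to a single element, might be used as in the sketch below. It assumes the example document; no output is shown because the exact markup depends on the namespace rules described above.

```rpl
<#assign ch = doc.book.chapter[0]>

<#-- All descendant text of the chapter, concatenated into one string -->
${ch.@@text}

<#-- The content of the chapter as XML markup, without its own start and end tags -->
${ch.@@nested_markup}

<#-- Only the start tag and the end tag of the chapter element -->
${ch.@@start_tag} ... ${ch.@@end_tag}
```

If the markup is meant to be displayed in an HTML page rather than interpreted by the browser, it needs to be escaped, for example with the html built-in.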

About node sequences

Many of the special hash keys described in the list above and XPath expressions that result in node sets (see the XPath recommendation) return a sequence of nodes.

If these sequences store only one sub-variable, they also act as the sub-variable. For example, ${book.title[0]} is the same as ${book.title} if the book element has only one title element.

Returning an empty node sequence is common. For example, if the element book has no child element chapter, then book.chapter results in an empty node sequence. Note that this means that an invalid element name, for example book.chap, also returns an empty node sequence and does not produce an error. Also, book.chaptre?? (note the typo) will return true because the empty sequence exists, so you have to use book.chaptre[0]?? instead.

Node sequences that store zero nodes or more than one node also support the following hash keys:

"elementName", "prefix:elementName"

"@attrName", "@prefix:attrName"

"@@markup", "@@nested_markup"

"@@text"

"*", "**"

"@@", "@*"

When you apply one of those special keys to a node sequence that contains zero nodes or more than one node, the special key is applied to each node in the same way as for a single node, and the individual results are concatenated, in the order in which the nodes occur in the node sequence, to form the final result. How the results are concatenated depends on their type. For example, if the special key would return a string for a single node, then the result for multiple nodes is a single string; if the special key would return a sequence for a single node, then the result for multiple nodes is a single sequence. If you apply a special key to a sequence with zero nodes, the result is an empty string or an empty sequence, respectively.
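Both kinds of concatenation can be sketched with the example document, which has two chapters and five para elements in total:

```rpl
<#-- doc.book.chapter.title holds two nodes, so it cannot be printed directly,
     but @@text returns a string for each node and the strings are concatenated -->
${doc.book.chapter.title.@@text}

<#-- para returns a sequence for each chapter; the sequences are concatenated
     into one sequence, whose size is the total number of paragraphs -->
${doc.book.chapter.para?size}
```

With the example document, the first interpolation should print the two chapter titles joined together, and the second should print 5.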

Note that you can use XPath expressions with node sequences.

Next steps

Working with declarative XML processing