7
XML Parser for C

This chapter describes the following sections:

Parser APIs

Extensible Markup Language (XML) describes a class of data objects called XML documents and partially describes the behavior of computer programs which process them. XML is an application profile or restricted form of SGML, the Standard Generalized Markup Language [ISO 8879]. By construction, XML documents are conforming SGML documents.

XML documents are made up of storage units called entities, which contain either parsed or unparsed data. Parsed data is made up of characters, some of which form character data, and some of which form markup. Markup encodes a description of the document's storage layout and logical structure. XML provides a mechanism to impose constraints on the storage layout and logical structure.

A software module called an XML processor is used to read XML documents and provide access to their content and structure. It is assumed that an XML processor is doing its work on behalf of another module, called the application.

This C implementation of the XML processor (or parser) follows the W3C XML specification (rev REC-xml-19980210) and includes the required behavior of an XML processor in terms of how it must read XML data and the information it must provide to the application.

The following is the default behavior of this parser:

The character set encoding is UTF-8. If all your documents are ASCII, you are encouraged to set the encoding to US-ASCII for better performance.
Messages are printed to stderr unless msghdlr is given.
A parse tree which can be accessed by DOM APIs is built unless saxcb is set to use the SAX callback APIs. Note that any of the SAX callback functions can be set to NULL if not needed.
The parser checks that the input is well-formed but does not check whether it is valid. The flag XML_FLAG_VALIDATE can be set to validate the input.
Whitespace processing is fully conformant to the XML 1.0 spec, i.e. all whitespace is reported back to the application but it is indicated which whitespace is ignorable. However, some applications may prefer to set XML_FLAG_DISCARD_WHITESPACE, which discards all whitespace between an end-element tag and the following start-element tag.

Calling Sequence

Parsing a single document:

xmlinit, xmlparsexxx, xmlterm

Parsing multiple documents, but only the latest document needs to be available:

xmlinit, xmlparsexxx, xmlclean, xmlparsexxx, xmlclean ... xmlterm

Parsing multiple documents, all document data must be available:

xmlinit, xmlparsexxx, xmlparsexxx ... xmlterm
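
The second sequence above (parse, use, clean, repeat) can be sketched as follows. This is only an illustration under stated assumptions: the typedefs and the stub bodies of xmlinit, xmlparsefile, xmlclean and xmlterm are stand-ins so that the sketch compiles by itself, and parse_each, stub_note and stub_ctx are hypothetical names. In real use the stand-in section is replaced by the parser's own header and library.

```c
#include <assert.h>
#include <stddef.h>

/* ---- Stand-in declarations (assumptions, not the real header) ---- */
typedef unsigned int  uword;
typedef unsigned int  ub4;
typedef unsigned char oratext;
typedef struct xmlsaxcb xmlsaxcb;   /* left opaque in this sketch */
typedef struct xmlmemcb xmlmemcb;   /* left opaque in this sketch */
typedef struct xmlctx { char trace[32]; size_t n; } xmlctx;

static xmlctx stub_ctx;             /* the only context the stubs hand out */

/* Hypothetical helper: record which call was made, in order. */
static void stub_note(xmlctx *ctx, char tag)
{
    if (ctx->n + 1 < sizeof ctx->trace) {
        ctx->trace[ctx->n++] = tag;
        ctx->trace[ctx->n] = '\0';
    }
}

xmlctx *xmlinit(uword *err, const oratext *encoding,
                void (*msghdlr)(void *msgctx, const oratext *msg,
                                uword errcode),
                void *msgctx, const xmlsaxcb *saxcb, void *saxcbctx,
                const xmlmemcb *memcb, void *memcbctx, const oratext *lang)
{
    (void) encoding; (void) msghdlr; (void) msgctx; (void) saxcb;
    (void) saxcbctx; (void) memcb; (void) memcbctx; (void) lang;
    stub_ctx.n = 0;
    stub_ctx.trace[0] = '\0';
    stub_note(&stub_ctx, 'I');
    *err = 0;
    return &stub_ctx;
}

uword xmlparsefile(xmlctx *ctx, const oratext *path,
                   const oratext *encoding, ub4 flags)
{
    (void) path; (void) encoding; (void) flags;
    stub_note(ctx, 'P');
    return 0;
}

void  xmlclean(xmlctx *ctx) { stub_note(ctx, 'C'); }
uword xmlterm(xmlctx *ctx)  { stub_note(ctx, 'T'); return 0; }

/* ---- The calling sequence itself ---- */
uword parse_each(const char *const paths[], size_t npaths)
{
    uword   err = 0;
    size_t  i;
    xmlctx *ctx;

    assert(paths != NULL || npaths == 0);

    /* All arguments except err may be NULL: default encoding, messages
     * to stderr, DOM tree built, default memory allocation. */
    ctx = xmlinit(&err, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL);
    if (ctx == NULL || err != 0)
        return err != 0 ? err : 1;  /* 1 is an arbitrary non-zero code */

    for (i = 0; i < npaths; i++) {
        err = xmlparsefile(ctx, (const oratext *) paths[i], NULL, 0);
        if (err != 0)
            break;
        /* ... work with the current document here ... */
        xmlclean(ctx);              /* release it before the next parse */
    }

    (void) xmlterm(ctx);            /* frees whatever is still held */
    return err;
}
```

Calling parse_each with three paths makes the stubs record the order init, parse, clean, parse, clean, parse, clean, term. Dropping the xmlclean call gives the third sequence, in which the data of every parsed document stays available until xmlterm.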

Memory

The memory callback functions specified in memcb may be used if you wish to use your own memory allocation. If they are used, all of the functions should be specified.

The memory allocated for parameters passed to the SAX callbacks or for nodes and data stored with the DOM parse tree will not be freed until one of the following is done:

xmlparsexxx is called to parse another document.
xmlclean is called.
xmlterm is called.

Thread Safety

If threads are forked off somewhere in the midst of the init-parse-terminate sequence of calls, you will get unpredictable behavior and results.

Function/Method Index

xmlinit	XMLParser::xmlinit	Initialize XML parser
xmlclean		Clean up memory used during parse
xmlparse		Parse a document specified by a URL
xmlparsebuf		Parse a document that's resident in memory
xmlparsefile		Parse a document from the filesystem
xmlparsestream		Parse a document from a user-defined stream
xmlterm		Shut down XML parser
createDocument		Create a new document
isStandalone		Return document's standalone flag
isSingleChar		Return single/multibyte encoding flag
getEncoding		Return name of document's encoding

Functions

xmlinit

Purpose

Initializes the XML parser. It must be called before any parsing can take place.

C Prototype

xmlctx *xmlinit(uword *err, const oratext *encoding, 
                void (*msghdlr)(void *msgctx, const oratext *msg,
                                uword errcode),
                void *msgctx, const xmlsaxcb *saxcb, void *saxcbctx, 
                const xmlmemcb *memcb, void *memcbctx, const oratext *lang);

Parameters

err      (OUT)- The error, if any (C only)
encoding (IN) - default character set encoding
msghdlr  (IN) - Error message handler function
msgctx   (IN) - Context for the error message handler
saxcb    (IN) - SAX callback structure filled with function pointers
saxcbctx (IN) - Context for SAX callbacks
memcb    (IN) - Memory function callbacks
memcbctx (IN) - Context for the memory function callbacks
lang     (IN) - Language for error messages

Comments

The C version of this call returns the XML context on success, and sets the user's err argument on error. As usual, a zero error code means success, non-zero indicates a problem.

This function should only be called once before starting the processing of one or more XML files. xmlterm() should be called after all processing of XML files has completed.

Error codes: XMLERR_LEH_INIT, XMLERR_BAD_ENCODING, XMLERR_NLS_INIT, XMLERR_NO_MEMORY, XMLERR_NULL_PTR

For C, all arguments may be NULL except for err. For C++, all arguments have default values and may be omitted if not needed.

By default, the character set encoding is UTF-8. If all your documents are ASCII, you are encouraged to set the encoding to US-ASCII for better performance.

By default, messages are printed to stderr unless msghdlr is given.

By default, a parse tree is built (accessible by DOM APIs) unless saxcb is set (in which case the SAX callback APIs are invoked). Note that any of the SAX callback functions can be set to NULL if not needed.

The memory callback functions memcb may be used if you wish to use your own memory allocation. If they are used, all of the functions should be specified.

The parameters msgctx, saxcbctx, and memcbctx are structures that you may define and use to pass information to your callback routines for the message handler, SAX functions, or memory functions, respectively. They should be set to NULL if your callback functions do not need any additional information passed in to them.
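
As an illustration of how a context pointer reaches a callback, here is a minimal sketch of a message handler that counts messages instead of printing them. The handler signature is the msghdlr signature from the xmlinit prototype; the typedefs are stand-ins for the parser's header, and msg_stats and count_messages are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in typedefs (assumptions); normally supplied by the parser's header. */
typedef unsigned int  uword;
typedef unsigned char oratext;

/* User-defined context: whatever the handler needs to remember. */
typedef struct {
    unsigned count;      /* number of messages received so far */
    uword    last_code;  /* error code of the most recent message */
} msg_stats;

/* Message handler with the msghdlr signature. The parser hands back,
 * as msgctx, the same pointer that was given to xmlinit. */
void count_messages(void *msgctx, const oratext *msg, uword errcode)
{
    msg_stats *stats = (msg_stats *) msgctx;

    (void) msg;                 /* message text is ignored in this sketch */
    assert(stats != NULL);      /* this handler cannot work without context */
    stats->count++;
    stats->last_code = errcode;
}
```

To use it, a msg_stats variable would be zero-initialized and xmlinit called with count_messages as msghdlr and the address of that variable as msgctx. The same pattern applies to saxcbctx and memcbctx.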

The lang parameter is not used currently and may be set to NULL. It will be used in future releases to determine the language of the error messages.

xmlclean

Purpose

Frees memory used during the previous parse.

Syntax

void xmlclean(xmlctx *ctx);

Parameters

ctx (IN) - The XML parser context

Comments

This function is provided as a convenience for those who want to parse multiple documents using a single context. Before parsing the second and subsequent documents, call xmlclean to release memory used by the previous document.

Note that memory is reused internally after this call. Memory is not returned to the system until xmlterm is called.

xmlparse

Purpose

Invokes the XML parser on an input document that is specified by a URL. The parser must have been initialized successfully with a call to xmlinit first.

Syntax

uword xmlparse(xmlctx *ctx, const oratext *url,
               const oratext *encoding, ub4 flags);

Parameters

ctx      (IN/OUT) - The XML parser context
url      (IN) - URL of XML document
encoding (IN) - default character set encoding
flags    (IN) - what options to use

Comments

Flag bits must be OR'd to override the default behavior of the parser. The following flag bits may be set:

XML_FLAG_VALIDATE turns validation on. The default behavior is to not validate the input.
XML_FLAG_DISCARD_WHITESPACE discards whitespace where it appears to be insignificant, that is, all whitespace between an end-element tag and the following start-element tag. The default behavior for whitespace processing is to be fully conformant to the XML 1.0 spec, i.e. all whitespace is reported back to the application but it is indicated which whitespace is ignorable.
XML_FLAG_DTD_ONLY tells the parser that the input is an external DTD only, not a complete document.
XML_FLAG_STOP_ON_WARNING makes the parser stop immediately if any validation warnings occur. By default, validation warnings are printed but validation continues.
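
Combining the flag bits can be sketched as below. The numeric values are HYPOTHETICAL and are defined only when the real header has not already defined them; the ub4 typedef is likewise a stand-in, and make_parse_flags is a made-up helper name.

```c
#include <assert.h>

typedef unsigned int ub4;   /* stand-in (assumption) for the parser's type */

/* HYPOTHETICAL bit values, used only if the real header is not included. */
#ifndef XML_FLAG_VALIDATE
#define XML_FLAG_VALIDATE            0x01u
#define XML_FLAG_DISCARD_WHITESPACE  0x02u
#define XML_FLAG_DTD_ONLY            0x04u
#define XML_FLAG_STOP_ON_WARNING     0x08u
#endif

/* Build the flags argument for an xmlparsexxx call from three switches.
 * Passing 0 for every switch yields 0, i.e. the default behavior. */
ub4 make_parse_flags(int validate, int discard_ws, int stop_on_warning)
{
    ub4 flags = 0;

    /* OR-ing only works if each flag occupies its own bit. */
    assert((XML_FLAG_VALIDATE & XML_FLAG_DISCARD_WHITESPACE) == 0);

    if (validate)
        flags |= XML_FLAG_VALIDATE;
    if (discard_ws)
        flags |= XML_FLAG_DISCARD_WHITESPACE;
    if (stop_on_warning)
        flags |= XML_FLAG_STOP_ON_WARNING;
    return flags;
}
```

The result would be passed as the flags argument, for example xmlparse(ctx, url, NULL, make_parse_flags(1, 1, 0)) to validate and discard insignificant whitespace.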

The memory passed to the SAX callbacks or stored with the DOM parse tree will not be freed until one of the following is done:

xmlparsexxx is called to parse another document.
xmlclean is called.
xmlterm is called.

xmlparsebuf

Purpose

Invokes the XML parser on a document that is resident in memory. The parser must have been initialized successfully with a call to xmlinit first.

Syntax

uword xmlparsebuf(xmlctx *ctx, const oratext *buffer, size_t len,
                  const oratext *encoding, ub4 flags);

Parameters

ctx      (IN/OUT) - The XML parser context
buffer   (IN) - pointer to document in memory
len      (IN) - length of the buffer
encoding (IN) - default character set encoding
flags    (IN) - what options to use

Comments

This function is identical to xmlparse except that input is taken from the user's buffer instead of from a URL, file, etc.

xmlparsefile

Purpose

Invokes the XML parser on a document in the filesystem. The parser must have been initialized successfully with a call to xmlinit first.

Syntax

uword xmlparsefile(xmlctx *ctx, const oratext *path,
                   const oratext *encoding, ub4 flags);

Parameters

ctx      (IN/OUT) - The XML parser context
path     (IN) - filesystem path of document
encoding (IN) - default character set encoding
flags    (IN) - what options to use

Comments

This function is identical to xmlparse except that input is taken from a file in the user's filesystem, instead of from a URL, memory buffer, etc.

xmlparsestream

Purpose

Invokes the XML parser on a document that is to be read from a user-defined stream. The parser must have been initialized successfully with a call to xmlinit first.

Syntax

uword xmlparsestream(xmlctx *ctx, const void *stream,
                     const oratext *encoding, ub4 flags);

Parameters

ctx      (IN/OUT) - The XML parser context
stream   (IN) - pointer to stream or stream context
encoding (IN) - default character set encoding
flags    (IN) - what options to use

Comments

This function is identical to xmlparse except that input is taken from a user-defined stream, instead of from a URL, file, etc. The I/O callback functions for access method XMLACCESS_STREAM must be set up first. The stream (or stream context) pointer will be available in each callback function as the ptr_xmlihdl member of the ihdl structure. Its meaning and use are user-defined.

xmlterm

Purpose

Terminates the XML parser. It should be called after xmlinit, and before exiting the main program.

Syntax

uword xmlterm(xmlctx *ctx);

Parameters

ctx (IN) - the XML parser context

Comments

This function frees all memory used by the parser and terminates the context, which may not then be reused (a new context must be created if additional parsing is to be done).

createDocument

Purpose

Creates a new document in memory.

Syntax

xmlnode* createDocument(xmlctx *ctx)

Parameters

ctx (IN) - the XML parser context

Comments

This function is used when constructing a new document in memory. An XML document is always rooted in a node of type DOCUMENT_NODE. This function creates that root node and sets it in the context. There can be only one current document and hence only one document node; if one already exists, this function does nothing and returns NULL.

isStandalone

Purpose

Return value of document's standalone flag.

Syntax

boolean isStandalone(xmlctx *ctx)

Parameters

ctx (IN) - the XML parser context

Comments

This function returns the boolean value of the document's standalone flag, as specified in the XML declaration (<?xml ...?>).

isSingleChar

Purpose

Returns a flag which specifies whether the current document is encoded as single-byte characters (e.g. ASCII), or multi-byte characters (e.g. UTF-8).

Syntax

boolean isSingleChar(xmlctx *ctx)

Parameters

ctx (IN) - the XML parser context

Comments

Compare to getEncoding, which returns the actual name of the document's encoding.

getEncoding

Purpose

Returns the name of the current document's character encoding scheme (e.g., "ASCII", "UTF8", etc).

Syntax

oratext *getEncoding(xmlctx *ctx)

Parameters

ctx (IN) - the XML parser context

Comments

Compare to isSingleChar which just returns a boolean flag saying whether the current encoding is single or multi-byte.

XSLT API

XSLT is a language for transforming XML documents into other XML documents.

XSLT is designed for use as part of XSL, which is a stylesheet language for XML. In addition to XSLT, XSL includes an XML vocabulary for specifying formatting. XSL specifies the styling of an XML document by using XSLT to describe how the document is transformed into another XML document that uses the formatting vocabulary.

XSLT is also designed to be used independently of XSL. However, XSLT is not intended as a completely general-purpose XML transformation language. Rather it is designed primarily for the kinds of transformation that are needed when XSLT is used as part of XSL.

A transformation expressed in XSLT describes rules for transforming a source tree into a result tree. The transformation is achieved by associating patterns with templates. A pattern is matched against elements in the source tree. A template is instantiated to create part of the result tree. The result tree is separate from the source tree. The structure of the result tree can be completely different from the structure of the source tree. In constructing the result tree, elements from the source tree can be filtered and reordered, and arbitrary structure can be added.

A transformation expressed in XSLT is called a stylesheet. This is because, in the case when XSLT is transforming into the XSL formatting vocabulary, the transformation functions as a stylesheet.

A stylesheet contains a set of template rules. A template rule has two parts: a pattern which is matched against nodes in the source tree and a template which can be instantiated to form part of the result tree. This allows a stylesheet to be applicable to a wide class of documents that have similar source tree structures.

A template is instantiated for a particular source element to create part of the result tree. A template can contain elements that specify literal result element structure. A template can also contain elements from the XSLT namespace that are instructions for creating result tree fragments. When a template is instantiated, each instruction is executed and replaced by the result tree fragment that it creates. Instructions can select and process descendant source elements. Processing a descendant element creates a result tree fragment by finding the applicable template rule and instantiating its template. Note that elements are only processed when they have been selected by the execution of an instruction. The result tree is constructed by finding the template rule for the root node and instantiating its template.

A software module called an XSL processor is used to read XML documents and transform them into other XML documents with different styles.

The C implementation of the XSL processor follows the XSL Transformations standard (version 1.0, November 16, 1999) and includes the required behavior of an XSL processor as specified in the XSLT specification.

Functions

xslprocess(xmlctx *docctx, xmlctx *xslctx, xmlctx *resctx, xmlnode **result)

Processes XSL Stylesheet with XML document source and returns success or an error code.

Function Prototypes

xslprocess

Purpose

This function processes an XSL Stylesheet with an XML document source.

Syntax

uword xslprocess(xmlctx *docctx, xmlctx *xslctx,
                 xmlctx *resctx, xmlnode **result);

Parameters

docctx (IN/OUT) - The XML document context

xslctx (IN) - The XSL stylesheet context

resctx (IN) - The result document fragment context

result (IN/OUT) - The result document fragment node

W3C SAX APIs

SAX is a standard interface for event-based XML parsing, developed collaboratively by the members of the XML-DEV mailing list.

There are two major types of XML (or SGML) APIs:

tree-based APIs, and
event-based APIs.

A tree-based API compiles an XML document into an internal tree structure, then allows an application to navigate that tree using the Document Object Model (DOM), a standard tree-based API for XML and HTML documents.

An event-based API, on the other hand, reports parsing events (such as the start and end of elements) directly to the application through callbacks, and does not usually build an internal tree. The application implements handlers to deal with the different events, much like handling events in a graphical user interface.

Tree-based APIs are useful for a wide range of applications, but they often put a great strain on system resources, especially if the document is large (under very controlled circumstances, it is possible to construct the tree in a lazy fashion to avoid some of this problem). Furthermore, some applications need to build their own, different data trees, and it is very inefficient to build a tree of parse nodes, only to map it onto a new tree.

In both of these cases, an event-based API provides a simpler, lower-level access to an XML document: you can parse documents much larger than your available system memory, and you can construct your own data structures using your callback event handlers.

To use SAX, an xmlsaxcb structure is initialized with function pointers and passed to the xmlinit call. A pointer to a user-defined context structure may also be included; that context pointer will be passed to each SAX function.

The SAX callback structure:

typedef struct
{
   sword (*startDocument)(void *ctx);
   sword (*endDocument)(void *ctx);
   sword (*startElement)(void *ctx, const oratext *name,
                         const struct xmlarray *attrs);
   sword (*endElement)(void *ctx, const oratext *name);
   sword (*characters)(void *ctx, const oratext *ch, size_t len);
   sword (*ignorableWhitespace)(void *ctx, const oratext *ch, size_t len);
   sword (*processingInstruction)(void *ctx, const oratext *target,
                                  const oratext *data);
   sword (*notationDecl)(void *ctx, const oratext *name,
                         const oratext *publicId, const oratext *systemId);
   sword (*unparsedEntityDecl)(void *ctx, const oratext *name,
                               const oratext *publicId,
                               const oratext *systemId,
                               const oratext *notationName);
   sword (*nsStartElement)(void *ctx, const oratext *qname,
                           const oratext *local, const oratext *nsp,
                           const struct xmlnodes *attrs);
} xmlsaxcb;
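
A minimal sketch of handlers that could be plugged into such a structure is shown below. It gathers simple statistics through the user-defined context pointer. Several things here are assumptions: the typedefs are stand-ins for the parser's header, the attribute-list type is forward-declared and treated as opaque, a return value of 0 is assumed to mean success, and sax_stats and the on_... function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in typedefs (assumptions); normally supplied by the parser's header. */
typedef int           sword;
typedef unsigned char oratext;
struct xmlattrs;                    /* attribute list, opaque in this sketch */

/* User-defined SAX context, passed to xmlinit as saxcbctx. */
typedef struct {
    unsigned elements;    /* start tags seen */
    unsigned depth;       /* current nesting depth */
    unsigned max_depth;   /* deepest nesting seen */
    size_t   text_len;    /* total length reported to characters() */
} sax_stats;

sword on_start_element(void *ctx, const oratext *name,
                       const struct xmlattrs *attrs)
{
    sax_stats *st = (sax_stats *) ctx;

    (void) name; (void) attrs;
    assert(st != NULL);
    st->elements++;
    if (++st->depth > st->max_depth)
        st->max_depth = st->depth;
    return 0;                       /* assumed to mean "continue" */
}

sword on_end_element(void *ctx, const oratext *name)
{
    sax_stats *st = (sax_stats *) ctx;

    (void) name;
    assert(st != NULL && st->depth > 0);
    st->depth--;
    return 0;
}

sword on_characters(void *ctx, const oratext *ch, size_t len)
{
    sax_stats *st = (sax_stats *) ctx;

    (void) ch;                      /* only the length is used here */
    assert(st != NULL);
    st->text_len += len;
    return 0;
}
```

With the real header, the three functions would be assigned to the startElement, endElement and characters members of an xmlsaxcb whose remaining members are NULL, and the structure passed to xmlinit together with the address of a zeroed sax_stats. No tree is built, so memory use does not grow with document size.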

Data Structures and Types

Callback Functions conforming to the SAX standard:

sword (*characters)(void *ctx, const oratext *ch, size_t len)

Receive notification of character data inside an element.

sword (*endDocument)(void *ctx)

Receive notification of the end of the document.

sword (*endElement)(void *ctx, const oratext *name)

Receive notification of the end of an element.

sword (*ignorableWhitespace)(void *ctx, const oratext *ch, size_t len)

Receive notification of ignorable whitespace in element content.

sword (*notationDecl)(void *ctx, const oratext *name,
                     const oratext *publicId, const oratext *systemId)

Receive notification of a notation declaration.

sword (*processingInstruction)(void *ctx, const oratext *target,
                               const oratext *data)

Receive notification of a processing instruction.

sword (*startDocument)(void *ctx)

Receive notification of the beginning of the document.

sword (*startElement)(void *ctx, const oratext *name,
                      const struct xmlattrs *attrs)

Receive notification of the start of an element.

sword (*unparsedEntityDecl)(void *ctx, const oratext *name,
                            const oratext *publicId, const oratext *systemId,
                            const oratext *notationName)

Receive notification of an unparsed entity declaration.

Non-SAX Callback Functions

sword (*nsStartElement)(void *ctx, const oratext *qname, const oratext *local,
                       const oratext *namespace, const struct xmlattrs *attrs)

Receive notification of the start of a namespace for an element.

Function Prototypes

characters

Purpose

This callback function receives notification of character data inside an element.

Syntax

sword (*characters)(void *ctx, const oratext *ch, size_t len);

Parameters

ctx (IN) - client context pointer

ch (IN) - the characters

len (IN) - number of characters to use from the character pointer

Comments

endDocument

Purpose

This callback function receives notification of the end of the document.

Syntax

sword (*endDocument)(void *ctx);

Parameters

ctx (IN) - client context

Comments

endElement

Purpose

This callback function receives notification of the end of an element.

Syntax

sword (*endElement)(void *ctx, const oratext *name);

Parameters

ctx (IN) - client context

name (IN) - element type name

Comments

ignorableWhitespace

Purpose

This callback function receives notification of ignorable whitespace in element content.

Syntax

sword (*ignorableWhitespace)(void *ctx, const oratext *ch, size_t len);

Parameters

ctx (IN) - client context

ch (IN) - whitespace characters

len (IN) - number of characters to use from the character pointer

Comments

notationDecl

Purpose

This callback function receives notification of a notation declaration.

Syntax

sword (*notationDecl)(void *ctx, const oratext *name, const oratext *publicId,
                      const oratext *systemId);

Parameters

ctx (IN) - client context

name (IN) - notation name

publicId (IN) - notation public identifier, or null if not available

systemId (IN) - notation system identifier

Comments

processingInstruction

Purpose

This callback function receives notification of a processing instruction.

Syntax

sword (*processingInstruction)(void *ctx, const oratext *target,
                               const oratext *data);

Parameters

ctx (IN) - client context

target (IN) - processing instruction target

data (IN) - processing instruction data, or null if none is supplied

Comments

startDocument

Purpose

This callback function receives notification of the beginning of the document.

Syntax

sword (*startDocument)(void *ctx);

Parameters

ctx (IN) - client context

Comments

startElement

Purpose

This callback function receives notification of the beginning of an element.

Syntax

sword (*startElement)(void *ctx, const oratext *name,
                      const struct xmlattrs *attrs);

Parameters

ctx (IN) - client context

name (IN) - element type name

attrs (IN) - specified or defaulted attributes

Comments

unparsedEntityDecl

Purpose

This callback function receives notification of an unparsed entity declaration.

Syntax

sword (*unparsedEntityDecl)(void *ctx, const oratext *name,
                            const oratext *publicId, const oratext *systemId,
                            const oratext *notationName);

Parameters

ctx (IN) - client context

name (IN) - entity name

publicId (IN) - entity public identifier, or null if not available

systemId (IN) - entity system identifier

notationName (IN) - name of the associated notation

Comments

nsStartElement

Purpose

This callback function receives notification of the start of a namespace for an element.

Syntax

sword (*nsStartElement)(void *ctx, const oratext *qname, const oratext *local,
                       const oratext *namespace, const struct xmlattrs *attrs);

Parameters

ctx (IN) - client context

qname (IN) - element fully qualified name

local (IN) - element local name

namespace (IN) - element namespace (URI)

attrs (IN) - specified or defaulted attributes

Comments

W3C DOM APIs

The Document Object Model (DOM) is an application programming interface (API) for HTML and XML documents. It defines the logical structure of documents and the way a document is accessed and manipulated. In the DOM specification, the term document is used in the broad sense -- increasingly, XML is being used as a way of representing many different kinds of information that may be stored in diverse systems, and much of this would traditionally be seen as data rather than as documents. Nevertheless, XML presents this data as documents, and the DOM may be used to manage this data.

With the DOM, programmers can build documents, navigate their structure, and add, modify, or delete elements and content. Anything found in an HTML or XML document can be accessed, changed, deleted, or added using the DOM, with a few exceptions -- in particular, the DOM interfaces for the XML internal and external subsets have not yet been specified.

One important objective of the W3C specification for the DOM is to provide a standard programming interface that can be used in a wide variety of environments and applications. The DOM is designed to be used with any programming language. Since the DOM standard is object-oriented, for the C adaptation, some changes had to be made:

Reused function names had to be expanded, e.g. getValue in the attribute class is given the unique name getAttrValue, matching the pattern established by getNodeValue.
Also, some functions were added to extend the DOM. For example, there is no function defined which returns the number of children of a node, so numChildNodes was invented, etc.

The implementation of this C DOM interface follows REC-DOM-Level-1-19981001.

DOM Functions

appendChild	Node::appendChild	Append child node to current node
appendData	Node::appendData	Append character data to end of node's current data
cloneNode		Create a new node identical to the current one
createAttribute		Create a new attribute for an element node
createCDATASection		Create a CDATA node
createComment		Create a comment node
createDocumentFragment		Create a document fragment node
createElement		Create an element node
createEntityReference		Create an entity reference node
createProcessingInstruction		Create a processing instruction (PI) node
createTextNode		Create a text node
deleteData		Remove substring from a node's character data
getAttrName		Return an attribute's name
getAttrSpecified		Return value of attribute's specified flag [DOM getSpecified]
getAttrValue		Return an attribute's value (definition) [DOM getValue]
getAttribute		Return the value of an attribute
getAttributeIndex		Return an element's attribute given its index
getAttributeNode		Get an element's attribute node given its name [DOM getName]
getAttributes		Return array of element's attributes
getCharData		Return character data for a TEXT node [DOM getData]
getCharLength		Return length of TEXT node's character data [DOM getLength]
getChildNode		Return indexed node from array of nodes [DOM item]
getChildNodes		Return array of node's children
getContentModel		Returns the content model for an element from the DTD [DOM extension]
getDocument		Return top-level DOCUMENT node [DOM extension]
getDocumentElement		Return highest-level (root) ELEMENT node
getDocType		Return current DTD
getDocTypeEntities		Return array of DTD's general entities
getDocTypeName		Return name of DTD
getDocTypeNotations		Return array of DTD's notations
getElementsByTagName		Return list of elements with matching name
getEntityNotation		Return an entity's NDATA [DOM getNotation]
getEntityPubID		Return an entity's public ID [DOM getPublicId]
getEntitySysID		Return an entity's system ID [DOM getSystemId]
getFirstChild		Return the first child of a node
getImplementation		Return DOM-implementation structure (if defined)
getLastChild		Return the last child of a node
getNextSibling		Return a node's next sibling
getNamedItem		Returns the named node from a list of nodes
getNodeMapLength		Returns number of entries in a NodeMap [DOM getLength]
getNodeName		Returns a node's name
getNodeType		Returns a node's type code (enumeration)
getNodeValue		Returns a node's "value", its character data
getNotationPubID		Returns a notation's public ID [DOM getPublicId]
getNotationSysID		Returns a notation's system ID [DOM getSystemId]
getOwnerDocument		Returns the DOCUMENT node containing the given node
getPIData		Returns a processing instruction's data [DOM getData]
getPITarget		Returns a processing instruction's target [DOM getTarget]
getParentNode		Returns a node's parent node
getPreviousSibling		Returns a node's "previous" sibling
getTagName		Returns a node's "tagname", same as name for now
hasAttributes		Determine if element node has attributes [DOM extension]
hasChildNodes		Determine if node has children
hasFeature		Determine if DOM implementation supports a specific feature
insertBefore		Inserts a new child node before the given reference node
insertData		Inserts new character data into a node's existing data
isStandalone		Determine if document is standalone [DOM extension]
nodeValid		Validate a node against the current DTD [DOM extension]
normalize		Normalize a node by merging adjacent TEXT nodes
numAttributes		Returns number of element node's attributes [DOM extension]
numChildNodes		Returns number of node's children [DOM extension]
removeAttribute		Removes an element's attribute given its name
removeAttributeNode		Removes an element's attribute given its pointer
removeChild		Removes a node from its parent's list of children
removeNamedItem		Removes a node from a list of nodes given its name
replaceChild		Replace one node with another
replaceData		Replace a substring of a node's character data with another string
setAttribute		Sets (adds or replaces) a new attribute for an element node given the attribute's name and value
setAttributeNode		Sets (adds or replaces) a new attribute for an element node given a pointer to the new attribute
setNamedItem		Sets (adds or replaces) a new node in a parent's list of children
setNodeValue		Sets a node's "value" (character data)
setPIData		Sets a processing instruction's data [DOM setData]
splitText		Split a node's character data into two parts
substringData		Return a substring of a node's character data

Function Prototypes

appendChild

Purpose

Adds new node to the end of the list of children for the given parent and returns the node added.

C Prototype

xmlnode *appendChild(xmlctx *ctx, xmlnode *parent, xmlnode *newnode)

C++ Prototype

Node* Node::appendChild(Node *newChild)

Parameters

ctx	(IN)	XML context
parent	(IN)	parent node
newnode	(IN)	new node to append

C Example

xmlnode *node, *parent;
...

if ((node = createElement(ctx, (oratext *) "node")) != NULL)
    appendChild(ctx, parent, node);

appendData

Purpose

Append the given string to the character data of a TEXT or CDATA node.

Syntax

void appendData(xmlctx *ctx, xmlnode *node, const oratext *arg)

Parameters

ctx	(IN)	XML context
node	(IN)	pointer to node
arg	(IN)	new data to append

Example

xmlnode *node;
...
getNodeValue(node) -> "foo"
appendData(ctx, node, "bar");
getNodeValue(node) -> "foobar"

cloneNode

Purpose

Returns a duplicate of this node, i.e., serves as a generic copy constructor for nodes. The duplicate node has no parent (getParentNode returns NULL).

Cloning an Element copies all attributes and their values, including those generated by the XML processor to represent defaulted attributes, but this method does not copy any text it contains unless it is a deep clone, since the text is contained in a child Text node. Cloning any other type of node simply returns a copy of this node.

A deep clone differs in that the node's children are also recursively cloned instead of just pointed-to.

Syntax

xmlnode *cloneNode(xmlctx *ctx, const xmlnode *old, boolean deep)

Parameters

ctx	(IN)	XML context
old	(IN)	old node to clone
deep	(IN)	recursion flag
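
The difference between a shallow and a deep clone can be illustrated with a toy tree. This sketch does NOT use the parser's xmlnode type or its cloneNode function: toy_node and the toy_... functions are hypothetical, attributes are left out, and error handling and freeing are omitted for brevity.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy node: a name plus parent, first-child and next-sibling links. */
typedef struct toy_node {
    const char      *name;
    struct toy_node *parent;
    struct toy_node *first_child;
    struct toy_node *next_sibling;
} toy_node;

/* Allocate an unattached node; returns NULL if allocation fails. */
toy_node *toy_new(const char *name)
{
    toy_node *n = (toy_node *) calloc(1, sizeof *n);

    if (n != NULL)
        n->name = name;
    return n;
}

/* Add child at the end of parent's list of children. */
void toy_append(toy_node *parent, toy_node *child)
{
    toy_node **link = &parent->first_child;

    assert(parent != NULL && child != NULL);
    while (*link != NULL)
        link = &(*link)->next_sibling;
    *link = child;
    child->parent = parent;
}

/* Copy a node. The copy never has a parent. With deep == 0 the copy has
 * no children; otherwise every descendant is copied recursively. */
toy_node *toy_clone(const toy_node *old, int deep)
{
    toy_node       *copy = toy_new(old->name);
    const toy_node *kid;

    if (copy == NULL || !deep)
        return copy;
    for (kid = old->first_child; kid != NULL; kid = kid->next_sibling) {
        toy_node *kid_copy = toy_clone(kid, 1);

        if (kid_copy == NULL)
            break;                  /* out of memory: return a partial copy */
        toy_append(copy, kid_copy);
    }
    return copy;
}

/* Count the direct children of a node. */
unsigned toy_count_children(const toy_node *n)
{
    unsigned        count = 0;
    const toy_node *kid;

    for (kid = n->first_child; kid != NULL; kid = kid->next_sibling)
        count++;
    return count;
}
```

For a BOOK node with TITLE and AUTHOR children, the shallow copy has no children while the deep copy has two, each a new node distinct from the original child.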

createAttribute

Purpose

Create a new ATTRIBUTE node with the given name and value. The new node is unattached and must be added to an element node with setAttributeNode.

Syntax

xmlnode *createAttribute(xmlctx *ctx, const oratext *name, const oratext *value)

Parameters

ctx	(IN)	XML context
name	(IN)	name of new attribute
value	(IN)	value of new attribute

Example

xmlnode *attr, *elem;
...
if (attr = createAttribute(ctx, "attr1", "value1"))
{
    setAttributeNode(ctx, elem, attr, NULL);
}

createCDATASection

Purpose

Create a new CDATA node.

Syntax

xmlnode *createCDATASection(xmlctx *ctx, const oratext *data)

Parameters

ctx	(IN)	XML context
data	(IN)	CDATA body

Example

xmlnode *node, *parent;
...
if (node = createCDATASection(ctx, "<greeting>H'o!</greeting>"))
    appendChild(ctx, parent, node);

createComment

Purpose

Create a new COMMENT node.

Syntax

xmlnode *createComment(xmlctx *ctx, const oratext *data)

Parameters

ctx	(IN)	XML context
data	(IN)	text of comment

Example

xmlnode *node, *parent;
...
if (node = createComment(ctx, "From here on this document is unfinished"))
    appendChild(ctx, parent, node);

createDocumentFragment

Purpose

Create a new DOCUMENT_FRAGMENT node. A document fragment is a lightweight document object that contains one or more children, but does not have the overhead of a full document. It can be used in some operations (inserting for example) in place of a simple node, in which case all the fragment's children are operated on instead of the fragment node itself.

Syntax

xmlnode *createDocumentFragment(xmlctx *ctx)

Parameters

ctx	(IN)	XML context

Example

xmlnode *frag, *fragelem, *fragtext;
...
if ((frag = createDocumentFragment(ctx)) &&
    (fragelem = createElement(ctx, (oratext *) "FragElem")) &&
    (fragtext = createTextNode(ctx, (oratext *) "FragText")))
{
    appendChild(ctx, frag, fragelem);
    appendChild(ctx, frag, fragtext);
}

createElement

Purpose

Create a new ELEMENT node.

Syntax

xmlnode *createElement(xmlctx *ctx, const oratext *elname)

Parameters

ctx	(IN)	XML context
elname	(IN)	name of new element

Example

xmlnode *node, *parent;
...
if (node = createElement(ctx, "BOOK"))
    appendChild(ctx, parent, node);

createEntityReference

Purpose

Create a new ENTITY_REFERENCE node.

Syntax

xmlnode *createEntityReference(xmlctx *ctx, const oratext *name)

Parameters

ctx	(IN)	XML context
name	(IN)	name of entity to reference

Example

xmlnode *node, *parent;
...
if (node = createEntityReference(ctx, "homephone"))
    appendChild(ctx, parent, node);

createProcessingInstruction

Purpose

Create a new PROCESSING_INSTRUCTION node with the given target and contents.

Syntax

xmlnode *createProcessingInstruction(xmlctx *ctx, const oratext *target,
                                     const oratext *data)

Parameters

ctx	(IN)	XML context
target	(IN)	PI target
data	(IN)	PI definition

Example

xmlnode *node, *parent;
...
if (node = createProcessingInstruction(ctx, "target", "definition"))
    appendChild(ctx, parent, node);

createTextNode

Purpose

Create a new TEXT node with the given contents.

Syntax

xmlnode *createTextNode(xmlctx *ctx, const oratext *data)

Parameters

ctx	(IN)	XML context
data	(IN)	data for node

Example

xmlnode *node, *parent;
...
if (node = createTextNode(ctx, "riverrun, past Eve and Adam's..."))
    appendChild(ctx, parent, node);

deleteData

Purpose

Delete a substring from the node's character data.

Syntax

void deleteData(xmlctx *ctx, xmlnode *node, ub4 offset, ub4 count)

Parameters

ctx	(IN)	XML context
node	(IN)	pointer to node
offset	(IN)	offset of start of substring (0 is first char)
count	(IN)	length of substring

Example

xmlnode *node;
...
getNodeValue(node) -> "phoenix"
deleteData(ctx, node, 2, 1);
getNodeValue(node) -> "phenix"
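
The offset and count arguments can be modeled on an ordinary C string. The following is a minimal standalone sketch, not the parser's implementation: delete_range is a hypothetical helper, it assumes single-byte (ASCII) characters, and it clamps a count that runs past the end of the data, which is how DOM Level 1 describes deleteData but is only assumed for this parser.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of the parser API): removes `count`
 * bytes starting at zero-based `offset` from the NUL-terminated string
 * `buf`, in place. Assumes single-byte characters.
 * Returns 0 on success, -1 if offset is past the end of the data. */
int delete_range(char *buf, size_t offset, size_t count)
{
    size_t len;

    assert(buf != NULL);
    len = strlen(buf);
    if (offset > len)
        return -1;
    if (count > len - offset)       /* clamp (assumed behavior) */
        count = len - offset;
    /* shift the tail, including the terminating NUL, left over the gap */
    memmove(buf + offset, buf + offset + count, len - offset - count + 1);
    return 0;
}
```

Applied to a writable buffer holding "phoenix" with offset 2 and count 1, the helper leaves "phenix", matching the example above.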

getAttribute

Purpose

Returns the value of the attribute with the given name on the given node. On error, returns NULL.

Syntax

const oratext *getAttribute(const xmlnode *node, const oratext *name)

Parameters

node	(IN)	node whose attributes to scan
name	(IN)	name of the attribute

Example

xmlnode *node;
const oratext *attrval;
...
attrval = getAttribute(node, "foo");    /* value of attribute "foo" */

getAttributeIndex

Purpose

Returns one attribute from an array of attributes, given an index (starting at 0). Fetch the attribute name and/or value (with getAttrName and getAttrValue). On error, returns NULL.

Syntax

xmlnode *getAttributeIndex(const xmlnodes *attrs, size_t index)

Parameters

attrs	(IN)	pointer to attribute nodes structure (as returned by getAttributes)
index	(IN)	zero-based attribute# to return

Example

xmlnode  *node, *attr;
xmlnodes *nodes;
...
if (nodes = getAttributes(node))
{
    attr = getAttributeIndex(nodes, 1);      /* second attribute */
    ...
}

getAttributeNode

Purpose

Returns a pointer to the element node's attribute of the given name. If no such thing exists, returns NULL.

Syntax

xmlnode *getAttributeNode(const xmlnode *elem, const oratext *name)

Parameters

elem	(IN)	pointer to element node
name	(IN)	name of attribute

Example

xmlnode *node, *attr;
...
if (attr = getAttributeNode(elem, "attr1"))
    ...

getAttributes

Purpose

Returns an array of all attributes of the given node. This pointer may then be passed to getAttributeIndex to fetch individual attribute pointers, or to numAttributes to return the total number of attributes. If no attributes are defined, returns NULL.

Syntax

xmlnodes *getAttributes(const xmlnode *node)

Parameters

node	(IN)	node whose attributes to return

Example

xmlnode  *node;
xmlnodes *nodes;
...
if (nodes = getAttributes(node))
    ...

getAttrName

Purpose

Given a pointer to an attribute, returns the name of the attribute. Under the DOM spec, this is a method named getName.

Syntax

const oratext *getAttrName(const xmlnode *attr)

Parameters

attr	(IN)	pointer to attribute (see getAttribute)

Example

xmlnode *elem, *attr;
...
attr = setAttribute(ctx, elem, "x", "y");
getAttrName(attr) -> "x"

getAttrSpecified

Purpose

Return the 'specified' flag for the attribute: if this attribute was explicitly given a value in the original document or through the DOM, this is TRUE; otherwise, it is FALSE. If the node is not an attribute, returns FALSE. Under the DOM spec, this is a method named getSpecified.

Syntax

boolean getAttrSpecified(const xmlnode *attr)

Parameters

attr	(IN)	pointer to attribute (see getAttribute)

Example

xmlnode *elem, *attr;
...
attr = setAttribute(ctx, elem, "x", "y");
getAttrSpecified(attr) -> TRUE

getAttrValue

Purpose

Given a pointer to an attribute, returns the "value" (definition) of the attribute. Under the DOM spec, this is a method named getValue.

Syntax

const oratext *getAttrValue(const xmlnode *attr)

Parameters

attr	(IN)	pointer to attribute (see getAttribute)

Example

xmlnode *elem, *attr;
...
attr = setAttribute(ctx, elem, "x", "y");
getAttrValue(attr) -> "y"

getCharData

Purpose

Returns the character data of a TEXT or CDATA node. Under the DOM spec, this is a method named getData.

Syntax

const oratext *getCharData(const xmlnode *node)

Parameters

node	(IN)	pointer to text node

Example

xmlnode *node;
...
if (node = createTextNode(ctx, "riverrun"))
    getCharData(node) -> "riverrun"

getCharLength

Purpose

Returns the length of the character data of a TEXT or CDATA node. Under the DOM spec, this is a method named getLength.

Syntax

ub4 getCharLength(const xmlnode *node)

Parameters

node	(IN)	pointer to text node

Example

xmlnode *node;
...
if (node = createTextNode(ctx, "prumptly"))
    getCharLength(node) -> 8

getChildNode

Purpose

Returns the nth node in an array of nodes, or NULL if the numbered node does not exist. Invented function, not in DOM, but named to match the DOM pattern.

Syntax

xmlnode* getChildNode(const xmlnodes *nodes, size_t index)

Parameters

nodes	(IN)	array of nodes (see getChildNodes)
index	(IN)	zero-based child#

Example

xmlnode  *node, *child;
xmlnodes *nodes;
...
if (nodes = getChildNodes(node))
{
    child = getChildNode(nodes, 1);/* second child node */
    ...
}

getChildNodes

Purpose

Returns the array of children of the given node. This pointer may then be passed to getChildNode to fetch individual children.

Syntax

xmlnodes* getChildNodes(const xmlnode *node)

Parameters

node	(IN)	node whose children to return

Example

xmlnode  *node;
xmlnodes *nodes;
...
if (nodes = getChildNodes(node))
    ...

getContentModel

Purpose

Returns the content model for the named element from the current DTD. The content model is composed of xmlnodes, so may be traversed with the same functions as the parsed document. See also the getModifier function which returns the '?', '*', and '+' modifiers to content model nodes.

Syntax

xmlnode *LpxGetContentModel(xmldtd *dtd, oratext *name)

Parameters

dtd	(IN)	pointer to the DTD
name	(IN)	name of element

getDocType

Purpose

Returns a pointer to the (opaque) DTD for the current document.

Syntax

xmldtd* getDocType(xmlctx *ctx)

Parameters

ctx	(IN)	XML parser context

Example


xmlnodes *nodes;
...
nodes = getDocTypeEntities(getDocType(ctx));

getDocTypeEntities

Purpose

Returns an array of (general) entities defined for the given DTD.

Syntax

xmlnodes *getDocTypeEntities(xmldtd* dtd)

Parameters

dtd	(IN)	pointer to DTD

Example


xmldtd   *dtd;
xmlnodes *entities;
...
dtd = getDocType(ctx);
entities = getDocTypeEntities(dtd);

getDocTypeName

Purpose

Returns the given DTD's name.

Syntax

oratext *getDocTypeName(xmldtd* dtd)

Parameters

dtd	(IN)	pointer to DTD

getDocTypeNotations

Purpose

Returns an array of notations defined for the given DTD.

Syntax

xmlnodes *getDocTypeNotations(xmldtd* dtd)

Parameters

dtd	(IN)	pointer to DTD

Example


xmldtd   *dtd;
xmlnodes *notations;
...
dtd = getDocType(ctx);
notations = getDocTypeNotations(dtd);

getElementsByTagName

Purpose

Returns a list of all elements (within the tree rooted at the given node) with a given tag name in the order in which they would be encountered in a pre-order traversal of the tree. If root is NULL, the entire document is searched. The special value "*" matches all tags.

Syntax

xmlnodes *getElementsByTagName(xmlctx *ctx, xmlnode *root, const oratext *name)

Parameters

ctx	(IN)	XML parser context
root	(IN)	root node of tree
name	(IN)	element tag name

Example

xmlnodes *nodes;
...
nodes = getElementsByTagName(ctx, NULL, "ACT");/* find all ACT elements */
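
The pre-order result ordering can be illustrated on a toy tree. This is a minimal standalone sketch, not the parser's implementation: struct toy_node, toy_elements_by_name, and the walk helper are hypothetical names, and the sketch tests the root node itself as well as its descendants, which is an assumption that may not match the parser.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy node type for illustration only; xmlnode itself is opaque. */
struct toy_node {
    const char      *name;
    struct toy_node *first_child;
    struct toy_node *next_sibling;
};

/* Visits `node` first, then each child subtree from left to right. */
static void walk(struct toy_node *node, const char *name,
                 struct toy_node **out, size_t max, size_t *count)
{
    struct toy_node *kid;

    if (strcmp(name, "*") == 0 || strcmp(node->name, name) == 0) {
        if (*count < max)
            out[*count] = node;     /* store only while there is room */
        (*count)++;
    }
    for (kid = node->first_child; kid != NULL; kid = kid->next_sibling)
        walk(kid, name, out, max, count);
}

/* Stores up to `max` matching nodes in out[] in pre-order and returns
 * the total number of matches found ("*" matches every name). */
size_t toy_elements_by_name(struct toy_node *root, const char *name,
                            struct toy_node **out, size_t max)
{
    size_t count = 0;

    assert(root != NULL && name != NULL && out != NULL);
    walk(root, name, out, max, &count);
    return count;
}
```

Because a node is visited before its children, a match nested deep inside an early subtree is reported before a match in a later sibling.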

getDocument

Purpose

Returns the root node of the parsed document. The root node is always of type DOCUMENT_NODE. Compare to the getDocumentElement function, which returns the root element node, which is a child of the DOCUMENT node.

Syntax

xmlnode* getDocument(xmlctx *ctx)

Parameters

ctx	(IN)	XML parser context

getDocumentElement

Purpose

Returns the root element (node) of the parsed document. The entire document is rooted at this node. Compare to getDocument which returns the uppermost DOCUMENT node (the parent of the root element node).

Syntax

xmlnode* getDocumentElement(xmlctx *ctx)

Parameters

ctx	(IN)	XML parser context

getEntityNotation

Purpose

Returns an entity node's NDATA (notation). Under the DOM spec, this is a method named getNotationName.

Syntax

const oratext *getEntityNotation(const xmlnode *ent)

Parameters

ent	(IN)	pointer to entity

Example


<!NOTATION n SYSTEM "http://www.w3.org/">
<!ENTITY e SYSTEM "http://www.w3.org/" NDATA n>

xmlnode *ent;/* assume ent will be set to ENTITY node above */
...
getEntityNotation(ent) -> "n"

getEntityPubID

Purpose

Returns an entity node's public ID. Under the DOM spec, this is a method named getPublicId.

Syntax

const oratext *getEntityPubID(const xmlnode *ent)

Parameters

ent	(IN)	pointer to entity

Example


<!ENTITY e PUBLIC "PublicID" "nop.ent">

xmlnode *ent;/* assume ent will be set to ENTITY node above */
...
getEntityPubID(ent) -> "PublicID"

getEntitySysID

Purpose

Returns an entity node's system ID. Under the DOM spec, this is a method named getSystemId.

Syntax

const oratext *getEntitySysID(const xmlnode *ent)

Parameters

ent	(IN)	pointer to entity

Example


<!ENTITY e PUBLIC "PublicID" "nop.ent">

xmlnode *ent;/* assume ent will be set to ENTITY node above */
...
getEntitySysID(ent) -> "nop.ent"

getFirstChild

Purpose

Returns the first child of the given node, or NULL if the node has no children.

Syntax

xmlnode* getFirstChild(const xmlnode *node)

Parameters

node	(IN)	pointer to node

Example

<Thing><A/><B/><C/></Thing>

xmlnode *elem;/* assume elem will point to element Thing */
...
getFirstChild(elem) -> element "A"

getImplementation

Purpose

This function returns a pointer to the DOMImplementation structure for this implementation, or NULL if no such information is available.

Syntax

xmldomimp* getImplementation(xmlctx *ctx)

Parameters

ctx	(IN)	XML context

getLastChild

Purpose

Returns the last child of the given node, or NULL if the node has no children.

Syntax

xmlnode* getLastChild(const xmlnode *node)

Parameters

node	(IN)	pointer to node

Example

<Thing><A/><B/><C/></Thing>

xmlnode *elem;/* assume elem will point to element Thing */
...
getLastChild(elem) -> element "C"

getNamedItem

Purpose

Returns the named node from an array of nodes; sets the user's index (if provided) to the child# of the node (first node is zero).

Syntax

xmlnode *getNamedItem(const xmlnodes *nodes, const oratext *name, size_t *index)

Parameters

nodes	(IN)	array of nodes
name	(IN)	name of node to fetch
index	(OUT)	index of found node

Example


xmlnode  *node, *elem;
xmlnodes *nodes;
size_t    index;
...
if (nodes = getChildNodes(elem))
{
    node = getNamedItem(nodes, "FOO", &index);
    ...
}
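
The optional index output follows a common C pattern: the caller passes either a pointer to receive the position or NULL to ignore it. The following is a minimal standalone sketch of that pattern over an array of names; find_named is a hypothetical helper, not part of the parser API, and exact (case-sensitive) name comparison is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of the parser API: returns the first
 * entry of names[0..n-1] equal to `name`, or NULL if none matches.
 * When a match is found and `index` is not NULL, the zero-based
 * position of the match is stored through it. */
const char *find_named(const char *const *names, size_t n,
                       const char *name, size_t *index)
{
    size_t i;

    assert(names != NULL && name != NULL);
    for (i = 0; i < n; i++) {
        if (strcmp(names[i], name) == 0) {
            if (index != NULL)
                *index = i;
            return names[i];
        }
    }
    return NULL;
}
```

In this sketch the index is left untouched when nothing matches, so a caller should test the return value before reading it.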

getNextSibling

Purpose

This function returns a pointer to the next sibling of the given node, that is, the next child of the parent. For the last child, NULL is returned.

Syntax

xmlnode* getNextSibling(const xmlnode *node)

Parameters

node	(IN)	pointer to node

Example


<Thing><A/><B/><C/></Thing>

xmlnode *node, *elem;/* assume elem will point to node Thing */
...
for (node = getFirstChild(elem); node; node = getNextSibling(node))
    ...node will be A then B then C...

getNodeMapLength

Purpose

Given an array of nodes (as returned by getChildNodes), returns the number of nodes in the map. Under the DOM spec, this is a member function named getLength.

Syntax

size_t getNodeMapLength(const xmlnodes *nodes)

Parameters

nodes	(IN)	array of nodes

Example


<Thing><A/><B/><C/></Thing>

xmlnodes *nodes;
xmlnode  *elem;/* assume elem will point to node Thing */
...
if (nodes = getChildNodes(elem))
    getNodeMapLength(nodes) -> 3

getNodeName

Purpose

Returns the name of the given node, or NULL if the node has no name. Note that "tagname" and "name" are currently synonymous.

Syntax

const oratext* getNodeName(const xmlnode *node)

Parameters

node	(IN)	pointer to node

Example


<Thing><A/><B/><C/></Thing>

xmlnode  *elem;/* assume elem will point to node Thing */
...
getNodeName(elem) -> "Thing"

getNodeType

Purpose

Returns the type code for a node.

Syntax

xmlntype getNodeType(const xmlnode *node)

Parameters

node	(IN)	pointer to node

Example


<Thing><A/><B/><C/></Thing>

xmlnode  *elem;/* assume elem will point to node Thing */
...
getNodeType(elem) -> ELEMENT_NODE

getNodeValue

Purpose

Returns the "value" (associated character data) for a node, or NULL if the node has no data.

Syntax

const oratext* getNodeValue(const xmlnode *node)

Parameters

node	(IN)	pointer to node

Example


<!--This is a comment-->

xmlnode *node;/* assume node will point to comment node above */
...
getNodeValue(node) -> "This is a comment"

getNotationPubID

Purpose

Return a notation node's public ID. Under the DOM spec, this is a method named getPublicId.

Syntax

const oratext *getNotationPubID(const xmlnode *note)

Parameters

note	(IN)	pointer to node

Example


<!NOTATION n PUBLIC "whatever">

xmlnode *note;/* assume note will point to notation node above */
...
getNotationPubID(note) -> "whatever"

getNotationSysID

Purpose

Return a notation node's system ID. Under the DOM spec, this is a method named getSystemId.

Syntax

const oratext *getNotationSysID(const xmlnode *note)

Parameters

note	(IN)	pointer to node

Example


<!NOTATION n SYSTEM "http://www.w3.org/">

xmlnode *note;/* assume note will point to notation node above */
...
getNotationSysID(note) -> "http://www.w3.org/"

getOwnerDocument

Purpose

Returns the document node which contains the given node. An XML document is always rooted in a node of type DOCUMENT_NODE. Calling getOwnerDocument on any node in the document returns that document node.

Syntax

xmlnode* getOwnerDocument(xmlnode *node)

Parameters

node	(IN)	pointer to node

getParentNode

Purpose

Returns the parent node of the given node. For the top-most node, NULL is returned.

Syntax

xmlnode* getParentNode(const xmlnode *node)

Parameters

node	(IN)	pointer to node

Example

<Thing><A/><B/><C/></Thing>

xmlnode  *elem;/* assume elem will point to node A */
...
getParentNode(elem) -> node Thing

getPIData

Purpose

Returns a Processing Instruction's (PI) data string. Under the DOM spec, this is a method named getData.

Syntax

const oratext *getPIData(const xmlnode *pi)

Parameters

pi	(IN)	pointer to PI node

Example


<?PI Blither blather?>

xmlnode *pi;/* assume pi will point to PI node above */
...
getPIData(pi) -> "Blither blather"

getPITarget

Purpose

Returns a Processing Instruction's (PI) target string. Under the DOM spec, this is a method named getTarget.

Syntax

const oratext *getPITarget(const xmlnode *pi)

Parameters

pi	(IN)	pointer to PI node

Example


<?PI Blither blather?>

xmlnode *pi;/* assume pi will point to PI node above */
...
getPITarget(pi) -> "PI"

getPreviousSibling

Purpose

Returns the previous sibling of the given node. That is, the node at the same level which came before this one. For the first child of a node, NULL is returned.

Syntax

xmlnode* getPreviousSibling(const xmlnode *node)

Parameters

node	(IN)	pointer to node

Example


<Thing><A/><B/><C/></Thing>

xmlnode *node, *elem;/* assume elem will point to node Thing */
...
for (node = getLastChild(elem); node; node = getPreviousSibling(node))
    ...node will be C then B then A...

getTagName

Purpose

Returns the "tagname" of a node, which is currently the same as its name (see getNodeName). The DOM says "...even though there is a generic nodeName attribute on the Node interface, there is still a tagName attribute on the Element interface; these two attributes must contain the same value, but the Working Group considers it worthwhile to support both, given the different constituencies the DOM API must satisfy."

Syntax

const oratext *getTagName(const xmlnode *node)

Parameters

node	(IN)	pointer to node

hasAttributes

Purpose

Determines if the given node has any defined attributes, returning TRUE if so, FALSE if not. This is a DOM extension named after the pattern started by hasChildNodes.

Syntax

boolean hasAttributes(const xmlnode *node)

Parameters

node	(IN)	pointer to node

hasChildNodes

Purpose

Determines if the given node has children, returning TRUE if so, FALSE if not. The same result can be achieved by testing if getChildNodes returns a pointer (has children) or NULL (no children).

Syntax

boolean hasChildNodes(const xmlnode *node)

Parameters

node	(IN)	pointer to node

hasFeature

Purpose

Tests if the DOM implementation implements a specific feature and version. feature is the package name of the feature to test. In DOM Level 1, the legal values are "HTML" and "XML" (case-insensitive). version is the version number of the package name to test. In DOM Level 1, this is the string "1.0". If the version is not specified, supporting any version of the feature will cause the method to return TRUE.

Syntax

boolean hasFeature(xmlctx *ctx, const oratext *feature, const oratext *version)

Parameters

ctx	(IN)	XML context
feature	(IN)	the package name of the feature to test
version	(IN)	the version number of the package name to test

insertBefore

Purpose

Inserts a new node into the given parent node's list of children before the existing reference node. If the reference node is NULL, appends the new node at the end of the list. If the new node is a DocumentFragment, its children are inserted, in the same order, instead of the fragment itself. If the new node is already in the tree, it is first removed.

Syntax

xmlnode *insertBefore(xmlctx *ctx, xmlnode *parent,
                      xmlnode *newChild, xmlnode *refChild)

Parameters

ctx	(IN)	XML context
parent	(IN)	parent node to insert into
newChild	(IN)	new child node to insert
refChild	(IN)	reference node to insert before

Example


<Thing><A/><B/><C/></Thing>

xmlnode *elem, *new, *ref;    /* assume elem points to Thing, new is a new
                                 element "Z", and ref points to node B */
...
insertBefore(ctx, elem, new, ref);

<Thing><A/><Z/><B/><C/></Thing>
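
The insert-before-reference rule, including the append case when the reference is NULL, can be modeled on a singly linked child list. This is a minimal standalone sketch under stated assumptions: struct toy_item and toy_insert_before are hypothetical names, the real child list is opaque, and the sketch does not handle DocumentFragment nodes or nodes already in the tree.

```c
#include <assert.h>
#include <stddef.h>

/* Toy child-list entry for illustration only. */
struct toy_item {
    const char      *name;
    struct toy_item *next;
};

/* Inserts `node` into the list at *head immediately before `ref`.
 * A NULL `ref` appends at the end. Returns `node`, or NULL if a
 * non-NULL `ref` is not found in the list. */
struct toy_item *toy_insert_before(struct toy_item **head,
                                   struct toy_item *node,
                                   struct toy_item *ref)
{
    struct toy_item **link;

    assert(head != NULL && node != NULL);
    /* advance to the link that points at ref (the end link if ref is NULL) */
    for (link = head; *link != ref; link = &(*link)->next) {
        if (*link == NULL)
            return NULL;            /* ref is not in this list */
    }
    node->next = ref;
    *link = node;
    return node;
}
```

Walking a pointer to the link rather than a pointer to the entry lets insertion at the head, in the middle, and at the end share one code path.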

insertData

Purpose

Inserts a string into the node character data at the specified offset.

Syntax

void insertData(xmlctx *ctx, xmlnode *node, ub4 offset, const oratext *arg)

Parameters

ctx	(IN)	XML context
node	(IN)	pointer to node
offset	(IN)	insertion point (0 is first position)
arg	(IN)	new string to insert

Example


xmlnode *node;
...
getNodeValue(node) -> "abcdefg"
insertData(ctx, node, 3, "ZZZ");
getNodeValue(node) -> "abcZZZdefg"
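
The insertion offset can be modeled on an ordinary C string held in a fixed-size buffer. The following is a minimal standalone sketch, not the parser's implementation: insert_at and its cap parameter are hypothetical, and single-byte (ASCII) characters are assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of the parser API): inserts `arg` into
 * the NUL-terminated string `buf` at zero-based `offset`. `cap` is the
 * total size of `buf` in bytes. Returns 0 on success, -1 if offset is
 * past the end of the data or the result would not fit. */
int insert_at(char *buf, size_t cap, size_t offset, const char *arg)
{
    size_t len, add;

    assert(buf != NULL && arg != NULL);
    len = strlen(buf);
    add = strlen(arg);
    if (offset > len || add >= cap || len >= cap - add)
        return -1;
    /* open a gap by moving the tail (and its NUL) right, then fill it */
    memmove(buf + offset + add, buf + offset, len - offset + 1);
    memcpy(buf + offset, arg, add);
    return 0;
}
```

With a 16-byte buffer holding "abcdefg", inserting "ZZZ" at offset 3 yields "abcZZZdefg", matching the example above.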

isStandalone

Purpose

Returns the value of the standalone flag as specified in the document's <?xml?> processing instruction. This is an invented function, not in DOM spec, but named to match the DOM pattern.

Syntax

boolean isStandalone(xmlctx *ctx)

Parameters

ctx	(IN)	XML parser context

nodeValid

Purpose

Validate a node against the DTD. Returns 0 on success, else a non-zero error code (which can be looked up in the message file). This function is provided for applications which construct their own documents via the API and/or Class Generator. Normally the parser will validate the document and the user need not call nodeValid explicitly.

Syntax

uword nodeValid(xmlctx *ctx, const xmlnode *node)

Parameters

ctx	(IN)	XML context
node	(IN)	pointer to node

normalize

Purpose

"Normalizes" an element, i.e. merges adjacent TEXT nodes. Adjacent TEXT nodes don't happen during a normal parse, only when extra nodes are inserted via the DOM.

Syntax

void normalize(xmlctx *ctx, xmlnode *elem)

Parameters

ctx	(IN)	XML context
elem	(IN)	pointer to element node

Example


xmlnode *node, *t1, *t2;
...
if ((node = createElement(ctx, "FOO")) &&
    (t1 = createTextNode(ctx, "one of ")) &&
    (t2 = createTextNode(ctx, "these days")) &&
    appendChild(ctx, node, t1) &&
    appendChild(ctx, node, t2))
{
    <FOO>"one of " "these days"</FOO>
    normalize(ctx, node);
    <FOO>"one of these days"</FOO>
}
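
The merge step can be modeled on a flat array of children. This is a minimal standalone sketch, not the parser's implementation: struct toy_child, the TOY_ kinds, and toy_normalize are hypothetical, the 64-byte text field is an arbitrary toy limit, and text that would overflow it is silently truncated.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum toy_kind { TOY_ELEMENT, TOY_TEXT };

/* Toy child entry for illustration only. */
struct toy_child {
    enum toy_kind kind;
    char          text[64];     /* name for elements, data for text */
};

/* Merges every run of adjacent TOY_TEXT children into the first entry
 * of the run, compacting the array in place. Returns the new count. */
size_t toy_normalize(struct toy_child *kids, size_t n)
{
    size_t in, out = 0;

    assert(kids != NULL || n == 0);
    for (in = 0; in < n; in++) {
        if (out > 0 && kids[in].kind == TOY_TEXT &&
            kids[out - 1].kind == TOY_TEXT) {
            /* append to the previous text child instead of keeping this one */
            strncat(kids[out - 1].text, kids[in].text,
                    sizeof kids[out - 1].text - strlen(kids[out - 1].text) - 1);
        } else {
            if (out != in)
                kids[out] = kids[in];
            out++;
        }
    }
    return out;
}
```

Non-text children act as barriers: two text children separated by an element are left as separate entries.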

numAttributes

Purpose

Returns the number of defined attributes in an attribute array (as returned by getAttributes). This is an invented function, not in the DOM spec, but named after the DOM pattern.

Syntax

size_t numAttributes(const xmlnodes *attrs)

Parameters

attrs	(IN)	array of attributes

Example


xmlnodes *nodes;
xmlnode  *node;
size_t    i;
...
if (nodes = getAttributes(node))
{
    for (i = 0; i < numAttributes(nodes); i++)
        ...
}

numChildNodes

Purpose

Returns the number of children in an array of nodes (as returned by getChildNodes). This is an invented function, not in the DOM spec, but named after the DOM pattern.

Syntax

size_t numChildNodes(const xmlnodes *nodes)

Parameters

nodes	(IN)	pointer to opaque nodes structure

Example


xmlnodes *nodes;
xmlnode  *elem;
size_t    i;
...
if (nodes = getChildNodes(elem))
{
    for (i = 0; i < numChildNodes(nodes); i++)
        ...
}

removeAttribute

Purpose

Removes the named attribute from an element node. If the removed attribute has a default value it is immediately replaced.

Syntax

void removeAttribute(xmlnode *elem, const oratext *name)

Parameters

elem	(IN)	pointer to element node
name	(IN)	name of attribute to remove

Example


<!ATTLIST FOO attr CDATA 'default'>

xmlnode *elem;    /* assume elem points to a FOO node */
...
<FOO attr="snark"/>
removeAttribute(elem, "attr");
<FOO attr="default"/>

removeAttributeNode

Purpose

Removes an attribute from an element, given a pointer to the attribute. If successful, returns the attribute node back. On error, returns NULL.

Syntax

xmlnode *removeAttributeNode(xmlnode *elem, xmlnode *attr)

Parameters

elem	(IN)	pointer to element node
attr	(IN)	attribute node to remove

Example


xmlnode *elem, *attr;
...
if (attr = getAttributeNode(elem, "attr1"))
    removeAttributeNode(elem, attr);

removeChild

Purpose

Removes the given node from its parent and returns it.

Syntax

xmlnode *removeChild(xmlnode *node)

Parameters

node	(IN)	old node to remove

Example


xmlnodes *nodes;
xmlnode  *elem, *node;
...
if ((nodes = getChildNodes(elem)) &&
    (node = getNamedItem(nodes, "B", NULL)))
{
    <Thing><A/><B/><C/></Thing>
    removeChild(node);
    <Thing><A/><C/></Thing>
}

removeNamedItem

Purpose

Removes the named node from an array of nodes.

Syntax

xmlnode *removeNamedItem(xmlnodes *nodes, const oratext *name)

Parameters

nodes	(IN)	list of nodes
name	(IN)	name of node to remove

Example


xmlnodes *nodes;
xmlnode  *elem;
...
if (nodes = getChildNodes(elem))
{
    <Thing><A/><B/><C/></Thing>
    removeNamedItem(nodes, "B");
    <Thing><A/><C/></Thing>
}

replaceChild

Purpose

Replaces an existing child node with a new node and returns the old node. If the new node is already in the tree, it is first removed.

Syntax

xmlnode *replaceChild(xmlctx *ctx, xmlnode *newChild, xmlnode *oldChild)

Parameters

ctx	(IN)	XML context
newChild	(IN)	new replacement node
oldChild	(IN)	old node being replaced

Example


xmlnodes *nodes;
xmlnode  *elem, *old, *new;
...
if ((nodes = getChildNodes(elem)) &&
    (old = getNamedItem(nodes, "B", NULL)) &&
    (new = createElement(ctx, "NEW")))
{
    <Thing><A/><B/><C/></Thing>
    replaceChild(ctx, new, old);
    <Thing><A/><NEW/><C/></Thing>
}

replaceData

Purpose

Replaces the substring at the given character offset and length with a replacement string.

Syntax

void replaceData(xmlctx *ctx, xmlnode *node, ub4 offset,
                 ub4 count, oratext *arg)

Parameters

ctx	(IN)	XML context
node	(IN)	pointer to node
offset	(IN)	start of substring to replace (0 is first character)
count	(IN)	length of old substring
arg	(IN)	replacement text

Example


xmlnode *node;
...
getNodeValue(node) -> "every dog has his day"
replaceData(ctx, node, 6, 3, "man");
getNodeValue(node) -> "every man has his day"

setAttribute

Purpose

Create a new attribute for an element. If the named attribute already exists, its value is simply replaced.

Syntax

xmlnode *setAttribute(xmlctx *ctx, xmlnode *elem,
                      const oratext *name, const oratext *value)

Parameters

ctx	(IN)	XML context
elem	(IN)	pointer to element node
name	(IN)	name of new attribute
value	(IN)	value of new attribute

Example


xmlnode *elem;
...
<Thing/>
setAttribute(ctx, elem, "attr", "value");
<Thing attr="value"/>

setAttributeNode

Purpose

Adds a new attribute to the given element. If the named attribute already exists, it is replaced and the user's old pointer (if provided) is set to the old attr. If the attribute is new, it is added and the old pointer is set to NULL. Returns a truth value indicating success.

Syntax

boolean setAttributeNode(xmlctx *ctx, xmlnode *elem,
                         xmlnode *newNode, xmlnode **oldNode)

Parameters

ctx	(IN)	XML context
elem	(IN)	pointer to element node
newNode	(IN)	pointer to new attribute
oldNode	(OUT)	return pointer for old attribute

Example


xmlnode *elem, *attr;
...
if (attr = createAttribute(ctx, "attr", "value"))
{
    <Thing/>
    setAttributeNode(ctx, elem, attr, NULL);
    <Thing attr="value"/>
}

setNamedItem

Purpose

Sets a new child node in a parent node's map; if an old node exists with same name, replaces the old node (and sets user's pointer, if provided, to it); if no such named node exists, appends node to map and sets pointer to NULL.

Syntax

boolean setNamedItem(xmlctx *ctx, xmlnode *parent, xmlnode *node, xmlnode **old)

Parameters

ctx	(IN)	XML context
parent	(IN)	parent to add node to
node	(IN)	new node to add
old	(OUT)	pointer to replaced node

Example


xmlnode *elem, *new;
...
if ((new = createElement(ctx, "B")) &&
    setAttribute(ctx, new, "attr", "value"))
{
    <Thing><A/><B/><C/></Thing>
    setNamedItem(ctx, elem, new, NULL);
    <Thing><A/><B attr="value"/><C/></Thing>
}

setNodeValue

Purpose

Sets the value (character data) associated with a node.

Syntax

boolean setNodeValue(xmlnode *node, const oratext *data)

Parameters

node	(IN)	pointer to node
data	(IN)	new data for node

Example


xmlnode *node;
...
getNodeValue(node) -> "umbrella"
setNodeValue(node, "brolly");
getNodeValue(node) -> "brolly"

setPIData

Purpose

Sets a Processing Instruction's (PI) data (equivalent to setNodeValue). It is not permitted to set the data to NULL. Under the DOM spec, this is a method named setData.

Syntax

void setPIData(xmlnode *pi, const oratext *data)

Parameters

pi	(IN)	pointer to PI node
data	(IN)	new data for PI

Example


xmlnode *pi;
...
<?SKRINKLIT Monster Grendel's tastes are plainish?>
setPIData(pi, "Breakfast?  Just a couple Danish.");
<?SKRINKLIT Breakfast?  Just a couple Danish.?>

splitText

Purpose

Breaks a TEXT node into two TEXT nodes at the specified offset, keeping both in the tree as siblings. The original node then contains only the content up to the offset point, and a new node, inserted as the next sibling of the original, contains all the content from the offset point onward.

Syntax

xmlnode *splitText(xmlctx *ctx, xmlnode *old, uword offset)

Parameters

ctx	(IN)	XML context
old	(IN)	original node to split
offset	(IN)	offset of split point

Example


xmlnode *node;
...
<FOO>"one of these days"</FOO>
splitText(ctx, node, 7);
<FOO>"one of " "these days"</FOO>
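
The split point can be modeled on an ordinary C string. The following is a minimal standalone sketch, not the parser's implementation: split_at is a hypothetical helper, single-byte (ASCII) characters are assumed, and the tail is returned in heap memory that the caller must free.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of the parser API): truncates `text`
 * at zero-based `offset` and returns a newly allocated string holding
 * the remainder. Returns NULL if offset is past the end of the data or
 * if allocation fails; `text` is left unchanged in that case. */
char *split_at(char *text, size_t offset)
{
    size_t len;
    char  *tail;

    assert(text != NULL);
    len = strlen(text);
    if (offset > len)
        return NULL;
    tail = (char *) malloc(len - offset + 1);
    if (tail == NULL)
        return NULL;
    memcpy(tail, text + offset, len - offset + 1);  /* copies the NUL too */
    text[offset] = '\0';                            /* original keeps the head */
    return tail;
}
```

Splitting "one of these days" at offset 7 leaves "one of " in the original buffer and returns "these days", matching the example above.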

substringData

Purpose

Returns a substring of a node's character data.

Syntax

const oratext *substringData(xmlctx *ctx, const xmlnode *node,
                             ub4 offset, ub4 count)

Parameters

ctx	(IN)	XML context
node	(IN)	pointer to node
offset	(IN)	offset of start of substring
count	(IN)	length of substring

Example


xmlnode *node;
...
<FOO>"one of these days"</FOO>
substringData(ctx, node, 0, 3) -> "one"

Namespace APIs

Namespace APIs provide an interface that is an extension to the DOM and give information relating to the document namespaces.

XML namespaces provide a simple method for qualifying element and attribute names used in Extensible Markup Language documents by associating them with namespaces identified by URI references. A single XML document may contain elements and attributes (here referred to as a "markup vocabulary") that are defined for and used by multiple software modules. One motivation for this is modularity; if such a markup vocabulary exists which is well-understood and for which there is useful software available, it is better to re-use this markup rather than re-invent it.

Such documents, containing multiple markup vocabularies, pose problems of recognition and collision. Software modules need to be able to recognize the tags and attributes which they are designed to process, even in the face of "collisions" occurring when markup intended for some other software package uses the same element type or attribute name.

These considerations require that document constructs should have universal names, whose scope extends beyond their containing document. This C implementation of XML namespaces provides a mechanism to accomplish this.

Names from XML namespaces may appear as qualified names, which contain a single colon, separating the name into a namespace prefix and a local part. The prefix, which is mapped to a URI reference, selects a namespace. The combination of the universally managed URI namespace and the document's own namespace produces identifiers that are universally unique. Mechanisms are provided for prefix scoping and defaulting.
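
The prefix and local part of a qualified name can be separated with a few lines of C. This is a minimal standalone sketch, not the parser's implementation: split_qname is a hypothetical helper, it checks only the colon structure rather than full XML name syntax, and treating a name without a colon as having an empty prefix is a choice made for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of the parser API): splits `qname` at
 * its single colon. The prefix is copied into `prefix` (of size `cap`)
 * and *local is set to point at the local part inside `qname`.
 * A name without a colon yields an empty prefix.
 * Returns 0 on success, -1 if there is more than one colon, if either
 * part is empty, or if the prefix does not fit. */
int split_qname(const char *qname, char *prefix, size_t cap,
                const char **local)
{
    const char *colon;
    size_t      plen;

    assert(qname != NULL && prefix != NULL && cap > 0 && local != NULL);
    colon = strchr(qname, ':');
    if (colon == NULL) {            /* unqualified name: no prefix */
        prefix[0] = '\0';
        *local = qname;
        return 0;
    }
    plen = (size_t) (colon - qname);
    if (plen == 0 || colon[1] == '\0' ||
        strchr(colon + 1, ':') != NULL || plen >= cap)
        return -1;
    memcpy(prefix, qname, plen);
    prefix[plen] = '\0';
    *local = colon + 1;
    return 0;
}
```

For the name "edi:price" the sketch produces the prefix "edi" and the local part "price"; mapping the prefix to its URI is a separate lookup that this sketch does not attempt.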

URI references can contain characters not allowed in names, so cannot be used directly as namespace prefixes. Therefore, the namespace prefix serves as a proxy for a URI reference. An attribute-based syntax described in the W3C Namespace specification is used to declare the association of the namespace prefix with a URI reference.
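
The way a prefix stands in for a URI reference, including scoping, can be sketched with a small binding stack. The names below (sketch_scope, sketch_declare, sketch_resolve, sketch_leave) are hypothetical and are not part of this parser. The sketch assumes that the caller has already recognized declarations of the form xmlns:prefix="uri" (or xmlns="uri" for the default namespace, represented here by an empty prefix).

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_BINDINGS 16

struct sketch_binding { const char *prefix; const char *uri; };

/* A stack of prefix-to-URI bindings; later entries shadow earlier ones. */
struct sketch_scope {
    struct sketch_binding b[SKETCH_MAX_BINDINGS];
    size_t n;
};

/* Record one declaration. Pass "" as the prefix for the default
 * namespace. Returns 1 on success, 0 if the stack is full. */
int sketch_declare(struct sketch_scope *s, const char *prefix,
                   const char *uri)
{
    if (s->n >= SKETCH_MAX_BINDINGS)
        return 0;
    s->b[s->n].prefix = prefix;
    s->b[s->n].uri = uri;
    s->n++;
    return 1;
}

/* Resolve a prefix to its URI. The innermost (most recent) declaration
 * wins. Returns NULL if the prefix is not declared. */
const char *sketch_resolve(const struct sketch_scope *s, const char *prefix)
{
    size_t i = s->n;

    while (i-- > 0)
        if (strcmp(s->b[i].prefix, prefix) == 0)
            return s->b[i].uri;
    return NULL;
}

/* Leave an element: drop every binding declared since `mark`, where
 * `mark` is the value of s->n saved when the element was entered. */
void sketch_leave(struct sketch_scope *s, size_t mark)
{
    if (mark < s->n)
        s->n = mark;
}
```

Saving s->n on entering an element and passing it to sketch_leave on exit restores whatever binding the enclosing element had for the same prefix.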

This C Namespace interface follows the W3C XML Namespaces Recommendation, revision REC-xml-names-19990114.

Data Structures and Types

oratext
xmlattr
xmlnode

Functions

getAttrLocal(xmlattr *attr)

Returns attribute local name.

getAttrNamespace(xmlattr *attr)

Returns attribute namespace (URI).

getAttrPrefix(xmlattr *attr)

Returns attribute prefix.

getAttrQualifiedName(xmlattr *attr)

Returns attribute fully qualified name.

getNodeLocal(xmlnode *node)

Returns node local name.

getNodeNamespace(xmlnode *node)

Returns node namespace (URI).

getNodePrefix(xmlnode *node)

Returns node prefix.

getNodeQualifiedName(xmlnode *node)

Returns node qualified name.

Data Structure and Type Description

ORATEXT 
typedef unsigned char oratext;

XMLATTR 

typedef struct xmlattr xmlattr;

Note:

The contents of xmlattr are private and must not be accessed by users.

XMLNODE


typedef struct xmlnode xmlnode;

Note:

The contents of xmlnode are private and must not be accessed by users.

Function Prototypes

getAttrLocal

Purpose

This function returns the local name of this attribute.

Syntax

const oratext *getAttrLocal(const xmlattr *attr);

Parameters

attr (IN) - pointer to opaque attribute structure (see getAttribute)

Comments

getAttrNamespace

Purpose

This function returns the namespace (URI) of this attribute.

Syntax

const oratext *getAttrNamespace(const xmlattr *attr);

Parameters

attr (IN) - pointer to opaque attribute structure (see getAttribute)

Comments

getAttrPrefix

Purpose

This function returns the namespace prefix of this attribute.

Syntax

const oratext *getAttrPrefix(const xmlattr *attr);

Parameters

attr (IN) - pointer to opaque attribute structure (see getAttribute)

Comments

getAttrQualifiedName

Purpose

This function returns the fully qualified name of this attribute.

Syntax

const oratext *getAttrQualifiedName(const xmlattr *attr);

Parameters

attr (IN) - pointer to opaque attribute structure (see getAttribute)

Comments

getNodeLocal

Purpose

This function returns the local name of this node.

Syntax

const oratext *getNodeLocal(const xmlnode *node);

Parameters

node (IN) - node to get local name from

Comments

getNodeNamespace

Purpose

This function returns the namespace (URI) of this node.

Syntax

const oratext *getNodeNamespace(const xmlnode *node);

Parameters

node (IN) - node to get namespace from

Comments

getNodePrefix

Purpose

This function returns the namespace prefix of this node.

Syntax

const oratext *getNodePrefix(const xmlnode *node);

Parameters

node (IN) - node to get prefix from

Comments

getNodeQualifiedName

Purpose

This function returns the fully qualified name of this node.

Syntax

const oratext *getNodeQualifiedName(const xmlnode *node);

Parameters

node (IN) - node to get name from

Comments

Datatypes

oratext*/String	String pointer (C/C++)
xmlctx	Master XML context
xmlmemcb	Memory callback structure (optional)
xmlsaxcb	SAX callback structure (SAX only)
ub4	32-bit (or larger) unsigned integer
uword	Native unsigned integer

boolean	Boolean value, TRUE or FALSE
oratext	String pointer
xmlcpmod	Content model node modifier
xmlctx	Master XML parser context
xmlnode	Document node
xmlnodes	Array of nodes
xmlntype	Node type enumeration

oratext/String

The basic character pointer type (for C/C++):

typedef unsigned char oratext;
typedef unsigned char String;

xmlctx

The top-level XML context:

typedef struct xmlctx xmlctx;

Note:

The contents of xmlctx are private and must not be accessed by users.

xmlmemcb

The memory callback structure passed to xmlinit:

struct xmlmemcb
{
   void *(*alloc)(void *ctx, size_t size);
   void  (*free)(void *ctx, void *ptr);
   void *(*realloc)(void *ctx, void *ptr, size_t size);
};
typedef struct xmlmemcb xmlmemcb;
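
As an illustration of how the three callbacks might be filled in, the sketch below wraps malloc, free, and realloc and counts live allocations. The structure layout is copied from the description above so that the example stands alone. Everything prefixed sketch_ is hypothetical. The sketch also assumes that the ctx argument passed to each callback is a pointer supplied by the application, which this section does not confirm.

```c
#include <stddef.h>
#include <stdlib.h>

/* Layout copied from the xmlmemcb description in this chapter. */
struct xmlmemcb
{
   void *(*alloc)(void *ctx, size_t size);
   void  (*free)(void *ctx, void *ptr);
   void *(*realloc)(void *ctx, void *ptr, size_t size);
};
typedef struct xmlmemcb xmlmemcb;

/* Hypothetical application context: number of live allocations. */
struct sketch_memstats { long live; };

static void *sketch_alloc(void *ctx, size_t size)
{
    void *p = malloc(size);

    if (p != NULL)
        ((struct sketch_memstats *)ctx)->live++;
    return p;
}

static void sketch_free(void *ctx, void *ptr)
{
    if (ptr != NULL) {
        ((struct sketch_memstats *)ctx)->live--;
        free(ptr);
    }
}

static void *sketch_realloc(void *ctx, void *ptr, size_t size)
{
    void *p = realloc(ptr, size);

    /* Growing from NULL behaves like a fresh allocation. */
    if (ptr == NULL && p != NULL)
        ((struct sketch_memstats *)ctx)->live++;
    return p;
}

/* Fill in a callback structure with the counting wrappers. */
void sketch_init_memcb(xmlmemcb *cb)
{
    cb->alloc   = sketch_alloc;
    cb->free    = sketch_free;
    cb->realloc = sketch_realloc;
}
```

A live count that is not zero after the parser has been shut down would point to memory that was never handed back through the free callback.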

xmlsaxcb

The SAX callback structure passed to xmlinit:

struct xmlsaxcb
{
   sword (*startDocument)(void *ctx);
   sword (*endDocument)(void *ctx);
   sword (*startElement)(void *ctx, const oratext *name, 
                         const struct xmlattrs *attrs);
   sword (*endElement)(void *ctx, const oratext *name);
   sword (*characters)(void *ctx, const oratext *ch, size_t len);
   sword (*ignorableWhitespace)(void *ctx, const oratext *ch, 
                                size_t len);
   sword (*processingInstruction)(void *ctx, const oratext *target, 
                                  const oratext *data);
   sword (*notationDecl)(void *ctx, const oratext *name, 
                         const oratext *publicId, 
                         const oratext *systemId);
   sword (*unparsedEntityDecl)(void *ctx, const oratext *name, 
                               const oratext *publicId, 
                               const oratext *systemId, 
                               const oratext *notationName);
   sword (*nsStartElement)(void *ctx, const oratext *qname, 
                           const oratext *local, 
                           const oratext *namespace,
                           const struct xmlattrs *attrs);
};
typedef struct xmlsaxcb xmlsaxcb;
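
The sketch below shows what callbacks with the startElement, endElement, and characters signatures might look like when they gather simple statistics. It is not taken from the parser sources. The typedef of sword as int, the meaning of a zero return value, and the use of ctx as an application-supplied pointer are all assumptions, and the sketch_ names are hypothetical.

```c
#include <stddef.h>

typedef unsigned char oratext;  /* as defined in this chapter */
typedef int sword;              /* assumption: a signed native integer */
struct xmlattrs;                /* opaque here; only passed through */

/* Hypothetical application context handed to every callback. */
struct sketch_stats {
    unsigned depth;       /* current element nesting depth */
    unsigned max_depth;   /* deepest nesting seen so far */
    unsigned elements;    /* number of start tags seen */
    size_t   chars;       /* total length of character data */
};

/* Same signature as the startElement member of xmlsaxcb. */
sword sketch_start_element(void *ctx, const oratext *name,
                           const struct xmlattrs *attrs)
{
    struct sketch_stats *st = (struct sketch_stats *)ctx;

    (void)name;
    (void)attrs;
    st->elements++;
    if (++st->depth > st->max_depth)
        st->max_depth = st->depth;
    return 0;             /* assumption: zero signals success */
}

/* Same signature as the endElement member of xmlsaxcb. */
sword sketch_end_element(void *ctx, const oratext *name)
{
    struct sketch_stats *st = (struct sketch_stats *)ctx;

    (void)name;
    if (st->depth > 0)
        st->depth--;
    return 0;
}

/* Same signature as the characters member of xmlsaxcb. */
sword sketch_characters(void *ctx, const oratext *ch, size_t len)
{
    struct sketch_stats *st = (struct sketch_stats *)ctx;

    (void)ch;
    st->chars += len;
    return 0;
}
```

Functions like these would be assigned to the matching members of an xmlsaxcb structure, with the members that are not needed set to NULL.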

ub4

Unsigned integer with a minimum of four bytes:

typedef unsigned int ub4;

uword

Unsigned integer in the native word size:

typedef unsigned int uword;

boolean

typedef int boolean;

oratext

typedef unsigned char oratext;

xmlcpmod

Content model node modifiers, see getModifier.

XMLCPMOD_NONE  = 0                 /* no modifier */
XMLCPMOD_OPT   = 1                 /* '?' optional */
XMLCPMOD_0MORE = 2                 /* '*' zero or more */
XMLCPMOD_1MORE = 3                 /* '+' one or more */
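
A content model printer might map each modifier back to its DTD occurrence indicator. The following sketch declares the values locally as an enum so that it compiles on its own. The actual declaration in the parser header may differ, and sketch_cpmod_symbol is a hypothetical helper.

```c
#include <stddef.h>

/* Values copied from the xmlcpmod listing in this chapter; the enum
 * form is an assumption made so the example is self-contained. */
typedef enum {
    XMLCPMOD_NONE  = 0,   /* no modifier */
    XMLCPMOD_OPT   = 1,   /* '?' optional */
    XMLCPMOD_0MORE = 2,   /* '*' zero or more */
    XMLCPMOD_1MORE = 3    /* '+' one or more */
} xmlcpmod;

/* Hypothetical helper: return the DTD occurrence indicator for a
 * modifier as a string, "" for no modifier, NULL for an unknown value. */
const char *sketch_cpmod_symbol(xmlcpmod mod)
{
    switch (mod) {
    case XMLCPMOD_NONE:  return "";
    case XMLCPMOD_OPT:   return "?";
    case XMLCPMOD_0MORE: return "*";
    case XMLCPMOD_1MORE: return "+";
    }
    return NULL;
}
```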

xmlctx

typedef struct xmlctx xmlctx;

Note:

The contents of xmlctx are private and must not be accessed by users.

xmlnode

typedef struct xmlnode xmlnode;

Note:

The contents of xmlnode are private and must not be accessed by users.

xmlnodes

typedef struct xmlnodes xmlnodes;

Note:

The contents of xmlnodes are private and must not be accessed by users.

xmlntype

Parse tree node types, see getNodeType. Names and values match the DOM specification.

ELEMENT_NODE                = 1    /* element */
ATTRIBUTE_NODE              = 2    /* attribute */
TEXT_NODE                   = 3    /* char data not escaped by CDATA */
CDATA_SECTION_NODE          = 4    /* char data escaped by CDATA */
ENTITY_REFERENCE_NODE       = 5    /* entity reference */
ENTITY_NODE                 = 6    /* entity */
PROCESSING_INSTRUCTION_NODE = 7    /* processing instruction */
COMMENT_NODE                = 8    /* comment */
DOCUMENT_NODE               = 9    /* document */
DOCUMENT_TYPE_NODE          = 10   /* DTD */
DOCUMENT_FRAGMENT_NODE      = 11   /* document fragment */
NOTATION_NODE               = 12   /* notation */
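
When walking a parse tree it is often handy to turn a node type into a readable label, or to test whether a node carries character data. The sketch below declares the values locally as an enum so that it is self-contained. The real declaration in the parser header may differ, and both helpers are hypothetical.

```c
#include <stddef.h>

/* Values copied from the xmlntype listing in this chapter; the enum
 * form is an assumption made so the example is self-contained. */
typedef enum {
    ELEMENT_NODE                = 1,
    ATTRIBUTE_NODE              = 2,
    TEXT_NODE                   = 3,
    CDATA_SECTION_NODE          = 4,
    ENTITY_REFERENCE_NODE       = 5,
    ENTITY_NODE                 = 6,
    PROCESSING_INSTRUCTION_NODE = 7,
    COMMENT_NODE                = 8,
    DOCUMENT_NODE               = 9,
    DOCUMENT_TYPE_NODE          = 10,
    DOCUMENT_FRAGMENT_NODE      = 11,
    NOTATION_NODE               = 12
} xmlntype;

/* Hypothetical helper: readable label for a node type, NULL if the
 * value is outside the range 1..12. */
const char *sketch_ntype_name(xmlntype t)
{
    static const char *const names[] = {
        NULL, "element", "attribute", "text", "CDATA section",
        "entity reference", "entity", "processing instruction",
        "comment", "document", "DTD", "document fragment", "notation"
    };

    if ((int)t < 1 || (int)t > 12)
        return NULL;
    return names[(int)t];
}

/* Hypothetical helper: nonzero for nodes that hold character data,
 * whether or not it was escaped by a CDATA section. */
int sketch_is_chardata(xmlntype t)
{
    return t == TEXT_NODE || t == CDATA_SECTION_NODE;
}
```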

