6
XML Parser for C

This chapter describes the following sections:

Parser APIs
XSLT API
W3C SAX APIs
W3C DOM APIs
Namespace APIs

Parser APIs

Extensible Markup Language (XML) describes a class of data objects called XML documents and partially describes the behavior of computer programs which process them. XML is an application profile or restricted form of SGML, the Standard Generalized Markup Language [ISO 8879]. By construction, XML documents are conforming SGML documents.

XML documents are made up of storage units called entities, which contain either parsed or unparsed data. Parsed data is made up of characters, some of which form character data, and some of which form markup. Markup encodes a description of the document's storage layout and logical structure. XML provides a mechanism to impose constraints on the storage layout and logical structure.

A software module called an XML processor is used to read XML documents and provide access to their content and structure. It is assumed that an XML processor is doing its work on behalf of another module, called the application.

This C implementation of the XML processor (or parser) follows the W3C XML specification (rev REC-xml-19980210) and includes the required behavior of an XML processor in terms of how it must read XML data and the information it must provide to the application.

The following is the default behavior of this parser:

The character set encoding is UTF-8. If all your documents are ASCII, you are encouraged to set the encoding to US-ASCII for better performance.
Messages are printed to stderr unless msghdlr is given.
A parse tree that can be accessed by DOM APIs is built unless saxcb is set to use the SAX callback APIs. Note that any of the SAX callback functions can be set to NULL if not needed.
The input is checked for well-formedness but not for validity. Set the flag XML_FLAG_VALIDATE to validate the input.
Whitespace processing is fully conformant to the XML 1.0 specification: all whitespace is reported back to the application, and ignorable whitespace is marked as such. Applications that prefer otherwise can set the XML_FLAG_DISCARD_WHITESPACE flag, which discards all whitespace between an end-element tag and the following start-element tag.

Calling Sequence

The sequence of calls to the parser can be:

xmlinit() - xmlparse() or xmlparsebuf() - xmlterm()

xmlinit() - xmlparse() or xmlparsebuf() - xmlclean() - xmlparse() or xmlparsebuf() - xmlclean() - ... - xmlterm()

xmlinit() - xmlparse() or xmlparsebuf() - xmlparse() or xmlparsebuf() - ... - xmlterm()

Memory

The memory callback functions memcb may be used if you wish to use your own memory allocation. If they are used, all of the functions should be specified.
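For illustration, here is a minimal sketch of a complete set of memory callbacks that counts live blocks through the context pointer. It assumes the parser passes memcbctx as the ctx argument of every callback, and it re-declares xmlmemcb with the layout shown under Data Structures and Types so that it compiles on its own; drop that declaration when the parser's header is included. The names count_ctx, count_alloc, count_free, count_realloc, and count_memcb are hypothetical.

```c
#include <stdlib.h>

/* Local copy of the documented structure; omit it when the parser's
   header is included. */
struct xmlmemcb
{
   void *(*alloc)(void *ctx, size_t size);
   void  (*free)(void *ctx, void *ptr);
   void *(*realloc)(void *ctx, void *ptr, size_t size);
};
typedef struct xmlmemcb xmlmemcb;

/* Hypothetical user context: number of blocks currently allocated. */
typedef struct { long live; } count_ctx;

static void *count_alloc(void *ctx, size_t size)
{
   void *p = malloc(size);

   if (p != NULL)
      ((count_ctx *) ctx)->live++;
   return p;
}

static void count_free(void *ctx, void *ptr)
{
   if (ptr != NULL)
   {
      free(ptr);
      ((count_ctx *) ctx)->live--;
   }
}

static void *count_realloc(void *ctx, void *ptr, size_t size)
{
   void *p = realloc(ptr, size);

   /* realloc(NULL, size) is a fresh allocation; resizing an existing
      block leaves the count unchanged. */
   if (p != NULL && ptr == NULL)
      ((count_ctx *) ctx)->live++;
   return p;
}

/* All three members are set, as required when memcb is used. */
static const xmlmemcb count_memcb = { count_alloc, count_free, count_realloc };
```

In a program using the parser, &count_memcb would be passed to xmlinit() as memcb and a pointer to a count_ctx as memcbctx. Wrapping malloc, free, and realloc keeps the sketch short; a pool or arena allocator would plug in the same way.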

The memory allocated for parameters passed to the SAX callbacks or for nodes and data stored with the DOM parse tree will not be freed until one of the following is done:

xmlparse() or xmlparsebuf() is called to parse another file or buffer.
xmlclean() is called.
xmlterm() is called.

Thread Safety

If threads are forked off somewhere in the midst of the init-parse-term sequence of calls, you will get unpredictable behavior and results.

Data Types Index

oratext

String pointer

xmlctx

Master XML context

xmlmemcb

Memory callback structure (optional)

xmlsaxcb

SAX callback structure (SAX only)

ub4

32-bit (or larger) unsigned integer

uword

Native unsigned integer

Function Index

xmlinit

Initialize XML parser

xmlclean

Clean up memory used during parse

xmlparse

Parse a file

xmlparsebuf

Parse a buffer

xmlterm

Shut down XML parser

createDocument

Create a new document

isStandalone

Return document's standalone flag

Data Structures and Types

oratext

typedef unsigned char oratext;

xmlctx

typedef struct xmlctx xmlctx;

Note:
The contents of xmlctx are private and must not be accessed by users.

xmlmemcb

struct xmlmemcb
{
   void *(*alloc)(void *ctx, size_t size);
   void  (*free)(void *ctx, void *ptr);
   void *(*realloc)(void *ctx, void *ptr, size_t size);
};
typedef struct xmlmemcb xmlmemcb;

Note:
This is the memory callback structure.

xmlsaxcb

struct xmlsaxcb
{
   sword (*startDocument)(void *ctx);
   sword (*endDocument)(void *ctx);
   sword (*startElement)(void *ctx, const oratext *name, 
                              const struct xmlattrs *attrs);
   sword (*endElement)(void *ctx, const oratext *name);
   sword (*characters)(void *ctx, const oratext *ch, size_t len);
   sword (*ignorableWhitespace)(void *ctx, const oratext *ch, 
                                        size_t len);
   sword (*processingInstruction)(void *ctx, const oratext *target, 
                                       const oratext *data);
   sword (*notationDecl)(void *ctx, const oratext *name, 
                              const oratext *publicId, 
                              const oratext *systemId);
   sword (*unparsedEntityDecl)(void *ctx, const oratext *name, 
                                    const oratext *publicId, 
                                    const oratext *systemId, 
                                    const oratext *notationName);
   sword (*nsStartElement)(void *ctx, const oratext *qname, 
                                   const oratext *local, 
                                   const oratext *namespace,
                                   const struct xmlattrs *attrs);
   /* The following 8 fields are reserved for future use. */
   void (*empty1)();
   void (*empty2)();
   void (*empty3)();
   void (*empty4)();
   void (*empty5)();
   void (*empty6)();
   void (*empty7)();
   void (*empty8)();
};
typedef struct xmlsaxcb xmlsaxcb;

Note:
Callbacks for SAX-like API.

ub4

typedef unsigned int ub4;

uword

typedef unsigned int uword;

Functions

xmlinit

Purpose

Initializes the C XML parser. It must be called before any parsing can take place.

Syntax

xmlctx *xmlinit(uword *err, const oratext *encoding, 
                 void (*msghdlr)(void *msgctx, const oratext *msg, ub4 errcode), 
                 void *msgctx, const xmlsaxcb *saxcb, void *saxcbctx, 
                 const xmlmemcb *memcb, void *memcbctx, const oratext *lang);

Parameters

err      (OUT) - The error code, if any
encoding (IN) - default character set encoding
msghdlr  (IN) - Error message handler function
msgctx   (IN) - Context for the error message handler
saxcb    (IN) - SAX callback structure filled with function pointers
saxcbctx (IN) - Context for SAX callbacks
memcb    (IN) - Memory function callbacks
memcbctx (IN) - Context for the memory function callbacks
lang     (IN) - Language for error messages

Comments

If xmlinit() fails, do not call any other XML parser functions.

This function should only be called once before starting the processing of one or more XML files. xmlterm() should be called after all processing of XML files has completed.

Error codes: XMLERR_LEH_INIT, XMLERR_BAD_ENCODING, XMLERR_NLS_INIT, XMLERR_NO_MEMORY, XMLERR_NULL_PTR

All parameters except err may be NULL.

By default, the character set encoding is UTF-8. If all your documents are ASCII, you are encouraged to set the encoding to US-ASCII for better performance.

By default, messages are printed to stderr unless msghdlr is given.

By default, a parse tree is built (accessible by DOM APIs) unless saxcb is set (in which case the SAX callback APIs are invoked). Note that any of the SAX callback functions can be set to NULL if not needed.

The memory callback functions memcb may be used if you wish to use your own memory allocation. If they are used, all of the functions should be specified.

The parameters msgctx, saxcbctx, and memcbctx are pointers to structures that you may define and use to pass information to your callback routines for the message handler, SAX functions, or memory functions, respectively. Set them to NULL if your callback functions do not need any additional information.

The lang parameter is not used currently and may be set to NULL. It will be used in future releases to determine the language of the error messages.
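As a sketch of how msghdlr and msgctx fit together, the handler below records each message in a user-defined context and then echoes it to stderr. The msg_log type and log_msg function are hypothetical, and the oratext and ub4 typedefs are copied from Data Structures and Types so the block compiles without the parser's header.

```c
#include <stdio.h>
#include <string.h>

/* Types as documented; omit them when the parser's header is included. */
typedef unsigned char oratext;
typedef unsigned int  ub4;

/* Hypothetical message context, passed to xmlinit() as msgctx. */
typedef struct
{
   unsigned count;          /* messages received so far */
   ub4      last_code;      /* error code of the most recent message */
   char     last_msg[256];  /* text of the most recent message */
} msg_log;

/* Matches the msghdlr parameter of xmlinit():
   void (*msghdlr)(void *msgctx, const oratext *msg, ub4 errcode) */
static void log_msg(void *msgctx, const oratext *msg, ub4 errcode)
{
   msg_log *ml = (msg_log *) msgctx;

   ml->count++;
   ml->last_code = errcode;
   /* Keep a bounded, always-terminated copy of the message. */
   strncpy(ml->last_msg, msg != NULL ? (const char *) msg : "",
           sizeof ml->last_msg - 1);
   ml->last_msg[sizeof ml->last_msg - 1] = '\0';
   fprintf(stderr, "XML error %u: %s\n", (unsigned) errcode, ml->last_msg);
}
```

It would be registered by passing log_msg as msghdlr and the address of a msg_log as msgctx to xmlinit().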

xmlclean

Purpose

Frees any memory used during the previous parse.

Syntax

void xmlclean(xmlctx *ctx);

Parameters

ctx (IN) - The XML parser context

Comments

This function is provided as a convenience for those who want to parse multiple files but would like to free the memory used by one parse before the next call to xmlparse() or xmlparsebuf().

xmlparse

Purpose

Invokes the XML parser on an input file. The parser must have been initialized successfully with a call to xmlinit() first.

Syntax

uword xmlparse(xmlctx *ctx, const oratext *filename, const oratext *encoding,
               ub4 flags);

Parameters

ctx      (IN/OUT) - The XML parser context
filename (IN) - path to XML document
encoding (IN) - default character set encoding
flags    (IN) - what options to use

Comments

Flag bits must be OR'd to override the default behavior of the parser. The following flag bits may be set:

XML_FLAG_VALIDATE turns validation on.
XML_FLAG_DISCARD_WHITESPACE will discard whitespace where it appears to be insignificant.

The default behavior is not to validate the input. The default behavior for whitespace processing is to be fully conformant to the XML 1.0 specification: all whitespace is reported back to the application, and ignorable whitespace is marked as such. However, some applications may prefer to set the XML_FLAG_DISCARD_WHITESPACE flag, which discards all whitespace between an end-element tag and the following start-element tag.

The memory passed to the SAX callbacks or stored with the DOM parse tree will not be freed until one of the following is done:

xmlparse() or xmlparsebuf() is called to parse another file or buffer.
xmlclean() is called.
xmlterm() is called.

This function will free any memory used during the previous parse.

xmlparsebuf

Purpose

Invokes the XML parser on a buffer. The parser must have been initialized successfully with a call to xmlinit() first.

Syntax

uword xmlparsebuf(xmlctx *ctx, const oratext *buffer, size_t len,
                  const oratext *encoding, ub4 flags);

Parameters

ctx      (IN/OUT) - The XML parser context
buffer   (IN) - buffer containing the XML document to be parsed
len      (IN) - length of the buffer
encoding (IN) - default character set encoding
flags    (IN) - what options to use

Comments

This function is identical to xmlparse() except that input is taken from the user's buffer instead of from an external file.

xmlterm

Purpose

Terminates the XML parser. It should be called after xmlinit(), and before exiting the main program.

Syntax

uword xmlterm(xmlctx *ctx);

Parameters

ctx (IN) - the XML parser context

Comments

This function will free any memory used during the previous parse. No additional XML parser calls can be made until xmlinit() is called.

createDocument

Purpose

Creates a new document in memory.

Syntax

xmlnode* createDocument(xmlctx *ctx)

Parameters

ctx (IN) - the XML parser context

Comments

This function is used when constructing a new document in memory. An XML document is always rooted in a node of type DOCUMENT_NODE; this function creates that root node and sets it in the context. There can be only one current document, and hence only one document node; if one already exists, this function does nothing and returns NULL.

isStandalone

Purpose

Return value of document's standalone flag.

Syntax

boolean isStandalone(xmlctx *ctx)

Parameters

ctx (IN) - the XML parser context

Comments

This function returns the boolean value of the document's standalone flag, as specified by the standalone declaration in the XML declaration (<?xml ... standalone="yes"?>).

XSLT API

XSLT is a language for transforming XML documents into other XML documents.

XSLT is designed for use as part of XSL, which is a stylesheet language for XML. In addition to XSLT, XSL includes an XML vocabulary for specifying formatting. XSL specifies the styling of an XML document by using XSLT to describe how the document is transformed into another XML document that uses the formatting vocabulary.

XSLT is also designed to be used independently of XSL. However, XSLT is not intended as a completely general-purpose XML transformation language. Rather, it is designed primarily for the kinds of transformation that are needed when XSLT is used as part of XSL.

A transformation expressed in XSLT describes rules for transforming a source tree into a result tree. The transformation is achieved by associating patterns with templates. A pattern is matched against elements in the source tree. A template is instantiated to create part of the result tree. The result tree is separate from the source tree. The structure of the result tree can be completely different from the structure of the source tree. In constructing the result tree, elements from the source tree can be filtered and reordered, and arbitrary structure can be added.

A transformation expressed in XSLT is called a stylesheet. This is because, in the case when XSLT is transforming into the XSL formatting vocabulary, the transformation functions as a stylesheet.

A stylesheet contains a set of template rules. A template rule has two parts: a pattern which is matched against nodes in the source tree and a template which can be instantiated to form part of the result tree. This allows a stylesheet to be applicable to a wide class of documents that have similar source tree structures.

A template is instantiated for a particular source element to create part of the result tree. A template can contain elements that specify literal result element structure. A template can also contain elements from the XSLT namespace that are instructions for creating result tree fragments. When a template is instantiated, each instruction is executed and replaced by the result tree fragment that it creates. Instructions can select and process descendant source elements. Processing a descendant element creates a result tree fragment by finding the applicable template rule and instantiating its template. Note that elements are only processed when they have been selected by the execution of an instruction. The result tree is constructed by finding the template rule for the root node and instantiating its template.

A software module called an XSL processor is used to read XML documents and transform them into other XML documents with different styles.

The C implementation of the XSL processor follows the XSL Transformations standard (version 1.0, November 16, 1999) and includes the required behavior of an XSL processor as specified in the XSLT specification.

Data Structures and Types

uword
xmlctx
xmlnode

Functions

uword xslprocess(xmlctx *docctx, xmlctx *xslctx, xmlctx *resctx, xmlnode **result)

Processes an XSL stylesheet against an XML document source and returns success or an error code.

Data Structure and Type Description

uword

typedef unsigned int uword;

xmlctx

typedef struct xmlctx xmlctx;

Note:
The contents of xmlctx are private and must not be accessed by users.

xmlnode

typedef struct xmlnode xmlnode;

Note:
The contents of xmlnode are private and must not be accessed by users.

Function Prototypes

xslprocess

Purpose

This function processes an XSL Stylesheet with an XML document source.

Syntax

uword xslprocess(xmlctx *docctx, xmlctx *xslctx, xmlctx *resctx,
                 xmlnode **result);

Parameters

docctx (IN/OUT) - The XML document context

xslctx (IN) - The XSL stylesheet context

resctx (IN) - The result document fragment context

result (IN/OUT) - The result document fragment node

W3C SAX APIs

SAX is a standard interface for event-based XML parsing, developed collaboratively by the members of the XML-DEV mailing list.

There are two major types of XML (or SGML) APIs:

tree-based APIs, and
event-based APIs.

A tree-based API compiles an XML document into an internal tree structure, then allows an application to navigate that tree using the Document Object Model (DOM), a standard tree-based API for XML and HTML documents.

An event-based API, on the other hand, reports parsing events (such as the start and end of elements) directly to the application through callbacks, and does not usually build an internal tree. The application implements handlers to deal with the different events, much like handling events in a graphical user interface.

Tree-based APIs are useful for a wide range of applications, but they often put a great strain on system resources, especially if the document is large (under very controlled circumstances, it is possible to construct the tree in a lazy fashion to avoid some of this problem). Furthermore, some applications need to build their own, different data trees, and it is very inefficient to build a tree of parse nodes, only to map it onto a new tree.

In both of these cases, an event-based API provides a simpler, lower-level access to an XML document: you can parse documents much larger than your available system memory, and you can construct your own data structures using your callback event handlers.

To use SAX, an xmlsaxcb structure is initialized with function pointers and passed to the xmlinit() call. A pointer to a user-defined context structure may also be included; that context pointer will be passed to each SAX function.

The SAX callback structure:

typedef struct
{
   sword (*startDocument)(void *ctx);
   sword (*endDocument)(void *ctx);
   sword (*startElement)(void *ctx, const oratext *name,
                         const struct xmlattrs *attrs);
   sword (*endElement)(void *ctx, const oratext *name);
   sword (*characters)(void *ctx, const oratext *ch, size_t len);
   sword (*ignorableWhitespace)(void *ctx, const oratext *ch, size_t len);
   sword (*processingInstruction)(void *ctx, const oratext *target,
                                  const oratext *data);
   sword (*notationDecl)(void *ctx, const oratext *name,
                         const oratext *publicId, const oratext *systemId);
   sword (*unparsedEntityDecl)(void *ctx, const oratext *name,
                               const oratext *publicId,
                               const oratext *systemId,
                               const oratext *notationName);
   sword (*nsStartElement)(void *ctx, const oratext *qname,
                           const oratext *local, const oratext *nsp,
                           const struct xmlattrs *attrs);
   /* followed by the 8 reserved fields empty1 through empty8
      shown under Parser APIs */
} xmlsaxcb;
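Putting this together, here is a minimal sketch of a SAX client that counts elements and tracks nesting depth. It re-declares the structure with the layout given under Parser APIs (including the eight reserved fields) so that it compiles on its own, treats the attribute list as an opaque struct xmlattrs, and returns 0 from every callback on the assumption that zero means success; this section does not document the meaning of the sword return value. The sax_stats context and the stats_* functions are hypothetical.

```c
#include <stddef.h>

/* Declarations as documented; omit them when the parser's header is included. */
typedef unsigned char oratext;
typedef signed int    sword;
struct xmlattrs;                       /* opaque attribute list */

typedef struct xmlsaxcb
{
   sword (*startDocument)(void *ctx);
   sword (*endDocument)(void *ctx);
   sword (*startElement)(void *ctx, const oratext *name,
                         const struct xmlattrs *attrs);
   sword (*endElement)(void *ctx, const oratext *name);
   sword (*characters)(void *ctx, const oratext *ch, size_t len);
   sword (*ignorableWhitespace)(void *ctx, const oratext *ch, size_t len);
   sword (*processingInstruction)(void *ctx, const oratext *target,
                                  const oratext *data);
   sword (*notationDecl)(void *ctx, const oratext *name,
                         const oratext *publicId, const oratext *systemId);
   sword (*unparsedEntityDecl)(void *ctx, const oratext *name,
                               const oratext *publicId,
                               const oratext *systemId,
                               const oratext *notationName);
   sword (*nsStartElement)(void *ctx, const oratext *qname,
                           const oratext *local, const oratext *nsp,
                           const struct xmlattrs *attrs);
   void (*empty1)(), (*empty2)(), (*empty3)(), (*empty4)();  /* reserved */
   void (*empty5)(), (*empty6)(), (*empty7)(), (*empty8)();
} xmlsaxcb;

/* Hypothetical user context, passed to xmlinit() as saxcbctx. */
typedef struct
{
   unsigned elements;    /* start tags seen */
   unsigned depth;       /* current nesting depth */
   unsigned max_depth;   /* deepest nesting seen */
} sax_stats;

static sword stats_start_doc(void *ctx)
{
   sax_stats *s = (sax_stats *) ctx;

   s->elements = s->depth = s->max_depth = 0;
   return 0;
}

static sword stats_start_elem(void *ctx, const oratext *name,
                              const struct xmlattrs *attrs)
{
   sax_stats *s = (sax_stats *) ctx;

   (void) name; (void) attrs;          /* not needed for counting */
   s->elements++;
   if (++s->depth > s->max_depth)
      s->max_depth = s->depth;
   return 0;
}

static sword stats_end_elem(void *ctx, const oratext *name)
{
   (void) name;
   ((sax_stats *) ctx)->depth--;
   return 0;
}

/* Members not named here are initialized to NULL. */
static const xmlsaxcb stats_cb = {
   .startDocument = stats_start_doc,
   .startElement  = stats_start_elem,
   .endElement    = stats_end_elem,
};
```

Callbacks missing from the initializer are NULL, which the parser permits. The structure and its context would then go to xmlinit() in the saxcb and saxcbctx positions, for example xmlinit(&err, NULL, NULL, NULL, &stats_cb, &st, NULL, NULL, NULL), after which xmlparse() would drive the callbacks instead of building a DOM tree.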

Data Structures and Types

Callback Functions conforming to the SAX standard:

characters(void *ctx, const oratext *ch, size_t len)

Receive notification of character data inside an element.

endDocument(void *ctx)

Receive notification of the end of the document.

endElement(void *ctx, const oratext *name)

Receive notification of the end of an element.

ignorableWhitespace(void *ctx, const oratext *ch, size_t len)

Receive notification of ignorable whitespace in element content.

notationDecl(void *ctx, const oratext *name, const oratext *publicId, const oratext *systemId)

Receive notification of a notation declaration.

processingInstruction(void *ctx, const oratext *target, const oratext *data)

Receive notification of a processing instruction.

startDocument(void *ctx)

Receive notification of the beginning of the document.

startElement(void *ctx, const oratext *name, const struct xmlattrs *attrs)

Receive notification of the start of an element.

unparsedEntityDecl(void *ctx, const oratext *name, const oratext *publicId, const oratext *systemId, const oratext *notationName)

Receive notification of an unparsed entity declaration.

Non-SAX Callback Functions

nsStartElement(void *ctx, const oratext *qname, const oratext *local, const oratext *namespace, const struct xmlattrs *attrs)

Receive notification of the start of an element, together with its namespace information.

Data Structure and Type Description

oratext

typedef unsigned char oratext;

sword

typedef signed int sword;

xmlattrs

typedef struct xmlattrs xmlattrs;

Note:
The contents of xmlattrs are private and must not be accessed by users.

Function Prototypes

characters

Purpose

This callback function receives notification of character data inside an element.

Syntax

sword (*characters)(void *ctx, const oratext *ch, size_t len);

Parameters

ctx (IN) - client context pointer

ch (IN) - the characters

len (IN) - number of characters to use from the character pointer

Comments
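The characters are passed as a pointer plus a length, so a handler should copy exactly len units rather than treat ch as a NUL-terminated string. SAX parsers in general may also split one run of text across several calls; this section does not say whether this parser does, so the sketch below appends each chunk to a buffer rather than overwriting it. The text_buf context and collect_chars handler are hypothetical, and treating len as a count of oratext units (bytes) is an assumption for multibyte encodings such as UTF-8.

```c
#include <stddef.h>
#include <string.h>

/* Types as documented; omit them when the parser's header is included. */
typedef unsigned char oratext;
typedef signed int    sword;

/* Hypothetical context: collects character data into a fixed buffer. */
typedef struct
{
   oratext text[1024];
   size_t  used;         /* units stored, excluding the terminator */
} text_buf;

static sword collect_chars(void *ctx, const oratext *ch, size_t len)
{
   text_buf *tb   = (text_buf *) ctx;
   size_t    room = sizeof tb->text - 1 - tb->used;

   /* ch is not assumed to be NUL-terminated: copy exactly len units,
      truncating if the buffer is full. */
   if (len > room)
      len = room;
   memcpy(tb->text + tb->used, ch, len);
   tb->used += len;
   tb->text[tb->used] = '\0';
   return 0;
}
```

The handler would be installed as the characters member of the xmlsaxcb structure, with a text_buf passed as saxcbctx.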

endDocument

Purpose

This callback function receives notification of the end of the document.

Syntax

sword (*endDocument)(void *ctx);

Parameters

ctx (IN) - client context

Comments

endElement

Purpose

This callback function receives notification of the end of an element.

Syntax

sword (*endElement)(void *ctx, const oratext *name);

Parameters

ctx (IN) - client context

name (IN) - element type name

Comments

ignorableWhitespace

Purpose

This callback function receives notification of ignorable whitespace in element content.

Syntax

sword (*ignorableWhitespace)(void *ctx, const oratext *ch, size_t len);

Parameters

ctx (IN) - client context

ch (IN) - whitespace characters

len (IN) - number of characters to use from the character pointer

Comments

notationDecl

Purpose

This callback function receives notification of a notation declaration.

Syntax

sword (*notationDecl)(void *ctx, const oratext *name, const oratext *publicId,
                      const oratext *systemId);

Parameters

ctx (IN) - client context

name (IN) - notation name

publicId (IN) - notation public identifier, or NULL if not available

systemId (IN) - notation system identifier

Comments

processingInstruction

Purpose

This callback function receives notification of a processing instruction.

Syntax

sword (*processingInstruction)(void *ctx, const oratext *target,
                               const oratext *data);

Parameters

ctx (IN) - client context

target (IN) - processing instruction target

data (IN) - processing instruction data, or NULL if none is supplied

Comments

startDocument

Purpose

This callback function receives notification of the beginning of the document.

Syntax

sword (*startDocument)(void *ctx);

Parameters

ctx (IN) - client context

Comments

startElement

Purpose

This callback function receives notification of the beginning of an element.

Syntax

sword (*startElement)(void *ctx, const oratext *name,
                      const struct xmlattrs *attrs);

Parameters

ctx (IN) - client context

name (IN) - element type name

attrs (IN) - specified or defaulted attributes

Comments

unparsedEntityDecl

Purpose

This callback function receives notification of an unparsed entity declaration.

Syntax

sword (*unparsedEntityDecl)(void *ctx, const oratext *name,
                            const oratext *publicId, const oratext *systemId,
                            const oratext *notationName);

Parameters

ctx (IN) - client context

name (IN) - entity name

publicId (IN) - entity public identifier, or NULL if not available

systemId (IN) - entity system identifier

notationName (IN) - name of the associated notation

Comments

nsStartElement

Purpose

This callback function receives notification of the start of an element, together with its namespace information.

Syntax

sword (*nsStartElement)(void *ctx, const oratext *qname, const oratext *local,
                        const oratext *namespace, const struct xmlattrs *attrs);

Parameters

ctx (IN) - client context

qname (IN) - element fully qualified name

local (IN) - element local name

namespace (IN) - element namespace (URI)

attrs (IN) - specified or defaulted attributes

Comments

W3C DOM APIs

The Document Object Model (DOM) is an application programming interface (API) for HTML and XML documents. It defines the logical structure of documents and the way a document is accessed and manipulated. In the DOM specification, the term document is used in the broad sense. Increasingly, XML is being used as a way of representing many different kinds of information that may be stored in diverse systems, and much of this would traditionally be seen as data rather than as documents. Nevertheless, XML presents this data as documents, and the DOM may be used to manage this data.

With the DOM, programmers can build documents, navigate their structure, and add, modify, or delete elements and content. Anything found in an HTML or XML document can be accessed, changed, deleted, or added using the DOM, with a few exceptions; in particular, the DOM interfaces for the XML internal and external subsets have not yet been specified.

One important objective of the W3C specification for the DOM is to provide a standard programming interface that can be used in a wide variety of environments and applications. The DOM is designed to be used with any programming language. Because the DOM standard is object-oriented, some changes were needed for this C adaptation:

Reused function names had to be expanded; for example, getValue() in the attribute class is given the unique name getAttrValue(), matching the pattern established by getNodeValue().
Some functions were added to extend the DOM. For example, the DOM defines no function that returns the number of children of a node, so numChildNodes() was added.

The implementation of this C DOM interface follows REC-DOM-Level-1-19981001.

Data Structures and Types

boolean

Boolean value, TRUE or FALSE

oratext

String pointer

xmlcpmod

Content model node modifier

xmlctx

Master XML parser context

xmlnode

Document node

xmlnodes

Array of nodes

xmlntype

Node type enumeration

DOM Functions

appendChild

Append child node to current node

appendData

Append character data to end of node's current data

cloneNode

Create a new node identical to the current one

createAttribute

Create a new attribute for an element node

createCDATASection

Create a CDATA_SECTION node

createComment

Create a COMMENT node

createDocumentFragment

Create a DOCUMENT_FRAGMENT node

createElement

Create an ELEMENT node

createEntityReference

Create an ENTITY_REFERENCE node

createProcessingInstruction

Create a PROCESSING_INSTRUCTION (PI) node

createTextNode

Create a TEXT node

deleteData

Remove substring from a node's character data

getAttrName

Return an attribute's name

getAttrSpecified

Return value of attribute's specified flag [DOM getSpecified]

getAttrValue

Return an attribute's value (definition) [DOM getValue]

getAttribute

Return the value of an attribute

getAttributeIndex

Return an element's attribute given its index

getAttributeNode

Get an element's attribute node given its name [DOM getName]

getAttributes

Return array of element's attributes

getCharData

Return character data for a TEXT node [DOM getData]

getCharLength

Return length of TEXT node's character data [DOM getLength]

getChildNode

Return indexed node from array of nodes [DOM item]

getChildNodes

Return array of node's children

getContentModel

Returns the content model for an element from the DTD [DOM extension]

getDocument

Return top-level DOCUMENT node [DOM extension]

getDocumentElement

Return highest-level (root) ELEMENT node

getDocType

Return current DTD

getDocTypeEntities

Return array of DTD's general entities

getDocTypeName

Return name of DTD

getDocTypeNotations

Return array of DTD's notations

getElementsByTagName

Return list of elements with matching name

getEntityNotation

Return an entity's NDATA [DOM getNotation]

getEntityPubID

Return an entity's public ID [DOM getPublicId]

getEntitySysID

Return an entity's system ID [DOM getSystemId]

getFirstChild

Return the first child of a node

getImplementation

Return DOM-implementation structure (if defined)

getLastChild

Return the last child of a node

getModifier

Returns a content model node's '?', '*', or '+' modifier [DOM extension]

getNextSibling

Return a node's next sibling

getNamedItem

Returns the named node from a list of nodes

getNodeMapLength

Returns number of entries in a NodeMap [DOM getLength]

getNodeName

Returns a node's name

getNodeType

Returns a node's type code (enumeration)

getNodeValue

Returns a node's "value", its character data

getNotationPubID

Returns a notation's public ID [DOM getPublicId]

getNotationSysID

Returns a notation's system ID [DOM getSystemId]

getOwnerDocument

Returns the DOCUMENT node containing the given node

getPIData

Returns a processing instruction's data [DOM getData]

getPITarget

Returns a processing instruction's target [DOM getTarget]

getParentNode

Returns a node's parent node

getPreviousSibling

Returns a node's "previous" sibling

getTagName

Returns a node's "tagname", same as name for now

hasAttributes

Determine if element node has attributes [DOM extension]

hasChildNodes

Determine if node has children

hasFeature

Determine if DOM implementation supports a specific feature

insertBefore

Inserts a new child node before the given reference node

insertData

Inserts new character data into a node's existing data

isStandalone

Determine if document is standalone [DOM extension]

nodeValid

Validate a node against the current DTD [DOM extension]

normalize

Normalize a node by merging adjacent TEXT nodes

numAttributes

Returns number of element node's attributes [DOM extension]

numChildNodes

Returns number of node's children [DOM extension]

removeAttribute

Removes an element's attribute given its name

removeAttributeNode

Removes an element's attribute given its pointer

removeChild

Removes a node from its parent's list of children

removeNamedItem

Removes a node from a list of nodes given its name

replaceChild

Replace one node with another

replaceData

Replace a substring of a node's character data with another string

setAttribute

Sets (adds or replaces) a new attribute for an element node given the attribute's name and value

setAttributeNode

Sets (adds or replaces) a new attribute for an element node given a pointer to the new attribute

setNamedItem

Sets (adds or replaces) a new node in a parent's list of children

setNodeValue

Sets a node's "value" (character data)

setPIData

Sets a processing instruction's data [DOM setData]

splitText

Split a node's character data into two parts

substringData

Return a substring of a node's character data

Data Structures and Types

boolean

typedef int boolean;

oratext

typedef unsigned char oratext;

xmlcpmod

Content model node modifiers, see getModifier.

XMLCPMOD_NONE  = 0                 /* no modifier */
XMLCPMOD_OPT   = 1                 /* '?' optional */
XMLCPMOD_0MORE = 2                 /* '*' zero or more */
XMLCPMOD_1MORE = 3                 /* '+' one or more */
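When printing a content model (for example, one returned by getContentModel), the modifier can be mapped back to DTD syntax and to occurrence bounds. A minimal sketch, assuming xmlcpmod is an enumeration with the values listed above (the typedef is reconstructed here so the block compiles on its own); cpmod_char and cpmod_bounds are hypothetical helpers, not part of the API.

```c
/* Reconstructed from the documented values; omit it when the parser's
   header is included. */
typedef enum
{
   XMLCPMOD_NONE  = 0,   /* no modifier */
   XMLCPMOD_OPT   = 1,   /* '?' optional */
   XMLCPMOD_0MORE = 2,   /* '*' zero or more */
   XMLCPMOD_1MORE = 3    /* '+' one or more */
} xmlcpmod;

/* DTD character for a modifier, or '\0' for XMLCPMOD_NONE and
   unknown values. */
static char cpmod_char(xmlcpmod mod)
{
   switch (mod)
   {
      case XMLCPMOD_OPT:   return '?';
      case XMLCPMOD_0MORE: return '*';
      case XMLCPMOD_1MORE: return '+';
      default:             return '\0';
   }
}

/* Minimum and maximum occurrences; -1 stands for "unbounded". */
static void cpmod_bounds(xmlcpmod mod, int *min, int *max)
{
   *min = (mod == XMLCPMOD_OPT || mod == XMLCPMOD_0MORE) ? 0 : 1;
   *max = (mod == XMLCPMOD_0MORE || mod == XMLCPMOD_1MORE) ? -1 : 1;
}
```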

xmlctx

typedef struct xmlctx xmlctx;

Note:
The contents of xmlctx are private and must not be accessed by users.

xmlnode

typedef struct xmlnode xmlnode;

Note:
The contents of xmlnode are private and must not be accessed by users.

xmlnodes

typedef struct xmlnodes xmlnodes;

Note:
The contents of xmlnodes are private and must not be accessed by users.

xmlntype

Parse tree node types, see getNodeType. Names and values match DOM specification.


ELEMENT_NODE                = 1    /* element */
ATTRIBUTE_NODE              = 2    /* attribute */
TEXT_NODE                   = 3    /* char data not escaped by CDATA */
CDATA_SECTION_NODE          = 4    /* char data escaped by CDATA */
ENTITY_REFERENCE_NODE       = 5    /* entity reference */
ENTITY_NODE                 = 6    /* entity */
PROCESSING_INSTRUCTION_NODE = 7    /* processing instruction */
COMMENT_NODE                = 8    /* comment */
DOCUMENT_NODE               = 9    /* document */
DOCUMENT_TYPE_NODE          = 10   /* DTD */
DOCUMENT_FRAGMENT_NODE      = 11   /* document fragment */
NOTATION_NODE               = 12   /* notation */
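Because getNodeType returns one of these codes, a small lookup table is handy for diagnostics. This sketch reconstructs xmlntype as an enumeration with the documented values so it compiles without the parser's header; the labels drop the _NODE suffix, and ntype_label and ntype_from_label are hypothetical helpers, not part of the API.

```c
#include <string.h>

/* Reconstructed from the documented values; omit it when the parser's
   header is included. */
typedef enum
{
   ELEMENT_NODE = 1, ATTRIBUTE_NODE = 2, TEXT_NODE = 3,
   CDATA_SECTION_NODE = 4, ENTITY_REFERENCE_NODE = 5, ENTITY_NODE = 6,
   PROCESSING_INSTRUCTION_NODE = 7, COMMENT_NODE = 8, DOCUMENT_NODE = 9,
   DOCUMENT_TYPE_NODE = 10, DOCUMENT_FRAGMENT_NODE = 11, NOTATION_NODE = 12
} xmlntype;

/* Printable label for a node type code; "UNKNOWN" if out of range. */
static const char *ntype_label(xmlntype t)
{
   static const char *const labels[] = {
      NULL, "ELEMENT", "ATTRIBUTE", "TEXT", "CDATA_SECTION",
      "ENTITY_REFERENCE", "ENTITY", "PROCESSING_INSTRUCTION", "COMMENT",
      "DOCUMENT", "DOCUMENT_TYPE", "DOCUMENT_FRAGMENT", "NOTATION"
   };

   if (t < ELEMENT_NODE || t > NOTATION_NODE)
      return "UNKNOWN";
   return labels[t];
}

/* Reverse lookup, e.g. for a command-line filter; 0 if not found. */
static int ntype_from_label(const char *s)
{
   int t;

   for (t = ELEMENT_NODE; t <= NOTATION_NODE; t++)
      if (strcmp(ntype_label((xmlntype) t), s) == 0)
         return t;
   return 0;
}
```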

Function Prototypes

appendChild

Purpose

Adds a new node to the end of the list of children for the given parent and returns the node added.

Syntax

xmlnode *appendChild(xmlctx *ctx, xmlnode *parent, xmlnode *newnode)

Parameters

ctx

(IN)

XML context

parent

(IN)

parent node

newnode

(IN)

new node to append

Example

xmlnode *node, *parent;
...

/* Create an element named "node" and append it to parent. */
if ((node = createElement(ctx, (const oratext *) "node")) != NULL)
    appendChild(ctx, parent, node);

appendData

Purpose

Append the given string to the character data of a TEXT or CDATA node.

Syntax

void appendData(xmlctx *ctx, xmlnode *node, const oratext *arg)

Parameters

ctx

(IN)

XML context

node

(IN)

pointer to node

arg

(IN)

new data to append

Example

xmlnode *node;
...
getNodeValue(node) -> "foo"
appendData(ctx, node, "bar");
getNodeValue(node) -> "foobar"
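To make the effect of appendData concrete, here is a minimal sketch of the same operation on a plain heap-allocated C string, assuming `char` in place of the parser's `oratext` type. The helper `append_data` is hypothetical; inside the parser the node's storage is private and managed for you.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: append `arg` to the heap string *data, growing it
   with realloc. Returns 0 on success, -1 if memory is exhausted (in which
   case *data is left unchanged). */
int append_data(char **data, const char *arg)
{
    size_t oldlen = strlen(*data);
    size_t arglen = strlen(arg);
    char *grown = realloc(*data, oldlen + arglen + 1);   /* +1 for the NUL */

    if (grown == NULL)
        return -1;
    memcpy(grown + oldlen, arg, arglen + 1);             /* copies the NUL too */
    *data = grown;
    return 0;
}
```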

cloneNode

Purpose

Returns a duplicate of this node, i.e., serves as a generic copy constructor for nodes. The duplicate node has no parent (parentNode returns NULL).

Cloning an Element copies all attributes and their values, including those generated by the XML processor to represent defaulted attributes, but this method does not copy any text it contains unless it is a deep clone, since the text is contained in a child Text node. Cloning any other type of node simply returns a copy of this node.

A deep clone differs in that the node's children are also cloned, recursively. A shallow clone copies only the node itself (and, for an Element, its attributes).

Syntax

xmlnode *cloneNode(xmlctx *ctx, const xmlnode *old, boolean deep)

Parameters

ctx

(IN)

XML context

old

(IN)

old node to clone

deep

(IN)

recursion flag: TRUE for a deep clone, FALSE for a shallow clone
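The difference between a shallow and a deep clone can be sketched with a toy tree. The `snode` structure and the `snode_new` and `clone_node` functions below are hypothetical stand-ins (the real `xmlnode` is opaque), and attribute copying is omitted; the sketch only shows that the copy is detached and that children are duplicated only when `deep` is set.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical node layout for illustration; the real xmlnode is opaque. */
typedef struct snode {
    char         *name;
    struct snode *parent;
    struct snode *first_child;
    struct snode *next_sibling;
} snode;

/* Allocate a detached node holding a private copy of `name`. */
snode *snode_new(const char *name)
{
    snode *n = calloc(1, sizeof *n);   /* all links start out NULL */

    if (n == NULL)
        return NULL;
    n->name = malloc(strlen(name) + 1);
    if (n->name == NULL) {
        free(n);
        return NULL;
    }
    strcpy(n->name, name);
    return n;
}

/* Copy `old` with no parent and no siblings. With `deep` nonzero the
   children are cloned recursively; otherwise the copy has no children.
   On allocation failure mid-way, the partial copy is returned (a real
   implementation would free it and report the error). */
snode *clone_node(const snode *old, int deep)
{
    snode *copy = snode_new(old->name);
    snode **tail;
    const snode *child;

    if (copy == NULL || !deep)
        return copy;
    tail = &copy->first_child;
    for (child = old->first_child; child != NULL; child = child->next_sibling) {
        snode *c = clone_node(child, 1);

        if (c == NULL)
            break;
        c->parent = copy;          /* cloned child belongs to the copy */
        *tail = c;
        tail = &c->next_sibling;
    }
    return copy;
}
```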

createAttribute

Purpose

Create a new ATTRIBUTE node with the given name and value. The new node is unattached and must be added to an element node with setAttributeNode.

Syntax

xmlnode *createAttribute(xmlctx *ctx, const oratext *name, const oratext *value)

Parameters

ctx

(IN)

XML context

name

(IN)

name of new attribute

value

(IN)

value of new attribute

Example

xmlnode *attr, *elem;
...
if (attr = createAttribute(ctx, "attr1", "value1"))
{
    setAttributeNode(ctx, elem, attr, NULL);
}

createCDATASection

Purpose

Create a new CDATA node.

Syntax

xmlnode *createCDATASection(xmlctx *ctx, const oratext *data)

Parameters

ctx

(IN)

XML context

data

(IN)

CDATA body

Example

xmlnode *node, *parent;
...
if (node = createCDATASection(ctx, "<greeting>H'o!</greeting>"))
    appendChild(ctx, parent, node);

createComment

Purpose

Create a new COMMENT node.

Syntax

xmlnode *createComment(xmlctx *ctx, const oratext *data)

Parameters

ctx

(IN)

XML context

data

(IN)

text of comment

Example

xmlnode *node, *parent;
...
if (node = createComment(ctx, "From here on this document is unfinished"))
    appendChild(ctx, parent, node);

createDocumentFragment

Purpose

Create a new DOCUMENT_FRAGMENT node. A document fragment is a lightweight document object that contains one or more children, but does not have the overhead of a full document. It can be used in some operations (inserting for example) in place of a simple node, in which case all the fragment's children are operated on instead of the fragment node itself.

Syntax

xmlnode *createDocumentFragment(xmlctx *ctx)

Parameters

ctx

(IN)

XML context

Example

xmlnode *frag, *fragelem, *fragtext;
...
if ((frag = createDocumentFragment(ctx)) &&
    (fragelem = createElement(ctx, (oratext *) "FragElem")) &&
    (fragtext = createTextNode(ctx, (oratext *) "FragText")))
{
    appendChild(ctx, frag, fragelem);
    appendChild(ctx, frag, fragtext);
}

createElement

Purpose

Create a new ELEMENT node.

Syntax

xmlnode *createElement(xmlctx *ctx, const oratext *elname)

Parameters

ctx

(IN)

XML context

elname

(IN)

name of new element

Example

xmlnode *node, *parent;
...
if (node = createElement(ctx, "BOOK"))
    appendChild(ctx, parent, node);

createEntityReference

Purpose

Create a new ENTITY_REFERENCE node.

Syntax

xmlnode *createEntityReference(xmlctx *ctx, const oratext *name)

Parameters

ctx

(IN)

XML context

name

(IN)

name of entity to reference

Example

xmlnode *node, *parent;
...
if (node = createEntityReference(ctx, "homephone"))
    appendChild(ctx, parent, node);

createProcessingInstruction

Purpose

Create a new PROCESSING_INSTRUCTION node with the given target and contents.

Syntax

xmlnode *createProcessingInstruction(xmlctx *ctx, const oratext *target,
                                     const oratext *data)

Parameters

ctx

(IN)

XML context

target

(IN)

PI target

data

(IN)

PI definition

Example

xmlnode *node, *parent;
...
if (node = createProcessingInstruction(ctx, "target", "definition"))
    appendChild(ctx, parent, node);

createTextNode

Purpose

Create a new TEXT node with the given contents.

Syntax

xmlnode *createTextNode(xmlctx *ctx, const oratext *data)

Parameters

ctx

(IN)

XML context

data

(IN)

data for node

Example

xmlnode *node, *parent;
...
if (node = createTextNode(ctx, "riverrun, past Eve and Adam's..."))
    appendChild(ctx, parent, node);

deleteData

Purpose

Delete a substring from the node's character data.

Syntax

void deleteData(xmlctx *ctx, xmlnode *node, ub4 offset, ub4 count)

Parameters

ctx

(IN)

XML context

node

(IN)

pointer to node

offset

(IN)

offset of start of substring (0 is first char)

count

(IN)

length of substring

Example

xmlnode *node;
...
getNodeValue(node) -> "phoenix"
deleteData(ctx, node, 2, 1);
getNodeValue(node) -> "phenix"
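A minimal sketch of the offset/count arithmetic behind deleteData, applied in place to a NUL-terminated C string. The helper `delete_data` is hypothetical. It treats offsets as byte offsets, which match character offsets only for single-byte encodings such as US-ASCII, and it clips a count that runs past the end of the data; that out-of-range rule is an assumption modeled on the DOM, not something stated here for this parser.

```c
#include <string.h>

/* Hypothetical helper: remove `count` characters starting at `offset`
   (0 is the first character) from `data`, in place. */
void delete_data(char *data, size_t offset, size_t count)
{
    size_t len = strlen(data);

    if (offset >= len)
        return;                          /* nothing to delete */
    if (count > len - offset)
        count = len - offset;            /* clip to the end of the data */
    /* shift the tail, including the NUL, left over the deleted span */
    memmove(data + offset, data + offset + count, len - offset - count + 1);
}
```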

getAttribute

Purpose

Returns the value of the named attribute of an element node, or NULL if the element has no attribute with that name. To fetch the attribute node itself rather than its value, use getAttributeNode.

Syntax

const oratext *getAttribute(const xmlnode *node, const oratext *name)

Parameters

node

(IN)

element node whose attributes to scan

name

(IN)

name of the attribute

Example

xmlnode       *elem;
const oratext *attrval;
...
if (attrval = getAttribute(elem, "foo"))
    ...

getAttributeIndex

Purpose

Returns one attribute from an array of attributes, given an index (starting at 0). Fetch the attribute name and/or value (with getAttrName and getAttrValue). On error, returns NULL.

Syntax

xmlnode *getAttributeIndex(const xmlnodes *attrs, size_t index)

Parameters

attrs

(IN)

pointer to attribute nodes structure (as returned by getAttributes)

index

(IN)

zero-based attribute# to return

Example

xmlnode  *node, *attr;
xmlnodes *nodes;
...
if (nodes = getAttributes(node))
{
    attr = getAttributeIndex(nodes, 1);      /* second attribute */
    ...
}

getAttributeNode

Purpose

Returns a pointer to the element node's attribute of the given name. If no such attribute exists, returns NULL.

Syntax

xmlnode *getAttributeNode(const xmlnode *elem, const oratext *name)

Parameters

elem

(IN)

pointer to element node

name

(IN)

name of attribute

Example

xmlnode *elem, *attr;
...
if (attr = getAttributeNode(elem, "attr1"))
    ...

getAttributes

Purpose

Returns an array of all attributes of the given node. This pointer may then be passed to getAttributeIndex to fetch individual attribute pointers, or to numAttributes to return the total number of attributes. If no attributes are defined, returns NULL.

Syntax

xmlnodes *getAttributes(const xmlnode *node)

Parameters

node

(IN)

node whose attributes to return

Example

xmlnode  *node;
xmlnodes *nodes;
...
if (nodes = getAttributes(node))
    ...

getAttrName

Purpose

Given a pointer to an attribute, returns the name of the attribute. Under the DOM spec, this is a method named getName.

Syntax

const oratext *getAttrName(const xmlnode *attr)

Parameters

attr

(IN)

pointer to attribute (see getAttributeNode or getAttributeIndex)

Example

xmlnode *elem, *attr;
...
attr = setAttribute(ctx, elem, "x", "y");
getAttrName(attr) -> "x"

getAttrSpecified

Purpose

Return the 'specified' flag for the attribute: if this attribute was explicitly given a value in the original document or through the DOM, this is TRUE; otherwise, it is FALSE. If the node is not an attribute, returns FALSE. Under the DOM spec, this is a method named getSpecified.

Syntax

boolean getAttrSpecified(const xmlnode *attr)

Parameters

attr

(IN)

pointer to attribute (see getAttributeNode or getAttributeIndex)

Example

xmlnode *elem, *attr;
...
attr = setAttribute(ctx, elem, "x", "y");
getAttrSpecified(attr) -> TRUE

getAttrValue

Purpose

Given a pointer to an attribute, returns the "value" (definition) of the attribute. Under the DOM spec, this is a method named getValue.

Syntax

const oratext *getAttrValue(const xmlnode *attr)

Parameters

attr

(IN)

pointer to attribute (see getAttributeNode or getAttributeIndex)

Example

xmlnode *elem, *attr;
...
attr = setAttribute(ctx, elem, "x", "y");
getAttrValue(attr) -> "y"

getCharData

Purpose

Returns the character data of a TEXT or CDATA node. Under the DOM spec, this is a method named getData.

Syntax

const oratext *getCharData(const xmlnode *node)

Parameters

node

(IN)

pointer to text node

Example

xmlnode *node;
...
if (node = createTextNode(ctx, "riverrun"))
    getCharData(node) -> "riverrun"

getCharLength

Purpose

Returns the length of the character data of a TEXT or CDATA node. Under the DOM spec, this is a method named getLength.

Syntax

ub4 getCharLength(const xmlnode *node)

Parameters

node

(IN)

pointer to text node

Example

xmlnode *node;
...
if (node = createTextNode(ctx, "prumptly"))
    getCharLength(node) -> 8

getChildNode

Purpose

Returns the nth node in an array of nodes, or NULL if the numbered node does not exist. This is an invented function, not in the DOM spec, but named to match the DOM pattern.

Syntax

xmlnode* getChildNode(const xmlnodes *nodes, size_t index)

Parameters

nodes

(IN)

array of nodes (see getChildNodes)

index

(IN)

zero-based child#

Example

xmlnode  *node, *child;
xmlnodes *nodes;
...
if (nodes = getChildNodes(node))
{
    child = getChildNode(nodes, 1);      /* second child node */
    ...
}

getChildNodes

Purpose

Returns the array of children of the given node. This pointer may then be passed to getChildNode to fetch individual children.

Syntax

xmlnodes* getChildNodes(const xmlnode *node)

Parameters

node

(IN)

node whose children to return

Example

xmlnode  *node;
xmlnodes *nodes;
...
if (nodes = getChildNodes(node))
    ...

getContentModel

Purpose

Returns the content model for the named element from the current DTD. The content model is composed of xmlnode nodes, so it may be traversed with the same functions as the parsed document. See also the getModifier function, which returns the '?', '*', and '+' modifiers of content model nodes.

Syntax

xmlnode *getContentModel(xmldtd *dtd, oratext *name)

Parameters

dtd

(IN)

pointer to the DTD

name

(IN)

name of element

getDocType

Purpose

Returns a pointer to the (opaque) DTD for the current document.

Syntax

xmldtd* getDocType(xmlctx *ctx)

Parameters

ctx

(IN)

XML parser context

Example


xmlnodes *nodes;
...
nodes = getDocTypeEntities(getDocType(ctx));

getDocTypeEntities

Purpose

Returns an array of (general) entities defined for the given DTD.

Syntax

xmlnodes *getDocTypeEntities(xmldtd* dtd)

Parameters

dtd

(IN)

pointer to DTD

Example


xmldtd   *dtd;
xmlnodes *entities;
...
dtd = getDocType(ctx);
entities = getDocTypeEntities(dtd);

getDocTypeName

Purpose

Returns the given DTD's name.

Syntax

oratext *getDocTypeName(xmldtd* dtd)

Parameters

dtd

(IN)

pointer to DTD

getDocTypeNotations

Purpose

Returns an array of notations defined for the given DTD.

Syntax

xmlnodes *getDocTypeNotations(xmldtd* dtd)

Parameters

dtd

(IN)

pointer to DTD

Example


xmldtd   *dtd;
xmlnodes *notations;
...
dtd = getDocType(ctx);
notations = getDocTypeNotations(dtd);

getElementsByTagName

Purpose

Returns a list of all elements (within the tree rooted at the given node) with a given tag name in the order in which they would be encountered in a pre-order traversal of the tree. If root is NULL, the entire document is searched. The special value "*" matches all tags.

Syntax

xmlnodes *getElementsByTagName(xmlctx *ctx, xmlnode *root, const oratext *name)

Parameters

ctx

(IN)

XML parser context

root

(IN)

root node of tree

name

(IN)

element tag name

Example

xmlnodes *nodes;
...
nodes = getElementsByTagName(ctx, NULL, "ACT");    /* find all ACT elements */
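The search order described above is a pre-order, depth-first walk. The sketch below shows that walk over a toy tree linked through first-child and next-sibling pointers. The `tnode` type and `elements_by_tag_name` are hypothetical; every node in the toy tree is treated as an element, and results go to a caller-supplied array rather than an `xmlnodes` list.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical tree node; the real xmlnode is opaque. */
typedef struct tnode {
    const char   *name;
    struct tnode *first_child;
    struct tnode *next_sibling;
} tnode;

/* Visit the tree rooted at `root` in pre-order, storing up to `max`
   nodes whose name equals `name` ("*" matches every node) in `out`.
   Returns the total number of matches, which may exceed `max`. */
size_t elements_by_tag_name(tnode *root, const char *name,
                            tnode **out, size_t max)
{
    size_t found = 0;
    tnode *child;

    if (root == NULL)
        return 0;
    if (strcmp(name, "*") == 0 || strcmp(root->name, name) == 0) {
        if (max > 0)
            out[0] = root;               /* a node precedes its descendants */
        found = 1;
    }
    for (child = root->first_child; child != NULL; child = child->next_sibling) {
        size_t room = found < max ? max - found : 0;   /* free slots left */
        found += elements_by_tag_name(child, name, room ? out + found : out, room);
    }
    return found;
}
```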

getDocument

Purpose

Returns the root node of the parsed document. The root node is always of type DOCUMENT_NODE. Compare with the getDocumentElement function, which returns the root element node, a child of the DOCUMENT node.

Syntax

xmlnode* getDocument(xmlctx *ctx)

Parameters

ctx

(IN)

XML parser context

getDocumentElement

Purpose

Returns the root element (node) of the parsed document. The document's element tree is rooted at this node. Compare with getDocument, which returns the uppermost DOCUMENT node (the parent of the root element node).

Syntax

xmlnode* getDocumentElement(xmlctx *ctx)

Parameters

ctx

(IN)

XML parser context

getEntityNotation

Purpose

Returns an entity node's NDATA (notation). Under the DOM spec, this is a method named getNotationName.

Syntax

const oratext *getEntityNotation(const xmlnode *ent)

Parameters

ent

(IN)

pointer to entity

Example


<!NOTATION n SYSTEM "http://www.w3.org/">
<!ENTITY e SYSTEM "http://www.w3.org/" NDATA n>

xmlnode *ent;    /* assume ent will be set to ENTITY node above */
...
getEntityNotation(ent) -> "n"

getEntityPubID

Purpose

Returns an entity node's public ID. Under the DOM spec, this is a method named getPublicId.

Syntax

const oratext *getEntityPubID(const xmlnode *ent)

Parameters

ent

(IN)

pointer to entity

Example


<!ENTITY e PUBLIC "PublicID" "nop.ent">

xmlnode *ent;    /* assume ent will be set to ENTITY node above */
...
getEntityPubID(ent) -> "PublicID"

getEntitySysID

Purpose

Returns an entity node's system ID. Under the DOM spec, this is a method named getSystemId.

Syntax

const oratext *getEntitySysID(const xmlnode *ent)

Parameters

ent

(IN)

pointer to entity

Example


<!ENTITY e PUBLIC "PublicID" "nop.ent">

xmlnode *ent;    /* assume ent will be set to ENTITY node above */
...
getEntitySysID(ent) -> "nop.ent"

getFirstChild

Purpose

Returns the first child of the given node, or NULL if the node has no children.

Syntax

xmlnode* getFirstChild(const xmlnode *node)

Parameters

node

(IN)

pointer to node

Example

<Thing><A/><B/><C/></Thing>

xmlnode *elem;    /* assume elem will point to element Thing */
...
getFirstChild(elem) -> element "A"

getImplementation

Purpose

This function returns a pointer to the DOMImplementation structure for this implementation, or NULL if no such information is available.

Syntax

xmldomimp* getImplementation(xmlctx *ctx)

Parameters

ctx

(IN)

XML context

getLastChild

Purpose

Returns the last child of the given node, or NULL if the node has no children.

Syntax

xmlnode* getLastChild(const xmlnode *node)

Parameters

node

(IN)

pointer to node

Example

<Thing><A/><B/><C/></Thing>

xmlnode *elem;    /* assume elem will point to element Thing */
...
getLastChild(elem) -> element "C"

getModifier

Purpose

Returns the modifier for a content model node. Possible values are XMLCPMOD_NONE (no modifier), XMLCPMOD_OPT ('?', optional), XMLCPMOD_0MORE ('*', zero or more), or XMLCPMOD_1MORE ('+', one or more).

Syntax

xmlcpmod getModifier(xmlnode *node)

Parameters

node

(IN)

pointer to content model node
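When printing a content model, each modifier maps back to the DTD suffix it came from. The hypothetical helper `modifier_char` below restates the documented `xmlcpmod` values in a stand-alone enumeration so the sketch compiles on its own; it is not part of the parser API.

```c
/* Content-model modifier codes as documented, restated for this sketch;
   the parser's header declares the real xmlcpmod type. */
typedef enum {
    XMLCPMOD_NONE  = 0,
    XMLCPMOD_OPT   = 1,
    XMLCPMOD_0MORE = 2,
    XMLCPMOD_1MORE = 3
} xmlcpmod_sketch;

/* Hypothetical helper: the DTD suffix character for a modifier, or '\0'
   when there is no modifier (or the value is unrecognized). */
char modifier_char(int mod)
{
    switch (mod) {
    case XMLCPMOD_OPT:   return '?';
    case XMLCPMOD_0MORE: return '*';
    case XMLCPMOD_1MORE: return '+';
    default:             return '\0';
    }
}
```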

getNamedItem

Purpose

Returns the named node from an array of nodes and sets the user's index (if provided) to the position of the node (the first node is zero).

Syntax

xmlnode *getNamedItem(const xmlnodes *nodes, const oratext *name, size_t *index)

Parameters

nodes

(IN)

array of nodes

name

(IN)

name of node to fetch

index

(OUT)

index of found node

Example


xmlnode  *node, *elem;
xmlnodes *nodes;
size_t    index;
...
if (nodes = getChildNodes(elem))
{
    node = getNamedItem(nodes, "FOO", &index);
    ...
}

getNextSibling

Purpose

This function returns a pointer to the next sibling of the given node, that is, the next child of the parent. For the last child, NULL is returned.

Syntax

xmlnode* getNextSibling(const xmlnode *node)

Parameters

node

(IN)

pointer to node

Example


<Thing><A/><B/><C/></Thing>

xmlnode *node, *elem;    /* assume elem will point to node Thing */
...
for (node = getFirstChild(elem); node; node = getNextSibling(node))
    ...node will be A then B then C...

getNodeMapLength

Purpose

Given an array of nodes (as returned by getChildNodes), returns the number of nodes in the map. Under the DOM spec, this is a member function named getLength.

Syntax

size_t getNodeMapLength(const xmlnodes *nodes)

Parameters

nodes

(IN)

array of nodes

Example


<Thing><A/><B/><C/></Thing>

xmlnodes *nodes;
xmlnode  *elem;    /* assume elem will point to node Thing */
...
if (nodes = getChildNodes(elem))
    getNodeMapLength(nodes) -> 3

getNodeName

Purpose

Returns the name of the given node, or NULL if the node has no name. Note that "tagname" and "name" are currently synonymous.

Syntax

const oratext* getNodeName(const xmlnode *node)

Parameters

node

(IN)

pointer to node

Example


<Thing><A/><B/><C/></Thing>

xmlnode  *elem;    /* assume elem will point to node Thing */
...
getNodeName(elem) -> "Thing"

getNodeType

Purpose

Returns the type code for a node.

Syntax

xmlntype getNodeType(const xmlnode *node)

Parameters

node

(IN)

pointer to node

Example


<Thing><A/><B/><C/></Thing>

xmlnode  *elem;    /* assume elem will point to node Thing */
...
getNodeType(elem) -> ELEMENT_NODE

getNodeValue

Purpose

Returns the "value" (associated character data) for a node, or NULL if the node has no data.

Syntax

const oratext* getNodeValue(const xmlnode *node)

Parameters

node

(IN)

pointer to node

Example


<!--This is a comment-->

xmlnode *node;    /* assume node will point to comment node above */
...
getNodeValue(node) -> "This is a comment"

getNotationPubID

Purpose

Return a notation node's public ID. Under the DOM spec, this is a method named getPublicId.

Syntax

const oratext *getNotationPubID(const xmlnode *note)

Parameters

note

(IN)

pointer to node

Example


<!NOTATION n PUBLIC "whatever">

xmlnode *note;    /* assume note will point to notation node above */
...
getNotationPubID(note) -> "whatever"

getNotationSysID

Purpose

Return a notation node's system ID. Under the DOM spec, this is a method named getSystemId.

Syntax

const oratext *getNotationSysID(const xmlnode *note)

Parameters

note

(IN)

pointer to node

Example


<!NOTATION n SYSTEM "http://www.w3.org/">

xmlnode *note;    /* assume note will point to notation node above */
...
getNotationSysID(note) -> "http://www.w3.org/"

getOwnerDocument

Purpose

Returns the document node which contains the given node. An XML document is always rooted in a node of type DOCUMENT_NODE. Calling getOwnerDocument on any node in the document returns that document node.

Syntax

xmlnode* getOwnerDocument(xmlnode *node)

Parameters

node

(IN)

pointer to node

getParentNode

Purpose

Returns the parent node of the given node. For the top-most node, NULL is returned.

Syntax

xmlnode* getParentNode(const xmlnode *node)

Parameters

node

(IN)

pointer to node

Example

<Thing><A/><B/><C/></Thing>

xmlnode  *elem;    /* assume elem will point to node A */
...
getParentNode(elem) -> node Thing

getPIData

Purpose

Returns a Processing Instruction's (PI) data string. Under the DOM spec, this is a method named getData.

Syntax

const oratext *getPIData(const xmlnode *pi)

Parameters

pi

(IN)

pointer to PI node

Example


<?PI Blither blather?>

xmlnode *pi;    /* assume pi will point to PI node above */
...
getPIData(pi) -> "Blither blather"

getPITarget

Purpose

Returns a Processing Instruction's (PI) target string. Under the DOM spec, this is a method named getTarget.

Syntax

const oratext *getPITarget(const xmlnode *pi)

Parameters

pi

(IN)

pointer to PI node

Example


<?PI Blither blather?>

xmlnode *pi;    /* assume pi will point to PI node above */
...
getPITarget(pi) -> "PI"

getPreviousSibling

Purpose

Returns the previous sibling of the given node. That is, the node at the same level which came before this one. For the first child of a node, NULL is returned.

Syntax

xmlnode* getPreviousSibling(const xmlnode *node)

Parameters

node

(IN)

pointer to node

Example


<Thing><A/><B/><C/></Thing>

xmlnode *node, *elem;    /* assume elem will point to node Thing */
...
for (node = getLastChild(elem); node; node = getPreviousSibling(node))
    ...node will be C then B then A...

getTagName

Purpose

Returns the "tagname" of a node, which for now is the same as its name; see getNodeName. The DOM says: "...even though there is a generic nodeName attribute on the Node interface, there is still a tagName attribute on the Element interface; these two attributes must contain the same value, but the Working Group considers it worthwhile to support both, given the different constituencies the DOM API must satisfy."

Syntax

const oratext *getTagName(const xmlnode *node)

Parameters

node

(IN)

pointer to node

hasAttributes

Purpose

Determines if the given node has any defined attributes, returning TRUE if so and FALSE if not. This is a DOM extension named after the pattern started by hasChildNodes.

Syntax

boolean hasAttributes(const xmlnode *node)

Parameters

node

(IN)

pointer to node

hasChildNodes

Purpose

Determines if the given node has children, returning TRUE if so, FALSE if not. The same result can be achieved by testing if getChildNodes returns a pointer (has children) or NULL (no children).

Syntax

boolean hasChildNodes(const xmlnode *node)

Parameters

node

(IN)

pointer to node

hasFeature

Purpose

Tests if the DOM implementation implements a specific feature and version. feature is the package name of the feature to test. In DOM Level 1, the legal values are "HTML" and "XML" (case-insensitive). version is the version number of the package name to test. In DOM Level 1, this is the string "1.0". If the version is not specified, supporting any version of the feature will cause the method to return TRUE.

Syntax

boolean hasFeature(xmlctx *ctx, const oratext *feature, const oratext *version)

Parameters

ctx

(IN)

XML context

feature

(IN)

the package name of the feature to test

version

(IN)

the version number of the package name to test

insertBefore

Purpose

Inserts a new node into the given parent node's list of children before the existing reference node. If the reference node is NULL, appends the new node at the end of the list. If the new node is a DocumentFragment, its children are inserted, in the same order, instead of the fragment itself. If the new node is already in the tree, it is first removed.

Syntax

xmlnode *insertBefore(xmlctx *ctx, xmlnode *parent, xmlnode *newChild,
                      xmlnode *refChild)

Parameters

ctx

(IN)

XML context

parent

(IN)

parent node to insert into

newChild

(IN)

new child node to insert

refChild

(IN)

reference node to insert before

Example


<Thing><A/><B/><C/></Thing>

xmlnode *elem, *new, *ref;    /* assume elem points to Thing, new is a new
                                 element "Z", and ref points to node B */
...
insertBefore(ctx, elem, new, ref);

<Thing><A/><Z/><B/><C/></Thing>
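The core of insertBefore is a splice into the parent's list of children. The sketch below assumes a hypothetical `lnode` layout with first-child and next-sibling links (the real `xmlnode` is opaque), and it deliberately omits two behaviors described above: detaching a node that is already in the tree, and expanding a DocumentFragment into its children.

```c
#include <stddef.h>

/* Hypothetical child-list layout; the real xmlnode is opaque. */
typedef struct lnode {
    const char   *name;
    struct lnode *parent;
    struct lnode *first_child;
    struct lnode *next_sibling;
} lnode;

/* Link detached node `newc` into `parent`'s children just before `ref`,
   or at the end when `ref` is NULL. Returns `newc`, or NULL if `ref` is
   not a child of `parent`. */
lnode *insert_before(lnode *parent, lnode *newc, lnode *ref)
{
    lnode **link = &parent->first_child;

    /* advance to the link that currently points at `ref` (or the end) */
    while (*link != ref) {
        if (*link == NULL)
            return NULL;                 /* ref is not among the children */
        link = &(*link)->next_sibling;
    }
    newc->next_sibling = ref;
    newc->parent = parent;
    *link = newc;
    return newc;
}
```

For the example above, `insert_before(thing, z, b)` walks past A, stops at the link pointing to B, and splices Z in there.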

insertData

Purpose

Inserts a string into the node character data at the specified offset.

Syntax

void insertData(xmlctx *ctx, xmlnode *node, ub4 offset, const oratext *arg)

Parameters

ctx

(IN)

XML context

node

(IN)

pointer to node

offset

(IN)

insertion point (0 is first position)

arg

(IN)

new string to insert

Example


xmlnode *node;
...
getNodeValue(node) -> "abcdefg"
insertData(ctx, node, 3, "ZZZ");
getNodeValue(node) -> "abcZZZdefg"
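A sketch of insertData's arithmetic on plain C strings, assuming byte offsets (equivalent to character offsets only for single-byte encodings). The hypothetical helper `insert_data` returns a new heap string rather than modifying a node, and it rejects an offset past the end of the data.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated copy of `data` with `arg`
   inserted at `offset` (0 is the first position). Returns NULL if
   `offset` is past the end or memory runs out; the caller frees it. */
char *insert_data(const char *data, size_t offset, const char *arg)
{
    size_t len = strlen(data), alen = strlen(arg);
    char *out;

    if (offset > len || (out = malloc(len + alen + 1)) == NULL)
        return NULL;
    memcpy(out, data, offset);                                    /* head */
    memcpy(out + offset, arg, alen);                              /* new text */
    memcpy(out + offset + alen, data + offset, len - offset + 1); /* tail + NUL */
    return out;
}
```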

isStandalone

Purpose

Returns the value of the standalone flag as specified in the document's XML declaration (<?xml?>). This is an invented function, not in the DOM spec, but named to match the DOM pattern.

Syntax

boolean isStandalone(xmlctx *ctx)

Parameters

ctx

(IN)

XML parser context

nodeValid

Purpose

Validate a node against the DTD. Returns 0 on success, else a non-zero error code (which can be looked up in the message file). This function is provided for applications which construct their own documents via the API and/or Class Generator. Normally the parser will validate the document and the user need not call nodeValid explicitly.

Syntax

uword nodeValid(xmlctx *ctx, const xmlnode *node)

Parameters

ctx

(IN)

XML context

node

(IN)

pointer to node

normalize

Purpose

"Normalizes" an element, i.e. merges adjacent TEXT nodes. Adjacent TEXT nodes don't happen during a normal parse, only when extra nodes are inserted via the DOM.

Syntax

void normalize(xmlctx *ctx, xmlnode *elem)

Parameters

ctx

(IN)

XML context

elem

(IN)

pointer to element node

Example


xmlnode *node, *t1, *t2;
...
if ((node = createElement(ctx, "FOO")) &&
    (t1 = createTextNode(ctx, "one of ")) &&
    (t2 = createTextNode(ctx, "these days")) &&
    appendChild(ctx, node, t1) &&
    appendChild(ctx, node, t2))
{
    <FOO>"one of " "these days"</FOO>
    normalize(ctx, node);
    <FOO>"one of these days"</FOO>
}
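The merge performed by normalize can be sketched over a flat array of children, where each run of adjacent text children collapses into the first member of the run. The `kid_node` type and the `normalize_children` function are hypothetical; the parser's own child list is private.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical flat child list; the parser's real child list is private. */
typedef struct {
    int   is_text;   /* nonzero for a TEXT node */
    char *text;      /* heap-allocated character data (TEXT nodes only) */
} kid_node;

/* Merge every run of adjacent TEXT children into the first child of the
   run and compact the array in place. Returns the new number of children,
   or (size_t)-1 if memory runs out (the array may then be partly merged). */
size_t normalize_children(kid_node *kids, size_t n)
{
    size_t out = 0, i;

    for (i = 0; i < n; i++) {
        if (out > 0 && kids[i].is_text && kids[out - 1].is_text) {
            kid_node *prev = &kids[out - 1];
            size_t a = strlen(prev->text), b = strlen(kids[i].text);
            char *merged = realloc(prev->text, a + b + 1);

            if (merged == NULL)
                return (size_t)-1;
            memcpy(merged + a, kids[i].text, b + 1);  /* append, with NUL */
            prev->text = merged;
            free(kids[i].text);                      /* absorbed child is gone */
            kids[i].text = NULL;
        } else {
            kids[out++] = kids[i];                   /* keep this child */
        }
    }
    return out;
}
```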

numAttributes

Purpose

Returns the number of defined attributes in an attribute array (as returned by getAttributes). This is an invented function, not in the DOM spec, but named after the DOM pattern.

Syntax

size_t numAttributes(const xmlnodes *attrs)

Parameters

attrs

(IN)

array of attributes

Example


xmlnodes *nodes;
xmlnode  *node;
size_t    i;
...
if (nodes = getAttributes(node))
{
    for (i = 0; i < numAttributes(nodes); i++)
        ...
}

numChildNodes

Purpose

Returns the number of children in an array of nodes (as returned by getChildNodes). This is an invented function, not in the DOM spec, but named after the DOM pattern.

Syntax

size_t numChildNodes(const xmlnodes *nodes)

Parameters

nodes

(IN)

pointer to opaque nodes structure

Example


xmlnodes *nodes;
xmlnode  *elem;
size_t    i;
...
if (nodes = getChildNodes(elem))
{
    for (i = 0; i < numChildNodes(nodes); i++)
        ...
}

removeAttribute

Purpose

Removes the named attribute from an element node. If the removed attribute has a default value, it is immediately replaced by that default.

Syntax

void removeAttribute(xmlnode *elem, const oratext *name)

Parameters

elem

(IN)

pointer to element node

name

(IN)

name of attribute to remove

Example


<!ATTLIST FOO attr CDATA 'default'>

xmlnode *elem;    /* assume elem points to a FOO node */
...
<FOO attr="snark"/>
removeAttribute(elem, "attr");
<FOO attr="default"/>

removeAttributeNode

Purpose

Removes an attribute from an element, given a pointer to the attribute. If successful, returns the attribute node back. On error, returns NULL.

Syntax

xmlnode *removeAttributeNode(xmlnode *elem, xmlnode *attr)

Parameters

elem

(IN)

pointer to element node

attr

(IN)

attribute node to remove

Example


xmlnode *elem, *attr;
...
if (attr = getAttributeNode(elem, "attr1"))
    removeAttributeNode(elem, attr);

removeChild

Purpose

Removes the given node from its parent and returns it.

Syntax

xmlnode *removeChild(xmlnode *node)

Parameters

node

(IN)

old node to remove

Example


xmlnodes *nodes;
xmlnode  *elem, *node;
...
if ((nodes = getChildNodes(elem)) &&
    (node = getNamedItem(nodes, "B", NULL)))
{
    <Thing><A/><B/><C/></Thing>
    removeChild(node);
    <Thing><A/><C/></Thing>
}

removeNamedItem

Purpose

Removes the named node from an array of nodes.

Syntax

xmlnode *removeNamedItem(xmlnodes *nodes, const oratext *name)

Parameters

nodes

(IN)

list of nodes

name

(IN)

name of node to remove

Example


xmlnodes *nodes;
xmlnode  *elem;
...
if (nodes = getChildNodes(elem))
{
    <Thing><A/><B/><C/></Thing>
    removeNamedItem(nodes, "B");
    <Thing><A/><C/></Thing>
}

replaceChild

Purpose

Replaces an existing child node with a new node and returns the old node. If the new node is already in the tree, it is first removed.

Syntax

xmlnode *replaceChild(xmlctx *ctx, xmlnode *newChild, xmlnode *oldChild)

Parameters

ctx

(IN)

XML context

newChild

(IN)

new replacement node

oldChild

(IN)

old node being replaced

Example


xmlnodes *nodes;
xmlnode  *elem, *old, *new;
...
if ((nodes = getChildNodes(elem)) &&
    (old = getNamedItem(nodes, "B", NULL)) &&
    (new = createElement(ctx, "NEW")))
{
    <Thing><A/><B/><C/></Thing>
    replaceChild(ctx, new, old);
    <Thing><A/><NEW/><C/></Thing>
}

replaceData

Purpose

Replaces the substring at the given character offset and length with a replacement string.

Syntax

void replaceData(xmlctx *ctx, xmlnode *node, ub4 offset, ub4 count,
                 oratext *arg)

Parameters

ctx

(IN)

XML context

node

(IN)

pointer to node

offset

(IN)

start of substring to replace (0 is first character)

count

(IN)

length of old substring

arg

(IN)

replacement text

Example


xmlnode *node;
...
getNodeValue(node) -> "every dog has his day"
replaceData(ctx, node, 6, 3, "man");
getNodeValue(node) -> "every man has his day"
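The effect of replaceData is the same as deleting `count` characters at `offset` and then inserting the replacement text there. The hypothetical `replace_data` below builds that result in one pass on a plain C string, assuming byte offsets and clipping a count that runs past the end of the data.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated string in which the
   `count` characters at `offset` are replaced by `arg`. Returns NULL on
   a bad offset or allocation failure; the caller frees the result. */
char *replace_data(const char *data, size_t offset, size_t count, const char *arg)
{
    size_t len = strlen(data), alen = strlen(arg), tail;
    char *out;

    if (offset > len)
        return NULL;
    if (count > len - offset)
        count = len - offset;                    /* clip to the end */
    tail = len - offset - count;                 /* chars after the span */
    if ((out = malloc(offset + alen + tail + 1)) == NULL)
        return NULL;
    memcpy(out, data, offset);
    memcpy(out + offset, arg, alen);
    memcpy(out + offset + alen, data + offset + count, tail + 1); /* incl. NUL */
    return out;
}
```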

setAttribute

Purpose

Create a new attribute for an element. If the named attribute already exists, its value is simply replaced.

Syntax

xmlnode *setAttribute(xmlctx *ctx, xmlnode *elem, const oratext *name,
                      const oratext *value)

Parameters

ctx

(IN)

XML context

elem

(IN)

pointer to element node

name

(IN)

name of new attribute

value

(IN)

value of new attribute

Example


xmlnode *elem;
...
<Thing/>
setAttribute(ctx, elem, "attr", "value");
<Thing attr="value"/>

setAttributeNode

Purpose

Adds a new attribute to the given element. If the named attribute already exists, it is replaced and the user's old pointer (if provided) is set to the old attr. If the attribute is new, it is added and the old pointer is set to NULL. Returns a truth value indicating success.

Syntax

boolean setAttributeNode(xmlctx *ctx, xmlnode *elem,
                         xmlnode *newNode, xmlnode **oldNode)

Parameters

ctx

(IN)

XML context

elem

(IN)

pointer to element node

newNode

(IN)

pointer to new attribute

oldNode

(OUT)

return pointer for old attribute

Example


xmlnode *elem, *attr;
...
if (attr = createAttribute(ctx, "attr", "value"))
{
    <Thing/>
    setAttributeNode(ctx, elem, attr, NULL);
    <Thing attr="value"/>
}

setNamedItem

Purpose

Sets a new child node in a parent node's map. If a node with the same name already exists, it is replaced and the user's pointer (if provided) is set to the old node; if no node with that name exists, the new node is appended to the map and the pointer is set to NULL.

Syntax

boolean setNamedItem(xmlctx *ctx, xmlnode *parent, xmlnode *node, xmlnode **old)

Parameters

ctx

(IN)

XML context

parent

(IN)

parent to add node to

node

(IN)

new node to add

old

(OUT)

return pointer for the replaced node (may be NULL if not needed)

Example


xmlnode *elem, *new;
...
if ((new = createElement(ctx, "B")) &&
    setAttribute(ctx, new, "attr", "value"))
{
    <Thing><A/><B/><C/></Thing>
    setNamedItem(ctx, elem, new, NULL);
    <Thing><A/><B attr="value"/><C/></Thing>
}
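The replace-or-append rule described under Purpose can be sketched over a small fixed-capacity array. The `nnode` and `nodemap` types and the `set_named_item` function are hypothetical stand-ins. The sketch keeps a replaced entry's position, which matches the example above (the new B stays between A and C).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical named node and fixed-capacity map, for illustration only. */
typedef struct { const char *name; } nnode;
typedef struct { nnode *items[8]; size_t count; } nodemap;

/* Replace the entry whose name matches node->name, reporting the old
   entry through *old; otherwise append and set *old to NULL. `old` may
   be NULL. Returns 1 on success, 0 if the map is full. */
int set_named_item(nodemap *map, nnode *node, nnode **old)
{
    size_t i;

    for (i = 0; i < map->count; i++) {
        if (strcmp(map->items[i]->name, node->name) == 0) {
            if (old != NULL)
                *old = map->items[i];
            map->items[i] = node;        /* same position as the old node */
            return 1;
        }
    }
    if (map->count == sizeof map->items / sizeof map->items[0])
        return 0;
    map->items[map->count++] = node;     /* no match: append */
    if (old != NULL)
        *old = NULL;
    return 1;
}
```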

setNodeValue

Purpose

Sets the value (character data) associated with a node.

Syntax

boolean setNodeValue(xmlnode *node, const oratext *data)

Parameters

node

(IN)

pointer to node

data

(IN)

new data for node

Example


xmlnode *node;
...
getNodeValue(node) -> "umbrella"
setNodeValue(node, "brolly");
getNodeValue(node) -> "brolly"

setPIData

Purpose

Sets a Processing Instruction's (PI) data (equivalent to setNodeValue). It is not permitted to set the data to NULL. Under the DOM spec, this is a method named setData.

Syntax

void setPIData(xmlnode *pi, const oratext *data)

Parameters

pi

(IN)

pointer to PI node

data

(IN)

new data for PI

Example


xmlnode *pi;
...
<?SKRINKLIT Monster Grendel's tastes are plainish?>
setPIData(pi, "Breakfast?  Just a couple Danish.");
<?SKRINKLIT Breakfast?  Just a couple Danish.?>

splitText

Purpose

Breaks a TEXT node into two TEXT nodes at the specified offset, keeping both in the tree as siblings. Afterwards, the original node contains only the content before the offset, and a new node, inserted as the next sibling of the original, contains the old content from the offset onward.

Syntax

xmlnode *splitText(xmlctx *ctx, xmlnode *old, uword offset)

Parameters

ctx

(IN)

XML context

old

(IN)

original node to split

offset

(IN)

offset of split point

Example


xmlnode *node;
...
<FOO>"one of these days"</FOO>
splitText(ctx, node, 7);
<FOO>"one of " "these days"</FOO>
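A sketch of the split on a plain C string: the original buffer is truncated at the offset, and a newly allocated string receives the remainder, which corresponds to the content of the new sibling node. The helper `split_text` is hypothetical and assumes byte offsets.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: truncate `text` in place to its first `offset`
   characters and return a new heap string holding the rest. Returns NULL
   (leaving `text` unchanged) if `offset` is past the end or allocation
   fails. The caller frees the returned string. */
char *split_text(char *text, size_t offset)
{
    size_t len = strlen(text);
    char *rest;

    if (offset > len || (rest = malloc(len - offset + 1)) == NULL)
        return NULL;
    memcpy(rest, text + offset, len - offset + 1);   /* tail + NUL */
    text[offset] = '\0';                             /* original keeps the head */
    return rest;
}
```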

substringData

Purpose

Returns a substring of a node's character data.

Syntax

const oratext *substringData(xmlctx *ctx, const xmlnode *node, ub4 offset,
                             ub4 count)

Parameters

ctx

(IN)

XML context

node

(IN)

pointer to node

offset

(IN)

offset of start of substring

count

(IN)

length of substring

Example


xmlnode *node;
...
/* node is the TEXT node of <FOO>"one of these days"</FOO> */
substringData(ctx, node, 0, 3);   /* returns "one" */
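A similar plain-string model shows how `offset` and `count` select the substring. `substring_model` is hypothetical. It copies the result into a caller-supplied buffer, whereas the real substringData returns a string pointer. This section does not state how the parser handles a range that runs past the end of the data. The sketch therefore assumes the W3C DOM rule that the result is truncated at the end of the data, and it also assumes single-byte characters.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not part of the XML parser API): models substringData
 * on a NUL-terminated string. Copies at most count characters starting at
 * offset into out. Assumes the W3C DOM rule that a range running past the end
 * is truncated at the end of the data. Returns 1 on success, 0 if offset is
 * past the end or out is too small. */
static int substring_model(const char *data, size_t offset, size_t count,
                           char *out, size_t outsz)
{
    size_t len, n;

    assert(data != NULL && out != NULL);
    len = strlen(data);
    if (offset > len)
        return 0;                 /* offset beyond the data */
    n = len - offset;             /* characters remaining after offset */
    if (count < n)
        n = count;                /* take at most count of them */
    if (n + 1 > outsz)
        return 0;                 /* no room for the result and its NUL */
    memcpy(out, data + offset, n);
    out[n] = '\0';
    return 1;
}
```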

Namespace APIs

The Namespace APIs extend the DOM interface and provide information about the namespaces used in a document.

XML namespaces provide a simple method for qualifying element and attribute names used in Extensible Markup Language documents by associating them with namespaces identified by URI references. A single XML document may contain elements and attributes (here referred to as a "markup vocabulary") that are defined for and used by multiple software modules. One motivation for this is modularity; if such a markup vocabulary exists which is well-understood and for which there is useful software available, it is better to re-use this markup rather than re-invent it.

Such documents, containing multiple markup vocabularies, pose problems of recognition and collision. Software modules need to be able to recognize the tags and attributes which they are designed to process, even in the face of "collisions" occurring when markup intended for some other software package uses the same element type or attribute name.

These considerations require that document constructs should have universal names, whose scope extends beyond their containing document. This C implementation of XML namespaces provides a mechanism to accomplish this.

Names from XML namespaces may appear as qualified names, which contain a single colon, separating the name into a namespace prefix and a local part. The prefix, which is mapped to a URI reference, selects a namespace. The combination of the universally managed URI namespace and the document's own namespace produces identifiers that are universally unique. Mechanisms are provided for prefix scoping and defaulting.
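The prefix and local-part split described above can be sketched in plain C. `split_qname` is a hypothetical helper written only for illustration. It is not part of this parser, which reports these parts through getNodePrefix, getNodeLocal, and the related attribute functions. The sketch checks only the colon structure of the name, not the full character rules for names.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not part of the Namespace API): splits a qualified name
 * such as "xsl:template" at its single colon. On success, *prefix and
 * *prefixlen describe the prefix (NULL and 0 for an unprefixed name) and
 * *local points at the local part. Returns 0 for an empty name, an empty
 * prefix or local part, or a name containing more than one colon. */
static int split_qname(const char *qname, const char **prefix,
                       size_t *prefixlen, const char **local)
{
    const char *colon;

    assert(qname && prefix && prefixlen && local);
    colon = strchr(qname, ':');
    if (colon == NULL) {              /* unprefixed: the whole name is local */
        *prefix = NULL;
        *prefixlen = 0;
        *local = qname;
        return *qname != '\0';
    }
    if (colon == qname || colon[1] == '\0' || strchr(colon + 1, ':') != NULL)
        return 0;                     /* empty part or a second colon */
    *prefix = qname;
    *prefixlen = (size_t)(colon - qname);
    *local = colon + 1;
    return 1;
}
```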

URI references can contain characters not allowed in names, so cannot be used directly as namespace prefixes. Therefore, the namespace prefix serves as a proxy for a URI reference. An attribute-based syntax described in the W3C Namespace specification is used to declare the association of the namespace prefix with a URI reference.
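Prefix scoping can be modeled as a list of bindings searched from the innermost declaration outward, so an inner `xmlns:prefix` declaration shadows an outer one. This is a hypothetical sketch, not part of the Namespace API. `struct ns_binding` and `resolve_prefix` are illustrative names, and the sketch ignores default namespaces, the predeclared `xml` prefix, and the other rules of the Namespaces recommendation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model (not part of the Namespace API): one binding per
 * xmlns:prefix="uri" declaration currently in scope, outermost first. */
struct ns_binding {
    const char *prefix;
    const char *uri;
};

/* Searches the n in-scope bindings from the innermost (last) to the outermost
 * (first) and returns the URI bound to prefix, or NULL if it is undeclared. */
static const char *resolve_prefix(const struct ns_binding *scope, size_t n,
                                  const char *prefix)
{
    assert(prefix != NULL);
    while (n-- > 0)
        if (strcmp(scope[n].prefix, prefix) == 0)
            return scope[n].uri;
    return NULL;
}
```

Take `<a:x xmlns:a="http://example.com/outer"><a:y xmlns:a="http://example.com/inner"/></a:x>` as an example. The scope at `a:x` holds only the outer binding, while the scope at `a:y` holds the outer binding followed by the inner one. `resolve_prefix` therefore maps `a` to the outer URI for `a:x` and to the inner URI for `a:y`.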

The implementation of this C Namespace interface followed the XML Namespace standard of revision REC-xml-names-19990114.

Data Structures and Types

oratext
xmlattr
xmlnode

Functions

getAttrLocal(xmlattr *attr)

Returns attribute local name.

getAttrNamespace(xmlattr *attr)

Returns attribute namespace (URI).

getAttrPrefix(xmlattr *attr)

Returns attribute prefix.

getAttrQualifiedName(xmlattr *attr)

Returns attribute fully qualified name.

getNodeLocal(xmlnode *node)

Returns node local name.

getNodeNamespace(xmlnode *node)

Returns node namespace (URI).

getNodePrefix(xmlnode *node)

Returns node prefix.

getNodeQualifiedName(xmlnode *node)

Returns node qualified name.

Data Structure and Type Description

ORATEXT 
typedef unsigned char oratext;

XMLATTR 

typedef struct xmlattr xmlattr;

Note:
The contents of xmlattr are private and must not be accessed by users.

XMLNODE


typedef struct xmlnode xmlnode;

Note:
The contents of xmlnode are private and must not be accessed by users.

Function Prototypes

getAttrLocal

Purpose

This function returns the local name of this attribute.

Syntax

const oratext *getAttrLocal(const xmlattr *attr);

Parameters

attr (IN) - pointer to opaque attribute structure (see getAttribute)

Comments

getAttrNamespace

Purpose

This function returns the namespace (URI) of this attribute.

Syntax

const oratext *getAttrNamespace(const xmlattr *attr);

Parameters

attr (IN) - pointer to opaque attribute structure (see getAttribute)

Comments

getAttrPrefix

Purpose

This function returns the prefix of this attribute.

Syntax

const oratext *getAttrPrefix(const xmlattr *attr);

Parameters

attr (IN) - pointer to opaque attribute structure (see getAttribute)

Comments

getAttrQualifiedName

Purpose

This function returns the fully qualified name of this attribute.

Syntax

const oratext *getAttrQualifiedName(const xmlattr *attr);

Parameters

attr (IN) - pointer to opaque attribute structure (see getAttribute)

Comments

getNodeLocal

Purpose

This function returns the local name of this node.

Syntax

const oratext *getNodeLocal(const xmlnode *node);

Parameters

node (IN) - node to get local name from

Comments

getNodeNamespace

Purpose

This function returns the namespace (URI) of this node.

Syntax

const oratext *getNodeNamespace(const xmlnode *node);

Parameters

node (IN) - node to get namespace from

Comments

getNodePrefix

Purpose

This function returns the prefix of this node.

Syntax

const oratext *getNodePrefix(const xmlnode *node);

Parameters

node (IN) - node to get prefix from

Comments

getNodeQualifiedName

Purpose

This function returns the fully qualified name of this node.

Syntax

const oratext *getNodeQualifiedName(const xmlnode *node);

Parameters

node (IN) - node to get name from

oratext	String pointer
xmlctx	Master XML context
xmlmemcb	Memory callback structure (optional)
xmlsaxcb	SAX callback structure (SAX only)
ub4	32-bit (or larger) unsigned integer
uword	Native unsigned integer

xmlinit	Initialize XML parser
xmlclean	Clean up memory used during parse
xmlparse	Parse a file
xmlparsebuf	Parse a buffer
xmlterm	Shut down XML parser
createDocument	Create a new document
isStandalone	Return document's standalone flag

boolean	Boolean value, TRUE or FALSE
oratext	String pointer
xmlcpmod	Content model node modifier
xmlctx	Master XML parser context
xmlnode	Document node
xmlnodes	Array of nodes
xmlntype	Node type enumeration

appendChild	Append child node to current node
appendData	Append character data to end of node's current data
cloneNode	Create a new node identical to the current one
createAttribute	Create a new attribute for an element node
createCDATASection	Create a CDATA_SECTION node
createComment	Create a COMMENT node
createDocumentFragment	Create a DOCUMENT_FRAGMENT node
createElement	Create an ELEMENT node
createEntityReference	Create an ENTITY_REFERENCE node
createProcessingInstruction	Create a PROCESSING_INSTRUCTION (PI) node
createTextNode	Create a TEXT node
deleteData	Remove substring from a node's character data
getAttrName	Return an attribute's name
getAttrSpecified	Return value of attribute's specified flag [DOM getSpecified]
getAttrValue	Return an attribute's value (definition) [DOM getValue]
getAttribute	Return the value of an attribute
getAttributeIndex	Return an element's attribute given its index
getAttributeNode	Get an element's attribute node given its name [DOM getName]
getAttributes	Return array of element's attributes
getCharData	Return character data for a TEXT node [DOM getData]
getCharLength	Return length of TEXT node's character data [DOM getLength]
getChildNode	Return indexed node from array of nodes [DOM item]
getChildNodes	Return array of node's children
getContentModel	Returns the content model for an element from the DTD [DOM extension]
getDocument	Return top-level DOCUMENT node [DOM extension]
getDocumentElement	Return highest-level (root) ELEMENT node
getDocType	Return current DTD
getDocTypeEntities	Return array of DTD's general entities
getDocTypeName	Return name of DTD
getDocTypeNotations	Return array of DTD's notations
getElementsByTagName	Return list of elements with matching name
getEntityNotation	Return an entity's NDATA [DOM getNotation]
getEntityPubID	Return an entity's public ID [DOM getPublicId]
getEntitySysID	Return an entity's system ID [DOM getSystemId]
getFirstChild	Return the first child of a node
getImplementation	Return DOM-implementation structure (if defined)
getLastChild	Return the last child of a node
getModifier	Returns a content model node's '?', '*', or '+' modifier [DOM extension]
getNextSibling	Return a node's next sibling
getNamedItem	Returns the named node from a list of nodes
getNodeMapLength	Returns number of entries in a NodeMap [DOM getLength]
getNodeName	Returns a node's name
getNodeType	Returns a node's type code (enumeration)
getNodeValue	Returns a node's "value", its character data
getNotationPubID	Returns a notation's public ID [DOM getPublicId]
getNotationSysID	Returns a notation's system ID [DOM getSystemId]
getOwnerDocument	Returns the DOCUMENT node containing the given node
getPIData	Returns a processing instruction's data [DOM getData]
getPITarget	Returns a processing instruction's target [DOM getTarget]
getParentNode	Returns a node's parent node
getPreviousSibling	Returns a node's "previous" sibling
getTagName	Returns a node's "tagname", same as name for now
hasAttributes	Determine if element node has attributes [DOM extension]
hasChildNodes	Determine if node has children
hasFeature	Determine if DOM implementation supports a specific feature
insertBefore	Inserts a new child node before the given reference node
insertData	Inserts new character data into a node's existing data
isStandalone	Determine if document is standalone [DOM extension]
nodeValid	Validate a node against the current DTD [DOM extension]
normalize	Normalize a node by merging adjacent TEXT nodes
numAttributes	Returns number of element node's attributes [DOM extension]
numChildNodes	Returns number of node's children [DOM extension]
removeAttribute	Removes an element's attribute given its name
removeAttributeNode	Removes an element's attribute given its pointer
removeChild	Removes a node from its parent's list of children
removeNamedItem	Removes a node from a list of nodes given its name
replaceChild	Replace one node with another
replaceData	Replace a substring of a node's character data with another string
setAttribute	Sets (adds or replaces) a new attribute for an element node given the attribute's name and value
setAttributeNode	Sets (adds or replaces) a new attribute for an element node given a pointer to the new attribute
setNamedItem	Sets (adds or replaces) a new node in a parent's list of children
setNodeValue	Sets a node's "value" (character data)
setPIData	Sets a processing instruction's data [DOM setData]
splitText	Split a node's character data into two parts
substringData	Return a substring of a node's character data

ctx	(IN)	XML context
parent	(IN)	parent node
newnode	(IN)	new node to append

ctx	(IN)	XML context
node	(IN)	pointer to node
arg	(IN)	new data to append

ctx	(IN)	XML context
name	(IN)	name of new attribute
value	(IN)	value of new attribute

node	(IN)	node whose attributes to scan
name	(IN)	name of the attribute

attrs	(IN)	pointer to attribute nodes structure (as returned by getAttributes)
index	(IN)	zero-based attribute# to return

nodes	(IN)	array of nodes (see getChildNodes)
index	(IN)	zero-based child#

ctx	(IN)	XML parser context
root	(IN)	root node of tree
name	(IN)	element tag name

ctx	(IN)	XML context
feature	(IN)	the package name of the feature to test
version	(IN)	the version number of the package name to test

elem	(IN)	pointer to element node
name	(IN)	name of attribute to remove

ctx	(IN)	XML context
newChild	(IN)	new replacement node
oldChild	(IN)	old node being replaced
