Migrating to earlier Releases
 

Migrating from Xerces-C++ 2.3.0 to Xerces-C++ 2.4.0
 

The following section is a discussion of the technical differences between Xerces-C++ 2.3.0 code base and the Xerces-C++ 2.4.0.

Topics discussed are:

New features in Xerces-C++ 2.4.0
 
  • PSVI
  • Performance enhancement
  • Stateless Grammar
  • Grammar Serialization/Deserialization

Public API Changes
 

The following lists the public API changes between the Xerces-C++ 2.3.0 and the Xerces-C++ 2.4.0 releases of the parser.

New Public API
 
  • PSVI related
  • Grammar serialization/deserialization related

Modified Public API
 

Deprecated/Removed Public API
 
  • XMLAttDef: getProvided, getDOMTypeInfoUri, getDOMTypeInfoName, setProvided
  • XMLAttDefList: hasMoreElements, nextElement, Reset
  • DTDAttDefList: hasMoreElements, nextElement, Reset
  • SchemaAttDefList: hasMoreElements, nextElement, Reset
  • XMLElementDecl: LookupOpts
  • XMLNumber family: toString
  • ENTITYDatatypeValidator: setEntityDeclPool
  • IDDatatypeValidator: setIDRefList
  • IDREFDatatypeValidator: setIDRefList
  • GeneralAttributeCheck: setIDRefList
  • SchemaGrammar: getIDRefList
  • SchemaElementDecl: all non thread safe methods
  • SchemaAttDef: getters
  • DTDGrammar: getRootElemId



Migrating from Xerces-C++ 2.2.0 to Xerces-C++ 2.3.0
 

The following section is a discussion of the technical differences between Xerces-C++ 2.2.0 code base and the Xerces-C++ 2.3.0.

Topics discussed are:

New features in Xerces-C++ 2.3.0
 
  • Experimental Implementation of Namespaces in XML 1.1
  • Experimental Implementation of XML 1.1: in DOMWriter
  • More Schema 1.0 Errata Implementation
  • More DOM L3 Core Support
    • DOMConfiguration
    • Document Normalization
  • Pluggable Memory Manager
  • Pluggable Security Manager
  • Pluggable Panic Handler
  • Logical Path Resolution

Public API Changes
 

The following lists the public API changes between the Xerces-C++ 2.2.0 and the Xerces-C++ 2.3.0 releases of the parser.

New Public API
 
  • To support additional DOM L3 functions, the following are added:
    • DOMDocument: getDOMConfiguration
    • DOMConfiguration class for document normalization.

Modified Public API
 

Deprecated/Removed Public API
 
  • DOMDocument: canSetNormalizationFeature, setNormalizationFeature, getNormalizationFeature, getErrorHandler, and setErrorHandler are removed.



Migrating from Xerces-C++ 2.1.0 to Xerces-C++ 2.2.0
 

The following section is a discussion of the technical differences between Xerces-C++ 2.1.0 code base and the Xerces-C++ 2.2.0.

Topics discussed are:

New features in Xerces-C++ 2.2.0
 
  • C++ Namespace Support
  • Schema 1.0 Errata Implementation
  • Experimental Implementation of XML 1.1
  • More DOM L3 Core Support:
    • DOMNode: baseURI
    • DOMAttr: isId, getTypeInfo
    • DOMElement: setIdAttribute, setIdAttributeNS, setIdAttributeNode, getTypeInfo
  • DOM Message: makes use of the non-standard extension DOMImplementation::loadDOMExceptionMsg to load the default error text message for the corresponding exception code.
  • New feature XMLPlatformUtils::Initialize(const char* const locale) to set the locale for the message loader. See Specify locale for Message Loader for details.
  • Support Build with ICU Message Loader, or Message Catalog Message Loader
  • RPM for Linux
  • 390: Uniconv390 support
  • 390: support record-oriented MVS datasets with the DOM Level 3 serialization APIs
  • Support for Linux/390
  • Performance: the Scanner is broken up by functionality, plus many other performance improvements
  • New feature, "http://apache.org/xml/features/dom/byte-order-mark", allows users to enable DOMWriter to write a Byte-Order-Mark in the output XML stream. See Xercesc Feature: Byte Order Mark for details.

Using C++ Namespace
 

Xerces-C++ 2.2.0 now supports C++ Namespace. All Xerces-C++ classes/data/variables are defined in the namespace "xercesc" if C++ Namespace support is enabled.

All the binary distributions of Xerces-C++ 2.2.0 are now built with C++ Namespace enabled. Therefore users' applications that link with the distributed binary packages must namespace-qualify all the Xerces-C++ classes/data/variables with "xercesc::" or add the "using namespace xercesc;" clause.

See the Programming Guide Using C++ Namespace for more details.
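
The two options can be sketched as follows. This is only an illustration: to keep it compilable without the library, it declares a small stand-in namespace "xercesc" with a simplified XMLString::equals instead of including the real headers, so the char-based signature and the body shown here are assumptions, not the library's implementation.

```cpp
#include <cstring>

// Stand-in so the sketch compiles on its own. A real application would
// include the Xerces-C++ headers (e.g. <xercesc/util/XMLString.hpp>) instead.
namespace xercesc {
    struct XMLString {
        // Simplified, assumed signature used only for this illustration.
        static bool equals(const char* a, const char* b) {
            return std::strcmp(a, b) == 0;
        }
    };
}

// Option 1: qualify every Xerces-C++ name with "xercesc::".
bool sameQualified(const char* a, const char* b) {
    return xercesc::XMLString::equals(a, b);
}

// Option 2: add a "using namespace xercesc;" clause and use the bare names.
bool sameWithUsing(const char* a, const char* b) {
    using namespace xercesc;
    return XMLString::equals(a, b);
}
```

Either form resolves to the same class; the using clause is simply less typing when many Xerces-C++ names appear in one file.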


Public API Changes
 

The following lists the public API changes between the Xerces-C++ 2.1.0 and the Xerces-C++ 2.2.0 releases of the parser.

New Public API
 
  • To support additional DOM L3 functions, the following are added:
    • DOMAttr: isId, getTypeInfo
    • DOMElement: setIdAttribute, setIdAttributeNS, setIdAttributeNode, getTypeInfo
    • Added DOMTypeInfo class for use by getTypeInfo in DOMElement and DOMAttr
    • Added getDOMTypeInfoUri, getDOMTypeInfoName to XMLAttDef and XMLElementDecl for use in building DOMTypeInfo
  • Added a non-standard extension DOMImplementation::loadDOMExceptionMsg to load the default error message for the corresponding DOMException code.
  • XMLAttr: Added a constructor and a set method to allow creating/setting of XMLAttr using a rawname.
  • Added XMLUri::getUriText to return the URI as a string specification.
  • Added XMLString::fixURI to transform an absolute path filename to standard URI form.
  • Added XMLString::equals for faster string comparison.
  • To allow users to tell the parser to force standard uri conformance, the following are added:
    • XercesDOMParser/DOMParser/SAXParser: get/setStandardUriConformant
    • and DOMBuilder/SAX2XMLReader will recognize the feature http://apache.org/xml/features/standard-uri-conformant
  • Added XMLURL::hasInvalidChar() to indicate whether the URL has an invalid character as per the RFC standard
  • To allow users to enable/disable src offset calculation, the following are added:
    • XercesDOMParser/DOMParser/SAXParser: get/setCalculateSrcOfs
    • and DOMBuilder/SAX2XMLReader will recognize the feature http://apache.org/xml/features/calculate-src-ofst
  • To allow users to select the scanner when scanning XML documents, the following are added:
    • XercesDOMParser/DOMParser/SAXParser: useScanner
    • and DOMBuilder/SAX2XMLReader will recognize the property http://apache.org/xml/properties/scannerName
  • Added getSrcOffset to XercesDOMParser/DOMParser/SAXParser/DOMBuilder/SAX2XMLReader to allow users to get the current src offset within the input source.

Modified Public API
 
  • The following DOM functions now have a const modifier:
    • DOMImplementation::hasFeature
    • DOMNode: isSameNode, isEqualNode, compareTreePosition
  • XMLPlatformUtils::Initialize() takes a parameter specifying the locale for the message loader, with default value "en_US".
  • To fix [Bug 13641], the QName copy constructor is corrected to take a reference as parameter, i.e. QName(const QName& qname).
  • To fix [Bug 12232], the QName operator== now has a const modifier.
  • The XMLUri copy constructor and operator= are now public.
  • XMLUri::isURIString is now public.
  • For validation purposes, two more default parameters are added to XMLValidator::validateAttrValue.
  • To fix [Bug 15802], getURIText of DOMParser/XercesDOMParser/SAXParser/SAX2XMLReader now has a const modifier.
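
The practical effect of the added const modifiers can be sketched as below. ToyNode is a hypothetical stand-in, not the real DOMNode class; the point is only that a const-qualified query can be called through a reference or pointer to const, which was not possible before the modifier was added.

```cpp
// Hypothetical stand-in for a DOM node; not the real DOMNode class.
class ToyNode {
public:
    // With the const modifier, this query no longer needs a mutable node.
    bool isSameNode(const ToyNode* other) const { return this == other; }
};

// A helper that only has read-only access can now call the query.
bool refersTo(const ToyNode& a, const ToyNode& b) {
    return a.isSameNode(&b);
}
```

Code that already called these functions on non-const objects is unaffected; classes that override them need the const added to their own declarations to keep overriding.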

Deprecated/Removed Public API
 
  • No Deprecated Public API in this release.



Migrating from Xerces-C++ 2.0.0 to Xerces-C++ 2.1.0
 

The following section is a discussion of the technical differences between Xerces-C++ 2.0.0 code base and the Xerces-C++ 2.1.0.

Topics discussed are:

New features in Xerces-C++ 2.1.0
 
  • 64 bit binaries distribution on Windows IA64 and Linux IA64
  • Support for Cygwin environment
  • DOM Level 3 DOMNode: compareTreePosition, lookupNamespaceURI, lookupNamespacePrefix and isDefaultNamespace
  • plus many more bug fixes

Public API Changes
 

The following lists the public API changes between the Xerces-C++ 2.0.0 and the Xerces-C++ 2.1.0 releases of the parser.

New Public API
 
  • To fix bug 7087, a virtual destructor is added to XMLEnumerator.
  • To fix bug 11448, XMLNotationDecl::get/setBaseURI, and XMLEntityDecl::get/setBaseURI are added.
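
Why the virtual destructor matters can be sketched as follows. ToyEnumerator and ArrayEnumerator are hypothetical classes loosely modelled on the enumerator interface; they are not the library's code. Without the virtual destructor in the base class, deleting a derived enumerator through a base pointer is undefined behaviour and the derived destructor may never run.

```cpp
// Hypothetical interface, loosely modelled on XMLEnumerator.
template <class TElem>
class ToyEnumerator {
public:
    virtual ~ToyEnumerator() {}   // makes delete-through-base well defined
    virtual bool hasMoreElements() const = 0;
    virtual TElem& nextElement() = 0;
};

// A concrete enumerator over an int array that records its own destruction.
class ArrayEnumerator : public ToyEnumerator<int> {
public:
    ArrayEnumerator(int* data, int size, bool* destroyed)
        : fData(data), fSize(size), fNext(0), fDestroyed(destroyed) {}
    ~ArrayEnumerator() { *fDestroyed = true; }

    bool hasMoreElements() const { return fNext < fSize; }
    int& nextElement() { return fData[fNext++]; }

private:
    int*  fData;
    int   fSize;
    int   fNext;
    bool* fDestroyed;
};

// Walks the elements and deletes the enumerator through the base pointer.
bool sumAndDestroyThroughBase() {
    int data[3] = { 1, 2, 3 };
    bool destroyed = false;
    ToyEnumerator<int>* e = new ArrayEnumerator(data, 3, &destroyed);
    int sum = 0;
    while (e->hasMoreElements())
        sum += e->nextElement();
    delete e;   // runs ~ArrayEnumerator because the base destructor is virtual
    return destroyed && sum == 6;
}
```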

Modified Public API
 
  • DOMNodeList: item and getLength now have a const modifier.
  • DOMNode: lookupNamespacePrefix, isDefaultNamespace, and lookupNamespaceURI now have a const modifier.

Deprecated/Removed Public API
 
  • No Deprecated Public API in this release.



Migrating from Xerces-C++ 1.7.0 to Xerces-C++ 2.0.0
 

The following section is a discussion of the technical differences between Xerces-C++ 1.7.0 code base and the Xerces-C++ 2.0.0.

Topics discussed are:

New features in Xerces-C++ 2.0.0
 
  • 64 bit binaries distribution
  • Follow Unix Shared Library Naming Convention
  • Apache Recommended DOM C++ Binding
  • Experimental DOM Level 3 subset support, including DOMWriter and DOMBuilder
  • Grammar preparsing and Grammar caching
  • Optionally ignore loading of external DTD
  • Project files for Microsoft Visual C++ .Net
  • Codewarrior 8 support
  • Option to enable/disable strict IANA encoding name checking
  • plus many more bug fixes and performance enhancements

Unix Library Name Change
 

The Xerces-C++ UNIX Library now follows the Unix Shared Library Naming Convention (libname.so.soname). It is now called:

  • AIX
    • libxerces-c25.0.so
    • symbolic link: libxerces-c.so ----> libxerces-c25.so
    • symbolic link: libxerces-c25.so ----> libxerces-c25.0.so
  • Solaris / Linux
    • libxerces-c.so.25.0
    • symbolic link: libxerces-c.so ----> libxerces-c.so.25
    • symbolic link: libxerces-c.so.25 ----> libxerces-c.so.25.0
  • HP-UX
    • libxerces-c.sl.25.0
    • symbolic link: libxerces-c.sl ----> libxerces-c.sl.25
    • symbolic link: libxerces-c.sl.25 ----> libxerces-c.sl.25.0

DOM Reorganization
 

1. The old Java-like DOM is now deprecated, and all the associated files, including the headers and DOMParser files, are moved to src/xercesc/dom/deprecated. Users of the old Java-like DOM are required to change all their #include lines to pick up the headers from the new location. For example:

//old code
#include <xercesc/dom/DOM.hpp>
#include <xercesc/dom/DOM_Document.hpp>
#include <xercesc/parsers/DOMParser.hpp>

void test(char* xmlFile) {
    DOMParser parser;
    parser.parse(xmlFile);
    DOM_Document doc = parser.getDocument();
    // ...
    return;
}

should now change to

//new code
#include <xercesc/dom/deprecated/DOM.hpp>          //<==== change this include line
#include <xercesc/dom/deprecated/DOM_Document.hpp> //<==== change this include line
#include <xercesc/dom/deprecated/DOMParser.hpp>    //<==== change this include line

// the rest is the same
void test(char* xmlFile) {
    DOMParser parser;
    parser.parse(xmlFile);
    DOM_Document doc = parser.getDocument();
    // ...
    return;
}

2. The Experimental IDOM is now renamed, and becomes the Apache Recommended DOM C++ Binding. The following changes are made:

  • class names are renamed from IDOM_XXXX to DOMXXXX, e.g. IDOM_Document to DOMDocument
  • and thus header files are renamed from IDOM_XXXX.hpp to DOMXXXX.hpp and are moved to src/xercesc/dom
  • the IDOMParser is renamed to XercesDOMParser. And thus the header file is renamed as well
  • the rest is the same, see Apache Recommended DOM C++ binding and DOM Programming Guide for more programming information

Users of IDOM are required to change all their #include lines and do a global rename of IDOMParser to XercesDOMParser, and IDOM_XXXX to DOMXXXX. For example:

//old code
#include <xercesc/idom/IDOM.hpp>
#include <xercesc/idom/IDOM_Document.hpp>
#include <xercesc/parsers/IDOMParser.hpp>

void test(char* xmlFile) {
    IDOMParser parser;
    parser.parse(xmlFile);
    IDOM_Document* doc = parser.getDocument();
    // ...
    return;
}

should now change to

//new code
#include <xercesc/dom/DOM.hpp>                  //<==== change this include line
#include <xercesc/dom/DOMDocument.hpp>          //<==== change this include line
#include <xercesc/parsers/XercesDOMParser.hpp>  //<==== change this include line

void test(char* xmlFile) {
    XercesDOMParser parser;                           //<==== rename the IDOMParser
    parser.parse(xmlFile);
    DOMDocument* doc = parser.getDocument();          //<==== rename the IDOM_XXXX
    // ...
    return;
}

Reuse Grammar becomes Grammar Caching
 

Xerces-C++ 2.0.0 extends the "Reuse Grammar" support by replacing it with a new feature called "Grammar Caching", which provides more flexibility in reusing grammars. Users who used to do the following:


      XercesDOMParser parser;

      // this is the first parse, just usual code as you do normal parse
      // "firstXmlFile" has a grammar (schema or DTD) specified.
      parser.parse(firstXmlFile);

      // this is the second parse, by setting second parameter to true,
      // the parser will reuse the grammar in the last parse
      // (i.e. the one in  "firstXmlFile")
      // to validate the second "anotherXmlFile".  Any grammar that is
      // specified in anotherXmlFile is IGNORED.
      //
      // Note: The anotherXmlFile cannot have any DTD internal subset.
      parser.parse(anotherXmlFile, true);

should now use the features cacheGrammarFromParse and useCachedGrammarInParse:

      XercesDOMParser parser;

      // By setting cacheGrammarFromParse to true,
      // the parser will cache any grammars encountered in the
      // follow-on xml files, if not cached already
      parser.cacheGrammarFromParse(true);

      parser.parse(firstXmlFile);

      // By setting useCachedGrammarInParse to true,
      // the parser will use all the previously cached grammars
      // to validate the follow-on xml files if the cached
      // grammar matches the one specified in anotherXmlFile.
      //
      // Note: The follow-on xml files cannot have any DTD internal subset.
      parser.useCachedGrammarInParse(true);

      parser.parse(anotherXmlFile);

      // This will flush the cached grammar pool
      parser.resetCachedGrammarPool();

Note there are a number of differences between "Reuse Grammar" and "Grammar Caching":

  1. "Reuse Grammar" ignores any grammar that is specified in anotherXmlFile and simply reuses whatever was stored in the previous parse, while "Grammar Caching" uses the cached grammar only if it matches the one specified in anotherXmlFile. If it does not match, the new grammar is parsed.
  2. "Reuse Grammar" can only reuse the grammar from the previous parse, while "Grammar Caching" can selectively cache many grammars from different parses and collect them all in a pool indexed by targetNamespace (for Schema) or system id (for DTD).
  3. "Grammar Caching" also offers functionality beyond the above (like "Pre-parsing Grammar"). Please refer to Preparsing Grammar and Grammar Caching for more programming details.
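
The first two differences can be modelled with a small, self-contained sketch. ToyGrammar, ReuseGrammarModel and GrammarCachingModel are hypothetical names invented for this illustration; they imitate only the lookup behaviour described above and are not how the parser is implemented.

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for a parsed grammar, identified by its key
// (targetNamespace for a schema, system id for a DTD).
struct ToyGrammar {
    std::string key;
};

// Old behaviour: only the grammar of the previous parse is remembered, and
// with reuse == true the grammar declared in the document is ignored.
class ReuseGrammarModel {
public:
    ReuseGrammarModel() : fHasLast(false), fLoads(0) {}

    const ToyGrammar& parse(const std::string& declaredKey, bool reuse) {
        if (!(reuse && fHasLast)) {
            fLast.key = declaredKey;   // "load" the declared grammar
            fHasLast = true;
            ++fLoads;
        }
        return fLast;
    }
    int loads() const { return fLoads; }

private:
    ToyGrammar fLast;
    bool fHasLast;
    int fLoads;
};

// New behaviour: a pool of grammars; a cached entry is used only when its
// key matches the grammar declared in the document being parsed.
class GrammarCachingModel {
public:
    GrammarCachingModel() : fLoads(0) {}

    const ToyGrammar& parse(const std::string& declaredKey) {
        std::map<std::string, ToyGrammar>::iterator it = fPool.find(declaredKey);
        if (it == fPool.end()) {
            ToyGrammar g;
            g.key = declaredKey;       // no match: parse the new grammar
            it = fPool.insert(std::make_pair(declaredKey, g)).first;
            ++fLoads;
        }
        return it->second;
    }
    void resetCachedGrammarPool() { fPool.clear(); }
    int loads() const { return fLoads; }

private:
    std::map<std::string, ToyGrammar> fPool;
    int fLoads;
};
```

In the reuse model a second document is always validated against the first document's grammar, whereas in the caching model a document that declares a different grammar causes that grammar to be loaded and added to the pool.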

Public API Changes
 

The following lists the public API changes between the Xerces-C++ 1.7.0 and the Xerces-C++ 2.0.0 releases of the parser.

New Public API
 
  • To support DOM Level 3, the following are added (see the API documentation page for details).
    • DOMNode functions set/getUserData, isSameNode, isEqualNode.
    • DOMDocument functions renameNode, get/setActualEncoding, get/setEncoding, get/setVersion, get/setStandalone, get/setDocumentURI.
    • DOMEntity functions get/setActualEncoding, get/setEncoding, get/setVersion.
    • classes AbstractDOMParser, DOMError, DOMErrorHandler, and DOMLocator.
    • classes DOMUserDataHandler, DOMImplementationRegistry and DOMImplementationSource.
    • classes DOMBuilder, DOMEntityResolver, DOMImplementationLS, DOMInputSource, Wrapper4DOMInputSource and Wrapper4InputSource.
    • classes DOMWriter, DOMWriterFilter, LocalFileFormatTarget, StdOutFormatTarget, and MemBufFormatTarget
  • To support DOMWriter, the following PlatformUtils functions are added
    • openFileToWrite, writeBufferToFile
  • To have Apache Recommended DOM C++ Binding, the following are added (see Apache Recommended DOM C++ binding).
    • function release() to fix Memory Management problem
    • classes DOMDocumentRange and DOMDocumentTraversal
    • XMLSize_t is used to represent unsigned integral type in DOM
    • IDOM_XXXX classes are renamed to DOMXXXX, and IDOMParser is renamed to XercesDOMParser as described in DOM Reorganization
    • XercesDOMParser::adoptDocument is added so that document can optionally live outside the parser.
  • To support optionally loading the external DTD, the following are added:
    • XercesDOMParser::set/getLoadExternalDTD
    • DOMParser::set/getLoadExternalDTD
    • SAXParser::set/getLoadExternalDTD
    • and SAX2XMLReader will recognize the feature http://apache.org/xml/features/nonvalidating/load-external-dtd
  • To support Preparsing Grammar and Grammar Caching, the following are added:
    • XercesDOMParser/DOMParser/SAXParser functions loadGrammar, resetCachedGrammarPool, cacheGrammarFromParse, isCachingGrammarFromParse, useCachedGrammarInParse, isUsingCachedGrammarInParse.
    • SAX2XMLReader functions loadGrammar, resetCachedGrammarPool, and will recognize the features http://apache.org/xml/features/validation/cache-grammarFromParse and http://apache.org/xml/features/validation/use-cachedGrammarInParse.
  • To support access to Grammar info, the following are added:
    • XercesDOMParser/DOMParser/SAXParser/SAX2XMLReader functions getRootGrammar, getGrammar, getURIText.
  • To support strict IANA encoding name checking, the following are added:
    • class EncodingValidator.
    • PlatformUtils functions strictIANAEncoding, isStrictIANAEncoding.
    • XMLTransService functions strictIANAEncoding, isStrictIANAEncoding.
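
The ownership idea behind release() and XercesDOMParser::adoptDocument, listed above, can be sketched with stand-in classes. ToyDocument and ToyParser are hypothetical and much simpler than the real classes; the sketch assumes that a document obtained through getDocument stays owned by the parser, while an adopted document outlives the parser and must be released by the caller.

```cpp
// Hypothetical stand-ins; not the real DOMDocument / XercesDOMParser classes.
class ToyDocument {
public:
    explicit ToyDocument(int* liveCount) : fLive(liveCount) { ++*fLive; }
    ~ToyDocument() { --*fLive; }
    void release() { delete this; }   // mirrors the idea of DOM release()
private:
    int* fLive;
};

class ToyParser {
public:
    explicit ToyParser(int* liveCount) : fLive(liveCount), fDoc(0) {}
    ~ToyParser() { if (fDoc) fDoc->release(); }

    void parse() {
        if (fDoc) fDoc->release();
        fDoc = new ToyDocument(fLive);
    }
    ToyDocument* getDocument() { return fDoc; }   // parser keeps ownership
    ToyDocument* adoptDocument() {                // caller takes ownership
        ToyDocument* doc = fDoc;
        fDoc = 0;
        return doc;
    }

private:
    // Copying a ToyParser is not handled in this sketch.
    int* fLive;
    ToyDocument* fDoc;
};

// Reports how many documents are still alive once the parser is destroyed.
int documentsAliveAfterParser(bool adopt) {
    int live = 0;
    ToyDocument* mine = 0;
    {
        ToyParser parser(&live);
        parser.parse();
        if (adopt)
            mine = parser.adoptDocument();
    }   // parser destroyed here
    const int alive = live;
    if (mine)
        mine->release();   // the adopter is responsible for the release
    return alive;
}
```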

Modified Public API
 
  • SAXParser::getScanner() is moved from public to protected.
  • Grammar::getGrammarType now has a const modifier.
  • Xerces features are renamed from XMLUni::fgSAX2XercesXXXX to XMLUni::fgXercesXXXX so that they can be shared with DOM parser.
  • With the new Grammar Caching introduced, the last parameter "reuseGrammar" in the following API is dropped. Users should now use the "Grammar Caching" feature as described in Reuse Grammar becomes Grammar Caching.
    • (in Parser, SAXParser, DOMParser, and XercesDOMParser)
    • parse(const InputSource& source, const bool reuseGrammar = false);
    • parse(const XMLCh* const systemId, const bool reuseGrammar = false);
    • parse(const char* const systemId, const bool reuseGrammar = false);
    • (in SAXParser, DOMParser, and XercesDOMParser)
    • parseFirst(const InputSource& source, XMLPScanToken& toFill, const bool reuseGrammar = false);
    • parseFirst(const XMLCh* const systemId, XMLPScanToken& toFill, const bool reuseGrammar = false);
    • parseFirst(const char* const systemId, XMLPScanToken& toFill, const bool reuseGrammar = false);

Deprecated/Removed Public API
 
  • The old Java-like DOM is now deprecated as described in DOM Reorganization
  • SAX2XMLReader::setValidationConstraint. For consistency, SAX2XMLReader users should set the feature "http://apache.org/xml/features/validation-error-as-fatal" instead.
  • SAX2XMLReader::setExitOnFirstFatalError. For consistency, SAX2XMLReader users should set the feature "http://apache.org/xml/features/continue-after-fatal-error" instead.
  • With the new Grammar Caching introduced, the following features will not be recognized by the SAX2XMLReader:
    • http://apache.org/xml/features/validation/reuse-grammar
    • http://apache.org/xml/features/validation/reuse-validator
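
The switch from the removed setters to features can be sketched as below. ToyReader is a hypothetical stand-in that stores features in a map keyed by std::string; the real SAX2XMLReader::setFeature takes the feature name as an XMLCh string. The sketch also assumes that the feature names mean what they say, so asking the parser to exit on the first fatal error corresponds to setting continue-after-fatal-error to false.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for a SAX2 reader's feature table.
class ToyReader {
public:
    void setFeature(const std::string& name, bool value) {
        fFeatures[name] = value;
    }
    // Unknown features read as false in this sketch.
    bool getFeature(const std::string& name) const {
        std::map<std::string, bool>::const_iterator it = fFeatures.find(name);
        return it != fFeatures.end() && it->second;
    }
private:
    std::map<std::string, bool> fFeatures;
};

void configureStrictReader(ToyReader& reader) {
    // Formerly: reader.setValidationConstraint(true);
    reader.setFeature(
        "http://apache.org/xml/features/validation-error-as-fatal", true);

    // Formerly: reader.setExitOnFirstFatalError(true);
    // Assumed inverted sense: "exit on first" means "do not continue".
    reader.setFeature(
        "http://apache.org/xml/features/continue-after-fatal-error", false);
}
```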



Migrating from Xerces-C++ 1.6.0 to 1.7.0
 

The following section is a discussion of the technical differences between Xerces-C++ 1.6.0 code base and the Xerces-C++ 1.7.0 code base.

New features in Xerces-C++ 1.7.0
 
  • Support SAX2-ext's DeclHandler.
  • Directory sane_include reorganization: add sub-directory 'xercesc' to src / include folder. See "Directory change in Xerces-C++ 1.7.0" below for detail.
  • More IDOM test cases - port IDOMMemTest, and merge ThreadTest and IThreadTest.
  • Support IconvFBSD in multi-threading environment.
  • Use IDOM in schema processing for faster performance.
  • Add Project files for BCB6.
  • Port to Caldera (SCO) OpenServer.
  • Support building with new MacOSURLAccessCF NetAccessor that doesn't require Carbon but can allow Xerces to live solely within CoreServices layer.

Directory change in Xerces-C++ 1.7.0
 
  • A new directory, src/xercesc is created to be the new parent directory of all src's direct subdirectories.
  • And in the binary package, all the headers are distributed in include/xercesc directory.
  • Migration considerations:
    • Windows application,
      either change the include directories setting to "..\..\..\..\..\src\xercesc" (Projects->settings->C/C++->Preprocessor),
      or
      change the relevant #include instances in the source/header files, accordingly, eg
      #include <util/XMLString.hpp> be changed to
      #include <xercesc/util/XMLString.hpp>
    • Unix application,
      either change the include search path in the Makefile to " -I <installroot>/include/xercesc",
      or
      change the relevant #include instances in the source/header files as shown above.

Public API Changes in Xerces-C++ 1.7.0
 

The following lists the public API changes between the Xerces-C++ 1.6.0 and the Xerces-C++ 1.7.0 releases of the parser.

New Public API
 
  • Added SAX2-ext's DeclHandler class. See the API documentation page for details.
  • To support SAX2-ext's DeclHandler, the following new methods are added in classes DefaultHandler and SAX2XMLReader:
    • void DefaultHandler::elementDecl(const XMLCh* const name, const XMLCh* const model)
    • void DefaultHandler::attributeDecl(const XMLCh* const eName, const XMLCh* const aName, const XMLCh* const type, const XMLCh* const mode, const XMLCh* const value)
    • void DefaultHandler::internalEntityDecl(const XMLCh* const name, const XMLCh* const value)
    • void DefaultHandler::externalEntityDecl(const XMLCh* const name, const XMLCh* const publicId, const XMLCh* const systemId)
    • DeclHandler* SAX2XMLReader::getDeclarationHandler() const
    • void SAX2XMLReader::setDeclarationHandler(DeclHandler* const handler)
  • To conform to DOM Level 2 specification, the following methods are added:
    • DOM_Node DOM_NodeIterator::getRoot()
    • DOM_Node DOM_TreeWalker::getRoot()
    • bool DOM_Node::hasAttributes() const
    • bool DOM_Element::hasAttribute(const DOMString &name) const
    • bool DOM_Element::hasAttributeNS(const DOMString &namespaceURI, const DOMString &localName) const
    • IDOM_Node* IDOM_NodeIterator::getRoot()
    • IDOM_Node* IDOM_TreeWalker::getRoot()
    • bool IDOM_Node::hasAttributes() const
    • bool IDOM_Element::hasAttribute(const XMLCh* name) const
    • bool IDOM_Element::hasAttributeNS(const XMLCh* namespaceURI, const XMLCh* localName) const
  • To fix [Bug 5570], a copy constructor is added to DOM_Range
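
A handler for the new declaration callbacks might look like the following sketch. To stay compilable without the library it declares a stand-in interface, ToyDeclHandler, whose four callbacks copy the signatures listed above; XMLCh is typedef'ed locally as an assumption, and DeclCounter is a hypothetical example class. With the real library, a handler would derive from DeclHandler or DefaultHandler and be registered with SAX2XMLReader::setDeclarationHandler.

```cpp
typedef unsigned short XMLCh;   // stand-in; the real typedef is platform dependent

// Stand-in interface mirroring the four callbacks listed above.
class ToyDeclHandler {
public:
    virtual ~ToyDeclHandler() {}
    virtual void elementDecl(const XMLCh* const name, const XMLCh* const model) = 0;
    virtual void attributeDecl(const XMLCh* const eName, const XMLCh* const aName,
                               const XMLCh* const type, const XMLCh* const mode,
                               const XMLCh* const value) = 0;
    virtual void internalEntityDecl(const XMLCh* const name,
                                    const XMLCh* const value) = 0;
    virtual void externalEntityDecl(const XMLCh* const name,
                                    const XMLCh* const publicId,
                                    const XMLCh* const systemId) = 0;
};

// A handler that only counts the declarations it is told about.
class DeclCounter : public ToyDeclHandler {
public:
    DeclCounter() : elements(0), attributes(0), entities(0) {}

    void elementDecl(const XMLCh* const, const XMLCh* const) { ++elements; }
    void attributeDecl(const XMLCh* const, const XMLCh* const, const XMLCh* const,
                       const XMLCh* const, const XMLCh* const) { ++attributes; }
    void internalEntityDecl(const XMLCh* const, const XMLCh* const) { ++entities; }
    void externalEntityDecl(const XMLCh* const, const XMLCh* const,
                            const XMLCh* const) { ++entities; }

    int elements;
    int attributes;
    int entities;
};
```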

Modified Public API
 
  • To conform to the SAX2 specification, the namespace-prefixes feature in SAX2 is set to off as default.
  • To fix [Bug 6330], the Base64::encode and Base64::decode have been modified as follows
    • static XMLByte* Base64::encode(const XMLByte* const inputData, const unsigned int inputLength, unsigned int* outputLength);
    • static XMLByte* Base64::decode(const XMLByte* const inputData, unsigned int* outputLength);
    • static XMLCh* decode(const XMLCh* const inputData, unsigned int* outputLength);
  • To conform to DOM Level 2 specification, the DOM_Node::supports and IDOM_Node::supports are modified to
    • bool DOM_Node::isSupported(const DOMString &feature, const DOMString &version) const
    • bool IDOM_Node::isSupported(const XMLCh* feature, const XMLCh* version) const
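
The calling pattern implied by the revised Base64 signatures above, a returned buffer plus a length written through an out parameter, can be sketched as follows. toyBase64Encode is a hypothetical re-implementation written for this illustration with the same parameter list as Base64::encode; it omits any line breaking the library may apply, and the assumption that the caller frees the buffer with delete[] holds for this sketch only, so check the API documentation for how the real buffer must be released.

```cpp
#include <string>

typedef unsigned char XMLByte;   // stand-in typedef for this sketch

// Hypothetical encoder using the same parameter list as Base64::encode.
XMLByte* toyBase64Encode(const XMLByte* const inputData,
                         const unsigned int inputLength,
                         unsigned int* outputLength)
{
    static const char table[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    const unsigned int groups = (inputLength + 2) / 3;
    XMLByte* out = new XMLByte[groups * 4 + 1];
    unsigned int o = 0;
    for (unsigned int i = 0; i < inputLength; i += 3) {
        const unsigned int remaining = inputLength - i;
        const unsigned int b0 = inputData[i];
        const unsigned int b1 = remaining > 1 ? inputData[i + 1] : 0;
        const unsigned int b2 = remaining > 2 ? inputData[i + 2] : 0;
        out[o++] = table[b0 >> 2];
        out[o++] = table[((b0 & 0x03) << 4) | (b1 >> 4)];
        out[o++] = remaining > 1 ? table[((b1 & 0x0F) << 2) | (b2 >> 6)] : '=';
        out[o++] = remaining > 2 ? table[b2 & 0x3F] : '=';
    }
    out[o] = 0;                  // terminate so the buffer is also a C string
    if (outputLength)
        *outputLength = o;       // length reported through the out parameter
    return out;
}

// Caller side: pass a length variable by address, copy, then free the buffer.
std::string encodeToString(const std::string& text) {
    unsigned int length = 0;
    XMLByte* encoded = toyBase64Encode(
        reinterpret_cast<const XMLByte*>(text.data()),
        static_cast<unsigned int>(text.size()), &length);
    std::string result(reinterpret_cast<const char*>(encoded), length);
    delete[] encoded;            // ownership rule assumed for this sketch only
    return result;
}
```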

Deprecated Public API
 
  • No Deprecated Public API in this release.



Migrating from Xerces-C++ 1.5.2 to 1.6.0
 

The following section is a discussion of the technical differences between Xerces-C++ 1.5.2 code base and the Xerces-C++ 1.6.0 code base.

New features in Xerces-C++ 1.6.0
 
  • Full Schema support is available in this release. See the Schema page for details.
  • New sample SEnumVal to show how to enumerate the markup decls in a Schema Grammar is added.

Public API Changes in Xerces-C++ 1.6.0
 

The following lists the public API changes between the Xerces-C++ 1.5.2 and the Xerces-C++ 1.6.0 releases of the parser.

New Public API
 
  • It should not be a fatal error if a schema InputSource is not found. Add the following new methods:
    • const bool InputSource::getIssueFatalErrorIfNotFound() const
    • void InputSource::setIssueFatalErrorIfNotFound(const bool flag)
  • Allow code to take advantage of the fact that the length of the prefix and local name are known when constructing the QName. Add the following new methods:
    • void QName::setNPrefix(const XMLCh*, const unsigned int)
    • void QName::setNLocalPart(const XMLCh*, const unsigned int)
  • To support schemaLocation and noNamespaceSchemaLocation to be specified outside the instance document, the following new methods are added:
    • XMLCh* DOMParser::getExternalSchemaLocation() const
    • XMLCh* DOMParser::getExternalNoNamespaceSchemaLocation() const
    • void DOMParser::setExternalSchemaLocation(const XMLCh* const schemaLocation)
    • void DOMParser::setExternalNoNamespaceSchemaLocation(const char* const noNamespaceSchemaLocation)
    • XMLCh* IDOMParser::getExternalSchemaLocation() const
    • XMLCh* IDOMParser::getExternalNoNamespaceSchemaLocation() const
    • void IDOMParser::setExternalSchemaLocation(const XMLCh* const schemaLocation)
    • void IDOMParser::setExternalNoNamespaceSchemaLocation(const char* const noNamespaceSchemaLocation)
    • XMLCh* SAXParser::getExternalSchemaLocation() const
    • XMLCh* SAXParser::getExternalNoNamespaceSchemaLocation() const
    • void SAXParser::setExternalSchemaLocation(const XMLCh* const schemaLocation)
    • void SAXParser::setExternalNoNamespaceSchemaLocation(const char* const noNamespaceSchemaLocation)
    • and the following properties are recognized by SAX2XMLReader:
      • http://apache.org/xml/properties/schema/external-schemaLocation
      • http://apache.org/xml/properties/schema/external-noNamespaceSchemaLocation
  • To support identity constraints, the following new method is added:
    • QName* XMLAttr::getAttName() const

Modified Public API
 
  • To support attribute constraint checking, the constant values in XMLAttDef::DefAttTypes have been re-ordered.

Deprecated Public API
 
  • Root Element check is moved from XMLValidator to XMLScanner. Thus XMLValidator::checkRootElement() is deprecated.



Migrating from Xerces-C++ 1.4.0 to 1.5.2
 

The following section is a discussion of the technical differences between Xerces-C++ 1.4.0 code base and the Xerces-C++ 1.5.2 code base.

New features in Xerces-C++ 1.5.2
 

Schema subset support and an experimental IDOM are available in this release.

Schema Subset Support
 
  • New function "setDoSchema" is added to DOM/SAX parser.
  • New feature "http://apache.org/xml/features/validation/schema" is recognized by SAX2XMLReader.
  • New classes such as SchemaValidator, TraverseSchema ... are added.
  • The Scanner is enhanced to process schema.
  • New sample data files personal-schema.xml and personal.xsd.
  • New command line option "-s" for samples.

See the Schema page for details.


Experimental IDOM
 

The experimental IDOM API is a new design of the C++ DOM API. If you would like to migrate from DOM to the experimental IDOM, please refer to IDOM programming guide. Please note that this experimental IDOM API is only a prototype and is subject to change.



Changes required to migrate to Xerces-C++ 1.5.2
 

There are some architectural changes between the Xerces-C++ 1.4.0 and the Xerces-C++ 1.5.2 releases of the parser, and as a result, some code has undergone restructuring as shown below.

Validator directory Reorganization
 
  • common content model files such as DFAContentModel ... are moved to a new directory called src/validators/common
  • DTD related files are moved to a new directory called src/validators/DTD
  • new directory src/validators/Datatype is created to store all datatype validators
  • new directory src/validators/schema is created to store Schema related files

DTDValidator
 

DTDValidator was designed to scan, validate and store the DTD in Xerces-C++ 1.4.0 or earlier. In Xerces-C++ 1.5.2, this process is broken down into three components:

  • new class DTDScanner - to scan the DTD
  • new class DTDGrammar - to store the DTD Grammar
  • DTDValidator - to validate the DTD only
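
The division of labour between the three components can be sketched with toy classes. ToyDTDGrammar, ToyDTDScanner and ToyDTDValidator are hypothetical names; the scanner here only recognises whitespace-separated "<!ELEMENT name ...>" declarations, which is far simpler than real DTD scanning, and the validator only checks that an element name was declared.

```cpp
#include <set>
#include <sstream>
#include <string>

// Stores what was declared (the role of the grammar).
class ToyDTDGrammar {
public:
    void declareElement(const std::string& name) { fElements.insert(name); }
    bool isDeclared(const std::string& name) const {
        return fElements.count(name) != 0;
    }
private:
    std::set<std::string> fElements;
};

// Reads declarations and fills the grammar (the role of the scanner).
class ToyDTDScanner {
public:
    explicit ToyDTDScanner(ToyDTDGrammar& grammar) : fGrammar(grammar) {}

    // The token following "<!ELEMENT" is taken as the element name.
    void scan(const std::string& dtdText) {
        std::istringstream in(dtdText);
        std::string token;
        while (in >> token) {
            if (token == "<!ELEMENT" && (in >> token))
                fGrammar.declareElement(token);
        }
    }
private:
    ToyDTDGrammar& fGrammar;
};

// Checks content against the stored grammar (the role of the validator).
class ToyDTDValidator {
public:
    explicit ToyDTDValidator(const ToyDTDGrammar& grammar) : fGrammar(grammar) {}
    bool isValidElement(const std::string& name) const {
        return fGrammar.isDeclared(name);
    }
private:
    const ToyDTDGrammar& fGrammar;
};
```

Because the grammar is a separate object, the validator needs only read access to it and never has to scan anything itself.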



Migrating from XML4C 2.x to Xerces-C++ 1.4.0
 

The following section is a discussion of the technical differences between XML4C 2.x code base and the new Xerces-C++ 1.4.0 code base.

Summary of changes required to migrate from XML4C 2.x to Xerces-C++ 1.4.0
 

There are some major architectural changes between the 2.3.x and Xerces-C++ 1.4.0 releases of the parser, and as a result the code has undergone significant restructuring. The list below mentions the public APIs which existed in 2.3.x and no longer exist in Xerces-C++ 1.4.0. It also mentions the Xerces-C++ 1.4.0 API which will give you the same functionality. Note: This list is not exhaustive. The API docs (and ultimately the header files) supplement this information.

  • parsers/[Non]Validating[DOM/SAX]parser.hpp
    These files/classes have all been consolidated in the new version to just two files/classes: [DOM/SAX]Parser.hpp. Validation is now a property which may be set before invoking the parse. Now, the setDoValidation() method controls the validation processing.
  • The framework/XMLDocumentTypeHandler.hpp has been replaced with validators/DTD/DocTypeHandler.hpp.
  • The following methods now have different set of parameters because the underlying base class methods have changed in the 3.x release. These methods belong to one of XMLDocumentHandler, XMLErrorReporter or DocTypeHandler interfaces.
    • [Non]Validating[DOM/SAX]Parser::docComment
    • [Non]Validating[DOM/SAX]Parser::doctypePI
    • [Non]ValidatingSAXParser::elementDecl
    • [Non]ValidatingSAXParser::endAttList
    • [Non]ValidatingSAXParser::entityDecl
    • [Non]ValidatingSAXParser::notationDecl
    • [Non]ValidatingSAXParser::startAttList
    • [Non]ValidatingSAXParser::TextDecl
    • [Non]ValidatingSAXParser::docComment
    • [Non]ValidatingSAXParser::docPI
    • [Non]Validating[DOM/SAX]Parser::endElement
    • [Non]Validating[DOM/SAX]Parser::startElement
    • [Non]Validating[DOM/SAX]Parser::XMLDecl
    • [Non]Validating[DOM/SAX]Parser::error
  • The following methods/data members changed visibility from protected in 2.3.x to private (with public setters and getters, as appropriate).
    • [Non]ValidatingDOMParser::fDocument
    • [Non]ValidatingDOMParser::fCurrentParent
    • [Non]ValidatingDOMParser::fCurrentNode
    • [Non]ValidatingDOMParser::fNodeStack
  • The following files have moved, possibly requiring changes in the #include statements.
    • MemBufInputSource.hpp
    • StdInInputSource.hpp
    • URLInputSource.hpp
  • All the DTD validator code was moved from internal to separate validators/DTD directory.
  • The error code definitions which were earlier in internal/ErrorCodes.hpp are now split up into the following files:
    • framework/XMLErrorCodes.hpp - Core XML errors
    • framework/XMLValidityCodes.hpp - DTD validity errors
    • util/XMLExceptMsgs.hpp - C++ specific exception codes.

The Samples
 

The sample programs no longer use any of the unsupported util/xxx classes. They only existed to allow us to write portable samples. But, since we feel that the wide character APIs are supported on a lot of platforms these days, it was decided to go ahead and just write the samples in terms of these. If your system does not support these APIs, you will not be able to build and run the samples. On some platforms, these APIs might perhaps be optional packages or require runtime updates or some such action.

More samples have been added as well. These highlight some of the new functionality introduced in the new code base. And the existing ones have been cleaned up as well.

The new samples are:

  1. PParse - Demonstrates 'progressive parse' (see below)
  2. StdInParse - Demonstrates use of the standard in input source
  3. EnumVal - Shows how to enumerate the markup decls in a DTD Validator

Parser Classes
 

In the XML4C 2.x code base, there were the following parser classes (in the src/parsers/ source directory): NonValidatingSAXParser, ValidatingSAXParser, NonValidatingDOMParser, ValidatingDOMParser. The non-validating ones were the base classes and the validating ones just derived from them and turned on the validation. This was deemed a little bit overblown, considering the tiny amount of code required to turn on validation and the fact that it makes people use a pointer to the parser in most cases (if they needed to support either validating or non-validating versions.)

The new code base just has SAXParser and DOMParser classes. These are capable of handling both validating and non-validating modes, according to the state of a flag that you can set on them. For instance, here is a code snippet that shows this in action.

void ParseThis(const  XMLCh* const fileToParse,
               const bool validate)
{
  //
  // Create a SAXParser. It can now just be
  // created by value on the stack if we want
  // to parse something within this scope.
  //
  SAXParser myParser;

  // Tell it whether to validate or not
  myParser.setDoValidation(validate);

  // Parse and catch exceptions...
  try
  {
    myParser.parse(fileToParse);
  }
  catch (const XMLException&)
  {
    // ... report or handle the error here
  }
}

We feel that this is a simpler architecture, and that it makes things easier for you. In the above example, for instance, the parser will be cleaned up for you automatically upon exit since you don't have to allocate it anymore.


Moved Classes to src/framework
 

Some of the classes previously in the src/internal/ directory have been moved to their more correct location in the src/framework/ directory. These are classes used by the outside world and should have been framework classes to begin with. Also, to avoid name clashes in the absence of C++ namespace support, some of these classes have been renamed to make them more XML specific and less likely to clash. More classes might end up being moved to framework as well.

So you might have to change a few include statements to find these classes in their new locations. And you might have to rename some of the names of the classes, if you used any of the ones whose names were changed.


Util directory Reorganization
 

The src/util directory was becoming somewhat of a dumping ground of platform and compiler stuff. So we reworked that directory to better spread things out. The new scheme is:

util - The platform independent utility stuff
 
  • MsgLoaders - Holds the msg loader implementations
    1. ICU
    2. InMemory
    3. MsgCatalog
    4. Win32
  • Compilers - All the compiler specific files
  • Transcoders - Holds the transcoder implementations
    1. Iconv
    2. ICU
    3. Win32
  • Platforms
    1. AIX
    2. HP-UX
    3. Linux
    4. Solaris
    5. ....
    6. Win32

This organization makes things much easier to understand. And it makes it easier to find which files you need and which are optional. Note that only per-platform files have any hard coded references to specific message loaders or transcoders. So if you don't include the ICU implementations of these services, you don't need to link in ICU or use any ICU headers. The rest of the system works only in terms of the abstraction APIs.




Copyright © 2003 The Apache Software Foundation. All Rights Reserved.