Building Xerces-C++ with ICU

Xerces-C++ may be built in stand-alone mode using native encoding support, or with ICU, which adds support for over 180 different encodings and/or locale-specific message support. ICU stands for International Components for Unicode and is an open source distribution from IBM. You can get the ICU libraries from IBM's developerWorks site or go to the ICU download page directly.

Important: ICU and Xerces-C++ must be built with the same compiler, preferably with the same version. You cannot, for example, build ICU with a threaded version of the xlC compiler and build Xerces-C++ with a non-threaded one.

Building on Windows

There are two options to build Xerces-C++ with ICU on Windows. One is to use the MSDEV GUI environment, and the other is to invoke the compiler from the command line.

Using the GUI environment requires editing the project files. Here we describe only the second option, which uses the perl script ''.

Prerequisites:

  • Perl 5.004 or higher
  • Cygwin tools or MKS Toolkit
  • zip.exe

Extract the Xerces-C++ source files from the .zip archive using WinZip, for example into the root directory of an arbitrary drive x:. This should create a directory like 'x:\xerces-c-src2_5_0'.

Extract the ICU files, using WinZip, into the root directory of the disk where you have installed the Xerces-C++ sources. After extraction, there should be a new directory 'x:\icu' which contains all the ICU source files.

Start a command prompt to get a new shell window. Make sure you have perl, the cygwin tools (uname, rm, cp, ...), and zip.exe somewhere in the path. Next, set up the environment for MSVC using 'VCVARS32.BAT' or a similar file. Then enter the following at the prompt:

set XERCESCROOT=x:\xerces-c-src2_5_0
set ICUROOT=x:\icu
cd x:\xerces-c-src2_5_0\scripts

To build with ICU, either specify using ICU transcoding service,

perl -s x:\xerces-c-src2_5_0 -o x:\temp\xerces-c2_5_0-win32 -t icu

or specify using ICU message loader service

perl -s x:\xerces-c-src2_5_0 -o x:\temp\xerces-c2_5_0-win32 -m icu

(Match the source directory to your system; the target directory can be anything you want.)

If everything is set up correctly and the build succeeds, you should see a binary drop created in the target directory specified above. The script builds both ICU and Xerces-C++ and copies the files relevant to the binary drop to the target directory.

If the parser is built with the ICU message loader (as mentioned above) or the message catalog loader, you must set the environment variable XERCESC_NLS_HOME to point to the directory $XERCESCROOT/msg, where the message files reside.

For a description of options available, you can enter:


Building on UNIX

Extract Xerces-C++ source files into, say, the home directory ($HOME). It should create a directory like '$HOME/xerces-c-src2_5_0'.

Extract the ICU files into the same directory where you have installed Xerces-C++ sources. After extraction, there should be a new directory '$HOME/icu' which contains all the ICU source files.

Build ICU according to the build instructions in the ICU Readme. Then make its shared libraries, libicuuc* and libicudt*, available on your library search path.

Then build Xerces-C++ with ICU. This is similar to building a standalone Xerces-C++ library as instructed in "Building Xerces-C++ on UNIX platforms", except that you have to specify the transcoder option '-t icu' and/or the message loader option '-m icu'. For example:

runConfigure -plinux -cgcc -xg++ -minmem -nsocket -ticu -rpthread

Or instead of building the ICU and Xerces-C++ manually in two steps, you can use the bundled perl script '' which will build both of them in one step. For example:

export XERCESCROOT=$HOME/xerces-c-src2_5_0
export ICUROOT=$HOME/icu
cd $HOME/xerces-c-src2_5_0/scripts

To build with ICU, either specify using ICU transcoding service,

perl -s $HOME/xerces-c-src2_5_0 -o $HOME/temp/xerces-c2_5_0-aix -t icu

or specify using ICU message loader service

perl -s $HOME/xerces-c-src2_5_0 -o $HOME/temp/xerces-c2_5_0-aix -m icu

If the parser is built with the ICU message loader (as mentioned above) or the message catalog loader, you must set the environment variable XERCESC_NLS_HOME to point to the directory $XERCESCROOT/msg, where the message files reside.

Building Xerces-C++ using RPM on Linux

Xerces-C++ may be built from the distributed source archive directly on Linux using RPM. For example:

rpm -ta xerces-c-src2_5_0.tar.gz (rpm 4.0 and older)
rpmbuild -ta xerces-c-src2_5_0.tar.gz (rpm 4.1 and later; ships with RedHat 8)

The Xerces-C++ RPM specification can be found in

Please refer to the RPM-HOWTO for more RPM-related information.

Building Xerces-C++ COM Wrapper on Windows

To build the COM module for use with XML on Windows platforms, you must first set up your machine appropriately with necessary tools and software modules and then try to compile it. The end result is an additional library that you can use along with the standard Xerces-C++ for writing VB templates or for use with IE 5.0 using JavaScript.

Setting up your machine for COM

To build the COM project you will need to install the MS PlatformSDK. Some of the header files we use don't come with Visual C++ 6.0. You may download it from Microsoft's Website at or directly FTP it from

The installation is huge, but you don't need most of it. So you may do a custom install by just selecting "Build Environment" and choosing the required components. First select the top level Platform SDK. Then click the down arrow and make all of the components unavailable. Next open the "Build Environment" branch and select only the following items:

  • Win32 API
  • Component Services
  • Web Services - Internet Explorer

Important: When the installation is complete you need to update VC6's include path to include ..\platformsdk\include\atl30. You do this by choosing "Tools -> Options -> Directories". This path should be placed second after the normal PlatformSDK include. You change the order of the paths by clicking the up and down arrows.

Note: The order in which the directories appear on your path is important. Your first include path should be ..\platformsdk\include. The second one should be ..\platformsdk\include\atl30.

Building COM module for Xerces-C++

Once you have set up your machine, build the Xerces-C++ COM module by choosing the project named 'xml4com' inside the workspace. Then set your build mode to xml4com - Win32 Release MinDependency. Finally, build the project. This produces a DLL named xerces-com.dll, which must be present in your path (on the local machine) before you can use it.

Testing the COM module

There are some sample test programs in the test/COMTest directory which show examples of navigating and searching an XML tree using DOM. You need to browse the HTML files in this directory using IE 5.0. Make sure that your build has worked properly, especially the registration of the ActiveX controls that happens in the final step.

You may also want to check out the NIST DOM test suite at You will have to modify the documents in the NIST suite to load the Xerces COM object instead of the MSIE COM object.

Building User Documentation

The user documentation (the page you are reading right now) was generated using an XML application called StyleBook. This application uses Xerces-J and Xalan to create the HTML files from the XML source files. The XML source files for the documentation are part of the Xerces-C++ module and reside in the doc directory.

Pre-requisites for building the user documentation are:

  • JDK 1.2.2 (or later)
  • Xerces-J 1.0.1 (bundled)
  • Xalan-J 0.19.2 (bundled)
  • Stylebook 1.0-b2 (bundled)
  • The Apache Style files (dtd's and .xsl files) (bundled)

Open a command window and set up PATH to include the JDK 1.2.2 bin directory.

Next, cd to the Xerces-C++ source drop root directory, and enter

  • Under Windows:
  • Under Unix:
    sh createDocs.bat

This should generate the .html files in the 'doc/html' directory.

I wish to port Xerces to my favourite platform. Do you have any suggestions?

All platform dependent code in Xerces has been isolated to a couple of files, which should ease the porting effort. Please refer to Porting Guidelines for further details.

What should I define XMLCh to be?

XMLCh should be defined to be a type suitable for holding a utf-16 encoded (16 bit) value, usually an unsigned short.

All XML data is handled within Xerces-C++ as strings of XMLCh characters. Regardless of the size of the type chosen, the data stored in variables of type XMLCh will always be utf-16 encoded values.

Unlike XMLCh, the encoding of wchar_t is platform dependent. Sometimes it is utf-16 (AIX, Windows), sometimes ucs-4 (Solaris, Linux), sometimes it is not based on Unicode at all (HP/UX, AS/400, system 390).
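
The width difference described above can be checked at compile time. The following is a minimal sketch, assuming a C++11 compiler; the names Utf16Unit, bitsOf, wcharMatchesUtf16Unit, bytesAsUtf16 and bytesAsWchar are hypothetical helpers for illustration and are not part of Xerces-C++. It compares wchar_t with an unsigned short standing in for a 16 bit XMLCh.

```cpp
#include <climits>
#include <cstddef>

// Hypothetical stand-in for XMLCh (assumption: unsigned short is 16 bits wide here).
typedef unsigned short Utf16Unit;

// Number of bits occupied by a type on the build platform.
template <typename T>
constexpr std::size_t bitsOf() { return sizeof(T) * CHAR_BIT; }

// True where wchar_t has the same width as the 16 bit code unit,
// false where wchar_t is wider (for example a 32 bit wchar_t).
constexpr bool wcharMatchesUtf16Unit() {
    return bitsOf<wchar_t>() == bitsOf<Utf16Unit>();
}

// Bytes needed to store n code units in each representation.
constexpr std::size_t bytesAsUtf16(std::size_t n) { return n * sizeof(Utf16Unit); }
constexpr std::size_t bytesAsWchar(std::size_t n) { return n * sizeof(wchar_t); }
```

On a platform where wcharMatchesUtf16Unit() is false, bytesAsWchar(n) exceeds bytesAsUtf16(n) for the same n, and the two string types cannot be passed interchangeably.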

Some earlier releases of Xerces-C++ defined XMLCh to be the same type as wchar_t on most platforms, with the goal of making it possible to pass XMLCh strings to library or system functions that were expecting wchar_t parameters. This approach has been abandoned because of:

  • Portability problems with any code that assumes that the types of XMLCh and wchar_t are compatible
  • Excessive memory usage, especially in the DOM, on platforms with 32 bit wchar_t.
  • utf-16 encoded XMLCh is not always compatible with ucs-4 encoded wchar_t on Solaris and Linux. The problem occurs with Unicode characters whose values are greater than 0xFFFF: in ucs-4 the value is stored as a single 32 bit quantity, whereas in utf-16 it is stored as a "surrogate pair" of two 16 bit values. Even with XMLCh equated to wchar_t, Xerces-C++ will still create the utf-16 encoded surrogate pairs, which are illegal in ucs-4 encoded wchar_t strings.
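
The surrogate pair mismatch in the last item can be made concrete with the standard UTF-16 arithmetic. The following is a minimal sketch, assuming a C++11 compiler; Utf16Unit and the function names are hypothetical and do not come from the Xerces-C++ API. It shows how one 32 bit ucs-4 value maps to two 16 bit code units and back.

```cpp
#include <cstdint>

// Hypothetical stand-in for a 16 bit XMLCh code unit.
typedef unsigned short Utf16Unit;

// True when a code point lies above 0xFFFF and therefore needs two utf-16 code units.
constexpr bool needsSurrogatePair(std::uint32_t codePoint) {
    return codePoint > 0xFFFFu && codePoint <= 0x10FFFFu;
}

// Leading (high) surrogate for a code point above 0xFFFF.
constexpr Utf16Unit highSurrogate(std::uint32_t codePoint) {
    return static_cast<Utf16Unit>(0xD800u + ((codePoint - 0x10000u) >> 10));
}

// Trailing (low) surrogate for a code point above 0xFFFF.
constexpr Utf16Unit lowSurrogate(std::uint32_t codePoint) {
    return static_cast<Utf16Unit>(0xDC00u + ((codePoint - 0x10000u) & 0x3FFu));
}

// Recombine a surrogate pair into the single 32 bit value that ucs-4 stores.
constexpr std::uint32_t combineSurrogates(Utf16Unit high, Utf16Unit low) {
    return 0x10000u
         + ((static_cast<std::uint32_t>(high) - 0xD800u) << 10)
         + (static_cast<std::uint32_t>(low) - 0xDC00u);
}
```

For example, the code point 0x1F600 is the single value 0x0001F600 in ucs-4, but highSurrogate and lowSurrogate yield the pair 0xD83D, 0xDE00 in utf-16. Code that copies those two units unchanged into a 32 bit wchar_t string produces two invalid characters instead of one valid one, so a conversion step such as combineSurrogates is needed.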

Where can I look for more help?

If you have read this page, followed the instructions, and still cannot resolve your problem(s), there is more help. You can find out if others have solved this same problem before you, by checking the Apache XML mailing list archives at and the Bugzilla Apache bug database.

Copyright © 2003 The Apache Software Foundation. All Rights Reserved.