The XML parser checks whether an XML document is well-formed and optionally validates it against a DTD. It either constructs an object tree that can be accessed through a DOM interface, or operates serially through a SAX interface.
Be sure to read the licensing agreement before using this product.
You may post questions, comments, or bug reports to the XML Forum on the Oracle Technology Network. Oracle customers may also call Oracle Worldwide Support for assistance.
The parser conforms to the following standards:
The following files and directories are found in this release:
license.html | Licensing agreement |
readme.html | This file |
bin/ | Standalone parser "xml" |
demo/ | Example usage of the XML parser |
doc/ | API documentation |
include/ | Header files |
lib/ | XML Parser/XSL Processor & support libraries |
mesg/ | Error message files |
libxml8.a | XML Parser/XSL Processor |
libcore8.a | CORE functions |
libnls8.a | National Language Support |
The parser may be called as an executable by invoking bin/xml which has the following options:
-c | Conformance check only, no validation |
-e encoding | Specify input file encoding |
-h | Help - show this usage help |
-n | Number - DOM traverse and report number of elements |
-p | Print document and DTD structures after parse |
-x | Exercise SAX interface and print document |
-v | Version - display parser version then exit |
-w | Whitespace - preserve all whitespace |
The parser may also be invoked by writing code to use the supplied APIs.
The code must be compiled using the headers in the include/ subdirectory and linked against the libraries in the lib/ subdirectory. Please see the Makefile in the demo/ subdirectory for full details of how to build your program.
Error message files are provided in the mesg/ subdirectory.
The parser currently supports the following encodings: UTF-8, UTF-16, US-ASCII, ISO-10646-UCS-2, ISO-8859-1, ISO-8859-2, ISO-8859-3, ISO-8859-4, ISO-8859-5, ISO-8859-6, ISO-8859-7, ISO-8859-8, ISO-8859-9, EUC-JP, SHIFT_JIS, BIG5, GB2312, KOI8-R, EBCDIC-CP-US, EBCDIC-CP-CA, EBCDIC-CP-NL, EBCDIC-CP-WT, EBCDIC-CP-DK, EBCDIC-CP-NO, EBCDIC-CP-FI, EBCDIC-CP-SE, EBCDIC-CP-IT, EBCDIC-CP-ES, EBCDIC-CP-GB, EBCDIC-CP-FR, EBCDIC-CP-HE, EBCDIC-CP-BE, EBCDIC-CP-CH, EBCDIC-CP-ROECE, EBCDIC-CP-YU, and EBCDIC-CP-IS. In addition, any character set specified in Appendix A, Character Sets, of the Oracle National Language Support Guide may be used.
In order to be able to use these encodings, you must have the ORACLE_HOME environment variable set and pointing to the location of your Oracle installation. In addition, the environment variables ORA_NLS, ORA_NLS32, and ORA_NLS33 must be set to point to the location of the NLS data files. On Unix systems, this is usually $ORACLE_HOME/ocommon/nls/admin/data. On Windows NT, this is usually $ORACLE_HOME/nlsrtl/admin/nlsdata.
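A program can check these variables before it starts parsing. The following is a minimal sketch, assuming only the four variable names listed above; the helper `missingNlsVariables` is hypothetical and is not part of the parser API.

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Returns the names of the required environment variables that are unset
// or empty. The variable names come from this readme; the helper itself
// is an illustrative assumption, not a function supplied by the parser.
std::vector<std::string> missingNlsVariables() {
    static const char* const kRequired[] = {
        "ORACLE_HOME", "ORA_NLS", "ORA_NLS32", "ORA_NLS33"
    };
    std::vector<std::string> missing;
    for (const char* name : kRequired) {
        const char* value = std::getenv(name);
        if (value == nullptr || *value == '\0') {
            missing.push_back(name);
        }
    }
    return missing;
}
```

If the returned list is not empty, the caller could report the missing names and exit before invoking the parser with a non-default encoding.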
The default encoding is UTF-8. If you use only single-byte character sets (such as US-ASCII or any of the ISO-8859 character sets), set the default encoding explicitly: parsing single-byte input can be up to twice as fast as parsing multibyte character sets such as UTF-8.
The version of the C++ compiler used on Windows NT is Visual C++ 11.00.7022. The version of the C++ compiler used on Sun Solaris is Sparcworks for C++ 4.1. The Sparcworks 5.0 compiler has a -compat=4 option which you can use to compile your C++ code and link against libxml8.a.
The following features of the XSLT recommendation are not currently supported but may be available in future releases: extension elements, extension functions, xsl:fallback, xsl:output, and xsl:apply-imports. In addition, the following XSLT-specific additions to the core XPath library are not supported: document(), element-available(), and function-available().
July 17, 2000
This is the first production V2 release. The changes in this release were mainly bug fixes.
For the XML parser, the following bugs were fixed:
1352943 | XMLPARSE() SOMETIMES CHOKES ON FILENAMES |
1302311 | PROBLEM WITH PARAMETER ENTITY PROCESSING |
1323674 | INCONSISTENT ERROR HANDLING IN THE C XML PARSER |
1328871 | LPXPRINTBUFFER UNCONDITIONALLY PREPENDS XML COMMENT TO OUTPUT |
1349962 | USING FREED MEMORY LOCATION CAUSES TLPXVNSA31.DIF |
oraxmldom.h was renamed to oradom.h |
For the XSLT processor, the following bugs were fixed:
1225546 | USELESS ERROR MESSAGE NEEDS DETAIL |
1267616 | TLPXST14.DIF: REPLACE DBL_MAX WITH SBIG_ORAMAXVAL IN LPXXP.C:LPXXPSUBSTRING() |
1289228 | ERROR CONTEXT REQUIRED FOR DEBUGGING: FILE NAME, LINE#, FUNCTION, ETC |
1289214 | XSL:CHOOSE DOESN'T WORK |
1298028 | XPATH CONSTRUCT NOT(POSITION()=LAST()) NOT WORKING |
1298193 | XPATH FUNCTIONS DON'T PROVIDE IMPLICIT TYPE CONVERSION OF PARAMS |
1323665 | C XML PARSER CANNOT SET BASE DIRECTORY OR URI FOR STYLESHEET PARSING |
1325452 | SEVERE MEMORY CONSUMPTION / LEAK IN XSLPROCESS |
1333693 | CHAINED TRANSFORMS WITH C XSL PROCESSOR DON'T WORK: LPX-00002 |
April 4, 2000
SAX memory usage: SAX memory usage is now much smaller, and flat for any input size and multiple parses (memory leaks plugged).
XSLT memory usage: XSLT memory usage is improved.
Validation warnings: Validity Constraint (VC) errors have been changed to warnings and do not terminate parsing. For compatibility with the old behavior (halt on warnings as well as errors), a new flag XML_FLAG_STOP_ON_WARNING (or '-W' to the xml program) has been added.
Performance improvements: Switching to finite-automata validation of VC structure yields a 10% performance gain.
HTTP support: HTTP URIs are now supported; look for FTP in the next release. For other access methods, the user may define their own callbacks with the new xmlaccess() API.
Print method: The Node::print method has been changed to take an ostream instead of an fstream.
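The practical effect of accepting an ostream is that the caller can print to any output stream, not only to a file. The sketch below illustrates this with a hypothetical `PrintableNode` type; it is not the parser's real Node class, and its output format (no escaping, a single text child) is an assumption made for the example.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for a DOM node; NOT the parser's Node class.
struct PrintableNode {
    std::string name;
    std::string text;

    // Taking std::ostream& lets the caller pass std::cout, an
    // std::ofstream, or an in-memory std::ostringstream.
    void print(std::ostream& out) const {
        out << '<' << name << '>' << text << "</" << name << '>';
    }
};

// Renders a node into a string by printing to an in-memory stream,
// which a signature restricted to fstream would not allow.
std::string toString(const PrintableNode& node) {
    std::ostringstream buffer;
    node.print(buffer);
    return buffer.str();
}
```

Code that previously passed an fstream keeps working with this kind of signature, because fstream derives from ostream.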
March 17, 2000
XSLT improvements: Various bugs fixed in the XSLT processor; error messages are improved; xsl:number, xsl:sort, xsl:namespace-alias, xsl:decimal-format, forwards-compatible processing with xsl:version, and literal result element as stylesheet are now available; the following XSLT-specific additions to the core XPath library are now available: current(), format-number(), generate-id(), and system-property().
Bug fixes: Some problems with validation and matching of start and end tags with SAX were fixed (1227096). Also, a bug with parameter entity processing in external entities was fixed (1225219).
February 10, 2000
Performance improvements: This version of the parser is a major performance improvement over the last, about two and a half times faster for UTF-8 parsing and about four times faster for ASCII parsing. The table below compares timings against the previous version for parsing (DOM) and validating various standalone files (SPARC Ultra 1 CPU time):
File size | Old UTF-8 | New UTF-8 | Speedup | Old ASCII | New ASCII | Speedup |
---|---|---|---|---|---|---|
42K | 180ms | 70ms | 2.6 | 120ms | 40ms | 3.0 |
134K | 510ms | 210ms | 2.4 | 450ms | 100ms | 4.5 |
247K | 980ms | 400ms | 2.5 | 690ms | 180ms | 3.8 |
1M | 2860ms | 1130ms | 2.5 | 1820ms | 380ms | 4.8 |
2.7M | 10550ms | 4100ms | 2.6 | 7450ms | 1930ms | 3.9 |
10.5M | 42250ms | 16400ms | 2.6 | 29900ms | 7800ms | 3.8 |
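The Speedup columns are consistent with dividing the old time by the new time and rounding to one decimal place. A minimal sketch of that arithmetic follows; the function names are illustrative and are not part of the parser.

```cpp
#include <cassert>
#include <cmath>

// Speedup is the old CPU time divided by the new CPU time.
double speedup(double oldMs, double newMs) {
    assert(newMs > 0.0);  // a zero or negative time would be meaningless
    return oldMs / newMs;
}

// Rounds to one decimal place, matching how the table shows speedups.
double roundToTenth(double value) {
    return std::round(value * 10.0) / 10.0;
}
```

For the 42K row, for example, `speedup(180, 70)` is roughly 2.57, which the table reports as 2.6.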
Conformance improvements: Stricter conformance to the XML 1.0 specification yields higher scores on standard test suites (James Clark, OASIS, etc.).
Lists, not arrays: Internal parser data structures are now uniformly lists; arrays have been dropped. Therefore, access is now better suited to a firstChild/nextSibling style loop instead of numChildNodes/getChildNode.
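The difference between the two access styles can be sketched with a hypothetical linked node type. `LinkedNode`, `childNames`, and `childAt` below are illustrative only and are not the parser's API; the sketch assumes that index-style access on a list has to walk from the first child each time.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical linked node; NOT the parser's Node class. It mirrors only
// the firstChild/nextSibling shape named in the text.
struct LinkedNode {
    std::string name;
    LinkedNode* firstChild = nullptr;
    LinkedNode* nextSibling = nullptr;
};

// Sibling-style loop: each child is visited once, so the whole
// traversal is linear in the number of children.
std::vector<std::string> childNames(const LinkedNode& parent) {
    std::vector<std::string> names;
    for (const LinkedNode* child = parent.firstChild; child != nullptr;
         child = child->nextSibling) {
        names.push_back(child->name);
    }
    return names;
}

// Index-style access on a list walks from the head on every call, so a
// loop over all indexes built on this helper does quadratic work.
// Returns nullptr when the index is out of range.
const LinkedNode* childAt(const LinkedNode& parent, std::size_t index) {
    const LinkedNode* child = parent.firstChild;
    while (child != nullptr && index > 0) {
        child = child->nextSibling;
        --index;
    }
    return child;
}
```

Under that assumption, the sibling loop is the natural fit for list-backed children, while repeated indexed lookups become more expensive as the child count grows.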
DTD parsing: A new API call, xmlparsedtd(), has been added. It parses an external DTD directly, without needing an enclosing document, and is used mainly by the Class Generator.
Error reporting: Error messages are improved and more specific, with nearly twice as many as before. Error location is now described by a stack of line number/entity pairs, showing the final location of the error and intermediate inclusions (e.g. line X of file, line Y of entity). NOTE: You must use the new error message file (lpxus.msb) provided with this release; the error message file provided with earlier releases is incompatible. See below.
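The stack of line number/entity pairs described under Error reporting can be pictured with a small data structure. This is a sketch only: `LocationFrame`, `describeLocation`, and the output wording are hypothetical, and the parser's actual message format may differ.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// One frame of an error location: a line number inside a named file or
// entity. Hypothetical type, not the parser's internal representation.
struct LocationFrame {
    int line;
    std::string entity;
};

// Formats the frames starting with the final location of the error and
// continuing outward through each intermediate inclusion.
std::string describeLocation(const std::vector<LocationFrame>& stack) {
    std::ostringstream out;
    for (std::size_t i = 0; i < stack.size(); ++i) {
        if (i > 0) {
            out << ", included from ";
        }
        out << "line " << stack[i].line << " of " << stack[i].entity;
    }
    return out.str();
}
```

Keeping the innermost frame first puts the line where the error occurred at the front of the message, with the chain of inclusions following it.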
XSL improvements: Various bugs fixed in the XSLT processor; xsl:call-template is now fully supported.
November 24, 1999
The Oracle XML v2 parser is a beta release and is written in C, with a C++ wrapper. The main difference from the Oracle XML v1 parser is the ability to format the XML document according to a stylesheet via an integrated XSLT processor.