4 XML Catalog API
Use the XML Catalog API to implement a local XML catalog.
Java SE 9 introduces a new XML Catalog API to support the Organization for the Advancement of Structured Information Standards (OASIS) XML Catalogs, OASIS Standard V1.1. This chapter of the Oracle JDK 9 Core Libraries Guide describes the API, its support by the Java XML processors, and usage patterns.
The XML Catalog API is a straightforward API for implementing a local catalog, and the support by the JDK XML processors makes it easier to configure your processors or the entire environment to take advantage of the feature.
Learning More About Creating Catalogs
To learn about creating catalogs, see the Catalog Standard. The XML catalogs under the directory /etc/xml/catalog on some Linux distributions can also be a good reference for creating a local catalog.
Purpose of XML Catalog API
The XML Catalog API and the Java XML processors provide an option for developers and system administrators to better manage external resources.
The XML Catalog API provides an implementation of OASIS XML Catalogs v1.1, a standard designed to address issues caused by external resources.
Problems Caused by External Resources
XML, XSD, and XSL documents may contain references to external resources that the Java XML processors need to retrieve in order to process the documents. These external resources can cause problems for applications or the system in the following areas:
- Availability. When the resources are remote, the XML processors must be able to connect to the remote server. Even though connectivity is rarely an issue, it’s still a factor in the stability of an application. Too many connections can be a hazard to servers that hold the resources (such as the well-documented case involving excessive DTD traffic directed to the W3C’s servers), and this in turn could affect your applications. See Use Catalog with XML Processors for an example that solves this issue using the XML Catalog API.
- Performance. Although in most cases connectivity isn’t an issue, a remote fetch can still cause a performance issue for an application. Furthermore, there may be multiple applications on the same system attempting to resolve the same source, and this would be a waste of system resources.
- Security. Allowing remote connections can pose a security risk if the application processes untrusted XML sources.
- Manageability. If a system processes a large number of XML documents, then externally referenced documents, whether local or remote, can become a maintenance hassle.
How XML Catalog API Addresses Problems Caused by External Resources
The XML Catalog API and the Java XML processors provide an option for developers and system administrators to better manage the external resources.
- Application developers – You can create a local catalog of all external references for your application, and let the Catalog API resolve them for the application. This not only avoids remote connections but also makes it easier to manage these resources.
- System administrators – You can establish a local catalog for your system and configure the Java VM to point to the catalog. Then, all of your applications on the system may share the same catalog without any code changes to the applications, assuming they’re compatible with Java SE 9. To establish a catalog, you may take advantage of existing catalogs such as those included with some Linux distributions.
XML Catalog API Interfaces
Access the XML Catalog API through its interfaces.
XML Catalog API Interfaces
The XML Catalog API defines the following interfaces and classes:
- The Catalog interface represents an entity catalog as defined by XML Catalogs, OASIS Standard V1.1, 7 October 2005. A Catalog object is immutable. After it’s created, the Catalog object can be used to find matches in a system, public, or uri entry. A custom resolver implementation may find it useful to locate local resources through a catalog.
- The CatalogFeatures class holds all of the features and properties the Catalog API supports, including javax.xml.catalog.files, javax.xml.catalog.defer, javax.xml.catalog.prefer, and javax.xml.catalog.resolve.
- The CatalogManager class manages the creation of XML catalogs and catalog resolvers.
- The CatalogResolver interface is a catalog resolver that implements SAX EntityResolver, StAX XMLResolver, DOM LS LSResourceResolver (used by schema validation), and transform URIResolver. This interface resolves external references using catalogs.
Details on the CatalogFeatures Class
The catalog features are collectively defined in the CatalogFeatures class. The features are defined at the API and system levels, which means that they can be set through the API, system properties, and JAXP properties. To set a feature through the API, use the CatalogFeatures class.
The following code sets javax.xml.catalog.resolve to "continue" so that the process continues even if no match is found by the CatalogResolver:
CatalogFeatures f = CatalogFeatures.builder().with(Feature.RESOLVE, "continue").build();
To set this "continue" functionality system-wide, use the Java command line or the System.setProperty method:
System.setProperty(Feature.RESOLVE.getPropertyName(), "continue");
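To set this "continue" functionality for the whole JVM instance, enter a line in the jaxp.properties file. The value is written without quotation marks, because a properties file treats quotation marks as part of the value:
javax.xml.catalog.resolve = continue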
The resolve property, as well as the prefer and defer properties, can be set as an attribute of the catalog or group entry in a catalog file. For example, in the following catalog, the resolve attribute is set to "continue" on the catalog entry, which instructs the processor to continue when no match is found through this catalog. The attribute can also be set on the group entry, as follows:
<?xml version="1.0" encoding="UTF-8"?>
<catalog
    xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"
    resolve="continue"
    xml:base="http://local/base/dtd/">
  <group resolve="continue">
    <system
        systemId="http://remote/dtd/alice/docAlice.dtd"
        uri="http://local/dtd/docAliceSys.dtd"/>
  </group>
</catalog>
Properties set in a narrower scope override those that are set in a wider one. Therefore, a property set through the API always takes precedence.
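As a minimal sketch of the API-level setting described above, the following class builds a CatalogFeatures object with two properties and prints the effective value of every feature. The class name CatalogFeaturesDemo and the choice of PREFER value are assumptions made for illustration only.

```java
import javax.xml.catalog.CatalogFeatures;
import javax.xml.catalog.CatalogFeatures.Feature;

class CatalogFeaturesDemo {
    // Builds a CatalogFeatures object with RESOLVE and PREFER set through the API.
    static CatalogFeatures build() {
        return CatalogFeatures.builder()
                .with(Feature.RESOLVE, "continue")
                .with(Feature.PREFER, "system")
                .build();
    }

    public static void main(String[] args) {
        CatalogFeatures features = build();
        // Print the property name and effective value of each supported feature.
        for (Feature feature : Feature.values()) {
            System.out.println(feature.getPropertyName() + " = " + features.get(feature));
        }
    }
}
```

Values passed to the builder are expected to win over the same properties set as system properties or in jaxp.properties, in line with the scoping rule stated above.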
Using the XML Catalog API
Resolve DTD, entity, and alternate URI references in XML source documents using the various entry types of the XML Catalog standard.
The XML Catalog Standard defines a number of entry types. Among them, the system entries, including system, rewriteSystem, and systemSuffix entries, are used for resolving DTD and entity references in XML source documents, while uri entries are for alternate URI references.
System Reference
Use a CatalogResolver object to locate a local resource.
Locating a Local Resource
The following example demonstrates how to use a CatalogResolver object to locate a local resource using a system entry, given an XML file that contains a reference to the example.dtd file:
<?xml version="1.0"?>
<!DOCTYPE catalogtest PUBLIC "-//OPENJDK//XML CATALOG DTD//1.0"
"http://openjdk.java.net/xml/catalog/dtd/example.dtd">
<catalogtest>
Test &example; entry
</catalogtest>
The example.dtd file defines an entity "example":
<!ENTITY example "system">
The URI to the example.dtd in the XML doesn't need to exist. The purpose is to provide a unique identifier for the CatalogResolver object to locate a local resource. To do this, create a catalog entry file called catalog.xml with a system entry to refer to the local resource:
<?xml version="1.0" encoding="UTF-8"?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <system
      systemId="http://openjdk.java.net/xml/catalog/dtd/example.dtd"
      uri="example.dtd"/>
</catalog>
With this catalog and the system entry, all you need to do is get a default CatalogFeatures object, and set the URI to the catalog file to create a CatalogResolver object:
CatalogResolver cr =
CatalogManager.catalogResolver(CatalogFeatures.defaults(), catalogUri);
catalogUri must be a valid URI. For example:
URI.create("file:///users/auser/catalog/catalog.xml")
The CatalogResolver object can now be used as a JDK XML resolver. In the following example, it’s used as a SAX EntityResolver:
SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setNamespaceAware(true);
XMLReader reader = factory.newSAXParser().getXMLReader();
reader.setEntityResolver(cr);
Notice that in the example the system identifier is given as an absolute URI. That makes it easy for the resolver to find the match with exactly the same systemId in the catalog's system entry.
If the system identifier in the XML is relative, then it may complicate the matching process because the XML processor may have made it absolute with a specified base URI or the source file's URI. In that situation, the systemId of the system entry would need to match the anticipated absolute URI. An easier solution is to use the systemSuffix entry, for example:
<systemSuffix systemIdSuffix="example.dtd" uri="example.dtd"/>
The systemSuffix entry matches any reference that ends with example.dtd in an XML source and resolves it to a local example.dtd file as specified in the uri attribute. You may add more to the systemIdSuffix to ensure that it’s unique or the correct reference. For example, you may set the systemIdSuffix to xml/catalog/dtd/example.dtd, or rename the DTD in both the XML source file and the systemSuffix entry to make it a unique match, for example my_example.dtd.
The URI of the system entry can be absolute or relative. If the external resources have a fixed location, then an absolute URI is more likely to guarantee uniqueness. If the external resources are placed relative to your application or the catalog entry file, then a relative URI may be more effective, allowing the deployment of your application without knowing where it’s installed. Such a relative URI is then resolved using the base URI, or the catalog file’s URI if the base URI isn’t specified. In the previous example, example.dtd is assumed to have been placed in the same directory as the catalog file.
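The steps of this section can be combined into one self-contained sketch. It writes catalog.xml and example.dtd into a temporary directory (an assumption made so that the example runs anywhere), creates a CatalogResolver object, and parses the sample document with SAX. The class name SystemEntryDemo is hypothetical.

```java
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.catalog.CatalogFeatures;
import javax.xml.catalog.CatalogManager;
import javax.xml.catalog.CatalogResolver;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class SystemEntryDemo {
    static final String SYSTEM_ID =
            "http://openjdk.java.net/xml/catalog/dtd/example.dtd";

    // Parses the sample document and returns its text with &example; expanded.
    static String parseWithCatalog() throws Exception {
        // Temporary directory holding the catalog and the local DTD (illustrative only).
        Path dir = Files.createTempDirectory("catalog-demo");
        Files.write(dir.resolve("example.dtd"),
                "<!ENTITY example \"system\">".getBytes(StandardCharsets.UTF_8));
        Path catalogFile = dir.resolve("catalog.xml");
        Files.write(catalogFile, (
                "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<catalog xmlns=\"urn:oasis:names:tc:entity:xmlns:xml:catalog\">\n"
                + "  <system systemId=\"" + SYSTEM_ID + "\" uri=\"example.dtd\"/>\n"
                + "</catalog>\n").getBytes(StandardCharsets.UTF_8));

        // The relative uri attribute is resolved against the catalog file's URI.
        CatalogResolver cr = CatalogManager.catalogResolver(
                CatalogFeatures.defaults(), catalogFile.toUri());

        String xml = "<?xml version=\"1.0\"?>\n"
                + "<!DOCTYPE catalogtest PUBLIC \"-//OPENJDK//XML CATALOG DTD//1.0\"\n"
                + "  \"" + SYSTEM_ID + "\">\n"
                + "<catalogtest>Test &example; entry</catalogtest>";

        StringBuilder text = new StringBuilder();
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        reader.setEntityResolver(cr);
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parseWithCatalog());
    }
}
```

Because the remote DTD URI is never fetched, the entity value seen by the parser comes from the local example.dtd file.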
Public Reference
Use a public entry instead of a system entry to find a desired resource.
If no system entry matches the desired resource, and the PREFER property is specified to match public, then a public entry can do the same as a system entry. Note that public is the default setting for the PREFER property.
Using a Public Entry
When the DTD reference in the parsed XML file contains a public identifier such as "-//OPENJDK//XML CATALOG DTD//1.0", a public entry can be written as follows in the catalog entry file:
<public publicId="-//OPENJDK//XML CATALOG DTD//1.0" uri="example.dtd"/>
When you create and use a CatalogResolver object with this entry file, the example.dtd reference resolves through the publicId attribute. See System Reference for an example of creating a CatalogResolver object.
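To check a public entry without running a parser, the Catalog interface can be queried directly. The following sketch assumes a catalog file written to a temporary directory and a hypothetical class name PublicEntryDemo; it returns the URI that the public identifier maps to.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.catalog.Catalog;
import javax.xml.catalog.CatalogFeatures;
import javax.xml.catalog.CatalogManager;

class PublicEntryDemo {
    static final String PUBLIC_ID = "-//OPENJDK//XML CATALOG DTD//1.0";

    // Returns the URI string that the catalog maps the public identifier to.
    static String matchPublic() throws Exception {
        // Temporary directory standing in for the catalog deployment (illustrative only).
        Path dir = Files.createTempDirectory("catalog-demo");
        Files.write(dir.resolve("example.dtd"),
                "<!ENTITY example \"system\">".getBytes(StandardCharsets.UTF_8));
        Path catalogFile = dir.resolve("catalog.xml");
        Files.write(catalogFile, (
                "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<catalog xmlns=\"urn:oasis:names:tc:entity:xmlns:xml:catalog\">\n"
                + "  <public publicId=\"" + PUBLIC_ID + "\" uri=\"example.dtd\"/>\n"
                + "</catalog>\n").getBytes(StandardCharsets.UTF_8));

        // PREFER defaults to public, so the public entry is eligible for matching.
        Catalog catalog = CatalogManager.catalog(
                CatalogFeatures.defaults(), catalogFile.toUri());
        return catalog.matchPublic(PUBLIC_ID);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(matchPublic());
    }
}
```

The returned string is the uri attribute resolved against the catalog file's location, so it points at the example.dtd file next to the catalog.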
URI Reference
Use a uri entry to find a desired resource.
The URI type entries, including uri, rewriteURI, and uriSuffix, can be used in a similar way as the system type entries.
Using URI Entries
While the XML Catalog Standard gives a preference to the system type entries for resolving DTD references, and uri type entries for everything else, the Java XML Catalog API doesn’t make that distinction. This is because the specifications for the existing Java XML Resolvers, such as XMLResolver and LSResourceResolver, don’t give a preference. As a result, the uri type entries can be used in the same way as the system type entries. The uri elements are defined to associate an alternate URI reference with a URI reference. In the case of a system reference, this is the systemId attribute.
You may therefore replace the system entry with a uri entry in the following example, although system entries are more generally used for DTD references.
<system
systemId="http://openjdk.java.net/xml/catalog/dtd/example.dtd"
uri="example.dtd"/>
A uri entry would look like the following:
<uri name="http://openjdk.java.net/xml/catalog/dtd/example.dtd" uri="example.dtd"/>
While system entries are frequently used for DTDs, uri entries are preferred for URI references such as XSD and XSL import and include. The next example uses a uri entry to resolve an XSL import.
As described in XML Catalog API Interfaces, the XML Catalog API defines the CatalogResolver interface that extends Java XML Resolvers including EntityResolver, XMLResolver, URIResolver, and LSResourceResolver. Therefore, a CatalogResolver object can be used by SAX, DOM, StAX, Schema Validation, as well as XSLT Transform. The following code creates a CatalogResolver object with default feature settings:
CatalogResolver cr =
CatalogManager.catalogResolver(CatalogFeatures.defaults(), catalogUri);
The code then registers this CatalogResolver object on a TransformerFactory object where a URIResolver object is expected:
TransformerFactory factory = TransformerFactory.newInstance();
factory.setURIResolver(cr);
Alternatively, the code can register the CatalogResolver object on the Transformer object:
Transformer transformer = factory.newTransformer(xslSource);
transformer.setURIResolver(cr);
Assume that the XSL source file contains an import element to import the xslImport.xsl file into the XSL source:
<xsl:import href="pathto/xslImport.xsl"/>
To resolve the import reference to where the import file is actually located, a CatalogResolver object should be set on the TransformerFactory object before creating the Transformer object, and a uri entry such as the following must be added to the catalog entry file:
<uri name="pathto/xslImport.xsl" uri="xslImport.xsl"/>
The discussion about absolute or relative URIs and the use of systemSuffix or uriSuffix entries with the system reference applies to the uri entries as well.
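The XSL import case can be sketched end to end as follows. The stylesheet contents, the template name greet, the temporary directory, and the class name UriEntryDemo are all assumptions made for illustration; only the href value and the uri entry follow the example in this section.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.catalog.CatalogFeatures;
import javax.xml.catalog.CatalogManager;
import javax.xml.catalog.CatalogResolver;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class UriEntryDemo {
    static final String XSL_NS = "http://www.w3.org/1999/XSL/Transform";

    // Runs a transform whose stylesheet imports pathto/xslImport.xsl through the catalog.
    static String transform() throws Exception {
        // Temporary directory holding the catalog and the imported stylesheet (illustrative only).
        Path dir = Files.createTempDirectory("catalog-demo");
        Files.write(dir.resolve("xslImport.xsl"), (
                "<xsl:stylesheet version='1.0' xmlns:xsl='" + XSL_NS + "'>"
                + "<xsl:template name='greet'>imported</xsl:template>"
                + "</xsl:stylesheet>").getBytes(StandardCharsets.UTF_8));
        Path catalogFile = dir.resolve("catalog.xml");
        Files.write(catalogFile, (
                "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<catalog xmlns=\"urn:oasis:names:tc:entity:xmlns:xml:catalog\">\n"
                + "  <uri name=\"pathto/xslImport.xsl\" uri=\"xslImport.xsl\"/>\n"
                + "</catalog>\n").getBytes(StandardCharsets.UTF_8));

        CatalogResolver cr = CatalogManager.catalogResolver(
                CatalogFeatures.defaults(), catalogFile.toUri());

        String xsl = "<xsl:stylesheet version='1.0' xmlns:xsl='" + XSL_NS + "'>"
                + "<xsl:import href='pathto/xslImport.xsl'/>"
                + "<xsl:output method='text'/>"
                + "<xsl:template match='/'><xsl:call-template name='greet'/></xsl:template>"
                + "</xsl:stylesheet>";

        // Register the resolver on the factory before the Transformer is created,
        // because the import is resolved while the stylesheet is compiled.
        TransformerFactory factory = TransformerFactory.newInstance();
        factory.setURIResolver(cr);
        Transformer transformer =
                factory.newTransformer(new StreamSource(new StringReader(xsl)));

        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader("<doc/>")),
                new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform());
    }
}
```

The named template lives only in the imported file, so the transform can produce its text only if the import was resolved through the uri entry.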
Java XML Processors Support
Use the XML Catalogs features with the standard Java XML processors.
The XML Catalogs features are supported throughout the Java XML processors, including SAX and DOM (javax.xml.parsers), StAX parsers (javax.xml.stream), schema validation (javax.xml.validation), and XML transformation (javax.xml.transform).
This means that you don’t need to create a CatalogResolver object outside an XML processor. Catalog files can be registered directly to the Java XML processor, or specified through system properties, or in the jaxp.properties file. The XML processors perform the mappings through the catalogs automatically.
Enable Catalog Support
To enable the support for the XML Catalogs feature on a processor, the USE_CATALOG feature must be set to true, and at least one catalog entry file must be specified.
USE_CATALOG
A Java XML processor determines whether the XML Catalogs feature is supported based on the value of the USE_CATALOG feature. By default, USE_CATALOG is set to true for all JDK XML Processors. The Java XML processor further checks for the availability of a catalog file, and attempts to use the XML Catalog API only when the USE_CATALOG feature is true and a catalog is available.
The USE_CATALOG feature is supported by the XML Catalog API, the system property, and the jaxp.properties file. For example, if USE_CATALOG is set to true and it’s desirable to disable the catalog support for a particular processor, then this can be done by setting the USE_CATALOG feature to false through the processor's setFeature method. The following code sets the USE_CATALOG feature to the specified value useCatalog for an XMLReader object:
SAXParserFactory spf = SAXParserFactory.newInstance();
spf.setNamespaceAware(true);
XMLReader reader = spf.newSAXParser().getXMLReader();
if (setUseCatalog) {
    reader.setFeature(XMLConstants.USE_CATALOG, useCatalog);
}
On the other hand, if the entire environment must have the catalog turned off, then this can be done by configuring the jaxp.properties file with a line:
javax.xml.useCatalog = false
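The per-processor switch can be observed with a short sketch that reads the USE_CATALOG feature from an XMLReader object before and after turning it off. The class name UseCatalogDemo is hypothetical, and the sketch assumes that neither the javax.xml.useCatalog system property nor a jaxp.properties file overrides the default.

```java
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;

class UseCatalogDemo {
    // Returns the USE_CATALOG value before and after disabling it on one reader.
    static boolean[] toggle() throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        XMLReader reader = spf.newSAXParser().getXMLReader();

        boolean before = reader.getFeature(XMLConstants.USE_CATALOG);
        // Disable catalog support for this reader only.
        reader.setFeature(XMLConstants.USE_CATALOG, false);
        boolean after = reader.getFeature(XMLConstants.USE_CATALOG);
        return new boolean[] {before, after};
    }

    public static void main(String[] args) throws Exception {
        boolean[] values = toggle();
        System.out.println("before: " + values[0] + ", after: " + values[1]);
    }
}
```

Other readers created from the same factory are not affected, because the feature is set on the individual XMLReader object.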
javax.xml.catalog.files
The javax.xml.catalog.files property is defined by the XML Catalog API and supported by the JDK XML processors, along with other catalog features. To employ the catalog feature on a parsing, validating, or transforming process, all that’s needed is to set the FILES property on the processor, through its system property, or in the jaxp.properties file.
Catalog URI
The catalog file reference must be a valid URI, such as file:///users/auser/catalog/catalog.xml.
The URI reference in a system or a uri entry in the catalog file can be absolute or relative. If it’s relative, then it’s resolved using the catalog file's URI or a base URI if specified.
Using system or uri Entries
Both system and uri entries work, whether you use the XML Catalog API directly (see XML Catalog API Interfaces for an example) or rely on the JDK XML Processors' native support of the CatalogFeatures class. In general, system entries are searched first, then public entries, and if no match is found then the processor continues searching uri entries. Because both system and uri entries are supported, it’s recommended that you follow the custom of XML specifications when selecting between using a system or uri entry. For example, DTDs are defined with a systemId and therefore system entries are preferable.
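The fallback to uri entries can be seen by resolving a DTD reference against a catalog that contains only a uri entry. This is a minimal sketch; the temporary directory and the class name LookupOrderDemo are assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.catalog.CatalogFeatures;
import javax.xml.catalog.CatalogManager;
import javax.xml.catalog.CatalogResolver;
import org.xml.sax.InputSource;

class LookupOrderDemo {
    static final String SYSTEM_ID =
            "http://openjdk.java.net/xml/catalog/dtd/example.dtd";

    // Resolves a system identifier against a catalog that has no system or public entry.
    static String resolveThroughUriEntry() throws Exception {
        // Temporary directory standing in for the catalog deployment (illustrative only).
        Path dir = Files.createTempDirectory("catalog-demo");
        Path catalogFile = dir.resolve("catalog.xml");
        Files.write(catalogFile, (
                "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<catalog xmlns=\"urn:oasis:names:tc:entity:xmlns:xml:catalog\">\n"
                + "  <uri name=\"" + SYSTEM_ID + "\" uri=\"example.dtd\"/>\n"
                + "</catalog>\n").getBytes(StandardCharsets.UTF_8));

        CatalogResolver cr = CatalogManager.catalogResolver(
                CatalogFeatures.defaults(), catalogFile.toUri());

        // No public identifier is supplied; the search ends at the uri entry.
        InputSource source = cr.resolveEntity(null, SYSTEM_ID);
        return source.getSystemId();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(resolveThroughUriEntry());
    }
}
```

The system identifier of the returned InputSource object is the local location taken from the uri entry, even though the reference was made as a DTD system identifier.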
Use Catalog with XML Processors
Use the XML Catalog API with various Java XML processors.
The XML Catalog API is supported throughout JDK XML processors. The following sections describe how it can be enabled for a particular type of processor.
Use Catalog with DOM
To use a catalog with DOM, set the FILES property on a DocumentBuilderFactory instance as demonstrated in the following code:
static final String CATALOG_FILE = CatalogFeatures.Feature.FILES.getPropertyName();
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setNamespaceAware(true);
if (catalog != null) {
    dbf.setAttribute(CATALOG_FILE, catalog);
}
Note that catalog is a URI to a catalog file. For example, it could be something like "file:///users/auser/catalog/catalog.xml".
It’s best to deploy resolving target files along with the catalog entry file, so that the files can be resolved relative to the catalog file. For example, if the following is a uri entry in the catalog file, then the XSLImport_html.xsl file will be located at /users/auser/catalog/XSLImport_html.xsl.
<uri name="pathto/XSLImport_html.xsl" uri="XSLImport_html.xsl"/>
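A runnable sketch of the DOM case follows. Unlike the System Reference example, no CatalogResolver object is created; the catalog is handed to the DocumentBuilderFactory instance and the processor does the mapping. The temporary directory, the sample DTD, and the class name DomCatalogDemo are assumptions made for illustration.

```java
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.catalog.CatalogFeatures;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class DomCatalogDemo {
    static final String CATALOG_FILE = CatalogFeatures.Feature.FILES.getPropertyName();
    static final String SYSTEM_ID =
            "http://openjdk.java.net/xml/catalog/dtd/example.dtd";

    // Parses a document whose DTD reference is mapped by the catalog set on the factory.
    static String parse() throws Exception {
        // Temporary directory holding the catalog and the DTD it maps to (illustrative only).
        Path dir = Files.createTempDirectory("catalog-demo");
        Files.write(dir.resolve("example.dtd"),
                "<!ENTITY example \"system\">".getBytes(StandardCharsets.UTF_8));
        Path catalogFile = dir.resolve("catalog.xml");
        Files.write(catalogFile, (
                "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<catalog xmlns=\"urn:oasis:names:tc:entity:xmlns:xml:catalog\">\n"
                + "  <system systemId=\"" + SYSTEM_ID + "\" uri=\"example.dtd\"/>\n"
                + "</catalog>\n").getBytes(StandardCharsets.UTF_8));

        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        // The catalog is passed as a URI string.
        dbf.setAttribute(CATALOG_FILE, catalogFile.toUri().toString());

        String xml = "<?xml version=\"1.0\"?>\n"
                + "<!DOCTYPE catalogtest SYSTEM \"" + SYSTEM_ID + "\">\n"
                + "<catalogtest>Test &example; entry</catalogtest>";
        Document doc = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parse());
    }
}
```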
Use Catalog with SAX
To use the Catalog feature on a SAX parser, set the catalog file on the SAXParser instance:
SAXParserFactory spf = SAXParserFactory.newInstance();
spf.setNamespaceAware(true);
spf.setXIncludeAware(true);
SAXParser parser = spf.newSAXParser();
parser.setProperty(CATALOG_FILE, catalog);
In the prior sample code, note the statement spf.setXIncludeAware(true). When this is enabled, any XInclude is resolved using the catalog as well.
Given an XML file XI_simple.xml:
<simple>
  <test xmlns:xinclude="http://www.w3.org/2001/XInclude">
    <latin1>
      <firstElement/>
      <xinclude:include href="pathto/XI_text.xml" parse="text"/>
      <insideChildren/>
      <another>
        <deeper>text</deeper>
      </another>
    </latin1>
    <test2>
      <xinclude:include href="pathto/XI_test2.xml"/>
    </test2>
  </test>
</simple>
Additionally, given another XML file XI_test2.xml:
<?xml version="1.0"?>
<!-- comment before root -->
<!DOCTYPE red SYSTEM "pathto/XI_red.dtd">
<red xmlns:xinclude="http://www.w3.org/2001/XInclude">
  <blue>
    <xinclude:include href="pathto/XI_text.xml" parse="text"/>
  </blue>
</red>
Assume another text file, XI_text.xml, contains a simple string, and the file XI_red.dtd is as follows:
<!ENTITY red "it is read">
In these XML files, there is an XInclude element inside an XInclude element, and a reference to a DTD. Assuming they are located in the same folder along with the catalog file CatalogSupport.xml, add the following catalog entries to map them:
<uri name="pathto/XI_text.xml" uri="XI_text.xml"/>
<uri name="pathto/XI_test2.xml" uri="XI_test2.xml"/>
<system systemId="pathto/XI_red.dtd" uri="XI_red.dtd"/>
When the parser.parse method is called to parse the XI_simple.xml file, it’s able to locate the XI_test2.xml file in the XI_simple.xml file, and the XI_text.xml file and the XI_red.dtd file in the XI_test2.xml file through the specified catalog.
Use Catalog with StAX
To use the catalog feature with a StAX parser, set the catalog file on the XMLInputFactory instance before creating the XMLStreamReader object:
XMLInputFactory factory = XMLInputFactory.newInstance();
factory.setProperty(CatalogFeatures.Feature.FILES.getPropertyName(), catalog);
XMLStreamReader streamReader =
factory.createXMLStreamReader(xml, new FileInputStream(xml));
When the XMLStreamReader streamReader object is used to parse the XML source, external references in the source are then resolved in accordance with the specified entries in the catalog.
Note that unlike the DocumentBuilderFactory class that has both setFeature and setAttribute methods, the XMLInputFactory class defines only a setProperty method. The XML Catalog API features including XMLConstants.USE_CATALOG are all set through this setProperty method. For example, to disable USE_CATALOG on an XMLStreamReader object, you can do the following:
factory.setProperty(XMLConstants.USE_CATALOG, false);
Use Catalog with Schema Validation
To use a catalog to resolve any external resources in a schema, such as XSD import and include, set the catalog on the SchemaFactory object:
SchemaFactory factory =
SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
factory.setProperty(CatalogFeatures.Feature.FILES.getPropertyName(), catalog);
Schema schema = factory.newSchema(schemaFile);
The XMLSchema schema document contains a reference to an external DTD:
<!DOCTYPE xs:schema PUBLIC "-//W3C//DTD XMLSCHEMA 200102//EN" "pathto/XMLSchema.dtd" [
...
]>
And a reference to an XSD import:
<xs:import
    namespace="http://www.w3.org/XML/1998/namespace"
    schemaLocation="http://www.w3.org/2001/pathto/xml.xsd">
  <xs:annotation>
    <xs:documentation>
      Get access to the xml: attribute groups for xml:lang
      as declared on 'schema' and 'documentation' below
    </xs:documentation>
  </xs:annotation>
</xs:import>
Following along with this example, to use local resources to improve your application performance by reducing calls to the W3C server:
- Include these entries in the catalog set on the SchemaFactory object:
<public publicId="-//W3C//DTD XMLSCHEMA 200102//EN" uri="XMLSchema.dtd"/>
<!-- XMLSchema.dtd refers to datatypes.dtd -->
<systemSuffix systemIdSuffix="datatypes.dtd" uri="datatypes.dtd"/>
<uri name="http://www.w3.org/2001/pathto/xml.xsd" uri="xml.xsd"/>
- Download the source files XMLSchema.dtd, datatypes.dtd, and xml.xsd and save them along with the catalog file.
As already discussed, the XML Catalog API lets you use any of the entry types that you prefer. In the prior case, instead of the uri entry, you could also use either one of the following:
- A public entry, because the namespace attribute in the import element is treated as the public identifier:
<public publicId="http://www.w3.org/XML/1998/namespace" uri="xml.xsd"/>
- A system entry:
<system systemId="http://www.w3.org/2001/pathto/xml.xsd" uri="xml.xsd"/>
Note:
When experimenting with the XML Catalog API, it might be useful to ensure that none of the URIs or system IDs used in your sample files points to any actual resources on the internet, and especially not to the W3C server. This lets you catch mistakes early should the catalog resolution fail, and avoids burdening the W3C servers with unnecessary connections. All the examples in this topic and other related topics about the XML Catalog API have an arbitrary string "pathto" added to any URI for that purpose, so that no URI could possibly resolve to an external W3C resource.
To use the catalog to resolve any external resources in an XML source to be validated, set the catalog on the Validator object:
SchemaFactory schemaFactory =
SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
Schema schema = schemaFactory.newSchema();
Validator validator = schema.newValidator();
validator.setProperty(CatalogFeatures.Feature.FILES.getPropertyName(), catalog);
StreamSource source = new StreamSource(new File(xml));
validator.validate(source);
Use Catalog with Transform
To use the XML Catalog API in an XSLT transform process, set the catalog file on the TransformerFactory object.
TransformerFactory factory = TransformerFactory.newInstance();
factory.setAttribute(CatalogFeatures.Feature.FILES.getPropertyName(), catalog);
Transformer transformer = factory.newTransformer(xslSource);
If the XSL source that the factory is using to create the Transformer object contains DTD, import, and include statements similar to these:
<!DOCTYPE HTMLlat1 SYSTEM "http://openjdk.java.net/xml/catalog/dtd/XSLDTD.dtd">
<xsl:import href="pathto/XSLImport_html.xsl"/>
<xsl:include href="pathto/XSLInclude_header.xsl"/>
Then the following catalog entries can be used to resolve these references:
<system
systemId="http://openjdk.java.net/xml/catalog/dtd/XSLDTD.dtd"
uri="XSLDTD.dtd"/>
<uri name="pathto/XSLImport_html.xsl" uri="XSLImport_html.xsl"/>
<uri name="pathto/XSLInclude_header.xsl" uri="XSLInclude_header.xsl"/>
Calling Order for Resolvers
The JDK XML processors call a custom resolver before the catalog resolver.
Custom Resolver Preferred to Catalog Resolver
The catalog resolver (defined by the CatalogResolver interface) can be used to resolve external references by the JDK XML processors to which a catalog file has been set. However, if a custom resolver is also provided, then it’s always placed ahead of the catalog resolver. This means that a JDK XML processor first calls a custom resolver to attempt to resolve external resources. If the resolution is successful, then the processor skips the catalog resolver and continues. Only when there’s no custom resolver, or if the resolution by a custom resolver returns null, does the processor then call the catalog resolver.
For applications that use custom resolvers, it’s therefore safe to set an additional catalog to resolve any resources that the custom resolvers don’t handle. For existing applications, if changing the code isn’t feasible, then you may set a catalog through the system property or jaxp.properties file to redirect external references to local resources, knowing that such a setting won’t interfere with existing processes that are handled by custom resolvers.
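The calling order can be illustrated with a SAX parser that has both a catalog and a custom EntityResolver object. In this sketch the local DTD defines the entity as "catalog" while the custom resolver supplies a DTD that defines it as "custom", so the parsed text shows which resolver was used. The entity values, the temporary directory, and the class name ResolverOrderDemo are assumptions made for illustration.

```java
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.catalog.CatalogFeatures;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class ResolverOrderDemo {
    static final String SYSTEM_ID =
            "http://openjdk.java.net/xml/catalog/dtd/example.dtd";

    // Parses the sample document with a catalog set on the parser and the given custom resolver.
    static String parse(EntityResolver custom) throws Exception {
        // Temporary directory holding the catalog and its local DTD (illustrative only).
        Path dir = Files.createTempDirectory("catalog-demo");
        Files.write(dir.resolve("example.dtd"),
                "<!ENTITY example \"catalog\">".getBytes(StandardCharsets.UTF_8));
        Path catalogFile = dir.resolve("catalog.xml");
        Files.write(catalogFile, (
                "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<catalog xmlns=\"urn:oasis:names:tc:entity:xmlns:xml:catalog\">\n"
                + "  <system systemId=\"" + SYSTEM_ID + "\" uri=\"example.dtd\"/>\n"
                + "</catalog>\n").getBytes(StandardCharsets.UTF_8));

        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        SAXParser parser = spf.newSAXParser();
        parser.setProperty(CatalogFeatures.Feature.FILES.getPropertyName(),
                catalogFile.toUri().toString());

        StringBuilder text = new StringBuilder();
        XMLReader reader = parser.getXMLReader();
        reader.setEntityResolver(custom);
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        });
        String xml = "<?xml version=\"1.0\"?>\n"
                + "<!DOCTYPE catalogtest SYSTEM \"" + SYSTEM_ID + "\">\n"
                + "<catalogtest>Test &example; entry</catalogtest>";
        reader.parse(new InputSource(new StringReader(xml)));
        return text.toString();
    }

    // The custom resolver handles the reference, so the catalog is skipped.
    static String withCustomResolver() throws Exception {
        return parse((publicId, systemId) ->
                new InputSource(new StringReader("<!ENTITY example \"custom\">")));
    }

    // The custom resolver returns null, so the processor falls back to the catalog.
    static String withDecliningResolver() throws Exception {
        return parse((publicId, systemId) -> null);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(withCustomResolver());
        System.out.println(withDecliningResolver());
    }
}
```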
Detecting Errors
Detect configuration issues by isolating the problem.
The XML Catalogs Standard requires that the processors recover from any resource failures and continue. The XML Catalog API therefore ignores any failed catalog entry files without issuing an error, which makes it harder to detect configuration issues.
Detecting Configuration Issues
To detect configuration issues, isolate the issues by setting one catalog at a time, setting the RESOLVE value to strict, and checking for a CatalogException exception when no match is found.
Table 4-1 RESOLVE Settings
RESOLVE Value | CatalogResolver Behavior | Description |
---|---|---|
strict (default) | Throws a CatalogException if no match is found | An unmatched reference may indicate a possible error in the catalog or in setting the catalog. |
continue | Returns quietly | This is useful in a production environment where you want the XML processors to continue resolving any external references not covered by the catalog. |
ignore | Returns quietly | For processors such as SAX, that allow skipping the external references, the ignore value instructs the CatalogResolver object to return an empty InputSource object, thus skipping the external reference. |
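The three RESOLVE settings can be compared with a small sketch that asks a CatalogResolver object for a reference the catalog doesn't cover. The unmatched system identifier, the temporary directory, and the class name ResolveSettingsDemo are assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.catalog.CatalogException;
import javax.xml.catalog.CatalogFeatures;
import javax.xml.catalog.CatalogFeatures.Feature;
import javax.xml.catalog.CatalogManager;
import javax.xml.catalog.CatalogResolver;
import org.xml.sax.InputSource;

class ResolveSettingsDemo {
    // Describes what the resolver does for an unmatched reference under the given RESOLVE value.
    static String outcome(String resolveValue) throws Exception {
        // Temporary directory holding a catalog with a single, unrelated entry (illustrative only).
        Path dir = Files.createTempDirectory("catalog-demo");
        Path catalogFile = dir.resolve("catalog.xml");
        Files.write(catalogFile, (
                "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<catalog xmlns=\"urn:oasis:names:tc:entity:xmlns:xml:catalog\">\n"
                + "  <system systemId=\"http://openjdk.java.net/xml/catalog/dtd/example.dtd\""
                + " uri=\"example.dtd\"/>\n"
                + "</catalog>\n").getBytes(StandardCharsets.UTF_8));

        CatalogFeatures features = CatalogFeatures.builder()
                .with(Feature.RESOLVE, resolveValue)
                .build();
        CatalogResolver cr = CatalogManager.catalogResolver(features, catalogFile.toUri());

        try {
            // This system identifier has no entry in the catalog.
            InputSource source = cr.resolveEntity(null,
                    "http://openjdk.java.net/xml/catalog/dtd/unmatched.dtd");
            return source == null ? "null" : "empty InputSource";
        } catch (CatalogException e) {
            return "CatalogException";
        }
    }

    public static void main(String[] args) throws Exception {
        for (String value : new String[] {"strict", "continue", "ignore"}) {
            System.out.println(value + " -> " + outcome(value));
        }
    }
}
```

Running the same check with strict against each catalog in turn is one way to find the catalog whose entries don't match the references in your documents.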