4 XML Catalog API

Use the XML Catalog API to implement a local XML catalog.

Java SE 9 introduced a new XML Catalog API to support the Organization for the Advancement of Structured Information Standards (OASIS) XML Catalogs, OASIS Standard V1.1, 7 October 2005. This chapter of the Core Libraries Guide describes the API, its support by the Java XML processors, and usage patterns.

The XML Catalog API is a straightforward API for implementing a local catalog, and the support by the JDK XML processors makes it easier to configure your processors or the entire environment to take advantage of the feature.

Learning More About Creating Catalogs

To learn about creating catalogs, see XML Catalogs, OASIS Standard V1.1, 7 October 2005. The XML catalogs under the directory /etc/xml/catalog on some Linux distributions can also be a good reference for creating a local catalog.

Purpose of XML Catalog API

The XML Catalog API and the Java XML processors provide an option for developers and system administrators to manage external resources. 

The XML Catalog API provides an implementation of OASIS XML Catalogs v1.1, a standard designed to address issues caused by external resources.

Problems Caused by External Resources

XML, XSD and XSL documents may contain references to external resources that Java XML processors need to retrieve to process the documents. External resources can cause a problem for the applications or the system. The Catalog API and the Java XML processors provide an option for developers and system administrators to manage these external resources.

External resources can cause a problem for the application or the system in these areas:

  • Availability: If a resource is remote, then XML processors must be able to connect to the remote server hosting the resource. Even though connectivity is rarely an issue, it’s still a factor in the stability of an application. Too many connections can be a hazard to servers that hold the resources, and this in turn could affect your applications. See Use Catalog with XML Processors for an example that solves this issue using the XML Catalog API.

  • Performance. Although in most cases connectivity isn’t an issue, a remote fetch can still cause a performance issue for an application. Furthermore, there may be multiple applications on the same system attempting to resolve the same resource, and this would be a waste of system resources.

  • Security: Allowing remote connections can pose a security risk if the application processes untrusted XML sources.

  • Manageability: If a system processes a large number of XML documents, then externally referenced documents, whether local or remote, can become a maintenance hassle.

How XML Catalog API Addresses Problems Caused by External Resources

Application developers can create a local catalog of all external references for the application, and let the Catalog API resolve them for the application. This not only avoids remote connections but also makes it easier to manage these resources.

System administrators can establish a local catalog for the system and configure the Java VM to use the catalog. Then, all of the applications on the system may share the same catalog without any code changes to the applications, assuming that they’re compatible with Java SE 9. To establish a catalog, you may take advantage of existing catalogs such as those included with some Linux distributions.

XML Catalog API Interfaces

Access the XML Catalog API through its interfaces.

XML Catalog API Interfaces

The XML Catalog API defines the following interfaces:

  • The Catalog interface represents an entity catalog as defined by XML Catalogs, OASIS Standard V1.1, 7 October 2005. A Catalog object is immutable. After it’s created, the Catalog object can be used to find matches in a system, public, or uri entry. A custom resolver implementation may find it useful to locate local resources through a catalog.

  • The CatalogFeatures class provides the features and properties the Catalog API supports, including javax.xml.catalog.files, javax.xml.catalog.defer, javax.xml.catalog.prefer, and javax.xml.catalog.resolve.

  • The CatalogManager class manages the creation of XML catalogs and catalog resolvers.

  • The CatalogResolver interface is a catalog resolver that implements SAX EntityResolver, StAX XMLResolver, DOM LS LSResourceResolver used by schema validation, and transform URIResolver. This interface resolves external references using catalogs.

Details on the CatalogFeatures Class

The catalog features are collectively defined in the CatalogFeatures class. The features are defined at the API and system levels, which means that they can be set through the API, system properties, and JAXP properties. To set a feature through the API, use the CatalogFeatures class.

The following code sets javax.xml.catalog.resolve to continue so that the process continues even if no match is found by the CatalogResolver:

CatalogFeatures f = CatalogFeatures.builder().with(Feature.RESOLVE, "continue").build();

To set this continue functionality system-wide, use the Java command line or System.setProperty method:

System.setProperty(Feature.RESOLVE.getPropertyName(), "continue");

To set this continue functionality for the whole JVM instance, enter a line in the jaxp.properties file:

javax.xml.catalog.resolve = "continue"

The jaxp.properties file is typically in the $JAVA_HOME/conf directory.

The resolve property, as well as the prefer and defer properties, can be set as an attribute of the catalog or group entry in a catalog file. For example, in the following catalog, the resolve attribute is set with the value continue. The attribute can also be set on the group entry as follows:

<?xml version="1.0" encoding="UTF-8"?> 
<catalog
  xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"
  resolve="continue"
  xml:base="http://local/base/dtd/">
  <group resolve="continue">
    <system
      systemId="http://remote/dtd/alice/docAlice.dtd"
      uri="http://local/dtd/docAliceSys.dtd"/>     
  </group> 
</catalog>

Properties set in a narrower scope override those that are set in a wider one. Therefore, a property set through the API always takes preference.

Using the XML Catalog API

Resolve DTD, entity, and alternate URI references in XML source documents using the various entry types of the XML Catalog standard.

The XML Catalog Standard defines a number of entry types. Among them, the system entries, including system, rewriteSystem, and systemSuffix entries, are used for resolving DTD and entity references in XML source documents, whereas uri entries are for alternate URI references.

System Reference

Use a CatalogResolver object to locate a local resource.

Locating a Local Resource

The following example demonstrates how to use a CatalogResolver object to locate a local resource.

Consider the following XML file:

<?xml version="1.0"?> 
<!DOCTYPE catalogtest PUBLIC "-//OPENJDK//XML CATALOG DTD//1.0" 
  "http://openjdk.java.net/xml/catalog/dtd/example.dtd">

<catalogtest>
  Test &example; entry
</catalogtest>

The example.dtd file defines an entity example:

<!ENTITY example "system">

However, the URI to the example.dtd file in the XML file doesn't need to exist. The purpose is to provide a unique identifier for the CatalogResolver object to locate a local resource. To do this, create a catalog entry file called catalog.xml with a system entry to refer to the local resource:

<?xml version="1.0" encoding="UTF-8"?> 
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <system
    systemId="http://openjdk.java.net/xml/catalog/dtd/example.dtd"
    uri="example.dtd"/>
</catalog>

With this catalog entry file and the system entry, all you need to do is get a default CatalogFeatures object and set the URI to the catalog entry file to create a CatalogResolver object:

CatalogResolver cr =
  CatalogManager.catalogResolver(CatalogFeatures.defaults(), catalogUri);

catalogUri must be a valid URI. For example:

URI.create("file:///users/auser/catalog/catalog.xml")

The CatalogResolver object can now be used as a JDK XML resolver. In the following example, it’s used as a SAX EntityResolver:

SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setNamespaceAware(true);
XMLReader reader = factory.newSAXParser().getXMLReader();
reader.setEntityResolver(cr);

Notice that in the example the system identifier is given an absolute URI. That makes it easy for the resolver to find the match with exactly the same systemId in the catalog's system entry.

If the system identifier in the XML is relative, then it may complicate the matching process because the XML processor may have made it absolute with a specified base URI or the source file's URI. In that situation, the systemId of the system entry would need to match the anticipated absolute URI. An easier solution is to use the systemSuffix entry, for example:

<systemSuffix systemIdSuffix="example.dtd" uri="example.dtd"/>

The systemSuffix entry matches any reference that ends with example.dtd in an XML source and resolves it to a local example.dtd file as specified in the uri attribute. You may add more to the systemId to ensure that it’s unique or the correct reference. For example, you may set the systemIdSuffix to xml/catalog/dtd/example.dtd, or rename the id in both the XML source file and the systemSuffix entry to make it a unique match, for example my_example.dtd.

The URI of the system entry can be absolute or relative. If the external resources have a fixed location, then an absolute URI is more likely to guarantee uniqueness. If the external resources are placed relative to your application or the catalog entry file, then a relative URI may be more effective, allowing the deployment of your application without knowing where it’s installed. Such a relative URI then is resolved using the base URI or the catalog file’s URI if the base URI isn’t specified. In the previous example, example.dtd is assumed to have been placed in the same directory as the catalog file.

Public Reference

Use a public entry instead of a system entry to find a desired resource.

If no system entry matches the desired resource, and the PREFER property is specified to match public, then a public entry can do the same as a system entry. Note that public is the default setting for the PREFER property.

Using a Public Entry

When the DTD reference in the parsed XML file contains a public identifier such as "-//OPENJDK//XML CATALOG DTD//1.0", a public entry can be written as follows in the catalog entry file:

<public publicId="-//OPENJDK//XML CATALOG DTD//1.0" uri="example.dtd"/>

When you create and use a CatalogResolver object with this entry file, the example.dtd resolves through the publicId property. See System Reference for an example of creating a CatalogResolver object.

URI Reference

Use a uri entry to find a desired resource.

The URI type entries, including uri, rewriteURI, and uriSuffix, can be used in a similar way as the system type entries.

Using URI Entries

While the XML Catalog Standard gives a preference to the system type entries for resolving DTD references, and uri type entries for everything else, the Java XML Catalog API doesn’t make that distinction. This is because the specifications for the existing Java XML Resolvers, such as XMLResolver and LSResourceResolver, doesn’t give a preference. The uri type entries, including uri, rewriteURI, and uriSuffix, can be used in a similar way as the system type entries.  The uri elements are defined to associate an alternate URI reference with a URI reference. In the case of system reference, this is the systemId property.

You may therefore replace the system entry with a uri entry in the following example, although system entries are more generally used for DTD references.

<system
  systemId="http://openjdk.java.net/xml/catalog/dtd/example.dtd"
  uri="example.dtd"/>

A uri entry would look like the following:

<uri name="http://openjdk.java.net/xml/catalog/dtd/example.dtd" uri="example.dtd"/>

While system entries are frequently used for DTDs, uri entries are preferred for URI references such as XSD and XSL import and include. The next example uses a uri entry to resolve a XSL import.

As described in XML Catalog API Interfaces, the XML Catalog API defines the CatalogResolver interface that extends Java XML Resolvers including EntityResolver, XMLResolver, URIResolver, and LSResolver. Therefore, a CatalogResolver object can be used by SAX, DOM, StAX, Schema Validation, as well as XSLT Transform. The following code creates a CatalogResolver object with default feature settings:

CatalogResolver cr =
  CatalogManager.catalogResolver(CatalogFeatures.defaults(), catalogUri);

The code then registers this CatalogResolver object on a TransformerFactory class where a URIResolver object is expected:

TransformerFactory factory = TransformerFactory.newInstance();
factory.setURIResolver(cr);

Alternatively the code can register the CatalogResolver object on the Transformer object:

Transformer transformer = factory.newTransformer(xslSource); 
transformer.setURIResolver(cur);

Assuming the XSL source file contains an import element to import the xslImport.xsl file into the XSL source:

<xsl:import href="pathto/xslImport.xsl"/>

To resolve the import reference to where the import file is actually located, a CatalogResolver object should be set on the TransformerFactory class before creating the Transformer object, and a uri entry such as the following must be added to the catalog entry file:

<uri name="pathto/xslImport.xsl" uri="xslImport.xsl"/>

The discussion about absolute or relative URIs and the use of systemSuffix or uriSuffix entries with the system reference applies to the uri entries as well.

Java XML Processors Support

Use the XML Catalogs features with the standard Java XML processors.

The XML Catalogs features are supported throughout the Java XML processors, including SAX and DOM (javax.xml.parsers), and StAX parsers (javax.xml.stream), schema validation (javax.xml.validation), and XML transformation (javax.xml.transform).

This means that you don’t need to create a CatalogResolver object outside an XML processor. Catalog files can be registered directly to the Java XML processor, or specified through system properties, or in the jaxp.properties file. The XML processors perform the mappings through the catalogs automatically.

Enable Catalog Support

To enable the support for the XML Catalogs feature on a processor, the USE_CATALOG feature must be set to true, and at least one catalog entry file specified.

USE_CATALOG

A Java XML processor determines whether the XML Catalogs feature is supported based on the value of the USE_CATALOG feature. By default, USE_CATALOG is set to true for all JDK XML Processors. The Java XML processor further checks for the availability of a catalog file, and attempts to use the XML Catalog API only when the USE_CATALOG feature is true and a catalog is available.

The USE_CATALOG feature is supported by the XML Catalog API, the system property, and the jaxp.properties file. For example, if USE_CATALOG is set to true and it’s desirable to disable the catalog support for a particular processor, then this can be done by setting the USE_CATALOG feature to false through the processor's setFeature method. The following code sets the USE_CATALOG feature to the specified value useCatalog for an XMLReader object:

SAXParserFactory spf = SAXParserFactory.newInstance();
spf.setNamespaceAware(true);
XMLReader reader = spf.newSAXParser().getXMLReader();
if (setUseCatalog) {
   reader.setFeature(XMLConstants.USE_CATALOG, useCatalog); 
}

On the other hand, if the entire environment must have the catalog turned off, then this can be done by configuring the jaxp.properties file with a line:

 javax.xml.useCatalog = false;

javax.xml.catalog.files

The javax.xml.catalog.files property is defined by the XML Catalog API and supported by the JDK XML processors, along with other catalog features. To employ the catalog feature on a parsing, validating, or transforming process, all that’s needed is to set the FILES property on the processor, through its system property or using the jaxp.properties file.

Catalog URI

The catalog file reference must be a valid URI, such as file:///users/auser/catalog/catalog.xml.

The URI reference in a system or a URI entry in the catalog file can be absolute or relative. If they’re relative, then they are resolved using the catalog file's URI or a base URI if specified.

Using system or uri Entries

When using the XML Catalog API directly (see XML Catalog API Interfaces for an example), system and uri entries both work when using the JDK XML Processors' native support of the CatalogFeatures class. In general, system entries are searched first, then public entries, and if no match is found then the processor continues searching uri entries. Because both system and uri entries are supported, it’s recommended that you follow the custom of XML specifications when selecting between using a system or uri entry. For example, DTDs are defined with a systemId and therefore system entries are preferable.

Use Catalog with XML Processors

Use the XML Catalog API with various Java XML processors.

The XML Catalog API is supported throughout JDK XML processors. The following sections describe how it can be enabled for a particular type of processor.

Use Catalog with DOM

To use a catalog with DOM, set the FILES property on a DocumentBuilderFactory instance as demonstrated in the following code:

static final String CATALOG_FILE = CatalogFeatures.Feature.FILES.getPropertyName();
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setNamespaceAware(true);
if (catalog != null) {
   dbf.setAttribute(CATALOG_FILE, catalog);
}

Note that catalog is a URI to a catalog file. For example, it could be something like "file:///users/auser/catalog/catalog.xml".

It’s best to deploy resolving target files along with the catalog entry file, so that the files can be resolved relative to the catalog file. For example, if the following is a uri entry in the catalog file, then the XSLImport_html.xsl file will be located at /users/auser/catalog/XSLImport_html.xsl.

<uri name="pathto/XSLImport_html.xsl" uri="XSLImport_html.xsl"/>

Use Catalog with SAX

To use the Catalog feature on a SAX parser, set the catalog file to the SAXParser instance:

SAXParserFactory spf = SAXParserFactory.newInstance();
spf.setNamespaceAware(true);
spf.setXIncludeAware(true);
SAXParser parser = spf.newSAXParser();
parser.setProperty(CATALOG_FILE, catalog);

In the prior sample code, note the statement spf.setXIncludeAware(true). When this is enabled, any XInclude is resolved using the catalog as well.

Given an XML file XI_simple.xml:

<simple> 
  <test xmlns:xinclude="http://www.w3.org/2001/XInclude">   
    <latin1>
      <firstElement/>
      <xinclude:include href="pathto/XI_text.xml" parse="text"/>
      <insideChildren/>
      <another>
        <deeper>text</deeper>
      </another>
    </latin1>
    <test2>
      <xinclude:include href="pathto/XI_test2.xml"/>   
    </test2>
  </test>
</simple>

Additionally, given another XML file XI_test2.xml:

<?xml version="1.0"?>
<!-- comment before root -->
<!DOCTYPE red SYSTEM "pathto/XI_red.dtd">
<red xmlns:xinclude="http://www.w3.org/2001/XInclude">
  <blue>
    <xinclude:include href="pathto/XI_text.xml" parse="text"/>
  </blue>
</red>

Assume another text file, XI_text.xml, contains a simple string, and the file XI_red.dtd is as follows:

 <!ENTITY red "it is read">

In these XML files, there is an XInclude element inside an XInclude element, and a reference to a DTD. Assuming they are located in the same folder along with the catalog file CatalogSupport.xml, add the following catalog entries to map them:

<uri name="pathto/XI_text.xml" uri="XI_text.xml"/>
<uri name="pathto/XI_test2.xml" uri="XI_test2.xml"/> 
<system systemId="pathto/XI_red.dtd" uri="XI_red.dtd"/>

When the parser.parse method is called to parse the XI_simple.xml file, it’s able to locate the XI_test2.xml file in the XI_simple.xml file, and the XI_text.xml file and the XI_red.dtd file in the XI_test2.xml file through the specified catalog.

Use Catalog with StAX

To use the catalog feature with a StAX parser, set the catalog file on the XMLInputFactory instance before creating the XMLStreamReader object:

XMLInputFactory factory = XMLInputFactory.newInstance();
factory.setProperty(CatalogFeatures.Feature.FILES.getPropertyName(), catalog);
XMLStreamReader streamReader =
  factory.createXMLStreamReader(xml, new FileInputStream(xml));

When the XMLStreamReader streamReader object is used to parse the XML source, external references in the source are then resolved in accordance with the specified entries in the catalog.

Note that unlike the DocumentBuilderFactory class that has both setFeature and setAttribute methods, the XMLInputFactory class defines only a setProperty method. The XML Catalog API features including XMLConstants.USE_CATALOG are all set through this setProperty method. For example, to disable USE_CATALOG on a XMLStreamReader object, you can do the following:

factory.setProperty(XMLConstants.USE_CATALOG, false);

Use Catalog with Schema Validation

To use a catalog to resolve any external resources in a schema, such as XSD import and include, set the catalog on the SchemaFactory object:

SchemaFactory factory =
  SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
factory.setProperty(CatalogFeatures.Feature.FILES.getPropertyName(), catalog);
Schema schema = factory.newSchema(schemaFile);

The XMLSchema schema document contains references to external DTD:

<!DOCTYPE xs:schema PUBLIC "-//W3C//DTD XMLSCHEMA 200102//EN" "pathto/XMLSchema.dtd" [
   ... 
]>

And to xsd import:

<xs:import
  namespace="http://www.w3.org/XML/1998/namespace"
  schemaLocation="http://www.w3.org/2001/pathto/xml.xsd">
  <xs:annotation>
    <xs:documentation>
      Get access to the xml: attribute groups for xml:lang
      as declared on 'schema' and 'documentation' below
    </xs:documentation>
  </xs:annotation>
</xs:import>

Following along with this example, to use local resources to improve your application performance by reducing calls to the W3C server:

  • Include these entries in the catalog set on the SchemaFactory object:

<public publicId="-//W3C//DTD XMLSCHEMA 200102//EN" uri="XMLSchema.dtd"/>
<!-- XMLSchema.dtd refers to datatypes.dtd --> 
<systemSuffix systemIdSuffix="datatypes.dtd" uri="datatypes.dtd"/>
<uri name="http://www.w3.org/2001/pathto/xml.xsd" uri="xml.xsd"/>
  • Download  the source files XMLSchema.dtd, datatypes.dtd, and xml.xsd and save them along with the catalog file.

As already discussed, the XML Catalog API lets you use any of the entry types that you prefer. In the prior case, instead of the uri entry, you could also use either one of the following:

  • A public entry, because the namespace attribute in the import element is treated as the publicId element:

<public publicId="http://www.w3.org/XML/1998/namespace" uri="xml.xsd"/>
  • A system entry:

<system systemId="http://www.w3.org/2001/pathto/xml.xsd" uri="xml.xsd"/>

Note:

When experimenting with the XML Catalog API, it might be useful to ensure that none of the URIs or system IDs used in your sample files points to any actual resources on the internet, and especially not to the W3C server. This lets you catch mistakes early should the catalog resolution fail, and avoids putting a burden on W3C servers, thus freeing them from any unnecessary connections. All the examples in this topic and other related topics about the XML Catalog API, have an arbitrary string "pathto" added to any URI for that purpose, so that no URI could possibly resolve to an external W3C resource.

To use the catalog to resolve any external resources in an XML source to be validated, set the catalog on the Validator object: 

SchemaFactory schemaFactory =
  SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
Schema schema = schemaFactory.newSchema();
Validator validator = schema.newValidator();
validator.setProperty(CatalogFeatures.Feature.FILES.getPropertyName(), catalog);
StreamSource source = new StreamSource(new File(xml));
validator.validate(source);

Use Catalog with Transform

To use the XML Catalog API in a XSLT transform process, set the catalog file on the TransformerFactory object.

TransformerFactory factory = TransformerFactory.newInstance();
factory.setAttribute(CatalogFeatures.Feature.FILES.getPropertyName(), catalog);
Transformer transformer = factory.newTransformer(xslSource);

If the XSL source that the factory is using to create the Transformer object contains DTD, import, and include statements similar to these:

<!DOCTYPE HTMLlat1 SYSTEM "http://openjdk.java.net/xml/catalog/dtd/XSLDTD.dtd">
<xsl:import href="pathto/XSLImport_html.xsl"/>
<xsl:include href="pathto/XSLInclude_header.xsl"/>

Then the following catalog entries can be used to resolve these references:

<system
  systemId="http://openjdk.java.net/xml/catalog/dtd/XSLDTD.dtd"
  uri="XSLDTD.dtd"/>
<uri name="pathto/XSLImport_html.xsl" uri="XSLImport_html.xsl"/>
<uri name="pathto/XSLInclude_header.xsl" uri="XSLInclude_header.xsl"/>

Calling Order for Resolvers

The JDK XML processors call a custom resolver before the catalog resolver.

Custom Resolver Preferred to Catalog Resolver

The catalog resolver (defined by the CatalogResolver interface) can be used to resolve external references by the JDK XML processors to which a catalog file has been set. However, if a custom resolver is also provided, then it’s always be placed ahead of the catalog resolver. This means that a JDK XML processor first calls a custom resolver to attempt to resolve external resources. If the resolution is successful, then the processor skips the catalog resolver and continues. Only when there’s no custom resolver or if the resolution by a custom resolver returns null, does the processor then call the catalog resolver.

For applications that use custom resolvers, it’s therefore safe to set an additional catalog to resolve any resources that the custom resolvers don’t handle. For existing applications, if changing the code isn’t feasible, then you may set a catalog through the system property or jaxp.properties file to redirect external references to local resources knowing that such a setting won’t interfere with existing processes that are handled by custom resolvers.

Detecting Errors

Detect configuration issues by isolating the problem.

The XML Catalogs Standard requires that the processors recover from any resource failures and continue, therefore the XML Catalog API ignores any failed catalog entry files without issuing an error, which makes it harder to detect configuration issues.

Dectecting Configuration Issues

To detect configuration issues, isolate the issues by setting one catalog at a time, setting the RESOLVE value to strict, and checking for a CatalogException exception when no match is found.

Table 4-1 RESOLVE Settings

RESOLVE Value CatalogResolver Behavior Description

strict (default)

Throws a CatalogException if no match is found with a specified reference

An unmatched reference may indicate a possible error in the catalog or in setting the catalog.

continue

Returns quietly

This is useful in a production environment where you want the XML processors to continue resolving any external references not covered by the catalog.

ignore

Returns quietly

For processors such as SAX, that allow skipping the external references, the ignore value instructs the CatalogResolver object to return an empty InputSource object, thus skipping the external reference.