Use Catalog with XML Processors

Use the XML Catalog API with various Java XML processors.

The XML Catalog API is supported throughout JDK XML processors. The following sections describe how it can be enabled for a particular type of processor.

Use Catalog with DOM

To use a catalog with DOM, set the FILES property on a DocumentBuilderFactory instance as demonstrated in the following code:

static final String CATALOG_FILE = CatalogFeatures.Feature.FILES.getPropertyName();
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setNamespaceAware(true);
if (catalog != null) {
   dbf.setAttribute(CATALOG_FILE, catalog);
}

Note that catalog is a URI to a catalog file. For example, it could be something like "file:///users/auser/catalog/catalog.xml".

It’s best to deploy resolving target files along with the catalog entry file, so that the files can be resolved relative to the catalog file. For example, if the following is a uri entry in the catalog file, then the XSLImport_html.xsl file will be located at /users/auser/catalog/XSLImport_html.xsl.

<uri name="pathto/XSLImport_html.xsl" uri="XSLImport_html.xsl"/>

Use Catalog with SAX

To use the Catalog feature on a SAX parser, set the catalog file to the SAXParser instance:

SAXParserFactory spf = SAXParserFactory.newInstance();
spf.setNamespaceAware(true);
spf.setXIncludeAware(true);
SAXParser parser = spf.newSAXParser();
parser.setProperty(CATALOG_FILE, catalog);

In the prior sample code, note the statement spf.setXIncludeAware(true). When this is enabled, any XInclude is resolved using the catalog as well.

Given an XML file XI_simple.xml:

<simple> 
  <test xmlns:xinclude="http://www.w3.org/2001/XInclude">   
    <latin1>
      <firstElement/>
      <xinclude:include href="pathto/XI_text.xml" parse="text"/>
      <insideChildren/>
      <another>
        <deeper>text</deeper>
      </another>
    </latin1>
    <test2>
      <xinclude:include href="pathto/XI_test2.xml"/>   
    </test2>
  </test>
</simple>

Additionally, given another XML file XI_test2.xml:

<?xml version="1.0"?>
<!-- comment before root -->
<!DOCTYPE red SYSTEM "pathto/XI_red.dtd">
<red xmlns:xinclude="http://www.w3.org/2001/XInclude">
  <blue>
    <xinclude:include href="pathto/XI_text.xml" parse="text"/>
  </blue>
</red>

Assume another text file, XI_text.xml, contains a simple string, and the file XI_red.dtd is as follows:

 <!ENTITY red "it is read">

In these XML files, there is an XInclude element inside an XInclude element, and a reference to a DTD. Assuming they are located in the same folder along with the catalog file CatalogSupport.xml, add the following catalog entries to map them:

<uri name="pathto/XI_text.xml" uri="XI_text.xml"/>
<uri name="pathto/XI_test2.xml" uri="XI_test2.xml"/> 
<system systemId="pathto/XI_red.dtd" uri="XI_red.dtd"/>

When the parser.parse method is called to parse the XI_simple.xml file, it’s able to locate the XI_test2.xml file in the XI_simple.xml file, and the XI_text.xml file and the XI_red.dtd file in the XI_test2.xml file through the specified catalog.

Use Catalog with StAX

To use the catalog feature with a StAX parser, set the catalog file on the XMLInputFactory instance before creating the XMLStreamReader object:

XMLInputFactory factory = XMLInputFactory.newInstance();
factory.setProperty(CatalogFeatures.Feature.FILES.getPropertyName(), catalog);
XMLStreamReader streamReader =
  factory.createXMLStreamReader(xml, new FileInputStream(xml));

When the XMLStreamReader streamReader object is used to parse the XML source, external references in the source are then resolved in accordance with the specified entries in the catalog.

Note that unlike the DocumentBuilderFactory class that has both setFeature and setAttribute methods, the XMLInputFactory class defines only a setProperty method. The XML Catalog API features including XMLConstants.USE_CATALOG are all set through this setProperty method. For example, to disable USE_CATALOG on a XMLStreamReader object, you can do the following:

factory.setProperty(XMLConstants.USE_CATALOG, false);

Use Catalog with Schema Validation

To use a catalog to resolve any external resources in a schema, such as XSD import and include, set the catalog on the SchemaFactory object:

SchemaFactory factory =
  SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
factory.setProperty(CatalogFeatures.Feature.FILES.getPropertyName(), catalog);
Schema schema = factory.newSchema(schemaFile);

The XMLSchema schema document contains references to external DTD:

<!DOCTYPE xs:schema PUBLIC "-//W3C//DTD XMLSCHEMA 200102//EN" "pathto/XMLSchema.dtd" [
   ... 
]>

And to xsd import:

<xs:import
  namespace="http://www.w3.org/XML/1998/namespace"
  schemaLocation="http://www.w3.org/2001/pathto/xml.xsd">
  <xs:annotation>
    <xs:documentation>
      Get access to the xml: attribute groups for xml:lang
      as declared on 'schema' and 'documentation' below
    </xs:documentation>
  </xs:annotation>
</xs:import>

Following along with this example, to use local resources to improve your application performance by reducing calls to the W3C server:

  • Include these entries in the catalog set on the SchemaFactory object:

<public publicId="-//W3C//DTD XMLSCHEMA 200102//EN" uri="XMLSchema.dtd"/>
<!-- XMLSchema.dtd refers to datatypes.dtd --> 
<systemSuffix systemIdSuffix="datatypes.dtd" uri="datatypes.dtd"/>
<uri name="http://www.w3.org/2001/pathto/xml.xsd" uri="xml.xsd"/>
  • Download  the source files XMLSchema.dtd, datatypes.dtd, and xml.xsd and save them along with the catalog file.

As already discussed, the XML Catalog API lets you use any of the entry types that you prefer. In the prior case, instead of the uri entry, you could also use either one of the following:

  • A public entry, because the namespace attribute in the import element is treated as the publicId element:

<public publicId="http://www.w3.org/XML/1998/namespace" uri="xml.xsd"/>
  • A system entry:

<system systemId="http://www.w3.org/2001/pathto/xml.xsd" uri="xml.xsd"/>

Note:

When experimenting with the XML Catalog API, it might be useful to ensure that none of the URIs or system IDs used in your sample files points to any actual resources on the internet, and especially not to the W3C server. This lets you catch mistakes early should the catalog resolution fail, and avoids putting a burden on W3C servers, thus freeing them from any unnecessary connections. All the examples in this topic and other related topics about the XML Catalog API, have an arbitrary string "pathto" added to any URI for that purpose, so that no URI could possibly resolve to an external W3C resource.

To use the catalog to resolve any external resources in an XML source to be validated, set the catalog on the Validator object: 

SchemaFactory schemaFactory =
  SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
Schema schema = schemaFactory.newSchema();
Validator validator = schema.newValidator();
validator.setProperty(CatalogFeatures.Feature.FILES.getPropertyName(), catalog);
StreamSource source = new StreamSource(new File(xml));
validator.validate(source);

Use Catalog with Transform

To use the XML Catalog API in a XSLT transform process, set the catalog file on the TransformerFactory object.

TransformerFactory factory = TransformerFactory.newInstance();
factory.setAttribute(CatalogFeatures.Feature.FILES.getPropertyName(), catalog);
Transformer transformer = factory.newTransformer(xslSource);

If the XSL source that the factory is using to create the Transformer object contains DTD, import, and include statements similar to these:

<!DOCTYPE HTMLlat1 SYSTEM "http://openjdk.java.net/xml/catalog/dtd/XSLDTD.dtd">
<xsl:import href="pathto/XSLImport_html.xsl"/>
<xsl:include href="pathto/XSLInclude_header.xsl"/>

Then the following catalog entries can be used to resolve these references:

<system
  systemId="http://openjdk.java.net/xml/catalog/dtd/XSLDTD.dtd"
  uri="XSLDTD.dtd"/>
<uri name="pathto/XSLImport_html.xsl" uri="XSLImport_html.xsl"/>
<uri name="pathto/XSLInclude_header.xsl" uri="XSLInclude_header.xsl"/>