5 Working with XML Conversions

Inbound Refinery can convert native files to either FlexionDoc or SearchML-styled XML, and can process XSL transformations using the Xalan XSL transformer. XML conversions require the following components to be installed and enabled on the specified server:

Component Name | Component Description | Enabled on Server
XMLConverter | Enables Inbound Refinery to produce FlexionDoc and SearchML-styled XML as the primary web-viewable file or as independent renditions, and can use the Xalan XSL transformer to process XSL transformations. | Inbound Refinery Server
XMLConverterSupport | Enables Content Server to support XML conversions and XSL transformations. | Content Server

This section discusses how to work with XML conversions and how to troubleshoot the conversion process.

5.1 Configuring XML Converter Conversion Settings

5.1.1 Configuring Content Servers to Send Jobs to Inbound Refinery

File extensions, file formats, and conversions are used in Content Server to define how content items should be processed by Inbound Refinery and its conversion add-ons. You must configure each content server to send files to refineries for conversion. When a file extension is mapped to a file format and a conversion, files of that type will be sent for conversion when they are checked into the content server. You can configure your file extension, file format, and conversion mappings in your content servers using either the File Formats Wizard or the Configuration Manager.

Most conversions required for Inbound Refinery are available by default in Content Server. In addition to the default conversions that can be used to send file formats to a refinery for XML conversion, the following conversions are added to the content server when the XMLConverterSupport component is installed:

Conversion | Description
FlexionXML | This conversion can be used to convert files to XML using the FlexionDoc schema. It applies to file types other than the standard file types included in the list of conversions (for example, Word, PowerPoint, and so on). If you want these standard file types to be sent to a refinery for conversion to XML using FlexionDoc, you do not need to re-map their file formats to the FlexionXML conversion. This conversion is not available on the File Formats Wizard. It must be mapped using the Configuration Manager.
SearchML | This conversion can be used to convert files to XML using the SearchML schema. It applies to file types other than the standard file types included in the list of conversions (for example, Word, PowerPoint, and so on). If you want these standard file types to be sent to a refinery for conversion to XML using SearchML, you do not need to re-map their file formats to the SearchML conversion. This conversion is not available on the File Formats Wizard. It must be mapped using the Configuration Manager.
XML-XSLT Transformation | After XML Converter converts documents to the FlexionDoc schema, the XML-XSLT conversion allows the resultant XML to be transformed into another XML schema specified by a developer.

5.1.2 Setting Accepted Conversions

Conversions available in the content server should match those available in the refinery. When a file format is mapped to a conversion in the content server, files of that format will be sent for conversion upon checkin. One or more refineries must be set up to accept that conversion.

You set the conversions that the refinery will accept and queue maximums on the Conversion Listing page. Most conversions required for Inbound Refinery are available by default in Inbound Refinery. In addition to the default conversions that can be accepted by a refinery, the FlexionXML and SearchML conversions are added to the refinery when the XMLConverter component is installed. The FlexionXML and SearchML conversions are accepted by default. For more information about setting accepted conversions, refer to the Inbound Refinery Administration Guide.
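The requirement that every mapped conversion be accepted by at least one refinery can be checked with a simple set comparison. The following is a minimal sketch only: the function name, the refinery name, and the conversion lists are hypothetical illustration data, not values read from a real content server or refinery.

```python
def unaccepted_conversions(mapped, refineries):
    """Return the mapped conversions that no refinery accepts, sorted by name.

    mapped     -- conversions mapped to file formats in the content server
    refineries -- dict of refinery name -> set of accepted conversions
    """
    accepted = set()
    for conversions in refineries.values():
        accepted.update(conversions)
    return sorted(set(mapped) - accepted)


# Hypothetical data, for illustration only.
mapped_in_content_server = ["FlexionXML", "SearchML", "Word"]
refineries = {"production_ibr": {"FlexionXML", "Word"}}

print(unaccepted_conversions(mapped_in_content_server, refineries))
# prints ['SearchML']
```

An empty result means that each mapped conversion has at least one refinery set up to accept it.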

5.1.3 Setting XML Files as the Primary Web-Viewable Rendition

To set XML files as the primary web-viewable rendition, complete the following steps:

  1. Log into the refinery.

  2. From Conversion Settings, select Primary Web Rendition. The Primary Web-Viewable Rendition page is displayed.

  3. Select the Convert to XML option.

  4. Typically you will want to clear all other conversion options. Inbound Refinery will attempt to convert each incoming file based on the native file format. If the format is not supported for conversion by the first selected method, Inbound Refinery will check to see if the next selected method supports the format, and so on. Inbound Refinery will attempt to convert the file using the first selected method that supports the conversion of the format.

    For example, suppose you select both the Convert to PDF using third-party applications option and the Convert to XML option. The refinery will attempt to convert any supported formats to PDF using the Convert to PDF using third-party applications method. Whether this method succeeds or fails, Inbound Refinery will not attempt another conversion method for these formats. Therefore, you should typically select only the Convert to XML option if you want to create XML files as the primary web-viewable rendition.

  5. Click Update to save your changes.

  6. Click the XML Options button. The XML Options Page is displayed.

  7. Set your XML options, and click Update to save your changes.

  8. Note the following important considerations:

    • If you want to adjust the default settings for the FlexionDoc and SearchML options, you can specify option settings in the intradoc.cfg file located in the refinery DomainDir/ucm/ibr/bin directory. For a complete description of available FlexionDoc and SearchML options, refer to the xx.cfg file located in the refinery IdcHomeDir/components/XMLConverter/resources directory. You must restart your refinery after making changes to the intradoc.cfg file.

    • FlexionDoc and SearchML schema code and documentation files are installed with the XMLConverter component into the refinery IdcHomeDir/components/XMLConverter directory.
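The method selection rule described in step 4 can be sketched as follows. This is an illustration of the rule as stated above, not Inbound Refinery's actual code; the function name and the lists of supported formats are hypothetical.

```python
def choose_method(file_format, selected_methods):
    """Return the name of the first selected method that supports file_format.

    selected_methods is an ordered list of (name, supported_formats) pairs.
    Only the returned method is attempted; there is no fallback to a later
    method if that conversion fails. Returns None if no method applies.
    """
    for name, supported_formats in selected_methods:
        if file_format in supported_formats:
            return name
    return None


# Hypothetical format lists, for illustration only.
methods = [
    ("Convert to PDF using third-party applications", {"doc", "ppt"}),
    ("Convert to XML", {"doc", "ppt", "txt"}),
]

print(choose_method("doc", methods))
# prints Convert to PDF using third-party applications
print(choose_method("txt", methods))
# prints Convert to XML
```

In this sketch, a doc file never reaches the Convert to XML method because an earlier selected method already supports the format, which is why clearing the other options is typically recommended.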

5.1.4 Setting XML Files as an Additional Rendition

To set XML files as an additional rendition, complete the following steps:

  1. Log into the refinery.

  2. From Conversion Settings, select Additional Renditions. The Additional Renditions page is displayed.

  3. Select the Create XML renditions for all supported formats option. Inbound Refinery will generate an XML file in addition to other renditions such as PDF files.

    When the generated XML files are delivered back to a content server, the XML files are included in the full-text index. However, if other web-viewable files are generated in addition to the XML file, the XML file is not used as the primary web-viewable rendition. For example, if Inbound Refinery generates both a PDF file and an XML file, the PDF file would be used as the primary web-viewable rendition. XML renditions stored in the content server weblayout directory can be recognized by the characters @x in their filenames. For example, the file Report2001@x~2.xml would be an XML rendition.

  4. Click Update to save your changes.

  5. Click the XML Options button. The XML Options Page is displayed.

  6. Set your XML options, and click Update to save your changes.

  7. Note the following important considerations:

    • If you want to adjust the default settings for the FlexionDoc and SearchML options, you can specify option settings in the intradoc.cfg file located in the refinery DomainDir/ucm/ibr/bin directory. For a complete description of available FlexionDoc and SearchML options, refer to the xx.cfg file located in the refinery IdcHomeDir/components/XMLConverter/resources directory. You must restart your refinery after making changes to the intradoc.cfg file.

    • FlexionDoc and SearchML schema code and documentation files are installed with the XMLConverter component into the refinery IdcHomeDir/components/XMLConverter directory.
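As noted in step 3, XML renditions in the weblayout directory can be recognized by the characters @x in their filenames. A minimal sketch of such a check follows. The function name is hypothetical, and the additional test for an .xml extension is an assumption based on the Report2001@x~2.xml example rather than a documented rule.

```python
def is_xml_rendition(filename):
    """Return True if filename looks like an XML rendition.

    Checks for the characters '@x' in the name. The '.xml' extension test
    is an assumption drawn from the documented example filename.
    """
    return "@x" in filename and filename.lower().endswith(".xml")


print(is_xml_rendition("Report2001@x~2.xml"))  # prints True
print(is_xml_rendition("Report2001~2.pdf"))    # prints False
```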

5.1.5 Setting Up XSL Transformation

Inbound Refinery uses the Xalan XSLT processor and the SAX validator built into the Java virtual machine running Inbound Refinery. To enable transformation, the XMLConverter component must be installed and enabled on the refinery server and the XMLConverterSupport component must be installed and enabled on the content server.

To turn on XSL transformation, do the following:

  1. Log into the refinery server.

  2. Do one of the following:

    • If the XML rendition is to be the primary web-viewable file, click Conversion Settings then Primary Web Rendition. Enable Convert to XML on the Primary Web-Viewable Rendition Page when it is displayed.

    • If the XML is to be an additional rendition, click Conversion Settings then Additional Renditions. Enable Create XML renditions for all supported formats on the Additional Renditions Page when it is displayed.

  3. Click XML Options. The XML Options Page is displayed.

  4. Enable Process XSLT Transformation and select the XML schema to use from the following options:

    • Produce FlexionDoc XML

    • Produce SearchML

  5. Click Update to save your changes or Reset to revert to the last saved settings.

To perform XSL transformations, Inbound Refinery must have an XSL template, checked into Oracle Content Server, to apply during the transformation. To check in an XSL template to Oracle Content Server, do the following:

  1. Create an XSL file. The XSL file specifies how an XML file with a specific Content Type will be transformed to a new XML file. You can specify a DTD or schema for validation, but you do not have to. You can store the DTD or schema in the content server, but you do not have to.

    Note:

    It is critical that the DTDs and schemas used for XSL transformation are accessible without a login (they cannot require authentication for read access). For example, if a DTD file is stored in Oracle Content Server, it must be in a Security Group that gives read access to anonymous users.

    The XSL file and Native file must both reside in the same Security Group for XSL transformation to be successful.

  2. Check the XSL file into the content server, associating it with a Content Type.

    1. In the Content Check In Form, select the Content Type from the Type drop-down list.

    2. Enter the Content ID according to the following convention:

      <Content Type>.xsl

      For example, if the Content Type is ADACCT, you would enter adacct.xsl.

    3. Enter your XSL file as the Primary File.

    4. Make sure the Security Group matches any DTD/schema files in the content server associated with your XSL file and the native files that will be checked into the content server.

    5. Click Check In.

    Now, when files are checked in with this Content Type, and a FlexionDoc/SearchML XML file is generated by XML Converter or the checked-in file is XML, this XSL file will be used for XSL transformation to a new XML document.

  3. Repeat steps 1 and 2 for each Content Type that you want to post-process to XML.
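The Content ID naming convention from step 2 can be sketched as a small helper. The function name is hypothetical, and lowercasing the Content Type is an assumption inferred from the ADACCT to adacct.xsl example above.

```python
def xsl_content_id(content_type):
    """Build the Content ID for the XSL file associated with a Content Type.

    Follows the <Content Type>.xsl convention. Lowercasing is an assumption
    taken from the documented example (ADACCT -> adacct.xsl).
    """
    return f"{content_type.strip().lower()}.xsl"


print(xsl_content_id("ADACCT"))  # prints adacct.xsl
```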

5.1.5.1 XSLT Errors

When validation fails, Inbound Refinery collects the errors from the SAX validation engine, creates an hcsp error page, and attempts to check the page in to Oracle Content Server. For the refinery to be able to check in an error page, you must manually set up an outgoing provider on Inbound Refinery to the content server. The name of the Inbound Refinery provider must match the agent name. For example, if Inbound Refinery is named production_ibr and it is converting files for a content server named production_cs, then an outgoing provider named production_cs must be created on the production_ibr Inbound Refinery.

If you want to be notified when there are XSL transformation failures, it is recommended that you complete the following steps to set up a criteria workflow in the content server:

  1. Access the Workflow Admin applet for the content server.

  2. Add a criteria workflow for notification of XSLT transformation failures.

  3. Add a workflow step with the following properties:

  • Users: specify the users that should be notified.

  • Exit Conditions: select At least this many reviewers, and set the value to 0.

  • Events: For the Entry event, add the following Custom Script Expression:

    <$if dDocTitle like "*XSLT Error"$>
    <$else$>
    <$wfSet("wfJumpEntryNotifyOff", "1")$>
    <$wfExit(0,0)$>
    <$endif$>
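The logic of the Entry event script above can be restated as a short sketch. This only mirrors the branching of the script and is not how Content Server evaluates workflows. The function name and return values are hypothetical, and the "*XSLT Error" wildcard pattern is approximated here as a case-sensitive suffix test, which is an assumption about how the like operator compares strings.

```python
def workflow_action(doc_title):
    """Mirror the branching of the Entry event script.

    'notify' -- the title matches "*XSLT Error", so the item stays in the
                step and the step's users are notified.
    'exit'   -- any other item: notification is turned off
                (wfJumpEntryNotifyOff=1) and the item exits the workflow.
    """
    if doc_title.endswith("XSLT Error"):
        return "notify"
    return "exit"


print(workflow_action("adacct XSLT Error"))  # prints notify
print(workflow_action("Quarterly Report"))   # prints exit
```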
    

5.1.5.2 Support for Unmanaged Files

When a document containing graphics is submitted for XML transformation, the graphics are ignored by default. If you want submitted graphics to be exported, you must manually turn this on by setting xx-graphictype to one of the following values:

  • gif: to export all graphics to GIF format

  • jpeg: to export all graphics to JPEG format

  • png: to export all graphics to PNG format

When xx-graphictype is set, Inbound Refinery can copy the exported graphics to a specified directory. By default, Inbound Refinery ignores these extra files. To use the graphics, do the following:

  1. In the Inbound Refinery intradoc.cfg file, set the following variables, where the value for xx_graphictype is the desired format:

    xx_graphictype=jpeg
    AllowXmlConverterCopyUnmanagedFiles=true
    
  2. Also in the Inbound Refinery intradoc.cfg file, set XmlUnmanagedPhysicalDir-<AgentName> to the directory to which you want the unmanaged files copied for each agent. Typically, this will be the weblayout directory of the content server.

For example, if the refinery server has an agent named production_cs and the weblayout directory is located at c:/oracle/production_cs/weblayout, you would specify the following in the refinery intradoc.cfg file:

XmlUnmanagedPhysicalDir-production_cs=c:/oracle/production_cs/weblayout

Inbound Refinery appends the following to the path when computing the complete location of the unmanaged files:

groups/<dSecurityGroup>/documents/<dDocType>/

The file names of the unmanaged files are of the form:

<dDocName>.<increasingNumber>.jpg

An example of the complete path to an unmanaged file then would be:

c:/oracle/production_cs/weblayout/groups/public/documents/adacct/001166.2.jpg
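The path computation described above can be sketched as follows. This is an illustration under stated assumptions, not Inbound Refinery's code: the function name is hypothetical, lowercasing the security group and document type is inferred from the example path, and the .gif and .png extensions are assumed by analogy with the documented .jpg form.

```python
from posixpath import join

# Assumed mapping from the graphic type setting to a file extension.
# Only the jpeg -> jpg case appears in the documentation above.
EXTENSIONS = {"gif": "gif", "jpeg": "jpg", "png": "png"}


def unmanaged_file_path(base_dir, security_group, doc_type, doc_name,
                        number, graphic_type="jpeg"):
    """Compute the complete location of an unmanaged graphic file.

    base_dir is the XmlUnmanagedPhysicalDir-<AgentName> value. The refinery
    appends groups/<dSecurityGroup>/documents/<dDocType>/ and names the file
    <dDocName>.<increasingNumber>.<extension>.
    """
    directory = join(base_dir, "groups", security_group.lower(),
                     "documents", doc_type.lower())
    return join(directory, f"{doc_name}.{number}.{EXTENSIONS[graphic_type]}")


print(unmanaged_file_path("c:/oracle/production_cs/weblayout",
                          "Public", "ADACCT", "001166", 2))
# prints c:/oracle/production_cs/weblayout/groups/public/documents/adacct/001166.2.jpg
```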

5.2 Troubleshooting XML Converter Problems

5.2.1 Inbound Refinery Installation Issues

The following are symptoms of Inbound Refinery installation issues:

5.2.1.1 A refinery or content server will not start or does not function properly

After installing Inbound Refinery, a content server or refinery will not start or is not functioning properly.

Possible Causes | Solutions
The XMLConverter component has been installed on a content server, or the XMLConverterSupport component has been installed on a refinery. | The XMLConverter component must be installed on refineries, and the XMLConverterSupport component must be installed on content servers.

If you install the wrong component, complete the following:

  • Uninstall the component from the content server or refinery using the Component Manager or the Component Wizard.

  • Install and enable the correct component.


5.2.2 Inbound Refinery Setup and Run Issues

The following are symptoms of Inbound Refinery setup and run issues:

5.2.2.1 XML Converter Won't Process Any Files

XML Converter has been installed, but no files are being converted.

Possible Causes | Solutions
File formats and conversion methods are not set up for the file type in the content server. | Use the File Formats Wizard or Configuration Manager in the content server to set up the file formats and conversion methods for XML conversion. For details, see "Configuring Content Servers to Send Jobs to Inbound Refinery".
The refinery has not been configured to accept the conversion. | Configure the refinery to accept the conversion. For details, see "Setting Accepted Conversions".
The refinery has not been configured to create XML files as the primary web-viewable rendition or an additional rendition. | Configure the refinery to create XML files as the primary web-viewable rendition or as an additional rendition. For details, see "Setting XML Files as the Primary Web-Viewable Rendition" and "Setting XML Files as an Additional Rendition".