23 Working with Conversions

When using Oracle WebCenter Content: Inbound Refinery, you can configure and manage several conversion operations, including PDF conversion, XML conversion, TIFF conversion, and conversion of Microsoft Office files to HTML. This chapter discusses the tasks involved in managing those conversion types.

Note:

Native conversions fail when Inbound Refinery is run as a service on win64 platforms, because services on win64 platforms do not have access to printer services. If you perform native conversions, do not run Inbound Refinery as a service.

For additional information describing the different types of conversion, how and where they are performed, and the advantages of each type, see the "Conversions in WebCenter Content" blog.


23.1 Managing PDF Conversions

Inbound Refinery can convert native files to PDF either by exporting them directly to PDF using Oracle Outside In PDF Export (included with Inbound Refinery), or by using third-party applications to output the native file to PostScript and then using a third-party PDF distiller engine to convert the PostScript file to PDF.

PDF conversions require one or more of the following components, depending on the conversion method used. Each component must be installed and enabled on the Inbound Refinery server.

PDFExportConverter

Enables Inbound Refinery to use Oracle Outside In to convert native formats directly to PDF without any third-party tools. PDF Export is fast, multi-platform, and allows concurrent conversions.

WinNativeConverter

Enables Inbound Refinery to convert native files to a PostScript file with either the native application or OutsideInX, and to convert the PostScript file to PDF using a third-party distiller engine. This component is for the Windows platform only. It replaces the functionality previously available in the deprecated PDFConverter component.

WinNativeConverter offers the best rendition quality of all PDF conversion options when used with the native application on a Windows platform. It does not allow concurrent conversions.

WinNativeConverter also enables Inbound Refinery to convert native Microsoft Office files created with Word, Excel, PowerPoint, and Visio to HTML using the native Office application.

OpenOfficeConversion

Provides cross-platform support that allows Inbound Refinery to convert supported files to PDF using OpenOffice. Like WinNativeConverter, OpenOfficeConversion does not allow concurrent conversions; unlike WinNativeConverter, it supports UNIX platforms.



23.1.1 PDF Conversion Considerations

There are several factors to consider when choosing a PDF conversion method. System performance (the time it takes to convert a file to PDF format), the fidelity of the PDF output (how closely it matches the look and formatting of the native file), what native applications are needed (such as Microsoft Word or PowerPoint, used to generate the PostScript file converted by Inbound Refinery), and the platform a conversion application requires should all be taken into consideration.

If the speed of conversion is a primary concern, using PDF Export to convert original files directly to PDF is fastest. In addition to not having to use third-party tools, PDF Export allows concurrent PDF conversions and supports Windows, Linux and UNIX platforms.

If the fidelity of the PDF output is a primary concern, then using the native application to open the original file, output to PostScript, and convert the PostScript to PDF is the best option. However, this method is limited to the Windows platform and it cannot run concurrent PDF conversions.

If conversion must be done on a UNIX platform, then using OpenOffice to open a native file and export directly to a PDF file may be the best option. Depending on how it is set up, it may provide greater fidelity than PDF Export. However, unlike PDF Export, it does not support concurrent PDF conversions. Table 23-1 compares conversion methods and lists the platforms they support.

Note:

Regardless of the conversion option used, a PDF is a web-ready version of the native format. A converted PDF should not be expected to be an exact replica of the native format. Many factors such as font substitutions, complexity and format of embedded graphics, table structure, or issues with third-party distiller engines may cause the PDF output to differ from the native format.

Table 23-1 PDF Conversion Methods

Conversion Method               Performance   Fidelity   Supported Platforms   Concurrent PDF Conversions
PDF Export                      Best          Good       Windows/UNIX          Yes
3rd-Party Native Applications   Good          Best       Windows               No
OpenOffice                      Good          Good       Windows/UNIX          No
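To make the trade-offs in Table 23-1 concrete, here is a small Python sketch that encodes the table, filters the methods by platform and by whether concurrent conversions are required, and ranks them by the attribute you care about most. The `candidates` function and its parameters are hypothetical illustrations, not part of Inbound Refinery; the sketch reflects only the ratings in the table and ignores per-format support and the fidelity caveats in the note above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Method:
    name: str
    performance: str  # "Best" or "Good", as rated in Table 23-1
    fidelity: str     # "Best" or "Good", as rated in Table 23-1
    platforms: frozenset
    concurrent: bool


# Ratings transcribed from Table 23-1.
METHODS = [
    Method("PDF Export", "Best", "Good", frozenset({"windows", "unix"}), True),
    Method("3rd-Party Native Applications", "Good", "Best", frozenset({"windows"}), False),
    Method("OpenOffice", "Good", "Good", frozenset({"windows", "unix"}), False),
]


def candidates(platform, need_concurrency=False, prefer="performance"):
    """Return usable methods, those rated "Best" for `prefer` first."""
    usable = [
        m for m in METHODS
        if platform in m.platforms and (m.concurrent or not need_concurrency)
    ]
    # sorted() is stable, so methods with equal ratings keep the table's order.
    return sorted(usable, key=lambda m: getattr(m, prefer) != "Best")


print([m.name for m in candidates("windows", prefer="fidelity")])
```

For example, on UNIX with concurrency required, only PDF Export remains, which matches the guidance in 23.1.1.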

23.1.2 Configuring PDF Conversion Settings


23.1.2.1 Configuring Content Servers to Send Jobs to Inbound Refinery

File extensions, file formats, and conversions are used in Content Server to define how content items should be processed by Inbound Refinery and its conversion add‐ons. Each Content Server must be configured to send files to refineries for conversion. When a file extension is mapped to a file format and a conversion, files of that type are sent for conversion when they are checked into the Content Server. Use either the File Formats Wizard or the Configuration Manager to set the file extension, file format, and conversion mappings.

All conversions required for Inbound Refinery are available by default in both Content Server and Inbound Refinery. For more information about configuring file extensions, file formats, and conversions in your Content Servers, see About MIME Types and Managing File Types.

Conversions available in the Content Server should match those available in the refinery: for every conversion that Content Server sends on check-in, one or more refineries must be set up to accept it. Set the conversions that the refinery accepts, and the queue maximums, on the Conversion Listing page.

For more information about setting accepted conversions, see Setting Accepted Conversions.
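The extension-to-format-to-conversion chain described above can be modeled as two lookups, plus a check that every conversion Content Server can send is accepted by at least one refinery. This Python sketch uses made-up extension, format, and conversion names purely for illustration; the real mappings are maintained with the File Formats Wizard or the Configuration Manager, not in code.

```python
import os

# Illustrative mappings only; real values live in Content Server's configuration.
EXTENSION_TO_FORMAT = {"doc": "application/msword", "txt": "text/plain"}
FORMAT_TO_CONVERSION = {"application/msword": "OfficeToPDF", "text/plain": "TextToPDF"}


def conversion_for(filename):
    """Follow extension -> format -> conversion; None means the file is not sent."""
    extension = os.path.splitext(filename)[1].lstrip(".").lower()
    file_format = EXTENSION_TO_FORMAT.get(extension)
    return FORMAT_TO_CONVERSION.get(file_format)


def unaccepted_conversions(refineries):
    """Conversions Content Server can send that no refinery accepts.

    `refineries` maps a refinery name to the set of conversions it accepts.
    """
    accepted = set().union(*refineries.values())
    return set(FORMAT_TO_CONVERSION.values()) - accepted


print(conversion_for("Report.DOC"))
print(unaccepted_conversions({"refinery1": {"OfficeToPDF"}}))
```

A non-empty result from `unaccepted_conversions` points to a conversion that should be enabled on some refinery's Conversion Listing page.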

23.1.2.2 Setting PDF Files as the Primary Web‐Viewable Rendition

To set PDF files as the primary web‐viewable rendition:

  1. Log in to the refinery.
  2. Select Conversion Settings, then select Primary Web Rendition.
  3. On the Primary Web-Viewable Rendition page, select one or more of the following conversion methods. For a conversion method to be available, the associated components must be installed and enabled:
    • Convert to PDF using PDF Export: when running on either Windows or UNIX, Inbound Refinery uses Outside In PDF Export to convert files directly to PDF without the use of third-party applications. PDFExportConverter must be enabled on the refinery server.

    • Convert to PDF using third-party applications: when running on Windows, Inbound Refinery can use several third-party applications to create PDF files of content items. In most cases, a third‐party application that can open and print the file is used to print the file to PostScript, and then the PostScript file is converted to PDF using the configured PostScript distiller engine. In some cases, Inbound Refinery can use a third-party application to convert a file directly to PDF. For this option to be available, WinNativeConverter must be enabled on the refinery server. In addition, when using this option, Inbound Refinery requires the following:

      • A PostScript distiller engine.

      • A PostScript printer.

      • The third‐party applications used during the conversion.

    • Convert to PDF using OpenOffice: when running on either Windows or UNIX, Inbound Refinery can use OpenOffice to convert some file types directly to PDF. For this option to be available, OpenOfficeConversion must be installed on the refinery server. When using this option, Inbound Refinery requires only OpenOffice.

    • Convert to PDF using Outside In: Inbound Refinery includes Outside In, which can be used with WinNativeConverter on Windows to create PDF files of some content items. Outside In is used to print the files to PostScript, and then the PostScript files are converted to PDF using the configured PostScript distiller engine. When using this option, Inbound Refinery requires only a PostScript distiller engine.

    Inbound Refinery attempts to convert each incoming file based on the conversion method assigned to the format by the Content Server. If the format is not supported for conversion by the first selected method, Inbound Refinery checks to see if the next selected method supports the format, and so on. Inbound Refinery will attempt to convert the file using the first selected method that supports the conversion of the format.

    For example, suppose you select both the Convert to PDF using third-party applications option and the Convert to PDF using Outside In option, and then send a Microsoft Word file to the refinery for conversion. Because the Microsoft Word file format is supported for conversion to PDF using a third-party application (Microsoft Word), Inbound Refinery attempts to use the Convert to PDF using third-party applications method to convert the file to PDF as the primary web-viewable rendition.

    If this method fails, Inbound Refinery does not attempt the Convert to PDF using Outside In method. However, if you send a JustWrite file to the refinery for conversion, this file format is not supported for conversion to PDF using the Convert to PDF using third-party applications method, so Inbound Refinery will check to see if this format is supported by the Convert to PDF using Outside In method. Because this format is supported by Outside In, Inbound Refinery will attempt to convert the file to PDF using Outside In.

  4. Click Update to save your changes.
  5. When using the Convert to PDF using Third-Party Applications method or the Convert to PDF using Outside In method, click the corresponding PDF Web-Viewable Options button.
  6. On the PDF Options page, set your PDF options, and click Update to save your changes.
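The selection rule in step 3 amounts to "the first selected method that supports the format wins, and a failure does not fall through to the next method". A minimal Python sketch, assuming the methods are considered in the order they are listed on the Primary Web-Viewable Rendition page; the method and format labels and the `run` callback are hypothetical.

```python
# Hypothetical method and format labels, for illustration only.
SUPPORTED = {
    "third-party applications": {"word"},
    "outside in": {"word", "justwrite"},
}


def pick_method(file_format, selected_methods, supported=SUPPORTED):
    """Return the first selected method that supports file_format, or None."""
    for method in selected_methods:
        if file_format in supported.get(method, set()):
            return method
    return None


def convert(file_format, selected_methods, run, supported=SUPPORTED):
    """run(method) -> bool performs the conversion; returns (status, method)."""
    method = pick_method(file_format, selected_methods, supported)
    if method is None:
        return "not converted", None
    # Only the chosen method is tried; a failure is not retried with a later method.
    return ("converted" if run(method) else "failed"), method


selected = ["third-party applications", "outside in"]
print(convert("justwrite", selected, run=lambda method: True))
```

This mirrors the Word and JustWrite example above: a failed Word conversion is reported as failed rather than retried with Outside In.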

23.1.2.3 Installing a Distiller Engine and PDF Printer

When converting documents to PDF using WinNativeConverter, a distiller engine and PDF printer must be obtained, installed and configured. This is not necessary when converting to PDF using either Outside In PDF Export or OpenOffice to open and save documents to PDF.

WinNativeConverter can use several third-party applications to create PDF files of content items. In most cases, a third-party application that can open and print the file is used to print the file to PostScript, and then the PostScript file is converted to PDF using the configured PostScript distiller engine. In some cases, WinNativeConverter can use a third-party application to convert a file directly to PDF.

Note:

A distiller engine is not provided with Inbound Refinery; you must obtain a distiller engine of your choice. The chosen distiller engine must be able to run conversions from the command line. The procedures in this section use AFPL Ghostscript as an example. It is a free, robust distiller engine that performs both PostScript-to-PDF conversion and optimization of PDF files during or after conversion.

To install the PDF printer:

  1. Obtain and install a distiller engine on the computer where Inbound Refinery has been deployed.
  2. Start the SystemProperties utility:
    • Microsoft Windows: Choose Start then Programs then Oracle Content Server. Choose refinery_instance then Utilities then System Properties.

  3. Open the Printer tab.
  4. Click Browse next to the Printer Information File field and navigate to the printer information file installed with your distiller engine.
  5. Enter a name for the printer in the Printer Name field.
  6. Enter the name of the printer driver in the Printer Driver Name field. This name should match the name used in the printer driver information file.
  7. Enter the port path in the Printer File Port Path field. For example, c:\temp\idcout.ps
  8. Click Install Printer and follow the printer install instructions when prompted.

    Note:

    After a printer is installed, the fields on the System Properties Printer tab are disabled. If the installed printer is deleted, the Printer tab is enabled again and the printer must be reinstalled.

  9. Click OK to apply the change and exit System Properties.

23.1.2.4 Configuring Third‐Party Application Settings

To change third‐party application settings:

  1. Log in to the refinery.
  2. Select Conversion Settings then Third‐Party Application Settings.
  3. On the Third-Party Application Settings page, click Options for the third‐party application.
  4. Change the third‐party application options.
  5. Click Update to save your changes.

23.1.2.5 Configuring Timeout Settings for PDF Conversions

To configure timeout settings for PDF file generation:

  1. Log in to the refinery.
  2. Select Conversion Settings then Timeout Settings.
  3. On the Timeout Settings page, enter the Minimum (in minutes), Maximum (in minutes), and Factor for the following conversion operations:
    • Native to PostScript: the stage in which the original (native) file is converted to a PostScript (PS) file.

    • PostScript to PDF: the stage in which the PS file is converted to a Portable Document Format (PDF) file.

    • FrameMaker to PostScript: these values apply to the conversion of Adobe FrameMaker files to PS files.

    • PDF to Post Production: the stage in which any processing is performed after the file has been converted to PDF format.

  4. Click Update to save your changes.

23.1.2.6 Setting Margins When Using Outside In

Inbound Refinery includes Outside In version 8.3.2. When using Outside In to convert graphics to PDF, you can set the margins for the generated PDF from 0–4.23 inches or 0–10.76 cm. By default, Inbound Refinery uses 1‐inch margins on the top, bottom, right, and left.

To adjust these margins:

  1. Use a text editor to open the intradoc.cfg file located in the refinery DomainDir/ucm/ibr/bin directory.
  2. Change the following settings:
    OIXTopMargin=
    OIXBottomMargin=
    OIXLeftMargin=
    OIXRightMargin=
    
  3. To change the margin units from inches to centimeters, set the following:
     OIXMarginUnitInch=false
    
  4. Save your changes to the intradoc.cfg file.
  5. Restart the refinery.
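If you manage several refineries, generating the margin entries from a script helps keep them within the documented range (0 to 4.23 inches, or 0 to 10.76 cm). A minimal Python sketch; the `margin_settings` helper is hypothetical, and it assumes intradoc.cfg accepts plain decimal values such as 0.5.

```python
# Upper limits stated in this section for Outside In margins.
MAX_MARGIN = {"in": 4.23, "cm": 10.76}


def margin_settings(top, bottom, left, right, unit="in"):
    """Return intradoc.cfg lines for the four margins, validating the range."""
    limit = MAX_MARGIN[unit]
    values = {
        "OIXTopMargin": top,
        "OIXBottomMargin": bottom,
        "OIXLeftMargin": left,
        "OIXRightMargin": right,
    }
    lines = []
    for key, value in values.items():
        if not 0 <= value <= limit:
            raise ValueError(f"{key}={value} is outside 0-{limit} {unit}")
        lines.append(f"{key}={value:g}")
    if unit == "cm":
        # Inches are the default unit; centimeters must be requested explicitly.
        lines.append("OIXMarginUnitInch=false")
    return "\n".join(lines)


print(margin_settings(1, 1, 0.5, 0.5))
```

Paste the output into intradoc.cfg and restart the refinery as described in the steps above.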

23.1.3 Configuring OpenOffice


23.1.3.1 OpenOffice Configuration Considerations

The OpenOffice Listener must be running on the Inbound Refinery computer, or PDF conversion will fail. When running OpenOffice on Windows, configure an OpenOffice port in the Setup.xcu (or main.xcd) file and run the OpenOffice Quickstarter. The Quickstarter adds shortcuts to OpenOffice applications to the system tray and runs the OpenOffice Listener as a background process.

By default, the Quickstarter loads at system startup and the OpenOffice icon should be in the system tray. To start the Quickstarter, launch any OpenOffice application. The application can then be closed, and the Quickstarter remains running. To set the Quickstarter to load at system startup, right-click the OpenOffice icon in the system tray, and choose Load OpenOffice.org During System Start-Up.

Note:

OpenOffice can be launched by Inbound Refinery running as a service on Windows XP, 2000, and 2003. However, because you must be logged in to Windows to run the OpenOffice Listener, you must remain logged in to Windows when using OpenOffice for PDF conversion, even when Inbound Refinery runs as a service.

23.1.3.2 Configuring the OpenOffice Port and Setting up the Listener

When running OpenOffice on UNIX, it is recommended that you configure an OpenOffice port and run soffice, which acts as the Listener. On Windows, soffice can also be used instead of the Quickstarter.

To start soffice, launch the soffice executable (soffice.exe on Windows) located in the following directory:

Windows: OpenOffice_install_dir\openoffice.org3\program\

UNIX: OpenOffice_install_dir/openoffice.org3/program

Note:

For versions of OpenOffice prior to 3.x, the soffice executable is located in the following directories:

Windows: OpenOffice_install_dir\program\

UNIX: OpenOffice_install_dir/program

Editing Setup.xcu or main.xcd

Prior to version 3.3 of OpenOffice, the file Setup.xcu was used to configure a listening port. Starting with version 3.3, Setup.xcu was incorporated into the file main.xcd. If configuring a version of OpenOffice prior to 3.3, then the steps below apply to editing the Setup.xcu file. If configuring version 3.3 or later versions of OpenOffice, the steps below apply to editing the main.xcd file.

To configure an OpenOffice port:

  1. In a standard text editor, open the Setup.xcu file (for versions prior to 3.3) or the main.xcd file (for version 3.3 or higher) of OpenOffice. The Setup.xcu file is located in the following directory:
    • Windows: OpenOffice_install_dir\share\registry\data\org\openoffice\

    • UNIX: OpenOffice_install_dir/share/registry/data/org/openoffice

    The main.xcd file is located in the following directory:

    • Windows: OpenOffice_install_dir\openoffice.org\basisversion_number\share\registry

    • UNIX: OpenOffice_install_dir/openoffice.org/basisversion_number/share/registry

  2. Search for the element <node oor:name="Office">. This element contains several <prop/> elements.
  3. Insert the following <prop/> element on the same level as the existing elements, as the first element:
    <prop oor:name="ooSetupConnectionURL" oor:type="xs:string">
    <value>socket,host=localhost,port=8100;urp;</value>
    </prop>
    

    This configures OpenOffice to provide a socket on port 8100, where it serves connections via the UNO remote protocol (URP). Make sure your firewall blocks connections to port 8100 from outside your network. Using port 8100 is recommended; however, if port 8100 is already in use, replace 8100 in the element with an available port number.

  4. After making changes to the Setup.xcu or main.xcd file, stop and restart the Quickstarter (Windows) or soffice (UNIX or Windows).

23.1.3.3 Setting Port for Session Using soffice Command Line Parameters

As an alternative to configuring an OpenOffice port in the Setup.xcu file and then running the OpenOffice Quickstarter (Windows) or soffice (UNIX or Windows), soffice can be launched from the command line with parameters. However, these settings only apply to the current session. To launch soffice from the command line:

  1. Open a command window and navigate to the following directory:
    • Windows: OpenOffice_install_dir\openoffice.org3\program\

    • UNIX: OpenOffice_install_dir/openoffice.org3/program

  2. Enter the following command:
    soffice "-accept=socket,port=8100;urp;"
    
  3. Verify that OpenOffice is listening on the specified port by opening a command window and entering one of the following commands:
    netstat -a
    netstat -na
    

    Output similar to the following shows that OpenOffice is listening:

    TCP <Hostname>:8100 <Fully qualified hostname>: 0 Listening
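The netstat check above can also be scripted. This Python sketch simply tries to open a TCP connection to the listener port (8100 by default in this section). A successful connection shows that something is listening there; it does not confirm that the process is OpenOffice speaking URP. The `is_listening` name is hypothetical.

```python
import socket


def is_listening(host="localhost", port=8100, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, unreachable host, or timeout.
        return False


print(is_listening())
```

If you changed the port in Setup.xcu or main.xcd, pass that port number instead of relying on the default.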
    

23.1.3.4 Configuring Inbound Refinery to Use OpenOffice

To configure Inbound Refinery to use OpenOffice:

  1. If port 8100 was not used when modifying the OpenOffice Setup.xcu (or main.xcd) file, do the following:

    1. In the Inbound Refinery administration interface, select Conversion Settings then Third-Party Application Settings.

    2. On the Third-Party Application Settings page, click the Options button for OpenOffice.

    3. On the OpenOffice Options page, in the Port to Connect to the OpenOffice Listener field, enter the port that you used when modifying the OpenOffice Setup.xcu (or main.xcd) file.

    4. Click Update.

  2. Restart Inbound Refinery.

23.1.3.5 Setting Classpath to OpenOffice Class Files

If converting documents using OpenOffice, Oracle Inbound Refinery requires class files distributed with OpenOffice. You must set the path to the OpenOffice class files in the refinery intradoc.cfg file, located in the DomainHome/ucm/ibr/bin directory. To set the path in the intradoc.cfg file:

  1. Navigate to the DomainHome/ucm/ibr/bin directory and open the intradoc.cfg file in a standard text editor.
  2. At the end of the file, enter the following:
    JAVA_CLASSPATH_openoffice_jars=OfficePath/Basis/program/classes/unoil.jar:OfficePath/URE/java/ridl.jar:OfficePath/URE/java/jurt.jar:OfficePath/URE/java/juh.jar
    

    Note:

    The true value for OfficePath is likely to include spaces and care must be taken when setting this in a Microsoft Windows environment. Ensure that the paths are not enclosed in quotes, that slashes (/) are used for path separators and not backslashes (\), and that any space in the path is escaped using a backslash (\). For example, a properly formed classpath in a Windows environment could look like this:

    JAVA_CLASSPATH_openoffice_jars=C:/Program\ Files/OpenOffice.org\ 3/Basis/program/classes/unoil.jar:C:/Program\ Files/OpenOffice.org\ 3/URE/java/ridl.jar:C:/Program\ Files/OpenOffice.org\ 3/URE/java/jurt.jar:C:/Program\ Files/OpenOffice.org\ 3/URE/java/juh.jar
    
  3. Save and close the intradoc.cfg file.
  4. Restart Inbound Refinery.
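Because the escaping rules for OfficePath are easy to get wrong by hand, you could generate the entry instead. This Python sketch is a hypothetical helper, not an Oracle tool: it converts backslashes to forward slashes, escapes spaces with a backslash, and joins the four JAR paths with the colon separator used in the examples in this section.

```python
# JAR locations relative to OfficePath, as listed in step 2.
OPENOFFICE_JARS = [
    "Basis/program/classes/unoil.jar",
    "URE/java/ridl.jar",
    "URE/java/jurt.jar",
    "URE/java/juh.jar",
]


def openoffice_classpath_line(office_path: str) -> str:
    # Convert separators first, so the escape backslashes added below survive.
    base = office_path.replace("\\", "/").rstrip("/")
    # Escape each space with a backslash; the value must not be quoted.
    base = base.replace(" ", "\\ ")
    entries = [f"{base}/{jar}" for jar in OPENOFFICE_JARS]
    return "JAVA_CLASSPATH_openoffice_jars=" + ":".join(entries)


print(openoffice_classpath_line(r"C:\Program Files\OpenOffice.org 3"))
```

For the Windows path shown, the output matches the example line in the note above.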

23.1.3.6 Using OpenOffice Without Logging In to Host

Inbound Refinery can use OpenOffice to convert some file types directly to PDF. This is done by configuring the OpenOffice listener, which must be running in order for conversions to be successful. Typically, you must be logged in to the computer on which OpenOffice is installed in order for OpenOffice to be able to open and process any documents. However, the OpenOffice listener can be run in headless mode with no graphical user interface.

Note:

Before setting up the OpenOffice listener to run in headless mode, confirm that documents can be converted to PDF using OpenOffice running in non-headless mode. Also, turn off any extra screens that appear before OpenOffice can be used, such as startup dialogs, tip wizards, or update notices. Conversions cannot proceed until these screens are dismissed, and in headless mode they are never displayed, so they cause the refinery conversion to time out.

The following information describes how to set up headless mode on a Windows host and on a UNIX host.

23.1.3.6.1 Setting Up Headless Mode on a Windows Host

To convert documents to PDF using OpenOffice without being logged in to a Windows host, you must create a custom service to run the OpenOffice listener in headless mode. The Windows Resource Kits provide the INSTSRV.EXE and SRVANY.EXE utilities to create custom services.

To set up a custom OpenOffice service:

  1. In the MS-DOS command prompt, type the following command:

    path\INSTSRV.EXE service_name path\SRVANY.EXE
    

    where path is the path to the Windows Resource Kit, and service_name is the name of your custom service. This name can be anything, but should be descriptive to identify the service. When done, a new service key is created in your Windows registry.

  2. Open the Registry Editor by selecting Start, then Run, entering regedit, and clicking OK.

    Note:

    Back up your registry before editing it.

  3. Back up your registry by choosing File then Export, entering a name for the backup file, and clicking Save. Note where the backup file is saved in case you need to restore the registry.

  4. Navigate to the new registry key created in the first step and select the new service key. The new key is located at:

    HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\service_name

  5. With the new key selected, choose Edit then New, then select Key, and name it Parameters.

  6. Right-click on the Parameters key, select New, then select String Value, and name the value Application.

  7. Right-click on the Application string and select Modify.

  8. Type in the full path to soffice.exe, appended with -headless. For example:

    C:\Program Files\OpenOffice2.0\program\soffice.exe -headless
    
  9. Close the Registry Editor and restart the computer.

  10. After the computer has successfully restarted, choose Start then Settings then Control Panel. Choose Administrative Tools then Services to open Windows Services.

  11. On the Windows Services page, right-click the service you just created, choose Properties, and ensure that the service is set up to start automatically.

  12. Select the Log On tab and enable This account. This enables the service to run using a specific user account.

  13. Enter the credentials of the same user account that Inbound Refinery runs under.

    Note:

    The Inbound Refinery user will need to have the right to log on as a service on the Inbound Refinery computer.

  14. Start the service, accept the changes and close Windows Services.
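If you prefer to script steps 4 through 8, Python's standard `winreg` module can create the Parameters key and the Application value. This is a sketch under stated assumptions: it must run on Windows with administrator rights, after INSTSRV.EXE has created the service key, and after you have backed up the registry. The function names, service name, and soffice.exe path are placeholders.

```python
import sys


def srvany_parameters(service_name: str, soffice_exe: str):
    """Return (subkey, value name, value data) for the service's Parameters key."""
    subkey = "SYSTEM\\CurrentControlSet\\Services\\" + service_name + "\\Parameters"
    # As described in step 8: the full path to soffice.exe with -headless appended.
    return subkey, "Application", soffice_exe + " -headless"


def write_parameters(service_name: str, soffice_exe: str) -> None:
    if sys.platform != "win32":
        raise OSError("The Windows registry is only available on Windows")
    import winreg  # standard library, Windows only

    subkey, name, data = srvany_parameters(service_name, soffice_exe)
    # CreateKey opens the key if it already exists, otherwise creates it.
    with winreg.CreateKey(winreg.HKEY_LOCAL_MACHINE, subkey) as key:
        winreg.SetValueEx(key, name, 0, winreg.REG_SZ, data)


print(srvany_parameters("OOoHeadless", r"C:\Program Files\OpenOffice2.0\program\soffice.exe"))
```

The remaining steps (restarting, setting the service to start automatically, and configuring the Log On account) are still done in Windows Services.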

23.1.3.6.2 Setting Up Headless Mode on a UNIX Host

To convert documents to PDF using OpenOffice without being logged in to a UNIX host, the OpenOffice listener must run in headless mode with no graphical user interface, using a virtual frame buffer display (a virtual X server).

Note:

Each UNIX environment is unique. This information is a general guideline for setting up the OpenOffice listener in headless mode on UNIX platforms. An example of the procedure for Red Hat EL4 is also included.

In general, to configure the OpenOffice listener to run in headless mode on UNIX platforms:

Note:

Before setting up OpenOffice to run in headless mode, ensure that Inbound Refinery is installed and configured correctly to successfully convert documents to PDF using OpenOffice in non-headless mode.

  1. Create a startup script to run Inbound Refinery when the system boots up.

  2. Configure a virtual X server and create a startup script to run it when the system boots up, to enable OpenOffice to run.

  3. Create a startup script to run OpenOffice in headless mode when the system boots up.

  4. Configure the system to run the startup scripts in the following order:

    1. Start Inbound Refinery

    2. Start the virtual X server

    3. Start OpenOffice

      Note:

      The virtual X server must be started before OpenOffice, or OpenOffice will not run. Also make sure that the web server is configured to start when the system boots up.
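The ordering requirement for steps 2 and 3 can be captured in a small launcher. This Python sketch assumes Xvfb as the virtual X server, display :1, and the single-dash soffice options used by OpenOffice 3.x; all three are assumptions to adapt to your system. Inbound Refinery itself is expected to have been started beforehand by its own startup script.

```python
import os
import subprocess
import time

VIRTUAL_DISPLAY = ":1"  # assumption: a display number not used by another X server


def startup_commands(soffice: str = "soffice", port: int = 8100):
    """Commands in the required order: the virtual X server first, then OpenOffice."""
    return [
        ["Xvfb", VIRTUAL_DISPLAY, "-screen", "0", "1024x768x24"],
        [soffice, "-headless", f"-accept=socket,host=localhost,port={port};urp;"],
    ]


def start_in_order(soffice: str = "soffice", port: int = 8100, settle: float = 2.0):
    # OpenOffice renders to the virtual display instead of a real screen.
    env = dict(os.environ, DISPLAY=VIRTUAL_DISPLAY)
    processes = []
    for command in startup_commands(soffice, port):
        processes.append(subprocess.Popen(command, env=env))
        time.sleep(settle)  # crude pause so each process is up before the next starts
    return processes
```

A fixed pause is the simplest way to respect the order; a sturdier script would poll until the X server accepts connections before launching soffice.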

23.1.4 Converting Microsoft Office Files to PDF

When running on Windows, Inbound Refinery can use Microsoft Office to convert Microsoft Office files to PDF files. The following Microsoft Office versions are supported:

  • Microsoft Office 2003

  • Microsoft Office 2007

  • Microsoft Office 2010

Note:

Support for Microsoft Office 2007 excludes support for Microsoft Project 2007.

Note the following important general considerations:

  • Microsoft Office is used to convert Microsoft Office files to PDF when the Convert to PDF Using third-party applications option is selected on the Primary Web-Viewable Rendition page.

  • Inbound Refinery can convert a number of special features in Microsoft Office files into links in the generated PDF files. You set the conversion options for Microsoft Office files using the Third-Party Application Settings page.

  • To keep a conversion of a Microsoft Office file from timing out, disable all functions that require user input. These include password protection, security notifications (such as warnings about disabled macros), and online-access prompts, such as requests to show online content or to participate in user feedback programs. For details on how to disable these and similar features, see the Microsoft documentation for each product.

  • If a Microsoft Office file was converted to a PDF file successfully, but one or more links in the file could not be converted to links in the PDF file, the conversion status of that file is set to Incomplete. To prevent this from happening, you can set AllowSkippedHyperlinkToCauseIncomplete=false in the intradoc.cfg configuration file located in the refinery DomainDir\ucm\ibr\bin directory.
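When several intradoc.cfg flags need to change (for example AllowSkippedHyperlinkToCauseIncomplete here, or ProcessWordUncLinks and ProcessExcelUncLinks later in this section), a small script can set them idempotently. This Python sketch is a hypothetical helper that treats intradoc.cfg as simple key=value lines: it replaces an existing entry for the key or appends a new one. Back up the file before running it.

```python
import re
from pathlib import Path


def set_cfg_value(cfg_text: str, key: str, value: str) -> str:
    """Replace an existing key=value line, or append one if the key is absent."""
    line = f"{key}={value}"
    pattern = re.compile("^" + re.escape(key) + "=.*$", re.MULTILINE)
    if pattern.search(cfg_text):
        # A function replacement stops re from interpreting backslashes in value.
        return pattern.sub(lambda _match: line, cfg_text)
    separator = "" if not cfg_text or cfg_text.endswith("\n") else "\n"
    return cfg_text + separator + line + "\n"


def update_cfg_file(path, settings: dict) -> None:
    cfg = Path(path)
    text = cfg.read_text()
    for key, value in settings.items():
        text = set_cfg_value(text, key, value)
    cfg.write_text(text)
```

For example, calling `update_cfg_file` on the refinery's intradoc.cfg with `{"AllowSkippedHyperlinkToCauseIncomplete": "false"}` applies the setting described above; restart the refinery afterwards, as with the other intradoc.cfg changes in this chapter.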


23.1.4.1 Converting Microsoft Word Files to PDF

Consider the following when running Inbound Refinery on Windows and using Microsoft Word to convert Word files to PDF:

  • Any information in a Word file that is outside of the document's print area will not be converted to PDF.

  • Password-protected files will time out unless the need for a password is removed.

  • On Word 2003, choose Tools then Options then General. Turn off Show content and links from Microsoft Online under the Online category, and opt out of the Customer Experience Improvement Program under the Customer Feedback category. If you do not, these files might time out.

  • The following types of links in Word files can be converted to PDF:

    • Absolute URL links (for example, http://www.example.com). You can also use links that specify targets on the page (for example http://idvm001/ibr/portal.htm#target). In order to be processed as an absolute URL link, Word must return the http:// prefix as a part of the link. All supported versions of Microsoft Word automatically enforce this rule.

    • Relative URL links (for example, ../../../../portal.htm). These links do not contain any server name or protocol prefix.

    • Mailto links (links to email addresses; for example mailto:support@example.com). In order to be processed as an email link, Word must return the mailto: prefix as a part of the link. All supported versions of Microsoft Word automatically enforce this rule.

    • Table of Contents links (converted to bookmarks in the generated PDF file).

    • Bookmarks (internal links to auto-generated or author-generated bookmarks).

    • Standard heading styles (Heading 1, Heading 2, and so on, which are converted to bookmarks in the generated PDF file).

    • Links to footnotes and endnotes.

    • UNC path links (for example, \\server1\c\TestDocs\MSOfficeXP\word\target.doc). This option is not currently available on the Word Options panel. To enable this functionality, you must set the ProcessWordUncLinks=true variable in the refinery connection's intradoc.cfg file (DomainHome\ucm\ibr\bin\intradoc.cfg). In general, UNC paths have no relevance in a web browser; a UNC path is not a URL. Therefore, the PDF must be opened outside of the web browser for UNC path links to be resolved correctly. If you are using UNC path links, you might want to configure the Reader on client computers to open PDF files outside the browser.

  • Links in text boxes are not converted.

  • Links within nested tables are not converted. If it is critical to convert links within nested tables, consider using Oracle Outside In PDF Export (included with Inbound Refinery) to convert MS Word documents to PDF, rather than the native MS Word application. For information about different conversion options, see Managing PDF Conversions. Alternatively, Adobe Reader has a general preference that allows it to enable links formatted in the document; see the documentation for your version of Adobe Reader.

  • Linked AutoShapes and objects (for example, pictures or WordArt objects) located in tables are not converted.

  • In some generated PDF files, the hotspot for a link might be slightly offset from the actual text (by a character or two). To date, there are no known problems related to this behavior, and there is currently no fix.
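The prefix rules above can be summarized as a small classifier. This Python sketch is only an approximation built from the rules listed in this section (for example, it recognizes only the http:// prefix the text mentions, and treats any other scheme as "other"); it is not how Inbound Refinery parses links internally, and `classify_link` is a hypothetical name.

```python
def classify_link(link: str) -> str:
    """Sort a Word link target into the categories listed in this section."""
    lower = link.lower()
    if lower.startswith("http://"):
        return "absolute URL"
    if lower.startswith("mailto:"):
        return "mailto"
    if link.startswith("\\\\"):
        # Converted only when ProcessWordUncLinks=true is set in intradoc.cfg.
        return "UNC path"
    if "://" in link or ":" in link.split("/")[0]:
        return "other"
    # No server name or protocol prefix.
    return "relative URL"


for target in ["http://www.example.com", "../../../../portal.htm", "mailto:support@example.com"]:
    print(target, "->", classify_link(target))
```

Links that land in "other" are not among the types this section lists as convertible.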

23.1.4.2 Converting Microsoft Excel Files to PDF

Consider the following when running Inbound Refinery on Windows and using Microsoft Excel to convert Excel files to PDF:

  • Any information in an Excel file that is outside of the document's print area will not be converted to PDF.

  • Password-protected files will time out unless the need for a password is removed.

  • On Excel 2003, choose Tools then Options then General. Turn off Show content and links from Microsoft Online under the Online category, and opt out of the Customer Experience Improvement Program under the Customer Feedback category. If you do not, these files might time out.

  • Only external links are converted to PDF links. This is because, in the current implementation, it is impossible (or extremely difficult) to know which page of the generated PDF file will contain the target of an internal link (bookmark).

  • Only the following types of links in Excel files can be converted to PDF:

    • Absolute URL links (for example, http://www.example.com). You can also use links that specify targets on the page (for example, http://idvm001/ibr/portal.htm#target). In order to be processed as an absolute URL link, Excel must return the http:// prefix as a part of the link. All supported versions of Microsoft Excel automatically enforce this rule.

    • Relative URL links (for example, ../../../../portal.htm). These links do not contain any server name or protocol prefix.

    • Mailto links (links to email addresses; for example mailto:support@example.com). In order to be processed as an email link, Excel must return the mailto: prefix as a part of the link. All supported versions of Microsoft Excel automatically enforce this rule.

      Note:

      The Excel Options panel does not have separate settings for absolute URL links and relative URL links. If the Process Excel URL Links option is selected, both absolute and relative URL links are converted to PDF.

    • UNC path links (for example, \\server1\c\TestDocs\MSOfficeXP\word\target.doc). This option is not currently available on the Excel Options panel. To enable this functionality, you must set the ProcessExcelUncLinks=true variable in the refinery connection's intradoc.cfg file (DomainHome\ucm\ibr\bin\intradoc.cfg). In general, UNC paths have no relevance in a web browser; a UNC path is not a URL. Therefore, the PDF must be opened outside of the web browser for UNC path links to be resolved correctly. If you are using UNC path links, you might want to configure the Reader on client computers to open PDF files outside the browser.

  • Links are only converted if they are located in cells. Links in text boxes, WordArt objects, and so forth are not converted. In the generated PDF file, the hotspot for the link is the cell that contains the link.

  • The Scaling on the Page Setup for the worksheet must be set to Adjust to: ###% normal size (not Fit to Page). The closer to 100% the scale is set, the better the results.

23.1.4.3 Converting Microsoft PowerPoint Files to PDF

Consider the following when running Inbound Refinery on Windows and using Microsoft PowerPoint to convert PowerPoint files to PDF:

  • Any information in a PowerPoint file that is outside of the document's print area will not be converted to PDF.

  • Password-protected files will time out unless the need for a password is removed.

  • On PowerPoint 2003, choose Tools then Options then General. Turn off Show content and links from Microsoft Online under the Online category, and opt out of the Customer Experience Improvement Program under the Customer Feedback category. If you do not, these files might time out.

  • PowerPoint has two types of links: Hyperlinks, which behave the same in all Office applications, and Action Settings. The MSOfficeConverter.exe supports the following Action Settings: Hyperlink to: Next Slide, Previous Slide, and URL. All other links should be inserted as hyperlinks.

  • The following types of hyperlinks in PowerPoint files can be converted to PDF:

    • Absolute URL links (for example, http://www.example.com). You can also use links that specify targets on the page (for example http://idvm001/ibr/portal.htm#target). In order to be processed as an absolute URL link, PowerPoint must return the http:// prefix as a part of the link. All supported versions of Microsoft PowerPoint automatically enforce this rule.

    • Relative URL links (for example, ../../../../portal.htm). These links do not contain any server name or protocol prefix.

    • Mailto links (links to email addresses; for example mailto:support@example.com). In order to be processed as an email link, PowerPoint must return the mailto: prefix as a part of the link. All supported versions of Microsoft PowerPoint automatically enforce this rule.

    • Bookmarks (internal links to auto-generated or author-generated bookmarks).

      Note:

      The PowerPoint Options panel does not have separate settings for absolute URL links and relative URL links. If the Process PowerPoint Hyperlinks option is selected, both absolute and relative URL links are converted to PDF.

    • UNC path links (for example, \\server1\c\TestDocs\MSOfficeXP\word\target.doc). This option is not currently available on the PowerPoint Options panel. To enable this functionality, you must set the ProcessPowerPointUncLinks=true variable in the refinery connection's intradoc.cfg file (DomainHome\ucm\ibr\bin\intradoc.cfg). In general, UNC paths have no relevance in a web browser; a UNC path is not a URL. Therefore, the PDF must be opened outside of the web browser for UNC path links to be resolved correctly. If you are using UNC path links, you might want to configure the Reader on client computers to open PDF files outside the browser.

  • PowerPoint hyperlinks can only be processed if PowerPoint presentations are converted in the Slides format.

  • It is technically possible to have a link on an object (for example, a text box) over a link on an individual line of text. Because of the way the PDF is assembled, only the link on the object is active in the generated PDF file. Any given spot in a PDF can be assigned only one action, so the top action is performed.

23.1.4.4 Converting Microsoft Visio Files to PDF

Consider the following when running Inbound Refinery on Windows and using Microsoft Visio to convert Visio files to PDF:

  • Any information in a Visio file that is outside of the document's print area will not be converted to PDF.

  • Visio files created using the Cross-Functional Flowchart template might cause a pop-up dialog to appear when Visio attempts to open the file, and thus the refinery process will time out unless this dialog is cleared manually on the refinery computer. To prevent this from happening, do not base files on the Cross-Functional Flowchart template. You can use Outside In to convert Visio files based on the Cross-Functional Flowchart template. Note that Visio links are not converted by Outside In.

  • Password-protected files will time out unless the need for a password is removed.

  • On Visio 2003, choose Tools then Options then General. Turn off Show content and links from Microsoft Online under the Online category, and opt out of the Customer Experience Improvement Program under the Customer Feedback category. If you do not, these files might time out.

  • The following types of links in Visio files can be converted to PDF:

    • Absolute URL links (for example, http://www.example.com). You can also use links that specify targets on the page (for example, http://idvm001/ibr/portal.htm#target). Visio does not enforce some of the link rules that are enforced by other Office applications. Therefore, the author of the Visio document must include the http:// prefix as part of the link. If the prefix is missing, the link is converted as a relative URL link, which will probably not work. Without this prefix, the conversion engine has no way to distinguish the link from relative and mailto links.

    • Relative URL links (for example, ../../../../portal.htm). These links do not contain any server name or protocol prefix.

    • Mailto links (links to email addresses; for example, mailto:support@example.com). Visio does not enforce some of the link rules that are enforced by other Office applications. Therefore, the author of the Visio document must include the mailto: prefix as part of the link. If the prefix is missing, the link is converted as a relative URL link, which will probably not work. Without this prefix, the conversion engine has no way to distinguish the link from absolute and relative links.

    • Bookmarks (internal links to auto-generated or author-generated bookmarks). When Process internal Visio links is selected on the Visio Options panel, all internal document links to other sheets are included in the generated PDF.

      Note:

      For proper conversion of internal links in Visio 2003, you must clear the address field when creating the link. By default, the address field is populated with the file name of the file in which the link occurs. If this is not cleared, the link is converted as a link to the original file and prompts the user to download the original file when the link is clicked.

    • UNC path links (for example, \\server1\c\TestDocs\MSOfficeXP\word\target.doc). This option is not currently available on the Visio Options panel. To enable this functionality, you must set the ProcessVisioUncLinks=true variable in the refinery connection's intradoc.cfg file (DomainHome\ucm\ibr\bin\intradoc.cfg). In general, UNC paths have no relevance in a web browser; a UNC path is not a URL. Therefore, the PDF must be opened outside of the web browser for UNC path links to be resolved correctly. If you are using UNC path links, you might want to configure the Reader on client computers to open PDF files outside the browser.

  • All Microsoft Visio files should be set so the printer paper size and orientation matches the drawing page size and orientation. Otherwise, links are not converted correctly (they are placed in the wrong location). For example, if the printer paper is set to Letter/Landscape, the drawing page should also be set to Letter/Landscape.

  • In the generated PDF file, the hotspot for a Visio link is a square that encompasses the shape, even if the shape itself is not a square.
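The prefix rules above, which the Visio bullets spell out most explicitly, amount to a simple classification of link addresses. The Python sketch below illustrates them; it is not the converter's implementation. Two assumptions are labeled in the code: prefixes are compared case-insensitively, and only http:// counts as absolute (this chapter does not say how https:// or other schemes are handled).

```python
from enum import Enum


class LinkKind(Enum):
    ABSOLUTE_URL = "absolute URL"
    MAILTO = "mailto"
    UNC_PATH = "UNC path"
    RELATIVE_URL = "relative URL"


def classify_link(address: str) -> LinkKind:
    """Illustrate the prefix rules described above; not the converter's own code."""
    text = address.strip()
    lowered = text.lower()  # assumption: prefixes are matched case-insensitively
    if lowered.startswith("http://"):  # assumption: only http:// is treated as absolute
        return LinkKind.ABSOLUTE_URL
    if lowered.startswith("mailto:"):
        return LinkKind.MAILTO
    if text.startswith("\\\\"):  # UNC paths begin with two backslashes
        return LinkKind.UNC_PATH
    # No recognized prefix: treated as relative. This is why a Visio link typed
    # as "www.example.com" (without http://) will probably not work.
    return LinkKind.RELATIVE_URL
```

For example, classify_link("www.example.com") falls through to LinkKind.RELATIVE_URL, which is the Visio pitfall described above.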

23.1.4.5 Using Relative versus Absolute Links in Office Documents

Both relative and absolute links can be converted to PDF in Word, Excel, PowerPoint, and Visio files.

  • Example absolute link:

    http://system/ucm/groups/public/documents/addacct/000123.pdf

  • Example relative link:

    ..\addacct\000123.pdf

When creating links, note that absolute and relative links each have advantages and disadvantages. Absolute links are easy to copy and paste. Relative links, however, avoid broken links if the Content Server is migrated to a new computer or if its IP address or DNS names change, because relative links are always resolved relative to the location of the web viewable file for the document you are checking in.

Note:

The following procedure applies to Inbound Refinery with Microsoft Office installed, using the Convert to PDF using third-party applications option. This procedure does not apply to configurations using Inbound Refinery on UNIX.

To use relative links in Word, Excel, PowerPoint, and Visio documents:

  1. Log into the refinery.

  2. Choose Conversion Settings then Third‐Party Application Settings.

  3. On the Third-Party Application Settings page, click Options for the third‐party application.

  4. Click Update to save your changes.

  5. Use relative links, instead of absolute links, when authoring documents. It is important to understand that these links are relative to the location of the web viewable file for the document being checked in:

    • Example 1: Relative linking with the same document type and security

      Suppose you want to link to document 000123, which was checked into security group "public" with document type "adacct". Its web viewable URL is:

      http://system/ucm/groups/public/documents/adacct/000123.pdf

      If you check document 000456 into the same security group and document type, its web viewable URL would be:

      http://system/ucm/groups/public/documents/adacct/000456.pdf

      Because the URL path is identical to that of 000123, the relative URL link in the document for 000456 only needs to be:

      000123.pdf

    • Example 2: Relative linking to a different document type

      Using the same document names, if you checked document 000456 into the same security group but a different document type, its web viewable URL would look like:

      http://system/ucm/groups/public/documents/adcorp/000456.pdf

      This means that your relative URL link needs to go up one directory and then into "adacct" to find 000123.pdf. So the relative URL link would be:

      ..\adacct\000123.pdf

    • Example 3: Relative linking to a different document security

      Now if you also change the security group of document 000456, its web viewable URL would look like:

      http://system/ucm/groups/secure/documents/adcorp/000456.pdf

      This means that the relative URL link needs to go up three directories (to groups) and then back down through public\documents\adacct to reach 000123.pdf. So the relative URL link would be:

      ..\..\..\public\documents\adacct\000123.pdf

  6. Check the documents into the Content Server. When converting the documents to PDF, the refinery will create links relative to the location of the web viewable file for each document you are checking in.
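The path arithmetic in these examples can be checked with a short script. The Python sketch below computes the link from one web viewable URL to another using the standard posixpath.relpath, measured from the directory that contains the source file. It assumes both files are on the same server; relative_link is a hypothetical helper, and windows_separators only reproduces the backslash style used in the examples above.

```python
from posixpath import dirname, relpath
from urllib.parse import urlsplit


def relative_link(from_url: str, to_url: str, windows_separators: bool = False) -> str:
    """Return the relative link from one web viewable file to another.

    Both URLs must share scheme and host; only the path may differ.
    """
    src, dst = urlsplit(from_url), urlsplit(to_url)
    if (src.scheme, src.netloc) != (dst.scheme, dst.netloc):
        raise ValueError("relative links only work between files on the same server")
    # Relative links resolve against the directory that holds the source file.
    link = relpath(dst.path, start=dirname(src.path))
    return link.replace("/", "\\") if windows_separators else link
```

For example, relative_link("http://system/ucm/groups/public/documents/adcorp/000456.pdf", "http://system/ucm/groups/public/documents/adacct/000123.pdf", windows_separators=True) returns ..\adacct\000123.pdf, matching Example 2.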

23.2 Managing Tiff Conversions

Tiff conversion enables the following functionality specific to TIFF (Tagged Image File Format) files:

  • Creation of a managed PDF file from a single or multiple-page TIFF file.

  • Creation of a managed PDF file from multiple TIFF files that have been compressed into a single ZIP file.

  • OCR (Optical Character Recognition) during TIFF-to-PDF conversion. This enables indexing of the text within checked-in TIFF files, so that users can perform full-text searches of these files.

The TiffConverter component is supported on Windows only. For information on file formats and languages that can be converted by PdfCompressor, see the documentation provided by CVISION.

Note:

The TiffConverter component requires CVISION CVista PdfCompressor to perform TIFF-to-PDF conversion with OCR. PdfCompressor is not provided with the TiffConverter component. You must obtain PdfCompressor from CVISION.

TIFF conversions require the following components to be installed and enabled on the specified server.

Component Name Component Description Enabled on Server

TiffConverter

Enables Inbound Refinery to convert single or multipage TIFF files to PDF complete with searchable text.

Inbound Refinery Server

TiffConverterSupport

Enables Content Server to support TIFF to PDF conversion.

Content Server

23.2.1 Configuring Content Servers to Send Jobs for Tiff Conversion

File formats and conversion methods are used in Content Server to define how content items should be handled by Inbound Refinery and the conversion options. Installing and enabling the TiffConverterSupport component on a Content Server adds three TIFFConversion options on the File Formats Wizard page.

For a content item to be processed by Inbound Refinery, its file extension (for example, TIF or TIFF) must be mapped to a format name associated with the TIFFConversion conversion method. The added conversion options for Tiff Converter are not automatically mapped. They must be mapped manually. The following topics describe how to set the mappings:

23.2.1.1 Using the File Formats Wizard for Tiff Conversion

File formats and conversion methods for Inbound Refinery can be managed in Content Server using the File Formats Wizard. You can convert TIFF to PDF with OCR or TIFF to PDF without OCR.

To convert TIFF to PDF with OCR:

  1. Log in to the Content Server as an administrator.

  2. From the main menu, choose Administration then Refinery Administration then File Formats Wizard.

  3. On the File Format Wizard page, select tiff, tif to enable Convert TIFF to PDF (TIFFConversion) in the File Type (conversion name) field menu. Selecting this menu item maps the TIF and TIFF file extensions to the image/tiff file format and associates the image/tiff file format with the TIFFConversion conversion method. When TIF or TIFF files are checked into the Content Server, they are processed by the refinery using Tiff Converter and converted to PDF with OCR. Deselecting this check box sets the image/tiff file format to PASSTHRU, so TIF and TIFF files are not processed by Inbound Refinery.

    Note:

    The TIFFConversion conversion method is only available when the TiffConverterSupport component has been installed and enabled, and the Content Server has been restarted.

  4. If you have added tifz and tiz file extensions using the Configuration Manager, you can select tifz, tiz on the File Format Wizard page to enable application/zip options in the File Type (conversion name) field menu.

    • Compressed Tiff to PDF (tifz, tiz): Selecting this menu item maps the TIFZ and TIZ file extensions to the graphic/tiff-x-compressed file format and associates the graphic/tiff-x-compressed file format with the TIFFConversion conversion method. When TIFZ or TIZ files are checked into the Content Server, they are processed by the refinery using Tiff Converter and converted to PDF with OCR. Deselecting this check box sets the graphic/tiff-x-compressed file format to PASSTHRU, so TIFZ and TIZ files are not processed by Inbound Refinery.

    • Compressed Tiff to PDF (zip): Selecting this menu item maps the ZIP file extension to the application/zip file format and associates the application/zip file format with the TIFFConversion conversion method. When ZIP files are checked into the Content Server, they are processed by the refinery using Tiff Converter and converted to PDF with OCR. Deselecting this check box sets the application/zip file format to PASSTHRU, so that ZIP files are not processed by Inbound Refinery.

  5. Click Update to save all changes.

To convert TIFF to PDF without OCR:

  1. Log in to the Content Server as an administrator.

  2. From the main menu, choose Administration then Refinery Administration then File Formats Wizard.

    3. On the File Format Wizard page, select tiff, tif to enable Convert TIFF to PDF (Direct PDFExport) in the File Type (conversion name) field menu. Selecting this menu item maps the TIF and TIFF file extensions to the image/tiff file format and associates the image/tiff file format with the Direct PDFExport conversion method. When TIF or TIFF files are checked into the Content Server, they are processed by the refinery using Oracle Outside In PDF Export and converted to PDF without OCR.

    Note:

    When the Convert TIFF to PDF (Direct PDFExport) option is used, only the metadata in the resulting PDF is searchable; the text is not searchable.

  4. Click Update to save all changes.

23.2.1.2 Using the Configuration Manager for Tiff Conversion

File formats and conversion methods for Inbound Refinery can be managed in Content Server using the Configuration Manager. To make changes:

  1. Log in to Content Server as an administrator.

  2. From the main menu, choose Administration, then Admin Applets.

  3. From the Applets list, choose Configuration Manager.

    The Configuration Manager applet is started.

  4. In the Configuration Manager applet, choose Options then File Formats.

  5. To enable single, unzipped TIFF files (TIF and TIFF) to be processed by Inbound Refinery:

    1. In the File Formats section, check that the image/tiff file format is added and associated with the TIFFConversion conversion method.

      Note:

      The TIFFConversion conversion method is only available when the TiffConverterSupport component has been installed and enabled, and the Content Server has been restarted.

    2. In the File Extensions section, check that the tif and tiff file extensions are added and mapped to the image/tiff file format.

  6. To enable TIFF files that have been compressed into a single TIFZ or TIZ file to be processed by Inbound Refinery:

    1. In the File Formats section, check that the graphic/tiff-x-compressed file format is added and associated with the TIFFConversion conversion method.

    2. In the File Extensions section, check that the tifz and tiz file extensions are added and mapped to the graphic/tiff-x-compressed file format.

  7. To enable TIFF files that have been compressed into a single ZIP file to be processed by Inbound Refinery:

    1. In the File Formats section, check that the application/zip file format is added and associated with the TIFFConversion conversion method.

    2. In the File Extensions section, check that the zip file extension is added and mapped to the application/zip file format.
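The two-level mapping that these steps set up (file extension to file format, then file format to conversion method) can be modeled as two lookups. This Python sketch is a conceptual model of the configuration, not Content Server code: the dictionaries mirror the mappings described in this section, and conversion_for is a hypothetical name.

```python
from typing import Mapping, Optional

# File extension -> file format (File Extensions section of Configuration Manager).
EXTENSION_TO_FORMAT = {
    "tif": "image/tiff",
    "tiff": "image/tiff",
    "tifz": "graphic/tiff-x-compressed",
    "tiz": "graphic/tiff-x-compressed",
    "zip": "application/zip",
}

# File format -> conversion method (File Formats section). Deselecting an option
# in the File Formats Wizard corresponds to setting the format to PASSTHRU.
FORMAT_TO_CONVERSION = {
    "image/tiff": "TIFFConversion",
    "graphic/tiff-x-compressed": "TIFFConversion",
    "application/zip": "TIFFConversion",
}


def conversion_for(filename: str,
                   formats: Mapping[str, str] = FORMAT_TO_CONVERSION) -> Optional[str]:
    """Resolve a checked-in file name to a conversion method via extension and format.

    Returns None when the extension is not mapped; how Content Server handles
    that case is outside the scope of this sketch.
    """
    _, dot, ext = filename.rpartition(".")
    if not dot:
        return None
    fmt = EXTENSION_TO_FORMAT.get(ext.lower())
    return formats.get(fmt) if fmt else None
```

Passing a formats mapping in which application/zip is set to PASSTHRU models deselecting the Compressed Tiff to PDF (zip) option: ZIP files then resolve to PASSTHRU instead of TIFFConversion.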

23.2.1.3 Tips for Processing Zip Files in Tiff Conversion

The ZIP file extension might be used in multiple ways in your environment. For example, you might be checking in:

  • Multiple TIFF files compressed into a single ZIP file for Inbound Refinery to convert to a single PDF file with OCR.

  • Multiple file types compressed into a single ZIP file that should not be processed (the ZIP file should be passed through in its native format).

When using the ZIP file extension in multiple ways, Oracle recommends configuring the Content Server to allow the user to choose how ZIP files are processed at check-in. This is referred to as Allow override format on check-in. To enable this Content Server functionality:

  1. Log in to Content Server as an administrator.
  2. From the main menu, choose Administration, then Admin Server then General Configuration.
  3. Enable the Allow override format on checkin setting and click Save.
  4. Restart the Content Server.
  5. Using the Configuration Manager, set up the file formats:
    • Map the application/zip file format to the TIFFConversion conversion method. This option can then be selected to send ZIP files containing TIFF files to Inbound Refinery. For a description, enter Zipped Tiff to PDF.

    • Set up an alternate file format, for example called application/zip-passthru, mapped to PassThru for zipped files that should not be converted. For a description, enter Zip Passthru.

      Note:

      The Content check-in Form page lists file formats by their description.

  6. Map the ZIP file extension to the file format that will be used most commonly. This will be the default conversion method for ZIP files.
  7. When a user checks in a ZIP file, the user can override the default conversion method by selecting any of the conversion methods that are set up.

Note:

If you are using the upload applet to check in multiple files, the files are compressed into a single ZIP file before being checked in. In this case Oracle also recommends enabling Allow override format on check-in so the user can choose how the ZIP file is processed when uploading multiple TIFFs.

Tip:

When CVista PdfCompressor merges multiple TIFF files from a compressed ZIP file, the input files are added in lexicographic order according to the standard ASCII character set.
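Because the merge order is ASCII lexicographic, page numbers that are not zero-padded can merge out of sequence: page10 sorts before page2, and uppercase letters sort before lowercase ones. The Python sketch below shows that ordering and one hypothetical way to avoid the problem, by zero-padding the numbers in file names before zipping. For ASCII-only names, Python's default string ordering (by code point) matches ASCII order; how PdfCompressor orders non-ASCII names is not covered here.

```python
import re


def merge_order(names: list[str]) -> list[str]:
    """Lexicographic ASCII order, as the tip describes for merged TIFF files."""
    return sorted(names)


def zero_pad_numbers(name: str, width: int = 4) -> str:
    """Hypothetical renaming helper: left-pad every digit run so numeric and ASCII order agree."""
    return re.sub(r"\d+", lambda m: m.group().zfill(width), name)
```

For example, merge_order(["page2.tif", "page10.tif", "Page1.tif"]) returns ["Page1.tif", "page10.tif", "page2.tif"], while the zero-padded names page0001.tif, page0002.tif, and page0010.tif sort in page order.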

23.2.2 Configuring Tiff Conversion Settings

This section discusses the following topics regarding conversion settings:

23.2.2.1 Setting Accepted Conversions

When installed on the refinery, the TiffConverter component adds the TIFFConversion option to the Conversion Listing page. This conversion option must be enabled for the refinery to perform conversions on items submitted by the Content Server.

23.2.2.2 Changing Timeout Settings

The timeout settings should reflect the processing time required for the size of TIFF files that are commonly checked in to the Content Server. This is highly variable depending on CPU power and TIFF complexity. Perform these tasks to determine the appropriate timeout values for TIFF files:

  • Run and time several representative TIFF-to-PDF conversions using CVista PdfCompressor alone (without Inbound Refinery).

  • Examine the document history information and evaluate the required processing time.

  • Change Inbound Refinery timeout settings accordingly.

    Note:

    Information about Tiff Converter timeouts is recorded in the Inbound Refinery and agent logs.
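For the first task above, a small timing harness can help. The Python sketch below runs a list of command lines and records wall-clock times. The PdfCompressor executable name, path, and argument order (parameters, input, and output files) are not given in this chapter, so build the command lines from the CVISION documentation. The 2x margin in suggest_maximum_minutes is an arbitrary illustration, not an Oracle recommendation; both function names are hypothetical.

```python
import math
import subprocess
import time
from typing import Sequence


def time_commands(commands: Sequence[Sequence[str]]) -> list[float]:
    """Run each command to completion and return its wall-clock time in seconds."""
    durations = []
    for cmd in commands:
        start = time.perf_counter()
        subprocess.run(list(cmd), check=True)  # raises CalledProcessError if a run fails
        durations.append(time.perf_counter() - start)
    return durations


def suggest_maximum_minutes(durations: Sequence[float], margin: float = 2.0) -> int:
    """Hypothetical rule of thumb: slowest observed run times a margin, in whole minutes."""
    if not durations:
        raise ValueError("need at least one measured run")
    return max(1, math.ceil(max(durations) * margin / 60))
```

For example, if the slowest of several sample conversions took 150 seconds, suggest_maximum_minutes([90.0, 150.0]) returns 5.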

To configure timeout settings for Tiff to PDF file generation:

  1. Log into the refinery.
  2. Choose Settings then Timeouts.
  3. On the Timeouts page, enter the Minimum (in minutes), the Maximum (in minutes), and Factor for the Tiff to PDF Conversion. This is the stage in which the original (native) TIFF file is converted to a Portable Document Format (PDF) file.

    For more information about how timeout settings are calculated and examples, see Configuring Inbound Refinery.

  4. Click Update to save all changes.
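How Minimum, Maximum, and Factor combine is described in Configuring Inbound Refinery. As a rough sketch only, assuming that Factor is a percentage applied to the file size in kilobytes to produce a timeout in seconds, and that the result is then clamped between Minimum and Maximum, the calculation looks like this in Python. Verify the formula against Configuring Inbound Refinery before relying on it; effective_timeout_seconds is a hypothetical name.

```python
def effective_timeout_seconds(file_size_kb: float, minimum_min: float,
                              maximum_min: float, factor: float) -> float:
    """ASSUMED model of a size-scaled conversion timeout, clamped to [Minimum, Maximum]."""
    if minimum_min > maximum_min:
        raise ValueError("Minimum must not exceed Maximum")
    # Assumption: Factor is a percentage of the file size in KB, yielding seconds.
    scaled_seconds = file_size_kb * factor / 100.0
    # Minimum and Maximum are entered in minutes, so convert them to seconds.
    return min(max(scaled_seconds, minimum_min * 60.0), maximum_min * 60.0)
```

Under this model, with Minimum 1, Maximum 10, and Factor 100, a 10 KB file gets the 60-second floor, a 100 KB file gets 100 seconds, and a 5000 KB file is capped at 600 seconds.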

23.2.3 Configuring CVista PdfCompressor

This section discusses the following topics regarding the CVista PdfCompressor:

23.2.3.1 Changing PdfCompressor Settings

These options are specific to CVista PdfCompressor. If the TiffConverter component is not installed, the CVista PdfCompressor Options are not available.

To change the PdfCompressor settings:

  1. Log in to the refinery.
  2. Choose Conversion Settings then Third-Party Application Settings.
  3. On the Third-Party Application Settings page, click Options for CVista PdfCompressor.
  4. On the CVista PdfCompressor Options page, set the path to the location of the CVista PdfCompressor executable in the appropriate text box.
  5. Enter the string of parameter values in the parameters option text box. A default option string is set on installation of the TiffConverter component.
  6. Click Update to save the settings.

The following recommended parameter strings should produce optimal results for each given scenario. If these settings do not produce the intended results, modify these strings by removing or appending settings. For more information on these and other available settings, see the online help provided with CVista PdfCompressor (especially "Appendix A: Command-Line Flags for Compression").

Default CVista PdfCompressor Parameters - OCR Enabled

A default string is set when the TiffConverter component is installed unless a string already exists (if the string was set using a previous version of Tiff Converter). The default string has been optimized for typical PdfCompressor usage with OCR enabled:

-m -c ON -colorcomptype 2 -mrcquality 5 -mrcColorCompType 0 -linearize -o -ocrmode 1 -ot 120 -qualityc 75 -qualityg 75 -rscdwndpi 300 -rsgdwndpi 300 -rsbdwndpi 300 -cconc -ccong

CVista PdfCompressor Parameters- Horizontal and Vertical OCR Enabled

The following string can be used for typical OCR usage with support for OCR processing of both vertical and horizontal text in the same image (it adds -ocrtwod and -lsize 25 to the default string):

-m -c ON -colorcomptype 2 -mrcquality 5 -mrcColorCompType 0 -linearize -o -ocrmode 1 -ot 120 -ocrtwod -lsize 25 -qualityc 75 -qualityg 75 -rscdwndpi 300 -rsgdwndpi 300 -rsbdwndpi 300 -cconc -ccong

CVista PdfCompressor Parameters - No OCR

The following string can be used for simple conversion (without OCR):

-m -c ON -colorcomptype 2 -mrcquality 5 -mrcColorCompType 0 -linearize -qualityc 75 -qualityg 75 -rscdwndpi 300 -rsgdwndpi 300 -rsbdwndpi 300 -cconc -ccong
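The three parameter strings differ only in a few tokens: the horizontal and vertical variant inserts -ocrtwod -lsize 25 after -ot 120, and the no-OCR variant drops -o, -ocrmode 1, and -ot 120. The Python sketch below derives both variants from the default string. It also rejects typographic dashes (such as U+2010), which sometimes appear when parameter strings are copied from formatted documents and which a command-line parser would most likely not recognize as flag prefixes. The function names are hypothetical; see the CVista PdfCompressor help for what each flag does.

```python
import shlex

# Default OCR-enabled parameter string from this section (ASCII hyphens).
DEFAULT_OCR_PARAMS = (
    "-m -c ON -colorcomptype 2 -mrcquality 5 -mrcColorCompType 0 -linearize "
    "-o -ocrmode 1 -ot 120 -qualityc 75 -qualityg 75 "
    "-rscdwndpi 300 -rsgdwndpi 300 -rsbdwndpi 300 -cconc -ccong"
)

# OCR flags that take exactly one value.
_OCR_FLAGS_WITH_VALUE = {"-ocrmode", "-ot"}


def check_ascii_flags(params: str) -> list[str]:
    """Split a parameter string, rejecting typographic dashes pasted from documents."""
    for bad in ("\u2010", "\u2011", "\u2013", "\u2014"):
        if bad in params:
            raise ValueError(f"non-ASCII dash {bad!r} found; use '-' (U+002D)")
    return shlex.split(params)


def with_two_d_ocr(params: str) -> str:
    """Insert -ocrtwod -lsize 25 right after '-ot <value>'."""
    tokens = check_ascii_flags(params)
    i = tokens.index("-ot") + 2
    return shlex.join(tokens[:i] + ["-ocrtwod", "-lsize", "25"] + tokens[i:])


def without_ocr(params: str) -> str:
    """Drop -o, -ocrmode <value>, and -ot <value>."""
    out: list[str] = []
    skip_value = False
    for tok in check_ascii_flags(params):
        if skip_value:
            skip_value = False  # this token is the value of a dropped flag
        elif tok == "-o":
            pass
        elif tok in _OCR_FLAGS_WITH_VALUE:
            skip_value = True
        else:
            out.append(tok)
    return shlex.join(out)
```

Applied to DEFAULT_OCR_PARAMS, with_two_d_ocr and without_ocr reproduce the horizontal and vertical string and the no-OCR string exactly.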

23.2.3.2 Configuring CVista PdfCompressor OCR Languages

Note:

Changes made in the CVista PdfCompressor user interface do not affect how CVista PdfCompressor functions when called by Tiff Converter.

By default, CVista PdfCompressor uses an English OCR dictionary when performing OCR on TIFF files. However, CVista PdfCompressor can perform OCR on several other languages.

To set up multiple OCR languages and enable the user to choose the OCR language at check-in:

Note:

If the following method is used, language parameters should not be specified or passed to the refinery via the CVista PdfCompressor Options Page.

  1. Obtain the appropriate current language files by contacting CVISION:

    • An .lng file is required for each language.

    • Czech, Polish, and Hungarian also require the latin2.shp file.

    • Russian also requires the cyrillic.shp file.

    • Greek also requires the greek.shp file.

    • Turkish also requires the turkish.shp file.

  2. Place the CVISION language files in the CVista installation directory. The default location is C:\Program Files\CVision\PdfCompressorxx\ where xx stands for the version number of PdfCompressor.

  3. Log in to Content Server as an administrator.

  4. From the main menu, choose Administration then Admin Applets.

  5. From the Applets list, choose Configuration Manager.

  6. On the Configuration Manager page, click the Information Fields tab.

  7. If the OCRLang information field has been added, skip this step. If it has not been added:

    1. In the Field Info section, click Add.

    2. On the Add Custom Info page, in the Field Name field, enter OCRLang. This creates a new information field for CVista language conversion options.

      Note:

      Enter this field name exactly.

    3. Click OK.

    4. On the Add Custom Info Field page, in the Field Caption field, enter the descriptive caption to be displayed on the Content check-in Form page. For example, OCR Language.

    5. From the Field Type list, choose Text.

    6. Select the Enable Option List check box.

    7. From the Option List Type list, choose Select List Validated.

    8. In the Use option list field, enter xOCRLangList.

    9. Click Edit next to the Use Option List field.

    10. On the Option List page, enter the CVista OCR languages to present as options. The following language names are valid options.

      Note:

      You can use either the English language name or the native equivalent (if listed). However, you must enter the language options exactly as they appear in the following table.

      English       Native
      Czech         -
      Danish        Dansk
      Dutch         Nederlands
      English       -
      Finnish       Suomi
      French        Français
      German        Deutsch
      Greek         -
      Hungarian     Magyar
      Italian       Italiano
      Norwegian     Norsk
      Polish        Polski
      Portuguese    Português
      Russian       -
      Spanish       Español
      Swedish       Svenska
      Turkish       -

    11. Select the Ignore Case check box.

    12. Click OK.

    13. In the Default Value field, enter the default OCR language option.

    14. Click OK to save the settings and return to the Information Fields tab.

    15. Click Update Database Design.

  8. If the OCRLang information field has been added, but changes must be made to the language option list or the default language:

    1. In the Field Info section, select OCRLang and click Edit.

    2. On the Add Custom Info page, click Edit next to the Use Option List field.

    3. On the Option List page, delete any unused CVista OCR languages.

    4. Click OK.

    5. In the Default Value field, enter the default OCR language option.

    6. Click OK to save the settings and return to the Information Fields tab.

  9. Close the Configuration Manager applet. When a user checks in a TIFF file, the user can override the default OCR language by selecting any of the OCR languages that were set up.
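Before adding languages to the option list, it can help to validate the entries and list the extra files each language needs. The Python sketch below encodes the language table and the .shp requirements from step 1. Treating options case-insensitively is an assumption based on the Ignore Case setting in step 11. The function names are hypothetical, and because the names of the .lng files themselves are not given here, they are not modeled.

```python
# OCRLang option values from the table above: English name -> native name
# (None where the table lists no native equivalent).
OCR_LANGUAGES = {
    "Czech": None, "Danish": "Dansk", "Dutch": "Nederlands", "English": None,
    "Finnish": "Suomi", "French": "Français", "German": "Deutsch", "Greek": None,
    "Hungarian": "Magyar", "Italian": "Italiano", "Norwegian": "Norsk",
    "Polish": "Polski", "Portuguese": "Português", "Russian": None,
    "Spanish": "Español", "Swedish": "Svenska", "Turkish": None,
}

# Shape files that some languages need in addition to their .lng file (step 1).
EXTRA_SHAPE_FILES = {
    "Czech": "latin2.shp", "Polish": "latin2.shp", "Hungarian": "latin2.shp",
    "Russian": "cyrillic.shp", "Greek": "greek.shp", "Turkish": "turkish.shp",
}

# Case-insensitive lookup (assumption: mirrors the Ignore Case option list setting).
_LOOKUP: dict[str, str] = {}
for _english, _native in OCR_LANGUAGES.items():
    _LOOKUP[_english.casefold()] = _english
    if _native:
        _LOOKUP[_native.casefold()] = _english


def canonical_language(option: str) -> str:
    """Map an OCRLang option (English or native name) to its English name."""
    try:
        return _LOOKUP[option.strip().casefold()]
    except KeyError:
        raise ValueError(f"not a valid OCRLang option: {option!r}") from None


def required_shape_files(option: str) -> list[str]:
    """Extra .shp files to place in the CVista directory for this language option."""
    english = canonical_language(option)
    return [EXTRA_SHAPE_FILES[english]] if english in EXTRA_SHAPE_FILES else []
```

For example, required_shape_files("Magyar") returns ["latin2.shp"], and required_shape_files("English") returns an empty list.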

23.3 Managing XML Conversions

XML conversions require the following components to be installed and enabled on the specified server.

Component Name Component Description Enabled on Server

XMLConverter

Enables Inbound Refinery to produce FlexionDoc and SearchML-styled XML as the primary web-viewable file or as independent renditions, and can use the Xalan XSL transformer to process XSL transformations.

Inbound Refinery Server

XMLConverterSupport

Enables Content Server to support XML conversions and XSL transformations.

Content Server

23.3.1 Configuring Content Servers to Send Jobs to Inbound Refinery

File extensions, file formats, and conversions are used in Content Server to define how content items should be processed by Inbound Refinery and its conversion add‐ons. Each Content Server must be configured to send files to refineries for conversion.

When a file extension is mapped to a file format and a conversion, files of that type are sent for conversion when they are checked into the Content Server. File extension, file format, and conversion mappings can be configured using either the File Formats Wizard or the Configuration Manager.
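The extension-to-format-to-conversion chain can be pictured as two lookups. In the Python sketch below, the mapping tables are hypothetical stand-ins for what the File Formats Wizard or Configuration Manager stores, and comparing extensions case-insensitively is an assumption. A file is treated as "sent for conversion" only when both lookups succeed.

```python
from pathlib import PurePath
from typing import Optional

# Hypothetical mapping tables; the entries are illustrative, not a real configuration.
EXTENSION_TO_FORMAT = {"doc": "application/msword"}
FORMAT_TO_CONVERSION = {"application/msword": "Word"}


def conversion_for(filename: str) -> Optional[str]:
    """Return the conversion a checked-in file would be sent for, or None.

    The extension must map to a format, and that format must map to a
    conversion. Extensions are compared case-insensitively (an assumption).
    """
    extension = PurePath(filename).suffix.lstrip(".").lower()
    file_format = EXTENSION_TO_FORMAT.get(extension)
    if file_format is None:
        return None
    return FORMAT_TO_CONVERSION.get(file_format)


print(conversion_for("Report2001.doc"))  # Word
print(conversion_for("notes.xyz"))       # None
```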

Most conversions required for Inbound Refinery are available by default in Content Server. In addition to the default conversions, the following conversions are added to the Content Server when the XMLConverterSupport component is installed.

Conversion Description

FlexionXML

Used to convert files to XML using the FlexionDoc schema. It applies to file types other than the standard file types included in the list of conversions (for example, Word, PowerPoint, and so on). Standard file types do not need to be remapped to the FlexionXML conversion to be sent to a refinery for conversion to XML using FlexionDoc. This conversion is not available in the File Formats Wizard; it must be mapped using the Configuration Manager.

SearchML

Used to convert files to XML using the SearchML schema. It applies to file types other than the standard file types included in the list of conversions (for example, Word, PowerPoint, and so on). Standard file types do not need to be remapped to the SearchML conversion to be sent to a refinery for conversion to XML using SearchML. This conversion is not available in the File Formats Wizard; it must be mapped using the Configuration Manager.

XSLT Transformation

After XML Converter converts documents to the FlexionDoc schema, the XSLT conversion allows the resulting XML to be transformed into another XML schema specified by a developer.

Conversions available in the Content Server should match those available in the refinery. When a file format is mapped to a conversion in the Content Server, files of that format are sent for conversion on check-in. One or more refineries must be set up to accept that conversion.

Most conversions required for Inbound Refinery are available by default. In addition to the default conversions that can be accepted by a refinery, the FlexionXML and SearchML conversions are added to the refinery when the XMLConverter component is installed. The FlexionXML and SearchML conversions are accepted by default.
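Because every conversion mapped in Content Server needs at least one refinery that accepts it, a quick consistency check can be sketched in Python. The `ACCEPTED` inventory (refinery name to accepted conversions) is hypothetical and would come from each refinery's own settings. `production_ibr` reuses the example name from the XSLT Errors section, and "Visio HTML" is simply an example of a conversion this refinery does not accept.

```python
# Hypothetical inventory: refinery name -> conversions it accepts.
ACCEPTED = {
    "production_ibr": {"Word", "FlexionXML", "SearchML"},
}


def refineries_for(conversion: str, accepted: dict[str, set[str]]) -> list[str]:
    """Names of the refineries that accept a given conversion, sorted."""
    return sorted(name for name, convs in accepted.items() if conversion in convs)


def unhandled(cs_conversions: set[str], accepted: dict[str, set[str]]) -> set[str]:
    """Conversions mapped in Content Server that no refinery accepts."""
    return {c for c in cs_conversions if not refineries_for(c, accepted)}


print(refineries_for("SearchML", ACCEPTED))                            # ['production_ibr']
print(unhandled({"Word", "FlexionXML", "Visio HTML"}, ACCEPTED))       # {'Visio HTML'}
```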

23.3.2 Setting XML Files as the Primary Web‐Viewable Rendition

To set XML files as the primary web‐viewable rendition:

  1. Log in to the refinery.
  2. Choose Conversion Settings then select Primary Web Rendition.
  3. On the Primary Web-Viewable Renditions page, select the Convert to XML option.
  4. Typically, clear all other conversion options. Inbound Refinery attempts to convert each incoming file based on the native file format. If the format is not supported for conversion by the first selected method, Inbound Refinery checks if the next selected method supports the format, and so on. Inbound Refinery attempts to convert the file using the first selected method that supports the conversion of the format.

    For example, suppose you select both the Convert to PDF using third-party applications option and the Convert to XML option. The refinery attempts to convert any supported formats to PDF using the Convert to PDF using third-party applications method. Whether or not this method fails, Inbound Refinery does not attempt another conversion method for these formats. Therefore, you should typically select only the Convert to XML option to create XML files as the primary web-viewable rendition.

  5. Click Update to save all changes.
  6. Click XML Options.
  7. On the XML Options page, set XML options, and click Update to save the changes.
  8. Note the following important considerations:
    • If you want to adjust the default settings for the FlexionDoc and SearchML options, you can specify option settings in the intradoc.cfg file located in the refinery DomainDir/ucm/ibr/bin directory. For a complete description of available FlexionDoc and SearchML options, see the xx.cfg and sx.cfg files located in the refinery IdcHomeDir/components/XMLConverter/resources directory. You must restart your refinery after making changes to the intradoc.cfg file.

    • FlexionDoc and SearchML documentation files are installed with the XMLConverter component and located in the refinery IdcHomeDir/components/XMLConverter directory.
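The method-selection rule in step 4 can be sketched in Python. The `SUPPORTED` table is hypothetical: the method names echo the options on the Primary Web-Viewable Renditions page, but the format sets are illustrative only. The sketch also assumes that "first selected" means the order in which the options appear on the page. The point is that only one method is attempted per file, with no fallback when that method fails.

```python
from typing import Optional

# Hypothetical support table: conversion method -> native formats it handles.
SUPPORTED = {
    "Convert to PDF using third-party applications": {"doc", "xls", "ppt"},
    "Convert to XML": {"doc", "xls", "ppt", "txt"},
}


def choose_method(native_format: str, selected: list[str]) -> Optional[str]:
    """Pick the first selected method (in page order) that supports the format.

    Only this method is attempted. If it fails, the refinery does not try a
    later method, which is why usually only one option is selected.
    """
    for method in selected:
        if native_format in SUPPORTED.get(method, set()):
            return method
    return None


both = ["Convert to PDF using third-party applications", "Convert to XML"]
print(choose_method("doc", both))  # Convert to PDF using third-party applications
print(choose_method("txt", both))  # Convert to XML
```

With both options selected, a Word file never reaches the XML converter, even if the PDF conversion fails.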

23.3.3 Setting XML Files as an Additional Rendition

To set XML files as an additional rendition:

  1. Log in to the refinery.
  2. From Conversion Settings, select Additional Renditions.

    The Additional Renditions page opens.

  3. Select the Create XML renditions for all supported formats option. Inbound Refinery generates an XML file in addition to other renditions, such as PDF files.

    When the generated XML files are delivered back to a Content Server, the XML files are included in the full-text index. However, if other web-viewable files are generated in addition to the XML file, the XML file is not used as the primary web-viewable rendition. For example, if Inbound Refinery generates both a PDF file and an XML file, the PDF file is used as the primary web-viewable rendition. XML renditions stored in the Content Server weblayout directory can be recognized by the characters @x in their file names. For example, the file Report2001@x~2.xml would be an XML rendition.

  4. Click Update to save your changes.
  5. Click XML Options.
  6. On the XML Options page, set your XML options, and click Update to save your changes.
  7. Note the following important considerations:
    • If you want to adjust the default settings for the FlexionDoc and SearchML options, you can specify option settings in the intradoc.cfg file located in the refinery DomainDir/ucm/ibr/bin directory. You must restart your refinery after making changes to the intradoc.cfg file.

    • For a complete description of available FlexionDoc and SearchML options, see the xx.cfg and sx.cfg files located in the refinery IdcHomeDir/components/XMLConverter/resources directory. These configuration files are for reference only and should not be modified.

    • FlexionDoc and SearchML schema code and documentation files are installed with the XMLConverter component into the refinery IdcHomeDir/components/XMLConverter directory.
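To pick out XML renditions from a weblayout file listing, you can rely on the @x naming convention described in step 3. The Python sketch below is a simple heuristic that requires "@x" in the name and an .xml extension. The PDF file name in the listing is a hypothetical companion rendition.

```python
from pathlib import PurePosixPath


def is_xml_rendition(filename: str) -> bool:
    """Heuristic: XML renditions carry "@x" in the name and end in .xml."""
    path = PurePosixPath(filename)
    return "@x" in path.name and path.suffix.lower() == ".xml"


files = ["Report2001~2.pdf", "Report2001@x~2.xml"]  # hypothetical listing
print([f for f in files if is_xml_rendition(f)])  # ['Report2001@x~2.xml']
```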

23.3.4 Setting Up XSL Transformation

Inbound Refinery uses the Xalan XSLT processor and the SAX validator built into the Java virtual machine running Inbound Refinery. To enable transformation, the XMLConverter component must be installed and enabled on the refinery server and the XMLConverterSupport component must be installed and enabled on the Content Server.

To turn on XSL Transformation:

  1. Log in to the refinery server.

  2. Do one of the following:

    • If the XML rendition is to be the primary web-viewable file, click Conversion Settings then Primary Web Rendition. On the Primary Web-Viewable Renditions page, enable Convert to XML.

    • If the XML is to be an additional rendition, click Conversion Settings then Additional Renditions. On the Additional Renditions page, enable Create XML renditions for all supported formats.

  3. Click XML Options.

  4. On the XML Options page, enable Process XSLT Transformation and select the XML schema to use from the following options:

    • Produce FlexionDoc XML

    • Produce SearchML

  5. Click Update to save all changes or Reset to revert to the last saved settings.

To perform XSL transformations, Inbound Refinery needs an XSL template, checked into Content Server, to apply during the transformation. To check an XSL template into Content Server:

  1. Create an XSL file. The XSL file specifies how an XML file with a specific Content Type will be transformed to a new XML file. A DTD or schema can be specified for validation and stored in the Content Server, but is not required.

  2. Check the XSL file into the Content Server and associate it with a Content Type.

    1. On the content check-in form, select the Content Type from the Type list.

    2. Enter the Content ID according to the following convention:

      Content Type.xsl

      For example, if the Content Type is Documents, enter documents.xsl.

    3. Enter the XSL file as the Primary File.

    4. Make sure the Security Group matches that of any DTD or schema files associated with the XSL file, and that of the native files checked into the Content Server.

    5. Click Check In.

    When files are checked in with this Content Type, and a FlexionDoc/SearchML XML file is generated by XML Converter or the checked-in file is XML, this XSL file will be used for XSL transformation to a new XML document.

  3. Repeat these steps for each Content Type whose XML output should be post-processed with XSL.
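The Content ID convention in step 2 can be expressed as a small Python helper. Lowercasing the Content Type follows the documents.xsl example and is an assumption, not a documented rule. The case-insensitive Content ID comparison in `template_for` is also an assumption. Both functions are hypothetical illustrations of how a template would be found for a Content Type.

```python
from typing import Iterable, Optional


def xsl_content_id(content_type: str) -> str:
    """Content ID for a Content Type's XSL template ("Content Type.xsl").

    Lowercasing mirrors the documents.xsl example (an assumption).
    """
    return f"{content_type.lower()}.xsl"


def template_for(content_type: str, checked_in_ids: Iterable[str]) -> Optional[str]:
    """Return the template's Content ID if it is checked in, else None.

    Content IDs are compared case-insensitively here (an assumption).
    """
    wanted = xsl_content_id(content_type)
    if any(cid.lower() == wanted for cid in checked_in_ids):
        return wanted
    return None


print(xsl_content_id("Documents"))  # documents.xsl
```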

23.3.4.1 XSLT Errors

When validation fails, Inbound Refinery collects the errors from the SAX validation engine, creates an HCSP error page, and attempts to check the page in to Content Server.

For the refinery to check in an error page, you must manually set up an outgoing provider on Inbound Refinery that points to the Content Server. The name of the outgoing provider must match the agent name, that is, the name of the Content Server. For example, if Inbound Refinery is named production_ibr and it is converting files for a Content Server named production_cs, then an outgoing provider named production_cs must be created on the production_ibr Inbound Refinery.

To set up a criteria workflow to be notified regarding XSL transformation failures:

  1. From the main menu, choose Administration then Admin Applets.
  2. From the Applet list, choose Workflow Admin.
  3. Add a criteria workflow for notification of XSLT transformation failures.
  4. Add a workflow step with the following properties:
  • Users: specify the users that should be notified.

  • Exit Conditions: select At least this many reviewers, and set the value to 0.

  • Events: For the Entry event, add the following Custom Script Expression:

    <$if dDocTitle like "*XSLT Error"$>
    <$else$>
    <$wfSet("wfJumpEntryNotifyOff", "1")$>
    <$wfExit(0,0)$>
    <$endif$>
    

For details about using workflows, see Managing Workflows.
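The decision made by the Entry event script can be restated in Python for readers unfamiliar with Idoc Script. This is only a model of the branch logic, not Idoc Script itself. It uses case-sensitive glob matching with `fnmatchcase`, and the semantics of Idoc Script's `like` operator may differ. The document titles in the usage lines are hypothetical.

```python
from fnmatch import fnmatchcase


def entry_event(doc_title: str) -> dict[str, bool]:
    """Model of the Entry event script for the XSLT-error workflow step.

    Titles matching "*XSLT Error" stay in the step, so the step's users are
    notified. Any other revision has entry notification turned off
    (wfJumpEntryNotifyOff = 1) and leaves the workflow (wfExit(0,0)).
    """
    if fnmatchcase(doc_title, "*XSLT Error"):
        return {"notify": True, "exit_workflow": False}
    return {"notify": False, "exit_workflow": True}


print(entry_event("Invoice42 XSLT Error"))  # {'notify': True, 'exit_workflow': False}
print(entry_event("Quarterly Report"))      # {'notify': False, 'exit_workflow': True}
```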

23.4 Converting Microsoft Office Files to HTML

Inbound Refinery can convert native Microsoft Office files to HTML by using the native Microsoft Office applications installed on a Windows system. Content Server can be installed on either a Windows or UNIX platform, but for Microsoft Office to HTML conversions to work, Inbound Refinery must be configured on the Windows system where the Microsoft Office native applications are installed.

HTML conversion automates opening Microsoft Office files in their native applications, saving them as HTML pages, and collecting the HTML output into a compressed ZIP file that is returned to Content Server.

HTML conversion can process the following types of files:

  • Microsoft Word 2003 through 2010

  • Microsoft Excel 2003 through 2010

  • Microsoft PowerPoint 2003 through 2010

  • Microsoft Visio 2007

When WinNativeConverter is enabled to work with Inbound Refinery, native Microsoft Office files checked into Content Server are sent to Inbound Refinery for conversion. Inbound Refinery automates the process of converting the files to HTML using the native Microsoft Office applications. If a single HTML page is returned to Content Server, it is used as the web-viewable file. If conversion results in multiple HTML pages, the following files are returned to Content Server:

  • An HCSP page as the primary web-viewable rendition

  • A ZIP file that includes the HTML output from the Office application

  • Optionally, a thumbnail rendition of the native Microsoft Office file

When a user clicks the web-viewable link in Content Server for a document that Inbound Refinery converted to multiple HTML pages, the HCSP page redirects to the HTML rendition.
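The single-page versus multi-page packaging rule can be sketched in Python with the standard `zipfile` module. The `package_output` function and the `ReturnedFiles` record are hypothetical. They model only which files go back to Content Server, not the refinery's actual output or the contents of the HCSP page. The `pages` argument is assumed to map output file names to their bytes.

```python
import io
import zipfile
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReturnedFiles:
    """Files sent back to Content Server after Office-to-HTML conversion."""
    primary: str                        # "html" (single page) or "hcsp" (multiple pages)
    html_page: Optional[bytes] = None   # the page itself, single-page case
    html_zip: Optional[bytes] = None    # ZIP of the HTML output, multi-page case
    thumbnail: Optional[bytes] = None   # optional thumbnail, multi-page case


def package_output(pages: dict[str, bytes], thumbnail: Optional[bytes] = None) -> ReturnedFiles:
    """Hypothetical packaging step for the HTML output of one document."""
    if len(pages) == 1:
        (only_page,) = pages.values()
        return ReturnedFiles(primary="html", html_page=only_page)
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, data in pages.items():
            archive.writestr(name, data)
    return ReturnedFiles(primary="hcsp", html_zip=buffer.getvalue(), thumbnail=thumbnail)
```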

Microsoft Office to HTML conversions require the following components to be installed and enabled on the specified server.

Component Name Component Description Enabled on Server

WinNativeConverter

Enables Inbound Refinery to convert native Microsoft Office files created with Word, Excel, PowerPoint and Visio to HTML using the native Office application.

Inbound Refinery Server

MSOfficeHtmlConverterSupport

Enables Content Server to support HTML conversions of native Microsoft Office files converted by Inbound Refinery and returned to Content Server in a ZIP file. Requires that the ZipRenditionManagement component be installed on the Content Server.

Content Server

ZipRenditionManagement

Enables Content Server access to HTML renditions created and compressed into a ZIP file by Inbound Refinery.

Content Server
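Before testing conversions, it can help to check each server against the component table. The Python sketch below encodes the table, where ZipRenditionManagement is listed because MSOfficeHtmlConverterSupport requires it. The `enabled` inventory passed in is hypothetical, for example what you record from each server's component list.

```python
# Components each server needs for Microsoft Office to HTML conversions.
# ZipRenditionManagement is included because MSOfficeHtmlConverterSupport requires it.
REQUIRED_COMPONENTS = {
    "Inbound Refinery Server": {"WinNativeConverter"},
    "Content Server": {"MSOfficeHtmlConverterSupport", "ZipRenditionManagement"},
}


def missing_components(enabled: dict[str, set[str]]) -> dict[str, set[str]]:
    """Return, per server, the required components that are not enabled."""
    missing = {}
    for server, needed in REQUIRED_COMPONENTS.items():
        gap = needed - enabled.get(server, set())
        if gap:
            missing[server] = gap
    return missing


print(missing_components({
    "Inbound Refinery Server": {"WinNativeConverter"},
    "Content Server": {"MSOfficeHtmlConverterSupport"},
}))  # {'Content Server': {'ZipRenditionManagement'}}
```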

This section discusses how to configure Content Server to work with Microsoft Office to HTML conversions:

23.4.1 Configuring Content Servers to Send Jobs for HTML Conversion

When installed on the refinery, WinNativeConverter adds the Word HTML, PowerPoint HTML, Excel HTML, and Visio HTML options to the Conversion Listing page. These conversion options must be enabled for the refinery to perform conversions on items submitted by the Content Server. File formats and conversion methods are used in Content Server to define how content items should be handled by Inbound Refinery and the conversion options.

For a Microsoft Office document to be processed by Inbound Refinery, its file extension must be mapped to a format name that is associated with the HTML Conversion method. The added conversion options for HTML Conversion are not automatically mapped: they must be mapped manually. They can be set either using the File Formats Wizard or the Configuration Manager applet. The Configuration Manager applet gives you greater control over which file extensions are mapped to which conversion options. For details, see the following sections:

23.4.1.1 Using the File Formats Wizard for Microsoft Office Conversions

File formats and conversion methods for Inbound Refinery can be managed in Content Server using the File Formats Wizard. To make changes:

  1. Log in to Content Server as an administrator.
  2. From the main menu, choose Administration then Refinery Administration then File Formats Wizard.
  3. On the File Formats Wizard, select the Microsoft Office document file types you want to convert to HTML. The Conversion column lists the appropriate conversion option according to the file type. For example:
    • Word for doc, docx, dot, dotx

    • PowerPoint for ppt, pptx

    • Excel for xls, xlsx

    • Visio for vsd

    Note:

    HTML conversion can process the following types of files:

    • Microsoft Word 2003 through 2010

    • Microsoft PowerPoint 2003 through 2010

    • Microsoft Excel 2003 through 2010

    • Microsoft Visio 2007

  4. Click Update to save all changes.
  5. Log in to the Inbound Refinery as an administrator.
  6. From the navigation menu, choose Conversion Settings then Primary Web Rendition.
  7. On the Primary Web Rendition page, enable Convert selected MS Office formats to MS HTML.
  8. Click Update.
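The wizard procedure has two halves that must both be completed. The extension must be selected in the File Formats Wizard, and Convert selected MS Office formats to MS HTML must be enabled on the refinery. The Python sketch below combines the two using the extensions listed in step 3. The function and its parameters are hypothetical. A `None` result means only that no HTML conversion applies.

```python
from pathlib import PurePath
from typing import Optional

# Extensions per conversion option, as listed in step 3 above.
WIZARD_CONVERSIONS = {
    "Word": {"doc", "docx", "dot", "dotx"},
    "PowerPoint": {"ppt", "pptx"},
    "Excel": {"xls", "xlsx"},
    "Visio": {"vsd"},
}


def html_conversion_for(filename: str, selected: set[str],
                        refinery_html_enabled: bool) -> Optional[str]:
    """Return the Office conversion applied to a file, or None.

    selected: extensions chosen in the File Formats Wizard (hypothetical input).
    refinery_html_enabled: whether "Convert selected MS Office formats to
    MS HTML" is enabled on the refinery.
    """
    ext = PurePath(filename).suffix.lstrip(".").lower()
    if not refinery_html_enabled or ext not in selected:
        return None
    for conversion, extensions in WIZARD_CONVERSIONS.items():
        if ext in extensions:
            return conversion
    return None


print(html_conversion_for("plan.docx", {"docx", "xlsx"}, True))  # Word
```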

23.4.1.2 Using the Configuration Manager for Microsoft Office Conversions

File formats and conversion methods for Inbound Refinery can be managed in Content Server using the Configuration Manager. To make changes:

  1. Log in to Content Server as an administrator.
  2. From the main menu, choose Administration then Admin Applets.
  3. From the Applet list, choose Configuration Manager.
  4. Choose Options then File Formats.
  5. Select the application format for the Office document type to convert from the Format column. For example, for Microsoft Word, select application/msword.
  6. Click Edit.
  7. In the Edit File Format dialog, select the HTML conversion option from the Conversion list appropriate to the selected Office document format. For example, for application/msword, select the conversion option Word HTML.
  8. Click OK.
  9. Repeat these steps for all Microsoft Office formats to convert to HTML.
  10. When finished, click Close to close the File Formats page and then close the Configuration Manager.
  11. Restart Content Server and Inbound Refinery.