23 Working with Conversions

When using Oracle WebCenter Content: Inbound Refinery, you can configure and manage several conversion operations, including PDF conversion, XML conversion, TIFF conversion, and conversion of Microsoft Office files to HTML. This chapter describes the tasks involved in managing those conversion types.

Note:

Native conversions fail when Inbound Refinery is run as a service on win64 platforms, because services on win64 platforms do not have access to printer services. If you perform native conversions, do not run Inbound Refinery as a service.

For additional information describing the different types of conversion, how and where they are performed, and the advantages of each type, see the "Conversions in WebCenter Content" blog.

23.1 Managing PDF Conversions

Inbound Refinery can convert native files to PDF by either exporting to PDF directly using Oracle Outside In PDF Export (included with Inbound Refinery) or by using third-party applications to output the native file to PostScript and then using a third-party PDF distiller engine to convert the PostScript file to PDF.

PDF conversions require the following components to be installed and enabled on the Inbound Refinery server.

| Component Name | Component Description | Enabled on Server |
| --- | --- | --- |
| PDFExportConverter | Enables Inbound Refinery to use Oracle OutsideIn to convert native formats directly to PDF without the use of any third-party tools. PDF Export is fast, multi-platform, and allows concurrent conversions. | Inbound Refinery Server |
| WinNativeConverter | Enables Inbound Refinery to convert native files to a PostScript file with either the native application or OutsideInX and convert the PostScript file to PDF using a third-party distiller engine. This component is for Windows platform only. It replaces the functionality previously made available in the deprecated PDFConverter component. WinNativeConverter offers the best rendition quality of all PDF conversion options when used with the native application on a Windows platform. This does not allow concurrent conversions. WinNativeConverter also enables Inbound Refinery to convert native Microsoft Office files created with Word, Excel, PowerPoint and Visio to HTML using the native Office application. | Inbound Refinery Server |

23.1.1 PDF Conversion Considerations

There are several factors to consider when choosing a PDF conversion method. System performance (the time it takes to convert a file to PDF format), the fidelity of the PDF output (how closely it matches the look and formatting of the native file), what native applications are needed (such as Microsoft Word or PowerPoint, used to generate the PostScript file converted by Inbound Refinery), and the platform a conversion application requires should all be taken into consideration.

If the speed of conversion is a primary concern, using PDF Export to convert original files directly to PDF is fastest. In addition to not having to use third-party tools, PDF Export allows concurrent PDF conversions and supports Windows, Linux and UNIX platforms.

If the fidelity of the PDF output is a primary concern, then using the native application to open the original file, output to PostScript, and convert the PostScript to PDF is the best option. However, this method is limited to the Windows platform and it cannot run concurrent PDF conversions.

Table 23-1 compares conversion methods and lists the platforms they support.

Note:

Regardless of the conversion option used, a PDF is a web-ready version of the native format. A converted PDF should not be expected to be an exact replica of the native format. Many factors such as font substitutions, complexity and format of embedded graphics, table structure, or issues with third-party distiller engines may cause the PDF output to differ from the native format.

Table 23-1 PDF Conversion Methods

| Conversion Method | Performance | Fidelity | Supported Platforms | Concurrent PDF Conversions |
| --- | --- | --- | --- | --- |
| PDF Export | Best | Good | Windows/UNIX | Yes |
| 3rd-Party Native Applications | Good | Best | Windows | No |

23.1.2 Configuring PDF Conversion Settings

23.1.2.1 Configuring Content Servers to Send Jobs to Inbound Refinery

File extensions, file formats, and conversions are used in Content Server to define how content items should be processed by Inbound Refinery and its conversion add‐ons. Each Content Server must be configured to send files to refineries for conversion. When a file extension is mapped to a file format and a conversion, files of that type are sent for conversion when they are checked into the Content Server. Use either the File Formats Wizard or the Configuration Manager to set the file extension, file format, and conversion mappings.

All conversions required for Inbound Refinery are available by default in Content Server. For more information about configuring file extensions, file formats, and conversions in your Content Servers, see About MIME Types and Managing File Types.

Conversions available in the Content Server should match those available in the refinery. When a file format is mapped to a conversion in the Content Server, files of that format are sent for conversion upon check-in. One or more refineries must be set up to accept that conversion. Set the conversions that the refinery will accept and queue maximums on the Conversion Listing page. All conversions required for Inbound Refinery are available by default in both Content Server and Inbound Refinery.

For more information about setting accepted conversions, see Setting Accepted Conversions.

23.1.2.2 Setting PDF Files as the Primary Web‐Viewable Rendition

To set PDF files as the primary web‐viewable rendition:

  1. Log into the refinery.
  2. Select Conversion Settings, then select Primary Web Rendition.
  3. On the Primary Web-Viewable Rendition page, select one or more of the following conversion methods. For a conversion method to be available, the associated components must be installed and enabled:
    • Convert to PDF using PDF Export: when running on either Windows or UNIX, Inbound Refinery uses Outside In PDF Export to convert files directly to PDF without the use of third-party applications. PDFExportConverter must be enabled on the refinery server.

    • Convert to PDF using third-party applications: when running on Windows, Inbound Refinery can use several third-party applications to create PDF files of content items. In most cases, a third‐party application that can open and print the file is used to print the file to PostScript, and then the PostScript file is converted to PDF using the configured PostScript distiller engine. In some cases, Inbound Refinery can use a third-party application to convert a file directly to PDF. For this option to be available, WinNativeConverter must be enabled on the refinery server. In addition, when using this option, Inbound Refinery requires the following:

      • A PostScript distiller engine.

      • A PostScript printer.

      • The third‐party applications used during the conversion.

    • Convert to PDF using Outside In: Inbound Refinery includes Outside In, which can be used with WinNativeConverter on Windows to create PDF files of some content items. Outside In is used to print the files to PostScript, and then the PostScript files are converted to PDF using the configured PostScript distiller engine. When using this option, Inbound Refinery requires only a PostScript distiller engine.

    Inbound Refinery attempts to convert each incoming file based on the conversion method assigned to the format by the Content Server. If the format is not supported for conversion by the first selected method, Inbound Refinery checks to see if the next selected method supports the format, and so on. Inbound Refinery will attempt to convert the file using the first selected method that supports the conversion of the format.

    For example, consider that you select both the Convert to PDF using third-party applications option and the Convert to PDF using Outside In option. You then send a Microsoft Word file to the refinery for conversion. Because the Microsoft Word file format is supported for conversion to PDF using a third-party application (Microsoft Word), Inbound Refinery attempts to use the Convert to PDF using third-party applications method to convert the file to PDF as the primary web-viewable rendition.

    If this method fails, Inbound Refinery does not attempt the Convert to PDF using Outside In method. However, if you send a JustWrite file to the refinery for conversion, this file format is not supported for conversion to PDF using the Convert to PDF using third-party applications method, so Inbound Refinery will check to see if this format is supported by the Convert to PDF using Outside In method. Because this format is supported by Outside In, Inbound Refinery will attempt to convert the file to PDF using Outside In.

  4. Click Update to save your changes.
  5. When using the Convert to PDF using Third-Party Applications method or the Convert to PDF using Outside In method, click the corresponding PDF Web-Viewable Options button.
  6. On the PDF Options page, set your PDF options, and click Update to save your changes.
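
The method-selection rule described in step 3 can be sketched as follows. This is a minimal model for illustration only, not an Inbound Refinery API: the `Method` class, the format labels `word` and `justwrite`, and the assumption that Outside In also supports the Word format are all hypothetical. The sketch shows that only the first selected method that supports the format is tried, and that a failure does not trigger a later method.

```python
class Method:
    """One selected conversion method (hypothetical model, not a product API)."""

    def __init__(self, name, supported_formats, convert):
        self.name = name
        self.supported_formats = set(supported_formats)
        self.convert = convert  # callable(file_format) -> True on success


def pick_method(selected, file_format):
    """Return the first selected method that supports file_format, or None."""
    for method in selected:
        if file_format in method.supported_formats:
            return method
    return None


def convert_to_pdf(selected, file_format):
    """Try exactly one method: the first one that supports the format.

    A failed conversion is not retried with a later method.
    Returns (method name or None, success flag).
    """
    method = pick_method(selected, file_format)
    if method is None:
        return None, False
    return method.name, method.convert(file_format)


# Assumed setup: the third-party method supports Word but fails here,
# and Outside In supports both formats (an assumption for the example).
third_party = Method("third-party applications", {"word"}, lambda fmt: False)
outside_in = Method("Outside In", {"word", "justwrite"}, lambda fmt: True)
selected = [third_party, outside_in]

print(convert_to_pdf(selected, "word"))       # third-party is tried and fails; no fallback
print(convert_to_pdf(selected, "justwrite"))  # third-party is skipped; Outside In is used
```

In this model the order of `selected` stands in for the order in which the methods are checked on the Primary Web-Viewable Rendition page.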

23.1.2.3 Installing a Distiller Engine and PDF Printer

When converting documents to PDF using WinNativeConverter, a distiller engine and PDF printer must be obtained, installed, and configured. This is not necessary when converting to PDF using Outside In PDF Export to open and save documents to PDF.

WinNativeConverter can use several third-party applications to create PDF files of content items. In most cases, a third-party application that can open and print the file is used to print the file to PostScript, and then the PostScript file is converted to PDF using the configured PostScript distiller engine. In some cases, WinNativeConverter can use a third-party application to convert a file directly to PDF.

Note:

A distiller engine is not provided with Inbound Refinery. You must obtain a distiller engine of your choice. The chosen distiller engine must be able to execute conversions from the command line. The procedures in this section use AFPL Ghostscript as an example. This is a free, robust distiller engine that performs both PostScript to PDF conversion and optimization of PDF files during or after conversion.

To install the PDF printer:

  1. Obtain and install a distiller engine on the computer where Inbound Refinery has been deployed.
  2. Start the SystemProperties utility:
    • Microsoft Windows: Choose Start then Programs then Oracle Content Server. Choose refinery_instance then Utilities then System Properties.

  3. Open the Printer tab.
  4. Click Browse next to the Printer Information File field and navigate to the printer information file installed with your distiller engine.
  5. Enter a name for the printer in the Printer Name field.
  6. Enter the name of the printer driver in the Printer Driver Name field. This name should match the name used in the printer driver information file.
  7. Enter the port path in the Printer File Port Path field. For example, c:\temp\idcout.ps
  8. Click Install Printer and follow the printer install instructions when prompted.

    Note:

    After a printer is installed, the fields on the System Properties Printer tab are disabled. If the installed printer is deleted, the Printer tab is enabled again and the printer must be reinstalled.

  9. Click OK to apply the change and exit System Properties.

23.1.2.4 Configuring Third‐Party Application Settings

To change third‐party application settings:

  1. Log into the refinery.
  2. Select Conversion Settings then Third‐Party Application Settings.
  3. On the Third-Party Application Settings page, click Options for the third‐party application.
  4. Change the third‐party application options.
  5. Click Update to save your changes.

23.1.2.5 Configuring Timeout Settings for PDF Conversions

To configure timeout settings for PDF file generation:

  1. Log into the refinery.
  2. Select Conversion Settings then Timeout Settings.
  3. On the Timeout Settings page, enter the Minimum (in minutes), Maximum (in minutes), and Factor for the following conversion operations:
    • Native to PostScript: the stage in which the original (native) file is converted to a PostScript (PS) file.

    • PostScript to PDF: the stage in which the PS file is converted to a Portable Document Format (PDF) file.

    • FrameMaker to PostScript: these values apply to the conversion of Adobe FrameMaker files to PS files.

    • PDF to Post Production: the stage in which any processing is performed after the file has been converted to PDF format.

  4. Click Update to save your changes.

23.1.2.6 Setting Margins When Using Outside In

Inbound Refinery includes Outside In version 8.3.2. When using Outside In to convert graphics to PDF, you can set the margins for the generated PDF from 0–4.23 inches or 0–10.76 cm. By default, Inbound Refinery uses 1‐inch margins on the top, bottom, right, and left.

To adjust these margins:

  1. Use a text editor to open the intradoc.cfg file located in the refinery DomainDir/ucm/ibr/bin directory.
  2. Change the following settings:
    OIXTopMargin=
    OIXBottomMargin=
    OIXLeftMargin=
    OIXRightMargin=
    
  3. To change the margin units from inches to centimeters, set the following:
     OIXMarginUnitInch=false
    
  4. Save your changes to the intradoc.cfg file.
  5. Restart the refinery.
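
As a rough aid for the procedure above, the following sketch builds the margin lines for intradoc.cfg and rejects values outside the range stated in this section. The setting names come from this section; the helper name `margin_settings` and the assumption that values are written as plain decimal numbers are illustrative only, so check the result against your own configuration before using it.

```python
# Upper limits quoted in this section (inches and centimeters).
MAX_MARGIN = {"in": 4.23, "cm": 10.76}


def margin_settings(top, bottom, left, right, unit="in"):
    """Return intradoc.cfg lines for the four OIX margin settings.

    Raises ValueError if the unit is unknown or a margin is out of range.
    Assumes values are written as plain decimal numbers (not confirmed here).
    """
    if unit not in MAX_MARGIN:
        raise ValueError(f"unit must be 'in' or 'cm', got {unit!r}")
    values = {
        "OIXTopMargin": top,
        "OIXBottomMargin": bottom,
        "OIXLeftMargin": left,
        "OIXRightMargin": right,
    }
    lines = []
    for key, value in values.items():
        if not 0 <= value <= MAX_MARGIN[unit]:
            raise ValueError(f"{key}={value} is outside 0-{MAX_MARGIN[unit]} {unit}")
        lines.append(f"{key}={value}")
    if unit == "cm":
        # Switch the margin unit from inches to centimeters.
        lines.append("OIXMarginUnitInch=false")
    return lines


print("\n".join(margin_settings(0.5, 0.5, 0.75, 0.75)))
```

The printed lines can be pasted into intradoc.cfg; the refinery still has to be restarted for the change to take effect.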

23.2 Managing Tiff Conversions

Tiff conversion enables the following functionality specific to TIFF (Tagged Image File Format) files:

  • Creation of a managed PDF file from a single or multiple-page TIFF file.

  • Creation of a managed PDF file from multiple TIFF files that have been compressed into a single ZIP file.

  • OCR (Optical Character Recognition) during TIFF-to-PDF conversion. This enables indexing of the text within checked-in TIFF files, so that users can perform full-text searches of these files.

The TiffConverter component is supported on Windows only. For information on file formats and languages that can be converted by PdfCompressor, see the documentation provided by CVISION.

Note:

The TiffConverter component requires CVISION CVista PdfCompressor to perform TIFF-to-PDF conversion with OCR. PdfCompressor is not provided with the TiffConverter component. You must obtain PdfCompressor from CVISION.

TIFF conversions require the following components to be installed and enabled on the specified server.

| Component Name | Component Description | Enabled on Server |
| --- | --- | --- |
| TiffConverter | Enables Inbound Refinery to convert single or multipage TIFF files to PDF complete with searchable text. | Inbound Refinery Server |
| TiffConverterSupport | Enables Content Server to support TIFF to PDF conversion. | Content Server |

23.2.1 Configuring Content Servers to Send Jobs for Tiff Conversion

File formats and conversion methods are used in Content Server to define how content items should be handled by Inbound Refinery and the conversion options. Installing and enabling the TiffConverterSupport component on a Content Server adds three TIFFConversion options on the File Formats Wizard page.

For a content item to be processed by Inbound Refinery, its file extension (for example, TIF or TIFF) must be mapped to a format name associated with the TIFFConversion conversion method. The conversion options added for Tiff Converter are not mapped automatically; you must map them manually, using either the File Formats Wizard or the Configuration Manager.

23.2.1.1 Using the File Formats Wizard for Tiff Conversion

File formats and conversion methods for Inbound Refinery can be managed in Content Server using the File Formats Wizard. You can convert TIFF to PDF with OCR or TIFF to PDF without OCR.

To convert TIFF to PDF with OCR:

  1. Log in to the Content Server as an administrator.

  2. From the main menu, choose Administration then Refinery Administration then File Formats Wizard.

  3. On the File Formats Wizard page, select tiff, tif to enable Convert TIFF to PDF (TIFFConversion) in the File Type (conversion name) field menu. Selecting this menu item maps the TIF and TIFF file extensions to the image/tiff file format and associates the image/tiff file format with the TIFFConversion conversion method. When TIF or TIFF files are checked into the Content Server, they are processed by the refinery using Tiff Converter and converted to PDF with OCR. Deselecting this check box sets the image/tiff file format to PASSTHRU, so TIF and TIFF files are not processed by Inbound Refinery.

    Note:

    The TIFFConversion conversion method is only available when the TiffConverterSupport component has been installed and enabled, and the Content Server has been restarted.

  4. If you have added tifz and tiz file extensions using the Configuration Manager, you can select tifz, tiz on the File Formats Wizard page to enable application/zip options in the File Type (conversion name) field menu.

    • Compressed Tiff to PDF (tifz, tiz): Selecting this menu item maps the TIFZ and TIZ file extensions to the graphic/tiff-x-compressed file format and associates the graphic/tiff-x-compressed file format with the TIFFConversion conversion method. When TIFZ or TIZ files are checked into the Content Server, they are processed by the refinery using Tiff Converter and converted to PDF with OCR. Deselecting this check box sets the graphic/tiff-x-compressed file format to PASSTHRU, so TIFZ and TIZ files are not processed by Inbound Refinery.

    • Compressed Tiff to PDF (zip): Selecting this menu item maps the ZIP file extension to the application/zip file format and associates the application/zip file format with the TIFFConversion conversion method. When ZIP files are checked into the Content Server, they are processed by the refinery using Tiff Converter and converted to PDF with OCR. Deselecting this check box sets the application/zip file format to PASSTHRU, so that ZIP files are not processed by Inbound Refinery.

  5. Click Update to save all changes.

To convert TIFF to PDF without OCR:

  1. Log in to the Content Server as an administrator.

  2. From the main menu, choose Administration then Refinery Administration then File Formats Wizard.

  3. On the File Formats Wizard page, select tiff, tif to enable Convert TIFF to PDF (Direct PDFExport) in the File Type (conversion name) field menu. Selecting this menu item maps the TIF and TIFF file extensions to the image/tiff file format and associates the image/tiff file format with the Direct PDFExport conversion method. When TIF or TIFF files are checked into the Content Server, they are processed by the refinery using Outside In PDF Export and converted to PDF without OCR.

    Note:

    When the Convert TIFF to PDF (Direct PDFExport) option is used, only the metadata in the resulting PDF is searchable; the text is not searchable.

  4. Click Update to save all changes.

23.2.1.2 Using the Configuration Manager for Tiff Conversion

File formats and conversion methods for Inbound Refinery can be managed in Content Server using the Configuration Manager. To make changes:

  1. Log in to Content Server as an administrator.

  2. From the main menu, choose Administration, then Admin Applets.

  3. From the Applets list, choose Configuration Manager.

    The Configuration Manager applet is started.

  4. In the Configuration Manager applet, choose Options then File Formats.

  5. To enable single, unzipped TIFF files (TIF and TIFF) to be processed by Inbound Refinery:

    1. In the File Formats section, check that the image/tiff file format is added and associated with the TIFFConversion conversion method.

      Note:

      The TIFFConversion conversion method is only available when the TiffConverterSupport component has been installed and enabled, and the Content Server has been restarted.

    2. In the File Extensions section, check that the tif and tiff file extensions are added and mapped to the image/tiff file format.

  6. To enable TIFF files that have been compressed into a single TIFZ or TIZ file to be processed by Inbound Refinery:

    1. In the File Formats section, check that the graphic/tiff-x-compressed file format is added and associated with the TIFFConversion conversion method.

    2. In the File Extensions section, check that the tifz and tiz file extensions are added and mapped to the graphic/tiff-x-compressed file format.

  7. To enable TIFF files that have been compressed into a single ZIP file to be processed by Inbound Refinery:

    1. In the File Formats section, check that the application/zip file format is added and associated with the TIFFConversion conversion method.

    2. In the File Extensions section, check that the zip file extension is added and mapped to the application/zip file format.
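
The mappings checked in steps 5 through 7 can be pictured as two lookup tables: file extension to file format, and file format to conversion method. The sketch below models that chain using the format and conversion names from this section. It is an illustration of the data model only, not Content Server code; the function name `conversion_for` and the choice to treat unmapped extensions as pass-through are assumptions of the sketch.

```python
import os

PASSTHRU = "PASSTHRU"

# File extension -> file format (the File Extensions section).
EXTENSION_TO_FORMAT = {
    "tif": "image/tiff",
    "tiff": "image/tiff",
    "tifz": "graphic/tiff-x-compressed",
    "tiz": "graphic/tiff-x-compressed",
    "zip": "application/zip",
}

# File format -> conversion method (the File Formats section).
FORMAT_TO_CONVERSION = {
    "image/tiff": "TIFFConversion",
    "graphic/tiff-x-compressed": "TIFFConversion",
    "application/zip": "TIFFConversion",
}


def conversion_for(filename):
    """Resolve a file name to a conversion method through both tables.

    Extensions or formats with no mapping fall back to PASSTHRU
    (an assumption of this sketch).
    """
    extension = os.path.splitext(filename)[1].lstrip(".").lower()
    file_format = EXTENSION_TO_FORMAT.get(extension)
    return FORMAT_TO_CONVERSION.get(file_format, PASSTHRU)


for name in ("scan.TIF", "batch.tiz", "invoices.zip", "notes.txt"):
    print(name, "->", conversion_for(name))
```

Changing a single entry in `FORMAT_TO_CONVERSION` to `PASSTHRU` mirrors what happens when a format is no longer associated with the TIFFConversion conversion method.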

23.2.1.3 Tips for Processing Zip Files in Tiff Conversion

The ZIP file extension might be used in multiple ways in your environment. For example, you might be checking in:

  • Multiple TIFF files compressed into a single ZIP file for Inbound Refinery to convert to a single PDF file with OCR.

  • Multiple file types compressed into a single ZIP file that should not be processed (the ZIP file should be passed through in its native format).

When using the ZIP file extension in multiple ways, Oracle recommends configuring the Content Server to allow the user to choose how ZIP files are processed at check-in. This is referred to as Allow override format on check-in. To enable this Content Server functionality:

  1. Log in to Content Server as an administrator.
  2. From the main menu, choose Administration, then Admin Server then General Configuration.
  3. Enable the Allow override format on checkin setting and click Save.
  4. Restart the Content Server.
  5. Using the Configuration Manager, set up the file formats:
    • Map the application/zip file format to the TIFFConversion conversion method. This option can then be selected to send ZIP files containing TIFF files to Inbound Refinery. For a description, enter Zipped Tiff to PDF.

    • Set up an alternate file format, for example called application/zip-passthru, mapped to PassThru for zipped files that should not be converted. For a description, enter Zip Passthru.

      Note:

      The Content check-in Form page lists file formats by their description.

  6. Map the ZIP file extension to the file format that will be used most commonly. This will be the default conversion method for ZIP files.
  7. When a user checks in a ZIP file, the user can override the default conversion method by selecting any of the conversion methods that are set up.

Note:

If you are using the upload applet to check in multiple files, the files are compressed into a single ZIP file before being checked in. In this case Oracle also recommends enabling Allow override format on check-in so the user can choose how the ZIP file is processed when uploading multiple TIFFs.

Tip:

When CVista PdfCompressor merges multiple TIFF files from a compressed ZIP file, the input files are added in lexicographic order according to the standard ASCII character set.
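
The effect of ASCII lexicographic ordering on page order can be previewed with a short sketch. Python's built-in `sorted` compares strings by code point, which matches ASCII order for ASCII file names. The file names are invented for the example, and the sketch models only the ordering rule stated in the tip, not PdfCompressor itself.

```python
def merge_order(names):
    """Return names in code-point order (ASCII order for ASCII names)."""
    return sorted(names)


pages = ["page10.tif", "page2.tif", "Page1.tif", "page1.tif"]
print(merge_order(pages))
# ['Page1.tif', 'page1.tif', 'page10.tif', 'page2.tif']

# Zero-padded numbers keep numeric and lexicographic order aligned.
padded = [f"page{n:03d}.tif" for n in (10, 2, 1)]
print(merge_order(padded))
# ['page001.tif', 'page002.tif', 'page010.tif']
```

Uppercase letters sort before lowercase letters, and page10 sorts before page2, so zero-padded, consistently cased file names are a simple way to get the intended page order in the merged PDF.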

23.2.2 Configuring Tiff Conversion Settings

23.2.2.1 Setting Accepted Conversions

When installed on the refinery, the TiffConverter component adds the TIFFConversion option to the Conversion Listing page. This conversion option must be enabled for the refinery to perform conversions on items submitted by the Content Server.

23.2.2.2 Changing Timeout Settings

The timeout settings should reflect the processing time required for the size of TIFF files that are commonly checked in to the Content Server. This is highly variable depending on CPU power and TIFF complexity. Perform these tasks to determine the appropriate timeout values for TIFF files:

  • Run and time several representative Inbound Refinery jobs using CVista PdfCompressor alone (without the Inbound Refinery).

  • Examine the document history information and evaluate the required processing time.

  • Change Inbound Refinery timeout settings accordingly.

    Note:

    Information about Tiff Converter timeouts is recorded in the Inbound Refinery and agent logs.
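
One way to time representative jobs outside the refinery is a small wrapper that runs a command and records how long it takes. The sketch below is generic: it does not assume any particular PdfCompressor executable name or command-line syntax, and the stand-in command only starts the Python interpreter so that the example runs anywhere. Replace `sample_cmd` with the actual command line used at your site.

```python
import subprocess
import sys
import time


def time_command(cmd):
    """Run cmd (a list of arguments) and return (exit code, elapsed seconds)."""
    start = time.perf_counter()
    completed = subprocess.run(cmd, check=False)
    return completed.returncode, time.perf_counter() - start


def summarize(durations):
    """Summarize a non-empty list of durations in seconds."""
    return {
        "runs": len(durations),
        "min": min(durations),
        "max": max(durations),
        "mean": sum(durations) / len(durations),
    }


# Stand-in command (hypothetical placeholder for a real conversion job).
sample_cmd = [sys.executable, "-c", "pass"]
durations = [time_command(sample_cmd)[1] for _ in range(3)]
print(summarize(durations))
```

The longest observed duration for typical files gives a starting point for the timeout values, with headroom added for larger or more complex TIFF files.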

To configure timeout settings for Tiff to PDF file generation:

  1. Log into the refinery.
  2. Choose Settings then Timeouts.
  3. On the Timeouts page, enter the Minimum (in minutes), the Maximum (in minutes), and the Factor for the Tiff to PDF Conversion. This is the stage in which the original (native) TIFF file is converted to a Portable Document Format (PDF) file.

    For more information about how timeout settings are calculated and examples, see Configuring Inbound Refinery.

  4. Click Update to save all changes.

23.2.3 Configuring CVista PdfCompressor

23.2.3.1 Changing PdfCompressor Settings

These options are specific to CVista PdfCompressor. If the TiffConverter component is not installed, the CVista PdfCompressor Options are not available.

To change the PdfCompressor settings:

  1. Log in to the refinery.
  2. Choose Conversion Settings then Third-Party Application Settings.
  3. On the Third-Party Application Settings page, click Options for CVista PdfCompressor.
  4. On the CVista PdfCompressor Options page, set the path to the location of the CVista PdfCompressor executable in the appropriate text box.
  5. Enter the string of parameter values in the parameters option text box. A default option string is set on installation of the TiffConverter component.
  6. Click Update to save the settings.

The following recommended parameter strings should produce optimal results for each given scenario. If these settings do not produce the intended results, modify these strings by removing or appending settings. For more information on these and other available settings, see the online help provided with CVista PdfCompressor (especially "Appendix A: Command-Line Flags for Compression").

Default CVista PdfCompressor Parameters - OCR Enabled

A default string is set when the TiffConverter component is installed unless a string already exists (if the string was set using a previous version of Tiff Converter). The default string has been optimized for typical PdfCompressor usage with OCR enabled:

-m -c ON -colorcomptype 2 -mrcquality 5 -mrcColorCompType 0 -linearize -o -ocrmode 1 -ot 120 -qualityc 75 -qualityg 75 -rscdwndpi 300 -rsgdwndpi 300 -rsbdwndpi 300 -cconc -ccong

CVista PdfCompressor Parameters - Horizontal and Vertical OCR Enabled

The following string can be used for typical usage with OCR when both vertical and horizontal text in the same image must be processed. Compared with the default string, it adds -ocrtwod and -lsize 25:

-m -c ON -colorcomptype 2 -mrcquality 5 -mrcColorCompType 0 -linearize -o -ocrmode 1 -ot 120 -ocrtwod -lsize 25 -qualityc 75 -qualityg 75 -rscdwndpi 300 -rsgdwndpi 300 -rsbdwndpi 300 -cconc -ccong

CVista PdfCompressor Parameters - No OCR

The following string can be used for simple conversion (without OCR):

-m -c ON -colorcomptype 2 -mrcquality 5 -mrcColorCompType 0 -linearize -qualityc 75 -qualityg 75 -rscdwndpi 300 -rsgdwndpi 300 -rsbdwndpi 300 -cconc -ccong
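
The three recommended strings share most of their flags and differ only in two groups. The sketch below composes them from those shared parts, written with plain ASCII hyphens, so that the differences are easy to see. The grouping and the names `OCR` and `TWO_DIRECTION` are inferred by comparing the strings in this section; they are labels for this example and do not describe what each flag does. See the PdfCompressor help for flag meanings.

```python
# Flags common to all three recommended strings.
BASE_HEAD = "-m -c ON -colorcomptype 2 -mrcquality 5 -mrcColorCompType 0 -linearize"
BASE_TAIL = ("-qualityc 75 -qualityg 75 -rscdwndpi 300 -rsgdwndpi 300 "
             "-rsbdwndpi 300 -cconc -ccong")

# Present in the OCR strings, absent from the no-OCR string.
OCR = "-o -ocrmode 1 -ot 120"

# Present only in the horizontal-and-vertical OCR string.
TWO_DIRECTION = "-ocrtwod -lsize 25"


def parameter_string(ocr=True, two_direction=False):
    """Compose one of the recommended parameter strings."""
    if two_direction and not ocr:
        raise ValueError("two_direction applies only when ocr is enabled")
    parts = [BASE_HEAD]
    if ocr:
        parts.append(OCR)
    if two_direction:
        parts.append(TWO_DIRECTION)
    parts.append(BASE_TAIL)
    return " ".join(parts)


print(parameter_string())                    # default, OCR enabled
print(parameter_string(two_direction=True))  # horizontal and vertical OCR
print(parameter_string(ocr=False))           # no OCR
```

Keeping the shared parts in one place makes it less likely that a hand-edited string drops or duplicates a flag when you switch between scenarios.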

23.2.3.2 Configuring CVista PdfCompressor OCR Languages

Note:

Changes made in the CVista PdfCompressor user interface do not affect how CVista PdfCompressor functions when called by Tiff Converter.

By default, CVista PdfCompressor uses an English OCR dictionary when performing OCR on TIFF files. However, CVista PdfCompressor can perform OCR on several other languages.

To set up multiple OCR languages and enable the user to choose the OCR language at check-in:

Note:

If the following method is used, language parameters should not be specified or passed to the refinery via the CVista PdfCompressor Options Page.

  1. Obtain the appropriate current language files by contacting CVISION:

    • A lng file is required for each language.

    • Czech, Polish, and Hungarian also require the latin2.shp file.

    • Russian also requires the cyrillic.shp file.

    • Greek also requires the greek.shp file.

    • Turkish also requires the turkish.shp file.

  2. Place the CVISION language files in the CVista installation directory. The default location is C:\Program Files\CVision\PdfCompressorxx\ where xx stands for the version number of PdfCompressor.

  3. Log in to Content Server as an administrator.

  4. From the main menu, choose Administration then Admin Applets.

  5. From the Applets list, choose Configuration Manager.

  6. On the Configuration Manager page, click the Information Fields tab.

  7. If the OCRLang information field has been added, skip this step. If it has not been added:

    1. In the Field Info section, click Add.

    2. On the Add Custom Info page, in the Field Name field, enter OCRLang. This creates a new information field for CVista language conversion options.

      Note:

      Enter this field name exactly.

    3. Click OK.

    4. On the Add Custom Info Field page, in the Field Caption field, enter the descriptive caption to be displayed on the Content check-in Form page. For example, OCR Language.

    5. From the Field Type list, choose Text.

    6. Select the Enable Option List check box.

    7. From the Option List Type list, choose Select List Validated.

    8. In the Use option list field, enter xOCRLangList.

    9. Click Edit next to the Use Option List field.

    10. On the Option List page, enter the CVista OCR languages to present as options. The following language names are valid options.

      Note:

      You can use either the English language name or the native equivalent (if listed). However, you must enter the language options exactly as they appear in the following table.

      | English | Native |
      | --- | --- |
      | Czech | - |
      | Danish | Dansk |
      | Dutch | Nederlands |
      | English | - |
      | Finnish | Suomi |
      | French | Français |
      | German | Deutsch |
      | Greek | - |
      | Hungarian | Magyar |
      | Italian | Italiano |
      | Norwegian | Norsk |
      | Polish | Polski |
      | Portuguese | Português |
      | Russian | - |
      | Spanish | Español |
      | Swedish | Svenska |
      | Turkish | - |

    11. Select the Ignore Case check box.

    12. Click OK.

    13. In the Default Value field, enter the default OCR language option.

    14. Click OK to save the settings and return to the Information Fields tab.

    15. Click Update Database Design.

  8. If the OCRLang information field has been added, but changes must be made to the language option list or the default language:

    1. In the Field Info section, select OCRLang and click Edit.

    2. On the Add Custom Info page, click Edit next to the Use Option List field.

    3. On the Option List page, delete any unused CVista OCR languages.

    4. Click OK.

    5. In the Default Value field, enter the default OCR language option.

    6. Click OK to save the settings and return to the Information Fields tab.

  9. Close the Configuration Manager applet. When a user checks in a TIFF file, the user can override the default OCR language by selecting any of the OCR languages that were set up.
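
Before entering the option list, it can help to check the planned language names against the table in step 7 and to work out which additional shape files are needed. The sketch below encodes only what this section states: the valid English and native names, and the extra .shp file for certain languages. The function names are hypothetical, the name check is deliberately case-sensitive, and the lng file names are not modeled because they are not given here.

```python
# English name -> native equivalent (None where the table lists none).
ENGLISH_TO_NATIVE = {
    "Czech": None, "Danish": "Dansk", "Dutch": "Nederlands", "English": None,
    "Finnish": "Suomi", "French": "Français", "German": "Deutsch",
    "Greek": None, "Hungarian": "Magyar", "Italian": "Italiano",
    "Norwegian": "Norsk", "Polish": "Polski", "Portuguese": "Português",
    "Russian": None, "Spanish": "Español", "Swedish": "Svenska",
    "Turkish": None,
}

# Languages that need a shape file in addition to their lng file.
SHAPE_FILES = {
    "Czech": "latin2.shp", "Polish": "latin2.shp", "Hungarian": "latin2.shp",
    "Russian": "cyrillic.shp", "Greek": "greek.shp", "Turkish": "turkish.shp",
}


def to_english(option):
    """Map a valid option (English or native name) to its English name."""
    if option in ENGLISH_TO_NATIVE:
        return option
    for english, native in ENGLISH_TO_NATIVE.items():
        if native == option:
            return english
    raise ValueError(f"{option!r} is not a listed OCR language name")


def required_shape_files(options):
    """Return the sorted set of extra .shp files for the given options."""
    english = {to_english(option) for option in options}
    return sorted({SHAPE_FILES[name] for name in english if name in SHAPE_FILES})


print(required_shape_files(["English", "Polski", "Magyar", "Russian"]))
# ['cyrillic.shp', 'latin2.shp']
```

A name that raises `ValueError` here is one that does not appear in the table and should be corrected before it is entered on the Option List page.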

23.3 Managing XML Conversions

XML conversions require the following components to be installed and enabled on the specified server.

Component Name | Component Description | Enabled on Server
XMLConverter | Enables Inbound Refinery to produce FlexionDoc and SearchML-styled XML as the primary web-viewable file or as independent renditions, and to use the Xalan XSL transformer to process XSL transformations. | Inbound Refinery Server
XMLConverterSupport | Enables Content Server to support XML conversions and XSL transformations. | Content Server

23.3.1 Configuring Content Servers to Send Jobs to Inbound Refinery

File extensions, file formats, and conversions are used in Content Server to define how content items should be processed by Inbound Refinery and its conversion add‐ons. Each Content Server must be configured to send files to refineries for conversion.

When a file extension is mapped to a file format and a conversion, files of that type are sent for conversion when they are checked into the Content Server. File extension, file format, and conversion mappings can be configured using either the File Formats Wizard or the Configuration Manager.

Most conversions required for Inbound Refinery are available by default in Content Server. In addition to the default conversions, the following conversions are added to the Content Server when the XMLConverterSupport component is installed.

Conversion | Description
FlexionXML | Converts files to XML using the FlexionDoc schema. It applies to file types other than the standard file types included in the list of conversions (for example, Word and PowerPoint). The standard file types do not need to be re-mapped to the FlexionXML conversion to be sent to a refinery for conversion to XML using FlexionDoc. This conversion is not available in the File Formats Wizard; it must be mapped using the Configuration Manager.
SearchML | Converts files to XML using the SearchML schema. It applies to file types other than the standard file types included in the list of conversions (for example, Word and PowerPoint). The standard file types do not need to be re-mapped to the SearchML conversion to be sent to a refinery for conversion to XML using SearchML. This conversion is not available in the File Formats Wizard; it must be mapped using the Configuration Manager.
XSLT Transformation | After XML Converter converts documents to the FlexionDoc schema, the XSLT conversion allows the resulting XML to be transformed into another XML schema specified by a developer.

Conversions available in the Content Server should match those available in the refinery. When a file format is mapped to a conversion in the Content Server, files of that format are sent for conversion on check-in. One or more refineries must be set up to accept that conversion.

Most conversions required for Inbound Refinery are available by default. In addition to the default conversions that can be accepted by a refinery, the FlexionXML and SearchML conversions are added to the refinery when the XMLConverter component is installed. The FlexionXML and SearchML conversions are accepted by default.
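The mapping chain described in this section (file extension to file format to conversion, followed by a refinery that accepts the conversion) can be pictured with a small lookup. This is an illustrative sketch, not Content Server code: the function names, the xyz extension, and the application/x-example format are hypothetical, and only the FlexionXML and SearchML conversion names come from this section.

```python
def conversion_on_checkin(file_name, extension_to_format, format_to_conversion):
    """Return the conversion mapped to a file, or None if it is not mapped."""
    if "." not in file_name:
        return None
    extension = file_name.rsplit(".", 1)[1].lower()
    file_format = extension_to_format.get(extension)
    return format_to_conversion.get(file_format)


def refinery_accepts(conversion, accepted_conversions):
    """A refinery can take the job only if it accepts that conversion."""
    return conversion in accepted_conversions


# Hypothetical mappings, as they might be set in the Configuration Manager.
EXTENSION_TO_FORMAT = {"xyz": "application/x-example"}
FORMAT_TO_CONVERSION = {"application/x-example": "FlexionXML"}
# Conversions accepted by a refinery with XMLConverter installed.
ACCEPTED_BY_REFINERY = {"FlexionXML", "SearchML"}

if __name__ == "__main__":
    conversion = conversion_on_checkin(
        "design.xyz", EXTENSION_TO_FORMAT, FORMAT_TO_CONVERSION
    )
    print(conversion, refinery_accepts(conversion, ACCEPTED_BY_REFINERY))
```

A file whose extension has no mapping yields None, which corresponds to a file that is not sent for conversion on check-in.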

23.3.2 Setting XML Files as the Primary Web‐Viewable Rendition

To set XML files as the primary web‐viewable rendition:

  1. Log into the refinery.
  2. Choose Conversion Settings then select Primary Web Rendition.
  3. On the Primary Web-Viewable Renditions page, select the Convert to XML option.
  4. Typically all other conversion options should be cleared. Inbound Refinery attempts to convert each incoming file based on the native file format. If the format is not supported for conversion by the first selected method, Inbound Refinery checks if the next selected method supports the format, and so on. Inbound Refinery attempts to convert the file using the first selected method that supports the conversion of the format.

    For example, suppose you select both the Convert to PDF using third-party applications option and the Convert to XML option. The refinery attempts to convert any supported formats to PDF using the Convert to PDF using third-party applications method. Whether or not this method fails, Inbound Refinery does not attempt another conversion method for these formats. Therefore, you should typically select only the Convert to XML option to create XML files as the primary web-viewable rendition.

  5. Click Update to save all changes.
  6. Click XML Options.
  7. On the XML Options page, set XML options, and click Update to save the changes.
  8. Note the following important considerations:
    • If you want to adjust the default settings for the FlexionDoc and SearchML options, you can specify option settings in the intradoc.cfg file located in the refinery DomainDir/ucm/ibr/bin directory. For a complete description of available FlexionDoc and SearchML options, see the xx.cfg and sx.cfg files located in the refinery IdcHomeDir/components/XMLConverter/resources directory. You must restart your refinery after making changes to the intradoc.cfg file.

    • FlexionDoc and SearchML documentation files are installed with the XMLConverter component and located in the refinery IdcHomeDir/components/XMLConverter directory.
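The selection rule in step 4 (the first selected method that supports the format is used, with no fallback to a later method) can be sketched as follows. This is a simplified model under stated assumptions: the function name is hypothetical, the per-method format sets are invented for illustration, and the list order stands in for the order in which the refinery evaluates the selected options.

```python
def pick_conversion_method(file_format, selected_methods, supported_formats):
    """Return the first selected method that supports the format.

    No later method is tried once a match is found, even if the chosen
    conversion subsequently fails. Returns None if no method matches.
    """
    for method in selected_methods:
        if file_format in supported_formats.get(method, set()):
            return method
    return None


# Option names are from the example in step 4; the format sets are
# illustrative assumptions, not the refinery's real support matrix.
SELECTED = ["Convert to PDF using third-party applications", "Convert to XML"]
SUPPORTED = {
    "Convert to PDF using third-party applications": {"doc", "ppt"},
    "Convert to XML": {"doc", "ppt", "txt"},
}

if __name__ == "__main__":
    for fmt in ("doc", "txt"):
        print(fmt, "->", pick_conversion_method(fmt, SELECTED, SUPPORTED))
```

In this model a doc file never reaches Convert to XML while the PDF option is also selected, which is why clearing the other options is the usual recommendation.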

23.3.3 Setting XML Files as an Additional Rendition

To set XML files as an additional rendition:

  1. Log into the refinery.
  2. From Conversion Settings, select Additional Renditions.

    The Additional Renditions page opens.

  3. Select the Create XML renditions for all supported formats option. Inbound Refinery will generate an XML file in addition to other renditions such as PDF files.

    When the generated XML files are delivered back to a Content Server, the XML files are included in the full-text index. However, if other web‐viewable files are generated in addition to the XML file, the XML file is not used as the primary web‐viewable rendition. For example, if Inbound Refinery generates both a PDF file and an XML file, the PDF file would be used as the primary web‐viewable rendition. XML renditions stored in the Content Server weblayout directory can be recognized by the characters @x in their file names. For example, the file Report2001@x~2.xml would be an XML rendition.

  4. Click Update to save your changes.
  5. Click XML Options.
  6. On the XML Options page, set your XML options, and click Update to save your changes.
  7. Note the following important considerations:
    • If you want to adjust the default settings for the FlexionDoc and SearchML options, you can specify option settings in the intradoc.cfg file located in the refinery DomainDir/ucm/ibr/bin directory. You must restart your refinery after making changes to the intradoc.cfg file.

    • For a complete description of available FlexionDoc and SearchML options, see the xx.cfg and sx.cfg files located in the refinery IdcHomeDir/components/XMLConverter/resources directory. These configuration files are for reference only and should not be modified.

    • FlexionDoc and SearchML schema code and documentation files are installed with the XMLConverter component into the refinery IdcHomeDir/components/XMLConverter directory.
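When inspecting the weblayout directory, the @x naming hint described in step 3 can be used to pick out XML renditions. The following is a rough sketch: the function name is hypothetical, and requiring an .xml extension in addition to the @x characters is an assumption based on the single example file name given above.

```python
from pathlib import PurePosixPath


def is_xml_rendition(file_name):
    """Guess whether a weblayout file name denotes an XML rendition.

    Assumption: the name contains "@x" before the extension and the
    extension is .xml, as in the documented example.
    """
    path = PurePosixPath(file_name)
    return path.suffix.lower() == ".xml" and "@x" in path.stem


if __name__ == "__main__":
    for name in ("Report2001@x~2.xml", "Report2001~2.pdf"):
        print(name, is_xml_rendition(name))
```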

23.3.4 Setting Up XSL Transformation

Inbound Refinery uses the Xalan XSLT processor and the SAX validator built into the Java virtual machine running Inbound Refinery. To enable transformation, the XMLConverter component must be installed and enabled on the refinery server and the XMLConverterSupport component must be installed and enabled on the Content Server.

To turn on XSL Transformation:

  1. Log into the refinery server.

  2. Do one of the following:

    • If the XML rendition is to be the primary web-viewable file, click Conversion Settings then Primary Web Rendition. On the Primary Web-Viewable Rendition page, enable Convert to XML.

    • If the XML is to be an additional rendition, click Conversion Settings then Additional Renditions. On the Additional Renditions page, enable Create XML renditions for all supported formats.

  3. Click XML Options.

  4. On the XML Options page, enable Process XSLT Transformation and select the XML schema to use from the following options:

    • Produce FlexionDoc XML

    • Produce SearchML

  5. Click Update to save all changes or Reset to revert to the last saved settings.

To perform XSL transformations, Inbound Refinery requires an XSL template that has been checked in to Content Server and can be applied during the transformation. To check in an XSL template to Content Server:

  1. Create an XSL file. The XSL file specifies how an XML file with a specific Content Type will be transformed to a new XML file. A DTD or schema can be specified for validation and stored in the Content Server, but is not required.

  2. Check the XSL file into the Content Server and associate it to a Content Type.

    1. In the Content Check-In Form, select the Content Type from the Type list.

    2. Enter the Content ID according to the following convention:

      Content Type.xsl

      For example, if the Content Type is Documents, enter documents.xsl.

    3. Enter the XSL file as the Primary File.

    4. Check that the Security Group matches any DTD/schema files in the Content Server associated with the XSL file and the native files that are checked into the Content Server.

    5. Click Check In.

    When files are checked in with this Content Type, and a FlexionDoc/SearchML XML file is generated by XML Converter or the checked-in file is XML, this XSL file will be used for XSL transformation to a new XML document.

  3. Repeat these steps for each Content Type to post-process to XML.
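The Content ID convention from step 2 can be expressed as a one-line helper when preparing templates for several Content Types. This is a sketch only: the function name is hypothetical, and lowercasing the Content Type is inferred from the single Documents example above rather than from a stated rule.

```python
def xsl_content_id(content_type):
    """Build the Content ID for an XSL template from its Content Type.

    Assumption: the Content Type is lowercased, as in the documented
    example (Documents -> documents.xsl).
    """
    return content_type.strip().lower() + ".xsl"


if __name__ == "__main__":
    print(xsl_content_id("Documents"))
```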

23.3.4.1 XSLT Errors

When a validation fails, Inbound Refinery collects the errors from the SAX validation engine, creates an HCSP error page, and attempts to check in the page to Content Server.

For the refinery to check in an error page, you must manually set up an outgoing provider on Inbound Refinery to the Content Server. The name of the Inbound Refinery provider must match the agent name. For example, if Inbound Refinery is named production_ibr and it is converting files for a Content Server named production_cs, then an outgoing provider named production_cs must be created on the production_ibr Inbound Refinery.

To set up a criteria workflow to be notified regarding XSL transformation failures:

  1. From the main menu, choose Administration then Admin Applets.
  2. From the Applet list, choose Workflow Admin.
  3. Add a criteria workflow for notification of XSLT transformation failures.
  4. Add a workflow step with the following properties:
    • Users: specify the users that should be notified.

    • Exit Conditions: select At least this many reviewers, and set the value to 0.

    • Events: For the Entry event, add the following Custom Script Expression:

      <$if dDocTitle like "*XSLT Error"$>
      <$else$>
      <$wfSet("wfJumpEntryNotifyOff", "1")$>
      <$wfExit(0,0)$>
      <$endif$>

For details about using workflows, see Managing Workflows.
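The intent of the Entry event script above can be restated outside Idoc Script to make the branching easier to follow. The sketch below is a Python model, not something Content Server runs: the function name and the returned dictionary are hypothetical, the pattern match is case-sensitive here (how the Idoc Script like operator treats case is not confirmed by this chapter), and it assumes that wfJumpEntryNotifyOff suppresses the entry notification and that wfExit(0,0) exits the workflow.

```python
from fnmatch import fnmatchcase


def entry_event(doc_title):
    """Model the Entry event: keep XSLT error pages, release everything else."""
    if fnmatchcase(doc_title, "*XSLT Error"):
        # Empty branch in the original script: the item stays in the
        # step, so the listed users are notified.
        return {"notify": True, "exit_workflow": False}
    # Corresponds to wfSet("wfJumpEntryNotifyOff", "1") and wfExit(0,0).
    return {"notify": False, "exit_workflow": True}


if __name__ == "__main__":
    print(entry_event("report XSLT Error"))
    print(entry_event("Quarterly report"))
```

Only items whose title ends in XSLT Error remain in the workflow step, so reviewers are notified about transformation failures and nothing else.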

23.4 Converting Microsoft Office Files to HTML

Inbound Refinery can convert native Microsoft Office files to HTML by using the native Microsoft Office applications installed on a Windows system. Content Server can be installed on either a Windows or UNIX platform, but for Microsoft Office to HTML conversions to work, Inbound Refinery must be configured on the Windows system where the Microsoft Office native applications are installed.

HTML conversion automates opening Microsoft Office files in their native application, saving them as HTML pages, and collecting the HTML output into a compressed ZIP file that is returned to Content Server.

HTML conversion can process the following types of files:

  • Microsoft Word 2003 through 2010

  • Microsoft Excel 2003 through 2010

  • Microsoft PowerPoint 2003 through 2010

  • Microsoft Visio 2007

When WinNativeConverter is enabled to work with Inbound Refinery, native Microsoft Office files checked into Content Server are sent to Inbound Refinery for conversion. Inbound Refinery automates the process of converting the files to HTML using the native Microsoft Office applications. If a single HTML page is returned to Content Server, it is used as the web-viewable file. If conversion results in multiple HTML pages, the following files are returned to Content Server:

  • An HCSP page as the primary web-viewable rendition

  • A ZIP file that includes the HTML output from the Office application

  • Optionally, a thumbnail rendition of the native Microsoft Office file

When a user clicks the web-viewable link in Content Server for a document that Inbound Refinery converted to multiple HTML pages, the HCSP page redirects the server to the HTML rendition.
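The two outcomes described above (a single HTML page versus multiple pages) can be summarized in a short sketch. It is a descriptive model under stated assumptions, not refinery code: the function name and the labels in the returned list are hypothetical, and the single-page case lists only the HTML page because the text above mentions the optional thumbnail only for the multi-page case.

```python
def files_returned(html_page_count, thumbnail=False):
    """List what is returned to Content Server after an HTML conversion."""
    if html_page_count < 1:
        raise ValueError("a successful conversion yields at least one page")
    if html_page_count == 1:
        # A single page is used directly as the web-viewable file.
        return ["HTML page (web-viewable file)"]
    returned = [
        "HCSP page (primary web-viewable rendition)",
        "ZIP file (HTML output from the Office application)",
    ]
    if thumbnail:
        returned.append("thumbnail rendition")
    return returned


if __name__ == "__main__":
    print(files_returned(1))
    print(files_returned(5, thumbnail=True))
```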

Microsoft Office to HTML conversions require the following components to be installed and enabled on the specified server.

Component Name | Component Description | Enabled on Server
WinNativeConverter | Enables Inbound Refinery to convert native Microsoft Office files created with Word, Excel, PowerPoint, and Visio to HTML using the native Office application. | Inbound Refinery Server
MSOfficeHtmlConverterSupport | Enables Content Server to support HTML conversions of native Microsoft Office files converted by Inbound Refinery and returned to Content Server in a ZIP file. Requires that the ZipRenditionManagement component be installed on the Content Server. | Content Server
ZipRenditionManagement | Enables Content Server to access HTML renditions created and compressed into a ZIP file by Inbound Refinery. | Content Server

This section discusses how to configure Content Server to work with Microsoft Office to HTML conversions.

23.4.1 Configuring Content Servers to Send Jobs for HTML Conversion

When installed on the refinery, the WinNativeConverter component adds the Word HTML, PowerPoint HTML, Excel HTML, and Visio HTML options to the Conversion Listing page. These conversion options must be enabled for the refinery to perform conversions on items submitted by the Content Server. File formats and conversion methods are used in Content Server to define how content items should be handled by Inbound Refinery and the conversion options.

For a Microsoft Office document to be processed by Inbound Refinery, its file extension must be mapped to a format name that is associated with the HTML Conversion method. The added conversion options for HTML Conversion are not mapped automatically; they must be mapped manually, using either the File Formats Wizard or the Configuration Manager applet. The Configuration Manager applet gives you greater control over which file extensions are mapped to which conversion options. For details, see the following sections:

23.4.1.1 Using the File Formats Wizard for Microsoft Office Conversions

File formats and conversion methods for Inbound Refinery can be managed in Content Server using the File Formats Wizard. To make changes:

  1. Log in to Content Server as an administrator.
  2. From the main menu, choose Administration then Refinery Administration then File Formats Wizard.
  3. On the File Formats Wizard, select the Microsoft Office document file types you want to convert to HTML. The Conversion column lists the appropriate conversion option according to the file type. For example:
    • Word for doc, docx, dot, dotx

    • PowerPoint for ppt, pptx

    • Excel for xls, xlsx

    • Visio for vsd

    Note:

    HTML conversion can process the following types of files:

    • Microsoft Word 2003 through 2010

    • Microsoft PowerPoint 2003 through 2010

    • Microsoft Excel 2003 through 2010

    • Microsoft Visio 2007

  4. Click Update to save all changes.
  5. Log in to the Inbound Refinery as an administrator.
  6. From the navigation menu, choose Conversion Settings then Primary Web Rendition.
  7. On the Primary Web Rendition page, enable Convert selected MS Office formats to MS HTML.
  8. Click Update.
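The file type examples in step 3 amount to a lookup from file extension to the conversion shown in the wizard. The sketch below restates them as a table. It is illustrative only: the function and constant names are hypothetical, the extension lists contain just the examples given in step 3, and matching extensions without regard to case is an assumption.

```python
import os

# Conversion name -> example extensions, taken from step 3 above.
WIZARD_CONVERSIONS = {
    "Word": ("doc", "docx", "dot", "dotx"),
    "PowerPoint": ("ppt", "pptx"),
    "Excel": ("xls", "xlsx"),
    "Visio": ("vsd",),
}

# Inverted view: extension -> conversion name.
EXTENSION_TO_CONVERSION = {
    ext: conversion
    for conversion, extensions in WIZARD_CONVERSIONS.items()
    for ext in extensions
}


def conversion_for(file_name):
    """Return the wizard conversion for a file, or None if none applies."""
    extension = os.path.splitext(file_name)[1].lstrip(".").lower()
    return EXTENSION_TO_CONVERSION.get(extension)


if __name__ == "__main__":
    for name in ("plan.docx", "chart.vsd", "notes.txt"):
        print(name, "->", conversion_for(name))
```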

23.4.1.2 Using the Configuration Manager for Microsoft Office Conversions

File formats and conversion methods for Inbound Refinery can be managed in Content Server using the Configuration Manager. To make changes:

  1. Log in to Content Server as an administrator.
  2. From the main menu, choose Administration then Admin Applets.
  3. From the Applet list, choose Configuration Manager.
  4. Choose Options then File Formats.
  5. Select the application format for the Office document type to convert from the Format column. For example, for Microsoft Word, select application/msword.
  6. Click Edit.
  7. In the Edit File Format dialog, select the HTML conversion option from the Conversion list appropriate to the selected Office document format. For example, for application/msword, select the conversion option Word HTML.
  8. Click OK.
  9. Repeat these steps for all Microsoft Office formats to convert to HTML.
  10. When finished, click Close to close the File Formats page and then close the Configuration Manager.
  11. Restart Content Server and Inbound Refinery.