6 Working with TIFF Conversions

Inbound Refinery can convert TIFF files to PDF, consolidating multiple TIFF files into a single document. During conversion, optical character recognition (OCR) can index the text within the TIFF files so that users can search on that text. TIFF conversions require the following components to be installed and enabled on the specified server:

Component Name | Component Description | Enabled on Server
TiffConverter | Enables Inbound Refinery to convert single or multipage TIFF files to PDF complete with searchable text. | Inbound Refinery Server
TiffConverterSupport | Enables Content Server to support TIFF to PDF conversion. | Content Server

This section provides an overview of Tiff Converter and explains how to configure Tiff conversion and troubleshoot conversion problems. The section contains the following sections:

6.1 About Tiff Conversion

Tiff conversion enables the following functionality specific to TIFF (Tagged Image File Format) files:

  • Creation of a managed PDF file from a single or multiple-page TIFF file.

  • Creation of a managed PDF file from multiple TIFF files that have been compressed into a single ZIP file.

  • OCR (Optical Character Recognition) during TIFF-to-PDF conversion. This enables indexing of the text within checked-in TIFF files, so that users can perform full-text searches of these files.

Tiff Converter is supported on Windows only. For information on file formats and languages that can be converted by PdfCompressor, refer to the documentation provided by CVISION.

Important:

Tiff Converter requires CVISION CVista PdfCompressor to perform TIFF-to-PDF conversion with OCR. PdfCompressor is not provided with Tiff Converter. You must obtain PdfCompressor from CVISION.

6.1.1 Tiff Conversion Process Overview

Tiff conversion can process the following types of files:

  • Single or multiple-page TIFF files (TIF or TIFF file extensions)

  • Multiple TIFF files that have been compressed into a single zip file (TIFZ, TIZ, or ZIP file extensions)

When a TIFF file is checked into a content server, it is stored in the vault directory in its native format. If the file format is set up to be converted by an Inbound Refinery conversion filter (such as Tiff Converter), the content server checks for an active refinery (one that is not busy and is configured to accept the conversion). When the refinery accepts the job, the content server sends the conversion data and the file to be converted to the refinery. Inbound Refinery then calls the Tiff Converter option filter, which performs the actual conversion.

Inbound Refinery is designed to use CVISION CVista PdfCompressor to convert the single-page TIFF file, multiple-page TIFF file, or zip file containing multiple TIFF files to a single Portable Document Format (PDF) file. During TIFF-to-PDF conversion, Optical Character Recognition (OCR), if configured and selected, is performed. This enables indexing of the text within checked-in TIFF files, so that Content Server users can perform full-text searches of these files.

The converted PDF files are stored in the content server's weblayout directory, which is the web-viewable file repository. The PDF files are then used as the primary web-viewable files for the content items in the content server.

Tip:

When viewing PDF files generated by Tiff Converter, use Adobe Acrobat Reader 6.0.1 or higher for the best results.

The exact conversion process depends on the configuration of the content server, the refinery, and CVista PdfCompressor. For more information about configuring Inbound Refinery after installation, see "Configuring Tiff Conversion Settings".

6.2 Configuring Tiff Conversion Settings

This section covers the following topics:

6.2.1 Configuring Content Servers to Send Jobs for Tiff Conversion

File formats and conversion methods are used in Content Server to define how content items should be handled by Inbound Refinery and the conversion options. Installing and enabling the TiffConverterSupport component on a content server adds three TIFFConversion options on the File Formats Wizard page.

For a content item to be processed by Inbound Refinery, its file extension (e.g., TIF or TIFF) must be mapped to a format name that is associated with the TIFFConversion conversion method. The added conversion options for Tiff Converter are not automatically mapped: they must be mapped manually. Information on setting these mappings is covered in the following sections:

6.2.1.1 Using the File Formats Wizard

File formats and conversion methods for Inbound Refinery can be managed in Content Server using the File Formats Wizard. To make changes, complete the following steps:

  1. Make sure you are logged into Content Server as an administrator.

  2. In the navigation menu, select Administration, Refinery Administration, File Formats Wizard.

    The File Formats Wizard for <server name> page is displayed.

  3. Select or clear the Tiff to PDF (tif, tiff), Compressed Tiff to PDF (tifz, tiz), and Compressed Tiff to PDF (zip) check boxes as desired.

    Tip:

    The tif and tiff file extensions are included in several places depending on configuration. For example, they can be set for mapping to ImageThumbnail and for mapping to TIFFConversion. This can cause contention between the check boxes for "Create thumbnail only for selected graphics" and the check boxes for Tiff Converter. For example, if you have selected "Tiff to PDF" and then select "Create thumbnail only..." and click Update, "Tiff to PDF" will be disabled and tif/tiff will be re-mapped to ImageThumbnail.

    In this case you must reselect "Tiff to PDF" to remap tif/tiff to TIFFConversion (all the other graphics formats are left mapped to ImageThumbnail). If you experience any problems using the File Formats Wizard to map file extensions for Tiff Converter, use the Configuration Manager to manually view and edit all your mappings exactly. For details, see "Using the Configuration Manager".

    • Tiff to PDF (tif, tiff): Selecting this check box maps the TIF and TIFF file extensions to the graphic/tiff file format and associates the graphic/tiff file format with the TIFFConversion conversion method. Thus, when TIF or TIFF files are checked into the content server, they will be processed by the refinery using Tiff Converter and converted to PDF with OCR. Clearing this check box sets the graphic/tiff file format to PASSTHRU, so that TIF and TIFF files are not processed by Inbound Refinery.

    • Compressed Tiff to PDF (tifz, tiz): Selecting this check box maps the TIFZ and TIZ file extensions to the graphic/tiff-x-compressed file format and associates the graphic/tiff-x-compressed file format with the TIFFConversion conversion method. Thus, when TIFZ or TIZ files are checked into the content server, they will be processed by the refinery using Tiff Converter and converted to PDF with OCR. Clearing this check box sets the graphic/tiff-x-compressed file format to PASSTHRU, so that TIFZ and TIZ files are not processed by Inbound Refinery.

    • Compressed Tiff to PDF (zip): Selecting this check box maps the ZIP file extension to the application/x-zip-compressed file format and associates the application/x-zip-compressed file format with the TIFFConversion conversion method. Thus, when ZIP files are checked into the content server, they will be processed by the refinery using Tiff Converter and converted to PDF with OCR. Clearing this check box sets the application/x-zip-compressed file format to PASSTHRU, so that ZIP files are not processed by Inbound Refinery.

  4. Click Update to save your changes.
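
The effect of the three check boxes can be pictured as two lookup tables: one from file extension to file format, and one from file format to conversion method. The following Python sketch only illustrates that mapping. The dictionary and function names are hypothetical and are not part of Content Server, which stores these mappings internally.

```python
# Illustrative model only; names here are hypothetical, not a Content Server API.

# File extension -> file format, as described for the three check boxes.
EXTENSION_TO_FORMAT = {
    "tif": "graphic/tiff",
    "tiff": "graphic/tiff",
    "tifz": "graphic/tiff-x-compressed",
    "tiz": "graphic/tiff-x-compressed",
    "zip": "application/x-zip-compressed",
}


def format_methods(tiff=True, compressed_tiff=True, zipped=True):
    """Return file format -> conversion method for a given check box state."""
    def method(selected):
        # A selected box means TIFFConversion; a cleared box means PASSTHRU.
        return "TIFFConversion" if selected else "PASSTHRU"

    return {
        "graphic/tiff": method(tiff),
        "graphic/tiff-x-compressed": method(compressed_tiff),
        "application/x-zip-compressed": method(zipped),
    }


def conversion_method(filename, methods):
    """Look up the conversion method for a file name, or None if unmapped here."""
    extension = filename.rsplit(".", 1)[-1].lower()
    file_format = EXTENSION_TO_FORMAT.get(extension)
    return methods.get(file_format)


if __name__ == "__main__":
    methods = format_methods(zipped=False)
    for name in ("scan.TIF", "batch.tifz", "archive.zip"):
        print(name, "->", conversion_method(name, methods))
```

Extensions that this sketch does not list return None, because their handling depends on the other mappings configured on the content server.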

6.2.1.2 Using the Configuration Manager

File formats and conversion methods for Inbound Refinery can be managed in Content Server using the Configuration Manager. To make changes, complete the following steps:

  1. Make sure you are logged into Content Server as an administrator.

  2. In the navigation menu, select Administration, Admin Applets.

    The Administration Applets for <server name> page is displayed.

  3. Click Configuration Manager.

    The Configuration Manager applet is started.

  4. Select File Formats from the Options menu.

  5. If you want single, unzipped TIFF files (TIF and TIFF) to be processed by Inbound Refinery, complete the following steps:

    1. In the File Formats section, make sure the graphic/tiff file format has been added and associated with the TIFFConversion conversion method.

      Important:

      The TIFFConversion conversion method is only available when the TiffConverterSupport component has been installed and enabled, and the content server has been restarted.

    2. In the File Extensions section, make sure the tif and tiff file extensions have been added and mapped to the graphic/tiff file format.

  6. If you want TIFF files that have been compressed into a single TIFZ or TIZ file to be processed by Inbound Refinery, complete the following steps:

    1. In the File Formats section, make sure the graphic/tiff-x-compressed file format has been added and associated with the TIFFConversion conversion method.

    2. In the File Extensions section, make sure the tifz and tiz file extensions have been added and mapped to the graphic/tiff-x-compressed file format.

  7. If you want TIFF files that have been compressed into a single ZIP file to be processed by Inbound Refinery, complete the following steps:

    1. In the File Formats section, make sure the application/x-zip-compressed file format has been added and associated with the TIFFConversion conversion method.

    2. In the File Extensions section, make sure the zip file extension has been added and mapped to the application/x-zip-compressed file format.

      Tip:

      Use care when setting up the zip file extension, as it might be used in multiple ways in your environment. For more information, see "Tips for Processing Zip Files".

6.2.1.3 Tips for Processing Zip Files

The ZIP file extension might be used in multiple ways in your environment. For example, you might be checking in:

  • Multiple TIFF files compressed into a single ZIP file that you want Inbound Refinery to convert to a single PDF file with OCR.

  • Multiple file types compressed into a single ZIP file that you do not want to process (the ZIP file should be passed through in its native format).

If you are using the ZIP file extension in multiple ways, Oracle recommends configuring the content server to allow the user to choose how ZIP files are processed at check in (also referred to as Allow override format on checkin). To enable this content server functionality, complete the following steps:

  1. Using the Admin Server General Configuration page, enable the Allow override format on checkin setting and click Save.

  2. Restart the content server.

  3. Using the Configuration Manager, set up your file formats:

    • Map the application/x-zip-compressed file format to the TIFFConversion conversion method. This option can then be selected to send ZIP files containing TIFF files to Inbound Refinery. For a description, enter Zipped Tiff to PDF.

    • Set up an alternate file format, for example called application/zip-passthru, mapped to PassThru for zipped files that should not be converted. For a description, enter Zip Passthru.

      Note:

      The Content Check In Form page lists file formats by their description.

  4. Map the ZIP file extension to the file format that will be used most commonly. This will be the default conversion method for ZIP files.

  5. When a user checks in a ZIP file, the user can override the default conversion method by selecting any of the conversion methods you have set up.
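
The default-plus-override behavior described in these steps can be sketched as follows. This is a minimal illustration, assuming the two file formats and descriptions set up in step 3 and assuming that application/x-zip-compressed was chosen as the default in step 4. The names FORMATS, DEFAULT_ZIP_FORMAT, and resolve_zip_conversion are hypothetical and do not correspond to any Content Server interface.

```python
# Illustrative model only; names here are hypothetical, not a Content Server API.

# File format -> (description shown on the check-in form, conversion method).
FORMATS = {
    "application/x-zip-compressed": ("Zipped Tiff to PDF", "TIFFConversion"),
    "application/zip-passthru": ("Zip Passthru", "PassThru"),
}

# Assumption: the ZIP extension is mapped to this format by default (step 4).
DEFAULT_ZIP_FORMAT = "application/x-zip-compressed"


def resolve_zip_conversion(override_description=None):
    """Return the conversion method for a ZIP check-in.

    With no override, the default mapping applies. Otherwise the description
    chosen by the user at check-in selects the file format.
    """
    if override_description is None:
        return FORMATS[DEFAULT_ZIP_FORMAT][1]
    for description, method in FORMATS.values():
        if description == override_description:
            return method
    raise ValueError(f"Unknown format description: {override_description!r}")


if __name__ == "__main__":
    print(resolve_zip_conversion())                # default mapping
    print(resolve_zip_conversion("Zip Passthru"))  # user override at check-in
```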

Important:

If you are using the upload applet to check in multiple files, the files are compressed into a single ZIP file before being checked in. In this case Oracle also recommends enabling Allow override format on checkin so that the user can choose how the ZIP file is processed when uploading multiple TIFFs.

Tip:

When CVista PdfCompressor merges multiple TIFF files from a compressed ZIP file, the input files are added in lexicographic order according to the standard ASCII character set.
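
Because the merge order is lexicographic rather than numeric, page numbers without leading zeros can end up out of sequence. The following Python sketch shows the effect, assuming the ordering matches a plain character-code comparison of the file names (which is what Python's sorted() does for ASCII strings). The helper names and the zero-padding convention are illustrative suggestions, not something PdfCompressor requires.

```python
def merge_order(names):
    """Order file names by character code, as a plain sorted() does for ASCII."""
    return sorted(names)


def zero_padded(prefix, page, width=3, extension=".tif"):
    """Build a file name whose numeric part sorts correctly as text."""
    return f"{prefix}{page:0{width}d}{extension}"


if __name__ == "__main__":
    # "page10.tif" sorts before "page2.tif" because "1" precedes "2".
    print(merge_order(["page2.tif", "page10.tif", "page1.tif"]))

    # Uppercase letters precede lowercase letters in ASCII.
    print(merge_order(["page1.tif", "Page2.tif"]))

    # Zero-padded names keep their numeric order when sorted as text.
    print(merge_order([zero_padded("page", n) for n in (2, 10, 1)]))
```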

6.2.2 Setting Accepted Conversions

When installed on the refinery, the TiffConverter component adds the TIFFConversion option to the Conversion Listing page. This conversion option must be enabled for the refinery to perform conversions on items submitted by the content server.

6.2.3 Changing Timeout Settings

The timeout settings you use should reflect the processing time required for the size of TIFF files that you commonly check into the content server. This is highly variable depending on CPU power and TIFF complexity. To determine the appropriate timeout values for TIFF files, perform these tasks:

  • Run and time several representative conversion jobs using CVista PdfCompressor alone (without Inbound Refinery).

  • Examine the document history information and evaluate the required processing time.

  • Change your Inbound Refinery timeout settings accordingly.

    Note:

    Information about Tiff Converter timeouts is recorded in the Inbound Refinery and agent logs.
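
One way to time representative jobs is a small wrapper that runs a command several times and records how long each run takes. The Python sketch below is a generic timing helper, not an Oracle or CVISION tool. The stand-in command only starts the Python interpreter; replace it with the PdfCompressor command line you actually use, which is not shown here because it depends on your installation.

```python
import subprocess
import sys
import time

# Stand-in command so the sketch runs anywhere. Replace it with your own
# PdfCompressor command line (executable path, parameters, and input file).
STAND_IN_COMMAND = [sys.executable, "-c", "pass"]


def time_command(command, runs=3):
    """Run a command several times and return the elapsed seconds of each run."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(command, check=True)  # raises if the command fails
        durations.append(time.perf_counter() - start)
    return durations


def summarize_minutes(durations):
    """Return the shortest and longest duration, converted to minutes."""
    return min(durations) / 60, max(durations) / 60


if __name__ == "__main__":
    shortest, longest = summarize_minutes(time_command(STAND_IN_COMMAND))
    print(f"shortest: {shortest:.3f} min, longest: {longest:.3f} min")
```

The shortest and longest times give a starting point for judging the Minimum and Maximum values. Allow extra headroom, because the refinery may be busier than the machine was during the test.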

To configure timeout settings for Tiff to PDF file generation, complete the following steps:

  1. Log into the refinery.

  2. Select Conversion Settings, Timeout Settings. The Timeout Settings page is displayed.

  3. Enter the Minimum (in minutes), Maximum (in minutes), and Factor for the following conversion operations:

    Tiff to PDF Conversion: the stage in which the original (native) TIFF file is converted to a Portable Document Format (PDF) file.

    For more information about how timeout settings are calculated and examples, refer to the Inbound Refinery Administration Guide.

  4. Click Update to save your changes.

6.2.4 Configuring CVista PdfCompressor

This section covers the following topics:

6.2.4.1 Changing PdfCompressor Settings

These options are specific to CVista PdfCompressor. If the TiffConverter component is not installed, the CVista PdfCompressor Options are not available.

To modify the PdfCompressor converter settings, select Conversion Settings, Third-Party Applications Settings, set the path to the CVista PdfCompressor executable, and enter a parameter string in the parameters options text box.

To change the PdfCompressor settings, complete the following steps:

  1. Log in to the refinery.

  2. Select Conversion Settings, Third-Party Applications Settings.

    The Third-Party Application Settings page is displayed.

  3. Select the Options button for CVista PdfCompressor. The CVista PdfCompressor Options Page is displayed.

  4. Set the path to the location of the CVista PdfCompressor executable in the appropriate text box.

  5. Enter the string of parameter values in the parameters option text box. A default option string is set upon installation of the TiffConverter component. See "Recommended CVista PdfCompressor Options" for more information.

  6. Click Update to save the settings.

6.2.4.1.1 Recommended CVista PdfCompressor Options

These recommended parameter strings should produce optimal results for each given scenario. If these settings do not produce the results that you desire, you will need to modify these strings by removing or appending settings. For more information on these and other available settings, refer to the online help provided with CVista PdfCompressor (especially Appendix A: Command-Line Flags for Compression).

Default CVista PdfCompressor Parameters - OCR Enabled

A default string is set when the TiffConverter component is installed, unless a string already exists (the string was set using a previous version of Tiff Converter). The default string has been optimized for typical PdfCompressor usage with OCR enabled:

-m -c ON -colorcomptype 2 -mrcquality 5 -mrcColorCompType 0 -linearize -o -ocrmode 1 -ot 120 -qualityc 75 -qualityg 75 -rscdwndpi 300 -rsgdwndpi 300 -rsbdwndpi 300 -cconc -ccong

CVista PdfCompressor Parameters - Horizontal and Vertical OCR Enabled

The following string can be used for typical usage with OCR and supports OCR processing of both vertical and horizontal text in the same image (it adds -ocrtwod and -lsize 25 to the default string):

-m -c ON -colorcomptype 2 -mrcquality 5 -mrcColorCompType 0 -linearize -o -ocrmode 1 -ot 120 -ocrtwod -lsize 25 -qualityc 75 -qualityg 75 -rscdwndpi 300 -rsgdwndpi 300 -rsbdwndpi 300 -cconc -ccong

CVista PdfCompressor Parameters - No OCR

The following string can be used for simple conversion (without OCR):

-m -c ON -colorcomptype 2 -mrcquality 5 -mrcColorCompType 0 -linearize -qualityc 75 -qualityg 75 -rscdwndpi 300 -rsgdwndpi 300 -rsbdwndpi 300 -cconc -ccong
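
When adapting these strings, it helps to see exactly which tokens differ between them. The following Python sketch compares the three recommended strings token by token using the standard difflib and shlex modules. It only compares text and does not interpret the flags; for their meaning, refer to the CVista PdfCompressor documentation. The constant and function names are illustrative.

```python
import difflib
import shlex

# The three recommended parameter strings, copied from this section.
DEFAULT_OCR = (
    "-m -c ON -colorcomptype 2 -mrcquality 5 -mrcColorCompType 0 -linearize "
    "-o -ocrmode 1 -ot 120 -qualityc 75 -qualityg 75 "
    "-rscdwndpi 300 -rsgdwndpi 300 -rsbdwndpi 300 -cconc -ccong"
)
TWO_DIRECTION_OCR = (
    "-m -c ON -colorcomptype 2 -mrcquality 5 -mrcColorCompType 0 -linearize "
    "-o -ocrmode 1 -ot 120 -ocrtwod -lsize 25 -qualityc 75 -qualityg 75 "
    "-rscdwndpi 300 -rsgdwndpi 300 -rsbdwndpi 300 -cconc -ccong"
)
NO_OCR = (
    "-m -c ON -colorcomptype 2 -mrcquality 5 -mrcColorCompType 0 -linearize "
    "-qualityc 75 -qualityg 75 "
    "-rscdwndpi 300 -rsgdwndpi 300 -rsbdwndpi 300 -cconc -ccong"
)


def added_tokens(base, variant):
    """Return the tokens present in variant that are not matched in base."""
    base_tokens = shlex.split(base)
    variant_tokens = shlex.split(variant)
    matcher = difflib.SequenceMatcher(a=base_tokens, b=variant_tokens, autojunk=False)
    added = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag in ("insert", "replace"):
            added.extend(variant_tokens[j1:j2])
    return added


if __name__ == "__main__":
    print("Default adds to No OCR:", added_tokens(NO_OCR, DEFAULT_OCR))
    print("Two-direction adds to default:", added_tokens(DEFAULT_OCR, TWO_DIRECTION_OCR))
```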

6.2.4.2 Configuring CVista PdfCompressor OCR Languages

By default, CVista PdfCompressor uses an English OCR dictionary when performing OCR on TIFF files. However, CVista PdfCompressor can perform OCR on several other languages.

To set up multiple OCR languages and enable the user to choose the OCR language at check in, complete the following steps:

Important:

Changes made in the CVista PdfCompressor user interface will not affect how CVista PdfCompressor functions when called by Tiff Converter.

Note:

If the following method is used, language parameters should not be specified or passed to the refinery via the CVista PdfCompressor Options Page.

  1. Obtain the appropriate current language files by contacting CVISION:

    • A lng file is required for each language.

    • Czech, Polish, and Hungarian also require the latin2.shp file.

    • Russian also requires the cyrillic.shp file.

    • Greek also requires the greek.shp file.

    • Turkish also requires the turkish.shp file.

  2. Place the CVISION language files in your CVista installation directory. The default location is C:\Program Files\CVision\PdfCompressor x.x\ (where x.x stands for the version number of PdfCompressor).

  3. Make sure you are logged into Content Server as an administrator.

  4. In the navigation menu, select Administration, Admin Applets.

    The Administration Applets for <server name> page is displayed.

  5. Click Configuration Manager.

    The Configuration Manager applet is started.

  6. Open the Information Fields tab.

  7. If the OCRLang information field has been added, skip to Step 8. If it has not been added, complete the following steps:

    1. In the Field Info section, click Add.

      The Add Custom Info screen is displayed.

    2. In the Field Name field, enter OCRLang. This will be a new information field for CVista language conversion options.

      Important:

      You must enter this field name exactly.

    3. Click OK.

    4. The Add Custom Info Field 'OCRLang' screen is displayed.

    5. In the Field Caption field, enter the descriptive caption you want displayed on the Content Check In Form page. For example, OCR Language.

    6. From the Field Type list, choose Text.

    7. Select the Enable Option List check box.

    8. From the Option List Type list, choose Select List Validated.

    9. In the Use option list field, enter xOCRLangList.

    10. Next to the Use Option List field, click Edit.

    11. The Option List for 'xOCRLangList' screen is displayed.

    12. Enter the CVista OCR languages that you want to present as options. The following language names are valid options:

      Important:

      You can use either the English language name or the native equivalent (if listed). However, you must enter the language options exactly as they appear in the following table.

      English | Native | English | Native
      Czech | - | Italian | Italiano
      Danish | Dansk | Norwegian | Norsk
      Dutch | Nederlands | Polish | Polski
      English | - | Portuguese | Português
      Finnish | Suomi | Russian | -
      French | Français | Spanish | Español
      German | Deutsch | Swedish | Svenska
      Greek | - | Turkish | -
      Hungarian | Magyar | - | -

    13. Select the Ignore Case check box.

    14. Click OK.

    15. In the Default Value field, enter the default OCR language option.

    16. Click OK to save the settings and return to the Information Fields tab.

    17. Click Update Database Design.

  8. If the OCRLang Information field has been added, but you want to make changes to the languages option list and/or the default language, complete the following steps:

    1. In the Field Info section, select OCRLang and click Edit.

      The Add Custom Info screen is displayed.

    2. Next to the Use Option List field, click Edit.

      The Option List for 'xOCRLangList' screen is displayed.

    3. Delete any CVista OCR languages that you are not using.

    4. Click OK.

    5. In the Default Value field, enter the default OCR language option.

    6. Click OK to save the settings and return to the Information Fields tab.

  9. Close the Configuration Manager applet. When a user checks in a TIFF file, the user can override the default OCR language by selecting any of the OCR languages you have set up.
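
Before editing the option list, it can be useful to check the planned language names against the table of valid options and to work out which extra shape files to request from CVISION. The Python sketch below encodes only what this section states: the valid English and native names, and the .shp file that certain languages require. The .lng file names are not given in this guide, so they are not modeled. All names in the code are illustrative.

```python
# Native option name -> English option name, from the table of valid options.
NATIVE_TO_ENGLISH = {
    "Dansk": "Danish", "Nederlands": "Dutch", "Suomi": "Finnish",
    "Français": "French", "Deutsch": "German", "Magyar": "Hungarian",
    "Italiano": "Italian", "Norsk": "Norwegian", "Polski": "Polish",
    "Português": "Portuguese", "Español": "Spanish", "Svenska": "Swedish",
}

ENGLISH_NAMES = [
    "Czech", "Danish", "Dutch", "English", "Finnish", "French", "German",
    "Greek", "Hungarian", "Italian", "Norwegian", "Polish", "Portuguese",
    "Russian", "Spanish", "Swedish", "Turkish",
]

# Languages that need a shape file in addition to their lng file.
EXTRA_SHAPE_FILE = {
    "Czech": "latin2.shp", "Polish": "latin2.shp", "Hungarian": "latin2.shp",
    "Russian": "cyrillic.shp", "Greek": "greek.shp", "Turkish": "turkish.shp",
}

# Case-insensitive lookup, mirroring the Ignore Case check box.
_LOOKUP = {name.casefold(): name for name in ENGLISH_NAMES}
_LOOKUP.update({native.casefold(): english for native, english in NATIVE_TO_ENGLISH.items()})


def canonical_language(option):
    """Return the English name for a valid option, or None if it is not valid."""
    return _LOOKUP.get(option.strip().casefold())


def extra_shape_files(options):
    """Return the sorted shape files required by the given language options."""
    files = set()
    for option in options:
        english = canonical_language(option)
        if english in EXTRA_SHAPE_FILE:
            files.add(EXTRA_SHAPE_FILE[english])
    return sorted(files)


if __name__ == "__main__":
    planned = ["English", "Magyar", "Russian", "Klingon"]
    print([canonical_language(name) for name in planned])
    print(extra_shape_files(planned))
```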

6.3 Troubleshooting Tiff Converter Problems

This section covers the following topics:

6.3.1 Installation Problems

The following table lists common problems with installing Inbound Refinery, possible causes, and solutions.

Problem | Possible Causes | Solutions
Refinery or content server does not start after Tiff Converter components are installed. | Wrong component installed on content server or refinery. | Uninstall the components and reinstall them in the correct location.

6.3.2 General Conversion Problems

The following table lists general Inbound Refinery conversion problems, possible causes, and solutions.

Problem | Possible Causes | Solutions
TIFF files are not being converted (they are being passed through in their native format). | File formats and conversion methods for Inbound Refinery have not been set up properly in Content Server. | Set up file formats and conversion methods for Inbound Refinery. For details, see "Configuring Content Servers to Send Jobs for Tiff Conversion".
TIFF files are not being converted (they are being passed through in their native format). | The conversions are taking too long, and Inbound Refinery is timing out. | Change your Inbound Refinery timeout settings. For details, see "Changing Timeout Settings".
TIFF files are not being converted (they are being passed through in their native format). | Inbound Refinery is failing to launch CVista PdfCompressor. | See "CVista PdfCompressor Conversion Problems".
Zipped TIFF files are not being processed by Inbound Refinery when they are checked in. | File formats and conversion methods for Inbound Refinery have not been set up properly in Content Server. | Change how zip files are processed. For details, see "Tips for Processing Zip Files".
The TIFFConversion conversion method is not available in the Content Server Configuration Manager. | The TiffConverterSupport component has not been uploaded and enabled. | Upload and enable the TiffConverterSupport component using either the Component Wizard or the Component Manager. This component is included on the Inbound Refinery distribution media.
The TIFFConversion conversion method is not available in the Content Server Configuration Manager. | The TiffConverterSupport component is enabled, but Content Server has not been restarted. | Restart Content Server.

6.3.3 CVista PdfCompressor Conversion Problems

The following table lists common conversion problems when using CVISION CVista PdfCompressor, possible causes, and solutions.

Problem | Possible Causes | Solutions
Inbound Refinery is failing to launch CVista PdfCompressor. | The path to the CVista PdfCompress.exe is incorrect. | Set the correct path to the CVista PdfCompressor executable. For details, see "Changing PdfCompressor Settings".
I have used the CVista PdfCompressor user interface to change conversion settings, but this has had no effect on how TIFF files are being processed. | Changes made in the CVista PdfCompressor user interface will not affect how CVista PdfCompressor functions when called by Inbound Refinery. | Change the PdfCompressor settings in the refinery instead. For details, see "Changing PdfCompressor Settings".
CVista PdfCompressor is only performing OCR on English text. | By default, CVista PdfCompressor uses only an English OCR dictionary. Other OCR languages must be set up. | Set up the additional OCR languages. For details, see "Configuring CVista PdfCompressor OCR Languages".
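
When the refinery fails to launch PdfCompressor, a quick first check is whether the configured path actually points to an executable file. The following Python sketch is a generic file check run on the refinery machine, not an Inbound Refinery feature. The function name is hypothetical, and the example uses the running Python interpreter as a stand-in; substitute the path you entered on the CVista PdfCompressor Options Page.

```python
import os
import sys


def check_executable(path):
    """Classify a configured executable path as 'ok', 'not found', or 'not executable'."""
    if not os.path.isfile(path):
        return "not found"
    if not os.access(path, os.X_OK):
        return "not executable"
    return "ok"


if __name__ == "__main__":
    # Stand-in path so the sketch runs anywhere; replace with your configured path.
    print(check_executable(sys.executable))
    print(check_executable(os.path.join(os.getcwd(), "no-such-file.exe")))
```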

6.3.4 PDF Thumbnailing and Viewing Problems

The following table lists common problems with creating thumbnails for and viewing the PDF files that are generated by Inbound Refinery, possible causes, and solutions.

Problem | Possible Causes | Solutions
No thumbnails are being created for PDF files generated by Inbound Refinery. | Thumbnailing is not enabled in Inbound Refinery. | Enable thumbnailing in Inbound Refinery. For details, refer to the Inbound Refinery Administration Guide.
When viewing PDF files generated by Inbound Refinery in Adobe Acrobat Reader, there are lines or other artifacts on the screen. | Acrobat Reader 4 is being used to view the files. | When viewing PDF files generated by Inbound Refinery, use Adobe Acrobat Reader 6.0.1 or higher for the best results.