6 Managing Document Conversion Processing

Capture's Document Conversion Processor provides two key functions: it automatically converts non-image documents to image format using Oracle Outside In Technology, and it optionally merges documents. This chapter describes how workspace managers manage document conversion by configuring document conversion jobs and monitoring their processing.

Introduction to Document Conversion

Key Document Conversion Processor Job Settings

The Document Conversion Processor provides automated conversion of non-image electronic documents and attachments such as Microsoft Word, Excel, or PDF documents to image format. Create Document Conversion Processor jobs that specify the following:

  • Which files and attachments to convert, specified by file name (for example, PDF files).

  • If a script should be used to customize Document Conversion Processor functions.

  • The format to convert non-image files into: black and white TIFF or color JPEG.

  • If and how an external conversion program should be used to convert documents.

  • If and how non-image documents and attachments should be merged during document conversion.

  • How metadata values should be applied when merging documents.

  • The next post-processing step (if any) after document conversion. For example, converted batches might go to a Recognition Processor job or the Commit Processor might output them. If no post-processing step is specified, processed batches become available for indexing users to complete.

Figure 6-1 Specifying Documents to Convert on the Document Selection Train Stop

Important Points about Document Conversion

Document Conversion Processor Use With Other Batch Processors

The Document Conversion Processor is often used with other processor jobs, as described in the following scenarios:

Use Case 1: Processing Expense Reports
  1. An end user scans an expense report using a multi-function device (MFD), which then emails it.

    1. The end-user scans the expense report along with receipts into a single PDF file that contains the cover sheet with one or more bar codes.

    2. After scanning, the MFD emails the document to a designated email account for expenses processing.

  2. The Import Processor imports and processes the email, creates a batch, and forwards it for document conversion.

    1. The Import Processor processes the email message, and creates a batch containing two documents:

      1) Report PDF (cover sheet/report)

      2) Email message, positioned as the last document in the batch

    2. After processing the email message, the Import Processor forwards the newly created batch to the Document Conversion Processor queue.

  3. The Document Conversion Processor converts the documents in the batch to image format, merges the documents (expense report and email message) into a single document, and forwards the batch for recognition processing.

    1. The Document Conversion Processor converts the PDF and email message to image format.

    2. The Document Conversion Processor merges the two documents into a single document. Because the email message might contain important information, it is included as the last page of the expense document.

    3. The Document Conversion Processor forwards the batch to the Recognition Processor queue so that the expense report number can be automatically recognized from the cover sheet.

  4. The Recognition Processor performs bar code recognition and indexing of the document, and then forwards the batch to the Commit Processor for committing.

  5. The Commit Processor commits the documents in the batch.

Use Case 2: Processing Invoices
  1. A vendor sends an email to a designated email account with two invoices in PDF format attached along with explanatory text in the email message's body text.

  2. The Import Processor imports and processes the email message, creating a batch containing three documents.

    The import job is configured to place the email message body as the last document in the batch and to forward the batch for document conversion.

  3. The Document Conversion Processor converts, merges, and forwards the batch for committing.

    1. The document conversion job converts the PDFs and the email message documents to image format.

    2. The document conversion job is configured to merge the last document (email message body) in the batch to each previous document. As a result, the email message body is appended to both invoices.

    3. The email message document (third document) is removed from the batch.

  4. The Commit Processor commits each invoice along with the original email message so that another process such as Oracle WebCenter Forms Recognition can perform automated data extraction.

Adding, Copying, or Editing a Document Conversion Job

To add, copy, or edit a document conversion job:

  1. In a selected workspace, select the Processing tab.

  2. In the Document Conversion Jobs table, click the Add button or select a job and click the Edit button.

    You can also copy a document conversion job, by selecting a job, clicking the Copy button, and entering a new name when prompted. Copying a job allows you to quickly duplicate and modify it.

  3. Complete settings on the Document Selection train stop.

    1. Enter a name for the job in the Name field. Enter an optional description in the Description field.

    2. The Online field is automatically selected. See Activating or Deactivating Document Conversion Jobs.

    3. If the document conversion job uses a script, select it in the Script field. This field displays scripts previously added on the Advanced tab and assigned a type of Document Converter Processor. See Customizing Document Conversion Processing Using Scripts.

    4. In the Documents to Convert field, select whether to process all non-image documents or only ones matching a specified file name filter. For example, to process PDF documents only, choose Selected non-image documents, then enter *.pdf in the File Name Filter field. You can enter an asterisk (*) as a wildcard character, and separate multiple filters with a comma or semi-colon. If you do not want to convert documents, you can select the Do not convert field. To process documents for specific document profiles, select one or more document profiles listed in the Restrict to Document Profiles field. Select All to process documents for all defined document profiles.

    5. In the Attachments to Convert field, select whether to process all non-image document attachments or only ones matching a specified file name filter. For example, to process PDF attachments only, choose Selected non-image documents, then enter *.pdf in the File Name Filter field. You can enter an asterisk (*) as a wildcard character, and separate multiple filters with a comma or semicolon. If you do not want to convert attachments, you can select the Do not convert field. To process attachments for specific attachment types, select one or more attachment types listed in the Restrict to Attachment Types field. Select All to process attachments for all defined attachment types.

  4. Complete settings on the Output Format train stop.

    • You can convert non-image documents to either black and white TIFF (default) or color JPEG. If you select JPEG, specify an image quality from 1 (lowest quality) to 99 (highest quality) to use for compression in the JPEG Image Quality field; the default value is 85. Select a resolution in the DPI field; the default is 200.

    • Under Image Settings, in the Blank Page Byte Threshold field, enter a file size value (in bytes). Any image whose size is less than or equal to the threshold is considered a blank page and therefore deleted.

  5. Complete settings on the External Conversion train stop. See Specifying Settings for Using an External Conversion Program.

  6. Complete settings on the Document Merge Options train stop.

    See Specifying How Documents are Merged and Metadata is Assigned.

  7. On the Post-Processing train stop, specify what happens after document conversion processing completes, depending on its success.

    See Configuring Post-Processing and Monitoring.

  8. Review settings on the Summary train stop and click Submit to save the job.

  9. Configure how batches flow to the Document Conversion Processor job. See Configuring Batch Flow to a Document Conversion Processor Job.

  10. Test the document conversion job you created.
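
The File Name Filter behavior described in step 3 (an asterisk wildcard, with multiple filters separated by a comma or semicolon) can be sketched as follows. This is a minimal illustration in Python, not Capture's implementation: the function names are hypothetical, and case-insensitive matching is an assumption that the text above does not state.

```python
import re


def parse_filters(filter_text):
    """Split a filter string on commas or semicolons, dropping empty entries."""
    return [part.strip() for part in re.split(r"[;,]", filter_text) if part.strip()]


def filter_to_regex(pattern):
    """Translate a filter such as *.pdf into a regex; only * acts as a wildcard."""
    literal_parts = [re.escape(piece) for piece in pattern.split("*")]
    # Assumption: matching ignores case, so *.pdf also matches Sample.PDF.
    return re.compile(".*".join(literal_parts), re.IGNORECASE)


def matches_any(file_name, filter_text):
    """Return True if file_name matches at least one filter in filter_text."""
    return any(
        filter_to_regex(pattern).fullmatch(file_name) is not None
        for pattern in parse_filters(filter_text)
    )
```

For example, `matches_any("Invoice.PDF", "*.pdf; *.docx")` evaluates to `True` under these assumptions, while `matches_any("notes.txt", "*.pdf; *.docx")` evaluates to `False`.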

Configuring Blank Page Detection in a Document Conversion Job

When non-image documents are converted to image format, the resulting documents might contain blank pages. To have Capture automatically detect and delete blank pages, specify a threshold file size: any image whose size is less than or equal to this threshold is considered a blank page and is deleted.

  1. When adding or editing a Document Conversion Job (see Adding, Copying, or Editing a Document Conversion Job), select the Output Format train stop.
  2. Under Image Settings, in the Blank Page Byte Threshold field, enter a file size value (in bytes).

    Note:

    Specify 0 to include blank pages.

  3. Click Submit to save the Document Conversion Job.
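
The threshold rule above can be expressed as a short sketch. The function names are hypothetical, and treating a threshold of 0 as "detection disabled" is an interpretation of the note that 0 includes blank pages.

```python
def is_blank_page(image_size_bytes, threshold_bytes):
    """Apply the rule: size <= threshold means the page is considered blank."""
    if threshold_bytes <= 0:
        # Assumption: a threshold of 0 turns detection off, so no page is blank.
        return False
    return image_size_bytes <= threshold_bytes


def remove_blank_pages(page_sizes, threshold_bytes):
    """Return the sizes of the pages that would be kept after blank page removal."""
    return [size for size in page_sizes if not is_blank_page(size, threshold_bytes)]
```

With a threshold of 2000 bytes, pages of 1500 and 2000 bytes would be dropped and pages of 40000 and 98000 bytes kept. A suitable threshold depends on the output format, resolution, and compression, so it is best found by testing with representative documents.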

Deleting a Document Conversion Job

Deleting a document conversion job makes it unavailable for batches for which it is set as a post-processing step. If a job specified for post-processing is unavailable, an error results for the batch. You may want to change a job to offline for a time before deleting it, allowing you to resolve unexpected issues with its deletion.

To delete a document conversion job:

  1. In a selected workspace, select the Processing tab.
  2. Select a job to delete in the Document Conversion Jobs table, and click the Delete button.

    To deactivate a job instead, select it and click the Toggle Online/Offline button.

  3. When prompted, click Yes to confirm the deletion.

Activating or Deactivating Document Conversion Jobs

An online document conversion job runs when it is selected in the Post-Processing train stop of a client profile or processor job. You can temporarily stop the job from running (take it offline) or change a deactivated job to run again.

Note:

When reactivating a job, it may take up to a minute for the job to resume processing batches that were queued while the job was offline.

Follow these steps to change a document conversion job to online or offline:

  1. On the Processing tab, select a job in the Document Conversion Jobs table. Notice that the Status column displays Online or Offline for each job.
  2. Click the Toggle Online/Offline button to either activate the job or deactivate it.

    You can also deactivate or activate a document conversion job by selecting or deselecting the Online field on the Document Selection train stop.

Specifying How Documents are Merged and Metadata is Assigned

The Document Conversion Processor lets you specify if and how to merge documents in a batch during conversion processing and how to assign metadata values when merging documents.

The merge and metadata assignment options accommodate common document conversion scenarios. For example, the Import Processor might import email messages with PDF attachments, then send them for document conversion. Because the email message is common to each attached PDF document and might be important for processing or indexing each one, you would select one of the document merge options that merges a source document (email message, in this case) with all other target documents (PDF).

  1. Add or edit a document conversion job as described in Adding, Copying, or Editing a Document Conversion Job, and select the Document Merge Options train stop. Use this tab to specify:
    • If and how to merge documents in the batch.

    • If merging one document with all other documents, where to place that document.

    • Which document's metadata values to assign and retain when merging.

    • If and how to include source document’s attachments when merging documents.

  2. Select a batch merge option. You can choose:
    • Do not merge documents: Select this option (default) if batches to convert are already organized into documents or you want to convert without merging documents. When you select this option, all other fields on the tab are disabled.

    • Merge all documents: Select this option to merge all documents in the batch into a single document. When selected, the first document in the batch is considered the target document and all other documents are considered source documents and appended to it.

    • Merge first document with all other documents: Select this option to merge the first document in the batch with all other documents. The first document is considered the source document and is added to the start or end of the target document based on your Source Document Page Placement setting.

    • Merge last document with all other documents: Select this option to merge the last document in the batch with all other documents. The last document is considered the source document and is added to the start or end of the target document based on your Source Document Page Placement setting.

  3. In the Source Document Page Placement field, specify whether to add the source document to the start or end of the target document.
  4. Specify how to apply metadata values to merged documents. You can choose:
    • Apply source document's metadata values: Select this option to apply the source document's metadata values to the target document. If each source document has metadata values, the last one processed provides the target metadata values.

    • Allow target document's metadata values to be overwritten: Select this option to allow the target document's metadata values to be overwritten by the source document's metadata values.

    Note:

    These fields can be selected together. They are deselected by default; the default behavior when merging first or last documents is to apply the target document's metadata values. The default behavior when merging all documents is to apply the first document's (target's) metadata values.

  5. In the Source Attachments field, specify whether and how to include source document’s attachments when merging documents:
    • Do not include Attachments: Select this option if you do not want to include source document’s attachments.

    • Include all Attachments to merged documents (default): Select this option to include all the attachments of the source document to the merged documents.

    • Include Attachments with matching Document Profile Attachment Types: Select this option to include only attachments that match the document profile’s attachment types.

    Note:

    If you select Merge all documents in the Batch Merge Option field, then all documents including the first document in the batch are considered to be source documents.
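
The page-level effect of the four batch merge options can be sketched as follows. This is an illustrative model only: a batch is represented as a list of documents, each a list of page labels, and the option names ("none", "all", "first", "last") are hypothetical shorthand for the settings above. Metadata and attachment handling are not modeled. Removing the source document from the batch after merging follows the behavior described in Use Case 2.

```python
def merge_batch(documents, option="none", placement="end"):
    """Return the documents in a batch after applying a merge option.

    documents: list of documents, each a list of page labels.
    option:    "none", "all", "first", or "last" (hypothetical names).
    placement: "start" or "end"; where source pages go in each target.
    """
    if option == "none" or len(documents) < 2:
        return [list(doc) for doc in documents]
    if option == "all":
        # The first document is the target; all others are appended to it.
        merged = []
        for doc in documents:
            merged.extend(doc)
        return [merged]
    if option == "first":
        source, targets = documents[0], documents[1:]
    elif option == "last":
        source, targets = documents[-1], documents[:-1]
    else:
        raise ValueError(f"unknown merge option: {option}")
    # The source document is merged into every target and is not kept separately.
    if placement == "start":
        return [list(source) + list(target) for target in targets]
    return [list(target) + list(source) for target in targets]
```

Applied to the invoice scenario, `merge_batch([["inv1"], ["inv2-p1", "inv2-p2"], ["email"]], "last", "end")` yields two documents, each ending with the email page.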

Configuring Post-Processing and Monitoring

Use a document conversion job's post-processing options to specify what happens after processing completes, depending on processing success.

To configure post-processing settings:

  1. In a selected workspace, add or edit a Document Conversion Processor job. See Adding, Copying, or Editing a Document Conversion Job.

  2. Click the Post-Processing train stop.

    The screen lists the same processing options for successful processing (no system errors) and unsuccessful processing (one or more system errors).

  3. In the Batch Processor and Batch Processor Job fields, specify which processing step, if any, occurs after document conversion processing completes. You can choose a batch processor of None (no processing occurs), Commit Processor, Recognition Processor, or Document Conversion Processor. If you choose Recognition Processor or Document Conversion Processor, specify a processor job.

    For example, you might send batches with no system errors to the Commit Processor. You might specify None for batches with system errors, then change their batch status or prefix to facilitate further processing in the client.

  4. In the email address fields, optionally enter an address to which to send an email after processing completes successfully or fails. While configuring and testing a Document Conversion Processor job, you might set yourself to receive email notifications upon system errors, then later automatically alert an administrator of processing errors.

  5. In the remaining fields, specify how to change processed batches.

    1. Rename batches by adding a prefix. For example, rename batches that were unsuccessful with the prefix ERR for follow-up.

    2. Change batch status or priority. For example, you might change the status of batches with system errors, then create a client profile with batch filtering set to this status to allow qualified users to manually edit and complete batches that encountered errors.

  6. Click Submit to save the job.
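
The post-processing choices above can be modeled as two sets of settings, one for success and one for system errors. The sketch below is a hypothetical data model for reasoning about a configuration, not a Capture API; the class, field, and function names are assumptions, while the processor names and the rule about which processors need a job come from step 3.

```python
from dataclasses import dataclass

# Processors that require a processor job to be specified (per step 3).
PROCESSORS_NEEDING_JOB = {"Recognition Processor", "Document Conversion Processor"}
VALID_PROCESSORS = {"None", "Commit Processor"} | PROCESSORS_NEEDING_JOB


@dataclass
class PostProcessing:
    batch_processor: str = "None"
    batch_processor_job: str = ""
    email_address: str = ""
    batch_prefix: str = ""
    batch_status: str = ""

    def validate(self):
        """Raise ValueError if the settings are inconsistent."""
        if self.batch_processor not in VALID_PROCESSORS:
            raise ValueError(f"unknown batch processor: {self.batch_processor}")
        if self.batch_processor in PROCESSORS_NEEDING_JOB and not self.batch_processor_job:
            raise ValueError(f"{self.batch_processor} requires a processor job")


def apply_post_processing(batch_name, had_system_errors, on_success, on_failure):
    """Pick the settings for the outcome and return (new batch name, settings)."""
    settings = on_failure if had_system_errors else on_success
    settings.validate()
    return settings.batch_prefix + batch_name, settings
```

For instance, pairing `PostProcessing(batch_processor="Commit Processor")` for success with `PostProcessing(batch_prefix="ERR", batch_status="Needs Review")` for failure would rename a failed batch `Batch42` to `ERRBatch42` and leave it for manual follow-up. The status value here is an example only.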

Configuring Batch Flow to a Document Conversion Processor Job

To run a document conversion job, you must configure batches to flow to the job for processing. You do this by setting the Document Conversion Processor job as a post-processing step in a client profile or other processor job.

For example, you might create an Import Processor job that imports email messages and their PDF attachments, then sends them to the Document Conversion Processor for conversion to image format, then sends them to a Recognition Processor job for bar code recognition.

Specifying Settings for Using an External Conversion Program

The External Conversion train stop lets you specify if and how to use an external conversion program for document conversion.

  1. Add or edit a document conversion job as described in Adding, Copying, or Editing a Document Conversion Job, and select the External Conversion train stop.
  2. Enable use of an external conversion program by selecting the On option in the External Conversion Use field. All the other options in this train stop are enabled only when you select On for this feature. By default, the Off option is selected and the feature is disabled.
  3. Enter one or more file name filters in the File Name Filter(s) field to restrict processing to certain file names. By default, this field is set to *.* (all files are processed).
  4. Specify the full path and file name of the external conversion program in the External Conversion Program field. The following example uses Ghostscript as the external conversion program and exports to a Group IV multi-page TIFF at 200 DPI:
    • Full path to the external conversion program:

      C:\Program Files\gs\gs9.14\bin\gswin64c.exe

    • Command line:

      -dNOPAUSE -q -r200 -sDEVICE=tiffg4 -dBATCH -sOutputFile=<Output File> <Input File>
  5. In the Command Line Parameters field, enter <Input File> where the input file name should be substituted and <Output File> where the output file name should be substituted. During document conversion, these literal strings are replaced with the actual values before being passed to the external conversion program. <Input File> is the source document file to convert, including the full path and file name. <Output File> is the multiple-page TIFF file that the external conversion program generates, including the full path and file name. The output file name is always the full input path and file name with “.TIF” appended. For example, if the input file name is C:\Users\captureuser\AppData\Local\Temp\1\Sample.PDF, the output file name is C:\Users\captureuser\AppData\Local\Temp\1\Sample.PDF.TIF.
  6. In the Process Monitoring Method field, select one of the two process monitoring options. By default, the Duration Timeout option is selected with a 15-minute timeout. Duration Timeout is the maximum time the external conversion program is allowed to run before it is forcefully terminated. Select the Output File Inactivity Timeout option to monitor the conversion process by checking the size of the output file. This option is useful if the external conversion program updates the <Output File> in real time. When this option is enabled, the external conversion program’s process is forcefully terminated if the size of the output file has not changed during the specified time. Increase or decrease the timeout value in the Timeout (minutes) field.
    For both timeout options, the minimum value is 1 minute and the maximum value is 1000 minutes.
  7. Specify a return code in the Success Return Code field. If the external conversion program exits with this return code, the conversion is considered successful. Using a return code allows the Document Conversion Processor to detect an error and log it. The default value of this field is 0.
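
The placeholder substitution, output file naming, Duration Timeout, and Success Return Code settings described in the steps above can be approximated with the following sketch. It is an illustration under stated assumptions, not how Capture invokes the program: the function names are hypothetical, parameters are split on whitespace without quote handling, and the Output File Inactivity Timeout option is not modeled.

```python
import subprocess

INPUT_TOKEN = "<Input File>"
OUTPUT_TOKEN = "<Output File>"


def output_path_for(input_path):
    """The output name is the full input path and file name plus ".TIF"."""
    return input_path + ".TIF"


def build_command(program, parameters, input_path):
    """Substitute both placeholders and return an argument list."""
    # Swap placeholders for space-free markers first, so that splitting on
    # whitespace keeps a path containing spaces inside a single argument.
    marked = parameters.replace(INPUT_TOKEN, "\0IN\0").replace(OUTPUT_TOKEN, "\0OUT\0")
    arguments = []
    for token in marked.split():
        token = token.replace("\0IN\0", input_path)
        token = token.replace("\0OUT\0", output_path_for(input_path))
        arguments.append(token)
    return [program] + arguments


def run_conversion(command, timeout_minutes=15, success_code=0):
    """Run the program; return True only if it exits with success_code in time."""
    try:
        # subprocess.run terminates the child process if the timeout expires.
        completed = subprocess.run(command, timeout=timeout_minutes * 60)
    except subprocess.TimeoutExpired:
        return False
    return completed.returncode == success_code
```

Building a command from the Ghostscript parameters shown in step 4 produces an argument list in which `-sOutputFile=` is followed by the input path with `.TIF` appended, and the input path itself is the final argument. Testing the command line manually in this way before enabling external conversion can help separate program errors from configuration errors.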

Customizing Document Conversion Processing Using Scripts

To customize Document Conversion Processor behavior, use JavaScript scripts.

To use a script in a Document Conversion Processor job:
  1. From a developer, obtain a Document Conversion Processor JavaScript file.

    See Creating Document Conversion Processor Scripts in Developing Scripts for Oracle WebCenter Enterprise Capture.

  2. On the workspace's Advanced tab, import the script, specifying Document Converter Processor as its script type, and identifying the script file. See Managing Capture Scripts.
  3. On the Processing tab, select the Document Conversion Processor job and click the Edit button.
  4. In the Script field on the Document Selection train stop, select the document conversion script you imported.
  5. Test the results of the added document conversion script.

Handling Document Conversion Processing System Errors

A Document Conversion Processor job might encounter system errors such as the following during processing:

  • Errors converting non-images.

  • Errors related to accessing the database, such as a network failure.

In addition to relying on email notifications, the Capture system administrator can consult the Document Conversion Processor performance metrics and logs to address system issues.