The Document Conversion Processor provides automated conversion of non-image electronic documents and attachments such as Microsoft Word, Excel, or PDF documents to image format. Create Document Conversion Processor jobs that specify the following:
Which documents and attachments to convert, specified by file name filter (for example, PDF files only).
The format to convert non-image files into: black and white TIFF or color JPEG.
If and how an external conversion program should be used to convert documents.
If and how non-image documents and attachments should be merged during document conversion.
How metadata values should be applied when merging documents.
The next post-processing step (if any) after document conversion. For example, converted batches might be sent to a Recognition Processor job or output by the Commit Processor. If no post-processing step is specified, processed batches become available for indexing users to complete.
Figure 7-1 Specifying Documents to Convert on the Document Selection Train Stop
Batches undergoing document conversion may contain a mixture of image and non-image document files. If the processor encounters documents that are already in image format, it skips them but still merges documents as configured.
In Capture, document conversion is an intermediate batch flow step. This means you must configure how batches reach document conversion (Configuring Batch Flow to a Document Conversion Processor Job), whether an external conversion program should be used (Specifying Settings for Using an External Conversion Program), and the next post-processing step (if any) that occurs after document conversion (Configuring Post-Processing and Monitoring).
You can monitor document conversion processing through post-processing options. For example, you can configure separate email notifications for batches that process successfully and for those that encounter system errors, and you can rename batches and change their status or priority. For post-processing information, see Configuring Post-Processing and Monitoring. For information about system errors, see Handling Document Conversion Processing System Errors.
The Document Conversion Processor is often used with other processor jobs, as described in the following scenarios:
Using a multi-function device (MFD), an end-user scans an expense report, which the MFD then emails.
The end-user scans the expense report and its receipts into a single PDF file that includes a cover sheet with one or more bar codes.
After scanning, the MFD emails the document to a designated email account for expenses processing.
The Import Processor imports and processes the email, creates a batch, and forwards it for document conversion.
The Import Processor processes the email message and creates a batch containing two documents:
1) Report PDF (cover sheet/report)
2) Email message, positioned as the last document in the batch
After processing the email message, the Import Processor forwards the newly created batch to the Document Conversion Processor queue.
The Document Conversion Processor converts the documents in the batch to image format, merges the documents (expense report and email message) into a single document, and forwards the batch for recognition processing.
The Document Conversion Processor converts the PDF and email message to image format.
The Document Conversion Processor merges the two documents into a single document. Because the email message might contain important information, it is included as the last page of the expense report document.
The Document Conversion Processor forwards the batch to the Recognition Processor queue so that the expense report number can be automatically recognized from the cover sheet.
The Recognition Processor performs bar code recognition and indexing of the document, and then forwards the batch to the Commit Processor for committing.
The Commit Processor commits the documents in the batch.
A vendor sends an email message to a designated email account, with two invoices attached in PDF format and explanatory text in the message body.
The Import Processor imports and processes the email message, creating a batch containing three documents.
The import job is configured to place the email message body as the last document in the batch and to forward the batch for document conversion.
The Document Conversion Processor converts, merges, and forwards the batch for committing.
The document conversion job converts the PDFs and the email message documents to image format.
The document conversion job is configured to merge the last document in the batch (the email message body) with each previous document. As a result, the email message body is appended to both invoices.
The email message document (third document) is removed from the batch.
The Commit Processor commits each invoice along with the original email message so that another process such as Oracle WebCenter Forms Recognition can perform automated data extraction.
To add, copy, or edit a document conversion job:
In a selected workspace, select the Processing tab.
In the Document Conversion Jobs table, click the Add button or select a job and click the Edit button.
You can also copy a document conversion job by selecting the job, clicking the Copy button, and entering a new name when prompted. Copying a job lets you quickly duplicate an existing job and then modify it.
Complete settings on the Document Selection train stop.
Enter a name for the job in the Name field. Enter an optional description in the Description field.
In the Documents to Convert field, select whether to process all non-image documents or only ones matching a specified file name filter. For example, to process PDF documents only, choose Selected non-image documents, then enter *.pdf in the File Name Filter field. You can enter an asterisk (*) as a wildcard character, and separate multiple filters with a comma or semi-colon. If you do not want to convert documents, you can select the Do not convert field. To process documents for specific document profiles, select one or more document profiles listed in the Restrict to Document Profiles field. Select All to process documents for all defined document profiles.
In the Attachments to Convert field, select whether to process all non-image document attachments or only those matching a specified file name filter. For example, to process PDF attachments only, choose Selected non-image documents, then enter *.pdf in the File Name Filter field. You can enter an asterisk (*) as a wildcard character, and separate multiple filters with a comma or semi-colon. If you do not want to convert attachments, you can select the Do not convert field. To process attachments for specific attachment types, select one or more attachment types listed in the Restrict to Attachment Types field. Select All to process attachments for all defined attachment types.
Complete settings on the Output Format train stop.
You can convert non-image documents to either black and white TIFF (default) or color JPEG. If you select JPEG, specify an image quality from 1 (lowest quality) to 99 (highest quality) to use for compression in the JPEG Image Quality field; the default value is 85. Select a resolution in the DPI field; the default is 200.
Complete settings on the External Conversion train stop. See Specifying Settings for Using an External Conversion Program.
Complete settings on the Document Merge Options train stop.
On the Post-Processing train stop, specify what happens after document conversion processing completes, depending on its success.
Review settings on the Summary train stop and click Submit to save the job.
Configure how batches flow to the Document Conversion Processor job. See Configuring Batch Flow to a Document Conversion Processor Job.
Test the document conversion job you created.
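The selection and output-format rules in the steps above can be sketched in Python. This is a minimal illustration, not Capture's implementation: the mode names (`all`, `selected`, `none`), the set of extensions treated as already-image files, and case-insensitive matching are assumptions. Only the `*` wildcard, the comma or semicolon separators, the 1 to 99 JPEG quality range, and the defaults (TIFF, quality 85, 200 DPI) come from the text.

```python
import os
import re
from dataclasses import dataclass
from typing import List

# Assumption: extensions treated as "already image format" and therefore skipped.
IMAGE_EXTENSIONS = {".tif", ".tiff", ".jpg", ".jpeg", ".png", ".bmp", ".gif"}


def parse_filters(filter_text: str) -> List[str]:
    """Split a File Name Filter value such as '*.pdf; *.docx' on commas or semicolons."""
    return [part.strip() for part in re.split(r"[,;]", filter_text) if part.strip()]


def wildcard_to_regex(pattern: str) -> "re.Pattern[str]":
    """Treat only '*' as a wildcard; every other character matches literally."""
    pieces = (re.escape(piece) for piece in pattern.split("*"))
    return re.compile(".*".join(pieces), re.IGNORECASE)  # assumption: case-insensitive


def is_image(file_name: str) -> bool:
    return os.path.splitext(file_name)[1].lower() in IMAGE_EXTENSIONS


def should_convert(file_name: str, mode: str, filter_text: str = "") -> bool:
    """mode is a hypothetical stand-in for the three choices: 'all', 'selected', 'none'."""
    if mode not in ("all", "selected", "none"):
        raise ValueError(f"unknown mode: {mode!r}")
    if mode == "none" or is_image(file_name):
        return False  # Do not convert was chosen, or the file is already an image
    if mode == "all":
        return True
    # Selected non-image documents: convert only if some filter matches the whole name.
    return any(wildcard_to_regex(p).fullmatch(file_name) for p in parse_filters(filter_text))


@dataclass
class OutputFormat:
    image_format: str = "TIFF"  # black and white TIFF is the default
    jpeg_quality: int = 85      # only used when image_format == "JPEG"
    dpi: int = 200

    def validate(self) -> None:
        if self.image_format not in ("TIFF", "JPEG"):
            raise ValueError("output format must be TIFF or JPEG")
        if self.image_format == "JPEG" and not 1 <= self.jpeg_quality <= 99:
            raise ValueError("JPEG image quality must be between 1 and 99")
        if self.dpi <= 0:
            raise ValueError("DPI must be positive")
```

For example, `should_convert("Invoice.PDF", "selected", "*.pdf;*.docx")` returns `True`, while an already-image file such as `scan.tif` is skipped in every mode.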
Deleting a document conversion job makes it unavailable to batches for which it is set as a post-processing step. If a job specified for post-processing is unavailable, an error results for the batch. Consider taking a job offline for a while before deleting it, so that you can resolve any unexpected issues its removal causes.
To delete a document conversion job:
To deactivate a job instead, select it and click the Toggle Online/Offline button.
When online, a document conversion job runs when it is selected on the Post-Processing train stop of a client profile or processor job. You can temporarily stop the job from running (take it offline) or bring a deactivated job back online.
When you reactivate a job, it may take up to a minute for the job to resume processing batches that were queued while it was offline.
Follow these steps to change a document conversion job to online or offline:
You can also activate or deactivate a document conversion job by selecting or deselecting the Online field on the Document Selection train stop.
The Document Conversion Processor lets you specify if and how to merge documents in a batch during conversion processing and how to assign metadata values when merging documents.
The merge and metadata assignment options accommodate common document conversion scenarios. For example, the Import Processor might import email messages with PDF attachments, then send them for document conversion. Because the email message is common to each attached PDF document and might be important for processing or indexing each one, you would select one of the document merge options that merges a source document (email message, in this case) with all other target documents (PDF).
If and how to merge documents in the batch.
If merging one document with all other documents, where to place that document.
Which document's metadata values to assign and retain when merging.
If and how to include the source document’s attachments when merging documents.
Do not merge documents: Select this option (default) if batches to convert are already organized into documents or you want to convert without merging documents. When you select this option, all other fields on the tab are disabled.
Merge all documents: Select this option to merge all documents in the batch into a single document. When selected, the first document in the batch is considered the target document and all other documents are considered source documents and appended to it.
Merge first document with all other documents: Select this option to merge the first document in the batch with all other documents. The first document is considered the source document and is added to the start or end of the target document based on your Source Document Page Placement setting.
Merge last document with all other documents: Select this option to merge the last document in the batch with all other documents. The last document is considered the source document and is added to the start or end of the target document based on your Source Document Page Placement setting.
Apply source document's metadata values: Select this option to apply the source document's metadata values to the target document. If multiple source documents have metadata values, the values from the last one processed are applied to the target document.
Allow target document's metadata values to be overwritten: Select this option to allow the target document's metadata values to be overwritten by the source document's metadata values.
These fields can be selected together. They are deselected by default; the default behavior when merging first or last documents is to apply the target document's metadata values. The default behavior when merging all documents is to apply the first document's (target's) metadata values.
Do not include Attachments: Select this option if you do not want to include the source document’s attachments.
Include all Attachments to merged documents (default): Select this option to include all of the source document’s attachments in the merged documents.
Include Attachments with matching Document Profile Attachment Types: Select this option to include only attachments that match the document profile’s attachment types.
When you select Merge all documents in the Batch Merge Option field, all documents in the batch, including the first document, are treated as source documents for these attachment options.
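The page and metadata rules for the batch merge options can be modeled with a short Python sketch. It is illustrative only: the option names (`none`, `all`, `first`, `last`), the removal of the source document after a first or last merge (described above only for the invoice scenario), and the reading of Allow target document's metadata values to be overwritten (without it, source values fill only fields the target left empty) are assumptions rather than documented behavior. Attachment handling is not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Document:
    name: str
    pages: List[str]
    metadata: Dict[str, str] = field(default_factory=dict)


def _merge_metadata(target_meta, sources, apply_source, allow_overwrite):
    """Start from the target's values; optionally layer source values on top."""
    result = dict(target_meta)
    if not apply_source:
        return result  # default: keep the target document's metadata values
    protected = {key for key, value in target_meta.items() if value}
    for src in sources:  # processing order, so the last source processed wins
        for key, value in src.metadata.items():
            if allow_overwrite or key not in protected:
                result[key] = value
    return result


def merge_batch(docs, option, placement="end", apply_source=False, allow_overwrite=False):
    """Return a new list of documents; the input batch is not modified."""
    if placement not in ("start", "end"):
        raise ValueError(f"unknown placement: {placement!r}")
    if option == "none" or len(docs) < 2:
        return list(docs)
    if option == "all":
        target, sources = docs[0], docs[1:]  # first document is the target
        pages = list(target.pages)
        for src in sources:
            pages.extend(src.pages)  # sources appended in batch order
        meta = _merge_metadata(target.metadata, sources, apply_source, allow_overwrite)
        return [Document(target.name, pages, meta)]
    if option == "first":
        source, targets = docs[0], docs[1:]
    elif option == "last":
        source, targets = docs[-1], docs[:-1]
    else:
        raise ValueError(f"unknown merge option: {option!r}")
    merged = []
    for tgt in targets:
        # Source Document Page Placement decides start or end of each target.
        if placement == "start":
            pages = list(source.pages) + list(tgt.pages)
        else:
            pages = list(tgt.pages) + list(source.pages)
        meta = _merge_metadata(tgt.metadata, [source], apply_source, allow_overwrite)
        merged.append(Document(tgt.name, pages, meta))
    return merged  # assumption: the source document itself is removed from the batch


# Usage mirroring the invoice scenario: merge the last document with all others.
batch = [
    Document("Invoice 1", ["inv1-p1"], {"Vendor": "Acme"}),
    Document("Invoice 2", ["inv2-p1", "inv2-p2"], {"Vendor": "Acme"}),
    Document("Email body", ["email-p1"], {"Subject": "March invoices"}),
]
for doc in merge_batch(batch, "last"):
    print(doc.name, doc.pages)
# Invoice 1 ['inv1-p1', 'email-p1']
# Invoice 2 ['inv2-p1', 'inv2-p2', 'email-p1']
```

Returning new `Document` objects rather than editing the batch in place keeps the sketch easy to test; a real processor would update the batch itself.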
Use a document conversion job's post-processing options to specify what happens after processing completes, depending on processing success.
To configure post-processing settings:
In a selected workspace, add or edit a Document Conversion Processor job. See Adding, Copying, or Editing a Document Conversion Job.
Click the Post-Processing train stop.
The screen lists the same processing options for successful processing (no system errors) and unsuccessful processing (one or more system errors).
In the Batch Processor and Batch Processor Job fields, specify which processing step, if any, occurs after document conversion processing completes. You can choose a batch processor of None (no processing occurs), Commit Processor, Recognition Processor, or Document Conversion Processor. If you choose Recognition Processor or Document Conversion Processor, specify a processor job.
For example, you might send batches with no system errors to the Commit Processor. You might specify None for batches with system errors, then change their batch status or prefix to facilitate further processing in the client.
In the email address fields, optionally enter an address to which an email is sent when processing completes successfully or fails. While configuring and testing a Document Conversion Processor job, you might send system error notifications to yourself, and later change the address so that an administrator is alerted automatically.
In the remaining fields, specify how to change processed batches.
Rename batches by adding a prefix. For example, rename unsuccessful batches with the prefix ERR for follow-up.
Change batch status or priority. For example, you might change the status of batches with system errors, then create a client profile with batch filtering set to this status to allow qualified users to manually edit and complete batches that encountered errors.
Click Submit to save the job.
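The post-processing settings above amount to choosing one of two rule sets depending on whether a batch encountered system errors. The following Python sketch is a hypothetical model: the `PostProcessing` and `Batch` classes and their field names are invented for illustration. It does not send email; it only returns the address that would be notified.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Processors that also require a specific processor job to be chosen.
NEEDS_JOB = {"Recognition Processor", "Document Conversion Processor"}


@dataclass
class PostProcessing:
    # One of: "None", "Commit Processor", "Recognition Processor",
    # "Document Conversion Processor"
    batch_processor: str = "None"
    processor_job: Optional[str] = None
    email: Optional[str] = None
    prefix: str = ""
    status: Optional[str] = None
    priority: Optional[int] = None


@dataclass
class Batch:
    name: str
    status: str = ""
    priority: int = 0


def apply_post_processing(batch: Batch, had_system_errors: bool,
                          on_success: PostProcessing,
                          on_error: PostProcessing) -> Tuple[str, Optional[str], Optional[str]]:
    """Update the batch and return (next processor, next job, address to notify)."""
    rule = on_error if had_system_errors else on_success
    if rule.batch_processor in NEEDS_JOB and not rule.processor_job:
        raise ValueError(f"{rule.batch_processor} requires a processor job")
    if rule.prefix:
        batch.name = rule.prefix + batch.name  # for example, ERR for follow-up
    if rule.status is not None:
        batch.status = rule.status
    if rule.priority is not None:
        batch.priority = rule.priority
    return rule.batch_processor, rule.processor_job, rule.email
```

With an error rule of `PostProcessing(prefix="ERR", status="Error")`, a failed batch named `EXP001` comes back as `ERREXP001` with its status changed and no next processor, ready for a client profile that filters on that status.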
To run a document conversion job, you must configure batches to flow to the job for processing. You do this by setting the Document Conversion Processor job as a post-processing step in a client profile or other processor job. To configure batch flow from:
A client profile, see Configuring a Client Profile's Post-Processing.
An Import Processor job, see Configuring Post-Processing.
A Recognition Processor job, see Configuring Post-Processing and Monitoring.
For example, you might create an Import Processor job that imports email messages and their PDF attachments, then sends them to the Document Conversion Processor for conversion to image format, then sends them to a Recognition Processor job for bar code recognition.
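Because each step names its successor in its post-processing settings, a batch flow behaves like a chain. The hypothetical Python sketch below follows such a chain and flags a step that points at an undefined job, echoing the earlier warning that an unavailable post-processing job causes a batch error. The job names, the dictionary representation, and treating a loop as a configuration mistake are all assumptions.

```python
from typing import Dict, List, Optional, Set


def trace_flow(start: str, next_step: Dict[str, Optional[str]], defined: Set[str]) -> List[str]:
    """Follow post-processing links from start until a step forwards nowhere (None)."""
    path = [start]
    current = start
    while True:
        nxt = next_step.get(current)
        if nxt is None:
            return path  # no further post-processing step
        if nxt not in defined:
            raise LookupError(f"{current!r} forwards batches to undefined job {nxt!r}")
        if nxt in path:
            raise ValueError(f"loop detected at {nxt!r}")  # assumption: treated as a mistake
        path.append(nxt)
        current = nxt


# Hypothetical flow matching the example above: import, convert, then recognize.
flow = {
    "Email Import": "Convert to Image",
    "Convert to Image": "Bar Code Recognition",
    "Bar Code Recognition": None,
}
print(trace_flow("Email Import", flow, set(flow)))
# ['Email Import', 'Convert to Image', 'Bar Code Recognition']
```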
The External Conversion train stop lets you specify if and how to use an external conversion program for document conversion.
Full path to the external conversion program:
-dNOPAUSE -q -r200 -sDEVICE=tiffg4 -dBATCH -sOutputFile=<Output File> <Input File>
C:\Users\captureuser\AppData\Local\Temp\1\Sample.PDF, then the output file name will be
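The argument line shown above uses Ghostscript options, with `<Output File>` and `<Input File>` acting as placeholders. A minimal Python sketch of turning such a template into an argument list might look like the following, assuming the two placeholders are replaced with the actual file paths. The `build_command` function, the `gs` program name (Ghostscript's usual command on Linux; Windows builds use a different executable name), and the file names are hypothetical, and the output name is made up for illustration rather than following Capture's naming rule. The sketch builds the list but does not run the program.

```python
import shlex
from typing import List

INPUT_PLACEHOLDER = "<Input File>"
OUTPUT_PLACEHOLDER = "<Output File>"


def build_command(program_path: str, argument_template: str,
                  input_file: str, output_file: str) -> List[str]:
    """Expand the two placeholders and return an argument list for subprocess.run."""
    # Swap the placeholders for space-free sentinels first, so shlex.split does not
    # cut "<Output File>" in two and paths containing spaces stay single arguments.
    template = (argument_template
                .replace(INPUT_PLACEHOLDER, "__INPUT_FILE__")
                .replace(OUTPUT_PLACEHOLDER, "__OUTPUT_FILE__"))
    args = [token.replace("__INPUT_FILE__", input_file)
                 .replace("__OUTPUT_FILE__", output_file)
            for token in shlex.split(template)]
    return [program_path] + args


ARGS = "-dNOPAUSE -q -r200 -sDEVICE=tiffg4 -dBATCH -sOutputFile=<Output File> <Input File>"
cmd = build_command("gs", ARGS, "Sample.PDF", "Sample.tif")  # hypothetical names
print(cmd)
# To actually run it: subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` avoids a second round of shell quoting, which matters for Windows paths such as those under `C:\Users\...` that may contain spaces.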
A Document Conversion Processor job might encounter system errors such as the following during processing:
Errors converting non-images.
Errors related to accessing the database, such as a network failure.
In addition to email notification, the Capture system administrator can consult the Document Conversion Processor performance metrics and logs to address system issues.