Oracle® Fusion Middleware Managing Oracle WebCenter Capture
11g Release 1 (11.1.1)
Part Number E37898-02
Capture's Document Conversion Processor provides two key functions: using Oracle Outside In Technology, it automatically converts non-image documents to image format, and optionally merges documents. This chapter describes how workspace managers manage document conversion, by configuring document conversion jobs and monitoring their processing.
The Document Conversion Processor provides automated conversion of non-image electronic documents such as Microsoft Word, Excel, or PDF documents to image format. Create document conversion processor jobs that specify the following:
Which files to convert, specified by a file name filter (for example, *.pdf for PDF files).
The format to convert non-image files into: black and white TIFF or color JPEG.
If and how non-image documents should be merged during document conversion and how metadata values should be applied when merging.
The next post-processing step (if any) after document conversion. For example, converted batches might go to a recognition processor job or the commit processor might output them. If no post-processing step is specified, processed batches become available for indexing users to complete.
Figure 6-1 Specifying Documents to Convert on the Document Selection Train Stop
Batches undergoing document conversion may contain a mixture of image and non-image document files. If the processor encounters documents in image format, it processes the batches, skipping conversion of image batch items but merging documents as configured.
In Capture, document conversion is an intermediate batch flow step. This means you must configure how batches reach document conversion (Section 6.7) and the next post-processing step (if any) that occurs after document conversion (Section 6.6).
You can monitor document conversion processing through post-processing options. For example, you can configure separate email notification for batches that process successfully and for those that encounter system errors, and can rename batches and change their status or priority. For post-processing information, see Section 6.6. For information about system errors, see Section 6.8.
The Document Conversion Processor is often used with other processor jobs, as described in the following scenarios:
Using a multi-function device (MFD), an end user scans an expense report and emails it.
The end user scans the expense report along with receipts into a single PDF file that contains a coversheet with one or more bar codes.
The end user then attaches the PDF to an email and sends it to a designated email account.
The Import Processor imports and processes the email, creates a batch, and forwards it for document conversion.
The Import Processor processes the email message, and creates a batch containing two documents:
1) Report PDF (coversheet/report)
2) Email message, positioned as the last document in the batch
After processing the email message, the Import Processor forwards the batch to the Document Conversion Processor queue.
The Document Conversion Processor converts the documents in the batch to image format, merges the documents (expense report and email message) into a single document, and forwards the batch for recognition processing.
The Document Conversion Processor converts the PDF and email message to image format, so that the Recognition Processor can perform bar code recognition on them.
The Document Conversion Processor merges the two documents into a single document. Because the email message might contain important information, it is appended to the end of the expense report document.
The Document Conversion Processor forwards the batch to the Recognition Processor queue.
The Recognition Processor performs bar code recognition and indexing of the document, then forwards the batch for committing.
The Commit Processor commits the batch.
A vendor sends an email to a designated email account with two invoices in PDF format attached and explanatory text in the email message's body text.
The Import Processor imports and processes the email message, creating a batch containing three documents.
The import job is configured to place the email message body as the last document in the batch and to forward the batch for document conversion.
The Document Conversion Processor converts, merges, and forwards the batch for committing.
The document conversion job converts the PDFs and the email message documents to image format.
The document conversion job is configured to merge the last document (email message body) in the batch to each previous document. As a result, the email message body is appended to both invoices.
The email message document (third document) is removed from the batch.
The Commit Processor commits each invoice along with the original email message so that another process such as Oracle WebCenter Forms Recognition can perform automated data extraction.
To add, copy, or edit a document conversion job:
In a selected workspace, select the Processing tab.
In the Document Conversion Jobs table, click the Add button or select a job and click the Edit button.
You can also copy a document conversion job by selecting it, clicking the Copy button, and entering a new name when prompted. Copying a job lets you quickly duplicate and modify it.
Complete settings on the Document Selection train stop.
Enter a name for the job in the Name field.
In the Documents to Convert field, select whether to process all non-image documents or only those that match a specified filter. For example, to process PDF documents only, choose Selected non-image documents, then enter *.pdf in the File Name Filter field. You can enter an asterisk (*) as a wildcard character, and separate multiple filters with a comma or semicolon.
Complete settings on the Output Format train stop.
You can convert non-image documents to either black and white TIFF (default) or color JPEG. If you select JPEG, specify an image quality from 0 (lowest quality) to 100 (highest quality) to use for compression in the JPEG Image Quality field; the default value is 85. Select a resolution in the DPI field; the default is 200.
Complete settings on the Document Merge Options train stop.
See Section 6.5.
On the Post-Processing train stop, specify what happens after document conversion processing completes, depending on its success.
See Section 6.6.
Review settings on the Summary train stop and click Submit to save the job.
Configure how batches flow to the processor job. See Section 6.7.
Test the document conversion job you created.
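As a rough illustration of the Document Selection and Output Format settings in the procedure above, the following Python sketch models how a File Name Filter value such as *.pdf might be evaluated and how the stated output format defaults could be checked. It is not Capture code. The names parse_filters, matches_filter, and OutputFormat are hypothetical, and case-insensitive matching is an assumption that the text above does not confirm.

```python
import re
from dataclasses import dataclass


def parse_filters(filter_text: str) -> list[str]:
    """Split a File Name Filter value on commas or semicolons."""
    return [p.strip() for p in re.split(r"[,;]", filter_text) if p.strip()]


def matches_filter(file_name: str, filter_text: str) -> bool:
    """Return True if file_name matches at least one filter.

    Only the asterisk is treated as a wildcard. Case-insensitive
    matching is an assumption, not documented behavior.
    """
    for pattern in parse_filters(filter_text):
        # Escape everything, then turn each escaped asterisk into ".*".
        regex = re.escape(pattern).replace(r"\*", ".*")
        if re.fullmatch(regex, file_name, flags=re.IGNORECASE):
            return True
    return False


@dataclass
class OutputFormat:
    """Output Format settings with the defaults stated in the procedure."""
    image_format: str = "TIFF"  # black and white TIFF (default) or "JPEG"
    jpeg_quality: int = 85      # 0 (lowest) to 100 (highest); JPEG only
    dpi: int = 200

    def __post_init__(self) -> None:
        if self.image_format not in ("TIFF", "JPEG"):
            raise ValueError("image_format must be 'TIFF' or 'JPEG'")
        if not 0 <= self.jpeg_quality <= 100:
            raise ValueError("jpeg_quality must be between 0 and 100")
        if self.dpi <= 0:
            raise ValueError("dpi must be positive")


if __name__ == "__main__":
    # Illustrative file names only.
    for name in ("Invoice.PDF", "notes.docx", "scan.tif"):
        print(name, matches_filter(name, "*.pdf; *.docx"))
    print(OutputFormat(image_format="JPEG"))
```

A sketch like this can help when planning filters before entering them in the job, for example to confirm that a combined filter such as *.pdf; *.docx selects the files you expect.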
Deleting a document conversion job makes it unavailable to batches for which it is set as a post-processing step. If a job specified for post-processing is unavailable, an error results for the batch. Consider taking a job offline for a time before deleting it, so that you can resolve any unexpected issues before removing it permanently.
To delete a document conversion job:
In a selected workspace, select the Processing tab.
Select a job to delete in the Document Conversion Jobs table, and click the Delete button.
To deactivate a job instead, select it and click the Toggle Online/Offline button.
When prompted, click Yes to confirm the deletion.
Document conversion jobs that are online run when they are selected in a client profile or on a processor job's Post-Processing train stop. You can temporarily stop a job from running (take it offline) or change a deactivated job to run again.
After you reactivate a job, it may take up to a minute for the job to resume processing batches that were queued while it was offline.
Follow these steps to change a document conversion job to online or offline:
On the Processing tab, select a job in the Document Conversion Jobs table. Notice that the Status column displays Online or Offline for each job.
Click the Toggle Online/Offline button to either activate the job or deactivate it.
Note that you can also deactivate or activate a document conversion job by selecting or deselecting the Online field on the Document Selection train stop.
The document conversion processor lets you specify if and how to merge documents in a batch during conversion processing and how to assign metadata values when merging documents.
The merge and metadata assignment options accommodate common document conversion scenarios. For example, the Import Processor might import email messages with PDF attachments, then send them for document conversion. Because the email message is common to each attached PDF document and might be important for processing or indexing each one, you would select one of the document merge options that merges a source document (email message, in this case) with all other target documents (PDF).
Add or edit a document conversion job as described in Section 6.2, and select the Document Merge Options train stop. Use this tab to specify:
If and how to merge documents in the batch.
If merging one document with all other documents, where to place that document.
Which document's metadata values to assign and retain when merging.
Select a batch merge option. You can choose:
Do not merge documents: Select this option (default) if batches to convert are already organized into documents or you want to convert without merging documents. When you select this option, all other fields on the tab are disabled.
Merge all documents: Select this option to merge all documents in the batch into a single document. When selected, the first document in the batch is considered the target document and all other documents are considered source documents and appended to it.
Merge first document with all other documents: Select this option to merge the first document in the batch with all other documents. Note that the first document is considered the source document and is added to the start or end of the target document based on your Source Document Page Placement setting.
Merge last document with all other documents: Select this option to merge the last document in the batch with all other documents. Note that the last document is considered the source document and is added to the start or end of the target document based on your Source Document Page Placement setting.
In the Source Document Page Placement field, specify whether to add the source document to the start or end of the target document.
This field applies only when the Merge first document with all other documents or Merge last document with all other documents option is selected in the Batch Merge Options field.
Specify how to apply metadata values to merged documents. You can choose:
Apply source document's metadata values: Select this option to apply the source document's metadata values to the target document. If each source document has metadata values, the last one processed provides the target metadata values.
Allow target document's metadata values to be overwritten: Select this option to allow the target document's metadata values to be overwritten by the source document's metadata values.
Note that these fields can be selected together. They are deselected by default; the default behavior when merging first or last documents is to apply the target document's metadata values. The default behavior when merging all documents is to apply the first document's (target's) metadata values.
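The merge options described above can be sketched as a small Python model. This is an illustration under stated assumptions, not the processor's implementation. Document, combine_metadata, and merge_batch are hypothetical names. The way the two metadata options interact is an assumption: here, source values fill only fields that the target left empty unless overwriting is also allowed. Removing the source document from the batch after a first or last merge follows the vendor invoice scenario earlier in this chapter and is assumed to apply to both options.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    name: str
    pages: list[str]
    metadata: dict[str, str] = field(default_factory=dict)


def combine_metadata(target_meta, source_metas, apply_source, allow_overwrite):
    """Hypothetical reading of the two metadata options.

    Assumption: without apply_source the target's values are kept. With it,
    source values fill fields the target left empty, and they replace
    existing target values only when allow_overwrite is also selected.
    """
    result = dict(target_meta)
    if not apply_source:
        return result
    for source_meta in source_metas:  # later sources win over earlier ones
        for key, value in source_meta.items():
            if allow_overwrite or not target_meta.get(key):
                result[key] = value
    return result


def merge_batch(docs, option="none", placement="end",
                apply_source=False, allow_overwrite=False):
    """Return the documents left in the batch after merging.

    option: "none", "all", "first", or "last" (the four batch merge options).
    placement: "start" or "end" (Source Document Page Placement).
    """
    if placement not in ("start", "end"):
        raise ValueError(f"unknown placement: {placement!r}")
    if option == "none" or len(docs) < 2:
        return list(docs)
    if option == "all":
        target, sources = docs[0], docs[1:]
        pages = list(target.pages)
        for source in sources:
            pages.extend(source.pages)  # sources are appended to the target
        meta = combine_metadata(target.metadata, [s.metadata for s in sources],
                                apply_source, allow_overwrite)
        return [Document(target.name, pages, meta)]
    if option == "first":
        source, targets = docs[0], docs[1:]
    elif option == "last":
        source, targets = docs[-1], docs[:-1]
    else:
        raise ValueError(f"unknown merge option: {option!r}")
    merged = []
    for target in targets:
        if placement == "start":
            pages = source.pages + target.pages
        else:
            pages = target.pages + source.pages
        meta = combine_metadata(target.metadata, [source.metadata],
                                apply_source, allow_overwrite)
        merged.append(Document(target.name, pages, meta))
    return merged  # the source document itself is no longer in the batch


if __name__ == "__main__":
    # Illustrative batch: two invoices followed by the email message body.
    batch = [
        Document("invoice1.pdf", ["inv1-p1"]),
        Document("invoice2.pdf", ["inv2-p1"]),
        Document("email body", ["email-p1"]),
    ]
    for doc in merge_batch(batch, option="last", placement="end"):
        print(doc.name, doc.pages)
```

The example batch mirrors the vendor invoice scenario: merging the last document with all other documents, placed at the end, leaves two documents in the batch, each ending with the email page.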
Use a document conversion job's post-processing options to specify what happens after processing completes, depending on processing success.
To configure post-processing settings:
In a selected workspace, add or edit a document conversion processor job. See Section 6.2.
Click the Post-Processing train stop.
The screen lists the same processing options for successful processing (no system errors) and unsuccessful processing (one or more system errors).
In the Batch Processor and Batch Processor Job fields, specify which processing step, if any, occurs after document conversion processing completes. You can choose a batch processor of None (no processing occurs), Commit Processor, Recognition Processor, or Document Conversion Processor. If you choose Recognition Processor or Document Conversion Processor, specify a processor job.
For example, you might send batches with no system errors to the commit processor. You might specify None for batches with system errors, then change their batch status or prefix to facilitate further processing in the client.
In the email address fields, optionally enter an address to which to send an email after processing completes successfully or fails. While configuring and testing a document conversion processor job, you might enter your own address to receive notifications of system errors, then later change it so that an administrator is alerted automatically to processing errors.
In the remaining fields, specify how to change processed batches.
Rename batches by adding a prefix. For example, rename batches that were unsuccessful with the prefix ERR for follow-up.
Change batch status or priority. For example, you might change the status of batches with system errors, then create a client profile with batch filtering set to this status to allow qualified users to manually edit and complete batches that encountered errors.
Click Submit to save the job.
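To make the two sets of post-processing options concrete, the following Python sketch models one group of settings for successful processing and one for processing with system errors, then applies the matching group to a batch. It is a minimal illustration, not Capture's API. PostProcessing, Batch, and apply_post_processing are hypothetical names, and the job name and status value in the usage lines are invented.

```python
from dataclasses import dataclass
from typing import Optional

JOB_REQUIRED = {"Recognition Processor", "Document Conversion Processor"}
PROCESSORS = {"None", "Commit Processor"} | JOB_REQUIRED


@dataclass
class PostProcessing:
    """One group of options (used for either success or system errors)."""
    batch_processor: str = "None"
    processor_job: Optional[str] = None
    email_address: Optional[str] = None
    batch_prefix: str = ""
    batch_status: Optional[str] = None
    batch_priority: Optional[int] = None

    def __post_init__(self) -> None:
        if self.batch_processor not in PROCESSORS:
            raise ValueError(f"unknown batch processor: {self.batch_processor!r}")
        if self.batch_processor in JOB_REQUIRED and not self.processor_job:
            raise ValueError(f"{self.batch_processor} requires a processor job")


@dataclass
class Batch:
    name: str
    status: Optional[str] = None
    priority: int = 0


def apply_post_processing(batch, had_system_errors, on_success, on_failure):
    """Update the batch in place and return (next_step, notify_address).

    next_step is None when no further processing occurs; otherwise it is a
    (batch processor, processor job) pair.
    """
    settings = on_failure if had_system_errors else on_success
    if settings.batch_prefix:
        batch.name = settings.batch_prefix + batch.name
    if settings.batch_status is not None:
        batch.status = settings.batch_status
    if settings.batch_priority is not None:
        batch.priority = settings.batch_priority
    if settings.batch_processor == "None":
        next_step = None
    else:
        next_step = (settings.batch_processor, settings.processor_job)
    return next_step, settings.email_address


if __name__ == "__main__":
    # Hypothetical values: send clean batches to commit, flag failed ones.
    on_success = PostProcessing(batch_processor="Commit Processor")
    on_failure = PostProcessing(batch_prefix="ERR", batch_status="Needs Review",
                                email_address="admin@example.com")
    failed = Batch("Expenses-001")
    print(apply_post_processing(failed, True, on_success, on_failure), failed)
```

Keeping the success and failure settings as two instances of the same structure reflects the screen layout described above, where the same options are listed for both outcomes.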
To run a document conversion job, you must configure batches to flow to the job for processing. You do this by setting the document conversion processor job as a post-processing step in a client profile or other processor job. To configure batch flow from:
A client profile, see Section 4.17.
An import processor job, see Section 5.10.
A recognition processor job, see Section 7.2.4.
For example, you might create an Import Processor job that imports email messages and their PDF attachments and sends them to a Document Conversion Processor job for conversion to image format. That job in turn sends the converted batches to a Recognition Processor job for bar code recognition.
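When several jobs are chained this way, it can help to write the flow down and check it before deleting or renaming a job, because a batch routed to an unavailable job results in an error. The following Python sketch follows the post-processing links from a starting job and reports the first unavailable step. The function trace_flow and all job names are hypothetical, and the mapping is something you would maintain by hand; it is not read from Capture.

```python
from typing import Optional


def trace_flow(start: str,
               next_step: dict[str, Optional[str]],
               available: set[str]) -> list[str]:
    """Follow post-processing links from start until the flow ends.

    next_step maps each job to the job configured as its post-processing
    step (or None). available holds the jobs that currently exist.
    """
    path: list[str] = []
    current: Optional[str] = start
    while current is not None:
        if current not in available:
            raise LookupError(f"{current!r} is not available; "
                              "batches sent to it would fail")
        if current in path:
            # A job can forward to another conversion job, so guard loops.
            raise ValueError(f"flow loops back to {current!r}")
        path.append(current)
        current = next_step.get(current)
    return path


if __name__ == "__main__":
    # Hypothetical job names for the email import example.
    steps = {
        "Import Email": "Convert Attachments",
        "Convert Attachments": "Read Bar Codes",
        "Read Bar Codes": "Commit",
    }
    jobs = {"Import Email", "Convert Attachments", "Read Bar Codes", "Commit"}
    print(" -> ".join(trace_flow("Import Email", steps, jobs)))
```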
A document conversion processor job might encounter system errors such as the following during processing:
Errors converting non-images.
Errors related to accessing the database, such as a network failure.
In addition to email notification, the Capture system administrator can consult the Document Conversion Processor performance metrics and logs to address system issues.