This chapter describes how you configure and manage commit profiles and their processing for a workspace.
Organizations choose to commit batches in a variety of ways, depending on configuration and needs.
Some organizations use the WebCenter Content commit driver or WebCenter Content Imaging commit driver to commit documents directly into these content repositories.
Some organizations use the Oracle Documents Cloud Service commit driver to commit documents directly into Oracle Documents Cloud Service.
Some organizations use the Text File commit driver to output document files to a folder and write a list file that references them, for later use in a variety of ways. Another process might then run intelligent data recognition on the documents (for example, using Oracle WebCenter Forms Recognition) or import them independently into a content management system. You can customize the contents of commit text files, but they typically contain a reference to each document file and its associated metadata values.
You can configure multiple commit profiles to process for a workspace. For example, one commit profile may commit documents to a content management system, while another commits to a text file. A Service Bureau might output to a commit text file containing references to images, then FTP the information to clients. Still another organization might commit directly into their content management system and create a commit text file for backup, to be kept for 90 days or archived permanently.
Figure 9-1 Commit Profile General Settings
Regardless of the format in which image documents were captured and formatted in Capture, you can output them to one of the following formats upon commit:
TIFF Multi-Page: Outputs documents in multipage TIFF format.
PDF Image-Only: Outputs documents in PDF/A format.
PDF Searchable: Outputs documents in PDF (Portable Document Format) that contains the original image with hidden text to facilitate full-text searching of the document. Refer to the Capture Certification Matrix for a list of platforms that support searchable PDF.
Caution:
The text generated from OCR results for searchable PDF documents CANNOT be edited and may contain recognition errors.
Non-image documents are files such as Microsoft Word, Microsoft Excel, PDF, or EML documents. Depending on configuration, non-image files may be retained in their native format in Capture. At commit, non-image documents are processed differently than image documents:
Non-image documents remain in their native format and are not converted to TIFF or PDF format. The document output setting is ignored for all non-image documents in the batch, but applied to any image documents.
For example, a Microsoft Word document committed by a commit profile configured to output searchable PDF is committed to the repository as a Word document. Similarly, email messages captured and stored in EML format and output to Content Server are committed in EML format regardless of the commit profile's selected document output format.
Once non-image documents have been successfully committed by all online and applicable commit profiles, they are deleted from the Capture workspace just like image documents.
Batches arriving at commit processing are ready to undergo processing by one or more commit profiles defined for the workspace. A batch may be uniform (for example, consisting of all image documents that use the same document profile) or varied (for example, consisting of image and non-image documents assigned different document profiles). Regardless, batch committing follows this general process:
For a batch to reach commit, Commit Processor must be selected as a post-processing step in the client profile or processor job.
Capture runs all commit profiles defined for the workspace on the batch, subject to the following rules:
Commit profiles run one at a time, in the order in which they are specified on the workspace's Commit tab.
Commit profiles must be Online. Changing a profile to Offline deactivates its use in commit processing for the workspace.
A commit profile skips processing any documents whose assigned document profile does not match the document profiles assigned to the commit profile. See Restricting a Commit Profile Based on Document Profile.
As Capture processes each document, it verifies that required metadata fields are complete. An error occurs for a document if required fields do not contain values.
If an error is encountered, commit processing may skip the document, skip the commit profile, or cancel commit processing, as described in About Commit Error Handling.
Capture commits documents in the batch.
Capture continues to commit all the documents within the batch, repeating this process until all commit profiles have been executed or an error occurs that causes the entire commit process to be canceled.
When a document has been successfully committed by all applicable commit profiles, Capture removes the document's files and associated metadata from the batch.
When there are no remaining documents in the batch, Capture deletes the batch.
If a document fails to be committed, it remains in the batch and an error is generated.
Based on system configuration, Capture outputs records to the audit table.
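The general process above can be modeled in a few lines of Python. This is a minimal, hypothetical sketch, not Capture's implementation: `Document`, `CommitProfile`, and `commit_batch` are invented names, an empty `document_profiles` set stands for a profile with no document profile restriction, `commit_fn` stands in for the commit driver (returning `True` on success), and a document with missing required fields is simply left in the batch because the error handling policies are not modeled here.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Document:
    doc_id: str
    document_profile: str
    metadata: dict
    required_fields: tuple = ()


@dataclass
class CommitProfile:
    name: str
    online: bool = True
    # Hypothetical stand-in for "Restrict Commit to Document Profiles";
    # an empty set means the profile applies to every document.
    document_profiles: frozenset = frozenset()

    def applies_to(self, doc: Document) -> bool:
        return not self.document_profiles or doc.document_profile in self.document_profiles


def missing_required(doc: Document) -> List[str]:
    """Return the required fields that have no value."""
    return [name for name in doc.required_fields if not doc.metadata.get(name)]


def commit_batch(
    batch: List[Document],
    profiles: List[CommitProfile],
    commit_fn: Callable[[CommitProfile, Document], bool],
) -> Tuple[List[Document], bool]:
    """Return (documents left in the batch, whether the batch is deleted)."""
    active = [p for p in profiles if p.online]          # Offline profiles do not run
    succeeded = {doc.doc_id: set() for doc in batch}    # profiles that committed each doc
    for profile in active:                              # one at a time, in Commit tab order
        for doc in batch:
            if not profile.applies_to(doc):
                continue                                # document profile not selected
            if missing_required(doc):
                continue                                # error: document stays in the batch
            if commit_fn(profile, doc):
                succeeded[doc.doc_id].add(profile.name)
    # A document leaves the batch only after all applicable profiles committed it.
    remaining = [
        doc for doc in batch
        if not {p.name for p in active if p.applies_to(doc)} <= succeeded[doc.doc_id]
    ]
    return remaining, not remaining                     # an empty batch is deleted
```

For example, with a restricted "Content" profile, an unrestricted "TextFile" profile, and an Offline "Backup" profile, a document missing a required value stays in the batch while the other document is committed by both online profiles in order.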
Regardless of the format in which image attachments were captured and formatted in Capture, you can output them to one of the following formats upon commit:
TIFF Multi-Page: Outputs attachments in multipage TIFF format.
PDF Image-Only: Outputs attachments in PDF/A format.
PDF Searchable: Outputs attachments in PDF (Portable Document Format) that contains the original image with hidden text to facilitate full-text searching of the attachment. Refer to the Capture Certification Matrix for a list of platforms that support searchable PDF.
Use a commit profile's error handling options to specify what happens when errors are encountered during batch committing. If an error is encountered, you can:
Skip to the next document
This option skips committing the current document and begins committing the next document in the batch.
Skip to the next commit profile
This option stops the current commit profile from executing and begins processing the next commit profile (if specified).
Cancel the commit
This option stops the entire commit process, including any other commit profiles, from executing.
During commit, Capture maintains a record of whether each document or attachment has been successfully committed by a commit profile. Before a commit profile commits a document or attachment, Capture checks this record; if that commit profile has already committed it successfully, Capture does not attempt to commit it again.
Regardless of error settings, all documents in which an error is encountered remain in the batch until the error is resolved and they are successfully committed.
If uncommitted documents remain in a batch after all commit profiles have been executed, the batch lock is cleared and the batch is put in a ready state so that it can be opened in the Capture client.
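The three error handling options and the per-profile commit record can be illustrated with the following sketch. It is a hypothetical model, not Capture code: `ErrorPolicy`, `CommitError`, and `run_profiles` are invented names, each commit profile is reduced to a (name, policy) pair, and the commit record is a set of (profile, document) pairs that persists between runs so that documents already committed by a profile are not reattempted.

```python
from enum import Enum
from typing import Callable, List, Set, Tuple


class ErrorPolicy(Enum):
    SKIP_DOCUMENT = "Skip to the next document"
    SKIP_PROFILE = "Skip to the next commit profile"
    CANCEL_COMMIT = "Cancel the commit"


class CommitError(Exception):
    """Raised by a (hypothetical) driver when a document cannot be committed."""


def run_profiles(
    doc_ids: List[str],
    profiles: List[Tuple[str, ErrorPolicy]],
    record: Set[Tuple[str, str]],
    commit_fn: Callable[[str, str], None],
) -> Tuple[List[Tuple[str, str]], bool]:
    """Return ((profile, doc) pairs that failed, whether the commit was canceled)."""
    errors = []
    for name, policy in profiles:
        for doc_id in doc_ids:
            if (name, doc_id) in record:
                continue                      # already committed by this profile
            try:
                commit_fn(name, doc_id)
            except CommitError:
                errors.append((name, doc_id))
                if policy is ErrorPolicy.SKIP_DOCUMENT:
                    continue                  # go on to the next document
                if policy is ErrorPolicy.SKIP_PROFILE:
                    break                     # go on to the next commit profile
                return errors, True           # CANCEL_COMMIT: stop all profiles
            record.add((name, doc_id))        # remember the successful commit
    return errors, False
```

Calling `run_profiles` a second time with the same `record` attempts only the documents that failed earlier, plus any that a skipped profile never reached, which mirrors how failed documents stay in the batch until they are successfully committed.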
The Capture system administrator can consult the Commit Processor performance metrics and logs to address commit processing issues and errors. See Viewing Performance Metrics and Understanding Loggers for Capture in Administering Oracle WebCenter Enterprise Capture.
To add, copy, or edit a commit profile:
In a selected workspace, click the Commit tab.
In the Commit Profiles table, click the Add button or select a commit profile and click the Edit button.
You can also copy a commit profile by selecting one, clicking the Copy button, and entering a new name when prompted. Copying a commit profile allows you to quickly duplicate and modify it.
Complete settings on the General Settings train stop.
Enter a name in the Commit Profile Name field.
Ensure that the Online field is selected. When online, the commit profile is executed whenever a batch for the workspace is processed by the Commit Processor. See Activating and Ordering Commit Profiles.
In the Commit Driver field, select the method by which the profile commits batches. See Methods of Committing Documents.
In the Document Output Format field, specify the format in which to commit image documents. See About Image Document Output Formats and Configuring PDF Searchable Document Output.
In the Attachment Document Output Format field, specify the format in which to commit image attachments. See About Image Attachment Output Formats and Configuring PDF Searchable Attachment Output.
In the Error Handling Policy field, specify what happens if an error is encountered in one or more documents in the batch being committed. See About Commit Error Handling.
In the Restrict Commit to Document Profiles field, select one or more document profiles to restrict the commit profile to only documents assigned those document profiles. See Restricting a Commit Profile Based on Document Profile.
Complete the date format, locale, and encoding settings. The Default Date Format field is enabled for the Text File and Oracle Documents Cloud Service commit drivers. The Default Locale field is enabled for the Text File, WebCenter Content, and Oracle Documents Cloud Service commit drivers. The Encoding field is enabled for the Text File commit driver only.
On the Commit Driver Settings train stop, complete driver-specific settings.
For text file, see Configuring a Text File Commit Profile.
For WebCenter Content, see Configuring a WebCenter Content Commit Profile.
For WebCenter Content Imaging, see Configuring a WebCenter Content Imaging Direct Commit Profile.
For Oracle Documents Cloud Service, see Configuring an Oracle Documents Cloud Service Commit Profile.
On the Document Output Settings train stop, specify settings for searchable PDF output such as optional text file and OCR options. See Configuring PDF Searchable Document Output.
This train stop is available only if you selected PDF Searchable in the Document Output Format field on the General Settings train stop.
On the Attachment Output Settings train stop, specify settings for searchable PDF output such as optional text file and OCR options. See Configuring PDF Searchable Attachment Output. This train stop is available only if you selected PDF Searchable in the Attachment Document Output Format field on the General Settings train stop.
Click Submit to save the commit profile.
Configure how batches flow to commit processing. See Configuring Batch Flow to the Commit Processor.
Activate the commit profile and specify execution order.
All the online commit profiles are processed in the order in which they are listed on the Commit tab. For more information, see Activating and Ordering Commit Profiles.
Test the commit profile by committing a batch (for example, from the client). Search for and view the document in the repository or location specified in the commit profile. See Example: Viewing Batch Commit Results.
Deleting a commit profile makes it unavailable to batches for which commit processing is set as a post-processing step. You may want to set a commit profile to Offline for a time before deleting it, so that you can resolve any unexpected issues that its removal causes.
To delete a commit profile:
Commit profiles apply to all batches committed from the associated workspace. When you specify Commit Processor as a post-processing step in a client profile or other processor job, all online commit profiles begin processing in the order in which they are listed on the Commit tab. You can limit their processing in several ways:
You can change a commit profile to Offline, as described below, which temporarily deactivates it.
In the commit profile, you can restrict committing to documents assigned to one of the selected document profiles. This enables you to commit different types of documents into separate content management systems based on the document's assigned document profile. See Restricting a Commit Profile Based on Document Profile.
To activate and order commit profiles:
As described in How Commit Profiles are Applied During Commit Processing, when a batch reaches commit processing as a final post-processing step, the Commit Processor runs the batch through all online commit profiles as ordered on the Commit tab. However, you can prevent a commit profile from processing a document by restricting the commit profile to processing only documents assigned to specified document profiles. For example, you might use this method to configure one commit profile to commit Purchase Order documents to WebCenter Content Imaging and another commit profile to commit Customer Agreement documents to Content Server.
To restrict a commit profile's execution based on document profile:
A text file commit profile creates a delimited text file that contains the full path to each document file, followed by document metadata and the full path to each attachment file. The document files along with attachments are extracted from the batch, formatted, and output to a specified folder.
You can configure text file commit driver settings. For example, you can specify where files are written, which values are written and delimited in the text file, and how files are named.
To configure text file commit driver settings:
In a selected workspace, create a commit profile. See Adding, Copying, or Editing a Commit Profile.
Select settings on the General Settings train stop.
Select Text File in the Commit Driver field.
In the Default Date Format, Default Locale, and Encoding fields, complete date format, locale, and encoding settings to be used in text files.
Select the Commit Driver Settings train stop.
Because you selected Text File as the commit driver, this train stop displays settings specific to text file commits.
On the Text File Folder tab, specify how to write commit text files.
Optionally, select the Do not create Commit Text File option in the Commit Text File Create Option field if you do not want to create a commit text file. If you select this option, all other fields on this tab are disabled.
In the Commit Text File Folder field, enter a location to which to write the commit text files. Use a fully qualified folder path relative to the Capture server's operating system.
Optionally, select the Store in subfolders field and its related field to store commit text files in subfolders named based on:
- Year
- Year and Month
- Year, Month, and Day
In the File Prefix and File Extension fields, specify an optional prefix and extension.
On the Document Folder tab, specify how to write the document files.
In the Document Folder field, enter a location to which to write the document files. Use a fully qualified folder path relative to the Capture server's operating system.
Optionally, select the Create a folder per committed batch field to create a folder for each committed batch, where folders are named using the syntax BatchID.WorkspaceID.
Optionally, under Subfolder Options, select Store in subfolders to store document files in subfolders, then specify how to name the subfolders. You can name subfolders based on:
- Year
- Year and Month
- Year, Month, and Day
You can also name subfolders based on one or more metadata field values. Select Metadata Field(s), click Configure, and select and order one or more metadata fields; the Subfolder Path field displays the structure of the subfolder path using the specified metadata fields.
Optionally, in the Attachment Option field, select the Exclude Attachments option to exclude attachments from the commit. If this option is selected, the document’s attachments are not placed into its corresponding attachment folder when committed, the document’s attachment folder is not created, and no attachment record is written to the text file.
In the If folder name consists of invalid characters field, specify how Capture should handle any invalid characters found in folder names (remove the characters or cancel the document's commit).
On the Formatting tab, specify the metadata fields and delimiters to use in the commit text file.
In the Field Delimiter field, specify the character to use to separate fields. If you choose Other, enter a character in the Other Character field that becomes available.
Optionally, in the Text Qualifier field, specify the character used to mark the beginning and end of text fields. The default value for this field is None.
From the Available Fields list, select metadata fields to include in the commit text file and move them to the Selected Fields list. For example, you might include the two document file fields <File Name (full path)> and <Document ID> (<Document ID> is a system-generated field that represents a GUID for the document). Reorder the metadata fields as needed. The order of the fields determines the order in which the metadata fields appear in a document record.
For each attachment in a document, an attachment record will be created immediately following the document record. The attachment record will be in the following format:
@Attachment<Delimiter><Attachment Type><Delimiter><Attachment File Name>
Attachment Type
is the name of the corresponding attachment type.
Delimiter
is the field delimiter specified on the Formatting tab.
Attachment File Name
is the relative path and file name to the attachment from the perspective of the text file.
On the Document File Naming tab, specify how to name the document files.
Optionally, select the Name document file based on Metadata field values field to name the document file based on one or more selected metadata field values. If this field is not selected, Capture names the document files using the default naming scheme that includes the internal batch ID, an underscore, and a numeric identifier. From the Available Fields list, select metadata fields to include and move them to the Selected Fields list.
Order the selected fields. The order of the fields affects the naming of the document file.
In the Field Delimiter field, specify the field delimiter to use between each of the selected metadata field values.
In the If File Name Consists of Invalid Characters field, specify how Capture should handle any invalid characters found in document file names.
In the Items Linked to Multiple Pages field, select the Create a copy for each page option to output a separate copy of a batch item for each page linked to it. For example, when pages are duplicated in the Capture client, only one batch item physically exists, but multiple pages reference it. During commit, you can specify whether only one file is output for the batch item or a copy of the batch item is output for each page linked to it.
Click Submit to save the commit profile.
Test the commit profile.
After committing a batch, locate the commit folder and view its contents, including text file and document files.
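To make the Document Folder and Formatting settings concrete, the following Python sketch builds a document file path and the lines of a commit text file. The helper names are hypothetical and the paths assume a POSIX Capture server. Only the BatchID.WorkspaceID folder name, the ordered metadata subfolders, the Exclude Attachments behavior, and the `@Attachment<Delimiter><Attachment Type><Delimiter><Attachment File Name>` record layout come from the settings described above; applying the text qualifier to every document field value (but not to attachment records) is an assumption.

```python
from pathlib import PurePosixPath
from typing import Dict, List, Optional, Sequence


def document_path(
    document_folder: str,
    file_name: str,
    batch_id: Optional[int] = None,
    workspace_id: Optional[int] = None,
    subfolder_values: Sequence[str] = (),
) -> PurePosixPath:
    """Build the output path of one document file (hypothetical helper)."""
    path = PurePosixPath(document_folder)
    if batch_id is not None:
        path /= f"{batch_id}.{workspace_id}"   # "Create a folder per committed batch"
    for value in subfolder_values:
        path /= value                          # metadata subfolders, in configured order
    return path / file_name


def commit_text_lines(
    documents: List[Dict],
    selected_fields: List[str],
    delimiter: str = ",",
    qualifier: str = "",
    exclude_attachments: bool = False,
) -> List[str]:
    """Return one document record per document, each followed by its attachment records."""
    lines = []
    for doc in documents:
        values = [str(doc["fields"].get(name, "")) for name in selected_fields]
        lines.append(delimiter.join(f"{qualifier}{v}{qualifier}" for v in values))
        if exclude_attachments:
            continue                           # no attachment records are written
        for attachment_type, relative_path in doc.get("attachments", []):
            lines.append(delimiter.join(["@Attachment", attachment_type, relative_path]))
    return lines
```

With a `|` delimiter and a `"` text qualifier, a document with one attachment produces a quoted document record followed immediately by its `@Attachment|...` record; the file and attachment names in such an example are illustrative only.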
Use the WebCenter Content commit driver to commit documents from Capture to Content Server. This commit driver uses the Content Server RIDC API.
Configuring the driver settings involves the following main steps. Detailed steps are provided in Configuring WebCenter Content Commit Driver Settings.
Figure 9-3 Content Commit Driver Settings
To create a WebCenter Content commit profile, you need login credentials for a WebCenter Content Server instance. To configure WebCenter Content commit driver settings:
In a selected workspace, create a commit profile. See Adding, Copying, or Editing a Commit Profile.
On the General Settings train stop, select WebCenter Content in the Commit Driver field.
Select the Commit Driver Settings train stop.
Because you selected WebCenter Content as the commit driver, this train stop displays settings specific to WebCenter Content commits.
On the Login tab, enter a user name, password, and server URL to log in to the WebCenter Content Server instance during commit. This user must have permission to check in documents to WebCenter Content. The user name and password are case sensitive. Use the following format for the Server URL:
http://hostname:port/cs/idcplg
Click Login. Upon successful login, the status changes to Connected, and the remaining tabs become available.
On the Check-In tab, specify the document title, type, security group, and account in WebCenter Content.
For information about document titles, see Titling and Naming WebCenter Content Document Files.
For information about specifying type, security group, and account metadata values, see Assigning Metadata During a Content Server Commit.
On the Field Mappings tab, map Capture metadata fields to Content Server fields. As documents are committed using this driver, the document's metadata field values are written into the specified Content Server fields. To map a Content Server field, select one and click the Edit button. In the window that displays, select a Capture field or one of the predefined system level fields.
When Assign Values Dynamically and By Field Mappings are selected on the Check-In tab, the following additional fields are displayed in the Content Server Field column on the Field Mappings tab: <Account>, <Content Type>, and <Security Group>. Mapping these fields allows metadata values to be dynamically assigned at commit, based on these Capture field values.
For information about mapping to custom Content Server fields, see Mapping to Custom Content Server Fields.
On the Options tab, specify settings to customize the document check-in process, such as naming document files and providing an alternative check-in service. See Titling and Naming WebCenter Content Document Files. On this tab, you can also optionally specify to exclude attachments.
Note:
Capture uses the Alternative Check-In Service field to override the default service for the document check-in process.
If you want to use fast check-in and the configuration setting DirectReleaseNewCheckinDoc is set to true (DirectReleaseNewCheckinDoc=true) in the Content Server, you need to use the CHECKIN_NEW service in the Capture commit profile.
By default, Capture uses CHECKIN_UNIVERSAL, which does not perform a fast check-in. CHECKIN_NEW recognizes the DirectReleaseNewCheckinDoc=true setting. To perform the check-in with the CHECKIN_NEW service, enter CHECKIN_NEW in the Alternative Check-In Service field.
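The Server URL format on the Login tab and the check-in service choice described in the note can be expressed as a small configuration check. This sketch is hypothetical: it does not call Content Server, it accepts `https` in addition to the documented `http` scheme as an assumption, and `server_config` is a plain dictionary standing in for the Content Server configuration settings.

```python
from typing import Dict
from urllib.parse import urlsplit

DEFAULT_CHECKIN_SERVICE = "CHECKIN_UNIVERSAL"


def validate_server_url(url: str) -> None:
    """Check a URL against the http://hostname:port/cs/idcplg pattern."""
    parts = urlsplit(url)
    # Accepting https is an assumption; the documented format shows http.
    if parts.scheme not in ("http", "https") or not parts.hostname or parts.port is None:
        raise ValueError("expected http://hostname:port/cs/idcplg")
    if not parts.path.endswith("/idcplg"):
        raise ValueError("expected a path such as /cs/idcplg")


def alternative_service_for(server_config: Dict[str, str]) -> str:
    """Suggest a value for the Alternative Check-In Service field."""
    if server_config.get("DirectReleaseNewCheckinDoc", "").lower() == "true":
        return "CHECKIN_NEW"   # recognizes DirectReleaseNewCheckinDoc=true (fast check-in)
    return ""                  # leave blank to keep the default service


def effective_service(alternative_service: str = "") -> str:
    """The Alternative Check-In Service field overrides the default service."""
    return alternative_service.strip() or DEFAULT_CHECKIN_SERVICE
```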
When committing files to Content Server, you can specify a document title and a document file name for them using a similar process.
To specify document titles or document file names:
While creating a WebCenter Content commit profile (see Configuring a WebCenter Content Commit Profile), display the title or file name builder window:
For document titles, click the Configure button next to Document Title on the Check-In tab.
For document file names, click the Configure button on the Options tab.
The title or name builder window is displayed, as shown in Figure 9-4.
Construct a document title or file name.
From the Available Fields list, select a Capture field to add it to the title or name (multiple Capture fields can be selected).
Use the Document Title or Document File Name list to build the title or name. You can include the following elements:
Literal characters: Enter alphanumeric characters
Capture metadata fields
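A title or file name built this way is the concatenation of literal characters and Capture metadata field values, in the order listed. The sketch below models that with two hypothetical element types, `Literal` and `Field`; treating a field that has no value as an empty string is an assumption, not documented behavior.

```python
from dataclasses import dataclass
from typing import Dict, Sequence, Union


@dataclass(frozen=True)
class Literal:
    text: str    # literal alphanumeric characters typed into the builder


@dataclass(frozen=True)
class Field:
    name: str    # a Capture metadata field selected from Available Fields


def build_name(elements: Sequence[Union[Literal, Field]], metadata: Dict[str, str]) -> str:
    """Concatenate literals and field values in the order they are listed."""
    parts = []
    for element in elements:
        if isinstance(element, Literal):
            parts.append(element.text)
        else:
            # Assumption: a field with no value contributes an empty string.
            parts.append(str(metadata.get(element.name, "")))
    return "".join(parts)
```

For instance, the literal `INV` followed by an InvoiceNumber field whose value is 1001 yields the title `INV1001`.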
You can assign values to the Content Server Type, Security Group, and Account fields:
When you assign values using this method, documents committed using the profile are assigned the metadata values selected from those available for the specified Content Server.
When you assign values using this method, documents committed using the profile are assigned metadata values based on Capture field values.
When you assign values using this method, a combination of metadata values is assigned based on a user's choice list selection. For example, if a user selects a value of Invoice from a choice list metadata field, the document might be assigned a type of ACCTG, a security group of Finance, and no account value (<No Account> selected).
This configuration requires either a user defined choice list (Managing User Defined Choice Lists) or a database choice list (Managing Database Choice Lists).
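The three ways of assigning Type, Security Group, and Account (static values, values mapped from Capture fields, and values looked up from a choice list) can be compared side by side in the following hypothetical sketch. `CheckInValues`, the choice list contents, and the Capture field names are invented for illustration; the Invoice entry mirrors the example above, with `None` standing for <No Account>, and raising `KeyError` for an unmapped selection is an assumption about how such a problem might surface.

```python
from typing import Dict, NamedTuple, Optional


class CheckInValues(NamedTuple):
    doc_type: str
    security_group: str
    account: Optional[str]   # None stands for <No Account>


def assign_static(values: CheckInValues) -> CheckInValues:
    """Static values: every document gets the values chosen in the profile."""
    return values


def assign_by_field_mappings(metadata: Dict[str, str], mapping: Dict[str, str]) -> CheckInValues:
    """Mapped values: <Content Type>, <Security Group>, <Account> come from Capture fields."""
    return CheckInValues(
        metadata[mapping["<Content Type>"]],
        metadata[mapping["<Security Group>"]],
        metadata.get(mapping["<Account>"]) or None,
    )


def assign_by_choice_list(selection: str, choice_list: Dict[str, CheckInValues]) -> CheckInValues:
    """Choice list values: one user selection maps to a combination of values."""
    if selection not in choice_list:
        raise KeyError(f"no Content Server values defined for {selection!r}")
    return choice_list[selection]


# Hypothetical choice list mirroring the Invoice example in the text.
CHOICE_LIST = {"Invoice": CheckInValues("ACCTG", "Finance", None)}
```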
In addition to mapping Capture metadata fields to Content Server fields on the Field Mappings tab, as described in Configuring WebCenter Content Commit Driver Settings, you can map Capture fields to custom Content Server fields. You might do this, for example, when you have custom code running on the Content Server that specifically references these fields. (Without custom code, these custom fields are ignored.)
To add, edit, or delete custom Content Server fields for metadata mapping:
Use the WebCenter Content Imaging commit driver to commit documents from Capture to WebCenter Content Imaging using one of these methods:
Direct Commit, where Capture connects to a WebCenter Content Imaging instance and commits documents directly into it, as described in this section.
Input Agent Commit, where Capture creates input files from Capture batches, which are then uploaded in bulk to WebCenter Content Imaging by its Input Agent service. See Configuring a WebCenter Content Imaging Input Agent Commit Profile.
Note:
To support multiple configurations, create and activate multiple WebCenter Content Imaging commit profiles. For example, you might create a profile to commit documents directly and another to commit them via Input Agent.
With a direct commit profile, Capture logs into WebCenter Content Imaging and commits documents directly using WebCenter Content Imaging's web services. Creating this type of WebCenter Content Imaging commit profile involves the following configuration. Detailed steps are provided in Configuring Direct Commit Settings.
Figure 9-5 WebCenter Content Imaging Commit Driver Settings (Direct Commit)
You can set the commit profile to search for a matching document during direct commits. When searching, Capture compares metadata values for the document being committed to search parameters specified in the selected WebCenter Content Imaging search definition.
When searching and appending:
If Capture finds a match to an existing document, it appends the new document to the matched document.
If Capture does not find a match, a new document is created.
If Capture finds multiple matching documents, the document commit stops.
Important Points About Searching and Appending
The Searching and appending option is available only for documents output in multipage TIFF format. (On the commit profile's General Settings train stop, the Document Output Format field must be set to TIFF Multi-Page.)
Because the performance of searches affects the performance of the commit process, searches should be carefully defined and tuned.
The WebCenter Content Imaging search definition is integral to matching and must reflect the client profile settings in the workspace. For example, if a client profile contains five metadata fields but the search definition contains only three, searches are less specific, and documents that Capture treats as separate could be appended to the same document in WebCenter Content Imaging. In addition, the search definition must use AND conditions, not OR conditions.
Depending on the database used for the WebCenter Content Imaging Server, the search may be case sensitive, meaning that if metadata values are identical but their case differs, no matching occurs.
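The matching rules above (append on exactly one match, create a new document on no match, stop on multiple matches, AND-only criteria, and possibly case-sensitive comparison) can be sketched locally as follows. This does not call WebCenter Content Imaging: `search_and_append`, the in-memory list of existing documents, and the `case_sensitive` flag are hypothetical stand-ins for the search definition and the Imaging database's behavior.

```python
from enum import Enum
from typing import Dict, List, Optional, Sequence, Tuple


class Outcome(Enum):
    APPEND = "append to the matched document"
    CREATE = "create a new document"
    STOP = "stop the document commit"


def search(existing: List[Dict[str, str]], criteria: Dict[str, str], case_sensitive: bool) -> List[Dict[str, str]]:
    """AND search: every criterion must match (OR conditions are not supported)."""
    def norm(value: str) -> str:
        return value if case_sensitive else value.casefold()
    return [
        doc for doc in existing
        if all(norm(doc.get(k, "")) == norm(v) for k, v in criteria.items())
    ]


def search_and_append(
    existing: List[Dict[str, str]],
    metadata: Dict[str, str],
    search_fields: Sequence[str],
    output_format: str,
    case_sensitive: bool = True,
) -> Tuple[Outcome, Optional[Dict[str, str]]]:
    if output_format != "TIFF Multi-Page":
        raise ValueError("searching and appending requires TIFF Multi-Page output")
    criteria = {name: metadata[name] for name in search_fields}
    matches = search(existing, criteria, case_sensitive)
    if len(matches) == 1:
        return Outcome.APPEND, matches[0]
    if not matches:
        return Outcome.CREATE, None
    return Outcome.STOP, None   # multiple matches: do not guess
```

Searching on Vendor alone, when two existing documents share that vendor, matches both and stops the commit, which is the risk the text describes when a search definition has fewer fields than the client profile.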
Use the WebCenter Content Imaging commit driver to commit documents from Capture to WebCenter Content Imaging using one of these methods:
Direct Commit, where Capture connects to a WebCenter Content Imaging instance and commits documents directly into it. See Configuring a WebCenter Content Imaging Direct Commit Profile.
Input Agent Commit, where Capture creates input files from Capture batches, which are then uploaded in bulk to WebCenter Content Imaging by its Input Agent service, as described in this section.
Note:
To support multiple configurations, create and activate multiple WebCenter Content Imaging commit profiles. For example, you might create a profile to commit documents directly and another to commit them via Input Agent.
During an Input Agent commit, Capture writes a delimited input file for the batch, along with its images, to a specified Capture output directory. The input file lists document images along with their associated metadata values for bulk uploading into WebCenter Content Imaging. The WebCenter Content Imaging Input Agent service monitors a specified Input Agent input directory and upon finding input files that match its input mask, uploads the files' referenced documents in bulk into WebCenter Content Imaging.
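The Capture side of this exchange (write a delimited input file, then let the Input Agent pick up files that match its input mask) can be sketched in Python. The column order (image path first, then metadata values), the `|` delimiter, and the `*.txt` mask used in the check below are assumptions; the real layout and mask come from the input definition in WebCenter Content Imaging.

```python
import csv
import fnmatch
import io
from typing import Dict, Iterable, List


def write_input_file(documents: List[Dict], field_names: List[str], delimiter: str = "|") -> str:
    """Return the text of a delimited input file: image path, then metadata values."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    for doc in documents:
        writer.writerow([doc["image"]] + [doc["metadata"].get(name, "") for name in field_names])
    return buffer.getvalue()


def files_matching_mask(file_names: Iterable[str], input_mask: str) -> List[str]:
    """Names an Input Agent watching the directory would pick up (case-sensitive match)."""
    return sorted(name for name in file_names if fnmatch.fnmatchcase(name, input_mask))
```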
Creating this type of WebCenter Content Imaging commit profile involves the following configuration. Detailed steps are provided in Configuring Input Agent Settings.
Configuring an Input Agent commit requires an input definition created in WebCenter Content Imaging's Manage Inputs area.
Creating an input definition in WebCenter Content Imaging includes these steps:
Follow these steps to configure a commit profile that writes input files from Capture to WebCenter Content Imaging for processing by a WebCenter Content Imaging Input Agent. Also see About Input Agent Committing.
Note:
A WebCenter Content Imaging input definition is required for configuring an Input Agent commit.
Figure 9-6 WebCenter Content Imaging Commit Driver Settings (Input Agent Commit)
Use the Oracle Documents Cloud Service commit driver to commit documents from Capture to Oracle Documents Cloud Service (DOCS). See Oracle Cloud Administering Oracle Documents Cloud Service in the Oracle Cloud documentation set.
Configuring the driver settings involves the following main steps. Detailed steps are provided in Configuring Oracle Documents Cloud Service Commit Driver Settings.
Create a commit profile and select Oracle Documents Cloud Service as the commit driver.
Connect to the DOCS instance after entering login information.
Specify where to upload the documents in DOCS and how to name the files. Create metadata collections.
To configure Oracle Documents Cloud Service commit driver settings:
In a selected workspace, create a commit profile. See Adding, Copying, or Editing a Commit Profile.
On the General Settings train stop, select Oracle Documents Cloud Service in the Commit Driver field.
Select the Commit Driver Settings train stop.
Because you selected Oracle Documents Cloud Service as the commit driver, this train stop displays settings specific to Oracle Documents Cloud Service commits.
On the Login tab, enter a user name, password, and server URL to log in to the DOCS server instance during document commits.
Click Login. Upon successful login, the status changes to Connected, and the remaining tabs become available.
On the Document Folder tab, specify where to commit the documents:
In the Parent Folder section, click Select Folder... to display the Select Parent Folder window. In the Select Parent Folder window, if there is no parent folder specified for the commit profile, the user's home folder is displayed; otherwise, the selected parent folder is displayed. Select a folder to which to commit all the documents and click OK. The name of the selected parent folder is displayed in the Name field and the corresponding unique ID of the parent folder is displayed in the ID field. Optionally, click Clear Folder to clear the current selection and select a new DOCS folder.
Optionally, in the Subfolder Creation section, select Create Subfolders using Field values option to store document files in subfolders created dynamically within the parent folder and named using metadata field values. From the Available Fields list, select metadata fields to include and move them to the Selected Fields list. Each metadata field represents a subfolder and the order of the metadata fields represents the subfolder hierarchy.
In the If folder name consists of invalid characters field, specify how Capture handles invalid characters found in a subfolder name by selecting either the Remove invalid characters option or the Cancel document commit option.
Note:
When creating subfolders in DOCS, detection of existing folder names is case insensitive.
If a metadata field value used to create a subfolder is blank, and there are subsequent non-blank subfolders, the document commit is aborted with an error message indicating that the subfolder path is invalid.
If a metadata field value used to create a subfolder is blank and there are no subsequent non-blank subfolders, the document is stored in the parent folder of the first blank subfolder.
See the following examples:
If the metadata fields are CustName="Corp 1", CorrespondenceType="AP", and OrderNumber=NULL, the document is stored in <Parent Folder>\Corp 1\AP.
If the metadata fields are CustName="Corp 1", CorrespondenceType=NULL, and OrderNumber=NULL, the document is stored in <Parent Folder>\Corp 1.
If the metadata fields are CustName=NULL, CorrespondenceType=NULL, and OrderNumber=NULL, the document is stored directly in the parent folder (<Parent Folder>\).
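The subfolder rules above can be sketched as follows. This is an illustrative model of the documented behavior, not Capture's implementation; the function and exception names are hypothetical, and treating both None and an empty string as "blank" is an assumption.

```python
from typing import List, Optional, Sequence


class SubfolderPathError(ValueError):
    """Hypothetical: raised when a blank field value is followed by a non-blank one."""


def resolve_subfolders(values: Sequence[Optional[str]]) -> List[str]:
    """Return subfolder names below the parent folder, top level first.

    `values` are the metadata field values in Selected Fields order.
    None or "" is treated as blank (an assumption of this sketch).
    """
    path: List[str] = []
    seen_blank = False
    for value in values:
        if not value:
            seen_blank = True
        elif seen_blank:
            # A non-blank level beneath a blank one has no valid parent: abort.
            raise SubfolderPathError("subfolder path is invalid")
        else:
            path.append(value)
    return path


def find_existing_folder(name: str, existing: Sequence[str]) -> Optional[str]:
    """Match an existing folder name case-insensitively, as the note describes for DOCS."""
    folded = name.casefold()
    return next((e for e in existing if e.casefold() == folded), None)
```

For example, `resolve_subfolders(["Corp 1", "AP", None])` returns `["Corp 1", "AP"]`, matching the first example, while `["Corp 1", None, "1001"]` raises an error because a non-blank level follows a blank one.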
On the Document File Naming tab, specify how to name the document files and document attachment files.
Optionally, select the Use original file name for non-image files field to name non-image files using their original file names.
Optionally, select the Name document file based on Metadata field values field to name the file based on one or more selected metadata field values. If this field is not selected, Capture names the files using the default naming scheme that includes the internal batch ID, an underscore, and a numeric identifier. From the Available Fields list, select metadata fields to include and move them to the Selected Fields list.
Order the metadata fields in the Selected Fields list. The order of the fields affects the naming of the document file.
In the Field Delimiter field, specify the field delimiter to use between metadata field values.
In the If File Name Consists of Invalid Characters field, specify how Capture handles invalid characters found in document file names by selecting either the Remove invalid characters option or the Cancel document commit option.
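The file naming options above can be sketched as follows. This is a hypothetical model, not Capture's code: this guide does not list which characters DOCS rejects in file names, so `INVALID_CHARS` is a placeholder. The default delimiter, the file extension, and the exact format of the numeric identifier are also assumptions.

```python
from typing import Optional, Sequence

# Hypothetical placeholder: the characters DOCS actually rejects are not listed here.
INVALID_CHARS = set('\\/:*?"<>|')


class CommitCancelled(Exception):
    """Hypothetical: raised when Cancel document commit is selected and a name is invalid."""


def document_file_name(
    batch_id: int,
    sequence: int,
    field_values: Optional[Sequence[str]] = None,
    delimiter: str = "_",
    remove_invalid: bool = True,
    extension: str = ".pdf",
) -> str:
    """Build a document file name from metadata values or the default scheme."""
    if field_values:
        # Name document file based on Metadata field values: join the values
        # in Selected Fields order using the Field Delimiter.
        stem = delimiter.join(field_values)
    else:
        # Default scheme: internal batch ID, an underscore, a numeric identifier.
        stem = f"{batch_id}_{sequence}"
    if any(c in INVALID_CHARS for c in stem):
        if not remove_invalid:
            raise CommitCancelled(f"invalid characters in file name {stem!r}")
        stem = "".join(c for c in stem if c not in INVALID_CHARS)
    return stem + extension
```

With a hyphen delimiter and the Remove invalid characters option, the values `Corp 1`, `AP`, and `INV/77` produce `Corp 1-AP-INV77.pdf`. With Cancel document commit, the same values stop the commit instead.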
On the Metadata tab, select the Enable Metadata Collection Support field to create metadata collections and, optionally, to include selected system fields in them.
From the System Fields list, select system fields to include when creating metadata collections. The System Fields list is enabled for selection only if the Enable Metadata Collection Support field is selected.
When a document is committed, a metadata collection is automatically created if it does not already exist. The metadata collection is named <Workspace Name>-<Document Profile>. If a document is not assigned a document profile, the metadata collection is named <Workspace Name>-Default.
All the document profile fields are added to the metadata collection.
System fields are named in the following format: Capture!<System Field>. For example, Capture!<Batch_ID>.
Note:
If a system field name contains a /, <, or > character, the character is removed.
If a system field name contains a : (colon), the colon is replaced with a - (hyphen). For example, System Date:Year becomes System_Date-Year (the space is replaced with an underscore, as described in the next item).
If a metadata collection name or system field name contains an invalid DOCS character other than a space, the commit process is aborted and an error message is displayed. To correct this issue, a Capture administrator must update the document profile, the system field name, or both, and then recommit the batch. If a metadata collection name or system field name contains a space, the space is replaced with an _ (underscore).
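The naming and substitution rules above can be sketched as follows. This is an illustration, not Capture's implementation. The guide does not list which characters DOCS treats as invalid, so `DOCS_INVALID` is a hypothetical placeholder. Applying the `Capture!` prefix after the substitutions is also an assumption.

```python
from typing import Optional

# Hypothetical placeholder: the invalid DOCS characters are not listed in this guide.
DOCS_INVALID = set('\\*?"|')


class CommitAborted(Exception):
    """Hypothetical: raised when a name still contains an invalid DOCS character."""


def _finalize(name: str) -> str:
    """Abort on invalid characters, then replace spaces with underscores."""
    bad = sorted({c for c in name if c in DOCS_INVALID})
    if bad:
        raise CommitAborted(f"{name!r} contains invalid characters {bad}")
    return name.replace(" ", "_")


def collection_name(workspace: str, document_profile: Optional[str]) -> str:
    """<Workspace Name>-<Document Profile>, or <Workspace Name>-Default."""
    return _finalize(f"{workspace}-{document_profile or 'Default'}")


def system_field_name(system_field: str) -> str:
    """Capture!<System Field> after the documented character substitutions."""
    cleaned = "".join(c for c in system_field if c not in "/<>")  # /, <, > removed
    cleaned = cleaned.replace(":", "-")  # colon becomes hyphen
    return "Capture!" + _finalize(cleaned)
```

Under these assumptions, `system_field_name("System Date:Year")` yields `Capture!System_Date-Year`, and a workspace named `AP Invoices` with no document profile maps to the collection `AP_Invoices-Default`.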
When a document is committed, the DOCS commit driver adds any field that does not already exist in the collection. Fields that have been deleted in Capture are ignored and are not removed from the collection. Likewise, system fields that were previously selected but are no longer selected are ignored and are not removed from the collection.
After a metadata collection is created, it is assigned to the parent folder of the document, all document profile field values are assigned to the metadata collection associated with the document, and system field values are assigned to that collection as applicable.
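The additive field behavior and the assignment steps above can be modeled as follows. This is a hypothetical sketch, not the DOCS commit driver's code or the DOCS API. `MetadataCollection`, the folder-to-collection dictionary, and the folder ID are illustrative stand-ins. Filtering values to fields present in the collection is a design choice of this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class MetadataCollection:
    """Hypothetical stand-in for a DOCS metadata collection."""
    name: str
    fields: List[str] = field(default_factory=list)


def sync_fields(collection: MetadataCollection, profile_fields: List[str],
                selected_system_fields: List[str]) -> List[str]:
    """Add fields missing from the collection; never remove any. Returns added names."""
    added = []
    for name in profile_fields + selected_system_fields:
        if name not in collection.fields:
            collection.fields.append(name)
            added.append(name)
    return added


def assign_metadata(collection: MetadataCollection,
                    folder_collections: Dict[str, Set[str]],
                    parent_folder_id: str,
                    values: Dict[str, str]) -> Dict[str, str]:
    """Assign the collection to the parent folder and return the values to set."""
    folder_collections.setdefault(parent_folder_id, set()).add(collection.name)
    # In this sketch, only values whose field exists in the collection are assigned.
    return {k: v for k, v in values.items() if k in collection.fields}
```

In this model, a field deleted in Capture (for example, `OldField`) stays in the collection after a later commit, while a newly added profile field is appended.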
On the Options tab, optionally use the Document Attachment Options field to specify whether and how to include document attachments:
Select the Exclude Attachments option if you want to exclude attachments from documents when committing documents to DOCS.
Select the Include Attachments in sub-folders per Attachment Type option to include attachments in subfolders based on attachment type when committing documents to DOCS.
Note:
For each attachment, a subfolder is created in the primary document's DOCS folder (if the subfolder does not already exist). The subfolder is named after the attachment type of the document attachment. If the attachment does not have an attachment type name, the subfolder is named Attachments.
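The attachment placement rule above amounts to the following. This is a hypothetical sketch that represents folder paths as lists of folder names. It does not call DOCS or create folders.

```python
from typing import List, Optional


def attachment_folder(document_folder: List[str],
                      attachment_type: Optional[str],
                      include_attachments: bool) -> Optional[List[str]]:
    """Return the folder path (as name segments) for an attachment.

    Returns None when the Exclude Attachments option is selected.
    """
    if not include_attachments:
        return None
    # Subfolder named after the attachment type, or "Attachments" if it has none.
    return document_folder + [attachment_type or "Attachments"]
```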
If you specified a document output format of PDF Searchable on the General Settings train stop, complete fields on the Document Output Settings train stop to configure additional settings related to this document format.
Figure 9-7 PDF Searchable Document Output Settings
In a selected workspace, add or edit a commit profile. See Adding, Copying, or Editing a Commit Profile.
Complete settings on the Document Output Settings train stop.
In the Optical Character Recognition settings:
Select the languages to use for the searchable PDF output. The English language is selected by default.
Note:
If Asian languages are selected for OCR, English is also recognized without having to be selected explicitly.
The following table lists the available languages:
Name | Alternate Name | Alphabet Classification |
---|---|---|
English | | Latin |
German | | Latin |
French | | Latin |
Dutch | | Latin |
Norwegian | | Latin |
Swedish | | Latin |
Finnish | | Latin |
Danish | | Latin |
Icelandic | | Latin |
Portuguese | | Latin |
Spanish | | Latin |
Catalan | Catalonian | Latin |
Galician | Gallegan | Latin |
Italian | | Latin |
Maltese | | Latin |
Greek | | Greek |
Polish | | Latin |
Czech | | Latin |
Slovak | | Latin |
Hungarian | | Latin |
Slovenian | | Latin |
Croatian | | Latin |
Romanian | Rumanian | Latin |
Albanian | | Latin |
Turkish | | Latin |
Estonian | | Latin |
Latvian | | Latin |
Lithuanian | | Latin |
Esperanto | | Latin |
Serbian(Latin) | Bosnian | Latin |
Serbian | | Cyrillic |
Macedonian | | Cyrillic |
Moldavian | | Cyrillic |
Bulgarian | | Cyrillic |
Byelorussian | Belarusian, Belarusan | Cyrillic |
Ukrainian | | Cyrillic |
Russian | | Cyrillic |
Chechen | | Cyrillic |
Kabardian | | Cyrillic |
Afrikaans | | Latin |
Aymara | | Latin |
Basque | | Latin |
Bemba | Ichibemba | Latin |
Blackfoot | Siksika | Latin |
Breton | | Latin |
Brazilian | | Latin |
Bugotu | Bughotu | Latin |
Chamorro | | Latin |
Tswana(Chuana) | Chuana, Setswana | Latin |
Corsican | | Latin |
Crow | | Latin |
Eskimo | Inuit | Latin |
Faroese | | Latin |
Fijian | | Latin |
Frisian | | Latin |
Friulian | | Latin |
Gaelic(Irish) | Irish | Latin |
Gaelic(Scottish) | Scottish | Latin |
Ganda(Luganda) | Luganda | Latin |
Guarani | | Latin |
Hani | | Latin |
Hawaiian | | Latin |
Ido | | Latin |
Indonesian | | Latin |
Interlingua | | Latin |
Kasub | Kashubian | Latin |
Kawa | Wa, Blang | Latin |
Kikuyu | Gikuyu | Latin |
Kongo | | Latin |
Kpelle | | Latin |
Kurdish | | Latin |
Latin | | Latin |
Luba | | Latin |
Luxembourgian | Letzeburgesch, Luxembourgeois | Latin |
Malagasy | | Latin |
Malay | | Latin |
Malinke | Maninkakan | Latin |
Maori | | Latin |
Mayan | | Latin |
Miao | Hmong | Latin |
Minankabaw | Minangkabau | Latin |
Mohawk | | Latin |
Nahuatl | | Latin |
Nyanja | Chewa, Chichewa | Latin |
Occidental | | Latin |
Ojibway | Ojibwa | Latin |
Papiamento | | Latin |
PidginEnglish | Tok Pisin | Latin |
Provencal | Occitan | Latin |
Quechua | | Latin |
Rhaetic | Romansh | Latin |
Romany | | Latin |
Ruanda | Rwanda, Kinyarwanda | Latin |
Rundi | | Latin |
Samoan | | Latin |
Sardinian | | Latin |
Shona | | Latin |
Sioux | Dakota | Latin |
Sami | | Latin |
Sami(Lule) | Lule Sami | Latin |
Sami(Northern) | Northern Sami | Latin |
Sami(Southern) | Southern Sami | Latin |
Somali | | Latin |
Sotho | Sesotho, Sutu | Latin |
Sundanese | | Latin |
Swahili | Kiswahili | Latin |
Swazi | Swati | Latin |
Tagalog | Filipino | Latin |
Tahitian | | Latin |
Tinpo | | Latin |
Tongan | | Latin |
Tun | Tunia | Latin |
Visayan | Cebuano | Latin |
Welsh | | Latin |
Sorbian(Wend) | Wend | Latin |
Wolof | | Latin |
Xhosa | | Latin |
Zapotec | | Latin |
Zulu | | Latin |
Japanese | | Asian |
Chinese(S) | Simplified Chinese | Asian |
Chinese(T) | Traditional Chinese | Asian |
Korean | | Asian |
Optionally, to optimize OCR for financial, medical, and legal documents, select the professional dictionaries to use for the searchable PDF output from the list of available dictionaries.
Note:
Although the Asian languages support English recognition, the professional dictionaries are not used when an Asian language is selected for OCR.
Select the Single Language Detection per Page field to detect the predominant language on a page in the document. This option is particularly useful when the language for a page is unknown. This field is unchecked by default.
Note:
The Single Language Detection per Page field is enabled only when multiple languages are selected.
When this field is not selected, it is recommended to use fewer than five languages to improve recognition accuracy.
A configuration error message may be displayed in the following cases:
If you deselect this field when multiple Asian languages are selected for OCR.
If you select this field when an unsupported language (for example, Greek) has been selected for recognition.
If you select an unsupported language for OCR when this field is selected.
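The OCR language rules above can be summarized as a configuration check. This is a hypothetical sketch, not Capture's validation logic. The guide names Greek only as one example of a language unsupported for single-language detection, so `UNSUPPORTED_FOR_DETECTION` is an incomplete placeholder. The message wording is also illustrative.

```python
from typing import Iterable, List

ASIAN = {"Japanese", "Chinese(S)", "Chinese(T)", "Korean"}
# Hypothetical and incomplete: the guide cites Greek only as an example.
UNSUPPORTED_FOR_DETECTION = {"Greek"}


def check_ocr_settings(languages: Iterable[str],
                       single_language_detection: bool,
                       dictionaries: Iterable[str] = ()) -> List[str]:
    """Return error/warning messages for a searchable PDF OCR configuration."""
    langs = set(languages)
    msgs: List[str] = []
    if single_language_detection:
        if len(langs) < 2:
            msgs.append("error: detection per page requires multiple languages")
        if langs & UNSUPPORTED_FOR_DETECTION:
            msgs.append("error: a selected language is unsupported for detection")
    else:
        if len(langs & ASIAN) > 1:
            msgs.append("error: multiple Asian languages require detection per page")
        if len(langs) >= 5:
            msgs.append("warning: use fewer than five languages for accuracy")
    if list(dictionaries) and langs & ASIAN:
        msgs.append("warning: professional dictionaries are ignored with Asian languages")
    return msgs
```

For example, selecting Japanese and Korean without Single Language Detection per Page produces an error in this model, while the same pair with the field selected passes.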
In the Advanced PDF Settings:
Select the Color Image Quality for color images. You can choose: Minimum (minimum size), Good (medium size), or Best (maximum size).
In the Compatibility field, select the PDF version of the searchable PDF document. You can choose: PDF 1.4, PDF 1.5, PDF 1.6, PDF 1.7, PDF A/1b, PDF A/2b, or PDF A/2u.
Select the Create Linear PDF for Efficient Web Viewing field to create a linear PDF document. This option is enabled only when you select PDF 1.4, PDF 1.5, PDF 1.6, or PDF 1.7 in the Compatibility field. A linear PDF is organized so that a viewer can display the first page before the entire file has been downloaded, which enables efficient incremental access in a network environment. This field is unchecked by default.
Select the Preserve Original Image Orientation field to prevent the automatic reorientation of the image. This field is unchecked by default.
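The dependency between the Compatibility and Create Linear PDF for Efficient Web Viewing fields can be captured in a small settings model. This is a hypothetical sketch. The class and field names are illustrative. The defaults for color image quality and compatibility are assumptions, while the two checkbox defaults follow the text above.

```python
from dataclasses import dataclass

LINEARIZABLE = {"PDF 1.4", "PDF 1.5", "PDF 1.6", "PDF 1.7"}
COMPATIBILITY = LINEARIZABLE | {"PDF A/1b", "PDF A/2b", "PDF A/2u"}
COLOR_QUALITY = {"Minimum", "Good", "Best"}


@dataclass
class AdvancedPdfSettings:
    color_image_quality: str = "Good"   # hypothetical default
    compatibility: str = "PDF 1.7"      # hypothetical default
    linear_pdf: bool = False            # unchecked by default
    preserve_orientation: bool = False  # unchecked by default

    def validate(self) -> None:
        """Raise ValueError for combinations the settings page would not allow."""
        if self.color_image_quality not in COLOR_QUALITY:
            raise ValueError(f"unknown color image quality {self.color_image_quality!r}")
        if self.compatibility not in COMPATIBILITY:
            raise ValueError(f"unknown compatibility {self.compatibility!r}")
        if self.linear_pdf and self.compatibility not in LINEARIZABLE:
            raise ValueError("linear PDF requires PDF 1.4 to 1.7 compatibility")
```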
In the Text File Options settings:
Select the Create Text File field to create a text file along with the searchable PDF output file that contains all text captured during the OCR process.
In the Text File Code Page field, search for and select the character set to use for the OCR text file. The default value is UTF-8. The following table lists the available character sets:
Code Page Name | Description |
---|---|
Big5 | Traditional Chinese (supports Eten extension and HKscs non-standard mode) |
Code Page 437 | DOS Latin US |
Code Page 850 | DOS Latin 1 |
Code Page 852 | DOS Latin 2 |
Code Page 860 | DOS Portuguese |
Code Page 863 | DOS French-Canadian |
Code Page 865 | DOS Nordic |
Code Page 866 | DOS Cyrillic CIS |
CWI Magyar | DOS Hungarian |
EUC-CN | Simplified Chinese |
EUC-JP | Japanese |
EUC-TW | Traditional Chinese |
GB 18030 | Simplified Chinese |
GBK | Simplified Chinese |
Greek-ELOT | DOS Greek |
Greek-MEMOTEK | DOS Greek |
HKSCS-2004 | Traditional Chinese (HKscs standard mode, supports Eten extension) |
Icelandic | DOS Icelandic |
IVKAM C-S | Czech & Slovak |
Latin 1 | ISO 8859-1 |
Mac Central EU | PT 202 |
Mac INSO Latin 2 | MAC CE |
Mac Primus CE u | MAC CE |
Macintosh | Mac Western |
Magyar Ventura | DOS Hungarian |
Maltese | Malta; 7 bits used |
Mazowia Polish | DOS Polish |
OCR | Non Standard Win |
Roman 8 | For HP printers |
Shift_JIS | Japanese |
Sloven & Croat | 7 bits used |
Turkish | DOS Turkish |
UHC | Korean (extended EUC-KR) |
Unicode | Multilingual |
UTF-8 | Multilingual |
Windows ANSI | Code Page 1252 |
Windows Baltic | Code Page 1257 |
Windows Cyrillic | Code Page 1251 |
Windows Eastern | Code Page 1250 |
Windows Esperant | Non Standard Win |
Windows Greek | Code Page 1253 |
Windows Sami | Sami |
Windows Turkish | Code Page 1254 |
WordPerfect | Multilingual |
WordPerfect Old | Multilingual |
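The choice of Text File Code Page matters because a single-byte code page can represent only a limited character set. The following sketch illustrates this using Python's standard codecs. The mapping from Capture code page names to Python codec names is a hypothetical approximation, not Capture's conversion table. Capture's handling of unrepresentable characters is not documented here, so this sketch simply uses strict encoding.

```python
# Hypothetical mapping from a few Text File Code Page names to Python codecs.
CODEC_FOR_CODE_PAGE = {
    "UTF-8": "utf-8",
    "Latin 1": "latin-1",
    "Windows ANSI": "cp1252",
    "Windows Eastern": "cp1250",
    "Windows Cyrillic": "cp1251",
    "Code Page 850": "cp850",
    "Shift_JIS": "shift_jis",
    "Big5": "big5",
    "GB 18030": "gb18030",
}


def encode_ocr_text(text: str, code_page: str = "UTF-8") -> bytes:
    """Encode OCR text strictly; unrepresentable characters raise UnicodeEncodeError."""
    return text.encode(CODEC_FOR_CODE_PAGE[code_page], errors="strict")
```

For example, the Russian word "Привет" encodes to 12 bytes in UTF-8 and 6 bytes in Windows Cyrillic, but it cannot be encoded in Windows ANSI. UTF-8, the default, avoids this problem for multilingual OCR output.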
Click Submit to save the commit profile.
To commit batches, a client profile or batch processor job must be configured to flow to the Commit Processor, as described in How Commit Profiles are Applied During Commit Processing. You do this by setting the Commit Processor as the post-processing step in a client profile or other processor job. To configure batch flow from:
A client profile, see Configuring a Client Profile's Post-Processing.
An Import Processor job, see Configuring Post-Processing.
A Document Conversion Processor job, see Configuring Post-Processing and Monitoring.
A Recognition Processor job, see Configuring Post-Processing and Monitoring.