1 Capture Workspace Overview

Oracle WebCenter Capture provides scalable document capture focused on process-oriented imaging applications and image-enabling enterprise applications. With a web interface for centralized or distributed environments, Capture streamlines the capture process for paper and electronic documents. It is fully integrated with Oracle WebCenter Content: Imaging and Oracle WebCenter Content to provide organizations with one system to capture, store, manage, and retrieve their mission-critical business content.

This chapter provides a general introduction to Oracle WebCenter Capture and its administration. It includes the following sections:

1.1 About Oracle WebCenter Capture

Batches and documents are the primary drivers of work in WebCenter Capture. In Capture, documents are scanned or imported and maintained in batches. A batch consists of scanned images or electronic document files (such as PDF or Microsoft Office files) that are organized into documents and assigned metadata (index) values. Each document has a single set of metadata values that applies to all of its pages.

WebCenter Capture involves the following main processes:

1.1.1 Capture

Capture refers to the scanning or importing of documents into batches within a Capture workspace (see Section 1.3). Common document input scenarios include:

  • High volume scanning using a production document imaging scanner

  • Ad hoc remote scanning or import, such as from a business application

  • Automated import, such as from an email inbox

Capture provides these methods of document input:

  • Using the Capture client, end-users manually scan documents or import electronic document files into batches using client profiles created by workspace managers. See Section 1.2 and Section 4.1.

  • Using settings stored in an import job, the Import Processor automatically imports images and other electronic documents directly into Capture from email, network folders, or list files. See Section 5.1.

1.1.2 Conversion

Depending on the business scenario, non-image documents input into Capture may need to be converted to a different format. For example, an organization might convert PDF expense reports attached to imported email messages to an image format to allow their bar codes to be read. Capture's Document Conversion Processor automatically converts and merges documents within a batch using settings stored in a document conversion job. See Section 6.1.

1.1.3 Classification

Classification in Capture refers to separating a batch into its logical documents and assigning a set of metadata values to each document.

Classification can occur manually or automatically in Capture, in a variety of ways.

Document Separation

Document separation can occur in multiple ways:

  • Manually, by users in the Capture client. For example, users can select a client profile configured for a specific number of pages per document. They can also insert separator sheets between documents prior to scanning to identify a new document. While visually inspecting documents, client users can also add document separations as needed.

  • Manually, by users during file import in the Capture client.

  • Automatically, during import by the Import Processor, based on job settings.

  • Automatically, during bar code recognition by the Recognition Processor. When a batch is sent to the Recognition Processor, the processor automatically performs bar code recognition and document classification. See Section 7.1.
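
As a rough illustration of two of the separation rules above (a fixed number of pages per document, and separator sheets), the following Python sketch splits a sequence of scanned pages into documents. The function names and the SEPARATOR marker are hypothetical and are not part of any Capture API. In Capture itself, separation is driven by client profile and processor job settings. The sketch also assumes that separator sheets are discarded rather than kept in the document.

```python
from typing import List

SEPARATOR = "<separator-sheet>"  # hypothetical marker standing in for a separator sheet


def split_by_page_count(pages: List[str], pages_per_document: int) -> List[List[str]]:
    """Group pages into documents of a fixed size; the last document may be shorter."""
    if pages_per_document < 1:
        raise ValueError("pages_per_document must be at least 1")
    return [pages[i:i + pages_per_document]
            for i in range(0, len(pages), pages_per_document)]


def split_by_separator(pages: List[str]) -> List[List[str]]:
    """Start a new document at each separator sheet (assumption: the sheet is dropped)."""
    documents: List[List[str]] = []
    current: List[str] = []
    for page in pages:
        if page == SEPARATOR:
            if current:  # ignore leading or repeated separators
                documents.append(current)
            current = []
        else:
            current.append(page)
    if current:
        documents.append(current)
    return documents
```

For example, split_by_page_count(pages, 2) models a client profile configured for two pages per document, while split_by_separator(pages) models separator sheets inserted before scanning.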

Metadata Assignment

In Capture, documents are assigned a set of metadata values based on a document profile, which identifies the metadata fields available to index that type of document. Metadata values can be assigned:

  • Manually, by users in the metadata pane of the Capture client.

  • Automatically, during processing by the Import Processor, based on job settings.

  • Automatically, during processing by the Recognition Processor, based on job settings.

Metadata fields can take a variety of forms in Capture, including choice lists, dependent choice lists, database lookups, auto-populated fields, and fields with input masks and display formats. Workspace managers configure these metadata field definitions in the workspace and then use them in client profiles or processor jobs. See Section 3.4.

1.1.4 Release

Capture uses a lock and release system to ensure that only one user or processor job has access to any batch at a time, as described below.

The Capture client indicates each batch's lock state with one of the following batch icons:

  • Locked to User: A batch automatically becomes locked to a user when the user creates or opens (expands) it, and stays locked until the user releases it.

    When done working with batches, users release them. Releasing a batch automatically synchronizes its documents and metadata with the WebCenter Capture server and forwards the batch for post-processing, if post-processing is configured in its client profile.

  • Locked: If a user attempts to open a batch locked to another user, a message displays, identifying the workstation of the user who locked the batch.

  • Unlocked: If a released batch is not set for post-processing, the lock is removed and any user can open it.

  • Processing: If a released batch is set for post-processing (commit, recognition, or document conversion), its icon changes to processing.

  • Error: If a batch enters an error state, its lock is released. This allows the batch to be examined and locked by another processor or user. (Users can right-click an error icon to view details about the error.)
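
The lock and release rules above can be modeled as a small state machine. The following Python sketch is only a toy model for reasoning about the states. The class, method, and state names are hypothetical, and it does not reflect how Capture implements locking internally.

```python
from enum import Enum
from typing import Optional


class BatchState(Enum):
    UNLOCKED = "Unlocked"
    LOCKED = "Locked"          # the owner sees "Locked to User"; others see "Locked"
    PROCESSING = "Processing"
    ERROR = "Error"


class Batch:
    """Toy model of the lock and release rules; not a Capture API."""

    def __init__(self, name: str, post_processing: Optional[str] = None):
        self.name = name
        self.post_processing = post_processing  # e.g. "commit", or None if not configured
        self.state = BatchState.UNLOCKED
        self.locked_by: Optional[str] = None

    def open(self, user: str) -> None:
        """Opening a batch locks it to the user unless another holder has it."""
        if self.state is BatchState.PROCESSING:
            raise PermissionError(f"{self.name} is being processed")
        if self.state is BatchState.LOCKED and self.locked_by != user:
            raise PermissionError(f"{self.name} is locked by {self.locked_by}")
        self.state = BatchState.LOCKED
        self.locked_by = user

    def release(self, user: str) -> None:
        """Releasing forwards the batch to post-processing if set, else unlocks it."""
        if self.state is not BatchState.LOCKED or self.locked_by != user:
            raise PermissionError(f"{user} does not hold the lock on {self.name}")
        self.locked_by = None
        self.state = BatchState.PROCESSING if self.post_processing else BatchState.UNLOCKED

    def fail(self) -> None:
        """An error releases the lock so another user or processor can examine the batch."""
        self.locked_by = None
        self.state = BatchState.ERROR
```

In this model, a batch created with post_processing="commit" moves to Processing on release, while a batch with no post-processing returns to Unlocked.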


1.1.5 Commit

Committing a batch takes all of its documents (image and/or non-image) and their metadata, writes them in a selected output format to a specific location or content repository, and then removes them from the Capture workspace. This allows the documents to be located and accessed in the content repository via their metadata or contents. Note that when a batch is committed, some of its documents may not be committed. For example, documents without their required fields populated are skipped. If all documents in a batch are committed, the batch is also deleted from the Capture workspace.

Batches are committed by Capture's Commit Processor using settings selected in an assigned commit profile. Commit profiles can commit to:

  • WebCenter Content Server

  • WebCenter Imaging (directly or via Input Agent)

  • Commit text file (often for consumption by another process or Oracle technology)

You can output batches to one of the following formats during commit:

  • TIFF Multi-Page

  • PDF Image-Only

  • PDF Searchable, with an optional full text file that contains text found in documents via Optical Character Recognition (OCR).

Note that during committing, non-image files that were not converted to image format remain in their original format.

For information about committing, see Section 8.1.
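
The skip rule described in this section (documents without their required fields populated are not committed, and the batch is deleted only when every document is committed) can be sketched as follows. This is a minimal Python model under stated assumptions: the Document class, the commit_batch function, and the idea of passing required field names as a list are all hypothetical, and the actual Commit Processor is configured through commit profiles rather than code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Document:
    name: str
    metadata: Dict[str, str] = field(default_factory=dict)


def commit_batch(documents: List[Document],
                 required_fields: List[str]) -> Tuple[List[Document], List[Document]]:
    """Return (committed, remaining); documents missing a required value are skipped."""
    committed: List[Document] = []
    remaining: List[Document] = []
    for doc in documents:
        # A required field counts as populated only if it holds non-blank text.
        complete = all(doc.metadata.get(name, "").strip() for name in required_fields)
        (committed if complete else remaining).append(doc)
    return committed, remaining
```

When the remaining list is empty, every document was committed, which corresponds to the case in which the batch itself is deleted from the workspace.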

1.2 About the Capture Client

The Capture client is the end-user application that a knowledge worker or scan operator uses to create batches, either by scanning documents or by importing document files from a file folder. The client's main functionality includes:

  • Scanning and importing documents, using the industry standard TWAIN interface to scan from desktop scanners or other TWAIN-compliant input devices

  • Reviewing, editing, and indexing documents

  • Releasing documents so they can be further processed, checked into a content repository, or attached to business application records

The Capture client provides a single window whose upper left batch pane is fixed, while its other panes change, depending on the batch pane selection. For example, the document pane shown on the right of Figure 1-1 displays page thumbnails (smaller page representations) and options for editing documents and their pages. The lower left indexing pane shows metadata fields to complete for the selected document.

Figure 1-1 Capture Client Window


For Capture client user information, see Oracle Fusion Middleware Using Oracle WebCenter Capture.

1.3 About Capture Workspaces

A Capture workspace represents a complete capture system, providing a centralized location for metadata, configuration profiles, and physical data for a particular environment. Capture client users create and access batches within the workspace to which they have been granted access. Workspace managers configure and manage workspaces they have been granted access to and control others' access to the workspace.

The Capture workspace provides these benefits:

  • A separate work area useful for managing document capture for a department or division, or even an organization

  • Shareable elements for re-use in multiple Capture components

  • Secure access to workspaces, provided by Capture's user/group restrictions on workspaces

  • Ability to copy a workspace, for easily adapting its configuration for another environment

  • Ability to restrict access to batches created within a workspace by client profile

For more information about workspaces, see Section 3.1.

1.4 About the Capture Batch Processors

WebCenter Capture provides the following processors, which workspace managers configure for automation in the workspace console:

  • The Import Processor provides automated bulk importing, from sources such as a file system folder, a delimited list (text/ASCII) file, or the inbox/folder of an email server. The import job monitors the source and imports at a specified frequency, such as once a minute, hour, or day. See Section 5.1.

  • The Document Conversion Processor automatically converts non-image documents to a specified format in Capture using Oracle Outside In Technology. Documents can also be merged in various ways during conversion. For example, the conversion processor can convert document files such as PDFs or Microsoft Office documents to TIFF image format. See Section 6.1.

  • The Recognition Processor automatically performs bar code recognition, separation of documents, including patch code separation, and automatic indexing. See Section 7.1.

  • The Commit Processor executes commit profiles to automatically output batches to a specified location or content repository, then removes the batches from the workspace. Supported output formats include TIFF, image-only PDF/A, and searchable PDF. A commit profile specifies how to output the documents and their metadata, and includes metadata field mappings, output format, error handling instructions, and commit driver settings. See Section 8.1.

Batch Flow

Workspace managers can queue batches to specific batch processors through post-processing options.

  • If configured, import processing is the initial processing step, responsible for creating and capturing document batches.

  • If configured, commit processing is the final processing step.

  • If configured, document conversion and recognition processing are intermediate processes. Because bar code recognition requires image format, document conversion typically precedes recognition.

The following is an example batch flow:

  1. Batches are captured either in the client or through an import processor job.

    Post-processing in the client profile or import job is set to Document Conversion.

  2. The imported batches are converted through a document conversion job to images.

    Post-processing in the conversion job is set to Recognition Processor.

  3. Bar codes on the converted image documents are processed by a recognition job.

    Post-processing in the recognition job is set to Commit Processor.

  4. When commit processing is reached, online commit profiles process the documents, committing them to a repository or network folder.
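
The example flow above amounts to a chain in which each step's post-processing setting names the next step. The following Python sketch traces such a chain. The step names, the POST_PROCESSING dictionary, and the batch_route function are hypothetical illustrations and do not correspond to Capture configuration keys.

```python
from typing import Dict, List, Optional

# Hypothetical post-processing settings: each step names the step that follows it.
POST_PROCESSING: Dict[str, Optional[str]] = {
    "import": "document_conversion",
    "document_conversion": "recognition",
    "recognition": "commit",
    "commit": None,  # commit is the final processing step
}


def batch_route(first_step: str, settings: Dict[str, Optional[str]]) -> List[str]:
    """Follow post-processing settings from the first step until no next step is set."""
    route: List[str] = []
    step: Optional[str] = first_step
    while step is not None:
        if step in route:
            raise ValueError(f"post-processing loop detected at {step!r}")
        route.append(step)
        step = settings.get(step)
    return route
```

Calling batch_route("import", POST_PROCESSING) walks the four steps of the example in order. A batch captured in the client would start from the client profile's post-processing setting instead of an import job.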

1.5 About the Capture Workspace Console

Capture provides a central configuration console in which workspace managers create and manage workspaces and their elements for use throughout Capture. For example, workspace managers create metadata fields, choice lists, and database lookups in the console, then use them in multiple areas such as client profiles and batch processors. See Section 3.1.

Figure 1-2 Workspace Console Window


1.6 About Capture Administrator and User Roles

Capture provides the following administrator and user roles, each with a different access level and set of tasks:

  • System Administrators install and configure the Capture system environment, map users or groups configured in a policy store to Capture roles, and monitor Capture processing. They also manage data sources for use in a Capture workspace. It is assumed that system administrators have system administration permissions, including Enterprise Manager and Oracle WebLogic Server access. System administration is covered in Oracle Fusion Middleware Administering Oracle WebCenter Capture.

  • Capture Workspace Managers have read/write access to workspaces they create and ones to which they have been granted access. They can add, edit, copy, and delete workspaces, and configure profiles and processor jobs. Note that workspace managers can also use certain WLST commands with their assigned workspaces. For task information, see Section 1.8.

  • Capture Workspace Viewers have read-only access to workspaces to which they have been granted access. For example, a workspace viewer might review workspace configuration, client profiles, and processor job settings within the workspace to resolve issues. Workspace viewers cannot make changes to workspaces.

  • Capture Users have access to the Capture client only. They can see and select only those client profiles to which they have been granted access. These end-users create batch-related content within a workspace, including batches, documents, and pages. Capture user information is covered in Oracle Fusion Middleware Using Oracle WebCenter Capture. Note that workspace managers are typically assigned both the workspace manager and user roles, which enables them to switch between configuring the workspace and testing configurations in the client.

  • Developers who write customization scripts for use in Capture components can work in one of two ways. They can be granted the workspace manager role and work in the workspace directly, or they can provide scripts to workspace managers, who import, reference, and test the scripts in Capture components. Information on developing scripts for Capture is provided in Oracle Fusion Middleware Developing Scripts for Oracle WebCenter Capture.

1.7 About Capture Security

Capture's user login, access, and authentication are integrated with Oracle Enterprise Manager and Oracle Platform Security Services (OPSS). After authentication, users' permissions depend on their assigned Capture roles, which the system administrator assigns in Oracle Enterprise Manager.

Capture provides multiple access points, as described in the following sections:

1.7.1 About Workspace Access

Access to the console and workspaces functions as follows:

  • For the console, workspace managers and viewers assigned the Capture workspace manager or viewer role, respectively, can sign in and access the workspace console.

  • For workspaces, workspace managers automatically have access to workspaces they create. For any other workspace, a manager who already has access must grant access to other managers and viewers on the Security tab, as described in Section 3.3.

1.7.2 About Client Access

Access to the client and client profiles functions as follows:

  • For the client, Capture users, workspace viewers, and workspace managers assigned the Capture user role can sign in to and access the client.

  • For client profiles, workspace managers must grant users, managers, or viewers access to a selected client profile on the profile's Security train stop, as described in Section 4.16.
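
The client access rules above combine two checks: the account must hold the Capture user role, and it must have been granted access to the specific client profile. The following Python sketch models that combination. The Account and ClientProfile classes and the role strings are hypothetical, and in a real deployment roles are assigned in Oracle Enterprise Manager and grants are made on the profile's Security train stop.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Account:
    name: str
    roles: Set[str] = field(default_factory=set)  # hypothetical role names, e.g. {"manager", "user"}


@dataclass
class ClientProfile:
    name: str
    granted: Set[str] = field(default_factory=set)  # account names granted access to this profile


def can_sign_in_to_client(account: Account) -> bool:
    """Client sign-in requires the Capture user role, whatever other roles are held."""
    return "user" in account.roles


def can_use_profile(account: Account, profile: ClientProfile) -> bool:
    """A profile is usable only with client access and an explicit grant."""
    return can_sign_in_to_client(account) and account.name in profile.granted
```

Under this model, a workspace viewer who was granted a profile but lacks the user role still cannot use it in the client, and a Capture user sees only the profiles granted to that user.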

1.8 Workspace Manager Tasks

The following steps summarize how you configure and manage a workspace environment, using the workspaces pane and workspace tabs.

  1. Get started accessing the Workspace Console and the Capture client, as described in Chapter 2.

  2. In the workspaces pane, create and manage workspaces.

    Workspace managers have access to workspaces they create and to workspaces to which other workspace managers grant them access. See Section 3.2.

  3. On the Security tab, manage workspace access for Capture users. See Section 3.3.

  4. On the Metadata and Classification tabs, create and manage workspace elements for use in the workspace, including metadata fields, choice lists, database lookups, batch statuses, and document profiles. See Chapter 3.

  5. On the Capture tab, create and manage client profiles, as described in Chapter 4. In the Capture client, test the client profiles.

  6. On the Capture tab, create and manage import processor jobs, as described in Chapter 5.

  7. On the Processing tab, create and manage document conversion jobs (Chapter 6) and recognition processor jobs (Chapter 7).

  8. On the Commit tab, create and manage commit profiles. See Section 8.1.

  9. On the Advanced tab, import scripts provided by developers. See Section 3.11.

  10. If needed, use WLST commands to import and export workspaces, or release or export batches. See Chapter 10.

1.9 About Capture Customization

To extend Capture functionality, developers can write and incorporate JavaScript extensions. For example, a Capture client extension could validate an account number used in a transaction using a proprietary calculation or force the first letter of a name to uppercase.
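
The logic behind the two examples above can be sketched as follows. Capture extensions themselves are written in JavaScript against the scripting interfaces described in the developer guide. This Python sketch shows only the validation and formatting logic, with hypothetical function names, and it uses the public Luhn checksum as a stand-in for a proprietary account number calculation.

```python
def capitalize_first(name: str) -> str:
    """Force the first letter to uppercase and leave the rest unchanged."""
    return name[:1].upper() + name[1:]


def is_valid_account_number(number: str) -> bool:
    """Stand-in check using the Luhn checksum; a real rule would be proprietary."""
    if not (number.isascii() and number.isdigit()):
        return False
    total = 0
    for position, char in enumerate(reversed(number)):
        digit = int(char)
        if position % 2 == 1:  # double every second digit, counting from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0
```

A client extension would typically run checks of this kind when a metadata field changes or loses focus, and reject or correct the value before the document is indexed.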

Capture behavior can be customized in the following components:

  • Capture client

  • Import Processor

  • Recognition Processor

For information about using scripts in supported Capture components, see Section 3.11. For developer information about writing scripts, see Oracle Fusion Middleware Developing Scripts for Oracle WebCenter Capture.

1.10 Capture Workspace Use Case

You can use Oracle WebCenter Capture to process virtually any type of document. This guide features a use case in which Oracle WebCenter Capture processes a large volume of customer documents, with the workspace manager automating the process as much as possible to meet business needs.

The Customer workspace processes these types of customer documents:

  • Correspondence

  • Purchase Orders

  • Customer Agreements

Figure 1-3 illustrates the workspace's main configuration.

Figure 1-3 Customer Workspace


1.10.1 Document Profiles for the Customer Workspace

Three document profiles accommodate the main types of documents processed: Correspondence, Purchase Orders, and Customer Agreements. Each type follows its own path through the workspace:

  1. Correspondence arrives by mail and takes the following path:

    1. Client users scan and index batches of correspondence documents using a Correspondence client profile, then release them.

    2. Documents are output (committed) to a folder via a text commit profile and picked up by another process.

  2. Purchase Orders arrive by email and take the following path:

    1. An Import Processor job checks specified email accounts for new messages, then imports and indexes both the email message (as a document) and its attached purchase order documents.

    2. Documents are committed to Content Imaging for transaction processing.

  3. Customer Agreements are scanned using a variety of multi-function devices (MFDs), which send the scanned documents to a network file share. These documents may arrive in either TIFF or PDF format, and take the following path:

    1. An Import Processor job checks the network folder for new files and imports them.

    2. A Document Conversion job converts the PDF documents to a standard image format. This ensures that all incoming documents are in image format, which the Recognition Processor requires for bar code recognition.

    3. A Recognition job reads the images' bar codes, separates the images into documents, and indexes the documents. Document separation is needed in case multiple agreements were scanned into a single file.

    4. Documents are committed to Content Server for storage and retrieval.

1.10.2 Metadata Configuration for the Customer Workspace

Figure 1-4 displays metadata fields defined for the Customer workspace, which are then included in document profiles as they apply. For example, the Correspondence document profile includes Customer ID, Customer Name, Product Family, Product, and Correspondence Type metadata fields.

Figure 1-4 Example Metadata Fields and Configuration


  • The Correspondence Type metadata field uses a user-defined choice list that allows users to select the type of correspondence document they are indexing (a complaint, satisfaction, suggestion, or other document).

  • The Customer Name metadata field uses a database lookup, where a search for customer name returns both the full customer name and the unique Customer ID.

  • The Product Family and Product metadata fields use database choice lists that allow users to select product information from a database source. They have a choice list dependency, where the user's product family choice determines the products listed.
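
The choice list dependency between Product Family and Product can be pictured as a lookup from the parent value to the child values. The following Python sketch uses hypothetical in-memory sample data and function names. In the use case, both lists are database choice lists populated from a database source.

```python
from typing import Dict, List

# Hypothetical sample data; in the use case these values come from a database source.
PRODUCTS_BY_FAMILY: Dict[str, List[str]] = {
    "Printers": ["Laser 100", "Inkjet 200"],
    "Scanners": ["Flatbed 10", "Sheetfed 20"],
}


def product_choices(family: str) -> List[str]:
    """Return the products to list for the selected product family (empty if unknown)."""
    return list(PRODUCTS_BY_FAMILY.get(family, []))


def is_valid_selection(family: str, product: str) -> bool:
    """A product is valid only if it belongs to the chosen family."""
    return product in product_choices(family)
```

Selecting "Scanners" as the family would limit the Product field to the two scanner entries, which is the behavior the dependency is meant to provide.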

1.10.3 Processor Configuration for the Customer Workspace

There are multiple business scenarios in which document conversion and merging play an integral role, particularly when Capture's other automated batch processors are also involved.

For example, suppose your organization receives PDF documents such as expense reports via email. Each PDF document contains a bar code and the email message may contain relevant information. You might create the following configuration:

  1. An Import Processor job imports the email messages, creating batches that each contain two documents (the PDF, followed by the email message). After processing the email message, the Import Processor forwards the batch to the Document Conversion Processor.

  2. A Document Conversion Processor job converts each batch's PDF and email message to image format. (Image format is needed for later bar code recognition.) The processor merges the two documents into a single document, so that the email's information is available, if needed. If the email contains two PDF expense reports, the email message should display as the last page of each expense report document. The Document Conversion Processor forwards the batch to the Recognition Processor.

  3. A Recognition Processor job performs bar code recognition and indexing of each document. The Recognition Processor forwards the batch to the Commit Processor.

  4. The Commit Processor commits the batch.
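
The merge behavior in step 2 (the email message becomes the last page of each expense report document) can be sketched as a simple list operation. The function name and the representation of pages as strings are hypothetical. The Document Conversion Processor performs the actual merge according to the settings in the document conversion job.

```python
from typing import List


def merge_email_into_reports(reports: List[List[str]],
                             email_pages: List[str]) -> List[List[str]]:
    """Append the email's pages to the end of each report, one merged document per report."""
    # List concatenation builds new lists, so the input documents are left unchanged.
    return [report + email_pages for report in reports]
```

With two expense reports attached to one email, the result is two documents, each ending with the email message pages.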