2 Understand the Content Capture Process

Let’s have a look at the basic process of capturing content and uploading it to Oracle Content Management.

Figure: How documents are processed from the first stage to the last, using the required processors (and, in some cases, the file import agent) and suitable commit drivers.

Sources

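Content can be captured from various sources, such as scanned pages and the file system folders, delimited list text files, and email accounts that the import processor monitors.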

Processors

Regardless of the source, each captured document is routed through a number of processors before it’s uploaded to Oracle Content Management for storage and/or further processing:
  • The import processor provides automated bulk importing from sources such as a file system folder, a delimited list text file, or the inbox/folder of an email server account. The import job monitors the source and imports content at a specified frequency (for example, once a minute, hour, or day).

  • The TIFF conversion processor automatically converts non-image documents and attachments to TIFF or JPEG format. You can choose to merge documents and attachments in various ways during conversion. For example, the conversion processor can convert document files such as PDFs or Microsoft Office documents to TIFF images for bar code processing.

  • The PDF conversion processor converts documents, images, and attachments to PDFs.
  • The recognition processor automatically recognizes bar codes, organizes documents, and indexes them.

  • The commit processor executes commit profiles to automatically output and upload documents in a batch to Oracle Content Management, and then removes the batches from the procedure.

    A commit profile specifies how to output the documents and their metadata, and it includes metadata field mappings, output format, error handling instructions, and commit driver settings.

  • The asset lookup processor enables client users to search for supported assets in the Oracle Content Management repository.

  • The XML transformation processor enables client users to transform XML documents into a desired style based on an XSLT file.
  • The taxonomy lookup processor enables users to select taxonomy categories or automate taxonomy searches using a Content Capture field value.
  • The external processor enables you to integrate existing or new capabilities with Content Capture. These capabilities can include additional types of document conversion, for example to Microsoft PowerPoint or to image formats. The external processor can also assign metadata values based on document content or the sender’s email address. In general, it’s a means of extending the functionality of Content Capture as a document flows through a procedure.
  • The conditional assignment processor provides basic conditional logic that gives you the flexibility to manipulate metadata field values and change document profiles.
  • The optical character recognition (OCR) processor enables you to convert image documents into PDF or text.
  • The Classification Jobs processor enables you to automate detection of languages and classification of documents when the documents are received. After classification jobs are committed, asset languages are set in Oracle Content Management.
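
To make the idea behind the conditional assignment processor more concrete, here is a minimal Python sketch of rule-based metadata assignment. It is not Content Capture's implementation or API: the `Rule` class, the `apply_rules` function, and the field and profile names are hypothetical, and they only illustrate how a condition on existing metadata might set field values or switch a document profile.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Rule:
    """Hypothetical rule: if `condition` holds, apply the assignments."""
    condition: Callable[[Dict[str, str]], bool]
    set_fields: Dict[str, str] = field(default_factory=dict)
    new_profile: Optional[str] = None


def apply_rules(
    metadata: Dict[str, str], profile: str, rules: List[Rule]
) -> Tuple[Dict[str, str], str]:
    """Evaluate rules in order and return the updated (metadata, profile)."""
    metadata = dict(metadata)  # work on a copy; leave the input untouched
    for rule in rules:
        if rule.condition(metadata):
            metadata.update(rule.set_fields)
            if rule.new_profile is not None:
                profile = rule.new_profile
    return metadata, profile


# Illustrative rule: documents from one sender domain are treated as invoices.
rules = [
    Rule(
        condition=lambda m: m.get("Sender", "").endswith("@example.com"),
        set_fields={"Department": "Accounts Payable"},
        new_profile="Invoice",
    ),
]

metadata, profile = apply_rules(
    {"Sender": "billing@example.com"}, "Default", rules
)
print(metadata, profile)
```

A document whose metadata matches no rule passes through unchanged, which mirrors the idea that conditional assignment only adjusts values when its conditions are met.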

If you’ve configured a procedure to use all of them, the processors work in coordination with one another. Many batch flows begin with the import processor and then pass to the PDF or TIFF conversion processor, which converts documents to the configured formats. The recognition processor then takes over to recognize bar codes, organize documents in the specified ways, and index them. Finally, the commit processor uploads the output to Oracle Content Management.
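
The typical flow described here can be pictured as a chain of steps that each receive a document and hand it on. The Python sketch below is only an illustration under that assumption: the `Document` class, the step functions, and `run_flow` are hypothetical names, and the bodies are placeholders rather than real conversion or bar code recognition.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Document:
    """Hypothetical stand-in for a captured document."""
    name: str
    fmt: str
    metadata: Dict[str, str] = field(default_factory=dict)
    history: List[str] = field(default_factory=list)


def import_step(doc: Document) -> Document:
    # Placeholder: a real import job would pull the file from its source.
    doc.history.append("import")
    return doc


def pdf_conversion_step(doc: Document) -> Document:
    # Placeholder: only the format label changes, no real conversion happens.
    doc.fmt = "pdf"
    doc.history.append("pdf_conversion")
    return doc


def recognition_step(doc: Document) -> Document:
    # Placeholder: pretend a bar code value was read and used for indexing.
    doc.metadata.setdefault("barcode", "UNKNOWN")
    doc.history.append("recognition")
    return doc


def commit_step(doc: Document) -> Document:
    # Placeholder: a real commit would upload the document and its metadata.
    doc.history.append("commit")
    return doc


def run_flow(doc: Document, steps: List[Callable[[Document], Document]]) -> Document:
    """Pass the document through each step in order."""
    for step in steps:
        doc = step(doc)
    return doc


flow = [import_step, pdf_conversion_step, recognition_step, commit_step]
result = run_flow(Document("invoice.docx", "docx"), flow)
print(result.history)
```

Because the flow is just an ordered list, leaving a step out (for example, skipping conversion) corresponds to a procedure that is not configured to use that processor.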

All captured documents are uploaded and stored in Oracle Content Management as separate content items with the metadata assigned during the content capture process. You can access and manage these items just like any other items in Oracle Content Management.

Procedures

Procedures are defined content capture workflows, from the initial sourcing all the way to final upload to Oracle Content Management. Each procedure represents a complete content capture system, providing a centralized location to configure metadata, processing rules, configuration profiles, and batch data for a particular environment. Content Capture Client users create and access batches within a procedure to which they’ve been granted access.

You can create multiple procedures for your organization to manage your content capture and processing needs efficiently, for example by department or location. You can share common configuration elements between procedures for reuse, and you can copy a procedure to adapt it for other environments.

Batches

A batch contains one or more documents, which may be related (for example, multiple documents for a customer) or unrelated (for example, documents divided by separator sheets).
  • A document may consist of scanned images or an electronic file such as a Microsoft Word or PDF file.

  • A document may or may not contain attachments such as images or electronic files.

When you work with a batch, you can lock it. You will see a lock icon if a batch is locked by you or another user. Releasing a batch removes the lock icon and, depending on the client profile settings, frees the batch for another user or a system processor to work on.
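
The locking behavior can be modeled as a batch that records which user, if any, currently holds it. This is a simplified Python sketch, not how Content Capture implements locks: the `Batch` class, its methods, and `BatchLockedError` are hypothetical names, and it ignores client profile settings and system processors.

```python
from dataclasses import dataclass
from typing import Optional


class BatchLockedError(Exception):
    """Raised when a user acts on a batch that someone else holds."""


@dataclass
class Batch:
    name: str
    locked_by: Optional[str] = None  # None means no lock icon is shown

    def lock(self, user: str) -> None:
        # Only one user can hold the batch at a time.
        if self.locked_by not in (None, user):
            raise BatchLockedError(f"{self.name} is locked by {self.locked_by}")
        self.locked_by = user

    def release(self, user: str) -> None:
        # Only the user holding the lock can release the batch.
        if self.locked_by != user:
            raise BatchLockedError(f"{self.name} is not locked by {user}")
        self.locked_by = None


batch = Batch("Invoices")
batch.lock("alice")

try:
    batch.lock("bob")  # fails while alice holds the lock
    second_lock_succeeded = True
except BatchLockedError:
    second_lock_succeeded = False

batch.release("alice")
batch.lock("bob")  # succeeds once the batch has been released
print(second_lock_succeeded, batch.locked_by)
```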

Client Profiles

You scan or import groups of pages in batches using a client profile that the procedure manager has defined for you. A client profile is a group of settings that determine how to scan, import, or index the documents in a batch. A client profile does the following:
  • It controls such things as scanner settings, how documents are created and separated in the batch, whether metadata fields are available, whether and how bar codes are processed, and what happens next to batches after you release them.

  • It determines whether you can capture documents only, capture and index documents, or index documents only.

  • It determines whether non-image electronic files (for example, PDF documents) should be retained in their original format, converted to an image format, or prevented from being imported.

  • It identifies the set of metadata fields to complete for a selected document.

If your client profile includes indexing, you can assign metadata values, such as a customer ID and name, to documents. Documents of different types in a batch typically have different sets of metadata fields available. In index-only batches, you can assign metadata values to documents, but you cannot append, insert, or replace pages.

When you’re done working on the documents in a batch, you release it. If no release processes are defined, releasing the batch simply unlocks it from your exclusive use. If you select an available release process when you release the batch, that process determines the next action performed on it. Depending on the client profile settings, one of the following happens:
  • The batch may be removed from the batch pane list and committed to Oracle Content Management, or it may be placed in a queue for further processing such as PDF/TIFF conversion or bar code recognition.

  • The batch may remain in the list but unlocked (no lock icon is shown). This allows you or another user to lock the batch and make further changes.
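
As a rough illustration of how a client profile shapes what you can do with a batch and what happens on release, consider the following Python sketch. Everything in it is an assumption made for illustration: the `Mode` enum, the `ClientProfile` class, the `release` function, and the release process names and outcomes are hypothetical and do not reflect the product's actual settings or API.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional


class Mode(Enum):
    CAPTURE_ONLY = "capture"
    CAPTURE_AND_INDEX = "capture_and_index"
    INDEX_ONLY = "index"


@dataclass
class ClientProfile:
    name: str
    mode: Mode
    # Hypothetical mapping: release process name -> what happens to the batch.
    release_processes: Dict[str, str] = field(default_factory=dict)

    def can_edit_pages(self) -> bool:
        # Index-only batches do not allow appending, inserting, or replacing pages.
        return self.mode is not Mode.INDEX_ONLY

    def can_index(self) -> bool:
        return self.mode is not Mode.CAPTURE_ONLY


def release(profile: ClientProfile, selected_process: Optional[str] = None) -> str:
    """Return the batch's next state after it is released."""
    if selected_process is None:
        # No release process applies: the batch stays in the list, unlocked.
        return "unlocked"
    if selected_process not in profile.release_processes:
        raise ValueError(f"Unknown release process: {selected_process}")
    return profile.release_processes[selected_process]


profile = ClientProfile(
    name="Scan and index invoices",
    mode=Mode.CAPTURE_AND_INDEX,
    release_processes={
        "Commit": "committed",
        "Recognize bar codes": "queued_for_recognition",
    },
)

print(release(profile))
print(release(profile, "Commit"))
print(release(profile, "Recognize bar codes"))
```

The three calls correspond to the outcomes listed above: the batch stays in the list unlocked, is committed, or is queued for further processing.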