1 Learn About Content Capture

The content capture features of Oracle Content Management provide you with one system to capture, index, store, and manage your mission-critical business content. You can scan and import documents in bulk, and process them automatically before they’re uploaded to Oracle Content Management. Documents consist of one or more images obtained from a scanner or imported from a file, or they can be non-image, electronic files such as Microsoft Word or PDF files. When you import non-image files, the defined capture flow determines whether they are retained in their original format, converted to an image format, or prevented from being imported.

The batches of documents that you create are scalable, allowing you to reorganize documents, automate their grouping to suit your business needs, read bar codes for billing or filing purposes, index documents to make them easily searchable, and convert them to standard formats for your organization. You create content capture workflows, called procedures, which automate the processing and routing of physical and electronic documents in bulk.

The primary drivers for capturing content are batches and documents. Documents are scanned or imported and maintained in batches. A batch consists of scanned images or electronic document files (such as PDF or Microsoft Office files) which are organized into documents and assigned metadata values (indexed). Each document shares a set of metadata values. Oracle Content Management provides a variety of content capture processors which import documents, convert them to PDF and/or TIFF, auto-recognize bar codes, automatically separate documents, populate metadata values, and deliver the final output to Oracle Content Management.

The content capture process involves the following main components:

Capture

Scanning or importing documents into batches within a content capture procedure can be done in a variety of ways:

  • High volume scanning using a production document imaging scanner

  • Ad hoc remote scanning or import, such as from a business application

  • Automated import, such as from an email account or monitored folder

End users can manually scan hardcopy documents or import electronic documents into batches using the Content Capture Client software (based on client profiles created by procedure managers). Alternatively, using settings stored in an import job, the import processor can automatically import images and other electronic documents directly from email, network folders, or list files.

Conversion

Depending on your business needs, you may need to convert non-image input documents and attachments to a different format. For example, PDF expense reports attached to imported email messages may need to be converted to an image format to allow their bar codes to be read. In this case, the TIFF conversion processor converts PDF files to TIFF images. The TIFF conversion processor automatically converts documents or attachments and merges them within a batch using settings stored in a conversion job. The PDF conversion processor converts documents to PDFs of the same content type as the source documents.

Classification

Classification is the process of separating batches into their logical documents and assigning document profiles. The document profile specifies a set of possible metadata fields and attachment types available to each document. Classification also involves assigning a status to a batch.

Classification can occur manually or automatically in a variety of ways:

Document Separation

  • Manually by Content Capture Client users. For example, users can select a client profile configured for a specific number of pages per document. They can also insert separator sheets between documents prior to scanning to identify a new document. While visually inspecting a batch, Content Capture Client users can create new documents by splitting larger documents into multiple, smaller documents.

  • Manually by users during file import in the Content Capture Client.

  • Automatically, when the import processor imports documents based on job settings.

  • Automatically, during bar code recognition by the recognition processor. If a batch is sent to the recognition processor, the processor automatically performs bar code recognition and document classification.
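
As a rough illustration of two of the separation strategies above (a fixed number of pages per document, and separator sheets), the following Python sketch splits a flat list of scanned pages into documents. It is not Content Capture code: the function names and the SEPARATOR marker are hypothetical, and discarding the separator sheet after the split is an assumption made for this example.

```python
from typing import List

# Hypothetical marker standing in for a scanned separator sheet.
SEPARATOR = "<separator-sheet>"


def split_fixed(pages: List[str], pages_per_document: int) -> List[List[str]]:
    """Group pages into documents of a fixed size; the last one may be shorter."""
    if pages_per_document < 1:
        raise ValueError("pages_per_document must be at least 1")
    return [
        pages[i:i + pages_per_document]
        for i in range(0, len(pages), pages_per_document)
    ]


def split_on_separators(pages: List[str]) -> List[List[str]]:
    """Start a new document at each separator sheet and drop the sheet itself."""
    documents: List[List[str]] = []
    current: List[str] = []
    for page in pages:
        if page == SEPARATOR:
            # Close the document collected so far, if it has any pages.
            if current:
                documents.append(current)
            current = []
        else:
            current.append(page)
    if current:
        documents.append(current)
    return documents
```

For example, split_fixed(["p1", "p2", "p3", "p4", "p5"], 2) yields three documents, the last containing only "p5".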

Metadata Assignment

Assigning a set of metadata values to a document based on a document profile is called indexing. The document profile identifies the metadata fields available for indexing a particular type of document. Metadata values can be assigned in various ways:

  • Manually, by users in the metadata pane of the Content Capture Client.

  • Automatically, when the import processor processes documents based on job settings.

  • Automatically, during processing by the recognition processor, based on job settings.

  • Automatically, during processing by the asset lookup processor, based on job settings.

Metadata fields can be configured in a variety of ways. You can configure an input mask and a display format, or provide a regular expression for validation. Metadata values can be auto-populated, derived from bar codes, or selected from choice lists and dependent choice lists. Procedure managers configure these metadata field definitions in the procedure and then use them in client profiles or processor jobs.
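
To make the idea of field-level validation concrete, here is a minimal Python sketch of a metadata field definition that carries a regular expression and a choice list. The MetadataField class, its attributes, and the validate function are hypothetical names invented for this example; they do not reflect how Content Capture stores or evaluates field definitions.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MetadataField:
    """Hypothetical field definition, for illustration only."""
    name: str
    required: bool = False
    pattern: Optional[str] = None        # regular expression the whole value must match
    choices: Optional[List[str]] = None  # allowed values, if the field uses a choice list


def validate(field: MetadataField, value: Optional[str]) -> List[str]:
    """Return a list of validation messages; an empty list means the value is valid."""
    errors: List[str] = []
    if value is None or value == "":
        if field.required:
            errors.append(f"{field.name}: a value is required")
        return errors
    if field.pattern is not None and re.fullmatch(field.pattern, value) is None:
        errors.append(f"{field.name}: '{value}' does not match the expected format")
    if field.choices is not None and value not in field.choices:
        errors.append(f"{field.name}: '{value}' is not in the choice list")
    return errors


# Example field definitions (names and formats are made up).
invoice_number = MetadataField("Invoice Number", required=True, pattern=r"INV-\d{6}")
department = MetadataField("Department", choices=["Finance", "Legal", "Sales"])
```

With these definitions, validate(invoice_number, "INV-004217") returns an empty list, while validate(invoice_number, "4217") returns one message about the format.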

Attachment Type

An attachment is an image or non-image file associated with a primary document. Procedure managers define attachment types, which can be assigned to document profiles. These attachment types may be used to classify attachments with documents that have been assigned to a document profile. Content Capture Client users can view attachments, change an attachment type, create attachments, and modify image attachments.

Batch Status

Procedure managers define batch statuses to suit their business needs. A batch status can be assigned manually by a user at any time during the content capture process, or automatically by one of the processors.

Release

Oracle Content Management uses a lock-and-release method to ensure that only one user or processor has access to any content capture batch at any given time. A batch is automatically locked to you when you create or open (expand) it. When you’re done working with the batch, release (unlock) it to make it available to others. Releasing a batch automatically synchronizes its documents and metadata with Oracle Content Management and, if configured in its client profile, routes the batch for further processing (commit, recognition, or conversion).
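
The lock-and-release behavior can be pictured with a small in-memory model. The sketch below is an assumption-laden illustration, not the product's implementation: the Batch class, the BatchLockedError exception, and the on_release callback (standing in for synchronization and routing) are all hypothetical, and a real system would enforce the lock on the server rather than in a single process.

```python
from typing import Callable, Optional


class BatchLockedError(Exception):
    """Raised when a user touches a batch that is locked to someone else."""


class Batch:
    def __init__(self, name: str,
                 on_release: Optional[Callable[["Batch"], None]] = None) -> None:
        self.name = name
        self.locked_by: Optional[str] = None
        # Hypothetical hook representing "synchronize and route" on release.
        self._on_release = on_release

    def open(self, user: str) -> None:
        """Opening a batch locks it to the user unless another user holds it."""
        if self.locked_by is not None and self.locked_by != user:
            raise BatchLockedError(f"{self.name} is locked by {self.locked_by}")
        self.locked_by = user

    def release(self, user: str) -> None:
        """Only the lock holder can release; releasing triggers the hook."""
        if self.locked_by != user:
            raise BatchLockedError(f"{self.name} is not locked by {user}")
        self.locked_by = None
        if self._on_release is not None:
            self._on_release(self)
```

In this model, a second user who calls open on a batch held by someone else gets a BatchLockedError until the holder calls release.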

Commit

When a batch is committed, all of its documents and their metadata are uploaded to Oracle Content Management and then removed from the batch. This allows the documents to be located and accessed in Oracle Content Management via their metadata or contents. Some of the documents may not be committed. For example, documents without their required fields populated are skipped. If all documents in a batch are committed, the batch is also deleted from the procedure.

During the commit process, non-image files that weren’t converted to an image format remain in their original format.
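
The commit rule described above (documents missing required metadata are skipped, and the batch is deleted only when nothing remains) can be sketched in a few lines of Python. The Document class and the commit_batch function are hypothetical and exist only for this illustration; the actual upload to Oracle Content Management is not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Document:
    """Hypothetical stand-in for a document in a batch."""
    title: str
    metadata: Dict[str, str] = field(default_factory=dict)


def commit_batch(documents: List[Document],
                 required_fields: List[str]) -> Tuple[List[Document], List[Document]]:
    """Split a batch into (committed, remaining) documents.

    A document is committed only if every required field has a non-empty value;
    otherwise it is skipped and stays in the batch.
    """
    committed: List[Document] = []
    remaining: List[Document] = []
    for doc in documents:
        complete = all(doc.metadata.get(name) for name in required_fields)
        (committed if complete else remaining).append(doc)
    return committed, remaining


batch = [
    Document("invoice-1", {"Invoice Number": "INV-004217", "Vendor": "Acme"}),
    Document("invoice-2", {"Invoice Number": "", "Vendor": "Globex"}),
]
committed, remaining = commit_batch(batch, ["Invoice Number", "Vendor"])
# The batch would be deleted only if no documents remain in it.
batch_deleted = not remaining
```

Here only "invoice-1" is committed; "invoice-2" stays in the batch because its Invoice Number is empty, so the batch is not deleted.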