5 Creating Recognition Processor Scripts

This chapter discusses creating Recognition Processor scripts.

The following are common uses for Recognition Processor scripts:

For a list of Recognition Processor events, see Section 5.2, "Script Design."

This chapter covers the following topics:

5.1 Batch Job Processing Order of Events

Events are executed in the following order in recognition processor batch jobs:

  1. initialize

  2. processBatch

  3. restoreCaptureBatch (only if the batch has previously failed during document creation)

  4. beginPhase

  5. extractBatchItem

  6. barcodesFoundOnItem

  7. endPhase

  8. beginPhase

  9. batchItemAllValidBarcodes

  10. determineSeparatorPage

  11. batchItemValidBarcode (This applies to the bar code on every page organization type. It only happens when Optimize Bar Code Recognition is turned on and the processor is unable to find a bar code on a batch item.)

  12. endPhase

  13. beginPhase

  14. determineDocType

  15. endPhase

  16. beginPhase

  17. beginDatabaseLookup

  18. determineIndexValues

  19. endPhase

  20. beginPhase

  21. renameOrigCaptureDocTitle

  22. createCaptureDoc

  23. endPhase

  24. beginPhase

  25. postProcess

  26. endPhase

  27. endBatchProcess
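
The order above can be pictured as a script with one function per event. The following is a minimal sketch, assuming that scripts are written in JavaScript and that each event is handled by a function of the same name. The `trace` array and the `simulateRun` driver are hypothetical stand-ins that only illustrate the call order; they are not part of the product.

```javascript
// Illustration only: 'trace' records the order in which handlers run.
var trace = [];

function initialize(rpc) { trace.push("initialize"); }
function processBatch(rpc) { trace.push("processBatch"); }
function beginPhase(rpc) { trace.push("beginPhase:" + rpc.phaseID); }
function endPhase(rpc) { trace.push("endPhase:" + rpc.phaseID); }
function postProcess(postContext) { trace.push("postProcess"); }
function endBatchProcess() { trace.push("endBatchProcess"); }

// Hypothetical driver that mimics the documented order for a batch that
// has not failed before (so restoreCaptureBatch is not called).
// Per-item and per-document events are omitted for brevity.
function simulateRun() {
    var rpc = { phaseID: 0 };
    initialize(rpc);
    processBatch(rpc);
    for (var phase = 1; phase <= 6; phase++) {
        rpc.phaseID = phase;
        beginPhase(rpc);
        if (phase === 6) {
            postProcess({});
        }
        endPhase(rpc);
    }
    endBatchProcess();
    return trace;
}
```

Calling `simulateRun()` fills `trace` with the handler names in the order in which they were invoked, which can be compared against the numbered list.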

5.2 Script Design

This section describes classes that can be used in script design, which include the following:

5.2.1 RecognitionJob

The constants for the bar code symbologies are as follows.

Constants Description

BARCODE_CODEABAR=0

Codabar

BARCODE_CODE128=1

Code 128

BARCODE_CODE39=2

Code 39

BARCODE_CODE93=3

Code 93

BARCODE_EAN13=4

EAN-13

BARCODE_EAN8=5

EAN-8

BARCODE_INTERLEAVED25=6

Interleaved 2/5

BARCODE_UCCEAN128=7

UCC/EAN 128

BARCODE_UPCA=8

UPC-A

BARCODE_UPCE=9

UPC-E

BARCODE_AIRLINE25=10

Airline (IATA) 2/5

BARCODE_CODE32=11

Code 32

BARCODE_DATALOGIC25=12

Datalogic 2/5

BARCODE_INDUSTRIAL25=13

Industrial 2/5

BARCODE_ISBNADDON2=14

ISBN Addon 2

BARCODE_ISBNADDON5=15

ISBN Addon 5

BARCODE_MATRIX25=16

Matrix 2/5

BARCODE_POSTNETPLANET=17

Postnet/Planet

BARCODE_PATCHCODE=18

Patch Code

BARCODE_DATAMATRIX=19

Data Matrix

BARCODE_PDF417=20

PDF417

BARCODE_QRCODE=21

QR code


The following are constants for the document organization type.

Constant Description

ORGANIZE_SINGLEPAGE=0

Organize documents based on a fixed number of pages per document.

ORGANIZE_SINGLEDOC=1

Do not perform document organization.

ORGANIZE_MULTIPAGE_BARCODE=2

Organize documents based on the same bar code value on each page.

ORGANIZE_MULTIPAGE_SEP=3

Organize documents based on separator pages.

ORGANIZE_MULTIPAGE_OTHER=4

Organize documents based on hierarchical separator pages.


The following are values used by database lookup.

Values Description

DBLOOKUP_NONE=0

No database lookup is configured.

DBLOOKUP_BARCODE=1

Use a bar code value to perform database lookup.

DBLOOKUP_INDEXFIELD=2

Use the index field value to perform database lookup.


The following show actions to take when a database lookup finds multiple records.

Actions Description

DBMULTIPLERECORD_USEFIRST=0

Use the first record found during database lookup.

DBMULTIPLERECORD_DONOTLINK=1

Do not populate the database lookup result.


The following show what action to take when a database lookup finds no match.

Actions Description

DBNOMATCH_ALLOWCOMMIT=0

Permit the batch to be committed even when no database record is found.

DBNOMATCH_NOCOMMIT=1

Do not allow the batch to be committed when no match is found.


The following are operators.

Operators Description

OPERATOR_OR = 0

The OR operator, used in cover page definition rules.

OPERATOR_AND = 1

The AND operator, used in cover page definition rules.


The following values show how the document type is dynamically determined.

Values Description

DOCTYPE_NONE = 0

The document type is not dynamically determined.

DOCTYPE_BARCODE = 1

The document type is dynamically determined based on a bar code value.

DOCTYPE_SEPARATOR = 2

The document type is dynamically determined based on a separator page.


The following are actions to take when multiple bar code values are found for a bar code definition.

Actions Description

MULTIBARCODE_USEFIRST=0

Use the first bar code value found.

MULTIBARCODE_USELAST=1

Use the last bar code value found.

MULTIBARCODE_CLEAR=2

Do not use the bar code values.


The following properties apply for this class.

Properties Description

String workspaceName

Name of the workspace with which this job is associated.

String workspaceID

ID of the workspace with which this job is associated.

String jobID

Job ID.

Date lastModifiedDateTime

Date and time the job was last modified.

String lastModifiedUserID

ID of the user that last modified the job.

String name

Job name.

String description

Job description.

String scriptID

ID of the script with which this job is associated.

List<BarcodeDefinition> barcodes

List of bar code definitions.

Boolean autoDetectBarcodes

This determines whether Enable Auto-detect Bar Codes is turned on.

Boolean validateCheckSum

This determines whether Validate Optional Checksum is turned on.

List<Integer> symbologies

A list of selected bar code symbologies for recognition; valid values are 0-21, as defined in the constants for bar code symbologies earlier in this section.

Integer batchOrganization

Document organization type; valid values are 0-4, as defined in the constants for document organization type earlier in this section.

Integer documentPageCount

For the "Fixed number of pages per document" document organization type, this property refers to the maximum number of pages per document.

Integer pagesPerDoc2ReadBarcodes

For the "None: Do not perform document organization" document organization type, this property refers to the number of pages per document to read bar codes.

Integer maxPageCountPerDoc

For the "Same bar code value on each page, or Separator pages" document organization type, this property refers to the maximum number of pages per document.

BarcodeDefinition multiPageDocBarcode

For the "Same bar code value on each page" document organization type, this property refers to the bar code that determines document separation.

Boolean optimizeBarcodeDetection

For the "Same bar code value on each page" document organization type, this property determines whether to optimize bar code detection.

List<SeparatorDefinition> coverPages

For the "Separator pages", "Hierarchical separator pages", and "None: Do not perform document organization" document organization types, this property holds the data that defines the separator page. When hierarchical separator pages are used, the list may contain more than one separator page definition; in the other two scenarios, the list contains only one separator page definition.

Integer multiBarcodeValuesOption

Action to take if more than one value is found for a bar code within a document; valid values are 0-2, as defined in the constants.

Integer dynamicDocType

Options on how the Dynamic Document Profile is determined; valid values are 0-2 as defined in the constants.

String defaultDocTypeID

ID for the Default Document Profile.

BarcodeDefinition docTypeBarCode

When the Document Profile is being dynamically determined using the bar code, this property represents the selected bar code.

List<DocumentDefinition> docTypeMappings

When the Document Profile is being dynamically determined using the bar code, this mapping represents the Document Profile and Bar Code Value Mappings.

List<RecognitionJobField> jobFields

Field mappings information.

Integer dblookupUsing

The type of value the database lookup will be using; valid values are 0-2 as defined in the constants.

BarcodeDefinition dblookupBarcodeField

This is the bar code definition that is selected for database lookup.

String dblookupIndexDefID

This is the metadata field ID that is selected for database lookup.

String dblookupProfile

Database lookup profile ID.

String dblookupSearchField

Database lookup search field ID.

Integer dblookupMultipleRecordAction

Actions to take when more than one record is found during database lookup; valid values are 0-1 as defined in the constants.

Integer dblookupNoMatchAction

Actions to take when no record is found during database lookup; valid values are 0-1 as defined in the constants.

String renamePrefix

Part of post-process setting. When there is no system error, this is the batch prefix to rename, if needed.

String renameEmail

Part of post-process setting. When there is no system error, this is the email address to which notification should be sent, if needed.

String renameStatus

Part of post-process setting. When there is no system error, this is the batch status to change, if needed.

Integer renamePriority

Part of post-process setting. When there is no system error, this is the batch priority to change, if needed.

String processorID

Part of post-process setting. When there is no system error, this is the batch processor ID to which the batch will be released.

String processorJobID

Part of post-process setting. When there is no system error, this is the batch processor job ID to which the batch will be released.

String failureRenamePrefix

Part of post-process setting. When there is a system error, this is the batch prefix to rename, if needed.

String failureRenameEmail

Part of post-process setting. When there is a system error, this is the email address to which notification should be sent, if needed.

String failureRenameStatus

Part of post-process setting. When there is a system error, this is the batch status to change, if needed.

Integer failureRenamePriority

Part of post-process setting. When there is a system error, this is the batch priority to change, if needed.

String failureProcessorID

Part of post-process setting. When there is a system error, this is the batch processor ID to which the batch will be released.

String failureProcessorJobID

Part of post-process setting. When there is a system error, this is the batch processor job ID to which the batch will be released.
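
Several of the page-count properties above only apply to one organization type. The following sketch shows one way a script might pick the relevant property. It assumes JavaScript as the script language and plain property access on the job object; the constants are local copies of the documented values, and `pageCountSettingFor` is a hypothetical helper name. In the actual runtime the job may be exposed through getter methods instead.

```javascript
// Local copies of the documented organization type constants.
var ORGANIZE_SINGLEPAGE = 0;
var ORGANIZE_SINGLEDOC = 1;
var ORGANIZE_MULTIPAGE_BARCODE = 2;
var ORGANIZE_MULTIPAGE_SEP = 3;
var ORGANIZE_MULTIPAGE_OTHER = 4;

// Returns the page-count property that the property table associates
// with the job's organization type, or null when none is described.
function pageCountSettingFor(job) {
    switch (job.batchOrganization) {
        case ORGANIZE_SINGLEPAGE:
            return job.documentPageCount;
        case ORGANIZE_SINGLEDOC:
            return job.pagesPerDoc2ReadBarcodes;
        case ORGANIZE_MULTIPAGE_BARCODE:
        case ORGANIZE_MULTIPAGE_SEP:
            return job.maxPageCountPerDoc;
        default:
            // Hierarchical separator pages: no page-count property is
            // described for this type in the table above.
            return null;
    }
}
```

For example, a stand-in job object `{ batchOrganization: 0, documentPageCount: 3 }` yields `3`.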


5.2.2 BarcodeDefinition

The following are the constants for the bar code validation rule type.

Constants Description

VALIDATION_NONE=0

No validation rule is specified.

VALIDATION_LENGTH=1

Use the bar code length to validate.

VALIDATION_MASK=2

Use the mask to validate.

VALIDATION_REGULAREXPRESSION=3

Use a regular expression to validate.

VALIDATION_PICKLIST=4

Use a choice list to validate.


The following are the properties for this class.

Properties Description

String barcodeName

Bar code definition name.

Integer validationRule

Bar code validation rule; valid values are 0-4, as defined in the constants.

Integer validationLength

Validation length.

String validationMask

Validation mask.

String validationRegularExpression

Validation regular expression.

String pickListSourceID

Validation choice list source ID.

String pickListID

Validation choice list ID.
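
To show how the validation properties relate to the rule type, the following sketch checks a value against a bar code definition in script. It is an illustration under several assumptions: JavaScript as the script language, plain property access, an exact-length comparison for VALIDATION_LENGTH, and a whole-value match for the regular expression using JavaScript regular expression syntax. The processor's own validation may differ, and mask and choice list rules are not evaluated because their semantics are not described here. `checkBarcode` is a hypothetical helper name.

```javascript
// Local copies of the documented validation rule constants.
var VALIDATION_NONE = 0;
var VALIDATION_LENGTH = 1;
var VALIDATION_MASK = 2;
var VALIDATION_REGULAREXPRESSION = 3;
var VALIDATION_PICKLIST = 4;

// Returns true or false for rules this sketch can evaluate, and null
// for rule types it does not attempt (mask, choice list).
function checkBarcode(def, value) {
    switch (def.validationRule) {
        case VALIDATION_NONE:
            return true;
        case VALIDATION_LENGTH:
            // Assumption: the value must have exactly this length.
            return value.length === def.validationLength;
        case VALIDATION_REGULAREXPRESSION:
            // Assumption: the expression must match the whole value.
            return new RegExp("^(?:" + def.validationRegularExpression + ")$").test(value);
        default:
            return null;
    }
}
```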


5.2.3 SeparatorDefinition

The following are the properties for this class.

Properties Description

String name

Name of the separator page.

Boolean deleteUponCommit

Determines whether to delete the separator page after commit.

Integer operator

Operator used for rules; valid values are 0 and 1, as defined in the RecognitionJob constants.

String docTypeID

If the document type is dynamically determined based on a separator page, this is the ID of the document type for this separator page.

List<SeparatorRuleDefinition> rules

Rules for this separator page.


5.2.4 SeparatorRuleDefinition

The following are the properties for this class.

Properties Description

String name

Name of the rule.

Integer operator

Operator used for patch code and bar codes selected; valid values are 0 and 1, as defined in the RecognitionJob constants.

String patchCode

Patch code selected for this rule.

List<String> barcodes

Bar codes selected for this rule.
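
A separator definition nests a list of rules, each of which has its own operator. The following sketch walks that structure and builds a one-line summary. It assumes JavaScript as the script language, with plain objects and arrays standing in for the SeparatorDefinition and its rule list; `summarizeSeparator` is a hypothetical helper, and in the actual runtime the list may need to be read through Java list methods.

```javascript
// Index matches OPERATOR_OR = 0 and OPERATOR_AND = 1.
var OPERATOR_NAMES = ["OR", "AND"];

// Joins the rule names of a separator definition with its operator,
// for example "Cover: RuleA AND RuleB".
function summarizeSeparator(sep) {
    var ruleNames = [];
    for (var i = 0; i < sep.rules.length; i++) {
        ruleNames.push(sep.rules[i].name);
    }
    return sep.name + ": " + ruleNames.join(" " + OPERATOR_NAMES[sep.operator] + " ");
}
```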


5.2.5 DocumentDefinition

The following are the constants for the document type mapping option.

Constants Description

BARCODE_VALUE=0

This determines document type based on bar code value.

BARCODE_PICKLIST=1

This determines document type based on values in the choice list.


These are the properties for the DocumentDefinition class.

Properties Description

String docTypeID

Document type ID.

Integer mappingType

This sets whether to determine document type based on bar code value or choice list; valid values are 0 and 1, as defined in the constants.

String value

Bar code value specified.

String pickListSourceID

Choice list source ID specified.

String pickListID

Choice list ID specified.
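
The mapping properties can be read as a lookup table from bar code value to document type. The following is a minimal sketch of such a lookup, assuming JavaScript as the script language, a JavaScript array standing in for the `docTypeMappings` list, and exact string comparison of values. `findDocTypeForValue` is a hypothetical helper; choice list mappings are skipped because resolving them is not described here.

```javascript
// Local copies of the documented mapping type constants.
var BARCODE_VALUE = 0;
var BARCODE_PICKLIST = 1;

// Returns the docTypeID of the first value-based mapping whose value
// equals the given bar code value, or null if none matches.
function findDocTypeForValue(mappings, barcodeValue) {
    for (var i = 0; i < mappings.length; i++) {
        var mapping = mappings[i];
        if (mapping.mappingType === BARCODE_VALUE && mapping.value === barcodeValue) {
            return mapping.docTypeID;
        }
    }
    return null;
}
```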


5.2.6 RecognitionJobField

These are the constants for the auto-populate type.

Constants Description

AUTOPOPULATE_NONE=0

Do not auto-populate the index value.

AUTOPOPULATE_BARCODE=1

Auto-populate the index value with the bar code value.

AUTOPOPULATE_BATCHNAME=2

Auto-populate the index value with the batch name.

AUTOPOPULATE_DEFAULT=3

Auto-populate the index value with a default value.

AUTOPOPULATE_INDEXDATE=4

Auto-populate the index value with the index date.

AUTOPOPULATE_SCANDATE=5

Auto-populate the index value with the scan date.


These are the properties for the RecognitionJobField class.

Properties Description

String indexDefID

Metadata ID to populate with property values.

Integer autoPopulate

Auto-populate type; valid values are 0-5, as defined in the constants.

String populateValue

For the bar code type, this represents the bar code definition name; for the default type, this represents a default value.


5.2.7 RecognitionProcessorContext

Properties Description

Logger logger

Logger for user to log additional entries.

RecognitionJob job

Current job being used.

BatchLockEntity batchLockEntity

Current batch that is being processed.

CaptureWorkspaceEntity workspaceEntity

Current workspace that is being used.

int phaseID

An integer that identifies the current phase:

0 - not in a phase. Examples include pre-process validation and batch clean up.

1 - bar code recognition

2 - document organization

3 - document classification

4 - indexing

5 - document creation

6 - post processing

boolean cancelAction

In certain calls, the user is allowed to cancel the action (for example, bar code recognition or database lookup).

BatchItemEntity batchItem

Current batch item being processed. This is specifically used during bar code recognition and bar code validation (part of the document organization phase).

Integer patchCodeRead

Patch code found on a batch item. This is only used during the bar code recognition phase.

List<String> barcodesRead

All bar codes associated with a batch item, which includes original bar codes associated with the batch item, and bar codes read through the bar code recognition engine. This is only used during the bar code recognition phase.

List<ProcessorItem> validBarcodes

This is a list of valid bar codes found for a specific batch item. This only applies to the bar code validation step (part of the document organization phase).

ProcessorDocument also contains a list of valid bar codes, which is associated with a specific document. It is a collection of all valid bar codes found on all batch items associated with the document.

ProcessorItem validBarcode

This is specific to the bar code that determines document separation, and specific to the optimized bar code recognition setting.

ProcessorSeparatorPage separator

This is called for organization types that involve a separator page. If the separator is null, then this batch item is not a separator page.

ProcessorDocument document

This is used for the document classification, indexing, and document creation phase. It contains everything the user needs to know about the document.

String dbLookupValue

This is only used before database lookup is executed. The user can change the lookup value.

String unIndexedDocTitle

This is specific to the Document Creation phase. This property allows the user to customize the first Capture document title. The default title is unindexed; if this value is null, then the first document title will remain unchanged.

String extractPath

Path to which batch items were extracted. This is specific to the bar code recognition phase. The user should not modify this property.
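
Because `phaseID` is a bare integer, a small lookup can make log output easier to read. The following sketch, assuming JavaScript as the script language, maps the documented phase IDs to their names; `phaseName` is a hypothetical helper name.

```javascript
// Index matches the documented phaseID values 0 through 6.
var PHASE_NAMES = [
    "not in a phase",
    "bar code recognition",
    "document organization",
    "document classification",
    "indexing",
    "document creation",
    "post processing"
];

// Returns the documented name for a phase ID, or a placeholder string
// for values outside the documented range.
function phaseName(phaseID) {
    var name = PHASE_NAMES[phaseID];
    return name !== undefined ? name : "unknown phase " + phaseID;
}
```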


5.2.8 PostProcessContext

Properties Description

String renameBatch

Name that the batch will be renamed to during post process. If null, the batch will not be renamed.

Integer priority

Priority that the batch will be changed to during post process. If the priority is not valid (<0 or >10), the batch priority will remain the same.

BatchStatusEntity status

Status entity object that the batch will be associated with during post process. If null, the batch status will remain the same.

int batchState

If there were errors during the recognition process, the batch state will be preset to BATCH_STATE_ERROR; otherwise, the batch state will be preset to BATCH_STATE_READY.

List<String> emailRecipients

A list of email recipients to whom the email notification will be sent. If empty, no email will be sent.

String emailSubject

Subject line of the email notification.

String emailMessage

Main message body of the email notification. If empty, no email will be sent.

String processorID

The processor ID to which the current batch will be released.

String processorJobID

The processor job ID to which the current batch will be released.


5.2.9 ProcessorSeparatorPage

Properties Description

boolean include

Indicates whether this separator page will be deleted after commit.

int level

This is only used in the hierarchical separator pages organization type. Level starts with 1.

String name

Separator page name.

String batchItemID

The batch item with which this separator page is associated.


5.2.10 ProcessorItem

Properties Description

String name

String value

ProcessorItem is a class that holds name/value pair data. In the script, it is used to hold bar code name/value pairs.


5.2.11 ProcessorDocument

Properties Description

String title

Title of the document, which is populated during the document creation phase.

List<String> batchItems

All batch items associated with this document. This is populated during the document organization phase.

List<ProcessorItem> validBarcodes

Valid bar codes associated with this document. This is a combination of all valid bar codes found for all batch items associated with this document. This is populated during the document organization phase.

int failureStatus

Status of the current document.

  • 0 - no error

  • 1 - failed to validate bar code. This is the case when the processor finds duplicate bar codes in a document that match the bar code validation rule, and the job setting is to clear the value.

  • 2 - document exceeded maximum page rule.

  • 3 - unable to determine document type.

  • 4 - no database search result found, and job setting is to prevent commit when no record is found.

String docTypeID

Document type ID associated with the document. If null, no ID is determined.

String comment

Comments for the document. It is usually the error detail for failureStatus, which the user can customize through script.

String captureDocID

This is only used in the "One Document" batch organization type, where the processor does not organize documents, and does not create any Capture documents. This ID is the Capture document ID.

ProcessorSeparatorPage separator

Separator page of this document. This applies to the "One Document" and "multiple page document with separator" organization types.

List<ProcessorSeparatorPage> hierarchySeparators

Separator pages for this document. This applies to the "multiple pages with hierarchy separator" organization type.

List<IndexValue> indexValues

List of metadata names and values.
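
The `failureStatus` codes can be turned into readable text when a script logs or annotates a document. The following sketch assumes JavaScript as the script language and a plain object standing in for ProcessorDocument; `describeFailure` is a hypothetical helper, and the messages are shortened forms of the descriptions above.

```javascript
// Short messages keyed by the documented failureStatus values.
var FAILURE_MESSAGES = {
    0: "no error",
    1: "failed to validate bar code",
    2: "document exceeded maximum page rule",
    3: "unable to determine document type",
    4: "no database search result found"
};

// Returns a short description of the document's failure status,
// appending the document comment when one is present.
function describeFailure(doc) {
    var message = FAILURE_MESSAGES[doc.failureStatus];
    if (message === undefined) {
        message = "unknown status " + doc.failureStatus;
    }
    if (doc.comment) {
        message += " (" + doc.comment + ")";
    }
    return message;
}
```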


5.3 Script Methods

This section describes script methods, which include the following:

5.3.1 initialize (RecognitionProcessorContext rpc)

This is the very first call the recognition processor makes to the script. No batch has been identified yet.

Properties populated in rpc are:

  • logger: Logger that can be used to log additional entries. This property remains populated for the entire process and is not listed again for each method.

  • job: Current recognition job. This property remains populated for the entire process and is not listed again for each method.

  • workspaceEntity: Current workspace entity. This property remains populated for the entire process and is not listed again for each method.

  • phaseID: 0

5.3.2 processBatch (RecognitionProcessorContext rpc)

This is called before the Recognition Processor processes the batch.

  • phaseID: 0

  • batchLockEntity: At this point, the recognition processor has refreshed the document list for the batch. This property remains populated for the remainder of the process and is not listed again for the remaining methods.

  • cancelAction: The user can set the flag to true to skip processing of a batch.
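
As an illustration of the `cancelAction` flag, the following sketch skips a batch based on a script-defined rule. It assumes JavaScript as the script language and plain property access on the context. The `shouldSkipBatch` rule and the `batchName` field are hypothetical, since the batch entity's properties are not described in this chapter, and the logger is assumed to offer an `info` method.

```javascript
// Hypothetical rule supplied by the script author. 'batchName' is an
// assumed field on the batch object, used here only for illustration.
function shouldSkipBatch(batch) {
    return batch != null && /^TEST_/.test(batch.batchName);
}

function processBatch(rpc) {
    if (shouldSkipBatch(rpc.batchLockEntity)) {
        rpc.logger.info("Skipping batch by script rule");
        // Setting the flag asks the processor to skip this batch.
        rpc.cancelAction = true;
    }
}
```

The flag is left untouched for batches that should be processed normally.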

5.3.3 restoreCaptureBatch (RecognitionProcessorContext rpc)

This method applies when the following conditions occur:

  • The job organization type requires Capture batch.

  • There is only one image document.

  • This batch has more than one document.

  • The batch state indicates that the processor failed at the document creation phase.

In this case, the processor first verifies that neither the batch nor the job has been modified since the last process, and then attempts to restore the batch to its original state by removing the documents previously created by the recognition process. This method is invoked before the batch is restored, which allows the user to cancel the restore and make the batch invalid for processing.

  • phaseID: 0

  • cancelAction: You can set the flag to true to skip restoring the batch; the processor then skips processing this batch.

5.3.4 beginPhase (RecognitionProcessorContext rpc)

This indicates the beginning of a phase.

  • phaseID: There are six different phases (see RecognitionProcessorContext phaseID for detail).

  • cancelAction: You can set the flag to true to skip certain phases. For phases that cannot be skipped, this flag is ignored.

    • Phases that can be canceled are: bar code recognition, document classification, and indexing.

    • Phases that cannot be canceled are: document organization, document creation, and post-processing.
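
The rules above can be sketched as a `beginPhase` handler that only requests cancellation for phases that allow it. This is an illustration assuming JavaScript as the script language and plain property access on the context; `phasesToSkip` is a hypothetical script-level setting.

```javascript
// Phase IDs that can be canceled: bar code recognition (1),
// document classification (3), and indexing (4).
var CANCELABLE_PHASES = [1, 3, 4];

// Hypothetical script setting: phases this script wants to skip.
var phasesToSkip = [4, 5];

function beginPhase(rpc) {
    var wanted = phasesToSkip.indexOf(rpc.phaseID) !== -1;
    var allowed = CANCELABLE_PHASES.indexOf(rpc.phaseID) !== -1;
    if (wanted && allowed) {
        rpc.cancelAction = true;
    }
}
```

Here phase 5 (document creation) is listed in `phasesToSkip` but is never flagged, since the processor ignores the flag for that phase anyway.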

5.3.5 endPhase (RecognitionProcessorContext rpc)

This indicates the end of a phase.

  • phaseID: There are six different phases (see RecognitionProcessorContext phaseID for detail).

5.3.6 extractBatchItem(RecognitionProcessorContext rpc)

This happens during the bar code recognition phase. The processor extracts batch items one at a time into a directory right before it performs bar code recognition on the page. Then the processor informs the user where the items are.

  • phaseID: 1

  • extractPath: The directory where the batch items are located.

5.3.7 barcodesFoundOnItem(RecognitionProcessorContext rpc)

This is invoked after the processor has processed the batch item and collected the bar codes recognized on it.

  • phaseID: 1

  • batchItem: Current batch item that is used to perform bar code recognition.

  • patchCodeRead: Patch code value found on the batch item.

  • barcodesRead: A combination of bar codes read on the page and existing bar codes on the batch item.

5.3.8 batchItemAllValidBarcodes (RecognitionProcessorContext rpc)

This occurs after the recognition processor has finished validating bar codes on a specific batch item.

  • phaseID: 2

  • batchItem: Current batch item that is used to perform bar code validation.

  • validBarcodes: A list of name/value pairs of the valid bar codes found on the batch item. This list includes all bar code definitions in the recognition job. You can change a value, but you should not change a name, or add or remove items from the list.
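
The following sketch shows a handler that respects these constraints: it normalizes each value in place and leaves names and list membership alone. It assumes JavaScript as the script language, plain property access, and a JavaScript array standing in for the list of ProcessorItem objects; the trimming and uppercasing are only an example of a value change.

```javascript
function batchItemAllValidBarcodes(rpc) {
    var items = rpc.validBarcodes;
    for (var i = 0; i < items.length; i++) {
        // Change values only; names and list size stay as they are.
        if (items[i].value != null) {
            items[i].value = String(items[i].value).trim().toUpperCase();
        }
    }
}
```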

5.3.9 determineSeparatorPage(RecognitionProcessorContext rpc)

This occurs after the processor has validated whether this page is a separator. This method is only invoked when a separator is defined for a recognition job. Properties used in this call are:

  • phaseID: 2

  • batchItem: Current batch item being evaluated as a possible separator.

  • validBarcodes: A list of name/value pairs of the valid bar codes found on the batch item. This list includes all bar code definitions in the recognition job.

  • separator: This object is null unless this batch item is a valid separator. If you want to make changes, you need to either set it to null, or populate it with valid data.

Level is used for the hierarchy separator page type only. For the other organization types, this value is ignored. Level should begin with 1.

You can change the level determined by the processor. However, if the level does not fit into a recognition job definition, the processor uses either the highest level (level<=0) or lowest level (level>=max defined level).
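
The adjustment described above can be sketched as a small function. This is an illustration only, assuming JavaScript as the script language, that the highest level is level 1, and that the lowest level equals the maximum defined level; `effectiveLevel` and `maxLevel` are hypothetical names, and the processor's actual handling may differ in detail.

```javascript
// Returns the level the processor would be expected to use for a
// script-supplied level, given the maximum level defined in the job.
function effectiveLevel(level, maxLevel) {
    if (level <= 0) {
        return 1;          // out of range low: highest level
    }
    if (level >= maxLevel) {
        return maxLevel;   // at or beyond the maximum: lowest level
    }
    return level;
}
```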

5.3.10 batchItemValidBarcode (RecognitionProcessorContext rpc)

This method passes in one valid bar code recognized on this batch item. This call only happens when the batch organization type is bar code on every page and optimized bar code recognition is turned on.

When the processor cannot find a bar code on a page, it will try to determine the separator bar code value on the next page. This is called right after the processor has determined the bar code value on the next page.

  • phaseID: 2

  • batchItem: Batch item for the next page, which is used to determine the separator bar code value.

  • validBarcode: Name/value pair for the separator bar code. The user can change the value if needed.

5.3.11 determineDocType(RecognitionProcessorContext rpc)

This is called after the recognition processor has identified a document type as either the default document type or one of the dynamic document type mappings. docTypeID can be null if the processor is unable to identify it.

  • phaseID: 3

  • document: Contains the current document information. Some properties are specific to certain organization types. The docTypeID needs to be examined here, and changed if needed.
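
A handler for this event might examine `docTypeID` and supply a value when the processor could not determine one. The following is a minimal sketch, assuming JavaScript as the script language and plain property access on the context and document; `FALLBACK_DOC_TYPE_ID` and its value are hypothetical.

```javascript
// Hypothetical document type ID chosen by the script author.
var FALLBACK_DOC_TYPE_ID = "FALLBACK_TYPE";

function determineDocType(rpc) {
    var doc = rpc.document;
    if (doc.docTypeID == null) {
        // The processor could not identify a type; assign one and
        // record why in the document comment.
        doc.docTypeID = FALLBACK_DOC_TYPE_ID;
        doc.comment = "Document type assigned by script fallback";
    }
}
```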

5.3.12 beginDatabaseLookup(RecognitionProcessorContext rpc)

This is called after the recognition processor has determined the lookup value, and before the actual execution of the lookup is called.

  • phaseID: 4

  • dbLookupValue: The user can modify the lookup value.

  • cancelAction: The user can cancel the lookup.
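
Both properties can be used together, for example to clean up the lookup value and skip the lookup when nothing usable remains. The following sketch assumes JavaScript as the script language and plain property access on the context; trimming whitespace is only an example of a modification.

```javascript
function beginDatabaseLookup(rpc) {
    var value = rpc.dbLookupValue == null ? "" : String(rpc.dbLookupValue).trim();
    if (value === "") {
        // Nothing to search for: ask the processor to skip the lookup.
        rpc.cancelAction = true;
    } else {
        rpc.dbLookupValue = value;
    }
}
```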

5.3.13 determineIndexValues(RecognitionProcessorContext rpc)

This is called after the Recognition Processor has determined all metadata values for a particular processor document. The user gets a chance to modify values.

  • phaseID: 4

  • document: Contains the current document information. Some properties are specific to certain organization types. The indexValues property needs to be examined here, and changed if needed.

5.3.14 renameOrigCaptureDocTitle (RecognitionProcessorContext rpc)

This is called before the processor renames the original document into "unindexed." This applies to all batch organization types except the "One Document Only" type.

  • phaseID: 5

  • unIndexedDocTitle: The user can change the title.

5.3.15 createCaptureDoc(RecognitionProcessorContext rpc)

Before the processor creates the Capture document, it is possible to customize the document title, document type ID, metadata values, and document comments. You can also change the batch items associated with this document, although in the case of the "One Document Only" organization type, changing batch items does not affect the outcome.

  • phaseID: 5

  • document: Capture document that the processor is about to create.
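
As an example of customizing the title, the following sketch builds it from one of the document's valid bar codes. It assumes JavaScript as the script language, plain property access, and a JavaScript array standing in for `validBarcodes`. The bar code definition name `InvoiceNumber`, the title format, and the `findBarcodeValue` helper are hypothetical.

```javascript
// Hypothetical bar code definition name used for the title.
var TITLE_BARCODE_NAME = "InvoiceNumber";

// Returns the value of the first item with the given name, or null.
function findBarcodeValue(items, name) {
    for (var i = 0; i < items.length; i++) {
        if (items[i].name === name) {
            return items[i].value;
        }
    }
    return null;
}

function createCaptureDoc(rpc) {
    var doc = rpc.document;
    var value = findBarcodeValue(doc.validBarcodes, TITLE_BARCODE_NAME);
    if (value != null && value !== "") {
        doc.title = "Invoice " + value;
    }
}
```

When the bar code is missing or empty, the title determined by the processor is left unchanged.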

5.3.16 postProcess(PostProcessContext postContext)

This is invoked after the Recognition Processor has determined all post-process settings, but before any actual changes take place.

  • postContext: The user can make changes to affect the post-process outcome, if needed.
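
The following sketch shows two adjustments a `postProcess` handler might make, based on the PostProcessContext property descriptions: setting a priority only when it falls in the valid range (0 to 10), and supplying a message body when recipients are present, since an empty message means no email is sent. It assumes JavaScript as the script language and plain property access; `REQUESTED_PRIORITY`, `isValidPriority`, and the message text are hypothetical.

```javascript
// Hypothetical priority the script would like to assign.
var REQUESTED_PRIORITY = 8;

// A priority below 0 or above 10 is documented as not valid.
function isValidPriority(priority) {
    return typeof priority === "number" && priority >= 0 && priority <= 10;
}

function postProcess(postContext) {
    if (isValidPriority(REQUESTED_PRIORITY)) {
        postContext.priority = REQUESTED_PRIORITY;
    }
    var recipients = postContext.emailRecipients;
    var hasRecipients = recipients != null && recipients.length > 0;
    if (hasRecipients && !postContext.emailMessage) {
        // Without a message body, no notification would be sent.
        postContext.emailMessage = "Recognition processing finished.";
    }
}
```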

5.3.17 endBatchProcess()

This indicates that the Recognition Processor has finished processing the batch.