The Input Agent is an Imaging service used to upload and index documents in bulk into the Imaging system.
Input Agent indexes Imaging documents in bulk by using an application definition, input definition, and a specially formatted text file called an input file. The input file specifies the list of images to index and the metadata to associate with them in the application. Input Agent is multithreaded and is configurable to handle large and small volumes of data.
To configure the Input Agent, do the following:
Note:
To process input files, the Input Agent needs the appropriate permissions on the input directory, and the input directory must allow file locking. The user account that runs the WebLogic Server service must have read and write privileges to the input directory and to all files and subdirectories in it, so that Input Agent can move the files between directories as it works on them. Input Agent needs file locking on the share to coordinate actions between servers in the cluster.
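Before pointing Input Agent at an input directory, it can help to confirm the read and write permissions described above. The following Python sketch is an illustration, not an Oracle-provided tool: `find_access_problems` is a hypothetical helper that walks the input directory and reports every path the current account cannot both read and write. Run it as the account that runs the WebLogic Server service. It checks permission bits only and does not test whether the share supports file locking.

```python
import os
from pathlib import Path


def find_access_problems(input_dir):
    """Return paths under input_dir that the current account cannot read and write.

    Checks the input directory itself plus every file and subdirectory below it.
    Uses permission checks only (os.access); file locking is not tested.
    """
    root = Path(input_dir)
    problems = []
    for path in [root, *root.rglob("*")]:
        if not os.access(path, os.R_OK | os.W_OK):
            problems.append(str(path))
    return problems
```

An empty list means every path passed the check; any returned path needs its privileges adjusted for the service account.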
After completing these steps, the Input Agent is active and ready to process work. Once you create an application (see Creating An Application) and input definitions (see Creating Input Definitions), the Input Agent will start processing jobs.
The Input Agent performs work based on input files. These are simple text documents, similar to CSV (comma-separated values) files, that list files and their associated metadata to index into the Imaging system. The input file can use different encodings, as long as the correct encoding is specified in the input definition. Input Agent processes every input file that matches the input mask of the input definition; it does not process the sample file used to define the input definition. Sample files are not required when creating an input through the API. They are used only when creating an input through the user interface, so that a user can see the data when choosing the columns.
WARNING:
Input file masks must be unique within the Imaging system and must not overlap. Input Agent processes an input file for only one input and does not restage a file to be processed again for a different input definition. Inputs are processed in random order, so there is no way to know which input will pick up a shared input file.
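Because overlapping masks make it unpredictable which input picks up a file, it can be worth checking a set of masks against the file names you expect to drop into the input directory. This Python sketch assumes shell-style wildcard masks evaluated with the standard `fnmatch` module, which may not match exactly how Imaging evaluates masks; `files_matching_multiple_masks`, the input definition names, and the masks are hypothetical.

```python
from fnmatch import fnmatch


def files_matching_multiple_masks(filenames, masks):
    """Return {filename: [definition names]} for files claimed by more than one mask.

    masks maps a (hypothetical) input definition name to a wildcard mask such
    as "*.txt". Files matched by zero or one mask are not reported.
    """
    conflicts = {}
    for name in filenames:
        owners = [definition for definition, mask in masks.items() if fnmatch(name, mask)]
        if len(owners) > 1:
            conflicts[name] = owners
    return conflicts
```

For example, masks `inv*.txt` and `*.txt` both claim `inv001.txt`, so the sketch reports that file against both definitions.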
A sample input file looks like this:

```
C:\IPMData\Input Files\print\NewPrintstreams\doc16.txt|NEW ORDER|10/06/94|B82L|218482
C:\IPMData\Input Files\print\NewPrintstreams\doc17.txt|NEW ORDER|10/06/94|N71H|007124
C:\IPMData\Input Files\print\NewPrintstreams\doc18.txt|NEW ORDER|10/06/94|B83W|24710
```
The detailed structure of an input file is defined as:
[path to document file][delimiter][metadata value 1]<[delimiter]<metadata value 2> ... <delimiter>>
Items in brackets ([]) are required and items in angle brackets (<>) are optional.
path to document file
is the location of the TIFF, JPEG, DOC, or other file being saved to Imaging. It must be a path that is accessible to the user account running the Input Agent.
delimiter
is the character that separates the values from one another, such as the | character.
metadata value x
are the index values that the application uses to index the document.
The delimiter character must be the same character throughout the entire input file and match what is specified in the input definition. The default is a pipe character (|).
One metadata value is required for each required field in the application. For example, if the Name and Date fields are both marked as Required in an application, the input file must supply values for both fields. Additional values are optional, but they must continue to follow the [delimiter]<metadata value> format.
There is no length restriction per line, but all metadata pertaining to the file must be on a single line because the newline character specifies the start of a new document.
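When input files are generated by another system, these rules (one document per line, one delimiter throughout) are easy to enforce in code. The following Python sketch builds lines in the format of the sample input file and rejects values that contain the delimiter or a line break, since this section describes no escaping mechanism. `build_input_line` and `write_input_file` are hypothetical helpers; the encoding passed to `write_input_file` must match the encoding specified in the input definition.

```python
def build_input_line(path, values, delimiter="|"):
    """Join a document path and its metadata values into one input-file line.

    Raises ValueError for a value containing the delimiter (it would split into
    two columns) or a line break (it would start a new document).
    """
    fields = [str(path)] + [str(v) for v in values]
    for field in fields:
        if delimiter in field:
            raise ValueError(f"value contains the delimiter {delimiter!r}: {field!r}")
        if "\n" in field or "\r" in field:
            raise ValueError(f"value contains a line break: {field!r}")
    return delimiter.join(fields)


def write_input_file(target, rows, delimiter="|", encoding="utf-8"):
    """Write (path, values) rows to target, one document per line."""
    with open(target, "w", encoding=encoding) as f:
        for path, values in rows:
            f.write(build_input_line(path, values, delimiter) + "\n")
```

Validating before writing keeps a bad value from shifting every later column on its line.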
Values are separated by the delimiter, and the Input Agent treats the delimited values as Column 1 ... Column N. Commands on the line do not count as columns. See Using Input Filing Commands.
Columns in the input file need not match the ordering of the Application, but they must be in the same locations as specified in the input definition to be indexed correctly.
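The column mapping can be illustrated with a short Python sketch. It assumes the document path counts as Column 1 and models the input definition as a hypothetical dictionary from 1-based column number to application field name; a real input definition is configured in Imaging, not in code. A leading `@New` command is removed before columns are counted, since commands do not count as columns. Other commands take different arguments and are not handled here.

```python
def parse_input_line(line, column_map, delimiter="|"):
    """Split one input-file line and map its columns to application fields.

    column_map: hypothetical {1-based column number: field name}, in any order.
    """
    parts = line.rstrip("\r\n").split(delimiter)
    if parts and parts[0] == "@New":
        parts = parts[1:]  # the command is not a column; the rest is Column 1..N
    fields = {}
    for column, field_name in column_map.items():
        if column < 1 or column > len(parts):
            raise ValueError(f"line has no column {column} for field {field_name!r}")
        fields[field_name] = parts[column - 1]
    return fields
```

Because the map is keyed by column number, the application's field order does not need to match the file's column order.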
Note:
Dates and times specified in the input file are subject to the current Daylight Saving Time (DST) rules, not the DST rules in effect on the specified date. This can shift the timestamp of the document forward or back by up to two hours. If the timestamp shifts across midnight, the date used for the document input may also change.
Users gain more control over the filing process by inserting special command sequences in the input file. An input definition applies to all files, but commands can be inserted in the input file as needed and can change from file to file, making it possible to set a specific behavior per file, such as the file locale for changing date formats or numeric display.
These commands can be used for processing the entire input file or just a single row of the file, depending on the command. The details of the individual commands are specified below.
The locale command changes the locale that the agent uses to parse the data. This command can be used only once, at the beginning of the input file before any documents are specified. If the command is used after data has been processed, an error occurs and the filing stops.
Syntax
@Locale[delimiter][locale]
Example
@Locale|es-es
Notes
This command can only be used at the very beginning of the input file and applies to the whole file. If multiple locales are needed, the data must be separated into different files. The delimiter must be the same as the one used throughout the input file. The locale uses the format ISO language code, hyphen, ISO country code, as in es-es.
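The placement rules for `@Locale` can be checked before a file is dropped into the input directory. In this Python sketch, `read_locale` is a hypothetical helper that rejects `@Locale` anywhere but the first line and accepts only a two-letter code, a hyphen, and a two-letter code. That pattern is an assumption based on the `es-es` example; the sketch does not verify that the codes are valid ISO codes.

```python
import re

# Assumed shape: two-letter language code, hyphen, two-letter country code.
LOCALE_PATTERN = re.compile(r"^[A-Za-z]{2}-[A-Za-z]{2}$")


def read_locale(lines, delimiter="|"):
    """Return the locale set by @Locale, or None if the file does not set one.

    Raises ValueError if @Locale appears after the first line or is malformed.
    """
    locale = None
    for index, line in enumerate(lines):
        parts = line.rstrip("\r\n").split(delimiter)
        if parts[0] != "@Locale":
            continue
        if index != 0:
            raise ValueError(f"@Locale on line {index + 1}; it must be the first line")
        if len(parts) != 2 or not LOCALE_PATTERN.match(parts[1]):
            raise ValueError(f"malformed @Locale command: {line!r}")
        locale = parts[1]
    return locale
```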
The new command creates a new document in the Imaging system and behaves the same as leaving the index data on a line by itself. The command applies only to the line on which it appears and resets on the next line.
Syntax
@New[delimiter][line data]
Line Data: The metadata values for the document as would exist on a typical input file.
Example
@New|TestTiff.TIF|98.765|Good Company LTD|10/08/2003|0000|1.733,12|10/09/2003
Notes
The @New at the beginning of the line is not counted as one of the columns to be mapped.
The supporting content command allows the user to apply a file as supporting content to a document instead of creating a new document. The content is applied to the last new document line that appears in the input file unless an explicit document ID is specified in the command. If the last new document fails to index then the supporting content command also fails since the intended document to add content to doesn't exist.
Syntax
@Support[delimiter][key][delimiter][content path]<[delimiter][document id]>
Key: The supporting content key to store the file under. It must be unique for the document.
Content Path: The path to the file to save as supporting content.
Document ID (optional): The Imaging document ID that the supporting content should be applied to. If this value is given then the previous new document is ignored and the supporting content is placed on the document ID given.
Example
@Support|supporting content key 1|C:\temp\sample.tif
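The rule for choosing which document receives supporting content can be sketched in Python. `resolve_support_target` is a hypothetical helper: it takes an `@Support` line and the ID produced by the most recent new document line (or `None` if that line failed to index) and returns the key, content path, and target document ID. An explicit document ID always takes precedence; otherwise a failed previous document makes the command fail.

```python
def resolve_support_target(command_line, last_new_document_id, delimiter="|"):
    """Return (key, content_path, document_id) for an @Support line.

    last_new_document_id is the ID from the most recent new document line,
    or None if that line failed to index.
    """
    parts = command_line.rstrip("\r\n").split(delimiter)
    if parts[0] != "@Support" or len(parts) not in (3, 4):
        raise ValueError(f"not a valid @Support command: {command_line!r}")
    key, content_path = parts[1], parts[2]
    if len(parts) == 4:
        # An explicit document ID overrides the previous new document.
        return key, content_path, parts[3]
    if last_new_document_id is None:
        raise LookupError("previous document failed to index; nothing to attach to")
    return key, content_path, last_new_document_id
```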
The apply annotations command applies a pre-generated annotation file to a document. The annotation is applied to the last new document line that appears in the input file unless an explicit document ID is specified in the command. If the last new document fails to index then the apply annotations command also fails since the intended document to apply annotations to doesn't exist.
Note that multiple annotation commands overwrite each other. They are not cumulative.
Note:
Use this command to apply annotations only when uploading new documents to Imaging using the Input Agent. It is not recommended to use this command to apply annotations to existing documents as it will overwrite any existing annotations associated with the document.
Syntax
@Annotation[delimiter][file path]<[delimiter][document id]>
File Path: The path to the annotation file to apply to the document.
Document ID (optional): The Imaging document ID that the annotation should be applied to. If this value is given, the previous new document is ignored and the annotation is applied to the given document ID.
Example
@Annotation|C:\temp\annot.xml
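Because annotation commands overwrite rather than accumulate, replaying a sequence of `@Annotation` lines leaves only the last annotation file per document. This Python sketch models that behavior, assuming each line is paired with the ID of the most recent new document (or `None` if it failed to index). `final_annotations` is a hypothetical helper, not part of Imaging.

```python
def final_annotations(commands, delimiter="|"):
    """Replay (line, last_new_document_id) pairs; return {document ID: annotation file}.

    A later @Annotation for the same document replaces the earlier one.
    Lines that are not @Annotation commands are ignored.
    """
    applied = {}
    for line, last_new_document_id in commands:
        parts = line.rstrip("\r\n").split(delimiter)
        if parts[0] != "@Annotation":
            continue
        # An explicit document ID (third field) overrides the previous new document.
        target = parts[2] if len(parts) > 2 else last_new_document_id
        if target is None:
            raise LookupError(f"no document to annotate for {line!r}")
        applied[target] = parts[1]
    return applied
```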
The workflow inject document command starts a workflow injection for the specified document ID. The command is intended only for use in the error file and is documented here for informational purposes.
Syntax
@WorkflowInjectDoc[delimiter][document id]
Document ID (required): The Imaging document ID to inject into workflow.
Example
@WorkflowInjectDoc|2.IPM_014404
This section describes how the Input Agent processes the input files.
The input directory specified in the configuration MBean is the top level of the directory structure. Below it, the Input Agent creates and manages the following directories to process its work:
```
Input
    Errors
    Holding
    Processed
        YYYY-MM-DD
    Samples
    Stage
```
Directory | Definition |
---|---|
Input | The top-level directory defined in the configuration MBean, where Input Agent looks for new input files. Multiple input directories can be defined in the MBean, and each one has this same structure below it. |
Errors | When an input file contains a mix of failed and successful index attempts, an error file for that filing is created in this directory. |
Holding | If the CleanupExpireDays and CleanupFileExclusionList MBeans are enabled, the holding directory stores every successfully processed file, including annotation and supporting content files. The files remain there until the number of days specified in the CleanupExpireDays setting elapses, after which the files and the batch folder are deleted. List any files that should not be deleted, by exact file name, in the CleanupFileExclusionList setting. |
YYYY-MM-DD | Date-named directories in the form year-month-day (such as 2009-04-01) that organize the input files by the date they were processed. This records when each file was processed and keeps any one directory from accumulating too many files. |
Processed | Files in this directory have been parsed all the way through the filing process. If an error occurred during processing, an error file is placed in the Errors directory and the original file is still placed in the Processed directory, even if no document was created in the Imaging system. |
Samples | Contains the sample files used when working with input objects through the user interface. Files in this directory are visible in the input wizard of the user interface and should not contain production data. The Samples directory location is configured separately from the input directories and need not be under the input directory. |
Stage | Files in this directory have been selected for processing and are being worked on by the agent. Once filing is complete, the file is moved to the Processed directory. If processing fails, an error report is generated. |
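To locate a processed input file from a script, the date-based layout can be expressed as a path calculation. This Python sketch assumes processed files sit directly under `Processed/YYYY-MM-DD` below the input directory; `processed_path` is a hypothetical helper. It only computes a path, since the Input Agent itself creates and manages these directories.

```python
from datetime import date
from pathlib import Path


def processed_path(input_dir, input_file_name, processed_on: date):
    """Return <input_dir>/Processed/YYYY-MM-DD/<input_file_name>.

    strftime("%Y-%m-%d") yields folder names such as 2009-04-01.
    """
    date_folder = processed_on.strftime("%Y-%m-%d")
    return Path(input_dir) / "Processed" / date_folder / input_file_name
```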
Input Agent polls for input files, stages them, and posts a message to the JMS queue that there are files available for processing. Input ingestors listen to the JMS queue and start processing staged files. The sequence of events is as follows:
First, Input Agent polls for files:
Once input ingestors are notified that there are files staged for processing in the JMS queue, they begin processing the files:
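The poll, stage, and post pattern described in this section can be illustrated with a small Python model. This is a sketch only: a `queue.Queue` stands in for the JMS queue, `poll_and_stage` and `ingestor` are hypothetical names, and the real agent also coordinates between cluster nodes through file locking, which this model omits. Staging moves each matching file into `Stage` before posting it, so a later poll cannot pick up the same file again.

```python
import shutil
from pathlib import Path


def poll_and_stage(input_dir, mask, work_queue):
    """Move files matching mask into <input_dir>/Stage and post each one to work_queue.

    Returns the list of staged paths, in sorted order.
    """
    root = Path(input_dir)
    stage = root / "Stage"
    stage.mkdir(exist_ok=True)
    staged = []
    for candidate in sorted(root.glob(mask)):
        if candidate.is_file():
            target = stage / candidate.name
            shutil.move(str(candidate), str(target))
            work_queue.put(target)  # stands in for posting a JMS message
            staged.append(target)
    return staged


def ingestor(work_queue, handle_file):
    """Process staged files from the queue until a None sentinel arrives."""
    while True:
        staged_file = work_queue.get()
        try:
            if staged_file is None:
                return
            handle_file(staged_file)
        finally:
            work_queue.task_done()
```

Several ingestor threads can share one queue; each exits when it receives a `None` sentinel.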
A work manager is an Oracle WebLogic Server concept for controlling how many threads are assigned to a process. In Imaging, work managers control how many threads are assigned to the Input Agents, which raises or lowers their load on the system. On a new installation, Input Agent is assigned 10 threads. To change how many threads Oracle WebLogic Server provides to the Input Agents, change the default setting of the WebLogic Server work manager InputAgentMaxThreadsConstraint (default 10) to match your system needs. Adjust the maximum number of threads equally on all systems to avoid one machine falling behind and creating a backlog. A value of -1 or 0 disables the constraint; a positive value limits the number of threads to that number.
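The effect of a max-threads constraint can be modeled in Python. The sketch below is not WebLogic code: `max_worker_threads` interprets a constraint value as described above (-1 or 0 means unconstrained), and `run_with_constraint` uses a semaphore so that no more than that many tasks run at once, returning the peak concurrency it observed. Both names are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def max_worker_threads(constraint_count):
    """Interpret a max-threads constraint: -1 or 0 disables it (returns None)."""
    if constraint_count in (-1, 0):
        return None
    if constraint_count < -1:
        raise ValueError("constraint must be -1, 0, or a positive thread count")
    return constraint_count


def run_with_constraint(tasks, constraint_count):
    """Run each callable in tasks, allowing at most the constrained number at once.

    Returns the peak number of tasks observed running concurrently.
    """
    limit = max_worker_threads(constraint_count)
    gate = threading.BoundedSemaphore(limit) if limit is not None else None
    lock = threading.Lock()
    running = 0
    peak = 0

    def guarded(task):
        nonlocal running, peak
        if gate is not None:
            gate.acquire()
        try:
            with lock:
                running += 1
                peak = max(peak, running)
            try:
                task()
            finally:
                with lock:
                    running -= 1
        finally:
            if gate is not None:
                gate.release()

    # One pool thread per task, so only the semaphore limits concurrency.
    with ThreadPoolExecutor(max_workers=max(len(tasks), 1)) as pool:
        list(pool.map(guarded, tasks))
    return peak
```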
To update thread settings, complete the following steps from the WebLogic Server administration console. For more information about WebLogic Server, see Administering Server Environments for Oracle WebLogic Server.
Input Agent has a retry mechanism to allow it to reattempt processing the input file in the event of a recoverable error. An example of this type of error is when the repository is not yet available and needs to finish initializing. When Input Agent detects a recoverable error, it puts the filing back on the JMS queue. The queue has a configurable retry wait timer that prevents the input file from being reprocessed immediately. You can also set the InputAgentRetryCount MBean to control how many times a job can be retried. The default is 3, after which the job is placed in the failed directory.
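The retry behavior can be sketched in Python. `RecoverableError` and `process_with_retries` are hypothetical names, and the one-second default wait is an arbitrary placeholder for the configurable retry wait timer. The sketch assumes the retry count is the number of retries after the first attempt, so a value of 3 allows four attempts in total, and it returns `"failed"` at the point where Input Agent would place the job in the failed directory.

```python
import time


class RecoverableError(Exception):
    """A transient failure, such as a repository that is still initializing."""


def process_with_retries(job, retry_count=3, retry_wait_seconds=1.0, sleep=time.sleep):
    """Run job(); on RecoverableError, wait and retry up to retry_count times.

    Returns "processed" on success or "failed" once retries are exhausted.
    Any other exception propagates immediately because it is not retryable.
    """
    retries = 0
    while True:
        try:
            job()
            return "processed"
        except RecoverableError:
            if retries >= retry_count:
                return "failed"
            retries += 1
            sleep(retry_wait_seconds)  # stands in for the queue's retry wait timer
```

Passing a no-op `sleep` makes the function easy to exercise without real delays.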
To troubleshoot any input file errors, do the following: