Data sources and manipulators must be thread safe.
The stop() method can be called concurrently while any of the following methods is running:
- DataSourceRuntime.runFullAcquisition()
- ManipulatorRuntime.processRecord()
- ManipulatorRuntime.onInputClose()
- IncrementalDataSourceRuntime.runIncrementalAcquisition()
Recommendations for data sources
The requirement to be thread safe has a few implementation
implications for data sources:
- Any state that runFullAcquisition() shares with stop() must be synchronized. State may also be shared with checkFullAcquisitionRequired() and the binary content interfaces (BinaryContentFileProvider and BinaryContentInputStreamProvider).
- If you support text extraction by implementing either the BinaryContentFileProvider interface or the BinaryContentInputStreamProvider interface, the data source must be thread safe because IAS Server calls BinaryContentFileProvider.getBinaryContentFile() or BinaryContentInputStreamProvider.getBinaryContentInputStream() from multiple threads.
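As a minimal sketch of the first recommendation, the class below shares a stop flag between an acquisition loop and stop(). The class name StoppableAcquisitionSketch, its method signatures, and the use of strings as records are assumptions made for illustration; it does not implement the real IAS interfaces.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a data source runtime. It does not implement
// the real IAS interfaces, whose exact signatures are not shown here.
class StoppableAcquisitionSketch {
    // Shared between runFullAcquisition() and stop(). An AtomicBoolean
    // makes the write in stop() visible to the acquisition thread.
    private final AtomicBoolean stopRequested = new AtomicBoolean(false);
    private final AtomicInteger emitted = new AtomicInteger();

    // Stand-in for the acquisition loop: checks the flag before each record.
    void runFullAcquisition(List<String> records) {
        for (String record : records) {
            if (stopRequested.get()) {
                return; // stop() was called, possibly from another thread
            }
            emitted.incrementAndGet(); // placeholder for emitting the record
        }
    }

    // May be called concurrently with runFullAcquisition().
    void stop() {
        stopRequested.set(true);
    }

    int emittedCount() {
        return emitted.get();
    }
}
```

An atomic flag is enough here because the only shared state is a single boolean. State made of several related fields would more likely need a lock.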
Recommendations for manipulators
The requirement to be thread safe has a few implementation
implications for manipulators:
- If possible, use only local variables or final immutable fields.
- Persist internal state across calls to processRecord() or onInputClose() only if it is absolutely necessary. If it is necessary, access state in a synchronized way.
For optimal performance, minimize the time you hold locks in processRecord().
Manipulators should not hold locks when calling OutputChannel.output() from processRecord(). The call to output() may take a while to return, and a lock held across it blocks other threads that are concurrently calling processRecord(). One way of holding locks is to declare a method with the Java synchronized keyword. However, synchronizing processRecord() adversely affects performance: it effectively makes the manipulator single threaded by preventing other threads from entering processRecord().
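The locking advice above can be sketched as follows. The example assumes a hypothetical SketchOutputChannel interface and string records in place of the real OutputChannel and record types, whose signatures are not shown in this document. The lock guards only the counter update and is released before the output call.

```java
// Hypothetical stand-in for the output channel. The real
// OutputChannel.output() signature is not shown in this document.
interface SketchOutputChannel {
    void output(String record);
}

// Hypothetical manipulator runtime that numbers the records it sees.
class CountingManipulatorSketch {
    private final Object lock = new Object();
    private final SketchOutputChannel channel;
    private long recordsSeen; // shared state, guarded by lock

    CountingManipulatorSketch(SketchOutputChannel channel) {
        this.channel = channel;
    }

    // Not declared synchronized: only the counter update holds the lock.
    void processRecord(String record) {
        long sequence;
        synchronized (lock) {
            sequence = ++recordsSeen; // short critical section
        }
        // The lock is released before the potentially slow output call,
        // so other threads can enter processRecord() in the meantime.
        channel.output(sequence + ":" + record);
    }

    long recordsSeen() {
        synchronized (lock) {
            return recordsSeen;
        }
    }
}
```

Copying the counter into a local variable inside the critical section lets the output call use a consistent value without holding the lock.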
Configuration and context synchronization
As part of the implementation of an extension, the IAS Server passes a PipelineComponentConfiguration object and a PipelineComponentRuntimeContext object to either DataSource.createDataSourceRuntime() (in the case of data sources) or Manipulator.createManipulatorRuntime() (in the case of manipulators). The IAS Server does not modify the PipelineComponentConfiguration after createManipulatorRuntime() or createDataSourceRuntime() has been called.
When the IAS Server runs an acquisition, the PipelineComponentRuntimeContext and everything accessible from it are thread safe.
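One way a runtime might rely on this guarantee is to keep the configuration in a final field and read it without locking. The sketch below uses hypothetical SketchConfiguration and SketchRuntime classes, with an assumed "prefix" setting. They are stand-ins, not the real PipelineComponentConfiguration or runtime interfaces.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for PipelineComponentConfiguration. The real
// class and its accessors are not shown in this document.
class SketchConfiguration {
    private final Map<String, String> values;

    SketchConfiguration(Map<String, String> values) {
        // Defensive copy so later changes to the caller's map are not seen.
        this.values = Collections.unmodifiableMap(new HashMap<>(values));
    }

    String get(String key) {
        return values.get(key);
    }
}

// Hypothetical runtime that receives its configuration at creation time.
class SketchRuntime {
    // Final field: assigned once in the constructor and never reassigned.
    private final SketchConfiguration config;

    SketchRuntime(SketchConfiguration config) {
        this.config = config;
    }

    // Reads the configuration without locking. This relies on the
    // configuration not changing after the runtime is created.
    String prefixed(String record) {
        return config.get("prefix") + record;
    }
}
```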