Creating a runtime class for a data source

The DataSourceRuntime is the runtime representation of a data source instance. It is created by DataSource.createDataSourceRuntime() and exists for the life span of the data source.

IAS Server creates a PipelineComponentRuntimeContext object and passes it to DataSource.createDataSourceRuntime(). The PipelineComponentRuntimeContext specifies an output channel, an error channel, a state directory, and several other runtime properties.

The ErrorChannel.discard() method discards invalid records during record acquisition. The ErrorChannel class also processes exceptions that you catch: it increments the appropriate metric for the record and logs the record in the ias-service.log file. The ErrorChannel logs events at level WARN and higher.
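
The catch-and-discard pattern described above can be sketched as follows. This is a minimal, self-contained illustration and not the IAS Extension API: the ErrorChannel and OutputChannel interfaces below are hypothetical stand-ins with assumed signatures, and the parse() and acquire() helpers are invented for the example. Check the Javadoc for the real method signatures before adapting it.

```java
import java.util.ArrayList;
import java.util.List;

public class DiscardSketch {
    /** Hypothetical stand-in for the IAS error channel; the signature is an assumption. */
    interface ErrorChannel {
        void discard(String rawRecord, Exception cause);
    }

    /** Hypothetical stand-in for the IAS output channel; the signature is an assumption. */
    interface OutputChannel {
        void output(String[] fields);
    }

    /** Splits one CSV line and rejects it if the column count is wrong. */
    static String[] parse(String line, int expectedColumns) {
        String[] fields = line.split(",", -1);
        if (fields.length != expectedColumns) {
            throw new IllegalArgumentException(
                "expected " + expectedColumns + " columns but found " + fields.length);
        }
        return fields;
    }

    /** Outputs every valid line and routes every invalid line to the error channel. */
    static void acquire(List<String> lines, int expectedColumns,
                        OutputChannel out, ErrorChannel err) {
        for (String line : lines) {
            try {
                out.output(parse(line, expectedColumns));
            } catch (IllegalArgumentException e) {
                // One bad record is discarded; acquisition continues with the next line.
                err.discard(line, e);
            }
        }
    }

    public static void main(String[] args) {
        List<String[]> emitted = new ArrayList<>();
        List<String> discarded = new ArrayList<>();
        acquire(List.of("1,apple", "broken", "2,pear"), 2,
                emitted::add, (raw, cause) -> discarded.add(raw));
        System.out.println(emitted.size() + " output, " + discarded.size() + " discarded");
        // prints "2 output, 1 discarded"
    }
}
```

Catching the exception inside the loop keeps a single malformed record from aborting the whole acquisition, which is the behavior the discard mechanism is meant to support.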

To create a runtime class for a data source:

  1. In the Java project that contains the DataSource implementation, create a subclass of DataSourceRuntime.
    For example:
    public class CsvDataSourceRuntime extends DataSourceRuntime {
    
    }
  2. Implement the DataSourceRuntime constructor.
  3. Implement the abstract method runFullAcquisition() to define how to acquire content from the data source. The implementation depends on your custom data source.
  4. Within your implementation of runFullAcquisition(), call OutputChannel.output() for each record that has been processed, and call ErrorChannel.discard() as necessary to discard any records that are invalid or have errors.
  5. Optionally, implement either the BinaryContentFileProvider interface or the BinaryContentInputStreamProvider interface if the data source needs to support text extraction. For guidance, see Supporting document conversion in a data source.
  6. Optionally, implement the IncrementalDataSourceRuntime interface to calculate the changes in your data source, rather than have the Integrator Acquisition System determine the changes for you. For guidance, see Supporting incremental acquisition in a data source.
  7. Optionally, handle requests to stop an acquisition by providing a mechanism to stop an extension's runtime object in a timely way. This may include polling PipelineComponentRuntimeContext.isStopped() and may include overriding PipelineComponentRuntime.stop(). For guidance, see Stopping an extension when an acquisition stops.
  8. Optionally, override PipelineComponentRuntime.endAcquisition() to clean up any resources used by PipelineComponentRuntime. For guidance, see Cleaning up resources used by an extension.
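
The stop-handling and cleanup steps above can be sketched as follows. This is a self-contained illustration under stated assumptions, not the IAS Extension API: RuntimeContext stands in for PipelineComponentRuntimeContext and models only isStopped(), and CsvRuntime stands in for a DataSourceRuntime subclass. In a real extension, IAS Server rather than a main() method would drive these calls.

```java
import java.util.Iterator;
import java.util.List;

public class RuntimeLifecycleSketch {
    /** Hypothetical stand-in for PipelineComponentRuntimeContext; only isStopped() is modeled. */
    interface RuntimeContext {
        boolean isStopped();
    }

    /** Hypothetical stand-in for a custom DataSourceRuntime subclass. */
    static class CsvRuntime {
        private final RuntimeContext context;
        private final Iterator<String> source;
        private boolean resourcesOpen = true;
        private int processed = 0;

        CsvRuntime(RuntimeContext context, List<String> rows) {
            this.context = context;
            this.source = rows.iterator();
        }

        /** Processes rows one at a time, polling for a stop request before each row. */
        void runFullAcquisition() {
            while (source.hasNext() && !context.isStopped()) {
                source.next(); // a real runtime would build and output a record here
                processed++;
            }
        }

        /** Releases whatever the runtime opened; here, just a flag. */
        void endAcquisition() {
            resourcesOpen = false;
        }

        int processedCount() { return processed; }
        boolean hasOpenResources() { return resourcesOpen; }
    }

    public static void main(String[] args) {
        int[] polls = {0};
        // This context reports "stopped" from the third poll onward.
        CsvRuntime runtime = new CsvRuntime(() -> ++polls[0] > 2, List.of("a", "b", "c", "d"));
        runtime.runFullAcquisition();
        runtime.endAcquisition();
        System.out.println(runtime.processedCount() + " rows processed before stop");
        // prints "2 rows processed before stop"
    }
}
```

Polling once per record keeps the stop latency bounded by the time needed to process a single record. A runtime that blocks on slow I/O may need a different mechanism, such as the stop() override mentioned in step 7.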

Example of a data source runtime

To see many of the steps above implemented, refer to the sample data source extension in <install path>\IAS\<version>\sample\ias-extensions\src\main\com\endeca\ias\extension\sample\datasource\csv.