You add support for document conversion by making the DataSourceRuntime class implement either the BinaryContentFileProvider interface or the BinaryContentInputStreamProvider interface.
The BinaryContentFileProvider interface allows the extension to pass a file to IAS Server so that IAS Server can perform document conversion. The interface provides a getBinaryContentFile() method that takes a Record as input and uses a property on the Record to identify the file to read. IAS Server then either reads the file directly or, if local caching is enabled, caches the file locally and reads the cached copy.
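The following is a minimal sketch of the file-based approach, not the IAS sample code. To keep it compilable on its own, it declares small stand-in versions of Record and BinaryContentFileProvider; the real types come from the IAS Extension API, and their exact signatures may differ. The property name Example.File.Path, the DirectoryRuntime class name, and the choice to throw an exception when the property is missing are all assumptions made for illustration.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;

// Stand-in types only: the real Record and BinaryContentFileProvider ship with
// the IAS Extension API. They are redeclared here so the sketch compiles alone.
class FileProviderSketch {

    /** Simplified stand-in for an IAS Record: property name to a single value. */
    static final class Record {
        private final Map<String, String> properties = new HashMap<>();

        void setProperty(String name, String value) {
            properties.put(name, value);
        }

        String getProperty(String name) {
            return properties.get(name);
        }
    }

    /** Stand-in for the IAS interface; the method signature is assumed. */
    interface BinaryContentFileProvider {
        File getBinaryContentFile(Record record);
    }

    /** Hypothetical runtime that resolves the file from a property on the Record. */
    static final class DirectoryRuntime implements BinaryContentFileProvider {
        // Hypothetical property name; a real extension defines its own.
        static final String PATH_PROPERTY = "Example.File.Path";

        @Override
        public File getBinaryContentFile(Record record) {
            String path = record.getProperty(PATH_PROPERTY);
            if (path == null || path.isEmpty()) {
                // Assumed error handling; the real contract may differ.
                throw new IllegalArgumentException(
                        "Record has no " + PATH_PROPERTY + " property");
            }
            // Only identify the file. Reading, caching, and text extraction
            // are left to the server, as described above.
            return new File(path);
        }
    }

    public static void main(String[] args) {
        Record record = new Record();
        record.setProperty(DirectoryRuntime.PATH_PROPERTY, "docs/report.pdf");
        File file = new DirectoryRuntime().getBinaryContentFile(record);
        System.out.println(file.getName());
    }
}
```

Note that the method only maps a Record to a File; it does not open or read the file itself, which matches the division of work described above.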
The BinaryContentInputStreamProvider interface allows the extension to download binary content and expose it as an input stream so that IAS Server can read the stream and perform document conversion. A common scenario is a data source extension that connects to a database to read content. The interface provides a getBinaryContentInputStream() method that takes a Record as input and uses a property on the Record to identify the content to read. IAS Server then caches the content locally (this step is not optional) and reads the cached content as an input stream.
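The stream-based approach can be sketched in the same way. This is an illustration under stated assumptions, not the IAS sample code: Record and BinaryContentInputStreamProvider are local stand-ins for the real IAS Extension API types, the method signature is assumed, and an in-memory map stands in for the database table that would hold the binary content. The property name Example.Blob.Id, the BlobRuntime class name, and the readAsUtf8 helper are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Stand-in types only: the real Record and BinaryContentInputStreamProvider
// ship with the IAS Extension API.
class StreamProviderSketch {

    /** Simplified stand-in for an IAS Record: property name to a single value. */
    static final class Record {
        private final Map<String, String> properties = new HashMap<>();

        void setProperty(String name, String value) {
            properties.put(name, value);
        }

        String getProperty(String name) {
            return properties.get(name);
        }
    }

    /** Stand-in for the IAS interface; the method signature is assumed. */
    interface BinaryContentInputStreamProvider {
        InputStream getBinaryContentInputStream(Record record);
    }

    /** Hypothetical runtime that looks up content by an id stored on the Record. */
    static final class BlobRuntime implements BinaryContentInputStreamProvider {
        // Hypothetical property name; a real extension defines its own.
        static final String ID_PROPERTY = "Example.Blob.Id";

        // In-memory stand-in for a database table of binary content.
        private final Map<String, byte[]> blobTable;

        BlobRuntime(Map<String, byte[]> blobTable) {
            this.blobTable = blobTable;
        }

        @Override
        public InputStream getBinaryContentInputStream(Record record) {
            String id = record.getProperty(ID_PROPERTY);
            byte[] content = (id == null) ? null : blobTable.get(id);
            if (content == null) {
                // Assumed error handling; the real contract may differ.
                throw new IllegalArgumentException("No binary content for id: " + id);
            }
            return new ByteArrayInputStream(content);
        }
    }

    /** Demo helper: drains and closes a stream, decoding its bytes as UTF-8. */
    static String readAsUtf8(InputStream in) {
        try (InputStream stream = in) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = stream.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, byte[]> table = new HashMap<>();
        table.put("42", "hello".getBytes(StandardCharsets.UTF_8));

        Record record = new Record();
        record.setProperty(BlobRuntime.ID_PROPERTY, "42");

        InputStream in = new BlobRuntime(table).getBinaryContentInputStream(record);
        System.out.println(readAsUtf8(in));
    }
}
```

In a real extension, the lookup would typically open a database connection and return the stream of the stored value. The readAsUtf8 helper exists only so the sketch can print something; in practice the server, not the extension, consumes the stream.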
During the document conversion process, IAS Server examines the file, extracts its text, and stores the text as the Endeca.Document.Text property on the Record. With either interface, IAS Server manages file access, local file download (if enabled), temporary files, and caching.
To support document conversion, an extension developer implements one of the binary content provider interfaces, but not both. An IAS application developer specifies whether document conversion is enabled by configuring the data source in an XML configuration file, by using the IAS Server API (TextExtractionConfig), or by using the IAS Server Command-line Utility.
If document conversion is enabled, an IAS application developer can also specify whether IAS Server should cache the file locally before reading it.
To see an example of how BinaryContentFileProvider.getBinaryContentFile() is used, see the IAS sample extension in <install path>\IAS\<version>\sample\ias-extensions\src\main\com\endeca\ias\extension\sample\datasource\directory\DirectoryDataSourceRuntime.java.
To see an example of how BinaryContentInputStreamProvider.getBinaryContentInputStream() is used, see the IAS sample extension in <install path>\IAS\<version>\sample\ias-extensions\src\main\com\endeca\ias\extension\sample\datasource\blob\BlobDataSourceRuntime.java.