Overview of the Integrator Acquisition System

The Integrator Acquisition System, or IAS, is a set of components that crawl source data stored in a variety of formats and locations, including file systems, delimited files, JDBC databases, Web servers, and custom data sources. IAS transforms the data if necessary and writes it to an XML file or to a Record Store instance, where Integrator ETL can access it for use in the Endeca Server.

The following image shows the Integrator Acquisition System components as they work together in a typical implementation to crawl data sources and produce Endeca records:

Diagram of all IAS components and source data types

IAS Components

The Integrator Acquisition System is made up of the following components:
  • The Endeca IAS Service is a servlet container that runs the IAS Server, the Component Instance Manager, and any number of Record Store instances (one per crawl).
  • The IAS Server is the component that manages all crawling operations.
  • The IAS Server API allows users to write programs that communicate with the IAS Server. The IAS Server API provides a WSDL interface and an IAS Server Command-line Utility. The API is documented in the IAS API Guide.
  • The Endeca Web Crawler manages all Web crawl-related operations. This component is documented in the IAS Web Crawler Guide.
  • The Component Instance Manager creates, lists, and deletes Record Store instances. The Component Instance Manager has a WSDL interface and also a CIM Command-line Utility.
  • The Endeca Record Store provides persistent storage for generations of records. The Record Store has a WSDL interface and also a Record Store Command-line Utility. The IAS Server writes crawl output from each crawl to a unique Record Store instance.
  • The IAS Extension API provides interfaces and classes to build extensions such as custom data sources and custom manipulators. You package extensions into a plugin and install it into the Integrator Acquisition System. After you install the plugin, the extensions are available and configurable using the IAS Server API and the IAS Server Command-line Utility. This API is documented in the Integrator Acquisition System Extension API Guide.
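To make the relationships among the Component Instance Manager, the Record Store, and the IAS Server easier to picture, the following is a minimal in-memory Python model. It is an illustration under stated assumptions, not the IAS API: the class names, method names, and the `<crawl>-rs` instance naming scheme are all hypothetical, and the real components are reached through their WSDL interfaces and command-line utilities. The model shows the CIM creating, listing, and deleting instances, each crawl writing to its own Record Store instance, and each write producing a new generation of records.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical record shape: property name -> list of values.
Record = dict[str, list[str]]


@dataclass
class RecordStore:
    """Hypothetical model of one Record Store instance holding generations of records."""

    name: str
    generations: list[list[Record]] = field(default_factory=list)

    def write_generation(self, records: list[Record]) -> int:
        # Each write commits a new generation; returns its 1-based number.
        self.generations.append(list(records))
        return len(self.generations)

    def read_latest(self) -> list[Record]:
        # Most recent generation, or an empty list before the first write.
        return list(self.generations[-1]) if self.generations else []


class ComponentInstanceManager:
    """Hypothetical model of the CIM: creates, lists, and deletes Record Store instances."""

    def __init__(self) -> None:
        self._instances: dict[str, RecordStore] = {}

    def create(self, name: str) -> RecordStore:
        if name in self._instances:
            raise ValueError(f"Record Store instance {name!r} already exists")
        self._instances[name] = RecordStore(name)
        return self._instances[name]

    def get(self, name: str) -> Optional[RecordStore]:
        return self._instances.get(name)

    def list_instances(self) -> list[str]:
        return sorted(self._instances)

    def delete(self, name: str) -> None:
        # Raises KeyError if the instance does not exist.
        del self._instances[name]


class IasServer:
    """Hypothetical model of the IAS Server: each crawl writes to its own Record Store instance."""

    def __init__(self, cim: ComponentInstanceManager) -> None:
        self.cim = cim

    def run_crawl(self, crawl_id: str, records: list[Record]) -> int:
        # Hypothetical naming convention: one instance per crawl, keyed by crawl ID.
        name = f"{crawl_id}-rs"
        store = self.cim.get(name)
        if store is None:
            store = self.cim.create(name)
        return store.write_generation(records)


if __name__ == "__main__":
    cim = ComponentInstanceManager()
    server = IasServer(cim)
    server.run_crawl("docs", [{"Id": ["1"]}])
    server.run_crawl("docs", [{"Id": ["2"]}])
    server.run_crawl("web", [{"Id": ["w1"]}])
    print(cim.list_instances())
```

Running the same crawl twice adds a second generation to its existing instance rather than creating a new one, which reflects the "one Record Store instance per crawl" relationship described above.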

Interaction with Integrator

After running a crawl, IAS stores the resulting Endeca records in a Record Store instance. The records are then available for use in Integrator ETL. In a typical data processing scenario, you create a Record Store Reader Component in Integrator and configure the component to connect to the Record Store. The Record Store Reader Component reads the records, and an Integrator ETL graph processes them as necessary. For details about the Record Store Reader Component, see the Integrator ETL User's Guide, available on the Oracle Technology Network.
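The hand-off from the Record Store to Integrator ETL can be pictured as a small read-then-process pipeline. The sketch below is a hypothetical Python model, not the Record Store Reader Component's actual API or configuration: the function names, the `Title` property, and the simplification that the reader returns only the latest generation are all assumptions made for illustration.

```python
from typing import Iterable, Iterator

# Hypothetical record shape: property name -> list of values.
Record = dict[str, list[str]]


def read_latest_generation(generations: list[list[Record]]) -> Iterator[Record]:
    """Hypothetical reader: yields the records of the most recent generation, if any."""
    if generations:
        yield from generations[-1]


def require_property(records: Iterable[Record], prop: str) -> Iterator[Record]:
    """Hypothetical graph step: keeps only records with a non-empty value list for prop."""
    for record in records:
        if record.get(prop):
            yield record


def run_graph(generations: list[list[Record]]) -> list[Record]:
    # Chain the steps lazily; records stream from the reader through the filter.
    return list(require_property(read_latest_generation(generations), "Title"))


if __name__ == "__main__":
    store = [
        [{"Id": ["1"], "Title": ["Old"]}],
        [{"Id": ["2"], "Title": ["New"]}, {"Id": ["3"]}],
    ]
    print(run_graph(store))
```

Using generators keeps the stages streaming, which loosely mirrors how a graph passes records from one component to the next; an actual Integrator ETL graph is built and configured in Integrator rather than written this way.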