The Endeca Content Acquisition System is a set of components that
add, configure, and crawl data sources for use in an Endeca application. Data
sources include file systems, content management systems, Web servers, and
custom data sources. The Endeca Content Acquisition System crawls data sources,
converts documents and files to Endeca records, and stores them for use in a
Forge pipeline.
The following image shows the Endeca Content Acquisition System
components as they work together in a typical implementation to crawl data
sources and produce Endeca records:

The Endeca Content Acquisition System is made up of the following
components:
- The Endeca CAS Service is a
servlet container that runs the CAS Server, the Component Instance Manager, and
any number of Record Store instances (one per data source).
- The CAS Server manages all file system and CMS crawling operations. The CAS
Server is documented in the
Endeca CAS Developer's Guide.
- The CAS Console for Endeca
Workbench is a Web-based application used to crawl data sources,
including file systems and content management systems. During the Content
Acquisition System installation, the CAS Console is installed as an extension
to Endeca Workbench. The CAS Console is documented in the
Endeca CAS Console Help.
- The CAS Server API allows
users to write programs that communicate with the CAS Server. The CAS Server
API provides both a WSDL interface and a CAS Server Command-line Utility. The API is
documented in the
Endeca CAS API Guide.
- The Dimension Value Id
Manager is a CAS component that creates, stores, and retrieves dimension value
identifiers.
- The Endeca Web Crawler
manages all Web crawl-related operations. This component is documented in the
Endeca Web Crawler Guide.
- Endeca CMS connectors are
available for use in the CAS Console for Endeca Workbench or the CAS Server
API. CMS connectors provide a means to access and crawl data sources in a wide
variety of CMS types, such as Documentum, eRoom, FileNet, JSR-170 compliant
repositories, Lotus Notes, Microsoft SharePoint, and Interwoven TeamSite.
- The Component Instance
Manager creates, lists, and deletes Record Store instances. The Component
Instance Manager provides both a WSDL interface and a CIM Command-line Utility.
- The Endeca Record Store
provides persistent storage for generations of records. The Record Store offers
both a WSDL interface and a Record Store Command-line Utility. The CAS Server
writes crawl output from each data source to a unique Record Store instance.
- The CAS Extension API
provides interfaces and classes to build extensions such as custom data sources
and custom manipulators. You package extensions into a plug-in and install it
into the Content Acquisition System. After you install the plug-in, the
extensions are available and configurable using the CAS Console, the CAS Server
API, and the CAS Server Command-line Utility.
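
The relationship between the Component Instance Manager, the Record Store, and a crawl can be illustrated with a small in-memory model. This is only a conceptual sketch in Python and does not use or reproduce any Endeca API: the class names, method names (`create`, `get`, `list_instances`, `delete`, `write_generation`), the `crawl` function, and the record layout are all hypothetical, chosen to mirror the component descriptions above. It assumes that each crawl of a data source writes one new generation of records to that data source's own Record Store instance.

```python
from dataclasses import dataclass, field


@dataclass
class RecordStore:
    """Toy stand-in for a Record Store instance (hypothetical, not the Endeca API)."""
    name: str
    generations: list = field(default_factory=list)

    def write_generation(self, records):
        # Keep every crawl's output as a separate generation of records.
        self.generations.append(list(records))
        return len(self.generations)  # 1-based number of the new generation

    def latest(self):
        # Most recently written generation, or an empty list if none exists.
        return self.generations[-1] if self.generations else []


class ComponentInstanceManager:
    """Toy stand-in that creates, lists, and deletes Record Store instances."""

    def __init__(self):
        self._stores = {}

    def create(self, name):
        if name in self._stores:
            raise ValueError(f"Record Store instance already exists: {name}")
        self._stores[name] = RecordStore(name)
        return self._stores[name]

    def get(self, name):
        return self._stores.get(name)

    def list_instances(self):
        return sorted(self._stores)

    def delete(self, name):
        del self._stores[name]


def crawl(data_source_id, documents, cim):
    """Toy crawl: convert documents to records and store them per data source."""
    store = cim.get(data_source_id)
    if store is None:
        # One Record Store instance per data source.
        store = cim.create(data_source_id)
    records = [
        {"id": f"{data_source_id}:{i}", "content": doc}
        for i, doc in enumerate(documents)
    ]
    return store.write_generation(records)


if __name__ == "__main__":
    cim = ComponentInstanceManager()
    crawl("file-system", ["a.txt", "b.txt"], cim)
    crawl("cms", ["page-1"], cim)
    print(cim.list_instances())
```

In a real installation these operations would go through the WSDL interfaces or the command-line utilities described above; the sketch only shows how the responsibilities are divided among the components.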