The Content Acquisition System ships with a set of default data sources and manipulators. Each is described here:
Data Source | Description |
---|---|
Delimited File | Crawls records in delimited text files, including .csv files. |
Endeca Record File | Crawls Endeca record files including .xml, .xml.gz, .bin, .bin.gz, .binary, and .binary.gz. |
File System | Crawls folders and files on both local drives and network drives. |
JDBC | Crawls a JDBC-accessible database. |
Record Store Merger | Crawls CAS record store instances. |
For information about version support for a particular repository, see the data source's chapter in the CAS Developer's Guide.
Manipulator | Description |
---|---|
Term Extraction | This manipulator extracts terms from Guided Search records and scores them for relevancy. |
For information about configuring a data source or a manipulator, see the CAS Console User's Guide or run the cas-cmd utility with the getModuleSpec task to return configuration properties.
For information about term extraction, see the CAS Relationship Discovery Guide.