The Content Acquisition System automatically creates directories under <install path>\CAS\workspace\state that you can use to store state information for a data source or manipulator extension. An extension can read, write, or delete state information from these directories as necessary.

A data source may require state information to run an incremental acquisition. For example, it can rely on a file that stores the last date that the data source read from a CMS. On a later run, the data source reads the date from that file and passes it in to run an incremental acquisition.
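The following is a minimal sketch of that pattern, not CAS code. It assumes the extension already has its state directory as a java.nio.file.Path. The class name LastReadStore and the file name last-read.txt are hypothetical and are not part of the CAS API.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Instant;
import java.util.Optional;

// Hypothetical helper (not part of the CAS API): persists the time of the
// last successful read in a small text file inside the state directory.
class LastReadStore {
    private final Path stateFile;

    // stateDir is assumed to be the extension's state directory.
    LastReadStore(Path stateDir) {
        // "last-read.txt" is an illustrative file name.
        this.stateFile = stateDir.resolve("last-read.txt");
    }

    // Returns the stored timestamp, or empty if no state exists yet
    // (for example, on the first run, which would be a full acquisition).
    Optional<Instant> load() throws IOException {
        if (!Files.exists(stateFile)) {
            return Optional.empty();
        }
        String text = new String(Files.readAllBytes(stateFile), StandardCharsets.UTF_8).trim();
        return Optional.of(Instant.parse(text));
    }

    // Overwrites the stored timestamp with the given value in ISO-8601 form.
    void save(Instant lastRead) throws IOException {
        Files.createDirectories(stateFile.getParent());
        Files.write(stateFile, lastRead.toString().getBytes(StandardCharsets.UTF_8));
    }
}
```

A data source could call load() at the start of a run to decide between a full and an incremental acquisition, and call save() only after the run completes, so that a failed run does not advance the stored date.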

The path for a data source's state directory is <install path>\CAS\workspace\state\cas\crawls\crawlId\source\.

The path for a manipulator's state directory is <install path>\CAS\workspace\state\cas\crawls\crawlId\manipulators\manipulatorId.
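As an illustration of the two layouts above, the sketch below assembles both paths from their components. The class StatePaths and its method names are hypothetical. The install path, crawlId, and manipulatorId values are assumed to be supplied by the caller.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not part of the CAS API) that mirrors the documented
// directory layout under <install path>\CAS\workspace\state.
class StatePaths {
    // Common prefix: <install path>\CAS\workspace\state\cas\crawls\crawlId
    private static Path crawlRoot(String installPath, String crawlId) {
        return Paths.get(installPath, "CAS", "workspace", "state", "cas", "crawls", crawlId);
    }

    // State directory for a data source: ...\crawls\crawlId\source
    static Path dataSourceStateDir(String installPath, String crawlId) {
        return crawlRoot(installPath, crawlId).resolve("source");
    }

    // State directory for a manipulator: ...\crawls\crawlId\manipulators\manipulatorId
    static Path manipulatorStateDir(String installPath, String crawlId, String manipulatorId) {
        return crawlRoot(installPath, crawlId).resolve("manipulators").resolve(manipulatorId);
    }
}
```

Building the path from components rather than concatenating strings lets the platform supply the correct separator.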

At the end of an extension's life cycle, CAS calls PipelineComponent.deleteInstance() and then deletes the contents of the state directory.
