Expressions in a record manipulator perform document retrieval, text extraction, language identification, record or property cleanup, and other crawl-related tasks. Each expression is evaluated against every record as it flows through the pipeline, and modifies the record as necessary.
For in-depth information about the expressions that can be used in a record manipulator, see the Data Foundry Expression Reference.
At a minimum, a crawler pipeline requires a record manipulator with two expressions: one to retrieve documents (RETRIEVE_URL) and another to convert documents to text (CONVERTTOTEXT or PARSE_DOC). You can also include optional expressions, such as REMOVE_EXPORTED_PROP, which deletes the temporary files that RETRIEVE_URL creates on disk.
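The order of evaluation described above (retrieve, convert, then clean up) can be illustrated with a small conceptual sketch in Python. This is not Data Foundry expression syntax: the function names only mirror the expression names, and the property names (url, exported_file, text), the canned document content, and the run_manipulator helper are all assumptions made for illustration. No network access takes place.

```python
import os
import re
import tempfile


def retrieve_url(record):
    """Stand-in for RETRIEVE_URL (hypothetical): store the document in a temp file.

    A real crawler would download record["url"]; this sketch writes canned
    content instead so that it runs without network access.
    """
    fd, path = tempfile.mkstemp(suffix=".html")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write("<html><body><p>Hello, crawler</p></body></html>")
    record["exported_file"] = path  # assumed property name
    return record


def convert_to_text(record):
    """Stand-in for CONVERTTOTEXT or PARSE_DOC (hypothetical): extract plain text."""
    with open(record["exported_file"], encoding="utf-8") as handle:
        raw = handle.read()
    # Crude tag stripping, good enough for the canned document above.
    record["text"] = re.sub(r"<[^>]+>", "", raw).strip()
    return record


def remove_exported_prop(record):
    """Stand-in for REMOVE_EXPORTED_PROP (hypothetical): delete the temp file."""
    path = record.pop("exported_file", None)
    if path and os.path.exists(path):
        os.remove(path)
    return record


def run_manipulator(record, expressions):
    """Evaluate each expression against the record, in order."""
    for expression in expressions:
        record = expression(record)
    return record


# The two required steps, followed by the optional cleanup step.
PIPELINE = [retrieve_url, convert_to_text, remove_exported_prop]

result = run_manipulator({"url": "http://example.com/index.html"}, PIPELINE)
print(result)
```

In this sketch the cleanup step runs last because the conversion step still needs the temporary file; placing it earlier would leave nothing to convert.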
To create a record manipulator: