VOID expressions return no value; they are used to perform other work. The VOID RETRIEVE_URL expression processes records that have a URL property by retrieving the document content at that URL and its associated metadata.

RETRIEVE_URL requires a STRING sub-expression that names the file created to store the document content retrieved from the URL. The STRING DIGEST expression is typically used to generate the file name.

Forge adds the location of the file, the document content, and other values to the record as property values. The file that stores the document content must be unique for each record; otherwise, Forge overwrites the content when it processes subsequent records.
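The uniqueness requirement can be illustrated outside Forge with a minimal Python sketch. It is not Forge's implementation. The function name content_path_for, the directory name fetched_docs, and the choice of SHA-256 are assumptions made for illustration only; this page does not say which algorithm STRING DIGEST uses.

```python
import hashlib
from pathlib import Path


def content_path_for(url: str, base_dir: str = "fetched_docs") -> Path:
    """Build a per-URL file path from a hex digest of the URL.

    Hypothetical helper: SHA-256 and the directory name are illustrative
    choices, not values taken from Forge.
    """
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    return Path(base_dir) / digest


# Two different URLs map to two different files, so the content stored
# for one record does not overwrite the content stored for another.
path_a = content_path_for("http://example.com/a.html")
path_b = content_path_for("http://example.com/b.html")
```

Because the name depends only on the URL, the same URL always maps to the same file, while distinct URLs map to distinct files.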

Parameters that control how this expression retrieves URLs can be set as record properties. These parameters include connection timeouts (Endeca.Fetch.ConnectTimeout), data transfer rates (Endeca.Fetch.TransferRateLowSpeedLimit), the use of proxy servers (Endeca.Fetch.Proxy), and so on. See "Implementing the Endeca Crawler" in the Forge Guide for information about the metadata properties and configuration properties that the expression retrieves or stores with the record.
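As a rough illustration of configuration carried on the record itself, the following Python sketch models a record as a plain dictionary. The three Endeca.Fetch.* property names come from the text above. Everything else is an assumption: the dictionary representation, the Example.Url property name, the fetch_settings helper, and all values, which are placeholders whose units and formats are not given on this page.

```python
# Hypothetical record: property name -> value. All values are placeholders.
record = {
    "Example.Url": "http://example.com/a.html",   # hypothetical property name
    "Endeca.Fetch.ConnectTimeout": "30",
    "Endeca.Fetch.TransferRateLowSpeedLimit": "10",
    "Endeca.Fetch.Proxy": "proxy.example.com:8080",
}

FETCH_PREFIX = "Endeca.Fetch."


def fetch_settings(rec: dict) -> dict:
    """Collect the Endeca.Fetch.* properties of a record, prefix removed."""
    return {
        name[len(FETCH_PREFIX):]: value
        for name, value in rec.items()
        if name.startswith(FETCH_PREFIX)
    }


settings = fetch_settings(record)
```

The point of the sketch is only that retrieval settings travel with each record, so different records can carry different settings.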

The following optional expression nodes modify the behavior of VOID RETRIEVE_URL:

See the EXPRESSION element for DTD and attribute information.

