VOID expressions return no value but are used to perform other work. The VOID RETRIEVE_URL expression processes records that have a URL property by retrieving the URL, its corresponding document content, and metadata. RETRIEVE_URL requires a STRING sub-expression that names the file created to store the document content from the URL. The STRING DIGEST expression is typically used to generate the file name.
Forge adds the location of the file, the document content, and other values to the record as property values. The file that stores the document content must be unique for each record; otherwise, Forge overwrites the content when it processes subsequent records.
Parameters that affect how this expression retrieves URLs can be set as record properties. These parameters include connection timeouts (Endeca.Fetch.ConnectTimeout), data transfer rates (Endeca.Fetch.TransferRateLowSpeedLimit), the use of proxy servers (Endeca.Fetch.Proxy), and so on. See "Implementing the Endeca Crawler" in the Forge Guide for information about the metadata properties and configuration properties that the expression retrieves or stores with the record.
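As a rough sketch of how such a parameter might be placed on a record, the fragment below assumes a VOID CREATE expression that adds a property named by its PROP_NAME node, and assumes it is evaluated in the same record manipulator before VOID RETRIEVE_URL. The expression form and the value 60 are illustrative assumptions, not values taken from this reference; check the Forge Guide for the accepted values and units.

```xml
<!-- Sketch only: assumes VOID CREATE adds the named property to the record. -->
<!-- Assumed to be evaluated before the VOID RETRIEVE_URL expression.        -->
<EXPRESSION TYPE="VOID" NAME="CREATE">
  <!-- Property that configures the connection timeout for URL retrieval -->
  <EXPRNODE NAME="PROP_NAME" VALUE="Endeca.Fetch.ConnectTimeout"/>
  <!-- Illustrative value; the unit and valid range are not stated here -->
  <EXPRESSION TYPE="INTEGER" NAME="CONST">
    <EXPRNODE NAME="VALUE" VALUE="60"/>
  </EXPRESSION>
</EXPRESSION>
```

The same pattern would apply to the other Endeca.Fetch properties, one expression per property.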
The following optional expression nodes modify the behavior of VOID RETRIEVE_URL:
BODY_PROP_NAME - Specifies the name of the property containing the document body. The default value of this property is Endeca.Document.Body.
URL_PROP_NAME - Specifies the name of the property that contains the URL to retrieve. Only one URL is retrieved per record. The default value of this property is Endeca.Identifier.
REVISION_PROP_NAME - Specifies the name of the property that contains the URL's revision information. The default value of this property is Endeca.Document.Revision.
KEY_RING - Specifies the path to a Key_ring.xml file that contains the authentication information that a SPIDER uses when communicating with a host computer. Specify the path to this file in the VALUE attribute. The path to the file may be absolute or relative to the location of the Pipeline.epx file.
See the EXPRESSION element for DTD and attribute information.
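The optional property-name nodes listed above could be combined as in the sketch below. It assumes that BODY_PROP_NAME, URL_PROP_NAME, and REVISION_PROP_NAME take their setting in the VALUE attribute, by analogy with KEY_RING. The property names beginning with Example. are hypothetical and used only for illustration.

```xml
<!-- Sketch only: VALUE usage for the *_PROP_NAME nodes is assumed by analogy -->
<!-- with KEY_RING; all "Example." property names are hypothetical.           -->
<EXPRESSION TYPE="VOID" NAME="RETRIEVE_URL">
  <!-- Required STRING sub-expression: builds a per-record file name -->
  <EXPRESSION TYPE="STRING" NAME="CONCAT">
    <EXPRESSION TYPE="STRING" NAME="CONST">
      <EXPRNODE NAME="VALUE" VALUE="&cwd;"/>
    </EXPRESSION>
    <!-- Digest of the URL property, so each record gets its own file -->
    <EXPRESSION TYPE="STRING" NAME="DIGEST">
      <EXPRESSION TYPE="PROPERTY" NAME="IDENTITY">
        <EXPRNODE NAME="PROP_NAME" VALUE="Example.SourceUrl"/>
      </EXPRESSION>
    </EXPRESSION>
  </EXPRESSION>
  <!-- Read the URL from a property other than the default Endeca.Identifier -->
  <EXPRNODE NAME="URL_PROP_NAME" VALUE="Example.SourceUrl"/>
  <!-- Store the document body under a non-default property name -->
  <EXPRNODE NAME="BODY_PROP_NAME" VALUE="Example.Body"/>
  <!-- Store revision information under a non-default property name -->
  <EXPRNODE NAME="REVISION_PROP_NAME" VALUE="Example.Revision"/>
</EXPRESSION>
```

Because the URL is read from a non-default property in this sketch, the DIGEST sub-expression digests that same property so that the generated file name still differs from record to record.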
This example generates a file name for the retrieved file and specifies that a Key_ring.xml file should be used for authentication.
<EXPRESSION TYPE="VOID" NAME="RETRIEVE_URL">
  <!-- this expression generates a filename for the retrieved file -->
  <EXPRESSION TYPE="STRING" NAME="CONCAT">
    <EXPRESSION TYPE="STRING" NAME="CONST">
      <EXPRNODE NAME="VALUE" VALUE="&cwd;"/>
    </EXPRESSION>
    <EXPRESSION TYPE="STRING" NAME="DIGEST">
      <EXPRESSION TYPE="PROPERTY" NAME="IDENTITY">
        <EXPRNODE NAME="PROP_NAME" VALUE="Endeca.Identifier"/>
      </EXPRESSION>
    </EXPRESSION>
  </EXPRESSION>
  <!-- this expression node specifies the path to the key ring file -->
  <EXPRNODE NAME="KEY_RING" VALUE="key_ring.xml"/>
</EXPRESSION>