VOID expressions return no value; they are used for the work they
perform. The VOID PARSE_DOC expression extracts metadata and text from a
document and adds both to the record as property values.
The expression itself extracts text from text/plain and text/html
documents; other file types are passed to the Document Conversion Module
converters for parsing. See "Implementing the Endeca Crawler" in the
Forge Guide for a description of each generated property that
PARSE_DOC adds to the record.
The following list describes the optional expression nodes that can
modify
PARSE_DOC:
FILE_PATH - Specifies whether the expression interprets the property value as a file path (to the contents of the file) or the contents of the file itself. TRUE interprets the property value as a file path. FALSE interprets the property value as the contents of the file.
PARSE_META - Indicates whether to extract metadata of a document. TRUE extracts metadata; FALSE does not. The default value is TRUE.
PARSE_TEXT - Indicates whether to extract the body text of a document. TRUE extracts text; FALSE does not. The default value is TRUE.
MIMETYPE_PROP - Describes the name of the property containing the content type. The RETRIEVE_URL expression creates this property with a default property name of Endeca.Document.MimeType. You do not need to modify this name unless desired.
ENCODING_PROP - Describes the name of the property containing the encoding. The RETRIEVE_URL expression creates this property with a default property name of Endeca.Document.Encoding. You do not need to modify this name unless desired.
BODY_PROP - Describes the name of the property containing the document body. The RETRIEVE_URL expression creates this property with a default property name of Endeca.Document.Body. You do not need to modify this name unless desired.
TEXT_PROP - Describes the name of the property to put document text into. The RETRIEVE_URL expression creates this property with a default property name of Endeca.Document.Text. You do not need to modify this name unless desired.
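As a rough illustration, a PARSE_DOC expression that sets every optional node explicitly might look like the fragment below. This is a sketch, not an excerpt from the Forge reference: it assumes each optional node is written as an EXPRNODE child element with NAME and VALUE attributes, and the FILE_PATH value of FALSE is chosen only for illustration. The property names are the defaults listed above. Verify element and attribute names against the EXPRESSION element DTD before using it.

```xml
<!-- Sketch only: the EXPRNODE NAME/VALUE layout is an assumption;
     check it against the EXPRESSION element DTD. -->
<EXPRESSION TYPE="VOID" NAME="PARSE_DOC">
  <!-- Treat the body property as document contents, not a file path
       (illustrative choice) -->
  <EXPRNODE NAME="FILE_PATH" VALUE="FALSE"/>
  <!-- Extract both metadata and body text (the documented defaults) -->
  <EXPRNODE NAME="PARSE_META" VALUE="TRUE"/>
  <EXPRNODE NAME="PARSE_TEXT" VALUE="TRUE"/>
  <!-- Input properties, using the default names created by RETRIEVE_URL -->
  <EXPRNODE NAME="MIMETYPE_PROP" VALUE="Endeca.Document.MimeType"/>
  <EXPRNODE NAME="ENCODING_PROP" VALUE="Endeca.Document.Encoding"/>
  <EXPRNODE NAME="BODY_PROP" VALUE="Endeca.Document.Body"/>
  <!-- Property that receives the extracted document text -->
  <EXPRNODE NAME="TEXT_PROP" VALUE="Endeca.Document.Text"/>
</EXPRESSION>
```

Because PARSE_META and PARSE_TEXT default to TRUE and the property names shown are the defaults, those nodes would normally be needed only when a different value or a non-default property name is wanted.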
See the
EXPRESSION element for DTD and attribute information.

