VOID expressions return no value but are used to perform other work. The VOID PARSE_DOC expression obtains metadata and extracts text from documents, then adds the metadata and document text to a record as property values.
This expression extracts text directly from text/plain and text/html documents; other file types are passed to the Document Conversion Module converters for parsing. See "Implementing the Endeca Crawler" in the Forge Guide for a description of each generated property that PARSE_DOC adds to the record.
The following list describes the optional expression nodes that can modify PARSE_DOC:
FILE_PATH - Specifies whether the expression interprets the property value as a file path (to the contents of the file) or as the contents of the file itself. TRUE interprets the property value as a file path. FALSE interprets the property value as the contents of the file.
PARSE_META - Indicates whether to extract the metadata of a document. TRUE extracts metadata; FALSE does not. The default value is TRUE.
PARSE_TEXT - Indicates whether to extract the body text of a document. TRUE extracts text; FALSE does not. The default value is TRUE.
MIMETYPE_PROP - Describes the name of the property containing the content type. The RETRIEVE_URL expression creates this property with a default property name of Endeca.Document.MimeType. You do not need to modify this name unless desired.
ENCODING_PROP - Describes the name of the property containing the encoding. The RETRIEVE_URL expression creates this property with a default property name of Endeca.Document.Encoding. You do not need to modify this name unless desired.
BODY_PROP - Describes the name of the property containing the document body. The RETRIEVE_URL expression creates this property with a default property name of Endeca.Document.Body. You do not need to modify this name unless desired.
TEXT_PROP - Describes the name of the property to put document text into. The RETRIEVE_URL expression creates this property with a default property name of Endeca.Document.Text. You do not need to modify this name unless desired.
See the EXPRESSION element for DTD and attribute information.
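As a rough illustration of how the optional nodes fit together, the sketch below configures PARSE_DOC with the default property names and default TRUE values listed above. The EXPRNODE child elements and the TYPE, NAME, and VALUE attributes are assumed from common Forge pipeline XML conventions and are not taken from this page, so verify them against the EXPRESSION DTD before use. The FILE_PATH setting is an illustrative choice only, since no default is stated for it here.

```xml
<!-- Sketch only: element and attribute names are assumptions, not confirmed by this page. -->
<EXPRESSION TYPE="VOID" NAME="PARSE_DOC">
  <!-- Assumed here: the body property holds the document contents, not a path to them. -->
  <EXPRNODE NAME="FILE_PATH" VALUE="FALSE"/>
  <!-- Extract both metadata and body text (the documented defaults). -->
  <EXPRNODE NAME="PARSE_META" VALUE="TRUE"/>
  <EXPRNODE NAME="PARSE_TEXT" VALUE="TRUE"/>
  <!-- Default property names, as described in the list of optional nodes. -->
  <EXPRNODE NAME="MIMETYPE_PROP" VALUE="Endeca.Document.MimeType"/>
  <EXPRNODE NAME="ENCODING_PROP" VALUE="Endeca.Document.Encoding"/>
  <EXPRNODE NAME="BODY_PROP" VALUE="Endeca.Document.Body"/>
  <EXPRNODE NAME="TEXT_PROP" VALUE="Endeca.Document.Text"/>
</EXPRESSION>
```

Because every node is optional, an expression that relies on the defaults could presumably omit most of these nodes; they are spelled out here only to show where each setting would go.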