The content properties contain information about the document, including its text. Note that some of these properties are generated by the CAS Document Conversion Module.

Endeca Property Name

Property Value

Endeca.Document.CharEncodingForConversion

The encoding used for text conversion of the document.

Endeca.Document.Metadata.attribute

Metadata information in the document. The metadata attributes depend on which ones were added by the authoring tool used to create the document. For example, an Adobe Acrobat PDF document could have such metadata attributes as Endeca.Document.Metadata.title and Endeca.Document.Metadata.primary_author.

Endeca.Document.MimeType

The MIME type of the document, if it can be determined. Common values of this property include text/html, application/pdf, and image/gif.

Endeca.Document.OriginalCharEncoding

The original encoding of the body of the document, if it can be determined. The property value can be an ISO code or another encoding name (for example, UTF-8, CP1252, or ISO-8859-1).

Endeca.Document.Outlink

A hypertext link (as an absolute URL) that references another document or another site.

Endeca.Document.OutlinkCount

The number of links (Endeca.Document.Outlink properties) in this document.

Endeca.Document.Text

The text (content) of the source document. Note that the Document Conversion Module typically does not preserve line break information.

Endeca.Document.TextExtraction.Error

An error that occurred during the parsing process, including errors returned by the Document Conversion Module.

Endeca.Document.Title

The title of the document.

Endeca.Document.XHTML

The content of the document in XHTML. This property is created only when the output.dom.include property is set to true. In that case, the Web Crawler normalizes the content of HTML documents to XHTML and stores the result in this property.

Endeca.File.Size

The size of the file, as indicated by the size of the byte stream.
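
As an illustration of how the properties above relate to one another, the following minimal Python sketch inspects a single crawled record. It assumes that the record has already been exported into a plain dictionary that maps each property name to a list of string values. That representation, the helper function names, and the sample values are all hypothetical and are not part of any CAS API. Only the property names come from the table above.

```python
from typing import Dict, List, Optional

# Hypothetical in-memory form of one record: property name -> list of values.
# This is an assumption for illustration, not a CAS data structure.
Record = Dict[str, List[str]]

METADATA_PREFIX = "Endeca.Document.Metadata."


def metadata_attributes(record: Record) -> Dict[str, List[str]]:
    """Collect the authoring-tool metadata, keyed by the bare attribute name."""
    return {
        name[len(METADATA_PREFIX):]: values
        for name, values in record.items()
        if name.startswith(METADATA_PREFIX)
    }


def outlink_count_matches(record: Record) -> Optional[bool]:
    """Compare Endeca.Document.OutlinkCount with the number of Outlink values.

    Returns None when the record carries no OutlinkCount property.
    """
    declared = record.get("Endeca.Document.OutlinkCount", [])
    if not declared:
        return None
    outlinks = record.get("Endeca.Document.Outlink", [])
    return int(declared[0]) == len(outlinks)


def extraction_errors(record: Record) -> List[str]:
    """Return any text-extraction errors recorded for the document."""
    return record.get("Endeca.Document.TextExtraction.Error", [])


# Made-up sample values, used only to exercise the helpers.
sample_record: Record = {
    "Endeca.Document.MimeType": ["application/pdf"],
    "Endeca.Document.Metadata.title": ["Sample Title"],
    "Endeca.Document.Metadata.primary_author": ["Sample Author"],
    "Endeca.Document.Outlink": [
        "http://www.example.com/a",
        "http://www.example.com/b",
    ],
    "Endeca.Document.OutlinkCount": ["2"],
    "Endeca.Document.Text": ["Sample body text"],
}

if __name__ == "__main__":
    print(metadata_attributes(sample_record))
    print(outlink_count_matches(sample_record))
    print(extraction_errors(sample_record))
```

Storing a list of values under each name is a deliberate choice in this sketch, because a property such as Endeca.Document.Outlink can occur once per link in the same document.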

