Crawling errors

Processing source documents, including retrieving them and extracting their text, can introduce problems. This section lists several common errors and their workarounds, where a workaround exists.

PDF content in Endeca.Document.Text displays as binary data: If the Endeca Crawler processes PDF files and the content from those files appears in your application as binary data, the PDF files may contain custom-encoded embedded fonts. The Endeca Crawler cannot always correctly display content that uses custom-encoded embedded fonts. To work around the issue, a system font is substituted for the custom-encoded font. The substitution succeeds only if the encoding of the substituted system font matches the custom encoding of the embedded font. When the substitution does not succeed, you see binary data in Endeca.Document.Text.
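To find which PDF records are affected by a failed font substitution, it can help to scan the extracted text for content that looks binary. The following is a minimal sketch of such a check, not part of the Endeca Crawler. It assumes that records have been exported as Python dictionaries keyed by property name. The function names and the 0.3 threshold are hypothetical choices made for illustration.

```python
import unicodedata

# Unicode categories that rarely appear in legitimately extracted text:
# Cc = control, Co = private use, Cn = unassigned, Cs = surrogate.
SUSPICIOUS_CATEGORIES = {"Cc", "Co", "Cn", "Cs"}


def looks_like_binary(text, threshold=0.3):
    """Return True when the share of suspicious characters reaches threshold.

    The threshold is an assumed value; tune it against your own data.
    """
    if not text:
        return False
    suspicious = 0
    for ch in text:
        if ch in "\t\n\r":
            continue  # ordinary whitespace is not suspicious
        if ch == "\ufffd" or unicodedata.category(ch) in SUSPICIOUS_CATEGORIES:
            suspicious += 1
    return suspicious / len(text) >= threshold


def flag_suspect_records(records, prop="Endeca.Document.Text"):
    """Return the indexes of records whose text property looks binary.

    Assumes each record is a dict mapping property names to string values.
    """
    return [
        i for i, record in enumerate(records)
        if looks_like_binary(record.get(prop, ""))
    ]
```

For example, `flag_suspect_records(exported_records)` returns the positions of the records to inspect manually. The check is only a heuristic: text in which few characters are affected can fall under the threshold and go unflagged.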

Here are several issues related to retrieving documents from HTTP hosts, and an explanation of how the spider handles them: