Content crawler code can use DocFetch to access files that are not available via a public URL.
Three fields in the DocumentMetaData object returned from the portal's call to IDocument.GetMetaData are relevant to DocFetch:
When UseDocFetch is set to True, the IDK sets the ClickThroughURL stored in the Knowledge Directory to the URL of the DocFetch servlet and calls IDocument.GetDocument to retrieve the file path of the indexable version of the document. When a user later clicks a link to the crawled document in the Knowledge Directory, the request goes to the DocFetch servlet, which makes several calls to the content crawler code you have already implemented. GetDocument is called again, this time as part of the IDocFetch interface. The servlet opens the file at the returned path and streams its contents back in the response.
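The two phases described above (one GetDocument call at crawl time for indexing, and a second at click-through time for streaming) can be illustrated with a minimal sketch. It does not use the IDK: the types DocFetchFlowSketch, SketchCrawlerDocument, and TempFileDocument and the methods indexDocument and serveClickThrough are hypothetical stand-ins invented for this illustration, and the real IDocument and IDocFetch signatures should be taken from the IDK API documentation.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical stand-ins only; these are NOT the IDK's IDocument/IDocFetch types.
final class DocFetchFlowSketch {

    /** Stand-in for content crawler code that can produce a local copy of a document. */
    interface SketchCrawlerDocument {
        /** Returns the path of a local file that holds the document content. */
        String getDocument() throws IOException;
    }

    /** Stand-in crawler document: writes its content to a temp file on each call. */
    static final class TempFileDocument implements SketchCrawlerDocument {
        private final byte[] content;

        TempFileDocument(String text) {
            this.content = text.getBytes(StandardCharsets.UTF_8);
        }

        @Override
        public String getDocument() throws IOException {
            Path tmp = Files.createTempFile("docfetch-sketch", ".txt");
            tmp.toFile().deleteOnExit();
            Files.write(tmp, content);
            return tmp.toString();
        }
    }

    /** Crawl time: obtain the file path and read the file for indexing. */
    static long indexDocument(SketchCrawlerDocument doc) throws IOException {
        Path indexable = Paths.get(doc.getDocument());
        return Files.size(indexable); // a real indexer would parse the content
    }

    /** Click-through time: call getDocument again, open the file, stream it back. */
    static void serveClickThrough(SketchCrawlerDocument doc, OutputStream response)
            throws IOException {
        Path file = Paths.get(doc.getDocument());
        Files.copy(file, response);
        response.flush();
    }

    public static void main(String[] args) throws IOException {
        SketchCrawlerDocument doc = new TempFileDocument("quarterly report");
        System.out.println("indexed bytes: " + indexDocument(doc));

        ByteArrayOutputStream response = new ByteArrayOutputStream();
        serveClickThrough(doc, response);
        System.out.println("streamed: " + response.toString("UTF-8"));
    }
}
```

The point of the sketch is that the same crawler code serves both phases, so the file path it returns must be valid whenever it is called, not only during the crawl.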
To use user preferences or User Information, you must specify which settings to use in the Content Crawler editor.
DocFetch interfaces are called in the following order. For a complete listing of interfaces, classes, and methods, see the IDK API documentation.