Content crawler code can use DocFetch to access files that are not available via a public URL.
DocFetch depends on three fields in the DocumentMetaData object returned in the portal's call to IDocument.getMetaData:
When UseDocFetch is set to True, the Oracle WebCenter Interaction Development Kit (IDK) sets the ClickThroughURL stored in the Directory to the URL of the DocFetch servlet, and calls IDocument.getDocument to retrieve the file path to the indexable version of the document. When a user subsequently clicks on a link to the crawled document in the Directory, the request to the DocFetch servlet makes several calls to the already-implemented content crawler code. getDocument is called again, but this time as part of the IDocFetch interface. The file path returned is opened by the servlet and streamed back in the response.
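The two phases described above (index time and click time) can be sketched in plain Java. This is only an illustration of the control flow, not IDK code: SketchMetaData, SketchDocument, SKETCH_SERVLET_URL, clickThroughUrlFor, and serveClick are hypothetical stand-ins invented for this example, and the real IDocument, IDocFetch, and DocumentMetaData types have their own signatures, which are listed in the IDK API documentation.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical model of the DocFetch flow; none of these types are IDK classes.
class DocFetchFlowSketch {

    // Placeholder address for the DocFetch servlet (assumption for this sketch).
    static final String SKETCH_SERVLET_URL = "http://idk-host.example/docfetch";

    // Stand-in for the metadata the crawler returns; only two fields are modeled.
    static final class SketchMetaData {
        final boolean useDocFetch;
        final String publicUrl;

        SketchMetaData(boolean useDocFetch, String publicUrl) {
            this.useDocFetch = useDocFetch;
            this.publicUrl = publicUrl;
        }
    }

    // Stand-in for the crawler's document object.
    interface SketchDocument {
        SketchMetaData getMetaData();

        // Returns the path of a local file holding the document.
        String getDocument() throws IOException;
    }

    // Index time: with UseDocFetch on, the stored link points at the servlet.
    static String clickThroughUrlFor(SketchMetaData metaData) {
        return metaData.useDocFetch ? SKETCH_SERVLET_URL : metaData.publicUrl;
    }

    // Click time: ask the crawler code for the file path again, then stream
    // the file's bytes into the response.
    static void serveClick(SketchDocument document, OutputStream response) throws IOException {
        Path file = Paths.get(document.getDocument());
        Files.copy(file, response);
    }

    // Runs both phases against a temporary file and reports what happened.
    static String demo() {
        try {
            Path temp = Files.createTempFile("docfetch-sketch", ".txt");
            try {
                Files.write(temp, "private file".getBytes(StandardCharsets.UTF_8));
                SketchDocument document = new SketchDocument() {
                    @Override
                    public SketchMetaData getMetaData() {
                        return new SketchMetaData(true, null);
                    }

                    @Override
                    public String getDocument() {
                        return temp.toString();
                    }
                };
                String url = clickThroughUrlFor(document.getMetaData());
                ByteArrayOutputStream response = new ByteArrayOutputStream();
                serveClick(document, response);
                return url + " -> " + new String(response.toByteArray(), StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(temp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this sketch the same getDocument call serves both phases, which mirrors the point made above: the servlet reuses the already-implemented content crawler code rather than requiring a second retrieval path.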
To use user preferences or User Information, you must specify in the Content Crawler editor which settings are to be used.
DocFetch interfaces are called in the following order. For a complete listing of interfaces, classes, and methods, see the Oracle WebCenter Interaction Development Kit (IDK) API documentation.