About Content Crawler Indexing
For a crawled file to be included in the portal Knowledge Directory,
the content crawler must return an indexable version of that file.
The crawler's servlet/aspx page must return content in an indexable
format and set the content type and file name using the appropriate
headers. Any information required to retrieve the document must be
included in the query string of the index URL, including credentials
(if necessary).
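The following is a minimal Java sketch of the two pieces described above: reading everything the indexing page needs from the query string, and choosing the response headers. It assumes that "the appropriate headers" are the standard HTTP Content-Type and Content-Disposition headers. The class name, method names, and the query parameter names (docId, user) are hypothetical and are not part of the IDK. In a real servlet, the resulting header values would be applied to the response before the content is written.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; not part of the IDK.
public class IndexResponseSketch {

    // Parses a raw query string such as "docId=42&user=jane%20doe".
    // The index request is a plain GET, so the query string is the only
    // place the indexing page can read document identifiers or credentials.
    static Map<String, String> parseQuery(String query) {
        Map<String, String> params = new LinkedHashMap<>();
        if (query == null || query.isEmpty()) {
            return params;
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            String key = eq >= 0 ? pair.substring(0, eq) : pair;
            String value = eq >= 0 ? pair.substring(eq + 1) : "";
            params.put(URLDecoder.decode(key, StandardCharsets.UTF_8),
                       URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        return params;
    }

    // Builds the headers that carry the content type and file name.
    // Assumption: Content-Disposition is used to convey the file name.
    static Map<String, String> buildHeaders(String contentType, String fileName) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Content-Type", contentType);
        // Strip double quotes so the quoted filename value stays well formed.
        String safeName = fileName.replace("\"", "");
        headers.put("Content-Disposition", "inline; filename=\"" + safeName + "\"");
        return headers;
    }
}
```

Because credentials may travel in the query string, the index URL should be treated as sensitive wherever it is logged or stored.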
Note: The request from the portal to the indexing
servlet is a simple HTTP GET. This call is not gatewayed, so the content
crawler code does not have access to the Content Source settings,
user credentials and preferences, or any other information through
the IDK.
For files, content can be streamed directly from the source directory.
If the content is not in a file, the crawler code should create a
temporary file that includes the content with as little extraneous
information as possible.
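As a rough illustration of the temporary-file approach, the sketch below writes content that does not live in a file (for example, a database record rendered as text) to a temporary file that can then be streamed back to the portal. The class and method names, the "crawl-" prefix, and the choice of UTF-8 are assumptions made for this example only.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper; not part of the IDK.
public class TempContentFile {

    // Writes only the indexable body to a temporary file and returns its path.
    // Navigation, markup chrome, and other extraneous text should be left out
    // of the body before it is passed in.
    static Path writeTempContent(String body, String suffix) throws IOException {
        Path tmp = Files.createTempFile("crawl-", suffix);
        Files.write(tmp, body.getBytes(StandardCharsets.UTF_8));
        // Fallback cleanup in case the caller does not delete the file.
        tmp.toFile().deleteOnExit();
        return tmp;
    }
}
```

A caller could stream the result with Files.copy(tmp, outputStream) and then remove it with Files.deleteIfExists(tmp) once the response has been sent.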
For details, see the following topics: