About Content Crawler Indexing

A content crawler must return an indexable version of each crawled file to be included in the portal Directory.

The crawler's servlet/aspx page must return content in an indexable format and set the content type and file name using the appropriate headers. Any information required to retrieve the document, including credentials if necessary, must be included in the query string of the index URL.
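
As a rough illustration of the requirement above, the following Java sketch shows one way to pack retrieval details into the index URL query string, read them back on the indexing side, and compute the two response headers. The class and method names (`IndexUrlSketch`, `buildIndexUrl`, `parseQuery`, `responseHeaders`), the parameter names, and the host are hypothetical. Using `Content-Disposition` to carry the file name is an assumption based on common HTTP practice, not something this topic states. A real servlet would apply these values through its own response object.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

class IndexUrlSketch {

    // Append every value needed to retrieve the document to the index URL,
    // URL-encoding keys and values (hypothetical helper, not an IDK call).
    static String buildIndexUrl(String baseUrl, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(baseUrl);
        char sep = baseUrl.contains("?") ? '&' : '?';
        for (Map.Entry<String, String> e : params.entrySet()) {
            sb.append(sep)
              .append(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8))
              .append('=')
              .append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
            sep = '&';
        }
        return sb.toString();
    }

    // Recover the parameters from the raw query string of the GET request.
    static Map<String, String> parseQuery(String query) {
        Map<String, String> out = new LinkedHashMap<>();
        if (query == null || query.isEmpty()) {
            return out;
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            out.put(URLDecoder.decode(key, StandardCharsets.UTF_8),
                    URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        return out;
    }

    // Compute the content type and file name headers for the response.
    // Assumption: the file name travels in Content-Disposition.
    static Map<String, String> responseHeaders(String contentType, String fileName) {
        Map<String, String> headers = new LinkedHashMap<>();
        // Strip quotes and line breaks so the header value stays well-formed.
        String safeName = fileName.replaceAll("[\"\\r\\n]", "");
        headers.put("Content-Type", contentType);
        headers.put("Content-Disposition", "inline; filename=\"" + safeName + "\"");
        return headers;
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("docId", "42");          // hypothetical parameter names
        params.put("user", "crawl user");
        String url = buildIndexUrl("http://crawler.example.com/index", params);
        System.out.println(url);
        System.out.println(parseQuery(url.substring(url.indexOf('?') + 1)));
        System.out.println(responseHeaders("application/pdf", "report.pdf"));
    }
}
```

In this sketch the crawler side would call `buildIndexUrl` when it hands the index URL to the portal, and the indexing page would call `parseQuery` and `responseHeaders` when the portal's GET request arrives.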

Note: The request from the portal to the indexing servlet is a simple HTTP GET. This call is not gatewayed, so the content crawler code does not have access to the Content Source settings, user credentials and preferences, or any other information through the Oracle WebCenter Interaction Development Kit (IDK).

For files, content can be streamed directly from the source directory. If the content is not in a file, the crawler code should create a temporary file that includes the content with as little extraneous information as possible.
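
The temporary-file case described above could look roughly like the following Java sketch, which writes only a title and body to a temporary text file that the indexing page can then stream. The class name `TempIndexFileSketch`, the method `writeTempIndexFile`, the `crawl-index-` prefix, and the choice of a plain-text layout are all assumptions made for illustration, not part of the IDK.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class TempIndexFileSketch {

    // Write just the indexable text (title, then body) to a temporary file.
    // Markup, navigation, and other extraneous material is deliberately left out.
    static Path writeTempIndexFile(String title, String body) throws IOException {
        Path tmp = Files.createTempFile("crawl-index-", ".txt");
        // Ask the JVM to remove the file on exit; a long-running service
        // would more likely delete it right after streaming it.
        tmp.toFile().deleteOnExit();
        String text = title + "\n" + body;
        Files.write(tmp, text.getBytes(StandardCharsets.UTF_8));
        return tmp;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical content pulled from a non-file source such as a database row.
        Path tmp = writeTempIndexFile("Quarterly Report", "Revenue grew in the third quarter.");
        System.out.println("Wrote " + Files.size(tmp) + " bytes to " + tmp);
    }
}
```

Keeping the file to plain text is one way to limit extraneous information; other indexable formats would work equally well if the content type header matches.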

For details, see the following topics:

Indexing Streaming Content: If the content being crawled is in a file, the file can be streamed directly from the source directory.
Creating Temporary Files for Indexing: If crawled content cannot be indexed as-is, the crawler code must create a temporary file for indexing.

Parent topic: About Content Crawlers

Oracle WebCenter Interaction Web Service Development Guide