Oracle WebCenter Interaction Web Service Development Guide


About Content Crawler Indexing

For a crawled file to be included in the portal Directory, the content crawler must return an indexable version of that file.

The crawler's servlet/aspx page must return content in an indexable format and set the content type and file name using the appropriate headers. Any information required to retrieve the document, including credentials (if necessary), must be included in the query string of the index URL.
Note: The request from the portal to the indexing servlet is a simple HTTP GET. This call is not gatewayed, so the content crawler code does not have access to the Content Source settings, user credentials and preferences, or any other information through the Oracle WebCenter Interaction Development Kit (IDK).
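
As a rough illustration of the requirements above, the following Java sketch builds an index URL that carries its own retrieval parameters and assembles the headers an indexing page might set. It is not taken from the IDK: the class name, the parameter names (docId, user), the example URL, and the choice of Content-Disposition to carry the file name are all assumptions made for this example.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper; none of these names come from the IDK.
class IndexUrlSketch {

    // Builds an index URL whose query string carries everything the
    // indexing page needs, because the request is a plain, ungatewayed GET.
    public static String buildIndexUrl(String baseUrl, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return baseUrl + "?" + query;
    }

    // Headers the indexing page would set before writing the content.
    // Using Content-Disposition for the file name is an assumption.
    public static Map<String, String> responseHeaders(String contentType, String fileName) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Content-Type", contentType);
        headers.put("Content-Disposition", "inline; filename=\"" + fileName + "\"");
        return headers;
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("docId", "reports/q1 summary.doc"); // hypothetical document key
        params.put("user", "crawler");                 // hypothetical credential field
        System.out.println(buildIndexUrl("http://example.com/crawler/index", params));
        System.out.println(responseHeaders("application/msword", "q1 summary.doc"));
    }
}
```

Because the query string is the only channel available to the indexing page, each value is URL-encoded so that spaces and path separators in document keys survive the round trip.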

For files, content can be streamed directly from the source directory. If the content is not in a file, the crawler code should create a temporary file that includes the content with as little extraneous information as possible.
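
For content that does not live in a file, the temporary-file approach described above could look something like the following Java sketch. The class and method names, the "crawl-" prefix, and the title-plus-body layout are assumptions for illustration; the point is only that the file holds the indexable text and little else.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper; not part of the IDK.
class TempIndexFileSketch {

    // Writes only the indexable text of a non-file item (for example, a
    // database record) to a temporary file and returns its path.
    public static Path writeIndexableContent(String title, String body) throws IOException {
        Path tmp = Files.createTempFile("crawl-", ".txt");
        tmp.toFile().deleteOnExit(); // best-effort cleanup when the JVM exits
        Files.writeString(tmp, title + System.lineSeparator() + body, StandardCharsets.UTF_8);
        return tmp;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = writeIndexableContent("Q1 Summary", "Revenue grew in the first quarter.");
        // The indexing page would stream this file back to the portal.
        System.out.println(Files.readString(tmp, StandardCharsets.UTF_8));
        Files.delete(tmp); // remove the file once it has been served
    }
}
```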

