Sun Java System Portal Server 7 Developer's Guide

Search APIs

The Portal Server software Search service provides:

Search Robot

The robot examines a set of selected URLs and searches for documents. For each document it finds, the robot creates a resource description (RD) of the document using a predefined schema. The schema defines which pieces of information about the document are put in the RD. For example, the RD could contain a date, the author, the title, the URL, and an abstract of the document. These RDs can be grouped together or classified according to a given hierarchical taxonomy.
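As a rough illustration of how a schema limits what goes into an RD, the following minimal Java sketch keeps only the schema's fields from whatever was extracted from a document. The class name RdSchemaSketch, the describe method, and the field names are hypothetical and chosen for this example only; they are not part of the Portal Server robot or search API.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not a Portal Server class.
class RdSchemaSketch {

    // Assumed schema: the pieces of information an RD may carry.
    static final List<String> SCHEMA =
            Arrays.asList("url", "title", "author", "date", "abstract");

    // Builds an RD from extracted document data, keeping only schema fields
    // and emitting them in schema order.
    static Map<String, String> describe(Map<String, String> extracted) {
        Map<String, String> rd = new LinkedHashMap<>();
        for (String field : SCHEMA) {
            String value = extracted.get(field);
            if (value != null) {
                rd.put(field, value);
            }
        }
        return rd;
    }

    public static void main(String[] args) {
        Map<String, String> extracted = new HashMap<>();
        extracted.put("title", "Search APIs");
        extracted.put("url", "http://www.example.com/search.html");
        extracted.put("generator", "some-editor"); // not in the schema, so it is dropped
        System.out.println(describe(extracted));
    }
}
```

In this sketch, any field that the schema does not name is simply left out of the RD, and fields that the document lacks are omitted rather than stored as empty values.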

You can configure the robot through the Portal Server software administration console.

The robot has many customizable parameters, including the following configuration parameters:

In addition, the robot API enables you to write custom content parsers and summarizers for special URL handling requirements. You can also use the robot API to remove advertisements, generate alerts when certain pages are found, and perform specialized logging.

Search Database

The Search database consists of Summary Object Interchange Format (SOIF) objects. The search API creates, reads, modifies, and writes Search database entries. Supporting APIs create buffers, set and get attribute-value pairs (used to define content and metadata for the objects in the database), handle exceptions, create a SOIF output stream, and read a SOIF input stream.

The Search database is normally accessed through the SOIF API, but it can also be accessed through command-line utilities. You can also add RDs that you create or import RDs from another database.

An RD is a description of an object to include in the system. SOIF is the format used to represent RDs.
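To make the relationship between an RD and SOIF more concrete, the following minimal Java sketch writes a set of attribute-value pairs as SOIF-style text. It assumes the common SOIF layout: a schema name and URL on the first line, then one line per attribute in the form name{byte-length}, a colon, a tab, and the value, followed by a closing brace. The class name RdSoifSketch and the toSoif method are hypothetical; this is not the Portal Server SOIF API, which supplies its own classes for this purpose.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not the Portal Server SOIF classes.
class RdSoifSketch {

    // Renders one RD as SOIF-style text (assumed layout, see the lead-in).
    static String toSoif(String schemaName, String url, Map<String, String> attributes) {
        StringBuilder out = new StringBuilder();
        out.append('@').append(schemaName).append(" { ").append(url).append('\n');
        for (Map.Entry<String, String> entry : attributes.entrySet()) {
            // The number in braces is the size of the value in bytes.
            int byteLength = entry.getValue().getBytes(StandardCharsets.UTF_8).length;
            out.append(entry.getKey())
               .append('{').append(byteLength).append("}:\t")
               .append(entry.getValue())
               .append('\n');
        }
        out.append("}\n");
        return out.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps attributes in insertion order.
        Map<String, String> rd = new LinkedHashMap<>();
        rd.put("title", "Search APIs");
        rd.put("author", "Example Author");
        System.out.print(toSoif("DOCUMENT", "http://www.example.com/search.html", rd));
    }
}
```

Recording the byte length with each value lets a reader consume the value without scanning for a terminator, so values can contain newlines or braces.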