The Portal Server software Search service provides:
A C API for customizing how the robot crawls URLs and generates resource descriptions (RDs).
Java APIs (the RDM and SOIF APIs) for searching the database, submitting data, and manipulating SOIF objects such as RDs. C versions of these APIs are also available.
A search provider taglib and helper beans for writing customized search JSPs.
The robot examines a set of selected URLs and searches for documents. For each document it finds, the robot creates a resource description (RD) of the document using a predefined schema. The schema defines which pieces of information about the document go into the RD. For example, an RD could contain the document's date, author, title, URL, and an abstract. RDs can then be grouped or classified according to a hierarchical taxonomy.
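To make the role of the schema concrete, here is a minimal Java sketch, assuming the robot has already extracted a document's metadata into name/value pairs. The classes `ResourceDescription` and `RdBuilder` are hypothetical and are not part of the Portal Server APIs; the schema is modeled simply as an ordered list of attribute names, and any extracted attribute outside it is dropped.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not the Portal Server robot's actual data model.
final class ResourceDescription {
    final String url;
    // Insertion-ordered so attributes keep the schema's order.
    final Map<String, String> attributes = new LinkedHashMap<>();

    ResourceDescription(String url) {
        this.url = url;
    }
}

final class RdBuilder {
    // Attribute names that the (hypothetical) schema allows into an RD.
    private final List<String> schema;

    RdBuilder(List<String> schema) {
        this.schema = List.copyOf(schema);
    }

    /** Copies only the schema's attributes from the extracted metadata, in schema order. */
    ResourceDescription build(String url, Map<String, String> extracted) {
        ResourceDescription rd = new ResourceDescription(url);
        for (String name : schema) {
            String value = extracted.get(name);
            // Skip schema attributes the document did not provide.
            if (value != null && !value.isBlank()) {
                rd.attributes.put(name, value);
            }
        }
        return rd;
    }
}
```

Keeping the schema as data rather than code means the set of RD fields can change without touching the extraction logic.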
You can configure the robot through the Portal Server software administration console.
The robot has many configurable parameters, including the following:
The URLs that it starts crawling from
Server access delays
Passwords
User agent string
Certificates for SSL
Proxy setup
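Of these parameters, the server access delay is the most algorithmic. The Java sketch below assumes the delay is a fixed minimum interval between requests to the same host; `ServerAccessDelay` and `waitTime` are hypothetical names, not robot settings or API calls. The current time is passed in explicitly so the behavior can be checked without sleeping.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch; not the robot's actual implementation or configuration API.
final class ServerAccessDelay {
    private final long delayMillis;
    // Time (ms) of the last request that was allowed for each host.
    private final Map<String, Long> lastAccess = new HashMap<>();

    ServerAccessDelay(long delayMillis) {
        if (delayMillis < 0) {
            throw new IllegalArgumentException("delay must be non-negative");
        }
        this.delayMillis = delayMillis;
    }

    /**
     * Returns 0 and records the access if a request to host is allowed at nowMillis;
     * otherwise returns the milliseconds still to wait (and records nothing).
     */
    synchronized long waitTime(String host, long nowMillis) {
        String key = host.toLowerCase(Locale.ROOT); // host names are case-insensitive
        Long last = lastAccess.get(key);
        if (last != null) {
            long remaining = last + delayMillis - nowMillis;
            if (remaining > 0) {
                return remaining;
            }
        }
        lastAccess.put(key, nowMillis);
        return 0;
    }
}
```

A caller would sleep for the returned interval and ask again, so different hosts can still be crawled in parallel while each individual server sees at most one request per delay period.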
In addition, the robot API enables you to write custom content parsers and summarizers for special URL handling requirements. You can also use the robot API to remove advertisements, generate alerts when certain pages are found, and perform specialized logging.
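The robot's own plug-in interface is the C API listed earlier; to keep all examples in one language, the following Java sketch only illustrates the general idea of chaining content filters. The `ContentFilter`, `AdRemover`, `AlertFilter`, and `FilterChain` types are hypothetical, and wrapping advertisements between `<!-- ad -->` and `<!-- /ad -->` markers is an assumed site convention.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical plug-in interface; the real robot API is a C interface with its own conventions.
interface ContentFilter {
    /** Returns the (possibly modified) page content to pass to the next filter. */
    String apply(String url, String content);
}

/** Removes blocks delimited by the assumed <!-- ad --> and <!-- /ad --> markers. */
final class AdRemover implements ContentFilter {
    // Reluctant .*? so each ad block ends at its own closing marker; DOTALL spans newlines.
    private static final Pattern AD_BLOCK =
            Pattern.compile("<!--\\s*ad\\s*-->.*?<!--\\s*/ad\\s*-->", Pattern.DOTALL);

    @Override
    public String apply(String url, String content) {
        return AD_BLOCK.matcher(content).replaceAll("");
    }
}

/** Records the URL of any page containing a watched phrase; leaves content unchanged. */
final class AlertFilter implements ContentFilter {
    private final String phrase;
    final List<String> alerts = new ArrayList<>();

    AlertFilter(String phrase) {
        this.phrase = phrase;
    }

    @Override
    public String apply(String url, String content) {
        if (content.contains(phrase)) {
            alerts.add(url);
        }
        return content;
    }
}

final class FilterChain {
    private final List<ContentFilter> filters;

    FilterChain(List<ContentFilter> filters) {
        this.filters = List.copyOf(filters);
    }

    /** Runs each filter in order, feeding each one the previous filter's output. */
    String run(String url, String content) {
        String current = content;
        for (ContentFilter f : filters) {
            current = f.apply(url, current);
        }
        return current;
    }
}
```

Ordering matters in a chain like this: placing the ad remover first means alerts and any later summarizer only ever see the cleaned content.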
The Search database consists of Summary Object Interchange Format (SOIF) objects. The Search API creates, reads, modifies, and writes Search database entries. Helper APIs create buffers, set and get attribute-value pairs (which define the content and metadata of the objects in the database), handle exceptions, create SOIF output streams, and read SOIF input streams.
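Portal Server's SOIF API provides its own classes for writing and reading these objects; the Java sketch below does not use them, and `SoifRecord` and `SoifCodec` are hypothetical names. It assumes the general SOIF layout from the Harvest system: a `@TEMPLATE { URL` header line, one `Attribute{size}:` line per attribute where a tab precedes the value and `size` is the value's length in bytes, and a closing `}`. The template name `DOCUMENT` is used only as an example.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical record type; Portal Server's SOIF API has its own classes for this.
final class SoifRecord {
    final String schemaName; // template name, e.g. "DOCUMENT"
    final String url;
    final Map<String, String> attributes = new LinkedHashMap<>();

    SoifRecord(String schemaName, String url) {
        this.schemaName = schemaName;
        this.url = url;
    }
}

final class SoifCodec {
    /** Writes "@NAME { url", one "attr{byteCount}:<TAB>value" entry per attribute, then "}". */
    static byte[] write(SoifRecord r) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        put(out, "@" + r.schemaName + " { " + r.url + "\n");
        for (Map.Entry<String, String> e : r.attributes.entrySet()) {
            byte[] value = e.getValue().getBytes(StandardCharsets.UTF_8);
            // The size is the value's length in bytes, so values may contain newlines.
            put(out, e.getKey() + "{" + value.length + "}:\t");
            out.write(value, 0, value.length);
            put(out, "\n");
        }
        put(out, "}\n");
        return out.toByteArray();
    }

    /** Parses one record produced by write(); throws IllegalArgumentException on bad input. */
    static SoifRecord read(byte[] data) {
        int eol = indexOf(data, (byte) '\n', 0);
        String header = new String(data, 0, eol, StandardCharsets.UTF_8);
        int sep = header.indexOf(" { ");
        if (!header.startsWith("@") || sep < 0) {
            throw new IllegalArgumentException("bad header: " + header);
        }
        SoifRecord r = new SoifRecord(header.substring(1, sep), header.substring(sep + 3));
        int pos = eol + 1;
        while (pos < data.length && data[pos] != '}') {
            int open = indexOf(data, (byte) '{', pos);
            int close = indexOf(data, (byte) '}', open);
            if (close + 2 >= data.length || data[close + 1] != ':' || data[close + 2] != '\t') {
                throw new IllegalArgumentException("expected ':<TAB>' after size");
            }
            String name = new String(data, pos, open - pos, StandardCharsets.UTF_8);
            int size = Integer.parseInt(
                    new String(data, open + 1, close - open - 1, StandardCharsets.UTF_8));
            int valueStart = close + 3;
            if (valueStart + size >= data.length) {
                throw new IllegalArgumentException("value runs past end of data");
            }
            r.attributes.put(name, new String(data, valueStart, size, StandardCharsets.UTF_8));
            pos = valueStart + size + 1; // skip the value and its trailing newline
        }
        if (pos >= data.length) {
            throw new IllegalArgumentException("missing closing '}'");
        }
        return r;
    }

    private static void put(ByteArrayOutputStream out, String s) {
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        out.write(b, 0, b.length);
    }

    private static int indexOf(byte[] data, byte b, int from) {
        for (int i = from; i < data.length; i++) {
            if (data[i] == b) {
                return i;
            }
        }
        throw new IllegalArgumentException("unexpected end of SOIF data");
    }
}
```

Counting bytes rather than characters is what lets the reader skip over values that contain newlines or braces without any escaping.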
The Search database is normally accessed through the SOIF API, but it can also be accessed through command-line utilities. You can also add RDs that you create or import RDs from another database.
An RD is a description of an object to include in the system. SOIF is the format used to represent RDs.