Sun Java System Portal Server 7.1 Technical Reference

Chapter 41 Overview of Search Engine Robot

A Search Engine robot is an agent that identifies and reports on resources in its domains. It does so by using two kinds of filters: an enumerator filter and a generator filter.

The enumerator filter locates resources by using network protocols. It tests each resource and enumerates the resource if it meets the proper criteria. For example, the enumerator filter can extract hypertext links from an HTML file and use the links to find additional resources.
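The link-extraction example above can be sketched in a few lines of Python using only the standard library. This is an illustration of the idea, not the robot's actual implementation; the class name LinkExtractor, the function extract_links, and the URLs in the usage note are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects absolute URLs from <a href="..."> tags (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(base_url, html_text):
    """Return the hyperlinks found in html_text, in document order."""
    parser = LinkExtractor(base_url)
    parser.feed(html_text)
    parser.close()
    return parser.links
```

Calling extract_links("http://example.com/docs/index.html", page_text) returns each anchor target as an absolute URL, which a crawler could then queue as additional resources to examine.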

The generator filter tests each resource to determine whether a resource description (RD) should be created. If the resource passes the test, the generator creates an RD, which is stored in the Search Engine database.

Figure 41–1 illustrates how the Search Engine robot works. In Figure 41–1, the robot examines URLs and their associated network resources. Each resource is tested by both the enumerator and the generator. If the resource passes the enumeration test, the robot checks it for additional URLs. If the resource passes the generator test, the robot generates a resource description that is stored in the Search Engine database.
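The flow described above can be sketched as a simple work-queue loop. This is a minimal sketch under stated assumptions, not the robot's real code: the function run_robot and every callable passed to it (fetch, passes_enumeration, passes_generation, extract_links, make_rd) are hypothetical stand-ins, and a plain list stands in for the Search Engine database.

```python
from collections import deque


def run_robot(start_urls, fetch, passes_enumeration, passes_generation,
              extract_links, make_rd):
    """Crawl from start_urls and return the list of generated RDs.

    All callables are hypothetical stand-ins supplied by the caller.
    """
    queue = deque(start_urls)
    seen = set(start_urls)
    database = []  # stand-in for the Search Engine database

    while queue:
        url = queue.popleft()
        resource = fetch(url)
        if resource is None:
            continue  # resource could not be retrieved

        # Enumeration test: should the robot look here for more URLs?
        if passes_enumeration(url, resource):
            for link in extract_links(url, resource):
                if link not in seen:
                    seen.add(link)
                    queue.append(link)

        # Generation test: should an RD be created for this resource?
        if passes_generation(url, resource):
            database.append(make_rd(url, resource))

    return database
```

Because the two tests are applied independently, a resource can be enumerated without producing an RD, or produce an RD without being searched for further links.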

Overview

The Robot Application Functions (RAFs) in the filter.conf file can be used to create and modify filter definitions. The file filter.conf is located in the /var/opt/SUNWportal/searchservers/instanceName/config directory.

The following figure shows how the robot works.

Figure 41–1 How the Robot Works


The filter.conf file contains definitions for the enumeration and generation filters. Each of these filters invokes a set of rules, which are stored in the filterrules.conf file. The filter definitions contain instructions that are specific to each filter, whereas the filter rules are shared by both filters.
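The relationship between the two files can be pictured as two filter definitions that each refer to entries in one shared rule table. The sketch below models only that relationship in Python; it does not reproduce the syntax of filter.conf or filterrules.conf, and the rule names, patterns, and the passes function are all hypothetical.

```python
import re

# Shared rule table (stand-in for filterrules.conf).
# Rule names and patterns are hypothetical.
RULES = {
    "http-only": lambda url: url.startswith("http://"),
    "html-pages": lambda url: re.search(r"\.html?$", url) is not None,
}

# Filter definitions (stand-in for filter.conf). Each filter names the
# shared rules it applies; both filters draw on the same RULES table.
FILTERS = {
    "enumeration": ["http-only"],
    "generation": ["http-only", "html-pages"],
}


def passes(filter_name, url):
    """Return True if url satisfies every rule the named filter invokes."""
    return all(RULES[rule](url) for rule in FILTERS[filter_name])
```

With these assumed definitions, a directory URL such as http://example.com/dir/ passes the enumeration filter but not the generation filter, so it would be searched for links without an RD being created for it.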

To understand how filter rules are defined, examine the filterrules.conf file. You typically do not need to edit this file manually, because you can create filter rules from the administration console.

For an example of filter definitions, examine the filter.conf file. Edit the filter.conf file only to modify the filters in a way that the administration console does not support, such as instructing the robot to enumerate some resources without generating resource descriptions for them.