Sun Java System Portal Server 7 Developer's Guide

Stages in the Filter Process

Enumerator and generator filters each run through five phases. Four phases are common to both: Setup, Metadata, Data, and Shutdown. A resource that passes the Data phase then enters either the Enumerate phase or the Generate phase, depending on whether the filter is an enumerator or a generator.
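The phase ordering described above can be sketched as follows. This is a minimal illustration in Python, not the robot's actual API: `run_filter`, `metadata_ok`, `fetch`, and `data_ok` are hypothetical names, and the trace strings exist only to show which phase runs when.

```python
from typing import Callable, Iterable, List


def run_filter(
    kind: str,
    urls: Iterable[str],
    metadata_ok: Callable[[str], bool],
    fetch: Callable[[str], dict],
    data_ok: Callable[[dict], bool],
) -> List[str]:
    """Return a trace of the phases a filter passes through (hypothetical sketch)."""
    final_phase = {"enumerator": "Enumerate", "generator": "Generate"}
    if kind not in final_phase:
        raise ValueError("kind must be 'enumerator' or 'generator'")

    trace = ["Setup"]  # runs once, before any resource is examined
    for url in urls:
        trace.append(f"Metadata {url}")
        if not metadata_ok(url):
            continue  # rejected before the resource is retrieved
        resource = fetch(url)  # network retrieval happens between Metadata and Data
        trace.append(f"Data {url}")
        if not data_ok(resource):
            continue
        trace.append(f"{final_phase[kind]} {url}")
    trace.append("Shutdown")  # runs once, after all resources
    return trace
```

In this sketch, a resource rejected in the Metadata phase is never fetched, which mirrors the point that metadata filtering happens before network retrieval.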

Setup

Performs initialization operations. Occurs only once in the life of the robot.

Metadata

Filters the resource based on metadata that is available about the resource. Metadata filtering occurs once per resource before the resource is retrieved over the network. Table 16–1 lists examples of common metadata types.

Table 16–1 Common Metadata Types

Metadata       Description                                    Example
Complete URL   The location of a resource                     http://home.siroe.com/
Protocol       The access portion of the URL                  http, ftp, file
Host           The address portion of the URL                 www.siroe.com:
IP address     Numeric version of the host                    198.95.249.6
Path           The path portion of the URL                    /index.html
Depth          Number of links from the starting point URL
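
Several of the metadata types in Table 16–1 can be derived from the URL alone, which is why metadata filtering can run before retrieval. The following Python sketch shows one way to do that with the standard `urllib.parse` module. The function names, the dictionary keys, and the `max_depth` default are assumptions made for illustration; the IP address is omitted because it requires a DNS lookup.

```python
from urllib.parse import urlsplit


def url_metadata(url: str, depth: int = 0) -> dict:
    """Split a URL into the metadata fields a filter might test (hypothetical)."""
    parts = urlsplit(url)
    return {
        "url": url,                # complete URL
        "protocol": parts.scheme,  # access portion, e.g. http
        "host": parts.netloc,      # address portion, including any port
        "path": parts.path,        # path portion, e.g. /index.html
        "depth": depth,            # links from the starting point, supplied by the caller
    }


def metadata_allows(meta: dict,
                    allowed_protocols=("http", "ftp", "file"),
                    max_depth: int = 10) -> bool:
    """Accept the resource only if its protocol and depth are acceptable."""
    return meta["protocol"] in allowed_protocols and meta["depth"] <= max_depth
```

For example, `url_metadata("http://home.siroe.com/index.html", 2)` yields the protocol `http`, the host `home.siroe.com`, and the path `/index.html`.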

Data

Filters the resource based on its data. Data filtering is done once per resource after it is retrieved over the network. Data that can be used for filtering includes:

  • content-type

  • content-length

  • content-encoding

  • content-charset

  • last-modified

  • expires
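
A data filter that tests two of the fields listed above might look like the following Python sketch. It assumes the retrieved headers are available as a dictionary with lowercase keys. The function name, the accepted content types, and the size limit are illustrative choices, not values defined by the robot.

```python
def data_allows(headers: dict,
                allowed_types=("text/html", "text/plain"),
                max_length: int = 1_000_000) -> bool:
    """Decide whether a retrieved resource passes the Data phase (hypothetical)."""
    # Drop any parameters such as "; charset=utf-8" before comparing.
    content_type = headers.get("content-type", "").split(";")[0].strip().lower()
    if content_type not in allowed_types:
        return False

    length = headers.get("content-length")
    if length is not None:
        try:
            if int(length) > max_length:
                return False
        except ValueError:
            return False  # treat a malformed length as a rejection in this sketch
    return True
```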

Enumerate

Enumerates the current resource to determine whether it points to other resources to be examined.
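
For an HTML resource, enumeration amounts to collecting the links it contains. The Python sketch below does this with the standard `html.parser` and `urllib.parse` modules. `LinkCollector` and `enumerate_links` are hypothetical names, and the sketch considers only `href` attributes on `a` tags.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs from the <a href="..."> tags in a document."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the URL of the current resource.
                self.links.append(urljoin(self.base_url, value))


def enumerate_links(base_url: str, html: str) -> list:
    """Return the resources that the given HTML points to, in document order."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links
```

Each returned URL would then go through the Metadata phase in its own right.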

Generate

Generates a resource description (RD) for the resource and saves it so that it can be added to the search engine database.
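
As a rough illustration of the Generate phase, the Python sketch below builds a record for a resource and holds it for later addition to the database. The field names, the `pending_rds` list, and both functions are assumptions; the actual RD format and storage mechanism are not described in this section.

```python
# Hypothetical in-memory holding area for RDs awaiting addition to the database.
pending_rds = []


def build_rd(url: str, headers: dict, title: str = "") -> dict:
    """Assemble a simple resource description from what is known about a resource."""
    rd = {"url": url, "title": title}
    # Copy over only the header fields this sketch chooses to record.
    for name in ("content-type", "last-modified"):
        if name in headers:
            rd[name] = headers[name]
    return rd


def generate(url: str, headers: dict, title: str = "") -> dict:
    """Build the RD and save it for later addition to the search database."""
    rd = build_rd(url, headers, title)
    pending_rds.append(rd)
    return rd
```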

Shutdown

Performs any needed termination operations. Occurs only once in the life of the robot.