Enumerator and generator filters each run through five phases in the filtering process. Four phases are common to both: Setup, Metadata, Data, and Shutdown. A resource that makes it past the Data phase then enters either the Enumerate or the Generate phase, depending on whether the filter is an enumerator or a generator.
The phases are as follows:
Setup
Performs initialization operations. Occurs only once in the life of the robot.
Metadata
Filters the resource based on the metadata that is available about it. Metadata filtering occurs once per resource, before the resource is retrieved over the network. The table below lists the common metadata types with a description and an example of each.
Table 43–1 Common Metadata Types

| Metadata | Description | Example |
|---|---|---|
| Complete URL | The location of a resource | http://home.siroe.com/ |
| Protocol | The access portion of the URL | http, ftp, file |
| Host | The address portion of the URL | www.siroe.com |
| IP address | Numeric version of the host | 198.95.249.6 |
| Path | The path portion of the URL | /index.html |
| Depth | Number of links from the seed URL | 5 |
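To make the metadata types in the table concrete, the following Python sketch splits a URL into those fields using the standard `urllib.parse` module. It is an illustration only, not the robot's own code: the function name `url_metadata` and the dictionary keys are hypothetical, the depth is assumed to be supplied by the crawler, and the IP address is left out because resolving it requires a DNS lookup.

```python
from urllib.parse import urlsplit


def url_metadata(url, depth):
    """Return the URL-derived metadata fields for one resource.

    Hypothetical helper: field names mirror the table, not a real robot API.
    """
    parts = urlsplit(url)
    return {
        "complete_url": url,       # the location of the resource
        "protocol": parts.scheme,  # access portion, e.g. "http"
        "host": parts.hostname,    # address portion, e.g. "www.siroe.com"
        "path": parts.path,        # path portion, e.g. "/index.html"
        "depth": depth,            # links from the seed URL; tracked by the crawler
    }


meta = url_metadata("http://www.siroe.com/index.html", depth=5)
```

All of these values are known before the resource is fetched, which is why filtering on them can happen ahead of retrieval.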
Data
Filters the resource based on its data. Data filtering occurs once per resource, after the resource is retrieved over the network. Data that can be used for filtering includes:
content-type
content-length
content-encoding
content-charset
last-modified
expires
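As a rough illustration of a data-phase decision, the Python sketch below accepts or rejects a resource from two of the fields listed above, content-type and content-length. The function name, the allowed types, and the size limit are all assumptions chosen for the example; the response fields are assumed to arrive as a dictionary with lowercase keys.

```python
def passes_data_filter(headers,
                       allowed_types=("text/html", "text/plain"),
                       max_length=1_000_000):
    """Return True if the retrieved resource should continue past the Data phase.

    Hypothetical rule set: the thresholds here are illustrative, not defaults
    of any real robot.
    """
    # Drop parameters such as "; charset=utf-8" before comparing the type.
    content_type = headers.get("content-type", "").split(";")[0].strip().lower()
    if content_type not in allowed_types:
        return False

    # A missing or malformed content-length is not treated as a failure.
    length = headers.get("content-length")
    if length is not None and length.strip().isdigit():
        if int(length) > max_length:
            return False
    return True
```

For example, `passes_data_filter({"content-type": "text/html; charset=utf-8", "content-length": "2048"})` is accepted under these assumed rules, while a resource reporting `image/png` is rejected.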
Enumerate
Enumerates the current resource to determine whether it points to other resources to be examined.
Generate
Generates a resource description (RD) for the resource and saves it in the Search Engine database.
Shutdown
Performs any needed termination operations. Occurs once in the life of the robot.
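The order of the phases can be sketched as a small driver loop in Python. This is a simplified model under stated assumptions, not the robot's implementation: `run_robot`, its callback parameters, and the stubbed `fetch` are hypothetical names, and the loop only records which phase each resource reaches.

```python
def run_robot(urls, fetch, metadata_ok, data_ok, final_phase):
    """Drive each URL through the phases in order and log the phases reached.

    final_phase is "enumerate" or "generate", depending on the filter type.
    """
    log = [("setup", None)]                # Setup: once in the life of the robot
    for url in urls:
        log.append(("metadata", url))      # Metadata: before retrieval
        if not metadata_ok(url):
            continue
        resource = fetch(url)              # retrieval over the network (stubbed here)
        log.append(("data", url))          # Data: after retrieval
        if not data_ok(resource):
            continue
        log.append((final_phase, url))     # Enumerate or Generate
    log.append(("shutdown", None))         # Shutdown: once in the life of the robot
    return log


log = run_robot(
    ["http://www.siroe.com/index.html", "ftp://www.siroe.com/file.bin"],
    fetch=lambda url: {"content-type": "text/html"},
    metadata_ok=lambda url: url.startswith("http://"),
    data_ok=lambda res: res["content-type"] == "text/html",
    final_phase="generate",
)
```

In this run the `ftp` URL is rejected in the Metadata phase, so it is never fetched and never reaches the Data phase.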