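Enumeration filters and generation filters each pass through five phases in the filtering process. The Setup, Metadata, Data, and Shutdown phases are common to both; the Enumerate phase applies to enumeration filters and the Generate phase applies to generation filters.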
Setup – Performs initialization operations. Occurs only once in the life of the robot.
Metadata – Filters the resource based on metadata available about the resource. Metadata filtering occurs once per resource before the resource is retrieved over the network. Table 19–1 lists examples of common metadata types.
| Metadata Type | Description | Example |
|---|---|---|
| Complete URL | The location of a resource | http://home.siroe.com/ |
| Protocol | The access portion of the URL | http, ftp, file |
| Host | The address portion of the URL | www.siroe.com |
| IP address | Numeric version of the host | 198.95.249.6 |
| Path | The path portion of the URL | /index.html |
| Depth | Number of links from the starting point URL | 5 |
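
The metadata types in the table can be illustrated with a minimal Python sketch. It is not the robot's filter syntax: the function names `url_metadata` and `metadata_allows`, the dictionary keys, and the default limits are all hypothetical, and the IP address is omitted because resolving it requires a network lookup.

```python
from urllib.parse import urlsplit


def url_metadata(url, depth=0):
    """Split a URL into the metadata types listed in the table.

    The key names are illustrative only, not the robot's own identifiers.
    """
    parts = urlsplit(url)
    return {
        "complete-url": url,
        "protocol": parts.scheme,    # e.g. http, ftp, file
        "host": parts.hostname,      # e.g. www.siroe.com
        "path": parts.path or "/",   # e.g. /index.html
        "depth": depth,              # links from the starting point URL
    }


def metadata_allows(meta, protocols=("http",), host_suffix=".siroe.com", max_depth=10):
    """Hypothetical metadata check, made before the resource is retrieved."""
    if meta["protocol"] not in protocols:
        return False
    if not (meta["host"] or "").endswith(host_suffix):
        return False
    return meta["depth"] <= max_depth


meta = url_metadata("http://www.siroe.com/index.html", depth=5)
print(meta["protocol"], meta["host"], meta["path"], metadata_allows(meta))
```

Because only the URL and the link depth are needed, a check of this kind can reject a resource without any network transfer, which is why metadata filtering comes before retrieval.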
Data – Filters the resource based on its data. Data filtering occurs once per resource, after the data is retrieved over the network. Data that can be used for filtering includes:
content-type
content-length
content-encoding
content-charset
last-modified
expires
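
A data-phase check over the fields listed above might look like the following Python sketch. The function name `data_allows`, the header dictionary, and the accepted types and size limit are assumptions made for illustration; they are not part of the robot's configuration.

```python
def data_allows(headers, allowed_types=("text/html", "text/plain"), max_length=1_000_000):
    """Hypothetical data check, made after the resource is retrieved.

    `headers` maps lowercase field names (content-type, content-length, ...)
    to string values.
    """
    # Drop any parameters such as "; charset=..." before comparing the type.
    content_type = headers.get("content-type", "").split(";")[0].strip().lower()
    if content_type not in allowed_types:
        return False

    # A missing content-length is accepted; a present one must be within the limit.
    length = headers.get("content-length")
    if length is not None and int(length) > max_length:
        return False
    return True


print(data_allows({"content-type": "text/html; charset=UTF-8", "content-length": "2048"}))
print(data_allows({"content-type": "image/gif", "content-length": "2048"}))
```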
Enumerate – Enumerates the current resource in order to determine whether it points to other resources to be examined.
Generate – Generates a resource description (RD) for the resource and saves it in the Search Server database.
Shutdown – Performs any needed termination operations. Occurs only once in the life of the robot.
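
The order of the phases can be sketched as a small driver loop in Python. This is an assumption-laden illustration, not the robot's implementation: the class `PhaseLog`, the function `run_filter`, and the acceptance rules inside them are hypothetical and exist only to show which phases run once per robot and which run once per resource.

```python
class PhaseLog:
    """Hypothetical filter that records the phases it passes through."""

    def __init__(self, final_phase):
        self.final_phase = final_phase  # "enumerate" or "generate"
        self.calls = []

    def setup(self):
        self.calls.append("setup")

    def metadata(self, url):
        self.calls.append("metadata")
        return url.startswith("http:")  # illustrative rule only

    def data(self, headers):
        self.calls.append("data")
        return headers.get("content-type") == "text/html"  # illustrative rule only

    def final(self, url):
        self.calls.append(self.final_phase)

    def shutdown(self):
        self.calls.append("shutdown")


def run_filter(flt, resources):
    """Run one filter over (url, headers) pairs in phase order."""
    flt.setup()                        # once in the life of the robot
    for url, headers in resources:
        if not flt.metadata(url):      # before retrieval
            continue
        if not flt.data(headers):      # after retrieval
            continue
        flt.final(url)                 # Enumerate or Generate
    flt.shutdown()                     # once in the life of the robot
    return flt.calls


resources = [
    ("http://www.siroe.com/index.html", {"content-type": "text/html"}),
    ("ftp://www.siroe.com/file", {}),
]
print(run_filter(PhaseLog("generate"), resources))
```

In this sketch the second resource is rejected in the Metadata phase, so its Data and Generate steps never run.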