Both enumerator and generator filters pass through five phases in the filtering process. Four phases are common to both: Setup, Metadata, Data, and Shutdown. A resource that makes it past the Data phase enters either the Enumerate or the Generate phase, depending on whether the filter is an enumerator or a generator.
Setup: Performs initialization operations. Occurs only once in the life of the robot.
Metadata: Filters the resource based on metadata that is available about the resource. Metadata filtering occurs once per resource, before the resource is retrieved over the network. Table 16–1 lists examples of common metadata types.
Metadata | Description | Example |
---|---|---|
Complete URL | The location of a resource | http://home.siroe.com/ |
Protocol | The access portion of the URL | http, ftp, file |
Host | The address portion of the URL | www.siroe.com |
IP address | Numeric version of the host | 198.95.249.6 |
Path | The path portion of the URL | /index.html |
Depth | Number of links from the starting point URL | 5 |
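As an illustration of where these metadata values come from, the following minimal Python sketch splits a URL into the fields listed in the table. It is not part of the robot itself: the function name `url_metadata` and the dictionary keys are hypothetical, the depth is assumed to be tracked by the caller, and the IP address is left out because resolving it requires a DNS lookup.

```python
from urllib.parse import urlsplit


def url_metadata(url, depth):
    """Return the URL-derived metadata fields for one resource.

    Hypothetical helper: `depth` is supplied by the caller, and the
    IP address is omitted because it needs a DNS lookup.
    """
    parts = urlsplit(url)
    return {
        "complete_url": url,
        "protocol": parts.scheme,   # access portion, e.g. "http"
        "host": parts.hostname,     # address portion, e.g. "www.siroe.com"
        "path": parts.path,         # path portion, e.g. "/index.html"
        "depth": depth,             # links from the starting point URL
    }


meta = url_metadata("http://www.siroe.com/index.html", depth=5)
print(meta["protocol"], meta["host"], meta["path"], meta["depth"])
```

All of these values are known before the resource is fetched, which is why a filter can test them in the Metadata phase.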
Data: Filters the resource based on its data. Data filtering occurs once per resource, after the resource is retrieved over the network. Data that can be used for filtering includes:
content-type
content-length
content-encoding
content-charset
last-modified
expires
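A data-phase test might look something like the sketch below. It is only an illustration under stated assumptions: the function name `passes_data_filter`, the allowed types, and the size limit are hypothetical, and the headers are assumed to arrive as a dictionary with lowercase keys.

```python
def passes_data_filter(headers,
                       allowed_types=("text/html", "text/plain"),
                       max_length=1_000_000):
    """Accept a retrieved resource based on its content-type and content-length.

    Hypothetical helper: `headers` is assumed to be a dict with
    lowercase keys and string values.
    """
    # Drop any parameters such as "; charset=utf-8" before comparing.
    media_type = headers.get("content-type", "").split(";", 1)[0].strip().lower()
    if media_type not in allowed_types:
        return False

    length = headers.get("content-length")
    if length is not None:
        try:
            if int(length) > max_length:
                return False
        except ValueError:
            return False  # malformed content-length: reject in this sketch
    return True


print(passes_data_filter({"content-type": "text/html; charset=utf-8",
                          "content-length": "2048"}))
print(passes_data_filter({"content-type": "image/png"}))
```

The same pattern extends to the other items in the list, for example comparing last-modified against the time of a previous run.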
Enumerate: Enumerates the current resource to determine whether it points to other resources to be examined.
Generate: Generates a resource description (RD) for the resource and saves it to be added to the search engine database.
Shutdown: Performs any needed termination operations. Occurs only once in the life of the robot.
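The order in which the phases run can be sketched as follows. This is a simplified model for illustration, not the robot's actual implementation: the class `SketchFilter`, the function `run`, and the acceptance rules inside them are all hypothetical, and retrieval over the network is replaced by headers passed in directly.

```python
class SketchFilter:
    """Hypothetical filter that records the order in which its phases run."""

    def __init__(self, kind):
        if kind not in ("enumerator", "generator"):
            raise ValueError("kind must be 'enumerator' or 'generator'")
        self.kind = kind
        self.log = []

    def setup(self):
        self.log.append("setup")

    def metadata(self, url):
        self.log.append(f"metadata {url}")
        return url.startswith("http://")  # assumed rule: accept http only

    def data(self, url, headers):
        self.log.append(f"data {url}")
        return headers.get("content-type", "").startswith("text/html")

    def final(self, url):
        # Enumerate or Generate, depending on the kind of filter.
        phase = "enumerate" if self.kind == "enumerator" else "generate"
        self.log.append(f"{phase} {url}")

    def shutdown(self):
        self.log.append("shutdown")


def run(filter_, resources):
    """Run Setup once, the per-resource phases for each item, then Shutdown once."""
    filter_.setup()
    for url, headers in resources:
        if not filter_.metadata(url):
            continue  # rejected before retrieval
        # A real robot would retrieve the resource over the network here.
        if not filter_.data(url, headers):
            continue  # rejected after retrieval
        filter_.final(url)
    filter_.shutdown()
    return filter_.log


resources = [
    ("http://www.siroe.com/index.html", {"content-type": "text/html"}),
    ("ftp://www.siroe.com/file.bin", {"content-type": "application/octet-stream"}),
]
log = run(SketchFilter("generator"), resources)
for entry in log:
    print(entry)
```

In this sketch the ftp resource is rejected in the Metadata phase, so it never reaches the Data or Generate phase, while Setup and Shutdown each run exactly once.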