Most robot application functions (RAFs) require sources of information and generate data that go to destinations. The sources are defined within the robot and are not necessarily related to the fields in the resource description that the robot ultimately generates. Destinations, on the other hand, are generally the names of fields in the resource description, as defined by the resource description server’s schema.
The following sections describe the different stages of the filtering process, and the sources available at those stages:
At the Setup stage, the filter is set up but cannot yet obtain information about the resource’s URL or content.
At the MetaData stage, the robot has encountered a URL for a resource but has not yet downloaded the resource’s content. Thus, information is available about the URL, as well as data derived from other sources such as the filter.conf file. At this stage, however, information about the content of the resource is not available.
Table 12–2 Sources Available to the RAFs at the MetaData Phase
| Source | Description | Example |
|---|---|---|
| csid | Catalog server ID | x-catalog//budgie.siroe.com:8086/alexandria |
| depth | Number of links traversed from starting point | 10 |
| enumeration filter | Name of enumeration filter | enumeration1 |
| generation filter | Name of generation filter | generation1 |
| host | Host portion of URL | home.siroe.com |
| IP | Numeric version of host | 198.95.249.6 |
| protocol | Access portion of the URL | http, https, ftp, file |
| path | Path portion of the URL | /, /index.html, /documents/listing.html |
| URL | Complete URL | http://developer.siroe.com/docs/manuals/ |
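To make the URL-derived sources in Table 12–2 concrete, the following Python sketch splits a URL into its protocol, host, and path portions. It only illustrates how those three sources relate to the complete URL; it is not the robot's implementation, and the function name `url_sources` is hypothetical.

```python
from urllib.parse import urlsplit


def url_sources(url: str) -> dict:
    """Split a URL into the URL-derived sources named in Table 12-2.

    Hypothetical helper for illustration only; the robot computes
    these values internally.
    """
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,    # access portion, e.g. http
        "host": parts.hostname,      # host portion, without any port
        "path": parts.path or "/",   # an empty path is treated as "/"
        "URL": url,                  # the complete URL, unchanged
    }


sources = url_sources("http://developer.siroe.com/docs/manuals/")
for name, value in sources.items():
    print(f"{name}: {value}")
```

Sources such as csid, depth, and the filter names come from the robot's own state and configuration, so they cannot be derived from the URL alone and are left out of the sketch.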
At the Data stage, the robot has downloaded the content of the resource at the URL and can access data about the content, such as the description and the author.
If the resource is an HTML file, the robot parses the <META> tags in the HTML header. Consequently, any data contained in <META> tags is available at the Data stage.
During the Data phase, the following sources are available to RAFs, in addition to those available during the MetaData phase.
Table 12–3 Sources Available to the RAFs at the Data Phase
| Source | Description | Example |
|---|---|---|
| content-charset | Character set used by the resource | |
| content-encoding | Any form of encoding | |
| content-length | Size of the resource in bytes | |
| content-type | MIME type of the resource | text/html, image/jpeg |
| expires | Date the resource expires | |
| last-modified | Date the resource was last modified | |
| data in <META> tags | Any data that is provided in <META> tags in the header of HTML resources | Author, Description, Keywords |
All of these sources except for the data in <META> tags are derived from the HTTP response header returned when retrieving the resource.
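As a rough illustration of where the Data stage sources come from, the Python sketch below collects the header-derived values from an HTTP response header and the <META> values from an HTML document. It is a sketch under stated assumptions, not the robot's parser: the names `MetaTagCollector` and `data_stage_sources` are hypothetical, and it assumes that content-charset is taken from the charset parameter of the Content-Type header.

```python
from email.message import Message
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collect name/content pairs from <META> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = dict(attrs)
        name, content = attr.get("name"), attr.get("content")
        if name and content is not None:
            self.meta[name] = content


def data_stage_sources(headers: dict, html: str) -> dict:
    """Build a dict of Data stage sources from response headers and HTML."""
    msg = Message()
    for key, value in headers.items():
        msg[key] = value

    sources = {}
    if "Content-Type" in msg:
        sources["content-type"] = msg.get_content_type()
        # Assumption: content-charset comes from the charset parameter.
        charset = msg.get_content_charset()
        if charset:
            sources["content-charset"] = charset
    for name in ("content-encoding", "content-length", "expires", "last-modified"):
        if msg[name] is not None:  # header lookup is case-insensitive
            sources[name] = msg[name]

    collector = MetaTagCollector()
    collector.feed(html)
    sources.update(collector.meta)  # data in <META> tags
    return sources


example_headers = {
    "Content-Type": "text/html; charset=UTF-8",
    "Content-Length": "1234",
}
example_html = (
    '<html><head><META NAME="Author" CONTENT="J. Doe">'
    '<meta name="Keywords" content="robot, filter"></head><body></body></html>'
)
result = data_stage_sources(example_headers, example_html)
print(result)
```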
At the Enumeration and Generation stages, the same data sources are available as in the Data stage. See Table 12–3 for information.
At the Shutdown stage, the filter completes its filtering and shuts down. Although functions written for this stage can use the same data sources as those available at the Data stage, the shutdown functions typically restrict their operations to robot shutdown and clean-up activities.
Each function can have an enable property, whose value can be true, false, on, or off. The management console uses this property to turn individual directives on or off.
The following example enables enumeration for text/html and disables enumeration for text/plain:
    # Perform the enumeration on HTML only
    Enumerate enable=true fn=enumerate-urls max=1024 type=text/html
    Enumerate enable=false fn=enumerate-urls-from-text max=1024 type=text/plain
Adding an enable=false property or an enable=off property has the same effect as commenting out the line. These properties are used because the management console does not write comments.
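The equivalence between a disabled directive and a commented-out line can be sketched in Python as follows. This is an illustrative reading of the rule, not the robot's actual parser: the function names are hypothetical, the sketch assumes directives are whitespace-separated name=value pairs without quoted values, and it assumes that a directive with no enable property is active.

```python
def parse_directive(line: str):
    """Return (stage, properties) for a directive, or None for a
    blank line or comment. Hypothetical helper for illustration."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    stage, *pairs = line.split()
    # Assumption: every remaining token has the form name=value.
    props = dict(pair.split("=", 1) for pair in pairs)
    return stage, props


def is_active(line: str) -> bool:
    """True if the directive on this line would take effect."""
    parsed = parse_directive(line)
    if parsed is None:
        return False  # comments and blank lines never take effect
    _, props = parsed
    # Assumption: a missing enable property means the directive is on.
    return props.get("enable", "true").lower() in ("true", "on")


lines = [
    "Enumerate enable=true fn=enumerate-urls max=1024 type=text/html",
    "Enumerate enable=false fn=enumerate-urls-from-text max=1024 type=text/plain",
    "# Enumerate fn=enumerate-urls-from-text max=1024 type=text/plain",
]
for line in lines:
    print(is_active(line), line)
```

In this sketch the second and third lines are treated identically: neither takes effect, which mirrors the statement that enable=false behaves like a comment.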