Most of the Robot Application Functions (RAFs) require sources of information and generate data that goes to destinations. The sources are defined within the robot itself and are not necessarily related to the fields in the resource description it ultimately generates. Destinations, on the other hand, are generally the names of fields in the resource description, as defined by the resource description server’s schema.
For details on using the administration console to determine the database schema, see the Sun Java System Portal Server 7.1 Administration Guide.
The following sections describe the different stages of the filtering process, and the sources available at those stages.
At the Setup stage, the filter is set up and cannot yet get information about the resource’s URL or content.
At the MetaData stage, the robot has encountered a URL for a resource but has not yet downloaded the resource’s content. Information is therefore available about the URL, along with data derived from other sources such as the filter.conf file, but not about the content of the resource.
The following table lists the sources available to the RAFs at the MetaData stage, with a description and an example of each.
Table 44–1 Sources Available to the RAFs at the MetaData Phase
Source | Description | Example |
---|---|---|
csid | Catalog Server ID | x-catalog//budgie.siroe.com:8086/alexandria |
depth | Number of links traversed from starting point | 10 |
enumeration filter | Name of Enumeration filter | enumeration1 |
generation filter | Name of Generation filter | generation1 |
host | Host portion of URL | home.siroe.com |
IP | Numeric version of host | 198.95.249.6 |
protocol | Access portion of the URL | http, https, ftp, file |
path | Path portion of the URL | /, /index.html, /documents/listing.html |
URL | Complete URL | http://developer.siroe.com/docs/manuals/ |
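To make the relationship between the URL-derived sources concrete, the following Python sketch splits a complete URL into the protocol, host, and path values described in the table. It is an illustration only and not the robot's own code: the function name `metadata_sources` and the dictionary layout are hypothetical, and the depth value is simply passed in by the caller.

```python
from urllib.parse import urlsplit


def metadata_sources(url, depth=0):
    """Derive URL-based source values (hypothetical helper, not a robot API)."""
    parts = urlsplit(url)
    return {
        "URL": url,                    # complete URL
        "protocol": parts.scheme,      # access portion, for example http or ftp
        "host": parts.hostname or "",  # host portion, without any port
        "path": parts.path or "/",     # path portion; empty path treated as /
        "depth": depth,                # links traversed, supplied by the caller
    }


sources = metadata_sources("http://developer.siroe.com/docs/manuals/", depth=10)
for name, value in sources.items():
    print(f"{name}: {value}")
```

The IP source is omitted here because resolving it requires a DNS lookup, and the csid and filter-name sources come from the robot's configuration rather than from the URL.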
At the Data stage, the robot has downloaded the content of the resource at the URL, and can access data about the content, such as the description, the author, and so on.
If the resource is an HTML file, the robot parses the <META> tags in the HTML headers. Consequently, any data contained in <META> tags is available at the Data stage.
During the Data stage, the sources shown in the following table are available to the RAFs, in addition to those available during the MetaData stage.
Table 44–2 Sources Available to the RAFs at the Data Phase
Source | Description | Example |
---|---|---|
content-charset | Character set used by the resource | |
content-encoding | Any form of encoding | |
content-length | Size of the resource in bytes | |
content-type | MIME type of the resource | text/html, image/jpeg |
expires | Date the resource itself expires | |
last-modified | Date the resource was last modified | |
data in <META> tags | Any data that is provided in <META> tags in the header of HTML resources | Author, Description, Keywords |
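As a rough illustration of how <META> tag data can become named sources, the following Python sketch collects the name and content attributes of each <META> tag in an HTML document. It uses the standard library `html.parser` module and is not the robot's parser; the class name `MetaTagCollector`, the sample page, and the choice to lowercase the source names are all assumptions made for the example.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collect NAME/CONTENT pairs from <META> tags (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lowercase.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            # Assumption: source names are normalized to lowercase.
            self.meta[name.lower()] = content


# Hypothetical sample page; the values are invented for the example.
page = """<html><head>
<META NAME="Author" CONTENT="A. Writer">
<META NAME="Description" CONTENT="Product manuals">
<META NAME="Keywords" CONTENT="robot, search">
</head><body></body></html>"""

collector = MetaTagCollector()
collector.feed(page)
meta_sources = collector.meta
print(meta_sources)
```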
At the Enumeration and Generation stages, the same data sources are available as at the Data stage.
At the Shutdown stage, the filter completes its filtering and shuts down. Although functions written for this stage can use the same data sources as those available at the Data stage, shutdown functions typically restrict their operations to shutdown and cleanup activities.
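The stage-by-stage availability described in this chapter can be summarized in a small lookup. The Python sketch below is only a restatement of the two tables, not part of the product. The function name `available_sources` is hypothetical, and treating the Setup stage as having no sources is an assumption, since the text states only that URL and content information is unavailable at that stage.

```python
# Sources listed for the MetaData stage (Table 44-1).
METADATA_SOURCES = frozenset({
    "csid", "depth", "enumeration filter", "generation filter",
    "host", "IP", "protocol", "path", "URL",
})

# Additional sources listed for the Data stage (Table 44-2).
DATA_SOURCES = frozenset({
    "content-charset", "content-encoding", "content-length",
    "content-type", "expires", "last-modified", "data in <META> tags",
})


def available_sources(stage):
    """Return the source names usable at a filtering stage (hypothetical helper)."""
    if stage == "Setup":
        # Assumption: no URL or content sources exist yet at Setup.
        return frozenset()
    if stage == "MetaData":
        return METADATA_SOURCES
    if stage in {"Data", "Enumeration", "Generation", "Shutdown"}:
        # Later stages see the MetaData sources plus the Data sources.
        return METADATA_SOURCES | DATA_SOURCES
    raise ValueError(f"unknown stage: {stage!r}")


print(sorted(available_sources("MetaData")))
```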