Sun Java System Portal Server 7.2 Administration Guide

Sources and Destinations

Most robot application functions (RAFs) require sources of information and generate data that go to destinations. The sources are defined within the robot and are not necessarily related to the fields in the resource description that the robot ultimately generates. Destinations, on the other hand, are generally the names of fields in the resource description, as defined by the resource description server’s schema.

The following sections describe the different stages of the filtering process, and the sources available at those stages:

Sources Available at the Setup Stage

At the Setup stage, the filter is set up but cannot yet obtain information about the resource’s URL or content.

Sources Available at the MetaData Filtering Stage

At the MetaData stage, the robot has encountered a URL for a resource but has not yet downloaded the resource’s content. Thus, information about the URL is available, as is data derived from other sources such as the filter.conf file. Information about the content of the resource is not available at this stage.

Table 19–2 Sources Available to the RAFs at the MetaData Phase

Source              Description                                    Example
csid                Catalog server ID                              x-catalog//budgie.siroe.com:8086/alexandria
depth               Number of links traversed from starting point  10
enumeration filter  Name of enumeration filter                     enumeration1
generation filter   Name of generation filter                      generation1
host                Host portion of URL                            home.siroe.com
IP                  Numeric version of host                        198.95.249.6
protocol            Access portion of the URL                      http, https, ftp, file
path                Path portion of the URL                        /, /index.html, /documents/listing.html
URL                 Complete URL                                   http://developer.siroe.com/docs/manuals/
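As a rough illustration of how the URL-derived sources in Table 19–2 relate to one another, the following Python sketch splits a URL into its protocol, host, and path with the standard library. It is not part of the robot: the function name metadata_sources and the dictionary keys are assumptions chosen to mirror the table, and the IP source is left out because it would require a DNS lookup.

```python
from urllib.parse import urlsplit


def metadata_sources(url: str) -> dict:
    """Return the URL-derived sources from Table 19-2 for one URL.

    Hypothetical helper for illustration; the robot's own logic may differ.
    """
    parts = urlsplit(url)
    return {
        "URL": url,
        "protocol": parts.scheme,      # access portion, for example http
        "host": parts.hostname or "",  # host portion, without any port
        "path": parts.path or "/",     # assumption: an empty path is treated as /
    }


if __name__ == "__main__":
    sources = metadata_sources("http://developer.siroe.com/docs/manuals/")
    for name, value in sources.items():
        print(f"{name}: {value}")
```

Sources such as csid, depth, and the filter names come from the robot's configuration and crawl state rather than from the URL, so they are not modeled here.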

Sources Available at the Data Stage

At the Data stage, the robot has downloaded the content of the resource at the URL and can access data about the content, such as the description and the author.

If the resource is an HTML file, the robot parses the <META> tags in the HTML header. Consequently, any data contained in <META> tags is available at the Data stage.

During the Data stage, the following sources are available to RAFs in addition to those available during the MetaData stage.

Table 19–3 Sources Available to the RAFs at the Data Phase

Source               Description                                                               Example
content-charset      Character set used by the resource
content-encoding     Any form of encoding
content-length       Size of the resource in bytes
content-type         MIME type of the resource                                                 text/html, image/jpeg
expires              Date the resource expires
last-modified        Date the resource was last modified
data in <META> tags  Any data that is provided in <META> tags in the header of HTML resources  Author, Description, Keywords

All of these sources except for the data in <META> tags are derived from the HTTP response header returned when retrieving the resource.
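The split between header-derived and <META>-derived sources can be sketched in Python as follows. This is a minimal illustration, not the robot's implementation: the names data_sources and MetaTagCollector are hypothetical, and taking content-charset from the charset parameter of the Content-Type header is an assumption about how that source is obtained.

```python
from html.parser import HTMLParser

# Header-derived sources from Table 19-3 that are copied through unchanged.
HEADER_SOURCES = ("content-encoding", "content-length", "expires", "last-modified")


class MetaTagCollector(HTMLParser):
    """Collect name/content pairs from <META> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            self.meta[name.lower()] = content


def data_sources(headers: dict, html: str) -> dict:
    """Build a dictionary of Data-stage sources from headers and HTML."""
    lowered = {key.lower(): value for key, value in headers.items()}
    sources = {name: lowered[name] for name in HEADER_SOURCES if name in lowered}

    # Assumption: content-type is the bare MIME type, and content-charset
    # is the charset parameter of the same header.
    media_type, _, params = lowered.get("content-type", "").partition(";")
    if media_type.strip():
        sources["content-type"] = media_type.strip()
    for param in params.split(";"):
        key, _, value = param.partition("=")
        if key.strip().lower() == "charset" and value.strip():
            sources["content-charset"] = value.strip().strip('"')

    collector = MetaTagCollector()
    collector.feed(html)
    collector.close()
    for name, content in collector.meta.items():
        sources.setdefault(name, content)  # header-derived values take precedence
    return sources
```

Calling data_sources with the response headers and the page body returns one dictionary in which header values and <META> values such as author or keywords sit side by side, which is roughly how a RAF at the Data stage would see them.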

Sources Available at the Enumeration, Generation, and Shutdown Stages

At the Enumeration and Generation stages, the same data sources are available as in the Data stage. See Table 19–3 for information.

At the Shutdown stage, the filter completes its filtering and shuts down. Although functions written for this stage can use the same data sources as those available at the Data stage, the shutdown functions typically restrict their operations to robot shutdown and clean-up activities.
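The availability rules described in these sections can be summarized as a small lookup table. The Python sketch below is only a restatement of the text: the names STAGE_SOURCES and is_available are hypothetical, and "meta-tag data" is a placeholder label for the data in <META> tags.

```python
# Sources from Table 19-2 (MetaData) and Table 19-3 (Data).
METADATA_SOURCES = frozenset({
    "csid", "depth", "enumeration filter", "generation filter",
    "host", "IP", "protocol", "path", "URL",
})
DATA_SOURCES = METADATA_SOURCES | frozenset({
    "content-charset", "content-encoding", "content-length",
    "content-type", "expires", "last-modified",
    "meta-tag data",  # placeholder label for data in <META> tags
})

# Which sources each stage can read, as described in the text.
STAGE_SOURCES = {
    "Setup": frozenset(),          # no URL or content information yet
    "MetaData": METADATA_SOURCES,  # URL known, content not downloaded
    "Data": DATA_SOURCES,
    "Enumeration": DATA_SOURCES,
    "Generation": DATA_SOURCES,
    "Shutdown": DATA_SOURCES,      # available, but typically unused
}


def is_available(stage: str, source: str) -> bool:
    """Return True if the named source can be read at the given stage."""
    return source in STAGE_SOURCES[stage]
```

For example, is_available("MetaData", "content-type") is False because the content has not been downloaded at that stage, whereas is_available("Generation", "content-type") is True.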

Enable Property

Each function can have an enable property. The values can be true, false, on, or off. The management console uses these parameters to turn certain directives on or off.

The following example enables enumeration for text/html and disables enumeration for text/plain:


#  Perform the enumeration on HTML only
Enumerate enable=true fn=enumerate-urls max=1024 type=text/html
Enumerate enable=false fn=enumerate-urls-from-text max=1024 type=text/plain

Adding an enable=false property or an enable=off property has the same effect as commenting out the line. These properties are used because the management console does not write comments.
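To make the equivalence between a disabled directive and a commented-out line concrete, the following Python sketch parses directive lines of the form shown above and reports whether each one is active. It is an illustration only, not the robot's parser: the function names are hypothetical, and treating a directive with no enable property as enabled is an assumption.

```python
import shlex

# Values of the enable property that switch a directive off.
DISABLED_VALUES = {"false", "off"}


def parse_directive(line: str):
    """Return (directive, params) for a directive line, or None for
    blank lines and comments. Hypothetical helper for illustration."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    tokens = shlex.split(stripped)
    params = {}
    for token in tokens[1:]:
        key, _, value = token.partition("=")
        params[key] = value
    return tokens[0], params


def is_active(line: str) -> bool:
    """Return True if the line is a directive that is not disabled."""
    parsed = parse_directive(line)
    if parsed is None:
        return False  # comments and blank lines never run
    _, params = parsed
    # Assumption: a directive without an enable property is enabled.
    return params.get("enable", "true").lower() not in DISABLED_VALUES


if __name__ == "__main__":
    lines = [
        "#  Perform the enumeration on HTML only",
        "Enumerate enable=true fn=enumerate-urls max=1024 type=text/html",
        "Enumerate enable=false fn=enumerate-urls-from-text max=1024 type=text/plain",
    ]
    for line in lines:
        print(is_active(line), line)
```

In this sketch the comment line and the enable=false line are both reported as inactive, which mirrors the behavior described above.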