Sun Java System Portal Server 7.1 Technical Reference

Chapter 44 Robot Application Functions - Sources and Destinations

Introduction

Most of the Robot Application Functions (RAFs) require sources of information and generate data that goes to destinations. The sources are defined within the robot itself and are not necessarily related to the fields in the resource description it ultimately generates. Destinations, on the other hand, are generally the names of fields in the resource description, as defined by the resource description server’s schema.

For details on using the administration console to determine the database schema, see the Sun Java System Portal Server 7.1 Administration Guide.

The following sections describe the different stages of the filtering process, and the sources available at those stages.

Setup Stage

At the Setup stage, the filter is set up and cannot yet get information about the resource’s URL or content.

MetaData Filtering Stage

At the MetaData stage, the robot has encountered a URL for a resource but has not yet downloaded the resource’s content. Information is therefore available about the URL, along with data derived from other sources such as the filter.conf file. Information about the content of the resource is not yet available.

The following table lists the sources available to the RAFs at the MetaData phase, with a description and an example of each.

Table 44–1 Sources Available to the RAFs at the MetaData Phase

| Source | Description | Example |
|---|---|---|
| csid | Catalog Server ID | x-catalog//budgie.siroe.com:8086/alexandria |
| depth | Number of links traversed from starting point | 10 |
| enumeration filter | Name of Enumeration filter | enumeration1 |
| generation filter | Name of Generation filter | generation1 |
| host | Host portion of URL | home.siroe.com |
| IP | Numeric version of host | 198.95.249.6 |
| protocol | Access portion of the URL | http, https, ftp, file |
| path | Path portion of the URL | /, /index.html, /documents/listing.html |
| URL | Complete URL | http://developer.siroe.com/docs/manuals/ |
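
To make the URL-derived sources in Table 44–1 concrete, the following Python sketch splits a URL into its protocol, host, and path portions. It is an illustration only, not the robot's implementation: the function name `url_sources` and the `depth` argument are hypothetical, and the remaining sources (csid, IP, and the filter names) are omitted because they come from the robot's configuration or from a name lookup rather than from the URL text.

```python
from urllib.parse import urlsplit


def url_sources(url: str, depth: int = 0) -> dict:
    """Return the URL-derived sources for one resource (hypothetical helper).

    depth is supplied by the caller, because only the crawler knows how
    many links were traversed from the starting point.
    """
    parts = urlsplit(url)
    return {
        "URL": url,                   # complete URL, unchanged
        "protocol": parts.scheme,     # access portion, e.g. http or ftp
        "host": parts.hostname or "", # host portion, without any port
        "path": parts.path or "/",    # path portion; empty path becomes "/"
        "depth": depth,
    }


sources = url_sources("http://developer.siroe.com/docs/manuals/", depth=2)
print(sources["protocol"], sources["host"], sources["path"])
# prints: http developer.siroe.com /docs/manuals/
```

Nothing in this sketch reads the resource itself, which matches the constraint that content is not yet available at the MetaData stage.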

Data Stage

At the Data stage, the robot has downloaded the content of the resource at the URL, and can access data about the content, such as the description, the author, and so on.

If the resource is an HTML file, the robot parses the <META> tags in the HTML headers. Consequently, any data contained in <META> tags is available at the Data stage.

During the Data phase, the sources shown in the following table are available to RAFs, in addition to those available during the MetaData phase.

Table 44–2 Sources Available to the RAFs at the Data Phase

| Source | Description | Example |
|---|---|---|
| content-charset | Character set used by the resource | |
| content-encoding | Any form of encoding | |
| content-length | Size of the resource in bytes | |
| content-type | MIME type of the resource | text/html, image/jpeg |
| expires | Date the resource itself expires | |
| last-modified | Date the resource was last modified | |
| data in <META> tags | Any data that is provided in <META> tags in the header of HTML resources | Author, Description, Keywords |
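
As an illustration of the last row of Table 44–2, the following Python sketch collects name/content pairs from <META> tags using the standard library's `html.parser` module. It is a minimal sketch of the idea, not the robot's own parser: the names `MetaCollector` and `meta_sources` and the sample page are hypothetical, and the sketch keeps each tag name exactly as written in the page.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect name/content pairs from <META> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag names in lowercase.
        if tag != "meta":
            return
        attr = dict(attrs)
        name, content = attr.get("name"), attr.get("content")
        if name and content is not None:
            self.meta[name] = content


def meta_sources(html: str) -> dict:
    """Return the <META> data found in an HTML document as a dict."""
    collector = MetaCollector()
    collector.feed(html)
    collector.close()
    return collector.meta


page = """<html><head>
<meta name="Author" content="J. Smith">
<meta name="Keywords" content="portal, robot">
</head><body></body></html>"""

print(meta_sources(page))
# prints: {'Author': 'J. Smith', 'Keywords': 'portal, robot'}
```

Because this data lives inside the document, it can only be produced once the content has been downloaded, which is why these sources first appear at the Data stage.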

Enumeration, Generation, and Shutdown Stages

At the Enumeration and Generation stages, the same data sources are available as at the Data stage.

At the Shutdown stage, the filter completes its filtering and shuts down. Although functions written for this stage can use the same data sources as those available at the Data stage, shutdown functions typically restrict their operations to shutdown and cleanup activities.
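
The availability rules described in this chapter can be summarized in a small lookup. The following Python sketch is only a summary of the text, under two stated assumptions: it treats the Setup stage as exposing none of the listed sources (the chapter says only that URL and content information is unavailable there), and it represents the open-ended <META> data with a single placeholder entry. The names `available_sources`, `METADATA_SOURCES`, and `DATA_ONLY_SOURCES` are hypothetical.

```python
# Sources listed for the MetaData phase.
METADATA_SOURCES = frozenset({
    "csid", "depth", "enumeration filter", "generation filter",
    "host", "IP", "protocol", "path", "URL",
})

# Additional sources listed for the Data phase. "META" is a placeholder
# for whatever <META> tag data the HTML resource provides.
DATA_ONLY_SOURCES = frozenset({
    "content-charset", "content-encoding", "content-length",
    "content-type", "expires", "last-modified", "META",
})

# Stages that see the same sources as the Data stage.
DATA_LIKE_STAGES = ("Data", "Enumeration", "Generation", "Shutdown")


def available_sources(stage: str) -> frozenset:
    """Return the source names a RAF could read at the given stage."""
    if stage == "Setup":
        # Assumption: nothing about the URL or content is known yet.
        return frozenset()
    if stage == "MetaData":
        return METADATA_SOURCES
    if stage in DATA_LIKE_STAGES:
        return METADATA_SOURCES | DATA_ONLY_SOURCES
    raise ValueError(f"unknown stage: {stage!r}")


print("content-type" in available_sources("MetaData"))  # prints: False
print("content-type" in available_sources("Generation"))  # prints: True
```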