Sun Java System Portal Server 7.1 Administration Guide

Modifiable Properties

The robot.conf file defines many options for the robot, including pointing the robot to the appropriate filters in filter.conf. For backward compatibility with older versions, robot.conf can also contain the starting point URLs.

Because you can set most properties by using the management console, you typically do not need to edit the robot.conf file. However, advanced users might manually edit this file to set properties that cannot be set through the management console. See Sample robot.conf File for an example of this file.
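If you do edit robot.conf by hand, the change usually amounts to rewriting a single name=value line. The following Python sketch shows that kind of edit under stated assumptions: it treats the file as one name=value pair per line, with # marking comment lines and values optionally wrapped in double quotes. That layout matches the examples in Table 12–4, but it is not a documented grammar for robot.conf, and parse_robot_conf and set_property are hypothetical helpers, not part of Portal Server.

```python
def parse_robot_conf(text: str) -> dict:
    """Parse robot.conf-style text into a {property: value} dict.

    Assumed layout (not a documented grammar): one name=value pair per
    line, '#' starts a comment line, and values may be wrapped in
    double quotes.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        name, sep, value = line.partition("=")
        if not sep:
            continue  # not a name=value line; leave it alone
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]  # drop surrounding double quotes
        props[name.strip()] = value
    return props


def set_property(text: str, name: str, value: str) -> str:
    """Return a copy of robot.conf text with `name` set to `value`.

    An existing uncommented assignment is rewritten in place, so other
    lines and comments are preserved; otherwise the pair is appended.
    """
    lines = text.splitlines()
    new_line = f"{name}={value}"
    for i, raw in enumerate(lines):
        stripped = raw.strip()
        if stripped.startswith("#"):
            continue  # never rewrite commented-out settings
        key, sep, _ = stripped.partition("=")
        if sep and key.strip() == name:
            lines[i] = new_line
            break  # only the first matching assignment is replaced
    else:
        lines.append(new_line)  # property not present yet
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = '# robot settings\nemail="user@domain"\ndepth=10\n'
    updated = set_property(sample, "engine-concurrent", "20")
    print(parse_robot_conf(updated))
    # prints {'email': 'user@domain', 'depth': '10', 'engine-concurrent': '20'}
```

In this example, engine-concurrent stands in for a property that the management console cannot set interactively; rewriting the line in place rather than regenerating the whole file keeps any comments and unrelated settings intact.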

Table 12–4 lists the properties you can change in the robot.conf file.

Table 12–4 User-Modifiable Properties

Property 

Description 

Example 

auto-proxy

Specifies the proxy setting for the robot. The value can be a proxy server or a JavaScript file that automatically configures the proxy.

auto-proxy="http://proxy_server/proxy.pac"

bindir

Specifies a bin directory that the robot adds to the PATH environment variable. This extra PATH entry lets users run external programs from the robot, such as the program specified by the cmd-hook property.

bindir=path

cmd-hook

Specifies an external completion script to run after the robot completes one run. The value must be the full path to the command. The robot executes this script from the /var/opt/SUNWportal/ directory.

No default is set.

At least one RD must be registered for the command to run.

cmd-hook="command-string"

command-port

Specifies the port number on which the robot listens for commands from other programs, such as the Administration Interface or robot control panels.

For security reasons, the robot can accept commands only from the local host unless remote-access is set to yes.

command-port=port_number

connect-timeout

Specifies the maximum time allowed for a network to respond to a connection request. 

The default is 120 seconds.

connect-timeout=seconds

convert-timeout

Specifies the maximum time allowed for document conversion. 

The default is 600 seconds.

convert-timeout=seconds

depth

Specifies the number of links from the starting point URLs that the robot examines. This property sets the default value for any starting point URLs that do not specify a depth. 

The default is 10.

A value of negative one (depth=-1) indicates that the link depth is infinite.

depth=integer

email

Specifies the email address of the person who runs the robot. 

The email address is sent with the user-agent in the HTTP request header so that Web managers can contact the people who run robots at their sites. 

The default is user@domain.

email=user@hostname

enable-ip

Specifies whether the robot generates an IP address for the URL of each RD that it creates.

The default is true.

enable-ip=[true | yes | false | no]

enable-rdm-probe

Determines whether the robot queries each server it encounters to find out whether the server supports RDM. If a server supports RDM, the robot does not attempt to enumerate that server's resources, because the server can act as its own resource description server.

The default is false.

enable-rdm-probe=[true | false | yes | no]

enable-robots-txt

Determines whether the robot checks the robots.txt file, if one is available, at each site it visits.

The default is yes.

enable-robots-txt=[true | false | yes | no]

engine-concurrent

Specifies the number of pre-created threads for the robot to use. 

The default is 10.

You cannot use the management console to set this property interactively. 

engine-concurrent=[1..100]

enumeration-filter

Specifies the enumeration filter that the robot uses to determine whether a resource should be enumerated. The value must be the name of a filter defined in the file filter.conf.

The default is enumeration-default.

You cannot use the management console to set this property interactively. 

enumeration-filter=enumfiltername

generation-filter

Specifies the generation filter that the robot uses to determine whether a resource description should be generated for a resource. The value must be the name of a filter defined in the file filter.conf.

The default is generation-default.

You cannot use the management console to set this property interactively. 

generation-filter=genfiltername

index-after-ngenerated

Specifies the number of minutes that the robot should collect RDs before batching them for the Search Server. 

The default value is 30 minutes. 


index-after-ngenerated=30

loglevel

Specifies the levels of logging. The loglevel values are as follows:

  • Level 0: log nothing but serious errors

  • Level 1: also log RD generation (default)

  • Level 2: also log retrieval activity

  • Level 3: also log filtering activity

  • Level 4: also log spawning activity

  • Level 5: also log retrieval progress

The default value is 1.


loglevel=[0...100]

max-connections

Specifies the maximum number of concurrent retrievals that a robot can make. 

The default is 8.


max-connections=[1..100]

max-filesize-kb

Specifies the maximum file size in kilobytes for files retrieved by the robot. 


max-filesize-kb=1024

max-memory-per-url / max-memory

Specifies the maximum memory in bytes used by each URL. If the URL needs more memory, the RD is saved to disk. 

The default is 64k.

You cannot use the management console to set this property interactively. 


max-memory-per-url=n_bytes

max-working

Specifies the size of the robot working set, which is the maximum number of URLs the robot can work on at one time. 

You cannot use the management console to set this property interactively. 


max-working=1024

onCompletion

Determines what the robot does after it has completed a run. The robot can either go into idle mode, loop back and start again, or quit. 

The default is idle.

This property works with the cmd-hook property. When the robot is done, it performs the onCompletion action and then runs the cmd-hook program.


onCompletion=[idle | loop | quit]

password

Specifies the password used for HTTP authentication and FTP connections.


password=string

referer

If set, specifies the value sent in the Referer header of the HTTP request to identify the robot as the referrer when it accesses Web pages.


referer=string

register-user

Specifies the user name used to register RDs to the Search Server database. 

This property cannot be set interactively through the management console.


register-user=string

register-password

Specifies the password used to register RDs to the Search Server database. 

This property cannot be set interactively through the management console. 


register-password=string

remote-access

Determines whether the robot can accept commands from remote hosts.

The default is false.


remote-access=[true | false | yes | no]

robot-state-dir

Specifies the directory where the robot saves its state. In this working directory, the robot records information such as the number of collected RDs.


robot-state-dir="/var/opt/SUNWportal/searchservers/<searchserverid>/config/robot"

server-delay

Specifies the time period between two visits to the same Web site, which prevents the robot from accessing the same site too frequently.

The default is 0 seconds.


server-delay=delay_in_seconds

site-max-connections

Indicates the maximum number of concurrent connections that a robot can make to any one site. 

The default is 2.


site-max-connections=[1..100]

smart-host-heuristics

Enables the robot to normalize the host names of sites that rotate their DNS canonical host names. For example, www123.siroe.com is changed to www.siroe.com.

The default is false.


smart-host-heuristics=[true | false]

tmpdir

Specifies the directory in which the robot creates temporary files.

Use this value to set the environment variable TMPDIR.


tmpdir=path

user-agent

Specifies the User-Agent value that is sent, together with the email address, in the HTTP request to the server.


user-agent=SunONERobot/6.2

username

Specifies the user name of the user who runs the robot. This name is used for HTTP authentication and FTP connections.

The default is anonymous.


username=string
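Several properties in Table 12–4 accept only a fixed set of values, such as [true | yes | false | no] or an integer range like [1..100], so a quick check against those forms can catch typos after a manual edit. The Python sketch below encodes a subset of the forms from the Example column. The check_property helper and its rule tables are hypothetical; the lower bound of -1 for depth is an assumption (the table only defines -1 as infinite depth), and values are compared case-sensitively because the table does not say whether case matters.

```python
from typing import Optional

# Allowed forms, taken from the Example column of Table 12-4 (subset only).
BOOLEAN_VALUES = {"true", "yes", "false", "no"}
BOOLEAN_PROPS = {"enable-ip", "enable-rdm-probe", "enable-robots-txt", "remote-access"}
CHOICE_PROPS = {
    "onCompletion": {"idle", "loop", "quit"},
    "smart-host-heuristics": {"true", "false"},
}
RANGE_PROPS = {
    "engine-concurrent": (1, 100),
    "max-connections": (1, 100),
    "site-max-connections": (1, 100),
}


def _is_int(text: str) -> bool:
    """True if text parses as a base-10 integer (e.g. "-1", "20")."""
    try:
        int(text)
    except ValueError:
        return False
    return True


def check_property(name: str, value: str) -> Optional[str]:
    """Return an error message for an invalid value, or None if it looks valid.

    Properties without a rule here (for example email) are not checked.
    """
    value = value.strip()
    if len(value) >= 2 and value[0] == value[-1] == '"':
        value = value[1:-1]  # accept quoted values such as depth="-1"
    if name in BOOLEAN_PROPS:
        if value not in BOOLEAN_VALUES:
            return f"{name}: expected one of {sorted(BOOLEAN_VALUES)}, got {value!r}"
    elif name in CHOICE_PROPS:
        allowed = CHOICE_PROPS[name]
        if value not in allowed:
            return f"{name}: expected one of {sorted(allowed)}, got {value!r}"
    elif name in RANGE_PROPS:
        low, high = RANGE_PROPS[name]
        if not _is_int(value) or not low <= int(value) <= high:
            return f"{name}: expected an integer in {low}..{high}, got {value!r}"
    elif name == "depth":
        # -1 means infinite depth; rejecting values below -1 is an assumption.
        if not _is_int(value) or int(value) < -1:
            return f"depth: expected an integer >= -1, got {value!r}"
    return None


if __name__ == "__main__":
    for prop, val in [("engine-concurrent", "150"), ("enable-ip", "yes"), ("depth", '"-1"')]:
        print(prop, val, "->", check_property(prop, val) or "ok")
```

Note that smart-host-heuristics is kept separate from the other boolean properties because its example lists only [true | false]. Extending the rule tables with the remaining bracketed forms follows the same pattern.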