Server Delay
|
No Delay
|
No Delay (default), 1 second, 2 seconds, 5 seconds, 10 seconds, 30 seconds,
1 minute, 5 minutes.
|
Maximum Connections (maximum number of concurrent retrieval URLs)
|
8
|
1, 2, 4, 8 (default), 10, 12, 16, 20.
|
Maximum Connections per Site
|
2
|
(no limit), 1, 2 (default), 4, 8, 10, 12, 16, 20.
|
Send RDs to Indexing every
|
30 minutes
|
3 minutes, 5 minutes, 10 minutes, 15 minutes, 30 minutes (default),
1 hour, 2 hours, 4 hours, 8 hours.
|
Script to Launch
|
nothing
|
nothing (default). For sample files, see the cmdHook files
in the /opt/SUNWportal/samples/robot directory (for the
default installation).
|
After Processing all URLs
|
go idle
|
go idle (default), shut down, start over.
|
Contact Email
|
Blank
|
Enter your own.
|
Log Level
|
1 Generation
|
0 Errors only; 1 Generation (default); 2 Enumeration, Conversion; 3
Filtering; 4 Spawning; 5 Retrieval
|
User Agent
|
SunONERobot/6.2
|
Version of the search server.
|
Ignore robots.txt protocol
|
No
|
Some servers have a robots.txt file that tells robots
not to crawl the site. If the robot encounters this file on a site and
this attribute is set to No, it does not search the site. If this attribute is
set to Yes, the robot ignores the file and searches the site.
|
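The robots.txt check described above can be sketched with Python's standard urllib.robotparser; the rules string and URLs below are illustrative, not taken from any real site:

```python
# Minimal sketch of the robots.txt check a crawler performs.
# The rules string stands in for a fetched robots.txt file.
from urllib.robotparser import RobotFileParser

rules = """
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# With "Ignore robots.txt protocol" set to No, the robot skips
# any URL the rules disallow.
print(parser.can_fetch("SunONERobot/6.2", "http://example.com/private/a.html"))  # False
print(parser.can_fetch("SunONERobot/6.2", "http://example.com/public/a.html"))   # True
```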
Perform Authentication?
|
Yes
|
Yes (default), No.
|
Robot Username
|
Blank
|
The robot uses the anonymous user name to gain access to a site.
|
Password
|
Blank
|
Frequently a site that allows anonymous users requires an email address
as a password. This address is in plain text.
|
Proxy Username
|
Blank
|
The robot uses the anonymous user name to gain access to a site.
|
Password
|
Blank
|
Frequently a site that allows anonymous users requires an email address
as a password. This address is in plain text.
|
Proxy Connection Type
|
Proxy--Manual Configuration
|
Direct Internet Connection, Proxy--Auto Configuration, Proxy--Manual
Configuration
|
Auto Proxy Configuration Type
|
Local Proxy File
|
Local Proxy File, Remote Proxy File
|
Auto Proxy Configuration Location
|
Blank
|
The auto proxy configuration file lists all of the proxy information needed.
An example of a local proxy file is robot.pac.
An example of a remote proxy file is http://proxy.sesta.com:8080/proxy.pac
|
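A minimal sketch of what a proxy auto-configuration file such as robot.pac can contain; PAC files are JavaScript by convention, and the proxy host below reuses the example server name from this table. Real PAC files often use helper functions such as isPlainHostName, which this sketch avoids so it stays self-contained:

```javascript
// Minimal proxy auto-configuration (PAC) sketch. The browser or
// robot calls FindProxyForURL for each URL it retrieves.
function FindProxyForURL(url, host) {
    // Plain (unqualified) host names are reached directly;
    // everything else goes through the HTTP proxy.
    if (host.indexOf(".") === -1)
        return "DIRECT";
    return "PROXY server1.sesta.com:8080";
}
```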
Manual Proxy Configuration HTTP Proxy
|
Host Name:Port
|
Format: server1.sesta.com:8080. These three manual
configuration values are put in the robot.pac file in the
/var/opt/SUNWportal/searchservers/search1/config directory.
|
Manual Proxy Configuration HTTPS Proxy
|
Host Name:Port
|
This manual configuration value is put in the robot.pac file.
Format: server1.sesta.com:8080
|
Manual Proxy Configuration FTP Proxy
|
Host Name:Port
|
This manual configuration value is put in the robot.pac file.
Format: server1.sesta.com:8080
|
Follow Links in HTML
|
Yes
|
Extract hyperlinks from HTML.
|
maximum links
|
1024
|
Limits the number of links the robot can extract from any one HTML resource.
As the robot searches sites and discovers links to other resources, it could
conceivably end up following huge numbers of links a great distance from its
original starting point.
|
Follow Links in Plain Text
|
No
|
Extract hyperlinks from plain text.
|
maximum links
|
1024
|
Limits the number of links the robot can extract from any one text resource.
|
Use Cookies
|
No
|
If checked, the robot uses cookies when it crawls. Some sites require
the use of cookies in order for them to be navigated correctly. The robot
keeps its cookies in a file called cookies.txt in the
robot state directory. The format of cookies.txt is the
same format as used by the Netscape™ Communicator browser.
|
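The Netscape cookies.txt format mentioned above can be read with Python's standard http.cookiejar; the cookie domain and values below are illustrative:

```python
# Sketch: reading a Netscape-format cookies.txt of the kind the
# robot keeps in its state directory. The entry is made up.
import os
import tempfile
from http.cookiejar import MozillaCookieJar

# Fields: domain, domain-specified flag, path, secure, expires, name, value
cookie_lines = (
    "# Netscape HTTP Cookie File\n"
    ".sesta.com\tTRUE\t/\tFALSE\t2147483647\tsession\tabc123\n"
)

path = os.path.join(tempfile.mkdtemp(), "cookies.txt")
with open(path, "w") as f:
    f.write(cookie_lines)

jar = MozillaCookieJar(path)
jar.load()
for c in jar:
    print(c.domain, c.name, c.value)  # .sesta.com session abc123
```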
Use IP as Source
|
Yes
|
In most cases, the robot operates only on the domain name of a resource.
In some cases, you might want to be able to filter or classify resources based
on subnets by Internet Protocol (IP) address. In that case, you must explicitly
allow the robot to retrieve the IP address in addition to the domain name.
Retrieving IP addresses requires an extra DNS lookup, which can slow the operation
of the robot. If you do not need this option, you can turn it off to improve
performance.
|
Enable Smart Host Heuristics
|
No
|
If checked, the robot converts common alternate host names used by a
server to a single name. This is most useful in cases where a site has a number
of servers all aliased to the same address, such as www.sesta.com,
which often have names such as www1.sesta.com, www2.sesta.com, and so on.
When you select this option, the robot will internally translate all
host names starting with wwwn to www,
where n is any integer. This attribute only operates on
host names starting with wwwn.
This attribute cannot be used when CNAME resolution is OFF (No).
|
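The wwwn-to-www translation described above amounts to a simple host-name rewrite; a minimal sketch, using the example host names from this table:

```python
# Sketch of the wwwN -> www translation described above. The regex
# only touches host names that start with "www" followed by digits.
import re

def smart_host(hostname: str) -> str:
    return re.sub(r"^www\d+(?=\.)", "www", hostname)

print(smart_host("www2.sesta.com"))  # www.sesta.com
print(smart_host("web3.sesta.com"))  # unchanged: web3.sesta.com
```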
Resolve Host Names to CNAMEs
|
No
|
If checked, the robot validates and resolves any host name it encounters
into a canonical host name. This allows the robot to accurately track unique
RDs. If unchecked, the robot validates host names without converting them
to the canonical form. So you may get duplicate RDs listed with the different
host names found by the robot.
For example, devedge.sesta.com is an alias for developer.sesta.com. With CNAME resolution on, a URL referenced
as devedge.sesta.com is listed as being found on developer.sesta.com. With CNAME resolution off, the RD retains the original reference
to devedge.sesta.com.
Smart Host Heuristics cannot be enabled when CNAME resolution is OFF
(No).
|
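The CNAME resolution described above can be sketched as follows. A real robot would ask DNS for the canonical name (for example, Python's socket.gethostbyname_ex returns it as the first element of its result); here a dictionary of hypothetical alias records stands in for DNS:

```python
# Sketch of CNAME canonicalization using the alias example from
# this table. CNAME_RECORDS is hypothetical stand-in data for DNS.
CNAME_RECORDS = {"devedge.sesta.com": "developer.sesta.com"}

def canonicalize(hostname: str, records=CNAME_RECORDS) -> str:
    # Follow alias records until a canonical name is reached,
    # guarding against alias loops.
    seen = set()
    while hostname in records and hostname not in seen:
        seen.add(hostname)
        hostname = records[hostname]
    return hostname

print(canonicalize("devedge.sesta.com"))    # developer.sesta.com
print(canonicalize("developer.sesta.com"))  # already canonical
```

With resolution on, both names would yield one RD under developer.sesta.com; with it off, devedge.sesta.com would produce a separate RD.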
Accepts Commands from any Host
|
No
|
Most robot control functions operate through a TCP/IP port. This attribute
controls whether commands to the robot must come from the local host system
(No), or whether they can come from anywhere on the network (Yes).
It is recommended that you restrict direct robot control to the local
host (No). You can still administer the robot remotely through the Administration
Console.
|
Default Starting Point Depth
|
10
|
1 (starting points only), 2 (bookmark style), 3-10, unlimited.
Default value for the levels of hyperlinks the robot traverses from
any starting point. You can set the depth for any given starting point by
editing the site on the Robot, Sites page.
|
Work Directory
|
/var/opt/SUNWportal/searchservers/search1/tmp
|
Full pathname of a temporary working directory the robot can use to
store data. The robot retrieves the entire contents of documents into this
directory, often many at a time, so this space should be large enough to handle
all of those at once.
|
State Directory
|
/var/opt/SUNWportal/searchservers/search1/robot
|
Full pathname of a temporary directory the robot uses to store its state information, including the list of URLs it has visited, the URL
pool, and so on. This database can be quite large, so you might want to locate
it in a separate partition from the Work Directory.
|
Page Extraction Index
|
Partial Text
|
Full Text uses the complete document in the resource description. Partial
text only uses the specified number of bytes in the resource description.
|
extract first # bytes
|
4096
|
Enter the number of bytes.
|
Extract Table Of Contents
|
Yes
|
Yes includes the Table of Contents in the resource description.
|
Extract data in META tags
|
Yes
|
Yes includes the META tags in the resource description.
|
Allow No Existing Classifications
|
Yes
|
Yes allows resource descriptions that match none of the existing classifications.
|
Document Converters
|
All selected (default). If a converter is unselected, documents of that type cannot be indexed.
|
Adobe PDF
Corel Presentations
Corel Quattro Pro
FrameMaker
Lotus Ami Pro
Lotus Freelance
Lotus Word Pro
Lotus 1-2-3
Microsoft Excel
Microsoft PowerPoint
Microsoft RTF
Microsoft Word
Microsoft Works
Microsoft Write
WordPerfect
StarOffice™ Calc
StarOffice Impress
StarOffice Writer
XyWrite
|
Converter Timeout
|
600
|
Time in seconds allowed for one document to be converted to HTML. If
this time is exceeded, the URL is excluded.
|