Specify the number of crawler threads that will be spawned at run time.
Number of processors
Specify the number of central processing units (CPUs) that exist on
the server where the Ultra Search crawler will run. This setting is
used to determine the optimal number of document conversion threads
used by the system. A document conversion thread converts multi-format
documents into HTML documents for proper indexing.
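As a rough illustration of how a CPU count can drive the conversion pool size, consider the sketch below. It is not Ultra Search code, and the one-thread-per-CPU heuristic is an assumption for illustration; the product's actual sizing formula is not documented here.

```java
// Illustrative sketch: sizing a document-conversion thread pool from the
// number of CPUs. The one-thread-per-CPU rule is an assumed heuristic.
public class ConversionPoolSizer {
    // Never return fewer than one thread, even on a misreported CPU count.
    static int conversionThreads(int cpus) {
        return Math.max(1, cpus);
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        System.out.println("CPUs detected: " + cpus
                + ", conversion threads: " + conversionThreads(cpus));
    }
}
```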
Automatic language detection
Not all documents retrieved by the Ultra Search crawler specify the
language. For documents with no language specification, the Ultra Search
crawler attempts to detect the language automatically. Specify Yes to turn
on this feature.
Crawling depth
A web document might contain links to other web documents, which in
turn might contain more links. This setting allows you to specify the
maximum number of nested links the crawler will follow.
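The effect of the crawling depth can be pictured with a small sketch (illustrative only, not Ultra Search code; the link graph and names are hypothetical). Pages reachable only through more link hops than the configured depth are never visited:

```java
import java.util.*;

// Illustrative sketch of depth-limited link following. maxDepth is the
// maximum number of nested links followed from the starting page.
public class DepthLimitedCrawl {
    // links: page -> pages it links to (a hypothetical link graph).
    static List<String> visit(Map<String, List<String>> links,
                              String start, int maxDepth) {
        List<String> visited = new ArrayList<>();
        Set<String> seen = new HashSet<>(List.of(start));
        List<String> frontier = List.of(start);
        for (int depth = 0; depth <= maxDepth && !frontier.isEmpty(); depth++) {
            List<String> next = new ArrayList<>();
            for (String page : frontier) {
                visited.add(page); // "fetch and index" the page
                for (String link : links.getOrDefault(page, List.of())) {
                    if (seen.add(link)) {
                        next.add(link); // follow one more level of nesting
                    }
                }
            }
            frontier = next;
        }
        return visited;
    }
}
```

With a chain of pages a → b → c, a crawling depth of 1 visits only a and b; c is one link too deep.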
Crawler timeout threshold
Specify the crawler timeout in seconds. The crawler timeout threshold
forces a timeout when the crawler cannot access a web page within that interval.
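The sketch below shows how a per-request timeout makes a stalled fetch fail rather than hang. It is illustrative only, not the crawler's internal code; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Illustrative sketch: applying a timeout threshold to a page fetch so
// an unreachable page raises an exception instead of blocking forever.
public class FetchWithTimeout {
    static HttpURLConnection open(String page, int timeoutSeconds) {
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) new URL(page).openConnection();
            conn.setConnectTimeout(timeoutSeconds * 1000); // milliseconds
            conn.setReadTimeout(timeoutSeconds * 1000);
            return conn;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}
```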
Default character set
Specify the default character set. The crawler uses this setting when
an HTML document does not have its character set specified.
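The fallback behavior described above can be sketched as follows (illustrative only; the method name is hypothetical): use the character set the document declares when present, otherwise fall back to the configured default.

```java
import java.nio.charset.Charset;

// Illustrative sketch: pick the document's declared charset if it has
// one, otherwise fall back to the configured default character set.
public class CharsetFallback {
    static Charset effectiveCharset(String declared, Charset configuredDefault) {
        return (declared == null || declared.isEmpty())
                ? configuredDefault
                : Charset.forName(declared);
    }
}
```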
Temporary directory location and size
Specify a temporary directory and size. The crawler uses the temporary
directory for intermediate storage while gathering documents. Specify
the absolute path of the temporary directory. The size is the maximum
temporary space in megabytes that will be used by the crawler.
Logfile directory
Specify the logfile directory. The logfile directory is used for storing
the crawler logfile(s). The logfile records all crawler activity, warnings,
and error messages for a particular schedule. It includes
messages logged at startup, runtime, and at shutdown.
By default, the Ultra Search crawler prints selected
crawler activity into each schedule logfile. Selective
printing is necessary to avoid creating immensely large
logfiles (which can easily happen when crawling a large number
of documents). However, in certain
situations, it may be beneficial to configure the crawler to
print detailed activity to each schedule logfile. This is known
as verbose logging. To configure the crawler for verbose
logging, you must log in to the Oracle server using
SQL*Plus. Log in as the Ultra Search instance owner
(or any user that has been granted administrative privileges
on that instance). Once logged in, run the following commands:
- exec wk_adm.use_instance('<instance_name>');
- exec wk_crw.update_crawler_config(verbose=>1);
Database connect string
The database connect string is a standard JDBC connect string used
by the crawler when it needs to connect to the database. The format
of the connect string must be as follows: [hostname]:[port]:[SID]
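Assembling this string can be sketched in Java (the hostname, port, and SID below are placeholders, not real values). With the Oracle thin JDBC driver, this string typically forms the tail of a URL such as jdbc:oracle:thin:@hostname:port:SID.

```java
// Illustrative sketch: building the crawler's database connect string
// in the [hostname]:[port]:[SID] format. All values are placeholders.
public class ConnectString {
    static String connectString(String hostname, int port, String sid) {
        return hostname + ":" + port + ":" + sid;
    }

    public static void main(String[] args) {
        System.out.println(connectString("dbhost", 1521, "orcl"));
    }
}
```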
Use this page to view and edit remote crawler profiles. A remote crawler
profile consists of all parameters needed to run the Ultra Search crawler
on a machine other than the one hosting the Oracle Ultra Search database.
A remote crawler profile is identified by its hostname. The profile includes
the cache, log, and mail directories that the remote crawler shares
with the database machine.
To set these parameters, click on "Edit". Enter the shared directory
paths as seen by the remote crawler. You must ensure that these directories
are shared or mounted appropriately.