A spider component crawls documents rather than loading records from a file, thereby adding document crawling capabilities to your pipeline.
The spider handles URL management and content manipulation. Using spiders, you can configure a pipeline that crawls the network to which it is directed and processes the documents it finds there using the Endeca Information Transformation Layer (ITL).
Note
The Endeca Crawler, which uses the spider component, has been deprecated and support for it will be removed in a future version. It is recommended that you use the Endeca Web Crawler for Web crawls and the Endeca CAS Server for file system crawls.
To be able to crawl, a spider component depends on at least one other pipeline component. Upstream, there must be a record adapter with a Format of type "document." This record adapter must reference the spider as its URL source. The two components work together to crawl a content repository (such as a Web server, file server, or FTP server). Other pipeline components can sit between the record adapter and the spider; for example, you can place a record manipulator between them.
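The division of labor can be pictured with a short sketch. This is plain, purely illustrative Python, not Endeca pipeline syntax or any Forge API, and all function names are hypothetical: the spider role corresponds to the URL queue and filtering, the record adapter role to the document fetch, and a record manipulator could perform the link extraction.

```python
import re
import urllib.parse
import urllib.request
from collections import deque

def fetch_document(url):
    """Record adapter role: retrieve one document for each URL handed out by the spider."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")

def extract_links(html, base_url):
    """Record manipulator role: pull candidate links out of a fetched document."""
    return [urllib.parse.urljoin(base_url, href)
            for href in re.findall(r'href="([^"]+)"', html)]

def crawl(root_urls, url_filter, max_hops):
    """Spider role: manage the URL queue, filter new links, and bound the crawl."""
    queue = deque((url, 0) for url in root_urls)
    visited = set()
    while queue:
        url, hops = queue.popleft()
        if url in visited or hops > max_hops:
            continue
        visited.add(url)
        doc = fetch_document(url)
        for link in extract_links(doc, url):
            if url_filter(link):
                queue.append((link, hops + 1))
        yield url, doc  # each document flows downstream for ITL processing
```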
To add a spider component:
In the Pipeline Diagram editor, click New, and then choose Spider.
The Spider editor appears.
In the General tab, do the following:
(Optional) In the Maximum hops text box, type an integer greater than zero to limit the depth of the crawl. If this field is not set, the crawl depth is unlimited.
(Optional) In the Maximum depth text box, type an integer greater than zero to limit the depth of the URL path. If this field is not set, the URL depth is unlimited.
If you are using a robots.txt file, in the Agent name text box, type the name of the spider as it will be referred to in the User-agent field of a robots.txt file (an example robots.txt file appears after this procedure).
If the spider is performing a differential crawl, in the Differential crawl URL text box, type the location to store the spider's state (as a state.tmp file) between Forge executions.
If you do not want the spider to conform to the robots.txt standard, check Ignore robots.
If you do not want the spider to accept cookies, check Disable cookies.
In the Root URLs tab, add one or more URLs from which a crawl can be launched.
(Optional) In the URL Configuration tab, add enqueue URLs and URL filters.
(Optional) In the Comment tab, add a comment for the component.
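As an example of the Agent name setting above: if the agent name were a hypothetical value such as MySpider, a robots.txt file on a crawled server could address that spider by name using standard robots.txt syntax (the agent name and paths below are examples only).

```
User-agent: MySpider
Disallow: /private/
Disallow: /tmp/

User-agent: *
Disallow: /
```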
Implementing this feature requires additional work outside of Developer Studio. Please refer to the Endeca Forge Guide for details.
The Spider editor contains the unique name of this spider, along with the following tabs:
General

Field | Description
---|---
Maximum hops | Optional. Limits the depth of the crawl to a maximum number of hops from the root URL (see below). The value must be an integer greater than zero. If no value is provided, the maximum number of hops is unlimited.
Maximum depth | Optional. Limits the depth of the crawl to a maximum depth of URL path. The value must be an integer greater than zero. If no value is provided, the maximum path depth is unlimited.
Agent name | Identifies the name of the spider as it will be referred to in the User-agent field of a robots.txt file.
Differential crawl URL | Optional. When configuring a spider to perform a differential crawl, the differential crawl URL specifies a file location to store the spider's state between Forge executions. This file is read in at the beginning of a differential crawl to enqueue URLs from previous crawls and may be updated during the crawl.
Ignore robots | When checked, the spider does not adhere to the robots.txt standard.
Disable cookies | When checked, the spider refuses cookies sent from a host server during a crawl.
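Maximum hops and Maximum depth bound different things: hops count link traversals from a root URL, while depth looks only at the URL's path. The exact counting rules that Forge applies are described in the Endeca Forge Guide; the following plain-Python sketch (illustrative only, names hypothetical) shows the conceptual difference between the two limits.

```python
from urllib.parse import urlparse

def within_hop_limit(hops_from_root, max_hops):
    # "Maximum hops": how many links were followed to reach this document.
    return hops_from_root <= max_hops

def within_depth_limit(url, max_depth):
    # "Maximum depth": how many path segments the URL itself contains,
    # regardless of how the crawler reached it.
    segments = [s for s in urlparse(url).path.split("/") if s]
    return len(segments) <= max_depth

# A page linked directly from a root URL can pass a small hop limit even if
# its path is deep, and vice versa:
print(within_hop_limit(1, 2))                                         # True
print(within_depth_limit("http://example.com/a/b/c/d/page.html", 3))  # False
```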
Root URLs
The Root URLs tab is where you manage root URLs, which specify the location from which the spider starts crawling. There must be at least one root URL specified for each spider.
URL Configuration

Field | Description
---|---
Enqueue URLs | Optional. Takes each URL link from a specified property and adds it to the queue for further filtering.
URL filters | Optional. Provides pattern matching capabilities to control document filtering.
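The filter types and syntax that the spider accepts are documented in the Endeca Forge Guide. Conceptually, URL filters act as patterns that admit or reject each candidate URL before it is fetched; the following plain-Python regular-expression sketch (illustrative only, not Endeca filter syntax, with hypothetical patterns) shows the idea.

```python
import re

# Hypothetical patterns: crawl only HTML pages on www.example.com, but skip /archive/.
include = [re.compile(r"^https?://www\.example\.com/.*\.html$")]
exclude = [re.compile(r"/archive/")]

def passes_filters(url):
    return (any(p.search(url) for p in include)
            and not any(p.search(url) for p in exclude))

print(passes_filters("http://www.example.com/docs/intro.html"))   # True
print(passes_filters("http://www.example.com/archive/old.html"))  # False
```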
Timeout

Option | Description
---|---
Maximum time spent fetching a URL | When checked, type the time in seconds.
Maximum time to wait for a connection to be made | When checked, type the time in seconds.
Abort fetch if transfer rate falls below | When checked, type the bytes per second and the number of seconds.
Proxy

Option | Description
---|---
Proxy mode list | Choose whether to use no proxy servers, a single proxy server, or separate proxy servers for HTTP and HTTPS requests.
HTTP proxy server | The hostname and port number for the HTTP proxy server (if one is being used).
HTTPS proxy server | The hostname and port number for the HTTPS proxy server (if one is being used).
Bypass URLs | When clicked, opens the Bypass URLs editor, where you specify the list of URLs that should be fetched directly, bypassing any proxy servers.
Sources
Required. A choice of record server components in the project.
Comment
Optional. Provides a way to associate comments with a pipeline component.