globalBoundaryRules

The default boundary rules specified in this object are copied to new sources that are created with no other boundary rules.

Boundary rules restrict the crawler to those URLs that match the specified rules. Exclusion rules override inclusion rules. The order in which the rules are listed has no impact.

For file sources with no boundary rules, crawling is limited only by the underlying file system access privileges. Files accessible from the specified seed file URL are crawled to the default crawling depth.

Object Type

Universal

State Properties

None

Supported Operations

export
update

Administration GUI Page

None

XML Description

The <search:globalBoundaryRules> element describes the rules limiting the scope of the crawler. It contains these elements:

<search:globalBoundaryRules>
   <search:boundaryRules>
      <search:boundaryRule>
         <search:ruleType>
         <search:ruleOperation>
         <search:rulePattern>

Element Descriptions

<search:globalBoundaryRules>

Contains a <search:boundaryRules> element, which holds the individual boundary rules.

<search:boundaryRules>

Contains one or more <search:boundaryRule> elements.

<search:boundaryRule>

Describes a boundary rule. It contains these child elements:

<search:ruleType>
<search:ruleOperation>
<search:rulePattern>

<search:ruleType>

Type of URL boundary rule:

INCLUSION: URLs that match <search:rulePattern> are crawled.
EXCLUSION: URLs that match <search:rulePattern> are not crawled.

<search:ruleOperation>

Matching operation for a search rule pattern:

CONTAINS: The URL contains the rule pattern. The match is case-insensitive.
STARTSWITH: The URL starts with the rule pattern. The match is case-insensitive.
ENDSWITH: The URL ends with the rule pattern. The match is case-insensitive.
REGEX: The URL matches the regular expression. The match is case-sensitive.
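
As an illustration of how rule types and operations combine, the following sketch limits crawling to one site and excludes a subtree of it. The host www.example.com and the /private/ path are hypothetical, and the productVersion value is simply copied from the example later in this section; adjust both for your installation.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<search:config productVersion="11.2.1.0.0" xmlns:search="http://xmlns.oracle.com/search">
   <search:globalBoundaryRules>
      <search:boundaryRules>
         <!-- Hypothetical rule: crawl only URLs under this site -->
         <search:boundaryRule>
            <search:ruleType>INCLUSION</search:ruleType>
            <search:ruleOperation>STARTSWITH</search:ruleOperation>
            <search:rulePattern>http://www.example.com/</search:rulePattern>
         </search:boundaryRule>
         <!-- Hypothetical rule: skip any URL that contains /private/ -->
         <search:boundaryRule>
            <search:ruleType>EXCLUSION</search:ruleType>
            <search:ruleOperation>CONTAINS</search:ruleOperation>
            <search:rulePattern>/private/</search:rulePattern>
         </search:boundaryRule>
      </search:boundaryRules>
   </search:globalBoundaryRules>
</search:config>
```

Because exclusion rules override inclusion rules, a URL such as http://www.example.com/private/report.html would be skipped even though it also matches the inclusion rule.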

<search:rulePattern>

The pattern of characters in the URL. You can use these special characters:

Caret (^) denotes the beginning of a URL.
Dollar sign ($) denotes the end of a URL.
Period (.) matches any one character.
Question mark (?) after a character matches 0 or 1 occurrences of that character.
Asterisk (*) after a pattern matches 0 or more occurrences of that pattern. Enclose a multi-character pattern in parentheses () or a character set in brackets [].
Backslash (\) precedes a literal use of a special character, such as \? to match a question mark in a URL.
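
The special characters above can be combined in a single REGEX rule. The following is a minimal sketch, not a shipped default: the host www.example.com and the /docs/ path are hypothetical, and the productVersion value is copied from the example later in this section.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<search:config productVersion="11.2.1.0.0" xmlns:search="http://xmlns.oracle.com/search">
   <search:globalBoundaryRules>
      <search:boundaryRules>
         <!-- Hypothetical rule: exclude PDF files under /docs/ on one host.
              ^ and $ anchor the pattern to the whole URL, and each \.
              matches a literal period rather than any character. -->
         <search:boundaryRule>
            <search:ruleType>EXCLUSION</search:ruleType>
            <search:ruleOperation>REGEX</search:ruleOperation>
            <search:rulePattern>^http://www\.example\.com/docs/.*\.pdf$</search:rulePattern>
         </search:boundaryRule>
      </search:boundaryRules>
   </search:globalBoundaryRules>
</search:config>
```

REGEX matching is case-sensitive, so this rule as written would not match a URL ending in .PDF. The default rules shown in the Example section wrap their alternatives in (?i: ... ) to ignore case, and the same approach should work here.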

Files with the following filename extensions are excluded by the default boundary rule patterns:

Image: bmp, png, tif
Audio: wav, wma, mp3
Video: avi, wmv, mpeg, mpg
Binary: bin, cab, dll, dmp, ear, exe, iso, jar, scm, so, tar, war

Example

This XML document defines the default global boundary rules:

<?xml version="1.0" encoding="UTF-8"?>
<search:config productVersion="11.2.1.0.0" xmlns:search="http://xmlns.oracle.com/search">
   <search:globalBoundaryRules>
      <search:boundaryRules>
         <search:boundaryRule>
            <search:ruleType>EXCLUSION</search:ruleType>
            <search:ruleOperation>REGEX</search:ruleOperation>
            <search:rulePattern>
(?i:(?:\.jar)|(?:\.bmp)|(?:\.war)|(?:\.ear)|(?:\.mpg)|(?:\.wmv)|(?:\.mpeg)|(?:\.scm)|(?:\.iso)|(?:\.dmp)|(?:\.dll)|(?:\.cab)|(?:\.so)|(?:\.avi)|(?:\.wav)|(?:\.mp3)|(?:\.wma)|(?:\.bin)|(?:\.exe)|(?:\.iso)|(?:\.tar)|(?:\.png))$
            </search:rulePattern>
         </search:boundaryRule>
         <search:boundaryRule>
            <search:ruleType>EXCLUSION</search:ruleType>
            <search:ruleOperation>REGEX</search:ruleOperation>
            <search:rulePattern>\?.*(.*\+)\1{3}</search:rulePattern>
         </search:boundaryRule>
      </search:boundaryRules>
   </search:globalBoundaryRules>
</search:config>