Oracle® Secure Enterprise Search Administration API Guide
11g Release 2 (11.2.2)

Part Number E23428-01

globalBoundaryRules

The default boundary rules specified in this object are copied to new sources that are created with no other boundary rules.

Boundary rules restrict the crawler to those URLs that match the specified rules. Exclusion rules override inclusion rules. The order in which the rules are listed has no impact.
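As an illustration of how inclusion and exclusion rules combine, the following sketch limits crawling to one site while skipping an archive area. It uses the element names described under XML Description. The host name www.example.com and the /archive/ path are hypothetical, the INCLUSION value is assumed to be the counterpart of the EXCLUSION value used in the Example section, and the productVersion value is copied from that example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<search:config productVersion="11.2.1.0.0" xmlns:search="http://xmlns.oracle.com/search">
   <search:globalBoundaryRules>
      <search:boundaryRules>
         <!-- Hypothetical rule: accept URLs on one site (INCLUSION is an assumed value) -->
         <search:boundaryRule>
            <search:ruleType>INCLUSION</search:ruleType>
            <search:ruleOperation>STARTSWITH</search:ruleOperation>
            <search:rulePattern>http://www.example.com/</search:rulePattern>
         </search:boundaryRule>
         <!-- Hypothetical rule: skip the archive area, even though it also matches the rule above -->
         <search:boundaryRule>
            <search:ruleType>EXCLUSION</search:ruleType>
            <search:ruleOperation>CONTAINS</search:ruleOperation>
            <search:rulePattern>/archive/</search:rulePattern>
         </search:boundaryRule>
      </search:boundaryRules>
   </search:globalBoundaryRules>
</search:config>
```

Under these assumptions, http://www.example.com/products/index.html would be crawled, while http://www.example.com/archive/2009/index.html would be skipped because the exclusion rule overrides the inclusion rule. Swapping the two rules should not change the outcome, because rule order has no impact.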

For file sources with no boundary rules, crawling is limited only by the underlying file system access privileges. Files accessible from the specified seed file URL are crawled to the default crawling depth.

Object Type

Universal

State Properties

None

Supported Operations

export
update

Administration GUI Page

None

XML Description

The <search:globalBoundaryRules> element describes the rules limiting the scope of the crawler. It contains these elements:

<search:globalBoundaryRules>
   <search:boundaryRules>
      <search:boundaryRule>
         <search:ruleType>
         <search:ruleOperation>
         <search:rulePattern>

Element Descriptions 

<search:globalBoundaryRules>

Contains a <search:boundaryRules> element, which holds the default boundary rules.

<search:boundaryRules>

Contains one or more <search:boundaryRule> elements.

<search:boundaryRule>

Describes a boundary rule. It contains these child elements:

  • <search:ruleType>

  • <search:ruleOperation>

  • <search:rulePattern>

<search:ruleType>

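Type of URL boundary rule:

  • INCLUSION: The crawler accepts only URLs that match the rule pattern.

  • EXCLUSION: The crawler skips URLs that match the rule pattern.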

<search:ruleOperation>

Matching operation for a search rule pattern:

  • CONTAINS: The URL contains the rule pattern. The match is case-insensitive.

  • STARTSWITH: The URL starts with the rule pattern. The match is case-insensitive.

  • ENDSWITH: The URL ends with the rule pattern. The match is case-insensitive.

  • REGEX: The URL matches the regular expression in the rule pattern. The match is case-sensitive.
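The difference in case sensitivity can be sketched with two rules that both target PDF files. The patterns are hypothetical, the sketch assumes that the pattern of an ENDSWITH rule is treated as literal text, and the (?i:...) group is borrowed from the default rule shown in the Example section.

```xml
<search:boundaryRules>
   <!-- ENDSWITH is case-insensitive, so this would match both report.pdf and REPORT.PDF -->
   <search:boundaryRule>
      <search:ruleType>EXCLUSION</search:ruleType>
      <search:ruleOperation>ENDSWITH</search:ruleOperation>
      <search:rulePattern>.pdf</search:rulePattern>
   </search:boundaryRule>
   <!-- REGEX is case-sensitive; the (?i:...) group restores case-insensitive matching -->
   <search:boundaryRule>
      <search:ruleType>EXCLUSION</search:ruleType>
      <search:ruleOperation>REGEX</search:ruleOperation>
      <search:rulePattern>(?i:\.pdf)$</search:rulePattern>
   </search:boundaryRule>
</search:boundaryRules>
```

Without the (?i:...) group, a REGEX pattern of \.pdf$ would be expected to miss URLs that end in .PDF.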

<search:rulePattern>

The pattern of characters in the URL. You can use these special characters:

  • Caret (^) denotes the beginning of a URL.

  • Dollar sign ($) denotes the end of a URL.

  • Period (.) matches any one character.

  • Question mark (?) after a character matches 0 or 1 occurrences of that character.

  • Asterisk (*) after a pattern matches 0 or more occurrences of that pattern. Enclose the pattern in parentheses (), brackets [], or braces {}.

  • Backslash (\) precedes a literal use of a special character, such as \? to match a question mark in a URL.
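The special characters above can be combined in a single REGEX rule. The following fragment is a sketch only: the host name and path are hypothetical, INCLUSION is an assumed rule type value, and the pattern is assumed to be applied to the complete URL string.

```xml
<search:boundaryRule>
   <search:ruleType>INCLUSION</search:ruleType>
   <search:ruleOperation>REGEX</search:ruleOperation>
   <!-- ^ and $ anchor the pattern to the start and end of the URL;
        https? matches both http and https;
        \. is a literal period, and .* is any run of characters -->
   <search:rulePattern>^https?://www\.example\.com/docs/.*\.html$</search:rulePattern>
</search:boundaryRule>
```

Under these assumptions, the pattern would match http://www.example.com/docs/guide/intro.html and its https equivalent, but not http://www.example.com/docs/guide/intro.htm, and not a URL ending in .HTML, because REGEX matching is case-sensitive.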

Files with the following filename extensions are excluded by the default boundary rule patterns:

  • Image: bmp, png, tif

  • Audio: wav, wma, mp3

  • Video: avi, wmv, mpeg, mpg

  • Binary: bin, cab, dll, dmp, ear, exe, iso, jar, scm, so, tar, war

Example

This XML document defines the default global boundary rules:

<?xml version="1.0" encoding="UTF-8"?>
<search:config productVersion="11.2.1.0.0" xmlns:search="http://xmlns.oracle.com/search">
   <search:globalBoundaryRules>
      <search:boundaryRules>
         <search:boundaryRule>
            <search:ruleType>EXCLUSION</search:ruleType>
            <search:ruleOperation>REGEX</search:ruleOperation>
            <search:rulePattern>
(?i:(?:\.jar)|(?:\.bmp)|(?:\.war)|(?:\.ear)|(?:\.mpg)|(?:\.wmv)|(?:\.mpeg)|(?:\.scm)|(?:\.iso)|(?:\.dmp)|(?:\.dll)|(?:\.cab)|(?:\.so)|(?:\.avi)|(?:\.wav)|(?:\.mp3)|(?:\.wma)|(?:\.bin)|(?:\.exe)|(?:\.iso)|(?:\.tar)|(?:\.png))$
            </search:rulePattern>
         </search:boundaryRule>
         <search:boundaryRule>
            <search:ruleType>EXCLUSION</search:ruleType>
            <search:ruleOperation>REGEX</search:ruleOperation>
            <search:rulePattern>\?.*(.*\+)\1{3}</search:rulePattern>
         </search:boundaryRule>
      </search:boundaryRules>
   </search:globalBoundaryRules>
</search:config>