Oracle® Secure Enterprise Search Administration API Guide 11g Release 2 (11.2.2) Part Number E23428-01
The default boundary rules specified in this object are copied to new sources that are created with no other boundary rules.
Boundary rules restrict the crawler to those URLs that match the specified rules. Exclusion rules override inclusion rules. The order in which the rules are listed has no impact.
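The evaluation order described above can be sketched in Python. This is only an illustration, not the crawler's implementation: the `Rule` class and `is_allowed` function are hypothetical names, patterns are treated as regular expressions with search semantics, and the behavior when no inclusion rule exists (anything not excluded is allowed) is an assumption.

```python
import re
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Rule:
    """Hypothetical in-memory form of one boundary rule."""
    rule_type: str  # "INCLUSION" or "EXCLUSION"
    pattern: str    # regular expression, applied with search semantics (assumption)


def is_allowed(url: str, rules: Sequence[Rule]) -> bool:
    """Return True if the URL falls inside the boundary described by the rules."""
    # Exclusion rules are checked first because they override inclusion rules.
    for rule in rules:
        if rule.rule_type == "EXCLUSION" and re.search(rule.pattern, url):
            return False
    inclusions = [r for r in rules if r.rule_type == "INCLUSION"]
    # Assumption: with no inclusion rules, any URL that is not excluded is allowed.
    if not inclusions:
        return True
    return any(re.search(r.pattern, url) for r in inclusions)


rules = [
    Rule("INCLUSION", r"^http://example\.com/docs/"),
    Rule("EXCLUSION", r"\.exe$"),
]

for url in (
    "http://example.com/docs/guide.html",
    "http://example.com/docs/setup.exe",
    "http://example.com/other/index.html",
):
    print(url, is_allowed(url, rules))
```

Because the exclusion check runs over the whole rule set before any inclusion rule is considered, reversing the list gives the same result, which mirrors the statement that rule order has no impact.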
For file sources with no boundary rules, crawling is limited only by the underlying file system access privileges. Files accessible from the specified seed file URL are crawled to the default crawling depth.
Object Type
Universal
State Properties
None
Supported Operations
export
update
Administration GUI Page
None
XML Description
The <search:globalBoundaryRules> element describes the rules limiting the scope of the crawler. It contains these elements:
<search:globalBoundaryRules>
   <search:boundaryRules>
      <search:boundaryRule>
         <search:ruleType>
         <search:ruleOperation>
         <search:rulePattern>
Element Descriptions
<search:globalBoundaryRules>
Contains one or more <search:boundaryRule> elements, each describing a boundary rule.
<search:boundaryRules>
Contains one or more <search:boundaryRule> elements.
<search:boundaryRule>
Describes a boundary rule. It contains these child elements:
<search:ruleType>
<search:ruleOperation>
<search:rulePattern>
<search:ruleType>
Type of URL boundary rule:
INCLUSION: URLs that match <search:rulePattern> are eligible for crawling.
EXCLUSION: URLs that match <search:rulePattern> are not crawled.
<search:ruleOperation>
Matching operation for a search rule pattern:
CONTAINS: The URL contains the rule pattern (case-insensitive match).
STARTSWITH: The URL starts with the rule pattern (case-insensitive match).
ENDSWITH: The URL ends with the rule pattern (case-insensitive match).
REGEX: The URL matches the regular expression (case-sensitive match).
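The four matching operations can be approximated in Python as follows. This is a sketch under stated assumptions: `rule_matches` is a hypothetical helper, Python's `re` module stands in for the product's regular-expression engine, and REGEX is applied with search semantics (a match anywhere in the URL unless the pattern is anchored).

```python
import re


def rule_matches(operation: str, pattern: str, url: str) -> bool:
    """Approximate one boundary-rule matching operation against a URL."""
    if operation == "CONTAINS":
        # Case-insensitive substring test.
        return pattern.lower() in url.lower()
    if operation == "STARTSWITH":
        # Case-insensitive prefix test.
        return url.lower().startswith(pattern.lower())
    if operation == "ENDSWITH":
        # Case-insensitive suffix test.
        return url.lower().endswith(pattern.lower())
    if operation == "REGEX":
        # Case-sensitive regular-expression search (search semantics assumed).
        return re.search(pattern, url) is not None
    raise ValueError("unknown rule operation: " + operation)


url = "http://example.com/Archive/report.pdf"
print(rule_matches("CONTAINS", "/archive/", url))
print(rule_matches("STARTSWITH", "HTTP://EXAMPLE.COM", url))
print(rule_matches("ENDSWITH", ".PDF", url))
print(rule_matches("REGEX", r"\.PDF$", url))
```

Only the REGEX operation distinguishes case here, so `\.PDF$` does not match a URL ending in `.pdf` unless the pattern itself switches on case-insensitivity.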
<search:rulePattern>
The pattern of characters in the URL. You can use these special characters:
Caret (^) denotes the beginning of a URL.
Dollar sign ($) denotes the end of a URL.
Period (.) matches any one character.
Question mark (?) after a character matches 0 or 1 occurrences of that character.
Asterisk (*) after a pattern matches 0 or more occurrences of that pattern. Enclose the pattern in parentheses (), brackets [], or braces {}.
Backslash (\) precedes a literal use of a special character, such as \? to match a question mark in a URL.
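The special characters listed above can be tried out with a short Python script. This is an illustrative sketch that uses Python's `re` module as a stand-in for the product's regular-expression engine, with the `?` and `*` quantifiers written after the item they apply to, as in standard regular-expression syntax. The `found` helper and the sample URLs are hypothetical.

```python
import re


def found(pattern: str, url: str) -> bool:
    """Return True if the pattern matches somewhere in the URL."""
    return re.search(pattern, url) is not None


# Caret anchors the pattern at the beginning of the URL.
starts = found(r"^http://example\.com/", "http://example.com/a.html")

# Dollar sign anchors the pattern at the end of the URL.
ends = found(r"\.html$", "http://example.com/index.html")
ends_with_query = found(r"\.html$", "http://example.com/index.html?x=1")

# Period matches any one character.
any_char = found(r"page.\.html", "http://example.com/page1.html")

# Question mark: 0 or 1 occurrences of the preceding character.
optional_s = [found(r"^https?://", u) for u in ("http://a/", "https://a/")]

# Asterisk: 0 or more occurrences of the parenthesized pattern.
repeated = [
    found(r"/docs(/v1)*/index", u)
    for u in ("http://example.com/docs/index.html",
              "http://example.com/docs/v1/v1/index.html")
]

# Backslash makes the question mark literal.
literal_q = found(r"search\?q=", "http://example.com/search?q=ses")

print(starts, ends, ends_with_query, any_char, optional_s, repeated, literal_q)
```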
Files with the following filename extensions are excluded by the default boundary rule patterns:
Image: bmp, png, tif
Audio: wav, wma, mp3
Video: avi, wmv, mpeg, mpg
Binary: bin, cab, dll, dmp, ear, exe, iso, jar, scm, so, tar, war
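To see how the default extension rule behaves, the following Python sketch applies the exclusion pattern from the default rules example in this section to a few sample URLs. The pattern is copied from that example; the `is_excluded_by_default` helper and the URLs are hypothetical, and Python's `re` module is assumed to treat this pattern the same way the crawler's regular-expression engine does.

```python
import re

# Pattern copied from the default global boundary rules example.
# (?i: ... ) makes the extension list case-insensitive; $ anchors it at the end of the URL.
DEFAULT_EXTENSION_PATTERN = (
    r"(?i:(?:\.jar)|(?:\.bmp)|(?:\.war)|(?:\.ear)|(?:\.mpg)|(?:\.wmv)|(?:\.mpeg)"
    r"|(?:\.scm)|(?:\.iso)|(?:\.dmp)|(?:\.dll)|(?:\.cab)|(?:\.so)|(?:\.avi)"
    r"|(?:\.wav)|(?:\.mp3)|(?:\.wma)|(?:\.bin)|(?:\.exe)|(?:\.iso)|(?:\.tar)|(?:\.png))$"
)

_compiled = re.compile(DEFAULT_EXTENSION_PATTERN)


def is_excluded_by_default(url: str) -> bool:
    """Return True if the URL ends with one of the extensions in the pattern."""
    return _compiled.search(url) is not None


for url in (
    "http://example.com/downloads/setup.EXE",
    "http://example.com/lib/libfoo.so",
    "http://example.com/index.html",
    "http://example.com/photo.png?size=large",
):
    print(url, is_excluded_by_default(url))
```

Because the pattern is anchored with `$`, a URL such as `photo.png?size=large` is not excluded by this rule: the extension is no longer at the end of the URL.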
Example
This XML document defines the default global boundary rules:
<?xml version="1.0" encoding="UTF-8"?>
<search:config productVersion="11.2.1.0.0" xmlns:search="http://xmlns.oracle.com/search">
   <search:globalBoundaryRules>
      <search:boundaryRules>
         <search:boundaryRule>
            <search:ruleType>EXCLUSION</search:ruleType>
            <search:ruleOperation>REGEX</search:ruleOperation>
            <search:rulePattern>
               (?i:(?:\.jar)|(?:\.bmp)|(?:\.war)|(?:\.ear)|(?:\.mpg)|(?:\.wmv)|(?:\.mpeg)|(?:\.scm)|(?:\.iso)|(?:\.dmp)|(?:\.dll)|(?:\.cab)|(?:\.so)|(?:\.avi)|(?:\.wav)|(?:\.mp3)|(?:\.wma)|(?:\.bin)|(?:\.exe)|(?:\.iso)|(?:\.tar)|(?:\.png))$
            </search:rulePattern>
         </search:boundaryRule>
         <search:boundaryRule>
            <search:ruleType>EXCLUSION</search:ruleType>
            <search:ruleOperation>REGEX</search:ruleOperation>
            <search:rulePattern>\?.*(.*\+)\1{3}</search:rulePattern>
         </search:boundaryRule>
      </search:boundaryRules>
   </search:globalBoundaryRules>
</search:config>
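A document of this shape can be read with Python's standard `xml.etree.ElementTree` module, for example to inspect exported rules before editing them. The sketch below is an assumption-laden illustration: `read_rules` is a hypothetical helper, and the embedded XML keeps only the second rule from the example above to stay short. It also tries that rule's pattern, which matches a query string containing a segment ending in `+` that repeats four times in a row.

```python
import re
import xml.etree.ElementTree as ET

NS = {"search": "http://xmlns.oracle.com/search"}

# Shortened copy of the example document (second rule only).
CONFIG_XML = r"""<?xml version="1.0" encoding="UTF-8"?>
<search:config productVersion="11.2.1.0.0" xmlns:search="http://xmlns.oracle.com/search">
  <search:globalBoundaryRules>
    <search:boundaryRules>
      <search:boundaryRule>
        <search:ruleType>EXCLUSION</search:ruleType>
        <search:ruleOperation>REGEX</search:ruleOperation>
        <search:rulePattern>\?.*(.*\+)\1{3}</search:rulePattern>
      </search:boundaryRule>
    </search:boundaryRules>
  </search:globalBoundaryRules>
</search:config>
"""


def read_rules(xml_text):
    """Return (ruleType, ruleOperation, rulePattern) tuples from the document."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    rules = []
    for rule in root.iterfind(".//search:boundaryRule", NS):
        rules.append((
            rule.findtext("search:ruleType", default="", namespaces=NS).strip(),
            rule.findtext("search:ruleOperation", default="", namespaces=NS).strip(),
            # Strip surrounding whitespace, which the example uses around long patterns.
            rule.findtext("search:rulePattern", default="", namespaces=NS).strip(),
        ))
    return rules


rules = read_rules(CONFIG_XML)
pattern = rules[0][2]

# A query string with the segment "a+" repeated four times matches the rule.
repeating = re.search(pattern, "http://example.com/list?q=a+a+a+a+") is not None
# A query string without such a repetition does not.
ordinary = re.search(pattern, "http://example.com/list?q=a+b+c") is not None

print(rules, repeating, ordinary)
```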