Oracle® Secure Enterprise Search Administration API Guide
11g Release 1 (11.1.2.0.0)

E14133-02

source

Sources are collections of data to be searched, such as Web sites, files, database tables, content management repositories, collaboration repositories, and applications.

Note:

The current release of the Oracle SES Administration API supports these source types:
  • File

  • Federated

  • User Defined

  • Web

Object Type

Creatable

Object Key

name

Object Key Command Syntax

--NAME=object_name

-n object_name

State Properties

None

Supported Operations

create
createAll
delete
deleteAll
deleteList
export
exportAll
exportList
getAllObjectKeys
update
updateAll

Administration GUI Page


Home - Sources - Create or Edit Source
Home - Sources - Customize Federated Source

XML Descriptions

Each supported source type has a unique XML description:

XML Description: Federated Sources

For a federated source, the <search:sources> element contains a <search:federatedSource> element:

<search:sources>
   <search:federatedSource>
      <search:name>
      <search:url>
      <search:security>
         <search:entityName>
         <search:entityPassword>
         <search:authAttribute>
      <search:queryRouting>
         <search:filterRule>
      <search:searchRestrictions>
         <search:groupRestrictedEnabled>
         <search:searchedGroups>
            <search:fedSourceGroup> 
      <search:attributeRetrieval>
         <search:retrievedAttrs>
            <search:fedSearchAttr>
      <search:attributeMappings>
         <search:attributeMapping>
            <search:localAttribute>
            <search:remoteAttribute>

Element Descriptions 

<search:sources>

Contains one or more source descriptions.

<search:federatedSource>

Describes a federated source. It contains these elements:

<search:name>
<search:url>
<search:security>
<search:queryRouting>
<search:searchRestrictions>
<search:attributeRetrieval>
<search:attributeMappings>

<search:name>

Contains the name of the source. (Required)

<search:url>

Contains the Web service URL.

<search:security>

Describes security for connecting to the federated source. It contains these child elements:

<search:entityName>
<search:entityPassword>
<search:authAttribute>

<search:entityName>

Contains the name of the federation trusted entity on the federation endpoint. Contact the administrator of the federated endpoint for this information.

<search:entityPassword>

Contains the password for the entity name.

Attribute Value
encrypted Indicates whether the value of <search:entityPassword> is encrypted. Set to true if the password is encrypted, or set to false if it is plain text.

<search:authAttribute>

Contains the name of an attribute that identifies and can authenticate a user on the federation endpoint.

<search:queryRouting>

Describes the rules for routing queries to the federated source. Without any rules, Oracle SES routes all queries to the federated source. This element is optional, but can improve scalability. It contains a <search:filterRule> element.

<search:filterRule>

Contains the rules within a CDATA section. Each rule consists of an attribute, a colon (:), and an expression. Attributes can be DATE, STRING, or NUMBER. DATE and NUMBER attributes can include these operators: -, =, >, >=, <, <=. Use the AND or OR operators to separate multiple rules.

<search:searchRestrictions>

Restricts searches to a list of source groups. It contains these child elements:

<search:groupRestrictedEnabled>
<search:searchedGroups>

<search:groupRestrictedEnabled>

Controls whether source groups are restricted during searches. Set to true to restrict searches, or set to false otherwise. The default value is false. (Optional)

<search:searchedGroups>

Describes the source groups to be searched on the federated source. It contains one or more <search:fedSourceGroup> elements.

<search:fedSourceGroup>

Empty element that uses attributes to identify a source group. (Read only)

Attribute Value
isAvailable Identifies whether the source group is currently available in the federated source.
name Name of a federated source group. (Required)

<search:attributeRetrieval>

Describes the attributes to be retrieved from the federated source. It contains a <search:retrievedAttrs> element.

<search:retrievedAttrs>

Contains one or more <search:fedSearchAttr> elements.

<search:fedSearchAttr>

Empty element that uses attributes to identify a search attribute.

Attribute Value
name Name of a search attribute. (Required)
type Data type of the attribute: STRING, NUMBER, or DATE.
isAvailable Identifies whether the attribute is currently available in the federated source: true if it is available, or false otherwise.
isMandatory Identifies whether retrieval of the attribute is mandatory: true if it must be listed in the <search:retrievedAttrs> element, or false if it can be omitted without causing an error.

<search:attributeMappings>

Contains one or more <search:attributeMapping> elements.

<search:attributeMapping>

Maps a local attribute to a remote attribute. It contains one of each of these elements:

<search:localAttribute>
<search:remoteAttribute>

<search:localAttribute>

Identifies the local attribute being mapped.

Attribute Value
name Name of the local attribute. (Required)
type Data type of the local attribute: STRING, NUMBER, or DATE. (Required)

<search:remoteAttribute>

Identifies the remote attribute being mapped.

Attribute Value
name Name of the remote attribute. (Required)
type Data type of the remote attribute: STRING, NUMBER, or DATE. (Required)
isAvailable Identifies whether the remote attribute is currently available in the federated source: true if it is available, or false otherwise.
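
Example 2-1 does not include attribute mappings, so the following fragment sketches how the mapping elements described above might fit together. It assumes that <search:localAttribute> and <search:remoteAttribute> are empty elements that carry only the required name and type attributes. The attribute names Author and creator are hypothetical, and the namespace declaration is repeated on the fragment root only so that the fragment parses on its own.

```xml
<!-- Fragment only; in a full document this sits inside <search:federatedSource>. -->
<search:attributeMappings xmlns:search="http://xmlns.oracle.com/search">
  <search:attributeMapping>
    <!-- Hypothetical local search attribute -->
    <search:localAttribute name="Author" type="STRING"/>
    <!-- Hypothetical attribute exposed by the federated endpoint -->
    <search:remoteAttribute name="creator" type="STRING"/>
  </search:attributeMapping>
</search:attributeMappings>
```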

Example 2-1 Federated Source Description

This XML document describes a federated source:

<?xml version="1.0" encoding="UTF-8"?>
<search:config productVersion="11.1.2.0.0" xmlns:search="http://xmlns.oracle.com/search">
  <search:sources>
    <search:federatedSource>
      <search:name>fed1</search:name>
      <search:url>http://example:7777/search/query/OracleSearch</search:url>
      <search:security>
        <search:entityName>entity2</search:entityName>
        <search:entityPassword encrypted="false">password</search:entityPassword>
        <search:authAttribute>nickname</search:authAttribute>
      </search:security>
      <search:queryRouting>
        <search:filterRule>
          <![CDATA[
          (language:en) AND (idm::mail:a.*)
          ]]>
        </search:filterRule>
      </search:queryRouting>
      <search:searchRestrictions>
        <search:groupRestrictedEnabled>true</search:groupRestrictedEnabled>
        <search:searchedGroups>
          <search:fedSourceGroup isAvailable="true" name="FILE"/>
          <search:fedSourceGroup isAvailable="true" name="Web"/>
        </search:searchedGroups>
      </search:searchRestrictions>
      <search:attributeRetrieval>
        <search:retrievedAttrs>
          <search:fedSearchAttr type="STRING" isAvailable="true"
            isMandatory="true" name="Author"/>
          <search:fedSearchAttr type="STRING" isAvailable="true"
            isMandatory="true" name="Description"/>
          <search:fedSearchAttr type="STRING" isAvailable="true"
            isMandatory="true" name="Infosource"/>
          <search:fedSearchAttr type="STRING" isAvailable="true"
            isMandatory="true" name="Infosource Path"/>
          <search:fedSearchAttr type="STRING" isAvailable="true"
            isMandatory="true" name="Language"/>
          <search:fedSearchAttr type="DATE" isAvailable="true" 
            isMandatory="true" name="LastModifiedDate"/>
          <search:fedSearchAttr type="STRING" isAvailable="true"
            isMandatory="true" name="Mimetype"/>
          <search:fedSearchAttr type="STRING" isAvailable="true"
            isMandatory="true" name="Title"/>
          <search:fedSearchAttr type="STRING" isAvailable="true"
            isMandatory="true" name="Url"/>
          <search:fedSearchAttr type="STRING" isAvailable="true"
            isMandatory="false" name="custom1"/>
          <search:fedSearchAttr type="STRING" isAvailable="true"
            isMandatory="false" name="custom2"/>
          <search:fedSearchAttr type="NUMBER" isAvailable="true"
            isMandatory="true" name="eqdocid"/>
          <search:fedSearchAttr type="STRING" isAvailable="true"
            isMandatory="true" name="eqfedid"/>
          <search:fedSearchAttr type="STRING" isAvailable="true"
            isMandatory="true" name="eqsnippet"/>
        </search:retrievedAttrs>
      </search:attributeRetrieval>
    </search:federatedSource>
  </search:sources>
</search:config>

XML Description: File Sources

For a file source, the <search:sources> element contains a <search:fileSource> element:

<search:sources>
   <search:fileSource>
      <search:name>
      <search:fileDisplayUrl>
         <search:fileUrlPrefix>
         <search:displayUrlPrefix>
      <search:startingUrls>
         <search:startingUrl>
            <search:url>
      <search:aclPolicy>
      <search:authorizationPlugin>
      <search:boundaryRules>
      <search:attributeMappings>
         <search:attributeMapping>
            <search:documentAttr>
            <search:searchAttr>
      <search:crawlerSettings>
         <search:numThreads>
         <search:languageDetection>
         <search:defaultLanguage>
         <search:crawlTimeout>
         <search:maxDocumentSize>
         <search:preserveDocumentCache>
         <search:defaultCharSet>
         <search:servicePipeline> 
            <search:pipelineName>
      <search:documentTypes>
         <search:documentType>
            <search:mimeType>

Element Descriptions 

<search:sources>

Contains one or more source descriptions.

<search:fileSource>

Describes a file source. It contains these elements:

<search:name>
<search:fileDisplayUrl>
<search:startingUrls>
<search:aclPolicy>
<search:authorizationPlugin>
<search:boundaryRules>
<search:attributeMappings>
<search:crawlerSettings>
<search:documentTypes>

<search:name>

Contains the name of the file source.

<search:fileDisplayUrl>

Identifies a physical path that is replaced by a display URL for security reasons when the file is retrieved during a search.

Attribute Value
enabled Controls whether the display URL prefix is used for security reasons. Set to true to use the display URL, or set to false to display the physical location of the file. (Required)

<search:fileUrlPrefix>

Contains the physical file URL to be replaced by the display URL.

<search:displayUrlPrefix>

Contains a URL prefix displayed instead of the file URL.
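
Example 2-2 shows only the disabled form of this element. The following fragment is a sketch of the enabled form, built from the three elements described above. The file URL prefix is borrowed from Example 2-2, the display URL prefix is a hypothetical value, and the namespace declaration is repeated on the fragment root only so that the fragment parses on its own.

```xml
<!-- Fragment only; in a full document this sits inside <search:fileSource>. -->
<search:fileDisplayUrl enabled="true"
    xmlns:search="http://xmlns.oracle.com/search">
  <!-- Physical location of the crawled files -->
  <search:fileUrlPrefix>file://localhost/startingDirectory/</search:fileUrlPrefix>
  <!-- Hypothetical prefix displayed in place of the physical location -->
  <search:displayUrlPrefix>http://www.example.com/docs/</search:displayUrlPrefix>
</search:fileDisplayUrl>
```

With these values, a file crawled under file://localhost/startingDirectory/ would be expected to appear in search results under http://www.example.com/docs/ instead.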

<search:startingUrls>

Identifies the file paths where the crawler begins. It contains one or more <search:startingUrl> elements.

<search:startingUrl>

Contains a <search:url> element.

<search:url>

Contains an entry point for starting to crawl files. The URL must be in its original form as an unencoded file path.

<search:aclPolicy>

Describes an authorization policy for the source. See "XML Description: Web Sources".

<search:authorizationPlugin>

Describes the authorization plug-in. See "XML Description: User-Defined Sources".

<search:boundaryRules>

Describes the boundary rules for the source. See "XML Description: Web Sources".

<search:attributeMappings>

Maps the document attributes to search attributes. It contains one or more <search:attributeMapping> elements.

<search:attributeMapping>

Contains a document attribute and a search attribute for mapping. It contains one of each of these child elements:

<search:documentAttr>
<search:searchAttr>

<search:documentAttr>

Identifies a document attribute by its name and data type.

Attribute Value
name Name of a document attribute.
type Data type of the attribute: DATE, NUMBER, or STRING.

<search:searchAttr>

Identifies a search attribute by its name and data type. Search attributes are displayed to users in the Oracle SES Search interface.

Attribute Value
name Name of a search attribute.
type Data type of the attribute: DATE, NUMBER, or STRING.

<search:crawlerSettings>

Configures the crawler. It contains these child elements:

<search:numThreads>
<search:languageDetection>
<search:defaultLanguage>
<search:crawlTimeout>
<search:maxDocumentSize>
<search:preserveDocumentCache>
<search:defaultCharSet>
<search:servicePipeline>

<search:numThreads>

Contains the number of simultaneous processes available for crawling.

<search:languageDetection>

Controls the use of a language detector when the metadata for a document does not identify the language.

Attribute Value
enabled Controls use of language detection when a source document does not indicate the language in the header. Set to true to enable language detection, or set to false otherwise. (Required)

<search:defaultLanguage>

Default language used by the crawler when the document language is not identified.

<search:crawlTimeout>

Contains the number of milliseconds allowed for the target site to return a document.

<search:maxDocumentSize>

Contains the maximum document size in megabytes. Larger documents are not crawled.

<search:preserveDocumentCache>

Controls retention of the document cache after indexing.

Attribute Value
enabled Set to true to retain the cache, or set to false otherwise. (Required)

<search:defaultCharSet>

Code for the default character set, which is used when a source document does not identify its character set in the metadata.

<search:servicePipeline>

Controls use of a document service pipeline. When enabled, this element contains a <search:pipelineName> element.

Attribute Value
enabled Set to true to use the pipeline, or set to false otherwise. (Required)

<search:pipelineName>

Contains the name of the pipeline.

<search:documentTypes>

Identifies the types of documents to be crawled. It contains one or more <search:documentType> elements.

<search:documentType>

Contains one or more <search:mimeType> elements.

<search:mimeType>

Contains the Internet media type of the content in the form type/subtype. See Table 2-1, "Document Formats" for supported MIME types.

Example 2-2 File Source Description

This XML document describes a file source:

<?xml version="1.0" encoding="UTF-8"?>
<search:config productVersion="11.1.2.0.0" xmlns:search="http://xmlns.oracle.com/search">
   <search:sources>
      <search:fileSource>
         <search:name>Document Library</search:name>
         <search:fileDisplayUrl enabled="false"/>
         <search:startingUrls>
            <search:startingUrl>
               <search:url>file://localhost/startingDirectory/</search:url>
            </search:startingUrl>
         </search:startingUrls>
         <search:aclPolicy>
            <search:noACL/>
         </search:aclPolicy>
         <search:attributeMappings>
            <search:attributeMapping>
               <search:documentAttr name="AUTHOR" type="STRING"/>
               <search:searchAttr name="Author" type="STRING"/>
            </search:attributeMapping>
            <search:attributeMapping>
               <search:documentAttr name="CREATOR" type="STRING"/>
               <search:searchAttr name="Author" type="STRING"/>
            </search:attributeMapping>
            <search:attributeMapping>
               <search:documentAttr name="DESCRIPTION" type="STRING"/>
               <search:searchAttr name="Description" type="STRING"/>
            </search:attributeMapping>
            <search:attributeMapping>
               <search:documentAttr name="HOST" type="STRING"/>
               <search:searchAttr name="Host" type="STRING"/>
            </search:attributeMapping>
            <search:attributeMapping>
               <search:documentAttr name="INFOSOURCE" type="STRING"/>
               <search:searchAttr name="Infosource" type="STRING"/>
            </search:attributeMapping>
            <search:attributeMapping>
               <search:documentAttr name="KEYWORD" type="STRING"/>
               <search:searchAttr name="Keywords" type="STRING"/>
            </search:attributeMapping>
            <search:attributeMapping>
               <search:documentAttr name="KEYWORDS" type="STRING"/>
               <search:searchAttr name="Keywords" type="STRING"/>
            </search:attributeMapping>
            <search:attributeMapping>
               <search:documentAttr name="LANGUAGE" type="STRING"/>
               <search:searchAttr name="Language" type="STRING"/>
            </search:attributeMapping>
            <search:attributeMapping>
               <search:documentAttr name="LASTMODIFIEDDATE" type="DATE"/>
               <search:searchAttr name="LastModifiedDate" type="DATE"/>
            </search:attributeMapping>
            <search:attributeMapping>
               <search:documentAttr name="MIMETYPE" type="STRING"/>
               <search:searchAttr name="Mimetype" type="STRING"/>
            </search:attributeMapping>
            <search:attributeMapping>
               <search:documentAttr name="SUBJECT" type="STRING"/>
               <search:searchAttr name="Subject" type="STRING"/>
            </search:attributeMapping>
            <search:attributeMapping>
               <search:documentAttr name="SUBJECTS" type="STRING"/>
               <search:searchAttr name="Subject" type="STRING"/>
            </search:attributeMapping>
            <search:attributeMapping>
               <search:documentAttr name="TITLE" type="STRING"/>
               <search:searchAttr name="Title" type="STRING"/>
            </search:attributeMapping>
         </search:attributeMappings>
         <search:crawlerSettings>
            <search:numThreads>5</search:numThreads>
            <search:languageDetection enabled="false"/>
            <search:defaultLanguage>en</search:defaultLanguage>
            <search:crawlTimeout>30</search:crawlTimeout>
            <search:maxDocumentSize>10</search:maxDocumentSize>
            <search:preserveDocumentCache enabled="true"/>
            <search:defaultCharSet>8859_1</search:defaultCharSet>
            <search:servicePipeline enabled="true">
               <search:pipelineName>Default pipeline</search:pipelineName>
            </search:servicePipeline>
         </search:crawlerSettings>
         <search:documentTypes>
            <search:documentType>
               <search:mimeType>text/html</search:mimeType>
            </search:documentType>
            <search:documentType>
               <search:mimeType>text/plain</search:mimeType>
            </search:documentType>
            <search:documentType>
               <search:mimeType>text/xml</search:mimeType>
            </search:documentType>
         </search:documentTypes>
      </search:fileSource>
   </search:sources>
</search:config>

XML Description: User-Defined Sources

For a user-defined source, the <search:sources> element contains a <search:userDefinedSource> element:

<search:sources>
   <search:userDefinedSource>
      <search:name>
      <search:sourceTypeName>
      <search:aclPolicy>
      <search:authorizationPlugin>
         <search:managerClassName>
         <search:jarFilePath>
         <search:parameters>
            <search:parameter>
      <search:securityAttrs>
         <search:securityAttr>
      <search:parameters>
         <search:parameter>
            <search:value>
      <search:boundaryRules>
      <search:attributeMappings>
      <search:crawlerSettings>
      <search:documentTypes>
         <search:documentType>
            <search:mimeType>

Element Descriptions 

<search:sources>

Describes one or more sources.

<search:userDefinedSource>

Describes a user-defined source. It contains these child elements:

<search:name>
<search:sourceTypeName>
<search:aclPolicy>
<search:authorizationPlugin>
<search:securityAttrs>
<search:parameters>
<search:boundaryRules>
<search:attributeMappings>
<search:crawlerSettings>
<search:documentTypes>

<search:name>

Name of the user-defined source.

<search:sourceTypeName>

Type of user-defined source. Set to one of the following source types exactly as shown. For a complete list of user-defined source types, issue an exportAll sourceType command.

Database
EMC Documentum Content Server
EMC Documentum eRoom
Federated User Authorization Cache
FileNet Content Engine
FileNet Image Services
Hummingbird
IBM DB2
Lotus Notes
Microsoft Exchange
Microsoft SharePoint 2007
NTFS
Open Text Livelink
Oracle Calendar
Oracle Collaboration Suite E-Mail
Oracle Content Database
Oracle Content Database (JDBC)
Oracle Content Server
Oracle E-Business Suite
Oracle Fusion
Oracle WebCenter
Siebel 7.8
Siebel 7.8(Public)
Siebel 8
User Authorization Cache
User-Defined Source Type

<search:aclPolicy>

See "XML Description: Web Sources".

<search:authorizationPlugin>

Describes an authorization plug-in. It contains these elements:

<search:managerClassName>
<search:jarFilePath>
<search:parameters>

<search:managerClassName>

Contains the name of the plug-in manager Java class.

<search:jarFilePath>

Contains the qualified name of the JAR file. Paths can be absolute or relative to the ORACLE_HOME/search/lib/plugins/identity directory.

<search:parameters>

Contains one or more <search:parameter> elements, each one setting a parameter. This element appears in a <search:userDefinedSource> element to define parameters supported by the source. It also appears in a <search:authorizationPlugin> element to define parameters supported by the plug-in.

<search:parameter>

Describes a parameter. It contains the following elements:

<search:value>
<search:description>

Attribute Value
name Name of a parameter.

<search:value>

Contains the value of the parameter.

Attribute Value
encrypted Indicates whether the value of <search:value> is encrypted. Set to true if the value is encrypted, or set to false if it is plain text.

<search:description>

Contains a description of the parameter.
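
Example 2-3 does not use an authorization plug-in, so the following fragment sketches how the plug-in elements described above might be combined. Every value is hypothetical: the class name, the JAR file path, and the parameter names are placeholders, because the parameters that a real plug-in accepts are defined by that plug-in. The namespace declaration is repeated on the fragment root only so that the fragment parses on its own.

```xml
<!-- Fragment only; in a full document this sits inside a source element. -->
<search:authorizationPlugin xmlns:search="http://xmlns.oracle.com/search">
  <!-- Hypothetical plug-in manager class -->
  <search:managerClassName>com.example.search.SampleAuthorizationManager</search:managerClassName>
  <!-- Hypothetical path, relative to ORACLE_HOME/search/lib/plugins/identity -->
  <search:jarFilePath>sample/sampleauth.jar</search:jarFilePath>
  <search:parameters>
    <!-- Hypothetical parameter names; a real plug-in defines its own -->
    <search:parameter name="Sample connection URL">
      <search:value>http://www.example.com/auth</search:value>
      <search:description>Endpoint that the sample plug-in contacts</search:description>
    </search:parameter>
    <search:parameter name="Sample password">
      <search:value encrypted="false">password</search:value>
    </search:parameter>
  </search:parameters>
</search:authorizationPlugin>
```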

<search:securityAttrs>

Contains one or more <search:securityAttr> elements.

<search:securityAttr>

Contains a user or a group that is granted or denied access to the data source, depending on the value of the type attribute. (Read only)

Attribute Value
type Set to GRANT if the user or group has access to the source, or set to DENY otherwise.

<search:boundaryRules>

Describes the boundary rules. See "XML Description: Web Sources".

<search:attributeMappings>

Maps the document attributes to search attributes. See "XML Description: File Sources".

<search:crawlerSettings>

Configures the crawler. It contains these child elements:

<search:numThreads>
<search:languageDetection>
<search:defaultLanguage>
<search:crawlTimeout>
<search:maxDocumentSize>
<search:preserveDocumentCache>
<search:defaultCharSet>
<search:servicePipeline>

See "XML Description: Web Sources".

<search:documentTypes>

Identifies the types of documents to be crawled. It contains one or more <search:documentType> elements.

<search:documentType>

Contains a <search:mimeType> element.

<search:mimeType>

Contains the Internet media type of the content in the form type/subtype. See Table 2-1, "Document Formats".

Example 2-3 User-Defined Source Description

This XML document describes an Oracle Content Database source.

<?xml version="1.0"?>
<search:config productVersion="11.1.2.0.0" xmlns:search="http://xmlns.oracle.com/search">
 <search:sources>
   <search:userDefinedSource>
     <search:name>contentdb</search:name>
     <search:sourceTypeName>Oracle Content Database</search:sourceTypeName>
     <search:aclPolicy>
       <search:noACL/>
     </search:aclPolicy>
     <search:parameters>
       <search:parameter name="Oracle Content Database URL">
          <search:value>http://contentDBUrl.com:7777/content</search:value>
       </search:parameter>
       <search:parameter name="Starting paths">
          <search:value>/us</search:value>
       </search:parameter>
       <search:parameter name="Depth">
          <search:value>-1</search:value>
       </search:parameter>
       <search:parameter name="Oracle Content Database admin user">
          <search:value>myUserName</search:value>
       </search:parameter>
       <search:parameter name="Entity name">
          <search:value>
       orclapplicationcommonname=ocscsplugin,cn=ifs,cn=products,cn=oraclecontext
          </search:value>
       </search:parameter>
       <search:parameter name="Entity password">
          <search:value encrypted="false">password</search:value>
       </search:parameter>
       <search:parameter name="Crawl only">
          <search:value>false</search:value>
       </search:parameter>
       <search:parameter name="Use e-mail for authorization">
          <search:value>false</search:value>
       </search:parameter>
     </search:parameters>
   </search:userDefinedSource>
 </search:sources>
</search:config> 

XML Description: Web Sources

For a Web source, the <search:sources> element contains a <search:webSource> element:

<search:sources>
   <search:webSource>
      <search:name>
      <search:selfService>
      <search:startingUrls>
         <search:startingUrl>
            <search:url>

      <search:aclPolicy>
         <!-- No ACL policy -->
         <search:noACL>
         <!-- Document-level ACL policy -->
         <search:documentLevelACL>
         <!-- Source-level ACL policy -->
         <search:sourceLevelACL>
            <search:accessControlEntries>
               <search:accessControlEntry>
                  <search:name>
                  <search:privilege>

      <search:authorizationPlugin>

      <!-- Boundary rules -->
      <search:boundaryRules>
         <search:boundaryRule>
            <search:ruleType>
            <search:ruleOperation>
            <search:rulePattern>

      <search:metatagMappings>
         <search:metatagMapping>
            <search:documentAttr>
            <search:searchAttr>

      <search:crawlerSettings>
         <search:numThreads>
         <search:languageDetection>
            <search:defaultLanguage>
         <search:crawlDepth>
            <search:limit>
         <search:crawlTimeout>
         <search:maxDocumentSize>
         <search:preserveDocumentCache>
         <search:defaultCharSet>
         <search:servicePipeline>
            <search:pipelineName>
         <search:honorRobotsExclusion>
         <search:indexDynamicPages>
         <search:urlRewriter>
            <search:urlRewriterClass>
            <search:urlRewriterJar>
         <search:httpCharSetOverride>
         <search:cookies>
            <search:cookieContentInLog>
            <search:maxCookieSize>
            <search:maxCookies>
            <search:maxCookiesPerHost>

      <search:documentTypes>
         <search:documentType>
            <search:mimeType>

      <search:httpAuthentications>
         <search:httpAuthentication>
            <search:host>
            <search:realm>
            <search:username>
            <search:password>

      <search:htmlForms>
         <search:htmlForm>
            <search:name>
            <search:formUrl>
            <search:action>
            <search:successUrl>
            <search:formControls>
               <search:formControl>
                  <search:name>
                  <search:value>
                  <search:isPasswordField>

      <search:ssoAuthentication>
         <search:username>
         <search:password>

Element Descriptions 

<search:sources>

Contains one or more source descriptions.

<search:webSource>

Describes a Web source. It contains these child elements:

<search:name>
<search:selfService>
<search:startingUrls>
<search:aclPolicy>
<search:authorizationPlugin>
<search:boundaryRules>
<search:metatagMappings>
<search:crawlerSettings>
<search:documentTypes>
<search:httpAuthentications>
<search:htmlForms>
<search:ssoAuthentication>

<search:name>

Name of the Web source.

<search:selfService>

Contains a value of true to enable self-service authentication, or a value of false to disable it. Self-service authentication lets users enter authentication credentials at run time, instead of the administrator entering credentials at the time the source is created.

<search:startingUrls>

Contains one or more <search:startingUrl> elements.

<search:startingUrl>

Contains a <search:url> element.

<search:url>

Contains the URL-encoded Web address that is an entry point for starting to crawl Web pages.
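
As an illustration of the nesting described above, the following fragment lists two starting points for a Web source. Both addresses are hypothetical. The second one shows a path that contains a space, written in URL-encoded form as %20. The namespace declaration is repeated on the fragment root only so that the fragment parses on its own.

```xml
<!-- Fragment only; in a full document this sits inside <search:webSource>. -->
<search:startingUrls xmlns:search="http://xmlns.oracle.com/search">
  <search:startingUrl>
    <search:url>http://www.example.com/</search:url>
  </search:startingUrl>
  <search:startingUrl>
    <!-- The space in "product docs" is URL-encoded as %20 -->
    <search:url>http://www.example.com/product%20docs/index.html</search:url>
  </search:startingUrl>
</search:startingUrls>
```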

<search:aclPolicy>

Describes an ACL policy for the source. It contains one of these child elements:

<search:noACL>
<search:documentLevelACL>
<search:sourceLevelACL>

<search:noACL>

Indicates no ACL policy. All documents are visible and searchable.

<search:documentLevelACL>

Describes a document-level ACL policy.

<search:sourceLevelACL>

Describes an Oracle SES ACL policy used when crawling private content. It preserves authorizations specified in OracleAS Portal. For user-defined sources, crawler plug-ins (or connectors) can supply ACL information with documents for indexing, which provides finer control over document protection: each document within one source can be viewed by a different set of users or groups.

This element contains a <search:accessControlEntries> element.

<search:accessControlEntries>

Contains one or more <search:accessControlEntry> elements.

<search:accessControlEntry>

Provides a list of users and groups that have access to the source or are restricted from access. It contains these child elements:

<search:name>
<search:privilege>

<search:name>

Contains the name of a user or group that is valid for the currently active identity plug-in.

<search:privilege>

Set to GRANTED to allow access to the source, or set to DENIED to restrict access.
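
The examples in this section all use <search:noACL>. The following fragment is a sketch of a source-level policy assembled from the elements described above. The user name jsmith and the group name contractors are hypothetical; real names must be valid for the currently active identity plug-in. The namespace declaration is repeated on the fragment root only so that the fragment parses on its own.

```xml
<!-- Fragment only; in a full document this sits inside a source element. -->
<search:aclPolicy xmlns:search="http://xmlns.oracle.com/search">
  <search:sourceLevelACL>
    <search:accessControlEntries>
      <!-- Hypothetical user who is allowed to search the source -->
      <search:accessControlEntry>
        <search:name>jsmith</search:name>
        <search:privilege>GRANTED</search:privilege>
      </search:accessControlEntry>
      <!-- Hypothetical group that is restricted from the source -->
      <search:accessControlEntry>
        <search:name>contractors</search:name>
        <search:privilege>DENIED</search:privilege>
      </search:accessControlEntry>
    </search:accessControlEntries>
  </search:sourceLevelACL>
</search:aclPolicy>
```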

<search:authorizationPlugin>

Describes an authorization plug-in. See "XML Description: User-Defined Sources".

<search:boundaryRules>

Contains one or more <search:boundaryRule> elements, each describing a boundary rule.

<search:boundaryRule>

Describes a boundary rule. It contains these child elements:

<search:ruleType>
<search:ruleOperation>
<search:rulePattern>

<search:ruleType>

Type of URL boundary rule. Set to one of these keywords:

<search:ruleOperation>

Matching operation for a search rule pattern. Set to one of these operations:

  • CONTAINS: The URL contains the rule pattern for a case-insensitive match.

  • STARTSWITH: The URL starts with the rule pattern for a case-insensitive match.

  • ENDSWITH: The URL ends with the rule pattern for a case-insensitive match.

  • REGEX: The URL contains the regular expression in a case-sensitive match.

<search:rulePattern>

The pattern of characters in the URL. You can use these special characters:

  • Caret (^) denotes the beginning of a URL.

  • Dollar sign ($) denotes the end of a URL.

  • Period (.) matches any one character.

  • Question mark (?) after a character matches 0 or 1 occurrences of that character.

  • Asterisk (*) after a pattern matches 0 or more occurrences of that pattern. Enclose the pattern in parentheses (), brackets [], or braces {}.

  • Backslash (\) precedes a literal use of a special character, such as \? to match a question mark in a URL.

<search:metatagMappings>

Contains one or more <search:metatagMapping> elements.

<search:metatagMapping>

Contains a mapped pair of attributes in these child elements:

<search:documentAttr>
<search:searchAttr>

<search:documentAttr>

Identifies a document attribute by its name and data type. Document attributes are among the properties of a document.

Attribute Value
name Name of a document attribute. (Required)
type Data type of the attribute: DATE, NUMBER, or STRING.

<search:searchAttr>

Identifies a search attribute by its name and data type. Search attributes are displayed to users in the Oracle SES Search interface.

Attribute Value
name Name of a search attribute. (Required)
type Data type of the attribute: DATE, NUMBER, or STRING.
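
The following fragment sketches a single metatag mapping built from the elements described above. It assumes the same empty-element form that Example 2-2 uses for <search:documentAttr> and <search:searchAttr> in a file source. The document attribute name keywords is hypothetical, and the namespace declaration is repeated on the fragment root only so that the fragment parses on its own.

```xml
<!-- Fragment only; in a full document this sits inside <search:webSource>. -->
<search:metatagMappings xmlns:search="http://xmlns.oracle.com/search">
  <search:metatagMapping>
    <!-- Hypothetical document attribute -->
    <search:documentAttr name="keywords" type="STRING"/>
    <!-- Search attribute that users see in the search interface -->
    <search:searchAttr name="Keywords" type="STRING"/>
  </search:metatagMapping>
</search:metatagMappings>
```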

<search:crawlerSettings>

Configures the crawler. It contains these child elements:

<search:numThreads>
<search:languageDetection>
<search:defaultLanguage>
<search:crawlDepth>
<search:crawlTimeout>
<search:maxDocumentSize>
<search:preserveDocumentCache>
<search:defaultCharSet>
<search:servicePipeline>
<search:honorRobotsExclusion>
<search:indexDynamicPages>
<search:urlRewriter>
<search:httpCharSetOverride>
<search:cookies>

<search:numThreads>

Number of crawler threads to use for crawling the source.

<search:languageDetection>

Controls the use of a language detector when the metadata for a document does not identify the language.

Attribute Value
enabled Controls use of language detection when a source document does not indicate the language in the header. Set to true to enable language detection, or set to false otherwise. (Required)

<search:defaultLanguage>

Default language used by the crawler when the document language cannot be detected.

<search:crawlDepth>

Controls use of a limit on crawling nested links. It contains a <search:limit> element.

Attribute Value
haslimit Controls whether the crawl depth limit is enforced. Set to true to impose the limit, or set to false otherwise. (Required)

<search:limit>

Contains the maximum number of nested links to be crawled.
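
For instance, a crawl depth limit might be expressed as in the sketch below. The value 5 is an arbitrary illustration, not a recommended setting. The namespace is declared on the fragment only so that it parses on its own.

```xml
<!-- Fragment of <search:crawlerSettings>; the limit value is illustrative. -->
<search:crawlDepth haslimit="true"
      xmlns:search="http://xmlns.oracle.com/search">
   <!-- Maximum number of nested links to be crawled -->
   <search:limit>5</search:limit>
</search:crawlDepth>
```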

<search:crawlTimeout>

Number of milliseconds for search results to be returned.

<search:maxDocumentSize>

Maximum document size in megabytes. Larger documents are not crawled.

<search:preserveDocumentCache>

Controls retention of the document cache after indexing.

Attribute Value
enabled Set to true to retain the cache, or set to false otherwise. (Required)

<search:defaultCharSet>

Code for the default character set, which is used when a source document does not identify its character set in the header. See Table 2-4, "Crawlable Character Sets".

<search:servicePipeline>

Controls use of a document service pipeline.

Attribute Value
enabled Set to true to use the pipeline, or set to false otherwise. When true, <search:servicePipeline> contains a <search:pipelineName> element.

<search:pipelineName>

Contains the name of a pipeline.
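
Example 2-4 shows the pipeline disabled. The sketch below shows the enabled form described above, assuming a pipeline named my_pipeline already exists; that name is hypothetical. The namespace is declared on the fragment only so that it parses on its own.

```xml
<!-- Fragment of <search:crawlerSettings>; "my_pipeline" is a hypothetical name. -->
<search:servicePipeline enabled="true"
      xmlns:search="http://xmlns.oracle.com/search">
   <search:pipelineName>my_pipeline</search:pipelineName>
</search:servicePipeline>
```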

<search:honorRobotsExclusion>

Controls whether the crawler honors the robots exclusion rules of the Web site.

Attribute Value
enabled Set to true to honor robots exclusion rules, or set to false otherwise.

<search:indexDynamicPages>

Controls whether dynamic pages are crawled and indexed.

Attribute Value
enabled Set to true to crawl dynamic pages, or set to false otherwise.

<search:urlRewriter>

Controls whether the URL Rewriter is used to filter and rewrite URL links. It contains these elements:

<search:urlRewriterClass>
<search:urlRewriterJar>

Attribute Value
enabled Set to true to use the URL Rewriter, or set to false otherwise.

<search:urlRewriterClass>

Contains the class name of the URL Rewriter.

<search:urlRewriterJar>

Contains the path to the JAR file for the URL Rewriter.
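
Example 2-4 shows the URL Rewriter disabled. An enabled configuration might look like the sketch below. The class name com.example.MyUrlRewriter and the JAR path are hypothetical placeholders for a rewriter that you supply. The namespace is declared on the fragment only so that it parses on its own.

```xml
<!-- Fragment of <search:crawlerSettings>; class name and JAR path are hypothetical. -->
<search:urlRewriter enabled="true"
      xmlns:search="http://xmlns.oracle.com/search">
   <search:urlRewriterClass>com.example.MyUrlRewriter</search:urlRewriterClass>
   <search:urlRewriterJar>/path/to/myurlrewriter.jar</search:urlRewriterJar>
</search:urlRewriter>
```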

<search:httpCharSetOverride>

Controls the character set used for a Web page.

Attribute Value
enabled Set to true to override the character set, or set to false otherwise.

<search:cookies>

Controls whether cookies are used to remember context. It contains these child elements:

<search:cookieContentInLog>
<search:maxCookieSize>
<search:maxCookies>
<search:maxCookiesPerHost>

Attribute Value
enabled Set to true to enable cookies (default), or false otherwise.

<search:cookieContentInLog>

Controls whether information about cookies appears in the log file.

Attribute Value
enabled Set to true to log cookie messages, or set to false otherwise (default).

<search:maxCookieSize>

Contains the maximum size in bytes of a cookie.

<search:maxCookies>

Contains the total number of cookies allowed in a crawl.

<search:maxCookiesPerHost>

Contains the maximum number of cookies permitted for a Web site.
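
The cookie settings above fit together as in the following sketch. The numeric values are arbitrary illustrations, not documented defaults or recommendations. The namespace is declared on the fragment only so that it parses on its own.

```xml
<!-- Fragment of <search:crawlerSettings>; all numeric values are illustrative. -->
<search:cookies enabled="true"
      xmlns:search="http://xmlns.oracle.com/search">
   <!-- Write cookie messages to the log file -->
   <search:cookieContentInLog enabled="true"/>
   <!-- Maximum size of one cookie, in bytes -->
   <search:maxCookieSize>4096</search:maxCookieSize>
   <!-- Total number of cookies allowed in a crawl -->
   <search:maxCookies>500</search:maxCookies>
   <!-- Maximum number of cookies for one Web site -->
   <search:maxCookiesPerHost>20</search:maxCookiesPerHost>
</search:cookies>
```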

<search:documentTypes>

Identifies the types of documents to be crawled. It contains one or more <search:documentType> elements.

<search:documentType>

Contains one or more <search:mimeType> elements.

<search:mimeType>

Contains the Internet media type of the content in the form type/subtype. See Table 2-1, "Document Formats".

<search:httpAuthentications>

Contains one or more <search:httpAuthentication> elements.

<search:httpAuthentication>

Describes HTTP authentication. For proxy authentication, it contains these elements:

<search:host>
<search:realm>
<search:username>
<search:password>

<search:host>

Contains the address of the target computer.

<search:realm>

Contains a name associated with the protected area of a Web site.

<search:username>

Contains the name of the log-in user.

<search:password>

Contains the password associated with the user name.

Attribute Value
encrypted Indicates whether the value of <search:password> is encrypted. Set to true if the password is encrypted, or set to false if it is plain text.

<search:htmlForms>

Contains one or more <search:htmlForm> elements, each one describing an HTML form.

<search:htmlForm>

Describes an HTML form. It contains these elements:

<search:name>
<search:formUrl>
<search:action>
<search:successUrl>
<search:formControls>

<search:name>

Contains the name of the HTML form object.

<search:formUrl>

Contains the Web address of the HTML form.

<search:action>

Contains the address where the browser sends the form.

<search:successUrl>

Contains the URL displayed after the user successfully submits the form.

<search:formControls>

Contains one or more <search:formControl> elements.

<search:formControl>

Describes a form control. It contains these elements:

<search:name>
<search:value>
<search:isPasswordField>

<search:name>

Contains the name of the form control.

<search:value>

Contains the value of the form control.

Attribute Value
encrypted Indicates whether the value of <search:value> is encrypted. Set to true if the value is encrypted, or set to false if it is plain text.

<search:isPasswordField>

Identifies whether the field contains a password. Set to true for a password field, or false otherwise.

<search:ssoAuthentication>

Describes OracleAS Single Sign-On authentication. It contains these elements:

<search:username>
<search:password>

Attribute Value
enabled Controls use of OracleAS Single Sign-On for authentication. Set to true to enable Single Sign-On, or false otherwise.

<search:username>

Contains a user name for OracleAS Single Sign-On.

<search:password>

Contains the password for the OracleAS Single Sign-On user.

Attribute Value
encrypted Indicates whether the value of <search:password> is encrypted. Set to true if the password is encrypted, or set to false if it is plain text.

Example 2-4 Web Source Description

This XML document describes a Web source.

<?xml version="1.0" encoding="UTF-8"?>
<search:config productVersion="11.1.2.0.0" xmlns:search="http://xmlns.oracle.com/search">
   <search:sources>
      <search:webSource>
         <search:name>this_websource</search:name>
         <search:startingUrls>
            <search:startingUrl>
               <search:url>http://www.example.com/</search:url>
            </search:startingUrl>
         </search:startingUrls>
         <search:aclPolicy>
            <search:noACL/>
         </search:aclPolicy>
         <search:boundaryRules>
            <search:boundaryRule>
               <search:ruleType>EXCLUSION</search:ruleType>
               <search:ruleOperation>STARTSWITH</search:ruleOperation>
               <search:rulePattern>
                  <![CDATA[http://www.example.com?test=test val3]]>
               </search:rulePattern>
            </search:boundaryRule>
            <search:boundaryRule>
               <search:ruleType>INCLUSION</search:ruleType>
               <search:ruleOperation>CONTAINS</search:ruleOperation>
               <search:rulePattern>
                  <![CDATA[http://www.example.com?test=test val]]>
               </search:rulePattern>
            </search:boundaryRule>
            <search:boundaryRule>
               <search:ruleType>INCLUSION</search:ruleType>
               <search:ruleOperation>REGEX</search:ruleOperation>
               <search:rulePattern>
                  <![CDATA[^https?://www\.example\.com(?:\:\d{1,5})?(?:$|/)]]>
               </search:rulePattern>
            </search:boundaryRule>
         </search:boundaryRules>
         <search:metatagMappings>
            <search:metatagMapping>
               <search:documentAttr name="AUTHOR" type="STRING"/>
               <search:searchAttr name="Author" type="STRING"/>
            </search:metatagMapping>
            <search:metatagMapping>
               <search:documentAttr name="CREATOR" type="STRING"/>
               <search:searchAttr name="Author" type="STRING"/>
            </search:metatagMapping>
            <search:metatagMapping>
               <search:documentAttr name="DESCRIPTION" type="STRING"/>
               <search:searchAttr name="Description" type="STRING"/>
            </search:metatagMapping>
            <search:metatagMapping>
               <search:documentAttr name="KEYWORD" type="STRING"/>
               <search:searchAttr name="Keywords" type="STRING"/>
            </search:metatagMapping>
            <search:metatagMapping>
               <search:documentAttr name="KEYWORDS" type="STRING"/>
               <search:searchAttr name="Keywords" type="STRING"/>
            </search:metatagMapping>
            <search:metatagMapping>
               <search:documentAttr name="SUBJECT" type="STRING"/>
               <search:searchAttr name="Subject" type="STRING"/>
            </search:metatagMapping>
            <search:metatagMapping>
               <search:documentAttr name="SUBJECTS" type="STRING"/>
               <search:searchAttr name="Subject" type="STRING"/>
            </search:metatagMapping>
         </search:metatagMappings>
         <search:crawlerSettings>
            <search:numThreads>7</search:numThreads>
            <search:languageDetection enabled="true"/>
            <search:defaultLanguage>fr</search:defaultLanguage>
            <search:crawlDepth haslimit="true">
               <search:limit>2</search:limit>
            </search:crawlDepth>
            <search:crawlTimeout>100</search:crawlTimeout>
            <search:maxDocumentSize>1000</search:maxDocumentSize>
            <search:preserveDocumentCache enabled="true"/>
            <search:defaultCharSet>JIS</search:defaultCharSet>
            <search:servicePipeline enabled="false"/>
            <search:honorRobotsExclusion enabled="false"/>
            <search:indexDynamicPages enabled="true"/>
            <search:urlRewriter enabled="false"/>
            <search:httpCharSetOverride enabled="false"/>
            <search:cookies enabled="true">
               <search:cookieContentInLog enabled="false"/>
               <search:maxCookieSize>1</search:maxCookieSize>
               <search:maxCookies>2</search:maxCookies>
               <search:maxCookiesPerHost>3</search:maxCookiesPerHost>
            </search:cookies>
         </search:crawlerSettings>
         <search:documentTypes>
            <search:documentType>
               <search:mimeType>application/msword</search:mimeType>
            </search:documentType>
            <search:documentType>
               <search:mimeType>application/pdf</search:mimeType>
            </search:documentType>
            <search:documentType>
               <search:mimeType>application/x-msexcel</search:mimeType>
            </search:documentType>
            <search:documentType>
               <search:mimeType>application/x-mspowerpoint</search:mimeType>
            </search:documentType>
            <search:documentType>
               <search:mimeType>text/html</search:mimeType>
            </search:documentType>
            <search:documentType>
               <search:mimeType>text/plain</search:mimeType>
            </search:documentType>
         </search:documentTypes>
         <search:httpAuthentications>
            <search:httpAuthentication>
               <search:host>testhost1</search:host>
               <search:realm>testrealm1</search:realm>
               <search:username>testusername1</search:username>
               <search:password encrypted="false">
                 password
               </search:password>
            </search:httpAuthentication>
         </search:httpAuthentications>
         <search:htmlForms>
            <search:htmlForm>
               <search:name>testformname1</search:name>
               <search:formUrl>http://test2.oracle.com</search:formUrl>
               <search:action>test</search:action>
               <search:successUrl>
                 http://successurl.oracle.com
               </search:successUrl>
               <search:formControls>
                  <search:formControl>
                     <search:name>testcontrol1</search:name>
                     <search:value encrypted="false">testvalue1</search:value>
                     <search:isPasswordField>false</search:isPasswordField>
                  </search:formControl>
                  <search:formControl>
                     <search:name>testcontrol2</search:name>
                     <search:value encrypted="false">
                        this_value
                     </search:value>
                     <search:isPasswordField>true</search:isPasswordField>
                  </search:formControl>
               </search:formControls>
            </search:htmlForm>
         </search:htmlForms>
         <search:ssoAuthentication enabled="true">
            <search:username>testsso</search:username>
            <search:password encrypted="false">
               password
            </search:password>
         </search:ssoAuthentication>
      </search:webSource>
   </search:sources>
</search:config>