Oracle® Secure Enterprise Search Administration API Guide 11g Release 2 (11.2.1) Part Number E17595-04
Sources are collections of data to be searched, such as Web sites, database tables, content management repositories, collaboration repositories, and applications.
Note:
The current release of the Oracle SES Administration API supports these source types:
File
Federated
User Defined
Web
Object Type: Creatable
Object Key: name
Object Key Command Syntax: --NAME=object_name, -n object_name
State Properties: None
Supported Operations: create, createAll, delete, deleteAll, deleteList, export, exportAll, exportList, getAllObjectKeys, update, updateAll
Administration GUI Page
XML Descriptions
Each supported source type has a unique XML description:
XML Description: Federated Sources
For a federated source, the <search:sources> element contains a <search:federatedSource> element:
<search:sources>
   <search:federatedSource>
      <search:name>
      <search:url>
      <search:security>
         <search:entityName>
         <search:entityPassword>
         <search:authAttribute>
      <search:queryRouting>
         <search:filterRule>
      <search:searchRestrictions>
         <search:groupRestrictedEnabled>
         <search:searchedGroups>
            <search:fedSourceGroup>
      <search:attributeRetrieval>
         <search:retrievedAttrs>
            <search:fedSearchAttr>
      <search:attributeMappings>
         <search:attributeMapping>
            <search:localAttribute>
            <search:remoteAttribute>
Element Descriptions
<search:sources>: Contains one or more source descriptions.
<search:federatedSource>: Describes a federated source. It contains these elements:
<search:name> <search:url> <search:security> <search:queryRouting> <search:searchRestrictions> <search:attributeRetrieval>
<search:name>: Contains the name of the source. (Required)
<search:url>: Contains the Web service URL.
<search:security>: Describes security for connecting to the federated source. It contains these child elements:
<search:entityName> <search:entityPassword> <search:authAttribute>
<search:entityName>: Contains the name of the federation trusted entity on the federation endpoint. Contact the administrator of the federated endpoint for this information.
<search:entityPassword>: Contains the password for the entity name.
| Attribute | Value |
|---|---|
| encrypted | Indicates whether the value of <search:entityPassword> is encrypted. Set to true if the password is encrypted, or set to false if it is plain text. |
<search:authAttribute>: Contains the name of an attribute that identifies and can authenticate a user on the federation endpoint.
<search:queryRouting>: Describes the rules for routing queries to the federated source. Without any rules, Oracle SES routes all queries to the federated source. This element is optional, but can improve scalability. It contains a <search:filterRule> element.
<search:filterRule>: Contains the rules within a CDATA section. Rules consist of an attribute, a colon (:), and an expression. Attributes can be DATE, STRING, or NUMBER. DATE and NUMBER attributes can include these operators: -, =, >, >=, <, <=. The AND or OR operators separate multiple rules.
<search:searchRestrictions>: Restricts searches to a list of source groups. It contains these child elements:
<search:groupRestrictedEnabled> <search:searchedGroups>
<search:groupRestrictedEnabled>: Controls whether source groups are restricted during searches. Set to true to restrict searches, or set to false otherwise. The default value is false. (Optional)
<search:searchedGroups>: Describes the source groups to be searched on the federated source. It contains one or more <search:fedSourceGroup> elements.
<search:fedSourceGroup>: Empty element that uses attributes to identify a source group. (Read only)
| Attribute | Value |
|---|---|
| isAvailable | Identifies whether the source group is currently available in the federated source. |
| name | Name of a federated source group. (Required) |
<search:attributeRetrieval>: Describes the attributes to be retrieved from the federated source. It contains a <search:retrievedAttrs> element.
<search:retrievedAttrs>: Contains one or more <search:fedSearchAttr> elements.
<search:fedSearchAttr>: Empty element that uses attributes to identify a search attribute.
| Attribute | Value |
|---|---|
| name | Name of a search attribute. (Required) |
| type | Data type of the attribute: STRING, NUMBER, or DATE. |
| isAvailable | Identifies whether the attribute is currently available in the federated source: true if it is available, or false otherwise. |
| isMandatory | Identifies whether retrieval of the attribute is mandatory: true if it must be listed in the <search:retrievedAttrs> element, or false if it can be omitted without causing an error. |
<search:attributeMappings>: Contains one or more <search:attributeMapping> elements.
<search:attributeMapping>: Maps a local attribute to a remote attribute. It contains one of each of these elements:
<search:localAttribute> <search:remoteAttribute>
<search:localAttribute>: Identifies the local attribute being mapped.
| Attribute | Value |
|---|---|
| name | Name of the local attribute. (Required) |
| type | Data type of the local attribute: STRING, NUMBER, or DATE. (Required) |
<search:remoteAttribute>: Identifies the remote attribute being mapped.
| Attribute | Value |
|---|---|
| name | Name of the remote attribute. (Required) |
| type | Data type of the remote attribute: STRING, NUMBER, or DATE. (Required) |
| isAvailable | Identifies whether the remote attribute is currently available in the federated source: true if it is available, or false otherwise. |
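Example 2-1 does not include attribute mappings, so the following fragment is a minimal sketch of how a single mapping might look, built only from the element and attribute descriptions above. The remote attribute name Creator is hypothetical, and the namespace declaration is repeated on the fragment root only so that the fragment is well formed on its own. Where the fragment sits inside <search:federatedSource> is not shown here.

```xml
<!-- Sketch only: maps the local Author attribute to a hypothetical remote attribute. -->
<search:attributeMappings xmlns:search="http://xmlns.oracle.com/search">
  <search:attributeMapping>
    <!-- name and type are required on both sides of the mapping -->
    <search:localAttribute name="Author" type="STRING"/>
    <!-- "Creator" is an assumed name; use an attribute that exists on the federated endpoint -->
    <search:remoteAttribute name="Creator" type="STRING" isAvailable="true"/>
  </search:attributeMapping>
</search:attributeMappings>
```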
Example 2-1 Federated Source Description
This XML document describes a federated source:
<?xml version="1.0" encoding="UTF-8"?>
<search:config productVersion="11.2.1.0.0" xmlns:search="http://xmlns.oracle.com/search">
  <search:sources>
    <search:federatedSource>
      <search:name>fed1</search:name>
      <search:url>http://example:7777/search/query/OracleSearch</search:url>
      <search:security>
        <search:entityName>entity2</search:entityName>
        <search:entityPassword encrypted="false">password</search:entityPassword>
        <search:authAttribute>nickname</search:authAttribute>
      </search:security>
      <search:queryRouting>
        <search:filterRule>
          <![CDATA[
          (language:en) AND (idm::mail:a.*)
          ]]>
        </search:filterRule>
      </search:queryRouting>
      <search:searchRestrictions>
        <search:groupRestrictedEnabled>true</search:groupRestrictedEnabled>
        <search:searchedGroups>
          <search:fedSourceGroup isAvailable="true" name="FILE"/>
          <search:fedSourceGroup isAvailable="true" name="Web"/>
        </search:searchedGroups>
      </search:searchRestrictions>
      <search:attributeRetrieval>
        <search:retrievedAttrs>
          <search:fedSearchAttr type="STRING" isAvailable="true"
            isMandatory="true" name="Author"/>
          <search:fedSearchAttr type="STRING" isAvailable="true"
            isMandatory="true" name="Description"/>
          <search:fedSearchAttr type="STRING" isAvailable="true"
            isMandatory="true" name="Infosource"/>
          <search:fedSearchAttr type="STRING" isAvailable="true"
            isMandatory="true" name="Infosource Path"/>
          <search:fedSearchAttr type="STRING" isAvailable="true"
            isMandatory="true" name="Language"/>
          <search:fedSearchAttr type="DATE" isAvailable="true" 
            isMandatory="true" name="LastModifiedDate"/>
          <search:fedSearchAttr type="STRING" isAvailable="true"
            isMandatory="true" name="Mimetype"/>
          <search:fedSearchAttr type="STRING" isAvailable="true"
            isMandatory="true" name="Title"/>
          <search:fedSearchAttr type="STRING" isAvailable="true"
            isMandatory="true" name="Url"/>
          <search:fedSearchAttr type="STRING" isAvailable="true"
            isMandatory="false" name="custom1"/>
          <search:fedSearchAttr type="STRING" isAvailable="true"
            isMandatory="false" name="custom2"/>
          <search:fedSearchAttr type="NUMBER" isAvailable="true"
            isMandatory="true" name="eqdocid"/>
          <search:fedSearchAttr type="STRING" isAvailable="true"
            isMandatory="true" name="eqfedid"/>
          <search:fedSearchAttr type="STRING" isAvailable="true"
            isMandatory="true" name="eqsnippet"/>
        </search:retrievedAttrs>
      </search:attributeRetrieval>
    </search:federatedSource>
  </search:sources>
</search:config>
XML Description: File Sources
For a file source, the <search:sources> element contains a <search:fileSource> element:
<search:sources> <search:fileSource> <search:name> <search:fileDisplayUrl> <search:fileUrlPrefix> <search:displayUrlPrefix> <search:startingUrls> <search:startingUrl> <search:url> <search:aclPolicy> <search:authorizationPlugin> <search:boundaryRules> <search:attributeMappings> <search:attributeMapping> <search:documentAttr> <search:searchAttr> <search:crawlerSettings> <search:documentTypes> <search:documentType> <search:mimeType>
Element Descriptions
<search:sources>: Contains one or more source descriptions.
<search:fileSource>: Describes a file source. It contains these elements:
<search:name> <search:fileDisplayUrl> <search:startingUrls> <search:aclPolicy> <search:boundaryRules> <search:attributeMappings> <search:crawlerSettings> <search:documentTypes>
<search:name>: Contains the name of the file source.
<search:fileDisplayUrl>: Identifies a physical path that is replaced by a display URL for security reasons when the file is retrieved during a search.
| Attribute | Value |
|---|---|
| enabled | Controls whether the display URL prefix is used for security reasons. Set to true to use the display URL, or set to false to display the physical location of the file. (Required) |
<search:fileUrlPrefix>: Contains the physical file URL to be replaced by the display URL.
<search:displayUrlPrefix>: Contains a URL prefix displayed instead of the file URL.
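Example 2-2 shows only the disabled form of this element. The fragment below is a hedged sketch of the enabled form. It assumes that the two prefix elements are children of <search:fileDisplayUrl>, as their position in the element hierarchy suggests, and the display host docs.example.com is hypothetical. The namespace declaration is repeated only so that the fragment is well formed on its own.

```xml
<!-- Sketch only: hide the physical file path behind a display URL. -->
<search:fileDisplayUrl enabled="true" xmlns:search="http://xmlns.oracle.com/search">
  <!-- physical file URL prefix to be replaced -->
  <search:fileUrlPrefix>file://localhost/startingDirectory/</search:fileUrlPrefix>
  <!-- hypothetical prefix shown to users instead -->
  <search:displayUrlPrefix>http://docs.example.com/library/</search:displayUrlPrefix>
</search:fileDisplayUrl>
```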
<search:startingUrls>: Identifies the file path where the crawler begins. It contains one or more <search:startingUrl> elements.
<search:startingUrl>: Contains a <search:url> element.
<search:url>: Contains an entry point for starting to crawl files. The URL must be in its original form as an unencoded file path.
<search:aclPolicy>: Describes an authorization policy for the source. See "XML Description: Web Sources".
<search:authorizationPlugin>: Describes the authorization plug-in. See "XML Description: User-Defined Sources".
<search:boundaryRules>: Describes the boundary rules for the source. See "XML Description: Web Sources".
<search:attributeMappings>: Maps the document attributes to search attributes. It contains one or more <search:attributeMapping> elements.
<search:attributeMapping>: Contains a document attribute and a search attribute for mapping. It contains one of each of these child elements:
<search:documentAttr> <search:searchAttr>
<search:documentAttr>: Identifies a document attribute by its name and data type.
| Attribute | Value |
|---|---|
| name | Name of a document attribute. |
| type | Data type of the attribute: DATE, NUMBER, or STRING. |
<search:searchAttr>: Identifies a search attribute by its name and data type. Search attributes are displayed to users in the Oracle SES Search interface.
| Attribute | Value |
|---|---|
| name | Name of a search attribute. |
| type | Data type of the attribute: DATE, NUMBER, or STRING. |
<search:crawlerSettings>: Configures the crawler. It contains these child elements:
<search:numThreads> <search:languageDetection> <search:defaultLanguage> <search:crawlTimeout> <search:maxDocumentSize> <search:preserveDocumentCache> <search:charSetDetection> <search:defaultCharSet> <search:servicePipeline> <search:indexProfileName> <search:indexNullTitleFallback> <search:badTitles> <search:logLevel> <search:followSymlinks>
See <search:crawlerSettings> for Web sources for descriptions of these elements, except for <search:followSymlinks>:
<search:followSymlinks>: Contains true to prevent the crawler from following links to the absolute path, or false otherwise. The default value is true.
Applies only to file sources on Linux and UNIX systems.
<search:documentTypes>: Identifies the types of documents to be crawled. It contains one or more <search:documentType> elements.
<search:documentType>: Contains one or more <search:mimeType> elements.
<search:mimeType>: Contains the Internet media type of the content in the form type/subtype. See Table 2-1, "Document Formats" for supported MIME types.
Example 2-2 File Source Description
This XML document describes a file source:
<?xml version="1.0" encoding="UTF-8"?>
<search:config productVersion="11.2.1.0.0" xmlns:search="http://xmlns.oracle.com/search">
   <search:sources>
      <search:fileSource>
         <search:name>Document Library</search:name>
         <search:fileDisplayUrl enabled="false"/>
         <search:startingUrls>
            <search:startingUrl>
               <search:url>file://localhost/startingDirectory/</search:url>
            </search:startingUrl>
         </search:startingUrls>
         <search:aclPolicy>
            <search:noACL/>
         </search:aclPolicy>
         <search:attributeMappings>
            <search:attributeMapping>
               <search:documentAttr name="AUTHOR" type="STRING"/>
               <search:searchAttr name="Author" type="STRING"/>
            </search:attributeMapping>
            <search:attributeMapping>
               <search:documentAttr name="CREATOR" type="STRING"/>
               <search:searchAttr name="Author" type="STRING"/>
            </search:attributeMapping>
            <search:attributeMapping>
               <search:documentAttr name="DESCRIPTION" type="STRING"/>
               <search:searchAttr name="Description" type="STRING"/>
            </search:attributeMapping>
            <search:attributeMapping>
               <search:documentAttr name="HOST" type="STRING"/>
               <search:searchAttr name="Host" type="STRING"/>
            </search:attributeMapping>
            <search:attributeMapping>
               <search:documentAttr name="INFOSOURCE" type="STRING"/>
               <search:searchAttr name="Infosource" type="STRING"/>
            </search:attributeMapping>
            <search:attributeMapping>
               <search:documentAttr name="KEYWORD" type="STRING"/>
               <search:searchAttr name="Keywords" type="STRING"/>
            </search:attributeMapping>
            <search:attributeMapping>
               <search:documentAttr name="KEYWORDS" type="STRING"/>
               <search:searchAttr name="Keywords" type="STRING"/>
            </search:attributeMapping>
            <search:attributeMapping>
               <search:documentAttr name="LANGUAGE" type="STRING"/>
               <search:searchAttr name="Language" type="STRING"/>
            </search:attributeMapping>
            <search:attributeMapping>
               <search:documentAttr name="LASTMODIFIEDDATE" type="DATE"/>
               <search:searchAttr name="LastModifiedDate" type="DATE"/>
            </search:attributeMapping>
            <search:attributeMapping>
               <search:documentAttr name="MIMETYPE" type="STRING"/>
               <search:searchAttr name="Mimetype" type="STRING"/>
            </search:attributeMapping>
            <search:attributeMapping>
               <search:documentAttr name="SUBJECT" type="STRING"/>
               <search:searchAttr name="Subject" type="STRING"/>
            </search:attributeMapping>
            <search:attributeMapping>
               <search:documentAttr name="SUBJECTS" type="STRING"/>
               <search:searchAttr name="Subject" type="STRING"/>
            </search:attributeMapping>
            <search:attributeMapping>
               <search:documentAttr name="TITLE" type="STRING"/>
               <search:searchAttr name="Title" type="STRING"/>
            </search:attributeMapping>
         </search:attributeMappings>
         <search:crawlerSettings>
            <search:numThreads>5</search:numThreads>
            <search:languageDetection enabled="false"/>
            <search:defaultLanguage>en</search:defaultLanguage>
            <search:crawlTimeout>30</search:crawlTimeout>
            <search:maxDocumentSize>10</search:maxDocumentSize>
            <search:preserveDocumentCache enabled="true"/>
            <search:defaultCharSet>8859_1</search:defaultCharSet>
            <search:servicePipeline enabled="true">
               <search:pipelineName>Default pipeline</search:pipelineName>
            </search:servicePipeline>
         </search:crawlerSettings>
         <search:documentTypes>
            <search:documentType>
               <search:mimeType>text/html</search:mimeType>
            </search:documentType>
            <search:documentType>
               <search:mimeType>text/plain</search:mimeType>
            </search:documentType>
            <search:documentType>
               <search:mimeType>text/xml</search:mimeType>
            </search:documentType>
         </search:documentTypes>
      </search:fileSource>
   </search:sources>
</search:config>
XML Description: User-Defined Sources
For a user-defined source, a <search:sources> element contains a <search:userDefinedSource> element:
<search:sources> <search:userDefinedSource> <search:name> <search:sourceTypeName> <search:aclPolicy> <search:authorizationPlugin> <search:managerClassName> <search:jarFilePath> <search:parameters> <search:parameter> <search:securityAttrs> <search:securityAttr> <search:parameters> <search:parameter> <search:value> <search:boundaryRules> <search:attributeMappings> <search:crawlerSettings> <search:documentTypes> <search:documentType> <search:mimeType>
Element Descriptions
<search:sources>: Describes one or more sources.
<search:userDefinedSource>: Describes a user-defined source. It contains these child elements:
<search:name> <search:sourceTypeName> <search:boundaryRules> <search:aclPolicy> <search:attributeMappings> <search:documentTypes> <search:parameters>
<search:name>: Name of the user-defined source.
<search:sourceTypeName>: Type of user-defined source. For a definitive list of user-defined source types, issue an exportAll sourceType command. Set to the source type exactly as shown.
Database
EMC Documentum Content Server
EMC Documentum eRoom
Federated User Authorization Cache
Lotus Notes
Microsoft Exchange
Microsoft NTFS
Microsoft SharePoint 2007
Oracle Calendar
Oracle Collaboration Suite E-Mail
Oracle Content Database
Oracle Content Database (JDBC)
Oracle Content Server
Oracle E-Business Suite
Oracle Fusion
Oracle WebCenter
Siebel 7.8
Siebel 7.8(Public)
Siebel 8
User Authorization Cache
<search:authorizationPlugin>: Describes an authorization plug-in. It contains these elements:
<search:managerClassName> <search:jarFilePath> <search:parameters>
<search:managerClassName>: Contains the name of the plug-in manager Java class.
<search:jarFilePath>: Contains the qualified name of the jar file. Paths can be absolute or relative to the ORACLE_HOME/search/lib/plugins/identity directory.
<search:parameters>: Contains one or more <search:parameter> elements, each one setting a parameter. This element appears in a <search:userDefinedSource> element to define parameters supported by the source. It also appears in a <search:authorizationPlugin> element to define parameters supported by the plug-in.
<search:parameter>: Describes a parameter. It contains the following elements:
<search:value> <search:description>
| Attribute | Value |
|---|---|
| name | Name of a parameter. |
<search:value>: Contains the value of the parameter.
| Attribute | Value |
|---|---|
| encrypted | Indicates whether the value of <search:value> is encrypted. Set to true if the value is encrypted, or set to false if it is plain text. |
<search:description>: Contains a description of the parameter.
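Example 2-3 does not use an authorization plug-in. As a rough sketch assembled from the descriptions above, a plug-in definition might look like the following. The class name, jar file name, and parameter names are all hypothetical placeholders, not values shipped with Oracle SES. The namespace declaration is repeated only so that the fragment is well formed on its own.

```xml
<!-- Sketch only: every name and value below is a placeholder. -->
<search:authorizationPlugin xmlns:search="http://xmlns.oracle.com/search">
  <!-- hypothetical plug-in manager class -->
  <search:managerClassName>com.example.search.ExampleAuthorizationManager</search:managerClassName>
  <!-- relative paths resolve against ORACLE_HOME/search/lib/plugins/identity -->
  <search:jarFilePath>example-authz.jar</search:jarFilePath>
  <search:parameters>
    <search:parameter name="Service URL">
      <search:value>http://example.com/authz</search:value>
      <search:description>Endpoint consulted for authorization decisions (assumed).</search:description>
    </search:parameter>
    <search:parameter name="Service password">
      <!-- encrypted="false" marks the value as plain text -->
      <search:value encrypted="false">password</search:value>
    </search:parameter>
  </search:parameters>
</search:authorizationPlugin>
```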
<search:securityAttrs>: Contains one or more <search:securityAttr> elements.
<search:securityAttr>: Contains a user or a group that is granted or denied access to the data source, depending on the value of the type attribute. (Read only)
| Attribute | Value |
|---|---|
| type | Set to GRANT if the user or group has access to the source, or set to DENY otherwise. |
<search:boundaryRules>: Describes the boundary rules. See "XML Description: Web Sources".
<search:attributeMappings>: Maps the document attributes to search attributes.
<search:crawlerSettings>: Configures the crawler. It contains these child elements:
<search:numThreads> <search:languageDetection> <search:defaultLanguage> <search:crawlTimeout> <search:maxDocumentSize> <search:preserveDocumentCache> <search:charSetDetection> <search:defaultCharSet> <search:servicePipeline> <search:indexNullTitleFallback> <search:badTitles> <search:logLevel> <search:useInMemoryQueue>
See <search:crawlerSettings> for Web sources for descriptions of these elements, except for <search:useInMemoryQueue>.
<search:useInMemoryQueue>: Contains true to put the queue in memory, or false otherwise. The default value is false. This setting is used only by connectors associated with Oracle Database.
<search:documentTypes>: Identifies the types of documents to be crawled. It contains one or more <search:documentType> elements.
<search:documentType>: Contains a <search:mimeType> element.
<search:mimeType>: Contains the Internet media type of the content in the form type/subtype. See Table 2-1, "Document Formats".
Example 2-3 User-Defined Source Description
This XML document describes an Oracle Content Database source.
<?xml version="1.0"?>
<search:config productVersion="11.2.1.0.0" xmlns:search="http://xmlns.oracle.com/search">
 <search:sources>
   <search:userDefinedSource>
     <search:name>contentdb</search:name>
     <search:sourceTypeName>Oracle Content Database</search:sourceTypeName>
     <search:aclPolicy>
       <search:noACL/>
     </search:aclPolicy>
     <search:parameters>
       <search:parameter name="Oracle Content Database URL">
          <search:value>http://contentDBUrl.com:7777/content</search:value>
       </search:parameter>
       <search:parameter name="Starting paths">
          <search:value>/us</search:value>
       </search:parameter>
       <search:parameter name="Depth">
          <search:value>-1</search:value>
       </search:parameter>
       <search:parameter name="Oracle Content Database admin user">
          <search:value>myUserName</search:value>
       </search:parameter>
       <search:parameter name="Entity name">
          <search:value>
       orclapplicationcommonname=ocscsplugin,cn=ifs,cn=products,cn=oraclecontext
          </search:value>
       </search:parameter>
       <search:parameter name="Entity password">
          <search:value encrypted="false">password</search:value>
       </search:parameter>
       <search:parameter name="Crawl only">
          <search:value>false</search:value>
       </search:parameter>
       <search:parameter name="Use e-mail for authorization">
          <search:value>false</search:value>
       </search:parameter>
     </search:parameters>
   </search:userDefinedSource>
 </search:sources>
</search:config> 
XML Description: Web Sources
For a Web source, the <search:sources> element contains a <search:webSource> element:
<search:sources>
   <search:webSource>
      <search:name>
      <search:selfService>
      <search:startingUrls>
         <search:startingUrl>
            <search:url>
      <search:aclPolicy>
         <!-- No ACL policy -->
         <search:noACL>
         <!-- Document-level ACL policy -->
         <search:documentLevelACL>
         <!-- Source-level ACL policy -->
         <search:sourceLevelACL>
            <search:accessControlEntries>
               <search:accessControlEntry>
                  <search:name>
                  <search:privilege>
      <!-- Boundary rules -->
      <search:boundaryRules>
         <search:boundaryRule>
            <search:ruleType>
            <search:ruleOperation>
            <search:rulePattern>
      <search:metatagMappings>
         <search:metatagMapping>
            <search:documentAttr>
            <search:searchAttr>
      <!-- Crawler settings -->
      <search:crawlerSettings>
         <search:numThreads>
         <search:languageDetection>
         <search:defaultLanguage>
         <search:crawlDepth>
            <search:limit>
         <search:crawlTimeout>
         <search:maxDocumentSize>
         <search:preserveDocumentCache>
         <search:charsetDetection>
         <search:defaultCharSet>
         <search:servicePipeline>
            <search:pipelineName>
         <search:indexNullTitleFallback>
         <search:badTitles>
            <search:badTitle>
         <search:honorRobotsExclusion>
         <search:indexDynamicPages>
         <search:httpCharSetOverride>
         <search:cookies>
            <search:cookieContentInLog>
            <search:maxCookieSize>
            <search:maxCookies>
            <search:maxCookiesPerHost>
         <search:agentString>
         <search:duplicateDetection>
         <search:connections>
            <search:crawlConnectionSettingsType>
         <search:logLevel>
      <search:documentTypes>
         <search:documentType>
            <search:mimeType>
      <search:httpAuthentications>
         <search:httpAuthentication>
            <search:host>
            <search:realm>
            <search:username>
            <search:password>
      <search:htmlForms>
         <search:htmlForm>
            <search:name>
            <search:formUrl>
            <search:action>
            <search:successUrl>
            <search:formControls>
               <search:formControl>
                  <search:name>
                  <search:value>
                  <search:isPasswordField>
      <search:ssoAuthentication>
         <search:username>
         <search:password>
         <search:userAgent>
Element Descriptions
<search:sources>: Contains one or more source descriptions.
<search:webSource>: Describes a Web source. It contains these child elements:
<search:name> <search:selfService> <search:startingUrls> <search:aclPolicy> <search:boundaryRules> <search:metatagMappings> <search:crawlerSettings> <search:documentTypes> <search:httpAuthentications> <search:htmlForms> <search:ssoAuthentication>
<search:name>: Name of the Web source.
<search:selfService>: Contains a value of true to enable self-service authentication, or a value of false to disable it. Self-service authentication lets users enter authentication credentials at run time, instead of the administrator entering credentials at the time the source is created.
<search:startingUrls>: Contains one or more <search:startingUrl> elements.
<search:startingUrl>: Contains a <search:url> element.
<search:url>: Contains the URL-encoded Web address that is an entry point for starting to crawl Web pages.
<search:aclPolicy>: Describes an ACL policy for the source. It contains one of these child elements:
<search:noACL> <search:documentLevelACL> <search:sourceLevelACL>
<search:noACL>: Indicates no ACL policy. All documents are visible and searchable.
<search:documentLevelACL>: Describes a document-level ACL policy.
<search:sourceLevelACL>: Describes an Oracle SES ACL policy used when crawling private content. It preserves authorizations specified in OracleAS Portal.
For user-defined sources, crawler plug-ins (or connectors) can supply ACL information with documents for indexing, which provides finer control over document protection. That is, each document within one source may be viewed by a different set of users or groups.
This element contains a <search:accessControlEntries> element.
<search:accessControlEntries>: Contains one or more <search:accessControlEntry> elements.
<search:accessControlEntry>: Provides a list of users and groups that have access to the source or are restricted from access. It contains these child elements:
<search:name> <search:privilege>
<search:name>: Contains the name of a user or group that is valid for the currently active identity plug-in.
<search:privilege>: Set to GRANTED to allow access to the source, or set to DENIED to restrict access.
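The examples in this section all use <search:noACL>. As an illustration of the source-level alternative, the fragment below sketches one granted and one denied entry, following the nesting described above. The user name jsmith and the group name contractors are hypothetical and would have to be valid for the active identity plug-in. The namespace declaration is repeated only so that the fragment is well formed on its own.

```xml
<!-- Sketch only: a source-level ACL with one granted user and one denied group. -->
<search:aclPolicy xmlns:search="http://xmlns.oracle.com/search">
  <search:sourceLevelACL>
    <search:accessControlEntries>
      <search:accessControlEntry>
        <search:name>jsmith</search:name>           <!-- hypothetical user -->
        <search:privilege>GRANTED</search:privilege>
      </search:accessControlEntry>
      <search:accessControlEntry>
        <search:name>contractors</search:name>      <!-- hypothetical group -->
        <search:privilege>DENIED</search:privilege>
      </search:accessControlEntry>
    </search:accessControlEntries>
  </search:sourceLevelACL>
</search:aclPolicy>
```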
<search:boundaryRules>: Contains one or more <search:boundaryRule> elements, each describing a boundary rule.
<search:boundaryRule>: Describes a boundary rule. It contains these child elements:
<search:ruleType> <search:ruleOperation> <search:rulePattern>
<search:ruleType>: Type of URL boundary rule:
INCLUSION: The URL matches <search:rulePattern>.
EXCLUSION: The URL does not match <search:rulePattern>.
<search:ruleOperation>: Matching operation for a search rule pattern:
CONTAINS: The URL contains the rule pattern for a case-insensitive match.
STARTSWITH: The URL starts with the rule pattern for a case-insensitive match.
ENDSWITH: The URL ends with the rule pattern for a case-insensitive match.
REGEX: The URL contains the regular expression in a case-sensitive match.
<search:rulePattern>: The pattern of characters in the URL. You can use these special characters:
Caret (^) denotes the beginning of a URL.
Dollar sign ($) denotes the end of a URL.
A period (.) matches any one character.
Question mark (?) before a character matches 0 or 1 occurrences of that character.
Asterisk (*) before a pattern matches 0 or more occurrences of that pattern. Enclose the pattern in parentheses (), brackets [], or braces {}.
A backslash (\) precedes a literal use of a special character, such as \? to match a question mark in a URL.
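None of the examples in this section include boundary rules, so the following is a minimal sketch of how two rules might be combined, using only the rule types, operations, and special characters listed above. The host www.example.com is hypothetical. The namespace declaration is repeated only so that the fragment is well formed on its own.

```xml
<!-- Sketch only: keep the crawl under one path and skip .zip files. -->
<search:boundaryRules xmlns:search="http://xmlns.oracle.com/search">
  <search:boundaryRule>
    <!-- URLs must start with this prefix (case-insensitive) -->
    <search:ruleType>INCLUSION</search:ruleType>
    <search:ruleOperation>STARTSWITH</search:ruleOperation>
    <search:rulePattern>http://www.example.com/docs/</search:rulePattern>
  </search:boundaryRule>
  <search:boundaryRule>
    <!-- URLs must not match the regular expression: a literal period, then "zip" at the end of the URL -->
    <search:ruleType>EXCLUSION</search:ruleType>
    <search:ruleOperation>REGEX</search:ruleOperation>
    <search:rulePattern>\.zip$</search:rulePattern>
  </search:boundaryRule>
</search:boundaryRules>
```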
<search:metatagMappings>: Contains one or more <search:metatagMapping> elements.
<search:metatagMapping>: Contains a mapped pair of attributes in these child elements:
<search:documentAttr> <search:searchAttr>
<search:documentAttr>: Identifies a document attribute by its name and data type. Document attributes are among the properties of a document.
| Attribute | Value |
|---|---|
| name | Name of a document attribute. (Required) |
| type | Data type of the attribute: DATE, NUMBER, or STRING. |
<search:searchAttr>: Identifies a search attribute by its name and data type. Search attributes are displayed to users in the Oracle SES Search interface.
| Attribute | Value |
|---|---|
| name | Name of a search attribute. (Required) |
| type | Data type of the attribute: DATE, NUMBER, or STRING. |
<search:crawlerSettings>: Configures the crawler. It contains these child elements:
<search:numThreads> <search:languageDetection> <search:defaultLanguage> <search:crawlDepth> <search:crawlTimeout> <search:maxDocumentSize> <search:preserveDocumentCache> <search:charSetDetection> <search:defaultCharSet> <search:servicePipeline> <search:indexNullTitleFallback> <search:badTitles> <search:honorRobotsExclusion> <search:indexDynamicPages> <search:httpCharSetOverride> <search:cookies> <search:agentString> <search:duplicateDetection> <search:connections> <search:logLevel>
<search:numThreads>: Number of processes to use for crawling the source.
<search:languageDetection>: Controls the use of a language detector when the metadata for a document does not identify the language.
| Attribute | Value |
|---|---|
| enabled | Controls use of language detection when a source document does not indicate the language in the header. Set to true to enable language detection, or set to false otherwise. (Required) |
<search:defaultLanguage>: Default language used by the crawler when the document language cannot be detected.
<search:crawlDepth>: Controls use of a limit on crawling nested links. It contains a <search:limit> element.
| Attribute | Value |
|---|---|
| haslimit | Controls whether the search limit is enforced. Set to true to impose the limit, or set to false otherwise. (Required) |
<search:limit>: Contains the maximum number of nested links to be crawled.
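As a brief illustration of the crawl depth setting described above, this sketch enforces a limit of three nested links. The value 3 is arbitrary, and the namespace declaration is repeated only so that the fragment is well formed on its own.

```xml
<!-- Sketch only: enforce a crawl depth limit; the value 3 is an arbitrary choice. -->
<search:crawlDepth haslimit="true" xmlns:search="http://xmlns.oracle.com/search">
  <search:limit>3</search:limit>
</search:crawlDepth>
```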
<search:crawlTimeout>: Number of milliseconds for search results to be returned.
<search:maxDocumentSize>: Maximum document size in megabytes. Larger documents are not crawled.
<search:preserveDocumentCache>: Controls retention of the document cache after indexing.
| Attribute | Value |
|---|---|
| enabled | Set to true to retain the cache, or set to false otherwise. (Required) |
<search:charSetDetection>: Contains a value of true to enable automatic character set detection, or false to disable it. The default value is true. This parameter can be set at the global level.
<search:defaultCharSet>: Code for the default character set, which is used when a source document does not identify its character set in the header. See Table 2-4, "Crawlable Character Sets".
<search:servicePipeline>: Controls use of a document service pipeline.
| Attribute | Value |
|---|---|
| enabled | Set to true to use the pipeline, or set to false otherwise. When true, <search:servicePipeline> contains a <search:pipelineName> element. |
<search:pipelineName>: Contains the name of a pipeline.
<search:indexNullTitleFallback>: Controls whether the default title is included in the index for documents with null titles:
indexForAll: Includes the default title in the index. (Default)
noIndex: Does not include the default title in the index.
<search:badTitles>: Contains one or more <search:badTitle> elements. This parameter can be set at the global level.
<search:badTitle>: Contains an exact character string for a document title that the crawler omits from the index. These bad titles are defined by default:
PowerPoint Presentation
Slide 1
<search:honorRobotsExclusion>: Controls visits by robots to the Web site.
| Attribute | Value |
|---|---|
| enabled | Set to true to exclude robots, or set to false otherwise. |
<search:indexDynamicPages>: Controls whether dynamic pages are crawled and indexed.
| Attribute | Value |
|---|---|
| enabled | Set to true to crawl dynamic pages, or set to false otherwise. |
<search:httpCharSetOverride>: Controls the character set used for a Web page.
| Attribute | Value |
|---|---|
| enabled | Set to true to enable the character set override, or set to false otherwise. |
<search:cookies>: Controls whether cookies are used to remember context. It contains these child elements:
<search:cookieContentInLog> <search:maxCookieSize> <search:maxCookies> <search:maxCookiesPerHost>
| Attribute | Value |
|---|---|
| enabled | Set to true to enable cookies (default), or false otherwise. |
<search:cookieContentInLog>: Controls whether information about cookies appears in the log file.
| Attribute | Value |
|---|---|
| enabled | Set to true to log cookie messages, or set to false otherwise (default). |
<search:maxCookieSize>: Contains the maximum size in bytes of a cookie.
<search:maxCookies>: Contains the total number of cookies allowed in a crawl.
<search:maxCookiesPerHost>: Contains the maximum number of cookies permitted for a Web site.
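The cookie settings described above might be combined as in the following sketch. The numeric limits are illustrative values only, not documented defaults, and the namespace declaration is repeated only so that the fragment is well formed on its own.

```xml
<!-- Sketch only: the numeric limits are illustrative, not documented defaults. -->
<search:cookies enabled="true" xmlns:search="http://xmlns.oracle.com/search">
  <!-- keep cookie details out of the crawler log -->
  <search:cookieContentInLog enabled="false"/>
  <search:maxCookieSize>4096</search:maxCookieSize>        <!-- bytes per cookie -->
  <search:maxCookies>800</search:maxCookies>                <!-- total for the crawl -->
  <search:maxCookiesPerHost>50</search:maxCookiesPerHost>   <!-- per Web site -->
</search:cookies>
```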
<search:agentString>: Contains the browser agent string presented to the Web server. The default value is "Oracle Secure Enterprise Search". Applies only to Web and Portal sources.
<search:duplicateDetection>: Contains a value of true to enable duplicate detection during a Web crawl, or false to disable it. The default value is true.
<search:connections>: Sets limits on a connection to Web and Portal sources. It contains these elements:
<search:timeout> <search:retries> <search:retryInterval>
<search:timeout>: Contains the maximum number of milliseconds to make a connection to a data source. The default value is 10.
<search:retries>: Contains the maximum number of connection attempts to a data source. The default value is 10.
<search:retryInterval>: Contains the number of milliseconds between connection retry attempts. The default value is 5.
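For reference, a connection block that simply spells out the default values stated above could be sketched as follows. The namespace declaration is repeated only so that the fragment is well formed on its own.

```xml
<!-- Sketch only: connection limits set to the default values given in the descriptions above. -->
<search:connections xmlns:search="http://xmlns.oracle.com/search">
  <search:timeout>10</search:timeout>
  <search:retries>10</search:retries>
  <search:retryInterval>5</search:retryInterval>
</search:connections>
```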
Contains a logging level for the crawler:
| Logging Level | Description | 
|---|---|
| DEBUG | Debugging messages | 
| INFO | Informational messages (Default) | 
| WARN | Warning messages | 
| ERROR | Error messages | 
| FATAL | Fatal messages | 
Identifies the types of documents to be crawled. It contains one or more <search:documentType> elements.
Contains one or more <search:mimeType> elements.
Contains the Internet media type of the content in the form type/subtype. See Table 2-1, "Document Formats".
Contains one or more <search:httpAuthentication> elements.
Describes HTTP authentication. For proxy authentication, it contains these elements:
<search:host> <search:realm> <search:username> <search:password>
Contains the address of the target computer.
Contains a name associated with the protected area of a Web site.
Contains the name of the log-in user.
Contains the password associated with the user name.
| Attribute | Value | 
|---|---|
| encrypted | Indicates whether the value of <search:password> is encrypted. Set to true if the password is encrypted, or set to false if it is plain text. | 
Contains one or more <search:htmlForm> elements, each one describing an HTML form.
Describes an HTML form. It contains these elements:
<search:name> <search:formUrl> <search:action> <search:successUrl> <search:formControls>
Contains the name of the HTML form object.
Contains the Web address of the HTML form.
Contains the address where the browser sends the form.
Contains the URL displayed after the user successfully submits the form.
Contains one or more <search:formControl> elements.
Describes a form control. It contains these elements:
<search:name> <search:value> <search:isPasswordField>
Contains the name of the form control.
Contains the value of the form control.
| Attribute | Value | 
|---|---|
| encrypted | Indicates whether the value of <search:value> is encrypted. Set to true if the value is encrypted, or set to false if it is plain text. | 
Identifies whether the field contains a password. Set to true for a password field, or false otherwise.
Describes OracleAS Single Sign-On authentication. It contains these elements:
<search:username> <search:password> <search:userAgent>
| Attribute | Value | 
|---|---|
| enabled | Controls use of OracleAS Single Sign-On for authentication. Set to true to enable Single Sign-On, or set to false otherwise. | 
Contains a user name for OracleAS Single Sign-On.
Contains the password for the OracleAS Single Sign-On user.
| Attribute | Value | 
|---|---|
| encrypted | Indicates whether the value of <search:password> is encrypted. Set to true if the password is encrypted, or set to false if it is plain text. | 
Contains an authentication value that overrides the default User Agent value for OracleAS Single Sign-On. The default value is null.
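The Single Sign-On elements described above can be combined as in the following sketch. It is a fragment meant to sit inside <search:webSource>, where the search prefix is already declared. The user name, password, and agent string are hypothetical placeholders, not values defined by Oracle SES.

```xml
<!-- Sketch only: all values below are hypothetical placeholders. -->
<search:ssoAuthentication enabled="true">
   <search:username>sso_crawl_user</search:username>
   <!-- encrypted="false" marks the password as plain text -->
   <search:password encrypted="false">sso_password</search:password>
   <!-- Overrides the default User Agent value, which is null -->
   <search:userAgent>ExampleCrawler/1.0</search:userAgent>
</search:ssoAuthentication>
```

Omitting <search:userAgent> leaves the default null value in place.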
Example 2-4 Web Source Description
This XML document describes a Web source.
<?xml version="1.0" encoding="UTF-8"?>
<search:config productVersion="11.2.1.0.0" xmlns:search="http://xmlns.oracle.com/search">
   <search:sources>
      <search:webSource>
         <search:name>this_websource</search:name>
         <search:startingUrls>
            <search:startingUrl>
               <search:url>http://www.example.com/</search:url>
            </search:startingUrl>
         </search:startingUrls>
         <search:aclPolicy>
            <search:noACL/>
         </search:aclPolicy>
         <search:boundaryRules>
            <search:boundaryRule>
               <search:ruleType>EXCLUSION</search:ruleType>
               <search:ruleOperation>STARTSWITH</search:ruleOperation>
               <search:rulePattern>
                  <![CDATA[http://www.example.com?test=test val3]]>
               </search:rulePattern>
            </search:boundaryRule>
            <search:boundaryRule>
               <search:ruleType>INCLUSION</search:ruleType>
               <search:ruleOperation>CONTAINS</search:ruleOperation>
               <search:rulePattern>
                  <![CDATA[http://www.example.com?test=test val]]>
               </search:rulePattern>
            </search:boundaryRule>
            <search:boundaryRule>
               <search:ruleType>INCLUSION</search:ruleType>
               <search:ruleOperation>REGEX</search:ruleOperation>
               <search:rulePattern>
                  <![CDATA[^https?://www\.example\.com(?:\:\d{1,5})?(?:$|/)]]>
               </search:rulePattern>
            </search:boundaryRule>
         </search:boundaryRules>
         <search:metatagMappings>
            <search:metatagMapping>
               <search:documentAttr name="AUTHOR" type="STRING"/>
               <search:searchAttr name="Author" type="STRING"/>
            </search:metatagMapping>
            <search:metatagMapping>
               <search:documentAttr name="CREATOR" type="STRING"/>
               <search:searchAttr name="Author" type="STRING"/>
            </search:metatagMapping>
            <search:metatagMapping>
               <search:documentAttr name="DESCRIPTION" type="STRING"/>
               <search:searchAttr name="Description" type="STRING"/>
            </search:metatagMapping>
            <search:metatagMapping>
               <search:documentAttr name="KEYWORD" type="STRING"/>
               <search:searchAttr name="Keywords" type="STRING"/>
            </search:metatagMapping>
            <search:metatagMapping>
               <search:documentAttr name="KEYWORDS" type="STRING"/>
               <search:searchAttr name="Keywords" type="STRING"/>
            </search:metatagMapping>
            <search:metatagMapping>
               <search:documentAttr name="SUBJECT" type="STRING"/>
               <search:searchAttr name="Subject" type="STRING"/>
            </search:metatagMapping>
            <search:metatagMapping>
               <search:documentAttr name="SUBJECTS" type="STRING"/>
               <search:searchAttr name="Subject" type="STRING"/>
            </search:metatagMapping>
         </search:metatagMappings>
         <search:crawlerSettings>
            <search:numThreads>7</search:numThreads>
            <search:languageDetection enabled="true"/>
            <search:defaultLanguage>fr</search:defaultLanguage>
            <search:crawlDepth haslimit="true">
               <search:limit>2</search:limit>
            </search:crawlDepth>
            <search:crawlTimeout>100</search:crawlTimeout>
            <search:maxDocumentSize>1000</search:maxDocumentSize>
            <search:preserveDocumentCache enabled="true"/>
            <search:defaultCharSet>JIS</search:defaultCharSet>
            <search:servicePipeline enabled="false"/>
            <search:honorRobotsExclusion enabled="false"/>
            <search:indexDynamicPages enabled="true"/>
            <search:httpCharSetOverride enabled="false"/>
            <search:cookies enabled="true">
               <search:cookieContentInLog enabled="false"/>
               <search:maxCookieSize>1</search:maxCookieSize>
               <search:maxCookies>2</search:maxCookies>
               <search:maxCookiesPerHost>3</search:maxCookiesPerHost>
            </search:cookies>
         </search:crawlerSettings>
         <search:documentTypes>
            <search:documentType>
               <search:mimeType>application/msword</search:mimeType>
            </search:documentType>
            <search:documentType>
               <search:mimeType>application/pdf</search:mimeType>
            </search:documentType>
            <search:documentType>
               <search:mimeType>application/x-msexcel</search:mimeType>
            </search:documentType>
            <search:documentType>
               <search:mimeType>application/x-mspowerpoint</search:mimeType>
            </search:documentType>
            <search:documentType>
               <search:mimeType>text/html</search:mimeType>
            </search:documentType>
            <search:documentType>
               <search:mimeType>text/plain</search:mimeType>
            </search:documentType>
         </search:documentTypes>
         <search:httpAuthentications>
            <search:httpAuthentication>
               <search:host>testhost1</search:host>
               <search:realm>testrealm1</search:realm>
               <search:username>testusername1</search:username>
               <search:password encrypted="false">
                 password
               </search:password>
            </search:httpAuthentication>
         </search:httpAuthentications>
         <search:htmlForms>
            <search:htmlForm>
               <search:name>testformname1</search:name>
               <search:formUrl>http://test2.oracle.com</search:formUrl>
               <search:action>test</search:action>
               <search:successUrl>
                 http://successurl.oracle.com
               </search:successUrl>
               <search:formControls>
                  <search:formControl>
                     <search:name>testcontrol1</search:name>
                     <search:value encrypted="false">testvalue1</search:value>
                     <search:isPasswordField>false</search:isPasswordField>
                  </search:formControl>
                  <search:formControl>
                     <search:name>testcontrol2</search:name>
                     <search:value encrypted="false">
                        this_value
                     </search:value>
                     <search:isPasswordField>true</search:isPasswordField>
                  </search:formControl>
               </search:formControls>
            </search:htmlForm>
         </search:htmlForms>
         <search:ssoAuthentication enabled="true">
            <search:username>testsso</search:username>
            <search:password encrypted="false">
               password
            </search:password>
         </search:ssoAuthentication>
      </search:webSource>
   </search:sources>
</search:config>