Oracle® Secure Enterprise Search Administrator's Guide
11g Release 2 (11.2.1)

Part Number E17332-04

Overview of XML Connector Framework

Oracle SES provides an XML connector framework to crawl any repository that provides an XML interface to its contents. The connectors for Oracle Content Server, Oracle E-Business Suite 12, and Siebel 8 use this framework.

Every document in a repository is known as an item. An item contains information about the document, such as author, access URL, last modified date, security information, status, and contents.

A set of items is known as a feed or channel. To crawl a repository, an XML document must be generated for each feed. Each feed is associated with information such as feed name, type of the feed, and number of items.

To crawl a repository with the XML connector, place the data feeds in a location accessible to Oracle SES over one of these protocols: HTTP, FTP, or file. Then generate an XML configuration file that contains information such as the feed location and feed type. Finally, create a source whose source type is based on this XML connector, and start the crawl from Oracle SES.

There are two types of feeds:

Guidelines for the target repository generating the XML feeds:

Example Using the XML Connector

The courses in the Oracle E-Business Suite Learning Management application can be crawled and indexed so that users can search the courses offered, their locations, and other course details.

To crawl and index courses in Oracle E-Business Suite Learning Management: 

  1. Generate an XML feed containing the courses. Each course can be an item in the feed. The properties of the course such as location and instructor can be set as attributes of the item.

  2. Move the feed to a location accessible to Oracle SES through HTTP, FTP, or file protocol.

  3. Generate a control file that points to that feed.

  4. Generate a configuration file that points to the control file. In it, specify the feed type as control feed (controlFeed), the URL of the control file, and the source name.

  5. Create an Oracle E-Business Suite 12 source in Oracle SES, specifying in the parameters the location of the configuration file, the user ID and the password to access the feed.
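
Step 4 can be sketched in Python as follows. This is a minimal illustration and not part of Oracle SES: the function name `build_control_config`, the control file URL, and the source name `Courses` are hypothetical. The element names and the namespace are taken from the configuration file example in the XML Configuration File section.

```python
import xml.etree.ElementTree as ET

# Namespace used by the configuration file example in this guide.
CONFIG_NS = "http://xmlns.oracle.com/search/rsscrawlerconfig"


def build_control_config(control_url, source_name):
    """Return a configuration document (as a string) that points to a control file."""
    # Serialize the namespace as the default namespace (no prefix).
    ET.register_namespace("", CONFIG_NS)
    root = ET.Element(f"{{{CONFIG_NS}}}rsscrawler")
    ET.SubElement(root, f"{{{CONFIG_NS}}}feedLocation").text = control_url
    ET.SubElement(root, f"{{{CONFIG_NS}}}feedType").text = "controlFeed"
    ET.SubElement(root, f"{{{CONFIG_NS}}}sourceName").text = source_name
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical control file URL and source name, for illustration only.
    print(build_control_config("http://example.com:7777/context/control.xml", "Courses"))
```

The resulting text can be saved to a file whose location is then supplied in the source parameters, as described in step 5.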

XML Configuration File

The configuration file is an XML file conforming to a set schema.

The following is an example of a configuration file to set up an XML-based source:

<rsscrawler xmlns="http://xmlns.oracle.com/search/rsscrawlerconfig">  
     <feedLocation>ftp://my.host.com/rss_feeds</feedLocation>
     <feedType>directoryFeed</feedType>
     <errorFileLocation>/tmp/errors</errorFileLocation>
     <securityType>attributeBased</securityType> 
     <sourceName>Contacts</sourceName>
     <securityAttribute name="EMPLOYEE_ID" grant="true"/> 
</rsscrawler> 

Where

  • feedLocation is one of the following:

    • URL of the directory, if the data feed is a directory feed

      This URL should be an FTP URL or a file URL that identifies the directory where the data feeds are located. For example:

      ftp://example.domain.com/relativePathOfDirectory
      file://example.domain.com/c:\dir1\dir2\dir3
      file://example.domain.com//private/home/dir1/dir2/dir3 
      

      Use a file URL if the data feeds are available on the same computer as Oracle SES. The path specified in the URL should be the absolute path of the directory.

      Use an FTP URL to access data feeds on any other computer. The path of the directory in the URL can be absolute or relative. Specify an absolute path following the slash (/) after the host name in the URL. Specify a relative path relative to the home directory of the user that accesses the FTP feeds.

      The user ID used to crawl the source should have write permission on the directory, so that the data feeds can be deleted after the crawl.

    • URL of the control file, if the data feed is a control feed

      This can be an HTTP, HTTPS, file, or FTP URL. For example:

      http://example.com:7777/context/control.xml
      

      The path in FTP and file protocols can be absolute or relative.

  • feedType indicates the type of feed. Valid values are directoryFeed, controlFeed, and dataFeed.

  • errorFileLocation (optional) specifies the directory where status feeds should be uploaded.

    A status feed is generated to indicate the result of processing each data feed. The status feed is named data_feed_file_name.suc if processing succeeded or data_feed_file_name.err if it failed. Any errors encountered are listed in the error status feed. If a value is specified for this parameter, then the status feed is uploaded to this location. Otherwise, the status feed is uploaded to the same location as the data feed.

    The user ID used to access the data feed should have write permission on the directory.

    If feedLocation is an HTTP URL, then errorFileLocation should also be an HTTP URL, to which the status feeds are posted. If no value is specified for errorFileLocation, then the status feeds are posted to the URL given in feedLocation.

    If an error occurs while processing a feed available over file or FTP protocol, then the erroneous feed is renamed filename.prcsdErr in the same directory.

  • sourceName (optional) specifies the name of the source.

  • securityType (optional) specifies the security type. Valid values are the following:

    • noSecurity: There is no security information associated with this source at the document level. This is the default value.

    • identityBased: Identity-based security is used for documents in the feed.

    • attributeBased: Attribute-based security is used for documents in the feed. With this security model, security attributes should be specified in the securityAttribute tag, and the values for these attributes should be specified for each document.

  • securityAttribute specifies attribute-based security. One or more tags of this type should be specified, and each tag should contain the following attributes:

    • name: Name of the security attribute.

    • grant: Boolean parameter indicating whether this is a grant/deny attribute. The security attribute is considered a grant attribute if the value is true and a deny attribute if the value is false.
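
The rules above can be turned into a quick pre-crawl check. The following Python sketch is not an Oracle SES tool, and the name `check_config` is hypothetical. It assumes that feedLocation and feedType are required (they are not marked optional above) and that grant is written as lowercase true or false, as in the example. It does not validate against the actual schema.

```python
import xml.etree.ElementTree as ET

NS = "{http://xmlns.oracle.com/search/rsscrawlerconfig}"
FEED_TYPES = {"directoryFeed", "controlFeed", "dataFeed"}
SECURITY_TYPES = {"noSecurity", "identityBased", "attributeBased"}

# The configuration file example from this section.
SAMPLE = """<rsscrawler xmlns="http://xmlns.oracle.com/search/rsscrawlerconfig">
  <feedLocation>ftp://my.host.com/rss_feeds</feedLocation>
  <feedType>directoryFeed</feedType>
  <errorFileLocation>/tmp/errors</errorFileLocation>
  <securityType>attributeBased</securityType>
  <sourceName>Contacts</sourceName>
  <securityAttribute name="EMPLOYEE_ID" grant="true"/>
</rsscrawler>"""


def check_config(xml_text):
    """Return a list of problems found in a configuration document (empty if none)."""
    root = ET.fromstring(xml_text)
    problems = []

    # Assumption: feedLocation is required because it is not marked optional.
    if not (root.findtext(NS + "feedLocation") or "").strip():
        problems.append("feedLocation is missing")

    feed_type = (root.findtext(NS + "feedType") or "").strip()
    if feed_type not in FEED_TYPES:
        problems.append(f"unexpected feedType: {feed_type!r}")

    # securityType is optional; noSecurity is the documented default.
    security_type = (root.findtext(NS + "securityType") or "noSecurity").strip()
    if security_type not in SECURITY_TYPES:
        problems.append(f"unexpected securityType: {security_type!r}")

    attrs = root.findall(NS + "securityAttribute")
    if security_type == "attributeBased" and not attrs:
        problems.append("attributeBased security needs at least one securityAttribute")
    for attr in attrs:
        if not attr.get("name"):
            problems.append("securityAttribute has no name")
        # Assumption: grant is spelled as lowercase true or false.
        if attr.get("grant") not in ("true", "false"):
            problems.append(f"grant must be true or false, got {attr.get('grant')!r}")

    return problems


if __name__ == "__main__":
    print(check_config(SAMPLE))
```

For the example configuration file shown earlier in this section, `check_config` reports no problems. Changing feedType to an unlisted value produces a single entry in the returned list.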