This chapter discusses how to use Oracle WebCenter Console for SharePoint to crawl and index SharePoint items within your portal. This is accomplished by creating the following objects within the portal: a SharePoint content source, a SharePoint crawler, and a job.
Oracle WebCenter Console for SharePoint allows you to crawl and index items from a Microsoft Office SharePoint Server (MOSS) or Windows SharePoint Services (WSS) site or site collection. It also allows you to crawl a list of MOSS or WSS sites specified by an RSS feed. Crawling SharePoint items into your portal requires the configuration of a content source, a crawler, and a job. Depending on your needs, you may need to create more than one content source or crawler.
A SharePoint content source is configured with authentication information and a default clickthrough behavior. The authentication information consists of the Windows credentials needed to access the SharePoint site or site collection. If multiple sites are accessible with the same credentials, only one SharePoint content source is required. If sites require different credentials, create a SharePoint content source for each set of credentials.
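To plan how many content sources you need, group the sites you intend to crawl by the credentials that can reach them; each distinct set of credentials corresponds to one SharePoint content source. The following Python sketch illustrates that rule. The function name, site URLs, and account names are hypothetical, and passwords are deliberately left out of the model.

```python
from collections import defaultdict


def plan_content_sources(
    sites: dict[str, tuple[str, str]],
) -> dict[tuple[str, str], list[str]]:
    """Group site URLs by the (domain, user) credentials that can reach them.

    Hypothetical planning helper: each key of the result stands for one
    SharePoint content source, and its value lists the sites that content
    source would serve.
    """
    groups: dict[tuple[str, str], list[str]] = defaultdict(list)
    for url, credentials in sites.items():
        groups[credentials].append(url)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical sites and service accounts.
    sites = {
        "http://spserver/sites/hr/": ("CORP", "svc_crawl"),
        "http://spserver/sites/finance/": ("CORP", "svc_crawl"),
        "http://extranet:8080/": ("PARTNER", "svc_extranet"),
    }
    plan = plan_content_sources(sites)
    print(len(plan), "content sources needed")
    for (domain, user), urls in plan.items():
        print(f"{domain}\\{user}:", urls)
```

Here the two sites reachable by the same account share one content source, and the extranet site gets its own.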
Each SharePoint content source can have one or more SharePoint crawlers associated with it. A SharePoint crawler describes which SharePoint site or site collection is to be crawled, what to crawl on that SharePoint site, and where the crawled items should be put. Note that the crawler does not import the SharePoint items themselves, but rather indexes them within the portal.
Oracle WebCenter Console for SharePoint comes with a default content source, SharePoint Content Source, that you can update with your MOSS or WSS information. To create a SharePoint content source:
Caution: The portal cannot gateway SharePoint documents. The gateway interferes with the authentication and WebDAV features of SharePoint.
When the crawler accesses the SharePoint site, it runs as the user whose credentials are configured in the content source. This user must have the following rights on the site:
To crawl Form Template folders in MOSS 2007 and WSS 3.0, the View Application Pages and View Versions rights are also required.
Note: The authentication information configured in the content source is used only by the crawler. These credentials are not passed to the SharePoint site when a user clicks an item in the portal; MOSS or WSS authenticates the user who accesses that item.
Select the radio button next to the default clickthrough behavior you would like for SharePoint documents. Clicking on a SharePoint document can either open the document directly, or take the user to the SharePoint properties page for that document. This setting affects clickthrough behavior in the Knowledge Directory, general search results, the Most Recently Used SharePoint Items portlet, and the default mode of the SharePoint Search portlet. When the Most Recently Used SharePoint Items portlet is used in a community, community preferences can be used to change the clickthrough behavior.
Note: This setting affects only documents. All other SharePoint items open their default display page, which is the properties page.
Once you have created one or more SharePoint content sources, you can create SharePoint crawlers:
Select Site URL to crawl a SharePoint site or site collection. Type the URL of the site that you wish to crawl and click Validate. This URL should end in “/”.
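The Validate button checks the URL for you; if you prepare a batch of site URLs ahead of time (for example, collected from a spreadsheet), a small Python sketch like the following can add the trailing slash before you paste them in. The helper name is hypothetical, and it does not contact the SharePoint server or confirm that the site exists.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_site_url(url: str) -> str:
    """Return an absolute site URL whose path ends in "/".

    Hypothetical helper: it only fixes the form of the URL; the portal's
    Validate button remains the authoritative check.
    """
    parts = urlsplit(url.strip())
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"not an absolute URL: {url!r}")
    # Append the trailing slash to the path only, so any query string
    # or fragment is left untouched.
    path = parts.path if parts.path.endswith("/") else parts.path + "/"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


if __name__ == "__main__":
    print(normalize_site_url("http://mySPServer:17938/sites/MainSiteCollection"))
    # prints "http://mySPServer:17938/sites/MainSiteCollection/"
```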
It is recommended that you choose Selected Site and All Subsites to simplify indexing of SharePoint sites. If you do not choose this setting, you must create a separate crawler for each subsite. Choose Selected Site Only when you want tight control over which SharePoint sites are indexed.
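The difference between the two options can be sketched in Python by modeling the site hierarchy as a map from each site URL to its direct subsites. This is a hypothetical model for illustration only; the crawler discovers subsites itself. With Selected Site and All Subsites, one crawler covers the whole tree; with Selected Site Only, each crawler covers a single site, so covering the same tree takes one crawler per site.

```python
def sites_in_scope(
    children: dict[str, list[str]], root: str, include_subsites: bool
) -> list[str]:
    """List the sites a single crawler starting at root would index."""
    if not include_subsites:
        return [root]  # "Selected Site Only"
    # "Selected Site and All Subsites": depth-first walk of the hierarchy.
    scope: list[str] = []
    stack = [root]
    while stack:
        site = stack.pop()
        scope.append(site)
        # Push children in reverse so they are visited in listed order.
        stack.extend(reversed(children.get(site, [])))
    return scope


if __name__ == "__main__":
    # Hypothetical site hierarchy.
    children = {
        "http://sp/": ["http://sp/hr/", "http://sp/it/"],
        "http://sp/it/": ["http://sp/it/helpdesk/"],
    }
    everything = sites_in_scope(children, "http://sp/", include_subsites=True)
    print("Selected Site and All Subsites: 1 crawler covers", len(everything), "sites")
    print("Selected Site Only:", len(everything), "crawlers needed for the same sites")
```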
It is recommended that you choose to mirror the folder structure, approve imported documents, and import security. If you choose to mirror the folder structure, the crawler will mirror the SharePoint site structure. Importing security means that only those users who can view the source item in MOSS or WSS can see the corresponding document in the portal and search results. For more information on importing security, see the Administrator Guide for Oracle WebCenter Interaction.
You can use the Document Expiration settings, together with the Apply these settings to existing documents created by this crawler option, to delete all documents previously crawled by this crawler.
For more information about the Document Settings page, see the Oracle WebCenter Interaction online help.
For more information about content types, see the Administrator Guide for Oracle WebCenter Interaction or the Oracle WebCenter Interaction online help.
For more information about advanced crawler settings, see the Oracle WebCenter Interaction online help.
You must also register the administrative folder that contains the job with the Automation Service.
For more information about setting up crawler jobs, see the Administrator Guide for Oracle WebCenter Interaction or the Oracle WebCenter Interaction online help.
In addition to crawling SharePoint sites and site collections, SharePoint content crawlers can also be used to import content from a list of sites provided by an RSS feed.
The path to a site feed can be a URL (http://, file:///), a local file path on the SharePoint machine (c:\feed.xml), or a UNC file path on the SharePoint network (\\server\feeds\feed.xml). Secured sites (https://) and FTP sites (ftp://) cannot be crawled.
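If you keep site feed locations in a script or configuration file, a Python sketch like the following can check each one against the forms listed above before you enter it in a crawler. The function name and its return values are hypothetical; the crawler performs its own validation.

```python
import re


def classify_feed_path(path: str) -> str:
    """Classify a site feed location as "url", "local", or "unc".

    Hypothetical helper based on the forms described in this chapter.
    https:// and ftp:// locations are rejected because they cannot be
    crawled.
    """
    lowered = path.strip().lower()
    if lowered.startswith(("https://", "ftp://")):
        raise ValueError(f"cannot be crawled: {path}")
    if lowered.startswith(("http://", "file:///")):
        return "url"
    if lowered.startswith("\\\\"):  # UNC path such as \\server\feeds\feed.xml
        return "unc"
    if re.match(r"[a-z]:\\", lowered):  # drive-letter path such as c:\feed.xml
        return "local"
    raise ValueError(f"unrecognized feed location: {path}")


if __name__ == "__main__":
    candidates = [
        "http://spserver/sitefeed/feed.xml",
        r"c:\feed.xml",
        r"\\server\feeds\feed.xml",
        "https://spserver/feed.xml",
    ]
    for candidate in candidates:
        try:
            print(candidate, "->", classify_feed_path(candidate))
        except ValueError as err:
            print(candidate, "->", err)
```

Remember that local and UNC paths are resolved on the SharePoint machine and network, not on the machine running this check.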
The Oracle WebCenter Console for SharePoint installer creates a virtual directory named SiteFeed that you can use to deploy a simple RSS feed. For example, you can place a well-formed, valid XML feed document, or an .aspx file that generates a site feed, in the sitefeed folder on the file system and then access it via HTTP.
The site feed document should conform to the RSS 2.0 specification. The link element of each item must contain the URL of a valid SharePoint site. The title element of an item is optional. If provided, it should be a valid Knowledge Directory folder name, because a mirroring crawl uses it as the name of the Knowledge Directory folder for that site. If the title element is omitted, a mirroring crawl retrieves the folder name from the SharePoint site. A sample site feed follows.
<?xml version="1.0" encoding="utf-8" ?>
<rss version="2.0">
  <channel>
    <title>SharePoint Site List</title>
    <description>List of SharePoint sites to be crawled by Oracle WebCenter Console for SharePoint</description>
    <item>
      <title>MOSS TeamSite at mySPServer:17938/sites/MainSiteCollection</title>
      <link>http://mySPServer:17938/sites/MainSiteCollection/</link>
    </item>
    <item>
      <link>http://SharePointServer:9167/</link>
    </item>
  </channel>
</rss>
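If you generate the site feed rather than writing it by hand (for example, from a script that writes feed.xml into the sitefeed folder), a minimal Python sketch using the standard library's xml.etree.ElementTree might look like the following. The function names and the channel link value are hypothetical. The RSS 2.0 specification lists link as a required channel element alongside title and description, so the sketch takes a channel link as well. Passing None as an item's title omits the title element, in which case a mirroring crawl takes the folder name from the SharePoint site.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def build_site_feed(
    title: str, link: str, description: str,
    sites: list[tuple[str, Optional[str]]],
) -> ET.Element:
    """Build an RSS 2.0 site feed; each site is (site_url, folder_title or None)."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # RSS 2.0 requires title, link, and description on the channel.
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for site_url, folder_title in sites:
        item = ET.SubElement(channel, "item")
        if folder_title is not None:
            # Used as the Knowledge Directory folder name by a mirroring crawl.
            ET.SubElement(item, "title").text = folder_title
        ET.SubElement(item, "link").text = site_url
    return rss


def save_site_feed(rss: ET.Element, path: str) -> None:
    """Write the feed as UTF-8 with an XML declaration."""
    ET.indent(rss)  # Python 3.9+: indent the output for readability
    ET.ElementTree(rss).write(path, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    feed = build_site_feed(
        "SharePoint Site List",
        "http://spserver/sitefeed/",  # hypothetical channel link
        "List of SharePoint sites to be crawled by Oracle WebCenter Console for SharePoint",
        [
            ("http://mySPServer:17938/sites/MainSiteCollection/",
             "MOSS TeamSite at mySPServer:17938/sites/MainSiteCollection"),
            ("http://SharePointServer:9167/", None),  # folder name comes from SharePoint
        ],
    )
    print(ET.tostring(feed, encoding="unicode"))
```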