Setting Up Microsoft SharePoint Sources

The SharePoint Crawler connector enables Oracle SES to provide secure search over SharePoint Portal Server and Microsoft Office SharePoint Server 2007 (MOSS). The connector extends the searching capabilities of Oracle SES so that it can search an external repository. Oracle SES can crawl through the documents, items, and related metadata in SharePoint repositories and provide secure, full-text search. The connector also provides metadata search and browse functionality, which lets users restrict a search to a specific subfolder in the hierarchy.

In SharePoint, data is stored in different libraries such as the Document Library, Picture Library, Lists, Discussion Boards, and so on. A SharePoint instance can have one or more sites and sub-sites that the SharePoint Crawler connector can crawl after you set up the appropriate configuration parameters in the Oracle SES Administration GUI. The SharePoint Crawler connector navigates through the Libraries and Lists to crawl all the documents and items from a SharePoint repository. It creates an index, stores the metadata, and accesses information in Oracle SES to provide search capabilities according to the end user permissions.

The SharePoint Crawler connector supports incremental crawling, which means that it crawls and indexes only those documents that have changed after the most recent crawl. A document is re-crawled if the content, metadata, or direct security access information of the document has changed since the previous crawl. Documents deleted from a Library are removed from the index during incremental crawling.
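
The re-crawl rule described above can be illustrated with a small sketch. This is not the connector's implementation. It is a conceptual model that assumes each document is summarized by three fingerprints (content, metadata, and direct security entries). The names DocState and plan_incremental_crawl are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocState:
    """Hypothetical snapshot of one document as seen during a crawl."""
    content_hash: str    # fingerprint of the document body
    metadata_hash: str   # fingerprint of its attributes
    acl_hash: str        # fingerprint of its direct security entries


def plan_incremental_crawl(previous, current):
    """Compare two {doc_id: DocState} snapshots.

    Returns (to_recrawl, to_delete) as sorted lists of document ids.
    """
    to_recrawl = sorted(
        doc_id for doc_id, state in current.items()
        if previous.get(doc_id) != state  # new, or one of the three parts changed
    )
    # Documents that disappeared from the Library leave the index.
    to_delete = sorted(doc_id for doc_id in previous if doc_id not in current)
    return to_recrawl, to_delete


previous = {
    "a": DocState("c1", "m1", "s1"),
    "b": DocState("c2", "m2", "s2"),
    "c": DocState("c3", "m3", "s3"),
}
current = {
    "a": DocState("c1", "m1", "s1"),      # unchanged
    "b": DocState("c2", "m2", "s2-new"),  # security information changed
    "d": DocState("c4", "m4", "s4"),      # new document
}
print(plan_incremental_crawl(previous, current))  # (['b', 'd'], ['c'])
```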

Important Notes About SharePoint 2007 Sources

  • The supported versions of SharePoint Server are:

    • 2003 or 2.0 for SharePoint Portal Server

    • 2007 or 3.0 for MOSS 2007

  • When the Crawl Security Settings parameter is set to either NORMAL or STRICT, the SharePoint Crawler for the Container must use the SharePoint administrator account for crawling and indexing documents.

  • To crawl CAD files, insert this line at the beginning of ORACLE_HOME/search/data/config/crawler.dat:

    MIMEINCLUDE application/octet-stream
    
  • When the Crawl Security Settings parameter is set to RELAX, any user that has at least Visitor (Read) permissions can be identified in the SharePoint source for crawling and indexing documents.

  • SharePoint Container names in Oracle SES should not contain special characters. If a name contains a slash or a comma, enter a backslash (\) before that character. Otherwise, the crawler does not recognize the Container.
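
The crawler.dat change for CAD files in the notes above can be scripted. The following is a minimal sketch, not an Oracle-provided tool. The helper name prepend_line is hypothetical, and the sketch assumes the file is plain text that can be read as UTF-8. Back up crawler.dat before modifying it.

```python
from pathlib import Path

MIME_LINE = "MIMEINCLUDE application/octet-stream"


def prepend_line(config_path, line=MIME_LINE):
    """Insert `line` as the first line of the file unless it is already present.

    Returns True if the file was modified, False if the line already existed.
    """
    path = Path(config_path)
    text = path.read_text(encoding="utf-8")  # encoding is an assumption
    if line in (existing.strip() for existing in text.splitlines()):
        return False
    path.write_text(line + "\n" + text, encoding="utf-8")
    return True


# Example call (the path follows the note above; ORACLE_HOME must be set):
#   import os
#   prepend_line(Path(os.environ["ORACLE_HOME"]) / "search/data/config/crawler.dat")
```

Running the helper a second time leaves the file unchanged, so it is safe to repeat.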

Known Limitations of the SharePoint 2007 Connector

  • Passwords entered through the Oracle SES Administration GUI are case insensitive.

  • Storing more than 200 files in a single folder may result in degraded performance and increased crawling time.

  • An administrator must own the SharePoint Server site with the documents to be crawled. The crawler does not have sufficient access rights to crawl the documents if it uses the identity of a non-administrative user.

    To grant administrative rights to a SharePoint user:

    1. Open the SharePoint Site UI and select Site Settings.

    2. Select Users and Permissions, then Site Collection Administrators to display the Site Collection Administrators page.

    3. Enter the name of the SharePoint user in the Site Collection Administrators field.

    4. Click OK.

  • If the Crawl Security Settings parameter is set to RELAX, then the user ID specified in the User Name parameter does not require administrative privileges. Visitor (Read) permissions on the site are sufficient. However, the Read permission level must include the Browse Directories permission for the crawler to access sub-sites. Otherwise, the sub-sites are not crawled.

    To add Browse Directories permissions for SharePoint 2007:

    1. Open People and Groups - Site Permissions.

    2. Under Settings - Permission Levels, select READ.

    3. Under Site Permissions, select Browse Directories.

    4. Click Submit.

    To add Browse Directories permissions for SharePoint 2003:

    1. Open the Created subarea and select Manage Security.

    2. Select the user and edit permissions.

    3. Select READ.

    4. Click Advanced Permissions.

    5. Under Advanced Permissions, select Browse Directories.

    6. Click OK.

  • SharePoint does not allow users without administrative privileges to browse user profiles.

    If the user ID specified in the User Name parameter does not have administrative privileges, then this user needs permission to manage profiles.

    To grant permission to manage profiles:

    1. Open SharePoint Central Administration 3.0.

    2. Click Shared Services Administration - SharedServices1.

    3. Under User Profile and My Sites, select Personalization Service Permissions.

    4. Add the user (for example, user1) and select the Manage user profiles permission.

    5. Save and submit the user.

    User profiles are crawled if the user has specified the root site in the Site/Sub-Site URL parameter of the source configuration.

Known Issues for SharePoint 2007 Connector

  • Versions of list items whose object type is Folder are not crawled and indexed.

  • Site Collection Administrator users are not able to see documents if they are not listed among the document permission users.

  • The "Unable to type cast null" message is not an error. It is informational and appears when the crawler tries to crawl attachments that are not supported for a particular entity.

  • The "Principal user_name cannot be validated" error is returned when the crawler obtains a user name from the SharePoint repository that is not present in Active Directory.

  • Performance of the SharePoint connector can be impacted when the Crawl Versions attribute is set to true.

Supported Platforms

The following platforms are supported by the SharePoint Crawler connector:

  • Red Hat Linux 4

  • Windows 2003 Server Standard Edition and above with the latest Service Pack

Creating a SharePoint 2007 Source

Create a source for the newly-created user-defined source type on the Home - Sources page. Enter a source name. Provide values for the configuration parameters described in the following list. Also see Table 7-6, "Supported Values for SharePoint Source Parameters".

  • SharePoint Version: Version of the SharePoint server (SharePoint Portal Server/MOSS 2007) to crawl. (Required)

  • Container name: Contains the names of the containers to be crawled by Oracle SES. You can specify multiple container names as a comma-delimited list. (Required)

    You can crawl an entire area or site or a specific folder. The format for specifying a container folder is AreaName/LibraryName/FolderName/SubFolderName.

    To crawl all documents in the Area or Library, the format is AreaName or AreaName/LibraryName.

    To index the entire SharePoint portal, enter a slash (/).

    To crawl all sites, enter sites.

    Examples for SharePoint Portal Server:

    • Container name: AreaName

      The entire Area is crawled.

    • Container name: AreaName/LibraryName/Folder21

      Folder21 and its subfolders within LibraryName are crawled.

    • Container name: LibraryName

      All documents inside the Library and its subfolders are crawled.

    Examples for MOSS 2007:

    • Container name: LibraryName/Folder21

      Folder21 and its sub-folders within LibraryName are crawled.

    • Container name: LibraryName

      All documents inside the Library and its subfolders are crawled.

      The path for the container cannot contain any special characters. Enter a backslash (\) before a slash or a comma.

  • Attribute list: A comma-delimited list of attributes, as described in Table 7-7. The format for an attribute list is AttributeName, AttributeName. Multiple attributes with the same name, such as Emp_ID, Emp_ID, are not allowed.

    In MOSS 2007, all attributes viewable from the UI are indexed by default. List all custom attributes to index, using the names displayed in the user interface.

    In SPPS (SP 2003), the Title, LastModifiedDate, and Author attributes are indexed by default. List any other attributes to index, using the names displayed in the UI.

    If you update the attribute list from the administrator parameters, then perform a forced recrawl to delete the indexes of the old attribute list and to create indexes for the new attribute list.

  • Domain name: The domain name of the user that is used to crawl the SharePoint site. For example, if you intend to use the OracleDomain\Administrator user for crawling, then enter OracleDomain for this parameter. Do not include .com or .in or any other suffix in the name. (Required)

  • User name: Specifies the user name of a valid SharePoint Portal Server/MOSS 2007 user. Do not include the domain name for this user. For example, for OracleDomain\Administrator, enter Administrator. (Required)

  • Password: Specifies the password of the SharePoint user specified in User name. (Required)

  • Authentication attribute: Format of the user and group identity stored in the ACL of SharePoint objects. This format must be an authentication attribute of the Oracle SES active identity plug-in, such as USER_NAME for an Active Directory identity plug-in. Otherwise, the ACL validation fails during indexing. (Required and case sensitive)

    For example, this value is USER_NAME for the Microsoft Active Directory identity plug-in.

  • SPS Site/Sub-Site URL: The URL of the Site or Sub-site of the SharePoint Portal, which is used for viewing the search results. (Required)

    This URL has the form http://HostName:PortNumber or http://HostName:PortNumber/SubSiteName.

  • Crawl Security Settings: Sets security on documents for indexing. (Required)

    This setting can be one of the following:

    • NORMAL: The regular crawl uses site-level access control lists (ACLs) but not document-level ACLs.

    • RELAX: Use this mode when the SharePoint Site Administrator user information is not available and the SharePoint user has only Visitor (Read) permissions on the site. By default, such a user cannot crawl sub-sites under the main site; see the workaround in "Known Limitations of the SharePoint 2007 Connector". This mode is intended for exposing public documents temporarily and quickly to search. The SES administrator must be careful not to expose documents to other users inadvertently.

    • STRICT: Captures even document-level security. This mode requires that an additional Web Service agent, Oracle MOSS Web Service, be installed on the SharePoint 2007 server. See "Deploying the Web Service on MOSS 2007".

  • Simple Include: Only include URLs having at least one word mentioned in this parameter. Separate the words with commas.

  • Simple Exclude: Exclude all URLs having one or more word(s) mentioned in this parameter. Separate the words with commas.

  • Regular Expression Include: Include all URLs that match the expression provided in this parameter.

  • Regular Expression Exclude: Exclude all URLs that match the expression provided in this parameter.

  • Crawl versions: Controls whether multiple versions of documents are crawled. Valid values are true and false. Any other value is interpreted as false. The default value is false, so only the latest version is crawled. (Optional)

  • Crawl folder attributes: Controls whether folder attributes are crawled. The default value is false. Valid values are true or false, and any other value is interpreted as false. (Optional)

  • Crawl attachments: This parameter indicates whether attachments should be crawled. The default value is false. Valid values are true or false, and any other value is interpreted as false. (Optional)

  • LDAP URL: URL of the LDAP server, such as ldap://IP:port, where the default port number is 389.

  • LDAP Search Base: The search base of the LDAP server, such as DC=abc, DC=com. When the value of Authentication Attribute is DN, specify the LDAP URL and the LDAP search base of the LDAP server configured in the identity plug-in. Otherwise, leave these parameters blank.
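
Some of the parameter values above follow mechanical rules: a backslash goes before a slash or comma inside a container name, several containers form a comma-delimited list, and the account is split into a domain name and a user name. The sketch below shows one way to prepare such values before typing them into the Administration GUI. All function names are hypothetical, and the sample names (Marketing, Shared Documents) are invented for illustration.

```python
def escape_container_segment(name):
    """Put a backslash before each literal slash or comma in one name."""
    return name.replace("/", "\\/").replace(",", "\\,")


def container_path(*segments):
    """Join escaped Area, Library, and Folder names with '/'."""
    return "/".join(escape_container_segment(s) for s in segments)


def container_list(paths):
    """Join already-built container paths into one comma-delimited value."""
    return ",".join(paths)


def split_account(account):
    """Split 'DOMAIN\\user' into (domain name, user name)."""
    domain, sep, user = account.partition("\\")
    if not sep or not domain or not user:
        raise ValueError("expected DOMAIN\\user")
    return domain, user


print(container_path("Marketing", "Shared Documents", "Q1, Q2"))
# Marketing/Shared Documents/Q1\, Q2
print(container_list(["AreaName", "LibraryName/Folder21"]))
# AreaName,LibraryName/Folder21
print(split_account("OracleDomain\\Administrator"))
# ('OracleDomain', 'Administrator')
```

The sketch does not cover names that already contain a backslash, because the rules above do not say how those are treated.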

Table 7-6 summarizes the supported values for the configuration parameters of the SharePoint Crawler connector.

Table 7-6 Supported Values for SharePoint Source Parameters

| Parameter Name | SharePoint Portal Server | MOSS 2007 |
|---|---|---|
| SharePoint Version | 2003, 2.0 | 2007, 3.0 |
| Container name | (/) for full site, Library Name, List Name, Area Name | (/) for full site, Library Name, List Name |
| Attribute list | AttributeName1, AttributeName2 | AttributeName1, AttributeName2 |
| Domain name | Domain name of the user | Domain name of the user |
| User name | Valid administrator user for SharePoint Portal Server | Valid administrator user for MOSS 2007 |
| Password | Password for the user | Password for the user |
| Authentication attribute | USER_NAME | USER_NAME |
| SPS Site/Sub-Site URL | IP address or host name with port on which SharePoint Portal Server is installed | IP address or host name with port on which MOSS 2007 is installed |
| Crawl Security Settings | NORMAL, RELAX | NORMAL, RELAX, STRICT |
| Simple Include | Part of URL | Part of URL |
| Simple Exclude | Part of URL | Part of URL |
| Regular Expression Include | All URLs that match the expression | All URLs that match the expression |
| Regular Expression Exclude | All URLs that match the expression | All URLs that match the expression |
| Crawl versions | true or false | true or false |
| Crawl folder attributes | true or false | true or false |
| Crawl attachments | true or false | true or false |
| LDAP URL | URL of the LDAP server | URL of the LDAP server |
| LDAP Search Base | LDAP Search Base | LDAP Search Base |
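
The four include and exclude parameters can be read as a URL filter. The following sketch shows one possible interpretation. It rests on several assumptions that this section does not state: exclusion rules take precedence over inclusion rules, "having a word" means a case-sensitive substring match, and a regular expression matches if it is found anywhere in the URL. Python's re syntax is used only for illustration and may differ from the syntax that Oracle SES accepts. The function names are hypothetical.

```python
import re


def split_words(value):
    """Turn a comma-separated parameter value into a list of non-empty words."""
    return [w.strip() for w in value.split(",") if w.strip()]


def url_is_crawled(url, simple_include="", simple_exclude="",
                   regex_include="", regex_exclude=""):
    """Return True if the URL passes the configured filters."""
    # Exclusion rules are checked first (assumed precedence).
    if any(word in url for word in split_words(simple_exclude)):
        return False
    if regex_exclude and re.search(regex_exclude, url):
        return False

    include_words = split_words(simple_include)
    if not include_words and not regex_include:
        return True  # no include rule configured, so nothing is filtered out
    if any(word in url for word in include_words):
        return True
    return bool(regex_include and re.search(regex_include, url))


print(url_is_crawled("http://host:80/sites/hr/policy.doc",
                     simple_include="hr,finance"))   # True
print(url_is_crawled("http://host:80/sites/hr/draft.doc",
                     simple_include="hr", simple_exclude="draft"))  # False
```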


Table 7-7 Attributes for List Items and Versions Crawled for SharePoint 2007

| List Item Type | Attributes |
|---|---|
| Document Library | Title, Author, Created, Modified |
| Picture Library | Title, ImageSize, ImageCreateDate, Description, Keywords |
| Form Library | Title, Author, Created, Modified |
| Translation Library | Title, Name, Language, Base Document Version, Translation Status, Created |
| Data Connection Library | Connection Type, Description, Keywords, Title, UDC Purpose, Created |
| Slide Library | Name, Presentation, Description, Created |
| Report Library | Name, Title, Author, Created, Report Category, Report Status |
| Dash Board | Name, Title, Author, Created |
| Wiki Page Library | Title, Author, Created, Modified |
| Announcements | Title, Body, Editor, Modified, Author, Created |
| Contacts | Company, WorkCity, Created, Email, Comments, Title, Editor, HomePhone, JobTitle, Modified, WorkZip, WorkPhone, WorkState, FirstName, Author, FullName, WorkCountry, CellPhone, WorkFax, WorkAddress |
| Links | Comments, Editor, Modified, Author, URL, Created |
| Discussion Reply | Body, Created, DiscussionTitle, Editor, Modified, Author |
| Calendar | EventType, Title, EventDate, Duration, Editor, WorkspaceLink, Modified, EndDate, Description, fRecurrence, Author, fAllDayEvent, Created |
| Task | Title, StartDate, Body, Status, Editor, Priority, AssignedTo, DueDate, Modified, Author, PercentComplete, Created |
| Project Task | Title, StartDate, Body, Status, Editor, Priority, AssignedTo, DueDate, Modified, Author, PercentComplete, Created |
| Issue Tracking | Category, LinkIssueIDNoMenu, RelatedIssues, IssueID, Priority, DueData, Comment, V3Comments, IsCurrent, Created, Title, Status, Editor, AssignedTo, Modified, Author |
| Custom List | Title, Editor, Modified, Author, Created |
| Languages and Translators | Language_x0020_From, Language_x0020_To, Modified, Author, Translator, Created, Editor |
| KPI List | Title, PercentExpression, Editor, ViewGuid, Modified, Value, AutoUpdate, KpiComments, Author, Goal, ValueExpression, Warning, KpiDescription, DataSource, LowerValuesAreBetter, Created |
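
The Attribute list parameter takes names such as those in Table 7-7 as a comma-delimited list and does not allow the same name twice. The sketch below builds such a value and rejects duplicates before you paste it into the source configuration. The function name build_attribute_list is hypothetical, and the exact-match duplicate check (case-sensitive) is an assumption.

```python
def build_attribute_list(names):
    """Return 'Name1, Name2, ...' and reject empty or repeated names."""
    cleaned = [name.strip() for name in names]
    seen = set()
    for name in cleaned:
        if not name:
            raise ValueError("empty attribute name")
        if name in seen:
            raise ValueError(f"duplicate attribute name: {name}")
        seen.add(name)
    return ", ".join(cleaned)


print(build_attribute_list(["Title", "Author", "Created", "Modified"]))
# Title, Author, Created, Modified
```

Passing ["Emp_ID", "Emp_ID"] raises ValueError, which mirrors the rule that repeated attribute names are not allowed.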


Deploying the Web Service on MOSS 2007

For MOSS 2007, if the Crawl Security Settings parameter is set to STRICT, then you must install an extra web service, Oracle MOSS Web Service. The following installation and deinstallation files are provided by the OracleMOSSService installer at ORACLE_HOME/search/lib/plugins/sps/WebService.zip:

  • OracleMossService.wsp

  • install.cmd

  • de-install.cmd

To install or deinstall the Oracle MOSS Web Service: 

  1. Run install.cmd to install, or run de-install.cmd to deinstall.

  2. Verify that the STSADM.exe file is in the following location: Drive:\Program Files\Common Files\Microsoft Shared\web server extensions\12\BIN.

    If STSADM.exe is not in that folder, specify the correct path when the installer prompts for it.

  3. Press any key to continue.
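
Before running the installer, it can help to know the exact STSADM.exe location to check for a given drive. The sketch below only builds the expected path from the location given in step 2. The function name stsadm_path is hypothetical, and the drive letter is whatever drive hosts the SharePoint installation.

```python
from pathlib import PureWindowsPath

# Location from step 2, relative to the drive root.
STSADM_RELATIVE = PureWindowsPath(
    "Program Files", "Common Files", "Microsoft Shared",
    "web server extensions", "12", "BIN", "STSADM.exe",
)


def stsadm_path(drive_letter):
    """Build the expected STSADM.exe location for a drive letter such as 'C'."""
    return PureWindowsPath(f"{drive_letter}:\\") / STSADM_RELATIVE


print(stsadm_path("C"))
# C:\Program Files\Common Files\Microsoft Shared\web server extensions\12\BIN\STSADM.exe
```

On the SharePoint server itself, the resulting path can be passed to os.path.isfile to confirm that the file exists. If it does not, supply the correct path when the installer prompts for it.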