8 Configuring Access to Collaboration Sources

Setting Up Lotus Notes Sources

Lotus Notes data is stored in Notes databases, which can be further contained inside directories on a server. A Lotus Domino Server instance can have one or more databases that can be crawled using the Lotus Notes source. The Lotus Notes source navigates through the databases to crawl the documents (for example, e-mail, calendar, address book, and "to do") in the specified databases. It stores the metadata and access information in Oracle SES to provide search according to the end users' credentials.

The Lotus Notes connector lets you enable or disable multiple attachment support with the Attachment As Search Item attribute. When this is disabled, the additional attributes Parent URL and Parent Title are added for all attachment documents, to link them with the parent document.

The Lotus Notes source supports incremental crawling; that is, it crawls and indexes only those documents that have changed since the most recent crawl was scheduled. A document is re-crawled if the content, metadata, display URL, or direct security access information of the document has changed. Documents deleted from a database are removed from the index during incremental crawling.

To enable Oracle SES to launch the Notes thick client, set the Notes Thick Client parameter to true.

Important Notes for Lotus Notes Sources

The user account used to crawl Lotus Notes databases should preferably be an Administrator account, so that it has access to all databases and can retrieve and crawl all documents in the specified databases.

Required Software

  • Lotus Domino Server R5.0.9/R6.5.4/R7.0

  • Notes Clients R5.0.9/R6.5.4/R7.0

Required Tasks

The following tasks must be performed before installing the Lotus Notes source:

  1. HTTP and DIIOP tasks must be running on Domino Server.

  2. If the Active Directory identity plug-in is used, then the users and user-groups in the Domino Directory must be synchronized with Active Directory. While using the Active Directory identity plug-in, the short-name in the Lotus Notes person document is used for validating the user in Active Directory, so it should be a resolvable logon name in Active Directory.

  3. Configure the server document:

    1. Open the server document on the Lotus Notes server that must be crawled.

    2. On the Configuration page, expand the Server section.

    3. On the Security page, in the Programmability Restrictions area, specify the appropriate security restrictions for your environment in the following fields:

      Run restricted Lotus Script/Java agents

      Run restricted Java/Javascript/COM

      Run unrestricted Java/Javascript/COM

      For example, you might specify an asterisk (*) to allow unrestricted access by Lotus Script/Java agents, and specify user names that are registered in the Domino Directory for the Java/Javascript/COM restrictions.

      Note:

      The crawler that you configure to crawl this server with the DIIOP protocol must be able to use the user names that you specify in these fields.
    4. Open the Internet Protocol page, then open the HTTP page, and set the Allow HTTP Clients to Browse Database option to Yes.

    5. Configure the user document:

      Open the user document on the Lotus Notes server. This document is stored in the Domino directory.

      On the Basics page, for Internet password, specify a password.

    6. Restart the DIIOP task on the server.

  4. Copy the Lotus Notes JAR files Notes.jar and NCSO.jar to the directory ses_home/search/lib/plugins/ln/ before activating the Lotus Notes identity plug-in.

Known Issues

  • A Lotus Notes source does not index encrypted fields or the content of attachments in encrypted documents. For encrypted documents, the URL of the search result launches the Notes document instead of the attachment file; for non-encrypted documents, it launches the attachment file.

  • Deleted Notes documents and attachments in Notes documents are still searchable after an incremental crawl that was set by specifying 'Recrawl using last modified date' as true. To remove URLs from deleted documents or attachments from the Oracle SES index, either perform a force re-crawl (that is, change the re-crawl policy to Process All Documents on the Home - Schedules - Edit Schedule page) or mark the 'Recrawl using last modified date' source parameter as false.

Setting Up Identity Management for Lotus Notes

Activate an identity plug-in on the Global Settings - Identity Management Setup page.

The users/groups on Active Directory can be synchronized with Lotus Domino Directory such that all users/groups in Active Directory get registered in Domino as well. Thus, any ACL entry in a notes database or notes document can be validated in Active Directory also, and vice versa.

Oracle SES also provides a Lotus Notes identity plug-in so the Lotus Domino Directory can be used to authenticate and validate the notes native users and groups in Oracle SES.

Activate the Lotus Notes identity plug-in with the following parameters:

  • Server name: The Domino server fully qualified host name/IP address. If the HTTP port on the Domino server is not 80, then the host name should be server_name:HTTP_port_number.

  • User name: User name of a valid Lotus Domino Server user. Required.

  • Password: Internet password of the Lotus Notes user. Required.
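The Server name rule above (append the HTTP port only when it is not 80) can be sketched as a small helper. This is a minimal Python sketch; the function name domino_server_name and the example host name are hypothetical and are not part of Oracle SES.

```python
def domino_server_name(host: str, http_port: int = 80) -> str:
    """Build the value for the Server name parameter.

    The port is appended only when the Domino HTTP port is not 80.
    """
    if http_port == 80:
        return host
    return f"{host}:{http_port}"


# Hypothetical host name, used only for illustration.
print(domino_server_name("domino.example.com"))        # domino.example.com
print(domino_server_name("domino.example.com", 8080))  # domino.example.com:8080
```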

Creating a Lotus Notes Source

Create a Lotus Notes source on the Home - Sources page. Select Lotus Notes from the Source Type list, and click Create. Enter values for the following parameters:

  • Server Name: The fully qualified host name or IP address of the Domino server. If the HTTP port on the Domino server is not 80, then specify the host name as server_name:HTTP_port_number. Required.

  • Attribute list: The comma-delimited list of Lotus Notes attributes along with their data types to search. The format is AttributeName:AttributeType, AttributeName:AttributeType. The valid values are String, Number, and Date. For example: Subject:String

    Table 8-1 Lotus Notes Data Type Mapping

    Sr. No  Lotus Notes Data Type  Oracle SES Data Type
    1       Boolean                String
    2       Integer                Number (Big Decimal)
    3       String                 String
    4       Date                   Date


    While crawling a database, an attribute is indexed only if both name and type match the configured name and type; otherwise, it is ignored. This is an optional parameter.

    The default searchable attributes for Lotus Domino Server are LASTMODIFIEDDATE, Title, and Author. Multiple attributes with the same name are not allowed.

  • User name: The user name of a valid Lotus Domino Server user. The user should be an Administrator user or a user who has access to all folders and documents of the databases configured in the Container name parameter. The user should be able to retrieve content, metadata, and ACL from documents of all databases configured in Container name parameter. Required.

  • Password: Internet password of the Lotus Notes user. Required.

  • Container Name: Names of the containers to be crawled. Multiple container names must be comma-delimited. The container name can include folders, databases, views and folders within databases. For example, database-abc.nsf, folders-folder1, views-abc.nsf:By Author, and db-abc.nsf:folder\subfolder. Note that Lotus Notes database file name must be specified with the extension.

  • Crawl Public Documents: Enter true to crawl the public documents in Notes databases so that they are available to anonymous users in Oracle SES; otherwise, enter false. Required.

  • Authentication Attribute: The attribute used to validate the ACL. With the Active Directory identity plug-in, set the value to USER_NAME. With the Lotus Notes identity plug-in, set the value to NATIVE. Required.

  • Mail Template Name: This parameter is specific to the mail-databases and the mail template's name should be specified here if any/all of the databases being crawled are mail databases. This is a mandatory parameter if either the Past Days or Future Days parameter is specified.

  • Past Days: If the user is crawling calendar entries, then this parameter specifies the number of days in the past for which the calendar entries are picked. The date of reference here is the start date of the event. This accounts for the number of days in the past, and it does not filter the search by time.

  • Future Days: If the user is crawling calendar entries, then this parameter specifies the number of days in the future for which the calendar entries are picked. The date of reference here is the end date of the event. This accounts for the number of days in the future, and it does not filter the search by time.

  • Notes Title Field: Because in Lotus Notes custom applications it is not mandatory to maintain a Title field, this parameter has been provided to specify those text fields that should be parsed to retrieve the title field. For example, you could enter Subject. With multiple field names, the first field available on the document is selected for the title. Required.

  • Notes Thick Client: Enter true to use Lotus Notes (thick client). Enter false to use Lotus Notes Web access.

  • Recrawl using last modified date: Enter true to enqueue only modified documents. Required.

  • Attachment As Search Item: Enter true to have each attachment submitted individually as an independent document with the same set of attributes and ACLs as the parent document. Enter false to have attachments added to the parent document and submitted as a single unit.
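The Attribute list format described above (AttributeName:AttributeType, comma-delimited, no duplicate names) can be checked before the value is entered. The following is a minimal Python sketch, not the validation that Oracle SES performs; the function name parse_attribute_list and the DueDate attribute are hypothetical, and treating attribute names as case-sensitive is an assumption.

```python
# Types accepted in the Attribute list parameter, per the description above.
VALID_TYPES = {"String", "Number", "Date"}

# Table 8-1: Lotus Notes data type -> Oracle SES data type.
NOTES_TO_SES_TYPE = {
    "Boolean": "String",
    "Integer": "Number",
    "String": "String",
    "Date": "Date",
}


def parse_attribute_list(value: str) -> dict:
    """Parse 'Name:Type, Name:Type' into {name: type}, rejecting bad input."""
    attributes = {}
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing comma or an empty parameter
        name, sep, type_name = entry.partition(":")
        name, type_name = name.strip(), type_name.strip()
        if not sep or not name:
            raise ValueError(f"expected AttributeName:AttributeType, got {entry!r}")
        if type_name not in VALID_TYPES:
            raise ValueError(f"unsupported type {type_name!r} for {name!r}")
        if name in attributes:  # assumption: names compared case-sensitively
            raise ValueError(f"duplicate attribute name {name!r}")
        attributes[name] = type_name
    return attributes


print(parse_attribute_list("Subject:String, DueDate:Date"))
```

A Notes field of type Integer would be declared as Number here, following NOTES_TO_SES_TYPE.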

Displaying the Parent URL in the Search Results

Take the following steps to display the Parent URL attribute in the search results for the Lotus Notes connector:

  1. On the Global Settings page, select Use Advanced Configuration. The Global Settings - Configure Search Result List is displayed.

  2. Under Attribute Selection, move ParentURL to the Included list.

  3. Under Style Sheets in the Enter an XSLT box, scroll to <!-- Links link -->...</td> and enter the following XSL code:

    <tr>
       <td>
          <xsl:if test="parenturl [.!= '']">
             <xsl:text>ParentURL: </xsl:text>
             <a class="browseLink" href="{parenturl}">
                <xsl:value-of select="parenturl" />
             </a>
          </xsl:if>
       </td>
    </tr>
    
  4. Click Apply.

Setting Up Microsoft Exchange Sources

Oracle SES can crawl through and provide secure search for e-mail and calendar items, related metadata, attributes, ACLs, and attachments in Microsoft Exchange. It also provides attribute search and browse functionality, which allows search to be done against a specific subfolder in the hierarchy.

Oracle SES supports incremental crawling; that is, it crawls and indexes only those documents that have changed since the last crawl was scheduled. A document is re-crawled if either the content or metadata or the direct security access (permissions) information of the document has changed. A document is also re-crawled if it is moved within Microsoft Exchange. Documents deleted from Exchange are removed from the index during incremental crawls.

A Microsoft Exchange source covers the following objects in Exchange:

  • E-mail

  • E-mail attachments

  • Calendar events

Important Notes for Microsoft Exchange Sources

On the Exchange server, the super user must be granted the Send as and Receive as privileges. You can enable privileges globally for all users in the system. No user-specific privilege grants are required.

Required Software

  • Microsoft Internet Information Server (IIS)

Note:

The file ADODB.dll is usually included in the Microsoft .NET Framework SDK. However, if this file is not on your computer, then you must download the ADODB.dll appropriate for your system from Microsoft and install it using the following command:

gacutil /i adodb.dll

You can download the Microsoft .NET Framework from this site:

http://www.microsoft.com/en-us/download/default.aspx

Required Tasks

  • Proper permissions on the Exchange server must be granted to the Exchange administrator. The Exchange server is crawled with the permission of a super user with the Send as and Receive as privileges. The easiest way to configure this is to use an administrator as super user or create a super user with the administrator privilege and the Send as and Receive as privileges targeting Exchange inbox store and public folders.

  • To enable the Outlook Web Access logon page, you must enable forms-based authentication on the server. To enable forms-based authentication:

    1. On the Exchange server, log on with the Exchange administrator account, and then start Exchange System Manager.

    2. In the console tree, expand Servers.

    3. Expand the server for which you want to enable forms-based authentication, and then expand Protocols.

    4. Expand HTTP, right-click Exchange Virtual Server, and then click Properties.

    5. In the Exchange Virtual Server Properties dialog box, on the Settings tab, in the Outlook Web Access pane, select the Enable Forms Based Authentication option.

    6. Click Apply, and then click OK.

    7. Restart the IIS server.

    If you are using forms-based authentication with SSL off-loading, you must configure your Exchange Server front-end servers to handle this scenario.

    See Also:

    • How to Enable Forms-Based Authentication at

    http://technet.microsoft.com/en-us/library/bb123832.aspx

Known Issues

E-mails with multibyte characters sent from a browser with a different language set than the characters in the mail are not indexed correctly in Oracle SES. The multibyte characters are converted to question marks (?).

This is a known e-mail content issue with Microsoft Exchange. To send future e-mails so that the Microsoft Exchange connector can crawl them properly, either of these workarounds can be applied:

  • Change the browser language to the characters in the e-mail. For example, set it to "Japanese" to input Japanese characters.

  • Change the value of the following registry key:

    HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\MSExchangeWEB\OWA\UseRegionalCharset
         (Original) '1'
         (New)      Any number (except 1). For example, '0'
    

Setting Up Identity Management for Microsoft Exchange

The Microsoft Exchange connector uses WebDAV for best performance. Oracle recommends that Active Directory be used as the identity management system for the Oracle SES instance. The Active Directory instance must be the same one that Microsoft Exchange is using to authenticate users on the file system.

For the Oracle SES instance to read the files during crawling, add permission to each folder and file to make them accessible by the operating system user that runs the Oracle SES instance. Adding permissions to a folder automatically adds the same permissions to all the files and subfolders in the folder.

Creating a Microsoft Exchange Source

Create a Microsoft Exchange source on the Home - Sources page. Select Microsoft Exchange from the Source Type list, and click Create.

Enter values for the following parameters:

  • User Name: User name to authenticate between Oracle SES and Exchange

  • Password: password to authenticate between Oracle SES and Exchange

  • Server: Microsoft Exchange server IP

  • Domain: Microsoft Exchange server domain

  • LDAP Port: Microsoft Exchange LDAP port

  • Simple Include: To limit crawling, specify up to 50 colon-delimited path inclusion boundary rules using simplified regular expressions. An inclusion rule specifies that a URL must contain, start with, or end with a term. Only the *, ^, and $ operators are permitted. An asterisk (*) is a wildcard, a caret (^) denotes the beginning of a URL, and a dollar sign ($) denotes the end. For example: ^https://*.oracle.com/.jpg$

  • Simple Exclude: To limit crawling, specify up to 50 colon-delimited path exclusion boundary rules using simplified regular expressions. Only *, ^, and $ operators are permitted.

  • Regular Expression Include: To limit crawling, specify up to 50 colon-delimited path inclusion boundary rules using regular expression rules (full java.util.regex syntax). For example:

    ^https://.*\.oracle(?:corp){0,1}\.com

  • Regular Expression Exclude: To limit crawling, specify up to 50 colon-delimited path exclusion boundary rules using regular expression rules (full java.util.regex syntax).
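To see what the two kinds of boundary rules match, the following minimal Python sketch translates a simplified rule into an ordinary regular expression and also evaluates the regular expression example above. The function name simple_rule_to_regex and the test URLs are hypothetical, and the translation reflects one reading of the rule description (a rule without ^ or $ means "contains"); it is not the Oracle SES implementation. Python's re module is used here instead of java.util.regex; the example pattern uses only syntax that both engines share.

```python
import re


def simple_rule_to_regex(rule: str) -> re.Pattern:
    """Translate a simplified rule (only *, ^, and $ are special) to a regex."""
    anchored_start = rule.startswith("^")
    anchored_end = rule.endswith("$")
    body = rule[1:] if anchored_start else rule
    body = body[:-1] if anchored_end else body
    # Escape every literal piece, then let each * match any run of characters.
    pattern = ".*".join(re.escape(part) for part in body.split("*"))
    if anchored_start:
        pattern = "^" + pattern
    if anchored_end:
        pattern = pattern + "$"
    return re.compile(pattern)


simple_rule = simple_rule_to_regex("^https://*.oracle.com/.jpg$")
full_rule = re.compile(r"^https://.*\.oracle(?:corp){0,1}\.com")

# search() gives "contains" semantics when a rule has no anchors.
print(bool(simple_rule.search("https://www.oracle.com/.jpg")))     # True
print(bool(full_rule.search("https://mail.oraclecorp.com/inbox")))  # True
print(bool(full_rule.search("https://www.example.com/")))           # False
```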

Microsoft Exchange Source Attributes

  • ReceivedTime

  • From

  • To

  • CC

  • Subject

  • Lastmodifieddate

Setting Up NTFS Sources for Windows

This section contains information for NTFS sources on Windows. For NTFS on UNIX, see "Setting Up NTFS Sources for Linux and UNIX".

The NTFS connector enables Oracle SES to search file repositories in Microsoft NTFS. An Oracle SES NTFS source collects the content, metadata attributes and ACLs of files in NTFS. An NTFS source supports incremental crawl. After the initial crawl is performed, subsequent crawls only collect those documents that have changed since the last crawl. A document is re-crawled if the content, metadata, or the ACL information of the document has changed. A file is also re-crawled if it is moved between folders. Files deleted from NTFS are removed from the index during incremental crawls.
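The incremental crawl rules above can be illustrated by comparing two snapshots of a file share. This is a minimal Python sketch under stated assumptions: the FileState fields, the plan_incremental_crawl function, and the sample paths and hash values are all hypothetical, and the sketch does not describe how Oracle SES detects changes internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileState:
    """Hypothetical per-file fingerprint taken at crawl time."""
    content_hash: str
    metadata_hash: str
    acl_hash: str


def plan_incremental_crawl(previous: dict, current: dict):
    """Return (paths to re-crawl, paths to remove from the index)."""
    # Re-crawl anything that is new at its path or whose content,
    # metadata, or ACL fingerprint differs from the last crawl.
    recrawl = sorted(
        path for path, state in current.items()
        if previous.get(path) != state
    )
    # Paths that no longer exist are removed; a moved file therefore shows
    # up once as a removal (old path) and once as a re-crawl (new path).
    remove = sorted(path for path in previous if path not in current)
    return recrawl, remove


previous = {
    r"\\fileserver\share\a.txt": FileState("c1", "m1", "a1"),
    r"\\fileserver\share\b.txt": FileState("c2", "m2", "a2"),
    r"\\fileserver\share\old\c.txt": FileState("c3", "m3", "a3"),
}
current = {
    r"\\fileserver\share\a.txt": FileState("c1", "m1", "a1"),      # unchanged
    r"\\fileserver\share\b.txt": FileState("c2", "m2", "a9"),      # ACL changed
    r"\\fileserver\share\new\c.txt": FileState("c3", "m3", "a3"),  # moved
}

recrawl, remove = plan_incremental_crawl(previous, current)
print(recrawl)
print(remove)
```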

Important Notes for NTFS Sources

  • The operating system user running the Oracle SES instance must have read permission on the NTFS file share being crawled. For example, if the remote file share \\computer1\share1\directory1\ is crawled by the NTFS source, then the Oracle SES instance must be run as a domain user who has access to the file share.

  • If you get the ACL in the form <encrypted acl>@domain for a folder on a remote computer, it probably means that the computer running the Oracle SES instance and the remote computer are on different domains and your computer cannot interpret the ACLs appropriately.

  • Currently, the Oracle SES crawler fetches the shared folder itself as an empty document but does not index it; therefore, the total number of unique documents indexed is less than the total number of documents fetched.

  • An ACL error may appear when crawling an NTFS source as a built-in user or group, such as an Administrator user. As a workaround, grant explicit access to the administrator user by right-clicking the NTFS share and adding the administrator user with all the permissions.

  • Everyone is a special group that represents all current network users, including guests and users from other domains. When a user logs on to the network, the user is automatically added to the Everyone group. The NTFS connector supports the Everyone group. All documents for which the Everyone group has permission are crawled and accessed like public documents. There is no need to log in to the query application to access these public documents. However, if a user is denied access to a document while the Everyone group has access, then all users except for the denied user can see the document, and these users must log in to the query application to see the document.

  • When using Internet Explorer with files on a different domain, you must explicitly log in to Internet Explorer to open result links to those files.

  • When you use the NTFS connector and search file types of .txt, .zip, or .rtf, only the Title and Author attributes are fetched and indexed. For these attributes, the crawler fetches the properties stored in the authoring program (typically accessed by selecting Properties from the File menu) and not the NTFS properties (accessed in Windows Explorer by right-clicking the file name and choosing Properties).
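The Everyone group behavior described in the notes above can be summarized as a small decision function. This is a minimal Python sketch of that description only; the function names, the user names, and the flat allowed/denied sets are hypothetical, and groups other than Everyone are not modeled.

```python
EVERYONE = "Everyone"


def is_public(allowed: set, denied: set) -> bool:
    """A document is public only if Everyone is allowed and nobody is denied."""
    return EVERYONE in allowed and not denied


def can_view(user, allowed: set, denied: set) -> bool:
    """Decide visibility; user is None for a visitor who has not logged in."""
    if is_public(allowed, denied):
        return True
    if user is None:
        return False  # a deny entry means users must log in first
    if user in denied:
        return False
    return EVERYONE in allowed or user in allowed


# Everyone allowed, no deny entries: visible without logging in.
print(can_view(None, {"Everyone"}, set()))        # True
# Everyone allowed, but bob is denied: login required, bob excluded.
print(can_view(None, {"Everyone"}, {"bob"}))      # False
print(can_view("alice", {"Everyone"}, {"bob"}))   # True
print(can_view("bob", {"Everyone"}, {"bob"}))     # False
```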

Required Software

Microsoft .NET Framework 2.0

Required Tasks

If not previously installed, then download and install the Microsoft .NET 2.0 Framework from this site:

http://www.microsoft.com/en-us/download/default.aspx

Setting Up Identity Management for NTFS Sources

When an NTFS source is used, Oracle recommends that Active Directory be used as the identity management system for the Oracle SES instance. The Active Directory instance must be the same one that NTFS is using to authenticate users on the file system.

For the Oracle SES instance to read the files during crawling, add the permission to each folder and file to make it accessible by the operating system user that runs the Oracle SES instance. Adding permissions to a folder automatically adds the same permissions to all the files and sub-folders in the folder.

NTFS sources rely on Active Directory for security permissions. Because permissions at the server local group level are not defined in Active Directory, these permissions are not supported when crawling NTFS sources. Permissions for server local groups (not domain local groups) are ignored during crawling. Permissions for domain groups and users inherited from server local groups also are ignored.

Creating an NTFS Source

Create an NTFS source on the Home - Sources page. Select NTFS from the Source Type list, and click Create. Enter values for the following parameters:

  • UNC Path: The UNC path to crawl; for example, \\MyServer\Mysharedfolder

  • Domain Name: Domain name of the URL (UNC Path)

  • Simple Include: To limit crawling, specify up to 50 colon-separated path boundary rules using simplified regular expressions. Only *, ^, and $ operators are permitted. For example: ^https://*.oracle.com/.jpg$

  • Simple Exclude: To limit crawling, specify up to 50 colon-separated path boundary rules using simplified regular expressions. Only *, ^, and $ operators are permitted.

  • Regular Expression Include: To limit crawling, specify up to 50 colon-separated path boundary rules using regular expression rules (full java.util.regex syntax). For example: ^https://.*\.oracle(?:corp){0,1}\.com

  • Regular Expression Exclude: To limit crawling, specify up to 50 colon-separated path boundary rules using regular expression rules (full java.util.regex syntax).

  • Use Local Display URL: Enter true to use the local display URL or false to display the content in a web browser.

  • Authentication Attribute: Authentication attribute used by the LDAP to validate the user. Use USER_NAME for Active Directory and nickname for Oracle Internet Directory.

After crawling an NTFS source, you may get a "No User Found Matching the Criteria" error message on the Home - Schedules - Data Synchronization page. This error is signalled by the identity plug-in. The NTFS connector tries to validate the principal as user first. If that fails, then it tries to validate the principal as group. This error occurs if there are groups as ACL for a document, because the connector does not know if the given principal is a user or a group.
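The user-then-group fallback described above can be sketched as follows. This is a minimal Python illustration, assuming two hypothetical lookup callables that stand in for the identity plug-in; the names resolve_principal, jsmith, and Engineering are invented for the example.

```python
def resolve_principal(name, lookup_user, lookup_group):
    """Try the principal as a user first, then as a group.

    lookup_user and lookup_group are hypothetical callables that return
    True when the identity plug-in recognizes the name.
    """
    if lookup_user(name):
        return "user"
    # A failed user lookup at this point is what can be reported as
    # "No User Found Matching the Criteria", even when the principal
    # turns out to be a valid group.
    if lookup_group(name):
        return "group"
    return None


# Hypothetical directory contents, used only for illustration.
users = {"jsmith"}
groups = {"Engineering"}

print(resolve_principal("jsmith", users.__contains__, groups.__contains__))       # user
print(resolve_principal("Engineering", users.__contains__, groups.__contains__))  # group
print(resolve_principal("unknown", users.__contains__, groups.__contains__))      # None
```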

NTFS Source Attributes

  • ACLS_

  • FILEDATE

  • Host

  • Language

  • LastModifiedDate

  • Mimetype

  • Title

Setting Up NTFS Sources for Linux and UNIX

This section contains information for NTFS sources on Linux and UNIX platforms, which have additional setup steps not required on Windows. For NTFS sources on Windows, see "Setting Up NTFS Sources for Windows".

An NTFS source collects the content, metadata attributes, and ACLs of files in NTFS. An NTFS source supports incremental crawl. After the initial crawl is performed, subsequent crawls only collect those documents that have changed since the last crawl. A document is re-crawled if the content, metadata or the ACL information of the document has changed. A file is also re-crawled if it is moved between folders. Files deleted from NTFS are removed from the index during incremental crawls.

Important Notes for NTFS Sources

  • On the Windows server, the super user must have permission to read the NTFS file share.

  • The super user must be the impersonate user in the IIS Server.

  • The default behavior for NTFS for Linux and UNIX platforms is to use local file display URL, so the client computer must have access to the file share.

  • An ACL error may appear when crawling an NTFS source as a built-in user or group, such as an Administrator user. As a workaround, grant explicit access to the administrator user by right-clicking the NTFS share and adding the administrator user with all the permissions.

  • Everyone is a special group that represents all current network users, including guests and users from other domains. When a user logs on to the network, the user is automatically added to the Everyone group. The NTFS connector supports the Everyone group. All documents for which the Everyone group has permission are crawled and accessed like public documents. There is no need to log in to the query application to access these public documents. However, if a user is denied access to a document while the Everyone group has access, then all users except for the denied user can see the document, and these users must log in to the query application to see the document.

  • When using Internet Explorer with files on a different domain, you must explicitly log in to Internet Explorer to open result links to those files.

Required Software

  • Microsoft Internet Information Server (IIS)

  • Microsoft .NET 2.0 Framework

Setting Up Identity Management with NTFS Sources

When an NTFS source is used, Oracle recommends that Active Directory be used as the identity management system for the Oracle SES instance. The Active Directory instance must be the same one that NTFS is using to authenticate users on the file system.

For the Oracle SES instance to read the files during crawling, add permission to each folder and file to make them accessible by the operating system user that runs the Oracle SES instance. Adding permissions to a folder automatically adds the same permissions to all the files and sub-folders in the folder.

NTFS sources rely on Active Directory for security permissions. Because permissions at the server local group level are not defined in Active Directory, these permissions are not supported when crawling NTFS sources. Permissions for server local groups (not domain local groups) are ignored during crawling. Permissions for domain groups and users inherited from server local groups also are ignored.

Creating an NTFS Source Type on Linux and Unix Platforms

To create an NTFS source type on a Linux or UNIX platform:

  1. On the Global Settings page, click the Source Types link.

  2. On the Source Types page, click Create.

  3. Provide the following parameter values on the Complete the New Source Type : Step 1 page:

    Parameter                        Value
    Name                             Microsoft NTFS (Linux and Unix)
    Plug-in Manager Java Class Name  oracle.search.plugin.ntfs.linux.NTFSLinuxCrawlerPluginManager
    Plug-in Jar File Path            ntfs/ntfs.jar

  4. Click Next.

  5. On the Complete the New Source Type : Step 2 page, click Finish.

Now, create the NTFS source on a Linux or UNIX platform as described in the following section.

Creating an NTFS Source for Linux and Unix Platforms

To create an NTFS source on a Linux or UNIX platform:

  1. On the home page, select the Sources secondary tab.

  2. On the Sources page, select the NTFS source type and click Create.

  3. Complete the Create User-Defined Source page. Table 8-2 describes the parameters.

  4. Click Create or Create & Customize.

Table 8-2 NTFS Source Parameters for Linux and UNIX Platforms

Parameter                                    Description
UNC Path                                     UNC path for the NTFS system to crawl; for example, \\MYSERVER\mysharedfolder
WebService Endpoint                          Target end point (HTTP or HTTPS); for example, https://mail.example.com/NTFSWebService/NTFSWebService.asmx
WebService User Name                         User name to authenticate the NTFS WebService for the Endpoint.
WebServicePassword                           Password for User Name.
Simple Include                               To limit crawling, specify up to 50 colon-delimited (:) path inclusion boundary rules using simplified regular expressions. An inclusion rule specifies that a URL must contain, start with, or end with a term. Only the *, ^, and $ operators are permitted. An asterisk (*) is a wildcard, a caret (^) denotes the beginning of a URL, and a dollar sign ($) denotes the end of a URL. For example: ^https://*.oracle.com/.jpg$
Simple Exclude                               To limit crawling, specify up to 50 colon-delimited (:) path exclusion boundary rules using simplified regular expressions. Only the *, ^, and $ operators are permitted.
Regular Expression Include                   To limit crawling, specify up to 50 colon-delimited (:) path inclusion boundary rules using regular expression rules (full java.util.regex syntax). For example: ^https://.*\.oracle(?:corp){0,1}\.com
Regular Expression Exclude                   To limit crawling, specify up to 50 colon-delimited (:) path exclusion boundary rules using regular expression rules (full java.util.regex syntax).
ACL Validation Attribute                     ACL attribute used to validate the user. Enter USER_NAME for Active Directory or nickname for Oracle Internet Directory.
Domain Name                                  Domain name of the URL (UNC Path).
Incremental Crawl With File Change Detector  Enter true to use the File Change Detector, or false to use scan-based incremental crawl. See "Installing Oracle Search File Change Detector".


After crawling an NTFS source, you may get a "No User Found Matching the Criteria" error message on the Home - Schedules - Data Synchronization page. If this error accompanies a crawl failure, then check that the principal is a valid user or group.

Installing and Configuring Windows Services

NTFS sources on Linux and UNIX platforms require an NTFS agent to be installed and configured on the Windows domain where the NTFS files are to be crawled. The NTFS agent collects and sends content and metadata to the crawler plug-in on the Oracle SES computer in a crawl session. The communication protocol between Oracle SES and the NTFS agent is HTTP or HTTPS.

The NTFS agent must be installed on a Windows computer where IIS is present, and the computer must be in the same Windows domain where the NTFS file share to be crawled resides.

Typically, a remote file share is crawled with the permission of a domain administrator or a domain user with read privileges on the file share. The easiest way to configure this is to add the domain admin group to the 'administrators' group of the target computer.

The Oracle SES instance must connect to the same Active Directory instance that the Microsoft NTFS domain connects to.

Required Software

Microsoft .NET Framework 2.0

Internet Information Services (IIS) Manager

Required Tasks

Verify that Microsoft .NET 2.0 Framework is installed. If it is not, then download and install it from this site:

http://www.microsoft.com/en-us/download/default.aspx

Installing Oracle Search File Change Detector

By installing and configuring the Oracle Search File Change Detector service, you can realize significantly improved performance in incremental crawls. This service provides the crawler with a list of documents that are modified or deleted. This method is more efficient than scanning all files for changes.

The older, scan-based incremental crawl is still available and can be used when File Change Detector cannot be deployed on your NTFS system or under the conditions listed in "Configuring the NTFS Connector".

The following procedure installs File Change Detector in the Microsoft .NET Framework.

To install Oracle Search File Change Detector: 

  1. Copy OracleSearchFileChangeDetector.zip from ses_home/search/lib/plugins/ntfs to the Windows server where Internet Information Services (IIS) is running.

  2. Unzip the contents of OracleSearchFileChangeDetector.zip file to a folder. This zip file contains two files:

    • OracleSearchFileChangeDetector.exe

    • OracleSearchFileChangeDetector.exe.config

  3. Open OracleSearchFileChangeDetector.exe.config in a text editor and modify the configuration settings as necessary. The settings are described in "Modifying the File Change Detector Configuration File".

  4. Open a command prompt window and navigate to the folder for .NET Framework Version 2.0. For example:

    C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727

  5. Install the OracleSearchFileChangeDetector service by issuing a command like the following, where path is the directory containing the configuration file:

    InstallUtil path\OracleSearchFileChangeDetector.exe
    

    For example:

    installutil d:\OracleSearchFileChangeDetector\OracleSearchFileChangeDetector.exe
    

    The Set Service Login dialog box is displayed.

  6. Enter the user credentials for the domain user identified in the ASP.NET Configuration Settings dialog box. For Username, use the format domain\username.

  7. Open the Windows Services utility and start the OracleSearchFileChangeDetector service.

  8. Install the NTFS Web service as described in "Installing the NTFS Web Service".

Modifying the File Change Detector Configuration File

The OracleSearchFileChangeDetector.exe.config file is the XML configuration file for the File Change Detector. When you add new sources, this file is automatically updated with the UNC path of the sources. However, if you make changes to the path of an already existing source, then you must restart File Change Detector for the new path to be watched.

Example 8-1 shows a sample configuration file.

Example 8-1 Oracle Search File Change Detector Configuration File

<?xml version="1.0" encoding="utf-8" ?>
<configuration>
   <configSections>
      <section name="StartupFolders"
         type="FileChangeDetector.StartupFoldersConfigSection,
         OracleSearchFileChangeDetector"/>
   </configSections>
   <StartupFolders>
      <DefaultInternalBufferSizeValue>
         <add internalBufferSize="32768" />
      </DefaultInternalBufferSizeValue>
      <Folders>
         <add sourceName="NTFS1" path="10.255.255.255\writeHere"/>
         <add sourceName="NTFS2" path="10.255.255.255\Work"
            internalBufferSize="40960"/>
      </Folders>
      <Results>
         <add directory="C:\NTFS\Data" />
      </Results>
      <SESBufferSizeValue>
         <add sesBufferSize="1" />
      </SESBufferSizeValue>
   </StartupFolders>
</configuration>

The XML elements are described in the following topics.

DefaultInternalBufferSizeValue

Oracle Search File Change Detector uses a Windows API to capture file update events. The API uses an internal buffer to cache events. The buffer size is specified in the internalBufferSize parameter of the nested add element:

<DefaultInternalBufferSizeValue>
   <add internalBufferSize="n" />
</DefaultInternalBufferSizeValue>

The internalBufferSize parameter specifies the default buffer size for all folders that the File Change Detector monitors, as specified in the Folders element.

The internal buffer is allocated from non-paged memory, which cannot be swapped to disk. Therefore, keep the value of internalBufferSize as small as possible. Increase the value only for frequent, highly concurrent updates (more than 100 changes per second).

Folders

This element specifies the list of directories to be watched. Create one nested add element for each NTFS source:

<Folders>
   <add sourceName="name" path="path"/>
   <add sourceName="name" path="path" internalBufferSize="n"/>
</Folders>

The nested add element has these attributes:

  • sourceName: A unique name within the configuration file to identify the NTFS source. (Required)

  • path: The UNC path specified in the NTFS source configuration. (Required)

    To specify multiple UNC paths, use a colon as the delimiter. For example:

    <add sourceName="ntfstest" path="\\server1\share1\Folder:\\server2\share1"/>
    
  • internalBufferSize: A value that overrides DefaultInternalBufferSizeValue for a source where extensive changes are expected. (Optional)
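
The following Python sketch shows one way to read the Folders element outside the service, for example to review which UNC paths are watched and which buffer size applies to each source. It is illustrative only and is not part of Oracle SES. The function name watched_folders and the server names in the sample are hypothetical; the element and attribute names come from Example 8-1.

```python
import xml.etree.ElementTree as ET

# Sample fragment modeled on Example 8-1. Server and share names are placeholders.
CONFIG = r"""
<StartupFolders>
   <DefaultInternalBufferSizeValue>
      <add internalBufferSize="32768" />
   </DefaultInternalBufferSizeValue>
   <Folders>
      <add sourceName="NTFS1" path="\\server1\share1\Folder:\\server2\share1"/>
      <add sourceName="NTFS2" path="\\server3\Work" internalBufferSize="40960"/>
   </Folders>
</StartupFolders>
"""


def watched_folders(xml_text):
    """Return (sourceName, list of UNC paths, effective buffer size) per source."""
    root = ET.fromstring(xml_text)
    default_size = int(
        root.find("DefaultInternalBufferSizeValue/add").get("internalBufferSize")
    )
    result = []
    for entry in root.findall("Folders/add"):
        # Multiple UNC paths in one path attribute are separated by a colon.
        paths = entry.get("path").split(":")
        # A per-source internalBufferSize overrides the default value.
        size = int(entry.get("internalBufferSize", default_size))
        result.append((entry.get("sourceName"), paths, size))
    return result


for name, paths, size in watched_folders(CONFIG):
    print(name, paths, size)
```

Because each watched source allocates its own buffer from non-paged memory, a listing like this can help you spot sources whose override is larger than necessary.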

Results

Specifies the folder where the Oracle Search File Change Detector logs the changes. The value must be the same as the IncrementalCrawlData property in the Web service configuration.

<Results>
   <add directory="path" />
</Results>

SESBufferSizeValue

This element specifies the number of events cached in an internal buffer by the OracleSearchFileChangeDetector service before writing them to the log file. For example, a value of 1 indicates that every event is written immediately to the log file, while a value of 10 means that 10 events are cached before writing them to the log file.

Increase the value of the sesBufferSize parameter when capturing changes in folders where you expect extensive changes. However, the larger the buffer size is, the less up-to-date the changes in the log file are, because updates are less frequent. A reasonable value is the average number of concurrent updates to the crawled folders.

<SESBufferSizeValue>
   <add sesBufferSize="n" />
</SESBufferSizeValue>
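
The trade-off between buffer size and log freshness can be seen in a small Python model. This is a simplified sketch of the buffering behavior described above, not the implementation of the OracleSearchFileChangeDetector service. The class BufferedEventLog and its members are hypothetical, and a Python list stands in for the log file.

```python
class BufferedEventLog:
    """Illustrative model: cache events and flush after every N events."""

    def __init__(self, ses_buffer_size):
        if ses_buffer_size < 1:
            raise ValueError("ses_buffer_size must be at least 1")
        self.ses_buffer_size = ses_buffer_size
        self.pending = []  # events cached in memory, not yet in the log
        self.log = []      # stands in for the change log file
        self.flushes = 0   # number of writes to the log

    def record(self, event):
        self.pending.append(event)
        # Write to the log only when the buffer is full.
        if len(self.pending) >= self.ses_buffer_size:
            self.log.extend(self.pending)
            self.pending.clear()
            self.flushes += 1


immediate = BufferedEventLog(1)   # like sesBufferSize="1"
batched = BufferedEventLog(10)    # like sesBufferSize="10"
for i in range(25):
    immediate.record(f"change-{i}")
    batched.record(f"change-{i}")

print(len(immediate.log), immediate.flushes)                    # 25 25
print(len(batched.log), len(batched.pending), batched.flushes)  # 20 5 2
```

In this model, the larger buffer writes to the log far less often, but five of the 25 changes are still waiting in memory and are not yet visible in the log.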

Installing the NTFS Web Service

Install this service after you install Oracle Search File Change Detector, as described in "Installing Oracle Search File Change Detector".

To install the NTFS Web service on IIS 6 on Windows 2003 Server: 

  1. Copy NTFSWebService.zip from ses_home/search/lib/plugins/ntfs to the Windows server where Internet Information Services (IIS) is running.

  2. Unzip the files in NTFSWebService.zip into a permanent folder.

  3. Create a virtual directory on the Internet Information Server with the path pointing to the folder created in the previous step.

    1. Select Administrative Tools from the Windows Start menu, then select Internet Information Services (IIS) Manager.

    2. Expand the navigator in IIS Manager and right-click a Web site.

    3. Select New, then Virtual Directory.

    4. Follow the steps of the Virtual Directory Creation wizard.

  4. On the Virtual Directory Access Permissions page of the wizard, select Read and Run Scripts (such as ASP).

  5. Open NTFSWebService Properties.

  6. On the ASP.NET tab, verify that ASP.NET is version 2.0.

  7. On the General tab, enter the settings described in Table 8-3.

  8. On the Application tab, select Local Impersonation and enter the user credentials in the form domain\username.

    The application user must have these permissions:

    • Read on the NTFS Web Service physical directory

    • Read on the file share to be crawled.

    • Write on the C:\WINDOWS\Microsoft.NET\Framework\version\Temporary ASP.NET Files folder.

      If the application user does not have access to this directory, then the Web service cannot load the required DLLs and signals the following error when it tries to access the Web service:

      Server Error in '/NTFSWS683343' Application
      Could not load file or assembly 'WEBSESNTFS' or one of its dependencies. Access is denied.
      

To install the NTFS Web service on IIS 7 on Windows 2008 Server: 

  1. Copy NTFSWebService.zip from ses_home/search/lib/plugins/ntfs to the Windows server where Internet Information Services (IIS) is running.

  2. Unzip the files in NTFSWebService.zip to a directory.

  3. Create a virtual directory on the Internet Information Server with the path pointing to the folder created in the previous step.

    1. Select Administrative Tools from the Windows Start menu, then select Roles > Web Server (IIS) > Internet Information Server.

    2. Expand the navigator in IIS Manager, expand Sites, and right-click a Web site.

    3. Select Add Virtual Directory.

    4. Follow the steps of the Virtual Directory Creation wizard.

  4. Convert the virtual directory to an application by right-clicking it, and then clicking Convert to Application.

  5. Click the virtual directory, and in the rightmost pane, double-click Application Settings.

  6. Enter the settings described in Table 8-3.

  7. Navigate to the directory where the files in NTFSWebService.zip were extracted as described in Step 2.

  8. Open the web.config file in a text editor and update the configuration settings in it. The application user must have the following permissions:

    • Read on the NTFS Web Service physical directory

    • Read on the file share to be crawled.

    • Write on the C:\WINDOWS\Microsoft.NET\Framework\version\Temporary ASP.NET Files folder.

      If the application user does not have access to this directory, then the Web service cannot load the required DLLs and signals the following error when it tries to access the Web service:

      Server Error in '/NTFSWS683343' Application
      Could not load file or assembly 'WEBSESNTFS' or one of its dependencies. Access is denied.
      

Table 8-3 ASP.NET Configuration Settings

Parameter Description

ServiceUsername

User name that authenticates Oracle SES to the NTFS Web service. You also enter this user name when creating the NTFS source. Oracle SES cannot access the Web service without the service username and password.

ServicePassword

Password for ServiceUsername. Ensure that this password is kept secure.

Batchsize

Determines the number of file URLs fetched for a Web service response. The NTFS connector processes a folder by fetching all the files in the folder.

FileChunkSize

Positive integer that specifies the chunk size. Large documents are sent to the NTFS connector in chunks of this size. For example, 1024000 divides the file into chunks of approximately 1 MB for sending over the Web.

Choose a chunk size that transfers efficiently over your network.

IncrementalCrawlData

Path of the Results directory as specified in the Oracle Search File Change Detector configuration file. See "Modifying the File Change Detector Configuration File".

Choose the Application tab and impersonate a user that has read permission on the shared folder.
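
Because IncrementalCrawlData must match the Results directory of the File Change Detector, a quick consistency check can catch a mismatch before the first incremental crawl. The following Python sketch is illustrative only. It assumes that the Web service settings are stored as standard ASP.NET appSettings entries in web.config, which may differ in your installation. The function names and sample values are hypothetical.

```python
import ntpath
import xml.etree.ElementTree as ET

# Fragment of OracleSearchFileChangeDetector.exe.config (see Example 8-1).
DETECTOR_CONFIG = r"""
<configuration>
   <StartupFolders>
      <Results>
         <add directory="C:\NTFS\Data" />
      </Results>
   </StartupFolders>
</configuration>
"""

# Assumed layout: a standard ASP.NET appSettings entry in web.config.
WEB_CONFIG = r"""
<configuration>
   <appSettings>
      <add key="IncrementalCrawlData" value="c:\ntfs\data" />
   </appSettings>
</configuration>
"""


def results_dir(detector_xml):
    """Return the Results directory from the detector configuration."""
    root = ET.fromstring(detector_xml)
    return root.find("StartupFolders/Results/add").get("directory")


def incremental_crawl_data(web_xml):
    """Return the IncrementalCrawlData value, or None if it is not set."""
    for entry in ET.fromstring(web_xml).findall("appSettings/add"):
        if entry.get("key") == "IncrementalCrawlData":
            return entry.get("value")
    return None


def same_windows_path(a, b):
    """Compare two Windows paths, ignoring case and redundant separators."""
    def normalize(path):
        return ntpath.normcase(ntpath.normpath(path))
    return normalize(a) == normalize(b)


print(same_windows_path(results_dir(DETECTOR_CONFIG),
                        incremental_crawl_data(WEB_CONFIG)))
```

The comparison uses the ntpath module so that the check behaves the same way when it is run on a Linux or UNIX computer.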


To verify that the NTFS Web service is installed correctly on IIS 6 on Windows 2003 Server:  

  1. Open Internet Information Services (IIS) Manager.

  2. In the navigation tree, select NTFSWebService to display its contents in the right pane.

  3. Right-click NTFSWebService.asmx and choose Browse.

  4. Ensure that the Web service methods described in Table 8-4 are listed.

To verify that the NTFS Web service is installed correctly on IIS 7 on Windows 2008 Server:  

  1. Click the virtual directory that was created on the Internet Information Server as described in Step 3 of "To install the NTFS Web service on IIS 7 on Windows 2008 Server:".

  2. Click Content View at the bottom of the rightmost pane.

  3. Right-click NTFSWebService.asmx and choose Browse.

  4. Ensure that the Web service methods described in Table 8-4 are listed.

Table 8-4 NTFS Web Service Methods

Method Description

ClearFCDLog

Clears the current Oracle Search File Change Detector log.

ClearPreviousFCDLog

Clears the previous Oracle Search File Change Detector log.

GetDFList

Gets all the files and subfolders in a specified folder.

GetDocContainer

Gets the file with its access URL, display URL, and encoded content. It also gets the ACL and the attributes of the file.

GetFileInParts

Gets the file after breaking it into chunks. The FileChunkSize parameter controls the chunk size.

GetMinimalMetadata

Fetches the ACL for the document and the last modified date of the file to determine whether the file has changed.

GetModifiedURLs

Gets a list of modified files and folders from the Oracle Search File Change Detector.
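
The chunked transfer that GetFileInParts performs under the control of FileChunkSize can be pictured with a short Python sketch. This is a generic illustration of splitting a byte stream into fixed-size chunks, not the Web service code. The function iter_chunks and the 2,500,000-byte sample document are hypothetical, and the sketch assumes that the chunk size is counted in bytes.

```python
import io


def iter_chunks(stream, chunk_size):
    """Yield successive chunks of at most chunk_size bytes from a binary stream."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be a positive integer")
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk


FILE_CHUNK_SIZE = 1024000                     # example value from Table 8-3
document = io.BytesIO(b"x" * 2_500_000)       # stand-in for a large file
sizes = [len(chunk) for chunk in iter_chunks(document, FILE_CHUNK_SIZE)]
print(sizes)  # [1024000, 1024000, 452000]
```

Only the last chunk is smaller than the configured size, so the number of round trips for a file is its size divided by the chunk size, rounded up.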


Configuring the NTFS Connector

The NTFS connector must be configured to perform incremental crawls with the Oracle Search File Change Detector. The connector provides the Incremental crawl with the File Change Detector parameter for this purpose.

To configure the NTFS connector: 

  1. Open the Oracle SES Administration GUI, and select the Sources secondary tab.

  2. Create or edit the NTFS connector.

  3. Set the Incremental crawl with the File Change Detector parameter to true.

When the Incremental crawl with the File Change Detector parameter is set to true, the NTFS connector performs the incremental crawl using the detector change logs. It reverts automatically to a scan-based incremental crawl under these conditions:

  • The Oracle Search File Change Detector service is stopped.

  • The Oracle Search File Change Detector service is started after the previous crawl start time. Scan-based incremental crawl is performed because some changes in the NTFS system might not be captured by the File Change Detector.

  • The internal buffer of the File Change Detector overflowed. When the buffer overflows, the file change detector might not capture some changes.

To revert manually to a scan-based incremental crawl, set the Incremental crawl with the File Change Detector parameter to false.
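
The fallback rules above can be summarized as a small decision function. The following Python sketch only restates the documented conditions. It is not the connector's code, and the function name, parameter names, and return values are hypothetical.

```python
from datetime import datetime


def incremental_crawl_mode(use_detector, service_running, service_started_at,
                           previous_crawl_started_at, buffer_overflowed):
    """Return "change-log" or "scan" according to the documented conditions."""
    if not use_detector:
        return "scan"   # parameter set to false: manual fallback
    if not service_running:
        return "scan"   # the detector service is stopped
    if service_started_at > previous_crawl_started_at:
        return "scan"   # changes before the service started may be missing
    if buffer_overflowed:
        return "scan"   # some events may have been dropped
    return "change-log"


previous_crawl = datetime(2024, 1, 10, 2, 0)
print(incremental_crawl_mode(True, True, datetime(2024, 1, 1), previous_crawl, False))
print(incremental_crawl_mode(True, True, datetime(2024, 1, 11), previous_crawl, False))
```

The first call returns change-log because the service was running before the previous crawl started. The second returns scan because the service was started afterward.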

Known Issues

  • The Oracle Search File Change Detector does not capture changes to the top-level directories used in the crawler configuration (UNC Path). Changes to other directories within these folders are detected correctly.

  • Changes to the source configuration, such as boundary rules and maximum file size, do not affect incremental crawls. For these changes to take effect, run a scan-based incremental crawl by setting the Incremental crawl with the File Change Detector parameter to false.

  • File Change Detector hangs after the Windows Server Active Directory is restarted. You must manually restart the File Change Detector service whenever Active Directory is restarted.

Setting Up Oracle Calendar Sources

Oracle recommends creating one source group for archived calendar data and another source group for active calendar data. One instance for the archived source can run less frequently, such as every week or month. This source should cover all history. A separate instance for the active source can run daily for only the most recent period.

Setting Up Identity Management for Oracle Calendar

The Oracle SES instance and the Oracle Calendar instance must be connected to the same Oracle Internet Directory system.

To set up a secure Oracle Calendar source: 

  1. On the Global Settings - Identity Management Setup page in the Oracle SES Administration GUI, select the Oracle Internet Directory identity plug-in manager, and click Activate.

  2. Use the following command to load the LDIF file that creates an application entity for the plug-in. An application entity is a data structure within LDAP used to represent and keep track of software applications accessing the directory with an LDAP client.

    oracle_home/bin/ldapmodify -h oidHost -p OIDPortNumber -D "cn=orcladmin" -w password -f ses_home/search/config/ldif/calPlugin.ldif
    

    This string defines the entity that is used for the plug-in: orclapplicationcommonname=ocscalplugin,cn=oses,cn=products,cn=oraclecontext. The entity has the password welcome1.

Creating an Oracle Calendar Source

Create an Oracle Calendar source on the Home - Sources page. Select Oracle Calendar from the Source Type list, and click Create. Enter values for the parameters described in Table 8-5.

Table 8-5 Calendar Source Parameters

Parameter Value

Calendar server

http://hostname:port

Application entity name

orclapplicationcommonname=ocscalplugin,cn=oses,cn=products,cn=oraclecontext

Application entity password

welcome1

OID server hostname

Oracle Internet Directory hostname

OID server port

389

OID server SSL port

636

OID server ldapbase

dc=us,dc=oracle,dc=com

OID login attribute

uid

User query

(objectclass=ctCalUser)

Past days

30

Future days

60

Rollover

true

Calendar server for Display URL

Calendar endpoint URL to be used to formulate the display URL; for example, http://calendarserver:7777.

If this parameter is blank, then the value provided for the Calendar server parameter is used to formulate the display URL.
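
As a rough illustration of the Past days and Future days values in Table 8-5, the following Python sketch computes a date window around the crawl date. It assumes that the two parameters bound the range of calendar entries relative to the day of the crawl; this interpretation is an assumption and is not stated in the table. The function calendar_window is hypothetical.

```python
from datetime import date, timedelta


def calendar_window(crawl_date, past_days, future_days):
    """Return the (start, end) dates covered, under the stated assumption."""
    start = crawl_date - timedelta(days=past_days)
    end = crawl_date + timedelta(days=future_days)
    return start, end


start, end = calendar_window(date(2024, 3, 31), past_days=30, future_days=60)
print(start, end)  # 2024-03-01 2024-05-30
```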


Oracle Calendar Attributes

  • Description

  • Priority

  • Status

  • start date

  • end date

  • event Type

  • Author

  • Created Date

  • Title

  • Location

  • Dial_info

  • ConferenceID

  • ConferenceKey

  • Duration

Setting Up Oracle Collaboration Suite E-Mail Sources

Oracle Collaboration Suite 10g Mail (Oracle Mail) implements the IMAP protocol, which Oracle SES uses to retrieve data. You must log in to the mail server with a user name and password to retrieve information. The Oracle Collaboration Suite mail server has a flag that allows the administrator to access the mail of all users. The IMAP connector uses this feature to crawl the mail of all users with the mail server's admin login.

Important Notes for Oracle Collaboration Suite E-Mail Sources

In addition to private folders, Oracle Collaboration Suite E-Mail has shared folders. Any folder can be shared with another user. Therefore, during ACL stamping, the crawler must check whether a message belongs to a private folder or a shared folder and act accordingly.

The Oracle Collaboration Suite E-Mail has a Web interface to open mail. This same Web interface opens the searched mails from Oracle SES.

Required Tasks

For the e-mail admin to crawl data, set this parameter:

Go to Farm - Midtier - Mail Application - IMAP Server - Default Settings, and set Allow Admin to Access Any Account to true.

Setting Up Identity Management for Oracle Collaboration Suite E-Mail Sources

Activate the identity plug-in on the Global Settings - Identity Management Setup page. Select Oracle Internet Directory identity plug-in and click Activate.

Enter values for the following parameters: 

  • Authentication Attribute: Select nickname.

  • Host name: Enter the host name of the computer where Oracle Internet Directory is running.

  • Port: Enter the value 389, which is the default LDAP port number.

  • Use SSL: Enter true or false.

  • Realm: Enter the Oracle Internet Directory realm; for example, dc=us,dc=oracle,dc=com.

  • User name: Enter the Oracle Internet Directory administrator user name; for example, cn=orcladmin.

  • Password: Enter the password for the user name.

Creating an Oracle Collaboration Suite E-Mail Source

Create an Oracle Collaboration Suite E-Mail source on the Home - Sources page. Select Oracle Collaboration Suite E-Mail from the Source Type list, and click Create.

Enter values for the following parameters: 

  • Email Server Address: The IP address or DNS name of the IMAP e-mail server to be crawled, with the port number. This also specifies if the e-mail server follows IMAP or IMAPS protocol. Required.

    Use the format:

    [imap | imaps]://IPaddress:portNumber

    An exception is thrown if this parameter is null. If the server address is incorrect, then an exception is logged at the time of accessing the server.

  • Email Server Admin User: The administration user name to access the e-mail server. Required.

  • Email Server Admin Password: The password of the e-mail admin user. Required.

  • Remove Deleted messages from Index: Indicates whether to remove deleted messages from the index during incremental recrawls. Valid values are yes and no. Any other value is considered to be yes.

  • Authentication Attribute: Attribute used to validate the user. This varies based on the identity plug-in used for authentication. Oracle Collaboration Suite E-Mail uses Oracle Internet Directory for authentication, so set this parameter to mail.

  • LDAP Server: The IP address or DNS name of the LDAP server.

  • LDAP Server Port: The LDAP server port number.

  • LDAP Admin User Name: The administrator user name of the LDAP server. Required.

  • LDAP Admin Password: The password of the admin user of the LDAP server.

  • LDAP Base: The domain to be searched; for example, dc=oracle,dc=com.

  • LDAP Query: The query string defining the users whose e-mails must be crawled. This parameter is used for user-level partitioning.

    For example, to crawl only users with names beginning with A and having an e-mail in the domain us.example.com, the query is (&(cn=A*)(mail=*@us.example.com)).

  • Days from which crawling needs to be done: The number of days in the past from which the crawling is done. The current date (time of crawl) is the base. For example, a value of 200 specifies crawling messages with dates that are 200 or fewer days old. All mail is the default value.

  • Days to which the crawling needs to be done: Specifies the number of days in the past to which the crawling must be done. The current date (time of crawl) is the base. For example, a value of 7 specifies crawling messages that are seven or more days old. Today is the default value.

  • Display URL template: The display URL to be used for viewing the documents. This should have the placeholder for e-mail or user ID. For example, to see the full e-mail address in the display URL, enter the following:

    http://<>/um/templates/message_list.uix?state=message_list&cAction=openmessage&message_wmuid=$EMAIL

    To see the user ID, enter the following:

    http://<>/um/templates/message_list.uix?state=message_list&cAction=openmessage&message_wmuid=$UID

  • Email Server Version: The version of the e-mail server to be crawled. Valid values are ocs10g and beehive.

  • Folders to crawl: The comma-delimited list of folders to be crawled. '*' means crawl all folders. Other valid values are INBOX, sent, and trash. This does not support regular expressions.

  • Folders not to crawl: The comma-delimited list of folders not to be crawled. This list is considered only if the Folders to crawl parameter has the * wildcard as its value. Valid values are INBOX, sent, and trash. This parameter does not support regular expressions.

  • Revisit Skipped Attachments: Controls whether the crawler revisits attachments that were skipped in earlier crawls because they did not meet the document type inclusion rules. This setting provides an alternative to a force recrawl after changing the document type inclusion rules. Set to TRUE to revisit skipped attachments, or set to FALSE otherwise (default).

  • Remove out-of-window messages: Set to true to remove crawled messages that are outside the current time window. Otherwise, set to false. Default is false.

  • Incremental Crawl Mode: Mode for the incremental crawl. The valid values are New, Maintenance, and Auto. Set to New to process only the new messages. Set to Maintenance to process only the deleted and moved messages. Set to Auto to let the crawler choose the right mode of incremental crawl for the crawling operation. Default is New.

  • Maintenance Time Window: The timeframe to start Maintenance mode of incremental crawl in the Auto recrawl mode.
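
The time window defined by the Days from which crawling needs to be done and Days to which the crawling needs to be done parameters can be expressed as a simple age test. The following Python sketch only restates the parameter descriptions above and is not the connector's code. The function in_crawl_window is hypothetical; it models the All mail default as None and the Today default as 0.

```python
from datetime import date


def in_crawl_window(message_date, crawl_date, days_from=None, days_to=0):
    """Return True if the message age in days falls inside the crawl window."""
    age = (crawl_date - message_date).days
    if age < days_to:
        return False    # newer than the "days to" limit
    if days_from is not None and age > days_from:
        return False    # older than the "days from" limit
    return True


crawl_date = date(2024, 6, 30)
# With days_from=200 and days_to=7, only messages 7 to 200 days old qualify.
print(in_crawl_window(date(2024, 6, 25), crawl_date, 200, 7))  # False (5 days old)
print(in_crawl_window(date(2024, 6, 1), crawl_date, 200, 7))   # True (29 days old)
print(in_crawl_window(date(2023, 6, 30), crawl_date, 200, 7))  # False (366 days old)
```

With the default values, every message dated on or before the crawl date is inside the window.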