Oracle® Secure Enterprise Search Administrator's Guide 10g Release 1 (10.1.8.2) Part Number E10418-03
This chapter contains the following topics:
The EMC Documentum eRoom Server plug-in extends the searching capabilities of Oracle SES and enables it to search Documentum eRoom Server repositories. Oracle SES can crawl through the documents and related metadata in the Documentum eRoom and provide secure, full-text search. It also provides metadata search and browse functionality.
Documentum eRoom data is stored in an eRoom, which in turn can contain other containers and content. A Documentum eRoom Server instance can have one or more items that can be crawled using the Documentum eRoom Server plug-in by configuring parameters in Oracle SES. The Documentum eRoom Server plug-in navigates through all the containers and the inline contents to crawl all the documents and items in Documentum eRoom Server. It creates an index, stores the metadata, and accesses information in Oracle SES to provide search according to the end user permissions.
The Documentum eRoom Server plug-in supports incremental crawling; that is, it crawls and indexes only those documents which have changed after the most recent crawling was performed. A document is re-crawled if either the content or metadata or the direct security access information of the document has changed. A document is also re-crawled if it is moved within Documentum eRoom Server and the end user has to access the same document with a different URL. Documents deleted from items will be removed from the index during incremental crawling.
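The re-crawl rules above can be pictured as a comparison between the state recorded at the previous crawl and the state seen now. The following is a minimal sketch of that idea, not the plug-in's actual implementation; the `DocState` record, its field names, and the action names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DocState:
    """Hypothetical snapshot of one eRoom item as seen by a crawl."""
    url: str            # changes when the item is moved within eRoom
    content_hash: str
    metadata_hash: str
    acl_hash: str       # direct security access information


def crawl_action(previous: Optional[DocState], current: Optional[DocState]) -> str:
    """Return 'index', 'delete', 'recrawl', or 'skip' for one document."""
    if previous is None and current is None:
        raise ValueError("document unknown to both crawls")
    if previous is None:
        return "index"      # new since the last crawl
    if current is None:
        return "delete"     # deleted in eRoom, so remove it from the index
    if previous != current:
        return "recrawl"    # content, metadata, ACL, or URL changed
    return "skip"           # unchanged since the most recent crawl
```

Comparing whole records keeps the rule in one place: any field that differs, including the URL after a move, triggers a re-crawl.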
The Documentum eRoom application is a COM-based application. To interact with the crawler plug-in, a Web service has been created to fetch the data from eRoom (through eRoom APIs) and provide it to the crawler plug-in.
The admin account should be used by the eRoom crawler plug-in for crawling and indexing eRoom items.
The Documentum eRoom Server version must be 7.3.
The following platforms are supported by this release of Documentum eRoom Web Services:
Windows 2000/2003 Server
Microsoft Internet Information Server (IIS) 5.0 or higher
Documentum eRoom Server version 7.3 must be installed and configured
Oracle SES must be installed
Documentum eRoom Server Administrator
The server hosting eRoom must contain Windows .NET Framework 1.1.
The following tasks must be performed before installing the Documentum eRoom Server plug-in:
Oracle Internet Directory Identity Plug-in: Configure Oracle SES to the Oracle Internet Directory identity plug-in:
This task must be performed if the identity plug-in for Oracle Internet Directory is being used for authentication.
In the Oracle SES administration tool, navigate to the Global Settings - Identity Management Setup page. Select Oracle Internet Directory identity plug-in manager, and click Activate.
For Authentication Attribute, select 'nickname'.
For Host name, enter the host name of the computer where Oracle Internet Directory is running.
For Port, enter the value 389 (default LDAP port number).
For Use SSL, enter the appropriate value, either 'true' or 'false'.
For Realm, enter the Oracle Internet Directory realm, for example dc=us,dc=oracle,dc=com.
For User name, enter the Oracle Internet Directory administrator user name, for example cn=orcladmin.
For Password, enter the password for the user in User name.
Oracle Internet Directory Identity Plug-in: Synchronize users and groups from Oracle Internet Directory to eRoom:
Log in to eRoom Server and navigate to Community Setting.
On the right side, click Directories - Select add a Directory connection. For Name, enter a name for the LDAP Directory Connection. Select the LDAP Directory radio button. Click Next.
Enter the URLs for the LDAP directory you want to connect to. Provide the user name and password of LDAP Server. Click Next. For Search Root, specify dc=us,dc=oracle,dc=com.
For Search Filter, specify cn=*. Click Next.
Display the test query of connection information. Click Next.
Attribute Map information is displayed. Click Next.
Display the test Mapping. If these are correct, click OK.
Run the LDAP_Synchronization job: To synchronize a connection click synchronize all connection. Click OK.
Microsoft Active Directory Identity Plug-in: Configure Oracle SES to Active Directory Identity Plug-in:
This task must be performed if the identity plug-in for Active Directory is being used for authentication.
In the Oracle SES administration tool, navigate to the Global Settings - Identity Management Setup page. Select The Active Directory Identity Plug-in Manager implemented based on Oracle User & Role API, and click Activate.
For Authentication Attribute, select 'USER_NAME'.
For Directory URL, enter the host name and port number, for example 'ldap://ldapserverhost:port'.
For Directory account name, enter Active Directory User, for example 'Administrator'.
For Directory account password, enter the password for Directory account name.
For Directory subscriber, enter the Active Directory information (ldap base); for example, 'dc=us,dc=oracle,dc=com'.
For Directory security protocol, enter the appropriate value: 'none' or 'port number'.
Click Finish.
Microsoft Active Directory Identity Plug-in: Synchronize users and groups from Active Directory to eRoom:
Log in to eRoom Server and navigate to Community Setting.
On the right side, click Directories - Select add a Directory connection. For Name, enter a name for the LDAP Directory Connection. Select the LDAP Directory radio button. Click Next.
Enter the URLs for the LDAP directory you want to connect to. Provide the user name and password of the LDAP server. Click Next. For Search Root, specify dc=us,dc=oracle,dc=com.
For Search Filter, specify cn=*. Click Next.
Display the test query of connection information. Click Next.
Attribute Map information is displayed. Click Next.
Display the test Mapping. If these are correct, click OK.
Run the LDAP_Synchronization job: To synchronize a connection, click synchronize all connection. Click OK.
Set up the eRoom Web Service:
Check the pre-installation requisites before proceeding.
Navigate to the $ORACLE_HOME/search/lib/plugins/eroom folder. Unzip EroomServices.zip to any temporary folder on the computer where the IIS instance for eRoom is installed.
Run Setup.Exe to install the Web service on the server that is hosting eRoom. Provide a name for the virtual directory to be created. This name will be required when entering the URL for Web Services parameter in Oracle SES.
Verify that the Web service is installed by checking the following URL:
http://<iis server IP/host>/<virtual directory name>
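The verification step above can also be scripted. The sketch below builds the URL from the host and the virtual directory name chosen during setup and then requests it. The function names are hypothetical, and the check assumes that any HTTP status below 400 means the Web service is installed.

```python
from urllib.parse import quote
from urllib.request import urlopen


def web_service_url(host: str, virtual_directory: str) -> str:
    """Build http://<iis server IP/host>/<virtual directory name>."""
    host = host.strip().rstrip("/")
    directory = virtual_directory.strip().strip("/")
    if not host or not directory:
        raise ValueError("host and virtual directory name are both required")
    return f"http://{host}/{quote(directory)}"


def is_reachable(url: str, timeout: float = 10.0) -> bool:
    """Return True if the URL answers with an HTTP status below 400."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status < 400
    except OSError:  # URLError and HTTPError are subclasses of OSError
        return False
```

For example, `is_reachable(web_service_url("10.113.10.82", "EroomServices"))` checks the address used as an example in the URL for Web Services parameter description.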
Create a source for the user-defined eRoom source type on the Home - Sources page. Enter a source name. Provide values for the following parameters.
Container name: The names of the containers to be crawled by Oracle SES. You can crawl the entire Site, Community, Facility, or eRoom item. The format for specifying container is as follows:
<siteName> OR <siteName>/<communityName> OR <siteName>/<communityName>/<FacilityName> OR <siteName>/<communityName>/<FacilityName>/<eRoomName>
This is a required parameter. For example:
Container name:OracleSite/OracleCommunity/OracleFacility/OracleRoom
This means OracleRoom will be crawled.
Attribute list: The comma-delimited list of eRoom custom attributes along with their data types to be searchable. The format is <attribute name:attribute type>, <attribute name:attribute type>. Valid values are String, Number, and Date.
While crawling eRoom, an attribute is indexed only if both name and type match the configured name and type; otherwise, it is ignored. This is an optional field. For example, to make the following eRoom attributes searchable:
Attribute Name: Account Name Attribute Type: String
Attribute Name: Account ID Attribute Type: Integer
Attribute Name: Creation Date Attribute Type: Date
The value should be:
Account Name: String, Account ID: Number, Creation Date: Date
The default searchable attributes for Documentum eRoom Server are Modified Date, Title, Author, CreateDate, and MimeType.
User name: User name of a valid Documentum eRoom Server user. The user should be an administrator or a user who has access to all content, metadata, and ACL from all folders and documents of items configured in Container name. This is a required parameter.
Password: Password of the Documentum user configured previously. This is a required parameter.
Crawl versions: This field indicates whether multiple versions of documents should be crawled. Valid values are 'true' or 'false'. This is an optional parameter, and the default value is 'false'. If any other value is provided, it is assumed to be 'false' and only the latest versions of a document (files only) will be crawled.
URL for Web Services: A valid URL where the eRoom Web service has been installed (http://server/<Name of the virtual directory>). For example, http://10.113.10.82/EroomServices.
URL for viewing the documents: A valid IP address or host name with port number (<IP address:port>) of the server hosting Documentum eRoom. It is used for viewing the Oracle SES search results; for example, http://10.113.10.82/eRoom or http://10.113.10.82:7512/eRoom.
Authentication Attribute: Attribute used by the LDAP to validate the user. This varies based on the identity plug-in used for authentication. For Active Directory, it should be "USER_NAME".
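The Container name, Attribute list, and Crawl versions formats described above can be checked before the source is created. The following is a minimal sketch of such checks; the function names are hypothetical, and treating 'true' case-insensitively is an assumption, since the text only states that values other than 'true' are taken as 'false'.

```python
VALID_TYPES = {"String", "Number", "Date"}
LEVELS = ("site", "community", "facility", "eroom")


def parse_container_name(value: str) -> dict:
    """Split <siteName>[/<communityName>[/<FacilityName>[/<eRoomName>]]]."""
    parts = [p.strip() for p in value.split("/")]
    if not 1 <= len(parts) <= len(LEVELS) or any(not p for p in parts):
        raise ValueError(f"invalid container name: {value!r}")
    return dict(zip(LEVELS, parts))


def parse_attribute_list(value: str) -> dict:
    """Parse '<name>:<type>, <name>:<type>' into {name: type}."""
    attributes = {}
    for entry in filter(None, (e.strip() for e in value.split(","))):
        name, sep, type_name = entry.rpartition(":")
        name, type_name = name.strip(), type_name.strip()
        if not sep or not name or type_name not in VALID_TYPES:
            raise ValueError(f"invalid attribute entry: {entry!r}")
        attributes[name] = type_name
    return attributes


def parse_crawl_versions(value) -> bool:
    """Only 'true' enables version crawling; any other value means false."""
    return str(value).strip().lower() == "true"
```

An empty attribute list yields an empty dictionary, which matches the parameter being optional.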
Lotus Notes data is stored in notes-databases, which can be further contained inside directories on a server. A Lotus Domino Server instance can have one or more databases that can be crawled using the Lotus Notes source. The Lotus Notes source navigates through the databases to crawl the documents (for example, e-mail, calendar, address book, and "to do") in the specified databases. It stores the metadata, and accesses information in Oracle SES to provide search according to the end users' credentials.
The Lotus Notes connector now lets you enable or disable multiple attachment support with the Attachment as Search Item attribute. When this is disabled, the additional attributes Parent URL and Parent Title are added for all attachment documents, to link them with the parent document.
The Lotus Notes source supports incremental crawling; that is, it crawls and indexes only those documents that have changed since the most recent crawl was scheduled. A document is re-crawled if the content, metadata, display URL, or direct security access information of the document has changed. Documents deleted from a database will be removed from the index during incremental crawling.
To enable Oracle SES to launch the Notes thick client, set the Notes Thick Client parameter to true.
The user-account used to crawl Lotus Notes databases should preferably be an Administrator account, such that it has access on all databases and is able to retrieve and crawl all documents in the specified databases.
The following tasks must be performed before installing the Lotus Notes source:
HTTP and DIIOP tasks must be running on Domino Server.
If the Active Directory identity plug-in is used, then the users and user-groups in the Domino Directory must be synchronized with Active Directory. While using the Active Directory identity plug-in, the short-name in the Lotus Notes person document is used for validating the user in Active Directory, so it should be a resolvable logon name in Active Directory.
Configure the server document:
Open the server document on the Lotus Notes server that needs to be crawled.
On the Configuration page, expand the Server section.
On the Security page, in the Programmability Restrictions area, specify the appropriate security restrictions for your environment in the following fields:
Run restricted Lotus Script/Java agents
Run restricted Java/Javascript/COM
Run unrestricted Java/Javascript/COM
For example, you might specify an asterisk (*) to allow unrestricted access by Lotus Script/Java agents, and specify user names that are registered in the Domino Directory for the Java/Javascript/COM restrictions.
Note:
The crawler that you configure to crawl this server with the DIIOP protocol must be able to use the user names that you specify in these fields.
Open the Internet Protocol page, then open the HTTP page, and set the Allow HTTP Clients to Browse Database option to Yes.
Configure the user document:
Open the user document on the Lotus Notes server that needs to be crawled. This document is stored in the Domino directory.
On the Basics page, for Internet password, specify a password.
Restart the DIIOP task on the server.
Copy the Lotus Notes/Domino jar files to the following directories. This must be done before activating the Lotus Notes identity plug-in.
For Lotus Notes release 5.0:
$ORACLE_HOME/search/lib/plugins/ln/ Notes.jar NCSO.jar
$ORACLE_HOME/search/lib/plugins/identity/ln/ Notes.jar NCSOW.jar
For Lotus Notes release 6.5 and 7.0:
$ORACLE_HOME/search/lib/plugins/ln/ NCSO.jar Notes.jar
$ORACLE_HOME/search/lib/plugins/identity/ln/ NCSO.jar Notes.jar
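Before activating the identity plug-in, it can help to confirm that the jar files listed above are in place. The sketch below reports any that are missing. The directory and file names are copied from the lists above; the `REQUIRED_JARS` table and the `missing_jars` function are hypothetical helpers, not part of Oracle SES.

```python
from pathlib import Path

# Directory (relative to $ORACLE_HOME) -> jar files, as listed above.
_RELEASE_5 = {
    "search/lib/plugins/ln": ["Notes.jar", "NCSO.jar"],
    "search/lib/plugins/identity/ln": ["Notes.jar", "NCSOW.jar"],
}
_RELEASE_6_AND_7 = {
    "search/lib/plugins/ln": ["NCSO.jar", "Notes.jar"],
    "search/lib/plugins/identity/ln": ["NCSO.jar", "Notes.jar"],
}
REQUIRED_JARS = {"5.0": _RELEASE_5, "6.5": _RELEASE_6_AND_7, "7.0": _RELEASE_6_AND_7}


def missing_jars(oracle_home: str, release: str) -> list:
    """Return the expected jar paths that do not exist under oracle_home."""
    home = Path(oracle_home)
    return [
        home / directory / jar
        for directory, jars in REQUIRED_JARS[release].items()
        for jar in jars
        if not (home / directory / jar).is_file()
    ]
```

An empty result means every expected file was found for the given release.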
A Lotus Notes source does not index encrypted fields or the content of attachments in encrypted documents. For encrypted documents, the URL of the search result launches the Notes document rather than the attachment file; for non-encrypted documents, it launches the attachment file.
Oracle SES currently does not support crawling inside specific folders/views of the Notes custom-applications or mail-databases.
Deleted Notes documents and attachments in Notes documents are still searchable after an incremental crawl that was set by specifying 'Recrawl using last modified date' as true. To remove URLs from deleted documents or attachments from the Oracle SES index, either perform a force re-crawl (that is, change the re-crawl policy to Process All Documents on the Home - Schedules - Edit Schedule page) or mark the 'Recrawl using last modified date' source parameter as false.
Activate an identity plug-in on the Global Settings - Identity Management Setup page.
The users/groups on Active Directory can be synchronized with Lotus Domino Directory such that all users/groups in Active Directory get registered in Domino as well. Thus, any ACL entry in a notes database or notes document can be validated in Active Directory also, and vice versa.
Oracle SES also provides a Lotus Notes identity plug-in so the Lotus Domino Directory can be used to authenticate and validate the notes native users and groups in Oracle SES.
Activate the Lotus Notes identity plug-in with the following parameters:
Server name: The Domino server fully qualified host name/IP address. If the HTTP port on the Domino server is not 80, then the host name should be "<server-name>:<HTTP port number>".
User name: User name of a valid Lotus Domino Server user. This is a required parameter.
Password: Internet password of the Lotus Notes user. This is a required parameter.
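The Server name rule above (append the HTTP port only when it is not 80) is easy to get wrong by hand. A minimal sketch follows; `domino_server_name` is a hypothetical helper name.

```python
def domino_server_name(host: str, http_port: int = 80) -> str:
    """Return the host alone for port 80, otherwise '<server-name>:<HTTP port number>'."""
    host = host.strip()
    if not host:
        raise ValueError("server name is required")
    if not 1 <= http_port <= 65535:
        raise ValueError(f"invalid HTTP port: {http_port}")
    return host if http_port == 80 else f"{host}:{http_port}"
```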
Create a Lotus Notes source on the Home - Sources page. Select Lotus Notes from the Source Type list, and click Create. Enter values for the following parameters:
Server Name: The Domino server fully qualified host name/IP address. For example, if the Lotus Notes database name is ses.nsf, then enter ses.nsf for this parameter. If the HTTP port on the Domino server is not 80, then the host name should be "<server_name>:<HTTP port number>". This is a required parameter.
Attribute list: The comma-delimited list of Lotus Notes attributes along with their data types to search. The format is <Attribute Name>:<Attribute Type>, <Attribute Name>:<Attribute Type>. The valid values are String, Number, and Date. For example: Subject:String
Table 6-1 Lotus Notes Data Type Mapping
Sr. No | Lotus Notes Data Type | Oracle SES Data Type |
---|---|---|
1 | Boolean | String |
2 | Integer | Number (Big Decimal) |
3 | String | String |
4 | Date | Date |
While crawling a database, an attribute is indexed only if both name and type match the configured name and type; otherwise, it is ignored. This is an optional parameter.
The default searchable attributes for Lotus Domino Server are Modified Date, Title, and Author. Multiple attributes with same name are not allowed.
User name: The user name of a valid Lotus Domino Server user. The user should be an Administrator user or a user who has access to all folders and documents of the databases configured in the Container name parameter. The user should be able to retrieve content, metadata, and ACL from documents of all databases configured in Container name parameter. This is a required parameter.
Password: Internet password of the Lotus Notes user. This is a required parameter.
Container Name: The comma-delimited names of the containers to be crawled by Oracle SES. These containers can be one or more specific databases, or directory names if all databases in those directories need to be crawled. Specify the Lotus Notes database file name with the extension. For example, if the database is under the mail directory, then enter mail/ses.nsf for this parameter. This is a required parameter.
Crawl Public Documents: Indicate whether the public documents on notes databases need to be crawled such that they are available to anonymous users in Oracle SES, either true or false. This is a required parameter.
Authentication Attribute: The attribute used to validate the ACL. With the Active Directory identity plug-in, the value should be USER_NAME. With the Lotus Notes identity plug-in, the value should be NATIVE. This is a required parameter.
Mail Template Name: This parameter is specific to the mail-databases and the mail template's name should be specified here if any/all of the databases being crawled are mail databases. This is a mandatory parameter if either the Past Days or Future Days parameter is specified.
Past Days: If the user is crawling calendar entries, then this parameter specifies the number of days in the past for which the calendar entries are picked. The date of reference here is the start date of the event. This accounts for the number of days in the past, and it does not filter the search by time.
Future Days: If the user is crawling calendar entries, then this parameter specifies the number of days in the future for which the calendar entries are picked. The date of reference here is the end date of the event. This accounts for the number of days in the future, and it does not filter the search by time.
Notes Title Field: Because in Lotus Notes custom applications it is not mandatory to maintain a Title field, this parameter has been provided to specify those text fields that should be parsed to retrieve the title field. For example, you could enter Subject. With multiple field names, the first field available on the document is picked for the title. This is a required parameter.
Notes Thick Client: Enter true to use Lotus Notes (thick client). Enter false to use Lotus Notes Web access.
Recrawl using last modified date: Enter true to enqueue only modified documents. This is a required parameter.
Attachment As Search Item: Enter true to submit each attachment individually as an independent document with the same set of attributes and ACLs as the parent document. Enter false to add attachments to the parent document and submit them as a single unit.
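The Past Days and Future Days parameters described above define a date window around the crawl date. The sketch below shows one reading of those descriptions: the start date of an event is compared with the past limit, the end date with the future limit, and only the date parts are compared. The function name and this exact interpretation are assumptions, not the connector's documented algorithm.

```python
from datetime import date, datetime, timedelta


def in_calendar_window(start: datetime, end: datetime, today: date,
                       past_days: int, future_days: int) -> bool:
    """Return True if a calendar entry falls inside the crawl window.

    Only the date parts are compared, so the time of day is ignored.
    """
    earliest_start = today - timedelta(days=past_days)
    latest_end = today + timedelta(days=future_days)
    return start.date() >= earliest_start and end.date() <= latest_end
```

With `past_days=5` and a crawl on 15 June, an event that started at any time on 10 June is still picked, while one that started on 9 June is not.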
Oracle SES can crawl through and provide secure search for e-mail and calendar items, related metadata, attributes, ACLs, and attachments in Microsoft Exchange. It also provides attribute search and browse functionality, which allows search to be done against a specific subfolder in the hierarchy.
Oracle SES supports incremental crawling; that is, it crawls and indexes only those documents that have changed since the last crawl was scheduled. A document is re-crawled if either the content or metadata or the direct security access (permissions) information of the document has changed. A document is also re-crawled if it is moved within Microsoft Exchange. Documents deleted from Exchange are removed from the index during incremental crawls.
A Microsoft Exchange source covers the following objects in Exchange:
E-mail attachments
Calendar events
On the Exchange server, the super user must grant himself the Send as and Receive as privileges. You can enable privileges globally for all users in the system. No user-specific privilege grants are required.
See Also:
Microsoft Exchange 2003 Technical Reference Guide and information about permissions in Microsoft Exchange: http://www.microsoft.com/technet/prodtechnol/exchange/default.mspx
Oracle Secure Enterprise Search Release Notes on OTN for supported platforms
Microsoft Internet Information Server (IIS)
Note:
The file ADODB.dll is usually included in the Windows .NET Framework SDK. However, if this file is not on your computer, then you must download the ADODB.dll appropriate for your system from Microsoft and install it using the following command:
gacutil /i adodb.dll
The Windows .NET Framework can be downloaded here:
Proper permissions on the Exchange server need to be granted to the Exchange administrator. The Exchange server is crawled with the permission of a super user with the Send as and Receive as privileges. The easiest way to configure this is to use an administrator as super user or create a super user with the administrator privilege and the Send as and Receive as privileges targeting Exchange inbox store and public folders.
To enable the Outlook Web Access logon page, you must enable forms-based authentication on the server. To enable forms-based authentication:
On the Exchange server, log on with the Exchange administrator account, and then start Exchange System Manager.
In the console tree, expand Servers.
Expand the server for which you want to enable forms-based authentication, and then expand Protocols.
Expand HTTP, right-click Exchange Virtual Server, and then click Properties.
In the Exchange Virtual Server Properties dialog box, on the Settings tab, in the Outlook Web Access pane, select the Enable Forms Based Authentication option.
Click Apply, and then click OK.
Restart the IIS server.
If you are using forms-based authentication with SSL offloading, you must configure your Exchange Server front-end servers to handle this scenario.
E-mails with multibyte characters sent from a browser with a different language set than the characters in the mail are not indexed correctly in Oracle SES. The multibyte characters are converted to "?".
This is a known e-mail content issue with Microsoft Exchange. To send future e-mails so that the Microsoft Exchange connector can crawl them properly, either one of the two workarounds can be applied:
Change the browser language to the characters in the e-mail. For example, set it to "Japanese" to input Japanese characters.
Change the value of the following registry key:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\MSExchangeWEB\OWA\UseRegionalCharset
(Original) '1'
(New) Any number (except 1). For example, '0'
With release 10.1.8.2, the Microsoft Exchange connector uses WebDAV for best performance. Oracle recommends that Active Directory be used as identity management system for the Oracle SES instance. The Active Directory instance must be the same one that Microsoft Exchange is using to authenticate users on the file system.
For the Oracle SES instance to read the files during crawling, add permission to each folder and file to make them accessible by the operating system user that runs the Oracle SES instance. (Adding permissions to a folder will automatically add the same permissions to all the files and subfolders in the folder.)
Create a Microsoft Exchange source on the Home - Sources page. Select Microsoft Exchange from the Source Type list, and click Create.
Enter values for the following parameters:
User Name: User name to authenticate between Oracle SES and Exchange
Password: password to authenticate between Oracle SES and Exchange
Server: Microsoft Exchange server IP
Domain: Microsoft Exchange server domain
LDAP Port: Microsoft Exchange LDAP port
Simple Include: To limit crawling, specify up to 50 colon-delimited path inclusion boundary rules using simplified regular expressions. Specify an inclusion rule that a URL contain, start with, or end with a term. Only '*','^', and '$' operators are permitted. Use an asterisk (*) to represent a wildcard. Use a caret (^) to denote the beginning of a URL, and use a dollar sign ($) to denote the end of a URL. For example: ^https://*.oracle.com/.jpg$
Simple Exclude: To limit crawling, specify up to 50 colon-delimited path exclusion boundary rules using simplified regular expressions. Only '*','^', and '$' operators are permitted.
Regular Expression Include: To limit crawling, specify up to 50 colon-delimited path inclusion boundary rules using restricted (full java.util.regex) regular expression rules. For example: ^https://.*\.oracle(?:corp){0,1}\.com
Regular Expression Exclude: To limit crawling, specify up to 50 colon-delimited path exclusion boundary rules using restricted (full java.util.regex) regular expression rules.
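To see which URLs the Regular Expression Include example above would accept, the rule can be tried outside Oracle SES. In the sketch below, Python's re module stands in for the Java regular expression engine; the two agree on the constructs used in this particular rule, but that is not guaranteed for every pattern. The `is_included` helper is hypothetical.

```python
import re

# Example rule quoted in the Regular Expression Include description above.
INCLUDE_RULE = re.compile(r"^https://.*\.oracle(?:corp){0,1}\.com")


def is_included(url: str, rules=(INCLUDE_RULE,)) -> bool:
    """A URL is inside the crawl boundary if any inclusion rule matches it."""
    return any(rule.search(url) for rule in rules)
```

Note that the rule requires a dot before "oracle", so a URL such as https://oracle.com/ with no host prefix is not matched, and plain http URLs are excluded by the ^https:// anchor.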
This section includes information for Windows NT File System (NTFS) source on Windows. There is a separate source type for NTFS on UNIX.
The NTFS connector enables Oracle SES to search file repositories in Microsoft NTFS. An Oracle SES NTFS source collects the content, metadata attributes and ACLs of files in NTFS. An NTFS source supports incremental crawl. After the initial crawl is performed, subsequent crawls only collect those documents that have changed since the last crawl. A document is re-crawled if the content, metadata, or the ACL information of the document has changed. A file is also re-crawled if it is moved between folders. Files deleted from NTFS are removed from the index during incremental crawls.
The operating system user running the Oracle SES instance must have read permission on the NTFS file share being crawled. For example, if the remote file share \\computer1\share1\directory1\ is crawled by the NTFS source, then the SES instance must be run as a domain user who has access to the file share.
If you get the ACL in the form <encrypted acl>@domain for a folder on a remote computer, it probably means that the computer running the Oracle SES instance and the remote computer are on different domains and your computer cannot interpret the ACLs appropriately.
Currently, the Oracle SES crawler considers the shared folder an empty document, but it is not indexed; therefore, the total number of unique documents indexed will be less than the total number of documents fetched.
An ACL error may appear when crawling an NTFS source as a built-in user or group, such as an Administrator user. As a workaround, set explicit access to the administrator user: Security - Administrator (user), All Permissions.
"Everyone" is a special group that represents all current network users, including guests and users from other domains. When a user logs on to the network, the user is automatically added to the "Everyone" group. The NTFS connector supports the "Everyone" group. All documents for which the "Everyone" group has permission will be crawled and accessed like public documents. There is no need to log in to the search application to access these public documents. However, if a document grants permissions to the "Everyone" group and also has a "deny" entry for a user, then all users except the denied user can see the document, and these users must log in to the search application to see it.
When using Internet Explorer with files on a different domain, you must explicitly log on to Internet Explorer to open result links to those files.
When you use the NTFS connector and search file types of .txt, .zip, or .rtf, only the Title and Author attributes are fetched and indexed.
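The "Everyone" group behavior described above can be summarized as a small decision function. This is a simplified sketch, not the connector's implementation: it assumes ACL entries are plain user names plus the "Everyone" group, ignores other group memberships, and uses a hypothetical function name.

```python
from typing import Optional, Set

EVERYONE = "Everyone"


def can_see(user: Optional[str], allowed: Set[str], denied: Set[str]) -> bool:
    """Decide whether a search user may see a document.

    user is None for an anonymous search user who has not logged in.
    """
    if EVERYONE in allowed and not denied:
        return True                  # treated like a public document
    if user is None:
        return False                 # a deny entry or a private ACL requires login
    if user in denied:
        return False                 # an explicit deny wins over "Everyone"
    return EVERYONE in allowed or user in allowed
```

The second branch captures the point that a single "deny" entry turns an otherwise public document into one that requires login for every user.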
If not already installed, download and install the Windows .NET 2.0 Framework.
The Oracle SES process needs to be run as domain administrator to crawl remote computers on the domain. This is an important prerequisite to crawl the remote computers for NTFS. Follow these steps to run Oracle SES process as the domain administrator:
Navigate to Control Panel - Administrative Tools - Services.
Select the process OracleService<db sid>.
Stop this process.
Right click and select Properties.
Select the Log on tab.
Select the option This account, and enter the domain administrator name and password.
Start this process.
Note:
If the Oracle SES instance fails to start after the preceding change, then follow these steps:
Navigate to the $ORACLE_HOME/NETWORK/ADMIN directory.
Edit sqlnet.ora by changing SQLNET.AUTHENTICATION_SERVICES=(NTS) to SQLNET.AUTHENTICATION_SERVICES=(NONE).
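The sqlnet.ora change in the note above can be applied with a short script when several hosts need the same fix. The following is a sketch under stated assumptions: the setting is on a single line, and writing a `.bak` copy next to the file is a precaution added here, not a step from this guide. The function name is hypothetical.

```python
import re
from pathlib import Path

# Matches one line of the form SQLNET.AUTHENTICATION_SERVICES=(NTS),
# allowing spaces or tabs around '=' and inside the parentheses.
_NTS_LINE = re.compile(
    r"^([ \t]*SQLNET\.AUTHENTICATION_SERVICES[ \t]*=[ \t]*)\([ \t]*NTS[ \t]*\)[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def disable_nts_authentication(sqlnet_ora: Path) -> bool:
    """Rewrite (NTS) to (NONE); return True if the file was changed."""
    original = sqlnet_ora.read_text()
    updated, count = _NTS_LINE.subn(r"\1(NONE)", original)
    if count == 0:
        return False                       # nothing to change
    backup = sqlnet_ora.with_name(sqlnet_ora.name + ".bak")
    backup.write_text(original)            # keep the previous version
    sqlnet_ora.write_text(updated)
    return True
```

Calling the function a second time returns False, because the (NTS) value is no longer present.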
When an NTFS source is used, Oracle recommends that Active Directory be used as identity management system for the Oracle SES instance. The Active Directory instance must be the same one that NTFS is using to authenticate users on the file system.
For the Oracle SES instance to read the files during crawling, add permission to each folder and file to make them accessible by the operating system user that runs the Oracle SES instance. (Adding permissions to a folder will automatically add the same permissions to all the files and sub-folders in the folder.)
Note:
NTFS sources rely on Active Directory for security permissions. Because permissions at the server local group level are not defined in Active Directory, these permissions are not supported when crawling NTFS sources. In other words, permissions for server local groups (not domain local groups) are ignored during crawling. Permissions for domain groups and users inherited from server local groups also are ignored.
Create an NTFS source on the Home - Sources page. Select NTFS from the Source Type list, and click Create. Enter values for the following parameters:
UNC Path: UNC Path(s), for example, \\MyServer\Mysharedfolder
Domain Name: Domain name of the URL (UNC Path)
Simple Include: To limit crawling, specify up to 50 colon-separated path boundary rules using simplified regular expressions. Only '*','^', and '$' operators are permitted. For example: ^https://*.oracle.com/.jpg$
Simple Exclude: To limit crawling, specify up to 50 colon-separated path boundary rules using simplified regular expressions. Only '*','^', and '$' operators are permitted.
Regular Expression Include: To limit crawling, specify up to 50 colon-separated path boundary rules using restricted (full java.util.regex) regular expression rules. For example: ^https://.*\.oracle(?:corp){0,1}\.com
Regular Expression Exclude: To limit crawling, specify up to 50 colon-separated path boundary rules using restricted (full java.util.regex) regular expression rules.
Use Local Display URL: Enter true to use the local display URL and false to display the content in a web browser.
Authentication Attribute: Authentication attribute used by the LDAP to validate the user. Use USER_NAME for Active Directory and nickname for Oracle Internet Directory.
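The Simple Include and Simple Exclude parameters above use only the '*', '^', and '$' operators. The sketch below shows one way such a rule could be translated into an ordinary regular expression for testing; it is an illustration of the described semantics, not the crawler's own code, and the function names are hypothetical. Splitting the colon-separated rule list is deliberately left out, because URL schemes themselves contain a colon and this guide does not describe how that case is handled.

```python
import re


def simple_rule_to_regex(rule: str) -> re.Pattern:
    """Translate one simplified boundary rule into a compiled regular expression."""
    body = rule
    anchored_start = body.startswith("^")
    if anchored_start:
        body = body[1:]
    anchored_end = body.endswith("$")
    if anchored_end:
        body = body[:-1]
    # Escape every literal piece and let '*' match any run of characters.
    pattern = ".*".join(re.escape(piece) for piece in body.split("*"))
    return re.compile(("^" if anchored_start else "") + pattern + ("$" if anchored_end else ""))


def url_matches(rule: str, url: str) -> bool:
    """Unanchored rules behave as 'contains'; '^' and '$' pin the ends."""
    return simple_rule_to_regex(rule).search(url) is not None
```

For instance, the hypothetical rule `^https://*.oracle.com/*.jpg$` would accept any https URL on an oracle.com host that ends in .jpg, while the unanchored rule `oracle` accepts any URL containing that term.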
Note:
After crawling an NTFS source, it is possible to get a "No User Found Matching the Criteria" error message on the Home - Schedules - Data Synchronization page. This error is thrown by the identity plug-in. The NTFS connector tries to validate the principal as user first. If that fails, then it tries to validate the principal as group. This error always occurs if there are groups as ACL for a document, because the connector does not know if the given principal is user or group.
This section includes information for Windows NT File System (NTFS) source on UNIX. NTFS sources for UNIX have additional setup steps not required on Windows.
An NTFS source collects the content, metadata attributes, and ACLs of files in NTFS. An NTFS source supports incremental crawl. After the initial crawl is performed, subsequent crawls only collect those documents that have changed since the last crawl. A document is re-crawled if the content, metadata or the ACL information of the document has changed. A file is also re-crawled if it is moved between folders. Files deleted from NTFS are removed from the index during incremental crawls.
On the Windows server, the super user must have permissions to read the NTFS file share.
The super user must be the impersonate user in the IIS Server.
The default behavior for NTFS for UNIX is to use local file display URL, so the client computer must have access to the file share.
An ACL error may appear when crawling an NTFS source as a built-in user or group, such as an Administrator user. As a workaround, set explicit access to the administrator user: Security - Administrator (user), All Permissions.
"Everyone" is a special group that represents all current network users, including guests and users from other domains. When a user logs on to the network, the user is automatically added to the "Everyone" group. The NTFS connector supports the "Everyone" group. All documents for which the "Everyone" group has permission are crawled and accessed like public documents. There is no need to log in to the search application to access these public documents. However, if a document grants permissions to the "Everyone" group and also has a "deny" entry for a user, then all users except the denied user can see the document, and these users must log in to the search application to see it.
When using Internet Explorer with files on a different domain, you must explicitly log on to Internet Explorer to open result links to those files.
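The "Everyone" group behavior described above can be summarized as a small decision rule. The following Python sketch is a simplified model, not the connector's implementation. It assumes that an ACL is represented as two sets of principal names (allowed and denied), and it ignores membership in groups other than "Everyone". All names in it are hypothetical.

```python
EVERYONE = "Everyone"


def requires_login(allowed, denied):
    """A document is public only if Everyone is allowed and nobody is denied."""
    return not (EVERYONE in allowed and not denied)


def can_see(user, allowed, denied):
    """Return True if user may see the document.

    user is None for an anonymous visitor who has not logged in.
    """
    if user is None:
        return not requires_login(allowed, denied)
    if user in denied:
        return False
    return EVERYONE in allowed or user in allowed
```

With allowed = {"Everyone"} and no deny entries, an anonymous visitor can see the document. Adding a single deny entry makes the document visible only to logged-in users other than the denied one.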
NTFS sources on UNIX require an NTFS agent to be installed and configured on the Windows domain where the NTFS files are to be crawled. The NTFS agent collects and sends content and metadata to the crawler plug-in on the Oracle SES computer in a crawl session. The communication protocol between Oracle SES and the NTFS agent is HTTP or HTTPS.
The NTFS agent must be installed on a Windows computer where IIS is present, and the computer must be in the same Windows domain where the NTFS file share to be crawled resides.
Typically, a remote file share is crawled with the permission of a domain administrator or a domain user with read privileges on the file share. The easiest way to configure this is to add the domain admin group to the 'administrators' group of the target computer.
The Oracle SES instance must connect to the same Active Directory instance that the Microsoft NTFS domain connects to.
Install NTFS Agent on the Windows computer:
If not already installed, download and install the Windows .NET 2.0 Framework.
Configure NTFS agent in IIS:
Unzip $ORACLE_HOME/search/lib/plugins/ntfsLinWin/NTFSWebService.zip into a temporary directory.
Create a virtual directory in IIS, and copy all the files unzipped from NTFSWebService.zip into the virtual directory, or copy the files into an existing virtual directory on IIS.
For help in creating virtual directories in IIS (IIS 6.0) see http://www.microsoft.com/technet/prodtechnol/WindowsServer2003/Library/IIS/5adfcce1-030d-45b8-997c-bdbfa08ea459.mspx?mfr=true
Make the virtual directory accessible for the anonymous user.
(Optional) Configure IIS Web site to use SSL.
See Also:
Configuring IIS Web site to use SSL: http://www.petri.co.il/configure_ssl_on_your_website_with_iis.htm
How to implement SSL in IIS: http://support.microsoft.com/kb/299875
Configure the NTFS agent to connect to the NTFS store in IIS:
Right-click your Web site (the IIS virtual directory with the NTFSWebService folder and files).
Click the Properties tab.
Click the ASP.NET button and then Edit Configurations.
Set the following ASP.NET configuration/application settings parameters, as required in the Oracle SES source configuration:
Service UserName: User name to authenticate between Oracle SES and NTFS Agents.
Service Password: Password to authenticate between Oracle SES and NTFS Agents.
FileChunkSize: Size (in bytes) of the chunks in which a file is retrieved using the Web service method. This value should be a positive integer. For example, 1024000 divides the file into chunks of about 1 MB for passing the contents over the Web.
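To illustrate what a FileChunkSize of 1024000 means in practice, the following Python sketch splits a byte stream into fixed-size pieces. It is only a model of chunked retrieval, not the NTFS agent's Web service code, and the function names are hypothetical.

```python
import io


def iter_chunks(stream, chunk_size):
    """Yield successive chunks of at most chunk_size bytes from a binary stream."""
    if chunk_size <= 0:
        raise ValueError("chunk size must be a positive integer")
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk


def chunk_count(file_size, chunk_size):
    """Number of chunks needed for file_size bytes (ceiling division)."""
    return -(-file_size // chunk_size)


# A 2,500,000-byte file with FileChunkSize=1024000 is sent in three pieces:
# two full chunks and one final, shorter chunk.
sizes = [len(c) for c in iter_chunks(io.BytesIO(b"x" * 2_500_000), 1_024_000)]
```

A smaller value lowers the memory needed per request but increases the number of round trips between Oracle SES and the agent.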
Configure ASPNET impersonation: Impersonation is performed when ASP.NET executes code in the context of an authenticated and authorized client. Using impersonation, ASP.NET applications can optionally execute the processing thread using the identity of the client on whose behalf they are operating. Configure the IIS virtual directory as follows:
Right-click your IIS Web site (virtual directory), and then click Properties.
Click the ASP.NET button, and then Edit Configurations.
Click the Application tab of ASP.NET Configuration Settings and, under the Local Impersonation settings, enter the following: User Name: DOMAIN\<domain user>; Password: password for <domain user>.
The NTFS agent can be deployed in any IIS instance in the same Windows domain.
The application user or super user (Impersonate User) must have read permissions on the NTFSWebService physical directory and on the file share to be crawled. To enable read permissions:
Right-click the file folder.
Click Properties.
Click Security, and then click the Advanced tab.
Click Effective Permissions.
Enable read permissions for the user entered in the NTFS agent configuration.
When an NTFS source is used, Oracle recommends that Active Directory be used as identity management system for the Oracle SES instance. The Active Directory instance must be the same one that NTFS is using to authenticate users on the file system.
For the Oracle SES instance to read the files during crawling, add permission to each folder and file to make them accessible by the operating system user that runs the Oracle SES instance. (Adding permissions to a folder will automatically add the same permissions to all the files and sub-folders in the folder.)
Note:
NTFS sources rely on Active Directory for security permissions. Because permissions at the server local group level are not defined in Active Directory, these permissions are not supported when crawling NTFS sources. In other words, permissions for server local groups (not domain local groups) are ignored during crawling. Permissions for domain groups and users inherited from server local groups also are ignored.
Create an NTFS source on the Home - Sources page. Select NTFS from the Source Type list, and click Create. Enter the values for the following parameters:
UNC Path: UNC path for the NTFS system to crawl; for example, \\MYSERVER\mysharedfolder
Endpoint: Target endpoint (HTTP or HTTPS), in the form http(s)://<NTFS domain server>/<virtual directory>/NTFSWebService.asmx; for example, with mail.doklet.com as the NTFS domain server and NTFSWebService as the virtual directory.
User Name: User name to authenticate between Oracle SES and NTFS (configuration parameters same as NTFS Agent in IIS)
Password: Password to authenticate between Oracle SES and NTFS (configuration parameters same as NTFS Agent in IIS)
Authentication attribute: Attribute used by the LDAP to validate the user. This varies based on the identity plug-in used for authentication. Use "USER_NAME" for Active Directory and "nickname" for Oracle Internet Directory.
Domain Name: Domain name of the URL (UNC path).
Simple Include: To limit crawling, specify up to 50 colon-delimited path inclusion boundary rules using simplified regular expressions. Specify an inclusion rule that a URL contain, start with, or end with a term. Only '*','^', and '$' operators are permitted. Use an asterisk (*) to represent a wildcard. Use a caret (^) to denote the beginning of a URL, and use a dollar sign ($) to denote the end of a URL. For example: ^https://*.oracle.com/.jpg$
Simple Exclude: To limit crawling, specify up to 50 colon-delimited path exclusion boundary rules using simplified regular expressions. Only '*','^', and '$' operators are permitted.
Regular Expression Include: To limit crawling, specify up to 50 colon-delimited path inclusion boundary rules using restricted regular expression rules (full java.util.regex syntax). For example: ^https://.*\.oracle(?:corp){0,1}\.com
Regular Expression Exclude: To limit crawling, specify up to 50 colon-delimited path exclusion boundary rules using restricted regular expression rules (full java.util.regex syntax).
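The sample Regular Expression Include rule can be tried out before it is entered in the source configuration. The following sketch uses Python's re module rather than java.util.regex; the sample pattern uses only constructs that behave the same in both. It assumes that a rule matches when it is found in the URL (a search, not a full-string match), which the rule's missing end anchor suggests but which is not confirmed here. The sample URLs are hypothetical.

```python
import re

# The example rule from the Regular Expression Include parameter.
INCLUDE_RULE = re.compile(r"^https://.*\.oracle(?:corp){0,1}\.com")


def matches_include(url):
    """Return True if the include rule is found in url (search semantics assumed)."""
    return INCLUDE_RULE.search(url) is not None


for candidate in (
    "https://www.oracle.com/index.html",
    "https://apps.oraclecorp.com/docs",
    "http://www.oracle.com/",
    "https://oracle.com/",
):
    print(candidate, matches_include(candidate))
```

The first two URLs match. The third does not because it is not HTTPS, and the fourth does not because the rule requires a dot immediately before "oracle", so a host name without a subdomain is outside the boundary.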
Note:
After crawling an NTFS source, it is possible to get a "No User Found Matching the Criteria" error message on the Home - Schedules - Data Synchronization page. This error is thrown by the identity plug-in. The NTFS connector tries to validate the principal as user first. If that fails, then it tries to validate the principal as group. This error always occurs if there are groups as ACL for a document, because the connector does not know if the given principal is user or group.
Oracle recommends creating one source group for archived calendar data and another source group for active calendar data. One instance for the archived source can run less frequently, such as every week or month. This source should cover all history. A separate instance for the active source can run daily for only the most recent period.
The Oracle SES instance and the Oracle Calendar instance must be connected to the same Oracle Internet Directory system. Follow these steps to set up a secure Oracle Calendar source:
On the Global Settings - Identity Management Setup page in the Oracle SES administration tool, select the Oracle Internet Directory identity plug-in manager, and click Activate.
Run the following command to load the LDIF file that creates an application entity for the plug-in. (An application entity is a data structure within LDAP used to represent and keep track of software applications accessing the directory with an LDAP client.)
$ORACLE_HOME/bin/ldapmodify -h oidHost -p OIDPortNumber -D "cn=orcladmin" -w password -f $ORACLE_HOME/search/config/ldif/calPlugin.ldif
Where $ORACLE_HOME is the directory where Oracle SES was installed.
This defines the entity that will be used for the plug-in: orclapplicationcommonname=ocscalplugin,cn=oses,cn=products,cn=oraclecontext. The entity will have the password welcome1.
Create an Oracle Calendar source on the Home - Sources page. Select Oracle Calendar from the Source Type list, and click Create. Enter values for the following parameters:
Table 6-2 Calendar Source Parameters
Parameter | Value |
---|---|
Calendar server | http://host name:port |
Application entity name | |
Application entity password | welcome1 |
OID server hostname | host name |
OID server port | 389 |
OID server SSL port | 636 |
OID server ldapbase | dc=us,dc=oracle,dc=com |
uid | |
User query | (objectclass=ctCalUser) |
Past days | 30 |
Future days | 60 |
Rollover | true |
Calendar server for Display URL | Calendar endpoint URL to be used to formulate the display URL; for example, http://calendarserver:7777. If this parameter is left blank, then the value provided for the Calendar server parameter is used to formulate the display URL. |
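As a rough illustration of the Past days and Future days values in Table 6-2, the following Python sketch computes the date range they would cover. It assumes that both values are counted in whole days from the date of the crawl, which the table does not state explicitly, and it does not model the Rollover parameter. The function name is hypothetical.

```python
from datetime import date, timedelta


def crawl_window(today, past_days=30, future_days=60):
    """Return the (start, end) dates covered by a crawl run on `today`.

    Assumption: Past days and Future days are offsets from the crawl date.
    """
    start = today - timedelta(days=past_days)
    end = today + timedelta(days=future_days)
    return start, end


start, end = crawl_window(date(2007, 6, 15))
print(start.isoformat(), end.isoformat())
```

Under this reading, a daily crawl of an active source with the values shown covers a sliding window of about three months, while an archive source would use a much larger Past days value and run less often.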
Oracle Collaboration Suite 10g Mail (Oracle Mail) implements the IMAP protocol, and Oracle SES uses IMAP to retrieve data. You must log in to the mail server with a user name and password to get information. However, the Oracle Collaboration Suite mail server has a flag that allows the administrator to crawl the mail of all users. This connector uses that feature to crawl all mail for all users with the mail server's admin login.
Besides private folders, the IMAP Connector for OCS Email Server supports shared folders: any folder can be shared with another person. Hence, during ACL stamping, the crawler must check whether a mail belongs to a private folder or a shared folder and act accordingly.
The IMAP Connector for OCS Email Server has a Web interface to open mail. This same Web interface opens the searched mails from Oracle SES.
Activate the identity plug-in on the Global Settings - Identity Management Setup page. Select Oracle Internet Directory identity plug-in and click Activate.
Enter values for the following parameters:
For Authentication Attribute, select nickname.
For Host name, enter the host name of the computer where Oracle Internet Directory is running.
For Port, enter the value 389 (the default LDAP port number).
For Use SSL, enter true or false.
For Realm, enter the Oracle Internet Directory realm; for example, dc=us,dc=oracle,dc=com.
For User name, enter the Oracle Internet Directory administrator user name; for example, cn=orcladmin.
For Password, enter the password for the user name.
Create an IMAP Connector for OCS Email Server source on the Home - Sources page. Select IMAP Connector for OCS Email Server from the Source Type list, and click Create. Enter values for the following parameters:
Email Server Address: The IP address/DNS name of the IMAP e-mail server to be crawled, with the port number. This also specifies if the e-mail server follows IMAP or IMAPS protocol. This is a mandatory parameter. An exception is thrown if this is null. If the server address is incorrect, then an exception is logged at the time of accessing the server. It should be of the format: imap://<IP Address>:<port number> or imaps://<IP Address>:<port number>.
Email Server Admin User: The admin user name to access the e-mail server. This is a mandatory parameter.
Email Server Admin Password: The password of the e-mail admin user. This is a mandatory parameter.
Remove Deleted messages from Index: Indicates whether or not to keep the index for deleted mails in incremental recrawls. Valid values are "yes" or "no". Any other value is considered "yes".
Authentication Attribute: Attribute used to validate the user. This varies based on the identity plug-in used for authentication. IMAP Connector for OCS Email Server uses Oracle Internet Directory for authentication, so this parameter should be NICKNAME.
LDAP Server: The LDAP server information (IP address/DNS name, and so on).
LDAP Server Port: The LDAP server port number.
LDAP Admin User Name: The admin user name of the LDAP server. This is a mandatory parameter.
LDAP Admin Password: The password of the admin user of the LDAP server.
LDAP Base: The domain to be searched; for example, dc=oracle,dc=com.
LDAP Query: The query string defining the users whose e-mails need to be crawled. This parameter is used for user-level partitioning. For example, to crawl only users with names beginning with A and having an e-mail in the domain us.oracle.com, the query should be (&(cn=A*)(mail=*@us.oracle.com)). (In an LDAP filter, & combines conditions with AND, whereas | combines them with OR.)
Days from which crawling needs to be done: The number of days in the past, counted back from the current crawl time, at which crawling starts. By default, all mail is crawled.
Days to which the crawling needs to be done: The number of days in the past, counted back from the current crawl time, at which crawling ends. The default is today.
Display URL template: The display URL to be used for viewing the documents. This should have the placeholder for e-mail or user ID. For example, to see the full e-mail address in the display URL, enter the following:
http://<>/um/templates/message_list.uix?state=message_list&cAction=openmessage&message_wmuid=$EMAIL
To see the user ID, enter the following:
http://<>/um/templates/message_list.uix?state=message_list&cAction=openmessage&message_wmuid=$UID
Folders to crawl: The comma-delimited list of folders to be crawled. '*' means crawl all folders. Other valid values are INBOX, sent, and trash. This does not support regular expressions.
Folders not to crawl: The comma-delimited list of folders not to be crawled. This is considered only if the Folders to crawl parameter has the wildcard * as its value. Valid values are INBOX, sent, and trash. This does not support regular expressions.
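Two of the parameters above follow rules that are easy to check before saving the source: the Email Server Address format, and the interaction between Folders to crawl and Folders not to crawl. The following Python sketch models both. It is not the connector's code; the function names are hypothetical, folder names are assumed to be compared exactly (case-sensitively), and the sample address uses a documentation-only IP address.

```python
from urllib.parse import urlsplit


def parse_server_address(address):
    """Validate an address of the form imap://<host>:<port> or imaps://<host>:<port>."""
    if not address:
        raise ValueError("Email Server Address is mandatory")
    parts = urlsplit(address)
    if parts.scheme not in ("imap", "imaps"):
        raise ValueError("address must start with imap:// or imaps://")
    if parts.hostname is None or parts.port is None:
        raise ValueError("address must include a host and a port number")
    return parts.scheme, parts.hostname, parts.port


def select_folders(available, to_crawl="*", not_to_crawl=""):
    """Apply the Folders to crawl / Folders not to crawl rules to a folder list.

    The exclusion list is honored only when to_crawl is the wildcard '*'.
    """
    included = [name.strip() for name in to_crawl.split(",") if name.strip()]
    excluded = [name.strip() for name in not_to_crawl.split(",") if name.strip()]
    if included == ["*"]:
        return [name for name in available if name not in excluded]
    return [name for name in available if name in included]
```

For example, select_folders(["INBOX", "sent", "trash"], "*", "trash") keeps INBOX and sent, whereas an explicit list such as "INBOX, sent" is used as given and any Folders not to crawl value is ignored.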