This section contains information for NTFS sources on Windows. For NTFS on UNIX, see "Setting Up NTFS Sources for UNIX".
The NTFS connector enables Oracle SES to search file repositories in Microsoft NTFS. An Oracle SES NTFS source collects the content, metadata attributes and ACLs of files in NTFS. An NTFS source supports incremental crawl. After the initial crawl is performed, subsequent crawls only collect those documents that have changed since the last crawl. A document is re-crawled if the content, metadata, or the ACL information of the document has changed. A file is also re-crawled if it is moved between folders. Files deleted from NTFS are removed from the index during incremental crawls.
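The re-crawl rules above can be illustrated with a minimal sketch. It is not Oracle SES code: the `FileSnapshot` record, its hash fields, and the idea of keying files by a stable identifier are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileSnapshot:
    """Hypothetical record of what one crawl saw for one file."""
    path: str           # full path, so a move between folders changes it
    content_hash: str   # stands in for "content changed"
    metadata_hash: str  # stands in for "metadata changed"
    acl_hash: str       # stands in for "ACL information changed"


def plan_incremental_crawl(previous, current):
    """Compare two crawls, each a dict of file identifier -> FileSnapshot.

    Returns (to_recrawl, to_delete) as sorted lists of file identifiers.
    """
    # A file is re-crawled when it is new or when any snapshot field differs.
    to_recrawl = sorted(
        file_id for file_id, snap in current.items()
        if previous.get(file_id) != snap
    )
    # A file that no longer exists is removed from the index.
    to_delete = sorted(file_id for file_id in previous if file_id not in current)
    return to_recrawl, to_delete
```

In this sketch an unchanged file produces an equal snapshot and is skipped, while a changed path, content, metadata, or ACL hash makes the snapshots differ and triggers a re-crawl.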
The operating system user running the Oracle SES instance must have read permission on the NTFS file share being crawled. For example, if the remote file share \\computer1\share1\directory1\ is crawled by the NTFS source, then the Oracle SES instance must be run as a domain user who has access to the file share.
If you get the ACL in the form <encrypted acl>@domain for a folder on a remote computer, it probably means that the computer running the Oracle SES instance and the remote computer are on different domains and your computer cannot interpret the ACLs appropriately.
Currently, the Oracle SES crawler fetches the shared folder as an empty document but does not index it; therefore, the total number of unique documents indexed is less than the total number of documents fetched.
An ACL error may appear when crawling an NTFS source as a built-in user or group, such as an Administrator user. As a workaround, set explicit access to the administrator user: Security - Administrator (user), All Permissions.
"Everyone" is a special group that represents all current network users, including guests and users from other domains. When a user logs on to the network, the user is automatically added to the "Everyone" group. The NTFS connector supports the "Everyone" group. All documents for which the "Everyone" group has permission are crawled and accessed like public documents. There is no need to log in to the search application to access these public documents. However, if a document has a "deny" entry for a user along with permissions for the "Everyone" group, then all users except the one to whom the "deny" applies can see the document, and these users must log in to the search application to see it.
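The "Everyone" behavior described above can be summarized in a small sketch. The function names and the representation of an ACL as two sets of principal names are assumptions for illustration, not the connector's actual data model.

```python
EVERYONE = "Everyone"


def is_public(allow, deny):
    """A document is treated as public only when Everyone is allowed and nobody is denied."""
    return EVERYONE in allow and not deny


def can_view(user, allow, deny):
    """Return True if `user` can see the document; `user` is None when not logged in."""
    if is_public(allow, deny):
        return True   # public documents need no login
    if user is None:
        return False  # any deny entry forces users to log in first
    if user in deny:
        return False  # the denied user never sees the document
    if EVERYONE in allow:
        return True   # every other logged-in user sees it
    # Assumption beyond the text above: otherwise fall back to an explicit allow entry.
    return user in allow
```

For example, with Everyone allowed and a hypothetical user `bob` denied, an anonymous visitor and `bob` are both refused, while any other logged-in user sees the document.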
When using Internet Explorer with files on a different domain, you must explicitly log on to Internet Explorer to open result links to those files.
When you use the NTFS connector and search file types of .txt, .zip, or .rtf, only the Title and Author attributes are fetched and indexed. For these attributes, the crawler fetches the properties stored in the authoring program (typically accessed by selecting Properties from the File menu) and not the NTFS properties (accessed in Windows Explorer by right-clicking the file name and choosing Properties).
If not previously installed, then download and install the Windows .NET 2.0 Framework from the Microsoft Download Center Web site:
When an NTFS source is used, Oracle recommends that Active Directory be used as the identity management system for the Oracle SES instance. The Active Directory instance must be the same one that NTFS is using to authenticate users on the file system.
For the Oracle SES instance to read the files during crawling, add the permission to each folder and file to make it accessible by the operating system user that runs the Oracle SES instance. Adding permissions to a folder automatically adds the same permissions to all the files and sub-folders in the folder.
NTFS sources rely on Active Directory for security permissions. Because permissions at the server local group level are not defined in Active Directory, these permissions are not supported when crawling NTFS sources. Permissions for server local groups (not domain local groups) are ignored during crawling. Permissions for domain groups and users inherited from server local groups also are ignored.
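As a rough illustration of which ACL entries survive crawling, the sketch below filters out server local group permissions. The `AclEntry` record and its `scope` and `inherited_from` labels are hypothetical; the connector's real ACL representation is not described here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AclEntry:
    """Hypothetical ACL entry; field names are illustrative only."""
    principal: str
    scope: str                            # assumed labels: "domain" or "server_local"
    inherited_from: Optional[str] = None  # assumed: scope of the group the permission came from


def effective_entries(entries):
    """Keep only the entries that would be honored during a crawl."""
    return [
        entry for entry in entries
        # Server local groups are not defined in Active Directory, so they are ignored.
        if entry.scope != "server_local"
        # Domain users and groups that inherit from a server local group are ignored too.
        and entry.inherited_from != "server_local"
    ]
```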
Create an NTFS source on the Home - Sources page. Select NTFS from the Source Type list, and click Create. Enter values for the following parameters:
UNC Path: UNC Paths, for example,
Domain Name: Domain name of the URL (UNC Path)
Simple Include: To limit crawling, specify up to 50 colon-separated path boundary rules using simplified regular expressions. Only
$ operators are permitted. For example:
Simple Exclude: To limit crawling, specify up to 50 colon-separated path boundary rules using simplified regular expressions. Only
$ operators are permitted.
Regular Expression Include: To limit crawling, specify up to 50 colon-separated path boundary rules using restricted (full java.util.regex) regular expression rules. For example:
Regular Expression Exclude: To limit crawling, specify up to 50 colon-separated path boundary rules using restricted (full java.util.regex) regular expression rules.
Use Local Display URL: Enter true to use the local display URL or false to display the content in a web browser.
Authentication Attribute: Authentication attribute used by the LDAP to validate the user. Use USER_NAME for Active Directory and nickname for Oracle Internet Directory.
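To make the colon-separated boundary rules of the Regular Expression Include and Regular Expression Exclude parameters more concrete, here is a minimal sketch. It uses Python's `re` module as a stand-in for Java regular expressions, and it assumes that a rule matches anywhere in the path, that exclude rules take precedence over include rules, and that an empty include list admits everything. None of these assumptions is confirmed by this guide, and the file names are hypothetical.

```python
import re

MAX_RULES = 50  # limit stated for each include or exclude parameter


def parse_rules(spec):
    """Split a colon-separated rule string and compile each rule."""
    rules = [rule for rule in spec.split(":") if rule]
    if len(rules) > MAX_RULES:
        raise ValueError(f"at most {MAX_RULES} rules are allowed, got {len(rules)}")
    return [re.compile(rule) for rule in rules]


def in_boundary(path, include_spec="", exclude_spec=""):
    """Return True if `path` falls inside the crawl boundary (assumed semantics)."""
    includes = parse_rules(include_spec)
    excludes = parse_rules(exclude_spec)
    if any(pattern.search(path) for pattern in excludes):
        return False  # assumption: an exclude match always wins
    # Assumption: with no include rules, every non-excluded path is crawled.
    return not includes or any(pattern.search(path) for pattern in includes)


# Hypothetical usage: crawl only .doc and .pdf files outside any "archive" folder.
print(in_boundary(r"\\computer1\share1\directory1\report.doc", r"\.doc$:\.pdf$", "archive"))
```

Because the colon is the rule separator, a rule in this sketch cannot itself contain a colon.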
After crawling an NTFS source, you may get a "No User Found Matching the Criteria" error message on the Home - Schedules - Data Synchronization page. This error is signalled by the identity plug-in. The NTFS connector tries to validate the principal as a user first. If that fails, then it tries to validate the principal as a group. This error occurs if a document's ACL contains groups, because the connector does not know in advance whether a given principal is a user or a group.
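The user-then-group validation order can be sketched as follows. The `find_user` and `find_group` callables and the `log` list are hypothetical stand-ins for the identity plug-in, and the sketch only illustrates why a group principal can leave a user-lookup error behind even though validation ultimately succeeds.

```python
def resolve_principal(name, find_user, find_group, log):
    """Validate `name` as a user first, then as a group.

    `find_user` and `find_group` are hypothetical lookups that raise
    LookupError when nothing matches; `log` collects error messages.
    """
    try:
        return ("user", find_user(name))
    except LookupError:
        # The failed user lookup is recorded before the group lookup is tried.
        log.append(f"No user found matching {name!r}")
    return ("group", find_group(name))


# Hypothetical directory contents, for illustration only.
users = {"alice": "user-record"}
groups = {"engineering": "group-record"}
messages = []
kind, record = resolve_principal("engineering", users.__getitem__, groups.__getitem__, messages)
print(kind, len(messages))
```

Here the principal `engineering` resolves as a group, yet one message has already been logged by the failed user lookup.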