Setting Up FileNet Content Engine Sources

FileNet Content Engine data is stored in object stores on a server and can be further organized into folders within an object store. A FileNet Content Engine instance can have one or more object stores. To crawl them, specify the object store details in the Container name parameter in Oracle SES. The Content Engine source navigates each configured object store and crawls all the documents in it. It stores the metadata and access information in Oracle SES so that search results respect end-user permissions.

Important Notes for FileNet Content Engine Sources

The FileNet Content Engine crawler plug-in can use any user with administrative privileges to crawl and index documents.

Required Software

  • FileNet Content Engine version 3.5

  • FileNet Application Engine version 3.5

Required Tasks

Because FileNet Content Engine software is not included with Oracle SES, you must copy these files manually into Oracle SES:

  • javaapi.jar, soap.jar, xercesImpl.jar, and xml-apis.jar

    from FileNetInstalledFolder/Workplace/WEB-INF/lib

    to ORACLE_HOME/search/lib/plugins/fnetce

  • WCMConfig.properties

    from FileNetInstalledFolder/Workplace/WEB-INF

    to ORACLE_HOME/search/lib/plugins/fnetce
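
The copy step above can be scripted. The following is a minimal Python sketch, not a utility shipped with Oracle SES or FileNet. The function name copy_filenet_files is hypothetical, and its two arguments stand in for FileNetInstalledFolder and ORACLE_HOME on your system. It also assumes that creating the fnetce directory is acceptable if it does not already exist.

```python
import shutil
from pathlib import Path

# File names and relative locations are taken from the list above.
JAR_FILES = ["javaapi.jar", "soap.jar", "xercesImpl.jar", "xml-apis.jar"]
CONFIG_FILE = "WCMConfig.properties"


def copy_filenet_files(filenet_home, oracle_home):
    """Copy the FileNet client files into the Oracle SES plug-in directory.

    Returns the list of destination paths. Raises FileNotFoundError if a
    required source file is missing. (Hypothetical helper, for illustration.)
    """
    web_inf = Path(filenet_home) / "Workplace" / "WEB-INF"
    target = Path(oracle_home) / "search" / "lib" / "plugins" / "fnetce"
    # Assumption: create the target directory if it is not there yet.
    target.mkdir(parents=True, exist_ok=True)

    # The four JAR files live in WEB-INF/lib; the properties file in WEB-INF.
    sources = [web_inf / "lib" / name for name in JAR_FILES]
    sources.append(web_inf / CONFIG_FILE)

    copied = []
    for src in sources:
        if not src.is_file():
            raise FileNotFoundError(f"Missing required file: {src}")
        copied.append(Path(shutil.copy2(src, target / src.name)))
    return copied
```

Checking for each source file before copying makes a wrong FileNetInstalledFolder value fail immediately, before a crawl is attempted with an incomplete plug-in directory.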

Known Issues

  • If any of the parameters are updated after the initial crawl, then you must update the crawler re-crawl policy to Process All Documents on the Home - Schedules - Edit Schedules page, and re-crawl the source.

  • If additional document types are configured after the first crawl, then these document types are not indexed on subsequent re-crawls. This is also the case if the Document Size parameter is changed after the first crawl. For example, if the Document Size was 10 MB at the time of the first crawl and it is changed to 20 MB before the re-crawl, then documents greater than 10 MB are still rejected. As a workaround, create the source again and then make the changes.

Setting Up Identity Management with FileNet Content Engine

If a FileNet Content Engine source is used, Oracle recommends that Active Directory be used as the identity management system for the Oracle SES instance. The Active Directory instance must be the same one that FileNet Content Engine is using to authenticate users on the file system.

Creating a FileNet Content Engine Source

Create a FileNet Content Engine source on the Home - Sources page. Select FileNet Content Engine from the Source Type list, and click Create. Enter values for the following parameters:

  • Container name: The names of the containers to be crawled by Oracle SES. You can crawl a complete object store or a specific folder. The format for specifying a container is ObjectStore/FolderName/SubFolderName. Multiple comma-delimited containers can be specified. Required.

    The following are examples of container names:

    • ObjectStore1: The entire ObjectStore1 is crawled.

    • ObjectStore1/Folder1/Folder12: The documents inside Folder12 and its sub-folders are crawled.

    • ObjectStore1, ObjectStore2/Folder1/Folder12: The entire ObjectStore1 and contents of Folder12 in ObjectStore2 are crawled.

  • User name: A valid FileNet Content Engine user. The user should be an Administrator user or a user who has access to all folders and documents present in the configured containers. The user should be able to retrieve content, metadata, and ACLs from the folders and documents of all containers configured in Container name. Required.

  • Password: Password of the Content Engine user. Required.

  • Attribute list: A comma-delimited list of the Content Engine attributes, along with their data types, that the administrator wants to be searchable. The format is attributeName:attributeType, attributeName:attributeType. The valid attribute types are String, Number, and Date. Table 7-4 identifies equivalent FileNet and Oracle SES data types.

    The crawler indexes an attribute only if its name and data type in the object store match the configured name and type. Otherwise, the attribute is ignored. This parameter is optional.

    For example, to make the following Content Engine attributes searchable:

    • Attribute name: DocumentTitle Attribute type: String

    • Attribute name: ID Attribute type: Number

    • Attribute name: DateCreated Attribute type: Date

    The value of Attribute list should be: DocumentTitle:String, ID:Number, DateCreated:Date

    The default searchable attributes for FileNet Content Engine are Title, Author, and LastModifiedDate. Multiple attributes with the same name are not allowed. For example: Emp_ID:String, Emp_ID:Number is not allowed.

  • Crawl versions: Controls whether multiple versions of documents are crawled. Valid values are true and false. The default value is false, and only the latest version is crawled. Any other values are interpreted as false.

  • Crawl folder attributes: Controls whether folder metadata is indexed. Valid values are true and false. The default value is false. Any other values are interpreted as false.

  • URL for viewing the documents: The URL of the FileNet Workplace application used for viewing the search results. Workplace is a part of FileNet P8 AE. For example: http://IP_address:port/Workplace

  • Remove deleted documents from index: Controls whether documents deleted from CE object stores are removed from the index. Valid values are true and false. The default value is false, because true has a performance impact. Any other values are interpreted as false.

  • Authentication attribute: The authentication attribute used to set ACL. For Active Directory, the value is USER_NAME.
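
Because the Container name value combines comma-delimited entries with slash-separated folder paths, it can help to check the value before entering it. The following Python sketch is an illustration only and is not how Oracle SES itself parses the parameter. The function name parse_container_names is hypothetical, and the handling of whitespace and empty segments is an assumption based on the examples above.

```python
def parse_container_names(value):
    """Split a Container name value into (object_store, folder_path) pairs.

    folder_path is a list of folder names. It is empty when the entire
    object store is to be crawled. (Hypothetical helper, for illustration.)
    """
    containers = []
    for entry in value.split(","):
        # Assumption: whitespace around commas is not significant.
        entry = entry.strip()
        if not entry:
            raise ValueError(f"Empty container entry in: {value!r}")
        parts = [part.strip() for part in entry.split("/")]
        # Assumption: leading, trailing, or doubled slashes are mistakes.
        if any(not part for part in parts):
            raise ValueError(f"Empty path segment in container: {entry!r}")
        containers.append((parts[0], parts[1:]))
    return containers
```

For the third example above, parse_container_names("ObjectStore1, ObjectStore2/Folder1/Folder12") returns [("ObjectStore1", []), ("ObjectStore2", ["Folder1", "Folder12"])]: the first entry selects a whole object store and the second selects a folder path inside another one.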

Table 7-4 FileNet Content Engine Data Type Mapping

Sr. No   FileNet Content Engine Data Type             Oracle SES Data Type

1        Boolean                                      String

2        float, int, byte, and other numeric values   Number (Big Decimal)

3        String                                       String

4        DateTime, Date                               Date

5        Others                                       String
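
The mapping in Table 7-4 and the Attribute list format (attributeName:attributeType, no duplicate names, types limited to String, Number, and Date) can be expressed as a small pre-check. This is a hedged Python sketch, not Oracle SES code. The names ses_type_for and parse_attribute_list are hypothetical, type names are assumed to be case-sensitive, and only the numeric types the table names explicitly (float, int, byte) are listed, so other numeric types would have to be added by hand.

```python
VALID_TYPES = {"String", "Number", "Date"}

# FileNet-to-SES mapping from Table 7-4. Only the numeric types that the
# table names explicitly are listed; add other numeric types as needed.
FILENET_TO_SES = {
    "Boolean": "String",
    "float": "Number",
    "int": "Number",
    "byte": "Number",
    "String": "String",
    "DateTime": "Date",
    "Date": "Date",
}


def ses_type_for(filenet_type):
    """Return the Oracle SES type for a FileNet type (row 5: others -> String)."""
    return FILENET_TO_SES.get(filenet_type, "String")


def parse_attribute_list(value):
    """Parse 'name:Type, name:Type' into a dict, rejecting invalid entries."""
    attributes = {}
    if not value.strip():
        return attributes  # The parameter is optional.
    for entry in value.split(","):
        name, sep, type_name = entry.partition(":")
        name, type_name = name.strip(), type_name.strip()
        if not sep or not name or not type_name:
            raise ValueError(f"Expected attributeName:attributeType, got {entry!r}")
        if type_name not in VALID_TYPES:
            raise ValueError(f"Unsupported attribute type: {type_name!r}")
        if name in attributes:
            raise ValueError(f"Attribute listed more than once: {name!r}")
        attributes[name] = type_name
    return attributes
```

With this sketch, a value such as Emp_ID:String, Emp_ID:Number raises an error for the duplicate name, and ses_type_for("DateTime") gives the type to write in the list for a FileNet DateTime attribute.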