7 Configuring Access to Content Management Sources

This chapter contains the following topics:

Setting Up EMC Documentum Content Server Sources
Setting Up FileNet Content Engine Sources
Setting Up FileNet Image Services Sources
Setting Up Hummingbird Document Management Server Sources
Setting Up IBM DB2 Content Manager Sources
Setting Up Microsoft SharePoint Sources
Setting Up Open Text Livelink Sources
Setting Up Oracle Content Database Sources
Setting Up Oracle Content Server Sources

Setting Up EMC Documentum Content Server Sources

Documentum data is stored in DocBases, which can contain cabinets and folders. A Documentum Content Server instance can have one or more DocBases crawled with an EMC Documentum Content Server source. The Documentum Content Server source navigates through the DocBases and the inline cabinets to crawl all the documents in Documentum Content Server. Oracle SES creates an index and stores the metadata and access information in Oracle SES to provide search capabilities according to the end user permissions.

Oracle SES supports incremental crawling; that is, it crawls and indexes only those documents that have changed after the most recent crawling was scheduled. A document is re-crawled if either the content or metadata or the direct security access information of the document has changed. A document is also re-crawled if it is moved within Documentum Content Server and the end user has to access the same document with a different URL. Documents deleted from a DocBase are removed from the index during incremental crawling.

Important Notes for EMC Documentum Content Server Sources

The Documentum source in Oracle SES must use the administrator account of a DocBase for crawling and indexing documents of that DocBase.

Required Software

Documentum Content Server DA (Documentum Administrator) or Documentum Content Server WebTop application must be installed and configured.
Documentum Foundation Classes (DFC) must be installed on the server running Oracle SES.
The currently supported Documentum version is 6.5.

Required Tasks

Because EMC Documentum Content Server software is not included with Oracle SES, certain files must be copied manually into Oracle SES.

The DFC installation asks for a destination directory and a user directory. For Windows, the default destination directory is C:\Program Files\Documentum and the default user directory is C:\Documentum.

For UNIX, you must create a DFC program root and a DFC user root. For example, DFC program root might be user_home/documentum_shared and DFC user root might be user_home/documentum.

Copy the dfc.properties and DFC jar files from the following locations into ORACLE_HOME/search/lib/plugins/dcs.
- dctm.jar
  - Windows: DFC_destination_directory\
  - UNIX: DFC_destination_directory/
- dfc.jar
  - Windows: DFC_destination_directory\shared\
  - UNIX: DFC_destination_directory/dfc
- dfcbase.jar
  - Windows: DFC_destination_directory\shared\
  - UNIX: DFC_destination_directory/dfc
- dfc.properties
  - Windows: DFC_destination_directory\config\
  - UNIX: DFC_destination_directory/config/
Create a new directory under ORACLE_HOME/product/version/SES Instance Name/search/lib/plugin/dcs/. For example, dcsothers.
Copy dfc.properties to the folder created in the previous step (dcsothers), as well as to the main folder (dcs).
Copy dfc.jar, dfcbase.jar, dctm.jar to the dcs folder in ORACLE_HOME/product/version/SES Instance Name/search/lib/plugin/dcs.
Add the following to DMCL.ini:
```
max_session_count = 20
max_connection_per_session = 20
```
In Windows, DMCL.ini is located in the WINNT folder. In Linux, DMCL.ini is available in the Documentum folder (DFC user root).
On Windows Server 2003, copy dmcl40.dll from DFC_destination_directory/shared/ to ORACLE_HOME/product/version/SES Instance Name/BIN. For UNIX platforms, copy the file according to Table 7-1.
The environment variables $DOCUMENTUM_SHARED (DFC program root) and $DOCUMENTUM (DFC user directory) must be created before installing DFC on Linux. These variables must be exported again, and Oracle SES must be restarted, whenever the machine restarts. The variables can also be exported permanently in Linux.

Use the following commands to export the environment variables in Linux:

For DOCUMENTUM:
```
export DOCUMENTUM=/home/sesuser/DOCUMENTUM
```
For DOCUMENTUM_SHARED:
```
export DOCUMENTUM_SHARED=/home/sesuser/DOCUMENTUM_SHARED
```
Restart the middle tier:

searchctl restart

On Windows, restart the machine after installing DFC.

Table 7-1 DFC Files to Copy for UNIX Platforms

Platform	Copy File	From	To
Linux x86	libdmcl40.so	DFC_destination_directory/dfc	$ORACLE_HOME/lib
Linux x86-64	libdmcl40.so	DFC_destination_directory/dfc	$ORACLE_HOME/lib32
Solaris SPARC (64-bit)	libdmcl40.so	DFC_destination_directory/dfc	$ORACLE_HOME/lib32
HP-UX PA-RISC (64-bit)	libdmcl40.sl	DFC_destination_directory/dfc	$ORACLE_HOME/lib32
AIX 5L Based Systems (64-bit)	libdmcl40.so	DFC_destination_directory/dfc	$ORACLE_HOME/lib32
HP-UX Itanium	libdmcl40.so	DFC_destination_directory/dfc	$ORACLE_HOME/lib32
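
The lookup in Table 7-1 can be sketched in a few lines of Python. This is only an illustration: the function name `dmcl_copy_plan` and the example paths are hypothetical and are not part of Oracle SES or DFC. The sketch returns the source and target paths for the native library and leaves the actual copy to the administrator.

```python
from pathlib import PurePosixPath

# Rows of Table 7-1: platform -> (library file name, directory under ORACLE_HOME).
DMCL_LIBRARY_BY_PLATFORM = {
    "Linux x86": ("libdmcl40.so", "lib"),
    "Linux x86-64": ("libdmcl40.so", "lib32"),
    "Solaris SPARC (64-bit)": ("libdmcl40.so", "lib32"),
    "HP-UX PA-RISC (64-bit)": ("libdmcl40.sl", "lib32"),
    "AIX 5L Based Systems (64-bit)": ("libdmcl40.so", "lib32"),
    "HP-UX Itanium": ("libdmcl40.so", "lib32"),
}


def dmcl_copy_plan(platform, dfc_destination_directory, oracle_home):
    """Return (source, target) paths for the DMCL library on a UNIX platform.

    Hypothetical helper: it only computes paths from Table 7-1 and copies nothing.
    """
    if platform not in DMCL_LIBRARY_BY_PLATFORM:
        raise ValueError(f"Platform not listed in Table 7-1: {platform!r}")
    file_name, lib_dir = DMCL_LIBRARY_BY_PLATFORM[platform]
    source = PurePosixPath(dfc_destination_directory) / "dfc" / file_name
    target = PurePosixPath(oracle_home) / lib_dir / file_name
    return str(source), str(target)
```

For example, `dmcl_copy_plan("Linux x86-64", "/home/sesuser/documentum_shared", "/u01/oracle")` yields the library under the `dfc` directory as the source and the `lib32` directory of the Oracle home as the target. Both paths in this call are made up for illustration.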

Known Issues

In this release, search results cannot be viewed in Documentum desktop. The documents and folders can be viewed only using Documentum Administrator (DA) or Webtop applications.
For the Container name parameter, a value of repository name alone might not work. Enter the value of RepositoryName/CabinetName. For example, DocBaseName/CabinetName/FolderName/SubFolderName.

Configuration for Documentum Content Server 6.5

For Windows, the JAR files can be taken from the application server directory where DA is deployed. Before installing DFC on Linux, you must create a DFC program root and a DFC user root. For example, the DFC program root can be USER HOME/DOCUMENTUM_SHARED and the DFC user root can be USER HOME/DOCUMENTUM. Table 7-2 lists the location of the JAR files in Windows and Linux.

Table 7-2 Location of the JAR Files

JAR File Name	Windows Location	Linux Location
`dfc.jar`	`Application server home directory/da deployment directory/WEB-INF/lib`	`DFC_destination_directory`
`aspectjrt.jar`	`Application server home directory/da deployment directory/WEB-INF/lib`	`DFC_destination_directory/dfc`
`certjFIPS.jar`	`Application server home directory/da deployment directory/WEB-INF/lib`	`DFC_destination_directory/dfc`
`jsafeFIPS.jar`	`Application server home directory/da deployment directory/WEB-INF/lib`	`DFC_destination_directory/dfc`
`dfc.properties`	`Application server home directory/da deployment directory/WEB-INF/classes`	`DFC_user_directory/config/`
`configservice-api.jar`	`Application server home directory/da deployment directory/WEB-INF/lib`	`DFC_destination_directory/dfc`

To configure the crawler plug-in:

Create a new directory under ORACLE_HOME/product/version/SES Instance Name/search/lib/plugin/dcs/. For example, dcsothers.
Copy dfc.properties to the folder created in the previous step (dcsothers) as well as to the main folder (dcs).
Copy dfc.jar, aspectjrt.jar, certjFIPS.jar, jsafeFIPS.jar, configservice-api.jar to the dcs folder in the following path ORACLE_HOME/product/version/SES Instance Name/search/lib/plugin/dcs.
The environment variables $DOCUMENTUM_SHARED (DFC Program root) and $DOCUMENTUM (DFC user directory) must be created before installing DFC on Linux. Also note that the environment variables $DOCUMENTUM_SHARED, $DOCUMENTUM, and $CLASSPATH must be exported again, and Oracle SES must be restarted when the machine restarts. These variables can also be exported permanently in Linux.

Use the following commands to export the environment variables in Linux:

For DOCUMENTUM:
```
export DOCUMENTUM=/home/sesuser/DOCUMENTUM
```
For DOCUMENTUM_SHARED:
```
export DOCUMENTUM_SHARED=/home/sesuser/DOCUMENTUM_SHARED
```
For CLASSPATH:
```
export CLASSPATH=$DOCUMENTUM_SHARED/dctm.jar:$DOCUMENTUM_SHARED/config
```
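
Because these variables are lost when the machine restarts, it can help to verify them before restarting Oracle SES. The following Python sketch checks that both variables are set and rebuilds the CLASSPATH value, assuming both CLASSPATH entries are meant to resolve under DOCUMENTUM_SHARED. The function name `dfc_classpath` is hypothetical and is not an Oracle SES or DFC API.

```python
import os


def dfc_classpath(env=None):
    """Return the CLASSPATH value derived from DOCUMENTUM_SHARED.

    Hypothetical helper. Raises EnvironmentError if DOCUMENTUM or
    DOCUMENTUM_SHARED is missing or empty in the given environment.
    """
    env = os.environ if env is None else env
    missing = [name for name in ("DOCUMENTUM", "DOCUMENTUM_SHARED") if not env.get(name)]
    if missing:
        raise EnvironmentError("Not set: " + ", ".join(missing))
    shared = env["DOCUMENTUM_SHARED"].rstrip("/")
    # Assumption: dctm.jar and the config directory both sit under the DFC program root.
    return f"{shared}/dctm.jar:{shared}/config"
```

Calling `dfc_classpath()` with no argument reads the current process environment, so it can be run from the same shell session that starts Oracle SES.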

Setting Up Identity Management for EMC Documentum Content Server

Setting up identity management requires administration steps in both Oracle SES and EMC Documentum. It includes the following steps:

Activating the Documentum Identity Plug-in
Activating the OID Identity Plug-In
Activating the AD Identity Plug-In
Activating SunOne Identity Plug-In

Activating the Documentum Identity Plug-in

To activate the Documentum identity plug-in, perform the following steps:

Select Documentum Identity Plug-in.
Click Activate.
Enter a valid DocBase name.
Enter a valid user name and password.
Ensure that the environment variables DOCUMENTUM and DOCUMENTUM_SHARED are set correctly.
Click Finish.

Activating the OID Identity Plug-In

Before activating the OID Identity plug-in for validating the users in OID, Documentum Content Server should be synchronized with OID as an LDAP server. To do this, you must import the users and groups from OID to Documentum. Perform the following tasks for this:

Create an LDAP Configuration Object in Documentum Administrator (DA). To do this:
1. Log in to DA.
2. Navigate to Administration, User Management, LDAP.
3. In the File Menu, select File, New, LDAP Configuration Object.
4. In the Name field, enter a name for LDAP Configuration Object.
5. Select dm_user as the user subtype.
6. Under Communication Mode, select Regular.
7. Under Import, select Users and Groups.
8. Select Default Configuration Object to use this configuration object in the server field.
9. Click Next.
10. In the Directory Type field, select Oracle Internet Directory Server.
11. In the Bind Type field, select Bind by Searching for Distinguished Name.
12. In the Binding Name field, provide the admin user name of OID. This is usually cn=orcladmin.
13. In the Binding Password field, provide the admin user password.
14. In the Host Name field, provide the OID host name.
15. Retain the default port number of OID (389).
16. In the Person Object Class field, provide the Base Person Object. Typically the value is inetOrgPerson.
17. In the Person Search Base field, provide the person search base defined in OID. For example, cn=Users, dc=us, dc=oracle, dc=com.
18. In the Person Search Filter field, specify cn=*.
19. In the Group Object Class field, provide the Group Object. Typically the value is groupOfUniqueNames.
20. In the Group Search Filter field, specify cn=*.
21. Click Next.
22. The Attribute Map information is displayed. Click Finish.
Run the LDAP_Synchronization job. To do this:
1. Log in to DA.
2. Navigate to Administration, Job Management, Jobs.
3. Open the job dm_LDAPsynchronization.
4. In the state field, select Active.
5. Select Deactivate On Failure.
6. In Designated Server, select the host name of Documentum Server.
7. Select Run After Update.
8. Click the Schedule tab.
9. In the Start Date And Time field, set the current date and time.
10. Select Repeat time from the Repeat list.
11. Set the Frequency field to any numeric value.
12. Select End Date And Time and specify how long the Synchronization job should run.
13. Click the Method tab.
14. Select Pass Standard Argument.
15. Click the SysObject info tab.
16. Click OK.

After synchronizing the Documentum Content Server with OID, you must activate the OID identity plug-in in Oracle SES. Perform the following steps:

Log in to Oracle SES as the admin user.
Click Global Settings.
Select System, Identity Management Setup.
Select Oracle Internet Directory identity plug-in manager and click Activate.
Select nickname from the Authentication Attribute list.
Provide the following values:
- Host name: The host name of the machine where OID is running.
- Port: The default LDAP port number, 389.
- Use SSL: true or false based on your preference.
- Realm: The OID realm, for example, dc=us,dc=oracle,dc=com.
- User name: The OID admin user name, for example, cn=orcladmin.
- Password: The password of the OID admin user.
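
As an illustration of the realm format, the following Python sketch derives a realm string from a DNS domain name. It assumes the realm follows the usual LDAP convention of comma-separated domain components (dc=...). The helper name `domain_to_realm` is hypothetical, and the realm actually configured in your OID instance may differ, so confirm it with the OID administrator.

```python
def domain_to_realm(dns_domain):
    """Convert a DNS domain such as 'us.oracle.com' to a dc=... realm string.

    Hypothetical helper that assumes one dc= component per DNS label.
    """
    labels = [label for label in dns_domain.strip().split(".") if label]
    if not labels:
        raise ValueError("DNS domain must contain at least one label")
    return ",".join(f"dc={label}" for label in labels)
```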

Activating the AD Identity Plug-In

Before activating AD Identity plug-in for validating the users in AD, Documentum Content Server must be synchronized with AD as an LDAP server. To do this, you must import users and groups from AD to Documentum. For this, perform the following steps:

Create an LDAP Configuration Object in DA. To do this:
1. Log in to DA.
2. Navigate to Administration, User Management, LDAP.
3. Select File, New, LDAP Configuration Object.
4. Enter a name for the LDAP configuration object.
5. Select dm_user as User Subtype.
6. In the Communication Mode field, select Regular.
7. In the Import field, select Users and Groups.
8. Select Default Configuration Object in the server field, and click Next.
9. Provide the following values:
  
  Directory Type: Select Active Directory Server.
  
  Bind Type: Select Bind by Searching for Distinguished Name
  
  Binding Name: Provide the admin user name of AD. It is normally domainName/Administrator.
  
  Binding Password: The password of the AD admin user.
  
  Host Name: AD host name.
  
  Port: Default port number of AD, 389.
  
  Person Object Class: The Base Person Object, typically the value is user.
  
  Person Search Base: The person search base defined in AD, for example cn=Users,dc=us,dc=oracle,dc=com.
  
  Person Search Filter: Enter cn=*.
  
  Group Object Class: The group object. Typically the value is group.
  
  Group Search Base: The group search base defined in AD. For example, dc=us,dc=oracle,dc=com.
  
  Group Search Filter: Enter cn=*.
10. Click Next.
11. The Attribute Map information is displayed. Click Finish.
Run the LDAP_Synchronization job. To do this:
1. Log in to DA.
2. Navigate to Administration, Job Management, Jobs.
3. Open the job dm_LDAPsynchronization.
4. In the state field, select Active.
5. Select Deactivate On Failure.
6. In Designated Server, select the host name of Documentum Server.
7. Select Run After Update.
8. Click the Schedule tab.
9. In the Start Date And Time field, set the current date and time.
10. Select Repeat time from the Repeat list.
11. Set the Frequency field to any numeric value.
12. Select End Date And Time and specify how long the Synchronization job should run.
13. Click the Method tab.
14. Select Pass Standard Argument.
15. Click the SysObject info tab.
16. Click OK.

After the Documentum Content Server is synchronized with AD, you must activate the AD identity plug-in in Oracle SES. Perform the following steps:

Log in to Oracle SES as admin user.
Click Global Settings, and then select System, Identity Management Setup.
Select Active Directory Identity Plug-in Manager, and click Activate.
Provide the following values:
- Authentication Attribute: Select USER_NAME.
- Directory URL: Provide the host name and the port number. For example, ldap://ldapserverhost:port.
- Directory account name: Provide the AD user name, for example Administrator.
- Directory account password: AD user password.
- Directory subscriber: Provide the directory subscriber (LDAP base). For example, dc=us,dc=oracle,dc=com.
- Directory security protocol: Specify either none or portnumber.
Click Finish.
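
Before entering the Directory URL, it can be checked for the expected ldap://host:port shape. The following Python sketch uses only the standard library. The function name `parse_directory_url` is hypothetical, and accepting the `ldaps` scheme is an assumption, because the source shows only `ldap://`.

```python
from urllib.parse import urlsplit


def parse_directory_url(url):
    """Return (host, port) for a Directory URL such as ldap://ldapserverhost:389.

    Hypothetical helper. Raises ValueError if the scheme, host, or port is
    missing or malformed.
    """
    parts = urlsplit(url.strip())
    # Assumption: only ldap and ldaps are meaningful schemes for this field.
    if parts.scheme not in ("ldap", "ldaps"):
        raise ValueError(f"Unexpected scheme: {parts.scheme!r}")
    if not parts.hostname:
        raise ValueError("Directory URL has no host name")
    port = parts.port  # urlsplit raises ValueError for a non-numeric port
    if port is None:
        raise ValueError("Directory URL has no port number")
    return parts.hostname, port
```

For instance, passing the placeholder `ldap://ldapserverhost:port` unchanged raises `ValueError`, which catches the common mistake of leaving `port` unreplaced.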

Activating SunOne Identity Plug-In

Before activating the SunOne Identity plug-in for validating the users in SunOne, you must synchronize Documentum Content Server with SunOne as an LDAP server. To do this, you must import the users and groups from SunOne to Documentum. Perform the following steps:

Create an LDAP Configuration Object in DA. To do this:
1. Log in to DA.
2. Navigate to Administration, User Management, LDAP.
3. Select File, New, LDAP Configuration Object.
4. Enter a name for the LDAP configuration object.
5. Select dm_user as User Subtype.
6. In the Communication Mode field, select Regular.
7. In the Import field, select Users and Groups.
8. Select Default Configuration Object in the server field, and click Next.
9. Provide the following values:
  
  Directory Type: Select Netscape/iPlanet Directory Server
  
  Bind Type: Select Bind by Searching for Distinguished Name
  
  Binding Name: Provide the admin user name of SunOne. It is normally cn=Administrator.
  
  Binding Password: The password of the SunOne admin user.
  
  Host Name: SunOne host name.
  
  Port: Enter the port number used for SunOne. The default port number of SunOne is 389.
  
  Person Object Class: The Base Person Object, typically the value is person.
  
  Person Search Base: The person search base defined in SunOne, for example cn=Users,dc=us,dc=oracle,dc=com.
  
  Person Search Filter: Enter cn=*.
  
  Group Object Class: The group object. Typically the value is groupOfUniqueNames.
  
  Group Search Base: The group search base defined in SunOne. For example, dc=us,dc=oracle,dc=com.
  
  Group Search Filter: Enter cn=*.
10. Click Next.
11. The Attribute Map information is displayed. Click Finish.
Run the LDAP_Synchronization job. To do this:
1. Log in to DA.
2. Navigate to Administration, Job Management, Jobs.
3. Open the job dm_LDAPsynchronization.
4. In the state field, select Active.
5. Select Deactivate On Failure.
6. In Designated Server, select the host name of Documentum Server.
7. Select Run After Update.
8. Click the Schedule tab.
9. In the Start Date And Time field, set the current date and time.
10. Select Repeat time from the Repeat list.
11. Set the Frequency field to any numeric value.
12. Select End Date And Time and specify how long the Synchronization job should run.
13. Click the Method tab.
14. Select Pass Standard Argument.
15. Click the SysObject info tab.
16. Click OK.

After the Documentum Content Server is synchronized with SunOne, you must activate the SunOne identity plug-in in Oracle SES. Perform the following steps:

Log in to Oracle SES as admin user.
Click Global Settings, and then select System, Identity Management Setup.
Select Sun Java System Directory Server Manager, and click Activate.
Provide the following values:
- Authentication Attribute: Select USER_NAME.
- Directory URL: Provide the host name and the port number. For example, ldap://ldapserverhost:port.
- Directory account name: Provide the Directory Server user name, for example Administrator.
- Directory account password: Directory Server user password.
- Directory subscriber: Provide the directory subscriber (LDAP base). For example, dc=us,dc=oracle,dc=com.
- Directory security protocol: Specify either none or portnumber.
Click Finish.

Creating an EMC Documentum Content Server Source

Create an EMC Documentum Content Server source on the Home - Sources page. Select EMC Documentum Content Server from the Source Type list, and click Create. Enter values for the following parameters:

Container name: The names of the containers to be crawled by Oracle SES. You can crawl an entire Documentum DocBase or a specific repository/cabinet/folder. The format is DocBaseName/CabinetName/FolderName/SubFolderName. Multiple comma-delimited container names can be entered. This parameter is case-sensitive, so enter each cabinet name exactly as it appears in the Documentum repository. Required.

These are examples of container names:
- DocBase1: The entire DocBase1 is crawled.
- DocBase2/Cabinet21: Cabinet21 and its sub-folders within DocBase2 are crawled.
- DocBase2/Cabinet21/Folder11: Folder11 and its sub-folders are crawled.
- DocBase1, DocBase2/Cabinet21/Folder11: The entire DocBase1 and Folder11 in DocBase2/Cabinet21 are crawled.
Attribute list: The comma-delimited list of Documentum attributes along with their data types to be searchable. The format is AttributeName:AttributeType, AttributeName:AttributeType. Valid values are String, Number, and Date. See Table 7-3, "Documentum Data Type Mapping".

While crawling a DocBase, an attribute is indexed only if both name and type match the configured name and type; otherwise, it is ignored. This is an optional parameter.

For example, assume that you have the following Documentum attributes with the indicated data types:
- account name: String
- account ID: Integer
- creation date: Date
To make these attributes searchable, enter this value for Attribute list:

Account Name:String, Account ID:Number, Creation Date:Date

The default searchable attributes for Documentum Content Server are Modified Date, Title, and Author.

Multiple attributes with same name are not allowed, such as Emp_ID:String and Emp_ID:Number.
User name: Enter the user name of a valid Documentum Content Server user. The user should be an administrator user or a user who has access to all cabinets, folders, and documents of the DocBases configured in the Container name parameter. The user should be able to retrieve content, metadata, and ACL from cabinets, folders, documents and other custom sub classes of all DocBases configured in Container name parameter. Required.
Password: Password of the Documentum user. Required.
Crawl versions: Indicate whether multiple versions of documents should be crawled, either true or false. The default value is false, in which case only the latest version of each document is crawled. Any other value is interpreted as false. Optional.
Crawl folder attributes: Indicate whether folder attributes must be crawled, either true or false. This is an optional parameter. The default value is false. Any other value is interpreted as false.
URL for viewing the documents: A valid URL for Documentum WebTop or DA application used for viewing the Oracle SES search results. For example:

http://IP_address:port/da

or

http://IP_address:port/webtop
Authentication Attribute: This parameter is used to set ACLs. This parameter lets you set multiple LDAP servers. If Oracle SES and Documentum Content Server are synchronized with Active Directory, then enter the value USER_NAME. If Oracle Internet Directory is used, then enter nickname.
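
The Container name format described above can be illustrated with a short Python sketch that splits a comma-delimited value into DocBase and folder-path pairs. The function `parse_container_names` is hypothetical and only mirrors the documented format. It is not how the crawler plug-in itself is implemented. Case is preserved because the parameter is case-sensitive.

```python
def parse_container_names(value):
    """Split a Container name value into (docbase, folder_path) pairs.

    Hypothetical helper. folder_path is '' when the entire DocBase is
    to be crawled.
    """
    containers = []
    for entry in value.split(","):
        entry = entry.strip().strip("/")
        if not entry:
            continue  # ignore empty entries such as a trailing comma
        docbase, _, folder_path = entry.partition("/")
        containers.append((docbase, folder_path))
    return containers
```

With the last example value from the list above, `parse_container_names("DocBase1, DocBase2/Cabinet21/Folder11")` returns one pair with an empty folder path for DocBase1 and one pair with the path `Cabinet21/Folder11` for DocBase2.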

Table 7-3 Documentum Data Type Mapping

Sr. No	Documentum Data Type	Oracle SES Data Type
1	Boolean	Number
2	Integer	Number
3	String	String
4	ID	String
5	Time or Date	Date
6	Double	Number
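
The rules for the Attribute list parameter and the mapping in Table 7-3 can be sketched as follows. This Python example is an illustration only: `parse_attribute_list` and `SES_TYPE_BY_DOCUMENTUM_TYPE` are hypothetical names, and the checks simply restate the documented constraints (valid types are String, Number, and Date, and duplicate attribute names are not allowed).

```python
VALID_TYPES = {"String", "Number", "Date"}

# Table 7-3: Documentum data type -> Oracle SES data type.
SES_TYPE_BY_DOCUMENTUM_TYPE = {
    "Boolean": "Number",
    "Integer": "Number",
    "String": "String",
    "ID": "String",
    "Time": "Date",
    "Date": "Date",
    "Double": "Number",
}


def parse_attribute_list(value):
    """Parse 'Name:Type, Name:Type' into a dict of attribute name -> type.

    Hypothetical helper. Raises ValueError for a malformed entry, an
    unsupported type, or a duplicate attribute name.
    """
    attributes = {}
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        name, separator, type_name = entry.rpartition(":")
        name, type_name = name.strip(), type_name.strip()
        if not separator or not name:
            raise ValueError(f"Expected AttributeName:AttributeType, got {entry!r}")
        if type_name not in VALID_TYPES:
            raise ValueError(f"Unsupported type {type_name!r} for {name!r}")
        if name in attributes:
            raise ValueError(f"Duplicate attribute name: {name!r}")
        attributes[name] = type_name
    return attributes
```

A Documentum attribute of type Integer, such as the account ID in the earlier example, is looked up as `SES_TYPE_BY_DOCUMENTUM_TYPE["Integer"]` and therefore entered as `Account ID:Number`.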

Setting Up FileNet Content Engine Sources

FileNet Content Engine data is stored in object stores, which can be further contained inside folders on a server. A FileNet Content Engine instance can have one or more object stores that can be crawled by specifying the Object Store details in the Container name parameter in Oracle SES. The Content Engine source navigates the object store to crawl all the documents in the configured Content Engine Object Store. Oracle SES stores the metadata and access information to provide search according to the end user permissions.

Important Notes for FileNet Content Engine Sources

Any user having administrative privileges can be used to access FileNet Content Engine Crawler plug-in for crawling and indexing documents.

Required Software

FileNet Content Engine version 3.5
FileNet Application Engine version 3.5

Required Tasks

Because FileNet Content Engine software is not included with Oracle SES, you must copy these files manually into Oracle SES:

- javaapi.jar, soap.jar, xercesImpl.jar, and xml-apis.jar
  - From: FileNetInstalledFolder/Workplace/WEB-INF/lib
  - To: ORACLE_HOME/search/lib/plugins/fnetce
- WCMConfig.properties
  - From: FileNetInstalledFolder/Workplace/WEB-INF
  - To: ORACLE_HOME/search/lib/plugins/fnetce

Known Issues

If any of the parameters are updated after initial crawl, then you must update the crawler re-crawl policy to Process All Documents on the Home - Schedules - Edit Schedules page, and re-crawl the source.
If additional document types are configured after first time crawl, then these document types are not indexed on subsequent re-crawls. This is also the case if the Document Size parameter is changed after the first crawl. For example, if the Document Size was 10 MB at the time of the first crawl and it is changed to 20 MB before re-crawl, then documents greater than 10 MB are rejected. As a workaround, create the source again and then make the changes.

Setting Up Identity Management with FileNet Content Engine

If a FileNet Content Engine source is used, Oracle recommends that Active Directory be used as identity management system for the Oracle SES instance. The Active Directory instance must be the same one that FileNet Content Engine is using to authenticate users on the file system.

Creating a FileNet Content Engine Source

Create a FileNet Content Engine source on the Home - Sources page. Select FileNet Content Engine from the Source Type list, and click Create. Enter values for the following parameters:

Container name: The names of the containers to be crawled by Oracle SES. You can crawl a complete objectstore or a specific Folder. The format for specifying container is ObjectStore/FolderName/SubFolderName. Multiple comma-delimited containers can be specified. Required.

The following are examples of container names:
- ObjectStore1: The entire ObjectStore1 is crawled.
- ObjectStore1/Folder1/Folder12: The documents inside Folder12 and its sub-folders are crawled.
- ObjectStore1, ObjectStore2/Folder1/Folder12: The entire ObjectStore1 and contents of Folder12 in ObjectStore2 are crawled.
User name: A valid FileNet Content Engine user. The user should be an Administrator user or a user who has access to all Folders and Documents present in the configured container. The user should be able to retrieve content, metadata, and ACL from folders, documents of all containers configured in Container name. Required.
Password: Password of the Content Engine user. Required.
Attribute list: Attribute list corresponds to the comma-delimited list of Content Engine attributes along with their data types that the administrator wants to be searchable. The format is attributeName:attributeType, attributeName:attributeType. The valid values are String, Number, and Date. Table 7-4 identifies equivalent FileNet and Oracle SES data types.

In an object store, the crawler indexes an attribute only if its name and data type match the configured name and type. Otherwise, the attribute is ignored. This parameter is optional.

For example, to make the following Content Engine attributes searchable:
- Attribute name: DocumentTitle Attribute type: String
- Attribute name: ID Attribute type: Number
- Attribute name: DateCreated Attribute type: Date
The value of Attribute list should be: DocumentTitle:String, ID:Number, DateCreated:Date

The default searchable attributes for FileNet Content Engine are Title, Author, and LastModifiedDate. Multiple attributes with same name are not allowed. For example: Emp_ID: String, Emp_ID: Number is not allowed.
Crawl versions: Controls whether multiple versions of documents are crawled. Valid values are true and false. The default value is false, and only the latest version is crawled. Any other values are interpreted as false.
Crawl folder attributes: Controls whether folder metadata is indexed. Valid values are true and false. The default value is false. Any other values are interpreted as false.
URL for viewing the documents: The URL for FileNet Workplace application used for viewing the search results. Workplace is a part of FileNet P8 AE. For example: http://IP_address:port/Workplace
Remove deleted documents from index: Controls whether documents deleted from CE object stores are removed from the index. Valid values are true and false. The default value is false, because true has a performance impact. Any other values are interpreted as false.
Authentication attribute: The authentication attribute used to set ACL. For Active Directory, the value is USER_NAME.
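
The rule that any value other than true is interpreted as false, which applies to Crawl versions, Crawl folder attributes, and Remove deleted documents from index, can be sketched in Python as follows. The function names are hypothetical. The sketch also assumes the comparison ignores case and surrounding whitespace, which the source does not specify.

```python
FLAG_PARAMETERS = (
    "Crawl versions",
    "Crawl folder attributes",
    "Remove deleted documents from index",
)


def parse_flag(value):
    """Return True only for 'true'. Every other value counts as False.

    Assumption: case and surrounding whitespace are ignored.
    """
    return str(value).strip().lower() == "true"


def effective_flags(params):
    """Return the effective boolean for each flag, defaulting to False."""
    return {name: parse_flag(params.get(name, "false")) for name in FLAG_PARAMETERS}
```

A value such as `yes` or `1` therefore leaves the corresponding option disabled, so enter `true` explicitly to enable it.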

Table 7-4 FileNet Content Engine Data Type Mapping

Sr. No	FileNet Content Engine Data Type	Oracle SES Data Type
1	Boolean	String
2	float, int, byte, and other numeric values	Number (Big Decimal)
3	String	String
4	DateTime, Date	Date
5	Others	String
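
Table 7-4 can be read as a small lookup with String as the fallback. The following Python sketch restates it. The function name `ses_type_for_filenet` is hypothetical, and the set of numeric type names covers only the ones the table names explicitly (float, int, byte), so other numeric types would need to be added to it.

```python
# Numeric FileNet types named in Table 7-4. The table also mentions
# "other numeric values", which are not enumerated in the source.
NUMERIC_TYPES = {"float", "int", "byte"}
DATE_TYPES = {"datetime", "date"}


def ses_type_for_filenet(filenet_type):
    """Return the Oracle SES data type for a FileNet Content Engine type."""
    normalized = filenet_type.strip().lower()
    if normalized in NUMERIC_TYPES:
        return "Number"
    if normalized in DATE_TYPES:
        return "Date"
    # Boolean, String, and all other types map to String in Table 7-4.
    return "String"
```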

Setting Up FileNet Image Services Sources

Documents in FileNet Image Services are organized into Folders. A FileNet Image Services source navigates through the folder hierarchy to crawl all documents in FileNet Image Services (IS). Oracle SES creates the index and stores the metadata of the documents retrieved from FileNet Image Services to provide search according to the end users' permissions.

A FileNet Image Services instance can have one or more Libraries. A Library is the document repository and contains documents within Folders and sub-Folders. A FileNet Image Services source can crawl multiple Libraries.

Images stored in Image Services can have annotations. Some annotations contain text, and these annotations are crawled. The annotations crawled are:

Stamp
Transparent Text
Sticky note

You can search on the content of these annotations after the IS library has been crawled.

Important Notes for FileNet Image Services Sources

A user belonging to IS SysAdmin group must be used to crawl documents and metadata in IS.

Required Software

FileNet Image Services Server version 4.0 or 3.6 SP2
Image Services Resources Adapter version 3.2.1

Required Tasks

Because FileNet Image Services software is not included with Oracle SES, you must perform these tasks manually to integrate with Oracle SES:

Deploy the ISCrawlerWeb.war file in the same application server on which ISRA has been deployed.
For application servers that require context root to be specified while deploying a WAR file, specify Context Root as ISCrawlerWeb.
If the application server is WebSphere Application Server, then activate URL rewriting: Click Servers - Application Servers - server_name - Web Container - Session Management - Enable URL Rewriting.

Known Issues

If additional document types are configured after the first crawl, then these document types are not indexed on subsequent re-crawls. The same applies if the Document Size parameter is changed after the first crawl. For example, if the Document Size was 10 MB at the time of the first crawl and it is changed to 20 MB before the re-crawl, then documents larger than 10 MB are rejected. As a workaround, update the crawler re-crawl policy to Process All Documents on the Home - Schedules - Edit Schedules page, and re-crawl the source.
XML documents are crawled by default without configuring the source for XML documents: Oracle SES provides an option of configuring the documents types, including XML, to be crawled. Currently, even if XML document type is not configured, XML documents still are crawled.

Setting Up Identity Management for FileNet Image Services

Activate an identity plug-in on the Global Settings - Identity Management Setup page.

To configure the identity plug-in for Image Services:

On the Global Settings - Identity Management Setup page, select FileNet Image Services identity plug-in, and click Activate.
Set the following parameters:
- Authentication Attribute: Select NATIVE.
- Web Component URL: Enter the host name and port number of the Web component URL; for example, http://webserverhost:port/ISCrawlerWeb.
- Administrator user name: Enter the Image Services user name.
- Administrator password: Enter the password of the Image Services user.
- Library name of IS Server: Enter the name of the Image Services library, such as ISCF. The library name is the ISRA connection factory name that is created when ISRA is deployed.
Click Finish.

See the ISRA documentation for information about these tasks:

The FileNet Image Services Resource Adapter (ISRA) must be deployed on a supported application server. See the ISRA documentation for supported application servers.
A connection Factory must be created for ISRA. The connection factory should be configured for the target IS libraries. See the ISRA documentation for deployment instructions.
ISRA comes with a viewer application for viewing images and annotations. The FNImageViewer.ear application should be deployed on the same application server as ISRA. This viewer is invoked to display images (for example, JPEG, TIFF, BMP, and GIF) and annotations. See the ISRA documentation for deployment instructions.
To support secure search, the Image Services server must be synchronized with the Active Directory server. See the section titled LDAP configuration in ISRA deployment guides for importing Microsoft Active Directory users and groups to Image Services.
After Active Directory users and groups have been imported into Image Services, ISRA must be configured to authenticate with Active Directory. See the section titled LDAP Configuration in the ISRA deployment guide for details.

Creating a FileNet Image Services Source

Create a FileNet Image Services source on the Home - Sources page. Select FileNet Image Services from the Source Type list, and click Create. Enter values for the following parameters:

Container names: The names of the containers to be crawled by Oracle SES. You can crawl an entire FileNet Image Services Library or a specific Folder. The format is LibraryName/FolderName/SubFolderName(cache_name). Library name is the ISRA connection factory name created when ISRA is deployed. Cache name is where the document content can be found. Multiple comma-delimited container names can be entered. Required.

For example:
- Container name: LibraryName1(cache name): The entire LibraryName1 is crawled.
- Container name: LibraryName2/Folder1/(cache name): Folder1 and its sub-folders are crawled.
- Container name: LibraryName1, LibraryName2/Folder1(cache name): The entire LibraryName1 and Folder1 in LibraryName2 are crawled.
- Cache name: The format is cache name: DomainName:Organization. This is an optional parameter. If the cache name is not provided, then the plug-in tries to retrieve document content from the default page cache. However, the plug-in throws an error if an invalid page cache or empty brackets () are specified. Ask the Image Services administrator for cache details.
User name: Enter the user name of a valid FileNet Image Services user. The user should be a SysAdmin user or a user who has access to all Folders and Documents of the Libraries configured in the Container name parameter. The user should be able to retrieve content, metadata and ACL from folders, documents and other custom sub classes. The user should be defined in the configured LDAP server and should be imported into IS. Required.
Password: The FileNet Image Services user password. Required.
Web component URL: The URL of J2EE application server where the crawler plug-in Web component module is deployed. The format of the URL is http://host:port. Required.

The Web component is also used to view the search results. On clicking an Oracle SES search result, the user is prompted to log in. After the user successfully logs in, the document is displayed.

To display images and annotations, you must deploy the FileNet Image viewer FNImageViewer.ear. FNImageViewer.ear is a part of ISRA CD. If the viewer is not deployed, the images are displayed in the native viewer or the user is prompted to download the document.
Attribute Names: The comma-delimited list of Image Services attributes along with their data types to search. The format is attributeName:attributeType, attributeName:attributeType. Valid values are String, Number, and Date. Table 7-5 identifies equivalent FileNet and Oracle SES data types.

In a Library, the crawler indexes an attribute only if both name and type of the attribute in the library match the configured name and type; otherwise, it is ignored. Optional.

For example, to make the following FileNet Image Services attributes searchable:
- Attribute name: account name, attribute type: String
- Attribute name: account ID, attribute type: Integer
- Attribute name: creation date, attribute type: Date
The value of Attribute List is:

Account Name: String, Account Id: Number, Creation Date: Date
Set source hierarchy: Indicates whether the source should set the source hierarchy of the document, either true or false. The default value is false. Any other value is interpreted as false.

A document in Image Services can be filed in multiple folders. A user may have READ permissions on a document but not on all the folders in which the document is filed. If Set Source Hierarchy is true, then a user could view a source hierarchy on which he or she does not have permissions in Image Services. However, the user cannot view the documents on which he or she does not have READ permissions.
Set Public Access: Indicates whether the source sets the public access of the documents whose ACL is Anyone. Set this parameter to true or false. The default value is false. Any other value is interpreted as false.
Authentication Attribute: This parameter is used to get the LDAP authentication attribute. The appropriate value varies based on the identity plug-in used for authentication. For Microsoft Active Directory, set it to USER_NAME. For FileNet Image Services identity plug-in, set it to NATIVE.
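
The Container names format described above (LibraryName/FolderName/SubFolderName(cache_name), comma-delimited) can be checked before it is entered in the source configuration. The following Python sketch is illustrative only and is not part of Oracle SES: the function name parse_containers, the Container tuple, and the cache name used in the usage note are hypothetical. It assumes that container and cache names do not themselves contain commas or parentheses.

```python
import re
from typing import List, NamedTuple, Optional, Tuple


class Container(NamedTuple):
    library: str              # ISRA connection factory name
    folders: Tuple[str, ...]  # folder path below the library, may be empty
    cache: Optional[str]      # page cache name, or None for the default cache


# path, optionally followed by "(cache name)" at the end of the entry
_ENTRY = re.compile(r"^(?P<path>[^()]+?)(?:\((?P<cache>[^()]*)\))?$")


def parse_containers(value: str) -> List[Container]:
    """Split a comma-delimited Container names value into its parts."""
    containers = []
    for raw in value.split(","):
        entry = raw.strip()
        match = _ENTRY.match(entry)
        if match is None:
            raise ValueError(f"malformed container name: {entry!r}")
        cache = match.group("cache")
        if cache is not None:
            cache = cache.strip()
            if not cache:
                # The plug-in reports an error for empty brackets ().
                raise ValueError(f"empty cache brackets in {entry!r}")
        # Ignore empty segments such as the trailing slash in "Lib/Folder1/".
        parts = [p.strip() for p in match.group("path").split("/") if p.strip()]
        if not parts:
            raise ValueError(f"missing library name in {entry!r}")
        containers.append(Container(parts[0], tuple(parts[1:]), cache))
    return containers
```

For example, parse_containers("LibraryName1, LibraryName2/Folder1(cache1:Domain:Org)") yields one entry for the whole of LibraryName1 with the default page cache, and one entry for Folder1 in LibraryName2 with the cache cache1:Domain:Org.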

Table 7-5 FileNet Image Services Data Type Mapping

Sr. No	FileNet Image Services Data Type	Oracle SES Data Type
1	BOOLEAN	String
2	BYTE	Number
3	UNSBYTE	Number
4	SHORT	Number
5	UNSSHORT	Number
6	LONG	Number
7	UNSLONG	Number
8	ASCII	String
9	TIME	Date
10	DATE	Date
11	MENU	Number
12	FP_NUM	Number
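
As an illustration of how Table 7-5 relates to the Attribute Names parameter, the following Python sketch looks up the Oracle SES data type for each FileNet Image Services data type and assembles a value in the attributeName:attributeType format. It is a sketch only: the function name attribute_names_value is hypothetical, and the FileNet data types assigned to the three example attributes in the usage note are assumptions chosen for illustration.

```python
# Data type mapping copied from Table 7-5.
FILENET_TO_SES = {
    "BOOLEAN": "String",
    "BYTE": "Number",
    "UNSBYTE": "Number",
    "SHORT": "Number",
    "UNSSHORT": "Number",
    "LONG": "Number",
    "UNSLONG": "Number",
    "ASCII": "String",
    "TIME": "Date",
    "DATE": "Date",
    "MENU": "Number",
    "FP_NUM": "Number",
}


def attribute_names_value(attributes):
    """Build an Attribute Names value from {attribute name: FileNet data type}."""
    parts = []
    for name, filenet_type in attributes.items():
        key = filenet_type.upper()
        if key not in FILENET_TO_SES:
            raise ValueError(f"unknown FileNet data type: {filenet_type!r}")
        parts.append(f"{name}:{FILENET_TO_SES[key]}")
    return ", ".join(parts)
```

Assuming the attributes Account Name, Account Id, and Creation Date are stored as ASCII, LONG, and DATE, the call attribute_names_value({"Account Name": "ASCII", "Account Id": "LONG", "Creation Date": "DATE"}) returns Account Name:String, Account Id:Number, Creation Date:Date.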

Setting Up Hummingbird Document Management Server Sources

The Hummingbird DM Server plug-in extends the searching capabilities of Oracle SES and enables it to search Hummingbird DM Server repositories. Oracle SES can crawl documents and metadata in the Hummingbird repositories and provide secure, full-text search. It also provides metadata search and browse functionality, which allows search to be done against a specific subfolder in the hierarchy.

Hummingbird data is stored in libraries, which can contain folders, files, and workspaces. A Hummingbird DM Server instance can have one or more libraries that can be crawled with the Hummingbird DM Server plug-in by configuring parameters in Oracle SES. The Hummingbird DM Server plug-in navigates through the libraries to crawl all documents in Hummingbird DM Server. It creates an index, stores the metadata, and accesses information in Oracle SES to provide search according to the end user permissions.

Oracle SES supports incremental crawling; that is, it crawls and indexes only those documents that have changed since the most recent crawl. A document is re-crawled if the content, metadata, or the direct security access information of the document has changed. Documents deleted from a library are removed from the index during incremental crawling.

The Hummingbird plug-in includes two components: a plug-in jar file and a Web services component. The jar file is deployed in Oracle SES. The Web services component must be deployed on the computer on which Hummingbird Web Server (Webtop) is deployed.

The Hummingbird DM Server identity plug-in is used to authenticate the native users of Hummingbird DM Server.

Important Notes for Hummingbird DM Server Sources

The Hummingbird crawler plug-in should use the administrator account for the Container for crawling and indexing documents.
The Hummingbird DM Server version must be 2004 or 2005.

Required Software

Hummingbird DM Server must be installed and configured. The following versions of Hummingbird DM are supported: 2004, 2005.
Hummingbird Web Server (WebTop): Hummingbird Web Server is required to see the files and folders stored in Hummingbird DM Server.
Windows .NET Framework 1.1 must be installed on the same computer where Hummingbird Web Server (WebTop) is running.

Required Tasks

Import User/Groups from Active Directory Server to Hummingbird.

Log in to Hummingbird WebTop as a user with administrator privileges.
Select DM ADMIN from the list at the top of page.
Go to Users and Groups - User Synchronization.
Select the Network Resource and click Load Network.
Select the name of the domain with the users to import and click Load Network.
The Network resource list shows the names of users. Select the users to import and click Import User.
Click Save.
In Library User, you can see the list of users that are imported in Hummingbird Web server.

Known Issues

If you update the Attribute list parameter, then a force re-crawl should be performed to delete the indexes of the old attribute list and create indexes for the new attribute list. That is, change the re-crawl policy to Process All Documents on the Home - Schedules - Edit Schedule page.

Setting Up Identity Management for Hummingbird

Choose an identity plug-in on the Global Settings - Identity Management Setup page.

Activate the Hummingbird identity plug-in with the following parameters.

Library name: The name of the library to be crawled.
URL: This parameter is used to send the request to the Web service to retrieve the data. For example:

[http | https]://computername:port/VirtualDirectoryName/HBDMIdentityWebservice.asmx

The virtual directory name is given during installation of Web services for Hummingbird.
User name: User name of Hummingbird DM Server. The user must be an administrator user and a native user of Hummingbird. Required.
Password: Password for User name.
Authentication Attribute: NATIVE.
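
The URL parameter above follows the pattern [http | https]://computername:port/VirtualDirectoryName/HBDMIdentityWebservice.asmx. As a rough sketch of how the pieces fit together, the following Python helper assembles such a URL. The function name service_url and its arguments are hypothetical; the host, port, and virtual directory shown in the usage note are placeholders, not values supplied by Oracle SES.

```python
def service_url(host, virtual_directory, page, port=None, use_ssl=False):
    """Assemble a Web service URL from its parts.

    The port is omitted when it is None, which leaves the default port
    for the chosen scheme.
    """
    scheme = "https" if use_ssl else "http"
    authority = host if port is None else f"{host}:{port}"
    return f"{scheme}://{authority}/{virtual_directory}/{page}"
```

For example, service_url("computername", "VirtualDirectoryName", "HBDMIdentityWebservice.asmx", port=8080) produces http://computername:8080/VirtualDirectoryName/HBDMIdentityWebservice.asmx.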

Creating a Hummingbird Source

Create a source for the newly created user-defined source type on the Home - Sources page. Enter a source name. Provide values for the configuration parameters described in the following list.

Container name: The names of the containers to be crawled by Oracle SES. You can crawl an entire Hummingbird library or a specific folder. The format is LibraryName/LibraryName/FolderName/SubFolderName. This parameter is case-sensitive.

To crawl all documents in a library, use the format LibraryName/LibraryName. You can enter multiple comma-delimited container names. Required.

For example:
- Container name: LibraryName/LibraryName
  
  The entire LibraryName is crawled
- Container name: LibraryName/LibraryName/Folder21
  
  Folder21 and its sub-folders within LibraryName are crawled.
- Container name: LibraryName/LibraryName/PublicFolders/Folder1
  
  Folder1 and its sub-folders within PublicFolders are crawled.
Attribute list: The comma-delimited list of attributes to be searchable. The format is AttributeName,AttributeName. Optional.

Hummingbird stores all attributes as the String data type, so Hummingbird attributes are indexed as the String data type in Oracle SES. The only exception is LastModifiedDate, which has the Date data type in Oracle SES. The default attributes are Title, LastModifiedDate, and Author.

While crawling a library or folder, an attribute is indexed only if it matches an entry in the attribute list; otherwise, it is ignored. For example, to make the following Hummingbird attributes searchable:

Attribute name: account name

Attribute name: account ID

Attribute name: creation date

The value of Attribute List is: account name, account ID, creation date.

Multiple attributes with the same name are not allowed. For example, the list Emp_ID, Emp_ID is invalid.

If custom fields have been created, then include the name of table and column separated by a dot (.). For example: tablename.columnname,tablename.columnname
User name: User name of a valid Hummingbird DM Server user. The user must be an administrator user or a user who has access to all folders and documents configured in Container name. The user should be able to retrieve content, attributes, and documents. Required.
Password: Password of the Hummingbird user in User name. Required.
Crawl versions: Controls whether multiple versions of documents are crawled. Valid values are true and false. The default value is false. Any other value is interpreted as false, and only the latest version of a document is crawled. Optional.
Crawl folder attributes: Controls whether folder attributes are crawled. Valid values are true and false. The default value is false. Any other value is interpreted as false. Optional.
View Documents: The IP address or computer name where the Hummingbird Webtop (Hummingbird Web Server) application is installed. It is the URL for viewing search results. For example: http://computername.

If SSL is enabled on Hummingbird DM Web Server, the URL is https://computername. If Hummingbird is running on a port other than the default port (80), then append the port number using this format: http://computername:port.
Crawl Attachments: Controls whether attachments to the documents are crawled. Valid values are true and false. The default value is false. Any other value is interpreted as false. Optional.
Search form: The profile name used in Hummingbird. The default value is DEF_QBE. If custom attributes have been added to a profile and you want to search for these attributes, then enter the name of the custom profile.
URL for Webservice: The URL of Web services that are consumed by the plug-in. For example:

[http | https]://computername/virtual_folder/HBDMWebService.asmx

where virtual_folder is the name of the virtual folder created by the Web service installer.

If the Web service is running on a port other than the default port (80), then include the port number. For example:

[http | https]://computername:port/virtual_folder/HBDMWebService.asmx
Authentication Attribute: The name of the authentication attribute that is used to set ACL. The Oracle Internet Directory value is nick_name. The Active Directory value is USER_NAME. The Hummingbird identity plug-in value is NATIVE.
Hummingbird DM version: The version of Hummingbird DM to be crawled. Valid values are 5 and 6.
Date Format: The date format used in the DM Server. For example, for the date 10/23/2009 10:10:10, specify the format MM/dd/yyyy HH:mm:ss on the crawler source configuration page. If no date format or an invalid date format is specified, then the default locale settings are used to parse the date. Optional.
Activity Log Based Crawl: Indicates whether incremental crawling is based on Activity Log records, which is the optimized form of incremental crawling. Set to True for an Activity Log based crawl, or to False to process all documents to find the modified ones.
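
Two conventions in the parameter list above are easy to get wrong: the library name appears twice at the start of every container name, and the true/false parameters treat any unrecognized value as false. The following Python sketch illustrates both. It is not Oracle SES code, and the function names are hypothetical. It assumes that library and folder names contain no slash or comma, and that flag values are compared without regard to case or surrounding spaces; neither point is stated in this chapter.

```python
def container_name(library, *folders):
    """Build LibraryName/LibraryName/FolderName/SubFolderName."""
    for part in (library, *folders):
        # Assumption: names containing "/" or "," cannot be expressed.
        if not part or "/" in part or "," in part:
            raise ValueError(f"unsupported name: {part!r}")
    # The library name is repeated, as the documented format requires.
    return "/".join((library, library, *folders))


def container_list(*names):
    """Join several container names into one comma-delimited value."""
    return ",".join(names)


def interpret_flag(value):
    """Mirror the documented rule: only 'true' enables the option."""
    return value is not None and value.strip().lower() == "true"
```

For example, container_name("LibraryName", "PublicFolders", "Folder1") returns LibraryName/LibraryName/PublicFolders/Folder1, and interpret_flag("yes") returns False because only true enables an option.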

Deploying the Web Service on the Hummingbird DM Server

The Web service is located in ORACLE_HOME/search/lib/plugins/hbdm. The Web service must be installed on the same server as Hummingbird DM.

The Web service component is provided as an installable setup file. This component must be installed on the same server on which Hummingbird Web Server and Windows .NET Framework 1.1 are installed.

Separate Web service installers are provided for Hummingbird DM 5 (Hummingbird_DM5_Web_Service_Installer.zip) and Hummingbird DM 6 (Hummingbird_DM6_Web_Service_Installer.zip). Ensure that the correct Web service component is installed based on the Hummingbird DM version.

To install the Web service:

Double-click setup.exe to install the Web service.
The installer prompts for the name of the virtual directory. (The virtual directory name can be changed.) The installer creates a virtual directory on Microsoft Internet Information Server (IIS) with same name. If you have multiple Web sites in IIS running on different ports, and you want to install this Web service in a Web site other than the default Web site, then include the port number.
Provide the user name and password of Hummingbird DM Server. Enter the user name in the form: domainname\username.

Setting Up IBM DB2 Content Manager Sources

The IBM DB2 Content Manager (ICM) plug-in extends the searching capabilities of Oracle SES to search ICM repositories, which consist of item types and their instances in the form of folders and documents. Oracle SES can crawl documents and metadata in the ICM Library Server and provide secure, full-text search. Starting from the specified folders, the plug-in extends the crawl, and thus the search, into the complete child tree of each specified folder. If an item type is specified for crawling, then the plug-in crawls all instances of the item type and their complete child trees.

In ICM, the library server manages the content metadata and access control to all content in a database (such as DB2), interfacing to one or more resource managers. The primary job of the Library Server is to service client requests for content. The ICM plug-in navigates through the library server to crawl documents and folders in the specified item types. It stores the metadata and accesses information in Oracle SES to provide search according to the credentials of the end users.

While the crawler connects to the library server through the APIs, the library server internally connects with the resource manager through CM-managed secure tokens. Whenever a reference is made to a document object, the object is fetched from the resource manager using these tokens. With the crawler plug-in, metadata corresponding to a document is retrieved from the library server, while the display URL points to the document object on the resource manager using the token.

Oracle SES supports incremental crawling; that is, it crawls and indexes only those documents that have changed since the most recent crawl. A document is re-crawled if the content, metadata, display URL, or direct security access information of the document has changed. Documents deleted from a database are removed from the index during incremental crawling.

Important Notes for IBM DB2 Content Manager Sources

The user account used to crawl the specified item types must be an Administrator account that has access to all instances (documents and folders) of the specified item types and can retrieve and crawl all folders and documents. The administration user specified for crawling must belong to the ICMPUBLIC group and have the AllPrivs privilege set.
The version of DB2 Content Manager used to set up the repositories for crawling must be 8.3.

Required Software

This section lists required software (in order of installation) for the installation of DB2 Content Manager 8.3:

Server Software Requirements (Computer with ICM Server):

Windows Server 2003 Enterprise Edition
IBM WebSphere Application Server 5.1 plus FixPak 1
IBM DB2 Universal Database Enterprise Server Edition (32-bit): 8.1 plus FixPak 7A special or version 8.2 plus FixPak 7A special
DB2 Content Manager Enterprise Edition 8.3 plus FixPak 1
DB2 Information Integrator for Content 8.3 with Fix Pack 3
DB2 Content Manager eClient 8.3

Client Software Requirements (Computer with Oracle SES):

IBM DB2 Run-Time Client: 8.1 plus FixPak 7A special or version 8.2 plus FixPak 7A special
DB2 Information Integrator for Content 8.3 with Fix Pack 3
DB2 Content Manager Client for Windows 8.3 (optional for Windows)

Required Tasks on the Server

The following tasks must be performed on the computer with ICM server.

To install and configure the system with ICM server:

Install DB2 Content Manager 8.3 with the required fix-packs.
Enable LDAP on DB2:
1. Open the System Administration Client.
2. Select Tools - LDAP Configuration to display the LDAP Configuration window.
3. Select Enable LDAP User import and authentication
4. On the Server tab, select server type Active Directory.
5. Provide the LDAP server information on the Server page.
6. Click OK.
Import users and groups from the Active Directory to ICM:
1. In the system administration client, click Authentication and then right-click either Users or User-Groups.
2. Click the LDAP button and then enter the user to be imported into ICM. To view a list of all valid user names, click Show All.
3. Select one or more users and click OK.
4. From the Assign to Groups tab, assign the users to the required groups.
5. From the Set Defaults tab, specify the default resource manager, collection and item access control list for the users, user-groups, or both.
6. Click OK or Apply.
  
  The selected users and user-groups are imported into the DB2 CM environment.
7. To verify the import, select Users or User-Groups. The imported users or user-groups appear in the list on the right.

Required Tasks on the Client Side

Catalog the DB2 run-time client with DB2 Content Manager Library database.

To install and configure the system with Oracle SES:

Locate the services file in \WINDOWS\system32\drivers\etc or a similar directory on Windows, or in the /etc directory on Linux.

Open the services file in a text editor and add these lines:

[Service Name]    [Port #]/tcp #DB2 connection service port
Example: db2c_DB2 50000/tcp    #DB2 connection service port

Enter the following commands from the command line processor, where node_name is any name of your choosing:
```
catalog tcpip node node_name remote [IP_address | host] server service_name
```
In this example, node_name is CMDB, host is my_computer, and service_name is db2c_DB2:
```
catalog tcpip node CMDB remote my_computer server db2c_DB2
```
Enter the following command, where database_alias is a name of your choosing and node_name was specified in the previous step:
```
catalog db database_name as database_alias at node node_name
```
In this example, the alias is the same as the database name (ICMNLSDB) and the node name is CMDB.
```
catalog db ICMNLSDB as ICMNLSDB at node CMDB
```

To check the connection, issue the following command:

connect to database_alias user database_user using password

In this example, the ICMADMIN user connects to ICMNLSDB.

connect to ICMNLSDB user ICMADMIN using password

After connecting, run the query select tabname from syscat.tables. All table names in the database are listed.
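
The services file entry and the two catalog commands in this procedure follow a fixed pattern, so they can be generated from a handful of values. The following Python sketch only formats the text shown above; it does not run the DB2 command line processor. The function names are hypothetical, and the values in the usage note are the same example values used in this section.

```python
def services_entry(service_name, port):
    """Return the line to add to the services file."""
    return f"{service_name}    {port}/tcp    #DB2 connection service port"


def catalog_commands(node_name, host, service_name, database_name,
                     database_alias=None):
    """Return the catalog commands for the DB2 command line processor.

    When no alias is given, the database name is reused as the alias,
    as in the ICMNLSDB example.
    """
    alias = database_alias or database_name
    return [
        f"catalog tcpip node {node_name} remote {host} server {service_name}",
        f"catalog db {database_name} as {alias} at node {node_name}",
    ]
```

Calling catalog_commands("CMDB", "my_computer", "db2c_DB2", "ICMNLSDB") returns the two example commands shown above, and services_entry("db2c_DB2", 50000) returns the example services file line.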

Known Issues

Oracle SES does not crawl folders that have all blank attributes.
The ICM plug-in does not support CLOB attributes because of a limitation when using these attributes with XPath queries.
To use the ICM eClient application to view search results, Oracle recommends that users log in to eClient first and then open the Oracle SES search screen in the same window. If a user opens the Oracle SES search results directly, then ICM eClient may prompt the user to log in. Then the user must manually refresh the Oracle SES page to view the selected document.
Change of the item type ACL does not update the items or documents (and their last modified date) of that item type. Whenever an ACL of an item type is changed from the System Administration client, the effective change on the items/documents of that item type can be reflected only through a force re-crawl. Change the re-crawl policy to Process All Documents on the Home - Schedules - Edit Schedule page.
When crawling an item type hierarchy of multiple levels, the crawler might signal this error:

com.ibm.mm.sdk.common.DKUsageError: DGL7146A: The query string is too long or too complex

The CM query has a length restriction of 64K. DB2 UDB does not have such a restriction, and the problem can be fixed by removing the 64K limitation check from the API and allowing the Library Server database to determine the limit.

Setting Up Identity Management for DB2 Content Manager

Activate the ICM identity plug-in on the Global Settings - Identity Management Setup page with the following parameters:

Library Server name: The name of the alias of the Library Server of DB2 Content Manager that must be connected to retrieve all the item types required for crawling.
User name: User name of a valid ICM Server user. Required.
Password: Password of the ICM user. Required.
ICM Servers File: Specifies the absolute path of the cmbicmsrvs.ini file. This INI file stores the source information for the data store.
ICM Environment File: Specifies the absolute path of the cmbicmenv.ini file. This INI file stores the database connect information.

The required ICM Server (cmbicmsrvs.ini) and ICM Environment (cmbicmenv.ini) files can be found on the client side (computer with Oracle SES) at

ICM_InstallationFolder/cmgmt/connectors/cmbicmsrvs.ini and

ICM_InstallationFolder/cmgmt/connectors/cmbicmenv.ini
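
Because both parameters require a path to an existing INI file, it can help to confirm that the files are present before activating the plug-in. The following Python sketch derives the two locations listed above from the ICM installation folder and reports which files are missing. The function names are hypothetical, and the installation folder must be supplied by the reader; none is assumed here.

```python
from pathlib import Path


def icm_ini_paths(installation_folder):
    """Return the expected INI file locations, keyed by parameter name."""
    connectors = Path(installation_folder) / "cmgmt" / "connectors"
    return {
        "ICM Servers File": connectors / "cmbicmsrvs.ini",
        "ICM Environment File": connectors / "cmbicmenv.ini",
    }


def missing_files(paths):
    """List the parameter names whose INI file does not exist."""
    return [label for label, path in paths.items() if not path.is_file()]
```

An empty list from missing_files(icm_ini_paths(folder)) indicates that both files were found under the given folder. Pass an absolute installation folder so that the resulting paths are absolute, as the parameters require.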

Creating an IBM DB2 Content Manager Source

Create a source for the newly-created user-defined source type on the Home - Sources page. Enter a source name. Provide values for these configuration parameters:

Container name: The item types to be crawled. This can be a specific item type whose instances need to be crawled, or a folder/sub-folder if all item types inside that folder or sub-folder must be crawled. Container name can be a combination of multiple item types delimited by a slash (/). Note that a backslash (\) is an unacceptable delimiter.

Container names must be in the format:

parent_item_type_name[@parent_attribute_name=attribute_value]/child_item_type_name[@child_attribute_name=child_attribute_value]

or

child_item_type_name[@parent_attribute_name=attribute_value,@child_attribute_name=child_attribute_value]

For example, you might have a root-component item type named Level-1 with attribute Attribute-1 whose value is Value-1. You have another item type Level-2 that is a child of Level-1, with attributes Attribute-1 (linked with Level-1) and Attribute-2 with value Value-2. You have another item type Level-3 that is a child of Level-2 and has attributes Attribute-1, Attribute-2 (linked attributes) and Attribute-3 with value Value-3.

If the user wants to crawl all items formed with item type Level-3, then the container name is:
```
Level-1[@Attribute-1="Value-1"]/Level-2[@Attribute-2="Value-2"]/Level-3
```
or
```
Level-3[@Attribute-1="Value-1" AND @Attribute-2="Value-2"]
```
The values for String and Date attributes are enclosed in double quotes while the values for Number attributes are not.
Attribute list: The comma-delimited list of ICM attributes along with their data types to be searchable. The format is:

AttributeName:AttributeType, AttributeName:AttributeType

Valid values are String, Number, and Date.

A database crawl indexes an attribute only if both name and type match the configured name and type; otherwise, the attribute is ignored. Optional.

The default searchable attributes for ICM are Modified Date, Title, and Author. Attribute names are case-sensitive, and multiple attributes with the same name are not allowed.
User name: The ICM user name used for crawling. It must be a user with at least read privileges on the configured item types. This setting is used to make a session with ICM to get ACL, Document List, metadata, and content.
Password: The password of the ICM user in User Name.
Crawl versions: Controls whether all versions of a document are crawled or only the latest version. Valid values are true and false. The default value is false. Any other value is interpreted as false.
Crawl folder attributes: Controls whether folder metadata is indexed. Valid values are true and false. The default value is false.
Library server name: The name of the alias of the Library Server of DB2 Content Manager that must be connected to retrieve all item types required for crawling.
Remove URL not in queue: Controls whether documents deleted from ICM are also removed from the index. Valid values are true and false. The default value is false.
Authentication attribute: The authentication attribute used to validate the ACL. The value for the Active Directory identity plug-in is USER_NAME, and the value for the ICM identity plug-in is NATIVE. Required.
WebClient path: The path of an optional Web application used to render the search results. ICM allows the rendering of search results in ICM eClient and a custom Web application, which must be deployed separately on the ICM application server.
Title field: A case-sensitive, comma-delimited list of attributes that can be used as the titles in the ICMD containers specified for crawling. Required.
Time Zone: The time zone of the ICM library server. Because the library-server of ICM could be in a different time zone than the Oracle SES server, this attribute enables the Oracle SES time zone to be converted to the ICM time zone for time-based queries. If an invalid time zone is entered, then GMT is used by default.
ICM Servers File: The absolute path of the cmbicmsrvs.ini file. This INI file stores the source information for the data store.
ICM Environment File: The absolute path of the cmbicmenv.ini file. This INI file stores the database connect information.
Use ICM eClient to view search results: Controls whether ICM eClient is used to view search results or some other Web application. Enter true for ICM eClient; false otherwise.
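
The Container name format described earlier in this list, including the rule that String and Date values are enclosed in double quotes while Number values are not, can be sketched in Python as follows. This is an illustration only, not the plug-in's implementation: the function names are hypothetical, conditions are joined with AND as in the Level-3 example, and Date values are assumed to be passed as already formatted strings.

```python
def predicate(attribute, value):
    """Format one @attribute=value condition."""
    if isinstance(value, bool):
        raise TypeError("boolean values are not covered by this sketch")
    if isinstance(value, (int, float)):
        return f"@{attribute}={value}"    # Number: no quotes
    return f'@{attribute}="{value}"'      # String or Date: double quotes


def item_type(name, conditions=None):
    """Format an item type with optional attribute conditions."""
    if not conditions:
        return name
    joined = " AND ".join(predicate(a, v) for a, v in conditions.items())
    return f"{name}[{joined}]"


def container_name(*levels):
    """Join item type levels with a slash; a backslash is not accepted."""
    name = "/".join(levels)
    if "\\" in name:
        raise ValueError("a backslash is an unacceptable delimiter")
    return name
```

For example, container_name(item_type("Level-1", {"Attribute-1": "Value-1"}), item_type("Level-2", {"Attribute-2": "Value-2"}), item_type("Level-3")) reproduces the first Level-3 container name shown for the Container name parameter.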

Setting Up Microsoft SharePoint Sources

The SharePoint Crawler connector enables Oracle SES to provide secure search over SharePoint Portal Server and Microsoft Office SharePoint Server 2007 (MOSS). The connector extends the searching capabilities of Oracle SES and enables it to search into an external repository. Oracle SES can crawl through the documents, items, and related metadata in SharePoint repositories and provide secure, full-text search. The connector also provides metadata search and browse functionality, which allows a search to be done against a specific subfolder in the hierarchy.

In SharePoint, data is stored in different libraries such as the Document Library, Picture Library, Lists, Discussion Boards, and so on. A SharePoint instance can have one or more sites and sub-sites that the SharePoint Crawler connector can crawl after you set up the appropriate configuration parameters in the Oracle SES Administration GUI. The SharePoint Crawler connector navigates through the Libraries and Lists to crawl all the documents and items from a SharePoint repository. It creates an index, stores the metadata, and accesses information in Oracle SES to provide search capabilities according to the end user permissions.

The SharePoint Crawler connector supports incremental crawling, which means that it crawls and indexes only those documents that have changed after the most recent crawl. A document is re-crawled if the content, metadata, or direct security access information of the document has changed since the previous crawl. Documents deleted from a Library are removed from the index during incremental crawling.

Important Notes About SharePoint 2007 Sources

When the Crawl Security Settings parameter is set to either NORMAL or STRICT, the SharePoint Crawler for the Container must use the SharePoint administrator account for crawling and indexing documents.
When the Crawl Security Settings parameter is set to RELAX, any user that has at least Visitor (Read) permissions can be identified in the SharePoint source for crawling and indexing documents.
The supported versions of SharePoint Server are:
- 2003 or 2.0 for SharePoint Portal Server
- 2007 or 3.0 for MOSS 2007
SharePoint Container names in Oracle SES should not contain any special characters. Enter a backslash (\) before a slash or a comma. Otherwise, the crawler does not recognize the Container.
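
The escaping rule in the last note can be applied mechanically when container names are generated from a list of sites or folders. The following Python sketch places a backslash before each slash or comma inside a name and then joins the names into a path. The function names and the sample names are hypothetical, and the sketch assumes that no characters other than the slash and the comma need this treatment.

```python
def escape_name(name):
    """Put a backslash before every slash or comma inside one name."""
    return name.replace("/", "\\/").replace(",", "\\,")


def container_path(*names):
    """Join escaped names with the slash that separates path levels."""
    return "/".join(escape_name(n) for n in names)


def container_list(*paths):
    """Join several container paths into one comma-delimited value."""
    return ",".join(paths)
```

For example, container_path("Sales", "Reports/2007") returns Sales/Reports\/2007, so the slash inside the folder name is not mistaken for a path separator.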

Known Limitations of the SharePoint 2007 Connector

Passwords entered through the Oracle SES Administration GUI are case insensitive.
Storing more than 200 files in a single folder may result in degraded performance and increased crawling time.
If the Crawl Security Settings parameter is set to RELAX, then the user ID specified in the User Name parameter does not require administrative privileges. Visitor (Read) permissions on the site are sufficient. However, the Read permission level must include the Browse Directories permission for the user to access any sub-sites. Otherwise, the sub-sites are not crawled.

To add Browse Directories permissions for SharePoint 2007:
1. Open People and Groups - Site Permissions.
2. Under Settings - Permission Levels, select READ.
3. Under Site Permissions, select Browse Directories.
4. Click Submit.
To add Browse Directories permissions for SharePoint 2003:
1. Open the Created subarea and select Manage Security.
2. Select the user and edit permissions.
3. Select READ.
4. Click Advanced Permissions.
5. Under Advanced Permissions, select Browse Directories.
6. Click OK.
SharePoint does not allow users without administrative privileges to browse user profiles.

If the user ID specified in the User Name parameter does not have administrative privileges, then this user needs permission to manage profiles.

To grant permission to manage profiles:
1. Open SharePoint Central Administration 3.02.
2. Click Shared Services Administration - SharedServices1.
3. Under User Profile and My Sites, select Personalization Service Permissions.
4. Add user user1 and select permissions Manage user profiles.
5. Save and submit the user.
User profiles are crawled if the user has specified the root site in the Site/Sub-Site URL parameter of the source configuration.

Known Issues for SharePoint 2007 Connector

Versions of list items whose object type is folder are not crawled or indexed.
Site Collection Administrator users cannot see documents if they are not listed among the document permission users.
The message "Unable to type cast null" is not an error. This information is provided when the crawler tries to crawl attachments that are not supported for a particular entity.
The error "Principal user_name cannot be validated" is returned when the crawler obtains a user name from the SharePoint repository that is not present in Active Directory.
Performance of the SharePoint connector can be impacted when the Crawl Versions attribute is set to true.

Supported Platforms

The following platforms are supported by the SharePoint Crawler connector:

Red Hat Linux 4
Windows 2003 Server Standard Edition and above with the latest Service Pack

Creating a SharePoint 2007 Source

Create a source for the newly-created user-defined source type on the Home - Sources page. Enter a source name. Provide values for the configuration parameters described in the following list. Also see Table 7-6, "Supported Values for SharePoint Source Parameters".

SharePoint Version: Version of the SharePoint server (SharePoint Portal Server/MOSS 2007) to crawl. (Required)
Container name: Contains the names of the containers to be crawled by Oracle SES. You can specify multiple container names as a comma-delimited list. (Required)

You can crawl an entire area or site or a specific folder. The format for specifying a container folder is AreaName/LibraryName/FolderName/SubFolderName.

To crawl all documents in the Area or Library, the format is AreaName or AreaName/LibraryName.

To index the entire SharePoint portal, enter a slash (/).

To crawl all sites, enter sites.

Examples for SharePoint Portal Server:
- Container name: AreaName
  
  The entire Area is crawled.
- Container name: AreaName/LibraryName/Folder21
  
  Folder21 and its subfolders within LibraryName are crawled.
- Container name: LibraryName
  
  All documents inside the Library and its subfolders are crawled.
Examples for MOSS 2007:
- Container name: LibraryName/Folder21
  
  Folder21 and its sub-folders within LibraryName are crawled.
- Container name: LibraryName
  
  All documents inside the Library and its subfolders are crawled.
  
  The path for the container cannot contain any special characters. Enter a backslash (\) before a slash or a comma.
Attribute list: A comma-delimited list of attributes, as described in Table 7-7. The format for an attribute list is AttributeName, AttributeName. Multiple attributes with the same name, such as Emp_ID, Emp_ID, are not allowed.

In MOSS 2007, all attributes viewable from the UI are indexed by default. List all custom attributes to index, using the names displayed in the user interface.

In SPPS (SP 2003), the Title, LastModifiedDate, and Author attributes are indexed by default. List any other attributes to index, using the names displayed in the UI.

If you update the attribute list from the administrator parameters, then perform a forced recrawl to delete the indexes of the old attribute list and to create indexes for the new attribute list.
Domain name: The domain name of the user that is used to crawl the SharePoint site. For example, if you intend to use the OracleDomain\Administrator user for crawling, then enter OracleDomain for this parameter. Do not include .com or .in or any other suffix in the name. (Required)
User name: Specifies the user name of a valid SharePoint Portal Server/MOSS 2007 user. Do not include the domain name for this user. For example, for OracleDomain\Administrator, enter Administrator. (Required)
Password: Specifies the password of the SharePoint user specified in User name. (Required)
Authentication attribute: Format of the user and group identity stored in the ACL of SharePoint objects. This format must be an authentication attribute of the Oracle SES active identity plug-in, such as USER_NAME for an Active Directory identity plug-in. Otherwise, the ACL validation fails during indexing. (Required and case sensitive)

For example, this value is USER_NAME for the Microsoft Active Directory identity plug-in.
SPS Site/Sub-Site URL: The URL of the Site or Sub-site of the SharePoint Portal, which is used for viewing the search results. (Required)

This URL has the form http://HostName:PortNumber or http://HostName:PortNumber/SubSiteName.
Crawl Security Settings: Sets security on documents for indexing. (Required)

This setting can be one of the following:
- NORMAL: The regular crawl uses site-level access control lists (ACLs) but not document-level ACLs.
- RELAX: Use this mode when the SharePoint Site Administrator user information is not available and the SharePoint user has only visitor (or read) permissions on the site. By default, such a user cannot crawl subsites under the main site; see the work-around in "Known Limitations of the SharePoint 2007 Connector". This mode is intended for exposing public documents temporarily and quickly to search. The SES administrator must be careful not to expose documents to other users inadvertently.
- STRICT: Captures even document-level security. This mode requires that an additional Web Service agent, Oracle MOSS Web Service, be installed on the SharePoint 2007 server. See "Deploying the Web Service on MOSS 2007".
Simple Include: Only include URLs having at least one word mentioned in this parameter. Separate the words with commas.
Simple Exclude: Exclude all URLs having one or more word(s) mentioned in this parameter. Separate the words with commas.
Regular Expression Include: Include all URLs that match the expression provided in this parameter.
Regular Expression Exclude: Exclude all URLs that match the expression provided in this parameter.
Crawl versions: Controls whether multiple versions of documents are crawled. Valid values are true and false. Any other value is interpreted as false. The default value is false, so only the latest version is crawled. (Optional)
Crawl folder attributes: Controls whether folder attributes are crawled. The default value is false. Valid values are true or false, and any other value is interpreted as false. (Optional)
Crawl attachments: This parameter indicates whether attachments should be crawled. The default value is false. Valid values are true or false, and any other value is interpreted as false. (Optional)
LDAP URL: URL of the LDAP server, such as ldap://IP:port, where the default port number is 389.
LDAP Search Base: LDAP Search Base, such as, DC=abc, DC=com. When the value of Authentication Attribute is DN, specify the LDAP URL and the LDAP search base of the LDAP server configured in the identity plug-in. Otherwise, leave these parameters blank.
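
The Container name parameter described above is a comma-delimited list in which a backslash escapes a literal slash or comma. The following Python sketch shows one way such a value could be split into containers and path segments. It is illustrative only: `split_escaped` and `parse_container_names` are hypothetical helper names, not part of Oracle SES, and the connector's actual parsing rules may differ.

```python
def split_escaped(text, delimiter, keep_escapes=False):
    """Split text on delimiter; a backslash makes the next character literal."""
    parts, current, i = [], [], 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            if keep_escapes:
                current.append(ch)        # preserve the escape for a later pass
            current.append(text[i + 1])   # escaped character is never a delimiter
            i += 2
        elif ch == delimiter:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    parts.append("".join(current))
    return parts


def parse_container_names(value):
    """Return one list of path segments per container (assumed semantics)."""
    containers = []
    # First pass: split on commas but keep escapes so "\/" survives.
    for raw in split_escaped(value, ",", keep_escapes=True):
        raw = raw.strip()
        if not raw:
            continue
        if raw == "/":
            containers.append(["/"])      # a lone slash means the whole portal
            continue
        # Second pass: split on slashes and drop the escape characters.
        containers.append([seg for seg in split_escaped(raw, "/") if seg])
    return containers
```

For example, `parse_container_names("AreaName/LibraryName/Folder21, LibraryName")` yields two containers, the first with three path segments and the second with one.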

Table 7-6 summarizes the supported values for the configuration parameters of the SharePoint Crawler connector.

Table 7-6 Supported Values for SharePoint Source Parameters

Parameter Name	SharePoint Portal Server	MOSS 2007
SharePoint Version	2003, 2.0	2007, 3.0
Container name	(/) for full site, Library Name, List Name, Area Name	(/) for full site, Library Name, List Name
Attribute list	`AttributeName1`, `AttributeName2`	`AttributeName1`, `AttributeName2`
Domain Name	Domain name of the user	Domain name of the user
User name	Valid administrator user for SharePoint Portal server	Valid administrator user for MOSS 2007
Password	Password for the user	Password for the user
Authentication attributes	`USER_NAME`	`USER_NAME`
SPS Site/Sub-Site URL	IP address or host name with port on which SharePoint Portal Server is installed	IP address or host name with port on which MOSS 2007 is installed
Crawl Security Settings	`NORMAL`, `RELAX`	`NORMAL`, `RELAX`, `STRICT`
Simple Include	Part of URL	Part of URL
Simple Exclude	Part of URL	Part of URL
Regular Expression Include	All URLs that match the expression	All URLs that match the expression
Regular Expression Exclude	All URLs that match the expression	All URLs that match the expression
Crawl versions	`true` or `false`	`true` or `false`
Crawl folder attributes	`true` or `false`	`true` or `false`
Crawl attachments	`true` or `false`	`true` or `false`
LDAP URL	URL of the LDAP server	URL of the LDAP server
LDAP Search Base	LDAP Search Base	LDAP Search Base
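
The version-dependent values in Table 7-6 can be checked before a source is created. The sketch below is a hypothetical pre-flight check in Python, not an Oracle SES API. It encodes two rules from this section: STRICT is listed only for MOSS 2007, and any flag value other than `true` is interpreted as false. The exact-match comparison after trimming whitespace is an assumption, because the text does not say whether the value is case sensitive.

```python
# Crawl Security Settings allowed for each SharePoint Version value (Table 7-6).
SECURITY_MODES = {
    "2003": {"NORMAL", "RELAX"},             # SharePoint Portal Server
    "2.0": {"NORMAL", "RELAX"},
    "2007": {"NORMAL", "RELAX", "STRICT"},   # MOSS 2007
    "3.0": {"NORMAL", "RELAX", "STRICT"},
}


def parse_flag(value):
    """Interpret Crawl versions / folder attributes / attachments values.

    Assumption: only the exact string "true" enables the flag.
    """
    return value.strip() == "true"


def check_security_mode(version, mode):
    """Return mode if Table 7-6 lists it for the version, else raise ValueError."""
    allowed = SECURITY_MODES.get(version)
    if allowed is None:
        raise ValueError(f"unknown SharePoint version: {version!r}")
    if mode not in allowed:
        raise ValueError(f"{mode!r} is not listed for SharePoint version {version}")
    return mode
```

With these helpers, `check_security_mode("2003", "STRICT")` raises `ValueError`, which mirrors the table: STRICT appears only in the MOSS 2007 column.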

Table 7-7 Attributes for List Items and Versions Crawled for SharePoint 2007

List Item Type	Attributes
Document Library	Title, Author, Created, Modified
Picture Library	Title, ImageSize, ImageCreateDate, Description, Keywords
Form Library	Title, Author, Created, Modified
Translation Library	Title, Name, Language, Base Document Version, Translation Status, Created
Data Connection Library	Connection Type, Description, Keywords, Title, UDC Purpose, Created
Slide Library	Name, Presentation, Description, Created
Report Library	Name, Title, Author, Created, Report Category, Report Status
Dash Board	Name, Title, Author, Created
Wiki Page Library	Title, Author, Created, Modified
Announcements	Title, Body, Editor, Modified, Author, Created
Contacts	Company, WorkCity, Created, Email, Comments, Title, Editor, HomePhone, JobTitle, Modified, WorkZip, WorkPhone, WorkState, FirstName, Author, FullName, WorkCountry, CellPhone, WorkFax, WorkAddress
Links	Comments, Editor, Modified, Author, URL, Created
Discussion Reply	Body, Created, DiscussionTitle, Editor, Modified, Author
Calendar	EventType, Title, EventDate, Duration, Editor, WorkspaceLink, Modified, EndDate, Description, fRecurrence, Author, fAllDayEvent, Created
Task	Title, StartDate, Body, Status, Editor, Priority, AssignedTo, DueDate, Modified, Author, PercentComplete, Created
Project Task	Title, StartDate, Body, Status, Editor, Priority, AssignedTo, DueDate, Modified, Author, PercentComplete, Created
Issue Tracking	Category, LinkIssueIDNoMenu, RelatedIssues, IssueID, Priority, DueDate, Comment, V3Comments, IsCurrent, Created, Title, Status, Editor, AssignedTo, Modified, Author
Custom List	Title, Editor, Modified, Author, Created
Languages and Translators	Language_x0020_From, Language_x0020_To, Modified, Author, Translator, Created, Editor
KPI List	Title, PercentExpression, Editor, ViewGuid, Modified, Value, AutoUpdate, KpiComments, Author, Goal, ValueExpression, Warning, KpiDescription, DataSource, LowerValuesAreBetter, Created

Deploying the Web Service on MOSS 2007

For MOSS 2007, if the Crawl Security Settings parameter is set to STRICT, then you must install an extra web service, Oracle MOSS Web Service. The following installation and deinstallation files are provided by the OracleMOSSService installer at ORACLE_HOME/search/lib/plugins/sps/WebService.zip:

OracleMossService.wsp
install.cmd
de-install.cmd

To install or deinstall the Oracle MOSS Web Service:

Click install.cmd to install, or click de-install.cmd to deinstall.
Verify that the STSADM.exe file is in the following location: Drive:\Program Files\Common Files\Microsoft Shared\web server extensions\12\BIN.

If STSADM.exe is not in that folder, specify the correct path when the installer prompts for it.
Press any key to continue.

Setting Up Open Text Livelink Sources

Livelink data is stored in Workspaces, which in turn can contain folders, files, projects, and task lists. A Livelink Enterprise Server instance can have one or more Workspaces that can be crawled. Oracle SES navigates through the Workspaces to crawl all the objects in Livelink Enterprise Server. It creates an index, stores the metadata, and accesses information in Oracle SES to provide search according to the end user permissions.

Important Notes for Open Text Livelink Sources

The Livelink crawler plug-in must use the administrator account of the container to crawl and index documents.
The Livelink Enterprise Server version must be 9.2, 9.5.0, or 9.5.5.

Required Tasks

Because Open Text Livelink software is not included with Oracle SES, certain files must be copied manually into Oracle SES. Copy the lapi.jar file from the LAPI installation folder into ORACLE_HOME/search/lib/plugins/llcs.

If users and groups are imported from an LDAP server and you want to use the Active Directory identity plug-in, then the Directory Services module of Livelink must be installed with Livelink.

To import users and groups of Active Directory into Livelink Server:

Create an LDAP user that has permission in Active Directory to administer users and groups. This user synchronizes the Active Directory with Livelink.
To extend the schema of Active Directory, install the Active Directory Schema snap-in:
1. Select Run from Windows Start menu.
2. Type mmc /a in the Open field and click OK.
3. On the Console menu, choose Add/Remove Snap-in and click Add.
4. Under Snap-in, double-click Active Directory Schema. Click Close, then OK. Save the console (for example, as "Active Directory Schema.msc"). If the new snap-in does not appear under Snap-in, then you may have to re-install the Windows 2003 Administrative Tools and start again at step 2.
Open the following file in a text editor:

livelink_home/module/directory_2_3_0/ot-livelink-schema.conf
Open the Active Directory Schema console from the Windows All Programs menu. The console has a name such as Active Directory Schema.msc.
Right-click Active Directory Schema and select Operations Master.
Right click the Attributes folder and select Create Attribute.

Create the attribute llserverinfo using the information from ot-livelink-schema.conf, as shown in Table 7-8.

Table 7-8 llserverinfo Values

Name	Value
Common Name	llserverinfo
LDAP Display Name	llserverinfo
Object ID	`OID` from `ot-livelink-schema.conf`
Syntax	Case-insensitive string
Multivalued	Selected

Create the attribute llquery using the information from ot-livelink-schema.conf as follows:

Table 7-9 llquery Values

Name	Value
Common Name	llquery
LDAP Display Name	llquery
Object ID	`OID` from `ot-livelink-schema.conf`
Syntax	Case-insensitive string
Multivalued	Deselected

Browse through the Directory Services Administration section of the Livelink Administration page to enable the following configuration.
1. To enable the Synchronization Features:
  
  Click the Choose Directory Services link.
  
  Select LDAP Synchronization (Read-Only LDAP) from the Synchronization list.
  
  For Livelink CGI Hosts, specify 127.0.0.1,Livelink_Server_IP
  
  Click Save Changes.
2. To configure LDAP Read-Only Parameters, set the parameters described in Table 7-10.
  
  Click Save Changes.
3. Click Synchronize LDAP Read-only.
  
  Click Synchronize.

Table 7-10 LDAP Read-Only Parameters

Parameter	Value
New User Password Policy	Hidden
User name Case Sensitivity	Preserve case
Livelink Server Name	Computer name on which Livelink Server is running
LDAP Server	Computer name or IP Address on which LDAP server is running
LDAP Server Port	389
Search Root	cn=Users,dc=otdomain,dc=com
LDAP User name	cn=<LDAP_User_Name>,cn=Users, dc=otdomain,dc=com
LDAP Password	<LDAP_User_Password>
Log-in Name	sAMAccountName or cn
First Name	givenname
Last Name	sn
Title	title
E-mail	mail
Contact	telephonenumber
Department Mapping	disable
Group Name	cn
Group Leader	managedBy
Group Member	Member
Group Member Query	llquery
Privileges	Select Log-in enabled, Public Access
Group Search Filter	objectclass=group
Synchronize Group	checked

Known Issues

If you update the attribute list, then you must update the crawler re-crawl policy to Process All Documents on the Home - Schedules - Edit Schedules page, and re-crawl the source.

Setting Up Identity Management for Open Text

The Livelink Enterprise Server identity plug-in authenticates native users of Livelink Enterprise Server. The identity plug-in communicates with the directory to authenticate a user's credentials, validate a user or group and return the associated canonical form, and return the groups associated with a given user.

Activate the identity plug-in on the Global Settings - Identity Management Setup page, as described in "Activating the Active Directory Identity Plug-in".

Creating an Open Text Livelink Source

Create an Open Text source on the Home - Sources page. Select Open Text from the Source Type list, and click Create. Enter values for the following parameters:

User name: Name of a valid Livelink Enterprise Server user. The user must be an Administrator user or a user who has access to all folders and documents of the workspaces configured in the Container name parameter. The user must be able to retrieve content, metadata, and ACLs from folders, documents, and other custom subclasses of all workspaces configured in the Container name parameter. Required.
Password: Password of the Livelink user. Required.
Server Name and Port Number for Livelink: The computer name/IP address and the port number on which Livelink server is running. The format is ServerName:port.
Container name: The names of the containers to be crawled by Oracle SES. You can crawl an entire Livelink Workspace or a specific folder. The format is WorkspaceName/FolderName/SubFolderName. You can enter multiple comma-delimited container names. Required.

For example:
- Container name: Workspace1: The entire Workspace1 is crawled.
- Container name: Workspace2/Folder21: Folder21 and its sub-folders within Workspace2 are crawled.
Attribute list: The comma-delimited list of Livelink attributes, along with their data types, to make searchable. The format of an attribute list is AttributeName:AttributeType, AttributeName:AttributeType. Valid attribute types are String, Number, and Date. Optional.

Table 7-11 shows equivalent Open Text and Oracle SES data types. The crawler indexes an attribute only if both its name and type match the configured name and type; otherwise, the attribute is ignored. Multiple attributes with the same name are not allowed, for example, Emp_ID:String, Emp_ID:Number.

The default searchable attributes for Livelink Enterprise Server are Modified Date, Title, and Author.

For example: Consider the following Livelink attributes:
- Attribute name: account name attribute type: String
- Attribute name: account ID attribute type: Integer
- Attribute name: creation date attribute type: Date
For these attributes to be searchable, the value of Attribute List must be:

Account Name:String, Account ID:Number, Creation Date:Date
Crawl versions: Controls whether multiple versions of documents are crawled. Valid values are true and false. The default value is false. Any other value is interpreted as false, and only the latest versions of the documents are crawled. Optional.
Crawl folder attributes: Controls whether folder attributes are crawled. Valid values are true and false. The default value is false. Any other value is interpreted as false. Optional.
Authentication attribute: The attribute used to set ACL. With Active Directory, the value is USER_NAME. With the Livelink identity plug-in, the value is NATIVE. Required and case sensitive.
Crawl objects with public access: Controls whether objects with public access are crawled without an ACL. Valid values are true and false. When false, all objects with this ACL are ignored.
Livelink URL: The Livelink URL for viewing objects from the Livelink Server. For example, for Windows, the URL must be

http | https://host/livelink_service/livelink.exe.

For other application servers like WebLogic, Tomcat, and WebSphere, the URL must be

http | https://host:port/livelink_service/livelink.

Table 7-11 Open Text Data Types

Sr. No	Open Text Data Type	Oracle SES Data Type
1	Boolean	String
2	Integer	Number (Big Decimal)
3	String	String
4	Date	Date
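
The attribute list format and the type mapping in Table 7-11 can be expressed as a short Python sketch. It is a hypothetical validator, not Oracle SES code: `parse_attribute_list` and `is_indexed` are illustrative names, and the exact, case-sensitive name comparison is an assumption that the text does not confirm.

```python
VALID_SES_TYPES = {"String", "Number", "Date"}

# Open Text data type -> Oracle SES data type (Table 7-11).
LIVELINK_TO_SES = {
    "Boolean": "String",
    "Integer": "Number",
    "String": "String",
    "Date": "Date",
}


def parse_attribute_list(value):
    """Parse 'Name:Type, Name:Type' into a dict, rejecting duplicate names."""
    attributes = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        name, sep, type_name = item.rpartition(":")
        name, type_name = name.strip(), type_name.strip()
        if not sep or not name:
            raise ValueError(f"expected AttributeName:AttributeType, got {item!r}")
        if type_name not in VALID_SES_TYPES:
            raise ValueError(f"unsupported attribute type: {type_name!r}")
        if name in attributes:
            raise ValueError(f"duplicate attribute name: {name!r}")
        attributes[name] = type_name
    return attributes


def is_indexed(configured, name, livelink_type):
    """True only if the name is configured and its mapped type matches."""
    expected = LIVELINK_TO_SES.get(livelink_type)
    return expected is not None and configured.get(name) == expected
```

For instance, with `Account ID:Number` configured, a Livelink attribute named Account ID of type Integer matches because Integer maps to Number, whereas the same attribute reported as String would be ignored.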

Setting Up Oracle Content Database Sources

Documents in Oracle Content Database are organized into folders. Oracle SES navigates the folder hierarchy to crawl all documents in Oracle Content Database. It creates an index, stores the metadata, and accesses information in Oracle SES to provide search according to the end users' permissions.

The metadata crawled includes folder_url (URL of the folder containing the document) and folder_path (path of the folder containing the document). These let you show the direct folder path and direct folder URL for each document hit.

Oracle SES supports incremental crawling; that is, it only crawls and indexes documents that have changed since the last crawling. A document is re-crawled if either the content or the direct security access information of the document changes. A document is also re-crawled if it is moved within Oracle Content Database and the end user has to access the same document with a different URL. Deleted documents are removed from the index during incremental crawling.

Important Notes for Oracle Content Database Sources

This book uses the product name Oracle Content Database to mean both Oracle Content Database and Oracle Content Services. Oracle Content Database sources are certified with Oracle Content Database release 10.2 and release 10.1.3 and Oracle Content Services release 10.1.2.3.

Known Issues:

The administrator account used by the Oracle Content Database source must have the ContentAdministrator role on the site that is being crawled and indexed. Also, end users searching documents in Oracle Content Database must have the GetContent and GetMetadata permissions.
By default, Oracle Content Database has a limit of three concurrent requests (simultaneous operations) for each user. However, Oracle SES has a default of five concurrent crawler threads. When crawling Oracle Content Database, only three of the five threads can successfully crawl, which causes the crawl to fail.

Workaround: For an Oracle Content Database source, change the Number of Crawler Threads on the Home - Sources - Crawling Parameters page to a value of 3 or fewer.

Or, modify the Oracle Collaboration Suite configuration in Oracle Enterprise Manager to allow more than three concurrent requests. For example:
1. Access the Enterprise Manager page for the Collaboration Suite Midtier. For example: http://example.domain:1156/.
2. Click the Oracle Collaboration Suite midtier standalone instance name. For example: ocsapps.example.domain.
3. In the System Components table, click Content.
4. From Administration, click Node Configurations.
5. In the Node Configurations table, click HTTP_Node. For example: ocsapps.computer.domain_HTTP_Node.
6. On Properties, change the value for Maximum Concurrent Requests Per User. Enter a value larger than or equal to the number of crawling threads used by Oracle SES. This value is listed on the Global Settings - Crawler Configuration page.

Setting Up Identity Management for Oracle Content Database Sources

The Oracle SES instance and the Oracle Content Database instance must be connected to the same or mirrored Oracle Internet Directory system or other LDAP server.

To set up a secure Oracle Content Database source:

Read "Known Issues:" and confirm that the number of crawler threads does not exceed the available concurrent connection settings for each user in Oracle Content Database.
Activate the Oracle Internet Directory identity plug-in for the Oracle Content Database instance on the Global Settings - Identity Management Setup page in Oracle SES.
For Oracle Content Database 10.1.2.3 and 10.2.0.4, use the following LDIF file to create an application entity for the plug-in. (An application entity is a data structure within LDAP used to represent and keep track of software applications accessing the directory with an LDAP client.)
```
ORACLE_HOME/bin/ldapmodify -h oidHost -p OIDPortNumber -D "cn=orcladmin" -w password -f  ORACLE_HOME/search/config/ldif/csPlugin.ldif
```
This defines the entity that is used for the connector: orclApplicationCommonName=ocsCsPlugin,cn=ifs,cn=products,cn=oraclecontext. The entity has the password welcome1.

Creating an Oracle Content Database JDBC Source

The Content Database JDBC connector is an alternative to the Content Database connector provided in Oracle SES Release 10.1. The JDBC connector greatly improves the performance of incremental crawls. If the elapsed time of an incremental crawl is an important consideration in your deployment of Oracle SES, then use the JDBC connector.

Oracle SES crawler supports crawling from Oracle Content Database 10.1.2.0.4 or later. See the readme file for Oracle Content Database 10.2.1.0.4 patchset for details on configuring high volume full and incremental crawls in Oracle Content Database.

Note that it may be necessary to grant the SES user access to one of the Oracle Content Database objects. To do this, use the command:

GRANT SELECT ON ODMC_ALERT_SEQ TO sesuser

where sesuser is the SES user.

For example,

GRANT SELECT ON ODMC_ALERT_SEQ TO eqsys

Note:

The JDBC connector requires installation of a patch to Oracle Content Database. If the patch is not available for your version of Content Database, then use the older connector as described in "Creating an Oracle Content Database Source".

To create an Oracle Content Database JDBC source:

Open the Oracle SES Administration GUI to the Home page.
Select the Sources secondary tab.
For Source Type, select Oracle Content Database (JDBC), then click Create to display Step 1 Parameters.
Enter a source name and the values for the parameters described in Table 7-12.
Click Next to display Step 2 Authorization.
Enter the settings described in Table 7-13.
Click Create or Create and Customize to create the source.

Table 7-12 Oracle Content Database JDBC Source Parameters (Step 1)

Parameter	Value
Database Connection String	JDBC connection string to Oracle Content Database in the form `jdbc:oracle:thin:@server:port:sid`. For example, `jdbc:oracle:thin:@example.com:1521:rel11g`.
Content DB System User	SYSTEM user for Content Database.
Alert Table Name	Name of the Alert table for Content Database, which typically has the form `ODMC_ALERT_name`.
Database User ID for Crawl	Valid user ID for the Content DB database.
Database Password for Crawl	Password associated with the user ID for crawling.
Document Count	Maximum number of documents to be crawled.
URL Prefix	URL to Oracle Content Database in the form `HTTP://hostname:port/CONTENT`. For example, `HTTP://example.com:7778/CONTENT`.
Document Access (DAV) User ID	Valid Content Database user ID for using WebDAV to access documents.
Document Access (DAV) Password	Password associated with the DAV user ID.
Starting Path for Crawl	Full path where the crawl starts. Enter `/` to crawl the entire Content Database hierarchy.
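
A malformed connection string is an easy mistake to make when filling in these parameters. The following Python sketch builds and checks a string in the server:port:sid form. It assumes the standard Oracle thin-driver syntax, which places a colon before the `@` sign (`jdbc:oracle:thin:@server:port:sid`). The function names are hypothetical and are not part of Oracle SES.

```python
import re

# Standard Oracle thin-driver URL with host, port, and SID.
_THIN_URL = re.compile(
    r"^jdbc:oracle:thin:@(?P<host>[^:/@]+):(?P<port>\d+):(?P<sid>[^:/@]+)$"
)


def build_thin_url(host, port, sid):
    """Assemble a thin-driver connection string from its three parts."""
    return f"jdbc:oracle:thin:@{host}:{port}:{sid}"


def parse_thin_url(url):
    """Return (host, port, sid), or raise ValueError if the form is wrong."""
    match = _THIN_URL.match(url)
    if match is None:
        raise ValueError(f"not a thin-driver host:port:sid URL: {url!r}")
    return match["host"], int(match["port"]), match["sid"]
```

Calling `parse_thin_url` on a value before entering it in the Administration GUI catches a missing colon or port early, instead of at crawl time.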

Table 7-13 Oracle Content Database JDBC Authorization Parameters (Step 2)

Parameter	Value
Authorization Database JDBC Connection String	JDBC connection string to Oracle Content Database in the form `jdbc:oracle:thin:@server:port:sid`. For example, `jdbc:oracle:thin:@example.com:1521:rel11g`.
Content DB System User	System user for Content Database, such as `CONTENT` or `IFS_SYS`.
Database User ID	User ID to connect to the database.
Database Password	Password associated with the database user ID.
Use the Run-Time Result Filter	Controls use of a final security check: `TRUE`: Performs a final security check on each row in the result set. `FALSE`: Does not do a final check. (Default)
Authorization User ID Format	Format of the user ID in the authorization query. Enter a supported authentication attribute of the active identity plug-in, such as `nickname`.

Creating an Oracle Content Database Source

If Oracle Content Database release 10.2 or Oracle Content Services release 10.1.2 is used, then the Entity name and Entity password parameters are required, the last six parameters related to the keystore are not required, and the crawler plug-in uses service-to-service (S2S) authentication to connect to Oracle Content Database.

If Oracle Content Database release 10.1.3 is used, then the last six parameters in the following table are required, the Entity name and Entity password are not required, and Oracle SES uses Web services authentication to connect to Oracle Content Database. See "Required Tasks for Oracle Content Database Release 10.1.3".

Create an Oracle Content Database source on the Home - Sources page. Select Oracle Content Database from the Source Type list, and click Create.

Enter values for the parameters listed in Table 7-14.

Table 7-14 Oracle Content Database Source Parameters

Parameter	Value
Oracle Content Database URL	`http://host name:port/content`
Starting paths	/
Depth	-1
Oracle Content Database admin user	`orcladmin`
Entity name	`orclApplicationCommonName=ocsCsPlugin,cn=ifs,cn=products,cn=oraclecontext`
Entity password	welcome1
Crawl only	`false`
Use e-mail for authorization	`false`
Oracle Content Database Version	For example, 10.1.3.2.0
SES keystore location	For example, /scratch/ocs/cdb/cdb-ses/keystore/sesClientKeystore.jks
SES keystore type	jks
SES keystore password	*******
SES private key alias	client
SES private key password	*******
CDB Server public key alias	server

Table 7-15 Oracle Content Database Authorization Manager Plug-in Parameters

Parameter	Value
Oracle Content Database URL	http://host name:port/content
Oracle Content Database admin user	orcladmin
Entity name	`orclApplicationCommonName=ocsCsPlugin,cn=ifs,cn=products,cn=oraclecontext`
Entity password	welcome1
Use e-mail for authorization	`false`
Use result filter for authorization	`false` You can use a real-time result filter (query-time authorization) to ensure that the user has access to each result document. Set this parameter to `true` to remove documents that the user has lost access to since the last crawl.
Oracle Content Database Version	For example, 10.1.3.2.0
SES keystore location	For example, /scratch/ocs/cdb/cdb-ses/keystore/sesClientKeystore.jks
SES keystore type	jks
SES keystore password	********
SES private key alias	client
SES private key password	*******
CDB Server public key alias	server

Required Tasks for Oracle Content Database Release 10.1.3

This section describes the required steps for Web services authentication when using Oracle Content Database release 10.1.3. This procedure uses the JDK keytool to create the keys.

See Also:

"Setting Up a Server Keystore for WS-Security" in the Oracle Fusion Middleware Administrator's Guide for Oracle Universal Online Archive at http://download.oracle.com/docs/cd/B32110_01/content.1013/b32191/security.htm#CHDGCJEH

Configure a server keystore at the Oracle Content Database middle tier if the keystore is not set up yet.

The file ORACLE_HOME/j2ee/OC4J_Content/config/oc4j.properties defines the keystore type and the keystore properties file location. If you use a different file name for the keystore, then edit the following entry in that file:

oracle.ifs.security.KeyStoreLocation=/home/oracle/product/10.1.3.2.0/OracleAS_1/content/settings/server-keystore.jks
1. Change to the settings directory:
```
cd Oracle_home/content/settings 
```
2. Create the Oracle Content Database server keystore with the following keytool command, substituting a secure password for password.
```
Oracle_home/jdk/bin/keytool -genkey -keyalg RSA -validity 5000 \
  -alias server -keystore server-keystore.jks -dname "cn=server" \
  -keypass password -storepass password
```
  To list the keys in the store:
```
Oracle_home/jdk/bin/keytool -list -keystore server-keystore.jks \
  -keypass password -storepass password
```
3. Sign the key before using it:
```
Oracle_home/jdk/bin/keytool -selfcert -validity 5000 -alias server \
  -keystore server-keystore.jks -keypass password -storepass password
```
4. Export the server public key from the server keystore to a file:
```
Oracle_home/jdk/bin/keytool -export -alias server \
  -keystore server-keystore.jks -file cdbServer.pubkey \
  -keypass password -storepass password
```
5. Store both the keystore password and the private server key password in a secure location so Oracle Content Database can access the keystore and the private key.
```
Oracle_home/content/bin/changepassword -k
```
  When prompted for the old password, press [Enter] if this is the first time you are setting the password; otherwise, enter the previous password. Then, enter and confirm the keystore password (-storepass password) that you provided when you created the server keystore.
  
  See ORACLE_HOME/content/log/changepassword.log.

Configure a client keystore at the Oracle SES installation.

Create the SES client keystore with the following keytool command, substituting a secure password for password:

Oracle_home/jdk/bin/keytool -genkey -keyalg RSA -validity 5000 \
  -alias client -keystore sesClientKeystore.jks -dname "cn=client" \
  -keypass password -storepass password

To list the keys in the store:

Oracle_home/jdk/bin/keytool -list -keystore sesClientKeystore.jks \
  -keypass password -storepass password

Sign the key before using it:

Oracle_home/jdk/bin/keytool -selfcert -validity 5000 -alias client \
  -keystore sesClientKeystore.jks -keypass password -storepass password

Restart the WebCenter middle tier from the Oracle Enterprise Manager console.

Export the client public key from the client keystore to a file:

Oracle_home/jdk/bin/keytool -export -alias client \
  -keystore sesClientKeystore.jks -file sesClient.pubkey \
  -keypass password -storepass password

Import Oracle SES client public keys into the Oracle Content Database server keystore (sesClient.pubkey must be copied to Oracle Content Database):

cd Oracle_home/content/settings
 
Oracle_home/jdk/bin/keytool -import -alias client \
  -file sesClient.pubkey -keystore server-keystore.jks \
  -keypass password -storepass password

Import Oracle Content Database server public keys into the Oracle SES keystore (cdbServer.pubkey must be copied to Oracle SES):

Oracle_home/jdk/bin/keytool -import -alias server \
  -file cdbServer.pubkey -keystore sesClientKeystore.jks \
  -keypass password -storepass password

Note:

Check the server logs at ORACLE_HOME/content/logs for keystore issues with the crawler plug-in.

Oracle Content Database Source Attributes

Oracle SES crawls the following attributes for Oracle Content Database Sources:

AUTHOR
CREATE_DATE
DESCRIPTION
FILE_NAME
LASTMODIFIEDDATE
LAST_MODIFIED_BY
TITLE
MIMETYPE
ACL_CHECKSUM: The check sum calculated over the ACL submitted for the document.
DOCUMENT_LANGUAGE: Oracle SES language code taken from Oracle Content Database language string. For example, if Oracle Content Database uses "American", then Oracle SES submits it as "en-us".
DOCUMENT_CHARACTER_SET: The character set for the Oracle Content Database document.

Oracle SES also can search categories or customized attributes created by the user in Oracle Content Database.

You can apply categories to files and links, and divide categories into subcategories having one or more attributes. When a document in Oracle Content Database is attached to a category, you can search on the attributes of that category. (The attributes appear in the list of search attributes.)

For example, suppose you create a category named testCategory with the attributes testAttr1 and testAttr2. Document X is created and assigned to testCategory. You must assign values to the testCategory attributes. After crawling, testAttr1 and testAttr2 appear in the search attribute list.

Customized attribute values can be of the following types: String, Integer, Long, Double, Boolean, Date, User, Enumerated String, Enumerated Integer, and Enumerated Long. Oracle SES indexes them as follows:

Long, Double, Integer, Enumerated Integer, and Enumerated Long customized attributes are indexed as Number attributes in Oracle SES. The display name has an _N suffix.
Date customized attributes are indexed as Date attributes in Oracle SES. The display name has a _D suffix.
String, Enumerated String, and User customized attributes are indexed as String attributes in Oracle SES.
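
The type mapping above can be summarized in a short sketch. The helper name `ses_attribute` and the lookup sets are hypothetical and are not part of any Oracle SES API. The sketch assumes that String attributes keep their name unchanged, because no suffix is documented for them, and it raises an error for Boolean because the list above does not say how Boolean values are indexed.

```python
# Illustrative only: mirrors the documented type mapping.
# ses_attribute() is a hypothetical helper, not an Oracle SES API.

NUMBER_TYPES = {"Long", "Double", "Integer", "Enumerated Integer", "Enumerated Long"}
DATE_TYPES = {"Date"}
STRING_TYPES = {"String", "Enumerated String", "User"}


def ses_attribute(name, content_db_type):
    """Return (display_name, ses_type) for a customized attribute."""
    if content_db_type in NUMBER_TYPES:
        return name + "_N", "Number"
    if content_db_type in DATE_TYPES:
        return name + "_D", "Date"
    if content_db_type in STRING_TYPES:
        # Assumption: no suffix is documented for String attributes.
        return name, "String"
    # Types without a documented mapping (for example, Boolean) are not guessed.
    raise ValueError("no documented mapping for type: " + content_db_type)
```

For example, `ses_attribute("testAttr1", "Long")` yields the display name `testAttr1_N` with type `Number`.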

Limitations on Custom Attributes for Oracle Content Database

The Oracle Content Database SDK has more features than the Oracle Content Database Web GUI. The Web GUI does not support String arrays, but the SDK does. If you use the SDK to build customized administration and user GUIs that support the String array type, then a customized attribute can have multiple values.
If a document in Oracle Content Database is attached to a category and an attribute in that category is left blank, then that attribute is not available in the attribute list for an Advanced Search. The crawler skips attributes with null values. However, if another document has the same attribute with a real value, then the attribute is indexed.

Setting Up Oracle Content Server Sources

The Oracle Content Server connector enables Oracle SES to search Oracle Content Server (formerly Stellent Server), which is the foundation of the Oracle Universal Content Management solution. Users throughout the organization can contribute content from native desktop applications, manage content through rich library services, publish content to Web sites or business applications, and access the content with a browser.

The Content Server connector supports Oracle Content Server 7.5.2 or 10gR3 with XMLCrawlerExport (the Oracle Content Server RSS component).

Oracle Content Server includes an RSS feed generator component (XMLCrawlerExport) on top of the content server. This component generates RSS feeds as XML files from its internal indexer, based on indexer activity. It has access to the original content (for example, a Microsoft Word document), the Web viewable rendition, and all the metadata associated with each document. The component also has a template that contains an Idoc script that applies the metadata values from the indexer to generate the XML document. (Idoc is an Oracle Content Server proprietary scripting language.) Oracle Content Server generates feeds for all documents for the initial crawl, and feeds for updated and deleted documents for the incremental crawl. Each document can be an item in the feed, with the operation on the item (such as insert, delete, update), its metadata (such as author, summary), URL links, and so on.

The Oracle Content Server connector reads the feeds provided by Oracle Content Server according to a crawling schedule. Oracle SES parses and extracts the metadata information, and fetches the document content, using its generic RSS crawler framework.

Oracle SES supports the control feed method, in which individual feeds can be located anywhere and a control feed file is generated containing the links to the other feeds. This control file is input to the connector through the configuration file. A control feed must be used when the two computers are in different domains or on different platforms, or when they communicate through a remote access protocol such as HTTP or FTP.

See Also:

"Overview of XML Connector Framework"
Oracle Content Database page at http://www.oracle.com/technology/products/contentdb/index.html

Oracle Content Server Security Model

The Oracle Content Server security model is based on the concept of permissions, which defines the privileges a user has on a document. The following table shows the set of permissions supported by Oracle Content Server. Each permission is a superset of the previous ones. For example, Write permission includes Read permission. Admin permission is a superset of all the permissions.

Table 7-16 Oracle Content Server Permissions

Permission	Description
Read	View documents
Write	View, Check In, Check Out, and Get Copy of documents
Delete	View, Check In, Check Out, Get Copy, and Delete documents
Admin	View, Check In, Check Out, Get Copy, and Delete documents. An Administration user with Workflow rights can start or edit a workflow for the document. An Administration user can also check in documents with another user specified as the Author.

Oracle Content Server provides multiple security models, including an out-of-the-box security system and integration with centralized security models such as LDAP and Active Directory.

Oracle Universal Content Management security can work in these modes:

Universal Content Management native identity plugin where Universal Content Management is not connected to a directory
Oracle Internet Directory
Active Directory, only where Universal Content Management is connected to Active Directory using LDAP. A connection to Active Directory using Microsoft Security is not supported.

The Oracle SES Oracle Content Server connector supports the two most popular security models among current Oracle Content Server customers: Roles and Groups, and Accounts.

Roles and Groups

A security group is a set of files grouped under a unique name. Every file in the library belongs to a security group. Access to security groups is controlled by permissions, which are assigned to roles, which in turn are assigned to users. For example, the EngAdmin role has Read, Write, Delete, and Admin permission to all content in the EngDocs security group. User Joe is assigned to the EngAdmin role; therefore, Joe has all permissions to the documents in the EngDocs group.

Accounts

Accounts provide greater flexibility and granularity than groups. An account is a group of content. It introduces another metadata field that is filled out upon content check-in. When accounts are enabled, content items also can be assigned to an account in addition to the security group. A user must have access to the account to read, write, delete or administer content in that account. When accounts are used, the account becomes the primary permission to satisfy before security group permissions are applied.

A user's access to a document is effectively the intersection of their account permissions and their security group permissions. For example, a user is assigned the EngAdmin role, which has all permissions to the documents in the EngDocs security group. At the same time, the user is also assigned Read and Write permission to the EngProjA account. Therefore, the user has only Read and Write permission to a content item that is in the EngDocs security group and the EngProjA account.
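
The intersection described above can be sketched as follows. This is a simplified model for illustration, not Oracle Content Server code: the function names are hypothetical, and the sketch assumes that each permission level includes all the levels before it, as Table 7-16 describes.

```python
# Illustrative only: models the permission levels as nested sets.
# Each level includes every level that precedes it in this list.
LEVELS = ["Read", "Write", "Delete", "Admin"]


def permission_set(level):
    """Expand a permission level into the set of permissions it includes."""
    return set(LEVELS[: LEVELS.index(level) + 1])


def effective_permissions(group_level, account_level):
    """Intersect security group permissions with account permissions."""
    return permission_set(group_level) & permission_set(account_level)


# EngAdmin grants Admin on EngDocs; the user holds Write on EngProjA.
print(sorted(effective_permissions("Admin", "Write")))  # ['Read', 'Write']
```

Because the levels are nested, the intersection always equals the lower of the two levels.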

Accounts can also be set up in a hierarchical structure. A user has permission to the entire subtree starting from the account node. For instance, a user assigned to the Eng account has access to Eng/AbcProj and Eng/XyzProj, or any accounts beginning with Eng. In other words, users that have permission to a particular account prefix also have access to all accounts with that prefix.

Note:

Oracle Content Server uses a prefix test for account filtering, so a slash (/) has no special meaning. A user granted permission to account A has access to any documents in account A*, such as A, AB, or A/B. The hierarchical structure takes advantage of the prefix semantics, but it is enforced with the account model. Hence, there is no special character as the level divider when testing for account permissions.
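
A minimal sketch of the prefix test described in this note, using a hypothetical helper named `account_allows`. It illustrates only the matching rule and is not how Oracle Content Server implements it.

```python
# Illustrative only: account filtering as a plain string-prefix test.
# The slash has no special meaning; "A" matches "A", "AB", and "A/B" alike.

def account_allows(granted_accounts, document_account):
    """Return True if any granted account is a prefix of the document's account."""
    return any(document_account.startswith(granted) for granted in granted_accounts)


print(account_allows(["Eng"], "Eng/AbcProj"))  # True
print(account_allows(["A"], "AB"))             # True
print(account_allows(["Eng/AbcProj"], "Eng"))  # False
```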

See Also:

Oracle Universal Content Management documentation at

http://www.oracle.com/technology/products/content-management/ucm/index.html

Setting Up Identity Management for Oracle Content Server

To activate the Oracle Content Server identity plug-in:

On the Global Settings page, select Identity Management Setup under the System heading.

The Global Settings - Identity Management Setup page is displayed.
Select Oracle Content Server and click Activate.
Enter values for the parameters described in Table 7-17, then click Finish.

Table 7-17 Oracle Content Server Connector Setup Parameters

Parameter	Value
HTTP endpoint for authentication	HTTP endpoint for Oracle Content Server authentication. For example, `http://my.host.com:port/idc/idcplg`
Admin User	Administrative user who accesses the Oracle Content Server Identity Service API
Password	Administrative user password

Creating an Oracle Content Server Source

To create an Oracle Content Server source using the Oracle SES Administration GUI:

On the Home page, click the Sources secondary tab to display the Sources page.
Select Oracle Content Server from the Source Type list, then click Create to display Step 1 Parameters.
Enter values for the parameters described in Table 7-18.
Click Next to display Step 2 Authorization, then set values for the parameters described in Table 7-19.
Scroll down to Security Attributes to verify that ACCOUNT and DOCSECURITYGROUP are listed. If they are not, then the source was not created correctly. Verify that the Configuration URL in Step 1 is correct.
Click Create to create the Oracle Content Server source.

After processing each data feed, a status feed is uploaded to the location specified in the configuration file. This status feed is named one of the following:
- data_feed_file_name.suc indicates the data feed was processed successfully.
- data_feed_file_name.err indicates that an error was encountered while processing the feed. The errors are listed in this status feed.
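
The naming convention can be illustrated with a short sketch. Both helper names are hypothetical, and the sketch assumes that the suffix is appended to the complete data feed file name, which the text above suggests but does not state outright.

```python
# Illustrative only: derives and interprets status feed names.
# Assumption: the .suc or .err suffix is appended to the full data feed file name.

def status_feed_name(data_feed_file_name, succeeded):
    """Return the status feed name for a processed data feed."""
    suffix = ".suc" if succeeded else ".err"
    return data_feed_file_name + suffix


def feed_outcome(status_feed_file_name):
    """Classify a status feed file as 'success', 'error', or None if unrecognized."""
    if status_feed_file_name.endswith(".suc"):
        return "success"
    if status_feed_file_name.endswith(".err"):
        return "error"
    return None
```

A monitoring script could apply `feed_outcome` to each file in the status location and report only the feeds that ended in an error.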

Tip:

To index multibyte character sets, set the default character set of the crawler to UTF-8 regardless of the character set of Oracle Content Server. See "Modifying the Crawler Parameters".

Table 7-18 Oracle Content Server Source Parameters (Step 1)

Parameter	Value
Configuration URL	URL of the XML configuration file providing details of the source, such as the data feed type, location, security attributes, and so on. Obtain the location of the file from the Oracle Content Server administrator. Use the following format to enter the configuration URL: `http://host_name/instance_name/idcplg?IdcService=SES_CRAWLER_DOWNLOAD_CONFIG&source=source_name`
Authentication Type	Java authentication type. Set this parameter when the data feeds are accessed over HTTP. Enter one of the following values: `NATIVE` (proprietary XML over HTTP authentication) or `ORASSO` (Oracle Single Sign-on).
User ID	User ID to access the data feeds. The access details of the data feed are specified in the configuration file. Obtain a user ID from the Oracle Content Server administrator.
Password	Password for User ID. Obtain the password from the Oracle Content Server administrator.
Realm	Realm of the Oracle Content Server instance.
Oracle SSO Login URL	URL that protects all OracleAS Single Sign-on applications. Set this parameter when the Authentication Type is ORASSO.
Oracle SSO Action URL	URL that authenticates OracleAS Single Sign-on user credentials. The login form is submitted to this URL. Set this parameter when Authentication Type is ORASSO.
Scratch Directory	Directory where Oracle SES can write temporary status logs. The directory must be on the same system where Oracle SES is installed. Optional.
Maximum number of connection attempts	Maximum number of attempts to connect to the target server for access to the data feed.
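
As a rough sketch, the Configuration URL format and the ORASSO dependency from Table 7-18 could be checked before the values are entered in the administration GUI. The function names and the dictionary layout are hypothetical, and the host, instance, and source values in the usage note are placeholders.

```python
# Illustrative only: builds the Configuration URL in the documented format and
# reports SSO parameters that are required but missing. Not an Oracle SES API.
from urllib.parse import urlencode

SSO_PARAMETERS = ("Oracle SSO Login URL", "Oracle SSO Action URL")


def configuration_url(host_name, instance_name, source_name):
    """Return the configuration URL for the given placeholder values."""
    query = urlencode({
        "IdcService": "SES_CRAWLER_DOWNLOAD_CONFIG",
        "source": source_name,
    })
    return f"http://{host_name}/{instance_name}/idcplg?{query}"


def missing_sso_parameters(parameters):
    """List the SSO URLs that are unset although Authentication Type is ORASSO."""
    if parameters.get("Authentication Type") != "ORASSO":
        return []
    return [name for name in SSO_PARAMETERS if not parameters.get(name)]
```

For example, `configuration_url("example.com:7777", "idc", "mysource")` produces `http://example.com:7777/idc/idcplg?IdcService=SES_CRAWLER_DOWNLOAD_CONFIG&source=mysource`.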

Table 7-19 Oracle Content Server Connector Authorization Parameters (Step 2)

Parameter	Value
HTTP Endpoint for Authorization	HTTP endpoint for Oracle Content Server authorization, such as `http://example.com:7777/idc/idcplg`.
Display URL Prefix	HTTP host information to prefix the partial URL specified in the access URL of the documents in RSS feeds to form the complete URL. This complete URL is displayed as the URL when a user clicks the document link in the Oracle SES search results page. For example, you might display `http://example.com:7777/idc` (not `http://example.com/`, as shown on the user interface page).
Administrator User	Administrative user to access the Authorization Service API of Oracle Content Server.
Administrator Password	Administrative user password.
Display Crawled Version	Controls access to the crawled documents. `true`: search results point to the crawled version of the document. `false`: search results point to the content information page.
Authorization User ID Format	Format of the user ID used by the Oracle Content Server authorization API, such as `username`, `email`, `nickname`, `user_name`.
Use Cached User and Role Information to Authorize Results	Controls user authorization. `true`: uses the cached user query filter, which removes the query-time dependency on Oracle Content Server. `false`: queries Oracle Content Server for authorization.
User Role Data Source to Cache the Filter	The name of the Oracle Content Server Users source that has crawled the user's SecurityGroup and Account information.
Authentication Type	Java authentication type. Enter `NATIVE` for proprietary XML over HTTP authentication, or `ORASSO` for Oracle Single Sign-on. Set this parameter when the data feeds are accessed over HTTP.
Realm	Realm of the Oracle Content Server instance.
Oracle SSO Login URL	URL that protects all OracleAS Single Sign-on applications. Set this parameter when the Authentication Type is ORASSO.
Oracle SSO Action URL	URL that authenticates OracleAS Single Sign-on user credentials. The login form is submitted to this URL. Set this parameter when Authentication Type is ORASSO.
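
The role of the Display URL Prefix in Table 7-19 can be sketched as a simple join of the prefix and the partial URL from the feed. The helper name and the sample document path are hypothetical, and the slash handling is an assumption made so that the example stays well-formed; the exact concatenation rule used by the connector is not described here.

```python
# Illustrative only: forms a complete display URL from the Display URL Prefix
# and the partial access URL of a document in the RSS feed.
# Assumption: exactly one slash separates the prefix from the partial URL.

def display_url(display_url_prefix, partial_url):
    """Join the prefix and the partial URL into the URL shown in search results."""
    return display_url_prefix.rstrip("/") + "/" + partial_url.lstrip("/")


# The document path below is a made-up example.
print(display_url("http://example.com:7777/idc", "/groups/public/doc1.pdf"))
# http://example.com:7777/idc/groups/public/doc1.pdf
```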

Note:

In previous releases, the base path of Oracle SES was referred to as ORACLE_HOME. In Oracle SES release 11g, the base path is referred to as ORACLE_BASE. This represents the Software Location that you specify at the time of installing Oracle SES.

ORACLE_HOME now refers to the path ORACLE_BASE/seshome.

For more information about ORACLE_BASE, see "Conventions".