5 Configuring Access to Content Management Sources

This chapter contains the following topics:

Setting Up EMC Documentum Content Server Sources
Setting Up FileNet Content Engine Sources
Setting Up FileNet Image Services Sources
Setting Up Hummingbird Document Management Server Sources
Setting Up IBM DB2 Content Manager Sources
Setting Up Microsoft SharePoint Sources
Setting Up Open Text Livelink Sources
Setting Up Oracle Content Database Sources
Setting Up Oracle Content Server Sources

Setting Up EMC Documentum Content Server Sources

Documentum data is stored in DocBases, which can contain cabinets and folders. A Documentum Content Server instance can have one or more DocBases crawled with an EMC Documentum Content Server source. The Documentum Content Server source navigates through the DocBases and the inline cabinets to crawl all the documents in Documentum Content Server. Oracle SES creates an index, stores the metadata, and accesses information in Oracle SES to provide search according to the end user permissions.

Oracle SES supports incremental crawling; that is, it crawls and indexes only those documents that have changed since the most recent crawl. A document is re-crawled if its content, metadata, or direct security access information has changed. A document is also re-crawled if it is moved within Documentum Content Server, so that the end user must access the same document with a different URL. Documents deleted from a DocBase are removed from the index during incremental crawling.
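
The re-crawl rules above can be read as a small decision function. The following is a minimal sketch only, not the crawler's actual implementation: the DocSnapshot record, its hash fields, and the action names are hypothetical and are used purely to illustrate the rules.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DocSnapshot:
    """Hypothetical record of what one crawl observed for a document."""
    content_hash: str
    metadata_hash: str
    acl_hash: str   # direct security access information
    url: str        # changes when the document is moved


def incremental_action(previous: Optional[DocSnapshot],
                       current: Optional[DocSnapshot]) -> str:
    """Return 'index', 'recrawl', 'delete', or 'skip' for one document."""
    if current is None:
        # The document is gone from the DocBase: remove it from the index.
        return "delete" if previous is not None else "skip"
    if previous is None:
        # Never seen before: index it for the first time.
        return "index"
    # Any difference in content, metadata, ACL, or URL triggers a re-crawl.
    return "recrawl" if current != previous else "skip"
```

With this sketch, a document whose only change is its URL (because it was moved) is still re-crawled, which mirrors the rule stated above.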

Important Notes for EMC Documentum Content Server Sources

The admin account of a DocBase should be used by the Documentum source in Oracle SES for crawling and indexing documents of that DocBase.

Required Software

Documentum Content Server DA (Documentum Administrator) or Documentum Content Server WebTop application must be installed and configured.
Documentum Foundation Classes (DFC) must be installed on the server running Oracle SES.

Required Tasks

Because EMC Documentum Content Server software is not included with Oracle SES, certain files must be copied manually into Oracle SES.

The DFC installation asks for a destination directory and a user directory. On Windows, the default destination directory is C:\Program Files\Documentum and the default user directory is C:\Documentum. On UNIX, you must create the DFC program root and the DFC user root before installing DFC. For example, the DFC program root can be <USER HOME>/documentum_shared and the DFC user root can be <USER HOME>/documentum.

Copy the dfc.properties and DFC jar files from the following locations into ORACLE_HOME/search/lib/plugins/dcs.

dfc.jar
- Windows: <DFC destination directory>/shared/
- UNIX: <DFC destination directory>/dfc
dfcbase.jar
- Windows: <DFC destination directory>/shared/
- UNIX: <DFC destination directory>/dfc
dfc.properties
- Windows: <DFC user directory>/config/
- UNIX: <DFC user directory>/config/

For Windows 2003 Server, copy dmcl40.dll from <DFC destination directory>/shared/ to ORACLE_HOME/bin.

For UNIX platforms, copy the file according to the following table:

Table 5-1 DFC Files to Copy for UNIX Platforms

Platform	Copy File	From	To
Linux x86	`libdmcl40.so`	`<DFC destination directory>/dfc`	`ORACLE_HOME/lib`
Linux x86-64	`libdmcl40.so`	`<DFC destination directory>/dfc`	`ORACLE_HOME/lib32`
Solaris SPARC (64-bit)	`libdmcl40.so`	`<DFC destination directory>/dfc`	`ORACLE_HOME/lib32`
HP-UX PA-RISC (64-bit)	`libdmcl40.sl`	`<DFC destination directory>/dfc`	`ORACLE_HOME/lib32`
AIX 5L Based Systems (64-bit)	`libdmcl40.so`	`<DFC destination directory>/dfc`	`ORACLE_HOME/lib32`
HP-UX Itanium	`libdmcl40.so`	`<DFC destination directory>/dfc`	`ORACLE_HOME/lib32`
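
Table 5-1 can also be expressed as data, which may help when scripting the copy step. This is a sketch under stated assumptions: the function name dmcl_copy_plan and the example paths in the usage note are hypothetical, and the function only computes source and target paths without copying anything.

```python
from pathlib import PurePosixPath

# Table 5-1 as data: platform -> (library file, directory under ORACLE_HOME).
DMCL_COPY_TABLE = {
    "Linux x86": ("libdmcl40.so", "lib"),
    "Linux x86-64": ("libdmcl40.so", "lib32"),
    "Solaris SPARC (64-bit)": ("libdmcl40.so", "lib32"),
    "HP-UX PA-RISC (64-bit)": ("libdmcl40.sl", "lib32"),
    "AIX 5L Based Systems (64-bit)": ("libdmcl40.so", "lib32"),
    "HP-UX Itanium": ("libdmcl40.so", "lib32"),
}


def dmcl_copy_plan(platform, dfc_destination, oracle_home):
    """Return (source, target) paths for the DMCL library on a UNIX platform."""
    file_name, lib_dir = DMCL_COPY_TABLE[platform]
    source = PurePosixPath(dfc_destination) / "dfc" / file_name
    target = PurePosixPath(oracle_home) / lib_dir / file_name
    return source, target
```

For example, dmcl_copy_plan("Linux x86-64", "/home/ses/documentum_shared", "/u01/oracle/ses") yields the pair of paths to pass to a copy command such as cp or shutil.copy2.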

Note:

The environment variables $DOCUMENTUM_SHARED (DFC Program root) and $DOCUMENTUM (DFC user directory) must be created before installing DFC on UNIX.

You must declare DOCUMENTUM and DOCUMENTUM_SHARED before restarting the middle tier with searchctl restartall.

See the DFC installation guide for more information.
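
Before restarting the middle tier, it can be useful to confirm that both variables are visible to the process. The following sketch assumes nothing beyond the two variable names given in the note; the helper name missing_dfc_variables is hypothetical.

```python
import os

# Variable names taken from the note above.
REQUIRED_VARS = ("DOCUMENTUM", "DOCUMENTUM_SHARED")


def missing_dfc_variables(environ=None):
    """Return the required DFC variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_dfc_variables()
    if missing:
        print("Set these variables before restarting:", ", ".join(missing))
    else:
        print("DOCUMENTUM and DOCUMENTUM_SHARED are set.")
```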

On UNIX platforms only, push the DCS libraries to global libraries by adding the following lines to the oc4j/j2ee/OC4J_SEARCH/config/application.xml file:
```
<library path="../../../../search/lib/plugins/dcs/dfcbase.jar" /> 
<library path="../../../../search/lib/plugins/dcs/dfc.jar" /> 
<library path="../../../../search/lib/plugins/dcs" /> 
<library path="../../../../search/lib/log4j.jar" />
```
This assumes that the directory search/lib/plugins/dcs contains the Documentum Server configuration file dfc.properties.
Restart the middle tier with searchctl restart. On Windows, after installing DFC, also restart the Windows computer.
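
To verify that the four library entries were added, the file can be parsed and checked. This is a minimal sketch: it assumes only that the entries appear as library elements with a path attribute, as shown above, and the function name missing_library_entries is hypothetical.

```python
import xml.etree.ElementTree as ET

# The four entries listed above for UNIX platforms.
REQUIRED_LIBRARY_PATHS = [
    "../../../../search/lib/plugins/dcs/dfcbase.jar",
    "../../../../search/lib/plugins/dcs/dfc.jar",
    "../../../../search/lib/plugins/dcs",
    "../../../../search/lib/log4j.jar",
]


def missing_library_entries(application_xml_text):
    """Return required library paths that the XML text does not declare."""
    root = ET.fromstring(application_xml_text)
    declared = {lib.get("path") for lib in root.iter("library")}
    return [path for path in REQUIRED_LIBRARY_PATHS if path not in declared]
```

Pass the contents of application.xml as a string; an empty list means all four entries are present.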

Known Issues

In this release, search results cannot be viewed in Documentum desktop. The documents and folders can be viewed only using Documentum Administrator (DA) or Webtop applications.
For the Container name parameter, a value of repository name alone might not work. Enter a value of repository name/cabinet name. For example, <DocBase Name>/<Repository Name/Cabinet Name>/<Folder Name>/.

Setting Up Identity Management for EMC Documentum Content Server

Activate the identity plug-in on the Global Settings - Identity Management Setup page. Select Oracle Internet Directory identity plug-in and click Activate.

Enter values for the following parameters:

For Authentication Attribute, select nickname.
For Host name, enter the host name of the computer where Oracle Internet Directory is running.
For Port, enter the value 389 (the default LDAP port number).
For Use SSL, enter true or false.
For Realm, enter the Oracle Internet Directory realm; for example, dc=us,dc=oracle,dc=com.
For User name, enter the Oracle Internet Directory administrator user name; for example, cn=orcladmin.
For Password, enter the password for the user name.

A compatible version of Documentum Foundation Classes (DFC) must be installed on the computer running Oracle SES.

Import users/groups from Oracle Internet Directory to Documentum. First, create an LDAP Configuration Object in Documentum Administrator (DA):
1. Log in to DA.
2. Navigate to Administration - User Management - LDAP.
3. Click File - New - LDAP Configuration Object.
4. For Name, enter a name for the LDAP configuration object.
5. For User Subtype, select dm_user.
6. For Communication Mode, select Regular.
7. For Import, select Users and Groups.
8. In the Use this configuration object in the server field, select Default Configuration Object.
9. Click Next.
10. For Directory Type, select Oracle Internet Directory Server.
11. For Bind Type, select Bind by Searching for Distinguished Name.
12. For Binding Name, enter the Administrator user name of Oracle Internet Directory, normally cn=orcladmin.
13. For Binding Password, enter the Administrator password of Oracle Internet Directory.
14. For Host Name, enter the Oracle Internet Directory host name.
15. For Port, accept the displayed default value 389 (the default port number of Oracle Internet Directory).
16. For Person Object Class, enter the base person object class; typically, the value is inetOrgPerson.
17. For Person Search Base, enter the person search base defined in Oracle Internet Directory; for example, cn=Users,dc=us,dc=oracle,dc=com.
18. For Person Search Filter, specify cn=*.
19. For Group Object Class, enter the Group Object; typically, its value is groupOfUniqueNames.
20. For Group Search Base, enter the Group Search base defined in Oracle Internet Directory; for example, cn=Groups,dc=us,dc=oracle,dc=com.
21. For Group Search Filter, specify cn=*.
22. Click Next.
23. Attribute Map information is displayed. Click Finish.
Run the LDAP_Synchronization job:
1. Log in to DA.
2. Navigate to Administration - Job Management - Jobs.
3. Open the job dm_LDAPsynchronization.
4. For state, select Active.
5. Check the Deactivate On Failure check box.
6. For Designated Server, select the host name of Documentum Server.
7. Check the Run After Update check box.
8. Go to the Schedule tab.
9. For Start Date And Time, set the current date and time.
10. Select Repeat time from the Repeat list.
11. Set Frequency to any numeric value.
12. Select the End Date And Time radio button and specify how long the synchronization job should run.
13. Go to the Method tab.
14. Check the Pass Standard Argument check box.
15. Go to the SysObject info tab.
16. Click OK.
Add permissions to each folder and file to make them accessible to the search user. (Adding permissions to a folder automatically adds the same permissions to all files and sub-folders in the folder.) The following steps create a permission set and assign users/groups to that set. The same permission set is then assigned to documents. A document that is not stamped with a permission set is not crawled.

Create Access Control Lists (ACLs):
1. Log in to DA.
2. Navigate to Administration - Security.
3. Click File - New - Permission Set.
4. For Name, enter a name for the permission set.
5. Click Next.
6. Click Add to add more users/groups to the permission set.
7. Select the LDAP users/groups that are to be made part of the permission set and move them to the right frame using the arrow keys. Click OK.
8. Click Finish.
Assign ACLs to documents:
1. Log in to DA.
2. Navigate to the document where the permission set is to be applied.
3. Select the Properties icon of this document.
4. Go to the Permissions tab.
5. Click Select in front of Permission set name.
6. Search and select the permission set to be applied to the document.
7. Click OK.

It is important that the users/groups in the permission sets that are applied to the documents are LDAP users/groups. Those documents that do not have permission sets with LDAP users/groups will not be crawled.

Creating an EMC Documentum Content Server Source

Create an EMC Documentum Content Server source on the Home - Sources page. Select EMC Documentum Content Server from the Source Type list, and click Create. Enter values for the following parameters:

Container name: The names of the containers to be crawled by Oracle SES. You can crawl an entire Documentum DocBase or a specific repository/cabinet/folder. The format is <DocBase Name>/<Repository Name/Cabinet Name>/<Folder Name>/. Multiple comma-delimited container names can be entered. This parameter is case-sensitive; hence, the same cabinet name as in Documentum repository should be entered. This is a required parameter. For example:
- Container name: DocBase1: The entire DocBase1 will be crawled.
- Container name: DocBase2/Cabinet21: Cabinet21 and its sub-folders within DocBase2 will be crawled.
- Container name: DocBase2/Cabinet21/Folder11: Folder11 and its sub-folders will be crawled.
- Container name: DocBase1, DocBase2/Cabinet21/Folder11: The entire DocBase1 and Folder11 in DocBase2/Cabinet21 will be crawled.
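
The Container name format can be illustrated with a small parser. This is a sketch, not the plug-in's own parsing logic: the function name parse_container_names is hypothetical, and it assumes that container names themselves contain no commas. Case is preserved because the parameter is case-sensitive.

```python
def parse_container_names(value):
    """Split a Container name value into (docbase, path) pairs.

    An empty path means the entire DocBase is crawled.
    """
    containers = []
    for entry in value.split(","):
        # Drop surrounding whitespace and the optional trailing slash.
        entry = entry.strip().strip("/")
        if not entry:
            continue
        docbase, _, path = entry.partition("/")
        containers.append((docbase, path))
    return containers
```

For the last example above, the sketch returns one pair for the whole of DocBase1 and one pair for Cabinet21/Folder11 inside DocBase2.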

Attribute list: The comma-delimited list of Documentum attributes, along with their data types, to be made searchable. The format is <attribute name>:<attribute type>, <attribute name>:<attribute type>. Valid attribute types are String, Number, and Date.

Table 5-2 Documentum Data Type Mapping

Sr. No	Documentum Data Type	Oracle SES Data Type
1	Boolean	Number
2	Integer	Number
3	String	String
4	ID	String
5	Time or Date	Date
6	Double	Number

While crawling a DocBase, an attribute is indexed only if both name and type match the configured name and type; otherwise, it is ignored. This is an optional parameter. For example: To make the following Documentum attributes searchable:

Attribute name: account name attribute type: String
Attribute name: account ID attribute type: Integer
Attribute name: creation date attribute type: Date

The value of Attribute list should be the following:

Account Name: String, Account ID: Number, Creation Date:Date

The default searchable attributes for Documentum Content Server are Modified Date, Title, and Author.

Multiple attributes with the same name are not allowed. For example, Emp_ID:String, Emp_ID:Number is not allowed.
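
As an illustration of Table 5-2 and the duplicate-name rule, the following sketch builds an Attribute list value from Documentum attribute names and types. The function name build_attribute_list is hypothetical, and the Documentum type names are spelled as in the table; actual type names reported by a repository may differ.

```python
# Table 5-2: Documentum data type -> Oracle SES data type.
DOCUMENTUM_TO_SES = {
    "Boolean": "Number",
    "Integer": "Number",
    "String": "String",
    "ID": "String",
    "Time": "Date",
    "Date": "Date",
    "Double": "Number",
}


def build_attribute_list(attributes):
    """Build an Attribute list value from (name, documentum_type) pairs."""
    seen = set()
    parts = []
    for name, documentum_type in attributes:
        if name in seen:
            # The same name with two types is not allowed.
            raise ValueError(f"duplicate attribute name: {name}")
        seen.add(name)
        parts.append(f"{name}:{DOCUMENTUM_TO_SES[documentum_type]}")
    return ", ".join(parts)
```

Called with the three attributes from the example above, the sketch maps Integer to Number and leaves String and Date unchanged.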

User name: Enter the user name of a valid Documentum Content Server user. The user should be an administrator user or a user who has access to all cabinets/folders and documents of the DocBases configured in the Container name parameter. The user should be able to retrieve content, metadata, and ACL from cabinets, folders, documents and other custom sub classes of all DocBases configured in Container name parameter. This is a required parameter.
Password: Password of the Documentum user. This is a required parameter.
Crawl versions: Indicate whether multiple versions of documents should be crawled, either true or false. This is an optional parameter. The default value is false. If any other value is provided, it is assumed to be false and only the latest versions of a document will be crawled.
Crawl folder attributes: Indicate whether folder attributes need to be crawled, either true or false. This is an optional parameter. The default value is false. If any other value is provided, it is assumed to be false.
URL for viewing the documents: A valid URL for Documentum WebTop or DA application used for viewing the Oracle SES search results. For example, http://<IP address>:<port>/da or http://<IP address>:<port>/webtop.
Authentication Attribute: This parameter is used to set ACLs. This parameter lets you set multiple LDAP servers. If Oracle SES and Documentum Content Server are synchronized with Active Directory, then enter the value USER_NAME. If Oracle Internet Directory is used, then enter nickname.
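
The Crawl versions and Crawl folder attributes parameters both treat any value other than true as false. A minimal sketch of that rule follows; the function name is hypothetical, and exact lowercase matching is an assumption, because the text does not say whether values such as True are accepted.

```python
def parse_crawl_flag(value):
    """Return True only for the literal value 'true'; everything else is False."""
    if value is None:
        # Parameter left empty: the default is false.
        return False
    # Assumption: surrounding whitespace is ignored, case is not.
    return value.strip() == "true"
```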

Setting Up FileNet Content Engine Sources

FileNet Content Engine data is stored in object stores, which can be further contained inside folders on a server. A FileNet Content Engine instance can have one or more object stores that can be crawled by specifying the Object Store details in the Container name parameter in Oracle SES. The Content Engine source navigates the object store to crawl all the documents in the configured Content Engine Object Store. It stores the metadata and accesses information in Oracle SES to provide search according to the end user permissions.

Important Notes for FileNet Content Engine Sources

Any user having administrative privileges can be used to access FileNet Content Engine Crawler plug-in for crawling and indexing documents.

Required Software

FileNet Content Engine version 3.5
FileNet Application Engine version 3.5

Required Tasks

Because FileNet Content Engine software is not included with Oracle SES, certain files must be copied manually into Oracle SES:

Copy javaapi.jar, soap.jar, xercesImpl.jar and xml-apis.jar files from <FileNet installed Folder>/Workplace/WEB-INF/lib to ORACLE_HOME/search/lib/plugins/fnetce.
Copy the WCMConfig.properties file from <FileNet installed Folder>/Workplace/WEB-INF, into ORACLE_HOME/search/lib/plugins/fnetce.

Known Issues

If any of the parameters are updated after initial crawl, then you must update the crawler re-crawl policy to Process All Documents on the Home - Schedules - Edit Schedules page, and re-crawl the source.
If additional document types are configured after first time crawl, then these document types are not indexed on subsequent re-crawls. This is also the case if the Document Size parameter is changed after the first crawl. For example, if the Document Size was 10 MB at the time of the first crawl and it is changed to 20 MB before re-crawl, then documents greater than 10 MB are rejected. As a workaround, create the source again and then make the changes.

Setting Up Identity Management with FileNet Content Engine

If a FileNet Content Engine source is used, Oracle recommends that Active Directory be used as identity management system for the Oracle SES instance. The Active Directory instance must be the same one that FileNet Content Engine is using to authenticate users on the file system.

Creating a FileNet Content Engine Source

Create a FileNet Content Engine source on the Home - Sources page. Select FileNet Content Engine from the Source Type list, and click Create. Enter values for the following parameters:

Container name: The names of the containers to be crawled by Oracle SES. You can crawl a complete object store or a specific folder. The format for specifying a container is <ObjectStore>/<Folder Name>/. Multiple comma-delimited containers can be specified. This is a required parameter. For example:
- Container name: ObjectStore1: The entire ObjectStore1 will be crawled.
- Container name: ObjectStore1/Folder1/Folder12: The documents inside Folder12 and its sub-folders will be crawled.
- Container name: ObjectStore1, ObjectStore2/Folder1/Folder12: The entire ObjectStore1 and contents of Folder12 in ObjectStore2 will be crawled.
User name: A valid FileNet Content Engine user. The user should be an Administrator user or a user who has access to all folders and documents present in the configured containers. The user should be able to retrieve content, metadata, and ACLs from the folders and documents of all containers configured in Container name. This is a required parameter.
Password: Password of the Content Engine user. This is a required parameter.

Attribute list: The comma-delimited list of Content Engine attributes, along with their data types, that the administrator wants to be searchable. The format is <attribute name>:<attribute type>, <attribute name>:<attribute type>. Valid attribute types are String, Number, and Date.

Table 5-3 FileNet Content Engine Data Type Mapping

Sr. No	FileNet Content Engine Data Type	Oracle SES Data Type
1	Boolean	String
2	float, int, byte, and other numeric values	Number (Big Decimal)
3	String	String
4	DateTime, Date	Date
5	Others	String

While crawling an object store, an attribute is indexed only if both its name and data type match the configured name and type; otherwise, it is ignored. This is an optional parameter. For example, to make the following Content Engine attributes searchable:

Attribute name: DocumentTitle Attribute type: String
Attribute name: ID Attribute type: Number
Attribute name: DateCreated Attribute type: Date

The value of Attribute list should be: DocumentTitle:String, ID:Number, DateCreated:Date

The default searchable attributes for FileNet Content Engine are Title, Author, and LastModifiedDate. Multiple attributes with the same name are not allowed. For example, Emp_ID:String, Emp_ID:Number is not allowed.
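
Table 5-3 differs from the Documentum mapping in that unrecognized types fall back to String. The sketch below illustrates that fallback. The function name ses_type_for and the lowercase type spellings are hypothetical, and because the table does not enumerate the "other numeric values", the set of numeric type names is left as a parameter.

```python
def ses_type_for(content_engine_type, numeric_types=("float", "int", "byte")):
    """Map a Content Engine data type name to an Oracle SES data type."""
    name = content_engine_type.strip().lower()
    if name in ("datetime", "date"):
        return "Date"
    if name in numeric_types:
        return "Number"
    # Boolean, String, and all other types are indexed as String.
    return "String"
```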

Crawl versions: Enter true to crawl multiple versions of documents. By default, this value is false; that is, only the latest version of each document is crawled. If any value other than true is specified, it is considered false.
Crawl folder attributes: Specify whether or not folder metadata should be indexed, either true or false. The default value is false. Any other value for this parameter is considered false.
URL for viewing the documents: The URL for the FileNet Workplace application used for viewing the search results. Workplace is a part of FileNet P8 AE. For example: http://<IP address>:<port>/Workplace
Remove deleted documents from index: This parameter determines whether documents deleted from CE object stores should be removed from the index as well, either true or false. The default value is false, as this would be a costly operation in terms of performance. If any value other than true is specified, it is considered false.
Authentication attribute: The authentication attribute used to set ACL. For Active Directory, the value should be USER_NAME.

Setting Up FileNet Image Services Sources

Documents in FileNet Image Services are organized into Folders. A FileNet Image Services source navigates through the folder hierarchy to crawl all documents in FileNet Image Services (IS). Oracle SES creates the index and stores the metadata of the documents retrieved from FileNet Image Services in Oracle SES to provide search according to the end users' permissions.

A FileNet Image Services instance can have one or more Libraries. A Library is the document repository and contains documents within Folders and sub-Folders. A FileNet Image Services source can crawl multiple Libraries.

Images stored in Image Services can have annotations. Some of the annotations contain text, and these annotations will be crawled. The annotations crawled are:

Stamp
Transparent Text
Sticky note

You can search on the content of these annotations after the IS library has been crawled.

Important Notes for FileNet Image Services Sources

A user belonging to IS SysAdmin group should be used to crawl documents and metadata in IS.

Required Software

FileNet Image Services Server version 4.0 or 3.6 SP2
Image Services Resources Adapter version 3.2.1

Required Tasks

Because FileNet Image Services software is not included with Oracle SES, certain tasks must be performed manually to integrate with Oracle SES:

Deploy the ISCrawlerWeb.war file in the same application server on which ISRA has been deployed.
For application servers that require context root to be specified while deploying a WAR file, specify Context Root as ISCrawlerWeb.
If the application server is WebSphere Application Server, then activate URL rewriting: Click Servers - Application Servers - name of the server - Web Container - Session Management - Enable URL Rewriting.

Known Issues

If additional document types are configured after the first crawl, then these document types are not indexed on subsequent re-crawls. The same applies if the Document Size parameter is changed after the first crawl. For example, if Document Size was 10 MB at the time of the first crawl and is changed to 20 MB before the re-crawl, then documents larger than 10 MB are rejected. As a workaround, update the crawler re-crawl policy to Process All Documents on the Home - Schedules - Edit Schedules page, and re-crawl the source.
XML documents are crawled by default without configuring the source for XML documents: Oracle SES provides an option of configuring the documents types, including XML, to be crawled. Currently, even if XML document type is not configured, XML documents still are crawled.

Setting Up Identity Management for FileNet Image Services

Activate an identity plug-in on the Global Settings - Identity Management Setup page.

Configure the identity plug-in for Image Services

On the Global Settings - Identity Management Setup page, select the FileNet Image Services identity plug-in, and click Activate.
For Authentication Attribute, select NATIVE.
For Web Component URL enter the host name and port number of the Web component URL; for example, http://webserverhost:port/ISCrawlerWeb.
For Administrator user name, enter Image Services user name.
For Administrator password, enter the password of the Image Services user.
For Library name of IS Server, enter the name of the Image Services library like 'ISCF'. Library Name is the ISRA connection factory name that is created when ISRA is deployed.
Click Finish.

Image Services Resources Adapter (ISRA) must be deployed on a supported application server. See the ISRA documentation for supported application servers.

A connection factory must be created for ISRA and configured for the target IS libraries. See the ISRA documentation for deployment instructions.

ISRA comes with a viewer application for viewing images and annotations. The FNImageViewer.ear application should be deployed on the same application server as ISRA. This viewer is invoked to display images (for example, JPEG, TIFF, BMP, and GIF) and annotations. See the ISRA documentation for deployment instructions.

To support secure search, the Image Services server must be synchronized with the Active Directory server. See the section 'LDAP configuration' in ISRA deployment guides for importing Microsoft Active Directory users/groups to Image Services.

After Active Directory users/groups have been imported into Image Services, ISRA must be configured to authenticate with Active Directory. See the section 'LDAP configuration' in ISRA deployment guide for details.

Creating a FileNet Image Services Source

Create a FileNet Image Services source on the Home - Sources page. Select FileNet Image Services from the Source Type list, and click Create. Enter values for the following parameters:

Container names: The names of the containers to be crawled by Oracle SES. You can crawl an entire FileNet Image Services Library or a specific Folder. The format is <Library Name>/<Folder Name>/(cache name). Library name is the ISRA connection factory name created when ISRA is deployed. Cache name is the page cache where the document content can be found. Multiple comma-delimited container names can be entered. This is a required parameter. For example:
- Container name: LibraryName1(cache name): The entire LibraryName1 will be crawled.
- Container name: LibraryName2/Folder1/(cache name): Folder1 and its sub-folders will be crawled.
- Container name: LibraryName1, LibraryName2/Folder1(cache name): The entire LibraryName1 and Folder1 in LibraryName2 will be crawled.
- Cache name: The format is cache name:DomainName:Organization. This is an optional parameter. If the cache name is not provided, then the plug-in tries to retrieve document content from the default page cache. However, the plug-in throws an error if an invalid page cache or empty brackets () is specified. Ask the IS administrator for cache details.
User name: Enter the user name of a valid FileNet Image Services user. The user should be a SysAdmin user or a user who has access to all Folders and Documents of the Libraries configured in the Container name parameter. The user should be able to retrieve content, metadata and ACL from folders, documents and other custom sub classes. The user should be defined in the configured LDAP server and should be imported into IS. This is a required parameter.
Password: The FileNet Image Services user password. This is a required parameter.
Web component URL: The URL of J2EE application server where the crawler plug-in Web component module is deployed. The format of the URL is http://<host>:<port>. This is a required parameter.
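
The Container names format, including the optional cache name in brackets, can be illustrated with a small parser for a single entry. This is a sketch only: the function name parse_is_container and the sample cache value in the usage note are hypothetical, and the check for empty brackets merely mirrors the error condition described for the Cache name.

```python
import re

# One container entry: a path, optionally followed by "(cache name)".
_CONTAINER = re.compile(r"^(?P<path>[^()]+?)(?:\((?P<cache>[^()]*)\))?$")


def parse_is_container(entry):
    """Return (library, folder_path, cache) for one container entry.

    cache is None when no brackets are given (default page cache).
    """
    match = _CONTAINER.match(entry.strip())
    if match is None:
        raise ValueError(f"unrecognized container entry: {entry!r}")
    cache = match.group("cache")
    if cache is not None and not cache.strip():
        # Empty brackets are an error; omit them to use the default cache.
        raise ValueError("empty brackets () are not allowed")
    path = match.group("path").strip().strip("/")
    library, _, folder = path.partition("/")
    return library, folder, cache
```

For example, the entry LibraryName2/Folder1/(page_cache1:Domain:Org) is split into the library LibraryName2, the folder Folder1, and the cache string, while LibraryName1 alone yields no cache.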

The Web component is also used to view the search results. When the user clicks an Oracle SES search result, the user is prompted to log in. On successful login, the document is displayed. To view images and annotations, the FileNet Image viewer FNImageViewer.ear should be deployed. FNImageViewer.ear is included on the ISRA CD. If the viewer is not deployed, then images are displayed in the native viewer or the user is prompted to download the document.

Attribute Names: The comma-delimited list of Image Services attributes, along with their data types, to be made searchable. The format is <attribute name>:<attribute type>, <attribute name>:<attribute type>. Valid attribute types are String, Number, and Date.

Table 5-4 FileNet Image Services Data Type Mapping

Sr. No	FileNet Image Services Data Type	Oracle SES Data Type
1	BOOLEAN	String
2	BYTE	Number
3	UNSBYTE	Number
4	SHORT	Number
5	UNSSHORT	Number
6	LONG	Number
7	UNSLONG	Number
8	ASCII	String
9	TIME	Date
10	DATE	Date
11	MENU	Number
12	FP_NUM	Number

While crawling a Library an attribute will be indexed only if both name and type of the attribute in the library match the configured name and type; otherwise, it is ignored. This is an optional parameter. For example, to make the following FileNet Image Services attributes searchable:

Attribute name: account name attribute type: String
Attribute name: account ID attribute type: Integer
Attribute name: creation date attribute type: Date

The value of Attribute List should be

Account Name: String, Account Id: Number, Creation Date: Date

Set source hierarchy: Indicate whether the source should set the source hierarchy of the document, either true or false. The default value is false. If any other value is provided, it is assumed to be false.

A document in Image Services can be filed in multiple folders, so a user could have READ permission on a document but not on all the folders in which the document is filed. If Set source hierarchy is true, then a user could see a source hierarchy on which the user has no permissions in IS. However, the user still cannot view documents on which the user lacks READ permission.
Set Public Access: Indicate whether the source should set the public access of the documents whose ACL is Anyone, either true or false. The default value is false. If any other value is provided, it is assumed to be false.
Authentication Attribute: This parameter is used to get the LDAP authentication attribute. This parameter will vary based on the identity plug-in used for authentication. For Microsoft Active Directory, it should be USER_NAME. For FileNet Image Services identity plug-in, it should be NATIVE.

Setting Up Hummingbird Document Management Server Sources

The Hummingbird DM Server plug-in extends the searching capabilities of Oracle SES and enables it to search Hummingbird DM Server repositories. Oracle SES can crawl documents and metadata in the Hummingbird repositories and provide secure, full-text search. It also provides metadata search and browse functionality, which allows search to be done against a specific subfolder in the hierarchy.

Hummingbird data is stored in libraries, which can contain folders, files, and workspaces. A Hummingbird DM Server instance can have one or more libraries that can be crawled with the Hummingbird DM Server plug-in by configuring parameters in Oracle SES. The Hummingbird DM Server plug-in navigates through the libraries to crawl all documents in Hummingbird DM Server. It creates an index, stores the metadata, and accesses information in Oracle SES to provide search according to the end user permissions.

Oracle SES supports incremental crawling; that is, it crawls and indexes only those documents that have changed since the most recent crawl. A document is re-crawled if the content, metadata, or the direct security access information of the document has changed. Documents deleted from a library are removed from the index during incremental crawling.

The Hummingbird plug-in includes two components: a plug-in jar file and a Web services component. The jar file is deployed in Oracle SES. The Web services component must be deployed on the computer on which Hummingbird Web Server (Webtop) is deployed.

The Hummingbird DM Server identity plug-in is used to authenticate the native users of Hummingbird DM Server.

Important Notes for Hummingbird DM Server Sources

The Hummingbird crawler plug-in should use the admin account for the Container for crawling and indexing documents.
The Hummingbird DM Server version must be 2004 or 2005.

Required Software

Hummingbird DM Server must be installed and configured. The following versions of Hummingbird DM are supported: 2004, 2005.
Hummingbird Web Server (WebTop): Hummingbird Web Server is required to see the files and folders stored in Hummingbird DM Server.
Windows .NET Framework 1.1 must be on the same computer where Hummingbird Web Server (WebTop) is running.

Required Tasks

Import User/Groups from Active Directory Server to Hummingbird:

Log in to Hummingbird WebTop as a user with administrator privileges.
Select DM ADMIN from the list at the top of page.
Go to Users and Groups - User Synchronization.
Select the Network Resource and click Load Network.
Select the name of your domain from where you want to import users and click Load Network.
The Network resource list will show the name of users. Select the users you want to import and click Import User.
Click Save.
In Library User, you can see the list of users that are imported in Hummingbird Web server.

Known Issues

If you update the Attribute list parameter, then a force re-crawl should be performed to delete the indexes of the old attribute list and create indexes for the new attribute list. That is, change the re-crawl policy to Process All Documents on the Home - Schedules - Edit Schedule page.

Setting Up Identity Management for Hummingbird

Choose an identity plug-in on the Global Settings - Identity Management Setup page.

Activate the Hummingbird identity plug-in with the following parameters.

Library name: The name of library to be crawled.
URL: This parameter is used to send the request to the Web service to retrieve the data. For example: <http/https>://<computername>:<port>/<VirtualDirectoryName>/HBDMIdentityWebservice.asmx.

Virtual directory name is the name given during installation of Web services for Hummingbird.
User name: User name of Hummingbird DM Server. The user should be an administrator user and a native user of Hummingbird. This is a required parameter.
Password: Password for User name.
Authentication Attribute: NATIVE.

Creating a Hummingbird Source

Create a source for the newly created user-defined source type on the Home - Sources page. Enter a source name. Provide values for the configuration parameters in the following table.

Container name: The names of the containers to be crawled by Oracle SES. You can crawl an entire Hummingbird library or a specific folder. The format is <LibraryName>/<LibraryName>/<Folder Name>/. This parameter is case-sensitive.

To crawl all documents in the library, the format is <LibraryName>/<LibraryName>. Multiple comma-delimited container names can be entered. This is a required parameter. For example:
- Container name: <LibraryName>/<LibraryName>
 
 This means that the entire LibraryName will be crawled.
- Container name: LibraryName/LibraryName/Folder21
 
 This means that Folder21 and its sub-folders within LibraryName will be crawled.
- Container name: LibraryName/LibraryName/Public Folders/Folder1
 
 This means that Folder1 and its sub-folders within Public Folders will be crawled.
Attribute list: The comma-delimited list of attributes to be searchable. The format is <attribute name>,<attribute name>. Hummingbird stores all attributes as the String data type, so Hummingbird attributes are mapped to the String data type of Oracle SES. Only lastmodifieddate is set as the Date data type in Oracle SES. The default attributes are Title, LastModifiedDate, and Author.

While crawling a library or folder, an attribute is indexed only if its name matches a configured attribute name; otherwise, it is ignored. For example, to make the following Hummingbird attributes searchable:

Attribute name: account name

Attribute name: account ID

Attribute name: creation date

The value of Attribute List should be: account name, account ID, creation date.

Multiple attributes with same name are not allowed. For example: Emp_ID, Emp_ID.

If custom fields have been created, then include the name of table and column separated by a dot ("."). For example: <tablename>.<columnname>,<tablename>.<columnname>

This is an optional parameter.
User name: User name of a valid Hummingbird DM Server user. The user should be an administrator user or a user who has access to all folders and documents configured in Container name. The user should be able to retrieve content, attributes, and documents. This is a required parameter.
Password: Password of the Hummingbird user in User name. This is a required parameter.
Crawl versions: This parameter indicates whether multiple versions of documents should be crawled. Valid values are 'true' or 'false'. The default value is 'false'. If any other value is provided, it is assumed to be 'false', and only the latest versions of a document will be crawled. This is an optional parameter.
Crawl folder attributes: This parameter indicates whether folder attributes need to be crawled. Valid values are 'true' or 'false'. The default value is 'false'. If any other value is provided, it is assumed to be 'false'. This is an optional parameter.
View Documents: The URL used for viewing search results, built from the IP address or computer name where the Hummingbird WebTop (Hummingbird Web Server) application is installed. For example: http://<computername>.

If SSL is enabled on Hummingbird DM Web Server, use https://<computername>. If Hummingbird is running on a port other than the default port (80), then append the port number to the computer name, separated by a colon (":"). For example: http://<computername>:<port>
Crawl Attachments: This parameter indicates whether attachments attached to the documents should be crawled. Valid values are 'true' or 'false'. The default value is 'false'. If any other value is provided, it is assumed to be 'false'. This is an optional parameter.
Search form: The profile name used in Hummingbird. It has default value DEF_QBE. If custom attributes have been added in profile and you want to search for these attributes, then pass the name of custom profile here.
URL for Webservice: The URL of the Web service that will be consumed by the plug-in. For example: <http/https>://<computername>/<name of virtual folder created by Web service installer>/HBDMWebService.asmx.

If the Web service is running on a port other than the default port (80), then include the port number. For example: <http/https>://<computername>:<port>/<name of virtual folder created by Web service installer>/HBDMWebService.asmx.
Authentication Attribute: The name of the authentication attribute that will be used to set ACL. For Oracle Internet Directory, the value should be nick_name. For Active Directory, the value should be USER_NAME. For Hummingbird identity plug-in, the value should be NATIVE.
Hummingbird DM version: The version of Hummingbird DM to be crawled. Valid values are 5 and 6.
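
The URL and flag rules in the parameters above can be sketched in a few lines of Python. This is only an illustration of the documented formats, not part of the plug-in: the function names and the virtual directory name HBDMService are hypothetical, and treating 'true' case-insensitively is an assumption, because the source does not say whether the value is case-sensitive.

```python
from typing import Optional


def build_webservice_url(host: str, virtual_dir: str,
                         port: Optional[int] = None,
                         use_ssl: bool = False) -> str:
    """Assemble a 'URL for Webservice' value from its documented parts."""
    scheme = "https" if use_ssl else "http"
    # Add the port only when one is given explicitly.
    authority = host if port is None else f"{host}:{port}"
    return f"{scheme}://{authority}/{virtual_dir.strip('/')}/HBDMWebService.asmx"


def normalize_flag(value: str) -> str:
    """Mirror the documented rule: any value other than 'true' counts as 'false'.

    Ignoring case and surrounding whitespace is an assumption of this sketch.
    """
    return "true" if value.strip().lower() == "true" else "false"


if __name__ == "__main__":
    print(build_webservice_url("dmhost", "HBDMService"))
    print(build_webservice_url("dmhost", "HBDMService", port=8080, use_ssl=True))
    print(normalize_flag("yes"))
```

A helper like this can catch a missing port or a mistyped flag before the values are entered on the Home - Sources page.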

Deploy the Web Service on the Hummingbird DM Server

The Web service is located in $ORACLE_HOME/search/lib/plugins/hbdm. Unzip the contents to a temp directory. The Web service must be installed on the same server as Hummingbird DM.

The Web service component is provided as an installable setup file. This component must be installed on the same server on which Hummingbird Web Server and Windows .NET Framework 1.1 are installed.

Note:

Separate Web service installers are provided for Hummingbird DM 5 (Hummingbird_DM5_Web_Service_Installer.zip) and Hummingbird DM 6 (Hummingbird_DM6_Web_Service_Installer.zip). Make sure that the correct Web service component is installed based on the Hummingbird DM version.

Double-click setup.exe to install the Web service.
While installing, the setup will ask for the name of the virtual directory. (The virtual directory name can be changed.) The setup will create a virtual directory on Microsoft Internet Information Server (IIS) with the same name. If you have more than one Web site in IIS running on different ports and you want to install this Web service in some other Web site (instead of the default Web site), then include the port number.
Provide the user name and password of Hummingbird DM Server. User name should be in the form: <domainname\username>.

Setting Up IBM DB2 Content Manager Sources

The IBM DB2 Content Manager (ICM) plug-in extends the searching capabilities of Oracle SES to ICM repositories, which consist of item-types and their instances in the form of folders and documents. Oracle SES can crawl documents and metadata in the ICM Library Server and provide secure, full-text search. Starting from each specified folder, the plug-in extends the crawl, and thus the search, into the folder's complete child-tree. If an item-type is specified for crawling, then the plug-in crawls all instances of the item-type and their complete child-trees.

In ICM, the library server manages the content metadata and access control to all content in a database (for example, DB2), interfacing to one or more resource managers. The primary job of the Library Server is to service client requests for content. The ICM plug-in navigates through the library server to crawl documents and folders in the specified item-types. It stores the metadata and accesses information in Oracle SES to provide search according to the end users' credentials.

While the crawler connects to the library server through the APIs, the library server internally connects with the resource manager through CM-managed secure tokens. Whenever a reference is made to a document object, the object is fetched from the resource manager using these tokens. With the crawler plug-in, metadata corresponding to a document is retrieved from the library server while the display URL points to the document object on the resource manager using the token.

Oracle SES supports incremental crawling; that is, it crawls and indexes only those documents that have changed since the most recent crawl. A document is re-crawled if the content, metadata, display URL, or the direct security access information of the document has changed. Documents deleted from a database are removed from the index during incremental crawling.

Important Notes for IBM DB2 Content Manager Sources

The user account used to crawl the specified item-types should be an Administrator account that has access to all instances (documents/folders) of the specified item-types and is able to retrieve and crawl all folders and documents therein. The administrator user specified for crawling should belong to the "ICMPUBLIC" group and the "AllPrivs" privilege-set.
The version of DB2 Content Manager used to set up the repositories for crawling must be 8.3.

Required Software

This section lists required software (in order of installation) for the installation of DB2 Content Manager 8.3:

Server Side (computer with ICM server installed):

Windows Server 2003 Enterprise Edition
IBM WebSphere Application Server 5.1 plus FixPak 1
IBM DB2 Universal Database Enterprise Server Edition (32-bit): 8.1 plus FixPak 7A special or version 8.2 plus FixPak 7A special
DB2 Content Manager Enterprise Edition 8.3 plus FixPak1
DB2 Information Integrator for Content 8.3 with Fix Pack 3
DB2 Content Manager eClient 8.3

Client Side (computer with Oracle SES installed):

IBM DB2 Run-Time Client: 8.1 plus FixPak 7A special or version 8.2 plus FixPak 7A special
DB2 Information Integrator for Content 8.3 with Fix Pack 3
DB2 Content Manager Client for Windows 8.3 (optional for Windows)

Required Tasks on the Server Side

The following tasks must be performed on the computer with ICM server.

DB2 Content Manager 8.3 must be installed on the server computer with the required fix-packs.
LDAP task must be enabled on DB2 CM. To enable LDAP:
1. Launch the System Administration Client.
2. Bring up the LDAP Configuration window by selecting Tools - LDAP Configuration.
3. Select the Enable LDAP User import and authentication check box.
4. On the server tab, select server-type as Active Directory.
5. Provide the LDAP server information on the Server page.
6. Click OK.
After the LDAP configuration is complete, follow the steps to import users/groups from Active Directory to ICM:
1. In the system administration client, click Authentication and then right-click either Users or User-Groups.
2. Click the LDAP button and then enter the user to be imported into ICM. To receive a list of all users that can be imported, click Show All.
3. Select the user(s) to be imported and click OK.
4. From the Assign to Groups tab, assign the users to the required groups.
5. From the Set Defaults tab, specify the default resource manager, collection and item access control list for the user(s)/group(s). Then click OK or Apply.
6. The selected user or user-group is imported into the DB2 CM environment. To verify, click Users or User-Groups again. The imported user or user-group appears in the list on the right side.

Required Tasks on the Client Side

The following tasks must be performed on the computer with Oracle SES.

Catalog the DB2 run-time client with DB2 Content Manager's Library database.

Open the services file on the client computer, located in the <Windows system directory>\drivers\etc directory on Windows and the /etc directory on Linux, and add the following entry at the end of the file:
```
[Service Name] [Port #]/tcp #DB2 connection service port
Example: db2c_DB2 50000/tcp #DB2 connection service port
```

Run the following commands from the Command Line Processor on the client computer:

catalog tcpip node [some node name, anything you like] remote [IP address / host] server [service name]

For example:

catalog tcpip node CMDB remote <server-name> server db2c_DB2

Run the following commands from the Command Line Processor on the client computer:

catalog db [database name] as [database alias, anything you like] at node [node name configured in previous step]

For example:

catalog db ICMNLSDB as ICMNLSDB at node CMDB

Check the connection using the following commands from the Command Line Processor on the client computer:

connect to [database alias name configured in previous step] user [database user name] using [user password]

For example:

connect to ICMNLSDB user ICMADMIN using ICMADMIN

Database connection should succeed.
Run select tabname from syscat.tables. All the table names in the database should be listed.
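
The cataloging steps above follow a fixed pattern, so the services entry and the Command Line Processor statements can be generated from a handful of values. The following is a minimal sketch that only formats the text shown in this section; it does not run DB2. The function name and the host name cmserver.example.com are hypothetical, and the other values are the examples used above.

```python
from typing import List, Tuple


def db2_catalog_steps(service: str, port: int, node: str, host: str,
                      database: str, alias: str,
                      user: str, password: str) -> Tuple[str, List[str]]:
    """Return the services-file entry and the CLP statements, in order."""
    # Line to append to the services file on the client computer.
    services_entry = f"{service} {port}/tcp #DB2 connection service port"
    commands = [
        # Catalog the remote node, then the database on that node.
        f"catalog tcpip node {node} remote {host} server {service}",
        f"catalog db {database} as {alias} at node {node}",
        # Verify the connection and list the tables.
        f"connect to {alias} user {user} using {password}",
        "select tabname from syscat.tables",
    ]
    return services_entry, commands


if __name__ == "__main__":
    entry, commands = db2_catalog_steps(
        service="db2c_DB2", port=50000, node="CMDB",
        host="cmserver.example.com",  # hypothetical host name
        database="ICMNLSDB", alias="ICMNLSDB",
        user="ICMADMIN", password="ICMADMIN",
    )
    print(entry)
    for command in commands:
        print(command)
```

Printing the statements first makes it easy to review them before pasting them into the Command Line Processor on the client computer.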

Known Issues

Oracle SES does not support crawling of folders that have all blank attributes.
The ICM plug-in does not support CLOB attributes. This is due to a limitation when using these attributes with XPath queries.
To use the ICM eClient application to view search results, it is recommended that the user log in to eClient first and then launch the Oracle SES search screen in the same window. If the user launches the Oracle SES search results directly, then ICM eClient may prompt the user to log in, and the user must manually refresh the Oracle SES page to view the clicked document.
Change of item-type ACL does not update the items/documents (and their last modified date) of that item-type. Therefore, whenever an ACL of an item-type is changed from the System Administration client, the effective change on the items/documents of that item-type can be reflected only through a force re-crawl. That is, change the re-crawl policy to Process All Documents on the Home - Schedules - Edit Schedule page.
When crawling an item-type hierarchy of multiple levels, the crawler might throw a "com.ibm.mm.sdk.common.DKUsageError: DGL7146A: The query string is too long or too complex" exception. This is because the CM query has a length restriction of 64K. DB2 UDB does not have such a restriction, and the problem can be fixed by removing the 64K limit check from the API and letting the Library Server database determine the limit.

Setting Up Identity Management for DB2 Content Manager

Activate the ICM identity plug-in on the Global Settings - Identity Management Setup page with the following parameters:

Library Server name: The alias of the DB2 Content Manager Library Server to connect to in order to retrieve all the item-types required for crawling.
User name: User name of a valid ICM Server user. This is a required parameter.
Password: Password of the ICM user. This is a required parameter.
ICM Servers File: This parameter specifies the absolute path of the cmbicmsrvs.ini file. This INI file stores the source information for the data store.
ICM Environment File: This parameter specifies the absolute path of the cmbicmenv.ini file. This INI file stores the database connect information.

Note:

The required ICM Server (cmbicmsrvs.ini) and ICM Environment (cmbicmenv.ini) files can be found on the client side (computer with Oracle SES) at <ICM Installation Folder>/cmgmt/connectors/cmbicmsrvs.ini and <ICM Installation Folder>/cmgmt/connectors/cmbicmenv.ini.

Creating an IBM DB2 Content Manager Source

Create a source for the newly-created user-defined source type on the Home - Sources page. Enter a source name. Provide values for the configuration parameters in the following table.

Container name: The item-types to be crawled. This can be a specific item-type whose instances need to be crawled, or a folder/sub-folder if all item-types inside that folder/sub-folder need to be crawled. Container name can be a combination of multiple item-types delimited by the special character "/". Note that "\" is an unacceptable delimiter.

Container names should be in the format: <parent item-type name>[@<parent attribute-name>=<attribute-value>]/<child item-type name>[@<child attribute name>=<child attribute value>], or <child item-type name>[@<parent attribute-name>=<attribute-value>,@<child attribute name>=<child attribute value>].

For example, say you have a root-component item-type named Level-1 with attribute Attribute-1 whose value is Value-1. You have another item-type Level-2 that is a child of Level-1, with attributes Attribute-1 (linked with Level-1) and Attribute-2 with value Value-2. You have another item-type Level-3 that is a child of Level-2 and has attributes Attribute-1 and Attribute-2 (linked attributes) and Attribute-3 with value Value-3.

If the user wants to crawl all items formed with item-type Level-3 then the container name given should be:
```
Level-1[@Attribute-1="Value-1"]/Level-2[@Attribute-2="Value-2"]/Level-3
```
Or
```
Level-3[@Attribute-1="Value-1" AND @Attribute-2="Value-2"]
```
Note that the values for String and Date attributes should be given in double quotes, while the values for Number attributes should be given without quotes.
Attribute list: The comma-delimited list of ICM attributes, along with their data types, to be searchable. The format is <attribute name>:<attribute type>, <attribute name>:<attribute type>. Valid attribute types are String, Number, and Date.

While crawling a database, an attribute is indexed only if both name and type match the configured name and type; otherwise, it is ignored. This is an optional field.

The default searchable attributes for ICM are Modified Date, Title, and Author. This parameter is case-sensitive, and multiple attributes with the same name are not allowed.
User name: The ICM user name used for crawling. It should be a user with at least read privileges on the configured item-types. This is used to make a session with ICM to get ACL, Document List, metadata, and content.
Password: The password of the ICM user in User Name.
Crawl versions: This parameter is used to specify whether all the versions of a document should be crawled or only the latest version. The default value is false. Valid values are true or false. Any other value is considered false.
Crawl folder attributes: This parameter is used to specify whether or not folder metadata should be indexed. The default value is false. Valid values are true or false.
Library server name: The alias of the DB2 Content Manager Library Server to connect to in order to retrieve all the item-types required for crawling.
Remove URL not in queue: This parameter is used to determine whether documents deleted from ICM should be removed from the index as well. Valid values are true or false. The default value is false.
Authentication attribute: The authentication attribute used to validate the ACL. With the Active Directory identity plug-in, this value should be USER_NAME, and for ICM identity plug-in it should be NATIVE. This is a required parameter.
WebClient path: ICM allows the rendering of search results in ICM eClient as well as a custom web-application, which, if used, needs to be deployed separately on the ICM application server.

This parameter contains the path of the web-application used to render the search results.
Title field: Comma-delimited list of attributes that can be used as the title in the ICM containers specified for crawling. This is a case-sensitive required parameter.
Time Zone: Because the ICM library server could be in a different time zone than the Oracle SES server, this parameter captures the library server time zone so that Oracle SES times can be converted to the ICM time zone when performing time-based queries. If an unrecognized value is entered, then GMT is used by default.
ICM Servers File: The absolute path of the cmbicmsrvs.ini file. This INI file stores the source information for the data store.
ICM Environment File: The absolute path of the cmbicmenv.ini file. This INI file stores the database connect information.
Use ICM eClient to view search results: This parameter determines if ICM's eClient is being used to view search results or some other web-application. Enter 'true' for ICM eClient; 'false' otherwise.
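
The Container name format and its quoting rule (String and Date values quoted, Number values unquoted) can be illustrated with a short Python sketch. This is not part of the ICM plug-in: the function names are hypothetical, and joining several conditions on one item-type with AND follows the second example shown for Container name, which is an assumption about the accepted syntax.

```python
from typing import Mapping, Optional, Sequence, Tuple, Union

AttrValue = Union[str, int, float]
Level = Tuple[str, Optional[Mapping[str, AttrValue]]]


def format_value(value: AttrValue) -> str:
    """Quote String/Date values; leave Number values unquoted."""
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return str(value)
    return f'"{value}"'


def format_component(item_type: str,
                     attrs: Optional[Mapping[str, AttrValue]] = None) -> str:
    """Render one item-type with optional attribute conditions."""
    if not attrs:
        return item_type
    conditions = " AND ".join(
        f"@{name}={format_value(value)}" for name, value in attrs.items()
    )
    return f"{item_type}[{conditions}]"


def build_container_name(levels: Sequence[Level]) -> str:
    """Join the item-type levels with '/', the only accepted delimiter."""
    return "/".join(format_component(item, attrs) for item, attrs in levels)


if __name__ == "__main__":
    print(build_container_name([
        ("Level-1", {"Attribute-1": "Value-1"}),
        ("Level-2", {"Attribute-2": "Value-2"}),
        ("Level-3", None),
    ]))
```

Building the string from structured values avoids unbalanced quotes, which are easy to introduce when the container name is typed by hand.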

Setting Up Microsoft SharePoint Sources

A SharePoint Portal Server source enables Oracle SES to search a SharePoint Portal Server. Oracle SES can crawl through the documents, lists, discussions and related metadata in the SharePoint repositories and provide secure, full-text search. It also provides metadata search and browse functionality, which allows search to be done against a specific subfolder in the hierarchy.

SharePoint data is stored in different libraries like Document Library, Picture Library, Lists, and Discussion Boards, which in turn can contain sites and subareas. A SharePoint Portal Server instance can have one or more sites/subareas that can be crawled using the SharePoint Portal Server source by configuring the parameters in Oracle SES. The SharePoint Portal Server source navigates through the Libraries to crawl all documents in SharePoint Portal Server. It creates an index, stores the metadata, and accesses information in Oracle SES to provide search according to the end user permissions.

Oracle SES supports incremental crawling; that is, it crawls and indexes only those documents that have changed since the most recent crawl. A document is re-crawled if the content, metadata, or the direct security access information of the document has changed. Documents deleted from a Library are removed from the index during incremental crawling.

If you update the attribute list, then you must update the crawler re-crawl policy to Process All Documents on the Home - Schedules - Edit Schedules page, and re-crawl the source.

Important Notes for Microsoft SharePoint Sources

The admin account should be used by the SharePoint plug-in for the Container for crawling and indexing documents.
This connector supports SharePoint Portal Server version 2003.
The name of a Container in SharePoint that Oracle SES crawls should not contain special characters. If a name contains a forward slash ("/") or comma (","), then enter a backslash ("\") before the forward slash or comma. Otherwise, the crawler will not recognize the Container.

Creating a Microsoft SharePoint Source

Create a source for the newly-created user-defined source type on the Home - Sources page. Enter a source name. Provide values for the configuration parameters in the following table.

Container name: Names of the containers to be crawled by Oracle SES. You can crawl an entire area or site or a specific folder. The format for specifying container folder is <Area Name>/<Library Name>/<Folder Name>/.

To crawl all documents in the Area or Library, the format for Area or Library is <AreaName> or <AreaName>/<LibraryName>.

To crawl all SharePoint services, enter a forward slash ("/") in this parameter.

To crawl all sites, enter "sites".

Multiple comma-delimited container names can be entered. This is a required parameter. For example:
- Container name: <AreaName>
 
 The entire Area will be crawled.
- Container name: <AreaName>/LibraryName/Folder21
 
 Folder21 and its subfolders within LibraryName will be crawled.
 
 Note: The path of the container to crawl should not contain special characters. If any container name in the path contains a forward slash ("/") or comma (","), then insert a backslash ("\") before the forward slash or comma.
Attribute list: The comma-delimited list of attributes to be searchable. The format for attribute list is <attribute name>, <attribute name>. SharePoint stores all attributes as String data type, so the data type of attributes in SharePoint will be mapped with String data type of Oracle SES. Only the last modified date will be set as Date data type in Oracle SES. The default attributes the plug-in will set are Title, LastModifiedDate and Author.

Multiple attributes with same name are not allowed. For example Emp_ID, Emp_ID.
User name: User name of a valid SharePoint Portal Server user, preceded by the name of the domain to which the user belongs and a backslash ("\"). For example, oracledomain\Administrator.

The user should be an Administrator user or a user who has admin rights on the container mentioned in the Container name parameter. The user should be able to retrieve content, attributes, documents. This is a required parameter.
Password: Password of the SharePoint user in User name. This is a required parameter.
Crawl versions: This parameter indicates whether multiple versions of documents should be crawled. The default value is false. Valid values are true or false. If any other value is provided, it is assumed to be false. In this case, only the latest versions of a document will be crawled. This is an optional parameter.
Crawl folder attributes: This parameter indicates whether folder attributes need to be crawled. The default value is false. Valid values are true or false. If any other value is provided, it is assumed to be false. This is an optional parameter.
View documents: The IP address or computer name where the SharePoint Webtop (SharePoint Web Server) application is installed. It is used to build the URL for viewing the search results. For example, <computername>.
Crawl attachments: This parameter indicates whether attachments need to be crawled. The default value is false. Valid values are true or false. If any other value is provided, it is assumed to be false. This is an optional parameter.
Authentication attribute: Name of authentication attribute to be used by the identity plug-in of the configured directory server. For Microsoft Active Directory, this value should be USER_NAME. This is a required parameter and is case-sensitive.
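
The escaping rule for the Container name parameter (a backslash before any forward slash or comma that is part of a name) can be applied mechanically. The following is a minimal sketch, assuming each container path is available as a list of unescaped area, library, and folder names; the function names and sample names are hypothetical and not part of Oracle SES.

```python
from typing import List, Sequence


def escape_segment(name: str) -> str:
    """Put a backslash before '/' and ',' inside a single name."""
    return name.replace("/", "\\/").replace(",", "\\,")


def build_container_param(paths: Sequence[Sequence[str]]) -> str:
    """Join names with '/' and separate container paths with ','."""
    rendered: List[str] = [
        "/".join(escape_segment(name) for name in path) for path in paths
    ]
    return ",".join(rendered)


if __name__ == "__main__":
    # Two containers: a folder inside a library whose name has a comma,
    # and an entire area.
    print(build_container_param([
        ["Area1", "Docs, Shared", "Folder21"],
        ["Area2"],
    ]))
```

Escaping each name before joining keeps the "/" and "," that act as separators distinct from the ones that belong to a name.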

Deploy the Web Service on the SharePoint Portal Server

The Web service is located at $ORACLE_HOME/search/lib/plugins/spps/Sharepoint_Web_Service_Installer.zip. The contents of the zip file must be unzipped to a temp directory, and the Web service must be installed on the same server as the SharePoint server.

The Web service component is provided as an installable setup file. This must be installed on the same server on which SharePoint Portal Server is installed. To install the Web Services component:

Double-click setup.exe.
The setup will ask for login user name and password for the SharePoint admin user. Enter the user name as <domainname\username>.

Setting Up Open Text Livelink Sources

Livelink data is stored in Workspaces, which in turn can contain folders, files, projects, and task lists. A Livelink Enterprise Server instance can have one or more Workspaces that can be crawled. Oracle SES navigates through the Workspaces to crawl all the objects in Livelink Enterprise Server. It creates an index, stores the metadata, and accesses information in Oracle SES to provide search according to the end user permissions.

Important Notes for Open Text Livelink Sources

The admin account should be used by the Livelink crawler plug-in for the container for crawling and indexing documents.
The Livelink Enterprise Server version must be 9.2, 9.5.0, or 9.5.5.

Required Tasks

Because Open Text Livelink software is not included with Oracle SES, certain files must be copied manually into Oracle SES. Copy the lapi.jar file from LAPI installation folder into ORACLE_HOME/search/lib/plugins/llcs.

The Directory Services module of Livelink should be installed with Livelink (if users and groups are imported from an LDAP server and you want to use the Active Directory identity plug-in).

To import Active Directory users and groups into Livelink Server, follow the steps in the next section.

Importing Users/Groups from LDAP to Livelink

Create an LDAP user that has permissions in Active Directory to administer users and groups. This user is used to synchronize the Active Directory with Livelink.
To extend the schema of Active Directory, install the Active Directory Schema snap-in as follows:
1. Select Run from Windows Start menu.
2. Type mmc /a in the Open field and click OK.
3. On the Console menu, choose Add/Remove Snap-in and click Add.
4. Under Snap-in, double-click Active Directory Schema. Click Close, then OK. Save the console (for example, as "Active Directory Schema.msc"). If the new snap-in does not appear under Snap-in, then you may have to re-install the Windows 2003 Administrative Tools and start again at step 2.
Open the file ot-livelink-schema.conf (it is in the directory <livelink_home>/module/directory_2_3_0) in a text editor.
Open the Active Directory Schema console by clicking the Windows Start button, pointing to Programs - Administrative Tools and selecting (based on the sample name given) Active Directory Schema.msc.
Right-click Active Directory Schema and select Operations Master.
Right click the Attributes folder and select Create Attribute.

Create the attribute llserverinfo using the information from ot-livelink-schema.conf as follows:

Table 5-5 llserverinfo Values

Name	Value
Common Name	llserverinfo
LDAP Display Name	llserverinfo
Object ID	<OID> from ot-livelink-schema.conf
Syntax	Case-insensitive string
Multivalued	checked

Create the attribute llquery using the information from ot-livelink-schema.conf as follows:

Table 5-6 llquery Values

Name	Value
Common Name	llquery
LDAP Display Name	llquery
Object ID	<OID> from ot-livelink-schema.conf
Syntax	Case-insensitive string
Multivalued	unchecked

Browse to the Directory Services Administration section of the Livelink Administration page to enable the following configuration:

Enabling the Synchronization Features:

Click the Choose Directory Services link.

Select LDAP Synchronization (Read-Only LDAP) from the Synchronization list.

For Livelink CGI Hosts, specify 127.0.0.1,<LIVELINK_SERVER_IP>

Click Save Changes.

Configuring LDAP Read-Only Parameters:

Table 5-7 LDAP Read-Only Parameters

Parameter	Value
New User Password Policy	Hidden
User name Case Sensitivity	Preserve case
Livelink Server Name	Computer name on which Livelink Server is running
LDAP Server	Computer name or IP Address on which LDAP server is running
LDAP Server Port	389
Search Root	cn=Users,dc=otdomain,dc=com
LDAP User name	cn=<LDAP_User_Name>,cn=Users,dc=otdomain,dc=com
LDAP Password	<LDAP_User_Password>
Log-in Name	sAMAccountName or cn
First Name	givenname
Last Name	sn
Title	title
E-mail	mail
Contact	telephonenumber
Department Mapping	disable
Group Name	cn
Group Leader	managedBy
Group Member	Member
Group Member Query	llquery
Privileges	Select Log-in enabled, Public Access
Group Search Filter	objectclass=group
Synchronize Group	checked

Click Save Changes.

Click Synchronize LDAP Read-only.

Click Synchronize.

Known Issues

If you update the attribute list, then you must update the crawler re-crawl policy to Process All Documents on the Home - Schedules - Edit Schedules page, and re-crawl the source.

Setting Up Identity Management for Open Text

The Livelink Enterprise Server identity plug-in authenticates native users of Livelink Enterprise Server. The identity plug-in communicates with the directory to authenticate a user's credentials, validate a user or group and return the associated canonical form, and return the groups associated with a given user.

Activate the identity plug-in on the Global Settings - Identity Management Setup page.

Creating an Open Text Livelink Source

Create an Open Text source on the Home - Sources page. Select Open Text from the Source Type list, and click Create. Enter values for the following parameters:

User name: Name of a valid Livelink Enterprise Server user. The user must be an Administrator user or a user who has access to all folders and documents of the workspaces configured in the Container name parameter. The user should be able to retrieve content, metadata, and ACL from folders, documents and other custom sub classes of all workspaces configured in Container name parameter. This is a required parameter.
Password: Password of the Livelink user. This is a required parameter.
Server Name and Port Number for Livelink: The computer name/IP address and the port number on which Livelink server is running. The format is <server name>:<port>.
Container name: The names of the containers to be crawled by Oracle SES. You can crawl an entire Livelink Workspace or a specific folder. The format is <Workspace Name>/<Folder Name>/. Multiple comma-delimited container names can be entered. This is a required parameter. For example:
- Container name: Workspace1: The entire Workspace1 will be crawled.
- Container name: Workspace2/Folder21: Folder21 and its sub-folders within Workspace2 will be crawled.
Attribute list: The comma-delimited list of Livelink attributes, with their data types, to make searchable. The format is <attribute name>:<attribute type>, <attribute name>:<attribute type>. Valid attribute types are String, Number, and Date.

Table 5-8 Open Text Data Types

Sr. No	Open Text Data Type	Oracle SES Data Type
1	Boolean	String
2	Integer	Number (Big Decimal)
3	String	String
4	Date	Date

While crawling a workspace, an attribute is indexed only if both its name and its type match the configured name and type; otherwise, it is ignored. This is an optional parameter. For example, suppose the administrator wants to make the following Livelink attributes searchable:
- Attribute name: Account Name, attribute type: String
- Attribute name: Account ID, attribute type: Integer (maps to Number in Oracle SES)
- Attribute name: Creation Date, attribute type: Date
The value of Attribute list should be:

Account Name:String, Account ID:Number, Creation Date:Date

The default searchable attributes for Livelink Enterprise Server are Modified Date, Title, and Author.

Multiple attributes with the same name are not allowed. For example, Emp_ID:String, Emp_ID:Number is invalid.
Crawl versions: Indicates whether multiple versions of documents should be crawled, either true or false. This is an optional parameter and the default value is false. If any other value is provided, it is assumed to be false; in this case, only the latest version of each document is crawled.
Crawl folder attributes: Indicates whether folder attributes should be crawled, either true or false. This is an optional parameter and the default value is false. If any other value is provided, it is assumed to be false.
Authentication attribute: The attribute used to set ACL. With Active Directory, the value is USER_NAME. With the Livelink identity plug-in, the value is NATIVE. This is a required parameter. This parameter is case-sensitive.
Crawl objects with public access: This parameter indicates whether objects with public access should be crawled without any ACL. Valid values are true or false. If false, then all objects having this ACL will be ignored.
Livelink URL: The Livelink URL for viewing objects from the Livelink Server. For example, for Windows, the URL should be (http or) https://<host>/<livelink_service>/livelink.exe. For other application servers like Weblogic, Tomcat, and WebSphere, the URL should be (http or) https://<host>:<port>/<livelink_service>/livelink.
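
The Container name and Attribute list formats described above can be checked before they are entered in the administration page. The following is a minimal sketch, not part of Oracle SES: the function names are hypothetical, and it assumes that attribute names are compared exactly as typed and that only the three documented types are accepted.

```python
VALID_TYPES = {"String", "Number", "Date"}


def parse_attribute_list(value):
    """Parse '<name>:<type>, <name>:<type>' into a list of (name, type) pairs."""
    attributes = []
    seen = set()
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # Split on the last colon so the type is always the final component.
        name, separator, attr_type = entry.rpartition(":")
        name, attr_type = name.strip(), attr_type.strip()
        if not separator or not name:
            raise ValueError(f"expected <attribute name>:<attribute type>, got {entry!r}")
        if attr_type not in VALID_TYPES:
            raise ValueError(f"unsupported attribute type {attr_type!r} for {name!r}")
        if name in seen:
            # The guide disallows multiple attributes with the same name.
            raise ValueError(f"attribute {name!r} is listed more than once")
        seen.add(name)
        attributes.append((name, attr_type))
    return attributes


def parse_container_names(value):
    """Split a comma-delimited Container name value into (workspace, folder path) pairs."""
    containers = []
    for entry in value.split(","):
        parts = [part for part in entry.strip().split("/") if part]
        if not parts:
            continue
        # An empty folder path means the entire workspace is crawled.
        containers.append((parts[0], "/".join(parts[1:])))
    return containers
```

For instance, `parse_container_names("Workspace1, Workspace2/Folder21/")` yields one entry for the whole of Workspace1 and one for Folder21 inside Workspace2.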

Setting Up Oracle Content Database Sources

Documents in Oracle Content Database are organized into folders. Oracle SES navigates the folder hierarchy to crawl all documents in Oracle Content Database. It creates an index, stores the metadata, and accesses information in Oracle SES to provide search according to the end users' permissions.

The metadata crawled includes folder_url (URL of the folder containing the document) and folder_path (path of the folder containing the document). These let you show the direct folder path and direct folder URL for each document hit.

Oracle SES supports incremental crawling; that is, it crawls and indexes only documents that have changed since the last crawl. A document is re-crawled if either the content or the direct security access information of the document changes. A document is also re-crawled if it is moved within Oracle Content Database and the end user has to access the same document with a different URL. Deleted documents are removed from the index during incremental crawling.

Important Notes for Oracle Content Database Sources

This book uses the product name Oracle Content Database to mean both Oracle Content Database and Oracle Content Services. Oracle Content Database sources are certified with Oracle Content Database release 10.2 and release 10.1.3 and Oracle Content Services release 10.1.2.3.

Known Issues

The administrator account used by the Oracle Content Database source must have the ContentAdministrator role on the site that is being crawled and indexed. Also, end users searching documents in Oracle Content Database must have the GetContent and GetMetadata permissions.
By default, Oracle Content Database has a limit of three concurrent requests (simultaneous operations) for each user. However, Oracle SES has a default of five concurrent crawler threads. When crawling Oracle Content Database, only three of the five threads can successfully crawl, which causes the crawl to fail.

Workaround: For an Oracle Content Database source, change the Number of Crawler Threads on the Home - Sources - Crawling Parameters page to a value less than or equal to three.

Or, modify the Oracle Collaboration Suite configuration in Oracle Enterprise Manager to allow more than three concurrent requests. For example:
1. Access the Enterprise Manager page for the Collaboration Suite Midtier. For example: http://computer.domain:1156/.
2. Click the Oracle Collaboration Suite midtier standalone instance name. For example: ocsapps.computer.domain.
3. In the System Components table, click Content.
4. From Administration, click Node Configurations.
5. In the Node Configurations table, click HTTP_Node. For example: ocsapps.computer.domain_HTTP_Node.
6. On Properties, change the value for Maximum Concurrent Requests Per User. Enter a value larger than or equal to the number of crawling threads used by Oracle SES. This value is listed on the Global Settings - Crawler Configuration page.
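
The constraint behind both workarounds is that the number of crawler threads must not exceed the per-user concurrent request limit. As a rough sketch (the function name is hypothetical, and the default of three is the Oracle Content Database default stated above):

```python
def safe_crawler_threads(requested_threads, max_concurrent_requests_per_user=3):
    """Return a crawler thread count that does not exceed the per-user request limit."""
    if requested_threads < 1 or max_concurrent_requests_per_user < 1:
        raise ValueError("both values must be at least 1")
    return min(requested_threads, max_concurrent_requests_per_user)
```

With the Oracle SES default of five threads and the default limit of three, this gives three; if the limit is raised to eight in Oracle Enterprise Manager, all five threads can be kept.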

Setting Up Identity Management for Oracle Content Database Sources

The Oracle SES instance and the Oracle Content Database instance must be connected to the same or mirrored Oracle Internet Directory system or other LDAP server. Follow these steps to set up a secure Oracle Content Database source:

Read Known Issues and confirm that the number of crawler threads does not exceed the available concurrent connection settings for each user in Oracle Content Database.
Activate the Oracle Internet Directory identity plug-in for the Oracle Content Database instance on the Global Settings - Identity Management Setup page in Oracle SES.
For 10.1.2.3 and 10.2.x, use the following LDIF file to create an application entity for the plug-in. (An application entity is a data structure within LDAP used to represent and keep track of software applications accessing the directory with an LDAP client.)
```
$ORACLE_HOME/bin/ldapmodify -h oidHost -p OIDPortNumber -D "cn=orcladmin" -w password -f  $ORACLE_HOME/search/config/ldif/csPlugin.ldif
```
Where $ORACLE_HOME is the directory where Oracle SES was installed.

This defines the entity that will be used for the plug-in: orclApplicationCommonName=ocsCsPlugin,cn=ifs,cn=products,cn=oraclecontext. The entity will have the password welcome1.

Creating an Oracle Content Database Source

If Oracle Content Database release 10.2 or Oracle Content Services release 10.1.2 is used, then the Entity name and Entity password parameters are required, the last six (keystore-related) parameters in the following table are not required, and the crawler plug-in uses service-to-service (S2S) authentication to connect to Oracle Content Database.

If Oracle Content Database release 10.1.3 is used, then the last six parameters in the following table are required, the Entity name and Entity password are not required, and Oracle SES will use Web services authentication to connect to Oracle Content Database.
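
The release-dependent rules above can be summarized in a small lookup. This is an illustrative sketch only: the function name is hypothetical, the parameter names are copied from Table 5-9, and matching releases by string prefix is an assumption made for the example.

```python
ENTITY_PARAMETERS = ["Entity name", "Entity password"]

# The last six parameters of Table 5-9.
KEYSTORE_PARAMETERS = [
    "SES keystore location",
    "SES keystore type",
    "SES keystore password",
    "SES private key alias",
    "SES private key password",
    "CDB Server public key alias",
]


def authentication_settings(release):
    """Return (authentication mode, required parameters) for a release string."""
    if release.startswith("10.1.3"):
        return "Web services", KEYSTORE_PARAMETERS
    if release.startswith("10.2") or release.startswith("10.1.2"):
        return "S2S", ENTITY_PARAMETERS
    raise ValueError(f"release {release!r} is not covered by this guide")
```

For example, `authentication_settings("10.1.3.2.0")` selects Web services authentication and the six keystore parameters.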

Create an Oracle Content Database source on the Home - Sources page. Select Oracle Content Database from the Source Type list, and click Create.

Enter values for the following parameters:

Table 5-9 Oracle Content Database Source Parameters

Parameter	Value
Oracle Content Database URL	http://host name:port/content
Starting paths	/
Depth	-1
Oracle Content Database admin user	orcladmin
Entity name	`orclApplicationCommonName=ocsCsPlugin,cn=ifs,cn=products,cn=oraclecontext`
Entity password	welcome1
Crawl only	false
Use e-mail for authorization	false
Oracle Content Database Version	For example, 10.1.3.2.0
SES keystore location	For example, /scratch/ocs/cdb/cdb-ses/keystore/sesClientKeystore.jks
SES keystore type	jks
SES keystore password	*******
SES private key alias	client
SES private key password	*******
CDB Server public key alias	server

Table 5-10 Oracle Content Database Authorization Manager Plug-in Parameters

Parameter	Value
Oracle Content Database URL	http://host name:port/content
Oracle Content Database admin user	orcladmin
Entity name	`orclApplicationCommonName=ocsCsPlugin,cn=ifs,cn=products,cn=oraclecontext`
Entity password	welcome1
Use e-mail for authorization	false
Use result filter for authorization	false
Oracle Content Database Version	For example, 10.1.3.2.0
SES keystore location	For example, /scratch/ocs/cdb/cdb-ses/keystore/sesClientKeystore.jks
SES keystore type	jks
SES keystore password	********
SES private key alias	client
SES private key password	*******
CDB Server public key alias	server

Note:

You can use a real-time result filter (query-time authorization) to ensure that the user has access to each result document. Set the Use result filter for authorization parameter to true to remove documents that the user has lost access to since the last crawl.

Required Tasks for Oracle Content Database Release 10.1.3

This section describes the required steps for Web services authentication when using Oracle Content Database release 10.1.3. This uses the JDK keytool to create the keys.

Configure a server keystore at the Oracle Content Database middle tier if the keystore is not set up yet.

See Also:
http://download-west.oracle.com/docs/cd/B32110_01/content.1013/b32191/security.htm#CHDGCJEH

The file $ORACLE_HOME/j2ee/OC4J_Content/config/oc4j.properties defines the keystore type and the keystore properties file location. If you use a different file name for the keystore, then edit the file on the following entry: oracle.ifs.security.KeyStoreLocation=/home/oracle/product/10.1.3.2.0/OracleAS_1/content/settings/server-keystore.jks.
1. Change directory to settings:
```
cd $ORACLE_HOME/content/settings 
```
2. Create the Oracle Content Database server keystore with the following keytool command:
```
$ORACLE_HOME/jdk/bin/keytool -genkey -keyalg RSA -validity 5000 
-alias server -keystore server-keystore.jks -dname "cn=server" -keypass 
welcome1 -storepass welcome1
```
  To list the keys in the keystore:
```
$ORACLE_HOME/jdk/bin/keytool -list -keystore server-keystore.jks 
-keypass welcome1 -storepass welcome1
```
3. Sign the key before using the key:
```
$ORACLE_HOME/jdk/bin/keytool -selfcert -validity 5000 -alias server 
-keystore server-keystore.jks -keypass welcome1 -storepass welcome1
```
4. Export the server public key from the server keystore to a file:
```
$ORACLE_HOME/jdk/bin/keytool -export -alias server -keystore 
server-keystore.jks -file cdbServer.pubkey -keypass welcome1 -storepass 
welcome1
```
5. Store both the keystore password and the private server key password in a secure location so Oracle Content Database can access the keystore and the private key.
```
$ORACLE_HOME/content/bin/changepassword -k
```
  When prompted for the old password, press [Enter] if this is the first time the password is being set; otherwise, enter the previous password. Then enter and confirm the keystore password (-storepass welcome1) that you provided when you created the server keystore.
  
  See Also:
  $ORACLE_HOME/content/log/changepassword.log
```
$ORACLE_HOME/content/bin/changepassword -p
```
  When prompted for the old password, press [Enter] if this is the first time the password is being set; otherwise, enter the previous password. Then enter and confirm the private server key password (-keypass welcome1) that you provided when you created the server keystore.

Configure a client keystore at the Oracle SES installation.

See Also:

http://download-west.oracle.com/docs/cd/B32110_01/webcenter.1013/b31074/jpsdg_content.htm#DAFDDBIC

Create the SES client keystore with the following keytool command:

$ORACLE_HOME/jdk/bin/keytool -genkey -keyalg RSA -validity 5000 
-alias client -keystore sesClientKeystore.jks -dname "cn=client" 
-keypass welcome1 -storepass welcome1

To list the keys in the keystore:

$ORACLE_HOME/jdk/bin/keytool -list -keystore sesClientKeystore.jks 
-keypass welcome1 -storepass welcome1

Sign the key before using the key:

$ORACLE_HOME/jdk/bin/keytool -selfcert -validity 5000 -alias client 
-keystore sesClientKeystore.jks -keypass welcome1 -storepass welcome1

Restart the WebCenter middle tier from the Oracle Enterprise Manager console.

Export the client public key from the client keystore to a file:

$ORACLE_HOME/jdk/bin/keytool -export -alias client -keystore 
sesClientKeystore.jks -file sesClient.pubkey -keypass welcome1 
-storepass welcome1

Import Oracle SES client public keys into the Oracle Content Database server keystore (sesClient.pubkey must be copied to Oracle Content Database):

cd $ORACLE_HOME/content/settings
 
$ORACLE_HOME/jdk/bin/keytool -import -alias client -file 
sesClient.pubkey -keystore server-keystore.jks -keypass welcome1 
-storepass welcome1

Import Oracle Content Database server public keys into the Oracle SES keystore (cdbServer.pubkey must be copied to Oracle SES):

$ORACLE_HOME/jdk/bin/keytool -import -alias server -file 
cdbServer.pubkey -keystore sesClientKeystore.jks -keypass welcome1 
-storepass welcome1
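
The server and client procedures above follow the same pattern: generate a key pair, self-sign it, export the public key, and import the other party's public key. The following sketch assembles those keytool command lines as argument lists so that the same sequence can be reviewed or scripted for either side. The function names are hypothetical, and only the keytool options that appear in the commands above are used.

```python
import shlex


def keytool_path(oracle_home):
    """Location of keytool under the given Oracle home, as used in the commands above."""
    return f"{oracle_home}/jdk/bin/keytool"


def keystore_commands(oracle_home, alias, keystore, pubkey_file, keypass, storepass):
    """Return the genkey, selfcert, and export commands as argument lists."""
    keytool = keytool_path(oracle_home)
    passwords = ["-keypass", keypass, "-storepass", storepass]
    return [
        [keytool, "-genkey", "-keyalg", "RSA", "-validity", "5000",
         "-alias", alias, "-keystore", keystore, "-dname", f"cn={alias}", *passwords],
        [keytool, "-selfcert", "-validity", "5000",
         "-alias", alias, "-keystore", keystore, *passwords],
        [keytool, "-export", "-alias", alias, "-keystore", keystore,
         "-file", pubkey_file, *passwords],
    ]


def import_command(oracle_home, alias, pubkey_file, keystore, keypass, storepass):
    """Return the command that imports another party's public key into a keystore."""
    return [keytool_path(oracle_home), "-import", "-alias", alias, "-file", pubkey_file,
            "-keystore", keystore, "-keypass", keypass, "-storepass", storepass]


def render(commands):
    """Join each argument list into a shell-quoted command line for review."""
    return [shlex.join(command) for command in commands]
```

Calling `keystore_commands` with alias client and keystore sesClientKeystore.jks reproduces the three client-side commands; `import_command` covers the final exchange in each direction. The sketch only builds the command lines and does not run keytool.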

Note:

Check the server logs at $ORACLE_HOME/content/logs for keystore issues with the crawler plug-in.

Setting Up Oracle Content Server Sources

Oracle Content Server (formerly known as Stellent Content Server) is the foundation of the Oracle Universal Content Management solution. It enables users throughout the organization to contribute content from native desktop applications, manage content through rich library services, publish content to Web sites or business applications, and access the content with a browser. The Oracle Content Server connector is based on the XML connector framework.

See Also:

"Overview of XML Connector Framework"
Stellent documentation on Oracle Technology Network (OTN) for information about Oracle Content Server:

http://www.oracle.com/technology/documentation

Oracle Content Server includes an XML feed generator component (RSSCrawlerExport) on top of the content server. This component generates XML feeds as XML files from its internal indexer, based on indexer activity. It has access to the original content (for example, a Microsoft Word document), the Web viewable rendition, and all the metadata associated with each document. The component also has a template that contains an Idoc script (Idoc is an Oracle Content Server scripting language) that applies the metadata values from the indexer to generate the XML document. Oracle Content Server generates feeds for all documents for the initial crawl, as well as feeds for newly inserted, updated, and deleted documents for the incremental crawl. Each document can be an item in the feed, together with the operation on the item (for example, insert, delete, update), its metadata (for example, author, summary), URL links, and so on.

The Oracle Content Server connector reads the feeds provided by Oracle Content Server according to the crawling schedule. Oracle SES parses the feeds, extracts the metadata, and fetches the document content using its XML connector framework.

Oracle SES supports two types of feeds.

Control feed: Individual feeds can be located anywhere, and a control feed file is generated containing the links to the other feeds. This control file is input to the connector through the configuration file. A control feed must be used when the two computers are on different domains or on different platforms, or if they use a remote access protocol, such as HTTP or FTP, for communication between the two servers.
Directory feed: All data feeds are placed in a directory, and this directory is input to the connector through the configuration file. A common directory feed configuration is to install Oracle SES on a computer where there is a shared drive with the Oracle Content Server computer.
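
The choice between the two feed types can be expressed as a simple rule. The sketch below is illustrative only: the function name and the returned labels are hypothetical and are not configuration values.

```python
def choose_feed_type(same_domain, same_platform, uses_remote_protocol):
    """Pick a feed type from the deployment characteristics described above."""
    # A control feed is required when the computers differ in domain or platform,
    # or when they communicate over a remote access protocol such as HTTP or FTP.
    if not same_domain or not same_platform or uses_remote_protocol:
        return "control feed"
    # Otherwise a shared directory can be used.
    return "directory feed"
```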

Important Notes for Oracle Content Server Sources

This section provides important information about Oracle Content Server sources.

To index multibyte character sets, the default character set of the crawler (Home - Sources - Crawling Parameters) must be set to UTF-8 regardless of the character set on the Oracle Content Server side.

Required Software

Oracle Content Server 7.1.1, 7.5.2 or 10gR3 with RSSCrawlerExport (the Oracle Content Server XML component)

Known Limitations

Before re-crawling all documents, get a snapshot of the RSSCrawlerExport service.
If you have documents with multibyte characters, then set standard UTF-8 as the default character set of crawling parameters.
The file feed location must be referenced the same way on both computers; for example, /shared_drive/dir1/dir2 or \\<computer_name or IP>\<feeddirectory>.
If the Oracle Content Server feeds for Oracle SES are on a network drive, then the Oracle process should be started as a user who has access to the drive.

See Also:
"Required Tasks" for instructions on how to change the user running the Oracle process.

Oracle Content Server Security Model

The Oracle Content Server security model is based on the concept of permission, which defines the privileges a user has on a document. The following table shows the set of permissions supported by Oracle Content Server. Each permission is a superset of the ones above. For example, a write permission automatically includes read permission. An admin permission is a superset of all the permissions.

Table 5-11 Oracle Content Server Permissions

Permission	Description
Read	View documents
Write	View, Check In, Check Out, and Get Copy of documents
Delete	View, Check In, Check Out, Get Copy, and Delete documents
Admin	View, Check In, Check Out, Get Copy, and Delete documents
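
Because each permission includes the ones listed before it, a permission check reduces to comparing positions in the ordered list. A minimal sketch, with a hypothetical function name:

```python
# Permissions in the order of Table 5-11, weakest first.
PERMISSION_ORDER = ["Read", "Write", "Delete", "Admin"]


def includes(granted, required):
    """Return True if the granted permission implies the required permission."""
    return PERMISSION_ORDER.index(granted) >= PERMISSION_ORDER.index(required)
```

For example, `includes("Write", "Read")` is true because Write automatically includes Read, while `includes("Read", "Write")` is false.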

Oracle Content Server provides multiple security models, including out-of-box security system and integration with centralized security models such as LDAP and Active Directory. The Oracle Content Server connector supports the two most popular security models among current Oracle Content Server customers: Roles and Groups, and Accounts.

Roles and Groups

A security group is a set of files grouped under a unique name. Every file in the library belongs to a security group. Access to security groups is controlled by the permissions, which are assigned to roles, which are assigned to users. For example, the EngAdmin role has Read, Write, Delete, and Admin permission to all content in the EngDocs security group. User Joe is assigned to role EngAdmin; therefore, Joe has all permissions to the documents in EngDocs group.

Accounts

Accounts provide greater flexibility and granularity than groups. An account is a group of content. It introduces another metadata field that is filled out upon content check-in. When accounts are enabled, content items can also be assigned to an account in addition to the security group. A user must have access to the account to read, write, delete, or administer content in that account. When accounts are used, the account becomes the primary permission to satisfy before security group permissions are applied.

A user's access to a document is the intersection of the user's account permissions and security group permissions. For example, a user is assigned the EngAdmin role, which has all permissions to the documents in the EngDocs security group. At the same time, the user is also assigned Read and Write permission to the EngProjA account. Therefore, the user has only Read and Write permission to a content item that is in the EngDocs security group and the EngProjA account.
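
The EngDocs and EngProjA example can be worked through as a set intersection. This is a sketch under the assumption that each grant is expressed as the highest permission held; the function names are hypothetical.

```python
LEVELS = ("Read", "Write", "Delete", "Admin")


def implied(permission):
    """Return the set of permissions implied by the highest permission granted."""
    return set(LEVELS[: LEVELS.index(permission) + 1])


def effective_permissions(group_permission, account_permission):
    """Intersect security group permissions with account permissions."""
    return implied(group_permission) & implied(account_permission)
```

With Admin on the EngDocs security group and Write on the EngProjA account, `effective_permissions("Admin", "Write")` contains only Read and Write.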

Accounts can also be set up in a hierarchical structure. A user has permission to the entire subtree starting from the node where the user is assigned the account. For instance, a user assigned the Eng account has access to Eng/AbcProj and Eng/XyzProj, or any account beginning with Eng. In other words, a user with permission to a particular account prefix has access to all accounts with that prefix.

Note:

Oracle Content Server uses a prefix test for accounts filtering; therefore, '/' has no special meaning. A user granted permission to account A has access to any documents in account A*, such as A, AB, or A/B. The hierarchical structure takes advantage of the prefix semantics, but it is not enforced with the account model. Hence, there is no special character as the level divider when testing for account permissions.
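
The prefix test can be illustrated with a plain string comparison. This is a sketch of the semantics described in the note, not the Oracle Content Server implementation; the function name is hypothetical.

```python
def can_access_account(granted_accounts, document_account):
    """Return True if any granted account is a prefix of the document's account."""
    return any(document_account.startswith(prefix) for prefix in granted_accounts)
```

A grant on Eng therefore matches Eng, Eng/AbcProj, and also Engineering, because '/' is not treated as a level divider. A grant on Eng/AbcProj does not match Eng.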

See Also:

Oracle Content Server documentation

Setting Up Identity Management for Oracle Content Server

Activate the Oracle Content Server identity plug-in on the Global Settings - Identity Management Setup page. Select Oracle Content Server and click Activate.

Enter values for the following parameters:
- HTTP endpoint for authentication: HTTP endpoint for Oracle Content Server authentication. For example, http://my.host.com/idc/idcplg
- Admin User: Administrative user to access the Oracle Content Server Identity Service API
- Password: Administrative user password
Click Finish.

Creating an Oracle Content Server Source

Create an Oracle Content Server source on the Home - Sources page. Select Oracle Content Server from the Source Type list, and click Create. Enter values for the following parameters:
- Configuration File URL: URL of the XML configuration file providing details of the source, such as the data feed type, location, security attributes, and so on. The RSSCrawlerExport component creates configFile.xml in the feed location directory. Obtain the location of the file from the Oracle Content Server administrator.
 
 This file can be accessed over HTTP using the URL: http://<host>:<port>/<server Instance Name>/idcplg?IdcService=RSS_CRAWLER_DOWNLOAD_CONFIG&source=<sourceName>
 
 For example, http://stawg07.us.oracle.com:90/idc/idcplg?IdcService=RSS_CRAWLER_DOWNLOAD_CONFIG&source=ocs1
 
 Note: This sourceName is different from the Oracle SES source name. This should match the name provided while configuring the RSSCrawlerExport component over Oracle Content Server.
- Authentication Type: Standard Java authentication type used by the application serving the control and data feed. This parameter is relevant when the feeds are accessed over HTTP. Enter BASIC for basic authentication, FORM for form-based authentication, or NATIVE for proprietary XML over HTTP authentication.
 
 This parameter is not required for directory feed.
- User ID: User ID to access the data feeds, if the data feeds are to be accessed over HTTP/FTP. The access details of the data feed are specified in the configuration file. This can be obtained from the Oracle Content Server administrator.
 
 This parameter is not required for directory feed on a shared file system.
- Password: Password to access the data feeds. This can be obtained from the Oracle Content Server administrator.
 
 This parameter is not required for directory feed on a shared file system.
- Realm: Realm of the application serving the control and data feed. This parameter is relevant when the feeds are accessed over HTTP, and it is mandatory when the authentication type is BASIC.
 
 This parameter is not required for directory feed.
- Scratch Directory: A directory, in the computer where Oracle SES is installed, to temporarily write the status logs.
 
 This parameter is optional.
- Maximum number of connection attempts: Maximum number of attempts to connect to the target server to access the data feed.
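
The Configuration File URL format given above can be assembled from its parts as follows. This is a small sketch with a hypothetical function name; it assumes plain HTTP unless another scheme is passed.

```python
from urllib.parse import urlencode


def config_file_url(host, port, instance_name, source_name, scheme="http"):
    """Build the RSSCrawlerExport configuration file URL for a feed source."""
    query = urlencode({
        "IdcService": "RSS_CRAWLER_DOWNLOAD_CONFIG",
        # The source name configured in RSSCrawlerExport, not the Oracle SES source name.
        "source": source_name,
    })
    return f"{scheme}://{host}:{port}/{instance_name}/idcplg?{query}"
```

Calling `config_file_url("stawg07.us.oracle.com", 90, "idc", "ocs1")` reproduces the example URL shown for the Configuration File URL parameter.
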
Click Next.
Enter values for the authorization plug-in parameters:
- HTTP endpoint for authorization: HTTP endpoint for Oracle Content Server authorization. For example, http://my.host.com/idc/idcplg
- Display URL Prefix: HTTP host information to prefix the partial URL specified in the access URL of the documents in XML feeds to form the complete URL. This complete URL will be the display URL of the document when the document link in the Oracle SES search results page is clicked. For example, http://my.host.com/.
- Administrator User: Administrative user to access the Authorization Service API of Oracle Content Server
- Administrator Password: Administrative user password
- Display crawled version: If set to 'true', then the search result points to the crawled version of the document; if set to 'false', then the result points to the content information page. Currently, only 'false' is supported.
- Authorization user ID format: Format of user ID in the active identity plug-in that is used by Oracle Content Server Authorization API. For example, username, email, nickname, user_name.
  
  For the Oracle Content Server native identity plug-in, this parameter should be username.
  
  For Active Directory, Oracle Internet Directory, or OpenLDAP, this parameter depends on the LDAP provider of Content Server. If ldapprovider is configured to use the user ID, then this parameter is user_name for Active Directory and OpenLDAP, and nickname for Oracle Internet Directory. If ldapprovider is configured to use another attribute, such as e-mail, then this parameter should be email.
Click Create to create the Oracle Content Server source.
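
The rules for the Authorization user ID format parameter can be summarized as a lookup. In this sketch the function name and the `ldap_login_attribute` labels ("user_id", "email") are hypothetical; the returned values are the ones named in the parameter description.

```python
def authorization_user_id_format(identity_plugin, ldap_login_attribute="user_id"):
    """Return the Authorization user ID format for the active identity plug-in."""
    if identity_plugin == "Oracle Content Server":
        # The native identity plug-in always uses username.
        return "username"
    if identity_plugin not in ("Active Directory", "OpenLDAP", "Oracle Internet Directory"):
        raise ValueError(f"unknown identity plug-in {identity_plugin!r}")
    if ldap_login_attribute == "email":
        return "email"
    if ldap_login_attribute != "user_id":
        raise ValueError(f"no documented format for attribute {ldap_login_attribute!r}")
    if identity_plugin == "Oracle Internet Directory":
        return "nickname"
    return "user_name"
```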