Setting Up IBM DB2 Content Manager Sources

The IBM DB2 Content Manager (ICM) plug-in extends the search capabilities of Oracle SES to ICM repositories, which consist of item types and their instances in the form of folders and documents. Oracle SES can crawl documents and metadata in the ICM Library Server and provide secure, full-text search. Starting from each specified folder, the plug-in extends the crawl, and thus the search, into that folder's complete child tree. If an item type is specified for crawling, then the plug-in crawls all instances of that item type and their complete child trees.

In ICM, the Library Server manages the content metadata and the access control for all content in a database (such as DB2), and interfaces with one or more resource managers. The primary job of the Library Server is to service client requests for content. The ICM plug-in navigates through the Library Server to crawl the documents and folders of the specified item types. It stores the metadata and access information in Oracle SES so that search results can be returned according to the credentials of end users.

While the crawler connects to the Library Server through the APIs, the Library Server connects internally to the resource manager through CM-managed secure tokens. Whenever a document object is referenced, it is fetched from the resource manager using these tokens. With the crawler plug-in, the metadata for a document is retrieved from the Library Server, while the display URL uses a token to point to the document object on the resource manager.

Oracle SES supports incremental crawling; that is, it crawls and indexes only those documents that have changed since the most recent crawl. A document is re-crawled if its content, metadata, display URL, or direct security access information has changed. Documents deleted from ICM are removed from the index during incremental crawling.
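The re-crawl rule above can be illustrated with a small change-detection sketch. It is not how Oracle SES implements incremental crawling internally; it only assumes a hypothetical store of one fingerprint per document from the previous crawl. The names `DocState`, `fingerprint`, and `plan_incremental_crawl`, and the example document ids and URLs, are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from hashlib import sha256


@dataclass(frozen=True)
class DocState:
    """Hypothetical snapshot of the fields whose change triggers a re-crawl."""
    content: bytes
    metadata: tuple      # e.g. (("Title", "Q1"),)
    display_url: str
    acl: tuple           # principals with direct access


def fingerprint(doc: DocState) -> str:
    """Hash content, metadata, display URL, and ACL together."""
    fields = (doc.content, doc.metadata, doc.display_url, doc.acl)
    return sha256(repr(fields).encode("utf-8")).hexdigest()


def plan_incremental_crawl(previous: dict[str, str],
                           current: dict[str, DocState]) -> tuple[list[str], list[str]]:
    """Compare stored fingerprints with the repository's current state.

    Returns (document ids to re-crawl, document ids to drop from the index).
    New documents have no stored fingerprint, so they are crawled too.
    """
    recrawl = sorted(doc_id for doc_id, state in current.items()
                     if previous.get(doc_id) != fingerprint(state))
    deleted = sorted(set(previous) - set(current))
    return recrawl, deleted


# Usage with hypothetical document ids and URLs.
doc1 = DocState(b"v1", (("Title", "Q1"),), "http://rm.example/doc1", ("alice",))
doc2 = DocState(b"v1", (("Title", "Q2"),), "http://rm.example/doc2", ("bob",))
stored = {"doc1": fingerprint(doc1), "doc2": fingerprint(doc2), "doc3": "old"}

# doc2 gains a reader (a security change); doc3 was deleted from the repository.
doc2_now = DocState(b"v1", (("Title", "Q2"),), "http://rm.example/doc2", ("bob", "carol"))
print(plan_incremental_crawl(stored, {"doc1": doc1, "doc2": doc2_now}))
# (['doc2'], ['doc3'])
```

Hashing all four fields together means a change to any one of them, including only the ACL, is enough to schedule the document again, which mirrors the rule stated above.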

Important Notes for IBM DB2 Content Manager Sources

  • The user account used to crawl the specified item types must be an administrator account that has access to all instances (documents and folders) of the specified item types and can retrieve and crawl all folders and documents. This user must belong to the ICMPUBLIC group and be assigned the AllPrivs privilege set.

  • The version of DB2 Content Manager used to set up the repositories for crawling must be 8.3.

Required Software

This section lists the software required for DB2 Content Manager 8.3, in order of installation:

Server Software Requirements (Computer with ICM Server): 

  • Windows Server 2003 Enterprise Edition

  • IBM WebSphere Application Server 5.1 plus FixPak 1

  • IBM DB2 Universal Database Enterprise Server Edition (32-bit): 8.1 plus FixPak 7A special or version 8.2 plus FixPak 7A special

  • DB2 Content Manager Enterprise Edition 8.3 plus FixPak 1

  • DB2 Information Integrator for Content 8.3 with Fix Pack 3

  • DB2 Content Manager eClient 8.3

Client Software Requirements (Computer with Oracle SES): 

  • IBM DB2 Run-Time Client: 8.1 plus FixPak 7A special or version 8.2 plus FixPak 7A special

  • DB2 Information Integrator for Content 8.3 with Fix Pack 3

  • DB2 Content Manager Client for Windows 8.3 (optional for Windows)

Required Tasks on the Server

The following tasks must be performed on the computer with the ICM server.

To install and configure the system with the ICM server: 

  1. Install DB2 Content Manager 8.3 with the required fix packs.

  2. Enable LDAP in DB2 Content Manager:

    1. Open the System Administration Client.

    2. Select Tools - LDAP Configuration to display the LDAP Configuration window.

    3. Select Enable LDAP User import and authentication.

    4. On the Server tab, select server type Active Directory.

    5. Provide the LDAP server information on the Server tab.

    6. Click OK.

  3. Import users and groups from the Active Directory to ICM:

    1. In the System Administration Client, click Authentication, and then right-click either Users or User-Groups.

    2. Click the LDAP button and then enter the user to be imported into ICM. To view a list of all valid user names, click Show All.

    3. Select one or more users and click OK.

    4. From the Assign to Groups tab, assign the users to the required groups.

    5. From the Set Defaults tab, specify the default resource manager, collection, and item access control list for the users, user-groups, or both.

    6. Click OK or Apply.

      The selected users and user-groups are imported into the DB2 CM environment.

    7. To verify the import, select Users or User-Groups. The imported users or user-groups appear in the list on the right.

Required Tasks on the Client Side

Catalog the DB2 Content Manager Library Server database on the DB2 Run-Time Client.

To install and configure the system with Oracle SES: 

  1. Locate the services file in the \WINDOWS\system32\drivers\etc directory (or a similar directory) on Windows, or in the /etc directory on Linux.

  2. Open the services file in a text editor and add a line in the following format:

    service_name    port_number/tcp    # DB2 connection service port

    For example:

    db2c_DB2    50000/tcp    # DB2 connection service port
    
  3. Enter the following command in the DB2 command line processor (CLP), where node_name is any name you choose:

    catalog tcpip node node_name remote [IP_address | host] server service_name
    

    In this example, node_name is CMDB, host is my_computer, and service_name is db2c_DB2:

    catalog tcpip node CMDB remote my_computer server db2c_DB2
    
  4. Enter the following command, where database_name is the name of the Library Server database, database_alias is a name you choose, and node_name is the node cataloged in the previous step:

    catalog db database_name as database_alias at node node_name
    

    In this example, the alias is the same as the database name (ICMNLSDB) and the node name is CMDB.

    catalog db ICMNLSDB as ICMNLSDB at node CMDB
    
  5. To check the connection, issue the following command:

    connect to database_alias user database_user using password
    

    In this example, the ICMADMIN user connects to ICMNLSDB.

    connect to ICMNLSDB user ICMADMIN using password
    
  6. Issue the following query. All table names in the database are listed.

    select tabname from syscat.tables
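If you catalog several clients, the services-file entry and CLP commands from the steps above can be generated rather than typed by hand. This is a hypothetical helper, not part of DB2 or Oracle SES; the default values are the examples used in this section, and `PASSWORD` is a literal placeholder that you must replace.

```python
"""Print the services-file entry and the CLP commands from the client-side steps.

Hypothetical helper; all values below are the examples from this section.
"""


def services_entry(service_name: str, port: int) -> str:
    """One services-file line: service name, port/tcp, and a comment."""
    if not 0 < port < 65536:
        raise ValueError(f"invalid TCP port: {port}")
    return f"{service_name}    {port}/tcp    # DB2 connection service port"


def clp_commands(node: str, host: str, service: str,
                 database: str, alias: str, user: str) -> list[str]:
    """Commands for steps 3 to 6, in order, to enter at the DB2 CLP prompt."""
    return [
        f"catalog tcpip node {node} remote {host} server {service}",
        f"catalog db {database} as {alias} at node {node}",
        # PASSWORD is a placeholder; substitute the real password when you enter it.
        f"connect to {alias} user {user} using PASSWORD",
        "select tabname from syscat.tables",
    ]


if __name__ == "__main__":
    print(services_entry("db2c_DB2", 50000))
    for command in clp_commands("CMDB", "my_computer", "db2c_DB2",
                                "ICMNLSDB", "ICMNLSDB", "ICMADMIN"):
        print(command)
```

The service name passed to `services_entry` and to `clp_commands` must be the same, because the catalog command refers to the port through the services-file entry.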

Known Issues

  • Oracle SES does not crawl folders whose attributes are all blank.

  • The ICM plug-in does not support CLOB attributes because of a limitation when using these attributes with XPath queries.

  • To use the ICM eClient application to view search results, Oracle recommends that users log in to eClient first and then open the Oracle SES search screen in the same window. If a user opens the Oracle SES search results directly, then ICM eClient may prompt the user to log in, after which the user must manually refresh the Oracle SES page to view the selected document.

  • Changing the ACL of an item type does not update the items or documents of that item type, or their last-modified dates. As a result, when an item type ACL is changed from the System Administration Client, the change takes effect on the items and documents of that item type only after a forced re-crawl. To force a re-crawl, change the re-crawl policy to Process All Documents on the Home - Schedules - Edit Schedule page.

  • When crawling an item type hierarchy of multiple levels, the crawler might report this error:

    com.ibm.mm.sdk.common.DKUsageError: DGL7146A: The query string is too long or too complex

    CM queries are limited to 64K in length. DB2 UDB has no such restriction, so the problem can be fixed by removing the 64K limit check from the API and allowing the Library Server database to determine the limit.
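Before configuring a deep or heavily filtered container, it can help to estimate whether a query approaches the 64K limit behind the DGL7146A error. This sketch assumes that "64K" means 64 × 1024 characters (the API may count bytes instead), and the query string is a hypothetical illustration, not the query the ICM plug-in actually generates.

```python
# Assumption: "64K" is interpreted as 64 * 1024 characters; the real API
# check may count bytes or use a different threshold.
CM_QUERY_LIMIT = 64 * 1024


def fits_cm_query_limit(query: str, limit: int = CM_QUERY_LIMIT) -> bool:
    """Return True if the query is within the assumed CM length limit."""
    return len(query) <= limit


# Hypothetical query: many OR-ed conditions on a nested item type.
conditions = " OR ".join(f'@Attribute-1="Value-{i}"' for i in range(5000))
query = f"/Level-1/Level-2/Level-3[{conditions}]"
print(len(query), fits_cm_query_limit(query))
```

Each condition in this example adds more than 20 characters, so a few thousand of them are enough to pass the assumed limit.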

Setting Up Identity Management for DB2 Content Manager

Activate the ICM identity plug-in on the Global Settings - Identity Management Setup page with the following parameters:

  • Library Server name: The alias of the DB2 Content Manager Library Server to connect to in order to retrieve all item types required for crawling.

  • User name: The user name of a valid ICM server user. Required.

  • Password: Password of the ICM user. Required.

  • ICM Servers File: Specifies the absolute path of the cmbicmsrvs.ini file. This INI file stores the source information for the data store.

  • ICM Environment File: Specifies the absolute path of the cmbicmenv.ini file. This INI file stores the database connection information.

The required ICM Servers (cmbicmsrvs.ini) and ICM Environment (cmbicmenv.ini) files are located on the client side (the computer with Oracle SES):

    ICM_InstallationFolder/cmgmt/connectors/cmbicmsrvs.ini

    ICM_InstallationFolder/cmgmt/connectors/cmbicmenv.ini
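The identity plug-in parameters above can be checked before you activate the plug-in. This is a hypothetical pre-flight sketch: the dictionary keys simply reuse the parameter labels from this page, `/opt/ICM` is a made-up installation folder, and path handling follows the conventions of the operating system the script runs on.

```python
from __future__ import annotations

from pathlib import PurePath

# File names from this section; the installation folder is site-specific.
SERVERS_INI = "cmbicmsrvs.ini"
ENV_INI = "cmbicmenv.ini"


def default_ini_paths(icm_install_folder: str) -> dict[str, str]:
    """Build the two INI paths under ICM_InstallationFolder/cmgmt/connectors."""
    base = PurePath(icm_install_folder) / "cmgmt" / "connectors"
    return {
        "ICM Servers File": str(base / SERVERS_INI),
        "ICM Environment File": str(base / ENV_INI),
    }


def validate_identity_settings(settings: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the settings look usable."""
    problems = []
    for key in ("User name", "Password"):  # marked Required on this page
        if not settings.get(key):
            problems.append(f"{key} is required")
    for key, expected in (("ICM Servers File", SERVERS_INI),
                          ("ICM Environment File", ENV_INI)):
        path = PurePath(settings.get(key, ""))
        if not path.is_absolute():
            problems.append(f"{key} must be an absolute path")
        elif path.name != expected:
            problems.append(f"{key} should point to {expected}")
    return problems


# Hypothetical values; /opt/ICM stands in for your ICM installation folder.
settings = {
    "Library Server name": "ICMNLSDB",
    "User name": "ICMADMIN",
    "Password": "secret",
    **default_ini_paths("/opt/ICM"),
}
print(validate_identity_settings(settings))  # prints [] when run on Linux
```

The check only verifies the shape of the values; it does not open the INI files or connect to the Library Server.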

Creating an IBM DB2 Content Manager Source

Create a source for the newly created user-defined source type on the Home - Sources page. Enter a source name, and provide values for these configuration parameters:

  • Container name: The item types to be crawled. This can be a specific item type whose instances must be crawled, or a folder or subfolder if all item types inside that folder or subfolder must be crawled. A container name can combine multiple item types delimited by a slash (/); a backslash (\) is not a valid delimiter.

    Container names must be in the format:

    parent_item_type_name[@parent_attribute_name=attribute_value]/child_item_type_name[@child_attribute_name=child_attribute_value]

    or

    child_item_type_name[@parent_attribute_name=attribute_value,@child_attribute_name=child_attribute_value]

    For example, suppose you have a root-component item type named Level-1 with an attribute Attribute-1 whose value is Value-1. Another item type, Level-2, is a child of Level-1 and has the attributes Attribute-1 (linked with Level-1) and Attribute-2, whose value is Value-2. A third item type, Level-3, is a child of Level-2 and has the attributes Attribute-1 and Attribute-2 (linked attributes) and Attribute-3, whose value is Value-3.

    To crawl all items of item type Level-3, the container name is:

    Level-1[@Attribute-1="Value-1"]/Level-2[@Attribute-2="Value-2"]/Level-3
    

    or

    Level-3[@Attribute-1="Value-1" AND @Attribute-2="Value-2"]
    

    The values for String and Date attributes are enclosed in double quotes while the values for Number attributes are not.

  • Attribute list: A comma-delimited list of ICM attributes, each with its data type, to make searchable. The format is:

    AttributeName:AttributeType, AttributeName:AttributeType

    Valid values are String, Number, and Date.

    The crawler indexes an attribute only if both its name and type match the configured name and type; otherwise, the attribute is ignored. Optional.

    The default searchable attributes for ICM are Modified Date, Title, and Author. Attribute names are case-sensitive, and the list cannot contain multiple attributes with the same name.

  • User name: The ICM user name used for crawling. It must be a user with at least read privileges on the configured item types. This user's session with ICM is used to retrieve ACLs, document lists, metadata, and content.

  • Password: The password of the ICM user specified in User name.

  • Crawl versions: Controls whether all versions of a document are crawled or only the latest version. Valid values are true and false. The default value is false. Any other value is interpreted as false.

  • Crawl folder attributes: Controls whether folder metadata is indexed. Valid values are true and false. The default value is false.

  • Library server name: The alias of the DB2 Content Manager Library Server to connect to in order to retrieve all item types required for crawling.

  • Remove URL not in queue: Controls whether documents deleted from ICM are also removed from the index. Valid values are true and false. The default value is false.

  • Authentication attribute: The authentication attribute used to validate the ACL. The value is USER_NAME for the Active Directory identity plug-in and NATIVE for the ICM identity plug-in. Required.

  • WebClient path: The path of an optional Web application used to render the search results. ICM can render search results in ICM eClient or in a custom Web application, which must be deployed separately on the ICM application server.

  • Title field: A case-sensitive, comma-delimited list of attributes that can be used as titles in the ICM containers specified for crawling. Required.

  • Time Zone: The time zone of the ICM Library Server. Because the ICM Library Server can be in a different time zone from the Oracle SES server, this setting lets Oracle SES convert times to the ICM time zone for time-based queries. If an invalid time zone is entered, then GMT is used.

  • ICM Servers File: The absolute path of the cmbicmsrvs.ini file. This INI file stores the source information for the data store.

  • ICM Environment File: The absolute path of the cmbicmenv.ini file. This INI file stores the database connection information.

  • Use ICM eClient to view search results: Controls whether ICM eClient is used to view search results or some other Web application. Enter true for ICM eClient; false otherwise.
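The Container name rules above (slash as the only delimiter, attribute conditions in brackets, String and Date values quoted, Number values not) can be captured in a small builder. This is a hypothetical sketch, not part of the ICM plug-in: it joins multiple conditions with `AND`, following the Level-3 example rather than the comma form shown in the format line, expects Date values already formatted as strings, and rejects values containing double quotes because escaping rules are not covered here.

```python
from __future__ import annotations


def format_value(value: str | int | float) -> str:
    """Quote String and Date values; leave Number values unquoted."""
    if isinstance(value, bool):
        raise TypeError("boolean values are not a documented attribute type")
    if isinstance(value, (int, float)):
        return str(value)
    if '"' in value:
        # Assumption: escaping embedded quotes is out of scope for this sketch.
        raise ValueError(f"value contains a double quote: {value!r}")
    return f'"{value}"'


def segment(item_type: str, filters: dict[str, str | int | float] | None = None) -> str:
    """One path step: an item type name plus optional attribute conditions."""
    if "/" in item_type or "\\" in item_type:
        raise ValueError(f"item type name must not contain a delimiter: {item_type!r}")
    if not filters:
        return item_type
    conditions = " AND ".join(f"@{name}={format_value(v)}" for name, v in filters.items())
    return f"{item_type}[{conditions}]"


def container_name(*segments: str) -> str:
    """Join path steps with '/', the only accepted delimiter."""
    return "/".join(segments)


nested = container_name(
    segment("Level-1", {"Attribute-1": "Value-1"}),
    segment("Level-2", {"Attribute-2": "Value-2"}),
    segment("Level-3"),
)
flat = segment("Level-3", {"Attribute-1": "Value-1", "Attribute-2": "Value-2"})
print(nested)  # Level-1[@Attribute-1="Value-1"]/Level-2[@Attribute-2="Value-2"]/Level-3
print(flat)    # Level-3[@Attribute-1="Value-1" AND @Attribute-2="Value-2"]
```

The `bool` check comes first because Python treats `True` and `False` as integers, which would otherwise slip through as unquoted Number values.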
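Several of the source parameters above have simple parsing rules: the Attribute list format, the true/false flags where any other value means false, and the Time Zone fallback to GMT. The following hypothetical sketch shows one way to apply those rules. It assumes that flag values are compared case-insensitively, that attribute types must be spelled exactly String, Number, or Date, and that time zones are IANA names resolved with Python's `zoneinfo`; none of this is taken from the plug-in's own code.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone, tzinfo
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError

VALID_TYPES = ("String", "Number", "Date")
GMT = timezone(timedelta(0), "GMT")


def parse_attribute_list(text: str) -> dict[str, str]:
    """Parse 'Name:Type, Name:Type' into {name: type}, rejecting bad entries."""
    attributes: dict[str, str] = {}
    for entry in text.split(","):
        entry = entry.strip()
        if not entry:
            continue
        name, sep, attr_type = entry.rpartition(":")
        name, attr_type = name.strip(), attr_type.strip()
        if not sep or not name:
            raise ValueError(f"expected AttributeName:AttributeType, got {entry!r}")
        if attr_type not in VALID_TYPES:
            raise ValueError(f"{name}: type must be one of {VALID_TYPES}")
        if name in attributes:  # names are case-sensitive, so Title != title
            raise ValueError(f"duplicate attribute {name!r}")
        attributes[name] = attr_type
    return attributes


def parse_flag(value: str | None) -> bool:
    """'true' enables the option; anything else (including None) means false."""
    # Assumption: the comparison ignores case and surrounding spaces.
    return (value or "").strip().lower() == "true"


def resolve_time_zone(name: str) -> tzinfo:
    """Return the named IANA time zone, or GMT if the name is not valid."""
    try:
        return ZoneInfo(name)
    except (ZoneInfoNotFoundError, ValueError, OSError):
        return GMT


def to_icm_time(local: datetime, icm_zone: tzinfo) -> datetime:
    """Express an aware datetime in the Library Server's time zone."""
    return local.astimezone(icm_zone)


# Hypothetical values; the types given to the default attributes are assumptions.
attrs = parse_attribute_list("Modified Date:Date, Title:String, Author:String")
print(attrs)  # {'Modified Date': 'Date', 'Title': 'String', 'Author': 'String'}
print(parse_flag("true"), parse_flag("yes"))  # True False
zone = resolve_time_zone("Not/A_Zone")  # invalid name, so GMT is used
print(to_icm_time(datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc), zone))
# 2024-01-01 12:00:00+00:00
```

Splitting each entry on its last colon keeps attribute names such as Modified Date, which contain spaces, intact.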