Oracle® Secure Enterprise Search Administrator's Guide
11g Release 2 (11.2.2)

Part Number E23427-01

Setting Up Oracle Content Database Sources

Documents in Oracle Content Database are organized into folders. Oracle SES navigates the folder hierarchy to crawl all documents in Oracle Content Database. It indexes the documents and stores their metadata and access information in Oracle SES, so that search results respect each end user's permissions.

The metadata crawled includes folder_url (URL of the folder containing the document) and folder_path (path of the folder containing the document). These let you show the direct folder path and direct folder URL for each document hit.

Oracle SES supports incremental crawling; that is, it crawls and indexes only documents that have changed since the last crawl. A document is re-crawled if either its content or its direct security access information changes. A document is also re-crawled if it is moved within Oracle Content Database, because end users must then access it with a different URL. Deleted documents are removed from the index during incremental crawling.
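The re-crawl rules above can be summarized as a small decision function. This is a minimal sketch, not Oracle SES internals: recrawl_action is a hypothetical helper, and the content stamp, ACL checksum, and URL it compares are assumed stand-ins for whatever change information the crawler actually tracks.

```shell
# Hypothetical sketch of the incremental-crawl decision described above.
# Arguments: old and new content stamp, old and new ACL checksum,
# old and new URL. An empty new content stamp means the document is gone.
recrawl_action() {
    local old_content="$1" new_content="$2"
    local old_acl="$3" new_acl="$4"
    local old_url="$5" new_url="$6"

    if [ -z "$new_content" ]; then
        echo delete      # deleted documents are removed from the index
    elif [ "$old_content" != "$new_content" ] ||
         [ "$old_acl" != "$new_acl" ] ||
         [ "$old_url" != "$new_url" ]; then
        echo recrawl     # content, direct security access, or URL changed
    else
        echo skip        # unchanged since the last crawl
    fi
}
```

For example, recrawl_action v1 v1 acl1 acl1 /a/x.doc /b/x.doc prints recrawl, because only the URL changed after the document was moved.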

Important Notes for Oracle Content Database Sources

This book uses the product name Oracle Content Database to mean both Oracle Content Database and Oracle Content Services. Oracle Content Database sources are certified with Oracle Content Database release 10.2 and release 10.1.3 and Oracle Content Services release 10.1.2.3.

Known Issues:

  • The administrator account used by the Oracle Content Database source must have the ContentAdministrator role on the site that is being crawled and indexed. Also, end users searching documents in Oracle Content Database must have the GetContent and GetMetadata permissions.

  • By default, Oracle Content Database has a limit of three concurrent requests (simultaneous operations) for each user. However, Oracle SES has a default of five concurrent crawler threads. When crawling Oracle Content Database, only three of the five threads can successfully crawl, which causes the crawl to fail.

    Workaround: For an Oracle Content Database source, change the Number of Crawler Threads on the Home - Sources - Crawling Parameters page to a value of 3 or fewer.

    Or, modify the Oracle Collaboration Suite configuration in Oracle Enterprise Manager to allow more than three concurrent requests. For example:

    1. Access the Enterprise Manager page for the Collaboration Suite Midtier. For example: http://example.domain:1156/.

    2. Click the Oracle Collaboration Suite midtier standalone instance name. For example: ocsapps.example.domain.

    3. In the System Components table, click Content.

    4. From Administration, click Node Configurations.

    5. In the Node Configurations table, click HTTP_Node. For example: ocsapps.example.domain_HTTP_Node.

    6. On Properties, change the value for Maximum Concurrent Requests Per User. Enter a value greater than or equal to the number of crawling threads used by Oracle SES. This value is listed on the Global Settings - Crawler Configuration page.
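Whichever workaround you choose, the rule is the same: the crawler thread count must not exceed the per-user concurrent request limit. The following sketch only performs that comparison; check_crawler_threads is a hypothetical helper, and you supply both numbers yourself from the Oracle SES and Enterprise Manager pages named above.

```shell
# Hypothetical pre-crawl check. Arguments:
#   $1  Number of Crawler Threads for the source (Oracle SES)
#   $2  Maximum Concurrent Requests Per User (Oracle Content Database)
# Returns 0 if every crawler thread can get a request slot, 1 otherwise.
check_crawler_threads() {
    local threads="$1" max_requests="$2"
    if [ "$threads" -le "$max_requests" ]; then
        echo "OK: $threads crawler threads, limit $max_requests"
        return 0
    fi
    echo "Lower Number of Crawler Threads to $max_requests or fewer," >&2
    echo "or raise Maximum Concurrent Requests Per User to $threads or more." >&2
    return 1
}
```

With the defaults described above, check_crawler_threads 5 3 fails, while check_crawler_threads 3 3 succeeds.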

Setting Up Identity Management for Oracle Content Database Sources

The Oracle SES instance and the Oracle Content Database instance must be connected to the same or mirrored Oracle Internet Directory system or other LDAP server.

To set up a secure Oracle Content Database source: 

  1. Read "Known Issues" and confirm that the number of crawler threads does not exceed the Maximum Concurrent Requests Per User setting in Oracle Content Database.

  2. Activate the Oracle Internet Directory identity plug-in for the Oracle Content Database instance on the Global Settings - Identity Management Setup page in Oracle SES.

  3. For Oracle Content Database 10.1.2.3 and 10.2.0.4, use the following LDIF file to create an application entity for the plug-in. (An application entity is a data structure within LDAP used to represent and keep track of software applications accessing the directory with an LDAP client.)

    ORACLE_HOME/bin/ldapmodify -h oidHost -p OIDPortNumber -D "cn=orcladmin" -w password -f ORACLE_HOME/search/config/ldif/csPlugin.ldif
    

    This defines the entity that is used for the connector: orclApplicationCommonName=ocsCsPlugin,cn=ifs,cn=products,cn=oraclecontext. The entity has the password welcome1.
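To confirm that the ldapmodify command created the application entity, you can read the entry back with ldapsearch. This is a minimal sketch, assuming the ldapsearch utility is installed in ORACLE_HOME/bin next to ldapmodify and accepts the same -h, -p, -D, and -w options; verify_cs_entity is a hypothetical helper name.

```shell
# Hypothetical check that the ocsCsPlugin application entity exists.
# Arguments: ORACLE_HOME, OID host, OID port, orcladmin password.
verify_cs_entity() {
    if [ "$#" -ne 4 ]; then
        echo "usage: verify_cs_entity ORACLE_HOME OID_HOST OID_PORT PASSWORD" >&2
        return 2
    fi
    # Base-scope search: returns the entry itself if it exists.
    "$1/bin/ldapsearch" -h "$2" -p "$3" -D "cn=orcladmin" -w "$4" \
        -b "orclApplicationCommonName=ocsCsPlugin,cn=ifs,cn=products,cn=oraclecontext" \
        -s base "objectclass=*"
}
```

For example: verify_cs_entity "$ORACLE_HOME" oidHost OIDPortNumber password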

Creating an Oracle Content Database JDBC Source

The Content Database JDBC connector is an alternative to the Content Database connector provided in Oracle SES Release 10.1. The JDBC connector greatly improves the performance of incremental crawls. If the elapsed time of an incremental crawl is an important consideration in your deployment of Oracle SES, then use the JDBC connector.

The Oracle SES crawler supports crawling from Oracle Content Database 10.1.2.0.4 or later. See the readme file for the Oracle Content Database 10.2.1.0.4 patchset for details on configuring high-volume full and incremental crawls in Oracle Content Database.

You may need to grant the Oracle SES user access to the ODMC_ALERT_SEQ object in Oracle Content Database. Use this command:

GRANT SELECT ON ODMC_ALERT_SEQ TO sesuser;

where sesuser is the Oracle SES user.

For example:

GRANT SELECT ON ODMC_ALERT_SEQ TO SEARCHSYS;

Note:

The JDBC connector requires installation of a patch to Oracle Content Database. If the patch is not available for your version of Content Database, then use the older connector as described in "Creating an Oracle Content Database Source".

To create an Oracle Content Database JDBC source: 

  1. Open the Oracle SES Administration GUI to the Home page.

  2. Select the Sources secondary tab.

  3. For Source Type, select Oracle Content Database (JDBC), then click Create to display Step 1 Parameters.

  4. Enter a source name and the values for the parameters described in Table 6-6.

  5. Click Next to display Step 2 Authorization.

  6. Enter the settings described in Table 6-7.

  7. Click Create or Create and Customize to create the source.

Table 6-6 Oracle Content Database JDBC Source Parameters (Step 1)

Parameter Value

Database Connection String

JDBC connection string to Oracle Content Database in the form jdbc:oracle:thin:@server:port:sid. For example, jdbc:oracle:thin:@example.com:1521:rel11g

Content DB System User

System user for Content Database, such as CONTENT or IFS_SYS.

Alert Table Name

Name of the Alert table for Content Database, which typically has the form ODMC_ALERT_name.

Database User ID for Crawl

Valid user ID for the Content DB database.

Database Password for Crawl

Password associated with the user ID for crawling.

Document Count

Maximum number of documents to be crawled.

URL Prefix

URL to Oracle Content Database in the form HTTP://hostname:port/CONTENT. For example, HTTP://example.com:7778/CONTENT.

Document Access (DAV) User ID

Valid Content Database user ID for using WebDAV to access documents.

Document Access (DAV) Password

Password associated with the DAV user ID.

Starting Path for Crawl

Full path where the crawl starts. Enter / to crawl the entire Content Database hierarchy.
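A common setup error is a malformed connection string or URL prefix. The following sketch checks the two formats shown in Table 6-6 before you submit the form. The helper names are hypothetical, and the JDBC pattern accepts only the server:port:sid form used in this table, not other Oracle JDBC URL forms such as service names.

```shell
# Hypothetical format checks for two Step 1 values.
# is_thin_jdbc_url: jdbc:oracle:thin:@server:port:sid (SID form only).
is_thin_jdbc_url() {
    printf '%s\n' "$1" | grep -Eq '^jdbc:oracle:thin:@[^:/@]+:[0-9]+:[A-Za-z0-9_]+$'
}

# is_content_url_prefix: http://hostname:port/content, in any letter case.
is_content_url_prefix() {
    printf '%s\n' "$1" | grep -Eqi '^http://[^:/]+:[0-9]+/content/?$'
}
```

For example, is_thin_jdbc_url 'jdbc:oracle:thin:@example.com:1521:rel11g' succeeds, while the same string without the colon before @ fails.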


Table 6-7 Oracle Content Database JDBC Authorization Parameters (Step 2)

Parameter Value

Authorization Database JDBC Connection String

JDBC connection string to Oracle Content Database in the form jdbc:oracle:thin:@server:port:sid. For example, jdbc:oracle:thin:@example.com:1521:rel11g

Content DB System User

System user for Content Database, such as CONTENT or IFS_SYS.

Database User ID

User ID to connect to the database.

Database Password

Password associated with the database user ID.

Use the Run-Time Result Filter

Controls use of a final security check:

TRUE: Performs a final security check on each row in the result set.

FALSE: Does not do a final check. (Default)

Authorization User ID Format

Format of the user ID in the authorization query. Enter a supported authentication attribute of the active identity plug-in, such as nickname.


Creating an Oracle Content Database Source

If Oracle Content Database release 10.2 or Oracle Content Services release 10.1.2 is used, then the Entity name and Entity password parameters are required, the last six (keystore-related) parameters in Table 6-8 are not required, and the crawler plug-in uses service-to-service (S2S) authentication to connect to Oracle Content Database.

If Oracle Content Database release 10.1.3 is used, then the last six parameters in Table 6-8 are required, the Entity name and Entity password parameters are not required, and Oracle SES uses Web services authentication to connect to Oracle Content Database. See "Required Tasks for Oracle Content Database Release 10.1.3".
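The release-dependent rules above can be expressed as a lookup. This sketch, with the hypothetical helper cdb_auth_mode, only restates the two cases described in this section; release strings outside them are reported as unknown rather than guessed.

```shell
# Hypothetical lookup of the authentication mode for a release string.
#   s2s          : Entity name and Entity password required
#   web-services : the six keystore parameters required (see
#                  "Required Tasks for Oracle Content Database Release 10.1.3")
cdb_auth_mode() {
    case "$1" in
        10.1.3|10.1.3.*)             echo web-services ;;
        10.2|10.2.*|10.1.2|10.1.2.*) echo s2s ;;
        *) echo "unknown release: $1" >&2; return 1 ;;
    esac
}
```

For example, cdb_auth_mode 10.1.3.2.0 prints web-services, so the keystore parameters must be filled in.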

Create an Oracle Content Database source on the Home - Sources page. Select Oracle Content Database from the Source Type list, and click Create.

Enter values for the parameters listed in Table 6-8.

Table 6-8 Oracle Content Database Source Parameters

Parameter                             Value
Oracle Content Database URL           http://hostname:port/content
Starting paths                        /
Depth                                 -1
Oracle Content Database admin user    orcladmin
Entity name                           orclApplicationCommonName=ocsCsPlugin,cn=ifs,cn=products,cn=oraclecontext
Entity password                       welcome1
Crawl only                            false
Use e-mail for authorization          false
Oracle Content Database Version       For example, 10.1.3.2.0
SES keystore location                 For example, /scratch/ocs/cdb/cdb-ses/keystore/sesClientKeystore.jks
SES keystore type                     jks
SES keystore password                 *******
SES private key alias                 client
SES private key password              *******
CDB Server public key alias           server


Table 6-9 Oracle Content Database Authorization Manager Plug-in Parameters

Parameter                             Value
Oracle Content Database URL           http://hostname:port/content
Oracle Content Database admin user    orcladmin
Entity name                           orclApplicationCommonName=ocsCsPlugin,cn=ifs,cn=products,cn=oraclecontext
Entity password                       welcome1
Use e-mail for authorization          false
Use result filter for authorization   false
    You can use a real-time result filter (query-time authorization) to ensure that the user has access to each result document. Set this parameter to true to remove documents that the user has lost access to since the last crawl.
Oracle Content Database Version       For example, 10.1.3.2.0
SES keystore location                 For example, /scratch/ocs/cdb/cdb-ses/keystore/sesClientKeystore.jks
SES keystore type                     jks
SES keystore password                 ********
SES private key alias                 client
SES private key password              *******
CDB Server public key alias           server


Required Tasks for Oracle Content Database Release 10.1.3

This section describes the required steps for Web services authentication when using Oracle Content Database release 10.1.3. This procedure uses the JDK keytool to create the keys.

See Also:

"Setting Up a Server Keystore for WS-Security" in the Oracle Fusion Middleware Administrator's Guide for Oracle Universal Online Archive at http://download.oracle.com/docs/cd/B32110_01/content.1013/b32191/security.htm#CHDGCJEH

  1. Configure a server keystore at the Oracle Content Database middle tier if the keystore is not set up yet.

    The file ORACLE_HOME/j2ee/OC4J_Content/config/oc4j.properties defines the keystore type and the location of the keystore properties file. If you use a different file name for the keystore, then edit the following entry in that file:

    oracle.ifs.security.KeyStoreLocation=/home/oracle/product/10.1.3.2.0/OracleAS_1/content/settings/server-keystore.jks

    1. Change to the settings directory:

      cd ORACLE_HOME/content/settings 
      
    2. Create the Oracle Content Database server keystore with the following keytool command, substituting a secure password for password.

      ORACLE_HOME/jdk/bin/keytool -genkey -keyalg RSA -validity 5000 \
        -alias server -keystore server-keystore.jks -dname "cn=server" \
        -keypass password -storepass password

      To list the keys in the store:

      ORACLE_HOME/jdk/bin/keytool -list -keystore server-keystore.jks \
        -keypass password -storepass password
      
    3. Sign the key before using it:

      ORACLE_HOME/jdk/bin/keytool -selfcert -validity 5000 -alias server \
        -keystore server-keystore.jks -keypass password -storepass password
      
    4. Export the server public key from the server keystore to a file:

      ORACLE_HOME/jdk/bin/keytool -export -alias server -keystore server-keystore.jks \
        -file cdbServer.pubkey -keypass password -storepass password
      
    5. Store both the keystore password and the private server key password in a secure location so Oracle Content Database can access the keystore and the private key.

      ORACLE_HOME/content/bin/changepassword -k
      

      When prompted for the old password, press [Enter] if you are setting the password for the first time; otherwise, enter the previous password. Then enter and confirm the keystore password (-storepass password) that you provided in substep 2 of step 1.

      For details, see ORACLE_HOME/content/log/changepassword.log.

  2. Configure a client keystore at the Oracle SES installation.

    1. Create the SES client keystore with the following keytool command, substituting a secure password for password:

      ORACLE_HOME/jdk/bin/keytool -genkey -keyalg RSA -validity 5000 \
        -alias client -keystore sesClientKeystore.jks -dname "cn=client" \
        -keypass password -storepass password

      To list the keys in the store:

      ORACLE_HOME/jdk/bin/keytool -list -keystore sesClientKeystore.jks \
        -keypass password -storepass password
      
    2. Sign the key before using it:

      ORACLE_HOME/jdk/bin/keytool -selfcert -validity 5000 -alias client \
        -keystore sesClientKeystore.jks -keypass password -storepass password
      

      Restart the WebCenter middle tier from the Oracle Enterprise Manager console.

    3. Export the client public key from the client keystore to a file:

      ORACLE_HOME/jdk/bin/keytool -export -alias client -keystore sesClientKeystore.jks \
        -file sesClient.pubkey -keypass password -storepass password
      
  3. Copy sesClient.pubkey to the Oracle Content Database middle tier, and then import the Oracle SES client public key into the Oracle Content Database server keystore:

    cd ORACLE_HOME/content/settings

    ORACLE_HOME/jdk/bin/keytool -import -alias client -file sesClient.pubkey \
      -keystore server-keystore.jks -keypass password -storepass password
    
  4. Copy cdbServer.pubkey to the Oracle SES host, and then import the Oracle Content Database server public key into the Oracle SES client keystore:

    ORACLE_HOME/jdk/bin/keytool -import -alias server -file cdbServer.pubkey \
      -keystore sesClientKeystore.jks -keypass password -storepass password
    

Note:

Check the server logs at ORACLE_HOME/content/logs for keystore issues with the crawler plug-in.
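After the two import steps, you can confirm that each keystore holds the other side's public key before you start a crawl. The following sketch assumes that keytool -list -alias exits with a non-zero status when the alias is absent, which is the behavior of the standard JDK keytool; keystore_has_alias is a hypothetical helper.

```shell
# Hypothetical post-import check.
# Arguments: path to keytool, keystore file, store password, alias.
keystore_has_alias() {
    if [ "$#" -ne 4 ]; then
        echo "usage: keystore_has_alias KEYTOOL KEYSTORE STOREPASS ALIAS" >&2
        return 2
    fi
    # Succeeds only if the alias is listed in the keystore.
    "$1" -list -alias "$4" -keystore "$2" -storepass "$3" >/dev/null 2>&1
}

# On the Oracle Content Database middle tier (in ORACLE_HOME/content/settings):
#   keystore_has_alias "$ORACLE_HOME/jdk/bin/keytool" server-keystore.jks password client
# On the Oracle SES host:
#   keystore_has_alias "$ORACLE_HOME/jdk/bin/keytool" sesClientKeystore.jks password server
```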

Oracle Content Database Source Attributes

Oracle SES crawls the following attributes for Oracle Content Database Sources:

  • AUTHOR

  • CREATE_DATE

  • DESCRIPTION

  • FILE_NAME

  • LASTMODIFIEDDATE

  • LAST_MODIFIED_BY

  • TITLE

  • MIMETYPE

  • ACL_CHECKSUM: The checksum calculated over the ACL submitted for the document.

  • DOCUMENT_LANGUAGE: Oracle SES language code taken from Oracle Content Database language string. For example, if Oracle Content Database uses "American", then Oracle SES submits it as "en-us".

  • DOCUMENT_CHARACTER_SET: The character set for the Oracle Content Database document.
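ACL_CHECKSUM lets the crawler detect a change in a document's direct security access information without comparing full ACLs. The algorithm Oracle SES uses is not documented here, so the following is only an illustration under assumed conventions: acl_checksum is a hypothetical helper, each ACL entry is an arbitrary string such as user:jsmith:GetContent, and entries are sorted so that their order does not affect the result.

```shell
# Hypothetical ACL checksum: sort the entries (byte order) so that the
# same ACL listed in a different order gives the same value, then take
# the POSIX cksum CRC of the sorted list.
acl_checksum() {
    printf '%s\n' "$@" | LC_ALL=C sort | cksum | awk '{print $1}'
}
```

Comparing the stored and newly computed values is enough to decide whether the document's access information changed.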

Oracle SES can also search categories or customized attributes created by the user in Oracle Content Database.

You can apply categories to files and links, and divide categories into subcategories having one or more attributes. When a document in Oracle Content Database is attached to a category, you can search on the attributes of that category. (The attributes appear in the list of search attributes.)

For example, suppose you create a category named testCategory with the attributes testAttr1 and testAttr2, create document X, assign it to testCategory, and assign values to the testCategory attributes. After crawling, testAttr1 and testAttr2 appear in the search attribute list.

Customized attribute values can be of the following types: String, Integer, Long, Double, Boolean, Date, User, Enumerated String, Enumerated Integer, and Enumerated Long.

  • Long, Double, Integer, Enumerated Integer, and Enumerated Long customized attributes are indexed as type Number attributes in Oracle SES. The display name has an _N suffix.

  • Date customized attributes are indexed as type Date attributes in Oracle SES. The display name has a _D suffix.

  • String, Enumerated String, and User customized attributes are indexed as type String attributes in Oracle SES.
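The type mapping above can be restated as a lookup, which helps when predicting the names that will appear in the search attribute list. In this sketch, the helpers ses_attr_type and ses_display_name are hypothetical, Boolean is reported as unmapped because the text above does not state its Oracle SES type, and String attributes are assumed to get no suffix because none is documented.

```shell
# Hypothetical lookup: Oracle Content Database custom attribute type
# -> Oracle SES attribute type, as listed above.
ses_attr_type() {
    case "$1" in
        Long|Double|Integer|"Enumerated Integer"|"Enumerated Long") echo Number ;;
        Date) echo Date ;;
        String|"Enumerated String"|User) echo String ;;
        *) echo "unmapped: $1" >&2; return 1 ;;   # e.g. Boolean: not stated
    esac
}

# Display name: _N for Number, _D for Date; no suffix assumed for String.
ses_display_name() {
    local ses_type
    ses_type=$(ses_attr_type "$2") || return 1
    case "$ses_type" in
        Number) echo "${1}_N" ;;
        Date)   echo "${1}_D" ;;
        *)      echo "$1" ;;
    esac
}
```

For example, ses_display_name testAttr1 Double prints testAttr1_N.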

Limitations on Custom Attributes for Oracle Content Database

  • The Oracle Content Database SDK has more features than the Oracle Content Database Web GUI. The Web GUI does not support String arrays, but the SDK does. If you use the SDK to build customized administration and user GUIs that support the String array type, then a customized attribute can have multiple values.

  • If a document in Oracle Content Database is attached to a category and the attributes in that category are left blank, then those attributes are not available in the attribute list for Advanced Search, because the crawler skips attributes with null values. However, if another document has the same attribute with a real value, then the attribute is indexed.