Documents in Oracle Content Database are organized into folders. Oracle SES navigates the folder hierarchy to crawl all documents in Oracle Content Database. It creates an index, stores the metadata, and accesses information in Oracle SES to provide search according to the end users' permissions.
The metadata crawled includes folder_url (URL of the folder containing the document) and folder_path (path of the folder containing the document). These let you show the direct folder path and direct folder URL for each document hit.
Oracle SES supports incremental crawling; that is, it crawls and indexes only documents that have changed since the last crawl. A document is re-crawled if either its content or its direct security access information changes. A document is also re-crawled if it is moved within Oracle Content Database, so that end users must access it with a different URL. Deleted documents are removed from the index during incremental crawling.
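The re-crawl rules above can be summarized as a small decision function. This is a minimal sketch, not Oracle SES code: the `DocState` record, its field names, and the action labels are hypothetical. It assumes both snapshots belong to the same document, and that a document seen for the first time is indexed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DocState:
    """Hypothetical snapshot of one document as seen by a crawl."""
    url: str
    content_checksum: str
    acl_checksum: str


def incremental_action(previous: Optional[DocState], current: Optional[DocState]) -> str:
    """Return 'index', 'delete', 'recrawl', or 'skip' for one document."""
    if previous is None and current is None:
        raise ValueError("document must exist in at least one snapshot")
    if previous is None:
        return "index"    # assumption: a document not seen before is indexed
    if current is None:
        return "delete"   # deleted documents are removed from the index
    changed = (
        current.content_checksum != previous.content_checksum  # content changed
        or current.acl_checksum != previous.acl_checksum        # direct access information changed
        or current.url != previous.url                          # document moved to a new URL
    )
    return "recrawl" if changed else "skip"
```

A document whose content, access information, and URL are all unchanged is skipped, which is what keeps an incremental crawl shorter than a full crawl.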
This book uses the product name Oracle Content Database to mean both Oracle Content Database and Oracle Content Services. Oracle Content Database sources are certified with Oracle Content Database release 10.2 and release 10.1.3 and Oracle Content Services release 10.1.2.3.
The administrator account used by the Oracle Content Database source must have the ContentAdministrator role on the site that is being crawled and indexed. Also, end users searching documents in Oracle Content Database must have the GetContent and GetMetadata permissions.
By default, Oracle Content Database has a limit of three concurrent requests (simultaneous operations) for each user. However, Oracle SES has a default of five concurrent crawler threads. When crawling Oracle Content Database, only three of the five threads can successfully crawl, which causes the crawl to fail.
Workaround: For an Oracle Content Database source, change the Number of Crawler Threads on the Home - Sources - Crawling Parameters page to a value of 3 or fewer.
Or, modify the Oracle Collaboration Suite configuration in Oracle Enterprise Manager to allow more than three concurrent requests. For example:
Access the Enterprise Manager page for the Collaboration Suite Midtier. For example: http://example.domain:1156/.
Click the Oracle Collaboration Suite midtier standalone instance name. For example: ocsapps.example.domain.
In the System Components table, click Content.
From Administration, click Node Configurations.
In the Node Configurations table, click HTTP_Node. For example: ocsapps.computer.domain_HTTP_Node.
On Properties, change the value for Maximum Concurrent Requests Per User. Enter a value larger than or equal to the number of crawling threads used by Oracle SES. This value is listed on the Global Settings - Crawler Configuration page.
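Both workarounds enforce the same condition: the per-user request limit in Oracle Content Database must be at least the number of Oracle SES crawler threads. The following is a minimal sketch of that check. The function and constant names are hypothetical; only the default values 3 and 5 come from the text above.

```python
DEFAULT_MAX_REQUESTS_PER_USER = 3  # Oracle Content Database default, per the text above
DEFAULT_CRAWLER_THREADS = 5        # Oracle SES default, per the text above


def settings_compatible(crawler_threads: int, max_requests_per_user: int) -> bool:
    """True when every crawler thread can hold its own concurrent request."""
    return max_requests_per_user >= crawler_threads


def safe_crawler_threads(crawler_threads: int, max_requests_per_user: int) -> int:
    """Largest thread count that stays within the per-user request limit."""
    if crawler_threads < 1 or max_requests_per_user < 1:
        raise ValueError("both values must be positive")
    return min(crawler_threads, max_requests_per_user)
```

With the defaults, `settings_compatible(5, 3)` is false and `safe_crawler_threads(5, 3)` returns 3, which matches the first workaround. Raising Maximum Concurrent Requests Per User to 5 makes the default thread count compatible instead.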
The Oracle SES instance and the Oracle Content Database instance must be connected to the same or mirrored Oracle Internet Directory system or other LDAP server.
To set up a secure Oracle Content Database source:
Read "Known Issues:" and confirm that the number of crawler threads does not exceed the available concurrent connection settings for each user in Oracle Content Database.
Activate the Oracle Internet Directory identity plug-in for the Oracle Content Database instance on the Global Settings - Identity Management Setup page in Oracle SES.
For Oracle Content Database 10.1.2.3 and 10.2.0.4, use the following LDIF file to create an application entity for the plug-in. (An application entity is a data structure within LDAP used to represent and keep track of software applications accessing the directory with an LDAP client.)
ORACLE_HOME/bin/ldapmodify -h oidHost -p OIDPortNumber -D "cn=orcladmin" -w password -f ORACLE_HOME/search/config/ldif/csPlugin.ldif
This defines the entity that is used for the connector: orclApplicationCommonName=ocsCsPlugin,cn=ifs,cn=products,cn=oraclecontext. The entity has the password welcome1.
The Content Database JDBC connector is an alternative to the Content Database connector provided in Oracle SES Release 10.1. The JDBC connector greatly improves the performance of incremental crawls. If the elapsed time of an incremental crawl is an important consideration in your deployment of Oracle SES, then use the JDBC connector.
Oracle SES crawler supports crawling from Oracle Content Database 10.1.2.0.4 or later. See the readme file for Oracle Content Database 10.2.1.0.4 patchset for details on configuring high volume full and incremental crawls in Oracle Content Database.
You may need to grant the SES user access to an Oracle Content Database object. Use this command:
GRANT SELECT ON ODMC_ALERT_SEQ TO sesuser
where sesuser is the SES user. For example:
GRANT SELECT ON ODMC_ALERT_SEQ TO eqsys
Note:
The JDBC connector requires installation of a patch to Oracle Content Database. If the patch is not available for your version of Content Database, then use the older connector as described in "Creating an Oracle Content Database Source".
To create an Oracle Content Database JDBC source:
Open the Oracle SES Administration GUI to the Home page.
Select the Sources secondary tab.
For Source Type, select Oracle Content Database (JDBC), then click Create to display Step 1 Parameters.
Enter a source name and the values for the parameters described in Table 7-12.
Click Next to display Step 2 Authorization.
Enter the settings described in Table 7-13.
Click Create or Create and Customize to create the source.
Table 7-12 Oracle Content Database JDBC Source Parameters (Step 1)
Parameter | Value |
---|---|
Database Connection String | JDBC connection string to Oracle Content Database in the form |
Content DB System User | SYSTEM user for Content Database. |
Alert Table Name | Name of the Alert table for Content Database, which typically has the form |
Database User ID for Crawl | Valid user ID for the Content DB database. |
Database Password for Crawl | Password associated with the user ID for crawling. |
Document Count | Maximum number of documents to be crawled. |
URL Prefix | URL to Oracle Content Database in the form |
Document Access (DAV) User ID | Valid Content Database user ID for using WebDAV to access documents. |
Document Access (DAV) Password | Password associated with the DAV user ID. |
Starting Path for Crawl | Full path where the crawl starts. Enter |
Table 7-13 Oracle Content Database JDBC Authorization Parameters (Step 2)
Parameter | Value |
---|---|
Authorization Database JDBC Connection String | JDBC connection string to Oracle Content Database in the form |
Content DB System User | System user for Content Database, such as |
Database User ID | User ID to connect to the database. |
Database Password | Password associated with the database user ID. |
Use the Run-Time Result Filter | Controls use of a final security check: |
Authorization User ID Format | Format of the user ID in the authorization query. Enter a supported authentication attribute of the active identity plug-in, such as |
If Oracle Content Database release 10.2 or Oracle Content Services release 10.1.2 is used, then the Entity name and Entity password parameters are required, the last six parameters (those related to the keystore) are not required, and the crawler plug-in uses service-to-service (S2S) authentication to connect to Oracle Content Database.
If Oracle Content Database release 10.1.3 is used, then the last six parameters in the following table are required, the Entity name and Entity password parameters are not required, and Oracle SES uses Web services authentication to connect to Oracle Content Database. See "Required Tasks for Oracle Content Database Release 10.1.3".
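The release-dependent rules above amount to a lookup from release number to authentication mode and required parameters. The sketch below is illustrative only: the function name and the mode labels are hypothetical, and the parameter names are copied from the source parameter table in this section.

```python
from typing import Tuple

S2S_PARAMS = ("Entity name", "Entity password")
KEYSTORE_PARAMS = (
    "SES keystore location",
    "SES keystore type",
    "SES keystore password",
    "SES private key alias",
    "SES private key password",
    "CDB Server public key alias",
)


def auth_requirements(release: str) -> Tuple[str, Tuple[str, ...]]:
    """Map a release such as '10.1.3.2.0' to (authentication mode, required parameters)."""
    parts = tuple(release.split("."))
    if parts[:3] == ("10", "1", "3"):
        return "web services", KEYSTORE_PARAMS
    if parts[:2] == ("10", "2") or parts[:3] == ("10", "1", "2"):
        return "S2S", S2S_PARAMS
    raise ValueError(f"release {release!r} is not covered by the rules above")
```

Releases that the text does not mention raise an error rather than falling back to a guess.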
Create an Oracle Content Database source on the Home - Sources page. Select Oracle Content Database from the Source Type list, and click Create.
Enter values for the parameters listed in Table 7-14.
Table 7-14 Oracle Content Database Source Parameters
Parameter | Value |
---|---|
Oracle Content Database URL | |
Starting paths | / |
Depth | -1 |
Oracle Content Database admin user | |
Entity name | |
Entity password | welcome1 |
Crawl only | |
Use e-mail for authorization | |
Oracle Content Database Version | For example, 10.1.3.2.0 |
SES keystore location | For example, /scratch/ocs/cdb/cdb-ses/keystore/sesClientKeystore.jks |
SES keystore type | jks |
SES keystore password | ******* |
SES private key alias | client |
SES private key password | ******* |
CDB Server public key alias | server |
Table 7-15 Oracle Content Database Authorization Manager Plug-in Parameters
Parameter | Value |
---|---|
Oracle Content Database URL | http://host name:port/content |
Oracle Content Database admin user | orcladmin |
Entity name | |
Entity password | welcome1 |
Use e-mail for authorization | |
You can use a real-time result filter (query-time authorization) to ensure that the user has access to each result document. Set this parameter to | |
Oracle Content Database Version | For example, 10.1.3.2.0 |
SES keystore location | For example, /scratch/ocs/cdb/cdb-ses/keystore/sesClientKeystore.jks |
SES keystore type | jks |
SES keystore password | ******** |
SES private key alias | client |
SES private key password | ******* |
CDB Server public key alias | server |
This section describes the required steps for Web services authentication when using Oracle Content Database release 10.1.3. This procedure uses the JDK keytool to create the keys.
See Also:
"Setting Up a Server Keystore for WS-Security" in the Oracle Fusion Middleware Administrator's Guide for Oracle Universal Online Archive at https://download.oracle.com/docs/cd/B32110_01/content.1013/b32191/security.htm#CHDGCJEH
Configure a server keystore at the Oracle Content Database middle tier if the keystore is not set up yet.
The file ORACLE_HOME/j2ee/OC4J_Content/config/oc4j.properties defines the keystore type and the keystore properties file location. If you use a different file name for the keystore, then edit the following entry in that file:
oracle.ifs.security.KeyStoreLocation=/home/oracle/product/10.1.3.2.0/OracleAS_1/content/settings/server-keystore.jks
Change to the settings directory:
cd Oracle_home/content/settings
Create the Oracle Content Database server keystore with the following keytool command, substituting a secure password for password.
Oracle_home/jdk/bin/keytool -genkey -keyalg RSA -validity 5000 -alias server -keystore server-keystore.jks -dname "cn=server" -keypass password -storepass password
To list the keys in the store:
Oracle_home/jdk/bin/keytool -list -keystore server-keystore.jks -keypass password -storepass password
Sign the key before using it:
Oracle_home/jdk/bin/keytool -selfcert -validity 5000 -alias server -keystore server-keystore.jks -keypass password -storepass password
Export the server public key from the server keystore to a file:
Oracle_home/jdk/bin/keytool -export -alias server -keystore server-keystore.jks -file cdbServer.pubkey -keypass password -storepass password
Store both the keystore password and the private server key password in a secure location so Oracle Content Database can access the keystore and the private key.
Oracle_home/content/bin/changepassword -k
When prompted for the old password, press [Enter] if you are setting the password for the first time; otherwise, enter the previous password. Then, enter and confirm the keystore password (-storepass password) that you provided in step 1.b.
See ORACLE_HOME/content/log/changepassword.log.
Configure a client keystore at the Oracle SES installation.
Create the SES client keystore with the following keytool command, substituting a secure password for password:
Oracle_home/jdk/bin/keytool -genkey -keyalg RSA -validity 5000 -alias client -keystore sesClientKeystore.jks -dname "cn=client" -keypass password -storepass password
To list the keys in the store:
Oracle_home/jdk/bin/keytool -list -keystore sesClientKeystore.jks -keypass password -storepass password
Sign the key before using it:
Oracle_home/jdk/bin/keytool -selfcert -validity 5000 -alias client -keystore sesClientKeystore.jks -keypass password -storepass password
Restart the WebCenter middle tier from the Oracle Enterprise Manager console.
Export the client public key from the client keystore to a file:
Oracle_home/jdk/bin/keytool -export -alias client -keystore sesClientKeystore.jks -file sesClient.pubkey -keypass password -storepass password
Import Oracle SES client public keys into the Oracle Content Database server keystore (sesClient.pubkey must be copied to Oracle Content Database):
cd Oracle_home/content/settings
Oracle_home/jdk/bin/keytool -import -alias client -file sesClient.pubkey -keystore server-keystore.jks -keypass password -storepass password
Import Oracle Content Database server public keys into the Oracle SES keystore (cdbServer.pubkey must be copied to Oracle SES):
Oracle_home/jdk/bin/keytool -import -alias server -file cdbServer.pubkey -keystore sesClientKeystore.jks -keypass password -storepass password
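The export and import steps above are symmetric: each side exports its own public key and imports the other side's. As a sketch only, the helper below builds the four argument vectors without running them, so they can be reviewed or logged first. The function name and the "content-db" and "ses" host labels are hypothetical. The aliases, file names, and keytool options are the ones used in the commands above.

```python
from typing import List, Tuple

Command = Tuple[str, List[str]]  # (where to run it, argument vector)


def key_exchange_commands(cdb_home: str, ses_home: str, password: str) -> List[Command]:
    """Build, but do not run, the export and import commands for both keystores."""
    secrets = ["-keypass", password, "-storepass", password]

    def keytool(home: str, action: str, alias: str, keystore: str, key_file: str) -> List[str]:
        return [f"{home}/jdk/bin/keytool", action, "-alias", alias,
                "-keystore", keystore, "-file", key_file] + secrets

    return [
        # Each side exports its own public key ...
        ("content-db", keytool(cdb_home, "-export", "server", "server-keystore.jks", "cdbServer.pubkey")),
        ("ses", keytool(ses_home, "-export", "client", "sesClientKeystore.jks", "sesClient.pubkey")),
        # ... and imports the key copied over from the other side.
        ("content-db", keytool(cdb_home, "-import", "client", "server-keystore.jks", "sesClient.pubkey")),
        ("ses", keytool(ses_home, "-import", "server", "sesClientKeystore.jks", "cdbServer.pubkey")),
    ]
```

The public key files still have to be copied between the two installations before the import commands can succeed.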
Note:
Check the server logs at ORACLE_HOME/content/logs for keystore issues with the crawler plug-in.
Oracle SES crawls the following attributes for Oracle Content Database Sources:
AUTHOR
CREATE_DATE
DESCRIPTION
FILE_NAME
LASTMODIFIEDDATE
LAST_MODIFIED_BY
TITLE
MIMETYPE
ACL_CHECKSUM: The check sum calculated over the ACL submitted for the document.
DOCUMENT_LANGUAGE: Oracle SES language code taken from Oracle Content Database language string. For example, if Oracle Content Database uses "American", then Oracle SES submits it as "en-us".
DOCUMENT_CHARACTER_SET: The character set for the Oracle Content Database document.
Oracle SES also can search categories or customized attributes created by the user in Oracle Content Database.
You can apply categories to files and links, and divide categories into subcategories having one or more attributes. When a document in Oracle Content Database is attached to a category, you can search on the attributes of that category. (The attributes appear in the list of search attributes.)
For example, suppose you create a category named testCategory with the attributes testAttr1 and testAttr2. Document X is created and assigned to testCategory. You must assign values to the testCategory attributes. After crawling, testAttr1 and testAttr2 appear in the search attribute list.
Customized attribute values can be the following types: String, Integer, Long, Double, Boolean, Date, User, Enumerated String, Enumerated Integer, and Enumerated Long. They are indexed as follows:
Long, Double, Integer, Enumerated Integer, and Enumerated Long customized attributes are indexed as Number attributes in Oracle SES. The display name has an _N suffix.
Date customized attributes are indexed as Date attributes in Oracle SES. The display name has a _D suffix.
String, Enumerated String, and User customized attributes are indexed as String attributes in Oracle SES.
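The type mapping above can be written as a small lookup. This is a sketch under stated assumptions, not the crawler's implementation: the function name is hypothetical, String-type attributes are assumed to keep their display name unchanged because the text mentions no suffix for them, and Boolean is rejected because the text does not say how it is indexed.

```python
from typing import Tuple

NUMBER_TYPES = {"Long", "Double", "Integer", "Enumerated Integer", "Enumerated Long"}
DATE_TYPES = {"Date"}
STRING_TYPES = {"String", "Enumerated String", "User"}


def ses_attribute(name: str, cdb_type: str) -> Tuple[str, str]:
    """Return (display name, Oracle SES attribute type) for a customized attribute."""
    if cdb_type in NUMBER_TYPES:
        return f"{name}_N", "Number"
    if cdb_type in DATE_TYPES:
        return f"{name}_D", "Date"
    if cdb_type in STRING_TYPES:
        return name, "String"  # assumption: no suffix for String attributes
    raise ValueError(f"no documented mapping for type {cdb_type!r}")
```

For example, a Long attribute named testAttr1 would be listed as testAttr1_N with type Number.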
Limitations on Custom Attributes for Oracle Content Database
The Oracle Content Database SDK has more features than the Oracle Content Database Web GUI. The Web GUI does not support String arrays, but the SDK does. If you use the SDK to build customized administration and user GUIs that support the String array type, then a customized attribute can have multiple values.
If a document in Oracle Content Database is attached to a category and the attributes in that category are left blank, then the attribute is not available in the attribute list for an Advanced Search. The crawler skips attributes with null values. However, if another document has the same attribute with a real value, then the attribute is indexed.
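The null-value rule can be illustrated with a short sketch. It is a simplified model, not Oracle SES code: documents are represented as plain dictionaries, the function name is hypothetical, and both `None` and the empty string are assumed to count as blank.

```python
from typing import Dict, Iterable, Optional, Set


def searchable_attributes(documents: Iterable[Dict[str, Optional[str]]]) -> Set[str]:
    """Collect attribute names that have a real value in at least one document."""
    found: Set[str] = set()
    for attributes in documents:
        for name, value in attributes.items():
            if value is not None and value != "":  # blank values are skipped
                found.add(name)
    return found
```

If Document X leaves testAttr1 blank, the attribute is missing from the result until some other document supplies a value for it.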