7 Configuring Access to Content Management Sources

This chapter contains the following topics:

Setting Up EMC Documentum Content Server Sources

Documentum data is stored in DocBases, which can contain cabinets and folders. A single EMC Documentum Content Server source can crawl one or more DocBases of a Documentum Content Server instance. The Documentum Content Server source navigates through the DocBases and the inline cabinets to crawl all the documents in Documentum Content Server. Oracle SES creates an index and stores the metadata and access information in Oracle SES to provide search capabilities according to the end user permissions.

Oracle SES supports incremental crawling; that is, it crawls and indexes only those documents that have changed since the most recent crawl. A document is re-crawled if its content, metadata, or direct security access information has changed. A document is also re-crawled if it is moved within Documentum Content Server, because the end user must then access the same document with a different URL. Documents deleted from a DocBase are removed from the index during incremental crawling.
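
The re-crawl rules above can be illustrated with a small sketch. This is not how Oracle SES is implemented; the DocumentState class, the fingerprint function, and the plan_incremental_crawl function are hypothetical names used only to show the idea that a change to the URL, content, metadata, or direct access information triggers a re-crawl, and that documents missing from the repository are removed.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class DocumentState:
    """Hypothetical snapshot of what a crawler sees for one document."""
    url: str
    content: bytes
    metadata: dict
    acl: tuple  # direct security access entries, for example ("alice:READ",)


def fingerprint(state: DocumentState) -> str:
    """Hash the URL, content, metadata, and ACL into one comparable value."""
    digest = hashlib.sha256()
    digest.update(state.url.encode("utf-8"))
    digest.update(b"\0")
    digest.update(state.content)
    digest.update(b"\0")
    digest.update(repr(sorted(state.metadata.items())).encode("utf-8"))
    digest.update(b"\0")
    digest.update(repr(sorted(state.acl)).encode("utf-8"))
    return digest.hexdigest()


def plan_incremental_crawl(indexed: dict, current: dict) -> dict:
    """Compare stored fingerprints with the current repository state.

    indexed: document id -> fingerprint stored at the previous crawl
    current: document id -> DocumentState seen in this crawl
    """
    # New or changed documents (including moved ones, whose URL differs).
    recrawl = sorted(
        doc_id for doc_id, state in current.items()
        if indexed.get(doc_id) != fingerprint(state)
    )
    # Documents that were indexed before but no longer exist.
    remove = sorted(doc_id for doc_id in indexed if doc_id not in current)
    return {"recrawl": recrawl, "remove": remove}
```

In this sketch a moved document is re-crawled because its URL is part of the fingerprint, which mirrors the behavior described above for documents moved within Documentum Content Server.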

Important Notes for EMC Documentum Content Server Sources

The Documentum source in Oracle SES must use the administrator account of a DocBase for crawling and indexing documents of that DocBase.

Required Software

  • Documentum Content Server DA (Documentum Administrator) or Documentum Content Server WebTop application must be installed and configured.

  • Documentum Foundation Classes (DFC) must be installed on the server running Oracle SES.

  • The currently supported Documentum version is 6.5.

Required Tasks

  • Because EMC Documentum Content Server software is not included with Oracle SES, certain files must be copied manually into Oracle SES.

    The DFC installation asks for a destination directory and a user directory. For Windows, the default destination directory is C:\Program Files\Documentum and the default user directory is C:\Documentum.

    For UNIX, you must create a DFC program root and a DFC user root. For example, DFC program root might be user_home/documentum_shared and DFC user root might be user_home/documentum.

  • Copy the following files from the EMC Documentum Content Server to ses_home/search/lib/plugins/dcs/. These files may be stored in the shared, dfc, or config directories of the EMC Documentum Content Server.

    • dctm.jar

    • dfc.jar

    • dfcbase.jar

    • aspectjrt.jar

    • certjFIPS.jar

    • jsafeFIPS.jar

    • configservice-api.jar

    • dfc.properties

  • Create a subdirectory in ses_home/search/lib/plugins/dcs/, for example dcsothers, and make a second copy of dfc.properties in it.

  • Add the following to DMCL.ini:

    max_session_count = 20
    max_connection_per_session = 20
    

    In Windows, DMCL.ini is located in the WINNT folder. In Linux, DMCL.ini is available in the Documentum folder (DFC user root).

  • In Windows 2003 Server, copy dmcl40.dll from DFC_destination_directory/shared/ to ses_home/bin. For UNIX platforms, copy libdmcl40.so from DFC_destination_directory/dfc/ to ses_home/lib32.

  • The environment variables $DOCUMENTUM_SHARED (DFC program root) and $DOCUMENTUM (DFC user directory) must be created before installing DFC on Linux. Whenever the computer restarts, these variables must be exported again and Oracle SES must be restarted. Alternatively, the variables can be exported permanently in Linux.

    Use the following commands to export environment variables in Linux:

    For DOCUMENTUM:

    export DOCUMENTUM=/home/sesuser/DOCUMENTUM
    

    For DOCUMENTUM_SHARED:

    export DOCUMENTUM_SHARED=/home/sesuser/DOCUMENTUM_SHARED
    
  • Restart the middle tier.

    On Windows, restart the computer after installing DFC.
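
Before restarting, it can help to confirm that the files and environment variables listed above are in place. The following is a minimal sketch of such a check, not a tool shipped with Oracle SES; the function name check_dcs_prerequisites and its parameters are hypothetical, and the directories are passed in by the caller rather than assumed.

```python
import os
from pathlib import Path

# File names taken from the Required Tasks list above.
REQUIRED_FILES = (
    "dctm.jar", "dfc.jar", "dfcbase.jar", "aspectjrt.jar",
    "certjFIPS.jar", "jsafeFIPS.jar", "configservice-api.jar",
    "dfc.properties",
)
REQUIRED_ENV = ("DOCUMENTUM", "DOCUMENTUM_SHARED")


def check_dcs_prerequisites(plugin_dir, second_copy_dir, env=None):
    """Return a list of human-readable problems; an empty list means OK.

    plugin_dir:      the dcs plug-in directory under ses_home
    second_copy_dir: the subdirectory holding the second dfc.properties copy
    env:             mapping of environment variables (defaults to os.environ)
    """
    env = os.environ if env is None else env
    plugin_dir = Path(plugin_dir)
    problems = []
    for name in REQUIRED_FILES:
        if not (plugin_dir / name).is_file():
            problems.append(f"missing file: {plugin_dir / name}")
    if not (Path(second_copy_dir) / "dfc.properties").is_file():
        problems.append(
            f"missing second copy of dfc.properties in {second_copy_dir}"
        )
    for var in REQUIRED_ENV:
        if not env.get(var):
            problems.append(f"environment variable not set: {var}")
    return problems
```

A call such as check_dcs_prerequisites("ses_home/search/lib/plugins/dcs", "ses_home/search/lib/plugins/dcs/dcsothers"), with ses_home replaced by the real installation path, returns an empty list when everything in the list above is present.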

Known Issues

  • In this release, search results cannot be viewed in Documentum desktop. The documents and folders can be viewed only using Documentum Administrator (DA) or Webtop applications.

  • For the Container name parameter, a value of repository name alone might not work. Enter the value of RepositoryName/CabinetName. For example, DocBaseName/CabinetName/FolderName/SubFolderName.

  • Incremental crawls do not recognize an ACL modification of access permissions from None to Browse or from Browse to None, because the DCSCHECKSUM attribute value is the same for both settings.

Configuration for Documentum Content Server 6.5

For Windows, the JAR files can be taken from the application server directory where DA is deployed. For Linux, you must create a DFC program root and a DFC user root before installing DFC. For example, the DFC program root can be USER HOME/DOCUMENTUM_SHARED and the DFC user root can be USER HOME/DOCUMENTUM. Table 7-1 lists the location of the JAR files in Windows and Linux.

Table 7-1 Location of the JAR Files

JAR File Name | Windows Location | Linux Location
dfc.jar | Application server home directory/da deployment directory/WEB-INF/lib | DFC_destination_directory
aspectjrt.jar | Application server home directory/da deployment directory/WEB-INF/lib | DFC_destination_directory/dfc
certjFIPS.jar | Application server home directory/da deployment directory/WEB-INF/lib | DFC_destination_directory/dfc
jsafeFIPS.jar | Application server home directory/da deployment directory/WEB-INF/lib | DFC_destination_directory/dfc
dfc.properties | Application server home directory/da deployment directory/WEB-INF/classes | DFC_user_directory/config/
configservice-api.jar | Application server home directory/da deployment directory/WEB-INF/lib | DFC_destination_directory/dfc

To configure the crawler plug-in:

  1. Create a new directory under ses_home/search/lib/plugins/dcs/, for example, dcsothers.

  2. Copy dfc.properties to the folder created in the previous step (dcsothers) and to the main folder (dcs).

  3. Copy dfc.jar, aspectjrt.jar, certjFIPS.jar, jsafeFIPS.jar, and configservice-api.jar to the dcs folder, ses_home/search/lib/plugins/dcs.

  4. The environment variables $DOCUMENTUM_SHARED (DFC Program root) and $DOCUMENTUM (DFC user directory) must be created before installing DFC on Linux. Also note that the environment variables $DOCUMENTUM_SHARED, $DOCUMENTUM, and $CLASSPATH must be exported again, and Oracle SES must be restarted when the computer restarts. These variables can also be exported permanently in Linux.

    Use the following commands to export environment variables in Linux:

    For DOCUMENTUM:

    export DOCUMENTUM=/home/sesuser/DOCUMENTUM
    

    For DOCUMENTUM_SHARED:

    export DOCUMENTUM_SHARED=/home/sesuser/DOCUMENTUM_SHARED
    

    For CLASSPATH:

    export CLASSPATH=$DOCUMENTUM_SHARED/dctm.jar:$DOCUMENTUM_SHARED/config
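
One way to export the variables permanently is to append the export lines to a shell profile. The sketch below shows that idea under stated assumptions: render_exports and append_once are hypothetical helper names, the profile file (for example ~/.bash_profile) is chosen by the administrator, and the CLASSPATH line replaces any existing CLASSPATH value just as the command above does.

```python
def render_exports(documentum, documentum_shared):
    """Build the export lines for a shell profile such as ~/.bash_profile."""
    return "\n".join([
        f"export DOCUMENTUM={documentum}",
        f"export DOCUMENTUM_SHARED={documentum_shared}",
        # Same value as the CLASSPATH command shown above; it overwrites
        # any CLASSPATH that was set earlier in the profile.
        "export CLASSPATH=$DOCUMENTUM_SHARED/dctm.jar:$DOCUMENTUM_SHARED/config",
    ]) + "\n"


def append_once(profile_path, block):
    """Append the block to the profile unless it is already present."""
    try:
        with open(profile_path, encoding="utf-8") as handle:
            existing = handle.read()
    except FileNotFoundError:
        existing = ""
    if block in existing:
        return False
    with open(profile_path, "a", encoding="utf-8") as handle:
        # Keep the new block on its own lines if the file lacks a final newline.
        if existing and not existing.endswith("\n"):
            handle.write("\n")
        handle.write(block)
    return True
```

Oracle SES must still be restarted from a shell that has loaded the profile for the new values to take effect.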
    

Setting Up Identity Management for EMC Documentum Content Server

Setting up identity management requires administration steps in both Oracle SES and EMC Documentum. It includes the following steps:

Activating the Documentum Identity Plug-in

To activate the Documentum identity plug-in, perform the following steps:

  1. Select Documentum Identity Plug-in.

  2. Click Activate.

  3. Enter a valid DocBase name.

  4. Enter a valid user name and password.

  5. Ensure that the environment variables DOCUMENTUM and DOCUMENTUM_SHARED are set correctly.

  6. Click Finish.

Activating the OID Identity Plug-In

Before activating the OID Identity plug-in for validating the users in OID, Documentum Content Server should be synchronized with OID as an LDAP server. To synchronize them, you must import the users and groups from OID to Documentum:

  1. Create an LDAP Configuration Object in Documentum Administrator (DA). To do this:

    1. Log in to DA.

    2. Navigate to Administration, User Management, LDAP.

    3. Select File, New, LDAP Configuration Object.

    4. In the Name field, enter a name for the LDAP Configuration Object.

    5. Select dm_user as the user subtype.

    6. Under Communication Mode, select Regular.

    7. Under Import, select Users and Groups.

    8. Select Default Configuration Object to use this configuration object in the server field.

    9. Click Next.

    10. In the Directory Type field, select Oracle Internet Directory Server.

    11. In the Bind Type field, select Bind by Searching for Distinguished Name.

    12. In the Binding Name field, provide the admin user name of OID. This is usually cn=orcladmin.

    13. In the Binding Password field, provide the admin user password.

    14. In the Host Name field, provide the OID host name.

    15. Retain the default port number of OID (389).

    16. In the Person Object Class field, provide the base person object. Typically, the value is inetOrgPerson.

    17. In the Person Search Base field, provide the person search base defined in OID. For example, cn=Users, dc=us, dc=oracle, dc=com.

    18. In the Person Search Filter field, specify cn=*.

    19. In the Group Object Class field, provide the Group Object. Typically the value is groupOfUniqueNames.

    20. In the Group Search Filter field, specify cn=*.

    21. Click Next.

    22. The Attribute Map information is displayed. Click Finish.

  2. Run the LDAP_Synchronization job. To do this:

    1. Log in to DA.

    2. Navigate to Administration, Job Management, Jobs.

    3. Open the job dm_LDAPsynchronization.

    4. In the state field, select Active.

    5. Select Deactivate On Failure.

    6. In Designated Server, select the host name of Documentum Server.

    7. Select Run After Update.

    8. Click the Schedule tab.

    9. In the Start Date And Time field, set the current date and time.

    10. Select Repeat time from the Repeat list.

    11. Set the Frequency field to any numeric value.

    12. Select End Date And Time and specify how long the Synchronization job should run.

    13. Click the Method tab.

    14. Select Pass Standard Argument.

    15. Click the SysObject info tab.

    16. Click OK.
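
The Person Search Base entered in the LDAP Configuration Object is an LDAP distinguished name. Directories are often, but not always, laid out after the DNS domain of the organization; under that assumption, the following sketch derives the search base from a domain name. The helper names domain_to_base_dn and person_search_base are hypothetical, and the cn=Users container is only the example used above.

```python
def domain_to_base_dn(domain):
    """Turn a DNS-style domain such as us.oracle.com into dc=us,dc=oracle,dc=com."""
    labels = [label for label in domain.strip().split(".") if label]
    if not labels:
        raise ValueError("domain must contain at least one label")
    return ",".join(f"dc={label}" for label in labels)


def person_search_base(domain, container="cn=Users"):
    """Prefix the base DN with the container that holds the user entries."""
    return f"{container},{domain_to_base_dn(domain)}"


print(person_search_base("us.oracle.com"))
```

Always confirm the resulting value against the actual directory tree, because the container that holds users differs between installations.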

After synchronizing the Documentum Content Server with OID, you must activate the OID identity plug-in in Oracle SES. Perform the following steps:

  1. Log in to the Oracle SES Administration GUI.

  2. Click Global Settings.

  3. Select System, Identity Management Setup.

  4. Select Oracle Internet Directory identity plug-in manager and click Activate.

  5. Select nickname from the Authentication Attribute list.

  6. Provide the following values:

    • Host name: The host name of the computer where OID is running.

    • Port: The default LDAP port number, 389.

    • Use SSL: true or false based on your preference.

    • Realm: The OID realm, for example, dc=us,dc=oracle,dc=com.

    • User name: The OID admin user name, for example, cn=orcladmin.

    • Password: The password of the OID admin user.

Activating the AD Identity Plug-In

Before activating AD Identity plug-in for validating the users in AD, Documentum Content Server must be synchronized with AD as an LDAP server. To synchronize them, you must import users and groups from AD to Documentum:

  1. Create an LDAP Configuration Object in DA. To do this:

    1. Log in to DA.

    2. Navigate to Administration, User Management, LDAP.

    3. Select File, New, LDAP Configuration Object.

    4. Enter a name for the LDAP Configuration Object.

    5. Select dm_user as User Subtype.

    6. In the Communication Mode field, select Regular.

    7. In the Import field, select Users and Groups.

    8. Select Default Configuration Object in the server field, and click Next.

    9. Provide the following values:

      Directory Type: Select Active Directory Server.

      Bind Type: Select Bind by Searching for Distinguished Name

      Binding Name: Provide the admin user name of AD. It is normally domainName/Administrator.

      Binding Password: The password of the AD admin user.

      Host Name: AD host name.

      Port: Default port number of AD, 389.

      Person Object Class: The base person object. Typically, the value is user.

      Person Search Base: The person search base defined in AD, for example cn=Users,dc=us, dc=oracle,dc=com.

      Person Search Filter: Enter cn=*.

      Group Object Class: The group object. Typically the value is group.

      Group Search Base: The group search base defined in AD. For example, dc=us,dc=oracle,dc=com.

      Group Search Filter: Enter cn=*.

    10. Click Next.

    11. The Attribute Map information is displayed. Click Finish.

  2. Run the LDAP_Synchronization job. To do this:

    1. Log in to DA.

    2. Navigate to Administration, Job Management, Jobs.

    3. Open the job dm_LDAPsynchronization.

    4. In the state field, select Active.

    5. Select Deactivate On Failure.

    6. In Designated Server, select the host name of Documentum Server.

    7. Select Run After Update.

    8. Click the Schedule tab.

    9. In the Start Date And Time field, set the current date and time.

    10. Select Repeat time from the Repeat list.

    11. Set the Frequency field to any numeric value.

    12. Select End Date And Time and specify how long the Synchronization job should run.

    13. Click the Method tab.

    14. Select Pass Standard Argument.

    15. Click the SysObject info tab.

    16. Click OK.

After the Documentum Content Server is synchronized with AD, you must activate the AD identity plug-in in Oracle SES. Perform the following steps:

  1. Log in to the Oracle SES Administration GUI.

  2. Click Global Settings, and then select System, Identity Management Setup.

  3. Select Active Directory Identity Plug-in Manager, and click Activate.

  4. Provide the following values:

    • Authentication Attribute: Select USER_NAME.

    • Directory URL: Provide the host name and the port number. For example, ldap://ldapserverhost:port.

    • Directory account name: Provide the AD user name, for example Administrator.

    • Directory account password: AD user password.

    • Directory subscriber: Provide the directory subscriber (LDAP base). For example, dc=us,dc=oracle,dc=com.

    • Directory security protocol: Specify either none or portnumber.

  5. Click Finish.
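
The Directory URL value in step 4 follows the pattern ldap://ldapserverhost:port. As a rough illustration, the sketch below builds and checks such a value before it is typed into the form. The function names build_directory_url and parse_directory_url are hypothetical, and falling back to port 389 when no port is given is an assumption based on the default LDAP port mentioned in this chapter.

```python
from urllib.parse import urlsplit

DEFAULT_LDAP_PORT = 389


def build_directory_url(host, port=DEFAULT_LDAP_PORT):
    """Compose a Directory URL of the form ldap://host:port."""
    if not host or any(ch.isspace() for ch in host):
        raise ValueError("host must be a non-empty name without spaces")
    port = int(port)
    if not 1 <= port <= 65535:
        raise ValueError("port must be between 1 and 65535")
    return f"ldap://{host}:{port}"


def parse_directory_url(url):
    """Split a Directory URL into (host, port), checking the ldap scheme."""
    parts = urlsplit(url)
    if parts.scheme != "ldap" or not parts.hostname:
        raise ValueError(f"not an ldap://host:port URL: {url!r}")
    return parts.hostname, parts.port or DEFAULT_LDAP_PORT
```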

Activating SunOne Identity Plug-In

Before activating the SunOne Identity plug-in for validating the users in SunOne, you must synchronize Documentum Content Server with SunOne as an LDAP server. To synchronize them, you must import the users and groups from SunOne to Documentum:

  1. Create an LDAP Configuration Object in DA:

    1. Log in to DA.

    2. Navigate to Administration, User Management, LDAP.

    3. Select File, New, LDAP Configuration Object.

    4. Enter a name for the LDAP Configuration Object.

    5. Select dm_user as User Subtype.

    6. In the Communication Mode field, select Regular.

    7. In the Import field, select Users and Groups.

    8. Select Default Configuration Object in the server field, and click Next.

    9. Provide the following values:

      Directory Type: Select Netscape/iPlanet Directory Server

      Bind Type: Select Bind by Searching for Distinguished Name

      Binding Name: Provide the admin user name of SunOne. It is normally cn=Administrator.

      Binding Password: The password of the SunOne admin user.

      Host Name: SunOne host name.

      Port: Enter the port number used for SunOne. The default port number of SunOne is 389.

      Person Object Class: The base person object. Typically, the value is person.

      Person Search Base: The person search base defined in SunOne, for example cn=Users,dc=us, dc=oracle,dc=com.

      Person Search Filter: Enter cn=*.

      Group Object Class: The group object. Typically the value is groupOfUniqueNames.

      Group Search Base: The group search base defined in SunOne. For example, dc=us,dc=oracle,dc=com.

      Group Search Filter: Enter cn=*.

    10. Click Next.

    11. The Attribute Map information is displayed. Click Finish.

  2. Run the LDAP_Synchronization job:

    1. Log in to DA.

    2. Navigate to Administration, Job Management, Jobs.

    3. Open the job dm_LDAPsynchronization.

    4. In the state field, select Active.

    5. Select Deactivate On Failure.

    6. In Designated Server, select the host name of Documentum Server.

    7. Select Run After Update.

    8. Click the Schedule tab.

    9. In the Start Date And Time field, set the current date and time.

    10. Select Repeat time from the Repeat list.

    11. Set the Frequency field to any numeric value.

    12. Select End Date And Time and specify how long the Synchronization job should run.

    13. Click the Method tab.

    14. Select Pass Standard Argument.

    15. Click the SysObject info tab.

    16. Click OK.

After the Documentum Content Server is synchronized with SunOne, you must activate the SunOne identity plug-in in Oracle SES.

To activate the SunOne identity plug-in:

  1. Log in to the Oracle SES Administration GUI.

  2. Click Global Settings, and then select System, Identity Management Setup.

  3. Select Sun Java System Directory Server Manager, and click Activate.

  4. Provide the following values:

    • Authentication Attribute: Select USER_NAME.

    • Directory URL: Provide the host name and the port number. For example, ldap://ldapserverhost:port.

    • Directory account name: Provide the Directory Server user name, for example Administrator.

    • Directory account password: Directory Server user password.

    • Directory subscriber: Provide the directory subscriber (LDAP base). For example, dc=us,dc=oracle,dc=com.

    • Directory security protocol: Specify either none or portnumber.

  5. Click Finish.

Creating an EMC Documentum Content Server Source

Create an EMC Documentum Content Server source on the Home - Sources page. Select EMC Documentum Content Server from the Source Type list, and click Create. Enter values for the following parameters:

  • Container name: The names of the containers to be crawled by Oracle SES. You can crawl an entire Documentum DocBase or a specific cabinet or folder. The format is DocBaseName/CabinetName/FolderName/SubFolderName. To crawl multiple containers, enter a comma-delimited list. This parameter is case-sensitive, so enter each cabinet name exactly as it appears in the Documentum repository. Required.

    These are examples of container names:

    • DocBase1: The entire DocBase1 is crawled.

    • DocBase2/Cabinet21: Cabinet21 and its sub-folders within DocBase2 are crawled.

    • DocBase2/Cabinet21/Folder11: Folder11 and its sub-folders are crawled.

    • DocBase1, DocBase2/Cabinet21/Folder11: The entire DocBase1 and Folder11 in DocBase2/Cabinet21 are crawled.

  • Attribute list: The comma-delimited list of Documentum attributes to make searchable, along with their data types. The format is AttributeName:AttributeType, AttributeName:AttributeType. Valid attribute types are String, Number, and Date. See Table 7-2, "Documentum Data Type Mapping".

    While crawling a DocBase, an attribute is indexed only if both name and type match the configured name and type; otherwise, it is ignored. This is an optional parameter.

    For example, assume that you have the following Documentum attributes with the indicated data types:

    • account name: String

    • account ID: Integer

    • creation date: Date

    To make these attributes searchable, enter this value for Attribute list:

    Account Name:String, Account ID:Number, Creation Date:Date

    The default searchable attributes for Documentum Content Server are Modified Date, Title, and Author.

    Multiple attributes with same name are not allowed, such as Emp_ID:String and Emp_ID:Number.

  • User name: Enter the user name of a valid Documentum Content Server user. The user should be an administrator user or a user who has access to all cabinets, folders, and documents of the DocBases configured in the Container name parameter. The user should be able to retrieve content, metadata, and ACL from cabinets, folders, documents and other custom sub classes of all DocBases configured in Container name parameter. Required.

  • Password: Password of the Documentum user. Required.

  • Crawl versions: Indicates whether multiple versions of documents are crawled. Valid values are true and false. The default value is false. Any other value is interpreted as false, in which case only the latest version of each document is crawled. Optional.

  • Crawl folder attributes: Indicate whether folder attributes must be crawled, either true or false. This is an optional parameter. The default value is false. Any other value is interpreted as false.

  • URL for viewing the documents: A valid URL for Documentum WebTop or DA application used for viewing the Oracle SES search results. For example:

    http://IP_address:port/da

    or

    http://IP_address:port/webtop

  • Authentication Attribute: This parameter is used to set ACLs. This parameter lets you set multiple LDAP servers. If Oracle SES and Documentum Content Server are synchronized with Active Directory, then enter the value USER_NAME. If Oracle Internet Directory is used, then enter nickname.
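
The Container name format described above (a comma-delimited list of DocBaseName/CabinetName/FolderName/SubFolderName paths) can be checked before it is entered. The following sketch splits such a value into its parts; the Container class and the parse_container_names function are hypothetical and exist only for illustration, and trimming spaces around each name is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Container:
    docbase: str
    path: tuple  # cabinet, folder, sub-folder; empty for an entire DocBase


def parse_container_names(value):
    """Split a Container name value into DocBase and path components.

    Names are kept exactly as typed, because the parameter is case-sensitive.
    """
    containers = []
    for entry in value.split(","):
        parts = [part.strip() for part in entry.strip().split("/")]
        if any(not part for part in parts):
            raise ValueError(f"invalid container name: {entry!r}")
        containers.append(Container(parts[0], tuple(parts[1:])))
    return containers


for container in parse_container_names("DocBase1, DocBase2/Cabinet21/Folder11"):
    print(container.docbase, container.path)
```

An entry with an empty component, such as DocBase2//Folder11, is rejected, which catches the most common typing mistakes early.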

Table 7-2 Documentum Data Type Mapping

Sr. No | Documentum Data Type | Oracle SES Data Type
1 | Boolean | Number
2 | Integer | Number
3 | String | String
4 | ID | String
5 | Time or Date | Date
6 | Double | Number
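
The mapping in Table 7-2 can be applied mechanically when preparing a value for the Attribute list parameter. The sketch below does this for a list of attribute names and Documentum data types; the function name build_attribute_list is hypothetical, and the duplicate check reflects the rule that multiple attributes with the same name are not allowed.

```python
# Mapping copied from Table 7-2.
DOCUMENTUM_TO_SES_TYPE = {
    "Boolean": "Number",
    "Integer": "Number",
    "String": "String",
    "ID": "String",
    "Time": "Date",
    "Date": "Date",
    "Double": "Number",
}


def build_attribute_list(attributes):
    """Format (name, documentum_type) pairs as AttributeName:AttributeType."""
    seen = set()
    entries = []
    for name, documentum_type in attributes:
        if name in seen:
            raise ValueError(f"duplicate attribute name: {name}")
        seen.add(name)
        # A KeyError here means the type is not listed in Table 7-2.
        entries.append(f"{name}:{DOCUMENTUM_TO_SES_TYPE[documentum_type]}")
    return ", ".join(entries)


print(build_attribute_list([
    ("Account Name", "String"),
    ("Account ID", "Integer"),
    ("Creation Date", "Date"),
]))
```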


Setting Up Microsoft SharePoint Sources

The SharePoint Crawler connector enables Oracle SES to provide secure search over SharePoint Portal Server 2003 (SPPS 2003), Microsoft Office SharePoint Server 2007 (MOSS 2007), and Microsoft SharePoint Server 2010 (SPS 2010). The connector extends the searching capabilities of Oracle SES and enables it to search into an external repository. Oracle SES can crawl through the documents, items, and related metadata in SharePoint repositories and provide secure, full-text search. The connector also provides metadata search and browse functionality, which allows a search to be done against a specific subfolder in the hierarchy.

In SharePoint, data is stored in different libraries such as the Document Library, Picture Library, Lists, Discussion Boards, and so on. A SharePoint instance can have one or more sites and sub-sites that the SharePoint Crawler connector can crawl after you set up the appropriate configuration parameters in the Oracle SES Administration GUI. The SharePoint Crawler connector navigates through the Libraries and Lists to crawl all the documents and items from a SharePoint repository. It creates an index and stores the metadata and access information in Oracle SES to provide search capabilities according to the end user permissions.

The SharePoint Crawler connector supports incremental crawling, which means that it crawls and indexes only those documents that have changed after the most recent crawl. A document is re-crawled if the content, metadata, or direct security access information of the document has changed since the previous crawl. Documents deleted from a Library are removed from the index during incremental crawling.

Important Notes About SharePoint Sources

  • The supported versions of SharePoint Server are:

    • 2003 or 2.0 for SharePoint Portal Server (SPPS)

    • 2007 or 3.0 for Microsoft Office SharePoint Server (MOSS)

    • 2010 for Microsoft SharePoint Server (SPS)

  • When the Crawl Security Settings parameter is set to either NORMAL or STRICT, the SharePoint Crawler for the Container must use the SharePoint administrator account for crawling and indexing documents.

  • When the Crawl Security Settings parameter is set to RELAX, any user that has at least Visitor (Read) permissions can be identified in the SharePoint source for crawling and indexing documents.

  • SharePoint Container names in Oracle SES should not contain any special characters. Enter a backslash (\) before a slash or a comma. Otherwise, the crawler does not recognize the Container.
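
The escaping rule in the last note can be sketched as follows. The helper names escape_container_part and build_container_name are hypothetical; the sketch covers only the two characters named above (slash and comma) and makes no assumption about how other special characters are treated.

```python
def escape_container_part(name):
    """Prefix every slash or comma inside a single name with a backslash."""
    return name.replace("/", "\\/").replace(",", "\\,")


def build_container_name(parts):
    """Join area, library, and folder names into one Container name path."""
    return "/".join(escape_container_part(part) for part in parts)


print(build_container_name(["Sales", "Q1/Q2 Reports", "Plans, 2010"]))
```

Escaping each name before joining keeps the slashes that separate path levels distinct from slashes that belong to a folder or library name.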

Known Limitations of SharePoint Connector

  • Passwords entered through the Oracle SES Administration GUI are case insensitive.

  • Storing more than 200 files in a single folder may result in degraded performance and increased crawling time.

  • An administrator must own the SharePoint Server site with the documents to be crawled. The crawler does not have sufficient access rights to crawl the documents if it uses the identity of a non-administrative user.

    To grant administrative rights to a SharePoint user:

    1. Open the SharePoint Site UI and select Site Settings.

    2. Select Users and Permissions, then Site Collection Administrators to display the Site Collection Administrators page.

    3. Enter the user name of the SharePoint Server site in the Site Collection Administrators field.

    4. Click OK.

  • If the Crawl Security Settings parameter is set to RELAX, then the user ID specified in the User Name parameter does not require administrative privileges. Visitor (Read) permissions on the site are sufficient. However, the Read permission level must include the Browse Directories permission to access any sub-sites. Otherwise, the sub-sites are not crawled.

    To add Browse Directories permissions for MOSS 2007/SPS 2010:

    1. Open People and Groups - Site Permissions.

    2. Under Settings - Permission Levels, select READ.

    3. Under Site Permissions, select Browse Directories.

    4. Click Submit.

    To add Browse Directories permissions for SPPS 2003:

    1. Open the Created subarea and select Manage Security.

    2. Select the user and edit permissions.

    3. Select READ.

    4. Click Advanced Permissions.

    5. Under Advanced Permissions, select Browse Directories.

    6. Click OK.

  • SharePoint does not allow users without administrative privileges to browse user profiles.

    If the user ID specified in the User Name parameter does not have administrative privileges, then this user needs permission to manage profiles.

    To grant permission to manage profiles:

    1. Open SharePoint Central Administration 3.02.

    2. Click Shared Services Administration - SharedServices1.

    3. Under User Profile and My Sites, select Personalization Service Permissions.

    4. Add user user1 and select permissions Manage user profiles.

    5. Save and submit the user.

    User profiles are crawled if the user has specified the root site in the Site/Sub-Site URL parameter of the source configuration.

Known Issues for SharePoint Connector

  • Versions of list items whose object type is Folder are not crawled and indexed.

  • Site Collection Administrator users are not able to see documents if they are not listed among the document permission users.

  • The message Unable to type cast null is not an error. It is informational and appears when the crawler tries to crawl attachments that are not supported for a particular entity.

  • The error Principal user_name cannot be validated is returned when the crawler obtains a user name from the SharePoint repository that is not present in Active Directory.

  • Performance of the SharePoint connector can be impacted when the Crawl Versions attribute is set to true.

Supported Platforms

The following platforms are supported by the SharePoint Crawler connector:

  • Red Hat Linux 4

  • Windows 2003 Server Standard Edition and later with the latest Service Pack

Creating a Microsoft SharePoint Source

Create a source for the newly-created user-defined source type on the Home - Sources page. Enter a source name. Provide values for the configuration parameters described in the following list. Also see Table 7-3, "Supported Values for SharePoint Source Parameters".

  • SharePoint Version: Version of the SharePoint server (SPPS 2003 or MOSS 2007 or SPS 2010) to crawl. (Required)

  • Container name: Contains the names of the containers to be crawled by Oracle SES. You can specify multiple container names as a comma-delimited list. (Required)

    You can crawl an entire area or site or a specific folder. The format for specifying a container folder is AreaName/LibraryName/FolderName/SubFolderName.

    To crawl all documents in the Area or Library, the format is AreaName or AreaName/LibraryName.

    To index the entire SharePoint portal, enter a slash (/).

    To crawl all sites, enter sites.

    Examples for SharePoint Portal Server:

    • Container name: AreaName

      The entire Area is crawled.

    • Container name: AreaName/LibraryName/Folder21

      Folder21 and its subfolders within LibraryName are crawled.

    • Container name: LibraryName

      All documents inside the Library and its subfolders are crawled.

    Examples for MOSS 2007/SPS 2010:

    • Container name: LibraryName/Folder21

      Folder21 and its sub-folders within LibraryName are crawled.

    • Container name: LibraryName

      All documents inside the Library and its subfolders are crawled.

      The path for the container cannot contain any special characters. Enter a backslash (\) before a slash or a comma.

  • Attribute list: A comma-delimited list of attributes, as described in Table 7-4. The format for an attribute list is AttributeName, AttributeName. Multiple attributes with same name are not allowed, such as Emp_ID, Emp_ID.

    In MOSS 2007/SPS 2010, all attributes viewable from the UI are indexed by default. List all custom attributes to index, using the names displayed in the user interface.

    In SPPS 2003, the Title, LastModifiedDate, and Author attributes are indexed by default. List any other attributes to index, using the names displayed in the UI.

    If you update the attribute list from the administrator parameters, then perform a forced recrawl to delete the indexes of the old attribute list and to create indexes for the new attribute list.

  • Domain name: The domain name of the user that is used to crawl the SharePoint site. For example, if you intend to use the OracleDomain\Administrator user for crawling, then enter OracleDomain for this parameter. Do not include .com or .in or any other suffix in the name. (Required)

  • User name: Specifies the user name of a valid SPPS 2003/MOSS 2007/SPS 2010 user. Do not include the domain name for this user. For example, for OracleDomain\Administrator, enter Administrator. (Required)

  • Password: Specifies the password of the SharePoint user specified in User name. (Required)

  • Authentication attribute: Format of the user and group identity stored in the ACL of SharePoint objects. This format must be an authentication attribute of the Oracle SES active identity plug-in, such as USER_NAME for an Active Directory identity plug-in. Otherwise, the ACL validation fails during indexing. (Required and case sensitive)

    For example, this value is USER_NAME for the Microsoft Active Directory identity plug-in.

  • Crawl URL or SPS Site/Sub-Site URL: The URL of the Site or Sub-site of the Microsoft SharePoint source, which is used for viewing the search results. (Required)

    This URL has the form http://HostName:PortNumber or http://HostName:PortNumber/SubSiteName.

  • Crawl Security Settings: Sets security on documents for indexing. (Required)

    This setting can be one of the following:

    • NORMAL: The regular crawl uses site-level access control lists (ACLs) but not document-level ACLs.

    • RELAX: When the SharePoint Site Administrator user information is not available and the SharePoint user has visitor (or read) permissions on the site, this user is not able to crawl subsites under the main site. This mode is intended for exposing public documents temporarily and quickly to search. The SES administrator must be careful not to expose documents to other users inadvertently. See the work-around for this in "Known Limitations of SharePoint Connector".

    • STRICT: Captures even document-level security. This mode requires that an additional Web Service agent, Oracle MOSS Web Service, be installed on the MOSS 2007/SPS 2010 server. See "Deploying the Web Service on MOSS 2007/SPS 2010".

  • Simple Include: Includes only URLs that contain at least one of the words listed in this parameter. Separate the words with commas.

  • Simple Exclude: Excludes all URLs that contain one or more of the words listed in this parameter. Separate the words with commas.

  • Regular Expression Include: Include all URLs that match the expression provided in this parameter.

  • Regular Expression Exclude: Exclude all URLs that match the expression provided in this parameter.

  • Crawl versions: Controls whether multiple versions of documents are crawled. Valid values are true and false. Any other value is interpreted as false. The default value is false, so only the latest version is crawled. (Optional)

  • Crawl folder attributes: Controls whether folder attributes are crawled. The default value is false. Valid values are true or false, and any other value is interpreted as false. (Optional)

  • Crawl attachments: Controls whether attachments are crawled. The default value is false. Valid values are true or false, and any other value is interpreted as false. (Optional)

  • LDAP URL: URL of the LDAP server, such as ldap://IP:port, where the default port number is 389.

  • LDAP Search Base: Search base of the LDAP server, such as DC=abc,DC=com. When the value of Authentication Attribute is DN, specify the LDAP URL and the LDAP search base of the LDAP server configured in the identity plug-in. Otherwise, leave these parameters blank.

  • Fast Incremental Crawl: Controls whether the fast incremental crawl is used. When set to true, the crawl fetches newly added and modified documents but does not remove from the Oracle SES index the documents that were deleted from the SharePoint repository. When set to false, the crawl fetches newly added and modified documents and also removes from the Oracle SES index the documents that were deleted from the SharePoint repository.
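The four include and exclude parameters above can be read as a URL filter. The following Python sketch is one possible interpretation, not the connector's implementation. It assumes that exclusions win over inclusions, that an empty parameter imposes no restriction, that words are matched as case-sensitive substrings, and that a regular expression may match anywhere in the URL. The function names are hypothetical.

```python
import re


def split_words(value):
    """Split a comma-separated parameter value into non-empty words."""
    return [word.strip() for word in value.split(",") if word.strip()]


def url_is_crawled(url, simple_include="", simple_exclude="",
                   regex_include="", regex_exclude=""):
    """Return True if the URL passes all four filters.

    Assumptions (not stated in the text): exclusions take precedence,
    an empty parameter means "no restriction", and regular expressions
    are applied with a search rather than a full match.
    """
    # Simple Exclude: drop the URL if it contains any listed word.
    if any(word in url for word in split_words(simple_exclude)):
        return False
    # Regular Expression Exclude: drop the URL if the expression matches.
    if regex_exclude and re.search(regex_exclude, url):
        return False
    # Simple Include: when set, the URL must contain at least one word.
    include_words = split_words(simple_include)
    if include_words and not any(word in url for word in include_words):
        return False
    # Regular Expression Include: when set, the expression must match.
    if regex_include and not re.search(regex_include, url):
        return False
    return True
```

For example, with Simple Include set to hr,finance and Simple Exclude set to archive, this sketch accepts http://host:80/sites/hr/policy.doc and rejects http://host:80/sites/hr/archive/old.doc.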

Table 7-3 summarizes the supported values for the configuration parameters of the SharePoint Crawler connector.

Table 7-3 Supported Values for SharePoint Source Parameters

| Parameter Name | SharePoint Portal Server | MOSS 2007/SPS 2010 |
| --- | --- | --- |
| SharePoint Version | 2003, 2.0 | 2007, 3.0 / 2010 |
| Container name | (/) for full site, Library Name, List Name, Area Name | (/) for full site, Library Name, List Name |
| Attribute list | AttributeName1, AttributeName2 | AttributeName1, AttributeName2 |
| Domain Name | Domain name of the user | Domain name of the user |
| User name | Valid administrator user for SharePoint Portal Server | Valid administrator user for MOSS 2007/SPS 2010 |
| Password | Password for the user | Password for the user |
| Authentication attribute | USER_NAME | USER_NAME |
| Crawl URL or SPS Site/Sub-Site URL | IP address or host name with port on which SharePoint Portal Server is installed | IP address or host name with port on which MOSS 2007/SPS 2010 is installed |
| Crawl Security Settings | NORMAL, RELAX | NORMAL, RELAX, STRICT |
| Simple Include | Part of URL | Part of URL |
| Simple Exclude | Part of URL | Part of URL |
| Regular Expression Include | All URLs that match the expression | All URLs that match the expression |
| Regular Expression Exclude | All URLs that match the expression | All URLs that match the expression |
| Crawl versions | true or false | true or false |
| Crawl folder attributes | true or false | true or false |
| Crawl attachments | true or false | true or false |
| LDAP URL | URL of the LDAP server | URL of the LDAP server |
| LDAP Search Base | LDAP Search Base | LDAP Search Base |
| Fast Incremental Crawl | true or false | true or false |


Table 7-4 Attributes for List Items and Versions Crawled for SharePoint Source

| List Item Type | Attributes |
| --- | --- |
| Document Library | Title, Author, Created, Modified |
| Picture Library | Title, ImageSize, ImageCreateDate, Description, Keywords |
| Form Library | Title, Author, Created, Modified |
| Translation Library | Title, Name, Language, Base Document Version, Translation Status, Created |
| Data Connection Library | Connection Type, Description, Keywords, Title, UDC Purpose, Created |
| Slide Library | Name, Presentation, Description, Created |
| Report Library | Name, Title, Author, Created, Report Category, Report Status |
| Dash Board | Name, Title, Author, Created |
| Wiki Page Library | Title, Author, Created, Modified |
| Announcements | Title, Body, Editor, Modified, Author, Created |
| Contacts | Company, WorkCity, Created, Email, Comments, Title, Editor, HomePhone, JobTitle, Modified, WorkZip, WorkPhone, WorkState, FirstName, Author, FullName, WorkCountry, CellPhone, WorkFax, WorkAddress |
| Links | Comments, Editor, Modified, Author, URL, Created |
| Discussion Reply | Body, Created, DiscussionTitle, Editor, Modified, Author |
| Calendar | EventType, Title, EventDate, Duration, Editor, WorkspaceLink, Modified, EndDate, Description, fRecurrence, Author, fAllDayEvent, Created |
| Task | Title, StartDate, Body, Status, Editor, Priority, AssignedTo, DueDate, Modified, Author, PercentComplete, Created |
| Project Task | Title, StartDate, Body, Status, Editor, Priority, AssignedTo, DueDate, Modified, Author, PercentComplete, Created |
| Issue Tracking | Category, LinkIssueIDNoMenu, RelatedIssues, IssueID, Priority, DueData, Comment, V3Comments, IsCurrent, Created, Title, Status, Editor, AssignedTo, Modified, Author |
| Custom List | Title, Editor, Modified, Author, Created |
| Languages and Translators | Language_x0020_From, Language_x0020_To, Modified, Author, Translator, Created, Editor |
| KPI List | Title, PercentExpression, Editor, ViewGuid, Modified, Value, AutoUpdate, KpiComments, Author, Goal, ValueExpression, Warning, KpiDescription, DataSource, LowerValuesAreBetter, Created |
| Asset Library | Title, Description, Author, CreatedDate |
| Publishing Page | Title, Name, Description, CreatedDate, Modified |
| Site Pages | Title, Name, Description, CreatedDate, Modified |


Deploying the Web Service on MOSS 2007/SPS 2010

For MOSS 2007/SPS 2010, if the Crawl Security Settings parameter is set to STRICT, then you must install an extra web service, Oracle MOSS Web Service. The following installation and deinstallation files are provided by the OracleMOSSService installer at ses_home/search/lib/plugins/sps/WebService.zip:

  • OracleMossService.wsp

  • install.cmd

  • de-install.cmd

To install or deinstall the Oracle MOSS Web Service: 

  1. Click install.cmd to install, or click de-install.cmd to deinstall.

  2. Verify that the STSADM.exe file is in the following location: Drive:\Program Files\Common Files\Microsoft Shared\web server extensions\12\BIN.

    If STSADM.exe is not in that folder, specify the correct path when the installer prompts for it.

  3. Press any key to continue.

Setting Up Oracle Content Database Sources

Documents in Oracle Content Database are organized into folders. Oracle SES navigates the folder hierarchy to crawl all documents in Oracle Content Database. It creates an index, stores the metadata, and accesses information in Oracle SES to provide search according to the end users' permissions.

The metadata crawled includes folder_url (URL of the folder containing the document) and folder_path (path of the folder containing the document). These let you show the direct folder path and direct folder URL for each document hit.

Oracle SES supports incremental crawling; that is, it crawls and indexes only documents that have changed since the last crawl. A document is re-crawled if either the content or the direct security access information of the document changes. A document is also re-crawled if it is moved within Oracle Content Database and the end user has to access the same document with a different URL. Deleted documents are removed from the index during incremental crawling.

Important Notes for Oracle Content Database Sources

This book uses the product name Oracle Content Database to mean both Oracle Content Database and Oracle Content Services. Oracle Content Database sources are certified with Oracle Content Database release 10.2 and release 10.1.3 and Oracle Content Services release 10.1.2.3.

Known Issues:

  • The administrator account used by the Oracle Content Database source must have the ContentAdministrator role on the site that is being crawled and indexed. Also, end users searching documents in Oracle Content Database must have the GetContent and GetMetadata permissions.

  • By default, Oracle Content Database has a limit of three concurrent requests (simultaneous operations) for each user. However, Oracle SES has a default of five concurrent crawler threads. When crawling Oracle Content Database, only three of the five threads can successfully crawl, which causes the crawl to fail.

    Workaround: For an Oracle Content Database source, change the Number of Crawler Threads on the Home - Sources - Crawling Parameters page to a value of 3 or fewer.

    Or, modify the Oracle Collaboration Suite configuration in Oracle Enterprise Manager to allow more than three concurrent requests. For example:

    1. Access the Enterprise Manager page for the Collaboration Suite Midtier. For example: http://example.domain:1156/.

    2. Click the Oracle Collaboration Suite midtier standalone instance name. For example: ocsapps.example.domain.

    3. In the System Components table, click Content.

    4. From Administration, click Node Configurations.

    5. In the Node Configurations table, click HTTP_Node. For example: ocsapps.computer.domain_HTTP_Node.

    6. On Properties, change the value for Maximum Concurrent Requests Per User. Enter a value larger than or equal to the number of crawling threads used by Oracle SES. This value is listed on the Global Settings - Crawler Configuration page.
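The relationship between the two settings can be stated as a simple rule: the number of crawler threads must not exceed the per-user concurrent request limit. The following Python sketch illustrates that rule only. The function name is hypothetical, and the default of 3 is the Oracle Content Database default described above.

```python
def safe_crawler_threads(configured_threads, max_requests_per_user=3):
    """Return a crawler thread count that stays within the request limit."""
    if configured_threads < 1 or max_requests_per_user < 1:
        raise ValueError("both values must be positive integers")
    # Never use more threads than the per-user concurrent request limit.
    return min(configured_threads, max_requests_per_user)
```

With the Oracle SES default of five threads and the default limit of three, the sketch returns 3. If the limit is raised to 8, the five configured threads can be kept.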

Setting Up Identity Management for Oracle Content Database Sources

The Oracle SES instance and the Oracle Content Database instance must be connected to the same or mirrored Oracle Internet Directory system or other LDAP server.

To set up a secure Oracle Content Database source: 

  1. Read "Known Issues:" and confirm that the number of crawler threads does not exceed the available concurrent connection settings for each user in Oracle Content Database.

  2. Activate the Oracle Internet Directory identity plug-in for the Oracle Content Database instance on the Global Settings - Identity Management Setup page in Oracle SES.

  3. For Oracle Content Database 10.1.2.3 and 10.2.0.4, use the following LDIF file to create an application entity for the plug-in. (An application entity is a data structure within LDAP used to represent and keep track of software applications accessing the directory with an LDAP client.)

    ses_home/bin/ldapmodify -h oidHost -p OIDPortNumber -D "cn=orcladmin" -w password -f  ses_home/search/config/ldif/csPlugin.ldif
    

    This defines the entity that is used for the connector: orclApplicationCommonName=ocsCsPlugin,cn=ifs,cn=products,cn=oraclecontext. The entity has the password welcome1.
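If you script this step, it can help to assemble the ldapmodify command as an argument list so that each value is passed safely. The following Python sketch only builds the command shown above. The function name and the sample host, port, and path values are hypothetical.

```python
def ldapmodify_command(ses_home, oid_host, oid_port, password,
                       bind_dn="cn=orcladmin"):
    """Build the ldapmodify argument list for loading csPlugin.ldif."""
    return [
        f"{ses_home}/bin/ldapmodify",
        "-h", oid_host,          # Oracle Internet Directory host
        "-p", str(oid_port),     # Oracle Internet Directory port
        "-D", bind_dn,           # bind DN used in the command above
        "-w", password,          # bind password
        "-f", f"{ses_home}/search/config/ldif/csPlugin.ldif",
    ]
```

The resulting list can be passed to subprocess.run(command, check=True) on the Oracle SES host. Avoid logging the list, because it contains the password.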

Creating an Oracle Content Database JDBC Source

The Content Database JDBC connector is an alternative to the Content Database connector provided in Oracle SES Release 10.1. The JDBC connector greatly improves the performance of incremental crawls. If the elapsed time of an incremental crawl is an important consideration in your deployment of Oracle SES, then use the JDBC connector.

The Oracle SES crawler supports crawling from Oracle Content Database 10.1.2.0.4 or later. See the readme file for the Oracle Content Database 10.2.1.0.4 patch set for details on configuring high-volume full and incremental crawls in Oracle Content Database.

You may need to grant the SES user access to an Oracle Content Database object. Use this command:

GRANT SELECT ON ODMC_ALERT_SEQ TO sesuser

where sesuser is the SES user.

For example,

GRANT SELECT ON ODMC_ALERT_SEQ TO searchsys

Note:

The JDBC connector requires installation of a patch to Oracle Content Database. If the patch is not available for your version of Content Database, then use the older connector as described in "Creating an Oracle Content Database Source".

To create an Oracle Content Database JDBC source: 

  1. Open the Oracle SES Administration GUI.

  2. On the home page, select the Sources secondary tab.

  3. For Source Type, select Oracle Content Database (JDBC), then click Create to display Step 1 Parameters.

  4. Enter the source name and the values for the parameters described in Table 7-5.

  5. Click Next to display Step 2 Authorization.

  6. Enter the settings described in Table 7-6.

  7. Click Create or Create and Customize to create the source.

Table 7-5 Oracle Content Database JDBC Source Parameters (Step 1)

| Parameter | Value |
| --- | --- |
| Database Connection String | JDBC connection string to Oracle Content Database in the form jdbc:oracle:thin:@server:port:sid. For example, jdbc:oracle:thin:@example.com:1521:rel11g |
| Content DB System User | SYSTEM user for Content Database. |
| Alert Table Name | Name of the Alert table for Content Database, which typically has the form ODMC_ALERT_name. |
| Database User ID for Crawl | Valid user ID for the Content DB database. |
| Database Password for Crawl | Password associated with the user ID for crawling. |
| Document Count | Maximum number of documents to be crawled. |
| URL Prefix | URL to Oracle Content Database in the form HTTP://hostname:port/CONTENT. For example, HTTP://example.com:7778/CONTENT. |
| Document Access (DAV) User ID | Valid Content Database user ID for using WebDAV to access documents. |
| Document Access (DAV) Password | Password associated with the DAV user ID. |
| Starting Path for Crawl | Full path where the crawl starts. Enter / to crawl the entire Content Database hierarchy. |
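The two formatted values in this table are easy to mistype. The following Python sketch builds them from their parts. It uses the standard Oracle thin driver form, jdbc:oracle:thin:@server:port:sid, with a colon before the @ sign. The function names are hypothetical, and the sample values come from the examples in the table.

```python
def jdbc_thin_url(server, port, sid):
    """Build an Oracle thin JDBC connection string from its parts."""
    port = int(port)
    if not 1 <= port <= 65535:
        raise ValueError("port must be between 1 and 65535")
    return f"jdbc:oracle:thin:@{server}:{port}:{sid}"


def content_url_prefix(hostname, port):
    """Build the URL Prefix value in the form HTTP://hostname:port/CONTENT."""
    return f"HTTP://{hostname}:{int(port)}/CONTENT"
```

For the example values, jdbc_thin_url("example.com", 1521, "rel11g") produces the Database Connection String, and content_url_prefix("example.com", 7778) produces the URL Prefix.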


Table 7-6 Oracle Content Database JDBC Authorization Parameters (Step 2)

| Parameter | Value |
| --- | --- |
| Authorization Database JDBC Connection String | JDBC connection string to Oracle Content Database in the form jdbc:oracle:thin:@server:port:sid. For example, jdbc:oracle:thin:@example.com:1521:rel11g |
| Content DB System User | System user for Content Database, such as CONTENT or IFS_SYS. |
| Database User ID | User ID to connect to the database. |
| Database Password | Password associated with the database user ID. |
| Use the Run-Time Result Filter | Controls use of a final security check. TRUE: Performs a final security check on each row in the result set. FALSE: Does not do a final check. (Default) |
| Authorization User ID Format | Format of the user ID in the authorization query. Enter a supported authentication attribute of the active identity plug-in, such as nickname. |


Creating an Oracle Content Database Source

If Oracle Content Database release 10.2 or Oracle Content Services release 10.1.2 is used, then the Entity name and Entity password parameters are required, the last six parameters, which relate to the keystore, are not required, and the crawler plug-in uses service-to-service (S2S) authentication to connect to Oracle Content Database.

If Oracle Content Database release 10.1.3 is used, then the last six parameters in the following table are required, the Entity name and Entity password are not required, and Oracle SES uses Web services authentication to connect to Oracle Content Database. See "Required Tasks for Oracle Content Database Release 10.1.3".

Create an Oracle Content Database source on the Home - Sources page. Select Oracle Content Database from the Source Type list, and click Create.

Enter values for the parameters listed in Table 7-7.

Table 7-7 Oracle Content Database Source Parameters

| Parameter | Value |
| --- | --- |
| Oracle Content Database URL | http://host_name:port/content |
| Starting paths | / |
| Depth | -1 |
| Oracle Content Database admin user | orcladmin |
| Entity name | orclApplicationCommonName=ocsCsPlugin,cn=ifs,cn=products,cn=oraclecontext |
| Entity password | welcome1 |
| Crawl only | false |
| Use e-mail for authorization | false |
| Oracle Content Database Version | For example, 10.1.3.2.0 |
| SES keystore location | For example, /scratch/ocs/cdb/cdb-ses/keystore/sesClientKeystore.jks |
| SES keystore type | jks |
| SES keystore password | ******* |
| SES private key alias | client |
| SES private key password | ******* |
| CDB Server public key alias | server |
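The version-dependent requirements for these parameters can be summarized in code. The following Python sketch is an illustration of the rule described before the table, not part of Oracle SES. The function name and the returned dictionary layout are hypothetical.

```python
ENTITY_PARAMS = ["Entity name", "Entity password"]
KEYSTORE_PARAMS = [
    "SES keystore location",
    "SES keystore type",
    "SES keystore password",
    "SES private key alias",
    "SES private key password",
    "CDB Server public key alias",
]


def auth_requirements(version):
    """Return the authentication method and required parameters for a release."""
    parts = tuple(int(part) for part in version.split("."))
    if parts[:3] == (10, 1, 3):
        # Release 10.1.3 uses Web services authentication with keystores.
        return {"authentication": "Web services", "required": KEYSTORE_PARAMS}
    if parts[:2] == (10, 2) or parts[:3] == (10, 1, 2):
        # Release 10.2 and Content Services 10.1.2 use S2S authentication.
        return {"authentication": "S2S", "required": ENTITY_PARAMS}
    raise ValueError(f"release {version} is not covered by this section")
```

For example, auth_requirements("10.1.3.2.0") reports Web services authentication and lists the six keystore parameters.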


Table 7-8 Oracle Content Database Authorization Manager Plug-in Parameters

| Parameter | Value |
| --- | --- |
| Oracle Content Database URL | http://host_name:port/content |
| Oracle Content Database admin user | orcladmin |
| Entity name | orclApplicationCommonName=ocsCsPlugin,cn=ifs,cn=products,cn=oraclecontext |
| Entity password | welcome1 |
| Use e-mail for authorization | false |
| Use result filter for authorization | false. You can use a real-time result filter (query-time authorization) to ensure that the user has access to each result document. Set this parameter to true to remove documents that the user has lost access to since the last crawl. |
| Oracle Content Database Version | For example, 10.1.3.2.0 |
| SES keystore location | For example, /scratch/ocs/cdb/cdb-ses/keystore/sesClientKeystore.jks |
| SES keystore type | jks |
| SES keystore password | ******** |
| SES private key alias | client |
| SES private key password | ******* |
| CDB Server public key alias | server |
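The effect of the result filter parameter can be pictured as a final pass over the hit list at query time. The following Python sketch is a conceptual model only, not the authorization plug-in's code. The function name, the has_access callback, and the sample data are hypothetical.

```python
def filter_results(results, user, has_access, use_result_filter=False):
    """Return the documents to show, optionally rechecking access per hit."""
    if not use_result_filter:
        # Rely on the access information captured during the last crawl.
        return list(results)
    # Query-time authorization: keep only hits the user can still read.
    return [doc for doc in results if has_access(user, doc)]
```

If a user lost access to a document after the last crawl, the document still appears when the filter is off and is removed when the filter is on. The per-hit check adds work to every query, which is one reason such a filter is usually optional.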


Required Tasks for Oracle Content Database Release 10.1.3

This section describes the required steps for Web services authentication when using Oracle Content Database release 10.1.3. This procedure uses the JDK keytool to create the keys.

Note:

In the following steps, content_db_oracle_home denotes the Oracle home of the Oracle Content Database server and ses_oracle_home denotes the Oracle home of the Oracle SES application.

See Also:

"Setting Up a Server Keystore for WS-Security" in the Oracle Fusion Middleware Administrator's Guide for Oracle Universal Online Archive
  1. Configure a server keystore at the Oracle Content Database middle tier if the keystore is not set up yet.

    The file content_db_oracle_home/oc4j/j2ee/home/config/oc4j.properties defines the keystore type and the keystore properties file location. If you use a different file name for the keystore, then edit the following entry in that file:

    oracle.ifs.security.KeyStoreLocation=/home/oracle/product/10.1.3.2.0/OracleAS_1/content/settings/server-keystore.jks

    1. Change to the settings directory:

      cd content_db_oracle_home/content/settings 
      
    2. Create the Oracle Content Database server keystore with the following keytool command, substituting a secure password for password.

      content_db_oracle_home/jdk/bin/keytool -genkey -keyalg RSA -validity 5000 
      -alias server -keystore server-keystore.jks -dname "cn=server" -keypass 
      password -storepass password
      

      To list the keys in the store:

      content_db_oracle_home/jdk/bin/keytool -list -keystore server-keystore.jks 
      -keypass password -storepass password
      
    3. Sign the key before using it:

      content_db_oracle_home/jdk/bin/keytool -selfcert -validity 5000 -alias server 
      -keystore server-keystore.jks -keypass password -storepass password
      
    4. Export the server public key from the server keystore to a file:

      content_db_oracle_home/jdk/bin/keytool -export -alias server -keystore 
      server-keystore.jks -file cdbServer.pubkey -keypass password -storepass 
      password
      
    5. Store both the keystore password and the private server key password in a secure location so Oracle Content Database can access the keystore and the private key.

      content_db_oracle_home/content/bin/changepassword -k
      

      When prompted for the old password, press [Enter] if this is the first time that you are setting the password; otherwise, enter the previous password. Then, enter and confirm the keystore password (-storepass password) that you provided in step 1.b.

      See content_db_oracle_home/content/log/changepassword.log.

  2. Configure a client keystore at the Oracle SES installation.

    1. Create the SES client keystore with the following keytool command, substituting a secure password for password:

      ses_oracle_home/jdk/bin/keytool -genkey -keyalg RSA -validity 5000 
      -alias client -keystore sesClientKeystore.jks -dname "cn=client" 
      -keypass password -storepass password
      

      To list the keys in the store:

      ses_oracle_home/jdk/bin/keytool -list -keystore sesClientKeystore.jks 
      -keypass password -storepass password
      
    2. Sign the key before using it:

      ses_oracle_home/jdk/bin/keytool -selfcert -validity 5000 -alias client 
      -keystore sesClientKeystore.jks -keypass password -storepass password
      

      Restart the WebCenter middle tier from the Oracle Enterprise Manager console.

    3. Export the client public key from the client keystore to a file:

      ses_oracle_home/jdk/bin/keytool -export -alias client -keystore 
      sesClientKeystore.jks -file sesClient.pubkey -keypass password 
      -storepass password
      
  3. Import Oracle SES client public keys into the Oracle Content Database server keystore (sesClient.pubkey must be copied to Oracle Content Database):

    cd content_db_oracle_home/content/settings
     
    content_db_oracle_home/jdk/bin/keytool -import -alias client -file 
    sesClient.pubkey -keystore server-keystore.jks -keypass password 
    -storepass password
    
  4. Import Oracle Content Database server public keys into the Oracle SES keystore (cdbServer.pubkey must be copied to Oracle SES):

    ses_oracle_home/jdk/bin/keytool -import -alias server -file 
    cdbServer.pubkey -keystore sesClientKeystore.jks -keypass password 
    -storepass password
    

Note:

Check the server logs at content_db_oracle_home/content/logs for keystore issues with the crawler plug-in.
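The key exchange in steps 1 through 4 follows the same pattern on both sides: generate a key, self-sign it, export the public key, and import the other side's public key. The following Python sketch assembles those keytool commands as argument lists, using only the options shown in the steps above. It does not run them. The function names are hypothetical, and using one password for every keystore and key is a simplification for illustration.

```python
def keystore_commands(keytool, alias, keystore, pubkey_file, password,
                      validity=5000):
    """Return the genkey, selfcert, and export commands for one keystore."""
    secrets = ["-keypass", password, "-storepass", password]
    return [
        [keytool, "-genkey", "-keyalg", "RSA", "-validity", str(validity),
         "-alias", alias, "-keystore", keystore,
         "-dname", f"cn={alias}"] + secrets,
        [keytool, "-selfcert", "-validity", str(validity), "-alias", alias,
         "-keystore", keystore] + secrets,
        [keytool, "-export", "-alias", alias, "-keystore", keystore,
         "-file", pubkey_file] + secrets,
    ]


def import_command(keytool, alias, pubkey_file, keystore, password):
    """Return the command that imports a public key into a keystore."""
    return [keytool, "-import", "-alias", alias, "-file", pubkey_file,
            "-keystore", keystore, "-keypass", password,
            "-storepass", password]


def exchange_plan(cdb_keytool, ses_keytool, password):
    """Return all commands in order: server side, client side, cross-imports."""
    server = keystore_commands(cdb_keytool, "server", "server-keystore.jks",
                               "cdbServer.pubkey", password)
    client = keystore_commands(ses_keytool, "client", "sesClientKeystore.jks",
                               "sesClient.pubkey", password)
    cross = [
        # The client public key goes into the server keystore, and vice versa.
        import_command(cdb_keytool, "client", "sesClient.pubkey",
                       "server-keystore.jks", password),
        import_command(ses_keytool, "server", "cdbServer.pubkey",
                       "sesClientKeystore.jks", password),
    ]
    return server + client + cross
```

The server-side commands run in content_db_oracle_home/content/settings on the Oracle Content Database host, and the client-side commands run on the Oracle SES host. The public key files must be copied between the hosts before the two import commands run.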

Oracle Content Database Source Attributes

Oracle SES crawls the following attributes for Oracle Content Database Sources:

  • AUTHOR

  • CREATE_DATE

  • DESCRIPTION

  • FILE_NAME

  • LASTMODIFIEDDATE

  • LAST_MODIFIED_BY

  • TITLE

  • MIMETYPE

  • ACL_CHECKSUM: The check sum calculated over the ACL submitted for the document.

  • DOCUMENT_LANGUAGE: Oracle SES language code taken from Oracle Content Database language string. For example, if Oracle Content Database uses "American", then Oracle SES submits it as "en-us".

  • DOCUMENT_CHARACTER_SET: The character set for the Oracle Content Database document.

Oracle SES also can search categories or customized attributes created by the user in Oracle Content Database.

You can apply categories to files and links, and divide categories into subcategories having one or more attributes. When a document in Oracle Content Database is attached to a category, you can search on the attribute of category. (The attributes appear in the list of search attributes.)

For example, suppose you create a category named testCategory with the attributes testAttr1 and testAttr2. Document X is created and assigned to testCategory. You must assign values to the testCategory attributes. After crawling, testAttr1 and testAttr2 appear in the search attribute list.

Customized attribute values can be the following types: String, Integer, Long, Double, Boolean, Date, User, Enumerated String, Enumerated Integer, and Enumerated Long:

  • Long, Double, Integer, Enumerated Integer, and Enumerated Long customized attributes are indexed as type Number attributes in Oracle SES. The display name has an _N suffix.

  • Date customized attributes are indexed as type Date attributes in Oracle SES. The display name has a _D suffix.

  • String, Enumerated String, and User customized attributes are indexed as type String attributes in Oracle SES.
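The type mapping in the list above can be written as a small lookup. The following Python sketch is illustrative only. The function name is hypothetical, and it assumes that String attributes keep their name unchanged, because the text mentions no suffix for them. The Boolean type is not covered by the mapping in the text, so the sketch rejects it rather than guess.

```python
NUMBER_TYPES = {"Long", "Double", "Integer",
                "Enumerated Integer", "Enumerated Long"}
DATE_TYPES = {"Date"}
STRING_TYPES = {"String", "Enumerated String", "User"}


def ses_attribute(name, content_db_type):
    """Map a customized attribute to its (display name, Oracle SES type)."""
    if content_db_type in NUMBER_TYPES:
        return f"{name}_N", "Number"
    if content_db_type in DATE_TYPES:
        return f"{name}_D", "Date"
    if content_db_type in STRING_TYPES:
        # Assumption: no suffix is added for String attributes.
        return name, "String"
    raise ValueError(f"no mapping described for type {content_db_type!r}")
```

For example, an Integer attribute named testAttr1 would be displayed as testAttr1_N, and a Date attribute named testAttr2 as testAttr2_D.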

Limitations on Custom Attributes for Oracle Content Database

  • The Oracle Content Database SDK has more features than the Oracle Content Database Web GUI. The Web GUI does not support String arrays, but the SDK does. If you use the SDK to build customized administration and user GUIs that support the String array type, then a customized attribute can have multiple values.

  • If a document in Oracle Content Database is attached to a category and the attributes in that category are left blank, then the attribute is not available in the attribute list for an Advanced Search. The crawler skips attributes with null values. However, if another document has the same attribute with a real value, then the attribute is indexed.

Setting Up Oracle Content Server Sources

The Oracle Content Server connector enables Oracle SES to search Oracle Content Server (formerly Stellent Server), which is the foundation of the Oracle Universal Content Management solution. Users throughout the organization can contribute content from native desktop applications, manage content through rich library services, publish content to Web sites or business applications, and access the content with a browser.

The Content Server connector supports Oracle Content Server 7.5.2 or 10gR3 with XMLCrawlerExport (the Oracle Content Server RSS component).

Oracle Content Server includes an RSS feed generator component (XMLCrawlerExport) on top of the content server. This component generates RSS feeds as XML files from its internal indexer, based on indexer activity. It has access to the original content (for example, a Microsoft Word document), the Web viewable rendition, and all the metadata associated with each document. The component also has a template that contains an Idoc script that applies the metadata values from the indexer to generate the XML document. (Idoc is an Oracle Content Server proprietary scripting language.) Oracle Content Server generates feeds for all documents for the initial crawl, and feeds for updated and deleted documents for the incremental crawl. Each document can be an item in the feed, with the operation on the item (such as insert, delete, update), its metadata (such as author, summary), URL links, and so on.

The Oracle Content Server connector reads the feeds provided by Oracle Content Server according to a crawling schedule. Oracle SES parses and extracts the metadata information, and fetches the document content, using its generic RSS crawler framework.

Oracle SES supports the control feed method, in which individual feeds can be located anywhere and a control feed file is generated containing the links to the other feeds. This control file is input to the connector through the configuration file. A control feed must be used when the two computers are on different domains or on different platforms, or if they use a remote access protocol, such as HTTP or FTP, for communication between the two servers.

Oracle Content Server Security Model

The Oracle Content Server security model is based on the concept of permissions, which define the privileges a user has on a document. The following table shows the set of permissions supported by Oracle Content Server. Each permission is a superset of the previous ones. For example, Write permission includes Read permission. Admin permission is a superset of all the permissions.

Table 7-9 Oracle Content Server Permissions

| Permission | Description |
| --- | --- |
| Read | View documents |
| Write | View, Check In, Check Out, and Get Copy of documents |
| Delete | View, Check In, Check Out, Get Copy, and Delete documents |
| Admin | View, Check In, Check Out, Get Copy, and Delete documents. An Administration user with Workflow rights can start or edit a workflow for the document. An Administration user can also check in documents with another user specified as the Author. |
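Because each permission is a superset of the previous ones, the four levels can be modeled as an ordered scale. The following Python sketch shows that idea. The class and function names are hypothetical, and the numeric values are arbitrary apart from their order.

```python
from enum import IntEnum


class Permission(IntEnum):
    """Oracle Content Server permissions, ordered from weakest to strongest."""
    READ = 1
    WRITE = 2
    DELETE = 3
    ADMIN = 4


def allows(granted, required):
    """Return True if the granted permission includes the required one."""
    # A higher level includes every lower level.
    return granted >= required
```

For example, allows(Permission.WRITE, Permission.READ) is true because Write includes Read, and allows(Permission.WRITE, Permission.DELETE) is false.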


Oracle Content Server provides multiple security models, including an out-of-the-box security system and integration with centralized security models such as LDAP and Active Directory.

Oracle Universal Content Management security can work in these modes:

  • Universal Content Management native identity plugin where Universal Content Management is not connected to a directory

  • Oracle Internet Directory

  • Active Directory only where Universal Content Management is connected to Active Directory using LDAP. A connection to Active Directory using Microsoft Security is not supported.

The Oracle SES Oracle Content Server connector supports the two most popular security models among current Oracle Content Server customers: Roles and Groups, and Accounts.

Roles and Groups

A security group is a set of files grouped under a unique name. Every file in the library belongs to a security group. Access to security groups is controlled by the permissions, which are assigned to roles, which are assigned to users. For example, the EngAdmin role has Read, Write, Delete, and Admin permission to all content in the EngDocs security group. User Joe is assigned to role EngAdmin; therefore, Joe has all permissions to the documents in EngDocs group.

Accounts

Accounts provide greater flexibility and granularity than groups. An account is a group of content. It introduces another metadata field that is filled out upon content check-in. When accounts are enabled, content items also can be assigned to an account in addition to the security group. A user must have access to the account to read, write, delete or administer content in that account. When accounts are used, the account becomes the primary permission to satisfy before security group permissions are applied.

A user's access to a document is effectively the intersection of their account permissions and their security group permissions. For example, a user is assigned the EngAdmin role, which has all permissions to the documents in the EngDocs security group. At the same time, the user is also assigned Read and Write permission to the EngProjA account. Therefore, the user has only Read and Write permission to a content item that is in the EngDocs security group and the EngProjA account.
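The EngAdmin example can be expressed as a set intersection. The following Python sketch is a simplified model of that rule, not Oracle Content Server code. The function name is hypothetical.

```python
def effective_permissions(group_permissions, account_permissions):
    """Return the permissions granted by both the security group and the account."""
    return set(group_permissions) & set(account_permissions)


# The EngAdmin role grants every permission on the EngDocs security group.
eng_docs = {"Read", "Write", "Delete", "Admin"}
# The same user holds only Read and Write on the EngProjA account.
eng_proj_a = {"Read", "Write"}

result = effective_permissions(eng_docs, eng_proj_a)
```

Here result contains only Read and Write, which matches the outcome described in the example.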

Accounts can also be set up in a hierarchical structure. A user has permission to the entire subtree starting from the account node. For instance, a user assigned to the Eng account has access to Eng/AbcProj and Eng/XyzProj, or any accounts beginning with Eng. In other words, users that have permission to a particular account prefix also have access to all accounts with that prefix.

Note:

Oracle Content Server uses a prefix test for account filtering, so a slash (/) has no special meaning. A user granted permission to account A has access to any documents in account A*, such as A, AB, or A/B. The hierarchical structure takes advantage of the prefix semantics, but it is enforced with the account model. Hence, there is no special character as the level divider when testing for account permissions.
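The prefix test described in this note is a plain string comparison. The following Python sketch shows the behavior. The function name is hypothetical, and the sketch does not cover documents that have no account assigned.

```python
def can_access_account(granted_accounts, document_account):
    """Return True if any granted account is a prefix of the document's account."""
    # The slash has no special meaning: "A" matches "A", "AB", and "A/B".
    return any(document_account.startswith(account)
               for account in granted_accounts)
```

A user granted the Eng account therefore passes the test for Eng/AbcProj and Eng/XyzProj, and also for an unrelated account such as Engineering, because only the prefix is compared.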

See Also:

Oracle Universal Content Management documentation at

http://www.oracle.com/technetwork/middleware/content-management/index-094708.html

Setting Up Identity Management for Oracle Content Server

To activate the Oracle Content Server identity plug-in:

  1. On the Global Settings page, select Identity Management Setup under the System heading.

    The Global Settings - Identity Management Setup page is displayed.

  2. Select Oracle Content Server and click Activate.

  3. Enter values for the parameters described in Table 7-10, then click Finish.

Table 7-10 Oracle Content Server Connector Setup Parameters

| Parameter | Value |
| --- | --- |
| HTTP endpoint for authentication | HTTP endpoint for Oracle Content Server authentication. For example, http://my.host.com:port/idc/idcplg |
| Admin User | Administrative user who accesses the Oracle Content Server Identity Service API |
| Password | Administrative user password |


Creating an Oracle Content Server Source

To create an Oracle Content Server source using the Oracle SES Administration GUI:

  1. On the home page, click the Sources secondary tab to display the Sources page.

  2. Select Oracle Content Server from the Source Type list, then click Create to display Step 1 Parameters.

  3. Enter values for the parameters described in Table 7-11.

  4. Click Next to display Step 2 Authorization, then set values for the parameters described in Table 7-11.

  5. Scroll down to Security Attributes to verify that ACCOUNT and DOCSECURITYGROUP are listed. If they are not, then the source was not created correctly. Verify that the Configuration URL in Step 1 is correct.

  6. Click Create to create the Oracle Content Server source.

    After processing each data feed, a status feed is uploaded to the location specified in the configuration file. This status feed is named one of the following:

    • data_feed_file_name.suc indicates the data feed was processed successfully.

    • data_feed_file_name.err indicates that an error was encountered while processing the feed. The errors are listed in this status feed.
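A monitoring script could rely on this naming convention to tell which feeds succeeded. The following is a minimal sketch; the function name feed_status and the sample file name feed_001.xml are hypothetical, and only the .suc and .err suffixes come from this guide.

```python
def feed_status(status_feed_name):
    """Map a status feed file name to (data_feed_file_name, outcome).

    The status feed is named after the data feed, with ".suc" appended
    on success or ".err" appended when processing errors occurred.
    """
    if status_feed_name.endswith(".suc"):
        return status_feed_name[: -len(".suc")], "success"
    if status_feed_name.endswith(".err"):
        return status_feed_name[: -len(".err")], "error"
    # Any other name is not a status feed under this convention.
    return status_feed_name, "unknown"


if __name__ == "__main__":
    for name in ["feed_001.xml.suc", "feed_002.xml.err", "feed_003.xml"]:
        print(name, feed_status(name))
```

For a feed reported as an error, the .err status feed itself lists the errors encountered.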

Tip:

To index multibyte character sets, set the default character set of the crawler to UTF-8 regardless of the character set of Oracle Content Server. See "Modifying the Crawler Parameters".

Table 7-11 Oracle Content Server Source Parameters (Step 1)

Parameter Value

Configuration URL

URL of the XML configuration file providing details of the source, such as the data feed type, location, security attributes, and so on. Obtain the location of the file from the Oracle Content Server administrator.

Use the following format to enter the configuration URL:

http://host_name/instance_name/idcplg?IdcService=SES_CRAWLER_DOWNLOAD_CONFIG&source=source_name

Authentication Type

Java authentication type. Set this parameter when the data feeds are accessed over HTTP.

Enter one of the following values:

  • NATIVE: Proprietary XML over HTTP authentication.

  • ORASSO: Oracle Single Sign-on.

User ID

User ID to access the data feeds. The access details of the data feed are specified in the configuration file. Obtain a user ID from the Oracle Content Server administrator.

Password

Password for User ID. Obtain the password from the Oracle Content Server administrator.

Realm

Realm of the Oracle Content Server instance. The value for the Realm field is required only when the authentication type is set to either Basic or Digest. As Oracle Content Server sources use Native authentication, the value for the Realm field must be left blank.

Oracle SSO Login URL

Set this parameter value when the authentication type is ORASSO. Oracle SES redirects the crawler to this SSO login URL for authentication before crawling the Oracle Content Server source.

For an Oracle Content Server source secured by Oracle 10g SSO, the format of this URL is:

  • For basic authentication type:

    https://server:port/pls/orasso/orasso.wwsso_app_admin.ls_login
    
  • For form based authentication type:

    https://server:port/mysso/signon.jsp
    

For an Oracle Content Server source secured by Oracle 11g OAM, the format of this URL is:

http://server:port/oam/server/obrareq.cgi?encquery

The value for port is shown in the "Listen Port" field in the OAM Managed Server Console.

Oracle SSO Action URL

Set this parameter value when the authentication type is ORASSO. This is the URL that is displayed after successfully authenticating the SSO user before crawling the Oracle Content Server source.

For an Oracle Content Server source secured by Oracle 10g SSO, the format of this URL is:

https://server:port/sso/auth

For an Oracle Content Server source secured by Oracle 11g OAM, the format of this URL is:

http://server:port/oam/server/auth_cred_submit

The value for port is shown in the "Listen Port" field in the OAM Managed Server Console.

Scratch Directory

Directory where Oracle SES can write temporary status logs. The directory must be on the same system where Oracle SES is installed. Optional.

Maximum number of connection attempts

Maximum number of attempts to connect to the target server for access to the data feed.

Delete Linked Document

Set to true if the documents crawled from the links in the feeds must be deleted; otherwise, set to false.

Number of data feeds to be pre-fetched

Number of data feeds to pre-fetch. Its value can be any number greater than or equal to 0. If its value is set to 0, then no data feed is pre-fetched. The value specified must be such that there is sufficient memory to cache the pre-fetched feeds. The default value is 0.

Enable Resume Crawl

Set to true to allow a crawl to resume after it fails or stops. The default value is true.

Stop Crawl On Content Fetch Error

Set to true to stop the crawl when a content fetch error occurs; otherwise, set to false. The default value is false.
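Several Step 1 values in Table 7-11 follow simple rules that can be checked before they are entered in the Administration GUI. The sketch below builds a Configuration URL in the documented format and checks a few of the constraints stated in the table. The function names and the dictionary layout are hypothetical; they are an illustration, not an Oracle SES API.

```python
from urllib.parse import urlencode


def build_configuration_url(host_name, instance_name, source_name):
    """Build the Configuration URL in the format given in Table 7-11."""
    query = urlencode(
        {"IdcService": "SES_CRAWLER_DOWNLOAD_CONFIG", "source": source_name}
    )
    return f"http://{host_name}/{instance_name}/idcplg?{query}"


def check_source_parameters(params):
    """Return a list of problems found in a dict of Step 1 values.

    Keys are the parameter names from Table 7-11 (an assumed layout).
    """
    problems = []
    auth_type = params.get("Authentication Type")
    if auth_type not in ("NATIVE", "ORASSO"):
        problems.append("Authentication Type must be NATIVE or ORASSO")
    # Realm applies only to Basic or Digest, which these sources do not use.
    if params.get("Realm"):
        problems.append("Realm must be left blank")
    if auth_type == "ORASSO":
        for name in ("Oracle SSO Login URL", "Oracle SSO Action URL"):
            if not params.get(name):
                problems.append(f"{name} must be set for ORASSO")
    prefetch = params.get("Number of data feeds to be pre-fetched", 0)
    if not isinstance(prefetch, int) or prefetch < 0:
        problems.append("Number of data feeds to be pre-fetched must be >= 0")
    return problems


if __name__ == "__main__":
    # Host, instance, and source names here are placeholders.
    print(build_configuration_url("example.com", "idc", "mysource"))
    print(check_source_parameters({"Authentication Type": "ORASSO"}))
```

The checks cover only rules stated in the table; values such as the user ID and password must still be obtained from the Oracle Content Server administrator.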


Table 7-12 Oracle Content Server Connector Authorization Parameters (Step 2)

Parameter Value

HTTP Endpoint for Authorization

HTTP endpoint for Oracle Content Server authorization, such as http://example.com:7777/idc/idcplg.

Display URL Prefix

HTTP host information to prefix the partial URL specified in the access URL of the documents in RSS feeds to form the complete URL. This complete URL is displayed as the URL when a user clicks the document link in the Oracle SES search results page. For example, you might display http://example.com:7777/idc (not http://example.com/, as shown on the user interface page).

Administrator User

Administrative user to access the Authorization Service API of Oracle Content Server.

Administrator Password

Administrative user password.

Display Crawled Version

Controls access to the crawled documents:

  • true: Search results point to the crawled version of the document.

  • false: Search results point to the content information page.

Authorization User ID Format

Format of the user ID used by the Oracle Content Server authorization API, such as username, email, nickname, user_name.

Use Cached User and Role Information to Authorize Results

Controls user authorization:

  • true: Uses the cached user query filter. This setting removes the query time dependency on Oracle Content Server.

  • false: Queries Oracle Content Server for authorization.

User Role Data Source to Cache the Filter

The name of the Oracle Content Server Users source that has crawled the user's SecurityGroup and Account information.
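The effect of the Display URL Prefix parameter in Table 7-12 can be illustrated with a short sketch. The function name display_url and the partial document path are hypothetical; the sketch only assumes that the prefix and the partial access URL from the feed are joined with a single slash between them.

```python
def display_url(prefix, partial_url):
    """Join the Display URL Prefix and a partial access URL from a feed.

    Slashes at the join point are normalized so that exactly one
    separates the two parts.
    """
    return prefix.rstrip("/") + "/" + partial_url.lstrip("/")


if __name__ == "__main__":
    # The partial path below is a made-up example of a feed access URL.
    print(display_url("http://example.com:7777/idc", "/groups/public/documents/report.pdf"))
```

With the prefix http://example.com:7777/idc, the instance path idc is part of every resulting link, which is why the bare host http://example.com/ is not a suitable value.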