Oracle® Secure Enterprise Search Administrator's Guide, 11g Release 2 (11.2.1)
Oracle Corporation, E17332-04, released 2011-05-17

Overview of the Oracle Secure Enterprise Search Crawler

The Oracle Secure Enterprise Search (Oracle SES) crawler is a J2SE process that runs on the middle tier and is activated by a set schedule. When activated, the crawler spawns processor threads that fetch documents from sources. The crawler caches the documents, and when the cache reaches the maximum batch size of 250 MB, the crawler indexes the cached files. This index is used for searching.
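
The cache-then-index behavior described above can be pictured with a small sketch. It is an illustration only, not Oracle SES code: the `BatchCache` class, its method names, and the indexing callback are hypothetical, and the only figure taken from the text is the 250 MB maximum batch size.

```python
from typing import Callable, List

MAX_BATCH_BYTES = 250 * 1024 * 1024  # maximum batch size quoted in the text


class BatchCache:
    """Hypothetical cache that hands fetched documents to an indexer in batches."""

    def __init__(self, index_batch: Callable[[List[bytes]], None],
                 max_bytes: int = MAX_BATCH_BYTES) -> None:
        self._index_batch = index_batch
        self._max_bytes = max_bytes
        self._docs: List[bytes] = []
        self._size = 0

    def add(self, document: bytes) -> None:
        """Cache one fetched document; index the batch once the limit is reached."""
        self._docs.append(document)
        self._size += len(document)
        if self._size >= self._max_bytes:
            self.flush()

    def flush(self) -> None:
        """Index whatever is cached and start a new, empty batch."""
        if self._docs:
            self._index_batch(self._docs)
        self._docs = []
        self._size = 0


# Tiny demonstration with a 10-byte limit instead of 250 MB.
indexed_batches: List[int] = []
cache = BatchCache(lambda docs: indexed_batches.append(len(docs)), max_bytes=10)
for doc in (b"aaaa", b"bbbb", b"cccc", b"dd"):
    cache.add(doc)
cache.flush()  # end of the crawl: index the remainder
```

In the demonstration the third document pushes the cache past its limit, so three documents are indexed together and the last one is indexed by the final flush.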

The document cache, called Secure Cache, is stored in Oracle Database in a compressed SecureFile LOB. Oracle Database provides excellent security and compact storage.

In the Oracle SES Administration GUI, you can create schedules with one or more sources attached to them. Schedules define the frequency at which the Oracle SES index is kept up-to-date with existing information in the associated sources.

See "Understanding the Crawling Process" for more detailed information about the crawling process.

Modifying the Crawler Parameters

You can alter the crawler's operating parameters at two levels:

  • At the global level for all sources

  • At the source level for a particular defined source

Global parameters include the default values for language, crawling depth, and other crawling parameters, and the settings that control the crawler log and cache.

To configure the crawler: 

  1. Click the Global Settings tab.

  2. Under Sources, click Crawler Configuration.

  3. Make the desired changes on the Crawler Configuration page. Click Help for more information about the configuration settings.

  4. Click Apply.

To configure the crawling parameters for a specific source: 

  1. From the Home page, click the Sources secondary tab to see a list of sources you have created.

  2. Click the edit icon for the source whose crawler you want to configure, to display the Edit Source page.

  3. Click the Crawling Parameters subtab.

  4. Make the desired changes. Click Help for more information about the crawling parameters.

  5. Click Apply.

The parameter values for a particular source can override the default values set at the global level. For example, for Web sources, Oracle SES sets a default crawling depth of 2, irrespective of the crawling depth you set at the global level.
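
One way to picture this precedence is as layered dictionaries, sketched below. The parameter names, the global depth of 5, and the function are assumptions made for illustration; only the Web default crawling depth of 2 comes from the text.

```python
from typing import Any, Dict, Optional

# Hypothetical global settings; the real ones are on the Crawler Configuration page.
GLOBAL_DEFAULTS: Dict[str, Any] = {"crawling_depth": 5, "default_language": "en"}

# Per the text, Web sources get a default crawling depth of 2
# irrespective of the global crawling depth.
SOURCE_TYPE_DEFAULTS: Dict[str, Dict[str, Any]] = {"web": {"crawling_depth": 2}}


def effective_parameters(source_type: str,
                         source_settings: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Layer source-level values over source-type defaults over global defaults."""
    merged = dict(GLOBAL_DEFAULTS)
    merged.update(SOURCE_TYPE_DEFAULTS.get(source_type, {}))
    merged.update(source_settings or {})
    return merged


web_params = effective_parameters("web")
file_params = effective_parameters("file", {"default_language": "fr"})
```

Here `web_params` keeps the global language but takes the Web-specific depth, while `file_params` keeps the global depth and takes the language set on the source.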

Also note that some parameters are specific to a particular source type. For example, Web sources include parameters for HTTP cookies.


Setting Up Web Sources

A Web source enables users to search a Web site. The following procedures identify the basic steps for setting up a Web source using the Oracle SES Administration GUI. For more information on each page, click Help.

Oracle SES is configured to crawl Web sites on the intranet within the corporate firewall. To crawl Web sites on the Internet (external Web sites), Oracle SES requires the HTTP proxy server information. See the Global Settings - Proxy Settings page.

You should review the default crawling parameters before you start crawling Internet sources.

To create a Web source: 

  1. On the Home page, select the Sources secondary tab to display the Sources page.

  2. For Source Type, select Web.

  3. Click Create to display the Create Web Source page.

  4. Complete the following fields:

    • Source Name: Name that you assign to this Web source.

    • Starting URLs: The HTTP or HTTPS address of the Web site, starting at the top page to be searched.

    • Self Service: Disabled to use an identity management system or Enabled to prompt users for their credentials.

    • Start Crawling Immediately: Select this option to accept the default parameters and begin crawling, or deselect it to defer crawling.

  5. Click Create or Create & Customize.

  6. Follow the steps for crawling and indexing a source in "Getting Started Basics for the Administration GUI".

Figure 5-1 shows the Create Web Source page.

To customize a Web source: 

  1. When creating a Web source, click Create & Customize on the Create Web Source page to display the Customize Web Source page.

    or

    After creating a source, click the Edit icon on the Home - Sources page.

  2. Click the following subtabs and make the desired changes.

  3. Click Apply.

Figure 5-2 shows the Customize Web Source page.

Web Document Attributes

Oracle SES crawls and indexes these Web document attributes:

  • Title

  • Author

  • Description

  • Host

  • Keywords

  • Language

  • LastModifiedDate

  • Mimetype

  • Subject: Mapped to "Description". If there is no description metatag in the HTML file, then it is ignored.

  • Headline1: The highest H tag text; for example, "Annual Report" from <H2>Annual Report</H2> when there is no H1 tag in the page.

  • Headline2: The second highest H tag text

  • Reference Text: The anchor text from another Web page that points to this page.

You can define additional HTML metatags to map to a String attribute on the Home - Sources - Metatag Mapping page.
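
The Headline1 and Headline2 attributes listed above can be illustrated with the following sketch, which uses the standard `html.parser` module. It is not the crawler's implementation. It assumes that "second highest" means the next heading level present in the page and that the first heading found at each level is the one used; the class and function names are hypothetical.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple


class HeadingCollector(HTMLParser):
    """Collect the text of every H1..H6 element, keyed by heading level."""

    def __init__(self) -> None:
        super().__init__()
        self.headings: Dict[int, List[str]] = {}
        self._level: Optional[int] = None
        self._text: List[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag names in lowercase.
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self._level = int(tag[1])
            self._text = []

    def handle_data(self, data):
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._level is not None and tag == f"h{self._level}":
            text = "".join(self._text).strip()
            self.headings.setdefault(self._level, []).append(text)
            self._level = None


def headlines(html: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (Headline1, Headline2) from the two highest heading levels present."""
    collector = HeadingCollector()
    collector.feed(html)
    levels = sorted(collector.headings)  # 1 is the highest level
    first = collector.headings[levels[0]][0] if levels else None
    second = collector.headings[levels[1]][0] if len(levels) > 1 else None
    return first, second


result = headlines("<html><body><H2>Annual Report</H2><h3>Summary</h3></body></html>")
```

For a page with no H1 tag, as in the example in the list, the H2 text becomes Headline1 and the H3 text becomes Headline2.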


Changing the Master Encryption Key

A master encryption key is used to encrypt secure fields in Oracle SES. You can change this key if its security is compromised or for any other reason.

To change the master encryption key: 

  1. Stop all crawler schedules.

  2. Close all middle-tier applications, except for the Monitor application.

  3. Open an interactive session on the Oracle SES middle-tier computer.

  4. Issue a searchctl rollover_key command. See the following description.

  5. Restart the crawler and the middle-tier applications.

searchctl rollover_key

This command has the following syntax:

searchctl rollover_key options

Options have the format keyword=value:

ses_db_conn_str

Local JDBC connection string for the Oracle SES database. For example, localhost:5555:ses1. Required.

ses_admin_passwd

Oracle SES administrative password, that is, for the SEARCHSYS user. If you omit this password from the command, then you are prompted for it.

wls_admin_server

URL to the WebLogic Server Administration Console. For example, t3://wls_example:8000. Required.

wls_admin_user

User name of the WebLogic administrative user. Required.

wls_admin_passwd

Password of the WebLogic administrative user. If you omit this password from the command, then you are prompted for it.

master_key

New master key. If you omit this option, a random master key is set.

The following command changes the master key to "testing123":

searchctl rollover_key ses_db_conn_str=localhost:5555:ses1 ses_admin_passwd=password wls_admin_user=weblogic wls_admin_passwd=password wls_admin_server=t3://asHost:8000 master_key=testing123
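
If the command is assembled by a script, a small helper along the following lines can check the options first. This is a sketch under assumptions: the helper name is hypothetical, and only the option keywords and the required/optional split come from the descriptions above. Passwords left out of the option set are simply not added, so the command prompts for them instead of exposing them on the command line.

```python
from typing import Dict, List, Optional

# Option keywords and requirements as described in this section.
REQUIRED = ("ses_db_conn_str", "wls_admin_server", "wls_admin_user")
OPTIONAL = ("ses_admin_passwd", "wls_admin_passwd", "master_key")


def rollover_key_args(options: Dict[str, Optional[str]]) -> List[str]:
    """Build the argument list for `searchctl rollover_key` from keyword=value options."""
    unknown = set(options) - set(REQUIRED) - set(OPTIONAL)
    if unknown:
        raise ValueError(f"unknown options: {sorted(unknown)}")
    missing = [name for name in REQUIRED if not options.get(name)]
    if missing:
        raise ValueError(f"missing required options: {missing}")
    args = ["searchctl", "rollover_key"]
    for name in REQUIRED + OPTIONAL:
        value = options.get(name)
        if value:  # omitted passwords are prompted for; an omitted key is randomized
            args.append(f"{name}={value}")
    return args


args = rollover_key_args({
    "ses_db_conn_str": "localhost:5555:ses1",
    "wls_admin_server": "t3://asHost:8000",
    "wls_admin_user": "weblogic",
})
```

The resulting list could then be passed to `subprocess.run(args, check=True)` in an interactive session on the middle-tier computer.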

Setting Up Oracle Content Server Sources

The Oracle Content Server connector enables Oracle SES to search Oracle Content Server (formerly Stellent Server), which is the foundation of the Oracle Universal Content Management solution. Users throughout the organization can contribute content from native desktop applications, manage content through rich library services, publish content to Web sites or business applications, and access the content with a browser.

The Content Server connector supports Oracle Content Server 7.5.2 or 10gR3 with XMLCrawlerExport (the Oracle Content Server RSS component).

Oracle Content Server includes an RSS feed generator component (XMLCrawlerExport) on top of the content server. This component generates RSS feeds as XML files from its internal indexer, based on indexer activity. It has access to the original content (for example, a Microsoft Word document), the Web viewable rendition, and all the metadata associated with each document. The component also has a template that contains an Idoc script that applies the metadata values from the indexer to generate the XML document. (Idoc is an Oracle Content Server proprietary scripting language.) Oracle Content Server generates feeds for all documents for the initial crawl, and feeds for updated and deleted documents for the incremental crawl. Each document can be an item in the feed, with the operation on the item (such as insert, delete, update), its metadata (such as author, summary), URL links, and so on.

The Oracle Content Server connector reads the feeds provided by Oracle Content Server according to a crawling schedule. Oracle SES parses and extracts the metadata information, and fetches the document content, using its generic RSS crawler framework.

Oracle SES supports the control feed method, in which individual feeds can be located anywhere and a control feed file is generated containing the links to other feeds. This control file is input to the connector through the configuration file. A control feed must be used when the two computers are on different domains or on different platforms, or when they communicate over a remote access protocol such as HTTP or FTP.

Oracle Content Server Security Model

The Oracle Content Server security model is based on the concept of permissions, which defines the privileges a user has on a document. The following table shows the set of permissions supported by Oracle Content Server. Each permission is a superset of the previous ones. For example, Write permission includes Read permission. Admin permission is a superset of all the permissions.

Oracle Content Server provides multiple security models, including an out-of-the-box security system and integration with centralized security models such as LDAP and Active Directory.

Oracle Universal Content Management security can work in these modes:

  • Universal Content Management native identity plugin where Universal Content Management is not connected to a directory

  • Oracle Internet Directory

  • Active Directory only where Universal Content Management is connected to Active Directory using LDAP. A connection to Active Directory using Microsoft Security is not supported.

The Oracle SES Oracle Content Server connector supports the two most popular security models among current Oracle Content Server customers: Roles and Groups, and Accounts.

Accounts

Accounts provide greater flexibility and granularity than groups. An account is a group of content. It introduces another metadata field that is filled out upon content check-in. When accounts are enabled, content items also can be assigned to an account in addition to the security group. A user must have access to the account to read, write, delete or administer content in that account. When accounts are used, the account becomes the primary permission to satisfy before security group permissions are applied.

A user's access to a document is the intersection of their account permissions and their security group permissions. For example, a user is assigned the EngAdmin role, which has all permissions to the documents in the EngDocs security group. At the same time, the user is also assigned Read and Write permission to the EngProjA account. Therefore, the user has only Read and Write permission to a content item that is in the EngDocs security group and the EngProjA account.

Accounts can also be set up in a hierarchical structure. A user has permission to the entire subtree starting from the account node. For instance, a user assigned to the Eng account has access to Eng/AbcProj and Eng/XyzProj, or any accounts beginning with Eng. In other words, users that have permission to a particular account prefix also have access to all accounts with that prefix.


Note:

Oracle Content Server uses a prefix test for account filtering, so a slash (/) has no special meaning. A user granted permission to account A has access to any documents in account A*, such as A, AB, or A/B. The hierarchical structure takes advantage of the prefix semantics, but it is enforced with the account model. Hence, there is no special character as the level divider when testing for account permissions.
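
The prefix test and the intersection of account and security group permissions can be sketched as follows. This is an illustration, not the Oracle Content Server algorithm: the permission ordering (Read, Write, Delete, Admin) is an assumption based on the permissions named in this section, the function names are hypothetical, and content items without an account are not handled.

```python
from typing import Dict

# Assumed ordering, each level including the ones before it (Write includes Read).
LEVELS = {"None": 0, "Read": 1, "Write": 2, "Delete": 3, "Admin": 4}
NAMES = {value: name for name, value in LEVELS.items()}


def account_level(grants: Dict[str, str], doc_account: str) -> int:
    """Highest level among granted accounts that are a prefix of the document's account."""
    matching = [LEVELS[perm] for account, perm in grants.items()
                if doc_account.startswith(account)]  # plain prefix test, "/" is not special
    return max(matching, default=0)


def effective_permission(group_grants: Dict[str, str], account_grants: Dict[str, str],
                         doc_group: str, doc_account: str) -> str:
    """Intersect security group and account permissions for one content item."""
    group = LEVELS[group_grants.get(doc_group, "None")]
    account = account_level(account_grants, doc_account)
    return NAMES[min(group, account)]


user_groups = {"EngDocs": "Admin"}     # from the EngAdmin role
user_accounts = {"EngProjA": "Write"}  # Read and Write on the account

on_account = effective_permission(user_groups, user_accounts, "EngDocs", "EngProjA")
on_child = effective_permission(user_groups, user_accounts, "EngDocs", "EngProjA/Specs")
on_other = effective_permission(user_groups, user_accounts, "EngDocs", "EngProjB")
```

With these grants the user ends up with Write (that is, Read and Write) on items in EngProjA and in any account that begins with EngProjA, and with no access to items in EngProjB.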


See Also:

Oracle Universal Content Management documentation at

http://www.oracle.com/technetwork/middleware/content-management/index-094708.html


Creating an Oracle Content Server Source

To create an Oracle Content Server source using the Oracle SES Administration GUI:

  1. On the Home page, click the Sources secondary tab to display the Sources page.

  2. Select Oracle Content Server from the Source Type list, then click Create to display Step 1 Parameters.

  3. Enter values for the parameters described in Table 6-12.

  4. Click Next to display Step 2 Authorization, then set values for the parameters described in Table 6-13.

  5. Scroll down to Security Attributes to verify that ACCOUNT and DOCSECURITYGROUP are listed. If they are not, then the source was not created correctly. Verify that the Configuration URL in Step 1 is correct.

  6. Click Create to create the Oracle Content Server source.

    After processing each data feed, a status feed is uploaded to the location specified in the configuration file. This status feed is named one of the following:

    • data_feed_file_name.suc indicates the data feed was processed successfully.

    • data_feed_file_name.err indicates that an error was encountered while processing the feed. The errors are listed in this status feed.


Tip:

To index multibyte character sets, set the default character set of the crawler to UTF-8 regardless of the character set of Oracle Content Server. See "Modifying the Crawler Parameters".

Table 6-12 Oracle Content Server Source Parameters (Step 1)

Parameter    Value

Configuration URL

URL of the XML configuration file providing details of the source, such as the data feed type, location, security attributes, and so on. Obtain the location of the file from the Oracle Content Server administrator.

Use the following format to enter the configuration URL:

http://host_name/instance_name/idcplg?IdcService=SES_CRAWLER_DOWNLOAD_CONFIG&source=source_name

Authentication Type

Java authentication type. Set this parameter when the data feeds are accessed over HTTP.

Enter one of the following values:

  • NATIVE: Proprietary XML over HTTP authentication

  • ORASSO: Oracle Single Sign-on.

User ID

User ID to access the data feeds. The access details of the data feed are specified in the configuration file. Obtain a user ID from the Oracle Content Server administrator.

Password

Password for User ID. Obtain the password from the Oracle Content Server administrator.

Realm

Realm of the Oracle Content Server instance.

Oracle SSO Login URL

URL that protects all OracleAS Single Sign-on applications. Set this parameter when the Authentication Type is ORASSO.

Oracle SSO Action URL

URL that authenticates OracleAS Single Sign-on user credentials. The login form is submitted to this URL. Set this parameter when Authentication Type is ORASSO.

Scratch Directory

Directory where Oracle SES can write temporary status logs. The directory must be on the same system where Oracle SES is installed. Optional.

Maximum number of connection attempts

Maximum number of attempts to connect to the target server for access to the data feed.
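
The Configuration URL format in Table 6-12 can be filled in programmatically, as in this sketch. The function name and the sample values are assumptions for illustration: the host and instance are borrowed from the example endpoint http://example.com:7777/idc/idcplg used elsewhere in this section, and `ses_source` is a made-up source name.

```python
from urllib.parse import urlencode


def configuration_url(host_name: str, instance_name: str, source_name: str) -> str:
    """Fill in the Configuration URL pattern from Table 6-12."""
    query = urlencode({
        "IdcService": "SES_CRAWLER_DOWNLOAD_CONFIG",
        "source": source_name,  # urlencode escapes any special characters
    })
    return f"http://{host_name}/{instance_name}/idcplg?{query}"


url = configuration_url("example.com:7777", "idc", "ses_source")
```

Confirm the actual host, instance, and source name with the Oracle Content Server administrator before entering the URL.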


Table 6-13 Oracle Content Server Connector Authorization Parameters (Step 2)

Parameter    Value

HTTP Endpoint for Authorization

HTTP endpoint for Oracle Content Server authorization, such as http://example.com:7777/idc/idcplg.

Display URL Prefix

HTTP host information to prefix the partial URL specified in the access URL of the documents in RSS feeds to form the complete URL. This complete URL is displayed as the URL when a user clicks the document link in the Oracle SES search results page. For example, you might display http://example.com:7777/idc (not http://example.com/, as shown on the user interface page).

Administrator User

Administrative user to access the Authorization Service API of Oracle Content Server.

Administrator Password

Administrative user password.

Display Crawled Version

Controls access to the crawled documents:

  • true: Search results point to the crawled version of the document.

  • false: Search results point to the content information page.

Authorization User ID Format

Format of the user ID used by the Oracle Content Server authorization API, such as username, email, nickname, user_name.

When no value is specified, the canonical form of the user identity in the active identity plug-in is submitted to the authorization API.

Use Cached User and Role Information to Authorize Results

Controls user authorization:

  • true: Uses the cached user query filter. This setting removes the query time dependency on Oracle Content Server.

  • false: Queries Oracle Content Server for authorization.

User Role Data Source to Cache the Filter

The name of the Oracle Content Server Users source that has crawled the user's SecurityGroup and Account information.

Authentication Type

Java authentication type. Enter NATIVE for proprietary XML over HTTP authentication, or ORASSO for Oracle Single Sign-on. Set this parameter when the data feeds are accessed over HTTP.

Realm

Realm of the Oracle Content Server instance.

Oracle SSO Login URL

URL that protects all OracleAS Single Sign-on applications. Set this parameter when the Authentication Type is ORASSO.

Oracle SSO Action URL

URL that authenticates OracleAS Single Sign-on user credentials. The login form is submitted to this URL. Set this parameter when Authentication Type is ORASSO.



Part III

Advanced Topics

This part provides information for experienced administrators. It contains the following chapters:


Index


Symbols

DR$EQ$DOC_PATH_IDX$I, 10.3.11

A

access URL, 3.1.2
ACLs
defined, 9.1.3.2
policies, 9.1.3.2, 9.1.3.2, 9.2.5
restrictions, 9.1.3.3, 9.1.3.3
Active Directory
activating the plug-in, 9.1.3.4.1
IDM systems, 5.7.1
Administration GUI, 1.3.1
administrative user
SEARCHSYS, 9.1.2, 9.1.4.1
AJP13 protocol, 5.7.1
from remote hosts, 9.1.1
alternate words, 2.2.2
Apache Axis
license, C.1
Apache log4j
license, C.1
APIs
Authorization Plug-in, 9.1.3.2
Web Services, 11.2
Administration Web Service, 11.2
Query Web Service, 11.2
application identity, 1.4.2, 3.7
Application Server Control Console
overview, 10.8
Applications Control, 3.7
attributes
attribute-based security, 3.3.2
mapping federated, 5.7.3.4
overview, 3.5
retrieving federated, 5.7.3.3
tuning the weights of, 4.2
authentication, 10.3.2
authorization, 10.3.2
ACLs, 9.2.3
query-time filtering, 9.2.5
self service, 9.2.6
authorization plug-in
Fusion, 8.1
WebCenter, 8.2
Authorization Plug-in API, 9.1.3.2

B

boundary control of Web crawling, 3.2
boundary rules, 2.2.3, 3.7.1
defined, 3.2.2
example using regular expression, 3.2.2.3
exclusion rules, 3.2.2.2
inclusion rules, 3.2.2.1
permanent redirect, 10.2.8
tuning, 10.2.3
with dynamic pages, 10.2.4
with file sources, 10.2.3.1
with Portal sources, 5.6.1.2
with symbolic links, 5.3.2.2
buffer cache, 10.3.11

C

character set detection, 3.2.8
Chinese, 1.5, 3.2.8, 3.2.8.2
cluster configuration, 10.3.8
crawler, 3.1
crawling multimedia files, 3.2.2.2
crawling process, 3.5
depth, 3.2.4, 10.2.5
document types
zip files restriction, 3.2.3
log file, 3.7.2, 10.2.11
crawler.dat configuration file, 3.7.2
enabling character set detection, 3.2.8
setting default document titles, 3.2.7, 3.2.7
maintenance crawls, 3.6.2
monitoring the crawling process, 3.7
overview, 3.1
URL status codes, B
crawler configuration, 2.2.3
crawler status
Error Manual Recovery, 10.2.1.2
crawling mode, 3.2.1

D

data files, 10.1
database initialization parameters, 10.3.9
debug mode, 10.4
DICOM, 3.4.3
dicom, 3.4.3
display URL, 3.1.2
document attributes, 3.5
document service, 9.2
document types
zip files restriction, 3.2.3
domain rules, 3.2.2
duplicate documents, 10.2.7
dupMarked, 11.2.4.3.1, 11.2.5.3.4, 11.2.5.3.6, 11.2.5.3.8
dupRemoved, 11.2.4.3.1, 11.2.5.3.4, 11.2.5.3.6, 11.2.5.3.8
hasDuplicate, 11.2.4.3.2
isDuplicate, 11.2.4.3.2
versus near duplicate documents, 10.2.7
dynamic pages, 10.2.4

E

easy connect naming method, 10.5
encryption key, 9.6
Enterprise Manager Applications Control, 3.7
Error Manual Recovery status, 10.2.1.2
error messages, D
ESSAPP, 3.7

F

faceted navigation, 4.3
failed schedules, 2.2.1, 10.2.1
failover support, 10.5
federated search, 1.5.2
characteristics, 5.7.4.2
example, 5.7.2
limitations, 5.7.4.3
setting up, 5.7
trusted entities, 5.7.1
federation trusted entities, 5.7.1
file sources
crawling file URLs, 5.3.2.3
multibyte environments, 5.3.2.1
tips, 5.3.2.2
URL boundary rules
with file sources, 10.2.3.1
with symbolic links, 5.3.2.2
Fusion Connector, 8.1
FUSION_APPS_SEARCH_APPID, 1.4.2, 3.7

G

gif, 3.4.3
Google Desktop for Enterprise
integrating with, 10.7

H

hit count
approximate count, 9.2.5
exact count, 9.2.5
exact count (adjusted for query-time filtering), 9.2.5
HTML forms, 9.1.2.1
HTTP authentication, 9.1.2.1, 9.1.4
HTTP protocol, 3.1.2, 5.3.2.3, 9.1.1
HTTP proxy server, 5.1
HTTP proxy servers, 10.2.2
HTTP status codes, 3.7.3, 10.2.8, 10.2.8, 10.2.8, 10.6, B
HTTPS protocol, 3.1.2, 5.7.1, 9.1.1, 9.5, 9.5.4

I

identity management systems, 2.2.3, 9.1.1, 9.1.3.1, 9.1.3.4, 9.1.4.1, 9.2.3
identity plug-in, 8.1.1
Fusion, 8.1
identity plug-ins, 2.2.3, 9.1.3.4
ACLs, 9.1.3.2
activating, 9.1.3.4
define, 9.1.1
re-registering, 9.1.3.5
restrictions, 9.1.3.6
user authentication, 9.1.3.1
image format
dicom, 3.4.3
gif, 3.4.3
tiff, 3.4.3
image formats
jpeg, 3.4.3
IMAP server, 9.2.6
mailing list sources, 5.5
index memory size, 10.3.5.2
index optimization, 10.3.4
indexing
stopwords, 3.6.1.3
indexing batch size, 10.3.5.1
indexing parameters, 10.3.5
initialization parameters, 10.3.9

J

Japanese, 1.5, 3.2.8, 3.2.8.2, 3.2.8.2, 7.3.1.3, 8.4.1, 8.5.1
JDBC, 8.4, 9.1.1
jpeg, 3.4.3

K

KEEP pool, 10.3.11
key, master encryption, 9.6
Korean, 1.5, 3.2.8.2, 5.3.2.1

L

list of values (LOV), 3.5.2
log files
crawler log file, 10.2.11

M

mailing list sources
tips, 5.5
master encryption key, 9.6
metadata, 3.5
multimedia files
crawling, 3.2.2.2
MW_HOME, Preface

N

navigation tools, 4.3

O

OAM
See Oracle Access Manager
OC4J server, 11.2.3
open_cursors parameter, 10.3.9
optimizing
index, 10.3.4
Oracle Access Manager, 9.4
Oracle Calendar sources
secure, 7.6
Oracle Content Database sources, 6.3
tips, 6.3.1
Oracle Content Services, 6.3.1
Oracle Enterprise Manager, 3.7
Oracle HTTP Server
channel with Oracle SES, 9.1.4.2
front-ending, 9.1.4.2, 9.3
SSL-protect, 5.7.1
with AJP13 port, 5.7.1
Oracle Internet Directory
identity plug-in, 9.1.4.1
restrictions, 9.1.3.6
IDM systems, 5.7.1
login attribute, 7.6.2
overview, 9.1.4.1
Oracle RAC
failover, 10.5
tuning, 10.3.7
Oracle Secure Enterprise Search
accessing Application Server Control Console, 10.8
Administration GUI, 1.3.1
components, 1.3
crawler, 1.3.2, 3.1
error messages, D
getting started, 2.1
global settings, 2.2.3
integration with Oracle Internet Directory, 9.1.4.1
overview, 1.1
security, 9.1
statistics, 2.2.1
third party licenses
Apache Axis, C.1
Apache log4j, C.1
tuning crawl performance, 10.2
what's new in 10.1.7, Preface
ORACLE_HOME, Preface
OracleAS Portal sources, 9.1.2.1
tips, 5.6.1.2
user privileges, 5.6.1.2
OracleAS Single Sign-On, 9.1.2.1, 9.1.4.2

P

parallel querying, 10.3.3
partitioning, 10.3.3
passwords
temporary, 9.1.2.1
path rules, 3.2.2
physical memory, 10.3.11
processes parameter, 10.3.9
proxy servers, 10.2.2

Q

query application, 9.2
customize results, 4.2
suggested content, 4.1
query configuration, 2.2.3
query-time authorization
comparison with ACLs, 9.1.3.2
configuration, 9.2.5

R

redo log, 10.2.10
relevancy boosting, 2.2.2, 10.3.6.1
limitations, 10.3.6.1
result filter, 6.3.4
ResultFilterPlugin class, 9.2.5
robots META tag, 3.2.5, 10.2.6
robots.txt file, 3.2.5, 10.2.6
robots.txt protocol, 3.2.5, 10.2.6
rollover_key, 9.6
rules
domain, 3.2.2
path, 3.2.2

S

schedules, 2.2.1
failed, 2.2.1
fixing stuck requests, 10.2.1.2
understanding, 10.2.1
search attributes
default, 3.5
search performance, 2.2.2
search results
narrowing, 4.3
search server configuration, 10.3.8
SEARCH_DATA tablespace, 10.1
SEARCH_INDEX tablespace, 10.1
SEARCH_TEMP tablespace, 10.1
searchctl rollover_key, 9.6
SEARCHSYS
administrative user, 9.1.2, 9.1.4.1
secure search, 1.5.1
identity plug-ins, 2.2.3
security filters, 9.1.3.1, 9.1.3.2, 10.3.2
self service authorization, 9.2.6
sessions parameter, 10.3.9
SOAP, 11.2, 11.2.2, 11.2.2.2
client applications using, 11.2.3.1
development environment, 11.2.4.2
message body, 11.2.3
messages, 11.2.9
source groups, 2.2.2
source hierarchy, 2.2.2
sources
synchronizing, 3.1
types
database, 8.4
e-mail, 1.2
EMC Documentum Content Server, 6.1
EMC Documentum eRoom, 7.1
federated, 5.7
file, 1.2
Lotus Notes, 7.2
mailing list, 1.2
Microsoft Exchange, 7.3
NTFS for UNIX, 7.5
NTFS for Windows, 7.4
Oracle Calendar, 7.6
Oracle Content Database, 6.3
Oracle Mail, 7.7
OracleAS Portal, 1.2
Siebel 7.8, 8.5
Siebel 8, 8.6
table, 1.2
Web, 1.2
spell checking, 2.2.3
SQL*Plus
connecting using, 9.1.1
SSL, 9.1.1, 9.5.1
certificates, 9.5.1
crawling Web site with SSL certificates, 9.5.3
importing certificates, 9.5.3
in Oracle SES, 9.5
JSSE, 9.5
keystore, 9.5.1
statistics, 2.2.1, 10.3.6
stoplist, 3.6.1.3
stopwords, 3.6.1.3
storage areas, 10.3.3.1
stuck threads, 10.3.8
suggested content, 4.1
example with Google OneBox, 4.1.3
security options, 4.1.2
suggested links, 2.2.2, 10.3.1

T

tablespaces, 10.1
temp files, 10.1
temporary passwords, 9.1.2.1
threads, stuck, 10.3.8
tiff, 3.4.3
time outs, 10.3.2
tips
using file sources, 5.3.2
using mailing list sources, 5.5
using Oracle Calendar sources, 7.6
using Oracle Content Database sources, 6.3.1
using OracleAS Portal sources, 5.6.1.2
using user-defined sources, 5.3.2.3
titles, changing, 3.2.7, 3.2.7
trusted entities, 5.7.1

U

undo tablespace, 10.3.10, 10.3.10
UNDO_RETENTION parameter, 10.3.10
updateCred command (WLST), 10.5
URL boundary rules, 2.2.3, 3.7.1
defined, 3.2.2
permanent redirect, 10.2.8
tuning, 10.2.3
with dynamic pages, 10.2.4
with Portal sources, 5.6.1.2
with symbolic links, 5.3.2.2
URL crawler status codes, B
URL looping, 10.2.9
URL queue, 3.1.1
user authentication, 9.1.3.1
user authorization, 9.1.3.2
user-defined sources, 1.2, 2.2.1
tips, 5.3.2.3

W

Web crawling
boundary control, 3.2
Web Services API, 11.1, 11.2
architecture, 11.2.3
concepts, 11.2.2
SOAP, 11.2.2.2
WSDL, 11.2.2.3
data types, 11.2.4
example, 11.2.7
installation, 11.2.1
operations, 11.2.5.1
query syntax, 11.2.6
URL, 11.2.1
WebLogic Server Administration Console, 10.8
WebLogic server configuration, 10.3.8
WSDL specification, 11.2.2.3

X

XML connector framework, 3.3
examples, A
Oracle E-Business Suite, 8.3
schemas, A
Siebel 8, 8.6

6 Configuring Access to Content Management Sources

This chapter contains the following topics:


Setting Up EMC Documentum eRoom Sources

The EMC Documentum eRoom Server plug-in extends the searching capabilities of Oracle SES and enables it to search Documentum eRoom Server repositories. Oracle SES can crawl through the documents and related metadata in the Documentum eRoom and provide secure, full-text search. It also provides metadata search and browse functionality.

Documentum eRoom data is stored in an eRoom, which in turn can contain other containers and content. A Documentum eRoom Server instance can have one or more items that can be crawled using the Documentum eRoom Server plug-in by configuring parameters in Oracle SES. The Documentum eRoom Server plug-in navigates through all the containers and the inline contents to crawl all the documents and items in Documentum eRoom Server. It creates an index, stores the metadata, and accesses information in Oracle SES to provide search according to the end user permissions.

The Documentum eRoom Server plug-in supports incremental crawling; that is, it crawls and indexes only those documents which have changed after the most recent crawling was performed. A document is re-crawled if either the content or metadata or the direct security access information of the document has changed. A document is also re-crawled if it is moved within Documentum eRoom Server and the end user has to access the same document with a different URL. Documents deleted from items are removed from the index during incremental crawling.
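
The incremental crawl rules above amount to comparing what the previous crawl recorded with what the repository holds now. The sketch below shows that comparison under assumptions: the `Snapshot` fields and the function are hypothetical, and the plug-in's real change detection is not described in this guide.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Snapshot:
    """What a crawl records for one document (hypothetical fields)."""
    url: str
    content_hash: str
    metadata_hash: str
    acl_hash: str


def plan_incremental_crawl(previous: Dict[str, Snapshot],
                           current: Dict[str, Snapshot]) -> Tuple[List[str], List[str]]:
    """Return (ids to crawl again, ids to remove from the index)."""
    recrawl = [doc_id for doc_id, snap in current.items()
               if previous.get(doc_id) != snap]  # new, changed, or moved to a new URL
    removed = [doc_id for doc_id in previous if doc_id not in current]
    return recrawl, removed


previous = {
    "a": Snapshot("/room/a", "c1", "m1", "s1"),
    "b": Snapshot("/room/b", "c2", "m2", "s2"),
    "c": Snapshot("/room/c", "c3", "m3", "s3"),
}
current = {
    "a": Snapshot("/room/a", "c1", "m1", "s1"),        # unchanged
    "b": Snapshot("/archive/b", "c2", "m2", "s2"),     # moved, so the URL differs
    "d": Snapshot("/room/d", "c4", "m4", "s4"),        # new document
}
recrawl, removed = plan_incremental_crawl(previous, current)
```

In this example the moved and the new documents are crawled again, the unchanged document is skipped, and the deleted document is removed from the index.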

Important Notes for Documentum eRoom Sources

  • The eRoom crawler plug-in should use the administrator account for crawling and indexing eRoom items.

  • The Documentum eRoom Server version must be 7.3.

Required Tasks

The following tasks must be performed before installing the Documentum eRoom Server plug-in:

  • Microsoft Active Directory Identity Plug-in: Configure Oracle SES to use the Active Directory identity plug-in:

    This task must be performed if the identity plug-in for Active Directory is being used for authentication.

    In the Oracle SES Administration GUI, navigate to the Global Settings - Identity Management Setup page. Select The Active Directory Identity Plug-in Manager implemented based on Oracle User & Role API, and click Activate.

    • For Authentication Attribute, select 'USER_NAME'.

    • For Directory URL, enter the host name and port number, for example 'ldap://ldapserverhost:port'.

    • For Directory account name, enter Active Directory User, for example 'Administrator'.

    • For Directory account password, enter the password for Directory account name.

    • For Directory subscriber, enter the Active Directory information (ldap base); for example, 'dc=us,dc=oracle,dc=com'.

    • For Directory security protocol, enter the appropriate value: 'none' or 'port number'.

    Click Finish.

  • Microsoft Active Directory Identity Plug-in: Synchronize users and groups from Active Directory to eRoom:

    1. Log in to eRoom Server and navigate to Community Setting.

    2. On the right side, click Directories, then select add a Directory connection. For Name, enter a name for the LDAP Directory Connection. Select the LDAP Directory option. Click Next.

    3. Enter the URLs for the LDAP directory you want to connect to. Provide the user name and password of the LDAP server. Click Next. For Search Root, specify dc=us,dc=oracle,dc=com.

    4. For Search Filter, specify cn=*. Click Next.

    5. Display the test query of connection information. Click Next.

    6. Attribute Map information is displayed. Click Next.

    7. Display the test Mapping. If these are correct, click OK.

    8. Run the LDAP_Synchronization job: To synchronize a connection, click synchronize all connection. Click OK.

  • Set up the eRoom Web Service:

    1. Check the pre-installation requisites before proceeding.

    2. Navigate to the ORACLE_HOME/search/lib/plugins/eroom folder. Unzip EroomServices.zip to any temporary folder on the computer where the IIS instance for eRoom is installed.

    3. Run Setup.Exe to install the Web service on the server that is hosting eRoom. Provide a name for the virtual directory to be created. This name is required when entering the URL for Web Service parameter in Oracle SES.

    4. Verify that the Web service is installed by checking the following URL:

      http://iisServerIP/host/VirtualDirectoryName

Creating a Documentum eRoom Source

Create a source for the user-defined eRoom source type on the Home - Sources page. Enter a source name. Provide values for the following parameters.

  • Container name: The names of the containers to be crawled by Oracle SES. You can crawl the entire Site, Community, Facility, or eRoom item. Required.

    The format for specifying container is as follows:

    <siteName>   OR
    <siteName>/<communityName>   OR
    <siteName>/<communityName>/<FacilityName>   OR
    <siteName>/<communityName>/<FacilityName>/<eRoomName>
    

    For example:

    Container name:OracleSite/OracleCommunity/OracleFacility/OracleRoom
    

    OracleRoom is crawled.

  • Attribute list: The comma-delimited list of eRoom custom attributes along with their data types to be searchable. The format is attributeName:attributeType, attributeName:attributeType. Valid values are String, Number, and Date.

    While crawling eRoom, an attribute is indexed only if both name and type match the configured name and type; otherwise, it is ignored. This is an optional field. For example, to make the following eRoom attributes searchable:

    • Attribute Name: Account Name, Attribute Type: String

    • Attribute Name: Account ID, Attribute Type: Integer

    • Attribute Name: Creation Date, Attribute Type: Date

    The value should be:

    Account Name: String, Account ID: Number, Creation Date: Date

    The default searchable attributes for Documentum eRoom Server are Modified Date, Title, Author, CreateDate, and MimeType.

  • User name: User name of a valid Documentum eRoom Server user. The user should be an administrator or a user who has access to all content, metadata, and ACL from all folders and documents of items configured in Container name. Required.

  • Password: Password of the Documentum user configured previously. Required.

  • Crawl versions: Controls whether multiple versions of documents are crawled. Valid values are true or false. The default value is false. Any other value is interpreted as false and only the latest version of a file is crawled. Optional.

  • URL for Web Services: A valid URL where eRoom Web service has been installed. (http://server/virtualName) For example, http://10.113.10.82/EroomServices.

  • URL for viewing the documents: A valid IP address or host name with port number (IP_address:port) of the server hosting Documentum eRoom. It is used for viewing the Oracle SES search results; for example, http://10.113.10.82/eRoom or http://10.113.10.82:7512/eRoom.

  • Authentication Attribute: Attribute used by the LDAP to validate the user. This varies based on the identity plug-in used for authentication. For Active Directory, set it to USER_NAME.
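The formats described for Container name, Attribute list, and Crawl versions can be checked before the values are entered in the Administration GUI. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the sketch handles a single container path, and the case-insensitive treatment of true is an assumption rather than documented Oracle SES behavior.

```python
VALID_TYPES = {"String", "Number", "Date"}
LEVELS = ("site", "community", "facility", "eRoom")


def parse_container(name):
    """Split a container name into its named levels, site first."""
    parts = [part.strip() for part in name.split("/")]
    if len(parts) > len(LEVELS) or any(not part for part in parts):
        raise ValueError(f"invalid container name: {name!r}")
    return dict(zip(LEVELS, parts))


def parse_attribute_list(value):
    """Parse 'name:type, name:type' into a dict, rejecting unknown types."""
    attributes = {}
    if not value.strip():
        return attributes  # the attribute list is optional
    for item in value.split(","):
        name, separator, type_name = item.partition(":")
        name, type_name = name.strip(), type_name.strip()
        if not separator or not name or type_name not in VALID_TYPES:
            raise ValueError(f"invalid attribute entry: {item.strip()!r}")
        attributes[name] = type_name
    return attributes


def crawl_versions(value):
    """Any value other than 'true' is treated as false."""
    return value.strip().lower() == "true"  # case handling is an assumption


container = parse_container("OracleSite/OracleCommunity/OracleFacility/OracleRoom")
attributes = parse_attribute_list(
    "Account Name: String, Account ID: Number, Creation Date: Date"
)
print(container)
print(attributes)
print(crawl_versions("yes"))
```

A type name such as Integer is rejected by parse_attribute_list, because only String, Number, and Date are listed as valid values.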


Contents

List of Tables

Title and Copyright Information

Preface

What's New

Part I Learning the Basics

1 Introduction to Oracle Secure Enterprise Search

2 Getting Started with the Oracle SES Administration GUI

3 Understanding Crawling

4 Customizing the Search Results

Part II Creating Data Sources

5 Configuring Access to Built-in Sources

6 Configuring Access to Content Management Sources

7 Configuring Access to Collaboration Sources

8 Configuring Access to Applications Sources

Part III Advanced Topics

9 Security in Oracle Secure Enterprise Search

10 Administering Oracle SES Instances

11 Oracle Secure Enterprise Search APIs

A XML Connector Examples and Schemas

B URL Crawler Status Codes

C Third Party Licenses

D Error Messages

Glossary

Index

Description of the illustration search.gif

This is a screen shot of the Search tab. The subtabs available are Relevancy, Suggested Links, Suggested Content, Alternate Words, and Source Groups.

Description of the illustration crawler_pq_index_2.gif

The figure shows how documents are partitioned by the partition engine and distributed across different storage areas, where they are processed in parallel.

Description of the illustration global.gif

This screen shot shows the Global Settings tab. The headings under Global Settings are Sources, System, and Search.

Description of the illustration benri001.eps

This graphic shows the client host in a box. Within that box are client applications and the Oracle SES Java proxies. The client applications can make SOAP calls directly or use the proxy Java library. Completely separate from this is the Oracle SES WSDL published interface.

Description of the illustration custom_web.gif

Customizing a Web Source: This screen capture shows the Customize Web Source page of the Oracle SES Administration GUI. At the top right are three links: Search, Help, and Logout. Below the links are three tabs: Home, Search, and Global Settings. Home is selected. Below the tabs and at the left are four subtabs: General, Sources, Schedules, and Statistics. Sources is selected.

The Sources page has these subtabs: Basic Settings, URL Boundary Rules, Document Types, Authentication, Authorization, Metatag Mappings, and Crawling Parameters. Basic Settings is selected. It has the following fields and values:

Starting URLs are listed in a table with two columns: Select and Starting URLs. Above the table is a Remove button. Below the table is an Add Another Row button.

At the right are two buttons, Cancel and Apply.

Description of the illustration benri005.eps

Federation Architecture: In this diagram, the Oracle SES Instance contains the Federator Engine, the Web Service API, and the End-User GUI. For Option 1, the Browser connects to the End-User GUI. For Option 2, the browser connects to Remote Applications, which connect to the Web Service API. The Oracle SES Instance provides search results from two Web Service/Remote Oracle SES Instances and one implementation of Oracle Secure Enterprise WS API/Other Systems.

Description of the illustration oem_job.gif

Oracle SES Crawls Reported in Fusion Applications Control: This screen capture shows the ESSAPP Scheduling Service. The bread crumbs show the location Scheduling Service Home > Request Details.

Request Details: 3009 (Running)

Request Properties:

Parameters: This table has Name and Value columns.

Execution Trail: This progress bar shows the submitted, scheduled, and start times.

Description of the illustration crawler_pq_index_1.gif

The figure displays how a user query is divided into sub queries and processed individually.

Description of the illustration crawler_pq_index_3.gif

The figure shows how the query partition interface splits a query and distributes it across partitions.

Description of the illustration benri006.eps

Crawler Collecting Information for Oracle SES: This graphic shows remote sources in a box in the top left. The remote sources are a database, Web sites, Fusion Applications, an e-mail server, and an OracleAS Portal server. A crawler collects these sources and sends them to a crawler plug-in. The crawler plug-in can go back and forth to other sources and remote applications. The middle tier includes the end-user GUI, Web services, and the administration tool. At the bottom of the graphic is a box for the metadata layer. It includes cached docs, index, metadata, source, schedules, source group, and others.

Description of the illustration benri012.gif

Distribution of the Work Load Among Clusters: This diagram shows the Fusion Middleware Common Domain with two clusters: the Oracle SES cluster and the Enterprise Scheduler Cluster.

The Oracle SES cluster has two managed servers named search_server1 and search_server2. The Oracle SES Administration GUI, Administration Web Service, and Query Web Server run on the Oracle SES cluster.

The Enterprise Scheduler Cluster has two managed servers named ess_server1 and ess_server2. The Oracle SES crawler runs on the Enterprise Scheduler cluster.

Description of the illustration benri011.gif

Oracle SES in Fusion Applications: This diagram shows the Fusion Middleware Common Domain containing the Administration Server and these clusters: Oracle SES cluster, Enterprise Scheduler cluster, and Fusion Applications clusters. The Oracle SES cluster contains two managed servers named search_server1 and search_server2. Outside the Common Domain is the Oracle Database 11.2.0.2 Transactional or Oracle RAC metadata repository. The clusters connect to this database instance.

Description of the illustration home.gif

This is a screen shot of the Home tab. The subtabs available are General, Sources, Schedules, and Statistics.

Description of the illustration create_web.gif

Creating a Web Source: This screen capture shows the Create Web Source page of the Oracle SES Administration GUI. At the top right are three links: Search, Help, and Logout. Below the links are three tabs: Home, Search, and Global Settings. Home is selected. Below the tabs and at the left are four subtabs: General, Sources, Schedules, and Statistics. Sources is selected. Create Web Source has the following fields and values:

Under Web Source List, no sources are defined.

Buttons appear at the right: Create & Customize, Cancel, and Delete.


Setting Up EMC Documentum Content Server Sources

Documentum data is stored in DocBases, which can contain cabinets and folders. A Documentum Content Server instance can have one or more DocBases crawled with an EMC Documentum Content Server source. The Documentum Content Server source navigates through the DocBases and the inline cabinets to crawl all the documents in Documentum Content Server. Oracle SES creates an index, stores the metadata, and accesses information in Oracle SES to provide search capabilities according to the end user permissions.

Oracle SES supports incremental crawling; that is, it crawls and indexes only those documents that have changed after the most recent crawling was scheduled. A document is re-crawled if either the content or metadata or the direct security access information of the document has changed. A document is also re-crawled if it is moved within Documentum Content Server and the end user has to access the same document with a different URL. Documents deleted from a DocBase are removed from the index during incremental crawling.
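The conditions that trigger a re-crawl can be expressed as a small decision function. The following is an illustrative sketch only and does not show how Oracle SES detects changes internally: the DocumentState class, its hash fields, and the action names are all hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class DocumentState:
    """Hypothetical snapshot of one document at crawl time."""
    url: str            # access URL; changes when the document is moved
    content_hash: str   # fingerprint of the document content
    metadata_hash: str  # fingerprint of the document metadata
    acl_hash: str       # fingerprint of the direct security access information


def crawl_action(indexed: Optional[DocumentState],
                 current: Optional[DocumentState]) -> str:
    """Decide what an incremental crawl does with one document."""
    if current is None:
        # The document no longer exists in the DocBase.
        return "delete" if indexed is not None else "skip"
    if indexed is None:
        return "crawl"  # new document, never indexed before
    if indexed != current:
        # Content, metadata, security information, or URL has changed.
        return "recrawl"
    return "skip"


doc = DocumentState("http://host/cabinet/a.doc", "c1", "m1", "a1")
moved = replace(doc, url="http://host/other/a.doc")
print(crawl_action(doc, doc))    # unchanged document
print(crawl_action(doc, moved))  # moved document
print(crawl_action(doc, None))   # deleted document
```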

Important Notes for EMC Documentum Content Server Sources

The Documentum source in Oracle SES must use the administrator account of a DocBase for crawling and indexing documents of that DocBase.

Required Tasks

Configuration for Documentum Content Server 6.5

For Windows, the JAR files can be taken from the application server directory where DA is deployed. For DFC installation on Linux, you must first create the DFC program root and the DFC user root. For example, the DFC program root can be USER HOME/DOCUMENTUM_SHARED and the DFC user root can be USER HOME/DOCUMENTUM. Table 6-2 lists the location of the JAR files in Windows and Linux.

To configure the crawler plug-in: 

  1. Create a new directory under ORACLE_HOME/search/lib/plugin/dcs/. For example, dcsothers.

  2. Copy dfc.properties to the folder created in the previous step (dcsothers) and to the main folder (dcs).

  3. Copy dfc.jar, aspectjrt.jar, certjFIPS.jar, jsafeFIPS.jar, configservice-api.jar to the dcs folder in the following path ORACLE_HOME/search/lib/plugin/dcs.

  4. The environment variables $DOCUMENTUM_SHARED (DFC Program root) and $DOCUMENTUM (DFC user directory) must be created before installing DFC on Linux. Also note that the environment variables $DOCUMENTUM_SHARED, $DOCUMENTUM, and $CLASSPATH must be exported again, and Oracle SES must be restarted when the computer restarts. These variables can also be exported permanently in Linux.

    Export environment variables in Linux using commands like these:

    For DOCUMENTUM:

    export DOCUMENTUM=/home/sesuser/DOCUMENTUM
    

    For DOCUMENTUM_SHARED:

    export DOCUMENTUM_SHARED=/home/sesuser/DOCUMENTUM_SHARED
    

    For CLASSPATH:

    export CLASSPATH=$DOCUMENTUM_SHARED/dctm.jar:$DOCUMENTUM_SHARED/config
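Because these variables must be exported again whenever the computer restarts, it can help to check them before restarting Oracle SES. The following is a minimal sketch, assuming a Linux host and the paths used in the commands above; the function names are hypothetical and the check is not an Oracle-supplied tool.

```python
import os

REQUIRED = ("DOCUMENTUM", "DOCUMENTUM_SHARED", "CLASSPATH")


def missing_variables(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED if not env.get(name)]


def classpath_problems(env=None):
    """Return the expected CLASSPATH entries that are absent."""
    env = os.environ if env is None else env
    shared = env.get("DOCUMENTUM_SHARED", "")
    # CLASSPATH entries are separated by ':' on Linux.
    entries = env.get("CLASSPATH", "").split(":")
    expected = [f"{shared}/dctm.jar", f"{shared}/config"]
    return [entry for entry in expected if entry not in entries]


# Example environment matching the export commands; values are illustrative.
example_env = {
    "DOCUMENTUM": "/home/sesuser/DOCUMENTUM",
    "DOCUMENTUM_SHARED": "/home/sesuser/DOCUMENTUM_SHARED",
    "CLASSPATH": "/home/sesuser/DOCUMENTUM_SHARED/dctm.jar:"
                 "/home/sesuser/DOCUMENTUM_SHARED/config",
}
print(missing_variables(example_env))
print(classpath_problems(example_env))
```

Calling either function without an argument inspects the environment of the current process instead of the example dictionary.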
    

Setting Up Identity Management for EMC Documentum Content Server

Setting up identity management requires administration steps in both Oracle SES and EMC Documentum. It includes the following steps:

Activating the Documentum Identity Plug-in

To activate the Documentum identity plug-in, perform the following steps:

  1. Select Documentum Identity Plug-in.

  2. Click Activate.

  3. Enter a valid DocBase name.

  4. Enter a valid user name and password.

  5. Ensure that the environment variables DOCUMENTUM and DOCUMENTUM_SHARED are set correctly.

  6. Click Finish.

Activating the Oracle Internet Directory Identity Plug-In

Before activating the Oracle Internet Directory Identity plug-in, Documentum Content Server should be synchronized with Oracle Internet Directory as an LDAP server. For synchronization, you must import the users and groups from Oracle Internet Directory to Documentum.

To synchronize users and groups in Oracle Internet Directory and Documentum Content Server: 

  1. Create an LDAP Configuration Object in Documentum Administrator (DA):

    1. Log in to DA.

    2. Navigate to Administration, User Management, LDAP.

    3. Select File, New, LDAP Configuration Object.

    4. In the Name field, enter a name for the LDAP Configuration Object.

    5. Select dm_user as the user subtype.

    6. Under Communication Mode, select Regular.

    7. Under Import, select Users and Groups.

    8. Select Default Configuration Object to use this configuration object in the server field.

    9. Click Next.

    10. In the Directory Type field, select Oracle Internet Directory Server.

    11. In the Bind Type field, select Bind by Searching for Distinguished Name.

    12. In the Binding Name field, provide the administrative user name of Oracle Internet Directory. This is usually cn=orcladmin.

    13. In the Binding Password field, provide the administrative user password.

    14. In the Host Name field, provide the Oracle Internet Directory host name.

    15. Retain the default port number of Oracle Internet Directory (389).

    16. In the Person Object Class field, provide the Base Person Object. Typically the value is inetOrgPerson.

    17. In the Person Search Base field, provide the person search base defined in Oracle Internet Directory. For example, cn=Users, dc=us, dc=oracle, dc=com.

    18. In the Person Search Filter field, specify cn=*.

    19. In the Group Object Class field, provide the Group Object. Typically the value is groupOfUniqueNames.

    20. In the Group Search Filter field, specify cn=*.

    21. Click Next.

    22. The Attribute Map information is displayed. Click Finish.

  2. Run the LDAP_Synchronization job:

    1. Log in to DA.

    2. Navigate to Administration, Job Management, Jobs.

    3. Open the job dm_LDAPsynchronization.

    4. In the state field, select Active.

    5. Select Deactivate On Failure.

    6. In Designated Server, select the host name of Documentum Server.

    7. Select Run After Update.

    8. Click the Schedule tab.

    9. In the Start Date And Time field, set the current date and time.

    10. Select Repeat time from the Repeat list.

    11. Set the Frequency field to any numeric value.

    12. Select End Date And Time and specify how long the Synchronization job should run.

    13. Click the Method tab.

    14. Select Pass Standard Argument.

    15. Click the SysObject info tab.

    16. Click OK.

After synchronizing the Documentum Content Server with Oracle Internet Directory, you must activate the Oracle Internet Directory identity plug-in in Oracle SES.

To activate the Oracle Internet Directory Identity Plug-in: 

  1. Log in to Oracle SES as the admin user.

  2. Click Global Settings.

  3. Select System, Identity Management Setup.

  4. Select Oracle Internet Directory identity plug-in manager and click Activate.

  5. Select nickname from the Authentication Attribute list.

  6. Provide the following values:

    • Host name: The host name of the computer where Oracle Internet Directory is running.

    • Port: The default LDAP port number, 389.

    • Use SSL: true or false based on your preference.

    • Realm: The Oracle Internet Directory realm, for example, dc=us,dc=oracle,dc=com.

    • User name: The Oracle Internet Directory administrative user name, for example, cn=orcladmin.

    • Password: Administrative password

Activating the AD Identity Plug-In

Before activating the AD Identity plug-in for validating the users in AD, Documentum Content Server must be synchronized with AD as an LDAP server. For synchronization, you must import users and groups from AD to Documentum.

To configure Documentum Content Server as an LDAP server: 

  1. Create an LDAP Configuration Object in DA:

    1. Log in to DA.

    2. Navigate to Administration, User Management, LDAP.

    3. Select File, New, LDAP Configuration Object.

    4. Enter a name for the LDAP configuration object.

    5. Select dm_user as User Subtype.

    6. In the Communication Mode field, select Regular.

    7. In the Import field, select Users and Groups.

    8. Select Default Configuration Object in the server field, and click Next.

    9. Provide the following values:

      Directory Type: Select Active Directory Server.

      Bind Type: Select Bind by Searching for Distinguished Name

      Binding Name: Provide the admin user name of AD. It is normally domainName/Administrator.

      Binding Password: The password of the AD admin user.

      Host Name: AD host name.

      Port: Default port number of AD, 389.

      Person Object Class: The Base Person Object. Typically the value is user.

      Person Search Base: The person search base defined in AD, for example cn=Users,dc=us,dc=oracle,dc=com.

      Person Search Filter: Enter cn=*.

      Group Object Class: The group object. Typically the value is group.

      Group Search Base: The group search base defined in AD. For example, dc=us,dc=oracle,dc=com.

      Group Search Filter: Enter cn=*.

    10. Click Next.

    11. The Attribute Map information is displayed. Click Finish.

  2. Run the LDAP_Synchronization job:

    1. Log in to DA.

    2. Navigate to Administration, Job Management, Jobs.

    3. Open the job dm_LDAPsynchronization.

    4. In the state field, select Active.

    5. Select Deactivate On Failure.

    6. In Designated Server, select the host name of Documentum Server.

    7. Select Run After Update.

    8. Click the Schedule tab.

    9. In the Start Date And Time field, set the current date and time.

    10. Select Repeat time from the Repeat list.

    11. Set the Frequency field to any numeric value.

    12. Select End Date And Time and specify how long the Synchronization job should run.

    13. Click the Method tab.

    14. Select Pass Standard Argument.

    15. Click the SysObject info tab.

    16. Click OK.

After synchronizing the Documentum Content Server with AD, you must activate the AD identity plug-in in Oracle SES.

To activate the identity plug-in: 

  1. Log in to Oracle SES as the admin user.

  2. Click Global Settings, and then select System, Identity Management Setup.

  3. Select Active Directory Identity Plug-in Manager, and click Activate.

  4. Provide the following values:

    • Authentication Attribute: Select USER_NAME.

    • Directory URL: Provide the host name and the port number. For example, ldap://ldapserverhost:port.

    • Directory account name: Provide the AD user name, for example Administrator.

    • Directory account password: AD user password.

    • Directory subscriber: Provide the directory subscriber (ldap base). For example, dc=us,dc=oracle,dc=com.

    • Directory security protocol: Specify either none or portnumber.

  5. Click Finish.
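The Directory URL and Directory subscriber values follow a regular pattern and can be derived from a host name and a DNS domain. The following is a minimal sketch with hypothetical function names; it assumes the comma-separated distinguished name form used for the search base values earlier in this section, and that the LDAP base mirrors the DNS domain, which may not hold in every directory.

```python
def directory_url(host, port=389):
    """Build a Directory URL of the form ldap://ldapserverhost:port."""
    host = host.strip()
    if not host:
        raise ValueError("host name is required")
    return f"ldap://{host}:{port}"


def subscriber_from_domain(domain):
    """Turn a DNS domain such as us.oracle.com into an LDAP base DN."""
    labels = [label for label in domain.strip().split(".") if label]
    if not labels:
        raise ValueError("domain is required")
    # Each DNS label becomes one dc component, joined with commas.
    return ",".join(f"dc={label}" for label in labels)


# Example values only.
print(directory_url("ldapserverhost"))
print(subscriber_from_domain("us.oracle.com"))
```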

Activating SunOne Identity Plug-In

Before activating the SunOne Identity plug-in for validating the users in SunOne, you must synchronize Documentum Content Server with SunOne as an LDAP server. For synchronization, you must import the users and groups from SunOne to Documentum Content Server.

To import users and groups from SunOne: 

  1. Create an LDAP Configuration Object in DA:

    1. Log in to DA.

    2. Navigate to Administration, User Management, LDAP.

    3. Select File, New, LDAP Configuration Object.

    4. Enter a name for the LDAP configuration object.

    5. Select dm_user as User Subtype.

    6. In the Communication Mode field, select Regular.

    7. In the Import field, select Users and Groups.

    8. Select Default Configuration Object in the server field, and click Next.

    9. Provide the following values:

      Directory Type: Select Netscape/iPlanet Directory Server

      Bind Type: Select Bind by Searching for Distinguished Name

      Binding Name: Provide the admin user name of SunOne. It is normally cn=Administrator.

      Binding Password: The password of the SunOne admin user.

      Host Name: SunOne host name.

      Port: Enter the port number used for SunOne. The default port number of SunOne is 389.

      Person Object Class: The Base Person Object. Typically the value is person.

      Person Search Base: The person search base defined in SunOne, for example cn=Users,dc=us,dc=oracle,dc=com.

      Person Search Filter: Enter cn=*.

      Group Object Class: The group object. Typically the value is groupOfUniqueNames.

      Group Search Base: The group search base defined in SunOne. For example, dc=us,dc=oracle,dc=com.

      Group Search Filter: Enter cn=*.

    10. Click Next.

    11. The Attribute Map information is displayed. Click Finish.

  2. Run the LDAP_Synchronization job:

    1. Log in to DA.

    2. Navigate to Administration, Job Management, Jobs.

    3. Open the job dm_LDAPsynchronization.

    4. In the state field, select Active.

    5. Select Deactivate On Failure.

    6. In Designated Server, select the host name of Documentum Server.

    7. Select Run After Update.

    8. Click the Schedule tab.

    9. In the Start Date And Time field, set the current date and time.

    10. Select Repeat time from the Repeat list.

    11. Set the Frequency field to any numeric value.

    12. Select End Date And Time and specify how long the Synchronization job should run.

    13. Click the Method tab.

    14. Select Pass Standard Argument.

    15. Click the SysObject info tab.

    16. Click OK.
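The LDAP Configuration Object is created in the same way for Oracle Internet Directory, AD, and SunOne; only a few field values differ. The following sketch collects the typical values given in this section for side-by-side comparison. The class name and the dictionary keys are hypothetical, and the values are the typical ones quoted in this section, not a complete configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LdapConfigDefaults:
    """Typical field values for one directory type."""
    directory_type: str
    binding_name_example: str
    person_object_class: str
    group_object_class: str
    port: int = 389  # default LDAP port for all three directories


DEFAULTS = {
    "oid": LdapConfigDefaults(
        "Oracle Internet Directory Server", "cn=orcladmin",
        "inetOrgPerson", "groupOfUniqueNames"),
    "ad": LdapConfigDefaults(
        "Active Directory Server", "domainName/Administrator",
        "user", "group"),
    "sunone": LdapConfigDefaults(
        "Netscape/iPlanet Directory Server", "cn=Administrator",
        "person", "groupOfUniqueNames"),
}

# Print one summary line per directory type.
for key, cfg in DEFAULTS.items():
    print(f"{key}: {cfg.directory_type}, person={cfg.person_object_class}, "
          f"group={cfg.group_object_class}, port={cfg.port}")
```

In all three cases the Bind Type is Bind by Searching for Distinguished Name, and both search filters are cn=*.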

After synchronizing the Documentum Content Server with SunOne, you must activate the SunOne identity plug-in in Oracle SES.

To activate the identity for the SunOne plug-in: 

  1. Log in to Oracle SES as the administrative user.

  2. Click Global Settings, and then select System, Identity Management Setup.

  3. Select Sun Java System Directory Server Manager, and click Activate.

  4. Provide the following values:

    • Authentication Attribute: Select USER_NAME.

    • Directory URL: Provide the host name and the port number. For example, ldap://ldapserverhost:port.

    • Directory account name: Provide the Directory Server user name, for example Administrator.

    • Directory account password: Directory Server user password.

    • Directory subscriber: Provide the directory subscriber (ldap base). For example, dc=us,dc=oracle,dc=com.

    • Directory security protocol: Specify either none or portnumber.

  5. Click Finish.

Creating an EMC Documentum Content Server Source

Create an EMC Documentum Content Server source on the Home - Sources page. Select EMC Documentum Content Server from the Source Type list, and click Create. Enter values for the following parameters:


A XML Connector Examples and Schemas

This appendix contains examples and schemas associated with the Oracle SES XML connector framework. It contains the following topics:


Apache Software

This program contains code from the Apache Software Foundation ("Apache"). Under the terms of the Apache license, Oracle is required to provide the following notices. Note, however, that the Oracle program license that accompanied this product determines your right to use the Oracle program, including the Apache software, and the terms contained in the following notices do not change those rights. Notwithstanding anything to the contrary in the Oracle program license, the Apache software is provided by Oracle "AS IS" and without any warranty or support of any kind from Oracle or Apache.

                  Apache License
             Version 2.0, January 2004
           http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
 
      "License" shall mean the terms and conditions for use, reproduction,
      and distribution as defined by Sections 1 through 9 of this document.
 
      "Licensor" shall mean the copyright owner or entity authorized by
      the copyright owner that is granting the License.
 
      "Legal Entity" shall mean the union of the acting entity and all
      other entities that control, are controlled by, or are under common
      control with that entity. For the purposes of this definition,
      "control" means (i) the power, direct or indirect, to cause the
      direction or management of such entity, whether by contract or
      otherwise, or (ii) ownership of fifty percent (50%) or more of the
      outstanding shares, or (iii) beneficial ownership of such entity.
 
      "You" (or "Your") shall mean an individual or Legal Entity
      exercising permissions granted by this License.
 
      "Source" form shall mean the preferred form for making modifications,
      including but not limited to software source code, documentation
      source, and configuration files.
 
      "Object" form shall mean any form resulting from mechanical
      transformation or translation of a Source form, including but
      not limited to compiled object code, generated documentation,
      and conversions to other media types.
 
      "Work" shall mean the work of authorship, whether in Source or
      Object form, made available under the License, as indicated by a
      copyright notice that is included in or attached to the work
      (an example is provided in the Appendix below).
 
      "Derivative Works" shall mean any work, whether in Source or Object
      form, that is based on (or derived from) the Work and for which the
      editorial revisions, annotations, elaborations, or other modifications
      represent, as a whole, an original work of authorship. For the purposes
      of this License, Derivative Works shall not include works that remain
      separable from, or merely link (or bind by name) to the interfaces of,
      the Work and Derivative Works thereof.
 
      "Contribution" shall mean any work of authorship, including
      the original version of the Work and any modifications or additions
      to that Work or Derivative Works thereof, that is intentionally
      submitted to Licensor for inclusion in the Work by the copyright owner
      or by an individual or Legal Entity authorized to submit on behalf of
      the copyright owner. For the purposes of this definition, "submitted"
      means any form of electronic, verbal, or written communication sent
      to the Licensor or its representatives, including but not limited to
      communication on electronic mailing lists, source code control systems,
      and issue tracking systems that are managed by, or on behalf of, the
      Licensor for the purpose of discussing and improving the Work, but
      excluding communication that is conspicuously marked or otherwise
      designated in writing by the copyright owner as "Not a Contribution."
 
      "Contributor" shall mean Licensor and any individual or Legal Entity
      on behalf of whom a Contribution has been received by Licensor and
      subsequently incorporated within the Work.
 
   2. Grant of Copyright License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      copyright license to reproduce, prepare Derivative Works of,
      publicly display, publicly perform, sublicense, and distribute the
      Work and such Derivative Works in Source or Object form.
 
   3. Grant of Patent License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      (except as stated in this section) patent license to make, have made,
      use, offer to sell, sell, import, and otherwise transfer the Work,
      where such license applies only to those patent claims licensable
      by such Contributor that are necessarily infringed by their
      Contribution(s) alone or by combination of their Contribution(s)
      with the Work to which such Contribution(s) was submitted. If You
      institute patent litigation against any entity (including a
      cross-claim or counterclaim in a lawsuit) alleging that the Work
      or a Contribution incorporated within the Work constitutes direct
      or contributory patent infringement, then any patent licenses
      granted to You under this License for that Work shall terminate
      as of the date such litigation is filed.
 
   4. Redistribution. You may reproduce and distribute copies of the
      Work or Derivative Works thereof in any medium, with or without
      modifications, and in Source or Object form, provided that You
      meet the following conditions:
 
      (a) You must give any other recipients of the Work or
          Derivative Works a copy of this License; and
 
      (b) You must cause any modified files to carry prominent notices
          stating that You changed the files; and
 
      (c) You must retain, in the Source form of any Derivative Works
          that You distribute, all copyright, patent, trademark, and
          attribution notices from the Source form of the Work,
          excluding those notices that do not pertain to any part of
          the Derivative Works; and
 
      (d) If the Work includes a "NOTICE" text file as part of its
          distribution, then any Derivative Works that You distribute must
          include a readable copy of the attribution notices contained
          within such NOTICE file, excluding those notices that do not
          pertain to any part of the Derivative Works, in at least one
          of the following places: within a NOTICE text file distributed
          as part of the Derivative Works; within the Source form or
          documentation, if provided along with the Derivative Works; or,
          within a display generated by the Derivative Works, if and
          wherever such third-party notices normally appear. The contents
          of the NOTICE file are for informational purposes only and
          do not modify the License. You may add Your own attribution
          notices within Derivative Works that You distribute, alongside
          or as an addendum to the NOTICE text from the Work, provided
          that such additional attribution notices cannot be construed
          as modifying the License.
 
      You may add Your own copyright statement to Your modifications and
      may provide additional or different license terms and conditions
      for use, reproduction, or distribution of Your modifications, or
      for any such Derivative Works as a whole, provided Your use,
      reproduction, and distribution of the Work otherwise complies with
      the conditions stated in this License.
 
   5. Submission of Contributions. Unless You explicitly state otherwise,
      any Contribution intentionally submitted for inclusion in the Work
      by You to the Licensor shall be under the terms and conditions of
      this License, without any additional terms or conditions.
      Notwithstanding the above, nothing herein shall supersede or modify
      the terms of any separate license agreement you may have executed
      with Licensor regarding such Contributions.
 
   6. Trademarks. This License does not grant permission to use the trade
      names, trademarks, service marks, or product names of the Licensor,
      except as required for reasonable and customary use in describing the
      origin of the Work and reproducing the content of the NOTICE file.
 
   7. Disclaimer of Warranty. Unless required by applicable law or
      agreed to in writing, Licensor provides the Work (and each
      Contributor provides its Contributions) on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
      implied, including, without limitation, any warranties or conditions
      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
      PARTICULAR PURPOSE. You are solely responsible for determining the
      appropriateness of using or redistributing the Work and assume any
      risks associated with Your exercise of permissions under this License.
 
   8. Limitation of Liability. In no event and under no legal theory,
      whether in tort (including negligence), contract, or otherwise,
      unless required by applicable law (such as deliberate and grossly
      negligent acts) or agreed to in writing, shall any Contributor be
      liable to You for damages, including any direct, indirect, special,
      incidental, or consequential damages of any character arising as a
      result of this License or out of the use or inability to use the
      Work (including but not limited to damages for loss of goodwill,
      work stoppage, computer failure or malfunction, or any and all
      other commercial damages or losses), even if such Contributor
      has been advised of the possibility of such damages.
 
   9. Accepting Warranty or Additional Liability. While redistributing
      the Work or Derivative Works thereof, You may choose to offer,
      and charge a fee for, acceptance of support, warranty, indemnity,
      or other liability obligations and/or rights consistent with this
      License. However, in accepting such obligations, You may act only
      on Your own behalf and on Your sole responsibility, not on behalf
      of any other Contributor, and only if You agree to indemnify,
      defend, and hold each Contributor harmless for any liability
      incurred by, or claims asserted against, such Contributor by reason
      of your accepting any such warranty or additional liability.
 
   END OF TERMS AND CONDITIONS

Overview of Crawler Settings

This section describes crawler settings and other mechanisms that control the scope of Web crawling.


See Also:

"Tuning Crawl Performance" for more detailed information on these settings and other issues affecting crawl performance

URL Boundary Rules

URL boundary rules limit the crawling space. When boundary rules are added, the crawler is restricted to URLs that match the indicated rules. The order in which rules are specified has no impact, but exclusion rules always override inclusion rules.

Boundary rules are set on the Home - Sources - Boundary Rules page.

Inclusion Rules

Specify an inclusion rule that requires a URL to contain, start with, or end with a term. Use an asterisk (*) to represent a wildcard, as in www.*.example.com. Simple inclusion rules are case-insensitive. For case-sensitivity, use regular expression rules.

An inclusion rule ending with example.com limits the search to URLs ending with the string example.com. Anything ending with example.com is crawled, but http://www.example.com.tw is not crawled.

If the URL Submission functionality is enabled on the Global Settings - Query Configuration page, then URLs that are submitted by end users are added to the inclusion rules list. You can delete URLs that you do not want to index.

Oracle SES supports the regular expression syntax used by the Pattern class (java.util.regex.Pattern) in Java JDK 1.4.2. Regular expression rules use special characters. The following is a summary of some basic regular expression constructs.

  • A caret (^) denotes the beginning of a URL and a dollar sign ($) denotes the end of a URL.

  • A period (.) matches any one character.

  • A question mark (?) matches zero or one occurrence of the character that it follows.

  • An asterisk (*) matches zero or more occurrences of the pattern that it follows. You can use an asterisk in the starts with, ends with, and contains rules.

  • A backslash (\) escapes any special characters, such as periods (\.), question marks (\?), or asterisks (\*).
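The constructs listed above can be tried out directly against java.util.regex.Pattern. The following is a minimal sketch, not the crawler's own code: the class name, the helper method, and the sample rule are hypothetical, and it assumes that a rule matches when the pattern is found anywhere in the URL, which is why the rule is anchored with ^ and $.

```java
import java.util.regex.Pattern;

final class BoundaryRegexSketch {

    // Assumption: a rule matches when the pattern is found anywhere in the URL,
    // so ^ and $ are used to anchor it to the whole URL.
    static boolean ruleMatches(String regex, String url) {
        return Pattern.compile(regex).matcher(url).find();
    }

    public static void main(String[] args) {
        // ^ and $ anchor the URL, s? makes the "s" optional, \. is a literal
        // period, and .* matches any run of characters.
        String rule = "^https?://www\\.example\\.com/.*\\.html$";

        String[] urls = {
            "http://www.example.com/docs/index.html",
            "https://www.example.com/a.html",
            "http://www.example.com.tw/a.html",
            "http://www.example.com/docs/index.htm"
        };
        for (String url : urls) {
            System.out.println(url + " -> " + ruleMatches(rule, url));
        }
    }
}
```

With this rule, the first two URLs match and the last two do not: the escaped period and the slash after example\.com rule out www.example.com.tw, and the $ anchor rules out the .htm suffix.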


See Also:

http://www.oracle.com/technetwork/java/index.html for a complete description in the Java documentation

Index Dynamic Pages

By default, Oracle SES processes dynamic pages. Dynamic pages are generally served from a database application and have a URL that contains a question mark (?). Oracle SES identifies URLs with question marks as dynamic pages.

Some dynamic pages appear as multiple search results for the same page, and you might not want them all indexed. Other dynamic pages are each different and must be indexed. You must distinguish between these two kinds of dynamic pages. In general, dynamic pages that differ only in menu expansion, with no change to their contents, should not be indexed.

Consider the following three URLs:

http://example.com/aboutit/network/npe/standards/naming_convention.html
 
http://example.com/aboutit/network/npe/standards/naming_convention.html?nsdnv=14z1
 
http://example.com/aboutit/network/npe/standards/naming_convention.html?nsdnv=14

The question marks (?) in the last two URLs indicate that the remainder of each string consists of input parameters. The three results are essentially the same page with different side menu expansion. Ideally, the search yields only one result:

http://example.com/aboutit/network/npe/standards/naming_convention.html
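To see why the three URLs count as one page, it can help to compare them with their input parameters removed. The following is an illustrative sketch only: the class and method names are hypothetical, and it does not describe how Oracle SES itself treats dynamic pages beyond the question-mark test stated above.

```java
import java.util.LinkedHashSet;
import java.util.Set;

final class DynamicUrlSketch {

    // The text identifies a dynamic page by a question mark in its URL.
    static boolean isDynamic(String url) {
        return url.indexOf('?') >= 0;
    }

    // Hypothetical helper: drops everything from the first question mark onward.
    static String withoutParameters(String url) {
        int q = url.indexOf('?');
        return q < 0 ? url : url.substring(0, q);
    }

    public static void main(String[] args) {
        String base = "http://example.com/aboutit/network/npe/standards/naming_convention.html";
        String[] urls = { base, base + "?nsdnv=14z1", base + "?nsdnv=14" };

        // A set keeps one entry per distinct page once parameters are ignored.
        Set<String> distinct = new LinkedHashSet<>();
        for (String url : urls) {
            distinct.add(withoutParameters(url));
        }
        System.out.println(distinct.size() + " distinct page(s): " + distinct);
    }
}
```

This comparison is only appropriate for pages like the ones above, where the parameters change menu expansion and nothing else. Dynamic pages whose parameters select different content must stay distinct.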

Note:

The crawler cannot crawl and index dynamic Web pages written in JavaScript.

Set the dynamic pages parameter on the Home - Sources - Crawling Parameters page.

Title Fallback

You can override a default document title with a meaningful title if the default title is irrelevant. For example, suppose that the result list shows numerous documents with the title "Daily Memo". The documents had been created with the same template file, but the document properties had not been changed. Overriding this title in Oracle SES can help users better understand their search results.

Title fallback can be used for any source type. Oracle SES uses different logic for each document type to determine which fallback title to use. For example, for HTML documents, Oracle SES looks for the first heading, such as <h1>. For Microsoft Word documents, Oracle SES looks for text with the largest font.

If the default title was collected in the initial crawl, then the fallback title is only used after the document is reindexed during a re-crawl. Thus, if there is no change to the document, then you must force the change by setting the re-crawl policy to Process All Documents on the Home - Schedules - Edit Schedule page.

To implement title fallback, modify the crawlerSettings object using the Administration API. Set the <search:indexNullTitleFallback> element to indexForAll, and list the bad titles in the <search:badTitles> elements. See the Oracle Secure Enterprise Search Administration API Guide.

Title fallback is not currently supported in the Oracle SES Administration GUI, and by default, it is turned off.

Special considerations with title fallback 

  • With Microsoft Office documents:

    • Font sizes 14 and 16 in Microsoft Word correspond to normalized font sizes 4 and 5 (respectively) in converted HTML. The Oracle SES crawler only picks up strings with normalized font size greater than 4 as the fallback title.

    • Titles must contain more than five characters.

  • When a title is null, Oracle SES automatically indexes the fallback title for all binary documents (for example, .doc, .ppt, .pdf).

    For HTML and text documents, Oracle SES does not automatically index the fallback title. Thus, the replaced title on HTML or text documents cannot be searched with the title attribute on the Advanced Search page. To turn on indexing for HTML and text documents, modify the crawlerSettings object using the Administration API. Set the <search:indexNullTitleFallback> parameter to indexForAll.

Character Set Detection

This feature enables the crawler to automatically detect character set information for HTML, plain text, and XML files. Character set detection allows the crawler to properly cache files during crawls, index text, and display files for queries. This is important when crawling multibyte files (such as files in Japanese or Chinese).

To enable character set detection, update the crawlerSettings object using the Administration API. Set the <search:charsetDetection> parameter to true. See the Oracle Secure Enterprise Search Administration API Guide for more information about changing crawler settings.

Language Detection

With multibyte files, turn on character set detection and also set the Default Language parameter. For example, if the files are all in Japanese, select Japanese as the default language for that source. The crawler uses the default language only when automatic language detection is disabled or when it cannot determine the document language during crawling.

If your files are in multiple languages, then turn on the Enable Language Detection parameter. Not all documents retrieved by the crawler specify the language. For documents with no language specification, the crawler attempts to automatically detect the language. The language recognizer is trained statistically using trigram data from documents in various languages (for instance, Danish, Dutch, English, French, German, Italian, Portuguese, and Spanish). It starts with the hypothesis that the given document does not belong to any language and ultimately refutes this hypothesis for a particular language where possible. It operates on the Latin-1 alphabet and on any language with a deterministic Unicode range of characters (such as Chinese, Japanese, and Korean).

The crawler determines the language code by checking the HTTP header content-language or the LANGUAGE column, if it is a table source. If it cannot determine the language, then it takes the following steps:

The Default Language and the Enable Language Detection parameters are on the Global Settings - Crawler Configuration page (globally) and also the Home - Sources - Crawling Parameters page (for each source).
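The order of precedence described in this section (a declared language first, then automatic detection when it is enabled, then the default language) can be summarized in a few lines. The following is a sketch of that reading of the text, not the crawler's implementation. The class, method, and parameter names are hypothetical.

```java
final class LanguageFallbackSketch {

    // declared: language taken from the content-language header or LANGUAGE column, or null
    // detected: result of automatic detection, or null when the recognizer cannot decide
    static String resolve(String declared, boolean detectionEnabled,
                          String detected, String defaultLanguage) {
        if (declared != null && !declared.isEmpty()) {
            return declared;
        }
        if (detectionEnabled && detected != null && !detected.isEmpty()) {
            return detected;
        }
        // Detection is disabled or inconclusive: fall back to Default Language.
        return defaultLanguage;
    }

    public static void main(String[] args) {
        System.out.println(resolve("ja", true, "zh", "en"));   // declared language wins
        System.out.println(resolve(null, true, "fr", "en"));   // detection fills the gap
        System.out.println(resolve(null, false, "fr", "en"));  // detection disabled
        System.out.println(resolve(null, true, null, "en"));   // detection inconclusive
    }
}
```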


Note:

For file sources, the individual source setting for Enable Language Detection remains false regardless of the global setting. In most cases, the language for a file source should be the same, and set from, the Default Language setting.


Tuning Search Performance and Scalability

Oracle SES contains features that you can tune to optimize search performance. This section contains suggestions on how to improve performance (such as response time and throughput) and scalability of Oracle SES. It identifies the most common ways to improve search quality.

Parallel Query and Index Partitioning

Parallel querying significantly improves search performance and facilitates searches of very large data sources. The query architecture is based on Oracle Database partitioning and enhancements in Oracle Text.

To make the best use of this feature, Oracle recommends that you run Oracle SES on a server with a 4-core CPU, at least 8 GB of RAM, and multiple fast disk drives.

Parallel querying is automatically implemented on Oracle SES when the partitioning option is enabled. You can specify partitioning only during installation.

To enable partitioning: 

  1. Acquire a license for the Oracle Partitioning option.

  2. During installation, answer Yes when the Repository Creation Utility (RCU) asks if you have a partitioning license. Then Oracle Database is installed with partitioning, and Oracle SES automatically supports parallel query.

Index Fragmentation

Index fragmentation management allows the search engine index to be updated while Oracle SES is executing searches. This is achieved by temporarily saving index changes to an in-memory index and periodically merging them with the larger disk-based search engine index. This reduces fragmentation and leads to faster response times. Index fragmentation management is implemented automatically on Oracle SES, but it can be tuned by configuring Oracle Text, where you can turn index fragmentation management on and off, and specify the frequency of index merges.

Optimizing the index also reduces fragmentation, and it can significantly increase the speed of searches. Schedule index optimization on a regular basis. Also, optimize the index after the crawler has made substantial updates or if fragmentation is more than 50%. Verify that index optimization is scheduled during off-peak hours. Optimization of a very large index could take several hours.

You can see the fragmentation level and run index optimization on the Global Settings - Index Optimization page in the Oracle SES Administration GUI. Index optimization has these options:

Do Not Run Optimization Longer Than

Specify a maximum duration for the index optimization process. The actual time taken for optimization does not exceed this limit, but it can be shorter. A longer optimization time results in a more optimized index. In this mode, the optimization process does not require a large amount of free disk space.

Until the Optimization is Finished

Specifies that the optimization continues until it is finished. Allowing the optimization to complete creates a more compact index and supports better performance than a partial optimization.

In this mode, Oracle SES creates a temporary copy of the index. The required disk space almost equals the current index size. If sufficient free disk space is not available, then the optimization fails. Use the appropriate SQL query shown here to estimate the minimum disk requirement:

  • Oracle SES Without Partitioning

    SELECT SUM(bytes)/1048576 AS "MBytes" 
       FROM dba_segments 
       WHERE segment_name IN ('DR$EQ$DOC_PATH_IDX$I','DR$EQ$DOC_PATH_IDX$X'); 
    
  • Oracle SES With Partitioning

    SELECT SUM(sz) AS "MBytes" 
       FROM 
       ( 
          SELECT MAX(bytes)/1048576 sz FROM dba_segments 
             WHERE segment_name LIKE 'DR#EQ$DOC_PATH_IDX$%I' 
       UNION 
          SELECT MAX(bytes)/1048576 sz FROM dba_segments 
             WHERE segment_name LIKE 'DR#EQ$DOC_PATH_IDX$%X' 
       ) ; 
    

These queries return an estimate of the minimum disk space needed for optimization. Oracle SES may require more disk space than this estimate.

After the optimization is complete, Oracle SES releases the disk space consumed during the optimization. The space can be used by future crawls or any activity that consumes disk space.

Search Statistics

See the Home - Statistics page in the Oracle SES Administration GUI for lists of the most popular queries, failed queries, and ineffective queries. This information can lead to the following actions:

  • Refer users to a particular Web site for failed queries on the Search - Suggested Links page.

  • Fix common errors that users make in searching on the Search - Alternate Words page.

  • Make important documents easier to find on the Search - Relevancy Boosting page.

Once daily, Oracle SES automatically summarizes logged queries. If there are a large number of logged queries, the summarizing task can consume enough server resources to affect query performance. This issue shows up in stress tests where several queries are executed every second. In such instances, the ideal solution is to disable the query statistics option.

To disable the query statistics option: 

  1. From the Administration GUI Home page, select the Global Settings tab, then click Query Configuration.

  2. Under Query Statistics, select No for the Enable Query Statistics option.

WebLogic Search Server Configuration

Oracle SES is installed in a WebLogic domain as described in "Secure Search in Oracle Fusion Applications". The default settings for stuck threads can result in slow query performance even under a moderate load.

To change the search server configuration:

  1. Log in to the WebLogic console, as described in "Accessing the Oracle WebLogic Server Administration Console".

  2. In the left panel under Change Center, click Lock & Edit.

  3. In the left panel under Domain Structure, expand Environment and click Servers. The Summary of Servers page is displayed in the main panel.

  4. In the Name column, click search_server1. The Settings for search_server1 page is displayed.

  5. Select the Configuration tab.

  6. Configure these settings:

    • Stuck Thread Max Time: 3600

    • Stuck Thread Timer Interval: 1800

  7. Click Save.

  8. Repeat these steps for any other search server instances, such as search_server2.

  9. In the left panel under Change Center, click Activate Changes.

Database Initialization Parameters

To support a large number of simultaneous users, you may need to increase the values of these database initialization parameters:

  • PROCESSES

  • SESSIONS

  • OPEN_CURSORS

In Fusion Applications, the Oracle SES middle tier uses connection pooling to communicate with the backend database. The database connection uses dedicated server mode, so that when 10 users run concurrent searches, the database requires 10 user processes.

The crawler also uses several threads, and each thread uses several database connections. You can alter the number of crawler threads on the Home - Sources - Crawling Parameters page of the Oracle SES Administration GUI.

Use the combined estimate of concurrent user processes and crawler threads for the value of PROCESSES. Then modify SESSIONS to a compatible value, typically calculated as 1.1 * PROCESSES.
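The arithmetic behind this estimate can be worked through as follows. This is a rough sketch under stated assumptions: the class and method names are hypothetical, the number of connections per crawler thread is an input you must estimate yourself (the text says only "several"), and the result does not include the database's own background processes, so leave additional headroom.

```java
final class ProcessEstimateSketch {

    // One dedicated server process per concurrent search user, plus the
    // connections opened by the crawler threads. connectionsPerThread is an
    // assumed figure, not a documented constant.
    static int estimateProcesses(int concurrentUsers, int crawlerThreads,
                                 int connectionsPerThread) {
        return concurrentUsers + crawlerThreads * connectionsPerThread;
    }

    // SESSIONS is typically 1.1 * PROCESSES. Integer arithmetic rounds up
    // without floating-point error.
    static int estimateSessions(int processes) {
        return (processes * 11 + 9) / 10;
    }

    public static void main(String[] args) {
        int processes = estimateProcesses(500, 10, 3);
        int sessions = estimateSessions(processes);
        System.out.println("PROCESSES >= " + processes);
        System.out.println("SESSIONS  >= " + sessions);
    }
}
```

For 500 concurrent users and 10 crawler threads with an assumed 3 connections each, the sketch gives 530 for PROCESSES and 583 for SESSIONS.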

You can monitor the number of open cursors using the statistics stored in the V$SESSTAT dynamic performance view. If the number of open cursors for user sessions frequently approaches the maximum, then you can increase that number.



To change the database initialization parameters: 

  1. Open SQL*Plus and log in to Oracle Database as a privileged user, such as SYSTEM.

  2. For a list of all initialization parameters and their current settings, issue this SQL*Plus command:

    show parameters
    
  3. Issue ALTER SYSTEM commands, using values appropriate for your system, to change the value of the parameters. For example, this command sets PROCESSES to 800:

    ALTER SYSTEM SET processes=800 SCOPE=spfile;
    
  4. Restart Oracle Database for the new settings to take effect.

Buffer Cache

An Oracle SES search operation looks up the Oracle Text index and some internal tables to generate a hit list. To maintain the best search performance, reduce disk I/O as much as possible by keeping these objects in the buffer cache. If you have plenty of physical memory, you can enlarge the buffer cache so it can retain these objects.

The search operation accesses these database objects the most frequently:

Object Name            Partitioned Object Name          Object Type
DR$EQ$DOC_PATH_IDX$X   DR#EQ$DOC_PATH_IDX4-digit-ID$X   B-tree index
DR$EQ$DOC_PATH_IDX$R   DR#EQ$DOC_PATH_IDX4-digit-ID$R   Table
DR$EQ$DOC_PATH_IDX$I   DR#EQ$DOC_PATH_IDX4-digit-ID$I   Table

$X and $R are the most important and are typically smaller than $I. If the database has a large KEEP pool or can support one, consider putting the $X and $R objects in it to maintain good performance when accessing them. While the $I table is also important for search, it can become too large to cache in its entirety.

Check the cache hit ratio for these objects regularly in Enterprise Manager or an Automatic Workload Repository (AWR) report. Crawling and optimization can change the size of these objects.

To put a table in the KEEP pool:  

  1. Open SQL*Plus or another SQL interface and connect as a privileged user.

  2. Issue an ALTER INDEX command using this syntax, where table_name is the $R or $X table.

    ALTER INDEX table_name STORAGE(BUFFER_POOL KEEP);
    
  3. Verify the new location of the table. Enclose the name in single quotation marks, because it is compared as a string:

    SELECT buffer_pool FROM dba_indexes WHERE index_name = 'table_name';
    

Example 10-1 shows the SQL commands that put the $X file in the KEEP pool.


Integrating with Google Desktop

Oracle Secure Enterprise Search provides a GDfE plug-in to integrate with Google Desktop Enterprise Edition. You can include Google Desktop results in your Oracle SES hit list. You can also link to Oracle SES from the GDfE interface.


See Also:

Google Desktop for Enterprise Plug-in Readme at http://host:port/search/query/gdfe/gdfe_readme.html


1 Introduction to Oracle Secure Enterprise Search

This chapter describes the basic components of Oracle Secure Enterprise Search: the sources, crawler, and user interfaces.


Setting Up Oracle E-Business Suite Sources

The Oracle E-Business Suite connector uses the Oracle SES XML connector framework, where searching is based on Oracle E-Business Suite data available as XML feeds.

To activate an identity plug-in for Oracle E-Business Suite sources: 

  1. On the Global Settings page, select Identity Management Setup.

  2. Select Oracle E-Business Suite and click Activate to display the Activate Identity Plug-in page.

  3. Enter values for the parameters as described in Table 8-6. Obtain the values for these parameters from the E-Business Suite administrator.

  4. Click Finish.

To create an Oracle E-Business Suite source: 

  1. Activate an identity plug-in as described in the previous procedure.

  2. On the Home page, select the Sources secondary tab.

  3. Select Oracle E-Business Suite from the Source Type list, and click Create.

  4. Enter the source parameters as described in Table 8-7.

  5. Click Next.

  6. Click Get Parameters to obtain a list of parameters for the authorization manager plug-in.

  7. Enter the values for the authorization manager plug-in parameters as described in Table 8-8.

  8. Click Create.

After processing each data feed, the crawler uploads a status feed to the location specified in the XML configuration file specified in the Configuration URL parameter. This status feed has a name in the following format:

Table 8-8 Oracle E-Business Suite Authorization Parameters

Parameter

Value

HTTP endpoint for authorization

HTTP endpoint of E-Business Suite that provides the user authorization service.

User ID

User ID.

Password

Password for User ID.

Business Component

Name of the Oracle E-Business Suite business component being crawled. When the user logs in to Oracle SES, the values of the security attributes for which the user is authorized in the realm of this business component are retrieved to build the security filter for that user. For example, oracle.apps.fnd.fwk.search.NavigationSVO.

Security attribute values for anonymous user

Comma-delimited list of authorized values of security attributes for the anonymous user. If the parameter is left blank, then the authorization service is contacted to retrieve the values of the security attributes accessible to the anonymous user.

Display URL Prefix

HTTP host information to prefix the partial URL specified in the access URL of the documents in XML feeds to form the complete URL. This complete URL is the display URL of the document when the document link in the Oracle SES search results page is clicked.

This value must form a valid URL when concatenated with the access URL element of an item in the data feed. Be careful to avoid having either two slashes or none when the values are combined. Thus, enter a trailing slash (/) if the access URLs do not begin with a slash, or omit the trailing slash from the prefix if the access URLs begin with a slash.
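Before you enter a Display URL Prefix, you can check it against a sample access URL from the data feed. The following is a small sketch of the slash rule described above. The class name, the helper method, and the host and path values are hypothetical.

```java
final class DisplayUrlSketch {

    // True when exactly one of the two values supplies the slash at the point
    // where the prefix and the access URL are concatenated.
    static boolean hasSingleSlashAtJoin(String prefix, String accessUrl) {
        return prefix.endsWith("/") != accessUrl.startsWith("/");
    }

    public static void main(String[] args) {
        String[][] samples = {
            { "http://ebs.example.com:8000/", "OA_HTML/page.jsp" },   // one slash
            { "http://ebs.example.com:8000",  "/OA_HTML/page.jsp" },  // one slash
            { "http://ebs.example.com:8000/", "/OA_HTML/page.jsp" },  // two slashes
            { "http://ebs.example.com:8000",  "OA_HTML/page.jsp" }    // no slash
        };
        for (String[] s : samples) {
            String displayUrl = s[0] + s[1];
            System.out.println(displayUrl + " -> ok=" + hasSingleSlashAtJoin(s[0], s[1]));
        }
    }
}
```

The first two combinations produce a valid display URL. The last two show the double-slash and missing-slash cases that the parameter description warns about.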



Configuring Support for Image Metadata

The Oracle SES crawler initially is set to search only text files. You can change this behavior by configuring an image document service connector to search the metadata associated with image files. Image files can contain rich metadata that provide additional information about the image itself.

The Image Document Service connector integrates Oracle Multimedia (formerly Oracle interMedia) images with Oracle SES. This connector is separate from any specific data source.

The following table identifies the metadata formats (EXIF, IPTC, XMP, DICOM) that can be extracted from each supported image format (JPEG, TIFF, GIF, JPEG 2000, DICOM).


Metadata   JPEG   TIFF   GIF   JPEG 2000   DICOM
EXIF       Yes    Yes    No    No          No
IPTC       Yes    Yes    No    No          No
XMP        Yes    Yes    Yes   Yes         No
DICOM      No     No     No    No          Yes


See Also:

Oracle Multimedia User's Guide and Oracle Multimedia Reference for more information about image metadata

Identifying the Search Attributes for Image Metadata

Image files can contain metadata in multiple formats, but not all of it is useful when performing searches. A configuration file in Oracle SES enables you to control the metadata that is searched and published to an Oracle SES Web application.

The default configuration file is named attr-config.xml and is located at ORACLE_HOME/search/lib/plugins/doc/ordim/config/. If you upgraded from a previous release, then the default configuration file remains ordesima-sample.xml.

You can modify this file. Oracle recommends that you create a copy of the default configuration file before editing it. The configuration file must conform to the XML schema ORACLE_HOME/search/lib/plugins/doc/ordim/xsd/ordesima.xsd.

Oracle SES indexes and searches only those image metadata tags that are defined within the metadata element (between <metadata>...</metadata>) in the configuration file. By default, the configuration file contains a set of the most commonly searched metadata tags for each of the file formats. You can add other metatags to the file based on your specific requirements.

Image files can contain metadata in multiple formats. For example, an image can contain metadata in the EXIF, XMP, and IPTC formats. DICOM images are an exception: they contain only DICOM metadata. For the IPTC and EXIF formats, Oracle Multimedia defines its own image metadata schemas. The metadata defined in the configuration file must conform to the schemas defined by Oracle Multimedia.

Because different metadata formats use different tags to refer to the same attribute, it is necessary to map metatags and the search attributes they define. Table 3-1 lists some commonly used metatags and how they are mapped in Oracle SES.

Oracle SES provides this mapping in the configuration file attr-config.xml. You can edit the file to add other metatags. The configuration file defines the display name of a metatag and how it is mapped to the corresponding metadata in each of the supported formats.

This is done using the <searchAttribute> tag, as shown in the example below:

<searchAttribute>
 <displayName>Author</displayName>
 <metadata>
   <value format="iptc">byline/author</value>
   <value format="exif">TiffIfd/Artist</value>
   <value format="xmp">dc:creator</value>
   <value format="xmp">tiff:Artist</value>
 </metadata>
</searchAttribute>

For each search attribute, the value of <displayName> is an Oracle SES attribute name that is displayed in the Oracle SES web application when an Advanced Search - Attribute Selection is performed. If any of the listed attributes are detected during a crawl, then Oracle SES automatically publishes the attributes to the SES web application.

For the <value> element, the format attribute must take the value of a supported format, such as iptc, exif, xmp, or dicom.

The value defined within the element, for example, byline/author, is the XML path when the image format is IPTC, EXIF, or XMP. For DICOM, this value must be the standard tag number or value locator.

For IPTC and EXIF formats, the XML path must conform to the metadata schemas defined by Oracle Multimedia. These schemas are defined in the files ordexif.xsd and ordiptc.xsd located at ORACLE_HOME/search/lib/plugins/doc/ordim/xsd/.

You do not need to specify the root elements defined in these schemas (iptcMetadata, exifMetadata) in the configuration file. For example, you can specify byline/author as the xmlPath value of the author attribute in IPTC format. Oracle Multimedia does not define XML schemas for XMP metadata, so refer to the Adobe XMP specification for the xmlPath value.
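After editing a mapping, it can be useful to list which format and path pairs a <searchAttribute> element declares. The following is a minimal sketch that uses the standard Java XML parser on the Author fragment shown earlier. The class and method names are hypothetical, and the sketch parses only the fragment. It makes no assumption about the root element or the rest of the real configuration file, and it does not validate against ordesima.xsd.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class SearchAttributeSketch {

    // The Author mapping from the example above, as a standalone fragment.
    static final String SAMPLE =
          "<searchAttribute>"
        + "<displayName>Author</displayName>"
        + "<metadata>"
        + "<value format=\"iptc\">byline/author</value>"
        + "<value format=\"exif\">TiffIfd/Artist</value>"
        + "<value format=\"xmp\">dc:creator</value>"
        + "<value format=\"xmp\">tiff:Artist</value>"
        + "</metadata>"
        + "</searchAttribute>";

    // Returns one "format=path" entry per <value> element, in document order.
    static List<String> mappings(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList values = doc.getElementsByTagName("value");
            List<String> result = new ArrayList<>();
            for (int i = 0; i < values.getLength(); i++) {
                Element value = (Element) values.item(i);
                result.add(value.getAttribute("format") + "=" + value.getTextContent().trim());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Fragment is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        for (String mapping : mappings(SAMPLE)) {
            System.out.println(mapping);
        }
    }
}
```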

Within the <searchAttribute> tag, you can also specify an optional <dataType> tag if the attribute carries a date or numeric value. For example:

<searchAttribute>
     <displayName>AnDateAttribute</displayName>
     <dataType>date</dataType>
     <metadata>
        ...
     </metadata>
</searchAttribute>
   

The default data type is string, so you do not have to explicitly specify a string.

Supporting XMP Metadata

Oracle SES supports both standard and custom XMP metadata searches. Because all XMP properties share the same parent elements <rdf:rdf><rdf:description>, you must specify only the real property schema and property name in the configuration file. For example, specify photoshop:category instead of rdf:rdf/rdf:description/photoshop:category. The same rule applies to XMP custom metadata also. However, for XMP structure data, you must specify the structure element in the format parent/child 1/child 2/…child N, where child N is a leaf node. For example, Iptc4xmpCore:CreatorContactInfo/Iptc4xmpCore:CiPerson. Note that the image plug-in does not validate the metadata value for XMP metadata.

XMP metatags consist of two components separated by a colon (:). An example is photoshop:Creator, which corresponds to the Author attribute (see Table 3-1). In this example, photoshop refers to the XMP schema namespace. Other common namespaces include dc, tiff, and Iptc4xmpCore.

Before defining any XMP metadata in the configuration file, you must ensure that the namespace is defined. For example, before defining the metadata photoshop:Creator, you must include the namespace photoshop in the configuration file. This rule applies to both the standard and custom XMP metadata namespaces. As a best practice, Oracle recommends that you define all the namespaces at the beginning of the configuration file. If the namespace defined in the configuration file is different from the one in the image, then Oracle SES cannot find the attributes associated with this namespace. You can define namespaces as shown:

<xmpNamespaces>
<namespace prefix="Iptc4xmpCore">http://iptc.org/std/Iptc4xmpCore/1.0/xmlns/</namespace>
<namespace prefix="dc">http://purl.org/dc/elements/1.1/</namespace>
<namespace prefix="photoshop">http://ns.adobe.com/photoshop/1.0/</namespace>
<namespace prefix="xmpRights">http://ns.adobe.com/xap/1.0/rights/</namespace>
<namespace prefix="tiff">http://ns.adobe.com/tiff/1.0/</namespace>
</xmpNamespaces>

The Adobe XMP Specification requires that XMP namespaces end with a slash (/) or hash (#) character.
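The namespace rules above can be checked before a configuration file is deployed. The following is a minimal sketch, not part of Oracle SES: it assumes the xmpNamespaces fragment is available as a standalone XML string, and the function names check_xmp_namespaces and undeclared_prefixes are hypothetical.

```python
import xml.etree.ElementTree as ET


def check_xmp_namespaces(xml_text):
    """Return {prefix: uri}; raise ValueError if a URI lacks a trailing / or #."""
    root = ET.fromstring(xml_text)
    namespaces = {}
    for node in root.iter("namespace"):
        prefix = node.get("prefix")
        uri = (node.text or "").strip()
        if not uri.endswith(("/", "#")):
            raise ValueError(f"namespace {prefix!r} must end with '/' or '#': {uri!r}")
        namespaces[prefix] = uri
    return namespaces


def undeclared_prefixes(properties, namespaces):
    """List prefixes used by XMP properties that have no namespace declaration.

    Structure paths such as parent/child are split so that the prefix of
    every path segment is checked.
    """
    used = set()
    for prop in properties:
        for segment in prop.split("/"):
            used.add(segment.split(":", 1)[0])
    return sorted(used - set(namespaces))


SAMPLE = """<xmpNamespaces>
<namespace prefix="dc">http://purl.org/dc/elements/1.1/</namespace>
<namespace prefix="photoshop">http://ns.adobe.com/photoshop/1.0/</namespace>
</xmpNamespaces>"""
```

Calling undeclared_prefixes with the properties you intend to map shows which namespace declarations are still missing from the file.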


See Also:

Adobe Extensible Metadata Platform (XMP) Specification for the XMP metadata schema and a list of standard XMP namespace values.

http://partners.adobe.com/public/developer/en/xmp/sdk/XMPspecification.pdf


Custom XMP metadata must be explicitly added to attr-config.xml. The following is an example of custom metadata:

<xmpNamespaces>
  <namespace prefix="hm">http://www.oracle.com/ordim/hm/</namespace>
</xmpNamespaces>
<searchAttribute>
  <displayName>CardTitle</displayName>
  <metadata>
    <value format="xmp">hm:cardtitle</value>
  </metadata>
</searchAttribute>

Supporting DICOM Metatags

Oracle SES 11g supports DICOM metatags, and these metatags are available in the default configuration file attr-config.xml.

DICOM metatags are either DICOM standard tags or DICOM value locators.

DICOM Value Locators

Value locators identify an attribute in the DICOM content, either at the root level or from the root level down.

A value locator contains one or more sublocators and a tag field (optional). A typical value locator is of the format:

sublocator#tag_field

Or of the format:

sublocator

Each sublocator represents a level in the tree hierarchy. DICOM value locators can include multiple sublocators, depending on the level of the attribute in the DICOM hierarchy. Multiple sublocators are separated by the dot character (.). For example, value locators can be of the format:

sublocator1.sublocator2.sublocator3#tag_field

Or of the format:

sublocator1.sublocator2.sublocator3

A tag_field is an optional string that identifies a derived value within an attribute. A tag that contains this string must be the last tag of a DICOM value locator. The default is NONE.

A sublocator consists of a tag element and can contain other optional elements. These optional elements include definer and item_num. Thus, a sublocator can be of the format:

tag

Or it can be of the format

tag(definer)[item_num]

The following example shows how to add a value locator to the attr-config.xml file:

<searchAttribute>
  <displayName>PatientFamilyName</displayName>
  <metadata>
  <value format="dicom">00100010#UnibyteFamily</value>       
  </metadata>
</searchAttribute>

where UnibyteFamily is a tag_field of person name.

The following example shows how to define a value locator from the root level.

<searchAttribute>
      <displayName>AdmittingDiagnosisCode</displayName>
      <metadata>
        <value format="dicom">00081084.00080100</value>       
      </metadata>
</searchAttribute>
<searchAttribute>
      <displayName>AdmittingDiagnosis</displayName>
      <metadata>
        <value format="dicom">00081084.00080104</value>
      </metadata>
</searchAttribute>

In the above example, the tag 00081084 represents the root tag Admitting Diagnoses Code Sequence. This tag includes four child tags: code value (0008, 0100), coding scheme designator (0008, 0102), coding scheme version (0008, 0103) and code meaning (0008, 0104). In this example, the value locators are code value: 00081084.00080100 and code meaning: 00081084.00080104.
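The value locator syntax described in this section can be illustrated with a small parser. This is a sketch under stated assumptions, not the parser used by the image connector: it assumes that a tag is eight hexadecimal digits, that a definer contains no dots or parentheses, that item_num is an integer, and that definer and item_num are each independently optional. The name parse_value_locator is hypothetical.

```python
import re

# One sublocator: tag, optionally followed by (definer) and [item_num].
_SUBLOCATOR = re.compile(
    r"^(?P<tag>[0-9A-Fa-f]{8})"
    r"(?:\((?P<definer>[^)]*)\))?"
    r"(?:\[(?P<item_num>\d+)\])?$"
)


def parse_value_locator(locator):
    """Split a value locator into its sublocators and its tag_field."""
    path, separator, tag_field = locator.partition("#")
    sublocators = []
    for part in path.split("."):
        match = _SUBLOCATOR.match(part)
        if match is None:
            raise ValueError(f"malformed sublocator: {part!r}")
        sublocators.append(match.groupdict())
    # The tag_field is optional; the documented default is NONE.
    return {
        "sublocators": sublocators,
        "tag_field": tag_field if separator else "NONE",
    }
```

For 00081084.00080100 the parser reports two sublocators and the default tag_field, and for 00100010#UnibyteFamily it reports one sublocator with the tag_field UnibyteFamily.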


Note:

The image connector does not support SQ, UN, OW, OB, and OF data type value locators. Therefore, ensure that the last sublocator of a value locator does not specify such data types.


See Also:

Oracle Multimedia DICOM Developer's Guide for more information about DICOM value locators

Creating an Image Document Service Connector

A default Image Document Service connector instance is created during the installation of Oracle SES. You can configure the default connector or create a new one.

To create an Image Document Service instance: 

  1. In the Oracle SES Administration GUI, click Global Settings.

  2. Under Sources, click Document Services to display the Global Settings - Document Services page.

  3. To configure the default image service instance:

    1. Click Expand All

    2. Click Edit for the default image service instance.

    or

    To create a new image service instance:

    1. Click Create to display the Create Document Service page.

    2. For Select From Available Managers, choose Secure Enterprise Search Image Document Service and click Next.

    3. Provide a name for the instance.

  4. Provide a value for the attributes configuration file parameter.

    The default value of attributes configuration file is attr-config.xml. The file is located at ORACLE_HOME/search/lib/plugins/doc/ordim/config/.

  5. Click Apply.

  6. Click Document Services in the locator links to return to the Document Services page.

  7. Add the Image Document Service plug-in to either the default pipeline or a new pipeline.

To add the default Image Document Service plug-in to the default pipeline: 

  1. Under Document Service Pipelines, click Edit for the default pipeline.

  2. Move the Image Document Service instance from Available Services to Used in Pipeline.

  3. Click Apply.

To create a new pipeline for the default Image Document Service plug-in: 

  1. Under Document Service Pipelines, click Create to display the Create Document Service Pipeline page.

  2. Enter a name and description for the pipeline.

  3. Move the Image Document Service instance from Available Services to Used in Pipeline.

  4. Click Create.

Javascript Bubbling Library

Javascript Bubbling Library http://www.bubbling-library.com

Copyright (c) 2007, Caridy Patiño. All rights reserved.

Redistribution and use of this software in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

To get started using Bubbling Library, simply include the two source files into the head of your document:

<!-- YUI Core -->
<script src="/PATH/TO/utilities.js" type="text/javascript"></script>
<!-- Bubbling Library Core -->
<script src="/PATH/TO/bubbling.js" type="text/javascript"></script>

The documentation can be found here: http://www.bubbling-library.com/eng/api/docs/

Enabling Secure Search

Much of the information within an organization is publicly accessible. However, there are other sources that are protected. For example, while a user can search in their own e-mail folders, they should not be able to search anyone else's e-mail. A secure search returns only search results that the user is allowed to view based on access privileges.

Oracle SES supports the following two security modes, each of which can be used with or without OracleAS Single Sign-On. These options are set on the Global Settings - Query Configuration page:

  • Require login for secure content only: anyone can search public content. This is the default. This is also known as secure mode 2.

  • Require login for public and secure content. This is also known as secure mode 3.

The security mode is applied to both the default query application and Oracle SES Web services. In mode 3, if a user tries to perform any Web services operation (search or document service) without logging in first, then a SOAP exception is thrown indicating that this secure mode requires login for any operation.

This section describes the authorization methods that Oracle SES supports. The authorization methods prevent search users from accessing documents for which they do not have privileges.

Oracle Secure Enterprise Search offers several options for secure search:


See Also:

The Oracle SES administration tutorial at

http://st-curriculum.oracle.com/tutorial/SESAdminTutorial/index.htm


User Authorization Cache

The User Authorization Cache (UAC) source type can crawl and cache user authorization information such as groups and accessible values of user security attributes. This cached information is used at query time to build a security filter. Querying a local cache is much faster than retrieving the authorization information from external repositories and identity systems, so it significantly reduces the time needed to build the security filters for the current user. As a result, users can log in to Oracle SES much more quickly. Moreover, you can set up UAC sources to crawl the user authorization information offline, which reduces the load on target repositories at query time.

You can use UAC for sources that are based on either of the security models supported in Oracle SES:

  • Identity-based security: UAC is enabled for Oracle Internet Directory, Active Directory, and Lotus Notes.

  • Attribute-based security: UAC is enabled for Oracle Content Database and Oracle Content Server sources. This type of security is also called the user-defined security model.

The crawler stores the following information in the User Authorization Cache:

  • User groups: The list of groups that a user belongs to.

  • User attribute values: The values of a specified list of attributes for particular data sources. The values can be single values or arrays of values.

To create a UAC source for an identity plug-in: 

  1. Click the Sources tab.

  2. For Source Type, select User Authorization Cache, then click Create.

    The Create User-Defined Source page is displayed.

  3. Configure the UAC source with the parameters described in Table 9-5. Set Retrieve user groups to true.

  4. Create and activate the identity plug-in. Configure the plug-in to use the cache.

    For example, set these parameters for an Oracle Internet Directory plug-in:

    • Use User Cache: true

    • User Cache Source Name: Name of the UAC source that caches the user group information.

To create a UAC source for attribute-based security: 

  1. Click the Sources tab.

  2. For Source Type, select User Authorization Cache, then click Create.

    The Create User-Defined Source page is displayed.

  3. Configure the UAC source with the parameters described in Table 9-5. Set Source names for which security attributes should be crawled.

  4. Configure the data sources to use cached user authorization information to build the security filter by setting the appropriate UAC-related parameters in the authorization plug-in.

    For example, set these parameters for an Oracle Content Server source:

    • Use cached user and role information to authorize results: true

    • User role data source to cache the filter: Name of the UAC source that caches the user authorization information for this data source.

Federated User Authorization Cache

The Federated User Authorization Cache maintains a single User Authorization Cache (UAC) for use by all Oracle SES instances in a federated environment. Any identity plug-in or authorization plug-in can use a Federated UAC.

Prerequisite 

To create a federated UAC: 

  1. Click the Sources tab.

  2. For Source Type, select Federated UAC, then click Create.

    The Create Federated UAC page is displayed.

  3. Configure the federated UAC source with the parameters described in Table 9-6.

  4. Configure an identity or authorization plug-in:

    1. Select the Global Settings secondary tab.

    2. Under System, choose Identity Management Setup.

    3. Activate an identity or authorization plug-in. Enter the name of the federated UAC as the value of the User Cache Source Name parameter.

  5. Repeat these steps for each additional Oracle SES instance in the federated environment.

XML Schema Definition for Remote Cache Configuration Files

The following is the XML schema definition for remote cache configuration files:

<?xml version="1.0" encoding="windows-1252" ?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns="http://www.example.org" attributeFormDefault="unqualified">
  <xsd:element name="FederatedUAC">
    <xsd:annotation>
      <xsd:documentation>Federated Source Configuration</xsd:documentation>
    </xsd:annotation>
    <xsd:complexType>
      <xsd:all>
        <xsd:element name="UserCache">
          <xsd:annotation>
            <xsd:documentation>
              Remote UAC Source Config Details
            </xsd:documentation>
          </xsd:annotation>
          <xsd:complexType>
            <xsd:all>
              <xsd:element name="Name" minOccurs="1">
                <xsd:annotation>
                  <xsd:documentation>
                    UAC source name in remote instance from which 
                    user/attribute information must be retrieved
                  </xsd:documentation>
                </xsd:annotation>
              </xsd:element>
              <xsd:element name="UserRouting" minOccurs="0">
                <xsd:annotation>
                  <xsd:documentation>
                     Which users should be routed to this UAC cache. Used for
                     identity-based security model.
                  </xsd:documentation>
                </xsd:annotation>
              </xsd:element>
              <xsd:element name="SourceMapping" minOccurs="0" maxOccurs="1">
                <xsd:annotation>
                  <xsd:documentation>
                    To map data source prefix for user-defined security
                    attribute retrieval. This will be used in case of user
                    defined security model. The mapping information must be
                    mentioned in the form of remote and local data source names.
                  </xsd:documentation>
                </xsd:annotation>
                <xsd:complexType>
                  <xsd:all>
                    <xsd:element name="RemoteSourceName" maxOccurs="1" minOccurs="1">
                      <xsd:annotation>
                        <xsd:documentation>
                          Remote Instance Data source name prefixed to the
                          attribute while caching security attribute information
                        </xsd:documentation>
                      </xsd:annotation>
                      <xsd:complexType/>
                    </xsd:element>
                    <xsd:element name="LocalSourceName" minOccurs="1">
                      <xsd:annotation>
                        <xsd:documentation>
                          Local Instance Data source name for which security
                          attribute is being retrieved
                        </xsd:documentation>
                      </xsd:annotation>
                    </xsd:element>
                  </xsd:all>
                </xsd:complexType>
              </xsd:element>
            </xsd:all>
          </xsd:complexType>
        </xsd:element>
      </xsd:all>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
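The cardinality rules in the schema (the minOccurs and maxOccurs values) can be checked with a few lines of code before a remote cache configuration file is used. The following sketch is an illustration only: it checks element presence and counts, does not validate element content, and assumes the elements are in no namespace because the schema declares no targetNamespace. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def _expect(parent, name, minimum, maximum, problems):
    """Record a problem if the child count falls outside [minimum, maximum]."""
    count = len(parent.findall(name))
    if not minimum <= count <= maximum:
        problems.append(
            f"{parent.tag}/{name}: found {count}, expected {minimum}..{maximum}"
        )


def check_federated_uac(xml_text):
    """Return a list of structural problems; an empty list means none found."""
    problems = []
    root = ET.fromstring(xml_text)
    if root.tag != "FederatedUAC":
        return [f"root element is {root.tag!r}, expected 'FederatedUAC'"]
    caches = root.findall("UserCache")
    if len(caches) != 1:
        return [f"expected exactly one UserCache, found {len(caches)}"]
    cache = caches[0]
    _expect(cache, "Name", 1, 1, problems)            # required
    _expect(cache, "UserRouting", 0, 1, problems)     # optional
    _expect(cache, "SourceMapping", 0, 1, problems)   # optional
    mapping = cache.find("SourceMapping")
    if mapping is not None:
        # Both names are required whenever a mapping is present.
        _expect(mapping, "RemoteSourceName", 1, 1, problems)
        _expect(mapping, "LocalSourceName", 1, 1, problems)
    return problems
```

A full validation against the schema itself requires an XSD-aware tool; this check only catches missing or repeated elements.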

Query-time Authorization

Query-time authorization provides another form of filtering. Query-time authorization can be enabled or disabled for Web, file, table, e-mail, mailing list, OracleAS Portal, and user-defined source types from the Home - Sources - Edit Source page. It is not available for federated or self-service sources. Query-time authorization can be used with or without ACLs. For example, a source could be stamped with a relatively broad ACL, while query-time authorization could be used to further filter the results.

In query-time authorization, the Oracle SES administrator associates a Java class that is called at run time. The Java class validates each document that is returned in a user query.

Query-time authorization requires these steps:

  1. The Oracle SES administrator registers a Java class implementing the ResultFilterPlugin interface with a source that requires query-time authorization.

  2. Oracle SES crawls, collects, and indexes all documents. If ACL stamping has been set up, then it ACL-stamps the documents.

  3. At search time, the search result list initially contains all documents accessible under crawl-time ACL policies, unfiltered by query-time user privilege checking.

  4. For the top-N results requested by the user, Oracle SES calls the registered Java class, passing in the search request and document information for any documents belonging to the protected source. The Java class returns an integer value for each document indicating if the document should be removed from the result or not.

  5. Only items the user is privileged to see are returned to the user in their result list.
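The filtering in steps 3 through 5 can be modeled as follows. This is a conceptual sketch in Python, not the Oracle SES API: the real plug-in is a Java class that implements the ResultFilterPlugin interface, and the names filter_top_n, KEEP, and REMOVE, the integer codes, and the dictionary layout of a result are all assumptions made for illustration. Whether Oracle SES fetches additional results to replace removed ones is not modeled.

```python
KEEP, REMOVE = 0, 1  # hypothetical codes; the real interface defines its own


def filter_top_n(results, protected_source, authorize, n):
    """Apply query-time authorization to the top n results.

    results          -- documents that already passed crawl-time ACL checks
    protected_source -- name of the source registered for query-time checks
    authorize        -- callback returning KEEP or REMOVE for one document
    """
    visible = []
    for doc in results[:n]:
        # Only documents from the protected source are passed to the callback.
        if doc["source"] == protected_source and authorize(doc) != KEEP:
            continue
        visible.append(doc)
    return visible
```

Documents from other sources pass through untouched, which mirrors the statement that the registered class is called only for documents belonging to the protected source.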

Notes for Using Query-time Authorization: 

Self Service Authorization

Self service authorization allows end users to enter their credentials needed to access an external content repository. Oracle Secure Enterprise Search crawls and indexes the repository using these credentials to authenticate as the end user. Only the self service user is authorized to see these documents in their search results. Self service authorization works well out of the box, as the crawler appears to be a normally authenticated end user to the content repository.

To set up a self service source, create a template source, defining the target data repository but omitting the credentials needed to crawl. From the search application, an end user can view the Customize page and subscribe to a template source by entering their credentials in an input form. A new user-subscribed source is created, along with a copy of the template's schedule. Oracle SES creates an ACL for this user to be applied to the source.

User-subscribed sources are viewable in the Home - Sources - Manage Template Source page, and the associated schedules are administered in the Home - Schedules page. Any changes applied by the administrator to a template source are dynamically inherited by the associated user-subscribed sources for the next crawl.

The self service option is available for e-mail and Web sources. Self service e-mail sources require the administrator to specify the IMAP server address and the end user to specify the IMAP account user name and password. Self service Web sources are limited to content repositories that use OracleAS Single Sign-On authentication. The administrator specifies the seed URLs, boundary rules, document types, attribute mappings, and crawling parameters, and the end user specifies the single sign-on user name and password.

Crawling of user-subscribed sources is controlled by the administrator. End users do not see any search results for their subscribed source until that source is crawled by the administrator's schedule. Allowing a crawl to automatically launch immediately after an end user subscribes to a source might be useful. However, it makes it possible for users to load the system at inconvenient times.

Customizing the Relevancy of Search Attributes

You can customize the default Oracle SES ranking to create a more relevant search result list for your enterprise. Ranking is determined by default and custom attributes. Default attributes include title, keywords, description, and others. Different weights indicate the importance of each attribute for document relevancy. For example, Oracle SES gives more weight to titles than to keywords.

To customize the relevancy of search results, you can use the Administration API or the Query Web Services API.

Supporting Failover in Oracle RAC

To support failover in Oracle RAC, you must change the database connection string in the credential store from a physical database node to a service representing multiple physical database nodes.


See Also:

"Introduction to Automatic Workload Management" in the Oracle Real Application Clusters Administration and Deployment Guide

To change the database connection string in the Credential Storage Framework: 

  1. Open a connection to the computer where Oracle Fusion Middleware is installed.

  2. Go to MW_HOME/oracle_common/common/bin.

  3. Start the WebLogic Server Administration Scripting Shell.

    • For Linux, enter wlst.sh.

    • For Windows, enter wlst.cmd.

  4. Enter this command at the wls:/offline> prompt:

    connect()
    
  5. Enter your WebLogic user name in response to the user-name prompt.

  6. Enter your WebLogic password in response to the password prompt.

  7. Enter the WebLogic server URL in response to the server URL prompt. For example, t3://localhost:7234.

  8. Enter this command at the wls:/domain_name/serverConfig> prompt:

    updateCred("oracle.search","search_database",connect_string,"search")
    

    Where connect_string is the new database connection string, which uses the easy connect naming method in this format:

    jdbc:oracle:thin:@hostname:port:sid
    

    For example:

    updateCred("oracle.search","search_database","jdbc:oracle:thin:@example.us.oracle.com:7890:fusion", "search")
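Because a single missing colon makes the connection string unusable, it can help to assemble the string programmatically. The following sketch builds the string in the format shown in step 8 and renders the updateCred call as text to paste into WLST. The helper names build_thin_url and update_cred_command are hypothetical and are not part of WLST.

```python
def build_thin_url(hostname, port, sid):
    """Build a connection string in the jdbc:oracle:thin:@hostname:port:sid format."""
    port = int(port)
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    if not hostname or not sid:
        raise ValueError("hostname and sid are required")
    return f"jdbc:oracle:thin:@{hostname}:{port}:{sid}"


def update_cred_command(connect_string):
    """Render the updateCred call from step 8 as a line of text for WLST."""
    return (
        'updateCred("oracle.search","search_database",'
        f'"{connect_string}","search")'
    )
```

With the host, port, and SID from the example in step 8, build_thin_url returns the same connection string that appears in that example.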
    
Monitoring Oracle Secure Enterprise Search

In a production environment, where a load balancer or other monitoring tools are used to ensure system availability, Oracle Secure Enterprise Search (Oracle SES) can be monitored easily at the following URL:

http://host:port/monitor/check.jsp

The page should display the following message: Oracle Secure Enterprise Search instance is up.

This message is not translated to other languages because system monitoring tools might need to byte-compare this string.

If Oracle SES is not available, then the page displays either a connection error or the HTTP status code 503.
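A monitoring probe for this page can be written in a few lines. The following is a minimal sketch, assuming that a healthy instance answers with HTTP status 200 (the status code for the healthy case is an assumption) and that the message appears verbatim in the response body. The function names are hypothetical, and host and port must be replaced with the values for your installation.

```python
import urllib.error
import urllib.request

# Compared as bytes because the message is never translated.
EXPECTED = b"Oracle Secure Enterprise Search instance is up"


def is_up(status, body):
    """Decide availability from an HTTP status code and a response body."""
    return status == 200 and EXPECTED in body


def check_instance(url, timeout=5.0):
    """Fetch the monitoring URL; treat errors such as 503 or refused connections as down."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return is_up(response.status, response.read())
    except (urllib.error.URLError, OSError):
        return False
```

For example, check_instance("http://host:port/monitor/check.jsp") returns True only when the page is reachable and contains the expected message.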

C Third Party Licenses

This appendix includes the third party licenses for all the third party products included with Oracle Secure Enterprise Search. This appendix includes the following topics:

Understanding the Oracle SES Administration GUI

The Oracle SES Administration GUI provides many options for managing and customizing Oracle SES to suit your enterprise. This section describes some tasks you can accomplish using the Oracle SES Administration GUI.

Search Tab

The Search tab consists of the Relevancy, Suggested Links, Suggested Content, Alternate Words, and Source Groups secondary tabs. These pages help you improve search quality.

Global Settings Tab

The Global Settings tab includes links to configure settings for your Oracle SES environment.

This section describes some global configuration pages.


SSL and HTTPS Support in Oracle Secure Enterprise Search

For SSL support, Oracle SES uses JSSE, a highly customizable SSL package included in Sun Microsystems' J2SE. Oracle SES uses SSL for many operations, in some acting as the SSL client and in others acting as the SSL server.

Oracle SES can crawl HTTPS-based URLs, and the Oracle SES middle tier can be configured to support HTTPS-based access. HTTPS refers to HTTP running over a secure socket layer (SSL).

Understanding SSL

SSL is an encryption protocol for securely transmitting private content on the internet. Using SSL, two parties can establish a secure data channel. SSL uses a cryptographic system that uses two keys to encrypt data: a public key and a private key. Data encrypted with the public key can only be decrypted using the private key, and vice versa.

In SSL terms, the party that initiates the communication is considered the client. During the SSL handshake, authentication between the two parties occurs. The authentication can be one-way (server authentication only) or two-way (server and client authentication). The Oracle SES crawler supports one-way SSL. It does not support two-way SSL.

Server authentication is more common. It happens every time a Web browser accesses a URL that starts with HTTPS. Because of server authentication, the client can be certain of the server's identity and can trust that it is safe to submit secure data such as login username and password to the server.

The following list defines some common terms related to SSL:

Every SSL connection starts with the SSL handshake. These are the basic steps:

  1. The client contacts the server to establish an SSL connection.

  2. The server looks in its keystore for its own SSL certificate and sends it back to the client.

  3. The client checks its keystore to see if it trusts the server or any of the entities in the server's certificate chain. If not, then the handshake is aborted. Otherwise, the client positively identifies the server and deems it trusted. The expiration date of the certificate is also checked, and the name on the certificate is matched against the domain name of the server.

  4. If the server is configured to require client authentication, then the server asks the client to identify itself, so the mirror image of steps 2 and 3 takes place.

  5. Session keys are generated and used for encrypting the transmitted data.
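The client side of a one-way handshake (steps 1 through 3 and 5) can be observed with a short script. This sketch uses the Python standard library rather than JSSE, so it illustrates the protocol steps and not the Oracle SES crawler itself. The function names are hypothetical, and the trust store is the operating system default unless a CA file is supplied.

```python
import socket
import ssl


def make_one_way_context(cafile=None):
    """Create a client context that authenticates the server only.

    The default context requires a trusted certificate chain and checks
    that the certificate name matches the host name (step 3).
    No client certificate is loaded, so step 4 never applies.
    """
    return ssl.create_default_context(cafile=cafile)


def fetch_peer_subject(host, port=443, timeout=5.0):
    """Perform the handshake and return the subject of the server certificate."""
    context = make_one_way_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # wrap_socket runs the handshake; it raises ssl.SSLError if the
        # server is not trusted, the certificate expired, or the name differs.
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert().get("subject")
```

If the server certificate does not chain to a trusted root, the handshake is aborted with an exception, which corresponds to the failure case in step 3.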

Oracle strongly recommends that you use an SSL-protected channel to transmit password and other secure data over networks.

Typically, the following components transmit password and other secure data over a network:

  • Federation

  • Connectors

  • Authorization plug-ins

  • Identity plug-ins

  • Suggested content

  • Web Service APIs

Managing the Keystore

The keystore is populated with the root certificates representing well known certificate authorities. Most SSL-enabled Web sites use certificates that originate or chain from these main root certificates. See "Oracle SES Acting as an SSL Server" for more information about managing the keystore.


See Also:

The Java Secure Socket Extension (JSSE) Reference Guide at

http://java.sun.com/j2se/1.4.2/docs/guide/security/jsse/JSSERefGuide.html


Oracle SES Acting as an SSL Server

Oracle SES acts as the SSL server when the middle tier, configured to use SSL, responds to HTTPS requests. The Oracle SES crawler connects to SSL-enabled sites using the JSSE package, which contains a keystore with a few default certificates from well known CAs.

This section contains the following topics:

Configuring Oracle Secure Enterprise Search to Require SSL

When Oracle SES is fronted by an Oracle HTTP Server, Oracle recommends that Oracle SES be configured to require SSL with client-side authentication for communication with the Oracle HTTP Server. Furthermore, it should use a keystore other than the default one. It is highly recommended that you create separate identity and trust keystores.

The communication channel between the client and Oracle SES is by default not SSL-enabled and not encrypted.

To configure Oracle SES to require SSL: 

  1. Create a new keystore. This step is optional but Oracle recommends it.

    1. Open the WebLogic console and log in, as described in "Accessing the Oracle WebLogic Server Administration Console".

    2. Expand the Environment button and click Servers. This takes you to the configuration page for the servers.

    3. Click the name of the server for which you want to configure SSL.

    4. Click the Keystores tab.

    5. From the Keystores list, select Custom Identity and Custom Trust.

    6. In the custom identity keystore field add the complete path and name of the new keystore. The default keystore is located at MW_HOME/wlserver/server/lib. To create a new keystore SESIdentity.jks, add the path and name MW_HOME/wlserver/server/lib/SESIdentity.jks to the keystore field.

    7. Set the custom identity keystore type to be jks. Set a pass phrase for the store.

    8. In the custom trust field, add the complete path and name of the new keystore. The default keystore is located at MW_HOME/wlserver/server/lib. To create a new keystore SESTrust.jks, add the path and name MW_HOME/wlserver/server/lib/SESTrust.jks to the keystore field.

    9. Set the custom trust keystore type to be jks. Set a pass phrase for the store.

    10. Click Save to create the new keystores.

  2. Create new certificates for the identity and trust keystores, using the Java keytool utility, which is located in MW_HOME/jdk160_21/jre/bin. If this directory is not in your search path, then make it your current working directory for these steps.

    1. Generate the key for the identity keystore:

      keytool -genkey -alias [MyCertificateAlias] -keyalg RSA -keysize 1024 -dname ["My DN"] -keypass [MyKeyPass] -keystore [MyKeyStore] -storepass [PasswordOfTheKeystoreCreatedAbove] -storetype [StoreTypeCreatedAbove]
      

      For example:

      keytool -genkey -alias sescert -keyalg RSA -keysize 1024 -dname "CN=example0123.us.mycompany.com,OU=ses,O=oracle,C=us" -keypass welcome1 -keystore $MW_HOME/wlserver/server/lib/SESIdentity.jks -storepass welcome1 -storetype jks
      

      This example creates a certificate with the alias sescert, the given DN, and the key password welcome1. It uses SESIdentity.jks as the keystore, which matches the one created in step 1. The storepass and storetype values are the same as those supplied in step 1.

    2. Generate the key for the trust keystore:

      keytool -genkey -keyalg RSA -alias sescert -keysize 1024 -dname "CN=example0123.us.mycompany.com,OU=ses,O=oracle,C=us" -keypass welcome1 -keystore $MW_HOME/wlserver/server/lib/SESTrust.jks -storepass welcome1 -storetype jks
      
    3. Self-certify the generated keys, starting with the identity keystore:

      keytool -selfcert -alias sescert -keyalg RSA -validity 2000 -keypass welcome1 -keystore $MW_HOME/wlserver/server/lib/SESIdentity.jks -storepass welcome1
      

      The command uses the alias, keypass, and the keystore location supplied in step 2.a. The store pass is the password of the store.

      Self-certify the trust keystore:

      keytool -selfcert -alias sescert -keyalg RSA -validity 2000 -keypass welcome1 -keystore $MW_HOME/wlserver/server/lib/SESTrust.jks -storepass welcome1
      

      Note:

      In addition to using the Java keytool utility to self-sign the generated key, you can use any of the options mentioned here: http://download.oracle.com/docs/cd/E12840_01/wls/docs103/secmanage/identity_trust.html

  3. Configure Oracle SES to use the generated key:

    1. Log in to the admin console for WebLogic and select the server for which you want to configure SSL by expanding the Environment button and clicking on Servers. This takes you to the configuration page for the servers.

    2. Click the SSL tab.

    3. Verify that Private Key Location is set to "from Custom Identity Keystore".

    4. In the Private Key Alias field, provide the private key alias. This is the alias specified in step 2a.

    5. Provide the private key pass phrase that you specified in step 2a.

    6. Save the settings.

  4. Enable SSL for Oracle SES:

    1. Log in to the admin console for WebLogic and select the server for which you want to configure SSL by expanding the Environment button and clicking on Servers. This takes you to the configuration page for the servers.

    2. Click the General tab.

    3. Select SSL Listen Port Enabled and provide a port number. The default port is 7002.

    4. Save the settings.

    5. Click the Control tab. You can access the Control tab by expanding the Environment button and clicking on Servers.

    6. From the Control tab, restart SSL.
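The keytool commands in step 2 are long, and options copied from formatted documents sometimes carry a typographic dash (such as U+2013) instead of an ASCII hyphen, in which case keytool does not treat them as options. The following Python sketch is only an illustration: it assembles the trust keystore command from step 2 as an argument list and checks for typographic dashes before anything is run. The function names and the placeholder default for MW_HOME are hypothetical; the options and values are the ones shown in step 2.

```python
import os

# Characters that resemble a hyphen but are not the ASCII "-" that keytool expects.
TYPOGRAPHIC_DASHES = ("\u2013", "\u2014", "\u2212")


def build_genkey_command(keystore, alias, dname, password):
    """Return the keytool -genkey argument list used for the trust keystore."""
    return [
        "keytool", "-genkey", "-keyalg", "RSA",
        "-alias", alias, "-keysize", "1024",
        "-dname", dname,
        "-keypass", password,
        "-keystore", keystore,
        "-storepass", password,
        "-storetype", "jks",
    ]


def find_bad_dashes(args):
    """Return the arguments that start with a typographic dash."""
    return [arg for arg in args if arg.startswith(TYPOGRAPHIC_DASHES)]


# Placeholder default; set MW_HOME in the environment for a real installation.
mw_home = os.environ.get("MW_HOME", "/path/to/middleware")
command = build_genkey_command(
    keystore=os.path.join(mw_home, "wlserver", "server", "lib", "SESTrust.jks"),
    alias="sescert",
    dname="CN=example0123.us.mycompany.com,OU=ses,O=oracle,C=us",
    password="welcome1",
)
assert find_bad_dashes(command) == []
print(" ".join(command))
```

To run the command rather than print it, the list can be passed to subprocess.run(command, check=True), which avoids shell quoting of the distinguished name.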

Configuring Oracle HTTP Server to Require SSL

Configuring Oracle HTTP Server to require SSL is a multistep process involving configuration of the server, modification of certain .conf files, and exchange of certificates.

To configure Oracle HTTP Server to require SSL: 

  1. Configure the Oracle HTTP server:

    1. From ORACLEOHS_HOME/bin, run owm. This opens Oracle Wallet Manager, which is used to create the certificate for Oracle HTTP Server.

    2. Click Wallet and then click New.

      If you get a message indicating that the default directory is not set, click Continue.

    3. Provide a password for the wallet. Click No for the option to configure user certificate request.

    4. Click Wallet and then click Save As. Save the wallet to the directory ORACLE_HOME/instances/instanceName/config/OHS/componentName/keystores/myWallet. This creates a new wallet with the name myWallet for the Oracle HTTP server.

      instanceName and componentName are specified during the installation of Oracle HTTP Server.

    5. Create a key-cert pair (a user certificate) using the following command from ORACLEOHS_HOME/bin:

      orapki wallet add -wallet [walletPath] -dn ["myDN"] -keysize 1024 -self_signed -validity 720
      

      For example,

      orapki wallet add -wallet $ORACLEOHS_HOME/instances/instance1/config/OHS/ohs1/keystores/myWallet -dn CN=example0123.us.mycompany.com,OU=ohs.ses,O=oracle,ST=ca,C=US -keysize 1024 -self_signed -validity 720
      

      The command adds a user certificate with the given dn and the wallet located at ORACLEOHS_HOME/instances/instance1/config/OHS/ohs1/keystores/myWallet. Note that instance1 is the name of the instance provided during installation and ohs1 is the name of the component provided during installation.

    6. Go back to the OWM utility and reopen the wallet: Close and open the wallet by selecting the correct directory. You should now see Certificate:[Ready] under the wallet.

    7. Save the wallet.

    8. Double-click Certificate:[Ready], click the Operations tab, and select Export User Certificate. Export the user certificate to a suitable location, such as /tmp/OHSIdentityCertificate.crt.

  2. Edit the file ssl.conf located at ORACLEOHS_HOME/instances/instanceName/config/OHS/componentName/ to include the following. Note that instanceName and componentName are specified during the installation of Oracle HTTP Server.

    <VirtualHost *:dddd>
    <IfModule mod_weblogic.c>
       WebLogicHost [SESHost]
       WebLogicPort [SESPort]
       Debug ALL
       WLLogFile [Location of the log file]
       SecureProxy On
       WlSSLWallet "MyWalletLocation"
    <Location /weblogic>
       SetHandler weblogic-handler
       PathTrim /weblogic
    </Location>
    <Location /console>
       SetHandler weblogic-handler
    </Location>
    </IfModule>
    </VirtualHost>
    

    For example, if the host is sesHost, the port is 7002, and the wallet is located at Oracle_Instance/config/Component_Type/Component_Name/keystores/myWallet, then the mod_weblogic section would be:

    <IfModule mod_weblogic.c>
       WebLogicHost sesHost
       WebLogicPort 7002
       Debug ALL
       WLLogFile /scratch/exampleuser/Certificates/weblogic.log
       SecureProxy On
       WlSSLWallet "${ORACLE_INSTANCE}/config/${COMPONENT_TYPE}/${COMPONENT_NAME}/keystores/myWallet"
    <Location /weblogic>
       SetHandler weblogic-handler
       PathTrim /weblogic
    </Location>
    <Location /console>
       SetHandler weblogic-handler
    </Location>
    </IfModule>
    
  3. Edit the file mod_wl_ohs.conf located at ORACLEOHS_HOME/instances/instanceName/config/OHS/componentName/ to include the following:

    <IfModule weblogic_module>
      WebLogicHost [SES host name]
      WebLogicPort [SES HTTP port]
      Debug ON
      WLLogFile [Location of the log]
    </IfModule>
    <Location /search/query>
      SetHandler weblogic-handler
    </Location>
    <Location /search/admin>
      SetHandler weblogic-handler
    </Location>
    # For monitor SES URL
    <Location /monitor>
      SetHandler weblogic-handler
    </Location>
    # For Help links in Admin side
    <Location /search/ohw>
      SetHandler weblogic-handler
    </Location>
    

    For example, if the Oracle SES host is sesHost and the port is 8001, then the file would contain:

    <IfModule weblogic_module>
      WebLogicHost sesHost
      WebLogicPort 8001
      Debug ON
      WLLogFile /scratch/exampleuser/weblogic.log
    </IfModule>
    <Location /search/query>
      SetHandler weblogic-handler
    </Location>
    <Location /search/admin>
      SetHandler weblogic-handler
    </Location>
    <Location /monitor>
      SetHandler weblogic-handler
    </Location>
    <Location /search/ohw>
      SetHandler weblogic-handler
    </Location>
    
  4. Exchange the certificates for Oracle HTTP Server and Oracle SES WebLogic servers. Use Oracle Wallet Manager to import and export certificates from and to the wallet, and use the Java keytool for the Oracle SES keystore. While importing a certificate, ensure that it is self-signed. If not, then you must import any of the certificates in the chain. See "Understanding SSL" for more information about certificate chains.

    Perform the following steps to exchange certificates:

    1. Export the SESIdentity key generated in step 2a of Configuring Oracle Secure Enterprise Search to Require SSL to a suitable location by running the following command:

      keytool -export -alias sescert -keystore $ORACLESES_HOME/wls/wlserver/server/lib/SESIdentity.jks -file /tmp/SESIdentityCertificate.crt
      

      This command exports the certificate with the alias sescert from the keystore created in step 2a of Configuring Oracle Secure Enterprise Search to Require SSL to the file /tmp/SESIdentityCertificate.crt.

    2. Import the exported Oracle HTTP Server certificate created in step 1h to Oracle SES. Issue this command from MW_HOME/jdk160_21/jre/bin:

      keytool -file [LocationOfOHSIdentityCertificate] -alias [MyOHSCertAlias] -import -trustcacerts -keystore [LocationofSESTrustStore] -storepass [MyPasswordForTheTrustStore] -storetype jks
      

      For example, if the location of the exported Oracle HTTP Server identity certificate is /tmp/OHSIdentityCertificate.crt, the Oracle SES trust store is at MW_HOME/wlserver/server/lib/SESTrust.jks, the store password is welcome1, and the alias is ohsCert, then run the following:

      keytool -file /tmp/OHSIdentityCertificate.crt -alias ohsCert -import -trustcacerts -keystore $MW_HOME/wlserver/server/lib/SESTrust.jks -storepass welcome1 -storetype jks
      
    3. Import the Oracle SES certificate into Oracle HTTP Server wallet. The Oracle SES certificate is the file SESIdentityCertificate.crt exported in step 4a. To import this certificate, from the Oracle Wallet Manager utility, click Operations and select Import Trusted Certificate. Navigate to the location of the exported Oracle SES certificate (/tmp/SESIdentityCertificate.crt), and import it as a trusted certificate.

    4. Restart Oracle HTTP Server. Before restarting the server, ensure that the Auto Login option is enabled in Oracle Wallet Manager. The restart fails if the option is not enabled.

      To restart the server, run the following command from ORACLEOHS_HOME/instances/instance1/bin/:

      opmnctl restartproc process-type=OHS
      
    5. Restart SSL for the Oracle WebLogic Server by using the control page of the server.

      To access the control page, click Environment, then Servers, and then Control.
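Because the mod_wl_ohs.conf content in step 3 differs between installations only in the host, port, and log location, it can be generated rather than edited by hand. The following Python sketch renders the same directives shown in step 3; the function name render_mod_wl_ohs and the SES_LOCATIONS list name are hypothetical, and the output should be reviewed before it is copied into a real configuration directory.

```python
def render_mod_wl_ohs(host, port, log_file, locations):
    """Render the mod_wl_ohs.conf directives shown in step 3 as text."""
    lines = [
        "<IfModule weblogic_module>",
        f"  WebLogicHost {host}",
        f"  WebLogicPort {port}",
        "  Debug ON",
        f"  WLLogFile {log_file}",
        "</IfModule>",
    ]
    # One Location block per URL path that is handed to WebLogic.
    for path in locations:
        lines += [
            f"<Location {path}>",
            "  SetHandler weblogic-handler",
            "</Location>",
        ]
    return "\n".join(lines) + "\n"


# URL paths taken from the example in step 3.
SES_LOCATIONS = ["/search/query", "/search/admin", "/monitor", "/search/ohw"]

conf = render_mod_wl_ohs(
    "sesHost", 8001, "/scratch/exampleuser/weblogic.log", SES_LOCATIONS
)
print(conf)
```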


See Also:

Oracle Database Advanced Security Administrator's Guide for more information about Oracle Wallet Manager

Oracle Database Advanced Security Administrator's Guide for more information about the orapki utility

Oracle HTTP Server Administering a Standalone Deployment Based on Apache 2.0 for more information about enabling SSL for Oracle HTTP Server



Setting Up Oracle Fusion Sources

Using Oracle SES, you can search for documents within Oracle Fusion Applications. This is done by establishing a connection between Oracle SES and Oracle Fusion using a Fusion connector. To connect to and retrieve documents from Oracle Fusion, you must set up an Oracle SES Fusion identity management system using an identity plug-in, and an authorization management system using an authorization plug-in.

The identity plug-in enables Oracle SES to identify the set of users that can access the Fusion application. The authorization plug-in enables Oracle SES to determine the access rights that each user has to the documents and data within the application. Usually, not all users have access to the entire data and document set. Instead, each user may have access to a limited set of documents and data.

Defining a Fusion Source

A Fusion application source can be defined from the Sources page. After you define the source, you can search for documents within the application.

To create a Fusion source:

  1. On the Home page, click the Sources subtab.

    This opens the Sources page.

  2. From the Source Type list, select Oracle Fusion and click Create.

    This opens the Create Source page, which guides you through a multi-step procedure to enter source and authorization parameters.

  3. On the Create Source page, enter the source parameter values listed in Table 8-2.

  4. Click Next and specify values for the authorization parameters listed in Table 8-3.

  5. Click Create & Customize to create the source.


Monitoring the Crawling Process

Monitor the crawling process in the Oracle SES Administration GUI by using a combination of the following:

  • Check the crawl progress and crawl status on the Home - Schedules page. (Click Refresh Status.)

  • Monitor your crawler statistics on the Home - Schedules - Crawler Progress Summary page and the Home - Statistics page.

  • Monitor the log file for the current schedule.

In Oracle Fusion Applications, you can also monitor crawler jobs in Enterprise Manager Fusion Applications Control. Figure 3-1 shows a crawler schedule named ABC, which appears in the Scheduling Services with a prefix of Oracle Secure Enterprise Search Crawler. The FUSION_APPS_SEARCH_APPID application identity submits all crawler jobs. All Oracle SES connectors use this identity to crawl searchable repositories within Fusion Applications.

Crawler Statistics

The following crawler statistics are shown on the Home - Schedules - Crawler Progress Summary page. Some statistics are also shown in the log file under "Crawling results".

Crawler Log Files

The log file records all crawler activity, warnings, and error messages for a particular schedule. It includes messages logged at startup, run time, and shutdown. Logging everything can create very large log files when crawling a large number of documents. However, in certain situations, it can be beneficial to configure the crawler to print detailed activity to each schedule log file.

On the Global Settings - Crawler Configuration page, you can select either to log everything or to log only summary information. You can also select the language the crawler uses to generate the log file.

A new log file is created when you restart the crawler. The location of the crawler log file can be found on the Home - Schedules - Crawler Progress Summary page. The crawler maintains the past seven versions of its log file. The most recent log file is shown in the Oracle SES Administration GUI. You can view the other log files in the file system.

The format of the log file name is:

search.crawler.iSES_Instance_IDdsData_Source_ID.timestamp.log

Where:

  • SES_Instance_ID is the SID of the SES database.

  • Data_Source_ID is the identifier of the data source being crawled.

  • timestamp is the starting time in Greenwich Mean Time (GMT) 24-hour MMDDHHmm format (month, day, hour, minute).
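As an illustration of the naming format, the following Python sketch splits a log file name into its parts. It assumes that the letters i and ds in the format are literal separators and that the data source ID is numeric; neither assumption is confirmed by the text above, and the sample file name is invented.

```python
import re
from typing import NamedTuple

# Assumes "i" and "ds" are literal markers and the data source ID is numeric.
LOG_NAME = re.compile(
    r"^search\.crawler\.i(?P<instance>.+?)ds(?P<source>\d+)\.(?P<ts>\d{8})\.log$"
)


class CrawlerLogName(NamedTuple):
    instance_id: str
    data_source_id: str
    month: int
    day: int
    hour: int
    minute: int


def parse_log_name(name):
    """Split a crawler log file name into IDs and the MMDDHHmm start time (GMT)."""
    match = LOG_NAME.match(name)
    if match is None:
        raise ValueError(f"not a crawler log file name: {name!r}")
    ts = match.group("ts")
    return CrawlerLogName(
        match.group("instance"),
        match.group("source"),
        int(ts[0:2]),
        int(ts[2:4]),
        int(ts[4:6]),
        int(ts[6:8]),
    )


# Hypothetical file name, used only to exercise the parser.
info = parse_log_name("search.crawler.i1ds23.12201315.log")
print(info)
```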

Each logging message in the log file is one line, containing the following six tab-delimited columns, in order:

  1. Timestamp

  2. Message level

  3. Crawler thread name

  4. Component name. It is typically the name of the executing Java class.

  5. Module name. It can be the name of an internal Java class method.

  6. Message
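The six-column layout makes the log easy to process with a script. The following Python sketch splits one line into named fields; the LogRecord type and the sample line, including its timestamp format, are invented for illustration and are not taken from a real Oracle SES log.

```python
from typing import NamedTuple


class LogRecord(NamedTuple):
    timestamp: str
    level: str
    thread: str
    component: str
    module: str
    message: str


def parse_log_line(line):
    """Split one crawler log line into its six tab-delimited columns."""
    # Split at most five times so that any tab inside the message is preserved.
    fields = line.rstrip("\n").split("\t", 5)
    if len(fields) != 6:
        raise ValueError(f"expected 6 columns, found {len(fields)}")
    return LogRecord(*fields)


# Invented sample line; real timestamps and names may look different.
sample = "12:30:01:123\tINFO\tcrawler_1\tExampleComponent\texampleMethod\tExample message\n"
record = parse_log_line(sample)
print(record.level, record.message)
```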


Overview of Attributes

Each source has its own set of document attributes. Document attributes, like metadata, describe the properties of a document. The crawler retrieves values and maps them to search attributes. This mapping lets users search documents based on their attributes. Document attributes in different sources can be mapped to the same search attribute. Therefore, users can search documents from multiple sources based on the same search attribute.

After you crawl a source, you can see the attributes for that source. Document attribute information is obtained differently depending on the source type.

Document attributes can be used in tasks such as document management, access control, or version control. Different sources can use different attribute names for the same idea; for example, version and revision. They can also use the same attribute name for different ideas; for example, "language" can mean natural language in one source and programming language in another.

Oracle SES has several default search attributes. They can be incorporated in search applications for a more detailed search and richer presentation.

Search attributes are defined in the following ways:

  • System-defined search attributes, such as title, author, description, subject, and mimetype.

  • Search attributes created by the Oracle SES administrator.

  • Search attributes created by the crawler. During crawling, the crawler plug-in maps the document attribute to a search attribute with the same name and data type. If not found, then the crawler creates a new search attribute with the same name and type as the document attribute defined in the crawler plug-in.


Note:

Search attribute names must be unique; two attributes cannot have the same name. For example, if a search attribute exists with a String data type, and another search attribute is discovered by the crawler with the same name but a different data type, then the crawler ignores the second attribute.

To prevent this conflict and allow Oracle SES to index both attributes, check the list of Oracle SES attribute names and types in Oracle SES Attributes before creating new attributes.
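The mapping rule for crawler-created attributes and the conflict described in the note can be modeled in a few lines. The following Python sketch is a simplified model, not Oracle SES code: it keeps search attributes in a dictionary, matches names exactly (an assumption), and uses "String" and "Number" only as illustrative type names.

```python
def map_document_attribute(search_attributes, name, data_type):
    """Apply the mapping rule to one document attribute.

    search_attributes maps each search attribute name to its data type.
    Returns "mapped", "created", or "ignored".
    """
    existing_type = search_attributes.get(name)
    if existing_type is None:
        # No search attribute with this name: create one with the same type.
        search_attributes[name] = data_type
        return "created"
    if existing_type == data_type:
        return "mapped"
    # Same name but a different data type: the second attribute is ignored.
    return "ignored"


attrs = {"title": "String", "author": "String"}
results = [
    map_document_attribute(attrs, "author", "String"),
    map_document_attribute(attrs, "revision", "Number"),
    map_document_attribute(attrs, "revision", "String"),
]
print(results)  # ['mapped', 'created', 'ignored']
```

The last call shows why checking existing names and types first matters: once revision exists as a Number, a String attribute with the same name is dropped.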


Attributes For Different Source Types

Table and database sources have no predefined attributes. The crawler collects attributes from columns defined during source creation. You must map the columns to the search attributes.

For Siebel 7.8 sources, specify the attributes in the query while creating the source. For Oracle E-Business Suite and Siebel 8 sources, specify the attributes in the XML data feed.

For many source types, such as OracleAS Portal, e-mail, NTFS, and Microsoft Exchange sources, the crawler picks up key attributes offered by the target systems. For other sources, such as Documentum eRoom or Lotus Notes, an Attribute list parameter is in the Home - Sources - Customize User-Defined Source page. Any attributes that you define are collected by the crawler and available for search.

System-Defined Search Attributes

There are also two system-defined search attributes, Urldepth and Infosource Path.

Urldepth measures the number of levels down from the root directory. It is derived from the URL string. In general, the depth is the number of slashes, not counting the slash immediately following the host name or a trailing slash. An adjustment of -2 is made to home pages. An adjustment of +1 is made to dynamic pages, such as the example in Table 3-3 with the question mark in the URL.

Urldepth is used internally for calculating relevance ranking, because a URL with a smaller URL depth is typically more important.
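The general slash-counting rule can be sketched as follows. This Python example is an approximation under stated assumptions: it applies the +1 adjustment whenever the URL has a query string, and it does not apply the -2 home page adjustment because the text does not define which URLs count as home pages. The function name is hypothetical, and the results may therefore differ from the Urldepth that Oracle SES computes.

```python
from urllib.parse import urlsplit


def approximate_url_depth(url):
    """Count path slashes, ignoring the one after the host and a trailing one."""
    parts = urlsplit(url)
    path = parts.path
    if path.startswith("/"):
        path = path[1:]   # slash immediately following the host name
    if path.endswith("/"):
        path = path[:-1]  # trailing slash
    depth = path.count("/")
    if parts.query:
        depth += 1        # adjustment for dynamic pages (URL contains "?")
    return depth


print(approximate_url_depth(
    "http://example.com/portal/page/myo/Employee_Portal/home.htm"
))
```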

Table 3-3 lists the Urldepth of some example URLs.

Infosource Path is a path representing the source of the document. This internal attribute is used in situations where documents can be browsed by their source. The Infosource Path is derived from the URL string.

For example, for this URL:

 http://example.com/portal/page/myo/Employee_Portal/home.htm

The Infosource Path is:

portal/page/myo/Employee_Portal

If the document is submitted through a connector, this value can be set explicitly by using the DocumentMetadata.setSourceHierarchy API.
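For the URL in the example above, the Infosource Path equals the URL path without its last segment. The following Python sketch reproduces that one example; it is an assumption that the same rule holds for other URLs, and the function name is hypothetical.

```python
from urllib.parse import urlsplit


def infosource_path(url):
    """Return the URL path without the final segment or the leading slash."""
    path = urlsplit(url).path
    directory = path.rsplit("/", 1)[0]  # drop the last segment (the document)
    return directory.lstrip("/")


print(infosource_path(
    "http://example.com/portal/page/myo/Employee_Portal/home.htm"
))
```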


5 Configuring Access to Built-in Sources

Among the built-in sources are the data repositories familiar to everyone, such as Web sites and e-mail. Most of them can be set up very quickly. This chapter contains the following topics:


Setting Up OracleAS Portal Sources

An OracleAS Portal source enables users to search across multiple portal installations and repositories, such as Web pages, disk files, and pages on other OracleAS Portal instances. Oracle Secure Enterprise Search can securely crawl both public and private OracleAS Portal content.

To create an OracleAS Portal source: 

  1. On the Home page, select the Sources secondary tab to display the Sources page.

  2. For Source Type, select OracleAS Portal.

  3. Click Create to display the Create OracleAS Portal Source page.

  4. Complete the following fields. Click Help for additional information.

    • Source Name: Name that you assign to this OracleAS Portal source.

    • URL Base: Base URL for OracleAS Portal.

    • Page Groups: List of page groups in OracleAS Portal retrieved when you click Retrieve Page Groups. Select the ones to crawl.

  5. Click Create & Customize.

  6. Select the Authentication tab.

  7. Select Enable OracleAS Single Sign-On Authentication and enter your credentials.

  8. Click Apply.

  9. Follow the steps for crawling and indexing in "Getting Started Basics for the Administration GUI" for the OracleAS Portal schedule.

Crawling a Folder or Page

The portal crawler can crawl a subtree under a specific folder or page instead of under an entire portal tree.

To set the boundary rule to crawl a specific folder or page: 

  1. On the Home page, click the Sources secondary tab to display the Sources page.

  2. Select a source and click Edit to display the Edit User-Defined Source page.

  3. Click the URL Boundary Rules subtab.

  4. Under Inclusion Rules for the URL, select the starts with rule and enter the value of the PORTAL_PATH for the folder or page.

    For example, to crawl only the P2 subtree of a portal tree, enter the path from the root to P2, such as /Proot/P1/P2.
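The effect of a starts with rule can be previewed with a plain prefix test. The following Python sketch is only a model of such a rule, not the crawler's implementation; the sample paths other than /Proot/P1/P2 are invented.

```python
def within_portal_subtree(portal_path, rule_prefix):
    """Return True when the path satisfies a plain 'starts with' rule."""
    return portal_path.startswith(rule_prefix)


# Invented portal paths used to preview the rule from the example above.
paths = [
    "/Proot/P1/P2",
    "/Proot/P1/P2/child",
    "/Proot/P1/P3/other",
    "/Proot/P1",
]
included = [p for p in paths if within_portal_subtree(p, "/Proot/P1/P2")]
print(included)  # ['/Proot/P1/P2', '/Proot/P1/P2/child']
```

A plain prefix test like this one would also accept a sibling whose name begins with the same characters, such as /Proot/P1/P20, so a prefix that ends at an unambiguous point is safer.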

OracleAS Portal Search Attributes

The crawler picks up key attributes offered by OracleAS Portal, as described in Table 5-1.

Table 5-1 OracleAS Portal Source Attributes

Attribute               Description
----------------------  -----------
createdate              Date the document was created
creator                 User name of the person who created the document
author                  User-editable field in which users can specify a full name or any other value
page_path               Hierarchy path of the portal page or item in the portal tree (contains page titles)
portal_path             Hierarchy path of the portal page or item in the portal tree, used for browsing and boundary rules (contains page names). When searching OracleAS Portal 10.1.2, portal_path appears in uppercase when browsing. When searching OracleAS Portal 10.1.4, portal_path appears in lowercase.
title                   Title of the document
description             Brief description of the document
keywords                Keywords of the document
expiredate              Expiration date of the document
host                    Portal host
infosource              Path of the Portal page in the browse hierarchy
language                Language of the portal page or item
lastmodifieddate        Last modified date of the document
mimetype                Usually 'text/html' for portal
perspectives            User-created markers that can be applied to pages or items, such as 'INTERNAL ONLY', 'REVIEWED', or 'DESIGN SPEC'. For example, a Portal containing recipes could have items representing recipes with perspectives such as 'Breakfast', 'Tea', 'Contains Nuts', 'Healthy', and one particular item could have several perspectives assigned to it.
wwsbr_name_             Internal name of the portal page or item
wwsbr_charset_          Character set of the portal page or item
wwsbr_category_         Category of the portal page or item
wwsbr_updatedate_       Date the portal page or item was last updated
wwsbr_updator_          Person who last updated the page or item
wwsbr_subtype_          Subtype of the portal page or item (for example, container)
wwsbr_itemtype_         Portal item type
wwsbr_mime_type_        Mimetype of the portal page or item
wwsbr_publishdate_      Date the portal page or item was published
wwsbr_version_number_   Version number of the portal item



Oracle Corporation


Control Feed Example

The following example shows a control feed used in an XML connector-based source.

<?xml version="1.0" encoding="windows-1252" ?> 
<rss xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="2.0"
  xmlns:xsd="http://www.w3.org/2001/XMLSchema"
  xsi:schemaLocation="http://xmlns.oracle.com/orarss 
C:\project_drive\SES Application Search\RSS Format 
Schema\orarss.xsd"> 
 <channel> 
   <title>Contacts</title> 
   <link>http://my.company.com/rss</link> 
   <description>The channel contains feed for contacts</description>
   <lastBuildDate>2006-04-03T12:20:20.00Z</lastBuildDate> 
   <channelDesc xmlns="http://xmlns.oracle.com/orarss">
     <feedType>control</feedType>
   </channelDesc>
   <item>
     <link>file://localhost/C:\project\rss_feeds\test.xml</link>
   </item>
   <item>
     <link>file://localhost/C:\project\rss_feeds\test2.xml</link>
   </item>
   <item operation="control">
     <link>http://my.host.com/contacts/control.xml</link>
   </item>
   <item>
     <link>file://localhost/C:\project\rss_feeds\test3.xml</link>
   </item>
 </channel> 
</rss> 
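A control feed like the one above can be inspected with a short script before it is handed to the crawler. The following Python sketch uses a shortened copy of the example and lists each item link together with its operation attribute, if any. The function name is hypothetical, and the sketch makes no claim about how Oracle SES itself processes the feed.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the control feed example; a raw string keeps the backslashes.
CONTROL_FEED = r"""<rss version="2.0">
 <channel>
   <title>Contacts</title>
   <channelDesc xmlns="http://xmlns.oracle.com/orarss">
     <feedType>control</feedType>
   </channelDesc>
   <item>
     <link>file://localhost/C:\project\rss_feeds\test.xml</link>
   </item>
   <item operation="control">
     <link>http://my.host.com/contacts/control.xml</link>
   </item>
 </channel>
</rss>"""


def list_feed_links(xml_text):
    """Return (link, operation) pairs; operation is None when not specified."""
    root = ET.fromstring(xml_text)
    links = []
    for item in root.findall("./channel/item"):
        link = item.findtext("link", default="").strip()
        links.append((link, item.get("operation")))
    return links


for link, operation in list_feed_links(CONTROL_FEED):
    print(operation, link)
```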

Accessing the Oracle WebLogic Server Administration Console

The Oracle WebLogic Server Administration Console is a Web browser-based user interface that displays the current status of the middle tier. For example, the Home page shows a graph of the Response and Load, and the Performance page shows a graph of the Heap Usage.

To access the Oracle WebLogic Server Administration Console:  

  1. Enter the following URL in a Web browser, replacing wls_host and wls_port with the host name and port for the WebLogic Administration Console:

    http://wls_host:wls_port/console

  2. Log in with your WebLogic administrative user name and password.


10 Administering Oracle SES Instances

This chapter provides information about tuning and general management of Oracle SES instances. It contains the following topics:


Configuration File XML Schema Definition

The following example shows the XSD for the configuration file.

<?xml version="1.0" encoding="windows-1252"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns="http://xmlns.oracle.com/search/rsscrawlerconfig"
            targetNamespace="http://xmlns.oracle.com/search/rsscrawlerconfig"
            elementFormDefault="qualified">

  <xsd:element name="rsscrawler">

    <xsd:annotation>
      <xsd:documentation>
        RSS crawler configuration parameters
      </xsd:documentation>
    </xsd:annotation>

    <xsd:complexType>
      <xsd:sequence>

        <xsd:element name="sourceName" type="xsd:string" minOccurs="0"/>

        <xsd:element name="feedType" default="dataFeed">
          <xsd:simpleType>
            <xsd:restriction base="xsd:string">
              <xsd:enumeration value="controlFeed"/>
              <xsd:enumeration value="dataFeed"/>
              <xsd:enumeration value="directoryFeed"/>
            </xsd:restriction>
          </xsd:simpleType>
        </xsd:element>

        <xsd:element name="feedLocation">
          <xsd:complexType>
            <xsd:simpleContent>
              <xsd:extension base="xsd:anyURI"/>
            </xsd:simpleContent>
          </xsd:complexType>
        </xsd:element>

        <xsd:element name="errorFileLocation" type="xsd:string" minOccurs="0">
          <xsd:annotation>
            <xsd:documentation>
              Optional. This should be the absolute path of a location to which the status feeds are uploaded. This location should be on the same computer from which the data feeds are fetched. If not specified, the status feeds are uploaded to the same location as the data feeds. If HTTP is used to fetch the data feed, the value of this tag should be the HTTP URL to which the status feed can be posted. If this tag is not specified, the status feed is posted to the HTTP URL of the data feed.
            </xsd:documentation>
          </xsd:annotation>
        </xsd:element>

        <xsd:element name="securityType" default="noSecurity" maxOccurs="1" minOccurs="0">
          <xsd:simpleType>
            <xsd:restriction base="xsd:string">
              <xsd:enumeration value="identityBased"/>
              <xsd:enumeration value="attributeBased"/>
              <xsd:enumeration value="noSecurity"/>
            </xsd:restriction>
          </xsd:simpleType>
        </xsd:element>

        <xsd:element name="securityAttribute" minOccurs="0" maxOccurs="unbounded">
          <xsd:complexType>
            <xsd:simpleContent>
              <xsd:extension base="xsd:string">
                <xsd:attribute name="name" type="xsd:string" use="required"/>
                <xsd:attribute name="grant" type="xsd:boolean" default="true"/>
              </xsd:extension>
            </xsd:simpleContent>
          </xsd:complexType>
        </xsd:element>

      </xsd:sequence>
    </xsd:complexType>

  </xsd:element>
</xsd:schema>

Overview of XML Connector Framework

Oracle SES provides an XML connector framework to crawl any repository that provides an XML interface to its contents. The connectors for Oracle Content Server, Oracle E-Business Suite 12, and Siebel 8 use this framework.

Every document in a repository is known as an item. An item contains information about the document, such as author, access URL, last modified date, security information, status, and contents.

A set of items is known as a feed or channel. To crawl a repository, an XML document must be generated for each feed. Each feed is associated with information such as feed name, type of the feed, and number of items.

To crawl a repository with the XML connector, place data feeds in a location accessible to Oracle SES over one of these protocols: HTTP, FTP, or FILE. Then generate an XML Configuration File that contains information such as feed location and feed type. Create a source with a source type that is based on this XML connector and trigger the crawl from Oracle SES to crawl the feeds.

There are two types of feeds:

Guidelines for the target repository generating the XML feeds:

XML Configuration File

The configuration file is an XML file conforming to a set schema.

The following is an example of a configuration file to set up an XML-based source:

<rsscrawler xmlns="http://xmlns.oracle.com/search/rsscrawlerconfig">  
     <feedLocation>ftp://my.host.com/rss_feeds</feedLocation>
     <feedType>directoryFeed</feedType>
     <errorFileLocation>/tmp/errors</errorFileLocation>
     <securityType>attributeBased</securityType> 
     <sourceName>Contacts</sourceName>
     <securityAttribute name="EMPLOYEE_ID" grant="true"/> 
</rsscrawler> 
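A configuration file such as this one can be checked with a short script before the source is created. The following Python sketch reads the example and verifies that feedType and securityType use values enumerated in the configuration file XML schema definition, applying the schema's defaults when an element is absent. The function name read_config is hypothetical, and the check is a convenience, not a substitute for full schema validation.

```python
import xml.etree.ElementTree as ET

NS = {"c": "http://xmlns.oracle.com/search/rsscrawlerconfig"}
# Allowed values as enumerated in the configuration file schema.
FEED_TYPES = {"controlFeed", "dataFeed", "directoryFeed"}
SECURITY_TYPES = {"identityBased", "attributeBased", "noSecurity"}

CONFIG = """<rsscrawler xmlns="http://xmlns.oracle.com/search/rsscrawlerconfig">
  <feedLocation>ftp://my.host.com/rss_feeds</feedLocation>
  <feedType>directoryFeed</feedType>
  <errorFileLocation>/tmp/errors</errorFileLocation>
  <securityType>attributeBased</securityType>
  <sourceName>Contacts</sourceName>
  <securityAttribute name="EMPLOYEE_ID" grant="true"/>
</rsscrawler>"""


def read_config(xml_text):
    """Parse the configuration and check the enumerated values."""
    root = ET.fromstring(xml_text)
    feed_type = root.findtext("c:feedType", default="dataFeed", namespaces=NS)
    security_type = root.findtext(
        "c:securityType", default="noSecurity", namespaces=NS
    )
    if feed_type not in FEED_TYPES:
        raise ValueError(f"unknown feedType: {feed_type}")
    if security_type not in SECURITY_TYPES:
        raise ValueError(f"unknown securityType: {security_type}")
    return {
        "feedLocation": root.findtext("c:feedLocation", namespaces=NS),
        "feedType": feed_type,
        "securityType": security_type,
        "securityAttributes": [
            (e.get("name"), e.get("grant", "true"))
            for e in root.findall("c:securityAttribute", NS)
        ],
    }


config = read_config(CONFIG)
print(config)
```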



Setting Up Mailing List Sources

A mailing list source enables users to search messages that were sent to a mailing list on an IMAP server.

The Oracle SES crawler is IMAP4 compliant. To crawl mailing list sources, you need an IMAP e-mail account. Oracle recommends that you create an e-mail account that is used solely for Oracle SES to crawl mailing list messages. The crawler is configured to crawl one IMAP account for all mailing list sources. Therefore, all mailing list messages to be crawled must be found in the Inbox of the e-mail account specified on this page. This e-mail account should be subscribed to all the mailing lists. New postings for all the mailing lists are sent to this single account and subsequently crawled.

Messages deleted from the global mailing list e-mail account are not removed from the Oracle SES index. The mailing list crawler deletes messages from the IMAP e-mail account as it crawls, so the next time the account is crawled, the previously crawled messages no longer exist there. Any new messages in the account are added to the index and then deleted from the account. This keeps the global mailing list IMAP account clean, and the Oracle SES index serves as a complete archive of all the mailing list messages.

The following procedures identify the basic steps for setting up a mailing list source using the Oracle SES Administration GUI. For more information on each page, click Help.

To create a mailing list source: 

  1. Enter the global mailing list settings:

    1. On the Global Settings page, choose Mailing List Settings under Sources to display the Global Mailing List Settings page.

    2. Complete the following fields. Click Help for additional information.

      User Name: IMAP e-mail account that is used to crawl the messages. This user must be on all of the mailing lists identified as a mailing list source.

      Password: Password for User Name.

      IMAP Server: Address of the IMAP server, such as mail.example.com.

    3. Click Apply.

  2. On the Home page, select the Sources secondary tab to display the Sources page.

  3. For Source Type, select Mailing List.

  4. Click Create to display the Create Mailing List Source page.

  5. Complete the following fields. Click Help for additional information.

    • Source Name: Name that you assign to this mailing list source.

    • Mailing List: Name of the mailing list to be searched, such as news@example.com.

  6. Click Create.

  7. Follow the steps for crawling and indexing in "Getting Started Basics for the Administration GUI" for the mailing list schedule.

Mailing List Attributes

Oracle SES crawls and indexes these search attributes.

  • Author

  • Title

  • Subject

  • Language

  • LastModifiedDate


Setting Up Oracle Collaboration Suite E-Mail Sources

Oracle Collaboration Suite 10g Mail (Oracle Mail) implements the IMAP protocol, which Oracle SES uses to retrieve data. You must log in to the mail server with a user name and password to retrieve information. The Oracle Collaboration Suite mail server has a flag that allows the administrator to crawl the mail of all users. The IMAP connector uses this feature to crawl all mail of all users through the mail server's administration login.

Creating an Oracle Collaboration Suite E-Mail Source

Create an Oracle Collaboration Suite E-Mail source on the Home - Sources page. Select Oracle Collaboration Suite E-Mail from the Source Type list, and click Create.

Enter values for the following parameters: 

  • Email Server Address: The IP address or DNS name of the IMAP e-mail server to be crawled, with the port number. This also specifies if the e-mail server follows IMAP or IMAPS protocol. Required.

    Use the format:

    [imap | imaps]://IPaddress:portNumber

    An exception is thrown if this parameter is null. If the server address is incorrect, then an exception is logged at the time of accessing the server.

  • Email Server Admin User: The administration user name to access the e-mail server. Required.

  • Email Server Admin Password: The password of the e-mail admin user. Required.

  • Authentication Attribute: Attribute used to validate the user. This varies based on the identity plug-in used for authentication. Oracle Collaboration Suite E-Mail uses Oracle Internet Directory for authentication, so set this parameter to mail.

  • LDAP Server: The LDAP server information (IP address or DNS name, and so on).

  • LDAP Server Port: The LDAP server port number.

  • LDAP Base: The domain to be searched; for example, dc=oracle, dc=com.

  • LDAP Query: The query string defining the users whose e-mails must be crawled. This parameter is used for user-level partitioning.

    For example, to crawl only users with names beginning with A and having an e-mail in the domain us.example.com, the query is (&(cn=A*)(mail=*@us.example.com)).

  • LDAP Admin User Name: The administrator user name of the LDAP server. Required.

  • LDAP Admin Password: The password of the admin user of the LDAP server.

  • Days to which the crawling needs to be done: Specifies the number of days earlier to which the crawling must be done. The current date (time of crawl) is the base. For example, a value of 7 specifies crawling messages that are seven or more days old. Today is the default value.

  • Days from which crawling needs to be done: The number of days earlier from which the crawling is done. The current date (time of crawl) is the base. For example, a value of 200 specifies crawling messages with dates that are 200 or fewer days old. All mail is the default value.

  • Folders to crawl: The comma-delimited list of folders to be crawled. '*' means crawl all folders. Other valid values are INBOX, sent, and trash. This does not support regular expressions.

  • Folders not to crawl: The comma-delimited list of folders not to be crawled. This list is considered only if the Folders to crawl parameter has the * wildcard as its value. Valid values are INBOX, sent, and trash. This parameter does not support regular expressions.

  • Remove Deleted messages from Index: Indicates whether index entries for deleted mails are removed during incremental recrawls. Valid values are yes and no. Any other value is considered to be yes.

  • Display URL template: The display URL to be used for viewing the documents. This should have the placeholder for e-mail or user ID. For example, to see the full e-mail address in the display URL, enter the following:

    http://<>/um/templates/message_list.uix?state=message_list&cAction=openmessage&message_wmuid=$EMAIL

    To see the user ID, enter the following:

    http://<>/um/templates/message_list.uix?state=message_list&cAction=openmessage&message_wmuid=$UID

  • Email Server Version: The version of the e-mail server to be crawled. Valid values are ocs10g and beehive.

  • Revisit Skipped Attachments: Controls whether the crawler revisits attachments that were skipped in earlier crawls because they did not meet the document type inclusion rules. This setting provides an alternative to a force recrawl after changing the document type inclusion rules. Set to TRUE to revisit skipped attachments, or set to FALSE otherwise (default). The skipped attachments must have been crawled in Oracle SES 11.1.2.2 or later to be revisited.
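
Two of the parameters above are easy to misread: the Email Server Address format, and the pair of "Days to" and "Days from" values that together bound the age of the messages crawled. The following Python sketch illustrates one reading of them. The function names are hypothetical, and the way the connector itself measures message age (elapsed time or calendar days) is not stated in this guide, so the elapsed-time comparison here is an assumption.

```python
from datetime import datetime, timedelta
from urllib.parse import urlsplit


def parse_server_address(address):
    """Split '[imap | imaps]://IPaddress:portNumber' into (scheme, host, port).

    Hypothetical helper: raises ValueError for a missing or malformed value.
    """
    if not address:
        raise ValueError("Email Server Address is required")
    parts = urlsplit(address)
    if parts.scheme not in ("imap", "imaps"):
        raise ValueError("scheme must be imap or imaps")
    if not parts.hostname or parts.port is None:
        raise ValueError("host and port are both required")
    return parts.scheme, parts.hostname, parts.port


def in_crawl_window(message_date, now, days_to=0, days_from=None):
    """Return True if a message falls inside the crawl window.

    days_to:   messages must be at least this many days old (0 = today).
    days_from: messages must be at most this many days old (None = all mail).
    Assumption: age is measured as elapsed time, with inclusive bounds.
    """
    age = now - message_date
    if age < timedelta(days=days_to):
        return False
    if days_from is not None and age > timedelta(days=days_from):
        return False
    return True


now = datetime(2011, 5, 17)
print(parse_server_address("imaps://192.0.2.10:993"))
# With days_to=7 and days_from=200, a 10-day-old message is crawled
# and a 3-day-old message is not.
print(in_crawl_window(now - timedelta(days=10), now, days_to=7, days_from=200))
print(in_crawl_window(now - timedelta(days=3), now, days_to=7, days_from=200))
```

With the documented defaults (Today and All mail), every message is inside the window.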


Setting Up Database Sources

With a database source, you can crawl any JDBC-enabled database. A database source can crawl database content projected as a view or query. Each record in the view or query result set is interpreted as a document. You can create public database sources or secure database sources.

Required Columns in Database Sources

The view or query to be crawled must contain the columns described in Table 8-9. All column names must be in upper case.

Table 8-9 Database Source Required Columns

| Column | Type | Description |
|---|---|---|
| CONTENT | VARCHAR2 or CLOB | Document content. |
| KEY | VARCHAR2 or RAW | Key to identify the record in the record set. You can use a custom name for this column by modifying drivers.properties. See "Configuring the JDBC Driver". |
| LANG | VARCHAR2 | Document language in ISO 639-1 language code; for example, en for English or ja for Japanese. |
| LASTMODIFIEDDATE | DATE | Last modified date of the document. If you do not have a column for the mandatory LastModifiedDate attribute, use a constant date value in the SQL query for the source. Use a format that the getTimestamp method of the corresponding JDBC driver accepts without errors. Incremental changes to records are not picked up by re-crawls, so always schedule a full crawl. |
| URL | VARCHAR2 | Display URL for the document. The value for this column cannot be null. This connector requires that there is URL-based access to the records in the result set of the view or query. |
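
Before creating the source, it can help to check the column list of the view or query against the required columns. The following Python sketch shows one way to do that. The function name and the key_name argument are hypothetical, and matching is case-sensitive because column names must be in upper case.

```python
# Required column names as listed in Table 8-9.
REQUIRED_COLUMNS = ("CONTENT", "KEY", "LANG", "LASTMODIFIEDDATE", "URL")


def missing_required_columns(column_names, key_name="KEY"):
    """Return the required columns that are absent, in table order.

    key_name stands in for a custom key column name (hypothetical argument).
    The comparison is case-sensitive, so 'content' does not satisfy 'CONTENT'.
    """
    present = set(column_names)
    required = [key_name if col == "KEY" else col for col in REQUIRED_COLUMNS]
    return [col for col in required if col not in present]


# A query that aliases its key column as ID satisfies the check
# when ID is given as the key name.
print(missing_required_columns(
    ["ID", "CONTENT", "URL", "LASTMODIFIEDDATE", "LANG"], key_name="ID"))

# Lower-case names and an absent date column are both reported.
print(missing_required_columns(["content", "KEY", "LANG", "URL"]))
```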


Optional Columns in Database Sources

The view or query can contain the optional columns described in Table 8-10. Any other column is considered an attribute of the document.

If the query or view contains CONTENT together with an attachment or attachment link column, then only one column is treated as the document content. The first column present in the following order is used:

  1. ATTACHMENT_LINK

  2. ATTACHMENT

  3. CONTENT

Even if the ATTACHMENT_LINK or ATTACHMENT column is specified in the query, you should include the mandatory CONTENT column. However, the content of ATTACHMENT_LINK or ATTACHMENT is indexed as document content.
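
The precedence rule can be expressed as a short lookup. This Python sketch is only an illustration of the ordering described above; the function name is hypothetical and it works on column names, not on Oracle SES itself.

```python
# Order in which a column is chosen as the document content.
CONTENT_PRECEDENCE = ("ATTACHMENT_LINK", "ATTACHMENT", "CONTENT")


def content_column(column_names):
    """Return the column whose value is indexed as document content."""
    present = set(column_names)
    for name in CONTENT_PRECEDENCE:
        if name in present:
            return name
    return None  # CONTENT is mandatory, so this indicates an invalid query


print(content_column(["KEY", "CONTENT", "ATTACHMENT", "URL"]))
print(content_column(["KEY", "CONTENT", "ATTACHMENT", "ATTACHMENT_LINK"]))
```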

Table 8-10 Database Source Optional Columns

| Column | Type | Description |
|---|---|---|
| ATTACHMENT | BLOB | Binary attachments for the document. |
| ATTACHMENT_LINK | VARCHAR2 | A link to the attachment for the document. HTTP, HTTPS, FILE, and FTP links are valid. |
| CONTENTTYPE | VARCHAR2 | Content type of the document; for example, "text/html" for HTML documents, "application/pdf" for PDF documents, or "application/msword" for Microsoft Word documents. Leave blank when the content type is unknown, or when it varies so much that it is not feasible to specify the content type for each document individually. |
| PATH | VARCHAR2 | Path to the document. It is used in the browse feature. It can represent the organizational hierarchy of the document. For example, level1#level2#level3. |
| TITLE | VARCHAR2 | Title of the document to be displayed in the Oracle SES search result page. |
| LMD_TIMEZONE | VARCHAR2 | Specifies the time zone for the date specified in LASTMODIFIEDDATE; for example, CST. Oracle SES converts the last modified date from the specified time zone to the Oracle SES time zone. If the time zone is not specified, then the date is considered to be in the Oracle SES time zone. |


Configuring the JDBC Driver

Depending on your database source, you may need to configure the JDBC driver.

To crawl any third-party database:  

  1. Download the appropriate JDBC driver jar for JRE 1.6 into ORACLE_HOME/search/lib/plugins/oracleapplications.

  2. Add the JRE 1.6 JDBC driver jar file name to the JDBC Driver Class parameter, as described in Table 8-11.

  3. Add the JRE 1.6 JDBC driver jar file name to the classpath in MANIFEST.MF of appsjdbc.jar and DBCrawler.jar.

  4. Restart the middle tier.

For a key attribute that is not named KEY: 

  1. When configuring the database connector, specify the column name in the Key Attribute Name parameter, as described in Table 8-11.

  2. In the crawling query, use the key attribute name as the alias for the key value column name. In this example, ID was entered as the value of the Key Attribute Name parameter and is the alias for KEYVAL:

    SELECT keyval id, content, url, lastmodifieddate, lang FROM sales_only
    

Query File XML Schema Definition

The following is the XSD that defines the format of the XML query file.

<?xml version="1.0" encoding="windows-1252" ?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"  xmlns="http://xmlns.oracle.com/ses/sqlconnector/detail-attribute-queries" targetNamespace="http://xmlns.oracle.com/ses/sqlconnector/detail-attribute-queries"  elementFormDefault="qualified">
  <xsd:complexType name="sqlQueriesType">
    <xsd:annotation>
      <xsd:documentation>
        Specify detail and attribute queries as a source parameter for 
        each document fetched by the parent query.
      </xsd:documentation>
    </xsd:annotation>
    <xsd:sequence>
      <xsd:element name="attachmentQueries" maxOccurs="1" minOccurs="0">
        <xsd:annotation>
          <xsd:documentation>
            Specify detail queries to fetch detail records for each document
            represented by the parent record. The parent records, fetched by 
            the parent query, are specified as a source parameter. Each record 
            in the document (parent) query can be associated with several detail
            (child) records. Each of these child records has a single column
            specifying the content that will be indexed as attachment to the
            parent document. The child query should select a single column, and
            the WHERE clause should have bind variables of the form
            ##PARENT ATTR##, where the value of PARENT ATTR from the parent
            record is substituted while executing the detail query.
          </xsd:documentation>
        </xsd:annotation>
        <xsd:complexType>
          <xsd:sequence>
            <xsd:element name="query" maxOccurs="unbounded" minOccurs="1">
             <xsd:complexType>
                <!--Attribute to specify whether the contents retrieved by the
                  query is inline attachment or link to an attachment. The value
                  "true" specifies that the content is a link to an attachment
                  and "false" indicates inline attachment. Default value is
                  false.-->                                    
                <xsd:attribute name="link" default="false"/>
                <!--Content type of the attachment. If no value is specified, 
                  SES will auto-detect the content type.-->
                <xsd:attribute name="contenttype" default="null"/>
              </xsd:complexType>
            </xsd:element>
          </xsd:sequence>
        </xsd:complexType>
      </xsd:element>
      <xsd:element name="attributeQueries" maxOccurs="1" minOccurs="0">
        <xsd:annotation>
          <xsd:documentation>
            Specify queries to retrieve values of attributes of the parent
            document. Use this feature if the attribute can contain multiple
            values for a document. If the attribute is a single-valued 
            attribute, then it can be specified in the parent query. The WHERE
            clause should have bind variables of the form ##PARENT ATTR##,
            where the value of PARENT ATTR from the parent record is substituted
            while executing the query.
          </xsd:documentation>
        </xsd:annotation>
        <xsd:complexType>
          <xsd:sequence>
            <xsd:element name="query" maxOccurs="unbounded" minOccurs="1"/>
          </xsd:sequence>
        </xsd:complexType>
      </xsd:element>
    </xsd:sequence>
  </xsd:complexType>
  <xsd:element name="sqlQueries" type="sqlQueriesType"/>
</xsd:schema>
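
The schema describes detail queries whose WHERE clause carries placeholders of the form ##PARENT ATTR##, filled from each parent record. The following Python sketch illustrates that substitution idea only. It uses an in-memory SQLite database as a stand-in, rewrites each placeholder as a named bind parameter, and runs the detail query once per parent record. The table names, column names, and helper functions are all hypothetical, and how Oracle SES performs the substitution internally is not described in this guide.

```python
import re
import sqlite3

# Matches placeholders such as ##ID## (assumed to be a single identifier).
PLACEHOLDER = re.compile(r"##([A-Za-z_][A-Za-z0-9_]*)##")


def to_named_binds(detail_query):
    """Rewrite ##NAME## placeholders as :NAME bind parameters."""
    return PLACEHOLDER.sub(lambda m: ":" + m.group(1), detail_query)


def run_detail_query(conn, detail_query, parent_record):
    """Run a single-column detail query for one parent record (a dict)."""
    names = PLACEHOLDER.findall(detail_query)
    params = {name: parent_record[name] for name in names}
    return [row[0] for row in conn.execute(to_named_binds(detail_query), params)]


# Hypothetical parent and detail tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE docs (ID INTEGER, CONTENT TEXT);
    CREATE TABLE notes (DOC_ID INTEGER, NOTE TEXT);
    INSERT INTO docs VALUES (1, 'first');
    INSERT INTO docs VALUES (2, 'second');
    INSERT INTO notes VALUES (1, 'note a');
    INSERT INTO notes VALUES (1, 'note b');
    INSERT INTO notes VALUES (2, 'note c');
""")

detail = "SELECT NOTE FROM notes WHERE DOC_ID = ##ID## ORDER BY NOTE"
parents = conn.execute("SELECT ID, CONTENT FROM docs ORDER BY ID").fetchall()
for doc_id, content in parents:
    # Each parent record yields its own list of attachment values.
    print(doc_id, run_detail_query(conn, detail, {"ID": doc_id}))
```

Binding the parent value as a parameter, instead of pasting it into the SQL text, avoids quoting problems when the value contains special characters.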

Creating Public Database Sources

Public database sources have no security implemented in Oracle SES.

To create a public database source: 

  1. Create a database source on the Home - Sources page. Select Database from the Source Type list, and click Create.

  2. Enter the database source parameters as described in Table 8-11.

  3. Click Next.

  4. Set authorization to No Access Control List, and clear the authorization manager class name and jar name.

  5. Click Create to create the database source.

Table 8-11 Database Source Parameters

Parameter Value

Database Connection String

JDBC connection string for the database with content to be crawled. The JDBC string is driver-specific. For example, jdbc:oracle:thin:@server:port:SID

User ID

User ID to log in to the database specified in Database Connection String. This user ID must have access to the schema owning the view specified in View or the query specified in Query.

Password

Password to log in to the database specified in Database Connection String.

View

Table or view to be crawled. Specify either View or Query, not both.

JDBC Driver Class

JDBC driver class to connect to the database. For example, oracle.jdbc.driver.OracleDriver.

Leave blank to use the default driver:

  • Oracle Database: oracle.jdbc.driver.OracleDriver

  • SQL Server: com.microsoft.sqlserver.jdbc.SQLServerDriver

Key Attribute Name

Name of the KEY attribute in the crawling query/view. The default value is KEY.

Document Count

Maximum number of documents to be crawled before indexing. Enter -1 to crawl all documents before indexing.

Query File

Path to the XML file specifying the subqueries to crawl attachments and attributes of documents corresponding to every record in the main query. See "Query File XML Schema Definition".

Query

Query projecting the content to be crawled. Specify either View or Query, not both.

URL Prefix

String that precedes the content of the URL column and forms a display URL for the document.

Cache File

Prefix of a local file name in which the contents can be temporarily cached while crawling.

Path Separator

The character separating the tokens in the PATH of the document as returned by the query or view. It must be a single character, and it cannot be a space, a single or double quote, or a control character.

Parse Attributes

Enter true to extract the values of the attributes from the document content specified in the SOLUTION or CONTENT column. Enter false otherwise, or when the content is of type text/html.

In this example, attr1 and attr2 are extracted as attributes of the document with values 22 and 333 respectively:

<attr1>22</attr1> <attr2>333</attr2>

Content up to the first attribute is interpreted as the document content. The remaining portion is used to extract attributes only. In this example, only "page" is considered document content:

page<attr1>22</attr1> is <attr2>333</attr2> dispersed

Remove Deleted Documents

Enter true to remove deleted documents from the index; otherwise, enter false.

Attachment Link Authentication Type

Standard Java authentication type used by the application serving the link in the ATTACHMENT_LINK column. Enter one of these values:

  • PUBLIC: No authentication

  • DIGEST: Digest authentication

  • BASIC: Basic authentication

  • NATIVE: Native authentication in the source

Attachment Link User ID

User ID for accessing the links specified in the ATTACHMENT_LINK column. Required when the link targets are secure.

Attachment Link Password

Password for Attachment Link User ID.

Attachment Link Realm

Realm of the application serving the link in the ATTACHMENT_LINK column. Required when the link targets are secure.

Grant Security Attributes

Leave blank for public sources.

Deny Security Attributes

Leave blank for public sources.

JDBC Driver Class

JDBC driver class used to connect to the database. For example, oracle.jdbc.driver.OracleDriver.

Key Attribute Name

Name of the key column in the database source. The default value is KEY, as described in Table 8-9, "Database Source Required Columns".
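
The Parse Attributes behavior described in Table 8-11 can be illustrated with a small Python sketch: text before the first attribute tag is the document content, and the remainder is scanned only for attributes. The function name is hypothetical and the regular expression is a simplification that assumes flat, non-nested tags. The parser that Oracle SES actually uses is not described in this guide.

```python
import re

# A flat <name>value</name> pair; nested or repeated tags are not handled.
ATTRIBUTE = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)


def parse_attributes(raw):
    """Split raw column text into (document content, attribute dict)."""
    first = ATTRIBUTE.search(raw)
    if first is None:
        return raw, {}
    content = raw[:first.start()]
    # Text between and after the tags is ignored; only attributes are kept.
    attributes = {m.group(1): m.group(2)
                  for m in ATTRIBUTE.finditer(raw, first.start())}
    return content, attributes


print(parse_attributes("<attr1>22</attr1> <attr2>333</attr2>"))
print(parse_attributes("page<attr1>22</attr1> is <attr2>333</attr2> dispersed"))
```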


Defining User-Defined Security for Database Sources

Some attributes in the view or query being crawled must be identified as security attributes. The values of these attributes determine if a user is authorized to view a document. These attributes can be either GRANT attributes or DENY attributes.

To create a database source with user-defined security: 

  1. On the Home - Sources page, select Database from the Source Type list and click Create.

  2. Enter values for the parameters as described in Table 8-11. Specify the GRANT and DENY attributes as values for the Grant Security Attributes and Deny Security Attributes parameters, respectively. If there are multiple GRANT or DENY security attributes, then separate the attribute names with a space.

  3. Click Next.

  4. Enter values for the authorization plug-in parameters:

    • Authorization Database Connection String: JDBC connection string for the authorization database. The values of the security attributes to which a given user is authorized are retrieved from this database. The JDBC string is driver-specific.

    • User ID: User ID to log in to the authorization database.

    • Password: Password to log in to the authorization database.

    • Authorization Query: SQL query to retrieve the values of security attributes to which a given user is authorized. The SELECT clause of this query should have all the security attributes specified in Step 2 with identical names. This query can be of two types:

      • The query can return a single record for a given user. The value in each security attribute column should be a space-delimited list of values to which the user is authorized.

      • The query can return multiple records for a given user. The value in each security attribute column of every row of the result set of this query is interpreted as a single value.

        Specify a question mark (?) as the placeholder for the username in the query.

    • Single Record Query: Enter true if the authorization query returns a single record for a given user.

    • Authorization User ID Format: Format of the user ID to be used in the SQL query specified in Authorization Query. This format should be an authentication attribute of the active identity plug-in.

      For example, if Oracle SES is configured with the Oracle Internet Directory identity plug-in (which supports DN, nickname and e-mail address as authentication attributes), then this parameter can be specified as nickname. The nickname of the current user is then used in the SQL authorization query to build the security filter.

      If no value is specified for this parameter, then the user ID in the canonical form of the active identity plug-in is used in the authorization query to build the security filter.

  5. Click Create to create the database source.
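
The two forms of the authorization query in step 4 yield the same set of values per security attribute. The Python sketch below shows how each result shape could be interpreted, using an in-memory SQLite database as a stand-in for the authorization database. The table names, column names, and the authorized_values function are hypothetical; only the question mark placeholder and the space-delimited convention come from the steps above.

```python
import sqlite3


def authorized_values(conn, auth_query, username, attributes, single_record):
    """Collect the values a user is authorized for, per security attribute.

    attributes lists the security attribute names in SELECT order.
    single_record mirrors the Single Record Query parameter.
    """
    rows = conn.execute(auth_query, (username,)).fetchall()
    values = {name: set() for name in attributes}
    for row in rows:
        for name, cell in zip(attributes, row):
            if cell is None:
                continue
            if single_record:
                # One row per user: each cell is a space-delimited list.
                values[name].update(str(cell).split())
            else:
                # Many rows per user: each cell is a single value.
                values[name].add(str(cell))
    return values


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE auth_rows (USERNAME TEXT, AUTH_ID TEXT);
    INSERT INTO auth_rows VALUES ('SCOTT', 'A10');
    INSERT INTO auth_rows VALUES ('SCOTT', 'B20');
    CREATE TABLE auth_flat (USERNAME TEXT, AUTH_ID TEXT);
    INSERT INTO auth_flat VALUES ('SCOTT', 'A10 B20');
""")

multi = authorized_values(
    conn, "SELECT AUTH_ID FROM auth_rows WHERE USERNAME = ?",
    "SCOTT", ["AUTH_ID"], single_record=False)
single = authorized_values(
    conn, "SELECT AUTH_ID FROM auth_flat WHERE USERNAME = ?",
    "SCOTT", ["AUTH_ID"], single_record=True)
print(sorted(multi["AUTH_ID"]), sorted(single["AUTH_ID"]))
```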

Database Search Attributes

Database sources have no predefined attributes. The crawler collects attributes from columns defined during source creation. You must map the columns to the search attributes.

Example of Creating a Database Source With User-Defined Security

The document set to be crawled is in tables T1 and T2 as specified by the following query:

SELECT 
     T1.ID, 
     T1.DESCRIPTION, 
     T2.NAME, 
     T1.LAST_UPDATE_DATE, 
     T2.AUTH_ID, T1.HIERARCHY
FROM 
     T1, T2 
WHERE 
     T1.ID = T2.DOC_ID 

The document content is provided by the T1.DESCRIPTION column.

Each document has an HTTP access URL of the form http://my.company.com/docserver?doc_id=document_identifier.

The value of T2.AUTH_ID controls access to a document. For example, user SCOTT can access a document only if the value of T2.AUTH_ID for the document is in the list of AUTH_IDs for SCOTT as retrieved by the following query:

SELECT AUTH_ID FROM USER_AUTH A 
     WHERE A.USER='SCOTT' 

This source can be crawled as a database source type with the following source parameter values: