Oracle® Secure Enterprise Search Administrator's Guide 11g Release 1 (11.1.2.0.0) Part Number E14130-04
This chapter describes the basic components of Oracle Secure Enterprise Search: the sources, crawler, and user interfaces. It contains the following topics:
Oracle Secure Enterprise Search (Oracle SES) is a complete stacked application. Oracle Database 11g Release 1 (11.1.0.7) Enterprise Edition (EE) is installed with Oracle SES. Use of Oracle Database EE is restricted to storing and managing the search index, metadata, cache, and Oracle SES configuration information. The Oracle WebLogic Server is included with Oracle SES. This embedded version is provided solely to run the Oracle SES user interfaces and APIs.
Use of the Oracle SES home software is restricted to supporting the Oracle SES database repository; no other databases created using the Oracle SES executables are supported. Oracle SES connectors listed on the Oracle price list may be licensed separately for use with the Oracle SES installation.
Some connectors shipped with Oracle SES require additional licensing fees. Contact Oracle sales for details.
Oracle Secure Enterprise Search enables secure, high-quality, easy-to-use search across all enterprise information assets. Key features include:
The ability to search and locate public, private, and shared content across intranet Web servers, databases, files on local disks or file servers, IMAP e-mail, document management systems, applications, and portals
Highly secure crawling, indexing, and searching
A simple, intuitive search interface leading to an excellent user experience
Excellent search quality, with the most relevant items for a query shown first, even when the query spans diverse public and private data sources
Analytics on search results and usage patterns
Sub-second query performance
Ease of administration and maintenance, leveraging existing IT expertise
See Also:
Oracle Secure Enterprise Search Installation Guide for requirements, tips, and information on getting started using Oracle SES
Oracle Technology Network for updated information on known issues, code samples, and best practices:
The Oracle Secure Enterprise Search Release Notes has version information and known issues
A collection of information is called a source. Each source has a type that identifies where the information is stored, such as on a Web site or in a database table. Oracle SES provides several built-in source types and an architecture for adding new, custom types.
Additionally, Oracle SES provides access to more third-party data repositories than any other enterprise search engine, without requiring you to write any additional code. Although these data sources are classified as user-defined source types, they are available in the same way as the built-in source types.
This guide organizes these user-defined source types into content management sources, collaboration sources, and applications sources. For information that is not stored in one of these predefined sources, you can use the Oracle SES extendable architecture to define a new data type.
Oracle SES also provides authorization cache sources for facilitating access to secure data.
Built-in Sources
Web: Represents the content on a specific Web site. Web sources facilitate maintenance crawling of specific Web sites.
Table: Represents content in a table or view in Oracle Database.
File: The set of documents that can be accessed through the file system protocol.
E-mail: Derives content from e-mails sent to a specific e-mail address. When Oracle SES crawls an e-mail source, it collects e-mail from all folders set up in the e-mail account, including Drafts, Sent Items, and Trash e-mails.
Mailing list: Derives its content from e-mails sent to a specific mailing list.
OracleAS Portal: Lets you search across multiple OracleAS Portal repositories, such as Web pages, files on disk, and pages on other OracleAS Portal instances.
Federated Sources: Enable you to share content across multiple Oracle SES instances.
Content Management Sources
EMC Documentum Content Server
FileNet Content Engine
FileNet Image Services
Hummingbird Document Management
IBM DB2 Content Manager
Microsoft SharePoint
Open Text Livelink
Oracle Content Database
Oracle Content Server (formerly Stellent Content Server)
You may need to install client libraries and obtain a license from the vendor for some content sources to work. For example, EMC Documentum requires installation of a compatible version of Documentum Foundation Classes (DFC), which is a Java library, on the computer running Oracle SES. Oracle SES does not ship with DFC.
Collaboration Sources
EMC Documentum eRoom
IBM Lotus Notes
Microsoft Exchange
Files in Microsoft NT file systems (NTFS)
Oracle Calendar
IMAP for OCS E-Mail Server
Oracle Applications Sources
Database
Oracle E-Business Suite
Siebel 7.8
Siebel 8
Authorization Sources
User Authorization Cache (UAC)
Federated UAC
See Also:
Oracle Secure Enterprise Search Release Notes for a list of supported platforms
Oracle SES includes the following components:
The Oracle Secure Enterprise Search Administration GUI enables you to manage and monitor Oracle SES components using a browser-based interface. These are among the tasks that you perform:
Define sources and crawling scope
Configure the search application
Monitor crawl progress and search quality
Customize search results
See Also:
Oracle SES administration tutorial for help understanding common administrator tasks:
http://st-curriculum.oracle.com/tutorial/SESAdminTutorial/index.htm
Oracle SES Administration GUI Help
Oracle SES uses a crawler to collect data from the sources. The Oracle SES crawler is a Java process activated by a schedule. When activated, the crawler spawns a configurable number of processor threads that fetch information from various sources and index the documents. This index is used for searching sources.
The crawler maps links and analyzes relationships. Whenever the crawler encounters embedded non-HTML or non-textual documents during crawling, it automatically detects the document type, then filters and indexes the document.
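The following is a minimal sketch of the general pattern described above: a configurable number of worker threads fetch documents and add them to a shared index. It is written in Python for brevity and is not the Oracle SES crawler, which is a Java process. The names `SOURCE`, `fetch`, and `crawl` are hypothetical, and the in-memory dictionary stands in for a real source.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
import threading

# Hypothetical in-memory source: document ID -> document text.
SOURCE = {
    "doc1": "oracle secure enterprise search",
    "doc2": "secure crawling and indexing",
    "doc3": "enterprise search crawler",
}


def fetch(doc_id):
    """Stand-in for fetching one document from a source."""
    return SOURCE[doc_id]


def crawl(doc_ids, num_threads=2):
    """Fetch documents with a pool of worker threads and build an inverted index."""
    index = defaultdict(set)   # term -> set of document IDs containing it
    lock = threading.Lock()    # protects the shared index

    def process(doc_id):
        text = fetch(doc_id)
        with lock:
            for term in text.split():
                index[term].add(doc_id)

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # Consuming the iterator waits for all workers and re-raises any error.
        list(pool.map(process, doc_ids))
    return index


index = crawl(SOURCE.keys(), num_threads=2)
```

The `num_threads` argument plays the role of the configurable thread count; the resulting `index` maps each term to the documents that contain it, which is the structure a search would consult.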
Figure 1-1 shows the crawler in relation to other Oracle SES components and a variety of data sources.
Figure 1-1 Crawler Collecting Information for Oracle SES
See Also:
Chapter 4, "Understanding Crawling"
Oracle Secure Enterprise Search provides several APIs. For example, with the Web Services API, you can integrate Oracle SES search capabilities into your search application. You can also customize the default Oracle SES ranking to create a more relevant search result list for your enterprise or configure clustering for customized applications.
The Crawler Plug-in API enables you to create a custom secure crawler plug-in (or connector) to meet your requirements. The Document Service API accepts input from documents and performs some operation on it. For example, you could create a document service for auditing or to show custom metatags.
Information in an enterprise can be spread across Web pages, databases, mail servers or other collaboration software, document repositories, file servers, and desktops. Oracle SES searches all your data through the same interface. Oracle SES is fully globalized and works with many languages including Chinese, Japanese, Korean, Arabic, and Hebrew.
This section introduces a few of the features in Oracle SES. It includes the following topics:
See Also:
Chapter 4, "Understanding Crawling" for more features relating to the crawler
Much of the information within an organization is publicly accessible. Anyone is allowed to view it. Therefore, it is relatively easy for a crawler to find and index that information.
However, there are other sources that are protected. These protected sources might be viewable only by certain users or groups of users. For example, while users can search in their own e-mail folders, they should not be able to search anyone else's e-mail.
For protected sources, the Oracle SES crawler indexes documents with the proper access control list. When end users perform a search, only documents that they have privileges to view are returned.
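As an illustration only, the idea of indexing each document with an access control list and filtering at query time can be sketched as follows. This is not how Oracle SES stores or evaluates ACLs; `IndexedDoc`, `DOCS`, `secure_search`, and the `PUBLIC` principal are hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexedDoc:
    doc_id: str
    text: str
    acl: frozenset  # principals allowed to view; "PUBLIC" means anyone


# Hypothetical index contents: one public document and two protected ones.
DOCS = [
    IndexedDoc("handbook", "company travel policy", frozenset({"PUBLIC"})),
    IndexedDoc("alice-mail", "travel itinerary for alice", frozenset({"alice"})),
    IndexedDoc("hr-report", "travel budget report", frozenset({"hr-group"})),
]


def secure_search(term, user, groups=()):
    """Return IDs of matching documents that the user is allowed to view."""
    principals = {"PUBLIC", user, *groups}
    return [
        d.doc_id
        for d in DOCS
        if term in d.text.split() and d.acl & principals
    ]
```

In this sketch, `secure_search("travel", "alice")` matches all three documents on text but returns only the public handbook and the user's own e-mail, because the HR report's ACL does not include the user or any of the user's groups.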
See Also:
"Enabling Secure Search"
Oracle SES can search multiple Oracle SES instances with their own document repositories and indexes. It provides a unified framework to search the different repositories that are crawled, indexed, and maintained separately.
Federated search allows a single query to run across all Oracle SES instances. It aggregates the search results to show one unified result list to the user. User credentials are passed along with the query so that each federation endpoint can authenticate the user against its own document repository.
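The flow described above can be sketched roughly as follows: the same query and user identity are sent to each endpoint, each endpoint applies its own access check, and the hits are merged into one list. This is a simplified illustration, not the Oracle SES federation protocol. The functions `make_endpoint` and `federated_search` are hypothetical, and merging by raw score assumes that scores from different instances are directly comparable, which may not hold in practice.

```python
def make_endpoint(name, docs):
    """Build a stand-in search endpoint.

    docs is a list of (doc_id, score, text, allowed_users) tuples.
    """
    def search(query, user):
        # Each endpoint checks the user against its own repository.
        return [
            {"instance": name, "doc_id": doc_id, "score": score}
            for doc_id, score, text, allowed in docs
            if query in text.split() and user in allowed
        ]
    return search


def federated_search(endpoints, query, user, limit=10):
    """Send one query to every endpoint and merge the hits into one ranked list."""
    hits = []
    for search in endpoints:
        hits.extend(search(query, user))
    hits.sort(key=lambda h: h["score"], reverse=True)
    return hits[:limit]


# Two hypothetical instances with separate repositories.
instance_a = make_endpoint("A", [
    ("a1", 0.9, "quarterly sales report", {"alice", "bob"}),
    ("a2", 0.4, "sales contacts", {"alice"}),
])
instance_b = make_endpoint("B", [
    ("b1", 0.7, "sales forecast", {"bob"}),
    ("b2", 0.8, "sales pipeline", {"alice"}),
])

results = federated_search([instance_a, instance_b], "sales", "alice")
```

Here `results` interleaves hits from both instances in a single list, and documents that the user cannot view on a given instance never leave that instance.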
Figure 1-2 illustrates the federation architecture and two options for an end user to connect through a browser to Oracle SES. Option 1 allows users to connect their browsers directly to Oracle SES using the end-user graphical interface. Option 2 retrieves results from Oracle SES through Web Services after arbitrary post-processing, such as changing the look-and-feel or embedding the results in a page. For this option, the browser connects to remote applications, which connect to the Web Services API.
Oracle SES provides an extensible crawler plug-in framework that lets you crawl and index proprietary document repositories. The Crawler Plug-in API enables you to create a custom secure crawler plug-in to meet your requirements. You can also create an identity plug-in and an authorization plug-in for crawling that data source.
See Also:
The Oracle Secure Enterprise Search home page at http://www.oracle.com/technology/products/oses/index.html for updated information on known issues, code samples, and best practices