Oracle® Secure Enterprise Search Administrator's Guide 10g Release 1 (10.1.7) Beta, Part Number B32011-01
Beta Draft
Oracle Secure Enterprise Search (SES) provides uniform search capabilities over multiple repositories.
Oracle SES uses a crawler to collect data from these sources. The crawler supports a number of built-in source types, as well as a published plug-in (or connector) architecture for adding new types. Multiple Oracle SES instances can also share content through the federated source type.
Oracle SES supports the following built-in source types:
Web: A Web source represents the content on a specific Web site. Web sources make it easier to maintain ongoing crawls of specific Web sites.
Table: A table source represents content in an Oracle database table or view.
File: A file source is the set of documents that can be accessed through the file protocol.
E-mail: An e-mail source derives its content from e-mails sent to a specific e-mail address. When Oracle SES crawls an e-mail source, it collects e-mail from all folders set up in the e-mail account, including the Drafts, Sent Items, and Trash folders.
Mailing list: A mailing list source derives its content from e-mails sent to a specific mailing list.
OracleAS Portal: An OracleAS Portal source allows users to search across multiple OracleAS Portal repositories, such as Web pages, files on disk, and pages on other OracleAS Portal instances.
Federated: A federated source represents a connection to a remote Oracle SES instance or application that maintains its own index. Oracle SES can issue a search to this remote instance, and the remote instance can return results.
Oracle Calendar: An Oracle Calendar source represents the content in an Oracle Calendar repository. Oracle SES can crawl content (meetings and events) and metadata in Oracle Calendar and provide secure full-text search over an Oracle Calendar repository. You can configure more than one crawler thread. Deleted items are removed from the index during incremental crawling. You can search by title, author, start or end date (year, month, day), event type, status, or location.
Oracle Content Database: An Oracle Content Database source represents the content in an Oracle Content Database repository.
Oracle SES can crawl documents and metadata in Oracle Content Database and provide secure full-text search over an Oracle Content Database repository. It also provides metadata search and browse, which lets users restrict a search to a specific subfolder in the hierarchy. Documents in Oracle Content Database are organized into folders. Oracle SES navigates the folder hierarchy to crawl all documents in Oracle Content Database, indexes them, and stores their metadata and access information in Oracle SES so that search results respect each end user's permissions.
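The folder navigation and subfolder-restricted search described above can be pictured with a small Python sketch. It assumes a hypothetical in-memory folder tree; `crawl_folders` and `search_in_subfolder` are illustrative names, not part of any Oracle API. The sketch shows only the idea: a depth-first walk that records each document's path, and a search that keeps only paths under the chosen subfolder.

```python
from typing import Dict, List, Union

# Hypothetical in-memory folder tree: each folder maps a name either to a
# nested folder (dict) or to a document body (str). This is an illustration,
# not the Oracle Content Database API.
Folder = Dict[str, Union["Folder", str]]


def crawl_folders(folder: Folder, path: str = "") -> Dict[str, str]:
    """Walk the hierarchy depth-first and return {document_path: body}."""
    docs: Dict[str, str] = {}
    for name, entry in folder.items():
        entry_path = f"{path}/{name}"
        if isinstance(entry, dict):
            # Descend into the subfolder.
            docs.update(crawl_folders(entry, entry_path))
        else:
            docs[entry_path] = entry
    return docs


def search_in_subfolder(docs: Dict[str, str], subfolder: str, term: str) -> List[str]:
    """Return sorted paths under `subfolder` whose body contains `term` (case-insensitive)."""
    prefix = subfolder.rstrip("/") + "/"
    return sorted(
        path for path, body in docs.items()
        if path.startswith(prefix) and term.lower() in body.lower()
    )


tree = {
    "Projects": {
        "Alpha": {"spec.txt": "Alpha design spec"},
        "Beta": {"notes.txt": "Beta spec draft"},
    },
    "HR": {"policy.txt": "Leave policy"},
}
docs = crawl_folders(tree)
print(search_in_subfolder(docs, "/Projects/Alpha", "spec"))  # ['/Projects/Alpha/spec.txt']
```

Searching from `/Projects` instead would return both project documents, which is the effect of browsing up one level in the hierarchy.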
Oracle SES supports incremental crawling; that is, it crawls and indexes only the documents that have changed since the last crawl. A document is re-crawled if either its content or its direct security access information changes. A document is also re-crawled if it is moved within Oracle Content Database, so that end users must access it through a different URL. Deleted documents are removed from the index during incremental crawling.
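A minimal sketch of the incremental-crawl decision, assuming each document has a stable identifier and that the crawler keeps a snapshot of its content hash, direct ACL, and URL from the previous crawl. `DocState` and `plan_incremental_crawl` are hypothetical names, and treating newly added documents as "changed" is also an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Set, Tuple


@dataclass(frozen=True)
class DocState:
    """Hypothetical snapshot of what the crawler remembers about one document."""
    content_hash: str       # digest of the document content
    acl: FrozenSet[str]     # users/groups granted direct access
    url: str                # URL end users use to reach the document


def plan_incremental_crawl(
    previous: Dict[str, DocState], current: Dict[str, DocState]
) -> Tuple[Set[str], Set[str]]:
    """Compare two snapshots keyed by document id; return (to_recrawl, to_delete)."""
    to_recrawl: Set[str] = set()
    for doc_id, state in current.items():
        old = previous.get(doc_id)
        if old is None:
            to_recrawl.add(doc_id)                      # new document (assumption)
        elif (old.content_hash != state.content_hash    # content changed
              or old.acl != state.acl                   # direct security changed
              or old.url != state.url):                 # moved: new URL
            to_recrawl.add(doc_id)
    # Documents that disappeared from the repository are dropped from the index.
    to_delete = set(previous) - set(current)
    return to_recrawl, to_delete
```

Unchanged documents fall through both checks, which is what keeps an incremental crawl cheaper than a full one.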
The following diagram illustrates the Oracle SES architecture.
Oracle SES includes the following components:
The Oracle SES crawler is a Java process that runs on a set schedule. When activated, the crawler spawns a configurable number of processor threads that fetch information from various sources and index the documents. End-user searches run against this index.
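The worker-thread model described above can be sketched in Python with `concurrent.futures.ThreadPoolExecutor`. This is not the Oracle SES crawler itself; the `fetch` callable, the `num_threads` parameter, and the whitespace-based inverted index are simplifying assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from threading import Lock
from typing import Callable, Dict, Iterable, Set


def run_crawl(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    num_threads: int = 4,
) -> Dict[str, Set[str]]:
    """Fetch every URL on a pool of worker threads and build an inverted index.

    `fetch` stands in for a source-specific fetcher and `num_threads` for the
    configurable processor-thread count; both are illustrative assumptions.
    """
    index: Dict[str, Set[str]] = {}
    lock = Lock()  # the shared index is updated from several worker threads

    def process(url: str) -> None:
        body = fetch(url)
        terms = set(body.lower().split())
        with lock:
            for term in terms:
                index.setdefault(term, set()).add(url)

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # list() waits for every task and re-raises any worker exception.
        list(pool.map(process, urls))
    return index


pages = {"doc1": "Oracle secure search", "doc2": "search engine"}
index = run_crawl(pages, pages.__getitem__, num_threads=2)
print(sorted(index["search"]))  # ['doc1', 'doc2']
```

Raising `num_threads` lets more fetches overlap, which mainly helps when sources are slow to respond.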
The crawler maps links and analyzes relationships. Whenever it encounters embedded non-HTML or non-textual documents during a crawl, it automatically detects the document type, then filters and indexes the document.
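The detect-then-filter step might look roughly like the following sketch. The signature checks, the `FILTERS` registry, and the regex tag stripper are illustrative assumptions, not Oracle SES's own filtering technology. A document with no matching filter is skipped (the function returns `None`).

```python
import mimetypes
import re
from typing import Callable, Dict, Optional


def detect_type(name: str, data: bytes) -> str:
    """Guess a MIME type, checking content signatures before the file name."""
    if data.startswith(b"%PDF-"):
        return "application/pdf"
    head = data[:512].lstrip().lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "text/html"
    guessed, _ = mimetypes.guess_type(name)
    return guessed or "application/octet-stream"


def html_to_text(data: bytes) -> str:
    # Rough tag stripper for illustration; a real filter would use an HTML parser.
    text = re.sub(r"<[^>]+>", " ", data.decode("utf-8", "replace"))
    return " ".join(text.split())


def plain_to_text(data: bytes) -> str:
    return data.decode("utf-8", "replace")


# Hypothetical filter registry: MIME type -> function producing indexable text.
FILTERS: Dict[str, Callable[[bytes], str]] = {
    "text/html": html_to_text,
    "text/plain": plain_to_text,
}


def filter_document(name: str, data: bytes) -> Optional[str]:
    """Detect the type, then run the matching filter; None means no filter exists."""
    text_filter = FILTERS.get(detect_type(name, data))
    return text_filter(data) if text_filter else None


print(filter_document("page.html", b"<html><body><p>Hello <b>SES</b></p></body></html>"))  # Hello SES
```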
Use the Oracle Secure Enterprise Search administration tool to manage and monitor Oracle SES components. For example, you can use it to:
Define sources and crawling scope
Configure the search application
Monitor crawl progress and search performance
Oracle Secure Enterprise Search provides several APIs. For example, the Crawler Plug-in API enables you to create a custom secure crawler plug-in (or connector) to meet your requirements. With the Web Services API, you can integrate Oracle SES search capabilities into your search application.
Oracle SES also provides an out-of-the-box search application.
Information in an enterprise can be spread across Web pages, databases, mail servers or other collaboration software, document repositories, file servers, and desktops. Oracle SES searches all your data through the same interface. Oracle SES is fully globalized and works with 27 languages including Chinese, Japanese, Korean, Arabic, and Hebrew.
This section introduces a few of the features in Oracle SES.
See Also: Chapter 3, "Understanding Crawling and Searching" for more features relating to the crawler
Much of the information within an organization is publicly accessible: anyone is allowed to view it, so it is relatively easy for a crawler to find and index that information.
However, other sources are protected and might be viewable only by certain users or groups of users. For example, users can search within their own e-mail folders, but they should not be able to search anyone else's e-mail.
For protected sources, the Oracle SES crawler indexes documents together with their access control lists. When end users search, Oracle SES returns only the documents they have privileges to view.
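Query-time filtering against indexed access control lists can be sketched as follows. The sketch assumes each indexed document carries the set of principals (users or groups) allowed to view it. As a simplification, an empty ACL marks a public document. `IndexedDoc` and `secure_search` are hypothetical names.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class IndexedDoc:
    doc_id: str
    text: str
    acl: FrozenSet[str]  # principals allowed to view; empty = public (assumption)


def secure_search(docs: List[IndexedDoc], term: str, user: str, groups: Set[str]) -> List[str]:
    """Return ids of matching documents that `user` (or one of `groups`) may view."""
    principals = {user} | groups
    hits: List[str] = []
    for doc in docs:
        if term.lower() not in doc.text.lower():
            continue  # not a text match
        if doc.acl and not (doc.acl & principals):
            continue  # protected document the user has no privileges for
        hits.append(doc.doc_id)
    return hits


docs = [
    IndexedDoc("wiki", "quarterly plan", frozenset()),
    IndexedDoc("alice-mail", "quarterly plan draft", frozenset({"alice"})),
    IndexedDoc("hr", "quarterly plan salaries", frozenset({"hr-group"})),
]
print(secure_search(docs, "plan", "alice", set()))  # ['wiki', 'alice-mail']
```

Because the ACL is stored with each indexed document, the check happens at query time without contacting the original repository.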
Oracle Secure Enterprise Search can search multiple Oracle SES applications, each with its own document repositories and index. It provides a unified framework for searching document repositories that are crawled, indexed, and maintained separately. Federated search runs a single query across all of these indexes and aggregates the results into one result list for the user. User credentials are passed along with the search so that each remote application can authenticate the user against its own document repository.
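A rough sketch of the fan-out and merge steps of federated search follows. Each remote instance is modeled as a hypothetical Python callable that receives the query and the user's credentials and returns `(doc_id, score)` pairs. This is not the Oracle SES federation protocol. The merge also assumes that scores from different instances are directly comparable, which a real deployment might need to normalize.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# Hypothetical remote interface: (query, credentials) -> [(doc_id, score), ...].
Remote = Callable[[str, Dict[str, str]], List[Tuple[str, float]]]


def federated_search(
    query: str,
    credentials: Dict[str, str],
    remotes: Dict[str, Remote],
    limit: int = 10,
) -> List[Tuple[str, str, float]]:
    """Send the query and credentials to every remote, then merge by score."""
    merged: List[Tuple[float, str, str]] = []
    with ThreadPoolExecutor(max_workers=max(1, len(remotes))) as pool:
        # Fan out: every remote receives the query and the user's credentials.
        futures = {name: pool.submit(fn, query, credentials) for name, fn in remotes.items()}
        for name, future in futures.items():
            for doc_id, score in future.result():
                merged.append((score, name, doc_id))
    # Aggregate into a single list, highest score first (assumes comparable scores).
    merged.sort(key=lambda row: row[0], reverse=True)
    return [(name, doc_id, score) for score, name, doc_id in merged[:limit]]


def hr_remote(query: str, creds: Dict[str, str]) -> List[Tuple[str, float]]:
    # This remote authenticates the user against its own repository.
    return [("hr/policy", 0.9)] if creds.get("user") == "alice" else []


def eng_remote(query: str, creds: Dict[str, str]) -> List[Tuple[str, float]]:
    return [("eng/spec", 0.95), ("eng/notes", 0.4)]
```

A query issued as `alice` returns results from both remotes, while a user that `hr_remote` does not recognize sees only the engineering results.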
The following diagram illustrates the Oracle SES federation architecture.
Oracle SES offers a Web services API that lets you integrate Oracle SES search capabilities into your search application.
Oracle SES provides an extensible crawler plug-in (or connector) framework that lets you crawl and index proprietary document repositories.
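The hypothetical Python sketch below illustrates the plug-in idea: the crawler depends only on a connector contract, and each connector knows how to read one repository. It does not reproduce the Oracle SES Crawler Plug-in API. `Connector`, `DictConnector`, `CrawledDocument`, and `crawl_with` are invented names used only for illustration.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Dict, FrozenSet, Iterator, List


@dataclass(frozen=True)
class CrawledDocument:
    url: str
    text: str
    acl: FrozenSet[str]  # principals allowed to view the document


class Connector(ABC):
    """Hypothetical plug-in contract: the crawler asks a connector for documents."""

    @abstractmethod
    def documents(self) -> Iterator[CrawledDocument]:
        ...


class DictConnector(Connector):
    """Toy connector over an in-memory stand-in for a proprietary repository."""

    def __init__(self, records: Dict[str, Dict[str, Any]]):
        self._records = records

    def documents(self) -> Iterator[CrawledDocument]:
        for key, record in self._records.items():
            yield CrawledDocument(
                url=f"dict://{key}",
                text=str(record["body"]),
                acl=frozenset(record.get("readers", ())),
            )


def crawl_with(connector: Connector) -> List[CrawledDocument]:
    """Crawler side: relies only on the Connector interface, not the repository."""
    return list(connector.documents())
```

Supporting a new repository then means writing one more `Connector` subclass; `crawl_with` stays unchanged.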