Crawlers are extensible components that index documents from a specific type of document repository, such as Lotus Notes, Microsoft Exchange, Documentum, or Novell. A crawler imports only links to documents; the documents themselves remain in their original locations on the backend repository, while the portal indexes them for search. Portal users can search for and open crawled files through the portal Knowledge Directory. Because content stays on the backend, crawlers can provide access to files on protected systems without violating access restrictions.
In ALI version 5.x and above, crawlers are implemented as remote services that use XML over SOAP and HTTP. Using the IDK, you can create remote crawlers that access a wide range of backend systems. The purposes of a crawler are threefold: to iterate over a backend directory structure, to query documents for metadata, and to retrieve documents for indexing. The classes and interfaces that make up the IDK crawler API are summarized in the tables below.
Class | Description |
---|---|
ACLEntry | Bean-type class representing a security domain and user or group. |
ChildContainer | Bean-type class representing the ChildContainer SOAP RPC data type. |
ChildDocument | Bean-type class representing the ChildDocument data type. |
ChildRequestHint | A type-safe enumeration of portal child request queries. A ChildRequestHint can signal the backend to behave differently. |
ContainerMetaData | Stores metadata information about the Container object. |
CrawlerConstants | Constants related to crawlers. |
CrawlerInfo | A NamedValueMap for storing information about Crawler settings. |
CWSLogger | A simple logging implementation for passing string messages back to the portal. The CWS stub will instantiate this logger and pass it through to ContainerProvider.Initialize() . It is up to the developer to keep and use this object. |
DocumentFormat | Enumeration for portal DocumentFormat flag. The document format flag tells the backend whether to return the actual requested document or a metadata file more suitable for indexing. |
DocumentMetaData | Stores metadata information about a Document object. |
TypeNamespace | Enumeration for the portal's Document Type Map namespaces. The document type identifier and namespace help the portal decide how to interpret a document's metadata. Standard namespaces include "file" and "MIME." Custom crawlers may define their own document types under the OTHER namespace. |
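To make the DocumentFormat flag concrete, the sketch below shows how a backend might branch on it when serving a request. This is an illustrative stand-in only: the DocumentFetcher class, its method names, and the enum values RAW and INDEXABLE are assumptions for this example, not the actual IDK types, whose names and signatures may differ.

```java
// Illustrative only: stand-in types showing how a backend might honor the portal's
// DocumentFormat flag. The real IDK enumeration values and method names may differ.
enum DocumentFormat { RAW, INDEXABLE }

class DocumentFetcher {
    /** Return either the original file or an index-friendly rendition, per the flag. */
    byte[] fetch(String location, DocumentFormat format) {
        if (format == DocumentFormat.INDEXABLE) {
            // Return a lightweight text rendition suitable for the search index,
            // e.g. extracted text for a binary format the indexer cannot parse.
            return extractTextRendition(location);
        }
        // Otherwise return the actual document bytes for click-through.
        return readOriginalBytes(location);
    }

    private byte[] extractTextRendition(String location) {
        return ("text rendition of " + location).getBytes();
    }

    private byte[] readOriginalBytes(String location) {
        return ("raw bytes of " + location).getBytes();
    }
}
```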
Interface | Description |
---|---|
IContainer | An interface that allows the portal to systematically crawl a remote document repository by querying a container (node) for the documents and child containers (sub-nodes) it contains, along with the users and groups that can access it. |
IContainerProvider | An interface that allows the portal to iterate over a backend directory structure. |
ICrawlerLog | An instance of this interface is passed into the service's Initialize calls. To return messages to the portal job log, keep track of this instance and invoke the Log method with messages. The portal currently retrieves messages after IContainerProvider.AttachToContainer, but this behavior is subject to change in future versions. |
IDocument | An interface that allows the portal to query information about and retrieve documents from a backend repository. |
IDocumentProvider | An interface that allows the portal to specify documents for retrieval from a backend repository. |
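The following Java sketch shows the overall calling pattern these interfaces support: attach to a container, enumerate its documents and sub-containers, and attach to individual documents for metadata. All types and method signatures here are simplified stand-ins for illustration; the actual com.plumtree.remote.crawler interfaces use richer argument and return types (for example, ChildDocument and DocumentMetaData rather than plain strings).

```java
// Simplified, illustrative stand-ins for the IDK crawler interfaces described above.
// The real IDK types have different (richer) signatures; this sketch only shows the
// calling pattern followed during a crawl.
import java.util.List;

interface IContainerProvider {
    void initialize(String dataSourceSettings);      // assumed simplified form
    IContainer attachToContainer(String location);   // attach to one node in the hierarchy
    void shutdown();
}

interface IContainer {
    List<String> getChildContainerLocations();       // sub-nodes to crawl next
    List<String> getChildDocumentLocations();        // documents in this node
    List<String> getUsersAndGroups();                // ACL entries for this node
}

interface IDocumentProvider {
    IDocument attachToDocument(String location);
}

interface IDocument {
    String getMetaData();                            // metadata the portal indexes
}

/** Depth-first walk over the backend hierarchy, mimicking how the portal drives a crawler. */
final class CrawlDriver {
    private final IContainerProvider containers;
    private final IDocumentProvider documents;

    CrawlDriver(IContainerProvider containers, IDocumentProvider documents) {
        this.containers = containers;
        this.documents = documents;
    }

    void crawl(String startLocation) {
        IContainer node = containers.attachToContainer(startLocation);

        // Index links (not content) for every document in this container.
        for (String docLocation : node.getChildDocumentLocations()) {
            IDocument doc = documents.attachToDocument(docLocation);
            System.out.println(docLocation + " -> " + doc.getMetaData());
        }

        // Recurse into each sub-container.
        for (String child : node.getChildContainerLocations()) {
            crawl(child);
        }
    }
}
```

In a real crawler web service, the portal itself drives this traversal across SOAP calls; the CrawlDriver class here only stands in for that orchestration to show the order in which the interfaces are exercised.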