Crawlers are extensible components that index documents from a specific type of document repository, such as Lotus Notes, Microsoft Exchange, Documentum, or Novell. A crawler imports only links to documents; the documents themselves remain in their original locations on the backend repository, while the portal indexes them for search. Portal users can search for and open crawled files through the portal Knowledge Directory. Because content stays on the backend, crawlers can provide access to files on protected systems without violating access restrictions.
In ALI version 5.x and above, crawlers are implemented as remote services that use XML over SOAP and HTTP. Using the IDK, you can create remote crawlers that access a wide range of backend systems. The purposes of a crawler are threefold: to iterate over a backend directory structure, to query documents for metadata, and to retrieve documents for indexing. The classes and interfaces that make up the IDK crawler API are summarized in the tables below.
Class | Description |
---|---|
ACLEntry | Bean-type class representing a security domain and user or group. |
ChildContainer | Bean-type class representing the ChildContainer SOAP RPC data type. |
ChildDocument | Bean-type class representing the ChildDocument data type. |
ChildRequestHint | A type-safe enumeration of portal child request queries. A ChildRequestHint can signal the backend to behave differently. |
ContainerMetaData | Stores metadata information about the Container object. |
CrawlerConstants | Constants related to crawlers. |
CrawlerInfo | A NamedValueMap for storing information about Crawler settings. |
CWSLogger | A simple logging implementation for passing string messages back to the portal. The CWS stub will instantiate this logger and pass it through to ContainerProvider.Initialize() . It is up to the developer to keep and use this object. |
DocumentFormat | Enumeration for portal DocumentFormat flag. The document format flag tells the backend whether to return the actual requested document or a metadata file more suitable for indexing. |
DocumentMetaData | Stores metadata information about a Document object. |
TypeNamespace | Enumeration for the portal's Document Type Map namespaces. The document type identifier and namespace help the portal decide how to interpret a document's metadata. Standard namespaces include "file" and "MIME." Custom crawlers may define their own document types under the OTHER namespace. |
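To make the DocumentFormat flag concrete, the sketch below shows how a backend might branch on it when serving a request. This is an illustrative stand-in only: the DocumentFetcher class, its method names, and the enum values RAW and INDEXABLE are assumptions for this example, not the actual IDK types, whose names and signatures may differ.

```java
// Illustrative only: stand-in types showing how a backend might honor the portal's
// DocumentFormat flag. The real IDK enumeration values and method names may differ.
enum DocumentFormat { RAW, INDEXABLE }

class DocumentFetcher {
    /** Return either the original file or an index-friendly rendition, per the flag. */
    byte[] fetch(String location, DocumentFormat format) {
        if (format == DocumentFormat.INDEXABLE) {
            // Return a lightweight text rendition suitable for the search index,
            // e.g. extracted text for a binary format the indexer cannot parse.
            return extractTextRendition(location);
        }
        // Otherwise return the actual document bytes for click-through.
        return readOriginalBytes(location);
    }

    private byte[] extractTextRendition(String location) {
        return ("text rendition of " + location).getBytes();
    }

    private byte[] readOriginalBytes(String location) {
        return ("raw bytes of " + location).getBytes();
    }
}
```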
Interface | Description |
---|---|
IContainer | An interface that allows the portal to systematically crawl a remote document repository by querying a container (node) for the documents and child containers (sub-nodes) it contains, along with the users and groups that can access it. |
IContainerProvider | An interface that allows the portal to iterate over a backend directory structure. |
ICrawlerLog | An instance of this interface is passed into the service's Initialize calls. To return messages to the portal job log, keep track of this instance and invoke the Log method with messages. The portal currently retrieves messages after IContainerProvider.AttachToContainer, but this behavior is subject to change in future versions. |
IDocument | An interface that allows the portal to query information about and retrieve documents from a backend repository. |
IDocumentProvider | An interface that allows the portal to specify documents for retrieval from a backend repository. |
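The following Java sketch shows the overall calling pattern these interfaces support: attach to a container, enumerate its documents and sub-containers, and attach to individual documents for metadata. All types and method signatures here are simplified stand-ins for illustration; the actual com.plumtree.remote.crawler interfaces use richer argument and return types (for example, ChildDocument and DocumentMetaData rather than plain strings).

```java
// Simplified, illustrative stand-ins for the IDK crawler interfaces described above.
// The real IDK types have different (richer) signatures; this sketch only shows the
// calling pattern followed during a crawl.
import java.util.List;

interface IContainerProvider {
    void initialize(String dataSourceSettings);      // assumed simplified form
    IContainer attachToContainer(String location);   // attach to one node in the hierarchy
    void shutdown();
}

interface IContainer {
    List<String> getChildContainerLocations();       // sub-nodes to crawl next
    List<String> getChildDocumentLocations();        // documents in this node
    List<String> getUsersAndGroups();                // ACL entries for this node
}

interface IDocumentProvider {
    IDocument attachToDocument(String location);
}

interface IDocument {
    String getMetaData();                            // metadata the portal indexes
}

/** Depth-first walk over the backend hierarchy, mimicking how the portal drives a crawler. */
final class CrawlDriver {
    private final IContainerProvider containers;
    private final IDocumentProvider documents;

    CrawlDriver(IContainerProvider containers, IDocumentProvider documents) {
        this.containers = containers;
        this.documents = documents;
    }

    void crawl(String startLocation) {
        IContainer node = containers.attachToContainer(startLocation);

        // Index links (not content) for every document in this container.
        for (String docLocation : node.getChildDocumentLocations()) {
            IDocument doc = documents.attachToDocument(docLocation);
            System.out.println(docLocation + " -> " + doc.getMetaData());
        }

        // Recurse into each sub-container.
        for (String child : node.getChildContainerLocations()) {
            crawl(child);
        }
    }
}
```

In a real crawler web service, the portal itself drives this traversal across SOAP calls; the CrawlDriver class here only stands in for that orchestration to show the order in which the interfaces are exercised.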