49 Site Capture File System

The Site Capture file system is created during the Site Capture installation process to store installation-related files, property files, sample crawlers, and sample code used by the FirstSiteII crawler to control its site capture process. The file system also provides the framework in which Site Capture organizes custom crawlers and their captures.

This chapter contains the following topics:

  • Section 49.1, "General Directory Structure"

  • Section 49.2, "Custom Folders"

49.1 General Directory Structure

Figure 49-1 shows the most frequently accessed folders in the Site Capture file system, to help administrators locate commonly used information. All folders, except for <crawlerName>, are created during the Site Capture installation process. For information about <crawlerName> folders, see Table 49-1, "Site Capture's Frequently Accessed Folders," and Section 49.2, "Custom Folders."

Figure 49-1 Site Capture File System


Table 49-1 Site Capture's Frequently Accessed Folders


/fw-site-capture

The parent folder.

/fw-site-capture/crawler

Contains all Site Capture crawlers, each stored in its own crawler-specific folder.

/fw-site-capture/crawler/_sample

Contains the source code for the FirstSiteII sample crawler.

Note: Folder names beginning with the underscore character ("_") are not treated as crawlers. They are not displayed in the Site Capture interface.

/fw-site-capture/crawler/Sample

Represents a crawler named "Sample." This folder is created only if the "Sample" crawler was installed during the Site Capture installation process.

The Sample folder contains an /app folder, which stores the CrawlerConfiguration.groovy file specific to the "Sample" crawler. The file contains basic configuration code for capturing any dynamic site. The code demonstrates the use of required methods (such as getStartUri) in the BaseConfigurator class.

When the Sample crawler is invoked in static or archive mode, subfolders are created within the /Sample folder.
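
The CrawlerConfiguration.groovy file described above is written in Groovy. The following is a minimal, illustrative sketch of such a file, not the shipped Sample configuration: only the BaseConfigurator class and the getStartUri method are named in this chapter, while the class name, start URI, import, and exact method signature shown here are assumptions.

    // CrawlerConfiguration.groovy -- illustrative sketch only, not the shipped
    // Sample configuration. The import of BaseConfigurator from the Site
    // Capture crawler API is assumed to be available on the crawler's classpath.
    class SampleConfigurator extends BaseConfigurator {

        // Required method: returns the URI(s) at which the crawl begins.
        // The String[] return type and the sample URI are assumptions.
        String[] getStartUri() {
            return ["http://www.example.com/home"] as String[]
        }
    }

For the authoritative set of required methods and their signatures, refer to the CrawlerConfiguration.groovy file installed in the Sample crawler's /app folder.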

/fw-site-capture/logs

Contains the crawler.log file, a system log for Site Capture.

/fw-site-capture/publish-listener

Contains the following files, which are needed to set up Site Capture for publishing-triggered crawls:

  • fw-crawler-publish-listener-1.1-elements.zip

  • fw-crawler-publish-listener-1.1.jar

/fw-site-capture/Sql-Scripts

Contains the following scripts, which create database tables that are needed by Site Capture to store its data:

  • crawler_db2_db.sql

  • crawler_oracle_db.sql

  • crawler_sql_server_db.sql

/fw-site-capture/webapps

Contains the ROOT/WEB-INF/ folder.

/fw-site-capture/webapps/ROOT/WEB-INF

Contains the log4j.xml file, used to customize the path to the crawler.log file.
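
For illustration, a fragment of the following form, using standard log4j XML configuration, could direct a file appender to a custom crawler.log location. The appender name, rolling policy, pattern, and path shown are assumptions; the chapter states only that log4j.xml is where the crawler.log path is customized.

    <!-- Illustrative fragment only; the appender name, path, and pattern are assumptions. -->
    <appender name="file" class="org.apache.log4j.RollingFileAppender">
      <!-- Point the log output at the desired crawler.log location. -->
      <param name="File" value="/u01/fw-site-capture/logs/crawler.log"/>
      <param name="MaxFileSize" value="10MB"/>
      <layout class="org.apache.log4j.PatternLayout">
        <param name="ConversionPattern" value="%d %-5p [%c] %m%n"/>
      </layout>
    </appender>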

/fw-site-capture/webapps/ROOT/WEB-INF/classes

Contains the following files:

  • sitecapture.properties file, where you specify information about the WebCenter Sites application with which Site Capture communicates. The information includes the WebCenter Sites host machine name (or IP address) and port number.

  • root-context.xml file, where you can configure the Site Capture database.
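
As an illustration only, the sitecapture.properties file might take the shape sketched below. The property keys are hypothetical placeholders; the chapter names the file and the kind of information it holds, not the actual keys, so consult the installed file for the real property names.

    # sitecapture.properties -- illustrative sketch only.
    # The keys below are hypothetical placeholders, not the keys Site Capture
    # actually defines; check the installed file for the real property names.

    # Host machine name (or IP address) of the WebCenter Sites application
    example.sites.host=sites.example.com

    # Port number of the WebCenter Sites application
    example.sites.port=8080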


49.2 Custom Folders

A custom folder is created for every crawler that a user creates in the Site Capture interface. The custom folder, <crawlerName>, is used to organize the crawler's configuration file, captures, and logs, as summarized in Figure 49-2.

Figure 49-2 Site Capture's Custom Folders: <crawlerName>
