Netscape Compass Server Administrator's Guide


Chapter 4
Managing the Compass Database

The Netscape Compass Server stores its descriptions of resources in a database. For the most part, you shouldn't need to work with the database directly, but there are some configuration and maintenance tasks you can perform through the Server Manager.

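This chapter describes the following database tasks: importing resource descriptions, editing the database schema, optimizing the database, partitioning the database, editing resource descriptions, deleting the database, purging expired resource descriptions, reindexing the database, and checking the database.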

Importing Resource Descriptions

Normally, the items in your Compass Server database come from the robot: you tell the robot which sites to visit, and it locates and describes all the resources it finds there. But you can also import databases of existing items from other Compass Servers, from Netscape Enterprise Servers using the AutoCatalog feature, or from databases generated from other sources.

There are a number of circumstances in which you might want to import existing databases of resource descriptions instead of sending the robot to create them anew. Most of them deal with either reducing the amount of network traffic or breaking the network into small enough parts that you can complete the indexing in a reasonable amount of time. Here are some examples:

About Import Agents

When a Compass Server needs to import resource descriptions from another server or from a database, it uses an import agent. An import agent is a process that retrieves a number of resource descriptions from an external source and merges that information into the local database.
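As a rough illustration of what merging means here, the following Python sketch models a database as a dictionary of resource descriptions keyed by URL. The function name, the dictionary layout, and the rule that a newer imported description replaces an older local one are all assumptions made for illustration; this chapter does not document the Compass Server's actual merge logic.

```python
def merge_descriptions(local_db, imported):
    """Merge imported resource descriptions into local_db (hypothetical model).

    Both arguments map a URL to a dict of fields. An imported description
    replaces the local one only if its 'last-modified' value is newer
    (an assumed rule, not documented Compass Server behavior).
    Returns the number of descriptions added or updated.
    """
    changed = 0
    for url, rd in imported.items():
        current = local_db.get(url)
        # ISO-style date strings compare correctly as plain strings.
        if current is None or rd["last-modified"] > current["last-modified"]:
            local_db[url] = rd
            changed += 1
    return changed


# Illustrative data only; the URLs and field names are made up.
local_db = {
    "http://a.example/x.html": {"title": "X", "last-modified": "1997-10-01"},
}
imported = {
    "http://a.example/x.html": {"title": "X (rev)", "last-modified": "1997-11-15"},
    "http://b.example/y.html": {"title": "Y", "last-modified": "1997-09-30"},
}
changed = merge_descriptions(local_db, imported)
```

In this sketch, one existing description is updated and one new description is added, so the local database ends up with two entries.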

In general, an import agent contains parameters that tell it where to go to import resource descriptions, what to ask for when it gets there, and some other information that fine-tunes the way it goes about the job. For more details, see Editing an Import Agent.

The Import Agents form shows a list of all the currently defined import agents. If there are none currently defined, the list includes only a New button, used to create new import agents. If you have defined one or more import agents, there will be several other buttons below the list of import agents.

When you run import agents, whether manually or automatically, the Compass Server runs all enabled import agents. An enabled import agent has the On button checked beside it in the list. Newly created import agents are enabled by default. You can enable or disable any of the defined import agents by clicking the On/Off buttons next to the desired items in the import agent list. A disabled import agent does nothing.

You can also perform the following import tasks once you have defined at least one import agent:

Creating a New Import Agent

When you want to import resource descriptions into your Compass Server database, you create an import agent to perform the task. Once you have created an import agent, you can reuse it as often as you need to.

To create a new import agent, do the following:

  1. Click New to open the Import Agent Properties form for a new import agent.

  2. Edit the new import agent as described in the next section. When you finish, you return to the list of import agents, where you can either run the import agents manually to import immediately or wait for the next scheduled import-agent run.

Editing an Import Agent

The Import Agent Properties form allows you to change any aspect of an import agent. You use the same form both to create a new import agent and to modify an existing one.

To edit an import agent, do the following:

  1. Make any desired changes to fields in the Import Agent Properties form.
    The fields are described below.

  2. Click OK to save the changes.

The following sections describe the fields in the Import Agent Properties form that define the parameters of an import agent.

Deleting Import Agents

There are several times when you might need to delete import agents:

In any case, the procedure for deleting the import agent is the same.

To delete one or more import agents, do the following:

The Delete Import Agents form shows a list of all existing import agents much like that on the Import Agents form, but instead of the On/Off option, there is a checkbox in front of each import agent.

  1. Check the Delete checkbox in front of each import agent you want to delete.

  2. Click OK to delete the specified import agents.

After you delete import agents, you return to the updated Import Agents form.

Running Import Agents

Once you have defined import agents, you can run them in either of two ways: manually (immediately) or automatically as a scheduled, periodic job.

When you run import agents, you run them all as a batch. That is, the Compass Server goes through the list of all defined import agents, running all that are marked as "on," or enabled, and ignoring those marked as "off," or disabled. If you temporarily disable a number of import agents to run a manual import with only a specified subset of agents, you must be sure to reenable the disabled agents afterward so they run at the next scheduled time.
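The batch behavior can be pictured with a short Python sketch. The `ImportAgent` class and `run_batch` function are hypothetical names used only to model the On/Off setting described above; they are not part of the Compass Server.

```python
from dataclasses import dataclass


@dataclass
class ImportAgent:
    """Hypothetical stand-in for one row in the import agent list."""
    name: str
    enabled: bool = True  # newly created import agents are enabled by default


def run_batch(agents):
    """Return the names of the agents a batch run would execute.

    Enabled agents run in list order; disabled agents are skipped.
    """
    return [agent.name for agent in agents if agent.enabled]


agents = [
    ImportAgent("hr-server"),
    ImportAgent("eng-server", enabled=False),  # temporarily switched off
    ImportAgent("sales-db"),
]
ran = run_batch(agents)
```

Here the run skips `eng-server`, and it will keep skipping it on every scheduled run until the agent is switched back on.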

Your choice of whether to run manually or on a schedule probably depends on the nature of the import task. Running import agents manually is most appropriate for the first time you import, just to get started, or for one-time import jobs. Scheduling import agents is best for periodic updates from the same group of sources.

Running Import Agents Manually

In general, you need to run import agents manually only when you create a completely new Compass Server that needs an initial set of resource descriptions, when you have added new import agents to an existing Compass Server, or when you know that the source associated with an import agent has a large number of changes you want to incorporate into your database.

To run import agents manually, do the following:

  1. Ensure that all import agents you want to run are enabled (On), and that any import agents you don't want to run at this time are disabled (Off).

  2. Click Run to run all enabled import agents.

The Compass Server opens a new Navigator window and displays the progress of the import agent process in that window. You can do other work while the import agents run.

WARNING: Do not close this import agent status window. Closing the window will cancel the import operation. You can minimize the window to get it out of your way, but do not close it or open another URL in the window until the import agent finishes its job.

Note that by default an import agent imports all resource descriptions added to the source or changed since the last time it imported from that source. If you rerun an import agent immediately, it might appear to "fail," because it will not find any new or changed resource descriptions.
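The following Python sketch shows why an immediate rerun finds nothing. It assumes, purely for illustration, that each resource description carries a `modified` timestamp and that the agent remembers when it last imported; the function and field names are hypothetical.

```python
from datetime import datetime


def select_new_or_changed(source_rds, last_import):
    """Return the descriptions added or changed after last_import."""
    return [rd for rd in source_rds if rd["modified"] > last_import]


# Illustrative source contents; URLs and dates are made up.
source = [
    {"url": "http://a.example/1.html", "modified": datetime(1998, 1, 5)},
    {"url": "http://a.example/2.html", "modified": datetime(1998, 1, 20)},
    {"url": "http://a.example/3.html", "modified": datetime(1997, 12, 1)},
]

# First run: everything changed since the previous import is picked up.
first_run = select_new_or_changed(source, datetime(1998, 1, 1))

# Immediate rerun: the last-import time is now later than every change,
# so nothing qualifies and the run looks empty rather than broken.
second_run = select_new_or_changed(source, datetime(1998, 2, 1))
```

An empty second run in this model is the expected result of an incremental import, not an error.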

Scheduling Import Agents

Once you have your Compass Server running, you will probably set up any routine import agents to run automatically on a regular schedule. You should coordinate the import agent schedule with the robot's automated schedule, or at least run the import processes often enough to minimize any lag between fresh information on the remote servers and import into the database.

Importing is a very efficient process that does not overburden your server. If anything, you should err on the side of importing too often rather than too infrequently.

To run import agents automatically, do the following:

  1. Click Schedule to open the Schedule Task form.
    The Schedule Task form indicates whether you currently have an automatic schedule activated and shows the day(s) and time of scheduled import agent runs, if any.

  2. Follow the directions for task scheduling in Scheduling Tasks.

Editing the Database Schema

A schema determines what information your Compass Server maintains on each resource, and in what form. The design of your schema determines two factors that affect the usability of your index:

The schema is essentially a master data structure for resource descriptions in the database. Depending on how you define and index the fields in that data structure, users will have varying degrees of access to the resources.

WARNING: The schema is intimately tied to the structure of the files used by the Compass Server and its robot. You should only make changes to the data structure by using the schema tools in the Netscape Server Manager for your Compass Server. You should never edit the schema file (schema.rdm) directly, even though it is a text file.

Understanding the Schema

A schema in the context of the Compass Server is the definition of the contents of a resource description. That is, the schema determines the names of the fields in the resource description and the type of each field. For example, a schema for a document might have fields for the name of the document, the dates of its creation and last modification, its length, and so on.

The schema for a Compass Server appears to the user only when choosing fields to display in search results or when constructing a complex query using the Advanced Search screen. In each case, the schema appears as a hierarchy of boxes, and the user's interaction with them is limited to choosing and arranging individual elements.

As the administrator, you have the ability to control what items appear in the schema, how those fields are filled from incoming resource descriptions, and how users can use those items.

Editing the Schema

Using the schema editor, you can modify any aspect of the schema for your Compass Server. In all likelihood, the most common thing you will do to the schema is add a searchable field, which is shown as an example below.

To edit the database schema, do the following:

  1. Click Enable Java Applet to activate the schema editor applet in a separate window.
    The schema appears in outline form on the left side of the applet. The right side shows the attributes of the selected field. The attributes are described in the table below.

  2. Make any changes to any field attributes.

  3. Choose File|Save to commit your changes.

  4. Choose File|Close to close the schema editor applet.

Schema Attributes

For each item in the database schema, you can change the following attributes:

Attribute    Meaning
Editable     If checked, this attribute indicates that the field appears in the RD Editor, so you can change its values. The RD Editor is explained in Editing Resource Descriptions.
Indexable    If checked, this attribute indicates that the field appears in the pop-up menu in the Advanced Search screen. This allows users to search for values in that particular field.
Description  This is a free-text string for your use. You can use it for comments or annotations. The Compass Server ignores this field.
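One way to picture these attributes is as flags on each schema field. The Python sketch below is a simplified model, not the format of the schema file; the class name, the function name, and the sample fields are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class SchemaField:
    """Hypothetical model of one schema item and its three attributes."""
    name: str
    editable: bool = False    # shown in the RD Editor when True
    indexable: bool = False   # offered in the Advanced Search menu when True
    description: str = ""     # free text; ignored by the server


def advanced_search_menu(schema):
    """Return the field names a user could pick in Advanced Search."""
    return [field.name for field in schema if field.indexable]


# Sample fields chosen for illustration only.
schema = [
    SchemaField("Title", editable=True, indexable=True),
    SchemaField("Author", editable=True, indexable=True),
    SchemaField("Content-Length", description="size in bytes"),
]
menu = advanced_search_menu(schema)
```

In this model, only the fields marked as indexable reach the search menu, which mirrors the effect of the Indexable checkbox.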

Example: Adding a Searchable Attribute

When the robot encounters a META tag in a document, it converts the tag into a field in the resource description. If there is already an item in the schema with a corresponding name, it places the META tag contents in that field. If there is no predefined field with that name, it adds one, and assigns the value.
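The conversion described above can be sketched in Python with the standard `html.parser` module. This is only a model of the idea, not the robot's implementation: the `MetaCollector` class, the `describe` function, the list-of-names schema, and the sample document are all hypothetical.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect NAME/CONTENT pairs from META tags into a dict."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            self.fields[name] = content


def describe(html, schema):
    """Return the META fields of a document, extending schema as needed.

    schema is a plain list of field names in this sketch. A META name
    that is not yet in the schema is appended to it.
    """
    collector = MetaCollector()
    collector.feed(html)
    for name in collector.fields:
        if name not in schema:
            schema.append(name)
    return collector.fields


# Made-up sample document and starting schema.
html_doc = (
    '<html><head><title>Maintenance notes</title>'
    '<META NAME="PlaneType" CONTENT="Model A, Model B"></head>'
    '<body>Text</body></html>'
)
schema = ["Title", "Author"]
fields = describe(html_doc, schema)
```

After the call, the sample schema contains a `PlaneType` entry that it did not have before, which is the situation the example later in this section builds on.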

There are two key facts here:

Suppose Airius Airlines has a corporate standard that every document that deals with a particular type of its planes contains a META tag called PlaneType that contains the names of the planes covered. The Compass Server administrator wants to enable users to search for documents concerning those particular types of planes. These are the steps to do so:

  1. Start the Schema Editor.

  2. Click any of the existing fields in the schema.

  3. Click New Peer.
    This creates a new schema item named New, with the name selected.

  4. Type the new name, PlaneType.

  5. Click the checkbox next to Indexable.

  6. Choose File|Save.

The Compass Server must be off when you save, so that the Schema Editor can reindex the database. After the reindexing is complete, the administrator can restart the server.

Users can then go to the Advanced Search and choose PlaneType from the field list, and search for particular types of planes mentioned in that particular field.

Converting Schema Names

There are several instances where you might encounter discrepancies between the names used for fields in database schemas. One is when you import resource descriptions from one server into another. You cannot always guarantee that the two servers use identical names for items in their schemas. Similarly, when the robot converts HTML META tags from a document into schema fields, the document controls the names.

The Compass Server deals with these by allowing you to define schema conversions, which are mappings of external schema names into valid names for fields in your database. You define your schema conversions on the Schema Conversion form.

To convert incoming schema field names, do the following:

  1. Type the name of the incoming schema field name you want to convert in the text box on the left.

  2. Type the name of the field in your schema that will receive the contents of the converted field in the text box on the right.

  3. Click OK when you have finished.

You can add more conversions by clicking More. If you want to delete the last conversion in the list, click Fewer.
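Conceptually, a set of schema conversions behaves like a lookup table applied to each incoming field name. The Python sketch below illustrates that idea under the assumption that unmapped names pass through unchanged; the function name and the sample field names are hypothetical.

```python
def convert_fields(rd, conversions):
    """Rename incoming field names according to conversions.

    conversions maps an external field name (left-hand text box) to the
    local schema field that receives its contents (right-hand text box).
    Names without a conversion are kept as they are (an assumption).
    """
    return {conversions.get(name, name): value for name, value in rd.items()}


# Made-up conversions and incoming resource description.
conversions = {"Writer": "Author", "Headline": "Title"}
incoming = {
    "Writer": "J. Doe",
    "Headline": "Quarterly report",
    "URL": "http://a.example/report.html",
}
converted = convert_fields(incoming, conversions)
```

The converted description uses the local names `Author` and `Title`, while `URL` is untouched because no conversion is defined for it.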

Optimizing the Database

After you run the robot a number of times, the database files and indexes can become fragmented with empty space, causing them to take up more disk space than necessary and slowing down both user searches and robot operations. The solution to this is to periodically optimize the database.

To optimize the database, do the following:

  1. Make sure the Compass Server and the robot are not running.

  2. Click OK.

Partitioning the Database

Netscape Compass Server allows you to split the physical files that contain the Compass database across multiple disks, file systems, directories, or partitions. By spreading the database across different physical or logical devices, you can create a larger database than would fit on a single device.

The Database Partitions form shows a list of up to 16 partitions defined for the database. By default, the Compass Server sets up the database to use only one directory.

You can perform two kinds of manipulations on the database partitions: adding new partitions and moving existing partitions.

The Compass Server does not perform any checking to ensure that individual partitions have space remaining. It is your responsibility to maintain adequate free space for the database.
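Because the server does not watch free space for you, a small script run from outside the Compass Server can help. The Python sketch below uses the standard `shutil.disk_usage` call; the function name, the partition list, and the 500 MB threshold are assumptions to adapt to your own installation.

```python
import shutil


def low_space_partitions(paths, min_free_bytes):
    """Return the paths whose file system has less than min_free_bytes free."""
    low = []
    for path in paths:
        usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
        if usage.free < min_free_bytes:
            low.append(path)
    return low


# "." is a placeholder; list your real partition directories instead.
partitions = ["."]
threshold = 500 * 1024 * 1024  # 500 MB, an arbitrary example value
warnings = low_space_partitions(partitions, threshold)
for path in warnings:
    print(f"Low free space on partition: {path}")
```

You could run a check like this on a schedule of your own and act on any path it reports before the database runs out of room.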

Adding New Partitions

You can add new database partitions up to a maximum of 16 total partitions. Keep in mind, however, that once you increase the number of partitions, you must delete the entire database if you later want to reduce the number again.

To add partitions to your database, do the following:

  1. Make sure neither the Compass Server nor the robot is running.

  2. Type the full pathname of a file to hold the new partition.
    Note that the Compass Server does not check to ensure that the pathname is valid. If you type an invalid pathname, the operation will fail.

  3. Click Add New Partition.
    This creates the path if needed, redistributes the database records to take advantage of the new space, and reindexes the database.

  4. Repeat steps 2-3 as needed for additional partitions.

Moving Partitions

You can change the physical location of any of your database partitions by specifying the name of the new location. Similarly, you can rename an existing partition.

To move a database partition, do the following:

  1. Make sure the Compass Server is not running.

  2. Type the new full pathname for the partition in the text box next to the existing pathname.

  3. Click Update Partitions.
    The Compass Server moves the partition to its new location, then reindexes the database.

Editing Resource Descriptions

At times you will find it necessary to change the contents of one or more resource descriptions. For example, you might need to correct a typographical error copied into a resource description from an original document. You edit resource description contents using the RD Editor.

NOTE: One specialized use of the RD Editor is to assign categories to resource descriptions when the robot fails to do so. This is described in Handling Unassigned Resources.

Deleting the Database

There might be times when you want to delete your entire resource database. Such occasions might include extreme corruption of the database files, major redesign of the schema or taxonomy, or a dramatic change in the sites to be indexed.

Whatever your reason for deleting the database, the procedure is the same.

WARNING: Do not simply delete the database files from the disk. Doing so will likely force you to reinstall the Compass Server.

To delete the entire resource database, do the following:

  1. Make sure the Compass Server is not running.
    Similarly, make sure the robot and all import agents are not running when you delete the database, because they rely on the database being there. They will crash if you delete the database while they are running.

  2. Select either or both of the available options:

  3. Click OK to delete the selected portions of the database.

After deleting the database, you can run the robot or import resource descriptions to refill the database.

Purging Expired Resource Descriptions

One attribute of a resource description is its expiration date. Your robots can set the expiration date from HTML META tags or from information provided by the resource's server. By default, a resource description expires three months after its creation unless the resource specifies a different expiration date.

Periodically your Compass Server should purge expired resource descriptions from its database. You can perform this task manually, or you can schedule it to occur automatically.
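The expiration rule and the effect of a purge can be modeled in a few lines of Python. This is a sketch under stated assumptions: the field names `created` and `expires`, the function names, and the choice to purge a description on its expiration date itself are all hypothetical, and the sample dates are made up.

```python
import calendar
from datetime import date


def add_months(d, months):
    """Return d moved forward by whole months, clamping the day if needed."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def expiration_date(rd):
    """Use the resource's own expiration date, else creation plus 3 months."""
    return rd.get("expires") or add_months(rd["created"], 3)


def purge_expired(db, today):
    """Return only the descriptions that expire after today (assumed rule)."""
    return {url: rd for url, rd in db.items() if expiration_date(rd) > today}


db = {
    "a": {"created": date(1997, 11, 30)},                             # default rule
    "b": {"created": date(1997, 6, 1), "expires": date(1999, 1, 1)},  # explicit date
    "c": {"created": date(1997, 8, 15)},                              # default rule
}
kept = purge_expired(db, today=date(1998, 2, 12))
```

In this model, entry `c` reached its default expiration in November 1997 and is purged, while `a` and `b` remain.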

To purge expired resource descriptions from a server, do the following:

  1. Make sure the Compass Server is not running.

  2. Click Expire All RDs.

To schedule the purging of expired resource descriptions, do the following:

  1. Access the Server Manager for the server.

  2. Choose Tasks|Schedule Expire to display the Schedule Expire Agent form.

  3. Follow the directions for task scheduling in Scheduling Tasks.

Reindexing the Database

In certain instances, you might need to reindex the resource description database for the Compass Server. One obvious instance is if you have edited the schema to add or remove an indexed field. You might also need to reindex the database if a disk error corrupts the index file. It's also a good idea to reindex after adding a large number of new resource descriptions.

The time required to reindex the database is roughly proportional to the number of records in the database, so if you have a large database, you should probably perform reindexing at a time when the server is not in high demand.
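If you have timed one reindexing run, the proportionality gives a quick way to plan the next one. The Python sketch below simply scales a measured time by the ratio of record counts; the function name and the figures are hypothetical, and the result is only a rough planning estimate.

```python
def estimate_reindex_minutes(measured_minutes, measured_records, target_records):
    """Scale a measured reindex time linearly to a different record count."""
    return measured_minutes * target_records / measured_records


# Made-up figures: a 200,000-record database took 12 minutes to reindex.
estimate = estimate_reindex_minutes(12, 200_000, 500_000)
```

With these figures the estimate for 500,000 records is 30 minutes, since the record count grows by a factor of 2.5.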

To reindex the database, do the following:

  1. Make sure the Compass Server is turned off.

  2. Click Reindex All Partitions.
    The Compass Server rebuilds the index files for all partitions in the database.

You can also schedule reindexing as an automated task. Although you should not need to run reindexing as a regular, periodic task, you might want to schedule a single reindexing for a time when users will not need to access the Compass Server, such as during the night.

To schedule automated reindexing of the database, do the following:

  1. Access the Server Manager for the Compass Server.

  2. Choose Tasks|Schedule Reindex to display the Schedule Reindex Agent form.

  3. Follow the directions for task scheduling in Scheduling Tasks.

Be sure to deactivate the schedule after it runs.

Checking the Database

Each Compass Server stores its resource descriptions in a database. You can use the following procedure to get information about the number of sites indexed and the number of resources in the database from each site. This report is also available to end users through the About Compass page.

The Compass Database Analysis form indicates whether a database analysis page has already been generated, and if so, whether the page is up-to-date. It also includes the site analysis report in the form of a table.
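The kind of per-site tally that such a report contains can be approximated with a short Python sketch. It assumes, for illustration only, that a site is identified by the host part of each resource URL; the function name and the sample URLs are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit


def site_counts(urls):
    """Count resources per site, using the URL host as the site name."""
    return Counter(urlsplit(url).netloc for url in urls)


# Made-up resource URLs.
urls = [
    "http://www.a.example/index.html",
    "http://www.a.example/docs/planes.html",
    "http://intranet.b.example/hr/policy.html",
]
counts = site_counts(urls)
sites_indexed = len(counts)
```

In this sample, two sites are indexed, with two resources from the first and one from the second.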



Last Updated: 02/12/98 13:34:06


Copyright © 1997 Netscape Communications Corporation

Any sample code included above is provided for your use on an "AS IS" basis, under the Netscape License Agreement - Terms of Use