Administrator's Guide

Chapter 4 Managing the Compass Database
    Importing Resource Descriptions
      About Import Agents
      Creating a New Import Agent
      Editing an Import Agent
      Deleting Import Agents
      Running Import Agents
        Running Import Agents Manually
        Scheduling Import Agents
    Editing the Database Schema
      Understanding the Schema
      Editing the Schema
        Schema Attributes
        Example: Adding a Searchable Attribute
    Converting Schema Names
    Optimizing the Database
    Partitioning the Database
      Adding New Partitions
      Moving Partitions
    Editing Resource Descriptions
    Deleting the Database
    Purging Expired Resource Descriptions
    Reindexing the Database
    Recovering the Database
    Checking the Database


Managing the Compass Database

The iPlanet Compass Server stores its descriptions of resources in a database. For the most part, you shouldn't need to work with the database directly, but there are some configuration and maintenance tasks you can perform through the Server Manager.

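This chapter describes the following database tasks:

  • Importing Resource Descriptions
  • Editing the Database Schema
  • Converting Schema Names
  • Optimizing the Database
  • Partitioning the Database
  • Editing Resource Descriptions
  • Deleting the Database
  • Purging Expired Resource Descriptions
  • Reindexing the Database
  • Recovering the Database
  • Checking the Database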

Importing Resource Descriptions

Normally, the items in your Compass Server database come from the robot: You tell the robot which sites to visit, and it locates and describes all the resources it finds there. But you can also import databases of existing items, either from other Compass Servers, from iPlanet Web Servers or Netscape Enterprise Servers using the AutoCatalog feature, or from databases generated from other sources.

There are a number of circumstances in which you might want to import existing databases of resource descriptions instead of sending the robot to create them anew. Most of them deal with either reducing the amount of network traffic or breaking the network into small enough parts that you can complete the indexing in a reasonable amount of time. Here are some examples:

  • Scalability - A number of robots working simultaneously on different sites can build a very large database more efficiently than a single robot.

    For instance, if it takes a single robot three full days to completely traverse a network and index all its resources, you might instead install six robots, each assigned to a particular portion of the network. Working in parallel, they could complete the indexing in half a day, and you could import the results into a central database.

  • iPlanet AutoCatalogs - The iPlanet Web Server has a built-in robot that can generate and export descriptions of all the documents served by a particular installation. A Compass Server can import those descriptions, so instead of having to traverse all the pages on that server across the network, it can import the already generated descriptions from the AutoCatalog. This greatly reduces the network traffic involved in indexing that site.

    Keep in mind that the resource descriptions generated by AutoCatalog are not as rich as those generated by the Compass Server robot. For internal sites, you might choose to bypass the AutoCatalog and send the robot to locate and describe resources anyway.

  • Network topography - In cases where the central database is truly distant from the servers being indexed, it can be helpful to generate the resource descriptions locally, then have the central database import the various remote databases periodically.

    For example, say a company runs three local sites, one in Europe, one in Asia, and one in North America. Each of those sites can generate an index of its contents locally, and a centralized Compass Server at any of those sites can import the results from the others. That way, instead of hundreds or thousands of intercontinental network contacts, the servers can have a few, longer import sessions.

  • Multiple indexes - It's possible to have multiple databases that include resources from some or all of the same sites. This is sometimes called mirroring. In such a case, it is better if each of the databases doesn't have to send a robot to each site to generate the same information. If the site can generate resource descriptions once, each of the databases that includes that site can import the items.
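
The arithmetic in the scalability example above can be checked with a short sketch. It assumes the network divides evenly among the robots and ignores the time needed to import the results; the function name is hypothetical and is not part of the Compass Server.

```python
def parallel_crawl_days(single_robot_days: float, robots: int) -> float:
    """Estimate elapsed days when the work is split evenly across robots.

    Assumption: every robot gets an equal share and import time is ignored.
    """
    if robots < 1:
        raise ValueError("at least one robot is required")
    return single_robot_days / robots


# Three days of work for one robot, shared by six robots.
estimate = parallel_crawl_days(3, 6)
print(estimate)  # 0.5
```

In practice the slowest robot determines the elapsed time, so an uneven split takes longer than this estimate.
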

      About Import Agents

When a Compass Server needs to import resource descriptions from another server or from a database, it uses an import agent. An import agent is a process that retrieves a number of resource descriptions from an external source and merges that information into the local database.

In general, an import agent contains parameters that tell it where to go to import resource descriptions, what to ask for when it gets there, and some other information that fine-tunes the way it goes about the job. For more details, see Editing an Import Agent.

The Import Agents form shows a list of all the currently defined import agents. If there are none currently defined, the list includes only a New button, used to create new import agents. If you have defined one or more import agents, there will be several other buttons below the list of import agents.

When you run import agents, whether manually or automatically, the Compass Server runs all enabled import agents. An enabled import agent has the On button checked beside it in the list. Newly created import agents are enabled by default. You can enable or disable any of the defined import agents by clicking the On/Off buttons next to the desired items in the import agent list. A disabled import agent does nothing.

You can also perform the following import tasks once you have defined at least one import agent:

  • Editing an Import Agent
  • Deleting Import Agents
  • Running Import Agents

      Creating a New Import Agent

When you want to import resource descriptions into your Compass Server database, you create an import agent to perform the task. Once you have created an import agent, you can reuse it as often as you need to.

To create a new import agent, do the following:

  1. Click New to open the Import Agent Properties form for a new import agent.

  2. Edit the new import agent as described in Editing an Import Agent, then click OK to save it. The list of import agents reappears. From there you can either run the import agents manually to import immediately or wait for the next scheduled import-agent run.

      Editing an Import Agent

The Import Agent Properties form allows you to change any aspect of an import agent. You use the same form both to create a new import agent and to modify an existing one.

To edit an import agent, do the following:

  1. Make any desired changes to fields in the Import Agent Properties form. The fields are described below.

  2. Click OK to save the changes.

The following sections describe the fields in the Import Agent Properties form that define the parameters of an import agent.

  • Resource Description Source--This is where you specify where you want to import resource descriptions from.

    Source
        Value and meaning

    Local File
        You specify the full pathname of a locally accessible file. This can be a file on another server, as long as the path is addressable as if it were locally mounted. The indicated file must contain valid resource descriptions in SOIF format. For more details on SOIF, see the Compass Server Programmer's Guide.

    Compass Server
        You specify the hostname and port number for another iPlanet Compass Server. For example, to import from a Compass Server at port 8000 on the host example.iplanet.com, you would type this:

            example.iplanet.com:8000

        You can also indicate whether the import session between the Compass Servers should use the SSL (Secure Sockets Layer) protocol.

    Instance Name
        This string is the server instance name used by Compass Server. You can find this instance name in the Server Preferences for the server you are importing from.

    iPlanet Web Server
        You specify the hostname and port number of an iPlanet Web Server that has its AutoCatalog feature turned on.

        If the server on the indicated port is not an iPlanet Web Server running AutoCatalog, the import agent will put a message in the RDM log.

  • Authentication--This is where you specify how the import agent should identify itself to the system it imports from. By default, there is no authentication used. If the server you want to import from requires authentication, you can specify a user name and password for the import agent to use.
  • Content Transfer--Here you specify which resource descriptions you want to import from the source. There are two options here.

    Option
        Meaning

    Use Incremental Gathering of Full Contents
        By default, an import agent asks for all resource descriptions added or changed since its last import from the same source.

        Note that this can make repeated import-agent runs appear to "fail," as it is likely they will find no new resource descriptions if run again immediately after finishing.

        The Collect All RDs button below this option allows you to override the incremental search for one import session. It does this by clearing the Newest Resource Description field, meaning that for the next run, all resource descriptions are "newer" and therefore imported.

    Verity Query
        You can specify that the import agent should request only certain resource descriptions from the source. This is much the same way that users request listings of resources from the Compass Server database.

        These are the fields you use to specify the query:

          • Scope is the text of the query. The query syntax is identical to that used for end-user queries from the server. For the full syntax available, see the Compass Server User's Guide.
          • View-Attributes specifies which fields you want to import in each resource description. The default is all.
          • View-Hits is the maximum number of matching resource descriptions to import.

  • Advanced--These parameters allow you to customize the import agent's interaction with its source. In most cases, you should not need to change these.

    Parameter
        Meaning

    Agent Description
        This is like a nickname, and is ignored by the program. It appears on the Import Agents form to identify this agent.

    Newest Resource Description
        This is the creation date of the newest resource description previously imported by this import agent. This date is used by the Use Incremental Gathering of Full Contents option to determine which resources are new and should be imported.

    Network Timeout
        This specifies the number of seconds the import agent will allow before timing out the connection over the network. You can adjust this to allow for varying network traffic and quality.

    Is Catalog Server 1.0?
        This option should be checked if this import agent is importing resource descriptions from a Netscape Catalog Server 1.0, because some of its internal query syntax, notably the CSID, differs from the defaults for the Compass Server.
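
The hostname:port value used for the Compass Server and iPlanet Web Server sources can be validated before you type it into the form. The following is a minimal sketch, not part of the Compass Server; the class and function names are hypothetical, and the port range check is an assumption based on ordinary TCP port numbering.

```python
from typing import NamedTuple


class ImportSource(NamedTuple):
    host: str
    port: int


def parse_source(value: str) -> ImportSource:
    """Split a 'hostname:port' string into its two parts and validate them."""
    host, sep, port_text = value.strip().rpartition(":")
    if not sep or not host:
        raise ValueError(f"expected hostname:port, got {value!r}")
    port = int(port_text)  # raises ValueError if the port is not numeric
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return ImportSource(host, port)


source = parse_source("example.iplanet.com:8000")
print(source.host, source.port)
```
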

      Deleting Import Agents

There are several times when you might need to delete import agents:

  • You created an import agent for a one-time use.
  • You made a mistake in creating the import agent initially.
  • You no longer need an existing import agent.

In any case, the procedure for deleting the import agent is the same.

The Delete Import Agents form shows a list of all existing import agents much like that on the Import Agents form, but instead of the On/Off option, there is a checkbox in front of each import agent.

To delete one or more import agents, do the following:

  1. Check the Delete checkbox in front of each import agent you want to delete.
  2. Click OK to delete the specified import agents.

After you delete import agents, you return to the updated Import Agents form.

      Running Import Agents

Once you have defined import agents, you can run them in either of two ways, manually (immediately) or automatically as a scheduled, periodic job.

When you run import agents, you run them all as a batch. That is, the Compass Server goes through the list of all defined import agents, running all that are marked as "on," or enabled, and ignoring those marked as "off," or disabled. If you temporarily disable a number of import agents to run a manual import with only a specified subset of agents, you must be sure to reenable the disabled agents afterward so they run at the next scheduled time.
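
The batch behavior described above can be modeled in a few lines. This is an illustrative sketch only: the Agent class, run_batch, and only_agents are hypothetical names, and the real server toggles agents through the On/Off buttons on the Import Agents form. The context manager shows one way to guarantee that temporarily disabled agents are reenabled afterward.

```python
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Set


@dataclass
class Agent:
    description: str
    enabled: bool = True  # newly created agents are enabled by default


def run_batch(agents: Iterable[Agent]) -> List[str]:
    """Return the descriptions of the agents a batch run would execute."""
    return [agent.description for agent in agents if agent.enabled]


@contextmanager
def only_agents(agents: List[Agent], wanted: Set[str]) -> Iterator[List[Agent]]:
    """Temporarily disable agents not in `wanted`, then restore every state."""
    saved = [agent.enabled for agent in agents]
    for agent in agents:
        agent.enabled = agent.enabled and agent.description in wanted
    try:
        yield agents
    finally:
        for agent, state in zip(agents, saved):
            agent.enabled = state


agents = [Agent("europe"), Agent("asia"), Agent("archive", enabled=False)]
with only_agents(agents, {"asia"}):
    manual = run_batch(agents)   # only the "asia" agent runs
scheduled = run_batch(agents)    # original On/Off states are back
```

Restoring the saved states in a finally clause means the agents return to their original settings even if the manual run raises an error.
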

Your choice of whether to run manually or on schedule probably depends on the nature of the import task. Running Import Agents Manually is most appropriate for either the first time you import, just to get started, or for one-time import jobs. Scheduling Import Agents is best for periodic updates from the same group of sources.

         Running Import Agents Manually

In general, you need to run import agents manually only when you create a completely new Compass Server that needs an initial set of resource descriptions, when you have added new import agents to an existing Compass Server, or when you know that the source associated with an import agent has a large number of changes you want to incorporate into your database.

To run import agents manually, do the following:

  1. Ensure that all import agents you want to run are enabled (On), and that any import agents you don't want to run at this time are disabled (Off).
  2. Click Run to run all enabled import agents.

The Compass Server opens a new Navigator window and displays the progress of the import agent process in that window. You can do other work while the import agents run.

Do not close this import agent status window. Closing the window will cancel the import operation. You can minimize the window to get it out of your way, but do not close it or open another URL in the window until the import agent finishes its job.

Note that by default an import agent imports all resource descriptions added to the source or changed since the last time it imported from that source. If you rerun an import agent immediately, it might appear to "fail," because it will not find any new or changed resource descriptions.
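
The incremental behavior noted above, including why an immediate rerun finds nothing, can be sketched as follows. The classes and the last_changed field are hypothetical stand-ins; the sketch only assumes that the agent remembers a single Newest Resource Description date and that Collect All RDs clears it.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional


@dataclass
class ResourceDescription:
    url: str
    last_changed: datetime  # hypothetical field: when the RD was added or changed


@dataclass
class ImportAgent:
    # Models the Newest Resource Description field; None means "never imported".
    newest_rd: Optional[datetime] = None

    def collect_all(self) -> None:
        """Model the Collect All RDs button: forget the newest date."""
        self.newest_rd = None

    def run(self, source: Iterable[ResourceDescription]) -> List[ResourceDescription]:
        """Import only descriptions newer than the remembered date."""
        if self.newest_rd is None:
            picked = list(source)
        else:
            picked = [rd for rd in source if rd.last_changed > self.newest_rd]
        if picked:
            self.newest_rd = max(rd.last_changed for rd in picked)
        return picked


source = [
    ResourceDescription("http://a.example/", datetime(2001, 1, 1)),
    ResourceDescription("http://b.example/", datetime(2001, 2, 1)),
]
agent = ImportAgent()
first = agent.run(source)    # both descriptions are imported
second = agent.run(source)   # nothing is newer, so the run looks like a failure
agent.collect_all()
third = agent.run(source)    # everything counts as newer again
```
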

         Scheduling Import Agents

Once you have your Compass Server running, you will probably set up any routine import agents to run automatically on a regular schedule. You should coordinate the import agent schedule with the robot's automated schedule, or at least run the import processes often enough to minimize any lag between fresh information on the remote servers and import into the database.

Importing is a very efficient process that does not overburden your server. If anything, you should err on the side of importing too often rather than too infrequently.

To run import agents automatically, do the following:

  1. Click Schedule to open the Schedule Task form.

    The Schedule Task form indicates whether you currently have an automatic schedule activated and shows the day(s) and time of scheduled import agent runs, if any.

  2. Follow the directions for task scheduling in Scheduling Tasks.

Editing the Database Schema

A schema determines what information your Compass Server maintains on each resource, and in what form. The design of your schema determines two factors that affect the usability of your index:

  • The way users can search for resources
  • The ways users view resource information

The schema is essentially a master data structure for resource descriptions in the database. Depending on how you define and index the fields in that data structure, users will have varying degrees of access to the resources.

The schema is intimately tied to the structure of the files used by the Compass Server and its robot. You should only make changes to the data structure by using the schema tools in the iPlanet Server Manager for your Compass Server. You should never edit the schema file (schema.rdm) directly, even though it is a text file.

      Understanding the Schema

A schema in the context of the Compass Server is the definition of the contents of a resource description. That is, the schema determines the names of the fields in the resource description and the type of each field. For example, a schema for a document might have fields for the name of the document, the dates of its creation and last modification, its length, and so on.

The schema for a Compass Server appears to the user only when choosing fields to display in search results or when constructing a complex query using the Advanced Search screen. In each case, the schema appears as a hierarchy of boxes, and the user's interaction with them is limited to choosing and arranging individual elements.

As the administrator, you have the ability to control what items appear in the schema, how those fields are filled from incoming resource descriptions, and how users can use those items.

      Editing the Schema

Using the schema editor, you can modify any aspect of the schema for your Compass Server. In all likelihood, the most common thing you will do to the schema is add a searchable field, which is shown as an example below.

To edit the database schema, do the following:

  1. Click Enable Java Applet to activate the schema editor applet in a separate window.

    The schema appears in outline form on the left side of the applet. The right side shows the attributes of the selected field. The attributes are described in the table below.

  2. Make the desired changes to the field attributes.
  3. Choose File|Save to commit your changes.
  4. Choose File|Close to close the schema editor applet.

         Schema Attributes

For each item in the database schema, you can change the following attributes:

    Attribute
        Meaning

    Editable
        If checked, this attribute indicates that the field appears in the RD Editor, so you can change its values. The RD Editor is explained in Editing Resource Descriptions.

    Indexable
        If checked, this attribute indicates that the field appears in the pop-up menu in the Advanced Search screen. This allows users to search for values in that particular field.

    Description
        This is a free-text string for your use. You can use it for comments or annotations. The Compass Server ignores this field.

         Example: Adding a Searchable Attribute

When the robot encounters a META tag in a document, it converts the tag into a field in the resource description. If there is already an item in the schema with a corresponding name, it places the META tag contents in that field. If there is no predefined field with that name, it adds one, and assigns the value.
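
The conversion described above can be illustrated with Python's standard html.parser module. This is a rough sketch of the idea, not the robot's actual implementation; the class and function names are hypothetical, and the sample tag is the one discussed later in this section.

```python
from html.parser import HTMLParser
from typing import Dict


class MetaTagCollector(HTMLParser):
    """Collect NAME/CONTENT pairs from META tags into a field dictionary."""

    def __init__(self) -> None:
        super().__init__()
        self.fields: Dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names but keeps values as-is.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            self.fields[name] = content


def meta_fields(html: str) -> Dict[str, str]:
    """Return the META tag fields found in an HTML document."""
    collector = MetaTagCollector()
    collector.feed(html)
    collector.close()
    return collector.fields


page = '<html><head><META NAME="Sauce" CONTENT="cranberry"></head></html>'
print(meta_fields(page))  # {'Sauce': 'cranberry'}
```
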

There are two key facts here:

  • All META tag information is included in the resource description. Even obscure or unique tags are included. For example, suppose a document contains the following tag:

    <META NAME="Sauce" CONTENT="cranberry">

    When the robot indexes the document, it will create a schema attribute called Sauce and assign it the value cranberry. Even though this word is not visible in the document, users will be able to search for the term cranberry and retrieve the document.

  • Only fields with the Indexable attribute set are indexed, and therefore included in the Advanced Search. To continue with the preceding example, the Advanced Search does not provide the ability to search for the word cranberry (or any other particular value) in the field called Sauce. If you want to enable users to search for terms in particular fields, you must define those fields in the schema and make them indexable, as shown in the following example.

Suppose Airius Airlines has a corporate standard that every document that deals with a particular type of its planes contains a META tag called PlaneType that contains the names of the planes covered. The Compass Server administrator wants to enable users to search for documents concerning those particular types of planes. These are the steps to do so:

  1. Start the Schema Editor.
  2. Click any of the existing fields in the schema.
  3. Click New Peer.

    This creates a new schema item named New, with the name selected.

  4. Type the new name, PlaneType.
  5. Click the checkbox next to Indexable.
  6. Choose File|Save.

The Compass Server must be off when the administrator saves the schema, so that the Schema Editor can reindex the database. After the reindex is complete, the administrator can restart the server.

Users can then go to the Advanced Search and choose PlaneType from the field list, and search for particular types of planes mentioned in that particular field.

Converting Schema Names

There are several instances where you might encounter discrepancies between the names used for fields in database schemas. One is when you import resource descriptions from one server into another. You cannot always guarantee that the two servers use identical names for items in their schemas. Similarly, when the robot converts HTML META tags from a document into schema fields, the document controls the names.

The Compass Server deals with these by allowing you to define schema conversions, which are mappings of external schema names into valid names for fields in your database. You define your schema conversions on the Schema Conversion form.

To convert incoming schema field names, do the following:

  1. Type the name of the incoming schema field name you want to convert in the text box on the left.
  2. Type the name of the field in your schema that will receive the contents of the converted field in the text box on the right.
  3. Click OK when you have finished.

You can add more conversions by clicking More. If you want to delete the last conversion in the list, click Fewer.
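
Conceptually, a set of schema conversions is a lookup table from incoming field names to local field names. The sketch below illustrates that idea only; the function name and the example field names (Sauce, Condiment, Title) are hypothetical, and the server's own handling of unmapped or colliding names is not documented here.

```python
from typing import Dict


def convert_fields(rd: Dict[str, str], conversions: Dict[str, str]) -> Dict[str, str]:
    """Rename incoming fields using the conversion table.

    Assumption: fields without a conversion keep their incoming name. If two
    incoming names map to the same local name, the later one wins here.
    """
    return {conversions.get(name, name): value for name, value in rd.items()}


# Left column of the form: incoming name. Right column: local schema name.
conversions = {"Sauce": "Condiment"}
incoming = {"Sauce": "cranberry", "Title": "Holiday Menu"}
print(convert_fields(incoming, conversions))
```
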

Optimizing the Database

After you run the robot a number of times, the database files and indexes can become fragmented with empty space, causing them to take up more disk space than necessary and slowing down both user searches and robot operations. The solution to this is to periodically optimize the database.

To optimize the database, do the following:

  1. Make sure the Compass Server and the robot are not running.
  2. Click OK.

Partitioning the Database

iPlanet Compass Server allows you to split the physical files that contain the Compass database across multiple disks, file systems, directories, or partitions. By spreading the database across different physical or logical devices, you can create a larger database than would fit on a single device.

The Database Partitions form shows a list of up to 15 partitions defined for the database. By default, the Compass Server sets up the database to use only one directory.

You can perform two kinds of manipulations on the database partitions:

  • Adding New Partitions
  • Moving Partitions

The Compass Server does not perform any checking to ensure that individual partitions have space remaining. It is your responsibility to maintain adequate free space for the database.
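
Because the server does not check free space, you might monitor the partition directories yourself. The following sketch uses Python's standard shutil.disk_usage call; the function name, the example path, and the 500 MB threshold are placeholders to replace with your own values.

```python
import shutil
from typing import Dict, Iterable, Tuple


def free_space_report(partition_dirs: Iterable[str],
                      minimum_free_bytes: int) -> Dict[str, Tuple[int, bool]]:
    """Map each existing directory to (free bytes, whether it meets the minimum)."""
    report = {}
    for path in partition_dirs:
        usage = shutil.disk_usage(path)  # named tuple: total, used, free
        report[path] = (usage.free, usage.free >= minimum_free_bytes)
    return report


# Placeholder path and threshold; list your real partition directories here.
for path, (free, ok) in free_space_report(["."], 500 * 1024 * 1024).items():
    status = "ok" if ok else "LOW"
    print(f"{path}: {free} bytes free ({status})")
```
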

      Adding New Partitions

You can add new database partitions up to a maximum of 15 total partitions. Keep in mind, however, that once you increase the number of partitions, you will need to delete the entire database if you later want to reduce the number again.

To add partitions to your database, do the following:

  1. Make sure that neither the server nor the robot is running.
  2. Type the full pathname of a file to hold the new partition.

    Note that the Compass Server does not check to ensure that the pathname is valid. If you type an invalid pathname, the operation will fail.

  3. Click Add New Partition.

    This creates the path if needed, redistributes the database records to take advantage of the new space, and reindexes the database.

  4. Repeat steps 2-3 as needed for additional partitions.

      Moving Partitions

You can change the physical location of any of your database partitions by specifying the name of the new location. Similarly, you can rename an existing partition.

To move a database partition, do the following:

  1. Make sure the Compass Server is not running.
  2. Type the new full pathname for the partition in the text box next to the existing pathname.
  3. Click Update Partitions.

    The Compass Server moves the partition to its new location, then reindexes the database.

Editing Resource Descriptions

At times you will find it necessary to change the contents of one or more resource descriptions. For example, you might need to correct a typographical error copied into a resource description from an original document. You edit resource description contents using the RD Editor.

Note: One specialized use of the RD Editor is to assign categories to resource descriptions when the robot fails to do so. This is described in Handling Unassigned Resources.

Deleting the Database

There might be times when you want to delete your entire resource database. Such occasions might include extreme corruption of the database files, major redesign of the schema or taxonomy, or a dramatic change in the sites to be indexed.

For whatever reason you want to delete the database, the procedure is the same. You should not simply delete database files from the disk. Doing so will likely result in your having to reinstall the Compass Server.

To delete the entire resource database, do the following:

  1. Make sure the Compass Server is not running.

    Similarly, the robot and all import agents should not be running when you delete the database, as they rely on the database being there. They will crash if you delete the database while they are running.

  2. Select:
    • Delete Database to remove all resource descriptions from the database

  3. Click OK to delete the selected portions of the database.

After deleting the database, you can run the robot or import resource descriptions to refill the database.

Purging Expired Resource Descriptions

One attribute of a resource description is its expiration date. Your robots can set the expiration date from HTML META tags or from information provided by the resource's server. By default, resource descriptions expire in three months from creation unless the resource specifies a different expiration date.

Periodically your Compass Server should purge expired resource descriptions from its database. You can perform this task manually, or you can schedule it to occur automatically.
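
The expiration rule can be sketched as follows. This is an illustration under stated assumptions, not the server's implementation: "three months" is approximated as 90 days, resource descriptions are modeled as plain dictionaries, and the key names created and expires are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional

# Assumption: the three-month default is approximated as 90 days.
DEFAULT_LIFETIME = timedelta(days=90)


def expiration(created: datetime, explicit: Optional[datetime] = None) -> datetime:
    """Use the resource's own expiration date if given, else the default."""
    return explicit if explicit is not None else created + DEFAULT_LIFETIME


def purge_expired(rds: List[Dict], now: datetime) -> List[Dict]:
    """Return only the resource descriptions that have not yet expired."""
    return [rd for rd in rds
            if expiration(rd["created"], rd.get("expires")) > now]


rds = [
    {"url": "http://a.example/", "created": datetime(2001, 1, 1)},
    {"url": "http://b.example/", "created": datetime(2001, 1, 1),
     "expires": datetime(2002, 1, 1)},
]
kept = purge_expired(rds, now=datetime(2001, 6, 1))
print([rd["url"] for rd in kept])  # ['http://b.example/']
```
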

To purge expired resource descriptions from a server, do the following:

  1. Make sure the Compass Server is not running.
  2. Click Expire All RDs.

To schedule the purging of expired resource descriptions, do the following:

  1. Access the Server Manager for the server.
  2. Choose Tasks|Schedule Expire to display the Schedule Expire Agent form.
  3. Follow the directions for task scheduling in Scheduling Tasks.

Reindexing the Database

In certain instances, you might need to reindex the resource description database for the Compass Server. One obvious instance is if you have edited the schema to add or remove an indexed field. You might also need to reindex the database if a disk error corrupts the index file. It's also a good idea to reindex after adding a large number of new resource descriptions.

The time required to reindex the database is roughly proportional to the number of records in the database, so if you have a large database, you should probably perform reindexing at a time when the server is not in high demand.

To reindex the database, do the following:

  1. Make sure the Compass Server is turned off.
  2. Select Reindex the Database and click OK.

    The Compass Server rebuilds the search collection and its index files.

Recovering the Database

The page-locked main Compass database can hang if an abnormally terminated process leaves a stale lock behind. Recovery resets the database state files and frees hung locks. It also repairs damaged memory caches and transaction logs. Use it if the database appears to be hung and commands that read from or write to the database freeze, that is, never terminate. Recovery should be necessary only in rare circumstances.

Checking the Database

Each Compass Server stores its resource descriptions in a database. You can use the following procedure to get information about the number of sites indexed and the number of resources from each in the database. This report is also available to end users through the About Compass page.

The Compass Database Analysis form indicates whether a database analysis page has already been generated, and if so, whether the page is up-to-date. It also includes the site analysis report in the form of a table.

  • If the report is up-to-date, click the link to view the report in a separate window, the same way end users will see it from the About Compass page.
  • If the page is not up-to-date, click OK to generate a new one.
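
The kind of per-site tally the analysis report contains can be approximated from a list of resource URLs. This sketch is only an illustration of the idea; the function name and sample URLs are hypothetical, and it treats the host part of each URL (including any port) as the site.

```python
from collections import Counter
from typing import Iterable
from urllib.parse import urlsplit


def resources_per_site(urls: Iterable[str]) -> Counter:
    """Count resources by the host part of each URL, ignoring letter case."""
    return Counter(urlsplit(url).netloc.lower() for url in urls)


urls = [
    "http://www.example.com/index.html",
    "http://www.example.com/products/list.html",
    "http://docs.example.com/guide.html",
]
for site, count in resources_per_site(urls).most_common():
    print(f"{site}: {count}")
```
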

Copyright © 2001 Sun Microsystems, Inc. Some preexisting portions Copyright © 2001 Netscape Communications Corp. All rights reserved.