Netscape Compass Server Administrator's Guide


Chapter 5
Setting Up Categories

Users interact with the Compass Server system in two distinct ways: They can type direct queries to search the database, or they can browse through the database contents using a set of categories you design. A hierarchy of categories is sometimes called a taxonomy. It is difficult to describe all the possible uses of categories, because the categories used will differ greatly among Compass Server systems.

Categorizing resources is like creating a table of contents for the database. Browsing is an optional feature in a Compass system. That is, you can have a perfectly useful Compass system that does not include browsing by categories. You need to decide whether adding browsable categories will be useful to the users of your index, and then what kind of categories you want to create.

This chapter describes the tools you can use to create and edit a set of categories for a Compass Server.

Understanding Categories

The term taxonomy in general describes any system of categories. In the context of a networked resource database such as Compass Server, it describes any method you choose of categorizing network resources to facilitate retrieval. In almost every situation there are different ways you could choose to organize your categories.

The Purpose of Categories

The resources in a Compass database are assigned to categories to make a large collection easier to navigate. If there is a large number of items in the database, it is helpful to group related items together. This allows users to quickly locate specific kinds of items, compare similar items, and choose which ones they want.

Such categorizing is common in the product and service indexes we use every day. Clothing catalogs divide men's, women's, and children's clothing, with each of those further subdivided for coats, shirts, shoes, and so on. An office products catalog could separate furniture from stationery, computers, and software. And the telephone "yellow pages" groups advertisements by categories of products and services.

Categories in Online Databases

The principles of categorical groupings in a printed index also apply to online indexes. The idea is to make it easy for users to locate resources of a certain type, so that they can choose the ones they want.

The categories you choose for your database will vary, however, depending on what resources your Compass Server indexes. For example, a very broad index, such as an Internet directory, might choose to categorize documents by general areas of knowledge, such as science, art, business, and so on. Each of those areas could in turn be subdivided into more specific categories. But the highest-level categories could just as easily be geographical, with each geographical division subdivided in various ways.

On the other hand, an index for a single company is more likely to have divisions meaningful to the company, perhaps relating to the organizational structure of the company, product lines, or geographic regions in which the company does business.

No matter what the scope of the index you design, the primary concern in setting up your categories should be usability. That is, you need to know how users will use the categories. For example, if you were designing an index for a company that has three main offices in different locations, you might make your top-level categories correspond to each of the three offices. But if users are more interested in, say, functional divisions that cut across the geographical boundaries, it might make more sense to categorize resources by corporate divisions.

Assigning Categories

Resource descriptions receive their categories when created. That is, when a robot discovers a resource and generates a resource description, it can assign a categorical designation. If the Compass Server's category hierarchy does not contain the categories assigned by the robot, the Compass Server ignores the category information, and you will need to assign categories manually.

Assigning categories with robot rules is described in Setting Classification Rules. Assigning categories manually is described in Assigning Categories Manually.

Authors can also embed explicit category information in their documents, as explained in Category Annotations. The robot uses this explicit category assignment directly, in addition to any categories assigned by rules.

When a Compass Server imports resource descriptions from another Compass Server or from a file, the categories in the imported resource descriptions must match those used by the importing Compass Server. Otherwise, the importing server ignores the category information, and you will need to assign categories manually.

In an ideal setup, a Compass Server's set of categories should be either identical to those of the other servers it imports from or a superset of all of them. In that case, no category information is lost when importing.
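The superset condition can be checked before an import. The following is a minimal sketch, assuming you already have each server's categories as a list of strings in the colon-separated notation; the function name is hypothetical and is not part of Compass Server.

```python
def lost_on_import(importing_categories, incoming_categories):
    """Return the incoming categories that the importing server does not define.

    Comparison is exact and case-sensitive. An empty result means the
    importing server's categories are a superset of the incoming ones.
    """
    known = set(importing_categories)
    return sorted(set(incoming_categories) - known)


importing = ["Company:Marketing", "Company:Marketing:Plans", "Company:Products"]
incoming = ["Company:Marketing:Plans", "Company:Finance"]
print(lost_on_import(importing, incoming))
```

Any category the function returns would be ignored on import, so resources carrying only those categories would have to be categorized manually.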

The following section on strategies describes some ways to ensure category compatibility among the servers and robots that make up your Compass Server system.

Category Strategies

There are several approaches you can take to creating and maintaining a set of categories for your Compass system. The most important consideration is usability because there is no point in creating a set of categories that no one can or will use. But it is also important to create a set of categories that is workable and maintainable for you.

As with all issues pertaining to categorizing resources, it is impossible to cover all the possibilities that might pertain to all your resources, but this section presents some of the most common and important considerations you will encounter.

If possible, you should try to take advantage of existing systems of classification that your users will recognize. If the servers you are indexing already have specific content divisions, you can mirror those in your categories. For example, if each department in a company has a server that contains its documents, you could create categories for each department and automatically mark resources from those servers with the corresponding category.

Similarly, if a directory structure, document-naming conventions, or internal coding schemes already exist, you can take advantage of them and incorporate them in the classification of resources in your database.

Categorizing by Location

The simplest way to assign categories to resources is to assign the same category to all the resources from a particular server. That way, the robot does not make any choices but instead assigns one category to all the resources it discovers. For example, if you send a robot to enumerate all the documents on a server for the finance department of your company, you could have that robot assign each document a finance department category code.

This approach can work well even in an environment where a single robot covers a number of servers. You can program the robot's filtering to assign different categories to the resources discovered on different, specific servers.

Categorizing by server works well in cases where a single server contains similar resources. However, many servers contain a mixture of resources. In that case, you can assign categories on more specific addresses, such as specific directories on specific servers.
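The idea of categorizing by server, and then by directory on a server, can be sketched as a small lookup. This is an illustration only and not the robot's rule syntax, which is described in Setting Classification Rules. The host names, path prefixes, and categories below are hypothetical, and the choice to let the longest matching prefix win is an assumption.

```python
from urllib.parse import urlsplit

# Hypothetical rules: (host, path prefix, category). None of these names
# come from a real Compass Server configuration.
LOCATION_RULES = [
    ("finance.example.com", "/", "Company:Finance"),
    ("www.example.com", "/marketing/", "Company:Marketing"),
    ("www.example.com", "/marketing/whitepapers/", "Company:Marketing:White Papers"),
]


def category_for_url(url, rules=LOCATION_RULES):
    """Return the category of the most specific matching rule, or None."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    path = parts.path or "/"
    best = None  # (prefix, category) of the longest matching prefix so far
    for rule_host, prefix, category in rules:
        if host == rule_host and path.startswith(prefix):
            if best is None or len(prefix) > len(best[0]):
                best = (prefix, category)
    return best[1] if best else None


print(category_for_url("http://finance.example.com/reports/q1.html"))
print(category_for_url("http://www.example.com/marketing/whitepapers/a.html"))
```

A whole-server rule uses the prefix "/", while a directory rule narrows the match to part of a server that holds mixed resources.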

Categorizing by Matching

The robot can assign categories on almost any basis you choose. One useful feature is the ability to match any data contained in the resource description. So, you could look for specific keywords, server names, company or product names, file types, author names, and so on, and assign categories accordingly.
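Matching on resource description data can be pictured as follows. This is a hedged sketch, not the robot's actual rule engine: the field names (title, author, url), the rules, and the case-insensitive substring test are all assumptions made for illustration.

```python
# Hypothetical rules: (field, text to look for, category to assign).
MATCH_RULES = [
    ("title", "budget", "Company:Finance"),
    ("author", "twain", "Literature:Books:Fiction"),
    ("url", ".pdf", "Company:Documents:PDF"),
]


def categories_for(description, rules=MATCH_RULES):
    """Return every category whose rule matches the resource description.

    `description` is a plain dict standing in for a resource description.
    """
    matched = []
    for field, needle, category in rules:
        value = str(description.get(field, "")).lower()
        if needle in value and category not in matched:
            matched.append(category)
    return matched


rd = {"title": "1998 Budget Overview", "url": "http://www.example.com/budget.pdf"}
print(categories_for(rd))
```

Because more than one rule can match, a single resource can end up in several categories.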

Categorizing by Annotation

Documents can contain explicit category information. When the robot finds these category annotations in a document, it automatically assigns the specified category to the document if the specified category exists. The robot still applies its classification rules, which can result in additional category assignments.

Explicit category assignments take precedence over assignments by the robot. For more information about specifying category assignments in documents, see Category Annotations.

Category Notation

Most categories are assigned automatically by robots, but it is sometimes necessary to assign them manually. In that case, you must use proper notation.

Because categories are hierarchical, you must specify the entire classification of a resource when placing it in a category. The notation for resource description categories separates categorical levels with a colon (:). For example, to specify a category "White Papers" as a subcategory of "Marketing Documents," which is a subcategory of "Company Documents," you specify as follows:

Company Documents:Marketing Documents:White Papers
NOTE: Category entries are case-sensitive.
You can concatenate multiple category names with semicolons. For example:

Company:Marketing:Plans;Company:Products
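If you generate or check category strings in a script, the notation is simple to split and rejoin. The following is a minimal sketch; the function names are hypothetical, and trimming whitespace around each name is an assumption rather than documented Compass behavior.

```python
def parse_categories(value):
    """Split a category string into a list of paths, each a list of level names."""
    paths = []
    for entry in value.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing or doubled semicolon
        # Colons separate levels. Names keep their case, because
        # category entries are case-sensitive.
        paths.append([level.strip() for level in entry.split(":")])
    return paths


def format_categories(paths):
    """Inverse of parse_categories: join levels with ':' and paths with ';'."""
    return ";".join(":".join(levels) for levels in paths)


paths = parse_categories("Company:Marketing:Plans;Company:Products")
print(paths)
print(format_categories(paths))
```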

Category Annotations

Some kinds of files and editors enable the author to specify directly the category in which a document is placed in the hierarchy.

Annotations and Robots

The robot recognizes HTML META tags for specifying certain kinds of information, such as the name of the document author. The format of a META tag is as follows:

<META name=[string] content=[string]>
The content of a classification tag must conform to the standard notation for category names shown in Category Notation. For example, the following set of META tags defines both the name of the document author and a category to assign to the document.

<META name="Author" content="Mark Twain">
<META name="Classification" content="Literature:Books:Fiction">
You can assign multiple categories either by using the standard concatenation of categories or by using multiple classification META tags. For example, the single tag below is equivalent to the pair of tags that follows it:

<META name="Classification" content="Animals:Mammals:Marine;Animals:Aquatic:Whales">

<META name="Classification" content="Animals:Mammals:Marine">
<META name="Classification" content="Animals:Aquatic:Whales">
The specific META tag names recognized by the default robots are Author, Classification, Description, and Keywords. All of these can be created manually, but they can also be added using Netscape Composer, as described in the next section.
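To check which categories a document declares before the robot visits it, you can read the Classification tags yourself. The following sketch uses Python's standard html.parser module. It is not the robot's implementation; matching the tag name without regard to case and dropping duplicate categories are assumptions.

```python
from html.parser import HTMLParser


class ClassificationParser(HTMLParser):
    """Collect category paths from META tags named "Classification"."""

    def __init__(self):
        super().__init__()
        self.categories = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lowercase.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "classification":
            return
        content = attributes.get("content") or ""
        # One tag may carry several categories joined by semicolons.
        for entry in content.split(";"):
            entry = entry.strip()
            if entry and entry not in self.categories:
                self.categories.append(entry)


def extract_categories(html_text):
    """Return the categories declared in an HTML document, in document order."""
    parser = ClassificationParser()
    parser.feed(html_text)
    return parser.categories


page = '<META name="Author" content="Mark Twain">\n' \
       '<META name="Classification" content="Literature:Books:Fiction">'
print(extract_categories(page))
```

The concatenated form and the multiple-tag form produce the same list, which is the sense in which the two forms are equivalent.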

Annotations and Netscape Composer

If you edit an HTML document with the Netscape Composer editor, you can use the document properties to specify a classification for the document. Specifically, if you access the properties (Format|Page Colors and Properties) for a document, one of the properties on the General page is called Classification. If you type a string in that field, Composer writes the string as the value of a META tag named "Classification."

The robot will then use the string as the category when generating a resource description for the document. You must be sure to use a category that actually exists in the hierarchy and use the proper notation, as described in Category Notation.

Creating a New Set of Categories

Almost every new Compass system will need its own set of categories. Because each company, network, or search service has users with distinct needs, different indexes will categorize their documents differently. In all likelihood, therefore, you will need to create a completely new set of categories at least once. Once you have created the new categories, you can share them with other servers by copying, as explained in Copying an Existing Set of Categories.

Netscape Compass Server comes with several sample category hierarchies you can use as starting points for your own set. You can also use them for ideas on different ways you might employ categories in your Compass system.

There are two different ways to create a new set of categories. The first is to delete the existing one and manually create another in its place. The other is to overwrite the existing one with a complete set of categories from another server.

Creating a Set of Categories Manually

When you initially create a new set of categories, you use the Category Construction form in the Server Manager. This form contains a Java applet for creating or modifying the Compass Server category hierarchy.

To create a new set of categories, do the following:

  1. Click Enable Java Applet to start the Category Editor in a separate window.
    The hierarchy of categories appears in outline form.

  2. Select the top-level category.

  3. Click Delete.
    This will delete all the categories in the hierarchy and start a new set with a single topic called "Top."

  4. Rename the new top-level category and add subcategories as described in Editing Categories.

  5. Choose File|Categories Save to record your new categories.
If you plan to use these new categories for other servers, too, you should now copy the new set of categories to those other servers.

Copying an Existing Set of Categories

Each Compass Server stores its categories in a file called taxonomy.rdm. In general, you should not modify this file manually. Instead, use the category editor to make any changes to the categories.

You can, however, copy an entire set of categories from one server to another by copying the taxonomy.rdm file. For example, if you design an entire set of categories for a main Compass Server, then want to use the same categories for a remote Compass Server, copy the file from the server where you have created the new set of categories to the server where you want the new categories reflected.

Locating taxonomy.rdm

The taxonomy.rdm file containing a server's categories is in the config directory of that server's own directory, which is located under the server root directory. That is,

ServerRoot/compass-name/config/taxonomy.rdm
For example, for a Compass Server called info on a default Unix installation, the path would be

/usr/Netscape/SuiteSpot/compass-info/config/taxonomy.rdm
On a Windows NT system, the default path for server info would be

C:\Netscape\SuiteSpot\compass-info\config\taxonomy.rdm
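If you script the copy, you can build the path from the server root and the server name. The following is a small sketch based only on the layout shown above; the function name is hypothetical.

```python
from pathlib import PurePosixPath, PureWindowsPath


def taxonomy_path(server_root, server_name, windows=False):
    """Build ServerRoot/compass-name/config/taxonomy.rdm for one server."""
    path_type = PureWindowsPath if windows else PurePosixPath
    return path_type(server_root) / f"compass-{server_name}" / "config" / "taxonomy.rdm"


print(taxonomy_path("/usr/Netscape/SuiteSpot", "info"))
print(taxonomy_path(r"C:\Netscape\SuiteSpot", "info", windows=True))
```

The pure path classes only format the path. They do not touch the file system, so the sketch can be run on any machine.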

Example: Copying a Sample Category Set

Netscape Compass Server comes with several sample category sets you can use as starting points for your own categories. You might also use them for ideas on how to create a whole new set of categories.

The sample category sets are stored in a directory under the binary files for the Compass Server:

ServerRoot/bin/compass/samples/taxonomy
You can copy one of the sample sets over the default categories installed with a new server.

NOTE: You should not copy a sample over a real set of categories you have created unless you are sure you have a backup copy you can restore later.
To copy one of the samples, do the following:

  1. Go to the command line.

  2. Change to the server root directory.

  3. Type the copy command appropriate for your operating system.

  4. Stop and start the server so the new categories take effect.
For this example, we'll assume you've installed a Compass Server called info in the default server root directory, and that you want to copy the sample categories in the file kc1.soif.

For a Unix system, the commands are as follows:

cd /usr/Netscape/SuiteSpot
cp bin/compass/samples/taxonomy/kc1.soif compass-info/config/taxonomy.rdm
For a Windows NT system, the commands are as follows:

cd C:\Netscape\SuiteSpot
copy bin\compass\samples\taxonomy\kc1.soif compass-info\config\taxonomy.rdm
After you copy the sample set of categories, stop and start the server.
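Because copying a sample overwrites the current categories, a script can keep a backup first. The following is a minimal sketch that mirrors the copy commands above. The function name and the ".bak" suffix are assumptions, and the script does not stop or start the server, so you still need to do that yourself.

```python
import shutil
from pathlib import Path


def install_sample(server_root, server_name, sample_name):
    """Copy a sample category set over taxonomy.rdm, backing up the old file.

    Returns the path of the backup, or None if there was nothing to back up.
    """
    root = Path(server_root)
    sample = root / "bin" / "compass" / "samples" / "taxonomy" / sample_name
    target = root / f"compass-{server_name}" / "config" / "taxonomy.rdm"
    backup = None
    if target.exists():
        # Hypothetical naming convention: taxonomy.rdm.bak next to the original.
        backup = target.with_name(target.name + ".bak")
        shutil.copy2(target, backup)
    shutil.copy2(sample, target)
    return backup


# Example call (commented out because it needs a real installation):
# install_sample("/usr/Netscape/SuiteSpot", "info", "kc1.soif")
```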

Editing Categories

The Server Manager for Compass Servers contains items under Categories you can use to customize the categories for your servers.

Before modifying categories, you should consider the issues of compatibility, as discussed under Category Strategies. For example, if you add categories to the hierarchy for one server in a multi-server Compass system, you should make sure there is a corresponding category in the hierarchy of any Compass Server that imports resource descriptions from it. Similarly, if you add categories to the hierarchy of a Compass Server, you should provide some way to import documents into that category.

Using the Category Editor

You can use the Category Editor to make any changes to your set of categories. The editor itself is a Java applet that presents the hierarchy of categories in an expandable outline that you can manipulate directly. Subcategories always appear in alphabetical order under their parent categories.

This section describes how to start the Category Editor and how to use it to add, rename, and delete categories.

Although changes you make appear instantly in the editor, none of them actually take effect on the server until you commit the changes by choosing File|Categories Save. If you leave the Category Editor without saving, you will be reminded to save. If you do not save at that time, your changes will be lost.

Also, after editing categories, you should always reindex the categories so that user searches will reflect the changes you have made. Reindexing categories is explained in Reindexing Categories.

Starting the Category Editor

To start the Category Editor, do the following:

  1. Access the Server Manager for the Compass Server.

  2. Choose Categories|Construction to open the Category Construction Editor form.

  3. Click Enable Java Applet to open the Category Editor in a separate window.

Adding Categories

There are two ways to add a category to the hierarchy: as a peer or as a child. The difference is defined by the relationship to the currently selected category. If you select a category and add a category at the same level, the new category is called a peer. If you create the new category as a subcategory of the selected category, the new category is called a child.

To add a category to the hierarchy, do the following:

  1. Start the Category Editor.

  2. Select the category to which you want to add a peer or a child.
    You might need to scroll and expand categories and subcategories to locate the category you want.

  3. Click New Peer or New Child.
    The new category appears in the hierarchy at the appropriate level, with its name, "New," highlighted.

  4. Type the name for the new category.
    The Category Editor will automatically reposition the category in alphabetical order with its peers when you press Enter.

  5. Repeat steps 2-4 as needed for additional categories.
After adding all your new categories, be sure to save them by choosing File|Categories Save.
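The peer and child relationships, and the alphabetical ordering of subcategories, can be modeled with a small tree. This is a conceptual sketch, not the Category Editor's code. The class and method names are hypothetical, ordering uses Python's default string comparison, and both the rule that the top-level category has no peers and the omission of the top-level name from the path are assumptions.

```python
class Category:
    """One node in a category hierarchy."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []

    def add_child(self, name):
        """Add a subcategory and keep the children in alphabetical order."""
        child = Category(name, parent=self)
        self.children.append(child)
        self.children.sort(key=lambda c: c.name)
        return child

    def add_peer(self, name):
        """Add a category at the same level, under the same parent."""
        if self.parent is None:
            raise ValueError("the top-level category has no peers")
        return self.parent.add_child(name)

    def path(self):
        """Return the colon-separated path, leaving out the top-level name."""
        names = []
        node = self
        while node.parent is not None:
            names.append(node.name)
            node = node.parent
        return ":".join(reversed(names))


top = Category("Top")
marketing = top.add_child("Marketing")
marketing.add_peer("Finance")
papers = marketing.add_child("White Papers")
print([c.name for c in top.children])
print(papers.path())
```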

Renaming Categories

The Category Editor always shows peer categories in alphabetical order, making it easier for you to find them. This ordering is less important to users of the Compass Server. Creating meaningful, clear names is more important than their order.

To rename a category, do the following:

  1. Start the Category Editor.

  2. Select the category you want to rename.
    You might need to scroll and expand categories and subcategories to locate the category you want.
    The selected category's name is highlighted in a text box.

  3. Type the new name for the category.
    The Category Editor will automatically reposition the category in alphabetical order with its peers when you press Enter.

  4. Repeat steps 2-3 as needed for additional categories.
After renaming categories, be sure to save your changes by choosing File|Categories Save.

Deleting Categories

To delete a category, do the following:

  1. Start the Category Editor.

  2. Select the category you want to delete.
    You might need to scroll through the category outline to find the category you want.

  3. Click Delete.

  4. Repeat steps 2-3 as needed to delete additional categories.
When you finish deleting categories, choose File|Categories Save to accept the changes you've made.

Assigning Categories Manually

In some cases, your robots will not assign categories to some resources they locate. This is particularly true when you first create your database and have not yet worked out a complete set of category rules.

You can assign these uncategorized resources to categories manually through a graphical editor in the Server Manager. The editor lists the uncategorized resources, and the size of that list can help you decide how to handle them.

To see the list of uncategorized documents, do the following:

  1. Click Enable Java Applet to launch the RD Editor applet in a separate window.

    The Categorize URLs screen shows a list of the URLs for resources in the database that have no category assigned.

  2. Click the Search button to search for unclassified resources.

  3. Next, determine how you want to handle the uncategorized resources.

Reindexing Categories

Netscape Compass Server allows end users to search for categories in the database in addition to resources. For example, if a user searches for the term "server" and your hierarchy of categories contains one or more categories with the word "server" in them, the search results will list those categories in addition to specific resources, enabling the user to easily browse those categories for further information.

Before users can search for categories, however, you must index those categories, making them searchable.

In addition, if you change the hierarchy by adding, deleting, or renaming categories, you should reindex the categories to keep searches up-to-date.
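The reason for reindexing can be seen in a toy model of a category index. This sketch is not how Compass Server stores or searches its index; the function names, the word splitting, and the case-insensitive lookup are assumptions made for illustration.

```python
import re
from collections import defaultdict


def build_category_index(category_paths):
    """Map each lowercase word in a category path to the paths containing it."""
    index = defaultdict(set)
    for path in category_paths:
        for word in re.findall(r"\w+", path.lower()):
            index[word].add(path)
    return index


def search_categories(index, term):
    """Return the indexed category paths that contain the given word."""
    return sorted(index.get(term.lower(), set()))


paths = ["Products:Server Software", "Products:Client Software"]
index = build_category_index(paths)

# A category added after indexing is invisible to searches...
paths.append("Support:Server Administration")
print(search_categories(index, "server"))

# ...until the index is rebuilt.
index = build_category_index(paths)
print(search_categories(index, "server"))
```

The index is a snapshot of the hierarchy at the time it was built, which is why each change to the categories should be followed by reindexing.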

To index the current set of categories, do the following:

  1. Click Index.
The index is now up-to-date, and searches will reflect the proper categories.



Last Updated: 02/12/98 13:34:10


Copyright © 1997 Netscape Communications Corporation

Any sample code included above is provided for your use on an "AS IS" basis, under the Netscape License Agreement - Terms of Use