Categorizing resources is like creating a table of contents for the database. Browsing is an optional feature in a Compass system. That is, you can have a perfectly useful Compass system that does not include browsing by categories. You need to decide whether adding browsable categories will be useful to the users of your index, and then what kind of categories you want to create. This chapter describes the tools you can use to create and edit a set of categories for a Compass Server.
Authors can also embed explicit category information in their documents, as explained in Category Annotations. The robot uses this explicit category assignment directly, in addition to any categories assigned by rules.
When a Compass Server imports resource descriptions from another Compass Server or from a file, the categories in the imported resource descriptions must match those used by the importing Compass Server. Otherwise, the importing server ignores the category information, and you will need to assign categories manually.
In an ideal setup, a Compass Server's set of categories should be either identical to those of the other servers it imports from or a superset of all of them. In that case, no category information is lost when importing.
The following section on strategies describes some ways to ensure category compatibility among the servers and robots that make up your Compass Server system.
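The superset condition described above can be checked mechanically before an import. The following is a minimal Python sketch, assuming each server's categories are available as a collection of full category-path strings. The function name and the sample category names are hypothetical and are not part of Compass Server.

```python
def missing_categories(local, imported):
    """Return the imported categories that the importing server lacks.

    Both arguments are collections of full category paths such as
    "Company:Marketing:Plans". An empty result means the local set is
    identical to, or a superset of, the imported set.
    """
    return sorted(set(imported) - set(local))


# Hypothetical category sets for an importing server and a remote server.
local = {"Company:Marketing", "Company:Marketing:Plans", "Company:Products"}
remote = {"Company:Marketing:Plans", "Company:Finance"}

for category in missing_categories(local, remote):
    print("Not defined on the importing server:", category)
```

Any category the sketch reports would be ignored on import, so resources in that category would need to be categorized manually.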
Category Strategies
There are several approaches you can take to creating and maintaining a set of categories for your Compass system. The most important consideration is usability because there is no point in creating a set of categories that no one can or will use. But it is also important to create a set of categories that is workable and maintainable for you.
As with all issues pertaining to categorizing resources, it is impossible to cover all the possibilities that might pertain to all your resources, but this section presents some of the most common and important considerations you will encounter.
If possible, you should try to take advantage of existing systems of classification that your users will recognize. If the servers you are indexing already have specific content divisions, you can mirror those in your categories. For example, if each department in a company has a server that contains its documents, you could create categories for each department and automatically mark resources from those servers with the corresponding category.
Similarly, if a directory structure, document-naming conventions, or internal coding schemes already exist, you can take advantage of these systems and incorporate them in the classification of resources in your database.
Categorizing by Location
The simplest way to assign categories to resources is to assign the same category to all the resources from a particular server. That way, the robot does not make any choices but instead assigns one category to all the resources it discovers. For example, if you send a robot to enumerate all the documents on a server for the finance department of your company, you could have that robot assign each document a finance department category code.
This approach can work well even in an environment where a single robot covers a number of servers. You can program the robot's filtering to assign different categories to the resources discovered on different, specific servers.
Categorizing by server works well in cases where a single server contains similar resources. However, many servers contain a mixture of resources. In that case, you can assign categories on more specific addresses, such as specific directories on specific servers.
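To make the idea concrete, the following Python sketch maps a resource's address to a category by server name and directory prefix. It only illustrates the logic and is not the robot's actual filtering syntax. The host names, paths, and category names are hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical rules: (server name, directory prefix, category).
# A prefix of "/" covers every resource on that server.
LOCATION_RULES = [
    ("finance.example.com", "/", "Company:Finance"),
    ("www.example.com", "/marketing/", "Company:Marketing"),
    ("www.example.com", "/products/", "Company:Products"),
]


def category_for_url(url, rules=LOCATION_RULES):
    """Return the category of the most specific matching rule, or None."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    path = parts.path or "/"
    best = None
    for rule_host, prefix, category in rules:
        if host == rule_host and path.startswith(prefix):
            # Prefer the longest directory prefix on the same server.
            if best is None or len(prefix) > len(best[0]):
                best = (prefix, category)
    return best[1] if best else None
```

Choosing the longest matching prefix lets a rule for a whole server coexist with narrower rules for specific directories on that server.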
Categorizing by Matching
The robot can assign categories on almost any basis you choose. One useful feature is the ability to match any data contained in the resource description. So, you could look for specific keywords, server names, company or product names, file types, author names, and so on, and assign categories accordingly.
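As an illustration of matching on resource description data, the sketch below applies a list of pattern rules to a resource description represented as a Python dictionary. The field names, patterns, and the dictionary representation are assumptions made for the example and do not reflect the robot's actual rule format.

```python
import re

# Hypothetical rules: (field to inspect, pattern, category to assign).
MATCH_RULES = [
    ("author", re.compile(r"twain", re.IGNORECASE), "Literature:Books:Fiction"),
    ("keywords", re.compile(r"\bwhales?\b", re.IGNORECASE), "Animals:Aquatic:Whales"),
    ("url", re.compile(r"/products/"), "Company:Products"),
]


def categories_for(description, rules=MATCH_RULES):
    """Return every category whose rule matches the description."""
    found = []
    for field, pattern, category in rules:
        value = description.get(field, "")
        if pattern.search(value) and category not in found:
            found.append(category)
    return found
```

A description can match several rules, in which case it receives several categories.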
Categorizing by Annotation
Documents can contain explicit category information. When the robot finds these category annotations in a document, it automatically assigns the specified category to the document if the specified category exists. The robot still applies its classification rules, which can result in additional category assignments.
Explicit category assignments take precedence over assignments by the robot.
For more information about specifying category assignments in documents, see Category Annotations.
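The interaction between annotations and rules can be sketched as follows. This is one reading of the behavior described above, not the robot's implementation: annotated categories are kept only if they exist in the hierarchy, and rule-based categories are then added. Listing the explicit categories first is an assumption about how precedence might be represented. All names are hypothetical.

```python
def assign_categories(annotated, rule_based, known):
    """Combine explicit and rule-based categories for one document.

    annotated:  categories named in the document itself
    rule_based: categories produced by classification rules
    known:      categories that exist in the server's hierarchy
    """
    # Annotations naming categories that do not exist are dropped.
    result = [c for c in annotated if c in known]
    # Rules can still contribute additional categories.
    for category in rule_based:
        if category not in result:
            result.append(category)
    return result
```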
Category Notation
Most categories are assigned automatically by robots, but it is sometimes necessary to assign them manually. In that case, you must use proper notation.
Because categories are hierarchical, you must specify the entire classification of a resource when placing it in a category. The notation for resource description categories separates categorical levels with a colon (:). For example, to specify a category "White Papers" as a subcategory of "Marketing Documents," which is a subcategory of "Company Documents," you specify as follows:
Company Documents:Marketing Documents:White Papers
NOTE: Category entries are case-sensitive.
You can concatenate multiple category names with semicolons. For example:
Company:Marketing:Plans;Company:Products
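If you generate or check category strings with a script, the notation is straightforward to split and rebuild. The following Python sketch assumes only the colon and semicolon rules described above. The function names are hypothetical, and names are left exactly as written because category entries are case-sensitive.

```python
def parse_categories(value):
    """Split a concatenated category string into lists of levels."""
    paths = []
    for entry in value.split(";"):
        if not entry:
            continue  # tolerate a stray leading or trailing semicolon
        levels = entry.split(":")
        if any(level == "" for level in levels):
            raise ValueError("empty level in category: %r" % entry)
        paths.append(levels)
    return paths


def format_categories(paths):
    """Rebuild the concatenated notation from lists of levels."""
    return ";".join(":".join(levels) for levels in paths)
```

For the example above, parse_categories returns two paths: one with the levels Company, Marketing, and Plans, and one with the levels Company and Products.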
Category Annotations
HTML documents can include META tags for specifying certain kinds of information, such as the name of the document author. The format of a META tag is as follows:
<META name=[string] content=[string]>
The content of a classification tag must conform to the standard notation for category names shown in Category Notation. For example, the following set of META tags defines both the name of the document author and a category to assign to the document.
<META name="Author" content="Mark Twain">
<META name="Classification" content="Literature:Books:Fiction">
You can assign multiple categories either by using the standard concatenation of categories or by using multiple classification META tags. For example, the following sets of tags are equivalent:
<META name="Classification" content="Animals:Mammals:Marine;Animals:Aquatic:Whales">

<META name="Classification" content="Animals:Mammals:Marine">
<META name="Classification" content="Animals:Aquatic:Whales">
The specific META tag names recognized by the default robots are Author, Classification, Description, and Keywords. All of these can be created manually, but they can also be added using Netscape Composer, as described in the next section.
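To check which categories a document declares before the robot visits it, you can extract the classification tags yourself. The following Python sketch uses the standard html.parser module. It is an illustration and not the robot's parser, and matching the tag name exactly as "Classification" is an assumption. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class ClassificationParser(HTMLParser):
    """Collect the categories named in Classification META tags."""

    def __init__(self):
        super().__init__()
        self.categories = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lowercase.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        content = attributes.get("content") or ""
        if name == "Classification":
            # One tag may carry several semicolon-separated categories.
            self.categories.extend(c for c in content.split(";") if c)


def classifications(html):
    parser = ClassificationParser()
    parser.feed(html)
    parser.close()
    return parser.categories
```

Applied to the two equivalent sets of tags shown above, the sketch yields the same list of two categories for each.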
Annotations and Netscape Composer
If you edit an HTML document with the Netscape Composer editor, you can use the document properties to specify a classification for the document. Specifically, if you access the properties (Format|Page Colors and Properties) for a document, one of the properties on the General page is called Classification. If you type a string in that field, Composer writes the string as the value of a META tag named "Classification."
The robot will then use the string as the category when generating a resource description for the document. You must be sure to use a category that actually exists in the hierarchy and use the proper notation, as described in Category Notation.
Creating a New Set of Categories
Almost every new Compass system will need its own set of categories. Because each company, network, or search service and their users have distinct needs, different indexes will categorize their documents differently. In all likelihood, therefore, you will need to create a completely new set of categories at least once. Once you have created the new categories, you can share them with other servers by copying, as explained in Copying an Existing Set of Categories.
Netscape Compass Server comes with several sample category hierarchies you can use as starting points for your own set. You can also use them for ideas on different ways you might employ categories in your Compass system.
There are two different ways to create a new set of categories. The first is to delete the existing one and manually create another in its place. The other is to overwrite the existing one with a complete set of categories from another server.
Creating a Set of Categories Manually
When you initially create a new set of categories, you use the Category Construction form in the Server Manager. This form contains a Java applet for creating or modifying the Compass Server category hierarchy.
To create a new set of categories, do the following:
The hierarchy of categories appears in outline form.
This will delete all the categories in the hierarchy and start a new set with a single topic called "Top."
Copying an Existing Set of Categories
The set of categories is stored in a file called taxonomy.rdm. In general, you should not modify this file manually. Instead, use the category editor to make any changes to the categories.
You can, however, copy an entire set of categories from one server to another by copying the taxonomy.rdm file. For example, if you design an entire set of categories for a main Compass Server, then want to use the same categories for a remote Compass Server, copy the file from the server where you have created the new set of categories to the server where you want the new categories reflected.
The file is in the config directory under the files for each server under the server root directory. That is,
ServerRoot/compass-name/config/taxonomy.rdm
For example, for a Compass Server called info on a default Unix installation, the path would be
/usr/Netscape/SuiteSpot/compass-info/config/taxonomy.rdm
On a Windows NT system, the default path for server info would be
C:\Netscape\SuiteSpot\compass-info\config\taxonomy.rdm
The sample sets of categories are in the following directory:
ServerRoot/bin/compass/samples/taxonomy
You can copy one of the sample sets over the default categories installed with a new server.
NOTE: You should not copy a sample over a real set of categories you have created unless you are sure you have a backup copy you can restore later.
To copy one of the samples, do the following:
The following examples assume that you have a Compass Server called info in the default server root directory, and that you want to copy the sample categories in the file kc1.soif.
For a Unix system, the commands are as follows:
cd /usr/Netscape/SuiteSpot
cp bin/compass/samples/taxonomy/kc1.soif compass-info/config/taxonomy.rdm
For a Windows NT system, the commands are as follows:
cd C:\Netscape\SuiteSpot
copy bin\compass\samples\taxonomy\kc1.soif compass-info\config\taxonomy.rdm
After you copy the sample set of categories, stop and start the server.
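Because copying a sample overwrites the existing taxonomy.rdm file, it can be convenient to script the copy so that a backup is always made first. The following Python sketch is one way to do that with the standard shutil and pathlib modules. The function name and the .bak suffix are hypothetical choices, and the server must still be stopped and started afterward.

```python
import shutil
from pathlib import Path


def install_taxonomy(sample, target, backup_suffix=".bak"):
    """Copy a sample category file over target, backing up target first.

    Returns the path of the backup, or None if target did not exist.
    """
    sample = Path(sample)
    target = Path(target)
    backup = None
    if target.exists():
        # Keep the current categories so they can be restored later.
        backup = target.with_name(target.name + backup_suffix)
        shutil.copy2(target, backup)
    shutil.copy2(sample, target)
    return backup


# Example call for the Unix paths used above (not run here):
# install_taxonomy(
#     "/usr/Netscape/SuiteSpot/bin/compass/samples/taxonomy/kc1.soif",
#     "/usr/Netscape/SuiteSpot/compass-info/config/taxonomy.rdm",
# )
```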
Using the Category Editor
You can use the Category Editor to make any changes to your set of categories. The editor itself is a Java applet that presents the hierarchy of categories in an expandable outline that you can manipulate directly. Subcategories always appear in alphabetical order under their parent categories.
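The outline the editor presents can be thought of as a tree whose children are kept sorted. The following Python sketch models that behavior for illustration only. It is not the applet's code, and ordering names by plain string comparison is an assumption. The class and method names are hypothetical.

```python
import bisect


class Category:
    """A node in a category hierarchy with alphabetically ordered children."""

    def __init__(self, name):
        self.name = name
        self.children = []

    def add(self, name):
        """Insert a subcategory at its alphabetical position."""
        child = Category(name)
        names = [c.name for c in self.children]
        self.children.insert(bisect.bisect_left(names, name), child)
        return child

    def outline(self, depth=0):
        """Return the hierarchy as indented lines, one per category."""
        lines = ["  " * depth + self.name]
        for child in self.children:
            lines.extend(child.outline(depth + 1))
        return lines


top = Category("Top")
top.add("Products")
marketing = top.add("Marketing")
marketing.add("White Papers")
marketing.add("Plans")
print("\n".join(top.outline()))
```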
This section describes the following tasks involving the Category Editor:
Starting the Category Editor
To start the Category Editor, do the following:
To add a category to the hierarchy, do the following:
You might need to scroll and expand categories and subcategories to locate the category you want.
The new category appears in the hierarchy at the appropriate level, with its name, "New," highlighted.
The Category Editor will automatically reposition the category in alphabetical order with its peers when you press Enter.
You might need to scroll and expand categories and subcategories to locate the category you want.
The selected category's name is highlighted in a text box.
The Category Editor will automatically reposition the category in alphabetical order with its peers when you press Enter.
You might need to scroll through the category outline to find the category you want.
The Categorize URLs screen shows a list of the URLs for resources in the database that have no category assigned.
Last Updated: 02/12/98 13:34:10
Any sample code included above is provided for your use on an "AS IS" basis, under the Netscape License Agreement - Terms of Use