Sun Java System Portal Server 7.1 Administration Guide

Chapter 11 Managing the Search Server

This chapter describes how to configure and administer the Sun Java™ System Portal Server Search Server.

This chapter contains these sections:

Understanding the Search Server

The Portal Server Search Server is a taxonomy and database service designed to support search and browse interfaces similar to popular Internet search servers such as Google and Alta Vista. The Search Server includes a robot to discover, convert, and summarize document resources. The Portal Server Desktop includes a search user interface based on JavaServer Pages™ (JSP™). The Search Server includes administration tools for configuration editing and command-line tools for system management. Configuration settings can be defined and stored through the Portal Server management console.


Note –

The management console permits an administrator to configure a majority of the search server options, but it does not perform all the administrative functions available through the command-line interface.


Search Database

Users query the search server's databases to locate resources. Individual entries in each database are called resource descriptions (RDs). A resource description provides summary information about a single resource. The database schema determines the fields of each resource description.

The search server is based on open Internet standards such as Resource Description Messages (RDM) and the Summary Object Interchange Format (SOIF) to ensure that the search server can operate in a cross-platform enterprise environment.
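As an illustration of what a resource description can look like in SOIF, the following minimal sketch serializes a set of fields as one SOIF-style object. It assumes the general SOIF layout (a template type, a URL, and attribute lines that carry the byte size of each value); the function name `to_soif` and the sample field names are hypothetical and are not taken from the Portal Server schema.

```python
def to_soif(template_type, url, fields):
    """Serialize one resource description as a simplified SOIF object."""
    lines = [f"@{template_type} {{ {url}"]
    for name, value in fields.items():
        # SOIF records the size of each value in bytes, in braces after the name.
        size = len(value.encode("utf-8"))
        lines.append(f"{name}{{{size}}}:\t{value}")
    lines.append("}")
    return "\n".join(lines)


# Hypothetical sample data, for illustration only.
rd = to_soif(
    "DOCUMENT",
    "http://www.example.com/index.html",
    {"title": "Example Home Page", "author": "J. Doe"},
)
print(rd)
```

Because each value is prefixed with its size, a reader of the format can consume values that contain newlines or braces without escaping.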

Database Taxonomy Categories

Users interact with the search system in two ways. They can type direct queries to search the database, or they can browse through the database contents using a set of categories that you design. A hierarchy of categories is sometimes called a taxonomy. Categorizing resources is like creating a table of contents for the database.

Browsing is an optional feature in a search system. That is, you can have a perfectly useful Search system that does not include browsing by categories. You need to decide whether adding categories that users can browse is useful to the users of your index, and, if so, what kind of categories you want to create.

The resources in a Search database are assigned to categories to reduce complexity. If a large number of items are in the database, grouping related items together is helpful. Doing so allows users to quickly locate specific kinds of items, compare similar items, and choose which ones they want.

Such categorizing is common in product and service indexes. Clothing catalogs divide men’s, women’s, and children’s clothing, with each of those further subdivided for coats, shirts, shoes, and other items. An office products catalog could separate furniture from stationery, computers, and software. And advertising directories are arranged by categories of products and services.

The principles of categorical groupings in a printed index also apply to online indexes. The idea is to make it easy for users to locate resources of a certain type, so that they can choose the ones they want. No matter what the scope of the index you design, the primary concern in setting up your categories should be usability. You need to know how users use the categories. For example, if you design an index for a company with three offices in different locations, you might make your top-level categories correspond to each of the three offices. If users are more interested in, say, functional divisions that cut across the geographical boundaries, it might make more sense to categorize resources by corporate divisions.

Once the categories are defined, you must set up rules to assign resources to categories. These rules are called classification rules. If you do not define your classification rules properly, users cannot locate resources by browsing in categories. You need to avoid categorizing resources incorrectly, but you also should avoid failing to categorize documents.
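The effect of classification rules can be sketched in a few lines. The example below is not the search server's rule engine; it assumes a hypothetical rule shape (a field, a substring to match, and a target category) and shows how a rule set can leave some resources uncategorized.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    field: str     # RD field to inspect, for example "url" or "title"
    contains: str  # substring that triggers the rule
    category: str  # hypothetical category path assigned on a match


def classify(rd, rules):
    """Return the category of the first matching rule, or None if no rule matches."""
    for rule in rules:
        if rule.contains.lower() in rd.get(rule.field, "").lower():
            return rule.category
    return None


rules = [
    Rule("url", "/hr/", "Company:Human Resources"),
    Rule("title", "datasheet", "Products:Datasheets"),
]

rds = [
    {"url": "http://intranet.example.com/hr/benefits.html", "title": "Benefits"},
    {"url": "http://www.example.com/p/x100.html", "title": "X100 Datasheet"},
    {"url": "http://www.example.com/news.html", "title": "News"},
]

# Resources that no rule matched cannot be found by browsing categories.
uncategorized = [rd["url"] for rd in rds if classify(rd, rules) is None]
print(uncategorized)
```

Listing the uncategorized resources after a rule change is one way to check that the rules cover the documents you expect users to browse.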

Managing Search Servers

Sun Java System Portal Server can support one or more search servers.

To Create a Search Server

During Portal Server installation, a default search server (search1) is created. You can also create a new search server using the Create Search Server wizard.

Before You Begin

You will need to know configuration information specific to the web container instance that you use.

  1. Log in to the Portal Server management console.

  2. Select Search Servers and then New from the menu bar.

    The New Search Server wizard appears.

  3. Follow the instructions and then click Finish to create the specified search server.

Equivalent psadmin Command

psadmin create-search-server

To Delete a Search Server

  1. Log in to the Portal Server management console.

  2. Select Search Servers from the menu bar.

  3. Select a search server and click Delete.

Equivalent psadmin Command

psadmin delete-search-server

Overview of the Database

The search server stores its descriptions of resources in a database. A search database is an index of a document collection. Databases are created by the indexer (the rdmgr command or the search server itself). For example, the robot can be set up to crawl web sites and index whatever it finds into the default search database, where users can search for the data. The data can be indexed into other databases too.

The following are some configuration and maintenance tasks you may need to perform to administer the database:

Importing to a Database

Normally, items in your search database come from the robot. You can also import databases of existing items, either from other Portal Server Search servers, from iPlanet Web Servers or Netscape™ Enterprise Servers, or from databases generated from other sources. Importing existing databases of RDs instead of sending the robot to create them anew helps reduce the amount of network traffic. Doing so also enables large indexing efforts to be completed more quickly by breaking the effort down into smaller parts. If the central database is physically distant from the servers being indexed, it can be helpful to generate the RDs locally and periodically import the remote databases to the central database.

The search server uses import agents to import RDs from another server or from a database. An import agent is a process that retrieves a number of RDs from an external source and merges that information into a local database.

Before you can import a database, you must create an import agent. Once an agent is created, you can start the import process immediately or schedule a time to run the import process on a regular basis.
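Conceptually, an import is a merge of incoming RDs into the local database. The sketch below models that idea only; it assumes RDs are keyed by URL and that incoming fields overwrite local ones, which may not match the import agent's actual merge behavior. The function name `merge_rds` is hypothetical.

```python
def merge_rds(local, incoming):
    """Merge incoming RDs into local (a dict keyed by URL).

    Returns a tuple (added, updated). Assumption: incoming fields
    overwrite local fields of the same name.
    """
    added = updated = 0
    for rd in incoming:
        url = rd["url"]
        if url in local:
            local[url].update(rd)
            updated += 1
        else:
            local[url] = dict(rd)
            added += 1
    return added, updated


# Hypothetical sample data, for illustration only.
local_db = {
    "http://a.example.com/": {"url": "http://a.example.com/", "title": "Old title"},
}
remote_rds = [
    {"url": "http://a.example.com/", "title": "New title"},
    {"url": "http://b.example.com/", "title": "Site B"},
]

counts = merge_rds(local_db, remote_rds)
print(counts)
```

Generating RDs near the indexed servers and merging them centrally in this way moves only the summaries across the network, not the documents themselves.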

Editing the Database Schema

A schema determines what information your search server maintains on each resource, and in what form. The design of your schema determines two factors that affect the usability of your index:

The schema is a master data structure for Resource Descriptions in the database. Depending on how you define and index the fields in that data structure, users have varying degrees of access to the resources.

The schema is closely tied to the structure of the files used by the search server and its robot. You should change the data structure only by using the schema tools in the management console. Never edit the schema file directly.

You can edit the database schema of the search server to add a new schema attribute, to modify a schema attribute, or to delete attributes.

The schema includes the following attributes:

Defining Schema Aliases

You might encounter discrepancies between the names used for fields in database schemas. When you import Resource Descriptions from one server to another, you cannot always guarantee that the two servers use identical names for items in their schemas. Similarly, when the robot converts HTML <meta> tags from a document into schema fields, the document controls the names.

The search server allows you to define schema aliases for your schema attributes, to map these external schema names into valid names for fields in your database.
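The alias mechanism amounts to a name mapping applied before fields are stored. The following sketch illustrates the idea; the alias table, the set of schema field names, and the function `normalize` are hypothetical examples rather than the product's default schema.

```python
# Hypothetical alias table: external field name -> schema field name.
ALIASES = {"dc.title": "title", "dc.creator": "author", "writer": "author"}

# Hypothetical set of fields defined in the local schema.
SCHEMA_FIELDS = {"title", "author", "description", "url"}


def normalize(raw):
    """Map external field names to schema fields and drop unknown names."""
    result = {}
    for name, value in raw.items():
        key = name.lower()
        key = ALIASES.get(key, key)
        if key in SCHEMA_FIELDS:
            result[key] = value
    return result


meta = {"DC.Title": "Quarterly Report", "Writer": "J. Doe", "generator": "SomeTool"}
print(normalize(meta))
```

In this sketch, names that match neither an alias nor a schema field are dropped; whether the search server discards or keeps such fields is not specified here.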

Viewing Database Analysis

The search server provides a report with information about the number of sites indexed and the number of resources from each in the database.
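The kind of summary this report provides can be approximated as a count of resources grouped by host. The sketch below is an illustration only, using the Python standard library and hypothetical URLs; it does not read the search database.

```python
from collections import Counter
from urllib.parse import urlsplit


def resources_per_site(urls):
    """Count resources per site, using the host part of each URL."""
    return Counter(urlsplit(u).netloc for u in urls)


# Hypothetical sample data, for illustration only.
urls = [
    "http://www.example.com/index.html",
    "http://www.example.com/products.html",
    "http://docs.example.com/guide.html",
]

report = resources_per_site(urls)
for site, count in report.most_common():
    print(site, count)
```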

Re-indexing the Database

You might need to re-index the Resource Description database for the search server if you have edited the schema to add or remove an indexed field or if a disk error corrupts the index file. It may also be necessary to re-index if a discrepancy occurs between the database content and its index for any other reason, for example, after a system failure while indexing.

Re-indexing a large database can take several hours. The time required to re-index the database corresponds to the number of records in the database. If you have a large database, perform re-indexing at a time when the server is not in high demand.

Expiring the Database

Expiring the database means removing Resource Descriptions that are out of date. Resource Descriptions are removed only when you run the expiration. Expired Resource Descriptions are deleted, but the database size is not decreased.

One attribute of a Resource Description is its expiration date. Your robots can set the expiration date from HTML <meta> tags or from information provided by the resource’s server. By default, Resource Descriptions expire in three months from creation unless the resource specifies a different expiration date. Periodically your search server should purge expired Resource Descriptions from its database.
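The expiration logic described above can be sketched as follows. This is an illustration, not the server's implementation: it approximates "three months" as 90 days, and the field names `created` and `expires` are hypothetical.

```python
from datetime import date, timedelta

# Assumption: "three months" approximated as 90 days.
DEFAULT_LIFETIME = timedelta(days=90)


def expiration_date(rd):
    """Use the resource's own expiration date if set, else creation + default."""
    return rd.get("expires") or rd["created"] + DEFAULT_LIFETIME


def expire(rds, today):
    """Return only the RDs that have not yet expired as of today."""
    return [rd for rd in rds if expiration_date(rd) > today]


# Hypothetical sample data, for illustration only.
rds = [
    {"url": "a", "created": date(2007, 1, 1)},
    {"url": "b", "created": date(2007, 1, 1), "expires": date(2007, 12, 31)},
    {"url": "c", "created": date(2007, 5, 1)},
]

kept = expire(rds, today=date(2007, 6, 1))
print([rd["url"] for rd in kept])
```

Nothing is removed until `expire` is called, which mirrors the point that Resource Descriptions are deleted only when you run the expiration.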

Purging the Database

Purging allows you to remove the contents of the database. Disk space used for indexes is recovered, but disk space used by the main database is not recovered. Instead it is reused as new data are added to the database.

Partitioning the Database

The search server allows you to put the physical files that make up each search database on multiple disks, file systems, directories, or partitions. By spreading databases across different physical or logical devices, you can create a larger database than would fit on a single device.

By default, the search server sets up the database to use only one directory. The command-line interface allows you to perform two kinds of manipulations on the database partitions:

The search server does not perform any checking to ensure that individual partitions have space remaining. It is your responsibility to maintain adequate free space for the database.

You can add new database partitions up to a maximum of 15 total partitions.


Note –

Once you increase the number of partitions, you must delete the entire database if you want to reduce the number later.

However, partitions are not recommended as long as you have enough disk space.


To change the physical location of any database partition, specify the name of the new location. Similarly, you can rename an existing partition. Use the rdmgr command to manipulate the partitions. See the Sun Java System Portal Server 7.1 Command Line Reference for information on the psadmin command.

Managing Databases

Use the following instructions to manage a database:

To Create a Database

  1. Log in to the Portal Server management console.

  2. Select the Search Servers tab, then select a search server.

  3. Click Databases, then Management from the menu bar.

  4. Click New.

    The New Database page displays.

  5. Type the name of the new database, and click OK.

Equivalent psadmin Command

psadmin create-search-database

To Create an Import Agent

  1. Log in to the Portal Server management console.

  2. Select the Search Servers tab, then select a search server.

  3. Click Databases, then Import Agents from the menu bar.

  4. Click New to launch the wizard.

  5. Specify the Import Agent attributes.

    For more information about the attributes, see Import Agents in Sun Java System Portal Server 7.1 Technical Reference.

  6. Click Finish.

Equivalent psadmin Command

psadmin create-search-importagent

To Create a Resource Description

  1. Log in to the Portal Server management console.

  2. Select the Search Servers tab, then select a search server.

  3. Click Databases, then Management from the menu bar.

  4. Select a database and click Manage Resource Descriptions.

  5. Click New and specify the attributes.

    For more information about the attributes, see Schema in Sun Java System Portal Server 7.1 Technical Reference.

  6. Click OK.

To Manage Resource Descriptions

  1. Log in to the Portal Server management console.

  2. Select the Search Servers tab, then select a search server.

  3. Click Databases, then Management from the menu bar.

  4. Select a database and click Manage Resource Descriptions.

  5. Select a Resource Description to perform one of the following actions:

    • Edit

    • Edit All

    • Delete

    For more information about the attributes, see Schema in Sun Java System Portal Server 7.1 Technical Reference.

  6. Click Save.

Equivalent psadmin Command

psadmin modify-search-resourcedescription

Managing Reports

The search server provides a number of reports to allow you to monitor search activity.

To View Reports

  1. Log in to the Portal Server management console.

  2. Select the Search Servers tab, then select a search server.

  3. Click Reports from the menu bar.

  4. Click on a link in the menu bar to view a specific report.

    The following options are available:

    • Logs

    • Advanced Robot Reports

    • Popular Searches

    • Excluded URLs

Managing Categories

The following tasks can be used to manage categories:

To Create a Category

  1. Log in to the Portal Server management console.

  2. Select the Search Servers tab, then select a search server.

  3. Select Categories, then Browse/Search from the menu bar.

  4. Click New.

    The New Search Category dialog appears.

  5. Specify the attributes as necessary.

    For more information about the attributes, see Manage Categories in Sun Java System Portal Server 7.1 Technical Reference.

  6. Click OK.

To Edit a Category

  1. Log in to the Portal Server management console.

  2. Select the Search Servers tab, then select a search server.

  3. Click Categories, then Browse/Search from the menu bar.

  4. Select a category and click Edit to display the Edit Category page.

    For more information about the attributes, see Manage Categories in Sun Java System Portal Server 7.1 Technical Reference.

To Run Autoclassify

  1. Log in to the Portal Server management console.

  2. Select the Search Servers tab, then select a search server.

  3. Click Categories, then Autoclassify from the menu bar.

  4. Click Run Autoclassify.

To Edit Autoclassify Attributes

  1. Log in to the Portal Server management console.

  2. Click the Search Servers tab, then select a search server.

  3. Click Categories, then Autoclassify from the menu bar.

  4. Modify the attributes as necessary.

    For more information about the attributes, see Sun Java System Portal Server 7.1 Technical Reference.

  5. Click Save.