Sun Java System Portal Server 7.1 Administration Guide

Overview of the Database

The search server stores its descriptions of resources in a database. A search database is a document collection index. Search databases are created by the indexer (the rdmgr command or the search server itself). For example, by default the robot can be set up to crawl web sites and index whatever it finds into the default search database, where users can search for the data. Data can also be indexed into other databases.

The following are some configuration and maintenance tasks you may need to perform to administer the database:

Importing to a Database

Normally, items in your search database come from the robot. You can also import databases of existing items, either from other Portal Server Search servers, from iPlanet Web Servers or Netscape Enterprise Servers, or from databases generated from other sources. Importing existing databases of Resource Descriptions (RDs) instead of sending the robot to create them anew helps reduce the amount of network traffic. Doing so also enables large indexing efforts to be completed more quickly by breaking the effort down into smaller parts. If the central database is physically distant from the servers being indexed, it can be helpful to generate the RDs locally and periodically import the remote databases to the central database.

The search server uses import agents to import RDs from another server or from a database. An import agent is a process that retrieves a number of RDs from an external source and merges that information into a local database.

Before you can import a database, you must create an import agent. Once an agent is created, you can start the import process immediately or schedule a time to run the import process on a regular basis.
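
The merge step that an import agent performs can be pictured with a minimal Python sketch. It assumes that each RD is a dictionary identified by its URL and that a newer incoming RD replaces an older local one. The field names (url, last-modified) and the function merge_rds are illustrative assumptions, not the search server's actual record format or API.

```python
from typing import Dict, Iterable

RD = Dict[str, str]  # hypothetical in-memory form of a Resource Description


def merge_rds(local: Dict[str, RD], incoming: Iterable[RD]) -> int:
    """Merge incoming RDs into the local database.

    Returns the number of RDs that were added or updated.
    """
    changed = 0
    for rd in incoming:
        url = rd["url"]
        existing = local.get(url)
        # ISO 8601 dates (YYYY-MM-DD) compare correctly as plain strings.
        if existing is None or rd.get("last-modified", "") > existing.get("last-modified", ""):
            local[url] = rd
            changed += 1
    return changed


# Local database with one RD, keyed by URL.
local_db = {
    "http://a.example/": {
        "url": "http://a.example/",
        "title": "A",
        "last-modified": "2007-01-01",
    },
}

# RDs retrieved from a remote source: one newer copy and one new resource.
remote_rds = [
    {"url": "http://a.example/", "title": "A (updated)", "last-modified": "2007-03-01"},
    {"url": "http://b.example/", "title": "B", "last-modified": "2007-02-01"},
]

changed = merge_rds(local_db, remote_rds)
print(changed, "RDs added or updated")
```

Merging locally generated RDs in this way is what lets a central database avoid crawling the remote servers itself.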

Editing the Database Schema

A schema determines what information your search server maintains on each resource, and in what form. The design of your schema determines two factors that affect the usability of your index:

The schema is a master data structure for Resource Descriptions in the database. Depending on how you define and index the fields in that data structure, users have varying degrees of access to the resources.

The schema is closely tied to the structure of the files used by the search server and its robot. Change the data structure only by using the schema tools in the management console. Never edit the schema file directly.

You can edit the database schema of the search server to add a new schema attribute, to modify a schema attribute, or to delete attributes.

The schema includes the following attributes:

Defining Schema Aliases

You might encounter discrepancies between the names used for fields in database schemas. When you import Resource Descriptions from one server to another, you cannot always guarantee that the two servers use identical names for items in their schemas. Similarly, when the robot converts HTML <meta> tags from a document into schema fields, the document controls the names.

The search server allows you to define schema aliases for your schema attributes, to map these external schema names into valid names for fields in your database.
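
The idea behind schema aliases can be sketched in Python as a simple name mapping. The schema field names, the alias table, and the function apply_aliases below are hypothetical examples chosen for illustration. The sketch also assumes that names are matched case-insensitively and that fields with no matching schema attribute are dropped, which may not match the search server's behavior.

```python
# Hypothetical schema attributes defined in the local database.
SCHEMA_FIELDS = {"title", "author", "description"}

# Hypothetical aliases: external field name -> local schema attribute.
ALIASES = {
    "dc.title": "title",
    "dc.creator": "author",
    "summary": "description",
}


def apply_aliases(fields, aliases=ALIASES, schema=SCHEMA_FIELDS):
    """Return a copy of fields with external names mapped to schema names."""
    result = {}
    for name, value in fields.items():
        key = name.lower()
        key = aliases.get(key, key)  # translate the name if an alias exists
        if key in schema:            # keep only fields the schema defines
            result[key] = value
    return result


# Field names as they might arrive from HTML <meta> tags or another server.
imported = {
    "DC.Title": "Release notes",
    "DC.Creator": "docs team",
    "generator": "some tool",
}

normalized = apply_aliases(imported)
print(normalized)
```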

Viewing Database Analysis

The search server provides a report with information about the number of sites indexed and the number of resources from each in the database.
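
As a rough illustration of what such a report contains, the following Python sketch counts resources per site from a list of URLs. It is not how the search server produces its report. The function name resources_per_site and the sample URLs are assumptions, and a "site" is taken here to mean the scheme and host of each URL.

```python
from collections import Counter
from urllib.parse import urlsplit


def resources_per_site(urls):
    """Count how many resource URLs belong to each site (scheme plus host)."""
    counts = Counter()
    for url in urls:
        parts = urlsplit(url)
        counts[f"{parts.scheme}://{parts.netloc}"] += 1
    return counts


# Hypothetical URLs of resources held in a search database.
urls = [
    "http://docs.example.com/guide/index.html",
    "http://docs.example.com/guide/search.html",
    "http://www.example.org/news.html",
]

report = resources_per_site(urls)
for site, count in report.most_common():
    print(f"{site}\t{count}")
```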

Re-indexing the Database

You might need to re-index the Resource Description database for the search server if you have edited the schema to add or remove an indexed field, or if a disk error corrupts the index file. It may also be necessary to re-index if a discrepancy occurs between the database content and its index for any other reason, for example, after a system failure during indexing.

Re-indexing a large database can take several hours. The time required to re-index the database corresponds to the number of records in the database. If you have a large database, perform re-indexing at a time when the server is not in high demand.

Expiring the Database

Expiring the database means removing Resource Descriptions that are out of date. Resource Descriptions are removed only when you run the expiration. Expired Resource Descriptions are deleted, but the database size is not decreased.

One attribute of a Resource Description is its expiration date. Your robots can set the expiration date from HTML <meta> tags or from information provided by the resource’s server. By default, Resource Descriptions expire in three months from creation unless the resource specifies a different expiration date. Periodically your search server should purge expired Resource Descriptions from its database.
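
The selection logic of an expiration run can be sketched in Python as follows. This is a simplified model, not the search server's implementation. The field names created and expires are hypothetical, and the default lifetime of three months is approximated as 90 days.

```python
from datetime import date, timedelta

# Assumption: "three months" is approximated as 90 days in this sketch.
DEFAULT_LIFETIME = timedelta(days=90)


def expiration_date(rd):
    """Use the RD's own expiration date if present, else creation + default."""
    if "expires" in rd:
        return date.fromisoformat(rd["expires"])
    return date.fromisoformat(rd["created"]) + DEFAULT_LIFETIME


def expire(rds, today):
    """Return only the RDs that have not yet expired as of today."""
    return [rd for rd in rds if expiration_date(rd) > today]


# Hypothetical RDs: two rely on the default lifetime, one sets its own date.
rds = [
    {"url": "u1", "created": "2007-01-01"},
    {"url": "u2", "created": "2007-01-01", "expires": "2008-01-01"},
    {"url": "u3", "created": "2007-05-01"},
]

kept = expire(rds, today=date(2007, 6, 1))
print([rd["url"] for rd in kept])
```

In this model nothing is removed until expire is called, which mirrors the point that Resource Descriptions are removed only when you run the expiration.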

Purging the Database

Purging allows you to remove the contents of the database. Disk space used for indexes is recovered, but disk space used by the main database is not recovered. Instead it is reused as new data are added to the database.

Partitioning the Database

The search server allows you to put the physical files that make up each search database on multiple disks, file systems, directories, or partitions. By spreading databases across different physical or logical devices, you can create a larger database than would fit on a single device.

By default, the search server sets up the database to use only one directory. The command-line interface allows you to perform two kinds of manipulations on the database partitions: adding new partitions and changing the location of existing partitions.

The search server does not perform any checking to ensure that individual partitions have space remaining. It is your responsibility to maintain adequate free space for the database.
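
Because the server does not check free space, you might monitor the partition directories yourself. The following Python sketch is one possible approach, using the standard library call shutil.disk_usage. The function name low_space_partitions and the threshold are assumptions, and the system temporary directory stands in for real partition directories, whose paths depend on your installation.

```python
import shutil
import tempfile


def low_space_partitions(paths, min_free_bytes):
    """Return (path, free_bytes) for each directory below the free-space threshold."""
    low = []
    for path in paths:
        free = shutil.disk_usage(path).free
        if free < min_free_bytes:
            low.append((path, free))
    return low


# Stand-in for the database partition directories (hypothetical location).
partitions = [tempfile.gettempdir()]

# Warn when less than 1 GB remains on the device holding a partition.
for path, free in low_space_partitions(partitions, min_free_bytes=1024 ** 3):
    print(f"Low disk space on {path}: {free} bytes free")
```

A check of this kind could be run from a scheduled job so that you are warned before a partition fills up.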

You can add new database partitions up to a maximum of 15 total partitions.


Note –

Once you increase the number of partitions, you must delete the entire database if you want to reduce the number later.

However, partitions are not recommended as long as you have enough disk space.


To change the physical location of any database partition, specify the name of the new location. Similarly, you can rename an existing partition. Use the rdmgr command to manipulate the partitions. See the Sun Java System Portal Server 7.1 Command Line Reference for information on the psadmin command.