This chapter describes how to configure and administer the Sun Java™ System Portal Server Search Server.
This chapter contains these sections:
The Portal Server Search Server is a taxonomy and database service designed to support search and browse interfaces similar to popular Internet search servers such as Google and AltaVista. The Search Server includes a robot to discover, convert, and summarize document resources. The Portal Server Desktop includes a search user interface based on JavaServer Pages™ (JSP™). The Search Server includes administration tools for configuration editing and command-line tools for system management. Configuration settings can be defined and stored through the Portal Server management console.
The management console permits an administrator to configure a majority of the search server options, but it does not perform all the administrative functions available through the command-line interface.
Users query the search server's databases to locate resources. Individual entries in each database are called resource descriptions (RDs). A resource description provides summary information about a single resource. The database schema determines the fields of each resource description.
The search server is based on open Internet standards such as Resource Description Messages (RDM) and the Summary Object Interchange Format (SOIF) to ensure that the search server can operate in a cross-platform enterprise environment.
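As a rough illustration of what a resource description looks like when serialized as SOIF, the following Python sketch parses a single SOIF object. It assumes the common SOIF layout: a header line with a template type and a URL, attribute lines of the form name{size}: followed by a tab and a value of exactly that many bytes, and a closing brace. The sample record, its attribute names, and the function name parse_soif are illustrative and are not taken from the product.

```python
import re

# Header line: "@DOCUMENT { http://..." (template type, then the URL).
HEADER = re.compile(rb"@(\S+)\s*\{\s*(\S+)\s*\n")
# Attribute prefix: "name{size}:<TAB>", followed by exactly `size` bytes.
ATTR = re.compile(rb"\s*([^\s{}]+)\{(\d+)\}:\t")


def parse_soif(data: bytes) -> dict:
    """Parse one SOIF object into a dict (illustrative sketch only)."""
    header = HEADER.match(data)
    if header is None:
        raise ValueError("not a SOIF object")
    rd = {"@type": header.group(1).decode(), "@url": header.group(2).decode()}
    pos = header.end()
    while True:
        attr = ATTR.match(data, pos)
        if attr is None:
            break  # no more attributes; the closing "}" should follow
        size = int(attr.group(2))
        start = attr.end()
        # The declared size lets a value contain newlines or braces safely.
        rd[attr.group(1).decode().lower()] = data[start:start + size].decode()
        pos = start + size
    if not data[pos:].lstrip().startswith(b"}"):
        raise ValueError("missing closing brace")
    return rd


# Hypothetical sample record; the second value spans two lines.
SAMPLE = (
    b"@DOCUMENT { http://www.example.com/index.html\n"
    b"title{12}:\tExample Home\n"
    b"description{21}:\tA two-line\nsummary...\n"
    b"}\n"
)

rd = parse_soif(SAMPLE)
```

Because every value carries its own byte count, a parser does not need to escape special characters inside values, which is one reason the format suits exchange between heterogeneous servers.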
Users interact with the search system in two ways. They can type direct queries to search the database, or they can browse through the database contents using a set of categories that you design. A hierarchy of categories is sometimes called a taxonomy. Categorizing resources is like creating a table of contents for the database.
Browsing is an optional feature in a search system. That is, you can have a perfectly useful search system that does not include browsing by categories. You need to decide whether categories that users can browse are useful to the users of your index, and, if so, what kind of categories you want to create.
The resources in a Search database are assigned to categories to reduce complexity. If a large number of items are in the database, grouping related items together is helpful. Doing so allows users to quickly locate specific kinds of items, compare similar items, and choose which ones they want.
Such categorizing is common in product and service indexes. Clothing catalogs divide men’s, women’s, and children’s clothing, with each of those further subdivided for coats, shirts, shoes, and other items. An office products catalog could separate furniture from stationery, computers, and software. And advertising directories are arranged by categories of products and services.
The principles of categorical groupings in a printed index also apply to online indexes. The idea is to make it easy for users to locate resources of a certain type, so that they can choose the ones they want. No matter what the scope of the index you design, the primary concern in setting up your categories should be usability. You need to know how users use the categories. For example, if you design an index for a company with three offices in different locations, you might make your top-level categories correspond to each of the three offices. If users are more interested in, say, functional divisions that cut across the geographical boundaries, it might make more sense to categorize resources by corporate divisions.
Once the categories are defined, you must set up rules to assign resources to categories. These rules are called classification rules. If you do not define your classification rules properly, users cannot locate resources by browsing in categories. Avoid categorizing resources incorrectly, but also avoid leaving resources uncategorized.
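The idea behind classification rules can be sketched in a few lines of Python. The rule format below (a field, a test, the text to test for, and a target category) and the category names are hypothetical and do not reflect the product's actual rule syntax. The sketch applies the first matching rule and also shows how to list the resources that no rule matches, which is a quick way to find gaps in a rule set.

```python
from typing import Optional

# Each rule: (RD field, test, text to test for, target category).
# The rule format and category names here are hypothetical.
RULES = [
    ("url", "contains", "/hr/", "Company:Human Resources"),
    ("url", "startswith", "http://eng.", "Company:Engineering"),
    ("title", "contains", "press release", "Company:Marketing"),
]


def classify(rd: dict, rules=RULES) -> Optional[str]:
    """Return the category of the first matching rule, or None."""
    for field, test, text, category in rules:
        value = rd.get(field, "").lower()
        if test == "contains" and text in value:
            return category
        if test == "startswith" and value.startswith(text):
            return category
    return None  # uncategorized: reachable by search, but not by browsing


def uncategorized(rds, rules=RULES):
    """List the RDs that no rule matches, to reveal gaps in the rule set."""
    return [rd for rd in rds if classify(rd, rules) is None]
```

Running a check like uncategorized over a sample of resources before publishing a taxonomy helps catch rules that are too narrow.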
Sun Java System Portal Server can support one or more search servers.
During Portal Server installation, a default search server (search1) is created. You can also create a new search server using the Create Search Server wizard.
You will need to know configuration information specific to the web container instance that you use:
Sun Java System Web Server 7
Sun Java System Web Server 6
Sun Java System Application Server 8.1
BEA Weblogic 8
IBM WebSphere 5
Select Search Servers and then New from the menu bar.
The New Search Server wizard appears.
Follow the instructions and then click Finish to create the specified search server.
The search server stores its descriptions of resources in a database. A search database is a document collection index. Search databases are created by the indexer (the rdmgr command or the search server itself). For example, the robot can be set up to crawl web sites and index whatever it finds into the "default" search database, where users can search for the data. Data can be indexed into other databases too.
The following are some configuration and maintenance tasks you may need to perform to administer the database:
Normally, items in your search database come from the robot. You can also import databases of existing items from other Portal Server Search servers, from iPlanet Web Servers or Netscape™ Enterprise Servers, or from databases generated from other sources. Importing existing databases of RDs instead of sending the robot to create them anew helps reduce the amount of network traffic. Doing so also enables large indexing efforts to be completed more quickly by breaking the effort down into smaller parts. If the central database is physically distant from the servers being indexed, it can be helpful to generate the RDs locally and periodically import the remote databases to the central database.
The search server uses import agents to import RDs from another server or from a database. An import agent is a process that retrieves a number of RDs from an external source and merges that information into a local database.
Before you can import a database, you must create an import agent. Once an agent is created, you can start the import process immediately or schedule a time to run the import process on a regular basis.
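Conceptually, an import agent's merge step resembles the following Python sketch. It assumes that RDs are keyed by URL and that an imported RD replaces a local RD with the same URL; both are assumptions made for illustration, and the function name merge_rds is hypothetical.

```python
def merge_rds(local: dict, imported: list) -> tuple:
    """Merge imported RDs into `local` (a dict keyed by URL).

    Assumption for this sketch: an imported RD replaces a local RD
    that has the same URL. Returns (added, updated) counts.
    """
    added = updated = 0
    for rd in imported:
        url = rd["url"]
        if url in local:
            updated += 1
        else:
            added += 1
        local[url] = rd
    return added, updated


# Hypothetical data: one RD is already present locally, one is new.
local_db = {"http://a.example.com/": {"url": "http://a.example.com/", "title": "Old"}}
remote_rds = [
    {"url": "http://a.example.com/", "title": "New"},
    {"url": "http://b.example.com/", "title": "B"},
]
counts = merge_rds(local_db, remote_rds)
```

Merging by key in this way is what makes it practical to index remote sites locally and ship only the resulting RDs to the central database.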
A schema determines what information your search server maintains on each resource, and in what form. The design of your schema determines two factors that affect the usability of your index:
The way users can search for resources
The ways users view resource information
The schema is a master data structure for Resource Descriptions in the database. Depending on how you define and index the fields in that data structure, users have varying degrees of access to the resources.
The schema is closely tied to the structure of the files used by the search server and its robot. You should change the data structure only by using the schema tools in the management console. Never edit the schema file directly.
You can edit the database schema of the search server to add a new schema attribute, to modify a schema attribute, or to delete attributes.
The schema includes the following attributes:
Editable – If checked, this attribute indicates that the attribute appears in the Resource Description Editor, and you can change its values.
Indexable – This attribute indicates that users can search for values in this particular field. An indexable field may also appear in the pop-up menu in the Advanced Search screen.
Description – This attribute is a text string to use to describe the schema. You can use it for comments or annotations.
Aliases – This attribute allows you to define aliases to convert imported database schema names into your own schema.
Score Multiplier – A weighting field for scoring a particular element. Any positive value is valid.
Data Type – Defines the data type.
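To show how a score multiplier can influence ranking, the following Python sketch weights term matches by field. The scoring formula (term occurrences multiplied by the field's multiplier, summed over fields) and the multiplier values are assumptions chosen for illustration; the search server's actual relevance computation is not described here.

```python
# Hypothetical per-field score multipliers; fields without an entry use 1.0.
MULTIPLIERS = {"title": 3.0, "description": 1.5}


def score(rd: dict, term: str, multipliers=MULTIPLIERS) -> float:
    """Sum term occurrences per field, weighted by each field's multiplier."""
    term = term.lower()
    total = 0.0
    for field, value in rd.items():
        hits = value.lower().split().count(term)
        total += multipliers.get(field, 1.0) * hits
    return total


rd = {
    "title": "Portal Guide",
    "description": "Guide to the portal",
    "content": "portal portal",
}
portal_score = score(rd, "portal")  # 3.0*1 + 1.5*1 + 1.0*2 = 6.5
```

With these values, a single match in the title outweighs two matches in the body text, which is the typical reason to raise a field's multiplier.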
You might encounter discrepancies between the names used for fields in database schemas. When you import Resource Descriptions from one server to another, you cannot always guarantee that the two servers use identical names for items in their schemas. Similarly, when the robot converts HTML <meta> tags from a document into schema fields, the document controls the names.
The search server allows you to define schema aliases for your schema attributes, to map these external schema names into valid names for fields in your database.
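The effect of schema aliases can be pictured with a small Python sketch. The alias table, the schema field names, and the choice to drop fields that the schema does not define are all assumptions made for illustration.

```python
# Hypothetical alias table: external field name -> local schema field name.
ALIASES = {"dc.title": "title", "summary": "description", "writer": "author"}
SCHEMA_FIELDS = {"title", "description", "author", "url"}


def normalize(rd: dict) -> dict:
    """Rename aliased fields and drop fields the schema does not define."""
    result = {}
    for name, value in rd.items():
        key = name.lower()
        key = ALIASES.get(key, key)  # fall back to the name itself
        if key in SCHEMA_FIELDS:
            result[key] = value
    return result


imported = {"DC.Title": "Annual Report", "Writer": "J. Smith", "x-internal": "1"}
normalized = normalize(imported)
```

Here the imported names DC.Title and Writer are mapped onto the local title and author fields, so the values remain searchable after import.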
The search server provides a report with information about the number of sites indexed and the number of resources from each in the database.
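The kind of tally such a report contains can be approximated with the Python standard library. This sketch assumes that a site is identified by the host name in each resource's URL; the function name resources_per_site and the sample URLs are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit


def resources_per_site(urls) -> Counter:
    """Count RDs per site, where a site is the URL's host name."""
    return Counter(urlsplit(url).hostname for url in urls)


urls = [
    "http://www.example.com/index.html",
    "http://www.example.com/about.html",
    "http://docs.example.com/guide.html",
]
report = resources_per_site(urls)
```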
You might need to re-index the Resource Description database for the search server if you have edited the schema to add or remove an indexed field or if a disk error corrupts the index file. It may also be necessary to re-index if a discrepancy occurs between the database content and its index for any other reason, for example, after a system failure while indexing.
Re-indexing a large database can take several hours. The time required to re-index the database corresponds to the number of records in the database. If you have a large database, perform re-indexing at a time when the server is not in high demand.
Removing Resource Descriptions that are out of date is called expiring the database. Resource Descriptions are removed only when you run the expiration. Expired Resource Descriptions are deleted, but the database size is not decreased.
One attribute of a Resource Description is its expiration date. Your robots can set the expiration date from HTML <meta> tags or from information provided by the resource’s server. By default, Resource Descriptions expire in three months from creation unless the resource specifies a different expiration date. Periodically your search server should purge expired Resource Descriptions from its database.
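The expiration logic described above can be sketched in Python as follows. The field names created and expires are hypothetical, and three months is approximated as 90 days for the purposes of this sketch.

```python
from datetime import datetime, timedelta

DEFAULT_LIFETIME = timedelta(days=90)  # roughly three months; an assumption


def expiry_of(rd: dict) -> datetime:
    """Use the RD's own expiration date if present, else created + default."""
    return rd.get("expires") or rd["created"] + DEFAULT_LIFETIME


def expire(rds: list, now: datetime) -> list:
    """Return only the RDs that have not yet expired as of `now`."""
    return [rd for rd in rds if expiry_of(rd) > now]


now = datetime(2007, 6, 1)
rds = [
    {"url": "http://a.example.com/", "created": datetime(2007, 1, 1)},
    {"url": "http://b.example.com/", "created": datetime(2007, 5, 1)},
    {"url": "http://c.example.com/", "created": datetime(2007, 1, 1),
     "expires": datetime(2008, 1, 1)},
]
kept = expire(rds, now)
```

The first RD is dropped because its default lifetime has passed, while the third survives because the resource supplied a later expiration date of its own.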
Purging allows you to remove the contents of the database. Disk space used for indexes is recovered, but disk space used by the main database is not recovered. Instead it is reused as new data are added to the database.
The search server allows you to put the physical files that make up each search database on multiple disks, file systems, directories, or partitions. By spreading databases across different physical or logical devices, you can create a larger database than would fit on a single device.
By default, the search server sets up the database to use only one directory. The command-line interface allows you to perform two kinds of manipulations on the database partitions:
Adding New Partitions
Moving Partitions
The search server does not perform any checking to ensure that individual partitions have space remaining. It is your responsibility to maintain adequate free space for the database.
You can add new database partitions up to a maximum of 15 total partitions.
Once you increase the number of partitions, you must delete the entire database if you want to reduce the number later.
However, partitions are not recommended as long as you have enough disk space.
To change the physical location of any database partition, specify the name of the new location. Similarly, you can rename an existing partition. Use the rdmgr command to manipulate the partitions. See the Sun Java System Portal Server 7.1 Command Line Reference for information on the psadmin command.
Use the following instructions to manage a database:
Select the Search Servers tab, then select a search server.
Click Databases, then Management from the menu bar.
Click New.
The New Database page displays.
Type the name of the new database, and click OK.
psadmin create-search-database
Select the Search Servers tab, then select a search server.
Click Databases, then Import Agents from the menu bar.
Click New to launch the wizard.
Specify the Import Agent attributes.
For more information about the attributes, see Import Agents in Sun Java System Portal Server 7.1 Technical Reference
Click Finish.
psadmin create-search-importagent
Select the Search Servers tab, then select a search server.
Click Databases, then Management from the menu bar.
Select a database and click Manage Resource Descriptions.
Click New and specify the attributes.
For more information about the attributes, see Schema in Sun Java System Portal Server 7.1 Technical Reference
Click OK.
Select the Search Servers tab, then select a search server.
Click Databases, then Management from the menu bar.
Select a database and click Manage Resource Descriptions.
Select a Resource Description to perform one of the following actions:
Edit
Edit All
Delete
For more information about the attributes, see Schema in Sun Java System Portal Server 7.1 Technical Reference
Click Save.
psadmin modify-search-resourcedescription
The search server provides a number of reports to allow you to monitor search activity.
Select the Search Servers tab, then select a search server.
Click Reports from the menu bar.
Click on a link in the menu bar to view a specific report.
The following options are available:
Logs
Advanced Robot Reports
Popular Searches
Excluded URLs
The following tasks can be used to manage categories:
Select the Search Servers tab, then select a search server.
Select Categories, then Browse/Search from the menu bar.
Click New.
The New Search Category dialog appears.
Specify the attributes as necessary.
For more information about the attributes, see Manage Categories in Sun Java System Portal Server 7.1 Technical Reference
Click OK.
Select the Search Servers tab, then select a search server.
Click Categories, then Browse/Search from the menu bar.
Select a category and click Edit to display the Edit Category page.
For more information about the attributes, see Manage Categories in Sun Java System Portal Server 7.1 Technical Reference
Select the Search Servers tab, then select a search server.
Click Categories, then Autoclassify from the menu bar.
Click Run Autoclassify.
Click the Search Servers tab, then select a search server.
Click Categories, then Autoclassify from the menu bar.
Modify the attributes as necessary.
For more information about the attributes, see Sun Java System Portal Server 7.1 Technical Reference
Click Save.