iPlanet Web Server, Enterprise Edition Administrator's Guide: Chapter 12 Using Search

Chapter 12 Using Search

The iPlanet Web Server search function allows you to search the contents and attributes of documents on the server. As the server administrator, you can create a customized text search interface tailored to your user community.

Note
The Search function is not available on Linux platforms.

This chapter contains the following sections:

About Search

Configuring Text Search

Indexing Your Documents

Performing a Search: The Basics

Using the Query Operators

Customizing the Search Interface

About Search

Server documents can be in a variety of formats, such as HTML, Microsoft Excel, Adobe PDF, and WordPerfect, provided that there is a conversion filter available for a particular file format. With the filters, the server converts the documents into HTML as it indexes them, allowing you to use your web browser to view the documents found for your search. For more information, see About Collections.
Users can search through server documents for a specific word or attribute value, obtaining a set of search results that list all documents that match the query. They can then select a document from the list to browse it in its entirety. This provides easy access to server content.
As the server administrator you can:

Restrict which users and groups are authorized to use text search

Determine which documents users and groups can access

Modify the configuration files that govern how text search operates

Customize the search query and results pages
To enable searching on your server, begin by identifying your server's special configuration needs and entering them in the search configuration windows. Then identify the directory or directories of documents that you want prepared for searching, and index the document information into a searchable database, called a collection. The next several sections discuss the details of configuring search and indexing collections.

Configuring Text Search

You can configure several aspects of the search function for your server:

Collection-specific

Applying across all collections
Collection-specific configuration controls how documents are indexed into a particular collection, and must be defined before you create the collection. Other configuration actions can be defined at any time, because they affect only the searches themselves.
Collection-specific configuration actions are as follows:

Define URL mappings for the document directories to be indexed

Define the pattern files to display for searches on a particular collection
Configuration actions that affect all collections are as follows:

Establish access control for files and directories

Define any words you want dropped from the search

Define the search parameters

Turn the search function off and on

Restrict the amount of memory available for indexing operations
This section includes the following topics:

Controlling Search Access

Mapping URLs

Eliminating Words from Search

Turning Search On or Off

Configuring the Search Parameters

Configuring Your Search Pattern Files

Configuring Files Manually

Controlling Search Access

The search function accesses the default ACL database for your server. You can restrict access to the documents and directories on your server by defining explicit access control list (ACL) rules, or you can rely on the default access control definitions. You can add users to your server's access control database through the Administration Server's Users & Groups function. For more information about setting access control, see Chapter 8, "Controlling Access to Your Server."
You can configure your server to check access permissions before displaying search results using the Search Configuration interface in the Server Manager, as described in Configuring the Search Parameters. When this option is set, the server challenges the user to identify themselves, and checks a user's access privileges before returning the results of a search query.

Mapping URLs

When users search through a collection's files, the resulting documents are identified by a partial Uniform Resource Identifier (URI). This security feature prevents users from knowing the complete physical pathname for a file. A URI is set up by mapping a URL to an additional document directory.
For example, if the path for a file is:
server_root/Docs/marketing/bizplans/planB.doc
you could prevent users from seeing all but the last directory by defining a URL prefix of plans and mapping it to:
server_root/Docs/marketing/bizplans
From then on, users need only enter /plans/planB.doc to locate the file. For more information, see Content Management.
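The mapping idea can be sketched in a few lines of Python. This is only an illustration of prefix-to-directory resolution, not the server's implementation: the URL_MAPPINGS table and the resolve function are hypothetical names, and server_root is kept as a literal placeholder.

```python
import posixpath

# Hypothetical mapping table: URL prefix -> physical directory.
# "server_root" is a placeholder, exactly as in the surrounding text.
URL_MAPPINGS = {
    "/": "server_root/docs",
    "/plans": "server_root/Docs/marketing/bizplans",
}

def resolve(uri, mappings=URL_MAPPINGS):
    """Return the physical path for a URI, or None if no prefix matches."""
    # Try the longest prefix first so that "/plans" wins over "/".
    for prefix in sorted(mappings, key=len, reverse=True):
        base = prefix.rstrip("/")
        if uri == base or uri.startswith(base + "/"):
            rest = uri[len(base):].lstrip("/")
            directory = mappings[prefix]
            return posixpath.join(directory, rest) if rest else directory
    return None

print(resolve("/plans/planB.doc"))
```

The user only ever sees /plans/planB.doc; the longer physical path stays on the server side. A real resolver would also have to reject path components such as "..", which this sketch does not attempt.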

Note
By default, URLs that are redirected are always escaped. To prevent this, add escape="no". For example:

NameTrans fn="redirect" from="/foobar" url-prefix="index.html" escape="no"

The iPlanet Web Server provides three default mappings:

/ (slash) the primary document directory (sometimes called the document root), which initially maps to server_root/docs

/help the directory for most of the help files

/search-ui the directory for most of the search interface files
When creating a collection, you must specify which document directory to index. You can only choose a directory that has a URL mapping, or a subdirectory within such a mapped directory. You can create your own mappings to define specific directories.
To map a URL, perform the following steps:

Open the Class Manager and select the server instance from the drop-down list.

Choose the Content Mgmt tab.

Click the Additional Document Directories link.

The web server displays the Additional Document Directories page.

(Optional) Add another directory by entering the following:

URL prefix.

For example: plans.

Absolute physical path of the directory you want the URL mapped to.

For example:

C:/iPlanet/Servers/docs/marketing/plans

Click OK.

Click Apply.

Edit one of the current additional directories listed by selecting one of the following:

Edit

Remove

If editing, select Edit next to the listed directory you wish to change.

Enter a new prefix using ASCII format.

(Optional) Select a style in the Apply Style drop-down list if you want to apply a style to the directory:

For more information about styles, see Applying Configuration Styles.

Click OK to add the new document directory.

Click Apply.

Choose Apply Changes to hard start/restart your server.

Note
Once you create a collection based on an additional document directory, do not change the URL mapping. If you do, the collection's entries will point to the wrong physical file location.

Eliminating Words from Search

You can specify words the search engine should not index or search against. These are typically referred to as stop words or drop words, and include articles, conjunctions, and prepositions such as at, and, be, for, and the.
To specify stop words, edit the file named style.stp. This file resides in each of the subdirectories html, pdf, mail, and news for each collection type in the directory server_root\plugins\search\common\style. Each style.stp file controls stop words for that collection type. For example, the style.stp file in server_root\plugins\search\common\style\html controls stop words for HTML files in that collection.
Add the stop words to style.stp, one per line and left justified. You can use operators such as square brackets ([ ]) to indicate character classes, periods (.) to indicate any character, and plus notation (+) to indicate repeats. For example, the style.stp file might contain the following lines:

........................................+
at
and
be
[0-9a-zA-Z]
[0-9][0-9][0-9][0-9]+

In this example, the first line of periods (in the file by default) indicates that words with 40 or more characters are not to be indexed. The words at, and, and be are also not indexed. [0-9a-zA-Z] indicates that one-character words (a single letter or digit) are not to be indexed. [0-9][0-9][0-9][0-9]+ indicates that integers with four or more digits are not to be indexed.
The words you specify are case sensitive, so you need to enter all the case variations of a word. For instance, for the word the, enter the, THE, and The.
If you want to use stop words, make sure you create the style.stp file before you create a collection. Changing the style.stp file after a collection has been created requires you to:

Delete the current collection.

Change the stop list for the collection type.

Recreate the collection.

Reindex all the documents in the collection.
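
To get a feel for what a stop list excludes, the example entries can be tried out in Python. This sketch assumes each style.stp line behaves like a regular expression that must match the whole word; the search engine's own matching rules may differ, and is_stop_word is a hypothetical helper, not part of the server.

```python
import re

# The example style.stp entries from above, one pattern per line.
STOP_PATTERNS = [
    "." * 40 + "+",            # 40 periods and a plus: 40 or more characters
    "at",
    "and",
    "be",
    "[0-9a-zA-Z]",             # any single letter or digit
    "[0-9][0-9][0-9][0-9]+",   # integers with four or more digits
]

def is_stop_word(word, patterns=STOP_PATTERNS):
    """Return True if any pattern matches the entire word (case sensitive)."""
    return any(re.fullmatch(pattern, word) for pattern in patterns)

for candidate in ["at", "At", "x", "1999", "123", "search"]:
    print(candidate, is_stop_word(candidate))
```

Note that At is not treated as a stop word here, which mirrors the case-sensitivity rule described above.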

Turning Search On or Off

Before users can search your server or web site, you must turn search on. The default setting is for search to be turned off. Turn the search function on or off using the Search State interface in the Server Manager.
Turning search off for a server where users do not use search can improve server performance. You may also want to turn off the search function when you know the server will have heavy traffic, and back on when traffic is lighter. When search is turned off, the search plug-in is not loaded when the HTTP server starts up.

Configuring the Search Parameters

As server administrator, you can set the default parameters that govern what users see when they get search results.
To configure search parameters, perform the following steps:

Access the Server Manager and choose Search.

Click the Search Configuration link.

The web server displays the Search Configuration page.

Enter the default maximum number of search result items displayed to users at a time in the Default Result Set Size field.

This number cannot be larger than the value for the largest possible result set size, as defined in Step 4. The default is 20.

Enter the maximum number of items in a result set in the Largest Possible Result Set Size field.

The default is 5000. If you enter 250 as the value, and 1000 documents were found matching the search criteria, users would only have access to either the first 250 or the 250 top-ranked documents.

Enter the Date/Time string in POSIX format.

This entry defines how search results are displayed to users. Use the symbols listed in Table 12-1.

Enter a default HTML title to be used when a title tag has not been included with an HTML document.

The typical HTML default is (Untitled) and appears in the search results page for HTML files.

Choose Yes to check access permissions on collection root before doing a search if you want to restrict access to the server.

Choose Yes to check access permissions on search results if you want to restrict access to search results.

Click OK to set your new search configuration.

Click Apply.

Choose Apply Changes to hard start/restart your server.
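
The relationship between the two result set sizes can be sketched as follows. This is an illustration under stated assumptions, not the server's code: visible_results and its parameter names are hypothetical, and the sketch assumes the cap keeps the first items of an already ranked list.

```python
def visible_results(matches, page=0, default_size=20, largest=5000):
    """Return one page of results after applying the largest-set cap."""
    if default_size > largest:
        raise ValueError("default result set size cannot exceed the largest possible size")
    capped = matches[:largest]              # results beyond the cap are never shown
    start = page * default_size
    return capped[start:start + default_size]

hits = ["doc%d" % i for i in range(1000)]   # 1000 matching documents
first_page = visible_results(hits, page=0, largest=250)
last_page = visible_results(hits, page=12, largest=250)
print(len(first_page), len(last_page))
```

With a largest possible result set size of 250, the 1000 matches are cut to 250 before paging, so the thirteenth page holds only the final 10 items and later pages are empty.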

Table 12-1    Common POSIX Date and Time Formats

Format    Displayed result (example)
%a        Abbreviated week day (for example, Wed)
%A        Full week day (for example, Wednesday)
%b        Abbreviated month (for example, Oct)
%B        Full month (for example, October)
%c        Date and time formatted for current locale
%d        Day of the month as a decimal number (for example, 01-31)
%H        Hour as a decimal number, 24 hour military format (for example, 00-23)
%m        Month as a decimal number (for example, 01-12)
%M        Minute as a decimal number (for example, 00-59)
%x        Date
%X        Time
%y        Year without century (for example, 00-99)
%Y        Year with century (for example, 1999)
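
Python's strftime accepts the same POSIX symbols, so it is a convenient way to preview a Date/Time string before entering it. This is only a preview aid: the server formats dates itself, and the names in %a, %A, %b, and %B depend on the current locale. The format_result_date helper and the sample date are hypothetical.

```python
from datetime import datetime

def format_result_date(when, fmt):
    """Format a timestamp using POSIX-style symbols such as %d, %m, and %Y."""
    return when.strftime(fmt)

sample = datetime(1999, 10, 6, 14, 5)

# Numeric symbols give the same output in every locale.
print(format_result_date(sample, "%d/%m/%y %H:%M"))

# Day and month names vary with the locale settings of the machine.
print(format_result_date(sample, "%A, %B %d, %Y"))
```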

Configuring Your Search Pattern Files

Pattern files are HTML files that define the layout of the text search interface. You can associate a pattern file with a search function, and a set of pattern variables to create a specific portion of the interface. In the pattern file you define the look, feel, and function of the text search interface. Pattern files use pattern variables that allow you to customize background color, help text, banners, and so on. In some cases, the values are pathnames to files containing the actual text and graphics that these variables represent; in other cases, the values represent text and HTML.
You can use the default pattern files, or you can create your own customized set of files. The default start and end pattern files are used if no start and end pattern files are listed for a collection, or in a multi-collection search. For more information about how to change the user interface, see Customizing the Search Interface.
To define where the search function looks for default pattern files associated with a particular search request, you have to specify paths for the files.
To configure pattern files, perform the following steps:

Access the Server Manager and choose Search.

Click the Search Pattern Files link.

The web server displays the Search Pattern Files page.

Enter the absolute path for the directory where your pattern files are stored.

The default start (header), end (footer), and query page pattern files are located in this directory.

Enter the relative pathname for the Default Start Pattern File.

This entry defines the top of the search results page when a collection has no defined header file, or when more than one collection is being searched.

Enter the relative pathname for the Default End Pattern File.

This entry defines the footer of the search results page when a collection has no defined footer file, or when more than one collection is being searched.

Enter the relative pathname for the Pattern File for Query Page.

This entry defines the search query page that appears when the search function is started.

Click OK to configure your search pattern files.

Click Apply.

Choose Apply Changes to hard start/restart your server.

Configuring Files Manually

The search function examines several configuration files to determine how search is configured on your server. These files define system settings, user-defined variables, and information about your search collections. You normally change this information through the iPlanet Web Server's Search pages, but you can also modify the files manually with your own text editor. Some of the implications of changing the configuration files in order to customize the user interface are discussed in Customizing the Search Interface.

Note
Manual modifications to your configuration files are not recommended. If you do make manual modifications, remember to restart the server for your modifications to take effect.

The Configuration Files

The configuration files that govern searching are described in the following list:

userdefs.ini—This user definitions file defines the user-defined pattern variables. It maps to the userdefs.ini file for your language (English, German, Japanese, and so on).

You can customize a search interface for all your pattern files by creating and defining your own pattern variables in the userdefs.ini file. For more information, see User-defined Pattern Variables.

dblist.ini—This collection contents file describes collection-specific information. When you create and maintain collections, the dblist.ini file is updated for you with information about your collections.

Adjusting the Maximum Number of Attributes

Collections have different sets of default attributes depending on their file format. For example, HTML files have Title and SourceType attributes. You can also define META-tagged HTML attributes in your HTML files. Some file formats, such as PDF, have a great many default attributes. For more information about the attributes for each format, see About Collection Attributes and Table 12-2.

Restricting Memory for Indexing

You can set a limit on the amount of RAM available for indexing operations. To do this, manually edit the [NS-loader] file to add a line defining a maximum memory amount. For example:

NS-max-memory = 32000000

The server default uses all of the available system memory. Typically you need to limit the RAM used for indexing if:

The server is installed on a machine that has less than the suggested minimum RAM

Your Windows NT server requires a great deal of indexing, but also needs memory for other server operations

Restricting Your Index File Size

You can limit how much disk space an index file can consume. To do this, you need to manually edit the [NS-loader] file to define a maximum index file size. For example:

NS-max-idx-file-size = 1500000

An indexing operation typically requires approximately 1.5MB per file, and since there are two files, one of which is temporary, you may need as much as 3MB of disk space for indexing. Setting the file size to 1.5MB per file puts a cap on how large each file can become.
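
The disk space arithmetic can be worked through in a short sketch. It assumes the setting is expressed in bytes (1500000 corresponding to the 1.5MB mentioned above) and that both index files can reach the cap at the same time; parse_setting and peak_index_disk_bytes are hypothetical helper names.

```python
def parse_setting(line):
    """Split a 'name = value' line into a name and an integer value."""
    name, _, value = line.partition("=")
    return name.strip(), int(value.strip())

def peak_index_disk_bytes(max_idx_file_size, files=2):
    """Worst case: the permanent and the temporary index file both reach the cap."""
    return max_idx_file_size * files

name, size = parse_setting("NS-max-idx-file-size = 1500000")
print(name, size, peak_index_disk_bytes(size))
```

Two files capped at 1500000 bytes each give the 3MB figure quoted for indexing.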

Indexing Your Documents

A database of searchable data is required for users to search. You must create a database, called a collection, that indexes and stores information about the documents, such as their content and file properties.
Searches require collections of files to target. Once the documents are indexed, their contents and file properties, such as their titles, creation dates, and authors, are available for searching.
You can add documents to a collection or delete them from it, and you can optimize, update, and manage your collections as needed.
This section includes the following topics:

About Collections

About Collection Attributes

Creating a New Collection

Configuring a Collection

Updating a Collection

Maintaining a Collection

Scheduling Regular Maintenance

Removing Scheduled Collection Maintenance

About Collections

When your server administrator indexes all or some of a server's documents, information about the documents is stored in a collection. Collections contain such information as:

Format of the documents

Language they are in

Searchable attributes

Number of documents in the collection

Collection's status

Brief description of the collection.
For more details, see Displaying Collection Contents.
When creating a collection, you must define the type of files that it contains:

HTML

ASCII

News

Email

PDF
During indexing, this definition determines which attributes are indexed, and whether any file conversion is needed.
You can index all the files in a directory, or only those with a specific extension, for example .html or .pdf.
A collection has records with information about each document that has been indexed. If the document is deleted from the collection, only the collection's entry for that document is removed. The original document is not deleted.
When you have multiple server instances, a collection is only associated with the server instance where it was created. Users can only search collections within that server instance.

About Collection Attributes

Certain file formats have a default set of attributes indexed for files of that type, as shown in Table 12-2.

Table 12-2    The Default Attributes Indexed for Each File Format

File format  Attribute             Type     Description
ASCII        (none)                -        -
HTML         Title                 text     The user-defined title of the file.
             SourceType            text     The original format of the document.
NEWS         From                  text     The source userID of the news item.
             Subject               text     The text from the subject field of the news item.
             Keywords              text     Any keywords defined for the news item.
             Date                  date     The date the news item was created.
EMAIL        From                  text     The source userID of the email.
             To                    text     The destination userID of the email.
             Subject               text     The text from the email's subject field.
             Date                  date     The date the email was created.
PDF          InstanceID            text     An internal ID number.
             PermanentID           text     An internal ID number.
             NumPages              integer  The number of pages in the document.
             DirID                 text     The directory where the PDF file exists.
             FTS_ModificationDate  date     The document's last modification date.
             FTS_CreationDate      date     The document's creation date.
             WXEVersion            integer  The version of Adobe Word Finder used to extract the text from the PDF document.
             FileName              text     The Adobe filename specification.
             FTS_Title             text     The document's title.
             FTS_Subject           text     The document's subject.
             FTS_Author            text     The document's author.
             FTS_Creator           text     The document's creator.
             FTS_Producer          text     The document's producer.
             FTS_Keywords          text     The document's keywords.
             PageMap               text     The page map, describing the word instances for the page.

By default, HTML collections have Title and SourceType attributes, but they can be indexed to permit searching and sorting by up to 30 file attributes tagged with the HTML <META> tag. You can change the maximum settings for file attributes as discussed in Adjusting the Maximum Number of Attributes.
For example, a document could have these lines of HTML code:

<META NAME="Writer" CONTENT="R. Hunter">

<META NAME="Song" CONTENT="Stella Blue">

If this document was indexed with its META tags extracted, you could search it for specific values in the Writer or Song fields. For example, you could enter this query: Writer <contains> Hunter or Song <contains> Blue.

Note
Attribute values in META-tagged fields are text strings only, which means that all numeric values, such as date and time, are sorted as text. Any illegal HTML characters in a META-tagged attribute are replaced with a hyphen.
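
As a rough illustration of what extracting META-tagged attributes involves, the following sketch pulls NAME and CONTENT pairs out of HTML with Python's standard html.parser module. It is not the server's indexer: MetaExtractor and extract_meta are hypothetical names, and the limit of 30 simply mirrors the default maximum mentioned above.

```python
from html.parser import HTMLParser

class MetaExtractor(HTMLParser):
    """Collect NAME/CONTENT pairs from META tags, up to a fixed limit."""

    def __init__(self, limit=30):
        super().__init__()
        self.limit = limit
        self.attributes = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        fields = dict(attrs)
        name, content = fields.get("name"), fields.get("content")
        if name and content is not None and len(self.attributes) < self.limit:
            self.attributes[name] = content   # values are kept as plain text

def extract_meta(html_text, limit=30):
    parser = MetaExtractor(limit)
    parser.feed(html_text)
    return parser.attributes

doc = '<META NAME="Writer" CONTENT="R. Hunter">\n<META NAME="Song" CONTENT="Stella Blue">'
attrs = extract_meta(doc)
print(attrs)
print("Hunter" in attrs["Writer"])   # analogous to: Writer <contains> Hunter
```

Because every value is a text string, numeric-looking values compare as text: sorted(["9", "10"]) puts "10" first, which is the sorting behavior the note above warns about.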

Creating a New Collection

You can have at most twelve collections on your server. To create a thirteenth collection, you must first remove an existing collection using Search/Maintain Collection.
You can only have entries for a maximum of 16 million documents in your collections. A document that is indexed in multiple collections counts as multiple documents. It is best to create new collections of over 10,000 documents at low-traffic times, or the indexing operation may affect your system's performance.
You can create a collection that indexes the content of all or some of the files in a directory. You can define collections that contain only one kind of file, or you can create a collection of documents in various formats that are automatically converted to HTML during indexing. When you define a multiple-format collection with the auto-convert option, the indexer first converts the documents into HTML, and then indexes their contents. The converted HTML documents are put into the html_doc directory in the server's search collections folder.
The file format you choose defines which default attributes are used in the collection, and whether automatic HTML conversion of the content is needed during indexing. For information about the attributes for each format, see Table 12-2, and About Collection Attributes.
Regardless of the file type chosen, the content of the file is always indexed. If you choose HTML as the file type, the server creates the collection with the HTML default attributes, and does not attempt to convert any non-HTML files you try to index. If you index HTML files into an ASCII collection, even the HTML markup tags are indexed as part of the file's contents, and the contents are displayed as raw text.

Note
You need to have at least 3MB of available disk space on your system to create a collection. For information on how you can restrict the size of the index files, see Restricting Your Index File Size.

To create a new collection, perform the following steps:

Access the Server Manager and, from the drop-down list, select the server for which you want to create a collection.

Choose the Search tab.

Click the New Collection link.

The web server displays the Create a Collection page.

Select:

The current document directory from the Directory to Index field

A different document directory from the drop-down list of directories defined for the server

View for a list of files and subdirectories

For more information about additional document directories, see Mapping URLs.

Accept the default *.html for the Documents Matching field, or define your own wildcard expression.

You can define multiple wildcards in an expression. For example:

(*.htm|*.html) or *(.htm|.html)

For details of the syntax for wildcard patterns, see Using Wildcards.

Note
You cannot index a file that includes a semi-colon (;) in its name. Rename these files.

Choose Yes to include subdirectories within the specified directory in the index.

Enter a name for your collection in the Collection Name field.

The collection name is used for collection maintenance. This is the physical file name for the collection, so follow the standard directory-naming conventions for your operating system. You can use a maximum of 128 characters. Spaces are converted to underscores.

Note
Do not use accented characters in the collection name. If you need accented characters, exclude the accents from the collection name, but use them in the label. The label is displayed to the user from the search interface.

(Optional) Enter a user-defined name for your collection in the Collection Label field.

This name is displayed when users perform a text search. Make your collection's label as descriptive and relevant as possible. You can use any characters except single or double quotation marks, up to a maximum of 128 characters.

(Optional) Enter a description for your collection (up to a maximum of 1024 characters) in the optional Description field.

This description is displayed in the collection contents page.

Select the type of files the collection is to contain:

ASCII

HTML

News

Email

PDF

Select whether or not to extract META-tagged attributes from HTML files during indexing.

Only select this option for HTML collections. Extracting these attributes allows you to search their values. You can index a maximum of 30 different user-defined META tags in a document.

Choose the collection's language from the drop-down list.

The default is English, labeled "English (ISO-8859-1)." For more information on character sets, see Content Management.

Click OK to create a new collection.

Note
Once you begin indexing a collection, you cannot stop the process until either the indexing is complete, or you reboot the system. Shutting down your server does not kill the process.

Click Apply.

Choose Apply Changes to hard start/restart your server.
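
The naming limits in the steps above are easy to check before you open the Create a Collection page. The following sketch is a hypothetical pre-flight check, not a server tool: the function names are invented for illustration, and treating every non-ASCII character as "accented" is a simplifying assumption.

```python
MAX_NAME_CHARS = 128
MAX_LABEL_CHARS = 128
MAX_DESCRIPTION_CHARS = 1024

def normalize_name(name):
    """Spaces in a collection name are converted to underscores."""
    return name.replace(" ", "_")

def check_collection(name, label="", description=""):
    """Return a list of problems with the proposed collection settings."""
    problems = []
    if len(name) > MAX_NAME_CHARS:
        problems.append("collection name exceeds 128 characters")
    if not name.isascii():
        # Assumption: any non-ASCII character counts as an accented character.
        problems.append("collection name contains accented characters")
    if len(label) > MAX_LABEL_CHARS:
        problems.append("label exceeds 128 characters")
    if "'" in label or '"' in label:
        problems.append("label contains quotation marks")
    if len(description) > MAX_DESCRIPTION_CHARS:
        problems.append("description exceeds 1024 characters")
    return problems

def unindexable_files(filenames):
    """Files with a semicolon in their name cannot be indexed."""
    return [f for f in filenames if ";" in f]

print(normalize_name("Sales Plans 1999"))
print(check_collection("plans", label="Business Plans"))
print(unindexable_files(["a.html", "b;1.html"]))
```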

Configuring a Collection

After you have created a collection you can modify some of its initial settings. These settings reside in the collection information file, dblist.ini. When you reconfigure a collection the dblist.ini file is updated to reflect your changes. For more information about the configuration files, see Configuring Files Manually. You can modify your settings to:

Revise the description

Change the label

Define a different URL for the documents

Define how to highlight displayed documents

Define which pattern files to use

Define how to format dates

You should avoid making unnecessary changes to a collection's settings.
To configure a collection, perform the following steps:

Access the Server Manager and, from the drop-down list, select the server instance that the collection is in.

Choose the Search tab.

Click the Configure Collection link.

The web server displays the Configure Collection page.

Choose the collection to configure.

You can enter or change:

A description using up to 1024 characters in the optional Description field.

A user-defined name in the Label field.

A URL in the URL for Documents field, if that has changed.

For example, you might change the URL mapping from publisher/help, to the simpler /helpFiles.

The HTML tagging the server will use when highlighting a search query word or phrase in a displayed document in the Highlight Begin and Highlight End fields.

The default is bold, with the <b> and </b> tags, but you can add to this or change it. For example, you could add <blink><FONT COLOR="#FF0000"> and the corresponding </FONT></blink> to highlight with blinking bold red text.

Select the format for input dates.

Define or change the default pattern files for displaying the search results for:

Header

Footer

Record

Enter or change the name of the pattern file displaying a single highlighted document from the list of search results in the Result Pattern File field.

Click OK to change the collection configuration.

When the server finishes configuring the collection, click Apply.

Choose Apply Changes to hard start/restart your server.
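
The effect of the Highlight Begin and Highlight End fields can be pictured with a small sketch. It assumes highlighting is a simple case-insensitive wrap of each occurrence of the query term; the server's real matching is likely more involved, and highlight is a hypothetical function.

```python
import re

def highlight(text, term, begin="<b>", end="</b>"):
    """Wrap each case-insensitive occurrence of term in the begin/end tags."""
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    # A function replacement keeps the matched text exactly as it appeared.
    return pattern.sub(lambda match: begin + match.group(0) + end, text)

print(highlight("Stella Blue by R. Hunter", "blue"))
print(highlight("Stella Blue", "blue",
                begin='<FONT COLOR="#FF0000">', end="</FONT>"))
```

Whatever tags you choose, the end string should close them in the reverse order from the begin string so that the displayed document remains valid HTML.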

Updating a Collection

After you have created a collection, you may want to add or remove files. When adding documents, the file contents are indexed and converted, if necessary. If you are removing documents, the entries for the files are removed from the collection along with their metadata. The original documents are not affected, only their entries in the collection.

Note
If you selected the Extract Metatags option when you created a collection, then the META-tagged HTML attributes are indexed whenever you add new documents to it.

To update a collection, perform the following steps:

Access the Server Manager and, from the drop-down list, select the server instance that the collection is in.

Choose the Search tab.

Click the Update Collection link.

The web server displays the Update Collection page.

Choose the collection to update.

The server displays the list of documents that have index entries in the currently selected collection. Each list holds 100 records; use the Prev and Next buttons to display more lists for collections with more than 100 files.

Enter a single filename, or use wildcards to specify the type of files you want added or removed from the collection in the Documents Matching field.

Entering a wildcard such as *.html allows only files with this extension to be updated. For files within a subdirectory, enter the pathname as it appears in the list of files. For example: frenchDocs/*.html

Caution
Be careful entering wildcard expressions. Entering index.html allows you to add or remove the index file from the current collection, but */index.html causes you to add or remove all index.html files in the collection.

Choose whether to include subdirectories.

Click either:

AddDocs to add the indicated files and subdirectories

RemoveDocs to remove the indicated files

Click Apply.

Choose Apply Changes to hard start/restart your server.
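
To see why the caution about wildcard expressions matters, you can preview which entries a pattern would select. The sketch below uses Python's fnmatch module as a stand-in; the server's own wildcard syntax (see Using Wildcards) is not identical, so treat the results as an approximation. The select_documents function and the sample paths are hypothetical.

```python
from fnmatch import fnmatchcase

def select_documents(paths, pattern):
    """Return the collection entries whose pathname matches the pattern."""
    return [path for path in paths if fnmatchcase(path, pattern)]

entries = [
    "index.html",
    "frenchDocs/index.html",
    "frenchDocs/intro.html",
    "germanDocs/index.html",
]

print(select_documents(entries, "index.html"))          # one specific file
print(select_documents(entries, "*/index.html"))        # every matching file in subdirectories
print(select_documents(entries, "frenchDocs/*.html"))   # all HTML files in one subdirectory
```

A single added wildcard turns an operation on one file into an operation on many, so it is worth checking the pattern against the displayed document list before clicking AddDocs or RemoveDocs.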

Maintaining a Collection

Periodically, you may want to maintain your collections. With normal usage, these tasks may not be necessary, unless you index and update collections frequently.
You can perform the following collection management tasks:

Optimize collections—You can optimize a collection to improve performance if you frequently add, delete, or update documents or directories in your collections. An analogy is defragmenting your hard drive. Optimizing is not done automatically; you must manually optimize after you reindex or update a collection. You might want to optimize a collection just before publishing it to another site, or before putting it onto a read-only CD-ROM.

Reindex—You can reindex a collection. Each file that already has an entry in the collection can be located and its attributes and contents reindexed. META-tagged attributes will be extracted if that option was selected when the files were originally indexed into the collection. This does not return to the original criteria for creating the collection, say *.html, and add any new documents that fit the original criteria. Collection entries are removed when the source documents have been deleted and can no longer be found.

Remove—You can remove a collection. This only removes the collection, not the original source documents.

Note
Do not use your local file manager to remove collections. When you try to execute a search before restarting your server again, the search will fail.

To perform any of these collection management tasks, use the Maintain Collection link in the Server Manager.

Scheduling Regular Maintenance

You can schedule collection maintenance at regular intervals, and you can set up separate maintenance schedules for optimizing and reindexing. With normal usage, these tasks may not be necessary, unless you index and update collections frequently. For example, some very active web sites may require frequent reindexing if new documents are added on a daily basis.
A common approach is to schedule the following combination of tasks regularly:

Cleaning out deleted entries with reindex and update operations

Adding entries for new documents matching your collection criteria.

Updating a collection by entering new indexing criteria for the collection
To optimize, reindex, or update your collection, perform the following steps:

Choose Search from the Server Manager.

Click the Schedule Collection Maintenance link.

The web server displays the Schedule Collection Maintenance window.

Choose a collection from the drop-down list.

This lists all the collections that you have created.

Choose an action from the drop-down list:

Reindex

Optimize

Update

You can set up different schedules for different actions on the same collection. If you choose to update your collection, two extra fields are displayed for entering the document matching criteria, and for including documents found in subdirectories that match your criteria.

Enter the time of day when you want the scheduled maintenance to take place in the Schedule Time field.

Use 24-hour format (HH:MM). HH must be less than 24 and MM must be less than 60. You must enter a time.

Check one or more days in the section labeled Schedule Day(s) of the Week.

You can select all days, but you must select at least one day.

Click OK to schedule the maintenance.
For Unix/Linux users, to make your newly scheduled maintenance take effect, you must restart the ns-cron process from the Administration Server.
To restart the ns-cron process, perform the following steps:

From the Administration Server, choose Global Settings.

Click the Cron Control link.

If ns-cron is already on, click Restart to restart it. If ns-cron is not on, click Start to start it up.

In either case, your regularly scheduled maintenance will now be able to take place automatically.
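
The rules for the Schedule Time and Schedule Day(s) of the Week fields can be expressed as a short check. The following Python sketch is not part of iPlanet Web Server; the function name, the day abbreviations, and the requirement of exactly two digits for HH are assumptions made for illustration.

```python
import re

# Hypothetical day labels; the Server Manager form may use different names.
DAYS = {"Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"}


def validate_schedule(time_text, days):
    """Return (hour, minute) if the schedule follows the rules above, else raise ValueError."""
    match = re.fullmatch(r"(\d{2}):(\d{2})", time_text.strip())
    if match is None:
        raise ValueError("time must be entered as HH:MM")
    hour, minute = int(match.group(1)), int(match.group(2))
    if hour >= 24:
        raise ValueError("HH must be less than 24")
    if minute >= 60:
        raise ValueError("MM must be less than 60")
    if not days:
        raise ValueError("select at least one day")
    unknown = set(days) - DAYS
    if unknown:
        raise ValueError("unknown day(s): " + ", ".join(sorted(unknown)))
    return hour, minute


print(validate_schedule("02:30", ["Sat", "Sun"]))  # (2, 30)
```

A value such as 24:00, or an empty list of days, raises ValueError in this sketch, mirroring the constraints stated for the form.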

Removing Scheduled Collection Maintenance

You can remove the scheduled regular maintenance of a collection if it is no longer needed.
To unschedule collection maintenance, perform the following steps:

Choose Search from the Server Manager.

Click the Remove Scheduled Collection Maintenance link.

The web server displays the Remove Scheduled Collection Maintenance window.

Choose a collection from the drop-down list for Choose Collection.

This lists all your collections for which you have set up regular maintenance.

Choose an action from the drop-down list: Reindex, Optimize, or Update.

In the lower part of the frame, you can see the time and days of the week when the scheduled maintenance is currently scheduled to take place.

Click OK to remove the scheduled maintenance.
For Unix/Linux users, to make the removal of the scheduled maintenance take effect, you must restart the ns-cron process.
To restart the ns-cron process, perform the following steps:

From the Administration Server, choose Global Settings.

Click the Cron Control link.

If ns-cron is already on, click Restart to restart it. If ns-cron is not on, click Start to start it up.

In either case, your regularly scheduled maintenance will no longer take place.

Performing a Search: The Basics

Users are primarily concerned with asking questions of the data in the search collections and getting a list of documents in return. When you install the iPlanet Web Server, a default set of search query and result forms is included. These give users a simple method of accessing the search function.
There are four parts to text searching:

making a query—the user enters search criteria

displaying search results—the server displays a list of the documents that match your criteria

viewing a document—the user can view a specific highlighted document from the search results list

viewing the contents of a collection—the user can look at the information that is maintained for each of your collections.

Note
If the search function is turned off, these query forms are not available.

This section includes the following topics:

Search Home Page

A Search Query

Guided Search

Advanced Search

The Search Results

Displaying Collection Contents

Search Home Page

The search home page (see: http://server_root:port/search) provides individual links to each of the three search query interfaces as well as an online QuickStart tutorial on customizing the interface. The tutorial discusses the various pattern files and gives examples of how they can be changed to produce different results.

A Search Query

The default installation of iPlanet Web Server includes three search query pages: standard and advanced HTML queries and a Java-based guided query.
With the standard search query, you select a collection to search against and enter a word or phrase to search for using the query language operators.
With the guided Java-based search interface, you can use the many drop-down lists to easily construct a query. To do this, Java must be enabled in your browser.
With the advanced HTML page, you have the additional options of selecting multiple collections to search through, establishing a sort sequence for the results, and defining how many documents are to be displayed on a page. Typically, clicking the Prev and Next arrows moves you through the pages of results.
To perform a standard search, perform the following steps:

Enter the following URL in the location field in your web browser:

http://server_root:port/search

In the search query page that appears, choose the collection you want to search through from the drop-down list in the Search In field.

Enter the word or phrase for your search query in the For field. You can create complex queries by combining operators. For details about the search operators, see Using the Query Operators.

Click the Search button to execute your query.

Guided Search

You can choose to use the Java-based guided search interface, which helps you construct the query. This is especially useful if you want to build a query that has several parts, say searching for a word in the documents' content as well as a specific attribute value.
Make sure Java is enabled for your browser. Use Preferences/Languages to enable it.

Note
The attributes for Version Control and Link Management are no longer used in iPlanet Web Server. However, note that if you perform a guided search, iPlanet Web Server may still return them; consequently, do not use these variables.

There are two ways to obtain the guided search page:

Through the Search home page

Through the standard search query page
To access the guided search interface through the Search home page, perform the following steps:

Enter the following URL in the location field in your web browser:

http://server_root:port/search

Click the Guided Search link on the home page.
To access guided search through the standard search query page, perform the following steps:

Go to the standard search query page by typing the following URL in the location field in your web browser:

http://server_root:port/search

Click Guided Search on the standard search page and the guided Java-based query page is displayed.

Choose the collection you want to search through from the drop-down list in the Search In field.

Use the For drop-down list to select the type of element you wish to search for. In this example, choose Words.

Enter the word you want to search for in the blank text field.

For details about the search operator, see Using the Query Operators.

Click Add Line to add the first part of the query. The word appears in the large text display box at the bottom of the form.

Choose another element from the drop-down list to add to your query. In this example, choose Attribute.

Choose the attribute you want to search against from the new drop-down list of all attributes that are available for the chosen collection.

Choose a query operator (Contains, Starts, Ends, Matches, Has a substring), or logical operator (=, <, >, <=, >=) for your query from the drop-down list above the text input field.

Enter the attribute value you want to search for in the blank text field.

Choose:

Add Line to add another line for your query

Undo Line to remove the last line you added

Clear to remove the entire query

Click the Search button to execute the search.

Advanced Search

You can choose to use the advanced HTML search interface, which helps you construct the query. This is especially useful if you want to create a query that searches through more than one collection, or that produces results sorted by a specific attribute value.
There are two ways to obtain the advanced HTML search page:

Through the Search home page

Through the standard search query page

To access advanced HTML through the Search home page, perform the following steps:

Enter the following URL in the location field in your web browser:

http://server_root:port/search

Click the Advanced HTML Search link on the home page.
To access advanced HTML search through the standard search query page, perform the following steps:

Go to the standard search query page by typing the following URL in the location field in your web browser:

http://server_root:port/search

Disable Java for your browser using Preferences/Languages.

Click Guided Search on the standard search page and the web server displays the advanced HTML query page.

Enter the word or phrase you want to search for in the For field.

You can create complex queries by combining operators. For details about the search operators, see Using the Query Operators.

Enter one or more attributes to sort the results by.

The default is an ascending sort order, but you can indicate a descending sort order with a minus. For more information about sorting, see Sorting the Results.

Expand or limit the number of matching documents you want the search to return depending on how many fields are listed for each document in the search results page, or how many you want to see at a time.

The Prev and Next buttons allow you access to additional pages of documents if there are too many to fit on a page at once.

Use the drop-down list in the Search In field to choose the collection you want to search through.

You can select more than one collection by holding down the Ctrl key as you click on another collection. All collections in a query must be in the same language.

Click the Search button to execute your query.

The Search Results

There are two standard types of search results:

A list of all documents that match the search criteria

The text of a single document that you selected from the list of matching documents
Your access permissions are checked at several points during the search process:

When a user clicks the icon displayed for a document in the search results to view the highlighted version

When a user searches a collection while the NS-collection-acl-check option is set to yes. NS-collection-acl-check applies to all collections. When it is set, the server honors ACLs set on URIs that match the primary document directory defined for each collection (in dblist.ini) and does not allow searches on the collections that those ACLs restrict

Listing Matched Documents

With the default installation of the iPlanet Web Server, when you execute a search from either the simple or advanced search query pages, the server returns a list of the documents that match your search criteria. The list gives some standard information about each file, depending on the collection's format. For example, the default results page for email collections gives subject, to, from, and date for each entry, and the page for news collections gives subject, from, and date for each entry.
The file format in the collection indicates which default attributes are available for searching. For information about the attributes for each format, see About Collection Attributes.
For entries resulting from a search that checks for comparative proximity of words to each other, or for the exactness of the match, the file's ranking can be provided by showing a score.
If there are more matching documents than can fit on a page, click Next to see the next batch. You can always execute a new search by entering new query data and clicking Search.

Sorting the Results

By default, or if you don't enter anything in the Sort By field on the advanced HTML query page, all documents matching the search are returned according to:

Their relevance ranking (for queries that consider this)

Their position in the server file database (for other queries)
If you enter an attribute name in the Sort By field, the documents are displayed in an ascending sort sequence. You can list the documents in a descending sort sequence by adding a minus sign (-) prefix to the attribute, as in -keywords or -title. You can do a multiple sort, by typing in more than one field, as in Author,-PubDate.
In a short query, sort order usually isn't critical, but in queries that result in a great many matches, you may want to set a sort value in order to obtain useful search results. However, using a special sort sequence may impact the search's performance.

Note
Attribute values in META-tagged fields are text strings only, which means that all numeric values, such as date and time, are sorted as text. Any illegal HTML characters in a META-tagged attribute are replaced with a hyphen.
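
To make the Sort By syntax concrete, the following Python sketch parses a value such as Author,-PubDate and applies it to a few sample records. It only imitates the behavior described above and is not the search engine's own code; the function names and the sample records are hypothetical. All values are compared as text, as the preceding note says happens for META-tagged attributes.

```python
def parse_sort_spec(spec):
    """Split a Sort By value such as 'Author,-PubDate' into (attribute, descending) pairs."""
    keys = []
    for field in spec.split(","):
        field = field.strip()
        if not field:
            continue
        descending = field.startswith("-")
        name = field[1:] if descending else field
        keys.append((name, descending))
    return keys


def sort_records(records, spec):
    """Return the records ordered by every key in the Sort By value."""
    ordered = list(records)
    # Sort by the last key first. Python's sort is stable, so the
    # first key in the Sort By value ends up taking precedence.
    for name, descending in reversed(parse_sort_spec(spec)):
        ordered.sort(key=lambda rec: str(rec.get(name, "")), reverse=descending)
    return ordered


# Hypothetical search results; PubDate is held as text on purpose.
docs = [
    {"Author": "Lee", "PubDate": "9"},
    {"Author": "Lee", "PubDate": "10"},
    {"Author": "Adams", "PubDate": "2"},
]
for doc in sort_records(docs, "Author,-PubDate"):
    print(doc["Author"], doc["PubDate"])
```

In this sketch the two Lee records come out as 9 and then 10 under the descending PubDate key, because the text "9" sorts after the text "10". This is the effect to expect when numeric values are sorted as text.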

Displaying a Highlighted Document

In the default installation of iPlanet Web Server, when you obtain a list of the documents that match your search criteria, you can select a single document to view in your web browser. Depending on how the pattern files are set up, the word you entered as your original search query can be highlighted in the displayed document with color, boldface text, or blinking.
To view a highlighted document, click on the document's entry in the search results. The field you use to access the highlighted document depends on how your search interface has been designed; in the default installation you click the icon shown next to the document's listing. Additional code behind the icon's link defines how to format the displayed document with the search query highlighted.
In the default search results page, if you click the file's URL, the file opens in your browser without any special highlighting.
In the case of documents that have been converted into HTML, the URL points you to the original document. To get to the converted HTML document, click the document's title.

Displaying Collection Contents

You can display the contents of your collection database to see which attributes are set for each collection. The default installation of iPlanet Web Server uses the HTML-descriptions.pat file to display information about each of your displayable (NS-display-select = YES) collections in the dblist.ini file. The collection contents typically include these items:

Collection name, label, and description

Collection format

Number of attributes in the collection and a list of their names

Number of documents in the collection

Collection size and status

Language and character set

Input and output date formats
To display your collection database contents, use the following URL:

http://server_root:port/search?NS-search-page=c

Using the Query Operators

To perform an effective search, you need to know how to use the query operators. You can only do Boolean searches, so all the subsequent information is based on Boolean search rules.

Note
The query language is not case-sensitive. The examples use uppercase for clarity only.

The search engine interprets the search query based on a set of syntax rules. For example, by entering the word region, the actual word region and all its stemmed variations, such as regions and regional, are found. The search results are ranked for importance, meaning how close the matched word comes to the originally input search criteria. In the example above, region would rank higher than any of the stemmed variants.
Not all queries rank their results. Only those queries that can have varying degrees of matching can be ranked. For example, <CONTAINS> queries either do or do not contain the given string, but <NEAR> queries can be ranked according to how close the words are to each other. Words closer together are listed at the top of the search results, while those that are far apart are put at the bottom of the results.
This section includes the following topics:

Default Assumptions

Search Rules

Determining Which Operators To Use

Using Wildcards

Default Assumptions

The search query language has some implicit defaults and assumptions that dictate how your input is interpreted. In some cases, you can circumvent the defaults, but the search engine decides what results to return using:

<STEM> Search finds all documents that contain any stemmed variant of the search word or phrase. The search engine looks at the meaning of the word, not just its spelling. For example, if you want to search plan, the results would include documents that contain planning and plans, but not those that contain plane or planet.

<MANY> Search considers how often the search word or phrase appears in the found documents and ranks the results for frequency or relevancy.

<PHRASE> Search considers words separated by spaces to be part of a phrase. For example, Monterey otter is interpreted as a phrase, and both words must be present and together to be found. Such a search would not find documents containing sea otter or Monterey Bay.

In any case where it's not clear that two words are to be considered as a phrase, you can use parentheses for clarity. For example, <PHRASE> (rise "and" fall).

OR Search considers each word or phrase in the query separated by a comma to be optional, although at least one must be present. In effect, this is an implicit OR operation. For example, Monterey, otter is interpreted as find documents that contain either Monterey or otter. Note that angle brackets are not required for OR.

Search Rules

To create complex searches, you can:

Combine query operators

Manipulate the query syntax

Include wildcard characters

Angle Brackets

With the exception of the AND, OR, NOT, and the date and numeric comparison operators, you need to enclose query operators in angle brackets, as in <CONTAINS> and <WILDCARD>.

Combining Operators

You can combine several query operators into a single query to obtain precise results. For example, you can input the following query to limit your search to those documents that have Bay and Monterey, but exclude those that also mention Aquarium:

Monterey AND Bay NOT <CONTAINS> Aquarium

You can achieve even greater precision by including some implicit phrases, as in the following query that finds documents that refer to the Monterey Bay Aquarium by its full name and also mention otters but do not refer to shark:

Monterey Bay Aquarium AND otter AND NOT shark

Using Query Operators as Search Words

You can use any of the query operators as a search word, but you must enclose the word in quotation marks. For example, you could search for documents about the ebb and flow of the tides with the following query:

<CONTAINS> ebb "and" flow

Canceling Stemming

You can cancel the implicit stemming by using quotation marks around a word. For example, you can be exact by using a query such as this:

"plan"

This search only results in documents that contain the exact word plan. It ignores documents with plans or planning.

Modifying Operators

You can use AND, OR, and NOT to modify other operators. For example, you may want to exclude documents with titles that contain the phrase theme park. A query such as this would solve this problem:

Title NOT <CONTAINS> theme park
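
The search rules above (angle brackets, quoting operator words, and canceling stemming) can be summarized in a small query-building helper. The Python sketch below is an illustration only and is not an iPlanet API. The function names are hypothetical, and the operator list is limited to operators named in this chapter.

```python
# Operators that this section says are written without angle brackets.
BARE_OPERATORS = {"AND", "OR", "NOT"}
# Operators named in this chapter that are written in angle brackets.
BRACKETED_OPERATORS = {
    "CONTAINS", "STARTS", "ENDS", "MATCHES", "NEAR",
    "PHRASE", "STEM", "SUBSTRING", "WILDCARD", "WORD",
}


def operator(name):
    """Format an operator, adding angle brackets where the rules require them."""
    upper = name.upper()  # the query language is not case-sensitive
    if upper in BARE_OPERATORS:
        return upper
    if upper in BRACKETED_OPERATORS:
        return "<" + upper + ">"
    raise ValueError("unknown operator: " + name)


def search_word(word, exact=False):
    """Quote a word that collides with an operator name, or to cancel stemming."""
    if exact or word.upper() in BARE_OPERATORS | BRACKETED_OPERATORS:
        return '"' + word + '"'
    return word


# Using an operator name as a search word.
print(" ".join([operator("contains"), search_word("ebb"), search_word("and"), search_word("flow")]))
# Canceling stemming.
print(search_word("plan", exact=True))
# Modifying an operator with NOT.
print(" ".join(["Title", operator("not"), operator("contains"), "theme", "park"]))
```

The three print statements reproduce the example queries shown in this section: <CONTAINS> ebb "and" flow, "plan", and Title NOT <CONTAINS> theme park.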

Determining Which Operators To Use

Use the following reference to help determine which operators to use. Note that the query language is not case-sensitive, so <starts> and <STARTS> are equivalent. This document uses uppercase for clarity only.

Table 12-3    Deciding which operator to use

Type of search: Finding documents by date or numeric value comparison.
Valid operators: greater than (>), greater than or equal to (>=), less than (<), less than or equal to (<=)
Example: DATE >= 06-30-96
Finds documents created on or after June 30, 1996.

Type of search: Finding words or phrases in specific document fields or in specific locations in the field.
Valid operators: <STARTS>, <CONTAINS>, <ENDS>, is equal to (=)
Example: Title <STARTS> Help
Finds documents with titles that start with Help.

Type of search: Finding two or more words in a document.
Valid operators: AND, <NEAR/1>
Example: specifications AND review
Finds documents that contain both specifications and review.

The following table describes some commonly used operators and provides examples of how to use each one. All are relevance ranked except where explicitly noted.

Table 12-4    Query language operators

Operator: AND
Description: Adds mandatory criteria to the search. Finds documents that have all of the specified words.
Example: Antarctica AND mountain climb
Finds only documents containing both Antarctica and mountain climb plus all the stemmed variants, such as mountain climbing.

Operator: <CONTAINS>
Description: Finds documents containing the specified words in a document field. The words must be in the exact same sequential and contiguous order. You can use wildcards. Only alphanumeric values. Does not rank documents for relevance.
Example: Title <CONTAINS> higher profit
Finds documents containing the phrase higher profit in the title. Ignores documents with profits higher in the title.

Operator: <ENDS>
Description: Finds documents in which a document field ends with a certain string of characters. Does not rank documents for relevance.
Example: Title <ENDS> draft
Finds documents with titles ending in draft.

Operator: equals (=)
Description: Finds documents in which a document field matches a specific date or numeric value.
Example: Created = 6-30-96
Finds documents created on June 30, 1996.

Operator: greater than (>)
Description: Finds documents in which a document field is greater than a specific date or numeric value.
Example: Created > 6-30-96
Finds documents created after June 30, 1996.

Operator: greater than or equal to (>=)
Description: Finds documents in which a document field is greater than or equal to a specific date or numeric value.
Example: Created >= 6-30-96
Finds documents created on or after June 30, 1996.

Operator: less than (<)
Description: Finds documents in which a document field is less than a specific date or numeric value.
Example: Created < 6-30-96
Finds documents created before June 30, 1996.

Operator: less than or equal to (<=)
Description: Finds documents in which a document field is less than or equal to a specific date or numeric value.
Example: Created <= 6-30-96
Finds documents created on or before June 30, 1996.

Operator: <MATCHES>
Description: Finds documents in which a string in a document field matches the character string you specify. Ignores documents that contain partial matches. Does not rank documents for relevance.
Example: <MATCHES> employee
Finds documents containing employee or any of its stemmed variants such as employees.

Operator: <NEAR>
Description: Finds documents that contain the specified words. The closer the terms are to each other in the document, the higher the document's score.
Example: stock <NEAR> purchase
Finds any document containing both stock and purchase, but gives a higher score to a document that has stock purchase than to one that has purchase supplies and stock up.

Operator: <NEAR/N>
Description: Finds documents in which two or more specified words are within N number of words from each other. N can be an integer up to 1000. Also ranks the documents for relevance based on the words' proximity to each other.
Example: stock <NEAR/1> purchase
Finds documents containing the phrases stock purchase and purchase stock. Ignores documents containing phrases like purchase supplies and stock up because stock and purchase do not appear next to each other. When N is 2 or greater, finds documents that contain the words within the range and gives a higher score for documents which have the words closer together.

Operator: NOT
Description: Finds documents that do not contain a specific word or phrase. Note: You can use NOT to modify the OR or the AND operator.
Example: surf AND NOT beach
Finds documents containing the word surf but not the word beach.

Operator: OR
Description: Adds optional criteria to the search. Finds any document that contains at least one of the search values.
Example: apples OR oranges
Finds documents containing either apples or oranges.

Operator: <PHRASE>
Description: Finds documents that contain the specified phrase. A phrase is a grouping of two or more words that occur in a specific order.
Example: <PHRASE> (rise "and" fall)
Finds documents that include the entire phrase rise and fall. The and is in quotes to force the search to interpret it as a literal, not as an operator.

Operator: <STARTS>
Description: Finds documents in which a document field starts with a certain string of characters. Does not rank documents for relevance.
Example: Title <STARTS> Corp
Finds documents with titles starting with Corp, such as Corporate and Corporation.

Operator: <STEM> (English only)
Description: Finds documents that contain the specified word and its variants.
Example: <STEM> plan
Finds documents that contain plan, plans, planned, planning, and other variants with the same meaning stem. Ignores similarly spelled words such as planet and plane that don't come from the same stem.

Operator: <SUBSTRING>
Description: Finds documents in which part or all of a string in a document field matches the character string you specify. Similar to <MATCHES>, but can match on a partial string. Does not work with wildcards. Does not rank documents for relevance.
Example: <SUBSTRING> employ
Finds documents that can match on all or part of employ, so it can succeed with ploy.

Operator: <WILDCARD>
Description: Finds documents that contain the wildcard characters in the search string. You can use this to get words that have some similar spellings but which would not be found by stemming the word. Some characters, such as * and ?, automatically indicate a wildcard-based search, so you don't have to include the word <WILDCARD>.
Example: <WILDCARD> plan*
Finds documents that contain plan, plane, and planet as well as any word that begins with plan, such as planned, plans, and planetopolis. See the next section for more details and examples.

Operator: <WORD>
Description: Finds documents that contain the specified word.
Example: <WORD> theme
Finds documents that contain theme, thematic, themes, and other words that stem from theme.
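
The <NEAR/N> description can be illustrated with a small proximity check. The following Python sketch is an assumption-laden illustration, not the search engine's algorithm: it treats N as the maximum difference in word positions (so N = 1 means adjacent words, as in the stock <NEAR/1> purchase example), matches words exactly without stemming, and does not compute a relevance score. The function name is hypothetical.

```python
import re


def within_n_words(text, first, second, n):
    """Return True if first and second occur at most n word positions apart, in either order."""
    if not 1 <= n <= 1000:
        raise ValueError("N must be an integer from 1 to 1000")
    words = re.findall(r"[a-z0-9]+", text.lower())
    first_positions = [i for i, word in enumerate(words) if word == first.lower()]
    second_positions = [i for i, word in enumerate(words) if word == second.lower()]
    return any(
        i != j and abs(i - j) <= n
        for i in first_positions
        for j in second_positions
    )


print(within_n_words("stock purchase", "stock", "purchase", 1))                 # True
print(within_n_words("purchase stock", "stock", "purchase", 1))                 # True
print(within_n_words("purchase supplies and stock up", "stock", "purchase", 1))  # False
```

With N raised to 3, the last sample text would match in this sketch, because purchase and stock are three word positions apart.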

Using Wildcards

You can use wildcards to obtain special results. For example, you can find documents that contain words that have similar spellings but are not stemmed variants. For example, plan stems into plans and planning, but not plane or planet. With wildcards, you can find all of these words.
Only the * and ? wildcard characters are supported. They automatically indicate a wildcard-based search, and do not require you to use the <WILDCARD> operator as part of the expression.

Table 12-5    Wildcard Operators

Character: *
Description: Specifies 0 or more alphanumeric characters. For example, air* finds documents that contain air, airline, and airhead. Cannot use this wildcard as the first character in an expression. This wildcard is ignored in a set ([ ]) or in an alternative pattern ({ }). With this wildcard, the <WILDCARD> operator is implicit.

Character: ?
Description: Specifies a single alphanumeric character, although you can use more than one ? to indicate multiple characters. For example, ?at finds documents that contain cat and hat, while ??at finds documents that contain that and chat. This wildcard is ignored in a set ([ ]) or in an alternative pattern ({ }). With this wildcard, the <WILDCARD> operator is implicit.
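
The behavior of the two wildcard characters can be approximated by translating a wildcard expression into a regular expression. The Python sketch below is only an approximation for illustration, with hypothetical function names. It does not model sets ([ ]) or alternative patterns ({ }), and it is not how the search engine itself evaluates wildcards.

```python
import re

ALNUM = "[A-Za-z0-9]"  # one alphanumeric character


def wildcard_to_regex(pattern):
    """Translate * (0 or more alphanumerics) and ? (exactly one) into a compiled regex."""
    if pattern.startswith("*"):
        raise ValueError("* cannot be the first character in an expression")
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(ALNUM + "*")
        elif ch == "?":
            parts.append(ALNUM)
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)


def matching_words(pattern, text):
    """Return the words in text that match the whole wildcard expression."""
    regex = wildcard_to_regex(pattern)
    return [word for word in re.findall(r"[A-Za-z0-9]+", text) if regex.fullmatch(word)]


print(matching_words("air*", "air airline airhead chair"))  # ['air', 'airline', 'airhead']
print(matching_words("?at", "cat hat that chat at"))        # ['cat', 'hat']
print(matching_words("??at", "cat hat that chat at"))       # ['that', 'chat']
```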

Non-alphanumeric Characters

You can only search for non-alphanumeric characters if the style.lex file used to create the collection is set up to recognize them. This file is in the HTML, news, and mail subdirectories of the server_root\plugins\common\ directory.

Customizing the Search Interface

As server administrator, you can customize the search interface to meet specific user requirements. All of the HTML-based forms that the user sees are defined through a set of pattern files, which are used to:

Display formats for the search results page header and footer

Display each search result record listed in response to a query
There is a set of pattern variables to construct the forms used for search input and output. Many of the variables are defined in the system and user configuration files userdefs.ini and dblist.ini, which are discussed in Configuring Files Manually.

Note
The search home page, at http://server_root:port/search also provides an introduction to the search interface, as well as an online QuickStart tutorial on customizing the interface. The tutorial discusses the various pattern files, and gives examples of how they can be changed to produce different results.

This section includes the following topics:

Dynamically Generated Headers and Footers

HTML Pattern Files

Search Function Syntax

Using Pattern Variables

Dynamically Generated Headers and Footers

You can specify dynamically generated headers and footers. To accomplish this, add the add-header and add-footer directives to your obj.conf file as Service functions. These directives require either a path or uri parameter. Use the path parameter to specify a static file as the header or footer. For example:

Service fn="add-header" path="/export2/docs/header.html"

Service fn="add-footer" path="/export2/docs/footer.html"

Use the uri parameter to specify a dynamically generated file, such as a CGI program, as the header or footer. For example:

Service fn="add-header" uri="/cgi-bin/header.cgi"

These Service functions should precede the actual Service function that will answer the request, such as send-file or send-cgi.

HTML Pattern Files

A good place to begin customizing the interface is to modify the existing pattern files. After you understand pattern variables and how they work, you can create your own pattern files, and change the configuration files and other pattern files to point to them. In the default installation of iPlanet Web Server, the pattern files are in this directory: server_root\plugins\search\ui\text. It's a good idea to make copies of your original pattern files so you can restore them afterwards.
There are pattern files for different kinds of collections: email, news, ASCII, PDF, and HTML. There are several general types of pattern files, each of which has a particular use. A file prefix designates which type of file the pattern file is for, for example, ASCII-record.pat, or EMAIL-record.pat. The following list describes the general pattern file types:

NS-query.pat displays the standard and advanced query pages. Contains HTML calling the Web Search (the "Search the Web" box) as part of the search query page.

tocstart.pat displays the header across the top of the search results page.

tocrec.pat displays each document listed on the search results page.

tocend.pat displays the footer across the bottom of the search results page.

record.pat displays a single highlighted document from the search results page (for more information, see Displaying a Highlighted Document).

descriptions.pat displays the collection contents.
The pattern files contain HTML formatting instructions, which define how elements look; and HTML search arguments and variables, which define the text label or value that is displayed.
There are three kinds of pattern variables (discussed further in Using Pattern Variables):

User-defined variables, defined in the userdefs.ini file, with a $$ prefix (see User-defined Pattern Variables).

Configuration file variables, defined in the dblist.ini file, with a $$NS- prefix (for more information, see Configuration File Variables).

Search macros and variables generated by a pattern file, with a $$NS- prefix (for more information, see Macros and Generated Pattern Variables).
The following lines from the standard query pattern file, NS-query.pat, show how these work together:

<input type="hidden" name="NS-max-records" value="$$NS-max-records">

<td align=left colspan=2>$$logo</td>
<td align=right><h3>$$sitename</h3></td>

<td align=right><b>$$queryLabel</b></td>
<td align=left> <input name="NS-query" size=40 value="$$NS-display-query"></td>

Each line contains standard HTML tags, and one or more variables with the $$ or $$NS- prefix. Examining each line more closely requires looking at the configuration files mentioned in Configuring Files Manually.

NS-max-records: Because this field is hidden, users cannot change this value, which defines how many matching documents to return at a time. In the advanced HTML query pattern file, NS-advquery.pat, this is a user-modifiable input field.

$$NS-max-records: The search generates a variable from this field that can be used in subsequent searches to calculate how many result records to display at a time. In the advanced query, this value could vary for each query.

$$logo: Defined in the userdefs.ini file. This could be any image or text the user wanted to display on the form.

$$sitename: Defined in the userdefs.ini file as the server's host name that is provided by the $$NS-host search macro.

$$queryLabel: Defined in the userdefs.ini file as a text label for the query input field. In this case, the label on the form is the word "For:"

NS-query: Defined in this pattern file as the name of the input field.

$$NS-display-query: Generated by the search, not defined in the userdefs.ini file. The search generates this variable from the NS-query field so that it can be used in subsequent searches to determine which word or phrase to highlight when an entire matching document is displayed.
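The substitution described above can be sketched in Python. This is an illustration only, not the server's own code: the function name expand_pattern and the sample values are assumptions made for the example.

```python
import re

# Matches $$name, where name uses letters, digits, hyphens, and underscores.
_VAR = re.compile(r"\$\$([A-Za-z0-9_-]+)")

def expand_pattern(line, variables):
    """Replace each $$name in line with its value; unknown names are left as-is."""
    def substitute(match):
        name = match.group(1)
        return variables.get(name, match.group(0))
    return _VAR.sub(substitute, line)

# Hypothetical values: queryLabel as in userdefs.ini, the NS- values as the
# search might supply them for a single request.
values = {
    "queryLabel": "For:",
    "NS-max-records": "20",
    "NS-display-query": "jet engines",
}

print(expand_pattern("<td align=right><b>$$queryLabel</b></td>", values))
print(expand_pattern('<input name="NS-query" size=40 value="$$NS-display-query">', values))
```

Both kinds of variable go through the same lookup in this sketch; only the prefix tells you where the value came from.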

Search Function Syntax

The search function uses standard URL syntax with a series of name-value pairs for the search arguments. This is the basic syntax:

http://server_root/search?name=value[&name=value][&name=value]

As you use the HTML search query and results pages, you can see search functions and arguments displayed in the URL field of your browser. When entered directly into the URL field, these are sometimes called decorated URLs. They can also be embedded in your pattern files with the HREF tag.
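If you generate decorated URLs from a script, a minimal Python sketch of the name-value syntax might look like the following. The host name, the collection name, and the helper name build_search_url are hypothetical; the argument names are the ones used in this chapter.

```python
from urllib.parse import urlencode

def build_search_url(server_url, arguments):
    """Return server_url/search?name=value&name=value with the values URL-encoded."""
    return server_url.rstrip("/") + "/search?" + urlencode(arguments)

url = build_search_url(
    "http://www.example.com",        # hypothetical server URL
    [
        ("NS-search-page", "results"),
        ("NS-collection", "docs"),   # hypothetical collection name
        ("NS-query", "jet engines"),
    ],
)
print(url)
```

Passing the arguments as a list of pairs keeps them in order and allows a name such as NS-collection to appear more than once.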
You can create a complete search function as an HREF element within a pattern file. The example given is from the HTML-descriptions.pat file, which defines how collection information is displayed. The following lines produce a heading for each collection with the label ("Collection:"), and provide a link to the actual collection file through the collection's label (NS-collection-alias) defined in the dblist.ini file.

<td colspan=6><font size=+2><b>$$collectionLabel</b>
<a href=$$NS-server-url/search?NS-collection=$$NS-collection>$$NS-collection-alias</a>
</font></td>

The HREF contains a complete search function by using the following elements:

$$NS-server-url: A search macro that determines the user's server URL.
/search: The search command itself.

?: The query string indicator. Everything after the ? is information used by the search function.

NS-collection=$$NS-collection: This uses the search macro $$NS-collection to define the collection's filename.
You can set up a search to use a variable conditionally; if there is no value associated with the variable, nothing will be displayed. The syntax is as follows:

variableName[conditionalized output]

For example, you could request that the document's title be output if it exists. If there is no title for this document, not even the label "Title:" is to be displayed. To do this, you might enter:

$$Title[<P>Title: <B>$$Title</B>]
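The behavior of a conditional variable can be sketched in Python as follows. This is a simplified model rather than the server's parser: it assumes the bracketed output contains no nested brackets, and it expands unknown variables to nothing.

```python
import re

# $$name[output]: the bracketed output is emitted only when name has a value.
_CONDITIONAL = re.compile(r"\$\$([A-Za-z0-9_-]+)\[([^\]]*)\]")
_VAR = re.compile(r"\$\$([A-Za-z0-9_-]+)")

def expand(text, variables):
    """Resolve conditional blocks first, then plain $$name references."""
    def conditional(match):
        name, output = match.group(1), match.group(2)
        return output if variables.get(name) else ""
    text = _CONDITIONAL.sub(conditional, text)
    return _VAR.sub(lambda m: variables.get(m.group(1), ""), text)

pattern = "$$Title[<P>Title: <B>$$Title</B>]"
print(expand(pattern, {"Title": "Annual Report"}))  # label and title are shown
print(expand(pattern, {}))                          # nothing is shown
```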

URL Encodings

When you construct HTML instructions, whether in decorated URLs or within a pattern file, you need to follow the rules for URL encoding. Any character that might be misunderstood as part of a URL should be encoded in the %nn format, where nn is the character's two-digit hexadecimal code. Blanks are converted to the + symbol (plus sign) in queries or to %20 in output. The following table shows the most commonly used URL codes.

Table 12-6    Common URL Encodings

Character	Description	Code
(space)	Space	%20
;	Semicolon	%3B
/	Slash	%2F
?	Question mark	%3F
:	Colon	%3A
@	At sign	%40
=	Equal sign	%3D
&	Ampersand	%26
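You can reproduce the codes in this table with the Python standard library, which is convenient when you generate decorated URLs from a script. The helper name encode is chosen for this example only.

```python
from urllib.parse import quote, quote_plus

def encode(text):
    """Percent-encode every reserved character, including the slash."""
    return quote(text, safe="")

# Print the code for each character listed in the table.
for character in " ;/?:@=&":
    print(repr(character), encode(character))

print(quote_plus("jet engines"))  # blanks become + in a query
print(encode("jet engines"))      # blanks become %20
```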

Required Search Arguments

Although you can customize almost every aspect of query and result pages, there are some arguments required for search functions to display the different types of search pages. These arguments are required whether the search function is in a decorated URL, or embedded as an HREF in a pattern file.
Search functions that display the search query page require these arguments:

Search query (the word, phrase, or attribute you want to search on)

Collection (can be specified more than once for multiple-collection searches)
Search functions that display the search results page require these arguments:

NS-search-page=results (or r, in upper- or lowercase)

Collection (can be specified more than once for multiple-collection searches)

Search query
Search functions that display a highlighted document require these arguments:

NS-search-page=document (or d, in upper- or lowercase)

Document path

Collection (can be specified only once)

Search query (necessary if you want to highlight the query data)
Search functions that display the collection contents require only this argument:

NS-search-page=contents (or c, in upper- or lowercase)
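The requirements above can be checked before a decorated URL is published. The following Python sketch is an illustration under stated assumptions: NS-query and NS-collection are the argument names that appear in the pattern file examples, but NS-doc-path is a hypothetical name for the document path argument, which this section does not name. The query page is left out because the text gives no NS-search-page value for it.

```python
from urllib.parse import parse_qs

# Required arguments for each page type, following the lists above.
# "NS-doc-path" is an assumed name for the document path argument.
REQUIRED = {
    "results": ["NS-collection", "NS-query"],
    "document": ["NS-collection", "NS-doc-path"],
    "contents": [],
}
ABBREVIATIONS = {"r": "results", "d": "document", "c": "contents"}

def missing_arguments(query_string):
    """Return the required argument names absent from a search query string."""
    args = parse_qs(query_string)
    page = args.get("NS-search-page", [""])[0].lower()
    page = ABBREVIATIONS.get(page, page)
    if page not in REQUIRED:
        raise ValueError("unrecognized NS-search-page value: %r" % page)
    return [name for name in REQUIRED[page] if name not in args]

print(missing_arguments("NS-search-page=R&NS-collection=docs"))
print(missing_arguments("NS-search-page=contents"))
```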

Using Pattern Variables

Using pattern variables you can customize the search text interface. This eliminates the need to update the actual HTML pages as user requirements change. For example, if the interface has graphics or text elements that change periodically, you can define a pattern variable pointing to a pathname where that graphic or text is maintained and stored.
There are three categories of pattern variables:

Variables defined in the userdefs.ini file, to which are added a $$ prefix in decorated URLs and pattern files. For example, uidir, logo, and title become $$uidir, $$logo, and $$title.

Variables defined in the dblist.ini configuration files, having an NS- prefix when defined in the configuration file, and a $$NS- prefix when used in decorated URLs and pattern files. For example, NS-max-records, NS-doc-root, and NS-date-time become $$NS-max-records, $$NS-doc-root, and $$NS-date-time.

Search macros and variables generated by a pattern file, which always have a $$NS- prefix. For example, $$NS-host, $$NS-get-next, and $$NS-sort-by.

User-defined Pattern Variables

You can create any number of your own user-defined pattern variables in the user definitions file, userdefs.ini, or you can modify existing definitions. When one of these variables is used in a pattern file, the $$ prefix is added to it. Variable names can have up to 32 characters or digits, or combinations of both. Characters can be letters A-Z in upper or lower case, hyphens (-), and underscores (_). Names are case sensitive.
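The naming rules above translate directly into a check that you could run over a userdefs.ini file before deploying it. This Python sketch assumes nothing beyond the rules just stated; the function name is made up for the example.

```python
import re

# Up to 32 characters: letters in either case, digits, hyphens, underscores.
_NAME = re.compile(r"[A-Za-z0-9_-]{1,32}")

def is_valid_variable_name(name):
    """Return True if name follows the user-defined variable naming rules."""
    return _NAME.fullmatch(name) is not None

print(is_valid_variable_name("queryLabel"))   # valid
print(is_valid_variable_name("query label"))  # invalid: contains a blank
print(is_valid_variable_name("x" * 33))       # invalid: longer than 32 characters
```

Because names are case sensitive, queryLabel and querylabel both pass this check but refer to different variables.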
The default userdefs.ini file included with iPlanet Web Server contains the following variables that are used to define:

Search query pages labeled [query] in the file

Results listings labeled [toc]

Document display pages labeled [record]

Collection contents pages labeled [contents]
Each line begins with a variable name, and is followed by a definition for that variable. Many are labels for screen elements, some are paths to other files, and some have more complex contents. For example, the following lines are from the query section of that file.

[query]
NS-character-set=iso-8859-1
uidir = $$NS-server-url/search-ui
icondir = $$uidir/icons
l10nicondir = $$uidir/icons
htmldir = $$uidir/text
logo = <img src="$$icondir/magnifier.jpg" border=0 align=absmiddle><b><font size=+2>N</font><font size=+1>etscape </font><font size=+2>S</font><font size=+1>earch</font></b>
sitename = $$NS-host
help = /help/5search.htm
title = Sample Search Interface
searchButtonLabel = Search
searchNote = To search, choose a collection, then enter words and phrases, separated by commas<br>(e.g., search, jet engines, basketball).
advSearchNote = To search, choose collections, then enter words and phrases, separated by commas<br>(e.g., search, jet engines, basketball).<p>Sorting is done on any defined attributes. Use '-' to specify descending order sort<br>(e.g., Title,-Author,+Date)
queryLabel = For:
queryLabelSJIS = $$queryLabel
queryLabelEUC = $$queryLabel
queryLabelJIS7 = $$queryLabel
collectionLabel = Search in:
booleanLabel = Boolean
sortByLabel = Sort by:
sortByLabelSJIS = $$sortByLabel
sortByLabelEUC = $$sortByLabel
sortByLabelJIS7 = $$sortByLabel
freetextLabel = Freetext (unavailable)
maxDocumentsLabel = Documents to return:
maxDocumentsLabelSJIS = $$maxDocumentsLabel
maxDocumentsLabelEUC = $$maxDocumentsLabel
maxDocumentsLabelJIS7 = $$maxDocumentsLabel
copyright = Copyright &copy; 1997 Netscape Communications Corporation. All Rights Reserved.
advancedButtonLabel = Advanced Button Label
helpButtonLabel = Help Button Label

The file also includes references to search macros, such as $$NS-server-url, and can refer to other user-defined variables, as in the following lines:

uidir = $$NS-server-url/search-ui
icondir = $$uidir/icons

Search macros are described further in Macros and Generated Pattern Variables.
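To see how one definition builds on another, the following Python sketch resolves nested references the way the two lines above imply. It is a model for illustration, not the server's implementation; the value given for the NS-server-url macro is hypothetical.

```python
import re

_VAR = re.compile(r"\$\$([A-Za-z0-9_-]+)")

def resolve(name, definitions, macros, _seen=()):
    """Return the value of name with every nested $$reference expanded."""
    if name in _seen:
        raise ValueError("circular definition: " + name)
    if name in macros:
        return macros[name]
    value = definitions[name]
    return _VAR.sub(
        lambda m: resolve(m.group(1), definitions, macros, _seen + (name,)),
        value,
    )

definitions = {
    "uidir": "$$NS-server-url/search-ui",
    "icondir": "$$uidir/icons",
}
macros = {"NS-server-url": "http://www.example.com"}  # hypothetical value

print(resolve("icondir", definitions, macros))
```

The _seen tuple guards against a definition that refers back to itself, which would otherwise recurse without end.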
You can use any supported HTML character entity in your variable definitions. You can use entity names that are defined in the &name; format as well as those defined with the three-digit code in the &#nnn; format. In the userdefs.ini code sample, the entity &nbsp; inserts a nonbreaking space, and &copy; inserts a copyright symbol. Some of the more commonly used entities are in the following table:

Table 12-7    Common HTML character entities

Numeric code	Entity name	Description
&#032;		Space
&#034;	&quot;	Quotation mark
&#036;	$	Dollar sign
&#058;	-	Colon
&#060;	&lt;	Less than
&#062;	&gt;	Greater than
&#153;	-	Trademark symbol
&#160;	&nbsp;	Nonbreaking space
&#169;	&copy;	Copyright symbol
&#174;	&reg;	Registered trademark

Configuration File Variables

Some variables are defined in the system configuration and in the collection configuration files. These use a prefix of NS- in the configuration file to differentiate them from other markup tags in an HTML page. To use these variables as arguments to the search function, you add another prefix $$ to the variable, as in $$NS-date-time and $$NS-max-records.
Variables that define defaults for all searches on a server are defined in the system configuration files.

NS-max-records = 20
NS-query-pat = /text/NS-query.pat
NS-ms-tocstart = /text/HTML-tocstart.pat
NS-ms-tocend = /text/HTML-tocend.pat
NS-default-html-title = (Untitled)
NS-HTML-descriptions-pat = /text/HTML-descriptions.pat
NS-date-time = %b-%d-%y %H:%M
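The NS-date-time value uses format codes that look like the C strftime codes, so you can preview a format string before putting it in the configuration file. The Python sketch below assumes that the server interprets these codes the same way strftime does; the sample date and the function name are arbitrary.

```python
from datetime import datetime

NS_DATE_TIME = "%b-%d-%y %H:%M"  # value from the sample configuration above

def format_doc_time(moment, date_format=NS_DATE_TIME):
    """Format a timestamp the way a results page might display it."""
    return moment.strftime(date_format)

# In the default C locale this prints May-15-01 14:30.
print(format_doc_time(datetime(2001, 5, 15, 14, 30)))
```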

Although installations may vary depending on how each server is configured, the most commonly found variables are listed in the following table:

Table 12-8    Commonly found variables

Variable	Description
NS-default-html-title	The name given to HTML documents that do not contain a user-defined title. Typically set to "(Untitled)."
NS-date-time	The date and time format to use when displaying results.
NS-date-input-format	The format for inputting dates (the default is MMDDYY).
NS-HTML-descriptions-pat	The pattern file to use when displaying the contents of the collections.
NS-largest-set	The maximum number of records that can be handled as matching the search criteria. The records are displayed in groups of NS-max-records.
NS-max-records	The maximum size of the result set displayed at one time.
NS-ms-tocend	The pattern file to use for the footer at the bottom of the search results page when searching multiple collections.
NS-ms-tocstart	The pattern file to use for the header at the top of the search results page when searching multiple collections.
NS-query-pat	The query pattern file used when creating a query page.
NS-search-type	The type of search to perform. Only Boolean is permitted.

Collection-specific variables are defined in the dblist.ini file. Among the variables defined there are:

NS-doc-root = C:/iPlanet/Servers/docs
NS-url-base = /
NS-display-select = YES

The variables in your dblist.ini file may differ according to the type of collections you are using. Table 12-9 contains some of the more commonly found collection-specific variables.

Table 12-9    Commonly found variables in dblist.ini

Variable	Description
NS-collection-alias	The collection's label. Can be specified more than once to search multiple collections.
NS-doc-root	The root directory for the documents in the collection.
NS-display-select	Indicates whether the collection is displayed as part of the collection information listing when NS-search-page=contents. The default is YES.
NS-highlight-start	Begin highlighting at this point in the displayed document. Typically this highlights the search query criteria.
NS-highlight-end	End highlighting at this point in the displayed document.
NS-language	The language of the documents in the collection.
NS-record-pat	The pattern file to use when displaying a highlighted document page.
NS-tocend-pat	The footer pattern file associated with a collection to be used when formatting the search results.
NS-tocrec-pat	The record pattern file associated with a collection to be used when formatting the search results.
NS-tocstart-pat	The header pattern file associated with a collection to be used when formatting the search results.
NS-url-base	The base URL used when constructing the link used to locate the file.

Macros and Generated Pattern Variables

There are some search macros that you can use in your pattern files or decorated URLs. The search function itself generates some pattern variables you can use in subsequent search requests to define how output is to be displayed. These macros and variables have a prefix of $$NS- to indicate their use.
For example, after doing an initial search query that results in 24 documents on the results page, you can reuse the search-generated $$NS-docs-matched, and the $$NS-doc-number variables to help define a document page displaying one of the documents in detail. In this way, you can tell the user that this document is number 3 of 24 documents returned for the original search.
The search macros and the generated variables that you can use in a subsequent pattern file or decorated URL are listed in the following table:

Table 12-10    Macros and generated pattern variables

Variable	Description
$$NS-collection-list	An HTML multiple select list of all the collections in dblist.ini where NS-display-select is set to YES.
$$NS-collection-list-dropdown	An HTML drop-down list version of NS-collection-list.
$$NS-collections-searched	The number of collections searched for this request.
$$NS-display-query	The HTML-displayable version of the query that is generated for a results page.
$$NS-doc-href	The HTML HREF tag for the document. This provides a URL to the original source document. For email, this is in the form mailbox:/boxname?id-messageID and for news, it is in the form news:messageID.
$$NS-doc-name	The document's name.
$$NS-doc-number	The sequence number of the document in the results page list.
$$NS-doc-path	The absolute path to the document.
$$NS-doc-score	The ranked score of the document (ranges 0 to 100).
$$NS-doc-score-div10	The ranked score of the document (ranges 0 to 10).
$$NS-doc-score-div5	The ranked score of the document (ranges 0 to 5).
$$NS-doc-time	The creation time for a document in the results list. To obtain this value, you must set NS-use-system-stat = YES. By default it is set to NO, since system statistics are expensive.
$$NS-doc-size	The size of the document rounded to the nearest K. To obtain this value, you must set NS-use-system-stat = YES. By default it is set to NO, since system statistics are expensive.
$$NS-docs-found	The actual number of documents that the search engine found for this request.
$$NS-docs-matched	The number of documents returned from the search (up to NS-max-records) for this request.
$$NS-docs-searched	The number of documents searched through for this request.
$$NS-get-highlighted-doc	The URL for a highlighted document, used to display the document as HTML text with highlights.
$$NS-get-next	Gets the next set of search results to be displayed. The set is equal to NS-max-records and is positioned by using NS-search-offset.
$$NS-get-prev	Gets the previous set of search results that has been displayed. The set is equal to NS-max-records and is positioned by using NS-search-offset.
$$NS-host	The host name.
$$NS-insert-doc	A placeholder used in the NS-record-pat pattern files for HTML to indicate where the source document is to be inserted.
$$NS-rel-doc-name	The relative name of the document to display when creating a document page.
$$NS-search-offset	The offset into the set of records returned as search results. Used to determine which set of records is displayed when you use NS-get-next and NS-get-prev.
$$NS-server-url	The URL for the server.
$$NS-sort-by	The sort sequence for the items on the results page. You can select one or more of the available attributes for the collection. The default is an ascending sort.
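The relationship between NS-search-offset, NS-max-records, and the next and previous result sets can be sketched as simple arithmetic. The Python function below is a model under assumptions, not the server's algorithm: it treats the offset as zero-based and uses the number of documents found as the upper limit.

```python
def page_offsets(offset, max_records, docs_found):
    """Return (previous_offset, next_offset); None means there is no such set."""
    # Step back one set, but never before the first record.
    previous_offset = max(offset - max_records, 0) if offset > 0 else None
    # Step forward one set, unless that would pass the last document found.
    next_offset = offset + max_records
    if next_offset >= docs_found:
        next_offset = None
    return previous_offset, next_offset

# 24 documents found, shown 20 at a time.
print(page_offsets(0, 20, 24))   # first set: no previous set, next starts at 20
print(page_offsets(20, 20, 24))  # second set: previous starts at 0, no next set
```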

Format	Displayed result (example)
%a	Abbreviated week day (for example, Wed)
%A	Full week day (for example, Wednesday)
%b	Abbreviated month (for example, Oct)
%B	Full month (for example, October)
%c	Date and time formatted for current locale
%d	Day of the month as a decimal number (for example, 01-31)
%H	Hour as a decimal number, 24 hour military format (for example, 00-23)
%m	Month as a decimal number (for example, 01-12)
%M	Minute as a decimal number (for example, 00-59)
%x	Date
%X	Time
%y	Year without century (for example, 00-99)
%Y	Year with century (for example, 1999)

File format	Attribute	Type	Description
ASCII	(none)	-	-
HTML	Title	text	The user-defined title of the file.
	SourceType	text	The original format of the document.
NEWS	From	text	The source userID of the news item.
	Subject	text	The text from the subject field of the news item.
	Keywords	text	Any keywords defined for the news item
	Date	date	The date the news item was created.
EMAIL	From	text	The source userID of the email.
	To	text	The destination userID of the email.
	Subject	text	The text from the email's subject field.
	Date	date	The date the email was created.
PDF	InstanceID	text	An internal ID number.
	PermanentID	text	An internal ID number.
	NumPages	integer	The number of pages in the document.
	DirID	text	The directory where the PDF file exists.
	FTS_ModificationDate	date	The document's last modification date.
	FTS_CreationDate	date	The document's creation date.
	WXEVersion	integer	The version of Adobe Word Finder used to extract the text from the PDF document.
	FileName	text	The Adobe filename specification.
	FTS_Title	text	The document's title.
	FTS_Subject	text	The document's subject.
	FTS_Author	text	The document's author.
	FTS_Creator	text	The document's creator.
	FTS_Producer	text	The document's producer.
	FTS_Keywords	text	The document's keywords.
	PageMap	text	The page map, describing the word instances for the page.

Type of Search	Valid Operators	Examples
Finding documents by date or numeric value comparison.	greater than (>) greater than or equal to (>=) less than (<) less than or equal to (<=)	DATE >= 06-30-96 Finds documents created on or after June 30, 1996.
Finding words or phrases in specific document fields or in specific locations in the field.	<STARTS> <CONTAINS> <ENDS> is equal to (=)	Title <STARTS> Help Finds documents with titles that start with Help.
Finding two or more words in a document.	AND <NEAR/1>	specifications AND review Finds documents that contain both specifications and review.

Operator	Description	Examples
AND	Adds mandatory criteria to the search. Finds documents that have all of the specified words.	Antarctica AND mountain climb Finds only documents containing both Antarctica and mountain climb plus all the stemmed variants, such as mountain climbing.
<CONTAINS>	Finds documents containing the specified words in a document field. The words must be in the exact same sequential and contiguous order. You can use wildcards. Only alphanumeric values. Does not rank documents for relevance.	Title <CONTAINS> higher profit Finds documents containing the phrase higher profit in the title. Ignores documents with profits higher in the title.
<ENDS>	Finds documents in which a document field ends with a certain string of characters. Does not rank documents for relevance.	Title <ENDS> draft Finds documents with titles ending in draft.
equals (=)	Finds documents in which a document field matches a specific date or numeric value	Created = 6-30-96 Finds documents created on June 30, 1996.
greater than (>)	Finds documents in which a document field is greater than a specific date or numeric value.	Created > 6-30-96 Finds documents created after June 30, 1996.
greater than or equal to (>=)	Finds documents in which a document field is greater than or equal to a specific date or numeric value.	Created >= 6-30-96 Finds documents created on or after June 30, 1996.
less than (<)	Finds documents in which a document field is less than a specific date or numeric value.	Created < 6-30-96 Finds documents created before June 30, 1996.
less than or equal to (<=)	Finds documents in which a document field is less than or equal to a specific date or numeric value.	Created <= 6-30-96 Finds documents created on or before June 30, 1996.
<MATCHES>	Finds documents in which a string in a document field matches the character string you specify. Ignores documents that contain partial matches. Does not rank documents for relevance.	<MATCHES> employee Finds documents containing employee or any of its stemmed variants such as employees.
<NEAR>	Finds documents that contain the specified words. The closer the terms are to each other in the document, the higher the document's score.	stock <NEAR> purchase Finds any document containing both stock and purchase, but gives a higher score to a document that has stock purchase than to one that has purchase supplies and stock up.
<NEAR/N>	Finds documents in which two or more specified words are within N number of words from each other. N can be an integer up to 1000. Also ranks the documents for relevance based on the words' proximity to each other.	stock <NEAR/1> purchase Finds documents containing the phrases stock purchase and purchase stock. Ignores documents containing phrases like purchase supplies and stock up because stock and purchase do not appear next to each other. When N is 2 or greater, finds documents that contain the words within the range and gives a higher score for documents which have the words closer together.
NOT	Finds documents that do not contain a specific word or phrase. Note: You can use NOT to modify the OR or the AND operator.	surf AND NOT beach Finds documents containing the word surf but not the word beach.
OR	Adds optional criteria to the search. Finds any document that contains at least one of the search values.	apples OR oranges Finds documents containing either apples or oranges.
<PHRASE>	Finds documents that contain the specified phrase. A phrase is a grouping of two or more words that occur in a specific order.	<PHRASE> (rise "and" fall) Finds documents that include the entire phrase rise and fall. The and is in quotes to force the search to interpret it as a literal, not as an operator.
<STARTS>	Finds documents in which a document field starts with a certain string of characters. Does not rank documents for relevance.	Title <STARTS> Corp Finds documents with titles starting with Corp, such as Corporate and Corporation.
<STEM> (English only)	Finds documents that contain the specified word and its variants.	<STEM> plan Finds documents that contain plan, plans, planned, planning, and other variants with the same meaning stem. Ignores similarly spelled words such as planet and plane that don't come from the same stem.
<SUBSTRING>	Finds documents in which part or all of a string in a document field matches the character string you specify. Similar to <MATCHES>, but can match on a partial string. Does not work with wildcards. Does not rank documents for relevance.	<SUBSTRING> employ Finds documents that can match on all or part of employ, so it can succeed with ploy.
<WILDCARD>	Finds documents that contain the wildcard characters in the search string. You can use this to get words that have some similar spellings but which would not be found by stemming the word. Some characters, such as * and ?, automatically indicate a wildcard-based search, so you don't have to include the word <WILDCARD>.	<WILDCARD> plan* Finds documents that contain plan, plane, and planet as well as any word that begins with plan, such as planned, plans, and planetopolis. See the next section for more details and examples.
<WORD>	Finds documents that contain the specified word.	<WORD> theme Finds documents that contain theme, thematic, themes, and other words that stem from theme.

Character	Description
*	Specifies 0 or more alphanumeric characters. For example, air* finds documents that contain air, airline, and airhead. Cannot use this wildcard as the first character in an expression. This wildcard is ignored in a set of ([ ]) or in an alternative pattern ({ }). With this wildcard, the <WILDCARD> operator is implicit.
?	Specifies a single alphanumeric character, although you can use more than one ? to indicate multiple characters. For example, ?at finds documents that contain cat and hat, while ??at finds documents that contain that and chat. This wildcard is ignored in a set of ([ ]) or in an alternative pattern ({ }). With this wildcard, the <WILDCARD> operator is implicit.

Copyright © 2001 Sun Microsystems, Inc. Some preexisting portions Copyright © 2001 Netscape Communications Corp. All rights reserved.

Last Updated May 15, 2001

Note	By default, URLs that are redirected are always escaped. To prevent this, add escape="no". For example:
	NameTrans fn="redirect" from="/foobar" url-prefix="index.html" escape="no"