
Hummingbird Stop File Setting


This setting specifies an operating system file that contains a list of words not to be indexed. Typically, these are words with little semantic value. Use of the Stop File can significantly reduce the size of indices by removing words that are not useful for searching. For example, prepositions and articles can be safely removed from indices in most cases.

Unless the stop file name is a fully qualified path name, the stop file is assumed to be in the directory where the table configuration is created. The default value, an empty string, specifies that no stop file is used. SearchServer provides a stop file called FULTEXT.STP, which you can use by specifying it explicitly in this parameter. FULTEXT.STP contains the following words:
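
The lookup rule described above can be sketched in Python. This is an illustration only, not SearchServer code: the function name resolve_stop_file and the table_dir parameter are hypothetical, and Windows path semantics are assumed.

```python
from pathlib import PureWindowsPath
from typing import Optional


def resolve_stop_file(value: str, table_dir: str) -> Optional[PureWindowsPath]:
    """Return the path a Stop File value would refer to, or None if unset.

    Hypothetical helper; table_dir stands for the directory where the
    table configuration is created.
    """
    if value == "":
        # Default value (empty string): no stop file is used.
        return None
    path = PureWindowsPath(value)
    if path.is_absolute():
        # Fully qualified path name: used as given.
        return path
    # Otherwise the file is assumed to be in the table configuration directory.
    return PureWindowsPath(table_dir) / path
```

For example, a value of fultext.stp with a table configuration directory of D:\tables (a made-up location) resolves to D:\tables\fultext.stp, while a fully qualified value is returned unchanged.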

after, also, an, and, as, at, be, because, before, between, but, by, for, from, however, if, in, into, of, or, other, out, since, such, than, that, the, there, these, this, those, to, under, upon, when, where, whether, which, with, within, without.
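
To illustrate the effect of a stop list, the following Python sketch drops the default stop words from a piece of text before it would be indexed. It is not how SearchServer parses text: the whitespace tokenization, the case-insensitive comparison, and the name indexable_terms are assumptions made for the example.

```python
# Default stop words as listed in this section.
DEFAULT_STOP_WORDS = frozenset("""
after also an and as at be because before between but by for from however
if in into of or other out since such than that the there these this those
to under upon when where whether which with within without
""".split())


def indexable_terms(text, stop_words=DEFAULT_STOP_WORDS):
    """Split text on whitespace and drop stop words (case-insensitive).

    Simplified stand-in for a real parser, for illustration only.
    """
    return [word for word in text.split() if word.lower() not in stop_words]


print(indexable_terms("The status of the order from this account"))
```

In this sentence five of the eight words are stop words, which shows why a stop file can noticeably reduce index size.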

The stop file can contain a maximum of 1,024 stop words totaling not more than 10,000 characters. The stop file is a text file that can be edited in Notepad or any other plain-text editor.
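
Before saving a customized stop list, you might check it against these limits. The following Python sketch works on a list of words rather than on the file itself, because the exact file layout is not described here. The function name is hypothetical, and counting only the characters of the words themselves (not separators) is an assumption about how the 10,000-character limit is measured.

```python
MAX_STOP_WORDS = 1024      # maximum number of stop words
MAX_TOTAL_CHARS = 10_000   # maximum combined length of all stop words


def check_stop_word_limits(words):
    """Return a list of problems; an empty list means within the limits."""
    problems = []
    if len(words) > MAX_STOP_WORDS:
        problems.append(f"{len(words)} stop words exceeds {MAX_STOP_WORDS}")
    # Assumption: only the characters of the words are counted.
    total_chars = sum(len(word) for word in words)
    if total_chars > MAX_TOTAL_CHARS:
        problems.append(f"{total_chars} characters exceeds {MAX_TOTAL_CHARS}")
    return problems
```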

To customize the stop file, open it in a text editor, directly modify it, then save it using the same name.

CAUTION:  If you customize the stop file after you have created an index, you must regenerate all indices associated with that stop file. Also, if you support Mobile Client searching, remove the absolute path from the setting so that only the stop file name, fultext.stp, remains, and make sure that the stop file is in the index directory (index directory/siebelroot/search/{datasource}/index/).
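
For Mobile Client searching, reducing a full path to the bare file name can be sketched as follows. The helper name mobile_stop_file_value is hypothetical, and the sketch only computes the value; it does not copy the file into the index directory.

```python
from pathlib import PureWindowsPath


def mobile_stop_file_value(value: str) -> str:
    """Strip any directory part, leaving only the stop file name."""
    return PureWindowsPath(value).name


print(mobile_stop_file_value(r"C:\Program Files\HUMMINGBIRD\fultext\fultext.stp"))
```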

Usually, you specify a stop file that is appropriate for the language of the documents you are indexing. SearchServer provides several stop files, which are listed in Table 23.

Table 23. Hummingbird SearchServer Stop Files
SearchServer Stop File   Explanation
----------------------   -----------
csource.stp              Used with the C-language Source Code text reader.
fulfra.stp               Uses the multi-lingual Unicode parser with default options, and contains French-language stop words.
fultext.stp              Uses the multi-lingual Unicode parser with default options, and contains English-language stop words.
ixkor.stp                Uses the InXight-based ixasian parser for Korean-language text.
ixjap.stp                Uses the InXight-based ixasian parser for Japanese-language text.
ixschi.stp               Uses the InXight-based ixasian parser for simplified Chinese-language text.
ixchi                    Uses the InXight-based ixasian parser for traditional Chinese-language text.
japan.stp                Included to support old collections. fultext.stp is used to support Japanese-language text. This file is empty.
korean.stp               Used for Korean-language text using n-grams. The Unicode parser is used with k=1 set to map Han characters to Hangul.
wspprox.stp              Used when indexing with support for Word, Sentence, and Paragraph Proximity.

Example

Property: Stop File

Value: C:\Program Files\HUMMINGBIRD\fultext\fultext.stp

To change the stop file location under Windows

  1. Navigate to Search Administration > Index Settings.
  2. In the Index Setting Properties list, select Stopfile, and then click in the Value column to make the row active.
  3. Change the value to match your stop file path, and then click Save.

CAUTION:  UNIX users: The provided sample database has a default stop file located at the following path: "C:\PROGRAM FILES\HUMMINGBIRD\FULTEXT\FULTEXT.STP". This path is invalid on a UNIX system. You must change the stop file location to a path similar to the following example: /export/home/hummingbird/fultext/fultext.stp.
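
On UNIX, a quick way to spot a leftover Windows value is to look for a drive letter or backslashes. The following Python sketch shows one such check. The function name is hypothetical, and the test is a heuristic rather than anything Siebel or SearchServer performs.

```python
from pathlib import PureWindowsPath


def is_windows_style_path(value: str) -> bool:
    """True if the value has a drive (for example C:) or contains backslashes."""
    return bool(PureWindowsPath(value).drive) or "\\" in value


# The sample database default is flagged; the UNIX example path is not.
print(is_windows_style_path(r"C:\PROGRAM FILES\HUMMINGBIRD\FULTEXT\FULTEXT.STP"))
print(is_windows_style_path("/export/home/hummingbird/fultext/fultext.stp"))
```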


 Siebel Search Administration Guide, Version 7.5, Rev A 
 Published: 18 April 2003