Managing Stoplists
Every Ultra Search instance has a stoplist associated with it. A stoplist
is a list of words that are to be ignored during the indexing
process. These words are known as Stopwords. Stopwords are not
indexed because they add little value to searches and can even harm
the performance and accuracy of indexing.
Default Ultra Search Stoplist
During the installation process, a default stoplist is created
for the Ultra Search product. Subsequently, when an Ultra Search instance
is created, a copy of the default stoplist will be created for
the Ultra Search instance.
The default stoplist is created under the WKSYS schema. The default
stoplist name is "wk_stoplist". This list is defined in the file
$ORACLE_HOME/ultrasearch/admin/wk0pref.sql, which is run at
installation time.
You can modify the default stoplist by adding or removing Stopwords
from it. However, remember that these modifications will not affect
existing Ultra Search instances. They will only affect Ultra Search instances
that are created after the modifications are made.
Modifying Instance Stoplists Before Initial Crawling
Modifying instance stoplists should be done as a last resort.
The preferred method is to do one of the following:
- Modify the default stoplist before creating the instance.
- Replace the instance stoplist immediately after creating the
instance.
Modifications made to the default stoplist will be reflected
in all other instance stoplists created after the time of modification.
Replacing the instance stoplist immediately after creating the
instance affects only that instance. You will first need to create
a user-defined stoplist.
In both cases above, the result is that the Ultra Search instance
stoplist is modified and defined before initial crawling. This
means that all documents collected by the Ultra Search Crawler will
be evaluated against the correct stoplist. It is important to
modify the stoplist before initial crawling to avoid having to
recrawl all documents.
Modifying Instance Stoplists After Initial Crawling
If necessary, you may alter an instance stoplist after initial
crawling. You can choose one of the following methods:
- Add Stopwords to the instance stoplist.
- Define a new stoplist and replace the instance stoplist with
the new stoplist.
Adding Stopwords to the instance stoplist does not affect
any documents already crawled or indexed. This
is not an expensive operation.
Defining a new stoplist and replacing the instance stoplist with
it will invalidate the entire index. If you choose
this method, you must force the Ultra Search Crawler to
recrawl all documents in the index. You can do this by
selecting the "Process all documents" radio button in
the Edit Schedule page.
This is a very expensive operation. Therefore,
this option should be the last resort.
Instructions on modifying instance stoplists before initial crawling
(1) Modify the default stoplist before
creating the instance
To add the Stopword "web" to the default stoplist,
log in as user WKSYS through SQL*Plus and issue the following
command:
exec ctx_ddl.add_stopword('wk_stoplist','web');
To remove the Stopword "web" from the default stoplist,
log in as user WKSYS through SQL*Plus and issue the following
command:
exec ctx_ddl.remove_stopword('wk_stoplist','web');
Subsequently, the stoplists of all new instances will reflect
the modifications made to the default stoplist.
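One way to confirm such a change, offered as a sketch rather than a
documented Ultra Search procedure, is to list the Stopwords in the
default stoplist from the Oracle Text data dictionary. This assumes
the view CTX_USER_STOPWORDS (with columns SPW_STOPLIST and SPW_WORD)
is available to WKSYS in your release. The stoplist name is compared
case-insensitively because the case in which it is stored is an
assumption.

```sql
-- Run as WKSYS through SQL*Plus.
-- Lists the Stopwords currently in the default stoplist.
-- Assumption: CTX_USER_STOPWORDS exists in this release; the name is
-- matched with UPPER() because its stored case is not confirmed here.
SELECT spw_word
  FROM ctx_user_stopwords
 WHERE UPPER(spw_stoplist) = 'WK_STOPLIST'
 ORDER BY spw_word;
```

After adding "web", the word should appear in the result; after
removing it, the word should no longer be listed.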
(2) Replace the instance stoplist immediately
after creating the instance
First, you must create a new user-defined stoplist. To do so,
log in as the owner of the instance through SQL*Plus. Issue
the following commands:
begin
ctx_ddl.create_stoplist('example_stoplist');
ctx_ddl.add_stopword('example_stoplist','example_stopword');
-- Add more stopwords by repeating the previous
-- line with new stopwords.
end;
/
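When the list of Stopwords is long, repeating the add_stopword line
by hand becomes tedious. A minimal sketch of the same steps using a
PL/SQL collection and a loop follows. The type name word_list, the
variable l_words, and the second Stopword are hypothetical and serve
only as illustration.

```sql
DECLARE
  -- Hypothetical local collection type holding the Stopwords to add.
  TYPE word_list IS TABLE OF VARCHAR2(64);
  -- Placeholder values; replace them with your own Stopwords.
  l_words word_list := word_list('example_stopword', 'another_stopword');
BEGIN
  -- Create the user-defined stoplist, as in the commands above.
  ctx_ddl.create_stoplist('example_stoplist');
  -- Add each Stopword from the collection to the new stoplist.
  FOR i IN 1 .. l_words.COUNT LOOP
    ctx_ddl.add_stopword('example_stoplist', l_words(i));
  END LOOP;
END;
/
```

Run the block once as the owner of the instance. Creating a stoplist
whose name already exists raises an error, so drop or rename the
existing stoplist before running the block again.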
To replace an instance stoplist with this new stoplist, log in
as the owner of the instance through SQL*Plus and issue the following
command:
ALTER INDEX wk$doc_path_idx rebuild parameters('replace stoplist example_stoplist');
Instructions on modifying instance stoplists after initial crawling
(1) Add Stopwords to the instance stoplist
To add the Stopword "web" to the instance stoplist,
log in as the owner of the instance through SQL*Plus and
issue the following command:
alter index wk$doc_path_idx rebuild parameters('add stopword web');
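To check which Stopwords the index actually uses after such a change,
the Oracle Text data dictionary can be queried as the owner of the
instance. This is a sketch only. It assumes that the view
CTX_USER_INDEX_VALUES is available in your release and that stoplist
entries are listed under the class STOPLIST; verify both against your
Oracle Text reference.

```sql
-- Run as the owner of the instance through SQL*Plus.
-- Shows the stoplist settings recorded for the instance index.
-- Assumptions: CTX_USER_INDEX_VALUES exists in this release, and
-- Stopwords appear as rows whose class is STOPLIST.
SELECT ixv_attribute, ixv_value
  FROM ctx_user_index_values
 WHERE ixv_index_name = 'WK$DOC_PATH_IDX'
   AND ixv_class = 'STOPLIST'
 ORDER BY ixv_value;
```

If the add stopword command succeeded, "web" should appear among the
returned values.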
(2) Replace the instance stoplist after
initial crawling
The method for replacing the instance stoplist after initial
crawling is the same as for replacing it
before initial crawling. Remember that this is a very
expensive operation: you must force the Ultra Search Crawler to
recrawl all documents in the index. Therefore,
this method should be the last resort.