Sun Java System Web Server 7.0 Update 7 Administrator's Guide

Configuring Search Collections

A search runs against a database of searchable data. Server administrators create this database, called a collection, which indexes and stores information about documents on the server. Once the server administrator indexes all or some of a server’s documents, information such as title, creation date, and author becomes available for searching.


Note –

About Search Collections:


Supported Formats

Files in the following formats can be indexed and searched:

  1. HTML documents — .html and .htm

  2. ASCII Plain Text — .txt

  3. PDF.

Unsupported Formats

The following file formats are not supported for indexing and searching:

  1. Microsoft Word

  2. Microsoft Excel

  3. Microsoft PowerPoint

However, customers can still download the relevant filter plugins from Nutch 1.0 and place them in the appropriate directories. Note that this setup is not supported.

To add the plugins, follow these steps:

  1. Download the parse-msexcel, parse-msword, and parse-mspowerpoint plugins, copy them into the <INSTALL_ROOT>/lib/htmlconvert/plugins directory, and update htmlconvert/conf/nutch-site.xml to include these plugins:

    <value>parse-(text|html|pdf|oo|msword|mspowerpoint|msexcel)</value>

  2. Download lib-jakarta-poi from Nutch 1.0 and replace the existing libraries.
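
The value shown in step 1 is a regular expression that lists the parser plugins to load. As a rough sketch only, the following Python snippet shows how the extra parser names could be merged into such a value without duplicating entries. The starting value parse-(text|html|pdf|oo) is an assumption inferred from the value in step 1, and add_parse_plugins is a hypothetical helper, not part of the product or of Nutch.

```python
import re


def add_parse_plugins(value, extra):
    """Return `value` with the names in `extra` added to its parse-(...) group.

    Hypothetical helper for illustration; names already present are skipped.
    """
    match = re.search(r"parse-\(([^)]*)\)", value)
    if match is None:
        raise ValueError("no parse-(...) group found in value")
    names = match.group(1).split("|")
    for name in extra:
        if name not in names:
            names.append(name)
    # Rebuild the string, replacing only the contents of the parentheses.
    return value[:match.start(1)] + "|".join(names) + value[match.end(1):]


# Assumed starting value; check your own nutch-site.xml before editing it.
current = "parse-(text|html|pdf|oo)"
updated = add_parse_plugins(current, ["msword", "mspowerpoint", "msexcel"])
print(updated)
```

The resulting string is what goes inside the <value> element shown in step 1. Back up nutch-site.xml before changing it.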

Adding a Search Collection

To add a new collection, perform the following tasks:

  1. Click the Configurations tab.

  2. Select the configuration from the configuration list.

  3. Click the Virtual Servers tab.

  4. Select the virtual server from the virtual server list.

  5. Click the Search tab.

  6. Under Search Collections, click Add Search Collection to add a new search collection.

The following list describes the fields on the page for creating a new search collection:

  1. Provide Search Collection Information

    1. Collection Name — Enter a unique name for the search collection.


      Note –

      Multibyte characters are not allowed in the collection name.


    2. Display Name — (Optional) The display name will appear as the collection name in the search query page. If you do not specify a display name, the collection name serves as the display name.

    3. Description — (Optional) Enter text that describes the new collection.

    4. Path — Create the collection in the default location, or provide a valid path where the collection will be stored.

  2. Provide Indexing Information

    1. Directory to Index — Enter the directory from which documents will be indexed into the collection. Only the directories visible from this virtual server can be indexed.

    2. Sub Directory — Enter the subdirectory from which documents will be indexed into the collection. The subdirectory path must be relative to the directory specified in Directory to Index.

    3. Pattern — Specify a wildcard to select the files to be indexed.

      Use the wildcard pattern judiciously to ensure that only the intended files are indexed. For example, specifying *.* might cause even executables and Perl scripts to be indexed.

    4. Subdirectories — Enabled/Disabled. The default value is Enabled. If you enable this option, documents within the subdirectories of the selected directory are also indexed.

    5. Default Document Encoding

      Documents in a collection are not restricted to a single language or encoding. Each time you add documents, you can specify only one default encoding. However, the next time you add documents to the collection, you can select a different default encoding.

  3. View the Summary

    1. View the summary and click Finish to add the new collection.
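
To illustrate why the Pattern field deserves care, the following Python sketch compares a broad wildcard with a narrow one against a made-up file list. It uses Python's fnmatch module as a stand-in, which is an assumption: the server's own wildcard syntax may differ in details. The file names are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical contents of a directory to index.
files = ["index.html", "notes.txt", "guide.pdf", "report.pl", "setup.exe", "README"]


def select(pattern, names):
    """Return the names that match the shell-style wildcard pattern."""
    return [name for name in names if fnmatchcase(name, pattern)]


broad = select("*.*", files)      # every name that contains a dot
narrow = select("*.html", files)  # only names ending in .html

print("*.*    ->", broad)
print("*.html ->", narrow)
```

In this sketch the broad pattern also picks up report.pl and setup.exe, while the narrow pattern selects only index.html.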


Note –

Using CLI

To add a search collection through the CLI, execute the following command:


wadm> create-search-collection --user=admin --password-file=admin.pwd 
--host=serverhost --port=8989 --config=config1 --vs=config1_vs_1 --uri=/search_config1 
--document-root=../docs searchcoll

See CLI Reference, create-search-collection(1).
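
If you script collection creation, it can help to assemble the command from named settings rather than editing one long line by hand. The following Python sketch builds the same create-search-collection command shown above, using only the options that appear in that example. The function name build_create_command is hypothetical, and the sketch only prints the command; it does not run wadm.

```python
import shlex


def build_create_command(collection, *, user, password_file, host, port,
                         config, vs, uri, document_root):
    """Return the create-search-collection arguments as a list of strings."""
    return [
        "create-search-collection",
        f"--user={user}",
        f"--password-file={password_file}",
        f"--host={host}",
        f"--port={port}",
        f"--config={config}",
        f"--vs={vs}",
        f"--uri={uri}",
        f"--document-root={document_root}",
        collection,  # the collection name is the final operand
    ]


# Values copied from the example command above.
args = build_create_command(
    "searchcoll",
    user="admin",
    password_file="admin.pwd",
    host="serverhost",
    port=8989,
    config="config1",
    vs="config1_vs_1",
    uri="/search_config1",
    document_root="../docs",
)
command_line = shlex.join(args)  # quotes any argument that needs it
print(command_line)
```

Consult create-search-collection(1) for the authoritative list of options before relying on a generated command.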


Deleting a Search Collection

To delete a search collection, perform the following tasks:

  1. Click the Configurations tab.

  2. Select the configuration from the configuration list.

  3. Click the Virtual Servers tab.

  4. Select the virtual server from the virtual server list.

  5. Click the Search tab.

  6. Under Search Collections, select the collection name and click Delete to delete the collection.


Note –

Using CLI

To delete a search collection through the CLI, execute the following command:


wadm> delete-search-collection --user=admin --password-file=admin.pwd 
--host=serverhost --port=8989 --config=config1 --vs=config1_vs_1 searchcoll

See CLI Reference, delete-search-collection(1).