Sun Java System Web Server 6.1 SP9 Administrator's Guide

About Search Collections

Searching requires a database of searchable data. Server administrators create this database, called a collection, which indexes and stores information about documents on the server. Once the server administrator indexes all or some of the server’s documents, information such as title, creation date, and author is available for searching.

Please note the following about collections:

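This section includes the following topics:

  • Creating a Collection

  • Configuring a Collection

  • Updating a Collection

  • Removing a Collection

  • Maintaining a Collection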

Creating a Collection

Collections are created and managed from the administrative interface. You create a new collection by specifying the documents to be indexed.

To create a new collection, perform the following steps.

Procedure: To create a new collection

  1. Select the virtual server in which you want to create a collection, and click the Manage button.

  2. Select the Search tab and then click the Create Collection link.

  3. Enter the following information:

    • Directory to Index: From the drop-down list, select the directory from which documents will be indexed into the collection. Only the directories visible from this virtual server will be listed.

      To view the contents of the directory, click View. If the selected directory has subdirectories, they are listed on the “View directory_name” page. To select a directory to index, click index. To view a directory, click the folder.

      In order to add a directory to the list of indexable directories, you must first create an additional document directory. For more information, see Setting Additional Document Directories.

    • Collection Name: Enter a name for the collection.

    • Display Name: (Optional) This will appear as the collection name in the search query page. If you don’t specify a display name, the collection name serves as the display name.

    • Description: (Optional) Enter text that describes the new collection.

    • Include Subdirectories? If you select No, documents within the subdirectories of the selected directory will not be indexed. The default is Yes.

    • Pattern: Specify a wildcard to select the files to be indexed. For more information on wildcards, see Wildcards Used in the Resource Picker.


      Caution –

      Use the wildcard pattern judiciously to ensure that only specific files are indexed. For example, specifying *.* might cause even executables and Perl scripts to be indexed.


    • Default Encoding: Specify the character encoding for the documents to be indexed. The default is “ISO-8859-1.” The indexing engine tries to determine the encoding of HTML documents from the embedded meta tag. If this is not specified, the default encoding is used.

      Documents in a collection are not restricted to a single language or encoding. Each time you add documents, you can specify only one default encoding; however, the next time you add documents to the collection, you can select a different default encoding.

  4. Click OK.

    This creates a new collection by the specified name in the following location:

    <instance-root>/collections/<vs-id>/<collection-name>

    It also creates an appropriate SEARCHCOLLECTION entry in the server.xml file.
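
Before indexing, it can help to preview which files a given pattern would pick up. The following is a minimal sketch, not the server's own matching code: it assumes Python's `fnmatch` shell-style wildcards are a close enough stand-in for the simple patterns shown here, and the function name `preview_matches` and the sample file names are hypothetical.

```python
import fnmatch
import os
import tempfile


def preview_matches(directory, pattern, include_subdirectories=True):
    """Return sorted relative paths of files whose names match the pattern."""
    matches = []
    for root, dirs, files in os.walk(directory):
        for name in files:
            if fnmatch.fnmatch(name, pattern):
                rel = os.path.relpath(os.path.join(root, name), directory)
                matches.append(rel.replace(os.sep, "/"))
        if not include_subdirectories:
            dirs.clear()  # stop os.walk from descending into subdirectories
    return sorted(matches)


# Build a small throwaway document directory (hypothetical file names).
with tempfile.TemporaryDirectory() as docs:
    os.makedirs(os.path.join(docs, "cgi"))
    for rel in ("index.html", "setup.exe", "cgi/report.pl", "cgi/help.html"):
        with open(os.path.join(docs, rel), "w") as handle:
            handle.write("sample")

    broad = preview_matches(docs, "*.*")
    narrow = preview_matches(docs, "*.html")
    top_only = preview_matches(docs, "*.html", include_subdirectories=False)

print(broad)     # includes setup.exe and cgi/report.pl
print(narrow)    # HTML files only, at any depth
print(top_only)  # HTML files in the top-level directory only
```

In this sketch the broad pattern `*.*` matches the executable and the Perl script as well as the HTML files, which is the situation the Caution above warns about.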

Configuring a Collection

After a collection has been created, you can modify some of its settings. These settings are stored in the server.xml file. When you reconfigure a collection, the server.xml file is updated to reflect your changes.

You should avoid making unnecessary changes to collection settings.

Procedure: To reconfigure an existing collection

  1. Select the virtual server that contains the collection you want to configure, and click the Manage button.

  2. Select the Search tab and then click the Configure Collection link.

  3. From the Collection drop-down list, select the collection you want to configure and click Go.

  4. You can edit the following information for the collection you selected:

    • Display name: (Optional) This will appear as the new collection name in the search query page.

    • Description: (Optional) Edit the text description of the collection.

    • Document URI: Edit the URI for the document root for the search collection.


      Note –

      Do not change the Document URI unless you have changed the URI mapping for the document root from the Additional Document Directories page. For more information, see Setting Additional Document Directories.


    • Enabled: Select Yes to enable. If you select No, the collection will not appear on the search query page.

  5. Click OK.

    This reconfigures the collection and modifies the appropriate SEARCHCOLLECTION entry in the server.xml file.
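
To confirm what the administrative interface wrote, you can read the SEARCHCOLLECTION entries back from server.xml without editing the file. The following is a rough sketch under stated assumptions: only the element name SEARCHCOLLECTION comes from this guide, while the sample fragment, its attribute names, and the enclosing elements are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment for illustration only: the attribute names and the
# surrounding elements are assumptions, not copied from a real server.xml.
SAMPLE_SERVER_XML = """\
<SERVER>
  <VS id="vs1">
    <SEARCH>
      <SEARCHCOLLECTION name="manuals" enabled="true"/>
      <SEARCHCOLLECTION name="drafts" enabled="false"/>
    </SEARCH>
  </VS>
</SERVER>
"""


def list_search_collections(xml_text):
    """Return the attributes of every SEARCHCOLLECTION element as dicts."""
    root = ET.fromstring(xml_text)
    return [dict(element.attrib) for element in root.iter("SEARCHCOLLECTION")]


collections = list_search_collections(SAMPLE_SERVER_XML)
for attributes in collections:
    print(attributes)
```

Because the function only reads and prints whatever attributes it finds, it does not depend on the exact attribute set. Make any changes through the administrative interface rather than by hand.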

Updating a Collection

You can add or remove files after a collection has been created. Documents can be added only from under the directory that was specified during collection creation. If you are removing documents, only the entries for the files and their metadata are removed from the collection. The actual files themselves are not removed from the file system.

To update a collection, perform the following steps.

Procedure: To update a collection

  1. Select the virtual server that contains the collection you want to update, and click the Manage button.

  2. Select the Search tab and then click the Update Collection link.

  3. From the Collection drop-down list, select the collection you want to update.

  4. Docs

  5. You can update the following information for the collection you selected:

    • Include subdirectories? If you select No, documents within the subdirectories of the selected directory will not be indexed. The default is Yes.


    Note –

    The Include Subdirectories? setting applies only when you add documents.



    Caution –

    While adding documents, use the wildcard pattern judiciously to ensure that only specific files are indexed. For example, specifying *.* might cause even executables and Perl scripts to be indexed.


    • Default Encoding. Specify the character encoding for the documents to be indexed. The default is “ISO-8859-1.” The indexing engine tries to determine the encoding of HTML documents from the embedded meta tag. If this is not specified, the default encoding is used.

      Documents in a collection are not restricted to a single language or encoding. Each time you add documents, you can specify only one default encoding; however, the next time you add documents to the collection, you can select a different default encoding.

  6. Click Add Documents to add documents to the index, or Remove Documents to remove the appropriate index entries.


    Note –

    You can add documents only if they are located in the directory you specified when you created the collection.
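
The Default Encoding rule described above (use the encoding named in the HTML meta tag, otherwise fall back to the default) can be approximated as follows. This is a minimal sketch, not the indexing engine's actual detection logic; the function name `detect_encoding`, the regular expression, and the 4096-byte scan limit are assumptions made for illustration.

```python
import re

DEFAULT_ENCODING = "ISO-8859-1"

# Matches both <meta charset="..."> and
# <meta http-equiv="Content-Type" content="text/html; charset=...">.
META_CHARSET = re.compile(
    rb"<meta[^>]*?charset\s*=\s*[\"']?([A-Za-z0-9._:-]+)", re.IGNORECASE
)


def detect_encoding(html_bytes, default_encoding=DEFAULT_ENCODING):
    """Return the charset named in a meta tag, or the default if none is found."""
    match = META_CHARSET.search(html_bytes[:4096])  # only scan the document head
    if match:
        return match.group(1).decode("ascii")
    return default_encoding


tagged = (
    b'<html><head><meta http-equiv="Content-Type" '
    b'content="text/html; charset=Shift_JIS"></head></html>'
)
untagged = b"<html><head><title>Plain</title></head></html>"

print(detect_encoding(tagged))             # Shift_JIS
print(detect_encoding(untagged))           # ISO-8859-1
print(detect_encoding(untagged, "UTF-8"))  # UTF-8
```

The last call mirrors adding a second batch of documents with a different default encoding: documents that declare their own encoding are unaffected, and only untagged documents pick up the new default.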


Removing a Collection

You can remove a collection after it has been created. When a collection is deleted, it is no longer visible to users on the search query page, and all configuration and index files associated with the collection are deleted. The actual documents that formed the collection are not deleted from the file system; only their index entries in the collection are deleted.

To remove a collection, perform the following steps.

Procedure: To remove a collection

  1. Select the virtual server that contains the collection you want to remove, and click the Manage button.

  2. Select the Search tab and then click the Maintain Collection link.

  3. From the Collection drop-down list, select the collection you want to remove.

  4. Click the Remove Collection button.


    Note –

    When a collection is removed, the maintenance scheduled for the collection is also removed. For information about scheduled maintenance, see Adding Scheduled Collection Maintenance.



    Note –

    Do not use your local file manager to remove collections because doing so will not update the corresponding configuration files.


Maintaining a Collection

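Periodically, you may want to maintain your collections. These tasks may not be necessary unless you index and update collections frequently. You can reindex a collection, and you can add, edit, or remove scheduled maintenance for it, as described in the following sections.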

Reindexing a Collection

You can reindex a collection after it has been created. If documents have been modified since they were indexed, reindexing the collection updates their index entries. Reindexing does not index any new content into the collection; it only updates the existing contents of the collection. If index entries exist for documents that are no longer present in the server file system, those entries are removed.

To reindex a collection, perform the following steps.

Procedure: To reindex a collection

  1. Select the virtual server that contains the collection you want to reindex, and click the Manage button.

  2. Select the Search tab and then click the Maintain Collection link.

  3. From the Collection drop-down list, select the collection you want to reindex.

  4. Click the Reindex button.
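
The reindexing rules described at the start of this section (refresh existing entries, drop entries for missing files, never add new files) can be modeled in a few lines. This is a conceptual sketch only: it represents the index as a dictionary of modification times, which is an assumption made for illustration and not how the server stores its index.

```python
def reindex(index, files_on_disk):
    """Return a new index that follows the reindexing rules.

    index: maps a document path to the modification time recorded at indexing.
    files_on_disk: maps a document path to its current modification time.
    """
    updated = {}
    for path in index:
        if path not in files_on_disk:
            continue  # document no longer exists: drop its entry
        updated[path] = files_on_disk[path]  # refresh the existing entry
    # Paths on disk that are not in the index are deliberately ignored;
    # adding them is the job of an update, not a reindex.
    return updated


index = {"a.html": 100, "b.html": 100, "c.html": 100}
disk = {"a.html": 100, "b.html": 250, "new.html": 300}

result = reindex(index, disk)
print(result)  # {'a.html': 100, 'b.html': 250}
```

Here b.html is refreshed because it changed, c.html is dropped because it is gone from disk, and new.html is left out because it was never part of the collection.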

Adding Scheduled Collection Maintenance

You can schedule maintenance tasks to be performed on collections at regular intervals. The tasks that can be scheduled are reindexing and updating. You schedule the tasks for a specific collection from the administrative interface. You can specify the task, the time of day, and the day or days of the week on which the task runs.

To add regular maintenance of a collection, perform the following steps.

Procedure: To add regular maintenance of a collection

  1. Select the collection you want to schedule maintenance for and click the Add Scheduled Maintenance link.

  2. Enter the following information:

    • Task. Select the task you want to automate. The choices are reindex and update.

      If you select Update, you must enter the following information:

      • Recurse Subdirectories? If you select No, documents within the subdirectories of the selected directory will not be indexed. The default is Yes.

      • Pattern. Specify a wildcard to select the files to be indexed. For more information on wildcards, see Wildcards Used in the Resource Picker.


      Caution –

      Use the wildcard pattern judiciously to ensure that only specific files are indexed. For example, specifying *.* might cause even executables and Perl scripts to be indexed.


      • Default Encoding. Specify the character encoding for the documents to be indexed. The default is “ISO-8859-1.” The indexing engine tries to determine the encoding of HTML documents from the embedded meta tag. If this is not specified, the default encoding is used.

        Documents in a collection are not restricted to a single language or encoding. Each time you add documents, you can specify only one default encoding; however, the next time you add documents to the collection, you can select a different default encoding.

      • Scheduled Time. (Required) Specify the time of day, in the HH:MM format, when you want the scheduled maintenance to run. For example, you might want to schedule maintenance to run at the end of the day, when it is likely that the documents in the collection have been modified.

      • Schedule day(s) of week. (Required) Check one or more of the checkboxes to specify the day or days of the week when the scheduled maintenance will run.

  3. Click OK.


    Note –

    UNIX/Linux users must restart the cron control process after adding scheduled maintenance, in order for their changes to take effect.
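
To sanity-check a schedule before entering it, you can work out when a given Scheduled Time and set of weekdays would next fire. The following sketch is independent of the server's scheduler: it assumes HH:MM is a 24-hour time, and the function name `next_run` and the three-letter day names are hypothetical.

```python
from datetime import datetime, timedelta

DAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def next_run(now, scheduled_time, days):
    """Return the first datetime after `now` matching the time and day settings."""
    hour, minute = (int(part) for part in scheduled_time.split(":"))
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("Scheduled Time must be in HH:MM format")
    if not days:
        raise ValueError("Select at least one day of the week")
    allowed = {DAY_NAMES.index(day) for day in days}
    for offset in range(8):  # today plus the following seven days
        candidate = (now + timedelta(days=offset)).replace(
            hour=hour, minute=minute, second=0, microsecond=0
        )
        if candidate > now and candidate.weekday() in allowed:
            return candidate
    raise AssertionError("unreachable: one of eight consecutive days must match")


now = datetime(2024, 1, 3, 12, 0)  # a Wednesday at noon
end_of_day = next_run(now, "23:30", ["Mon", "Fri"])
already_passed = next_run(now, "09:00", ["Wed"])

print(end_of_day)      # 2024-01-05 23:30:00
print(already_passed)  # 2024-01-10 09:00:00
```

The second call shows that a time earlier than the current time on a selected day rolls over to the same weekday in the following week.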


Editing Scheduled Collection Maintenance

If your requirements change, you can change the properties of the scheduled maintenance for a collection. You might, for example, decide to reschedule maintenance to follow the time when your site is most likely to be updated.

To change the scheduled maintenance for a collection, perform the following steps.

Procedure: To change the scheduled maintenance for a collection

  1. From the Collection drop-down list, select the collection for which you want to reschedule maintenance.

  2. Select the task you want to reconfigure, and enter the necessary information. For more details, see the Edit Scheduled Collection page in the online help.

  3. Click OK.


    Note –

    When a collection is removed, the maintenance scheduled for the collection is also removed.



    Note –

    UNIX/Linux users must restart the cron control process after reconfiguring scheduled maintenance, in order for their changes to take effect.


Removing Scheduled Collection Maintenance

You can cancel scheduled maintenance of a collection if it is no longer needed.

To cancel scheduled maintenance, perform the following steps.

Procedure: To cancel scheduled maintenance

  1. From the Collection drop-down list, select the collection for which you want to remove maintenance.

  2. Select the task for which you want to remove scheduled maintenance: Reindex or Update. If the task is scheduled, its details are displayed.

  3. For an Update task, check the Delete checkbox next to the task you want to remove.

  4. Click OK.


    Note –

    UNIX/Linux users must restart the cron control process after removing scheduled maintenance, in order for their changes to take effect.