Sun Java System Web Server 7.0 Update 4 Administrator's Guide

Chapter 12 Working With Search Collections

The server includes a search feature that enables users to search documents on the server and display results on a web page. Server administrators create the indexes of documents against which users will search (called collections), and can customize the search interface to meet the needs of their users.

For more information on querying the search collections, refer to the search online help.

About Search

The search feature is installed with other web components during the installation of Sun Java System Web Server. Search is configured and managed at the virtual server level, not at the server instance level.

From the Search tab under the Virtual Servers tab in the administration console, you can:

Information obtained from the administrative interface is stored in the <server-root>/config/server.xml file, where it is mapped within the VS element.

Server administrators can customize the search query and search results pages. Customization can include re-branding the pages with a corporate logo, or changing the way search results appear. In previous releases customization was accomplished through the use of pattern files.

There is no global “on” or “off” functionality for search. Instead, a default search web application is provided and then enabled or disabled on a specific virtual server. This search application provides the basic web pages used to query collections and view results. The search application includes sample JSPs that demonstrate how to use the search tag libraries to build customized search interfaces.


Note –

The Sun Java System Web Server does not provide access checking on search results. Due to the number of potential security models and realms, it is impossible to perform security checks and filter results from within the search application. It is the responsibility of the server administrator to ensure that appropriate security mechanisms are in place to protect content.


Configuring Search Properties

Search is enabled for a virtual server by enabling the search application included on the server.


Note –

The Java web container must be enabled for search to be enabled.


After ensuring that Java is enabled for the virtual server you want to configure, enable search by performing the following steps:

  1. Click the Configurations tab.

  2. Select the configuration from the configuration list.

  3. Click the Virtual Servers tab.

  4. Select the virtual server from the virtual server list.

  5. Click the Search tab.

  6. Under Search Application, select the Enabled checkbox to enable the search application.

Other parameters which you can configure are listed below:


Note –

Using CLI

To set search properties through the CLI, run the following command:


wadm> set-search-prop --user=admin --password-file=admin.pwd --host=serverhost
--port=8888 --no-ssl --rcfile=null --config=config1 --vs=config1_vs_1
enabled=true max-hits=1200

See CLI Reference, set-search-prop(1).


Configuring Search Collections

Searches require a database of searchable data against which users will search. Server administrators create this database, called a collection, which indexes and stores information about documents on the server. Once the server administrator indexes all or some of a server’s documents, information such as title, creation date, and author is available for searching.


Note –

About Search Collections:


Supported Formats

Files in the following formats can be indexed and searched:

  1. HTML documents — .html and .htm

  2. ASCII Plain Text — .txt

  3. PDF.

Adding a Search Collection

To add a new collection, perform the following tasks:

  1. Click the Configurations tab.

  2. Select the configuration from the configuration list.

  3. Click the Virtual Servers tab.

  4. Select the virtual server from the virtual server list.

  5. Click the Search tab.

  6. Under Search Collections, click Add Search Collection to add a new search collection.

The following section describes the fields in the page for creating a new search collection:

  1. Provide Search Collection Information

    1. Collection Name — Enter a unique name for the search collection.


      Note –

      Multibyte characters are not allowed in the collection name.


    2. Display Name — (Optional) The display name will appear as the collection name in the search query page. If you do not specify a display name, the collection name serves as the display name.

    3. Description — (Optional) Enter text that describes the new collection.

    4. Path — You can either create the collection in the default location or provide a valid path, where the collection will be stored.

  2. Provide Indexing Information

    1. Directory to Index — Enter the directory from which documents will be indexed into the collection. Only the directories visible from this virtual server can be indexed.

    2. Sub Directory — Enter the subdirectory from which documents will be indexed into the collection. The subdirectory path must be relative to the directory specified in Directory to Index.

    3. Pattern — Specify a wildcard to select the files to be indexed.

      Use the wildcard pattern judiciously to ensure that only the intended files are indexed. For example, specifying *.* might cause executables and Perl scripts to be indexed as well.

    4. Subdirectories — Enabled or Disabled. The default value is Enabled. If you enable this option, documents within the subdirectories of the selected directory will also be indexed.

    5. Default Document Encoding

      Documents in a collection are not restricted to a single language or encoding. Each time you add documents, you can specify only one default encoding. However, the next time you add documents to the collection, you can select a different default encoding.

  3. View the Summary

    1. View the summary and click Finish to add the new collection.
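To get a feel for how broad a wildcard is before using it in the Pattern field, the sketch below previews which file names a pattern would select. It is not part of the Web Server: the class name PatternPreview is hypothetical, and it assumes the Pattern field behaves like a shell-style glob, which this guide does not specify.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not a Web Server API): previews which file names a
// wildcard would select, ASSUMING shell-style glob semantics.
class PatternPreview {

    // Returns the candidate names that match the glob pattern, in input order.
    static List<String> matching(String pattern, List<String> candidates) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + pattern);
        List<String> selected = new ArrayList<>();
        for (String name : candidates) {
            if (matcher.matches(Paths.get(name))) {
                selected.add(name);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("index.html", "notes.txt", "cleanup.pl", "setup.exe");
        // A broad pattern also selects the script and the executable.
        System.out.println("*.*    selects " + matching("*.*", files));
        // A narrow pattern selects only the HTML document.
        System.out.println("*.html selects " + matching("*.html", files));
    }
}
```

Running a candidate pattern against a listing of the directory to index is a cheap way to confirm that scripts and binaries stay out of the collection.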


Note –

Using CLI

To add a search collection through the CLI, run the following command:


wadm> create-search-collection --user=admin --password-file=admin.pwd 
--host=serverhost --port=8989 --config=config1 --vs=config1_vs_1 --uri=/search_config1 
--document-root=../docs searchcoll

See CLI Reference, create-search-collection(1).


Deleting a Search Collection

To delete a search collection, perform the following tasks:

  1. Click the Configurations tab.

  2. Select the configuration from the configuration list.

  3. Click the Virtual Servers tab.

  4. Select the virtual server from the virtual server list.

  5. Click the Search tab.

  6. Under Search Collections, select the collection name and click Delete to delete the collection.


Note –

Using CLI

To delete a search collection through the CLI, run the following command:


wadm> delete-search-collection --user=admin --password-file=admin.pwd 
--host=serverhost --port=8989 --config=config1 --vs=config1_vs_1 searchcoll

See CLI Reference, delete-search-collection(1).


Scheduling Collection Update

You can schedule maintenance tasks to be performed on collections at regular intervals. The tasks that can be scheduled are re-indexing and updating. The administrative interface is used to schedule the tasks for a specific collection. You can specify the:

To schedule events for the collection, perform the following tasks:

  1. Click the Configurations tab.

  2. Select the configuration from the configuration list.

  3. Click the Virtual Servers tab.

  4. Select the virtual server from the virtual server list.

  5. Click the Search tab.

  6. Click the Scheduled Events tab.

  7. Under the Search Events tab, click New.

The following table describes the fields in the New Search Event Schedule page:

Table 12–1 Field Description > New Search Event Schedule

Field

Description

Collection

Select the collection from the drop-down list for which you need to schedule maintenance. 

Event

  1. Re-index Collection—This scheduled event will re-index the specified collection at the specified time.

  2. Update Collection—You can add or remove files after a collection has been created. Documents can be added only from under the directory that was specified during collection creation. If you are removing documents, only the entries for the files and their metadata are removed from the collection. The actual files themselves are not removed from the file system. This scheduled event will update the collection at the specified time.

  3. Pattern—Specify a wildcard to select the files to be indexed.

  4. Subdirectories included—If you select this option, documents within the subdirectories of the selected directory will also be indexed. This is the default action.

  5. Encoding—Specify the character encoding for the documents to be indexed. The default is ISO-8859-1. The indexing engine tries to determine the encoding of HTML documents from the embedded meta tag. If this is not specified, the default encoding is used.

Time

The configured time when the event will start. Select the hour and minute values from the drop-down box.

Every Day — Starts the specified event every day at the specified time.

Specific Days — Starts the specified event on specific days.

1. Days — Specify any day from Sunday to Saturday.

2. Dates — Specify any day of the month from 1 to 31 as comma-separated entries, for example, 4,23,9.

Specific Months — Starts the specified event at the specified time in specific months. Specify any month from January to December.

Interval

Start the specified event after this time period. 

1. Every Hours — Select the number of hours from the drop-down box.

2. Every Seconds — Select the number of seconds from the drop-down box.
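The Dates field in the table above takes comma-separated day-of-month values between 1 and 31. As a rough illustration of that format only, the sketch below parses and range-checks such an entry. The class ScheduleDates is hypothetical and is not how the administration console validates input.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not a Web Server API): parses a Dates entry such as
// "4,23,9" and rejects values outside the 1 to 31 range described above.
class ScheduleDates {

    // Returns the days in the order entered. Throws IllegalArgumentException
    // for out-of-range values; non-numeric tokens raise NumberFormatException,
    // which is a subclass of IllegalArgumentException.
    static List<Integer> parse(String entry) {
        List<Integer> days = new ArrayList<>();
        for (String token : entry.split(",")) {
            int day = Integer.parseInt(token.trim());
            if (day < 1 || day > 31) {
                throw new IllegalArgumentException("Day of month out of range: " + day);
            }
            days.add(day);
        }
        return days;
    }

    public static void main(String[] args) {
        System.out.println(parse("4,23,9"));
    }
}
```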

   

Performing a Search

Users are primarily concerned with asking questions of the data in the search collections, and getting a list of documents in return. The search web application installed with the Sun Java System Web Server provides default search query and search results pages. These pages can be used as they are, or customized using a set of JSP tags as described in Customizing Search Pages.

Users search against collections that have been created by the server administrator. They can:

Server administrators must provide users with the URL needed to access the search query page for a virtual server.

The Search Page

The default URL end-users can use to access search functionality is:

http://<server-instance>:<port-number>/search

Example:

http://plaza:8080/search

When the end-user invokes this URL, the Search page, which is a Java web application, is launched.
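Administrators who publish the search URL for several virtual servers may prefer to generate it rather than type it. The following is a minimal sketch that assembles the default URL from a host name and port. The class SearchUrl is hypothetical, and it assumes plain HTTP and the default /search path shown above.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper (not a Web Server API): builds the default search URL
// for a virtual server, ASSUMING plain HTTP and the default /search path.
class SearchUrl {

    static String forServer(String host, int port) {
        try {
            // URI(scheme, userInfo, host, port, path, query, fragment)
            return new URI("http", null, host, port, "/search", null, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Invalid host or port: " + host + ":" + port, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(forServer("plaza", 8080));
    }
}
```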


Note –

For more detailed information about conducting basic and advanced searches, including information about keywords and optional query operators, see the online Help provided with the search engine. To access this information, click the Help link on the Search page.


Making a Query

A search query page is used to search against a collection. Users input a set of keywords and optional query operators, and then receive results on a web page displayed in their browser. The results page contains links to documents on the server that match the search criteria.


Note –

Server administrators can customize this search query page, as described in “Customizing Search Pages.”


To make a query, perform the following steps:

Procedure: Making a Query

  1. Access the Search web application by entering its URL in the Location bar of your browser, in the following format:

    http://<server-instance>:<port-number>/search

  2. In the search query page that appears, select the checkbox representing the collection you want to search in the "Search in" field.

  3. Type a few words that describe your query and press Enter (or click Search) for a list of relevant web pages.

    For a more fine-tuned search, you can use the search parameters provided in the Advanced Search page described in the following section.

Advanced Search

Users can increase the accuracy of their searches by adding operators that fine-tune their keywords. These options can be selected from the Advanced Search page.

To make an advanced search query, perform the following steps:

Procedure: To Make an Advanced Search Query

  1. Access the Search web application by entering its URL in the Location bar of your browser, in the following format:

    http://<server-instance>:<port-number>/search

  2. Click the Advanced link.

  3. Enter any or all of the following information:

    • Search in—Select the collection you want to search.

    • Find—Four options are supported:

      • All of the words—Finds pages that include all the key words specified in Find.

      • Any of the words—Finds pages that include any of the key words specified in Find.

      • The exact phrase—Finds pages that match the exact phrase used in Find.

      • Passage search—Highlights the passage containing the keyword or words in the retrieved pages.

    • Without the words—The search will exclude Web pages that contain the specified words.

    • Title “does/does not” contain—Restrict the search to pages with titles that include, or do not include, the specified key words.

    • Since—Restrict the search operation to Web pages indexed in the selected time period.

Document Field

The Sun Java System Web Server maintains an index of documents. The index contains an entry for each document. Each index entry contains one or more fields such as Title, Author, and URL. Queries can be limited to specific document fields, and documents are only found if they match your criteria in the specified fields.

For example, if you simply search for Einstein, you will find all documents that have the word Einstein in any one of the Title, Author, or Keywords fields. This will include documents about Einstein, documents that make reference to Einstein, and documents written by Einstein. But if you specify Author = "Albert Einstein", you will only find documents written by Albert Einstein.
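The difference between an unrestricted query and a field-limited query can be sketched with a small in-memory model. This is an illustration only and not the server's search engine: the class FieldSearchSketch, its methods, and the sample documents are all hypothetical, and matching is reduced to a case-insensitive substring test.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical in-memory model (not the server's engine) contrasting an
// unrestricted query with a query limited to one document field.
class FieldSearchSketch {

    // Builds one index entry with the three fields used in the example.
    static Map<String, String> entry(String title, String author, String keywords) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("Title", title);
        fields.put("Author", author);
        fields.put("Keywords", keywords);
        return fields;
    }

    // Made-up sample documents for the illustration.
    static List<Map<String, String>> sampleIndex() {
        List<Map<String, String>> index = new ArrayList<>();
        index.add(entry("Relativity Explained", "Albert Einstein", "physics, relativity"));
        index.add(entry("A Short Biography of Einstein", "Jane Doe", "biography"));
        index.add(entry("Notes on Quantum Theory", "John Roe", "physics, Einstein, Bohr"));
        index.add(entry("Gardening Basics", "Jane Doe", "plants"));
        return index;
    }

    // Simplified matching: case-insensitive substring test.
    static boolean contains(String value, String term) {
        return value.toLowerCase(Locale.ROOT).contains(term.toLowerCase(Locale.ROOT));
    }

    // Unrestricted query: a hit in any field is enough.
    static List<String> searchAnyField(List<Map<String, String>> index, String term) {
        List<String> titles = new ArrayList<>();
        for (Map<String, String> doc : index) {
            for (String value : doc.values()) {
                if (contains(value, term)) {
                    titles.add(doc.get("Title"));
                    break;
                }
            }
        }
        return titles;
    }

    // Field-limited query: only the named field is examined.
    static List<String> searchField(List<Map<String, String>> index, String field, String term) {
        List<String> titles = new ArrayList<>();
        for (Map<String, String> doc : index) {
            String value = doc.get(field);
            if (value != null && contains(value, term)) {
                titles.add(doc.get("Title"));
            }
        }
        return titles;
    }

    public static void main(String[] args) {
        List<Map<String, String>> index = sampleIndex();
        System.out.println("Any field: " + searchAnyField(index, "Einstein"));
        System.out.println("Author only: " + searchField(index, "Author", "Albert Einstein"));
    }
}
```

In this model the unrestricted query returns the documents about, referencing, and written by Einstein, while the Author-limited query returns only the document he wrote.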

By default, the index fields that you can search are:

  1. Author — The author, authors, or organization that created the document as specified with an <author> meta tag.

  2. Keywords — The keywords as specified with a <keywords> meta tag.

  3. Date — The date that this document was last edited or modified.

  4. Title — The document's title as specified with the HTML <title> tag.

PDF files contain FTS information about the author, title, and subject. To search PDF files for this information, you can construct a query such as <title> contains Java or <subject> contains web server.

Search Query Operators

For a detailed list of search query operators, refer to the Administration Console Search Online Help.

Viewing Search Results

Search results are displayed in the user’s browser on a web page that contains HTML hyperlinks to documents on the server that match the search criteria. Each page displays 10 records (hits) by default, sorted in descending order of relevance. Each record lists information such as the file name, size, and date of creation. The matched words are also highlighted.
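The paging arithmetic implied by the default of 10 records per page can be sketched as follows. The class ResultRange is hypothetical and is not part of the search application. It only shows which record numbers would appear on a given page for a given number of hits.

```java
// Hypothetical helper (not a Web Server API): computes which 1-based record
// numbers appear on a results page, using the default of 10 records per page.
class ResultRange {

    static final int DEFAULT_PAGE_SIZE = 10;

    // Returns {first, last} record numbers for the given 1-based page.
    static int[] forPage(int totalHits, int page, int pageSize) {
        if (totalHits < 1 || page < 1 || pageSize < 1) {
            throw new IllegalArgumentException("totalHits, page, and pageSize must be positive");
        }
        int first = (page - 1) * pageSize + 1;
        if (first > totalHits) {
            throw new IllegalArgumentException("Page " + page + " is past the last result");
        }
        // The last page may be shorter than a full page.
        int last = Math.min(first + pageSize - 1, totalHits);
        return new int[] {first, last};
    }

    public static void main(String[] args) {
        int[] range = forPage(37, 4, DEFAULT_PAGE_SIZE);
        System.out.println("Records " + range[0] + " to " + range[1] + " of 37");
    }
}
```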

Customizing Search Pages

The Sun Java System Web Server includes a default search application that provides basic search query and search results pages. These web pages can be used as is, or customized to meet your specific needs. Such customizing might be as simple as re-branding the web pages with a different logo, or as complex as changing the order in which search results are displayed.

The default search application provides sample JSPs that demonstrate how to use the search tag libraries to build customized search interfaces. You can take a look at the default search application located at [install_dir]/lib/webapps/search/ as a sample application that illustrates the use of customizable search tags.

The default search interface consists of four main components: header, footer, query form, and results.

These basic elements can be easily customized simply by changing the values of the attributes of the tags. More detailed customizing can be accomplished using the tag libraries.

Search Interface Components

The Search interface consists of the following components:

Header

The header includes a logo, title, and a short description.

Footer

The footer contains copyright information.

Form

The query form contains a set of check boxes representing search collections, a query input box, and submit and Help buttons.

Results

By default, results are listed 10 records per page. For each record, information such as the title, a passage, size, date of creation, and URL is displayed. A passage is a short fragment of the page with matched words highlighted.

Customizing the Search Query Page

The query form contains a list of check boxes for search collections, a query input box, and submit button. The form is created using the <s1ws:form> tag along with <collElem>, <queryBox>, and <submitButton> tags with default values:

<s1ws:form>
    <s1ws:collElem />
    <s1ws:queryBox /> <s1ws:submitButton />
</s1ws:form>

The query form can be placed anywhere in a page. It can also be displayed in different formats such as with a cross bar where the collection select box, the query string input box, and the Submit button are lined up horizontally, or in a block where the collections appear as check boxes, and the query input box and Submit button are placed underneath.

The following examples show how the <searchForm> set of tags may be used to create query forms in different formats.

In a horizontal bar

The sample code below creates a form with a select box of all collections, a query input box and a submission button all in one row.

<s1ws:form>
    <table cellspacing="0" cellpadding="3" border="0">
    <tr class="navBar">
        <td class="navBar"><s1ws:collElem type="select" /></td>
        <td class="navBar">
            <s1ws:queryBox size="30" />
            <s1ws:submitButton class="navBar" style="padding: 0px; margin: 0px; width: 50px" />
        </td>
    </tr>
    </table>
</s1ws:form>

In a Sidebar Block

You can create a form block in which form elements are arranged in a sidebar titled "Search", which uses the same format as other items on the sidebar.

In the sample code given below, the form body contains three check boxes arranged in one column listing the available search collections. The query input box and the Submit button are placed underneath:

<s1ws:searchForm>
    <table>
<!--... other sidebar items ... -->
    <tr class="Title"><td>Search</td></tr>
    <tr class="Body">
        <td>
        <table cellspacing="0" cellpadding="3" border="0">
        <tr class="formBlock">
            <td class="formBlock"> <s1ws:collElem type="checkbox" cols="1" values="1,0,1,0" /> </td>
        </tr>
        <tr class="formBlock">
            <td class="formBlock"> <s1ws:queryBox size="15" maxlength="50" /> </td>
        </tr>
        <tr class="formBlock">
            <td class="formBlock"> <s1ws:submitButton class="navBar" style="padding: 0px; margin: 0px; width: 50px" /> </td>
        </tr>
        </table>
        </td>
    </tr>
    </table>
</s1ws:searchForm>

Customizing the Search Results Page

Search results are generated as follows:

You can customize the search results page simply by changing the attribute values of the tags.

The following sample code starts with a title bar, then displays the specified number of records, and ends with a navigation bar. The title bar contains the query string used in the search along with the range of total records returned, for example, 1–10. For each record, the records section shows the title with a link to the file, up to three passages with keywords highlighted, the URL, the date of creation, and the size of the document.

At the end of the section, the navigation bar provides links to the previous and next pages, as well as direct links to eight additional pages before and after the current page.

<s1ws:formAction />
<s1ws:formSubmission success="true" >
    <s1ws:search scope="page" />
    <!--search results-->
    (...html omitted...)
        <s1ws:resultStat formId="test" type="total" /></b> Results Found, Sorted by Relevance</span></td><td>
        <span class="body"><a href="/search/search.jsp?">Sort by Date</a></span></td>
        <td align="right"><span class="body">
        <s1ws:resultNav formId="test" type="previous" caption="<img border=0 src=\"images/arrow-left.gif\" alt=\"Previous\">" />
        &nbsp;<s1ws:resultStat formId="test" type="range" />
        &nbsp;<s1ws:resultNav formId="test" type="next" caption="<img border=0 src=\"images/arrow-right.gif\" alt=\"Next\">" />
        &nbsp; <!img alt="Next" src="images/arrow-right.gif" border="0" WIDTH="13" HEIGHT="9">
            (...html omitted...)
        <table border=0>
        <s1ws:resultIteration formId="test" start="1" results="15">
            <tr class=body>
                <td valign=top>
                <s1ws:item property='number' />.&nbsp;&nbsp;
                </td>
                <td>
                    <b><a href="<s1ws:item property='url' />"><s1ws:item property='title' /></a></b>
                    <br>
                    <s1ws:item property='passages' />
                    <font color="#999999" size="-2">
                    <s1ws:item property='url' /> -
                    <s1ws:item property='date' /> -
                    <s1ws:item property='size' /> KB
                    </font><br><br>
                </td>
            </tr>
        </s1ws:resultIteration>
        </table>
        (...html omitted...)
        <s1ws:resultNav formId="test" type="previous" />
        <s1ws:resultNav formId="test" type="full" offset="8" />
        <s1ws:resultNav formId="test" type="next" />
    (...html omitted...)
</s1ws:formSubmission>
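The full navigation bar in the sample above uses offset="8", which the text describes as direct links to eight pages before and after the current page. The sketch below shows one way such a window of page numbers could be computed. The class NavWindow is hypothetical, and the clamping at the first and last page is an assumption about the tag's behavior rather than something this guide states.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not the resultNav tag's implementation): lists the
// page numbers within "offset" pages of the current page, clamped to the
// valid range 1..totalPages (the clamping is an assumption).
class NavWindow {

    static List<Integer> pages(int currentPage, int totalPages, int offset) {
        if (totalPages < 1 || currentPage < 1 || currentPage > totalPages || offset < 0) {
            throw new IllegalArgumentException("Invalid page window arguments");
        }
        int first = Math.max(1, currentPage - offset);
        int last = Math.min(totalPages, currentPage + offset);
        List<Integer> links = new ArrayList<>();
        for (int page = first; page <= last; page++) {
            links.add(page);
        }
        return links;
    }

    public static void main(String[] args) {
        // Near the start, fewer than eight earlier pages exist.
        System.out.println(pages(3, 20, 8));
        // Near the end, the window stops at the last page.
        System.out.println(pages(15, 20, 8));
    }
}
```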

The basic search results interface can be easily customized by manipulating the tags and modifying the HTML. For example, the navigation bar may be copied and placed before the search results. You can also choose which properties to show for each search record.

Besides being used along with a form, the <search>, <resultIteration>, and related tags may be used to list specific topics. The following sample code lists the top ten articles on Java Web Services on a site:

<s1ws:search collection="Articles" query="Java Web Services" />
<table cellspacing="0" cellpadding="3" border="0">
  <tr class="Title"><td>Java Web Services</td></tr>
</table>
<table cellspacing="0" cellpadding="3" border="0">
<s1ws:resultIteration>
<tr>
<td><a href="<s1ws:item property='URL' />"> <s1ws:item property='Title' /></a></td>
</tr>
</s1ws:resultIteration>
</table>

Customizing Form and Results in Separate Pages

If you need the form and results pages to be separate, you must create the form page using the <form> set of tags and the results pages using the <formAction> set of tags.

A link to the form page needs to be added in the results page for a smooth flow of pages.

Tag Conventions

Note the following tag conventions:

Tag Specifications

The Sun Java System Web Server includes a set of JSP tags that can be used to customize the search query and search results pages in the search interface.

For a complete list of JSP tags that you can use to customize your search pages, refer to the Sun Java System Web Server 7.0 Developer’s Guide to Web Applications.