
Sun ONE Portal Server 6.2 Developer's Guide

Chapter 10
Using Java to Access the Search Engine Database

This chapter describes how to submit queries and add entries to the search engine database by using the Java™ programming language.

To do so, you need to use the search engine Java™ Software Development Kit (SDK), which is available at:

portal-server-install-root/SUNWps/sdk/search

The Java SDK includes the Java classes needed for interacting with the search engine database and a sample search application which can be run as an applet or as a stand alone program.

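This chapter contains the following sections: The Search Engine Java SDK, Running the Sample Applications, Using Java To Access the Search Server Database, and Using Java To Add Entries to the Search Engine Database.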


The Search Engine Java SDK

The classes in the search engine Java SDK provide a Java interface for interacting with the search server to access the search engine database.

Using the Java SDK, you can build Java programs, applets, and your own interfaces that submit searches to the search engine.

The search engine database contains resource descriptions for the indexed resources, such as documents. Each resource description is described in SOIF format, where SOIF stands for Summary Object Interchange Format.

For a discussion of SOIF, see:

http://www.w3.org/TR/NOTE-rdm.html#soif.


Running the Sample Applications

The search engine Java SDK contains sample Java and HTML files. SearchDemo.java is an example Java program which can be run as an applet or as a stand alone program. To run the applet version, use the JDK appletviewer or a browser with the Java 1.2 plugin. You must also edit SearchDemo.html to pass the information about your search engine to the SearchDemo applet. The command line version takes the server information as a command line argument.

To Install and Run the Search Demo Command Line Program

  1. Compile the SearchDemo.java file. Make sure the class path includes the SDK JAR file, searchsdk.jar. For example, type:

    javac -classpath searchsdk.jar SearchDemo.java

  2. To use the compiled classes directly, create a directory hierarchy that reflects the Java package structure for these classes, and move the classes into the demo package of this hierarchy. For example, type:

    mkdir -p com/sun/portal/search/demo; cp *.class com/sun/portal/search/demo

  3. Invoke the search demo from the command line and supply the address of the search server and the query string as arguments. For example, type:

    java -classpath .:searchsdk.jar com.sun.portal.search.demo.SearchDemo http://portal_server_host_name:port/portal/search 'search query'

The codebase parameter indicates the directory containing the main class and the ancillary classes for the applet. You can specify this in the form:

"http://host_name:compass_server_port/java"

To Install and Run the Search Demo Applet

To run the search demo as an applet, use the JDK appletviewer or a browser with the Java 1.3.1 plugin. You must also edit SearchDemo.html to configure the information about your search engine location.


Note

The instructions included here are for appletviewer. To run the applet in a browser, you need Java 1.2 or later browser plugin. More information on the Java browser plugin can be found at http://java.sun.com/products/plugin and http://java.sun.com/products/plugin/1.3/docs/tags.html.


  1. Compile the SearchDemo.java file. Make sure the class path includes the SDK JAR file, searchsdk.jar. For example, enter:

    javac -classpath searchsdk.jar SearchDemo.java

  2. To use the compiled classes directly, create a directory hierarchy that reflects the Java package structure for these classes, and move the classes into the demo package of this hierarchy. For example, type:

    mkdir -p com/sun/portal/search/demo; cp *.class com/sun/portal/search/demo

  3. Enter the server location information by editing the applet parameters in SearchDemo.html. For example:

    <param name="RDMServer" value="http://portal_server_host_name:port/portal/search">

Appletviewer can now be run from the command line. It will access the java classes from the current location. For example, enter:

appletviewer SearchDemo.html

Search queries are entered into the search box that appears, and results are sent to stdout of the controlling terminal. If you are running the applet in a browser, the results are displayed in the Java console.

If you have installed the browser Java plugin, then the instructions are similar to those for appletviewer except that you must make the classes available for download to the client browser. To do this:

  1. Copy SearchDemo.html, searchsdk.jar and the directory hierarchy created above (including the SearchDemo classes) to a location that is in the content path of your web server.

  2. Optionally, add the search demo classes to searchsdk.jar to make the applet available in a single download. For example, enter:

    jar uf searchsdk.jar com/sun/portal/search/demo/*.class

  3. Make sure that the searchsdk.jar file is named in the archive attribute of the applet tag in SearchDemo.html.


Using Java To Access the Search Server Database

You can use the search engine Java SDK to write Java programs that interface with the sendrdm program to retrieve information from the search engine database.

The main steps are creating a Search object, then executing a query and getting the results.

Creating a Search Object

The entry point for submitting searches is the Search class. You first create a new Search object, then call doQuery() on it to execute the search query. The full constructor syntax is:

public Search(

    String scope,

    String viewAttributes,

    String viewOrder,

    int firstHit,

    int viewHits,

    String queryLanguage,

    String database,

    String RDMServer,

    String ssoToken

)

The arguments for the constructor are outlined in the following table:

String scope

The query string or scope, that is, the string being searched for.

String viewAttributes

A comma-delimited list of the SOIF attributes to be retrieved from the database, such as URL, Author, Description. For example, score,url,title,description,classification.

String viewOrder

The order by which to sort the results. This is a comma-delimited list of attributes. Use the minus sign to indicate descending order, and a plus sign to indicate ascending order. For example, -score,+title.

int firstHit

The hit number of the first result to return. A typical value is 1.

int viewHits

Maximum number of results to return. A typical value is 10.

String queryLanguage

The Search Server query language. You should use search for a normal query.

String database

The logical name of a database (or collection) you wish to search. A typical value is null which will search the server’s default database.

String RDMServer

The URL of the search engine servlet. This argument has the form:

http://hostname.domain.com:port/portal/search

For example, if the search server is installed on www.yourcompany.com on port 80, the value would be:

http://www.yourcompany.com:80/portal/search

String ssoToken

A Sun ONE Identity Server software single sign-on token used when doing secure searches.

There is also a simpler convenience constructor with the following syntax:

public Search(String scope, String RDMServer)

When this constructor is used the following values are used for the unspecified arguments:

viewAttributes: null. Return all attributes.

viewOrder: null. Use the server default sort order - sorted by relevance.

firstHit: 1. Start hits at hit number 1.

viewHits: 10. Return 10 hits only.

queryLanguage: search. Search for documents using the normal query language.

database: null. Search the server’s default database.

ssoToken: null. Use anonymous search.
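
To make the viewOrder syntax described above more concrete, the following minimal sketch parses a value such as -score,+title into readable sort keys. It is an illustration only and does not use the search SDK: the class ViewOrderSketch and its parse() method are hypothetical, and treating an attribute without a sign as ascending is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of the search SDK. */
final class ViewOrderSketch {

    /** Parses a viewOrder value such as "-score,+title" into readable sort keys. */
    static List<String> parse(String viewOrder) {
        List<String> keys = new ArrayList<>();
        if (viewOrder == null || viewOrder.trim().isEmpty()) {
            return keys; // null means the server default sort order is used
        }
        for (String part : viewOrder.split(",")) {
            String p = part.trim();
            if (p.isEmpty()) {
                continue; // skip empty entries such as a trailing comma
            }
            boolean descending = p.charAt(0) == '-';
            boolean signed = descending || p.charAt(0) == '+';
            // Assumption: an attribute without a sign is treated as ascending.
            String name = signed ? p.substring(1) : p;
            keys.add(name + (descending ? " descending" : " ascending"));
        }
        return keys;
    }

    public static void main(String[] args) {
        System.out.println(parse("-score,+title"));
    }
}
```

The server performs the actual sorting; a client only needs to build the string, so a helper like this is mainly useful for validating user input before a query is submitted.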

Executing A Query and Getting the Results

You submit a query by calling the doQuery() method.

public void doQuery()

The results from Search.doQuery() can be obtained as a SOIF stream using Search.getResultStream(). The next search will replace the previous result stream reference, so you must process the results or save a reference to the result stream after each query. There are also methods for checking the number of results.

public SOIFInputStream getResultStream()

The function getResultStream() returns a SOIFInputStream which is used to read the SOIF hit objects. Each SOIF object read from the stream corresponds to one result.

public int getHitCount()

The function Search.getHitCount() returns the number of hits that matched the query.

public int getResultCount()

The function Search.getResultCount() returns the number of results that were returned by the server. The result count will be equal to the number requested by the viewHits argument whenever there are enough results available.

public int getDocumentCount()

The function Search.getDocumentCount() returns the total number of documents searched across. This will usually equal the total number of documents in the searched database.
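
The relationship between the hit count and the result count can be sketched with plain arithmetic. The helper below is hypothetical and does not call the SDK; it assumes the server returns at most viewHits results starting at hit number firstHit, which is how the result count is described above.

```java
/** Hypothetical illustration; none of these names come from the search SDK. */
final class ResultCountSketch {

    /**
     * Expected number of results on a page, assuming the server returns
     * at most viewHits results beginning at hit number firstHit (1-based).
     */
    static int expectedResultCount(int hitCount, int firstHit, int viewHits) {
        int remaining = hitCount - firstHit + 1; // hits from firstHit to the last hit
        if (remaining <= 0) {
            return 0; // firstHit lies beyond the last matching hit
        }
        return Math.min(viewHits, remaining);
    }

    public static void main(String[] args) {
        // With 37 matching hits and 10 results per page, a request starting
        // at hit 31 can only return the 7 hits that remain.
        System.out.println(expectedResultCount(37, 1, 10));
        System.out.println(expectedResultCount(37, 31, 10));
    }
}
```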

Working Through An Example Search Application

This section discusses the SearchDemo example application provided with the Java search SDK. The purpose of this example is to show how to use a Search object to submit a query to the search server and how to extract the results from the Search object. The example application is very simple, and limits use of Java to achieving the goals of the example. It creates a Java applet that presents the user with a text field in which to enter a search query, and a Search button to initiate the search. The results of the query are read from a SOIFInputStream returned by the Search object. The query results are displayed to standard output as plain text.

Import the Necessary Classes

In your favorite editor or Java development environment, view the search SDK file SearchDemo.java. This demo runs as a stand alone application or as an applet. The soif package is provided as part of the Search Server Java SDK, while java.applet, java.awt, and java.io are standard Java packages.

package com.sun.portal.search.demo;

import com.sun.portal.search.soif.*;

import java.applet.Applet;

import java.awt.*;

import java.io.*;

Define the SearchDemo Class

The class SearchDemo is an applet, so it extends the class Applet. SearchDemo defines init() and main() methods which allow it to run as an applet or as a stand alone (command line) program.

Code Example 10-1  SearchDemo  

/**

* Applet/application for simple query interface. Can be used as an

* example for those who want to create their own java interface.

* This example demonstrates search only. Browse, determining

* the schema of the search server and obtaining the taxonomy

* of the search server will be demonstrated in other examples.

*/

public class SearchDemo extends Applet {

    /** Run as an applet. */

    public void init() {

        String rdm = getParameter("RDMServer");

        SimpleSearch ss = new SimpleSearch(rdm);

        SearchPanel sp = new SearchPanel(ss);

        setLayout(new FlowLayout(FlowLayout.CENTER));

        add(sp);

    }

    /** Run as an application. */

    public static void main(String argv[]) throws Exception {

        int args = argv.length;

        String SOIFOutputFile = null;

        if (args != 1 && args != 2 && args != 3) {

            System.out.println("args: RDMServer [query] [soif_output_file_name]");

            return;

        }

        String rdm = argv[0]; // rdm search server, eg,

        // http://portal.siroe.com:2222/ps/search

        SimpleSearch ss = new SimpleSearch(rdm);

        if (args == 3) {

            --args;

            ss.setSOIFfile(argv[2]); // dump raw soif results to this file

        }

        if (args == 1) {

            // run from a search box

            Frame f = new Frame();

            SearchPanel sp = new SearchPanel(ss);

            f.add(sp);

            f.pack();

            f.show();

        }

        else {

            // run from command line

            String query = argv[1];

            ss.doSearch(query);

        }

    }

}

There is a helper class called SearchPanel which handles the applet GUI. It sets up a search panel with a text box to enter a query and a submit button to run the query. See the source file for more details.

Define the SimpleSearch Class

Notice the private helper class SimpleSearch. This is where the search is set up and executed and we will look at it in more detail here. The applet/command line class SearchDemo sets up the arguments for SimpleSearch using either applet or command line parameters. It then calls the SimpleSearch.doSearch(String scope) method to execute the search and display the results. The SimpleSearch constructor takes the location of the search server as an argument. In this way, a single SimpleSearch object can be used repeatedly to run searches against the remote search server.

The SimpleSearch.setSOIFfile(String filename) method is used by the main program to direct search results to a file when running in command line mode.

Code Example 10-2  SimpleSearch Class  

/** Performs a simple search and displays its results. */

class SimpleSearch {

    String RDMServer;

    String SOIFOutputFile;

    /**

    * SimpleSearch constructor

    * @param rdm - the rdm search server, eg, http://portal.siroe.com:2222/portal/search

    */

    public SimpleSearch(String rdm) {

        System.out.println("Sun ONE Search Java Demo");

        RDMServer = rdm;

    }

    /**

    * @param filename - a file to dump raw SOIF results into - only

    * use if running from the command line or an applet with file

    * system access

    */

    public void setSOIFfile(String filename) {

        SOIFOutputFile = filename;

    }

    /** Execute a search */

    public void doSearch(String scope) throws IOException {

        ...see Code Example 10-3...

    }

}

Before submitting the search, SimpleSearch needs to create a Search object. The constructor for the Search class takes several arguments as discussed previously.

Code Example 10-3  doSearch Method  

/** Execute a search */

public void doSearch(String scope) throws IOException {

    /* The Search class encapsulates the search.

    ** Its parameters are:

    ** 1) the search string

    ** 2) the attributes you want returned, comma delimited

    ** 3) sort order, comma delimited, - descending, + ascending

    ** 4) first hit

    ** 5) number of hits

    ** 6) query language, eg search, taxonomy-basic, schema-basic, etc

    ** 7) database to search

    ** 8) The RDM server URL, eg, http://portal.siroe.com:2222/ps/search

    ** 9) Access token (null for anonymous access, or valid iPlanet Directory Server Access Management Edition session id)

    */

    Search search = new Search(

        scope,

        "score,url,title,description",

        "-score",

        1,

        20,

        "search",

        null,

        RDMServer,

        null

    );

The Search constructor arguments used here are:

scope

The search scope is the actual query run by the search server. It is the scope argument to doSearch() and ultimately derives from either the applet input panel or a command line argument to the main program.

viewAttributes = "score,url,title,description"

The requested attribute set shown here will result in the server returning the score, url, title, and description of all documents that match the query.

viewOrder = "-score"

A comma-delimited list of the attributes to be used to sort the results. A minus sign indicates descending order, a plus sign indicates ascending order. In this case, the results are sorted by decreasing numerical score value only. A value such as -score,+title would additionally use alphabetical order of the title as the secondary sort order.

firstHit = 1

The hit number of the first returned result.

viewHits = 20

The maximum number of results to return.

queryLanguage = "search"

The search server query language. Use search for normal searches.

database = null

The logical name of the database to search. A null value searches the server’s default database.

RDMServer

The URL of the remote search engine, specified as an argument to the SimpleSearch constructor.

ssoToken = null

Sun ONE Identity Server software single sign on token. Not used in this case, implying anonymous search.

Execute the Search Query

If an output file was named, the example creates an output stream to hold the raw SOIF results. It then pages through the search results for a fixed number of pages, in this case at most five. The code passes pagesize (10) to doQuery() for each page, so each page requests 10 results, even though the constructor was given viewHits = 20. The first page starts with the first hit (firstHit=1). The search is executed again for each page of results. It is possible to cache the results for all pages with a single search, but it is often easier to simply resubmit the search each time. This is equivalent to a user clicking a next button in a search user interface.

/* Execute the query. */

System.out.println("\nSearch results for '" + scope + "'");

DataOutputStream sos = null;

if (SOIFOutputFile != null) {

    try {

        sos = new DataOutputStream(new FileOutputStream(SOIFOutputFile));

    }

    catch (Exception e1) {

        System.out.println("Error: failed to create output file: " + e1);

    }

}

int pagenum = 1;

int pagesize = 10;

SOIFBuffer firstPageSOIF = new SOIFBuffer();

for (; pagenum <= 5; pagenum++) {

    int firstHit = (pagenum-1)*pagesize+1;

    try {

        search.doQuery(firstHit, pagesize);

    }

    catch (Exception ex) {

        ex.printStackTrace();

        break;

    }

    // Check the result count. -1 indicates an error.

    if (search.getResultCount() <= 0)

        break;

The results are stored in the Search object. Now do something with the results. The functions doSomethingWithResults() and displayHTMLResults() will be defined in this file. They each show a different way of extracting the results from the Search object.
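
The page arithmetic used by the loop above can be checked in isolation. The following sketch is hypothetical and does not contact a search server: firstHitOfPage() repeats the firstHit calculation from the loop, and isLastPage() is the same comparison the example uses to stop paging once the last matching hit has been shown. The hit count of 23 is an assumed value.

```java
/** Hypothetical illustration of the paging arithmetic; it does not contact a search server. */
final class PagingSketch {

    /** First hit number (1-based) of a page, as computed in the loop above. */
    static int firstHitOfPage(int pagenum, int pagesize) {
        return (pagenum - 1) * pagesize + 1;
    }

    /** True when the page starting at firstHit reaches or passes the last matching hit. */
    static boolean isLastPage(int hitCount, int firstHit, int pagesize) {
        return hitCount <= firstHit + pagesize - 1;
    }

    public static void main(String[] args) {
        int hitCount = 23; // assumed total number of matching hits
        int pagesize = 10;
        for (int pagenum = 1; pagenum <= 5; pagenum++) {
            int firstHit = firstHitOfPage(pagenum, pagesize);
            System.out.println("page " + pagenum + " starts at hit " + firstHit);
            if (isLastPage(hitCount, firstHit, pagesize)) {
                break; // nothing left to request after this page
            }
        }
    }
}
```

With these assumed values the loop requests pages starting at hits 1, 11, and 21, then stops because the third page already covers hit 23.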

Display the Results

The example application displays the query results to standard output or to a named file. In reality, you would do more with the results than just print them like this, but once you know how to get the results out of the Search object, it is up to you what you do with them. You can use standard Java functionality to process the results in any way you like.

The Search object has a method called getResultStream() that returns a SOIFInputStream object. Each result is read from this SOIF stream in turn. Note that the client server connection uses an efficient streamed protocol; it is conceivable that the server is still returning later results while the client is processing the first results. For each SOIF object read from the result stream you can use the getValue() method to get the value of a particular field, for example, getValue("title") gets the title of a SOIF object.

First, print out some general result information:

System.out.println("=========================================");

System.out.println("page " + pagenum

+ ": hits " + search.getFirstHit()

+ " to " + (search.getFirstHit() + search.getResultCount() - 1)

+ " out of " + search.getHitCount()

+ " across " + search.getDocumentCount() + " documents");

System.out.println("=========================================");

System.out.println();

Now, retrieve each search hit from the result stream as SOIF objects and print its URL, title, description, and score to the output stream (either the Java console, standard output, or a named output file).

SOIFInputStream resultStream = search.getResultStream();

SOIF soif;

/* Examine the results of the search. The following

* code loops through the stream of SOIF instances. */

for (soif = resultStream.readSOIF(); soif != null; soif = resultStream.readSOIF()) {

    // For illustration, dump out the entire SOIF on the first page only.

    if (pagenum == 1)

        firstPageSOIF.write(soif.toByteArray());

    /* Now we use the getValue() method to get

    * the values of each of the requested

    * attributes. URL is special and has

    * its own accessor method.

    */

    String u = soif.getURL();

    String t = soif.getValue("title");

    String d = soif.getValue("description");

    String sc = soif.getValue("score");

    /* do something with the results */

    System.out.println(

        "TITLE: " + t + "\n" +

        "URL: " + u + "\n" +

        "SCORE: " + sc + "\n" +

        "DESCRIPTION: " + d + "\n" +

        "--------------------------------------------\n"

    );

    // If there is a SOIF output file, write the SOIF data there too...

    if (sos != null) {

        try {

            sos.writeBytes(soif.toString());

        }

        catch (Exception e1) {

            System.out.println("Error: failed to write to SOIF output file: " + e1);

        }

    }

}

// Break if the largest requested hit has been displayed

if (search.getHitCount() <= (firstHit + pagesize - 1))

    break;

} // end of the page loop

// firstPageSOIF is never null, so test the first page's result count instead.

if (pagenum == 1 && search.getResultCount() <= 0)

    System.out.println("No matching documents found.");

} // end of doSearch()


Using Java To Add Entries to the Search Engine Database

The program rdmgr is used to add data to the database from the command line. This section describes how to create input data for rdmgr so that it can be added to the database. The rdmgr utility can add new data as well as replace, modify, or retrieve existing data. All data input and output is done using SOIF, with UTF-8 character encoding for character fields. Note that SOIF also supports binary-valued fields and they can be added or retrieved too.

For more information on rdmgr, see Sun ONE Portal Server 6.2 Administrator’s Guide.

In the simplest case, rdmgr can be used to add a file containing multiple SOIF objects to the database. This is as simple as creating a SOIF file and adding the data with the command rdmgr soif_input_file. The search robot calls rdmgr in this manner to index data it collects from its crawling runs.

In the general case though, rdmgr accepts a complete resource description (RD) submit request as input. The RD submit input must be in SOIF format, with a request header and a body consisting of the SOIF data to be added to or retrieved from the database.

A SOIF object consists of a schema name (such as @REQUEST or @DOCUMENT), a URL, and a list of attribute-value pairs. The com.sun.portal.search.soif package in the Search Server Java SDK is used to build SOIF objects and write them to a file. You can use the SOIF classes to create an RD submit request for input to rdmgr.
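
To show what this structure looks like as text, here is a minimal sketch that renders a schema name, a URL, and attribute-value pairs in SOIF layout. It is not the SDK's SOIF class: SoifTextSketch and render() are hypothetical names, the values are placeholders, and the layout (the value size in bytes inside braces, followed by a colon and a tab) is taken from the SOIF description referenced earlier in this chapter.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for illustration; this is not the SDK's SOIF class. */
final class SoifTextSketch {

    /** Renders "@SCHEMA { url", one "name{byteCount}:<tab>value" line per attribute, then "}". */
    static String render(String schema, String url, Map<String, String> attributes) {
        StringBuilder sb = new StringBuilder();
        sb.append('@').append(schema).append(" { ").append(url).append('\n');
        for (Map.Entry<String, String> e : attributes.entrySet()) {
            // The number in braces is the size of the value in bytes (UTF-8 here).
            int size = e.getValue().getBytes(StandardCharsets.UTF_8).length;
            sb.append(e.getKey()).append('{').append(size).append("}:\t")
              .append(e.getValue()).append('\n');
        }
        return sb.append("}\n").toString();
    }

    public static void main(String[] args) {
        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put("title", "Example title"); // placeholder values
        attrs.put("author", "A. Writer");
        System.out.print(render("DOCUMENT", "http://www.example.com/doc.html", attrs));
    }
}
```

Because the size is counted in bytes rather than characters, a value containing non-ASCII text has a size larger than its character count when encoded as UTF-8.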

Here is an example of constructing a request that can be used as a second argument to rdmgr:

SOIF req = new SOIF("REQUEST", "-");

Write the header part of the RDM to send to the database. SOIF objects of type @REQUEST do not have an associated URL. An update request to the search engine has the following attribute-value pairs:

submit-csid

submit-type

submit-operation

submit-view

Add values for each of these attributes to the request header:

req.insert("submit-csid", "x-catalog://nikki.boots.com:80/default");

req.insert("submit-type", "persistent");

req.insert("submit-operation", "merge");

req.insert("submit-view", "title,author,description");

Now we create the body part of the submit request. We’ll be saving a resource description for a document, whose URL is http://www.sesta.com/~jocelyn/resdogs.index.htm, whose title is “Saving English Springer Spaniels,” whose author is Jocelyn Becker, and whose description is “English Springer Spaniels in need of homes.”

SOIF data = new SOIF("DOCUMENT", "http://www.sesta.com/~jocelyn/resdogs.index.htm");

data.insert("title", "Saving English Springer Spaniels");

data.insert("author", "Jocelyn Becker");

data.insert("description", "English Springer Spaniels in need of homes");

Now, the request is saved to a file for input to rdmgr:

SOIFOutputStream sos = new SOIFOutputStream("filename");

sos.write(req);

sos.write(data);

sos.close();

At this point, the output file should contain:

@REQUEST { -

    submit-csid{38}: x-catalog://nikki.boots.com:80/default

    submit-type{10}: persistent

    submit-operation{5}: merge

    submit-view{24}: title,author,description

}

@DOCUMENT { http://www.sesta.com/~jocelyn/resdogs.index.htm

    title{32}: Saving English Springer Spaniels

    author{14}: Jocelyn Becker

    description{42}: English Springer Spaniels in need of homes

}

When this input is processed by rdmgr, it will result in the RD shown being added to the database and indexed. The rdmgr utility supports other types of requests too:

Code Example 10-4  rdmgr Submit  

// submit header fields

String SUBMIT_CSID = "submit-csid";

String SUBMIT_TYPE = "submit-type";

String SUBMIT_OPER = "submit-operation";

String SUBMIT_VIEW = "submit-view";

String SUBMIT_DB = "submit-database";

String SUBMIT_MESSAGE = "message";

String SUBMIT_ERROR = "error";

// submit types

String SUBMIT_PERSISTENT = "persistent";

String SUBMIT_NONPERSISTENT = "nonpersistent";

String SUBMIT_MERGED = "merged";

// submit operations

String SUBMIT_RETRIEVE = "retrieve";

String SUBMIT_INSERT = "insert";

String SUBMIT_DELETE = "delete";

String SUBMIT_UPDATE = "update";

The submit operations are as follows:

retrieve

Retrieves the requested fields (the submit view) for the requested RDs. In this case the data is a list of RDs that can be specified by their URLs only. The server will return the requested fields for these RDs.

insert

Default operation. The server adds the RDs supplied as data.

delete

The server deletes the RDs. As with retrieve, it is sufficient to list the RDs by URL alone; it is not necessary to supply values for the fields of the RDs.

update

The server modifies the RDs in the database by merging any existing fields with the fields supplied in the data. If an attribute view list is supplied, only those attributes will be updated.
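
The effect of an update with and without a view list can be modeled with ordinary maps. The sketch below is a hypothetical model only: the real merge happens inside the search server, and UpdateSketch, update(), and the sample field values are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of an update; the real merge happens inside the search server. */
final class UpdateSketch {

    /**
     * Returns a copy of existing with the supplied fields merged in.
     * If view is non-null, only the attributes named in it are updated.
     */
    static Map<String, String> update(Map<String, String> existing,
                                      Map<String, String> supplied,
                                      String view) {
        Set<String> allowed = (view == null)
                ? null
                : new HashSet<>(Arrays.asList(view.split(",")));
        Map<String, String> result = new LinkedHashMap<>(existing);
        for (Map.Entry<String, String> e : supplied.entrySet()) {
            if (allowed == null || allowed.contains(e.getKey())) {
                result.put(e.getKey(), e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> existing = new LinkedHashMap<>();
        existing.put("title", "Old title");
        existing.put("author", "A. Writer");
        Map<String, String> supplied = new LinkedHashMap<>();
        supplied.put("title", "New title");
        supplied.put("author", "B. Editor");
        // Only title is in the view, so author keeps its existing value.
        System.out.println(update(existing, supplied, "title"));
    }
}
```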

The submit types are as follows:

persistent

Data is added to the persistent part of each RD in the database. When an RD is retrieved from the database, or indexed, any persistent fields take precedence over non-persistent fields. This allows you to manually edit the fields of an RD without having to worry that your edits will be lost the next time the RD is submitted, for example by the robot.

nonpersistent

This is the default type. Data is normally added as non-persistent data.

merged

This is the default for retrieval. When data is retrieved, the persistent and non-persistent fields are merged together, with the persistent fields taking precedence over the non-persistent fields. You can view this as the persistent fields ‘covering’ the non-persistent fields.
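
The way persistent fields cover non-persistent fields in the merged view can be pictured with two maps. This is a hypothetical model, not an SDK class; the names MergedViewSketch and merged() and the sample field values are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical model of the merged view; not an SDK class. */
final class MergedViewSketch {

    /** Persistent fields cover non-persistent fields that have the same name. */
    static Map<String, String> merged(Map<String, String> nonPersistent,
                                      Map<String, String> persistent) {
        Map<String, String> view = new LinkedHashMap<>(nonPersistent);
        view.putAll(persistent); // persistent values take precedence
        return view;
    }

    public static void main(String[] args) {
        Map<String, String> fromRobot = new LinkedHashMap<>();
        fromRobot.put("title", "index");
        fromRobot.put("description", "Generated description");
        Map<String, String> manualEdits = new LinkedHashMap<>();
        manualEdits.put("title", "Hand-edited title");
        System.out.println(merged(fromRobot, manualEdits));
    }
}
```

In this model the hand-edited title survives a later robot submission, because the robot only rewrites the non-persistent map.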





Copyright 2003 Sun Microsystems, Inc. All rights reserved.