Sun ONE Portal Server Developer's Guide
Chapter 8 Using Java to Access the Search Engine Database
This chapter describes how to submit queries and add entries to the search engine database by using the Java programming language.
To do so, you need to use the search engine Java™ Software Development Kit (SDK), which is available at:
BaseDir/SUNWps/sdk/search
The Java SDK includes the Java classes needed for interacting with the search engine database and a sample search application that can be run as an applet or as a standalone program.
The Search Engine Java SDK
The classes in the search engine Java SDK provide a Java interface for interacting with the search server to access the search engine database.
Using the Java SDK, you can build Java programs, applets, and your own interfaces that submit searches to the search engine.
The search engine database contains resource descriptions (RDs) for the indexed resources, such as documents. Each resource description is stored in Summary Object Interchange Format (SOIF).
For a discussion of SOIF, see:
http://www.w3.org/TR/NOTE-rdm.html#soif.
Running the Sample Applications
The search engine Java SDK contains sample Java and HTML files. SearchDemo.java is an example Java program that can be run as an applet or as a standalone program. To run the applet version, use the JDK appletviewer or a browser with the Java 1.2 plugin. You must also edit SearchDemo.html to pass the information about your search engine to the SearchDemo applet. The command-line version takes the server information as a command-line argument.
Installing and Running the Search Demo Command Line Program
- Compile the SearchDemo.java file. Make sure the class path includes the SDK jar file, searchsdk.jar. For example, enter:
javac -classpath searchsdk.jar SearchDemo.java
- To use the compiled classes directly, create a directory hierarchy that reflects the Java package structure of these classes, and copy the classes into the demo package directory of this hierarchy.
For example, enter:
mkdir -p com/sun/portal/search/demo; cp *.class com/sun/portal/search/demo
- Invoke the search demo from the command line, supplying the URL of the search server and the query string as arguments.
For example, enter:
java -classpath .:searchsdk.jar com.sun.portal.search.demo.SearchDemo http://portal_server_host_name:port/portal/search 'search query'
The codebase parameter indicates the directory containing the main class and the ancillary classes for the applet. You can specify this in the form:
"http://host_name:compass_server_port/java"
Installing and Running the Search Demo Applet
To run the search demo as an applet, use the JDK appletviewer or a browser with the Java 1.3.1 plugin. You must also edit SearchDemo.html to specify the location of your search engine.
- Compile the SearchDemo.java file. Make sure the class path includes the SDK jar file, searchsdk.jar.
For example, enter:
javac -classpath searchsdk.jar SearchDemo.java
- To use the compiled classes directly, create a directory hierarchy that reflects the Java package structure of these classes, and copy the classes into the demo package directory of this hierarchy.
For example, enter:
mkdir -p com/sun/portal/search/demo; cp *.class com/sun/portal/search/demo
- Enter the server location information by editing the applet parameters in SearchDemo.html.
For example, enter:
<param name="RDMServer" value="http://portal_server_host_name:port/portal/search">
You can now run appletviewer from the command line. It loads the Java classes from the current directory. For example, enter:
appletviewer SearchDemo.html
Enter search queries in the search box that appears; the results are sent to the standard output of the controlling terminal. If you are running the applet in a browser, the results are displayed in the Java console.
If you have installed the browser Java plugin, the instructions are similar to those for appletviewer, except that you must make the classes available for download to the client browser. To do this:
- Copy SearchDemo.html, searchsdk.jar and the directory hierarchy created above (including the SearchDemo classes) to a location that is in the content path of your web server.
You can also add the search demo classes to searchsdk.jar to make the applet available in a single download.
For example, enter:
jar uf searchsdk.jar com/sun/portal/search/demo/*.class
- The searchsdk.jar file must be named in the archive attribute of the applet tag in SearchDemo.html.
Using Java To Access the Search Server Database
You can use the search engine Java SDK to write Java programs that interface with the sendrdm program to retrieve information from the search engine database.
The main steps are:
- Creating a Search object
- Executing a query and getting the results
Creating a Search Object
The entry point for submitting searches is the Search class. You create a new Search object, then call doQuery() on it to execute the search query. The full constructor syntax is:
public Search(
String scope,
String viewAttributes,
String viewOrder,
int firstHit,
int viewHits,
String queryLanguage,
String database,
String RDMServer,
String ssoToken
)
The arguments for the constructor are:
String scope
The query string or scope, that is, the string being searched for.
String viewAttributes
A comma-delimited list of the SOIF attributes to be retrieved from the database, such as URL, Author, Description. For example, score,url,title,description,classification.
String viewOrder
The order by which to sort the results. This is a comma-delimited list of attributes. Use the minus sign to indicate descending order, and a plus sign to indicate ascending order. For example, -score,+title.
int firstHit
The hit number of the first result to return. A typical value is 1.
int viewHits
Maximum number of results to return. A typical value is 10.
String queryLanguage
The Search Server query language. You should use "search" for a normal query.
String database
The logical name of a database (or collection) you wish to search. A typical value is null, which searches the server's default database.
String RDMServer
The URL of the search engine servlet. This argument has the form:
http://portal_server_host.domain.com:port/portal/search
For example, if the search server is installed on www.yourcompany.com on port 80, the value would be:
http://www.yourcompany.com:80/portal/search
String ssoToken
An iPlanet Directory Server Access Management Edition single sign-on (SSO) token, used when performing secure searches.
There is also a simpler convenience constructor with the following syntax:
public Search(String scope, String RDMServer)
When this constructor is used, the following values are used for the unspecified arguments:
viewAttributes: null. Return all attributes.
viewOrder: null. Use the server default sort order (sorted by relevance).
firstHit: 1. Start with hit number 1.
viewHits: 10. Return at most 10 hits.
queryLanguage: search. Search for documents using the normal query language.
database: null. Search the server's default database.
ssoToken: null. Use anonymous search.
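The viewOrder syntax described above (comma-delimited attributes, minus for descending, plus for ascending) can be illustrated with a small client-side comparator. This is a hypothetical sketch, not SDK code: the search server performs the real sort, and the class name ViewOrderSketch, its helper hit(), and the rule of comparing values numerically when both parse as numbers are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical client-side illustration of the viewOrder syntax, for example
 * "-score,+title": comma-delimited attribute names, each optionally prefixed
 * with '-' (descending) or '+' (ascending; also the assumed default).
 * The search server performs the real sort; this is not SDK code.
 */
final class ViewOrderSketch {

    /** Builds a comparator over result records (attribute name -> value). */
    static Comparator<Map<String, String>> comparatorFor(String viewOrder) {
        Comparator<Map<String, String>> order = (a, b) -> 0; // no keys yet: all equal
        for (String raw : viewOrder.split(",")) {
            String key = raw.trim();
            if (key.isEmpty()) {
                continue;
            }
            boolean descending = key.startsWith("-");
            if (descending || key.startsWith("+")) {
                key = key.substring(1);
            }
            final String attr = key;
            Comparator<Map<String, String>> byAttr =
                    (a, b) -> compareValues(a.get(attr), b.get(attr));
            Comparator<Map<String, String>> keyOrder = descending ? byAttr.reversed() : byAttr;
            order = order.thenComparing(keyOrder); // later keys only break ties
        }
        return order;
    }

    /** Numeric comparison when both values parse as numbers, else case-insensitive text. */
    static int compareValues(String x, String y) {
        if (x == null || y == null) {
            // Assumption: a missing value compares as smaller than a present one.
            return x == null ? (y == null ? 0 : -1) : 1;
        }
        try {
            return Double.compare(Double.parseDouble(x), Double.parseDouble(y));
        } catch (NumberFormatException notNumeric) {
            return x.compareToIgnoreCase(y);
        }
    }

    /** Convenience factory for a two-attribute result record. */
    static Map<String, String> hit(String score, String title) {
        Map<String, String> record = new HashMap<>();
        record.put("score", score);
        record.put("title", title);
        return record;
    }

    public static void main(String[] args) {
        List<Map<String, String>> hits = new ArrayList<>();
        hits.add(hit("0.50", "Beagles"));
        hits.add(hit("0.90", "Spaniels"));
        hits.add(hit("0.50", "Akitas"));
        hits.sort(comparatorFor("-score,+title"));
        for (Map<String, String> h : hits) {
            System.out.println(h.get("score") + " " + h.get("title"));
        }
    }
}
```

With "-score,+title", the 0.90 record sorts first and the two 0.50 records follow in title order (Akitas, then Beagles); the title key only matters when scores tie.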
Executing A Query and Getting the Results
You submit a query by calling the doQuery() method.
public void doQuery()
After Search.doQuery() returns, the results can be obtained as a SOIF stream by calling Search.getResultStream(). The next search replaces the previous result stream reference, so you must process the results, or save a reference to the result stream, after each query. There are also methods for checking the number of results.
public SOIFInputStream getResultStream()
The function getResultStream() returns a SOIFInputStream which is used to read the SOIF hit objects. Each SOIF object read from the stream corresponds to one result.
public int getHitCount()
The function Search.getHitCount() returns the number of hits that matched the query.
public int getResultCount()
The function Search.getResultCount() returns the number of results that were returned by the server. The result count will be equal to the number requested by the viewHits argument whenever there are enough results available.
public int getDocumentCount()
The function Search.getDocumentCount() returns the total number of documents searched across. This will usually equal the total number of documents in the searched database.
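The relationship between these counts can be summarized with a little arithmetic. The hypothetical HitRangeSketch class below is not part of the SDK; it assumes the server returns min(viewHits, hitCount - firstHit + 1) results (never fewer than zero), which is one reading of "whenever there are enough results available", and it computes the last hit number with the same formula the demo uses when printing its summary.

```java
/**
 * Hypothetical helper relating the counts a Search reports. It assumes the
 * server returns min(viewHits, hitCount - firstHit + 1) results, never fewer
 * than zero. Not part of the search SDK.
 */
final class HitRangeSketch {

    /** Assumed number of results returned for one request. */
    static int expectedResultCount(int firstHit, int viewHits, int hitCount) {
        int remaining = hitCount - firstHit + 1; // matches at or after firstHit
        return Math.max(0, Math.min(viewHits, remaining));
    }

    /** Number of the last hit returned: firstHit + resultCount - 1. */
    static int lastHit(int firstHit, int resultCount) {
        return firstHit + resultCount - 1;
    }

    public static void main(String[] args) {
        int hitCount = 45; // sample value: matches for some query
        int viewHits = 20; // page size
        for (int first = 1; first <= hitCount; first += viewHits) {
            int count = expectedResultCount(first, viewHits, hitCount);
            System.out.println("hits " + first + " to " + lastHit(first, count)
                    + " out of " + hitCount);
        }
    }
}
```

With 45 matches and viewHits = 20, this prints the ranges 1 to 20, 21 to 40, and 41 to 45.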
Working Through An Example Search Application
This section discusses the SearchDemo example application provided with the Java search SDK. The purpose of this example is to show how to use a Search object to submit a query to the search server and how to extract the results from the Search object. The example application is very simple, and limits its use of Java to what is needed to achieve these goals. It creates a Java applet that presents the user with a text field in which to enter a search query, and a "Search" button to initiate the search. The results of the query are read from a SOIFInputStream returned by the Search object and displayed on standard output as plain text.
Import the Necessary Classes
In your favorite editor or Java development environment, view the search SDK file SearchDemo.java. This demo runs as a standalone application or as an applet. The soif package is provided as part of the Search Server Java SDK, while java.applet, java.awt and java.io are standard Java packages.
package com.sun.portal.search.demo;
import com.sun.portal.search.soif.*;
import java.applet.Applet;
import java.awt.*;
import java.io.*;
Define the SearchDemo Class
The class SearchDemo is an applet, so it extends the class Applet. SearchDemo defines init() and main() methods, which allow it to run as an applet or as a standalone (command-line) program.
There is a helper class called SearchPanel that handles the applet GUI. It sets up a search panel with a text box to enter a query and a submit button to run the query. See the source file for more details.
Define the SimpleSearch Class
Notice the private helper class SimpleSearch. This is where the search is set up and executed, and we will look at it in more detail here. The applet/command-line class SearchDemo sets up the arguments for SimpleSearch using either applet or command-line parameters. It then calls the SimpleSearch.doSearch(String scope) method to execute the search and display the results. The SimpleSearch constructor takes the location of the search server as an argument. In this way, a single SimpleSearch object can be used repeatedly to run searches against the remote search server.
The SimpleSearch.setSOIFfile(String filename) method is used by the main program to direct search results to a file when running in command line mode.
Code Example 8-2    SimpleSearch class
/** Performs a simple search and displays its results. */
class SimpleSearch {
String RDMServer;
String SOIFOutputFile;
/**
* SimpleSearch constructor
* @param rdm - the rdm search server, eg, http://portal.siroe.com:2222/portal/search
*/
public SimpleSearch(String rdm) {
System.out.println("Sun ONE Search Java Demo");
RDMServer = rdm;
}
/**
* @param filename - a file to dump raw SOIF results into - only
* use if running from the command line or an applet with file
* system access
*/
public void setSOIFfile(String filename) {
SOIFOutputFile = filename;
}
/** Execute a search */
public void doSearch(String scope) throws IOException {
// ... see Code Example 8-3 ...
}
}
Before submitting the search, SimpleSearch needs to create a Search object. The constructor for the Search class takes several arguments as discussed previously.
The Search constructor arguments used here are:
- scope - the search scope is the actual query run by the search server. It is the scope argument to doSearch() and ultimately derives from either the applet input panel or a command line argument to the main program.
- viewAttributes = "score,url,title,description" - the requested attribute set shown here will result in the server returning the score, url, title, and description of all documents that match the query.
- viewOrder = "-score" - a comma-delimited list of the attributes used to sort the results. A minus sign indicates descending order, a plus sign indicates ascending order. In this case, the results are sorted by decreasing numerical score value. To also use alphabetical order of the title as the secondary sort order, the value would be "-score,+title".
- firstHit = 1 - the hit number of the first returned result.
- viewHits = 20 - the maximum number of results to return.
- queryLanguage = "search" - the search server query language. Use "search" for normal searches.
- database = null - the database to search, in this case, the default database.
- RDMServer - the URL of the remote search engine, specified as an argument to the SimpleSearch constructor.
- ssoToken = null - Sun ONE Identity Server single sign-on token. Not used in this case, implying anonymous search.
Execute the Search Query
An output stream is created to hold the search results. The example then paginates through the results for a fixed number of pages, in this case 5 pages in total, where each page contains viewHits (20) results. The first page starts with the first hit (firstHit=1). The search is executed again for each page of results. It is possible, of course, to fetch the results for all pages with a single search, but it is often easier to resubmit the search for each page. This is equivalent to a user clicking a Next button in a search user interface.
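The paging scheme above only changes firstHit between requests. The hypothetical PagingSketch class below shows that arithmetic for 5 pages of 20 results; submitting the query itself is left out because it requires the SDK, so the comment marks where the query would be resubmitted (for example, by constructing a Search with the new firstHit and calling doQuery()).

```java
/**
 * Hypothetical sketch of the paging arithmetic described for SimpleSearch:
 * the same query is resubmitted once per page, and only firstHit changes.
 * Page count (5) and page size (viewHits = 20) follow the text.
 */
final class PagingSketch {

    /** firstHit for a 1-based page number, so that page 1 starts at hit 1. */
    static int firstHitForPage(int pageNum, int viewHits) {
        if (pageNum < 1 || viewHits < 1) {
            throw new IllegalArgumentException("pageNum and viewHits must be >= 1");
        }
        return 1 + (pageNum - 1) * viewHits;
    }

    public static void main(String[] args) {
        final int pages = 5;
        final int viewHits = 20;
        for (int pageNum = 1; pageNum <= pages; pageNum++) {
            int firstHit = firstHitForPage(pageNum, viewHits);
            // The query would be resubmitted here with this firstHit;
            // this sketch only prints the range each request asks for.
            System.out.println("page " + pageNum + ": request hits " + firstHit
                    + " to " + (firstHit + viewHits - 1));
        }
    }
}
```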
The results are stored in the Search object. Now do something with the results. The functions doSomethingWithResults() and displayHTMLResults() will be defined in this file. They each show a different way of extracting the results from the Search object.
Display the Results
The example application displays the query results to standard output or to a named file. In reality, you would do more with the results than just print them like this, but once you know how to get the results out of the Search object, it is up to you what you do with them. You can use standard Java functionality to process the results in any way you like.
The Search object has a method called getResultStream() that returns a SOIFInputStream object. Each result is read from this SOIF stream in turn. Note that the client-server connection uses an efficient streamed protocol; the server may still be returning later results while the client is processing the first ones. For each SOIF object read from the result stream, you can use the getValue() method to get the value of a particular field; for example, getValue("title") gets the title of a SOIF object.
First, print out some general result information:
System.out.println("=========================================");
System.out.println("page " + pagenum
+ ": hits " + search.getFirstHit()
+ " to " + (search.getFirstHit() + search.getResultCount() - 1)
+ " out of " + search.getHitCount()
+ " across " + search.getDocumentCount() + " documents");
System.out.println("=========================================");
System.out.println();
Now, retrieve each search hit from the result stream as SOIF objects and print its URL, title, description, and score to the output stream (either the Java console, standard output, or a named output file).
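As a rough, self-contained stand-in for that loop, the hypothetical MiniSoifReader class below parses SOIF text in the layout used by the rdmgr example later in this chapter and prints the URL, title, description, and score of each object. It is not the SDK's SOIFInputStream: it assumes UTF-8, single-valued attributes, and a value size given in bytes, and its getValue() only mimics the lookup style described above. With the SDK, you would read SOIF objects from search.getResultStream() instead. The sample hit in main() is made-up data.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical minimal reader for the SOIF text layout:
 * "@SCHEMA { url", then "attr{size}: value" entries, then "}".
 * Values are read by byte count, not by line. NOT the SDK's SOIFInputStream.
 */
final class MiniSoifReader {

    /** One parsed SOIF object: schema name, URL, and attribute values. */
    static final class Soif {
        final String schema;
        final String url;
        final Map<String, String> attrs = new LinkedHashMap<>();

        Soif(String schema, String url) {
            this.schema = schema;
            this.url = url;
        }

        /** Lookup in the style of getValue("title"); null if absent. */
        String getValue(String attr) {
            return attrs.get(attr);
        }
    }

    static List<Soif> parse(byte[] data) {
        List<Soif> objects = new ArrayList<>();
        Soif current = null;
        int p = 0;
        while (p < data.length) {
            int eol = indexOf(data, (byte) '\n', p);
            if (eol < 0) eol = data.length;
            String line = new String(data, p, eol - p, StandardCharsets.UTF_8).trim();
            if (line.startsWith("@") && line.indexOf('{') > 0) {
                int brace = line.indexOf('{'); // "@DOCUMENT { http://host/page"
                current = new Soif(line.substring(1, brace).trim(), line.substring(brace + 1).trim());
                p = eol + 1;
            } else if (line.equals("}")) {
                if (current != null) objects.add(current); // end of object
                current = null;
                p = eol + 1;
            } else {
                int open = indexOf(data, (byte) '{', p);
                int close = open < 0 ? -1 : indexOf(data, (byte) '}', open + 1);
                if (current == null || open < 0 || open > eol || close < 0 || close > eol) {
                    p = eol + 1; // blank or unrecognized line
                    continue;
                }
                String name = new String(data, p, open - p, StandardCharsets.UTF_8).trim();
                int size = Integer.parseInt(
                        new String(data, open + 1, close - open - 1, StandardCharsets.UTF_8).trim());
                int valueStart = close + 3; // skip "}: "
                if (valueStart + size > data.length) {
                    throw new IllegalArgumentException("truncated value for " + name);
                }
                current.attrs.put(name, new String(data, valueStart, size, StandardCharsets.UTF_8));
                p = valueStart + size;
                if (p < data.length && data[p] == '\n') p++; // newline after the value
            }
        }
        return objects;
    }

    private static int indexOf(byte[] data, byte b, int from) {
        for (int i = from; i < data.length; i++) {
            if (data[i] == b) return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        String text = "@DOCUMENT { http://www.example.com/a.html\n"
                + "score{4}: 0.87\n"
                + "title{8}: Spaniels\n"
                + "description{12}: Dogs in need\n"
                + "}\n";
        for (Soif hit : parse(text.getBytes(StandardCharsets.UTF_8))) {
            System.out.println(hit.url + " | " + hit.getValue("title") + " | "
                    + hit.getValue("description") + " | " + hit.getValue("score"));
        }
    }
}
```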
Using Java To Add Entries to the Search Engine Database
The program rdmgr is used to add data to the database from the command line. This section describes how to create input data for rdmgr so that it can be added to the database. The rdmgr utility can add new data as well as replace, modify, or retrieve existing data. All data input and output is done using SOIF, with UTF-8 character encoding for character fields. Note that SOIF also supports binary-valued fields and they can be added or retrieved too.
For more information on rdmgr, see Chapter 12 of the Sun ONE Portal Server 6.0 Administrator's Guide.
In the simplest case, rdmgr can be used to add a file containing multiple SOIF objects to the database. This is as simple as creating a SOIF file and adding the data with the command rdmgr soif_input_file. The search robot calls rdmgr in this manner to index data it collects from its crawling runs.
In the general case, though, rdmgr accepts a complete resource description (RD) submit request as input. The RD submit input must be in SOIF format, with a request header and a body consisting of the SOIF data to be added to or retrieved from the database.
A SOIF object consists of a schema name (such as @REQUEST or @DOCUMENT), a URL, and a list of attribute-value pairs. The com.sun.portal.search.soif package in the Search Server Java SDK is used to build SOIF objects and write them to a file. You can use the SOIF classes to create an RD submit request for input to rdmgr.
Here is an example of constructing a request that can be used as a second argument to rdmgr:
SOIF req = new SOIF("REQUEST", "-");
Write the header part of the RDM to send to the database. SOIF objects of type @REQUEST do not have an associated URL, so "-" is passed in its place. An update request to the search engine has the following attribute-value pairs:
submit-csid
submit-type
submit-operation
submit-view
Add values for each of these attributes to the request header:
req.insert("submit-csid", "x-catalog://nikki.boots.com:80/default");
req.insert("submit-type", "persistent");
req.insert("submit-operation", "merge");
req.insert("submit-view", "title,author,description");
Now we create the body part of the submit request. We'll be saving a resource description for a document, whose URL is http://www.best.com/~jocelyn/resdogs.index.htm, whose title is "Saving English Springer Spaniels," whose author is Jocelyn Becker, and whose description is "English Springer Spaniels in need of homes."
SOIF data = new SOIF("DOCUMENT", "http://www.best.com/~jocelyn/resdogs.index.htm");
data.insert("title", "Saving English Springer Spaniels");
data.insert("author", "Jocelyn Becker");
data.insert("description", "English Springer Spaniels in need of homes");
Now, the request is saved to a file for input to rdmgr:
SOIFOutputStream sos = new SOIFOutputStream("soif_file");
sos.write(req);
sos.write(data);
sos.close();
At this point soif_file should contain:
@REQUEST { -
submit-csid{38}: x-catalog://nikki.boots.com:80/default
submit-type{10}: persistent
submit-operation{5}: merge
submit-view{24}: title,author,description
}
@DOCUMENT { http://www.best.com/~jocelyn/resdogs.index.htm
title{32}: Saving English Springer Spaniels
author{14}: Jocelyn Becker
description{42}: English Springer Spaniels in need of homes
}
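In SOIF text, each attribute is written as name{size}: value, where size is the length of the value in bytes. The hypothetical MiniSoifWriter class below is a sketch of that layout only: it is not the SDK's SOIFOutputStream, it assumes UTF-8 and single-valued attributes, and real rdmgr input should still be produced with the SDK classes. Its main method prints the request and document from this example with sizes computed this way.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch of the SOIF text layout: "@SCHEMA { url", one
 * "name{size}: value" line per attribute (size = UTF-8 byte length of
 * the value), then "}". Not the SDK's SOIFOutputStream.
 */
final class MiniSoifWriter {

    static String format(String schema, String url, Map<String, String> attrs) {
        StringBuilder sb = new StringBuilder();
        sb.append('@').append(schema).append(" { ").append(url).append('\n');
        for (Map.Entry<String, String> e : attrs.entrySet()) {
            int size = e.getValue().getBytes(StandardCharsets.UTF_8).length; // bytes, not chars
            sb.append(e.getKey()).append('{').append(size).append("}: ")
              .append(e.getValue()).append('\n');
        }
        sb.append("}\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> req = new LinkedHashMap<>(); // keeps insertion order
        req.put("submit-csid", "x-catalog://nikki.boots.com:80/default");
        req.put("submit-type", "persistent");
        req.put("submit-operation", "merge");
        req.put("submit-view", "title,author,description");

        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("title", "Saving English Springer Spaniels");
        doc.put("author", "Jocelyn Becker");
        doc.put("description", "English Springer Spaniels in need of homes");

        System.out.print(format("REQUEST", "-", req));
        System.out.print(format("DOCUMENT", "http://www.best.com/~jocelyn/resdogs.index.htm", doc));
    }
}
```

Counting bytes rather than characters matters for non-ASCII values: a title such as "Café" has 4 characters but a size of 5 in UTF-8.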
When rdmgr processes this input, the RD shown is added to the database and indexed. The rdmgr utility also supports other types of requests. The submit operations are as follows:
- retrieve - Retrieves the requested fields (the submit view) for the requested RDs. In this case, the data is a list of RDs that can be specified by their URLs only. The server returns the requested fields for these RDs.
- insert - Default operation. The server adds the RDs supplied as data.
- delete - The server deletes the RDs. As with retrieve, it is sufficient to list the RDs by URL alone; it is not necessary to supply values for the fields of the RDs.
- update - The server modifies the RDs in the database by merging any existing fields with the fields supplied in the data. If an attribute view list is supplied, only those attributes are updated.
The submit types are as follows:
- persistent - Data is added to the persistent part of each RD in the database. When an RD is retrieved from the database, or indexed, any persistent fields take precedence over non-persistent fields. This allows you to manually edit the fields of an RD without having to worry that your edits will be lost the next time the RD is submitted by the robot, for example.
- non-persistent - This is the default type. Data is normally added as non-persistent data.
- merged - This is the default for retrieval. When data is retrieved, the persistent and non-persistent fields are merged together, with the persistent fields taking precedence over the non-persistent fields. You can view this as the persistent fields "covering" the non-persistent fields.
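The "covering" rule for merged retrieval can be modeled as a map overlay in which persistent values win on conflicts. The hypothetical MergedViewSketch class below illustrates only that precedence rule; it is not how the search server stores or merges RDs internally, and the sample field values are made up.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical illustration of the "merged" view: persistent fields cover
 * (take precedence over) non-persistent fields with the same name.
 * Models the rule in the text only; not the server's implementation.
 */
final class MergedViewSketch {

    static Map<String, String> merged(Map<String, String> nonPersistent,
                                      Map<String, String> persistent) {
        Map<String, String> view = new LinkedHashMap<>(nonPersistent); // copy; inputs untouched
        view.putAll(persistent); // persistent values replace same-named fields
        return view;
    }

    public static void main(String[] args) {
        Map<String, String> robot = new LinkedHashMap<>();
        robot.put("title", "index.htm");       // as collected by the robot
        robot.put("author", "Jocelyn Becker");

        Map<String, String> manual = new LinkedHashMap<>();
        manual.put("title", "Saving English Springer Spaniels"); // hand-edited, persistent

        System.out.println(merged(robot, manual));
    }
}
```

Because the robot's later submissions only touch the non-persistent fields in this model, the hand-edited title keeps covering the robot's value on every retrieval.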