Sun Java System Portal Server 6 2004Q2 Developer's Guide 

Chapter 27
Using Java To Add Entries to the Search Engine Database

The program rdmgr is used to add data to the database from the command line. This chapter describes how to create input data for rdmgr so that it can be added to the database.


rdmgr

The rdmgr utility can add new data as well as replace, modify, or retrieve existing data. All data input and output is done using SOIF, with UTF-8 character encoding for character fields. Note that SOIF also supports binary-valued fields and they can be added or retrieved too.

For more information on rdmgr, see the Portal Server Administration Guide.

In the simplest case, rdmgr can be used to add a file containing multiple SOIF Resource Descriptions (RDs) to the database. Create a SOIF file with the Search SDK, or by other means, and add the data with the command rdmgr soif_input_file. The rdmgr utility also accepts resource description submit requests as input. Submit requests also use SOIF format and consist of a request header followed by the normal body: the SOIF data to be added to, or retrieved from, the database.
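
The command rdmgr soif_input_file can also be launched from a Java program. The following is a minimal sketch, not part of the Search SDK: it assumes that rdmgr is on the PATH of the process (the install location is not given here) and that the SOIF file name is passed as the first argument. The class name RunRdmgr is hypothetical.

```java
import java.io.IOException;
import java.util.List;

// Hypothetical helper for illustration; assumes "rdmgr" can be found on the PATH.
final class RunRdmgr {
    // Builds the command line "rdmgr <soifFile>" described in the text.
    static List<String> command(String soifFile) {
        return List.of("rdmgr", soifFile);
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        String soifFile = args.length > 0 ? args[0] : "soif_input_file";
        Process p = new ProcessBuilder(command(soifFile))
                .inheritIO() // pass rdmgr's own output and errors through to this console
                .start();    // throws IOException if rdmgr cannot be started
        System.out.println("rdmgr exited with status " + p.waitFor());
    }
}
```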


SOIF Object

A SOIF object consists of a schema name (such as REQUEST or DOCUMENT), a URL, and a list of attribute-value pairs. The com.sun.portal.search.soif package in the Search Server Java SDK is used to build SOIF objects and write them to a file.
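
To make the structure concrete, here is a small stand-in model of a SOIF object: a schema name, a URL, and an ordered list of attribute-value pairs. It is a sketch for illustration only and is not the SDK's SOIF class. The class name SoifSketch is hypothetical, the layout imitates the listing later in this chapter, and the number in braces is assumed to be the length of the value in UTF-8 bytes, which agrees with that listing. The exact whitespace written by the real SDK may differ.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for illustration only; this is NOT com.sun.portal.search.soif.SOIF.
final class SoifSketch {
    private final String schemaName;
    private final String url;
    // LinkedHashMap keeps the attributes in the order they were inserted.
    private final Map<String, String> attributes = new LinkedHashMap<>();

    SoifSketch(String schemaName, String url) {
        this.schemaName = schemaName;
        this.url = url;
    }

    void insert(String name, String value) {
        attributes.put(name, value);
    }

    // Renders "@SCHEMA { url", then one "name{byteCount}: value" line per attribute, then "}".
    String render() {
        StringBuilder sb = new StringBuilder();
        sb.append('@').append(schemaName).append(" { ").append(url).append('\n');
        for (Map.Entry<String, String> e : attributes.entrySet()) {
            // Assumption: the size in braces is the UTF-8 byte length of the value.
            int size = e.getValue().getBytes(StandardCharsets.UTF_8).length;
            sb.append("    ").append(e.getKey()).append('{').append(size).append("}: ")
              .append(e.getValue()).append('\n');
        }
        return sb.append("}\n").toString();
    }

    public static void main(String[] args) {
        SoifSketch req = new SoifSketch("REQUEST", "-");
        req.insert("submit-operation", "update");
        System.out.print(req.render());
    }
}
```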


Constructing and Submitting a Request

You can use the SOIF classes to create an RD submit request for input to rdmgr.

Constructing a Request

Here is an example of constructing a submit request that can be used as input to rdmgr. Request headers do not have an associated URL and use "-" instead.

SOIF req = new SOIF("REQUEST", "-");

A submit request can have the following attributes:

submit-csid

submit-database

submit-type

submit-operation

submit-view

Add values for these attributes to the request header. This example shows an update operation on the default database. The database attribute is optional; the default database is used if none is supplied. The submit view restricts which attributes are updated; by default, all of the supplied input attributes are updated for each resource description.

req.insert("submit-database", "default");

req.insert("submit-type", "nonpersistent");

req.insert("submit-operation", "update");

req.insert("submit-view", "title,author,description");

Now we create the body part of the submit request. We’ll be updating the resource description of a document, whose URL is http://www.sesta.com/~jocelyn/resdogs.index.htm, whose title is “Saving English Springer Spaniels,” whose author is Jocelyn Becker, and whose description is “English Springer Spaniels in need of homes.”

SOIF data = new SOIF("DOCUMENT", "http://www.sesta.com/~jocelyn/resdogs.index.htm");

data.insert("title", "Saving English Springer Spaniels");

data.insert("author", "Jocelyn Becker");

data.insert("description", "English Springer Spaniels in need of homes");

Now, the request is saved to a file for input to rdmgr:

SOIFOutputStream sos = new SOIFOutputStream("soif_file");

sos.write(req);

sos.write(data);

sos.close();

At this point soif_file should contain:

@REQUEST { -

    submit-database{7}: default

    submit-type{13}: nonpersistent

    submit-operation{6}: update

    submit-view{24}: title,author,description

}

@DOCUMENT { http://www.sesta.com/~jocelyn/resdogs.index.htm

    title{32}: Saving English Springer Spaniels

    author{14}: Jocelyn Becker

    description{42}: English Springer Spaniels in need of homes

}

Submitting a Request

When rdmgr processes this input, the listed attributes of the RD are updated in the database and indexed. The rdmgr utility supports other types of requests too. The following constants list the submit header fields, operations, and types:

Code Example 27-1  rdmgr Submit  

// submit header fields

String SUBMIT_CSID = "submit-csid";

String SUBMIT_TYPE = "submit-type";

String SUBMIT_OPER = "submit-operation";

String SUBMIT_VIEW = "submit-view";

String SUBMIT_DB = "submit-database";

String SUBMIT_MESSAGE = "message";

String SUBMIT_ERROR = "error";

// submit operations

String SUBMIT_RETRIEVE = "retrieve";

String SUBMIT_INSERT = "insert";

String SUBMIT_DELETE = "delete";

String SUBMIT_UPDATE = "update";

// submit types

String SUBMIT_PERSISTENT = "persistent";

String SUBMIT_NONPERSISTENT = "nonpersistent";

String SUBMIT_MERGED = "merged";
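
Because the header values are plain strings, a misspelled value such as "non-persistent" (with a hyphen) is easy to write by mistake. The following sketch checks the operation and type against the values listed above before building a header. It is an illustration under stated assumptions, not SDK code: the class SubmitHeaderSketch and its header method are hypothetical, and the header is modelled as an ordered map rather than a SOIF object.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration; the values mirror the constants listed above.
final class SubmitHeaderSketch {
    static final Set<String> OPERATIONS = Set.of("retrieve", "insert", "delete", "update");
    static final Set<String> TYPES = Set.of("persistent", "nonpersistent", "merged");

    // Returns the header attributes in insertion order, rejecting unlisted values.
    static Map<String, String> header(String operation, String type) {
        if (!OPERATIONS.contains(operation)) {
            throw new IllegalArgumentException("unknown submit-operation: " + operation);
        }
        if (!TYPES.contains(type)) {
            throw new IllegalArgumentException("unknown submit-type: " + type);
        }
        Map<String, String> h = new LinkedHashMap<>();
        h.put("submit-type", type);
        h.put("submit-operation", operation);
        return h;
    }

    public static void main(String[] args) {
        System.out.println(header("retrieve", "merged"));
    }
}
```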

Submit Operations

The submit operations are as follows:

retrieve

Retrieves the requested fields (the submit view) for the requested RDs. In this case the data is a list of RDs that can be specified by their URLs only. The server will return the requested fields for these RDs.

insert

The server adds the supplied RDs to the database, or replaces them if they already exist.

delete

The server deletes the RDs. As with retrieve, it is sufficient to list the RDs by URL alone; it is not necessary to supply values for the fields of the RDs.

update

The server modifies the RDs in the database by merging any existing fields with the fields supplied in the data. If an attribute view list is supplied, only those attributes will be updated. If a view is not supplied, all of the given input attributes will be updated for each RD.
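
The effect of the submit view on an update can be modelled with ordinary maps. The sketch below only illustrates the rule described for update and is not the server's implementation. The class UpdateSketch is hypothetical, an RD is reduced to a map of field names to values, and a null view stands for "no view supplied".

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the update rule; not the server's implementation.
final class UpdateSketch {
    // Returns a copy of 'existing' with the supplied fields merged in.
    // A null view means no view was supplied, so every supplied field is applied.
    static Map<String, String> update(Map<String, String> existing,
                                      Map<String, String> supplied,
                                      Set<String> view) {
        Map<String, String> result = new LinkedHashMap<>(existing);
        for (Map.Entry<String, String> e : supplied.entrySet()) {
            if (view == null || view.contains(e.getKey())) {
                result.put(e.getKey(), e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> existing = new LinkedHashMap<>();
        existing.put("title", "Old title");
        existing.put("author", "Jocelyn Becker");

        Map<String, String> supplied = new LinkedHashMap<>();
        supplied.put("title", "Saving English Springer Spaniels");
        supplied.put("description", "English Springer Spaniels in need of homes");

        // With a view of "title", only the title changes; the description is ignored.
        System.out.println(update(existing, supplied, Set.of("title")));
        // Without a view, both supplied fields are applied and the author is kept.
        System.out.println(update(existing, supplied, null));
    }
}
```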

Submit Types

The submit types are as follows:

persistent

The operation is applied to the persistent part of each RD in the database. When an RD is retrieved from the database, or indexed, any persistent fields take precedence over non-persistent fields. This allows you to manually edit the fields of an RD without worrying that your edits will be lost the next time the RD is submitted, for example by the robot.

nonpersistent

This is the default type. Data is normally added as non-persistent data. Note that RDs are only indexed and searchable if they have a non-empty non-persistent component.

merged

This is the default for retrieval. When data is retrieved, the persistent and non-persistent fields are merged, with the persistent fields taking precedence over the non-persistent fields. You can view this as the persistent fields ‘covering’ the non-persistent fields. You can also retrieve just the persistent fields, or just the non-persistent fields.
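
The ‘covering’ behavior of a merged retrieval can be pictured with two maps, one per part of the RD. This is a sketch of the precedence rule only, not of how the database stores or merges RDs. The class MergedViewSketch is hypothetical and the field values are made-up sample data.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of the merged view; not the database's implementation.
final class MergedViewSketch {
    // Starts from the non-persistent fields, then lets persistent fields cover them.
    static Map<String, String> merged(Map<String, String> nonPersistent,
                                      Map<String, String> persistent) {
        Map<String, String> view = new LinkedHashMap<>(nonPersistent);
        view.putAll(persistent); // a persistent value replaces a non-persistent one of the same name
        return view;
    }

    public static void main(String[] args) {
        // Sample values: what a robot might have submitted (non-persistent part).
        Map<String, String> nonPersistent = new LinkedHashMap<>();
        nonPersistent.put("title", "Untitled page");
        nonPersistent.put("author", "Jocelyn Becker");

        // Sample values: a manual edit stored in the persistent part.
        Map<String, String> persistent = new LinkedHashMap<>();
        persistent.put("title", "Saving English Springer Spaniels");

        // The edited title covers the robot's title; the author shows through.
        System.out.println(merged(nonPersistent, persistent));
    }
}
```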





Copyright 2004 Sun Microsystems, Inc. All rights reserved.