Sun Java System Portal Server 7.1 Developer's Guide

Chapter 23 Overview of the Search API

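This chapter contains the following sections:

Introduction to the Search API
What is Search?
Using the Search API
An Introductory Example
Getting Search Server Database Contents as a SearchStream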

Introduction to the Search API

Resource descriptions in the search engine database are expressed in the Search format, as are Resource Description Messages (RDMs), which processes can use to exchange resource descriptions across a network.

The Search API provides routines for creating and modifying Search objects in C.

The Search API is defined in the search.h file, in the following directory in your search engine installation directory:

PortalServer-base/sdk/rdm/include

This chapter discusses only the C functions that come with the search engine Search API, so a basic understanding of the C programming language is strongly recommended.


Note –

To support all languages correctly, all Search data should use the UTF-8 character set. UTF-8 is fully backward compatible with 7-bit ASCII, so existing 7-bit ASCII Search data is already valid UTF-8.


What is Search?

In this API, Search refers to the Summary Object Interchange Format (SOIF), a general-purpose syntax that can be used in many situations. In particular, the search engine uses it to describe the resource descriptions (RDs) in its database.

The Search format is a basic attribute-value format. Search files look like text but should be treated as binary data and edited with care: they contain tabs, and many editors convert tabs to spaces, which corrupts the file. You can use the Search API functions to create and modify Search objects so that you do not have to write and edit them by hand.
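To see why tab conversion is harmful, note that each attribute line carries the length of its value in bytes inside braces (for example, author{14}), followed by a colon and a tab. The following minimal sketch parses one such line. The function parse_avpair() is hypothetical and is not part of the Search API; it assumes the layout name{N}:<TAB>value and copies exactly N bytes of value, so a value may itself contain newlines.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of the Search API. Parses one attribute line
 *     name{N}:<TAB>value
 * copying the name into name_out and exactly N bytes of value into value_out
 * (both NUL-terminated). Returns a pointer just past the value, or NULL if the
 * line is malformed or a buffer is too small. */
const char *parse_avpair(const char *p, char *name_out, size_t name_cap,
                         char *value_out, size_t value_cap)
{
    assert(p != NULL && name_cap > 0 && value_cap > 0);

    /* The attribute name runs up to the opening brace. */
    const char *brace = strchr(p, '{');
    if (brace == NULL || (size_t)(brace - p) >= name_cap)
        return NULL;
    memcpy(name_out, p, (size_t)(brace - p));
    name_out[brace - p] = '\0';

    /* The braces hold the length of the value in bytes. */
    if (!isdigit((unsigned char)brace[1]))
        return NULL;
    char *end;
    unsigned long n = strtoul(brace + 1, &end, 10);

    /* A space in place of the tab (for example, after an editor converted
     * tabs to spaces) makes the line malformed. */
    if (end[0] != '}' || end[1] != ':' || end[2] != '\t')
        return NULL;

    /* Copy exactly n bytes; the value may contain newlines or tabs. */
    const char *value = end + 3;
    if (n >= value_cap || strlen(value) < n)
        return NULL;
    memcpy(value_out, value, n);
    value_out[n] = '\0';
    return value + n;
}
```

If the tab after the colon has been turned into a space, parse_avpair() returns NULL, which is one way the corruption described above shows up in practice.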

The following sample Search describes a document whose title is “Rescuing English Springer Spaniels”, whose author is “Jocelyn Becker”, and whose URL is http://www.siroe.com/~jocelyn/resdogs/index.html:

@DOCUMENT { http://www.siroe.com/~jocelyn/resdogs/index.html
    title{34}: Rescuing English Springer Spaniels
    author{14}: Jocelyn Becker
}

Each Search object has a schema name (or template type) and an associated URL, and it contains a list of attribute-value pairs. In this case, the schema name is @DOCUMENT, which indicates that the resource is a document. title and author are attribute names, each followed by its value; the number in braces is the length of the value in bytes.
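The following minimal sketch writes a record in the layout of the sample above into a memory buffer, computing each {N} with strlen() so that it is a byte count (for UTF-8 text this can exceed the number of characters). The type demo_pair and the function format_search() are hypothetical illustrations, not the Search API; the sketch also assumes that attribute lines start at the beginning of the line with a tab after the colon, rather than the indentation used for display in the sample.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical type for illustration only; not the Search API's structure. */
struct demo_pair {
    const char *name;
    const char *value;
};

/* Writes "@schema { url", one "name{N}:<TAB>value" line per pair, and a
 * closing brace into buf (capacity cap). Each N is strlen(value), i.e. a
 * byte count. Returns the number of bytes written, or -1 if buf is too small. */
int format_search(char *buf, size_t cap, const char *schema, const char *url,
                  const struct demo_pair *pairs, size_t npairs)
{
    assert(buf != NULL && cap > 0);
    size_t used;
    int n = snprintf(buf, cap, "@%s { %s\n", schema, url);
    if (n < 0 || (size_t)n >= cap)
        return -1;
    used = (size_t)n;

    for (size_t i = 0; i < npairs; i++) {
        n = snprintf(buf + used, cap - used, "%s{%zu}:\t%s\n",
                     pairs[i].name, strlen(pairs[i].value), pairs[i].value);
        if (n < 0 || (size_t)n >= cap - used)
            return -1;
        used += (size_t)n;
    }

    n = snprintf(buf + used, cap - used, "}\n");
    if (n < 0 || (size_t)n >= cap - used)
        return -1;
    used += (size_t)n;
    return (int)used;
}
```

Called with the title and author from the sample, this produces title{34} and author{14}, matching the counts shown above.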

Using the Search API

The Search API is defined in the search.h header file in the PortalServer-base/sdk/rdm/include directory.

The Search API defines structures and functions for working with Search objects. For example, the following code uses the functions Search_Create() and Search_InsertStr() to create a Search and add some attribute-value pairs to it:


Search *mysearch = Search_Create("DOCUMENT", "http://varrius/doc.htm");
Search_InsertStr(mysearch, "title", "All About Style Sheets");
Search_InsertStr(mysearch, "author", "Robin Styles");
Search_InsertStr(mysearch, "description", "All you need to know about style sheets");

These calls create a Search like the following:

@DOCUMENT { http://varrius/doc.htm
    title{22}: All About Style Sheets
    author{12}: Robin Styles
    description{39}: All you need to know about style sheets
}

Each Search object contains attribute-value pairs, each represented as a SearchAVPair object. Using the Search API, you can get and set attribute values, create and delete attribute-value pairs, change the values of attributes, and add values to existing attributes (some attributes can have multiple values).
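To make the relationship between a Search, its attribute-value pairs, and multi-valued attributes concrete, here is a minimal sketch of one possible in-memory model. Everything in it (demo_search, demo_avpair, demo_insert(), demo_find(), demo_attribute_count(), demo_free()) is hypothetical; the real Search and SearchAVPair structures are defined in search.h and may be organized differently. Inserting a value for an existing attribute appends to that attribute rather than creating a second pair.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory model for illustration only; the real Search and
 * SearchAVPair structures are defined in search.h and may differ. */
struct demo_avpair {
    char *name;
    char **values;              /* one or more values for this attribute */
    size_t nvalues;
    struct demo_avpair *next;   /* next attribute, in insertion order */
};

struct demo_search {
    struct demo_avpair *pairs;
};

static char *dup_str(const char *s)
{
    size_t n = strlen(s) + 1;
    char *d = malloc(n);
    if (d != NULL)
        memcpy(d, s, n);
    return d;
}

/* Returns the pair for name, or NULL if the attribute is absent. */
struct demo_avpair *demo_find(const struct demo_search *s, const char *name)
{
    for (struct demo_avpair *p = s->pairs; p != NULL; p = p->next)
        if (strcmp(p->name, name) == 0)
            return p;
    return NULL;
}

/* Adds value to attribute name: appends to an existing attribute (making it
 * multi-valued) or adds a new pair at the end. Returns 0, or -1 on OOM. */
int demo_insert(struct demo_search *s, const char *name, const char *value)
{
    assert(s != NULL && name != NULL && value != NULL);
    char *v = dup_str(value);
    if (v == NULL)
        return -1;

    struct demo_avpair *p = demo_find(s, name);
    if (p != NULL) {
        char **grown = realloc(p->values, (p->nvalues + 1) * sizeof *grown);
        if (grown == NULL) {
            free(v);
            return -1;
        }
        p->values = grown;
        p->values[p->nvalues++] = v;
        return 0;
    }

    p = calloc(1, sizeof *p);
    char *n = dup_str(name);
    char **vals = malloc(sizeof *vals);
    if (p == NULL || n == NULL || vals == NULL) {
        free(p); free(n); free(vals); free(v);
        return -1;
    }
    p->name = n;
    p->values = vals;
    p->values[0] = v;
    p->nvalues = 1;

    struct demo_avpair **tail = &s->pairs;   /* append to keep order */
    while (*tail != NULL)
        tail = &(*tail)->next;
    *tail = p;
    return 0;
}

/* Counts distinct attribute names; a multi-valued attribute counts once. */
size_t demo_attribute_count(const struct demo_search *s)
{
    size_t count = 0;
    for (const struct demo_avpair *p = s->pairs; p != NULL; p = p->next)
        count++;
    return count;
}

/* Frees every pair and value and leaves the Search empty. */
void demo_free(struct demo_search *s)
{
    struct demo_avpair *p = s->pairs;
    while (p != NULL) {
        struct demo_avpair *next = p->next;
        for (size_t i = 0; i < p->nvalues; i++)
            free(p->values[i]);
        free(p->values);
        free(p->name);
        free(p);
        p = next;
    }
    s->pairs = NULL;
}
```

Whether the real Search_GetAttributeCount() counts a multi-valued attribute once or once per value is not stated here; this sketch simply counts distinct names.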

Multiple Search objects can be grouped together into Search streams, which are represented by SearchStream objects. A SearchStream object provides functionality for handling a stream of Search objects. For example, you can use the stream to filter attributes and print only the desired attributes for every Search in the stream.
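The filtering idea can be sketched without the library: walk each record in a stream and emit only the attributes on a keep list. In this minimal sketch the types demo_pair and demo_record and the function demo_filter_stream() are hypothetical stand-ins for a SearchStream and its Search objects, and the output goes to a caller-supplied buffer so the result is easy to check.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical types for illustration only; not the SearchStream API. */
struct demo_pair { const char *name; const char *value; };
struct demo_record {
    const char *url;
    const struct demo_pair *pairs;
    size_t npairs;
};

/* Returns 1 if name appears in the keep list. */
static int wanted(const char *name, const char *const *keep, size_t nkeep)
{
    for (size_t i = 0; i < nkeep; i++)
        if (strcmp(name, keep[i]) == 0)
            return 1;
    return 0;
}

/* For each record, writes "URL = <url>" and then one "  name = value" line
 * per attribute on the keep list. Returns the number of bytes written, or
 * -1 if out is too small. */
int demo_filter_stream(const struct demo_record *recs, size_t nrecs,
                       const char *const *keep, size_t nkeep,
                       char *out, size_t cap)
{
    assert(out != NULL && cap > 0);
    size_t used = 0;
    out[0] = '\0';
    for (size_t r = 0; r < nrecs; r++) {
        int n = snprintf(out + used, cap - used, "URL = %s\n", recs[r].url);
        if (n < 0 || (size_t)n >= cap - used)
            return -1;
        used += (size_t)n;
        for (size_t i = 0; i < recs[r].npairs; i++) {
            const struct demo_pair *p = &recs[r].pairs[i];
            if (!wanted(p->name, keep, nkeep))
                continue;          /* filtered out */
            n = snprintf(out + used, cap - used, "  %s = %s\n",
                         p->name, p->value);
            if (n < 0 || (size_t)n >= cap - used)
                return -1;
            used += (size_t)n;
        }
    }
    return (int)used;
}
```

With a keep list of just "title", the two sample documents from this chapter would each print their URL followed by only their title line.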

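Thus, the relevant data structures when using the Search API include:

Search, which represents a single Search object
SearchAVPair, which represents one attribute-value pair in a Search
SearchStream, which represents a stream of Search objects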

An Introductory Example

You will find several examples of the use of the Search API in the PortalServer-base/sdk/rdm/examples directory.

This section discusses an example that is similar to (but not necessarily identical to) example1.c. It shows how to iterate through a Search stream and print the URL and number of attributes of each Search in the stream.

This example assumes that you have already created a file containing a Search stream and made it available on stdin. For example, the stream could contain one or more RDs retrieved from the search engine database by using the routines in RDM.h.

This example uses Search_ParseInitFile() to create a Search stream from the standard input.


Example 23–1 Simple Search Stream Parsing Example


/* Example 1 - Simple Search Stream Parsing */

#include <stdio.h>
#include <stdlib.h>
#include "search.h"

int main(int argc, char *argv[])
{
    /* Define a SearchStream and a Search */
    SearchStream *ss;
    Search *s;
    char *titleptr;

    /* Open a Search stream that reads its Search objects from stdin */
    ss = Search_ParseInitFile(stdin);

    /* SearchStream_IsEOS() checks whether the end of the stream has been reached */
    while (!SearchStream_IsEOS(ss)) {
        /* Exit the loop if the next Search is invalid */
        if (!(s = SearchStream_Parse(ss)))
            break;

        /* Print the URL for each Search (will be "-" if there is no URL) */
        printf("URL = %s\n", s->url);

        /* Print the title if it exists */
        titleptr = Search_Findval(s, "title");
        printf("Title = %s\n", titleptr ? titleptr : "(none)");

        /* Print the number of attributes in the Search */
        printf("# of Attributes = %d\n", Search_GetAttributeCount(s));

        /* Release the memory used by the Search */
        Search_Free(s);
    }

    /* Close the SearchStream and exit */
    SearchStream_Finish(ss);
    exit(0);
}

Getting Search Server Database Contents as a SearchStream

You can retrieve the entire contents of the search engine database as a Search stream by using the rdmgr utility. The rdmgr utility must be run from a search-enabled Sun Java System Portal Server software instance directory; the default is the PortalServer-base/bin directory.

From the PortalServer-base/bin directory, run the following command:

./rdmgr -U

Make sure that the LD_LIBRARY_PATH environment variable includes the PortalServer-base/lib directory.

This command prints the database contents as a SearchStream. You can pipe the output to a program that uses the SearchStream routines to parse the Search objects in the stream.
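Any program that reads standard input can consume this output. As a minimal sketch that does not depend on the Search API, the hypothetical function demo_count_stream() below reads a stream from a FILE * and counts records and attribute-value pairs. It assumes the same layout as the samples in this chapter: a header line beginning with @, attribute lines of the form name{N}: followed by a tab and exactly N bytes of value, and a closing brace. Skipping exactly N bytes means a value containing a newline, @, or } does not confuse it. A program built around it, here given the hypothetical name countsearch, could be run as ./rdmgr -U | ./countsearch and would call demo_count_stream(stdin, ...).

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper, not part of the Search API. Reads a Search stream
 * from in and counts records and attribute-value pairs, assuming the layout
 * "@SCHEMA { url", then "name{N}:<TAB>value" lines, then "}".
 * Returns 0 on success, -1 on malformed input. */
int demo_count_stream(FILE *in, size_t *nrecords, size_t *nattrs)
{
    assert(in != NULL && nrecords != NULL && nattrs != NULL);
    int c;
    *nrecords = *nattrs = 0;
    for (;;) {
        /* Skip whitespace between records; EOF here is a clean end. */
        while ((c = fgetc(in)) != EOF && isspace(c))
            ;
        if (c == EOF)
            return 0;
        if (c != '@')
            return -1;
        /* Skip the rest of the header line ("SCHEMA { url"). */
        while ((c = fgetc(in)) != EOF && c != '\n')
            ;
        if (c == EOF)
            return -1;

        for (;;) {
            while ((c = fgetc(in)) != EOF && isspace(c))
                ;
            if (c == EOF)
                return -1;          /* record never closed */
            if (c == '}')
                break;              /* end of this record */
            /* Skip the attribute name up to '{'. */
            while (c != EOF && c != '{' && c != '\n')
                c = fgetc(in);
            if (c != '{')
                return -1;
            /* Read the byte count, then expect "}:" and a tab. */
            unsigned long n = 0;
            int digits = 0;
            while ((c = fgetc(in)) != EOF && isdigit(c)) {
                n = n * 10 + (unsigned long)(c - '0');
                digits++;
            }
            if (digits == 0 || c != '}' || fgetc(in) != ':' || fgetc(in) != '\t')
                return -1;
            /* Skip exactly n bytes of value. */
            for (unsigned long i = 0; i < n; i++)
                if (fgetc(in) == EOF)
                    return -1;
            (*nattrs)++;
        }
        (*nrecords)++;
    }
}
```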