Compass Server 3.0 Developer's Guide


Chapter 11
Creating New Robot Application Functions

This chapter describes how to create and compile your own plug-in robot application functions using the Netscape Compass robot plug-in application programming interface (API).

You need to create your own robot application functions (RAFs) when you want to modify the behavior of the Compass robot filters in a way that is supported neither by the Compass Server Administration Interface nor by the predefined robot application functions. Robot filters are defined in the file filter.conf. A filter definition consists of filter directives, each of which specifies a robot application function.

Of the systems the Netscape Compass Server supports, the following systems can load functions into the server at run time and can therefore use plug-in functions:

What is the Robot Plug-in API?

The robot plug-in API is a set of functions and header files that help you create your own robot application functions to use with the directives in robot configuration files. The Netscape Compass Server uses this API to create the built-in functions for the directives used in filter.conf (the robot filter configuration file).

Because the robot itself is built on this API, becoming familiar with the API also shows you how the robot works. You can then override the robot's functionality, add to it, or write your own customized functions. For example, you can create functions that use a custom database for access control, or functions that create custom log files with special entries.

We expect that most people will write RAFs in C. However, you can write the functions in any language that can build a shared library. If you use C++, you will need to modify the provided C header files so that they can be included from C++ source files.
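The usual way to make a C header usable from C++ is to wrap its declarations in an extern "C" block that only a C++ compiler sees. The sketch below shows the pattern on a hypothetical header and function (my_rafs.h and my_raf_count_separators are illustrative names, not part of the Compass SDK). The header and the definition are shown together only so that the example compiles on its own.

```c
/* my_rafs.h (hypothetical): declarations for your own plug-in functions.
 * A C++ compiler defines __cplusplus, so it sees the extern "C" block and
 * gives the declarations C linkage; calls from C++ then link against the
 * symbols produced by the C compiler. A C compiler skips the guard. */
#ifndef MY_RAFS_H
#define MY_RAFS_H

#ifdef __cplusplus
extern "C" {
#endif

int my_raf_count_separators(const char *value, char sep);

#ifdef __cplusplus
}
#endif

#endif /* MY_RAFS_H */

/* my_rafs.c (hypothetical): the C definition of the declared function.
 * It counts occurrences of sep, e.g. ';' in a multi-valued field. */
int my_raf_count_separators(const char *value, char sep)
{
    int n = 0;
    for (; *value != '\0'; value++)
        if (*value == sep)
            n++;
    return n;
}
```

The same two #ifdef __cplusplus blocks can be placed around the declarations in each provided header file.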

The following steps are a brief overview of the process for creating your own plug-in functions:

  1. Compile your code to create a shared object (.so) file.

    For Windows NT, you'll produce a dynamic-link library (DLL) containing your plug-in functions. When this chapter refers to a shared object file, it also means a Windows NT DLL.

  2. Add a directive to the Setup directives at the top of filter.conf that tells the robot to load your shared object file or dynamic-link library.

  3. Write directives that use your plug-in functions in the robot configuration file (filter.conf).

The compassdir/bin/compass/sdk/robot/include/ directory contains all the header files you need to include when writing your plug-in functions.

The compassdir/bin/compass/sdk/robot/examples/ directory contains sample code, the header files, and a makefile. You should familiarize yourself with the code and samples.

The Robot Application Function Header Files

This section discusses the header files needed for creating robot application functions.

Header File Hierarchy

The hierarchy of robot plug-in API header files is as follows (directory names end with a slash):

robot/
  include/
    csinfo.h
    csmem.h
    filterrules.h
    robotapi.h
    base/
      systems.h
    libcs/
      adt.h
      cs.h
      csidcf.h
      getopt.h
      log.h
      pblock.h
The robot and its header files are written in ANSI C.

Header File Contents

This section describes the header files you can include when writing your plug-in functions. This section is intended as a starting point for learning the functions included in the header files.

Most of the header files are stored in three directories: include, include/base, and include/libcs.

Writing Robot Application Functions

The file that defines your robot application functions must include robotapi.h. You will also find many useful functions in csinfo.h.

All robot application functions use parameter blocks (pblocks) to receive and set parameter values. A parameter block stores parameters as name-value pairs; it is a hash table keyed on the name portion of each parameter it contains.
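To make the hash-table description concrete, here is a minimal sketch of a name-keyed table with a findval-style lookup. It is an illustration only, not the libcs implementation: the real structures (libcs_pb_param, libcs_pb_entry, libcs_pblock) are declared in libcs/pblock.h, and every sketch_ name below is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

#define SKETCH_NBUCKETS 16

/* One name-value pair, chained to other pairs in the same bucket. */
struct sketch_param {
    char *name;
    char *value;
    struct sketch_param *next;
};

struct sketch_pblock {
    struct sketch_param *buckets[SKETCH_NBUCKETS];
};

/* Hash on the name only, so a lookup searches a single bucket. */
static unsigned sketch_hash(const char *name)
{
    unsigned h = 5381;
    while (*name != '\0')
        h = h * 33u + (unsigned char)*name++;
    return h % SKETCH_NBUCKETS;
}

/* Copy name and value into a new entry at the head of its bucket.
 * Returns 0 on success, -1 if memory runs out. */
int sketch_insert(struct sketch_pblock *pb, const char *name, const char *value)
{
    struct sketch_param *p = malloc(sizeof *p);
    if (p == NULL)
        return -1;
    p->name = malloc(strlen(name) + 1);
    p->value = malloc(strlen(value) + 1);
    if (p->name == NULL || p->value == NULL) {
        free(p->name);
        free(p->value);
        free(p);
        return -1;
    }
    strcpy(p->name, name);
    strcpy(p->value, value);
    unsigned b = sketch_hash(name);
    p->next = pb->buckets[b];
    pb->buckets[b] = p;
    return 0;
}

/* Return the value stored under name, or NULL if there is none. */
const char *sketch_findval(const struct sketch_pblock *pb, const char *name)
{
    for (const struct sketch_param *p = pb->buckets[sketch_hash(name)];
         p != NULL; p = p->next)
        if (strcmp(p->name, name) == 0)
            return p->value;
    return NULL;
}

/* Free every entry and leave the table empty. */
void sketch_free(struct sketch_pblock *pb)
{
    for (int i = 0; i < SKETCH_NBUCKETS; i++) {
        struct sketch_param *p = pb->buckets[i];
        while (p != NULL) {
            struct sketch_param *next = p->next;
            free(p->name);
            free(p->value);
            free(p);
            p = next;
        }
        pb->buckets[i] = NULL;
    }
}
```

For the directive discussed later in this chapter, the table would hold src=type and deny=text/plain, and a lookup of "deny" would return "text/plain".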

RAF Prototype

All robot application functions have the following prototype:

int (*RobotAPIFn)(pblock *pb, CSFilter *csf, CSResource *csr);
pb is the parameter block containing the parameters for this function invocation.

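csf is a pointer to the enumeration or generation filter.

csr is a pointer to a structure that describes the resource being processed (see Getting Information About the Resource Being Processed).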

Caution! Treat the pb parameter as read-only, and perform any data modification on copies of the data. Doing otherwise is unsafe in threaded server architectures and yields unpredictable results in multiprocess server architectures.
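If a RAF needs a modified form of a parameter value, for example a lowercased MIME type to compare against, it can work on a private copy. The sketch below is plain C and uses no robot API; lowercase_copy is a hypothetical helper, and in a real RAF its input would be a string returned by libcs_pblock_findval.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated, lowercased copy of value, or NULL if memory
 * runs out. The caller owns (and must free) the result. The input, which
 * in a RAF would belong to the read-only pb, is never written to. */
char *lowercase_copy(const char *value)
{
    size_t len = strlen(value);
    char *copy = malloc(len + 1);
    if (copy == NULL)
        return NULL;
    for (size_t i = 0; i < len; i++)
        copy[i] = (char)tolower((unsigned char)value[i]);
    copy[len] = '\0';
    return copy;
}
```

Because each invocation allocates its own copy, concurrent invocations never touch each other's data or the shared pblock.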

Writing Functions for Specific Directives

Write each function for a particular stage of the filtering process (Setup, MetaData, Data, Enumeration, Generation, or Shutdown). The function should use only the data sources that are available at that stage. See the section Sources and Destinations in the previous chapter for a list of the data sources available at each stage.

At the Setup stage, the filter is still setting up and cannot yet get information about any resource's URL or content.

At the MetaData stage, the robot has encountered the URL for a resource but has not yet downloaded the resource's content. Information is therefore available about the URL itself, as well as data derived from other sources such as the filter.conf file, but not about the content of the resource.

At the Data stage, the robot has downloaded the content of the URL, so information is available about the content, such as the description, the author, and so on.

At the Enumeration and Generation stages, the same data sources are available as for the Data stage.

At the Shutdown stage, the filter has finished filtering and is shutting down. Although functions written for this stage can use the same data sources as those available at the Data stage, shutdown functions usually restrict themselves to shutdown and cleanup activities.
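The availability rules above reduce to two questions per stage: is the resource's URL known, and has its content been downloaded? The following sketch encodes them. The enum and both functions are hypothetical; the RAF prototype has no stage argument, so this only records the rules in code form.

```c
#include <stdbool.h>

/* The stages a RAF can be written for (hypothetical enum). */
enum raf_stage {
    STAGE_SETUP,
    STAGE_METADATA,
    STAGE_DATA,
    STAGE_ENUMERATION,
    STAGE_GENERATION,
    STAGE_SHUTDOWN
};

/* URL information exists once the robot has encountered a resource,
 * that is, at every stage except Setup. */
bool url_available(enum raf_stage stage)
{
    return stage != STAGE_SETUP;
}

/* Content exists only after the download, that is, from the Data stage
 * on. Shutdown sees the same sources as Data, although shutdown
 * functions usually only clean up. */
bool content_available(enum raf_stage stage)
{
    return stage != STAGE_SETUP && stage != STAGE_METADATA;
}
```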

Passing Parameters to Robot Application Functions

You must use parameter blocks (pblocks) to pass arguments into robot application functions and to get data out of them. For example, the following filter.conf directive invokes the filter-by-exact function:

Data fn=filter-by-exact src=type deny=text/plain
The fn parameter indicates the function to invoke, which in this case is filter-by-exact. The src and deny arguments are parameters for the function. They are passed to the function in a parameter block, and the function must extract the parameters and their values from that block.

The three structures that are used to hold parameters are libcs_pb_param, libcs_pb_entry, and libcs_pblock. These structures are defined in the header file bin/compass/sdk/robot/include/libcs/pblock.h.

Working with Parameter Blocks

A parameter block stores parameters and values as name-value pairs. You can use many predefined functions to work with parameter blocks: to extract parameter values, change parameter values, and so on. For example, libcs_pblock_findval(name, pb) returns the value of the parameter called name in the parameter block pb, or NULL if pb contains no such parameter. (For a complete example, see RAF Definition Example.)

When adding, removing, editing, and creating name-value pairs for parameters, your robot application functions can use the functions in the pblock.h header file (in bin/compass/sdk/robot/include/libcs/).

The names of these functions are all prefixed by libcs_. These functions are similar to the non-prefixed functions documented in Chapter 4, "NSAPI Function Reference" of the NSAPI Programmer's Guide.

You can find the NSAPI Programmer's Guide at:

http://developer.netscape.com/docs/manuals/enterprise/nsapi/index.htm

For example, the function pblock_create described in the NSAPI Programmer's Guide has the same behavior as the function libcs_pblock_create in the bin/compass/sdk/robot/include/libcs/pblock.h header file.

The parameter manipulation functions are listed below. See the bin/compass/sdk/robot/include/libcs/pblock.h header file for their full signatures, including return types and arguments.

libcs_param_create

creates a parameter with the given name and value. If the name and value aren't NULL, they are copied and placed into a new libcs_pb_param structure.

libcs_param_free

frees a given parameter if it's non-NULL. It returns 1 if the parameter was non-NULL, and 0 if it was NULL. This function is also useful for error checking before using the libcs_pblock_remove function.

libcs_pblock_create

creates a new parameter block with a hash table of a chosen size and returns the newly allocated parameter block.

libcs_pblock_free

frees a given parameter block and any entries inside it.

libcs_pblock_find

finds the entry with the given name in a pblock and returns that parameter (a libcs_pb_param), or NULL if no entry has that name.

libcs_pblock_findval

also finds the entry with the given name in a pblock, but returns only its value (a string), or NULL if no entry has that name.

libcs_pblock_remove

behaves like the libcs_pblock_find function, but it also removes the entry from the pblock.

libcs_pblock_nninsert and libcs_pblock_nvinsert

these both create a new parameter with a given name and value, and insert it into a given parameter block. The libcs_pblock_nninsert function requires that the value be an integer, but the libcs_pblock_nvinsert function accepts a string.

libcs_pblock_pinsert

inserts a parameter into a parameter block.

libcs_pblock_str2pblock

scans the given string for parameter pairs in the format name=value or name="value", adds them to a pblock, and returns the number of parameters added.

libcs_pblock_pblock2str

places all of the parameters in the given parameter block into the given string. Each parameter is of the form name="value" and is separated by a space from any adjacent parameter.
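The string formats that libcs_pblock_str2pblock reads and libcs_pblock_pblock2str writes can be illustrated with a small, self-contained sketch. It stores pairs in fixed-size arrays rather than a pblock and is not the libcs implementation; parse_pairs, format_pairs, and the size limits are hypothetical, and details such as rejecting input without an = sign, or not supporting escaped quotes, are this sketch's own choices rather than documented libcs behavior.

```c
#include <stdio.h>
#include <string.h>

#define MAX_PARAMS 16
#define MAX_TOKEN 128

struct nv_pair {
    char name[MAX_TOKEN];
    char value[MAX_TOKEN];
};

/* Parse blank-separated name=value and name="value" pairs into out.
 * Returns the number of pairs stored, or -1 on malformed input. */
int parse_pairs(const char *s, struct nv_pair *out, int max)
{
    int n = 0;
    while (*s != '\0') {
        while (*s == ' ' || *s == '\t')
            s++;
        if (*s == '\0')
            break;
        if (n == max)
            return -1;
        size_t len = strcspn(s, "= \t");      /* the name ends at '=' */
        if (s[len] != '=' || len == 0 || len >= MAX_TOKEN)
            return -1;
        memcpy(out[n].name, s, len);
        out[n].name[len] = '\0';
        s += len + 1;
        if (*s == '"') {                      /* quoted value */
            s++;
            len = strcspn(s, "\"");
            if (s[len] != '"')
                return -1;                    /* unterminated quote */
        } else {                              /* value ends at a blank */
            len = strcspn(s, " \t");
        }
        if (len >= MAX_TOKEN)
            return -1;
        memcpy(out[n].value, s, len);
        out[n].value[len] = '\0';
        s += len;
        if (*s == '"')
            s++;                              /* skip the closing quote */
        n++;
    }
    return n;
}

/* Write the pairs as name="value", separated by single spaces.
 * Returns 0 on success, -1 if buf is too small. */
int format_pairs(const struct nv_pair *pairs, int n, char *buf, size_t size)
{
    size_t used = 0;
    if (size == 0)
        return -1;
    buf[0] = '\0';
    for (int i = 0; i < n; i++) {
        int w = snprintf(buf + used, size - used, "%s%s=\"%s\"",
                         i > 0 ? " " : "", pairs[i].name, pairs[i].value);
        if (w < 0 || (size_t)w >= size - used)
            return -1;
        used += (size_t)w;
    }
    return 0;
}
```

Parsing src=type deny="text/plain" yields two pairs, and formatting them again produces src="type" deny="text/plain", the form that libcs_pblock_pblock2str is described as writing.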

Getting Information About the Resource Being Processed

As mentioned earlier, the prototype for all robot application functions is:

int (*RobotAPIFn)(pblock *pb, CSFilter *csf, CSResource *csr);
where csr is a pointer to a data structure that contains information about the resource being processed.

The CSResource structure is defined in the header file robotapi.h. The resource description it holds is in SOIF (Summary Object Interchange Format) syntax.

Objects in SOIF syntax have a schema name, an associated URL, and a set of attribute-value pairs; the number in braces after each attribute name is the size of its value in bytes. In the following SOIF example, the schema name is @DOCUMENT, the URL is http://developer.netscape.com/docs/manuals/htmlguid/index.htm, and the SOIF contains attribute-value pairs for title, author, and description.

@DOCUMENT { http://developer.netscape.com/docs/manuals/htmlguid/index.htm
  title{18}: HTML Tag Reference
  author{12}: Nikki Writer
  description{37}: Reference to HTML tags and attributes
}
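Because each value is preceded by its length in bytes, a reader can check an attribute line by comparing the declared size with the actual value. The sketch below handles only single-line values without a trailing newline and accepts either a space or a tab after the colon. soif_parse_line is a hypothetical helper, not part of the RDM SOIF API; in a real RAF, use the functions in soif.h.

```c
#include <stdlib.h>
#include <string.h>

/* Parse one SOIF attribute line of the form  name{size}: value
 * (leading blanks allowed). On success, copies the attribute name into
 * name, points *value at the value, and returns the declared size.
 * Returns -1 if the line is malformed, the name does not fit, or the
 * declared size differs from the value's length in bytes. */
long soif_parse_line(const char *line, char *name, size_t name_size,
                     const char **value)
{
    while (*line == ' ' || *line == '\t')
        line++;
    const char *brace = strchr(line, '{');
    if (brace == NULL || brace == line || (size_t)(brace - line) >= name_size)
        return -1;
    memcpy(name, line, (size_t)(brace - line));
    name[brace - line] = '\0';

    char *end;
    long size = strtol(brace + 1, &end, 10);
    /* expect "}:" followed by a single space or tab */
    if (end == brace + 1 || size < 0 || end[0] != '}' || end[1] != ':' ||
        (end[2] != ' ' && end[2] != '\t'))
        return -1;
    *value = end + 3;
    if ((long)strlen(*value) != size)
        return -1;
    return size;
}
```

For example, the line author{12}: Nikki Writer parses successfully, while the same value declared with any size other than 12 is rejected.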
A CSResource structure has a url field, which contains the URL for the SOIF. It also has an rd field, whose value is the SOIF for the resource. Once you get the SOIF for the resource, you can use the functions for working with SOIF that are defined in sdk/rdm/include/soif.h to get more information about the resource. (The file robotapi.h includes soif.h.)

For example, the macro SOIF_Findval(soif, attribute) gets the value of the given attribute in the given SOIF. The following sample code uses this macro to print the value of the META attribute if it exists for the resource being processed:

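#include <stdio.h>
#include "robotapi.h"

int my_new_raf(libcs_pblock *pb, CSFilter *csf, CSResource *csr)
{
    char *metavalue;
    /* SOIF_Findval returns NULL if the resource has no "meta" attribute */
    if ((metavalue = (char *)SOIF_Findval(csr->rd, "meta")) != NULL)
        printf("The value of the META tag in the resource is %s\n", metavalue);
    /* rest of function ... */
    return REQ_PROCEED;
}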
We encourage you to examine the CSResource structure in the file robotapi.h to see what other fields it has and to see the macros, such as CSResource_GetSource(), you can use to get information from the resource and to set information in the resource description.

For more information about the routines for working directly with SOIF objects, see Chapter 11, "Using the SOIF API to Work with SOIF Objects."

Returning a Response Status Code

When your robot application function has finished whatever it needs to do, it must return a code that tells the server how to proceed with the request.

These codes are defined in the header file bin/compass/sdk/robot/include/robotapi.h. The codes are:

REQ_PROCEED

The function performed its task, so proceed with the request.

REQ_ABORTED

The entire request should be aborted because an error occurred.

REQ_NOACTION

The function performed no task, but proceed anyway.

REQ_EXIT

End the session and exit.

REQ_RESTART

Restart the entire request-response process.
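To make the choice between the most common codes concrete, here is a sketch of the decision logic of a hypothetical RAF helper that appends text to a buffer. So that it compiles without the SDK, it uses its own stand-in enum (sketch_status); in a real RAF you would return REQ_NOACTION, REQ_PROCEED, or REQ_ABORTED from robotapi.h, whose values are not reproduced here.

```c
#include <stddef.h>
#include <string.h>

/* Stand-ins for REQ_PROCEED, REQ_NOACTION, and REQ_ABORTED. */
enum sketch_status { SKETCH_PROCEED, SKETCH_NOACTION, SKETCH_ABORTED };

/* Append text to buf (capacity size bytes) and report the outcome. */
enum sketch_status append_status(const char *text, char *buf, size_t size)
{
    if (text == NULL || *text == '\0')
        return SKETCH_NOACTION;    /* nothing to do, but carry on */
    if (strlen(buf) + strlen(text) + 1 > size)
        return SKETCH_ABORTED;     /* an error: abort the request */
    strcat(buf, text);
    return SKETCH_PROCEED;         /* the task was performed */
}
```

A missing input is not an error here, so the sketch reports "no action" rather than aborting; only a genuine failure maps to the abort code.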

Reporting Errors to the Robot Log File

When problems occur, robot application functions should return an appropriate status code (such as REQ_ABORTED) and should also log an error in the robot log file.

To use the error-logging functionality, you must include the files log.h and objlog.h in the sdk/robot/include/libcs directory. You should check that this directory does in fact contain these files (some of the early releases of Compass Server 3.0 omitted them).

If these files are not there, obtain copies of log.h and objlog.h (the online version of this guide links to both) and save them to your bin/compass/sdk/robot/include/libcs directory.

After you have ensured that log.h and objlog.h exist in the correct place, you can use the cslog_error macro to report errors. The prototype is:

cslog_error(int n, int loglevel, char* errorMessage)
The first parameter is not currently used (it is reserved for future use), so you can pass any integer.

The second parameter is the log level. When this level is less than or equal to the log level set in the file process.conf, the error message is written to robot.log.

The third parameter is the error message in printf() form: a format string followed by any values it refers to, all enclosed in an extra pair of parentheses, as in the examples below.

For example:

cslog_error(1, 1, ("fn=extract-html-text: Out of memory!\n")); 
This invocation of cslog_error would generate the following error message in the robot log file:

[22/Jan/1998:15:57:31]  8270@0: ERROR: fn=extract-html-text: Out of memory! 
Another example:

cslog_error(1, 1,
    ("<URL:%s>: Error %d (%d): %s\n",
     ep->eo->key,
     urls->server_status,
     status,
     (s = cslog_linestr(urls->error_msg))));
This invocation of cslog_error would generate the following error message in the robot log file:

[22/Jan/1998:15:57:31]  8270@0: ERROR: <URL:http://nikki.boots.com:80/>: Error 0 (-240): Can't connect to server 
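The extra parentheses around the message arguments let a macro accept a variable number of printf() arguments as a single macro argument, and the level test decides whether anything is written. The sketch below shows one way such a macro can work; SKETCH_LOG_ERROR, sketch_log_printf, and the fixed configured level are hypothetical and are not the actual definition of cslog_error.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical configured level; the robot reads its setting from
 * process.conf. */
static int sketch_configured_level = 1;

/* Last formatted message and number of messages written, kept so the
 * behavior can be checked. */
static char sketch_last_message[256];
static int sketch_log_count;

/* printf-style helper: format the message and write it to stderr. */
static void sketch_log_printf(const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    vsnprintf(sketch_last_message, sizeof sketch_last_message, fmt, ap);
    va_end(ap);
    sketch_log_count++;
    fprintf(stderr, "ERROR: %s", sketch_last_message);
}

/* args is a parenthesized list such as ("<URL:%s>\n", url), so
 * "sketch_log_printf args" expands to an ordinary function call.
 * The first argument is ignored, like the unused first parameter of
 * cslog_error. */
#define SKETCH_LOG_ERROR(n, level, args)            \
    do {                                            \
        (void)(n);                                  \
        if ((level) <= sketch_configured_level)     \
            sketch_log_printf args;                 \
    } while (0)
```

With the configured level at 1, SKETCH_LOG_ERROR(1, 1, ("<URL:%s>: Error %d\n", url, code)) writes a message, while a call with level 5 writes nothing.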

RAF Definition Example

This section shows an example definition for a robot application function.

This function copies the value of a specified source to a multi-valued field in a resource description (RD). For example, the Compass Server stores category or classification information in the classification field of an RD. The copy_mv function lets the robot get the value of an HTML <META> tag with any name and store that value in the classification field in the database. For example, using this function, you could instruct the robot to get the content of the <META NAME="topic"> tag and store it as the classification of the resource.

You would invoke this function with a directive such as:

Generate fn=copy_mv src=topic dst=classification
Here is the sample function definition:

/******** example robot application function ********/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "robotapi.h"
#include "pblock.h"
#include "log.h"
#include "objlog.h"

NSAPI_PUBLIC int copy_mv(libcs_pblock *pb, CSFilter *csf,
                         CSResource *csr)
{
    char *s, *mv, *mvp;

    /* Use the libcs_pblock_findval function to get the values of the
     * "src" and "dst" parameters, which were specified by the
     * directive that invoked this function */
    char *src = libcs_pblock_findval("src", pb);
    char *dst = libcs_pblock_findval("dst", pb);

    /* If either src or dst has not been supplied,
     * log an error and return REQ_PROCEED */
    if (!src || !dst) {
        cslog_error(1, 1,
            ("<URL:%s>: Error: No source or destination available.\n",
             csr->url));
        return REQ_PROCEED;
    }

    /* If the current document has no attribute whose name matches src
     * (for example, no matching META tag), just return; otherwise put
     * its value in the string s.
     * The function SOIF_Findval(soif, attribute) is defined in
     * sdk/rdm/include/soif.h. It gets the value of the given attribute
     * from the given SOIF. The rd in the CSResource is a SOIF that
     * describes the resource. */
    if (!(s = (char *)SOIF_Findval(csr->rd, src)))
        return REQ_PROCEED;

    /* Now insert the string s into the dst field (for example,
     * classification), dealing with the possibility that the field
     * already has one or more values */
    if ((mv = libcs_pblock_findval(dst, csr->sources)) != NULL) {
        /* room for both values, the ';' separator, and the '\0' */
        mvp = malloc(strlen(mv) + strlen(s) + 2);
        if (mvp == NULL) {
            cslog_error(1, 1, ("fn=copy_mv: Out of memory!\n"));
            return REQ_ABORTED;
        }
        /* append the new value to the existing values,
         * separated by ';' */
        sprintf(mvp, "%s;%s", mv, s);
        libcs_pblock_nvinsert(dst, mvp, csr->sources);
        /* do some clean up (the inserted parameter holds its own copy) */
        free(mvp);
    }
    /* if no values already exist, do a simple value insert */
    else {
        libcs_pblock_nvinsert(dst, s, csr->sources);
    }

    /* We're all done. Return a status code */
    return REQ_PROCEED;
}
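The string handling in copy_mv can be isolated in a helper that is easy to test outside the robot. The hypothetical join_mv below does the same job as the malloc and sprintf pair in the example, using memcpy, and leaves the out-of-memory decision to its caller.

```c
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated string "existing;addition", or NULL if memory
 * runs out. The +2 leaves room for the ';' separator and the final '\0',
 * matching the malloc size in copy_mv. The caller must free the result. */
char *join_mv(const char *existing, const char *addition)
{
    size_t a = strlen(existing), b = strlen(addition);
    char *joined = malloc(a + b + 2);
    if (joined == NULL)
        return NULL;
    memcpy(joined, existing, a);
    joined[a] = ';';
    memcpy(joined + a + 1, addition, b + 1);  /* copies the '\0' too */
    return joined;
}
```

Inside copy_mv, the branch that handles existing values could call join_mv(mv, s), check the result for NULL, pass it to libcs_pblock_nvinsert, and then free it.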

Compiling and Linking Your Code

You can compile your code with any ANSI C compiler. See the makefile in the bin/compass/sdk/robot/examples directory for an example. The UNIX makefile assumes the use of gmake.

For Windows NT
For UNIX
This section lists the linking options you need to create a UNIX shared object. You then instruct the robot to load the shared object with a directive in filter.conf, as described in Loading Your Shared Object.

The following table describes the commands used to link object files into a shared object under the various UNIX platforms. In these examples, the compiled object files t.o and u.o are linked to form a shared object called test.so.

Table 11.4 Options for linking

System     Link command
IRIX       ld -shared t.o u.o -o test.so
SunOS      ld -assert pure-text t.o u.o -o test.so
Solaris    ld -G t.o u.o -o test.so
OSF/1      ld -all -shared -expect_unresolved "*" t.o u.o -o test.so
HP-UX      ld -b t.o u.o -o test.so
           When compiling your code, you must also pass the +z flag to the HP C compiler.
AIX        cc -bM:SRE -berok t.o u.o -o test.so -bE:ext.exp -lc
           The ext.exp file must be a text file that lists each externally accessible function name on its own line.

Loading Your Shared Object

The robot uses the filters defined in filter.conf to filter the resources it encounters. If filter.conf uses your customized robot application functions, it must load the shared object (or DLL) that contains them.

To load the shared object, add a line to filter.conf:

For UNIX
Init fn=load-modules shlib=[path]filename.so funcs="function1,function2,...,functionN"
For Windows NT
Init fn=load-modules shlib=[path]filename.dll funcs="function1,function2,...,functionN"
This initialization function opens the given shared object file (or DLL) and loads the functions function1, function2, and so on through functionN. You can then use these functions in the robot configuration file (filter.conf). Remember to use each function only with the directive you wrote it for, as described in the following section.

Using your New Robot Application Functions

After you have compiled your functions and arranged for them to be loaded, you need to invoke them from directives. Every function is invoked as follows:

Directive fn=function [name1=value1] ... [nameN=valueN]
The directive and the fn parameter are mandatory. They can be followed by any number of function-specific parameters, each of which is a name-value pair.

You specify your function in the directive it was written for. For example, the following line uses a plug-in function called wordcount that can be used in the Data stage. This function counts the words in a resource and assigns the count to a destination specified by a parameter called dst.

Data fn=wordcount dst=word-count
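The wordcount function is only named here, not defined. Its core might look like the following sketch, which counts runs of non-space characters in a buffer. count_words is hypothetical, and how a real wordcount would obtain the resource content and store the count under dst (for instance with libcs_pblock_nninsert, which takes an integer value) is an assumption left out of the sketch.

```c
#include <ctype.h>
#include <stddef.h>

/* Count the words in text[0..len), treating any run of non-space
 * characters as one word. The buffer need not be NUL-terminated. */
size_t count_words(const char *text, size_t len)
{
    size_t words = 0;
    int in_word = 0;
    for (size_t i = 0; i < len; i++) {
        if (isspace((unsigned char)text[i])) {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;    /* first character of a new word */
            words++;
        }
    }
    return words;
}
```

Taking an explicit length rather than relying on a terminating '\0' keeps the helper usable on downloaded content that may not be a C string.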



Last Updated: 02/07/98 20:49:14

Any sample code included above is provided for your use on an "AS IS" basis, under the Netscape License Agreement - Terms of Use