The robot's filter definitions are stored in filter.conf. Filter definitions consist of filter directives, each of which specifies a robot application function.
Of the systems the Netscape Compass Server supports, the following systems can load functions into the server at run time and can therefore use plug-in functions:
What is the Robot Plug-in API?
The robot plug-in API is a set of functions and header files that help you create your own robot application functions (RAFs) to use with the directives in robot configuration files. The Netscape Compass Server uses this API to create the built-in functions for the directives used in filter.conf (the robot filter configuration file).
Because the robot itself is built on this API, becoming familiar with the API teaches you how the robot works. It also means you can override the robot's functionality, add to it, or write your own functions. For example, you can create functions that use a custom database for access control, or functions that create custom log files with special entries.
We expect that most people will write RAFs in C. However, you can define the functions in any language that can build a shared library. If you use C++, you will need to modify the provided C header files so that they can be used from C++ source files.
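The following steps are a brief overview of the process for creating your own plug-in functions:
1. Write your robot application functions using the robot plug-in API.
2. Compile and link the functions into a shared object (.so) file.
3. Load the shared object and use the new functions in the robot configuration file (filter.conf).
For Windows NT, you'll produce a dynamic-link library (DLL) containing your plug-in functions. When this chapter refers to a shared object file, it also refers to Windows NT DLLs.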
The /bin/compass/sdk/robot/include/ directory contains all the header files you need to include when writing your plug-in functions.
The compassdir/bin/compass/sdk/robot/examples/ directory contains sample code, the header files, and a makefile. You should familiarize yourself with the sample code.
The robot and its header files are written in ANSI C. The header files are organized as follows:

robot/
    include/
        csinfo.h
        csmem.h
        filterrules.h
        robotapi.h
        base/
            systems.h
        libcs/
            adt.h
            cs.h
            csidcf.h
            getopt.h
            log.h
            pblock.h
Table 11.1 Header files in the include directory

Table 11.2 Header files in the base directory
Header File    Description
systems.h      Contains functions that handle systems information.

Table 11.3 Header files in the libcs directory
The robotapi/include/libcs directory contains header files of functions that deal with robot-specific and HTTP-specific tasks, such as handling access to configuration files and dealing with HTTP.

robotapi.h. You will also find many useful functions in csinfo.h.
All robot application functions use parameter blocks (pblocks) to receive and set parameter values. A parameter block stores parameters as name-value pairs; it is a hash table keyed on the name portion of each parameter it contains.
int (*RobotAPIFn)(pblock *pb, CSFilter *csf, CSResource *csr);
pb is the parameter block containing the parameters for this function invocation. csf is a pointer to an enumeration or generation filter. csr is a pointer to the resource being processed.
Caution! The pb parameter should be considered read-only, and any data modification should be performed on copies of the data. Doing otherwise is unsafe in threaded server architectures and will yield unpredictable results in multiprocess server architectures.
At the Setup stage, the filter is still setting up and cannot yet get information about the resource's URL or content.
At the MetaData stage, the robot has encountered the URL for a resource but has not downloaded the resource's content. Information is therefore available about the URL itself, as well as data derived from other sources such as the filter.conf file. At this stage, however, no information is available about the content of the resource.
At the Data stage, the robot has downloaded the content of the URL, so information is available about the content, such as the description, the author, and so on.
At the Enumeration and Generation stages, the same data sources are available as for the Data stage.
At the Shutdown stage, the filter has finished filtering and is shutting down. Although functions written for this stage can use the same data sources as those available at the Data stage, shutdown functions usually restrict themselves to shutdown and cleanup activities.
Passing Parameters to Robot Application Functions
You must use parameter blocks (pblocks) to pass arguments into robot application functions and to get data out of them. For example, the following directive (which would be in the filter.conf file) invokes the filter-by-exact function:
Data fn=filter-by-exact src=type deny=text/plain
The fn parameter indicates the function to invoke, which in this case is filter-by-exact. The src and deny arguments are parameters for the function. They are passed to the function in a parameter block, and the function must be written to unpack the parameter block and extract the parameters and their values from it.
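The following is a minimal sketch of that unpacking step for a filter-by-exact style function. It is not the built-in implementation. The toy_param and toy_pblock types and the toy_findval helper are hypothetical stand-ins for libcs_pblock and libcs_pblock_findval, and the resource's attributes are modeled as a second toy parameter block rather than as a CSResource.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a parameter block: a flat list of
 * name-value pairs. The real libcs_pblock is a hash table. */
typedef struct {
    const char *name;
    const char *value;
} toy_param;

typedef struct {
    const toy_param *params;
    size_t count;
} toy_pblock;

/* Hypothetical helper mimicking libcs_pblock_findval:
 * returns the value stored under name, or NULL if it is absent. */
const char *toy_findval(const char *name, const toy_pblock *pb)
{
    size_t i;
    for (i = 0; i < pb->count; i++)
        if (strcmp(pb->params[i].name, name) == 0)
            return pb->params[i].value;
    return NULL;
}

/* Returns 1 if the resource should be denied, 0 otherwise.
 * pb holds the directive's arguments (src, deny);
 * attrs holds the resource's attributes (for example, type). */
int exact_match_denies(const toy_pblock *pb, const toy_pblock *attrs)
{
    const char *src = toy_findval("src", pb);
    const char *deny = toy_findval("deny", pb);
    const char *actual;

    if (src == NULL || deny == NULL)
        return 0;                     /* missing arguments: nothing to do */
    actual = toy_findval(src, attrs); /* e.g. the value of "type" */
    return actual != NULL && strcmp(actual, deny) == 0;
}
```

With the directive above, pb would hold src=type and deny=text/plain, so a resource whose type attribute is text/plain is reported as denied and any other type is not.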
The three structures that are used to hold parameters are libcs_pb_param, libcs_pb_entry, and libcs_pblock. These structures are defined in the header file bin/compass/sdk/robot/include/libcs/pblock.h.
libcs_pb_param
This holds a single parameter. It records the name and value of the parameter.
typedef struct {
    char *name, *value;
} libcs_pb_param;
libcs_pb_entry
This creates linked lists of libcs_pb_param structures.
struct libcs_pb_entry {
    libcs_pb_param *param;
    struct libcs_pb_entry *next;
};
libcs_pblock
This is a hash table: an array of hsize pointers, each of which points to a linked list of libcs_pb_entry structures.
typedef struct {
    int hsize;
    struct libcs_pb_entry **ht;
} libcs_pblock;
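To show how these three structures fit together, here is a standalone sketch of a lookup and an insert over them. The struct declarations repeat the ones above so that the sketch compiles on its own (do not build it together with pblock.h). The toy_hash function and the toy_pblock_findval and toy_pblock_insert helpers are hypothetical; the real library's hashing and memory management are not documented here.

```c
#include <stddef.h>
#include <string.h>

/* These three types mirror the declarations shown above. */
typedef struct {
    char *name, *value;
} libcs_pb_param;

struct libcs_pb_entry {
    libcs_pb_param *param;
    struct libcs_pb_entry *next;
};

typedef struct {
    int hsize;
    struct libcs_pb_entry **ht;
} libcs_pblock;

/* Hypothetical hash function; the one used by libcs is not documented. */
static unsigned toy_hash(const char *name, int hsize)
{
    unsigned h = 0;
    while (*name)
        h = h * 31u + (unsigned char)*name++;
    return h % (unsigned)hsize;
}

/* Lookup: hash the name to pick a bucket, then walk that bucket's
 * linked list comparing names. Returns the value or NULL. */
char *toy_pblock_findval(const char *name, const libcs_pblock *pb)
{
    struct libcs_pb_entry *e = pb->ht[toy_hash(name, pb->hsize)];
    for (; e != NULL; e = e->next)
        if (strcmp(e->param->name, name) == 0)
            return e->param->value;
    return NULL;
}

/* Insert: push a new entry onto the front of its bucket's list.
 * The caller supplies both the parameter and the entry storage,
 * which keeps this sketch free of allocation. */
void toy_pblock_insert(libcs_pblock *pb, libcs_pb_param *param,
                       struct libcs_pb_entry *entry)
{
    unsigned b = toy_hash(param->name, pb->hsize);
    entry->param = param;
    entry->next = pb->ht[b];
    pb->ht[b] = entry;
}
```

A lookup costs one hash plus a walk of a single bucket's list, which is why keying the table on the parameter name makes finding a parameter cheap.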
libcs_pblock_findval(paramname, pblock) returns the value of the named parameter in the given parameter block, such as the RAF's input pblock, or NULL if the parameter is not present. (For a fully-fledged example, see RAF Definition Example.)
When adding, removing, editing, and creating name-value pairs for parameters, your robot application functions can use the functions in the pblock.h header file (in bin/compass/sdk/robot/include/libcs/).
The names of these functions are all prefixed by libcs_. These functions are similar to the non-prefixed functions documented in Chapter 4, "NSAPI Function Reference," of the NSAPI Programmer's Guide, which you can find at:
http://developer.netscape.com/docs/manuals/enterprise/nsapi/index.htm
For example, the function pblock_create described in the NSAPI Programmer's Guide has the same behavior as the function libcs_pblock_create in the bin/compass/sdk/robot/include/libcs/pblock.h header file.
The parameter manipulation functions are listed here. See the bin/compass/sdk/robot/include/libcs/pblock.h header file for full function signatures, with return types and arguments.
libcs_param_create
creates a parameter with the given name and value. If the name and value aren't NULL, they are copied and placed into a new libcs_pb_param structure.
libcs_param_free
frees a given parameter if it's non-NULL. It returns 1 if the parameter was non-NULL, and 0 if it was NULL. This function is also useful for error checking before using the libcs_pblock_remove function.
libcs_pblock_create
creates a new parameter block with a hash table of a chosen size, and returns the newly allocated parameter block.
libcs_pblock_free
frees a given parameter block and any entries inside it.
libcs_pblock_find
finds the entry with the given name in a pblock and returns that parameter; otherwise returns NULL.
libcs_pblock_findval
finds the entry with the given name in a pblock and returns its value; otherwise returns NULL.
libcs_pblock_remove
behaves like the libcs_pblock_find function, but also removes the entry from the pblock.
libcs_pblock_nninsert and libcs_pblock_nvinsert
both create a new parameter with a given name and value and insert it into a given parameter block. The libcs_pblock_nninsert function requires that the value be an integer, but the libcs_pblock_nvinsert function accepts a string.
libcs_pblock_pinsert
inserts a parameter into a parameter block.
libcs_pblock_str2pblock
scans the given string for parameter pairs in the format name=value or name="value", adds them to a pblock, and returns the number of parameters added.
libcs_pblock_pblock2str
places all of the parameters in the given parameter block into the given string. Each parameter is of the form name="value" and is separated by a space from any adjacent parameter.
int (*RobotAPIFn)(pblock *pb, CSFilter *csf, CSResource *csr);
In this signature, csr is a data structure that contains information about the resource being processed. The CSResource structure is defined in the header file robotapi.h. Each resource is described in SOIF syntax.
Objects in SOIF syntax have a schema name, an associated URL, and a set of attribute-value pairs. In the following SOIF example, the schema name is @DOCUMENT, the URL is http://developer.netscape.com/docs/manuals/htmlguid/index.htm, and the SOIF contains attribute-value pairs for title, author, and description.
@DOCUMENT { http://developer.netscape.com/docs/manuals/htmlguid/index.htm
title{18}: HTML Tag Reference
author{12}: Nikki Writer
description{37}: Reference to HTML tags and attributes
}
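The number in braces is the size of the attribute's value in bytes, so a reader can find the end of a value without relying on delimiters. The following minimal sketch parses one attribute line under that rule. It is not part of the SOIF API in soif.h (use SOIF_Findval and related functions in real RAFs). parse_soif_attr is a hypothetical name, and the sketch assumes a single space or tab after the colon.

```c
#include <stdlib.h>
#include <string.h>

/* Parses one SOIF attribute line of the form  name{size}: value
 * where size is the value's length in bytes. On success, copies the
 * name and value into the caller's buffers and returns 0. Returns -1
 * if the line is malformed or a buffer is too small. */
int parse_soif_attr(const char *line, char *name, size_t name_cap,
                    char *value, size_t value_cap)
{
    const char *brace = strchr(line, '{');
    char *end;
    unsigned long size;
    size_t name_len;

    if (brace == NULL)
        return -1;
    name_len = (size_t)(brace - line);
    size = strtoul(brace + 1, &end, 10);
    if (end == brace + 1 || end[0] != '}' || end[1] != ':')
        return -1;
    end += 2;                      /* skip "}:" */
    if (*end == ' ' || *end == '\t')
        end++;                     /* one separator before the value */
    if (name_len >= name_cap || size >= value_cap || strlen(end) < size)
        return -1;
    memcpy(name, line, name_len);
    name[name_len] = '\0';
    memcpy(value, end, size);      /* exactly size bytes, per the count */
    value[size] = '\0';
    return 0;
}
```

Because the copy is driven by the byte count, a wrong count in a SOIF record silently truncates or overruns the value, which is why the counts in the example must match the values exactly.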
The CSResource structure has a url field, which contains the URL for the SOIF. It also has an rd field, whose value is the SOIF for the resource. Once you have the SOIF for the resource, you can use the functions for working with SOIF that are defined in sdk/rdm/include/soif.h to get more information about the resource. (The file robotapi.h includes soif.h.)
For example, the macro SOIF_Findval(soif, attribute) gets the value of the given attribute in the given SOIF. The following sample code uses this macro to print the value of the META attribute if it exists for the resource being processed:
#include <stdio.h>
#include "robotapi.h"

int my_new_raf(libcs_pblock *pb, CSFilter *csf, CSResource *csr)
{
    char *metavalue;

    /* SOIF_Findval returns NULL if the resource has no "meta" attribute */
    if ((metavalue = (char *)SOIF_Findval(csr->rd, "meta")) != NULL)
        printf("The value of the META tag in the resource is %s\n", metavalue);

    /* rest of function ... */
    return REQ_PROCEED;
}
We encourage you to examine the CSResource structure in the file robotapi.h to see what other fields it has and to see the macros, such as CSResource_GetSource(), that you can use to get information from the resource and to set information in the resource description.
For more information about the routines for working directly with SOIF objects, see Chapter 11, "Using the SOIF API to Work with SOIF Objects."
Returning a Response Status Code
When your robot application function has finished whatever it needs to do, it must return a code that tells the server how to proceed with the request.
These codes are defined in the header file bin/compass/sdk/robot/include/robotapi.h. The codes are:
REQ_PROCEED
The function performed its task, so proceed with the request.
REQ_ABORTED
The entire request should be aborted because an error occurred.
REQ_NOACTION
The function performed no task, but proceed anyway.
REQ_EXIT
End the session and exit.
REQ_RESTART
Restart the entire request-response process.
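As a sketch of how a function might choose among these codes, the hypothetical choose_status helper below does no work when it has no input, aborts when it cannot complete its task, and otherwise proceeds. The numeric REQ_* values are placeholders for a standalone build only; real code must use the definitions from robotapi.h.

```c
#include <stddef.h>
#include <string.h>

/* Placeholder values for a standalone build. The real REQ_* constants
 * come from robotapi.h and may differ. */
#ifndef REQ_PROCEED
#define REQ_PROCEED   0
#define REQ_ABORTED  -1
#define REQ_NOACTION -2
#endif

/* Hypothetical task: copy input into buf.
 *   no input to work on         -> REQ_NOACTION (did nothing, carry on)
 *   task cannot be completed    -> REQ_ABORTED  (an error occurred)
 *   task done                   -> REQ_PROCEED */
int choose_status(const char *input, char *buf, size_t cap)
{
    if (input == NULL)
        return REQ_NOACTION;
    if (buf == NULL || strlen(input) >= cap)
        return REQ_ABORTED;
    strcpy(buf, input);            /* safe: length checked above */
    return REQ_PROCEED;
}
```

The distinction that matters is between REQ_NOACTION, which lets processing continue without implying that anything changed, and REQ_ABORTED, which stops the whole request.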
Reporting Errors to the Robot Log File
When problems occur, robot application functions should return an appropriate response status code (such as REQ_ABORTED) and should also log an error in the robot log file.
To use the error-logging functionality, you must include the files log.h and objlog.h in the sdk/robot/include/libcs directory. Check that this directory actually contains these files (some early releases of Compass Server 3.0 omitted them). If these files are not there, obtain copies and save them to your bin/compass/sdk/robot/include/libcs directory.
Once log.h and objlog.h exist in the correct place, you can use the cslog_error macro to report errors. The prototype is:
cslog_error(int n, int loglevel, char* errorMessage)
The first parameter is not currently used (it may be used in the future), so you can pass any integer. The second parameter is the log level. When the log level is less than or equal to the log level setting in the file process.conf, the error message is written to robot.log.
The third parameter is the error message to print. It takes the same form as the arguments to the standard printf() function, wrapped in an extra pair of parentheses, as the following examples show.
For example:
cslog_error(1, 1, ("fn=extract-html-text: Out of memory!\n"));
This invocation of cslog_error would generate the following error message in the robot log file:
[22/Jan/1998:15:57:31] 8270@0: ERROR: fn=extract-html-text: Out of memory!
For another example:
cslog_error(1, 1,
    ("<URL:%s>: Error %d (%d): %s\n",
     ep->eo->key,
     urls->server_status,
     status,
     (s = cslog_linestr(urls->error_msg))));
This invocation of cslog_error would generate the following error message in the robot log file:
[22/Jan/1998:15:57:31] 8270@0: ERROR: <URL:http://nikki.boots.com:80/>: Error 0 (-240): Can't connect to server
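The extra parentheses in the examples above are needed because the third macro argument is a complete printf()-style argument list. The following sketch shows that common pre-C99 pattern with a hypothetical toy_cslog_error macro. It is not the definition in log.h, and toy_configured_level stands in for the process.conf setting.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical threshold standing in for the process.conf log level. */
static int toy_configured_level = 1;

/* Last formatted message, kept so the behavior can be checked. */
static char toy_last_message[256];

/* printf-style writer that the macro forwards to. */
static void toy_log_printf(const char *fmt, ...)
{
    va_list ap;
    size_t len;

    strcpy(toy_last_message, "ERROR: ");
    len = strlen(toy_last_message);
    va_start(ap, fmt);
    vsnprintf(toy_last_message + len, sizeof toy_last_message - len, fmt, ap);
    va_end(ap);
    fputs(toy_last_message, stderr);
}

/* args is a whole parenthesized argument list, so "toy_log_printf args"
 * expands to an ordinary call. n is accepted but ignored, as the text
 * describes for cslog_error's first parameter. */
#define toy_cslog_error(n, level, args)          \
    do {                                         \
        (void)(n);                               \
        if ((level) <= toy_configured_level)     \
            toy_log_printf args;                 \
    } while (0)
```

For example, toy_cslog_error(1, 1, ("fn=%s: Out of memory!\n", "extract-html-text")) expands to a single toy_log_printf call with two arguments. Calls whose level is above the configured threshold write nothing.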
This example defines a robot application function, copy_mv, that sets the classification field of an RD. The copy_mv function allows the robot to get the value of an HTML <META> tag of any name and store the value in the classification field in the database. For example, using this function, you could instruct the robot to get the content of the <META NAME="topic"> tag and store it as the classification of the resource.
You would invoke this function with a directive such as:
Generate fn=copy_mv src=topic dst=classification
Here is the sample function definition:
/******** example robot application function ********/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "robotapi.h"
#include "pblock.h"
#include "log.h"
#include "objlog.h"

NSAPI_PUBLIC int copy_mv(libcs_pblock *pb, CSFilter *csf,
                         CSResource *csr)
{
    char *s, *mv, *mvp;

    /* Use the libcs_pblock_findval function to get the values of the
     * "src" and "dst" parameters, which were specified by the
     * directive that invoked this function */
    char *src = libcs_pblock_findval("src", pb);
    char *dst = libcs_pblock_findval("dst", pb);

    /* if either the src or dst has not been supplied,
     * log an error and return REQ_PROCEED */
    if (!src || !dst) {
        cslog_error(1, 1,
            ("<URL:%s>: Error: No source or destination available.\n",
             csr->url));
        return REQ_PROCEED;
    }

    /* If the current document does not have a META tag whose name
     * matches the src parameter, just return; otherwise put the
     * src value in the string s */
    /* The function SOIF_Findval(soif, attribute) is defined
     * in sdk/rdm/include/soif.h. It gets the value of the
     * given attribute from the given resource.
     * The rd in the CSResource is a SOIF that describes the resource.
     */
    if (!(s = (char *)SOIF_Findval(csr->rd, src)))
        return REQ_PROCEED;

    /* Now insert the string s into the
     * classification field of the RD */

    /* Deal with the possibility that the classification field
     * already has one or more values */
    if ((mv = libcs_pblock_findval(dst, csr->sources)) != NULL) {
        /* room for both values, the ';' separator, and the NUL */
        mvp = malloc(strlen(mv) + strlen(s) + 2);
        if (mvp == NULL) {
            cslog_error(1, 1, ("fn=copy_mv: Out of memory!\n"));
            return REQ_ABORTED;
        }
        /* append the new value to the existing values in the
         * classification field, separated by ';' */
        sprintf(mvp, "%s;%s", mv, s);
        libcs_pblock_nvinsert(dst, mvp, csr->sources);
        /* do some clean up */
        free(mvp);
    }
    /* if no values already exist, do a simple value insert */
    else {
        libcs_pblock_nvinsert(dst, s, csr->sources);
    }

    /* We're all done. Return a status code */
    return REQ_PROCEED;
}
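The value-joining step in copy_mv is easy to get wrong, because the +2 must cover both the ';' separator and the terminating NUL. Here it is isolated as a hypothetical helper, join_mv, that you can test on its own. It uses snprintf rather than sprintf so that the write is bounded by the allocated size.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated "old;new" string, or NULL if allocation
 * fails. The caller must free() the result. The buffer is sized for
 * both strings, the ';' separator, and the terminating NUL (the +2). */
char *join_mv(const char *old_values, const char *new_value)
{
    size_t len = strlen(old_values) + strlen(new_value) + 2;
    char *joined = malloc(len);

    if (joined != NULL)
        snprintf(joined, len, "%s;%s", old_values, new_value);
    return joined;
}
```

Because it returns NULL on allocation failure, a caller can follow the same pattern as copy_mv: log the error with cslog_error and return REQ_ABORTED.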
bin/compass/sdk/robot/include directory for an example. The UNIX makefile assumes the use of gmake.
magnus.conf configuration file.
The following table describes the commands used to link object files into a shared object under the various UNIX platforms. In these examples, the compiled object files t.o and u.o are linked to form a shared object called test.so.
Table 11.4 Options for linking
Loading Your Shared Object
The robot uses the filters defined in filter.conf to filter resources that it encounters. For the robot to use your customized robot application functions, filter.conf must load the shared object (or DLL) that contains the functions.
To load the shared object, add a line like the following to filter.conf:
For UNIX
Init fn=load-modules shlib=[path]filename.so funcs="function1,function2,...,functionN"
For Windows NT
Init fn=load-modules shlib=[path]filename.dll funcs="function1,function2,...,functionN"
This initialization function opens the given shared object file (or DLL) and loads the functions function1, function2, and so on. You can then use the functions function1 and function2 in the robot configuration file (filter.conf). Remember to use the functions only with the directives you wrote them for, as described in the following section.
Using your New Robot Application Functions
When you have compiled and arranged for the loading of your functions, you need to provide for their execution. All functions are called as follows:
Directive fn=function [name1=value1] ... [nameN=valueN]
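For example, suppose you have written a function called wordcount that can be used in the Data stage. This function counts the words in a resource and assigns the count to a destination specified by a parameter called dst. You could invoke it with a directive such as: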
Data fn=wordcount dst=word-count
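The counting logic of such a wordcount function can be sketched independently of the robot API. The count_words function below is hypothetical. This chapter does not say which field of the CSResource holds the downloaded text, so how the RAF obtains that text is left open. Words are counted as maximal runs of non-whitespace characters.

```c
#include <ctype.h>

/* Counts words as maximal runs of non-whitespace characters. */
long count_words(const char *text)
{
    long words = 0;
    int in_word = 0;

    for (; *text != '\0'; text++) {
        if (isspace((unsigned char)*text)) {
            in_word = 0;           /* whitespace ends the current word */
        } else if (!in_word) {
            in_word = 1;           /* first character of a new word */
            words++;
        }
    }
    return words;
}
```

Inside the RAF, you might format the count with sprintf() and store it under the name given by dst with libcs_pblock_nvinsert, as copy_mv does with csr->sources. Treat that destination as an assumption rather than documented behavior.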
Last Updated: 02/07/98 20:49:14
Any sample code included above is provided for your use on an "AS IS" basis, under the Netscape License Agreement - Terms of Use