Sun Java System Portal Server 7.1 Developer's Guide

Overview of Writing Robot Application Functions

When you write robot application functions, make sure that the file that defines your robot application functions includes robotapi.h. You will also find many useful functions in csinfo.h.

All robot application functions use parameter blocks (pblocks) to receive and set parameter values. A parameter block stores parameters as name-value pairs in a hash table that is keyed on the name portion of each parameter it contains.

RAF Prototype

All robot application functions have the following prototype:

int (*RobotAPIFn)(pblock *pb, CSFilter *csf, CSResource *csr);

pb is the parameter block containing the parameters for this function invocation.

csf is the pointer to an enumeration or generation filter.

csr is a data structure that contains information about the resource being processed.


Note –

The pb parameter is read-only, and any data modification should be performed on copies of the data. Doing otherwise is unsafe in threaded server architectures and will yield unpredictable results in multiprocess server architectures.
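The copy-before-modify rule in the note above can be sketched in plain C as follows. This is only an illustration: copy_and_uppercase is a hypothetical helper name, not part of the robot API, and the value it receives stands in for a string obtained from pb.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of the robot API): returns an
 * upper-cased copy of value and leaves the original untouched.
 * The caller owns the result and must free() it.
 * Returns NULL if memory cannot be allocated. */
char *copy_and_uppercase(const char *value)
{
    size_t len, i;
    char *copy;

    assert(value != NULL);
    len = strlen(value);
    copy = malloc(len + 1);
    if (copy == NULL)
        return NULL;
    for (i = 0; i < len; i++)
        copy[i] = (char)toupper((unsigned char)value[i]);
    copy[len] = '\0';
    return copy;
}
```

Because all changes are made to the copy, the string that came from the parameter block is never written to.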


Writing Functions for Specific Directives

You should write each function for a particular stage in the filtering process (setup, metadata, data, enumeration, generation, or shutdown). The function should use only the data sources that are available at the relevant stage. See the section Sources and Destinations (in the Administration Guide) for a list of the data sources available at each stage.

At the Setup stage, the filter is preparing for setup and cannot get information about the resource’s URL or content.

At the MetaData stage, the robot has encountered a URL for a resource but has not downloaded the resource’s content. Consequently, information is available about the URL and the data that is derived from other sources such as the filter.conf file. At this stage, information is not available about the content of the resource.

At the Data stage, the robot has downloaded the content of the URL, so information is available about the content, such as the description, the author, and so on.

At the Enumeration and Generation stages, the same data sources are available as for the Data stage.

At the Shutdown stage, the filter has completed its processes and shuts down. Although functions written for this stage can use the same data sources as those available at the Data stage, shutdown functions typically restrict their operations to shutdown and clean up activities.

Passing Parameters to Robot Application Functions

You must use parameter blocks (pblocks) to pass arguments into Robot Application Functions and to extract data from them. For example, the following directive (in the filter.conf file) invokes the filter-by-exact function.

Data fn=filter-by-exact src=type deny=text/plain

The fn parameter indicates the function to invoke, which in this case is filter-by-exact. The src and deny arguments are parameters used with the function. They will be passed to the function in a parameter block, and the function should be defined to extract its parameters and their values from the parameter block.

The three structures that are used to hold parameters are libcs_pb_param, libcs_pb_entry, and libcs_pblock. These structures are defined in the header file PortalServer-base/sdk/robot/include/libcs/pblock.h.

libcs_pb_param

This structure holds a single parameter. It records the name and value of the parameter:


typedef struct {
    char *name,*value;
} libcs_pb_param;

libcs_pb_entry

This structure creates linked lists of libcs_pb_param structures:


struct libcs_pb_entry {
    libcs_pb_param *param;
    struct libcs_pb_entry *next;
};

libcs_pblock

This structure is a hash table. Its ht field is an array of pointers to libcs_pb_entry lists:


typedef struct {
    int hsize;
    struct libcs_pb_entry **ht;
} libcs_pblock;
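To show how the three structures fit together, here is a minimal, self-contained sketch of a chained hash table built from them. The structure definitions repeat the ones above; everything prefixed with demo_ is a hypothetical stand-in written for this illustration, and the hash function in particular is an assumption. The real library may hash names differently and should be used through the libcs_ functions described later.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* The three structures, as defined above. */
typedef struct { char *name, *value; } libcs_pb_param;
struct libcs_pb_entry {
    libcs_pb_param *param;
    struct libcs_pb_entry *next;
};
typedef struct { int hsize; struct libcs_pb_entry **ht; } libcs_pblock;

/* Assumption: a simple multiplicative string hash. */
static unsigned demo_hash(const char *name, int hsize)
{
    unsigned h = 0;
    while (*name)
        h = h * 31u + (unsigned char)*name++;
    return h % (unsigned)hsize;
}

/* Returns a heap copy of s, or NULL if allocation fails. */
static char *demo_copy(const char *s)
{
    char *d = malloc(strlen(s) + 1);
    if (d)
        strcpy(d, s);
    return d;
}

/* Hypothetical: allocates a pblock with hsize empty buckets. */
libcs_pblock *demo_pblock_create(int hsize)
{
    libcs_pblock *pb;

    assert(hsize > 0);
    pb = malloc(sizeof *pb);
    if (!pb)
        return NULL;
    pb->hsize = hsize;
    pb->ht = calloc((size_t)hsize, sizeof *pb->ht);
    if (!pb->ht) {
        free(pb);
        return NULL;
    }
    return pb;
}

/* Hypothetical: copies name and value into a new entry at the head
 * of the bucket's list. Returns 0 on success, -1 on failure. */
int demo_pblock_insert(const char *name, const char *value, libcs_pblock *pb)
{
    struct libcs_pb_entry *e = malloc(sizeof *e);
    libcs_pb_param *p = malloc(sizeof *p);
    unsigned b = demo_hash(name, pb->hsize);

    if (p) {
        p->name = demo_copy(name);
        p->value = demo_copy(value);
    }
    if (!e || !p || !p->name || !p->value) {
        if (p) { free(p->name); free(p->value); }
        free(p);
        free(e);
        return -1;
    }
    e->param = p;
    e->next = pb->ht[b];
    pb->ht[b] = e;
    return 0;
}

/* Hypothetical: walks one bucket's list; returns the value or NULL. */
char *demo_pblock_findval(const char *name, libcs_pblock *pb)
{
    struct libcs_pb_entry *e = pb->ht[demo_hash(name, pb->hsize)];

    for (; e; e = e->next)
        if (strcmp(e->param->name, name) == 0)
            return e->param->value;
    return NULL;
}

/* Hypothetical: frees every entry, then the table itself. */
void demo_pblock_free(libcs_pblock *pb)
{
    int i;

    for (i = 0; pb && i < pb->hsize; i++) {
        while (pb->ht[i]) {
            struct libcs_pb_entry *e = pb->ht[i];
            pb->ht[i] = e->next;
            free(e->param->name);
            free(e->param->value);
            free(e->param);
            free(e);
        }
    }
    if (pb)
        free(pb->ht);
    free(pb);
}
```

A lookup hashes the name to pick one bucket and then compares names along that bucket's list, so parameters whose names collide can still be told apart.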

Working with Parameter Blocks

A parameter block stores parameters and values as name/value pairs. Many predefined functions are available for working with parameter blocks: extracting parameter values, changing parameter values, and so on. For example, libcs_pblock_findval(paramname, pblock) returns the value of the named parameter in the given pblock, such as the RAF’s input pblock. For an example, see RAF Definition Example.

When adding, removing, editing, and creating name-value pairs for parameters, your robot application functions can use the functions in the pblock.h header file (in the PortalServer-base/sdk/robot/include/libcs directory).

The names of these functions are all prefixed by libcs_.

The parameter manipulation functions and their descriptions are provided below. See the PortalServer-base/sdk/robot/include/libcs/pblock.h header file for full function signatures with return types and arguments.

libcs_param_create

Creates a parameter with the given name and value. If the name and value are not null, they are copied and placed into a new libcs_pb_param structure.

libcs_param_free

Frees a given parameter if it is non-NULL. It returns 1 if the parameter was non-NULL and 0 if it was NULL. This function is useful for error checking before using the libcs_pblock_remove function.

libcs_pblock_create

Creates a new parameter block with a hash table of a chosen size. Returns the newly allocated parameter block.

libcs_pblock_free

Frees a given parameter block and any entries inside it.

libcs_pblock_find

Finds the entry with the given name in a pblock and returns the parameter itself, otherwise returns NULL.

libcs_pblock_findval

Finds the entry with the given name in a pblock, and returns its value, otherwise returns NULL.

libcs_pblock_remove

Behaves like the libcs_pblock_find function, but in addition, it removes the entry from the pblock.

libcs_pblock_nninsert and libcs_pblock_nvinsert

These functions create a new parameter with a given name and value and insert it into a given parameter block. The libcs_pblock_nninsert function requires that the value be an integer, but the libcs_pblock_nvinsert function accepts a string.

libcs_pblock_pinsert

Inserts a parameter into a parameter block.

libcs_pblock_str2pblock

Scans the given string for parameter pairs in the format name=value or name="value", adds them to a pblock, and returns the number of parameters added.

libcs_pblock_pblock2str

Places all of the parameters in the given parameter block into the given string. Each parameter is of the form name="value" and is separated by a space from any adjacent parameter.
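To illustrate the name=value and name="value" formats mentioned for libcs_pblock_str2pblock, the following self-contained sketch scans a string for such pairs. It is not the library's implementation: demo_pair and demo_parse_pairs are hypothetical names, the fixed-size buffers are a simplification, and the handling of tokens without an equals sign is an assumption made for this example.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define DEMO_MAX_LEN 64

/* Hypothetical pair type used only by this sketch. */
typedef struct {
    char name[DEMO_MAX_LEN];
    char value[DEMO_MAX_LEN];
} demo_pair;

/* Scans str for name=value or name="value" pairs separated by
 * whitespace, stores up to max of them in out, and returns the
 * number stored. Tokens without '=' are skipped, and overlong
 * names or values are truncated to fit the fixed buffers. */
int demo_parse_pairs(const char *str, demo_pair *out, int max)
{
    int count = 0;

    assert(str != NULL && out != NULL);
    while (*str && count < max) {
        size_t n = 0, v = 0;

        while (isspace((unsigned char)*str))
            str++;
        if (!*str)
            break;

        /* name: everything up to '=' or whitespace */
        while (*str && *str != '=' && !isspace((unsigned char)*str)) {
            if (n < DEMO_MAX_LEN - 1)
                out[count].name[n++] = *str;
            str++;
        }
        out[count].name[n] = '\0';
        if (*str != '=')
            continue;            /* token without '=': skip it */
        str++;                   /* step over '=' */

        if (*str == '"') {       /* quoted value: read to closing quote */
            str++;
            while (*str && *str != '"') {
                if (v < DEMO_MAX_LEN - 1)
                    out[count].value[v++] = *str;
                str++;
            }
            if (*str == '"')
                str++;
        } else {                 /* bare value: read to whitespace */
            while (*str && !isspace((unsigned char)*str)) {
                if (v < DEMO_MAX_LEN - 1)
                    out[count].value[v++] = *str;
                str++;
            }
        }
        out[count].value[v] = '\0';
        count++;
    }
    return count;
}
```

Given the string fn=filter-by-exact src=type deny="text/plain", this sketch stores three pairs, with the quotation marks removed from the last value.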

Getting Information on the Processed Resource

As mentioned in RAF Prototype, the prototype for all robot application functions is in the following format:

int (*RobotAPIFn)(pblock *pb, CSFilter *csf, CSResource *csr);

where csr is a data structure that contains information about the resource being processed.

The CSResource structure is defined in the header file robotapi.h. This structure contains information about the resource being processed. Each resource is in Search syntax.

Objects in Search syntax have a schema name, an associated URL, and a set of attribute-value pairs.

In Example 22–1, the schema name is @DOCUMENT, the URL is http://developer.siroe.com/docs/manuals/htmlguid/index.htm, and the Search contains attribute-value pairs for title, author, and description.


Example 22–1 Search Syntax Example


@DOCUMENT{ http://developer.siroe.com/docs/manuals/htmlguid/index.htm
    title{18}: HTML Tag Reference
    author{11}: Preston Day
    description{37}: Reference to HTML tags and attributes
}
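In Example 22–1, the number in braces after each attribute name matches the length in bytes of the value that follows it (for instance, 18 for HTML Tag Reference). Assuming that this is the general rule, the following sketch formats one attribute line. demo_format_attr is a hypothetical helper written for this illustration and is not part of search.h.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes "name{len}: value" into buf, where len
 * is the byte length of value. Returns 0 on success, or -1 if buf is
 * too small to hold the whole line. */
int demo_format_attr(char *buf, size_t size,
                     const char *name, const char *value)
{
    int written;

    assert(buf != NULL && name != NULL && value != NULL);
    written = snprintf(buf, size, "%s{%lu}: %s",
                       name, (unsigned long)strlen(value), value);
    return (written < 0 || (size_t)written >= size) ? -1 : 0;
}
```

Called with the name title and the value HTML Tag Reference, the helper produces the same line as the title attribute in the example.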

A CSResource structure has a url field, which contains the URL for the Search. It also has an rd field, whose value is the Search for the resource. Once you get the Search for the resource, you can use the functions for working with Search that are defined in the PortalServer-base/sdk/rdm/include/search.h file to get more information about the resource. (The file robotapi.h includes search.h.)

For example, the macro Search_Findval(search, attribute) gets the value of the given attribute in the given Search. Example 22–2 uses this macro to print the value of the META attribute if it exists for the resource being processed.


Example 22–2 Search_Findval Macro Example


int my_new_raf(libcs_pblock *pb, CSFilter *csf, CSResource *csr)
{
    char *metavalue;

    /* Print the META value only if the attribute exists */
    if ((metavalue = (char *)Search_Findval(csr->rd, "meta")) != NULL)
        printf("The value of the META tag in the resource is %s\n", metavalue);
    /* rest of function ... */
}

It is recommended that you review the CSResource structure in the file robotapi.h for more information on other fields and macros. For more information about the routines to use with Search objects, see Memory Buffer Management.

Returning a Response Status Code

When your robot application function has finished processing, it must return a code that tells the server how to proceed with the request.

These codes are defined in the header file PortalServer-base/sdk/robot/include/robotapi.h. The response status codes and their descriptions are:

REQ_PROCEED

The function performed its task, so proceed with the request.

REQ_ABORTED

The entire request should be aborted because an error occurred.

REQ_NOACTION

The function performed no task, but proceed anyway.

REQ_EXIT

End the session and exit.

REQ_RESTART

Restart the entire request-response process.

Reporting Errors to the Robot Log File

When problems occur, robot application functions should return an appropriate response status code (such as REQ_ABORTED), and they should also log an error in the error log file.

To use the error-logging functionality, you must include the file log.h, which is in the PortalServer-base/sdk/robot/include/libcs directory.

After you have ensured that log.h exists in the correct place, you can use the cslog_error macro to report errors. The prototype is in the following format:

cslog_error(int n, int loglevel, char* errorMessage)

The first parameter is not currently used, although it might be used in the future. You can pass any integer.

The second parameter is the log level. When the log level is less than or equal to the log level setting in the file process.conf, the error message is written to the robot.log file.

The third parameter is the error message to print. It has the same form as the argument list of the standard printf() function and, as the examples below show, is enclosed in its own set of parentheses.

For example:


cslog_error(1, 1, ("fn=extract-html-text: Out of memory!\n"));

This invocation of cslog_error would generate the following error message in the robot log file:


[22/Jan/1998:15:57:31]  8270@0: ERROR: fn=extract-html-text: Out of memory!

For another example:


cslog_error(1, 1,
    ("<URL:%s>: Error %d (%d): %s\n",
    ep->eo->key,
    urls->server_status,
    status,
    (s = cslog_linestr(urls->error_msg))));

This invocation of cslog_error would generate the following error message in the robot log file:


[22/Mar/2002:15:57:31]  8270@0: ERROR: <URL:http://budgie.siroe.com:80/>:
Error 0 (-240): Can’t connect to server
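The extra parentheses around the message in these calls are consistent with a common C technique for giving a macro a printf-style argument list. The sketch below shows that technique together with the log-level test described above. It is an assumption about the general approach, not the definition found in log.h: DEMO_LOG_ERROR, demo_log_write, demo_log_level, and demo_last_message are hypothetical names, and the message is captured in a buffer rather than written to robot.log.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Assumptions for this sketch: a global threshold that stands in for
 * the log level setting in process.conf, and a buffer that stands in
 * for the robot.log file. */
static int demo_log_level = 1;
static char demo_last_message[256];

/* printf-style sink: formats the message into demo_last_message. */
static int demo_log_write(const char *fmt, ...)
{
    va_list ap;
    int n;

    assert(fmt != NULL);
    va_start(ap, fmt);
    n = vsnprintf(demo_last_message, sizeof demo_last_message, fmt, ap);
    va_end(ap);
    return n;
}

/* Hypothetical macro with the same call shape as cslog_error: the
 * third argument is a complete, parenthesized printf argument list,
 * so placing it after the function name forms an ordinary call.
 * The first argument is ignored. */
#define DEMO_LOG_ERROR(n, level, args)          \
    do {                                        \
        (void)(n);                              \
        if ((level) <= demo_log_level)          \
            demo_log_write args;                \
    } while (0)
```

With demo_log_level set to 1, a call at level 1 records its message and a call at level 2 is suppressed.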

RAF Definition Example

This section shows an example definition for a robot application function.

This function copies specified source data to a multi-valued field in an RD. For example, the search engine stores category or classification information in the classification field of an RD. The copy_mv function allows the robot to get the value of an HTML <META> tag of any name and store the value in the classification field in the database. For example, using this function, you could instruct the robot to get the content of the <META NAME="topic"> tag and store it as the classification of the resource.

You would invoke this function with a directive such as the following:

Generate fn=copy_mv src=topic dst=classification

Example 22–3 shows a sample function definition.


Example 22–3 Robot Application Function Example


/******** example robot application function ********/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "robotapi.h"
#include "pblock.h"
#include "log.h"
#include "objlog.h"

NSAPI_PUBLIC int copy_mv(libcs_pblock *pb, CSFilter *csf,
                         CSResource *csr)
{
    char *s, *mv, *mvp;

    /* Use the libcs_pblock_findval function to get the values of the
     * "src" and "dst" parameters, which were specified by the
     * directive that invoked this function */

    char *src = libcs_pblock_findval("src", pb);
    char *dst = libcs_pblock_findval("dst", pb);

    if (!src || !dst) {
        cslog_error(1, 1,
            ("<URL:%s>: Error: No source or destination available.\n",
            csr->url));
        return REQ_PROCEED;
    }

    /* If the current document does not have a META tag whose name
     * matches the src parameter, just return, otherwise put the
     * src value in the string s */

    /* The function Search_Findval(search, attribute) is defined
     * in sdk/rdm/include/search.h. It gets the value of the
     * given attribute from the given resource.
     * The rd in the CSResource is a search that describes the resource
     */

    if (!(s = (char *)Search_Findval(csr->rd, src)))
        return REQ_PROCEED;

    /* Now insert the string s into the
     * Classification field of the RD */
    /* Deal with possibility that the classification field
     * already has one or more values */
    if ((mv = libcs_pblock_findval(dst, csr->sources)) != NULL) {
        /* Allocate room for both values, the ';' separator,
         * and the terminating null before writing to mvp */
        mvp = malloc(strlen(mv) + strlen(s) + 2);
        if (!mvp) {
            cslog_error(1, 1, ("fn=copy_mv: Out of memory!\n"));
            return REQ_ABORTED;
        }
        /* append the new value to the existing values in the
         * classification field, separated by ';' */
        sprintf(mvp, "%s;%s", mv, s);
        libcs_pblock_nvinsert(dst, mvp, csr->sources);
        /* do some clean up */
        free(mvp);
    }

    /* if no values already exist, do a simple value insert */
    else {
        libcs_pblock_nvinsert(dst, s, csr->sources);
    }

    /* We're all done. Return a status code */
    return REQ_PROCEED;
}