Sun Java System Portal Server 6 2004Q2 Developer's Guide 

Chapter 19
Writing New Robot Application Functions



Introduction

When you write robot application functions, you must make sure that the file that defines your robot application functions includes robotapi.h. You will also find many useful functions in csinfo.h.

All robot application functions use parameter blocks (pblocks) to receive and set parameter values. A parameter block stores parameters as name-value pairs in a hash table that is keyed on the name portion of each parameter.

RAF Prototype

All robot application functions have the following prototype:

int (*RobotAPIFn)(pblock *pb, CSFilter *csf, CSResource *csr);

pb is the parameter block containing the parameters for this function invocation.

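csf is the pointer to an enumeration or generation filter.

csr is the pointer to a data structure that contains information about the resource being processed.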


Note

The pb parameter is read-only, and any data modification should be performed on copies of the data. Doing otherwise is unsafe in threaded server architectures and will yield unpredictable results in multiprocess server architectures.


Writing Functions for Specific Directives

You should write each function for a particular stage in the filtering process (setup, metadata, data, enumeration, generation, or shutdown). The function should use only the data sources that are available at that stage. See the section Sources and Destinations (in the Portal Server Administration Guide) for a list of the data sources available at each stage.

At the Setup stage, the filter is preparing for setup and cannot get information about the resource’s URL or content.

At the MetaData stage, the robot has encountered a URL for a resource but has not downloaded the resource’s content. Consequently, information is available about the URL and the data that is derived from other sources such as the filter.conf file. At this stage, information is not available about the content of the resource.

At the Data stage, the robot has downloaded the content of the URL, so information is available about the content, such as the description, the author, and so on.

At the Enumeration and Generation stages, the same data sources are available as for the Data stage.

At the Shutdown stage, the filter has completed its processes and shuts down. Although functions written for this stage can use the same data sources as those available at the Data stage, shutdown functions typically restrict their operations to shutdown and clean up activities.

Passing Parameters to Robot Application Functions

You must use parameter blocks (pblocks) to pass arguments into Robot Application Functions and to extract data from them. For example, the following directive (in the filter.conf file) invokes the filter-by-exact function.

Data fn=filter-by-exact src=type deny=text/plain

The fn parameter indicates the function to invoke, which in this case is filter-by-exact. The src and deny arguments are parameters used with the function. They will be passed to the function in a parameter block, and the function should be defined to extract its parameters and their values from the parameter block.
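As a rough illustration of how a directive line breaks down into name-value pairs, the following sketch splits a line such as the one above into its directive name and parameters. It is not the robot's own parser: the type demo_param and the function demo_parse_directive are hypothetical names introduced here, fixed-size buffers are assumed, and quoted values are not handled.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for one name-value pair (not an SDK type). */
typedef struct {
    char name[64];
    char value[128];
} demo_param;

/* Splits "Directive name1=value1 name2=value2 ..." into pairs.
 * Copies the directive name into `directive` (at least 64 bytes) and
 * stores up to `max` pairs in `out`. Returns the number of pairs stored,
 * or -1 if the line is empty or a token has no '='. */
int demo_parse_directive(const char *line, char *directive,
                         demo_param *out, int max)
{
    char buf[512];
    char *tok;
    int n = 0;

    /* strtok modifies its input, so work on a private copy */
    strncpy(buf, line, sizeof(buf) - 1);
    buf[sizeof(buf) - 1] = '\0';

    tok = strtok(buf, " \t\n");
    if (tok == NULL)
        return -1;
    snprintf(directive, 64, "%s", tok);

    while (n < max && (tok = strtok(NULL, " \t\n")) != NULL) {
        char *eq = strchr(tok, '=');
        if (eq == NULL)
            return -1;
        *eq = '\0';                 /* split the token at the first '=' */
        snprintf(out[n].name, sizeof(out[n].name), "%s", tok);
        snprintf(out[n].value, sizeof(out[n].value), "%s", eq + 1);
        n++;
    }
    return n;
}
```

For the directive shown above, this sketch yields the directive name Data and three pairs: fn, src, and deny. In a real robot application function the pairs arrive already parsed in the parameter block, so code like this is needed only for experimentation outside the robot.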

The three structures that are used to hold parameters are libcs_pb_param, libcs_pb_entry, and libcs_pblock. These structures are defined in the header file portal-server-install-root/SUNWps/sdk/robot/include/libcs/pblock.h.

libcs_pb_param

This structure holds a single parameter. It records the name and value of the parameter:

typedef struct {
    char *name,*value;
} libcs_pb_param;

libcs_pb_entry

This structure creates linked lists of libcs_pb_param structures:

struct libcs_pb_entry {
    libcs_pb_param *param;
    struct libcs_pb_entry *next;
};

libcs_pblock

This structure is a hash-table containing an array of libcs_pb_entry structures:

typedef struct {
    int hsize;
    struct libcs_pb_entry **ht;
} libcs_pblock;
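To make the relationship between the three structures concrete, the following sketch builds a small hash table with chained entries and looks a value up by name. The structure definitions are repeated from this section so that the sketch compiles on its own. Everything prefixed with demo_ is hypothetical, including the hash function, which is an assumption because the SDK's actual hashing scheme is not documented here. Production code should use the libcs_ functions from pblock.h instead.

```c
#include <stdlib.h>
#include <string.h>

/* Structures as shown in this section (normally provided by pblock.h). */
typedef struct {
    char *name, *value;
} libcs_pb_param;

struct libcs_pb_entry {
    libcs_pb_param *param;
    struct libcs_pb_entry *next;
};

typedef struct {
    int hsize;
    struct libcs_pb_entry **ht;
} libcs_pblock;

/* Assumed hash function: maps a name to a bucket index below hsize. */
static unsigned demo_hash(const char *name, int hsize)
{
    unsigned h = 0;
    while (*name)
        h = h * 31u + (unsigned char)*name++;
    return h % (unsigned)hsize;
}

static char *demo_strdup(const char *s)
{
    char *copy = malloc(strlen(s) + 1);
    if (copy != NULL)
        strcpy(copy, s);
    return copy;
}

/* Allocates an empty block with hsize buckets; returns NULL on failure. */
libcs_pblock *demo_pblock_create(int hsize)
{
    libcs_pblock *pb;

    if (hsize <= 0 || (pb = malloc(sizeof(*pb))) == NULL)
        return NULL;
    pb->hsize = hsize;
    pb->ht = calloc((size_t)hsize, sizeof(*pb->ht));
    if (pb->ht == NULL) {
        free(pb);
        return NULL;
    }
    return pb;
}

/* Copies name and value into a new entry at the head of the bucket's
 * chain. Returns 0 on success, -1 on allocation failure. */
int demo_pblock_insert(libcs_pblock *pb, const char *name, const char *value)
{
    unsigned i = demo_hash(name, pb->hsize);
    struct libcs_pb_entry *e = malloc(sizeof(*e));
    libcs_pb_param *p = malloc(sizeof(*p));
    char *n = demo_strdup(name);
    char *v = demo_strdup(value);

    if (e == NULL || p == NULL || n == NULL || v == NULL) {
        free(e); free(p); free(n); free(v);
        return -1;
    }
    p->name = n;
    p->value = v;
    e->param = p;
    e->next = pb->ht[i];
    pb->ht[i] = e;
    return 0;
}

/* Walks the chain in the name's bucket; returns the value or NULL. */
const char *demo_pblock_findval(const char *name, const libcs_pblock *pb)
{
    struct libcs_pb_entry *e = pb->ht[demo_hash(name, pb->hsize)];

    for (; e != NULL; e = e->next)
        if (strcmp(e->param->name, name) == 0)
            return e->param->value;
    return NULL;
}
```

With only two buckets and three parameters, at least two names share a chain, which is the case the next pointer in libcs_pb_entry exists to handle. Freeing the block is omitted to keep the sketch short.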

Working with Parameter Blocks

A parameter block stores parameters and values as name-value pairs. There are many predefined functions that you can use to work with parameter blocks, for example to extract or change parameter values. For instance, libcs_pblock_findval(paramname, pblock) returns the value of the named parameter in the given pblock, such as the RAF's input pblock. For an example, see RAF Definition Example.

When adding, removing, editing, and creating name-value pairs for parameters, your robot application functions can use the functions in the pblock.h header file (in the portal-server-install-root/SUNWps/sdk/robot/include/libcs directory).

The names of these functions are all prefixed by libcs_.

The following list names each parameter manipulation function, followed by its description. See the portal-server-install-root/SUNWps/sdk/robot/include/libcs/pblock.h header file for full function signatures with return types and arguments.

libcs_param_create

Creates a parameter with the given name and value. If the name and value are not null, they are copied and placed into a new pb_param structure.

libcs_param_free

Frees a given parameter if it is non-NULL. It returns 1 if the parameter was non-NULL and 0 if it was NULL. This function is useful for error checking before using the libcs_pblock_remove function.

libcs_pblock_create

Creates a new parameter block with a hash table of a chosen size. Returns the newly allocated parameter block.

libcs_pblock_free

Frees a given parameter block and any entries inside it.

libcs_pblock_find

Finds the entry with the given name in a pblock and returns its value, otherwise returns NULL.

libcs_pblock_findval

Finds the entry with the given name in a pblock, and returns its value, otherwise returns NULL.

libcs_pblock_remove

Behaves like the libcs_pblock_find function, but in addition, it removes the entry from the pblock.

libcs_pblock_nninsert and libcs_pblock_nvinsert

These functions create a new parameter with a given name and value and insert it into a given parameter block. The libcs_pblock_nninsert function requires that the value be an integer, but the libcs_pblock_nvinsert function accepts a string.

libcs_pblock_pinsert

Inserts a parameter into a parameter block.

libcs_pblock_str2pblock

Scans the given string for parameter pairs in the format name=value or name="value", adds them to a pblock, and returns the number of parameters added.

libcs_pblock_pblock2str

Places all of the parameters in the given parameter block into the given string. Each parameter is of the form name="value" and is separated by a space from any adjacent parameter.

Getting Information on the Processed Resource

As mentioned in RAF Prototype, the prototype for all robot application functions is in the following format:

int (*RobotAPIFn)(pblock *pb, CSFilter *csf, CSResource *csr);

where csr is a data structure that contains information about the resource being processed.

The CSResource structure is defined in the header file robotapi.h. This structure contains information about the resource being processed. Each resource is in SOIF syntax.

Objects in SOIF syntax have a schema name, an associated URL, and a set of attribute-value pairs.

In Code Example 19-1, the schema name is @DOCUMENT, the URL is http://developer.siroe.com/docs/manuals/htmlguid/index.htm, and the SOIF contains attribute-value pairs for title, author, and description. The number in braces after each attribute name is the length of the value.

Code Example 19-1  SOIF Syntax Example  

@DOCUMENT{ http://developer.siroe.com/docs/manuals/htmlguid/index.htm
    title{18}: HTML Tag Reference
    author{11}: Preston Day
    description{37}: Reference to HTML tags and attributes
}
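The value lengths in braces can be reproduced with a few lines of C. The following sketch writes a record in the layout of Code Example 19-1 from a list of attribute-value pairs. The type demo_avpair and the function demo_soif_format are hypothetical helpers written for this illustration and are not part of soif.h. The sketch assumes that each length is the byte count reported by strlen.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper type: one attribute-value pair. */
typedef struct {
    const char *attribute;
    const char *value;
} demo_avpair;

/* Writes "@SCHEMA{ url", one "attribute{length}: value" line per pair,
 * and a closing "}" into buf. Returns the number of characters written,
 * or -1 if buf is too small. */
int demo_soif_format(char *buf, size_t size, const char *schema,
                     const char *url, const demo_avpair *pairs, int npairs)
{
    size_t used;
    int i, n;

    n = snprintf(buf, size, "@%s{ %s\n", schema, url);
    if (n < 0 || (size_t)n >= size)
        return -1;
    used = (size_t)n;

    for (i = 0; i < npairs; i++) {
        n = snprintf(buf + used, size - used, "    %s{%lu}: %s\n",
                     pairs[i].attribute,
                     (unsigned long)strlen(pairs[i].value),
                     pairs[i].value);
        if (n < 0 || (size_t)n >= size - used)
            return -1;
        used += (size_t)n;
    }

    n = snprintf(buf + used, size - used, "}\n");
    if (n < 0 || (size_t)n >= size - used)
        return -1;
    return (int)(used + (size_t)n);
}
```

Given the title, author, and description values from Code Example 19-1, the sketch produces the same lengths of 18, 11, and 37.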

A CSResource structure has a url field, which contains the URL for the SOIF. It also has an rd field, whose value is the SOIF for the resource. Once you get the SOIF for the resource, you can use the functions for working with SOIF that are defined in the portal-server-install-root/SUNWps/sdk/rdm/include/soif.h file to get more information about the resource. (The file robotapi.h includes soif.h.)

For example, the macro SOIF_Findval(soif, attribute) gets the value of the given attribute in the given SOIF. Code Example 19-2 uses this macro to print the value of the META attribute if it exists for the resource being processed:

Code Example 19-2  SOIF_Findval Macro Example  

int my_new_raf(libcs_pblock *pb, CSFilter *csf, CSResource *csr)
{
    char *metavalue;

    /* print the META value only if the attribute exists */
    if ((metavalue = (char *)SOIF_Findval(csr->rd, "meta")) != NULL)
        printf("The value of the META tag in the resource is %s\n", metavalue);

    /* rest of function ... */
}

Review the CSResource structure in the file robotapi.h for more information on other fields and macros. For more information about the routines to use with SOIF objects, see Chapter 9, "Extending the Container Providers."

Returning a Response Status Code

When your robot application function has finished processing, it must return a code that tells the server how to proceed with the request.

These codes are defined in the header file portal-server-install-root/SUNWps/sdk/robot/include/robotapi.h. The following list names each response status code, followed by its description:

REQ_PROCEED

The function performed its task, so proceed with the request.

REQ_ABORTED

The entire request should be aborted because an error occurred.

REQ_NOACTION

The function performed no task, but proceed anyway.

REQ_EXIT

End the session and exit.

REQ_RESTART

Restart the entire request-response process.
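The following sketch shows one way a function might choose among the first three codes, based only on the descriptions above. It is an illustration, not SDK code: the enum declares stand-in constants with placeholder values, because the real values come from robotapi.h and are not given in this chapter, and demo_copy_value is a hypothetical helper.

```c
#include <stddef.h>
#include <string.h>

/* Stand-ins with placeholder values. Real code gets these constants
 * from robotapi.h and must not redefine them. */
enum {
    REQ_PROCEED,
    REQ_ABORTED,
    REQ_NOACTION,
    REQ_EXIT,
    REQ_RESTART
};

/* Hypothetical helper: copies `value` into `out` on behalf of a
 * directive that names a source (src) and a destination (dst).
 *   - missing src or dst, or value too long: an error, REQ_ABORTED
 *   - no value found for the source: nothing done, REQ_NOACTION
 *   - value copied: REQ_PROCEED */
int demo_copy_value(const char *src, const char *dst, const char *value,
                    char *out, size_t outsize)
{
    if (src == NULL || dst == NULL)
        return REQ_ABORTED;

    if (value == NULL)
        return REQ_NOACTION;

    if (strlen(value) >= outsize)
        return REQ_ABORTED;

    strcpy(out, value);     /* safe: length was checked above */
    return REQ_PROCEED;
}
```

Whether a missing parameter should abort the request or merely be logged is a design decision for each function. The example in RAF Definition Example logs the problem and returns REQ_PROCEED instead.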

Reporting Errors to the Robot Log File

When problems occur, robot application functions should return an appropriate response status code (such as REQ_ABORTED), and they should also log an error in the error log file.

To use the error-logging functionality, you must include the file log.h, which is in the portal-server-install-root/SUNWps/sdk/robot/include/libcs directory.

After you have included log.h, you can use the cslog_error macro to report errors. The prototype is in the following format:

cslog_error(int n, int loglevel, char* errorMessage)

The first parameter is not currently used but may be used in the future. You can pass any integer.

The second parameter is the log level. When this value is less than or equal to the log level set in the process.conf file, the error message is written to the robot.log file.

The third parameter is the error message to print. It is a parenthesized argument list that has the same form as the arguments to the standard printf() function.

For example:

cslog_error(1, 1, ("fn=extract-html-text: Out of memory!\n"));

This invocation of cslog_error would generate the following error message in the robot log file:

[22/Jan/1998:15:57:31] 8270@0: ERROR: fn=extract-html-text: Out of memory!

For another example:

cslog_error(1, 1,
    ("<URL:%s>: Error %d (%d): %s\n",
    ep->eo->key,
    urls->server_status,
    status,
    (s = cslog_linestr(urls->error_msg))));

This invocation of cslog_error would generate the following error message in the robot log file:

[22/Mar/2002:15:57:31] 8270@0: ERROR: <URL:http://budgie.siroe.com:80/>: Error 0 (-240): Can’t connect to server
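The doubled parentheses around the message arguments are a common C idiom for passing a variable-length argument list through a macro that takes a fixed number of parameters. The following sketch shows how such a macro could work, together with the log level test described above. It is an assumption about the technique, not the actual definition of cslog_error. All demo_ names are hypothetical, and the message is captured in a buffer rather than written to robot.log so that the effect is easy to inspect.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical stand-ins: in the robot, the threshold comes from
 * process.conf and messages go to the robot log file. */
static int demo_loglevel = 1;
static char demo_last_message[256];

/* Formats printf-style arguments into demo_last_message. */
static void demo_log_write(const char *fmt, ...)
{
    va_list ap;

    va_start(ap, fmt);
    vsnprintf(demo_last_message, sizeof(demo_last_message), fmt, ap);
    va_end(ap);
}

/* `args` is a complete parenthesized argument list, so pasting it after
 * the function name forms an ordinary call with any number of
 * arguments. The first parameter is accepted and ignored. */
#define demo_log_error(n, level, args)      \
    do {                                    \
        (void)(n);                          \
        if ((level) <= demo_loglevel)       \
            demo_log_write args;            \
    } while (0)
```

With the threshold set to 1, demo_log_error(1, 1, ("fn=%s: Out of memory!\n", "extract-html-text")) records the message, while the same call with a level of 2 records nothing.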

RAF Definition Example

This section shows an example definition for a robot application function.

This function copies specified source data to a multi-valued field in an RD. For example, the search engine stores category or classification information in the classification field of an RD. The copy_mv function allows the robot to get the value of an HTML <META> tag of any name and store the value in the classification field in the database. For example, using this function, you could instruct the robot to get the content of the <META NAME="topic"> tag and store it as the classification of the resource.

You would invoke this function with a directive such as the following:

Generate fn=copy_mv src=topic dst=classification

Code Example 19-3 shows a sample function definition:

Code Example 19-3  Robot Application Function Example  

/******** example robot application function ********/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "robotapi.h"
#include "pblock.h"
#include "log.h"
#include "objlog.h"

NSAPI_PUBLIC int copy_mv(libcs_pblock *pb, CSFilter *csf,
                         CSResource *csr)
{
    char *s, *mv, *mvp;

    /* Use the libcs_pblock_findval function to get the values of the
     * "src" and "dst" parameters, which were specified by the
     * directive that invoked this function */
    char *src = libcs_pblock_findval("src", pb);
    char *dst = libcs_pblock_findval("dst", pb);

    if (!src || !dst) {
        cslog_error(1, 1,
            ("<URL:%s>: Error: No source or destination available.\n",
            csr->url));
        return REQ_PROCEED;
    }

    /* If the current document does not have a META tag whose name
     * matches the src parameter, just return, otherwise put the
     * src value in the string s */
    /* The function SOIF_Findval(soif, attribute) is defined
     * in sdk/rdm/include/soif.h. It gets the value of the
     * given attribute from the given resource.
     * The rd in the CSResource is a soif that describes the resource
     */
    if (!(s = (char *)SOIF_Findval(csr->rd, src)))
        return REQ_PROCEED;

    /* Now insert the string s into the
     * Classification field of the RD */
    /* Deal with possibility that the classification field
     * already has one or more values */
    if ((mv = libcs_pblock_findval(dst, csr->sources)) != NULL) {
        /* allocate room for both values, the ';' separator,
         * and the terminating null before writing into mvp */
        mvp = malloc(strlen(mv) + strlen(s) + 2);
        if (mvp == NULL) {
            cslog_error(1, 1, ("fn=copy_mv: Out of memory!\n"));
            return REQ_ABORTED;
        }
        /* append the new value to the existing values in the
         * classification field, separated by ';' */
        sprintf(mvp, "%s;%s", mv, s);
        libcs_pblock_nvinsert(dst, mvp, csr->sources);
        /* do some clean up */
        free(mvp);
    }
    /* if no values already exist, do a simple value insert */
    else {
        libcs_pblock_nvinsert(dst, s, csr->sources);
    }

    /* We're all done. Return a status code */
    return REQ_PROCEED;
}
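The step that is easiest to get wrong in a function like copy_mv is sizing and filling the buffer for the combined value. The following standalone sketch isolates that step so that it can be tested outside the robot. The helper demo_append_value is hypothetical and is not part of the SDK. It assumes, as in the example above, that multiple values are separated by a semicolon.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated "existing;added"
 * string, or a copy of `added` when `existing` is NULL or empty.
 * Returns NULL on allocation failure. The caller frees the result. */
char *demo_append_value(const char *existing, const char *added)
{
    size_t len;
    char *joined;

    if (existing == NULL || existing[0] == '\0') {
        len = strlen(added) + 1;
        joined = malloc(len);
        if (joined != NULL)
            memcpy(joined, added, len);
        return joined;
    }

    /* existing + ';' + added + terminating null */
    len = strlen(existing) + 1 + strlen(added) + 1;
    joined = malloc(len);
    if (joined != NULL)
        snprintf(joined, len, "%s;%s", existing, added);
    return joined;
}
```

Allocating before formatting, checking the result of malloc, and using snprintf with the computed length keep the write within the buffer even if the length calculation is later changed.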


Compiling and Linking your Code

You can compile your code with any ANSI C compiler. See the makefile in the portal-server-install-root/SUNWps/sdk/robot/example directory for an example. The makefile assumes the use of gmake.

This section lists the linking options you need to use to create a shared object that the robot can be instructed to load by commands in the filter.conf configuration file. Note that you can link object files into a shared object. In Table 19-1, the compiled object files t.o and u.o are linked to form a shared object called test.so.

Table 19-1  Options for linking

System      Link command
Solaris     ld -G t.o u.o -o test.so


Loading Your Shared Object

The robot uses the filters defined in filter.conf to filter resources that it encounters. If the file filter.conf uses your customized robot application functions, it must load the shared object that contains the functions.

To load the shared object, add a line to filter.conf:

Init fn=load-modules shlib=[path]filename.so funcs="function1,function2,...,functionN"

This initialization function opens the given shared object file and loads the functions function1, function2, and so on. You can then use the functions function1 and function2 in the robot configuration file (filter.conf). Remember to use the functions only with the directives you wrote them for, as described in the following section.


Using your New Robot Application Functions

When you have compiled and arranged for the loading of your functions, you need to provide for their execution. All functions are called as follows:

Directive fn=function [name1=value1] ... [nameN=valueN]

The directive name and the fn parameter are mandatory. In addition, there may be an arbitrary number of function-specific parameters, each of which is a name-value pair.

You will need to specify your function in the directive for which it was written. For example, the following line uses a plug-in function called wordcount that can be used in the Data stage. This function counts the words in a resource and assigns the count to a destination specified by a parameter called dst.

Data fn=wordcount dst=word-count
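The wordcount function itself is not shown in this chapter. As a rough idea of its core logic, the following sketch counts whitespace-separated words in a string and formats the count as text, since parameter block values are strings. Both helpers are hypothetical. How the resource content reaches the function, and how the result is stored under the dst name, are left out because they depend on the SDK.

```c
#include <ctype.h>
#include <stdio.h>

/* Counts whitespace-separated words in text. */
unsigned long demo_count_words(const char *text)
{
    unsigned long count = 0;
    int in_word = 0;

    for (; *text != '\0'; text++) {
        if (isspace((unsigned char)*text)) {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;    /* first character of a new word */
            count++;
        }
    }
    return count;
}

/* Formats the count as a decimal string.
 * Returns 0 on success, -1 if buf is too small. */
int demo_count_to_string(unsigned long count, char *buf, size_t size)
{
    int n = snprintf(buf, size, "%lu", count);

    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

A real wordcount function would wrap logic like this in the RAF prototype, read dst from its parameter block, and return a response status code.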





Copyright 2004 Sun Microsystems, Inc. All rights reserved.