

12 - Procedure for Writing an Agent

Once the agent schema has been defined, you can add the Site/SunNet/Domain Manager code to turn your program into an agent. This chapter discusses the procedure for developing a Site/SunNet/Domain Manager agent, including an agent's flow of control and how an agent interacts with the Agent Services library. For reference material, see the man pages in Appendix B, "Man Page Summary for Writers of Agent Software." Use Chapter 14, "Converting an Existing Application to an Agent," as an outline for converting your application to an agent.

Figure  12-1 Agent Initialization

12.1 Agent Initialization and Startup

The first step for the agent is initialization. See Figure 12-1 for a flow diagram.

After parsing any command-line parameters and executing agent-specific initialization, the agent calls netmgt_init_rpc_agent() to register with the RPC system and initialize the Agent Services library data structures. The parameters of this call are protocol-specific. They do not directly affect the agent application. They fall into several categories:

Agent Identification

identifies an agent by the RPC service name, agent-writer assigned serial number, RPC program number, and RPC version number parameters.
Transaction Characterization
characterizes a transaction by the transport protocol to be used in manager/agent communication, the transaction time limit, and various flags such as whether agent requests run as subprocesses.
Callback Routines
lists agent-supplied routines called by the Agent Services library to validate a request from a manager, dispatch the request, reap the agent child spawned by the request, and shut down the agent's parent process.
After initialization, the agent starts itself by making a call to netmgt_start_agent(). This function has no parameters and never directly returns. It waits for incoming requests and returns control to the agent via the agent-defined callback routines.


12.2 Agent Shutdown

Normal agent shutdown occurs when the child process returns from its dispatch routine (discussed in Section 12.3, "Request Verification and Dispatching"). If no agent-supplied shutdown function is specified during agent initialization, the Agent Services library calls netmgt_shutdown_agent() to unregister the agent from the RPC system, clean up some library data structures for the agent, and have the agent exit with a return code of zero. If the agent supplies its own shutdown routine, that routine must call netmgt_shutdown_agent() as the last thing it does. An agent must never exit directly. It should call netmgt_shutdown_agent() to terminate.

If a report cannot be delivered to the report rendezvous, the Agent Services library automatically retries at appropriate intervals. If the rendezvous is a program with a permanent RPC number (like the logger or event dispatcher), the library tries to send the report forever. If the rendezvous is a program with a transient RPC number (like the Console or snm_cmd), the library tries to send the report a small number of times before considering it undeliverable. If a report is considered undeliverable the Agent Services library will shut down the agent child process.

The Agent Services library uses the following signals in the parent: SIGINT, SIGQUIT, SIGTERM, and SIGCHLD. If the parent wishes to have control passed to its own functions when one of these signals occurs, it should specify the functions in the shutdown (for SIGINT, SIGQUIT, and SIGTERM) and reap (for SIGCHLD) parameters in the call to netmgt_init_rpc_agent(). One signal is used in the child: SIGUSR1. Agents should avoid using this signal.


12.3 Request Verification and Dispatching

When a request is received from a manager via RPC, the agent parent process performs a verification routine. This routine should determine if the agent is able to do the requested operation. If the agent cannot execute the request, the verification routine should call netmgt_send_error(), specifying the reason for the failure, then return FALSE. Otherwise, it should return TRUE without calling netmgt_send_error(). Figure 12-2 illustrates request verification and dispatching.

The agent verification routine has a limited time to verify a request as dictated by the <timeout> argument specified by the manager making the request. This timeout is typically between 10 and 30 seconds, so the agent's verification routine should not do an extended amount of processing. For instance, verifying that the group name is valid would be appropriate, but retrieving information from a distant network would not be appropriate.

Figure  12-2 Request Dispatch

NOTE - Although Site/SunNet/Domain Manager forks a subprocess before calling your dispatch function, your verification function should not cache request arguments as global data. Caching request arguments may save a little time but it will probably make your agent incompatible with future Site/SunNet/Domain Manager releases.
Upon successful validation of the request, the Agent Services library creates a child process and calls the agent-supplied dispatch function in the context of the child process.


NOTE - If the verification routine opens files or allocates memory, it must be able to close the files or free the memory after control has passed to the child process. Otherwise, memory and file descriptor leakage may occur. For this reason, you should avoid agent setup tasks in the verification routine.
The dispatch function should execute the request, reporting data at the requested times. When all request servicing is completed, the dispatch function returns to the Agent Services library, which shuts down the child process.

Besides receiving requests from a manager, agents can also read requests from the request.log file on the agent system--this is the case when a request is restarted. For these requests, the verification routine is bypassed and the dispatch routine is called directly. Note that if any agent setup tasks are performed in the verification routine, the agent will not be set up correctly for restarted requests.
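The verify-then-dispatch flow described in this section can be sketched in plain C without the Agent Services library. Everything in this sketch is hypothetical: struct request, verify_request(), dispatch_request(), and handle_request() are illustrative stand-ins, and the fork and wait logic only approximates what the library does on the agent's behalf.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical request record; the real library passes these as arguments. */
struct request {
    const char *group;   /* group name from the manager */
    unsigned    count;   /* number of reports requested */
};

/* Hypothetical verification: cheap checks only, no agent setup work. */
static int verify_request(const struct request *req)
{
    assert(req != NULL);
    return req->group != NULL && strcmp(req->group, "example") == 0;
}

/* Hypothetical dispatch: runs in the child and services the request. */
static int dispatch_request(const struct request *req)
{
    unsigned i;

    for (i = 0; i < req->count; i++)
        printf("report %u for group %s\n", i + 1, req->group);
    fflush(stdout);      /* flush before the child calls _exit() */
    return 0;
}

/*
 * Stand-in for the library's handling of one request.
 * Returns -1 if verification rejects the request or a system call
 * fails; otherwise returns the child's exit status.
 */
static int handle_request(const struct request *req)
{
    pid_t pid;
    int status = 0;

    if (!verify_request(req))
        return -1;       /* a real agent would call netmgt_send_error() */

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(dispatch_request(req));   /* child: service, then exit */

    if (waitpid(pid, &status, 0) < 0)   /* parent: reap the child */
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because verification runs in the parent and dispatch runs in the child, anything the verification step opens or allocates is inherited by the child, which is why the sketch keeps verify_request() free of setup work.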

12.3.1 Verification and Dispatching Routine Parameters

The verify and dispatch routines share a common set of parameters. When an agent is used with the Console, most of these parameters come from the information in the request property window.

The following parameters are passed to the agent-supplied verification and dispatching procedures:

<type>

unsigned integer containing the request type code. The type code is either NETMGT_DATA_REQUEST for requesting data reports or NETMGT_EVENT_REQUEST for requesting event reports. Agents don't need to know the request type, but the code is provided for agents that want to do additional processing based on the request type.
<target>
pointer to a null-terminated ASCII string containing the name of the system where the managed element is located. If the agent is a proxy agent, this name may be different from the name of the system where the agent is running.
<group>
pointer to a null-terminated ASCII string containing the name of the group whose attribute values are to be collected. Group and attribute relationships are specified in the agent schema. (See Chapter 2, "Registering for Data, Event, and Trap Reports ," for more information on the agent schema.)
<key>
optional pointer to a null-terminated ASCII string containing a key uniquely identifying an instance of the <group> on the managed <target>.

The interpretation of this string is left largely to the agent writer. Use white-space separated strings for key selectors, or a null string when all table entries are desired. The key is usually an attribute in the table. Managers have no knowledge of what agents will accept as keys; agent writers are encouraged to document their use of the key field.
<count>
unsigned integer containing the maximum reports to send. If <count> is zero, the agent should send reports until the request is asked to stop by a manager process.
<interval>
<interval> structure containing the reporting interval. If <interval> is zero, the agent should use an agent-default value.
<interval> and <count> taken together determine the type of reporting to do. Table 12-1 lists the possible combinations and their interpretations (c and i are positive integers):



Table  12-1 Count/Interval Interpretations

Count   Interval   Interpretation
-----   --------   --------------------------------------------------
0       0          Send reports forever at an agent-specific interval
0       i          Send reports forever, every i seconds
1       0          Quick Dump--send one report
1       i          Quick Dump--send one report
c       0          Send c reports at an agent-specific interval
c       i          Send c reports every i seconds

Figure  12-3 Sending Reports
Data gathering methods fall into two categories. Agents may be able to report data instantly or they may need to integrate it over time before reporting. For instant reporting, the agent reports the data then sleeps for the interval specified. For integrative reporting, the agent performs some set of actions over time before the data can be reported. For instance, the supplied ping agent sends many ICMP ECHO requests at one second intervals, then waits a small amount of time for the ECHO REPLY responses. It may take up to 20 seconds for a single report to be sent.

Agents that take a substantial amount of time to integrate and report should subtract their processing time from the manager-specified <interval>, so reports are sent as often as requested. Otherwise, an interval of 20 seconds and a processing time of 20 seconds would mean reports would be sent every 40 seconds--clearly not what the manager intended. Again, agent writers are encouraged to document any specific decisions they make along these lines, so users know what report intervals make sense for an agent.
<flags>
unsigned integer bit string specifying request options. Currently no request options are defined for use by agents.
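The <count> and <interval> rules in Table 12-1, together with the processing-time adjustment described for <interval>, can be sketched as a few small helper functions. The helper names and the DEFAULT_INTERVAL_SEC value are assumptions made for illustration; they are not part of the Agent Services library, and the sketch works in whole seconds only.

```c
#include <assert.h>
#include <sys/time.h>

#define DEFAULT_INTERVAL_SEC 30L   /* hypothetical agent-default interval */

/*
 * Pick the reporting interval in seconds. A zero interval means
 * "use the agent default". The sub-second part of a non-zero
 * interval is ignored in this sketch.
 */
static long effective_interval_sec(struct timeval interval)
{
    if (interval.tv_sec == 0 && interval.tv_usec == 0)
        return DEFAULT_INTERVAL_SEC;
    return (long)interval.tv_sec;
}

/*
 * Decide whether another report is due. A count of zero means
 * "send reports until a manager stops the request".
 */
static int more_reports_due(unsigned count, unsigned sent)
{
    return count == 0 || sent < count;
}

/*
 * Time to sleep after a report so that reports leave every
 * interval_sec seconds even when gathering data took
 * processing_sec seconds. Never returns a negative value.
 */
static long sleep_before_next_report(long interval_sec, long processing_sec)
{
    long remaining;

    assert(interval_sec >= 0 && processing_sec >= 0);
    remaining = interval_sec - processing_sec;
    return remaining > 0 ? remaining : 0;
}
```

With a 20-second interval and 20 seconds of processing, sleep_before_next_report() returns zero, so reports go out every 20 seconds rather than every 40.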

12.4 Sending Reports

After a request is received and validated, a child process is created and the agent-supplied dispatch procedure is called. The dispatch procedure collects the information requested and sends reports at the specified intervals. Figure 12-3 provides a flow diagram for request handling.

When servicing a request, the child process should:


12.5 Handling Set Requests

In addition to sending data and event reports, some agents can change the values of managed objects (that is, set attribute values). Handling a request to set the values of one or more group attributes is very similar to handling a data or event request. The basic steps are:

12.5.1 Verifying the Request

When your agent receives a set request, your request verification function is called just as it is for a data or event request, except for the following points. Your request verification function should determine whether you believe you can set the requested attribute values. Do not actually set the attribute values here. Determine whether the set request is reasonable. For example, you should probably check whether the attributes have read-write or write-only access.

Check whether the values to set the attributes have the correct data types and are within acceptable ranges. (See Table 11-1 on page 11-3 for a list of valid data types.) If your agent can take optional arguments, fetch the arguments using netmgt_fetch_argument(3n).


NOTE - The Console snm and command-line manager snm_cmd set the optional argument name to NETMGT_OPTSTRING and the optional argument type to NETMGT_STRING.
Fetch the set request arguments using netmgt_fetch_setval. This function supplies the <group> name, optional <key> name if the group is a table, <attribute> name, and <value> to set the attribute. Unlike data and event requests, a manager can request that your agent set attributes in more than one group. Continue calling netmgt_fetch_setval until the <group> name is NETMGT_ENDOFARGS, which indicates the end of the request arguments. If you determine the request is well-formed, your request verification function should return the boolean value TRUE. Otherwise, it should call netmgt_send_error(3n) to send an error report to the requester and return the boolean value FALSE.
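The verification checks described above (access, value range, and looping until a sentinel group name) can be sketched with an in-memory argument list in place of the library's fetch calls. All names here are hypothetical: struct set_arg, struct attr_rule, the rules table, and END_OF_ARGS merely stand in for the real set request arguments, the agent's schema knowledge, and NETMGT_ENDOFARGS.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define END_OF_ARGS "end-of-args"  /* hypothetical stand-in for NETMGT_ENDOFARGS */

struct set_arg {                   /* hypothetical set request argument */
    const char *group;
    const char *attribute;
    long        value;
};

struct attr_rule {                 /* hypothetical schema knowledge */
    const char *group;
    const char *attribute;
    int         writable;          /* non-zero if the attribute may be set */
    long        min, max;          /* acceptable value range */
};

static const struct attr_rule rules[] = {
    { "config", "pollRate", 1, 1, 3600 },
    { "config", "serial",   0, 0, 0 },
};

/* Look up the rule for an argument; returns NULL if none is known. */
static const struct attr_rule *find_rule(const struct set_arg *arg)
{
    size_t i;

    for (i = 0; i < sizeof(rules) / sizeof(rules[0]); i++)
        if (strcmp(rules[i].group, arg->group) == 0 &&
            strcmp(rules[i].attribute, arg->attribute) == 0)
            return &rules[i];
    return NULL;
}

/*
 * Returns 1 if every argument before the sentinel names a known,
 * writable attribute with an in-range value; otherwise returns 0.
 * Nothing is set here; this only decides whether the request is
 * reasonable.
 */
static int verify_set_request(const struct set_arg *args)
{
    const struct set_arg *arg;

    assert(args != NULL);
    for (arg = args; strcmp(arg->group, END_OF_ARGS) != 0; arg++) {
        const struct attr_rule *rule = find_rule(arg);

        if (rule == NULL || !rule->writable)
            return 0;
        if (arg->value < rule->min || arg->value > rule->max)
            return 0;
    }
    return 1;
}
```

A real verification function would send an error report to the requester before returning FALSE; the sketch leaves that step out so it has no library dependency.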

12.5.2 Set Attribute Values

If you determined the request was well-formed, your request dispatch procedure will be called just like your verification function. Once again, you should fetch any request options and the set request arguments.

You should now set the requested attribute values. The way you do this is entirely up to your application.

12.5.3 Send a Status Report

Whether or not you were successful in setting the requested attribute values, you should send a status message to the requester. If you were successful, call netmgt_send_error(3n), setting the <service_error> code to NETMGT_SUCCESS. If you encountered an error setting the attributes, send an error report just like you would for a data or event request.

12.5.4 Sample Code

The following code fragment is an example of a routine which receives a set request and sends a status message to the requester:




#include <string.h>		/* strcmp */

/* agent error codes */
#define SET_FAILED	1

void do_set_request();		/* defined below */

/* --------------------------------------------------------------
 * dispatch_request - dispatch agent request
 * no return value
 * --------------------------------------------------------------
 */
void
dispatch_request(type, system, group, key, count, interval, flags)
    u_int type;			/* request type */
    char *system;		/* target system */
    char *group;		/* object group */
    char *key;			/* table key */
    u_int count;		/* report count */
    struct timeval interval;	/* report interval */
    u_int flags;		/* request flags */
{
    NETMGT_DBG("dispatch_request\n");

    /* dispatch request based on request type */
    switch (type) {
    case NETMGT_SET_REQUEST:
        do_set_request(system);
        return;
    /* other cases */
    }
    return;
}

/* --------------------------------------------------------------
 * do_set_request - perform the set request
 * no return value
 * --------------------------------------------------------------
 */
void
do_set_request(system)
    char *system;		/* target system name */
{
    Netmgt_arg option;		/* optional argument */
    Netmgt_setval setval;	/* set request argument */
    Netmgt_error status;	/* status report argument */

    NETMGT_DBG("do_set_request\n");

    /* get any optional arguments */
    if (netmgt_fetch_argument(NETMGT_OPTSTRING, &option)) {
        /* handle request option */
    }

    /* fetch request arguments and perform set operations */
    for (;;) {
        /* get next set argument */
        if (!netmgt_fetch_setval(&setval)) {
            (void) netmgt_fetch_error(&status);
            if (!netmgt_send_error(&status))
                NETMGT_DBG("netmgt_fetch_setval failed: %s\n",
                    netmgt_sperror());
            return;
        }

        /* a sentinel string marks the end of arguments */
        if (strcmp(setval.name, NETMGT_ENDOFARGS) == 0)
            break;

        /* assume here the set operation failed - send
         * a fatal error report to the requester
         */
        if (!perform_set(system, setval)) {
            status.service_error = NETMGT_FATAL;
            status.agent_error = SET_FAILED;
            status.message = "a description of why it failed";
            if (!netmgt_send_error(&status))
                NETMGT_DBG("netmgt_send_error failed: %s\n",
                    netmgt_sperror());
            return;
        }
    }

    /* assume all the set operations succeeded - send a success status
     * message to the requester
     */
    status.service_error = NETMGT_SUCCESS;
    status.agent_error = (u_int)0;
    status.message = (char *)NULL;
    if (!netmgt_send_error(&status))
        NETMGT_DBG("netmgt_send_error failed: %s\n", netmgt_sperror());
    return;
}
 



12.6 Error Reporting

The agent reports errors by calling netmgt_send_error(). There are two classes of errors: generic errors and agent-specific errors.

Generic errors are the fatal errors defined in the header file netmgt_errno.h. Generic errors may be generated by agents or the Agent Services library.

Agents may also report errors specific to the agent. Agent-specific errors are defined by a code/message pair in the agentErrors section of the agent's schema file. Agent-specific errors may also return supplementary text providing additional information about the error.

While all generic errors are fatal, an agent-specific error may be classified as either a warning or fatal. If the error is a warning, the manager assumes the agent will continue to return reports. If the error is fatal, the manager assumes the agent will stop servicing the request. The Console treats fatal errors as high-priority errors and warnings as low-priority errors.

netmgt_send_error() takes a single parameter, a pointer to a Netmgt_error structure with three fields:

service_error

an error code from the enumerated type Netmgt_stat as defined in netmgt_errno.h.

NETMGT_WARNING

signals an agent-specific non-fatal error. The agent_error field provides the agent error code.

NETMGT_FATAL

signals an agent-specific fatal error. The agent_error field provides the agent error code.

Any other error code signals a generic error, and agent_error should be set to zero.
agent_error

specifies the agent-specific error code from the list defined in the agentErrors section of the agent schema.

message

optional null-terminated string containing instance information about an agent-specific error that cannot be given in the agentErrors section of the agent schema--for example "ie0" or "port 7."
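The rules for filling in the three error fields can be sketched with a local mirror of the structure. The enum sketch_stat values and struct sketch_error below are hypothetical stand-ins for Netmgt_stat and Netmgt_error, whose real definitions come from the library headers. The helpers only illustrate the convention that warnings and fatal agent errors carry an agent error code, while generic errors set agent_error to zero.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real codes come from netmgt_errno.h. */
enum sketch_stat { SK_SUCCESS, SK_WARNING, SK_FATAL, SK_GENERIC_OTHER };

struct sketch_error {
    enum sketch_stat service_error;  /* class of error */
    unsigned         agent_error;    /* code from the agentErrors section */
    const char      *message;        /* optional instance detail, or NULL */
};

/* Build an agent-specific error: a warning or a fatal error. */
static struct sketch_error make_agent_error(int fatal, unsigned code,
                                            const char *detail)
{
    struct sketch_error e;

    e.service_error = fatal ? SK_FATAL : SK_WARNING;
    e.agent_error = code;
    e.message = detail;
    return e;
}

/* Build a generic error: agent_error is always zero. */
static struct sketch_error make_generic_error(enum sketch_stat code)
{
    struct sketch_error e;

    assert(code != SK_WARNING && code != SK_FATAL);
    e.service_error = code;
    e.agent_error = 0;
    e.message = NULL;
    return e;
}

/*
 * All generic errors are fatal; agent-specific errors are fatal
 * only when flagged as such. Success and warnings are not fatal.
 */
static int error_is_fatal(const struct sketch_error *e)
{
    return e->service_error != SK_SUCCESS &&
           e->service_error != SK_WARNING;
}
```

After a warning the manager expects more reports, so only errors for which error_is_fatal() returns non-zero should be followed by the agent ending its servicing of the request.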


12.7 Generating and Sending Asynchronous Reports (Traps)

Site/SunNet/Domain Manager provides support for asynchronous reports, also known as traps. A program generates a trap by first calling netmgt_start_trap(), then calling netmgt_build_report() for each attribute in the trap, and finally calling netmgt_send_report() to send the trap report. When netmgt_send_report() is called, traps are sent to the rendezvous as specified by netmgt_start_trap() arguments.

The following shows the syntax of the netmgt_start_trap() function:


bool_t netmgt_start_trap(system, agent_prog, agent_vers, group,
                         rendez_host, rendez_prog, rendez_vers,
                         priority, timeout)
      char           *system;
      u_int          agent_prog;
      u_int          agent_vers;
      char           *group;
      char           *rendez_host;
      u_int          rendez_prog;
      u_int          rendez_vers;
      u_int          priority;
      struct timeval timeout;
 

where

system

is the name of the element associated with the trap.
agent_prog
specifies the RPC program number of the agent that is sending the trap. Specify 0 if unknown.
agent_vers
specifies the RPC program version number of the agent that is sending the trap. Specify 0 if agent_prog is set to 0.
group
specifies the group within the agent schema to which the trap attributes belong. Specify "trap" if the attributes are not defined in any agent schema.
rendez_host
specifies the name of the system where the trap is to be sent.
rendez_prog
specifies the RPC program number of the rendezvous for the trap. This should be set to NETMGT_EVENT_PROG, the event dispatcher.
rendez_vers
specifies the version number of the rendezvous for the trap. This should be NETMGT_EVENT_VERS, the event dispatcher's version number.
priority
specifies the priority of the trap. Choices are NETMGT_LOW_PRIORITY, NETMGT_MEDIUM_PRIORITY, and NETMGT_HIGH_PRIORITY.
timeout
specifies the maximum time (in seconds) that the netmgt_start_trap() call is to wait for a reply from the rendezvous when sending a trap report.
netmgt_start_trap() sets up the parameters for the trap. After calling netmgt_start_trap(), the trap is constructed by one or more calls to the procedure netmgt_build_report(). As with Data and Event Reports, netmgt_build_report() is called for each attribute in the trap.

A trap is sent to the rendezvous using netmgt_send_report(). Note that netmgt_send_report() may block for the timeout period if the trap cannot be sent to the rendezvous. If the trap cannot be sent, it will be retried the next time a trap or other report is generated.

Note that netmgt_start_trap() does not need to be called again in order to send another trap. You only need to call netmgt_start_trap() if you want to change groups or the value of any argument you specified in the previous call to netmgt_start_trap().


NOTE - Reporting data and issuing traps should occur in separate processes. If an agent collects and sends data in addition to issuing traps, then the agent should issue a trap by calling the fork() system routine to spawn a child process and the building and sending of the trap should be carried out by the child process. This procedure is necessary to ensure that the parent process is not interrupted from collecting and reporting data while the trap is being built and sent. This will prevent loss or delay of reports, especially if the reporting interval is very short.
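The fork-per-trap pattern recommended in the note can be sketched as follows. The build_and_send_trap() function is a hypothetical placeholder for the netmgt_start_trap(), netmgt_build_report(), and netmgt_send_report() sequence, and issue_trap_in_child() and reap_trap_children() are illustrative names, not library routines.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical placeholder for building and sending one trap.
 * Returns non-zero on success.
 */
static int build_and_send_trap(const char *attr_value)
{
    return attr_value != NULL;
}

/*
 * Fork a child to issue the trap. The parent returns at once with
 * the child's pid (or -1 if fork failed), so data collection and
 * reporting are not held up while the trap is sent.
 */
static pid_t issue_trap_in_child(const char *attr_value)
{
    pid_t pid = fork();

    if (pid == 0)
        _exit(build_and_send_trap(attr_value) ? 0 : 1);
    return pid;
}

/*
 * Reap any finished trap children without blocking.
 * Returns the number of children reaped.
 */
static int reap_trap_children(void)
{
    int reaped = 0;

    while (waitpid(-1, NULL, WNOHANG) > 0)
        reaped++;
    return reaped;
}
```

A process that forks trap children this way should call something like reap_trap_children() between reports so that exited children do not accumulate as zombies.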

12.7.1 Sample Code

The following code fragment is an example of a routine that calls netmgt_start_trap(), then calls netmgt_build_report() for each attribute in the trap, and then calls netmgt_send_report() to send the trap report.




char myname[MAXHOSTNAMELEN+1];		/* system name */
struct timeval timeout;			/* timeout buffer */

timeout.tv_sec = 10;			/* set timeout value */
timeout.tv_usec = 0;
sysinfo(SI_HOSTNAME, myname, sizeof(myname));
if (!netmgt_start_trap(myname,		/* system name */
                       myprog_number,	/* agent program number */
                       myver_number,	/* agent version number */
                       "trap",		/* group */
                       "localhost",	/* event dispatcher host */
                       NETMGT_EVENT_PROG,
                       NETMGT_EVENT_VERS,
                       NETMGT_HIGH_PRIORITY,
                       timeout))
  fprintf(stderr, "Cannot start trap: %s\n", netmgt_sperror());
...
/* Later on, build the report.  Do this as many times as you like, once
   for each attribute in the report. */
Netmgt_data data;			/* data buffer */
bool_t event;				/* whether an event occurred */
Netmgt_error error;			/* error buffer */

(void)strcpy(data.name, attr_name);	/* set the attribute name */
data.type = NETMGT_STRING;		/* set attribute data type */
data.length = strlen(attr_value);	/* set attribute length */
data.value = attr_value;		/* set attribute value */

/* build the report */
if (!netmgt_build_report(&data, &event))
  fprintf(stderr, "Cannot build report: %s\n", netmgt_sperror());

/* Finally send the trap report to the rendezvous specified by
   netmgt_start_trap. */
struct timeval delta_time;

delta_time.tv_sec = delta_time.tv_usec = 0;
if (!netmgt_send_report(delta_time, 0, 0))
  fprintf(stderr, "Cannot send report: %s\n", netmgt_sperror());
 


12.8 Summary

Now that you know how to initialize and shut down your agent, handle requests and report errors, you're ready to integrate your agent into the Site/SunNet/Domain Manager environment.


Copyright 1996 Sun Microsystems, Inc., 2550 Garcia Ave., Mtn. View, CA 94043-1100 USA. All Rights Reserved