Solaris System Management Agent Developer's Guide

Chapter 8 Long-Running Data Collection

This chapter discusses the ways that you can enable a module to collect data over a long period of time without blocking the System Management Agent. The demonstration modules demo_module_9 and demo_module_10 illustrate these approaches.

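This chapter contains the following topics:

About Long-Running Data Collection

SNMP Alarm Method for Data Collection

SNMP Manager Polling Method for Data Collection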

About Long-Running Data Collection

SNMP is not ideally suited to collecting data that is generated over a period of time. Timeouts specified by an SNMP manager are generally only a few seconds, so that most problems are detected quickly. However, some data is more useful when examined over a longer period, for example, to indicate a developing condition. Collecting such data takes longer than a typical SNMP timeout allows, so the module must perform a long-running data collection that does not block the agent. You can choose from several design patterns to model such operations.

The following design patterns can be used to enable a module to handle long-running data collections through the agent.

SNMP alarm-based approach

The module registers an SNMP alarm to call a function at a specified interval. For most sites, this solution is most useful for performing long-running data collections. See SNMP Alarm Method for Data Collection for more information and code examples.

SNMP manager polling

The SNMP manager polls a status variable to find out whether a data collection is complete, and to determine the age of the data. The data is retrieved when the status variable returns an acceptable value. The polling approach is most useful if your site has one SNMP manager and several SNMP agents. See SNMP Manager Polling Method for Data Collection for more information and code examples.

SNMP Alarm Method for Data Collection

In the SNMP alarm method for long-running data collection, the module registers an SNMP alarm to call a function at a specified interval. The interval is specified in seconds. The function can be called one time, or called repeatedly until the alarm is unregistered. The module sets a flag that causes the agent to delegate the SNMP request. By delegating a request, the agent avoids blocking other requests while responding to a request. The agent caches the SNMP request information to be retrieved later when the request is handled. The demo_module_9 example demonstrates the SNMP-alarm-based approach.

demo_module_9 Code Example for SNMP Alarm Method

The demo_module_9 code is located by default in /usr/demo/sma_snmp/demo_module_9. The README_demo_module_9 file within that directory contains instructions that describe how to perform the following tasks:

The demo_module_9 example implements the objects defined in the SDK-DEMO9-MIB.txt. The module demonstrates how to implement objects that normally would block the agent as the agent waits for external events. The agent can continue responding to other requests while this implementation waits.

This example uses the following features:

Managing the Timing of Data Collection

An important aspect of the demo_module_9 example is the relationship between the SNMP timeout and the delay time interval of the module. The delay time interval is the interval in seconds after which the agent sends an alarm to the module. The delay_time variable in the module stores this value. By default, the delay time is set to 1 second in the module. You can change this value by issuing an snmpset command on the delayedInstanceOid object and supplying an integer value. The set_demo_module_9 script issues the snmpset command to change the delay time interval. The module uses the new time interval value when the module registers for an alarm with the agent.

The agent calls the module when an snmpget or snmpset is issued on the delayedInstanceOid object. Instead of returning the requested data right away, the module sets a flag to tell the agent that the request processing might take a while. The agent is then free to handle other requests. The module needs some way to get the agent to call back into the module and return the requested data when the data collection has completed, so the module registers an alarm with the agent. In demo_module_9, a one-time alarm is set to go off in 1 second. If you want a longer data collection, you can set the delay_time value to a longer interval. You can also set the alarm to go off repeatedly at a specified interval.

The module registers the alarm with a callback function. When the specified alarm interval elapses, the agent calls the callback function in the module. In demo_module_9, the callback function is return_delayed_response(), which actually handles the SNMP GET or SNMP SET request.
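The flow described above can be modeled in a few lines of C. The following is a minimal sketch that uses only the standard library, not the demo_module_9 source and not the Net-SNMP API. The request_t and alarm_t types, the register_alarm(), handle_request(), and agent_tick() functions, and the value 42 are hypothetical stand-ins chosen for illustration. Only the delay_time variable and the return_delayed_response() callback name come from the example described in this chapter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the agent's request and alarm machinery.
 * None of these types or functions are the Net-SNMP API. */
typedef struct request {
    int delegated;   /* set by the module: the answer will come later */
    int answered;    /* set once a value has been returned */
    long value;      /* value returned to the client */
} request_t;

typedef void (*alarm_cb)(void *clientarg);

typedef struct alarm {
    unsigned when;       /* model time (seconds) at which to fire */
    alarm_cb callback;   /* function to call when the alarm fires */
    void *clientarg;     /* cached request information */
    int armed;           /* nonzero while the alarm is pending */
} alarm_t;

static alarm_t pending_alarm;       /* model: a single one-time alarm slot */
static unsigned now;                /* model clock, in seconds */
static unsigned delay_time = 1;     /* default delay, as in the chapter */
static long collected_value = 42;   /* placeholder for the collected data */

/* Model of registering a one-time alarm with the agent. */
void register_alarm(unsigned seconds, alarm_cb cb, void *arg)
{
    pending_alarm.when = now + seconds;
    pending_alarm.callback = cb;
    pending_alarm.clientarg = arg;
    pending_alarm.armed = 1;
}

/* Alarm callback: completes the request that was delegated earlier. */
void return_delayed_response(void *clientarg)
{
    request_t *req = clientarg;

    assert(req != NULL && req->delegated);
    req->value = collected_value;
    req->delegated = 0;
    req->answered = 1;
}

/* Module handler: defers the answer instead of blocking the agent. */
void handle_request(request_t *req)
{
    req->delegated = 1;
    register_alarm(delay_time, return_delayed_response, req);
}

/* Model of the agent's main loop advancing by one second. */
void agent_tick(void)
{
    now++;
    if (pending_alarm.armed && now >= pending_alarm.when) {
        pending_alarm.armed = 0;   /* one-time alarm: disarm before firing */
        pending_alarm.callback(pending_alarm.clientarg);
    }
}
```

In this model, handle_request() returns immediately with the request marked as delegated, and the request is answered only when a later call to agent_tick() reaches the alarm time. A real module would rely on the agent's own alarm and request-caching facilities instead of these stand-ins.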

The client that requested the data with SNMP GET must wait for the response from the agent. The snmpget command and other Net-SNMP tools have a default timeout value of 5 seconds. The client is likely to time out before getting the requested response. For this reason, you should increase the timeout value for the snmpget and snmpset commands.

You should increase the timeout of the command to at least the amount of time required to complete the data collection. If you are doing an snmpset, make the timeout value 3 or 4 times longer than the delay time interval. A longer timeout is needed because a SET operation is more time-consuming than a GET. The agent makes several calls to the module to process a single SET, and each call is delayed by the delay value.
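The timeout guidance above amounts to simple arithmetic. The following sketch shows one way to compute a client timeout from the delay time interval. The suggested_timeout() function, the snmp_op_t type, the factor of 4 for SET, and the one-second margin for GET are assumptions made for this illustration. The chapter states only that a SET needs 3 or 4 times the delay interval and that a GET needs at least the collection time.

```c
#include <assert.h>

/* Assumed multiplier for SET requests. The chapter suggests 3 or 4 times
 * the delay interval; this sketch picks the upper bound. */
#define SET_TIMEOUT_FACTOR 4u

/* Assumed safety margin, in seconds, added on top of the delay for GET. */
#define GET_MARGIN_SECONDS 1u

/* Hypothetical operation type used only by this sketch. */
typedef enum { OP_GET, OP_SET } snmp_op_t;

/* Return a client timeout in seconds for a module whose delayed
 * response arrives delay_seconds after each call into the module. */
unsigned suggested_timeout(snmp_op_t op, unsigned delay_seconds)
{
    if (op == OP_SET)
        return delay_seconds * SET_TIMEOUT_FACTOR;  /* several delayed calls */
    return delay_seconds + GET_MARGIN_SECONDS;      /* one delayed call */
}
```

For a delay time interval of 10 seconds, this sketch yields 11 seconds for a GET and 40 seconds for a SET. The result is the kind of value that you would supply as the command timeout.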

The -t option is used to set the timeout value. See the snmpcmd(1M) man page for more information about common command-line options for Net-SNMP tools.

SNMP Manager Polling Method for Data Collection

In the SNMP manager polling method, an SNMP manager polls a status variable to find out whether a data collection is complete. When the data collection is complete, the manager determines the age of the data. If the age of the data is not acceptable, the manager can set the status variable to start a new collection. The polling method is recommended if you have one SNMP manager that is to control the polling of one or more agents. The demo_module_10 example demonstrates the SNMP manager polling approach.
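The decision that the manager makes on each poll can be sketched as a small function. This sketch is an illustration only. The status_t and action_t types, their values, and the next_action() function are hypothetical and are not taken from SDK-DEMO10-MIB.txt or from demo_module_10.

```c
#include <assert.h>

/* Hypothetical values of the status variable that the manager polls. */
typedef enum { COLLECTION_RUNNING, COLLECTION_DONE } status_t;

/* What the manager does after one poll of the status variable. */
typedef enum {
    ACTION_WAIT,     /* collection still running: poll again later */
    ACTION_FETCH,    /* data is complete and fresh enough: retrieve it */
    ACTION_RESTART   /* data is too old: set the status variable to
                        start a new collection */
} action_t;

/* Decide the next step from the polled status and the age of the data.
 * max_age_seconds is the oldest data that the manager accepts. */
action_t next_action(status_t status, unsigned age_seconds,
                     unsigned max_age_seconds)
{
    if (status == COLLECTION_RUNNING)
        return ACTION_WAIT;
    if (age_seconds > max_age_seconds)
        return ACTION_RESTART;
    return ACTION_FETCH;
}
```

A manager would call a function like this after each poll, and would retrieve the data only when the result is ACTION_FETCH.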

demo_module_10 Code Example for SNMP Polling Method

The demo_module_10 code is located by default in /usr/demo/sma_snmp/demo_module_10. The README_demo_module_10 file within that directory contains instructions that tell how to perform the following tasks:

The demo_module_10 example implements the objects defined in the SDK-DEMO10-MIB.txt. The module is designed to handle long-running data collections so that their values can be polled by an SNMP manager. The module also shows how to implement objects that normally would block the agent as the agent waits for external events. The agent can continue responding to other requests while this implementation waits.

The demo_module_10 module uses the following features:

Avoiding a Race Condition When Polling

A race condition can occur with two or more management applications. When multiple applications issue GET or SET protocol operations that span more than a single PDU, competition for the results occurs. In the case of a long-running data collection, a race condition can occur when the module completes data collection. The module updates the status variable to indicate that the data is ready to send. However, the agent can receive a second GET operation on the same variable, from another manager, before the first request has received the requested data. If the module starts a new data collection in response to the second request, no data is available to return to the first request.

In the following figure, Mgr2's request is received by the module after Mgr1's request but before Mgr1 gets the data. This situation could happen if the module starts a new data collection while requests are pending.

Figure 8–1 Race Condition When Polling for Data

Diagram shows two managers polling for the same data.

To avoid this scenario, a module can define a flag to maintain the state of outstanding requests. When an SNMP request is received, the module checks the flag. The module starts a new collection only if no SNMP requests are outstanding. The module returns an SNMP error if requests are outstanding.
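The guard described above can be sketched as follows. This is a minimal model under stated assumptions, not the demo_module_10 implementation. The collector_t type and the request_begin(), request_end(), and start_collection() functions are hypothetical names. The sketch uses a counter rather than a single flag so that more than one outstanding request can be tracked.

```c
#include <assert.h>

/* Hypothetical result codes. A real module would return an SNMP error. */
typedef enum { RESULT_OK, RESULT_ERROR } result_t;

/* Hypothetical module state for one long-running collection. */
typedef struct collector {
    int outstanding;      /* requests still waiting for the current data */
    unsigned generation;  /* incremented each time a collection starts */
} collector_t;

/* A request for the collected data has been received. */
void request_begin(collector_t *c)
{
    c->outstanding++;
}

/* The data has been returned for one request. */
void request_end(collector_t *c)
{
    assert(c->outstanding > 0);
    c->outstanding--;
}

/* Start a new collection only if no request is still waiting.
 * Otherwise refuse, so that pending requests keep their data. */
result_t start_collection(collector_t *c)
{
    if (c->outstanding > 0)
        return RESULT_ERROR;
    c->generation++;
    return RESULT_OK;
}
```

In terms of the figure, Mgr1's pending request keeps the outstanding count above zero, so Mgr2's attempt to start a new collection is refused until Mgr1 has received the data.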