Chapter 4. Using Agent Integrator for Polling


To track faults in critical system components or applications, management systems use polling to determine whether attributes of the managed resource have crossed some significant threshold. Polling consists of checking the value of an attribute of the managed resource at a regular interval. The BEA Agent Integrator can be configured to act as a proxy for the manager, doing the polling locally on the managed node. Off-loading polling to distributed Integrator agents reduces the load on the management station and consumes less network bandwidth. Communication between the manager and Agent Integrator occurs only when the manager sends a SET request to activate or de-activate the polling, or when the Agent Integrator sends an SNMP trap because it has detected the specified event in the managed resource.

Procedure for Setting Up Local Polling

The steps in using Agent Integrator for local polling can be summarized as follows:

  1. Decide which resources you want to monitor.

    The attributes of the resource that you want to monitor must be defined as MIB objects. These MIB objects must be supported by an agent or subagent that has been installed on the managed node.

  2. Make the managed resource accessible to the Agent Integrator.

    The Agent Integrator must know how to access the managed object. This means the object identifier for that object must lie within branches of the OID tree that are known to the Agent Integrator. If the managed object you want to monitor is supported by a SMUX subagent that has been installed on the managed node, the subagent automatically registers its section of the OID tree with the Agent Integrator when the subagent is started. This registration can be modified using OID_CLASS entries in the BEA Manager configuration file, as described in Chapter 7, "Configuration Files." For peer SNMP agents (or DPI or SMUX master agents), you must define the segments of the OID tree supported by those agents in NON_SMUX_PEER entries in the BEA Manager configuration file. This is described in the section "Integrator Access to Managed Objects" in Chapter 3, "Using Multiple SNMP Agents."

    The Agent Integrator directly supports the following MIB groups: the MIB II system and snmp groups, the SMUX MIB, and the beaIntAgtTable in the BEA Manager agent MIB. Additional MIB groups are supported by the unix_snmpd and nt_snmpd SMUX subagents, which are shipped with Agent Integrator. These are listed in Chapter 6, "Starting the Subagents."

  3. Define polling instructions for the Agent Integrator.

    We can divide this task into two main subtasks:

    Each polling instruction for the SNMP Integrator is called a rule. Rules are defined under the RULE_ACTION entry in the BEA Manager configuration file, beamgr.conf. You can use your favorite text editor to modify this file. (For information on how to create rules, see the section "Creating New Polling Rules.") Rules are explained below under "Introduction to Agent Integrator Rules." Chapter 7, "Configuration Files," provides the complete syntax for the RULE_ACTION entries.

  4. Configure your SNMP management system for Agent Integrator traps.

    When a polling threshold is crossed, Agent Integrator sends an enterprise-specific SNMP trap notification to the destinations specified by the TRAP_HOST entries in the BEA Manager configuration file. Some configuration is required on your SNMP-compliant management system to make use of these traps. The exact set of steps you need to perform varies depending upon which management system you are using. Typically some configuration or mapping is required to get the management system to perform a desired action (such as turning an icon red) when a trap is received. Consult your management system documentation for specific instructions.

  5. Start Agent Integrator polling.

    The Agent Integrator begins executing all valid polling rules when it is started. Refer to the section "Starting and Stopping Polling" for more details.

  6. De-activate or re-activate Agent Integrator polling, when desired.

    Polling rules are available as MIB objects; thus an operator can de-activate or re-activate polling from the management station by means of an SNMP SET request. This is described in the section "Starting and Stopping Polling."

Introduction to Agent Integrator Rules

An Agent Integrator rule consists of the following parts:

Conditions

When the Agent Integrator polls, it checks to determine if a specified condition holds. A condition is defined as a relationship between an object (specified by its object identifier) and a value. (Object identifiers are described in Chapter 1, "Agent Integrator Overview." Polling is described in "Polling.")

Relations for Defining Conditions

The condition obtains (the threshold is crossed) if and only if the specified relation holds between the object and the value. For example, the relation greater than defines the following condition:

disk capacity in use greater than 90 percent

In this case, the condition holds (evaluates to true) if the object (percentage of disk capacity in use) has a value that is greater than 90. (In this example, the condition is described in English, not the actual code used to define Agent Integrator polling rules.)

Any of the relations listed in the following table can be used to define conditions.

Table 4-1 Relations for Defining Conditions

Symbol    Meaning
==        is identical to
!=        is not identical to
<         is less than (for numeric values); is a substring of (for strings)
>         is greater than (for numeric values); contains (for strings)
<=        is less than or equal to
>=        is greater than or equal to
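
The relations in Table 4-1 can be illustrated outside the product with a small Python sketch. This is not Agent Integrator code; the function name compare is hypothetical. It assumes that, for strings, < tests whether the polled value is a substring of the rule value and > tests whether the polled value contains the rule value. The table does not say how <= and >= apply to strings, so the sketch rejects them.

```python
import operator

# Numeric meaning of each relation symbol in Table 4-1.
NUMERIC_RELATIONS = {
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    ">": operator.gt,
    "<=": operator.le,
    ">=": operator.ge,
}


def compare(relation, polled, rule_value):
    """Return True if `polled relation rule_value` holds (hypothetical helper)."""
    if relation not in NUMERIC_RELATIONS:
        raise ValueError(f"unknown relation: {relation}")
    if isinstance(polled, str) and isinstance(rule_value, str):
        if relation == "<":   # assumption: polled is a substring of rule_value
            return polled in rule_value
        if relation == ">":   # assumption: polled contains rule_value
            return rule_value in polled
        if relation in ("==", "!="):
            return NUMERIC_RELATIONS[relation](polled, rule_value)
        raise ValueError(f"{relation} is not described for strings")
    return NUMERIC_RELATIONS[relation](polled, rule_value)


# "disk capacity in use greater than 90 percent"
disk_threshold_crossed = compare(">", 95, 90)
```

With a polled value of 95 the condition holds; with a polled value of exactly 90 it does not, because > is a strict comparison.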

Polling with a SMUX Subagent

For example, suppose that we want the Agent Integrator to check if CPU usage has exceeded 80 percent. This feature of the CPU is represented by the beaSysPerfCpu object in the beaSysPerf group. This MIB group is supported by the unix_snmpd subagent, which is shipped with the Agent Integrator. This subagent uses SNMP Multiplex (SMUX) protocol to talk to the Agent Integrator. Thus, the Agent Integrator can obtain the value of this object from the unix_snmpd subagent if it is running on the same machine. We could use the following condition to define a polling rule for the Agent Integrator:

(VAL(.1.3.6.1.4.1.140.11.1.0) > 80)

The expression VAL() is used to obtain the value of the beaSysPerfCpu object. The specified condition obtains if the percentage of CPU capacity in use exceeds 80 percent. In this example, the initial dot indicates that this is an absolute OID, that is, the path to the beaSysPerfCpu object is defined from the root of the OID tree. The actual OID for the beaSysPerfCpu object is .1.3.6.1.4.1.140.11.1. However, when retrieving the value, it is necessary to specify the instance of the object to be retrieved. The last numeral, 0, is the instance index. Because beaSysPerfCpu is a scalar object (an object that can have only one instance), an index of zero is specified in this case. (How to specify non-scalar objects is discussed in the section "Instance Indexes.")

The following is an example of an Agent Integrator rule that uses the condition previously specified:

RULE_ACTION checkcpu 600 \
if (VAL(.1.3.6.1.4.1.140.11.1.0) > 80) {TRAPID_ERR = 200}

In this example, checkcpu is the name of the rule. The Agent Integrator checks the CPU usage every ten minutes. If the value of beaSysPerfCpu is greater than 80 percent, TRAPID_ERR = 200 instructs the Agent Integrator to generate an enterprise-specific trap with a specific-trap type number of 200. This type number can be used by a system administrator to identify the cause of the trap.

Note: The MIB objects whose values the Agent Integrator can obtain depend on the MIB objects supported by the agents or subagents that the Agent Integrator is managing. In the previous example, the Agent Integrator can poll for the beaSysPerfCpu object value only if the unix_snmpd subagent is running on the managed node. The MIB objects that Agent Integrator can access through a "peer" SNMP agent depend on the NON_SMUX_PEER entries in the BEA Manager configuration file, as explained in Chapter 3, "Using Multiple SNMP Agents."

Polling with SNMP Peer Agents

The Agent Integrator can also obtain MIB object values from SNMP "peer" agents on either the same machine or other machines in the network. For example, suppose that we have a peer SNMP agent that supports the MIB II interfaces group. If so, we might want the Integrator to check if a physical interface is not operational. This feature of the interface is represented by the ifOperStatus object in the ifTable in the MIB II interfaces group. In this case, we want to know whether the value of ifOperStatus is not equal to 1. (An interface is operational if its ifOperStatus value is 1.) If we want to check the ifOperStatus value for the first interface on the machine, we could use the following condition:

(VAL(.1.3.6.1.2.1.2.2.1.8.1) != 1)

This condition holds if and only if the first interface in the ifTable is not up, that is, its ifOperStatus value is anything other than 1. The last numeral, 1, specifies the instance index: the first interface entry in the table.

If the condition is satisfied, we want the Agent Integrator to take some action. For example, if the ifOperStatus value for an interface is not 1 (i.e., the interface is not up), we might want the Agent Integrator to notify the management station. To do this, we can specify that the Agent Integrator send an enterprise-specific SNMP trap to the management station with a special specific-trap value, which identifies the cause of the trap to the systems administrator.

Instead of requesting this notification if a specific interface (such as the first one in the ifTable) is down, we might want to be notified if any of the interfaces is down.

Here is an example of a rule entry that would do this:

RULE_ACTION checkIf 120 \
if (VAL(.1.3.6.1.2.1.2.2.1.8.*) != 1) {TRAPID_ERR=300}

In this example, checkIf is the name we have given to this rule. The Agent Integrator checks the interfaces every two minutes (120). Because the asterisk wildcard is used for the instance index, all instances are checked, and the condition is satisfied if any interface in the ifTable has an ifOperStatus value not equal to 1 (that is, the interface is not up). When that happens, an enterprise-specific trap is sent with a specific trap ID of 300.

Note: This rule only causes a trap to be generated when Agent Integrator first detects that an interface is down. If the interface continues to be down, it does not generate additional traps.
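
The wildcard behavior of the checkIf rule can be illustrated with a short Python sketch. This is only a model, not Agent Integrator code: the function wildcard_matches and the sample ifOperStatus values are hypothetical, and the snapshot dictionary stands in for the values a real poll would return.

```python
def wildcard_matches(snapshot, wildcard_oid, predicate):
    """Return the instance OIDs under wildcard_oid whose value satisfies predicate.

    snapshot maps full instance OIDs (as strings) to polled values.
    """
    if not wildcard_oid.endswith(".*"):
        raise ValueError("expected an OID ending in the .* wildcard")
    prefix = wildcard_oid[:-1]  # drop the asterisk, keep the trailing dot
    return sorted(
        oid for oid, value in snapshot.items()
        if oid.startswith(prefix) and predicate(value)
    )


# Hypothetical poll result: interface 2 is down (ifOperStatus = 2).
if_oper_status = {
    ".1.3.6.1.2.1.2.2.1.8.1": 1,
    ".1.3.6.1.2.1.2.2.1.8.2": 2,
    ".1.3.6.1.2.1.2.2.1.8.3": 1,
}

down = wildcard_matches(if_oper_status, ".1.3.6.1.2.1.2.2.1.8.*", lambda v: v != 1)
condition_holds = bool(down)  # true if any instance satisfies the condition
```

Here the condition holds because one instance has a value other than 1, even though the other two interfaces are up.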

Use of Logical Operators in Conditions

Conditions are of two types, simple and complex. A simple condition consists of a relation between a managed object and a value. All of the examples in the previous sections have been simple conditions.

You can use the logical operators AND, OR, and NOT to define complex conditions. For example, if A and B are two simple conditions, you can specify a complex condition that consists of both A and B occurring. The symbols listed in the following table can be used to define complex conditions.

Table 4-2 Logical Operators for Specifying Complex Conditions

Symbol                          Meaning
!(condition_A)                  Logical negation. The threshold is crossed if and only if condition_A does not hold.
(condition_A || condition_B)    Logical disjunction. The threshold is crossed if and only if either condition_A or condition_B obtains.
(condition_A && condition_B)    Logical conjunction. The threshold is crossed if and only if both condition_A and condition_B obtain.

Scenario for Using a Complex Condition

For example, we might not want the Agent Integrator to send an alarm when ifOperStatus is not up for an interface if a system administrator has taken that interface down for repair. In that case, we could define a rule that asks the Agent Integrator to determine if two conditions hold: ifOperStatus is not up AND ifAdminStatus is up. In other words, we want to be notified if the interface should be up but is not.

Note: The MIB objects whose values the Agent Integrator can obtain depend on the MIB objects supported by the agents or subagents that the Agent Integrator is managing.

Sample Code for this Scenario

To do this, we might modify our checkIf rule as follows:

RULE_ACTION checkIf 60 \
if ((VAL(.1.3.6.1.2.1.2.2.1.8.*) != 1) && \
(VAL(.1.3.6.1.2.1.2.2.1.7.*) == 1)) \
{TRAPID_ERR=301}

How this Rule Works

In this example, the Agent Integrator checks the interfaces every minute (60) and generates an enterprise-specific trap, with a specific trap value of 301, if any of the interfaces is not up (ifOperStatus not equal to 1) but has an ifAdminStatus value of up (i.e., the interface should be up but it is not).

Note: This rule causes this trap to be generated only when the condition first evaluates to true. As long as the interface continues in the same state, a new trap is not generated.
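
The logic of this complex condition can be modeled in a few lines of Python. The sketch assumes that the two wildcards are matched interface by interface, which is how the description above reads ("any of the interfaces is not up but has an ifAdminStatus value of up"). The function name and the sample values are hypothetical.

```python
UP = 1  # value of ifOperStatus / ifAdminStatus that means "up"


def interfaces_unexpectedly_down(oper_status, admin_status):
    """Return interface indexes where ifOperStatus != 1 and ifAdminStatus == 1.

    Both arguments map an interface index to the polled value.
    """
    return sorted(
        index for index, oper in oper_status.items()
        if oper != UP and admin_status.get(index) == UP
    )


# Hypothetical poll: interface 2 failed, interface 3 was taken down on purpose.
oper = {1: 1, 2: 2, 3: 2}
admin = {1: 1, 2: 1, 3: 2}

failed = interfaces_unexpectedly_down(oper, admin)
condition_holds = bool(failed)
```

Only interface 2 satisfies both simple conditions, so it alone causes the rule's condition to hold; interface 3 is ignored because an administrator has set it down.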

Data Types for Defining Conditions

The syntax for a simple condition is as follows:

(VAL(oid) relation value)

where

relation
Is one of the relations described in Table 4-1.

oid
Is specified in one of the formats described in the section "Specifying Object Identifiers in Conditions."

value
Can be one of the following data types:

Specifying Object Identifiers in Conditions

In defining polling conditions, the object identifier (OID) must be specified numerically, not using textual symbols (other than mib-2 or enterprises as indicated in the following list). One of the following formats can be used to specify the object identifier:

Instance Indexes

Columnar objects are used to represent a column of a tabular MIB group. Columnar objects accordingly can have multiple instances. To specify an instance, the index is appended to the rest of the OID. If the index is a single attribute, the last number in the OID specifies the particular instance. If more than one attribute is required to uniquely identify an instance, an instance number for each attribute is appended to the OID, separated by dots, in the order specified by the INDEX definition in the ASN.1 file.

For example, suppose that you want to check for the condition when the state of a particular server is anything but active. To uniquely specify a server instance, we require both the group number and the server ID. The INDEX entry for tuxTsrvrTbl in the ASN.1 file specifies the following as an INDEX to particular instances.

INDEX (tuxTsrvrGrpNo,tuxTsrvrId)

The relative OID for tuxTsrvrState is the following:

140.300.20.1.1.5

Thus, to specify the particular server instance for group 55 and server ID 3, you use the following OID:

140.300.20.1.1.5.55.3

Note that the order of the two attribute instances added to the tuxTsrvrState OID is indicated by the INDEX definition above: tuxTsrvrGrpNo followed by tuxTsrvrId.

You can thus define the condition that you want to check as follows:

VAL(140.300.20.1.1.5.55.3) != 1

This condition will evaluate to true whenever this particular server instance is not active.
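
The way index values are appended to a column OID can be shown with a minimal Python sketch. The helper name instance_oid is hypothetical; the OIDs and index values are the ones used in this chapter.

```python
def instance_oid(object_oid, *index_values):
    """Append one index value per INDEX attribute, in INDEX order, to an OID."""
    if not index_values:
        raise ValueError("at least one index value is required")
    parts = [object_oid.rstrip(".")] + [str(value) for value in index_values]
    return ".".join(parts)


# tuxTsrvrState for group number 55, server ID 3 (INDEX order: group, then ID).
server_state = instance_oid("140.300.20.1.1.5", 55, 3)

# A scalar object such as beaSysPerfCpu always takes the instance index 0.
cpu_usage = instance_oid(".1.3.6.1.4.1.140.11.1", 0)
```

Reversing the two index values would name a different instance (group 3, server ID 55), which is why the INDEX order matters.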

A specific number can be used to specify a particular instance or the asterisk wildcard can be used to specify all instances. Zero is used as the instance index in the case of scalar objects (objects that can have only one instance). The asterisk wildcard is only used to represent all instances of a columnar object. For example:

.1.3.6.1.4.1.140.1.1.0 

specifies the single instance of a scalar object while:

.1.3.6.1.4.1.140.2.22.1.2.*

specifies all of the instances of a columnar object. When a wildcard is used to define a condition, the condition will be satisfied if any instance satisfies the condition.

States and Transitions

Associated with each active polling rule is a state. There are two possible states for an active rule:

OK - A rule is in the OK state when the specified condition does not hold (threshold is not crossed).

ERR - A rule is in the ERR state when the specified condition does hold (threshold is crossed).

A transition takes place when the value obtained from a poll of an object results in the state of a rule changing, either from OK to ERR or from ERR to OK. Transitions determine when an action is to be taken in response to a poll, as described in the following section. That is, Agent Integrator polling rules execute an action (such as generating a trap notification) only when a polling rule undergoes a transition from one state to a different state.

When the Agent Integrator begins executing a polling rule, the rule is initially in the OK state. As long as the threshold is not crossed, the rule remains in the OK state. If the threshold is crossed, the rule undergoes a transition from the OK state to the ERR state. As long as the condition continues to evaluate as true, the rule remains in the ERR state. If the condition subsequently evaluates to false, the rule then transitions back to the OK state. Thus, there are two types of transition: from OK to ERR, and from ERR to OK.

Actions

An Agent Integrator rule can specify an action to be taken if the polling rule undergoes one of these transitions. Two different types of action can be specified: sending an enterprise-specific SNMP trap, and executing a command.

Both types of action can be specified in the same rule.

Note: Agent Integrator carries out an action only when a transition occurs. Continued polling does not result in duplicate actions as long as the rule remains in the same state. This prevents duplicate traps from being generated in response to detection of a single event.

Four keywords are used to define actions:

TRAPID_ERR = specific-trap-number

Indicates that a trap should be sent if the state of the rule transitions from OK to ERR.

TRAPID_OK = specific-trap-number

Indicates that a trap should be sent if the state of the rule transitions from ERR to OK.

COMMAND_ERR = "command"

The program specified by command is executed if the state of the rule transitions from OK to ERR.

COMMAND_OK = "command"

The program specified by command is executed if the state of the rule transitions from ERR to OK.
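
The relationship between polls, state transitions, and the four action keywords can be sketched as a small Python model. This is an illustration only, not the Agent Integrator implementation; the function fired_actions and the sample poll sequence are hypothetical.

```python
def fired_actions(actions, poll_results):
    """Return the (keyword, value) pairs that fire for a sequence of polls.

    actions maps action keywords to their values; poll_results is a list of
    booleans, one per poll, saying whether the rule's condition held.
    """
    state = "OK"  # a rule starts in the OK state
    fired = []
    for condition_holds in poll_results:
        new_state = "ERR" if condition_holds else "OK"
        if new_state == state:
            continue  # no transition, so no action and no duplicate trap
        # OK -> ERR fires the *_ERR actions; ERR -> OK fires the *_OK actions.
        for keyword in (f"TRAPID_{new_state}", f"COMMAND_{new_state}"):
            if keyword in actions:
                fired.append((keyword, actions[keyword]))
        state = new_state
    return fired


rule_actions = {"TRAPID_ERR": 102, "TRAPID_OK": 202}
polls = [False, True, True, True, False]
result = fired_actions(rule_actions, polls)
```

Although the condition holds on three consecutive polls, only one TRAPID_ERR action fires; the TRAPID_OK action fires once when the condition stops holding.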

Note: The string specifying the command to be executed must be in quotes. For example: COMMAND_ERR = "/usr/mybin/test.ksh"

If you do not specify the absolute path to the executable or script, the path should be specified in the Agent Integrator's environment settings.

These statements specifying actions must be placed within curly braces. When multiple actions are specified in a rule, they must be separated by spaces. The command string must be enclosed in quotes.
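
To make the layout of a rule entry concrete, the following Python sketch splits a RULE_ACTION entry into its name, polling interval, condition, and actions. It is a hypothetical helper for reading rules such as those shown in this chapter, not a parser for the complete syntax given in Chapter 7, and it assumes a trailing backslash continues the entry on the next line.

```python
import re

RULE_RE = re.compile(
    r"RULE_ACTION\s+(?P<name>\S+)\s+(?P<interval>\d+)\s+"
    r"if\s+(?P<condition>\(.*\))\s*\{(?P<actions>.*)\}\s*$",
    re.DOTALL,
)
ACTION_RE = re.compile(
    r'(TRAPID_ERR|TRAPID_OK|COMMAND_ERR|COMMAND_OK)\s*=\s*("[^"]*"|\d+)'
)


def parse_rule(text):
    """Split one RULE_ACTION entry into its parts (hypothetical helper)."""
    # Join continuation lines: backslash, newline, and surrounding blanks.
    joined = re.sub(r"\\\s*\n\s*", " ", text.strip())
    match = RULE_RE.match(joined)
    if match is None:
        raise ValueError("not a recognizable RULE_ACTION entry")
    actions = {
        keyword: value.strip('"')
        for keyword, value in ACTION_RE.findall(match["actions"])
    }
    return {
        "name": match["name"],
        "interval_seconds": int(match["interval"]),
        "condition": match["condition"],
        "actions": actions,
    }


example = r"""RULE_ACTION diskchk 600 \
if (VAL(140.2.22.1.5.*) > 90) {TRAPID_ERR = 102 TRAPID_OK = 202}"""

rule = parse_rule(example)
```

The sketch keeps the condition as text and returns action values as strings; evaluating the condition is a separate step.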

A string containing the name of the rule and the direction of the state transition (OK to ERR or ERR to OK) is passed as an argument to the script or program called by the COMMAND_ERR or COMMAND_OK actions.
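
A program called by COMMAND_ERR or COMMAND_OK might simply record the argument it receives. The Python sketch below is one hypothetical way to do that. The exact format of the argument string is not specified here, so the sketch logs whatever it is given without trying to interpret it; the sample argument text is invented for illustration.

```python
import io
from datetime import datetime, timezone


def log_transition(arguments, stream, now=None):
    """Write a timestamped line containing the arguments passed by the rule."""
    now = now or datetime.now(timezone.utc)
    stream.write(f"{now.isoformat()} {' '.join(arguments)}\n")


# Illustration with an in-memory stream and an invented argument string.
buffer = io.StringIO()
log_transition(
    ["diskchk OK to ERR"],
    buffer,
    now=datetime(2000, 1, 1, tzinfo=timezone.utc),
)
logged = buffer.getvalue()
```

In a real script, the arguments would come from sys.argv[1:] and the stream would be a log file opened for appending.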

Trap Information

The following information is passed in the enterprise-specific traps generated by Integrator polling rules:

Examples

In the following example, the Agent Integrator polls every ten minutes (600) to determine if disk capacity in use is greater than 90 percent. If any file system has more than 90 percent capacity in use, an enterprise-specific trap with number 102 is generated. If subsequently all the file systems have less than or equal to 90 percent of capacity in use, an enterprise-specific trap with trap number 202 is generated.

RULE_ACTION diskchk 600 \
if (VAL(140.2.22.1.5.*) > 90) {TRAPID_ERR = 102 TRAPID_OK = 202}

In the next example, a TUXEDO application is checked to determine if the transaction triptime exceeds 35 mSec. If the threshold is crossed, an enterprise-specific trap with a number of 301 is generated and a user script, logtime, is invoked to log the time of the event. If the triptime subsequently falls to 35 mSec or less after having crossed that threshold on a previous poll, an enterprise-specific trap with a number of 302 is generated.

RULE_ACTION triptime 20 \
if (VAL(140.150.1.3.*) > 35) \
{TRAPID_ERR = 301 TRAPID_OK = 302 \
COMMAND_ERR = "/usr/sbin/logtime"}

Note: The object identifier in this example is not defined in the BEA MIB. This is an example of an object that might be defined in a user-supplied custom MIB.

In the next example, Agent Integrator polls every five seconds to check whether the number of requests completed by the TUXEDO server Server1 is greater than six. If it is, an enterprise-specific trap is generated with a specific trap number of 210 and the command c:/etc/srv_reqs.cmd is executed.

RULE_ACTION Server1 5 \
if ((VAL(140.300.20.2.1.12.*) > 6)) \
{ TRAPID_ERR=210 COMMAND_ERR="c:/etc/srv_reqs.cmd" }

In the next example, Agent Integrator checks every minute (60) whether a particular server instance is in any state other than active. The server that is being checked is uniquely identified by its group number and server ID: group number 55 and server ID 3.

RULE_ACTION srvrUp 60 if (VAL(140.300.20.1.1.5.55.3) != 1) \
{TRAPID_ERR = 306 TRAPID_OK = 307}

Whenever the server satisfies the condition, the rule transitions to the ERR state and generates an enterprise-specific trap with the specific trap number of 306. Whenever the server becomes active again, it transitions back to the OK state and issues a trap with the specific trap number of 307.

Starting and Stopping Polling

Polling rules are defined as RULE_ACTION entries in the BEA Manager configuration file, beamgr.conf. The default location of this file is /etc on UNIX machines or C:\etc on Windows NT machines. Individual rules are MIB objects, stored as an entry (row) in the beaIntAgtTable.

The status of each rule entry determines whether the Agent Integrator will execute that rule, that is, actively check the condition specified in the rule. The status of each rule entry is stored in the beaIntAgtStatus object. Polling is active for a rule if the status of that rule is valid (integer value of 1). Polling is inactive for a rule if its status has been set to inactive (integer value of 3). The specific rule can be SET from a management station (such as OpenView or SunNet Manager) by using the unique name of the rule as the key field used to specify the entry instance (row).
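
The effect of the status values can be modeled with a short Python sketch. It only imitates the table in memory; it does not send SNMP requests, and the function names are hypothetical. Only the two status values described above (1 and 3) are accepted.

```python
VALID = 1     # polling is active for the rule
INACTIVE = 3  # polling is inactive for the rule


def set_rule_status(status_by_rule, rule_name, new_status):
    """Imitate a SET on the status of the row keyed by rule_name."""
    if rule_name not in status_by_rule:
        raise KeyError(f"no rule named {rule_name!r}")
    if new_status not in (VALID, INACTIVE):
        raise ValueError("only valid (1) and inactive (3) are modeled here")
    status_by_rule[rule_name] = new_status


def active_rules(status_by_rule):
    """Return the names of the rules that would currently be polled."""
    return sorted(
        name for name, status in status_by_rule.items() if status == VALID
    )


# Every rule is valid at startup; an operator then de-activates checkIf.
table = {"checkcpu": VALID, "checkIf": VALID}
set_rule_status(table, "checkIf", INACTIVE)
still_polled = active_rules(table)
```

Setting the status of checkIf back to 1 would return it to the list of polled rules.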

Note: The Agent Integrator must be running in order to successfully SET objects in the beaIntAgtTable.

The Agent Integrator begins executing all polling rules defined in RULE_ACTION entries in the BEA Manager configuration file (beamgr.conf) when it first starts up. The status of each rule object in the beaIntAgtTable is valid at startup.

Creating New Polling Rules

Rules can be added to the configuration file in two ways:

Deleting or Modifying Polling Rules

Agent Integrator polling rules can be modified in the same two ways they can be created:

Stopping Agent Integrator Polling Activity

Polling can be de-activated in one of two ways:

Restarting Agent Integrator Polling Activity

When a polling rule has been de-activated using a SET request from a management station, the rule can be re-activated using a SET request to set the value of the corresponding beaIntAgtStatus object to valid (integer value of 1).