Sun Enterprise 10000 SSP 3.5 User Guide

Event Detector Daemon

The event detector daemon, edd(1M), is a key component in providing the reliability, availability, and serviceability (RAS) features of the Sun Enterprise 10000 system. edd(1M) initiates event monitoring on the Sun Enterprise 10000 control board, waits for the event detection monitoring task running on the control board to generate an event, and then responds to the event by executing a response action script on the SSP. Both the conditions that generate events and the responses to those events are fully configurable.

edd(1M) provides the mechanism for event management, but does not perform event detection itself. Event detection is handled by an event monitoring task that runs on the control board. edd(1M) configures the event monitoring task by downloading a vector that specifies the event types to be monitored. Event handling is provided by response action scripts, which edd(1M) invokes on the SSP when an event is received.
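
The idea of a vector that selects which event types are monitored can be illustrated with a small sketch. The layout of the real vector is not described here, so the bitmask encoding and the EventType names below are assumptions made purely for illustration.

```python
from enum import IntFlag


class EventType(IntFlag):
    """Hypothetical event types; the real set is defined by the SSP software."""
    OVERTEMP = 1 << 0
    VOLTAGE = 1 << 1
    DOMAIN_CRASH = 1 << 2


def build_event_vector(enabled):
    """Combine the enabled event types into a single bitmask (assumed encoding)."""
    vector = EventType(0)
    for event in enabled:
        vector |= event
    return vector


def is_monitored(vector, event):
    """Return True if the given event type is switched on in the vector."""
    return bool(vector & event)


# Enable two of the three hypothetical event types.
vector = build_event_vector([EventType.OVERTEMP, EventType.DOMAIN_CRASH])
```

In this sketch, is_monitored(vector, EventType.VOLTAGE) is False because that type was never added to the vector, which mirrors how an event type left out of the downloaded vector would simply not be monitored.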

At SSP startup, edd(1M) obtains many of its initial control parameters from the following:

The RAS features are provided by several collaborative programs. The control board within the platform runs a control board executive (CBE) program that communicates through the Ethernet with a control board server daemon, cbs(1M), on the SSP. These two components provide the data link between the platform and the SSP.

The SSP provides a set of interfaces for accessing the control board through the control board server and the simple network management protocol (SNMP) agent. edd(1M) uses the control board server interface to configure the event detection monitoring task on the control board executive (Figure 10-2).

Figure 10-2 Uploading Event Detection Scripts

After it is configured, the event detection monitoring task polls various conditions within the platform, including environmental conditions, signature blocks, power supply voltages, performance data, and so forth. If an event detection script detects a change of state that warrants an event, an event message containing the pertinent information is generated and delivered to the control board server, cbs(1M). Upon receipt of the event message, the control board server delivers the event to the SNMP agent, which in turn generates an SNMP trap (Figure 10-3).

Figure 10-3 Event Recognition and Delivery
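
The phrase "a change of state that warrants an event" can be made concrete with a short sketch. It assumes a single temperature reading and an illustrative limit of 50.0 degrees; the classify and detect_events names, the limit, and the message fields are all hypothetical and are not taken from the control board software.

```python
def classify(temp_c, limit_c=50.0):
    """Map a raw reading to a coarse state. The default limit is illustrative only."""
    return "over" if temp_c > limit_c else "normal"


def detect_events(readings, limit_c=50.0):
    """Yield one event message per change of state, not one per sample."""
    previous = "normal"
    for index, temp_c in enumerate(readings):
        current = classify(temp_c, limit_c)
        if current != previous:
            # Only a transition between states produces an event message.
            yield {"sample": index, "from": previous, "to": current, "temp_c": temp_c}
            previous = current


# Five polled samples: the state changes at sample 2 and again at sample 4.
events = list(detect_events([41.0, 48.5, 52.0, 55.5, 47.0]))
```

Polling five samples here yields only two event messages, one when the reading first exceeds the limit and one when it drops back below it.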

Upon receipt of an SNMP trap, edd(1M) determines whether to initiate a response action. If a response action is required, edd(1M) runs the appropriate response action script as a subprocess (Figure 10-4).

Figure 10-4 Response Action
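
The decision and dispatch step can be sketched as a lookup followed by a subprocess launch. This is not how edd(1M) is implemented; the RESPONSE_ACTIONS table, the handle_trap function, and the event names are hypothetical, and each "script" is a Python one-liner so that the sketch runs on any machine.

```python
import subprocess
import sys

# Hypothetical mapping from event type to response command. A real SSP would
# point at its own response action scripts instead of a Python one-liner.
RESPONSE_ACTIONS = {
    "overtemp": [sys.executable, "-c", "print('overtemp response ran')"],
}


def handle_trap(event_type):
    """Start the response action for event_type, or return None if none is configured."""
    command = RESPONSE_ACTIONS.get(event_type)
    if command is None:
        return None  # no response action is required for this event
    # Run the response action as a subprocess and capture what it prints.
    return subprocess.Popen(command, stdout=subprocess.PIPE, text=True)


proc = handle_trap("overtemp")
output, _ = proc.communicate()  # wait for the stand-in script to finish
ignored = handle_trap("unconfigured-event")
```

Running the response as a separate process keeps the dispatcher free to receive further traps while the script is still working.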

Event messages of the same type or related types can be generated while the response action script is running. Some of these secondary event messages are unnecessary if a response action script is already running for a similar event. For example, while edd(1M) runs a response action script for an overtemperature event, the event monitoring scripts can generate additional overtemperature events. edd(1M) does not respond to those additional events, which stem from the same overtemperature condition, until the first response script has finished. It is the responsibility of applications, such as edd(1M), to filter the events to which they respond. The cycle of event processing is completed at this point.
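
The filtering behavior described above can be sketched as a small tracker that remembers which event types have a response in progress. The ResponseTracker class and its method names are hypothetical; the sketch shows only the suppression logic, not the actual edd(1M) code.

```python
class ResponseTracker:
    """Tracks which event types currently have a response action in progress."""

    def __init__(self):
        self._running = set()

    def should_respond(self, event_type):
        """Return True and mark the type busy, or False if a response is already running."""
        if event_type in self._running:
            return False
        self._running.add(event_type)
        return True

    def finished(self, event_type):
        """Call when the response script for event_type has exited."""
        self._running.discard(event_type)


tracker = ResponseTracker()
first = tracker.should_respond("overtemp")   # starts a response
second = tracker.should_respond("overtemp")  # suppressed, response still running
tracker.finished("overtemp")                 # first response script has finished
third = tracker.should_respond("overtemp")   # a new response may start
```

The second overtemperature event is ignored because a response for that type is already running, and the third is accepted once the first response has been marked as finished.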

The response to a domain crash is another example of how edd(1M) handles an event. After a domain crash, edd(1M) invokes the bringup(1M) script. The bringup(1M) script runs the POST program, which tests the Sun Enterprise 10000 components. It then uses the obp_helper(1M) daemon to download and begin execution of OBP in the domain specified by the SUNW_HOSTNAME environment variable. This automatic reboot occurs only when a domain fails (for example, after a kernel panic). After a halt or shutdown, you must run bringup(1M) manually, which then causes OBP to be downloaded and run.
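
The difference between a crash and a deliberate halt can be summarized in a short sketch. The plan_recovery function and the reason labels "crash", "halt", and "shutdown" are hypothetical. The sketch only prepares a command and an environment and never runs bringup(1M), whose options are deliberately omitted.

```python
import os


def plan_recovery(domain_name, reason):
    """Return (command, env) for an automatic restart, or None if an operator must act.

    The reason labels are assumptions made for this sketch.
    """
    if reason != "crash":
        return None  # after a halt or shutdown, bringup is run manually
    env = dict(os.environ)
    env["SUNW_HOSTNAME"] = domain_name  # selects the domain to bring up
    # Real invocations may need options that are not shown in this sketch.
    return ["bringup"], env


automatic = plan_recovery("domainA", "crash")
manual = plan_recovery("domainA", "halt")
```

Here "domainA" is a placeholder domain name. Only the crash case yields a command to run, with SUNW_HOSTNAME set to the affected domain.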