NAME | DESCRIPTION | EXAMPLES | FILES | SEE ALSO
edd.erc files are ASCII files that specify how the system is to respond to certain events. Each domain has its own Event Response Configuration (ERC) file, and a separate ERC file exists for global (system-wide) events. Domain ERC files are responsible for events specific to a system board in that domain. Global ERC files handle events for nonsystem boards, system boards that are not part of a booted domain, and other nondomain-specific components.
Global ERC files are generated from a template file upon invocation of the ssp_config(1M) command. Domain ERC files are generated from a template file by domain_create(1M). The template files used by both commands reside in $SSPVAR/.ssp_private/templates/Ultra-Enterprise-10000. The global ERC template is named edd.platform.erc and the domain ERC template is named edd.domain.erc.
Each ERC file contains a series of lines in the following format:
event_type : invoke_action : throttle_timeout : throttle_counter : select_action
where:
event_type: A mnemonic (name string) that corresponds to an event type. See EXAMPLES, below.
invoke_action: A keyword, either enabled, which tells the system to invoke the Response Action Script for the event type, or disabled, which tells it not to. If this field is blank, the system does not invoke the script.
throttle_timeout: A time interval, in seconds, within which select_action may run at most throttle_counter times. Once it has run throttle_counter times, further runs are not permitted until throttle_timeout seconds have expired.
throttle_counter: The number of times select_action is permitted to run within each throttle_timeout interval. After throttle_timeout seconds have expired, select_action may again run up to throttle_counter times.
select_action: The Response Action Script to be invoked. This field can contain the script name, with or without a full path, as necessary. The path may contain an environment variable, such as $SSPOPT, $SSPVAR or $SUNW_HOSTNAME. Optional arguments to the script name can be literals, such as -v, or event information: %e for event type, %b for the board with the event, %t for board type, and %d for the SNMP trap data. Valid board types (%t) are:
0 : No type
1 : System board
2 : Control board
3 : Centerplane
4 : Centerplane support board
See EXAMPLES, below.
Fields are separated by a single colon, which may optionally be surrounded by spaces. Lines that begin with a pound sign (#) are treated as comments and are not parsed.
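The line format and comment rule above can be sketched as a small parser. This is a minimal illustration, not the SSP's own parser; the ErcEntry class and the function names are hypothetical. Two assumptions are marked in the comments: leading whitespace is stripped before the comment check, and the line is split only on its first four colons so that the select_action field keeps its text intact.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ErcEntry:
    # Hypothetical container for the five fields of one ERC line.
    event_type: str
    enabled: bool
    throttle_timeout: int
    throttle_counter: int
    select_action: str


def parse_erc_line(line: str) -> Optional[ErcEntry]:
    """Parse one ERC line; return None for comments and blank lines."""
    stripped = line.strip()  # assumption: leading spaces before '#' are ignored
    if not stripped or stripped.startswith("#"):
        return None
    # Assumption: split on the first four colons only, so select_action
    # is kept whole even if it happens to contain a colon.
    fields = [f.strip() for f in stripped.split(":", 4)]
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")
    event_type, invoke, timeout, counter, action = fields
    return ErcEntry(
        event_type=event_type,
        # A blank (or "disabled") invoke_action means the script is not invoked.
        enabled=(invoke == "enabled"),
        throttle_timeout=int(timeout),
        throttle_counter=int(counter),
        select_action=action,
    )


def parse_erc(text: str) -> List[ErcEntry]:
    """Parse a whole ERC file, skipping comment and blank lines."""
    entries = []
    for line in text.splitlines():
        entry = parse_erc_line(line)
        if entry is not None:
            entries.append(entry)
    return entries
```

For example, parse_erc_line("arbstop : enabled : 900 : 3 : Arbstopact -d %d") yields an enabled entry with a 900-second timeout, a counter of 3, and the action "Arbstopact -d %d".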
The throttle_timeout and throttle_counter fields are used to control how often an action should be run. For example, with a throttle_timeout value of 600 seconds and a throttle_counter value of 3, an action (select_action) can run only three times every 10 minutes (600 seconds). This throttling of actions is helpful to reduce the number of repetitive log messages and dump files. Note that similar actions for different system boards, domains, etc. are throttled independently. For example, a sys_brd_temp_max event action would not throttle another sys_brd_temp_max event action for a different board. Similarly, an arbstop event action would not throttle another arbstop event action for a different domain.
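The throttling rule can be sketched as a per-key counter. This is a minimal sketch under stated assumptions: the page does not say whether the interval is a fixed window or a sliding one, so this hypothetical Throttle class uses a fixed window that starts at the first permitted run. The key is whatever distinguishes independent throttles, such as (event_type, board) or (event_type, domain). With this sketch, a throttle_timeout of 0 never suppresses a run.

```python
import time
from typing import Callable, Dict, Hashable, List


class Throttle:
    """Allow at most `counter` runs per `timeout`-second window, per key.

    Hypothetical helper; assumes a fixed window that begins at the first
    permitted run for each key.
    """

    def __init__(self, timeout: float, counter: int,
                 clock: Callable[[], float] = time.monotonic):
        self.timeout = timeout
        self.counter = counter
        self.clock = clock
        # key -> [window_start_time, runs_in_this_window]
        self._windows: Dict[Hashable, List[float]] = {}

    def allow(self, key: Hashable) -> bool:
        now = self.clock()
        window = self._windows.get(key)
        if window is None or now - window[0] >= self.timeout:
            # First run for this key, or the previous window has expired.
            self._windows[key] = [now, 1]
            return True
        if window[1] < self.counter:
            window[1] += 1
            return True
        return False  # suppressed until the window expires
```

With Throttle(600, 3), the first three sys_brd_temp_max events for board 5 within 10 minutes are allowed and the fourth is suppressed, while an event for board 6 uses its own key and is allowed.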
The ERC file can specify more than one Response Action Script for a given event. To designate a secondary Response Action Script, use a second line with the same event-type mnemonic as that of the first line. Response Action Scripts are invoked sequentially (rather than in parallel) in the order they appear in the Event Response Configuration file. If multiple Response Action Scripts exist for an event, you can supply the name and exit status of the previous Response Action Script to the present Response Action Script through the arguments %p and %s, respectively.
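Sequential invocation and argument substitution can be sketched as follows. This is an illustration only: the expand and run_actions names are hypothetical, the runner callable stands in for actually executing a script, and %p and %s are assumed to expand to empty strings for the first script because the page does not say what they contain before any script has run.

```python
import re
from typing import Callable, Dict, List, Tuple


def expand(action: str, info: Dict[str, str]) -> str:
    """Replace %e, %b, %t, %d, %p and %s tokens with event information."""
    def sub(match: "re.Match[str]") -> str:
        # Leave a token untouched if no value was supplied for it.
        return info.get(match.group(1), match.group(0))
    return re.sub(r"%([ebtdps])", sub, action)


def run_actions(actions: List[str], event: Dict[str, str],
                runner: Callable[[str], int]) -> List[Tuple[str, int]]:
    """Run one event's select_actions in file order, one after another."""
    results = []
    prev_name, prev_status = "", ""  # assumption: empty before the first script
    for action in actions:
        info = dict(event, p=prev_name, s=prev_status)
        command = expand(action, info)
        status = runner(command)  # waits for this script before starting the next
        results.append((command, status))
        prev_name = action.split()[0]  # script name without its arguments
        prev_status = str(status)
    return results
```

Because each script finishes before the next one starts, a secondary script can inspect %s to decide whether the primary script succeeded.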
Together, the edd.platform.erc and edd.domain.erc files contain entries in the following categories. For more information about event types, see edd.emc(4).
System Board Events
Control Board Events
Centerplane Events
Centerplane Support Board Events
IDN Events
CBS/CBE Connection Events
System Configuration Change Events
Host Recovery Events
Other Events
sys_brd_temp_norm : enabled : 0 : 1 : TempNormact -b %b -e %e -d %d -t %t
sys_brd_temp_high : enabled : 300 : 1 : TempHighact -b %b -e %e -d %d -t %t
sys_brd_temp_warn : enabled : 300 : 1 : TempWarnact -b %b -e %e -d %d -t %t
sys_brd_temp_max : enabled : 300 : 1 : TempMaxact -b %b -e %e -d %d -t %t
sys_brd_temp_911 : enabled : 60 : 1 : Temp911act -b %b -e %e -d %d -t %t
sys_brd_temp_bad : enabled : 300 : 1 : TempBadact -b %b -e %e -d %d -t %t
sys_brd_volt_norm : enabled : 0 : 1 : VoltageNormalact -b %b -e %e -d %d -t %t
sys_brd_volt_max : enabled : 300 : 1 : Voltageact -b %b -e %e -d %d -t %t
sys_brd_volt_min : enabled : 300 : 1 : Voltageact -b %b -e %e -d %d -t %t
sys_brd_volt_bad : enabled : 300 : 1 : VoltageBadact -b %b -e %e -d %d -t %t
Log a message indicating that the board temperature has gone from an overtemperature condition to normal.
Log a message indicating that the board's temperature is high.
Execute the following steps, as necessary, to handle a maximum over-temperature event:
If the board is part of an IDN, unlink the domain.
If the board is in a domain, shut down the domain.
Power off the board.
If the board is in a domain and there are other boards with power in the domain, reboot the domain.
If the board is part of an IDN, unlink the domain.
Power down the board regardless of whether it belongs to a domain.
Log a message indicating that the system was unable to obtain the temperature of the board.
Log a message indicating that the board's voltage reading has returned to a normal condition.
Log a message indicating that the board's voltage reading has risen above the maximum threshold.
Log a message indicating that the board's voltage reading has dipped below the minimum threshold.
Log a message indicating that the system was unable to obtain the board's voltage reading.
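The maximum over-temperature steps for a system board can be sketched as a function that returns the steps in order. This is a minimal sketch: the BoardState fields and the step strings are hypothetical, since the real SSP determines IDN membership, domain membership, and board power state itself.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BoardState:
    # Hypothetical inputs standing in for state the SSP looks up itself.
    in_idn: bool
    in_domain: bool
    other_powered_boards_in_domain: int


def temp_max_steps(board: BoardState) -> List[str]:
    """Return, in order, the steps taken for a maximum over-temperature event."""
    steps = []
    if board.in_idn:
        steps.append("unlink domain from IDN")
    if board.in_domain:
        steps.append("shut down domain")
    steps.append("power off board")  # always done
    # Reboot only if the domain still has other powered boards to run on.
    if board.in_domain and board.other_powered_boards_in_domain > 0:
        steps.append("reboot domain")
    return steps
```

A board that belongs to no domain and no IDN is simply powered off; a board in an IDN domain with other powered boards goes through all four steps.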
cb_temp_norm : enabled : 0 : 1 : TempNormact -b %b -e %e -d %d -t %t
cb_temp_high : enabled : 300 : 1 : TempHighact -b %b -e %e -d %d -t %t
cb_temp_warn : enabled : 300 : 1 : TempWarnact -b %b -e %e -d %d -t %t
cb_temp_max : enabled : 300 : 1 : TempMaxact -b %b -e %e -d %d -t %t
cb_temp_911 : enabled : 60 : 1 : Temp911act -b %b -e %e -d %d -t %t
cb_temp_bad : enabled : 300 : 1 : TempBadact -b %b -e %e -d %d -t %t
cb_volt_norm : enabled : 0 : 1 : VoltageNormalact -b %b -e %e -d %d -t %t
cb_volt_max : enabled : 300 : 1 : Voltageact -b %b -e %e -d %d -t %t
cb_volt_min : enabled : 300 : 1 : Voltageact -b %b -e %e -d %d -t %t
cb_volt_bad : enabled : 300 : 1 : VoltageBadact -b %b -e %e -d %d -t %t
Log a message indicating that the board temperature has gone from an over-temperature condition to normal.
Log a message indicating that the board's temperature is high.
If the system has fewer than two control boards configured, shut down all domains in the system and power everything off. If the system has two control boards, power off the control board that is reading the maximum temperature.
Shut down the entire system.
Log a message indicating that the system was unable to obtain the temperature of the board.
Log a message indicating that the board's voltage reading has returned to a normal condition.
Log a message indicating that the board's voltage reading has risen above the maximum threshold.
Log a message indicating that the board's voltage reading has dipped below the minimum threshold.
Log a message indicating that the system was unable to obtain the board's voltage reading.
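The control board's maximum-temperature decision depends only on how many control boards are configured. A minimal sketch, with a hypothetical function name and step strings:

```python
from typing import List


def cb_temp_max_steps(configured_control_boards: int, hot_board: int) -> List[str]:
    """Return the steps for a control board maximum-temperature event."""
    if configured_control_boards < 2:
        # No redundant control board: take the whole system down.
        return ["shut down all domains", "power off system"]
    # A second control board remains, so only the overheating one is powered off.
    return [f"power off control board {hot_board}"]
```

The design point is redundancy: with two control boards, the system can keep running on the cooler one.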
centerplane_temp_norm : enabled : 0 : 1 : TempNormact -b %b -e %e -d %d -t %t
centerplane_temp_high : enabled : 300 : 1 : TempHighact -b %b -e %e -d %d -t %t
centerplane_temp_warn : enabled : 300 : 1 : TempWarnact -b %b -e %e -d %d -t %t
centerplane_temp_max : enabled : 300 : 1 : TempMaxact -b %b -e %e -d %d -t %t
centerplane_temp_911 : enabled : 60 : 1 : Temp911act -b %b -e %e -d %d -t %t
centerplane_temp_bad : enabled : 300 : 1 : TempBadact -b %b -e %e -d %d -t %t
centerplane_volt_norm : enabled : 0 : 1 : VoltageNormalact -b %b -e %e -d %d -t %t
centerplane_volt_max : enabled : 300 : 1 : Voltageact -b %b -e %e -d %d -t %t
centerplane_volt_min : enabled : 300 : 1 : Voltageact -b %b -e %e -d %d -t %t
centerplane_volt_bad : enabled : 300 : 1 : VoltageBadact -b %b -e %e -d %d -t %t
Log a message indicating that the board temperature has gone from an overtemperature condition to normal.
Log a message indicating that the board's temperature is high.
Shut down all remaining domains, then power off the system.
Shut down the system.
Log a message indicating that the system was unable to obtain the temperature of the centerplane.
Log a message indicating that the centerplane's voltage reading has returned to a normal condition.
Log a message indicating that the centerplane's voltage reading has risen above the maximum threshold.
Log a message indicating that the centerplane's voltage reading has dipped below the minimum threshold.
Log a message indicating that the system was unable to obtain the centerplane voltage reading.
supp_brd_temp_norm : enabled : 0 : 1 : TempNormact -b %b -e %e -d %d -t %t
supp_brd_temp_high : enabled : 300 : 1 : TempHighact -b %b -e %e -d %d -t %t
supp_brd_temp_warn : enabled : 300 : 1 : TempWarnact -b %b -e %e -d %d -t %t
supp_brd_temp_max : enabled : 300 : 1 : TempMaxact -b %b -e %e -d %d -t %t
supp_brd_temp_911 : enabled : 60 : 1 : Temp911act -b %b -e %e -d %d -t %t
supp_brd_temp_bad : enabled : 300 : 1 : TempBadact -b %b -e %e -d %d -t %t
supp_brd_volt_norm : enabled : 0 : 1 : VoltageNormalact -b %b -e %e -d %d -t %t
supp_brd_volt_max : enabled : 300 : 1 : Voltageact -b %b -e %e -d %d -t %t
supp_brd_volt_min : enabled : 300 : 1 : Voltageact -b %b -e %e -d %d -t %t
supp_brd_volt_bad : enabled : 300 : 1 : VoltageBadact -b %b -e %e -d %d -t %t
Log a message indicating that the board temperature has gone from an over-temperature condition to normal.
Log a message indicating that the board's temperature is high.
Shut down all running domains, then power off the system.
Shut down the system.
Log a message indicating that the system was unable to obtain the temperature of the centerplane support board.
Log a message indicating that the centerplane support board's voltage reading has returned to a normal condition.
Log a message indicating that the centerplane support board's voltage reading has risen above the maximum threshold.
Log a message indicating that the centerplane support board's voltage reading has dipped below the minimum threshold.
Log a message indicating that the system was unable to obtain the centerplane support board's voltage reading.
idn_boot : enabled : 20 : 1 : IDNevent -e %e -d %d
idn_halt : enabled : 20 : 1 : IDNevent -e %e -d %d
idn_awol : enabled : 30 : 1 : IDNevent -e %e -d %d
cluster_arbstop : enabled : 1800 : 3 : Arbstopact -d %d
cluster_recordstop : enabled : 1800 : 3 : Recordstopact -d %d
If at least one other domain in the same IDN has also booted and loaded the IDN software, execute the domain_link(1M) command to link the subject domain into the IDN.
Update internal IDN state information to enable the IDN event-handling routines to maintain accurate status of all IDN member domains.
The local domain reports that AWOL domains are present. If the AWOL domains are down, unlink them from the respective domain so that the remaining IDN member domains can resume communication.
Do an hpost(1M) dump of all IDN member domains and save the BBRAM information for the boot processors, then do a complete bringup of all IDN member domains.
Do an hpost(1M) dump of all IDN member domains, then attempt to clear the record stop of all IDN member domains.
cbe_connected : enabled : 0 : 1 : actionsysclock
cbe_connected : enabled : 0 : 1 : actioncb
cbe_connected : enabled : 0 : 1 : PowerFailRebootact
This event represents the condition where the SSP's CBS daemon and the Control Board Executive (CBE) have lost their connection and the connection is being restored. Set the system clock (if necessary) and fan speed, then re-establish the control board heartbeat. All previously booted domains are checked for operating system and power status to determine whether a domain must be rebooted because of a power failure.
arbstop : enabled : 900 : 3 : Arbstopact -d %d
recordstop : enabled : 900 : 3 : Recordstopact -d %d
watchdog : enabled : 900 : 3 : WatchDogRebootact -d %d
environment_shutdown : enabled : 600 : 1 : Environmentact -d %d
obp_reset : enabled : 300 : 3 : ObpResetact -d %d
cb_power_on : enabled : 0 : 1 : PowerOnact -t %t -b %b
cb_power_off : enabled : 0 : 1 : PowerOffact -t %t -b %b
reboot : enabled : 300 : 3 : Rebootact -d %d
panic1 : enabled : 900 : 3 : Panicact -t 300 -d %d -e %e
panic2 : enabled : 900 : 3 : Panicact -t 900 -d %d -e %e
panic_reboot : enabled : 900 : 3 : PanicRebootact -d %d
heartbeat_failure : enabled : 900 : 3 : HeartBeatFailact -d %d
Do an hpost(1M) dump and do a complete bringup on the domain.
Do an hpost(1M) dump and attempt to clear the record stop.
Execute the following steps:
Dump resetinfo of all processors in the domain.
Dump the signature block of all the processors in the domain.
Do an hpost(1M) dump on the domain.
Reboot the domain by doing a complete bringup.
Log a message indicating that the system detected an environmental shutdown on a specific domain.
Log a message indicating that an OBP reset has occurred.
Log a message indicating that the power to a control board has been switched on.
Log a message indicating that the power to a control board has been switched off.
Carry out the user requested reboot by doing a quick bringup.
Sleep for the time specified by the -t option passed to Panicact, then do a quick bringup if the domain is still in a panic1 state.
Sleep for the time specified by the -t option passed to Panicact, then do a quick bringup if the domain is still in a panic2 state.
Reboot the system by doing a complete bringup.
Reboot the system by doing a complete bringup.
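The panic1 and panic2 handling (wait, then do a quick bringup only if the domain is still panicked) can be sketched as follows. This is a minimal sketch: handle_panic, get_state, and quick_bringup are hypothetical stand-ins for what Panicact does internally, and sleep is injectable so the logic can be exercised without waiting.

```python
import time
from typing import Callable


def handle_panic(domain: str, expected_state: str, wait_seconds: float,
                 get_state: Callable[[str], str],
                 quick_bringup: Callable[[str], None],
                 sleep: Callable[[float], None] = time.sleep) -> bool:
    """Wait, then do a quick bringup only if the domain is still panicked.

    Returns True if a bringup was started.
    """
    sleep(wait_seconds)  # e.g. 300 for panic1, 900 for panic2 (the -t values above)
    if get_state(domain) == expected_state:
        quick_bringup(domain)
        return True
    return False  # the domain recovered on its own; nothing to do
```

Re-checking the state after the wait avoids rebooting a domain that has already recovered, for example one that finished its own panic reboot.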
sys_brd_power_on : enabled : 0 : 1 : PowerOnact -t %t -b %b
sys_brd_power_off : enabled : 0 : 1 : PowerOffact -t %t -b %b
supp_brd_power_on : enabled : 0 : 1 : PowerOnact -t %t -b %b
supp_brd_power_off : enabled : 0 : 1 : PowerOffact -t %t -b %b
bulk_power_norm : enabled : 0 : 1 : BulkPowerNormact -d %d
bulk_power_fail : disabled : 300 : 1 : BulkPowerFailact -d %d
fan_norm : enabled : 0 : 1 : FanNormact -d %d
fan_fail : enabled : 180 : 1 : FanFailact -d %d
system_config_change : enabled : 0 : 1 : SystemConfChangeact -d %d
Log a message indicating that the system board has been powered on or off.
Log a message indicating that the centerplane support board has been powered on or off.
Log a message that the 48-volt power supply is on. (Note that 48-volt power is shown as bulk power in some system messages.)
Log a message indicating which 48-volt power supply has failed or is off, then determine if the system can continue operating with the current number of valid power supplies. If not, power down the entire system.
Log a message that a fan has gone from a failed state to an on or off state.
Log a message that a fan has gone from an on or off state to a failed state.
Log a message that a system board, centerplane support board, control board, fan tray, and/or 48-volt power supply has been removed from or inserted into the system.
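The 48-volt power-failure response (log each failed supply, then power down if too few valid supplies remain) can be sketched as below. The minimum_required parameter is hypothetical: this page does not state how many supplies the system needs. Note also that bulk_power_fail is disabled in the template shown above.

```python
from typing import Dict, List


def bulk_power_fail_steps(supply_ok: Dict[str, bool],
                          minimum_required: int) -> List[str]:
    """Log failed supplies, then power down if too few valid ones remain.

    `minimum_required` is a hypothetical threshold, not a documented value.
    """
    steps = [f"log: 48-volt supply {name} failed or off"
             for name, ok in supply_ok.items() if not ok]
    valid = sum(1 for ok in supply_ok.values() if ok)
    if valid < minimum_required:
        steps.append("power down entire system")
    return steps
```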
# Event Response Configuration File
#
centerplane_temp_warn : enabled : 300 : 1 : TempWarnact -b %b -e %e -d %d -t %t
centerplane_temp_warn : enabled : 300 : 1 : fans -hi
centerplane_temp_norm : enabled : 0 : 1 : fans -off
The first two entries (after the comment lines) in the above example of a global ERC file tell the system how it is to respond to an overtemperature warning on the centerplane. The first entry tells the system to invoke an action script named TempWarnact, passing it the board number of the board experiencing the event (%b), the event type (%e), the SNMP trap data (%d), and the board type (%t). The second entry tells it to turn on the fans at their high-speed setting; because entries for the same event run sequentially, this happens after TempWarnact completes.
The third entry tells the system to turn off the fans when it sees that the temperature of the centerplane is normal.
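To see which actions the example file runs for each event, the enabled entries can be grouped by event type in file order. This is a minimal sketch; actions_by_event is a hypothetical helper, and it relies on Python dictionaries preserving insertion order.

```python
from typing import Dict, List

EXAMPLE = """\
# Event Response Configuration File
#
centerplane_temp_warn : enabled : 300 : 1 : TempWarnact -b %b -e %e -d %d -t %t
centerplane_temp_warn : enabled : 300 : 1 : fans -hi
centerplane_temp_norm : enabled : 0 : 1 : fans -off
"""


def actions_by_event(text: str) -> Dict[str, List[str]]:
    """Map each event type to its enabled select_actions, in file order."""
    table: Dict[str, List[str]] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = [f.strip() for f in line.split(":", 4)]
        if fields[1] == "enabled":
            table.setdefault(fields[0], []).append(fields[4])
    return table
```

For this file, centerplane_temp_warn maps to TempWarnact followed by fans -hi, and centerplane_temp_norm maps to fans -off alone.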
The following files are supported:
Path to an instantiated global ERC file
Path to an instantiated ERC file for domain-specific events