CHAPTER 2
Environmental Monitoring
The Netra CP2500 board uses an intelligent fault detection environmental monitoring system that increases uptime and manageability of the board. The system management controller (SMC) module on the Netra CP2500 supports the temperature and voltage environmental monitoring functions. This chapter describes the specific environmental monitoring functions of the Netra CP2500.
This chapter includes the following sections:
TABLE 2-1 lists the compatible environmental monitoring hardware, OpenBoot PROM, and Solaris OS for the Netra CP2500.
FIGURE 2-1 illustrates the Netra CP2500 environmental monitoring application block diagram. For locations of the temperature sensors, see FIGURE 2-2.
The Netra CP2500 monitors its CPU diode temperature and issues warnings at both the OpenBoot PROM and Solaris OS levels when the reading is out of limits. At the Solaris OS level, the application program monitors the board and issues warnings. At the OpenBoot PROM level, the CPU diode temperature is monitored.
This section describes a typical environmental monitoring cycle from power up to shutdown.
The OpenBoot PROM monitors the CPU diode temperature at a fixed polling interval of 10 seconds and displays a warning message on the default output device whenever the measured temperature exceeds the preprogrammed warning temperature or critical temperature. The SMC sets default values for these thresholds, and they cannot be changed for OpenBoot PROM-level monitoring.
OpenBoot PROM-level protection is always enabled and cannot be disabled. If the board temperature exceeds the shutdown temperature, the SMC shuts down power to the Netra CP2500 CPU. The OpenBoot PROM sends a warning or critical temperature message to tell the user that the Netra CP2500 is overheating.
Monitoring changes in the sensor temperatures is useful for diagnosing problems with the room where the system is installed, functional problems with the system, or problems on the board. Baseline temperatures established early in deployment and operation can be used to trigger alarms if the sensor readings later increase or decrease dramatically. If all the sensors go to room ambient, power has probably been lost to the host system. If one or more sensors rise in temperature substantially, there might be a system fan malfunction, the system cooling might have been compromised, or room air conditioning might have failed.
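The baseline comparison described above can be sketched as follows. This is a minimal illustration, not part of the Netra CP2500 software: the function name, the sensor names, and the default limits (`rise_limit_c`, `ambient_margin_c`) are assumptions chosen for the example, and suitable values depend on the chassis and installation.

```python
from typing import Dict, List


def check_against_baseline(
    readings: Dict[str, float],
    baseline: Dict[str, float],
    room_ambient_c: float,
    rise_limit_c: float = 10.0,      # assumed alarm limit, not a documented value
    ambient_margin_c: float = 2.0,   # assumed tolerance around room ambient
) -> List[str]:
    """Compare one set of sensor readings (deg C) with recorded baselines."""
    findings: List[str] = []

    # If every sensor sits at room ambient, the host has probably lost power.
    if readings and all(
        abs(temp - room_ambient_c) <= ambient_margin_c
        for temp in readings.values()
    ):
        findings.append("all sensors near room ambient: possible loss of power")
        return findings

    # A substantial rise on any sensor points at fans, cooling, or room A/C.
    for name, temp in readings.items():
        rise = temp - baseline[name]
        if rise >= rise_limit_c:
            findings.append(
                f"{name} is {rise:.1f} C above baseline: check fans and cooling"
            )
    return findings
```

A caller would record `baseline` once after installation and then pass each new set of readings to the function, raising an alarm whenever the returned list is not empty.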
Protection at the operating system level takes place when the PICL environmental monitoring program (envmond) is running. The environmental monitoring program is part of a UNIX daemon that runs automatically when the Solaris OS boots up.
In a typical environmental monitoring application program, the software reads the CPU, inlet, and exhaust temperature sensors once every polling cycle. The program then compares the measured CPU diode temperature with the warning temperature and displays a warning message on the default output device whenever the warning temperature is exceeded.
The program can also issue a shutdown message on the default output device whenever the measured CPU diode temperature exceeds the shutdown temperature. In addition, the envmond application program can be programmed to sync and shut down the Solaris OS when conditions warrant.
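The polling cycle described above can be sketched as follows. This is an illustrative model only, not the envmond implementation: `read_cpu_temp_c`, `on_shutdown`, and `report` are hypothetical callables supplied by the caller, and the polling interval is a parameter because the text does not state one for the Solaris level.

```python
import time
from typing import Callable, Optional


def monitor_cpu_temperature(
    read_cpu_temp_c: Callable[[], float],   # hypothetical sensor read function
    warning_c: float,
    shutdown_c: float,
    poll_seconds: float,
    on_shutdown: Callable[[], None],        # hypothetical sync-and-shutdown hook
    max_cycles: Optional[int] = None,       # None means poll until shutdown
    report: Callable[[str], None] = print,  # stands in for the output device
) -> None:
    """Poll the CPU diode temperature and act on the two thresholds."""
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        temp = read_cpu_temp_c()
        if temp > shutdown_c:
            # Shutdown threshold exceeded: report, run the hook, stop polling.
            report(f"SHUTDOWN: CPU diode at {temp} C exceeds {shutdown_c} C")
            on_shutdown()
            return
        if temp > warning_c:
            report(f"WARNING: CPU diode at {temp} C exceeds {warning_c} C")
        cycles += 1
        time.sleep(poll_seconds)
```

Passing the read function and the shutdown action as arguments keeps the loop independent of any particular sensor interface, which also makes it easy to exercise with canned readings.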
Refer to Sample Application Program for an example of how a simple envmond program can be implemented.
The power module is controlled by the SMC subsystem, except for automatic controls such as overcurrent shutdown or voltage regulation. The functions the SMC controls are the core voltage output level and power sequencing and monitoring.
The on-board voltage controller is a hardware function that is not controlled by either firmware or software. At the OpenBoot PROM level, if the board temperature exceeds the shutdown temperature, the SMC will shut down power to the Netra CP2500 CPU.
There is no mechanism for the Solaris OS to either recover or restore power to the Netra CP2500 when an unusual condition occurs, for example, if the CPU diode temperature exceeds its maximum recommended level. In such a case, the end user must intervene and manually recover the Netra CP2500, as well as the system, through hardware control. Once a shutdown has occurred, you can recover the board by sending a cold-reset IPMI command to the SMC or by extracting and reinserting the board.
This section summarizes the hardware environmental monitoring features on the Netra CP2500 board. TABLE 2-2 lists the environmental monitoring functions on a Netra CP2500 board.
TABLE 2-3 shows the I2C components.
FIGURE 2-2 shows the location of the environmental monitoring hardware on the Netra CP2500.
FIGURE 2-3 is a block diagram of the environmental monitoring functions.
The on-board voltage controller allows power to the CPU of the Netra CP2500 only when the following conditions are met:
The controller requires these conditions to be true for at least 100 milliseconds to help ensure the supply voltages are stable. If any of these conditions become untrue, the voltage monitoring circuit shuts down the CPU power of the board.
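The 100-millisecond hold requirement can be modeled in software for simulation or test purposes. The on-board controller is a hardware function, so the following sketch only illustrates the timing rule; the function name and the sampled `(time_ms, all_conditions_ok)` input format are assumptions made for the example.

```python
from typing import Iterable, List, Optional, Tuple

STABLE_MS = 100  # hold time stated in the text


def cpu_power_states(
    samples: Iterable[Tuple[int, bool]],
) -> List[Tuple[int, bool]]:
    """Model the hold-time rule over timestamped samples.

    Each sample is (time_ms, all_conditions_ok). The result pairs each
    timestamp with whether CPU power would be allowed at that moment.
    """
    states: List[Tuple[int, bool]] = []
    ok_since: Optional[int] = None  # start of the current all-true run
    for time_ms, all_ok in samples:
        if not all_ok:
            # Any condition going untrue removes CPU power immediately.
            ok_since = None
        elif ok_since is None:
            ok_since = time_ms
        powered = ok_since is not None and time_ms - ok_since >= STABLE_MS
        states.append((time_ms, powered))
    return states
```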
The CPU diode sensor reading may vary from slot to slot and from board to board in a system, and is dependent primarily on system cooling. As an example, a system might have sensor readings for the CPU diode from 35°C to 49°C with an ambient inlet of 21°C across many boards, with a variety of configurations and positions within a chassis. Care must be taken when setting the alarm and shutdown temperatures based on the CPU diode sensor value. This sensor typically is linear across the operating range of the board.
The exhaust sensor measures the local air temperature at the trailing edge of the board for systems with bottom-to-top airflow. This value depends on the character and volume of the airflow across the board. Typical values in a chassis may range from 0°C to 12°C above inlet ambient, depending on the power dissipation of the board configuration and the position in the chassis. The exhaust sensor is nonlinear with respect to ambient inlet temperature.
The inlet sensor measures the local air temperature at the leading edge of the board on the solder side under the solder-side cover. This value typically can range from a reading of 0°C to 13°C above inlet system ambient in a chassis. Care must be taken to understand the application and installation of the board to use this temperature sensor.
A sudden drop of all temperature sensors to near room ambient temperature can mean loss of power to one or more Netra CP2500 boards.
A gradual increase in the delta temperature from inlet to outlet can be due to dust clogging system filters. This feature can be used to set service levels for filter cleaning or changing.
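One way to turn the inlet-to-outlet delta into a service indicator is to compare a recent average delta with the delta recorded when the filters were clean. The sketch below assumes paired inlet and exhaust readings taken at the same times; the function name, the `window` size, and the `allowed_increase_c` limit are illustrative assumptions rather than documented values.

```python
from typing import Sequence


def filter_service_due(
    inlet_c: Sequence[float],
    exhaust_c: Sequence[float],
    clean_delta_c: float,        # delta measured with clean filters
    allowed_increase_c: float,   # assumed service limit for the delta growth
    window: int = 24,            # assumed number of recent samples to average
) -> bool:
    """Return True when the average recent delta has grown past the limit."""
    deltas = [exhaust - inlet for inlet, exhaust in zip(inlet_c, exhaust_c)]
    recent = deltas[-window:]
    if not recent:
        return False
    # Averaging smooths out short-term swings so only a gradual rise trips it.
    average_delta = sum(recent) / len(recent)
    return average_delta - clean_delta_c > allowed_increase_c
```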
The CPU diode temperature can be used to prevent damage to the board by shutting the board down if this sensor exceeds predetermined limits.
The Netra CP2500 uses the environmental monitoring detection system to monitor the temperature of the board. The environmental monitoring system will display messages if the board temperature exceeds the warning and critical settings. Because the on-board sensors may report different temperature readings for different system configurations and airflows, you might want to adjust the warning, critical, and shutdown temperature parameter settings.
The Netra CP2500 determines the board temperature by retrieving temperature data from sensors located on the board. A board sensor reads the temperature of the immediate area around the sensor. Although the software might appear to report the temperature of a specific hardware component, the software is actually reporting the temperature of the area near the sensor. For example, the CPU diode sensor reads the temperature at the location of the sensor and not on the actual CPU heat sink. The board's OpenBoot PROM collects the temperature readings from each board sensor at regular intervals. You can display these temperature readings using the show-sensors OpenBoot PROM command. See Using the show-sensors Command at the OpenBoot PROM.
The temperature read by the CPU sensor will trigger OpenBoot PROM warning and critical messages. When the CPU sensor reads a temperature greater than the warning parameter setting, the OpenBoot PROM will display a warning message. When the sensor reads a temperature greater than the shutdown setting, the SMC will shut down the board.
Many factors affect the temperature readings of the sensors, including the airflow through the system, the ambient temperature of the room, and the system configuration. These factors might contribute to the sensors reporting different temperature readings than expected.
The Netra CP2500 board CPU sensor default temperature threshold values are 110°C for the high warning temperature, 118°C for the high shutdown temperature, and 123°C for the high power-off temperature.
Note - If you have developed an application that uses the environmental monitoring software to monitor the temperature sensors, you may want to adjust your application's settings accordingly.
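An application that adjusts these settings might keep the three thresholds together and check that they stay in ascending order. The following sketch uses the default values listed above; the function name and the returned labels are illustrative assumptions and do not correspond to any documented interface.

```python
# Default CPU sensor thresholds listed in this section, in degrees Celsius.
WARNING_C = 110
SHUTDOWN_C = 118
POWER_OFF_C = 123


def classify_cpu_temperature(
    temp_c: float,
    warning_c: float = WARNING_C,
    shutdown_c: float = SHUTDOWN_C,
    power_off_c: float = POWER_OFF_C,
) -> str:
    """Return 'ok', 'warning', 'shutdown', or 'power-off' (hypothetical labels)."""
    # Adjusted settings only make sense if the thresholds stay in order.
    if not warning_c < shutdown_c < power_off_c:
        raise ValueError("thresholds must satisfy warning < shutdown < power-off")
    # A threshold is crossed only when the reading is greater than the setting.
    if temp_c > power_off_c:
        return "power-off"
    if temp_c > shutdown_c:
        return "shutdown"
    if temp_c > warning_c:
        return "warning"
    return "ok"
```

With the defaults, a reading of 111 is classified as a warning and a reading of 119 as a shutdown condition.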
This section describes the OpenBoot PROM environmental monitoring of the CPU.
When the CPU diode temperature reaches the warning temperature, a message similar to the following is displayed at the ok prompt at regular intervals:
Temperature sensor #2 has threshold event of
<<< WARNING!!! Upper Non-critical - going high >>>
The current threshold setting is : 110
The current temperature is : 111
When the CPU diode temperature reaches the critical temperature, a message similar to the following is displayed at the ok prompt at regular intervals:
Temperature sensor #2 has threshold event of
<<< ALERT!!! Upper Critical - going high >>>
The current threshold setting is : 118
The current temperature is : 119
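If console output is captured to a log, messages of this form can be extracted with a regular expression. The sketch below is based only on the two sample messages shown above, so the pattern might need adjusting for other firmware output; the `ThresholdEvent` type and the `parse_threshold_event` function are names invented for this example.

```python
import re
from typing import NamedTuple, Optional


class ThresholdEvent(NamedTuple):
    sensor: int        # sensor number after '#'
    severity: str      # 'WARNING' or 'ALERT'
    event: str         # for example 'Upper Critical - going high'
    threshold: int     # current threshold setting
    temperature: int   # current temperature


# Whitespace between fields is matched loosely so the message parses whether
# it arrives on one line or split across several.
_PATTERN = re.compile(
    r"Temperature sensor #(\d+) has threshold event of\s*"
    r"<<<\s*(WARNING|ALERT)!!!\s*(.+?)\s*>>>\s*"
    r"The current threshold setting is\s*:\s*(-?\d+)\s*"
    r"The current temperature is\s*:\s*(-?\d+)"
)


def parse_threshold_event(text: str) -> Optional[ThresholdEvent]:
    """Return the first threshold event found in text, or None."""
    match = _PATTERN.search(text)
    if match is None:
        return None
    return ThresholdEvent(
        sensor=int(match.group(1)),
        severity=match.group(2),
        event=match.group(3),
        threshold=int(match.group(4)),
        temperature=int(match.group(5)),
    )
```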
The show-sensors command at OpenBoot PROM displays the readings of all the temperature sensors on the board. A sample output for typical sensor readings for a Netra CP2500 is as follows:
The following sections describe how to use the environmental monitoring functions in an application program.
For the environmental monitoring application program (envmond) to monitor the hardware environment, the following conditions must be met:
The environmental monitoring parameter values in the application program apply when the system is running at the Solaris level and do not necessarily have to be the same as the default settings programmed by the SMC and used by the OpenBoot PROM. The OpenBoot PROM environmental monitoring only applies when the system is running at the OpenBoot PROM level.
Temperature sensor states may be read using the libpicl API. The following properties are supported in a PICL temperature sensor class node:
The PICL plug-in receives these sensor events and updates the State property based on the information extracted from the IPMI message. It then posts a PICL event.
Threshold levels of the PICL node class temperature sensor are:
To obtain a reading of temperature sensor states, use the prtpicl -v command:
Sample PICL output of temperature sensors on a Netra CT system is as follows.
On the Netra CP2500, you can enable or disable sensors, and configure sensor threshold actions, such as shutdown and reboot, by editing the /etc/picl/config/envmond.conf file.
Sample entries in the envmond.conf file are:
The PICL envmond plug-in opens an SMC driver stream and requests sensor events. The SMC monitors the sensors and, when it detects a change at a sensor that meets one of the specified thresholds, generates an event to the local Solaris software. The SMC driver captures this event as an IPMI message and sends it on an open STREAM that has requested sensor events. The PICL plug-in receives the event, updates the State property based on the information it extracts from the IPMI message, and posts a PICL event.
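The last step of this flow, updating a state from an event and then posting a notification, can be modeled as follows. This is a simplified illustration and not the plug-in's code. The offsets are the upper-threshold "going high" values defined for IPMI threshold events, carried in the low four bits of the first event data byte; the `SensorNode` class, the state labels, and the `post_event` callback are hypothetical.

```python
from typing import Callable, Dict

# IPMI threshold event offsets (low nibble of event data byte 1).
UPPER_NONCRITICAL_GOING_HIGH = 0x07
UPPER_CRITICAL_GOING_HIGH = 0x09
UPPER_NONRECOVERABLE_GOING_HIGH = 0x0B

# Hypothetical state labels; the real PICL State values are not assumed here.
_STATE_BY_OFFSET: Dict[int, str] = {
    UPPER_NONCRITICAL_GOING_HIGH: "warning",
    UPPER_CRITICAL_GOING_HIGH: "critical",
    UPPER_NONRECOVERABLE_GOING_HIGH: "shutdown",
}


class SensorNode:
    """Toy stand-in for a temperature sensor node with a State property."""

    def __init__(self, name: str, post_event: Callable[[str, str], None]) -> None:
        self.name = name
        self.state = "ok"
        self._post_event = post_event

    def handle_sensor_event(self, event_data1: int) -> None:
        """Update the state from one event and post it if the state changed."""
        offset = event_data1 & 0x0F
        new_state = _STATE_BY_OFFSET.get(offset)
        if new_state is None or new_state == self.state:
            return  # offset not modeled here, or no change to report
        self.state = new_state
        self._post_event(self.name, new_state)
```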
This section presents a sample environmental monitoring (envmond) application that monitors the CPU diode temperature.
You can access the CPU temperature sensor current readings and environmental monitoring settings from the Solaris prompt by typing the following commands. Sample output is listed after each command.
TABLE 2-5 shows which Solaris commands correspond to the environmental monitoring warning that runs when the CPU temperature exceeds the set limit.
Copyright © 2007, Sun Microsystems, Inc. All Rights Reserved.