CHAPTER 2
IPMI Server Management
Server manufacturers today have to reinvent how each new server manages itself. The hardware and software design for one server does not necessarily work with another. Every server supplier provides basic monitoring and data-collection functions, but no two suppliers implement them in exactly the same way. These proprietary manageability implementations only complicate the problem.
The standardization of server-based management, called Intelligent Platform Management Interface (IPMI), provides a solution. IPMI enables you to interconnect the CPU and devices being managed. It allows for:
IPMI is an industry-standard hardware-manageability interface specification. It defines an architecture through which diverse devices can all communicate with the CPU in a standard way. By providing a standard set of interfaces for monitoring and managing servers, it facilitates both platform-side server management and remote server-management frameworks.
With IPMI, the software becomes less dependent on hardware because the management intelligence resides in the IPMI firmware layer, thereby creating a more intelligently managed server. The IPMI solution increases server scalability by distributing the management intelligence closer to the devices that are being managed.
To perform autonomous platform-management functions, a dedicated management processor runs embedded software, or firmware. Together, this processor and its controlling firmware are referred to as the Baseboard Management Controller (BMC), which is the core of the IPMI structure. Tightly integrating an IPMI BMC and management software with platform firmware provides a complete management solution.
Note - Another way to perform IPMI queries and actions on the BMC is through the IPMI client utility IPMItool, which is used extensively in the testing process. For more information, see Lights Out Management.
The BMC is a service processor integrated into the motherboard design, providing a management solution independent of the main processor. The monitored server can communicate with the BMC through one of three defined interfaces, which are based on a set of registers shared between the platform and the BMC.
Note - In the Sun Fire V20z and Sun Fire V40z servers, the service processor (SP) has software that emulates a BMC.
The BMC provides the intelligence behind IPMI. In the Sun Fire V20z and Sun Fire V40z servers, the SP serves as the BMC, providing access to sensor data and events through the standard IPMI interfaces.
IPMI defines a mechanism for server monitoring and recovery implemented directly in hardware and firmware. IPMI functions are available independent of the main processors, BIOS, and operating system.
IPMI monitoring, logging, and access functions add a built-in level of manageability to the platform hardware. IPMI can be used in conjunction with server-management software running under the OS, which provides an enhanced level of manageability.
IPMI provides the foundation for smarter management of servers by providing a methodology for maintaining and improving the reliability, availability, and serviceability of expensive server hardware.
The following list details the main features of IPMI in the servers:
The BMC owns all sensors within the Sensor Data Record Repository (SDRR). The SDRR features include:
The servers support IPMI over both the SMS and LAN channels with SP software version 2.2 and later. These servers comply with IPMI version 2.0.
The SMS is implemented as a Keyboard Controller Style (KCS) interface.
The IPMI implementation on these servers also supports LAN channel access. (Refer to the IPMI specification, version 2.0, for details.) By default, LAN channel access is disabled. To enable it, use the ipmi enable channel command and specify the channel to enable (sms or lan), as follows:
# ssh spipaddr -l spuser ipmi enable channel {sms | lan}
As part of this command, you also specify the password for the default null user. The null user can then use IPMI over the LAN interface. For more information, see User Names and Passwords.
For more information about enabling or disabling the IPMI channel, refer to Appendix E.
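If you script this step, you can assemble the ssh invocation above as an argument list. The following minimal Python sketch does this. The helper names `enable_channel_argv` and `enable_channel` are hypothetical, and `spipaddr` and `spuser` are the same placeholders used in the command above. The sketch does not model how the SP collects the null-user password.

```python
import shlex
import subprocess

VALID_CHANNELS = ("sms", "lan")

def enable_channel_argv(spipaddr: str, spuser: str, channel: str) -> list:
    """Build argv for: ssh spipaddr -l spuser ipmi enable channel {sms | lan}."""
    if channel not in VALID_CHANNELS:
        raise ValueError(f"channel must be one of {VALID_CHANNELS}, got {channel!r}")
    return ["ssh", spipaddr, "-l", spuser, "ipmi", "enable", "channel", channel]

def enable_channel(spipaddr: str, spuser: str, channel: str) -> None:
    # Runs the SP command over ssh; the null-user password step is not modeled.
    subprocess.run(enable_channel_argv(spipaddr, spuser, channel), check=True)

if __name__ == "__main__":
    # Print the command instead of running it.
    print(shlex.join(enable_channel_argv("spipaddr", "spuser", "lan")))
```

Building a list rather than a single shell string avoids quoting problems when the address or user name comes from a variable.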
Operator-level and admin-level access over the LAN channel requires a valid user name and password. These servers are not preconfigured with user accounts enabled. When you initially enable the LAN channel through the command ipmi enable channel, you are required to provide the password for the null user. See IPMI Compliance and LAN Channel Access.
Note - For security reasons, the LAN channel access is disabled by default.
Note - IPMI user identities are in no way associated with user accounts defined for server-management capabilities. Refer to Initial Setup of the SP for more information about these server-management user accounts.
IPMI enables you to set a number of boot options for the BIOS to interpret. TABLE 2-1 describes the server boot options and parameters that the BIOS supports.
The IPMI system event log (SEL) is part of the BMC. Several types of information are logged to the SEL, from administrative messages to indications of important events, such as sensor-threshold crossings.
The size of the log is 16 KB. Because each SEL record is 16 bytes, the log holds 1024 records.
Sensors provide readings and generate events, and you can get and set their thresholds. The Sensor Data Record Repository (SDRR) contains several types of sensors.
You access all sensors through the BMC. Many sensors represent physical sensors that are distributed on the motherboard and contained within field-replaceable units (FRUs). These sensors are polled. When a reading crosses a threshold, an event is logged in the SEL.
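Each IPMI SEL entry occupies a fixed 16 bytes, so a 16K log holds 16384 / 16 = 1024 records. The following minimal Python sketch decodes one raw standard SEL event record (record type 0x02). It assumes the field layout from the IPMI specification. The names `SelEvent` and `decode_sel_record` are illustrative and are not part of IPMItool or the SP software.

```python
import struct
from dataclasses import dataclass

SEL_RECORD_SIZE = 16                              # bytes per SEL entry
SEL_SIZE_BYTES = 16 * 1024                        # 16K log on these servers
SEL_CAPACITY = SEL_SIZE_BYTES // SEL_RECORD_SIZE  # 1024 records

@dataclass
class SelEvent:
    record_id: int
    record_type: int      # 0x02 = standard system event record
    timestamp: int        # seconds since 1970, as kept by the BMC
    generator_id: int
    sensor_type: int
    sensor_number: int
    deassertion: bool     # bit 7 of the Event Dir/Type byte
    event_type: int       # bits 6:0 of the Event Dir/Type byte
    event_data: bytes     # Event Data 1..3

    @property
    def offset(self) -> int:
        # The low nibble of Event Data 1 holds the event offset.
        return self.event_data[0] & 0x0F

def decode_sel_record(raw: bytes) -> SelEvent:
    if len(raw) != SEL_RECORD_SIZE:
        raise ValueError(f"expected {SEL_RECORD_SIZE} bytes, got {len(raw)}")
    # Multi-byte fields are little-endian (least-significant byte first).
    (record_id, record_type, timestamp, generator_id, _evm_rev,
     sensor_type, sensor_number, dir_type) = struct.unpack("<HBIHBBBB", raw[:13])
    return SelEvent(record_id, record_type, timestamp, generator_id,
                    sensor_type, sensor_number,
                    deassertion=bool(dir_type & 0x80),
                    event_type=dir_type & 0x7F,
                    event_data=bytes(raw[13:16]))

if __name__ == "__main__":
    # A synthetic "log cleared" record: sensor type 0x10, offset 0x02.
    raw = struct.pack("<HBIHBBBB", 1, 0x02, 0, 0x20, 0x04, 0x10, 0x01, 0x6F)
    print(decode_sel_record(raw + bytes([0x02, 0xFF, 0xFF])))
```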
For more information on sensor commands, see Appendix G.
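The following small Python sketch illustrates the poll-and-log behavior described above. It assumes a single upper and lower threshold per sensor and reports an event only on the poll where a reading moves past a threshold. Real IPMI threshold sensors also have non-critical, critical, and non-recoverable levels plus hysteresis, which this sketch omits. The function names are hypothetical.

```python
from enum import Enum
from typing import List, Optional, Sequence, Tuple

class Crossing(Enum):
    UPPER_GOING_HIGH = "upper threshold, going high"
    LOWER_GOING_LOW = "lower threshold, going low"

def check_crossing(previous: float, current: float,
                   lower: float, upper: float) -> Optional[Crossing]:
    """Return a Crossing if this poll moved the reading past a threshold, else None."""
    if previous <= upper < current:
        return Crossing.UPPER_GOING_HIGH
    if previous >= lower > current:
        return Crossing.LOWER_GOING_LOW
    return None

def poll_events(readings: Sequence[float], lower: float,
                upper: float) -> List[Tuple[int, Crossing]]:
    """Return (poll index, crossing) pairs that would be logged to the SEL."""
    events = []
    for i in range(1, len(readings)):
        crossing = check_crossing(readings[i - 1], readings[i], lower, upper)
        if crossing is not None:
            events.append((i, crossing))
    return events
```

A reading that stays above the threshold on later polls does not generate repeated entries. Only the transition is reported.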
To determine the presence of a sensor, run the subcommand sensor get.
A sensor that is offline (not reporting) or physically not present in the system is indicated by state unavailable in the command response data.
To retrieve sensor thresholds, run the subcommand sensor get.
To set sensor thresholds, run the subcommand sensor set.
If you specify no thresholds, the result is no change and the return code is success.
TABLE 2-2 lists the completion codes that are returned by the subcommand sensor set.
Temperature sensor readings are defined within a range of 0 °C to 150 °C, a span of 150 °C. The CPU die temperature thermal trip occurs at approximately 140 °C.
Temperature sensors can generate the following SEL events:
Each DIMM has its own record, which is used only to log IPMI events.
For more information, refer to the section "Analyzing Events" in the Sun Fire V20z and Sun Fire V40z Servers--Troubleshooting Techniques and Diagnostics Guide.
All voltage sensor readings are indicated in volts (V). The largest voltage swing that is measured is 15V (the bulk voltage sensor ranges from 0V to 15V). Many of the voltage sensors have much lower maximums and smaller ranges. Voltage sensors can generate the following SEL events:
All fan-speed sensor readings are reported in revolutions per minute (RPM). The sensors have an upper bound of 15,000 RPM.
Fan sensors can generate the following SEL events:
All power sensor readings are indicated in watts (W) and are defined within a range of 0W to 600W.
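The documented reading ranges can be collected into a small lookup table, which also makes the temperature span explicit (150 - 0 = 150 °C). This Python sketch makes two assumptions: the fan lower bound is 0 RPM, and the 0 V to 15 V bulk-voltage range applies to every voltage sensor. In practice, as noted above, many voltage sensors have smaller ranges. The names are hypothetical.

```python
# Reading ranges documented for these servers: (low, high, unit).
# Assumptions: fans bottom out at 0 RPM, and the 0-15 V bulk-voltage
# range is used as the outer bound for every voltage sensor.
SENSOR_RANGES = {
    "temperature": (0.0, 150.0, "°C"),
    "voltage": (0.0, 15.0, "V"),
    "fan": (0.0, 15000.0, "RPM"),
    "power": (0.0, 600.0, "W"),
}

CPU_THERMAL_TRIP_C = 140.0  # approximate CPU die thermal trip

def span(kind: str) -> float:
    """Width of the documented range for a sensor kind."""
    low, high, _unit = SENSOR_RANGES[kind]
    return high - low

def in_documented_range(kind: str, value: float) -> bool:
    """True if value lies within the documented range (inclusive)."""
    low, high, _unit = SENSOR_RANGES[kind]
    return low <= value <= high
```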
One management-controller sensor represents the BMC. The management controller has the following capabilities:
The following additional sensors also are supported:
The system-event sensor indicates a variety of system events. However, no event conditions are reflected by the subcommand sensor get.
PEF actions: Pending actions matched against a platform event filter (PEF) are logged if the event sensor has been configured to do so. Only assertions of pending PEF action conditions are logged.
Sensor Type Code: 0x12 [System Event]; Sensor-Specific Offset: 0x04 [PEF Action]
Time sync: Time-sync events occur in pairs, one logged before and one after a SEL time sync.
Sensor Type Code: 0x12 [System Event]; Sensor-Specific Offset: 0x05 [Time sync]
The Event Logging Disabled sensor indicates certain SEL-related events. This sensor is represented as a type 2 (compact) SDR record.
SEL Full: When the SEL contains one record fewer than its maximum (maximum - 1), a SEL Full record is logged, and any subsequent Add SEL Entry commands return a limit-exceeded code. This record becomes the last record in the SEL when the log is filled to capacity.
Sensor Type Code: 0x10 [Event Logging Disabled]; Sensor-Specific Offset: 0x04 [Log Full]
SEL Clear: A record is written to the SEL whenever the command Clear SEL is executed. This occurs only on the command Clear SEL; it does not occur if you delete the last SEL entry with the command Delete SEL Entry.
Sensor Type Code: 0x10 [Event Logging Disabled]; Sensor-Specific Offset: 0x02 [Log Area Reset/Cleared]
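The SEL Full and SEL Clear rules can be modeled in a few lines of Python. This is a hypothetical in-memory model, not the SP implementation. The `add` method returns False where the BMC would return its limit-exceeded completion code (the numeric value is not modeled), and records are plain strings.

```python
class SelModel:
    """Hypothetical in-memory model of the SEL Full / SEL Clear behavior."""

    def __init__(self, capacity: int = 1024):
        if capacity < 3:
            raise ValueError("capacity too small to model SEL Full")
        self.capacity = capacity
        self.records: list = []

    def add(self, entry: str) -> bool:
        """Add an entry; False stands in for the limit-exceeded completion code."""
        if len(self.records) >= self.capacity:
            return False
        self.records.append(entry)
        if len(self.records) == self.capacity - 1:
            # At maximum-1 records, SEL Full (type 0x10, offset 0x04)
            # is logged and occupies the last slot.
            self.records.append("SEL Full")
        return True

    def clear(self) -> None:
        # Clear SEL empties the log, then logs SEL Clear
        # (type 0x10, offset 0x02 [Log Area Reset/Cleared]).
        self.records = ["SEL Clear"]

    def delete(self, index: int) -> None:
        # Delete SEL Entry never logs SEL Clear, even for the last entry.
        del self.records[index]
```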
The system-firmware progress sensor is an event-only sensor. The BIOS Boot Success SEL entry can be logged against this sensor when the BIOS has booted successfully and has attempted to pass control to the OS, or when the BIOS has booted and you enter a BIOS Setup screen.
Sensor Type Code: 0x0F [System Firmware Progress]; Sensor-Specific Offset: 0x02 [Firmware Progress]; Event Data 2: 0x13 [Starting operating system boot process]
The Watchdog 2 sensor is used to log watchdog timer expirations. These events are generated only for timers that do not have the "do not log" bit set. A timer-expiration event is logged when a watchdog timer expires.
Sensor Type Code: 0x23 [Watchdog 2]; Sensor-Specific Offset: all supported actions
Note - To ensure a graceful shutdown, the correct platform drivers must be installed on the server.
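The sensor type codes and offsets above can be collected into a lookup table for interpreting raw SEL entries. This Python sketch covers only the codes listed in this section. The function name `describe_event` is hypothetical. The sketch assumes, as in the IPMI specification, that the sensor-specific offset is the low nibble of Event Data 1.

```python
from typing import Optional

# (sensor type code, sensor-specific offset) -> meaning, from this section.
SENSOR_SPECIFIC_EVENTS = {
    (0x12, 0x04): "System Event: PEF Action",
    (0x12, 0x05): "System Event: Time sync",
    (0x10, 0x04): "Event Logging Disabled: Log Full",
    (0x10, 0x02): "Event Logging Disabled: Log Area Reset/Cleared",
    (0x0F, 0x02): "System Firmware Progress: Firmware Progress",
}
WATCHDOG2 = 0x23
BOOT_STARTING_OS = 0x13  # Event Data 2: starting operating system boot process

def describe_event(sensor_type: int, event_data1: int,
                   event_data2: Optional[int] = None) -> str:
    offset = event_data1 & 0x0F
    if sensor_type == WATCHDOG2:
        # All supported watchdog actions are logged against Watchdog 2.
        return f"Watchdog 2: offset 0x{offset:02X}"
    text = SENSOR_SPECIFIC_EVENTS.get((sensor_type, offset))
    if text is None:
        return f"unlisted: type 0x{sensor_type:02X}, offset 0x{offset:02X}"
    if (sensor_type, offset) == (0x0F, 0x02) and event_data2 == BOOT_STARTING_OS:
        text += " (Starting operating system boot process)"
    return text
```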
Platform Event Filtering (PEF) provides policy management that enables the BMC to act on particular events. The supported actions through PEF include:
TABLE 2-3 lists the event filters that are enabled by default.
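The following Python sketch shows how a filter match can turn an event into pending actions. It is a simplified, hypothetical model: each filter matches on sensor type and on a bit mask of event offsets. The use of 0xFF as a wildcard sensor type is borrowed from the IPMI PEF table format. The action names are placeholders, and the actual default filters are the ones listed in TABLE 2-3.

```python
from dataclasses import dataclass
from typing import List, Tuple

ANY_SENSOR_TYPE = 0xFF  # wildcard, borrowed from the IPMI PEF table format

@dataclass
class EventFilter:
    """Hypothetical, simplified PEF entry."""
    name: str
    sensor_type: int            # or ANY_SENSOR_TYPE
    offset_mask: int            # bit n set => matches event offset n
    actions: Tuple[str, ...]    # placeholder action names
    enabled: bool = True

    def matches(self, sensor_type: int, offset: int) -> bool:
        if not self.enabled:
            return False
        if self.sensor_type not in (ANY_SENSOR_TYPE, sensor_type):
            return False
        return bool(self.offset_mask & (1 << offset))

def pending_actions(filters: List[EventFilter], sensor_type: int,
                    offset: int) -> List[str]:
    """Collect actions from every matching filter, in order, without duplicates."""
    actions: List[str] = []
    for event_filter in filters:
        if event_filter.matches(sensor_type, offset):
            for action in event_filter.actions:
                if action not in actions:
                    actions.append(action)
    return actions
```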
A watchdog timer allows a selected action to occur when the timer expires.
For timer actions, pre-timeout interrupts are currently not supported. The following actions are supported:
When you use platform event trap (PET) LAN alerts, the number of alert destinations is limited to 16 (1 nonvolatile, 15 volatile). The number of alert policies is limited to 32.
Note - PET LAN alert acknowledgement and alert strings are not supported.
When event filters are matched, the following occurs:
You can configure alert policies so that an alert is not sent if the previous alert was sent successfully.
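The following Python sketch expresses the destination and policy limits, together with the option of not sending an alert when the previous one succeeded. The class, field, and function names are hypothetical, and `send` stands in for whatever actually delivers a PET LAN alert.

```python
from dataclasses import dataclass
from typing import Callable, List

MAX_NONVOLATILE_DESTINATIONS = 1
MAX_VOLATILE_DESTINATIONS = 15
MAX_DESTINATIONS = MAX_NONVOLATILE_DESTINATIONS + MAX_VOLATILE_DESTINATIONS  # 16
MAX_POLICIES = 32

def check_limits(nonvolatile: int, volatile: int, policies: int) -> None:
    """Raise ValueError if a configuration exceeds the documented limits."""
    if nonvolatile > MAX_NONVOLATILE_DESTINATIONS:
        raise ValueError("at most 1 nonvolatile alert destination")
    if volatile > MAX_VOLATILE_DESTINATIONS:
        raise ValueError("at most 15 volatile alert destinations")
    if policies > MAX_POLICIES:
        raise ValueError("at most 32 alert policies")

@dataclass
class PolicyEntry:
    destination: int
    skip_if_previous_succeeded: bool = False  # hypothetical field name

def run_alert_policy(entries: List[PolicyEntry],
                     send: Callable[[int], bool]) -> List[int]:
    """Return the destinations that actually received an alert."""
    delivered = []
    previous_ok = False
    for entry in entries:
        if entry.skip_if_previous_succeeded and previous_ok:
            continue  # previous alert succeeded, so this one is not sent
        previous_ok = send(entry.destination)
        if previous_ok:
            delivered.append(entry.destination)
    return delivered
```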
On these servers, Lights Out Management (LOM) is performed through IPMItool, a utility for controlling IPMI-enabled devices.
IPMItool is a simple command-line interface (CLI) to IPMI-enabled servers. It supports the Intelligent Platform Management Interface (IPMI) v1.5 specification and provides the ability to:
Originally written to take advantage of IPMI-over-LAN interfaces, IPMItool is also capable of using a system interface, as provided by a kernel device driver such as OpenIPMI.
IPMItool: http://ipmitool.sourceforge.net/
IPMI specification: http://www.intel.com/design/servers/ipmi/spec.htm
OpenIPMI: http://openipmi.sourceforge.net/
The syntax used by IPMItool is as follows:
ipmitool [-ghcvV] -I lan -H address [-P password] expression
ipmitool [-ghcvV] -I open expression
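When you call IPMItool from a script, the two syntax forms above map onto argument lists. The following Python sketch builds them using only the -I, -H, and -P options shown in the syntax. The function name `ipmitool_argv` is hypothetical.

```python
import shlex
from typing import List, Optional

def ipmitool_argv(expression: str, interface: str = "open",
                  host: Optional[str] = None,
                  password: Optional[str] = None) -> List[str]:
    """Build argv for 'ipmitool -I open expression' or
    'ipmitool -I lan -H address [-P password] expression'."""
    argv = ["ipmitool", "-I", interface]
    if interface == "lan":
        if not host:
            raise ValueError("the LAN interface needs a host (-H address)")
        argv += ["-H", host]
        if password is not None:
            argv += ["-P", password]
    elif interface != "open":
        raise ValueError(f"unsupported interface: {interface!r}")
    return argv + shlex.split(expression)
```

Omitting the password on the LAN form mirrors the optional `[-P password]` in the syntax line.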
TABLE 2-4 lists the options available for IPMItool.
TABLE 2-5 lists the expressions and parameters available for IPMItool.
Note - Each command begins with ipmitool, followed by the expression and its parameters.
Note - The sol command is not supported in these servers, but you can enable a serial-over-LAN feature. See Serial-Over-LAN.
The IPMItool application uses a modified MontaVista OpenIPMI kernel device driver that is provided on the Sun Fire V20z and Sun Fire V40z Servers Documentation and Support Files CD. The driver has been modified to use an alternate base hardware address and a modified device I/O registration.
This driver must be compiled and installed from the Documentation and Support Files CD.
The following kernel modules must be loaded in order for IPMItool to work:
The message handler for incoming and outgoing messages for the IPMI interfaces.
An IPMI Keyboard Controller Style (KCS) interface driver for the message handler.
Linux character device interface for the message handler.
To force IPMItool to use the device interface, you can specify it on the command line:
# ipmitool -I open [option...]
To install and compile this kernel device driver, see Initial Setup of the SP.
The IPMItool LAN interface communicates with the BMC over an Ethernet LAN connection using User Datagram Protocol (UDP) under IPv4. UDP datagrams are formatted to contain IPMI request/response messages with IPMI session headers and Remote Management Control Protocol (RMCP) headers.
Remote Management Control Protocol (RMCP) is a request-response protocol delivered using UDP datagrams to port 623. IPMI-over-LAN uses version 1 of RMCP to support management both before the OS is installed on the server and on servers that will not have an OS installed.
The LAN interface is an authenticated, multisession connection. Messages delivered to the BMC can (and should) be authenticated with a challenge/response protocol that uses either a straight password/key or the MD5 message-digest algorithm. IPMItool attempts to connect at the administrator privilege level, because this level is required to perform chassis power functions.
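To make the framing concrete, the following Python sketch builds the 4-byte RMCP header and an IPMI v1.5 session header. It assumes the field layout from the IPMI specification: RMCP version 0x06, sequence number 0xFF (no RMCP acknowledgement requested), and message class 0x07 for IPMI. The sketch only constructs bytes and does not send the datagram to UDP port 623. The function names are hypothetical.

```python
import struct

RMCP_PORT = 623
RMCP_VERSION_1_0 = 0x06
RMCP_SEQ_NO_ACK = 0xFF
RMCP_CLASS_IPMI = 0x07
AUTH_NONE = 0x00

def rmcp_header() -> bytes:
    # version, reserved, sequence number, class of message
    return struct.pack("BBBB", RMCP_VERSION_1_0, 0x00,
                       RMCP_SEQ_NO_ACK, RMCP_CLASS_IPMI)

def session_header_v15(auth_type: int, session_seq: int, session_id: int,
                       auth_code: bytes, msg_len: int) -> bytes:
    """IPMI v1.5 session header; multi-byte fields are little-endian."""
    header = struct.pack("<BII", auth_type, session_seq, session_id)
    if auth_type != AUTH_NONE:
        if len(auth_code) != 16:
            raise ValueError("authenticated sessions carry a 16-byte auth code")
        header += auth_code
    return header + struct.pack("B", msg_len)

def lan_datagram(auth_type: int, session_seq: int, session_id: int,
                 auth_code: bytes, message: bytes) -> bytes:
    """RMCP header + session header + IPMI message, ready for UDP port 623."""
    return (rmcp_header()
            + session_header_v15(auth_type, session_seq, session_id,
                                 auth_code, len(message))
            + message)
```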
With the -I option, you can direct IPMItool to use the LAN interface:
# ipmitool -I lan -H address [-P password] expression
To use the LAN interface with IPMItool, you must provide a host name on the command line.
The password field is optional. If you do not provide a password on the command line, IPMItool attempts to connect without authentication. If you specify a password, IPMItool uses MD5 authentication if the BMC supports it; otherwise, it uses straight password/key authentication.
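The following Python sketch illustrates the authentication choice described above and the IPMI v1.5 MD5 auth code it relies on. It assumes the IPMI v1.5 definitions:
- Authentication-type codes are 0x00 (none), 0x02 (MD5), and 0x04 (straight password/key).
- The supported-types bit mask sets bit n for type n.
- The auth code is MD5 over password, session ID, message, session sequence number, and password again, with the password zero-padded to 16 bytes.

The function names are hypothetical, and this is not IPMItool's source.

```python
import hashlib
import struct
from typing import Optional

AUTH_NONE, AUTH_MD5, AUTH_PASSWORD = 0x00, 0x02, 0x04

def choose_auth_type(supported_mask: int, password: Optional[str]) -> int:
    """No password -> no authentication; else MD5 if supported, else password/key."""
    if password is None:
        return AUTH_NONE
    if supported_mask & (1 << AUTH_MD5):
        return AUTH_MD5
    if supported_mask & (1 << AUTH_PASSWORD):
        return AUTH_PASSWORD
    raise ValueError("BMC offers neither MD5 nor straight password/key")

def md5_auth_code(password: bytes, session_id: int, message: bytes,
                  session_seq: int) -> bytes:
    """IPMI v1.5 MD5 auth code; session ID and sequence are little-endian."""
    key = password.ljust(16, b"\x00")[:16]
    digest = hashlib.md5()
    digest.update(key)
    digest.update(struct.pack("<I", session_id))
    digest.update(message)
    digest.update(struct.pack("<I", session_seq))
    digest.update(key)
    return digest.digest()
```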
The file /dev/ipmi0 is a character-device file used by the OpenIPMI kernel driver.
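Before using -I open, a script can check that the driver has created this device. The following is a minimal Python sketch; the function name `ipmi_device_present` is hypothetical.

```python
import os
import stat

def ipmi_device_present(path: str = "/dev/ipmi0") -> bool:
    """True if path exists and is a character device, as /dev/ipmi0 should be."""
    try:
        mode = os.stat(path).st_mode
    except OSError:
        return False  # missing file, or no permission to stat it
    return stat.S_ISCHR(mode)
```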
If you want to remotely control the power of an IPMI-over-LAN-enabled server, you can use the following commands:
# ipmitool -I lan -H spipaddr -P sppasswd chassis power on
Chassis Power Control: Up/On
# ipmitool -I lan -H spipaddr -P sppasswd chassis power status
Chassis Power is on
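In a script, you can turn the status output above into a boolean. This Python sketch assumes IPMItool prints the status line in the form shown ("Chassis Power is on" or "Chassis Power is off"). The function names are hypothetical, and `spipaddr` and `sppasswd` remain placeholders.

```python
import subprocess

PREFIX = "Chassis Power is "

def parse_power_status(output: str) -> bool:
    """Return True for 'on', False for 'off'; reject anything else."""
    for line in output.splitlines():
        line = line.strip()
        if line.startswith(PREFIX):
            state = line[len(PREFIX):].strip().lower()
            if state in ("on", "off"):
                return state == "on"
    raise ValueError(f"unrecognized power status output: {output!r}")

def power_is_on(spipaddr: str, sppasswd: str) -> bool:
    """Run the status command shown above and parse its output."""
    result = subprocess.run(
        ["ipmitool", "-I", "lan", "-H", spipaddr, "-P", sppasswd,
         "chassis", "power", "status"],
        capture_output=True, text=True, check=True)
    return parse_power_status(result.stdout)
```

Raising on unexpected output, rather than defaulting to "off", keeps a script from acting on a status it did not actually read.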
To view the system event log (SEL) over the LAN interface, use the following IPMItool command:
# ipmitool -I lan -H spipaddr -P ipmipasswd sel list
The in-band command (using OpenIPMI on a Linux software-based server or LIPMI on a Solaris software-based server) is:
# ipmitool -I open sel list
Note - To receive more verbose logging messages, you can run the following command:
You can use commands to clear the contents of the IPMI SEL.
Use one of the following commands, depending on your OS:
TABLE 2-6 describes some potential issues with IPMI and provides solutions.
Copyright © 2004-2007, Sun Microsystems, Inc. All Rights Reserved.