CHAPTER 2

Oracle ILOM Platform Features for the Sun Fire X4470 Server

Oracle ILOM 3.0 runs on many server platforms and supports a set of features common to all of them. Some Oracle ILOM 3.0 features are available only on a subset of platforms. This chapter describes the features that are specific to Oracle’s Sun Fire X4470 Server.

For detailed information about Oracle ILOM features that are common to all server platforms, see the Oracle Integrated Lights Out Manager (ILOM) 3.0 Documentation Collection, as described in Oracle ILOM 3.0 Common Feature Set Documentation Collection.

Oracle ILOM features discussed in this chapter, which are specific to the Sun Fire X4470 Server, are as follows:


Supported Sun Fire X4470 Server Firmware

TABLE 2-1 identifies the Oracle ILOM and BIOS firmware versions supported on the Sun Fire X4470 Server.


TABLE 2-1 Supported Platform Firmware

Software Release    Oracle ILOM SP Firmware    BIOS Firmware
1.0                 3.0.9.10                   9.1.25.11
1.1                 3.0.9.25                   9.2.1.15
1.2.1               3.0.14.10a                 9.3.1.15


For information about how to update the firmware on your server, refer to the Oracle ILOM 3.0 Common Feature Set Documentation Collection at:

http://www.oracle.com/pls/topic/lookup?ctx=E19860-01&id=homepage


Hardware Management Pack for Single Server Management

The Sun Server Hardware Management Pack (Hardware Management Pack) from Oracle provides tools to help you manage and configure your Oracle servers from the host operating system. To use these tools, you must install the Hardware Management Pack software on your server. After installing it, you can perform the server management tasks described in TABLE 2-2.


TABLE 2-2 Hardware Management Pack - Server Management Tasks

Server Management Task From Host OS*

Hardware Management Pack Implementation

Tool

Monitor Oracle hardware with host IP address

Use the Hardware Management Agent and the associated Simple Network Management Protocol (SNMP) Plug-ins at the operating-system level to enable in-band monitoring of your Oracle hardware. This in-band monitoring functionality enables you to use your host operating system IP address to monitor your Oracle servers without connecting the Oracle ILOM management port to your network.

Host OS-level management tool

Monitor storage devices, including RAID arrays

Use the Server Storage Management Agent at the operating-system level to enable in-band monitoring of the storage devices configured on your Oracle servers. The Server Storage Management Agent provides an operating-system daemon that gathers information about your server’s storage devices such as hard disk drives (HDDs) and RAID arrays, and sends this information to the Oracle ILOM service processor. The Storage Monitoring features in Oracle ILOM enable you to view and monitor the information provided by the Server Storage Management Agent. You can access the Storage Monitoring features in Oracle ILOM from the command-line interface (CLI).

Oracle ILOM 3.0 CLI Storage Monitoring features

Configure BIOS CMOS settings, device boot order, and some SP settings

Use the biosconfig CLI tool from the host operating system to configure your Oracle x86 server’s BIOS CMOS settings, device boot order, and some service processor (SP) settings.

Host OS-level biosconfig CLI

Query, update, and validate firmware versions on supported SAS storage devices

Use the fwupdate CLI tool from the host operating system to query, update, and validate firmware versions on supported storage devices such as SAS host bus adapters (HBAs), embedded SAS storage controllers, LSI SAS storage expanders, and disk drives.

Host OS-level fwupdate CLI

Restore, set, and view Oracle ILOM configuration settings

Use the ilomconfig CLI tool from the host operating system to restore Oracle ILOM configuration settings, as well as to view and set Oracle ILOM properties that are associated with network management, clock configuration, and user management.

Host OS-level ilomconfig CLI

View or create RAID volumes on storage drives

Use the raidconfig CLI tool from the host operating system to view and create RAID volumes on storage drives that are attached to RAID controllers, including storage arrays.

Host OS-level raidconfig CLI

Use IPMItool to access and manage Oracle servers

Use the open source command-line IPMItool from the host operating system to access and manage your Oracle servers via the IPMI protocol.

Host OS-level command-line IPMItool

*Supported host operating systems include: Oracle Solaris, Linux, Windows, and VMware

Download Hardware Management Pack Software

Navigate to the following web site to download the Hardware Management Pack software.

http://support.oracle.com

Hardware Management Pack Documentation

For instructions on installing the management pack software or using its components, see the following Hardware Management Pack documentation:

For additional details about how to use the Storage Monitoring features in Oracle ILOM, see the Oracle Integrated Lights Out Manager (ILOM) 3.0 Concepts Guide and the Oracle Integrated Lights Out Manager (ILOM) 3.0 CLI Procedures Guide.

For additional details about accessing and managing your server via SNMP or IPMI, see the Oracle Integrated Lights Out Manager (ILOM) 3.0 Management Protocols Reference Guide.


Power Management Policies

This release of Oracle ILOM 3.0 software provides new Power Management policies that are supported on the Sun Fire X4470 Server.

For more information about the latest Oracle ILOM 3.0 Power Management policies, see the Oracle Integrated Lights Out Manager (ILOM) 3.0 Feature Updates and Release Notes.

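This section includes the following topics: Host Power Throttling and Recovery, Service Processor Power-On Policy, Light Load Efficiency Mode, and Low Line AC Override Mode Policy, followed by the procedures Configure SP Power Management Policies Using the Web Interface and Configure SP Power Management Policies Using the CLI.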

Host Power Throttling and Recovery

The Sun Fire X4470 Server supports a simple mechanism that automatically applies hardware throttles to the CPUs and memory controllers when power consumption exceeds the rated capacity of the available power supplies. This can occur when a redundant power supply has failed or has been removed from the system.

When the server’s hardware (power CPLD) determines that power demand has exceeded the system’s available power, it automatically throttles the host processors to reduce their power consumption. The service processor (SP) removes this hardware throttle after it has been applied for 5 seconds. If demand still exceeds available power, the hardware throttles again, and this cycle of throttling and recovery continues until it is no longer needed.
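The following Python sketch simulates the throttle-and-recover cycle over one-second samples of unthrottled power demand. It is a simplified model and does not reproduce the power CPLD or SP firmware logic. The function name `simulate_throttling`, the one-second sampling, and the wattage values are illustrative assumptions. Only the 5-second hold time comes from the text above.

```python
THROTTLE_HOLD_S = 5  # SP releases the hardware throttle after 5 seconds (from the text)


def simulate_throttling(demand_w, available_w):
    """Return one throttle state (True = throttled) per 1-second demand sample.

    Simplified, hypothetical model: demand_w holds the unthrottled power demand
    for each second; while a throttle is held, demand is not re-evaluated.
    """
    states = []
    held_s = 0  # seconds the current throttle has been applied (0 = no throttle)
    for demand in demand_w:
        if held_s > 0:
            # Throttle remains applied until it has been held for THROTTLE_HOLD_S.
            states.append(True)
            held_s += 1
            if held_s >= THROTTLE_HOLD_S:
                held_s = 0  # SP removes the throttle (recovery)
        elif demand > available_w:
            # Hardware detects demand above available power and throttles.
            states.append(True)
            held_s = 1
        else:
            states.append(False)
    return states


# Sustained 1200 W demand against 1000 W of available power.
print(simulate_throttling([900, 1200, 1200, 1200, 1200, 1200, 1200, 900], 1000))
```

In this example, the throttle is held for five samples and then released. Because the overload is still present, the throttle is reapplied immediately.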

Service Processor Power-On Policy

The service processor (SP) power-on policy determines the power state of the server when a cold boot is performed on the server. A server cold boot occurs only when AC power is applied to the server.

Service processor power-on policies are mutually exclusive: if one policy is enabled, the other policy is disabled. If both policies are disabled, the server SP does not apply main power to the server at boot time. A brief description of the SP power-on policies and default settings follows:

You can configure SP power-on policies using the Oracle ILOM web interface or the Oracle ILOM command-line interface (CLI). For instructions, see Configure SP Power Management Policies Using the Web Interface and Configure SP Power Management Policies Using the CLI.

Light Load Efficiency Mode

Light Load Efficiency Mode (LLEM) increases system power efficiency by placing power supply unit 1 (PSU1) in warm-standby mode when the system is lightly loaded. LLEM is disabled by default on the Sun Fire X4470 Server.

When PSU1 is in warm-standby mode, PSU0 carries the entire power load. If PSU0 loses AC power or is extracted for replacement, PSU1 takes over the load automatically.



Note - In rare instances, an internal failure might cause PSU0 to lose power faster than PSU1 can take over the load.


Disabling LLEM forces the PSUs to share the power load at all times, causing reduced efficiency during light power loads.
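The load-sharing behavior described above can be summarized in a small Python model. It is a sketch under stated assumptions and does not describe the PSU firmware. The hypothetical `llem_standby` flag stands for "LLEM is enabled and the system is lightly loaded", because the light-load threshold is not specified here. The even split when LLEM is not engaged is also an assumption.

```python
def psu_load_split(load_w, llem_standby, psu0_has_ac=True):
    """Return (PSU0 watts, PSU1 watts) under a simplified, hypothetical model.

    llem_standby: True when LLEM is enabled and has placed PSU1 in warm standby.
    """
    if not psu0_has_ac:
        return (0, load_w)            # PSU1 takes over the entire load
    if llem_standby:
        return (load_w, 0)            # PSU1 idles in warm standby; PSU0 carries all
    return (load_w / 2, load_w / 2)   # LLEM not engaged: assumed even sharing


print(psu_load_split(400, llem_standby=True))
print(psu_load_split(400, llem_standby=False))
```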

You can configure LLEM using the Oracle ILOM web interface or the Oracle ILOM command-line interface (CLI). For instructions, see Configure SP Power Management Policies Using the Web Interface and Configure SP Power Management Policies Using the CLI.

Low Line AC Override Mode Policy

The Low Line AC Override Mode policy setting is provided to enable special test scenarios of a 4-CPU system using low-line (110 volt) power. Low-line voltage is normally supported only in 2-CPU system configurations. The capacity of each power supply unit (PSU) is roughly 1000 watts at low line. Because the power consumption of a 4-CPU system can exceed 1000 watts by a large margin, enabling this setting results in a loss of PSU redundancy. This setting is disabled by default on the Sun Fire X4470 Server.



Note - The server is rated for a maximum AC input current of 12 amps (with one or both PSUs working). When the Low Line AC Override policy is enabled, a 4-CPU system can require more than 12 amps of total current for both PSUs. In either case, the current at each AC inlet does not exceed 12 amps.
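The following Python sketch works through the arithmetic behind this note. It assumes that the two PSUs share the load evenly, that the power factor is unity, and that PSU conversion losses can be ignored. The currents it reports are therefore rough estimates only. `low_line_check` is a hypothetical helper; only the 1000 watt and 12 amp figures come from the text.

```python
LOW_LINE_PSU_CAPACITY_W = 1000.0  # approximate per-PSU capacity at low line (from the text)
RATED_INPUT_CURRENT_A = 12.0      # rated maximum AC input current (from the text)


def low_line_check(load_w, line_voltage_v=110.0):
    """Rough redundancy and current estimate for a 2-PSU system on low-line AC.

    Assumes an even load split, unity power factor, and no PSU losses.
    """
    per_inlet_a = (load_w / 2) / line_voltage_v
    total_a = 2 * per_inlet_a
    return {
        # Redundant only if one PSU alone can carry the whole load.
        "redundant": load_w <= LOW_LINE_PSU_CAPACITY_W,
        "per_inlet_a": per_inlet_a,
        "total_a": total_a,
        "total_exceeds_rating": total_a > RATED_INPUT_CURRENT_A,
    }


print(low_line_check(1800))
```

Consider an 1800 watt load at 110 volts. Each inlet carries about 8.2 amps, which is within the 12 amp rating. The combined draw of about 16.4 amps exceeds 12 amps, and a single PSU could not carry the load alone.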


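You can configure the Low Line AC Override policy setting using the Oracle ILOM web interface or the Oracle ILOM command-line interface (CLI). For instructions, see Configure SP Power Management Policies Using the Web Interface and Configure SP Power Management Policies Using the CLI.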


Configure SP Power Management Policies Using the Web Interface

1. Log in to Oracle ILOM using the web interface.

2. Select Configuration --> Policy.

The Policy Configuration page appears.


Screenshot of the Policy Configuration page.

3. Depending on the SP policy you want to configure, do the following:

4. Click OK to enable or disable the SP policy.


Configure SP Power Management Policies Using the CLI

1. Log in to Oracle ILOM using the CLI.

2. To show the current power policy settings, type:

-> show /SP/policy

The SP policy properties appear. For example:


 /SP/policy
    Targets:
 
    Properties:
    HOST_AUTO_POWER_ON = disabled
    HOST_LAST_POWER_STATE = disabled
    LIGHT_LOAD_EFFICIENCY_MODE = enabled
    LOW_LINE_AC_OVERRIDE_MODE = disabled
 
    Commands:
    cd
    set
    show
->

In the above output, Host Auto Power On is disabled, Host Last Power State is disabled, Light Load Efficiency Mode is enabled, and Low Line AC Override Mode is disabled.

3. Depending on the SP policy you want to configure, do the following:

-> set /SP/policy/ HOST_AUTO_POWER_ON=[enabled|disabled]

-> set /SP/policy/ HOST_LAST_POWER_STATE=[enabled|disabled]

-> set /SP/policy/ LIGHT_LOAD_EFFICIENCY_MODE=[enabled|disabled]

-> set /SP/policy/ LOW_LINE_AC_OVERRIDE_MODE=[enabled|disabled]
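A management script might read the `show /SP/policy` output from step 2 and predict what happens at the next cold boot. The following Python sketch shows one way to do this. The parser assumes the `NAME = value` layout shown in the example output. `parse_properties` and `host_powers_on_at_cold_boot` are hypothetical helpers. The sketch also assumes, as the policy name suggests, that Host Last Power State restores the host power state that was in effect before AC power was removed.

```python
def parse_properties(show_output):
    """Return the NAME = value pairs listed under 'Properties:' in ILOM 'show' output."""
    props = {}
    in_properties = False
    for line in show_output.splitlines():
        text = line.strip()
        if text == "Properties:":
            in_properties = True
        elif text in ("Targets:", "Commands:"):
            in_properties = False
        elif in_properties and " = " in text:
            name, value = text.split(" = ", 1)
            props[name.strip()] = value.strip()
    return props


def host_powers_on_at_cold_boot(props, host_was_on):
    """Predict whether the SP applies main power when AC power is applied.

    Assumes HOST_LAST_POWER_STATE restores the pre-outage host power state.
    """
    auto_on = props.get("HOST_AUTO_POWER_ON") == "enabled"
    last_state = props.get("HOST_LAST_POWER_STATE") == "enabled"
    if auto_on and last_state:
        raise ValueError("power-on policies are mutually exclusive")
    if auto_on:
        return True
    if last_state:
        return host_was_on
    return False  # both disabled: the SP does not apply main power at boot


example = """ /SP/policy
    Targets:

    Properties:
    HOST_AUTO_POWER_ON = disabled
    HOST_LAST_POWER_STATE = disabled
    LIGHT_LOAD_EFFICIENCY_MODE = enabled
    LOW_LINE_AC_OVERRIDE_MODE = disabled

    Commands:
    cd
    set
    show
"""
policy = parse_properties(example)
print(policy["LIGHT_LOAD_EFFICIENCY_MODE"])  # enabled
print(host_powers_on_at_cold_boot(policy, host_was_on=True))  # False
```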


Oracle ILOM Sideband Management

By default, you connect to the server’s service processor (SP) using the out-of-band network management port (NET MGT). The Oracle ILOM sideband management feature enables you to select either the NET MGT port or one of the server’s Gigabit Ethernet ports (NET 0, 1, 2, 3), which are in-band ports, to send and receive Oracle ILOM commands to and from the server SP. An in-band port used in this way is called a sideband port.

The advantage of using a sideband management port to manage the server’s SP is that one fewer cable connection and one fewer network switch port are needed. In configurations where numerous servers are managed, such as data centers, sideband management can yield significant savings in hardware and network switch port usage.

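You can configure sideband management using the web interface, the command-line interface (CLI), the host BIOS Setup Utility, or IPMI. For special considerations and configuration instructions, see Special Considerations for Sideband Management, Configure Sideband Management Using the Web Interface, Configure Sideband Management Using the CLI, and Configure Sideband Management Using the Host BIOS Setup Utility.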

Special Considerations for Sideband Management

When sideband management is enabled in Oracle ILOM, the following conditions might occur:



Note - If the ports are configured as switch ports and participate in the Spanning Tree Protocol (STP), you might experience longer outages due to spanning tree recalculation.



Configure Sideband Management Using the Web Interface

1. Log in to Oracle ILOM using the web interface.

2. Select Configuration --> Network.

The Network Settings page appears.


Screenshot of the Network Settings page.

3. In the Network Settings page, do the following:

a. Configure a static IP address or select the appropriate options to acquire an IP address automatically.

b. To select a sideband management port, click the Management Port drop-down list and select the desired management port.

The drop-down list enables you to change to any one of the four Gigabit Ethernet ports, /SYS/MB/NETn, where n is 0 to 3. The SP NET MGT port, /SYS/SP/NET0, is the default.

c. Click Save for the changes to take effect.


Configure Sideband Management Using the CLI

1. Log in to Oracle ILOM using the CLI.



Note - Using a serial connection for this procedure eliminates the possibility of losing connectivity during sideband management configuration changes.


2. If you logged in using the serial port, you can assign a static IP address.

For instructions, see the information about assigning an IP address in the Sun Fire X4470 Server Installation Guide.

3. To show the current port settings, type:

-> show /SP/network

The network properties appear. For example:


/SP/network
    Targets:
    Properties:
        commitpending = (Cannot show property)
        dhcp_server_ip = none
        ipaddress = xx.xx.xx.xx
        ipdiscovery = static
        ipgateway = xx.xx.xx.xx
        ipnetmask = xx.xx.xx.xx
        macaddress = 11.11.11.11.11.86
        managementport = /SYS/SP/NET0
        outofbandmacaddress = 11.11.11.11.11.86
        pendingipaddress = xx.xx.xx.xx
        pendingipdiscovery = static
        pendingipgateway =  xx.xx.xx.xx
        pendingipnetmask =  xx.xx.xx.xx
        pendingmanagementport = /SYS/SP/NET0
        sidebandmacaddress = 11.11.11.11.11.87 
        state = enabled

In the above output, the currently active macaddress is the same as the SP’s outofbandmacaddress, and the currently active managementport is set to the default (/SYS/SP/NET0).

4. To set the SP management port to a sideband port, type the following commands:

-> set /SP/network pendingmanagementport=/SYS/MB/NETn

Where n equals 0, 1, 2, or 3.

-> set /SP/network commitpending=true

5. To view the change, type:

-> show /SP/network

The network properties appear and show that the change has taken effect. For example:


/SP/network
    Targets:
    Properties:
        commitpending = (Cannot show property)
        dhcp_server_ip = none
        ipaddress = xx.xx.xx.xx
        ipdiscovery = static
        ipgateway = xx.xx.xx.xx
        ipnetmask = xx.xx.xx.xx
        macaddress = 11.11.11.11.11.87
        managementport = /SYS/MB/NETn
        outofbandmacaddress = 11.11.11.11.11.86
        pendingipaddress = xx.xx.xx.xx
        pendingipdiscovery = static
        pendingipgateway =  xx.xx.xx.xx
        pendingipnetmask =  xx.xx.xx.xx
        pendingmanagementport = /SYS/MB/NETn
        sidebandmacaddress = 11.11.11.11.11.87
        state = enabled

In the above output the macaddress matches the sidebandmacaddress, and the managementport matches the pendingmanagementport.
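Steps 4 and 5 can be scripted. The following Python sketch builds the two `set` commands for a given sideband port. It then checks the resulting `/SP/network` properties for the conditions described above; the properties are passed in as a dictionary of names to values. The function names are hypothetical. The sketch does not cover how the commands are sent to the SP, for example over an SSH session.

```python
VALID_SIDEBAND_PORTS = range(4)  # /SYS/MB/NET0 through /SYS/MB/NET3, per step 4


def sideband_commands(n):
    """Build the CLI commands that move SP management to /SYS/MB/NETn."""
    if n not in VALID_SIDEBAND_PORTS:
        raise ValueError("n must be 0, 1, 2, or 3")
    return [
        f"set /SP/network pendingmanagementport=/SYS/MB/NET{n}",
        "set /SP/network commitpending=true",
    ]


def sideband_active(props):
    """Check /SP/network properties (name -> value) after the commit."""
    port = props.get("managementport", "")
    return (
        port.startswith("/SYS/MB/NET")
        and port == props.get("pendingmanagementport")
        and props.get("macaddress") == props.get("sidebandmacaddress")
    )


for command in sideband_commands(1):
    print(command)
```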


Configure Sideband Management Using the Host BIOS Setup Utility

You can access the BIOS Setup Utility screens from the following interfaces:

To configure sideband management using the host BIOS Setup Utility, perform the following steps:

1. Power on or power cycle the server.

2. To enter the BIOS Setup Utility, press the F2 key while the system is performing the power-on self-test (POST).


Graphic of the Press F2 to run Setup prompt.

When BIOS is started, the main BIOS Setup Utility top-level screen appears. This screen provides seven menu options across the top of the screen.


Graphic of BIOS Setup utility main screen.

3. In the main screen, select Advanced --> IPMI 2.0 Configuration.

The IPMI 2.0 Configuration screen appears.


Graphic showing BIOS Setup utility: Advanced - IPMI configuration.

4. In the IPMI 2.0 Configuration screen, select the Set LAN Configuration option.

The LAN Configuration screen appears.


Graphic showing BIOS Setup utility: Advanced - LAN Configuration.

5. In the LAN Configuration screen, do the following:

a. Use the left and right arrow keys to select the IP Assignment option and set it to DHCP to acquire the IP address automatically, or set it to Static if manually specifying the IP address.

b. Use the left and right arrow keys to select the Active Management Port option and set the port to a sideband management port (NET0, NET1, NET2, NET3).

The NET MGT port is the default.

c. Select Commit for the change to take effect.


Switch Serial Port Output Between SP and Host Console

You can switch the serial port output of the Sun Fire X4470 Server between the SP console (SER MGT) and the host console (COM1). By default, the SP console is connected to the system serial port. This feature is beneficial for Windows kernel debugging, as it enables you to view non-ASCII character traffic from the host console.

You can switch serial port output using either the Oracle ILOM web interface or the Oracle ILOM command-line interface (CLI). For instructions, see Switch Serial Port Output Using the Web Interface and Switch Serial Port Output Using the CLI.



Caution - Set up the network on the SP before you switch the serial port owner to the host server. If the network is not set up and you switch the serial port owner to the host server, you will be unable to connect using the CLI or web interface to change the serial port owner back to the SP. In that case, to change the serial port owner back to the SP, you must use the Oracle ILOM Preboot Menu to restore access to the serial port over the network. For more information, see the Oracle ILOM Preboot Menu information in the Sun Fire X4470 Server Service Manual.



Switch Serial Port Output Using the Web Interface

1. Log in to Oracle ILOM using the web interface.

2. Select Configuration --> Serial Port.

The Serial Port Settings page appears.


Graphic of the Serial Port Settings page.

3. To select a serial port owner, click the Owner drop-down list and select the desired serial port owner.

The drop-down list enables you to select either Service Processor or Host Server.

By default, Service Processor is selected.

4. Click Save for your change to take effect.


Switch Serial Port Output Using the CLI

1. Log in to Oracle ILOM using the CLI.

2. To set the serial port owner, type:

-> set /SP/serial/portsharing/ owner=host

By default, owner=SP.


Server Chassis Intrusion Sensor

The /SYS/INTSW sensor is asserted when the server’s top cover is removed while power is being applied to the server. Because this is an improper service action, the sensor alerts you to any unauthorized or inadvertent removal of the server’s cover. This gives system administrators confidence that the physical integrity of the server has not been violated, which is particularly beneficial when the server is in a remote or uncontrolled location.



Note - The server cannot be powered on while the top cover is off and the /SYS/INTSW sensor is asserted. If the top cover is removed while the server is powered on, the host immediately performs a non-graceful shutdown to power off the server.


How the /SYS/INTSW Sensor Works

The /SYS/INTSW sensor is asserted when the chassis intrusion switch trips while the server is powered-on. If the AC power cords are connected to the server, power is being applied to the server. Even when you shut down the server’s host, power is still being applied to the server. The only way to remove power from the server completely is to disconnect the server’s AC power cords.

The chassis intrusion switch will trip if the server’s cover is removed, the switch itself is misaligned, or the cover is not properly seated. This sensor is deasserted when the integrity of the server’s chassis is restored, that is, when the removed cover is properly reinstalled, returning the chassis intrusion switch to its closed state.
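The sensor behavior described in this section can be expressed as a small state model. The following Python class is an illustrative sketch and does not reproduce the SP firmware logic. `ChassisIntrusionModel` and its methods are hypothetical. The model assumes that the sensor is asserted whenever AC power is connected and the intrusion switch is open. This includes the case where AC power is reconnected with the cover already off.

```python
class ChassisIntrusionModel:
    """Hypothetical, simplified model of /SYS/INTSW; not actual SP firmware logic."""

    def __init__(self):
        self.ac_connected = True   # AC cords plugged in: power is being applied
        self.cover_closed = True   # intrusion switch closed
        self.host_on = False

    @property
    def intsw_asserted(self):
        # Asserted only while power is applied and the intrusion switch is open.
        return self.ac_connected and not self.cover_closed

    def set_cover(self, closed):
        self.cover_closed = closed
        if self.intsw_asserted:
            self.host_on = False   # immediate, non-graceful power-off

    def set_ac(self, connected):
        self.ac_connected = connected
        if not connected:
            self.host_on = False   # disconnecting AC removes all power

    def power_on_host(self):
        """Return True if the host powers on; refused while INTSW is asserted."""
        if self.intsw_asserted or not self.ac_connected:
            return False
        self.host_on = True
        return True


model = ChassisIntrusionModel()
model.power_on_host()
model.set_cover(False)  # cover removed with AC connected
print(model.intsw_asserted, model.host_on)
```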



Caution - Removing the server’s top cover while the power cord is connected to the system is not an authorized service action. Proper service action requires that host and SP shutdown operations be observed and that the power cords be disconnected from the system before the cover is opened. If proper service actions are taken, you should not see the /SYS/INTSW sensor asserted unless there are other issues, such as a misaligned chassis intrusion switch.



Fault Management

When a server component fails, error telemetry is either captured by the BIOS or monitored by the Oracle ILOM SP. Oracle ILOM consumes error telemetry from both sources and provides a diagnosis in the form of a fault event, which is stored in the Oracle ILOM event log as a fault message. You can use either the Oracle ILOM web interface or the command-line interface (CLI) to manually clear faults.

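This section includes the following topics: Determining Faults, Clearing Faults, Components With No Fault Diagnosis, Viewing Sensors Using IPMItool, and Sensors and Indicators Reference Information. The first four topics describe how to examine and clear faults, while the last topic provides reference information for sensors and indicators.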

Determining Faults

When a system fault occurs, you can view system indicators and use the Oracle ILOM CLI or web interface to determine the fault:


Clearing Faults

The procedure for clearing a fault differs depending on the type of component.

1. Customer-replaceable units (CRUs) that are hot-swappable and are monitored by the SP will have their faults cleared automatically when the failed component is replaced and the updated status is reported as deasserted.

2. CRUs and field-replaceable units (FRUs) that have a FRUID container with identity information will have their faults cleared automatically when the failed component is replaced, as the SP is able to determine when a component is no longer present in the system.

3. CRUs and FRUs that are not hot-swappable and do not have a FRUID container with identity information will not have their faults cleared automatically.
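The three rules above can be condensed into a single decision function. The following Python sketch is a hypothetical helper that applies rule 3 as the fallback when neither rule 1 nor rule 2 applies. It does not query the SP. The attribute values for a given component must come from your own knowledge of that part.

```python
def fault_clears_automatically(hot_swappable, sp_monitored, has_fruid_identity):
    """Hypothetical helper: True if a fault clears without manual action after replacement."""
    # Rule 1: hot-swappable CRUs monitored by the SP clear once status deasserts.
    if hot_swappable and sp_monitored:
        return True
    # Rule 2: CRUs/FRUs with a FRUID container clear when the part is replaced.
    if has_fruid_identity:
        return True
    # Rule 3: otherwise, clear the fault manually (web interface or CLI).
    return False


print(fault_clears_automatically(hot_swappable=False, sp_monitored=False, has_fruid_identity=False))
```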

You can use the Oracle ILOM web interface or the command-line interface (CLI) to manually clear faults. For information on how to use the Oracle ILOM web interface or the CLI to clear server faults, see the Oracle ILOM 3.0 Documentation Collection at:

http://www.oracle.com/pls/topic/lookup?ctx=E19860-01&id=homepage

The following types of faults are diagnosed by the Oracle ILOM SP:

TABLE 2-3 lists the server component faults that persist after a system cold boot and the action required to clear each fault.


TABLE 2-3 Component Fault Events

Component            Action to Clear the Fault
Motherboard          Fault is automatically cleared upon component replacement
Memory riser         Fault is automatically cleared upon component replacement
Fan board            Fault is automatically cleared upon component replacement
DDR3 Memory DIMMs    Fault is automatically cleared upon component replacement
CPU module           Clear fault manually after component replacement
PCIe cards           Clear fault manually after component replacement
Fan module           Fault is automatically cleared when the sensor status is OK
Power supply         Fault is automatically cleared when the sensor status is OK
Disk drive           Fault is automatically cleared when the sensor status is OK


In addition to the above faults, the following fault does not require replacement of a faulty part; however, user action is needed to clear it:

fault.security.integrity-compromised@/sys/sp

This fault is generated when the server’s top cover is removed while the AC power cords are still connected to the power supplies, that is, while power has not been completely removed from the server. To clear this fault, replace the server’s top cover, and then either reboot the server’s SP or disconnect and reconnect the AC power cords.

Components With No Fault Diagnosis

Certain Sun Fire X4470 Server components do not provide a mechanism to diagnose faults. These include:

Viewing Sensors Using IPMItool

You can view Sun Fire X4470 Server sensors using IPMItool. For information and instructions for viewing sensors using IPMItool, see the Oracle Integrated Lights Out Manager (ILOM) 3.0 Management Protocols Reference Guide.
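The following Python sketch filters the output of `ipmitool sensor`, which prints one pipe-delimited row per sensor. It relies on several assumptions that you should verify against your IPMItool version:

- The first four columns are name, reading, unit, and status.
- Threshold sensors report `ok`, or `na` when there is no reading, when they are normal.
- Discrete sensors show `discrete` in the unit column.

`parse_ipmitool_sensor` and `abnormal` are hypothetical helpers, and the sensor names in the example are made up.

```python
def parse_ipmitool_sensor(output):
    """Parse pipe-delimited 'ipmitool sensor' rows, keeping name, reading, unit, status."""
    rows = []
    for line in output.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) < 4:
            continue  # not a sensor row
        name, reading, unit, status = fields[:4]
        rows.append({"name": name, "reading": reading, "unit": unit, "status": status})
    return rows


def abnormal(rows):
    """Names of threshold sensors whose status is neither 'ok' nor 'na' (assumed codes)."""
    return [
        row["name"]
        for row in rows
        if row["unit"] != "discrete" and row["status"] not in ("ok", "na")
    ]


sample = "T_AMB | 24.000 | degrees C | ok | na\nT_OUT0 | 71.000 | degrees C | cr | na"
print(abnormal(parse_ipmitool_sensor(sample)))
```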


Sensors and Indicators Reference Information

The server includes several sensors and indicators that report on hardware conditions. Many of the sensor readings are used to adjust the fan speeds and perform other actions, such as illuminating LEDs and powering off the server.

This section describes the sensors and indicators that Oracle ILOM monitors for the Sun Fire X4470 Server.

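The following types of sensors are described: temperature sensors; power supply fault sensors; fan speed and physical security sensors; power supply unit current, voltage, and power sensors; and entity presence sensors.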



Note - For information about how to obtain sensor readings or to determine the state of system indicators in Oracle ILOM, see the Oracle Integrated Lights Out Manager (ILOM) 3.0 CLI Procedures Guide and the Oracle Integrated Lights Out Manager (ILOM) 3.0 Web Interface Procedures Guide.


System Components

TABLE 2-4 describes the system components.


TABLE 2-4 System Components

Component Name

Description

/SYS/DBP

Disk backplane

/SYS/DBP/HDDn

Hard disk n

/SYS/FB

Fan board

/SYS/FB/FANn

Fan n

/SYS/MB

Motherboard

/SYS/MB/NETn

Host network interface n

/SYS/MB/Pn

Processor n

/SYS/MB/Pn/MRn

Processor n; Memory riser n

/SYS/MB/Pn/MRn/Dn

Processor n; Memory riser n; DIMM n

/SYS/MB/PCIE[n, CC]

PCIe slot n, or cluster card

/SYS/PSn

Power supply n

/SYS/SP

Service processor

/SYS/SP/NETn

SP network interface n
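Component names in TABLE 2-4 follow a fixed pattern, so a script can translate a name into the description given in the table. The following Python sketch is a hypothetical helper built from the table rows. It omits the cluster card entry of /SYS/MB/PCIE[n, CC], because the table does not show the exact form of that name.

```python
import re

# Hypothetical lookup built from TABLE 2-4.
FIXED = {
    "/SYS/DBP": "Disk backplane",
    "/SYS/FB": "Fan board",
    "/SYS/MB": "Motherboard",
    "/SYS/SP": "Service processor",
}
PATTERNS = [
    (r"/SYS/DBP/HDD(\d+)", "Hard disk {0}"),
    (r"/SYS/FB/FAN(\d+)", "Fan {0}"),
    (r"/SYS/MB/NET(\d+)", "Host network interface {0}"),
    (r"/SYS/MB/P(\d+)", "Processor {0}"),
    (r"/SYS/MB/P(\d+)/MR(\d+)", "Processor {0}; Memory riser {1}"),
    (r"/SYS/MB/P(\d+)/MR(\d+)/D(\d+)", "Processor {0}; Memory riser {1}; DIMM {2}"),
    (r"/SYS/MB/PCIE(\d+)", "PCIe slot {0}"),
    (r"/SYS/PS(\d+)", "Power supply {0}"),
    (r"/SYS/SP/NET(\d+)", "SP network interface {0}"),
]


def describe_component(name):
    """Return the TABLE 2-4 description for a component name, or None if unknown."""
    if name in FIXED:
        return FIXED[name]
    for pattern, template in PATTERNS:
        match = re.fullmatch(pattern, name)  # whole-name match, so order does not matter
        if match:
            return template.format(*match.groups())
    return None


print(describe_component("/SYS/MB/P1/MR0/D3"))
```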


System Indicators

TABLE 2-5 describes the system indicators.


TABLE 2-5 System Indicators

Indicator Name

Description

/SYS/CPU_FAULT

System CPU Fault LED

/SYS/DBP/HDDn/OK2RM

Hard disk n OK-to-Remove LED

/SYS/DBP/HDDn/SERVICE

Hard disk n Service LED

/SYS/FAN_FAULT

System fan Fault LED

/SYS/FB/FANn/OK

Fan n OK LED

/SYS/FB/FANn/SERVICE

Fan n Service LED

/SYS/LOCATE

System Locate indicator LED

/SYS/MB/Pn/SERVICE

Processor n Service LED

/SYS/MB/Pn/MRn/SERVICE

Processor n; Memory riser n Service LED

/SYS/MB/Pn/MRn/Dn/SERVICE

Processor n; Memory riser n; DIMM n Service LED

/SYS/MEMORY_FAULT

System memory Fault LED

/SYS/OK

System OK LED

/SYS/PS_FAULT

System power supply Fault LED

/SYS/SERVICE

System Service LED

/SYS/SP/OK

SP OK LED

/SYS/SP/SERVICE

SP Service LED

/SYS/TEMP_FAULT

System temperature Fault LED


Temperature Sensors

TABLE 2-6 describes the environmental sensors.


TABLE 2-6 Temperature Sensors

Sensor Name

Sensor Type

Description

/SYS/DBP/T_AMB

Temperature

Disk backplane ambient temperature sensor

/SYS/MB/T_OUTn

Temperature

Motherboard exhaust temperature n sensor

Note - These sensors are located in the rear of the chassis.

/SYS/T_AMB

Temperature

System ambient temperature sensor

Note - This sensor is located on the underside of the fan board.

/SYS/PSn/T_OUT

Temperature

Power supply n exhaust temperature sensor


Power Supply Fault Sensors

TABLE 2-7 describes the power supply fault sensors. In the table, n designates the numbers 0-1.


TABLE 2-7 Power Supply Sensors

Sensor Name

Sensor Type

Description

/SYS/PSn/V_OUT_OK

Fault

Power supply n output voltage OK

/SYS/PSn/V_IN_ERR

Fault

Power supply n input voltage error

/SYS/PSn/V_IN_WARN

Fault

Power supply n input voltage warning

/SYS/PSn/V_OUT_ERR

Fault

Power supply n output voltage error

/SYS/PSn/I_OUT_ERR

Fault

Power supply n output current error

/SYS/PSn/I_OUT_WARN

Fault

Power supply n output current warning

/SYS/PSn/T_ERR

Fault

Power supply n temperature error

/SYS/PSn/T_WARN

Fault

Power supply n temperature warning

/SYS/PSn/FAN_ERR

Fault

Power supply n fan error

/SYS/PSn/FAN_WARN

Fault

Power supply n fan warning

/SYS/PSn/ERR

Fault

Power supply n error


Fan Speed and Physical Security Sensors

TABLE 2-8 describes the fan and security sensors. In the table, n designates numbers 0, 1, 2, etc.


TABLE 2-8 Fan and Security Sensors

Sensor Name

Sensor Type

Description

/SYS/FB/FANn/TACH

Fan speed

Fan board; Fan n tachometer

/SYS/INTSW

Physical security

This sensor tracks the state of the chassis intrusion switch. If the server’s top cover is opened while the AC power cords are still connected, so that power is being applied to the server, this sensor asserts. If the top cover is subsequently replaced, this sensor is deasserted.

For more information, see Server Chassis Intrusion Sensor.


Power Supply Unit Current, Voltage, and Power Sensors

TABLE 2-9 describes the power supply unit current, voltage, and power sensors. In the table, n designates numbers 0-1.


TABLE 2-9 Power Supply Unit Current, Voltage, and Power Sensors

Sensor Name

Sensor Type

Description

/SYS/PSn/V_IN

Voltage

Power supply n AC input voltage sensor

/SYS/PSn/V_12V

Voltage

Power supply n 12 volt output sensor

/SYS/PSn/V_3V3

Voltage

Power supply n 3.3 volt output sensor

/SYS/PSn/P_IN

Power

Power supply n input power sensor

/SYS/PSn/P_OUT

Power

Power supply n output power sensor

/SYS/VPS

Power

Server total input power consumption sensor


Entity Presence Sensors

TABLE 2-10 describes the entity presence sensors. In the table, n designates numbers 0, 1, 2, etc.


TABLE 2-10 Presence Sensors

Sensor Name

Sensor Type

Description

/SYS/DBP/HDDn/PRSNT

Entity presence

Hard drive device present monitor

/SYS/DBP/PRSNT

Entity presence

Disk backplane present monitor

/SYS/FB/FANn/PRSNT

Entity presence

Fan board; Fan n present monitor

/SYS/MB/Pn/PRSNT

Entity presence

Motherboard; CPU n present monitor

/SYS/MB/Pn/MRn/PRSNT

Entity presence

Motherboard; CPU n; Memory riser n present monitor

/SYS/MB/Pn/MRn/Dn/PRSNT

Entity presence

Motherboard; CPU n; Memory riser n; DIMM n present monitor

/SYS/MB/PCIEn/PRSNT

Entity presence

PCIe card n present monitor

Note - n represents PCIe cards 0-9 or the cluster controller (cc) card.

/SYS/PSn/PRSNT

Entity presence

Power supply n present monitor



SNMP and PET Message Reference Information

This section describes Simple Network Management Protocol (SNMP) and Platform Event Trap (PET) messages that are generated by devices being monitored by Oracle ILOM.

SNMP Traps

SNMP Traps are generated by the SNMP agents that are installed on the SNMP devices being managed by Oracle ILOM. Oracle ILOM receives the SNMP Traps and converts them into SNMP event messages that appear in the event log. For more information about the SNMP event messages that might be generated on your system, see TABLE 2-11.
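A trap receiver can use the severities in TABLE 2-11 to decide which traps need attention. The following Python sketch hard-codes the trap names from the table, with the wrapped names joined, and their severities. `needs_action` and its default Major threshold are hypothetical choices and are not part of Oracle ILOM.

```python
# Severities as listed in TABLE 2-11 of this section.
TRAP_SEVERITY = {
    "sunHwTrapComponentFault": "Major",
    "sunHwTrapComponentFaultCleared": "Informational",
    "sunHwTrapTempCritThresholdExceeded": "Major",
    "sunHwTrapTempCritThresholdDeasserted": "Informational",
    "sunHwTrapTempNonCritThresholdExceeded": "Minor",
    "sunHwTrapTempOk": "Informational",
    "sunHwTrapTempFatalThresholdExceeded": "Critical",
    "sunHwTrapTempFatalThresholdDeasserted": "Informational",
    "sunHwTrapPowerSupplyFault": "Major",
    "sunHwTrapPowerSupplyFaultCleared": "Informational",
}
SEVERITY_RANK = {"Informational": 0, "Minor": 1, "Major": 2, "Critical": 3}


def needs_action(trap_name, threshold="Major"):
    """Hypothetical triage rule: True if the trap's severity is at or above threshold."""
    severity = TRAP_SEVERITY.get(trap_name)
    if severity is None:
        raise KeyError(f"trap not listed in TABLE 2-11: {trap_name}")
    return SEVERITY_RANK[severity] >= SEVERITY_RANK[threshold]


print(needs_action("sunHwTrapTempFatalThresholdExceeded"))
```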


TABLE 2-11 SNMP Traps and Corresponding Oracle ILOM Events for Sun Fire X4470 Server

SNMP Trap Message

Oracle ILOM Event Message

Severity and Description

Sensor Name

Memory Events

sunHwTrapComponentFault

fault.memory.intel.boot-setup-init-failed

Major; A component is suspected of causing a fault

/SYS/

fault.memory.intel.boot-retries-failed

fault.memory.intel.dimm.none

/SYS/MB

fault.memory.controller.input-invalid

fault.memory.controller.init-failed

sunHwTrapComponentFaultCleared

fault.memory.intel.boot-setup-init-failed

Informational; A component fault has been cleared

/SYS/

fault.memory.intel.boot-retries-failed

fault.memory.intel.dimm.none

/SYS/MB

fault.memory.controller.input-invalid

fault.memory.controller.init-failed

Service Processor Events

sunHwTrapComponentFault

fault.chassis.device.misconfig

Major; A component is suspected of causing a fault

/SYS/SP

fault.sp.failed

sunHwTrapComponentFaultCleared

fault.chassis.device.misconfig

Informational; A component fault has been cleared

fault.sp.failed

Environmental Events

sunHwTrapComponentFault

fault.chassis.env.temp.over-fail

Major; A component is suspected of causing a fault

/SYS/

sunHwTrapComponentFaultCleared

fault.chassis.env.temp.over-fail

Informational; A component fault has been cleared

/SYS/

sunHwTrapTempCritThresholdExceeded

Lower critical threshold exceeded

Major; A temperature sensor has reported that its value has gone above an upper critical threshold setting or below a lower critical threshold setting

/SYS/MB/T_OUT

/SYS/DBP/T_AMB

Upper critical threshold exceeded

/SYS/MB/T_OUT

/SYS/T_AMB

/SYS/DBP/T_AMB

sunHwTrapTempCritThresholdDeasserted

Lower critical threshold no longer exceeded

Informational; A temperature sensor has reported that its value is in the normal operating range

/SYS/MB/T_OUT

/SYS/DBP/T_AMB

Upper critical threshold no longer exceeded

/SYS/MB/T_OUT

/SYS/T_AMB

/SYS/DBP/T_AMB

sunHwTrapTempNonCritThresholdExceeded

Upper noncritical threshold exceeded

Minor; A temperature sensor has reported that its value has gone above an upper noncritical threshold setting or below a lower noncritical threshold setting

/SYS/MB/T_OUT

/SYS/DBP/T_AMB

sunHwTrapTempOk

Upper noncritical threshold no longer exceeded

Informational; A temperature sensor has reported that its value is in the normal operating range

/SYS/MB/T_OUT

/SYS/DBP/T_AMB

sunHwTrapTempFatalThresholdExceeded

Lower fatal threshold exceeded

Critical; A temperature sensor has reported that its value has gone above an upper fatal threshold setting or below a lower fatal threshold setting

/SYS/MB/T_OUT

/SYS/DBP/T_AMB

Upper fatal threshold exceeded

/SYS/MB/T_OUT

/SYS/T_AMB

/SYS/DBP/T_AMB

sunHwTrapTempFatalThresholdDeasserted

Lower fatal threshold no longer exceeded

Informational; A temperature sensor has reported that its value has gone below an upper fatal threshold setting or above a lower fatal threshold setting

/SYS/MB/T_OUT

/SYS/DBP/T_AMB

Upper fatal threshold no longer exceeded

/SYS/MB/T_OUT

/SYS/T_AMB

/SYS/DBP/T_AMB

System Power Events

sunHwTrapComponentFault

fault.chassis.power.missing

Major; A component is suspected of causing a fault

/SYS/

fault.chassis.power.overcurrent

fault.chassis.power.inadequate

sunHwTrapComponentFaultCleared

fault.chassis.power.missing

Informational; A component fault has been cleared

/SYS/

fault.chassis.power.overcurrent

fault.chassis.power.inadequate

sunHwTrapPowerSupplyFault

fault.chassis.env.power.loss

Major; A power supply component is suspected of causing a fault

/SYS/PS

fault.chassis.power.ac-low-line

fault.chassis.device.wrong

sunHwTrapPowerSupplyFaultCleared

fault.chassis.env.power.loss

Informational; A power supply component fault has been cleared

/SYS/PS

fault.chassis.power.ac-low-line

fault.chassis.device.wrong

sunHwTrapPowerSupplyError

Assert

Major; A power supply sensor has detected an error

/SYS/PWRBS

/SYS/PSn/V_IN_ERR

/SYS/PSn/V_IN_WARN

/SYS/PSn/V_OUT_ERR

/SYS/PSn/I_OUT_ERR

/SYS/PSn/I_OUT_WARN

/SYS/PSn/T_ERR

/SYS/PSn/T_WARN

/SYS/PSn/FAN_ERR

/SYS/PSn/FAN_WARN

/SYS/PSn/ERR

Deassert

/SYS/PSn/V_OUT_OK

sunHwTrapPowerSupplyOk

Deassert

Informational; A power supply sensor has returned to its normal state

/SYS/PWRBS

/SYS/PSn/V_IN_ERR

/SYS/PSn/V_IN_WARN

/SYS/PSn/V_OUT_ERR

/SYS/PSn/I_OUT_ERR

/SYS/PSn/I_OUT_WARN

/SYS/PSn/T_ERR

/SYS/PSn/T_WARN

/SYS/PSn/FAN_ERR

/SYS/PSn/FAN_WARN

/SYS/PSn/ERR

Assert

/SYS/PSn/V_OUT_OK

sunHwTrapComponentError

ACPI_ON_WORKING ASSERT

Major; A sensor has detected an error

/SYS/ACPI

ACPI_ON_WORKING DEASSERT

ACPI_SOFT_OFF ASSERT

ACPI_SOFT_OFF DEASSERT

Entity Presence Events

UNKNOWN

ENTITY_PRESENT ASSERT

Informational

/SYS/MB/Pn/PRSNT

/SYS/MB/Pn/MRn/PRSNT

/SYS/MB/PCIEn/PRSNT

/SYS/MB/PCIE_CC/PRSNT

ENTITY_PRESENT DEASSERT

ENTITY_ABSENT ASSERT

ENTITY_ABSENT DEASSERT

ENTITY_DISABLED ASSERT

ENTITY_DISABLED DEASSERT

Fans, Hard Drives, and Physical Security Events

sunHwTrapComponentFault

fault.chassis.device.fan.column-fail

Major; A component is suspected of causing a fault

/SYS/

fault.security.enclosure-open

sunHwTrapComponentFaultCleared

fault.chassis.device.fan.column-fail

Informational; A component fault has been cleared

/SYS/

fault.security.enclosure-open

UNKNOWN

Assert

Informational

/SYS/MB/PCIEn/WIDTH

/SYS/ESMR/ESM/FAULT

Deassert

sunHwTrapSecurityIntrusion

CHASSIS_INTRUSION ASSERT

Major; An intrusion sensor has detected that someone may have physically tampered with the system

/SYS/INTSW

CHASSIS_INTRUSION DEASSERT

sunHwTrapFanSpeedCritThresholdExceeded

Lower critical threshold exceeded

Major; A fan speed sensor has reported that its value has gone above an upper critical threshold setting or below a lower critical threshold setting

/SYS/FB/FANn/TACH

sunHwTrapFanSpeedCritThresholdDeasserted

Lower critical threshold no longer exceeded

Informational; A fan speed sensor has reported that its value has gone below an upper critical threshold setting or above a lower critical threshold setting

sunHwTrapFanSpeedFatalThresholdExceeded

Lower fatal threshold exceeded

Critical; A fan speed sensor has reported that its value has gone above an upper fatal threshold setting or below a lower fatal threshold setting

sunHwTrapFanSpeedFatalThresholdDeasserted

Lower fatal threshold no longer exceeded

Informational; A fan speed sensor has reported that its value has gone below an upper fatal threshold setting or above a lower fatal threshold setting

System Chassis and I/O Events

sunHwTrapComponentFault

fault.chassis.boot.ipmi-init-failed

Major; A component is suspected of causing a fault

/SYS/

fault.io.quickpath.qpirc-init-failed

fault.io.quickpath.qpirc-failed

fault.io.quickpath.mrc-failed

sunHwTrapComponentFaultCleared

fault.chassis.boot.ipmi-init-failed

Informational; A component fault has been cleared

/SYS/

fault.io.quickpath.qpirc-init-failed

fault.io.quickpath.qpirc-failed

fault.io.quickpath.mrc-failed
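The temperature rows of TABLE 2-11 follow a three-level scheme: crossing a noncritical threshold raises a Minor trap, a critical threshold a Major trap, and a fatal threshold a Critical trap. The Python sketch below reports the most severe temperature trap family a single reading would fall into. The `Thresholds` values and the `temperature_trap` helper are hypothetical; the sketch treats a reading equal to a threshold as crossing it, ignores hysteresis and the exact comparison rules Oracle ILOM applies, and checks lower noncritical thresholds even though the table lists only the upper noncritical event for this server.

```python
from typing import NamedTuple, Optional


class Thresholds(NamedTuple):
    # Hypothetical threshold set; real values come from the sensor in Oracle ILOM.
    lower_fatal: float
    lower_critical: float
    lower_noncritical: float
    upper_noncritical: float
    upper_critical: float
    upper_fatal: float


def temperature_trap(reading: float, t: Thresholds) -> Optional[str]:
    """Return the trap family for a reading, checking the most severe level first."""
    if reading >= t.upper_fatal or reading <= t.lower_fatal:
        return "sunHwTrapTempFatalThresholdExceeded"    # Critical
    if reading >= t.upper_critical or reading <= t.lower_critical:
        return "sunHwTrapTempCritThresholdExceeded"     # Major
    if reading >= t.upper_noncritical or reading <= t.lower_noncritical:
        return "sunHwTrapTempNonCritThresholdExceeded"  # Minor
    return None  # within the normal operating range
```

Checking the fatal band first matters because a reading past a fatal threshold is also past the critical and noncritical thresholds on the same side.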


PET Event Messages

Platform Event Trap (PET) event messages are generated by systems with Alert Standard Format (ASF) or an IPMI baseboard management controller (BMC). These events provide advance warning of possible system failures. For more information about the PET event messages that might occur on your system, see TABLE 2-12.
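Many of the Oracle ILOM event messages in TABLE 2-12 use a colon-separated layout such as `Entity Presence : PCIE1/PRSNT : Present : Asserted`. The Python sketch below splits such a message into category, sensor, state, and assertion. The layout is inferred from the examples in the table, not from a published grammar, and `parse_event_message` is a hypothetical helper; messages that do not use this layout, such as `PowerSupply sensor DEASSERT`, are rejected.

```python
from typing import NamedTuple, Optional


class EventMessage(NamedTuple):
    category: str
    sensor: str
    state: str
    asserted: Optional[bool]  # None when there is no Asserted/Deasserted suffix


def parse_event_message(msg: str) -> EventMessage:
    # Fields are separated by " : "; "S5/G2: soft-off" has no space before
    # its colon, so it stays inside a single field.
    parts = [p.strip() for p in msg.split(" : ")]
    if len(parts) < 3:
        raise ValueError(f"unrecognized event message: {msg!r}")
    category, sensor, *rest = parts
    asserted: Optional[bool] = None
    words = rest[-1].rsplit(" ", 1)
    if words[-1] in ("Asserted", "Deasserted"):
        asserted = words[-1] == "Asserted"
        # The suffix is either its own field or the last word of the state field.
        rest = rest[:-1] if len(words) == 1 else rest[:-1] + [words[0]]
    return EventMessage(category, sensor, " : ".join(rest), asserted)
```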


TABLE 2-12 PET Messages and Corresponding Oracle ILOM Events for Sun Fire X4470 Server

PET Message

Oracle ILOM Event Message

Severity and Description

Sensor Name

System Power Events

petTrapACPIPowerStateS5G2SoftOffAssert

System ACPI Power State : ACPI : S5/G2: soft-off : Asserted

Informational; System ACPI Power State S5/G2 (soft-off) was asserted

/SYS/ACPI

petTrapACPIPowerStateS5G2SoftOffDeassert

System ACPI Power State : ACPI : S5/G2: soft-off : Deasserted

Informational; System ACPI Power State S5/G2 (soft-off) was deasserted

petTrapACPIPowerStateS0G0WorkingAssert

System ACPI Power State : ACPI : S0/G0: working : Asserted

Informational; System ACPI Power State S0/G0 (working) was asserted

petTrapACPIPowerStateS0G0WorkingDeassert

System ACPI Power State : ACPI : S0/G0: working : Deasserted

Informational; System ACPI Power State S0/G0 (working) was deasserted

petTrapPowerSupplyStateAssertedAssert

PowerSupply sensor DEASSERT

Informational; Power Supply is connected to AC Power

/SYS/PSn/V_OUT_OK

/SYS/PSn/V_IN_ERR

/SYS/PSn/V_IN_WARN

/SYS/PSn/V_OUT_ERR

/SYS/PSn/I_OUT_ERR

/SYS/PSn/I_OUT_WARN

/SYS/PSn/T_ERR

/SYS/PSn/T_WARN

/SYS/PSn/FAN_ERR

/SYS/PSn/FAN_WARN

/SYS/PSn/ERR

petTrapPowerSupplyStateDeassertedAssert

PowerSupply sensor ASSERT

Warning; Power Supply is disconnected from AC Power

Entity Presence Events

petTrapEntityPresenceEntityPresentAssert

Entity Presence : PCIE1/PRSNT : Present : Asserted

Informational; The Entity identified by the Entity ID is present

/SYS/PCIEn/PRSNT

/SYS/PCIE_CC/PRSNT

petTrapEntityPresenceEntityAbsentDeassert

Entity Presence : PCIE1/PRSNT : Absent : Deasserted

petTrapEntityPresenceEntityAbsentAssert

Entity Presence : PCIE1/PRSNT : Absent : Asserted

Informational; The Entity identified by the Entity ID is absent

petTrapEntityPresenceEntityPresentDeassert

Entity Presence : PCIE1/PRSNT : Present : Deasserted

Informational; The Entity identified by the Entity ID for the sensor is absent

petTrapEntityPresenceEntityDisabledAssert

Entity Presence : PCIE1/PRSNT : Disabled : Asserted

Informational; The Entity identified by the Entity ID is present, but has been disabled

/SYS/PCIE4/PRSNT

/SYS/PCIE6/PRSNT

/SYS/PCIE_CC/PRSNT

petTrapEntityPresenceEntityDisabledDeassert

Entity Presence : PCIE1/PRSNT : Disabled : Deasserted

Informational; The Entity identified by the Entity ID is present and has been enabled

petTrapEntityPresenceDeviceInsertedAssert

Entity Presence : PS0/PRSNT : DevicePresent

Informational; A device is present or has been inserted

/SYS/PSn/PRSNT

/SYS/FB/FANn/PRSNT

/SYS/DBP/HDDn/PRSNT

petTrapEntityPresenceDeviceRemovedAssert

Entity Presence : PS0/PRSNT : DeviceAbsent

Informational; A device is absent or has been removed

Environmental Events

petTrapTemperatureUpperNonRecoverableGoingLowDeassert

Temperature Upper non-critical threshold has been exceeded

Major; Temperature has decreased below upper non-recoverable threshold

/SYS/MB/T_OUT

/SYS/DBP/T_AMB

/SYS/T_AMB

petTrapTemperatureUpperCriticalGoingLowDeassert

Temperature Lower non-critical threshold has been exceeded

Warning; Temperature has decreased below upper critical threshold

petTrapTemperatureUpperNonRecoverableGoingHigh

Temperature Lower non-critical threshold no longer exceeded

Critical; Temperature has increased above upper non-recoverable threshold

petTrapTemperatureUpperCriticalGoingHigh

Temperature Lower fatal threshold has been exceeded

Major; Temperature has increased above upper critical threshold

Fans, Hard Drives, and Physical Security Events

petTrapPhysicalSecurityChassisIntrusionStateDeassertedAssert

Physical Security : INTSW : State Deasserted

Informational; Physical security: chassis intrusion alarm cleared

/SYS/INTSW

petTrapPhysicalSecurityChassisIntrusionStateAssertedAssert

Physical Security : INTSW : State Asserted

Warning; Physical security breach: chassis intrusion

petTrapFanLowerCriticalGoingLow

Fan Lower fatal threshold has been exceeded

Major; Fan speed has decreased below lower critical threshold

/SYS/FB/FANn/TACH

petTrapFanLowerCriticalGoingHighDeassert

Fan Lower fatal threshold no longer exceeded

Warning; Fan speed has increased above lower critical threshold

petTrapDriveSlotDriveFaultAssert

Drive Slot : DBP/HDD0/STATE : Drive Fault : Asserted

Critical; HDD Fault has been detected. A corresponding HDD Fault LED is ON

DBP/HDDn/STATE

petTrapDriveSlotDriveFaultDeassert

Drive Slot : DBP/HDD0/STATE : Drive Fault : Deasserted

Informational; HDD Fault has been cleared. An HDD Fault LED that was ON is now OFF

petTrapDriveSlotPredictiveFailureAssert

Drive Slot : DBP/HDD0/STATE : Predictive Failure : Asserted

Major; HDD Predictive Failure has been detected

petTrapDriveSlotReadyToRemoveAssert

Drive Slot : DBP/HDD0/STATE : Hot Spare : Asserted

Informational; A drive has been unmounted and is ready to be physically removed. A corresponding OK-to-Remove LED is ON

petTrapDriveSlotReadyToRemoveDeassert

Drive Slot : DBP/HDD0/STATE : Hot Spare : Deasserted

Informational; A drive is no longer ready to be physically removed. It has either been removed or mounted again. A corresponding OK-to-Remove LED is OFF

petTrapDriveSlotPredictiveFailureDeassert

Drive Slot : DBP/HDD0/STATE : Predictive Failure : Deasserted

Informational; Hard Disk Predictive Failure state has been cleared
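Most PET traps in TABLE 2-12 come in pairs whose names differ only in an `Assert` or `Deassert` suffix, for example `petTrapDriveSlotDriveFaultAssert` and `petTrapDriveSlotDriveFaultDeassert`. The Python sketch below relies on that naming convention to track which conditions are still active for each sensor. `split_pet_trap` and `ActiveConditions` are hypothetical helpers. Traps that do not pair this way, such as `petTrapPowerSupplyStateAssertedAssert` and `petTrapPowerSupplyStateDeassertedAssert`, or the temperature `GoingHigh` and `GoingLow` traps, would need explicit mappings.

```python
from typing import FrozenSet, Optional, Set, Tuple


def split_pet_trap(trap: str) -> Tuple[str, Optional[bool]]:
    """Split a PET trap name into (base name, asserted) using its suffix."""
    # Check "Deassert" first; "Assert" is case-sensitive, so it never matches "...Deassert".
    if trap.endswith("Deassert"):
        return trap[: -len("Deassert")], False
    if trap.endswith("Assert"):
        return trap[: -len("Assert")], True
    return trap, None  # no suffix: this sketch cannot pair the trap


class ActiveConditions:
    """Track (trap base name, sensor) pairs that are asserted and not yet deasserted."""

    def __init__(self) -> None:
        self._active: Set[Tuple[str, str]] = set()

    def handle(self, trap: str, sensor: str) -> None:
        base, asserted = split_pet_trap(trap)
        if asserted is None:
            return  # unpaired trap names are ignored
        key = (base, sensor)
        if asserted:
            self._active.add(key)
        else:
            self._active.discard(key)

    @property
    def active(self) -> FrozenSet[Tuple[str, str]]:
        return frozenset(self._active)
```

For example, after a drive fault is asserted and then deasserted on `DBP/HDD0/STATE`, only conditions on other sensors remain in `active`.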