CHAPTER 2

Introduction

The X6275 Server Module offers a variety of components and utilities to help you monitor the health of your server blade and diagnose problems should they arise. This chapter covers the following subjects: Troubleshooting, Monitoring, X6275 Server Module Diagnostic Tools, and Commonly Used Terms.

Troubleshooting

Your options for troubleshooting a server node are, in approximate order:


TABLE 2-1 Troubleshooting Tasks and Instructions

Order  Task and Instructions

1      Watch the system BIOS power-on self-test (POST) results while booting the server node.
       See How BIOS POST Memory Testing Works.

2      Obtain the ILOM Fault Management information, if any.
       See Viewing Faults.

3      Examine the system event log (SEL) with ILOM or IPMItool.
       ILOM: See To View the System Event Log With the ILOM Web Interface or To View the System Event Log With the ILOM CLI.
       IPMItool: See Viewing the SEL With IPMItool.

4      View the sensor status with ILOM or IPMItool.
       ILOM: See Monitoring Status Using the ILOM Web Interface.
       IPMItool: See Reading Sensor Status.

5      Run Pc-Check in Manual mode.
       See Running Pc-Check Diagnostics.

6      Use ILOM’s Data Collector function to create a “snapshot” of the server node that you can send to Sun Services for diagnosis.
       ILOM 2.0: See Data Collection Snapshot.
       ILOM 3.0: See Data Collection Snapshot.
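Task 3 involves reading the system event log. If you save the output of the IPMItool command `ipmitool sel list` to a file, a short script can pick out the entries for a particular sensor. The following Python sketch assumes the pipe-delimited layout that IPMItool commonly prints (record ID, date, time, sensor, event, and an optional direction such as "Asserted"); the exact columns can differ between IPMItool versions and SP firmware, so check the layout against your own output. The function names and the sample lines are hypothetical and were not captured from an X6275 SP.

```python
"""Sketch: filter saved `ipmitool sel list` output by sensor name.

Assumed layout (verify against your own SP):
record ID | date | time | sensor | event | direction (optional)
"""
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class SelEntry:
    record_id: str
    date: str
    time: str
    sensor: str
    event: str
    direction: str  # e.g. "Asserted"; empty when the line has no sixth field


def parse_sel_list(lines: Iterable[str]) -> List[SelEntry]:
    """Turn pipe-delimited SEL lines into SelEntry records, skipping malformed lines."""
    entries: List[SelEntry] = []
    for line in lines:
        fields = [field.strip() for field in line.split("|")]
        if len(fields) < 5:
            continue  # blank line, header, or unexpected format
        direction = fields[5] if len(fields) > 5 else ""
        entries.append(SelEntry(*fields[:5], direction))
    return entries


def filter_by_sensor(entries: List[SelEntry], keyword: str) -> List[SelEntry]:
    """Return entries whose sensor name contains keyword (case-insensitive)."""
    needle = keyword.lower()
    return [entry for entry in entries if needle in entry.sensor.lower()]


if __name__ == "__main__":
    # Hypothetical sample lines in the assumed layout.
    sample = [
        "   1 | 06/12/2009 | 10:22:33 | Power Unit #0x01 | Power off/down | Asserted",
        "   2 | 06/12/2009 | 10:25:10 | Temperature #0x30 | Upper Critical going high | Asserted",
    ]
    for entry in filter_by_sensor(parse_sel_list(sample), "temperature"):
        print(entry.record_id, entry.date, entry.time, entry.event)
```

Lines that do not split into at least five fields are skipped rather than treated as errors, so headers or blank lines in a saved log do not stop the filter.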



Monitoring

Continuous Monitoring

You can set up your system to generate SNMP traps with Integrated Lights Out Manager (ILOM). For more information, refer to the Sun Integrated Lights Out Manager (ILOM) 3.0 SNMP and IPMI Procedures Guide (820-6413).

You can also set up your system to generate SNMP traps with IPMItool. Refer to the IPMItool documentation available at:

http://ipmitool.sourceforge.net/

If you enable Pc-Check to run automatically on a given server node, the Pc-Check diagnostic tests run every time the server node boots, which adds about 3 minutes to boot time. The test results are stored in the SP’s memory, where you or your Sun service personnel can access them and use them for diagnostic evaluation if the server node has problems.

Occasional Monitoring

You can log in to the Chassis Management Module (CMM) of a chassis, or to an individual server node, with the ILOM web and CLI interfaces. See Monitoring the Server Node With ILOM 3.0.



Note - If you want to use ILOM to monitor several servers, it is faster to log in to the CMM and use the Chassis View to move from server node to server node.


With IPMItool, you can obtain information about multiple servers at one time by running IPMItool commands against the IP addresses of several server node SPs.
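Because each IPMItool invocation addresses a single SP with `-H`, querying several server nodes "at one time" usually means scripting the same command across a list of SP addresses. The following Python sketch does that in parallel with a thread pool. It uses the ipmitool options `-I lanplus` (IPMI v2.0 LAN interface), `-H` (host), `-U` (user), and `-E` (read the password from the `IPMI_PASSWORD` environment variable). The helper names, the SP addresses (taken from the 192.0.2.0/24 documentation range), and the user name `root` are illustrative assumptions; substitute the values for your own chassis.

```python
"""Sketch: run one IPMItool command against several server node SPs in parallel."""
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence


def build_ipmitool_cmd(host: str, user: str, command: Sequence[str]) -> List[str]:
    """Build the argument list for one SP.

    -I lanplus selects the IPMI v2.0 LAN interface; -E makes ipmitool read the
    password from IPMI_PASSWORD so it does not appear on the command line.
    """
    return ["ipmitool", "-I", "lanplus", "-H", host, "-U", user, "-E", *command]


def run_on_host(host: str, user: str, command: Sequence[str], timeout: int = 30) -> str:
    """Run the command against one SP; return stdout or an error message."""
    argv = build_ipmitool_cmd(host, user, command)
    try:
        result = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return f"ERROR: {exc}"  # ipmitool missing, or the SP did not answer in time
    if result.returncode != 0:
        return f"ERROR: {result.stderr.strip()}"
    return result.stdout


Runner = Callable[[str, str, Sequence[str]], str]


def run_on_hosts(hosts: Sequence[str], user: str, command: Sequence[str],
                 runner: Runner = run_on_host, max_workers: int = 8) -> Dict[str, str]:
    """Query every host concurrently and map each host to its output."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {host: pool.submit(runner, host, user, command) for host in hosts}
        return {host: future.result() for host, future in futures.items()}


if __name__ == "__main__":
    # Hypothetical SP addresses and user name; this demo only prints the commands.
    sp_hosts = ["192.0.2.10", "192.0.2.11"]
    for host in sp_hosts:
        print(" ".join(build_ipmitool_cmd(host, "root", ["sel", "list"])))
    # To query the SPs for real, export IPMI_PASSWORD and call:
    # results = run_on_hosts(sp_hosts, "root", ["sel", "list"])
```

The `runner` parameter exists so the fan-out logic can be exercised without network access; in normal use you leave it at its default.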


X6275 Server Module Diagnostic Tools

Service Processor

Each server node on the X6275 server module has its own dedicated service processor (SP) that monitors the server node. Whenever the SP is rebooted, it runs a diagnostic program on itself, so that you can diagnose any problems that it might have. This is covered in The Service Processor.

ILOM

The SP runs software called the Integrated Lights Out Manager (ILOM), which is the focal point for most server node monitoring and diagnosis, whether you perform it manually, it runs automatically, or your Sun Services representative performs it while assisting you.

The ILOM software is accessible both from a browser interface and from a command-line interface. It is covered in ILOM 2.0 Diagnostic Tools and ILOM 3.0 Diagnostic Tools.

Pc-Check

Pc-Check is a diagnostic program that can test all motherboard components, drives, ports, and slots and detect problems in them. You can access and run Pc-Check from ILOM.

You can run Pc-Check manually or you can choose to have it run automatically every time your server node reboots. When it is run manually, you see the results yourself. When it is run automatically, the results are stored on the SP where Sun Services can access them.

Pc-Check is covered in Performing Pc-Check Diagnostic Tests.

IPMItool

IPMItool is included on the X6275 blade’s Tools and Drivers CD image.

IPMItool enables you to monitor and manage a number of server modules simultaneously over a LAN. You can also generate IPMI-specific traps.

IPMItool is covered in Using IPMItool to View System Information.
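When you use IPMItool to monitor several server nodes, it helps to reduce the sensor listing to the entries that need attention. The following Python sketch scans output saved from `ipmitool sensor list` and reports threshold sensors whose status is `nc` (non-critical), `cr` (critical), or `nr` (non-recoverable). It assumes the pipe-delimited layout IPMItool typically prints (name, reading, units, status, then threshold columns); verify that layout against your own SP. The function name and the sample sensor names are hypothetical.

```python
"""Sketch: report sensors from `ipmitool sensor list` output that are outside thresholds."""
from typing import Iterable, List, Tuple

# Threshold status codes assumed from IPMItool's sensor output:
# nc = non-critical, cr = critical, nr = non-recoverable.
ALERT_STATUSES = {"nc", "cr", "nr"}


def sensors_needing_attention(lines: Iterable[str]) -> List[Tuple[str, str, str, str]]:
    """Return (name, reading, units, status) for each sensor in an alert state."""
    flagged = []
    for line in lines:
        fields = [field.strip() for field in line.split("|")]
        if len(fields) < 4:
            continue  # skip blank or unexpected lines
        name, reading, units, status = fields[:4]
        if status.lower() in ALERT_STATUSES:
            flagged.append((name, reading, units, status))
    return flagged


if __name__ == "__main__":
    # Hypothetical sample lines in the assumed layout.
    sample = [
        "/SYS/T_AMB        | 24.000 | degrees C | ok | na | na | na | 40.000 | 45.000 | na",
        "/SYS/MB/P0/T_CORE | 92.000 | degrees C | cr | na | na | na | 85.000 | 90.000 | na",
    ]
    for name, reading, units, status in sensors_needing_attention(sample):
        print(f"{name}: {reading} {units} ({status})")
```

Matching only the threshold codes is a deliberate choice: discrete sensors typically report a hex state in the status column, which this sketch does not try to interpret.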

Power-On Self-Test

The system BIOS provides a basic power-on self-test (POST), during which the BIOS checks the basic devices required for the server to operate. Using BIOS POST testing is covered in BIOS Power-On Self-Test (POST) Codes.


Commonly Used Terms

The following table identifies terms commonly used in this document.


TABLE 2-2 Commonly Used Terms

Term
    Definition

Server blade, blade
    The X6275 Blade Server Module, which is the physical blade that plugs into a Sun modular system chassis. The X6275 blade server contains two independent x64 nodes.

Node
    Either of the two x64 computers resident on the server blade.

Host, Server, Server node
    Other names used for nodes.

Service Processor (SP)
    The SP is a “baseboard management controller” (BMC). Each of the two server nodes on the server blade has its own dedicated SP.

ILOM
    The Integrated Lights Out Manager software that runs on the SP.

Chassis Management Module (CMM)
    A baseboard management controller for the entire Sun modular system chassis.

CMM ILOM
    The Integrated Lights Out Manager software that runs on the CMM.