CHAPTER 3

Diagnostics Tools

This chapter contains information about diagnostic tools that you can use to determine the status of the Sun Blade X6240 server module and components.

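This chapter contains the following topics:

  • Service Processor ILOM
  • System Status LEDs
  • BIOS POST
  • Hardware Debug Tool (HDT)
  • Pc-Check Diagnostics Overview
  • Pc-Check Menus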


3.1 Service Processor ILOM

The following component information is available through the service processor (SP) Integrated Lights Out Manager (ILOM).

See the Sun Integrated Lights Out Manager 2.0 User’s Guide (820-1188) for more information.


3.2 System Status LEDs

The Sun Blade X6240 server module has external and internal system status LEDs.

3.2.1 External Status Indicator LEDs

FIGURE 3-1 shows the locations of the external status indicator LEDs.

FIGURE 3-1 External LED Locations


Figure showing external LED locations on the server front panel

Refer to TABLE 3-1 for descriptions of the LED behavior.


TABLE 3-1 Front Panel LED Functions

LED Name

Description

Locate button/LED

This LED helps you identify the system you are working on in a rack full of servers.

  • Push and release this button to make the Locate LED blink for 30 minutes.
  • Hold down the button for 5 seconds to initiate a “push-to-test” mode that illuminates all other LEDs both inside and outside the chassis for 15 seconds.

Ready-to-Remove LED

The server module is ready to be removed from the chassis. This LED is switched on by the service processor when the server module main power is off.

Service Action Required LED

This LED has three states:

  • Off: Normal operation.
  • Slow Blinking: A new (unacknowledged) event requiring a service action has been detected.
  • On: The event has been acknowledged, but the problem still requires attention.

Power LED

This LED has four states:

  • Off: Server main power and standby power are off.
  • Standby Blinking: Standby power is on, but main power is off.
  • Slow Blinking: POST or diagnostics are running.
  • On: Server is in main power mode with power supplied to all components.

Hard Disk Drive Status LEDs

The hard disk drives have three LEDs, listed below in the order in which they appear when the server module is installed in the chassis:

  • Power: Fast blink means normal disk activity, slow blink means RAID activity, and off means power is off or no disk activity.
  • Service Action Required: System has detected a hard disk fault. This LED is controlled by the service processor.
  • Ready-to-Remove: On indicates that the disk is ready to be removed (hot-swapped). Off indicates that the disk is operating normally.
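
The LED states in TABLE 3-1 can be kept as a small lookup table, for example in a script that turns an observed state into the documented meaning. The following is a minimal sketch: the state descriptions are copied from the table, while the dictionary names and the describe() helper are illustrative and not part of any Sun tool.

```python
# Meanings copied from TABLE 3-1. The layout and names are illustrative only.
POWER_LED = {
    "off": "Server main power and standby power are off.",
    "standby blinking": "Standby power is on, but main power is off.",
    "slow blinking": "POST or diagnostics are running.",
    "on": "Server is in main power mode with power supplied to all components.",
}

SERVICE_ACTION_LED = {
    "off": "Normal operation.",
    "slow blinking": "A new (unacknowledged) event requiring a service action has been detected.",
    "on": "The event has been acknowledged, but the problem still requires attention.",
}


def describe(led_table, observed_state):
    """Return the documented meaning of an observed LED state."""
    key = observed_state.strip().lower()
    if key not in led_table:
        raise ValueError(f"undocumented LED state: {observed_state!r}")
    return led_table[key]


print(describe(POWER_LED, "Standby Blinking"))
```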

3.2.2 Internal Status Indicator LEDs

These servers have internal status indicator LEDs for the DIMM slots and the CPUs.

When the board is removed from the chassis, you can press a fault indicator button to view the location of the DIMM or CPU that has failed.

FIGURE 3-2 Fault Indicator Button


Figure showing fault indicator button and internal LEDs

See TABLE 3-2 for internal LED behavior.

 


TABLE 3-2 Internal LED Functions

LED Name

Description

DIMM Fault LED

(The ejector levers on the DIMM slots are the LEDs.)

This LED has two states:

  • Off: DIMM is operating properly.
  • Lit (amber): The system has detected a fault with the DIMM.

CPU Fault LED

(on motherboard)

This LED has two states:

  • Off: CPU is operating properly.
  • Lit (amber): The system has detected a fault with the CPU.


3.3 BIOS POST

The system BIOS provides a rudimentary power-on self-test (POST). The basic devices required for the server to operate are checked, memory is tested, the attached disks are probed and enumerated, and the two dual-gigabit Ethernet controllers are initialized.

The progress of the self-test is indicated by a series of POST codes. Refer to Appendix B for information on BIOS POST codes.

These codes are displayed at the bottom right corner of the system’s VGA screen (once the self-test has progressed far enough to initialize the video monitor). However, the codes are displayed as the self-test runs, and they scroll off the screen too quickly to be read. An alternate method of displaying the POST codes is to redirect the output of the console to a serial port (see Redirecting Console Output).

The message BMC Responding is displayed at the end of the POST.

3.3.1 How BIOS POST Memory Testing Works

The BIOS POST memory testing is performed as follows:

1. The first megabyte of DRAM is tested by the BIOS before the BIOS code is shadowed (that is, copied from ROM to DRAM).

2. Once executing out of DRAM, the BIOS performs a simple memory test (a write/read of every location with the pattern 55aa55aa).

3. The BIOS polls the memory controllers for both correctable and uncorrectable memory errors and logs those errors into the service processor.
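
The write/read pass in step 2 can be illustrated in a few lines. This is only a sketch of the general technique on simulated memory, not the BIOS implementation: the FaultyMemory class, the 32-bit word size, and the stuck-bit fault model are assumptions made for the example.

```python
PATTERN = 0x55AA55AA  # the pattern named in step 2


class FaultyMemory:
    """Simulated word-addressable memory (hypothetical helper).

    stuck_low maps a word index to a mask of bits that always read as 0.
    """

    def __init__(self, size, stuck_low=None):
        self._words = [0] * size
        self._stuck_low = stuck_low or {}

    def __len__(self):
        return len(self._words)

    def __getitem__(self, index):
        return self._words[index]

    def __setitem__(self, index, value):
        mask = self._stuck_low.get(index, 0)
        self._words[index] = value & ~mask & 0xFFFFFFFF


def pattern_test(memory):
    """Write PATTERN to every word, read it back, and return failing indexes."""
    failures = []
    for index in range(len(memory)):
        memory[index] = PATTERN
        if memory[index] != PATTERN:
            failures.append(index)
    return failures


# Word 3 has bit 1 stuck at 0; the pattern expects a 1 there, so it is caught.
print(pattern_test(FaultyMemory(8, stuck_low={3: 0x2})))
```

A single fixed pattern cannot detect a bit that is stuck at the value the pattern already holds in that position, which is one reason a test of this kind is described as simple.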

3.3.2 Redirecting Console Output

Use these instructions to access the service processor and redirect the console output so that the BIOS POST codes can be read.

1. Connect a dongle cable to the server module universal connector port (UCP). See FIGURE 1-2.

2. Connect a monitor to the dongle cable video port and a keyboard to a USB port.

3. Power cycle or power on the server.

4. Initialize the BIOS Setup Utility by pressing the F2 key while the system is performing the power-on self-test (POST).

5. When the BIOS Main Menu screen is displayed, select Advanced.

6. When the Advanced Settings screen is displayed, select IPMI 2.0 Configuration.

7. When the IPMI 2.0 Configuration screen is displayed, select the LAN Configuration menu item.

8. Select the IP Address menu item.

The service processor’s IP address is displayed in the following format:
Current IP address in BMC: xxx.xxx.xxx.xxx

Copy the service processor IP address. You will need to insert it into a web browser in the next step.

9. Start a web browser and type the service processor’s IP address in the browser’s URL field.

10. When you are prompted, type a user name and password as follows:

User name: root
Password: changeme

11. When the ILOM service processor web GUI screen is displayed, click the Remote Control tab.

12. Click the Redirection tab.

13. Set the color depth for the redirection console to either 8 or 16 bits.

14. Click the Start Redirection button.

The Remote Console window appears and prompts you for your user name and password again.

15. When you are prompted, type a user name and password as follows:

User name: root
Password: changeme

The current POST screen is displayed.
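
If you transcribe the address line from step 8 into a script, you can check it before using it in step 9. The sketch below parses the "Current IP address in BMC:" line shown above and builds a browser URL. The function names are illustrative, the https scheme is an assumption (use whichever scheme your service processor is configured for), and the sample address is hypothetical.

```python
import ipaddress
import re

# Matches the line format shown in step 8.
LINE_RE = re.compile(r"Current IP address in BMC:\s*(\S+)")


def sp_address(screen_text):
    """Extract and validate the service processor IPv4 address."""
    match = LINE_RE.search(screen_text)
    if match is None:
        raise ValueError("no 'Current IP address in BMC' line found")
    # IPv4Address raises a ValueError subclass if the text is malformed.
    return ipaddress.IPv4Address(match.group(1))


def sp_url(screen_text, scheme="https"):
    """Build the URL to type into the browser (scheme is an assumption)."""
    return f"{scheme}://{sp_address(screen_text)}/"


# Hypothetical address, for illustration only.
print(sp_url("Current IP address in BMC: 192.168.1.20"))
```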


3.4 Hardware Debug Tool (HDT)

The hardware debug tool (HDT) is a diagnostic tool that allows access to all memory spaces and CPU registers of the system.

3.4.1 HDT Functionality

Available functionality includes:

HDT can be used to:



Note - HDT diagnostics will stop or reset and power cycle the system. Do not use HDT while the operating system is running.


3.4.2 Accessing HDT

You can access HDT through the server module service processor (SP) as follows:

  • Log in to the server module SP with the following login:

Username: sunservice

Password: changeme

3.4.3 HDT Commands

This command does the following:



Note - On a nonresponsive system, run this command before the system is reset or power cycled.


Where logfile_name is the path and file location.


3.5 Pc-Check Diagnostics Overview

Pc-Check diagnostics can test and detect problems on all motherboard components, drives, ports, and slots. This program can be accessed and executed only from Integrated Lights Out Manager (ILOM). If you are having a problem with your system, use the diagnostics to troubleshoot and solve the problem.

Normally, if you encounter any hardware-related error message (such as memory errors or hard disk errors) on your server, you will run one of the following selections from the Pc-Check Diagnostics main menu:

Other selections on the Diagnostics main menu display information about the system, create disk partitions, and display test results.

3.5.1 Accessing Pc-Check Diagnostics

1. Shut down the server.

For instructions, see Powering Off the Server.

2. Start the ILOM and access the web GUI.

See the Sun Integrated Lights Out Manager 2.0 User’s Guide (820-1188) for details.

3. Select Remote Control => Diagnostics => Run Diagnostics on Boot.

4. From the drop-down menu, select Boot to Manual.

5. Power cycle the platform.

The system boots to the Pc-Check main menu, which offers the following selections:

Use the arrow keys on the keyboard to navigate through the diagnostics software, the Enter key to select a menu item, and the ESC key to exit a menu. Navigation instructions appear at the bottom of each screen.

To test a specific hardware component, select “Advanced Diagnostics Test.” See Advanced Diagnostics for details.

To run a test script, select “Immediate Burn-In Testing.” Sun provides three scripts that include a full test of all possible devices (full.tst), a quick test of devices (quick.tst), and a test that requires no user interaction (noinput.tst). See Performing Immediate Burn-In Testing for details.

To create your own test script, select “Deferred Burn-In Testing.” See Performing Deferred Burn-In Testing for details.


3.6 Pc-Check Menus

The following sections in this chapter describe the menu items and tests in detail.

3.6.1 System Information Menu

Clicking System Information in the Diagnostics main menu causes the System Information menu to appear. Select items in this menu to see detailed information.

TABLE 3-3 describes the selections in the System Information menu.


TABLE 3-3 System Information Menu Options

Option

Description

System Information Menu

Includes basic information about your system, motherboard, BIOS, processor, memory cache, drives, video, modem, network, buses, and ports.

Hardware ID Image Menu

Enables you to create an XML or .txt document showing your system’s hardware ID.

System Management Info

Provides information about the BIOS type, system, motherboard, enclosure, processors, memory modules, cache, slots, system event log, memory array, memory devices, memory device mapped addresses, and system boot.

PCI Bus Info

Includes details about specific devices from the PCI configuration space within the system, similar to the System Management Information section.

IDE Bus Info

Displays information about the IDE bus.

Interrupt Vectors

Displays a list of interrupt vectors.

IRQ Information

Shows hardware interrupt assignments.

Device Drivers

Shows device drivers loaded under OpenDOS.

APM Information

Enables you to test and configure the Advanced Power Management (APM) capabilities of the system. You can choose to change the power state, view the power status, indicate CPU usage, get a power management event, or change the interface mode.

I/O Port Browser

Shows the I/O port assignment for the hardware devices on the system.

Memory Browser

Enables you to view the mapped memory for the entire system.

Sector Browser

Reads sector information from the hard disks sector by sector.

CPU Frequency Monitor

Tests the processor speed.

CMOS RAM Utilities

Shows the CMOS settings of the system.

Text File Editor

Opens a file editor.

Start-Up Options

Enables you to set up startup options for diagnostics testing.


3.6.2 Advanced Diagnostics

Advanced diagnostics are used to test an individual device on the system. Most of the selections on this menu display information about the corresponding devices, and then offer a menu of testing options. For example, to test CPU 0, you can select Advanced Diagnostics => Processor => CPU0.



Note - If you do not know which device to test, see Burn-In Testing.


TABLE 3-4 gives the name and a brief description of the selections in the Advanced Diagnostics Tests menu.



Note - Some of the tests in TABLE 3-4 might be irrelevant for certain systems. Ignore any that are not relevant to your hardware configuration.



TABLE 3-4 Advanced Diagnostics Test Menu Options

Option

Description

Processor

Displays information about the processors and includes a Processor Tests menu.

Memory

Displays information about the memory, and includes tests for the different types of system memory.

Motherboard

Displays information about the motherboard, and includes a Motherboard Tests menu.

Floppy Disks

Not relevant.

Hard Disks

Displays information about the hard disk and includes a Hard Disk Tests menu.

Refer to Hard Disk Testing for detailed information about scripts and about testing hard disks.

CD-ROM/DVD

Displays a CD-ROM/DVD menu to test DVD devices on the system.

ATAPI Devices

Displays information about devices attached to the IDE controllers on the system other than a DVD or hard disks (for example, Zip drives).

ATA

Includes an ATA test menu. Select the parallel ATA driver to test.

USB

Displays information about the USB devices on the system and includes a USB Tests menu.

Network

Performs network register controller tests.

System Stress Test

Exercises and checks the CPU, memory, and hard drive.

Keyboard

Includes a Keyboard Test menu with options for performing different tests on the keyboard.

Mouse

Displays information about the mouse and includes a menu to test the mouse on the system.

Audio

Displays information about the audio devices on the system and includes an Audio Tests menu to test audio device information. A PCI audio card is required to run this test.

Video

Displays information about the video card. The monitor might flicker initially, after which a Video Test Options menu appears that enables you to perform various video tests.

Firmware - ACPI

Displays information about Advanced Configuration and Power Interface (ACPI) and includes an ACPI Tests menu.


3.6.3 Hard Disk Testing

Use these tests to select and test a hard drive. Before starting the test, you can set the parameters using the Test Settings option.

3.6.3.1 To Select and Test a Hard Drive

1. From the main menu, choose Advanced Diagnostics Tests.

2. From the Advanced Diagnostics Tests menu, choose Hard Disks.

3. From the Select Drive menu, choose the hard disk that you need to test.

The Hard Disk Diagnostics dialog opens. It displays information about the selected hard drive and the Hard Disk Tests menu, which includes the following options:

4. Click Select Drive to select a hard drive to test.

5. Click Test Settings, if desired, to select options for that test.

This enables you to change the following parameters:

Selects the number of times to retry testing a device before terminating the test.

Selects the number of errors allowed before terminating the test.

Selects Self-Monitoring, Analysis and Reporting Technology (SMART) testing.

Selects Host Protected Area (HPA) protection.

Selects the test time duration, the percentage of the hard disk to test, and the sectors to be tested on the hard disk.

Selects the test time durations of the devices and the test level.

6. Select a test to begin execution.

The Read Test, the Read Verify Test, the Non-Destructive Write Test, and the Destructive Write Test check the actual media on the physical disk drive.

The Mechanics Stress Test and the Internal Cache Test test non-media-related parts of the hard drive hardware.



Caution - Running the Destructive Write Test destroys any data on the disk.
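
The grouping of the hard disk tests described above can be written down as data, which makes it easy to filter out the destructive test before building a test plan. This is a minimal sketch: the test names and the media/non-media split come from this section, only the Destructive Write Test is flagged as destroying data (per the Caution), and the dictionary layout and safe_tests() helper are illustrative.

```python
# targets_media: True for tests of the physical media, False for tests of
# other drive hardware. destroys_data is set only where this section says so.
HARD_DISK_TESTS = {
    "Read Test": {"targets_media": True, "destroys_data": False},
    "Read Verify Test": {"targets_media": True, "destroys_data": False},
    "Non-Destructive Write Test": {"targets_media": True, "destroys_data": False},
    "Destructive Write Test": {"targets_media": True, "destroys_data": True},
    "Mechanics Stress Test": {"targets_media": False, "destroys_data": False},
    "Internal Cache Test": {"targets_media": False, "destroys_data": False},
}


def safe_tests(media_only=False):
    """Return test names that do not destroy data, optionally media tests only."""
    return [
        name
        for name, info in HARD_DISK_TESTS.items()
        if not info["destroys_data"] and (info["targets_media"] or not media_only)
    ]


print(safe_tests())
print(safe_tests(media_only=True))
```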


3.6.4 Burn-In Testing

Burn-In testing enables you to run test scripts and to create new scripts.

The Diagnostics main menu provides two burn-in selections, Immediate Burn-In Testing and Deferred Burn-In Testing.

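Sun provides three ready-made scripts designed to test the general health of the devices on your system. These scripts include:

  • quick.tst: A quick test of devices.
  • noinput.tst: A test that requires no user interaction.
  • full.tst: A full test of all possible devices.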



Tip - Each of these scripts tests the operating status of your entire system. To test specific disk drives independently of the rest of the system, use the procedures in Hard Disk Testing.


3.6.4.1 Performing Immediate Burn-In Testing

Use Immediate Burn-In Testing to run test scripts.

1. From the Diagnostics main menu, select Immediate Burn-In Testing.

The screen displays a list of settings shown in TABLE 3-5 and a Burn-In menu.

2. From the Burn-In menu, select Load Burn-In Script.

A text box appears.

3. Type the name of the script you want to run.

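Enter quick.tst, noinput.tst, or full.tst to run one of the scripts that Sun provides. To run a script that you created, enter d:\testname.tst, where testname is the name of the script.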

4. To change any of the options, at the bottom of the screen, select Change Options.

This opens the Burn-In Options menu, which enables you to modify the options listed in TABLE 3-5 for the currently loaded test script.

5. Select Perform Burn-In Tests.

The diagnostics software executes the test script as configured.

 


TABLE 3-5 Continuous Burn-In Testing Options

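Option: Pass Control

  • Default - General: Overall Time
  • Default Using quick.tst, noinput.tst, or full.tst Script: Overall Passes
  • All Possible Choices: Individual Passes, Overall Passes, or Overall Time

Option: Duration

  • Default - General: 01:00
  • Default Using quick.tst, noinput.tst, or full.tst Script: 1
  • All Possible Choices: Enter any number to choose the time duration of the test

Option: Script File

  • Default - General: N/A
  • Default Using quick.tst, noinput.tst, or full.tst Script: quick.tst, noinput.tst, or full.tst
  • All Possible Choices: quick.tst, noinput.tst, or full.tst

Option: Report File

  • Default - General: None
  • Default Using quick.tst, noinput.tst, or full.tst Script: None
  • All Possible Choices: User defined

Option: Journal File

  • Default - General: None
  • Default Using quick.tst, noinput.tst, or full.tst Script: D:\noinput.jrl, D:\quick.jrl, or D:\full.jrl
  • All Possible Choices: User defined

Option: Journal Options

  • Default - General: Failed Tests
  • Default Using quick.tst, noinput.tst, or full.tst Script: All Tests, Absent Devices, and Test Summary
  • All Possible Choices: Failed Tests, All Tests, Absent Devices, and Test Summary

Option: Pause on Error

  • Default - General: N
  • Default Using quick.tst, noinput.tst, or full.tst Script: N
  • All Possible Choices: Y or N

Option: Screen Display

  • Default - General: Control Panel
  • Default Using quick.tst, noinput.tst, or full.tst Script: Control Panel
  • All Possible Choices: Control Panel or Running Tests

Option: POST Card

  • Default - General: N
  • Default Using quick.tst, noinput.tst, or full.tst Script: N
  • All Possible Choices: Y or N

Option: Beep Codes

  • Default - General: N
  • Default Using quick.tst, noinput.tst, or full.tst Script: N
  • All Possible Choices: Y or N

Option: Maximum Fails

  • Default - General: Disabled
  • Default Using quick.tst, noinput.tst, or full.tst Script: Disabled
  • All Possible Choices: 1-9999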
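
When you plan option changes before entering them in the Burn-In Options menu, it can help to check the values against the choices in TABLE 3-5. The following is a minimal sketch under stated assumptions: the defaults and allowed values are taken from the table, but the class, its field names, and the use of None to stand for "Disabled" are illustrative. Pc-Check itself is configured through its menus, not through this code.

```python
from dataclasses import dataclass
from typing import List, Optional

PASS_CONTROL_CHOICES = ("Individual Passes", "Overall Passes", "Overall Time")
SCREEN_DISPLAY_CHOICES = ("Control Panel", "Running Tests")


@dataclass
class BurnInOptions:
    """General defaults from TABLE 3-5. Field names are illustrative."""

    pass_control: str = "Overall Time"
    duration: str = "01:00"
    pause_on_error: str = "N"
    screen_display: str = "Control Panel"
    post_card: str = "N"
    beep_codes: str = "N"
    maximum_fails: Optional[int] = None  # None stands for "Disabled"

    def problems(self) -> List[str]:
        """Return the settings that fall outside the documented choices."""
        found = []
        if self.pass_control not in PASS_CONTROL_CHOICES:
            found.append(f"Pass Control: {self.pass_control!r}")
        if self.screen_display not in SCREEN_DISPLAY_CHOICES:
            found.append(f"Screen Display: {self.screen_display!r}")
        yes_no = (
            ("Pause on Error", self.pause_on_error),
            ("POST Card", self.post_card),
            ("Beep Codes", self.beep_codes),
        )
        for label, value in yes_no:
            if value not in ("Y", "N"):
                found.append(f"{label}: {value!r}")
        if self.maximum_fails is not None and not 1 <= self.maximum_fails <= 9999:
            found.append(f"Maximum Fails: {self.maximum_fails!r}")
        return found


print(BurnInOptions().problems())
print(BurnInOptions(maximum_fails=0).problems())
```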


3.6.4.2 Performing Deferred Burn-In Testing

Use Deferred Burn-In Testing to create scripts.

1. From the Diagnostics main menu, select Deferred Burn-In Testing.

The screen displays a list of settings shown in TABLE 3-5 and a Burn-In menu.

2. Use the Burn-In menu to configure the following selections:

Opens the Burn-In Options menu, which enables you to modify the options listed in TABLE 3-5 for the currently loaded test script.

Opens a listing of the tests available for your system configuration and the currently loaded test script.

3. When you are done, select Save Burn-In Script and type the name for the new script.

Enter d:\testname.tst, where testname is the name of the script that you have created.

4. To run the newly created script, follow the procedure in Performing Immediate Burn-In Testing and run the script testname.tst.
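
The file naming used in this procedure and in TABLE 3-5 can be sketched as two small helpers. The d:\testname.tst form and the default journal names for the three Sun scripts come from this chapter. The function names and the restriction on characters in testname are assumptions made for the example.

```python
SUN_SCRIPTS = ("quick.tst", "noinput.tst", "full.tst")


def script_path(testname):
    """Return the d:\\testname.tst path for a user-created script."""
    # Rejecting separators, dots, and spaces is an assumption, not a
    # documented Pc-Check rule.
    if not testname or any(ch in testname for ch in "\\/:. "):
        raise ValueError(f"unsuitable script name: {testname!r}")
    return f"d:\\{testname}.tst"


def default_journal(script_name):
    """Return the default journal file listed in TABLE 3-5 for a Sun script."""
    if script_name not in SUN_SCRIPTS:
        raise ValueError("journal file is user defined for other scripts")
    return "D:\\" + script_name[: -len(".tst")] + ".jrl"


print(script_path("nightly"))        # "nightly" is a hypothetical script name
print(default_journal("quick.tst"))
```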

3.6.5 Diagnostic Partition

A diagnostic partition is required for the test scripts to write their log files. Without a diagnostic partition, the only output is the display on the diagnostic screens.

The diagnostic partition is preinstalled on the server module. You do not need to reinstall the diagnostic partition unless you have removed it.

To change partitions, see the instructions for your operating system.

If you have RAID, you can use the instructions in the Sun StorageTek RAID Manager Software User's Guide (820-1177) and the Uniform Command-Line Interface User's Guide (820-2145).

3.6.6 Show Results Summary

Selecting Show Results Summary on the Diagnostics main menu displays the tests that have been run and lists the results, which can be Pass, Fail, or N/A.

The following list describes all the tests that are available with the Tools and Drivers DVD. If your system does not have the corresponding option, the results will show as N/A in the Show Results Summary list.

This section shows the following tests conducted against the processor: Core Processor Tests, AMD 64-Bit Core Tests, Math Co-Processor Tests - Pentium Class FDIV and Pentium Class FIST, MMX Operation, 3DNow! Operation, SSE Instruction Set, SSE2 Instruction Set, and MP Symmetry.

This section shows the following tests conducted against the motherboard: DMA Controller Tests, System Timer Tests, Interrupt Test, Keyboard Controller Tests, PCI Bus Tests, and CMOS RAM/Clock Tests.

This section shows the following tests conducted against the various types of memory: Inversion Test Tree, Progressive Inv. Test, Chaotic Addressing Test, and Block Rotation Test.

This section shows the following tests conducted against the input device: Verify Device, Keyboard Repeat, and Keyboard LEDs.

This section shows the following tests conducted against the mouse: Buttons, Ballistics, Text Mode Positioning, Text Mode Area Redefine, Graphics Mode Positions, Graphics Area Redefine, and Graphics Cursor Redefine.

This section shows the following tests conducted against the video: Color Purity Test, True Color Test, Alignment Test, LCD Test, and Test Cord Test.

This section shows the following tests conducted against ATAPI devices: Linear Read Test, Non-Destructive Write, and Random Read/Write Test.

This section shows the following tests conducted against the hard disk: Read Test, Read Verify Test, Non-Destructive Write Test, Destructive Write Test, Mechanics Stress Test, and Internal Cache Test.

This section shows the following tests conducted against the USB: Controller Tests and Functional Tests.

The compare test is used to determine the machine ID for the system. This test is not available for the Sun Blade X6240 server module.
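
If you copy the Show Results Summary list into a script, the three possible results (Pass, Fail, and N/A) can be tallied as shown in the sketch below. The summarize() helper and the sample outcomes are hypothetical; only the result values and the test names come from this section.

```python
from collections import Counter

OUTCOMES = ("Pass", "Fail", "N/A")


def summarize(results):
    """Count each outcome and collect the names of the failed tests."""
    counts = Counter({outcome: 0 for outcome in OUTCOMES})
    failed = []
    for test_name, outcome in results:
        if outcome not in OUTCOMES:
            raise ValueError(f"unexpected result for {test_name}: {outcome!r}")
        counts[outcome] += 1
        if outcome == "Fail":
            failed.append(test_name)
    return dict(counts), failed


# Sample outcomes invented for illustration.
sample = [
    ("Read Test", "Pass"),
    ("Internal Cache Test", "Fail"),
    ("Keyboard LEDs", "N/A"),
    ("MMX Operation", "Pass"),
]
print(summarize(sample))
```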

3.6.7 Print Results Report

The Print Results Report option enables you to print results of the diagnosis of your system.

Ensure that your server is connected to a printer, and then enter the required information to print the results.

3.6.8 About Pc-Check

The About Pc-Check window includes general information about the Pc-Check software, including resident and nonresident components, such as mouse devices.

3.6.9 Exit

The Exit option exits the Pc-Check software and reboots the server module.