This chapter describes the diagnostic tools available for the system and provides an introduction to using these tools. The chapter also provides information about error indications and software commands to help you determine what component of the system you need to replace.
With the exception of internal disk drives in the Sun Enterprise 220R server, all component installation or replacement must be performed by a qualified service provider.
The system provides both firmware-based and software-based diagnostic tools to help you identify and isolate hardware problems. These tools include:
Power-on self-test (POST) diagnostics
OpenBoot Diagnostics (OBDiag)
SunVTS(TM) software
Sun Enterprise SyMON(TM) software
POST diagnostics verify the core functionality of the system, including the main logic board, system memory, and any on-board I/O devices. You can run POST even if the system is unable to boot. For more information about POST, see "7.2 About Power-On Self-Test (POST) Diagnostics".
OBDiag tests focus on system I/O and peripheral devices. Like POST, you can run OBDiag even if the system is unable to boot. For more information about OBDiag, see "7.5 About OpenBoot Diagnostics (OBDiag)".
The SunVTS system exerciser is a graphics-oriented UNIX application that permits the continuous exercising of system resources and internal and external peripheral equipment. For more information about SunVTS, see "7.8 About SunVTS Software".
UNIX-based Sun Enterprise SyMON allows you to monitor the system hardware status and operating system performance of your server. For information about SyMON, see "7.11 About Sun Enterprise SyMON Software".
Which method or tool you use to diagnose system problems depends on the nature of those problems:
If your machine is not able to boot its operating system software, you need to run POST and OBDiag tests.
If your machine is "healthy" enough to start up and load its operating system software, you can use Sun Enterprise SyMON software and SunVTS software to diagnose system problems.
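The decision rule above can be sketched as a small lookup. This is a minimal illustration only; the function name `choose_diagnostic_tools` and its boolean parameter are assumptions made for this example and are not part of any Sun software.

```python
def choose_diagnostic_tools(can_boot_os):
    """Return the diagnostic tools suggested for the system's state.

    can_boot_os is a hypothetical flag: True if the machine can start up
    and load its operating system software.
    """
    if can_boot_os:
        # OS-level tools need a running operating system.
        return ["Sun Enterprise SyMON", "SunVTS"]
    # Firmware-based tools run even when the system cannot boot.
    return ["POST", "OBDiag"]


print(choose_diagnostic_tools(False))
```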
The following chart provides an overview of when to use the various diagnostic tools to diagnose hardware problems.
The POST diagnostic code resides in flash PROM on the main logic board. The flash PROM that holds the POST code is known as the OpenBoot PROM (OBP) because it also holds the OpenBoot Diagnostic code.
POST tests the following system components each time the system is turned on or a system reset is issued:
CPU modules
Memory modules
NVRAM
Main logic board
POST reports its test results by flashing or steadily illuminating LEDs on the system's front panel. If a keyboard is installed, POST also displays test results on the keyboard LEDs. See "7.12.1 Error Indications" for more information about LEDs and error messages.
POST displays detailed diagnostic and error messages on a local terminal, if one is attached to the system's serial port A. For information about running POST, see "7.3 How to Use POST Diagnostics".
When you turn on the system power, POST diagnostics run automatically if any of the following conditions apply:
The OpenBoot PROM variable diag-switch? is set to true when you power on the system.
You hold down the keyboard's Stop and D keys as you power on the system.
You can view POST diagnostic and error messages locally on an attached terminal.
To view POST diagnostic and error messages on the local system, you need to connect an alphanumeric terminal or establish a tip connection to another Sun system. For more information, see "2.10 About Communicating With the Server" or, if you already have a console set up, see "7.4 How to Set Up a tip Connection".
You must verify baud rates between a system and a monitor or a system and a terminal. See "7.4.1 How to Verify the Baud Rate".
You can choose to run an abbreviated POST with concise error and status reporting or run an extensive POST with more detailed messages. For more information, see "7.7 How to Set the Diagnostic Level for POST and OBDiag".
If a console or a monitor is not connected to serial port A (default port) of a system to be tested, the keyboard LEDs are used to determine error conditions. See "7.12.1 Error Indications".
Ensure that the front panel keyswitch is in the Standby position.
You can initialize POST one of two ways:
By setting the diag-switch? to true and the diag-level to max or min, followed by power cycling the system unit
By simultaneously pressing the keyboard Stop and D keys while power is applied to the system unit
To set the diag-switch? to true and power cycle the system unit:
When the ok prompt is displayed, type the following command:
ok setenv diag-switch? true |
At the Type-5 keyboard, power cycle the system by simultaneously pressing the Shift key and the Power-on key. After a few seconds, press the Power-on key again, or press the Power button on the system once.
The keyswitch must be set to the Power-On/Off position.
The system runs the POST diagnostics. POST displays status and error messages on the system console. For more information, see the "Results" section below.
Upon successful completion of POST, the system runs OBDiag. For more information about OBDiag, see "7.5 About OpenBoot Diagnostics (OBDiag)" and "7.6 How to Use OpenBoot Diagnostics (OBDiag)".
While POST is running, you can observe its progress and any error indications in the following locations:
System console or through a tip connection
Front panel fault LEDs
Keyboard LEDs (if a keyboard is present)
As POST runs, it displays detailed diagnostic status messages on the system console. If POST detects an error, it displays an error message on the system console that indicates the failing part. A sample error message is provided below:
Power On Self Test Failed. Cause: DIMM U0702 or System Board
ok
POST status and error conditions are indicated by the general fault LED on the system front panel. The LED flashes slowly to indicate that POST is running. It remains lit if POST detects a fault.
If a Sun Type-5 keyboard is attached, POST status and error indications are also displayed via the four LEDs on the keyboard. When POST starts, all four keyboard LEDs flash on and off simultaneously. After that, the Caps Lock LED flashes slowly to indicate POST is running. If an error is detected, the pattern of the lit LEDs provides an error indication. See "7.12.1 Error Indications" for more information.
If POST detects an error condition that prevents the system from booting, it halts operation and displays the ok prompt. The last message displayed by POST prior to the ok prompt indicates which part you need to replace.
A tip connection enables you to use a remote shell window as a terminal to display test data from a system. You can use either serial port A or serial port B of the system being tested to establish the tip connection to another Sun system or a TTY-type terminal. The tip connection runs in a terminal window and provides features that help when you work with the OpenBoot PROM (OBP).
To set up a tip connection:
Connect serial port A of the system being tested to serial port B of another Sun system using a serial null modem cable (connect cable pins 2-3, 3-2, 7-20, and 20-7).
At the other Sun system, check the /etc/remote file by changing to the /etc directory and then editing the remote file:
hardwire:\
	:dv=/dev/term/b:br#9600:el=^C^S^Q^U^D:ie=%$:oe=^D:
The example shows connection to serial port B.
To use serial port A, change the dv setting in this entry from /dev/term/b to /dev/term/a.
In a shell window on the Sun system, type tip hardwire.
hostname% tip hardwire
connected
The shell window is now a tip window directed to the serial port of the system being tested. When power is applied to the system being tested, POST messages will be displayed in this window.
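To make the fields of the hardwire entry easier to check, the following sketch splits an /etc/remote style entry into its name and its capabilities. It assumes the conventional layout in which a trailing backslash continues the entry on the next line, fields are separated by colons, string values use `=`, and numeric values use `#`. The function name `parse_remote_entry` is hypothetical.

```python
def parse_remote_entry(entry):
    """Split a remote-file entry into its names and capabilities."""
    # Join continuation lines: drop surrounding whitespace and the
    # trailing backslash from each physical line.
    joined = "".join(line.strip().rstrip("\\") for line in entry.splitlines())
    fields = [field.strip() for field in joined.split(":") if field.strip()]
    names = fields[0].split("|")
    capabilities = {}
    for field in fields[1:]:
        if "=" in field:
            # String capability, for example dv=/dev/term/b
            key, value = field.split("=", 1)
            capabilities[key] = value
        elif "#" in field:
            # Numeric capability, for example br#9600
            key, value = field.split("#", 1)
            capabilities[key] = int(value)
        else:
            # Boolean capability with no value
            capabilities[field] = True
    return names, capabilities


ENTRY = "hardwire:\\\n\t:dv=/dev/term/b:br#9600:el=^C^S^Q^U^D:ie=%$:oe=^D:"
names, caps = parse_remote_entry(ENTRY)
print(names, caps["dv"], caps["br"])
```

Here `dv` names the serial device that tip opens and `br` is the baud rate, so these are the two fields to check when the connection does not come up.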
When POST is completed, disconnect the tip window as follows:
To verify the baud rate between the system being tested and a terminal or another Sun system monitor:
Open a shell window.
Type eeprom.
Verify the following serial port default settings:
ttyb-mode = 9600,8,n,1
ttya-mode = 9600,8,n,1
Ensure that the settings are consistent with TTY-type terminal or system monitor settings.
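The comparison in the last step can be sketched as follows. The sketch parses a setting line in the form shown above (baud rate, data bits, parity, stop bits) and compares it with the terminal's settings; any additional trailing fields are ignored. The function names and the `TERMINAL` dictionary are assumptions made for this example.

```python
def parse_tty_mode(line):
    """Parse a line such as 'ttya-mode = 9600,8,n,1' into (name, settings)."""
    name, sep, value = line.partition("=")
    if not sep:
        raise ValueError(f"not a name=value setting: {line!r}")
    fields = [part.strip() for part in value.split(",")]
    if len(fields) < 4:
        raise ValueError(f"expected at least four fields: {line!r}")
    baud, data_bits, parity, stop_bits = fields[:4]
    settings = {
        "baud": int(baud),
        "data_bits": int(data_bits),
        "parity": parity,
        "stop_bits": stop_bits,
    }
    return name.strip(), settings


def settings_match(line, terminal):
    """Return True if the serial port setting equals the terminal's settings."""
    _, settings = parse_tty_mode(line)
    return settings == terminal


# Hypothetical terminal configuration to compare against.
TERMINAL = {"baud": 9600, "data_bits": 8, "parity": "n", "stop_bits": "1"}
print(settings_match("ttya-mode = 9600,8,n,1", TERMINAL))
```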
The OpenBoot Diagnostics (OBDiag) utility resides in flash PROM on the main logic board. OBDiag can isolate errors in the following system components:
Main logic board
Diskette drive (if applicable)
CD-ROM drive
Tape drive
Disk drives
Any option card that contains an on-board self-test
On the main logic board, OBDiag tests not only the main logic board but also its interfaces:
PCI
SCSI
TPE Ethernet including MII Ethernet
Serial
Parallel
Keyboard/mouse
OBDiag reports some test results by flashing or steadily illuminating the LEDs on the system front panel. See "7.12.1 Error Indications" for more information about LEDs and error messages.
OBDiag also displays detailed diagnostic and error messages on a local console or terminal, if one is attached to the system.
OBDiag tests run automatically under certain conditions. You can also run OBDiag interactively from the system ok prompt. For information about running OBDiag, see "7.6 How to Use OpenBoot Diagnostics (OBDiag)".
When you run OBDiag interactively from the ok prompt, OBDiag displays a menu that lists all of the diagnostic tests that OBDiag can perform. For information about the OBDiag menu, see "7.5.1 OBDiag Menu".
The system also provides configuration variables that you can set to alter the operation of the OBDiag tests. For information about the configuration variables, see "7.5.2 Configuration Variable".
The OBDiag menu is created dynamically whenever you invoke OBDiag in interactive mode. OBDiag also determines whether any optional devices are installed in the system. If a device has an on-board self-test, OBDiag incorporates the test name into the list of menu entries. It displays the menu entries in alphabetical order and numbers them accordingly. Consequently, the number and position of menu items may vary from system to system, depending on the system configuration. For example, the Keyboard and Mouse test options are displayed only if your system includes a keyboard and mouse.
The OBDiag menu displays the core tests that exercise parts of the basic system. These tests can be seen in the sample OBDiag menu displayed below. For a description of each test, see "7.6.2 OBDiag Tests".
OBDiag Menu
0 ..... PCI/Cheerio
1 ..... EBUS DMA/TCR Registers
2 ..... Ethernet
3 ..... Keyboard
4 ..... Mouse
5 ..... Parallel Port
6 ..... Serial Port A
7 ..... Serial Port B
8 ..... NVRAM
9 ..... Audio
10 ..... SCSI
11 ..... All Above
12 ..... Quit
13 ..... Display this Menu
14 ..... Toggle script-debug
15 ..... Enable External Loopback Tests
16 ..... Disable External Loopback Tests
Enter (0-11 tests, 12 -Quit, 13 -Menu) ===>
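Because the menu numbering depends on the system configuration, a script that works from captured console text should look tests up by name rather than by a fixed number. The following sketch shows one way to do that; it assumes the menu has been captured as plain text with one entry per line, and the names `parse_obdiag_menu` and `SAMPLE_MENU` are hypothetical.

```python
import re

# A menu entry is a number, a run of dots, and the test name.
MENU_LINE = re.compile(r"^\s*(\d+)\s*\.{2,}\s*(.+?)\s*$")


def parse_obdiag_menu(text):
    """Map each menu entry name to the number shown for it."""
    menu = {}
    for line in text.splitlines():
        match = MENU_LINE.match(line)
        if match:
            menu[match.group(2)] = int(match.group(1))
    return menu


# Abbreviated sample of captured menu text (assumed layout).
SAMPLE_MENU = """\
OBDiag Menu
0 ..... PCI/Cheerio
1 ..... EBUS DMA/TCR Registers
2 ..... Ethernet
10 ..... SCSI
11 ..... All Above
12 ..... Quit
Enter (0-11 tests, 12 -Quit, 13 -Menu) ===>
"""

menu = parse_obdiag_menu(SAMPLE_MENU)
print(menu["SCSI"])
```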
The following table provides information about the OpenBoot PROM configuration variable, stored in NVRAM, that affects the operation of OBDiag. Use the printenv command to show current values and the setenv command to set or change a value. Both commands are described in "7.12.2 Software Commands".
Variable | Setting | Description | Default |
---|---|---|---|
diag-level | off | No tests are run at power up. |  |
diag-level | min | Performs minimal testing of core functionality. | min |
diag-level | max | Runs exhaustive tests for all functions except external loopbacks. External loopback tests are not available. |  |
When you turn on the system power, OBDiag runs automatically if any of the following conditions apply:
The diag-switch? OpenBoot PROM variable is set to true.
You hold down the keyboard's Stop and D keys as you power on the system. The system's ok prompt will appear.
In the event of an automatic system reset, POST diagnostics run under the following condition:
The OpenBoot PROM variable diag-switch? is set to true.
You can also run OBDiag in an interactive mode and select which tests you want to perform. The following procedure describes how to run OBDiag interactively from the system ok prompt.
Perform this procedure with the power on and the keyswitch in the Power-On/Off position.
With the keyswitch in the Power-On/Off position, press the Break key on your alphanumeric terminal keyboard, or enter the Stop-a sequence on a Sun keyboard.
To enter the Stop-a sequence, press the Stop key and the a key simultaneously. The ok prompt is displayed.
(Optional) Select a diagnostic level.
Three different levels of diagnostic testing are available for OBDiag; see "7.7 How to Set the Diagnostic Level for POST and OBDiag".
At the ok prompt, type:
ok setenv diag-switch? true
diag-switch? = true
At the ok prompt, type:
ok obdiag |
The OBDiag menu is displayed.
The OBDiag menu is built dynamically each time you run the obdiag command. The exact number and order of menu items in the example might not match the menu items on your system.
OBDiag Menu
0 ..... PCI/Cheerio
1 ..... EBUS DMA/TCR Registers
2 ..... Ethernet
3 ..... Keyboard
4 ..... Mouse
5 ..... Parallel Port
6 ..... Serial Port A
7 ..... Serial Port B
8 ..... NVRAM
9 ..... Audio
10 ..... SCSI
11 ..... All Above
12 ..... Quit
13 ..... Display this Menu
14 ..... Toggle script-debug
15 ..... Enable External Loopback Tests
16 ..... Disable External Loopback Tests
Enter (0-11 tests, 12 -Quit, 13 -Menu) ===>
At the OBDiag menu prompt, type 14 to select toggle script-debug.
Selecting toggle script-debug enables verbose test message displays.
At the Enter prompt, type the appropriate test number.
The OBDiag tests are described in the following sections:
The OBDiag Audio test is not available for this system.
The PCI/Cheerio test performs the following diagnostics.
Test | Function |
---|---|
vendor_ID_test | Verifies that the U2P ASIC vendor ID is 108e. |
device_ID_test | Verifies that the U2P ASIC device ID is 1000. |
mixmode_read | Verifies that the PCI configuration space is accessible as half-word bytes by reading the EBus2 vendor ID address. |
e2_class_test | Verifies the address class code. Address class codes include bridge device (0xB, 0x6), other bridge device (0xA and 0x80), and programmable interface (0x9 and 0x0). |
status_reg_walk1 | Performs walk-one test on status register with mask 0x280 (U2P ASIC is accepting fast back-to-back transactions, DEVSEL timing is 0x1). |
line_size_walk1 | Performs tests a through e. |
latency_walk1 | Performs walk-one test on latency timer. |
line_walk1 | Performs walk-one test on interrupt line. |
pin_test | Verifies that the interrupt pin is logic-level high (1) after reset. |
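Several of the tests above are walk-one tests. As a general illustration of the technique (not the OBDiag firmware code), the following sketch writes a single 1 bit into each position of a register in turn and checks that exactly that bit reads back. The `FakeRegister` class, the function name, and the optional mask parameter are assumptions made for this example.

```python
def walking_ones(write, read, width, mask=None):
    """Run a walk-one test and return a list of (bit, written, read_back) failures."""
    if mask is None:
        mask = (1 << width) - 1
    failures = []
    for bit in range(width):
        pattern = 1 << bit
        if not pattern & mask:
            # Skip bits that the mask excludes from the test.
            continue
        write(pattern)
        read_back = read() & mask
        if read_back != pattern:
            failures.append((bit, pattern, read_back))
    return failures


class FakeRegister:
    """Simulated register; bits set in stuck_low always read as 0."""

    def __init__(self, stuck_low=0):
        self.stuck_low = stuck_low
        self.value = 0

    def write(self, value):
        self.value = value & ~self.stuck_low

    def read(self):
        return self.value


good = FakeRegister()
bad = FakeRegister(stuck_low=0b1000)
print(walking_ones(good.write, good.read, 8))
print(walking_ones(bad.write, bad.read, 8))
```

A stuck or shorted bit shows up as a failure at its own position, which is why the test walks one bit at a time instead of writing a single all-ones pattern.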
The following example shows the PCI/Cheerio diagnostic output message.
Enter (0-11 tests, 12 -Quit, 13 -Menu) ===> 0
Test vendor_ID_test device_ID_test mixmode_read e2_class_test status_reg_walk1 line_size_walk1 latency_walk1 line_walk1 pin_test
SUBTEST='pin_test'
Enter (0-11 tests, 12 -Quit, 13 -Menu) ===>
The EBus DMA/TCR registers diagnostic performs the following tests.
Test | Function |
---|---|
DMA_reg_test | Performs a walking ones bit test for control status register, address register, and byte count register of each channel. Verifies that the control status register is set properly. |
DMA_func_test | Validates the DMA capabilities and FIFOs. Test is executed in a DMA diagnostic loopback mode. Initializes the data of transmitting memory with its address, performs a DMA read and write, and verifies that the data received is correct. Repeats for four channels. |
The following example shows the EBus DMA/TCR registers diagnostic output message.
Enter (0-11 tests, 12 -Quit, 13 -Menu) ===> 1
TEST='all_dma/ebus_test'
SUBTEST='dma_reg_test'
SUBTEST='dma_func_test'
Enter (0-11 tests, 12 -Quit, 13 -Menu) ===>
The Ethernet diagnostic performs the following tests.
Test | Function |
---|---|
my_channel_reset | Resets the Ethernet channel. |
hme_reg_test | Performs walk-one on the following registers set: global register 1, global register 2, bmac xif register, bmac tx register, and the mif register. |
MAC_internal_loopback_test | Performs Ethernet channel engine internal loopback. |
10_mb_xcvr_loopback_test | Enables the 10BASE-T data present at the transmit MII data inputs to be routed back to the receive MII data outputs. |
100_mb_phy_loopback_test | Enables MII transmit data to be routed to the MII receive data path. |
100_mb_twister_loopback_test | Forces the twisted-pair transceiver into loopback mode. |
The following example shows the Ethernet diagnostic output message.
Enter (0-11 tests, 12 -Quit, 13 -Menu) ===> 2
TEST='ethernet_test'
SUBTEST='my_channel_reset'
SUBTEST='hme_reg_test'
SUBTEST='global_reg1_test'
SUBTEST='global_reg2_test'
SUBTEST='bmac_xif_reg_test'
SUBTEST='bmac_tx_reg_test'
SUBTEST='mif_reg_test'
SUBTEST='mac_internal_loopback_test'
SUBTEST='10mb_xcvr_loopback_test'
SUBTEST='100mb_phy_loopback_test'
Enter (0-12 tests, 13 -Quit, 14 -Menu) ===>
The keyboard diagnostic consists of an external and an internal loopback. The external loopback requires a passive loopback connector. The internal loopback verifies the keyboard port by transmitting and receiving 128 characters.
The following example shows the keyboard diagnostic output message.
Enter (0-11 tests, 12 -Quit, 13 -Menu) ===> 3
TEST='keyboard_test'
SUBTEST='internal_loopback'
Enter (0-11 tests, 12 -Quit, 13 -Menu) ===>
The mouse diagnostic performs a keyboard-to-mouse loopback.
The following example shows the mouse diagnostic output message.
Enter (0-11 tests, 12 -Quit, 13 -Menu) ===> 4
TEST='mouse_test'
Enter (0-11 tests, 12 -Quit, 13 -Menu) ===>
The parallel port diagnostic performs the following tests.
Test | Function |
---|---|
sio_passive_lb | Sets up the Super I/O configuration register to enable extended/compatible parallel port select, then does a write 0, walk one, write 0xff to the data register. It verifies the results by reading the status register. |
dma_read | Enables ECP mode and ECP DMA configuration, and FIFO test mode. Transfers 16 bytes of data from memory to the parallel port device and then verifies the data is in FIFO device. |
The following example shows the parallel port diagnostic output message.
Enter (0-11 tests, 12 -Quit, 13 -Menu) ===> 5
TEST='parallel_port_test'
SUBTEST='dma_read'
Enter (0-11 tests, 12 -Quit, 13 -Menu) ===>
The serial port A diagnostic invokes the uart_loopback test, which transmits and receives 128 characters and checks the transaction validity. The following baud rates are tested in asynchronous mode: 460800, 307200, 230400, 153600, 76800, 57600, 38400, 19200, 9600, 4800, 2400, and 800.
The following example shows the serial port A output message when serial port A is being used for the tip connection.
Enter (0-11 tests, 12 -Quit, 13 -Menu) ===> 6
TEST='uarta_test'
`UART A in use as console - Test not run.'
Enter (0-11 tests, 12 -Quit, 13 -Menu) ===>
When serial port A is not in use as the console, the serial port A diagnostic output has the same form as the serial port B output shown in the next example, with uarta_test as the test name in place of uartb_test.
The serial port B diagnostic is identical to the serial port A diagnostic.
The following example shows the serial port B diagnostic output message.
Enter (0-11 tests, 12 -Quit, 13 -Menu) ===> 7
TEST='uartb_test'
BAUDRATE='1200'
BAUDRATE='1800'
BAUDRATE='2400'
BAUDRATE='4800'
BAUDRATE='9600'
BAUDRATE='19200'
BAUDRATE='38400'
BAUDRATE='57600'
BAUDRATE='76800'
BAUDRATE='115200'
BAUDRATE='153600'
BAUDRATE='230400'
BAUDRATE='307200'
BAUDRATE='460800'
Enter (0-11 tests, 12 -Quit, 13 -Menu) ===>
The NVRAM diagnostic verifies the NVRAM operation by performing a write and read to the NVRAM.
The following example shows the NVRAM diagnostic output message.
Enter (0-11 tests, 12 -Quit, 13 -Menu) ===> 8
TEST='nvram_test'
SUBTEST='write/read_patterns'
SUBTEST='write/read_inverted_patterns'
Enter (0-11 tests, 12 -Quit, 13 -Menu) ===>
The audio diagnostic is not included for this system.
The SCSI diagnostic validates both the SCSI chip and the SCSI bus subsystem.
The following example shows the SCSI diagnostic output message.
Enter (0-11 tests, 12 -Quit, 13 -Menu) ===> 10
TEST='selftest'
Enter (0-12 tests, 13 -Quit, 14 -Menu) ===>
The all above diagnostic validates the system.
The following example shows the all above diagnostic output message.
The all above diagnostic stalls if the tip line is installed on serial port A or serial port B.
Enter (0-11 tests, 12 -Quit, 13 -Menu) ===> 11
TEST='all_pci/cheerio_test'
SUBTEST='vendor_id_test'
SUBTEST='device_id_test'
SUBTEST='mixmode_read'
SUBTEST='e2_class_test'
SUBTEST='status_reg_walk1'
SUBTEST='line_size_walk1'
SUBTEST='latency_walk1'
SUBTEST='line_walk1'
SUBTEST='pin_test'
TEST='all_dma/ebus_test'
SUBTEST='dma_reg_test'
SUBTEST='dma_func_test'
TEST='ethernet_test'
SUBTEST='my_channel_reset'
SUBTEST='hme_reg_test'
SUBTEST='global_reg1_test'
SUBTEST='global_reg2_test'
SUBTEST='bmac_xif_reg_test'
SUBTEST='bmac_tx_reg_test'
SUBTEST='mif_reg_test'
SUBTEST='mac_internal_loopback_test'
SUBTEST='10mb_xcvr_loopback_test'
SUBTEST='100mb_phy_loopback_test'
TEST='keyboard_test'
SUBTEST='internal_loopback'
TEST='mouse_test'
SUBTEST='mouse_loopback'
###OBDIAG_MFG_START###
TEST='mouse_test'
STATUS='FAILED'
SUBTEST='mouse_loopback'
ERRORS='1 `
TTF='456 `
SPEED='450.04 MHz'
PASSES='1 `
MESSAGE='Error: Timeout receiving a character'
TEST='floppy_test'
SUBTEST='floppy_id0_read_test'
TEST='parallel_port_test'
SUBTEST='dma_read'
TEST='uarta_test'
`UART A in use as console - Test not run.'
TEST='uartb_test'
BAUDRATE='1200'
BAUDRATE='1800'
BAUDRATE='2400'
BAUDRATE='4800'
BAUDRATE='9600'
BAUDRATE='19200'
BAUDRATE='38400'
BAUDRATE='57600'
BAUDRATE='76800'
BAUDRATE='115200'
BAUDRATE='153600'
BAUDRATE='230400'
BAUDRATE='307200'
BAUDRATE='460800'
TEST='nvram_test'
SUBTEST='write/read_patterns'
SUBTEST='write/read_inverted_patterns'
TEST='audio_test'
SUBTEST='cs4231_test'
Codec_ID='8a'
Version_ID='a0'
SUBTEST='external_lpbk'
External Audio Test not run: Please set the mfg-mode to sys-ext.
###OBDIAG_MFG_START###
TEST='audio_test'
STATUS='FAILED'
SUBTEST='external_lpbk'
ERRORS='1 `
TTF='468 `
SPEED='450.04 MHz'
PASSES='1 `
MESSAGE='Error: internal_loopback TBD'
TEST='selftest'
Enter (0-12 tests, 13 -Quit, 14 -Menu) ===>
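In a long run such as the one above, the failures are easy to miss. The following sketch scans captured output for the failure records that follow each ###OBDIAG_MFG_START### marker and collects those whose STATUS is FAILED. It assumes the output has been saved as text with one field per line and that a record ends with its MESSAGE field, as in the sample; the function name and the abbreviated `SAMPLE_OUTPUT` are hypothetical.

```python
import re

MARKER = "###OBDIAG_MFG_START###"
# KEY='value' fields; the sample closes some values with a backquote.
FIELD = re.compile(r"^(\w+)='(.*?)\s*['`]$")


def failed_tests(output):
    """Return one dictionary per failure record found in OBDiag output."""
    failures = []
    record = None
    for raw in output.splitlines():
        line = raw.strip()
        if line == MARKER:
            record = {}  # start collecting a new record
            continue
        if record is None:
            continue  # ordinary progress line outside a record
        match = FIELD.match(line)
        if not match:
            record = None  # unexpected line: abandon this record
            continue
        record[match.group(1)] = match.group(2)
        if match.group(1) == "MESSAGE":
            if record.get("STATUS") == "FAILED":
                failures.append(record)
            record = None
    return failures


SAMPLE_OUTPUT = """\
TEST='mouse_test'
SUBTEST='mouse_loopback'
###OBDIAG_MFG_START###
TEST='mouse_test'
STATUS='FAILED'
SUBTEST='mouse_loopback'
ERRORS='1 `
TTF='456 `
SPEED='450.04 MHz'
PASSES='1 `
MESSAGE='Error: Timeout receiving a character'
TEST='floppy_test'
SUBTEST='floppy_id0_read_test'
"""

for failure in failed_tests(SAMPLE_OUTPUT):
    print(failure["TEST"], "-", failure["MESSAGE"])
```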
Three different levels of diagnostic testing are available for power-on self-test (POST) and OpenBoot Diagnostics (OBDiag): max (maximum level), min (minimum level), and off (no testing). The system runs the appropriate level of diagnostics based on the setting of the OpenBoot PROM variable diag-level.
The default setting for diag-level is min.
If your server is set up without a local console or terminal, you will need to set up a monitor, console, or terminal before setting the diagnostic level. See "2.10 About Communicating With the Server".
Perform this procedure with the power on and the keyswitch set to the Power-On/Off position.
With the keyswitch in the Power-On/Off position, press the Break key on your alphanumeric terminal's keyboard, or enter the Stop-a sequence on a Sun keyboard.
To enter the Stop-a sequence, press the Stop key and the a key simultaneously. The ok prompt is displayed.
To set the diag-level variable, type the following:
ok setenv diag-level value |
The value can be off, min, or max. See "7.5.2 Configuration Variable" for information about each setting.
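When the setenv command is generated by a script (for example, to send over a tip connection), it helps to reject invalid levels before anything reaches the ok prompt. The following is a minimal sketch under that assumption; the function name `diag_level_command` is hypothetical.

```python
VALID_DIAG_LEVELS = ("off", "min", "max")


def diag_level_command(value):
    """Return the ok-prompt command that sets diag-level to value."""
    if value not in VALID_DIAG_LEVELS:
        raise ValueError(
            f"diag-level must be one of {VALID_DIAG_LEVELS}, got {value!r}"
        )
    return f"setenv diag-level {value}"


print(diag_level_command("max"))
```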
SunVTS(TM), the Sun Validation and Test Suite, is an online diagnostics tool and system exerciser for verifying the configuration and functionality of hardware controllers, devices, and platforms. You can run SunVTS using any of these interfaces: a command-line interface, a TTY interface, or a graphical interface that runs within a windowed desktop environment.
SunVTS software lets you view and control a testing session over modem lines or over a network. Using a remote system, you can view the progress of a SunVTS testing session, as well as change testing options and control all testing features of another system on the network.
Useful tests to run on your system are listed below.
SunVTS Test | Description |
---|---|
ecpptest | Verifies the ECP1284 parallel port printer functionality |
cdtest | Tests the CD-ROM drive by reading the disc and verifying the CD table of contents (TOC), if it exists |
disktest | Verifies local disk drives |
fputest | Checks the floating-point unit |
fstest | Tests the integrity of the software's file systems |
m64test | Tests the PGX frame buffer card |
mptest | Verifies multiprocessor features (for systems with more than one processor) |
nettest | Checks all the hardware associated with networking (for example, Ethernet, token ring, quad Ethernet, fiber optic, 100-Mbit per second Ethernet devices) |
pmem | Tests the physical memory (read only) |
sptest | Tests the system's on-board serial ports |
tapetest | Tests the various Sun tape devices |
vmem | Tests the virtual memory (a combination of the swap partition and the physical memory) |
The following documents provide information about SunVTS software. They are available on the Solaris on Sun Hardware AnswerBook. This AnswerBook documentation is provided on the Sun Updates CD for the Solaris release you are running.
SunVTS User's Guide
This document describes the SunVTS environment, including how to start and control the various user interfaces. SunVTS features are described in this document.
SunVTS Test Reference Manual
This document contains descriptions of each test SunVTS software runs in the SunVTS environment. Each test description explains the various test options and gives command-line arguments.
SunVTS Quick Reference Card
This card gives an overview of the main features of the SunVTS Open Look interface.
SunVTS software is an optional package that may or may not have been loaded when your system software was installed.
To check whether SunVTS software is installed, you must access your system either from a console or from a remote machine logged in to the system.
Type the following:
% pkginfo -l SUNWvts |
If SunVTS software is loaded, information about the package will be displayed.
If SunVTS software is not loaded, you'll see an error message:
ERROR: information for "SUNWvts" was not found |
If necessary, use the pkgadd utility to load the SUNWvts package onto your system from the Sun Updates CD.
Note that /opt/SUNWvts is the default directory for installing SunVTS software.
For more information, refer to the appropriate Solaris documentation, as well as the pkgadd reference manual page.
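If you script this check, the two outcomes described above can be told apart from the text that pkginfo prints. The following sketch assumes the command's output, including any error text, has already been captured as a string; the function name and the abbreviated sample strings are assumptions made for this example.

```python
def sunvts_is_installed(pkginfo_output):
    """Interpret captured output of 'pkginfo -l SUNWvts'."""
    for line in pkginfo_output.splitlines():
        text = line.strip()
        if text.startswith("ERROR:") and "was not found" in text:
            return False
    # Any other non-empty output is treated as package information.
    return bool(pkginfo_output.strip())


NOT_LOADED = 'ERROR: information for "SUNWvts" was not found'
LOADED = "PKGINST:  SUNWvts"  # abbreviated, assumed sample line

print(sunvts_is_installed(NOT_LOADED), sunvts_is_installed(LOADED))
```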
If your system passes the firmware-based diagnostics and boots the operating system, yet does not function correctly, you can use SunVTS software, the Sun Validation and Test Suite, to run additional tests. These tests verify the configuration and functionality of most hardware controllers and devices.
You must have root or superuser access to run SunVTS tests and the system must be booted to the multiuser level (level 3). If you are not familiar with these or other basic UNIX commands and procedures, such as shutting down the system, booting the system, and configuring devices, you can find the information you need in the following sources:
Solaris Handbook for Sun Peripherals
AnswerBook2(TM) online documentation for the Solaris operating environment
Other software documentation that you received with your system
This procedure assumes that you will test your Sun Enterprise 220R server remotely by running a SunVTS session from a workstation using the SunVTS graphical interface. For information about other SunVTS interfaces and options, see "7.1 About Diagnostic Tools".
Use xhost to give the remote server access to the workstation display.
On the system from which you will be running the SunVTS graphical interface, type:
% /usr/openwin/bin/xhost +remote_hostname |
Substitute the name of the Sun Enterprise 220R server for remote_hostname. Among other things, this command gives the server display permissions to run the SunVTS graphical interface in the OpenWindows(TM) environment of the workstation.
Remotely log in to the server as superuser (root).
Check whether SunVTS software is loaded on the server.
SunVTS is an optional package that may or may not have been loaded when the server software was installed. For more information, see "7.9 How to Check Whether SunVTS Software Is Installed".
To start the SunVTS software, type:
# cd /opt/SUNWvts/bin
# ./sunvts -display local_hostname:0
Substitute the name of the workstation you are using for local_hostname. Note that /opt/SUNWvts/bin is the default /bin directory for SunVTS software. If you have installed SunVTS software in a different directory, use the appropriate path instead.
When you start SunVTS software, the SunVTS kernel probes the test system devices. The results of this probe are displayed on the Test Selection panel. For each hardware device on your system, there is an associated SunVTS test.
Fine-tune your testing session by selecting only the tests you want to run.
Click to select and deselect tests. (A check mark in the box indicates the item is selected.)
If SunVTS tests indicate an impaired or defective part, see the procedures in this service manual or contact your qualified Sun service provider to replace the defective part.
Sun Enterprise SyMON software is a GUI-based diagnostic tool designed to monitor system hardware status and UNIX operating system performance. It offers simple, yet powerful monitoring capabilities that allow you to:
Diagnose and address potential problems such as capacity problems or bottlenecks
Display physical and logical views of your exact server configuration
Monitor your server remotely from any location in the network
Isolate potential problems or failed components
For instructions about installing and using Sun Enterprise SyMON software, see the Sun Enterprise SyMON User's Guide.
See the Web site www.sun.com/symon for current software and documentation information.
The system provides the following features to help you identify and isolate hardware problems:
Error indications
Software commands
Diagnostic tools
This section describes the error indications and software commands provided to help you troubleshoot your system. Diagnostic tools are covered in "7.1 About Diagnostic Tools".
The system provides error indications via LEDs and error messages. Using the two in combination, you can isolate a problem to a particular field-replaceable unit (FRU) with a high degree of confidence.
The system provides fault LEDs in the following places:
Front panel
Keyboard
Power supplies
Disk drives
Error messages are logged in the /var/adm/messages file and are also displayed on the system console by the diagnostic tools.
Front panel LEDs provide your first indication if there is a problem with your system. Usually, a front panel LED is not the sole indication of a problem. Error messages and even other LEDs can help to isolate the problem further.
The front panel has a general fault indicator that lights whenever POST or OBDiag detects any kind of fault (including a fault reported by a power supply).
Four LEDs on the Sun Type-5 keyboard are used to indicate the progress and results of POST diagnostics. These LEDs are on the Caps Lock, Compose, Scroll Lock, and Num Lock keys as shown below.
A keyboard is not shipped with the system. To read keyboard LEDs you must obtain a keyboard of the appropriate type (see the following graphic) and connect it to the keyboard/mouse port on the system's back panel.
To indicate the beginning of POST diagnostics, the four LEDs briefly light all at once. The monitor screen remains blank, and the Caps Lock LED flashes for the duration of the testing.
If the system passes all POST diagnostic tests, all four LEDs light again and then go off. Once the system banner appears on the monitor screen, the keyboard LEDs assume their normal functions and should no longer be interpreted as diagnostic error indicators.
If the system fails any test, one or more LEDs will light to form an error code that indicates the nature of the problem.
The LED error code may be lit continuously, or for just a few seconds, so it is important to observe the LEDs closely while POST is running.
The following table provides error code definitions.
Caps Lock | Compose | Scroll Lock | Num Lock | Failing FRU |
---|---|---|---|---|
On | Off | Off | Off | Main logic board |
Off | On | Off | Off | CPU module 0 |
Off | On | On | Off | CPU module 1 |
On | Off | Off | On | No memory detected |
On | On | On | On | Memory bank 0 |
On | On | Off | On | Memory bank 1 |
On | On | On | Off | Memory bank 2 |
On | On | On | On | Memory bank 3 |
Off | Off | Off | On | NVRAM |
The Caps Lock LED flashes on and off to indicate that POST diagnostics are running; all other LEDs are off. When the LED lights steadily, it indicates an error.
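The error code table can also be kept as a small lookup, which is convenient when recording observations during repeated POST runs. The following is a minimal Python sketch, not part of the system software; the names `LED_CODES` and `decode_post_leds` are illustrative. Because the table lists the same pattern (all four LEDs on) for both Memory bank 0 and Memory bank 3, the function returns every matching entry rather than a single FRU.

```python
# Rows copied from the keyboard LED error code table.
# Order of states: (Caps Lock, Compose, Scroll Lock, Num Lock); True means "On".
LED_CODES = [
    ((True, False, False, False), "Main logic board"),
    ((False, True, False, False), "CPU module 0"),
    ((False, True, True, False), "CPU module 1"),
    ((True, False, False, True), "No memory detected"),
    ((True, True, True, True), "Memory bank 0"),
    ((True, True, False, True), "Memory bank 1"),
    ((True, True, True, False), "Memory bank 2"),
    ((True, True, True, True), "Memory bank 3"),
    ((False, False, False, True), "NVRAM"),
]


def decode_post_leds(caps_lock, compose, scroll_lock, num_lock):
    """Return every table entry whose pattern matches the observed steady LEDs."""
    observed = (bool(caps_lock), bool(compose), bool(scroll_lock), bool(num_lock))
    return [fru for pattern, fru in LED_CODES if pattern == observed]
```

An empty result means the observed pattern is not in the table, for example because the Caps Lock LED was still flashing and POST had not finished.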
Power supply LEDs are visible from the front of the system when the doors are open. The following figure shows the LEDs on the power supply in bay 1.
The following table provides a description of each LED.
LED Name | Icon | Description |
---|---|---|
DC Status | | This green LED is lit to indicate that all DC outputs from the power supply are functional. |
Fault | | This yellow LED is lit to indicate a fault in the power supply. The power supply is non-functional and there is no DC output to the system. The yellow LED on the system front panel also lights if this LED is lit. |
AC-Present | | This green LED is lit to indicate that the primary circuit has power. When this LED is lit, the power supply is providing standby power to the system. |
The disk drive LEDs are visible from the front of the system when the left door is open, as shown in the following figure.
When a disk drive LED lights steadily and is green, it indicates that the slot is populated and that the drive is receiving power. When an LED is green and flashing, it indicates that there is activity on the disk. Some applications use the LED to indicate a fault on the disk drive. In this case, the LED changes color to yellow and lights steadily. The disk drive LEDs retain their state when the system is powered off. A yellow indicator also results in the yellow general fault indicator being lit on the system front panel.
Error messages and other system messages are saved in the file /var/adm/messages.
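When the messages file is large, it can help to filter it before reading. The following Python sketch is one possible approach and rests on assumptions: that a Python interpreter is available on the machine where you examine the file, and that the keywords shown are useful search terms. The keywords are illustrative and are not a list of strings defined by the system.

```python
def find_suspect_messages(lines, keywords=("error", "warning", "fail")):
    """Return the lines that contain any keyword, ignoring case."""
    lowered = [keyword.lower() for keyword in keywords]
    return [
        line.rstrip("\n")
        for line in lines
        if any(keyword in line.lower() for keyword in lowered)
    ]


def scan_log(path="/var/adm/messages"):
    """Filter the messages file; undecodable bytes are replaced, not fatal."""
    with open(path, errors="replace") as handle:
        return find_suspect_messages(handle)
```

Reading /var/adm/messages may require root privileges, depending on how the file permissions are set on your system.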
The two firmware-based diagnostic tools, POST and OBDiag, display error messages either locally on the system console or terminal, or in a remote console window through a tip connection. These error messages can help to further refine your problem diagnosis. The amount of error information displayed in diagnostic messages is determined by the value of the OpenBoot PROM variable diag-verbosity. See "7.5.2 Configuration Variable" for additional details.
System software provides Solaris operating system commands that you can use to diagnose problems, and OBP commands that enable you to diagnose problems even if the Solaris operating environment is unavailable for any reason. For more information on Solaris commands, see the appropriate man pages. For additional information on OBP commands, see the OpenBoot 3.x Command Reference Manual. (An online version of the manual is included with the Solaris System Administrator AnswerBook that ships with Solaris software.)
The prtdiag command is a UNIX shell command used to display system configuration and diagnostic information, such as:
System configuration, including information about clock frequencies, CPUs, memory, and I/O card types
Diagnostic information
Failed field-replaceable units (FRUs)
To run prtdiag, type:
% /usr/platform/sun4u/sbin/prtdiag
To isolate an intermittent failure, it may be helpful to maintain a prtdiag history log. Use prtdiag with the -l (log) option to send output to a log file in /var/adm.
Refer to the prtdiag man page for additional information.
An example of prtdiag output follows. The exact format of prtdiag output depends on which version of the Solaris operating environment is running on your system.
prtdiag output:
% /usr/platform/sun4u/sbin/prtdiag -v
System Configuration:  Sun Microsystems  sun4u Sun Enterprise 220R (UltraSPARC-II 450MHz)
System clock frequency: 112 MHz
Memory size: 128 Megabytes

========================= CPUs ========================

                    Run   Ecache   CPU    CPU
Brd  CPU   Module   MHz     MB    Impl.   Mask
---  ---  -------  -----  ------  ------  ----
 0     0     0      450     4.0   US-II   10.0

========================= IO Cards =========================

     Bus
Brd  Type  MHz  Slot  Name                   Model
---  ----  ----  ----  ------------------    ----------------------
 0   PCI    33    1   network-SUNW,hme
 0   PCI    33    3   scsi-glm/disk (block)  Symbios,53C875
 0   PCI    33    3   scsi-glm/disk (block)  Symbios,53C875

No failures found in System
===========================

====================== HW Revisions ======================

ASIC Revisions:
PCI: pci Rev 4
Cheerio: ebus Rev 1

System PROM revisions:
----------------------
OBP 3.23.0 1999/06/30 14:57   POST 2.0.2 1998/10/19 10:46
%
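If you keep saved copies of prtdiag output, a short script can pull out the fields that matter most when comparing runs. The following Python sketch assumes the output format shown in the example above; as noted, the exact format depends on the Solaris version, so the patterns may need adjusting. The function name `summarize_prtdiag` is illustrative.

```python
import re


def summarize_prtdiag(text):
    """Extract a few fields from prtdiag text; missing fields come back as None."""
    clock = re.search(r"System clock frequency:\s*(\d+)\s*MHz", text)
    memory = re.search(r"Memory size:\s*(\d+)\s*Megabytes", text)
    return {
        "clock_mhz": int(clock.group(1)) if clock else None,
        "memory_mb": int(memory.group(1)) if memory else None,
        # True only when the exact "no failures" line from the example is present.
        "no_failures_reported": "No failures found in System" in text,
    }
```

A value of False for `no_failures_reported` does not by itself prove a fault; it only means the expected line was not found, so the full output should be read.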
The eeprom command is a UNIX shell command. You invoke it to list the names and current values of the OpenBoot PROM configuration variables stored in system NVRAM. You can also use the eeprom command to set new values for the OpenBoot PROM configuration variables.
To run the eeprom command:
Boot the operating system.
Become root.
Type the following command at the command-line prompt:
% eeprom
scsi-initiator-id=7
keyboard-click?=false
keymap: data not available.
ttyb-rts-dtr-off=false
ttyb-ignore-cd=true
ttya-rts-dtr-off=false
ttya-ignore-cd=true
ttyb-mode=9600,8,n,1,-
ttya-mode=9600,8,n,1,-
pcia-probe-list=1
pcib-probe-list=1,3,2,4,5
enclosure-type: 540-4284
banner-name: Sun Enterprise 220R
energystar-enabled?=false
mfg-mode=off
diag-level=min
#power-cycles=35
system-board-serial#=5014450071228
system-board-date=371c1bc9
fcode-debug?=false
output-device=screen
input-device=keyboard
load-base=16384
boot-command=boot
auto-boot?=true
watchdog-reboot?=false
diag-file: data not available.
diag-device=net
boot-file: data not available.
boot-device=disk net
local-mac-address?=false
ansi-terminal?=true
screen-#columns=80
screen-#rows=34
silent-mode?=false
use-nvramrc?=false
nvramrc: data not available.
security-mode=none
security-password: data not available.
security-#badlogins=0
oem-logo: data not available.
oem-logo?=false
oem-banner: data not available.
oem-banner?=false
hardware-revision: data not available.
last-hardware-update: data not available.
diag-switch?=true
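To compare settings between systems or over time, the eeprom listing can be turned into a dictionary. The following Python sketch assumes the two line shapes visible in the example output, `name=value` and `name: value`, with one variable per line, and treats "data not available." as an unset variable. The function name `parse_eeprom` is illustrative.

```python
import re

# Variable name, then either "=" or ":" as the separator, then the value.
LINE = re.compile(r"(\S+?)(?:=|:\s*)(.*)$")


def parse_eeprom(text):
    """Map variable names to values; unset variables map to None."""
    variables = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and shell prompt lines such as "% eeprom".
        if not line or line.startswith(("% ", "# ")):
            continue
        found = LINE.match(line)
        if not found:
            continue
        name, value = found.group(1), found.group(2).strip()
        variables[name] = None if value == "data not available." else value
    return variables
```

Variable names that begin with `#`, such as `#power-cycles`, are kept because only a `#` followed by a space is treated as a prompt.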
If the NVRAM is a new part, you must reset the values of the following OpenBoot PROM configuration variables: banner-name=Sun Enterprise 220R, enclosure-type=540-4284, and energystar-enabled?=false.
To set the values for the OpenBoot PROM configuration variables shown in the following example, boot the operating system, log on as root, and enter the following commands.
% eeprom banner-name="Sun Enterprise 220R"
% eeprom enclosure-type="540-4284"
% eeprom energystar-enabled?=false
Verify the variable settings by running the eeprom command without any parameters, as shown in the following example.
% eeprom
scsi-initiator-id=7
keyboard-click?=false
keymap: data not available.
ttyb-rts-dtr-off=false
ttyb-ignore-cd=true
ttya-rts-dtr-off=false
ttya-ignore-cd=true
ttyb-mode=9600,8,n,1,-
ttya-mode=9600,8,n,1,-
pcia-probe-list=1
pcib-probe-list=1,3,2,4,5
enclosure-type: 540-4284
banner-name: Sun Enterprise 220R
energystar-enabled?=false
mfg-mode=off
diag-level=min
#power-cycles=35
system-board-serial#=5014450071228
system-board-date=371c1bc9
fcode-debug?=false
output-device=screen
input-device=keyboard
load-base=16384
boot-command=boot
auto-boot?=true
watchdog-reboot?=false
diag-file: data not available.
diag-device=net
boot-file: data not available.
boot-device=disk net
local-mac-address?=false
ansi-terminal?=true
screen-#columns=80
screen-#rows=34
silent-mode?=false
use-nvramrc?=false
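The verification step can be expressed as a comparison against the three required values. The following Python sketch takes a dictionary of current settings (however you choose to build it) and returns the eeprom command lines, copied from the example above, for any variable that does not match. The names `REQUIRED` and `eeprom_fixes` are illustrative.

```python
# (variable, required value, command line from the example above)
REQUIRED = [
    ("banner-name", "Sun Enterprise 220R", 'eeprom banner-name="Sun Enterprise 220R"'),
    ("enclosure-type", "540-4284", 'eeprom enclosure-type="540-4284"'),
    ("energystar-enabled?", "false", "eeprom energystar-enabled?=false"),
]


def eeprom_fixes(current):
    """Return the eeprom commands for required variables that differ in `current`."""
    return [
        command
        for name, wanted, command in REQUIRED
        if current.get(name) != wanted
    ]
```

An empty list means all three variables already hold the required values. The sketch only produces the command text; it does not run anything.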
To display the names, current values, and default values of the OpenBoot PROM configuration variables stored in NVRAM, enter the OBP printenv command at the OBP ok prompt, as shown in the following example.
ok printenv
variable name          Value                  Default Value

scsi-initiator-id      7                      7
keyboard-click?        false                  false
keymap
ttyb-rts-dtr-off       false                  false
ttyb-ignore-cd         true                   true
ttya-rts-dtr-off       false                  false
ttya-ignore-cd         true                   true
ttyb-mode              9600,8,n,1,-           9600,8,n,1,-
ttya-mode              9600,8,n,1,-           9600,8,n,1,-
pcia-probe-list        1                      1
pcib-probe-list        1,3,2,4,5              1,3,2,4,5
enclosure-type         540-4284
banner-name            Sun Enterprise 220R
energystar-enabled?    false                  true
mfg-mode               off                    off
diag-level             min                    min
#power-cycles          35
system-board-serial#   5014450071228
system-board-date      371c1bc9
fcode-debug?           false                  false
output-device          screen                 screen
input-device           keyboard               keyboard
load-base              16384                  16384
boot-command           boot                   boot
auto-boot?             true                   true
watchdog-reboot?       false                  false
diag-file
diag-device            net                    net
boot-file
boot-device            disk net               disk net
local-mac-address?     false                  false
ansi-terminal?         true                   true
screen-#columns        80                     80
screen-#rows           34                     34
silent-mode?           false                  false
use-nvramrc?           false                  false
silent-mode?           false                  false
security-mode          none
security-password
security-#badlogins    0
oem-logo
oem-logo?              false                  false
oem-banner
oem-banner?            false                  false
hardware-revision
last-hardware-update
diag-switch?           true                   false
To set the value of an OpenBoot PROM configuration variable stored in NVRAM, enter the OBP setenv command at the OBP ok prompt, as shown in the following example.
If the NVRAM is a new part, you must reset the three OpenBoot PROM configuration variables shown in this example. These variables are named banner-name, enclosure-type, and energystar-enabled? and they must be set to the values shown in the example.
ok setenv banner-name Sun Enterprise 220R
ok setenv enclosure-type 540-4284
ok setenv energystar-enabled? false
ok printenv
variable name          Value                  Default Value

scsi-initiator-id      7                      7
keyboard-click?        false                  false
keymap
ttyb-rts-dtr-off       false                  false
ttyb-ignore-cd         true                   true
ttya-rts-dtr-off       false                  false
ttya-ignore-cd         true                   true
ttyb-mode              9600,8,n,1,-           9600,8,n,1,-
ttya-mode              9600,8,n,1,-           9600,8,n,1,-
pcia-probe-list        1                      1
pcib-probe-list        1,3,2,4,5              1,3,2,4,5
enclosure-type         540-4284
banner-name            Sun Enterprise 220R
energystar-enabled?    false
mfg-mode               off                    off
diag-level             min                    min
#power-cycles          35
system-board-serial#   5014450071228
system-board-date      371c1bc9
fcode-debug?           false                  false
output-device          screen                 screen
input-device           keyboard               keyboard
load-base              16384                  16384
boot-command           boot                   boot
auto-boot?             true                   true
watchdog-reboot?       false                  false
diag-file
diag-device            net                    net
boot-file
boot-device            disk net               disk net
local-mac-address?     false                  false
ansi-terminal?         true                   true
screen-#columns        80                     80
screen-#rows           34                     34
silent-mode?           false                  false
use-nvramrc?           false                  false
nvramrc
security-mode          none
security-password
security-#badlogins    0
oem-logo
oem-logo?              false                  false
oem-banner
oem-banner?            false                  false
hardware-revision
last-hardware-update
diag-switch?           true                   false
To diagnose problems with the SCSI subsystem, you can use the OBP probe-scsi and probe-scsi-all commands. Both commands require that you halt the system.
When it is not practical to halt the system, you can use SunVTS software as an alternative method of testing the SCSI interfaces. See "7.1 About Diagnostic Tools" for more information.
The probe-scsi command transmits an inquiry command to all SCSI devices connected to the main logic board SCSI interfaces. These include any tape or CD-ROM drive in the removable media assembly (RMA), any internal disk drive, and any device connected to the external SCSI connector on the system back panel. For any SCSI device that is connected and active, its target address, unit number, device type, and manufacturer name are displayed.
The probe-scsi-all command transmits an inquiry command to all SCSI devices connected to the system SCSI host adapters, including any host adapters installed in PCI slots. The first identifier listed in the display is the SCSI host adapter address in the system device tree followed by the SCSI device identification data.
The first example that follows shows a probe-scsi output message. The second example shows a probe-scsi-all output message.
probe-scsi output:
ok probe-scsi
This command may hang the system if a Stop-A or halt command
has been executed. Please type reset-all to reset the system
before executing this command.
Do you wish to continue? (y/n) n
ok reset-all
ok probe-scsi
Primary UltraSCSI bus:
Target 0
  Unit 0   Disk     SEAGATE ST34371W SUN4.2G3862
Target 4
  Unit 0   Removable Tape     ARCHIVE Python 02635-XXX5962
Target 6
  Unit 0   Removable Read Only device     TOSHIBA XM5701TASUN12XCD0997
Target 9
  Unit 0   Disk     SEAGATE ST34371W SUN4.2G7462
Target b
  Unit 0   Disk     SEAGATE ST34371W SUN4.2G7462
ok
probe-scsi-all output:
ok probe-scsi-all
This command may hang the system if a Stop-A or halt command
has been executed. Please type reset-all to reset the system
before executing this command.
Do you wish to continue? (y/n) y
/pci@1f,4000/scsi@4,1
Target 0
  Unit 0   Disk     SEAGATE ST39102LC SUN9.0G0828
Target 1
  Unit 0   Disk     SEAGATE ST39102LC SUN9.0G0828
Target 6
  Unit 0   Removable Read Only device     TOSHIBA XM6201TA SUN32XCD1103
ok
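When probe output is captured through a tip connection, it can be checked against the devices you expect to find. The following Python sketch assumes that each device is reported with the words "Target" and "Unit" followed by a description, as in the examples above, and accepts them on one line or on two. It is written for single-bus probe-scsi output and does not separate devices by host adapter. The function names are illustrative.

```python
import re

# Target IDs are printed in hexadecimal (for example "b").
TARGET = re.compile(r"^\s*Target\s+([0-9a-fA-F]+)\b\s*(.*)$")
UNIT = re.compile(r"^\s*Unit\s+(\d+)\s+(.*)$")


def parse_probe_scsi(text):
    """Return a list of (target, unit, description) tuples."""
    devices = []
    target = None
    for line in text.splitlines():
        rest = line
        found = TARGET.match(line)
        if found:
            target = found.group(1).lower()
            rest = found.group(2)
        unit = UNIT.match(rest)
        if unit and target is not None:
            description = " ".join(unit.group(2).split())
            devices.append((target, int(unit.group(1)), description))
    return devices


def missing_targets(devices, expected):
    """Return the expected target IDs that did not appear in the probe output."""
    seen = {target for target, _unit, _description in devices}
    return [target for target in expected if target.lower() not in seen]
```

A target returned by `missing_targets` corresponds to a device that did not respond to the inquiry command.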
The system is unable to communicate over the network.
Your system conforms to the Ethernet 10BASE-T/100BASE-TX standard, which states that the Ethernet 10BASE-T link integrity test function should always be enabled on both the host system and the Ethernet hub. The system cannot communicate with a network if this function is not set identically for both the system and the network hub (either enabled for both or disabled for both). This problem applies only to 10BASE-T network hubs, where the Ethernet link integrity test is optional. This is not a problem for 100BASE-TX networks, where the test is enabled by default. Refer to the documentation provided with your Ethernet hub for more information about the link integrity test function.
If you connect the system to a network and the network does not respond, use the OpenBoot PROM command watch-net-all to display conditions for all network connections:
ok watch-net-all
For most PCI Ethernet cards, the link integrity test function can be enabled or disabled with a hardware jumper on the PCI card, which you must set manually. (See the documentation supplied with the card.) For the standard TPE and MII main logic board ports, the link test is enabled or disabled through software.
Remember also that the TPE and MII ports share the same circuitry and as a result, you can use only one port at a time.
Some hub designs permanently enable (or disable) the link integrity test through a hardware jumper. In this case, refer to the hub installation or user manual for details of how the test is implemented.
To enable or disable the link integrity test for the standard Ethernet interface, or for a PCI-based Ethernet interface, you must first know the device name of the desired Ethernet interface. To list the device name:
Shut down the operating system and take the system to the ok prompt.
Determine the device name for the desired Ethernet interface, using one of the two solutions that follow.
Use this method while the operating system is running:
Become superuser.
Type:
# eeprom nvramrc="probe-all install-console banner apply disable-link-pulse device-name"
(Repeat for any additional device names.)
# eeprom "use-nvramrc?"=true
Reboot the system (when convenient) to make the changes effective.
Use this alternative method when the system is already in OpenBoot:
At the ok prompt, type:
ok nvedit
0: probe-all install-console banner
1: apply disable-link-pulse device-name
(Repeat this step for other device names as needed.)
(Press CONTROL-C to exit nvedit.)
ok nvstore
ok setenv use-nvramrc? true
Reboot the system to make the changes effective.
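If several interfaces need the same change, the command text for both methods can be generated from a list of device names. The following Python sketch only builds the text shown in the two procedures above; it does not run anything. It assumes that additional device names are handled by repeating the `apply disable-link-pulse` phrase, joined by spaces for the eeprom method, which is one reading of the "repeat" instruction. The function names are illustrative, and the device names must be the ones you determined for your own system.

```python
def nvramrc_lines(device_names):
    """Return the nvramrc script lines, one apply line per device name."""
    lines = ["probe-all install-console banner"]
    lines += ["apply disable-link-pulse %s" % name for name in device_names]
    return lines


def eeprom_commands(device_names):
    """Return the two eeprom command lines for the running-OS method."""
    # Assumption: the script lines can be joined with spaces in one string.
    script = " ".join(nvramrc_lines(device_names))
    return ['eeprom nvramrc="%s"' % script, 'eeprom "use-nvramrc?"=true']
```

The output of `nvramrc_lines` corresponds to the numbered lines typed in nvedit, and the output of `eeprom_commands` corresponds to the commands typed as superuser.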
The system attempts to power up but does not boot or initialize the monitor.
Run POST diagnostics.
Observe POST results.
The front panel general fault LED should flash slowly to indicate that POST is running. Check the POST output using a locally attached terminal or a tip connection.
If you see no front panel LED activity, a power supply may be defective.
If the POST output contains an error message, then POST has failed.
The most probable cause for this type of failure is the main logic board. However, before replacing the main logic board you should:
A CD-ROM drive read error or parity error is reported by the operating system or a software application.
Disk drive or CD-ROM drive fails to boot or is not responding to commands.
Test the drive response to the probe-scsi-all command as follows:
At the system ok prompt, type:
ok reset-all
ok probe-scsi-all
If the SCSI device responds correctly to probe-scsi-all, a message similar to the one shown in the probe-scsi output example on "7.1 About Diagnostic Tools" is printed out.
If the device responds and a message is displayed, the system SCSI controller has successfully probed the device. This indicates that the main logic board is operating correctly.
If one drive does not respond to the SCSI controller probe but the others do, replace the unresponsive drive.
If only one internal disk drive is configured with the system and the probe-scsi-all test fails to show the device in the message, replace the drive.
If the problem is still evident after replacing the drive, replace the main logic board.
If replacing both the disk drive and the main logic board does not correct the problem, replace the associated UltraSCSI data cable and UltraSCSI backplane.
To check whether the main logic board SCSI controllers are defective, test the drive response to the probe-scsi command. To test additional SCSI host adapters added to the system, use the probe-scsi-all command. You can use the OBP printenv command to display the OpenBoot PROM configuration variables stored in the system NVRAM. The display includes the current values for these variables as well as the default values. See "7.12.2.3 OBP printenv Command" for more information.
At the ok prompt, type:
ok probe-scsi
If a message is displayed for each installed disk, the system SCSI controllers have successfully probed the devices. This indicates that the main logic board is working correctly.
If a disk does not respond, make sure that each SCSI device on the SCSI bus has a unique SCSI target ID.
If the problem persists, replace the unresponsive drive.
If the problem remains after replacing the drive, replace the main logic board.
If the problem persists, replace the associated SCSI cable and backplane.
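The sequence above is an escalation list: each step is tried only if the previous one did not clear the fault. The following Python sketch records that order so a service log can show which step comes next. The names `ESCALATION` and `next_step` are illustrative, and the sketch adds no diagnostic logic of its own.

```python
# Steps in the order given above for a disk that does not respond to probe-scsi.
ESCALATION = [
    "Make sure each SCSI device on the SCSI bus has a unique SCSI target ID",
    "Replace the unresponsive drive",
    "Replace the main logic board",
    "Replace the associated SCSI cable and backplane",
]


def next_step(steps_already_tried):
    """Return the next action, or None when every step has been tried."""
    if steps_already_tried < 0:
        raise ValueError("steps_already_tried must not be negative")
    if steps_already_tried >= len(ESCALATION):
        return None
    return ESCALATION[steps_already_tried]
```

A result of None means the listed steps are exhausted and the problem needs further diagnosis by a qualified service provider.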
If there is a problem with a power supply, POST lights the general fault indicator and the power supply fault indicator on the front panel. If you have more than one power supply, then you can use the LEDs located on the power supplies themselves to identify the faulty supply. The power supply LEDs indicate any problem with the AC input or DC output. See "7.12.1.3 Power Supply LEDs" for more information about the LEDs.
SunVTS and POST diagnostics can report memory errors encountered during program execution. Memory error messages typically indicate the DIMM location number ("U" number) of the failing module.
Use the following diagram to identify the location of a failing memory module from its U number.