Sun Enterprise 3500 System Reference Manual

Diagnosing Problems

Servicing Obvious Problems

If the Service LED on the system front panel (or the clock+ board) indicates a hardware failure, find the failing module by looking for a lit service LED on the individual module.

The system contains a number of hot-pluggable modules. Under limited conditions, these modules can be removed and replaced while the system continues running. (For a general description of the hot-plug feature, see "Hot-Plug Feature".)

The hot-pluggable modules include these types: CPU/Memory+ board, SBus+ I/O board, Graphics+ I/O board, PCI+ I/O board, and PCM.


Caution - The hot-plug feature requires a functional peripheral power supply/AC. If the peripheral power supply cannot provide current, attempting to remove or replace a hot-pluggable module will damage the module.


If a module fails and there are redundant resources in the system, it may be safe to leave the module in a running system until a replacement part is delivered. For example, if a CPU fails (as indicated perhaps by system messages), but other CPUs continue to function in the system, you can leave the CPU/Memory+ board in place until a replacement CPU is available. Note that it is particularly helpful to leave a module in place if you do not have a filler panel to replace it.

If you choose to remove a faulty board or PCM, remember that you must fill the vacated slot with a replacement or a filler panel to prevent the system from overheating.

Troubleshooting Less Obvious Problems

When board LED codes do not specify the failing hardware, several types of software programs are available to supply information about the problem. This software includes the SunVTS(TM) program, the prtdiag command, the prtenv command, POST and OpenBoot PROM commands, and the SyMON(TM) program.

SunVTS

Run SunVTS(TM) under the Solaris operating environment, or equivalent.

The SunVTS online validation test suite is designed to stress test Sun hardware. By running multiple and multithreaded diagnostic hardware tests, the SunVTS software verifies the system configuration and functionality of most hardware controllers and devices.

SunVTS tests many board and system functions, as well as Fibre Channel, SCSI, and SBus interfaces. SunVTS accepts user-written scripts for automated testing.

Refer to the SunVTS User's Guide for starting and operating instructions.

prtdiag Command

You can use the prtdiag command to display system configuration and diagnostic information.

Refer to the prtdiag man page for instructions.

History Log Option

To isolate an intermittent failure, it can be helpful to maintain a prtdiag history log. Use the prtdiag command with the -l (log) option to send output to a log file in the /var/adm directory.

Running prtdiag

To run prtdiag, type:


% /usr/platform/sun4u/sbin/prtdiag

or use the log option:


% /usr/platform/sun4u/sbin/prtdiag -l
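
When tracking an intermittent failure, it may also help to keep timestamped copies of the full prtdiag output alongside the log written by the -l option. The following is a minimal sketch, not a procedure from this manual. It assumes a POSIX-compatible shell, and the function name snapshot_prtdiag, the PRTDIAG and LOGFILE variables, and the history file name are hypothetical.

```shell
#!/bin/sh
# Sketch only: append timestamped prtdiag output to a history file.
# PRTDIAG and LOGFILE are hypothetical, overridable settings.
PRTDIAG=${PRTDIAG:-/usr/platform/sun4u/sbin/prtdiag}
LOGFILE=${LOGFILE:-/var/tmp/prtdiag-history.log}

snapshot_prtdiag() {
    # Record a dated header, the command output, and the exit status.
    # See the prtdiag man page for the meaning of the exit status.
    {
        echo "=== prtdiag snapshot: $(date) ==="
        status=0
        "$PRTDIAG" 2>&1 || status=$?
        echo "=== exit status: $status ==="
    } >> "$LOGFILE"
}

# Example: take one snapshot now.
#   snapshot_prtdiag
```

Calling snapshot_prtdiag at regular intervals (for example, from a scheduled job) builds a history in which successive snapshots can be compared to see when a component first reported a problem.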

POST and OpenBoot

POST and OpenBoot work together in the system to test and manage system hardware.

POST resides in the OpenBoot PROM on each CPU/Memory+ board and I/O+ board. When the system is turned on, or if a system reset is issued, POST detects and tests buses, power supplies, boards, CPUs, SIMMs, and many board functions. POST controls the status LEDs on the system front panel and all boards. POST displays diagnostic and error messages on a console terminal, if available.

Only POST can configure the system hardware, and only POST can enable hot-pluggable boards. If a new PCM is added to the card cage after the system has booted, the new PCM will not work until the system is rebooted, at which time POST reconfigures the system, using the PCMs that are found in the system at that time.

OpenBoot provides basic environmental monitoring, including detection of overheating conditions and out-of-tolerance voltages. For example, if an overheated board is found, OpenBoot issues a warning message. If the temperature passes the danger level, POST will put the overheated board(s) in low power mode.

OpenBoot also provides a set of commands and diagnostics at the ok prompt. For example, you can use OpenBoot to set NVRAM variables that reserve a board or a set of SIMMs for hot-sparing.

The following OpenBoot commands may be useful for diagnosing problems:

show-devs Command

Use the show-devs command to list the devices that are included in the system configuration.

printenv Command

Use the printenv command to display the system configuration variables stored in the system NVRAM. The display includes the current values for these variables, as well as the default values.

If the system cannot communicate with a 10BASE-T network, the Ethernet link test setting for the port may be incompatible with the setting at the network hub. See "Failure of Network Communications" for further details.
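
When the system cannot be halted to reach the ok prompt, the Solaris eeprom command can print NVRAM variables from the running system as name=value lines. The following sketch shows one way to pick a single variable out of that output. It is an illustration under assumptions, not a procedure from this manual: the helper name nvram_value is hypothetical, a POSIX-compatible shell is assumed, and eeprom may require superuser privileges.

```shell
#!/bin/sh
# Sketch only: look up one NVRAM variable in "name=value" lines,
# the format printed by the Solaris eeprom command.
# nvram_value is a hypothetical helper name.
nvram_value() {
    # $1 = variable name; reads name=value lines on standard input.
    name=$1
    while IFS= read -r line; do
        case $line in
            # The quoted name is matched literally, so a trailing "?"
            # in a variable name is not treated as a wildcard.
            "$name="*) printf '%s\n' "${line#*=}"; return 0 ;;
        esac
    done
    return 1   # variable not found
}

# Example use on a running system:
#   eeprom | nvram_value 'auto-boot?'
```

The helper returns a nonzero status when the variable is not present, so a calling script can distinguish a missing variable from an empty value.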

probe-scsi Command

The probe-scsi command locates and tests SCSI devices attached to the system. Run probe-scsi from the OpenBoot ok prompt.

When it is not practical to halt the system, you can use SunVTS as an alternate method of testing the SCSI interfaces.

Reference Documents for POST/OpenBoot

Solstice SyMON

The Solstice(TM) SyMON program monitors system functioning and features a graphical user interface (GUI) to continuously display system status. Solstice SyMON is intended to complement system management tools such as SunVTS.

Solstice SyMON is accessible through an SNMP interface from network tools such as Solstice SunNet Manager(TM).

Refer to the Solstice SyMON User's Guide, part number 802-5355, for starting and operating instructions.