2 Diagnostics Overview





This chapter describes the different types of diagnostic firmware and software tools available to you and how they are related. The main categories of diagnostics are:

This chapter also briefly covers the Forth Toolkit, which is an interactive command interpreter based on the Forth programming language. The Forth Toolkit provides the capability to interactively execute Boot PROM diagnostics. For a more complete discussion of the Sun Forth Toolkit, see the Introduction to Open Boot 2.0.

The flowchart in Figure 2-1 outlines the roles played by various diagnostics during the default boot mode.

    Figure 2-1 Default Boot Mode

How the Diagnostics Fit Together

This section describes the relationship between the various diagnostic tools. The flowchart in Figure 2-1 illustrates this relationship by outlining the roles played by the various diagnostics during the default boot mode. This description assumes you are using a graphics monitor to view test results.

Power-On Self-Test (POST) code is stored in the boot PROM. When the system is powered on, POST executes before anything else and tests the most basic functions of the system hardware before subsequent functions use them. You can view the progress of POST by monitoring the four LEDs on the keyboard or through serial port A (the ttya port) on the system. The flowchart in Figure 2-1 assumes that POST was able to execute and passed all of its tests. Failure modes are discussed in more detail later in this chapter.

After POST completes and basic machine initialization is performed, a system banner is displayed that describes the system. After displaying system messages, the system checks parameters stored in the NVRAM to determine whether the system should automatically proceed to boot a stand-alone program.

For a list of NVRAM parameters, see Table 2-1.

    Table 2-1 Table of NVRAM parameters used during POST and boot

If the auto-boot? parameter is set to false, the system proceeds to the Forth Toolkit (ok prompt) or the system monitor (> prompt). From the Forth Toolkit, you can direct the system to boot from any device you choose or to execute a variety of User Callable Diagnostics. See the Introduction to Open Boot 2.0 for a complete description of the Forth Toolkit.

If the auto-boot? parameter is set to true (the default), the system boots a stand-alone program. To determine which program to boot and which device to boot it from, the system checks the diag-switch? NVRAM parameter. Table 2-2 describes the functions of the auto-boot? and diag-switch? parameters.

    Table 2-2 Summary of Autoboot and Diagnostic Parameters

Autoboot     Diagnostic Switch   Result
Parameter    Parameter
---------    -----------------   ---------------------------------
False        False or True       > or ok prompt
True         False               Boot SunOS (vmunix) from disk 0*
                                 (/sbus/esp/sd@3,0)
True         True                Boot SunOS (vmunix) from network*
                                 (/sbus/le)

* The boot parameters represented here are default settings. The defaults may be changed by following the procedures listed in the Introduction to Open Boot 2.0.
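
The decision summarized in Table 2-2 can be sketched in a few lines of Python. This is only an illustrative model of the default behavior described above, not PROM code; the function name boot_action and the returned strings are hypothetical.

```python
def boot_action(auto_boot: bool, diag_switch: bool) -> str:
    """Model the default outcome in Table 2-2 for the two NVRAM flags.

    The function name and return strings are illustrative only.
    """
    if not auto_boot:
        # With auto-boot? false, the result is the same for either
        # setting of diag-switch?.
        return "> or ok prompt"
    if diag_switch:
        return "boot SunOS (vmunix) from network (/sbus/le)"
    return "boot SunOS (vmunix) from disk 0 (/sbus/esp/sd@3,0)"


if __name__ == "__main__":
    for auto in (False, True):
        for diag in (False, True):
            print(auto, diag, "->", boot_action(auto, diag))
```

The boot devices shown are the defaults from the table footnote; a system whose boot parameters have been changed will behave differently.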

The default stand-alone program booted is SunOS (vmunix). Once SunOS is running, the Sundiag System Exerciser may be invoked. Refer to the Sundiag System Exerciser section later in this chapter for further information.

Another stand-alone program is the SunDiagnostic Executive. Refer to the SunDiagnostic Executive section later in this chapter for further information.

To boot user-specified programs, such as the SunDiagnostic Executive, you must be at the > prompt or ok prompt. See "User Callable Diagnostics" later in this chapter for a detailed procedure on how to obtain the > and ok prompts.

When to Use Diagnostics

Use each type of diagnostic tool in the circumstances for which it is intended. Table 2-3 summarizes the available diagnostic tools and lists when to use each one.

    Table 2-3 Summary of Available Diagnostic Tools

Boot PROM Diagnostics

The diagnostics stored in the boot PROM include the following:

The Power-On Self-Test (POST) runs automatically at power-up and tests the core CPU functionality. You can monitor the progress of testing using the keyboard LEDs, the console, and the system's serial port A. If there is system trouble, you may want to run the extended user callable diagnostics to take advantage of more thorough tests including, but not limited to, Ethernet controller, memory, and diskette drive tests. See Table 2-4 for the keyboard LED diagnostic codes.

The boot PROM diagnostics are described later in this chapter.

Power-On Self-Test (POST)

The Power-On Self-Test (POST) runs automatically when you turn on the system's power switch or reset the system. The POST code, which resides in the boot PROM, is executed by the CPU (IU) when the Power-On Reset (POR) signal is received from the power supply. POR is a TTL open-collector signal from the power supply that is activated after the DC voltages have risen. POST consists of a sequence of tests designed to check the major hardware components of the main logic board in a short time, before SunOS is booted. POST does not perform extensive testing on any component of the main logic board, so it can detect only major failures. Major failures that can be detected are:

POST Failure Modes

This section describes the POST failure modes and how you can detect them. POST progress may be monitored via three means:

Figure 2-2 is a flowchart that shows POST progress from power-on to booting SunOS.

    Figure 2-2 POST Progress from Power-on to Boot

POST is designed to test the most basic system hardware. While POST is running, the four keyboard LEDs are turned on and off in a cyclical pattern to indicate testing progress.

If a failure occurs in POST, a specific LED pattern is displayed on the four LEDs located on the upper right corner of your keyboard. Figure 2-3 shows the arrangement of the keyboard LEDs, and Table 2-4 lists the LED codes. After setting the keyboard LEDs, the system attempts to continue initialization. If the failed device is critical to the subsequent initialization, the system may halt, leaving the LED code displayed. If initialization is able to proceed to the keyboard initialization section, the keyboard LEDs are reset and any information displayed there is lost.

After resetting the keyboard the system proceeds with the initialization sequence. Once complete, it displays the banner message. Immediately after the banner message, a pass/fail message is displayed:

An error message is displayed after the banner message only if POST failed. If the system hangs between resetting the keyboard and displaying the banner, the test result information is lost. In that case, it may be necessary to power the system down and back up, and note the LED code in the short time it is displayed. Refer to Table 2-4 for a description of the LED codes.

If POST passes, the system probes for SBus devices and interprets their drivers. The devices found during this probe are displayed on the graphics monitor. You will see these types of messages:

Following the successful initialization of the system, SunOS is booted automatically, unless the NVRAM configuration options specify not to do so.

You can retrieve more detailed POST failure information from the POST output on the ttya serial port. If you connect a terminal, you must set the NVRAM parameter diag-switch? to true. For more information on the NVRAM parameters, see Table 2-1 earlier in this chapter. Test failure messages are displayed whether or not the system is in diagnostic mode, but they may be easier to understand when accompanied by the POST progress messages. An example of a POST failure message output over the ttya serial port follows:

Power-On Self-Test Detailed Description

This section describes the keyboard LED patterns that result from POST and their meanings. Figure 2-3 shows the arrangement of the LEDs on the keyboard.

    Figure 2-3 Arrangement of keyboard LEDs

Table 2-4 shows the LED display patterns, the field replaceable units (FRUs) that fail power-on tests, and the meaning of the display patterns. The FRUs include:

Figure 2-4 and Table 2-5 help you determine which SIMM is faulty. Figure 2-4 shows the location of the SIMM slots in the system unit. Table 2-5 is a list of Physical Memory Addresses. For more information on locating faulty SIMMs see Chapter 4, "Determining Faulty SIMM Locations."

    Table 2-4 Keyboard LED Diagnostic Codes

    Figure 2-4 Location of SIMM Slots in System Unit

    Table 2-5 Table of Memory Banks

For further information about replacing the FRUs that fail, see "Removing and Replacing FRUs" in Chapter 4.

If all POST tests pass, run the SunDiagnostic Executive with the cache disabled. The SunDiagnostic Executive is an independent operating system. It runs exhaustive subsystem tests independently of SunOS. See the latest version of SunDiagnostic Executive User's Guide for SPARCstations.

User Callable Diagnostics

You have access to a number of user callable diagnostics. To invoke these tests you must enter the Forth Toolkit.

The Forth Toolkit provides an interface for the SPARCstation IPX implementation of the 2.0 Open Boot PROM Architecture. See the Introduction to Open Boot 2.0 for more information.

To enter the Forth Toolkit from SunOS:

    1. Preparation.
      a. Save all your work and quit all applications.
      b. Become root.
    2. Halt the system.

    As root, halt the system by entering:
    /usr/etc/halt

    The system synchronizes the file systems and brings you to either the > or ok prompt. The > prompt is the default prompt; the ok prompt is the Forth Toolkit prompt. You will see the ok prompt if the system parameters have been changed to make the ok prompt the default prompt. To modify the NVRAM contents, see Appendix D for a list of parameters used during reset.

    If you see the ok prompt, you are already in the Forth Toolkit and need to do nothing further. If you see the > prompt, go to the next step.

    3. Enter the Forth Toolkit.
      a. Enter n to enter the Forth Toolkit. The ok prompt shows that you are in the Forth Toolkit.

    The following screen summarizes the steps you need to take to halt the system and enter the Forth Toolkit.

    4. Enter help diag to get a listing of tests comprising on-board diagnostics.

Figure 2-5 is a partial list of the tests you can run in the Forth Toolkit.

    Figure 2-5 Displaying User-Callable Diagnostics

User Callable Tests

When the system has been initialized and the Forth monitor has been entered (ok prompt displayed), a set of PROM-resident diagnostics is available for further testing. Most of the available tests can be displayed with the help diag command. Table 2-6 is a list of specific tests available on the SPARCstation IPX with Release 2.0 of the Open Boot PROM:

    Table 2-6 Table of Tests Available with Release 2.0 of the Boot PROM

Test                 Description
-----------------    ------------------------------------------------------
test screen          Tests the system frame buffer. The diag-switch?
                     parameter must be true.
test /audio          Tests the audio logic, speaker, and audio jack output.
                     Outputs a tone to the system speaker followed by a
                     tone to the audio output jack.
test ttya            Tests the ability to output characters to a device
test ttyb            attached to serial port A or B. The printable ASCII
                     character set is sent.
test keyboard        Tests keyboard type recognition and flashes all LEDs
                     on and off.
test net             Tests internal and external Ethernet loopback (same as
                     test /sbus/le).
test /memory         Tests memory based on the settings of the diag-switch?
                     parameter or the selftest-#megs parameter.
test floppy          Tests the floppy drive connection and the ability to
                     recognize a formatted floppy disk.
probe-scsi           Probes the on-board SCSI bus and returns the
                     controller id and target addresses of the SCSI devices
                     connected and powered up.
watch-clock          Shows the ticks of the CPU's real time clock.
test /device-id      Tests the device identified by the device-id as the
                     system knows it. The device under test must have a
                     selftest defined in its fcode PROM and the
                     diag-switch? must (usually) be true.
watch-net            Tests the network connections and determines whether
                     the Ethernet connection is alive, dead, or receiving
                     good or bad packets.
show-sbus            Displays a list of the system's SBus slots and the
                     associated device names found during system
                     initialization.
show-post-results    Displays the results of the Power-On Self-Test. May
                     contain an error message or POST PASSED message.

Returning to the Monitor Prompt

To return to the monitor prompt (> prompt) from the Forth Toolkit, type old-mode at the ok prompt.

Altering the Power-up Sequence

During the workstation's power-up sequence, certain keyboard key combinations can be used to modify how the system gets initialized. These key combinations are all variations on the L1 (Stop) key on the type 4 keyboard.

To invoke an initialization modifier, press the desired keys and hold them down throughout the POST routine. At the end of POST, the CPU checks the keyboard for any modifiers and takes action accordingly. Once the keyboard LEDs stop flashing and go off, it is safe to release the keys.

The following modifiers are supported in the 2.0 Open Boot PROM:

    Table 2-7 Table of Modifiers Supported in 2.0 Open Boot PROM

Command    Description
-------    ----------------------------------------------------------
L1-D       Sets the NVRAM parameter diag-switch? to true. This puts
           the workstation in the diagnostic mode.
L1-N       Forces all NVRAM parameters back to their default
           settings.
L1-D-N     NVRAM parameter diag-switch? is set to true and all other
           parameters are returned to their default settings.
L1-F       Forces all input and output to the ttya port. This
           operation will take place prior to probing SBus slots,
           but the minimum CPU should be functional, allowing
           low-level tests, such as test /memory, to be run.
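
The effect of the L1-D, L1-N, and L1-D-N modifiers on the NVRAM parameters can be modeled with the short Python sketch below. It is an illustration only: the DEFAULTS dictionary is a hypothetical stand-in for the real NVRAM defaults (only auto-boot? = true is stated in this chapter; the diag-switch? default shown is an assumption), the function name apply_modifier is invented, and L1-F is not modeled because it redirects input and output rather than changing a parameter.

```python
# Hypothetical defaults for illustration. Only auto-boot? = true is stated
# in this chapter; diag-switch? = False is an assumption.
DEFAULTS = {"auto-boot?": True, "diag-switch?": False}


def apply_modifier(nvram: dict, keys: str) -> dict:
    """Return a copy of nvram after a power-up key combination such as 'L1-D-N'."""
    letters = set(keys.upper().split("-")) - {"L1"}
    result = dict(nvram)
    if "N" in letters:
        # L1-N: force all parameters back to their default settings.
        result = dict(DEFAULTS)
    if "D" in letters:
        # L1-D: set diag-switch? to true (diagnostic mode).
        result["diag-switch?"] = True
    return result


if __name__ == "__main__":
    current = {"auto-boot?": False, "diag-switch?": False}
    for combo in ("L1-D", "L1-N", "L1-D-N"):
        print(combo, "->", apply_modifier(current, combo))
```

Applying the reset before the diagnostic switch mirrors the L1-D-N description: diag-switch? ends up true while every other parameter takes its default value.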

The L1-A key sequence operates as an abort signal to the system and will halt the operation of programs in most cases.

Sundiag System Exerciser

The Sundiag System Exerciser runs under SunOS. It displays real-time use of system resources and peripheral equipment such as Desktop Storage Packs and External Storage Modules. Run the Sundiag System Exerciser to verify that the system is functioning properly.

The exerciser is shipped with SunOS. If it has been selected during the SunInstall (operating system loading) procedure, it can be run at any time and is found in the directory /usr/diag/sundiag. If the Sundiag System Exerciser is not found on the system hard disk or server, you can load it from tape or CD.

For information on how to use the Sundiag System Exerciser, see the Sundiag User's Guide. Appendix A, "Loopback Connectors" in the Sundiag User's Guide explains how to connect the external loopback connectors required for some options.

If Sundiag passes, the system is operating properly. If Sundiag fails, the system is not operating properly. To identify the problem when Sundiag fails, first run the POST. If all POST tests pass, next run the SunDiagnostic Executive to isolate the problem.
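
The escalation order described above (Sundiag first, then POST, then the SunDiagnostic Executive) can be written as a small decision function. This is a hypothetical Python sketch of the troubleshooting sequence, not a tool shipped with the system; the function name next_step and its return strings are invented for illustration.

```python
from typing import Optional


def next_step(sundiag_passed: bool, post_passed: Optional[bool] = None) -> str:
    """Suggest the next troubleshooting action from the results so far.

    post_passed is None when POST has not yet been run after a Sundiag failure.
    """
    if sundiag_passed:
        return "system is operating properly"
    if post_passed is None:
        return "run POST"
    if post_passed:
        return "run the SunDiagnostic Executive"
    # POST failed: the keyboard LED codes in Table 2-4 identify the failing FRU.
    return "check the POST failure information (Table 2-4)"


if __name__ == "__main__":
    print(next_step(True))
    print(next_step(False))
    print(next_step(False, post_passed=True))
    print(next_step(False, post_passed=False))
```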

SunDiagnostic Executive

The SunDiagnostic Executive is an independent operating system that runs exhaustive subsystem tests independently of SunOS. If all POST tests pass, run the SunDiagnostic Executive to determine which field-replaceable unit needs to be replaced. For information on POST, see "Power-On Self-Test Detailed Description" earlier in this chapter. The SunDiagnostic Executive is described in the SunDiagnostic Executive User's Guide for SPARCstations.