This chapter describes the different types of diagnostic firmware and software tools available to you and how they are related. The main categories of diagnostics are:
This chapter also briefly covers the Forth Toolkit, which is an interactive command interpreter based on the Forth programming language. The Forth Toolkit provides the capability to interactively execute Boot PROM diagnostics. For a more complete discussion of the Sun Forth Toolkit, see the Introduction to Open Boot 2.0.
The flowchart in Figure 2-1 outlines the roles played by various diagnostics during the default boot mode.
Figure 2-1 Default Boot Mode
This section describes the relationship between the various diagnostic tools, which the flowchart in Figure 2-1 shows graphically for the default boot mode. This description assumes you are using a graphics monitor to view test results.
Power-On Self-Test (POST) code is stored in the boot PROM. When the system is powered on, POST runs before anything else and tests the most basic functions of the system hardware before subsequent functions use them. You can view the progress of POST by monitoring the four LEDs on the keyboard or through serial port A (the ttya port) on the system. The flowchart in Figure 2-1 assumes that POST was able to execute and that all POST tests passed. Failure modes are discussed in more detail later in this chapter.
After POST completes and basic machine initialization is performed, a system banner is displayed that describes the system. After displaying system messages, the system checks parameters stored in the NVRAM to determine whether the system should automatically proceed to boot a stand-alone program.
For a list of NVRAM parameters, see Table 2-1.
If the auto-boot? parameter is set to false, the system proceeds to the Forth Toolkit (ok prompt) or the system monitor (> prompt). Using the Forth Toolkit, you can command the system to boot from any device you choose or to execute a variety of User Callable Diagnostics. See the Introduction to Open Boot 2.0 for a complete description of the Forth Toolkit.
If the auto-boot? parameter is set to true (the default), the system boots a stand-alone program. To determine which program to boot and which device to boot it from, the system checks the diag-switch? NVRAM parameter. Table 2-2 describes the functions of the auto-boot? and diag-switch? parameters.
Table 2-2 Summary of Autoboot and Diagnostic Parameters.
The default stand-alone program booted is SunOS (vmunix). Once SunOS is running, the Sundiag System Exerciser may be invoked. Refer to the Sundiag System Exerciser section later in this chapter for further information.
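The boot decision described above can be sketched as a small Python function. This is only an illustration of the logic in the text, not PROM code: the function name and the returned strings are hypothetical, and the diag-switch? branch assumes that a true value selects a diagnostic boot device and file rather than the default one.

```python
def next_stage(auto_boot: bool, diag_switch: bool) -> str:
    """Return the stage entered after the banner (illustrative sketch only)."""
    if not auto_boot:
        # auto-boot? is false: stop at the ok prompt or the > prompt.
        return "prompt"
    if diag_switch:
        # Assumption: diag-switch? true selects the diagnostic boot source.
        return "boot diagnostic program"
    # Default case: boot the default stand-alone program, SunOS (vmunix).
    return "boot default program (SunOS)"


# Print the outcome for every combination of the two NVRAM parameters.
for auto_boot in (True, False):
    for diag_switch in (True, False):
        print(auto_boot, diag_switch, "->", next_stage(auto_boot, diag_switch))
```

Note that diag-switch? only matters when auto-boot? is true; with auto-boot? false the system stops at a prompt either way.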
Another stand-alone program is the SunDiagnostic Executive. Refer to the SunDiagnostic Executive section later in this chapter for further information.
To boot user-specified programs, such as the SunDiagnostic Executive, you must be at the > prompt or the ok prompt. See "User Callable Diagnostics" later in this chapter for a detailed procedure on how to obtain the > and ok prompts.
You should use each type of diagnostic tool in the appropriate circumstances. Table 2-3 provides a summary of the available diagnostic tools, and lists when to use each diagnostic tool.
The diagnostics stored in the boot PROM include the following:
The Power-On Self-Test (POST) runs automatically at power-up and tests the core CPU functionality. You can monitor the progress of testing using the keyboard LEDs, the console, and the system's serial port A. If there is system trouble, you may want to run the extended user callable diagnostics, which provide more thorough tests of components such as the Ethernet controller, memory, and the diskette drive. See Table 2-4 for the keyboard LED diagnostic codes.
The boot PROM diagnostics are described later in this chapter.
The Power-On Self-Test (POST) runs automatically when you turn on the system's power switch or reset the system. The POST code, which resides in the boot PROM, is executed by the CPU (IU) when the Power-On Reset (POR) signal is received from the power supply. POR is a TTL open-collector signal that the power supply activates after the DC voltages have risen. POST is a short sequence of tests that checks the major hardware components of the main logic board before SunOS is booted. It does not test any component of the main logic board extensively, so it detects only major failures. The major failures that POST can detect are:
This section describes the POST failure modes and how you can detect them. You can monitor POST progress in three ways:
Figure 2-2 is a flowchart that shows POST progress from power-on to booting SunOS.
Figure 2-2 POST Progress from Power-on to Boot
POST is designed to test the most basic system hardware. While POST is running, the four keyboard LEDs are turned on and off in a cyclical pattern to indicate testing progress.
If a failure occurs in POST, a specific LED pattern is displayed on the four LEDs located in the upper right corner of your keyboard. Figure 2-3 shows the arrangement of the keyboard LEDs, and Table 2-4 lists the LED patterns. After setting the keyboard LEDs, the system attempts to continue initialization. If the failed device is critical to the subsequent initialization, the system may halt, leaving the LED code displayed. If initialization is able to proceed to the keyboard initialization section, the keyboard LEDs are reset and any information displayed there is lost.
After resetting the keyboard, the system proceeds with the initialization sequence. Once initialization is complete, it displays the banner message. Immediately after the banner message, a pass/fail message is displayed:
An error message is displayed after the banner message only if POST failed. If the system hangs between resetting the keyboard and displaying the banner, the test result information is lost. In that case you may need to power the system down and back up, and note the LED code during the short time it is displayed. Refer to Table 2-4 for a description of the LED codes.
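The failure-reporting behavior described above can be summarized in a short Python sketch. It is a reading aid under stated assumptions, not a model of the actual PROM code: the function name, parameter names, and returned strings are hypothetical, and the sketch treats a critical failure as always halting, although the text says only that the system may halt.

```python
def where_to_read_post_failure(critical_failure: bool,
                               hang_before_banner: bool) -> str:
    """Where a POST failure can be observed (illustrative sketch only)."""
    if critical_failure:
        # Assumed here: the system halts and the LED code stays displayed.
        return "keyboard LEDs (code remains displayed)"
    if hang_before_banner:
        # The keyboard reset cleared the LEDs and no banner appeared,
        # so the result is lost until the next power-up.
        return "power-cycle and note the LED code while it is shown"
    # Initialization completed: the error follows the banner message.
    return "error message after the banner"


print(where_to_read_post_failure(critical_failure=False,
                                 hang_before_banner=True))
```

In every case, the LED code itself is interpreted using Table 2-4.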
If POST passes, the system probes for SBus devices and interprets their drivers. The devices found during this probe are displayed on the graphics monitor. You will see these types of messages:
Following the successful initialization of the system, SunOS is booted automatically, unless the NVRAM configuration options specify not to do so.
You can retrieve more detailed POST failure information from the POST output on the ttya serial port. If you connect a terminal, you must set the NVRAM parameter diag-switch? to true. For more information on the NVRAM parameters, see Table 2-1 earlier in this chapter. Test failure messages are displayed whether or not the system is in diagnostic mode, but they may be easier to understand when accompanied by POST progress messages. An example of a POST failure message output over the ttya serial port follows:
This section describes the keyboard LED patterns that POST displays and what they mean. Figure 2-3 shows the arrangement of the LEDs on the keyboard.
Figure 2-3 Arrangement of keyboard LEDs
Table 2-4 shows the LED display patterns, the field replaceable units (FRUs) that fail power-on tests, and the meaning of the display patterns. The FRUs include:
Figure 2-4 and Table 2-5 help you determine which SIMM is faulty. Figure 2-4 shows the location of the SIMM slots in the system unit. Table 2-5 is a list of Physical Memory Addresses. For more information on locating faulty SIMMs see Chapter 4, "Determining Faulty SIMM Locations."
Figure 2-4 Location of SIMM Slots in System Unit
For further information about replacing the FRUs that fail, see "Removing and Replacing FRUs" in Chapter 4.
If all POST tests pass, run the SunDiagnostic Executive with the cache disabled. The SunDiagnostic Executive is an independent operating system that runs exhaustive subsystem tests independently of SunOS. See the latest version of the SunDiagnostic Executive User's Guide for SPARCstations.
You have access to a number of user callable diagnostics. To invoke these tests you must enter the Forth Toolkit.
The Forth Toolkit provides an interface for the SPARCstation IPX implementation of the 2.0 Open Boot PROM Architecture. See the Introduction to Open Boot 2.0 for more information.
To enter the Forth Toolkit from SunOS:
As root, halt the system by entering:
/usr/etc/halt
The system synchronizes the file systems and brings you to either the > or ok prompt. The > prompt is the default prompt; the ok prompt is the Forth Toolkit prompt. You will see the ok prompt if you have reset the system parameters to make the ok prompt the default prompt. To modify the NVRAM contents, see Appendix D for a list of parameters used during reset.
If you see the ok prompt, you are already in the Forth Toolkit and need to do nothing further. If you see the > prompt, go to the next step.
The following screen summarizes the steps you need to take to halt the system and enter the Forth Toolkit.
Figure 2-5 is a partial list of the tests you can run in Forth Toolkit.
Figure 2-5 Displaying User-Callable Diagnostics
When the system has been initialized and the Forth monitor has been entered
(ok prompt displayed), a set of PROM-resident diagnostics is available for
further testing. Most of the available tests can be displayed with the
help diag command. Table 2-6 lists the specific tests available on the
SPARCstation IPX with Release 2.0 of the Open Boot PROM:
Table 2-6 Table of Tests Available with Release 2.0 of the Boot PROM
To return to the monitor prompt (>) from the Forth Toolkit, type old-mode at
the ok prompt.
During the workstation's power-up sequence, certain keyboard key combinations can be used to modify how the system is initialized. These key combinations are all variations on the L1 (Stop) key on the Type 4 keyboard.
To invoke an initialization modifier, press the desired keys and hold them down throughout the POST routine. At the end of POST, the CPU checks the keyboard for any modifiers and takes action accordingly. When the keyboard LEDs stop flashing and go off, it is safe to release the keys.
The following modifiers are supported in the 2.0 Open Boot PROM:
Table 2-7 Table of Modifiers Supported in 2.0 Open Boot PROM.
The L1-A key sequence operates as an abort signal to the system and will halt the operation of programs in most cases.
The Sundiag System Exerciser runs under SunOS. It displays real-time use of system resources and peripheral equipment such as Desktop Storage Packs and External Storage Modules. Run the Sundiag System Exerciser to verify that the system is functioning properly.
The exerciser is shipped with SunOS. If it has been selected during the SunInstall (operating system loading) procedure, it can be run at any time and is found in the directory /usr/diag/sundiag. If the Sundiag System Exerciser is not found on the system hard disk or server, you can load it from tape or CD.
For information on how to use the Sundiag System Exerciser, see the Sundiag User's Guide. Appendix A, "Loopback Connectors" in the Sundiag User's Guide explains how to connect the external loopback connectors required for some options.
If Sundiag passes, the system is operating properly. If Sundiag fails, identify the problem by first running POST. If all POST tests pass, then run the SunDiagnostic Executive to isolate the problem.
The SunDiagnostic Executive is an independent operating system that runs exhaustive subsystem tests independently of SunOS. Run the SunDiagnostic Executive if all POST tests pass, in order to determine which field-replaceable unit needs to be replaced. For information on POST, see "Power-On Self-Test Detailed Description" earlier in this chapter. The SunDiagnostic Executive is described in the SunDiagnostic Executive User's Guide for SPARCstations.
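The order in which this chapter recommends using the tools (Sundiag, then POST, then the SunDiagnostic Executive) can be sketched as a simple decision function in Python. The function name and returned strings are hypothetical, and the POST-failure branch is inferred from the LED code section earlier in the chapter rather than stated as a procedure.

```python
from typing import Optional


def next_diagnostic(sundiag_passed: bool,
                    post_passed: Optional[bool] = None) -> str:
    """Suggest the next troubleshooting step (illustrative sketch only)."""
    if sundiag_passed:
        return "none: the system is operating properly"
    if post_passed is None:
        # Sundiag failed and POST has not been run yet.
        return "run POST"
    if not post_passed:
        # Inferred step: use the keyboard LED code to find the failed FRU.
        return "look up the LED code in Table 2-4"
    # POST passed, so the fault needs more exhaustive subsystem tests.
    return "run the SunDiagnostic Executive"


print(next_diagnostic(sundiag_passed=False))
print(next_diagnostic(sundiag_passed=False, post_passed=True))
```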