CHAPTER 6

Diagnostic Tools

The Sun Fire V490 server and its accompanying software contain many tools and features that help you:

This chapter introduces the tools that let you accomplish these goals, and helps you to understand how the various tools fit together.

Topics in this chapter include:

If you only want instructions for using diagnostic tools, skip this chapter and turn to Part Three of this manual. There, you can find chapters that tell you how to isolate failed parts (Chapter 10), monitor the system (Chapter 11), and exercise the system (Chapter 12).


About the Diagnostic Tools

Sun provides a wide spectrum of diagnostic tools for use with the Sun Fire V490 server. These tools range from the formal, like Sun's comprehensive Validation Test Suite (SunVTS), to the informal, like log files that may contain clues helpful in narrowing down the possible sources of a problem.

The diagnostic tool spectrum also ranges from standalone software packages, to firmware-based power-on self-tests (POST), to hardware LEDs that tell you when the power supplies are operating.

Some diagnostic tools enable you to examine many computers from a single console; others do not. Some diagnostic tools stress the system by running tests in parallel, while other tools run sequential tests, enabling the machine to continue its normal functions. Some diagnostic tools function even when power is absent or the machine is out of commission, while others require the operating system to be up and running.

The full palette of tools discussed in this manual is summarized in TABLE 6-1.


TABLE 6-1 Summary of Diagnostic Tools

Diagnostic Tool | Type | What It Does | Accessibility and Availability | Remote Capability
----------------|------|--------------|--------------------------------|------------------
LEDs | Hardware | Indicate status of overall system and particular components | Accessed from system chassis. Available anytime power is available | Local, but can be viewed via SC
POST | Firmware | Tests core components of system | Runs automatically on startup. Available when the operating system is not running | Local, but can be viewed via SC
OpenBoot Diagnostics | Firmware | Tests system components, focusing on peripherals and I/O devices | Runs automatically or interactively. Available when the operating system is not running | Local, but can be viewed via SC
OpenBoot commands | Firmware | Display various kinds of system information | Available whether or not the operating system is running | Local, but can be accessed via SC
Solaris commands | Software | Display various kinds of system information | Requires operating system | Local, but can be accessed via SC
SunVTS | Software | Exercises and stresses the system, running tests in parallel | Requires operating system. Optional package may need to be installed | View and control over network
SC card and RSC software | Hardware and software | Monitors environmental conditions, performs basic fault isolation, and provides remote console access | Can function on standby power and without operating system | Designed for remote access
Sun Management Center | Software | Monitors both hardware environmental conditions and software performance of multiple machines. Generates alerts for various conditions | Requires operating system to be running on both monitored and master servers. Requires a dedicated database on the master server | Designed for remote access
Hardware Diagnostic Suite | Software | Exercises an operational system by running sequential tests. Also reports failed FRUs | Separately purchased optional add-on to Sun Management Center. Requires operating system and Sun Management Center | Designed for remote access


Why are there so many different diagnostic tools?

There are a number of reasons for the lack of a single all-in-one diagnostic test, starting with the complexity of the server systems.

Consider the data bus built into every Sun Fire V490 server. This bus features a five-way switch called a CDX that interconnects all processors and high-speed I/O interfaces (refer to FIGURE 6-1). This data switch enables multiple simultaneous transfers over its private data paths. This sophisticated high-speed interconnect represents just one facet of the Sun Fire V490 server's advanced architecture.


FIGURE 6-1 Simplified Schematic View of a Sun Fire V490 System

This illustration presents a simplified schematic view of a Sun Fire V490 system


Consider also that some diagnostics must function even when the system fails to start. Any diagnostic capable of isolating problems when the system fails to start up must be independent of the operating system. But any diagnostic that is independent of the operating system will also be unable to make use of the operating system's considerable resources for getting at the more complex causes of failures.

Another complicating factor is that different installations have different diagnostic requirements. You may be administering a single computer or a whole data center full of equipment racks. Alternatively, your systems may be deployed remotely, perhaps in areas that are physically inaccessible.

Finally, consider the different tasks you expect to perform with your diagnostic tools:

Not every diagnostic tool can be optimized for all these varied tasks.

Instead of one unified diagnostic tool, Sun provides a palette of tools each of which has its own specific strengths and applications. To appreciate how each tool fits into the larger picture, it is necessary to have some understanding of what happens when the server starts up, during the so-called boot process.


About Diagnostics and the Boot Process

You have probably had the experience of powering on a Sun system and watching as it goes through its boot process. Perhaps you have watched as your console displays messages that look like the following:


0:0>
0:0>@(#) Sun Fire[TM] V480/V490 POST 4.15 2004/04/09 16:27 
0:0>Copyright © 2004 Sun Microsystems, Inc. All rights reserved
  SUN PROPRIETARY/CONFIDENTIAL.
  Use is subject to license terms.
0:0>Jump from OBP->POST.
0:0>Diag level set to MIN.
0:0>Verbosity level set to NORMAL.
0:0>
0:0>Start selftest...
0:0>CPUs present in system: 0:0 1:0 2:0 3:0
0:0>Test CPU(s)....Done

It turns out these messages are not quite so inscrutable once you understand the boot process. These kinds of messages are discussed later.

It is important to understand that almost all of the firmware-based diagnostics can be disabled so as to minimize the amount of time it takes the server to start up. In the following discussion, assume that the system is configured to run its firmware-based tests.

Prologue: System Controller Boot

As soon as you plug the Sun Fire V490 server into an electrical outlet, and before you turn on power to the server, the system controller (SC) inside the server begins its self-diagnostic and boot cycle. During this time, the locator LED blinks. Running off standby power, the system controller card begins functioning before the server itself comes up.

The system controller provides access to a number of control and monitoring functions through Remote System Control (RSC) software. For more information about RSC software, refer to Sun Remote System Control Software.

Stage One: OpenBoot Firmware and POST

Every Sun Fire V490 server includes a chip holding about 2 Mbytes of firmware-based code. This chip is called the Boot PROM. After you turn on system power, the first thing the system does is execute code that resides in the Boot PROM.

This code, which is referred to as the OpenBoot firmware, is a small-scale operating system unto itself. However, unlike a traditional operating system that can run multiple applications for multiple simultaneous users, OpenBoot firmware runs in single-user mode and is designed solely to test, configure, and boot the system, thereby ensuring that the hardware is sufficiently "healthy" to run its normal operating system software.

When system power is turned on, the OpenBoot firmware begins running directly out of the Boot PROM, since at this stage system memory has not been verified to work properly.

Soon after power is turned on, the system hardware determines that at least one processor is powered on, and is submitting a bus access request, which indicates that the processor in question is at least partly functional. This becomes the master processor, and is responsible for executing OpenBoot firmware instructions.

The OpenBoot firmware's first actions are to check whether to run the power-on self-test (POST) diagnostics and other tests. The POST diagnostics constitute a separate chunk of code stored in a different area of the Boot PROM (refer to FIGURE 6-2).


FIGURE 6-2 Boot PROM and IDPROM

This illustration presents a block diagram view of the Boot PROM and IDPROM


The extent of these power-on self-tests, and whether they are performed at all, is controlled by configuration variables stored in a separate firmware memory device called the IDPROM. These OpenBoot configuration variables are discussed in Controlling POST Diagnostics.

As soon as POST diagnostics can verify that some subset of system memory is functional, tests are loaded into system memory.

The Purpose of POST Diagnostics

The POST diagnostics verify the core functionality of the system. A successful execution of the POST diagnostics does not ensure that there is nothing wrong with the server, but it does ensure that the server can proceed to the next stage of the boot process.

For a Sun Fire V490 server, this means:

It is possible for a system to pass all POST diagnostics and still be unable to boot the operating system. However, you can run POST diagnostics even when a system fails to boot, and these tests are likely to disclose the source of most hardware problems.

POST generally reports errors that are persistent in nature. To catch intermittent problems, consider running a system exercising tool. Refer to About Exercising the System.

What POST Diagnostics Do

Each POST diagnostic is a low-level test designed to pinpoint faults in a specific hardware component. For example, individual memory tests called address bitwalk and data bitwalk ensure that binary 0s and 1s can be written on each address and data line. During such a test, the POST may display output similar to this:


1:0>Data Bitwalk on Slave 3
1:0>	Test Bank 0.

In this example, processor 1 is the master processor, as indicated by the prompt 1:0>, and it is about to test the memory associated with processor 3, as indicated by the message "Slave 3."



Note - The x:y numbering system identifies processors that have multiple cores.



The failure of such a test reveals precise information about particular integrated circuits, the memory registers inside them, or the data paths connecting them:


1:0>ERROR: TEST = Data Bitwalk on Slave 3
1:0>H/W under test = CPU3 Memory
1:0>MSG = ERROR:	miscompare on mem test!
	Address: 00000030.001b0038
	Expected: 00000000.00100000
	Observed: 00000000.00000000
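
To see which data line a miscompare like this points to, you can compare the expected and observed values bit by bit. The following Python sketch is only an illustration and is not part of any Sun tool. It assumes that the dot in each value separates the upper and lower 32-bit halves of a 64-bit word; the function names are hypothetical.

```python
def parse_post_word(text):
    """Convert a POST value such as '00000000.00100000' to an integer.

    Assumption: the dot separates the upper and lower 32-bit halves.
    """
    return int(text.replace(".", ""), 16)


def differing_bits(expected, observed):
    """Return the bit positions (0 = least significant) that differ."""
    diff = parse_post_word(expected) ^ parse_post_word(observed)
    return [bit for bit in range(diff.bit_length()) if (diff >> bit) & 1]


# Values taken from the Data Bitwalk error shown above.
print(differing_bits("00000000.00100000", "00000000.00000000"))  # prints [20]
```

Under that assumption, the single differing position corresponds to the one bit that the bitwalk test expected to read back as 1.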

What POST Error Messages Tell You

When a specific power-on self-test discloses an error, it reports different kinds of information about the error:

Here is an excerpt of POST output showing another error message.

CODE EXAMPLE 6-1 POST Error Message

0:0>Schizo unit 1 PCI DMA C test
0:0>	FAILED
0:0>ERROR: TEST = Schizo unit 1 PCI DMA C test
0:0>H/W under test = Motherboard/Centerplane Schizo 1, I/O Board, CPU
0:0>MSG = 
0:0>	Schizo Error - 16bit Data miss compare
0:0>	address  0000060300012800
0:0>	expected 0001020304050607 
0:0>	observed 0000000000000000
0:0>END_ERROR

Identifying FRUs

An important feature of POST error messages is the H/W under test line, shown in CODE EXAMPLE 6-1.

The H/W under test line indicates which FRU or FRUs may be responsible for the error. Note that in CODE EXAMPLE 6-1, three different FRUs are indicated. Using TABLE 6-13 to decode some of the terms, you can see that this POST error was most likely caused by a bad system interconnect circuit (Schizo) on the centerplane. However, the error message also indicates that the PCI riser board (I/O board) may be at fault. In the least likely case, the error might stem from the master processor, in this case processor 0.
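
If you capture POST output in a log, a short script can pull out the candidate FRUs. The Python sketch below is a hypothetical helper, not a Sun utility. It assumes the message layout shown in CODE EXAMPLE 6-1, with a processor prompt ending in > followed by the message text, and it returns the FRUs in the order listed, which in this example runs from most to least likely.

```python
POST_ERROR = """\
0:0>ERROR: TEST = Schizo unit 1 PCI DMA C test
0:0>H/W under test = Motherboard/Centerplane Schizo 1, I/O Board, CPU
0:0>END_ERROR
"""


def suspect_frus(post_output):
    """Return the FRUs named on the first 'H/W under test' line."""
    prefix = "H/W under test ="
    for line in post_output.splitlines():
        # Drop the processor prompt (for example '0:0>') before matching.
        _, _, message = line.partition(">")
        message = message.strip()
        if message.startswith(prefix):
            return [item.strip() for item in message[len(prefix):].split(",")]
    return []


print(suspect_frus(POST_ERROR))
```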

Why a POST Error May Implicate Multiple FRUs

Because each test operates at such a low level, the POST diagnostics are often more definite in reporting the minute details of the error, like the numerical values of expected and observed results, than they are about reporting which FRU is responsible. If this seems counter-intuitive, consider the block diagram of one data path within a Sun Fire V490 server, shown in FIGURE 6-3.


FIGURE 6-3 POST Diagnostic Running Across FRUs

This illustration depicts the FRU boundaries in a single data path within a Sun Fire V490 server


The dashed lines in FIGURE 6-3 represent boundaries between FRUs. Suppose a POST diagnostic is running in the processor in the left part of the diagram. This diagnostic attempts to initiate a built-in self-test in a PCI device located in the right side of the diagram.

If this built-in self-test fails, there could be a fault in the PCI controller, or, less likely, in one of the data paths or components leading to that PCI controller. The POST diagnostic can tell you only that the test failed, but not why. So, though the POST may present very precise data about the nature of the test failure, any of three different FRUs could be implicated.

Controlling POST Diagnostics

You control POST diagnostics (and other aspects of the boot process) by setting OpenBoot configuration variables in the IDPROM. Changes to OpenBoot configuration variables generally take effect only after the machine is restarted. These variables affect OpenBoot Diagnostics tests as well as POST diagnostics.

TABLE 6-2 lists the most important and useful of these variables. You can find more extensive lists and descriptions in OpenBoot PROM Enhancements for Diagnostic Operation and OpenBoot 4.x Command Reference Manual. The former is included on the Sun Fire V490 Documentation CD. The latter is included with the Solaris Software Supplement CD that ships with Solaris software.

You can find instructions for changing OpenBoot configuration variables in How to View and Set OpenBoot Configuration Variables.


TABLE 6-2 OpenBoot Configuration Variables

OpenBoot Configuration Variable

Description and Keywords

auto-boot?

Determines whether the operating system automatically starts up. Default is true.

  • true--Operating system automatically starts once firmware tests finish.
  • false--System remains at ok prompt until you type boot.

auto-boot-on-error?

Determines whether the system attempts to boot after a nonfatal error. Default is true.

  • true--System automatically boots after a nonfatal error if the variable auto-boot? is also set to true.
  • false--System remains at the ok prompt.

diag-level

Determines the level or type of diagnostics executed. Default is max.

  • off--No testing.
  • min--Only basic tests are run.
  • max--More extensive tests may be run, depending on the device.

diag-out-console

Redirects diagnostic and console messages to the system controller. Default is false.

  • true--Display diagnostic messages via the SC console.
  • false--Display diagnostic messages via the serial port ttya or a graphics terminal.

diag-script

Determines which devices are tested by OpenBoot Diagnostics. Default is normal.

  • none--No devices are tested.
  • normal--On-board (centerplane-based) devices that have self-tests are tested.
  • all--All devices that have self-tests are tested.

diag-switch?

Controls diagnostic execution in normal mode. Default is false.

  • true--Diagnostics are only executed on power-on reset events, but the level of test coverage, verbosity, and output is determined by user-defined settings.
  • false--Diagnostics are executed upon next system reset, but only for those classes of reset events specified by the OpenBoot configuration variable diag-trigger. The level of test coverage, verbosity, and output is determined by user-defined settings.

Note: The above behaviors only apply to server machines like the Sun Fire V490 server. Workstations behave differently. For details, refer to OpenBoot PROM Enhancements for Diagnostic Operation.

diag-trigger

Specifies the class of reset event that causes diagnostic tests to run. This variable can accept single keywords as well as combinations of the first three keywords separated by spaces. For details, refer to How to View and Set OpenBoot Configuration Variables. Default is power-on-reset and error-reset.

  • error-reset--Reset that is caused by certain hardware error events such as RED State Exception Reset, Watchdog Reset, Software-Instruction Reset, or Hardware Fatal Reset.
  • power-on-reset--Reset that is caused by power cycling the system.
  • user-reset--Reset that is initiated by an operating system panic or by user-initiated commands from OpenBoot (reset-all or boot) or from Solaris (reboot, shutdown, or init).
  • all-resets--Any kind of system reset.
  • none--No power-on self-tests or OpenBoot Diagnostics tests run.

input-device

Selects where console input is taken from. Default is keyboard.

  • ttya--From built-in serial port.
  • keyboard--From attached keyboard that is part of a graphics terminal.
  • rsc-console--From the system controller.

Note: Should the specified input device be unavailable, the system automatically reverts to ttya.

output-device

Selects where diagnostic and other console output is displayed. Default is screen.

  • ttya--To built-in serial port.
  • screen--To attached screen that is part of a graphics terminal.
  • rsc-console--To the system controller.

Note: POST messages cannot be displayed on a graphics terminal. They are sent to ttya even when output-device is set to screen. Should the specified output device be unavailable, the system automatically reverts to ttya.

service-mode?

Controls whether the system is in service mode. Default is false.

  • true--Service mode. Diagnostics are executed at Sun-specified levels, overriding but preserving user settings.
  • false--Normal mode, unless overridden by the system control switch. Diagnostics execution depends entirely on the settings of diag-switch? and other user-defined OpenBoot configuration variables.

Note: If the system control switch is in Diagnostics position, the system will boot in service mode even if the service-mode? variable is false.
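
As an illustration of how two of the variables in TABLE 6-2 interact, the following Python sketch models the auto-boot? and auto-boot-on-error? descriptions as a truth function. It is a reading of the table only, not firmware code. The function name is hypothetical, and fatal errors are deliberately left out because the table does not describe them.

```python
def boots_automatically(auto_boot, auto_boot_on_error, nonfatal_error=False):
    """Return True if the operating system starts without you typing boot.

    auto_boot          -- value of the auto-boot? variable
    auto_boot_on_error -- value of the auto-boot-on-error? variable
    nonfatal_error     -- True if firmware tests reported a nonfatal error
    """
    if not auto_boot:
        # auto-boot? false: system remains at the ok prompt.
        return False
    if nonfatal_error:
        # After a nonfatal error, both variables must be true.
        return auto_boot_on_error
    return True


print(boots_automatically(True, False, nonfatal_error=True))  # prints False
```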


Stage Two: OpenBoot Diagnostics Tests

Once POST diagnostics have finished running, POST reports back to the OpenBoot firmware the status of each test it has run. Control then returns to the OpenBoot firmware code.

OpenBoot firmware code compiles a hierarchical "census" of all devices in the system. This census is called a device tree. Though different for every system configuration, the device tree generally includes both built-in system components and optional PCI bus devices.

Following the successful execution of POST diagnostics, the OpenBoot firmware proceeds to run OpenBoot Diagnostics tests. Like the POST diagnostics, OpenBoot Diagnostics code is firmware-based and resides in the Boot PROM.

What Are OpenBoot Diagnostics Tests For?

OpenBoot Diagnostics tests focus on system I/O and peripheral devices. Any device in the device tree, regardless of manufacturer, that includes an IEEE 1275-compatible self-test is included in the suite of OpenBoot Diagnostics tests. On a Sun Fire V490 server, OpenBoot Diagnostics test the following system components:

By default, the OpenBoot Diagnostics tests run automatically via a script when you start up the system. However, you can also run OpenBoot Diagnostics tests manually, as explained in the next section.

Controlling OpenBoot Diagnostics Tests

When you restart the system, you can run OpenBoot Diagnostics tests either interactively from a test menu, or by entering commands directly from the ok prompt.

Most of the same OpenBoot configuration variables you use to control POST (refer to TABLE 6-2) also affect OpenBoot Diagnostics tests. Notably, you can determine OpenBoot Diagnostics testing level--or suppress testing entirely--by appropriately setting the diag-level variable.

In addition, the OpenBoot Diagnostics tests use a special variable called test-args that enables you to customize how the tests operate. By default, test-args is set to contain an empty string. However, you can set test-args to one or more of the reserved keywords, each of which has a different effect on OpenBoot Diagnostics tests. TABLE 6-3 lists the available keywords.


TABLE 6-3 Keywords for the test-args OpenBoot Configuration Variable

Keyword | What It Does
--------|-------------
bist | Invokes built-in self-test (BIST) on external and peripheral devices
debug | Displays all debug messages
iopath | Verifies bus/interconnect integrity
loopback | Exercises external loopback path for the device
media | Verifies external and peripheral device media accessibility
restore | Attempts to restore original state of the device if the previous execution of the test failed
silent | Displays only errors rather than the status of each test
subtests | Displays main test and each subtest that is called
verbose | Displays detailed messages of status of all tests
callers=N | Displays backtrace of N callers when an error occurs. callers=0 displays backtrace of all callers before the error
errors=N | Continues executing the test until N errors are encountered. errors=0 displays all error reports without terminating testing

If you want to make multiple customizations to the OpenBoot Diagnostics testing, you can set test-args to a comma-separated list of keywords, as in this example:


ok setenv test-args debug,loopback,media
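
If you generate these settings from a script, for example to paste into a console session, it can help to check the keywords first. The Python sketch below is a hypothetical helper that validates keywords against TABLE 6-3 and produces the comma-separated form shown above. The function names are assumptions for illustration; the firmware itself does not provide them.

```python
# Keywords from TABLE 6-3 that take no value.
SIMPLE_KEYWORDS = {
    "bist", "debug", "iopath", "loopback", "media",
    "restore", "silent", "subtests", "verbose",
}
# Keywords from TABLE 6-3 that take a numeric value (keyword=N).
COUNT_KEYWORDS = {"callers", "errors"}


def build_test_args(*keywords):
    """Validate keywords and join them into a test-args value."""
    for keyword in keywords:
        name, equals, value = keyword.partition("=")
        if equals:
            if name not in COUNT_KEYWORDS or not value.isdigit():
                raise ValueError("bad test-args keyword: " + keyword)
        elif name not in SIMPLE_KEYWORDS:
            raise ValueError("bad test-args keyword: " + keyword)
    return ",".join(keywords)


def setenv_command(*keywords):
    """Return the command line to type at the ok prompt."""
    return "setenv test-args " + build_test_args(*keywords)


print(setenv_command("debug", "loopback", "media"))
```

The printed string matches the command in the example above; an unknown keyword raises ValueError instead of producing a command.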

From the OpenBoot Diagnostics Test Menu

It is easiest to run OpenBoot Diagnostics tests interactively from a menu. You access the menu by typing obdiag at the ok prompt. Refer to How to Isolate Faults Using Interactive OpenBoot Diagnostics Tests for full instructions.

The obdiag> prompt and the OpenBoot Diagnostics interactive menu (FIGURE 6-4) appear. For a brief explanation of each OpenBoot Diagnostics test, refer to TABLE 6-10 in Reference for OpenBoot Diagnostics Test Descriptions.


FIGURE 6-4 OpenBoot Diagnostics Interactive Test Menu

This figure depicts the interactive OpenBoot Diagnostics test menu for a Sun Fire V490 server


Interactive OpenBoot Diagnostics Commands

You run individual OpenBoot Diagnostics tests from the obdiag> prompt by typing:


obdiag> test n

where n represents the number associated with a particular menu item.

There are several other commands available to you from the obdiag> prompt. For descriptions of these commands, refer to TABLE 6-11 in Reference for OpenBoot Diagnostics Test Descriptions.

You can obtain a summary of this same information by typing help at the obdiag> prompt.

From the ok Prompt: The test and test-all Commands

You can also run OpenBoot Diagnostics tests directly from the ok prompt. To do this, type the test command, followed by the full hardware path of the device (or set of devices) to be tested. For example:


ok test /pci@x,y/SUNW,qlc@2



Note - Knowing how to construct an appropriate hardware device path requires precise knowledge of the hardware architecture of the Sun Fire V490 system.



To customize an individual test, you can use test-args as follows:


ok test /usb@1,3:test-args={verbose,debug}

This affects only the current test without changing the value of the test-args OpenBoot configuration variable.

You can test all the devices in the device tree with the test-all command:


ok test-all

If you specify a path argument to test-all, then only the specified device and its children are tested. The following example shows the command to test the USB bus and all connected devices with self-tests:


ok test-all /pci@9,700000/usb@1,3

What OpenBoot Diagnostics Error Messages Tell You

OpenBoot Diagnostics error results are reported in a tabular format that contains a short summary of the problem, the hardware device affected, the subtest that failed, and other diagnostic information. CODE EXAMPLE 6-2 displays a sample OpenBoot Diagnostics error message.

CODE EXAMPLE 6-2 OpenBoot Diagnostics Error Message

Testing /pci@9,700000/ebus@1/rsc-control@1,3062f8 
 
   ERROR   : SC card is not present in system, or SC card is broken.
   DEVICE  : /pci@9,700000/ebus@1/rsc-control@1,3062f8
   SUBTEST : selftest
   CALLERS : main 
   MACHINE : Sun Fire V490
   SERIAL# : 705459 
   DATE    : 11/28/2001 14:46:21  GMT 
   CONTR0LS: diag-level=min test-args=media,verbose,subtests
 
Error: /pci@9,700000/ebus@1/rsc-control@1,3062f8 selftest failed, return code = 1
Selftest at /pci@9,700000/ebus@1/rsc-control@1,3062f8 (errors=1) ...... failed
Pass:1 (of 1) Errors:1 (of 1) Tests Failed:1 Elapsed Time: 0:0:0:0
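
Because the error report uses a regular LABEL : value layout, it is straightforward to extract the fields from captured console output. The following Python sketch is one possible approach, not a Sun tool. It assumes that report fields are indented and have uppercase labels, as in CODE EXAMPLE 6-2, and it ignores the unindented summary lines that follow the report.

```python
import re

OBDIAG_REPORT = """\
Testing /pci@9,700000/ebus@1/rsc-control@1,3062f8

   ERROR   : SC card is not present in system, or SC card is broken.
   DEVICE  : /pci@9,700000/ebus@1/rsc-control@1,3062f8
   SUBTEST : selftest
   DATE    : 11/28/2001 14:46:21  GMT

Error: /pci@9,700000/ebus@1/rsc-control@1,3062f8 selftest failed, return code = 1
"""

# An indented uppercase label, a colon, then the value up to end of line.
FIELD = re.compile(r"^[ \t]+([A-Z0-9#]+)[ \t]*:[ \t]*(.*?)[ \t]*$", re.MULTILINE)


def parse_obdiag_report(text):
    """Return the indented LABEL : value fields as a dictionary."""
    return dict(FIELD.findall(text))


fields = parse_obdiag_report(OBDIAG_REPORT)
print(fields["DEVICE"], fields["SUBTEST"])
```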

I2C Bus Device Tests

The i2c@1,2e and i2c@1,30 OpenBoot Diagnostics tests examine and report on environmental monitoring and control devices connected to the Sun Fire V490 server's Inter-IC (I2C) bus.

Error and status messages from the i2c@1,2e and i2c@1,30 OpenBoot Diagnostics tests include the hardware addresses of I2C bus devices:


Testing /pci@9,700000/ebus@1/i2c@1,2e/fru@2,a8

The I2C device address is given at the very end of the hardware path. In this example, the address is 2,a8, which indicates a device located at hexadecimal address A8 on segment 2 of the I2C bus.
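
The same decoding can be sketched in a few lines of Python. This is an illustration only. It assumes that both numbers after the @ sign are hexadecimal, which the text confirms for the address but not explicitly for the segment; the function name is hypothetical.

```python
def decode_i2c_address(device_path):
    """Split the last path component into (name, segment, address)."""
    last = device_path.rstrip("/").rsplit("/", 1)[-1]
    name, at_sign, unit = last.partition("@")
    segment_text, comma, address_text = unit.partition(",")
    if not at_sign or not comma:
        raise ValueError("no segment,address pair in: " + device_path)
    # Assumption: both fields are hexadecimal.
    return name, int(segment_text, 16), int(address_text, 16)


name, segment, address = decode_i2c_address(
    "/pci@9,700000/ebus@1/i2c@1,2e/fru@2,a8"
)
print(name, segment, hex(address))  # prints: fru 2 0xa8
```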

To decode this device address, refer to Reference for Decoding I2C Diagnostic Test Messages. Using TABLE 6-12, you can see that fru@2,a8 corresponds to an I2C device on DIMM 4 on processor 2. If the i2c@1,2e test were to report an error against fru@2,a8, you would need to replace this memory module.

Other OpenBoot Commands

Beyond the formal firmware-based diagnostic tools, there are a few commands you can invoke from the ok prompt. These OpenBoot commands display information that can help you assess the condition of a Sun Fire V490 server. These include the following commands:

This section describes the information these commands give you. For instructions on using these commands, turn to How to Use OpenBoot Information Commands, or look up the appropriate man page.

.env Command

The .env command displays the current environmental status, including fan speeds as well as voltages, currents, and temperatures measured at various system locations. For more information, refer to About OpenBoot Environmental Monitoring and How to Obtain OpenBoot Environmental Status Information.

printenv Command

The printenv command displays the OpenBoot configuration variables. The display includes the current values for these variables as well as the default values. For details, refer to How to View and Set OpenBoot Configuration Variables.

For more information about printenv, refer to the printenv man page. For a list of some important OpenBoot configuration variables, refer to TABLE 6-2.

probe-scsi and probe-scsi-all Commands

The probe-scsi and probe-scsi-all commands check the presence of SCSI or FC-AL devices and verify that the bus itself is operating properly.



Caution - If you used the halt command or the Stop-A key sequence to reach the ok prompt, then issuing the probe-scsi or probe-scsi-all command can hang the system.



The probe-scsi command communicates with all SCSI and FC-AL devices connected to on-board SCSI and FC-AL controllers. The probe-scsi-all command additionally accesses devices connected to any host adapters installed in PCI slots.

For any SCSI or FC-AL device that is connected and active, the probe-scsi and probe-scsi-all commands display its loop ID, host adapter, logical unit number, unique World Wide Name (WWN), and a device description that includes type and manufacturer.

The following is sample output from the probe-scsi command.

CODE EXAMPLE 6-3 probe-scsi Command Output

ok probe-scsi
LiD HA LUN  --- Port WWN ---  ----- Disk description ----- 
 0   0   0  2100002037cdaaca  SEAGATE ST336704FSUN36G 0726
 1   1   0  2100002037a9b64e  SEAGATE ST336704FSUN36G 0726
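
If you save this output to a file, the columns can be turned into records for comparison against an inventory. The Python sketch below is a hypothetical parser based only on the layout in CODE EXAMPLE 6-3. It assumes a 16-digit hexadecimal WWN in the fourth column and keeps every field as text rather than guessing at number bases.

```python
import re

PROBE_SCSI_OUTPUT = """\
LiD HA LUN  --- Port WWN ---  ----- Disk description -----
 0   0   0  2100002037cdaaca  SEAGATE ST336704FSUN36G 0726
 1   1   0  2100002037a9b64e  SEAGATE ST336704FSUN36G 0726
"""

# Loop ID, host adapter, LUN, 16-digit WWN, then free-form description.
ROW = re.compile(
    r"^\s*([0-9a-f]+)\s+([0-9a-f]+)\s+([0-9a-f]+)\s+([0-9a-f]{16})\s+(.*\S)\s*$"
)


def parse_probe_scsi(text):
    """Return one dictionary per device row; other lines are skipped."""
    devices = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            keys = ("loop_id", "host_adapter", "lun", "wwn", "description")
            devices.append(dict(zip(keys, match.groups())))
    return devices


for device in parse_probe_scsi(PROBE_SCSI_OUTPUT):
    print(device["wwn"], device["description"])
```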

The following is sample output from the probe-scsi-all command.

CODE EXAMPLE 6-4 probe-scsi-all Command Output

ok probe-scsi-all
/pci@9,600000/SUNW,qlc@2
LiD HA LUN  --- Port WWN ---  ----- Disk description ----- 
 0   0   0  2100002037cdaaca  SEAGATE ST336704FSUN36G 0726
 1   1   0  2100002037a9b64e  SEAGATE ST336704FSUN36G 0726
 
/pci@8,600000/scsi@1,1
Target 4 
  Unit 0   Disk     SEAGATE ST32550W SUN2.1G0418
 
/pci@8,600000/scsi@1
 
/pci@8,600000/pci@2/SUNW,qlc@5
 
/pci@8,600000/pci@2/SUNW,qlc@4
LiD HA LUN  --- Port WWN ---  ----- Disk description ----- 
 0   0   0  2200002037cdaaca  SEAGATE ST336704FSUN36G 0726
 1   1   0  2200002037a9b64e  SEAGATE ST336704FSUN36G 0726

Note that the probe-scsi-all command lists dual-ported devices twice. This is because these FC-AL devices (refer to the qlc@2 entry in CODE EXAMPLE 6-4) can be accessed through two separate controllers: the on-board Loop-A controller and the optional Loop-B controller provided through a PCI card.

probe-ide Command

The probe-ide command communicates with all Integrated Drive Electronics (IDE) devices connected to the IDE bus. This is the internal system bus for media devices such as the DVD drive.



Caution - If you used the halt command or the Stop-A key sequence to reach the ok prompt, then issuing the probe-ide command can hang the system.



The following is sample output from the probe-ide command.

CODE EXAMPLE 6-5 probe-ide Command Output

ok probe-ide
  Device 0  ( Primary Master ) 
         Removable ATAPI Model: TOSHIBA DVD-ROM SD-C2512                
 
  Device 1  ( Primary Slave ) 
         Not Present

show-devs Command

The show-devs command lists the hardware device paths for each device in the firmware device tree. CODE EXAMPLE 6-6 shows some sample output (edited for brevity).

CODE EXAMPLE 6-6 show-devs Command Output

/pci@9,600000
/pci@9,700000
/pci@8,600000
/pci@8,700000
/memory-controller@3,400000
/SUNW,UltraSPARC-IV@3,0
/memory-controller@1,400000
/SUNW,UltraSPARC-IV@1,0
/virtual-memory
/memory@m0,20
/pci@9,600000/SUNW,qlc@2
/pci@9,600000/network@1
/pci@9,600000/SUNW,qlc@2/fp@0,0
/pci@9,600000/SUNW,qlc@2/fp@0,0/disk
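
Because show-devs prints a flat list, the parent and child relationships are easier to see if you rebuild the hierarchy. The following Python sketch shows one way to do that with nested dictionaries, using a few of the paths from CODE EXAMPLE 6-6. The function names are hypothetical, and the sketch treats the paths purely as text.

```python
def build_device_tree(paths):
    """Turn a list of device paths into nested dictionaries."""
    tree = {}
    for path in paths:
        node = tree
        for component in path.strip("/").split("/"):
            # Create the child node on first sight, then descend into it.
            node = node.setdefault(component, {})
    return tree


def render(tree, depth=0):
    """Return indented lines, one per device node, sorted by name."""
    lines = []
    for name in sorted(tree):
        lines.append("    " * depth + name)
        lines.extend(render(tree[name], depth + 1))
    return lines


SAMPLE_PATHS = [
    "/pci@9,600000",
    "/pci@9,600000/SUNW,qlc@2",
    "/pci@9,600000/network@1",
    "/pci@9,600000/SUNW,qlc@2/fp@0,0",
    "/pci@9,600000/SUNW,qlc@2/fp@0,0/disk",
]

print("\n".join(render(build_device_tree(SAMPLE_PATHS))))
```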

Stage Three: The Operating System

If a system passes OpenBoot Diagnostics tests, it normally attempts to boot its multiuser operating system. For most Sun systems, this means the Solaris OS. Once the server is running in multiuser mode, you have recourse to software-based diagnostic tools, like SunVTS and Sun Management Center. These tools can help you with more advanced monitoring, exercising, and fault isolating capabilities.



Note - If you set the auto-boot? OpenBoot configuration variable to false, the operating system does not boot automatically following completion of the firmware-based tests.



In addition to the formal tools that run on top of Solaris OS software, there are other resources that you can use when assessing or monitoring the condition of a Sun Fire V490 server. These include:

Error and System Message Log Files

Error and other system messages are saved in the file /var/adm/messages. Messages are logged to this file from many sources, including the operating system, the environmental control subsystem, and various software applications.

For information about /var/adm/messages and other sources of system information, refer to your Solaris system administration documentation.

Solaris System Information Commands

Some Solaris commands display data that you can use when assessing the condition of a Sun Fire V490 server. These include the following commands:

This section describes the information these commands give you. For instructions on using these commands, turn to How to Use Solaris System Information Commands, or look up the appropriate man page.

prtconf Command

The prtconf command displays the Solaris device tree. This tree includes all the devices probed by OpenBoot firmware, as well as additional devices, like individual disks, that only the operating system software "knows" about. The output of prtconf also includes the total amount of system memory. CODE EXAMPLE 6-7 shows an excerpt of prtconf output (edited to save space).

CODE EXAMPLE 6-7 prtconf Command Output

System Configuration:  Sun Microsystems  sun4u
Memory size: 1024 Megabytes
System Peripherals (Software Nodes):
 
SUNW,Sun-Fire-V490
    packages (driver not attached)
        SUNW,builtin-drivers (driver not attached)
...
    SUNW,UltraSPARC-IV (driver not attached)
    memory-controller, instance #3
    pci, instance #0
        SUNW,qlc, instance #5
            fp (driver not attached)
                disk (driver not attached)
...
    pci, instance #2
        ebus, instance #0
            flashprom (driver not attached)
            bbc (driver not attached)
            power (driver not attached)
            i2c, instance #1
                fru, instance #17

The prtconf command's -p option produces output similar to the OpenBoot show-devs command (refer to show-devs Command). This output lists only those devices compiled by the system firmware.
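
If you capture prtconf output in a script, a short parser can pull out the memory size and the device nodes that have a driver attached. The following Python sketch assumes the output layout shown in CODE EXAMPLE 6-7. The function names are invented for this illustration.

```python
import re


def memory_megabytes(prtconf_output):
    """Return the 'Memory size' value in megabytes, or None if absent."""
    match = re.search(r"^Memory size:\s*(\d+)\s+Megabytes",
                      prtconf_output, re.MULTILINE)
    return int(match.group(1)) if match else None


def attached_nodes(prtconf_output):
    """Return (node_name, instance_number) for nodes with an attached driver.

    Nodes marked '(driver not attached)' carry no instance number and
    are skipped.
    """
    pattern = re.compile(r"^\s*(\S+), instance #(\d+)\s*$", re.MULTILINE)
    return [(name, int(number))
            for name, number in pattern.findall(prtconf_output)]
```

To try it, pass either function the text saved from a prtconf run, for example the contents of a file you redirected the command output into.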

prtdiag Command

The prtdiag command displays a table of diagnostic information that summarizes the status of system components.

The display format used by the prtdiag command can vary depending on what version of the Solaris OS is running on your system. Following is an excerpt of some of the output produced by prtdiag on a healthy Sun Fire V490 system running Solaris 8, Update 7.

CODE EXAMPLE 6-8 prtdiag Command Output

System Configuration:  Sun Microsystems  sun4u Sun Fire V490
System clock frequency: 150 MHz
Memory size: 4096 Megabytes
 
========================= CPUs ===============================================
 
          Run   E$    CPU     CPU  
Brd  CPU  MHz   MB   Impl.    Mask 
---  ---  ---  ----  -------  ---- 
 A    0   900  8.0   US-IV    2.1
 A    2   900  8.0   US-IV    2.1
 
========================= Memory Configuration ===============================
 
          Logical  Logical  Logical
     MC   Bank     Bank     Bank         DIMM    Interleave  Interleaved
Brd  ID   num      size     Status       Size    Factor      with
---  ---  ----     ------   -----------  ------  ----------  -----------
 A    0     0       512MB   no_status     256MB     8-way        0
 A    0     1       512MB   no_status     256MB     8-way        0
 A    0     2       512MB   no_status     256MB     8-way        0
 A    0     3       512MB   no_status     256MB     8-way        0
 A    2     0       512MB   no_status     256MB     8-way        0
 A    2     1       512MB   no_status     256MB     8-way        0
 A    2     2       512MB   no_status     256MB     8-way        0
 A    2     3       512MB   no_status     256MB     8-way        0
 
========================= IO Cards =========================
 
                    Bus  Max
 IO  Port Bus       Freq Bus  Dev,
Type  ID  Side Slot MHz  Freq Func State Name                       Model
---- ---- ---- ---- ---- ---- ---- ----- -------------------------  ----------------
PCI   8    B    3    33   33  3,0  ok    TECH-SOURCE,gfxp           GFXP
PCI   8    B    5    33   33  5,1  ok    SUNW,hme-pci108e,1001      SUNW,qsi
# 

In addition to that information, prtdiag with the verbose option (-v) also reports on front panel status, disk status, fan status, power supplies, hardware revisions, and system temperatures.

CODE EXAMPLE 6-9 prtdiag Verbose Output

System Temperatures (Celsius):
-------------------------------
Device           Temperature    Status
---------------------------------------
CPU0             59             OK
CPU2             64             OK
DBP0             22             OK

In the event of an overtemperature condition, prtdiag reports an error in the Status column.

CODE EXAMPLE 6-10 prtdiag Overtemperature Indication Output

System Temperatures (Celsius):
-------------------------------
Device           Temperature    Status
---------------------------------------
CPU0             62             OK
CPU1             102            ERROR
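
A monitoring script can look for this condition automatically. The following Python sketch assumes the three-column temperature listing shown in CODE EXAMPLE 6-10 and treats any status other than OK as a problem. The function names are hypothetical, and the layout may differ on other Solaris releases.

```python
def parse_temperatures(prtdiag_output):
    """Return (device, degrees_celsius, status) for each temperature row.

    A row is recognized as three whitespace-separated fields whose
    middle field is a whole number; headings and rules are ignored.
    """
    readings = []
    for line in prtdiag_output.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[1].isdigit():
            readings.append((fields[0], int(fields[1]), fields[2]))
    return readings


def overtemperature(prtdiag_output):
    """Return only the readings whose Status column is not OK."""
    return [reading for reading in parse_temperatures(prtdiag_output)
            if reading[2] != "OK"]
```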

Similarly, if there is a failure of a particular component, prtdiag reports a fault in the appropriate Status column.

CODE EXAMPLE 6-11 prtdiag Fault Indication Output

Fan Status:
-----------
 
Bank             RPM    Status
----            -----   ------
CPU0             4166   [NO_FAULT]
CPU1             0000   [FAULT]

prtfru Command

The Sun Fire V490 system maintains a hierarchical list of all field-replaceable units (FRUs) in the system, as well as specific information about various FRUs.

The prtfru command can display this hierarchical list, as well as data contained in the serial electrically-erasable programmable read-only memory (SEEPROM) devices located on many FRUs. CODE EXAMPLE 6-12 shows an excerpt of a hierarchical list of FRUs generated by the prtfru command with the -l option.

CODE EXAMPLE 6-12 prtfru -l Command Output

/frutree
/frutree/chassis (fru)
/frutree/chassis/io-board (container)
/frutree/chassis/rsc-board (container)
/frutree/chassis/fcal-backplane-slot

CODE EXAMPLE 6-13 shows an excerpt of SEEPROM data generated by the prtfru command with the -c option.

CODE EXAMPLE 6-13 prtfru -c Command Output

/frutree/chassis/rsc-board (container)
   SEGMENT: SD
      /ManR
      /ManR/UNIX_Timestamp32: Fri Apr 27 00:12:36 EDT 2001
      /ManR/Fru_Description: SC PLAN B
      /ManR/Manufacture_Loc: BENCHMARK,HUNTSVILLE,ALABAMA,USA
      /ManR/Sun_Part_No: 5015856
      /ManR/Sun_Serial_No: 001927
      /ManR/Vendor_Name: AVEX Electronics
      /ManR/Initial_HW_Dash_Level: 02
      /ManR/Initial_HW_Rev_Level: 50
      /ManR/Fru_Shortname: SC

Data displayed by the prtfru command varies depending on the type of FRU. In general, this information includes:

Information about the following Sun Fire V490 FRUs is displayed by the prtfru command:

psrinfo Command

The psrinfo command displays the date and time each processor came online. With the verbose (-v) option, the command displays additional information about the processors, including their clock speed. The following is sample output from the psrinfo command with the -v option.

CODE EXAMPLE 6-14 psrinfo -v Command Output

Status of processor 0 as of: 04/11/03 12:03:45
  Processor has been on-line since 04/11/03 10:53:03.
  The sparcv9 processor operates at 900 MHz,
        and has a sparcv9 floating point processor.
Status of processor 2 as of: 04/11/03 12:03:45
  Processor has been on-line since 04/11/03 10:53:05.
  The sparcv9 processor operates at 900 MHz,
        and has a sparcv9 floating point processor.
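
If you need the clock speeds in a script, the verbose output can be reduced to a table of processor IDs and speeds. The following Python sketch assumes the wording shown in CODE EXAMPLE 6-14. The function name is hypothetical.

```python
import re


def processor_speeds(psrinfo_output):
    """Map each processor ID to its clock speed in MHz."""
    speeds = {}
    current = None
    for line in psrinfo_output.splitlines():
        status = re.match(r"Status of processor (\d+) as of:", line)
        if status:
            # Remember which processor the following lines describe.
            current = int(status.group(1))
            continue
        speed = re.search(r"operates at (\d+) MHz", line)
        if speed and current is not None:
            speeds[current] = int(speed.group(1))
    return speeds
```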

showrev Command

The showrev command displays revision information for the current hardware and software. CODE EXAMPLE 6-15 shows sample output of the showrev command.

CODE EXAMPLE 6-15 showrev Command Output

Hostname: abc-123
Hostid: cc0ac37f
Release: 5.8
Kernel architecture: sun4u
Application architecture: sparc
Hardware provider: Sun_Microsystems
Domain: Sun.COM
Kernel version: SunOS 5.8 cstone_14:08/01/01 2001

When used with the -p option, this command displays installed patches. CODE EXAMPLE 6-16 shows a partial sample output from the showrev command with the -p option.

CODE EXAMPLE 6-16 showrev -p Command Output

Patch: 109729-01 Obsoletes:  Requires:  Incompatibles:  Packages: SUNWcsu
Patch: 109783-01 Obsoletes:  Requires:  Incompatibles:  Packages: SUNWcsu
Patch: 109807-01 Obsoletes:  Requires:  Incompatibles:  Packages: SUNWcsu
Patch: 109809-01 Obsoletes:  Requires:  Incompatibles:  Packages: SUNWcsu
Patch: 110905-01 Obsoletes:  Requires:  Incompatibles:  Packages: SUNWcsu
Patch: 110910-01 Obsoletes:  Requires:  Incompatibles:  Packages: SUNWcsu
Patch: 110914-01 Obsoletes:  Requires:  Incompatibles:  Packages: SUNWcsu
Patch: 108964-04 Obsoletes:  Requires:  Incompatibles:  Packages: SUNWcsr
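
To check whether a particular patch is installed, you can break each line of this output into its fields. The following Python sketch assumes the field order shown in CODE EXAMPLE 6-16. It also assumes that multiple values within a field are separated by commas. The names used here are invented for the illustration.

```python
import re

PATCH_LINE = re.compile(
    r"^Patch:\s*(?P<patch>\S+)\s+"
    r"Obsoletes:(?P<obsoletes>.*?)"
    r"Requires:(?P<requires>.*?)"
    r"Incompatibles:(?P<incompatibles>.*?)"
    r"Packages:(?P<packages>.*)$"
)


def _items(field):
    """Split a comma-separated field into a list, dropping empty entries."""
    return [item.strip() for item in field.split(",") if item.strip()]


def parse_patches(showrev_output):
    """Return one dictionary per 'Patch:' line."""
    patches = []
    for line in showrev_output.splitlines():
        match = PATCH_LINE.match(line.strip())
        if match:
            patches.append({
                "patch": match.group("patch"),
                "obsoletes": _items(match.group("obsoletes")),
                "requires": _items(match.group("requires")),
                "incompatibles": _items(match.group("incompatibles")),
                "packages": _items(match.group("packages")),
            })
    return patches


def is_installed(showrev_output, patch_id):
    """Report whether the given patch ID appears in the output."""
    return any(entry["patch"] == patch_id
               for entry in parse_patches(showrev_output))
```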

Tools and the Boot Process: A Summary

Different diagnostic tools are available to you at different stages of the boot process. TABLE 6-4 summarizes what tools are available to you and when they are available.


TABLE 6-4 Diagnostic Tool Availability

                                          Available Diagnostic Tools
Stage                         Fault Isolation          System Monitoring          System Exercising
----------------------------  -----------------------  -------------------------  ---------------------------
Before the operating system   - LEDs                   - RSC software             -none-
starts                        - POST                   - OpenBoot commands
                              - OpenBoot Diagnostics

After the operating system    - LEDs                   - RSC software             - SunVTS
starts                                                 - Sun Management Center    - Hardware Diagnostic Suite
                                                       - Solaris info commands
                                                       - OpenBoot commands

When the system is down and   -none-                   - RSC software             -none-
power is not available



About Isolating Faults in the System

Each of the tools available for fault isolation discloses faults in different field-replaceable units (FRUs). The row headings along the left of TABLE 6-5 list the FRUs in a Sun Fire V490 system. The available diagnostic tools are shown in column headings across the top. A "Yes" in this table indicates that a fault in a particular FRU can be isolated by a particular diagnostic.


TABLE 6-5 FRU Coverage of Fault Isolating Tools

FRU                     LEDs   POST   OpenBoot Diags
----------------------  ----   ----   --------------
CPU/Memory Boards              Yes
IDPROM                                Yes
DIMMs                          Yes
DVD Drive                             Yes
FC-AL Disk Drive        Yes           Yes
Centerplane                    Yes    Yes
SC Card                               Yes
PCI Riser                      Yes    Yes
FC-AL Disk Backplane                  Yes
Power Supplies          Yes
Fan Tray 0 (CPU)        Yes
Fan Tray 1 (PCI)        Yes

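When planning a troubleshooting session, it can help to treat the coverage in TABLE 6-5 as a lookup. The following Python sketch transcribes the table into a dictionary and answers two questions: which tools can isolate a given FRU, and which FRUs a given tool covers. The variable and function names are invented for this example.

```python
# FRU coverage transcribed from TABLE 6-5, in table order.
FRU_COVERAGE = {
    "CPU/Memory Boards": {"POST"},
    "IDPROM": {"OpenBoot Diags"},
    "DIMMs": {"POST"},
    "DVD Drive": {"OpenBoot Diags"},
    "FC-AL Disk Drive": {"LEDs", "OpenBoot Diags"},
    "Centerplane": {"POST", "OpenBoot Diags"},
    "SC Card": {"OpenBoot Diags"},
    "PCI Riser": {"POST", "OpenBoot Diags"},
    "FC-AL Disk Backplane": {"OpenBoot Diags"},
    "Power Supplies": {"LEDs"},
    "Fan Tray 0 (CPU)": {"LEDs"},
    "Fan Tray 1 (PCI)": {"LEDs"},
}


def tools_for(fru):
    """Return the tools that can isolate a fault in the given FRU."""
    return sorted(FRU_COVERAGE.get(fru, set()))


def frus_for(tool):
    """Return the FRUs, in table order, that the given tool covers."""
    return [fru for fru, tools in FRU_COVERAGE.items() if tool in tools]
```

An empty result from tools_for suggests a component that no diagnostic isolates directly, such as a cable.
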

In addition to the FRUs listed in TABLE 6-5, there are several minor replaceable system components--mostly cables--that cannot directly be isolated by any system diagnostic. For the most part, you determine when these components are faulty by eliminating other possibilities. These FRUs are listed in TABLE 6-6.


TABLE 6-6 FRUs Not Directly Isolated by Diagnostic Tools

FRU

Notes

FC-AL power cable

FC-AL signal cable

If OpenBoot Diagnostics tests indicate a disk problem, but replacing the disk does not fix the problem, you should suspect the FC-AL signal and power cables are either defective or improperly connected.

Fan Tray 0 power cable

If the system is powered on and the fan does not spin, or if the Power/OK LED does not come on, but the system is up and running, you should suspect this cable.

Power distribution board

Any power issue that cannot be traced to the power supplies should lead you to suspect the power distribution board. Particular scenarios include:

  • The system will not power on, but the power supply LEDs indicate DC Present
  • System is running, but RSC indicates a missing power supply

Removable media bay board and cable assembly

If OpenBoot Diagnostics tests indicate a problem with the CD/DVD drive, but replacing the drive does not fix the problem, you should suspect this assembly is either defective or improperly connected.

System control switch/power button cable

If the system control switch and Power button appear unresponsive, you should suspect this cable is loose or defective.



About Monitoring the System

Sun provides two tools that can give you advance warning of difficulties and prevent future downtime. These are Sun Remote System Control (RSC) software and Sun Management Center software.

These monitoring tools let you specify system criteria that bear watching. For instance, you can set a threshold for system temperature and be notified if that threshold is exceeded.

Monitoring the System Using Remote System Control Software

Sun Remote System Control (RSC) software, working in conjunction with the system controller (SC) card, enables you to monitor and control your server over a serial port or a network. RSC software provides both graphical and command-line interfaces for remotely administering geographically distributed or physically inaccessible machines.

You can also redirect the server's system console to the system controller, which lets you remotely run diagnostics (like POST) that would otherwise require physical proximity to the machine's serial port.

The system controller card runs independently, and uses standby power from the server. Therefore, the SC and its RSC software continue to be effective when the server operating system goes offline.

RSC software lets you monitor the following on the Sun Fire V490 server.


TABLE 6-7 What RSC Software Monitors

Item Monitored

What RSC Software Reveals

Disk drives

Whether each slot has a drive present, and whether it reports OK status

Fan trays

Fan speed and whether the fan trays report OK status

CPU/Memory boards

The presence of a CPU/Memory board, the temperature measured at each processor, and any thermal warning or failure conditions

Power supplies

Whether each bay has a power supply present, and whether it reports OK status

System temperature

System ambient temperature as measured at several locations in the system, as well as any thermal warning or failure conditions

Server front panel

System control switch position and status of LEDs


Before you can start using RSC software, you must install and configure it on the server and client systems. Instructions for doing this are given in the Sun Remote System Control (RSC) 2.2 User's Guide, which is included on the Sun Fire V490 Documentation CD.

You also have to make any needed physical connections and set OpenBoot configuration variables that redirect the console output to the system controller. The latter task is described in How to Redirect the System Console to the System Controller.

For instructions on using RSC software to monitor a Sun Fire V490 system, refer to How to Monitor the System Using the System Controller and RSC Software.

Monitoring the System Using Sun Management Center

Sun Management Center software provides enterprise-wide monitoring of Sun servers and workstations, including their subsystems, components, and peripheral devices. The system being monitored must be up and running, and you need to install all the proper software components on various systems in your network.

Sun Management Center lets you monitor the following on the Sun Fire V490 server.


TABLE 6-8 What Sun Management Center Software Monitors

Item Monitored

What Sun Management Center Reveals

Disk drives

Whether each slot has a drive present, and whether it reports OK status

Fan trays

Whether the fan trays report OK status

CPU/Memory boards

The presence of a CPU/Memory board, the temperature measured at each processor, and any thermal warning or failure conditions

Power supplies

Whether each bay has a power supply present, and whether it reports OK status

System temperature

System ambient temperature as measured at several locations in the system, as well as any thermal warning or failure conditions


How Sun Management Center Works

The Sun Management Center product comprises three software entities: agent components, a server component, and monitor components.

You install agents on systems to be monitored. The agents collect system status information from log files, device trees, and platform-specific sources, and report that data to the server component.

The server component maintains a large database of status information for a wide range of Sun platforms. This database is updated frequently, and includes information about boards, tapes, power supplies, and disks as well as operating system parameters like load, resource usage, and disk space. You can create alarm thresholds and be notified when these are exceeded.

The monitor components present the collected data to you in a standard format. Sun Management Center software provides both a standalone Java application and a Web browser-based interface. The Java interface affords physical and logical views of the system for highly intuitive monitoring.

Other Sun Management Center Features

Sun Management Center software provides you with additional tools in the form of an informal tracking mechanism and an optional add-on diagnostics suite. In a heterogeneous computing environment, the product can interoperate with management utilities made by other companies.

Informal Tracking

Sun Management Center agent software must be loaded on any system you want to monitor. However, the product lets you informally track a supported platform even when the agent software has not been installed on it. In this case, you do not have full monitoring capability, but you can add the system to your browser, have Sun Management Center periodically check whether it is up and running, and notify you if it goes out of commission.

Add-On Diagnostic Suite

The Hardware Diagnostic Suite is available as a premium package you can purchase as an add-on to the Sun Management Center product. This suite lets you exercise a system while it is still up and running in a production environment. Refer to Exercising the System Using Hardware Diagnostic Suite for more information.

Interoperability With Third-Party Monitoring Tools

If you administer a heterogeneous network and use a third-party network-based system monitoring or management tool, you may be able to take advantage of Sun Management Center software's support for Tivoli Enterprise Console, BMC Patrol, and HP OpenView.

Who Should Use Sun Management Center?

Sun Management Center software is geared primarily toward system administrators who have large data centers to monitor or other installations that have many computer platforms to monitor. If you administer a more modest installation, you need to weigh Sun Management Center software's benefits against the requirement of maintaining a significant database (typically over 700 Mbytes) of system status information.

The servers being monitored must be up and running if you want to use Sun Management Center, since this tool relies on the Solaris OS. For instructions, refer to How to Monitor the System Using Sun Management Center Software. For detailed information about the product, refer to the Sun Management Center User's Guide.

Obtaining the Latest Information

For the latest information about this product, go to the Sun Management Center Web site at: http://www.sun.com/sunmanagementcenter.


About Exercising the System

It is relatively easy to detect when a system component fails outright. However, when a system has an intermittent problem or seems to be "behaving strangely," a software tool that stresses or exercises the computer's many subsystems can help disclose the source of the emerging problem and prevent long periods of reduced functionality or system downtime.

Sun provides two tools for exercising Sun Fire V490 systems: SunVTS software and the Hardware Diagnostic Suite.

TABLE 6-9 shows the FRUs that each system exercising tool is capable of isolating. Note that individual tools do not necessarily test all the components or paths of a particular FRU.

 

 


TABLE 6-9 FRU Coverage of System Exercising Tools

FRU                     SunVTS   Hardware Diagnostic Suite
----------------------  ------   -------------------------
CPU/Memory Boards       Yes      Yes
IDPROM                  Yes
DIMMs                   Yes      Yes
DVD Drive               Yes      Yes
FC-AL Disk Drive        Yes      Yes
Centerplane             Yes      Yes
SC Card                 Yes
PCI Riser               Yes      Yes
FC-AL Disk Backplane    Yes


Exercising the System Using SunVTS Software

The SunVTS software validation test suite performs system and subsystem stress testing. You can view and control a SunVTS session over a network. Using a remote machine, you can view the progress of a testing session, change testing options, and control all testing features of another machine on the network.

You can run SunVTS software in five different test modes:

Since SunVTS software can run many tests in parallel and consume many system resources, you should take care when using it on a production system. If you are stress-testing a system using SunVTS software's Comprehensive test mode, you should not run anything else on that system at the same time.

The Sun Fire V490 server to be tested must be up and running if you want to use SunVTS software, since it relies on the Solaris operating system. Since SunVTS software packages are optional, they may not be installed on your system. Turn to How to Check Whether SunVTS Software Is Installed for instructions.

It is important to use the most up-to-date version of SunVTS available, to ensure you have the latest suite of tests. To download the most recent SunVTS software, point your Web browser to: http://www.sun.com/oem/products/vts/.

For instructions on running SunVTS software to exercise the Sun Fire V490 server, refer to How to Exercise the System Using SunVTS Software. For more information about the product, refer to:

These documents are available on the Solaris Software Supplement CD and on the Web at: http://docs.sun.com. You should also consult the SunVTS README file located at /opt/SUNWvts/. This document provides late-breaking information about the installed version of the product.

SunVTS Software and Security

During SunVTS software installation, you must choose between Basic or Sun Enterprise Authentication Mechanism (SEAM) security. Basic security uses a local security file in the SunVTS installation directory to limit the users, groups, and hosts permitted to use SunVTS software. SEAM security is based on Kerberos--the standard network authentication protocol--and provides secure user authentication, data integrity, and privacy for transactions over networks.

If your site uses SEAM security, you must have the SEAM client and server software installed in your networked environment and configured properly in both Solaris and SunVTS software. If your site does not use SEAM security, do not choose the SEAM option during SunVTS software installation.

If you enable the wrong security scheme during installation, or if you improperly configure the security scheme you choose, you may find yourself unable to run SunVTS tests. For more information, refer to the SunVTS User's Guide and the instructions accompanying the SEAM software.

Exercising the System Using Hardware Diagnostic Suite

The Sun Management Center product features an optional Hardware Diagnostic Suite, which you can purchase as an add-on. The Hardware Diagnostic Suite is designed to exercise a production system by running tests sequentially.

Sequential testing means the Hardware Diagnostic Suite has a low impact on the system. Unlike SunVTS, which stresses a system by consuming its resources with many parallel tests (refer to Exercising the System Using SunVTS Software), the Hardware Diagnostic Suite lets the server run other applications while testing proceeds.

When to Run Hardware Diagnostic Suite

The best use of the Hardware Diagnostic Suite is to disclose a suspected or intermittent problem with a noncritical part on an otherwise functioning machine. Examples might include questionable disk drives or memory modules on a machine that has ample or redundant disk and memory resources.

In cases like these, the Hardware Diagnostic Suite runs unobtrusively until it identifies the source of the problem. The machine under test can be kept in production mode until and unless it must be shut down for repair. If the faulty part is hot-pluggable or hot-swappable, the entire diagnose-and-repair cycle can be completed with minimal impact to system users.

Requirements for Using Hardware Diagnostic Suite

Since it is a part of Sun Management Center, you can only run Hardware Diagnostic Suite if you have set up your data center to run Sun Management Center. This means you have to dedicate a master server to run the Sun Management Center server software that supports Sun Management Center software's database of platform status information. In addition, you must install and set up Sun Management Center agent software on the systems to be monitored. Finally, you need to install the console portion of Sun Management Center software, which serves as your interface to the Hardware Diagnostic Suite.

Instructions for setting up Sun Management Center, as well as for using the Hardware Diagnostic Suite, can be found in the Sun Management Center User's Guide.


Reference for OpenBoot Diagnostics Test Descriptions

This section describes the OpenBoot Diagnostics tests and commands available to you. For background information about these tests, refer to Stage Two: OpenBoot Diagnostics Tests.


TABLE 6-10 OpenBoot Diagnostics Menu Tests

Test Name

What It Does

FRU(s) Tested

SUNW,qlc@2

Tests the registers of the Fibre Channel-Arbitrated Loop (FC-AL) subsystem. With diag-level set to max, verifies each disk can be written to, and with test-args set to media, performs more extensive disk tests.

Centerplane, FC-AL disk backplane

bbc@1,0

Tests all writable registers in the Boot Bus Controller. Also verifies that at least one system processor has Boot Bus access

Centerplane

ebus@1

Tests the PCI configuration registers, DMA control registers, and EBus mode registers. Also tests DMA controller functions

Centerplane

flashprom@0,0

Performs a checksum test on the Boot PROM

Centerplane

i2c@1,2e

Tests segments 0-4 of the I2C environmental monitoring subsystem, which includes various temperature and other sensors located throughout the system


Multiple. Refer to Reference for Decoding I2C Diagnostic Test Messages.

i2c@1,30

Same as above, for segment 5 of the I2C environmental monitoring subsystem

ide@6

Tests the on-board IDE controller and IDE bus subsystem that controls the DVD drive

PCI riser board, DVD drive

network@1

Tests the on-board Ethernet logic, running internal loopback tests. Can also run external loopback tests, but only if you install a loopback connector (not provided)

Centerplane

network@2

Same as above, for the other on-board Ethernet controller

Centerplane

pmc@1,300700

Tests the registers of the power management controller

PCI riser board

rsc-control@1,3062f8

Tests SC hardware, including the SC serial and Ethernet ports

SC card

rtc@1,300070

Tests the registers of the real-time clock and then tests the interrupt rates

PCI riser board

serial@1,400000

Tests all possible baud rates supported by the ttya serial line. Performs an internal and external loopback test on each line at each speed

Centerplane, PCI riser board

usb@1,3

Tests the writable registers of the USB open host controller

Centerplane


TABLE 6-11 describes the commands you can type from the obdiag> prompt.


TABLE 6-11 OpenBoot Diagnostics Test Menu Commands

Command

Description

exit

Exits OpenBoot Diagnostics tests and returns to the ok prompt

help

Displays a brief description of each OpenBoot Diagnostics command and OpenBoot configuration variable

setenv variable value

Sets the value for an OpenBoot configuration variable (also available from the ok prompt)

test-all

Tests all devices displayed in the OpenBoot Diagnostics test menu (also available from the ok prompt)

test #

Tests only the device identified by the given menu entry number. (A similar function is available from the ok prompt. Refer to From the ok Prompt: The test and test-all Commands.)

test #,#

Tests only the devices identified by the given menu entry numbers

except #,#

Tests all devices in the OpenBoot Diagnostics test menu except those identified by the specified menu entry numbers

versions

Displays the version, last modified date, and manufacturer of each self-test in the OpenBoot Diagnostics test menu and library

what #,#

Displays selected properties of the devices identified by menu entry numbers. The information provided varies according to device type



Reference for Decoding I2C Diagnostic Test Messages

TABLE 6-12 describes each I2C device in a Sun Fire V490 system, and helps you associate each I2C address with the proper FRU. For more information about I2C tests, refer to I2C Bus Device Tests.


TABLE 6-12 Sun Fire V490 I2C Bus Devices

Address

Associated FRU

What the Device Does

fru@0,a0

processor 0, DIMM 0


Provides configuration information for processor 0 DIMMs

fru@0,a2

processor 0, DIMM 1

fru@0,a4

processor 0, DIMM 2

fru@0,a6

processor 0, DIMM 3

fru@0,a8

processor 0, DIMM 4

fru@0,aa

processor 0, DIMM 5

fru@0,ac

processor 0, DIMM 6

fru@0,ae

processor 0, DIMM 7

fru@1,a0

processor 1, DIMM 0


Provides configuration information for processor 1 DIMMs

fru@1,a2

processor 1, DIMM 1

fru@1,a4

processor 1, DIMM 2

fru@1,a6

processor 1, DIMM 3

fru@1,a8

processor 1, DIMM 4

fru@1,aa

processor 1, DIMM 5

fru@1,ac

processor 1, DIMM 6

fru@1,ae

processor 1, DIMM 7

fru@2,a0

processor 2, DIMM 0


Provides configuration information for processor 2 DIMMs

fru@2,a2

processor 2, DIMM 1

fru@2,a4

processor 2, DIMM 2

fru@2,a6

processor 2, DIMM 3

fru@2,a8

processor 2, DIMM 4

fru@2,aa

processor 2, DIMM 5

fru@2,ac

processor 2, DIMM 6

fru@2,ae

processor 2, DIMM 7

fru@3,a0

processor 3, DIMM 0


Provides configuration information for processor 3 DIMMs

fru@3,a2

processor 3, DIMM 1

fru@3,a4

processor 3, DIMM 2

fru@3,a6

processor 3, DIMM 3

fru@3,a8

processor 3, DIMM 4

fru@3,aa

processor 3, DIMM 5

fru@3,ac

processor 3, DIMM 6

fru@3,ae

processor 3, DIMM 7

fru@4,a0

CPU/Mem board, slot A

Provides configuration information for the CPU/Memory board in slot A

fru@4,a2

CPU/Mem Board, slot B

Provides configuration information for the CPU/Memory board in slot B

nvram@4,a4

PCI riser

Provides system configuration information (IDPROM)

fru@4,a8

Centerplane

Provides centerplane configuration information

fru@4,aa

PCI riser

Provides PCI riser board configuration information

fru@5,10

Centerplane

Provides communication and control for I2C subsystem

fru@5,14

RSC card

Provides communication and control for the RSC card

temperature@5,30

CPU/Mem board A

Monitors processor 0 temperature

temperature@5,32

CPU/Mem board B

Monitors processor 1 temperature

temperature@5,34

CPU/Mem board A

Monitors processor 2 temperature

temperature@5,52

CPU/Mem board B

Monitors processor 3 temperature

ioexp@5,44

FC-AL disk backplane

Monitors drive status/LED control

ioexp@5,46

FC-AL disk backplane

Monitors Loop B control

ioexp@5,4c

Power distribution board

Monitors power distribution board status

ioexp@5,70

Power Supply 0

Monitors Power Supply 0 status

ioexp@5,72

Power Supply 1

Monitors Power Supply 1 status

ioexp@5,80

Centerplane

Monitors I/O port expander

ioexp@5,82

PCI riser

Monitors I/O port expander

temperature@5,98

Reserved

Reserved for thermal monitoring

temperature-sensor@5,9c

FC-AL disk backplane

Monitors ambient temperature at disk backplane

fru@5,a0

Power Supply 0

Provides configuration information for Power Supply 0

fru@5,a2

Power Supply 1

Provides configuration information for Power Supply 1

fru@5,a6

SC card

Provides SC card configuration information

fru@5,a8

FC-AL disk backplane

Provides disk backplane configuration information

fru@5,ae

Power distribution board

Provides configuration information for the power distribution board and the enclosure

fru@5,d0

SC card

Monitors SC card's real-time clock
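
The DIMM entries in TABLE 6-12 follow a regular pattern: the number after fru@ is the processor (0 through 3), and the hexadecimal addresses a0 through ae, in steps of two, correspond to DIMMs 0 through 7. The following Python sketch decodes an address on that basis. The function name is hypothetical, and the sketch covers only the DIMM entries, not the other devices in the table.

```python
import re


def decode_dimm_fru(address):
    """Map a DIMM address such as 'fru@2,a6' to (processor, dimm).

    Returns None for any address outside the DIMM range in TABLE 6-12,
    for example 'fru@4,a0', which identifies a CPU/Memory board.
    """
    match = re.fullmatch(r"fru@([0-3]),(a[02468ace])", address)
    if not match:
        return None
    processor = int(match.group(1))
    # Addresses advance by 2 per DIMM, starting at hexadecimal a0.
    dimm = (int(match.group(2), 16) - 0xA0) // 2
    return processor, dimm
```

For instance, an I2C test message that names fru@2,a6 would point to processor 2, DIMM 3.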



Reference for Terms in Diagnostic Output

The status and error messages displayed by POST diagnostics and OpenBoot Diagnostics tests occasionally include acronyms or abbreviations for hardware sub-components. TABLE 6-13 is included to assist you in decoding this terminology and associating the terms with specific FRUs, where appropriate.


TABLE 6-13 Abbreviations or Acronyms in Diagnostic Output

Term

Description

Associated FRU(s)

ADC

Analog-to-Digital Converter

PCI riser board

APC

Advanced Power Control - A function provided by the SuperIO integrated circuit

PCI riser board

BBC

Boot Bus Controller - Interface between the processors and components on many other buses

Centerplane

CDX

Data Crossbar - Part of the system bus

Centerplane

CRC

Cyclic Redundancy Check

N/A

DAR

Address Repeater - Part of the system bus

Centerplane

DCDS

Dual Data Switch - Part of the system bus

CPU/Memory board

DMA

Direct Memory Access - In diagnostic output, usually refers to a controller on a PCI card

PCI card

EBus

A byte-wide bus for low-speed devices

Centerplane, PCI riser board

HBA

Host Bus Adapter

Centerplane, various others

I2C

Inter-Integrated Circuit (also written as I²C) - A bidirectional, two-wire serial data bus. Used mainly for environmental monitoring and control

Various. Refer to TABLE 6-12.

I/O Board

PCI Riser

PCI riser

JTAG

Joint Test Action Group - An IEEE standard (1149.1) for scanning system components

N/A

MAC

Media Access Controller - Hardware address of a device connected to a network

Centerplane

MII

Media Independent Interface - Part of Ethernet controller

Centerplane

Motherboard

Centerplane

Centerplane

NVRAM

IDPROM

IDPROM, located on PCI riser board

OBP

Refers to OpenBoot firmware

N/A

PDB

Power Distribution Board

Power distribution board

PMC

Power Management Controller

PCI riser board

POST

Power-On Self-Test

N/A

RIO

Multifunction integrated circuit bridging the PCI bus with EBus and USB

PCI riser board

RTC

Real-Time Clock

PCI riser board

RX

Receive - Communication protocol

Centerplane

Safari

The system interconnect architecture--that is, the data and address buses

CPU/Memory board, centerplane

Schizo

System bus to PCI bridge integrated circuit

Centerplane

Scan

A means for monitoring and altering the content of ASICs and system components, as provided for in the IEEE 1149.1 standard

N/A

SIO

SuperIO integrated circuit - Controls the SC UART port and more

PCI riser

TX

Transmit - Communication protocol

Centerplane

UART

Universal Asynchronous Receiver Transmitter - Serial port hardware

Centerplane, PCI riser board, SC card