CHAPTER 2

Diagnostics

Diagnostics are a set of tests that determine the health of the hardware in your Sun Fire V20z or Sun Fire V40z server. The diagnostics tests that are included with the server check both the platform and the service processor (SP).

You can run diagnostics tests in either of two ways: from the SP (SP-based diagnostics) or from the diagnostics CD (CD-based diagnostics).



Note - While diagnostics are running on your server, do not interact with the SP through the IPMI command-line interface. Sensor values are not reliable in this case, and sensor commands that are issued while diagnostics are loaded might cause false critical events to be logged in the events log.



Some tests are designed to run on the SP, and others are designed to run on the platform OS. See TABLE 2-1, Diagnostics Modules, for more information.


SP-based Diagnostics

You can run diagnostics tests from the SP. The diagnostics files are included in the Network Share Volume (NSV) directory. Before you run SP-based diagnostics tests, see the Sun Fire V20z and Sun Fire V40z Servers--Installation Guide for information about how to set up the SP, how to install and configure the NSV software, and how to use SSH scripting. See the Sun Fire V20z and Sun Fire V40z Servers--Server Management Guide for information about how to update the diagnostics tests.



Note - The diagnostics version that is in the NSV must be the same as the version that is installed on the SP.



How to Start SP-based Diagnostics

1. To enable both SP and platform diagnostics tests, execute the command diags start. This command reboots the platform into diagnostics mode. Wait at least two or three minutes before you attempt to run the tests.

or

To enable only SP diagnostics tests without rebooting the platform, execute the command diags start -n.



Note - For CD-based diagnostics, the -n argument has a different meaning: Do not load the SP with diagnostics.



2. To determine whether the diagnostics tests are available to run, execute the command diags get state. The command returns one of these states:

Success text message: The SP and the platform diagnostics systems are available to receive test requests.

or

Error text message: The platform diagnostics system is not available.

See TABLE 2-1, Diagnostics Modules, for a list of the diagnostics modules and the types of tests that they contain. The table indicates whether each module runs on the SP or on the platform.
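The Installation Guide describes SSH scripting with the SP. As a minimal sketch under that assumption, steps 1 and 2 can be wrapped in shell functions that run on a management workstation. SP_HOST, SP_USER, and the function names are hypothetical; the sketch assumes that the SP accepts diags commands as SSH remote commands, that ssh returns the exit status of the remote command (as OpenSSH does), and that diags get state exits with 0 when the diagnostics systems are available, as in the scripts under Remote Access to CD-Based Diagnostics.

```sh
# Sketch only: SP_HOST, SP_USER, and these function names are hypothetical.
# Assumes the SP accepts diags commands over SSH and that ssh returns the
# remote command's exit status (OpenSSH behavior).
SP_HOST=${SP_HOST:-sp.example.com}
SP_USER=${SP_USER:-admin}

# Run one SP command over SSH.
sp_cmd() {
    ssh "$SP_USER@$SP_HOST" "$@"
}

# Step 1. "sp-only" uses "diags start -n" (SP tests only, no platform reboot);
# no argument uses "diags start" (SP and platform tests; platform reboots).
start_sp_diags() {
    if [ "${1:-}" = "sp-only" ]; then
        sp_cmd diags start -n
    else
        sp_cmd diags start
    fi
}

# Step 2. Succeeds when "diags get state" exits with 0 (diagnostics available).
diags_ready() {
    sp_cmd diags get state > /dev/null
}

# Example use:
#   start_sp_diags
#   sleep 180    # wait at least two or three minutes
#   diags_ready && sp_cmd diags run tests -a
```

Keeping every SP command behind one wrapper means that the connection details live in a single place if your site uses a different SSH account or key.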


CD-Based Diagnostics



Note - You can run platform-only tests with an NSV release earlier than 2.x.x.x, but you must manually disable interleaving to run the memory tests. You cannot run SP tests from the CD with these earlier NSV releases.



Installing and Running CD-Based Diagnostics

By default, the BIOS does not boot into diagnostics mode. If the CD is in the drive when the system boots and the CD drive is first in the boot order, the BIOS detects the CD and reboots in diagnostics mode. To set this up, follow the instructions below.

BIOS Version 2.2.0.0 and Later

In BIOS versions 2.2.0.0 and later, you can configure the BIOS to boot into diagnostics mode; this option is in the BIOS Advanced menu. During boot, the CD detects the BIOS setting and, if necessary, reboots the machine into diagnostics mode. For information about how to suppress the reboot, see the BIOS Configuration information in the Sun Fire V20z and Sun Fire V40z Servers--User Guide.

Earlier BIOS Versions

If your BIOS version cannot boot into diagnostics mode (this is detected at boot time), the system displays steps that you can follow to configure the BIOS settings so that the memory tests run successfully. If the settings are incorrect, the memory tests print warnings.

Installation of CD-Based Diagnostics

To ensure that the CD boots automatically, the CD drive must be first in your server's boot sequence. You set the boot sequence in the BIOS Boot menu. To install and boot the CD-based diagnostics:

1. Contact your system vendor for the location of the ISO image:

cd_diags.iso

2. Burn the ISO image onto a CD.

3. Insert the CD into the drive and boot the platform. (For the CD to boot automatically, the CD drive must be first in the boot sequence that is set in the BIOS Boot menu.)

When the CD has booted, the platform IP addresses are displayed:

Welcome to CD Diagnostics <version displayed>.
Platform eth0 connected for SSH sessions at <ipaddr>
Platform eth1 connected for SSH sessions at <ipaddr>

You can use these IP addresses to connect remotely over SSH; see "Remote Access to CD-Based Diagnostics." You are logged on automatically as the user diagUser.

As soon as the CD boot process is complete, you are logged on and the CD diagnostics menu displays on your screen. You can use the menu options to run tests and capture system information, or you can use the command line.

Running CD-Based Diagnostics from the Options Menu

The options menu simplifies the process of running a full set of diagnostics tests and capturing system information on a floppy or USB storage device.

Menu Options

1. View Documentation - Use this option to open the online documentation.

2. Create script run_commands.sh - Use this option to run tests and save system information in a log file. This option displays a series of three prompts. When you answer the prompts, a script is created and stored in the same location as the saved log file. You can use this script to run the same operations on multiple machines.

3. Run script run_commands.sh - Use this option to run a script that you saved to a floppy disk.

4. Go to Command Line Interface - Use this option to go to the command line interface. See the Sun Fire V20z and Sun Fire V40z Servers--Server Management Guide for more information.

5. Shutdown System - Use this option to terminate diagnostics tests and shut down the OS.



Note - For detailed information, select View Documentation.



Remote Access to CD-Based Diagnostics

Remote access requires the prior creation of a manager-level user on the platform. See the Sun Fire V20z and Sun Fire V40z Servers--Server Management Guide for instructions.

To access the command-line interface for CD-based diagnostics tests remotely over SSH:

1. Use SSH to connect to the platform IP address as the user setup.

If you already created a manager-level user on the SP, you are prompted for a username and password to create a new account. You can use any username except the following:

diagUser
setup
root 

When your username and password are validated, you are logged off.

2. Use SSH to connect to the platform with your new username and password.

3. To enable only platform diagnostics tests without loading the SP tests, execute the command diags start -n.

Note that for SP-based diagnostics, the -n argument has a different meaning: "Do not boot the platform with diagnostics."

or

To enable both SP and platform diagnostics tests, execute the command diags start. This command reboots the platform into diagnostics mode.

Wait at least two or three minutes before you attempt to run the tests.

or

Implement one of the following in shell or Perl:

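diags start
# wait for the diagnostics to load (at least two or three minutes)
sleep 240
diags get state
rc=$?
if [ "$rc" -eq 0 ]
then
   # run desired tests using diags run tests command
   diags run tests -a
else
   echo "Diagnostics not loaded in expected time. rc = $rc"
fi

or

MAX_WAIT=600      # example value: give up after 10 minutes
SLEEP_TIME=30     # example value: poll every 30 seconds
timer=0
diags get state
rc=$?
# rc 25 (device error): platform diagnostics are not loaded yet
while [ "$rc" -eq 25 ] && [ "$timer" -lt "$MAX_WAIT" ]
do
    sleep "$SLEEP_TIME"
    timer=$((timer + SLEEP_TIME))
    diags get state
    rc=$?
done
if [ "$rc" -eq 0 ]
then
   # run desired tests using diags run tests command
   diags run tests -a
else
   echo "Error loading platform diagnostics. rc = $rc"
fi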

4. To determine whether the diagnostics tests are available to run, execute the command diags get state.

The command returns one of these states:

The SP and the platform diagnostics systems are available to receive test requests.

or

The platform diagnostics system is not available.

When diags get state returns 0, the tests are available; for example, you can then run all tests with diags run tests -a.


Note - See "Running Diagnostic Tests," below, for command-line arguments. See the Sun Fire V20z and Sun Fire V40z Servers--Server Management Guide for more information about commands and the use of scripts for systems management.




Available Diagnostics Tests and Modules

To list the available modules and the tests they contain, execute the command: diags get tests.

The table below lists the available diagnostics modules and indicates whether the module runs on the platform OS or on the SP. Each module contains one or more individual tests.

TABLE 2-1 Diagnostics Modules

Module Name (command)       Runs on    Description of Test
Memory (memory)             Platform   Identify memory errors, address decoding faults, and dataline faults.
Network Controllers (nic)   Platform   Test the platform NIC interfaces, using an internal loopback test.
Storage (storage)           Platform   Invoke self-tests on the SCSI drive.
Fans (fan)                  SP         Verify that each fan is rotating and that the RPM is within the specified ranges.
Flash (flash)               SP         Read and write flash files.
LED (led)                   SP         Verify the correct operation of the LED drive circuitry. (Non-interactive tests.)
Operator Panel (oppanel)    SP         Verify the memory of the Operator Panel and indicate the values and locations of any errors.
Power (power)               SP         Verify that the power backplane and power supplies are functioning properly. (Not available for all systems.)
Temperature (temp)          SP         Verify that each temperature sensor is functional and that the temperature is within the specified ranges.
Voltage (voltage)           SP         Verify the derived voltages (generated by various VRMs in the system) and the bulk voltages.
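For SP-based diagnostics, step 1 of How to Start SP-based Diagnostics says that diags start -n enables only the SP tests without rebooting the platform. A script might therefore use TABLE 2-1 to choose the start command for the modules it needs. The following is a minimal sketch; the function names are hypothetical and the mapping is copied from the table. It does not apply to CD-based diagnostics, where -n has a different meaning.

```sh
# Sketch only: function names are hypothetical; the mapping copies TABLE 2-1.
# module_location MODULE -> prints "SP" or "Platform"; returns 1 if unknown.
module_location() {
    case "$1" in
        memory|nic|storage) echo "Platform" ;;
        fan|flash|led|oppanel|power|temp|voltage) echo "SP" ;;
        *) return 1 ;;
    esac
}

# start_command_for MODULE... -> prints the SP-based start command needed:
# "diags start -n" if every module runs on the SP, else "diags start".
# Returns 1 if any module name is not in TABLE 2-1.
start_command_for() {
    need_platform=no
    for mod in "$@"; do
        loc=$(module_location "$mod") || return 1
        if [ "$loc" = "Platform" ]; then
            need_platform=yes
        fi
    done
    if [ "$need_platform" = yes ]; then
        echo "diags start"
    else
        echo "diags start -n"
    fi
}

# Example: start_command_for fan temp voltage   # prints: diags start -n
```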

 


Running Diagnostic Tests



Note - When you launch diagnostics on the platform OS, the system attempts to mount the floppy drive and returns this error: mount : Mounting /dev/fd0 on /mnt/floppy failed. No such device. You can safely ignore this error message.



If you run tests from the command-line interface, you can choose to execute all tests, tests for a specific module (fans, memory, voltage, temperature, and so on), specific tests within a module, or any combination of these options. You specify these options when you execute the diags run tests command.

For example, to run the Operator Panel diagnostics module, execute:

diags run tests -m oppanel


Note - You can write scripts for additional control over the timing of the tests. For example, you can write a shell script to repeat a test a specified number of times. See the Sun Fire V20z and Sun Fire V40z Servers--Server Management Guide for details.
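As the note above suggests, a shell script can repeat tests. The following minimal sketch runs each listed module a given number of times with diags run tests -m and reports how many runs failed. The function name is hypothetical, and the sketch assumes that it runs where the diags command is available and that diags run tests exits with a nonzero status when a run fails; that exit-status behavior is not documented in this chapter.

```sh
# Sketch only: the function name is hypothetical. Assumes "diags run tests"
# exits nonzero when a run fails (not documented in this chapter).
# repeat_modules COUNT MODULE...
repeat_modules() {
    count=$1
    shift
    failures=0
    for mod in "$@"; do
        i=1
        while [ "$i" -le "$count" ]; do
            echo "== $mod: run $i of $count ==" >&2
            if ! diags run tests -m "$mod"; then
                failures=$((failures + 1))
            fi
            i=$((i + 1))
        done
    done
    echo "$failures failed run(s)" >&2
    # Succeed only if every run succeeded.
    [ "$failures" -eq 0 ]
}

# Example: run the fan and temp modules five times each.
#   repeat_modules 5 fan temp
```

Progress and summary messages go to standard error so that the test output itself can still be redirected to a log file.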




Test Results

After a test is complete, the status is returned. If a test detects an error, the software reports details about the error and continues to run any remaining tests that were submitted.



Note - Specify the -v (--verbose) option to display details for all tests, including successes. For example, details might include high, normal, and low values.



For all diagnostics tests, the output includes at least the submitted test name, the test handle, and the test result, as shown in Sample Output.



Note - See Diags Test Results for examples of output for all diagnostics tests.



To locate a component that is identified by a diagnostics test, see the System Status window of the SM Console, which shows a representative display of system components and related sensors. For more information about the SM Console, see the Sun Fire V20z and Sun Fire V40z Servers--Server Management Guide. For illustrations of the system and component labels, see the Sun Fire V20z and Sun Fire V40z Servers--User Guide and the Sun Fire V20z and Sun Fire V40z Servers--Installation Guide.


Sample Output

This section shows output that might be returned if you start diagnostics in no-platform mode, power on the platform, and run all tests with the --verbose argument, using these commands:

diags start -n
platform set power state on -f
diags run tests -a -v

Typical output is included below:

Submitted Test Name           Test Handle
speed.allFans                        1
 
Results
Submitted Test Name           Test Handle  Test Result
speed.allFans                        1                   Passed
    Test Details:
        fan1.tach         Passed
            Controller:   fan-ctrl
            High Rated:   13000
            High Actual:  13740
            High Delta:   +5.39%
            High Limits:  -10/+35%
            Low Setpoint: 10010
            Low Expected: 10580
            Low Actual:   11100
            Low Delta:    4.69%
            Low Limits:   -/+15%
            Sensor:       Fan 1 measured speed (ID=fan1.tach)
            Component(s): Fan 1  (ID=NA)
        fan2.tach         Passed
            Controller:   fan-ctrl
            High Rated:   13000
            High Actual:  13920
            High Delta:   +6.61%
            High Limits:  -10/+35%
            Low Setpoint: 10010
            Low Expected: 10718
            Low Actual:   11100
            Low Delta:    3.44%
            Low Limits:   -/+15%
            Sensor:       Fan 2 measured speed (ID=fan2.tach)
            Component(s): Fan 2 (ID=NA)
        fan3.tach         Passed
            Controller:   fan-ctrl1
            High Rated:   13000
            High Actual:  13860
            High Delta:   +6.20%
            High Limits:  -10/+35%
            Low Setpoint: 10010
            Low Expected: 10672
            Low Actual:   11040
            Low Delta:    3.33%
            Low Limits:   -/+15%
            Sensor:       Fan 3 measured speed (ID=fan3.tach)
            Component(s): Fan 3 (ID=NA)
        fan4.tach         Passed
            Controller:   fan-ctrl1
            High Rated:   13000
            High Actual:  13920
            High Delta:   +6.61%
            High Limits:  -10/+35%
            Low Setpoint: 10010
            Low Expected: 10718
            Low Actual:   11100
            Low Delta:    3.44%
            Low Limits:   -/+15%
            Sensor:       Fan 4 measured speed (ID=fan4.tach)
            Component(s): Fan 4 (ID=NA)
        fan5.tach         Passed
            Controller:   fan-ctrl2
            High Rated:   13000
            High Actual:  13980
            High Delta:   +7.01%
            High Limits:  -10/+35%
            Low Setpoint: 10010
            Low Expected: 10765
            Low Actual:   11100
            Low Delta:    3.02%
            Low Limits:   -/+15%
            Sensor:       Fan 5 measured speed (ID=fan5.tach)
            Component(s): Fan 5 (ID=NA)
        fan6.tach         Passed
            Controller:   fan-ctrl2
            High Rated:   13000
            High Actual:  14160
            High Delta:   +8.19%
            High Limits:  -10/+35%
            Low Setpoint: 10010
            Low Expected: 10903
            Low Actual:   11340
            Low Delta:    3.85%
            Low Limits:   -/+15%
            Sensor:       Fan 6 measured speed (ID=fan6.tach)
            Component(s): Fan 6 (ID=NA)
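This chapter does not state how the fan figures are computed. The numbers in the sample output are consistent with the following relationships, which are inferred from the sample only: Low Expected = Low Setpoint * High Actual / High Rated; High Delta = (High Actual - High Rated) / High Actual; and Low Delta = (Low Actual - Low Expected) / Low Actual, using the unrounded Low Expected value. The sketch below reproduces the fan1 figures with awk. Treat it as an aid for reading the output, not as the test's actual algorithm; the function names are hypothetical.

```sh
# Sketch only: relationships inferred from the sample output, not documented.
# All deltas are relative to the measured (actual) RPM.

# high_delta HIGH_RATED HIGH_ACTUAL -> percent, two decimals
high_delta() {
    awk -v r="$1" -v a="$2" 'BEGIN { printf "%.2f\n", (a - r) * 100 / a }'
}

# low_expected LOW_SETPOINT HIGH_RATED HIGH_ACTUAL -> RPM, rounded
low_expected() {
    awk -v s="$1" -v r="$2" -v a="$3" 'BEGIN { printf "%.0f\n", s * a / r }'
}

# low_delta LOW_SETPOINT HIGH_RATED HIGH_ACTUAL LOW_ACTUAL -> percent
# Uses the unrounded expected value, which matches the sample output.
low_delta() {
    awk -v s="$1" -v r="$2" -v a="$3" -v l="$4" \
        'BEGIN { e = s * a / r; printf "%.2f\n", (l - e) * 100 / l }'
}

# fan1: Rated 13000, High Actual 13740, Setpoint 10010, Low Actual 11100
high_delta 13000 13740              # 5.39
low_expected 10010 13000 13740      # 10580
low_delta 10010 13000 13740 11100   # 4.69
```

Under this reading, every fan in the sample is well inside the listed limits (-10/+35% high, -/+15% low).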


Saving Test Results

SP-based Diagnostics

To save SP-based diagnostic test results, redirect the output to a file on the network share volume. For example, to save the results of all the tests that you run in diags.log1, use:

diags run tests -a > /mnt/log/diags.log1

CD-Based Diagnostics Tests

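To save CD-based diagnostic test results, mount a USB storage device or a floppy disk and save the results to it.

To mount a USB storage device:

mount /usbstorage


Note - Mounting usbstorage works only if you have a single disk drive in your system.



To mount a floppy disk:

mount /floppy

When you have finished saving the results, unmount the device:

umount /<usbstorage | floppy>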
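The mount, save, and unmount steps above can be combined in one function for a CD diagnostics session. The following is a minimal sketch: it assumes that mount /usbstorage (or mount /floppy) makes the device available at that same path, and the function name and the default log name diags.log are hypothetical. The device is unmounted even when the tests report a failure, and the exit status of diags is passed back to the caller.

```sh
# Sketch only: assumes "mount /usbstorage" or "mount /floppy" mounts the
# device at that path; the function name and log name are hypothetical.
# save_cd_results MOUNTPOINT [LOGNAME]
save_cd_results() {
    mnt=$1
    log=${2:-diags.log}
    mount "$mnt" || return 1
    # Run all tests verbosely and save the output on the mounted device.
    diags run tests -a -v > "$mnt/$log"
    run_status=$?
    # Unmount even if the tests failed, then report the diags exit status.
    umount "$mnt"
    return "$run_status"
}

# Example: save_cd_results /usbstorage
```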


Stopping Tests

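To stop tests, use the diags cancel tests command. Specify -t (--test) with a test handle to cancel a single test, or -a (--all) to cancel all tests:

diags cancel tests {-t|--test} test-handle
diags cancel tests {-a|--all}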
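When tests are submitted, the output lists each submitted test name with its test handle (see Sample Output). The following minimal sketch extracts the handle for a named test from saved submission output and cancels that test with -t. It assumes the Submitted Test Name / Test Handle column layout shown in the sample output; the function names are hypothetical.

```sh
# Sketch only: function names are hypothetical; assumes the
# "Submitted Test Name   Test Handle" layout shown in the sample output.
# test_handle NAME FILE -> prints the first numeric handle listed for NAME.
test_handle() {
    awk -v name="$1" '$1 == name && $2 ~ /^[0-9]+$/ { print $2; exit }' "$2"
}

# cancel_named_test NAME FILE -> cancels the test submitted as NAME.
cancel_named_test() {
    handle=$(test_handle "$1" "$2")
    if [ -z "$handle" ]; then
        echo "No test handle found for $1" >&2
        return 1
    fi
    diags cancel tests -t "$handle"
}

# Example: cancel_named_test speed.allFans submitted.log
```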