CHAPTER 2
Diagnostics
Diagnostics are a set of tests that determine the health of the hardware in your Sun Fire V20z or Sun Fire V40z server. The diagnostics tests that are included with the server check both the platform and the service processor (SP).
You can run diagnostics tests in either of two ways: from the SP (SP-based diagnostics) or from a bootable diagnostics CD (CD-based diagnostics).
Some tests are designed to run on the SP and others are designed to run on the platform OS. See Diagnostics Modules for more information.
You can run diagnostics tests from the SP. The diagnostics files are included in the Network Share Volume (NSV) directory.
If you choose to run SP-based diagnostics tests, see the Sun Fire V20z and Sun Fire V40z Servers--Installation Guide for information about how to set up the SP, how to install and configure the NSV software, and how to use SSH scripting. See the Sun Fire V20z and Sun Fire V40z Servers--Server Management Guide for information about how to update the diagnostics tests.
Note - The diagnostics version that is in the NSV must be the same as the version that is installed on the SP.
1. To enable both SP and platform diagnostics tests, execute the command diags start. This command reboots the platform into diagnostics mode. Wait at least two to three minutes before you attempt to run the tests.
or
To enable only SP diagnostics tests without rebooting the platform, execute the command diags start -n.
Note - For CD-based diagnostics, the -n argument specifies: "Do not load the SP with diagnostics."
2. To determine whether the diagnostics tests are available to run, execute the command diags get state. The command returns one of these states:
Return Code    Text Message
0              The SP and the platform diagnostics systems are available to receive test requests.
25             The platform diagnostics system is not available.
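SP-based diagnostics are often driven through SSH scripting from another host. The following is a minimal sketch, not a supported tool: SP_HOST and SP_USER are hypothetical placeholders for your SP address and a manager-level SP account, and the state descriptions assume, as the polling example in the remote-access procedure suggests, that diags get state exits with 0 when both diagnostics systems are available and 25 when the platform diagnostics system is not.

```shell
#!/bin/sh
# Minimal sketch: run SP-based diagnostics commands over SSH.
# HYPOTHETICAL placeholders: set SP_HOST and SP_USER for your own SP.
SP_HOST="${SP_HOST:-sp.example.com}"
SP_USER="${SP_USER:-admin}"

# Run one command on the SP. OpenSSH exits with the remote command's
# exit status, or with 255 if ssh itself fails.
sp_run() {
    ssh "${SP_USER}@${SP_HOST}" "$@"
}

# Describe an exit status from "diags get state".
# ASSUMPTION: 0 = both systems available, 25 = platform diagnostics not available.
describe_diags_state() {
    case "$1" in
        0)   echo "SP and platform diagnostics are available" ;;
        25)  echo "Platform diagnostics are not available" ;;
        255) echo "ssh failed before the command could run" ;;
        *)   echo "Unexpected state: $1" ;;
    esac
}

# Example usage (commented out so that sourcing this file has no side effects):
#   sp_run diags start -n
#   sp_run diags get state
#   describe_diags_state "$?"
```

Because OpenSSH passes the remote command's exit status back to the caller, $? after sp_run reflects the value returned by diags get state, unless ssh itself failed.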
See Diagnostics Modules for a table of the diagnostics modules and the types of tests that they contain. The table indicates whether each test module runs on the SP or on the platform.
BIOS does not boot into diagnostics mode by default. For CD-based diagnostics, if the diagnostics CD is in the drive when the system boots and the CD drive is first in the boot order, BIOS detects the CD and reboots in diagnostics mode. To set this up, follow the instructions below.
In BIOS versions 2.2.0.0 and later, you can set BIOS to boot into diagnostics mode; this option is in the BIOS Advanced menu. During boot, the CD detects the BIOS setting and, if necessary, reboots the machine into diagnostics mode. For information about how to suppress the reboot, see the BIOS configuration information in the Sun Fire V20z and Sun Fire V40z Servers--User Guide.
If your BIOS version cannot boot into diagnostics mode (this is detected at boot), the system displays steps that you can follow to configure the BIOS settings so that the memory tests run successfully. If the settings are incorrect, the memory tests print warnings.
To ensure that the CD boots automatically, the CD drive must be first in your server's boot sequence. You set the boot sequence in the BIOS Boot menu.
1. Contact your system vendor for the location of the ISO image:
cd_diags.iso
2. Burn the ISO image onto a CD.
3. Insert the CD into the drive and boot the platform. (For this to occur automatically, the CD drive must be first in the boot order, which you set in the BIOS Boot menu.)
When the CD has booted, the platform IP addresses are displayed:
Welcome to CD Diagnostics <version displayed>.
Platform eth0 connected for SSH sessions at <ipaddr>
Platform eth1 connected for SSH sessions at <ipaddr>
You can use either IP address to connect to the platform remotely through SSH; see "Remote Access to CD-Based Diagnostics." You are logged on automatically as the user diagUser.
As soon as the CD boot process is complete, you are logged on and the CD diagnostics menu displays on your screen. You can use the menu options to run tests and capture system information, or you can use the command line.
The options menu simplifies the process of running a full set of diagnostics tests and capturing system information on a floppy or USB storage device.
1. View Documentation - Use this option to open the online documentation.
2. Create script run_commands.sh - Use this option to run tests and save system information in a log file. This option displays a series of three prompts. When you have answered the prompts, a script is created and stored in the same location as the saved log file. You can use this script to run the same operations on multiple machines.
3. Run script run_commands.sh - Use this option to run a script that you saved to a floppy disk.
4. Go to Command Line Interface - Use this option to go to the command line interface. See the Sun Fire V20z and Sun Fire V40z Servers--Server Management Guide for more information.
5. Shutdown System - Use this option to terminate diagnostics tests and shut down the OS.
Remote access requires the prior creation of a manager-level user on the platform. See the Sun Fire V20z and Sun Fire V40z Servers--Server Management Guide for instructions.
To access the CD-based diagnostics command-line interface remotely through SSH:
1. Use SSH to connect to the platform IP address as the user setup.
If you already created a manager-level user on the SP, you are prompted for a username and password to create a new account. You can use any username except one of these:
diagUser
setup
root
When your username and password are validated, you are logged off.
2. Use SSH to connect to the platform with your new username and password.
3. To enable only platform diagnostics tests without loading the SP tests, execute the command diags start -n.
Note - For SP-based diagnostics, the -n argument specifies: "Do not boot the platform with diagnostics."
Alternatively, to enable both SP and platform diagnostics tests, execute the command diags start. This command reboots the platform into diagnostics mode.
Wait at least two to three minutes before you attempt to run the tests.
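To script this wait, implement one of the following approaches in shell (or an equivalent in Perl). The first waits a fixed time and then checks the state once; the second polls diags get state until the platform diagnostics system is available or a time limit expires.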
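# Option 1: wait a fixed time, then check the state once.
diags start
sleep 240      # give the platform time to reboot into diagnostics mode
diags get state
rc=$?          # exit status: 0 = SP and platform diagnostics are available
if [ "$rc" -eq 0 ]
then
    # Run the desired tests with the diags run tests command, for example:
    diags run tests -a
else
    echo "Diagnostics not loaded in expected time. rc = $rc"
fi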
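# Option 2: poll until the platform diagnostics are available or MAX_WAIT expires.
MAX_WAIT=600     # maximum seconds to wait (example value)
SLEEP_TIME=30    # seconds between checks (example value)
diags get state
rc=$?
timer=0
# 25 = platform diagnostics system not yet available (device error)
while [ "$rc" -eq 25 ] && [ "$timer" -lt "$MAX_WAIT" ]
do
    sleep "$SLEEP_TIME"
    timer=$((timer + SLEEP_TIME))
    diags get state
    rc=$?
done
# Check the final state, not the timer: any non-zero state is an error.
if [ "$rc" -eq 0 ]
then
    # Run the desired tests with the diags run tests command, for example:
    diags run tests -a
else
    echo "Error loading platform diagnostics. rc = $rc"
fi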
4. To determine whether the diagnostics tests are available to run, execute the command diags get state. The command returns one of these states:
Return Code    Text Message
0              The SP and the platform diagnostics systems are available to receive test requests.
25             The platform diagnostics system is not available.
When diags get state returns 0, you can run all available tests with the command diags run tests -a.
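Step 1 of the remote-access procedure rejects three reserved usernames. If you script account creation, a check like the following minimal sketch can catch a reserved name before you connect. The function is_allowed_username and the example name diagadmin are hypothetical, and the exact, case-sensitive comparison is an assumption rather than documented behavior.

```shell
#!/bin/sh
# Minimal sketch: check a proposed account name before you create it in step 1.
# HYPOTHETICAL helper; it only mirrors the documented rule that diagUser,
# setup, and root cannot be used (an empty name is also rejected here).
# Names are compared exactly (case-sensitive), which is an assumption.
is_allowed_username() {
    case "$1" in
        ""|diagUser|setup|root) return 1 ;;
        *) return 0 ;;
    esac
}

# Example usage (<ipaddr> is the platform address shown when the CD boots):
#   NEW_USER=diagadmin
#   is_allowed_username "$NEW_USER" || echo "Choose another username" >&2
#   ssh setup@<ipaddr>             # step 1: create the account
#   ssh "${NEW_USER}@<ipaddr>"     # step 2: log on with the new account
```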
To list the available modules and the tests that they contain, execute the command diags get tests.
The table below lists the available diagnostics modules and indicates whether the module runs on the platform OS or on the SP. Each module contains one or more individual tests.
If you run tests from the command-line interface, you can choose to execute all tests, tests for a specific module (fans, memory, voltage, temperature, and so on), specific tests within a module, or any combination of these options. You specify these options when you execute the diags run tests command.
For example, to run the Operator Panel diagnostics module, the command is:
diags run tests -m oppanel
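The -m option runs one module. To run several modules in sequence from a script, one option is a loop like the following minimal sketch. The function name run_modules is hypothetical, the module names are assumed to come from diags get tests, and treating a non-zero exit status from diags as a problem is an assumption.

```shell
#!/bin/sh
# Minimal sketch: run several modules, one "diags run tests -m" call per module.
# ASSUMPTIONS: each argument is a module name listed by "diags get tests", and a
# non-zero exit status from diags indicates a problem worth reporting.
run_modules() {
    run_modules_status=0
    for module in "$@"; do
        echo "== Running module: $module"
        diags run tests -m "$module" || run_modules_status=1
    done
    return "$run_modules_status"
}

# Example usage (MODULE2 is a placeholder for another name from "diags get tests"):
#   run_modules oppanel MODULE2
```

Running each module in its own invocation avoids assuming that -m can be repeated on a single diags run tests command line.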
After a test is complete, the status is returned. If a test detects an error, the software reports details about the error and continues to run any remaining tests that were submitted.
Note - Specify the -v (--verbose) option to display details for all tests, including successes. For example, details might include high, normal, and low values.
The following data is generated for all diagnostics tests.
Note - See Diags Test Results for examples of output for all diagnostics tests.
To locate a component that a diagnostics test identifies, see the System Status window of the SM Console, which shows a representative display of the system components and related sensors. For more information about the SM Console, see the Sun Fire V20z and Sun Fire V40z Servers--Server Management Guide. For illustrations of the system and component labels, see the Sun Fire V20z and Sun Fire V40z Servers--User Guide and the Sun Fire V20z and Sun Fire V40z Servers--Installation Guide.
This section shows output that could be returned if you start diagnostics in no-platform mode, power on the platform, and run the tests with the --verbose argument. For example:
diags start -n
platform set power state on -f
diags run tests -a -v
Typical output is included below:
Submitted Test Name     Test Handle
speed.allFans           1
Results
Submitted Test Name     Test Handle     Test Result
speed.allFans           1               Passed
Test Details:
fan1.tach Passed
Controller: fan-ctrl
High Rated: 13000
High Actual: 13740
High Delta: +5.39%
High Limits: -10/+35%
Low Setpoint: 10010
Low Expected: 10580
Low Actual: 11100
Low Delta: 4.69%
Low Limits: -/+15%
Sensor: Fan 1 measured speed (ID=fan1.tach)
Component(s): Fan 1 (ID=NA)
fan2.tach Passed
Controller: fan-ctrl
High Rated: 13000
High Actual: 13920
High Delta: +6.61%
High Limits: -10/+35%
Low Setpoint: 10010
Low Expected: 10718
Low Actual: 11100
Low Delta: 3.44%
Low Limits: -/+15%
Sensor: Fan 2 measured speed (ID=fan2.tach)
Component(s): Fan 2 (ID=NA)
fan3.tach Passed
Controller: fan-ctrl1
High Rated: 13000
High Actual: 13860
High Delta: +6.20%
High Limits: -10/+35%
Low Setpoint: 10010
Low Expected: 10672
Low Actual: 11040
Low Delta: 3.33%
Low Limits: -/+15%
Sensor: Fan 3 measured speed (ID=fan3.tach)
Component(s): Fan 3 (ID=NA)
fan4.tach Passed
Controller: fan-ctrl1
High Rated: 13000
High Actual: 13920
High Delta: +6.61%
High Limits: -10/+35%
Low Setpoint: 10010
Low Expected: 10718
Low Actual: 11100
Low Delta: 3.44%
Low Limits: -/+15%
Sensor: Fan 4 measured speed (ID=fan4.tach)
Component(s): Fan 4 (ID=NA)
fan5.tach Passed
Controller: fan-ctrl2
High Rated: 13000
High Actual: 13980
High Delta: +7.01%
High Limits: -10/+35%
Low Setpoint: 10010
Low Expected: 10765
Low Actual: 11100
Low Delta: 3.02%
Low Limits: -/+15%
Sensor: Fan 5 measured speed (ID=fan5.tach)
Component(s): Fan 5 (ID=NA)
fan6.tach Passed
Controller: fan-ctrl2
High Rated: 13000
High Actual: 14160
High Delta: +8.19%
High Limits: -10/+35%
Low Setpoint: 10010
Low Expected: 10903
Low Actual: 11340
Low Delta: 3.85%
Low Limits: -/+15%
Sensor: Fan 6 measured speed (ID=fan6.tach)
Component(s): Fan 6 (ID=NA)
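The sample values are internally consistent in a way that can help when you read a failing result. Working backward from the six fans above, Low Expected appears to equal Low Setpoint scaled by High Actual / High Rated, and each delta appears to be (actual - reference) / actual, expressed as a percentage. These relationships are inferred from this sample only and are not documented behavior of the diags software; the following minimal sketch, with hypothetical function names, simply reproduces them with awk.

```shell
#!/bin/sh
# Minimal sketch: reproduce the fan-speed arithmetic seen in the sample output.
# ASSUMPTION: these formulas are inferred from the sample values only; they are
# not documented behavior of the diags software.

# Low Expected = Low Setpoint * High Actual / High Rated, rounded for display.
low_expected() {  # args: low_setpoint high_actual high_rated
    awk -v s="$1" -v ha="$2" -v hr="$3" \
        'BEGIN { printf "%d\n", s * ha / hr + 0.5 }'
}

# High Delta = (High Actual - High Rated) / High Actual, as a percentage.
high_delta_pct() {  # args: high_actual high_rated
    awk -v ha="$1" -v hr="$2" \
        'BEGIN { printf "%+.2f\n", (ha - hr) / ha * 100 }'
}

# Low Delta = (Low Actual - unrounded Low Expected) / Low Actual, as a percentage.
low_delta_pct() {  # args: low_setpoint high_actual high_rated low_actual
    awk -v s="$1" -v ha="$2" -v hr="$3" -v la="$4" \
        'BEGIN { e = s * ha / hr; printf "%+.2f\n", (la - e) / la * 100 }'
}

# Succeeds if a percentage lies within [lo, hi], for example -10 35 for High Limits.
within_limits() {  # args: pct lo hi
    awk -v p="$1" -v lo="$2" -v hi="$3" \
        'BEGIN { p += 0; lo += 0; hi += 0; exit !(p >= lo && p <= hi) }'
}
```

For Fan 1, for example, 10010 * 13740 / 13000 is about 10579.8, which the output shows as 10580, and (11100 - 10579.8) / 11100 is about 4.69%, matching the reported Low Delta. The sample prints Low Delta without a sign; the sketch always prints one.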
To save SP-based diagnostics test results, redirect the output to a file on the network share volume (NSV). For example, to save the results of all the tests that you run to diags.log1, use:
diags run tests -a > /mnt/log/diags.log1
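If you run this command through SSH scripting rather than in an interactive SP session, quoting decides where the output is written: a quoted command string is interpreted by the shell on the SP, while an unquoted redirection is performed by the shell on your own host. The following minimal sketch shows both. SP_HOST, SP_USER, and the function names are hypothetical, and it assumes that the SP command shell handles the redirection over SSH as it does in the example above.

```shell
#!/bin/sh
# Minimal sketch: save SP-based results when you drive the SP through SSH.
# HYPOTHETICAL placeholders: set SP_HOST and SP_USER for your own SP.
# ASSUMPTION: the SP command shell handles ">" over SSH as in the example above.
SP_HOST="${SP_HOST:-sp.example.com}"
SP_USER="${SP_USER:-admin}"

# Quoted: the redirection travels to the SP, so the log is written to /mnt/log.
save_results_on_nsv() {  # arg: log file name
    ssh "${SP_USER}@${SP_HOST}" "diags run tests -a > /mnt/log/$1"
}

# Unquoted: the local shell performs the redirection, so the log stays on this host.
save_results_locally() {  # arg: local file path
    ssh "${SP_USER}@${SP_HOST}" diags run tests -a > "$1"
}

# Example usage:
#   save_results_on_nsv diags.log1
#   save_results_locally ./diags.log1
```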
To save CD-based diagnostics test results, mount a USB storage device or a floppy disk and save the results to it.
To mount a USB storage device:
mount /usbstorage
Note - Mounting usbstorage works only if you have a single disk drive in your system.
To mount a floppy disk:
mount /floppy
When you are finished, unmount the device:
umount /<usbstorage | floppy>
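The mount, save, and unmount steps can be combined so that the device is always unmounted, even when the tests report an error. This is a minimal sketch; save_cd_results and the log file name diags.log are hypothetical, and it assumes that diags run tests -a writes its results to standard output, as the SP-based redirection example implies.

```shell
#!/bin/sh
# Minimal sketch: save CD-based results to removable media, then unmount it.
# ASSUMPTIONS: the mount point is one of those shown above (/usbstorage or
# /floppy), and "diags run tests -a" writes its results to standard output.
save_cd_results() {  # args: mount_point log_file_name
    mnt="$1"
    log="$2"
    mount "$mnt" || return 1
    diags run tests -a > "$mnt/$log"
    save_rc=$?
    umount "$mnt"    # unmount even if the tests reported an error
    return "$save_rc"
}

# Example usage:
#   save_cd_results /usbstorage diags.log
#   save_cd_results /floppy diags.log
```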
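To cancel submitted tests, specify either the test handle that diags run tests reported or the --all option:
diags cancel tests {-t|--test} TEST HANDLE {-a|--all}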
Copyright © 2005, Sun Microsystems, Inc. All Rights Reserved.