Netra CT Server Service Manual
|
|
Troubleshooting the System
|
This chapter gives instructions for troubleshooting the Netra CT server. You can troubleshoot the system in several ways, as described in the sections that follow.
In addition, Appendix C lists the error messages that might appear when you are operating or servicing your Netra CT server.
4.1 Troubleshooting the System Using the System Status Panel
You can use the system status panel to troubleshoot the Netra CT server.
4.1.1 Locating and Understanding the System Status Panel
The system status panel on the Netra CT server gives the majority of the troubleshooting information that you will need for your server. FIGURE 4-1 shows the locations of the system status panels on the Netra CT servers. FIGURE 4-2 shows the system status panel for the Netra CT 810 server, and FIGURE 4-3 shows the system status panel for the Netra CT 410 server.
FIGURE 4-1 System Status Panel Locations
FIGURE 4-2 System Status Panel (Netra CT 810 Server)
FIGURE 4-3 System Status Panel (Netra CT 410 Server)
4.1.2 Using the System Status Panel LEDs to Troubleshoot the System
When you first power on the Netra CT server, some or all of the green Power LEDs on the system status panel flash on and off for several seconds. Do not attempt to troubleshoot the system until the LEDs have gone through their initial power-on testing.
Each major component in the Netra CT 810 server or Netra CT 410 server has a set of LEDs on the system status panel that gives the status of that particular component. Each component has either the green Power and amber Okay to Remove LEDs (FIGURE 4-4) or the green Power and amber Fault LEDs (FIGURE 4-5).
FIGURE 4-4 Power and Okay to Remove LEDs
FIGURE 4-5 Power and Fault LEDs
TABLE 4-1 describes which combination of LEDs is used for each component in the Netra CT 810 server, and TABLE 4-2 describes which combination of LEDs is used for each component in the Netra CT 410 server. Note that the components in the Netra CT servers all have the green Power LED, and they will have either the amber Okay to Remove LED or the amber Fault LED, but not both.
TABLE 4-1 System Status Panel LEDs for the Netra CT 810 Server
LED | LEDs Available | Component
HDD 0 | Power and Okay to Remove | Upper hard disk drive
HDD 1 | Power and Okay to Remove | Lower hard disk drive
Slot 1 | Power and Okay to Remove | Host CPU card installed in slot 1
Slots 2 - 7 | Power and Okay to Remove | I/O card or satellite CPU card (●) installed in slots 2 - 7
Slot 8 | Power and Okay to Remove | Alarm card (■) installed in slot 8
SCB | Power and Fault | System controller board (behind the system status panel)
FAN 1 | Power and Fault | Upper fan tray (behind the system status panel)
FAN 2 | Power and Fault | Lower fan tray (behind the system status panel)
RMM | Power and Okay to Remove | Removable media module
PDU 1 (DC only) | Power and Fault | Leftmost power distribution unit (behind the server)
PDU 2 (DC only) | Power and Fault | Rightmost power distribution unit (behind the server)
PSU 1 | Power and Okay to Remove | Leftmost power supply unit
PSU 2 | Power and Okay to Remove | Rightmost power supply unit
TABLE 4-2 System Status Panel LEDs for the Netra CT 410 Server
LED | LEDs Available | Component
Slot 1 | Power and Okay to Remove | Alarm card (■) installed in slot 1
Slot 2 | Power and Okay to Remove | I/O card or satellite CPU card (●) installed in slot 2
Slot 3 | Power and Okay to Remove | Host CPU card installed in slot 3
Slots 4 and 5 | Power and Okay to Remove | I/O cards or satellite CPU cards (●) installed in slots 4 and 5
HDD 0 | Power and Okay to Remove | Hard disk drive
SCB | Power and Fault | System controller board (behind the system status panel)
FAN 1 | Power and Fault | Upper fan tray (behind the system status panel)
FAN 2 | Power and Fault | Lower fan tray (behind the system status panel)
FTC | Power and Fault | Host CPU front transition card or host CPU front termination board
PDU 1 (DC only) | Power and Fault | Power distribution unit (behind the server)
PSU 1 | Power and Okay to Remove | Power supply
- TABLE 4-3 gives the LED states and meanings for any CompactPCI boards installed in a slot in the Netra CT 810 server or Netra CT 410 server.
- TABLE 4-4 gives the LED states and meanings for any component other than a CompactPCI board that has the green Power and amber Okay to Remove LEDs.
- TABLE 4-5 gives the LED states and meanings for any component other than a CompactPCI board that has the green Power and amber Fault LEDs.
Note - Do not use the information in TABLE 4-4 to troubleshoot a power supply unit in a server that has only one power supply unit (a Netra CT 410 server or a Netra CT 810 server with only one power supply). To troubleshoot the power supply in a single power supply system, use the LEDs on the power supply itself. Refer to Section 4.6, Troubleshooting a Power Supply Using the Power Supply Unit LEDs, for more information. The information given in TABLE 4-4 applies to all other components in the Netra CT 810 server or Netra CT 410 server, including the power supplies in a Netra CT 810 server that has two power supplies.
|
TABLE 4-3 CompactPCI Board LED States and Meanings
Green Power LED state
|
Amber Okay to Remove LED state
|
Meaning
|
Action
|
Off
|
Off
|
The slot is empty or the system thinks that the slot is empty because the system didn't detect the card when it was inserted.
|
If there is a card installed in this slot, then one of the following components is faulty:
- the card installed in the slot
- the alarm card
- the system controller board
Remove and replace the failed component to clear this state.
|
Blinking
|
Off
|
The card is coming up or going down.
|
Do not remove the card in this state.
|
On
|
Off
|
The card is up and running.
|
Do not remove the card in this state.
|
Off
|
On
|
The card is powered off.
|
You can remove the card in this state.
|
Blinking
|
On
|
The card is powered on, but it is offline for some reason (for example, a fault was detected on the card).
|
Wait several seconds to see if the green Power LED stops blinking. If it does not stop blinking after several seconds, enter cfgadm and verify that the card is in the unconfigured state, then perform the necessary action, depending on the card:
- Alarm card--You can remove the alarm card in this state.
- All other cards--Power off the slot through the alarm card software, then remove the card.
|
On
|
On
|
The card is powered on and is in use, but a fault has been detected on the card.
|
Deactivate the card using one of the following methods:
- Use the cfgadm -f -c unconfigure command to deactivate the card. Note that in some cases, this may cause the system to panic, depending on the nature of the card hardware or software.
- Halt the system and power off the slot through the alarm card software, then remove the card.
The green Power LED will then give status information:
- If the green Power LED goes off, then you can remove the card.
- If the green Power LED remains on, then you must halt the system and power off the slot through the alarm card software.
|
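The LED combinations in TABLE 4-3 can be treated as a simple lookup. The following Python sketch is only an illustration of that idea: the dictionary and function names are hypothetical and are not part of any Netra CT software, and the action strings are shortened summaries of the table entries, not replacements for them.

```python
from typing import Tuple

# Illustrative lookup built from TABLE 4-3. The names used here are
# hypothetical and do not belong to any Netra CT software.
# Key:   (green Power LED state, amber Okay to Remove LED state)
# Value: (meaning, shortened summary of the recommended action)
CPCI_LED_TABLE = {
    ("off", "off"): (
        "The slot is empty or the card was not detected.",
        "If a card is installed, replace the failed component "
        "(card, alarm card, or system controller board).",
    ),
    ("blinking", "off"): (
        "The card is coming up or going down.",
        "Do not remove the card in this state.",
    ),
    ("on", "off"): (
        "The card is up and running.",
        "Do not remove the card in this state.",
    ),
    ("off", "on"): (
        "The card is powered off.",
        "You can remove the card in this state.",
    ),
    ("blinking", "on"): (
        "The card is powered on but offline.",
        "Wait several seconds, then check the card state with cfgadm.",
    ),
    ("on", "on"): (
        "The card is in use, but a fault has been detected.",
        "Deactivate the card before removing it.",
    ),
}


def describe_cpci_leds(green: str, amber: str) -> Tuple[str, str]:
    """Return (meaning, action) for a CompactPCI slot LED pair."""
    key = (green.strip().lower(), amber.strip().lower())
    if key not in CPCI_LED_TABLE:
        raise ValueError(f"LED combination not listed in TABLE 4-3: {key}")
    return CPCI_LED_TABLE[key]


if __name__ == "__main__":
    meaning, action = describe_cpci_leds("Blinking", "On")
    print(meaning)
    print(action)
```

A combination that does not appear in TABLE 4-3 (for example, a blinking amber LED) raises an error rather than returning a guess, because the table defines no meaning for it.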
TABLE 4-4 Meanings of Power and Okay to Remove LEDs
LED State | Power LED | Okay to Remove LED
On, Solid | Component is installed and configured. | Component is Okay to Remove. You can remove the component from the system, if necessary.
On, Flashing | Component is installed but is unconfigured or is going through the configuration process. | Not applicable.
Off | Component was not recognized by the system or is not installed in the slot. | Component is not Okay to Remove. Do not remove the component while the system is running.
TABLE 4-5 Meanings of Power and Fault LEDs
LED State | Power LED | Fault LED
On, Solid | Component is installed and configured. | Component has failed. Replace the component.
On, Flashing | Component is installed but is unconfigured or is going through the configuration process. | Not applicable.
Off | Component was not recognized by the system or is not installed in the slot. | Component is functioning properly.
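TABLE 4-4 and TABLE 4-5 share the same Power LED column and differ only in how the amber LED is read. The following Python sketch shows one way to express that; all names are hypothetical. It does not apply to the power supply unit in a single power supply server, which must be checked using the LEDs on the power supply itself.

```python
# Illustrative sketch of TABLE 4-4 and TABLE 4-5; all names are hypothetical.
POWER_LED = {
    "on": "Component is installed and configured.",
    "flashing": "Component is installed but is unconfigured or is being configured.",
    "off": "Component was not recognized by the system or is not installed.",
}

# Meaning of the amber LED, keyed by the type of amber LED the component has.
AMBER_LED = {
    "okay_to_remove": {
        "on": "Component is Okay to Remove.",
        "off": "Component is not Okay to Remove.",
    },
    "fault": {
        "on": "Component has failed. Replace the component.",
        "off": "Component is functioning properly.",
    },
}


def describe_component_leds(amber_type, power_state, amber_state):
    """Return (power meaning, amber meaning) for a non-CompactPCI component."""
    if amber_type not in AMBER_LED:
        raise ValueError(f"unknown amber LED type: {amber_type}")
    if power_state not in POWER_LED:
        raise ValueError(f"unknown Power LED state: {power_state}")
    if amber_state not in AMBER_LED[amber_type]:
        raise ValueError(f"unknown amber LED state: {amber_state}")
    return POWER_LED[power_state], AMBER_LED[amber_type][amber_state]


if __name__ == "__main__":
    # Example: a fan tray (Power and Fault LEDs) with both LEDs lit.
    for line in describe_component_leds("fault", "on", "on"):
        print(line)
```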
4.2 Troubleshooting the System Using prtdiag
You can troubleshoot the system using the prtdiag command. Log in to the server console and, as root, enter:
# /usr/platform/sun4u/sbin/prtdiag
|
If you have a Netra CT 810 server, you should get output on the console similar to the following:
CODE EXAMPLE 4-1 prtdiag Output for a Netra CT 810 Server
System Configuration: Sun Microsystems sun4u SPARCengine CP2000 model 140
(UltraSPARC-IIi 648MHz)
Memory size: 512 Megabytes
platform is : SUNW,NetraCT-810
=============================== FRU Information ===============================
FRU FRU FRU Green Amber Miscellaneous
Type Unit# Present LED LED Information
---------- ----- ------- ----- ----- --------------------------
Midplane 1 Yes Netra ct800
Properties:
Version=0
Maximum Slots=8
SCB 1 Yes on off System Controller Board
Properties:
Version=2
hotswap-mode=basic
SSB 1 Yes System Status Panel
CPU 1 Yes on off CPU board
temperature(celsius):38
I/O 2 Yes on off CompactPCI IO Slot
Properties:
auto-config=disabled
Board Type:Unknown
Devices:
pci
pci108e,1000
SUNW,hme
SUNW,isptwo
I/O 3 Yes on off CompactPCI IO Slot
Properties:
auto-config=disabled
Board Type:Unknown
Devices:
pci
pci108e,1000
SUNW,hme
SUNW,isptwo
I/O 4 Yes on off CompactPCI IO Slot
Properties:
auto-config=disabled
Board Type:Unknown
Devices:
pci
pci108e,1000
SUNW,hme
SUNW,isptwo
I/O 5 Yes on off CompactPCI IO Slot
Properties:
auto-config=disabled
Board Type:Unknown
Devices:
pci
pci108e,1000
SUNW,hme
SUNW,isptwo
I/O 6 Yes on off CompactPCI IO Slot
Properties:
auto-config=disabled
I/O 7 Yes on off CompactPCI IO Slot
Properties:
auto-config=disabled
Board Type:Unknown
Devices:
pci
pci108e,1000
SUNW,qfe
pci108e,1000
SUNW,qfe
pci108e,1000
SUNW,qfe
pci108e,1000
SUNW,qfe
pci1176,608
I/O 8 Yes on off CompactPCI IO Slot
Properties:
auto-config=disabled
Board Type:Alarm Card
Devices:
pci
ebus
ethernet
PDU 1 Yes on off Power Distribution Unit
PDU 2 Yes on off Power Distribution Unit
PSU 1 Yes on on Power Supply Unit
condition:ok
temperature:ok
ps fan:ok
supply:on
PSU 2 Yes on on Power Supply Unit
condition:ok
temperature:ok
ps fan:ok
supply:on
FAN 1 Yes on off Fan Tray
condition:ok
fan speed:low
FAN 2 Yes on off Fan Tray
condition:ok
fan speed:low
HDD 0 Yes on off Hard Disk Drive
condition:ok
HDD 1 Yes on off Hard Disk Drive
condition:ok
RMM Yes on on Removable Media Module
condition:Unknown
System Board PROM revision:
---------------------------
OBP 3.14.1 2000/04/28 12:56
|
If you have a Netra CT 410 server, you should get output on the console similar to the following:
CODE EXAMPLE 4-2 prtdiag Output for a Netra CT 410 Server
System Configuration: Sun Microsystems sun4u SPARCengine CP2000 model 140
(UltraSPARC-IIi 648MHz)
Memory size: 512 Megabytes
platform is : SUNW,NetraCT-410
=============================== FRU Information ===============================
FRU FRU FRU Green Amber Miscellaneous
Type Unit# Present LED LED Information
---------- ----- ------- ----- ----- --------------------------
Midplane 1 Yes Netra ct400
Properties:
Version=0
Maximum Slots=5
SCB 1 Yes on off System Controller Board
Properties:
Version=2
hotswap-mode=basic
SSB 1 Yes System Status Panel
I/O 1 Yes on off CompactPCI IO Slot
Properties:
auto-config=disabled
Board Type:Alarm Card
Devices:
pci
ebus
ethernet
I/O 2 Yes off off CompactPCI IO Slot
Properties:
auto-config=disabled
CPU 3 Yes on off CPU board
temperature(celsius):38
I/O 4 Yes on off CompactPCI IO Slot
Properties:
auto-config=disabled
Board Type:Unknown
Devices:
pci
pci108e,1000
SUNW,hme
SUNW,isptwo
I/O 5 Yes on off CompactPCI IO Slot
Properties:
auto-config=disabled
Board Type:Unknown
Devices:
pci
pci108e,1000
SUNW,qfe
pci108e,1000
SUNW,qfe
pci108e,1000
SUNW,qfe
pci108e,1000
SUNW,qfe
PDU 1 Yes on off Power Distribution Unit
PSU 1 Yes on off Power Supply Unit
condition:ok
temperature:ok
ps fan:ok
supply:on
FAN 1 Yes on off Fan Tray
condition:ok
fan speed:low
FAN 2 Yes on off Fan Tray
condition:ok
fan speed:low
HDD 0 Yes on off Hard Disk Drive
condition:ok
System Board PROM revision:
---------------------------
OBP 3.14.1 2000/04/28 12:56
|
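When prtdiag output is captured to a file, the FRU lines can be scanned automatically. The following Python sketch is one possible approach, not a supported tool: it assumes that each FRU line is whitespace-separated in the order shown in the code examples above (type, optional unit number, presence, green LED, amber LED, description), and the function names are hypothetical. Because the amber LED is a Fault LED only for some components (SCB, FAN, PDU, and FTC in TABLE 4-1 and TABLE 4-2), only those FRU types are reported as faulted.

```python
import re

# FRU types whose amber LED is a Fault LED according to TABLE 4-1 and
# TABLE 4-2. For other FRU types the amber LED is the Okay to Remove LED.
FAULT_LED_TYPES = {"SCB", "FAN", "PDU", "FTC"}

# Assumed line layout: TYPE [UNIT] Yes|No [GREEN AMBER] [DESCRIPTION]
FRU_LINE = re.compile(
    r"^(?P<type>[A-Z][A-Za-z/]*)"                      # e.g. PSU, I/O, Midplane
    r"(?:\s+(?P<unit>\d+))?"                           # RMM has no unit number
    r"\s+(?P<present>Yes|No)"
    r"(?:\s+(?P<green>on|off)\s+(?P<amber>on|off))?"   # Midplane and SSB list no LEDs
    r"(?:\s+(?P<info>.*))?$"
)


def parse_fru_lines(text):
    """Return one dict per FRU line; property and device lines are skipped."""
    records = []
    for line in text.splitlines():
        match = FRU_LINE.match(line.strip())
        if match:
            records.append(match.groupdict())
    return records


def faulted_frus(records):
    """Return the FRUs that have a Fault LED and report it as on."""
    return [r for r in records
            if r["type"] in FAULT_LED_TYPES and r["amber"] == "on"]


if __name__ == "__main__":
    sample = (
        "SCB 1 Yes on off System Controller Board\n"
        "PSU 1 Yes on on Power Supply Unit\n"
        "FAN 2 Yes on on Fan Tray\n"
    )
    for fru in faulted_frus(parse_fru_lines(sample)):
        print(fru["type"], fru["unit"], fru["info"])
```

In this sample, PSU 1 is not reported even though its amber LED is on, because that LED is the Okay to Remove LED for a power supply unit.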
4.3 Troubleshooting the System Using Diagnostic Software
There are several software packages that allow you to run diagnostic tests on your system, such as SunVTS. SunVTS is a validation test suite that is provided as a supplement to the Solaris operating environment. The individual tests can stress a device, system, or resource so as to detect and pinpoint specific hardware and software failures and provide users with informational messages to resolve any problems found. SunVTS runs at the operating system level.
There are several tests that are particularly useful when troubleshooting a Netra CT server:
- alarm2test--alarm2test is part of SunVTS, but it is used specifically to test the alarm card installed in the Netra CT server by invoking the alarmdiag test on the alarm card. alarm2test runs at the operating system level.
- obdiag--obdiag is similar to the alarm2test, in that it invokes the alarmdiag test on the alarm card; however, obdiag is run from the firmware level, not the operating system level.
- Apost--Apost is part of the Chorus operating system image on the alarm card. It runs a basic test on the alarm card to verify that the alarm card is operating properly before bringing up Chorus on the alarm card.
A new utility called diagconf, which is also part of the Chorus operating system image on the alarm card, is now available. You can use diagconf to set or display the configuration settings for Apost, allowing you to make the tests run on the alarm card more or less thoroughly before the Chorus operating system is brought up on the alarm card.
To display the values currently set for Apost, access the alarm card command line interface (CLI), and, through the alarm card CLI, enter the following command:
hostname cli> diagconf -d
|
You should see output similar to the following, giving you the values currently set for the Apost test on the alarm card:
diag-switch False
verb-mode True
stop-on-error False
diag-level Max
mfg-mode Off
hdr-checksum 0xaa
time-stamp 0
record-format-ver 49
post-version 02
reset-status 0xd0000000
post-status ...
post-msg Watchdog Reset-------- POST Passed-------------------
|
Some values are hard-set and cannot be changed by a user, while others can be changed to make that particular test more or less thorough. To change the value for a particular test, enter the following command:
hostname cli> diagconf -s command value
|
where command is the name of the command that you want to change, and value is the value to which you want to set it.
The following table lists the Apost tests that can be changed by a user and the allowable values for each. Any tests not listed in TABLE 4-6 are either hard-set and cannot be changed, or should not be changed by a user.
TABLE 4-6 Apost Tests and Values through diagconf
Command
|
Value
|
diag-switch
|
- True--Turns the diag-switch test on.
- False--Turns the diag-switch test off.
|
verb-mode
|
- True--Turns the verb-mode test on.
- False--Turns the verb-mode test off.
|
stop-on-error
|
- True--Stops the Apost testing when the first error is encountered.
- False--Continues Apost testing, regardless of the number of errors encountered.
|
diag-level
|
- Off--Turns the diag-level test off.
- Min--Sets the diag-level test to the minimum level of testing.
- Max--Sets the diag-level test to the maximum level of testing.
|
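Before entering a diagconf -s command, it can help to check the command name and value against TABLE 4-6. The following Python sketch only builds and validates the command string; it does not contact the alarm card. The function name is hypothetical, and the exact-case matching of values is an assumption based on how the values are printed in TABLE 4-6.

```python
# Commands and values copied from TABLE 4-6. Exact-case matching is an
# assumption; the table does not say whether diagconf is case-sensitive.
ALLOWED_VALUES = {
    "diag-switch": ("True", "False"),
    "verb-mode": ("True", "False"),
    "stop-on-error": ("True", "False"),
    "diag-level": ("Off", "Min", "Max"),
}


def diagconf_set_command(command, value):
    """Return the 'diagconf -s' command line, or raise if not in TABLE 4-6."""
    if command not in ALLOWED_VALUES:
        raise ValueError(f"{command} is not listed as user-changeable in TABLE 4-6")
    if value not in ALLOWED_VALUES[command]:
        allowed = ", ".join(ALLOWED_VALUES[command])
        raise ValueError(f"{value} is not allowed for {command}; use one of: {allowed}")
    return f"diagconf -s {command} {value}"


if __name__ == "__main__":
    print(diagconf_set_command("diag-level", "Min"))
```

Settings such as mfg-mode are rejected by this sketch because TABLE 4-6 does not list them as changeable by a user.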
For more information on these and other tests in the SunVTS test suite, refer to the Computer Systems Release Notes Supplement for Sun Hardware document or the SunVTS documentation on the Solaris on Sun Hardware Answerbook, both included with your Solaris operating environment.
4.4 Troubleshooting the System Using the Power-On Self Test (POST)
When you first power up the Netra CT server, some or all of the green Power LEDs on the system status panel will flash on and off for several seconds. The green Power LED for the I/O slot holding the CPU card (slot 1 in the Netra CT 810 server and slot 3 in the Netra CT 410 server) will go to solid green while the green Power LEDs for the remaining components are still flashing on and off; this is an indication that the CPU card has passed the power-on self test (POST).
Before any processing can occur on a system, it must successfully complete the POST. Messages are displayed for each step in the POST process. If there is a critical failure, the system will not complete POST and will not boot. To monitor this process, you must be connected to the TTY A port on the CPU card or CPU transition card. See Section 5.2.1, Logging In to the Netra CT Server.
OpenBoot PROM (OBP) variables control the console port. The variables and their possible settings are described below.
To see the console output device, enter:
ok printenv output-device
|
The screen displays the current setting of the output-device variable.
The possible settings for this variable are:
- ttya (default)
- ttyb
- screen
- rsc
ttya and ttyb represent the serial ports on the CPU card. screen represents the display attached to the first frame buffer installed in the system (not present on the Netra CT server). rsc is used by the alarm card.
To see the console input device, enter:
ok printenv input-device
|
The screen displays the current setting of the input-device variable.
The possible settings for this variable are:
- ttya (default)
- ttyb
- keyboard
- rsc
ttya and ttyb represent the serial ports on the CPU card. keyboard represents the standard system keyboard (not present on the Netra CT server). rsc is used by the alarm card. If no system keyboard is connected, the console port defaults to ttya.
Note - Be sure the two variables are consistent with each other. For example, do not set the output-device to screen and the input-device to ttya.
|
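The consistency rule in the note above can be expressed as a small check. In the following Python sketch, the pairing of each output-device setting with an input-device setting (ttya with ttya, ttyb with ttyb, screen with keyboard, rsc with rsc) is an assumption drawn from the lists of possible settings; only the screen and ttya mismatch is stated explicitly in this section. The function name is hypothetical.

```python
# Assumed pairing of output-device settings with input-device settings.
# Only the screen/ttya mismatch is named explicitly in this section.
MATCHING_INPUT = {
    "ttya": "ttya",
    "ttyb": "ttyb",
    "screen": "keyboard",
    "rsc": "rsc",
}


def console_settings_consistent(output_device, input_device):
    """Return True if the two OBP console variables agree with each other."""
    if output_device not in MATCHING_INPUT:
        raise ValueError(f"unknown output-device setting: {output_device}")
    return MATCHING_INPUT[output_device] == input_device


if __name__ == "__main__":
    print(console_settings_consistent("ttya", "ttya"))
    print(console_settings_consistent("screen", "ttya"))
```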
There is another OBP variable that controls the behavior of the POST process called diag-level. By default, this variable is set to max, which means POST will run more thorough/verbose tests against the hardware. This variable can also be set to min, which will run a less stringent set of tests against the hardware. A minimum level of POST testing also takes less time, so the Solaris operating environment can boot more quickly on a machine with diag-level set to min.
To run the maximum level of POST tests, enter:
ok setenv diag-level max
|
To run the minimum level of POST tests, enter:
ok setenv diag-level min
|
4.5 Troubleshooting the System Using the Alarm Card Software
For information on troubleshooting using the alarm card software, refer to the Netra CT Server System Administration Guide (816-2483-xx).
4.6 Troubleshooting a Power Supply Using the Power Supply Unit LEDs
There are two LEDs on each power supply unit: a green LED and an amber LED. You can use the LEDs on the power supply unit to troubleshoot each power supply unit; however, because there is one power supply unit in the Netra CT 410 server and two power supply units in the Netra CT 810 server, the actions to take are different.
4.6.1 Troubleshooting the Power Supply Unit in the Netra CT 410 Server
Following are the states of the LEDs on the power supply unit in the Netra CT 410 server:
- Green, flashing--The power supply unit is in the standby mode; the power supply unit is powered on, but it is not supplying power to the server.
- Green, solid--Both the server and the power supply unit are powered on and functioning properly.
- Amber--A fault was found in the power supply unit. Replace the power supply unit. See Section 10.5, Power Supply Unit for those instructions.
- Off--One of the following conditions applies:
- The power supply locking mechanism is in the upper, unlocked position.
- The accompanying cable is disconnected from the DC power distribution unit or the AC power entry unit.
- The accompanying power distribution unit has failed.
- The power supply unit has failed.
4.6.2 Troubleshooting the Power Supply Units in the Netra CT 810 Server
When both power supply units in a Netra CT 810 server are up and running properly, the green LEDs on both power supply units will be on (note that these are the LEDs on the power supply units themselves, not the LEDs on the system status panel).
If a power supply unit fails, the amber LED on the power supply unit might light, depending on the type of failure that has occurred:
- If a soft-fault occurs, such as a stuck fan or a temperature warning, you should get a notification of the error; however, the amber LED on the power supply unit will not light for a soft-fault condition. The power supply unit is still supplying power to the system during a soft-fault condition.
- If a hard-fault occurs, such as a voltage problem, you should get a notification of the error. In addition, the amber LED on the power supply unit does light for a hard-fault condition. The power supply unit does not supply power to the system during a hard-fault condition.
If one power supply unit fails (either a soft-fault or a hard-fault), but the other power supply unit is still functioning normally, you should replace the faulty power supply unit as soon as possible to keep the system up and running. If both power supply units fail, the action you should take varies depending on which of the two types of fault has occurred:
If | Then
Both power supply units go through a soft-fault | Replace one power supply unit at a time in order to keep the system up and running.
One power supply unit goes through a soft-fault and the other power supply unit goes through a hard-fault | Replace the power supply unit that has gone through a hard-fault first in order to keep the system up and running.
Both power supply units go through a hard-fault | The system is down and you should replace at least one of the power supply units to bring the system back up again.
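The replacement rules for a Netra CT 810 server with two power supply units can be summarized as a decision function. The following Python sketch is only a restatement of the text and table above; the state names ("ok", "soft", "hard") and the function name are hypothetical.

```python
VALID_STATES = ("ok", "soft", "hard")


def psu_action(psu1, psu2):
    """Return the recommended action for the two power supply unit states."""
    for state in (psu1, psu2):
        if state not in VALID_STATES:
            raise ValueError(f"unknown power supply state: {state}")

    faults = sorted(state for state in (psu1, psu2) if state != "ok")
    if not faults:
        return "No action required."
    if len(faults) == 1:
        return "Replace the faulty power supply unit as soon as possible."
    if faults == ["soft", "soft"]:
        return "Replace one power supply unit at a time."
    if faults == ["hard", "hard"]:
        return "System is down. Replace at least one power supply unit."
    # Remaining case: one soft-fault and one hard-fault.
    return "Replace the power supply unit with the hard-fault first."


if __name__ == "__main__":
    print(psu_action("soft", "hard"))
```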
4.7 Troubleshooting a CPU Card
This section describes how to troubleshoot problems related to the CPU card. The information provided here primarily covers those situations when the system containing the CPU card does not boot up or when the CPU card is not fully functional after boot up. Only general troubleshooting tips are provided here. No component level troubleshooting information is included in this section.
The following topics are covered:
- General troubleshooting tips
- General troubleshooting requirements
- Mechanical failures
- Power-on failures
- Failures subsequent to power-on
- Troubleshooting during POST/OBP and during boot process
The following diagnostic procedures are also described:
- OpenBoot PROM on-board diagnostics
- OpenBoot diagnostics
4.7.1 General Troubleshooting Tips
|
Caution - High voltages are present in the Netra CT server. To avoid physical injury, follow all the safety rules specified in the Netra CT Server Safety and Compliance Manual when opening the enclosure and/or removing and installing the board.
|
The following general troubleshooting tips are useful in isolating the problems related to the CPU card:
1. Make sure the CPU card is installed properly in the correct slot in the Netra CT server.
The CPU card should be installed in slot 1 in the Netra CT 810 server and in slot 3 in the Netra CT 410 server.
2. Make sure all the necessary cables are attached properly to the CPU transition card.
The following figures show the connectors on the different CPU transition cards:
Note - The CPU rear transition card is the same for both the Netra CT 810 server and the Netra CT 410 server; only the location in the rear card cage differs.
|
FIGURE 4-6 Connectors on the CPU Front Transition Card (Netra CT 410 Server)
FIGURE 4-7 Connectors on the CPU Rear Transition Card
4.7.2 General Troubleshooting Requirements
The following devices are generally required to take some of the recommended actions in this section:
- Network interface
- TTYA and TTYB connection or an ASCII terminal connection to serial port
- Parallel port interface
- Loopback connectors
4.7.3 Mechanical Failures
Symptom
Unable to insert the CPU card into the backplane.
Action
1. Verify that there are no mechanical and physical obstructions in the slot where the CPU card is going to be installed.
2. Make sure no pins on the board connectors or the CompactPCI backplane connectors are bent or damaged.
4.7.4 Power-On Failures
This section provides examples of power-on failure symptoms and suggested actions. There can be several reasons for the power-on failures.
Make sure the CPU card is installed properly.
Note - If both the Ready and Alarm LEDs on the CPU card are green, the board is partially functional and capable of running the power-on self test (POST). This means that the basic functionality of the board is present. If none of these LEDs is green, and the board is installed properly, the board is not functional. In that case, contact your Sun supplier or field service engineer.
|
4.7.5 Failures Subsequent to Power-On
Symptom
Cannot connect successfully to a TTY serial port: there are no POST messages, and keyboard input is not accepted.
Action
1. Check the TTY cable for proper setup.
2. If you do not see any output after connecting the TTY terminal to the CPU transition card, remove it and connect it to the COM port of the CPU card and try again.
4.7.6 Troubleshooting During POST/OBP and During Boot Process
This section describes certain possible problems encountered while running POST and OBP and during the boot process.
Symptom
POST error message displays:
cannot establish network service
|
Action
This might be a hardware address problem. On the network server, verify that the media access control (MAC) address and the IP address for this system are present and correct, and add them if they are missing.
Symptom
POST detects Ecache error and a message similar to the one below is displayed:
STATUS =FAILED
TEST =Memory Addr w/ Ecache
SUSPECT=U5201 and U5202
MESSAGE=Mem Addr line compare error
addr 00000000.00000000
exp 00000000.00000000
obs 88888888.88888888
|
Action
This might be a mounting issue with the CPU Mylar film, socket, or heatsink, which could have occurred during transportation or as a result of severe vibration. Contact Sun's Enterprise Services Solution Center.
|
Caution - Any attempt to disassemble or replace the aforementioned devices will void the warranty.
|
4.7.7 OpenBoot PROM On-Board Diagnostics
There are several OBP variables specific to the Netra CT server, such as:
- pcia-probe-list--Probes the bus that runs the first ethernet port (front connection) and standard I/O devices (by default: 1, 2)
- pcib-probe-list--Probes the bus that runs the second ethernet port (rear connection) (by default: 1, 2, 3)
- cpci-probe-list--Probes the bus that runs connections to all cPCI slots in the ct400 or ct800 (by default: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d, e, f)
The following section describes the OBP on-board diagnostics. To execute the OBP on-board diagnostics, the system must be at the ok prompt. The OBP on-board diagnostics are listed as follows:
- watch-clock
- watch-net and watch-net-all
- probe-scsi
- test alias name, device path, -all
4.7.7.1 watch-clock
The watch-clock command reads a register in the NVRAM/TOD chip and displays the result as a seconds counter. During normal operation, the seconds counter repeatedly increments from 0 to 59 until interrupted by pressing any key on the PS/2 keyboard. The following identifies the watch-clock output message.
ok watch-clock
Watching the seconds register of the real time clock chip
It should be ticking once a second
Type any key to stop
49
ok
|
4.7.7.2 watch-net and watch-net-all
The watch-net and watch-net-all commands monitor Ethernet packets on the Ethernet interfaces connected to the system. Good packets received by the system are indicated by a period (.). Errors such as the framing error and the cyclic redundancy check (CRC) error are indicated with an X and an associated error description. CODE EXAMPLE 4-3 identifies the watch-net output message and CODE EXAMPLE 4-4 identifies the watch-net-all output message.
CODE EXAMPLE 4-3 watch-net Output Message
ok watch-net
Hme register test --- succeeded.
Internal loopback test -- succeeded.
Transceiver check -- Using Onboard Transceiver - Link Up. passed
Using Onboard Transceiver - Link Up.
Looking for Ethernet Packets.
. is a Good Packet. X is a Bad Packet.
Type any key to stop.
..................................................
................................................................
................................................................
........................................................
ok
|
CODE EXAMPLE 4-4 watch-net-all Output Message
ok watch-net-all
/pci@1f,0/pci@1,1/network@1,1
Hme register test --- succeeded.
Internal loopback test -- succeeded.
Transceiver check -- Using Onboard Transceiver - Link Up. passed
Using Onboard Transceiver - Link Up.
Looking for Ethernet Packets.
. is a Good Packet.
X is a Bad Packet.
Type any key to stop.
........ ........
........................................................
................................................................
................................................................
....................................
ok
|
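If watch-net or watch-net-all output is captured from the console, the good and bad packet indicators can be counted. The following Python sketch assumes that the packet indicators begin after the line "Type any key to stop." and that no error descriptions follow; if error descriptions are present, their own periods and X characters would be counted as well. The function name is hypothetical.

```python
def count_packets(output, marker="Type any key to stop."):
    """Return (good, bad) packet counts from captured watch-net output.

    Only the text after the marker line is examined, so the legend lines
    (". is a Good Packet." and "X is a Bad Packet.") are not counted.
    """
    _, found, tail = output.partition(marker)
    if not found:
        raise ValueError("marker line not found in output")
    return tail.count("."), tail.count("X")


if __name__ == "__main__":
    captured = (
        "Looking for Ethernet Packets.\n"
        ". is a Good Packet. X is a Bad Packet.\n"
        "Type any key to stop.\n"
        "....X...\n"
        "ok\n"
    )
    good, bad = count_packets(captured)
    print("good packets:", good, "bad packets:", bad)
```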
4.7.7.3 probe-scsi
The probe-scsi command transmits an inquiry command to SCSI devices connected to the system unit on-board SCSI interface. If the SCSI device is connected and active, the target address, unit number, device type, and manufacturer name are displayed. CODE EXAMPLE 4-5 identifies the probe-scsi output message.
CODE EXAMPLE 4-5 probe-scsi Output Message
ok probe-scsi
Primary UltraSCSI bus:
Target 0 Unit 0 Disk SEAGATE ST32272W 0876
Target 6
Unit 0 Removable Read Only device TOSHIBA CD-ROM XM-6201TA1037
ok
|
4.7.7.4 test alias name, device path, -all
The test command, combined with a device alias or device path, enables a device self-test program. If a device has no self-test program, the message: No selftest method for device name is displayed. To enable the self-test program for a device, type the test command followed by the device alias or device path name. TABLE 4-7 lists test alias name selections, a description of the selection, and preparation.
TABLE 4-7 Selected OBP On-Board Diagnostic Tests
Type of Test | Description | Preparation
test screen | Tests system video graphics hardware and monitor. | The diag-switch? NVRAM parameter must be true for the test to execute.
test floppy | Tests diskette drive response to commands. | A formatted diskette must be inserted into the diskette drive.
test net | Performs internal/external loopback test of the system auto-selected Ethernet interface. | An Ethernet cable must be attached to the system and to an Ethernet tap or hub, or the external loopback test fails.
test ttya, test ttyb | Outputs an alphanumeric test pattern on the system serial ports: ttya, serial port A; ttyb, serial port B. | A terminal must be connected to the port being tested to observe the output.
test keyboard | Executes the keyboard self-test. | Four keyboard LEDs should flash once and a message is displayed: Keyboard Present.
test -all | Sequentially tests system-configured devices containing self-test. | Tests are sequentially executed in device-tree order (viewed with the show-devs command).
4.7.8 OpenBoot Diagnostics (OBDiag)
OpenBoot Diagnostics (OBDiag) is an interactive tool that tests various hardware and peripheral devices. OBDiag performs root-cause failure analysis on the referenced devices by testing internal registers, confirming subsystem integrity, and verifying device functionality. To run OBDiag:
1. At the ok prompt, enter obdiag.
This displays the OBDiag menu as shown in CODE EXAMPLE 4-6.
2. At the OBDiag menu prompt, enter a number from the menu (such as 17 to toggle script-debug messages).
CODE EXAMPLE 4-6 OBDiag Menu
0 .... PCI/Cheerio
1 .... EBUS DMA/TCR Registers
2 .... Ethernet
3 .... Ethernet2 <Inactive>
4 .... Parallel Port
5 .... Serial Port C (on optional I/O board) <Inactive>
6 .... Serial Port D (on optional I/O board) <Inactive>
7 .... NVRAM
8 .... Floppy
9 .... Serial port A
10 ... Serial port B
11 ... RAS
12 ... User Flash1
13 ... User Flash2
14 ... All Above
15 ... Quit
16 ... Display this Menu
17 ... Toggle Script-debug
18 ... Enable External Loopback Tests
19 ... Disable External Loopback Tests
Enter (0-14 tests, 15 -Quit, 16 -Menu) ===>
|
|
Caution - Prior to running obdiag, do not run any other OBP command that may change the hardware state of the board. After obdiag tests are run, always reset the system to bring it to a known state.
|
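If the console output is captured to a log file, the menu in CODE EXAMPLE 4-6 can be parsed to find which tests are marked as inactive on a given system. The following Python sketch assumes that the menu lines keep the layout shown in CODE EXAMPLE 4-6. The parse_menu function and the MENU_LINE pattern are hypothetical names, not part of OBDiag.

```python
import re

# Pattern for one menu line, for example "3 .... Ethernet2 <Inactive>".
# The layout is assumed from CODE EXAMPLE 4-6.
MENU_LINE = re.compile(r"^\s*(\d+)\s+\.+\s+(.+?)(\s+<Inactive>)?\s*$")


def parse_menu(text: str) -> dict:
    """Return {number: (label, active)} for every menu line in the text."""
    entries = {}
    for line in text.splitlines():
        match = MENU_LINE.match(line)
        if match:
            number, label, inactive = match.groups()
            entries[int(number)] = (label, inactive is None)
    return entries


MENU_EXCERPT = """\
2 .... Ethernet
3 .... Ethernet2 <Inactive>
5 .... Serial Port C (on optional I/O board) <Inactive>
10 ... Serial port B
Enter (0-14 tests, 15 -Quit, 16 -Menu) ===>
"""

if __name__ == "__main__":
    for number, (label, active) in sorted(parse_menu(MENU_EXCERPT).items()):
        print(number, label, "" if active else "(inactive)")
```

The prompt line is skipped because it does not begin with a menu number.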
Type the relevant menu numbers at the prompt to run some or all of the tests. If a test detects an error, an error message is displayed on the screen. For example, if an error is detected while testing the floppy disk drive, output similar to the following is displayed:
TEST= floppy_test
STATUS= FAILED
SUBTEST= floppy_id0_read_test
ERRORS= 1
TTF= 66
SPEED= 440 MHz
PASSES= 1
MESSAGE= Error: Recalibrate failed. floppy missing, improperly connected, or defective.
|
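Each line of the error report above is a KEY= value pair, which makes captured output easy to process. The following Python sketch splits a report into a dictionary. The parse_obdiag_report function is a hypothetical helper, and the sketch assumes that every field fits on one line, as in the example above.

```python
def parse_obdiag_report(text: str) -> dict:
    """Split each 'KEY= value' line of an OBDiag error report into a dict entry."""
    report = {}
    for line in text.splitlines():
        # Split on the first "=" only, so the value may contain any text.
        key, sep, value = line.partition("=")
        if sep and key.strip():
            report[key.strip()] = value.strip()
    return report


SAMPLE = """\
TEST= floppy_test
STATUS= FAILED
SUBTEST= floppy_id0_read_test
ERRORS= 1
TTF= 66
SPEED= 440 MHz
PASSES= 1
MESSAGE= Error: Recalibrate failed. floppy missing, improperly connected, or defective.
"""

if __name__ == "__main__":
    report = parse_obdiag_report(SAMPLE)
    if report.get("STATUS") == "FAILED":
        print(f"{report['TEST']} / {report['SUBTEST']}: {report['MESSAGE']}")
```

All values are kept as strings. A caller that needs the error count as a number can convert the ERRORS field itself.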
Some of the individual items on the OBDiag menu are described in further detail in the following paragraphs.
4.7.8.1 PCI/PCIO
The PCI/PCIO diagnostic performs the following:
- vendor_id_test: Verifies that the PCIO ASIC vendor ID is 108e.
- device_id_test: Verifies that the PCIO ASIC device ID is 1000.
- mixmode_read: Verifies that the PCI configuration space is accessible as half-word bytes by reading the EBus2 vendor ID address.
- e2_class_test: Verifies the address class code. Address class codes include bridge device (0xB, 0x6), other bridge device (0xA and 0x80), and programmable interface (0x9 and 0x0).
- status_reg_walk1: Performs a walk-one test on the status register with mask 0x280 (PCIO ASIC is accepting fast back-to-back transactions, DEVSEL timing is 0x1).
- line_size_walk1: Performs tests a through e.
- latency_walk1: Performs a walk-one test on the latency timer.
- line_walk1: Performs a walk-one test on the interrupt line.
- pin_test: Verifies that the interrupt pin is logic-level high (1) after reset.
CODE EXAMPLE 4-7 shows the PCI/PCIO output message.
CODE EXAMPLE 4-7 PCI/PCIO Output Message
Enter (0-14 tests, 15 -Quit, 16 -Menu) ===> 0
TEST= all_pci/PCIO_test
SUBTEST= vendor_id_test
SUBTEST= device_id_test
SUBTEST= mixmode_read
SUBTEST= e2_class_test
SUBTEST= status_reg_walk1
SUBTEST= line_size_walk1
SUBTEST= latency_walk1
SUBTEST= line_walk1
SUBTEST= pin_test
Enter (0-14 tests, 15 -Quit, 16 -Menu) ===>
|
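Several of the PCI/PCIO subtests are walk-one tests, in which a single 1 bit is moved through a register and read back at each position. The following Python sketch shows the general technique against a simulated register. It is not the OBDiag implementation. The FakeRegister class, the function names, and the 16-bit width are hypothetical, and the sketch assumes that the mask selects which bit positions are exercised and compared.

```python
def walking_ones(width: int, mask: int):
    """Yield a one-hot pattern for each bit position that the mask allows."""
    for bit in range(width):
        pattern = 1 << bit
        if pattern & mask:
            yield pattern


class FakeRegister:
    """Stand-in for a hardware register; only bits in `writable` can be set."""

    def __init__(self, writable: int):
        self.writable = writable
        self.value = 0

    def write(self, value: int) -> None:
        self.value = value & self.writable

    def read(self) -> int:
        return self.value


def walk1_test(reg: FakeRegister, width: int, mask: int) -> list:
    """Return the patterns that did not read back correctly (empty means pass)."""
    failures = []
    for pattern in walking_ones(width, mask):
        reg.write(pattern)
        if reg.read() & mask != pattern:
            failures.append(pattern)
    return failures


if __name__ == "__main__":
    good = FakeRegister(writable=0xFFFF)
    stuck = FakeRegister(writable=0xFDFF)  # bit 9 cannot be set
    print([hex(p) for p in walk1_test(good, 16, 0x280)])
    print([hex(p) for p in walk1_test(stuck, 16, 0x280)])
```

With mask 0x280, only bits 7 and 9 are exercised, so the simulated register with bit 9 stuck low reports a single failing pattern.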
4.7.8.2 EBus DMA/TCR Registers
The EBus DMA/TCR registers diagnostic performs the following:
- dma_reg_test: Performs a walking-ones bit test on the control status register, address register, and byte count register of each channel. Verifies that the control status register is set properly.
- dma_func_test: Validates the DMA capabilities and FIFOs. The test is executed in DMA diagnostic loopback mode. It initializes the data of the transmitting memory with its address, performs a DMA read and write, and verifies that the data received is correct. This is repeated for four channels.
CODE EXAMPLE 4-8 shows the EBus DMA/TCR registers output message.
CODE EXAMPLE 4-8 EBus DMA/TCR Registers Output Message
Enter (0-14 tests, 15 -Quit, 16 -Menu) ===> 1
TEST= all_dma/ebus_test
SUBTEST= dma_reg_test
SUBTEST= dma_func_test
Enter (0-14 tests, 15 -Quit, 16 -Menu) ===>
|
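The address-as-data check that dma_func_test performs can be illustrated with a simulated transfer. The following Python sketch is an illustration only and does not access any DMA hardware. The function names, the base addresses, the buffer size, and the 4-byte word size are hypothetical values chosen for the example.

```python
def address_pattern(base: int, words: int, word_size: int = 4) -> list:
    """Fill a buffer so that each word holds its own address."""
    return [base + word_size * i for i in range(words)]


def loopback(buffer: list) -> list:
    """Stand-in for a DMA loopback transfer: returns a copy of what was sent."""
    return list(buffer)


def dma_func_check(transfer=loopback, channels: int = 4, words: int = 8) -> dict:
    """Run the address-as-data check on each channel; True means data matched."""
    results = {}
    for channel in range(channels):
        base = 0x1000 * (channel + 1)  # hypothetical per-channel buffer address
        sent = address_pattern(base, words)
        results[channel] = transfer(sent) == sent
    return results


if __name__ == "__main__":
    print(dma_func_check())
    # Simulate a fault that corrupts the last word of every transfer.
    print(dma_func_check(transfer=lambda buf: buf[:-1] + [0]))
```

Because each word holds its own address, a mismatch also indicates where in the buffer the transfer went wrong.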
4.7.8.3 Ethernet
The Ethernet diagnostic performs the following:
- my_channel_reset resets the Ethernet channel.
- hme_reg_test performs a walk-one test on the following register set: global register 1, global register 2, bmac xif register, bmac tx register, and the mif register.
- mac_internal_loopback_test performs an Ethernet channel engine internal loopback.
- 10mb_xcvr_loopback_test enables the 10Base-T data present at the transmit MII data inputs to be routed back to the receive MII data outputs.
- 100mb_phy_loopback_test enables MII transmit data to be routed to the MII receive data path.
- 100mb_twister_loopback_test forces the twisted-pair transceiver into loopback mode.
CODE EXAMPLE 4-9 shows the Ethernet output message.
CODE EXAMPLE 4-9 Ethernet Output Message
Enter (0-14 tests, 15 -Quit, 16 -Menu) ===> 2
TEST= ethernet_test
SUBTEST= my_channel_reset
SUBTEST= hme_reg_test
SUBTEST= global_reg1_test
SUBTEST= global_reg2_test
SUBTEST= bmac_xif_reg_test
SUBTEST= bmac_tx_reg_test
SUBTEST= mif_reg_test
Test only supported for National Phy DP83840A
SUBTEST= 10mb_xcvr_loopback_test
selecting internal transceiver
Test only supported for National Phy DP83840A
SUBTEST= 100mb_phy_loopback_test
selecting internal transceiver
Test only supported for National Phy DP83840A
SUBTEST= 100mb_twister_loopback_test
selecting internal transceiver
Test only supported for National Phy DP83840A
Enter (0-14 tests, 15 -Quit, 16 -Menu) ===>
|
4.7.8.4 Parallel Port
The parallel port diagnostic performs the dma_read subtest, which enables ECP mode, ECP DMA configuration, and FIFO test mode. The subtest transfers 16 bytes of data from memory to the parallel port device and then verifies that the data is in the TFIFO. CODE EXAMPLE 4-10 shows the parallel port output message.
CODE EXAMPLE 4-10 Parallel Port Output Message
Enter (0-14 tests, 15 -Quit, 16 -Menu) ===> 4
TEST= parallel_port_test
SUBTEST= dma_read
Enter (0-14 tests, 15 -Quit, 16 -Menu) ===>
|
4.7.8.5 Serial Port A
The serial port A diagnostic invokes the uart_loopback test, which transmits and receives 128 characters and checks the validity of the transaction. CODE EXAMPLE 4-11 shows the serial port A output message.
CODE EXAMPLE 4-11 Serial Port A Output Message
Enter (0-14 tests, 15 -Quit, 16 -Menu) ===> 9
TEST= uarta_test
Enter (0-14 tests, 15 -Quit, 16 -Menu) ===>
|
Note - The serial port A diagnostic will stall if the TIP line is installed on serial port A. CODE EXAMPLE 4-12 shows the serial port A output message when the TIP line is installed on serial port A.
|
CODE EXAMPLE 4-12 Serial Port A Output Message with TIP Line Installed
Enter (0-14 tests, 15 -Quit, 16 -Menu) ===> 9
TEST= uarta_test
UART A in use as console - Test not run.
Enter (0-14 tests, 15 -Quit, 16 -Menu) ===>
|
4.7.8.6 Serial Port B
The serial port B diagnostic is identical to the serial port A diagnostic. CODE EXAMPLE 4-13 shows the serial port B output message.
Note - The serial port B diagnostic will stall if the TIP line is installed on serial port B.
|
CODE EXAMPLE 4-13 Serial Port B Output Message
Enter (0-14 tests, 15 -Quit, 16 -Menu) ===> 10
TEST= uartb_test
Enter (0-14 tests, 15 -Quit, 16 -Menu) ===>
|
4.7.8.7 NVRAM
The NVRAM diagnostic verifies NVRAM operation by writing to the NVRAM and reading the data back. CODE EXAMPLE 4-14 shows the NVRAM output message.
CODE EXAMPLE 4-14 NVRAM Output Message
Enter (0-14 tests, 15 -Quit, 16 -Menu) ===> 7
TEST= nvram_test
SUBTEST= write/read_patterns
SUBTEST= write/read_inverted_patterns
Enter (0-14 tests, 15 -Quit, 16 -Menu) ===>
|
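The two NVRAM subtests, write/read_patterns and write/read_inverted_patterns, follow a common memory-test technique. The following Python sketch shows that technique on a bytearray that stands in for the NVRAM. The manual does not state which patterns OBDiag writes or whether the original contents are restored. The pattern values, the restore step, and the write_read_test name are assumptions made for this example.

```python
# Hypothetical test patterns; the manual does not list the real ones.
PATTERNS = (0x00, 0x5A, 0x0F)


def write_read_test(memory: bytearray, patterns=PATTERNS, invert: bool = False) -> bool:
    """Write each pattern to every byte, read it back, then restore the contents."""
    saved = bytes(memory)  # assumption: the test preserves the original data
    ok = True
    try:
        for pattern in patterns:
            value = (~pattern & 0xFF) if invert else pattern
            for addr in range(len(memory)):
                memory[addr] = value
            ok = ok and all(byte == value for byte in memory)
    finally:
        memory[:] = saved
    return ok


if __name__ == "__main__":
    nvram = bytearray(b"netra")  # stand-in for real NVRAM contents
    print("patterns:", write_read_test(nvram))
    print("inverted patterns:", write_read_test(nvram, invert=True))
    print("contents preserved:", bytes(nvram) == b"netra")
```

Running the same patterns inverted drives every bit to both 0 and 1, which a single pass cannot do.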
4.7.8.8 All Above
The All Above diagnostic validates the system unit by running the preceding tests on the OBDiag menu in sequence. CODE EXAMPLE 4-15 shows an example of the All Above option output message.
CODE EXAMPLE 4-15 All Above Output Message
Enter (0-14 tests, 15 -Quit, 16 -Menu) ===> 14
TEST= all_pci/cheerio_test
SUBTEST= vendor_id_test
SUBTEST= device_id_test
...
SUBTEST= bmac_xif_reg_test
SUBTEST= bmac_tx_reg_test
SUBTEST= mif_reg_test
SUBTEST= mac_internal_loopback_test
selecting internal transceiver
Test only supported for National Phy DP83840A
...
SUBTEST= 100mb_twister_loopback_test
selecting internal transceiver
Test only supported for National Phy DP83840A
TEST= ethernet2_test
TEST= parallel_port_test
SUBTEST= dma_read
TEST= uarta_test
...
SUBTEST= write/read_patterns
...
ttya in use as console - Test not run.
TEST= usi_test
ttyb in use as console - Test not run.
TEST= ras_test env-monitor = disabled
SUBTEST= obd-init-i2c-test
...
TEST= flash_test
SUBTEST= flash-supported?
TEST= flash_test
SUBTEST= flash-supported?
Enter (0-14 tests, 15 -Quit, 16 -Menu) ===>
|
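Because the All Above output is long, it can help to group the SUBTEST lines under the TEST line that precedes them. The following Python sketch does this for captured output. The group_subtests function is a hypothetical helper. It assumes the TEST= and SUBTEST= layout shown in the code examples in this section and ignores all other lines.

```python
def group_subtests(output: str) -> dict:
    """Map each TEST name to the list of SUBTEST names that follow it."""
    groups = {}
    current = None
    for raw in output.splitlines():
        line = raw.strip()
        if line.startswith("SUBTEST="):
            parts = line[len("SUBTEST="):].split()
            if current is not None and parts:
                groups[current].append(parts[0])
        elif line.startswith("TEST="):
            # Keep only the first word, so trailing status text is dropped.
            parts = line[len("TEST="):].split()
            if parts:
                current = parts[0]
                groups.setdefault(current, [])
    return groups


SAMPLE = """\
TEST= ethernet2_test
TEST= parallel_port_test
SUBTEST= dma_read
TEST= ras_test env-monitor = disabled
SUBTEST= obd-init-i2c-test
"""

if __name__ == "__main__":
    for test, subtests in group_subtests(SAMPLE).items():
        print(test, subtests or "(no subtests reported)")
```

A test name that appears more than once, such as flash_test in CODE EXAMPLE 4-15, is merged into a single entry by this sketch.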
816-2482-11
|
|
Copyright © 2003, Sun Microsystems, Inc. All rights reserved.