Netra CT Server Service Manual
Troubleshooting the System
This chapter provides instructions for troubleshooting the Netra CT server. You can troubleshoot the system in several ways, each of which is described in its own section of this chapter.
In addition, Appendix C lists the error messages that might appear when you are operating or servicing a Netra CT server.
4.1 Troubleshooting the System Using the System Status Panel
You can use the system status panel to troubleshoot the Netra CT server.
4.1.1 Locating and Understanding the System Status Panel
The system status panel on the Netra CT server gives the majority of the troubleshooting information that you need for your server. FIGURE 4-1 shows the locations of the system status panels on the Netra CT servers. FIGURE 4-2 shows the system status panel for the Netra CT 810 server, and FIGURE 4-3 shows the system status panel for the Netra CT 410 server.
FIGURE 4-1 System Status Panel Locations
FIGURE 4-2 System Status Panel (Netra CT 810 Server)
FIGURE 4-3 System Status Panel (Netra CT 410 Server)
4.1.2 Using the System Status Panel LEDs to Troubleshoot the System
When you first power on the Netra CT server, some or all of the green Power LEDs on the system status panel flash on and off for several seconds. Do not attempt to troubleshoot the system until after the LEDs have gone through their initial power-on testing.
Each major component in the Netra CT 810 server and Netra CT 410 server has a set of LEDs on the system status panel that gives the status of that component. Each component has either green Power and amber Okay to Remove LEDs (FIGURE 4-4) or green Power and amber Fault LEDs (FIGURE 4-5).
FIGURE 4-4 Power and Okay to Remove LEDs
FIGURE 4-5 Power and Fault LEDs
TABLE 4-1 lists the LEDs for each component in the Netra CT 810 server, and TABLE 4-2 lists the LEDs for each component in the Netra CT 410 server. Note that the boards in the Netra CT servers all have the green Power LED, and they have either the amber Okay to Remove LED or the amber Fault LED, not both.
TABLE 4-1 System Status Panel LEDs for the Netra CT 810 Server
LED | LEDs Available | Component
HDD 0 | Power and Okay to Remove | Upper hard drive
HDD 1 | Power and Okay to Remove | Lower hard drive
Slot 1 | Power and Okay to Remove | Host CPU board installed in slot 1
Slots 2 - 7 | Power and Okay to Remove | I/O board or satellite CPU board (●) installed in slots 2 - 7
Slot 8 | Power and Okay to Remove | Alarm card (■) installed in slot 8
SCB | Power and Fault | System controller board (behind the system status panel)
FAN 1 | Power and Fault | Upper fan tray (behind the system status panel)
FAN 2 | Power and Fault | Lower fan tray (behind the system status panel)
RMM | Power and Okay to Remove | Removable media module
PDU 1 (DC only) | Power and Fault | Leftmost power distribution unit (behind the server)
PDU 2 (DC only) | Power and Fault | Rightmost power distribution unit (behind the server)
PSU 1 | Power and Okay to Remove | Leftmost power supply unit
PSU 2 | Power and Okay to Remove | Rightmost power supply unit
TABLE 4-2 System Status Panel LEDs for the Netra CT 410 Server
LED | LEDs Available | Component
Slot 1 | Power and Okay to Remove | Alarm card (■) installed in slot 1
Slot 2 | Power and Okay to Remove | I/O board or satellite CPU board (●) installed in slot 2
Slot 3 | Power and Okay to Remove | Host CPU board installed in slot 3
Slots 4 and 5 | Power and Okay to Remove | I/O boards or satellite CPU boards (●) installed in slots 4 and 5
HDD 0 | Power and Okay to Remove | Hard drive
SCB | Power and Fault | System controller board (behind the system status panel)
FAN 1 | Power and Fault | Upper fan tray (behind the system status panel)
FAN 2 | Power and Fault | Lower fan tray (behind the system status panel)
PDU 1 (DC only) | Power and Fault | Power distribution unit (behind the server)
PSU 1 | Power and Okay to Remove | Power supply
- TABLE 4-3 lists the LED states and meanings for any CompactPCI board installed in a slot in the Netra CT 810 server or Netra CT 410 server.
- TABLE 4-4 lists the LED states and meanings for any component other than a CompactPCI board that has the green Power and amber Okay to Remove LEDs.
- TABLE 4-5 lists the LED states and meanings for any component other than a CompactPCI board that has the green Power and amber Fault LEDs.
Note - Do not use the information in TABLE 4-4 to troubleshoot a power supply unit in a server that has only one power supply unit (a Netra CT 410 server or a Netra CT 810 server with only one power supply). To troubleshoot the power supply in a single power supply system, use the LEDs on the power supply itself. See Section 4.6, Troubleshooting a Power Supply Using the Power Supply Unit LEDs for more information. The information given in TABLE 4-4 applies to all other components in the Netra CT 810 server or Netra CT 410 server, including the power supplies in a two-power supply Netra CT 810 server.
|
TABLE 4-3 CompactPCI Board LED States and Meanings
Green Power LED state
|
Amber Okay to Remove LED state
|
Meaning
|
Action
|
Off
|
Off
|
The slot is empty, or the system treats the slot as empty because it did not detect the component when it was inserted.
|
If there is a component installed in this slot, then one of the following boards is faulty:
- the board installed in the slot
- the alarm card
- the system controller board
Remove and replace the failed board to clear this state.
|
Blinking
|
Off
|
The component is coming up or going down.
|
Do not remove the component in this state.
|
On
|
Off
|
The component is up and running.
|
Do not remove the component in this state.
|
Off
|
On
|
The component is powered off.
|
You can remove the component in this state.
|
Blinking or Off
|
On
|
The component is powered on, but it is offline for some reason (for example, a fault was detected on the component).
|
Wait until the green Power LED stops blinking. If it does not stop blinking after several seconds, enter cfgadm and verify that the component is in the unconfigured state, then perform the necessary action, depending on the component:
- Alarm card--When the green Power LED is OFF and the amber Okay to Remove LED is ON, you can remove the alarm card.
- All other boards--Power off the slot through the alarm card software, then remove the component.
|
On
|
On
|
The component is powered on and is in use, but a fault has been detected on the component.
|
Deactivate the component using one of the following methods:
- Use the cfgadm -f -c unconfigure command to deactivate the component. Note that in some cases, this may cause the system to panic, depending on the nature of the component's hardware or software.
- Halt the system and power off the slot through the alarm card software, then remove the component.
The green Power LED will then give status information:
- If the green Power LED goes off, then you can remove the component.
- If the green Power LED remains on, then you must halt the system and power off the slot through the alarm card software.
|
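The LED combinations in TABLE 4-3 can be read as a simple lookup. The following Python sketch is only an illustration of that reading, not part of any Sun tool: the function name and the state strings ("on", "off", "blinking") are hypothetical, and the meanings and actions are abbreviated from the table. TABLE 4-3 lists a green LED that is off with an amber LED that is on under two rows; the sketch assumes the "powered off" row for that case and uses the "offline" row only for a blinking green LED.

```python
# Abbreviated encoding of TABLE 4-3, keyed by
# (green Power LED state, amber Okay to Remove LED state).
# State names are illustrative assumptions, not output of a real command.
CPCI_LED_STATES = {
    ("off", "off"): ("Slot is empty, or the board was not detected.",
                     "If a board is installed, replace the failed board."),
    ("blinking", "off"): ("Board is coming up or going down.",
                          "Do not remove."),
    ("on", "off"): ("Board is up and running.",
                    "Do not remove."),
    # Assumption: Off/On is treated as the "powered off" row.
    ("off", "on"): ("Board is powered off.",
                    "Safe to remove."),
    ("blinking", "on"): ("Board is powered on but offline.",
                         "Wait for blinking to stop, then check cfgadm."),
    ("on", "on"): ("Board is in use, but a fault was detected.",
                   "Deactivate the board before removing it."),
}


def interpret_cpci_leds(green, amber):
    """Return (meaning, action) for a CompactPCI slot's LED pair."""
    key = (green.strip().lower(), amber.strip().lower())
    if key not in CPCI_LED_STATES:
        raise ValueError("LED combination not listed in TABLE 4-3: %r" % (key,))
    return CPCI_LED_STATES[key]


if __name__ == "__main__":
    meaning, action = interpret_cpci_leds("On", "On")
    print(meaning)
    print(action)
```

The full actions in TABLE 4-3 remain the reference; the abbreviated strings here exist only to keep the sketch short.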
TABLE 4-4 Meanings of Power and Okay to Remove LEDs
LED State | Power LED | Okay to Remove LED
On, Solid | Component is installed and configured. | Component is Okay to Remove. You can remove the component from the system, if necessary.
On, Flashing | Component is installed but is unconfigured or is going through the configuration process. | Not applicable.
Off | Component was not recognized by the system or is not installed in the slot. | Component is not Okay to Remove. Do not remove the component while the system is running.
TABLE 4-5 Meanings of Power and Fault LEDs
LED State | Power LED | Fault LED
On, Solid | Component is installed and configured. | Component has failed. Replace the component.
On, Flashing | Component is installed but is unconfigured or is going through the configuration process. | Not applicable.
Off | Component was not recognized by the system or is not installed in the slot. | Component is functioning properly.
4.2 Troubleshooting the System Using prtdiag
You can troubleshoot the system using the prtdiag command. Log in to the server console and, as root, enter:
# /usr/platform/sun4u/sbin/prtdiag
|
If you have a Netra CT 810 server, the output on the console is similar to the following:
CODE EXAMPLE 4-1 prtdiag Output for a Netra CT 810 Server
System Configuration: Sun Microsystems sun4u SPARCengine CP2000 model 140
(UltraSPARC-IIi 648MHz)
Memory size: 512 Megabytes
platform is : SUNW,NetraCT-810
=============================== FRU Information ===============================
FRU FRU FRU Green Amber Miscellaneous
Type Unit# Present LED LED Information
---------- ----- ------- ----- ----- --------------------------
Midplane 1 Yes Netra ct800
Properties:
Version=0
Maximum Slots=8
SCB 1 Yes on off System Controller Board
Properties:
Version=2
hotswap-mode=basic
SSB 1 Yes System Status Panel
CPU 1 Yes on off CPU board
temperature(celsius):38
I/O 2 Yes on off CompactPCI IO Slot
Properties:
auto-config=disabled
Board Type:Unknown
Devices:
pci
pci108e,1000
SUNW,hme
SUNW,isptwo
I/O 3 Yes on off CompactPCI IO Slot
Properties:
auto-config=disabled
Board Type:Unknown
Devices:
pci
pci108e,1000
SUNW,hme
SUNW,isptwo
I/O 4 Yes on off CompactPCI IO Slot
Properties:
auto-config=disabled
Board Type:Unknown
Devices:
pci
pci108e,1000
SUNW,hme
SUNW,isptwo
I/O 5 Yes on off CompactPCI IO Slot
Properties:
auto-config=disabled
Board Type:Unknown
Devices:
pci
pci108e,1000
SUNW,hme
SUNW,isptwo
I/O 6 Yes on off CompactPCI IO Slot
Properties:
auto-config=disabled
I/O 7 Yes on off CompactPCI IO Slot
Properties:
auto-config=disabled
Board Type:Unknown
Devices:
pci
pci108e,1000
SUNW,qfe
pci108e,1000
SUNW,qfe
pci108e,1000
SUNW,qfe
pci108e,1000
SUNW,qfe
pci1176,608
I/O 8 Yes on off CompactPCI IO Slot
Properties:
auto-config=disabled
Board Type:Alarm Card
Devices:
pci
ebus
ethernet
PDU 1 Yes on off Power Distribution Unit
PDU 2 Yes on off Power Distribution Unit
PSU 1 Yes on on Power Supply Unit
condition:ok
temperature:ok
ps fan:ok
supply:on
PSU 2 Yes on on Power Supply Unit
condition:ok
temperature:ok
ps fan:ok
supply:on
FAN 1 Yes on off Fan Tray
condition:ok
fan speed:low
FAN 2 Yes on off Fan Tray
condition:ok
fan speed:low
HDD 0 Yes on off hard drive
condition:ok
HDD 1 Yes on off hard drive
condition:ok
RMM Yes on on Removable Media Module
condition:Unknown
System Board PROM revision:
---------------------------
OBP 3.14.1 2000/04/28 12:56
|
If you have a Netra CT 410 server, the output on the console is similar to the following:
CODE EXAMPLE 4-2 prtdiag Output for a Netra CT 410 Server
System Configuration: Sun Microsystems sun4u SPARCengine CP2000 model 140
(UltraSPARC-IIi 648MHz)
Memory size: 512 Megabytes
platform is : SUNW,NetraCT-410
=============================== FRU Information ===============================
FRU FRU FRU Green Amber Miscellaneous
Type Unit# Present LED LED Information
---------- ----- ------- ----- ----- --------------------------
Midplane 1 Yes Netra ct400
Properties:
Version=0
Maximum Slots=5
SCB 1 Yes on off System Controller Board
Properties:
Version=2
hotswap-mode=basic
SSB 1 Yes System Status Panel
I/O 1 Yes on off CompactPCI IO Slot
Properties:
auto-config=disabled
Board Type:Alarm Card
Devices:
pci
ebus
ethernet
I/O 2 Yes off off CompactPCI IO Slot
Properties:
auto-config=disabled
CPU 3 Yes on off CPU board
temperature(celsius):38
I/O 4 Yes on off CompactPCI IO Slot
Properties:
auto-config=disabled
Board Type:Unknown
Devices:
pci
pci108e,1000
SUNW,hme
SUNW,isptwo
I/O 5 Yes on off CompactPCI IO Slot
Properties:
auto-config=disabled
Board Type:Unknown
Devices:
pci
pci108e,1000
SUNW,qfe
pci108e,1000
SUNW,qfe
pci108e,1000
SUNW,qfe
pci108e,1000
SUNW,qfe
PDU 1 Yes on off Power Distribution Unit
PSU 1 Yes on off Power Supply Unit
condition:ok
temperature:ok
ps fan:ok
supply:on
FAN 1 Yes on off Fan Tray
condition:ok
fan speed:low
FAN 2 Yes on off Fan Tray
condition:ok
fan speed:low
HDD 0 Yes on off hard drive
condition:ok
System Board PROM revision:
---------------------------
OBP 3.14.1 2000/04/28 12:56
|
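When the prtdiag output is long, it can help to reduce the FRU Information section to the rows whose amber LED is on. The following Python sketch shows one way to do that under stated assumptions: it expects captured output whose FRU rows look like the ones in CODE EXAMPLE 4-1 and CODE EXAMPLE 4-2, and it assumes the LED columns contain only the words on, off, or blink (only on and off appear in the examples). The function names are hypothetical.

```python
import re

# A FRU summary row: type, optional unit number, presence, optional
# green/amber LED states, then a free-form description.
# Property and device lines do not contain "Yes"/"No" in the third
# position, so they are skipped.
FRU_ROW = re.compile(
    r"^(?P<fru>[A-Za-z/]+)\s+"
    r"(?:(?P<unit>\d+)\s+)?"
    r"(?P<present>Yes|No)\s+"
    r"(?:(?P<green>on|off|blink)\s+(?P<amber>on|off|blink)\s+)?"
    r"(?P<info>.+)$"
)


def parse_fru_rows(prtdiag_text):
    """Return one dict per FRU summary row found in the text."""
    rows = []
    for line in prtdiag_text.splitlines():
        match = FRU_ROW.match(line.strip())
        if match:
            rows.append(match.groupdict())
    return rows


def amber_lit(rows):
    """Return labels such as 'PSU 1' for rows whose amber LED is on."""
    return [
        f"{r['fru']} {r['unit'] or ''}".strip()
        for r in rows
        if r["amber"] == "on"
    ]


if __name__ == "__main__":
    sample = (
        "SCB 1 Yes on off System Controller Board\n"
        "Properties:\n"
        "PSU 1 Yes on on Power Supply Unit\n"
        "condition:ok\n"
        "RMM Yes on on Removable Media Module\n"
    )
    print(amber_lit(parse_fru_rows(sample)))
```

An amber LED that is on does not by itself indicate a failure. For components with an Okay to Remove LED (see TABLE 4-1 and TABLE 4-2), it means that the component can be removed, so the result should be read against those tables.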
4.3 Troubleshooting the System Using Diagnostic Software
Software packages, such as SunVTS, allow you to run diagnostic tests on your system. SunVTS is a validation test suite that is provided as a supplement to the Solaris operating environment. The individual tests can stress a device, system, or resource so as to detect and pinpoint hardware and software failures and to provide users with informational messages for resolving problems. SunVTS runs at the operating system level.
The following tests are useful when troubleshooting a Netra CT server:
- alarm2test--A SunVTS test that checks the alarm card installed in the Netra CT server by invoking the alarmdiag test on the alarm card. alarm2test runs at the operating system level.
- obdiag--Similar to alarm2test in that it invokes the alarmdiag test on the alarm card; however, obdiag is run from the firmware level, not the operating system level.
- Apost--Part of the embedded firmware image on the alarm card. It runs a basic test on the alarm card to verify that the alarm card is operating properly before bringing up Chorus on the alarm card.
A new utility called diagconf, which is part of the embedded firmware image on the alarm card, is now available. You can use diagconf to set or display the configuration settings for Apost, allowing you to make the tests run on the alarm card more or less thoroughly before the embedded firmware is brought up on the alarm card.
To display the values currently set for Apost, access the alarm card command line interface (CLI) and enter the following command:
hostname cli> diagconf -d
|
Output similar to the following is displayed, giving you the values currently set for the Apost test on the alarm card:
diag-switch False
verb-mode True
stop-on-error False
diag-level Max
mfg-mode Off
hdr-checksum 0xaa
time-stamp 0
record-format-ver 49
post-version 02
reset-status 0xd0000000
post-status ...
post-msg Watchdog Reset-------- POST Passed-------------------
|
Some values are hard-set and cannot be changed by a user, while others can be changed to make the test more or less thorough. To change the value for a test, enter the following command:
hostname cli> diagconf -s command value
|
where command is the name of the setting that you want to change, and value is the new value to assign to it.
TABLE 4-6 lists the Apost tests that can be changed by a user and the allowable values for each. Any tests not listed in TABLE 4-6 are either hard-set and cannot be changed, or should not be changed by a user.
TABLE 4-6 Apost Tests and Values through diagconf
Command
|
Value
|
diag-switch
|
- True--Turns the diag-switch test on.
- False--Turns the diag-switch test off.
|
verb-mode
|
- True--Turns the verb-mode test on.
- False--Turns the verb-mode test off.
|
stop-on-error
|
- True--Stops the Apost testing when the first error is encountered.
- False--Continues Apost testing, regardless of the number of errors encountered.
|
diag-level
|
- Off--Turns the diag-level test off.
- Min--Sets the diag-level test to the minimum level of testing.
- Max--Sets the diag-level test to the maximum level of testing.
|
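The allowable values in TABLE 4-6 can be checked before a diagconf -s command is typed at the alarm card CLI. The following Python sketch is an illustration only: the function names are hypothetical, the values are assumed to be spelled exactly as they appear in TABLE 4-6, and the parser assumes that each line of diagconf -d output is a setting name followed by its value, as in the sample output above.

```python
# User-changeable Apost settings and their allowable values (TABLE 4-6).
# Assumption: values are spelled exactly as printed in the table.
ALLOWED_VALUES = {
    "diag-switch": {"True", "False"},
    "verb-mode": {"True", "False"},
    "stop-on-error": {"True", "False"},
    "diag-level": {"Off", "Min", "Max"},
}


def parse_diagconf(text):
    """Turn 'name value' lines from diagconf -d into a dictionary."""
    settings = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2:
            settings[parts[0]] = parts[1].strip()
    return settings


def build_set_command(name, value):
    """Return the diagconf -s command line, or raise if not allowed."""
    if name not in ALLOWED_VALUES:
        raise ValueError("%s is not listed in TABLE 4-6" % name)
    if value not in ALLOWED_VALUES[name]:
        raise ValueError("%s is not an allowable value for %s" % (value, name))
    return "diagconf -s %s %s" % (name, value)


if __name__ == "__main__":
    current = parse_diagconf("diag-switch False\ndiag-level Max\n")
    print(current["diag-level"])
    print(build_set_command("diag-level", "Min"))
```

The sketch only builds the command text; it does not contact the alarm card.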
For more information on these and other tests in the SunVTS test suite, refer to the Computer Systems Release Notes Supplement for Sun Hardware document or the SunVTS documentation on the Solaris on Sun Hardware Answerbook, both included with your Solaris operating environment.
4.4 Troubleshooting the System Using the Power-On Self Test (POST)
When you first power up the Netra CT server, some or all of the green Power LEDs on the system status panel flash on and off for several seconds. The green Power LED for the I/O slot holding the host board (slot 1 in the Netra CT 810 server and slot 3 in the Netra CT 410 server) lights solid green while the green Power LEDs for the remaining components are flashing on and off; this status is an indication that the CPU board passed the power-on self test (POST).
Before any processing occurs on a system, the system must successfully complete the POST. Messages are displayed for each step in the POST process. If there is a critical failure, the system does not complete POST and does not boot. To monitor this process, you must be connected to the TTY A port on the CPU board or rear transition module. See Section 5.2.1, Logging In to the Netra CT Server.
OpenBoot PROM (OBP) variables control the console port. The variables and their possible settings are described as follows.
To see the console output device, enter:
ok printenv output-device
|
The screen displays the current setting of the variable.
The possible settings for this variable are as follows:
- ttya (default)
- ttyb
- screen
- rsc
Both ttya and ttyb represent the serial ports on the CPU board. screen represents the display attached to the first frame buffer installed in the system (not present on the Netra CT server). rsc is used by the alarm card.
To see the console input device, enter:
ok printenv input-device
|
The screen displays the current setting of the variable.
The possible settings for this variable are:
- ttya (default)
- ttyb
- keyboard
- rsc
ttya and ttyb represent the serial ports on the CPU board. keyboard represents the standard system keyboard (not present on the Netra CT server). rsc is used by the alarm card. If no system keyboard is connected, the console port defaults to ttya.
Note - Be sure the two variables are consistent with each other. For example, do not set the output-device to screen and the input-device to ttya.
|
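The consistency rule in the preceding note can be expressed as a small check. The following Python sketch is an illustration only. The pairing it uses (ttya with ttya, ttyb with ttyb, screen with keyboard, rsc with rsc) is an assumption inferred from the descriptions of the two variables above; the manual itself gives only one example of an inconsistent pair. The function name is hypothetical.

```python
# Assumed matching input-device for each output-device setting.
# Only the screen/ttya mismatch is stated explicitly in the manual.
EXPECTED_INPUT = {
    "ttya": "ttya",
    "ttyb": "ttyb",
    "screen": "keyboard",
    "rsc": "rsc",
}


def console_is_consistent(output_device, input_device):
    """Return True if the two OBP console variables form a matching pair."""
    expected = EXPECTED_INPUT.get(output_device)
    return expected is not None and expected == input_device


if __name__ == "__main__":
    print(console_is_consistent("ttya", "ttya"))
    print(console_is_consistent("screen", "ttya"))
```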
Another OBP variable, diag-level, controls the behavior of the POST process. By default, this variable is set to max, which means POST runs more thorough (verbose) tests against the hardware. This variable can be set to min, which runs a less stringent set of tests against the hardware. A minimum level of POST testing takes less time, so the Solaris operating environment can boot more quickly on a machine with diag-level set to min.
To run the maximum number of POST tests, enter:
ok setenv diag-level max
|
To run the minimum number of POST tests, enter:
ok setenv diag-level min
|
4.5 Troubleshooting the System Using the Alarm Card Software
For information on troubleshooting using the alarm card software, refer to the Netra CT Server System Administration Guide (819-2743-xx).
4.6 Troubleshooting a Power Supply Using the Power Supply Unit LEDs
Two LEDs are on each power supply unit: a green LED and an amber LED. Use the LEDs on the power supply unit to troubleshoot each power supply unit. Because there is one power supply unit in the Netra CT 410 server and two power supply units in the Netra CT 810 server, the actions to take are different. The following sections provide guidelines for each server.
4.6.1 Troubleshooting the Power Supply Unit in the Netra CT 410 Server
Following are the states of the LEDs on the power supply unit in the Netra CT 410 server:
- Green, flashing--The power supply unit is in the standby mode; the power supply unit is powered on, yet it is not supplying power to the server.
- Green, solid--Both the server and the power supply unit are powered on and functioning properly.
- Amber--A fault was found in the power supply unit. Replace the power supply unit. See Section 10.2, Cold-Swappable Power Supply Unit for those instructions.
- Off--One of the following conditions applies:
  - The power supply locking mechanism is in the upper, unlocked position.
  - The accompanying cable is disconnected from the power distribution unit.
  - The accompanying power distribution unit failed.
  - The power supply unit failed.
4.6.2 Troubleshooting the Power Supply Units in the Netra CT 810 Server
When both power supply units in a Netra CT 810 server are up and running properly, the green LEDs on both power supply units are ON (note that these are the LEDs on the power supply units themselves, not the LEDs on the system status panel).
If a power supply unit fails, the amber LED on the power supply unit might light, depending on the type of failure that occurs:
- If a soft-fault occurs, such as a stuck fan or a temperature warning, a notification of the error is given; however, the amber LED on the power supply unit does not light for a soft-fault condition. The power supply unit is still supplying power to the system during a soft-fault condition.
- If a hard-fault occurs, such as a voltage problem, a notification of the error is given. In addition, the amber LED on the power supply unit does light for a hard-fault condition. The power supply unit does not supply power to the system during a hard-fault condition.
If one power supply unit fails (either a soft-fault or a hard-fault), but the other power supply unit is still functioning normally, replace the faulty power supply unit as soon as possible. If both power supply units fail, the action to take varies depending on which of the two types of faults occurred:
If | Then
Both power supply units go through a soft-fault | Replace one power supply unit at a time, in order to keep the system up and running.
One power supply unit goes through a soft-fault and the other power supply unit goes through a hard-fault | Replace the power supply unit that has a hard-fault first, in order to keep the system up and running.
Both power supply units go through a hard-fault | The system is down. Replace at least one of the power supply units to bring the system back up again.
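The replacement guidance for a two-power-supply Netra CT 810 server can be summarized as a decision function. The following Python sketch is an illustration only; the function name and the state labels "ok", "soft", and "hard" are hypothetical shorthand for a working unit, a soft-fault, and a hard-fault.

```python
VALID_STATES = {"ok", "soft", "hard"}


def psu_action(psu1, psu2):
    """Return the recommended action for the two PSU states."""
    for state in (psu1, psu2):
        if state not in VALID_STATES:
            raise ValueError("unknown power supply state: %r" % (state,))

    # Sorting makes the result independent of which unit has which fault.
    faults = sorted(s for s in (psu1, psu2) if s != "ok")

    if not faults:
        return "No action needed."
    if len(faults) == 1:
        return "Replace the faulty power supply unit as soon as possible."
    if faults == ["soft", "soft"]:
        return "Replace one power supply unit at a time."
    if faults == ["hard", "soft"]:
        return "Replace the power supply unit that has a hard-fault first."
    # Remaining case: both units have a hard-fault.
    return "The system is down. Replace at least one power supply unit."


if __name__ == "__main__":
    print(psu_action("soft", "hard"))
```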
4.7 Troubleshooting a Host CPU Board
This section describes how to troubleshoot problems related to the host board. The information provided here primarily covers situations when the system containing the host board does not boot up or when the board is not fully functional after boot. Only general troubleshooting tips are provided here. No component-level troubleshooting information is included in this section.
The following topics are covered:
- General troubleshooting tips
- General troubleshooting requirements
- Mechanical failures
- Power-on failures
- Failures subsequent to power-on
- Troubleshooting during POST/OBP and during boot process
Also, the following diagnostic procedures are described:
- OpenBoot PROM on-board diagnostics
- OpenBoot diagnostics
4.7.1 General Troubleshooting Tips
|
Caution - High voltages are present in the Netra CT server. To avoid physical injury, follow all the safety rules specified in the Netra CT Server Safety and Compliance Manual when opening the enclosure and/or removing and installing a board.
|
The following general troubleshooting tips are useful in isolating issues related to a host board:
1. Make sure the host board is installed properly in the correct slot in the Netra CT server.
The CPU board is installed in slot 1 in the Netra CT 810 server and in slot 3 in the Netra CT 410 server.
2. Make sure all the necessary cables are attached properly to the host rear transition module.
The following are possible board and rear transition module combinations:
- Netra CP2140 and Netra CT CPU transition card (CTC, hereafter referred to as rear transition module)
- Netra CP2500 and Netra CP2500 RTM-H (rear transition module for host)
FIGURE 4-6 shows the connectors on the CTC. FIGURE 4-7 shows connectors on the RTM-H.
FIGURE 4-6 Connectors on the Netra CP2140 Host Rear Transition Module
FIGURE 4-7 Connectors on the Netra CP2500 Rear Transition Module Host (RTM-H) Board
4.7.2 Warning, Critical, and Shutdown Temperatures
The following temperatures apply to the Netra CP2500 board:
- Warning: 221°F (105°C)
- Critical: 230°F (110°C)
- Shutdown: 239°F (115°C)
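The thresholds above can be applied to a temperature reading, such as the temperature(celsius) value that prtdiag reports for the CPU board. The following Python sketch is an illustration only. It assumes that each threshold takes effect at or above the listed value, and the function names and the label "normal" for readings below the warning threshold are hypothetical.

```python
# Netra CP2500 thresholds in degrees Celsius, highest first.
THRESHOLDS_C = [
    (115, "shutdown"),
    (110, "critical"),
    (105, "warning"),
]


def classify_cp2500_temperature(celsius):
    """Return the highest threshold reached, or 'normal' if none is."""
    for limit, label in THRESHOLDS_C:
        # Assumption: a reading equal to the limit counts as reached.
        if celsius >= limit:
            return label
    return "normal"


def celsius_to_fahrenheit(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


if __name__ == "__main__":
    for reading in (38, 105, 112, 115):
        print(reading, classify_cp2500_temperature(reading))
```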
4.7.3 General Troubleshooting Requirements
The following devices are required to take some of the recommended actions in this section:
- Network interface
- TTYA and TTYB connection or an ASCII terminal connection to serial port
- Parallel port interface
- Loopback connectors
4.7.4 Mechanical Failures
Symptom
Unable to insert the CPU board into the backplane.
Action
1. Verify that no mechanical or physical obstructions exist in the slot where the CPU board is going to be installed.
2. Make sure no pins on the board connectors or the CompactPCI backplane connectors are bent or damaged.
3. Verify that the front panel screws are seated and not preventing the board from seating properly.
4.7.5 Power-On Failures
This section provides examples of power-on failure symptoms and suggests actions.
Make sure the CPU board is installed properly.
- If the CPU board is a host Netra CP2140 or satellite Netra CP2160 and both the Ready and Alarm LEDs on the CPU board are green, the board is partially functional and capable of running the power-on self test (POST). However, if none of the LEDs is green, the board is not functional. In this case, contact your Sun supplier or field service engineer.
- If the CPU board is a host or satellite Netra CP2500, the Ready LED is green, and the Alarm LED is either off or amber, the board is partially functional and capable of running POST. If the LEDs are in any other state, the board is not functional. In this case, contact your Sun supplier or field service engineer.
4.7.6 Failures Subsequent to Power-On
Symptom
Cannot connect successfully to a TTY serial port; no POST messages are displayed and keyboard input is not accepted.
Action
1. Check the TTY cable for proper setup.
2. If you do not see any output after connecting the TTY terminal to the rear transition module, remove the terminal, then connect it to the COM port of the CPU board and try again.
4.7.7 Troubleshooting During POST/OpenBoot PROM and During Boot Process
This section describes problems encountered while running POST and OBP and during the boot process.
Symptom
POST error message displays:
cannot establish network service
|
Action
This might be a hardware address problem.
Check that the media access control (MAC) address and the IP address of the board are entered correctly on the network server, and add them if they are missing.
Symptom
POST detects Ecache error and a message similar to the following is displayed:
STATUS =FAILED
TEST =Memory Addr w/ Ecache
SUSPECT=U5201 and U5202
MESSAGE=Mem Addr line compare error
addr 00000000.00000000
exp 00000000.00000000
obs 88888888.88888888
|
Action
This might be a mounting issue with the CPU Mylar film, socket, or heatsink, which could have occurred during transportation or due to severe vibration.
Contact Sun's Customer Care Center.
|
Caution - Any attempt to disassemble or replace the aforementioned devices voids the warranty.
|
4.7.8 OpenBoot PROM On-Board Diagnostics
Note - For Netra CP2500 boards, pcia-probe-list does not apply.
|
The following OBP variables are specific to the Netra CT server:
- pcia-probe-list--Probes the bus that runs the first ethernet port (front connection) and standard I/O devices (by default: 1, 2)
- pcib-probe-list--Probes the bus that runs the second ethernet port (rear connection) (by default: 1, 2, 3)
- cpci-probe-list--Probes the bus that runs connections to all cPCI slots in the ct400 or ct800 (by default: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d, e, f)
The following section describes the OBP on-board diagnostics. To execute the OBP on-board diagnostics, the system must be at the ok prompt. The OBP on-board diagnostics are listed as follows:
- watch-clock
- watch-net and watch-net-all
- probe-scsi
- test alias name, device path, -all
4.7.8.1 watch-clock
The watch-clock command reads a register in the TOD chip and displays the result as a seconds counter. During normal operation, the seconds counter repeatedly increments from 0 to 59 until interrupted by pressing any key on the keyboard. The following example shows the watch-clock output.
ok watch-clock
Watching the seconds register of the real time clock chip
It should be ticking once a second
Type any key to stop
49
ok
|
4.7.8.2 watch-net and watch-net-all
The watch-net and watch-net-all commands monitor Ethernet packets on the Ethernet interfaces connected to the system. Good packets received by the system are indicated by a period (.). Errors such as framing errors and cyclic redundancy check (CRC) errors are indicated with an X and an associated error description. CODE EXAMPLE 4-3 shows the watch-net output and CODE EXAMPLE 4-4 shows the watch-net-all output.
CODE EXAMPLE 4-3 watch-net Output Message
ok watch-net
Hme register test --- succeeded.
Internal loopback test -- succeeded.
Transceiver check -- Using Onboard Transceiver - Link Up. passed
Using Onboard Transceiver - Link Up.
Looking for Ethernet Packets.
. is a Good Packet. X is a Bad Packet.
Type any key to stop. .................................................. ................................................................ ................................................................ ........................................................
ok
|
CODE EXAMPLE 4-4 watch-net-all Output Message
ok watch-net-all
/pci@1f,0/pci@1,1/network@1,1
Hme register test --- succeeded.
Internal loopback test -- succeeded.
Transceiver check -- Using Onboard Transceiver - Link Up. passed
Using Onboard Transceiver - Link Up.
Looking for Ethernet Packets.
. is a Good Packet.
X is a Bad Packet.
Type any key to stop. ........ ........ ........................................................ ................................................................ ................................................................ ....................................
ok
|
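If the watch-net or watch-net-all output is captured from the console, the good and bad packet markers can be tallied. The following Python sketch is an illustration only, with a hypothetical function name. It assumes that packet markers appear after the "Type any key to stop." line as runs made up solely of periods and the letter X, and it ignores any other words, such as error descriptions or the ok prompt.

```python
def count_packets(watch_net_output):
    """Return (good, bad) packet counts from captured watch-net output."""
    marker = "Type any key to stop."
    _, found, tail = watch_net_output.partition(marker)
    if not found:
        raise ValueError("marker line not found in output")

    good = 0
    bad = 0
    for token in tail.split():
        # Count only runs that consist entirely of '.' and 'X'; this
        # skips error descriptions and the trailing ok prompt.
        if set(token) <= {".", "X"}:
            good += token.count(".")
            bad += token.count("X")
    return good, bad


if __name__ == "__main__":
    captured = (
        "Looking for Ethernet Packets.\n"
        ". is a Good Packet. X is a Bad Packet.\n"
        "Type any key to stop. ....X ..\n"
        "ok\n"
    )
    print(count_packets(captured))
```

Because the legend line also contains a period and an X, the sketch starts counting only after the marker line.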
4.7.8.3 probe-scsi
The probe-scsi command transmits an inquiry command to SCSI devices connected to the system unit on-board SCSI interface. If the SCSI device is connected and active, the target address, unit number, device type, and manufacturer name are displayed. CODE EXAMPLE 4-5 shows the probe-scsi output.
CODE EXAMPLE 4-5 probe-scsi Output Message
ok probe-scsi
Primary UltraSCSI bus:
Target 0 Unit 0 Disk SEAGATE ST32272W 0876
Target 6
Unit 0 Removable Read Only device TOSHIBA CD-ROM XM-6201TA1037
ok
|
4.7.8.4 test alias name, device path, -all
The test command, combined with a device alias name or device path, enables a device self-test program. If a device has no self-test program, the message: No selftest method for device name is displayed. To enable the self-test program for a device, type the test command followed by the device alias or device path name. TABLE 4-7 lists test alias name selections, a description of the selection, and preparation.
TABLE 4-7 Selected OpenBoot PROM On-Board Diagnostic Tests
Type of Test | Description | Preparation
test screen | Tests system video graphics hardware and monitor. | The diag-switch? NVRAM parameter must be true for the test to execute.
test floppy | Tests diskette drive response to commands. | A formatted diskette must be inserted into the diskette drive.
test net | Performs internal/external loopback test of the system auto-selected Ethernet interface. | An Ethernet cable must be attached to the system and to an Ethernet tap or hub, or the external loopback test fails.
test ttya, test ttyb | Outputs an alphanumeric test pattern on the system serial ports: ttya, serial port A; ttyb, serial port B. | A terminal must be connected to the port being tested to observe the output.
test keyboard | Executes the keyboard self-test. | Four keyboard LEDs should flash once and a message is displayed: Keyboard Present.
test -all | Sequentially tests system-configured devices that have a self-test. | Tests are sequentially executed in device-tree order (viewed with the show-devs command).
4.7.9 OpenBoot Diagnostics
OpenBoot Diagnostics is an interactive tool that tests various hardware and peripheral devices. When obdiag is typed at the ok prompt in OBP, the menu shown in CODE EXAMPLE 4-6 is displayed on the screen.
obdiag performs root-cause failure analysis on the referenced devices by testing internal registers, confirming subsystem integrity, and verifying device functionality. To run obdiag:
|
Caution - Prior to running obdiag, do not run any other OBP command that might change the hardware state of the board. After obdiag tests are run, always reset the system to bring it to a known state.
|
1. At the ok prompt, enter obdiag.
This displays the OBDiag menu as shown in CODE EXAMPLE 4-6.
2. At the OBDiag menu prompt, enter a number from the menu (such as 17 to toggle script-debug messages).
CODE EXAMPLE 4-6 OBDiag Menu
0 .... PCI/Cheerio
1 .... EBUS DMA/TCR Registers
2 .... Ethernet
3 .... Ethernet2 <Inactive>
4 .... Parallel Port
5 .... Serial Port C (on optional I/O board) <Inactive>
6 .... Serial Port D (on optional I/O board) <Inactive>
7 .... NVRAM
8 .... Floppy
9 .... Serial port A
10 ... Serial port B
11 ... RAS
12 ... User Flash1
13 ... User Flash2
14 ... All Above
15 ... Quit
16 ... Display this Menu
17 ... Toggle Script-debug
18 ... Enable External Loopback Tests
19 ... Disable External Loopback Tests
Enter (0-14 tests, 15 -Quit, 16 -Menu) ===>
|
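If you capture the console session, the menu text can be turned into a lookup table, which is handy for checking which tests are marked Inactive before choosing a number. The following Python sketch assumes the line layout shown in CODE EXAMPLE 4-6 (a number, a run of dots, a label, and an optional <Inactive> tag). MENU_TEXT is an excerpt of that menu, and parse_menu is a hypothetical helper, not an obdiag feature.

```python
import re

# Excerpt of the menu in CODE EXAMPLE 4-6, as captured from the console.
MENU_TEXT = """\
0 .... PCI/Cheerio
1 .... EBUS DMA/TCR Registers
2 .... Ethernet
3 .... Ethernet2 <Inactive>
14 ... All Above
15 ... Quit
"""

# number, dots, label, optional "<Inactive>" marker
MENU_LINE = re.compile(r"^(\d+)\s+\.+\s+(.*?)(\s+<Inactive>)?$")


def parse_menu(text):
    """Map each menu number to a (label, is_active) pair."""
    entries = {}
    for line in text.splitlines():
        match = MENU_LINE.match(line.strip())
        if match:
            number, label, inactive = match.groups()
            entries[int(number)] = (label, inactive is None)
    return entries


menu = parse_menu(MENU_TEXT)
for number, (label, active) in sorted(menu.items()):
    state = "active" if active else "inactive"
    print(f"{number:2d}  {label}  [{state}]")
```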
You can type the relevant numbers to run all or some of the tests. If an error is detected, an error message is displayed on the screen. For example, if an error is detected while testing the floppy disk drive, a message similar to the following is displayed on the screen:
TEST= floppy_test
STATUS= FAILED
SUBTEST= floppy_id0_read_test
ERRORS= 1
TTF= 66
SPEED= 440 MHz
PASSES= 1
MESSAGE= Error: Recalibrate failed. floppy missing, improperly connected, or defective.
|
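An error report like the one above follows a simple KEY= value layout, so a captured report can be split into fields for logging. The following Python sketch assumes that layout holds for every line of the report; parse_report is a hypothetical helper, and SAMPLE is the floppy example copied from the text.

```python
# The floppy error report shown above, as captured console text.
SAMPLE = """\
TEST= floppy_test
STATUS= FAILED
SUBTEST= floppy_id0_read_test
ERRORS= 1
TTF= 66
SPEED= 440 MHz
PASSES= 1
MESSAGE= Error: Recalibrate failed. floppy missing, improperly connected, or defective.
"""


def parse_report(text):
    """Split each 'KEY= value' line into a dictionary entry."""
    report = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        # Only accept lines whose key is an upper-case field name.
        if sep and key.strip().isupper():
            report[key.strip()] = value.strip()
    return report


report = parse_report(SAMPLE)
if report.get("STATUS") == "FAILED":
    print(f"{report['TEST']} / {report['SUBTEST']}: {report['MESSAGE']}")
```

All values are kept as strings because the manual does not define the units or types of fields such as TTF.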
Some of the OBDiag menu options are described in detail in the following paragraphs.
4.7.9.1 PCI/PCIO
The PCI/PCIO diagnostic performs the following:
- vendor_ID_test: Verifies that the PCIO ASIC vendor ID is 108e.
- device_ID_test: Verifies that the PCIO ASIC device ID is 1000.
- mixmode_read: Verifies that the PCI configuration space is accessible as half-word bytes by reading the EBus2 vendor ID address.
- e2_class_test: Verifies the address class code. Address class codes include bridge device (0xB, 0x6), other bridge device (0xA and 0x80), and programmable interface (0x9 and 0x0).
- status_reg_walk1: Performs walk1 test on status register with mask 0x280 (PCIO ASIC is accepting fast back-to-back transactions, DEVSEL timing is 0x1).
- line_size_walk1: Performs tests "a" through "e."
- latency_walk1: Performs walk1 test on latency timer.
- line_walk1: Performs walk1 test on interrupt line.
- pin_test: Verifies that the interrupt pin is logic-level high (1) after reset.
CODE EXAMPLE 4-7 shows the PCI/PCIO output message.
CODE EXAMPLE 4-7 PCI/PCIO Output Message
Enter (0-14 tests, 15 -Quit, 16 -Menu) ===> 0
TEST= all_pci/PCIO_test
SUBTEST= vendor_id_test
SUBTEST= device_id_test
SUBTEST= mixmode_read
SUBTEST= e2_class_test
SUBTEST= status_reg_walk1
SUBTEST= line_size_walk1
SUBTEST= latency_walk1
SUBTEST= line_walk1
SUBTEST= pin_test
Enter (0-14 tests, 15 -Quit, 16 -Menu) ===>
|
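The manual does not define how the walk1 subtests are implemented. The following Python sketch shows the common walking-ones technique as one plausible reading: a single 1 bit is written to each position selected by a mask, and the value read back must match. SimulatedRegister, its 8-bit width, and the stuck_low fault-injection argument are assumptions made for illustration; the sketch does not access PCI configuration space.

```python
class SimulatedRegister:
    """Stand-in for a hardware register (an assumption, not real hardware)."""

    def __init__(self, width, writable_mask, stuck_low=0):
        self.width = width
        self.writable_mask = writable_mask
        self.stuck_low = stuck_low  # bits that always read back as 0
        self.value = 0

    def write(self, value):
        # Only writable bits that are not stuck keep the written value.
        self.value = value & self.writable_mask & ~self.stuck_low

    def read(self):
        return self.value


def walk1(register, mask):
    """Walk a single 1 through every bit in mask; return the failing bits."""
    failures = []
    for bit in range(register.width):
        pattern = 1 << bit
        if not (pattern & mask):
            continue  # bit is outside the mask, so it is not exercised
        register.write(pattern)
        if (register.read() & mask) != pattern:
            failures.append(bit)
    return failures


healthy = SimulatedRegister(width=8, writable_mask=0xFF)
faulty = SimulatedRegister(width=8, writable_mask=0xFF, stuck_low=0x08)
print("healthy:", walk1(healthy, 0xFF))
print("faulty: ", walk1(faulty, 0xFF))
```

Returning the list of failing bit positions, rather than a single pass or fail flag, makes it easier to relate a failure to a specific signal line.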
4.7.9.2 EBus DMA/TCR Registers
The EBus DMA/TCR registers diagnostic performs the following:
- dma_reg_test: Performs a walk1 bit test on the control status register, address register, and byte count register of each channel. Verifies that the control status register is set properly.
- dma_func_test: Validates the direct memory access (DMA) capabilities and the first-in, first-out (FIFO) buffer. The test is executed in a DMA diagnostic loopback mode. It initializes the data in the transmit memory with its address, performs a DMA read and write, and verifies that the data received is correct. This diagnostic is repeated for four channels.
CODE EXAMPLE 4-8 shows the EBus DMA/TCR registers output message.
CODE EXAMPLE 4-8 EBus DMA/TCR Registers Output Message
Enter (0-14 tests, 15 -Quit, 16 -Menu) ===> 1
TEST= all_dma/ebus_test
SUBTEST= dma_reg_test
SUBTEST= dma_func_test
Enter (0-14 tests, 15 -Quit, 16 -Menu) ===>
|
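The address-as-data pattern described for dma_func_test can be sketched in a few lines of Python. This is a simulation under stated assumptions, not the firmware's code: the buffer length, the base addresses, the 4-byte word size, and the loopback_transfer stand-in are all chosen for illustration, and only the four-channel count comes from the text.

```python
CHANNELS = 4           # the text says the diagnostic repeats for four channels
WORDS_PER_BUFFER = 16  # assumed buffer length, chosen only for illustration


def fill_with_addresses(base_address, count):
    """Build a transmit buffer in which each word holds its own address."""
    return [base_address + 4 * i for i in range(count)]


def loopback_transfer(tx_buffer):
    """Stand-in for a DMA read followed by a DMA write in loopback mode."""
    fifo = list(tx_buffer)  # memory -> FIFO
    return list(fifo)       # FIFO -> memory


def dma_func_check(transfer=loopback_transfer):
    """Return the channels whose received data differs from what was sent."""
    failed = []
    for channel in range(CHANNELS):
        tx = fill_with_addresses(0x1000 * (channel + 1), WORDS_PER_BUFFER)
        rx = transfer(tx)
        if rx != tx:
            failed.append(channel)
    return failed


print("failed channels:", dma_func_check())
```

Using the address as the data makes a misplaced word easy to spot, because the received value itself shows where the word should have been.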
4.7.9.3 Ethernet
The Ethernet diagnostic performs the following:
- my_channel_reset: Resets the Ethernet channel.
- hme_reg_test: Performs walk1 on the following register set: global register 1, global register 2, bmac xif register, bmac tx register, and the mif register.
- mac_internal_loopback_test: Performs Ethernet channel engine internal loopback.
- 10mb_xcvr_loopback_test: Enables the 10Base-T data present at the transmit MII data inputs to be routed back to the receive MII data outputs.
- 100mb_phy_loopback_test: Enables MII transmit data to be routed to the MII receive data path.
- 100mb_twister_loopback_test: Forces the twisted-pair transceiver into loopback mode.
CODE EXAMPLE 4-9 shows the Ethernet output message.
CODE EXAMPLE 4-9 Ethernet Output Message
Enter (0-14 tests, 15 -Quit, 16 -Menu) ===> 2
TEST= ethernet_test
SUBTEST= my_channel_reset
SUBTEST= hme_reg_test
SUBTEST= global_reg1_test
SUBTEST= global_reg2_test
SUBTEST= bmac_xif_reg_test
SUBTEST= bmac_tx_reg_test
SUBTEST= mif_reg_test
Test only supported for National Phy DP83840A
SUBTEST= 10mb_xcvr_loopback_test
selecting internal transceiver
Test only supported for National Phy DP83840A
SUBTEST= 100mb_phy_loopback_test
selecting internal transceiver
Test only supported for National Phy DP83840A
SUBTEST= 100mb_twister_loopback_test
selecting internal transceiver
Test only supported for National Phy DP83840A
Enter (0-14 tests, 15 -Quit, 16 -Menu) ===>
|
4.7.9.4 Parallel Port
The parallel port diagnostic performs the dma_read subtest, which enables the enhanced capability port (ECP) mode configuration, ECP DMA configuration, and FIFO test mode. It transfers 16 bytes of data from memory to the parallel-port device, then verifies that the data is in the FIFO. CODE EXAMPLE 4-10 shows the parallel port output message.
CODE EXAMPLE 4-10 Parallel Port Output Message
Enter (0-14 tests, 15 -Quit, 16 -Menu) ===> 4
TEST= parallel_port_test
SUBTEST= dma_read
Enter (0-14 tests, 15 -Quit, 16 -Menu) ===>
|
Note - The Netra CP2500 board has no parallel port.
|
4.7.9.5 Serial Port A
The serial port A diagnostic invokes the uart_loopback test. This test transmits and receives 128 characters and checks the validity of the transfer. CODE EXAMPLE 4-11 shows the serial port A output message.
CODE EXAMPLE 4-11 Serial Port A Output Message
Enter (0-14 tests, 15 -Quit, 16 -Menu) ===> 9
TEST= uarta_test
Enter (0-14 tests, 15 -Quit, 16 -Menu) ===>
|
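The loopback check described above can be modeled with a simulated port. The following Python sketch assumes the simplest form of the test, in which each transmitted character is read back immediately and compared; LoopbackUart and uart_loopback_check are hypothetical names, and the sketch does not drive a real serial port.

```python
from collections import deque


class LoopbackUart:
    """Simulated UART whose transmit line is wired straight to its receiver."""

    def __init__(self):
        self._line = deque()

    def transmit(self, byte):
        self._line.append(byte & 0xFF)

    def receive(self):
        # None models "nothing arrived" on the receive side.
        return self._line.popleft() if self._line else None


def uart_loopback_check(uart, count=128):
    """Send count characters and confirm each one is received unchanged."""
    for value in range(count):
        uart.transmit(value)
        if uart.receive() != value:
            return False
    return True


print("loopback ok:", uart_loopback_check(LoopbackUart()))
```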
Note - The serial port A diagnostic stalls if the TIP line is installed on serial port A. CODE EXAMPLE 4-12 shows the serial port A output message when the TIP line is installed on serial port A.
|
CODE EXAMPLE 4-12 Serial Port A Output Message With TIP Line Installed
Enter (0-14 tests, 15 -Quit, 16 -Menu) ===> 9
TEST= uarta_test
UART A in use as console - Test not run.
Enter (0-14 tests, 15 -Quit, 16 -Menu) ===>
|
4.7.9.6 Serial Port B
The serial port B diagnostic is identical to the serial port A diagnostic. CODE EXAMPLE 4-13 shows the serial port B output message.
Note - The serial port B diagnostic stalls if the TIP line is installed on serial port B.
|
CODE EXAMPLE 4-13 Serial Port B Output Message
Enter (0-14 tests, 15 -Quit, 16 -Menu) ===> 10
TEST= uartb_test
Enter (0-14 tests, 15 -Quit, 16 -Menu) ===>
|
4.7.9.7 NVRAM
The NVRAM diagnostic verifies the NVRAM operation by performing a write and read to the NVRAM. CODE EXAMPLE 4-14 shows the NVRAM output message.
CODE EXAMPLE 4-14 NVRAM Output Message
Enter (0-14 tests, 15 -Quit, 16 -Menu) ===> 7
TEST= nvram_test
SUBTEST= write/read_patterns
SUBTEST= write/read_inverted_patterns
Enter (0-14 tests, 15 -Quit, 16 -Menu) ===>
|
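The two subtests named in the output, write/read_patterns and write/read_inverted_patterns, suggest a pattern test followed by the same test with the bits inverted. The following Python sketch shows that idea on a simulated memory. The pattern values, the byte-by-byte order, and the step that restores the original contents are assumptions; the manual does not describe them.

```python
PATTERNS = (0x55, 0xAA, 0x00, 0xFF)  # assumed patterns; the manual does not list them


def write_read(memory, offset, value):
    """Write one byte, read it back, and report whether it matched."""
    memory[offset] = value
    return memory[offset] == value


def nvram_check(memory):
    """Run patterns and inverted patterns over every byte, restoring contents."""
    for offset in range(len(memory)):
        saved = memory[offset]
        try:
            for pattern in PATTERNS:
                inverted = pattern ^ 0xFF
                if not (write_read(memory, offset, pattern)
                        and write_read(memory, offset, inverted)):
                    return False
        finally:
            memory[offset] = saved  # leave the simulated NVRAM as it was found
    return True


nvram = bytearray(b"example contents")  # simulated NVRAM, not a real device
print("nvram ok:", nvram_check(nvram))
```

Restoring each byte after it is tested is a design choice made here so that the check can run on memory that holds live configuration data.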
4.7.9.8 All Above
The All Above diagnostic validates the system unit by running the preceding menu tests in sequence. CODE EXAMPLE 4-15 shows an example of the All Above output message.
CODE EXAMPLE 4-15 All Above Output Message
Enter (0-14 tests, 15 -Quit, 16 -Menu) ===> 14
TEST= all_pci/cheerio_test
SUBTEST= vendor_id_test
SUBTEST= device_id_test
...
SUBTEST= bmac_xif_reg_test
SUBTEST= bmac_tx_reg_test
SUBTEST= mif_reg_test
SUBTEST= mac_internal_loopback_test
selecting internal transceiver
Test only supported for National Phy DP83840A
...
SUBTEST= 100mb_twister_loopback_test
selecting internal transceiver
Test only supported for National Phy DP83840A
TEST= ethernet2_test
TEST= parallel_port_test
SUBTEST= dma_read
TEST= uarta_test
...
SUBTEST= write/read_patterns
...
ttya in use as console - Test not run.
TEST= usi_test
ttyb in use as console - Test not run.
TEST= ras_test env-monitor = disabled
SUBTEST= obd-init-i2c-test
...
TEST= flash_test
SUBTEST= flash-supported?
TEST= flash_test
SUBTEST= flash-supported?
Enter (0-14 tests, 15 -Quit, 16 -Menu) ===>
|
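Because the All Above output is long, it can help to condense a captured transcript into one entry per test. The following Python sketch groups SUBTEST lines under the TEST line that precedes them and flags tests that report Test not run. The summarize helper is hypothetical, and SAMPLE is assembled from lines shown in CODE EXAMPLE 4-10, CODE EXAMPLE 4-12, and CODE EXAMPLE 4-14 rather than from a real All Above run.

```python
# Assembled from lines in CODE EXAMPLES 4-10, 4-12, and 4-14 for illustration.
SAMPLE = """\
TEST= parallel_port_test
SUBTEST= dma_read
TEST= uarta_test
UART A in use as console - Test not run.
TEST= nvram_test
SUBTEST= write/read_patterns
SUBTEST= write/read_inverted_patterns
"""


def summarize(transcript):
    """Return {test name: {"subtests": [...], "skipped": bool}} in run order."""
    summary = {}
    current = None
    for raw in transcript.splitlines():
        line = raw.strip()
        if line.startswith("TEST="):
            # Keep only the test name; extra text may follow it on the line.
            parts = line[len("TEST="):].split()
            current = parts[0] if parts else "(unnamed)"
            summary.setdefault(current, {"subtests": [], "skipped": False})
        elif line.startswith("SUBTEST=") and current:
            summary[current]["subtests"].append(line[len("SUBTEST="):].strip())
        elif line.endswith("Test not run.") and current:
            summary[current]["skipped"] = True
    return summary


for name, info in summarize(SAMPLE).items():
    state = "skipped" if info["skipped"] else f"{len(info['subtests'])} subtest(s)"
    print(f"{name}: {state}")
```

The sketch attributes a Test not run line to the most recent TEST line, which is an assumption; in an elided transcript that attribution may not hold.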
Netra CT Server Service Manual
|
819-2741-10
|
|
Copyright © 2007, Sun Microsystems, Inc. All Rights Reserved.