If the system does not have a console, you can log in remotely or attach a terminal directly to the system.
To attach a terminal to the system:
Halt the system and turn off power.
Connect the terminal to serial port A on the clock+ board.
The clock+ board is located at the back of the system, near the top of the card cage. Figure 9-1 shows the Enterprise 6500/5500 cabinet server. In the 8-slot Enterprise 4500 standalone server, the clock+ board is also near the top of the card cage.
Power on the terminal.
Set up the terminal.
Refer to the OpenBoot Command Reference for instructions for using the set-defaults and printenv commands.
The settings will vary with the terminal type, but these settings are often used:
9600 bps
8 data bits
1 stop bit
Even parity
Full duplex
Turn the key switch to the diagnostic position.
The system will turn on. The diagnostic position puts POST in interactive mode and enables extensive POST tests.
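As a compact way to record the commonly used terminal settings listed above, the following sketch holds them in a small Python structure and renders the conventional shorthand. The class name `TerminalSettings` and its `shorthand` method are hypothetical helpers introduced only for illustration; they are not part of any Sun tool, and the values should be adjusted to the terminal type in use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TerminalSettings:
    """Commonly used terminal settings from the list above (hypothetical helper)."""
    bps: int = 9600
    data_bits: int = 8
    stop_bits: int = 1
    parity: str = "even"
    duplex: str = "full"

    def shorthand(self) -> str:
        # Conventional "<speed> <data bits><parity initial><stop bits>" notation.
        return f"{self.bps} {self.data_bits}{self.parity[0].upper()}{self.stop_bits}"


settings = TerminalSettings()
print(settings.shorthand())
```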
LEDs indicate system status. The front panel and the boards have three LEDs (Figure 9-2). Power supply modules have two LEDs.
The LEDs on the system front panel or the clock+ board indicate the status of the system as a whole. The LEDs on individual boards and power supplies indicate the status of the individual board or power supply. Many of the LED codes (Table 9-1) are common to the system front panel and various types of boards. Table 9-2 lists specific exceptions for LED codes for system boards.
Table 9-1 lists the LED codes for system operations.
Table 9-1 System Status Codes
Power | Service | Cycling | Condition |
---|---|---|---|
Off | Off | Off | No power or the key switch is in the Off position. |
Off | On | Off | Failure mode. System has electrical power. |
Off | Off | On | Failure mode. System has electrical power. |
Off | On | On | Failure mode. System has electrical power. |
On | Off | Off | System is hung, either in POST/OBP or in the operating system. |
On | Off | On | Hung in OS. |
On | On | Off | (Hung in POST/OBP) or (hung in OS and failed component in system). |
On | On | On | (Hung in POST/OBP) or (hung in OS and failed component in system). |
On | Off | Flashing | OS running. System is operating normally. |
On | On | Flashing | OS running and failed component in system. |
On | Flashing | Off | Slow flash = POST. Fast flash = OBP. |
On | Flashing | On | OS or OBP error. |
LEDs in the system are controlled by OpenBoot(TM) PROM programming (OBP).
The clock+ board also displays system status. The LED codes are the same as for the front panel (Table 9-1).
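The system status codes in Table 9-1 can be treated as a lookup keyed by the three LED states. The sketch below is one possible way to encode the table for a checklist or support script; the names `SYSTEM_STATUS` and `system_condition` are hypothetical, and the LED states must still be read by eye from the front panel or clock+ board.

```python
# Table 9-1 encoded as a lookup keyed by (Power, Service, Cycling) LED states.
# The dictionary and function names are illustrative assumptions.
SYSTEM_STATUS = {
    ("off", "off", "off"): "No power or the key switch is in the Off position.",
    ("off", "on", "off"): "Failure mode. System has electrical power.",
    ("off", "off", "on"): "Failure mode. System has electrical power.",
    ("off", "on", "on"): "Failure mode. System has electrical power.",
    ("on", "off", "off"): "System is hung, either in POST/OBP or in the operating system.",
    ("on", "off", "on"): "Hung in OS.",
    ("on", "on", "off"): "(Hung in POST/OBP) or (hung in OS and failed component in system).",
    ("on", "on", "on"): "(Hung in POST/OBP) or (hung in OS and failed component in system).",
    ("on", "off", "flashing"): "OS running. System is operating normally.",
    ("on", "on", "flashing"): "OS running and failed component in system.",
    ("on", "flashing", "off"): "Slow flash = POST. Fast flash = OBP.",
    ("on", "flashing", "on"): "OS or OBP error.",
}


def system_condition(power: str, service: str, cycling: str) -> str:
    """Return the Table 9-1 condition for the observed LED states."""
    key = (power.lower(), service.lower(), cycling.lower())
    return SYSTEM_STATUS.get(key, "Combination not listed in Table 9-1.")


print(system_condition("On", "Off", "Flashing"))
```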
Table 9-2 summarizes LED codes for boards. The Power, Service, and Cycling symbols are marked on the card cage frame above the respective LEDs. Note that many but not all of the LED codes are the same as the system codes (Table 9-1).
Table 9-2 Board Status LED Codes
Power | Service | Cycling | Condition |
---|---|---|---|
Off | Off | Off | Board has no electrical power. |
Off | On | Off | Board is in low-power mode and can be unplugged. |
Off | Off | Flashing | Undefined. |
Off | On | Flashing | Undefined. |
On | Off | Off | System is hung, either in POST/OBP or in the OS. |
On | Off | On | Hung in OS. |
On | On | Off | (Hung in POST/OBP) or (hung in OS and failed component on board). |
On | On | On | (Hung in POST/OBP) or (hung in OS and failed component on board). |
On | Off | Flashing | OS running. System is operating normally. |
On | On | Flashing | OS running and failed component on board. |
On | Flashing | Off | Slow flash = POST. Fast flash = OBP. |
On | Flashing | On | OS or OBP error. |
For boards, Off-On-Off indicates that the board is in low-power mode and is ready for removal. (For the system, Off-On-Off indicates a failure.)
If the Power LED is lit, do not remove the board. Removing a board that is not in low-power mode will damage the board and the system.
If the yellow LED (middle LED) is continuously lit (not flashing), the board requires service.
If the left and right green LEDs are off, the board is ready for removal.
If no LEDs are flashing, the system is hung.
If no LEDs are lit, there is no electrical power to the board.
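The board removal rules above reduce to a simple check on the Power (left) and Cycling (right) LEDs. The following sketch encodes only what the notes and Table 9-2 state; the function names `board_state` and `safe_to_remove` are hypothetical, and the check is a memory aid, not a substitute for reading the LEDs on the board itself.

```python
def board_state(power: str, service: str, cycling: str) -> str:
    """Classify a board from its three LEDs, following Table 9-2 and the notes above."""
    p, s, c = (value.lower() for value in (power, service, cycling))
    if (p, s, c) == ("off", "off", "off"):
        return "no power"
    if (p, s, c) == ("off", "on", "off"):
        return "low-power mode"
    if p == "on":
        # Any pattern with the Power LED lit means the board must stay in place.
        return "powered"
    return "undefined or not listed"


def safe_to_remove(power: str, service: str, cycling: str) -> bool:
    """True only when the left and right green LEDs are both off."""
    return board_state(power, service, cycling) in ("no power", "low-power mode")


print(safe_to_remove("Off", "On", "Off"))
```

A powered board is never reported as removable, which matches the warning that removing a board outside low-power mode damages the board and the system.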
The board status LED codes correspond to those shown in Table 9-2 for the CPU/Memory+ and I/O+ boards. The Disk board has two additional LEDs on the opposite side of the board to show the status of the two onboard disk drives. The LED for disk drive 1 is nearer to the side of the Disk board, and the LED for disk drive 0 is closer to the center of the board.
A system has one peripheral power supply and up to four or eight CPU/IO modular power supplies. All the power supplies have one green LED and one yellow LED.
The control and status signals of all power supply modules connect to the clock+ board. If the clock+ board LEDs indicate a problem, inspect the LEDs on the power supplies to locate a faulty module, if any.
The green LED is to the right of the yellow LED on the peripheral power supply. The green LED indicates that the peripheral power supply is operating, but does not necessarily indicate that the DC outputs are within specification.
When the peripheral power supply module yellow LED is lit, a DC power output has malfunctioned or the voltage level is out of specification.
The peripheral power supply provides +5 VDC and +12 VDC outputs for peripherals such as a tape drive or CD-ROM drive. In addition, the +5 VDC output of the peripheral power supply is available at the center plane for current sharing with the +5 VDC outputs of the power supply modules.
For a modular power supply (power/cooling module, or PCM) at the front of the card cage, the green LED is to the left of the yellow LED. At the back of the card cage, the LED positions are reversed and the green LED is to the right of the yellow LED. See Table 9-3.
When the yellow LED is lit, a fan or a DC output has malfunctioned. Each modular power supply contains two fans and three DC supplies (+3.3 VDC, +5 VDC, and +2 VDC).
The green LED indicates that the DC supplies are operating, but does not guarantee that the DC outputs are within specification.
Table 9-3 Modular Power Supply LED Codes
Green | Yellow | Condition |
---|---|---|
Off | Off | No AC input or key switch is turned off. |
On | Off | Normal operation. |
On | On | A fan has failed or one or more voltages are out of specification. |
Off | On | One or more DC outputs have failed, or the voltages are out of specification, or the system is in the low power state. |
The PCMs operate in redundant current share mode. If a module fails, the remaining modules may or may not provide enough current to continue system operation. The system's ability to continue operations depends on the total demand for current.
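Table 9-3 has only four rows, so it maps naturally onto a lookup keyed by the two LED states. The sketch below is an illustrative encoding; `POWER_SUPPLY_STATUS`, `power_supply_condition`, and `needs_inspection` are hypothetical names, not part of any Sun utility.

```python
# Table 9-3 encoded as a lookup keyed by (green LED lit, yellow LED lit).
POWER_SUPPLY_STATUS = {
    (False, False): "No AC input or key switch is turned off.",
    (True, False): "Normal operation.",
    (True, True): "A fan has failed or one or more voltages are out of specification.",
    (False, True): (
        "One or more DC outputs have failed, or the voltages are out of "
        "specification, or the system is in the low power state."
    ),
}


def power_supply_condition(green_on: bool, yellow_on: bool) -> str:
    """Return the Table 9-3 condition for a modular power supply."""
    return POWER_SUPPLY_STATUS[(bool(green_on), bool(yellow_on))]


def needs_inspection(green_on: bool, yellow_on: bool) -> bool:
    """A lit yellow LED always points at a fan, DC output, or low-power condition."""
    return bool(yellow_on)


print(power_supply_condition(True, False))
```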
The availability and type of status information varies with the disk tray type used in a system. Refer to the disk tray user manual for specific status information.
When LED codes (Table 9-1, Table 9-2, Table 9-3) indicate a hardware problem, several types of software programs are available to supply information about the problem.
Error messages and other system messages are saved in the /var/adm/messages file.
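When the messages file is long, filtering it for likely problem lines can save time. The sketch below is a minimal example, assuming that a case-insensitive substring match is good enough; the keyword list and the function names `find_messages` and `scan_log` are assumptions for illustration, and the actual wording of system messages varies.

```python
from typing import Iterable, List


def find_messages(lines: Iterable[str], keywords: Iterable[str]) -> List[str]:
    """Return the lines that contain any keyword, ignoring case."""
    wanted = [keyword.lower() for keyword in keywords]
    return [
        line.rstrip("\n")
        for line in lines
        if any(keyword in line.lower() for keyword in wanted)
    ]


def scan_log(path: str = "/var/adm/messages",
             keywords: Iterable[str] = ("error", "warning", "fail")) -> List[str]:
    """Filter the messages file; the default keywords are illustrative only."""
    with open(path, errors="replace") as handle:
        return find_messages(handle, keywords)
```

Calling `scan_log()` on the system returns the matching lines in file order; reading the file may require superuser privileges, depending on its permissions.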
The latest version of SunVTS(TM) (online validation test suite) has several modes of testing, including low-impact testing, which can run with minimal effect on customer applications.
SunVTS can also be used to stress-test Sun hardware, either in or out of the Solaris operating environment. By running multiple and multithreaded diagnostic hardware tests, the SunVTS software verifies the system configuration and functionality of most hardware controllers and devices.
SunVTS tests many board and system functions, as well as Fibre Channel, SCSI, and SBus interfaces. SunVTS accepts user-written scripts for automated testing.
Refer to the SunVTS User's Guide for starting and operating instructions.
You can use the prtdiag command to display:
System configuration, including information about clock frequencies, CPUs, memory, and I/O card types.
Diagnostic information
Failed field replaceable units (FRUs)
Refer to the prtdiag man page for instructions.
To isolate an intermittent failure, it may be helpful to maintain a prtdiag history log. Use the prtdiag command with the -l (log) option to send output to a log file in the /var/adm directory.
% /usr/platform/sun4u/sbin/prtdiag -l
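A history log can also be kept by capturing the command's standard output at intervals. The sketch below is one possible approach and does not rely on any prtdiag option: it runs the command at the path shown above and appends the output under a timestamp header. The function name `append_snapshot` and the idea of a separate history file are assumptions for illustration.

```python
import subprocess
import time

# Path as given in the text above.
PRTDIAG = "/usr/platform/sun4u/sbin/prtdiag"


def append_snapshot(log_path, command=(PRTDIAG,)):
    """Run the command and append its output, with a timestamp header, to log_path."""
    result = subprocess.run(
        list(command),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    with open(log_path, "a") as log:
        log.write(f"=== {stamp} (exit status {result.returncode}) ===\n")
        log.write(result.stdout)
        if not result.stdout.endswith("\n"):
            log.write("\n")
    return result.returncode
```

Running `append_snapshot` from a scheduled job with a file name of your choosing builds a sequence of snapshots that can be compared when an intermittent failure appears.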
POST and OpenBoot work together in the system to test and manage system hardware.
POST resides in the OpenBoot PROM on each CPU/Memory+ board, I/O+ board, and Disk board. When the system is turned on, or if a system reset is issued, POST detects and tests buses, power supplies, boards, CPUs, SIMMs, and many board functions. POST controls the status LEDs on the system front panel and all boards. POST displays diagnostic and error messages on a console terminal, if available.
Only POST can configure the system hardware, and only POST can enable hot-pluggable boards. If a new unit (board or modular power supply) is added to the card cage after the system has booted, the new unit will not work until the system is rebooted, at which time POST reconfigures the system, using the units that are found in the system at that time.
POST does not test drives or internal parts of SBus cards. To test these devices, run OBP diagnostics manually after the system has booted. Refer to the OpenBoot Command Reference manual for instructions.
OpenBoot provides basic environmental monitoring, including detection of overheating conditions and out-of-tolerance voltages. For example, if an overheated board is found, OpenBoot issues a warning message. If the temperature passes the danger level, POST will put the overheated board(s) in low power mode.
OpenBoot also provides a set of commands and diagnostics at the ok prompt. For example, you can use OpenBoot to set NVRAM variables that reserve a board or a set of SIMMs for hot-sparing.
The following OpenBoot commands may be useful for diagnosing problems:
Use the show-devs command to list the devices that are included in the system configuration.
Use the printenv command to display the system configuration variables stored in the system NVRAM. The display includes the current values for these variables, as well as the default values.
If the system cannot communicate with a 10BASE-T network, the Ethernet link test setting for the port may be incompatible with the setting at the network hub. See "Failure of Network Communications" for further details.
The probe-scsi command locates and tests SCSI devices attached to the system. probe-scsi is run from the OpenBoot prompt.
When it is not practical to halt the system, you can use SunVTS as an alternate method of testing the SCSI interfaces.
For more information, refer to:
OpenBoot 3.x Command Reference, part number 802-3242
Writing FCode 3.x Programs, part number 802-3230
The Solstice(TM) SyMON(TM) program monitors system functioning and features a graphical user interface (GUI) to continuously display system status. Solstice SyMON is intended to complement system management tools such as SunVTS.
Solstice SyMON is accessible through an SNMP interface from network tools such as Solstice(TM) SunNet Manager(TM).
Refer to the Solstice SyMON User's Guide, part number 802-5355, for starting and operating instructions.
The system cannot communicate with a network if the system and the network hub are not set in the same way for the Ethernet Link Integrity Test. This problem particularly applies to 10BASE-T network hubs, where the Ethernet Link Integrity Test is optional. This is not a problem for 100BASE-T networks, where the test is enabled by default.
If you connect the system to a network and the network does not respond, use the OpenBoot command watch-net-all to display conditions for all network connections:
ok watch-net-all
For SBus Ethernet cards, the test can be enabled or disabled with a hardware jumper, which you must set manually. For the TPE and MII onboard ports on the I/O+ board, the link test is enabled or disabled through software, as shown below.
The TPE and MII ports share some circuitry, so do not use both ports at the same time.
Some hub designs do not use a software command to enable/disable the test, but instead permanently enable (or disable) the test through a hardware jumper. Refer to the hub installation or user manual for details of how the test is implemented.
To enable or disable the link test for an on-board TPE (hme) port, you must first know the device name for the I/O+ board. To list the device names:
Shut down the system and take the system into OpenBoot.
Determine the device names of the I/O+ boards:
Use this method while the operating system is running:
Become superuser.
# eeprom nvramrc="probe-all install-console banner apply disable-link-pulse device-name "
(Repeat for any additional device names.)
# eeprom "use-nvramrc?"=true
Reboot the system (when convenient) to make the changes effective.
Use this alternate method when the system is already in OpenBoot:
At the OpenBoot prompt, type:
ok nvedit
0: probe-all install-console banner
1: apply disable-link-pulse device-name
(Repeat this step for other device names as needed.)
(Press CONTROL-C to exit nvedit.)
ok nvstore
ok setenv use-nvramrc? true
Reboot to make the changes effective.
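When several onboard ports need the link test disabled, the NVRAMRC contents follow a regular pattern: one fixed first line, then one `apply disable-link-pulse` line per device name. The sketch below generates that text, mirroring the nvedit listing above with one command per line. The function name `build_nvramrc` is hypothetical, and the device names passed in must be the real names reported by the system; none are supplied here.

```python
def build_nvramrc(device_names):
    """Return NVRAMRC text that disables the link test for each named device."""
    lines = ["probe-all install-console banner"]
    for name in device_names:
        name = name.strip()
        if not name:
            raise ValueError("device names must not be empty")
        lines.append(f"apply disable-link-pulse {name}")
    return "\n".join(lines)


# "device-name" is the same placeholder used in the text, not a real device path.
print(build_nvramrc(["device-name"]))
```

The generated text is a draft to review and enter with nvedit or eeprom as described above; the sketch does not write to NVRAM.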
It is possible to reset the system or cycle power from the remote console under these conditions:
The console must be connected to port A on the clock+ board.
The key switch must be in either the On or Diagnostic setting. If the key switch is in the Secure or Off position, the remote key sequences and button resets are ignored.
Security features permit the use of the remote console.
You must type slowly: leave at least 0.5 seconds, and no more than 5 seconds, between characters.
Command | Enter this sequence |
---|---|
Remote power off/on | <CR> <CR> <~> <Control-Shift-p> |
Remote system reset | <CR> <CR> <~> <Control-Shift-r> |
Remote XIR (CPU) reset | <CR> <CR> <~> <Control-Shift-x> |
Key: <CR> = ASCII 0d hexadecimal, <~> = ASCII 7e hexadecimal, <Control-Shift-p> = 10 hexadecimal, <Control-Shift-r> = 12 hexadecimal, <Control-Shift-x> = 18 hexadecimal.
The remote console logic circuit continues to receive power even if you have commanded system power off.
The remote system reset command is useful for resetting the system under general conditions. The remote XIR reset command is used for software development and debugging.
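The remote key sequences and the typing-speed requirement can be expressed as byte strings sent one character at a time. The sketch below builds each sequence from the hexadecimal values in the key and paces the characters within the 0.5 to 5 second window. The names `remote_sequence` and `send_slowly` are hypothetical, and `write` stands for whatever callable delivers bytes to the console line connected to port A; no particular serial library is assumed.

```python
import time

CR = b"\x0d"      # <CR>
TILDE = b"\x7e"   # <~>

# Final control character for each remote command, from the key above.
REMOTE_COMMANDS = {
    "power": b"\x10",  # <Control-Shift-p>, remote power off/on
    "reset": b"\x12",  # <Control-Shift-r>, remote system reset
    "xir": b"\x18",    # <Control-Shift-x>, remote XIR (CPU) reset
}


def remote_sequence(command: str) -> bytes:
    """Return <CR> <CR> <~> followed by the control character for the command."""
    return CR + CR + TILDE + REMOTE_COMMANDS[command]


def send_slowly(write, sequence: bytes, delay: float = 1.0, sleep=time.sleep) -> None:
    """Send one byte at a time, pausing `delay` seconds between characters."""
    if not 0.5 <= delay <= 5.0:
        raise ValueError("delay between characters must be 0.5 to 5 seconds")
    for index, byte in enumerate(sequence):
        if index:
            sleep(delay)
        write(bytes([byte]))
```

The `sleep` parameter exists so the pacing can be replaced in a dry run; in normal use the default `time.sleep` applies the delay.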