APPENDIX C
Error Messages
This appendix describes error messages that you might see while operating or servicing a Netra CT server, their meanings, and the actions necessary for each. All error messages in this appendix are written to the /var/adm/messages file on your system.
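Because every message below ends up in /var/adm/messages, it can help to pull out only the relevant lines before working through the entries. The following is a minimal sketch, not a Sun-supplied tool. It assumes that matching on fragments of the message texts in this appendix is enough, and it does not parse the syslog prefix (timestamp, host, message ID). The function name `relevant_lines` and the `PATTERNS` list are hypothetical.

```python
from typing import Iterable, List

# Hypothetical list of message fragments taken from the entries in this
# appendix. Matching is case-insensitive because the driver prints both
# "SCSB:" and "scsb#0:".
PATTERNS = tuple(p.lower() for p in (
    "scsb",
    "8584",
    "NO ADDRESS ACK",
    "NO ADDR ACK",
    "Interrupt Level 4",
    "Bus busy",
    "Alarm and Slot presence state bits",
    "This program must be run on the same chassis",
))


def relevant_lines(lines: Iterable[str]) -> List[str]:
    """Return the lines that contain any of the message fragments above.

    Only substrings are matched; the syslog prefix (timestamp, host,
    message ID) is neither parsed nor required.
    """
    matches = []
    for line in lines:
        text = line.lower()
        if any(pattern in text for pattern in PATTERNS):
            matches.append(line.rstrip("\n"))
    return matches


# Example use on a live system (reads the real log file):
#     with open("/var/adm/messages", errors="replace") as log:
#         for line in relevant_lines(log):
#             print(line)
```

Substring matching keeps the sketch independent of the exact log-line layout, at the cost of occasionally matching unrelated lines that happen to contain one of the fragments.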
This program must be run on the same chassis.
You must restart mcnet. Change directories to the mcn directory.
Alarm and Slot presence state bits do not match!
A problem was encountered when a hot-swap alarm card was installed in the server.
Run prtdiag to determine the state of the I/O slot. If the alarm card is not listed when you run prtdiag, remove and reinsert the alarm card into the slot.
SCSB: Should NOT remove SCB(#) while cPCI Slot # is in RESET with a possible bad board.
scsb#0: Slot # Now out of Reset!
The system controller board was removed from the server while the amber Okay to Remove LED was ON for an I/O slot.
Enable basic hot-swap on all the I/O slots in the server using the instructions in Chapter 5. Once basic hot-swap is enabled on all I/O slots, it is safe to remove the system controller board from the server.
scsb#0: ALERT! Lost HEALTHY# on Slot #, Occupant Offline
scsb#0: ALERT! Lost HEALTHY# on Slot #, Occupant Online!!!
The CompactPCI board stopped reporting itself HEALTHY.
The CompactPCI board might have failed or be damaged. Configure the board, then unconfigure it, using the instructions in Chapter 6. If the error messages repeat, the board has failed. Replace the I/O board using the instructions in Chapter 6.
If the system has taken the board offline because the board stopped sending a HEALTHY signal, the following message is displayed:
scsb#0: Slot # successfully taken offline
scsb#0: Bad (non friendly ?) Board in Slot # ? Taking it Offline.
The system identified an I/O board that is sending repeated interrupts, and the system has taken the board offline.
Replace the I/O board using the instructions in Chapter 6.
scsb#0: Could not Update %s LEDs.
scsb#0: Could not Blink %s LEDs.
An Inter-Integrated Circuit (I2C) error caused an LED update to fail. As a result, the LEDs on the system status panel might show incorrect information.
Use the prtdiag tool to print the correct LED states. Remove and reinstall the system controller board to correct the problem. See Chapter 8 for instructions.
scsb#0: hsc_board_healthy: No Slot Info.
A disabled slot that the system no longer monitors (because of errors or a user request) is reporting HEALTHY state changes and sending full hot-swap style interrupts to the CPU.
Remove the I/O board from the slot. If the error messages repeat, set the I/O slot to basic hot-swap, using the instructions in Chapter 5.
scsb#0: hsc_enum_intr: No Last Board Insertion Info.
A CompactPCI board that is probably damaged is installed in an I/O slot in the system. The board has an error causing it to continually interrupt the CPU with hot-swap service events when there is no change to the board's state. The board continually reports itself "inserted" after it has been acknowledged. Because no board is "claiming" the event, no slot number can be given. See also scsb#0: Slot # successfully taken offline.
Remove the I/O board from the server, using the instructions in Chapter 6. If the error message repeats, the system controller board might have failed. Try replacing the system controller board using the instructions in Chapter 8.
scsb#0: hsc_restore: Cannot reset disconnected slot #
The system controller board was installed in the server while the amber Okay to Remove LED was ON for an I/O slot.
Enable basic hot-swap on all the I/O slots in the server using the instructions in Chapter 5. Once basic hot-swap is enabled on all I/O slots, remove the system controller board from the server.
scsb0: I2C TRANSFER Failed
scsb0: Error Reading Healthy# Registers
scsb#0: scsb_reset_slot: error reading Reset regs
An error occurred when the scsb driver received the retry command from the system controller board.
Retry the operation. If the error persists, the system controller board is damaged and should be replaced. See Chapter 8 for instructions.
scsb#0: no HEALTHY# signal on slot#
You tried to connect or configure a hot-swappable I/O board that was not reporting itself HEALTHY. The board has failed or was not inserted properly.
Remove the I/O board from the server and reinsert it, making sure the board is completely and properly inserted into the server. If the error message repeats, then the board has failed. Replace the I/O board, using the instructions in Chapter 6.
scsb#0: Reset Not Asserted on Healthy# Failed slot#
You rebooted the system with a failed board. The OpenBoot PROM has taken it out of reset and probed it.
The board is probably damaged and should not be used. Unconfigure the board manually, and remove the board from the system using the instructions in Chapter 6.
scsb#0: slot # Occupant configured, Regained HEALTHY#!
scsb#0: slot # Occupant Unconfigured, Regained HEALTHY#!
A CompactPCI board is sending conflicting HEALTHY and UNHEALTHY signals.
The board has failed. Replace the I/O board using the instructions in Chapter 6.
scsb#0: Successfully Downgraded to Basic Hotswap Mode
Basic hot-swap was enabled on the system.
scsb#0: Successfully Upgraded to Full Hotswap Mode
Full hot-swap was enabled on the system.
Interrupt Level 4--Not serviced
When this message occurs intermittently, it is always the result of the underlying hardware behaving unpredictably.
Transient interrupts occur when, for example, a fan is starting to fail, stops long enough to generate an interrupt, and then resumes operation. By the time the fan driver is queried, the fan is functioning normally again, so the driver reports that the fan did not generate the interrupt.
The condition is a result of the architecture of interrupt generation and response. As long as the generating hardware has resumed normal operation, no further action is required.
Interrupt Level 4--Not serviced
This message, occurring continuously, signals a soft hang of the system. The presenting symptom is that the system is noticeably sluggish because it is busy processing interrupts.
A soft hang occurs when a component, such as a power supply, asserts a level-triggered interrupt and holds it high. The kernel notices the interrupt and polls the devices. Each device answers that it did not raise the interrupt, including the failing power supply. Between polling passes, the CPU does only a minimal amount of other work before it returns to polling. This error condition is a serious problem because the failing component remains unidentified.
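The polling behavior described above can be modeled in a few lines. This is a conceptual sketch only, not Solaris kernel code: it assumes that each device has a handler that returns True if it claims the interrupt, and that the faulty device holds the line high without claiming it. The names `count_unserviced`, `line_high`, and `Handler` are hypothetical. The same model also reproduces the intermittent case from the previous entry, where the line is released after one pass.

```python
from typing import Callable, List

# Conceptual model only: this is NOT Solaris kernel code. A "handler"
# returns True if its device claims the interrupt, False otherwise.
Handler = Callable[[], bool]


def count_unserviced(line_high: Callable[[], bool],
                     handlers: List[Handler],
                     max_passes: int) -> int:
    """Poll a shared level-triggered interrupt line up to max_passes times.

    Each pass asks the handlers in turn whether their device raised the
    interrupt. A pass in which no handler claims it is counted as one
    "Interrupt Level 4--Not serviced" event. Polling stops as soon as the
    line is no longer held high.
    """
    unserviced = 0
    for _ in range(max_passes):
        if not line_high():
            break  # line released: the CPU can return to normal work
        if not any(handler() for handler in handlers):
            unserviced += 1  # every device answered "not mine"
    return unserviced


# Stuck line: a failing device holds the line high but never claims it,
# so every pass goes unserviced (the soft-hang case).
stuck = count_unserviced(lambda: True, [lambda: False] * 3, max_passes=1000)

# Transient: the line is high for one pass and then released, so the
# event is counted once and polling stops (the intermittent case).
states = iter([True, False])
transient = count_unserviced(lambda: next(states), [lambda: False] * 3,
                             max_passes=1000)
```

In the stuck case, the count grows with every pass, which matches the continuous stream of messages; in the transient case, it stops at one.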
Completely power the server off, then on again, using the instructions in Chapter 2. When the system boots, it always boots with interrupts low (masked) and attaches the drivers one by one. Use OpenBoot PROM commands to probe the components and determine which one has failed.
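The two Interrupt Level 4--Not serviced entries differ only in how often the message appears. The sketch below turns that distinction into a check over message timestamps. The thresholds are hypothetical: this appendix says only "intermittently" and "continuously", so the 10-second window and the count of 20 are illustrative values that need tuning. The function names `looks_continuous` and `interrupt_triage` are also hypothetical.

```python
from typing import Sequence

# Hypothetical thresholds: the manual only says "intermittently" versus
# "continuously"; these numbers are illustrative and need tuning.
DEFAULT_WINDOW_S = 10.0
DEFAULT_MIN_COUNT = 20


def looks_continuous(times: Sequence[float],
                     window_s: float = DEFAULT_WINDOW_S,
                     min_count: int = DEFAULT_MIN_COUNT) -> bool:
    """Return True if at least min_count messages fall inside any window
    of window_s seconds (a sliding-window count over sorted timestamps)."""
    ts = sorted(times)
    start = 0
    for end in range(len(ts)):
        # Advance the left edge until the window spans at most window_s.
        while ts[end] - ts[start] > window_s:
            start += 1
        if end - start + 1 >= min_count:
            return True
    return False


def interrupt_triage(times: Sequence[float]) -> str:
    """Map the occurrence pattern to the two entries in this appendix."""
    if looks_continuous(times):
        return ("continuous: possible soft hang; power-cycle and "
                "probe with OpenBoot PROM commands")
    return ("intermittent: no action needed if the hardware has "
            "resumed normal operation")
```

The timestamps could come from the log lines themselves; converting syslog dates to seconds is left out because it depends on the log format.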
NO ADDRESS ACK 80
This message indicates a problem on the Inter-Integrated Circuit (I2C) bus. It is usually reported by the pcf8584 driver, followed by the address that the driver was trying to access. For example, NO ADDRESS ACK 80 indicates a problem with address 80, which is the fixed address of the system controller board.
Most Sun drivers also print a secondary error message, but the principal error message comes from the pcf8584 driver. The interface to the pcf8584 is an ioctl, so access to it is made through software. The message indicates that a problem occurred, but not how severe it is. Sometimes such a message is normal.
For example, when a power supply is removed, its Present line goes low and the SCB sets the corresponding bit high, which generates an interrupt. Through the pcf8584 driver, the kernel queries each device on the bus for interrupts, in the order in which the devices booted, and each device answers. The message 8584 NO ADDR ACK 0x9E occurs when the device is removed: the driver tries to query hardware that is no longer present, so this spurious error message appears. This condition happens with fans and power supplies.
If the error message occurs during a hot-swap operation, it is erroneous and can be ignored. If the error message occurs during normal operation, it might indicate a problem with the I2C device.
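The rule above (ignore the message during a hot-swap operation, investigate it otherwise) can be sketched as a small helper. The sketch assumes that the number after "ACK" is hexadecimal, because this appendix writes it both as 80 and as 0x9E. It maps only address 80, the one address this appendix identifies (the system controller board). The names `parse_no_ack_address`, `no_ack_triage`, and `KNOWN_ADDRESSES` are hypothetical.

```python
import re
from typing import Optional

# Assumption: the address after "ACK" is hexadecimal. The manual writes it
# both without a prefix ("80") and with one ("0x9E").
_NO_ACK = re.compile(r"NO ADDR(?:ESS)? ACK\s+(?:0x)?([0-9A-Fa-f]+)")

# Only address 80 is identified in this appendix (system controller board).
KNOWN_ADDRESSES = {0x80: "system controller board"}


def parse_no_ack_address(message: str) -> Optional[int]:
    """Return the I2C address named in a no-ack message, or None."""
    match = _NO_ACK.search(message)
    return int(match.group(1), 16) if match else None


def no_ack_triage(message: str, during_hot_swap: bool) -> str:
    """Spurious during a hot-swap operation, a possible fault otherwise."""
    address = parse_no_ack_address(message)
    if address is None:
        return "not a no-ack message"
    device = KNOWN_ADDRESSES.get(address, "unidentified device")
    if during_hot_swap:
        return f"0x{address:02X} ({device}): spurious during hot-swap, ignore"
    return f"0x{address:02X} ({device}): possible problem with the I2C device"
```

Whether a hot-swap operation was in progress has to come from the operator or from nearby log entries; the sketch takes it as a parameter rather than guessing.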
Bus busy, cleared after initializing
This is a transient I2C error message.
Usually no action is necessary because the system recovers from most transient I2C errors. If the system becomes unresponsive, completely power the server off, then power it back on. Watch the power-on self-test (POST) messages to determine the cause of the error.
Copyright © 2007, Sun Microsystems, Inc. All Rights Reserved.