APPENDIX C

Error Messages

This appendix describes error messages that you might see while operating or servicing a Netra CT server, their meanings, and the actions necessary for each. All error messages in this appendix are written to the /var/adm/messages file on your system.
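Because every message in this appendix is logged to /var/adm/messages, a simple filter can pull the relevant lines out of the log. The following is a minimal sketch, not a Sun-supplied tool. The function name netra_errors and its optional file argument are hypothetical, the pattern covers only the message prefixes listed in this appendix, and it assumes a grep that supports -E (where it does not, egrep can be substituted).

```shell
# Hypothetical helper (not part of the Netra CT software): print the log
# lines that match the message families described in this appendix.
# Usage: netra_errors [logfile]   (defaults to /var/adm/messages)
netra_errors() {
    logfile="${1:-/var/adm/messages}"
    # One alternation per message family: scsb driver messages, the
    # alarm card message, unserviced interrupts, I2C no-acknowledge
    # complaints, and bus busy complaints.
    grep -E 'scsb#?0:|SCSB:|Alarm and Slot presence|Interrupt Level 4|NO ADD[A-Z]* ACK|Bus busy' "$logfile"
}
```

Running netra_errors with no argument reads the live log. Passing a file name lets you examine a saved copy of the log instead.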

 


TABLE C-1 Netra CT Server Error Messages

Error Message

scsb Error Messages (Section C.2)

  Alarm and Slot presence state bits do not match!

  SCSB: Should NOT remove SCB(#) while cPCI Slot # is in RESET with a possible bad board.
  scsb#0: Slot # Now out of Reset!

  scsb#0: ALERT! Lost HEALTHY# on Slot #, Occupant Offline
  scsb#0: ALERT! Lost HEALTHY# on Slot #, Occupant Online!!!

  scsb#0: Bad (non friendly ?) Board in Slot # ? Taking it Offline.

  scsb#0: Could not Update %s LEDs.
  scsb#0: Could not Blink %s LEDs.

  scsb#0: hsc_board_healthy: No Slot Info.

  scsb#0: hsc_enum_intr: No Last Board Insertion Info.

  scsb#0: hsc_restore: Cannot reset disconnected slot #

  scsb0: I2C TRANSFER Failed
  scsb0: Error Reading Healthy# Registers
  scsb#0: scsb_reset_slot: error reading Reset regs

  scsb#0: no HEALTHY# signal on slot#

  scsb#0: Reset Not Asserted on Healthy# Failed slot#

  scsb#0: slot # Occupant configured, Regained HEALTHY#!
  scsb#0: slot # Occupant Unconfigured, Regained HEALTHY#!

  scsb#0: Successfully Downgraded to Basic Hotswap Mode

  scsb#0: Successfully Upgraded to Full Hotswap Mode

Anticipated Hardware Failure (Section C.3)

  Interrupt Level 4--Not serviced (transient interrupts, Section C.3.1)

  Interrupt Level 4--Not serviced (soft hang, Section C.3.2)

I2C Complaints (Section C.4)

  NO ADDRESS ACK 80

Bus Busy Complaints (Section C.5)

  Bus busy, cleared after initializing



C.1 Generic Error Messages

Message
This program must be run on the same chassis.
Action

You must restart mcnet. Change directories to the mcn directory.

Then, enter this command:


# ./mcnet start


C.2 scsb Error Messages

Message
Alarm and Slot presence state bits do not match!
Cause

A problem was encountered when a hot-swap alarm card was installed in the server.

Action

Run prtdiag to determine the state of the I/O slot. If the alarm card is not listed when you run prtdiag, remove and reinsert the alarm card into the slot.

Message
SCSB: Should NOT remove SCB(#) while cPCI Slot # is in RESET with a possible bad board.
scsb#0: Slot # Now out of Reset!
Cause

The system controller board was removed from the server while the amber Okay to Remove LED was ON for an I/O slot.

Action

Enable basic hot-swap on all the I/O slots in the server using the instructions in Chapter 5. Once basic hot-swap is enabled on all I/O slots, it is safe to remove the system controller board from the server.

Message
scsb#0: ALERT! Lost HEALTHY# on Slot #, Occupant Offline
scsb#0: ALERT! Lost HEALTHY# on Slot #, Occupant Online!!!
Cause

The CompactPCI board lost its HEALTHY report.

Action

The CompactPCI board failed or is damaged. Configure the board, then unconfigure it using the instructions in Chapter 6. If the error messages repeat, then the board has failed. Replace the I/O board, using the instructions in Chapter 6.

If the system has taken the board offline because the board stopped sending a HEALTHY signal, the following message is displayed:

scsb#0: Slot # successfully taken offline

Message
scsb#0: Bad (non friendly ?) Board in Slot # ? Taking it Offline.
Cause

The system identified an I/O board that is sending repeated interrupts, and the system has taken the board offline.

Action

Replace the I/O board using the instructions in Chapter 6.

Message
scsb#0: Could not Update %s LEDs.
scsb#0: Could not Blink %s LEDs.
Cause

An Inter-Integrated Circuit (I2C) error resulted in an LED change failure. The LEDs on the system status panel might give incorrect information as a result.

Action

Use the prtdiag tool to print the correct LED states. Remove and reinstall the system controller board to correct the problem. See Chapter 8 for instructions.

Message
scsb#0: hsc_board_healthy: No Slot Info.
Cause

A disabled slot that the system no longer monitors (because of errors or a user request) is undergoing HEALTHY state changes and sending full hot-swap style interrupts to the CPU.

Action

Remove the I/O board from the slot. If the error messages repeat, set the I/O slot to basic hot-swap, using the instructions in Chapter 5.

Message
scsb#0: hsc_enum_intr:  No Last Board Insertion Info.
Cause

A CompactPCI board that is probably damaged is installed in an I/O slot in the system. The board has an error causing it to continually interrupt the CPU with hot-swap service events when there is no change to the board's state. The board continually reports itself "inserted" after it has been acknowledged. Because no board is "claiming" the event, no slot number can be given. See also scsb#0: Slot # successfully taken offline.

Action

Remove the I/O board from the server, using the instructions in Chapter 6. If the error message repeats, the system controller board might have failed. Try replacing the system controller board using the instructions in Chapter 8.

Message
scsb#0: hsc_restore:  Cannot reset disconnected slot #
Cause

The system controller board was installed in the server while the amber Okay to Remove LED was ON for an I/O slot.

Action

Enable basic hot-swap on all the I/O slots in the server using the instructions in Chapter 5. Once basic hot-swap is enabled on all I/O slots, remove the system controller board from the server.

Message
scsb0: I2C TRANSFER Failed
scsb0: Error Reading Healthy# Registers
scsb#0: scsb_reset_slot: error reading Reset regs
Cause

An error occurred when the scsb driver received the retry command from the system controller board.

Action

Retry. If the error persists, the system controller board is damaged and should be replaced. See Chapter 8 for instructions.

Message
scsb#0: no HEALTHY# signal on slot#
Cause

You tried to connect or configure a hot-swappable I/O board that was not reporting itself HEALTHY. The board has failed or was not inserted properly.

Action

Remove the I/O board from the server and reinsert it, making sure the board is completely and properly inserted into the server. If the error message repeats, then the board has failed. Replace the I/O board, using the instructions in Chapter 6.

Message
scsb#0: Reset Not Asserted on Healthy# Failed slot#
Cause

You rebooted the system with a failed board. The OpenBoot PROM has taken it out of reset and probed it.

Action

The board is probably damaged and should not be used. Unconfigure the board manually, and remove the board from the system using the instructions in Chapter 6.

Message
scsb#0: slot # Occupant configured, Regained HEALTHY#!
scsb#0: slot # Occupant Unconfigured, Regained HEALTHY#!
Cause

A CompactPCI board is sending conflicting HEALTHY and UNHEALTHY signals.

Action

The board has failed. Replace the I/O board using the instructions in Chapter 6.

Message
scsb#0: Successfully Downgraded to Basic Hotswap Mode
Cause

Basic hot-swap was enabled on the system.

Action

No action is necessary.

Message
scsb#0: Successfully Upgraded to Full Hotswap Mode
Cause

Full hot-swap was enabled on the system.

Action

No action is necessary.


C.3 Anticipated Hardware Failure

C.3.1 Transient Interrupts

Message
Interrupt Level 4--Not serviced
Cause

When this message occurs intermittently, it is the result of a transient interrupt from the underlying hardware.

Transient interrupts occur when, for example, a fan is starting to fail: it fails long enough to generate an interrupt but then resumes operation. By the time the fan driver is queried, it denies having generated the interrupt because the fan is now functioning normally.

Action

The condition is a result of the architecture of interrupt generation and response. As long as the generating hardware has resumed normal operation, no further action is required.

C.3.2 Soft Hang

Message
Interrupt Level 4--Not serviced
Cause

When this message occurs continuously, it signals a soft hang of the system. The presenting symptom is that the system is noticeably sluggish because it is busy processing interrupts.

A soft hang occurs when a component such as a power supply drives an interrupt line high and keeps it high. The kernel notices the interrupt and polls the devices. Each device answers negatively, including the culprit power supply. Meanwhile, the CPU does only minimal work before returning to the polling process. This error condition is a serious problem because the failing component remains unidentified.

Action

Completely power the server off, then on again using the instructions in Chapter 2. When the system boots, it always starts with interrupts low (masked) and attaches the drivers one by one. Use OpenBoot PROM commands to probe the components and determine which one has failed.
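The two cases in this section use the same message and differ only in how often it appears: intermittently for a transient interrupt, continuously for a soft hang. One rough way to gauge this is to count the occurrences in the log. The sketch below is an illustration only. The function name level4_count is hypothetical, the search string is copied from this appendix (the exact text in your log might differ), and this appendix defines no numeric threshold for "continuously", so interpreting the count is left to the operator.

```shell
# Hypothetical helper: count log lines that contain the unserviced
# level 4 interrupt message.
# Usage: level4_count [logfile]   (defaults to /var/adm/messages)
level4_count() {
    logfile="${1:-/var/adm/messages}"
    # grep -c prints the number of matching lines. It exits nonzero when
    # the count is 0, so "|| true" keeps the function's status clean.
    grep -c -i -e 'Interrupt Level 4--Not serviced' "$logfile" || true
}
```

A count that stays small and stable between checks is consistent with transient interrupts. A count that climbs rapidly while the system is sluggish is consistent with the soft hang described above.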


C.4 I2C Complaints

Message
NO ADDRESS ACK 80
Cause

This message indicates a problem with the Inter-Integrated Circuit (I2C) bus. The pcf8584 driver is usually the one that reports it, followed by the address it was trying to access. For example, NO ADDRESS ACK 80 indicates a problem with address 80, which is the fixed address of the system controller board.

Most of the Sun drivers print a secondary error message, but the principal error message comes from pcf8584. The interface to this driver is through an ioctl, so access is performed in software. This message indicates that a problem exists, but not how severe it is. Sometimes such a message is normal.

For example, when a power supply is removed, the Present line goes low and the SCB sets the interrupt bit high. The kernel pcf8584 driver then goes down the device line, querying for interrupts in the order in which the devices boot, with each one answering. The message 8584 NO ADDR ACK 0x9E is logged when the device is removed: the driver tries to query hardware that is no longer present, so the error message is spurious. This condition happens with fans and power supplies.

Action

If the error message occurs during a hot-swap operation, it is erroneous and can be ignored. If the error message occurs during normal operation, it might indicate a problem with the I2C device.
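The address at the end of the message identifies the device involved (80 for the system controller board, as noted above), so tallying the messages by address can show whether a single device accounts for most of them. This is a sketch under stated assumptions: the function name i2c_nack_by_address is hypothetical, and it assumes that the address is the last whitespace-separated field on the line, as in the two examples quoted in this section. It also assumes a grep that supports -E (where it does not, egrep can be substituted).

```shell
# Hypothetical helper: tally I2C no-acknowledge messages by the address
# reported at the end of each line, most frequent address first.
# Usage: i2c_nack_by_address [logfile]   (defaults to /var/adm/messages)
i2c_nack_by_address() {
    logfile="${1:-/var/adm/messages}"
    # Match the spellings seen in this appendix (ADDRESS, ADDR), take the
    # last field of each matching line, then count identical addresses.
    grep -E 'NO ADD[A-Z]* ACK' "$logfile" |
        awk '{ print $NF }' |
        sort | uniq -c | sort -rn
}
```

Each output line shows a count followed by an address. Entries that coincide with a hot-swap operation can be disregarded, as described in the Action above.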


C.5 Bus Busy Complaints

Message
Bus busy, cleared after initializing
Cause

This is a transient I2C error message.

Action

Usually no action is necessary because the system recovers from most transient I2C errors. If the system becomes unresponsive, completely power the server off, then power it back on. Watch the power-on self-test (POST) messages to determine the cause of the error.