APPENDIX C

Error Messages

This appendix gives several error messages that you might see while operating or servicing your Netra CT server, their meanings, and the actions necessary for each. All error messages in this appendix are written to the /var/adm/messages file on your system.

 

TABLE C-1 Netra CT Server Error Messages

scsb Error Messages (Section C.2)

    Alarm and Slot presence state bits do not match!

    SCSB: Should NOT remove SCB(#) while cPCI Slot # is in RESET with a possible bad board.
    scsb#0: Slot # Now out of Reset!

    scsb#0: ALERT! Lost HEALTHY# on Slot #, Occupant Offline
    scsb#0: ALERT! Lost HEALTHY# on Slot #, Occupant Online!!!

    scsb#0: Bad (non friendly ?) Board in Slot # ? Taking it Offline.

    scsb#0: Could not Update %s LEDs.
    scsb#0: Could not Blink %s LEDs.

    scsb#0: hsc_board_healthy: No Slot Info.

    scsb#0: hsc_enum_intr: No Last Board Insertion Info.

    scsb#0: hsc_restore: Cannot reset disconnected slot #

    scsb0: I2C TRANSFER Failed
    scsb0: Error Reading Healthy# Registers
    scsb#0: scsb_reset_slot: error reading Reset regs

    scsb#0: no HEALTHY# signal on slot#

    scsb#0: Reset Not Asserted on Healthy# Failed slot#

    scsb#0: slot # Occupant configured, Regained HEALTHY#!
    scsb#0: slot # Occupant Unconfigured, Regained HEALTHY#!

    scsb#0: Successfully Downgraded to Basic Hotswap Mode

    scsb#0: Successfully Upgraded to Full Hotswap Mode

Anticipated Hardware Failure (Section C.3)

    Interrupt Level 4--Not serviced (intermittent; see Section C.3.1)

    Interrupt Level 4--Not serviced (continuous; see Section C.3.2)

I2C Complaints (Section C.4)

    NO ADDERSS ACK 80

Bus Busy Complaints (Section C.5)

    Bus busy, cleared after initiallizing
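Because every message in this appendix is written to /var/adm/messages, a short script can pull the relevant lines out of that file. The following is only a minimal sketch: the category labels, the function names, and the choice to match on substrings are assumptions made for illustration, and the substrings themselves are copied from the messages listed in this appendix.

```python
import re

# Substrings copied from the messages in this appendix. The category
# labels are illustrative only; the system does not use these names.
CATEGORIES = [
    ("scsb", re.compile(r"scsb#?0:|SCSB:|Alarm and Slot presence state bits")),
    ("interrupt", re.compile(r"Interrupt Level 4--Not serviced")),
    # Matches the spellings "ADDERSS", "ADDRESS", and "ADDR" seen in Section C.4.
    ("i2c", re.compile(r"NO ADD\w* ACK")),
    ("bus busy", re.compile(r"Bus busy, cleared after")),
]


def classify(line):
    """Return the category of a log line, or None if it matches nothing."""
    for name, pattern in CATEGORIES:
        if pattern.search(line):
            return name
    return None


def scan(lines):
    """Yield (category, line) for every line that matches a known message."""
    for line in lines:
        category = classify(line)
        if category is not None:
            yield category, line.rstrip("\n")
```

To use the sketch, open /var/adm/messages for reading and pass the file object to scan(); each result pairs a category with the original log line, which can then be looked up in the matching section of this appendix.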



C.1 Generic Error Messages

This program must be run on the same chassis.
Action

You must restart mcnet. Change directories to the mcn directory.

Then, enter this command:

# ./mcnet start


C.2 scsb Error Messages

Alarm and Slot presence state bits do not match!
Cause

A problem was encountered when a hot swap alarm card was installed into the server.

Action

Run prtdiag to determine the state of the I/O slot. If the alarm card is not listed when you run prtdiag, reinsert the alarm card into the slot.

 

SCSB: Should NOT remove SCB(#) while cPCI Slot # is in RESET with a possible bad board. scsb#0: Slot # Now out of Reset!
Cause

The system controller board was removed from the server while the amber Okay to Remove LED was ON for an I/O slot.

Action

Enable basic hot swap on all the I/O slots in the server using the instructions in Chapter 5. Once basic hot swap is enabled on all I/O slots, it is safe to remove the system controller board from the server.

 

scsb#0: ALERT! Lost HEALTHY# on Slot #, Occupant Offline
scsb#0: ALERT! Lost HEALTHY# on Slot #, Occupant Online!!!
Cause

The CompactPCI card has lost its HEALTHY report.

Action

The CompactPCI card may have failed or may be damaged. Configure the card, and then unconfigure it using the instructions in Chapter 6. If the error messages repeat, the card has failed. Replace the I/O card using the instructions in Chapter 6.

If the system has already taken the card offline because the card stopped sending a HEALTHY signal, the following message is displayed:

scsb#0: Slot # successfully taken offline

 

scsb#0: Bad (non friendly ?) Board in Slot # ? Taking it Offline.
Cause

The system has identified an I/O card that is sending repeated interrupts and has taken it offline.

Action

Replace the I/O card using the instructions in Chapter 6.

 

scsb#0: Could not Update %s LEDs.
scsb#0: Could not Blink %s LEDs.
Cause

An I2C error has resulted in an LED change failure. The LEDs on the system status panel may give incorrect information as a result.

Action

Use the prtdiag tool to print the correct LED states. Remove and reinstall the system controller board to correct the problem. Refer to Section 8.2, System Controller Board, for those instructions.

 

scsb#0: hsc_board_healthy: No Slot Info.
Cause

A disabled slot that the system no longer monitors (because of errors or a user request) is undergoing HEALTHY state changes and sending full hot swap style interrupts to the CPU.

Action

Remove the I/O card from that slot. If the error messages repeat, set the I/O slot to basic hot swap using the instructions in Section 5.2.3.2, Enabling Basic Hot Swap on I/O Slots.

 

scsb#0: hsc_enum_intr:  No Last Board Insertion Info.
Cause

A CompactPCI card that is probably damaged was installed into an I/O slot in the system. The card has an error that causes it to interrupt the CPU continually with hot swap service events even though the board's state has not changed. The card continually reports itself 'inserted' after it has already been acknowledged. Because no board is 'claiming' the event, no slot number can be given. Also see the entry in this section for the message scsb#0: Bad (non friendly ?) Board in Slot # ? Taking it Offline.

Action

Remove the I/O card from the server using the instructions in Chapter 6. If the error message repeats, the system controller board may have failed. Try replacing the system controller board using the instructions in Section 8.2, System Controller Board.

 

scsb#0: hsc_restore:  Cannot reset disconnected slot #
Cause

The system controller board was installed in the server while the amber Okay to Remove LED was ON for an I/O slot.

Action

Enable basic hot swap on all the I/O slots in the server using the instructions in Chapter 5. Once basic hot swap is enabled on all I/O slots, remove the system controller board from the server.

 

scsb0: I2C TRANSFER Failed
scsb0: Error Reading Healthy# Registers
scsb#0: scsb_reset_slot: error reading Reset regs
Cause

An error occurred when the scsb driver received the retry command from the system controller board.

Action

Retry. If the error persists, the system controller board is damaged and should be replaced. Refer to Section 8.2, System Controller Board, for those instructions.

 

scsb#0: no HEALTHY# signal on slot#
Cause

You tried to connect or configure a hot-swappable I/O card that was not reporting itself HEALTHY. The card has failed or was not inserted properly.

Action

Remove the I/O card from the server and reinsert it, making sure the card is completely and properly inserted into the server. If the error message repeats, then the card has failed. Replace the I/O card using the instructions in Chapter 6.

 

scsb#0: Reset Not Asserted on Healthy# Failed slot#
Cause

You rebooted the system with a failed board. While the board is not reporting itself HEALTHY, the OpenBoot PROM has taken it out of reset and probed it anyway.

Action

The board is probably damaged and should not be used. Unconfigure the board manually and remove the board from the system using the instructions in Chapter 6.

 

scsb#0: slot # Occupant configured, Regained HEALTHY#!
scsb#0: slot # Occupant Unconfigured, Regained HEALTHY#!
Cause

A CompactPCI card is sending conflicting HEALTHY and UNHEALTHY signals.

Action

The card has failed. Replace the I/O card using the instructions in Chapter 6.

 

scsb#0: Successfully Downgraded to Basic Hotswap Mode
Cause

Basic hot swap was enabled on the system.

Action

No action is necessary.

 

scsb#0: Successfully Upgraded to Full Hotswap Mode
Cause

Full hot swap was enabled on the system.

Action

No action is necessary.


C.3 Anticipated Hardware Failure

C.3.1 Transient Interrupts

Message
Interrupt Level 4--Not serviced
Cause

When this message occurs intermittently, it is always a result of the underlying hardware doing something unpredictable.

Transient interrupts occur when, for example, a fan is starting to fail: it fails long enough to generate an interrupt and then resumes operation. By the time the fan driver is queried, it denies having raised the interrupt because the fan is now functioning normally.

Action

The condition is a result of the architecture of interrupt generation and response. As long as the generating hardware has resumed normal operation, no further action is required.

C.3.2 Soft Hang

Message
Interrupt Level 4--Not serviced
Cause

This message, occurring continuously, signals a soft hang of the system. The presenting symptom is that the system is noticeably sluggish because it is busy processing interrupts.

A soft hang occurs when a component such as a power supply raises an interrupt and keeps it high. The kernel notices and polls the devices. Each device answers in the negative, including the culprit power supply. Meanwhile, the CPU does only minimal work before returning to the querying process. This is a serious problem because the failing component remains unidentified.

Action

Completely power the server off and then on again using the instructions in Chapter 2. The system always boots with interrupts low (masked) and attaches the drivers one by one. You can also use OpenBoot PROM commands to probe the components and determine which one has failed.
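Sections C.3.1 and C.3.2 describe the same message text and differ only in how often it appears. The following minimal sketch shows one way to tell the two cases apart by counting the message in the most recent log lines. The window size, the threshold, the minimum hit count, and the function name are assumed values chosen for illustration; they are not documented limits for this server.

```python
MESSAGE = "Interrupt Level 4--Not serviced"


def interrupt_pattern(lines, window=50, threshold=0.5, min_hits=5):
    """Classify recent log lines as 'none', 'intermittent', or 'continuous'.

    window, threshold, and min_hits are assumed tuning values.
    """
    recent = list(lines)[-window:]
    hits = sum(1 for line in recent if MESSAGE in line)
    if hits == 0:
        return "none"
    # Treat the message as continuous only when it dominates the window.
    if hits >= min_hits and hits / len(recent) >= threshold:
        return "continuous"
    return "intermittent"
```

A result of 'intermittent' corresponds to the transient case in Section C.3.1, and 'continuous' suggests the soft hang described in Section C.3.2. The thresholds would need tuning against real logs before the result could be relied on.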


C.4 I2C Complaints

Message
NO ADDERSS ACK 80
Cause

This message indicates a problem with I2C. Often it is the pcf8584 driver that complains, and the message is followed by the address the driver was trying to access. For example, NO ADDRESS ACK 80 indicates a problem with address 80, which is the fixed address of the system controller board.

Most of the Sun drivers print a secondary error message, but the principal error message comes from pcf8584. The interface to this driver is through an ioctl, so it is done through software. This message indicates a problem, but not its severity. Sometimes such a message is normal.

For example, when a power supply is removed, the Present line goes low and the SCB sets the bit high (interrupt). The kernel pcf8584 driver goes down the device line querying for interrupts in the order in which the devices boot, and each device answers. The message 8584 NO ADDR ACK 0x9E occurs when the device is removed. Because it happened after the driver tried to query the hardware, this spurious error message occurs. This happens with fans and power supplies.

Action

If the error message occurs during a hot swap operation, it is erroneous and should be ignored. If the error message occurs during normal operation, it may indicate a problem with the I2C device.
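The decision described above can be sketched in a few lines. This example extracts the address from the message and applies the hot swap rule. The function names and the returned strings are hypothetical, and reading the address as hexadecimal is an assumption based on the 0x9E example; only address 80 is identified in this section, as the system controller board.

```python
import re

# Matches the spellings that appear in this section:
# "NO ADDERSS ACK 80", "NO ADDRESS ACK 80", and "NO ADDR ACK 0x9E".
NO_ACK = re.compile(r"NO ADD\w* ACK\s+(?:0x)?([0-9A-Fa-f]+)")

# Address 80 is given above as the fixed address of the system controller
# board. Interpreting it as hexadecimal is an assumption.
SCB_ADDRESS = 0x80


def no_ack_address(line):
    """Return the I2C address named in the message, or None if absent."""
    match = NO_ACK.search(line)
    return int(match.group(1), 16) if match else None


def assess(line, during_hot_swap):
    """Return a suggested response to a NO ADDRESS ACK message, or None."""
    address = no_ack_address(line)
    if address is None:
        return None
    if during_hot_swap:
        return "ignore: expected during a hot swap operation"
    device = "system controller board" if address == SCB_ADDRESS else "I2C device"
    return f"check {device} at address 0x{address:02X}"
```

Whether a hot swap operation was in progress is not recorded in the message itself, so the caller has to supply that information.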


C.5 Bus Busy Complaints

Message
Bus busy, cleared after initiallizing
Cause

This is a transient I2C error message.

Action

Usually no action is necessary because the system should recover from most transient I2C errors. If the system becomes unresponsive, completely power the server off and then power it back on. Watch the Power-On Self-Test messages to determine the cause of the error.