APPENDIX A

Software Error Messages

This appendix describes software error messages that are specific to the Netra CT server platform. The messages are produced by software and firmware running on the alarm card and on the Netra CT server, including the Solaris OS, the OpenBoot PROM firmware, the MOH application, and the PMS application.

For Netra CT server platform-specific hardware error messages, refer to the Netra CT Server Service Manual.

For additional information on software error messages not specific to the Netra CT server, refer to:

This appendix includes two sections: Overview and Messages.


Overview

This appendix lists error messages in alphabetical order, with the format:

Message, Cause, Action

Alarm Card Messages. Error messages originate from software and firmware on the alarm card itself, such as the BMC and the CLI. In addition, messages from other software, such as the PMS application and the OpenBoot PROM firmware, might be displayed on the alarm card console.

Alarm card error messages are displayed on the alarm card console. They are not saved to a log.

Solaris OS Messages. Messages are displayed on the Netra CPU board console. They are saved to a log in /var/adm/messages.
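Because these messages are saved to /var/adm/messages, they can be searched after the fact. The following is a minimal sketch, assuming that the SUNW_-prefixed messages listed later in this appendix are written to that log verbatim. The function name platform_messages is hypothetical and is not part of any Netra CT software.

```shell
#!/bin/sh
# Sketch only: print log lines that carry a SUNW_<name>: prefix, such as
# SUNW_envmond: or SUNW_picl_watchdog:.
# platform_messages is an illustrative name, not a Netra CT command.
platform_messages() {
    # Default to the Solaris message log named in this appendix.
    logfile="${1:-/var/adm/messages}"
    # A basic regular expression keeps this portable across grep versions.
    grep 'SUNW_[a-z_]*:' "$logfile"
}
```

For example, platform_messages with no argument searches /var/adm/messages, and platform_messages /var/adm/messages.0 searches a rotated copy if one exists on your system.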

OpenBoot PROM Firmware Messages. Messages from OpenBoot PROM are displayed through a Netra CPU board console. They can be displayed on the CPU board console itself or on the alarm card console if you are logged in remotely using the CLI console command.

OpenBoot PROM error and warning messages are not saved to a log on either the alarm card or on a CPU board.

MOH Application Messages. These messages are displayed on a CPU board console, on the alarm card console, or on both. They are not saved to a log.

PMS Application Messages. PMS is a high-level application. Thus, faults in various places in the software and hardware underlying this application can result in PMS error messages. For example, a fault could occur on the midplane or on a disk. This situation might make it difficult to isolate where a specific fault is occurring. A solution to many PMS error messages is to reset the alarm card.

PMS error messages are printed to the console you are using to execute the pmsd CLI command; they are not saved to a log on either the alarm card or on a CPU board.


Messages

!!! ALERT !!! Crossing Critical temperature threshold
The current threshold setting is: number degreeC
The current temperature is : number degreeC

Cause: A temperature problem, either in the chassis environment (for example, a fan failure) or as configured on the CPU board (for example, a user misconfiguration of a temperature setting), causes this message.

Action: (1) Check the fans to make sure they are working properly; replace if necessary. (2) Check the room environment for proper cooling and adjust if necessary.

!!! ALERT!!! Crossing Shutdown temperature threshold
The current threshold setting is: number degreeC
The current temperature is : number degreeC

Cause: A temperature problem, either in the chassis environment (for example, a fan failure) or as configured on the CPU board (for example, a user misconfiguration of a temperature setting), causes this message.

Action: (1) Check the fans to make sure they are working properly; replace if necessary. (2) Check the room environment for proper cooling and adjust if necessary.

An attempt to start the "protocol" communication server failed, will retry.
The problem could be because of a misconfigured primary network interface, or possibly another instance of the agent is running

Cause: A network configuration problem, another MOH agent instance, or another application or process using the MOH port has resulted in the MOH agent's inability to start the RMI server.

Action: If this message occurs on the alarm card, check the network interfaces (for example, make sure the ifeth0 interface has a valid IP address). If this message occurs on a CPU board: (1) Check the network interfaces. (2) Check whether an MOH agent is already running, with the command pgrep -fl java. (3) Try stopping and restarting the agent. (4) Check which ports are in use, with the command netstat -a.
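The port check on a CPU board can be scripted. The following is a minimal sketch, assuming netstat output in which addresses end in .port (Solaris style) or :port (Linux style). The function name port_in_use is hypothetical, and the port number is a parameter because this appendix does not state which port the MOH agent uses.

```shell
#!/bin/sh
# Sketch only: read netstat output on stdin and succeed if any address
# field ends in the given port number.
# port_in_use is an illustrative name; the caller supplies the port.
port_in_use() {
    awk -v port="$1" '
        {
            for (i = 1; i <= NF; i++) {
                # Split "host.port" or "host:port"; the last piece is the port.
                n = split($i, parts, /[.:]/)
                if (n > 1 && parts[n] == port) found = 1
            }
        }
        END { exit found ? 0 : 1 }'
}
```

A possible use is netstat -an | port_in_use 1099, where 1099 is only a sample value and -n makes netstat print numeric ports instead of service names. The check does not distinguish local from remote addresses, so treat a match as a prompt to read the netstat output, not as proof of a conflict.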

An attempt to start the SnmpView failed, will retry. Check the network configuration

Cause: The MOH agent could not start the SNMP view, because of a network configuration problem or because another application or process is using the SNMP port.

Action: This message could occur on the alarm card or on a CPU board. (1) Check the network configuration. (2) Check to see which ports are in use, with the command netstat -a.

CLI: unknown command: use help for valid commands

Cause: You used an alarm card CLI command that is not a valid command.

Action: For a list of valid CLI commands, either use the CLI help command or refer to TABLE 3-1.

console:  All console sessions busy to slot number

Cause: From the alarm card, you tried to open a console session to a CPU board, but the maximum of four console sessions for that CPU board are already open.

Action: Either retry connecting later or free up a session to that CPU board.

console: failed to connect to console in slot number

Cause: This message appears on the alarm card console after you try to open a console connection to a CPU board. It could indicate an IPMI bus problem or a CPU board configuration problem.

Action: Try opening a console connection to a different slot. If this fails, reset the alarm card and try reconnecting to the same slot.

Error Disabling Temperature Sensor

Cause: This message could occur at power on of the CPU board, after POST has completed, but before the OpenBoot PROM prompt is displayed. The CPU board could be in an unknown state or could have a hardware problem.

Action: Hot-swap the CPU board. If the problem still exists at power on, the board might need to be returned to SunService.

Error Disabling the Watchdog

Cause: The following OpenBoot PROM commands could generate this message: reset-all, delete-dropin, or add-dropin. The CPU board could be in an unknown state or could have a hardware problem.

Action: Hot-swap the CPU board. If the problem still exists, the board might need to be returned to SunService.

Error Enabling Temperature Sensor

Cause: This message could occur at power on of the CPU board, after POST has completed, but before the OpenBoot PROM prompt is displayed. The CPU board could be in an unknown state or could have a hardware problem.

Action: Hot-swap the CPU board. If the problem still exists at power on, the board might need to be returned to SunService.

Invalid cpu_node number: number

Cause: You entered an invalid node board number for a console connection from the alarm card.

Action: Enter a valid node number, 1 through 7 on the Netra CT 810 or 2 through 5 on the Netra CT 410 server.
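The ranges above can be checked before a console connection is attempted. The following is a minimal sketch, assuming the server model is passed as 810 or 410. The function name valid_cpu_node is hypothetical and is not an alarm card CLI command.

```shell
#!/bin/sh
# Sketch only: succeed if NODE is within the range this appendix gives
# for MODEL (810: 1 through 7, 410: 2 through 5).
# valid_cpu_node is an illustrative name.
valid_cpu_node() {
    model="$1"
    node="$2"
    # Reject empty or non-numeric node values before comparing.
    case "$node" in
        ''|*[!0-9]*) return 1 ;;
    esac
    case "$model" in
        810) [ "$node" -ge 1 ] && [ "$node" -le 7 ] ;;
        410) [ "$node" -ge 2 ] && [ "$node" -le 5 ] ;;
        *)   return 1 ;;
    esac
}
```

For example, valid_cpu_node 410 1 fails because node 1 is outside the 2 through 5 range for the Netra CT 410 server.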

Invalid IP mode

Cause: You specified an invalid syntax for the CLI command setipmode.

Action: The setipmode usage is: setipmode -b port_num rarp|config|none. Refer to Configuring the Alarm Card Ethernet Ports for more information.
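A wrapper script can reject an unsupported mode before the command reaches the alarm card. The following is a minimal sketch that only builds the command line from the usage shown above. The function name setipmode_cmd is hypothetical, and the port number is not validated because this appendix does not list the valid values.

```shell
#!/bin/sh
# Sketch only: print a setipmode command line if MODE is one of the
# three modes in the documented usage; otherwise report an error.
# setipmode_cmd is an illustrative name, not part of the alarm card CLI.
setipmode_cmd() {
    port="$1"
    mode="$2"
    case "$mode" in
        rarp|config|none) ;;   # accepted modes, per the usage line
        *) echo "Invalid IP mode" >&2; return 1 ;;
    esac
    echo "setipmode -b $port $mode"
}
```

For example, setipmode_cmd 1 rarp prints the command for a sample port value of 1, while setipmode_cmd 1 dhcp fails.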

Invalid slot number

Cause: You specified an invalid slot number for a CLI command that accepts a slot number option.

Action: Refer to TABLE 3-1 for the correct syntax for that particular command.

IP address for the system management bus interface not found -For distributed agent functionality
Please check the following interface configuration : interface

Cause: The MOH application needs an IP address for the system management network to be able to communicate between the alarm card and the CPU boards. This message displays if the alarm card or a CPU board does not have an IP address for the system management network interface, or if either of these interfaces failed to initialize.

Action: Configure the specified interface and restart the MOH application.

Lower Critical - going high
The current threshold setting is: number degreeC
The current temperature is : number degreeC

Cause: A temperature problem, either in the chassis environment (for example, a fan failure) or as configured on the CPU board (for example, a user misconfiguration of a temperature setting), causes this message.

Action: (1) Check the fans to make sure they are working properly; replace if necessary. (2) Check the room environment for proper cooling and adjust if necessary.

Lower Critical - going low
The current threshold setting is: number degreeC
The current temperature is : number degreeC

Cause: A temperature problem, either in the chassis environment (for example, a fan failure) or as configured on the CPU board (for example, a user misconfiguration of a temperature setting), causes this message.

Action: (1) Check the fans to make sure they are working properly; replace if necessary. (2) Check the room environment for proper cooling and adjust if necessary.

Lower Non-critical - going high
The current threshold setting is: number degreeC
The current temperature is : number degreeC

Cause: A temperature problem, either in the chassis environment (for example, a fan failure) or as configured on the CPU board (for example, a user misconfiguration of a temperature setting), causes this message.

Action: (1) Check the fans to make sure they are working properly; replace if necessary. (2) Check the room environment for proper cooling and adjust if necessary.

Lower Non-critical - going low
The current threshold setting is: number degreeC
The current temperature is : number degreeC

Cause: A temperature problem, either in the chassis environment (for example, a fan failure) or as configured on the CPU board (for example, a user misconfiguration of a temperature setting), causes this message.

Action: (1) Check the fans to make sure they are working properly; replace if necessary. (2) Check the room environment for proper cooling and adjust if necessary.

Lower Non-recoverable - going high
The current threshold setting is: number degreeC
The current temperature is : number degreeC

Cause: A temperature problem, either in the chassis environment (for example, a fan failure) or as configured on the CPU board (for example, a user misconfiguration of a temperature setting), causes this message.

Action: (1) Check the fans to make sure they are working properly; replace if necessary. (2) Check the room environment for proper cooling and adjust if necessary.

Lower Non-recoverable - going low
The current threshold setting is: number degreeC
The current temperature is : number degreeC

Cause: A temperature problem, either in the chassis environment (for example, a fan failure) or as configured on the CPU board (for example, a user misconfiguration of a temperature setting), causes this message.

Action: (1) Check the fans to make sure they are working properly; replace if necessary. (2) Check the room environment for proper cooling and adjust if necessary.

NFS Portmap: RPC: Rpcbind failure - RPC: Timed out

Cause: Using the CLI flashupdate command with the NFS option might cause NFS timeouts.

Action: (1) Make sure the NFS path is a shared NFS mount. (2) If the shared NFS server is on a different network, make sure that the gateway is properly configured. (3) Check the alarm card network configuration.

Permission denied

Cause: You used an alarm card CLI command for which you do not have the correct user permissions.

Action: For information on CLI command user permissions, either use the CLI help command or refer to CLI Commands.

showfru: failed to get the FRU property

Cause: The CLI showfru command might generate this message. It indicates that either (1) a FRU ID (midplane, CPU board, or third-party node board) is not programmed, or (2) an IPMI bus problem occurred.

Action: (1) Make sure your hardware has the FRU ID programmed; for example, check whether you can read a different FRU property. (2) Reset the alarm card. (3) Power cycle the system. (4) If the error still occurs, contact SunService.

Slot not configured to be managed by PMS Daemon

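Cause: Many pmsd CLI commands can generate this message when the specified slot has not been configured for management by the PMS daemon, for example, when no IP address has been set for the slot.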

Action: Use the pmsd slotaddressset command to set the IP address for the slot.

SUNW_envmond: current temperature (temp) exceeds upper warning temperature (temp)

Cause: A temperature problem, either in the chassis environment (for example, a fan failure) or as configured on the CPU board (for example, a user misconfiguration of a temperature setting), causes this message.

Action: (1) Check the fans to make sure they are working properly; replace if necessary. (2) Check the room environment for proper cooling and adjust if necessary. (3) Check the temperature threshold settings (prtpicl -v -c temperature-sensor) to make sure they are within range of the chassis environment. Refer to the Netra CP2500 Board Programming Guide for more information.

SUNW_envmond: current temperature (temp) exceeds upper critical temperature (temp)

Cause: A temperature problem, either in the chassis environment (for example, a fan failure) or as configured on the CPU board (for example, a user misconfiguration of a temperature setting), causes this message.

Action: (1) Check the fans to make sure they are working properly; replace if necessary. (2) Check the room environment for proper cooling and adjust if necessary. (3) Check the temperature threshold settings (prtpicl -v -c temperature-sensor) to make sure they are within range of the chassis environment. Refer to the Netra CP2500 Board Programming Guide for more information.

SUNW_envmond: current temperature (temp) is below lower warning temperature (temp)

Cause: A temperature problem, either in the chassis environment (for example, a fan failure) or as configured on the CPU board (for example, a user misconfiguration of a temperature setting), causes this message.

Action: (1) Check the fans to make sure they are working properly; replace if necessary. (2) Check the room environment for proper cooling and adjust if necessary. (3) Check the temperature threshold settings (prtpicl -v -c temperature-sensor) to make sure they are within range of the chassis environment. Refer to the Netra CP2500 Board Programming Guide for more information.

SUNW_envmond: current temperature (temp) is below lower critical temperature (temp)

Cause: A temperature problem, either in the chassis environment (for example, a fan failure) or as configured on the CPU board (for example, a user misconfiguration of a temperature setting), causes this message.

Action: (1) Check the fans to make sure they are working properly; replace if necessary. (2) Check the room environment for proper cooling and adjust if necessary. (3) Check the temperature threshold settings (prtpicl -v -c temperature-sensor) to make sure they are within range of the chassis environment. Refer to the Netra CP2500 Board Programming Guide for more information.
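The four SUNW_envmond messages above differ only in which threshold the current temperature has crossed. The following is a minimal sketch of that comparison, assuming whole-degree values that you have read yourself from the prtpicl output. The function name classify_temp, the argument order, and the rule that critical thresholds are tested before warning thresholds are all assumptions for illustration; this is not how the envmond software is known to be implemented.

```shell
#!/bin/sh
# Sketch only: report which threshold, if any, a temperature has crossed.
# Usage: classify_temp CURRENT LOW_CRIT LOW_WARN HIGH_WARN HIGH_CRIT
# All values are integers in degrees C; the name and order are illustrative.
classify_temp() {
    cur="$1"; low_crit="$2"; low_warn="$3"; high_warn="$4"; high_crit="$5"
    # Test the more severe threshold first on each side.
    if   [ "$cur" -gt "$high_crit" ]; then echo "exceeds upper critical"
    elif [ "$cur" -gt "$high_warn" ]; then echo "exceeds upper warning"
    elif [ "$cur" -lt "$low_crit" ];  then echo "below lower critical"
    elif [ "$cur" -lt "$low_warn" ];  then echo "below lower warning"
    else echo "within range"
    fi
}
```

With sample thresholds of -10, 0, 60, and 80, a reading of 70 is reported as exceeding the upper warning threshold. If the thresholds themselves look wrong for the chassis environment, that points to a configuration problem instead of a cooling problem.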

SUNW_picl_watchdog: Error in opening SMC drv

Cause: The watchdog timer failed to access the Netra CT system management controller (SMC) driver.

Action: (1) Check whether your watchdog timer application is accessing the watchdog correctly. Refer to the Netra CP2500 Board Programming Guide or to the Netra CT Server Software Developer's Guide for more information. (2) Reboot the CPU board.

SUNW_picl_watchdog: Error in patting the watchdog

Cause: The watchdog timer failed to access the Netra CT system management controller (SMC) driver.

Action: (1) Check whether your watchdog timer application is accessing the watchdog correctly. Refer to the Netra CP2500 Board Programming Guide and/or to the Netra CT Server Software Developer's Guide for more information. (2) Reboot the CPU board.

SUNW_picl_watchdog: Error in writing to SMC

Cause: The watchdog timer failed to access the Netra CT system management controller (SMC) driver.

Action: (1) Check whether your watchdog timer application is accessing the watchdog correctly. Refer to the Netra CP2500 Board Programming Guide and/or to the Netra CT Server Software Developer's Guide for more information. (2) Reboot the CPU board.

Unable to connect to the ctmgx agent

Cause: This message occurs if the ctmgx stop command is issued on a CPU board, and the MOH agent can't be contacted.

Action: Check to see whether the MOH application is running on the CPU board using the command pgrep -fl java. If it is running, kill the process with the command kill process_id.
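The process ID needed for the kill command is the first field of each pgrep -fl java output line. The following is a minimal sketch that extracts those IDs. The function name java_pids is hypothetical, and the sketch assumes the MOH agent is the Java process of interest, which you should confirm from the command line that pgrep prints.

```shell
#!/bin/sh
# Sketch only: read `pgrep -fl java` output on stdin and print the
# process ID (first field) from each line.
# java_pids is an illustrative name, not a Netra CT command.
java_pids() {
    # Keep only lines whose first field is purely numeric.
    awk '$1 ~ /^[0-9]+$/ { print $1 }'
}
```

A possible use is pgrep -fl java | java_pids, followed by kill process_id for the ID that belongs to the MOH agent. Because the pattern matches every Java process, review the pgrep output before killing anything.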

Unable to start CPU board PMS Daemon

Cause: The PMS daemon can't be started on a CPU board.

Action: (1) Check whether a PMS daemon is already running on the CPU board; if one is, stop the daemon and try restarting it. (2) Reboot the CPU board and try restarting the daemon.

Upper Critical - going low
The current threshold setting is: number degreeC
The current temperature is : number degreeC

Cause: A temperature problem, either in the chassis environment (for example, a fan failure) or as configured on the CPU board (for example, a user misconfiguration of a temperature setting), causes this message.

Action: (1) Check the fans to make sure they are working properly; replace if necessary. (2) Check the room environment for proper cooling and adjust if necessary.

Upper Non-critical - going low
The current threshold setting is: number degreeC
The current temperature is : number degreeC

Cause: A temperature problem, either in the chassis environment (for example, a fan failure) or as configured on the CPU board (for example, a user misconfiguration of a temperature setting), causes this message.

Action: (1) Check the fans to make sure they are working properly; replace if necessary. (2) Check the room environment for proper cooling and adjust if necessary.

Upper Non-recoverable - going low
The current threshold setting is: number degreeC
The current temperature is : number degreeC

Cause: A temperature problem, either in the chassis environment (for example, a fan failure) or as configured on the CPU board (for example, a user misconfiguration of a temperature setting), causes this message.

Action: (1) Check the fans to make sure they are working properly; replace if necessary. (2) Check the room environment for proper cooling and adjust if necessary.

WARNING: Could not check healthy line status!

Cause: This message could occur while the operating system is being halted or while breaking from the operating system to go to the OpenBoot PROM prompt. The CPU board could be in an unknown state or could have a hardware problem.

Action: Hot-swap the CPU board. If the problem still exists, the board might need to be returned to SunService.

WARNING: Could not get current execution state!

Cause: This message could occur while the operating system is being halted or while breaking from the operating system to go to the OpenBoot PROM prompt. The CPU board could be in an unknown state or could have a hardware problem.

Action: Hot-swap the CPU board. If the problem still exists, the board might need to be returned to SunService.

WARNING: Could not set previous execution state!

Cause: This message could occur while the operating system is being halted or while breaking from the operating system to go to the OpenBoot PROM prompt. The CPU board could be in an unknown state or could have a hardware problem.

Action: Hot-swap the CPU board. If the problem still exists, the board might need to be returned to SunService.

WARNING: Could not set state break!

Cause: This message could occur while the operating system is being halted or while breaking from the operating system to go to the OpenBoot PROM prompt. The CPU board could be in an unknown state or could have a hardware problem.

Action: Hot-swap the CPU board. If the problem still exists, the board might need to be returned to SunService.

WARNING: Could not set state offline!

Cause: This message could occur while the operating system is being halted or while breaking from the operating system to go to the OpenBoot PROM prompt. The CPU board could be in an unknown state or could have a hardware problem.

Action: Hot-swap the CPU board. If the problem still exists, the board might need to be returned to SunService.

WARNING: Could not set state online!

Cause: This message could occur while the operating system is being halted or while breaking from the operating system to go to the OpenBoot PROM prompt. The CPU board could be in an unknown state or could have a hardware problem.

Action: Hot-swap the CPU board. If the problem still exists, the board might need to be returned to SunService.