APPENDIX B

ALOM CMT Event Messages


Event Message Overview

ALOM CMT sends event messages to several destinations:


Event Severity Levels

Each event has a severity level and corresponding number. These levels and numbers are:

ALOM CMT configuration parameters use the severity levels to determine which event messages are displayed. For information on how sc_clieventlevel and mgt_mailalert use the numerical values of the severity levels, see sc_clieventlevel and mgt_mailalert.
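As a rough illustration of how a numeric severity threshold could filter events, the sketch below assumes that lower numbers mean higher severity (Critical = 1, Major = 2, Minor = 3) and that a threshold of 0 suppresses all events. These numbers and the should_display helper are assumptions made for illustration; they are not listed in this appendix, so check the parameter descriptions for the authoritative values.

```python
# Assumed numeric values; this appendix does not list them here.
SEVERITY = {"critical": 1, "major": 2, "minor": 3}


def should_display(event_severity: str, threshold: int) -> bool:
    """Return True if an event of the given severity passes the threshold.

    A threshold of 0 is assumed to mean "show nothing". Otherwise an event
    is shown when its severity number is less than or equal to the threshold.
    """
    level = SEVERITY[event_severity.lower()]
    return threshold > 0 and level <= threshold


# Sample events taken from TABLE B-1.
events = [
    ("Critical", "SC System booted."),
    ("Major", "DHCP lease lost."),
    ("Minor", "DHCP network configuration initiated."),
]

# With a threshold of 2, critical and major events pass; minor events do not.
shown = [message for severity, message in events if should_display(severity, 2)]
print(shown)
```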


Event Messages

TABLE B-1 displays startup event messages from the system controller.


TABLE B-1 System Controller Startup Event Messages

Severity

Message

Description

Critical

SC System booted.

ALOM CMT sends this message every time the SC boots. This event is a normal event.

Critical

Preceding SC reset due to
watchdog.

ALOM CMT sends this message at SC boot if the SC detects that it has been reset because of the internal SC watchdog. This message can indicate a problem with the SC hardware if the problem persists.

Critical

Host flash image invalid,
flashupdate required.

ALOM CMT sends this message if the SC reboots during a flashupdate command. This event indicates that the host flash is in an invalid state and the flashupdate command must be used to reprogram the host flash. The system is not allowed to power on while this fault is present.

This fault event message appears in the output of the ALOM CMT showfaults command.

Minor

DHCP network configuration
initiated.

ALOM CMT sends this message if the ALOM CMT parameter if_dhcp is set to true. This message indicates that ALOM CMT has begun DHCP negotiation.

Major

DHCP configuration complete
(from server IP_address).

ALOM CMT sends this message once DHCP negotiation is complete. IP_address is the IP address of the DHCP server that provided the lease information.

Major

No SC IP gateway has been
assigned by the DHCP server

ALOM CMT sends this message if DHCP is used, but the DHCP server has not provided an IP gateway structure in the DHCP lease. Normally this is provided in Tag 3, DHCP_ROUTER_TAG, as detailed in RFC 1533.

Major

DHCP lease lost.

ALOM CMT sends this message if DHCP is used and the DHCP lease is lost. This event indicates that the SC is no longer on the network. ALOM CMT periodically retries to obtain a DHCP lease.

Major

Invalid SC IP gateway
address for the specified SC
IP address and mask.

ALOM CMT sends this message if a manual IP address and gateway are configured and the user has provided an invalid gateway address. The IP gateway must be reachable on the local subnet, based on the IP address and IP netmask provided.
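The gateway rule in the last row of TABLE B-1 can be checked before the values are configured. The following is a minimal sketch using Python's standard ipaddress module. The gateway_is_valid helper and the sample addresses are hypothetical, and the extra exclusions (the SC's own address, the network address, and the broadcast address) are assumptions; the validation that ALOM CMT itself performs is not documented here.

```python
import ipaddress


def gateway_is_valid(sc_ip: str, netmask: str, gateway: str) -> bool:
    """Check that the gateway lies on the subnet defined by sc_ip and netmask."""
    # strict=False accepts a host address instead of the network address.
    network = ipaddress.ip_network(f"{sc_ip}/{netmask}", strict=False)
    gw = ipaddress.ip_address(gateway)
    # Assumption: the gateway may not be the SC itself, nor the network
    # or broadcast address of the subnet.
    reserved = (
        ipaddress.ip_address(sc_ip),
        network.network_address,
        network.broadcast_address,
    )
    return gw in network and gw not in reserved


print(gateway_is_valid("192.168.1.10", "255.255.255.0", "192.168.1.1"))
print(gateway_is_valid("192.168.1.10", "255.255.255.0", "192.168.2.1"))
```

The first call describes a gateway on the same /24 subnet as the SC; the second describes a gateway on a different subnet, which is the situation the event message reports.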


TABLE B-2 displays SCC PROM event messages from the system controller.


TABLE B-2 System Controller SCC PROM Event Messages

Severity

Message

Description

Critical

SCC data cannot be accessed.

ALOM CMT sends this message at boot. This message indicates that the SCC PROM cannot be accessed. There is a problem with the SCC PROM or the SC hardware.

This fault event message appears in the output of the ALOM CMT showfaults command.

Major

SCC is not valid.

ALOM CMT sends this message at boot or when an SCC is inserted while ALOM CMT is running. This message indicates that the SCC PROM is invalid and must be replaced.

This fault event message appears in the output of the ALOM CMT showfaults command.

Major

Replace SCC to avert managed
system shutdown in 60
seconds.

ALOM CMT sends this message if the host power is on while the SCC PROM is removed. Normally this is not possible, as the SCC PROM can only be removed with the cover opened, which causes a managed system power off automatically. This message indicates a problem with the SCC PROM or SC hardware.

Critical

Correct SCC not replaced -
shutting managed system
down.

ALOM CMT sends this message if the SCC PROM is not replaced within the allotted 60-second interval. After this event, the system is powered off.

Major

SCC has been inserted.

ALOM CMT sends this message when the SCC PROM is inserted.

Major

Correct SCC replaced -
managed system shutdown
cancelled.

ALOM CMT sends this message if an SCC PROM has been replaced during the 60-second shutdown interval.

Major

Correct SCC not replaced -
managed system shutdown
continuing.

ALOM CMT sends this message if a different SCC PROM is inserted while the managed system is shutting down.

Major

Different SCC detected.
SC will reset itself
momentarily.

ALOM CMT sends this message if a different SCC PROM is inserted. The SC must reset itself to reinitialize configuration and network parameters based on the contents of the replacement SCC PROM.

Critical

SCC platform data is not
valid, will be replaced by
SC nvram data.

ALOM CMT sends this message if an SCC PROM is inserted with invalid contents. The SCC PROM contents are protected by a checksum to detect data corruption. If the data is corrupted, the SCC PROM data is erased and replaced by the SC NVRAM data.

Critical

SCC NVRAM data updated to
new version while preserving
data.

ALOM CMT sends this message if the new SC firmware has a newer version of NVRAM data than is currently stored on the NVRAM hardware. This message indicates that the data format has been updated. Existing data should be preserved. After this message appears, the user should check the output of the showsc command to ensure that the configuration parameters are still valid and are set correctly. The new firmware image might have added new configuration parameters or removed pre-existing parameters. Refer to the release notes of the firmware image for more information.
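The checksum-and-fallback behavior described for the "SCC platform data is not valid" message in TABLE B-2 can be sketched as follows. The checksum algorithm used by the SCC PROM is not stated in this appendix, so CRC-32 from Python's zlib module stands in for it here. The load_platform_data helper and the sample byte strings are hypothetical.

```python
import zlib
from typing import Tuple


def load_platform_data(
    scc_data: bytes, stored_checksum: int, nvram_data: bytes
) -> Tuple[bytes, bool]:
    """Return (data_to_use, replaced).

    CRC-32 is an assumed stand-in for the real, undocumented checksum.
    If the checksum does not match, the SCC data is treated as corrupted
    and the SC NVRAM copy is used in its place.
    """
    if zlib.crc32(scc_data) == stored_checksum:
        return scc_data, False
    return nvram_data, True


good = b"platform-config-v1"
checksum = zlib.crc32(good)

# Intact data is used as-is.
print(load_platform_data(good, checksum, b"nvram-copy"))

# Corrupted data no longer matches the checksum, so the NVRAM copy is used.
print(load_platform_data(b"platform-config-vX", checksum, b"nvram-copy"))
```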


 

TABLE B-3 displays usage event messages from the system controller.


TABLE B-3 System Controller Usage Event Messages

Severity

Message

Description

Major

SC Request to Power Off
Host.

ALOM CMT sends this message whenever the SC requests a host power off, including when a user types the poweroff command.

Major

SC Request to Power Off Host
Immediately.

ALOM CMT sends this message when the SC requires an immediate host power off, including when a user types the poweroff -f command.

Critical

Host system has shut down.

ALOM CMT sends this message when the host power has turned off. It is also normal for this event to be sent when the host has reset itself.

Minor

SC Request to Power On Host.

ALOM CMT sends this message when the SC requests a host power on, either because of sc_powerstatememory or when a user types the poweron command.

Major

SC Request to Reset Host.

ALOM CMT sends this message when the SC requests a host reset, including when a user types the reset command.

Critical

Host System has Reset.

ALOM CMT sends this message when the SC detects that the host has reset. This message is followed immediately by the Host system has shut down event message because reset is implemented as a power cycle on these systems.

Major

SC Request to send Break to
host.

ALOM CMT sends this message when the SC sends a break request to the host, such as when a user types the break command.

Minor

SC date/time has been set to
date_and_time.

ALOM CMT sends this message when a user types the setdate command to modify the SC date or time.

Major

SC firmware was reloaded.

ALOM CMT sends this message after the flashupdate command has reloaded the SC firmware.

Minor

SC set bootmode to normal.

ALOM CMT sends this message after a user changes the bootmode to normal using the bootmode command.

Minor

SC set bootmode to
reset_nvram, will expire
date_and_time.

ALOM CMT sends this message after a user changes the bootmode to reset_nvram with the bootmode command. date_and_time are the date and time that the bootmode setting expires, ten minutes from the time the command was run.

Minor

SC set bootscript to
bootscript.

ALOM CMT sends this message after a user changes the bootmode bootscript. The bootscript is the text of the bootscript provided by the user.

Minor

Host System has read and
cleared bootmode.

ALOM CMT sends this message after the host has booted and read the bootmode and bootscript. After this event the bootmode and bootscript are reset to normal.

Minor

Keyswitch position has been
changed to keyswitch_position.

ALOM CMT sends this message after a user changes the keyswitch position with the setkeyswitch command. The keyswitch_position is the new keyswitch position.

Minor

Indicator indicator_name is
now indicator_state.

ALOM CMT sends this message any time an indicator, such as an LED, changes state. The indicator_name is the name of the indicator, and indicator_state is the new state of the indicator. Normally this is in response to platform events such as power on or power off events, fault events, disk ready-to-remove events from the host, and so on. Refer to your platform's administration guide for more information about the platform's indicators and their states.

Major

Failed to send email alert
for recent event.

ALOM CMT sends this message if the if_emailalerts parameter is set to true, but an email alert could not be sent. Check the mgt_mailhost and mgt_mailalert settings and the status of your network mail server to resolve the issue.

Major

Failed to send email alert
to the primary mailserver.

ALOM CMT sends this message if the if_emailalerts parameter is set to true, but an email alert could not be sent. Check the mgt_mailhost and mgt_mailalert settings and the status of your network mail server to resolve the issue.

Major

Email alerts will not be
sent while network is
disabled.

ALOM CMT sends this message if if_emailalerts is set to true, but if_network is set to false. To correct the problem, either disable email alerts or enable the SC network.

Minor

SC Login: User username
Logged on.

ALOM CMT sends this message when users log in. The username is the name of the user who just logged in.

Minor

SC Login: User username
Logged out.

ALOM CMT sends this message when users log out. The username is the name of the user who just logged out.

Major

SC Login Failure for user
username.

ALOM CMT sends this message if a user has failed to log in five times within a five-minute period. The username is the name of the user whose login attempts failed.

Major

SC Request to Dump core
host.

ALOM CMT sends this message when an ALOM CMT user sends a request to the host to dump core by typing the break -D command.

Major

SC Host Watchdog Reset
Disabled.

ALOM CMT sends this message when a user has set the sys_autorestart variable to none.

Critical

Host Watchdog timeout.

ALOM CMT sends this message when the host watchdog has timed out and the sys_autorestart variable has been set to none. The SC will not perform any corrective measures.

Critical

SC Request to Dump core Host
due to Watchdog.

ALOM CMT sends this message when the host watchdog has timed out and the sys_autorestart variable has been set to dumpcore. The SC attempts to perform a core dump of the host to capture error state information. The dump core feature is not supported by all OS versions.

Critical

SC Request to Reset Host due
to Watchdog.

ALOM CMT sends this message when the host watchdog has timed out and the sys_autorestart variable has been set to reset. Then the SC attempts to reset the host.
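The last three rows of TABLE B-3 amount to a lookup from the sys_autorestart setting to the event sent on a host watchdog timeout. A minimal sketch of that lookup follows. The watchdog_timeout_event helper is hypothetical, and the messages are joined onto single lines from the wrapped table cells.

```python
from typing import Tuple

# sys_autorestart value -> message sent when the host watchdog times out,
# as listed in TABLE B-3. All three events have Critical severity.
WATCHDOG_EVENTS = {
    "none": "Host Watchdog timeout.",
    "dumpcore": "SC Request to Dump core Host due to Watchdog.",
    "reset": "SC Request to Reset Host due to Watchdog.",
}


def watchdog_timeout_event(sys_autorestart: str) -> Tuple[str, str]:
    """Return the (severity, message) pair for a watchdog timeout."""
    return ("Critical", WATCHDOG_EVENTS[sys_autorestart])


for setting in ("none", "dumpcore", "reset"):
    print(setting, "->", watchdog_timeout_event(setting))
```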


TABLE B-4 displays environmental monitoring event messages from the system controller.


TABLE B-4 Environmental Monitoring Event Messages

Severity

Message

Description

Critical

SC can't determine platform
type.

ALOM CMT sends this message if the SC is unable to determine the platform hardware properties. The SC goes into a degraded mode and prevents many operations. This message indicates a problem with the platform hardware or the SC hardware.

Minor

SC Environment Poller:
Cannot open i2c device.

ALOM CMT sends this message if the I2C interface cannot be opened. Environmental monitoring will not be enabled. This message indicates a problem with the SC hardware. This event will accompany other events, such as SC can't determine platform type.

Major

Required device_type at location
is not present.

ALOM CMT sends this message if a required piece of hardware monitoring is not present. This indicates a problem with the platform hardware. device_type is the type of device (sensor, indicator, and so on) and location indicates the location and the name of the device. The device location indicates which FRU the device is installed on. Normally this indicates a problem with that FRU. If multiple FRUs are listed, location can point to a problem with the SC hardware rather than the individual FRUs.

Critical

Chassis cover removed.

ALOM CMT sends this message if the chassis cover has been removed. The platform hardware turns managed system power off immediately as a precautionary measure. The event message System poweron is disabled should accompany this message to prevent the use of the poweron command while the chassis cover is removed.

Critical

System poweron is disabled.

ALOM CMT sends this message when the SC refuses to power on the system, either through the user poweron command or by the front panel power button. The SC disables power on because of an accompanying event, such as the event indicated by the message Chassis cover removed. Other possibilities include a device failure or insufficient fan cooling.

Minor

System poweron is enabled.

ALOM CMT sends this message after the condition that caused power on to be disabled (indicated by the preceding System poweron is disabled message) has been rectified, for example, by replacing the chassis cover or installing sufficient fans to cool the system.

Major

Device at location has FAILED.

Device at location has FAULTED.

ALOM CMT sends this message when a failure or a fault is detected. A fault is a lower priority condition that indicates the system is operating in a degraded mode. A failure is a higher priority condition indicating that a FRU has failed and should be replaced. Device is the type of device that has failed, such as SYS_FAN, PSU, CURRENT_SENSOR, DOC, or FPGA. The location is the location and name of the device that has the error condition. The location and name of the device match the output of the ALOM CMT showenvironment command.

This fault event message appears in the output of the ALOM CMT showfaults command.

Minor

Device at location is OK.

ALOM CMT sends this message to indicate that a prior fault or failure has recovered or been repaired. The fields (Device and location) are the same as the prior fault or failure event.

Critical

Device_type at location has
exceeded low warning
threshold.

Device_type at location has
exceeded low soft shutdown
threshold.

Device_type at location has
exceeded low hard shutdown
threshold.

Device_type at location has
exceeded high warning
threshold.

Device_type at location has
exceeded high soft shutdown
threshold.

Device_type at location has
exceeded high hard shutdown
threshold.

ALOM CMT sends these messages when analog measurement sensors have exceeded the specified threshold. The threshold that was exceeded is included in the message. Device_type is the type of device which has failed, such as VOLTAGE_SENSOR or TEMP_SENSOR. The location is the location and name of the device that has the error condition. The location and name of the device match the output of the ALOM CMT showenvironment command.

For TEMP_SENSOR events, this message could indicate a problem outside of the server, such as the temperature in the room or blocked airflow in or out of the server. For VOLTAGE_SENSOR events, this message indicates a problem with the platform hardware or possibly with add-on cards installed.

These fault event messages appear in the output of the ALOM CMT showfaults command.

Minor

Device_type at location is
within normal range.

ALOM CMT sends this message when an analog measurement sensor no longer exceeds any warning or failure thresholds. This message is sent only if the sensor reading recovers sufficiently within the boundaries of the failure parameters. The message might not match the current output of the ALOM CMT showenvironment command.

Critical

SC initiating soft host
system shutdown due to fault
at location.

SC initiating hard host
system shutdown due to fault
at location.

ALOM CMT sends this message when the SC has started a system shutdown due to a fault. The location is the location and name of the faulty device that has caused the shutdown.

Critical

SC initiating soft host
system shutdown due to
insufficient fan cooling.

ALOM CMT sends this message to indicate that the SC has started a shutdown because there are not enough working fans necessary to keep the system cooled. The number of fans necessary to maintain system cooling depends on the platform. See your platform manuals for more information.

Critical

Host Power Failure:
MB_DC_POK Fault.

ALOM CMT sends this message to indicate a problem with the power converters or Power-OK sensors. The system is unable to remain powered on as a result. This message indicates a problem with the platform hardware. The SC will attempt to power cycle the system to recover from the fault.

This fault event message appears in the output of the ALOM CMT showfaults command.

Major

Power cycling Host System.
Please wait.

ALOM CMT sends this message to indicate that the SC is performing a platform power cycle after a Power-OK fault.

Critical

Host Power: MB_DC_POK is OK.

ALOM CMT sends this message to indicate that the system has recovered from a prior Power-OK sensor fault. If the failure happens again, this might indicate a problem with either the platform hardware or the SC hardware.

Major

Host system poweron failed
due to fault at sensor.

ALOM CMT sends this message to indicate that the SC is unable to power on the system. The sensor is a device such as the MB/FF_POK. This fault indicates a problem with either the platform hardware or the SC hardware.

This fault event message appears in the output of the ALOM CMT showfaults command.

Critical

Host system failed to power
off.

ALOM CMT sends this message if the SC is unable to power off the system. This indicates a problem with either the platform hardware or the SC hardware. The system should be manually unplugged to prevent damage to the platform hardware.

This fault event message appears in the output of the ALOM CMT showfaults command.

Major

FRU_type at location has been
removed.

FRU_type at location has been
inserted.

ALOM CMT sends these messages to indicate that a FRU has been removed or inserted. The field FRU_type indicates the type of FRU, such as SYS_FAN, PSU, or HDD. The field location indicates the location and name of the FRU, as shown in the output of the showenvironment command.

Major

Input power unavailable for
PSU at location.

ALOM CMT sends this message to indicate that a power supply is not receiving input power. This message normally indicates that the power supply is not plugged in to AC power. If the power cords are plugged in to an outlet that is provided power, this message indicates a problem with the power supply itself.

This fault event message appears in the output of the ALOM CMT showfaults command.
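The six threshold messages in TABLE B-4 follow one pattern: a side (low or high) and a level (warning, soft shutdown, or hard shutdown). The sketch below shows one way a sensor reading could be mapped to the matching message, checking the most severe level first. The threshold_event helper, the limits dictionary layout, the sample limit values, and the location MB/T_AMB are all hypothetical; actual thresholds depend on the platform.

```python
from typing import Dict, Optional


def threshold_event(
    device_type: str, location: str, reading: float, limits: Dict[str, float]
) -> Optional[str]:
    """Return the threshold message for a reading, or None if within limits.

    limits uses hypothetical keys such as "low_warn" or "high_hard".
    The most severe level on each side is checked first.
    """
    sides = (
        ("low", lambda value, limit: value < limit),
        ("high", lambda value, limit: value > limit),
    )
    levels = (("hard", "hard shutdown"), ("soft", "soft shutdown"), ("warn", "warning"))
    for side, exceeded in sides:
        for key, label in levels:
            if exceeded(reading, limits[f"{side}_{key}"]):
                return f"{device_type} at {location} has exceeded {side} {label} threshold."
    return None


# Hypothetical temperature limits in degrees Celsius.
limits = {
    "low_hard": 0, "low_soft": 5, "low_warn": 10,
    "high_warn": 60, "high_soft": 70, "high_hard": 80,
}

print(threshold_event("TEMP_SENSOR", "MB/T_AMB", 65, limits))
print(threshold_event("TEMP_SENSOR", "MB/T_AMB", 25, limits))
```

A reading inside all limits returns None in this sketch; the "is within normal range" message is reserved for recovery from an earlier threshold event, which would require tracking the previous state.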


 

TABLE B-5 displays host monitoring event messages from the system controller.


TABLE B-5 Host Monitoring Event Messages

Severity

Message

Description

Critical

Component deemed faulty and
disabled.

ALOM CMT sends this message when a component has been disabled, either automatically by POST discovering a fault or by a user typing the disablecomponent command. Component is the name of the disabled component, as shown in the output of the platform showcomponent command.

This fault event message appears in the output of the ALOM CMT showfaults command.

Critical

Component reenabled.

ALOM CMT sends this message when a component is enabled. This includes a user typing the enablecomponent command or FRU replacement if the component itself is a FRU (such as a DIMM). Component is the name of the component shown in the output of the platform showcomponent command.

Major

Host detected fault,
MSGID: SUNW-MSG-ID.

ALOM CMT sends this message when the Solaris PSH software diagnoses a fault. The SUNW-MSG-ID of the fault is an ASCII identifier which can be entered at http://www.sun.com/msg for more information about the nature of the fault and the steps to repair.

This fault event message appears in the output of the ALOM CMT showfaults command.

Major

Dropping ereports, message
queue is full.

ALOM CMT sends this message to indicate that the hardware has encountered a flood of hardware errors which could not be disabled at the source. This message indicates that some errors have been lost because of insufficient memory space to store the excessive events.

Major

Location has been replaced;
faults cleared.

ALOM CMT sends this message after the replacement of a FRU that contained a host-detected fault. Location is the location and name of the FRU which was replaced. This event can be received at SC boot, or after FRUs have been swapped and the chassis cover is closed.

Major

Existing faults detected in
FRU_PROM at location.

ALOM CMT sends this message to indicate that the SC has detected a new FRU with pre-existing faults logged into its FRU PROM. This event can occur when either a FRU or the SC card is moved from one system to another. The location is the name of the SEEPROM on the replaced FRU, such as MB/SEEPROM.

The most recent existing fault will be imported from the FRU PROM onto the showfaults list. The entry on the showfaults list is the fault imported, not this message.