CHAPTER 11

Domain Events
Event monitoring periodically checks the domain and hardware status to detect conditions that require an action. The action taken depends on the condition and can involve reporting the condition or initiating automated procedures to deal with it. This chapter describes the events that monitoring detects and the actions taken in response to them.
This chapter covers log file maintenance and the handling of domain reboot, domain panic, Solaris software hang, hardware configuration, environmental, hardware error, and SC failure events.
SMS logs all significant actions taken in response to an event, other than logging itself and updating user monitoring displays. Log messages for significant domain software events and their response actions are written to the message log file for the affected domain, located in /var/opt/SUNWSMS/adm/domain-id/messages. The log includes information to support subsequent servicing of the hardware or software.
SMS writes log messages for significant hardware events to the platform log file, located in /var/opt/SUNWSMS/adm/platform/messages. For significant hardware events that visibly affect one or more domains, SMS also writes log messages to /var/opt/SUNWSMS/adm/domain-id/messages for each affected domain.
The actions taken in response to events that crash domain software include automatic system recovery (ASR) reboots of all affected domains, provided that the domain hardware (or a bootable subset of it) meets the requirements for safe and correct operation.
SMS also logs domain console, syslog, event, post, and dump information and manages sms_core files.
SMS software maintains SC-resident copies of all server information that it logs. Use the showlogs(1M) command to access log information.
The platform message log file can be accessed only by administrators for the platform, using the following command:
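For example, assuming the -p (platform) option of showlogs(1M) (verify the exact syntax against the man page):

    sc0:sms-user:> showlogs -p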
SMS log information relevant to a configured domain can be accessed only by administrators for that domain. SMS maintains separate log files for each domain. To access the files, type the following command:
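A typical invocation (the -d option for selecting a domain is taken from the showlogs(1M) convention; confirm against the man page):

    sc0:sms-user:> showlogs -d domain-id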
where domain-id is the ID for a domain. Valid domain IDs are A through R and are not case sensitive.
SMS maintains copies of domain syslog files on the SC in /var/opt/SUNWSMS/adm/domain-id/syslog. The syslog information can be accessed only by administrators for that domain.
To access the information, type the following command:
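For example (the -S syslog option is an assumption about the showlogs(1M) syntax; confirm against the man page):

    sc0:sms-user:> showlogs -d domain-id -S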
Solaris console output logs provide valuable insight into what happened before a domain crashed. Console output for a crashed domain is available on the SC in /var/opt/SUNWSMS/adm/domain-id/console. Console information can be accessed only by administrators for that domain.
To access the information, type the following command:
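For example (the -c console option is an assumption; the showlogs(1M) man page gives the authoritative syntax):

    sc0:sms-user:> showlogs -d domain-id -c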
XIR state dumps, generated by the reset command, can be displayed using showxirstate. For more information, refer to the showxirstate man page.
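A typical invocation (the -d option is an assumption; refer to the showxirstate man page for the exact syntax):

    sc0:sms-user:> showxirstate -d domain-id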
Domain post logs are for service diagnostic purposes and are not displayed by showlogs or any SMS CLI.
The /var/tmp/sms_core.daemon files are binary files and are not directly viewable.
The availability of various log files on the SC supports analysis and correction of problems that prevent a domain or domains from booting. For more information, refer to the showlogs man page.
Note - Panic dumps for panicked domains are available in the /var/crash logs on the domain, not on the SC.
TABLE 11-1 lists the SMS log information types and their descriptions.
SMS manages the log files, as necessary, to keep the SC disk utilization within acceptable limits.
The message log daemon (mld) checks message log size, file count per directory, and file age every 10 minutes, and takes action when the first of these limits is reached. TABLE 11-2 lists the mld default settings.
Assuming 20 directories, the defaults represent approximately 4 Gbytes of stored logs.
Caution - The parameters shown in TABLE 11-2 are stored in the file /etc/opt/SUNWSMS/config/mld_tuning. For any changes to take effect, mld must be stopped and restarted. Only an administrator experienced with system disk utilization should edit this file. Improperly changing the parameters in this file could flood the disk and hang or crash the SC.
When a size limit is reached, mld rotates the files: starting with the oldest file x.X, it moves that file to x.X+1. If the oldest file is already at its count limit (messages.9, or sms_core.daemon.1 for core files), mld deletes that file and begins the rotation with x.X-1.
For example, messages becomes messages.0, messages.0 becomes messages.1, and so on up to messages.9. When messages reaches 2.5 Mbytes, messages.9 is deleted, each remaining file is bumped up by one, and a new empty messages file is created (a sketch of this scheme follows the notes below).
When messages or sms_core.daemon reaches its count limit, then the oldest message or core file is deleted.
When any message file reaches x days, it is deleted.
Note - By default, the age limit (*_log_keep_days) is set to zero and is not used.

Note - Post files are provided for service diagnostic purposes and are not intended for display.
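The rotation scheme described above can be illustrated with a minimal shell sketch. This is for illustration only; mld performs the rotation internally, and the domain directory and count limit shown are taken from the defaults described above:

    # Rotate the message log for domain A when it reaches its size limit.
    cd /var/opt/SUNWSMS/adm/A
    rm -f messages.9                  # oldest file at the count limit is dropped
    i=8
    while [ $i -ge 0 ]; do            # messages.8 -> messages.9, ..., messages.0 -> messages.1
        [ -f messages.$i ] && mv messages.$i messages.`expr $i + 1`
        i=`expr $i - 1`
    done
    mv messages messages.0            # current log becomes messages.0
    touch messages                    # start a new, empty messages file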
For more information, refer to the mld and showlogs man pages, and see Message Logging Daemon.
SMS monitors domain software status (see Software Status) to detect domain reboot events.
Since the domain software is incapable of rebooting itself, SMS software controls the initial sequence for all domain reboots. As a result, SMS is always aware of domain reboot initiation events.
SMS software logs the initiation of each reboot and the passage through each significant stage of booting a domain to the domain-specific log file.
SMS software detects all domain reboot failures.
Upon detecting a domain reboot failure, SMS logs the reboot failure event to the domain-specific message log.
SC resident per-domain log files are available for failure analysis. In addition to the reboot failure logs, SMS can maintain duplicates of important domain-resident logs and transcripts of domain console output, as described in Log File Maintenance.
Domain reboot failures are handled as follows:
A subsequent domain hardware failure is handled by the reboot procedure.
A subsequent domain software failure is handled by a quick reboot procedure, and the reboot or reset request is handled by the fast bringup procedure.
SMS tries all ASR methods at its disposal to boot a domain that has failed booting. All recovery attempts are logged in the domain-specific message log.
When a domain panics, it informs dsmd so that a recovery reboot can be initiated. The panic is reported as a domain software status change (see Software Status).
The dsmd daemon is informed when the Solaris software on a domain panics.
Upon detecting a domain panic, dsmd logs the panic event to the domain-specific message log.
SC resident per-domain log files are available to assist in domain panic analysis. In addition to the panic logs, SMS can maintain duplicates of important domain-resident logs and transcripts of domain console output, as described in Log File Maintenance.
In general, after an initial panic where there has been no prior indication of hardware errors, SMS requests that a fast reboot be tried to bring up the domain. For more information, see Domain Reboot.
The dsmd daemon handles a panic event as follows:
A subsequent domain hardware failure is handled by the reboot procedure.
A subsequent domain software failure is handled by a quick reboot procedure, and the reboot or reset request is handled by the fast bringup procedure.
This recovery action is logged in the domain-specific message log.
The Solaris panic dump logic has been redesigned to minimize the possibility of hangs at panic time. In a panic situation, Solaris software might operate differently, either because normal functions have been shut down or because the panic itself has disabled them. An ASR reboot of a panicked Solaris domain is eventually started, even if the panicked domain hangs before it can request a reboot.
Because normal heartbeat monitoring (see Solaris Software Hang Events) might not be appropriate or sufficient to detect a panicked Solaris domain that fails to request an ASR reboot, dsmd takes special measures as necessary to detect a domain panic hang event.
Upon detecting a panic hang event, dsmd logs each occurrence, including event information, to the domain-specific message log.
Upon detecting a domain panic hang, SMS aborts the domain (see Domain Abort or Reset) and initiates an ASR reboot of the domain. dsmd logs these recovery actions in the domain-specific message log.
SC-resident log files are available to assist in panic hang analysis. In addition to the panic hang event logs, the dsmd daemon maintains duplicates of important domain-resident logs and transcripts of domain console output on the SC, as described in Log File Maintenance.
If a second domain panic is detected shortly after recovering from a panic event, dsmd classifies the domain panic as a repeated domain panic event.
In addition to the standard logging actions that occur for any panic, further recovery actions are taken when attempting to reboot after a repeated domain panic event.
The dsmd daemon monitors the Solaris heartbeat described in Solaris Software Heartbeat in each domain while Solaris software is running (see Software Status). When the heartbeat indicator is not updated for a period of time, a Solaris software hang event occurs.
The dsmd daemon detects Solaris software hangs.
Upon detecting a Solaris hang, dsmd logs the event, including event information, to the domain-specific message log.
Upon detecting a Solaris hang, dsmd requests the domain software to panic so that it can obtain a core image for analysis of the Solaris hang (Domain Abort or Reset). SMS logs this recovery action in the domain-specific message log.
The dsmd daemon monitors whether the domain software satisfies the request to panic. Upon determining noncompliance with the panic request, dsmd aborts the domain (see Domain Abort or Reset) and initiates an ASR reboot. The dsmd daemon logs these recovery actions in the domain-specific message log.
Although the core image taken as a result of the panic is available for analysis only from the domain, SC-resident log files are available to assist in domain hang analysis. In addition to the Solaris hang event logs, the dsmd daemon can maintain duplicates of important domain-resident logs and transcripts of domain console output on the SC.
Changes to the hardware configuration status are considered hardware configuration events. The esmd daemon detects the following hardware configuration events on a Sun Fire high-end system.
The insertion of a hot-pluggable unit (HPU) is a hot-plug event, and the removal of an HPU is a hot-unplug event; esmd detects and responds to both.
POST can run against different server components at different times due to domain-related events such as reboots and dynamic reconfigurations. As described in Hardware Configuration, SMS includes status from POST, which identifies components that failed testing. Consequently, changes in the POST status of a component are considered hardware configuration events. SMS logs POST-initiated hardware configuration changes to the platform message log.
In general, environmental events are detected when hardware status measurements exceed normal operational limits. Acceptable operational limits depend upon the hardware and the server configuration.
The esmd daemon verifies that measurements returned by each sensor are within acceptable operational limits. The esmd daemon logs all sensor measurements outside of acceptable operational limits as environmental events to the platform log file.
The esmd daemon also logs to the platform log file any significant actions taken in response to an environmental event (that is, actions beyond logging information or updating user displays).
The esmd daemon logs significant environmental event response actions that affect one or more domains to the log files of the affected domains.
The esmd daemon handles environmental events by removing from operation the hardware that has experienced the event (and any other hardware dependent upon the disabled component). Hardware can be left in service, however, if continued operation of the hardware does not harm the hardware or cause hardware functional errors.
The options for handling environmental events are dependent upon the characteristics of the event. All events have a time frame during which the event must be handled. Some events kill the domain software; some do not. Event response actions are such that esmd responds within the event time frame.
There are a number of responses esmd can make to environmental events, such as increasing fan speeds. In response to a detected environmental event that requires powering hardware off, esmd undertakes one of the following corrective actions: powering the affected component off immediately, shutting down the affected domain software gracefully before powering off, or removing the component from the running configuration with DR before powering off.
If the software is still running and a viable domain configuration remains after the affected hardware is removed, dsmd attempts to recover the domain.
If either of the last two options takes longer than the allotted time for the given environmental condition, esmd immediately powers off the component regardless of the state of the domain software.
SMS illuminates the Fault indicator on any hot-pluggable unit that can be identified as the cause of an environmental event.
So long as the environmental event response actions do not include shutdown of the system controllers, all domains whose software operations were terminated by an environmental event or the ensuing response actions are subject to ASR reboot as soon as possible.
ASR reboot begins immediately if there is a bootable set of hardware that can be operated in accordance with constraints imposed by the Sun Fire high-end system to assure safe and correct operation.
The following sections provide more detail about each type of environmental event that can occur on a Sun Fire high-end system.
The esmd daemon monitors temperature measurements from Sun Fire high-end systems hardware for values that are too high. There is a critical temperature threshold that, if exceeded, is handled as quickly as possible by powering off the affected hardware. High, but not critical, temperatures are handled by attempting slower recovery actions, such as a graceful shutdown or DR for the MCPU boards.
There is very little opportunity to do anything when a full power failure occurs. The entire platform, domains as well as SCs, is shut off when the plug is pulled without the benefit of a graceful shutdown. The ultimate recovery action occurs when power is restored (see Power-On Self-Test (POST)).
Power voltages for Sun Fire high-end systems are monitored to detect out-of-range events. The handling of out-of-range voltages follows the general principles outlined at the beginning of Environmental Events.
SMS checks for adequate power before powering on any boards, as mentioned in Power Control; even so, the failure of a power supply can leave the server inadequately powered. The system is equipped with redundant power supplies in the event of a failure. The esmd daemon does not take any action (other than logging) in response to a bulk power supply hardware failure. The handling of under-power events follows the general principles outlined at the beginning of Environmental Events.
The esmd daemon monitors fans for continuing operation. Should a fan fail, a fan failure event occurs. The handling of fan failures follows the general principles outlined at the beginning of Environmental Events.
The esmd daemon monitors clocks for continuing operation. Should a clock fail, esmd logs a message every 10 minutes. It also turns on manual override so the clock selector on that board never automatically starts using that clock. If the clock returns to good status, esmd turns off manual override and logs a message.
When phase lock is lost, the esmd daemon turns on manual override on all the boards and logs one message. When phase lock returns, esmd turns off manual override on all the boards and logs a message.
As described in Hardware Error Status, the occurrence of Sun Fire high-end system hardware errors is recognized at the SC by more than one mechanism. Of the errors that are directly visible to the SC, some are reported directly by PCI interrupt to the UltraSPARC processor on the SC, and others are detected only through monitoring of the hardware registers on Sun Fire high-end systems.
Other hardware errors are detected by the processors running in a domain: the domain software detects the occurrence of those errors and reports them to the SC. As with the mechanism by which the SC becomes aware of a hardware error, the error state retained by the hardware after a hardware error depends upon the specific error.
The dsmd daemon detects and handles these hardware errors as described below.
If data collected in response to a hardware error is not suitable for inclusion in a log file, the data can be saved in uniquely named files in /var/opt/SUNWSMS/adm/domain-id/dump on the SC.
SMS illuminates the Fault LED on any hot-pluggable unit that can be identified as the cause of a hardware error.
The actions taken in response to hardware errors (other than collecting and logging information as described previously) are twofold. First, it might be possible to prevent further occurrences of certain types of hardware errors by eliminating from use the hardware identified as faulty. Second, all domains that either crashed as a result of a hardware error or were shut down as a consequence of the first type of action are subject to ASR reboot actions.
In response to each detected hardware error and each domain-software-reported hardware error, dsmd undertakes the appropriate corrective actions. In some cases automatic diagnosis and domain recovery occurs (see Chapter 6), while in other instances, an ASR reboot with full POST verification is initiated for each domain brought down by a hardware error.
Note - Problems with the ASR reboot of a domain after a hardware error are detected as domain boot failure events and subject to the recovery actions described in Domain Boot Failure.
The dsmd daemon logs to the platform log file all significant actions taken in response to a hardware error (that is, actions beyond logging information or updating user displays). When a hardware error affects one or more domains, dsmd logs the significant response actions in the message log files of the affected domains.
The following sections summarize the types of hardware errors expected to be detected and handled on a Sun Fire high-end system.
Domain stops are uncorrectable hardware errors that immediately terminate the affected domains. Hardware state dumps are taken before dsmd initiates an ASR reboot of the affected domains. These files are located in /var/opt/SUNWSMS/adm/domain-id/dump.
The dsmd daemon logs the event in the domain message log file and also the event log file.
A RED_state or Watchdog reset traps to low-level domain software (OpenBoot PROM or kadb), which reports the error and requests initiation of ASR reboot of the domain.
An XIR signal (reset -x) also traps to low-level domain software (OpenBoot PROM or kadb), which retains control of the software. The domain must be rebooted manually.
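For example, an XIR can be sent from the SC with the reset command (the option placement shown is an assumption; see the reset man page for the exact syntax):

    sc0:sms-user:> reset -d domain-id -x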
Correctable data transmission errors (for example, CE ECC errors) can stop the normal transaction history recording feature of ASICs in Sun Fire high-end systems. SMS reports a transmission error as a record stop. SMS dumps the transaction history buffers of these ASICs and re-enables transaction history recording when a record stop is handled. The dsmd daemon records record stops in the domain log file.
ASIC-detected hardware failures other than domain stop or record stop include console bus errors, which might or might not impact a domain. The hardware itself does not abort any domain, but the domain software might not survive the impact of the hardware failure and could panic or hang. The dsmd daemon logs the event in the domain log file.
SMS monitors the main SC hardware and running software status as well as the hardware and running software of the spare SC, if present. In a high-availability SC configuration, SMS handles failures of the hardware or software on the main SC or failures detected in the hardware control paths (for example, console bus, or internal network connections) to the main SC by an automatic SC failover process. This cedes main responsibilities to the spare SC and leaves the former main SC as a (possibly crippled) spare.
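Failover behavior can also be controlled manually with the setfailover(1M) command; for example, the following forces a failover to the spare SC (the force argument is an assumption; see the man page):

    sc0:sms-user:> setfailover force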
SMS monitors the hardware of the main and spare SCs for failures.
SMS logs the hardware failure and related information to the platform message log.
SMS illuminates the Fault LED on a system controller with an identified hardware failure.
For more information, see Chapter 12.
Copyright © 2006, Sun Microsystems, Inc. All Rights Reserved.