CHAPTER 4
Managing RAS Features and System Firmware
This chapter describes how to manage reliability, availability, and serviceability (RAS) features and system firmware, including the Sun Advanced Lights Out Manager (ALOM) system controller and automatic system recovery (ASR). In addition, this chapter describes how to unconfigure and reconfigure a device manually, and introduces multipathing software.
The introduction of universal serial bus (USB) keyboards with the newest Sun systems has made it necessary to change some of the OpenBoot emergency procedures. Specifically, the Stop-N, Stop-D, and Stop-F commands that were available on systems with non-USB keyboards are not supported on systems that use USB keyboards, such as the Sun Fire server. For readers familiar with the earlier (non-USB) keyboard functionality, this section describes the analogous OpenBoot emergency procedures available in newer systems that use USB keyboards.
The following sections describe how to perform the functions of the Stop commands on systems that use USB keyboards. These same functions are available through the ALOM software.
The Stop-A (Abort) key sequence works the same as it does on systems with standard keyboards, except that it does not work during the first few seconds after the server is reset. In addition, you can issue the break command from the ALOM system controller.
Stop-N functionality is not available. However, it can be closely emulated by completing the following steps, provided that the system console is configured to be accessible through either the serial management port or the network management port.
1. Log in to the ALOM system controller.
2. Type the following commands:
You can issue the bootmode command without arguments to display the current setting.
3. To reset the system, type the following commands:
4. To view console output as the system boots with default OpenBoot configuration variables, switch to console mode.
5. Type set-defaults to discard any customized IDPROM values and to restore the default settings for all OpenBoot configuration variables.
The Stop-F functionality is not available on systems with USB keyboards.
The Stop-D (Diags) key sequence is not supported on systems with USB keyboards. However, the Stop-D functionality can be closely emulated by setting the virtual keyswitch to diag, using the ALOM setkeyswitch command.
The system provides for automatic system recovery (ASR) from failures in memory modules or PCI cards.
Automatic system recovery functionality enables the system to resume operation after experiencing certain nonfatal hardware faults or failures. When ASR is enabled, the firmware diagnostics automatically detect failed hardware components. An autoconfiguring capability designed into the system firmware enables the system to unconfigure failed components and to restore system operation. As long as the system is capable of operating without the failed component, the ASR features enable the system to reboot automatically, without operator intervention.
The system firmware stores a configuration variable called auto-boot?, which controls whether the firmware will automatically boot the operating system after each reset. The default setting for Sun platforms is true.
Normally, if a system fails power-on diagnostics, auto-boot? is ignored and the system does not boot unless an operator boots it manually, because an automatic boot is generally not acceptable for a system in a degraded state. For this reason, the Sun Fire server OpenBoot firmware provides a second setting, auto-boot-on-error?, which controls whether the system attempts a degraded boot when a subsystem failure is detected. Both the auto-boot? and auto-boot-on-error? switches must be set to true to enable an automatic degraded boot. To set the switches, type:
ok setenv auto-boot? true
ok setenv auto-boot-on-error? true
Note - The default setting for auto-boot-on-error? is true. Therefore, the system attempts a degraded boot unless you change this setting to false. In addition, the system will not attempt a degraded boot in response to any fatal nonrecoverable error, even if degraded booting is enabled. For examples of fatal nonrecoverable errors, see Error Handling Summary.
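The interaction of the two switches with the outcome of power-on diagnostics can be summarized as a short decision function. The following Python sketch is illustrative only: it is not firmware code, and the PostResult categories and the will_auto_boot name are assumptions introduced here to model the behavior described above.

```python
from enum import Enum


class PostResult(Enum):
    """Hypothetical summary of power-on diagnostics (illustration only)."""
    NO_ERRORS = "no errors"
    NONFATAL = "nonfatal error"
    FATAL = "fatal nonrecoverable error"


def will_auto_boot(auto_boot: bool, auto_boot_on_error: bool,
                   post_result: PostResult) -> bool:
    """Return True if the system would boot without operator intervention."""
    if post_result is PostResult.FATAL:
        # No degraded boot is attempted after a fatal nonrecoverable error,
        # even if degraded booting is enabled.
        return False
    if post_result is PostResult.NONFATAL:
        # An automatic degraded boot requires both switches to be true.
        return auto_boot and auto_boot_on_error
    # Diagnostics passed: only auto-boot? decides.
    return auto_boot
```

For example, will_auto_boot(True, False, PostResult.NONFATAL) evaluates to False, which corresponds to a system that waits for an operator after a nonfatal failure.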
Error handling during the power-on sequence falls into one of the following three cases:
Note - If POST or OpenBoot firmware detects a nonfatal error associated with the normal boot device, the OpenBoot firmware automatically unconfigures the failed device and tries the next-in-line boot device, as specified by the boot-device configuration variable.
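The fallback described in the preceding note can be pictured as a walk through the boot-device list. This Python sketch is a simplified model, not OpenBoot code. The function name, the set of unconfigured aliases, and the sample value "disk net" are assumptions used for illustration.

```python
from typing import Optional, Set


def next_boot_device(boot_device: str, unconfigured: Set[str]) -> Optional[str]:
    """Return the first alias in the boot-device list that is still configured.

    boot_device  -- value of the boot-device variable, a space-separated
                    list of device aliases (sample values are hypothetical)
    unconfigured -- aliases that the firmware has unconfigured after a
                    nonfatal error
    """
    for alias in boot_device.split():
        if alias not in unconfigured:
            return alias
    # Every listed device has been unconfigured, so nothing is left to try.
    return None


# Hypothetical example: the normal boot device "disk" failed diagnostics.
chosen = next_boot_device("disk net", {"disk"})
```

In this example, chosen is "net", the next-in-line device after the unconfigured disk.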
ALOM software lets you display current valid system faults. The showfaults command displays the fault ID, the faulted FRU device, and the fault message to standard output. The showfaults command also displays POST results. For example:
Adding the -v option displays the time:
At the sc> prompt, type:
Multipathing software enables you to define and control redundant physical paths to I/O devices, such as storage devices and network interfaces. If the active path to a device becomes unavailable, the software can automatically switch to an alternate path to maintain availability. This capability is known as automatic failover. To take advantage of multipathing capabilities, you must configure the server with redundant hardware, such as redundant network interfaces or two host bus adapters connected to the same dual-ported storage array.
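Automatic failover can be sketched as a small state machine that tracks which path is active. The following Python model is a conceptual illustration only and does not represent the interface of any Sun multipathing product. The class name, method name, and path labels are hypothetical.

```python
class MultipathDevice:
    """Toy model of an I/O device reachable over redundant physical paths."""

    def __init__(self, paths):
        if len(paths) < 2:
            # Failover is only possible when redundant hardware is present.
            raise ValueError("multipathing needs at least two physical paths")
        self.paths = list(paths)
        self.failed = set()
        self.active = self.paths[0]

    def mark_failed(self, path):
        """Record a path failure and fail over if the active path was lost."""
        self.failed.add(path)
        if path == self.active:
            # Switch to the first remaining healthy path, or None if all
            # paths to the device are unavailable.
            self.active = next(
                (p for p in self.paths if p not in self.failed), None)


# Hypothetical example: two host bus adapters attached to one storage array.
array = MultipathDevice(["hba0", "hba1"])
array.mark_failed("hba0")
```

After the call to mark_failed, array.active is "hba1", so I/O continues over the alternate path.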
Three different types of multipathing software are available:
For instructions on how to configure and administer Solaris IP Network Multipathing, consult the IP Network Multipathing Administration Guide provided with your specific Solaris release.
For information about Sun StorEdge Traffic Manager, refer to your Solaris OS documentation.