CHAPTER 6

Configuring System Firmware

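This chapter describes the OpenBoot firmware commands and configuration variables available for configuring the following aspects of the Sun Fire V890 system behavior:

  • OpenBoot environmental monitoring
  • Automatic system recovery (ASR)

In addition, this chapter provides information about keyboard commands and alternative methods for performing OpenBoot emergency procedures.

Tasks covered in this chapter include:

  • How to Enable OpenBoot Environmental Monitoring
  • How to Disable OpenBoot Environmental Monitoring
  • How to Obtain OpenBoot Environmental Status Information
  • How to Enable ASR
  • How to Disable ASR
  • How to Deconfigure a Device Manually
  • How to Reconfigure a Device Manually
  • How to Obtain ASR Status Information
  • How to Implement Stop-N Functionality

Other information covered in this chapter includes:

  • Reference for Device Identifiers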



Note - To enhance system restoration and server availability, Sun has recently introduced a new standard (default) OpenBoot firmware configuration. These changes, which affect the behavior of servers like the Sun Fire V890, are described in OpenBoot PROM Enhancements for Diagnostic Operation. This document is included on the Sun Fire V890 Documentation CD.





Note - The procedures in this chapter assume that you are familiar with the OpenBoot firmware and that you know how to enter the OpenBoot environment. For more information about the OpenBoot firmware, see the OpenBoot 4.x Command Reference Manual, which is available at http://docs.sun.com, under Solaris on Sun Hardware. Refer to the Sun Fire V890 Server Product Notes for late-breaking details.




About OpenBoot Environmental Monitoring

Environmental monitoring and control capabilities for Sun Fire V890 systems reside at both the operating system level and the OpenBoot firmware level. This ensures that monitoring capabilities are operational even if the system has halted or is unable to boot. Whenever the system is under OpenBoot control, the OpenBoot environmental monitor checks the state of the system power supplies, fans, and temperature sensors every 30 seconds. If it detects any voltage, current, fan speed, or temperature irregularities, the monitor generates a warning message to the system console. In the event of a critical fan failure or overtemperature condition, the monitor generates a shutdown warning and automatically powers off the system after 30 seconds to prevent hardware damage.

For additional information about the system's environmental monitoring capabilities, see Environmental Monitoring and Control.

Enabling or Disabling the OpenBoot Environmental Monitor

The OpenBoot environmental monitor is enabled by default whenever the system is operating at the ok prompt. However, you can enable or disable it yourself using the OpenBoot commands env-on and env-off. For more information, see:

  • How to Enable OpenBoot Environmental Monitoring
  • How to Disable OpenBoot Environmental Monitoring



Note - Using the Stop-A keyboard command to enter the OpenBoot environment will immediately disable the OpenBoot environmental monitor. If you want the OpenBoot environmental monitor enabled, you must re-enable it prior to rebooting the system. If you enter the OpenBoot environment through any other means--by halting the operating system, by power-cycling the system, or as a result of a system panic--the OpenBoot environmental monitor will remain enabled.



Automatic System Shutdown

If the OpenBoot environmental monitor detects a critical fan failure or overtemperature condition, it will initiate an automatic system shutdown sequence. In this case, a warning similar to the following is generated to the system console:


WARNING: SYSTEM POWERING DOWN IN 30 SECONDS!
Press Ctrl-C to cancel shutdown sequence and return to ok prompt.

If necessary, you can type Control-C to abort the automatic shutdown and return to the system ok prompt; otherwise, after the 30 seconds expire, the system will power off automatically.



Note - Typing Control-C to abort an impending shutdown also has the effect of disabling the OpenBoot environmental monitor. This gives you enough time to replace the component responsible for the critical condition without triggering another automatic shutdown sequence. After replacing the faulty component, you must type the env-on command to reinstate OpenBoot environmental monitoring.






Caution - If you type Control-C to abort an impending shutdown, you should immediately replace the component responsible for the critical condition. If a replacement part is not immediately available, power off the system to avoid damaging system hardware.



OpenBoot Environmental Status Information

The OpenBoot command .env lets you obtain status on the current state of everything of interest to the OpenBoot environmental monitor. You can obtain environmental status at any time, regardless of whether OpenBoot environmental monitoring is enabled or disabled. The .env status command simply reports the current environmental status information; it does not take action if anything is abnormal or out of range.

For an example of .env command output, see How to Obtain OpenBoot Environmental Status Information.
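The rules in this section about when the OpenBoot environmental monitor is enabled can be summarized in a small model. The following Python sketch is only an illustration of those rules; the class and method names are hypothetical and are not part of the OpenBoot firmware.

```python
class EnvMonitorModel:
    """Toy model of whether the OpenBoot environmental monitor is enabled."""

    def __init__(self):
        # Enabled by default whenever the system is at the ok prompt.
        self.enabled = True

    def enter_openboot(self, via):
        """Record how the ok prompt was reached (for example "stop-a" or "halt")."""
        # Stop-A disables the monitor immediately; halting the operating
        # system, power-cycling, or a panic leaves it enabled.
        self.enabled = via != "stop-a"

    def command(self, name):
        """Apply env-on or env-off."""
        if name == "env-on":
            self.enabled = True
        elif name == "env-off":
            self.enabled = False
        else:
            raise ValueError(f"not an environmental monitor command: {name}")

    def abort_shutdown(self):
        """Control-C during the 30-second shutdown warning."""
        # Aborting the shutdown also disables the monitor until env-on is typed.
        self.enabled = False
```

For example, after `enter_openboot("stop-a")` the model reports the monitor as disabled until `command("env-on")` is applied, which mirrors the need to re-enable monitoring before rebooting.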


How to Enable OpenBoot Environmental Monitoring

The OpenBoot environmental monitor is enabled by default whenever the system is operating at the ok prompt. However, you can enable or disable it yourself using the OpenBoot commands env-on and env-off.



Note - The commands env-on and env-off only affect environmental monitoring at the OpenBoot level. They have no effect on the system's environmental monitoring and control capabilities while the operating system is running.



Before You Begin

This procedure assumes that you are familiar with the OpenBoot firmware and that you know how to enter the OpenBoot environment. For more information about the OpenBoot firmware, see the OpenBoot 4.x Command Reference Manual, which is available at http://docs.sun.com, under Solaris on Sun Hardware. Refer to the Sun Fire V890 Server Product Notes for late-breaking details.

What to Do

  • To enable OpenBoot environmental monitoring, type env-on at the system ok prompt.


ok env-on
Environmental monitor is ON
ok


How to Disable OpenBoot Environmental Monitoring

The OpenBoot environmental monitor is enabled by default whenever the system is operating at the ok prompt. However, you can enable or disable it yourself using the OpenBoot commands env-on and env-off.



Note - The commands env-on and env-off only affect environmental monitoring at the OpenBoot level. They have no effect on the system's environmental monitoring and control capabilities while the operating system is running.





Note - Using the Stop-A keyboard command to enter the OpenBoot environment will immediately disable the OpenBoot environmental monitor. You must then re-enable the environmental monitor prior to rebooting the system. If you enter the OpenBoot environment through any other means--by halting the operating system, by power-cycling the system, or as a result of a system panic--the OpenBoot environmental monitor will remain enabled.



What to Do

  • To disable OpenBoot environmental monitoring, type env-off at the system ok prompt.


ok env-off
Environmental monitor is OFF
ok


How to Obtain OpenBoot Environmental Status Information

You can use the OpenBoot command .env at the system ok prompt to obtain status information about the system's power supplies, fans, and temperature sensors.

What to Do

  • To obtain OpenBoot environmental status information, type .env at the system ok prompt.


ok .env
Environmental Status: 
 
Power Supplies:
PS0:                      Present, receiving AC power
PS1:                      Present, receiving AC power
PS2:                      Present, receiving AC power
 
Fans:
Tray 1 (CPU):             Present, Fan A @ 3225 RPM, Fan B @ 3157 RPM
Tray 2 (CPU):             Present, Fan A @ 3529 RPM, Fan B @ 3571 RPM
Tray 3 (I/O):             Present, Fan A @ 3529 RPM, Fan B @ 3488 RPM
Tray 4 (I/O):             Present, Fan A @ 3157 RPM, Fan B @ 3030 RPM
Fan  5 (IO-Bridge):       Present, Fan   @ 3846 RPM
Fan  6 (IO-Bridge):       Present, Fan   @ 3658 RPM
 
Temperatures:
CMP0:                     Ambient =  32 deg. C, Die =  56 deg. C
CMP1:                     Ambient =  34 deg. C, Die =  52 deg. C
CMP2:                     Ambient =  31 deg. C, Die =  52 deg. C
CMP3:                     Ambient =  33 deg. C, Die =  57 deg. C
CMP4:                     Ambient =  36 deg. C, Die =  59 deg. C
CMP5:                     Ambient =  32 deg. C, Die =  53 deg. C
CMP6:                     Ambient =  33 deg. C, Die =  59 deg. C
CMP7:                     Ambient =  32 deg. C, Die =  56 deg. C
Motherboard:              Ambient =  22 deg. C
I/O Board:                Ambient =  19 deg. C
Disk Backplane 0:         Ambient =  19 deg. C
 
Environmental monitor is ON



Note - You can obtain environmental status at any time, regardless of whether OpenBoot environmental monitoring is enabled. The .env status command simply reports the current environmental status information; it does not take action if anything is abnormal or out of range.
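If you capture the console output of .env to a text file (for example, through a serial console log), you can process it offline. The following Python sketch shows one possible way to extract fan speeds and temperatures from text in the format shown above. It is not a Sun-supplied tool; the function name and the abbreviated sample text are assumptions made for illustration, and the patterns cover only the line formats that appear in the example output.

```python
import re

# Abbreviated sample, copied from the example .env output above.
SAMPLE_ENV_OUTPUT = """\
Fans:
Tray 1 (CPU):             Present, Fan A @ 3225 RPM, Fan B @ 3157 RPM
Fan  5 (IO-Bridge):       Present, Fan   @ 3846 RPM

Temperatures:
CMP0:                     Ambient =  32 deg. C, Die =  56 deg. C
Motherboard:              Ambient =  22 deg. C

Environmental monitor is ON
"""

FAN_RE = re.compile(r"Fan\s*([AB]?)\s*@\s*(\d+)\s*RPM")
TEMP_RE = re.compile(r"(Ambient|Die)\s*=\s*(\d+)\s*deg\. C")


def parse_env(text):
    """Return (fans, temps) dictionaries keyed by component label."""
    fans, temps = {}, {}
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "label: value" line
        label = " ".join(label.split())  # collapse repeated spaces
        fan_hits = FAN_RE.findall(rest)
        if fan_hits:
            # Trays report Fan A and Fan B; single fans have no letter.
            fans[label] = {name or "single": int(rpm) for name, rpm in fan_hits}
        temp_hits = TEMP_RE.findall(rest)
        if temp_hits:
            temps[label] = {kind: int(deg) for kind, deg in temp_hits}
    return fans, temps
```

Calling `parse_env(SAMPLE_ENV_OUTPUT)` returns two dictionaries, one for fans and one for temperatures. Like the .env command itself, this sketch only reports values; it does not judge whether a reading is out of range.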




About Automatic System Recovery

To some, automatic system recovery (ASR) implies an ability to shield the operating system in the event of a hardware failure, allowing the operating system to remain up and running. The implementation of ASR on the Sun Fire V890 server is different. ASR on the Sun Fire V890 server provides for automatic fault isolation and restoration of the operating system following non-fatal hardware faults or failures. The failures that are treated as non-fatal are listed in Error Handling Summary.

In the event of such a hardware failure, firmware-based diagnostic tests isolate the problem and mark the device (using the 1275 Client Interface, via the device tree) as either failed or disabled. The OpenBoot firmware then deconfigures the failed device and reboots the operating system. This all occurs automatically, as long as the Sun Fire V890 system is capable of functioning without the failed component.

Once restored, the operating system will not attempt to access any deconfigured device. This prevents a faulty hardware component from keeping the entire system down or causing the system to crash repeatedly.

As long as the failed component is electrically dormant (that is, it does not cause random bus errors or introduce noise into signal lines), the system reboots automatically and resumes operation. Be sure to contact a qualified service technician about replacing the failed component.

Auto-Boot Options

The auto-boot? OpenBoot configuration variable controls whether the operating system boots after each reset. The default setting for Sun platforms is true.



Note - The system will not boot automatically when it is in service mode. For details, see Reset Scenarios.



If a system fails power-on diagnostics, then auto-boot? is ignored and the system does not start up unless an operator boots the system manually. Because this behavior limits system availability, the Sun Fire V890 OpenBoot firmware provides a second OpenBoot configuration variable switch called auto-boot-on-error?. This switch controls whether the system will attempt to boot when a subsystem failure is detected.

Both the auto-boot? and auto-boot-on-error? switches must be set to true (their default values) to enable an automatic boot following the firmware detection of a non-fatal subsystem failure.


ok setenv auto-boot? true
ok setenv auto-boot-on-error? true

The system will not attempt to boot if it is in service mode, or following any fatal non-recoverable error. For examples of fatal non-recoverable errors, see Error Handling Summary.

Error Handling Summary

Error handling during the power-on sequence falls into one of three cases, summarized in the following table.


Scenario: No errors are detected
System Behavior: The system attempts to boot if auto-boot? is true.
Notes: By default, auto-boot? and auto-boot-on-error? are both true.

Scenario: Non-fatal errors are detected
System Behavior: The system attempts to boot if auto-boot? and auto-boot-on-error? are both true.
Notes: Non-fatal errors include:

  • IDE bus failure
  • FC-AL subsystem failure[1]
  • Gigabit or Fast Ethernet interface failure
  • USB interface failure
  • Serial interface failure
  • PCI card failure
  • Processor failure[2]
  • Memory failure[3]

Scenario: Fatal non-recoverable errors are detected
System Behavior: The system will not boot regardless of OpenBoot configuration variable settings.
Notes: Fatal non-recoverable errors include:

  • All processors failed
  • All logical memory banks failed
  • Flash RAM cyclical redundancy check (CRC) failure
  • Critical FRU-ID SEEPROM configuration data failure
  • Critical application specific integrated circuit (ASIC) failure



Note - If POST or OpenBoot Diagnostics detects a non-fatal error associated with the normal boot device, the OpenBoot firmware automatically deconfigures the failed device and tries the next-in-line boot device, as specified by the boot-device configuration variable.
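The three cases in the summary above, together with the service mode rule, can be expressed as a single decision function. The following Python sketch models only what this section states; the names are hypothetical, and the firmware's actual logic may take other factors into account.

```python
from enum import Enum


class PowerOnResult(Enum):
    NO_ERRORS = "no errors detected"
    NON_FATAL = "non-fatal errors detected"
    FATAL = "fatal non-recoverable errors detected"


def attempts_boot(result, auto_boot=True, auto_boot_on_error=True,
                  service_mode=False):
    """Model of the summary table: does the system attempt to boot?

    auto_boot and auto_boot_on_error stand for the auto-boot? and
    auto-boot-on-error? variables; both default to true.
    """
    if service_mode:
        # The system does not boot automatically in service mode.
        return False
    if result is PowerOnResult.FATAL:
        # No variable setting overrides a fatal non-recoverable error.
        return False
    if result is PowerOnResult.NON_FATAL:
        return auto_boot and auto_boot_on_error
    return auto_boot
```

With the default settings, `attempts_boot(PowerOnResult.NON_FATAL)` is true, while setting `auto_boot_on_error=False` makes it false. This corresponds to the effect of the variable change described in How to Disable ASR.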



Reset Scenarios

The system keyswitch position and two OpenBoot configuration variables, diag-switch? and diag-trigger, control whether and how the system runs firmware diagnostics in response to system reset events.

When you set the system keyswitch to the Diagnostics position, the system is in service mode and runs tests at Sun-specified levels, ignoring the settings of OpenBoot configuration variables.

Setting the diag-switch? variable to true also puts the system in service mode, producing exactly the same results as setting the system keyswitch to the Diagnostics position.



Note - Auto-booting is disabled when the system is in service mode.



When you set the system keyswitch to the Normal position, and when the OpenBoot diag-switch? variable is set to false (its default value), the system is in normal mode. When the system is in this mode, you can control diagnostics and auto-boot behavior by setting OpenBoot configuration variables, principally diag-trigger.

The following table describes the various settings (keywords) of the diag-trigger variable. You can use the first three of these keywords in any combination.


Keyword: power-on-reset (default)
Function: Reset caused by power-cycling the system.

Keyword: error-reset (default)
Function: Reset caused by certain hardware error events, such as a RED State Exception, Watchdog Reset, or Fatal Resets.

Keyword: user-reset
Function: Reset caused by operating system panics or by user-initiated commands from OpenBoot (reset-all, boot) or from Solaris OS (reboot, shutdown, init).

Keyword: none
Function: Diagnostic tests are not executed.
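As a rough illustration of how the diag-trigger keywords select reset events in normal mode, consider the following Python sketch. It is a simplified model with hypothetical function names. It ignores service mode and every other variable that influences diagnostics, and it treats none combined with another keyword as an error, which is an assumption based on the statement that only the first three keywords can be combined.

```python
RESET_KEYWORDS = {"power-on-reset", "error-reset", "user-reset"}


def parse_diag_trigger(value):
    """Split a diag-trigger setting into a validated set of reset keywords."""
    keywords = set(value.split())
    if not keywords:
        raise ValueError("diag-trigger needs at least one keyword")
    if keywords == {"none"}:
        return set()  # no reset event triggers diagnostics
    unknown = keywords - RESET_KEYWORDS
    if unknown:
        raise ValueError(f"unexpected keyword(s): {sorted(unknown)}")
    return keywords


def diagnostics_triggered(reset_event,
                          diag_trigger="power-on-reset error-reset"):
    """Normal mode only: does this class of reset trigger diagnostics?"""
    if reset_event not in RESET_KEYWORDS:
        raise ValueError(f"unknown reset event: {reset_event}")
    return reset_event in parse_diag_trigger(diag_trigger)
```

Under the default setting, a power-on reset or an error reset is selected but a user-initiated reset is not; adding user-reset to the setting selects all three.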


Normal Mode and Service Mode Information

You will find a full description of normal and service modes, as well as detailed information about the OpenBoot configuration variables that affect ASR behavior, in OpenBoot PROM Enhancements for Diagnostic Operation, which is available on the Sun Fire V890 Documentation CD.

ASR User Commands

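The OpenBoot commands .asr, asr-disable, and asr-enable are available for obtaining ASR status information and for manually deconfiguring or reconfiguring system devices. For more information, see:

  • How to Deconfigure a Device Manually
  • How to Reconfigure a Device Manually
  • How to Obtain ASR Status Information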


How to Enable ASR

The automatic system recovery (ASR) feature is enabled by default when the system is in normal mode. However, if you have edited the OpenBoot configuration variables controlling ASR, follow this procedure to restore them. See Reset Scenarios for more information.

What to Do

1. Type the following at the system ok prompt:


ok setenv diag-switch? false
ok setenv auto-boot? true
ok setenv auto-boot-on-error? true

2. Set the diag-trigger and diag-script variables as shown. Type:


ok setenv diag-trigger power-on-reset error-reset
ok setenv diag-script normal

The system permanently stores the parameter changes.


How to Disable ASR

To disable the automatic system recovery (ASR) feature, either place the system in service mode, or edit OpenBoot configuration variables as described in this procedure. See Reset Scenarios for more information.

What to Do

  • Type the following at the system ok prompt:


ok setenv auto-boot-on-error? false

The system permanently stores the parameter change.


About Manually Configuring Devices

This section explains the difference between deconfiguring a device and deconfiguring a slot, describes what happens if you try to deconfigure all of a system's processors, and discusses how to obtain device paths.

Deconfiguring Devices vs. Slots

For some devices, different things happen when you deconfigure a slot than when you deconfigure the device that resides within a slot.

If you deconfigure a PCI device, the device in question can still be probed by firmware and recognized by the operating system. Solaris OS "sees" such a device, reports it as failed, and refrains from using it.

If you deconfigure a PCI slot, firmware will not even probe the slot, and the operating system will not "know about" any devices that may be plugged into the slot.

In both cases, the devices in question are rendered unusable. So why make the distinction? Occasionally, a device may fail in such a way that probing it disrupts the system. In cases such as these, deconfiguring the slot in which the device resides is more likely to contain the problem.

Deconfiguring All System Processors

You can use the asr-disable command to deconfigure all system processors. Doing this will not crash the system. The OpenBoot system firmware, even though it reports all processors as deconfigured, in actuality keeps one processor functioning well enough to run the firmware.

Device Paths

When manually deconfiguring and reconfiguring devices, you might need to determine the full physical paths to those devices. You can do this by typing:


ok show-devs

The show-devs command lists the system devices and displays the full path name of each device. An example of a path name for a Fast Ethernet PCI card is shown below:


/pci@8,700000/pci@2/SUNW,hme@0,1

You can display a list of current device aliases by typing:


ok devalias

You can also create your own device alias for a physical device by typing:


ok devalias alias_name physical_device_path

where alias_name is the alias that you want to assign, and physical_device_path is the full physical device path for the device.



Note - If you manually deconfigure a device alias using asr-disable, and then assign a different alias to the device, the device will remain deconfigured even though the device alias has changed.



You can determine which devices are currently disabled by typing:


ok .asr

See How to Obtain ASR Status Information.

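The related deconfiguration and reconfiguration procedures are covered in:

  • How to Deconfigure a Device Manually
  • How to Reconfigure a Device Manually

Device identifiers are listed in:

  • Reference for Device Identifiers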


How to Deconfigure a Device Manually

To support the ability to boot even when nonessential components fail, the OpenBoot firmware provides the asr-disable command, which lets you manually deconfigure system devices. This command "marks" a specified device as disabled, by creating an appropriate "status" property in the corresponding device tree node. By convention, UNIX will not activate a driver for any device so marked.

What to Do

1. At the ok prompt, type:


ok asr-disable device-identifier

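where the device-identifier is one of the following:

  • Any full physical device path as reported by the OpenBoot show-devs command
  • Any valid device alias as reported by the OpenBoot devalias command
  • Any device identifier listed in Reference for Device Identifiers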



Note - Manually deconfiguring a single processor causes the entire CPU/Memory board to be deconfigured, including both processors and all memory residing on the board.



OpenBoot configuration variable changes take effect after the next system reset.

2. To effect the changes immediately, type:


ok reset-all



Note - To immediately effect the changes, you can also power cycle the system using the front panel Power button.




How to Reconfigure a Device Manually

You can use the OpenBoot asr-enable command to reconfigure any device that you previously deconfigured with asr-disable.

What to Do

1. At the ok prompt, type:


ok asr-enable device-identifier

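where the device-identifier is one of the following:

  • Any full physical device path as reported by the OpenBoot show-devs command
  • Any valid device alias as reported by the OpenBoot devalias command
  • Any device identifier listed in Reference for Device Identifiers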

2. Do one of the following:

a. If you are reconfiguring a processor, power cycle the system using the front panel Power button.

b. If you are reconfiguring any other device, type:


ok reset-all



Note - To reconfigure a processor, you must power cycle the system. The reset-all command will not suffice to bring the processor back online.




How to Obtain ASR Status Information

What to Do

  • Type the following at the system ok prompt:


ok .asr
ASR Disablement Status
Component:     Status
 
CMP0:          Enabled
Memory Bank0:  Disabled
Memory Bank1:  Enabled
Memory Bank2:  Enabled
Memory Bank3:  Enabled
CMP1/Memory:   Enabled
CMP2/Memory:   Enabled
CMP3/Memory:   Enabled
CMP4/Memory:   Enabled
CMP5/Memory:   Enabled
CMP6/Memory:   Enabled
CMP7/Memory:   Enabled
IO-Bridge8:    Enabled
IO-Bridge9:    Enabled
GPTwo Slots:   Enabled
Onboard SCSI:  Enabled
Onboard FCAL:  Enabled
Onboard GEM:   Enabled
PCI Slots:     Enabled
 
The following devices have been ASR disabled:
/pci@8,700000/TSI,gfxp@5

In the .asr command output, any devices marked disabled have been manually deconfigured using the asr-disable command. In this example, the .asr output shows that one of the memory banks controlled by CMP 0, as well as the frame buffer card in PCI slot 0, have been deconfigured.



Note - The .asr command only shows devices that have been manually disabled using the asr-disable command. It does not show devices that have been automatically deconfigured as a result of failing firmware diagnostics. To see which devices, if any, have failed POST diagnostics, use the show-post-results command, as described in Sun Fire V890 Diagnostics and Troubleshooting. You can find this document at: http://www.sun.com/documentation.
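If you keep a log of console sessions, output in the format shown above can be summarized with a short script. The following Python sketch is one possible approach, not a Sun-supplied utility. The function name and the abbreviated sample are assumptions, and the parsing relies only on the layout visible in the example output.

```python
# Abbreviated sample, copied from the example .asr output above.
SAMPLE_ASR_OUTPUT = """\
ASR Disablement Status
Component:     Status

CMP0:          Enabled
Memory Bank0:  Disabled
Memory Bank1:  Enabled
PCI Slots:     Enabled

The following devices have been ASR disabled:
/pci@8,700000/TSI,gfxp@5
"""


def parse_asr(text):
    """Return (component_status, disabled_paths) from captured .asr text."""
    status, paths = {}, []
    in_path_list = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("The following devices have been ASR disabled"):
            in_path_list = True  # remaining lines are device paths
        elif in_path_list:
            paths.append(line)
        elif ":" in line:
            name, _, state = line.partition(":")
            state = state.strip()
            # Skip the "Component: Status" header row.
            if state in ("Enabled", "Disabled"):
                status[name.strip()] = state
    return status, paths
```

For the sample text, the only component reported as Disabled is Memory Bank0, and the path list contains the single frame buffer device path. As the preceding note explains, devices deconfigured automatically by firmware diagnostics do not appear in this output.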



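For more information, see:

  • About Automatic System Recovery
  • How to Enable ASR
  • How to Disable ASR
  • How to Deconfigure a Device Manually
  • How to Reconfigure a Device Manually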


About OpenBoot Emergency Procedures

The following paragraphs describe the functions of the Stop commands on systems that use USB keyboards, such as the Sun Fire V890 system.

Stop-A Functionality

Stop-A (Abort) issues a break that drops the system into OpenBoot firmware control (indicated by the display of the ok prompt). The key sequence works the same on the Sun Fire V890 server as it does on systems with older keyboards, except that it does not work during the first few seconds after the machine is reset.

Stop-D Functionality

The Stop-D (diags) key sequence is not supported on systems with USB keyboards. However, the Stop-D functionality can be closely emulated by turning the system keyswitch to the Diagnostics position. For more information, see About the Status and Control Panel.

The RSC bootmode diag command also provides similar functionality. For more information, see the Sun Remote System Control (RSC) 2.2 User's Guide, which is included on the Sun Fire V890 Documentation CD.

Stop-F Functionality

The Stop-F functionality is not available in systems with USB keyboards. However, the RSC bootmode forth command provides similar functionality. For more information, see the Sun Remote System Control (RSC) 2.2 User's Guide, which is included on the Sun Fire V890 Documentation CD.

Stop-N Functionality

The Stop-N sequence is a method of bypassing problems typically encountered on systems with misconfigured OpenBoot configuration variables. On systems with older keyboards, you did this by pressing the Stop-N sequence while powering on the system.

On systems with USB keyboards, like the Sun Fire V890, the implementation is somewhat more cumbersome, and involves waiting for the system to reach a particular state. For instructions, see How to Implement Stop-N Functionality.

The drawback of using Stop-N on a Sun Fire V890 system is that, if diagnostics are enabled, it can take some time for the system to reach the desired state. Fortunately, an alternative exists: Place the system keyswitch in Diagnostics position.

Placing the system keyswitch in Diagnostics position will override OpenBoot configuration variable settings, allowing the system to recover to the ok prompt and letting you correct misconfigured settings.

If you have access to RSC software, another possibility is to use the RSC bootmode reset_nvram command, which provides similar functionality. For more information, see the Sun Remote System Control (RSC) 2.2 User's Guide, which is included on the Sun Fire V890 Documentation CD.


How to Implement Stop-N Functionality

Before You Begin

This procedure implements Stop-N functionality on Sun Fire V890 systems, temporarily resetting OpenBoot configuration variables to their default settings. This procedure is most useful if you have not configured your Sun Fire V890 system to run diagnostic tests. You might find it more convenient to use the alternative method of placing the system keyswitch in Diagnostics position. For more background, see:

  • About OpenBoot Emergency Procedures

For information about the system keyswitch, see:

  • About the Status and Control Panel

What to Do

1. Turn on the power to the system.

If POST diagnostics are configured to run, both the Fault and OK-to-Remove LEDs on the front panel will blink slowly.

2. Wait until only the system Fault LED begins to blink rapidly.



Note - If you have configured the Sun Fire V890 system to run diagnostic tests, this could take upwards of 30 minutes.



3. Press the front panel Power button twice, with no more than a short, one-second delay in between presses.

A screen similar to the following is displayed to indicate that you have temporarily reset OpenBoot configuration variables to their default values:


Setting NVRAM parameters to default values.
 
Probing I/O buses
 
Sun Fire V890, No Keyboard
Copyright 1998-2004 Sun Microsystems, Inc.  All rights reserved.
OpenBoot x.x, xxxx MB memory installed, Serial #xxxxxxxx.
Ethernet address x:x:x:x:x:x, Host ID: xxxxxxxx.
 
System is operating in Safe Mode and initialized with factory
default configuration.  No actual NVRAM configuration variables
have been changed; values may be displayed with 'printenv' and set
with 'setenv'. System will resume normal initialization and
configuration after the next hardware or software reset.
 
ok



Note - Once the front panel LEDs stop blinking and the Power/OK LED stays lit, pressing the Power button again will begin a graceful shutdown of the system.



What Next

During the execution of OpenBoot firmware code, all OpenBoot configuration variables--including the ones that are likely to cause problems, such as input and output device settings--are temporarily set to "safe" factory default values. The only exception to this is auto-boot?, which is set to false.

By the time the system displays the ok prompt, OpenBoot configuration variables have been returned to their original, and possibly misconfigured, values. These values do not take effect until the system is reset. You can display them with the printenv command and manually change them with the setenv command.

If you do nothing other than reset the system at this point, no values are permanently changed. All your customized OpenBoot configuration variable settings are retained, even ones that may have caused problems.

To correct such problems, you must either manually change individual OpenBoot configuration variables using the setenv command, or else type set-defaults to permanently restore the default settings for all OpenBoot configuration variables.


Reference for Device Identifiers

Refer to the following table when manually specifying which devices to deconfigure and reconfigure. The related procedures are covered in:

  • How to Deconfigure a Device Manually
  • How to Reconfigure a Device Manually



Note - The device identifiers above are not case-sensitive; you can type them as uppercase or lowercase characters.



You can use wild cards within device identifiers to reconfigure a range of devices, as shown in the following table.


Device Identifiers                        Devices

*                                         All devices
cmp*                                      All processors
cmp0-bank*, cmp1-bank*, ... cmp7-bank*    All memory banks for each processor
hba*                                      All PCI bridge chips
gptwo-slot*                               All CPU/Memory board slots
pci-slot*                                 All PCI slots
pci*                                      All on-board PCI devices (on-board Gigabit Ethernet, FC-AL, and IDE controllers) and all PCI slots




Note - You cannot deconfigure a range of devices. Wild cards are valid only for specifying a range of devices to reconfigure.




[1] A working alternate path to the boot disk is required. For more information, see About Multipathing Software.
[2] A single processor failure causes the entire CPU/Memory module to be deconfigured. Reboot requires that another functional CPU/Memory module be present.
[3] Since each physical DIMM belongs to two logical memory banks, the firmware deconfigures both memory banks associated with the affected DIMM. This leaves the CPU/Memory module operational, but with one of the processors having a reduced complement of memory.