CHAPTER 4
Firmware
The Netra CP3010 board contains a modular firmware architecture that gives you latitude in controlling boot initialization. You can customize the initialization, test the firmware, and even enable the installation of a custom operating system.
This platform also employs the Intelligent Platform Management controller (IPMC), described in Section 5.1.5, Intelligent Platform Management Controller (IPMC). The IPMC handles system management, hot-swap control, and some board hardware. The IPMC configuration is controlled by separate firmware.
This chapter contains the following sections:
Power-on self-test (POST) is a firmware program that helps determine whether a portion of the system has failed. POST verifies the core functionality of the system, including the CPU modules, motherboard, memory, and some on-board I/O devices. The software then generates messages that can be useful in determining the nature of a hardware failure. You can run POST even if the system is unable to boot.
POST detects most system faults and is located in the motherboard's OpenBoot PROM. You can program the OpenBoot software to run POST at power-on by setting two configuration variables: diag-switch? and diag-level. These two variables are stored on the system configuration card.
POST runs automatically when the system power is applied, or following an automatic system reset, if all of the following conditions apply:
If diag-level is set to min or max, POST performs an abbreviated or extended test, respectively.
If diag-level is set to menus, a menu of all the tests executed at power-on is displayed.
POST diagnostic and error message reports are displayed on a console.
You control POST diagnostics (and other aspects of the boot process) by setting OpenBoot configuration variables. Changes to OpenBoot configuration variables take effect only after the system is restarted.
TABLE 4-1 lists the most important and useful of these variables. You can find instructions for changing OpenBoot configuration variables in Section 4.5.1, Viewing and Setting OpenBoot Configuration Variables.
Refer to the OpenBoot PROM Enhancements for Diagnostic Operation (817-6957) document for more information.
Note - These variables affect OpenBoot diagnostics tests as well as POST diagnostics.
Once POST diagnostics have finished running, POST reports the status of each test to the OpenBoot firmware. Control then returns to the OpenBoot firmware code.
If POST diagnostics do not uncover a fault, and your server still does not start up, run OpenBoot diagnostics tests.
Where value is min, max, or menus, depending on the quantity of diagnostic information you want to see.
The system runs POST diagnostics if post-trigger is set to user-reset. Status and error messages are displayed in the console window. If POST detects an error, it displays an error message describing the failure.
5. When you have finished running POST, restore the value of diag-switch? to false by typing:
Resetting diag-switch? to false minimizes boot time.
OpenBoot PROM commands are commands you type at the ok prompt. OpenBoot PROM commands that can provide useful diagnostic information include:
1. Halt the system to reach the ok prompt.
Inform users before you shut down the system.
2. Type the appropriate command at the console prompt.
Refer to the OpenBoot 4.x Command Reference Manual (816-1177) for more commands.
The Solaris OS provides some predefined device aliases for the network devices so that you do not need to type the full device path name. TABLE 4-2 lists the network device aliases, the default Solaris OS device names, and associated ports for the Netra CP3010 board. The devalias command can be used to display the device aliases.
The probe-scsi and probe-scsi-all commands diagnose problems with the SCSI devices.
Caution - If you used the halt command or the Stop-A key sequence to reach the ok prompt, issuing the probe-scsi or probe-scsi-all command can hang the system.
The probe-scsi command communicates with all SCSI devices connected to on-board SCSI controllers. The probe-scsi-all command also accesses devices connected to any host adapters installed in PCI slots.
For any SCSI device that is connected and active, the probe-scsi and probe-scsi-all commands display its loop ID, host adapter, logical unit number, unique worldwide name (WWN), and a device description that includes type and manufacturer.
The following sample output is from the probe-scsi command.
{1} ok probe-scsi
Target 0
  Unit 0   Disk     SEAGATE ST373307LSUN72G 0207
Target 1
  Unit 0   Disk     SEAGATE ST336607LSUN36G 0207
{1} ok
The following sample output is from the probe-scsi-all command.
{1} ok probe-scsi-all
/pci@1c,600000/scsi@2,1
/pci@1c,600000/scsi@2
Target 0
  Unit 0   Disk     SEAGATE ST373307LSUN72G 0207
Target 1
  Unit 0   Disk     SEAGATE ST336607LSUN36G 0207
{1} ok
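If you capture this console output in a log, a short script can turn it into structured records. The following Python sketch is illustrative only. It assumes the output is laid out with one Target heading per line followed by its Unit lines, and the function name parse_probe_scsi and the record field names are hypothetical, not part of OpenBoot.

```python
import re

# Sample text in the layout assumed above (Target heading, then Unit lines).
SAMPLE = """\
Target 0
  Unit 0   Disk     SEAGATE ST373307LSUN72G 0207
Target 1
  Unit 0   Disk     SEAGATE ST336607LSUN36G 0207
"""

TARGET_RE = re.compile(r"^Target\s+(\w+)\s*$")
UNIT_RE = re.compile(r"^Unit\s+(\d+)\s+(\S+)\s+(.*)$")


def parse_probe_scsi(text):
    """Return one dict per Unit line found under a Target heading."""
    devices = []
    target = None
    for raw in text.splitlines():
        line = raw.strip()
        match = TARGET_RE.match(line)
        if match:
            # Remember the current target; kept as a string as printed.
            target = match.group(1)
            continue
        match = UNIT_RE.match(line)
        if match and target is not None:
            devices.append({
                "target": target,
                "unit": int(match.group(1)),
                "type": match.group(2),
                "description": match.group(3).strip(),
            })
    return devices


devices = parse_probe_scsi(SAMPLE)
for dev in devices:
    print(dev["target"], dev["unit"], dev["type"], dev["description"])
```

Lines that match neither pattern, such as device paths or the ok prompt, are ignored, so the same function can be applied to probe-scsi-all output under the same layout assumption.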
The probe-ide command communicates with all Integrated Drive Electronics (IDE) devices connected to the IDE bus. This is the internal system bus for media devices such as the DVD drive.
Caution - If you used the halt command or the Stop-A key sequence to reach the ok prompt, issuing the probe-ide command can hang the system.
CODE EXAMPLE 4-3 shows sample output from the probe-ide command.
{1} ok probe-ide
Device 0  ( Primary Master )
  Not Present
Device 1  ( Primary Slave )
  Not Present
Device 2  ( Secondary Master )
  Not Present
Device 3  ( Secondary Slave )
  Not Present
{1} ok
The show-devs command lists the hardware device paths for each device in the firmware device tree. CODE EXAMPLE 4-4 shows some sample output.
See the OpenBoot PROM Enhancements for Diagnostic Operation (817-6957) document for information on OpenBoot diagnostics. This document can be found on the Sun documentation web site at:
http://www.sun.com/documentation
Summaries of the results from the most recent power-on self-test (POST) and OpenBoot diagnostics tests are saved across power cycles.
2. Do either of the following:
This command produces a system-dependent list of hardware components, along with an indication of which components passed and which failed POST or OpenBoot diagnostics tests.
Switches and diagnostic configuration variables stored in the IDPROM determine how and when POST diagnostics and OpenBoot Diagnostics tests are performed. This section explains how to access and modify OpenBoot configuration variables. For a list of important OpenBoot configuration variables, see TABLE 4-1.
Changes to OpenBoot configuration variables take effect at the next reboot.
Halt the server to display the ok prompt.
The following example shows a short excerpt of this command's output.
The watch-net diagnostics test monitors Ethernet packets on the primary network interface. The watch-net-all diagnostics test monitors Ethernet packets on the primary network interface and on any additional network interfaces connected to the system board. Good packets received by the system are indicated by a period (.). Errors such as the framing error and the cyclic redundancy check (CRC) error are indicated with an X and an associated error description.
To start the watch-net diagnostic test, type the watch-net command at the ok prompt.
{0} ok watch-net
Internal loopback test -- succeeded.
Link is -- up
Looking for Ethernet Packets.
`.' is a Good Packet. `X' is a Bad Packet.
Type any key to stop.
................................
To start the watch-net-all diagnostic test, type watch-net-all at the ok prompt.
{0} ok watch-net-all
/pci@1f,0/pci@1,1/network@c,1
Internal loopback test -- succeeded.
Link is -- up
Looking for Ethernet Packets.
`.' is a Good Packet. `X' is a Bad Packet.
Type any key to stop.
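When a watch-net session is captured to a log, the stream of period and X indicators can be tallied to estimate how many packets were bad. The following Python sketch is one way to do that. The function name summarize_watch_net is hypothetical, and the sketch assumes you pass in only the indicator characters printed after the "Type any key to stop." line, with any error descriptions removed.

```python
from collections import Counter


def summarize_watch_net(indicators):
    """Count good ('.') and bad ('X') packet indicators in a captured stream."""
    # Ignore anything that is not one of the two indicator characters.
    counts = Counter(ch for ch in indicators if ch in ".X")
    good = counts["."]
    bad = counts["X"]
    total = good + bad
    return {
        "good": good,
        "bad": bad,
        # Avoid dividing by zero when no packets were seen.
        "error_rate": bad / total if total else 0.0,
    }


# Example stream: 15 good packets and 1 bad packet.
summary = summarize_watch_net("........X.......")
print(summary)
```

Passing the banner text as well would inflate the good-packet count, because sentences in the banner end with periods.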
The Netra CP3010 board boots from the 1-Mbyte system flash PROM device that contains the POST code and OpenBoot PROM. The contents map of this PROM is shown in FIGURE 4-1. User-developed code can also be programmed into the user flash memory space in the form of drop-ins. The system flash can be upgraded by running a program out of the OpenBoot PROM. It is not otherwise accessible by the user.
Note - Automatic system reconfiguration (ASR) is not the same as automatic server restart, which the Netra CT 900 server also supports.
Automatic system reconfiguration (ASR) consists of self-test features and an auto-configuring capability to detect failed hardware components and unconfigure them. When ASR is enabled, the server can resume operating after certain nonfatal hardware faults or failures have occurred.
If a component is monitored by ASR and the server is capable of operating without it, the server automatically reboots if that component develops a fault or fails. This capability prevents a faulty hardware component from stopping operation of the entire system or causing the system to fail repeatedly.
If a fault is detected during the power-on sequence, the faulty component is disabled. If the system remains capable of functioning, the boot sequence continues.
To support this degraded boot capability, the OpenBoot firmware uses the 1275 client interface (by means of the device tree) to mark a device as either failed or disabled, creating an appropriate status property in the device tree node. The Solaris OS does not activate a driver for any subsystem marked in this way.
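The effect of the status property can be pictured with a small model. The following Python sketch is not firmware code. It represents the device tree as a dictionary, and it assumes the status values "fail" and "disabled" for marked devices and treats a missing status as "okay". The function names mark_device and attachable are hypothetical.

```python
# Hypothetical in-memory stand-in for the firmware device tree:
# each device path maps to a dictionary of properties.
device_tree = {
    "/pci@1c,600000/scsi@2": {},
    "/pci@1f,0/pci@1,1/network@c,1": {},
}


def mark_device(tree, path, status):
    """Record a status property on a node, as the firmware does on a fault."""
    if status not in ("fail", "disabled"):
        raise ValueError("status must be 'fail' or 'disabled'")
    tree[path]["status"] = status


def attachable(tree):
    """Return the device paths for which the OS would activate a driver."""
    # A node with no status property is treated as healthy (assumption).
    return [
        path for path, props in tree.items()
        if props.get("status", "okay") == "okay"
    ]


# Diagnostics find a fault in the SCSI controller; the network device is fine.
mark_device(device_tree, "/pci@1c,600000/scsi@2", "fail")
usable = attachable(device_tree)
print(usable)
```

In this model the operating system simply skips any node that carries a failed or disabled status, which is what allows the rest of the system to boot in a degraded state.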
As long as a failed component is electrically dormant (not causing random bus errors or signal noise, for example), the system reboots automatically and resumes operation while a service call is made.
Once a failed or disabled device is replaced with a new one, the OpenBoot firmware automatically modifies the status of the device upon reboot.
Note - ASR is not enabled until you activate it (see Section 4.7.4, Enabling ASR).
The auto-boot? setting controls whether the firmware automatically boots the operating system after each reset. The default setting is true.
The auto-boot-on-error? setting controls whether the system attempts a degraded boot when a subsystem failure is detected. Both the auto-boot? and auto-boot-on-error? settings must be set to true to enable an automatic degraded boot.
Note - The default setting for auto-boot-on-error? is false. Therefore, the system does not attempt a degraded boot unless you change this setting to true. In addition, the system does not attempt a degraded boot in response to any fatal nonrecoverable error, even if degraded booting is enabled. For examples of fatal nonrecoverable errors, see Section 4.7.2, Error-Handling Summary.
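The interaction between these two settings can be summarized as a small decision function. The following Python sketch only restates the rules described above. The function name boot_action and the returned labels are hypothetical, and the sketch does not model any other condition the firmware may check.

```python
def boot_action(auto_boot, auto_boot_on_error, fault_detected, fatal=False):
    """Return the boot behavior implied by the two OpenBoot settings."""
    if not auto_boot:
        # auto-boot? is false: the firmware does not boot the OS by itself.
        return "no-auto-boot"
    if not fault_detected:
        return "boot"
    if fatal:
        # Fatal nonrecoverable errors never lead to a degraded boot.
        return "no-auto-boot"
    if auto_boot_on_error:
        return "degraded-boot"
    return "no-auto-boot"


# Default settings: auto-boot? is true, auto-boot-on-error? is false.
print(boot_action(True, False, fault_detected=False))
print(boot_action(True, False, fault_detected=True))
# Both settings true: a nonfatal fault leads to a degraded boot.
print(boot_action(True, True, fault_detected=True))
```

With the default settings, a detected subsystem failure therefore stops the automatic boot; a degraded boot occurs only when both variables are true and the error is not fatal.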
Error handling during the power-on sequence can be summarized in the following three ways:
Three OpenBoot configuration variables, diag-switch?, diag-trigger, and diag-script, control how the system runs firmware diagnostics in response to system reset events.
The standard system reset protocol bypasses POST and OpenBoot Diagnostics unless diag-switch? is set to true. The default setting for this variable is false. Because ASR relies on firmware diagnostics to detect faulty devices, diag-switch? must be set to true for ASR to run. For instructions, see Section 4.7.4, Enabling ASR.
To control which reset events, if any, automatically initiate firmware diagnostics, use diag-trigger. For detailed explanations of these variables and their uses, see Section 4.1.1, Controlling POST Diagnostics.
1. At the system ok prompt, type:
2. Set the diag-trigger variable to power-on-reset, error-reset, or user-reset.
The system permanently stores the parameter changes and boots automatically if the OpenBoot variable auto-boot? is set to true (its default value).
Note - To store parameter changes, you can also power-cycle the system by using the On/Standby button in the front panel.
1. At the system ok prompt, type:
The system permanently stores the parameter change.
Note - To store parameter changes, you can also power-cycle the system by using the On/Standby button in the front panel.
Copyright © 2006, Sun Microsystems, Inc. All Rights Reserved.