CHAPTER 4

Firmware

The Netra CP3010 board uses a modular firmware architecture that gives you latitude in controlling boot initialization. You can customize the initialization, test the firmware, and even enable the installation of a custom operating system.

This platform also employs the Intelligent Platform Management Controller (IPMC), which handles system management, hot-swap control, and some board hardware. The IPMC is described in Section 5.1.5, Intelligent Platform Management Controller (IPMC), and its configuration is controlled by separate firmware.

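This chapter contains the following sections:

  • Section 4.1, Power-On Self-Test Diagnostics
  • Section 4.2, OpenBoot PROM Commands
  • Section 4.3, OpenBoot Diagnostics
  • Section 4.4, Recent Diagnostic Test Results
  • Section 4.5, OpenBoot Configuration Variables
  • Section 4.6, Firmware Memory Map
  • Section 4.7, Automatic System Reconfiguration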


4.1 Power-On Self-Test Diagnostics

Power-on self-test (POST) is a firmware program that helps determine whether a portion of the system has failed. POST verifies the core functionality of the system, including the CPU modules, motherboard, memory, and some on-board I/O devices. The software then generates messages that can be useful in determining the nature of a hardware failure. You can run POST even if the system is unable to boot.

POST detects most system faults and is located in the motherboard's OpenBoot PROM. You can program the OpenBoot software to run POST at power-on by setting two environment variables, diag-switch? and diag-level. These two variables are stored on the system configuration card.

POST runs automatically when the system power is applied, or following an automatic system reset, if all of the following conditions apply:

If diag-level is set to min or max, POST performs an abbreviated or extended test, respectively.

If diag-level is set to menus, a menu of all the tests executed at power-on is displayed.

POST diagnostic and error message reports are displayed on a console.

4.1.1 Controlling POST Diagnostics

You control POST diagnostics (and other aspects of the boot process) by setting OpenBoot configuration variables. Changes to OpenBoot configuration variables take effect only after the system is restarted.

TABLE 4-1 lists the most important and useful of these variables. You can find instructions for changing OpenBoot configuration variables in Section 4.5.1, Viewing and Setting OpenBoot Configuration Variables.

Refer to the OpenBoot PROM Enhancements for Diagnostic Operation (817-6957) document for more information.

 

 


TABLE 4-1 OpenBoot Configuration Variables

OpenBoot Configuration Variable

Description and Keywords

auto-boot?

Determines whether the system automatically boots. Default is true.

  • true - System automatically boots after initialization, provided no firmware-based (diagnostics or OpenBoot) errors are detected.
  • false - System remains at the ok prompt until you type boot.

diag-level

Specifies the level or type of diagnostics that are executed. Default is max.

  • off - No testing.
  • min - Basic tests are run.
  • max - More extensive tests might be run, depending on the device. Memory is extensively checked.

diag-script

Determines which devices are tested by OpenBoot Diagnostics. Default is normal.

  • none - OpenBoot Diagnostics do not run.
  • normal - Tests all devices that are expected to be present in the system's baseline configuration for which self-tests exist.
  • all - Tests all devices that have self-tests.

diag-switch?

Controls diagnostic execution in normal mode. Default is false.

  • true - Diagnostics are executed only on power-on reset events, but the level of test coverage, verbosity, and output is determined by user-defined settings.
  • false - Diagnostics are executed upon the next system reset, but only for those classes of reset events specified by the OpenBoot configuration variable diag-trigger. The level of test coverage, verbosity, and output is determined by user-defined settings.

diag-trigger

Specifies the class of reset event that causes diagnostics to run automatically. Default setting is power-on-reset error-reset.

  • none - Diagnostic tests are not executed.
  • error-reset - Reset that is caused by certain hardware error events such as RED State Exception Reset, Watchdog Resets, Software-Instruction Reset, or Hardware Fatal Reset.
  • power-on-reset - Reset that is caused by power cycling the system.
  • user-reset - Reset that is initiated by an operating system panic or by user-initiated commands from OpenBoot (reset-all or boot) or from Solaris (reboot, shutdown, or init).
  • all-resets - Any kind of system reset.

Note: Both POST and OpenBoot Diagnostics run at the specified reset event if the variable diag-script is set to normal or all. If diag-script is set to none, only POST runs.

input-device

Selects where console input is taken from. Default is ttya.

  • ttya - From built-in SERIAL MGT port.
  • ttyb - From built-in general purpose serial port (10101).

output-device

Selects where diagnostic and other console output is displayed. Default is ttya.

  • ttya - To built-in SERIAL MGT port.
  • ttyb - To built-in general purpose serial port (10101).



Note - These variables affect OpenBoot diagnostics tests as well as POST diagnostics.
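The keywords in TABLE 4-1 can be captured in a small lookup table, which may be useful when a provisioning script needs to check a value before sending a setenv command to the console. The following Python sketch is illustrative only. The names ALLOWED and setenv_command are hypothetical helpers that run on a management host; they are not part of OpenBoot. Only the single-keyword variables from the table are covered (diag-trigger, which accepts a list of keywords, is omitted), and auto-boot? is spelled as it is used with setenv in Section 4.7.1. The menus keyword mentioned elsewhere in this chapter is not listed in the table, so this sketch rejects it.

```python
# Hypothetical host-side helper; keyword sets are copied from TABLE 4-1.
ALLOWED = {
    "auto-boot?": {"true", "false"},
    "diag-level": {"off", "min", "max"},
    "diag-script": {"none", "normal", "all"},
    "diag-switch?": {"true", "false"},
    "input-device": {"ttya", "ttyb"},
    "output-device": {"ttya", "ttyb"},
}


def setenv_command(variable, value):
    """Return the setenv line to type at the ok prompt, or raise ValueError."""
    if variable not in ALLOWED:
        raise ValueError(f"unknown variable: {variable}")
    if value not in ALLOWED[variable]:
        choices = ", ".join(sorted(ALLOWED[variable]))
        raise ValueError(f"{variable} accepts only: {choices}")
    return f"setenv {variable} {value}"


if __name__ == "__main__":
    print(setenv_command("diag-level", "max"))
```

Rejecting unknown keywords on the host side avoids discovering a typing mistake only after the next reset.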



Once POST diagnostics have finished running, POST reports the status of each test to the OpenBoot firmware. Control then returns to the OpenBoot firmware code.

If POST diagnostics do not uncover a fault, and your server still does not start up, run OpenBoot diagnostics tests.

4.1.2 Starting POST Diagnostics

1. Go to the ok prompt.

2. Type:


ok setenv diag-switch? true

3. Type:


ok setenv diag-level value

Where value is min, max, or menus, depending on the quantity of diagnostic information you want to see.

4. Type:


ok reset-all

The system runs POST diagnostics if diag-trigger is set to user-reset. Status and error messages are displayed in the console window. If POST detects an error, it displays an error message describing the failure.

5. When you have finished running POST, restore the value of diag-switch? to false by typing:


ok setenv diag-switch? false

Resetting diag-switch? to false minimizes boot time.


4.2 OpenBoot PROM Commands

OpenBoot PROM commands are commands you type at the ok prompt. OpenBoot PROM commands that can provide useful diagnostic information include:

  • probe-scsi and probe-scsi-all
  • probe-ide
  • show-devs

4.2.1 Running OpenBoot PROM Commands

1. Halt the system to reach the ok prompt.

Inform users before you shut down the system.

2. Type the appropriate command at the console prompt.

Refer to the OpenBoot 4.x Command Reference Manual (816-1177) for more commands.

4.2.1.1 Network Device Aliases

The Solaris OS provides some predefined device aliases for the network devices so that you do not need to type the full device path name. TABLE 4-2 lists the network device aliases, the default Solaris OS device names, and associated ports for the Netra CP3010 board. Use the devalias command to display the device aliases.


TABLE 4-2 Network Device Aliases

Device Alias   Default Solaris Device Name   BDM5704 Port
net, net0      bge0                          Base Fabric Ethernet 0
net1           bge1                          Base Fabric Ethernet 1
net2           bge4                          Management Ethernet0 (Ethernet port A on front panel)
net3           bge5                          Management Ethernet1 (Ethernet port B on front panel)
net4           bge2                          Extended Fabric Ethernet 0 (PICMG 3.1)
net5           bge3                          Extended Fabric Ethernet 1 (PICMG 3.1)
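Because the alias numbers and the bge instance numbers do not line up (net2 maps to bge4, while net4 maps to bge2), a lookup table can help scripts translate between the two. The following Python sketch simply transcribes TABLE 4-2. The names NETWORK_ALIASES, solaris_device, and aliases_for are hypothetical and exist only for illustration.

```python
# Alias -> (default Solaris device name, port), transcribed from TABLE 4-2.
NETWORK_ALIASES = {
    "net": ("bge0", "Base Fabric Ethernet 0"),
    "net0": ("bge0", "Base Fabric Ethernet 0"),
    "net1": ("bge1", "Base Fabric Ethernet 1"),
    "net2": ("bge4", "Management Ethernet0 (Ethernet port A on front panel)"),
    "net3": ("bge5", "Management Ethernet1 (Ethernet port B on front panel)"),
    "net4": ("bge2", "Extended Fabric Ethernet 0 (PICMG 3.1)"),
    "net5": ("bge3", "Extended Fabric Ethernet 1 (PICMG 3.1)"),
}


def solaris_device(alias):
    """Return the default Solaris device name for an OpenBoot alias."""
    return NETWORK_ALIASES[alias][0]


def aliases_for(device):
    """Return every alias that maps to the given Solaris device name."""
    return [a for a, (dev, _port) in NETWORK_ALIASES.items() if dev == device]


if __name__ == "__main__":
    print(solaris_device("net2"))
    print(aliases_for("bge0"))
```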


4.2.2 probe-scsi and probe-scsi-all Commands

The probe-scsi and probe-scsi-all commands diagnose problems with the SCSI devices.



Caution - If you used the halt command or the Stop-A key sequence to reach the ok prompt, issuing the probe-scsi or probe-scsi-all command can hang the system.



The probe-scsi command communicates with all SCSI devices connected to on-board SCSI controllers. The probe-scsi-all command also accesses devices connected to any host adapters installed in PCI slots.

For any SCSI device that is connected and active, the probe-scsi and probe-scsi-all commands display its loop ID, host adapter, logical unit number, unique worldwide name (WWN), and a device description that includes type and manufacturer.

The following sample output is from the probe-scsi command.

CODE EXAMPLE 4-1 probe-scsi Command Output

{1} ok probe-scsi
Target 0 
  Unit 0   Disk     SEAGATE ST373307LSUN72G 0207
Target 1 
  Unit 0   Disk     SEAGATE ST336607LSUN36G 0207
{1} ok 

The following sample output is from the probe-scsi-all command.

CODE EXAMPLE 4-2 probe-scsi-all Command Output

{1} ok probe-scsi-all
/pci@1c,600000/scsi@2,1
 
/pci@1c,600000/scsi@2
Target 0 
  Unit 0   Disk     SEAGATE ST373307LSUN72G 0207
Target 1 
  Unit 0   Disk     SEAGATE ST336607LSUN36G 0207
 
{1} ok 
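If you capture the console output of these commands (for example, through a serial console log), the Target and Unit lines can be turned into structured records. The following Python sketch assumes the layout shown in the sample output above and nothing more. The function name parse_probe_scsi and the record field names are hypothetical, and output from other device types or firmware revisions may not match these patterns.

```python
import re

# Patterns based only on the sample output layout in this section.
TARGET_RE = re.compile(r"^Target\s+(\w+)$")
UNIT_RE = re.compile(r"^Unit\s+(\w+)\s+(\S+)\s+(.*\S)$")

SAMPLE = """\
{1} ok probe-scsi
Target 0
  Unit 0   Disk     SEAGATE ST373307LSUN72G 0207
Target 1
  Unit 0   Disk     SEAGATE ST336607LSUN36G 0207
{1} ok
"""


def parse_probe_scsi(text):
    """Return one dict per Unit line, tagged with the enclosing target."""
    devices = []
    target = None
    for raw in text.splitlines():
        line = raw.strip()
        match = TARGET_RE.match(line)
        if match:
            target = match.group(1)
            continue
        match = UNIT_RE.match(line)
        if match and target is not None:
            devices.append({
                "target": target,          # kept as text, not converted
                "unit": match.group(1),
                "type": match.group(2),
                "description": match.group(3),
            })
    return devices


if __name__ == "__main__":
    for device in parse_probe_scsi(SAMPLE):
        print(device)
```

Target and unit identifiers are kept as strings because this section does not state whether they are always decimal.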

4.2.3 probe-ide Command

The probe-ide command communicates with all Integrated Drive Electronics (IDE) devices connected to the IDE bus. This is the internal system bus for media devices such as the DVD drive.



Caution - If you used the halt command or the Stop-A key sequence to reach the ok prompt, issuing the probe-ide command can hang the system.



CODE EXAMPLE 4-3 shows sample output from the probe-ide command.

CODE EXAMPLE 4-3 probe-ide Command Output

{1} ok probe-ide
  Device 0  ( Primary Master )
         Not Present

  Device 1  ( Primary Slave )
         Not Present

  Device 2  ( Secondary Master )
         Not Present

  Device 3  ( Secondary Slave )
         Not Present
 
{1} ok 

4.2.4 show-devs Command

The show-devs command lists the hardware device paths for each device in the firmware device tree. CODE EXAMPLE 4-4 shows some sample output.


CODE EXAMPLE 4-4 show-devs Command Output
{1} ok show-devs
/i2c@1f,464000
/pci@1f,700000
/pci@1e,600000
/pci@1d,700000
/pci@1c,600000
/memory-controller@1,0
/SUNW,UltraSPARC-IIIi@1,0
/memory-controller@0,0
/SUNW,UltraSPARC-IIIi@0,0
/virtual-memory
/memory@m0,0
/aliases
/options
/openprom
/chosen
/packages
/i2c@1f,464000/idprom@0,ae
/i2c@1f,464000/nvram@0,ae
/pci@1f,700000/scsi@1
/pci@1f,700000/scsi@1/disk
/pci@1f,700000/scsi@1/tape
/pci@1e,600000/network@2,1
/pci@1e,600000/network@2
/pci@1e,600000/ide@d
/pci@1e,600000/pmu@6
/pci@1e,600000/isa@7
/pci@1e,600000/ide@d/cdrom
/pci@1e,600000/ide@d/disk
/pci@1e,600000/pmu@6/gpio@80000000,b9
/pci@1e,600000/isa@7/ipmc@0,2e8
/pci@1e,600000/isa@7/serial@0,3e8
/pci@1e,600000/isa@7/serial@0,3f8
/pci@1e,600000/isa@7/rtc@0,70
/pci@1e,600000/isa@7/flashprom@2,0
/pci@1e,600000/isa@7/ipmc@0,2e8/i2c@81
/pci@1e,600000/isa@7/ipmc@0,2e8/i2c@81/motherboard-fru-prom@81,a8
/pci@1e,600000/isa@7/ipmc@0,2e8/i2c@81/dimm-spd@81,a6
/pci@1e,600000/isa@7/ipmc@0,2e8/i2c@81/dimm-spd@81,a4
/pci@1e,600000/isa@7/ipmc@0,2e8/i2c@81/dimm-spd@81,a2
/pci@1e,600000/isa@7/ipmc@0,2e8/i2c@81/dimm-spd@81,a0
/pci@1d,700000/ethernet@2,1
/pci@1d,700000/ethernet@2
/pci@1c,600000/network@2,1
/pci@1c,600000/network@2
/openprom/client-services
/packages/SUNW,asr
/packages/SUNW,fru-device
/packages/SUNW,i2c-ram-device
/packages/obp-tftp
/packages/kbd-translator
/packages/dropins
/packages/terminal-emulator
/packages/disk-label
/packages/deblocker
/packages/SUNW,builtin-drivers
{1} ok
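The show-devs listing is long, so it can be convenient to filter a captured copy for the nodes of interest. The following Python sketch picks out device paths whose final node is named network or ethernet. Treating those two node names as the network interfaces is an assumption based on the sample output above, and the function name network_nodes is hypothetical.

```python
def network_nodes(show_devs_output):
    """Return device paths whose last node is named network or ethernet."""
    paths = []
    for raw in show_devs_output.splitlines():
        path = raw.strip()
        if not path.startswith("/"):
            continue  # skip the ok prompt and blank lines
        leaf = path.rsplit("/", 1)[-1]      # for example "network@2,1"
        name = leaf.split("@", 1)[0]        # for example "network"
        if name in ("network", "ethernet"):
            paths.append(path)
    return paths


if __name__ == "__main__":
    sample = """\
{1} ok show-devs
/pci@1e,600000/network@2,1
/pci@1e,600000/ide@d
/pci@1d,700000/ethernet@2
/aliases
{1} ok
"""
    for path in network_nodes(sample):
        print(path)
```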


4.3 OpenBoot Diagnostics

See the OpenBoot PROM Enhancements for Diagnostic Operation (817-6957) document for information on OpenBoot diagnostics. This document can be found on the Sun documentation web site at:

http://www.sun.com/documentation


4.4 Recent Diagnostic Test Results

Summaries of the results from the most recent power-on self-test (POST) and OpenBoot diagnostics tests are saved across power cycles.

4.4.1 Viewing Recent Test Results

1. Go to the ok prompt.

2. Do either of the following:

This command produces a system-dependent list of hardware components, along with an indication of which components passed and which failed POST or OpenBoot diagnostics tests.


4.5 OpenBoot Configuration Variables

Switches and diagnostic configuration variables stored in the IDPROM determine how and when POST diagnostics and OpenBoot Diagnostics tests are performed. This section explains how to access and modify OpenBoot configuration variables. For a list of important OpenBoot configuration variables, see TABLE 4-1.

Changes to OpenBoot configuration variables take effect at the next reboot.

4.5.1 Viewing and Setting OpenBoot Configuration Variables

  • Halt the server to display the ok prompt.

The following example shows a short excerpt of the output from the printenv command, which displays the current value and the default value of each variable.


ok printenv
Variable Name         Value                          Default Value
 
diag-level            min                            min
diag-switch?          false                          false
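A captured printenv listing can be checked for variables that differ from their defaults. The following Python sketch assumes the three-column layout shown in the excerpt above, with columns separated by two or more spaces. Lines that do not split into exactly three fields (for example, variables with an empty value) are skipped. The function names parse_printenv and changed_from_default are hypothetical.

```python
import re


def parse_printenv(text):
    """Map each variable name to a (value, default) pair."""
    settings = {}
    for raw in text.splitlines():
        # Assumption: columns are separated by at least two spaces.
        fields = re.split(r"\s{2,}", raw.strip())
        if len(fields) != 3 or fields[0] == "Variable Name":
            continue  # skip the header, prompts, and incomplete rows
        name, value, default = fields
        settings[name] = (value, default)
    return settings


def changed_from_default(settings):
    """Return the sorted names of variables whose value is not the default."""
    return sorted(n for n, (value, default) in settings.items() if value != default)


if __name__ == "__main__":
    sample = (
        "Variable Name         Value                          Default Value\n"
        "\n"
        "diag-level            min                            min\n"
        "diag-switch?          true                           false\n"
    )
    print(changed_from_default(parse_printenv(sample)))
```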

4.5.2 Using the watch-net and watch-net-all Commands to Check the Network Connections

The watch-net diagnostics test monitors Ethernet packets on the primary network interface. The watch-net-all diagnostics test monitors Ethernet packets on the primary network interface and on any additional network interfaces connected to the system board. Good packets received by the system are indicated by a period (.). Errors such as framing errors and cyclic redundancy check (CRC) errors are indicated with an X and an associated error description.

  • To start the watch-net diagnostic test, type watch-net at the ok prompt.



{0} ok watch-net
Internal loopback test -- succeeded.
Link is -- up
Looking for Ethernet Packets.
`.' is a Good Packet. `X' is a Bad Packet.
Type any key to stop.................................
 

  • To start the watch-net-all diagnostic test, type watch-net-all at the ok prompt.



{0} ok watch-net-all
/pci@1f,0/pci@1,1/network@c,1
Internal loopback test -- succeeded.
Link is -- up 
Looking for Ethernet Packets.
`.' is a Good Packet. `X' is a Bad Packet.
Type any key to stop.
 


4.6 Firmware Memory Map

The Netra CP3010 board boots from the 1-Mbyte system flash PROM device, which contains the POST code and OpenBoot PROM. FIGURE 4-1 shows the content map of this PROM. User-developed code can also be programmed into the user flash memory space in the form of drop-ins. The system flash can be upgraded by running a program out of the OpenBoot PROM. It is not otherwise accessible by the user.


FIGURE 4-1 System Flash PROM Map

[Figure: diagram of the system flash PROM map]



4.7 Automatic System Reconfiguration



Note - Automatic system reconfiguration (ASR) is not the same as automatic server restart, which the Netra CT 900 server also supports.



Automatic system reconfiguration (ASR) consists of self-test features and an auto-configuring capability to detect failed hardware components and unconfigure them. When ASR is enabled, the server can resume operating after certain nonfatal hardware faults or failures have occurred.

If a component is monitored by ASR and the server is capable of operating without it, the server automatically reboots if that component develops a fault or fails. This capability prevents a faulty hardware component from stopping operation of the entire system or causing the system to fail repeatedly.

If a fault is detected during the power-on sequence, the faulty component is disabled. If the system remains capable of functioning, the boot sequence continues.

To support this degraded boot capability, the OpenBoot firmware uses the IEEE 1275 client interface (by means of the device tree) to mark a device as either failed or disabled, creating an appropriate status property in the device tree node. The Solaris OS does not activate a driver for any subsystem marked in this way.

As long as a failed component is electrically dormant (not causing random bus errors or signal noise, for example), the system reboots automatically and resumes operation while a service call is made.

Once a failed or disabled device is replaced with a new one, the OpenBoot firmware automatically modifies the status of the device upon reboot.



Note - ASR is not enabled until you activate it (see Section 4.7.4, Enabling ASR).



4.7.1 Setting Autoboot Options

The auto-boot? setting controls whether the firmware automatically boots the operating system after each reset. The default setting is true.

The auto-boot-on-error? setting controls whether the system attempts a degraded boot when a subsystem failure is detected. Both the auto-boot? and auto-boot-on-error? settings must be set to true to enable an automatic degraded boot.

  • To set the switches, type:


ok setenv auto-boot? true
ok setenv auto-boot-on-error? true



Note - The default setting for auto-boot-on-error? is false. Therefore, the system does not attempt a degraded boot unless you change this setting to true. In addition, the system does not attempt a degraded boot in response to any fatal nonrecoverable error, even if degraded booting is enabled. For examples of fatal nonrecoverable errors, see Section 4.7.2, Error-Handling Summary.
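The interaction between the two settings can be summarized as a small decision function. The following Python sketch restates the rules from this section only. The function name boot_action, the fault labels, and the returned strings are hypothetical, and the sketch does not model which specific errors the firmware classifies as fatal.

```python
def boot_action(auto_boot, auto_boot_on_error, fault=None):
    """Return the expected boot behavior after a reset.

    auto_boot, auto_boot_on_error -- booleans mirroring the two settings
    fault -- None, "nonfatal", or "fatal" (labels chosen for this sketch)
    """
    if not auto_boot:
        return "no automatic boot"
    if fault is None:
        return "boot"
    # A degraded boot needs both settings true and a nonfatal fault.
    if fault == "nonfatal" and auto_boot_on_error:
        return "degraded boot"
    return "no automatic boot"


if __name__ == "__main__":
    # Default settings: auto-boot? true, auto-boot-on-error? false.
    print(boot_action(True, False, fault="nonfatal"))
    print(boot_action(True, True, fault="nonfatal"))
```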



4.7.2 Error-Handling Summary

Error handling during the power-on sequence can be summarized in the following three ways:



Note - If POST or OpenBoot Diagnostics detect a nonfatal error associated with the normal boot device, the OpenBoot firmware automatically unconfigures the failed device and tries the next-in-line boot device, as specified by the boot-device configuration variable.



4.7.3 Reset Scenarios

Three OpenBoot configuration variables, diag-switch?, diag-trigger, and diag-script, control how the system runs firmware diagnostics in response to system reset events.

The standard system reset protocol bypasses POST and OpenBoot Diagnostics unless diag-switch? is set to true. The default setting for this variable is false. Because ASR relies on firmware diagnostics to detect faulty devices, diag-switch? must be set to true for ASR to run. For instructions, see Section 4.7.4, Enabling ASR.

To control which reset events, if any, automatically initiate firmware diagnostics, use diag-trigger. For detailed explanations of these variables and their uses, see Section 4.1.1, Controlling POST Diagnostics.
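The way diag-trigger and diag-script combine can be sketched as a small function. The following Python sketch follows the keyword descriptions and the note in TABLE 4-1. It deliberately ignores diag-switch? and diag-level, and it assumes that diag-trigger holds a space-separated list of keywords, as the default value (power-on-reset error-reset) suggests. The function name diagnostics_for_reset is hypothetical.

```python
def diagnostics_for_reset(event, diag_trigger, diag_script):
    """Return the firmware diagnostics expected to run for a reset event.

    event        -- "error-reset", "power-on-reset", or "user-reset"
    diag_trigger -- space-separated keywords, e.g. "power-on-reset error-reset"
    diag_script  -- "none", "normal", or "all"
    """
    triggers = set(diag_trigger.split())
    if "none" in triggers:
        return []
    if "all-resets" not in triggers and event not in triggers:
        return []
    # Per the note in TABLE 4-1: only POST runs when diag-script is none;
    # otherwise both POST and OpenBoot Diagnostics run.
    if diag_script == "none":
        return ["POST"]
    return ["POST", "OpenBoot Diagnostics"]


if __name__ == "__main__":
    default_trigger = "power-on-reset error-reset"
    print(diagnostics_for_reset("user-reset", default_trigger, "normal"))
    print(diagnostics_for_reset("power-on-reset", default_trigger, "normal"))
```

With the default trigger setting, a user-initiated reset does not match any listed class, which is why Section 4.7.4 changes diag-trigger in its example.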

4.7.4 Enabling ASR

1. At the system ok prompt, type:


ok setenv diag-switch? true
ok setenv auto-boot? true
ok setenv auto-boot-on-error? true

2. Set the diag-trigger variable to power-on-reset, error-reset, or user-reset.

For example, type:


ok setenv diag-trigger user-reset

3. Type:


ok reset-all

The system permanently stores the parameter changes and boots automatically if the OpenBoot variable auto-boot? is set to true (its default value).



Note - To store parameter changes, you can also power-cycle the system by using the On/Standby button on the front panel.
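After working through this procedure, you might want to confirm that the variables hold the values ASR needs. The following Python sketch checks a dictionary of variable values (for example, one built by hand from printenv output) against the settings used in the steps above. The function name asr_problems and the message wording are hypothetical, and the check covers only the variables named in this procedure.

```python
# Variables that the enabling procedure sets to true.
REQUIRED_TRUE = ("diag-switch?", "auto-boot?", "auto-boot-on-error?")


def asr_problems(settings):
    """Return a list of messages describing settings that block ASR."""
    problems = []
    for name in REQUIRED_TRUE:
        if settings.get(name) != "true":
            problems.append(f"{name} should be true")
    triggers = settings.get("diag-trigger", "").split()
    if not triggers or "none" in triggers:
        problems.append("diag-trigger should name at least one reset class")
    return problems


if __name__ == "__main__":
    current = {
        "diag-switch?": "true",
        "auto-boot?": "true",
        "auto-boot-on-error?": "false",
        "diag-trigger": "user-reset",
    }
    for message in asr_problems(current):
        print(message)
```

An empty list means that the variables covered by this procedure are set as the steps describe.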



4.7.5 Disabling ASR

1. At the system ok prompt, type:


ok setenv diag-switch? false

2. Type:


ok reset-all

The system permanently stores the parameter change.



Note - To store parameter changes, you can also power-cycle the system by using the On/Standby button on the front panel.