CHAPTER 4

Network Interfaces and System Firmware

This chapter describes the networking options of the system and provides background information about the system's firmware.

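Information covered in this chapter includes:

  • About the Network Interfaces
  • About Redundant Network Interfaces
  • About the ok Prompt
  • About OpenBoot Environmental Monitoring
  • About OpenBoot Emergency Procedures
  • About Automatic System Recovery
  • About Manually Configuring Devices
  • Reference for Device Identifiers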


About the Network Interfaces

The Sun Fire V490 server provides two on-board Ethernet interfaces, which reside on the system centerplane and conform to the IEEE 802.3z Ethernet standard. For an illustration of the Ethernet ports, refer to FIGURE 2-4. The Ethernet interfaces operate at 10 Mbps, 100 Mbps, and 1000 Mbps.

Two back panel ports with RJ-45 connectors provide access to the on-board Ethernet interfaces. Each interface is configured with a unique media access control (MAC) address. Each connector features two LEDs, as described in TABLE 4-1.


TABLE 4-1 Ethernet Port LEDs

Name        Description
Activity    This amber LED lights when data is either being transmitted or received by the particular port.
Link Up     This green LED lights when a link is established at the particular port with its link partner.


Additional Ethernet interfaces or connections to other network types are available by installing the appropriate PCI interface cards. An additional network interface card can serve as a redundant network interface for one of the system's on-board interfaces. If the active network interface becomes unavailable, the system can automatically switch to the redundant interface to maintain availability. This capability is known as automatic failover and must be configured at the Solaris OS level. For additional details, refer to About Redundant Network Interfaces.

The Ethernet driver is installed automatically during the Solaris installation procedure.

For instructions on configuring the system network interfaces, refer to:


About Redundant Network Interfaces

You can configure your system with redundant network interfaces to provide a highly available network connection. Such a configuration relies on special Solaris software features to detect a failed or failing network interface and automatically switch all network traffic over to the redundant interface. This capability is known as automatic failover.

To set up redundant network interfaces, you can enable automatic failover between the two similar interfaces using the IP Network Multipathing feature of the Solaris OS. For additional details, refer to About Multipathing Software. You can also install a pair of identical PCI network interface cards, or add a single card that provides an interface identical to one of the two on-board Ethernet interfaces.

To help maximize system availability, make sure that any redundant network interfaces reside on separate PCI buses, supported by separate PCI bridges. For additional details, refer to About the PCI Cards and Buses.


About the ok Prompt

A Sun Fire V490 system with Solaris OS software is capable of operating at different run levels. A synopsis of run levels follows; for a full description, refer to the Solaris system administration documentation.

Most of the time, you operate a Sun Fire V490 system at run level 2 or run level 3, which are multiuser states with access to full system and network resources. Occasionally, you may operate the system at run level 1, which is a single-user administrative state. The most basic state, however, is run level 0. At this run level, it is safe to turn off power to the system.

When a Sun Fire V490 system is at run level 0, the ok prompt appears. This prompt indicates that the OpenBoot firmware is in control of the system.

There are a number of scenarios in which this can happen.

It is the last of these scenarios that most often concerns you as an administrator, since there will be times when you need to reach the ok prompt. The several ways to do this are outlined in Ways of Reaching the ok Prompt. For detailed instructions, refer to How to Get to the ok Prompt.

What You Should Know About Accessing the ok Prompt

It is important to understand that when you access the ok prompt from a functioning Sun Fire V490 system, you are suspending the Solaris OS software and placing the system under firmware control. Any processes that were running under the Solaris OS software are also suspended, and the state of such processes may not be recoverable.

The firmware-based tests and commands you run from the ok prompt have the potential to affect the state of the system. This means that it is not always possible to resume execution of the Solaris OS software from the point at which it was suspended. Although the go command will resume execution in most circumstances, in general, each time you drop the system down to the ok prompt, you should expect to have to reboot it to get back to the Solaris OS environment.
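
As a minimal sketch, if you reached the ok prompt through a break and have not run any firmware tests in the meantime, resuming the suspended operating system might look like the following. Whether this succeeds depends on what was done at the prompt, so treat a reboot as the fallback.


ok go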

As a rule, before suspending the Solaris OS software, you should back up files, warn users of the impending shutdown, and halt the system in an orderly manner. However, it is not always possible to take such precautions, especially if the system is malfunctioning.

Ways of Reaching the ok Prompt

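There are several ways to get to the ok prompt, depending on the state of the system and the means by which you are accessing the system console. In order of desirability, these are:

  • Graceful halt
  • Stop-A (L1-A) or Break key sequence
  • Externally initiated reset (XIR)
  • Manual system reset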

A discussion of each method follows. For instructions, refer to How to Get to the ok Prompt.

Graceful Halt

The preferred method of reaching the ok prompt is to halt the operating system software by issuing an appropriate command (for example, the shutdown, init, halt, or uadmin command) as described in Solaris system administration documentation.

Gracefully halting the system prevents data loss, allows you to warn users beforehand, and causes minimal disruption. You can usually perform a graceful halt, provided Solaris OS software is running and the hardware has not experienced serious failure.

Stop-A (L1-A) or Break Key Sequence

When it is impossible or impractical to halt the system gracefully, you can get to the ok prompt by typing the Stop-A (or L1-A) key sequence from a Sun keyboard, or, if you have an alphanumeric terminal attached to the Sun Fire V490 system, by pressing the Break key.

If you use this method to reach the ok prompt, be aware that issuing certain OpenBoot commands (like probe-scsi, probe-scsi-all, and probe-ide) may hang the system.
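
One commonly used way to avoid such a hang is to reset the system to a clean state before running the probe commands. The following sequence is a sketch of that practice, not a procedure taken from this chapter. It temporarily sets auto-boot? to false so that the reset returns to the ok prompt instead of booting, and it assumes that you restore the variable afterward.


ok setenv auto-boot? false
ok reset-all
ok probe-scsi-all
ok setenv auto-boot? true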

Externally Initiated Reset (XIR)

Generating an externally initiated reset (XIR) has the advantage of allowing you to issue the sync command to preserve file systems and produce a dump file of part of the system state for diagnostic purposes. Forcing an XIR may be effective in breaking the deadlock that is hanging up the system, but it also precludes the orderly shutdown of applications, and so it is not the preferred method of reaching the ok prompt.
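
For illustration, once the XIR has returned the system to the ok prompt, the sync command mentioned above is typed as follows. This is a sketch only: the system then typically attempts to write out file system data and a dump file before rebooting, and the result depends on how badly the system was hung.


ok sync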

Manual System Reset

Reaching the ok prompt by performing a manual system reset should be the method of last resort. Doing this results in the loss of all system coherence and state information. It can also corrupt the machine's file systems, although the fsck command usually restores them. Use this method only if nothing else works.



Caution - Forcing a manual system reset results in loss of system state data and risks corrupting your file systems.



For More Information

For more information about the OpenBoot firmware, refer to:

An online version of the manual is included with the Solaris Software Supplement CD that ships with Solaris software. It is also available at the following web site under Solaris on Sun Hardware:

http://docs.sun.com


About OpenBoot Environmental Monitoring

Environmental monitoring and control capabilities for Sun Fire V490 systems reside at both the operating system level and the OpenBoot firmware level. This ensures that monitoring capabilities are operational even if the system has halted or is unable to boot. When the system is under OpenBoot control, the OpenBoot environmental monitor checks the state of the system power supplies, fans, and temperature sensors periodically. If it detects any voltage, current, fan speed, or temperature irregularities, the monitor generates a warning message to the system console.

For additional information about the system's environmental monitoring capabilities, refer to Environmental Monitoring and Control.

Enabling or Disabling the OpenBoot Environmental Monitor

The OpenBoot environmental monitor is enabled by default when the system is operating at the ok prompt. However, you can enable or disable it yourself using the OpenBoot commands env-on and env-off. For more information, refer to:

The commands env-on and env-off only affect environmental monitoring at the firmware level. They have no effect on the system's environmental monitoring and control capabilities while the operating system is running.
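
A minimal sketch of toggling the monitor from the ok prompt, using only the two commands named above, follows. Any console messages that the firmware prints in response are omitted here.


ok env-off
ok env-on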



Note - Using the Stop-A keyboard command to enter the OpenBoot environment during power-on or reset immediately disables the OpenBoot environmental monitor. If you want the OpenBoot environmental monitor enabled, you must re-enable it before rebooting the system. If you enter the OpenBoot environment by any other means (by halting the operating system, by power-cycling the system, or as a result of a system panic), the OpenBoot environmental monitor remains enabled.



Automatic System Shutdown

If the OpenBoot environmental monitor detects a critical overtemperature condition, it will initiate an automatic system power off sequence. In this case, a warning similar to the following is generated to the system console:


WARNING: SYSTEM POWERING DOWN IN 30 SECONDS!
Press Ctrl-C to cancel shutdown sequence and return to ok prompt.

If necessary, you can type Ctrl-C to abort the automatic shutdown and return to the system ok prompt; otherwise, after the 30 seconds expire, the system will power off automatically.



Note - Typing Ctrl-C to abort an impending shutdown also has the effect of disabling the OpenBoot environmental monitor. This gives you enough time to replace the component responsible for the critical condition without triggering another automatic shutdown sequence. After replacing the faulty component, you must type the env-on command to reinstate OpenBoot environmental monitoring.





Caution - If you type Ctrl-C to abort an impending shutdown, you should immediately replace the component responsible for the critical condition. If a replacement part is not immediately available, power off the system to avoid damaging system hardware.



OpenBoot Environmental Status Information

The OpenBoot command .env enables you to obtain status on the current state of everything of interest to the OpenBoot environmental monitor. This includes information about the system's power supplies, fans, and temperature sensors.

You can obtain environmental status at any time, regardless of whether OpenBoot environmental monitoring is enabled or disabled. The .env command simply reports the current environmental status information; it does not take action if anything is abnormal or out of range.

For an example of .env command output, refer to How to Obtain OpenBoot Environmental Status Information.
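
As a brief sketch, the command is typed at the ok prompt with no arguments. It can be run whether or not monitoring has been turned off with env-off. Output is omitted here because it varies with the system's configuration.


ok .env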


About OpenBoot Emergency Procedures

The introduction of Universal Serial Bus (USB) keyboards has made it necessary to change some of the OpenBoot emergency procedures. Specifically, the Stop-D, Stop-F, and Stop-N commands that were available on systems with non-USB keyboards are not supported on systems that use USB keyboards, such as the Sun Fire V490 system. The following sections describe the OpenBoot emergency procedures for systems like the Sun Fire V490 server that accept USB keyboards.

Stop-A Functionality

Stop-A (Abort) issues a break that drops the system into OpenBoot firmware control (indicated by the display of the ok prompt). The key sequence works the same on the Sun Fire V490 server as it does on older systems with non-USB keyboards, except that it does not work during the first few seconds after the machine is reset.

Stop-D Functionality

The Stop-D (Diags) key sequence is not supported on systems with USB keyboards. However, the Stop-D functionality can be closely emulated by turning the system control switch to the Diagnostics position. For more information, refer to System Control Switch.

The RSC bootmode diag command also provides similar functionality. For more information, refer to the Sun Remote System Control (RSC) 2.2 User's Guide, which is included on the Sun Fire V490 Documentation CD.

Stop-F Functionality

The Stop-F functionality is not available in systems with USB keyboards. However, the RSC bootmode forth command provides similar functionality. For more information, refer to the Sun Remote System Control (RSC) 2.2 User's Guide, which is included on the Sun Fire V490 Documentation CD.

Stop-N Functionality

The Stop-N sequence is a method of bypassing problems typically encountered on systems with mis-configured OpenBoot configuration variables. On systems with older keyboards, you did this by pressing the Stop-N sequence while powering on the system.

On systems with USB keyboards, like the Sun Fire V490, the implementation involves waiting for the system to reach a particular state. For instructions, refer to How to Implement Stop-N Functionality.

The drawback of using Stop-N on a Sun Fire V490 system is that, if diagnostics are enabled, it can take some time for the system to reach the desired state. Fortunately, an alternative exists: Place the system control switch in the Diagnostics position.

Placing the system control switch in Diagnostics position will override OpenBoot configuration variable settings, allowing the system to recover to the ok prompt and letting you correct mis-configured settings.
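
Once the ok prompt is available, correcting the settings might look like the following sketch. It uses the standard OpenBoot commands printenv, set-default, and set-defaults; variable_name is a placeholder for whichever variable is misconfigured.


ok printenv
ok set-default variable_name
ok set-defaults

The printenv command lists the current and default values, set-default restores a single variable to its default, and set-defaults restores the configuration variables to their defaults. You would normally use one of the last two commands, not both.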

Assuming you have access to RSC software, another possibility is to use the RSC bootmode reset_nvram command, which provides similar functionality. For more information, refer to the Sun Remote System Control (RSC) 2.2 User's Guide, which is included on the Sun Fire V490 Documentation CD.


About Automatic System Recovery

The Sun Fire V490 system provides a feature called automatic system recovery (ASR). To some, ASR implies an ability to shield the operating system in the event of a hardware failure, allowing the operating system to remain up and running. The implementation of ASR on the Sun Fire V490 server is different: it provides for automatic fault isolation and restoration of the operating system following nonfatal faults or failures of these hardware components:

In the event of such a hardware failure, firmware-based diagnostic tests isolate the problem and mark the device (using the 1275 Client Interface, via the device tree) as either failed or disabled. The OpenBoot firmware then deconfigures the failed device and reboots the operating system. This all occurs automatically, as long as the Sun Fire V490 system is capable of functioning without the failed component.

Once restored, the operating system will not attempt to access any deconfigured device. This prevents a faulty hardware component from keeping the entire system down or causing the system to crash repeatedly.

As long as the failed component is electrically dormant (that is, it does not cause random bus errors or introduce noise into signal lines), the system reboots automatically and resumes operation. Be sure to contact a qualified service technician about replacing the failed component.

Auto-Boot Options

The OpenBoot firmware provides an IDPROM-stored setting called auto-boot?, which controls whether the firmware will automatically boot the operating system after each reset. The default setting for Sun platforms is true.

If a system fails power-on diagnostics, then auto-boot? is ignored and the system does not start up unless an operator boots the system manually. This behavior limits system availability. Therefore, the Sun Fire V490 OpenBoot firmware provides a second OpenBoot configuration variable switch called auto-boot-on-error?. This switch controls whether the system will attempt to boot when a subsystem failure is detected.

Both the auto-boot? and auto-boot-on-error? switches must be set to true (their default values) to enable an automatic boot following the firmware detection of a nonfatal subsystem failure.


ok setenv auto-boot? true
ok setenv auto-boot-on-error? true
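
To confirm the result, you can display each variable with the standard OpenBoot printenv command. This is a sketch; the output layout is not shown because it varies by firmware version.


ok printenv auto-boot?
ok printenv auto-boot-on-error?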

The system will not attempt to boot if it is in service mode, or following any fatal nonrecoverable error. For examples of fatal nonrecoverable errors, refer to Error Handling Summary.

Error Handling Summary

Error handling during the power-on sequence falls into one of three cases summarized in the following table.


Scenario: No errors are detected.
System Behavior: The system attempts to boot if auto-boot? is true.
Notes: By default, auto-boot? and auto-boot-on-error? are both true.

Scenario: Nonfatal errors are detected.
System Behavior: The system attempts to boot if auto-boot? and auto-boot-on-error? are both true.
Notes: Nonfatal errors include:

  • FC-AL subsystem failure[1]
  • Ethernet interface failure
  • USB interface failure
  • Serial interface failure
  • PCI card failure
  • Processor failure[2]
  • Memory failure[3]

Scenario: Fatal nonrecoverable errors are detected.
System Behavior: The system will not boot regardless of OpenBoot configuration variable settings.
Notes: Fatal nonrecoverable errors include:

  • All processors failed
  • All logical memory banks failed
  • Flash RAM cyclical redundancy check (CRC) failure
  • Critical FRU-ID SEEPROM configuration data failure
  • Critical application specific integrated circuit (ASIC) failure



Note - If POST or OpenBoot Diagnostics detects a nonfatal error associated with the normal boot device, the OpenBoot firmware automatically deconfigures the failed device and tries the next-in-line boot device, as specified by the boot-device configuration variable.
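
As a sketch, you can inspect and set the boot device order with printenv and setenv. The aliases disk and net below are assumed for illustration; substitute the device aliases defined on your system.


ok printenv boot-device
ok setenv boot-device disk net

With a setting like this, the firmware would try disk first and fall back to net if the first device has been deconfigured.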



Reset Scenarios

The system control switch position and three OpenBoot configuration variables, service-mode?, diag-switch?, and diag-trigger, control whether and how the system runs firmware diagnostics in response to system reset events.

When you set the system control switch to the Diagnostics position, the system is in service mode and runs tests at Sun-specified levels, disabling auto-booting and ignoring the settings of OpenBoot configuration variables.

Setting the service-mode? variable to true also puts the system in service mode, producing exactly the same results as setting the system control switch to the Diagnostics position.
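
A minimal sketch of entering service mode through the variable rather than the switch follows. The reset-all command is included on the assumption that you want the new setting exercised by an immediate reset.


ok setenv service-mode? true
ok reset-all

To return to normal mode, set the variable back to false with the system control switch in the Normal position.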

When you set the system control switch to the Normal position, and when the OpenBoot service-mode? variable is set to false (its default value), the system is in normal mode. When the system is in this mode, you can control diagnostics and auto-boot behavior by setting OpenBoot configuration variables, principally diag-switch? and diag-trigger.

When diag-switch? is set to false (its default value), you can use diag-trigger to determine what kind of reset events trigger diagnostic tests. The following table describes the various settings (keywords) of the diag-trigger variable. You can use the first three of these keywords in any combination.


Keyword                     Function
power-on-reset (default)    Reset caused by power-cycling the system.
error-reset (default)       Reset caused by certain hardware error events, such as a RED State Exception, Watchdog Reset, or Fatal Reset.
user-reset                  Reset caused by operating system panics or by user-initiated commands from OpenBoot (reset-all, boot) or from Solaris OS (reboot, shutdown, init).
all-resets                  Any kind of system reset.
none                        Diagnostic tests are not executed.


Refer to TABLE 6-2 for a fuller list of OpenBoot configuration variables affecting diagnostics and system behavior.
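
The following sketch shows how the diag-trigger keywords might be applied with setenv. Each line is an independent alternative: the first sets the two keywords marked as defaults, and the second runs diagnostics on every kind of reset. Which setting suits your site is a judgment call, since more triggers generally mean longer resets.


ok setenv diag-trigger power-on-reset error-reset
ok setenv diag-trigger all-resets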

Normal Mode and Service Mode Information

You will find a full description of normal and service modes, as well as detailed information about the OpenBoot configuration variables that affect ASR behavior, in OpenBoot PROM Enhancements for Diagnostic Operation, which is available on the Sun Fire V490 Documentation CD.


About Manually Configuring Devices

This section explains the difference between deconfiguring a device and a slot, tells what happens if you try to deconfigure all of a system's processors, and also discusses how to obtain device paths.

Deconfiguring Devices vs. Slots

For some devices, different things happen when you deconfigure a slot than when you deconfigure the device that resides within a slot.

If you deconfigure a PCI device, the device in question can still be probed by firmware and recognized by the operating system. Solaris OS "sees" such a device, reports it as failed, and refrains from using it.

If you deconfigure a PCI slot, firmware will not even probe the slot, and the operating system will not "know about" any devices that may be plugged in to the slot.

In both cases, the devices in question are rendered unusable. So why make the distinction? Occasionally, a device may fail in such a way that probing it disrupts the system. In cases such as these, deconfiguring the slot in which the device resides is more likely to contain the problem.

Deconfiguring All System Processors

You can use the asr-disable command to deconfigure all system processors. Doing this will not crash the system. The OpenBoot system firmware, even though it reports all processors as deconfigured, in actuality keeps one processor functioning well enough to run the firmware.

Device Paths

When manually deconfiguring and reconfiguring devices, you might need to determine the full physical paths to those devices. You can do this by typing:


ok show-devs

The show-devs command lists the system devices and displays the full path name of each device. An example of a path name for a Fast Ethernet PCI card is shown below:


/pci@8,700000/pci@2/SUNW,hme@0,1

You can display a list of current device aliases by typing:


ok devalias

You can also create your own device alias for a physical device by typing:


ok devalias alias_name physical_device_path

where alias_name is the alias that you want to assign, and physical_device_path is the full physical device path for the device.
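
An alias created with devalias lasts only until the next reset. As a sketch, the standard OpenBoot nvalias command takes the same arguments and stores the alias in NVRAM so that it persists. The alias name mynet below is hypothetical; the path is the example path shown earlier in this section.


ok nvalias mynet /pci@8,700000/pci@2/SUNW,hme@0,1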



Note - If you manually deconfigure a device alias using asr-disable, and then assign a different alias to the device, the device will remain deconfigured even though the device alias has changed.



You can determine which devices are currently disabled by typing:


ok .asr
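
Putting these commands together, a manual deconfigure-and-restore cycle might look like the following sketch. Here device_identifier is a placeholder for one of the identifiers described in Reference for Device Identifiers.


ok asr-disable device_identifier
ok .asr
ok asr-enable device_identifier

This sketch assumes that a change made with asr-disable or asr-enable takes effect only after the next reset (for example, after typing reset-all); confirm this against the detailed procedures before relying on it.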

The related deconfiguration and reconfiguration procedures are covered in:

Device identifiers are listed in Reference for Device Identifiers.


Reference for Device Identifiers

Refer to the following table when manually specifying which devices to deconfigure and reconfigure. The related procedures are covered in:



Note - The device identifiers above are not case-sensitive; you can type them as uppercase or lowercase characters.



You can use wild cards within device identifiers to reconfigure a range of devices, as shown in the following table.


Device Identifiers                               Devices
*                                                All devices
cmp*                                             All processors
cmpx-bank*, where x is a number 0-3, or 16-19    All memory banks for each processor
gptwo-slot*                                      All CPU/Memory board slots
io-bridge*                                       All PCI bridge chips
pci*                                             All on-board PCI devices (on-board Ethernet, FC-AL) and all PCI slots
pci-slot*                                        All PCI slots




Note - You cannot deconfigure a range of devices. Wild cards are valid only for specifying a range of devices to reconfigure.
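
For example, wild card identifiers from the table above might be used with the asr-enable command as in the following sketch. Each line is an independent alternative.


ok asr-enable cmp*
ok asr-enable pci-slot*
ok asr-enable *

The first line reconfigures all processors, the second all PCI slots, and the third every device.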




[1] A working alternate path to the boot disk is required. For more information, refer to About Multipathing Software.
[2] A single processor failure causes the entire CPU/Memory module to be deconfigured. Reboot requires that another functional CPU/Memory module be present.
[3] Since each physical DIMM belongs to two logical memory banks, the firmware deconfigures both memory banks associated with the affected DIMM. This leaves the CPU/Memory module operational, but with one of the processors having a reduced complement of memory.