Sun Enterprise 250 Server Owner's Guide

About Diagnosing Specific Problems

Network Communications Failure

Symptom

The system is unable to communicate over the network.

Action

Your system conforms to the Ethernet 10/100BASE-T standard, which states that the Ethernet 10BASE-T link integrity test function should always be enabled on both the host system and the Ethernet hub. The system cannot communicate with a network if this function is not set identically for both the system and the network hub (either enabled for both or disabled for both). This problem applies only to 10BASE-T network hubs, where the Ethernet link integrity test is optional. This is not a problem for 100BASE-T networks, where the test is enabled by default. Refer to the documentation provided with your Ethernet hub for more information about the link integrity test function.
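The matching rule described above can be written out as a small truth table. The following Python sketch is illustrative only: the function name `link_can_communicate` and its parameters are hypothetical and are not part of any Sun utility. It models nothing beyond the rule stated in this section.

```python
def link_can_communicate(host_link_test: bool, hub_link_test: bool,
                         speed: str = "10BASE-T") -> bool:
    """Return True if the link-test settings allow communication.

    Models only the rule stated in the text: on 10BASE-T the setting
    must be identical on the host and the hub; on 100BASE-T the test is
    enabled by default, so the mismatch problem does not arise.
    """
    if speed == "100BASE-T":
        return True
    return host_link_test == hub_link_test


if __name__ == "__main__":
    # Print every host/hub combination for a 10BASE-T hub.
    for host in (True, False):
        for hub in (True, False):
            print(host, hub, link_can_communicate(host, hub))
```

Only the two mixed combinations (enabled on one side, disabled on the other) return False for a 10BASE-T hub.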

If you connect the system to a network and the network does not respond, use the OpenBoot PROM command watch-net-all to display conditions for all network connections:

ok watch-net-all

For most PCI Ethernet cards, the link integrity test function can be enabled or disabled with a hardware jumper on the PCI card, which you must set manually. (See the documentation supplied with the card.) For the standard TPE and MII main logic board ports, the link test is enabled or disabled through software, as shown below.

Remember also that the TPE and MII ports share the same circuitry; as a result, only one port can be used at a time.

Note -

Some hub designs permanently enable (or disable) the link integrity test through a hardware jumper. In this case, refer to the hub installation or user manual for details of how the test is implemented.

Determining the Device Name of the Ethernet Interface

To enable or disable the link integrity test for the standard Ethernet interface, or for a PCI-based Ethernet interface, you must first know the device name of the desired Ethernet interface. To list the device name:

Shut down the operating system and take the system to the ok prompt.

Determine the device name for the desired Ethernet interface:

Solution 1

Use this method while the operating system is running:

Become superuser.

Type:

# eeprom nvramrc="probe-all install-console banner apply disable-link-pulse device-name"
(Repeat for any additional device names.)
# eeprom "use-nvramrc?"=true

Reboot the system (when convenient) to make the changes effective.

Solution 2

Use this alternate method when the system is already in OpenBoot:

At the ok prompt, type:

ok nvedit
0: probe-all install-console banner
1: apply disable-link-pulse device-name
(Repeat this step for other device names as needed.)
(Press CONTROL-C to exit nvedit.)
ok nvstore
ok setenv use-nvramrc? true

Reboot the system to make the changes effective.
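When several interfaces are involved, it can help to generate the NVRAMRC text before typing it. The Python sketch below assembles the script lines used in both solutions above. It is a convenience sketch, not a Sun-supplied tool: the function names are hypothetical, the device path in the demo is an illustrative placeholder rather than a value taken from this guide, and appending one `apply disable-link-pulse` clause per device is an assumption about how "repeat for additional device names" is meant.

```python
def build_nvramrc(device_names):
    """Return the NVRAMRC script lines for disabling the link test.

    The first line and the 'apply disable-link-pulse' form come from the
    procedure in the text; one 'apply' line is emitted per device name
    (an assumption about how the step is repeated).
    """
    if not device_names:
        raise ValueError("at least one device name is required")
    lines = ["probe-all install-console banner"]
    lines += [f"apply disable-link-pulse {name}" for name in device_names]
    return lines


def eeprom_command(device_names):
    """Return the eeprom command line in the form used in Solution 1."""
    script = " ".join(build_nvramrc(device_names))
    return f'eeprom nvramrc="{script}"'


if __name__ == "__main__":
    # Placeholder device path; substitute the name found on your system.
    devices = ["/pci@1f,4000/network@1,1"]
    print(eeprom_command(devices))
    # The same lines, numbered the way nvedit presents them in Solution 2.
    for number, line in enumerate(build_nvramrc(devices)):
        print(f"{number}: {line}")
```

The script only prints text. You still enter the result yourself, either as superuser with `eeprom` or at the ok prompt with `nvedit`, and then set `use-nvramrc?` to true as described above.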

Power-on Failures

Symptom

The system attempts to power up but does not boot or initialize the monitor.

Action

Run POST diagnostics.

See "How to Use POST Diagnostics".

Observe POST results.

The front panel general fault LED should blink slowly to indicate that POST is running. Check the POST output using a locally attached terminal, tip connection, or RSC console.

Note -
By default, POST output is displayed locally on an attached terminal or through a tip connection. If your server has been reconfigured to display POST output on an RSC console, POST results will not display locally. To redirect POST output to the local system, you must execute the OpenBoot PROM command diag-output-to ttya from the RSC console. See the Remote System Control (RSC) User's Guide for additional details.

If you see no front panel LED activity, a power supply may be defective.

See "Power Supply LEDs".

If the general fault LED remains lit, or the POST output contains an error message, then POST has failed.

The most probable cause for this type of failure is the main logic board. However, before replacing the main logic board you should:
1. Remove optional PCI cards.
2. Remove optional DIMMs.
  
  Leave only the four DIMMs in Bank A.
3. Repeat POST to determine if any of these modules caused the failure.
4. If POST still fails, then replace the main logic board.

Video Output Failure

Symptom

No video at the system monitor.

Action

Check that the power cord is connected to the monitor and to the wall outlet.

Verify with a volt-ohmmeter that the wall outlet is supplying AC power.

Verify that the video cable connection is secure between the monitor and the video output port.

Use a volt-ohmmeter to perform the continuity test on the video cable.

If the cables and their connections are okay, then troubleshoot the monitor and the graphics card.

Disk or CD-ROM Drive Failure

Symptom

A disk drive read, write, or parity error is reported by the operating system or a software application.

A CD-ROM drive read error or parity error is reported by the operating system or a software application.

Action

Replace the drive indicated by the failure message.

Symptom

Disk drive or CD-ROM drive fails to boot or is not responding to commands.

Action

Test the drive response to the probe-scsi-all command as follows:

At the system ok prompt, enter:
```
ok reset-all
ok probe-scsi-all
```

If the SCSI device responds correctly to probe-scsi-all, a message identifying the device is displayed.

If the device responds and a message is displayed, the system SCSI controller has successfully probed the device. This indicates that the main logic board is operating correctly.
1. If one drive does not respond to the SCSI controller probe but the others do, replace the unresponsive drive.
2. If only one internal disk drive is configured with the system and the probe-scsi-all test fails to show the device in the message, replace the drive. If the problem is still evident after replacing the drive, replace the main logic board. If replacing both the disk drive and the main logic board does not correct the problem, replace the associated UltraSCSI data cable and UltraSCSI backplane.
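The two cases above can be sketched as a small decision function. This Python example is a reading aid under stated assumptions: the function name `scsi_probe_advice` and the drive identifiers are hypothetical, and you compile the sets of installed and responding drives by hand from your configuration and the probe-scsi-all output. The case where several drives are installed and none respond is not covered by the steps above, so the sketch reports it as such rather than guessing.

```python
def scsi_probe_advice(installed, responding):
    """Suggest a next step from probe-scsi-all results.

    installed  -- identifiers of the drives expected in the system
    responding -- identifiers that appeared in the probe output
    """
    installed = set(installed)
    missing = installed - set(responding)
    if not missing:
        return "All drives responded; the SCSI controller probed them successfully."
    if len(installed) == 1:
        # Single-drive escalation order, as given in step 2.
        return ("Replace the drive; if the problem remains, replace the main "
                "logic board, then the UltraSCSI data cable and backplane.")
    if len(missing) < len(installed):
        # Some drives respond, so only the silent ones are suspect (step 1).
        names = ", ".join(sorted(str(item) for item in missing))
        return "Replace unresponsive drive(s): " + names
    return "No drives responded; this case is not covered by the steps above."


if __name__ == "__main__":
    # Hypothetical identifiers for two drives, one of which is silent.
    print(scsi_probe_advice({"target0", "target6"}, {"target0"}))
```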

SCSI Controller Failures

To check whether the main logic board SCSI controllers are defective, test the drive response to the probe-scsi command. To test additional SCSI host adapters added to the system, use the probe-scsi-all command. You can use the OBP printenv command to display the OpenBoot PROM configuration variables stored in the system NVRAM. The display includes the current values for these variables as well as the default values. See "OBP printenv Command" for more information.

At the ok prompt, enter:
```
ok probe-scsi
```
If a message is displayed for each installed disk, the system SCSI controllers have successfully probed the devices. This indicates that the main logic board is working correctly.

If a disk doesn't respond:

If the problem persists, replace the unresponsive drive.

If the problem remains after replacing the drive, replace the main logic board.

If the problem persists, replace the associated SCSI cable and backplane.

Power Supply Failure

If there is a problem with a power supply, POST lights the general fault indicator and the power supply fault indicator on the front panel. If you have more than one power supply, then you can use the LEDs located on the power supplies themselves to identify the faulty supply. The power supply LEDs will indicate any problem with the AC input or DC output. See "Power Supply LEDs" for more information about the LEDs.

DIMM Failure

SunVTS and POST diagnostics can report memory errors encountered during program execution. Memory error messages typically indicate the DIMM location number ("U" number) of the failing module.

Use the following diagram to identify the location of a failing memory module from its U number:

Figure 12-8

After you have identified the defective DIMM, remove it according to the instructions in "How to Remove a Memory Module". Install the replacement DIMM according to the directions in "How to Install a Memory Module".

Environmental Failures

The environmental monitoring subsystem monitors the temperature of the system as well as the operation of the fans and power supplies. For more information on the environmental monitoring subsystem, see "Environmental Monitoring and Control".

In response to an environmental error condition, the monitoring subsystem generates error messages that are displayed on the system console and logged in the /var/adm/messages file. These error messages are described in the table below.

Table 12-7


| Message | Type | Description |
| --- | --- | --- |
| `TEMPERATURE WARNING: X degrees celsius at location Y.` | Warning | Indicates that the temperature measured at location Y has exceeded the warning threshold. If the temperature continues to rise, the system will shut down. If location Y is a sensor on a CPU (CP0 or CP1), the temperature (identified by the value X) has exceeded 60 degrees C. If location Y is a sensor on the PDB (power distribution board), SCSI backplane, MB0 or MB1 (main logic board), the ambient temperature (identified by the value X) has exceeded 53 degrees C. |
| `TEMPERATURE CRITICAL: X degrees celsius at location Y.` | Warning | Indicates that the temperature measured at location Y has exceeded a critical threshold. After this warning message, the system automatically shuts down. If location Y is a sensor on a CPU (CP0 or CP1), the temperature (identified by the value X) has exceeded 65 degrees C. If location Y is a sensor on the PDB (power distribution board), SCSI backplane, MB0 or MB1 (main logic board), the ambient temperature (identified by the value X) has exceeded 58 degrees C. |
| `Power Supply X NOT okay.` | Warning | Indicates that there is something wrong with the DC output of the supply. The system may shut down abruptly if the redundant power supply fails. The value X identifies the power supply: PS0 is the lower power supply; PS1 is the upper power supply. |
| `Power supply X inserted` | Advisory | Hot-swap notification that the power supply identified by X was installed without service disruption. |
| `Power supply X removed` | Advisory | Hot-swap notification that the power supply identified by X was removed without service disruption. |
| `WARNING: Fan failure has been detected` | Warning | Indicates a fan failure in the fan tray assembly. |
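Because these messages are logged as text, they can be picked out of a log with a few regular expressions. The Python sketch below matches the message formats listed in the table and pairs each temperature message with the threshold quoted there. It is an illustration under stated assumptions: the function names are hypothetical, the sample lines are made up from the message templates rather than copied from a real log, and the exact spelling of sensor locations in real messages (for example, how the PDB or SCSI backplane is named) is assumed, not confirmed by this guide.

```python
import re

# Thresholds in degrees C from the table: (warning, critical).
THRESHOLDS = {"cpu": (60, 65), "ambient": (53, 58)}
CPU_SENSORS = {"CP0", "CP1"}  # any other location is treated as ambient

TEMP_RE = re.compile(
    r"TEMPERATURE (WARNING|CRITICAL): (\d+) degrees celsius at location (.+?)\.?\s*$"
)
PS_FAULT_RE = re.compile(r"Power Supply (\S+) NOT okay")
PS_SWAP_RE = re.compile(r"Power supply (\S+) (inserted|removed)")
FAN_RE = re.compile(r"Fan failure has been detected")


def classify_line(line):
    """Return (message_type, summary) for an environmental message, else None."""
    match = TEMP_RE.search(line)
    if match:
        level, degrees, location = match.group(1), int(match.group(2)), match.group(3)
        kind = "cpu" if location in CPU_SENSORS else "ambient"
        warning_limit, critical_limit = THRESHOLDS[kind]
        limit = critical_limit if level == "CRITICAL" else warning_limit
        return ("Warning",
                f"{level} {degrees} C at {location}; {kind} limit is {limit} C")
    match = PS_FAULT_RE.search(line)
    if match:
        return ("Warning", f"power supply {match.group(1)} DC output fault")
    match = PS_SWAP_RE.search(line)
    if match:
        return ("Advisory", f"power supply {match.group(1)} {match.group(2)}")
    if FAN_RE.search(line):
        return ("Warning", "fan failure in the fan tray assembly")
    return None


def scan_messages(lines):
    """Yield a classification for every environmental message in lines."""
    for line in lines:
        result = classify_line(line)
        if result is not None:
            yield result


if __name__ == "__main__":
    # Made-up sample lines built from the message templates in the table.
    sample = [
        "TEMPERATURE WARNING: 61 degrees celsius at location CP0.",
        "Power supply PS1 removed",
        "unrelated log line",
    ]
    for message_type, summary in scan_messages(sample):
        print(message_type, "-", summary)
```

To scan a real log, pass an open file object instead of the sample list, for example `scan_messages(open("/var/adm/messages"))`. The patterns use `search`, so any timestamp or host prefix in front of the message is ignored.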

If the environmental monitoring system detects a temperature problem, it also lights the temperature LED on the status and control panel. If it detects a power supply problem, it lights the power supply fault LED on the panel. The LEDs located on the power supplies themselves will help to further identify the problem. For information about system LEDs, see:

"About the Status and Control Panel"

"Front Panel LEDs"

"Power Supply LEDs"

Note -

Enterprise 250 power supplies will shut down automatically in response to certain over-temperature and power fault conditions (see "Environmental Monitoring and Control"). To recover from an automatic shutdown, you must disconnect the AC power cord, wait approximately 10 seconds, and then reconnect the power cord.