CHAPTER 4

Hardware Issues

This chapter describes hardware issues related to the Sun Fire X4600 M2 server.

The following are current issues.

Systems with Mixed Dual- and Quad-Core CPUs Require 950W Power Supply (6729680)

Most systems have either dual-core CPUs or quad-core CPUs. However, some systems support a combination of four dual-core CPUs and four quad-core CPUs.

These systems require a 950W power supply.

Before you upgrade a system with dual-core CPUs by adding quad-core CPUs, you must ensure that it has a 950W power supply.

To check the rating of the power supply, shut the system down properly, pull the power supply out of the system, and read the wattage rating on its side.

For instructions, see the Sun Fire X4600 and Sun Fire X4600 M2 Service Manual.

Power Redundancy is Reduced on Fully Loaded System with 120W CPUs (6724117)

If your system is configured with six or eight 120W CPUs, power supply redundancy is reduced to 3+1.

If you are not certain what kind of CPUs your system has, see your system handbook.

DIMM Numbering Shown in Diagnostics Guide is Incorrect

The diagram that shows the DIMM numbering sequence in the Sun Fire X4600 and Sun Fire X4600 M2 Server Diagnostics Guide is incorrect. It shows the sequence as 0-1-2-3-CPU.

The correct sequence is 3-2-1-0-CPU.

The sequence is shown correctly in the Sun Fire X4600 and Sun Fire X4600 M2 Server Service Manual.

Support Added For 2-Socket and 6-Socket Configurations (6846538)

The Sun Fire X4600 M2 server supports 2-socket and 6-socket configurations.

Newer CPLD Improves Power Supply Redundancy (6738256)

Starting with CPLD Version 8, power supply redundancy is increased to 2+2.

Systems with CPLD versions 7 and lower have 3+1 power supply redundancy.

Look at the POST messages to find the CPLD level.

Mixing DIMMs Impacts Memory Performance

For optimum performance, all DIMMs controlled by a given CPU should be the same capacity and should be either all single-rank or all dual-rank. Mixed configurations are supported, but could result in lower memory performance. Note that all supported 4GB and 8GB DIMMs are dual-rank. For 1GB and 2GB DIMMs, you can identify the type by counting the DRAMs: single-rank DIMMs have 18 DRAMs, while dual-rank DIMMs have 36 DRAMs.
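The identification rule above can be sketched in a few lines of Python. This is an illustration only, not a Sun tool: the function names `dimm_rank` and `is_uniform` and the `(capacity_gb, dram_count)` tuple layout are assumptions made for this example.

```python
def dimm_rank(capacity_gb, dram_count=None):
    """Classify a DIMM as 'single-rank' or 'dual-rank' using the rules in this section.

    Hypothetical helper: capacity is in GB, dram_count is the number of DRAMs
    counted on the module (only needed for 1GB and 2GB DIMMs).
    """
    if capacity_gb in (4, 8):
        # All supported 4GB and 8GB DIMMs are dual-rank.
        return "dual-rank"
    if capacity_gb in (1, 2):
        if dram_count == 18:
            return "single-rank"
        if dram_count == 36:
            return "dual-rank"
        raise ValueError("expected 18 or 36 DRAMs on a 1GB or 2GB DIMM")
    raise ValueError("capacity not covered by this section")


def is_uniform(dimms):
    """Return True if all DIMMs on one CPU share the same capacity and rank.

    dimms is a list of (capacity_gb, dram_count) tuples for a single CPU.
    """
    kinds = {(cap, dimm_rank(cap, count)) for cap, count in dimms}
    return len(kinds) <= 1
```

For example, `is_uniform([(2, 18), (2, 36)])` returns `False`, which corresponds to a supported but potentially slower mixed configuration.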

Enhanced Quad-Core: Sync Flood Error Before POST (6772148)

This problem occurs only on systems with the new enhanced quad-core CPU. The system may experience a sync flood error at boot time, just before the POST messages would have been displayed. This causes an immediate reboot, so it is easy to miss the fact that an unsuccessful boot occurred.

The most obvious symptom is a loss of memory capacity due to spurious DIMM errors. Unrecoverable errors such as the following appear in the system event log.


Memory | Uncorrectable Error | Asserted | CPU 0 DIMM 6
Memory | Uncorrectable Error | Asserted | CPU 0 DIMM 7
Memory | Uncorrectable Error | Asserted | CPU 0 DIMM 6
Memory | Memory Device Disabled | Asserted | CPU 0 DIMM 7

Performing a warm boot (rebooting the OS without powering down the system) does not clear the condition.
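If you have saved the system event log as text, a short script can summarize which DIMMs are reporting errors. The following Python sketch assumes the log lines use the four-field, pipe-separated layout shown above; the function names and the `SAMPLE` constant are hypothetical and exist only for this example.

```python
from collections import Counter

# Sample entries copied from the example above.
SAMPLE = [
    "Memory | Uncorrectable Error | Asserted | CPU 0 DIMM 6",
    "Memory | Uncorrectable Error | Asserted | CPU 0 DIMM 7",
    "Memory | Uncorrectable Error | Asserted | CPU 0 DIMM 6",
    "Memory | Memory Device Disabled | Asserted | CPU 0 DIMM 7",
]


def parse_sel_line(line):
    """Split one log line into fields; return None if it is not a memory entry."""
    fields = [f.strip() for f in line.split("|")]
    if len(fields) != 4 or fields[0] != "Memory":
        return None
    return {"event": fields[1], "state": fields[2], "location": fields[3]}


def summarize_memory_events(lines):
    """Count asserted memory events per (location, event) pair."""
    counts = Counter()
    for line in lines:
        entry = parse_sel_line(line)
        if entry and entry["state"] == "Asserted":
            counts[(entry["location"], entry["event"])] += 1
    return counts


if __name__ == "__main__":
    for (location, event), n in sorted(summarize_memory_events(SAMPLE).items()):
        print(f"{location}: {event} x{n}")
```

A summary like this only shows where errors were logged. It cannot tell a spurious error caused by the sync flood from a genuinely faulty DIMM.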

Workaround

Power the system down and reboot. You may need to do this more than once.

Quad-Core Systems with 1 GB DIMMs Not Tested or Supported

The tested configurations for the Sun Fire X4600 M2 server do not include systems with both quad-core processors and 1 GB DIMMs. Such configurations may work, but are not supported.

Spontaneous Reboot With “OEM #0x12” SEL Messages and No Memory Errors (6652566)

Spontaneous reboots can occur with no memory errors reported. This problem is associated with an entry in the System Event Log (SEL) labeled “OEM #0x12” followed by a series of entries labeled “OEM record e0”. Here is an example:


6502 | 12/22/2007 | 07:41:21 | OEM #0x12 |  | Asserted
6602 | OEM record e0 | 00000000040f0c0200100000f2
6702 | OEM record e0 | 01000000040000000000000000

The hexadecimal values in “OEM record e0” entries may differ from those in the example.
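To check a saved copy of the SEL for this pattern, you could scan it for an “OEM #0x12” entry followed by “OEM record e0” entries. The Python sketch below assumes the pipe-separated layout of the example above; the function name `find_oem_0x12_events` is hypothetical.

```python
def find_oem_0x12_events(lines):
    """Return each 'OEM #0x12' entry with the 'OEM record e0' values that follow it."""
    events = []
    current = None  # the OEM #0x12 event currently collecting records, if any
    for line in lines:
        fields = [f.strip() for f in line.split("|")]
        if "OEM #0x12" in fields:
            current = {"id": fields[0], "records": []}
            events.append(current)
        elif len(fields) >= 3 and fields[1] == "OEM record e0" and current is not None:
            current["records"].append(fields[2])
        else:
            # Any other entry ends the run of related records.
            current = None
    return events


if __name__ == "__main__":
    sample = [
        "6502 | 12/22/2007 | 07:41:21 | OEM #0x12 |  | Asserted",
        "6602 | OEM record e0 | 00000000040f0c0200100000f2",
        "6702 | OEM record e0 | 01000000040000000000000000",
    ]
    for event in find_oem_0x12_events(sample):
        print(event["id"], len(event["records"]), "record(s)")
```

The sketch matches on the entry labels only, because the hexadecimal record values vary between systems.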

If this problem occurs on your system, you should take basic steps to eliminate possible causes, as described below. If you take these steps, and the problem continues to occur, contact Sun Support for additional remediation.


Remedial Steps

The following steps may eliminate the problem. Retest the system after each step:

1. Verify that the system BIOS, other system firmware, and PCI card firmware are up to date. For information on verifying and updating system BIOS and firmware, consult the Software Release Notes.

2. Remove all PCI cards and CPU modules. Use canned compressed air to clean their contacts and slots, then reseat them.

3. If PCI cards have been added to the system recently, try rearranging the PCI cards. Note any change in symptoms after the cards are rearranged.

System Does Not Boot With 6 QGE Cards (6555627)

A system does not boot if the PCI slots contain 6 Sun Quad Gigabit Ethernet (QGE) cards. Test systems have run reliably with 5 QGE cards.

Mouse and Keyboard Hang During Stress Test (6499312)

During a lengthy stress test, a USB mouse and keyboard both hung. At the time, the floppy and DVD devices were both redirected to physical USB drives.

Workaround

Disconnect and reconnect the mouse and keyboard.

USB Ports Become Disabled (6424279)

USB ports might become disabled during operation. This appears to be a hardware problem related to the Nvidia USB controller.

When this happens, the device attached to the USB port becomes inactive. A message similar to the following is reported in the file:

[ID 691482 kern.warning] WARNING: /pci@0,0/pci108e,cb84@2 (ohci0): Connecting device on port 1 failed

Workaround

Reboot the server to re-enable the USB ports.
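To find out whether a server has hit this problem, you could search a saved copy of its log for the warning shown above. The Python sketch below is an illustration only: the pattern is derived from the single example message in this section, and the names `USB_FAIL` and `find_usb_failures` are hypothetical.

```python
import re

# Pattern based on the one example message above; other messages may differ.
USB_FAIL = re.compile(
    r"WARNING: (?P<device>\S+) \((?P<driver>\w+)\): "
    r"Connecting device on port (?P<port>\d+) failed"
)


def find_usb_failures(lines):
    """Return (device path, driver instance, port number) for each matching line."""
    failures = []
    for line in lines:
        match = USB_FAIL.search(line)
        if match:
            failures.append(
                (match.group("device"), match.group("driver"), int(match.group("port")))
            )
    return failures


if __name__ == "__main__":
    sample = [
        "[ID 691482 kern.warning] WARNING: /pci@0,0/pci108e,cb84@2 (ohci0): "
        "Connecting device on port 1 failed"
    ]
    print(find_usb_failures(sample))
```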


Resolved Hardware Issues

System Panics Under Heavy I/O Workload (6544011)

(Fixed in Software 2.0.)

A test system configured with 16 RAID storage arrays connected to 8 PCI HBAs panicked and rebooted during tests involving a heavy I/O workload. The problem could not be reproduced with fewer than 16 arrays, and it occurred only when the read size was 256K or more.