CHAPTER 4

Hardware and BIOS Issues

This chapter describes hardware and BIOS issues for the Sun Fire X4140, X4240, and X4440 Servers.

The following topics are covered:

Hardware

BIOS

Refer to Service Processor (SP) Issues for firmware issues.


Hardware Issues

Adaptec RAID Card Failed Stripes Errors (6938724)

If a stripe fails when you are using an Adaptec RAID card with your server, messages similar to the following are displayed:

In the Sun StorageTek RAID Manager:

Mar 10 08:39:14 hqsun18 Sun StorageTek RAID Manager Agent: [ID 890732 daemon.warning] [215] One or more logical devices contain a bad stripe: controller 1.

In the messages file:

Mar 3 13:23:47 hqsun18 scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci10de,375@f/pci108e,286@0/sd@0,0 (sd1):

Mar 3 13:23:47 hqsun18 Error for Command: write(10) Error Level: Retryable

Mar 3 13:23:47 hqsun18 scsi: [ID 107833 kern.notice] Requested Block: 295841110 Error Block: 295841110

Mar 3 13:23:47 hqsun18 scsi: [ID 107833 kern.notice] Vendor: Sun Serial Number:

Mar 3 13:23:47 hqsun18 scsi: [ID 107833 kern.notice] Sense Key: Hardware Error

Mar 3 13:23:47 hqsun18 scsi: [ID 107833 kern.notice] ASC: 0x0 (no additional sense info), ASCQ: 0x0, FRU: 0x0

Mar 3 13:23:48 hqsun18 scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci10de,375@f/pci108e,286@0/sd@0,0 (sd1):

Mar 3 13:23:48 hqsun18 Error for Command: write(10) Error Level: Retryable

Mar 3 13:23:48 hqsun18 scsi: [ID 107833 kern.notice] Requested Block: 295840854 Error Block: 295840854

Mar 3 13:23:48 hqsun18 scsi: [ID 107833 kern.notice] Vendor: Sun Serial Number:

Mar 3 13:23:48 hqsun18 scsi: [ID 107833 kern.notice] Sense Key: Hardware Error

Mar 3 13:23:48 hqsun18 scsi: [ID 107833 kern.notice] ASC: 0x0 (no additional sense info), ASCQ: 0x0, FRU: 0x0

In Storman:

/opt/StorMan # ./arcconf GETCONFIG 1

full output cut----

Failed stripes : Yes

full output cut----
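The messages above can be checked for automatically. The following is a minimal Python sketch, not a Sun-supplied tool. It assumes the log text and the arcconf GETCONFIG output have already been captured as strings, and its patterns are derived only from the sample output shown above, so real output might differ. The function names are hypothetical.

```python
import re

# Patterns derived from the sample messages in this section (assumption:
# real output uses the same wording and spacing).
BAD_STRIPE_MSG = re.compile(r"contain a bad stripe: controller (\d+)")
FAILED_STRIPES = re.compile(r"^\s*Failed stripes\s*:\s*(\w+)", re.MULTILINE)


def controllers_with_bad_stripes(log_text):
    """Return the sorted controller numbers named in bad-stripe warnings."""
    return sorted({int(m.group(1)) for m in BAD_STRIPE_MSG.finditer(log_text)})


def arcconf_reports_failed_stripes(getconfig_output):
    """Return True if any 'Failed stripes' field in the output reads 'Yes'."""
    values = FAILED_STRIPES.findall(getconfig_output)
    return any(value.lower() == "yes" for value in values)
```

If either function reports a bad stripe, follow the resolution procedure in this section.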

Resolution

There are no tools or procedures available that can correct a bad stripe while maintaining the existing array. Use the following procedure to create a new stripe array:

1. Back up the data on your stripe array.

2. Remove or delete the existing stripe array.

3. Perform a low-level format of the drives.

4. Create a new stripe array.

Determining If A Server Is HT3 Compatible

A Six-Core AMD Opteron Processor can be installed only on a new, HyperTransport 3 (HT3) enabled motherboard.

You must verify that the Sun Fire X4140, X4240, or X4440 Server motherboard supports HT3. HT3-enabled motherboards have a “511” prefix in their part numbers. For example:

Old motherboards: 501-7722-xx, "ASSY,MOTHERBOARD,DORADO/TUCANA"

New motherboards: 511-1394-02, "ASSY,MOTHERBOARD,DORADO/TUCANA"
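The prefix rule can be expressed as a short check. This is a minimal Python sketch that assumes you have already read the motherboard part number as a string in the dash-separated form shown in the examples. The function name is hypothetical.

```python
def motherboard_supports_ht3(part_number):
    """Return True if the part number carries the HT3 "511" prefix."""
    # The prefix is the group of digits before the first dash.
    prefix = part_number.strip().split("-")[0]
    return prefix == "511"
```

For the example part numbers above, 511-1394-02 passes the check and a 501-7722 part number does not.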

Use one of the following methods to determine whether the Sun Fire X4140, X4240, or X4440 Server motherboard supports HT3 before installing a Six-Core AMD Opteron Processor.

SATA SSD Configuration Notes

Support has been added for the following:

When using SATA SSDs, observe the following guidelines:

Sun Fire X4240 Server Cannot Boot With 3 Quad-Port NICs and SCSI HBA using BIOS42 (6786969)

This issue is caused by a PC architecture limitation: there are not enough I/O resources to assign to this specific configuration.

Workaround: Exchange the cards in slot 2 and slot 5 so that the I/O request alignment fits better. This workaround might apply to this specific configuration only.

Upgrading 2389 SE or 8389 SE CPUs

Before you upgrade a Sun Fire X4240 (B14) or Sun Fire X4440 (B16) server to use 2389 SE or 8389 SE CPUs (Sun Marketing Part Numbers X6306A and X6337A / X6339A), you must first verify that the chassis can accommodate high performance CPUs by checking the chassis top cover or Sun part number. A correct chassis has perforations (small holes) in its top cover. If the chassis does not have top cover perforations (holes) or is not an approved part number, use of these CPUs is not warranted and may result in non-recoverable system failure.

The failure mode is related to the increased cooling needs of these two CPU types.



Caution - If installed in the chassis intended for other CPUs (B14-AA and B16-AA), the 2389 SE and 8389 SE processors may overheat, leading to catastrophic failure.


You can also check that a chassis is correct by the Sun marketing and manufacturing part number. The approved Sun marketing and manufacturing part numbers are:

If you have concerns or questions regarding this policy, please contact your Sun representative.

Sun Fire X4240 Server Might Not See Disks (6697760)

The LSI SAS expander on the Sun Fire X4240 server might not exit hardware reset reliably. On occasion, after power cycling or hardware reset, some or all of the internal hard disks might not be visible to the operating system.

Workaround: If this problem occurs, either cycle the AC power or push the hardware reset button again. If the problem persists after you apply the workaround repeatedly, contact Sun service for additional support.

Sun StorageTek SAS 8-Port Internal HBA Firmware Version

If your server comes with a Sun StorageTek SAS 8-Port Internal Host Bus Adapter (LSI 3081E-S based), part number SG-XPCIE8SAS-I-Z or SG-PCIE8SAS-I-Z, note that the firmware/BIOS code level for the initial release of this HBA is:

Verify that you have this, or a later, version. This information is listed during system POST. If you have an earlier version, run the Sun Installation Assistant (SIA) to upgrade the firmware to the latest version.

Sun StorageTek SAS RAID Eight-Port, Internal HBA Firmware Version

If your Sun Fire server comes with a Sun StorageTek SAS RAID Eight-Port Internal Host Bus Adapter (Adaptec-based), part number SG-XPCIESAS-R-INT-Z or SG-PCIESAS-R-INT-Z, note that the firmware code level for the initial release of this HBA is:

Verify that you have this, or a later, version. This information is listed during system POST. If you have an earlier version, run the Sun Installation Assistant (SIA) to upgrade the firmware to the latest version.

Bus/Address Assignments Of Ports-2/3 Can Change With PCI-e Cards Onboard (6759450)

Because PCI-E based network adapters can be placed between the two Nvidia MACs, the bus/address assignments of ports 2 and 3 can change.

OS Installer Does Not See Hard Disk Drives (6549807)

The Adaptec SG-XPCIESAS-R-INT-Z Integrated RAID controller card does not show any drives to the operating system installer by default. You must use the Adaptec BIOS utility to configure (initialize and create an array volume on) at least one drive, after which that drive is available for operating system installation.

The system BIOS does see the physical devices.

Workaround: This is expected behavior. You must first initialize and create an array or volume on all the disks you want the OS to see using the Adaptec BIOS utility (accessible using Ctrl-A during system POST). Once this is done, the operating system installation program will see the disks so that you can install the OS and create volumes as needed. Note that a reboot might be required after an HBA BIOS level initialization/partitioning of the disks.

Switching HBA Cards Might Renumber Disks (6564803)

If you have disks connected to an HBA from one manufacturer and exchange that HBA for one from another manufacturer (for example, exchanging an LSI HBA for an Adaptec HBA, or vice versa), the drives might map differently after the exchange. This might prevent your system from booting.

Workaround: If you exchange an LSI HBA for an Adaptec HBA (or vice versa), reverse the cables connected to the HBA so that the disk numbering is correct:

1. Move the cable that was connected to channel 0 of the old HBA to channel 1 of the new HBA.

2. Move the cable that was connected to channel 1 of the old HBA to channel 0 of the new HBA.

Do Not Use Port 1 of the LSI Internal SAS HBA in a Sun Fire X4240 Server

Sun Fire X4240 servers with the integrated StorageTek SAS 8-Port Internal Host Bus Adapter (LSI 3081E-S based), part number SG-XPCIE8SAS-I-Z or SG-PCIE8SAS-I-Z, are configured from the factory with an internal SAS cable connected from Port0 of the PCIe HBA card to the 16-disk backplane (connector J0302). There is no cable connection to Port1 of the HBA card.

Using Port1 of StorageTek SAS 8-Port Internal Host Bus Adapter (LSI 3081E-S based) is not supported.

LED Behavior Is Different Than Documented (6580675)

Your server hard disk drives (HDD) have three LEDs: amber, green, and blue. When you command an HDD to light its locate LED, the drive’s amber LED should flash.

When you use the BIOS command to light an HDD’s locate LED, the green LED flashes instead. This has been seen on servers with disks attached to the Adaptec SG-XPCIESAS-R-INT-Z Integrated RAID controller card.

Workaround: If you issue the locate command from the BIOS level for drives attached to the Adaptec SG-XPCIESAS-R-INT-Z Integrated RAID controller card, look for a flashing green LED on the HDD instead of a flashing amber LED. Note that this is proper operation at the BIOS level. If you issue the locate command from the OS level (which is typically how the command would be issued), you will see a flashing amber HDD LED.

Sun Fire X4240 Server With LSI HBA Can Use 14 of 16 Hard Disk Drives in a RAID Array (6613780)

If your Sun Fire X4240 server has a Sun StorageTek SAS 8-Port Internal Host Bus Adapter (LSI 3081E-S based), part number SG-XPCIE8SAS-I-Z or SG-PCIE8SAS-I-Z, the HBA can create a RAID1 or RAID1E array. The Sun Fire X4240 can have up to 16 internal hard disk drives. However, the total number of hard disk drives that can be included in one of the HBA’s supported RAID configurations is 14 drives.

Recommendation: When using the Sun Fire X4240 server’s HBA, set up your RAID array with 14 drives and make the remaining two drives hot spares.

While Volume Resync Is in Progress, Removing a Drive From Another Volume Causes Resync to Start Over (6584821)

If a volume is resyncing, and a drive is pulled from another volume on the same HBA, the volume might start resync from the beginning. The resynchronization will only restart if fast resync (write cache on the secondary) is enabled.

Workaround: There is currently no workaround other than not removing drives while a volume is resyncing. This issue can add time to the resync process.

Do Not Remove Hard Disks During a RAID Resync Operation (6604060)

On servers using the Sun StorageTek SAS 8-Port Internal Host Bus Adapter (LSI 3081E-S based), part number SG-XPCIE8SAS-I-Z or SG-PCIE8SAS-I-Z, if you do the following:

1. Remove a hard disk drive during a RAID resync operation.

2. Move that drive to another slot (where the drive will be found by the HBA and it will proceed with a new RAID resync operation).

3. Insert a new drive into the old slot (where you removed the drive).

The green LEDs on both the new drive and the moved drive will blink as if both were being resynced. However, only the drive that was part of the RAID and then moved will actually resync with the existing RAID.

The blinking LED of the new drive can be ignored; it is not actually resyncing and stops once the drive has a volume created on it.

Workaround: Do not remove drives during resync operations.

PS/2 Style Keyboard Plugged into KVM Does Not Respond During POST (6600715)

When using a USB KVM that supports PS/2-style devices connected to a server, the attached PS/2-style keyboard might not respond during the server power-on self test (POST).

Workaround: Use a KVM that supports USB input devices instead of a KVM that supports PS/2-style input devices.

MPTBIOS and Linux Boot Error Messages with Multiple OpROM Cards Installed (6667530)

On the Sun Fire X4240 server, only 32KB of I/O space remains available for option cards after the onboard devices are initialized. If you install too many cards with OpROMs, the server’s BIOS can run out of I/O space. Most cards can still work at the OS level, depending on the driver and whether the functionality of the card depends on legacy I/O space. However, they are not likely to work during POST (for example, when using the card’s OpROM). A warning message displays if this condition occurs.

Similar limitations exist on all PC architecture products, but the exact amount of I/O space remaining for cards depends on the chipset and other onboard devices.

Workaround: Any workaround is card-dependent. It might be that the OpROM can be used if one card is removed temporarily. Otherwise, the only workaround is to limit card configurations such that the total space required by option cards is limited to the 32KB available for the Sun Fire X4240 server.


BIOS Issues

Servers with 2222SE/8224SE Have Sync Flood Errors After Upgrading BIOS (6834930)

Sun Fire X4140, X4240, and X4440 servers with dual-core CPUs can have sync flood errors after the BIOS is upgraded from 52 to 64.

Workaround: Replace the Qimonda DDR2 DIMM with a DIMM from another vendor, such as Hynix or Samsung.

The Qimonda DDR2 DIMM type is:

Flash Upgrade Using Ipmiflash -i PCI Will Not Upgrade BIOS (6721497)

The flash upgrade using ipmiflash -I pci (from a Linux host) will not upgrade the BIOS software.

The ipmiflash utility supports four interfaces: open, usb, pci, and lanplus.

The pci interface is used from the host to recover a failed SP. When the pci interface is used, ipmiflash blindly copies the flash image to the SP flash. It does not update the BIOS, so it does not power off the host. The other three interfaces are used when the SP is alive and running. They do update the BIOS, so the host is powered off. In summary:

PCI: does not power off the host

USB: does power off the host

OPEN: does power off the host

LANPLUS: does power off the host

This is normal behavior.
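The summary above can be captured as a small lookup, for example in a script that needs to warn an operator before flashing. This is a minimal Python sketch based only on the behavior described in this section. The names are hypothetical and the sketch does not invoke ipmiflash.

```python
# From the summary above: an interface that updates the BIOS also powers
# off the host; the pci interface does neither.
UPDATES_BIOS = {"pci": False, "usb": True, "open": True, "lanplus": True}


def host_will_power_off(interface):
    """Return True if flashing through this interface powers off the host."""
    key = interface.strip().lower()
    if key not in UPDATES_BIOS:
        raise ValueError(f"unknown ipmiflash interface: {interface!r}")
    return UPDATES_BIOS[key]
```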

Flash Upgrade From 2.0.2.3 to 2.0.2.5 Doesn't Upgrade BIOS Intermittently (6710246)

Choose one of the following methods to solve the problem:

or

setenv upgrade_bios yes

saveenv

reset

The BIOS will be programmed the next time the host powers on.

or

BIOS ROM Memory (6774364)

Only 42KB of 128KB option ROM space is available due to space reserved for built-in devices. Because of the order that the devices in the system are scanned and detected during system boot, option ROM space might be exhausted before all cards are scanned.

If option ROM space is exhausted before an option card you wish to boot from is scanned, the device will not be available. Try disabling option ROMs in the BIOS setup or changing the slot that your PCI card is installed in to fix the problem.
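To see why scan order matters, consider the following simplified Python model. It is a sketch only: the card names and option ROM sizes are hypothetical, and the actual BIOS scan logic is more involved. The only value taken from this section is the 42KB of available space.

```python
AVAILABLE_OPROM_KB = 42  # from the text: 42KB of the 128KB region is available


def scan_option_roms(cards, available_kb=AVAILABLE_OPROM_KB):
    """Simulate an in-order scan; return (loaded, skipped) card name lists.

    `cards` is a list of (name, rom_size_kb) tuples in scan order.
    """
    loaded, skipped = [], []
    remaining = available_kb
    for name, size_kb in cards:
        if size_kb <= remaining:
            loaded.append(name)
            remaining -= size_kb
        else:
            # Not enough space left, so this card's option ROM is not loaded.
            skipped.append(name)
    return loaded, skipped
```

With hypothetical cards of 16KB, 20KB, and 16KB scanned in that order, the third card is skipped. If the two 16KB cards are scanned first, the 20KB card is skipped instead, which is why disabling unneeded option ROMs or changing slots can make the desired device available.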

Workarounds

There are two possible workarounds to ensure that you have enough option ROM space to PXE boot from your devices as desired.

Option 1: Disable option ROM scanning on all devices that do not need to PXE boot. This will preserve the option ROM space for the devices that you do want to PXE boot. Use the following procedure.

1. Enter the BIOS Setup utility by pressing the F2 key while the system is booting up and performing POST.

2. On the BIOS Main Menu screen, select the PCIPnP tab to open the PCI/PnP Settings screen.

3. Change the fields to Disabled for those PCI cards or NICs that will not be PXE booted.

4. Press and release the right arrow key until the Exit menu screen is displayed.

5. Follow the instructions on the Exit menu screen to save your changes and exit the Setup utility.

Option 2: Manually set the BIOS boot order so that the devices that you want to PXE boot from are early enough in the boot order to be scanned before the option ROM space is exhausted. Use the following procedure:

1. Enter the BIOS Setup utility by pressing the F2 key while the system is booting and performing POST.

2. On the BIOS Main Menu screen, select the Boot tab to open the Boot menu main screen.

3. Select Boot Device Priority, or select Hard Disk Drives from the list to change hard-disk drives.

4. Change the selections for the boot devices or hard disk drives to set the required device order.

5. Press and release the right arrow key until the Exit menu screen is displayed.

6. Follow the instructions on the Exit menu screen to save your changes and exit the Setup utility.

6. PCI-X slot 1

Memory in Unganged Mode (6695782/6693114)

When memory is set to unganged mode in the BIOS setup page, and an uncorrectable ECC memory error occurs, platform BIOS is not able to pinpoint the failing DIMM pair. In this case, an uncorrectable memory error will correctly result in a system reset, with error information logged to the BMC (SP) SEL. However, this information will not explicitly describe which DIMM pair failed.

Workaround: Study the SP SEL and search for the "080813" string in the SEL events. This line contains Machine Check data related to the memory failure. In that line, the first 4 digits indicate which processor owns the failing memory. If the line begins with 0018, processor 0 owns the failing memory. 0019 indicates processor 1. 001A indicates processor 2. 001B indicates processor 3. For the processor with the failing memory, remove or replace all memory on that processor. Another workaround is to not change the default of ganged mode.
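The lookup described in the workaround can be sketched in Python as follows. This is not a Sun-supplied tool. It assumes each SEL event has already been exported as a text line that begins with the four-digit code, which might not match the exact format your SP produces. The function name is hypothetical.

```python
# Mapping from the workaround text: leading four digits -> owning processor.
PROCESSOR_BY_PREFIX = {"0018": 0, "0019": 1, "001A": 2, "001B": 3}


def failing_processors(sel_lines):
    """Return sorted processor numbers named by lines containing '080813'."""
    found = set()
    for line in sel_lines:
        if "080813" not in line:
            continue  # not a Machine Check line for this failure
        prefix = line.strip()[:4].upper()
        if prefix in PROCESSOR_BY_PREFIX:
            found.add(PROCESSOR_BY_PREFIX[prefix])
    return sorted(found)
```

For each processor number returned, remove or replace all memory on that processor, as described in the workaround.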

No POST When Installing Three 8000 Series CPUs and One 2000 Series CPU On X4440 Servers (6775959)

When installing three 8384 CPUs (Quad-Core) and one 2384 CPU (Quad-Core) with the same frequency on an X4440 server system, no power on self test (POST) occurs.

Workaround: Do not mix CPU configurations. All CPUs should be identical, including frequencies.
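The rule in this workaround can be checked before installation with a short script. This is a minimal Python sketch that assumes you have recorded each CPU as a (model, frequency) pair. The function name and the frequency strings in the usage example are illustrative only.

```python
def cpus_are_identical(cpus):
    """Return True if every (model, frequency) pair in `cpus` is the same."""
    return len(set(cpus)) <= 1


# Example: three 8384 CPUs and one 2384 CPU at the same frequency is a
# mixed configuration, even though the frequencies match.
mixed = [("8384", "2.7GHz")] * 3 + [("2384", "2.7GHz")]
uniform = [("8384", "2.7GHz")] * 4
```

Here cpus_are_identical(mixed) returns False and cpus_are_identical(uniform) returns True.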

Prtdiag Reports 6 PCI-E Slots For X4140 Servers (6684807)

In the X4140 server, SMBIOS reports 2 unpopulated sockets in addition to the actual 2 CPU sockets; it also reports 6 PCI-E slots. This information is inaccurate for the X4140. The prtdiag command reports only the populated CPUs, but it incorrectly reports 6 available PCI-E slots.

1. X4140 and X4240 servers have structures for 4 processors, even though they only have 2 processors maximum.

2. X4140 has structures for 6 PCI-e slots, even though this model only supports 3 slots.

BIOS HDD Boot Ordering Changes On SATA Drives (6762709)

The custom boot order is incorrect with BIOS 52 when SATA drives are connected to an LSI controller or to the onboard Nvidia controller.

When a drive is added, the order does not match the first saved configuration. Instead, the BIOS adds the new drive to the beginning of the HDD boot list.

Memclock Value Of 533 MHz Causes Continuous Sync Flood (6727700)

In some cases, when the Memclock value is reduced to 533 MHz in BIOS Setup, the memory might not function correctly, and the system might reboot because of uncorrectable memory errors.

Workaround:

a. Do not set the "Memclock" option to "533 MHz".

b. If "Memclock" was set to "533 MHz", clear the CMOS using the hardware jumper.