CHAPTER 4
Hardware and BIOS Issues
This chapter describes hardware and BIOS issues for the Sun Fire X4140, X4240, and X4440 servers.
The following topics are covered:
Refer to Service Processor (SP) Issues for firmware issues.
If a stripe fails when you are using an Adaptec RAID card with your server, messages similar to the following are displayed:
In the Sun StorageTek RAID Manager:
Mar 10 08:39:14 hqsun18 Sun StorageTek RAID Manager Agent: [ID 890732 daemon.warning] [215] One or more logical devices contain a bad stripe: controller 1.
Mar 3 13:23:47 hqsun18 scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci10de,375@f/pci108e,286@0/sd@0,0 (sd1):
Mar 3 13:23:47 hqsun18 Error for Command: write(10) Error Level: Retryable
Mar 3 13:23:47 hqsun18 scsi: [ID 107833 kern.notice] Requested Block: 295841110 Error Block: 295841110
Mar 3 13:23:47 hqsun18 scsi: [ID 107833 kern.notice] Vendor: Sun Serial Number:
Mar 3 13:23:47 hqsun18 scsi: [ID 107833 kern.notice] Sense Key: Hardware Error
Mar 3 13:23:47 hqsun18 scsi: [ID 107833 kern.notice] ASC: 0x0 (no additional sense info), ASCQ: 0x0, FRU: 0x0
Mar 3 13:23:48 hqsun18 scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci10de,375@f/pci108e,286@0/sd@0,0 (sd1):
Mar 3 13:23:48 hqsun18 Error for Command: write(10) Error Level: Retryable
Mar 3 13:23:48 hqsun18 scsi: [ID 107833 kern.notice] Requested Block: 295840854 Error Block: 295840854
Mar 3 13:23:48 hqsun18 scsi: [ID 107833 kern.notice] Vendor: Sun Serial Number:
Mar 3 13:23:48 hqsun18 scsi: [ID 107833 kern.notice] Sense Key: Hardware Error
Mar 3 13:23:48 hqsun18 scsi: [ID 107833 kern.notice] ASC: 0x0 (no additional sense info), ASCQ: 0x0, FRU: 0x0
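When many of these warnings accumulate, it can help to list the distinct blocks they report. The following Python sketch is an illustration only, not a Sun tool. It assumes the system log has been read as plain text lines in the format shown above; the names `BLOCK_RE` and `error_blocks` are made up for this example.

```python
import re

# Matches the "Error Block" field of the kern.notice lines shown above.
BLOCK_RE = re.compile(r"Error Block:\s*(\d+)")


def error_blocks(log_lines):
    """Return the distinct error block numbers found in the lines, sorted."""
    blocks = set()
    for line in log_lines:
        match = BLOCK_RE.search(line)
        if match:
            blocks.add(int(match.group(1)))
    return sorted(blocks)


# Two lines copied from the sample messages above.
sample = [
    "Mar 3 13:23:47 hqsun18 scsi: [ID 107833 kern.notice] "
    "Requested Block: 295841110 Error Block: 295841110",
    "Mar 3 13:23:48 hqsun18 scsi: [ID 107833 kern.notice] "
    "Requested Block: 295840854 Error Block: 295840854",
]
print(error_blocks(sample))  # [295840854, 295841110]
```

A short list of repeating block numbers is consistent with a localized bad stripe, but the sketch only summarizes the log; it does not diagnose the array.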
/opt/StorMan # ./arcconf GETCONFIG 1
There are no tools or procedures available that can correct a bad stripe while maintaining the existing array. Use the following procedure to create a new stripe array:
1. Back up the data on your stripe array.
2. Remove or delete the existing stripe array.
3. Perform a low-level format of the drives.
A Six-Core AMD Opteron Processor can be installed only on a new (HyperTransport 3 enabled) motherboard.
You must verify that the Sun Fire X4140, X4240, or X4440 server motherboard supports HT3. HT3-enabled motherboards have a “511” prefix in their part numbers. For example:
Old motherboards: 501-7722-xx, "ASSY,MOTHERBOARD,DORADO/TUCANA"
New motherboards: 511-1394-02, "ASSY,MOTHERBOARD,DORADO/TUCANA"
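The prefix rule can be expressed as a one-line check. This is a minimal sketch that assumes the part number is already available as a string; the function name `supports_ht3` is hypothetical and not part of any Sun utility.

```python
def supports_ht3(part_number: str) -> bool:
    """Return True if the motherboard part number has the "511" prefix.

    Per the text above, HT3-enabled motherboards carry a "511" prefix.
    Any other prefix (for example "501") is treated as not HT3 enabled.
    """
    return part_number.strip().startswith("511")


print(supports_ht3("511-1394-02"))  # True
print(supports_ht3("501-7722-xx"))  # False
```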
Use one of the following methods to determine whether the Sun Fire X4140, X4240, or X4440 server motherboard supports HT3 before installing a Six-Core AMD Opteron Processor.
Support has been added for the following:
When using SATA SSDs, observe the following guidelines:
This issue results from a PC architecture limitation: there are not enough I/O resources to assign to this specific configuration.
Workaround: A possible workaround is to exchange the cards in slot 2 and slot 5 so that the I/O request alignment fits better. This workaround might apply to this specific configuration only.
Before you upgrade a Sun Fire X4240 (B14) or Sun Fire X4440 (B16) server to use 2389 SE or 8389 SE CPUs (Sun Marketing Part Numbers X6306A and X6337A / X6339A), you must first verify that the chassis can accommodate high performance CPUs by checking the chassis top cover or Sun part number. A correct chassis has perforations (small holes) in its top cover. If the chassis does not have top cover perforations (holes) or is not an approved part number, use of these CPUs is not warranted and may result in non-recoverable system failure.
The failure mode is related to the increased cooling needs of these two CPU types.
Caution - If installed in the chassis intended for other CPUs (B14-AA and B16-AA), the 2389 SE and 8389 SE processors may overheat, leading to catastrophic failure.
You can also check that a chassis is correct by the Sun marketing and manufacturing part number. The approved Sun marketing and manufacturing part numbers are:
If you have concerns or questions regarding this policy, please contact your Sun representative.
The LSI SAS expander on the Sun Fire X4240 server might not exit hardware reset reliably. On occasion, after power cycling or hardware reset, some or all of the internal hard disks might not be visible to the operating system.
Workaround: If this problem occurs, either cycle the AC power or press the hardware reset button again. If the problem persists after repeated attempts at the workaround, contact Sun service for additional support.
If your server comes with a Sun StorageTek SAS 8-Port Internal Host Bus Adapter (LSI 3081E-S based), part number SG-XPCIE8SAS-I-Z or SG-PCIE8SAS-I-Z, note that the firmware/BIOS code level for the initial release of this HBA is:
Verify that you have this, or a later, version. This information is listed during system POST. If you have an earlier version, run the Sun Installation Assistant (SIA) to upgrade the firmware to the latest version.
If your Sun Fire server comes with a Sun StorageTek SAS RAID Eight-Port Internal Host Bus Adapter (Adaptec-based), part number SG-XPCIESAS-R-INT-Z or SG-PCIESAS-R-INT-Z, note that the firmware code level for the initial release of this HBA is:
Verify that you have this, or a later, version. This information is listed during system POST. If you have an earlier version, run the Sun Installation Assistant (SIA) to upgrade the firmware to the latest version.
Because PCI-E based network adapters can be placed between the two Nvidia MACs, the bus/address assignments of ports 2 and 3 can change.
The Adaptec SG-XPCIESAS-R-IN-Z Integrated RAID controller card does not show any drives to the operating system installer by default. The user has to use the Adaptec BIOS utility in order to configure (initialize and create an array volume on) at least one drive, after which the drive will be available to install the operating system on.
The system BIOS does see the physical devices.
Workaround: This is expected behavior. You must first initialize and create an array or volume on all the disks you want the OS to see using the Adaptec BIOS utility (accessible using Ctrl-A during system POST). Once this is done, the operating system installation program will see the disks so that you can install the OS and create volumes as needed. Note that a reboot might be required after an HBA BIOS level initialization/partitioning of the disks.
If you have disks connected to the HBA of one manufacturer and decide to exchange that HBA with an HBA of another manufacturer, such as exchanging an LSI HBA for an Adaptec HBA (or vice versa), you might notice that the drives map differently after the exchange. This might cause your system to no longer boot.
Workaround: If you exchange an LSI HBA for an Adaptec HBA (or vice versa), reverse the cables connected to the HBA so that the disk numbering is correct:
5. Move the cable that was connected to channel 0 of the old HBA to channel 1 of the new HBA.
6. Move the cable connected to channel 1 of the old HBA to channel 0 of the new HBA.
Sun Fire X4240 servers with the integrated StorageTek SAS 8-Port Internal Host Bus Adapter (LSI 3081E-S based), part number SG-XPCIE8SAS-I-Z or SG-PCIE8SAS-I-Z, are configured from the factory with an internal SAS cable connected from Port0 of the PCIe HBA card to the 16-disk backplane (connector J0302). There is no cable connection to Port1 of the HBA card.
Using Port1 of StorageTek SAS 8-Port Internal Host Bus Adapter (LSI 3081E-S based) is not supported.
Your server hard disk drives (HDD) have three LEDs: amber, green, and blue. When you command an HDD to light its locate LED, the drive’s amber LED should flash.
When you use the BIOS command to light an HDD’s locate LED, the green LED flashes instead. This has been seen on servers with disks attached to the Adaptec SG-XPCIESAS-R-IN Integrated RAID controller card.
Workaround: If you issue the locate command from the BIOS level for drives attached to the Adaptec SG-XPCIESAS-R-IN Integrated RAID controller card, look for a flashing green LED on the HDD instead of a flashing amber LED. Note that this is proper operation at the BIOS level. If you issue the locate command from the OS level (which is typically how the command would be issued), you will see a flashing amber HDD LED.
If your Sun Fire X4240 server has a Sun StorageTek SAS 8-Port Internal Host Bus Adapter (LSI 3081E-S based), part number SG-XPCIE8SAS-I-Z or SG-PCIE8SAS-I-Z, the HBA can create a RAID1 or RAID1E array. The Sun Fire X4240 can have up to 16 internal hard disk drives. However, the total number of hard disk drives that can be included in one of the HBA’s supported RAID configurations is 14 drives.
Recommendation: When using the Sun Fire X4240 server’s HBA, set up your RAID array with 14 drives and make the remaining two drives hot spares.
If a volume is resyncing and a drive is pulled from another volume on the same HBA, the resyncing volume might restart its resync from the beginning. The resynchronization restarts only if fast resync (write cache on the secondary) is enabled.
Workaround: There is currently no workaround other than not removing drives while a volume is resyncing. This issue might lengthen the resync process.
On servers using the Sun StorageTek SAS 8-Port Internal Host Bus Adapter (LSI 3081E-S based), part number SG-XPCIE8SAS-I-Z or SG-PCIE8SAS-I-Z, if you do the following:
1. Remove a hard disk drive during a RAID resync operation.
2. Move that drive to another slot (where the drive will be found by the HBA and it will proceed with a new RAID resync operation).
3. Insert a new drive into the old slot (where you removed the drive).
The green LEDs on both the new drive and the moved drive will blink as if both were being resynced. However, only the drive that was part of the RAID and then moved will actually resync with the existing RAID.
The blinking LED of the new drive can be ignored; it is not actually resyncing and stops once the drive has a volume created on it.
Workaround: Do not remove drives during resync operations.
When a USB KVM that supports PS/2-style devices is connected to a server, the attached PS/2-style keyboard might not respond during the server power-on self test (POST).
Workaround: Do not use a KVM that supports PS/2-style input devices. Instead, use a KVM that supports USB input devices.
On the Sun Fire X4240 server, after onboard devices are initialized, only 32KB of I/O space remains available for option cards. If you install too many cards with OpROMs, the server’s BIOS can run out of I/O space. Most cards can still work at the OS level, depending on the driver and whether the functionality of the card depends on legacy I/O space. However, they are not likely to work during POST (for example, when using the card’s OpROM). A warning message is displayed if this condition occurs.
Similar limitations exist on all PC architecture products, but the exact amount of I/O space remaining for cards depends on the chipset and other onboard devices.
Workaround: Any workaround is card-dependent. It might be that the OpROM can be used if one card is removed temporarily. Otherwise, the only workaround is to limit card configurations such that the total space required by option cards is limited to the 32KB available for the Sun Fire X4240 server.
Sun Fire X4140, X4240, and X4440 servers with dual-core CPUs can have sync flood errors after the BIOS is upgraded from BIOS 52 to BIOS 64.
Workaround: Change the Qimonda DDR2 DIMM to another vendor, such as Hynix or Samsung.
The Qimonda DDR2 DIMM type is:
The flash upgrade using ipmiflash -I pci (from a Linux host) will not upgrade the BIOS software.
The ipmiflash supports four interfaces: open, usb, pci and lanplus.
The pci interface is used from the host to recover a failed SP. When the pci interface is used, ipmiflash blindly copies the flash image to the SP flash. It does not power off the host, because it does not update the BIOS. The other three interfaces are used when the SP is alive and running. They do update the BIOS, so the host is powered off. In summary:
PCI: does not power off the host.
LANPLUS: does power off the host.
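The behavior described above can be captured as a small lookup. The sketch below is illustrative Python, not part of ipmiflash; the table `UPDATES_BIOS` and the function `host_powered_off` are hypothetical names, and the values restate only what this section says about the four interfaces.

```python
# Whether a flash upgrade over each ipmiflash interface updates the BIOS,
# as described in this section. Only pci skips the BIOS update.
UPDATES_BIOS = {"open": True, "usb": True, "lanplus": True, "pci": False}


def host_powered_off(interface: str) -> bool:
    """Return True if flashing over this interface powers off the host.

    Per the text above, the host is powered off exactly when the BIOS
    is updated, so the same table answers both questions.
    """
    key = interface.lower()
    if key not in UPDATES_BIOS:
        raise ValueError(f"unknown ipmiflash interface: {interface}")
    return UPDATES_BIOS[key]


print(host_powered_off("pci"))      # False
print(host_powered_off("lanplus"))  # True
```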
Choose one of the following methods to solve the problem:
The BIOS will be programmed the next time the host powers on.
Only 42KB of 128KB option ROM space is available due to space reserved for built-in devices. Because of the order that the devices in the system are scanned and detected during system boot, option ROM space might be exhausted before all cards are scanned.
If option ROM space is exhausted before an option card you wish to boot from is scanned, the device will not be available. Try disabling option ROMs in the BIOS setup or changing the slot that your PCI card is installed in to fix the problem.
There are two possible workarounds to ensure that you have enough option ROM space to PXE boot from your devices as desired.
Option 1: Disable option ROM scanning on all devices that do not need to PXE boot. This will preserve the option ROM space for the devices that you do want to PXE boot. Use the following procedure.
1. Enter the BIOS Setup utility by pressing the F2 key while the system is booting up and performing POST.
2. On the BIOS Main Menu screen, select the PCIPnP tab to open the PCI/PnP Settings screen.
3. Change the fields to Disabled for those PCI cards or NICs that will not be PXE booted.
4. Press and release the right arrow key until the Exit menu screen is displayed.
5. Follow the instructions on the Exit menu screen to save your changes and exit the Setup utility.
Option 2: Manually set the BIOS boot order so that the devices that you want to PXE boot from are early enough in the boot order to be scanned before the option ROM space is exhausted. Use the following procedure:
1. Enter the BIOS Setup utility by pressing the F2 key while the system is booting and performing POST.
2. On the BIOS Main Menu screen, select the Boot tab to open the Boot menu main screen.
3. Select Boot Device Priority, or select Hard Disk Drives from the list to change hard disk drives.
4. Change the selections for the boot devices or hard disk drives to set the required device order.
5. Press and release the right arrow key until the Exit menu screen is displayed.
6. Follow the instructions on the Exit menu screen to save your changes and exit the Setup utility.
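The effect of both options can be illustrated with a simplified model of the scan. The Python sketch below is an assumption-laden illustration, not a description of the actual BIOS algorithm: it assumes devices are scanned in list order, that an option ROM is loaded only if it fits in the remaining space, and that scanning continues past a ROM that does not fit. The card names and ROM sizes are invented for the example. Only the 42KB figure comes from the text above.

```python
AVAILABLE_KB = 42  # option ROM space left after built-in devices


def loaded_roms(cards, available_kb=AVAILABLE_KB):
    """Return the names of cards whose option ROMs fit, in scan order.

    cards: list of (name, rom_size_kb, scan_enabled) tuples.
    """
    remaining = available_kb
    loaded = []
    for name, size_kb, scan_enabled in cards:
        if not scan_enabled:
            continue  # Option 1: ROM scanning disabled for this card
        if size_kb <= remaining:
            loaded.append(name)
            remaining -= size_kb
    return loaded


# Hypothetical configuration: nic_b is the device to PXE boot from.
default = [("raid_hba", 20, True), ("nic_a", 16, True), ("nic_b", 16, True)]
option1 = [("raid_hba", 20, True), ("nic_a", 16, False), ("nic_b", 16, True)]
option2 = [("nic_b", 16, True), ("raid_hba", 20, True), ("nic_a", 16, True)]

print(loaded_roms(default))  # ['raid_hba', 'nic_a']
print(loaded_roms(option1))  # ['raid_hba', 'nic_b']
print(loaded_roms(option2))  # ['nic_b', 'raid_hba']
```

In the default order the space runs out before nic_b is scanned. Disabling the scan on nic_a (Option 1) or moving nic_b earlier (Option 2) lets its option ROM load.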
When memory is set to unganged mode in the BIOS setup page, and an uncorrectable ECC memory error occurs, platform BIOS is not able to pinpoint the failing DIMM pair. In this case, an uncorrectable memory error will correctly result in a system reset, with error information logged to the BMC (SP) SEL. However, this information will not explicitly describe which DIMM pair failed.
Workaround: Study the SP SEL and search for the "080813" string in the SEL events. This line contains Machine Check data related to the memory failure. In that line, the first 4 digits indicate which processor owns the failing memory. If the line begins with 0018, processor 0 owns the failing memory. 0019 indicates processor 1. 001A indicates processor 2. 001B indicates processor 3. For the processor with the failing memory, remove or replace all memory on that processor. Another workaround is to not change the default of ganged mode.
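The prefix lookup in this workaround can be sketched in Python. This is a hedged illustration only: it assumes each SEL event has already been reduced to a text line that begins with the four digits described above, and the names `PROCESSOR_BY_PREFIX` and `owning_processor` are hypothetical. The sample lines use `<data>` as a placeholder and are not real SEL output.

```python
MARKER = "080813"  # string that identifies the Machine Check line

# First four digits of the line -> processor that owns the failing memory.
PROCESSOR_BY_PREFIX = {"0018": 0, "0019": 1, "001A": 2, "001B": 3}


def owning_processor(sel_line: str):
    """Return the processor number for a Machine Check line, else None."""
    text = sel_line.strip()
    if MARKER not in text:
        return None  # not a Machine Check line for this issue
    return PROCESSOR_BY_PREFIX.get(text[:4].upper())


# Placeholder lines; "<data>" stands in for the remaining event bytes.
print(owning_processor("0019 <data> 080813 <data>"))  # 1
print(owning_processor("0018 <data>"))                # None
```

After the processor is identified, the workaround still applies as written: remove or replace all memory on that processor.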
When installing three 8384 CPUs (Quad-Core) and one 2384 CPU (Quad-Core) with the same frequency on an X4440 server system, no power on self test (POST) occurs.
Workaround: Do not mix CPU configurations. All CPUs should be identical, including frequencies.
In the X4140 server, SMBIOS reports two unpopulated sockets in addition to the two actual CPU sockets. It also reports six PCI-E slots. This information is inaccurate for the X4140. The prtdiag command reports only the populated CPUs, but it incorrectly reports six available PCI-E slots.
1. X4140 and X4240 servers have structures for 4 processors, even though they only have 2 processors maximum.
2. X4140 has structures for 6 PCI-e slots, even though this model only supports 3 slots.
The custom boot order is incorrect with BIOS 52 when the SATA drives are connected to LSI or onboard Nvidia controller.
When a drive is added, the order does not match the first saved configuration. Instead, the BIOS adds the new drive to the beginning of the HDD boot list.
In some cases, when the Memclock value is reduced to 533 MHz in BIOS Setup, the memory might not function correctly, and the system might reboot because of uncorrectable memory errors.
a. Do not set the "Memclock" option to "533 MHz".
b. If "Memclock" was set to "533 MHz", clear CMOS using the HW jumper.
Copyright © 2010, Oracle and/or its affiliates. All rights reserved.