CHAPTER 3
Software Issues
This chapter describes software issues related to the Sun Fire X4100 M2/X4200 M2 servers.
This issue is fully described in the Service Manual, 819-1157.
None. This is expected behavior for the OSs in question.
There are no drivers for these 10-Gb X-option cards on the following platforms:
Windows 2008, SLES 9 SP4
Windows 2008, SLES 9 SP4
RHEL 5.2, RHEL 4.6, Windows 2008, SLES 10 SP2, SLES 9 SP4
RHEL 5.2, SLES 10 SP2, SLES 9 SP4
Workaround: Please check the specified download URL for your card to determine if the driver has become available for download.
When the locate LED button is pressed for more than 15 seconds, log messages are generated. The following steps reproduce this issue:
1. Press the locate LED button for more than 15 seconds. This lights up all the LEDs.
2. After 15 seconds, all the LEDs return to their previous state.
3. Check the SEL log through ipmitool.
The log messages can be safely ignored.
The synchronization progress reported by the cfggen status command is not accurate. This is a problem with the DOS version 2.00.18 of cfggen.
In a standard (non-RAID) configuration, if drive 1 of several is removed, the drives after it (for example, drives 2, 3, and 4) also disappear from the MSM screen. The problem is strictly visual; the drives are still present, are detected by the OS, and work properly.
This issue does not occur in RAID configurations. Do not use MSM to manage non-RAID HDD configurations.
This is an LSI firmware issue that applies to 32- and 64-bit Windows 2003 servers where RAID arrays do not exist.
Prior to creating any RAID arrays, if a nonbootable disk is pulled from the system, the MSM log fails to reflect the actual disk’s new status. The log is not updated until the drive is reinserted.
Do not use MSM to manage non-RAID HDD configurations.
In SunCFG 1.11, setting the following strings to values longer than 63 characters can corrupt other strings.
Do not enter strings longer than 63 characters.
In SunCFG 1.11, attempts to set the SMBIOS Type 3 TYPE string do not work.
If you boot from the SunVTS Bootable Diagnostics CD .iso image, version 2.1f, through a virtual CD-ROM or on some CD-ROM models, you might see the following messages. These messages are harmless and can be ignored:
Sep 7 03:49:11 scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci1022,7460@6/pci1022,7464@0,1/storage@1/disk@0,0 (sd0):
Sep 7 03:49:11 Error for Command: read(10) Error Level: Fatal
Sep 7 03:49:11 scsi: [ID 107833 kern.notice] Requested Block: 109118 Error Block: 109118
Sep 7 03:49:11 scsi: [ID 107833 kern.notice] Vendor: AMI Serial Number:
Sep 7 03:49:11 scsi: [ID 107833 kern.notice] Sense Key: Media Error
Sep 7 03:49:11 scsi: [ID 107833 kern.notice] ASC: 0x11 (unrecovered read error), ASCQ: 0x0, FRU: 0x0
SunVTS 6.2 Graphical User Interface (GUI), shipped on the Bootable Diagnostics CD, Version 2.1f, has a Meter button. This Meter button does not work because it requires the Solaris stdperformeter utility, which is not available for bootable diagnostics.
In systems with multiple root PCI buses, adding or removing a PCI card with a PCI-PCI bridge changes the unit-addresses assigned to some of the root buses, thereby invalidating the persistent boot path. As a result, the server fails to reboot. There is no workaround for this problem. If this condition occurs, you must completely reinstall the OS.
Installation of Solaris on a system with multiple NIC option cards installed may hang randomly. As a workaround, remove the option cards before installing Solaris, and reinstall them after the Solaris installation finishes.
The following message may appear at boot time:
workaround applied for cpu erratum #122
WARNING: BIOS microcode patch for AMD Athlon(tm) 64/Opteron(tm) processor erratum 131 was not detected; updating your system’s BIOS to a version containing this microcode patch is HIGHLY recommended or erroneous system operation may occur.
This recommendation to patch the BIOS can be safely ignored; CPUs on Sun Fire X4100 M2 and Sun Fire X4200 systems include a fix for Erratum 122.
FMA errors may be reported for X4446A-Z HBAs on systems running Solaris 10 8/07. These error messages can be safely ignored.
An NVIDIA Gigabit Ethernet port can hang when under heavy load. To avoid this problem, install the latest version of patch 127891.
Solaris AMD x64 support includes a boot-time check for the presence of a BIOS workaround for the AMD Opteron Erratum 131. If Solaris detects that the workaround for Erratum 131 is needed but it is not yet implemented, Solaris logs and displays the following warning message:
WARNING: BIOS microcode patch for AMD Athlon(tm) 64/Opteron(tm) processor erratum 131 was not detected; updating your system’s BIOS to a version containing this microcode patch is HIGHLY recommended or erroneous system operation may occur.
This warning message can be safely ignored. The Sun Fire X4100 M2/X4200 M2 BIOS implements a superset workaround that includes the workaround required for Erratum 131.
On X4100 M2 and X4200 M2, the IB-HCA card X4217A-Z is not recognized on a system running Solaris. To support that card, you must install Solaris IB update 2. This update can be downloaded from:
http://www.oracle.com/technetwork/indexes/downloads/sun-az-index-095901.html#S
When specific models of Netgear Gigabit switches are used with servers running Solaris 10 OS 6/06, the links between the NET0 and NET1 Ethernet ports (nge0 and nge1 in Solaris 10 OS) and the switch are not reestablished after the speed of the Netgear ports is changed from 1000 to 100. The Netgear switch models on which this behavior has been observed are GS724TS and GS748T.
See Ethernet Port Mapping Differs From Physical Port Mapping in Various OSs (6421259) for the physical location of the Ethernet ports.
Your server /var/adm/messages file or = file might display the following message:
mpt():unknown event e received.
This message is displayed when a QUEUE FULL event occurs (queue already contains the maximum number of messages allowed).
No action is required. The LSI SAS controller firmware handles the situation.
A program may terminate suddenly with a SIGFPE exception. This can cause data corruption. This problem is fixed in Solaris 10 5/08.
If you are using an earlier Solaris version, install the latest version of patch 127112 to avoid this problem.
A program that performs intensive I/O may hang or cause disk errors. This problem is fixed in Solaris 10 8/07. For earlier Solaris versions, install the latest version of patch 123776 to avoid this problem.
Under SUSE Linux Enterprise Server (SLES) 10 SP1, SLES 9 SP3, and SLES 9 SP4, a problem with the forcedeth driver may cause application failure under heavy load.
Until an updated Linux kernel is available from Novell, avoid using the NICs labeled “NET0” and “NET1”, which use the NVIDIA forcedeth driver. Instead, use the NICs labeled “NET2” and “NET3”, which use the Intel driver.
RHEL 3 appears to have problems correctly parsing ACPI tables. This results in numerous error messages. These messages do not reflect a real problem, and can be safely ignored.
This condition is due to an old version of Forcedeth that ships with the OS.
Upgrade the driver to Forcedeth version 0.60 from the NVIDIA website. The upgrade fully restores performance.
Because of the way USB devices are probed, the OS may report that the floppy drive is missing, even when one is attached. This message can be safely ignored.
In RHEL 3U8, the default setting for /proc/sys/vm/numa_memory_allocator is 0. As a result, large applications may experience “out of memory” errors.
You can temporarily change this setting without rebooting:
# echo 1 > /proc/sys/vm/numa_memory_allocator
To change the setting permanently, add the following line to /etc/sysctl.conf:
vm.numa_memory_allocator = 1
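The permanent change can be made repeatable with a small helper that appends the line only when the key is not already present. This is a minimal sketch, not part of RHEL: ensure_sysctl_line is a hypothetical function name, and the optional file argument exists only so the helper can be tried on a scratch copy before it is pointed at /etc/sysctl.conf.

```shell
#!/bin/sh
# Hypothetical helper (an assumption for illustration, not a RHEL tool).
# Usage: ensure_sysctl_line KEY VALUE [FILE]
# FILE defaults to /etc/sysctl.conf.
ensure_sysctl_line() {
    key=$1
    value=$2
    file=${3:-/etc/sysctl.conf}

    # Escape dots so the key is matched literally by grep.
    key_re=$(printf '%s' "$key" | sed 's/\./\\./g')

    # Leave the file alone if an uncommented assignment already exists.
    if grep -Eq "^[[:space:]]*${key_re}[[:space:]]*=" "$file" 2>/dev/null; then
        echo "$key is already set in $file; review it by hand" >&2
        return 0
    fi

    printf '%s = %s\n' "$key" "$value" >> "$file"
}
```

For example, running ensure_sysctl_line vm.numa_memory_allocator 1 as root would add the line shown above once, and a second run would leave the file unchanged. The helper does not modify an existing assignment that has a different value; that case is reported and left for manual review.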
Applications that are controlled by Non-uniform Memory Access (NUMA) and applications that rely on numactl do not perform as expected on RHEL 4 through RHEL 4.7. These issues occur because NUMA recognizes only three CPU nodes instead of the actual four, as the nodebind line in the following numactl --show output shows:
# numactl --show
policy: default
preferred node: 0
interleavemask:
interleavenode: 0
nodebind: 0 1 2
membind: 0 1 2 3
cpubind: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
This issue does not impact newer applications that are able to use /sys/devices/system/node/.
Workaround: Configure the applications to use /sys/devices/system/node, or configure the applications to use NUMA’s config file.
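To see how many nodes the kernel exposes under /sys/devices/system/node, the node directories can be counted directly. The following is a minimal sketch: count_numa_nodes is a hypothetical helper name, and it assumes that each node appears as a directory named node0, node1, and so on.

```shell
#!/bin/sh
# Hypothetical helper (an assumption for illustration).
# Usage: count_numa_nodes [DIR]
# DIR defaults to /sys/devices/system/node.
count_numa_nodes() {
    dir=${1:-/sys/devices/system/node}
    count=0
    for entry in "$dir"/node[0-9]*; do
        # The glob stays unexpanded when nothing matches, so test each entry.
        if [ -d "$entry" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}
```

Comparing the result of count_numa_nodes with the number of entries on the nodebind line of numactl --show indicates whether a given system is affected.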
If the OS is upgraded to RHEL 5 U2, the network port sequence of some option cards changes, and the previous NIC mapping settings no longer apply.
Drivers for Red Hat Enterprise Linux 5 are not available for the following Sun option cards:
If you remove disk 1 (slot 1) from a RAID 1 array, the OS file system becomes read-only, and dmesg shows numerous errors.
Install MPT driver 4.00.05.00-1, which is included in an RPM package on the Tools and Drivers CD.
This applies to the following releases:
Following an AC power cycle, the OS might hang after the Enabling swap space boot message. This is apparently caused by a bug in Kudzu (refer to Red Hat Bugzilla entry 197722).
If the hang occurs, reboot the server and be sure to type “y” when prompted whether the file system should be checked during the reboot.
You can avoid this problem by disabling Kudzu from the command line.
prompt> chkconfig --level 345 kudzu off
The system hangs at /sbin/loader during OS installation.
Prior to OS installation, configure the system to support only USB 1.1. After the successful OS installation, reconfigure the system to support USB 2.0. To set USB support, go to BIOS Setup -> Advanced -> USB Configuration.
Under Red Hat Enterprise Linux 5.1, a problem with the forcedeth driver can cause Message-Signaled Interrupts (MSIs) to be mishandled.
Two steps are required to resolve this problem:
1. Download the updated Linux kernel provided by Red Hat:
http://rhn.redhat.com/errata/RHSA-2007-0993.html
2. Add the following line to /etc/modprobe.conf:
options forcedeth msi=0
SIA installations of RHEL 5.1 fail because of legacy dmraid information on the supplied disks.
The legacy dmraid information is in the last 2000 disk sectors. The following shell command overwrites these sectors with zeros:
dd if=/dev/zero of=/dev/sda bs=$(blockdev --getss /dev/sda) count=2000 seek=$(expr $(blockdev --getsize /dev/sda) - 2000)
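Because dd overwrites data irreversibly, it can help to compute the seek offset separately and print the final command for review before running it. The sketch below does only that. tail_seek and print_wipe_command are hypothetical helper names, and the sector size and sector count are passed in as arguments rather than read from the device. Note one assumption in the command above: blockdev --getsize reports the size in 512-byte sectors, so the arithmetic lines up only when blockdev --getss also reports 512.

```shell
#!/bin/sh
# Hypothetical helpers (assumptions for illustration); nothing here writes to a disk.

# tail_seek TOTAL_SECTORS COUNT
# Prints the first sector of the trailing COUNT sectors.
tail_seek() {
    total=$1
    count=$2
    if [ "$total" -lt "$count" ]; then
        echo "device has fewer than $count sectors" >&2
        return 1
    fi
    echo $((total - count))
}

# print_wipe_command DEVICE SECTOR_SIZE TOTAL_SECTORS [COUNT]
# Prints the dd command instead of running it. COUNT defaults to 2000.
print_wipe_command() {
    device=$1
    sector_size=$2
    total=$3
    count=${4:-2000}
    seek=$(tail_seek "$total" "$count") || return 1
    echo "dd if=/dev/zero of=$device bs=$sector_size count=$count seek=$seek"
}
```

A typical call would be print_wipe_command /dev/sda "$(blockdev --getss /dev/sda)" "$(blockdev --getsize /dev/sda)", after which the printed command can be checked against the intended device and then run by hand.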
When running under Linux, MegaRAID Storage Manager (MSM) might not start if it is run after dhclient.
Restart the X Window system. This is not a defect.
This applies to the following OS releases:
When booting the OS GUI, the dmesg log might show the following error message multiple times:
drivers/usb/input/hid-input.c: event field not found
During X initialization, some devices can get out of sync and some EV_REP events can get incorrectly interpreted as input events. This is caused by a bug in the HID driver. This message can be safely ignored.
This applies to the following releases:
If you install an Intel Ethernet card in a PCIE slot on a powered-off RHEL 4 U3-x86 (64-bit) system, then reboot the system, the slot’s Intel network interface card (NIC) is displayed with a different logical name than that of other network devices. In addition, the card’s instance number also changes.
Perform the following steps to make the card name display consistent:
# service network stop
# rm /etc/sysconfig/hwconf
4. Remove the ifcfg-eth files in the sysconfig directory:
# rm -f /etc/sysconfig/network-scripts/ifcfg-eth*
# rm -f /etc/sysconfig/networking/devices/ifcfg-eth*
# rm -f /etc/sysconfig/networking/profiles/default/ifcfg-eth*
5. Edit the modprobe.conf to remove any lines that start with the following:
alias eth* or alias dev*
7. Use the neat command to reconfigure the network devices.
RHEL 3 has a well-known issue with high load situations on the PCI bus.
RHEL 3, by default, scans only LUN 0 of any SCSI ID.
The LSI 22320/20320 Dual Ultra320 SCSI cards (based on the LSI 53C1030 chip) are not supported on RHEL 3_U8_64.
On systems running SLES9, if a host bus adapter (HBA) card is plugged into slot 0, you might not be able to boot the system. This is because SLES9 enumerates IDE and SCSI devices in scan order, and the BIOS scans PCI devices in ascending order. The scanning priority is:
If there is only one drive in the system, it is enumerated as /dev/sda. If an external device is later connected to an HBA card in slot 0, the external device is enumerated as /dev/sda and the internal drive is enumerated as /dev/sdb. However, the SLES9 boot device still points to /dev/sda, which is now the external device without the OS, so the system cannot boot.
The problem does not occur if the HBA card is plugged into slots 1-4 because these slots are scanned later than the on-board LSI controller. This problem is not specific to the server or the HBA card.
Plug the supported HBA card into slots 1-4, and then reboot the system. Also, follow these general guidelines:
If you create a partition with an ext3 file system, then mount that file system and write a file, the following JBD warning message is displayed:
JBD: barrier-based sync failed on sd<X><Y> - disabling barriers
This message can be safely ignored.
To suppress the message, mount the ext3 file systems with either the data=writeback or the barrier=none mount option.
RHEL 3_U9 does not have reliable support for USB 2.0. This makes it difficult to install the OS using an optical drive that defaults to USB 2.0.
Change the BIOS settings so that all connections use USB 1.1.
Messages similar to the following occur:
insmod: /lib/modules/2.4.21-47.ELsmp/kernel/drivers/block/floppy.o: init_module: No such device
kernel: PCI: No IRQ known for interrupt pin A of device 00:01.1
These messages have two causes:
1. The OS assumes that all systems have floppy drives and parallel ports.
2. The BIOS descriptions for IRQs are more current than those used by the OS.
These messages can be safely ignored.
Drivers for Red Hat Enterprise Linux 5 are not available for the following Sun option cards:
The lpfc driver provided with SUSE Linux Enterprise Server 9 SP3 (64-bit) for Sun’s PCI Express Dual Gigabit Ethernet fibre channel adapters does not work. An updated driver is available from Novell:
http://forgeftp.novell.com/driver-process/pub/update/SUN/sle9/common/x86_64/update/SUSE-SLES/9/rpm/x86_64/emulex-lpfc-2.6.5_7.244_smp.x86_64.rpm
With TAC, when tbench is run against three NICs (two Intel and one NVIDIA) with 128 clients, it usually fails only on the NVIDIA NICs. ttcp-19990909/tcp, tcprr, and tcpmaerts usually fail when run at the same time, but pass when run separately.
Some systems running RHEL 3 report multiple instances of a virtual AMI floppy drive, with additional error messages such as “resize_dma_pool: unknown device type 31”. This is a known problem with RHEL 3 and will not be fixed.
An APIC error is reported on the CPU after a stress test. The steps taken to reproduce this issue are as follows:
1. Install RHEL 4.7 32-bit (PXE).
3. Ensure there are no errors in the dmesg log.
This results in an "APIC error on CPU..." message in the dmesg log. After the OS is rebooted, the "APIC error on CPU..." message no longer appears.
A Windows 2003 guest running under SUSE Linux Enterprise Server 10 via XEN can have a blue screen crash. This occurs when two virtual CPUs are configured.
Configure a single virtual CPU. You should also obtain a patch for xennet.sys from Novell.
The mkfloppy.exe utility that is included in FloppyPack.zip can be run on any Windows system; it is used to create the Mass Storage Driver floppy that is used during OS installation.
However, if there is more than one floppy drive present in the system (including USB-attached floppy drives), mkfloppy.exe does not select the correct floppy drive.
Ensure that the system has only one floppy drive present when using mkfloppy.exe.
Servers with 4 GB of memory, or less, cannot resume from hibernation automatically.
For Sun Fire X4100/X4200 M2 servers, download the InstallPack.exe application at:
http://www.oracle.com/technetwork/indexes/downloads/sun-az-index-095901.html#S
X4200 M2 servers running MS Windows Server 2003 (32-bit and 64-bit) with R2 SP2 experience platform resets when data is copied between a USB storage device and an internal disk.
SIA is not currently able to install drivers for Sun option cards on Windows 2003. These drivers must be installed manually.
See Windows BIOS Option to Save Network Configuration.
Of the four built-in network interfaces, the bottom two NICs (labeled NET0 and NET1) do not function with VMware ESX 3.01.
Use the top two network interfaces (labeled NET2 and NET3). If additional network interfaces are required, use a network option card.
Upgrade to VMware ESX 3.02, the version supported by Software 1.3.
While installing ESX Server 2.5.2, 2.5.3, or 2.5.4 in a boot from SAN configuration using an optical drive, the installation may stop after displaying running /sbin/loader.
When booting from the CD, watch for the “boot:” prompt at the bottom of the screen. When it appears, type
bootfromsan nousb
and press the Enter key. The system may also hang when booting from the SAN. Again, watch for the “boot:” prompt; this time, type
nousb
and press the Enter key. To apply this workaround automatically, edit /etc/lilo.conf. Add the keyword nousb to the beginning of every append= line in the file. If there is no append= line, add one:
append="nousb"
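The lilo.conf edit can also be scripted. The following is a minimal sketch under stated assumptions: add_nousb is a hypothetical helper name, it handles only append= lines whose value is enclosed in straight double quotes, and it does not create an append= line when none exists. Try it on a copy of the file first.

```shell
#!/bin/sh
# Hypothetical helper (an assumption for illustration, not a VMware or LILO tool).
# Usage: add_nousb FILE
# Inserts "nousb " right after the opening quote of every append= line
# that does not already contain nousb.
add_nousb() {
    file=$1
    nousb_tmp=$(mktemp) || return 1
    if sed '/^[[:space:]]*append[[:space:]]*=/{
/nousb/!s/=[[:space:]]*"/&nousb /
}' "$file" > "$nousb_tmp"; then
        # Write back through cat so the original file keeps its permissions.
        cat "$nousb_tmp" > "$file"
        status=$?
    else
        status=1
    fi
    rm -f "$nousb_tmp"
    return $status
}
```

For example, a line such as append="console=ttyS0" would become append="nousb console=ttyS0", and running the helper a second time leaves the line unchanged. After editing /etc/lilo.conf, the lilo command normally has to be rerun for the change to take effect at the next boot.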
Copyright © 2010, Oracle and/or its affiliates. All rights reserved.