CHAPTER 3

Software Issues

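This chapter describes software issues related to the Sun Fire X4100 M2/X4200 M2 servers, and includes these topics: General Software Issues, Utilities Issues, Solaris Operating System Issues, Linux Operating System Issues, Windows Server Issues, and VMware Issues.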


General Software Issues

Current

Ethernet Port Mapping Differs From Physical Port Mapping in Various OSs (6421259)

This issue is fully described in the Service Manual, 819-1157.

Workaround

None. This is expected behavior for the OSs in question.

Some X-option-card Drivers Not Available (6730873)

There are no drivers for these 10-Gb X-option cards on the following platforms:


Card                                              Platforms

IB-HCA (PN: 375-3382-01)                          Windows 2008, SLES 9 SP4
IB-HCA (PN: 375-3549-01)                          Windows 2008, SLES 9 SP4
10-Gb Ethernet PCI-E (PN: 501-7283-04)            RHEL 5.2, RHEL 4.6, Windows 2008, SLES 10 SP2, SLES 9 SP4
SAS 8-port HBA (PCI-E x8 LP) (PN: 375-3487-01)    RHEL 5.2, SLES 10 SP2, SLES 9 SP4

Workaround: Check the download URL specified for your card to determine whether a driver has become available.

Pressing Locate LED Button For More Than 15 Seconds Results in SEL Log Messages (6773451)

When the locate LED button is pressed for more than 15 seconds, log messages are generated. The following steps reproduce the issue:

1. Press the locate LED button for more than 15 seconds. This lights up all the LEDs.

2. After 15 seconds, all the LEDs return to their previous state.

3. Check the SEL log through ipmitool.

Workaround

The log messages can be safely ignored.
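
To review the messages from step 3, the SEL can be listed with ipmitool. The following is a minimal sketch, assuming ipmitool is installed and can reach the service processor through its default interface. The function name recent_sel_entries and the IPMITOOL override variable are illustrative only and are not part of any Sun tool.

```shell
#!/bin/sh
# Print the most recent SEL entries (default: last 20) using "ipmitool sel list".
# IPMITOOL is a hypothetical override for the ipmitool path, e.g. for testing.
recent_sel_entries() {
    count="${1:-20}"
    "${IPMITOOL:-ipmitool}" sel list | tail -n "$count"
}

# Example (run after pressing the locate LED button):
#   recent_sel_entries 10
```

Limiting the output to the last few entries makes it easier to spot the messages generated by the button press.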


Utilities Issues

Current

Cfggen Does Not Show Correct Synchronization Progress (6600993)

The synchronization progress reported by the cfggen status command is not accurate. This is a problem with DOS version 2.00.18 of cfggen.

MSM: Removing One HDD Causes Others to Disappear (6514389, 6487038, 6515371)

In a standard (non-RAID) configuration, if drive 1 of several is removed, the drives after it (for example, drives 2, 3, and 4) also disappear from the MSM screen. The problem is strictly visual; the drives are still present, are detected by the OS, and work properly.

This issue does not occur in RAID configurations. Do not use MSM to manage non-RAID HDD configurations.

MSM: Status Log Does Not Reflect Status When a Disk is Removed (6525291)

This is an LSI firmware issue that applies to 32- and 64-bit Windows 2003 servers where RAID arrays do not exist.

Prior to creating any RAID arrays, if a nonbootable disk is pulled from the system, the MSM log fails to reflect the disk’s actual new status. The log is not updated until the drive is reinserted.

Do not use MSM to manage non-RAID HDD configurations.

SunCFG: Setting Strings Larger than 63 Characters Can Corrupt Other Strings (6686490, 6686513, 6686521)

In SunCFG 1.11, setting certain strings to more than 63 characters can corrupt other strings.

Workaround

Do not enter strings larger than 63 characters.

SunCFG: Cannot Set TYPE String (6686513)

In SunCFG 1.11, attempts to set the SMBIOS Type 3 TYPE string do not work.


Resolved

Error Messages During Boot from SunVTS CD Can Be Ignored

If you boot from the SunVTS Bootable Diagnostics CD .iso image, version 2.1f, through a virtual CD-ROM or on some CD-ROM models, you might see the following messages. These messages are harmless and can be ignored:

Sep  7 03:49:11  scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci1022,7460@6/pci1022,7464@0,1/storage@1/disk@0,0 (sd0):
Sep  7 03:49:11         Error for Command: read(10)       Error Level: Fatal
Sep  7 03:49:11  scsi: [ID 107833 kern.notice]  Requested Block: 109118                    Error Block: 109118
Sep  7 03:49:11  scsi: [ID 107833 kern.notice]  Vendor: AMI                                Serial Number:
Sep  7 03:49:11  scsi: [ID 107833 kern.notice]  Sense Key: Media Error
Sep  7 03:49:11  scsi: [ID 107833 kern.notice]  ASC: 0x11 (unrecovered read error), ASCQ: 0x0, FRU: 0x0

Meter Button in Bootable Diagnostics CD Version 2.1f Does Not Work

The SunVTS 6.2 graphical user interface (GUI), shipped on the Bootable Diagnostics CD, version 2.1f, has a Meter button. This button does not work because it requires the Solaris sdtperfmeter utility, which is not available for bootable diagnostics.

 


Solaris Operating System Issues

Current

Cannot Boot with RHEA Card in PCI-E Slots 0 or 1 (6472670)

In systems with multiple root PCI buses, adding or removing a PCI card with a PCI-PCI bridge changes the unit-addresses assigned to some of the root buses, thereby invalidating the persistent boot path. As a result, the server fails to reboot. There is no workaround for this problem. If this condition occurs, you must completely reinstall the OS.

Solaris Installation Hangs if Multiple NIC Cards Are Installed (6724474, 6673707)

Solaris installation may hang at random when multiple NIC option cards are installed. As a workaround, remove the option cards before installing Solaris and reinstall them after the installation finishes.

Resolved

Unnecessary BIOS Patch Recommended at Boot Time (6468360, 6447850)

The following message may appear at boot time:

workaround applied for cpu erratum #122

WARNING: BIOS microcode patch for AMD Athlon(tm) 64/Opteron(tm) processor erratum 131 was not detected; updating your system’s BIOS to a version containing this microcode patch is HIGHLY recommended or erroneous system operation may occur.

This recommendation to patch the BIOS can be safely ignored; CPUs on Sun Fire X4100 M2 and Sun Fire X4200 M2 systems include a fix for Erratum 122.

FMA Errors for Intel X4446A-Z NICs (6601498)

(Fixed in Solaris 10 5/08.)

FMA errors may be reported for X4446A-Z NICs on systems running Solaris 10 8/07. These error messages can be safely ignored.

NVIDIA Gigabit Ethernet Port Hangs Under Heavy Load (6500058, 6502876)

(Fixed in Solaris 10 8/07.)

An NVIDIA Gigabit Ethernet port can hang when under heavy load. To avoid this problem, install the latest version of patch 127891.

AMD Erratum 131 Warning Message Can Be Safely Ignored During OS Startup (6438926, 6468360, 6447850)

(Fixed in Solaris 10 8/07.)

Solaris AMD x64 support includes a boot-time check for the presence of a BIOS workaround for the AMD Opteron Erratum 131. If Solaris detects that the workaround for Erratum 131 is needed but it is not yet implemented, Solaris logs and displays the following warning message:

WARNING: BIOS microcode patch for AMD Athlon(tm) 64/Opteron(tm) processor erratum 131 was not detected; updating your system’s BIOS to a version containing this microcode patch is HIGHLY recommended or erroneous system operation may occur.

This warning message can be safely ignored. The Sun Fire X4100 M2/X4200 M2 BIOS implements a superset workaround that includes the workaround required for Erratum 131.

IB-HCA Card X4217A-Z Is Not Recognized (6724880)

On X4100 M2 and X4200 M2, the IB-HCA card X4217A-Z is not recognized on a system running Solaris. To support that card, you must install Solaris IB update 2. This update can be downloaded from:

http://www.oracle.com/technetwork/indexes/downloads/sun-az-index-095901.html#S

Solaris 10 OS 6/06: Connection to NVIDIA-Controlled NICs is Lost After Changing Port Speeds Using Netgear Switch (6419824, 6441359)

(Fixed in Software 1.2)

When using specific models of Netgear Gigabit switches with servers running Solaris 10 OS 6/06, the links between the NET0 and NET1 Ethernet ports (nge0 and nge1 in Solaris 10 OS) and the switch are not reestablished after the speed of the Netgear ports is changed from 1000 to 100. The Netgear switch models on which this behavior has been observed are GS724TS and GS748T.

See Ethernet Port Mapping Differs From Physical Port Mapping in Various OSs (6421259) for the physical location of the Ethernet ports.

Unknown event e Message in messages or dmesg Files (6459169)

Your server’s /var/adm/messages file or dmesg file might display the following message:

mpt():unknown event e received.

This message is displayed when a QUEUE FULL event occurs (queue already contains the maximum number of messages allowed).

Workaround

No action is required. The LSI SAS controller firmware handles the situation.

Sudden Program Termination With Possible Data Corruption (6636513)

A program may terminate suddenly with a SIGFPE exception. This can cause data corruption. This problem is fixed in Solaris 10 5/08.

Workaround

If you are using an earlier Solaris version, please install the latest version of patch 127112 to avoid this problem.

I/O Processes Hang (6490454, 6469065)

A program that performs intensive I/O may hang or cause disk errors. This problem is fixed in Solaris 10 8/07. For earlier Solaris versions, install the latest version of patch 123776 to avoid this problem.


Linux Operating System Issues

Current

SLES: NVIDIA NIC Problem Causes Application Failure Under Heavy Load (6610532, 6653013)

Under SUSE Linux Enterprise Server (SLES) 10 SP1, SLES 9 SP3, and SLES 9 SP4, a problem with the forcedeth driver may cause application failure under heavy load.

Workaround

Until an updated Linux kernel is available from Novell, avoid using the NICs labeled “NET0” and “NET1”, which use the NVIDIA forcedeth driver. Instead, use the NICs labeled “NET2” and “NET3”, which use the Intel driver.

RHEL 3: ACPI Error Messages (6469965)

RHEL 3 appears to have problems correctly parsing ACPI tables. This results in numerous error messages. These messages do not reflect a real problem, and can be safely ignored.

RHEL 3U8: Performance on NVIDIA Ports is Very Slow Compared with Intel Ports (6503371)

This condition is due to an old version of Forcedeth that ships with the OS.

Upgrade the driver to Forcedeth version 0.60 from the NVIDIA website. The upgrade fully restores performance.

RHEL 3U8_64: Dmesg Shows floppy0: no floppy controllers found When a USB Diskette Drive is Attached (6513814)

Because of the way USB devices are probed, the OS may report that the floppy drive is missing, even when one is attached. This message can be safely ignored.

RHEL 3 U8: NUMA Disabled by Default (6502538)

In RHEL 3U8, the default setting for /proc/sys/vm/numa_memory_allocator is 0. As a result, large applications may experience “out of memory” errors.

Workaround

You can temporarily change this setting without rebooting:

# echo 1 > /proc/sys/vm/numa_memory_allocator 

To change the setting permanently, add the following line to /etc/sysctl.conf:

vm.numa_memory_allocator = 1

RHEL 4 U5 Non-uniform Memory Access (NUMA) Applications Do Not Perform As Expected (6719368)

Applications that are controlled by Non-uniform Memory Access (NUMA) and applications that rely on numactl do not perform as expected on RHEL 4 through RHEL 4.7. These issues are due to NUMA recognizing only three CPU nodes instead of the actual four. The following numactl --show output lists only nodes 0, 1, and 2 on the nodebind line:

# numactl --show 
policy: default 
preferred node: 0 
interleavemask: 
interleavenode: 0 
nodebind: 0 1 2
membind: 0 1 2 3
cpubind: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

This issue does not impact newer applications that are able to use /sys/devices/system/node/.

Workaround: Configure the applications to use /sys/devices/system/node, or configure the applications to use NUMA’s config file.
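
To confirm whether a system is affected, the node count exposed under /sys/devices/system/node can be compared with the number of nodes on the nodebind line of numactl --show. The following is a minimal sketch; the function names count_numa_nodes and count_nodebind_nodes are illustrative, and the sketch assumes the numactl --show output format shown above.

```shell
#!/bin/sh
# Count the nodeN directories under a sysfs-style node directory.
# The directory argument defaults to /sys/devices/system/node.
count_numa_nodes() {
    node_dir="${1:-/sys/devices/system/node}"
    n=0
    for d in "$node_dir"/node[0-9]*; do
        if [ -d "$d" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

# Read "numactl --show" output on stdin and print how many nodes
# appear on the nodebind line.
count_nodebind_nodes() {
    awk '/^nodebind:/ { print NF - 1 }'
}

# Example:
#   count_numa_nodes
#   numactl --show | count_nodebind_nodes
```

If the two numbers differ (for example, 4 and 3), the system shows the behavior described in this issue.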

RHEL 5 U2 NIC Mapping is Different From Previous Versions (6770474)

If the OS is upgraded to RHEL 5 U2, the network port sequence of some option cards changes, and the previous NIC mapping settings change as a result.

Resolved

RHEL 5: Some Drivers Not Available (6558529)

(Fixed in Software 1.3)

Drivers for Red Hat Enterprise Linux 5 are not available for the following Sun option cards:

RHEL 5: OS Becomes a Read-Only File System After Removing a Disk From a RAID1 (6543466)

(Fixed in Software 1.2.)

If you remove disk 1 (slot 1) from a RAID1, the OS becomes a read-only file system, and dmesg shows numerous errors.

Solution

Install MPT driver 4.00.05.00-1, which is included in an RPM package on the Tools and Drivers CD.

RHEL 4: Servers Might Hang at Enabling Swap Space Message After Power Cycling (6470496)

(Fixed in Software 1.3.)

This applies to the following releases:

Following an AC power cycle, the OS might hang after the Enabling swap space boot message. This is apparently caused by a bug in Kudzu (refer to Red Hat Bugzilla entry 197722).

If the hang occurs, reboot the server and be sure to type “y” when prompted whether the file system should be checked during the reboot.

Workaround

You can avoid this problem by disabling Kudzu from the command line.

prompt> service kudzu stop

prompt> chkconfig --level 345 kudzu off

RHEL 4 U4_32: OS Installation Fails (6551551)

(Fixed in SW 1.2.)

The system hangs at /sbin/loader during OS installation.

Workaround

Prior to OS installation, configure the system to support only USB 1.1. After the successful OS installation, reconfigure the system to support USB 2.0. To set USB support, go to BIOS Setup -> Advanced -> USB Configuration.

RHEL 5.1: NVIDIA NIC Mishandles Message-Signaled Interrupts Under High Load (6644176)

(Fixed in Software 2.0.)

Under Red Hat Enterprise Linux 5.1, a problem with the forcedeth driver can cause Message-Signaled Interrupts (MSIs) to be mishandled.

Resolution

Two steps are required to resolve this problem:

1. Download the updated Linux kernel provided by Red Hat:

http://rhn.redhat.com/errata/RHSA-2007-0993.html

2. Add the following line to /etc/modprobe.conf:

options forcedeth msi=0
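
Step 2 can be scripted so that the line is added only once, even if the script is run repeatedly. The following is a minimal sketch; the function name add_forcedeth_msi_option is illustrative, and the sketch assumes the file already ends with a newline.

```shell
#!/bin/sh
# Append "options forcedeth msi=0" to a modprobe configuration file
# unless that exact line is already present.
# The file argument defaults to /etc/modprobe.conf.
add_forcedeth_msi_option() {
    conf="${1:-/etc/modprobe.conf}"
    line='options forcedeth msi=0'
    if ! grep -qxF "$line" "$conf" 2>/dev/null; then
        printf '%s\n' "$line" >> "$conf"
    fi
}

# Example (as root):
#   add_forcedeth_msi_option /etc/modprobe.conf
```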

SIA Install of RHEL 5.1 Fails (6681828)

SIA installations of RHEL 5.1 fail because of legacy dmraid information on the supplied disks.

Workaround

The legacy dmraid information is on the last 2000 disk sectors. The following shell command overwrites these sectors with zeros:

dd if=/dev/zero of=/dev/sda bs=$(blockdev --getss /dev/sda) count=2000 seek=$(expr $(blockdev --getsize /dev/sda) - 2000)

This issue will not be fixed.
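
Because the dd command above is destructive, it can help to print the computed command and check the seek value before running it. The following is a minimal sketch of that arithmetic; the function name print_wipe_tail_cmd is illustrative. Note that blockdev --getsize reports the size in 512-byte sectors, so combining it with the sector size from --getss assumes a disk with 512-byte logical sectors.

```shell
#!/bin/sh
# Print (do not run) the dd command that zeroes the last sectors of a disk.
#   $1 device path
#   $2 sector size in bytes   (value of: blockdev --getss DEVICE)
#   $3 total sector count     (value of: blockdev --getsize DEVICE)
#   $4 number of trailing sectors to overwrite (default: 2000)
print_wipe_tail_cmd() {
    dev="$1"
    sector_size="$2"
    total_sectors="$3"
    tail_sectors="${4:-2000}"
    if [ "$total_sectors" -le "$tail_sectors" ]; then
        echo "device too small: $total_sectors sectors" >&2
        return 1
    fi
    seek=$((total_sectors - tail_sectors))
    echo "dd if=/dev/zero of=$dev bs=$sector_size count=$tail_sectors seek=$seek"
}

# Example:
#   print_wipe_tail_cmd /dev/sda "$(blockdev --getss /dev/sda)" "$(blockdev --getsize /dev/sda)"
```

Review the printed command, in particular the device name, before executing it.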

MSM Does Not Start (6609312)

When running under Linux, MegaRAID Storage Manager (MSM) might not be able to start if run after dhclient.

Workaround

Restart the X Window system. This is not a defect.

RHEL 4/SLES 9: Error Message when Booting the GUI (6416608)

(Fixed in Software 2.1.)

This applies to the following OS releases:

When booting the OS GUI, the dmesg log might show the following error message multiple times:

drivers/usb/input/hid-input.c: event field not found

During X initialization, some devices can get out of sync and some EV_REP events can get incorrectly interpreted as input events. This is caused by a bug in the HID driver. This message can be safely ignored.

RHEL 4: Intel Network Interface Card is Displayed with Inconsistent Logical Name After Bootup (6423182)

(Fixed in Software 1.0.)

This applies to the following releases:

If you install an Intel Ethernet card in a PCI-E slot on a powered-off RHEL 4 U3-x86 (64-bit) system and then reboot the system, the Intel network interface card (NIC) in that slot is displayed with a different logical name than that of other network devices. The card’s instance number also changes.

Workaround

Perform the following steps to make the card name display consistent:

1. Open a terminal window.

2. Stop the network:

# service network stop

3. Remove the Kudzu database:

# rm /etc/sysconfig/hwconf

4. Remove the ifcfg-eth files in the sysconfig directory:

# rm -f /etc/sysconfig/network-scripts/ifcfg-eth*
# rm -f /etc/sysconfig/networking/devices/ifcfg-eth*
# rm -f /etc/sysconfig/networking/profiles/default/ifcfg-eth*

5. Edit the modprobe.conf to remove any lines that start with the following:

alias eth* or alias dev*

6. Reboot the system.

7. Use the neat command to reconfigure the network devices.
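
Step 5 can be done with sed instead of a manual edit. The following is a minimal sketch; the function name strip_nic_aliases is illustrative. It writes the filtered content to standard output and leaves the original file untouched, so the result can be reviewed first.

```shell
#!/bin/sh
# Print a modprobe.conf-style file with every line that starts with
# "alias eth" or "alias dev" removed. The input file is not modified.
strip_nic_aliases() {
    sed -e '/^alias eth/d' -e '/^alias dev/d' "$1"
}

# Example:
#   strip_nic_aliases /etc/modprobe.conf > /tmp/modprobe.conf.new
#   (review /tmp/modprobe.conf.new, then copy it over /etc/modprobe.conf)
```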

RHEL 3U8_64: System Hangs Occasionally Under High Load with Many PCI I/O Devices Installed (6502242)

(Fixed in Software 1.2.)

RHEL 3 has a well-known issue with high load situations on the PCI bus.

RHEL 3, by default, scans only LUN 0 of any SCSI ID.

RHEL 3 U8_64: LSI Cards 22320/20320 Not Supported (6506460)

(Fixed in Software 1.2.)

The LSI 22320/20320 Dual Ultra320 SCSI cards (based on the LSI 53C1030 chip) are not supported on RHEL 3_U8_64.

SLES9_64 and SLES10_64: System Does Not Boot With Supported HBA Card in Slot 0 (6307424, 6343559)

(Fixed in Software 1.0.1.)

On systems running SLES9, if a host bus adapter (HBA) card is plugged into slot 0, you might not be able to boot the system. This is because SLES9 enumerates IDE and SCSI devices in scan order, and the BIOS scans PCI devices in ascending order. The scanning priority is:

1. NIC

2. Slot 0

3. SAS

4. Slot 2

5. Slot 3

6. Slot 4

7. Slot 1

If there is only one drive in the system, it is enumerated as /dev/sda. If an external device is later connected to an HBA card in slot 0, the device will be enumerated as /dev/sda and the internal device will be enumerated as /dev/sdb. However, the SLES9 boot device points to /dev/sda, which is an external device without the OS, and the system cannot boot.

The problem does not occur if the HBA card is plugged into slots 1-4 because these slots are scanned later than the on-board LSI controller. This problem is not specific to the server or the HBA card.

Workaround

Plug the supported HBA card into slots 1-4, and then reboot the system. Also, follow these general guidelines:

SLES9_SP3: Ignore Error Message when First Writing to an ext3 File System (6422442)

If you create a partition with an ext3 file system, then mount that file system and write a file, the following JBD warning message is displayed:

JBD: barrier-based sync failed on sd<X><Y> - disabling barriers

This message can be safely ignored.

Workaround

To suppress the message, mount the ext3 file system with either the data=writeback or the barrier=none mount option.

RHEL 3_U9: Bad Support for USB 2.0 (6571085)

RHEL 3_U9 does not have reliable support for USB 2.0. This makes it difficult to install the OS using an optical drive that defaults to USB 2.0.

Workaround

Change the BIOS settings so that all connections use USB 1.1.

This issue will not be fixed.

RHEL 3_U8_64: INSMOD Error Messages (6501643)

(Fixed in Software 1.1.)

Messages similar to the following occur.

insmod: /lib/modules/2.4.21-47.ELsmp/kernel/drivers/block/floppy.o: init_module: No such device 
kernel: PCI: No IRQ known for interrupt pin A of device 00:01.1

These messages have two causes:

1. The OS assumes that all systems have floppy drives and parallel ports.

2. The BIOS descriptions for IRQs are more current than those used by the OS.

These messages can be safely ignored.

RHEL 5: Some Drivers Not Available (6558529)

(Fixed in Software 1.2.)

Drivers for Red Hat Enterprise Linux 5 are not available for the following Sun option cards:

SLES9 SP3 (64-bit): lpfc Driver Does Not Work (6655761)

(Fixed in Software 2.2.)

The lpfc driver provided with SUSE Linux Enterprise Server 9 SP3 (64-bit) for Sun’s PCI Express Dual Gigabit Ethernet fibre channel adapters does not work. An updated driver is available from Novell:

http://forgeftp.novell.com/driver-process/pub/update/SUN/sle9/common/x86_64/update/SUSE-SLES/9/rpm/x86_64/emulex-lpfc-2.6.5_7.244_smp.x86_64.rpm

Tbench Fails On SLES and RHEL When Setting 128 Clients (6730796, 6728709)

With TAC, when tbench is run on three NICs (two Intel and one NVIDIA) with 128 clients, it usually fails only on the NVIDIA NICs. ttcp-19990909/tcp, tcprr, and tcpmaerts usually fail at the same time, but running them separately results in a pass.

RHEL 3: Multiple Instances of Virtual AMI Floppy as “Unknown Device Type” With Additional Errors (6505341)

Some systems running RHEL 3 report multiple instances of a virtual AMI floppy drive, with additional error messages such as “resize_dma_pool: unknown device type 31”. This is a known problem with RHEL 3 and will not be fixed.

RHEL 4.7 32-bit dmesg Reports "APIC error on CPU..." After Stress Test (6762301)

An APIC error is reported on the CPU after a stress test. The steps to reproduce this issue are as follows:

1. Install RHEL 4.7 32-bit (PXE).

2. Boot up the OS.

3. Ensure there are no errors in the dmesg log.

4. Run a TAC stress test.

The dmesg log then shows "APIC error on CPU...". After the OS is rebooted, the "APIC error on CPU..." message no longer appears.
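
The dmesg checks before and after the stress test can be reduced to a count of matching lines. The following is a minimal sketch; the function name count_apic_errors is illustrative, and the match string is taken from the message quoted above.

```shell
#!/bin/sh
# Read kernel log text on stdin and print the number of lines that
# contain "APIC error on CPU". Prints 0 when there are none.
count_apic_errors() {
    grep -c 'APIC error on CPU' || true
}

# Example:
#   dmesg | count_apic_errors
```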


Windows Server Issues

Current

Windows 2003 Guest Under SLES 10, SP 1/XEN Has Blue Screen Crash (6645567)

A Windows 2003 guest running under SUSE Linux Enterprise Server 10 via XEN can have a blue screen crash. This occurs when two virtual CPUs are configured.

Workaround

Configure a single virtual CPU. You should also obtain a patch for xennet.sys from Novell.

Windows 2003 Utility mkfloppy.exe Does Not Select Correct Diskette Drive if More than One Diskette Drive is Present

The mkfloppy.exe utility that is included in FloppyPack.zip can be run on any Windows system; it is used to create the Mass Storage Driver floppy that is used during OS installation.

However, if there is more than one floppy drive present in the system (including USB-attached floppy drives), mkfloppy.exe does not select the correct floppy drive.

Workaround

Ensure that the system has only one floppy drive present when using mkfloppy.exe.

Resolved

Servers with 4 GB of Memory or Less Cannot Resume from Hibernation Automatically (6458266)

(Fixed in Software 1.3.)

Servers with 4 GB of memory, or less, cannot resume from hibernation automatically.

Workaround

There are three workarounds:

For Sun Fire X4100/X4200 M2 servers, download the InstallPack.exe application at:

http://www.oracle.com/technetwork/indexes/downloads/sun-az-index-095901.html#S

Platform Resets When Data Is Copied From USB Storage to Internal Disks (6647109)

(Fixed in Software 2.0)

X4200 M2 servers running Microsoft Windows Server 2003 R2 SP2 (32-bit and 64-bit) experience platform resets when data is copied between a USB storage device and an internal disk.

SIA Install of Windows 2003 Does Not Include Option Card Drivers (6555748)

(Fixed in Software 1.3)

SIA is not currently able to install drivers for Sun option cards on Windows 2003. These must be installed manually.

Network Configuration Lost After BIOS Update (6778969)

See Windows BIOS Option to Save Network Configuration.


VMware Issues

Resolved

Two of Four NIC Ports Inoperable (6518982)

(Fixed in Software 1.3.)

Of the four built-in network interfaces, the bottom two NICs (labeled NET0 and NET1) do not function with VMware ESX 3.01.

Workaround

Use the top two network interfaces (labeled NET2 and NET3). If additional network interfaces are required, use a network option card.

Solution

Upgrade to VMware ESX 3.02, the version supported by Software 1.3.

ESX Installation Stops (6549480)

While installing ESX Server 2.5.2, 2.5.3, or 2.5.4 in a boot from SAN configuration using an optical drive, the installation may stop after displaying running /sbin/loader.

Workaround

When booting from the CD, watch for the “boot:” prompt at the bottom of the screen. When it appears, type

bootfromsan nousb

and press the enter key. The system may also hang when booting from the SAN. Again, watch for the “boot:” prompt; this time, type

nousb

and press the enter key. To have this workaround happen automatically, edit /etc/lilo.conf. Add the keyword nousb to the beginning of every append= line in the file. If there is no append= line, add one:

append="nousb"

This issue will not be fixed.
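
The /etc/lilo.conf edit described above can be previewed with a small filter. The following is a minimal sketch; the function name add_nousb_to_append is illustrative. It assumes that append= values are enclosed in double quotes, and when no append= line exists it places the new line at the top of the file, which is an assumption about where the line belongs rather than something stated in this issue. The function prints the result and does not modify the file.

```shell
#!/bin/sh
# Print a lilo.conf-style file with "nousb" placed at the beginning of
# every double-quoted append= value. Values that already start with
# nousb are left unchanged. If the file has no append= line, one is
# printed before the rest of the file.
add_nousb_to_append() {
    file="$1"
    if grep -q '^[[:space:]]*append=' "$file"; then
        sed -e '/^[[:space:]]*append="nousb/!s/^\([[:space:]]*append="\)/\1nousb /' "$file"
    else
        echo 'append="nousb"'
        cat "$file"
    fi
}

# Example:
#   add_nousb_to_append /etc/lilo.conf > /tmp/lilo.conf.new
#   (review /tmp/lilo.conf.new before replacing /etc/lilo.conf)
```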