CHAPTER 6

Operating System Issues

This chapter describes operating system issues for the Sun Fire X4140, X4240, and X4440 servers.

The following topics are covered:

Supported Operating Systems

Solaris Issues

Linux Issues

VMware Issues

Windows Issues


Supported Operating Systems

The Sun Fire X4140, X4240, and X4440 servers with SW3.1 support the following operating systems:

Solaris: Solaris 10 10/09

OpenSolaris: OpenSolaris 2009.06

VMware (not with a Six-Core AMD Opteron Processor): ESX 3.5 U4, ESX 4.0, ESXi, ESX 3.5 U3

Windows: Windows Server 2008 Datacenter SP2 (32-bit and 64-bit), Windows Server 2003 SP2 (32-bit and 64-bit), Windows Server 2008 R2

Linux (64-bit unless noted): SLES 11, SLES 10 SP2, SLES 10 SP3, RHEL 4.8 (32-bit and 64-bit), RHEL 4.6 (32-bit and 64-bit), RHEL 5.1, RHEL 5.2, RHEL 5.3, RHEL 5.4

For a complete list of operating systems supported for your server, go to the Sun web site:


Solaris Issues

This section contains issues that apply to Sun Fire X4140, X4240, and X4440 servers running the Solaris 10 operating system.

Cross-Process Silent File Corruption During Abrupt Shutdown of a UFS File System (6577822)

With UFS file systems, files on disk might not be synchronized if the system is shut down abruptly.

Workaround: Use one of the following methods to avoid this problem.

1. Store all critical files on ZFS file systems.

2. Mount UFS file systems using the ‘forcedirectio’ option.

The second method might degrade performance for some types of file access.
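The second method can be applied by editing /etc/vfstab. The following Python sketch is a hypothetical helper, not a Sun-supplied tool; it assumes the standard seven-field vfstab layout and adds forcedirectio to the mount options of UFS entries only, leaving comments and other file system types unchanged.

```python
def add_forcedirectio(vfstab_line: str) -> str:
    """Return a vfstab line with 'forcedirectio' in its mount options.

    Hypothetical helper: only UFS entries are changed; comments, blank
    lines, and other file system types are returned as-is.
    """
    stripped = vfstab_line.strip()
    if not stripped or stripped.startswith("#"):
        return vfstab_line
    # Fields: device to mount, device to fsck, mount point,
    # FS type, fsck pass, mount at boot, mount options.
    fields = stripped.split()
    if len(fields) != 7 or fields[3] != "ufs":
        return vfstab_line
    # "-" means "no options"; also drop a conflicting noforcedirectio.
    options = [o for o in fields[6].split(",") if o not in ("-", "noforcedirectio")]
    if "forcedirectio" not in options:
        options.append("forcedirectio")
    fields[6] = ",".join(options)
    return "\t".join(fields)
```

Applied to each line of a backed-up copy of /etc/vfstab, the new option takes effect when each file system is next mounted. The helper is idempotent, so running it twice does not add the option twice.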

Enabling Drive LEDs on Nvidia SATA SSDs (6803301)

On preinstalled Solaris operating systems, drive LEDs on Nvidia SATA SSDs are enabled automatically.

On Solaris operating systems that were not preinstalled, do the following to enable the drive LED functions when using onboard Nvidia SATA drives or SSDs:

1. For Solaris builds prior to s10u7, install patch 140339-04 or later.

2. Add the following line to /kernel/drv/nv_sata.conf:

enable-sgpio-leds=1


Linux Issues

This section contains the following groups of issues:

Red Hat or SUSE Linux Issues: issues that apply to servers running either a supported Red Hat or SUSE Linux operating system.

Red Hat Linux: issues that apply to servers running a supported Red Hat Linux operating system.

SUSE Linux: issues that apply to servers running a supported SUSE Linux operating system.


Red Hat or SUSE Linux Issues

Redirecting the Server Console to the Serial Port Using Linux Operating System Commands (6623089)

Although the server’s ILOM provides a console redirection feature, you can also redirect the server console to the serial port on Red Hat (RHEL) or SUSE (SLES) as follows:

1. Add the following line to the /etc/inittab file. (On SLES, this line might already exist but be commented out; if so, remove the "#" at the beginning of the line.)

s0:12345:respawn:/sbin/agetty -L 9600 ttyS0 vt102

2. Add the following line to the /etc/securetty file:

ttyS0

3. Edit the /etc/grub.conf file as follows.

a. Comment out the line that begins with "splashimage ...", like this:

# splashimage=(hd0,0)/grub/splash.xpm.gz

b. Add console=ttyS0,9600 console=tty0 to the end of the line that begins with "kernel /vmlinuz ...", for example:

kernel /vmlinuz-2.6.9 ro root=LABEL=/ debug console=ttyS0,9600 console=tty0

c. Optionally, to display the GRUB boot menu on the serial console, add the following lines before the splashimage line:

serial --unit=0 --speed=9600

terminal --timeout=10 serial console
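The grub.conf edits in step 3 can also be applied as a text transformation. The following Python sketch is a hypothetical helper that assumes GRUB legacy syntax, as in the examples above, and kernel lines beginning with kernel /vmlinuz; on SLES the equivalent file is typically /boot/grub/menu.lst. It inserts the optional serial and terminal lines before splashimage, comments out splashimage, and appends the console arguments only once.

```python
def enable_serial_console(grub_conf: str, speed: int = 9600) -> str:
    """Apply the serial-console edits from step 3 to grub.conf text."""
    out = []
    for line in grub_conf.splitlines():
        stripped = line.lstrip()
        if stripped.startswith("splashimage"):
            # Step c (optional): serial menu lines go before splashimage.
            out.append(f"serial --unit=0 --speed={speed}")
            out.append("terminal --timeout=10 serial console")
            # Step a: comment out the splash image.
            out.append("# " + line)
        elif stripped.startswith("kernel /vmlinuz") and "console=ttyS0" not in line:
            # Step b: send console output to the serial port and the screen.
            out.append(f"{line} console=ttyS0,{speed} console=tty0")
        else:
            out.append(line)
    return "\n".join(out) + "\n"
```

Because already-edited lines no longer match, running the helper a second time leaves the text unchanged. Review the result before writing it back to the real file.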

Steps to Enable Linux OS to See AMI Virtual CDROM/Floppy (6570949, 6603436)

The following steps might be required for the Linux host OS to detect the AMI Virtual CDROM and AMI Virtual Floppy without problems (even with on-demand-usb).

To enable scsi_mod scanning of multiple LUNs (on 2.6 kernel systems), do the following:

1. Edit /etc/modprobe.conf.

2. Add the following line:

options scsi_mod max_luns=128

3. Save the file.

4. Change to the /boot directory:

cd /boot

5. Run the mkinitrd command to rebuild the initrd ramdisk for each kernel version in use, for example:

On SUSE:

mkinitrd -k vmlinuz-<kernel> -i initrd-<kernel>

On Red Hat:

mkinitrd -v initrd-<kernel>.img <kernel>

6. Reboot the host.
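To see which kernel versions need a rebuilt initrd, the /boot directory listing can be scanned. The following Python sketch uses hypothetical helper names; it builds the mkinitrd commands shown in step 5 for each vmlinuz-<kernel> image and only returns the command strings without running them. Red Hat's mkinitrd typically refuses to overwrite an existing image unless -f is given; the sketch keeps the commands exactly as documented.

```python
def kernels_in_boot(filenames):
    """Return the kernel versions that have a vmlinuz-<version> image."""
    prefix = "vmlinuz-"
    return sorted(
        name[len(prefix):]
        for name in filenames
        if name.startswith(prefix) and len(name) > len(prefix)
    )


def mkinitrd_commands(distro, kernels):
    """Return the documented mkinitrd command line for each kernel version."""
    templates = {
        "suse": "mkinitrd -k vmlinuz-{k} -i initrd-{k}",
        "redhat": "mkinitrd -v initrd-{k}.img {k}",
    }
    if distro not in templates:
        raise ValueError(f"unsupported distribution: {distro!r}")
    return [templates[distro].format(k=k) for k in kernels]
```

On the host, passing os.listdir("/boot") to kernels_in_boot gives the list of installed kernels; the bare vmlinuz symlink found on some systems is skipped.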


Red Hat Linux

(RHEL 4.5) Sun Fire X4240/X4440 Quad-Core Systems Have Hypertransport Sync Flood Error Under High IO Load (6682186)

In rare instances, Sun Fire X4240/X4440 quad-core servers running RHEL 4.5 might encounter a HyperTransport sync flood error with a link protocol error subcode. This error immediately triggers a warm reset.

Workaround: For more information and for instructions on installing a patch to resolve the issue, see:

http://kbase.redhat.com/faq/FAQ_42_11696.shtm

(RHEL 4.5 32-bit) OS Displays APIC error on CPUx: 40(40) (6590687)

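An “APIC error on CPUx: 40(40)” error (where “x” is the CPU number) might appear in the dmesg output when network traffic is running on the internal Nvidia Ethernet ports.

Two workarounds are available:

1. Set the forcedeth driver option max_interrupt_work=100 by adding the following line to /etc/modprobe.conf:

options forcedeth max_interrupt_work=100

2. Bind the interrupt of each Nvidia Ethernet port to a different CPU. To do this, disable irqbalance:

chkconfig --levels 12345 irqbalance off

(chkconfig does not stop an irqbalance daemon that is already running; to stop it immediately, also run service irqbalance stop.)

Then set the smp_affinity for each of the Nvidia Ethernet ports:

echo 1 > /proc/irq/num_eth0/smp_affinity

echo 2 > /proc/irq/num_eth1/smp_affinity

echo 4 > /proc/irq/num_eth2/smp_affinity

echo 8 > /proc/irq/num_eth3/smp_affinity

where num_eth0 through num_eth3 are the IRQ numbers assigned to the Ethernet ports, as listed in /proc/interrupts.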
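Looking up each IRQ number by hand is error-prone. The following Python sketch uses hypothetical helper names; it parses the text of /proc/interrupts, finds the IRQ for eth0 through eth3, and builds the matching echo commands with masks 1, 2, 4, and 8 (CPU0 through CPU3). It assumes the interface name appears as a device token at the end of its /proc/interrupts row; interfaces that are not found are skipped.

```python
import re


def irqs_for_interfaces(proc_interrupts, interfaces):
    """Map each interface name to its IRQ number in /proc/interrupts text."""
    found = {}
    for row in proc_interrupts.splitlines():
        match = re.match(r"\s*(\d+):", row)
        if match is None:
            continue  # header row or non-numeric rows such as NMI/LOC
        # Device names are trailing tokens, sometimes comma-separated.
        tokens = {token.strip(",") for token in row.split()}
        for name in interfaces:
            if name in tokens and name not in found:
                found[name] = int(match.group(1))
    return found


def affinity_commands(proc_interrupts, interfaces=("eth0", "eth1", "eth2", "eth3")):
    """Build one echo command per interface, pinning the Nth port to CPU N."""
    irqs = irqs_for_interfaces(proc_interrupts, interfaces)
    commands = []
    for cpu, name in enumerate(interfaces):
        if name in irqs:
            mask = 1 << cpu  # bitmap: 1 = CPU0, 2 = CPU1, 4 = CPU2, 8 = CPU3
            commands.append(f"echo {mask:x} > /proc/irq/{irqs[name]}/smp_affinity")
    return commands
```

On the host, pass the contents of /proc/interrupts and run the resulting commands as root. smp_affinity values are hexadecimal CPU bitmaps and are not preserved across reboots.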

(RHEL 4.5 64-bit) USB Port Becomes Inactive After OS Boot (6588236)

On a Sun Fire X4240 server using an LSI HBA running RHEL 4.5-x86_64, the server’s USB port becomes inactive after booting the OS.

Workaround: If you encounter this problem, unplug and reconnect the USB device, or reboot the server.

(RHEL 4.5 64-bit) dfrud Utility Occasionally Does Not Update SP FRU Data (6658442)

On occasion, the Disk FRU Utility (dfrud) does not update disk remove/insert information in the server’s service processor.

There is no workaround.

(RHEL 4.5) Non-Removable RAID Device Appearing to OS as Removable Device (6677394)

A Sun Fire X4240 system with the Sun StorageTek SAS RAID Eight-Port Internal Host Bus Adapter (Adaptec-based), part number SG-XPCIESAS-R-INT-Z or SG-PCIESAS-R-INT-Z, running RHEL 4.5 occasionally shows non-removable devices as removable devices. There is no known functional impact on the performance or reliability of the system.

There is no known workaround. The removable-device indication can be safely ignored.

RHEL 5 and SLES 10 SP2 File I/O Performance Significantly Unbalanced (6546534)

On servers running RHEL 5 or SLES 10 SP2, file I/O performance might be significantly unbalanced, in terms of both accumulated CPU time and measured data rate. This is caused by the Linux scheduler.

Workaround: Update the Linux kernel to version 2.6.24 or later.
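To check whether a kernel already meets the 2.6.24 minimum, compare the numeric part of its release string, as reported by uname -r. The following is a minimal Python sketch with a hypothetical helper name; it assumes distribution suffixes such as -164.el5 or -default can be ignored for this comparison.

```python
import re


def kernel_at_least(release, minimum=(2, 6, 24)):
    """Return True if a `uname -r` style release is at least `minimum`.

    Only the leading numeric components are compared; suffixes such as
    "-164.el5" or "-0.21-smp" are ignored.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    version = tuple(int(part) if part else 0 for part in match.groups())
    return version >= tuple(minimum)
```

On a running Linux system, platform.release() returns the same string as uname -r and can be passed directly to the helper.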

(RHEL 4.7) USB EHCI Is Broken In RHEL 4.7 Stock Kernel (6745462)

In the RHEL 4.7 stock kernel, USB EHCI does not work.

Workaround: Install the OS using SIA or PXE. After installation, update to errata kernel -78.6.

(RHEL 4.8) Errors After Installing Nvidia Network Driver 1.38 for RHEL 4.8 (6899635, 6894503)

To install the driver:

1. Run the following command:

rpm -ihv nvnet-rhel4.8-1-38.x86_64.rpm

2. After the driver installation is completed, reboot the system.

After the OS boots, errors similar to those shown in the following figure might appear:


FIGURE 6-1 RHEL Linux OS Release 4 Errors



Note - These warning messages do not affect the functionality of the NIC drivers and can be safely ignored.


(RHEL 4.8) Inbox e1000 Driver in RHEL 4.8 May Cause an I/O Port Resource Assignment Issue (6899657, 6896622)

Two Northstar 4-port GbE cards cannot work at the same time in 32-bit RHEL 4.8 when all PCI slots are populated.

With the PCI slots configured as follows:

slot0: Cougar (Adaptec)

slot1: Pallene-Q

slot2: Pallene-Q

slot3: IB-HA(256MB)

slot4: Northstar 4P

slot5: Northstar 4P

Running the following command:

dmesg | grep -i error

displays these errors:

[root@nsgbj-34-224 ~]# dmesg | grep -i error

e1000: probe of 0000:83:00.0 failed with error -22

e1000: probe of 0000:83:00.1 failed with error -22

e1000: probe of 0000:84:00.0 failed with error -22

e1000: probe of 0000:84:00.1 failed with error -22

Workaround: Remove one of the two Northstar 4-port cards from the server. The remaining card works correctly.
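To confirm this condition from a saved dmesg log, the failed probes can be extracted by PCI address. The following Python sketch uses hypothetical helper names and matches only the message format shown above; error -22 corresponds to -EINVAL.

```python
import re

PROBE_FAILURE = re.compile(r"e1000: probe of (\S+) failed with error (-?\d+)")


def failed_e1000_probes(dmesg_text):
    """Return (PCI address, error code) pairs for failed e1000 probes."""
    return [(m.group(1), int(m.group(2))) for m in PROBE_FAILURE.finditer(dmesg_text)]


def failed_buses(probes):
    """Group failed PCI functions by domain:bus, for example '0000:83'."""
    return sorted({address.rsplit(":", 1)[0] for address, _ in probes})
```

An empty result after one card is removed is a quick indication that the remaining card probed successfully.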

Heavy, Sustained Disk and Network I/O Might Cause Server to Hang or Display “Soft Lockup” Message (6609005, 6627637)

Under sustained, heavy disk and network I/O, Sun Fire X4140/X4240 servers might hang or display a “soft lockup” message on the console. The root cause has been traced to the Nvidia ‘forcedeth’ Ethernet driver. This problem might occur with the LSI HBA controller but could also affect other disk controllers. It might occur with Red Hat Enterprise Linux 5 but might also affect other Linux distributions and versions.

Systems affected by this problem typically exhibit the following symptoms:

Workaround: Until a permanent solution is available, two methods to avoid the problem have been tested and verified.

1. Add pci=nomsi to the boot command line in /boot/grub/grub.conf.

2. Add the following line to /etc/modprobe.conf:

options forcedeth max_interrupt_work=15

The first method is preferred because it avoids the issue entirely. The second method reduces the frequency of occurrence to zero (or near zero).
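The second method is an edit to /etc/modprobe.conf. The following Python sketch is a hypothetical helper that sets the value on an existing options forcedeth line, or appends a new line, so that repeated runs do not create duplicates. The module must be reloaded, or the system rebooted, for a new value to take effect.

```python
def set_module_option(modprobe_conf, module, key, value):
    """Set key=value on the 'options <module>' line, adding the line if absent."""
    lines = modprobe_conf.splitlines()
    setting = f"{key}={value}"
    for i, line in enumerate(lines):
        words = line.split()
        if len(words) >= 2 and words[0] == "options" and words[1] == module:
            # Drop any existing value for this key, then append the new one.
            kept = [w for w in words[2:] if not w.startswith(key + "=")]
            lines[i] = " ".join(["options", module] + kept + [setting])
            break
    else:
        lines.append(f"options {module} {setting}")
    return "\n".join(lines) + "\n"
```

The same helper fits other module options in this chapter, such as options scsi_mod max_luns=128.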


SUSE Linux

(SLES10 SP1) Disk FRU Information in SP Does Not Update Correctly After Simultaneous HDD Removals (6643935)

Adding or removing multiple disks simultaneously on a SLES10 system might not be reported correctly when you use ipmitool or similar applications to capture FRU information.

Workaround: Restart the dfrud service with the following command; the updated disk drive state information is then reported:

# service dfrud restart
Stopping dfrud:                                            [  OK  ]
Starting dfrud:                                            [  OK  ]
#


Note - Any disk drive FRU state change (such as removal or addition of disk drives) requires a restart of the dfrud service as outlined above.


(SLES10 SP1) Fails to Boot After Migrating from a RAW Disk to a HW RAID1 (6645523)

After an existing raw disk is migrated to RAID1 on SLES10 SP1, the system hangs with the message:

Waiting for device /dev/disk/by-id ... to appear

... exiting to /bin/sh

Workaround: During the installation of SLES 10 SP2, change the setting of “Mount in /etc/fstab by” from “Device ID” (which is the default) to “Device name”.

(SLES 10 SP2) Error Messages Can Be Ignored From the dmesg Log (6595474)

The following messages might show up in your dmesg log file.

ACPI error messages:

ACPI: Interpreter enabled
ACPI: Using IOAPIC for interrupt routing
Error attaching device data
.
.
.
ACPI: PCI Root Bridge [PCI0] (0000:00)

Real Time clock driver messages:

Real Time Clock Driver v1.12ac
hpet_resources: 0xfed00000 is busy
ACPI Error (utglobal-0125): Unknown exception code: 0xFFFFFFF0 [20060127] 
Non-volatile memory driver v1.2
Linux agpgart interface v0.101 (c) Dave Jones
i8042.c: No controller found.
Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing disabled
serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
RAMDISK driver initialized: 16 RAM disks of 128000K size 1024 blocksize
mice: PS/2 mouse device common for all mice

Storage error messages:

scsi2 : sata_nv
ata3: SATA link down (SStatus 0 SControl 300)
ATA: abnormal status 0x7F on port 0x8887
scsi3 : sata_nv
ata4: SATA link down (SStatus 0 SControl 300)

Network Interface Card error messages:

NET: Registered protocol family 17
audit(1187715799.512:3): audit_pid=5288 old=0 by auid=4294967295
sbl[5472]: segfault at fffffffffffffffd rip 00002ac264c72650 rsp 00007fff46527318 error 4
NET: Registered protocol family 10
lo: Disabled Privacy Extensions
IPv6 over IPv4 tunneling driver

Workaround: These are warnings in Linux dmesg about the system configuration. They do not have functional impact and can be ignored.
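When reviewing dmesg on these systems, it can help to screen out the known-harmless messages listed above so that other errors stand out. The following Python sketch is a hypothetical filter: it flags lines containing error-like keywords (an assumed keyword list) unless they match one of the patterns copied from the messages above.

```python
import re

# Patterns copied from the harmless messages listed in this note.
HARMLESS = [re.compile(p) for p in (
    r"Error attaching device data",
    r"hpet_resources: 0x[0-9a-f]+ is busy",
    r"ACPI Error \(utglobal-0125\): Unknown exception code: 0xFFFFFFF0",
    r"i8042\.c: No controller found",
    r"ata\d+: SATA link down \(SStatus 0 SControl 300\)",
    r"ATA: abnormal status 0x7F on port",
    r"sbl\[\d+\]: segfault at",
)]

# Assumed keywords that make a line worth a second look.
SUSPICIOUS = re.compile(r"error|fail|segfault|abnormal", re.IGNORECASE)


def unexpected_messages(dmesg_text):
    """Return suspicious-looking dmesg lines not covered by HARMLESS."""
    return [
        line for line in dmesg_text.splitlines()
        if SUSPICIOUS.search(line) and not any(p.search(line) for p in HARMLESS)
    ]
```

Any line that the filter still returns is not covered by this note and should be investigated separately.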

(SLES10-SP1) Fails to Start After Boot Disk is Migrated to RAID1 (6600187, 6644934)

If you install SLES 10 SP1 on your server and later add a drive to mirror the boot disk, SLES 10 SP1 fails to boot after the boot drive is migrated from a non-RAID to a RAID configuration.

This happens because the logical disk order can change during the migration. The change affects the fstab file, which contains configuration information for all the partitions and storage devices in your computer. The “Mount in /etc/fstab by” setting, chosen during the initial installation of SLES 10 SP1, causes fstab to identify the boot disk by its device ID. If the device ID changes, for example during migration to RAID1, the system no longer identifies the boot device correctly.

Workaround: Before you do the migration, edit the /etc/fstab file so that it identifies the disks by device name instead of device ID. The device name should remain the same after the migration.

If you have already done the migration and cannot boot from your server’s boot disk, try booting from a network device. If that works, you should be able to access the server’s boot disk and edit the fstab file as explained above.
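The fstab edit can be sketched in Python as follows. This hypothetical helper assumes that the first field of each affected entry is a /dev/disk/by-id/... path and takes a resolve function that maps such a path to a device name; the field layout of other lines is left untouched.

```python
BY_ID_PREFIX = "/dev/disk/by-id/"


def fstab_by_device_name(fstab_text, resolve):
    """Rewrite by-id device fields in fstab text using resolve(path) -> name."""
    out = []
    for line in fstab_text.splitlines():
        fields = line.split()
        if fields and fields[0].startswith(BY_ID_PREFIX):
            fields[0] = resolve(fields[0])
            line = "  ".join(fields)  # fstab only needs whitespace separators
        out.append(line)
    return "\n".join(out) + "\n"
```

On a running system, os.path.realpath can serve as resolve, because each by-id entry is a symlink to its device node. Run it before the migration, while the links still exist; realpath returns an unresolvable path unchanged, so check the output before replacing /etc/fstab.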


VMware Issues

The following issues apply to Sun Fire X4140, X4240, and X4440 servers running the VMware ESX operating system.

(VMware 3.5) Installation Hangs on Software 3.1 When a Keyboard is Connected to a Front USB Port (6905218)

This problem might occur if you do the following:

1. Connect a USB keyboard to the front USB port.

2. Boot the system with the VMware CD media or boot from the CD image via SP redirection.

When VMware begins to detect the USB device, it hangs and displays the following:

usb.c: registered new driver usbdevfs

usb.c: registered new driver hub

ehci_hcd 00:02.1: PCI device 10de:036d

ehci_hcd 00:02.1: irq 16, pci mem f880dc80

usb.c: new USB bus registered, assigned bus number 1

As seen in the following illustration:


FIGURE 6-2 USB Mode

Workaround: Do not use the front USB ports to connect a keyboard or mouse during VMware 3.5 installation.

After the installation is complete, you can connect the USB keyboard or mouse to the front USB ports.

Harmless Error Messages Related To Insmod Failures On ESX 3.5 U2 (6763724)

On ESX 3.5, /var/log/messages contains messages related to insmod failures. These failure messages are harmless and can be ignored.

This is a generic problem. Depending on the platform and the option cards installed in the system, the failure messages might refer to devices ranging from Ethernet cards to HBAs.

Some Devices Marked as “Unknown” (6571936, 6587973)

Several devices integral to the servers (including the ISA bridge, SMBus, USB Controller, IDE interface, PCI bridge and RAM memory) are recognized as “unknown devices” in ESX 3.0.2.

Workaround: ESX has drivers only for storage and networking devices; all other devices are ignored and marked “unknown” in the PCI list output. Ignore the message; the devices function normally.

VMware ESX Does Not Detect All Onboard NICs if PCIe Option Card Is Installed in Slots 2, 4 or 5 (6652529, 6623720)

A PCIe card installed in slot 2, 4, or 5 conflicts with onboard NICs 2 and 3 because of a problem with the BIOS-supplied MP table. This occurs with ESX 3 when it is used in non-ACPI mode.

Workaround: If onboard NICs 2 and 3 are needed with ESX 3, use PCIe slots 0, 1, and 3.

When Installing Multiple RHEA HBAs, Do Not Install One of the RHEA HBAs in Slot 1 (6573995)

On servers running ESX 3.0.2 with multiple LSI Logic RHEA Host Bus Adapter cards, using PCIe slot 1 for one of the RHEA cards might hang the system during boot.

Workaround: When using multiple RHEA HBAs in the server, install them in slots other than slot 1.
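The two slot rules in the issues above (avoid slots 2, 4, and 5 when onboard NICs 2 and 3 are needed with ESX 3, and keep multiple RHEA HBAs out of slot 1) can be checked against a planned configuration before installation. The following is a minimal Python sketch; the {slot: card name} dictionary is a hypothetical planning format, and a card is treated as an RHEA HBA if its name contains "RHEA".

```python
NIC_CONFLICT_SLOTS = {2, 4, 5}  # conflict with onboard NICs 2 and 3 under ESX 3


def esx_slot_warnings(cards, need_onboard_nics_2_3=True):
    """Check a hypothetical {slot: card_name} plan against the ESX slot rules."""
    warnings = []
    if need_onboard_nics_2_3:
        clashing = sorted(NIC_CONFLICT_SLOTS & set(cards))
        if clashing:
            warnings.append(
                f"slots {clashing} conflict with onboard NICs 2 and 3; use slots 0, 1, and 3"
            )
    rhea_slots = sorted(slot for slot, name in cards.items() if "RHEA" in name.upper())
    if len(rhea_slots) > 1 and 1 in rhea_slots:
        warnings.append("with multiple RHEA HBAs, do not install one in slot 1")
    return warnings
```

An empty list means the plan satisfies both rules as stated in these notes.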


Windows Issues

The following issues apply to servers running the Microsoft Windows Server 2003 or Windows Server 2008 operating system.

SW3.0: Windows 2008 Installation Hangs While SGXPCIESAS-R-EXT-Z Card Is Inserted In SUT (6844737)

Workaround:

Using The Nvidia SATA Controller And The Latest Nvidia Driver with Windows Server 2008, No SSD LEDs Are Active (6793985)

Workaround: Use the latest Nvidia driver from the SW3.1 Tools and Drivers CD.

Using IPMITool in Windows Environment Requires Installing Driver (6695007)

To use IPMItool locally in the Windows environment, you need to install the IPMI System Management driver. To use IPMItool over ILOM, no driver is needed.

IPMItool is a command-line utility that reads the sensor data repository (SDR) and displays sensor values, the System Event Log (SEL), and Field Replaceable Unit (FRU) inventory information. It also gets and sets LAN configuration parameters and performs chassis power control operations through the server’s service processor. It is available on the Tools and Drivers CD for your server and can also be installed from the InstallPack.exe executable file.

Once installed, IPMItool can be used in two ways:

Locally, through the IPMI System Management driver

Over ILOM, with no driver required

Requirements

To use IPMItool, ensure that you have completed the requirements specified for your Windows version:


To Install the IPMI System Management Driver (Windows Server 2003 R2)

Do the following before attempting to use the IPMItool through the Windows operating system:

1. Ensure that the Microsoft IPMI System Management driver is installed.

a. On the taskbar, click Start, and then click Run.

The Run dialog box is displayed.

b. In the Open list, type devmgmt.msc and then click OK.

The Device Manager is displayed.

c. Expand System Devices and look for "Microsoft Generic IPMI Compliant Device."

d. In Control Panel, open Add/Remove Programs.

The Add/Remove Programs dialog is displayed.

e. Click Add/Remove Windows Components.

The Windows Components Wizard dialog is displayed.

f. Highlight Management and Monitoring Tools component, and then click Details.

The Management and Monitoring Tools page is displayed.

g. Select the Hardware Management subcomponent check box.

The “3rd Party Drivers” warning dialog appears.

h. Read the warning and then click OK.

The Management and Monitoring Tools page is displayed.

i. Click OK.

The Windows Components Wizard dialog is displayed.

j. Click Next.

The Hardware Management component is installed.

2. Instantiate the IPMI System Management driver.

a. On the taskbar, click Start, and then click Run.

The Run dialog box is displayed.

b. In the Open list, type:

rundll32 ipmisetp.dll,AddTheDevice

and then click OK.

The IPMI System Management driver is instantiated.

c. To verify that the IPMI System Management driver is installed, repeat steps 1a through 1c above.

For information about using the IPMItool, refer to your Sun Integrated Lights Out Manager 2.0 User’s Guide (820-1188). For more information on standard IPMItool commands, go to:

http://ipmitool.sourceforge.net/manpage.html

Windows Server 2003 Install Fails; Setup Cannot Detect USB Floppy Drive (6553336)

This issue occurs during Windows setup. When setup reaches the point where supplemental drivers must be loaded, it prompts you to insert the floppy media. Even with the USB floppy drive connected and the media inserted, setup repeatedly fails and displays the following prompt:

... insert media and press ENTER when ready ...

Workaround: If you encounter this problem, insert the floppy diskette before BIOS POST so that the drive appears as A: or B:.

Error: Sync Flood Appears in Event Log After OS Warm Boot (6641535)

Due to a chipset silicon erratum, on a warm boot the system might report a “Sync Flood” error in the SEL and POST screen. Although sync floods at runtime usually represent a fatal hardware error, this one does not and can be ignored.

Workaround: If this error follows a warm restart of the system and did not occur at runtime, press F1 to continue booting and ignore the error. However, examine the sync flood report in the SEL to ensure that it is not caused by an uncorrectable memory error.

The SEL report looks like the following and is not followed by any further diagnostic information about the source of the sync flood.

Here is the output of the log using seldecode to translate the error:

12d | 12/09/2007 | 14:57:33 | System Boot Initiated #0x43 | Initiated by warm reset | Asserted
12e | 12/09/2007 | 14:57:33 | Processor #0x04 | Presence detected | Asserted
12f | 12/09/2007 | 14:57:33 | OEM #0x12 | | Asserted
130 | 12/09/2007 | 14:57:33 | System Event #0x12 | Undetermined system hardware failure | Asserted
131 | OEM record e0 | 00000000000000000000000000
132 | OEM record e0 | 00000004000000000000b00006
133 | OEM record e0 | 00000048000000000011112022
134 | OEM record e0 | 00000058000000000000030000
135 | OEM record e0 | 00010044000000000000000000
136 | OEM record e0 | 00010048000000000000ff3efa
137 | OEM record e0 | 0018304c00f200002000020c0f
138 | OEM record e0 | 80000000000000000000000000
139 | OEM record e0 | 80000004000000000040b00006
13a | OEM record e0 | 80000048000000000011112322
13b | OEM record e0 | 80000058000000000000030000
13c | OEM record e0 | 80010044000000000000000000
13d | OEM record e0 | 80010048000000000000ff3efa
13e | 12/09/2007 | 14:57:57 | System Firmware Progress #0x01 | Memory initialization | Asserted
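The check described in the workaround can be partly automated on decoded SEL text such as the listing above. The following Python sketch is hypothetical: it treats the "Undetermined system hardware failure" System Event record as the sync flood entry, as in the sample, and assumes that an uncorrectable memory error is logged by a sensor whose name begins with "Memory" with "Uncorrectable" in the event text. Verify the actual SEL format on your system before relying on it.

```python
def assess_sync_flood(sel_text):
    """Summarize decoded SEL text for the warm-reset sync flood case."""
    warm_reset = hardware_failure = memory_ue = False
    for line in sel_text.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) < 5:
            continue  # "OEM record" rows carry raw bytes only
        sensor, event = fields[3], fields[4]
        if sensor.startswith("System Boot Initiated") and "warm reset" in event:
            warm_reset = True
        elif sensor.startswith("System Event") and "hardware failure" in event:
            hardware_failure = True
        elif sensor.startswith("Memory") and "Uncorrectable" in event:
            memory_ue = True
    return {
        "warm_reset": warm_reset,
        "hardware_failure": hardware_failure,
        "memory_ue": memory_ue,
        # Matches the pattern this note describes as safe to ignore.
        "ignorable": warm_reset and hardware_failure and not memory_ue,
    }
```

The helper cannot tell whether the error occurred at runtime; it only checks that the record follows a warm reset and that no uncorrectable memory error was logged.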

Booting WinPE Causes Windows Blue Screen (6660183)

On rare occasions, when booting Windows PE 1.5 after restarting from Windows 2003, WinPE might display a blue screen with the data "0xA5, 2, 8A767A8, E1169A0, 8A750710".

Workaround: Although WinPE is not a supported operating system, powering down the server or momentarily removing AC power clears the problem.

Standby Option Missing After Windows 2003 Installation (6655011)

After installing Windows 2003 on a Sun Fire X4140 server, standby mode is not available. To benefit from the energy savings of the AMD PowerNow! feature, you can select the Server Balanced Power option in the Power Options Control Panel applet. This option is selected automatically when you run InstallPack.exe from the Tools and Drivers CD.

Workaround: To save energy, select the Server Balanced Power option as described above. To enable standby mode, download and install the ASPEED display driver.

System Error Event ID 1003 Reported In Event Viewer (6658446)

On a Sun Fire X4240 server running Windows 2003 (32-bit version), a system error (Event ID 1003) might be logged in the event log after prolonged, heavy storage stress on an attached hardware RAID.

Workaround: If you encounter this issue, reduce the disk stress load on the SAS controller and the designated system disk.

On Sun Fire X4140, X4240, and X4440 Systems With Windows Server 2008 32 Bit, OS Mapping Of Internal NIC Ports Is Different (6733863)

When running the Windows Server 2008 OS, onboard NIC ports do not map in the same order as the label on the NIC ports.

The map displayed for Windows 2008 onboard NIC ports is: