CHAPTER 2
Software Issues
This chapter describes software issues related to the Sun Fire X4100 and Sun Fire X4200 servers and includes these topics:
Note - If an issue statement does not specify a particular platform, the issue applies to all platforms.
This chapter uses the following Linux-related acronyms:
RHEL versions are usually used with a version number (for example, RHEL4) and an update number (for example, U3).
SLES versions are usually used with a version number (for example, SLES9) and a software patch number (for example, SLES9 SP3).
On systems that have two hard-disk drives, the drives in Slot 0 and Slot 1 are mapped to the OS as disk 2 and disk 3. Therefore, drives that are configured in Slot 0 or Slot 1 in systems with four hard-disk drives, and then moved into a two-disk system, might not operate correctly.
Certain patches for host bus adapters (HBAs), such as the patch for the Sun StorEdge Entry-Level Fibre Channel host bus adapter (QLA210), do not work on systems running the Solaris 10 x86 OS unless you first install a Solaris OS patch cluster and then reboot the system.
To install the patch cluster and the QLA210 patch:
1. Install the Solaris 10 3/05 operating system (if it is not already installed).
2. Install the recommended patch cluster.
For instructions on installing the patch cluster, see:
http://patches.sun.com/clusters/10_x86_Recommended.README
3. Install the recommended patch for the HBA.
For example, to install the QLA210 patch (119131-xx):
http://sunsolve.sun.com/pub-cgi/show.pl?target=patchpage
b. Enter 119131 in the PatchFinder text box.
The raidctl command enables you to manage the RAID controllers from the command-line interface. However, because the raidctl command is not supported on Solaris 10 3/05, using the command might cause the system to panic.
A Solaris 10 3/05 patch (119851-13) that resolves this issue is available from the SunSolve download site.
If you do not have the latest Solaris 10 3/05 patch, use the MPT SCSI BIOS to create and manage the RAID volumes.
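Before running raidctl, it can help to confirm that patch 119851 is installed at revision 13 or later. The following is a minimal sketch, not a Sun-supplied procedure. It assumes the usual `showrev -p` output, in which each installed patch appears on a line beginning `Patch: <id>-<rev>`, and a POSIX shell such as bash or ksh. The function name `has_patch` is hypothetical.

```shell
# has_patch ID MINREV: read `showrev -p`-style text on stdin and succeed
# if patch ID is listed at revision MINREV or later.
# Hypothetical helper; assumes lines of the form "Patch: 119851-13 ...".
has_patch() {
    id="$1"
    min="$2"
    # Collect the revision numbers recorded for the requested patch ID.
    revs=$(sed -n "s/^Patch: $id-\([0-9][0-9]*\).*/\1/p")
    for rev in $revs; do
        [ "$rev" -ge "$min" ] && return 0
    done
    return 1
}

# On a Solaris 10 3/05 system (not run here):
#   showrev -p | has_patch 119851 13 && echo "raidctl patch present"
```

If the check fails, the MPT SCSI BIOS remains the way to create and manage RAID volumes, as described above.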
If the input device and output device are set to the serial port (ttya), the following message might appear in the console during bootup:
svc:/system/power:default: Method or service exit timed out. Killing contract 17.
This message does not indicate a problem.
During Solaris 10 OS installation, Solaris might report that it cannot find the second CD even though the second CD is inserted.
This problem does not occur if you perform a net install. Solaris is then able to mount and read the CD images. You can also work around this problem by installing from DVD media rather than multiple CDs.
Solaris AMD x64 OS support includes a boot-time check for the presence of a BIOS workaround for the AMD Opteron Erratum 131. If the Solaris OS detects that the workaround for Erratum 131 is needed but it is not yet implemented, Solaris logs and displays the following warning message:
WARNING: BIOS microcode patch for AMD Athlon(tm) 64/Opteron(tm) processor erratum 131 was not detected; updating your system’s BIOS to a version containing this microcode patch is HIGHLY recommended or erroneous system operation may occur.
The Sun Fire X4100 and Sun Fire X4200 BIOS implements a superset workaround that includes the workaround required for Erratum 131, so this warning message can be safely ignored.
The Sun Installation Assistant does not allow SELinux configuration during the installation of RHEL4. The GUI for the SELinux option is disabled during the installation of RHEL4 U1 with the Sun Installation Assistant CD.
To configure SELinux, run system-config-securitylevel after the installation.
RHEL runs a hardware discoverer named Kudzu. After installing RHEL3 or RHEL4 with the Sun Installation Assistant, Kudzu displays messages indicating that the Ethernet drivers need to be removed and added again.
The messages Kudzu displays are incorrect. The Ethernet drivers do not need to be changed. Click Ignore when you are prompted to change the hardware configuration.
When the Sun Installation Assistant CD is used to install Red Hat Linux, the ext3 file system might report incorrect disk space utilization and file-system-full errors. This happens because the utility on the CD does not unmount the file system correctly.
The problem has been fixed in the new version of the Sun Installation Assistant CD (version 1.1.6 or later) that is available on the Sun Download Center web site. Go to the following URL and click on Downloads.
http://www.sun.com/servers/entry/x4100/index.html
If you use the old version of the CD and you see these errors, correct the problem by entering the tune2fs command at a command line, and then reboot the server.
After an upgrade to BIOS 0ABGA042, RHEL3U9 (32-bit) reverses the order in which it maps physical to logical ports. This can interfere with network operations, including PXE installation of the OS, when not all Ethernet ports are used. This problem has not been observed in other versions of Red Hat Enterprise Linux, including the 64-bit version of RHEL3U9.
Set the following kernel parameter:
pci=nosort
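For an installed system, one way to make the parameter persistent is to append it to the kernel lines of the boot loader configuration. The sketch below assumes RHEL3 with GRUB and its configuration in /boot/grub/grub.conf; the source does not say where to set the parameter, so treat the path and the function name `add_pci_nosort` as assumptions. For a PXE installation, the parameter would instead go on the kernel command line in the PXE configuration.

```shell
# add_pci_nosort FILE: print FILE with pci=nosort appended to every GRUB
# "kernel" line that does not already contain it.
# Hypothetical helper; it writes to stdout and never modifies FILE.
add_pci_nosort() {
    sed '/^[[:space:]]*kernel[[:space:]]/{
        /pci=nosort/!s/[[:space:]]*$/ pci=nosort/
    }' "$1"
}

# Review the result before installing it (paths are assumptions):
#   cp /boot/grub/grub.conf /boot/grub/grub.conf_SAVED
#   add_pci_nosort /boot/grub/grub.conf_SAVED > /boot/grub/grub.conf
```

Running the function twice leaves the file unchanged the second time, because lines that already contain pci=nosort are skipped.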
The hard-disk drive display omits a disk listing during installation if there are many SCSI disks attached to a system. Not all disks are available during the installation.
In addition, the disk-drive display lists the wrong drive type after the installation.
None. However, to display the omitted hard-disk drive, use one of the following instructions:
If a RAID array is attached to the system using a Sun StorEdge PCI/PCI-X Single Ultra320 SCSI host bus adapter (Ultra320 SCSI), you might see the following if you enter the command, fdisk -l, depending on which Linux OS you are using:
Hard-disk drives for the Pyramid and Summit option cards are not displayed during installation or after the installation is complete on Red Hat Linux.
Exceptions: This behavior was not observed in RHEL4 U3 with a 64-bit processor.
Add a device keyword to the installer kickstart file:
device <scsi/eth> xyz_driver [options]
To display the omitted cards, enter the following command in a terminal window:
The qla2400 refers to the HBA driver module that is included with this version of Red Hat Linux software.
After you choose and perform one of the workarounds, reboot the system and run the following command to confirm that the driver is loaded:
fdisk -l
Some Linux OSs, such as RHEL3, do not support the Advanced Configuration and Power Interface (ACPI), which allows a graceful shutdown. On systems running non-ACPI Linux operating systems, only a forceful shutdown is available.
By design, external drives are never loaded automatically on RHEL3 U8.
There are two possible workarounds:
1. Use either of the following to manually load the driver:
prompt> insmod <path_to_driver>/xyz_driver
2. Save a copy of the original initrd file:
prompt> cp initrd-<kernel-version>.img initrd-<kernel-version>.img_SAVED
prompt> mkinitrd -f initrd-<kernel-version>.img <kernel-version>
When the system is rebooted, the driver is loaded automatically.
device <scsi/eth> xyz_driver [options]
After you choose and perform one of the workarounds, reboot the system and run the following command to confirm that the driver is loaded:
fdisk -l
The RHEL3, RHEL4, and SLES9 CDs that you can purchase from Sun are the base (initial-release) versions of those operating systems (OSs) and are not the latest updated versions of those OSs. Although Sun will support customers to help them install these base versions from the shipped media, customers are expected to immediately upgrade to RHEL3 U6, RHEL4 U3, and SLES9 SP2 to get full Sun support for servers running those OSs.
1. Go to Sun’s download site for these platforms and download the latest Sun Installation Assistant software. The latest version, 1.1.6, is designed to support installation of the base versions of the Linux OSs.
2. Burn the new SIA software to CD.
3. Use the new SIA CD you burned to install the version of the OS that you received from Sun.
Refer to the Sun Fire X4100 and Sun Fire X4200 servers Operating System Installation Guide for detailed instructions.
4. Immediately download the latest update or patches from the Linux manufacturer’s web site and install them.
Refer to the Sun Fire X4100 and Sun Fire X4200 servers Operating System Installation Guide for detailed instructions.
When installing the updated QLogic drivers for the QLA210 or QLA2342 option cards, you must manually unload the current drivers or the installation will fail. The modprobe -rv command does not work with these drivers.
1. To check for existing QLA drivers, enter the following command:
The output should look like this:
qla6322 129536 0
qla2xxx_conf 310536 1
qla2xxx 226960 1 qla6322
scsi_transport_fc 16384 1 qla2xxx
scsi_mod 140800 8 usb_storage,st,sr_mod,sg,qla2xxx,scsi_transport_fc,mptscsih,sd_mod
2. Unload the drivers as shown in the following example:
# rmmod qla6322
# rmmod qla2xxx
3. Load the updated QLA drivers.
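To see which QLA modules are loaded and which can be unloaded first, the module listing can be filtered by name. The sketch below is a hypothetical helper, not a command from the QLogic documentation. It reads lsmod-style text and prints each module whose name starts with "qla" together with its use count; a module with a use count of 0 can be removed with rmmod before the modules it depends on.

```shell
# qla_modules: read lsmod-style text on stdin and print "name use_count"
# for every module whose name starts with "qla".
# Hypothetical helper; it only reports and does not unload anything.
qla_modules() {
    while read -r name size used rest; do
        case "$name" in
            qla*) echo "$name $used" ;;
        esac
    done
}

# Typical use on the server (not run here):
#   lsmod | qla_modules
```

With the sample listing shown in step 1, qla6322 is reported with a use count of 0, which matches the example in step 2, where qla6322 is unloaded before qla2xxx.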
Note - We recommend that RHEL3 users install the most recent OS update on the server to alleviate this issue.
The BIOS Advanced menu (CPU Configuration menu), in the BIOS Setup utility, contains an option named “Speculative TLB Reload.” By default, this setting is enabled, which allows TLB reload.
With this default setting, you might see errors similar to the following on systems running any 64-bit version of RHEL or SLES with Service Pack 1.
Northbridge status a60000010005001b
GART error 11
Lost an northbridge error
NB status: unrecoverable
NB error address 0000000037ff07f8
Error uncorrected
To avoid these errors, disable TLB reloading:
1. Reboot the server and press F2 to enter the BIOS Setup utility.
2. Go to the Advanced -> CPU Configuration menu.
3. Use the arrow keys to highlight the Speculative TLB Reload option, and change its setting to Disabled.
4. Save your changes and exit the utility.
The AMD PowerNow! feature is disabled in the BIOS by default. Before enabling it, verify that your operating system and applications support the PowerNow! feature.
The PowerNow! feature changes CPU clock rates. A loss of timer ticks has been observed while running recent Linux SMP kernels when PowerNow! is enabled. This loss of timer ticks might result in timing errors in the kernel and in user applications. Symptoms might include timers that prematurely time out and the time of day clock appearing to behave erratically.
Disable the PowerNow! feature by using the BIOS Setup utility. The menu path to the feature’s screen is Main -> Advanced -> AMD PowerNow Configuration.
RHEL3 displays many I/O errors when a USB device is being initialized. The USB mass storage driver uses the SCSI subsystem to access the device. When a USB mass storage device is attached, the driver attempts to identify it as a SCSI device. The I/O errors displayed are a result of this initialization probe. The I/O errors can be ignored, and the USB device should work properly once it is initialized. This problem and its workaround are documented at:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=156831.
When two dual core processors are installed on a Sun Fire X4200 server, the RHEL3 kernel might report four of the hyperthreaded CPUs with the same physical ID of 0. Instead, the IDs should be 0 and 1 for each CPU.
If a USB keyboard is connected to either the front or back USB port, a system running RHEL3 U5 (64-bit) always shows the following error message in the dmesg output after a reboot:
initialize_kbd: Keyboard reset failed, no ACK
This message does not indicate a problem.
If you have Emulex and QLogic HBA cards installed in your server, you might not be able to access external storage during RHEL3 U8 installation because the installer software does not load the appropriate kernel modules automatically. You therefore cannot perform setup and initialization of any external storage devices that are connected to those HBA cards during RHEL3 U8 installation (for example, disk formatting or RAID set up).
Perform the required hard-disk drive configuration manually after the operating system is installed on the local disks. If you use the KickStart automated installation, you can force the installer to load a specific driver with the device and deviceprobe commands. Refer to the Red Hat KickStart documentation for instructions.
The Sun Fire X4100 server might spontaneously reboot when running network traffic over the Kirkwood interface, in a Linux environment. This problem has only been observed when the MTU is set to 9K.
On systems running SLES9, incorrect CPU speeds might be reported in /proc/cpuinfo when the PowerNow! option is enabled. The maximum speed may not be reported.
Disable the PowerNow! feature by using the BIOS Setup utility. The menu path to the feature’s screen is Main -> Advanced -> AMD PowerNow Configuration.
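To see the speeds that the kernel currently reports, the per-processor values can be extracted from /proc/cpuinfo. This is a minimal sketch, assuming the usual x86 line format `cpu MHz : <value>`; the function name `cpu_mhz` is hypothetical.

```shell
# cpu_mhz [FILE]: print the "cpu MHz" value reported for each processor,
# one per line. FILE defaults to /proc/cpuinfo.
# Hypothetical helper for comparing reported speeds before and after
# changing the PowerNow! setting.
cpu_mhz() {
    sed -n 's/^cpu MHz[[:space:]]*:[[:space:]]*//p' "${1:-/proc/cpuinfo}"
}

# Typical use on the server (not run here):
#   cpu_mhz
```

Comparing the output against the rated speed of the installed processors shows whether the maximum speed is being reported.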
SLES9 SP1 multipath driver (mdadm) does not work after a reboot of the host.
On systems running SLES9, if a host bus adapter (HBA) card is plugged in to Slot 0, you might not be able to boot the system. This is because SLES9 enumerates IDE and SCSI devices in scan order, and the BIOS scans PCI devices in ascending order. The scanning priority is:
If there is only one drive in the system, it is enumerated as /dev/sda. If an external device is later connected to an HBA card in Slot 0, the device will be enumerated as /dev/sda and the internal device will be enumerated as /dev/sdb. However, the SLES9 boot device points to /dev/sda, which is an external device without the OS, and the system cannot boot.
The problem does not occur if the HBA card is plugged in to Slots 1-4, since these slots are scanned later than the on-board LSI controller. This problem is not specific to the server or the HBA card.
Plug the supported HBA card in to Slots 1-4, and then reboot the system. Also, follow these general guidelines:
Sun Fire X4200 servers running RHEL4 U3 with BIOS 31, 34, or 36, an SMP kernel, and a single dual-core CPU might fall into an infinite reboot loop.
Use RHEL4 U1 instead of RHEL4 U3. A fix is planned for a future release.
When installing RHEL3 U7 32-bit on Sun Fire X4100 or Sun Fire X4200 servers that have a PCI card installed in any slot other than PCI 0, installation might hang. This problem is not observed when installing RHEL3 U7 64-bit.
See FIGURE 2-1 or FIGURE 2-2 for the location of the PCI slots.
Use RHEL3 U8 32-bit if you have a PCI card installed in any PCI slot other than PCI 0. (This limitation was fixed in Update 8.)
FIGURE 2-1 Sun Fire X4100 Designation and Speeds of PCI Slots
FIGURE 2-2 Sun Fire X4200 Designation and Speeds of PCI Slots
If the system is booted with no monitor connected, the VGA port remains inoperable until the system is rebooted. The console is still accessible via JavaRConsole.
Obtain the latest driver from ATI. Version strings are 5.10.2600.6024 (32-bit) and 6.14.10.6025 (64-bit).
The bootup time for Windows Server 2003 could be significant (20 minutes or so) if there is a defective disk in the RAID array. Both Windows and firmware retries contribute to the time delay. The defective disk might be recognized by the controller under SAS Topology, but not under RAID Properties.
Windows Server 2003 requires that you use the first storage or the existing partition for installation. You cannot install Windows Server 2003 onto an on-board LSI RAID array if:
The AMD PowerNow! feature is disabled in the BIOS by default. Before enabling it, verify that your operating system and applications support the PowerNow! feature.
If you enable PowerNow! in a Windows Server 2003 environment, you might see a loss of timer ticks and a decrease in CPU voltage, resulting in alert and power failure LEDs illuminating.
Disable the PowerNow! feature by using the BIOS Setup utility. The menu path to the feature’s screen is Main -> Advanced -> AMD PowerNow Configuration.
The mkfloppy.exe utility that is included in FloppyPack.zip can be run on any Windows system; it is used to create the Mass Storage Driver floppy that is used during Windows Server 2003 installation.
However, if there is more than one floppy drive present in the system (including USB-attached floppy drives), mkfloppy.exe does not select the correct floppy drive.
Ensure that the system has only one floppy drive present when using mkfloppy.exe.
LSI MyStorage Backup/Restore functionality causes optical drives to become unavailable. LSI controller firmware will need to be reloaded.
Do not use the Backup/Restore functionality. The version of the LSI MyStorage application on the Tools and Drivers CD has the Backup/Restore functionality disabled.
Hibernation is disabled by default in the InstallPack.exe for Sun Fire X4100 and Sun Fire X4200 servers, but it can be enabled by the user with the Windows Control Panel Power Options settings.
If a server with BIOS 34 enters the S4 Hibernation state, and it has less than 4 GB of available memory, the server might fail to resume from Hibernation. It will instead attempt to reboot, but hang with a blue-screen crash.
Do not enable Hibernation if your server has less than 4 GB available memory.
While installing ESX Server 2.5.2, 2.5.3, or 2.5.4 in a boot from SAN configuration using an optical drive, the installation may stop after displaying “running /sbin/loader”.
When booting from the CD, watch for the “boot:” prompt at the bottom of the screen. When it appears, type
bootfromsan nousb
and press the enter key. The system may also hang when booting from the SAN. Again, watch for the “boot:” prompt; this time, type
nousb
and press the enter key. To have this workaround happen automatically, edit /etc/lilo.conf. Add the keyword nousb to the beginning of every append= line in the file. If there is no append= line, add one:
append="nousb"
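The lilo.conf edit described above can be sketched as a small filter. The function name `add_nousb` is hypothetical, and two assumptions apply: append= values are written in double quotes, and a new append= line placed at the top of the file acts as a global option for all images. The function prints the edited file to stdout so that the result can be reviewed first.

```shell
# add_nousb FILE: print FILE with "nousb" at the start of every quoted
# append= value. If FILE has no append= line, print one before the rest.
# Hypothetical helper; it never modifies FILE itself.
add_nousb() {
    conf="$1"
    if grep -q '^[[:space:]]*append=' "$conf"; then
        # Insert nousb right after the opening quote, skipping lines
        # that already contain it.
        sed '/nousb/!s/^\([[:space:]]*append=\)"/\1"nousb /' "$conf"
    else
        printf 'append="nousb"\n'
        cat "$conf"
    fi
}

# Review the result before installing it (not run here):
#   cp /etc/lilo.conf /etc/lilo.conf_SAVED
#   add_nousb /etc/lilo.conf_SAVED > /etc/lilo.conf
```

On systems that boot with LILO, a changed lilo.conf normally takes effect only after the lilo command is run again. The source does not mention this step, so verify it against the ESX Server documentation.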
When installing ESX Server 2.5.4, the keyboard and mouse may become inoperative.
Same as for ESX Installation Stops (6549480).
SunVTS 6.2 Graphical User Interface (GUI), shipped on the Bootable Diagnostics CD, Version 2.1f, has a Meter button. This Meter button does not work because it requires the Solaris stdperformeter utility, which is not available for bootable diagnostics.
If you boot from the SunVTS Bootable Diagnostics CD .iso image, version 2.1f, through a virtual CD-ROM or on some CD-ROM models, you might see the following messages. These messages are harmless and can be ignored:
Sep 7 03:49:11 scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci1022,7460@6/pci1022,7464@0,1/storage@1/disk@0,0 (sd0):
Sep 7 03:49:11 Error for Command: read(10) Error Level: Fatal
Sep 7 03:49:11 scsi: [ID 107833 kern.notice] Requested Block: 109118 Error Block: 109118
Sep 7 03:49:11 scsi: [ID 107833 kern.notice] Vendor: AMI Serial Number:
Sep 7 03:49:11 scsi: [ID 107833 kern.notice] Sense Key: Media Error
Sep 7 03:49:11 scsi: [ID 107833 kern.notice] ASC: 0x11 (unrecovered read error), ASCQ: 0x0, FRU: 0x0
A memory test under exclusive mode in SunVTS (version 6.1 and earlier), ramtest, exercises a corner case that does not follow AMD programming guidelines. Therefore, on early Sun Fire X4100 or Sun Fire X4200 servers, ramtest might cause the system to reboot after an extended test run of more than seven hours. Sun Fire X4100 and Sun Fire X4200 systems running software that follows AMD programming guidelines, as most compiler-generated code does, function properly.
This problem is fixed in SunVTS version 6.1sp1 and later. To get the latest version of SunVTS, download it from this URL:
http://www.sun.com/oem/products/vts/
If you have SunVTS version 6.1 or earlier, SunVTS pmemtest and vmemtest are suitable memory diagnostics for extended test runs. When performing test runs of more than seven hours, use pmemtest or vmemtest, rather than ramtest.
Copyright © 2007, Sun Microsystems, Inc. All Rights Reserved.