CHAPTER 2
Software Issues
This chapter describes software issues related to the Sun Fire X4600 server. The numbers given in the section titles are internal tracking numbers for change requests related to the issues.
This chapter includes the following topics:
When an X7280A Gigabit Ethernet controller is used, Solaris Fault Management may report “The Solaris Fault Manager received an event from a component to which no automated diagnosis software is currently subscribed.” This message is due to a software bug and can be safely ignored.
During Solaris 10 installation from CD media, Solaris reports that it cannot find the second CD, even though the second CD is inserted.
This problem does not occur if you perform a net install. Solaris is then able to mount and read the CD images. You can also work around this problem by installing from DVD media rather than multiple CDs.
Solaris AMD x64 support includes a boot-time check for the presence of a BIOS workaround for the AMD Opteron Erratum 131. If Solaris detects that the workaround for Erratum 131 is needed but it is not yet implemented, Solaris logs and displays the following warning message:
WARNING: BIOS microcode patch for AMD Athlon(tm) 64/Opteron(tm) processor erratum 131 was not detected; updating your system’s BIOS to a version containing this microcode patch is HIGHLY recommended or erroneous system operation may occur.
This warning message can be safely ignored. The BIOS implements a superset workaround that includes the workaround required for Erratum 131.
A change made in the system BIOS after Software 1.3a causes Red Hat Enterprise Linux 4.4 to panic.
BIOS 51, included with Software 1.5, contains a workaround for this problem. After flashing new system firmware, disable the following options in the system BIOS:
A system with a Sun StorageTek PCI-E Dual Channel Ultra320 SCSI HBA (SG-XPCIE2SCSIU320Z) installed in slot 5, 6, or 7 may be unable to boot SUSE Linux Enterprise Server 10 from internal disk. The HBA should not interfere with the boot, because it is in a slot that is scanned after the embedded HBA that controls the internal disks.
This problem was only observed with SLES10, not with SLES10 SP1.
Edit /etc/fstab and /boot/grub/menu.lst to indicate the internal disk as the boot device. You will need to count the LUNs for external storage in order to calculate the internal disk ID.
Drivers for Red Hat Enterprise Linux 5 are not available for the following Sun option cards:
CPUs with PowerNow! enabled should run at minimum speed when idle. This does not always occur on Linux systems because of problems with the cpuspeed daemon.
service cpuspeed restart
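After restarting the daemon, it can be useful to confirm that idle processors have actually dropped to their minimum speed. The following is a minimal sketch, assuming the standard x86 Linux /proc/cpuinfo layout with "processor" and "cpu MHz" fields. The function name cpu_mhz is hypothetical and is not part of any Sun or Red Hat tool.

```shell
# Print one "processor MHz" pair per logical CPU from a cpuinfo-style file.
# Defaults to /proc/cpuinfo; pass another path to inspect a saved copy.
# Assumption: fields look like "processor : 0" and "cpu MHz : 1000.000".
cpu_mhz() {
    awk -F': *' '
        /^processor/ { cpu = $2 }
        /^cpu MHz/   { print cpu, $2 }
    ' "${1:-/proc/cpuinfo}"
}
```

Run cpu_mhz with no arguments on an idle system and compare the reported values against the minimum frequency of the installed processors.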
When running under Linux, MegaRAID Storage Manager (MSM) might not be able to start if it is run after dhclient.
The following error might appear when X Windows starts up on Red Hat Linux 4 U3:
mtrr: type mismatch for fd000000,800000 old: write-back new: write-combining
None. You can safely ignore this message. System functionality is not affected.
The Sun Fire X4600 server does not support PCI/PCI-X/PCI-E Hotplug or Hotswap capabilities. The Linux kernel attempts to locate this support in the firmware. The probe fails with the following messages:
Evaluate _OSC Set fails. Status = 0x0005
Evaluate _OSC Set fails. Status = 0x0005
pciehp: Both _OSC and OSHP methods do not exist
None. These messages from the kernel can safely be ignored.
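When reviewing kernel logs, these probe messages can be filtered out so that other messages stand out. The following is a minimal sketch that matches only the message text quoted above. The function name filter_hotplug_noise is hypothetical.

```shell
# Drop the hot-plug probe messages quoted above; pass everything else through.
# Reads kernel log text on standard input (for example, the output of dmesg).
filter_hotplug_noise() {
    grep -v -F \
        -e 'Evaluate _OSC Set fails' \
        -e 'pciehp: Both _OSC and OSHP methods do not exist'
}
```

For example: dmesg | filter_hotplug_noise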
During the boot process of RHEL4 U2 on the Sun Fire X4600 server, the following error message might be displayed:
hda: packet command error: status=0x51 { DriveReady SeekComplete Error }
hda: packet command error: error=0x50
ide: failed opcode was 100
This error is a known problem with how the IDE driver handles the trayless CD/DVD-ROM drive in the Sun Fire X4600 server. The driver attempts to close the CD/DVD-ROM drive tray. Because this drive does not have a tray, an error is reported.
None. It is safe to ignore this error.
On servers running RHEL4 U2, RHEL4 U3, or SLES9-SP3, when booting into the graphical user interface, the dmesg log might show the following error message more than once:
drivers/usb/input/hid-input.c: event field not found
During X initialization, some devices can get out of sync and some EV_REP events can get incorrectly interpreted as input events. This is caused by a bug in the HID driver. This message can be safely ignored.
RHEL3_U9 does not have reliable support for USB 2.0. This makes it difficult to install the OS using an optical drive that defaults to USB 2.0.
Change the BIOS settings so that all connections use USB 1.1.
The nVidia USB controller on the Sun Fire X4600 must have USB 2.0 structures mapped below 2 GB of memory. When more than 2 GB of memory is used, the behavior is undefined. For USB 2.0 to work properly on RHEL4 U3, a future patch will be required. Until then, only USB 1.1 is supported.
Set the BIOS USB Controller Support option to USB 1.1 only:
1. Enter the BIOS Setup utility by pressing the F2 key while the system is booting up and performing the power-on self-test (POST).
2. On the BIOS Main Menu screen, select the Advanced tab to open the Advanced Menu screen.
3. On the Advanced Menu screen, choose USB Configuration.
4. On the USB Configuration screen, change USB Controller Support to USB1.1.
5. Press and release the right arrow key until the Exit menu screen is displayed.
6. Follow the instructions on the Exit menu screen to save your changes and exit the Setup utility.
When you install add-in Ethernet cards to the Sun Fire X4600 server in PCI slots 0-4, Red Hat Linux scans them first when assigning device names. If the Red Hat Linux OS was installed before installing the add-in Ethernet card, the new card might be reported as devXXXX (where XXXX is a number).
3. Remove the ifcfg-eth files from the sysconfig directory:
# rm -f /etc/sysconfig/network-scripts/ifcfg-eth*
# rm -f /etc/sysconfig/networking/devices/ifcfg-eth*
# rm -f /etc/sysconfig/networking/profiles/default/ifcfg-eth*
4. Edit the modprobe.conf file and look for ethX references. Remove any lines that start with alias eth or alias dev.
6. Configure the network device on next boot with Kudzu.
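The modprobe.conf edit in step 4 can be previewed before the file is changed. The following is a minimal sketch, assuming each alias occupies one line of the form "alias ethN driver" or "alias devN driver". The function name strip_net_aliases is hypothetical.

```shell
# Print a modprobe.conf-style file without its "alias ethN ..." and
# "alias devN ..." lines. The input file itself is left unchanged.
strip_net_aliases() {
    grep -E -v '^[[:space:]]*alias[[:space:]]+(eth|dev)[0-9]*[[:space:]]' "$1"
}
```

Review the output of strip_net_aliases /etc/modprobe.conf, back up the original file, and only then replace it with the filtered version.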
The Non-Maskable Interrupt (NMI) Watchdog in RHEL4 is a mechanism used by software and hardware developers to detect system lockups during development. The NMI Watchdog periodically checks the CPU status to determine if a program is holding the CPU in an interrupted state for an extended period of time.
On servers running BIOS 38, the SMP kernel in RHEL4 crashes during boot when the NMI watchdog is enabled. If the watchdog timer is disabled, the server running RHEL4 boots with no problems.
Disable the watchdog timer on RHEL4 by performing the following steps:
1. Log in as superuser (root).
2. Edit the /boot/grub/menu.lst file.
3. At the end of each line that begins with kernel, append this text:
nmi_watchdog=0
4. Save the changes to the file.
The message file and dmesg log file might show messages similar to the following:
Warning many lost ticks
Your time source seems to be unstable or some driver is hogging interrupts.
This message is caused by the contention between different IRQ handlers, but there is no negative impact to the system.
During SLES9 SP3 boot up, the following message is displayed:
ACPI-0201: *** Error: Return object type is incorrect [SB_.LATA._CRS] (Node 00000107fffdc180), AE_TYPE
This message can be safely ignored.
During boot up, the SLES9 SP3 kernel prints the following message multiple times:
Attached scsi removable disk sdb at scsi2, channel 0, id 0, lun 0
Attached scsi generic sg2 at scsi2, channel 0, id 0, lun 0, type 0
Vendor: AMI Model: Virtual Floppy Rev: 1.00
Type: Direct-Access ANSI SCSI revision: 02
Each multiple message displays a different drive letter. In the example shown above, the drive letter is sdb.
To access a floppy drive, use the drive letter from the first message and ignore the subsequent messages.
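The drive letter from the first message can be pulled out of the kernel log rather than read by eye. The following is a minimal sketch that relies only on the message format shown above. It assumes the first removable disk reported is the virtual floppy, which might not hold if other removable devices are attached. The function name first_removable_disk is hypothetical.

```shell
# Print the device name from the first "Attached scsi removable disk" message
# found in kernel log text on standard input.
first_removable_disk() {
    sed -n 's/.*Attached scsi removable disk \([a-z][a-z]*\) at .*/\1/p' | head -n 1
}
```

For example: dmesg | first_removable_disk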
During SLES9 SP3 system boot up the following message is displayed for all types of processors:
IA-32 Microcode Update Driver: v1.13 <tigran@veritas.com>
microcode: CPU1 not a capable Intel processor
microcode: CPU0 not a capable Intel processor
The message is repeated for all the processors on the system. You can safely ignore this message.
If you choose to install SLES9 SP3 from a CD using the graphical mode, you receive a text message informing you that there is less than the required 96 MB of memory available for installation in this mode.
Switch to the text mode to install the product.
On SLES9 SP3 systems, when you enter the cdrecord -scanbus command, you receive the following warning message:
pg: module not supported by Novell, setting U taint flag.
pg: pg version 1.02, major 97
pga: Autoprobe failed
pg: No ATAPI device detected
When installing SLES9 SP3 using YaST, the hard disk preparation operation might return an error message that says:
Error: Could not set up swap partition /dev/sda1
Click OK and the installation will finish with no problems. You can then set up your desired swap partition manually, as described in the following steps.
1. After SLES9 SP3 has finished installation and the server has booted, log in as the root user.
2. Issue the following commands in a terminal window:
# mkswap <swap partition space>
# swapon <swap partition space>
3. In the /etc/fstab file, make an entry for the swap partition (if it is not already present) with the defaults option. It should look like the following:
/dev/sdj5 swap swap defaults 0 0
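Step 3 can be scripted so that the entry is added only when it is missing. The following is a minimal sketch. The function names fstab_has_swap and swap_fstab_entry are hypothetical, and /dev/sdj5 is only the example partition from the line above. Substitute the partition used with mkswap and swapon.

```shell
# Succeed if an fstab-style file already has a non-comment entry whose
# file system type (third field) is "swap".
fstab_has_swap() {
    awk '$1 !~ /^#/ && $3 == "swap" { found = 1 } END { exit !found }' "$1"
}

# Print the fstab line for the given swap partition, in the format shown above.
swap_fstab_entry() {
    printf '%s swap swap defaults 0 0\n' "$1"
}
```

As root, the two can be combined: fstab_has_swap /etc/fstab || swap_fstab_entry /dev/sdj5 >> /etc/fstab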
The driver for the Sun 10-GbE, 133-MHz PCI-X (X5544A-4) cannot be installed under RHEL4U5 or SLES10 SP1.
The Non-Maskable Interrupt (NMI) Watchdog in RHEL4 is a mechanism used by software and hardware developers to detect system lockups during development. The NMI Watchdog periodically checks the CPU status to determine if a program is holding the CPU in an interrupted state for an extended period of time.
To do this, the NMI Watchdog must be tied to an external timer source to know when to interrupt the CPU. This timer source for the AMD Opteron CPU is the Performance Counter, and the timer speed effectively increases as the processor performance increases. This can cause a large number of NMIs to be generated during very CPU-intensive situations. Therefore, it is recommended that you disable the NMI Watchdog timer in this type of situation.
You can disable the timer on RHEL4 by performing the following steps:
1. Log in as superuser (root).
2. Edit the /boot/grub/menu.lst file.
3. At the end of each line that begins with kernel, append this text:
nmi_watchdog=0
4. Save the changes to the file.
You then will be able to boot the system.
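The menu.lst edit described in the steps above can be previewed before the file is changed. The following is a minimal sketch, assuming that kernel lines in menu.lst begin with the word kernel after optional indentation. The function name add_nmi_watchdog_off is hypothetical.

```shell
# Print a GRUB menu.lst-style file with " nmi_watchdog=0" appended to every
# kernel line that does not already carry an nmi_watchdog setting.
# The input file itself is left unchanged.
add_nmi_watchdog_off() {
    sed '/^[[:space:]]*kernel[[:space:]]/{
/nmi_watchdog=/!s/$/ nmi_watchdog=0/
}' "$1"
}
```

Review the output of add_nmi_watchdog_off /boot/grub/menu.lst, back up the original file, and only then replace it. Kernel lines that already contain an nmi_watchdog setting are left as they are and must be edited by hand.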
The ESX Server Console Operating System (COS) might report the following error at the main menu screen:
0:00:00:31.223 cpu2:1038 init:I586: Invalid vmkernel id:0. Distributed vmfs locking may not work.
This message indicates that networking for the COS is not attached or configured correctly.
If using DHCP for network configuration, ensure that the network interface link is up and that the DHCP server is operational. Otherwise, ensure that the host name and IP address for the interface are correctly configured.
The ESX Server message, Unexpected IO-APIC error, might appear in the /var/log/dmesg log file. There is no impact on performance or availability when this message appears.
The /var/log/dmesg log file has numerous messages that say BIOS reporting unknown devices. This is because of the existence of onboard hardware that ESX Server does not control. This has no impact on system usability or performance, and these messages can be safely ignored.
The message, Syncing Hardware Clock to System Time [Failed], is displayed during the ESX Server shutdown. This has no impact on system availability or performance and can be safely ignored.
The informational message, Unable to get COS default route, is displayed during bootup if no default route for the Console Operating System has been specified. Depending on the network topology, this may or may not have an impact on system usability and network access.
You can specify the default route in the file /etc/sysconfig/network by creating a line with this format:
GATEWAY=XXX.XXX.XXX.XXX
(Where XXX.XXX.XXX.XXX is the default route IP address.)
ESX Server might report the message, INQUIRY EVPD Device ID failed, in the /var/log/dmesg log file when connecting some USB and SCSI storage devices.
Extended Vital Product Data (EVPD) is optional data provided by SCSI devices. Not all vendors program this data into their devices, resulting in this informational message from the SCSI system in ESX Server. This has no impact on system usability or performance.
ESX Server does not virtualize the Baseboard Management Controller (BMC) interface. That means that guest OSs cannot load their BMC interface drivers. Also, IPMI utilities running cannot use the BMC interface to interact with the Service Processor.
Error messages that occur when the BMC interface driver fails to load can be safely ignored. IPMI utilities must access the Service Processor over the network instead of using the BMC interface.
VMware ESX Server 3.0.1 cannot boot when the Sun StorageTek PCI-E Dual Channel Ultra320 SCSI HBA is installed.
The synchronization progress reported by the cfggen status command is not accurate. This is a problem with the DOS version 2.00.18 of cfggen.
The LSI MPT BIOS version reported by the cfggen display command is not correct. This is a problem with the DOS version 2.00.18 of cfggen.
The MSM utilities might show incorrect disk drive counts after a drive is inserted or removed. This only occurs in non-RAID configurations.
On Windows you can work around this defect by restarting the MSM utilities.
2. Using the Windows Service Manager, restart MRMonitor and MSMFramework.
3. Reopen the MSM application.
In non-RAID configurations, MSM does not update its status log when a disk is removed. Refreshing with F5 does not help. The log is updated when the disk is re-inserted.
Attempting to create a test schedule using the Schedule Manager fails with an error dialog that says “Operation Failed.”
SunVTS 6.2 Graphical User Interface (GUI), shipped on the Bootable Diagnostics CD, Version 2.1f, has a Meter button. This Meter button does not work because it requires the Solaris stdperformeter utility, which is not available for bootable diagnostics.
BMC communication time is very slow over KCS when using early Solaris 10 operating system releases.
Upgrade to the latest Solaris 10 release, which has better KCS support.
The MSM client is not able to find servers that are not on the same IP subnet as the client.
Set the server IP address manually.
Copying a large file (more than 6 GB) between an internal disk and a USB device can cause the system to reboot. This has been seen on Windows 2003 R2 SP2, 64-bit.
BIOS 51, which is included in Software Release 1.5, contains a workaround for this problem. Because the workaround impairs I/O performance, it is not enabled by default.
To use this workaround, install Software Release 1.5 firmware as described in Sun Fire X4600 Server Release Notes For Software Releases 1.4.1 and 1.5. To enable the workaround, set the BIOS option “Chipset/SouthBridge Configuration/Force MMIO write non-Posted” to “Enabled.”
The mkfloppy.exe utility that is included in FloppyPack.zip can be run on any Windows system; it is used to create the Mass Storage Driver floppy that is used during Windows Server 2003 installation.
However, if there is more than one floppy drive present in the system (including USB-attached floppy drives), mkfloppy.exe does not select the correct floppy drive.
Ensure that the system has only one floppy drive present when using mkfloppy.exe.
Copyright © 2009 Sun Microsystems, Inc. All rights reserved.