CHAPTER 2
Late-Breaking Issues
This chapter provides the following information:
On modular systems, error telemetry might not be processed, or might not be logged by the service processor to the host, while a stream of error events is being processed. This problem can occur when the server module is running system firmware 7.2.10.a or earlier.
Workaround: Upgrade system firmware to 7.3.0 (or later). See Supported Versions of the Oracle Solaris OS, Firmware, and Software.
This problem occurs only on multi-socket Sun4v systems that are running system firmware 7.2.10.a or earlier.
When processing fault events under certain conditions, the server module might reset, panic, or hang. This occurs when the system is handling events that require data from a remote CPU node.
Workaround: Upgrade the system firmware to 7.3.0 (or later). See Supported Versions of the Oracle Solaris OS, Firmware, and Software.
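Because several fixes in this chapter depend on a minimum system firmware level, the comparison can be scripted. The following is a minimal sketch, not part of the product software. The function name fw_at_least is hypothetical, the version strings are assumed to have the form shown in this chapter (such as 7.2.10.a), and only the three numeric fields are compared; the trailing letter is ignored.

```shell
# fw_at_least CURRENT REQUIRED
# Returns 0 if CURRENT is at or above REQUIRED, 1 otherwise.
# Fields are compared as numbers, so 7.2.10 is newer than 7.2.4.
# Assumption: versions look like MAJOR.MINOR.PATCH[.letter].
fw_at_least() {
    awk -v cur="$1" -v req="$2" 'BEGIN {
        split(cur, c, ".")
        split(req, r, ".")
        for (i = 1; i <= 3; i++) {
            if (c[i] + 0 > r[i] + 0) exit 0   # newer in this field
            if (c[i] + 0 < r[i] + 0) exit 1   # older in this field
        }
        exit 0                                # all three fields equal
    }'
}
```

For example, fw_at_least 7.2.10.a 7.3.0 returns a nonzero status, which indicates that the firmware upgrade is still needed.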
When a SAS2-capable REM is installed in the server module, the cfgadm -c unconfigure command fails to illuminate the drive's OK-to-Remove LED, making it difficult to identify which drive to remove.
Workaround: If you are still uncertain about the location of the drive, perform the following procedure.
Manually Locate a Drive
1. Run the format utility and select the device that you need to locate.
2. Make note of the cntndn number associated with the drive.
For example, in the previous output example, the string to note is
c0t5000C5000258C457d0.
3. Type q to exit the format utility.
4. Find the serial number for the device:
a. Redirect the output of the iostat command to a file.
b. In the file, search for the string you noted in Step 2.
You can use an editor and search for the string. In the following example, we are searching for c0t5000C5000258C457d0.
c. Identify the serial number associated with the string.
In the previous example, 0802V16VTE is the serial number.
5. Change to the directory where you installed the SAS2IRCU utility.
For information on downloading and installing the SAS2IRCU utility, refer to the Sun Storage 6 Gb SAS REM HBA Installation Guide.
6. Find the SAS2 controller number (shown under Index) using the sas2ircu LIST command.
7. Redirect the output of the sas2ircu n display command to a file, where n is the controller number from Step 6.
8. In the output file, search for the serial number obtained from Step 4.
9. In the output, look for the enclosure # and slot # that correspond to this device.
If the drive is in a server module, the Slot # refers to the slot number on the server module. In the previous example, Slot # 1 corresponds to HDD1 on the front panel of the server module.
Locate the drive and do not complete the remaining steps in this procedure.
If the drive is in a storage module, the Slot # refers to the slot number on the storage module.
Perform the remaining steps in this procedure.
10. To locate the drive in the storage module, use the sas2ircu LOCATE command.
The locate LED on the drive will start blinking (amber).
Example specifying a drive in enclosure # 6, slot # 7:
11. After replacing the drive, turn off the locate LED.
Example specifying a drive in enclosure # 6, slot # 7:
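Steps 4 through 9 of this procedure amount to two text searches, which can be sketched as shell functions. This is a minimal sketch under stated assumptions, not an Oracle-supplied tool. The function names are hypothetical. The first function assumes that the saved iostat output starts each device block with the device name followed by the error counters, and that the block carries a Serial No: field (as iostat -En prints on Oracle Solaris). The second assumes that the saved sas2ircu display output lists Enclosure #, Slot #, and Serial No fields for each device, each followed by a colon and the value.

```shell
# serial_for_device DEVICE < saved-iostat-output
# Prints the serial number reported for DEVICE (Step 4).
# Returns nonzero if no serial number is found for that device.
serial_for_device() {
    awk -v dev="$1" '
        $1 == dev               { in_dev = 1; next }  # start of our block
        in_dev && /Soft Errors:/ { exit }             # next device began
        in_dev && /Serial No:/  {
            sub(/.*Serial No:[ \t]*/, "")             # drop text before value
            print $1
            ok = 1
            exit
        }
        END { exit !ok }
    '
}

# slot_for_serial SERIAL < saved-sas2ircu-display-output
# Prints "enclosure:slot" for the device with that serial (Steps 8 and 9).
# Returns nonzero if the serial number is not present.
slot_for_serial() {
    awk -F':' -v serial="$1" '
        /Enclosure #/ { encl = $2; gsub(/[ \t]/, "", encl) }
        /Slot #/      { slot = $2; gsub(/[ \t]/, "", slot) }
        /Serial No/   {
            sn = $2; gsub(/[ \t]/, "", sn)
            if (sn == serial) { print encl ":" slot; found = 1; exit }
        }
        END { exit !found }
    '
}
```

With the outputs saved to files (the names iostat_output.txt and display_output.txt are illustrative), serial_for_device c0t5000C5000258C457d0 < iostat_output.txt prints the serial number, and slot_for_serial 0802V16VTE < display_output.txt prints the matching enclosure and slot numbers.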
The cfgadm -c unconfigure command fails if the path specified is an mpxio enabled device.
Workaround: This issue is fixed in the Oracle Solaris 10 9/10 OS and in kernel patch 14909-13 (or later). If you are unable to install the Oracle Solaris 10 9/10 OS or patch 14909-13, perform the following procedure.
Manually Unconfiguring Multipath-Enabled Drives
1. Start the format utility to see the drives and to obtain the drive numbers (such as c0t5000C5000F0E5AFFd0) for the drive you plan to unconfigure.
2. To exit the format utility, select one of the drives and type q.
3. Use the mount command to identify whether the device is mounted or if it is a boot drive.
# mount | grep c0t5000C5000F0E5AFFd0
/mnt on /dev/dsk/c0t5000C5000F0E5AFFd0s6 read/write/setuid/devices/intr/largefiles/logging/xattr/onerror=panic/dev=600016 on Fri Jun 4 10:37:08 2010
4. Based on your results, do one of the following:
5. Identify the processes running on the drive:
a. Run the fuser command to identify the processes accessing the disk.
b. If you identify a process, use the ps command to further identify the process.
# ps -ef | grep 1036
root 1036 982 0 11:56:34 pts/2 0:02 dd if=/dev/dsk/c0t5000C5000F0E5AFFd0s2 of=/dev/dsk/c0t5000C5000F0FE227d0s7
c. Kill processes identified in Step b using kill -9 PID.
d. Use the umount command to unmount any mount points, and then run the sync command to synchronize the disk.
e. Remove the disk, and do not continue with subsequent steps in this procedure.
6. If the drive is a boot drive, run the following commands to synchronize the drive and shut down the system:
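The mount check in Step 3 of this procedure can be scripted when several drives must be examined. The following sketch uses the hypothetical function name mount_points_for_device and assumes mount output in the form shown in Step 3: the mount point, the word on, and then the /dev/dsk device path. It prints the mount point of every mounted slice of the named drive.

```shell
# mount_points_for_device DEVICE < mount-output
# Prints one mount point per mounted slice of DEVICE.
# Assumption: each line reads "<mount point> on /dev/dsk/<device>s<N> ...".
mount_points_for_device() {
    awk -v dev="$1" '
        # Match only slices of this exact drive (device name followed by "s").
        $2 == "on" && index($3, "/dev/dsk/" dev "s") == 1 { print $1 }
    '
}
```

Run as mount | mount_points_for_device c0t5000C5000F0E5AFFd0, the function prints nothing when no slice of the drive is mounted. A / entry in the result indicates that the root file system is on that drive.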
On some Sun Blade T6340 Modular Servers, the following intermittent error message is displayed during POST or SunVTS testing:
Fault | critical: "SP detected fault at time Tue Oct 27 18:17:32 2009. Host Power Failure: MB_DC_POK Fault"
Fix: Update the modular server System Firmware to version 7.2.4.f or higher.
This issue occurs on Sun Blade T6340 server modules that are in a modular system chassis with CMM firmware 3.0.3.32.
When you launch the web interface by connecting to the CMM in the chassis where your server module is installed, you can then select the server module within the web interface to connect to it. If you connect this way, however, the ILOM Remote Console does not launch for the Sun Blade T6340 server module.
Workaround: Use the web interface to connect directly to the Sun Blade T6340 server module, not to the CMM.
This issue is fixed in System Firmware 7.1.8.a and later versions.
When powering on a Sun Blade T6340 server module, you might encounter the following error messages:
Workaround: Update the System Firmware version to 7.1.8.a or later.
Other Workarounds: This error is encountered when there is a large difference between the amounts of memory on the different CMP and MEM modules. For example, it could happen if the memory on CMP0+MEM0 added up to 128 Gbytes, but the memory on CMP1+MEM1 added up to only 16 Gbytes. This imbalance can arise in two situations, each with its own recovery procedure.
Recovery: Reallocate the FB-DIMMs across the CMP and MEM modules to keep the total number and types of FB-DIMMs the same on each CMP and MEM module.
Recovery: Take one of the following two steps:
You must take this step if replacing the failed FB-DIMM is not immediately possible or desired.
i. View a list of enabled and disabled devices.
In ALOM compatibility shell: showcomponent
ii. Identify the FB-DIMM devices to be disabled.
For each FB-DIMM device that is disabled, you will disable the corresponding FB-DIMM associated with the other CMP/MEM units. For example, if the following device is disabled:
Then you must disable the following devices:
iii. Disable the target FB-DIMM devices.
In ILOM: set /SYS/component component_state=disabled
In ALOM CMT compatibility shell: disablecomponent component
This issue is fixed in System Firmware 7.1.8.a and later.
The service procedure “To Reset the Root Password to the Factory Default” described in the Sun Blade T6340 Server Module Service Manual does not reset the root password.
Workaround: If possible, update the System Firmware to 7.1.8.a or later.
The prtdiag -v command is slow and could appear to hang. The command might take up to five minutes to complete.
Fix: Update the OS to Oracle Solaris 10 5/09 or install the Oracle Solaris 10 OS kernel patch 139555-08 (or later).
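Whether a system already runs a given kernel patch can be checked from the kernel version string. The sketch below is illustrative only. The function name kernel_patch_check is hypothetical, and it assumes that uname -v reports the kernel patch in the form Generic_139555-08. Because a later kernel patch can carry a different base number, the function compares revisions only when the base numbers match and otherwise reports that it cannot decide.

```shell
# kernel_patch_check KERNEL_VERSION REQUIRED_PATCH
# Return status: 0 = same patch number, revision is sufficient
#                1 = same patch number, revision is too old
#                2 = different patch number; check manually
#                3 = input is not in the assumed NNNNNN-RR form
kernel_patch_check() {
    kp_cur=${1#Generic_}          # strip the assumed "Generic_" prefix
    kp_req=$2
    for kp_v in "$kp_cur" "$kp_req"; do
        case $kp_v in
            *[!0-9-]* | -* | *- | *-*-*) return 3 ;;
            *-*) ;;
            *) return 3 ;;
        esac
    done
    kp_cur_id=${kp_cur%%-*}; kp_cur_rev=${kp_cur##*-}
    kp_req_id=${kp_req%%-*}; kp_req_rev=${kp_req##*-}
    [ "$kp_cur_id" = "$kp_req_id" ] || return 2
    [ "$kp_cur_rev" -ge "$kp_req_rev" ] || return 1
    return 0
}
```

For example, kernel_patch_check "$(uname -v)" 139555-08 returns status 0 when the running kernel is patch 139555 at revision 08 or later.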
This issue is fixed in System Firmware 7.2.0.
Using the SP setdate command (ALOM compatibility shell) after having configured nondefault logical domains can cause the date on nondefault domains to change.
Workaround: Update the System Firmware to 7.2.0 or later.
Another workaround: Use the setdate command to configure the date on the SP before configuring and saving logical domain configurations.
If you use setdate after nondefault logical domain configurations have been saved, each nondefault domain must be booted to Oracle Solaris and the date corrected. (Refer to the date(1) or ntpdate(1M) man page.)
SunVTS xnetlbtest can fail during XAUI loopback testing. Failures occur with this error message:
Workaround: Do not run SunVTS xnetlbtest on XAUI interfaces.
Fix: Update the OS to Oracle Solaris 10 10/08 or install the Oracle Solaris 10 OS kernel patch 137137-09 or later.
Copyright © 2012, Oracle and/or its affiliates. All rights reserved.