The following notes describe known issues for the firmware, OS, and other software.
When a fault.memory.memlink-uc interconnect fault is detected, the server should shut down to protect memory integrity. On intermittent occasions, this fault has been reported during boot operations without the server shutting down.
Although this irregular behavior might indicate that the system was able to recover from the memory link error and restore a healthy boot-up state, the safest course is to power down then power up the server.
Recovery Action: Log in to Oracle ILOM on the SP and power cycle the host:
-> stop /SYS
Are you sure you want to stop /SYS (y/n)? y
Stopping /SYS
-> start /SYS
Are you sure you want to start /SYS (y/n)? y
Starting /SYS
This issue is fixed in system firmware 8.2.1.b.
The timestamp reported in an email generated in an Oracle ILOM fault/critical event might be one hour later than the timestamp recorded in the event log.
Workaround: Perform one of these actions:
Upgrade the system firmware to version 8.2.1 or higher.
Check the timestamp recorded in the event log. If that timestamp does not match the timestamp reported in the email, use the event log time.
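The second action can be sketched in a few lines of Python. This is a minimal illustration only: the function name `resolve_event_time` and the timestamp layout in `TIMESTAMP_FORMAT` are assumptions, not something Oracle ILOM defines, so adjust the format to match what your email and event log actually show.

```python
from datetime import datetime, timedelta
from typing import Tuple

# Assumed timestamp layout (hypothetical); change it to match your email and event log.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def resolve_event_time(email_ts: str, log_ts: str) -> Tuple[datetime, timedelta]:
    """Return the event log time (always the trusted value) and the email's offset from it."""
    email_time = datetime.strptime(email_ts, TIMESTAMP_FORMAT)
    log_time = datetime.strptime(log_ts, TIMESTAMP_FORMAT)
    # A positive offset means the email timestamp is later than the event log timestamp.
    return log_time, email_time - log_time
```

An offset of exactly one hour is consistent with this issue. Any nonzero offset means the event log time is the one to use.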
When installing the Oracle Solaris OS on domains controlled through Sun PCIe Dual Gigabit Ethernet (UTP or MMF) adapters, the e1000g driver might generate false error reports on the static direct I/O (SDIO) and primary domains. For example:
date time ereport.io.pciex.tl.ca
nvlist version: 0
        ena = 0x298a9f62243802
        ena = 0x298a9f62243802
        detector = (embedded nvlist)
        nvlist version: 0
                scheme = dev
                device-path = /pci@400/pci@1
        (end detector)
        class = ereport.io.pciex.tl.ca
        dev-status = 0x2
        ue-status = 0x8000
        ue-severity = 0x62030
        adv-ctl = 0xf
        source-id = 0x600
        source-valid = 1
        __ttl = 0x1
        __tod = 0x4c058b2e 0x1e8813a0
Workaround: You can safely ignore these ereports.
When installing the Oracle Solaris OS while the OBP diag-switch? parameter is set to true, the OS installer fails to update the boot-device parameter with the new device path where the OS was installed. Therefore, this new device path is not used during subsequent automatic system reboots.
Under these conditions, the server displays the following messages and you are unable to reboot from the device:
Installing boot information
- Installing boot blocks (cxtxdxsx)
- Installing boot blocks (/dev/rdsk/cxtxdxsx)
- Updating system firmware for automatic rebooting
WARNING: Could not update system for automatic rebooting
On previous servers and server modules, the OBP diag-device parameter used to set the new device path to the boot device when the diag-switch? parameter was set to true. On SPARC T4 servers and server modules, the diag-device parameter is no longer supported and the Oracle Solaris OS installer warns that setting the OBP boot-device parameter is not possible.
Workaround: From the Oracle ILOM prompt, set the OBP diag-switch? parameter to false:
-> set /HOST/bootmode script="setenv diag-switch? false"
Alternatively, you can set this parameter at the OBP ok prompt:
ok setenv diag-switch? false
If you attempt to create a RAID volume smaller than MAX, the following series of messages is returned:
You are about to create an IR volume.
WARNING: Proceeding with this operation may cause data loss or data corruption. Are you sure you want to proceed (YES/NO)? yes
WARNING: Volume created with size other than 'MAX' is not supported. Do you want to continue with volume creation (YES/NO)? n
SAS2IRCU: you must answer "YES" or "yes" to proceed; operation aborted!
SAS2IRCU: Error executing command CREATE.
RAID volumes smaller than MAX are not supported. However, if you want to create a volume below MAX size for nonproduction use, the software allows you to do so. This situation is not clear from the message.
Workaround: Ignore the messages and answer yes to the question “Do you want to continue with volume creation (YES/NO)?”.
This issue does not apply to server modules that are running the Oracle Solaris 11 OS.
For the Oracle Solaris 10 OS, install patch 147790-01 or higher.
Previously diagnosed and repaired PSH faults from the host reappear in Oracle ILOM when the host reboots. An incorrect report of a PSH-diagnosed fault appears in the Oracle ILOM CLI and web interface, and the fault LED illuminates.
You can identify this issue by checking to see if the same PSH fault was also reported from the host. If the fault was reported only by Oracle ILOM and not from the host, it is probably an example of this issue.
Recovery Action: Use the Oracle ILOM diagnostic and repair tools to identify the error condition and correct it. The following example shows how to diagnose and repair a PSH fault diagnosed by the host. The example is based on the Oracle ILOM fault management shell. You could instead use the Oracle ILOM CLI or web interface to accomplish the same results.
Display the fault information.
faultmgmtsp> fmadm faulty
------------------- ------------------------------------ -------------- -------
Time                UUID                                 msgid          Severity
------------------- ------------------------------------ -------------- -------
2011-09-16/15:38:19 af875d87-433e-6bf7-cb53-c3d665e8cd09 SUN4V-8002-6E  Major
Fault class : fault.cpu.generic-sparc.strand
FRU         : /SYS/MB
              (Part Number: 7015272)
              (Serial Number: 465769T+1130Y6004M)
Description : A fault has been diagnosed by the Host Operating System.
Response    : The service required LED on the chassis and on the affected
              FRU may be illuminated.
Impact      : No SP impact. Check the Host OS for more information.
Action      : The administrator should review the fault on the Host OS.
              Please refer to the Details section of the Knowledge Article
              for additional information.
Check for faults on the host.
# fmadm faulty
#                 <-- Host displays no faults
Verify that the fault shown by Oracle ILOM was repaired on the host.
# fmdump
TIME                 UUID                                 SUNW-MSG-ID
Sep 16 08:38:19.5582 af875d87-433e-6bf7-cb53-c3d665e8cd09 SUN4V-8002-6E
Sep 16 08:40:47.8191 af875d87-433e-6bf7-cb53-c3d665e8cd09 FMD-8000-4M Repaired
Sep 16 08:40:47.8446 af875d87-433e-6bf7-cb53-c3d665e8cd09 FMD-8000-6U Resolved
#
Flush the previously faulty component from the host resource cache.
# fmadm flush /SYS/MB
fmadm: flushed resource history for /SYS/MB
#
Repair the fault in Oracle ILOM.
faultmgmtsp> fmadm repair /SYS/MB
faultmgmtsp> fmadm faulty
No faults found
faultmgmtsp>
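The cross-check between Oracle ILOM and the host in this procedure amounts to comparing fault UUIDs. The following Python sketch assumes you have saved the text of the Oracle ILOM fmadm faulty output and the host fmdump output. The function name `stale_ilom_faults` is hypothetical, and the sketch relies only on the UUID layout and the Repaired and Resolved markers visible in the example output above.

```python
import re

# Matches fault UUIDs of the form shown in the example output,
# such as af875d87-433e-6bf7-cb53-c3d665e8cd09.
UUID_RE = re.compile(
    r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b"
)


def stale_ilom_faults(ilom_faulty_output: str, host_fmdump_output: str) -> set:
    """Return UUIDs that ILOM lists as faulty but the host log marks Repaired or Resolved."""
    ilom_uuids = set(UUID_RE.findall(ilom_faulty_output))
    closed_on_host = set()
    for line in host_fmdump_output.splitlines():
        # Only host log lines carrying a Repaired or Resolved marker count as closed.
        if "Repaired" in line or "Resolved" in line:
            closed_on_host.update(UUID_RE.findall(line))
    return ilom_uuids & closed_on_host
```

Any UUID that this function returns is a candidate for the flush and repair steps. A UUID that appears in the Oracle ILOM output but never in the host log at all is not covered by this sketch and needs separate investigation.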
The MIB should report the sunHwCtrlPowerMgmtBudgetTimelimit in milliseconds, but the value displayed is in seconds.
Workaround: Understand that the value reported for sunHwCtrlPowerMgmtBudgetTimelimit is in seconds.
This issue is fixed in system firmware 8.2.1.b or higher.
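If a monitoring script consumes this value, the unit mismatch can be handled with a small conversion. The Python sketch below is illustrative only: the function name `budget_timelimit_ms` and the `firmware_has_fix` flag are hypothetical, and the caller is assumed to know whether the installed firmware includes the fix.

```python
def budget_timelimit_ms(reported_value: int, firmware_has_fix: bool) -> int:
    """Normalize the reported sunHwCtrlPowerMgmtBudgetTimelimit value to milliseconds."""
    if firmware_has_fix:
        # Fixed firmware already reports milliseconds.
        return reported_value
    # Affected firmware reports seconds, so scale up by 1000.
    return reported_value * 1000
```

For example, a reported value of 30 on affected firmware corresponds to 30000 milliseconds.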
During normal operation and when running the Oracle VTS system exerciser, you might see this message in the system console:
date time hostname px: [ID 781074 kern.warning] WARNING: px0: spurious interrupt from ino 0x3,0x02,or 0x04
Workaround: Update the system firmware. Otherwise, you can safely ignore this message.
On occasion during a power cycle, the server module might display the following warning message:
[CPU 0:0:0] NOTICE: MCU0: Link init failed: TS0 Timeout
The server module automatically retries the training sequence operation without error.
Workaround: You can safely ignore this message.
The cfgadm command might fail on SG-SAS6-REM-Z or SGX-SAS6-REM-Z HBA devices.
# cfgadm -c unconfigure Slot1
cfgadm: Component system is busy, try again: unconfigure failed
WARNING: (pcieb2): failed to detach driver for the device (mpt_sas9) in the Connection Slot1
WARNING: (pcieb2): failed to detach driver for the device (mpt_sas9) in the Connection Slot1
Workaround: Disable the fault management daemon before running the cfgadm unconfigure command.
# svcadm disable fmd
# ps -ef |grep fmd
...
# cfgadm -c unconfigure PCI-EM0
After completing the cfgadm task, re-enable the fault management daemon:
# svcadm enable fmd
This issue is fixed in the Oracle Solaris 11.1 OS.
A message displayed by the cpustat command says:
See the “SPARC T4 User's Manual” for descriptions of these events. Documentation for Sun processors can be found at: http://www.sun.com/processors/manuals
The document and website listed in this message are not available.
This is only an issue on server modules running the Oracle Solaris 10 10/09 OS or Oracle Solaris 10 09/10 OS.
If you run the trapstat -t command, the server module might panic with a watchdog reset.
Workaround: Add the SUNWust1 and SUNWust2 packages from the Oracle Solaris OS media or from the Oracle Solaris 10 ISO image.
This issue is fixed in the Oracle Solaris 11.1 OS.
When running the reboot disk command, extraneous characters are occasionally added to the disk argument before it reaches the OBP. This situation results in a failure to boot.
Recovery Action: Repeat the boot request.
This issue is fixed in the Oracle Solaris 11.1 OS.
In rare cases, PCIe devices in the server module might report I/O errors that are identified and reported by predictive self-healing (PSH). For example:
--------------- ------------------------------------ -------------- ---------
TIME            EVENT-ID                             MSG-ID         SEVERITY
--------------- ------------------------------------ -------------- ---------
Aug 10 13:03:23 a7d43aeb-61ca-626a-f47b-c05635f2cf5a PCIEX-8000-KP  Major
Host        : dt214-154
Platform    : ORCL,SPARC-T3-1B
Chassis_id  :
Product_sn  :
Fault class : fault.io.pciex.device-interr-corr 67%
              fault.io.pciex.bus-linkerr-corr 33%
Affects     : dev:////pci@400/pci@1/pci@0/pci@c
              dev:////pci@400/pci@1/pci@0/pci@c/pci@0
                  faulted but still in service
FRU         : "/SYS/MB" (hc://:product-id=ORCL,SPARC-T3-1B:product-sn=1052NND107:server-id=dt214-154:chassis-id=0000000-0000000000:serial=1005LCB-1052D9008K:part=541-424304:revision=50/chassis=0/motherboard=0) 67%
              "FEM0" (hc://:product-id=ORCL,SPARC-T3-1B:product-sn=1052NND107:server-id=dt214-154:chassis-id=0000000-0000000000/chassis=0/motherboard=0/hostbridge=0/pciexrc=0/pciexbus=1/pciexdev=0/pciexfn=0/pciexbus=2/pciexdev=12/pciexfn=0/pciexbus=62/pciexdev=0) 33%
                  faulty
Description : Too many recovered bus errors have been detected, which indicates
              a problem with the specified bus or with the specified
              transmitting device. This may degrade into an unrecoverable
              fault.
              Refer to http://sun.com/msg/PCIEX-8000-KP for more information.
Response    : One or more device instances may be disabled
Impact      : Loss of services provided by the device instances associated with
              this fault
Action      : If a plug-in card is involved check for badly-seated cards or
              bent pins. Otherwise schedule a repair procedure to replace the
              affected device. Use fmadm faulty to identify the device or
              contact Sun for support.
These errors might indicate a faulty or incorrectly seated PCI EM. Alternatively, the error reports themselves might be erroneous.
Workaround: Ensure that the PCI EM is properly seated and functioning. If the errors continue, apply the Oracle Solaris 10 8/11 OS patch 147705-01 (or higher).
This issue is fixed in the Oracle Solaris 10 8/11 OS with patch 147440-05 (or higher) and in the Oracle Solaris 11.1 OS.
With certain unusually heavy workloads, especially where a highly processor-intensive workload is bound to CPU 0, the host might appear to suddenly reset back to OBP without any sign of a crash or a panic. The Oracle ILOM event log contains a Host watchdog expired entry. The problem is more likely to occur on systems with full memory configurations.
If you see this sort of sudden reset, display the SP event log using this command from the Oracle ILOM CLI:
-> show /SP/logs/event/list
If you see an entry labeled Host watchdog expired, you are experiencing this issue.
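If you capture the event log to a text file, the check can be automated. The Python sketch below is a minimal illustration: the function name `find_watchdog_resets` is hypothetical, and it assumes only that each log entry occupies one line and contains the phrase Host watchdog expired. It makes no other assumption about the log layout.

```python
def find_watchdog_resets(event_log_text: str) -> list:
    """Return the event log lines that mention an expired host watchdog."""
    marker = "host watchdog expired"
    # Compare case-insensitively in case the capitalization differs.
    return [
        line.strip()
        for line in event_log_text.splitlines()
        if marker in line.lower()
    ]
```

An empty result means the captured log contains no watchdog expiry, so the reset probably has a different cause.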
Workaround: Contact your authorized service provider to see if a fix is available.
There are two ways you can work around this issue:
You can extend the watchdog period by adding this entry to the Oracle Solaris /etc/system file:
set watchdog_timeout = 60000
This extends the watchdog timeout period to 1 minute (60000 milliseconds).
In extreme cases, you can disable the watchdog timeout altogether by adding this entry to the /etc/system file:
set watchdog_enabled = 0
Whenever you modify the /etc/system file you must reboot the system for the changes to take effect.
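Before rebooting, it can be useful to confirm which watchdog entries the file actually contains. The following Python sketch reads the text of an /etc/system file and extracts the two settings discussed above. The function name `watchdog_settings` is hypothetical, and the sketch assumes that values are written in decimal, as in the entries shown.

```python
import re

# Matches entries of the form "set name = value". Comment lines do not begin
# with "set", so they never match this pattern.
SET_RE = re.compile(r"^\s*set\s+(\w+)\s*=\s*(\d+)\s*$")

WATCHDOG_KEYS = ("watchdog_timeout", "watchdog_enabled")


def watchdog_settings(system_file_text: str) -> dict:
    """Return the watchdog-related 'set' entries found in /etc/system text."""
    settings = {}
    for line in system_file_text.splitlines():
        match = SET_RE.match(line)
        if match and match.group(1) in WATCHDOG_KEYS:
            # A later entry for the same name replaces an earlier one in this sketch.
            settings[match.group(1)] = int(match.group(2))
    return settings
```

A watchdog_timeout value of 60000 divided by 1000 gives the 60-second (1-minute) period described above.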
If you do not want to reboot the system immediately after editing /etc/system, you can apply an additional temporary workaround that takes effect immediately. To apply this temporary workaround, as root type:
# psrset -c -F 0
This command creates a temporary processor set containing only CPU 0, preventing application workloads from using this processor and preventing this issue from occurring.
Note - If any threads were bound to CPU 0, they will be unbound.
This temporary processor set is removed on the next operating system reboot, at which point the /etc/system file workaround takes effect.
This issue is fixed in the Oracle Solaris 11.1 OS.
The server module might generate an ereport.fm.fmd.module message during a reboot of an SDIO domain. This ereport indicates that an error occurred on one of the fmd modules but the fmdump command does not display a valid message (msg).
# fmdump -eV -c ereport.fm.fmd.module
TIME                           CLASS
Sep 27 2011 06:27:19.954801492 ereport.fm.fmd.module
nvlist version: 0
        version = 0x0
        class = ereport.fm.fmd.module
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = fmd
                authority = (embedded nvlist)
                nvlist version: 0
                        version = 0x0
                        product-id = ORCL,SPARC-T4-1B
                        server-id = c193-133
                (end authority)
                mod-name = etm
                mod-version = 1.2
        (end detector)
        ena = 0x425fc9b065404001
        msg = cannot open write-only transport <===
        __ttl = 0x1
        __tod = 0x4e81cf37 0x38e91d54
Workaround: You can safely ignore ereport.fm.fmd.module ereports.
This issue is fixed in Oracle VTS 7.0 PS13.
The Oracle VTS processor test called dtlbtest hangs when the Oracle VM for SPARC max-ipc threading mode is set. This issue is not specific to any processor type, and happens when both of the following conditions are true:
Only one CPU or strand per core is enabled or online.
The total number of online CPUs or strands is less than or equal to 128.
Workaround: Do one of the following:
Update to Oracle VTS 7.0 PS13.
Do not run the Oracle VTS dtlbtest with the Oracle VM for SPARC threading mode set to max-ipc mode.
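The conditions for this hang can be expressed as a simple predicate, which might be useful in a pre-test checklist script. The Python sketch below is illustrative: the function name `dtlbtest_may_hang` and its inputs are hypothetical, and the caller is assumed to supply the threading mode and the number of online strands for each core that has at least one strand online.

```python
def dtlbtest_may_hang(threading_mode: str, online_strands_per_core: list) -> bool:
    """Return True when the configuration matches the conditions described for the hang."""
    if threading_mode != "max-ipc" or not online_strands_per_core:
        return False
    # Condition 1: only one strand per core is online.
    one_strand_per_core = all(count == 1 for count in online_strands_per_core)
    # Condition 2: the total number of online strands is 128 or fewer.
    total_within_limit = sum(online_strands_per_core) <= 128
    return one_strand_per_core and total_within_limit
```

When the predicate returns True, either update Oracle VTS first or skip dtlbtest for that run.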
This issue is fixed in system firmware 8.1.4.e and higher.
After a cold reset, the server might add one day to the Oracle Solaris OS date and time. This possible date change will only occur on the first cold reset after January 1, 2012. Once you set the correct date using the Oracle Solaris OS date(1) command, the corrected date and time will persist across future resets.
A cold reset is when you halt the OS and restart the service processor (SP). For example, you can use one of the following Oracle Solaris OS commands to halt the OS:
# shutdown -g0 -i0 -y
# uadmin 1 6
# init 5
Then, at the ILOM prompt, use the following commands to reset the host:
-> stop /SYS
. . .
-> start /SYS
Refer to the service manual, the administration guide, and the Oracle Solaris OS documentation for more information.
Workaround: Install the latest system firmware. This issue is fixed in the system firmware version 8.1.4.e and higher.
After the first cold reset of the system, verify that the system date and time are correct. If the date has been impacted by this issue, use the Oracle Solaris OS date(1) command to set the correct date and time.
For example, to set the date and time to 9:00 a.m. on February 26, 2012, type:
# date 022609002012
Refer to the date(1) man page and the Oracle Solaris OS documentation for more information.
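The argument in the example follows the pattern month, day, hour, minute, and four-digit year. As a rough illustration of how that string is assembled, the following Python sketch builds it from a datetime value. The function name `solaris_date_argument` is hypothetical, and the pattern is inferred from the example above, so confirm it against the date(1) man page for your OS release.

```python
from datetime import datetime


def solaris_date_argument(when: datetime) -> str:
    """Format a datetime as mmddHHMMyyyy, matching the example argument to date(1)."""
    return when.strftime("%m%d%H%M%Y")
```

Calling solaris_date_argument(datetime(2012, 2, 26, 9, 0)) yields 022609002012, the same string used in the example.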