Firmware, OS, and Other Software Issues

The following notes describe known issues for the firmware, OS, and other software.

`fault.memory.memlink-uc` Fault Did Not Cause Panic as Stated by System Message (CR 6940599)

When a fault.memory.memlink-uc interconnect fault is detected, the server should shut down to protect memory integrity. On intermittent occasions, this fault has been reported during boot operations without the server shutting down.

Although this irregular behavior might indicate that the system was able to recover from the memory link error and restore a healthy boot-up state, the safest course is to power down then power up the server.

Recovery Action: Log in to Oracle ILOM on the SP and power cycle the host:

-> stop /SYS
Are you sure you want to stop /SYS (y/n)? y
Stopping /SYS
-> start /SYS
Are you sure you want to start /SYS (y/n) ? y
Starting /SYS

Timestamp for an Oracle ILOM Fault/Critical Event Might Be off by One Hour (CR 6943957)

This issue is fixed in system firmware 8.2.1.b.

The timestamp reported in an email generated in an Oracle ILOM fault/critical event might be one hour later than the timestamp recorded in the event log.

Workaround: Perform one of these actions:

Upgrade the system firmware to version 8.2.1 or higher.
Check the timestamp recorded in the event log. If that timestamp does not match the timestamp reported in the email, use the event log time.

`e1000g` Driver Generates Spurious `ereports` When Installing Oracle Solaris OS Over a Sun PCIe Dual Gigabit Ethernet Adapter (CR 6958011)

When installing the Oracle Solaris OS on domains controlled through Sun PCIe Dual Gigabit Ethernet (UTP or MMF) adapters, the e1000g driver might generate false error reports on the static direct I/O (SDIO) and primary domains. For example:

date time ereport.io.pciex.tl.ca nvlist version: 0
          ena = 0x298a9f62243802
ena = 0x298a9f62243802
detector = (embedded nvlist)
nvlist version: 0
scheme = dev
device-path = /pci@400/pci@1
(end detector)
 
class = ereport.io.pciex.tl.ca
dev-status = 0x2
ue-status = 0x8000
ue-severity = 0x62030
adv-ctl = 0xf
source-id = 0x600
source-valid = 1
__ttl = 0x1
__tod = 0x4c058b2e 0x1e8813a0

Workaround: You can safely ignore these ereports.

When `diag-switch?` Is Set to `true`, Oracle Solaris OS Fails to Update the EEPROM for Automatic Rebooting (CR 6982060)

When installing the Oracle Solaris OS while the OBP diag-switch? parameter is set to true, the OS installer fails to update the boot-device parameter with the new device path where the OS was installed. Therefore, this new device path will not be used during subsequent automatic system reboots.

Under these conditions, the server displays these error messages and you are unable to reboot from the device:

Installing boot information
       - Installing boot blocks (cxtxdxsx)
       - Installing boot blocks (/dev/rdsk/cxtxdxsx)
       - Updating system firmware for automatic rebooting
WARNING: Could not update system for automatic rebooting

On previous servers and server modules, the OBP diag-device parameter used to set the new device path to the boot device when the diag-switch? parameter was set to true. On SPARC T4 servers and server modules, the diag-device parameter is no longer supported and the Oracle Solaris OS installer warns that setting the OBP boot-device parameter is not possible.

Workaround: From the Oracle ILOM prompt, set the OBP diag-switch? parameter to false:

-> set /HOST/bootmode script="setenv diag-switch? false"

Alternatively, you can set this parameter at the OBP ok prompt:

ok setenv diag-switch? false

Unable to Configure RAID Volume Sizes Other Than the Max Size When Using the `sas2ircu` Command (CR 6983210)

If you attempt to create a RAID volume smaller than MAX, the following series of messages is returned:

You are about to create an IR volume.
 
WARNING: Proceeding with this operation may cause data loss or data
         corruption. Are you sure you want to proceed (YES/NO)? yes
 
WARNING: Volume created with size other than 'MAX' is not supported.
         Do you want to continue with volume creation (YES/NO)? n
SAS2IRCU: you must answer "YES" or "yes" to proceed; operation aborted!
SAS2IRCU: Error executing command CREATE.

RAID volumes smaller than MAX are not supported. However, if you want to create a volume below MAX size for nonproduction use, the software allows you to do so. This situation is not clear from the message.

Workaround: Ignore the messages and answer yes for the question “Do you want to continue with volume creation (YES/NO)?”.

Fault Management Sometimes Sends Resolved Cases to the SP (CR 6983432)

This issue does not apply to server modules that are running the Oracle Solaris 11 OS.

For the Oracle Solaris 10 OS, install patch 147790-01 or higher.

Previously diagnosed and repaired PSH faults from the host reappear in Oracle ILOM when the host reboots. An incorrect report of a PSH-diagnosed fault appears in the Oracle ILOM CLI and web interface, and the fault LED illuminates.

You can identify this issue by checking to see if the same PSH fault was also reported from the host. If the fault was reported only by Oracle ILOM and not from the host, it is probably an example of this issue.

Recovery Action: Use the Oracle ILOM diagnostic and repair tools to identify the error condition and correct it. The following example shows how to diagnose and repair a PSH fault diagnosed by the host. The example is based on the Oracle ILOM fault management shell. You could instead use the Oracle ILOM CLI or web interface to accomplish the same results.

Display the fault information.

faultmgmtsp> fmadm faulty
------------------- ------------------------------------ -------------- -------
Time                UUID                                 msgid          Severity
------------------- ------------------------------------ -------------- -------
2011-09-16/15:38:19 af875d87-433e-6bf7-cb53-c3d665e8cd09 SUN4V-8002-6E  Major
 
Fault class : fault.cpu.generic-sparc.strand
 
FRU         : /SYS/MB
              (Part Number: 7015272)
              (Serial Number: 465769T+1130Y6004M)
 
Description : A fault has been diagnosed by the Host Operating System.
 
Response    : The service required LED on the chassis and on the affected
              FRU may be illuminated.
 
Impact      : No SP impact.  Check the Host OS for more information.
 
Action      : The administrator should review the fault on the Host OS.
              Please refer to the Details section of the Knowledge Article
              for additional information.

Check for faults on the host.

# fmadm faulty
#                       <-- Host displays no faults

Verify that the fault shown by Oracle ILOM was repaired on the host.

# fmdump
TIME                 UUID                                 SUNW-MSG-ID
Sep 16 08:38:19.5582 af875d87-433e-6bf7-cb53-c3d665e8cd09 SUN4V-8002-6E
Sep 16 08:40:47.8191 af875d87-433e-6bf7-cb53-c3d665e8cd09 FMD-8000-4M Repaired
Sep 16 08:40:47.8446 af875d87-433e-6bf7-cb53-c3d665e8cd09 FMD-8000-6U Resolved
#

Flush the previously faulty component from the host resource cache.

# fmadm flush /SYS/MB
fmadm: flushed resource history for /SYS/MB
#

Repair the fault in Oracle ILOM.

faultmgmtsp> fmadm repair /SYS/MB
faultmgmtsp> fmadm faulty
No faults found
faultmgmtsp>
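
The host-side check in this procedure can be scripted. The following is a minimal sketch, not an Oracle-supplied tool. The function name uuid_resolved_on_host is hypothetical. The sketch assumes that fmdump output is supplied on standard input in the format shown above, with the word Resolved on the line for a resolved case.

```shell
# Hypothetical helper: succeed (exit status 0) if the given fault UUID
# has a "Resolved" entry in fmdump output read from standard input.
uuid_resolved_on_host() {
    uuid=$1
    # -F matches the UUID as a fixed string, not as a regular expression.
    grep -F -- "$uuid" | grep "Resolved" >/dev/null
}

# Example, run on the host with the UUID reported by Oracle ILOM:
#   fmdump | uuid_resolved_on_host af875d87-433e-6bf7-cb53-c3d665e8cd09
```

If the function succeeds for the UUID that Oracle ILOM reports and the host shows no current faults, the Oracle ILOM entry is probably an example of this issue.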

Units Used to Define the MIB Power Management Time Limit are Reported in Seconds (CR 6993008)

The MIB should report the sunHwCtrlPowerMgmtBudgetTimelimit in milliseconds, but the value displayed is in seconds.

Workaround: Understand that the value reported for sunHwCtrlPowerMgmtBudgetTimelimit is in seconds.
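
If a monitoring script expects milliseconds, it can convert the reported value itself. The following is a minimal sketch. The function name timelimit_seconds_to_ms is hypothetical, and the sketch assumes that the value has already been read from the MIB as a whole number of seconds.

```shell
# Hypothetical helper: convert the value reported for
# sunHwCtrlPowerMgmtBudgetTimelimit (seconds) to milliseconds.
timelimit_seconds_to_ms() {
    echo $(( $1 * 1000 ))
}

# Example: a reported value of 30 corresponds to 30000 milliseconds.
#   timelimit_seconds_to_ms 30
```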

Spurious Interrupt Message in System Console When Using Oracle VTS (CR 7038266)

This issue is fixed in system firmware 8.2.1.b or higher.

During normal operation and when running the Oracle VTS system exerciser, you might see this message in the system console:

date time hostname px: [ID 781074 kern.warning] WARNING: px0: spurious
interrupt from ino 0x3,0x02,or 0x04

Workaround: Update the system firmware. Otherwise, you can safely ignore this message.

Intermittent Link Training Timeout Displayed During Power Cycles (CR 7043201)

On occasion during a power cycle, the server module might display the following warning message:

[CPU 0:0:0] NOTICE: MCU0: Link init failed: TS0 Timeout

The server module automatically retries the training sequence operation without error.

Workaround: You can safely ignore this message.

The `cfgadm` Command Might Fail on SG-SAS6-REM-Z or SGX-SAS6-REM-Z HBAs (CR 7044759)

The cfgadm command might fail on SG-SAS6-REM-Z or SGX-SAS6-REM-Z HBA devices.

# cfgadm -c unconfigure Slot1
cfgadm: Component system is busy, try again: unconfigure failed
WARNING: (pcieb2): failed to detach driver for the device (mpt_sas9) in the Connection Slot1
WARNING: (pcieb2): failed to detach driver for the device (mpt_sas9) in the Connection Slot1

Workaround: Disable the fault management daemon before running the cfgadm unconfigure command.

# svcadm disable fmd
# ps -ef |grep fmd
...
# cfgadm -c unconfigure PCI-EM0

After completing the cfgadm task, re-enable the fault management daemon:

# svcadm enable fmd
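
The disable, unconfigure, and re-enable sequence can be wrapped so that the fault management daemon is re-enabled even when cfgadm fails. The following is a minimal sketch that uses only the commands shown above. The function name unconfigure_with_fmd_disabled is hypothetical. The sketch does not wait for or verify that fmd has stopped, which the ps check in the workaround does manually.

```shell
# Hypothetical wrapper: disable fmd, unconfigure the given attachment
# point, then re-enable fmd regardless of the cfgadm result.
# Returns the exit status of cfgadm.
unconfigure_with_fmd_disabled() {
    ap_id=$1
    svcadm disable fmd || return 1
    status=0
    cfgadm -c unconfigure "$ap_id" || status=$?
    # Always restore the fault management daemon.
    svcadm enable fmd
    return $status
}

# Example, run as root:
#   unconfigure_with_fmd_disabled PCI-EM0
```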

Message From `cpustat` Refers to Processor Documentation Incorrectly (CR 7046898)

This issue is fixed in the Oracle Solaris 11.1 OS.

A message displayed by the cpustat command says:

See the “SPARC T4 User's Manual” for descriptions of these events.
Documentation for Sun processors can be found at:
http://www.sun.com/processors/manuals

The document and website listed in this message are not available.

Using `trapstat` Might Cause a Panic (CR 7052070)

This is only an issue on server modules running the Oracle Solaris 10 10/09 OS or Oracle Solaris 10 9/10 OS.

If you run the trapstat -t command, the server module might panic with a watchdog reset.

Workaround: Add the SUNWust1 and SUNWust2 packages from the Oracle Solaris OS media or from the Oracle Solaris 10 ISO image.

`reboot disk` Command Occasionally Fails When the `disk` Argument Picks Up Extra Characters (CR 7050975)

This issue is fixed in the Oracle Solaris 11.1 OS.

When running the reboot disk command, extraneous characters are occasionally added to the disk argument before it reaches the OBP. This situation results in a failure to boot.

Recovery Action: Repeat the boot request.

PCIe Correctable Errors Might Be Reported (CR 7051331)

This issue is fixed in the Oracle Solaris 11.1 OS.

In rare cases, PCIe devices in the server module might report I/O errors that are identified and reported by predictive self-healing (PSH). For example:

--------------- ------------------------------------  -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------  -------------- ---------
Aug 10 13:03:23 a7d43aeb-61ca-626a-f47b-c05635f2cf5a  PCIEX-8000-KP  Major
 
Host        : dt214-154
Platform    : ORCL,SPARC-T3-1B  Chassis_id  :
Product_sn  :
 
Fault class : fault.io.pciex.device-interr-corr 67%
              fault.io.pciex.bus-linkerr-corr 33%
Affects     : dev:////pci@400/pci@1/pci@0/pci@c
              dev:////pci@400/pci@1/pci@0/pci@c/pci@0
                  faulted but still in service
FRU         : "/SYS/MB" (hc://:product-id=ORCL,SPARC-T3-1B:product-sn=1052NND107:server-id=dt214-154:chassis-id=0000000-0000000000:serial=1005LCB-1052D9008K:part=541-424304:revision=50/chassis=0/motherboard=0) 67%
              "FEM0" (hc://:product-id=ORCL,SPARC-T3-1B:product-sn=1052NND107:server-id=dt214-154:chassis-id=0000000-0000000000/chassis=0/motherboard=0/hostbridge=0/pciexrc=0/pciexbus=1/pciexdev=0/pciexfn=0/pciexbus=2/pciexdev=12/pciexfn=0/pciexbus=62/pciexdev=0) 33%
                  faulty
 
Description : Too many recovered bus errors have been detected, which indicates
              a problem with the specified bus or with the specified
              transmitting device. This may degrade into an unrecoverable
              fault.
              Refer to http://sun.com/msg/PCIEX-8000-KP for more information.
 
Response    : One or more device instances may be disabled
 
Impact      : Loss of services provided by the device instances associated with
              this fault
 
Action      : If a plug-in card is involved check for badly-seated cards or
              bent pins. Otherwise schedule a repair procedure to replace the
              affected device.  Use fmadm faulty to identify the device or
              contact Sun for support.

These errors might indicate a faulty or incorrectly seated PCI EM, or they might be erroneous.

Workaround: Ensure that the PCI EM is properly seated and functioning. If the errors continue, apply the Oracle Solaris 10 8/11 OS patch 147705-01 (or higher).

Watchdog Timeouts Seen With Heavy Workloads and Maximum Memory Configurations (CR 7083001)

This issue is fixed in the Oracle Solaris 10 8/11 OS with patch 147440-05 (or higher) and in the Oracle Solaris 11.1 OS.

With certain unusually heavy workloads, especially where a highly processor-intensive workload is bound to CPU 0, the host might appear to suddenly reset back to OBP without any sign of a crash or a panic. The Oracle ILOM event log contains a Host watchdog expired entry. The problem is more likely to occur on systems with full memory configurations.

If you see this sort of sudden reset, display the SP event log using this command from the Oracle ILOM CLI:

-> show /SP/logs/event/list

If you see an entry labeled Host watchdog expired, you are experiencing this issue.
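
If you save the event log output to a file, you can search it for this entry. The following is a minimal sketch. The function name has_watchdog_expiry and the file name in the example are hypothetical. The sketch assumes only that the entry contains the phrase Host watchdog expired and makes no assumption about the rest of the log line format.

```shell
# Hypothetical helper: succeed (exit status 0) if saved Oracle ILOM
# event log text on standard input contains a watchdog expiry entry.
has_watchdog_expiry() {
    # -i ignores case, so "host watchdog expired" also matches.
    grep -i "host watchdog expired" >/dev/null
}

# Example, with the log output saved to a hypothetical file:
#   has_watchdog_expiry < ilom-event-log.txt && echo "watchdog expiry found"
```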

Workaround: Contact your authorized service provider to see if a fix is available.

There are two ways you can work around this issue:

You can extend the watchdog period by adding this entry to the Oracle Solaris /etc/system file:
```
set watchdog_timeout = 60000
```
This extends the watchdog timeout period to 1 minute (60000 milliseconds).
In extreme cases, you can disable the watchdog timeout altogether by adding this entry to the /etc/system file:
```
set watchdog_enabled = 0
```

Whenever you modify the /etc/system file you must reboot the system for the changes to take effect.
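
Before rebooting, you can confirm which value the file sets. The following is a minimal sketch. The function name get_system_tunable is hypothetical. The sketch assumes that the entry uses the set NAME = VALUE form shown above and that comment lines in /etc/system begin with an asterisk or a hash mark.

```shell
# Hypothetical helper: print the value assigned by the last
# "set NAME = VALUE" line in an /etc/system-style file.
# Usage: get_system_tunable FILE NAME
get_system_tunable() {
    file=$1
    name=$2
    # Drop comment lines, then print only the value of matching entries.
    sed -n -e '/^[*#]/d' \
        -e "s/^[[:space:]]*set[[:space:]][[:space:]]*${name}[[:space:]]*=[[:space:]]*\([^[:space:]]*\).*/\1/p" \
        "$file" | tail -n 1
}

# Example:
#   get_system_tunable /etc/system watchdog_timeout
```

The function prints nothing if the file contains no matching entry.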

If you do not want to reboot the system immediately after editing /etc/system, you can apply an additional temporary workaround that takes effect immediately. To apply this temporary workaround, as root type:

# psrset -c -F 0

This command creates a temporary processor set containing only CPU 0, preventing application workloads from using this processor and preventing this issue from occurring.

Note - If any threads were bound to CPU 0, they will be unbound.

This temporary processor set is removed on the next operating system reboot, at which point the /etc/system file workaround takes effect.

`ereport.fm.fmd.module` Generated During a Reboot of an SDIO Domain (CR 7085231)

This issue is fixed in the Oracle Solaris 11.1 OS.

The server module might generate an ereport.fm.fmd.module message during a reboot of an SDIO domain. This ereport indicates that an error occurred on one of the fmd modules but the fmdump command does not display a valid message (msg).

For example:

# fmdump -eV -c ereport.fm.fmd.module
TIME                           CLASS
Sep 27 2011 06:27:19.954801492 ereport.fm.fmd.module
nvlist version: 0
        version = 0x0
        class = ereport.fm.fmd.module
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = fmd
                authority = (embedded nvlist)
                nvlist version: 0
                        version = 0x0
                        product-id = ORCL,SPARC-T4-1B
                        server-id = c193-133
                (end authority)
 
                mod-name = etm
                mod-version = 1.2
        (end detector)
 
        ena = 0x425fc9b065404001
        msg = cannot open write-only transport <===
        __ttl = 0x1
        __tod = 0x4e81cf37 0x38e91d54

Workaround: You can safely ignore ereport.fm.fmd.module ereports.

Oracle VTS `dtlbtest` Hangs When CPU Threading Mode is Set to `max-ipc` (CR 7094158)

This issue is fixed in Oracle VTS 7.0 PS13.

The Oracle VTS processor test called dtlbtest hangs when Oracle VM for SPARC max-ipc threading mode is set. This issue is not specific to any processor type, and happens when both the following cases are true:

Only one CPU or strand per core is enabled or online.
The total number of online CPUs or strands is less than or equal to 128.

Workaround: Do one of the following:

Update to Oracle VTS 7.0 PS13.
Do not run the Oracle VTS dtlbtest with the Oracle VM for SPARC threading mode set to max-ipc mode.

Cold Reset Adds One Day to System Time (CR 7127740)

This issue is fixed in system firmware 8.1.4.e and higher.

After a cold reset, the server might add one day to the Oracle Solaris OS date and time. This possible date change will only occur on the first cold reset after January 1, 2012. Once you set the correct date using the Oracle Solaris OS date(1) command, the corrected date and time will persist across future resets.

A cold reset is when you halt the OS and restart the service processor (SP). For example, you can use one of the following Oracle Solaris OS commands to halt the OS:

# shutdown -g0 -i0 -y

# uadmin 1 6

# init 5

# poweroff

Then, at the ILOM prompt, use the following commands to reset the host:

-> stop /SYS
. . .
-> start /SYS

Refer to the service manual, the administration guide, and the Oracle Solaris OS documentation for more information.

Workaround: Install the latest system firmware. This issue is fixed in the system firmware version 8.1.4.e and higher.

After the first cold reset of the system, verify that the system date and time are correct. If the date has been impacted by this issue, use the Oracle Solaris OS date(1) command to set the correct date and time.

For example, to set the date and time to 9:00 a.m. on February 26, 2012, type:

# date 022609002012

Refer to the date(1) man page and the Oracle Solaris OS documentation for more information.
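
The argument in the example reads as month 02, day 26, hour 09, minute 00, and year 2012. The following is a minimal sketch that builds an argument in that layout from separate values. The function name solaris_date_arg is hypothetical, and the sketch performs no range checking on its inputs.

```shell
# Hypothetical helper: build the mmddHHMMccyy argument used in the
# example above from month, day, hour, minute, and four-digit year.
solaris_date_arg() {
    # awk reads values such as "09" as decimal, avoiding octal parsing.
    awk -v mo="$1" -v d="$2" -v h="$3" -v mi="$4" -v y="$5" \
        'BEGIN { printf "%02d%02d%02d%02d%04d\n", mo, d, h, mi, y }'
}

# Example, run as root to set 9:00 a.m. on February 26, 2012:
#   date "$(solaris_date_arg 2 26 9 0 2012)"
```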

	SPARC T4-1B Server Module Product Notes
