This section describes issues related to SPARC T4-4 server components.
To maximize processor module memory bandwidth, Oracle recommends that only fully-populated memory configurations—as opposed to half-populated configurations—be considered for performance-critical applications.
For specific memory installation and upgrade instructions, see the SPARC T4-4 Server Service Manual.
Only certain PCIe cards can be used as direct I/O endpoint devices on an I/O domain. You can still use other cards in your Oracle VM Server for SPARC environment, but these other cards cannot be used with the Direct I/O feature. Instead, other PCIe cards can be used for service domains and for I/O domains that have entire root complexes assigned to them.
For the most up-to-date list of supported PCIe cards, refer to https://support.oracle.com/CSP/main/article?cmd=show&type=NOT&doctype=REFERENCE&id=1325454.1
Note - Not all cards listed on the Direct I/O web page are supported in the SPARC T4-4 server. Check the server hardware compatibility list before installing any PCIe cards.
To download sas2ircu firmware and documentation for the SPARC T4-4 server from the current LSI web site, you must use the links labeled SPARC T3-1 and T3-2. The software and documentation are the same for both sets of servers.
This is the web site for downloading sas2ircu software from LSI:
This is the web site for downloading sas2ircu documentation from LSI:
Sun Type 6 keyboards cannot be used with SPARC T4 series servers.
Although hardware RAID 0 and 1 are supported on the SPARC T4-4 server, hardware RAID 1E is not supported. Other RAID formats are available through software RAID.
Note - This issue was originally listed as CR 6983185.
When attempting to boot from a USB thumb drive inserted in either front USB port (USB2 or USB3), the server might panic.
Workaround: Use the server's rear USB ports (USB0 or USB1) whenever booting from an external USB device.
Note - This issue was originally listed as CR 6987359.
If you hot-plug a Dual 10GbE SFP+ PCIe2.0 EM Network Interface Card (NIC) (part number 1110A-Z) into a PCI Express Module slot that had previously held a 4-Port (Cu) PCIe (x4) ExpressModule (part number (X)7284A-Z-N), the expected performance benefit of the Dual 10GbE SFP+ PCIe2.0 NIC might not occur.
This problem does not occur if the slot was previously unoccupied, or if it had been occupied by any other option card. In addition, this problem does not occur if the card is present when the system is powered on.
Workaround: Hot-plug the Dual 10GbE SFP+ PCIe2.0 EM card a second time, using one of the following methods.
Use the cfgadm(1m) command to disconnect, then reconnect, the card:
# cfgadm -c disconnect slot-name
# cfgadm -c configure slot-name
Use the hotplug(1m) command to disable and power off the device, and then power on and enable the device:
# hotplug disable device-path slot-name
# hotplug poweroff device-path slot-name
# hotplug poweron device-path slot-name
# hotplug enable device-path slot-name
Use the Attention (ATTN) button on the card to deconfigure and then reconfigure the card.
Note - You don't need to physically remove and re-insert the card as part of the second hot plug operation.
Note - This issue was originally listed as CR 6995634.
In some rare instances, unrecoverable USB hardware errors occur, such as the following:
usba: WARNING: /pci@400/pci@1/pci@0/pci@8/pci@0/usb@0,2 (ehci0): Unrecoverable USB Hardware Error
usba: WARNING: /pci@400/pci@1/pci@0/pci@8/pci@0/usb@0,1/hub@1/hub@3 (hubd5): Connecting device on port 2 failed
Workaround: Reboot the system. Contact your service representative if these error messages persist.
Note - This issue was originally listed as CR 7031216.
Note - This issue was fixed in Oracle Solaris 11.1.
When a CPU module is replaced to repair a faulty CPU, PSH might not clear retired cache lines on the replacement FRU. In such cases, the cache line remains disabled.
Workaround: Manually clear the disabled cache line by running the following command:
# fmadm repaired fmri | label
# fmdump -av
TIME                 UUID                                 SUNW-MSG-ID
Nov 03 10:34:56.6192 e1ee44ed-72f7-c32b-855b-e9f4b03144af SUN4V-8002-V3
  100%  fault.cpu.generic-sparc.cacheline
        Problem in: hc://:product-id=ORCL,SPARC-T4-4:product-sn=xxxxyyyxxx:server-id=xxxxx:chassis-id=xxxxyyyxxx/chassis=0/cpuboard=0/chip=0/l3cache=0/cacheindex=256/cacheway=7
           Affects: hc://:product-id=ORCL,SPARC-T4-4:product-sn=xxxxyyyxxx:server-id=xxxxx:chassis-id=xxxxyyyxxx/chassis=0/cpuboard=0/chip=0/l3cache=0/cacheindex=256/cacheway=7
               FRU: hc://:product-id=ORCL,SPARC-T4-4:product-sn=xxxxyyyxxx:server-id=xxxxx:chassis-id=xxxxyyyxxx:serial=465769T+1115H50061:part=7013822:revision=01/chassis=0/cpuboard=0
          Location: /SYS/PM0
# fmadm repaired hc://:product-id=ORCL,SPARC-T4-4:product-sn=xxxxyyyxxx:server-id=xxxxx:chassis-id=xxxxyyyxxx/chassis=0/cpuboard=0/chip=0/l3cache=0/cacheindex=256/cacheway=7
fmadm: recorded repair to of hc://:product-id=ORCL,SPARC-T4-4:product-sn=xxxxyyyxxx:server-id=xxxxx:chassis-id=xxxxyyyxxx/chassis=0/cpuboard=0/chip=0/l3cache=0/cacheindex=256/cacheway=7
# fmdump -a
TIME                 UUID                                 SUNW-MSG-ID
Nov 03 10:34:56.6192 e1ee44ed-72f7-c32b-855b-e9f4b03144af SUN4V-8002-V3
Nov 03 10:37:40.3545 e1ee44ed-72f7-c32b-855b-e9f4b03144af FMD-8000-4M Repaired
Nov 03 10:37:40.3610 e1ee44ed-72f7-c32b-855b-e9f4b03144af FMD-8000-6U Resolved
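After running fmadm repaired, the fmdump -a listing can be checked to confirm that the fault UUID shows both a Repaired and a Resolved entry. The following Python sketch illustrates one way to do that check on captured output. It is not part of the Oracle Solaris tooling: the function name fault_states is hypothetical, and the column layout (month, day, time, UUID, message ID, optional state) is assumed from the sample output in this section only.

```python
def fault_states(fmdump_output: str, uuid: str) -> list[str]:
    """Return the state words (for example Repaired, Resolved) logged for uuid.

    Assumes each data row has the fields: month, day, time, UUID,
    message ID, and an optional trailing state, as in the sample above.
    """
    states = []
    for line in fmdump_output.splitlines():
        fields = line.split()
        # Rows without a sixth field carry no state word and are skipped,
        # as are the header row and any detail lines.
        if len(fields) >= 6 and fields[3] == uuid:
            states.append(fields[5])
    return states


SAMPLE = """\
TIME                 UUID                                 SUNW-MSG-ID
Nov 03 10:34:56.6192 e1ee44ed-72f7-c32b-855b-e9f4b03144af SUN4V-8002-V3
Nov 03 10:37:40.3545 e1ee44ed-72f7-c32b-855b-e9f4b03144af FMD-8000-4M Repaired
Nov 03 10:37:40.3610 e1ee44ed-72f7-c32b-855b-e9f4b03144af FMD-8000-6U Resolved
"""

states = fault_states(SAMPLE, "e1ee44ed-72f7-c32b-855b-e9f4b03144af")
print("repair recorded:", {"Repaired", "Resolved"} <= set(states))
```

If the set of states does not include both words, the repair was probably not recorded for that UUID and the fmadm repaired command should be checked.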
Note - This issue was originally listed as CR 7051331.
Note - This issue was fixed in Oracle Solaris 11.
In rare cases, PCI Express Gen2 or low-profile PCIe devices in the server might report I/O errors that are identified and reported by PSH. For example:
--------------- ------------------------------------ -------------- ---------
TIME            EVENT-ID                             MSG-ID         SEVERITY
--------------- ------------------------------------ -------------- ---------
Aug 10 13:03:23 a7d43aeb-61ca-626a-f47b-c05635f2cf5a PCIEX-8000-KP  Major

Host        : dt214-154
Platform    : ORCL,SPARC-T4-4
Chassis_id  :
Product_sn  :

Fault class : fault.io.pciex.device-interr-corr 67%
              fault.io.pciex.bus-linkerr-corr 33%

Affects     : dev:////pci@400/pci@1/pci@0/pci@c
              dev:////pci@400/pci@1/pci@0/pci@c/pci@0
                  faulted but still in service

FRU         : "/SYS/MB" (hc://:product-id=ORCL,SPARC-T4-4:product-sn=xxxx:server-id=xxxx:chassis-id=0000000-0000000000:serial=xxxx:part=541-424304:revision=50/chassis=0/motherboard=0) 67%
              "FEM0" (hc://:product-id=ORCL,SPARCT4-4:product-sn=xxxxx:server-id=xxxxx:chassis-id=0000000-0000000000/chassis=0/motherboard=0/hostbridge=0/pciexrc=0/pciexbus=1/pciexdev=0/pciexfn=0/pciexbus=2/pciexdev=12/pciexfn=0/pciexbus=62/pciexdev=0) 33%
                  faulty

Description : Too many recovered bus errors have been detected, which indicates
              a problem with the specified bus or with the specified
              transmitting device. This may degrade into an unrecoverable
              fault.
              ...

Response    : One or more device instances may be disabled

Impact      : Loss of services provided by the device instances associated
              with this fault

Action      : If a plug-in card is involved check for badly-seated cards or
              bent pins. Otherwise schedule a repair procedure to replace the
              affected device. Use fmadm faulty to identify the device or
              contact Sun for support.
These errors might indicate a faulty or incorrectly seated device, or they might be spurious.
Workaround: Ensure that the device is properly seated and functioning. If the errors continue, apply patch 147705-01 or higher.
Note - This issue was originally listed as CR 7065563.
Note - This issue was fixed in System Firmware 8.1.4.
An L2 cache uncorrectable error might lead to an entire processor being faulted when only specific core strands should be faulted.
Workaround: Schedule a service call with your authorized Oracle service provider to replace the processor module containing the faulty core. Until it is replaced, you can return the strands related to the functioning cores to service using the following procedure. This restores as much system functionality as the active cores provide.
Identify the faulty core:
# fmdump -eV -c ereport.cpu.generic-sparc.l2tagctl-uc
The detector portion of the fmdump output is displayed as follows.
Note - Key elements in the example are highlighted for emphasis. They would not be highlighted in the actual output.
detector = (embedded nvlist)
nvlist version: 0
        version = 0x0
        scheme = hc
        hc-root =
        hc-list-sz = 4
        hc-list = (array of embedded nvlists)
        (start hc-list)
        nvlist version: 0
                hc-name = chassis
                hc-id = 0
        (end hc-list)
        (start hc-list)
        nvlist version: 0
                hc-name = cpuboard
                hc-id = 1
        (end hc-list)
        (start hc-list)
        nvlist version: 0
                hc-name = chip
                hc-id = 2
        (end hc-list)
        (start hc-list)
        nvlist version: 0
                hc-name = core
                hc-id = 19
        (end hc-list)
(end detector)
In this example, the faulted chip is indicated by the following FMRI values:
Chassis = 0
CPU Board = 1
Chip = 2
Core = 19
The following table includes additional examples with corresponding Nomenclature Architecture Council (NAC) names.
For example, given an FMRI of chassis=0/cpuboard=x/chip=y/core=z, the corresponding NAC name /SYS/PMa/CMPb/COREc can be derived as follows:
a = x
b = (y mod 2)
c = (z mod 8)
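The derivation above can be sketched in a few lines of Python. This is an illustration of the stated rule only, not an Oracle-supplied utility; the function name fmri_to_nac and the regular expression are hypothetical, and the input is assumed to contain the chassis/cpuboard/chip/core path in the form shown in this section.

```python
import re

# Matches the FMRI path fragment used in this section.
_FMRI_RE = re.compile(r"chassis=(\d+)/cpuboard=(\d+)/chip=(\d+)/core=(\d+)")


def fmri_to_nac(fmri: str) -> str:
    """Derive /SYS/PMa/CMPb/COREc using a = x, b = y mod 2, c = z mod 8."""
    match = _FMRI_RE.search(fmri)
    if match is None:
        raise ValueError(f"no chassis/cpuboard/chip/core path in {fmri!r}")
    _chassis, cpuboard, chip, core = (int(group) for group in match.groups())
    return f"/SYS/PM{cpuboard}/CMP{chip % 2}/CORE{core % 8}"


# Values from the detector example: cpuboard=1, chip=2, core=19.
print(fmri_to_nac("chassis=0/cpuboard=1/chip=2/core=19"))  # /SYS/PM1/CMP0/CORE3
```

With the detector values from the example (CPU board 1, chip 2, core 19), the rule gives b = 2 mod 2 = 0 and c = 19 mod 8 = 3.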
Halt the Oracle Solaris OS, and power off the server.
Disable the faulty core. From the Oracle ILOM CLI:
-> cd /SYS/PM1/CMP0/CORE0
/SYS/PM1/CMP0/CORE0
-> show
/SYS/PM1/CMP0/CORE0
    Targets:
        P0
        P1
        P2
        P3
        P4
        P5
        P6
        P7
        L2CACHE
        L1CACHE
    Properties:
        type = CPU Core
        component_state = Enabled
    Commands:
        cd
        set
        show
-> set component_state=disabled
Power on the server, and restart the Oracle Solaris OS.
Refer to the SPARC T4 Series Servers Administration Guide for information on powering on the server from the Oracle ILOM prompt.
Override the FMA diagnosis manually.
The faulty component's UUID value is provided in the first line of the fmdump output.
# fmadm repair uuid-of-fault
Note - This issue was originally listed as CR 7071237.
When a processor cache line encounters an uncorrectable error (UE), the fault manager is supposed to attempt to retire the cache line involved in the error. Because of this defect, the fault manager might not retire the faulty cache line and instead report the entire chip as faulted.
Workaround: Schedule a replacement of the FRU containing the faulty component. For additional information about UEs in processor cache lines, search for message ID SUN4V-8002-WY on the Oracle support site, http://support.oracle.com.
Note - This issue was originally listed as CR 7075336.
In rare cases, if the server or server module experiences a serious problem that results in a panic, a number of CPUs might not start when the server is rebooted, even though the CPUs are not faulty.
Example of the type of error displayed:
rebooting...
Resetting...
ERROR: 63 CPUs in MD did not start
Workaround: Power cycle the server.
-> stop /SYS
Are you sure you want to stop /SYS (y/n)? y
Stopping /SYS
-> start /SYS
Are you sure you want to start /SYS (y/n) ? y
Starting /SYS
Note - This issue was originally listed as CR 7066165.
In rare instances, the system FRU power-up probing routine might fail to list all installed system power supplies. The power supplies themselves are not faulted, but commands listing system FRUs do not show the presence of the non-probed power supply.
The fault sets the system fault LED, but no power supply fault LED is illuminated. To find the fault, use the fmadm utility from the ILOM fault management shell.
Start the fmadm utility from the ILOM CLI:
-> start /SP/faultmgmt/shell
Are you sure you want to start /SP/faultmgmt/shell (y/n)? y
faultmgmtsp>
To view the fault, type the following:
faultmgmtsp> fmadm faulty
------------------- ------------------------------------ -------------- ------
Time                UUID                                 msgid          Severity
------------------- ------------------------------------ -------------- ------
2011-09-21/13:59:35 f13524d6-9970-4002-c2e6-de5d750f4088 ILOM-8000-2V   Major

Fault class : fault.fruid.corrupt

FRU         : /SYS/PS0
              (Part Number: 300-2159)
              (Serial Number: 476856F+1115CC0001)

Description : A Field Replaceable Unit (FRU) has a corrupt FRUID SEEPROM

Response    : The service-required LED may be illuminated on the affected
              FRU and chassis.

Impact      : The system may not be able to use one or more components on
              the affected FRU. This may prevent the system from powering
              on.

Action      : The administrator should review the ILOM event log for
              additional information pertaining to this diagnosis. Please
              refer to the Details section of the Knowledge Article for
              additional information.
Workaround: From the fault management shell prompt, clear the fault, exit the fault management shell, and reset the SP. For example:
-> start /SP/faultmgmt/shell
Are you sure you want to start /SP/faultmgmt/shell (y/n)? y
faultmgmtsp> fmadm repair /SYS/PS0
faultmgmtsp> exit
-> reset /SP
Are you sure you want to reset /SP (y/n)? y
After the SP has reset, verify that all installed power supplies appear in the list of system devices:
-> ls /SYS
If the problem occurs again after applying this workaround, contact your authorized Oracle Service Provider for further assistance.
Note - This issue was originally listed as CR 7066726.
In some instances under heavy load, power supply threshold messages similar to the following appear in the /var/adm/messages file:
SC Alert: [ID 579591 daemon.notice] Sensor | minor: Power Unit : /SYS/VPS : Upper Non-critical going high : reading 2140 >= threshold 2140 Watts
SC Alert: [ID 807701 daemon.notice] Sensor | minor: Power Unit : /SYS/VPS : Upper Non-critical going low : reading 2100 <= threshold 2140 Watts
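When reviewing /var/adm/messages for these alerts, it can help to extract the sensor name, the reading, and the threshold from each message. The following Python sketch shows one possible approach. It is illustrative only: the function name parse_power_alerts is hypothetical, and the regular expression is an assumption based solely on the two sample messages above, so other message variants might not match.

```python
import re

# Pattern inferred from the two sample messages; other layouts may differ.
_ALERT_RE = re.compile(
    r"Power Unit : (?P<sensor>\S+) : (?P<event>.+?) : "
    r"reading (?P<reading>\d+) (?:>=|<=) threshold (?P<threshold>\d+) Watts"
)


def parse_power_alerts(text: str) -> list[dict]:
    """Return one dict per power threshold message found in text."""
    return [
        {
            "sensor": m.group("sensor"),
            "event": m.group("event"),
            "reading_watts": int(m.group("reading")),
            "threshold_watts": int(m.group("threshold")),
        }
        for m in _ALERT_RE.finditer(text)
    ]


SAMPLE = (
    "SC Alert: [ID 579591 daemon.notice] Sensor | minor: Power Unit : /SYS/VPS : "
    "Upper Non-critical going high : reading 2140 >= threshold 2140 Watts\n"
    "SC Alert: [ID 807701 daemon.notice] Sensor | minor: Power Unit : /SYS/VPS : "
    "Upper Non-critical going low : reading 2100 <= threshold 2140 Watts\n"
)

for alert in parse_power_alerts(SAMPLE):
    print(alert["sensor"], alert["event"], alert["reading_watts"], alert["threshold_watts"])
```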
Workaround: From the fault management shell prompt, clear the fault, exit the fault management shell, and reset the SP. For example:
-> start /SP/faultmgmt/shell
Are you sure you want to start /SP/faultmgmt/shell (y/n)? y
faultmgmtsp> fmadm repair /SYS/PS0
faultmgmtsp> exit
-> reset /SP
Are you sure you want to reset /SP (y/n)? y
Note - This issue was originally listed as CR 7180259.
In some cases, the Oracle ILOM firmware identifies and reports spurious power supply errors. For example:
ereport.chassis.voltage-lnc-glo@/sys/rio /SYS/RIO/VDD_+1V0
ereport.chassis.voltage-lnc-glo@/sys/rio /SYS/RIO/VDD_+1V8
ereport.chassis.voltage-lnc-glo@/sys/rio /SYS/RIO/VDD_+3V3
ereport.chassis.voltage-lnc-glo@/sys/rio /SYS/RIO/VDD_+5V0
fault.chassis.power.missing
Workaround: Update the server to System Firmware 8.2.0.f. If these errors persist, they indicate a power supply fault. Refer to the SPARC T4-4 Server Service Manual for service instructions.
Dual-processor servers (those equipped with just one processor module) are currently restricted to twelve or fewer PCIe Express Module adaptors. In addition, Slots 6, 7, 14, and 15 should not be populated with PCIe Express Module adaptors.
Quad-processor servers (servers equipped with two processor modules) do not have these restrictions.
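The slot restrictions above can be expressed as a simple pre-installation check. The following Python sketch is one way to validate a planned layout. It is not an Oracle tool: the function name check_em_slots and the representation of slots as integers are assumptions made for illustration, and the limits are taken directly from the text above.

```python
# Limits stated above for servers with one processor module.
RESTRICTED_SLOTS = frozenset({6, 7, 14, 15})
MAX_ADAPTERS_ONE_MODULE = 12


def check_em_slots(populated_slots, processor_modules: int) -> list[str]:
    """Return a list of restriction violations; an empty list means none found."""
    slots = set(populated_slots)
    problems = []
    if processor_modules >= 2:
        # Servers with two processor modules do not have these restrictions.
        return problems
    if len(slots) > MAX_ADAPTERS_ONE_MODULE:
        problems.append(
            f"{len(slots)} adapters installed; limit is {MAX_ADAPTERS_ONE_MODULE}"
        )
    blocked = sorted(slots & RESTRICTED_SLOTS)
    if blocked:
        problems.append(
            "slots that should not be populated: " + ", ".join(map(str, blocked))
        )
    return problems


# Hypothetical layout: one processor module, adapters in slots 0, 6, and 14.
print(check_em_slots([0, 6, 14], processor_modules=1))
```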