Chapter 3 - Notes for Servers With Hardware Dash Levels 01 Through 06

This chapter contains information about Sun Fire T1000 Servers with motherboard hardware dash levels 01 through 06.

To determine if these notes apply to your server, see Identifying the Notes for Your Server.

The following sections are in this chapter:

Support for the Sun Fire T1000 Server

Supported Versions of Firmware and Software

Patch Information

Known Issues and Workarounds

Sun Fire T1000 Server Documentation

Hardware RAID Support

Upgrading to a Two-Disk Configuration

Note - For hardware RAID support, you must install Patch 121130-01 or greater for the Solaris 10 1/06 OS. Hardware RAID support is enabled by default with the Solaris 10 6/06 (or later) Operating System (OS). See Hardware RAID Support.

Support for the Sun Fire T1000 Server

Technical Support

If you have any technical questions or issues that are not addressed in the Sun Fire T1000 server documentation, contact your local Sun™ Services representative. For customers in the U.S. or Canada, call 1-800-USA-4SUN (1-800-872-4786). For customers in the rest of the world, find the World Wide Solution Center nearest you by visiting the web site:

http://www.sun.com/service/contacting/solution.html

Software Resources

The Solaris™ Operating System and Sun Java Enterprise System software are preinstalled on your Sun Fire T1000 server.

If you need to reload the software, go to the following web site, which provides instructions for downloading the software:

http://www.sun.com/software/preinstall/

Note - If you download a fresh copy of software, that software might not include patches that are mandatory for your Sun Fire T1000 server. After installing the software, see Patch Information for a procedure to check for the presence of patches on the system.

Supported Versions of Firmware and Software

These are the minimum supported versions of firmware and software for this release of the Sun Fire T1000 server:

Solaris 10 1/06 OS

Sun Java Enterprise System software (Java ES 2005Q4)

Sun system firmware 6.1.2, which includes Advanced Lights Out Manager (ALOM) CMT 1.1.2 software and OpenBoot 4.20.0 firmware.

Patch Information

Mandatory Patches

You must install the following patches if they are not present on your system. To determine if the patches are present, see To Download Patches.

122027-01 or greater

119578-16 or greater

118822-30 or greater

119578-22 or greater

The following patch is mandatory for Sun™ Cluster software:

119715-10 or greater

The following patches are required for hardware RAID support:

121130-01 or greater (for Solaris 10 1/06, not needed for Solaris 10 6/06)

119850-14 or greater (not required for servers with dash level 07 or greater)

123456-01 or greater (not required for servers with dash level 07 or greater)

Note - These patches are not included in some versions of preinstalled software on the Sun Fire T1000 server. If the patches are missing from your server, download them from SunSolve as described in To Download Patches.

To Download Patches

1. Determine whether the patches have been installed on your system.

For example, using the showrev command, type the following for each patch number:

# showrev -p | grep "Patch: 119578"

If you see patch information listed for the queried patch, and the dash extension (the last two digits) matches or exceeds the required version, your system has the proper patches already installed and no further action is required.

For example, if patch 119578-16 or later is installed, your system has the required version of this patch.

If you do not see patch information listed for the queried patch, or if the dash extension precedes the required version, go to Step 2.

For example, if no version of the 119578 patch is installed, or if the installed version has an extension of -15 or earlier, you must download and install the new patch.

2. Go to http://www.sun.com/sunsolve to download the patches.

Using the SunSolve PatchFinder tool, specify the base Patch ID number (the first six digits) to access the current release of a patch.

3. Follow the installation instructions provided in a specific patch's README file.
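The check in Step 1 can also be scripted. The following is a minimal sketch, not a Sun-supplied tool. The function name patch_at_least is hypothetical. The sketch assumes that each line of showrev -p output starts with Patch: followed by the patch ID in base-revision form, and that a POSIX awk is available (on Solaris 10 this might mean nawk rather than /usr/bin/awk).

```shell
#!/bin/sh
# Hypothetical helper (not part of Solaris): reads `showrev -p` style output
# on stdin and succeeds if patch BASE is present at revision REQ or later.
# Assumption: each patch line looks like "Patch: 119578-16 Obsoletes: ...".
patch_at_least() {
    awk -v base="$1" -v req="$2" '
        $1 == "Patch:" {
            split($2, id, "-")            # id[1] = base ID, id[2] = dash extension
            if (id[1] == base) {
                found = 1
                if (id[2] + 0 > best) best = id[2] + 0
            }
        }
        END {
            if (found && best >= req + 0) exit 0
            exit 1
        }
    '
}

# Example use on the server (shown as a comment because it needs showrev):
#   showrev -p | patch_at_least 119578 16 || echo "119578-16 or greater is missing"
```

A nonzero exit status corresponds to the "go to Step 2" case: the patch is either absent or installed at an earlier dash extension.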

Patches for Option Cards

If you add option cards to your server, refer to the documentation and README files for each card to determine if additional patches are needed.

Known Issues and Workarounds

Hardware Installation and Service Issues

Chassis Cover Might Be Difficult to Remove (CR 6376423)

The chassis cover might be very difficult to remove. If you press too hard on the cover lock button, the front edge of the cover might warp and bind. Also, elastic gasket material on the sides of the chassis might prevent the cover from sliding freely.

To remove the cover, lightly hold down the cover lock button and push the cover slightly toward the front of the chassis (this assists the unlocking action), then slide the cover approximately one half inch (12 mm) toward the rear of the chassis. You can now lift the cover off the chassis.

General Functionality Issues

These are the functionality issues for this release.

Sun Explorer Utility

Supported Version

The Sun Fire T1000 server is supported by the Sun Explorer 5.2 data collection utility, but is not supported by earlier releases of the utility. Installing Sun Cluster software from the preinstalled Java ES package automatically installs an earlier version of the utility on your system. After installing any of the Java ES software, determine whether an earlier version of the Sun Explorer product has been installed on your system by typing the following:

# pkginfo -l SUNWexplo

If an earlier version exists, uninstall it and install version 5.2 or greater. To download Sun Explorer 5.2, go to:

http://www.sun.com/sunsolve
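The version check above can be automated along the following lines. This is a sketch under assumptions: the function name explorer_version_ok is hypothetical, and the pkginfo -l output is assumed to contain a VERSION: line whose value starts with a dotted release number, optionally followed by a comma and a revision string. Verify the actual output on your system before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: reads `pkginfo -l SUNWexplo` style output on stdin and
# succeeds if the reported version is 5.2 or greater.
# Assumption: the output has a line such as "VERSION:  5.2,REV=...".
explorer_version_ok() {
    awk '
        $1 == "VERSION:" {
            ver = $2
            sub(/,.*/, "", ver)           # drop any ",REV=..." suffix
            n = split(ver, part, ".")
            major = part[1] + 0
            minor = (n >= 2) ? part[2] + 0 : 0
            found = 1
        }
        END {
            if (found && (major > 5 || (major == 5 && minor >= 2))) exit 0
            exit 1
        }
    '
}

# Example use on the server (shown as a comment because it needs pkginfo):
#   pkginfo -l SUNWexplo | explorer_version_ok || echo "Install Sun Explorer 5.2 or greater"
```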

Sun Explorer Requires the `Tx000` Option

When running Explorer 5.2 or greater, you must specify the Tx000 option to collect data from the ALOM-CMT commands on Sun Fire T1000 and Sun Fire T2000 platforms, because the Tx000 script is not run by default. To run the script, type:

# /opt/SUNWexplo/bin/explorer -w default,Tx000

For more details, refer to troubleshooting document 83612, Using Sun Explorer on the Tx000 Series Systems. This document is available on the SunSolve web site:

http://www.sun.com/sunsolve

Solaris Predictive Self-Healing Fault Messages

Sun Fire T1000 servers do not have a full implementation of the Solaris predictive self-healing (PSH) feature. The current implementation provides the server with the ability to detect faults, but not the ability to completely diagnose and handle all faults.

If the server detects a PSH-related error, the following message might be generated:

SUNW-MSG-ID: FMD-8000-OW, TYPE: Defect, VER: 1, SEVERITY: Minor

EVENT-TIME: ...

PLATFORM: ...

SOURCE: fmd-self-diagnosis, REV: ...

DESC: The Solaris Fault Manager received an event from a component to which no automated diagnosis software is currently subscribed.

AUTO-RESPONSE: ...

IMPACT: Automated diagnosis and response for these events will not occur.

REC-ACTION: ...

If you see this message on the console or in the /var/adm/messages file, it might be an indication that patch 119578-16 or greater has not been installed. For information on obtaining patches and a list of mandatory patches for the Sun Fire T1000 server, see Patch Information.

If the patch has been installed but you continue to see error messages, contact Sun technical support.

Network Port Performance (CR 6346149)

Sun Fire T1000 servers might experience a drop in network performance, most notably when the system is configured to transmit or receive data over all four network ports at high rates. This might result in lower than expected throughput. In some instances, network traffic over all four ports might induce a system-wide hang that requires a system reset to recover. If your Sun Fire T1000 server experiences a system hang, contact Sun with details about the fault, the system activity, and the system configuration. Sun is actively working to resolve this problem.

System Will Not Power On With an Invalid Memory Configuration (CR 6300114)

The system will not power on if memory rank 0 is not populated. Rank 0 sockets must always be filled.

Erroneous Messages Displayed After a Repair (CR 6369961)

The Solaris PSH facility automatically detects the replacement of DIMMs. However, erroneous fault messages might be displayed when the system is booted, and these messages can mislead you to think that a problem persists when it is actually fixed.

For a procedure to manually clear the fault from all logs so that it is not reported at boot time, see To Manually Clear Fault Logs.

To correct future occurrences of the problem, install patch 119578-22.

Disk Drive Write Cache Enabled By Default

Read caching and write caching are both enabled by default for the Sun Fire T1000 server disk drive. The use of the caches increases the read and write performance of the disk drive. However, data in the write cache might be lost if system AC power is interrupted. (A loss of AC power does not present a problem for the read cache.)

If you prefer to disable write caching, use the Solaris format -e command as described in To Disable the Write Cache.

Caution - These settings are not saved permanently. You must reset the write cache setting every time the system boots.

To Disable the Write Cache

1. In the Solaris environment, enter the format expert mode by typing:

# format -e

Searching for disks...done

AVAILABLE DISK SELECTIONS:

       0. c0t0d0 <ATA-HDS ...

2. Specify disk number 0.

Specify disk (enter its number): 0

selecting c0t0d0

...

The format menu is displayed.

3. Select the cache option by typing:

format> cache

4. Select the write_cache option by typing:

cache> write_cache

WRITE_CACHE MENU:

...

5. Display the current setting for the write cache.

write_cache> display

Write Cache is enabled

6. Disable the write cache.

write_cache> disable

This setting is valid until next reset only. It is not saved permanently.

7. Verify the new setting.

write_cache> display

Write Cache is disabled

8. Exit from the write_cache mode.

write_cache> quit

CACHE MENU:

...

9. Exit from the cache mode.

cache> quit

FORMAT MENU:

...

10. Exit from the format command.

format> quit
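Because the setting is lost at every reset, you might want to replay the menu commands from a script instead of typing them. The following is a sketch only. The function name write_cache_disable_cmds and the file name /tmp/wc.cmd are hypothetical, and the commented invocation assumes that format accepts the -f command-file and -d disk-name options together with -e, as described in the format(1M) man page. The display steps are omitted because they only report the setting.

```shell
#!/bin/sh
# Hypothetical helper: prints the format(1M) menu commands from the procedure
# above, one per line, in the order they are typed interactively.
write_cache_disable_cmds() {
    printf '%s\n' cache write_cache disable quit quit quit
}

# Sketch of a boot-time use (assumption: -f and -d work together with -e;
# c0t0d0 is the disk name from the example above):
#   write_cache_disable_cmds > /tmp/wc.cmd
#   format -e -d c0t0d0 -f /tmp/wc.cmd
```

Test the command file interactively on a non-production system first, because format can also modify disk labels and partitions.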

Bug List

TABLE 3-1 lists known bugs for this release of the Sun Fire T1000 server. The CR (change request) IDs are listed in numerical order.

TABLE 3-1 Known Bugs
	CR ID	Description	Workaround
1.	6297813	Upon boot up, the following messages might be displayed: `svc.startd[7]: [ID 122153 daemon.warning] svc:/system/power:default: Method or service exit timed out. Killing contract 51.` `svc.startd[7]: [ID 636263 daemon.warning] svc:/system/power:default: Method "/lib/svc/method/svc-power start" failed due to signal KILL.`	If Solaris power management is required, restart power management manually or reboot the server. If Solaris power management is not required, no action is needed.
2.	6300114	The system fails to power on if memory rank 0 is not populated.	Rank 0 sockets must always be filled.
3.	6303328	The `iostat -E` command reports incorrect vendor information for the SATA drive.	There is no workaround at this time.
4.	6310384	The SunVTS USB keyboard test (`usbtest`) reports that a keyboard is present when there is no keyboard attached to the server.	Do not run `usbtest`.
5.	6312364	When accessing the host through the ALOM-CMT `console` command, you might experience slow console response.	For optimum responsiveness, access the host through the host network interfaces as soon as the host has completed booting the OS.
6.	6314590	Executing the ALOM CMT `break` command and the OpenBoot PROM `go` command might cause the system to hang or panic.	If the console hangs or panics, use the ALOM CMT `reset` command to reset the system.
7.	6317382	Typing unrecognized commands or words at the OBP prompt causes the system to return an erroneous error message and might hang the server. This behavior occurs only when you drop into the OBP prompt from Solaris. The erroneous error message is: `ERROR: Last Trap`	Disregard this message. If the console hangs or panics, use the ALOM CMT `reset` command to reset the system.
8.	6318208	POST or `OBP` `reset-all` generates the alert, `Host system has shut down`.	This is normal behavior following a `reset-all` command. The message does not indicate a problem in this situation.
9.	6325271	The ALOM CMT console history boot and run logs are the same.	There is no workaround at this time.
10.	6331819	SunVTS memory or CPU tests could fail due to lack of system resources. When too many instances of SunVTS functional tests are run in parallel on UltraSPARC® T1 CMT CPU-based (sun4v) entry-level servers with low memory configurations, SunVTS tests might fail due to lack of system resources. For example, you could see an error message similar to the following: `System call fork failed; Resource temporarily unavailable`	Decrease the number of SunVTS test instances or perform SunVTS functional tests separately. In addition, you can increase the delay value for CPU tests or increase the test memory reserve space.
11.	6336040	If the `clearasrdb` command is used to clear a failed DIMM from the `asr` database and the `resetsc` command is issued before the `clearasrdb` command can be completed, ALOM-CMT might not properly reboot and returns the following error message: `No valid MEMORY configuration`	After issuing the `clearasrdb` command, wait 15 seconds before issuing the `resetsc` command.
12.	6338365	Sun Net Connect 3.2.2 software does not monitor environmental alarms on the Sun Fire T1000 server.	To receive notification that an environmental error has occurred, use the ALOM-CMT `mgt_mailalert` feature to have ALOM-CMT send an email when an event occurs. To check whether or not the environmental status of the server is ok, log on to ALOM-CMT and run the `showfaults` command. To view a history of any events the server encountered, log on to ALOM-CMT and run the `showlogs` command.
13.	6338777	If you issue a break command in the middle of a system boot, and then immediately boot again, the boot process fails with the message, `Exception handlers interrupted, please file a bug`.	Boot again. The system should then reset and boot normally.
14.	6346149	The maximum throughput of the system network ports decreases unexpectedly as the network load increases.	There is no workaround at this time.
15.	6346170	The ALOM CMT `showfru` command displays epoch timestamps of `THU JAN 01 00:00:00 1970`.	Ignore timestamps with this date. There is no workaround at this time.
16.	6347456	SunVTS memory tests might log a warning message similar to the following in rare cases when the ECC Error Monitor (`errmon`) option is enabled: `WARNING: software error encountered while processing /var/fm/fmd/errlog Additional-Information: end-of-file reached`	Do not enable the `errmon` option. (The `errmon` option is disabled by default.)
17.	6348070	False Ereport error messages might be generated for PCI devices.	There is no workaround at this time. The FMA diagnostic software required to eliminate false Ereports for PCI devices is still under development.
18.	6356449	The `poweron` command does not power on the system when issued immediately after the ALOM CMT resets.	If you use a script to reset the ALOM-CMT and power on the system, insert a 1-second delay before the `poweron` command.
19.	6362690	When SunVTS testing is stopped while `dtlbtest` is running, `dtlbtest` fails with the error: `No CPUs to test`	Upgrade to SunVTS 6.1 PS1 or a subsequent compatible version at this URL: `http://www.sun.com/oem/products/vts/`
20.	6363820	The `showcomponent` command hangs if you repeatedly loop on the `disablecomponent` and `enablecomponent` commands.	Reset ALOM-CMT with the `resetsc` command.
21.	6368136	Displaying large persistent logs with the `showlogs -p p` command slows down the ALOM CMT command line interface.	Use the `-e` flag with the `showlogs` command: `showlogs -e` number-of-lines. This command displays a specified number of lines of data instead of displaying the entire log.
22.	6368944	The virtual console does not accept paste buffers that are greater than 114 characters. As a result, the wanboot NVRAM parameter `network-boot-arguments` is not set when a longer value is pasted.	Cut and paste in chunks smaller than 114 characters, or do not use cut and paste.
23.	6369626	The ALOM CMT `poweron` command can fail, leaving the console device unavailable. If another `poweron` command is issued, it fails with a "Host poweron is already in progress" message.	Reset the ALOM CMT with the `resetsc` command, then issue the `poweron` command again. If this fails, manually reset the system as follows: Unplug the power cord from the server. Wait 5 seconds. Plug the power cord back into the server.
24.	6369961	System fault messages and ALOM CMT alerts continue to be generated on boot after the fault has been repaired.	Install patch 119578-22 to avoid the problem. If the patch has not yet been installed, after you replace the faulty FRU, run the `showfaults -v` command to determine how to clear the fault. For the full procedure for clearing fault messages, see To Manually Clear Fault Logs.
25.	6370222	The `flarcreate` command and flash archive do not work.	Before creating the flash archive manually, unmount the `libc_psr_hwcap1` libraries. After the flash archive is created, remount the `libc_psr_hwcap1` libraries.
26.	6370233	The Dtrace function might return inaccurate CPU xcalls.	Although they are not stable interfaces, putting Dtrace fbt probes on `send_one_mondo` and `send_mondo_set` could be used as a workaround. For `send_mondo_set`, extract the number of CPUs being sent cross calls from the `cpuset_t` argument.
27.	6372709	The maximum size of the FMA `fltlog` file might be restricted.	Remove the restrictions by changing the default log rotation options for the Solaris `logadm(1M)` command.
28.	6373682	A momentary pressure on the Power On/Off button does not initiate a normal shutdown.	Use the ALOM-CMT `poweron` and `poweroff` commands to power the system on and off.
29.	6375927	A date changed through the Solaris `date` command persists across reboots of the Solaris OS but not reboots of the ALOM CMT.	Use only the ALOM-CMT `date` command. Do not use the Solaris `date` command.
30.	6376423	The chassis cover might be extremely difficult to remove.	See Chassis Cover Might Be Difficult to Remove (CR 6376423).
31.	6377071	At certain stages of the power-on process, if the `resetsc` command is issued, or if the server loses AC power, the ALOM-CMT record of boot status is not cleared. At the next boot, ALOM-CMT might print the message "Reboot loop detected" and does not power on the system.	Issue the command `poweroff -f` and attempt to power on again. If this fails, manually reset the system as follows: Unplug the power cord from the server. Wait 5 seconds. Plug the power cord back into the server.
32.	6377077	If host power is removed while POST or OpenBoot PROM is testing a device, the device is disabled.	Use the ALOM-CMT command, `enablecomponent` to reenable the incorrectly blacklisted device.
33.	6379739	The ALOM CMT `sc_powerstatememory` record might fail during a power failure, preventing the system from powering up afterward.	Use the ALOM CMT `poweroff` and `poweron` commands to cycle power on the host system. If you need to remove AC power from the system, you must wait 5 seconds before reapplying power.
34.	6381707	A faulty DIMM in rank 0 memory can prevent POST from running. The ALOM CMT `showcomponent` command does not list any CPUs if POST fails to run. Cycling power or running the `resetsc` command does not update the `showcomponent` list.	Replace the faulty DIMM, then run POST to update the device list used by the `showcomponent` command.
35.	6383237	The OpenBoot `nvramrc` script is not evaluated before the `probe-all` command executes.	There is no workaround at this time.
36.	6383664	System does not automatically recover and reboot after an error that causes a fatal abort. In these situations, you must manually power on the system.	Wait for the message `SC Alert: Host system has shut down`, then issue the ALOM CMT `poweron` command. (Caution: a system shutdown takes approximately 1-2 minutes. If you issue a `poweron` or `poweroff` command before the SC Alert message appears, the system will enter an uncertain state. If this happens, issue the ALOM-CMT `resetsc` command first, then issue the `poweron` command.)
37.	6389912	False error messages are logged during poweron or system reset. The error messages include this segment: `ereport.io.fire.pec.lup`	Ignore the messages.

Sun Fire T1000 Server Documentation

Downloading Documentation

Instructions for installing, administering, and using your Sun Fire T1000 server are provided in the Sun Fire T1000 server documentation set. The entire documentation set is available for download from the following web site:

http://www.sun.com/documentation/

Note - Information in these product notes supersedes the information in the Sun Fire T1000 documentation set.

To Manually Clear Fault Logs

Perform this procedure after replacing Sun Fire T1000 DIMMs. This procedure clears persistent fault information that creates erroneous fault messages at boot time.

1. Troubleshoot and repair a faulty FRU as described in the Sun Fire T1000 Server Service Manual.

2. Gain access to the ALOM-CMT sc> prompt.

Refer to the Advanced Lights Out Management (ALOM) CMT v1.1 Guide for instructions.

3. Run the showfaults -v command to determine how to clear the fault.

The method you use to clear a fault depends on how the fault is identified by the showfaults command.

Examples:

If the fault is a Host-detected fault (displays a UUID), such as the following:

sc> showfaults -v

ID Time              FRU               Fault

0 SEP 09 11:09:26   MB/CMP0/CH0/R0/D0 Host detected fault, MSGID:

SUN4U-8000-2S  UUID: 7ee0e46b-ea64-6565-e684-e996963f7b86

Then continue to Step 4.

If the fault resulted in the FRU being disabled, such as the following:

sc> showfaults -v

   ID Time              FRU               Fault

    1 OCT 13 12:47:27   MB/CMP0/CH0/R0/D0 MB/CMP0/CH0/R0/D0 deemed faulty and disabled

Then run the enablecomponent command to enable the FRU:

sc> enablecomponent MB/CMP0/CH0/R0/D0

4. Perform the following steps to verify that there are no faults:

a. Set the virtual keyswitch to Diag mode so that POST will run in Service mode.

sc> setkeyswitch diag

b. Issue the poweron command.

sc> poweron

c. Switch to the system console to view POST output.

sc> console

Watch the POST output for possible fault messages. The following output is a sign that POST did not detect any faults:

0:0>POST Passed all devices.

0:0>

0:0>DEMON: (Diagnostics Engineering MONitor)

0:0>Select one of the following functions

0:0>POST:Return to OBP.

0:0>INFO:

0:0>POST Passed all devices.

0:0>Master set ACK for vbsc runpost command and spin...

Note - Depending on the configuration of ALOM-CMT POST variables and whether POST detected faults or not, the system might boot, or the system might remain at the ok prompt. If the system is at the ok prompt, type boot.

d. Issue the Solaris OS fmadm faulty command.

# fmadm faulty

No memory or DIMM faults should be displayed.

If faults are reported, refer to the Diagnostic Flow Chart in the Sun Fire T1000 Server Service Manual for an approach to diagnose the fault.

5. Gain access to the ALOM-CMT sc> prompt.

6. Run the showfaults -v command.

If the fault was detected by the host and the fault information persists, the output will be similar to the following example:

sc> showfaults -v

ID Time              FRU               Fault

0 SEP 09 11:09:26   MB/CMP0/CH0/R0/D0 Host detected fault, MSGID:

SUN4U-8000-2S  UUID: 7ee0e46b-ea64-6565-e684-e996963f7b86

If the showfaults command does not report a fault with a UUID, then you do not need to proceed with the following steps because the fault is cleared.

7. Run the clearfault command.

sc> clearfault 7ee0e46b-ea64-6565-e684-e996963f7b86

8. Switch to the system console.

sc> console

9. Issue the fmadm repair command with the UUID.

Use the same UUID that you used with the clearfault command.

# fmadm repair 7ee0e46b-ea64-6565-e684-e996963f7b86
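Steps 7 and 9 both reuse the UUID reported by showfaults -v, so copying it by hand is an easy place to make a mistake. The following is a small sketch, assuming you have captured the showfaults -v output as text (for example, from a terminal session log) on a host with a POSIX awk. The function name extract_fault_uuids is hypothetical and is not an ALOM-CMT or Solaris command.

```shell
#!/bin/sh
# Hypothetical helper: reads captured `showfaults -v` output on stdin and
# prints every value that immediately follows a "UUID:" token, one per line.
extract_fault_uuids() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "UUID:") print $(i + 1) }'
}

# Example use with a saved log (the file name is illustrative):
#   extract_fault_uuids < showfaults.log
```

If the function prints nothing, no host-detected fault with a UUID is present in the captured output, which corresponds to the case in Step 6 where no further action is needed.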

Hardware RAID Support

RAID technology allows for the construction of a logical volume, made up of multiple physical disks, to provide data redundancy, increased performance, or both. The Sun Fire T1000 server onboard disk controller supports the following RAID configurations:

Integrated Stripe, or IS volumes (RAID 0)

Integrated Mirror, or IM volumes (RAID 1)

You must have the following patches installed on the server before you create RAID volumes:

121130-01 or greater (for the Solaris 10 1/06 OS only) - provides updated hardware RAID support. This patch is not required for the Solaris 10 6/06 or later OS.

Note - For servers with HW dash level 07 or later, the following patches are preinstalled.

123456-01 or greater - provides 1064 firmware update.

119850-14 or greater - provides updates to the mpt device driver and raidctl utility. 119850-17 is preinstalled with the Solaris 10 6/06 OS.

For information on how to implement hardware RAID on the server, refer to the Sun Fire T1000 Server Administration Guide (part number 819-3249). This document is available alongside the other Sun Fire T1000 manuals at http://www.sun.com/documentation.

Upgrading to a Two-Disk Configuration

Any Sun Fire T1000 server with a single hard disk configuration can be upgraded to a two-disk SAS configuration by installing the following hardware:

Note - Patches 123456-01 or greater and 119850-14 or greater are required for this hardware upgrade.

Two 73 GB, 10000 RPM, 2.5-inch SAS disk drives with bracket and cable (part number XRA-SS2CG-73G10KZ), available at http://store.sun.com/CMTemplate/CEServlet?process=SunStore&cmdViewProduct_CP&boxid=XRA-SS2CG-73G10KZ