CHAPTER 3

Notes for Servers With Hardware Dash Levels 01 Through 06

This chapter contains information about Sun Fire T1000 Servers with motherboard hardware dash levels 01 through 06.

To determine if these notes apply to your server, see Identifying the Notes for Your Server.

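The following sections are in this chapter:

  • Support for the Sun Fire T1000 Server
  • Supported Versions of Firmware and Software
  • Patch Information
  • Known Issues and Workarounds
  • Sun Fire T1000 Server Documentation
  • Hardware RAID Support
  • Upgrading to a Two-Disk Configuration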



Note - For hardware RAID support, you must install Patch 121130-01 or greater for the Solaris 10 1/06 OS. Hardware RAID support is enabled by default with the Solaris 10 6/06 (or later) Operating System (OS). See Hardware RAID Support.




Support for the Sun Fire T1000 Server

Technical Support

If you have any technical questions or issues that are not addressed in the Sun Fire T1000 server documentation, contact your local Sun™ Services representative. For customers in the U.S. or Canada, call 1-800-USA-4SUN (1-800-872-4786). For customers in the rest of the world, find the World Wide Solution Center nearest you by visiting the web site:

http://www.sun.com/service/contacting/solution.html

Software Resources

The Solaris™ Operating System and Sun Java™ Enterprise System software are preinstalled on your Sun Fire T1000 server.

If you need to reload the software, go to the following web site, which provides instructions for downloading the software:

http://www.sun.com/software/preinstall/



Note - If you download a fresh copy of software, that software might not include patches that are mandatory for your Sun Fire T1000 server. After installing the software, see Patch Information for a procedure to check for the presence of patches on the system.




Supported Versions of Firmware and Software

These are the minimum supported versions of firmware and software for this release of the Sun Fire T1000 server:


Patch Information

Mandatory Patches

You must install the following patches if they are not present on your system. To determine if the patches are present, see To Download Patches.

The following patch is mandatory for Sun™ Cluster software:

The following patches are required for hardware RAID support:



Note - These patches are not included in some versions of preinstalled software on the Sun Fire T1000 server. If the patches are missing from your server, download them from SunSolve℠ as described in To Download Patches.




To Download Patches

1. Determine whether the patches have been installed on your system.

For example, using the showrev command, type the following for each patch number:


# showrev -p | grep "Patch: 119578"

If patch 119578-16 or later is installed, your system has the required version of this patch.

If no version of the 119578 patch is installed, or if the installed version has an extension of -15 or earlier, you must download and install the new patch.

2. Go to http://www.sun.com/sunsolve to download the patches.

Using the SunSolve PatchFinder tool, specify the base Patch ID number (the first six digits) to access the current release of a patch.

3. Follow the installation instructions provided in a specific patch's README file.
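
The check in Step 1 can be scripted when several patches must be verified. The following is a minimal sketch, not a supported tool: the function name patch_ok is hypothetical, and it assumes that each installed patch appears in the showrev -p output as a line beginning with "Patch: " followed by the patch ID and revision.

```shell
#!/bin/sh
# Sketch only: patch_ok is a hypothetical helper, not part of Solaris.
# Usage: showrev -p | patch_ok BASE MINREV
# Reads showrev -p style text on stdin and succeeds (exit 0) if any
# line "Patch: BASE-NN ..." has a revision NN that is >= MINREV.
patch_ok() {
    base=$1
    minrev=$2
    # Keep only the revision suffix of lines for the requested base ID.
    sed -n "s/^Patch: ${base}-\([0-9][0-9]*\).*/\1/p" |
    # Compare numerically so that "08" and "8" are treated alike.
    awk -v min="$minrev" '$1 + 0 >= min + 0 { found = 1 } END { exit !found }'
}
```

For example, `showrev -p | patch_ok 119578 16 || echo "119578-16 or later is required"` prints the message only when the patch is missing or older than revision 16.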

Patches for Option Cards

If you add option cards to your server, refer to the documentation and README files for each card to determine if additional patches are needed.


Known Issues and Workarounds

Hardware Installation and Service Issues

Chassis Cover Might Be Difficult to Remove (CR 6376423)

The chassis cover might be very difficult to remove. If you press too hard on the cover lock button, the front edge of the cover might warp and bind. Also, elastic gasket material on the sides of the chassis might prevent the cover from sliding freely.

To remove the cover, lightly hold down the cover lock button and push the cover slightly toward the front of the chassis (this assists the unlocking action), then slide the cover approximately one half inch (12 mm) toward the rear of the chassis. You can now lift the cover off the chassis.

General Functionality Issues

These are the functionality issues for this release.

Sun Explorer Utility

Supported Version

The Sun Fire T1000 server is supported by the Sun Explorer 5.2 data collection utility, but is not supported by earlier releases of the utility. Installing Sun Cluster software from the preinstalled Java ES package will automatically install an earlier version of the utility on your system. After installing any of the Java ES software, determine whether an earlier version of the Sun Explorer product has been installed on your system by typing the following:


# pkginfo -l SUNWexplo 

If an earlier version exists, uninstall it and install version 5.2 or greater. To download Sun Explorer 5.2, go to:

http://www.sun.com/sunsolve
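
The version comparison described above can also be scripted. The following is a rough sketch under stated assumptions: the function name explorer_version_ok is hypothetical, and the code assumes that the pkginfo -l output contains a line of the form "VERSION:  5.2,REV=..." (the exact text after the version number can differ on your system).

```shell
#!/bin/sh
# Sketch only: explorer_version_ok is a hypothetical helper.
# Usage: pkginfo -l SUNWexplo | explorer_version_ok
# Succeeds (exit 0) if the VERSION line reports 5.2 or greater.
explorer_version_ok() {
    awk '
        $1 == "VERSION:" {
            split($2, rev, ",")          # drop any ",REV=..." suffix
            split(rev[1], part, ".")     # part[1] = major, part[2] = minor
            major = part[1] + 0
            minor = part[2] + 0
            if (major > 5 || (major == 5 && minor >= 2)) ok = 1
        }
        END { exit !ok }
    '
}
```

If the function fails, either the package is not installed or an earlier version is present.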

Sun Explorer Requires the Tx000 Option

When running Explorer 5.2 or greater, you must specify the Tx000 option to collect data from the ALOM-CMT commands on Sun Fire T1000 and Sun Fire T2000 platforms, because the Tx000 script is not run by default. To run it, type:


# /opt/SUNWexplo/bin/explorer -w default,Tx000

For more details, refer to troubleshooting document 83612, Using Sun Explorer on the Tx000 Series Systems. This document is available on the SunSolve web site.

http://www.sun.com/sunsolve

Solaris Predictive Self-Healing Fault Messages

Sun Fire T1000 servers do not have a full implementation of the Solaris predictive self-healing (PSH) feature. The current implementation provides the server with the ability to detect faults, but not the ability to completely diagnose and handle all faults.

If the server detects a PSH-related error, the following message might be generated:


SUNW-MSG-ID: FMD-8000-OW, TYPE: Defect, VER: 1, SEVERITY: Minor
EVENT-TIME: ...
PLATFORM: ...
SOURCE: fmd-self-diagnosis, REV: ...
DESC: The Solaris Fault Manager received an event from a component to which no automated diagnosis software is currently subscribed.
AUTO-RESPONSE: ...
IMPACT: Automated diagnosis and response for these events will not occur.
REC-ACTION: ...

If you see this message on the console or in the /var/adm/messages file, it might be an indication that patch 119578-16 or greater has not been installed. For information on obtaining patches and a list of mandatory patches for the Sun Fire T1000 server, see Patch Information.

If the patch has been installed but you continue to see error messages, contact Sun technical support.
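
To see how often the message has been logged, you can count its occurrences in the messages file. This is a minimal sketch: the function name count_psh_msgs is hypothetical, and it assumes that each logged occurrence contains the string "SUNW-MSG-ID: FMD-8000-OW" on a single line, as in the example above.

```shell
#!/bin/sh
# Sketch only: count_psh_msgs is a hypothetical helper.
# Usage: count_psh_msgs /var/adm/messages
# Prints the number of lines in the given file that carry the
# FMD-8000-OW message ID (prints 0 when there are none).
count_psh_msgs() {
    grep -c 'SUNW-MSG-ID: FMD-8000-OW' "$1" || true
}
```

A nonzero count after patch 119578-16 or greater has been installed is the situation in which contacting Sun technical support is recommended.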

Network Port Performance (CR 6346149)

Sun Fire T1000 servers might experience a drop in network performance, most notably when the system is configured to transmit or receive data at high rates over all four network ports. This can result in lower than expected throughput and, in some instances, system-wide hangs that require a system reset to recover. If your Sun Fire T1000 server experiences a system hang, contact Sun with details on the fault, system activity, and system configuration. Sun is actively working to resolve this problem.

System Will Not Power On With an Invalid Memory Configuration (CR 6300114)

The system will not power on if memory rank 0 is not populated. Rank 0 sockets must always be filled.

Erroneous Messages Displayed After a Repair (CR 6369961)

The Solaris PSH facility automatically detects the replacement of DIMMs. However, erroneous fault messages might be displayed when the system is booted, and these messages can mislead you to think that a problem persists when it is actually fixed.

For a procedure to manually clear the fault from all logs so that it is not reported at boot time, see To Manually Clear Fault Logs.

To correct future occurrences of the problem, install patch 119578-22.

Disk Drive Write Cache Enabled By Default

Read caching and write caching are both enabled by default for the Sun Fire T1000 server disk drive. The use of the caches increases the read and write performance of the disk drive. However, data in the write cache might be lost if system AC power is interrupted. (A loss of AC power does not present a problem for the read cache.)

If you prefer to disable write caching, use the Solaris format -e command:



Caution - These settings are not saved permanently. You must reset the write cache setting every time the system boots.




To Disable the Write Cache

1. In the Solaris environment, enter the format expert mode by typing:


# format -e
Searching for disks...done
AVAILABLE DISK SELECTIONS:
       0. c0t0d0 <ATA-HDS ...

2. Specify disk number 0.


Specify disk (enter its number): 0
selecting c0t0d0
...

The format menu is displayed.

3. Select the cache option by typing:


format> cache 

4. Select the write_cache option by typing:


cache> write_cache
WRITE_CACHE MENU:
...

5. Display the current setting for the write cache.


write_cache> display
Write Cache is enabled

6. Disable the write cache.


write_cache> disable
This setting is valid until next reset only. It is not saved permanently.

7. Verify the new setting.


write_cache> display
Write Cache is disabled

8. Exit from the write_cache mode.


write_cache> quit
CACHE MENU:
...

9. Exit from the cache mode.


cache> quit
FORMAT MENU:
...

10. Exit from the format command.


format> quit
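
Because the setting is lost at every reset, you might prefer to replay the menu sequence from a file instead of typing it each time. The following sketch only generates such a command file. The function name write_cache_cmdfile and the file path are hypothetical, and replaying the file with the -f (command file) and -d (disk name) options of format(1M) is an assumption that has not been verified on this server.

```shell
#!/bin/sh
# Sketch only: write_cache_cmdfile is a hypothetical helper.
# Usage: write_cache_cmdfile /path/to/file
# Writes the menu entries from Steps 3 through 10 of the procedure
# above (without the initial display step) to the named file.
write_cache_cmdfile() {
    cat > "$1" <<'EOF'
cache
write_cache
disable
display
quit
quit
quit
EOF
}
```

Under the assumption above, a command such as `format -e -d c0t0d0 -f /var/tmp/wcache.cmd` would then apply the sequence to disk c0t0d0. Confirm the result interactively with the display command before relying on it.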

 

Bug List

TABLE 3-1 lists known bugs for this release of the Sun Fire T1000 server. The CR (change request) IDs are listed in numerical order.


TABLE 3-1 Known Bugs

CR ID

Description

Workaround

1.

6297813

Upon boot up, the following messages might be displayed:

  • svc.startd[7]: [ID 122153 daemon.warning] svc:/system/power:default: Method or service exit timed out. Killing contract 51.
  • svc.startd[7]: [ID 636263 daemon.warning] svc:/system/power:default: Method "/lib/svc/method/svc-power start" failed due to signal KILL.

If Solaris power management is required, restart power management manually or reboot the server. If Solaris power management is not required, no action is needed.

2.

6300114

The system fails to power on if memory rank 0 is not populated.

Rank 0 sockets must always be filled.

3.

6303328

The iostat -E command reports incorrect vendor information for the SATA drive.

There is no workaround at this time.

4.

6310384

The SunVTS USB keyboard test (usbtest) reports that a keyboard is present when there is no keyboard attached to the server.

 

Do not run usbtest.

5.

6312364

When accessing the host through the ALOM-CMT console command, you might experience slow console response.

For optimum responsiveness, access the host through the host network interfaces as soon as the host has completed booting the OS.

6.

6314590

Executing the ALOM CMT break command and the OpenBoot PROM go command might cause the system to hang or panic.

If the console hangs or panics, use the ALOM CMT reset command to reset the system.

7.

6317382

Typing unrecognized commands or words at the OBP prompt causes the system to return an erroneous error and might hang the server. This behavior only occurs when you drop into the OBP prompt from Solaris. The erroneous error message is:

ERROR: Last Trap

Disregard this message. If the console hangs or panics, use the ALOM CMT reset command to reset the system.

8.

6318208

POST or OBP reset-all generates the alert, Host system has shut down.

This is normal behavior following a reset-all command. The message does not indicate a problem in this situation.

9.

6325271

The ALOM CMT console history boot and run logs are the same.

There is no workaround at this time.

10.

6331819

SunVTS™ memory or CPU tests could fail due to lack of system resources. When too many instances of SunVTS functional tests are run in parallel on UltraSPARC® T1 CMT CPU-based (sun4v) entry-level servers with low memory configurations, SunVTS tests might fail due to lack of system resources. For example, you could see an error message similar to the following:

System call fork failed; Resource temporarily unavailable

Workaround: Decrease the number of SunVTS test instances or perform SunVTS functional tests separately. In addition, you can increase the delay value for CPU tests or increase the test memory reserve space.

11.

6336040

If the clearasrdb command is used to clear a failed DIMM from the ASR database and the resetsc command is issued before the clearasrdb command completes, ALOM-CMT might not reboot properly and might return the following error message:

No valid MEMORY configuration

After issuing the clearasrdb command, wait 15 seconds before issuing the resetsc command.

12.

6338365

Sun Net Connect 3.2.2 software does not monitor environmental alarms on the Sun Fire T1000 server.

To receive notification that an environmental error has occurred, use the ALOM-CMT mgt_mailalert feature to have ALOM-CMT send an email when an event occurs.

To check whether the environmental status of the server is OK, log on to ALOM-CMT and run the showfaults command.

To view a history of any events the server encountered, log on to ALOM-CMT and run the showlogs command.

13.

6338777

If you issue a break command in the middle of a system boot, and then immediately boot again, the boot process fails with the message, Exception handlers interrupted, please file a bug.

Boot again. The system should then reset and boot normally.

14.

6346149

The maximum throughput of the system network ports decreases unexpectedly as the network load increases.

There is no workaround at this time.

15.

6346170

The ALOM CMT showfru command displays epoch timestamps of THU JAN 01 00:00:00 1970.

Ignore timestamps with this date. There is no workaround at this time.

16.

6347456

SunVTS memory tests might log a warning message similar to the following in rare cases when the ECC Error Monitor (errmon) option is enabled:

WARNING: software error encountered while processing /var/fm/fmd/errlog Additional-Information: end-of-file reached

Do not enable the errmon option.
(The errmon option is disabled by default.)

17.

6348070

False Ereport error messages might be generated for PCI devices.

There is no workaround at this time. The FMA diagnostic software required to eliminate false Ereports for PCI devices is still under development.

18.

6356449

The poweron command does not power on the system when issued immediately after the ALOM CMT resets.

If you use a script to reset the ALOM-CMT and power on the system, insert a 1-second delay before the poweron command.

19.

6362690

When SunVTS testing is stopped while dtlbtest is running, dtlbtest fails with the error: No CPUs to test

Upgrade to SunVTS 6.1 PS1 or a subsequent compatible version at this URL:

http://www.sun.com/oem/products/vts/

20.

6363820

The showcomponent command hangs if you repeatedly loop on the disablecomponent and enablecomponent commands.

Reset ALOM-CMT with the resetsc command.

21.

6368136

Displaying large persistent logs with the showlogs -p p command slows down the ALOM CMT command line interface.

Use the -e flag with the showlogs command:

showlogs -e number-of-lines

This command displays a specified number of lines of data instead of displaying the entire log.

22.

6368944

The virtual-console does not accept paste buffers that are greater than 114 characters. As a result, the wanboot NVRAM parameter, network-boot-arguments, is not set.

Cut and paste in chunks smaller than 114 characters, or don't use cut and paste.

23.

6369626

The ALOM CMT poweron command can fail and the console device is not available. If another poweron command is issued, it fails with a "Host poweron is already in progress" message.

Reset the ALOM CMT with the resetsc command, then issue the poweron command again.

If this fails, manually reset the system as follows:

  1. Unplug the power cord from the server.
  2. Wait 5 seconds.
  3. Plug the power cord back into the server.

24.

6369961

System fault messages and ALOM CMT alerts continue to be generated on boot after the fault has been repaired.

Install patch 119578-22 to avoid the problem.

If the patch has not yet been installed, after you replace the faulty FRU, run the showfaults -v command to determine how to clear the fault. For the full procedure for clearing fault messages, see To Manually Clear Fault Logs.

25.

6370222

The flarcreate command and flash archive do not work.

  • Before creating the flash archive manually, unmount the libc_psr_hwcap1 libraries.
  • After the flash archive is created, remount the libc_psr_hwcap1 libraries.

26.

6370233

The DTrace function might return inaccurate CPU xcalls.

As a workaround, you can put DTrace fbt probes on send_one_mondo and send_mondo_set, although these are not stable interfaces. For send_mondo_set, extract the number of CPUs being sent cross calls from the cpuset_t argument.

27.

6372709

The maximum size of the FMA fltlog file might be restricted.

Remove the restrictions by changing the default log rotation options for the Solaris logadm(1M) command.

28.

6373682

A momentary pressure on the Power On/Off button does not initiate a normal shutdown.

Use the ALOM-CMT poweron and poweroff commands to power the system on and off.

29.

6375927

A date changed through the Solaris date command persists across reboots of the Solaris OS but not reboots of the ALOM CMT.

Use only the ALOM-CMT date command. Do not use the Solaris date command.

30.

6376423

The chassis cover might be extremely difficult to remove.

See Chassis Cover Might Be Difficult to Remove (CR 6376423).

31.

6377071

At certain stages of the power-on process, if the resetsc command is issued, or if the server loses AC power, the ALOM-CMT record of boot status is not cleared. At the next boot, ALOM-CMT might print the message "Reboot loop detected" and does not power on the system.

Issue the command poweroff -f and attempt to power on again.

If this fails, manually reset the system as follows:

  1. Unplug the power cord from the server.
  2. Wait 5 seconds.
  3. Plug the power cord back into the server.

 

32.

6377077

If host power is removed while POST or OpenBoot PROM is testing a device, the device is disabled.

Use the ALOM-CMT enablecomponent command to reenable the incorrectly blacklisted device.

33.

6379739

The ALOM CMT sc_powerstatememory record might fail during a power failure, preventing the system from powering up afterward.

Use the ALOM CMT poweroff and poweron commands to cycle power on the host system. If you need to remove AC power from the system, you must wait 5 seconds before reapplying power.

34.

6381707

A faulty DIMM in rank 0 memory can prevent POST from running. The ALOM CMT showcomponent command does not list any CPUs if POST fails to run. Cycling power or running the resetsc command does not update the showcomponent list.

Replace the faulty DIMM, then run POST to update the device list used by the showcomponent command.

35.

6383237

The OpenBoot nvramrc script is not evaluated before the probe-all command executes.

There is no workaround at this time.

36.

6383664

System does not automatically recover and reboot after an error that causes a fatal abort. In these situations, you must manually power on the system.

Wait for the message SC Alert: Host system has shut down, then issue the ALOM CMT poweron command.

(Caution: a system shutdown takes approximately 1-2 minutes. If you issue a poweron or poweroff command before the SC Alert message appears, the system will enter an uncertain state. If this happens, issue the ALOM-CMT resetsc command first, then issue the poweron command.)

37.

6389912

False error messages are logged during poweron or system reset.

The error messages include this segment: ereport.io.fire.pec.lup

Ignore the messages.



Sun Fire T1000 Server Documentation

Downloading Documentation

Instructions for installing, administering, and using your Sun Fire T1000 server are provided in the Sun Fire T1000 server documentation set. The entire documentation set is available for download from the following web site:

http://www.sun.com/documentation/



Note - Information in these product notes supersedes the information in the Sun Fire T1000 documentation set.




To Manually Clear Fault Logs

Perform this procedure after replacing Sun Fire T1000 DIMMs. This procedure clears persistent fault information that creates erroneous fault messages at boot time.

1. Troubleshoot and repair a faulty FRU as described in the Sun Fire T1000 Server Service Manual.

2. Gain access to the ALOM-CMT sc> prompt.

Refer to the Advanced Lights Out Management (ALOM) CMT v1.1 Guide for instructions.

3. Run the showfaults -v command to determine how to clear the fault.

The method you use to clear a fault depends on how the fault is identified by the showfaults command.

Examples:

Then continue to Step 4.

Then run the enablecomponent command to enable the FRU:


sc> enablecomponent MB/CMP0/CH0/R0/D0

4. Perform the following steps to verify that there are no faults:

a. Set the virtual keyswitch to Diag mode so that POST will run in Service mode.


sc> setkeyswitch diag

b. Issue the poweron command.


sc> poweron

c. Switch to the system console to view POST output.


sc> console

Watch the POST output for possible fault messages. The following output is a sign that POST did not detect any faults:


.
.
.
0:0>POST Passed all devices. 
0:0> 
0:0>DEMON: (Diagnostics Engineering MONitor) 
0:0>Select one of the following functions 
0:0>POST:Return to OBP. 
0:0>INFO: 
0:0>POST Passed all devices. 
0:0>Master set ACK for vbsc runpost command and spin... 



Note - Depending on the configuration of ALOM-CMT POST variables and whether POST detected faults or not, the system might boot, or the system might remain at the ok prompt. If the system is at the ok prompt, type boot.



d. Issue the Solaris OS fmadm faulty command.


# fmadm faulty

No memory or DIMM faults should be displayed.

If faults are reported, refer to the Diagnostic Flow Chart in the Sun Fire T1000 Server Service Manual for an approach to diagnose the fault.

5. Gain access to the ALOM-CMT sc> prompt.

 

6. Run the showfaults command.

If the fault was detected by the host and the fault information persists, the output will be similar to the following example:


sc> showfaults -v
ID Time              FRU               Fault
0 SEP 09 11:09:26   MB/CMP0/CH0/R0/D0 Host detected fault, MSGID: 
SUN4U-8000-2S  UUID: 7ee0e46b-ea64-6565-e684-e996963f7b86

If the showfaults command does not report a fault with a UUID, then you do not need to proceed with the following steps because the fault is cleared.

7. Run the clearfault command.


sc> clearfault 7ee0e46b-ea64-6565-e684-e996963f7b86

8. Switch to the system console.


sc> console

9. Issue the fmadm repair command with the UUID.

Use the same UUID that you used with the clearfault command.


# fmadm repair 7ee0e46b-ea64-6565-e684-e996963f7b86
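
Steps 7 and 9 both need the UUID that showfaults -v reports. If you capture the ALOM-CMT output to a file (for example, from a terminal session log), the UUID can be extracted instead of retyped. This is a minimal sketch: the function name fault_uuids is hypothetical, and it assumes that the UUID appears as a separate whitespace-delimited word in the captured text, as in the Step 6 example.

```shell
#!/bin/sh
# Sketch only: fault_uuids is a hypothetical helper.
# Usage: fault_uuids < captured-showfaults-output
# Prints each distinct 36-character word that consists of lowercase
# hexadecimal digits and hyphens, with hyphens at positions 9, 14,
# 19, and 24 (the layout of the UUID shown in Step 6).
fault_uuids() {
    awk '{
        for (i = 1; i <= NF; i++)
            if (length($i) == 36 && $i ~ /^[0-9a-f-]+$/ &&
                substr($i, 9, 1) == "-" && substr($i, 14, 1) == "-" &&
                substr($i, 19, 1) == "-" && substr($i, 24, 1) == "-")
                print $i
    }' | sort -u
}
```

Each printed value is a candidate argument for the clearfault command at the sc> prompt and for the fmadm repair command in the Solaris OS. Check it against the showfaults output before using it.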


Hardware RAID Support

RAID technology allows for the construction of a logical volume, made up of multiple physical disks, to provide data redundancy, increased performance, or both. The Sun Fire T1000 server onboard disk controller supports the following RAID configurations:

You must have the following patches installed on the server before you create RAID volumes:



Note - For servers with HW dash level 07 or later, the following patches are preinstalled.



For information on how to implement hardware RAID on the server, refer to the Sun Fire T1000 Server Administration Guide (part number 819-3249). This document is available alongside the other Sun Fire T1000 manuals at http://www.sun.com/documentation.


Upgrading to a Two-Disk Configuration

Any Sun Fire T1000 server with a single hard disk configuration can be upgraded to a two SAS disk configuration by installing the following hardware:



Note - Patches 123456-01 or greater and 119850-14 or greater are required for this hardware upgrade.