CHAPTER 2

SMS 1.4 Bugs

This chapter provides information about known SMS 1.4 bugs. It covers bugs in SMS 1.4 software, bugs that affect SMS 1.4 software, and SMS 1.4 documentation errors.


Bugs in SMS 1.4 Software

This section summarizes the most important bugs and requests for enhancement (RFEs) in SMS 1.4 software. It is not an exhaustive list of outstanding bugs and RFEs.

Using Control-C to Interrupt Poweron/Poweroff Sequence Can Cause ESMD to Core Dump (BugId 4902308)

Interrupting poweron/poweroff with Control-C can cause ESMD to core dump. ESMD will restart automatically and will gracefully recover. Component failure (esmd) and restart messages will be logged to the platform messages file.

Workaround: Do not use Control-C during poweron or poweroff operations.

Using Control-C to Interrupt Poweron/Poweroff Sequence Can Display Unnecessary Error Messages (BugId 4902311)

Interrupting poweron/poweroff with Control-C might cause errors such as "client monitor failed" to be logged on the platform. Although the messages do not reflect actual errors and have no effect on the system, they can be unnecessarily alarming.

Workaround: Do not press Control-C during poweron or poweroff operations. If you do, ignore the error messages.

The setchs -c Command Is Limited to One Component at a Time (BugId 4925617)

If you try to change CHS on more than one component with a single setchs command, only the first component will be changed. The command returns "0" to indicate successful completion, and does not provide an error message indicating that the subsequent components were not changed.

Workaround: Do not apply the setchs -c command to more than one component at a time.
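The workaround amounts to issuing one setchs invocation per component. The following Python sketch shows one way to build those invocations. Only the -c option comes from this note. The other arguments that setchs requires are not shown here and are passed through unchanged in a hypothetical base_args list, and the helper name commands_per_component is likewise an assumption made for illustration.

```python
from typing import List, Sequence


def commands_per_component(base_args: Sequence[str],
                           components: Sequence[str]) -> List[List[str]]:
    """Build one setchs argument list per component.

    base_args is a placeholder for whatever other arguments your setchs
    invocation needs; this sketch does not assume what they are.
    """
    commands = []
    for component in components:
        # One component per command, so no component is silently skipped.
        commands.append(["setchs", *base_args, "-c", component])
    return commands


if __name__ == "__main__":
    # Hypothetical component names, for illustration only.
    for argv in commands_per_component([], ["SB0", "SB1"]):
        print(" ".join(argv))
```

Because the command returns 0 even when later components are skipped, building a separate argument list for each component is safer than relying on the exit status of a combined call.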

ADC Chip Timeout Errors Displayed When SC Is Under Load (BugId 4948686)

When the system controller is subjected to some heavy load conditions, SMS 1.4 software may report ADC chip calibration timeout errors such as this one:

...NOTICE ExpBoard.cc 122] The ADC chip calibration timeout on EX13

Workaround: Ignore the error messages.
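If these notices clutter a log review, they can be filtered out before reading. The following Python sketch assumes only the message text quoted above. The function name without_adc_timeouts and the second sample line are hypothetical, and the sketch says nothing about where the log file is located.

```python
import re
from typing import Iterable, Iterator

# Matches the message text quoted above; the board name (for example EX13) varies.
ADC_TIMEOUT = re.compile(r"The ADC chip calibration timeout on \S+")


def without_adc_timeouts(lines: Iterable[str]) -> Iterator[str]:
    """Yield every log line except ADC chip calibration timeout notices."""
    for line in lines:
        if not ADC_TIMEOUT.search(line):
            yield line


if __name__ == "__main__":
    sample = [
        "...NOTICE ExpBoard.cc 122] The ADC chip calibration timeout on EX13",
        "some other platform message",  # invented line, for illustration
    ]
    for line in without_adc_timeouts(sample):
        print(line)
```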

Misleading Message During SC Poweroff (BugId 4953836)

When esmd powers down a system controller (SC) due to environmental issues such as high or low temperature, it displays a misleading message. The message states that the SC will be powered off and removed from the domain. A system controller cannot be included in a domain, so it cannot be removed.

Workaround: Ignore the message.

Domain Boot Time Has Increased (BugId 4957596)

There has been an increase of approximately 15% in the time it takes a Starcat chassis to turn on and have its domains display a Solaris prompt.

Workaround: None.

Failover May Not Work Properly on Spare SC (BugId 4963029)

When using a degraded centerplane, failover may not work properly on the spare SC.

Workaround: Correct the degraded centerplane issue before attempting to fix the spare SC.

Two-Processor System Boards Display Unknown Status After Domain Reboot (BugId 4970240)

When both processors of a two-processor system board are indicted due to Solaris ECC correctable errors and the domain is rebooted, the "Power State" of the system board changes to UNKNOWN instead of remaining ON. This will cause showchs to FAIL.

This problem does not occur with four-processor system boards.

Workaround: Power cycle the system board.

Domain Does Not Recover If You Poweroff Expander in a Running Domain (BugId 4970726)

If you poweroff an expander board in a running domain, dsmd will not recover the domain.

Workaround: Do not poweroff an expander when components in slot 0 or 1 are in use by a running domain.

Successful DR Operation Displays Error Message (BugId 4971396)

A successful addboard operation performed on a domain configured in a split-slot configuration can sometimes display this error message:

FAIL Slot SB12: MaxCPU in use in Slot I012, allow_maxcpu_split_ex not set. There is no FRU service action indicated for this failure.

Workaround: Use the showboards command to verify that the operation succeeded. If it did, ignore the message.

setkeyswitch Operation Appears to Hang (BugId 4972781)

If you run setkeyswitch commands on multiple domains that share expander boards, you may see error messages similar to this one:

[ ...ERR setKeyswitchLock.cc 124] setkeyswitch process already running: pid=10435

The operation is not hung. Each domain locks the shared hardware against the other domains, and when the first setkeyswitch command completes, the remaining setkeyswitch commands can begin.

Workaround: None.
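To confirm that a setkeyswitch command is waiting on a lock rather than hung, it can help to extract the process ID named in the message. The following Python sketch assumes only the message format quoted above. The function name blocking_pid is hypothetical.

```python
import re
from typing import Optional

# Matches the lock message quoted above and captures the process ID.
LOCK_MESSAGE = re.compile(r"setkeyswitch process already running: pid=(\d+)")


def blocking_pid(log_line: str) -> Optional[int]:
    """Return the pid named in a setkeyswitch lock message, or None."""
    match = LOCK_MESSAGE.search(log_line)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    line = ("[ ...ERR setKeyswitchLock.cc 124] "
            "setkeyswitch process already running: pid=10435")
    print(blocking_pid(line))  # prints 10435
```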

Do Not Insert a System Board Into an Expander Board That Is Powered Down (BugId 4970670)

If a system board is inserted into a powered down expander board, no installation record is written.

Workaround: Remove the system board, power on the expander board, and reinsert the system board.


Bugs That Affect SMS 1.4 Software

This section summarizes the most important bugs that can affect the SMS 1.4 system. It is not an exhaustive list of every bug that could affect the SMS 1.4 system.

After Changing the MAN I1 Network IP Address of an Installed Domain, You Must Reconfigure the MAN Network by Hand (BugId 4484851)

If domains are already installed and you change the MAN I1 network configuration using smsconfig -m, you must configure the MAN network information on those domains by hand.

Workaround: Refer to the information about unconfigured domains in the System Management Services (SMS) 1.4 Installation Guide.

Sun Fire 15K Platform-Specific Begin/Finish Scripts Can Hang on HPCI+-Only Domains (BugId 4797577)

The Solaris 8 update 7 operating environment does not include support for hsPCI+ boards. In domains consisting of only hsPCI+ boards, the installation can hang after the start of the Begin/Finish scripts.

Workaround: Press Ctrl-C to interrupt the Begin/Finish scripts. This will let the rest of the installation continue, resulting in successful installation.

Intermittent I2C Timeouts (1124) for Hpc3130 Cassette Status (BugId 4785961)

Intermittent I2C timeouts are reported by dxs and frad while getting the status for an Hpc3130 hsPCI cassette. The impact is benign and limited to generating error messages in the platform, domain and domain console message logs.

Workaround: None.

Unmapped Response to Non-cacheable Request Corrupts State in AXQ Lock Module (BugId 4761277)

If two domains share an expander and a device driver (or OS extension) on one domain issues a bad address to programmed IO space, both domains could dstop. This occurs only with defective OS extensions that run in privileged mode, such as device drivers.

Workaround: Do not share an expander between a production domain and a domain containing untested or problematic privileged mode software such as device drivers.

Sun Fire 15K Servers Can Fail to Detect Domain Stop Interrupts (BugId 4924523)

If a domain stop (dstop) interrupt is detected by hwad but not by dsmd, dsmd will report a heartbeat failure. Only hardware configuration information is dumped, and neither CPU register nor domain data (dsmd.dump) is saved. The hardware configuration files report the dstop condition.

Workaround: You can re-post the domain at an increased post level to reveal the source of the hardware problem.

SMS Will Not Start if IP Address Is Missing (BugId 4929849)

If a high-end server's system controller cannot resolve its own hostname, wcapp will not start. As a result, SMS will not start either. Instead, you will see continuous wcapp error messages in the platform log. For example:

wcapp[9433:1]: [12300 8753505948023 ERR libWcApp.cc 2227]
Wcapp : java.net.UnknownHostException:

[1312 8753513433994 ERR StartupManager.cc 3021] software component failed: name=wcapp

[1304 8753514591425 NOTICE StartupManager.cc 2740] software component start-up initiated: name=wcapp

wcapp: [NOTICE] /usr/java1.2/lib/ext/jsse.jar, /usr/java1.2/lib/ext/jnet.jar, /usr/java1.2/lib/ext/jcert.jar: optional JSSE jarfiles not all found or not readable by user; running without SSL support


Workaround: Make sure that the SC's correct hostname (as returned by the hostname(1) command) and IP address are recorded in the /etc/hosts file or in whichever naming service is in use. One way to record the name in the /etc/hosts file is to run the smsconfig command again and enter the hostname and IP address that were assigned to the SC in the Site Planning Guide. When you have verified that the hostname and IP address are correct, restart SMS.
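The underlying condition, a host that cannot resolve its own name, can be checked independently of SMS. The following Python sketch uses the standard socket module and assumes that a Python 3 interpreter is available on the host being checked. The function name resolve_own_hostname and its resolver parameter are hypothetical, and the check is not part of SMS.

```python
import socket
from typing import Callable, Optional


def resolve_own_hostname(
    hostname: Optional[str] = None,
    resolver: Callable[[str], str] = socket.gethostbyname,
) -> Optional[str]:
    """Return the IP address for the local hostname, or None if it does not resolve."""
    name = hostname if hostname is not None else socket.gethostname()
    try:
        return resolver(name)
    except OSError:
        # socket.gaierror and socket.herror are subclasses of OSError.
        return None


if __name__ == "__main__":
    address = resolve_own_hostname()
    if address is None:
        print("hostname does not resolve; check /etc/hosts or the naming service")
    else:
        print("hostname resolves to", address)
```

The resolver parameter exists only so that the lookup can be replaced in a test. By default the sketch uses whatever name service the host is configured with.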


SMS 1.4 Documentation Errors

This section summarizes errors in the SMS 1.4 manpages and documentation.

SMS Upgrade Example in smsupgrade.1m Manpage Uses Wrong Suffix Numbers (BugId 4912378)

The upgrade example in the smsupgrade.1m manpage does not display the correct upgrade suffixes for the SMS packages. All upgraded packages should have a .2 suffix.

Workaround: Read the SMS 1.4 Installation Guide instead.

The pcd.1m Manpage Displays Incorrect Data Fields (BugId 4918650)

The platform data descriptors in the pcd.1m manpage and the SMS 1.4 Reference Manual are not correct. For SMS 1.4, the descriptors are version 3, and a Chassis Serial Number field has been added to platform information.

Workaround: None.

flashupdate Information in Installation Guide Is Incorrect (BugId 4942045)

The SMS 1.4 Installation Guide did not point out that two flashupdate files, nSCCPOST.di and oSCCPOST.di, can only be used on certain types of system controllers (SC). Each of those files is intended only for the following hardware:

In addition, the examples on pages 23, 38, 52, and 61 show a CP1500 board on one SC and a CP2140 board on the other SC, which is not supported.

Workaround: To find out which type of SC you have, check the platform messages log file when SMS is started.

showboards -c Provides Wrong Information About WPCI Boards (BugId 4970807)

The showboards -c command, designed to display the clock source for all system boards, incorrectly indicates that all WPCI boards in the system are turned Off. The incorrect status is displayed only with the -c option.

Workaround: Ignore the status for WPCI boards or run the showboards command again without the -c option to verify board status.