CHAPTER 2 - SMS 1.5 Bugs

Bugs in SMS 1.5 Software

More Than 4095 Files in Backup cpio Breaks smsrestore (CR ID 6295142)

The smsrestore command will fail if there are more than 4095 files in the cpio archive.

The workaround is to remove unneeded files and recreate the cpio archive with smsbackup. The most likely candidates for unneeded files are post logs and dump files. There may be up to 1000 post logs per domain and up to 1000 dump files per domain.
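Before running smsrestore, it can help to check how many entries a backup archive holds. The following is a minimal sketch, assuming a POSIX shell and a cpio that accepts the -it listing options. The function names and the archive name in the usage comment are hypothetical and are not part of SMS.

```shell
# Limit taken from the bug description: smsrestore fails above 4095 files.
SMSRESTORE_MAX_FILES=4095

# Print the number of entries in the cpio archive given as $1.
# cpio writes its block count to stderr, so stderr is discarded.
cpio_entry_count() {
    cpio -it < "$1" 2>/dev/null | wc -l | tr -d ' '
}

# Succeed (exit status 0) if the entry count in $1 is within the limit.
within_restore_limit() {
    [ "$1" -le "$SMSRESTORE_MAX_FILES" ]
}

# Usage (hypothetical archive name):
#   count=$(cpio_entry_count sms_backup.cpio)
#   within_restore_limit "$count" || echo "archive has $count files; trim logs and rerun smsbackup"
```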

FMA Event Reporting to NetConnect Doesn't Pick Up Modified Chassis Serial Number (CR ID 5052078)

If a Sun Fire high-end server runs without its chassis serial number (CSN) having been set on the SCs with the setcsn command, Fault Management Architecture (FMA) event reports sent to NetConnect after a domain stop (Dstop) show a blank serial number.

Workaround: Use the setcsn command to set the chassis serial number, and then restart SMS. The CSN appears in the event reports only after SMS has been restarted.

For more information about how to set the chassis serial number on the SC, refer to the System Management Services (SMS) 1.5 Installation Guide.

ndd /dev/scman man_pathgroups_report Output Needs Clarification (CR ID 6252771)

The ndd(1M) command can be run as root to read and write certain device driver parameters. The scman(7D) driver (/dev/scman) manages the Starcat SC side of the Management (MAN) network and supports the ndd(1M) command.

If the man_pathgroups_report parameter of scman(7D) is misinterpreted, an error caused by software can look like a serious hardware error. As a result, you might incorrectly conclude that hardware must be swapped to find the root cause of the problem.

Reading the man_pathgroups_report parameter produces output such as the following:

# ndd /dev/scman man_pathgroups_report
MAN Pathgroup report: (* == error)
Interface       Destination             Active Path     Alternate Paths
----------------------------------------------------------------
scman1          Other SSC               eri0            eri0 exp 0, hme1 exp 0 *

The asterisk (*) at the end of the last line means that an error was found the last time the hme1 physical interface was used. Historically, most such occurrences have been caused by software, not hardware.

Software reports an error either when the MAN network peer stops responding to "heartbeat" messages or when an incorrect dlpi(7P) state transition occurs. You can reproduce the former case by running the following command as root (assuming the output appears exactly as shown above):

# ndd -set /dev/scman man_set_active_path '1 0 1'

On the SC that executes the command (for example, SC0), the Active Path switches from eri0 to hme1. For a while, SC1 continues to send packets on the eri0 physical interface while SC0 sends packets on hme1. After a short time, the two SCs synchronize and communicate over the same interface. However, each SC displays an asterisk that marks the last interface on which an error occurred. In this case, the error is caused by software (specifically, a non-response to a "heartbeat" message sequence). It is not a fatal hardware error.

An asterisk is also displayed if there is a persistent, fatal hardware error. However, do not assume that hardware is the only possible cause of the asterisk.
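To make the flagged paths easier to spot, the report can be filtered for asterisks. The following is a minimal sketch, assuming a POSIX shell and awk, and assuming that every report line has the same layout as the sample above (a line starting with scmanN, comma-separated paths of the form "name exp N", and a trailing asterisk on the flagged path). The function name flagged_paths is hypothetical. The sketch only reports which path carries the asterisk; it cannot tell a software cause from a hardware cause.

```shell
# Read man_pathgroups_report output on stdin and print one line per
# flagged path in the form "<interface> <path>".
flagged_paths() {
    awk '
        /^scman[0-9]+/ {
            iface = $1
            # Paths are separated by commas; a trailing "*" marks an error.
            n = split($0, parts, ",")
            for (i = 1; i <= n; i++) {
                if (parts[i] ~ /\*[ \t]*$/) {
                    m = split(parts[i], words)
                    # The path name is the word just before "exp".
                    for (j = 2; j <= m; j++) {
                        if (words[j] == "exp") {
                            print iface, words[j - 1]
                            break
                        }
                    }
                }
            }
        }'
}

# Usage (as root on the SC):
#   ndd /dev/scman man_pathgroups_report | flagged_paths
```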

showenvironment Reports Domain A Does Not Have Any Boards Assigned, Then Outputs the Report (CR ID 6299795)

If you remove, install, and assign boards in Domain A on your Sun Fire system and then run the showenvironment command with the -d A option, the command first displays an error message stating that Domain A does not have any boards assigned, and then outputs the report.

The error message is incorrect and can be ignored. This issue occurs only on Domain A.

SMS 1.5 Documentation Errata

rcfgadm(1M)

If the rcfgadm command fails, a board does not return to its original state. A dxs or dcs error message is logged to the domain. If the error is recoverable, you can retry the command.

Before you retry the command, ensure that the following dcs entries exist in /etc/inetd.conf on the domain, and that they have not been disabled:

sun-dr stream tcp wait root /usr/lib/dcs dcs

sun-dr stream tcp6 wait root /usr/lib/dcs dcs

If the error is unrecoverable, you must reboot the domain in order to use that board.
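The inetd.conf check described above can be scripted before retrying rcfgadm. The following is a minimal sketch, assuming a POSIX shell and a grep that supports -E and POSIX character classes. The function name dcs_entries_enabled is hypothetical. A line that is commented out with a leading # does not match, so it counts as disabled.

```shell
# Succeed (exit status 0) only if both the tcp and tcp6 sun-dr entries
# are present and uncommented in the given file (default /etc/inetd.conf).
dcs_entries_enabled() {
    conf=${1:-/etc/inetd.conf}
    for proto in tcp tcp6; do
        # Whitespace after the protocol keeps "tcp" from matching "tcp6".
        grep -E "^sun-dr[[:space:]]+stream[[:space:]]+${proto}[[:space:]]" "$conf" \
            > /dev/null || return 1
    done
    return 0
}

# Usage (on the domain):
#   dcs_entries_enabled || echo "dcs entries missing or disabled in /etc/inetd.conf"
```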

testemail(1M)

The description of the -c option in the testemail(1M) man page should read as follows:

The fault class or comma-separated list of fault classes that testemail uses to generate an event.

Examples of valid fault classes are in the file /etc/opt/SUNWSMS/config/SF15000.dict.

When invoking testemail using an ecache resource, make sure that the system board containing the ecache is powered on. Otherwise, the testemail invocation will fail and no email will be generated.

System Management Services (SMS) 1.5 Administrator Guide

The description of VCMON is incorrect for the Sun Fire high-end systems. The correct description appears in the VCMON section of this document.

In the description of the showboards command, the -a option should read -v.

In the description of the showenvironment command, the category "Device" should be removed.

The following categories of error messages should be added between error codes 11300 and 50000:

System Management Services (SMS) 1.5 Installation Guide

Upgrade the Solaris OS. See "To Install or Upgrade the Solaris OS on the SC" on page 17.

Run smsupgrade to reinstall SMS after a major OS upgrade (see page 34). Otherwise, proceed to the next step and restore the SMS configuration.

The heading "To Reinstall SMS Software" should read "To Restore the SMS Configuration."

Bugs in SMS 1.5 Software

More Than 4095 Files in Backup cpio Breaks smsrestore (CR ID 6295142)

FMA Event Reporting to NetConnect Doesn't Pick Up Modified Chassis Serial Number (CR ID 5052078)

ndd /dev/scman man_pathgroups_report Output Needs Clarification (CR ID 6252771)

showenvironment Reports Domain A Does Not Have Any Boards Assigned, Then Outputs the Report (CR ID 6299795)

SMS 1.5 Documentation Errata

rcfgadm(1M)

testemail(1M)

System Management Services (SMS) 1.5 Administrator Guide

System Management Services (SMS) 1.5 Installation Guide
