CHAPTER 1

System Management Services (SMS) 1.4 Release Notes

This chapter contains the release notes for System Management Services (SMS) 1.4 on Sun Fire high-end systems and covers the following topics:


SMS 1.4 Known Limitations

This section contains known limitations that involve SMS on a Sun Fire high-end system:


General Notes and Issues

This section contains general notes and issues that involve SMS on Sun Fire high-end systems.

Automatic Diagnosis and Recovery

The following automatic diagnosis and domain recovery features are enabled by default in SMS 1.4:

SMS 1.4 includes three diagnosis engines (DEs) that analyze certain hardware errors and identify the components associated with errors that affect the availability of the system and its domains:

The SMS DE diagnoses hardware errors associated with domain stops (dstops).

The Solaris operating environment (also referred to as the Solaris DE) identifies non-fatal domain hardware errors and reports them to the system controller.

The POST DE identifies any hardware test failures that occur when the power-on self-test is run in SMS.

The DEs record the diagnosis information for the affected components and maintain this information as part of the component health status (CHS).

The diagnosis engines report diagnosis information through the following channels:

These event messages contain the chassis serial number of the affected system and event codes that identify the fault or error event. These event messages are also recorded in the SMS event log, which can be viewed by running the showlogs command.

Contact your service provider when you see these event messages. Your service provider uses the chassis serial number and event code to initiate the appropriate service action.



Note - In some cases the diagnosis engine cannot assign a reasonable event code because of the diversity of components associated with the fault. In such cases the event code contains the word UNKNOWN, for example SF15000-UNKNOWN. Contact your service provider as usual to initiate the appropriate service action.
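As a minimal sketch of how a monitoring script might flag such codes, the function below tests whether an event code contains the word UNKNOWN. The function name and the second sample code are hypothetical; only SF15000-UNKNOWN comes from the note above.

```sh
# classify_event_code CODE
# Prints "unknown" when the event code contains the word UNKNOWN,
# otherwise prints "specific". The function name is hypothetical.
classify_event_code() {
    case "$1" in
        *UNKNOWN*) echo "unknown" ;;
        *)         echo "specific" ;;
    esac
}

classify_event_code "SF15000-UNKNOWN"   # code taken from the note above
classify_event_code "SF15000-EXAMPLE"   # hypothetical placeholder, not a real event code
```

Either result calls for the same action: contact your service provider with the chassis serial number and the event code.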



You can configure the email event notification features to receive immediate notice of critical fault events, without manually monitoring the platform or domain logs. As with the event messages, contact your service provider when you receive these emails, so that your service provider can initiate the appropriate service action.

For hardware errors associated with dstops, POST reviews the CHS information of affected components and deconfigures any faulty components from the system.

For further information on these features, see the "Automatic Diagnosis and Recovery" chapter in the System Management Services (SMS) 1.4 Administrator Guide.

New SMS 1.4 Commands

The following new daemons and commands are related to the automatic diagnosis and recovery features introduced in SMS 1.4. For detailed information on these daemons and commands, refer to their descriptions in the System Management Services (SMS) 1.4 Reference Manual.

/opt/SUNWSMS/SMS1.4/lib/smsadmin/testemail

Revised SMS 1.4 Commands

The following commands were updated in SMS 1.4 to reflect changes introduced by the automatic diagnosis and recovery features. For further information on these commands, refer to their descriptions in the System Management Services (SMS) 1.4 Reference Manual.

Chassis Serial Number

The chassis serial number is used to identify a Sun Fire high-end system. The serial number identifies the platform in system event messages and is used by service providers to correlate events and service actions to the correct system.

The chassis serial number is printed on a label in the front of the system chassis, near the bottom center. Starting with the SMS 1.4 release, the chassis serial number is automatically recorded by Sun manufacturing on systems that ship with SMS 1.4 installed. To view the chassis serial number, run the showplatform -p csn command.

If you are upgrading to SMS 1.4 from an earlier SMS version, use the setcsn(1M) command to record the chassis serial number of your Sun Fire high-end system. For details on setting the chassis serial number, refer to the System Management Services (SMS) 1.4 Installation Guide and the setcsn command description in the System Management Services (SMS) 1.4 Reference Manual.

Capacity On Demand (COD)

You can now temporarily enable an available instant access CPU (also referred to as headroom) to replace a failed non-COD CPU. In this case, the instant access CPU serves as a hot spare, which is a spare CPU that can be used immediately to replace a failed non-COD CPU. However, once you replace the failed non-COD CPU, you must deactivate the instant access CPU, as explained in the "Capacity on Demand" chapter of the System Management Services (SMS) 1.4 Administrator Guide. If you want to continue using the instant access CPU, contact your Sun sales representative or reseller to purchase a COD RTU license for it.

System Controller External Network Configuration

Each system controller (SC) must be configured for the TCP/IP network to which it is attached. Refer to the System Administration Guide: Resource Management and Network Services of the Solaris 9 System Administrator Collection for details on planning and configuring a TCP/IP-based network. SMS supports both IPv4 and IPv6 configurations.

In this release, the SC supports network connections through the RJ45 jacks on the faceplate of each SC. These jacks correspond to the network interfaces hme0 and eri1 under Solaris software on each SC. You must configure hme0 or eri1 on each SC with the appropriate information for your TCP/IP network. With this configuration, each SC is known to external network applications by a separate IP hostname and address.



Caution - The IP addresses shown in the smsconfig examples in the Sun Fire high-end system documentation are examples only. Always refer to your Sun Fire 15K/12K System Site Planning Guide for valid IP addresses for your network. Using invalid network IP addresses could under certain circumstances render your system unbootable!



Each SC operates in one of two mutually exclusive modes: main or spare. The SC in main mode controls the machine. The SC in spare mode automatically takes over if the main SC fails. It is important to know which system controller is the main SC and which is the spare SC. To determine the SC role, log in to the SC and use the following command:

sc0:sms-user:> showfailover -r
MAIN
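The role check can be wrapped in a small script. The following is a sketch only: the function name is hypothetical, and the SPARE output string is an assumption inferred from the mode names above (the example shows only MAIN).

```sh
# describe_sc_role ROLE
# Interprets a role string as printed by "showfailover -r".
# MAIN appears in the example above; SPARE is an assumed counterpart.
describe_sc_role() {
    case "$1" in
        MAIN)  echo "main SC: controls the machine" ;;
        SPARE) echo "spare SC: takes over if the main SC fails" ;;
        *)     echo "unrecognized role: $1"; return 1 ;;
    esac
}

# On an SC, the role string would come from the command itself:
#   describe_sc_role "$(showfailover -r)"
describe_sc_role "MAIN"
```

Running the check on both SCs before changing the network configuration helps confirm which one currently controls the machine.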

If you do not configure the external community network, applications such as Sun Management Center, telnet, and others will need to be given the appropriate IP hostname of the main system controller. In the case of an SC failover, these applications need to be restarted with the IP address of the new main SC.



Note - Any changes made to the network configuration on one SC using smsconfig -m must be made to the other SC as well. Network configuration is not automatically propagated.



System BREAK Sequence

To facilitate failover, the BREAK sequence to stop the system has been changed from STOP-A to the alternate [RETURN] [TILDE] [CONTROL B].



Note - There must be an interval of more than 0.5 seconds between characters, and the entire string must be entered in less than 5 seconds.
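The two timing constraints can be expressed as a short check. This is an illustrative sketch, not part of SMS: the function name is hypothetical, and the three arguments are assumed to be the times, in seconds, at which RETURN, TILDE, and CONTROL B were typed.

```sh
# break_timing_ok T_RETURN T_TILDE T_CTRLB
# Succeeds (exit status 0) when each gap between characters is more than
# 0.5 seconds and the whole sequence takes less than 5 seconds.
break_timing_ok() {
    awk -v a="$1" -v b="$2" -v c="$3" 'BEGIN {
        ok = (b - a > 0.5) && (c - b > 0.5) && (c - a < 5)
        exit (ok ? 0 : 1)
    }'
}

if break_timing_ok 0 1 2; then
    echo "sequence accepted"
else
    echo "sequence rejected"
fi
```

For example, typing the characters one second apart satisfies both limits, while typing TILDE 0.3 seconds after RETURN, or taking 6 seconds overall, does not.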



Solaris 8 introduced this feature, which lets you force a hung system to halt when required without allowing random or spurious breaks to cause an unintentional stop. It applies only to serial devices acting as consoles, not to systems with keyboards of their own.

The following line is uncommented by default in the /etc/default/kbd file:

KEYBOARD_ABORT=alternate
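To confirm that the setting is still active on an SC, a check along the following lines could be used. It is a sketch under stated assumptions: the function name is hypothetical, and the file path defaults to the one named above.

```sh
# check_kbd_abort [FILE]
# Reports whether an uncommented KEYBOARD_ABORT=alternate line is present.
# FILE defaults to /etc/default/kbd; the function name is hypothetical.
check_kbd_abort() {
    kbd_file="${1:-/etc/default/kbd}"
    if grep '^KEYBOARD_ABORT=alternate' "$kbd_file" >/dev/null 2>&1; then
        echo "alternate BREAK enabled"
    else
        echo "alternate BREAK not enabled"
        return 1
    fi
}
```

Calling check_kbd_abort with no argument inspects /etc/default/kbd. A commented-out line (one beginning with #) is reported as not enabled.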



Note - Do not restore STOP-A as the BREAK sequence. If you do, your system will lose failover functionality.



IPSec Configuration

Disks intended to be used on a Sun Fire high-end system must be installed using a Sun Fire high-end system. Policy placed in /etc/inet/inetd.conf must be added manually to /etc/inet/ipsecinit.conf as well.

Whenever policy is taken out of /etc/inet/inetd.conf it must be removed manually from /etc/inet/ipsecinit.conf also.

Refer to Bug Id 4449848.

smsconnectsc Command

smsconnectsc is intended to be used in the event a remote SC hangs and cannot be accessed normally through login. Using smsconnectsc to create a remote console session from the local SC can result in the local SC losing monitoring capability and functionality. Do not use smsconnectsc except for the express purpose of system recovery.

Reinstallation and Upgrade

Previous versions of SMS documented the use of the Java™ WebStart GUI and the pkgadd command to install the SMS packages on the Sun Fire high-end system. SMS 1.3 introduced the smsinstall and smsupgrade scripts, which simplify and streamline the installation and upgrade process to the extent that WebStart and pkgadd are no longer recommended or documented. Because SMS configuration is complex, do not use any method other than the ones documented in the System Management Services (SMS) 1.4 Installation Guide to install or upgrade SMS 1.4. Doing so could result in misconfiguration and loss of functionality.


SMS Documentation Notes

This section contains documentation notes that involve SMS on Sun Fire high-end systems.

Part Numbers

Software documentation for this release is provided at:

http://www.sun.com/products-n-solutions/hardware/docs/Servers/High-End_Servers/Sun_Fire_15K

These files are named by part number. For your convenience, here are the associated document titles:

817-3055-10.pdf - System Management Services (SMS) 1.4 Installation Guide (replaces 816-5320-10)

817-3056-10.pdf - System Management Services (SMS) 1.4 Administrator Guide (replaces 816-5318-10)

817-3057-10.pdf - System Management Services (SMS) 1.4 Reference Manual (replaces 816-5319-10)

817-3058-10.pdf - System Management Services (SMS) 1.4 Release Notes (replaces 816-5321-10)

817-3075-10.pdf - Sun Fire High End Systems Software Overview Guide (replaces 816-5322-10)

Documentation Errata

Example 1 in the testemail(1M) man page and command description in the System Management Services (SMS) 1.4 Reference Manual omitted the full path to the testemail command. The correct command specification is as follows:

sc0:sms-user:> /opt/SUNWSMS/SMS1.4/lib/smsadmin/testemail -c fault.board.ex.l1l2,fault.board.io.l1l2 -dD -i EX7,IO8

See BugID 4934058.