Solaris 8 2/02 Release Notes Supplement for Sun Hardware

Chapter 5 Open Issues for Sun Fire 6800/4810/4800/3800 Systems

This chapter describes open issues related to the use of the Solaris operating environment on Sun Fire 6800/4810/4800/3800 systems.

Sun Fire 6800/4810/4800/3800 Systems

This section provides information on using the Solaris operating environment on these systems.

Displaying System Configuration Information

The prtdiag command is one of the Solaris operating environment commands that display system configuration parameters. The information about this command in the Sun Hardware Platform Guide for this operating system release is incorrect. The corrected information follows.

The Solaris operating environment prtdiag(1M) command displays the following information for the domain of your Sun Fire 6800/4810/4800/3800 system:

Dynamic Reconfiguration on Sun Fire 6800/4810/4800/3800 Systems

Dynamic reconfiguration (DR) is supported for Solaris 8 2/02. This section includes open issues for DR on the Sun Fire 6800/4810/4800/3800 systems at the time of this release.


Note –

For information on the system controller firmware that contains DR functionality, refer to the Sun Fire 6800/4810/4800/3800 Systems Software Release Notes included with the 5.12.6 firmware release. This firmware and its related documentation are included in SunSolve patch 112127-02, which is available on the SunSolve web site (http://sunsolve.Sun.com).


These release notes for DR on Sun Fire 6800, 4810, 4800, and 3800 systems cover the following topics:

System-Specific DR Support

The cfgadm command shows system-specific DR support on Sun Fire 6800/4810/4800/3800 systems. System boards are shown as class “sbd,” and CompactPCI (cPCI) cards as class “pci.” Users of DR through the cfgadm interface also see other DR classes.

For more information about system-specific problems with DR, see Known Dynamic Reconfiguration Bugs.

To view the classes that are associated with attachment points, run the following command as superuser:

# cfgadm -s "cols=ap_id:class"

Dynamic attachment points can also be listed by adding the -a option to the cfgadm command. To determine the class of a specific attachment point, add its attachment point ID as an argument to the preceding command.
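To illustrate how the class column can be used, the following Python sketch groups attachment points by class from text captured with cfgadm -s "cols=ap_id:class". It assumes the output is a header line followed by one whitespace-separated ap_id/class pair per line. The sample text, including the cPCI attachment point name, is hypothetical and was not captured from a real system.

```python
from collections import defaultdict

# Hypothetical sample of `cfgadm -s "cols=ap_id:class"` output (illustrative only).
SAMPLE = """\
Ap_Id                Class
N0.SB0               sbd
N0.SB2               sbd
pcisch0:e00b1slot1   pci
"""


def classes_by_ap_id(text):
    """Map each attachment point ID to its class, skipping the header line."""
    result = {}
    for line in text.splitlines()[1:]:  # assumption: first line is the column header
        fields = line.split()
        if len(fields) >= 2:
            result[fields[0]] = fields[1]
    return result


def ap_ids_by_class(text):
    """Group attachment point IDs by class, e.g. all "sbd" system boards."""
    groups = defaultdict(list)
    for ap_id, cls in classes_by_ap_id(text).items():
        groups[cls].append(ap_id)
    return dict(groups)


if __name__ == "__main__":
    print(ap_ids_by_class(SAMPLE))
```

Parsing captured text, rather than calling cfgadm directly, keeps the sketch testable on any host.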

Dynamic Reconfiguration Software Installation Instructions

The following software supports DR on a Sun Fire system: the Solaris 8 2/02 operating environment and version 5.12.6 of the system firmware.

Optionally, you can also install Sun Management Center (SunMC). Refer to the Sun Management Center 3.0 Supplement for Sun Fire 6800/4810/4800/3800 Systems for complete instructions.

Upgrading the System Firmware

The Sun Fire system firmware is upgraded over an FTP or HTTP connection from a server on which the firmware image is stored. Refer to the Sun Fire 6800/4810/4800/3800 Systems Platform Administration Manual for more information.


Note –

Additional information about installing the firmware patch is available in the README and Install.info files that accompany the patch.



Caution –

Do not update the system controller firmware without updating the firmware for all your CPU/Memory boards and I/O assemblies. If the firmware for your CPU/Memory boards and I/O assemblies is different from the system controller firmware, you may not be able to boot your domains.
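The caution above reduces to a consistency rule: every CPU/Memory board and I/O assembly should run the same firmware version as the system controller. The Python sketch below checks that rule against a hypothetical mapping of component names to version strings. How those versions are collected from the system controller is not shown and is left as an assumption.

```python
def mismatched_components(sc_version, component_versions):
    """Return, sorted, the components whose firmware differs from the SC firmware."""
    return sorted(name for name, version in component_versions.items()
                  if version != sc_version)


if __name__ == "__main__":
    # Hypothetical, illustrative versions; not read from a real system.
    boards = {"SB0": "5.12.6", "SB2": "5.12.6", "IB6": "5.12.5"}
    mismatched = mismatched_components("5.12.6", boards)
    if mismatched:
        print("Firmware differs from the SC; domains may not boot:", ", ".join(mismatched))
```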


To Upgrade the System Firmware
  1. Set up the FTP or HTTP server.

    For more information, see Appendix B of the Sun Fire 6800/4810/4800/3800 Systems Platform Administration Manual (part number 805-7373-13).

  2. Download the 5.12.6 firmware.

    This firmware and its related documentation are included in SunSolve patch 112127-02, which is available on the SunSolve Web site at:

    http://sunsolve.Sun.COM/pub-cgi/show.pl?target=patches/patch-access

  3. Copy the patch onto the FTP or HTTP server using a command such as the following:

    # cp /patch_location/* /export/ftp/pub/5.12.6

  4. Connect to the system controller console (serial port) so that you can monitor the system while you upgrade the firmware (step 9).

    The prompt for the system controller is:

    schostname:SC>

  5. Shut down all the domains by halting the Solaris operating environment.

    The keyswitch remains in the on position in these domains.

  6. In each domain that you shut down in step 5, set the keyswitch position to standby:

    schostname:A> setkeyswitch standby

  7. Verify that all the CPU/Memory boards and I/O assemblies are powered on by running the showboards command on the system controller in the platform shell:

    schostname:SC> showboards

  8. If any CPU/Memory boards or I/O assemblies are not powered on, use the poweron command on the system controller in the platform shell to power on those components:

    schostname:SC> poweron component_names

  9. Upgrade the firmware by using the flashupdate command on the system controller in the platform shell.


    Caution –

    Do not power down the system or reset the system while performing this step.


    Use the command syntax appropriate to the URL protocol:

    schostname:SC> flashupdate -f URL all

    The flashupdate command upgrades the firmware on the CPU/Memory boards and I/O assemblies, as well as scapp and RTOS, and reboots the system controller.


    Note –

    If the system controller is running scapp 5.12.5 or later and RTOS 18 or later, the upgrade procedure updates scapp and RTOS only if the image to be installed differs from the image currently installed.


  10. After the system controller reboots successfully, connect to each domain console and power off all the CPU/Memory boards and I/O assemblies by setting the keyswitch position to off:

    schostname:A> setkeyswitch off

  11. Verify that all the CPU/Memory boards and I/O assemblies are powered off by running the showboards command on the system controller in the platform shell:

    schostname:SC> showboards

  12. If any CPU/Memory boards or I/O assemblies are not powered off, use the poweroff command on the system controller in the platform shell to power off those components:

    schostname:SC> poweroff component_names

  13. Bring up each domain by setting the keyswitch position to on:

    schostname:A> setkeyswitch on

  14. After all the domains have been brought up, update the configuration backup of the system controller by using the dumpconfig command:

    schostname:SC> dumpconfig -f URL

    where URL must specify the FTP protocol.
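The note in step 9 describes when flashupdate skips rewriting scapp and RTOS. A minimal Python sketch of that decision follows. The function name and the image identifiers (for example, checksums) are hypothetical, and the behavior for older scapp/RTOS versions (always update) is an assumption, because the note only describes the newer versions.

```python
def parse_version(text):
    """Turn a dotted version string such as "5.12.5" into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def scapp_rtos_rewritten(current_scapp, current_rtos, installed_image, new_image):
    """Return True if flashupdate is expected to rewrite scapp and RTOS.

    current_scapp: installed scapp version string, e.g. "5.12.5"
    current_rtos:  installed RTOS version as an integer, e.g. 18
    installed_image, new_image: hypothetical image identifiers (e.g. checksums)
    """
    if parse_version(current_scapp) >= (5, 12, 5) and current_rtos >= 18:
        # Per the note: newer SC software skips an image identical to the installed one.
        return installed_image != new_image
    # Assumption: older SC software always rewrites scapp and RTOS.
    return True
```

Comparing versions as integer tuples avoids the string-comparison trap where "5.12.10" would sort before "5.12.5".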

Known DR Limitations

This section contains known DR software limitations of the Sun Fire 6800, 4810, 4800, and 3800 systems.

General DR Limitations

Limitations Specific to CompactPCI

Procedures for Bringing a cPCI Network Interface (IPMP) Online or Offline

To Take a cPCI Network Interface (IPMP) Offline and Remove It
  1. Retrieve the group name, test address, and interface index by typing the following command:

    # ifconfig interface

    For example, ifconfig hme0

  2. Use the if_mpadm(1M) command as follows:

    # if_mpadm -d interface

    This command takes the interface offline and moves its failover addresses to another active interface in the group. If the interface is already in a failed state, this step simply marks the interface as offline.

  3. (Optional) Unplumb the interface.

    This step is required only if you want to use DR to reconfigure the interface automatically at a later time.

  4. Remove the physical interface.

    Refer to the cfgadm(1M) man page and the Sun Fire 6800, 4810, 4800 and 3800 Systems Dynamic Reconfiguration User Guide for more information.
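Step 1 of the offline procedure reads the group name, test address, and interface index from ifconfig output. The Python sketch below extracts those fields from captured text. It assumes the usual Solaris layout (a header line containing flags=...<...> and index N, followed by indented inet and groupname lines) and treats the address of any logical interface flagged NOFAILOVER as a test address. The sample output is illustrative only.

```python
import re
from dataclasses import dataclass

# Illustrative sample only; not captured from a real system.
SAMPLE = """\
hme0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
        inet 192.168.10.5 netmask ffffff00 broadcast 192.168.10.255
        groupname ipmp0
hme0:1: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 2
        inet 192.168.10.6 netmask ffffff00 broadcast 192.168.10.255
"""

HEADER = re.compile(
    r"^(?P<name>\S+): flags=[0-9a-fA-F]+<(?P<flags>[^>]*)>.*?\bindex (?P<index>\d+)"
)


@dataclass
class Interface:
    name: str
    flags: set
    index: int
    inet: str = ""
    group: str = ""


def parse_ifconfig(text):
    """Split ifconfig output into one Interface per header line."""
    interfaces = []
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            interfaces.append(Interface(header["name"],
                                        set(header["flags"].split(",")),
                                        int(header["index"])))
            continue
        fields = line.split()
        if len(fields) < 2 or not interfaces:
            continue
        if fields[0] == "inet":
            interfaces[-1].inet = fields[1]
        elif fields[0] == "groupname":
            interfaces[-1].group = fields[1]
    return interfaces


def ipmp_summary(text, physical):
    """Return (group name, test addresses, interface index) for one physical interface."""
    mine = [i for i in parse_ifconfig(text)
            if i.name == physical or i.name.startswith(physical + ":")]
    group = next((i.group for i in mine if i.group), "")
    tests = [i.inet for i in mine if "NOFAILOVER" in i.flags]
    index = mine[0].index if mine else None
    return group, tests, index


if __name__ == "__main__":
    print(ipmp_summary(SAMPLE, "hme0"))
```

Matching on physical + ":" keeps logical interfaces such as hme0:1 while excluding an unrelated interface such as hme01.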

To Attach and Bring Online a cPCI Network Interface (IPMP)
  1. Attach the physical interface.

    Refer to the cfgadm(1M) man page and the Sun Fire 6800, 4810, 4800 and 3800 Systems Dynamic Reconfiguration User Guide for more information.

After you attach the physical interface, it is automatically configured using settings in the hostname configuration file (/etc/hostname.interface, where interface is a value such as hme1 or qfe2).

This causes the in.mpathd daemon to resume probing and detect the repair. As a result, in.mpathd fails the original IP addresses back to this interface. The interface should now be online and ready for use under IPMP.


Note –

If the interface was not unplumbed and set to OFFLINE status before it was detached, the attach operation described here does not configure it automatically. To return the interface to ONLINE status and fail back its IP addresses after the physical attach is complete, enter the following command: if_mpadm -r interface


Operating System Quiescence

This section discusses permanent memory and the requirement to quiesce the operating system when unconfiguring a system board that has permanent memory.

A quick way to determine whether a board has permanent memory is to run the following command as superuser:

# cfgadm -av | grep permanent

The system responds with output such as the following, which describes system board 0 (zero):

N0.SB0::memory connected configured ok base address 0x0, 4194304 KBytes total, 668072 KBytes permanent
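A script can make the same check as the grep command by parsing the memory lines of cfgadm -av. The Python sketch below assumes lines shaped like the example above (an ap_id ending in ::memory, followed by a KBytes total value and, optionally, a KBytes permanent value). Any board that reports permanent memory requires quiescence to unconfigure.

```python
import re

# Memory line in the format shown above (ap_id ending in "::memory").
MEMORY_LINE = re.compile(
    r"^(?P<ap_id>\S+)::memory\b.*?(?P<total>\d+) KBytes total"
    r"(?:, (?P<permanent>\d+) KBytes permanent)?"
)


def memory_info(line):
    """Return (board ap_id, total KB, permanent KB) for a memory line, or None."""
    m = MEMORY_LINE.search(line)
    if m is None:
        return None
    return m["ap_id"], int(m["total"]), int(m["permanent"] or 0)


def requires_quiescence(line):
    """True if the board described by this line has permanent memory."""
    info = memory_info(line)
    return info is not None and info[2] > 0


if __name__ == "__main__":
    example = ("N0.SB0::memory connected configured ok base address 0x0, "
               "4194304 KBytes total, 668072 KBytes permanent")
    print(memory_info(example), requires_quiescence(example))
```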

Permanent memory holds the Solaris kernel and its data. User processes on other boards can release memory by paging out to the swap device, but the kernel cannot be released from memory that way. Instead, cfgadm uses the copy-rename technique to release the memory.

The first step in a copy-rename operation is to stop all memory activity on the system by pausing all I/O operations and thread activity; this is known as quiescence. During quiescence, the system is frozen and does not respond to external events such as network packets. The duration of the quiescence depends on two factors: how many I/O devices and threads need to be stopped, and how much memory needs to be copied. The number of I/O devices is typically the dominant factor, because each I/O device must be paused and then unpaused. Typically, a quiescent state lasts longer than two minutes.

Because quiescence has a noticeable impact, cfgadm requests confirmation before effecting quiescence. If you enter:

# cfgadm -c unconfigure N0.SB0

The system responds with a prompt for confirmation:

System may be temporarily suspended, proceed (yes/no)?

If you are using SunMC to perform the DR operation, a pop-up window displays this prompt.

Enter yes to confirm that the impact of quiescence is acceptable and to proceed.

Dynamic Reconfiguration Software Bugs

This section contains the synopses and Sun BugID numbers of the more important bugs discovered during DR testing. This list does not include all bugs.

Known Dynamic Reconfiguration Bugs

cryptorand Exited After Removing CPU Board With Dynamic Reconfiguration (BugID 4456095)

Description: If a system is running the cryptorand process, which is found in the SUNWski package, unconfiguring memory (for example, as part of disconnecting a CPU/Memory (SB) board) causes cryptorand to exit and log messages to /var/adm/messages. Because this denies random number services to secure subsystems, memory that was present when cryptorand was started should not be unconfigured.

The cryptorand process supplies random numbers to /dev/random. After cryptorand is started, the time before /dev/random becomes available depends on the amount of memory in the system: about two minutes per GB of memory. Applications that use /dev/random to get random numbers may block temporarily during this period. It is not necessary to restart cryptorand if a CPU/memory board is added to a domain.
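The "two minutes per GB" rule of thumb can be turned into a rough estimate. In this Python sketch, the input is the per-board KBytes total values for the boards in the domain (for example, as reported by cfgadm -av), and GB is assumed to mean 2^30 bytes. The result is an approximation, not a guaranteed delay.

```python
KB_PER_GB = 1024 * 1024   # assumption: GB means 2**30 bytes, i.e. 1,048,576 KB
MINUTES_PER_GB = 2.0      # "about two minutes per GB of memory"


def cryptorand_warmup_minutes(board_total_kbytes):
    """Rough estimate of the delay before /dev/random becomes available.

    board_total_kbytes is an iterable of per-board "KBytes total" values
    for the boards in the domain.
    """
    total_kbytes = sum(board_total_kbytes)
    return MINUTES_PER_GB * total_kbytes / KB_PER_GB


if __name__ == "__main__":
    # Two boards of 4194304 KBytes (4 GB) each: roughly 16 minutes.
    print(cryptorand_warmup_minutes([4194304, 4194304]))
```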

Workaround: If a CPU/memory board is removed from the domain, restart cryptorand by entering the following command as superuser:

# sh /etc/init.d/cryptorand start

SBM Sometimes Causes System Panic During DR Operations (BugID 4506562)

Description: A panic may occur when a system board that contains CPUs is removed from the system while Solaris Bandwidth Manager (SBM) is in use.

Workaround: Do not install SBM on systems that will be used for DR trials, and do not perform CPU system board DR operations on systems with SBM installed.

DR Hangs During Configure Operation With IB Board With vxdmpadm policy=check_all (BugID 4509462)

Description: A DR configure operation with an IBx (I/O) board hangs after a few successful iterations. This situation occurs when the DR operation runs concurrently with the DMP daemon while the daemon is applying the check_all policy at a set time interval.

Workaround: To avoid the deadlock between the DMP daemon and system board DR, enter the following command before performing DR operations. This command stops and restarts the DMP daemon.

# /usr/sbin/vxdmpadm stop restore

Unable to Disconnect SCSI Controllers Using DR (BugID 4446253)

Description: When a SCSI controller is configured but not busy, it cannot be disconnected using the DR cfgadm(1M) command.

Workaround: None.

cfgadm_sbd Plugin in Multi-Threaded Environment Is Broken (BugID 4498600)

Description: When a multi-threaded client of the cfgadm library issues concurrent sbd requests, the system may hang.

Workaround: None. Currently, no existing applications use the cfgadm library in a multithreaded manner.

DR Operations Hang After a Few Loops When CPU Power Control Is Also Running (BugID 4114317)

Description: When multiple concurrent DR operations occur, or when psradm is run at the same time as a DR operation, the system can hang because of a mutex deadlock (deadly embrace).

Workaround: Perform DR operations serially (one DR operation at a time), and allow each to complete successfully before running psradm or beginning another DR operation.
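One way to enforce this workaround in scripts is to wrap every DR-related command in an exclusive file lock. The Python sketch below does this with fcntl.flock. The lock file path and the helper name are hypothetical, and the lock serializes only scripts that use it, not commands typed by hand or issued through SunMC.

```python
import fcntl
import subprocess

# Hypothetical lock file: not part of Solaris. Only scripts that use this
# same path are serialized by it.
LOCK_PATH = "/tmp/dr-serialize.lock"


def run_serialized(argv, lock_path=LOCK_PATH):
    """Run one DR-related command while holding an exclusive lock.

    Other callers block in flock() until this command has finished, so at most
    one such command runs at a time. check=True raises CalledProcessError if the
    command fails, so the caller stops instead of starting the next operation.
    """
    with open(lock_path, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # wait for any current holder
        try:
            return subprocess.run(argv, check=True)
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```

For example, run_serialized(["cfgadm", "-c", "unconfigure", "N0.SB0"]) returns only after that cfgadm command completes, and raises subprocess.CalledProcessError if it fails, so a script stops before it starts the next DR operation or runs psradm.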

SC Console Bus ERROR Seen While SNMP Enabled and Running DR Suite (BugID 4485505)

Description: A console bus error message is occasionally generated during SNMP get operations on the cpuModDescr object. This occurs infrequently, and only when SunMC is monitoring a system. When the message does occur, unknown is returned to SunMC as the value of the cpuModDescr object.

Workaround: The only workaround is to not use SunMC. However, the message is harmless, and the problem occurs rarely, so it is safe simply to ignore it. The only risk is that the SunMC GUI may occasionally display the wrong value for cpuModDescr.

System May Panic When send_mondo_set Times Out (BugID 4518324)

Description: A Sun Fire system may panic if one or more of the CPU boards are sync paused during a DR operation. Sync pause is required to attach or detach boards. If mondo interrupts are outstanding and, for any reason, the SC cannot complete the sync pause within the one-second send_mondo timeout limit, the system panics.