Solaris 8 2/02 Release Notes Supplement for Sun Hardware

Dynamic Reconfiguration Software Bugs

This section contains the synopses and Sun BugID numbers of the more important bugs that have been discovered during testing of DR. This list does not include all bugs.

Known Dynamic Reconfiguration Bugs

cryptorand Exited After Removing CPU Board With Dynamic Reconfiguration (BugID 4456095)

Description: If a system is running the cryptorand process (part of the SUNWski package), unconfiguring memory, for example as part of a CPU/Memory (SB) board disconnect, causes cryptorand to exit and log messages in /var/adm/messages. Secure subsystems then lose random number services. Avoid unconfiguring any memory that was present when cryptorand was started.

The cryptorand process supplies random numbers for /dev/random. After cryptorand starts, the time before /dev/random becomes available depends on the amount of memory in the system: about two minutes per GB. Applications that read random numbers from /dev/random may block temporarily during this period. It is not necessary to restart cryptorand if a CPU/memory board is added to a domain.

Workaround: If a CPU/memory board is removed from the domain, restart cryptorand by entering the following command as superuser:

# sh /etc/init.d/cryptorand start
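
The restart and the expected delay can be combined in a small helper. The following is a minimal sketch in plain Bourne shell syntax, not a supported Sun utility: the function names and the CRYPTORAND_INIT override are hypothetical, and the wait estimate simply applies the "about two minutes per GB" figure from the description above.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of Solaris.

# Estimate the minutes until /dev/random is available again, using the
# "about two minutes per GB" figure from the bug description.
estimate_wait_minutes() {
    mem_gb=$1
    expr "$mem_gb" \* 2
}

# Restart cryptorand after a CPU/memory board has been removed.
# CRYPTORAND_INIT may be overridden for dry runs; the default is the
# init script named in the workaround.
restart_cryptorand() {
    mem_gb=$1
    init_script=${CRYPTORAND_INIT:-/etc/init.d/cryptorand}
    sh "$init_script" start || return 1
    minutes=`estimate_wait_minutes "$mem_gb"`
    echo "cryptorand restarted; /dev/random may block for about $minutes minutes"
}
```

For example, on a domain left with 8 GB of memory, running restart_cryptorand 8 as superuser would restart the process and report an estimated wait of about 16 minutes.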

SBM Sometimes Causes System Panic During DR Operations (BugID 4506562)

Description: A panic may occur when a system board that contains CPUs is removed from the system while Solaris Bandwidth Manager (SBM) is in use.

Workaround: Do not install SBM on systems that will be used for DR trials, and do not perform CPU system board DR operations on systems with SBM installed.

DR Hangs During Configure Operation With IB Board With vxdmpadm policy=check_all (BugID 4509462)

Description: A DR configure operation hangs with an IBx (I/O) board after a few successful iterations. This situation occurs when the DR operation is executed concurrently with the DMP daemon that is implementing the policy check_all with a time interval.

Workaround: To avoid the deadlock between the DMP daemon and system board DR, enter the following command before performing DR operations. This command stops and restarts the DMP daemon.

# /usr/sbin/vxdmpadm stop restore
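
To make sure the command above is always issued before a DR operation, it can be wrapped in a helper. This is a sketch under assumptions: the function name and the VXDMPADM override are hypothetical, and the helper does nothing beyond running the command from the workaround and then the DR command passed to it.

```shell
#!/bin/sh
# Hypothetical wrapper: issue the vxdmpadm command from the workaround,
# then run the DR command given as arguments.
# VXDMPADM may be overridden for dry runs.
dr_after_dmp_stop() {
    vxdmpadm=${VXDMPADM:-/usr/sbin/vxdmpadm}
    if "$vxdmpadm" stop restore; then
        "$@"
    else
        echo "vxdmpadm stop restore failed; DR command not run" >&2
        return 1
    fi
}
```

A DR configure operation would then be started as dr_after_dmp_stop cfgadm -c configure followed by the attachment point ID of the board. The DR command is skipped if the vxdmpadm call fails.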

Unable to Disconnect SCSI Controllers Using DR (BugID 4446253)

Description: When a SCSI controller is configured but not busy, it cannot be disconnected using the DR cfgadm(1M) command.

Workaround: None.

cfgadm_sbd Plugin in Multi-Threaded Environment Is Broken (BugID 4498600)

Description: When a multi-threaded client of the cfgadm library issues concurrent sbd requests, the system may hang.

Workaround: None. Currently, no known applications use the cfgadm library from multiple threads.

DR Operations Hang After a Few Loops When CPU Power Control Is Also Running (BugID 4114317)

Description: When multiple concurrent DR operations occur, or when psradm is run at the same time as a DR operation, the system can hang because of a mutex deadlock (deadly embrace).

Workaround: Perform DR operations serially, one at a time. Allow each operation to complete successfully before running psradm or beginning another DR operation.
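
One way to enforce serial execution from scripts is a simple lock around each command. The sketch below is an illustration only: the function name, the DR_LOCK_DIR variable, and the lock path are hypothetical, and the lock protects only commands that are started through this helper. It relies on mkdir being atomic, so only one caller at a time can create the lock directory.

```shell
#!/bin/sh
# Hypothetical helper: run one DR or psradm command at a time.
# The lock path is an assumption; choose one that suits the site.
DR_LOCK_DIR=${DR_LOCK_DIR:-/tmp/dr_serial.lock}

run_serialized() {
    # mkdir fails if the directory already exists, so it acts as a lock.
    until mkdir "$DR_LOCK_DIR" 2>/dev/null; do
        sleep 5     # another operation holds the lock; wait and retry
    done
    "$@"
    status=$?
    rmdir "$DR_LOCK_DIR"    # release the lock for the next caller
    return $status
}
```

Both the DR commands and psradm would be started through run_serialized, so that neither begins while the other is still running. If a command is killed before it releases the lock, the lock directory must be removed by hand.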

SC Console Bus ERROR Seen While SNMP Enabled and Running DR Suite (BugID 4485505)

Description: A console bus error message is occasionally generated during SNMP get operations on the cpuModDescr object. This occurs infrequently, and only when SunMC is monitoring a system. When the message does occur, unknown is returned to SunMC as the value of the cpuModDescr object.

Workaround: The only workaround is to not use SunMC. However, the message is harmless and the problem is rare, so it is safe to ignore. The only risk is that the SunMC GUI may occasionally display the wrong value for cpuModDescr.

System May Panic When send_mondo_set Times Out (BugID 4518324)

Description: A Sun Fire system may panic if one or more of the CPU boards are sync paused during a DR operation. Sync pause is required to attach or detach boards. If there are outstanding mondo interrupts, and for any reason the SC is not able to complete sync pause within the one-second send_mondo timeout limit, the system panics.