CHAPTER 5
Dynamic Reconfiguration on Sun Fire High-End Systems
This chapter describes major domain-side dynamic reconfiguration (DR) bugs on Sun Fire high-end (Sun Fire E25K/E20K/15K/12K) systems running Solaris 9 9/05 software. It includes the known bugs at the time of this release.
For information about SMS-side DR bugs, see the SMS Release Notes for the version of SMS running on your system.
Description: When a DR command is executing on a system configured with the Freshchoice card (also called SunSwift PCI card, Option 1032), the system might display messages similar to the following:
These messages are benign; the DVMA space is properly refreshed during the DR operation. No true kernel memory leak occurs. This bug affects domains running both Solaris 8 and Solaris 9 operating environments.
Workaround: No workaround is necessary, but to prevent the message from displaying, add the following line to /etc/system:
Description: A cfgadm(1M) unconfigure operation on permanent memory might hang if the glm driver is active on the system. The problem is specific to DR operations involving permanent memory, which require that the system be quiesced by means of suspend/resume. The problem lies with the glm driver. This bug affects domains running both Solaris 8 and Solaris 9 operating environments.
Workaround: Do not unconfigure permanent memory in the system if the glm driver is active.
Description: Unconfiguring a hsPCI or hsPCI+ I/O board while a PCI option card is being configured into it causes a system panic. For example, the panic would occur if the following commands were executed simultaneously. In this example, pcisch18:e03b1slot2 is one of the four PCI slots on IO3:
Workaround: Do not execute a PCI hotplug operation while a hsPCI or hsPCI+ I/O board is being unconfigured.
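One way to keep state-changing DR commands from overlapping is to run them all through a simple mutual-exclusion wrapper. The following is a minimal sketch, not part of Solaris or SMS: the function name with_dr_lock and the lock path are hypothetical, and the wrapper helps only if every administrator on the domain uses it for every state-changing cfgadm invocation.

```shell
# Hypothetical lock location; any directory path writable by the caller works.
DR_LOCK_DIR=${DR_LOCK_DIR:-/tmp/dr-operation.lock}

# Run the given command only if no other wrapped DR command is in progress.
# mkdir is atomic, so at most one caller can create the lock directory.
with_dr_lock() {
    if mkdir "$DR_LOCK_DIR" 2>/dev/null; then
        "$@"
        rc=$?
        rmdir "$DR_LOCK_DIR"
        return $rc
    fi
    echo "another DR operation holds $DR_LOCK_DIR; try again later" >&2
    return 1
}
```

A call takes the form with_dr_lock cfgadm followed by the usual options and attachment point. If the wrapped command is killed before it returns, the lock directory is left behind and must be removed by hand.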
Description: Under certain error conditions, using DR to unconfigure a processor can leave that processor in the powered-off state. If psradm(1M) is then used to transition the processor to the off-line state, a system panic may result. Factors contributing to the problem are that Solaris does not expect processors to be in the powered-off state long-term, and psradm(1M) does not allow transitioning of processors to the powered-off state.
Workaround: Do not use psradm(1M) to offline a processor that is in the powered-off state.
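The workaround can be applied by checking the state that psrinfo(1M) reports before calling psradm(1M). The following is a minimal sketch: the function names cpu_state and safe_offline are hypothetical, and the sketch assumes that psrinfo prints one line per processor with the processor ID in the first field and the state (for example, on-line, off-line, or powered-off) in the second. The state could still change between the check and the psradm call, so the check narrows the window but does not close it.

```shell
# Print the state field for processor $1 from psrinfo-style text on stdin.
cpu_state() {
    while read cs_id cs_state cs_rest; do
        if [ "$cs_id" = "$1" ]; then
            echo "$cs_state"
            return 0
        fi
    done
    return 1
}

# Offline processor $1 only if psrinfo does not report it as powered-off.
safe_offline() {
    so_state=`psrinfo | cpu_state "$1"`
    case "$so_state" in
        powered-off)
            echo "processor $1 is powered-off; not calling psradm" >&2
            return 1
            ;;
        "")
            echo "processor $1 not found in psrinfo output" >&2
            return 1
            ;;
        *)
            psradm -f "$1"
            ;;
    esac
}
```

For example, safe_offline 3 takes processor 3 off-line only when psrinfo reports a state other than powered-off for it.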
Description: Sending a catchable signal, such as SIGINT sent by CTRL-C, to one or more cfgadm instances can cause those instances to hang. The problem is more likely to occur when multiple cfgadm processes are running, and can affect cfgadm instances on system boards, processors, I/O boards, and PCI slot attachment points. The problem has not been observed with a SIGKILL, and does not affect cfgadm status commands.
Workaround: None. To avoid this bug, do not send a catchable signal to a cfgadm process invoked to change the state of a component; for example, one executed with its -c or -x option.
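To reduce the chance of an accidental CTRL-C reaching a state-changing cfgadm process, the command can be started with the common catchable signals already ignored. The following is a minimal sketch, not a fix for the bug: the function name run_shielded is hypothetical, and the approach assumes that cfgadm does not install its own handlers for these signals, which this document does not confirm. Signals not listed in the trap, and SIGKILL, are unaffected.

```shell
# Run a command with SIGHUP (1), SIGINT (2), SIGQUIT (3), and SIGTERM (15)
# ignored. The subshell confines the trap, so the caller's own signal
# handling is unchanged. Ignored signals stay ignored across exec.
run_shielded() {
    (
        trap '' 1 2 3 15
        exec "$@"
    )
}
```

A call takes the form run_shielded cfgadm followed by the usual -c or -x option and attachment point. Status commands do not need the wrapper, because the problem does not affect them.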
Description: If nonpermanent memory is unconfigured, the system removes retired pages from the retired pages list to prevent them from becoming dangling pages, that is, pages that point to physical memory that has been unconfigured. When permanent memory is unconfigured, a target board is identified and unconfigured first. Once the target board is ready, the contents of the source board (the permanent memory) are copied to the target board. The memory controllers on the target board are then "renamed" (programmed) with the same address range as the source board. This means that if the source board contained any retired pages, these pages would not be dangling pages after the rename. They would point to valid addresses, but the physical memory behind those addresses is now on the target board. The problem is that this physical memory is probably good (does not contain ECC errors), so pages backed by good memory remain retired.
Description: The automatic page removal feature may result in removal of a good page after a DR operation.
Copyright © 2006, Sun Microsystems, Inc. All Rights Reserved.