CHAPTER 4
Dynamic Reconfiguration on Sun Fire Midrange Systems
This chapter describes major issues related to dynamic reconfiguration (DR) on Sun Fire midrange (E6900/E4900/6800/4810/4800/3800) systems running Solaris 9 4/04 software.
This section contains general information about DR on Sun Fire midrange systems.
TABLE 4-1 shows the combinations of Solaris 9 software and system controller (SC) firmware with which each Sun Fire midrange system supports DR. For a platform in the first column running the Solaris release in the second column, the third column of the same row gives the minimum SC firmware release.
The cfgadm command shows Sun Fire midrange system boards as class "sbd" and CompactPCI (cPCI) cards as class "pci".
For more information about system-specific issues with DR, see Known DR Limitations.
To view the classes that are associated with attachment points, run the following command as superuser:
To also list the dynamic attachment points and their classes, add the cfgadm command's -a option as an argument to the preceding command.
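To collect the attachment point classes programmatically, for example to pick out every "sbd" board before a scripted DR operation, the listing can be parsed. The following Python sketch assumes the listing is produced with the cfgadm `-s "noheadings,cols=ap_id:class"` listing options, which print only the Ap_Id and Class columns without a header. The function names are hypothetical.

```python
import subprocess
from typing import Dict, List


def parse_ap_classes(listing: str) -> Dict[str, str]:
    """Map each attachment point ID to its class.

    Assumes headerless output with exactly two whitespace-separated
    columns (Ap_Id, Class); lines without a class value are skipped.
    """
    classes: Dict[str, str] = {}
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) == 2:
            ap_id, ap_class = fields
            classes[ap_id] = ap_class
    return classes


def list_ap_classes(include_dynamic: bool = False) -> Dict[str, str]:
    """Run cfgadm (as superuser) and return {ap_id: class}."""
    cmd: List[str] = ["cfgadm"]
    if include_dynamic:
        cmd.append("-a")  # also list dynamic attachment points
    cmd += ["-s", "noheadings,cols=ap_id:class"]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_ap_classes(result.stdout)
```

For example, `[ap for ap, c in list_ap_classes().items() if c == "sbd"]` would select the system boards.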
For information about using Sun Management Center (Sun MC) with your Sun Fire midrange system, refer to the Sun Management Center Supplement for Sun Fire Midrange Systems.
You can upgrade the system firmware of your Sun Fire midrange system by connecting to an FTP or HTTP server where the firmware images are stored. For more information, refer to the README and Install.info files included in the firmware release(s) running on your domains. You can download Sun patches from http://sunsolve.sun.com.
This section contains known software limitations of DR on Sun Fire midrange systems.
Before performing any DR operation on an I/O (IBx) board, enter the following command to stop the vold daemon:
After the DR operation has successfully completed, enter the following command to restart the vold daemon:
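The stop/DR/restart sequence can be scripted so that vold is not left stopped by mistake. This Python sketch wraps the DR operation in a context manager. It assumes that vold is controlled by the Solaris 9 `/etc/init.d/volmgt` script. The `run` parameter and all function names are hypothetical; `run` is injected so that the sequencing can be checked without a Sun Fire system.

```python
import subprocess
from contextlib import contextmanager
from typing import Callable, Iterator, List

Runner = Callable[[List[str]], None]


def run_checked(cmd: List[str]) -> None:
    """Run a command and raise CalledProcessError if it exits non-zero."""
    subprocess.run(cmd, check=True)


@contextmanager
def vold_stopped(run: Runner = run_checked) -> Iterator[None]:
    """Keep vold stopped for the duration of an IBx board DR operation.

    Assumption: vold is managed by the /etc/init.d/volmgt script.
    """
    run(["sh", "/etc/init.d/volmgt", "stop"])
    try:
        yield
    finally:
        # Design choice of this sketch: restart vold even if DR failed.
        run(["sh", "/etc/init.d/volmgt", "start"])


def dr_io_board(ap_id: str, operation: str, run: Runner = run_checked) -> None:
    """Run 'cfgadm -c <operation> <ap_id>' on an I/O board with vold stopped."""
    with vold_stopped(run):
        run(["cfgadm", "-c", operation, ap_id])
```

The source only calls for restarting vold after a successful operation. Restarting it in the `finally` clause is an added choice, so that a failed DR attempt does not leave removable-media management disabled.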
On Sun Fire midrange systems, DR supports neither SAI/P (BugID 4466378) nor HIPPI/P. Earlier releases did not support the SunHSI/P driver; the bug that prevented support (BugID 4496362) is fixed in patch 106922 (2.0) and patch 109715 (3.0). For more information, see SunSolve.
You must run the devfsadm(1M) command to see any changes that have been made, especially changes from PCI to cPCI.
You can unconfigure a CompactPCI (cPCI) I/O assembly only if all the cards in the board are in an unconfigured state. If any cPCI card is busy (for example, because it has a plumbed interface that is up or a mounted disk), the board unconfigure operation fails with the status "busy". Unconfigure all cPCI cards before attempting to unconfigure the cPCI I/O assembly.
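The ordering rule above can be expressed as a small plan builder. The plan unconfigures every card that is not yet unconfigured, and then unconfigures the assembly. In this Python sketch, the caller supplies each card's occupant state as cfgadm reports it ("configured" or "unconfigured"). Gathering those states is not shown. The card IDs in the example are hypothetical placeholders, not real cPCI slot names.

```python
from typing import Dict, List


def busy_cards(card_states: Dict[str, str]) -> List[str]:
    """Return the cPCI cards whose occupant is not yet unconfigured."""
    return sorted(ap for ap, state in card_states.items() if state != "unconfigured")


def can_unconfigure_assembly(card_states: Dict[str, str]) -> bool:
    """A cPCI I/O assembly can be unconfigured only when every card is unconfigured."""
    return not busy_cards(card_states)


def unconfigure_plan(board: str, card_states: Dict[str, str]) -> List[List[str]]:
    """Build cfgadm commands: unconfigure each remaining card, then the board."""
    plan = [["cfgadm", "-c", "unconfigure", ap] for ap in busy_cards(card_states)]
    plan.append(["cfgadm", "-c", "unconfigure", board])
    return plan
```

Each card command in the plan can still fail with "busy", for example because of a mounted disk. A script would therefore need to stop at the first failure rather than go on to the board.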
When a multipath disk is connected to two cPCI cards, disk activity can appear across the cards when none is expected. For this reason, make sure that there is no activity on the local side of the resource before a DR operation. Even so, a DR operation on one of these cPCI cards can report a busy status although the local side of the resource is idle; in that case, a subsequent DR attempt might be required.
When you list the attachment points for a cPCI board with the cfgadm(1M) -a option, the cPCI slots and the PCI buses are all listed as attachment points. The cfgadm -a command displays a PCI bus attachment point as N0.IB8::pci0, and each cPCI board has four such attachment points. Do not perform DR operations on these bus attachment points or on the sghsc attachment point (displayed by cfgadm -a as N0.IB8::sghsc4). DR is not actually performed on them, and some internal resources are removed.
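A script that iterates over a cfgadm -a listing can filter these attachment points out before issuing any DR command. This Python sketch infers the naming pattern (`::pci<n>` for a bus, `::sghsc<n>` for sghsc) only from the two examples in the text, so treat the pattern as an assumption. The function names are hypothetical.

```python
import re
from typing import Iterable, List

# Inferred from the examples N0.IB8::pci0 (PCI bus) and N0.IB8::sghsc4.
# The pattern requires digits directly after "pci" or "sghsc", so other
# dynamic names that merely start with "pci" are not matched.
_NO_DR_AP = re.compile(r"::(?:pci|sghsc)\d+$")


def is_no_dr_attachment_point(ap_id: str) -> bool:
    """True for PCI bus and sghsc attachment points, which must not be DR targets."""
    return _NO_DR_AP.search(ap_id) is not None


def dr_candidates(ap_ids: Iterable[str]) -> List[str]:
    """Drop PCI bus and sghsc attachment points from a list of Ap_Ids."""
    return [ap for ap in ap_ids if not is_no_dr_attachment_point(ap)]
```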
For DR to function properly with cPCI cards, the levers on all cPCI cards that are inserted at Solaris boot time must be fully engaged.
Unconfiguring a cPCI card also disconnects it automatically. If autoconfigure is enabled, connecting a cPCI card also configures it; if autoconfigure is disabled, you must configure the card manually.
This section discusses permanent memory, and the requirement to quiesce the operating system when unconfiguring a system board that has permanent memory.
A quick way to determine whether a board has permanent memory is to run the following command as superuser:
The system responds with output such as the following, which describes system board 0 (zero):
N0.SB0::memory connected configured ok base address 0x0, 4194304 KBytes total, 668072 KBytes permanent
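To check several boards at once, the memory lines can be parsed. This Python sketch makes two assumptions. First, the verbose listing `cfgadm -av` prints each memory attachment point on one line, in the format shown above. Second, the amount of permanent memory appears as "<n> KBytes permanent". The function names are hypothetical.

```python
import re
import subprocess
from typing import Dict

# Matches the "668072 KBytes permanent" part of a memory line.
_PERMANENT = re.compile(r"(\d+) KBytes permanent")
_SUFFIX = "::memory"


def permanent_memory_kb(listing: str) -> Dict[str, int]:
    """Return {board Ap_Id: KBytes of permanent memory} for boards that have any."""
    result: Dict[str, int] = {}
    for line in listing.splitlines():
        if not line.strip():
            continue
        ap_id = line.split(maxsplit=1)[0]
        if not ap_id.endswith(_SUFFIX):
            continue  # only memory attachment points report permanent memory
        match = _PERMANENT.search(line)
        if match and int(match.group(1)) > 0:
            result[ap_id[: -len(_SUFFIX)]] = int(match.group(1))
    return result


def boards_with_permanent_memory() -> Dict[str, int]:
    """Run cfgadm -av (as superuser) and parse its memory lines."""
    out = subprocess.run(["cfgadm", "-av"], capture_output=True, text=True, check=True).stdout
    return permanent_memory_kb(out)
```

Applied to the sample line above, `permanent_memory_kb` returns `{"N0.SB0": 668072}`. Unconfiguring such a board therefore requires the quiesce described next.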
Permanent memory is where the Solaris kernel and its data reside. User processes on other boards can release memory by paging out to the swap device, but the kernel cannot be released from memory that way. Instead, cfgadm uses the copy-rename technique to release the memory.
The first step in a copy-rename operation is to stop all memory activity on the system by pausing all I/O operations and thread activity; this is known as quiescence. During quiescence the system is frozen and does not respond to external events such as network packets. The duration of the quiescence depends on two factors: how many I/O devices and threads need to be stopped, and how much memory needs to be copied. Typically, the number of I/O devices determines the required quiescent time, because each I/O device must be paused and then unpaused. A quiescent state usually lasts longer than two minutes.
Because quiescence has a noticeable impact, cfgadm requests confirmation before quiescing the system. If you enter:
The system responds with a prompt for confirmation:
If you use Sun Management Center to perform the DR operation, a pop-up window displays this prompt.
Enter Yes to confirm that the impact of the quiesce is acceptable, and to proceed.
This section lists important DR bugs.
Description: Sending a catchable signal, such as the SIGINT generated by CTRL-C, to one or more cfgadm instances can cause those instances to hang. The problem is more likely to occur when multiple cfgadm processes are running. It can affect cfgadm instances operating on system board, processor, I/O board, and PCI slot attachment points. The problem has not been observed with SIGKILL, and it does not affect cfgadm status commands.
Workaround: None. To avoid this bug, do not send a catchable signal to a cfgadm process invoked to change the state of a component; for example, one executed with its -c or -x option.
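When cfgadm is driven from a script that an operator might interrupt, one way to follow this advice is to shield state-changing invocations from a terminal CTRL-C. This approach is an assumption of this Python sketch, not a Sun-documented workaround. The sketch starts cfgadm in a new session, so that the terminal's SIGINT does not reach it. It also ignores SIGINT in the wrapper itself, so that the wrapper does not abandon or kill the child mid-operation. A `kill -INT` sent directly to the cfgadm process is not prevented. The function names are hypothetical, and the `-c`/`-x` test assumes that options are passed as separate words.

```python
import signal
import subprocess
from typing import List, Sequence

# cfgadm options that change component state, per the workaround text.
_STATE_CHANGING = ("-c", "-x")


def changes_state(args: Sequence[str]) -> bool:
    """True if a cfgadm argument list requests a state change (-c or -x)."""
    return any(arg.startswith(_STATE_CHANGING) for arg in args)


def run_cfgadm_shielded(args: List[str]) -> int:
    """Run cfgadm; keep terminal CTRL-C away from state-changing invocations.

    Must be called from the main thread, because it changes the SIGINT handler.
    """
    cmd = ["cfgadm", *args]
    if not changes_state(args):
        return subprocess.call(cmd)  # status commands are not affected by the bug
    previous = signal.signal(signal.SIGINT, signal.SIG_IGN)
    try:
        # New session: terminal-generated SIGINT goes to the foreground
        # process group of the terminal, which no longer includes cfgadm.
        child = subprocess.Popen(cmd, start_new_session=True)
        return child.wait()
    finally:
        signal.signal(signal.SIGINT, previous)
```

The sketch uses `subprocess.Popen` directly rather than `subprocess.run`, because `run` kills its child when the caller is interrupted. That is exactly the kind of abrupt interruption this workaround is meant to avoid.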
Description: A panic might occur when a system board that contains CPUs is removed from the system while Solaris Bandwidth Manager (SBM) is in use.
Workaround: Do not install SBM on systems that will be used for DR, and do not perform CPU system board DR operations on systems with SBM installed.
Description: A DR configure operation on an IBx (I/O) board hangs after a few successful iterations. This occurs when the DR operation runs concurrently with the DMP daemon while the daemon is running the check_all policy with a time interval.
Workaround: Install VM 3.2 Patch 01.
Description: On systems actively running Oracle/TPCC, DR CPU/memory board unconfigure operations might take an unusually long time to complete (up to 8 hours), and might also negatively impact Oracle performance.
Workaround: Do not perform CPU/memory board DR unconfigure operations while Oracle/TPCC is running.
Description: On Sun Fire midrange systems, a CompactPCI (cPCI) I/O board cannot be unconfigured when Port 0 (P0) on that board is disabled. This problem exists only on systems running Solaris 9 software, or Solaris 8 software with PatchID 108528-23. It occurs only during DR operations that involve cPCI boards, and produces an error message similar to the following:
# cfgadm -c unconfigure N0.IB7
cfgadm: Hardware specific failure: unconfigure N0.IB7: Device busy:/ssm@0,0/pci@1b,700000/pci@1
where N0.IB7 is a CompactPCI I/O board with P0 disabled.
Workaround: If you do not need to disable P0 itself, disable its slots instead.
Description: If a processor is transitioned from the powered-off to the off-line state with psradm(1M), a subsequent DR unconfigure operation on this processor can result in a system panic.
Workaround: Do not use psradm(1M) to offline a processor that is in the powered-off state.
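Scripts that take processors off-line can enforce this workaround with a guard. This Python sketch makes two assumptions. First, `psrinfo <id>` prints one line whose second whitespace-separated field is the processor state (for example, "on-line" or "powered-off"). Second, `psradm -f <id>` takes the processor off-line. The sketch refuses to run psradm when the state is "powered-off" or cannot be read. The function names are hypothetical.

```python
import subprocess
from typing import Optional


def processor_state(psrinfo_line: str) -> Optional[str]:
    """Extract the state column (e.g. "on-line", "powered-off") from psrinfo output."""
    fields = psrinfo_line.split()
    return fields[1] if len(fields) >= 2 else None


def safe_to_offline(psrinfo_line: str) -> bool:
    """Per the workaround, never off-line a powered-off processor; be conservative."""
    state = processor_state(psrinfo_line)
    return state is not None and state != "powered-off"


def offline_processor(cpu_id: int) -> None:
    """Take a processor off-line with psradm -f, unless it is powered off."""
    line = subprocess.run(
        ["psrinfo", str(cpu_id)], capture_output=True, text=True, check=True
    ).stdout
    if not safe_to_offline(line):
        raise RuntimeError(f"processor {cpu_id} is powered-off or unknown; not running psradm -f")
    subprocess.run(["psradm", "-f", str(cpu_id)], check=True)
```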
Copyright © 2004, Sun Microsystems, Inc. All rights reserved.