CHAPTER 1
Information on DR models
Details on getting started with DR configuration
An overview of DR configuration tasks
Prerequisite tasks to be completed before a DR detach operation
Details on the configuration changes that occur during DR detach operations and how to control certain conditions when a detach operation is in progress
Note - In this document, the phrase "DR detach operation" refers to the complete detach or removal of a system board. This detach operation can be accomplished by using the ADR command deleteboard(1M). For instructions on detaching boards from Solaris 9 domains (which support only DR model 3.0), refer to the Sun Enterprise 10000 Dynamic Reconfiguration User Guide (part number 816-3627-10).
There are two models of DR available for the Sun Enterprise 10000 system. DR model 2.0 is sometimes referred to as "legacy DR," and DR model 3.0 is referred to as "next generation DR." The following table shows the different versions of the Solaris operating environment and the SSP software that are used with DR models 2.0 and 3.0:
Only one model of DR can run within a domain at a time. To check which DR model is running, use the domain_status(1M) command with its -m option (available only on domains running version 3.5 of the SSP software). Verify the DR model before you execute any DR commands. The following is an example of the domain_status(1M) output. The DR-MODEL column indicates which model is enabled.
According to this output, domain A is running Solaris version 8 software (OS 5.8) with DR model 2.0 enabled; domain B is running Solaris version 8 software with DR model 3.0 enabled; domain C is running Solaris version 7 software (OS 5.7) with DR model 2.0 enabled; and domain D is running Solaris version 9 software (OS 5.9) with DR model 3.0 enabled.
Caution - Before you switch to DR 3.0 in a domain that is running the Solaris 8 10/01 operating environment, you must upgrade the SSP software to version 3.5 because previous versions of SSP do not support DR 3.0 operations.
For more information about using DR 2.0, see the Sun Enterprise 10000 Dynamic Reconfiguration (DR) User Guide (part number 806-7616-10). For more information about using DR 3.0, see the Sun Enterprise 10000 Dynamic Reconfiguration (DR) User Guide (part number 816-3627-10).
DR 3.0 has a framework that offers better integration with applications, through the Reconfiguration Coordination Manager.
DR 3.0 supports network multipathing using IPMP.
You execute DR operations from either of two places: from the system service processor (SSP), by using the SSP commands addboard(1M), moveboard(1M), deleteboard(1M), rcfgadm(1M), and showdevices(1M); or from the domain, by using the cfgadm(1M) command.
To use multipathing on DR model 3.0 domains, run IPMP (the IP multipathing software provided with the Solaris 8 operating environment) and the MPxIO software, which is included in Solaris Kernel Update Patches 111412-02, 111413-02, 111095-02, 111096-02, and 111097-02.
Be familiar with how devices must be configured before DR detach operations, as explained in Device Prerequisites.
Verify that you have sufficient swap space for your domain.
For details, see Allocating Sufficient Domain Swap Space.
Qualify any third-party device drivers, as described in Qualifying Third-Party Device Drivers.
Detach-safe or not currently loaded
A detach-safe driver supports the device driver interface (DDI) function, DDI_DETACH. This function provides the ability to detach a particular instance of a driver without affecting other instances that are servicing other devices.
A detach-unsafe driver is one that does not support DDI_DETACH. If a detach-unsafe driver is loaded, you must unload it before performing a DR detach operation. For details on unloading a detach-unsafe device, see Preparing for DR Detach Operations.
Suspend-safe or closed
A suspend-safe device driver supports the quiescence (pausing) of the Solaris operating environment during the detach of a board that contains nonpageable OBP or kernel memory. In order for DR to perform the detach, the operating environment must temporarily suspend all processes, processors, and device activities to unconfigure the memory component.
A suspend-safe device supports the DDI_SUSPEND/DDI_RESUME function. This function enables a device to be suspended during a system quiescence and then resumed. The device managed by the driver will not attempt to access the domain centerplane (for example, it does not access memory or interrupt the system), even if the device is open when the suspend request is made. The quiescence only affects the target domain; other domains are not affected.
If a driver does not support the DDI_SUSPEND/DDI_RESUME function, the device is considered to be suspend-unsafe because the operating environment cannot quiesce while a suspend-unsafe device is open. If a system quiescence is required for a DR detach operation, you must manually suspend a suspend-unsafe device so that the quiescence can occur. For details, see To Manually Suspend a Suspend-Unsafe Device.
Note - The drivers currently released by Sun Microsystems that are known to be suspend-safe are: st, sd, isp, esp, fas, sbus, pci, pci-pci, qfe, and hme (Sun FastEthernet); nf (NPI-FDDI); qe (Quad Ethernet); le (Lance Ethernet); the SSA drivers (soc, pln, and ssd); and the Sun StorEdge A5000 drivers (sf, socal, and ses). For additional information about suspend-safe and detach-safe device drivers, contact your Sun service representative.
The domain swap configuration consists of the swap devices and swapfs (memory). The domain must contain enough swap space so that it can flush pageable memory. For example, if you want to remove 1 Gbyte of memory from a 2-Gbyte domain, you will need 1 Gbyte of swap space, depending on the load. Insufficient swap space can prevent the completion of a DR operation.
The domain swap space must be configured as multiple partitions on disks attached to controllers hosted by different boards. With this type of configuration, a particular swap partition is not a vital resource because swap partitions can be added and deleted dynamically (refer to the swap(1M) man page for more information).
Many third-party drivers (those purchased from vendors other than Sun Microsystems) do not support the standard Solaris modunload(1M) interface, which is used to unload detach-unsafe or suspend-unsafe device drivers. Conditions that invoke the driver functions occur infrequently during normal operation, and the functions are sometimes missing or work improperly. Sun Microsystems suggests that you test these driver functions during the qualification and installation phases of any third-party device.
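The swap requirement described above can be checked before a detach is attempted. The following is a minimal sketch, not a Sun-supplied tool: the function names swap_available_kb and swap_sufficient are hypothetical, and the sample line is assumed to follow the summary format printed by swap -s. It compares the available swap against the amount of memory to be removed, using the 1-Gbyte example from the text.

```shell
#!/bin/sh
# Sketch only: the function names below are hypothetical helpers.

# swap_available_kb: read one line of swap -s style output on stdin and
# print the number that precedes "k available".
swap_available_kb() {
    sed -n 's/.* \([0-9][0-9]*\)k available.*/\1/p'
}

# swap_sufficient AVAILABLE_KB REMOVE_KB
# Succeeds (exit 0) when the available swap covers the memory to remove.
swap_sufficient() {
    [ "$1" -ge "$2" ]
}

# Example using a captured line rather than a live system (assumed format).
sample='total: 31044k bytes allocated + 5196k reserved = 36240k used, 1012548k available'
avail=$(printf '%s\n' "$sample" | swap_available_kb)
remove_kb=$((1024 * 1024))   # 1 Gbyte expressed in kilobytes

if swap_sufficient "$avail" "$remove_kb"; then
    echo "swap ok: ${avail}k available"
else
    echo "swap short: ${avail}k available, ${remove_kb}k needed"
fi
```

On a domain, the sample line would be replaced by the live output of swap -s. Because the actual requirement depends on the load, treat a passing result as a lower bound rather than a guarantee.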
This section identifies the various configuration tasks that you must complete before running DR operations on Solaris 9 domains (which support only DR model 3.0). Note that it may not be necessary to perform all the tasks described in this section, depending on the types of devices on your system boards and the type of DR operation to be performed.
After you configure DR or whenever you make changes to the DR configuration, you must reboot your domain. If you want to minimize the number of domain reboots, determine which configuration tasks are applicable to your DR environment and then perform the appropriate set of configuration tasks before rebooting your domain.
If you intend to perform DR detach operations, enable the kernel cage, as explained in To Enable the Kernel Cage.
For devices, do the following:
If you set network configuration parameters manually, make these settings permanent as described in To Set Permanent Driver Parameters for Network Drivers.
If you have soc and pln devices, enable device suspension, as described in To Enable Device Suspension for the soc and pln Drivers.
If you have suspend-unsafe devices, specify those devices in the unsafe driver list, which blocks a quiesce from starting.
For details, see To Specify an Unsafe Driver List.
If you have tape devices that are not supported by Sun Microsystems, make those devices detach-safe.
For details, see To Make an Unsupported Tape Device Detach-Safe.
If you want to use multipathing, configure your domain for multipathing and run the appropriate multipathing software on the domain.
Reboot the domain to process the configuration changes.
Note - You must reboot the domain after any change to the DR configuration. To minimize the number of reboots, perform all applicable configuration tasks first, and then reboot the domain once.
After the reboot completes successfully, review the /var/adm/messages file for messages that verify the DR configuration changes.
A caged kernel confines the nonpageable memory to a minimal (most often one) number of system boards. By default, the kernel cage is disabled, which prevents DR detach operations. If you plan to perform DR detach operations, you must enable the kernel cage by using the system(4) variable kernel_cage_enable, as explained in the following procedure.
Note - Before the release of version 7 of the Solaris software, the dr-max-mem variable was used to enable DR. The dr-max-mem variable is not used to enable DR in version 7 and subsequent versions of the Solaris software.
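As a quick check of the kernel cage setting, the following sketch tests whether a system(4) file contains an active line that sets kernel_cage_enable to 1. The function name cage_enabled is hypothetical, and the example runs against a temporary copy rather than the real /etc/system file. Lines that begin with an asterisk are comments in system(4) files and are ignored by the check.

```shell
#!/bin/sh
# Sketch only: cage_enabled is a hypothetical helper.

# cage_enabled FILE
# Succeeds if FILE contains an uncommented "set kernel_cage_enable=1".
cage_enabled() {
    [ -r "$1" ] || return 1
    # Strip blanks so "set kernel_cage_enable = 1" also matches.
    tr -d ' \t' < "$1" | grep '^setkernel_cage_enable=1$' >/dev/null
}

# Example against a scratch file (path is illustrative only).
scratch="${TMPDIR:-/tmp}/system.example.$$"
cat > "$scratch" <<'EOF'
* Enable the kernel cage so that DR detach operations are permitted.
set kernel_cage_enable=1
EOF

if cage_enabled "$scratch"; then
    echo "kernel cage is enabled in $scratch"
else
    echo "kernel cage is not enabled in $scratch"
fi
rm -f "$scratch"
```

Remember that a change to this setting takes effect only after the domain is rebooted.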
DR reads the unsafe driver list when it prepares to suspend the operating environment so that a board containing nonpageable memory can be detached. If DR finds an active driver in the unsafe driver list, it aborts the operation and returns an error message that identifies the active, unsafe driver. You must then manually suspend the device so that the DR operation can be performed.
For the Solaris 9 operating environment, tape devices that are natively supported by Sun Microsystems are suspend-safe and detach-safe. Refer to the st(7D) man page for a list of natively supported drives. If a system board to be detached contains a natively supported tape device, you can safely detach the board without suspending the device.
You must prepare a board for DR detach operations by completing the tasks described below. Although the list implies a sequence, strict adherence to the order is not necessary. These steps apply to boards containing I/O or non-network devices.
Unmount file systems.
If you have suspend-unsafe devices that manage file systems, unmount those file systems before a detach operation. If you have to manually suspend unsafe devices that manage file systems, lock those file systems by using the lockfs(1M) command before manually suspending the unsafe devices.
Remove disk partitions from the swap configuration by using swap(1M).
If you want to detach a board that hosts Sun StorEdge A3000 controllers, make those controllers idle or take them offline manually using the rm6 or rdacutil programs.
Close all non-network devices by doing the following:
Close all instances of a device by killing any processes that directly open the device or raw partition, or by directing the process to close an open device on the board.
Run modunload(1M) to unload each detach-unsafe or loaded device driver.
Note - In situations where you cannot unload a device that has an unsafe driver, you can blacklist the board that contains the unsafe device and then reboot the domain. You can remove the board later. For details on blacklisting, refer to the blacklist(1M) man page.
Processes bound to the processors of a board prevent that board from being detached. You can use pbind(1M) to rebind them to other processors.
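The preparation tasks above can be collected into a checklist before any of them is run. The following sketch only prints the commands; it does not execute them. The function name plan_detach_prep and every argument value (mount point, swap device, module ID, process ID, and processor ID) are hypothetical placeholders that you would replace with values from your own domain, for example a module ID reported by modinfo(1M).

```shell
#!/bin/sh
# Sketch only: prints a preparation plan, runs nothing.

# plan_detach_prep MOUNT_POINT SWAP_DEV MODULE_ID BOUND_PID TARGET_CPU
plan_detach_prep() {
    mount_point=$1   # file system hosted by the board
    swap_dev=$2      # swap partition hosted by the board
    module_id=$3     # module ID of a detach-unsafe driver
    bound_pid=$4     # process bound to a processor on the board
    target_cpu=$5    # processor on a board that stays in the domain

    echo "umount $mount_point"               # unmount file systems
    echo "swap -d $swap_dev"                 # remove the swap partition
    echo "modunload -i $module_id"           # unload the unsafe driver
    echo "pbind -b $target_cpu $bound_pid"   # rebind the process
}

# Example with placeholder values.
plan_detach_prep /export/data /dev/dsk/c1t0d0s1 123 4567 8
```

Printing the plan first makes it easier to review each command against the board's actual devices before anything is changed on the domain.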
How you can control forcible conditions that affect system quiescence during a DR detach operation in progress
Various configuration changes performed by DR during DR detach operations
If the Solaris operating environment cannot quiesce during a DR detach operation involving a board with nonpageable memory, it displays the reason why it cannot quiesce. For example, a suspend-unsafe device that the operating environment cannot quiesce might be open.
A failure to quiesce due to open suspend-unsafe devices is known as a forcible condition. You can either retry the operation or try to force the quiescence. The conditions that cause processes not to suspend are generally temporary, so you can retry the operation until the quiescence succeeds.
When you try to force the quiescence, you give the operating environment permission to continue with the quiescence even if forcible conditions are still present. Doing this forces the operating environment to permit the detach. Note that, although a detach can be forced to proceed when there are open suspend-unsafe devices in the system, it is not possible to force a detach when a detach-unsafe device resides on the board and its driver is loaded.
The most straightforward way to quiesce a domain is to close any suspend-unsafe devices. For each network driver, you must execute the ifconfig(1M) command with its down parameter, then again with its unplumb parameter (refer to the ifconfig(1M) man page for more information).
Note - It should be possible to unplumb all network drivers. However, this action is rarely tested in normal environments and may result in driver error conditions. If you use DR, Sun Microsystems suggests that you test these driver functions during the qualification and installation phases of any suspend-unsafe device.
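The down and unplumb sequence described above can be sketched as follows. The function name plan_unplumb is hypothetical, and the interface names in the example (hme0 and qfe1) are placeholders. The sketch prints the ifconfig(1M) commands in the required order without running them, so that the list can be reviewed first.

```shell
#!/bin/sh
# Sketch only: prints the commands, runs nothing.

# plan_unplumb INTERFACE...
# For each interface, print the down command followed by the unplumb
# command, in the order the text requires.
plan_unplumb() {
    for ifname in "$@"; do
        echo "ifconfig $ifname down"
        echo "ifconfig $ifname unplumb"
    done
}

# Example with placeholder interface names.
plan_unplumb hme0 qfe1
```

Review the printed list before running any of the commands, and leave out any interface that the domain still needs for communication.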
If a suspend-unsafe device is open and cannot be closed, you can manually suspend the device, and then force the operating environment to quiesce. After the operating environment resumes, you can manually resume the device as explained below.
Note - If you cannot make a device suspend its access to the domain centerplane, do not force the operating environment to quiesce. Doing so could cause a domain to crash or hang. Instead, postpone the DR operation until the suspend-unsafe device is no longer open.
For example, if a device that allows asynchronous unsolicited input is open, you can disconnect its cables prior to quiescing the operating environment, preventing traffic from arriving at the device and the device from accessing the domain centerplane. You can reconnect the cables after the operating environment resumes.
Caution - If you attempt a forced quiesce operation while activity is occurring on a suspend-unsafe device, the domain may hang. A hung domain does not affect other domains that are running on the Sun Enterprise 10000 system.
Caution - Exercise care when using the force option. To successfully force the operating environment to quiesce, you must first manually quiesce the controller. Procedures to do that, if any, are device-specific. The device must not transfer any data, reference memory, or generate interrupts during the operation. Be sure to test any procedures used to quiesce the controller while it is open before running them on a production system. Using the force option to quiesce the operating environment without first successfully quiescing the controller can result in a domain failure and subsequent reboot.
The interface is the primary network interface for the domain; that is, the interface whose IP address corresponds to the network interface name contained in the file /etc/nodename.
Note that bringing down the primary network interface for the domain prevents network information name services from operating, which results in the inability to make network connections to remote hosts using applications such as ftp(1), rsh(1), rcp(1), and rlogin(1). NFS client and server operations are also affected.
The interface is on the same subnet as the SSP host for the system; that is, the subnet of the IP address that corresponds to the SSP host name found in
Bringing down this interface interrupts communication between the host and SSP. Because DR operations are initiated on the SSP, control of the detach process would be lost. Note that the /etc/ssphostname file contains the name of the SSP that controls the host; therefore, if you rename the SSP, you must manually update the /etc/ssphostname file.
2. If the dcs daemon is configured in /etc/inetd.conf, kill dcs(1M) if it is currently running, and send a HUP signal to the inetd(1M) daemon to cause it to reread the inetd.conf(4) configuration file:
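One way to express this step is sketched below. This is an illustration rather than the procedure's original command listing. It assumes that pkill(1) is available on the domain; the function names dcs_configured and plan_dcs_refresh are hypothetical, and the inetd.conf entry in the example is an illustrative line, not a copy of a real configuration. The sketch prints the commands only when an uncommented line that mentions dcs is present.

```shell
#!/bin/sh
# Sketch only: prints the commands, runs nothing.

# dcs_configured FILE
# Succeeds if FILE has an uncommented line that mentions dcs.
dcs_configured() {
    grep -v '^[[:space:]]*#' "$1" 2>/dev/null | grep 'dcs' >/dev/null
}

# plan_dcs_refresh FILE
# Print the kill and HUP commands only when dcs is configured in FILE.
plan_dcs_refresh() {
    if dcs_configured "$1"; then
        echo "pkill -x dcs"          # stop dcs if it is running
        echo "pkill -HUP -x inetd"   # make inetd reread its configuration
    fi
}

# Example against a scratch file (entry is illustrative only).
scratch="${TMPDIR:-/tmp}/inetd.example.$$"
echo 'example-dr stream tcp wait root /usr/lib/dcs dcs' > "$scratch"
plan_dcs_refresh "$scratch"
rm -f "$scratch"
```

On a domain, the function would be pointed at /etc/inetd.conf, and the printed commands would be run as superuser after review.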