CHAPTER 2

DR Concepts

This chapter describes the DR concepts you should understand before attempting to use DR.

If you plan to execute DR operations on a high-end server's system controller (SC) using SMS DR commands, be sure to read Chapter 5, SMS DR Procedures - From the SC (High-End Only). Some of the information in this chapter is repeated in Chapter 5, but from a different perspective. Reading both chapters might yield a more comprehensive picture of the DR feature.

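This chapter covers the following topics:

Dynamic System Domains
Attachment Points
States and Conditions
Detachability
Permanent and Non-Permanent Memory
Quiescence
Suspend-Safe and Suspend-Unsafe Devices
DR on I/O Boards
Common DR Board Operations
Illustrations of DR Concepts
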
Note - The UltraSPARC IV+ board contains dual-core CPUs. References in this document to CPUs or processors might refer to either single-core or dual-core types, and all procedures apply to both.




Dynamic System Domains

The Sun Fire system can be divided into domains. Each domain is based on the system board slots that are assigned to it. Further, each domain is electrically isolated in its own hardware partition, which ensures that a failure in one domain does not affect the other domains in the server.

Each domain configuration is defined in a configuration database that resides on the SC. The configuration database (on high-end systems, the platform configuration database, or PCD) controls how the system board slots are logically partitioned into domains. The domain configuration represents the intended domain configuration, so it can include both empty and populated slots. The physical domain is determined by the logical domain.

The number of slots available to a given domain is controlled by an ACL. ACL is an abbreviation for available component list on high-end system domains, or access control list on midrange system domains. The ACL for all domains is maintained on the SC. A slot must be assigned or available to a domain before you can change its state. After a slot has been assigned to a domain, it becomes visible to that domain and invisible and unavailable to all other domains. Conversely, you must disconnect and unassign a slot from its domain before you can assign and connect it to another domain.

The logical domain is the set of slots that belong to the domain. The physical domain is the set of boards that are physically interconnected. A slot can be a member of a logical domain without being part of a physical domain. After the domain is booted, the system boards and the empty slots can be assigned to or unassigned from a logical domain; however, they are not allowed to become part of the physical domain until the operating system requests it. System boards or slots that are not assigned to any domain are available to all domains. These boards can be assigned to a domain by the platform administrator. In addition, an ACL can be set up on the SC to allow users with appropriate privileges to assign available boards to a domain.
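The slot-assignment rules described above can be captured in a small model. The following Python sketch is illustrative only: it does not reflect how the SC actually stores the configuration database or the ACL, and the class and method names (SlotTable, can_assign, assign, unassign, visible_to) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class SlotTable:
    """Hypothetical model of slot ownership as tracked on the SC."""
    # slot name -> owning domain, or None if the slot is unassigned (available)
    owner: Dict[str, Optional[str]] = field(default_factory=dict)
    # domain -> slots that its ACL permits it to use
    acl: Dict[str, Set[str]] = field(default_factory=dict)
    # slots whose boards are currently connected to a physical domain
    connected: Set[str] = field(default_factory=set)

    def can_assign(self, slot: str, domain: str) -> bool:
        # A slot must be unassigned and listed in the domain's ACL.
        return self.owner.get(slot) is None and slot in self.acl.get(domain, set())

    def assign(self, slot: str, domain: str) -> None:
        if not self.can_assign(slot, domain):
            raise PermissionError(f"{slot} cannot be assigned to {domain}")
        self.owner[slot] = domain

    def unassign(self, slot: str) -> None:
        # A slot must be disconnected before it can leave its domain.
        if slot in self.connected:
            raise RuntimeError(f"disconnect {slot} before unassigning it")
        self.owner[slot] = None

    def visible_to(self, domain: str) -> Set[str]:
        # An assigned slot is visible only to the domain that owns it.
        return {s for s, d in self.owner.items() if d == domain}
```

In this sketch, a slot that is still connected must be disconnected and unassigned before another domain can claim it, mirroring the rule stated earlier.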


Attachment Points

An attachment point is a collective term for a board or device, the slot that holds it, and any components on it. Slots are sometimes called receptacles.

Sun Fire systems support the following attachment points:



Note - Many users are concerned only with changing the status of boards and devices. So, for simplicity, some procedures in this document refer to board attachment points simply as boards, PCI attachment points as PCI cards, and component attachment points as CPU or memory modules. Where simplification might cause confusion, proper names are used.



The term occupant refers to the combination of a board and its attached devices, including any external storage devices connected by interface cables.

Board slots can be named according to slot numbers, or can be anonymous (for example, when in a SCSI chain).

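DR recognizes two types of attachment point names: physical attachment point names, which are full device paths (for example, /devices/pseudo/dr@0:SB0), and logical attachment point names, which are abbreviated forms (for example, SB0).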

To obtain a list of all available logical attachment points, use the following command in the domain:


# cfgadm -l
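The default cfgadm -l listing is a table with Ap_Id, Type, Receptacle, Occupant, and Condition columns. The following Python sketch parses such a listing into one dictionary per attachment point. The sample text is hypothetical, not captured from a real domain, and the parser assumes that no column value contains spaces.

```python
from typing import Dict, List

# Hypothetical sample output; type names vary by board and platform.
SAMPLE = """\
Ap_Id                          Type         Receptacle   Occupant     Condition
N0.IB6                         PCI_I/O      connected    configured   ok
N0.SB0                         CPU          connected    configured   ok
N0.SB2                         unknown      empty        unconfigured unknown
"""


def parse_cfgadm_list(text: str) -> List[Dict[str, str]]:
    """Turn a cfgadm -l style table into a list of {column: value} dicts."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = [h.lower() for h in lines[0].split()]
    rows = []
    for ln in lines[1:]:
        # Split into at most as many fields as there are header columns.
        fields = ln.split(None, len(header) - 1)
        rows.append(dict(zip(header, fields)))
    return rows


configured = [r["ap_id"] for r in parse_cfgadm_list(SAMPLE) if r["occupant"] == "configured"]
```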

Attachment Point Classes

Sun Fire systems support classes of attachment points. The two classes DR users need to know about are sbd and pci.

To view a list of the attachment points and the type of board associated with each, use the following command as superuser:


# cfgadm -s -a "cols=ap_id:class"

High-End System Attachment Points

Examples of physical attachment point names on high-end systems are:


/devices/pseudo/dr@0:SBx (for a system board in slot 0)

/devices/pseudo/dr@0:IOx (for an I/O board in slot 1)


where 0 is node 0 (zero), SB is a system board, IO is an I/O board, and x represents the board number or expander number for a particular board. System boards and I/O boards are numbered 0 to 17.



Note - System boards are installed only in slot 0. I/O boards and MaxCPU boards are installed only in slot 1.



Logical attachment points on a high-end system take one of the following two forms:


SBx (for system boards)

IOx (for I/O boards or MaxCPU boards)
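The relationship between high-end logical and physical names can be expressed as a simple mapping. This sketch assumes only the naming pattern shown above (SB or IO followed by a board number from 0 through 17); the function names are hypothetical.

```python
import re

# SB or IO followed by a board number 0-17, with no leading zeros.
_HIGH_END = re.compile(r"^(SB|IO)(1[0-7]|[0-9])$")


def high_end_physical(logical: str) -> str:
    """Map a high-end logical ap_id such as SB4 to its physical path."""
    if not _HIGH_END.match(logical):
        raise ValueError(f"not a high-end logical ap_id: {logical!r}")
    return f"/devices/pseudo/dr@0:{logical}"


def high_end_slot(logical: str) -> int:
    """System boards occupy slot 0; I/O and MaxCPU boards occupy slot 1."""
    high_end_physical(logical)  # validate the name first
    return 0 if logical.startswith("SB") else 1
```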


Midrange System Attachment Points

Examples of physical attachment point names on a midrange system are:


/devices/ssm@0,0:N0.SBx (for a system board)

/devices/ssm@0,0:N0.IBx (for an I/O board)


where N0 is node 0 (zero), SB is a system board, IB is an I/O board, and x is a slot number (0 through 5 for a system board, 6 through 9 for an I/O board).

Logical attachment points on midrange systems take one of the following two forms:


N0.SBx (for a system board)

N0.IBx (for an I/O board)
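The midrange naming rules, including the slot ranges for each board type, can be checked the same way. This is a hypothetical sketch based only on the ranges given above (0 through 5 for system boards, 6 through 9 for I/O boards).

```python
import re

_MIDRANGE = re.compile(r"^N0\.(SB|IB)([0-9])$")


def midrange_physical(logical: str) -> str:
    """Map a midrange logical ap_id such as N0.SB5 to its physical path."""
    m = _MIDRANGE.match(logical)
    if not m:
        raise ValueError(f"not a midrange logical ap_id: {logical!r}")
    kind, slot = m.group(1), int(m.group(2))
    # System boards use slots 0-5; I/O boards use slots 6-9.
    valid = range(0, 6) if kind == "SB" else range(6, 10)
    if slot not in valid:
        raise ValueError(f"slot {slot} is not valid for {kind}")
    return f"/devices/ssm@0,0:{logical}"
```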


Changes to Attachment Points

You can use the cfgadm(1M) command to change attachment points. You can:

For information about states, see the sections that follow. For more information about attachment points, see the cfgadm(1M) man page.


States and Conditions

This section describes the states and conditions of boards, slots, components, and attachment points.

The cfgadm(1M) command can display nine types of states and conditions. For more information, see Component States and Component Conditions.



Note - The following information about boards and board slots also applies to PCI cards and the PCI buses that hold them.



Board and Board Slot States

When a board slot does not hold a board, its state is empty. When the slot does contain a board, the state of the board is either disconnected or connected.


TABLE 2-1 Board and Board Slot States

State          Description
empty          The slot does not hold a board.
disconnected   The board in the slot is disconnected from the system bus. A board can be in the disconnected state without being powered off. However, a board must be powered off and in the disconnected state before you remove it from the slot. A newly inserted board is in the disconnected state.
connected      The board in the slot is powered on and connected to the system bus. You can view the components on a board only after it is in the connected state.





Caution - Physically removing a board that is in the connected state, or that is powered on and in the disconnected state, crashes the operating system and can result in permanent damage to that system board.



 

A board in the connected state is either configured or unconfigured. A board that is disconnected is always unconfigured.


TABLE 2-2 Configured and Unconfigured Boards

Name           Description
configured     The board is available for use by the Solaris software.
unconfigured   The board is not available for use by the Solaris software.
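The rules in TABLE 2-1 and TABLE 2-2, together with the caution above, can be checked mechanically. The following sketch is a hypothetical model (the Board class is not part of any Sun API) that rejects impossible state combinations and reports whether a board may be physically removed.

```python
from dataclasses import dataclass

RECEPTACLE_STATES = {"empty", "disconnected", "connected"}
OCCUPANT_STATES = {"configured", "unconfigured"}


@dataclass
class Board:
    receptacle: str   # empty, disconnected, or connected
    occupant: str     # configured or unconfigured
    powered_on: bool

    def check(self) -> None:
        if self.receptacle not in RECEPTACLE_STATES or self.occupant not in OCCUPANT_STATES:
            raise ValueError("unknown state")
        # A board that is not connected is always unconfigured.
        if self.receptacle != "connected" and self.occupant == "configured":
            raise ValueError("only a connected board can be configured")
        # A connected board is always powered on.
        if self.receptacle == "connected" and not self.powered_on:
            raise ValueError("a connected board must be powered on")

    def safe_to_remove(self) -> bool:
        # Removal requires the disconnected state AND power off.
        self.check()
        return self.receptacle == "disconnected" and not self.powered_on
```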


The following states are visible only from the SC:


TABLE 2-3 Board States Visible Only From the SC

Name           Description
Available      The slot, which might or might not contain a board, is not assigned to any particular domain.
Assigned       The slot, which might or might not contain a board, belongs to a domain, but the hardware has not been configured to use it.
Active         The board in the slot is being actively used by the domain to which it has been assigned. You cannot reassign an active board.
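Because an active board cannot be reassigned, moving a slot to another domain means stepping back through the SC states first. The following sketch assumes the transition order implied by this section (Active to Assigned by disconnecting, Assigned to Available by unassigning); the helper name and the operation labels are hypothetical and are not SC commands.

```python
from enum import Enum
from typing import List


class ScState(Enum):
    AVAILABLE = "Available"
    ASSIGNED = "Assigned"
    ACTIVE = "Active"


def steps_to_move(state: ScState) -> List[str]:
    """Return the (hypothetical) operations needed to move a slot to
    another domain, starting from its current SC state."""
    steps: List[str] = []
    if state is ScState.ACTIVE:
        # An active board cannot be reassigned; take it out of use first.
        steps.append("disconnect")
        state = ScState.ASSIGNED
    if state is ScState.ASSIGNED:
        # The slot must leave its current domain before joining another.
        steps.append("unassign")
    steps.append("assign to new domain")
    return steps
```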


Board Conditions

A board can be in one of three conditions: unknown, ok, or failed. Its slot might be designated as unusable.


TABLE 2-4 Board and Board Slot Conditions

Name           Description
unknown        The board has not been tested.
ok             The board is operational.
failed         The board failed testing.
unusable       The board slot is unusable.


Component States

Unlike a board, a CPU or memory module cannot be individually connected or disconnected. Thus, all such components are in the connected state.

The connected component is either configured or unconfigured.


TABLE 2-5 Connected Components: Configured or Unconfigured

Name           Description
configured     The component is available for use by the Solaris OS.
unconfigured   The component is not available for use by the Solaris OS.


Component Conditions

A CPU or memory module is unknown, ok, or failed.


TABLE 2-6 CPU or Memory Module Conditions

Name           Description
unknown        The component has not been tested.
ok             The component is operational.
failed         The component failed testing.



Detachability

A detachable device is one that conforms to the following rules:

Some boards cannot be detached because their resources cannot be moved. For example, if a domain has only one CPU board, that CPU board cannot be detached. An I/O board is not detachable if it controls the boot drive.
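The two examples above can be expressed as a simple check. This is a hypothetical sketch that covers only those two rules (sole CPU board, boot-drive controller), not the complete set of detachability rules; DomainBoard and detach_blockers are illustrative names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DomainBoard:
    name: str
    has_cpus: bool = False
    controls_boot_drive: bool = False


def detach_blockers(board: DomainBoard, domain: List[DomainBoard]) -> List[str]:
    """Return reasons why `board` (a member of `domain`) cannot be detached."""
    reasons = []
    cpu_boards = [b for b in domain if b.has_cpus]
    # The domain's only CPU board cannot be detached.
    if board.has_cpus and len(cpu_boards) == 1:
        reasons.append("only CPU board in the domain")
    # An I/O board that controls the boot drive cannot be detached.
    if board.controls_boot_drive:
        reasons.append("controls the boot drive")
    return reasons
```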

If an I/O board has no alternate pathway, you can do one of the following:



Note - If you are unsure whether a device is detachable, consult with your Sun service representative.




Permanent and Non-Permanent Memory

Before you can detach a board, the operating system must vacate the memory on that board. Vacating a board entails flushing the contents of its non-permanent memory to swap space, and copying the contents of its permanent memory (that is, the kernel and OpenBoot™ PROM software) to another memory board.

To relocate permanent memory, the operating system on a domain must be temporarily quiesced. The length of the quiescence depends on the domain I/O configuration and the running workloads.

Detaching a board with permanent memory is the only time when the operating system is quiesced; therefore, you should know where permanent memory resides so that you can avoid impacting the operation of the domain significantly. To display the size of permanent memory, use the cfgadm(1M) command with its -av option. To vacate a board that has permanent memory, the operating system must find a sufficiently large block of available memory, called target memory, on which to copy the current contents of permanent memory, which is referred to as source memory.

Copy-Rename

User processes can release memory by paging it out to the swap device. But the Solaris kernel, which resides in permanent memory, cannot be released in that manner. Instead, cfgadm uses the copy-rename technique to release the memory. After the OS identifies a suitable target board - one that has enough memory to hold the permanent memory to be moved - the DR software executes the following steps:

1. Vacates the memory on the target board by paging the memory out to swap.

2. Quiesces the operating system.

3. Copies the contents (permanent memory) from the source board to the target board. This is the copy part of the operation.

4. Reprograms the hardware to swap the memory address ranges of the source and target board. This is the rename part of the operation.

5. Releases the operating system from its quiesced state.
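The copy-rename steps above can be simulated on a toy model of memory boards. In this sketch the names (MemBoard, choose_target, copy_rename) are hypothetical, the contents field merely stands in for the data in an address range, and preferring the smallest board that fits is an assumption rather than documented DR behavior.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MemBoard:
    name: str
    size_kb: int           # total memory on the board
    base: int              # start of the board's physical address range
    permanent_kb: int = 0  # kernel/OpenBoot memory hosted here
    contents: str = ""     # stand-in for the data held in the range


def choose_target(boards: List[MemBoard], source: MemBoard) -> Optional[MemBoard]:
    """Pick a target board with room for the source's permanent memory.

    Assumption: among the boards that fit, prefer the smallest one.
    """
    fits = [b for b in boards
            if b is not source and b.permanent_kb == 0
            and b.size_kb >= source.permanent_kb]
    return min(fits, key=lambda b: b.size_kb, default=None)


def copy_rename(source: MemBoard, target: MemBoard, log: List[str]) -> None:
    log.append(f"page out {target.name} to swap")        # 1. vacate target
    target.contents = ""
    log.append("quiesce OS")                              # 2. quiesce
    target.contents = source.contents                     # 3. copy
    target.permanent_kb = source.permanent_kb
    log.append(f"copy {source.name} -> {target.name}")
    source.base, target.base = target.base, source.base  # 4. rename
    source.permanent_kb = 0
    log.append("swap address ranges")
    log.append("resume OS")                               # 5. release
```

After the rename, the target board answers at the source board's old address range, so the kernel keeps running at the same physical addresses.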

Memory Interleaving

System boards cannot be dynamically reconfigured if system memory is interleaved across multiple system boards. PCI cards and I/O boards can be dynamically reconfigured regardless of whether memory is interleaved.

For more information about memory interleaving on high-end systems, see the Sun Fire High-End Systems Administration Manual. For midrange systems, see the interleave-scope parameter of the setupdomain command, which is described in both the Sun Fire Midrange Systems Platform Administration Manual and the Sun Fire Midrange System Controller Command Reference Manual.

Correctable Memory Errors

Correctable memory errors indicate that one or more of the dual inline memory modules (DIMMs) on a system board, or portions of the hardware interconnect, might be faulty and need replacement. When the SC detects correctable memory errors, it initiates a record-stop dump to save the diagnostic data, which can interfere with a DR operation.

When a record-stop occurs from a correctable memory error, allow the record-stop dump to complete before you initiate a DR operation.

If the faulty component causes repeated reporting of correctable memory errors, the SC performs multiple record-stop dumps. If this happens, you should temporarily disable the dump-detection mechanism on the SC; allow the current dump to finish; then initiate the DR operation. After the DR operation finishes, re-enable the dump detection.
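The ordering in the previous paragraph matters: dump detection should be re-enabled even if the DR operation fails. The following sketch encodes that ordering with try/finally. The four callables are hypothetical placeholders; this chapter does not name the SC commands that disable or re-enable dump detection.

```python
from typing import Callable


def run_dr_with_dump_detection_paused(
    disable_detection: Callable[[], None],
    wait_for_dump: Callable[[], None],
    run_dr: Callable[[], None],
    enable_detection: Callable[[], None],
) -> None:
    """Disable dump detection, let the current dump finish, run DR,
    and always re-enable detection afterward."""
    disable_detection()
    try:
        wait_for_dump()
        run_dr()
    finally:
        # Re-enable detection whether or not the DR operation succeeded.
        enable_detection()
```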


Quiescence

During the unconfigure operation on a system board with permanent memory (OpenBoot™ PROM or kernel memory), the operating system is briefly paused, which is known as operating system quiescence. All operating system and device activity on the domain must cease during this critical phase of the operation.

A quick way to determine whether a board has permanent memory is to use the following command:


# cfgadm -av | grep permanent

The system responds with output such as the following, which describes system board 0 (zero) on a midrange system:


N0.SB0::memory connected configured ok base address 0x0, 4194304 KBytes total, 668072 KBytes permanent
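If you need to extract the sizes from this output in a script, a regular expression is sufficient. This sketch assumes the line format shown above and returns None for memory attachment points that report no permanent memory; the function name is hypothetical.

```python
import re
from typing import Optional, Tuple

_MEM = re.compile(
    r"^(?P<ap_id>\S+)::memory\b.*?(?P<total>\d+)\s+KBytes total,"
    r"\s+(?P<perm>\d+)\s+KBytes permanent"
)


def permanent_memory(line: str) -> Optional[Tuple[str, int, int]]:
    """Return (board ap_id, total KB, permanent KB), or None if the
    line does not report permanent memory."""
    m = _MEM.search(line)
    if not m:
        return None
    return m.group("ap_id"), int(m.group("total")), int(m.group("perm"))
```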

If the operating system cannot achieve quiescence, it displays the reasons, which might include the following:



Note - Real-time processes do not prevent quiescence.



The conditions that cause processes to fail to suspend are generally temporary. Examine the reasons for any failure; if the operating system failed to suspend a process, simply try the operation again.

During quiescence the system is frozen and does not respond to external events such as network packets. The duration of the quiescence depends on two factors: how many I/O devices and threads need to be stopped, and how much memory needs to be copied. Typically, the number of I/O devices determines the required quiescent time, because I/O devices must be paused and unpaused. A quiescent state usually lasts longer than two minutes.

Because quiescence has a noticeable impact, cfgadm requests confirmation before implementing quiescence. If you type:


# cfgadm -c unconfigure N0.SB0

The system responds with a prompt for confirmation:


System may be temporarily suspended, proceed (yes/no)?

If you use Sun Management Center to perform the DR operation, a pop-up window displays a similar prompt.


Enter Yes to confirm that the impact of the quiesce is acceptable, and to proceed.


Suspend-Safe and Suspend-Unsafe Devices

When DR suspends the operating system, device drivers that are attached to the operating system must also be suspended. If a driver cannot be suspended (or subsequently resumed), the DR operation fails.

A suspend-safe device does not access memory or interrupt the system while the operating system is in quiescence. A driver is suspend-safe if it supports operating system quiescence (if it can be suspended and then resumed). A suspend-safe driver also guarantees that when a suspend request is successfully completed, the device that the driver manages does not attempt to access memory, even if the device is open when the suspend request is made.

A suspend-unsafe device allows a memory access or a system interruption to occur while the operating system is in quiescence.

On high-end systems, DR uses the unsafe driver list in the dr.conf file to prevent unsafe devices from accessing memory or interrupting the operating system during a DR operation. The dr.conf file resides in the following directory: /platform/SUNW,Sun-Fire-model_number/kernel/drv/, where model_number is the machine name, such as 15000. The unsafe driver list is a property in the dr.conf file with the following format:


unsupported-io-drivers="driver1","driver2","driver3";

DR reads this list when it prepares to suspend the operating system so that it can unconfigure a memory component. If DR finds an active driver in the unsafe driver list, it aborts the DR operation and returns an error message. The message includes the identity of the active, unsafe driver. You must manually remove the usage of the device by performing one or more of the following tasks:

You can retry the DR operation after you have stopped usage of the device.
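The unsupported-io-drivers property uses a simple quoted, comma-separated format. The following sketch parses that format and compares it with a list of active drivers, which approximates the check described above. How you obtain the list of active drivers is outside the scope of this sketch, and the function names are hypothetical.

```python
import re
from typing import Iterable, List

# Matches: unsupported-io-drivers="a","b","c";
_PROP = re.compile(r'unsupported-io-drivers\s*=\s*((?:"[^"]*"\s*,?\s*)+);')


def unsafe_drivers(conf_text: str) -> List[str]:
    """Return the driver names listed in the unsupported-io-drivers property."""
    m = _PROP.search(conf_text)
    if not m:
        return []
    return re.findall(r'"([^"]*)"', m.group(1))


def active_unsafe(conf_text: str, active: Iterable[str]) -> List[str]:
    """Return the active drivers that appear on the unsafe list."""
    unsafe = set(unsafe_drivers(conf_text))
    return sorted(d for d in active if d in unsafe)
```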



Note - If you are unsure whether a device is suspend-safe, contact your Sun service representative.




DR on I/O Boards

You must use caution when you add or remove boards with I/O devices. Before you can remove a board with I/O devices, all of its devices must be closed and all its file systems must be unmounted.

If you need to remove a board with I/O devices from a domain temporarily and then re-add it before any other boards with I/O devices are added, you do not have to reconfigure. In this case, device paths to the board devices remain unchanged. But if you add another board with I/O devices after the first was removed, then re-add the first board, reconfiguration is required because the paths to devices on the first board have changed.



Note - Before attempting to perform DR operations on an I/O board in a domain, make sure at least two CPUs are available to the domain. Further, make sure at least one of those CPUs is located on a system board, and that no processes are bound to it. See the pbind(1M) man page for more information about bound processes.
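The CPU requirements in the note above can be written as a pre-check. This sketch is hypothetical: the Cpu record and its fields are illustrative, and in practice you would gather this information with commands such as psrinfo(1M) and pbind(1M).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cpu:
    cpu_id: int
    on_system_board: bool
    bound_processes: int = 0


def io_dr_precheck(cpus: List[Cpu]) -> List[str]:
    """Return the problems that should be fixed before I/O board DR."""
    problems = []
    if len(cpus) < 2:
        problems.append("fewer than two CPUs available to the domain")
    # At least one CPU must be on a system board with no bound processes.
    if not any(c.on_system_board and c.bound_processes == 0 for c in cpus):
        problems.append("no unbound CPU on a system board")
    return problems
```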



High-End Systems I/O Boards, Golden IOSRAM, MaxCPU, and hsPCI+

Each I/O board in a high-end system domain contains an IOSRAM device. However, only one IOSRAM device, called the golden IOSRAM, is used for SC-to-domain communications at a time. The golden IOSRAM contains the "tunnel" that is used for SC-to-domain communications. Because DR can remove I/O boards, it is sometimes necessary to stop using the current golden IOSRAM and make another IOSRAM device the golden IOSRAM. This process is called a "tunnel switch," and takes place whenever DR unconfigures the current golden IOSRAM. When a domain is booted, the lowest-numbered I/O board in the domain is typically selected to be the initial golden IOSRAM.
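The golden IOSRAM selection can be sketched as follows. The initial choice (the lowest-numbered I/O board) comes from the text above; which board becomes the new golden IOSRAM after a tunnel switch is not specified here, so the sketch assumes the lowest-numbered remaining I/O board. The function names are hypothetical.

```python
from typing import Optional, Set


def initial_golden(io_boards: Set[int]) -> Optional[int]:
    """At boot, the lowest-numbered I/O board is typically chosen."""
    return min(io_boards, default=None)


def golden_after_unconfigure(io_boards: Set[int], golden: int, removed: int) -> Optional[int]:
    """Return the golden IOSRAM after DR unconfigures I/O board `removed`."""
    if removed != golden:
        return golden  # no tunnel switch needed
    # Tunnel switch. Assumption: the lowest-numbered remaining board is chosen.
    return min(io_boards - {removed}, default=None)
```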

DR supports the I/O buses on a high-end system I/O board and any PCI cards and MaxCPU boards they hold. DR also supports dynamic reconfiguration of hsPCI+ cards. Each hsPCI+ card includes two XMITS ASICs and four hot-pluggable hsPCI+ slots.

Midrange Systems I/O Assemblies, PCI and CompactPCI

On Sun Fire midrange systems, DR supports neither SAI/P (BugID 4466378) nor HIPPI/P. Previous releases did not support the SunHSI/P driver, but the bug that prevented support, 4496362, was fixed in patch 106922 (2.0) and 109715 (3.0). For more information see SunSolve and the devfsadm(1M) man page.



Note - You cannot use the DR connect and configure operations to add an I/O board to a domain in a single-partition midrange system that is configured with one or more UltraSPARC IV+ system boards. This restriction is due to the absence of a second domain in which the I/O board can be tested. However, you can use the DR unconfigure and disconnect commands on an I/O board in the described system. For more information see Testing Boards, and the Sun Fire Midrange Systems Platform Administration Manual, Firmware Release 5.19.0.



Notes about CompactPCI

The following limitations apply to reconfigurations involving CompactPCI assemblies:

Unconfiguring a CompactPCI card automatically disconnects it, too. If autoconfigure is enabled, connecting a CompactPCI card also configures it. If autoconfigure is disabled, you must configure the card manually.
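The CompactPCI behavior described above can be modeled as a small state machine. This is a hypothetical sketch; the class and its methods are illustrative and do not correspond to a real interface.

```python
from dataclasses import dataclass


@dataclass
class CompactPciCard:
    autoconfigure: bool
    receptacle: str = "disconnected"
    occupant: str = "unconfigured"

    def connect(self) -> None:
        self.receptacle = "connected"
        # With autoconfigure enabled, connecting also configures the card.
        if self.autoconfigure:
            self.occupant = "configured"

    def configure(self) -> None:
        # Configure connects the card first if it is disconnected.
        if self.receptacle != "connected":
            self.connect()
        self.occupant = "configured"

    def unconfigure(self) -> None:
        # Unconfiguring a CompactPCI card also disconnects it.
        self.occupant = "unconfigured"
        self.receptacle = "disconnected"
```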


Common DR Board Operations

Connect Operation

During the board connect operation, DR attempts to assign a board slot to the domain if the slot's system board is available and not part of any logical domain. After the slot has been assigned, DR requests that the SC power on and test the board. After the board has been tested, DR requests the SC to connect the board electronically to the system, which makes the board part of the physical domain. The operating system then probes the components on the board.



Note - If the cfgadm(1M) command fails during a DR operation, the board does not return to its original state. If the error is recoverable, you can retry the command. If the error is unrecoverable, you must reboot the domain to use the board.



The states and conditions for the attachment point before a board is inserted are:

After a board is physically inserted, the states and conditions are:

After the attachment point is logically connected, the states and conditions are:

Configure Operation

During the configure operation, DR attempts to connect the board slot if its state is disconnected. It then traverses the tree of devices that was created during the connect operation. (DR creates Solaris OS device tree nodes and attaches device drivers if necessary.)

The CPUs are added to the CPU list, and memory is initialized and added to the system memory pool. After the configure function has completed successfully, the CPUs and memory are ready for use.

For I/O devices, you must use the mount(1M) and ifconfig(1M) commands before the devices can be used.

When you use cfgadm to configure a board into a domain, the board is automatically connected and configured.

Disconnect Operation

During a disconnect operation, the DR framework communicates with the SC to program the interconnect so that the system board is removed from the physical domain. It then attempts to perform the tasks related to the unconfigure operation.

A board can be in the disconnected state without being powered off. However, the board must be powered off and in the disconnected state before you can remove it from the slot.

Before a board is disconnected, the states and conditions are:

After a board is disconnected, the states and conditions are:

Unconfigure Operation

The unconfigure operation can consist of a single operation or two separate operations, depending on the presence of permanent memory. If the system board hosts permanent memory, DR first moves the memory contents from the specified board to available memory on a target board in the domain, and then unconfigures the board. See Permanent and Non-Permanent Memory for more information about boards that host permanent memory.


Illustrations of DR Concepts

DR lets you disconnect, then reconnect system circuit boards without bringing the system down. You can use DR to add or remove system resources while the system continues to operate.

The example that follows is from a Sun Fire high-end system, but the basic idea applies to midrange systems, as well.



Note - Sun Fire E25K and Sun Fire 15K systems support up to 18 system boards and 18 I/O boards at a time, numbered 0 through 17.



Domain A contains system boards 0 and 2, and I/O board 2. Domain B contains system boards 1 and 3, and I/O boards 1, 3, and 4.


FIGURE 2-1 Domains A and B before reconfiguration


To assign system board 4 and I/O board 0 to Domain A, and to move I/O board 4 from Domain B to Domain A, you can use the Sun Management Center software's GUI. Or you can use cfgadm(1M) in each domain.

1. Use the following command in Domain B to disconnect I/O board 4 and unassign it from Domain B without powering it off.


# cfgadm -c disconnect -o nopoweroff,unassign IO4

2. Use the following command in Domain A to assign, connect, and configure system board 4 and I/O boards 0 and 4 into Domain A.


# cfgadm -c configure SB4 IO0 IO4
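The two-step reconfiguration can be replayed on a simple model of board ownership. The functions below are hypothetical and only track which domain owns each board; they show why step 1 must precede step 2, because IO4 cannot be configured into Domain A while it still belongs to Domain B.

```python
from typing import Dict, Optional, Set

Domains = Dict[str, Set[str]]


def owner(domains: Domains, board: str) -> Optional[str]:
    """Return the domain that owns a board, or None if it is unassigned."""
    return next((d for d, boards in domains.items() if board in boards), None)


def disconnect_unassign(domains: Domains, domain: str, board: str) -> None:
    # Models "cfgadm -c disconnect -o nopoweroff,unassign <board>" run in <domain>.
    domains[domain].remove(board)


def configure(domains: Domains, domain: str, *boards: str) -> None:
    # Models "cfgadm -c configure <boards...>" run in <domain>:
    # every board must be unassigned or already belong to <domain>.
    for b in boards:
        current = owner(domains, b)
        if current not in (None, domain):
            raise RuntimeError(f"{b} still belongs to domain {current}")
    domains[domain].update(boards)


domains: Domains = {
    "A": {"SB0", "SB2", "IO2"},
    "B": {"SB1", "SB3", "IO1", "IO3", "IO4"},
}
disconnect_unassign(domains, "B", "IO4")      # step 1, in Domain B
configure(domains, "A", "SB4", "IO0", "IO4")  # step 2, in Domain A
```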

The following system configuration is the result. Only the way in which the boards are connected has changed, not the physical layout of the boards within the cabinet.


FIGURE 2-2 Domains A and B after reconfiguration