CHAPTER 2

What You Must Know Before Using DR

This chapter provides information you must know to successfully use the DR functions.

This chapter includes these sections:

System Configuration

Conditions and Settings Using XSCF

Conditions and Settings Using Oracle Solaris OS

Status Management

Operation Management


2.1 System Configuration

This section describes the conditions, premises, and actions for operating the DR functions to construct a system.

2.1.1 System Board Components

There are three types of system board components that can be added and deleted by DR: CPU, memory, and I/O device. FIGURE 2-1 and FIGURE 2-2 show examples of a system board of a midrange server that is divided into one Uni-XSB, and into Quad-XSBs. FIGURE 2-3 and FIGURE 2-4 show examples of a system board of a high-end server that is divided into one Uni-XSB, and into Quad-XSBs.



Note - Due to diagnostic requirements, the DR function works only on boards that have at least one CPU and memory.


FIGURE 2-1 Example of Hardware Configuration (with Uni-XSB of Midrange Server)



FIGURE 2-2 Example of Hardware Configuration (with Quad-XSBs of Midrange Server)



FIGURE 2-3 Example of a Hardware Configuration (with Uni-XSBs of High-end Server)


 

FIGURE 2-4 Example of a Hardware Configuration (with Quad-XSBs of High-end Server)


2.1.1.1 CPU

Using DR to change a CPU configuration is easier than using it to change the configuration of memory or an I/O device.
An added CPU is automatically recognized by the Oracle Solaris OS and becomes available for use.

 

A CPU to be deleted must meet the following conditions:



Note - These conditions also apply to movement of a system board.


If any of the above conditions are not met, the DR operation is stopped and a message is displayed. However, if you specify the deleteboard(8) command with the -f (force) option, these protections are ignored and DR continues the deletion process.



Note - Exercise care when using the -f (force) option, as doing so introduces risk of domain failure.


To avoid this problem and automate the operations for CPUs, the Oracle Solaris OS provides the Reconfiguration and Coordination Manager (RCM) script function. For details of RCM, see RCM Script.

For information about mixed configurations of SPARC64 VII+ or SPARC64 VII, and SPARC64 VI processors, see SPARC64 VII+, SPARC64 VII, and SPARC64 VI Processors and CPU Operational Modes.

2.1.1.2 Memory

The DR functions classify system boards by memory usage into two types:

(1) Kernel Memory Board

A kernel memory board is a system board on which kernel memory (memory internally used by the Oracle Solaris OS and containing an OpenBoot PROM program) is loaded. Kernel memory cannot be removed from the system. But the location of kernel memory can be controlled, and kernel memory can be copied from one board to another.

(1.1) Kernel Cage

The kernel cage function must be in use for DR operations on memory to succeed. Without the kernel cage, kernel memory could be assigned to all system boards, making it impossible to perform DR operations on memory. With the kernel cage, kernel memory is limited to a minimum set of system boards.

For details on enabling this function, see Settings of Kernel Cage Memory.

(1.2) Floating Boards

A floating board is a system board that is designated to be moved easily to another domain. In general, kernel memory is not assigned to a floating board unless absolutely necessary.

However, kernel memory can be assigned to a floating board when one of the following is true:

For details on enabling the floating board option for a system board, see Floating Board Option. For further details, also see the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers XSCF User’s Guide or the setdcl(8) man page.

(1.3) Kernel Memory Assignment

When a domain is powered on, the Power On Self Test (POST) initially assigns an address space to each system board in that domain. The order in which address spaces are assigned depends on the LSB number and floating board option of each system board. The first address spaces are assigned to non-floating boards in ascending order of LSB number. Then, additional address spaces are assigned to floating boards, again in ascending order of their LSB numbers.

When the kernel cage is enabled, kernel memory is assigned to system boards in the order of their address spaces. The kernel cage begins in the first address space (which initially corresponds to the non-floating board with the lowest LSB number). If the kernel requires more memory, then the kernel cage expands to the next address space (which initially corresponds to the non-floating board with the next-lowest LSB number), and so on. The kernel cage extends into the address spaces of floating boards only if kernel memory is too large to fit in the address spaces of the non-floating boards.
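The ordering rules above can be sketched in a few lines of Python. This is only an illustrative model, not how POST or the Oracle Solaris OS is implemented: the Board class, the memory_gb field, and the sample sizes are hypothetical, and the kernel cage is assumed to fill each address space completely before expanding to the next one.

```python
from dataclasses import dataclass


@dataclass
class Board:
    lsb: int         # LSB number of the system board
    floating: bool   # floating board option (hypothetical field name)
    memory_gb: int   # memory on the board, illustrative unit


def address_space_order(boards):
    """Non-floating boards first, then floating boards, each by ascending LSB."""
    return sorted(boards, key=lambda b: (b.floating, b.lsb))


def kernel_cage_boards(boards, kernel_gb):
    """Return the LSB numbers the kernel cage would span for kernel_gb of kernel memory."""
    cage = []
    remaining = kernel_gb
    for board in address_space_order(boards):
        if remaining <= 0:
            break
        cage.append(board.lsb)
        remaining -= board.memory_gb
    return cage


boards = [
    Board(lsb=0, floating=True, memory_gb=32),
    Board(lsb=1, floating=False, memory_gb=32),
    Board(lsb=2, floating=False, memory_gb=32),
]
print([b.lsb for b in address_space_order(boards)])  # [1, 2, 0]
print(kernel_cage_boards(boards, kernel_gb=40))       # [1, 2]
```

In this sample the floating board (LSB 0) receives the last address space, so the kernel cage reaches it only when the kernel needs more memory than the two non-floating boards provide.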



Note - During a copy-rename operation, the address spaces initially assigned by POST are exchanged between system boards. The effects of this process persist through reboots of a domain. Therefore, kernel memory may be assigned in a seemingly different order until the domain has gone through a full poweroff(8) and poweron(8) cycle, as this pair of operations cancels the effects of copy-rename operations.


For details on assigning LSB numbers to system boards, see the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers XSCF User’s Guide or the setdcl(8) man page.

(1.4) Copy-rename

Kernel memory itself cannot be removed, but it can be transferred to another system board. A DR operation to delete a kernel memory board must first perform this transfer, which is called a copy-rename operation.

The Oracle Solaris OS selects the target for the copy-rename operation from among the available user memory boards. The following selection and preference criteria are in effect:



Note - If no system boards meet the selection criteria, the DR operation to delete the kernel memory board will fail.


Once the copy-destination board has been selected, the Oracle Solaris OS performs a memory deletion on the selected user memory board.

Then, the kernel memory on the system board to be deleted is copied into memory on the selected copy-destination system board. The system is suspended while the copying is in progress. After all the memory is copied, the address space of the copy-destination board is renamed to that of the kernel memory board being deleted.



Note - If the address space of a system board is renamed by a copy-rename operation, the change will persist across reboots of the domain. A poweroff(8)/poweron(8) cycle of the domain will reset the address space assignments and remove the effects of one or more copy-rename operations.
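The effect of a copy-rename operation on address space assignments can be pictured with a small Python model. It is a simplification under stated assumptions: address spaces are represented as plain integers keyed by LSB number, and the function name copy_rename and the sample values are hypothetical. The model covers only the exchange of address spaces, not the memory copy or the target selection.

```python
def copy_rename(address_space, source_lsb, target_lsb):
    """Exchange the address spaces of the kernel memory board and the copy destination.

    address_space maps an LSB number to the address space index assigned by POST.
    A new mapping is returned; the input is left unchanged.
    """
    updated = dict(address_space)
    updated[source_lsb] = address_space[target_lsb]
    updated[target_lsb] = address_space[source_lsb]
    return updated


# Assignment made by POST at domain power-on (hypothetical values).
post_assignment = {0: 0, 1: 1, 2: 2}

# Deleting the kernel memory board at LSB 0, with LSB 2 as the copy destination.
after = copy_rename(post_assignment, source_lsb=0, target_lsb=2)
print(after)  # {0: 2, 1: 1, 2: 0}
```

In this model a reboot keeps the mapping in after, while a poweroff(8)/poweron(8) cycle corresponds to starting again from post_assignment.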


(2) User Memory Board

A user memory board is a system board on which no kernel memory is loaded.
Before deleting user memory, the system attempts to swap out the physical pages to the swap area. Sufficient swap space must be available for this operation to succeed.

(2.1) Locked Pages and ISM Pages

Some user pages are locked into memory and cannot be swapped out. These pages receive special treatment by DR.

Intimate Shared Memory (ISM) pages are special user pages that are shared by all processes. ISM pages are permanently locked and cannot be swapped out as memory pages. ISM is usually used by database management system (DBMS) software to achieve better performance.

Although locked pages cannot be swapped out, the system automatically moves them to the memory on another system board to avoid any problem concerning the pages. Note, however, that the deletion of user memory fails if there is not enough free memory on the remaining system boards to hold the relocated pages.

Although such moving of memory (called save processing) requires a certain length of time, system operations can continue during save processing because it is executed as a background task.



Note - The Dynamic Intimate Shared Memory (DISM) is a feature that allows applications to dynamically resize their ISM segments. Some applications use RCM scripts to resize their DISM segments to assist DR. See the Oracle Solaris man page for rcmscript(4).


Deleting or moving a user memory board fails if either of the following statements is true:

2.1.1.3 I/O Device

(1) Adding an I/O Device

The device driver processing executed by the Oracle Solaris OS is based on the premise that all device drivers dynamically recognize newly added devices. In the domain where DR is performed, all device drivers must support the addition of devices by DR. Upon the addition of an I/O device by DR, the I/O device is reconfigured automatically.

The path name of a device file under /dev is configured as the path name of the newly added I/O device to make the I/O device accessible.

(2) Deleting an I/O Device

An I/O device can be deleted when both of the following conditions are met:

In most cases the device to be deleted is in use. For example, the root file system or any other file systems requisite for operation cannot be unmounted.
To solve this problem, you can configure the system by using redundant configuration software to make the access path to each requisite I/O device redundant. For a disk drive unit, you can make the unit redundant by using disk mirroring software.

If a device driver that does not support DR is used in the domain, all access to I/O devices controlled by the device driver must be stopped, and the device driver must be unloaded by using the modunload(1M) command.



Note - Do not move a device that is part of a redundant configuration from one domain to another domain. The consequences of two domains simultaneously accessing the same device through different paths could be disastrous, such as data corruption.


2.1.2 System Board Configuration Requirements

XSCF enables the Uni-XSB or Quad-XSB setting according to the configuration conditions to determine the division type. If the CPU or memory configuration does not meet the configuration conditions, neither Uni-XSB nor Quad-XSB can be set as the division type.
For the CPU configuration and memory configuration conditions set for the division types, see the System Overview for your server.

The setting of division type may be changed for DR operation if a domain operation requirement dictates changing of a necessary hardware resource when a system board is added to the domain.
In such cases, the CPU configuration and memory configuration conditions for changing the division type are the same as described above. For the conditions, see the System Overview for your server.



Note - Changing the division type before a DR operation may not be possible depending on the system board status or DR operation, even if configuration conditions have been met.


2.1.3 System Board Pool Function

The system board pooling function places a specific system board in a state in which it does not belong to any domain.

This function can be effectively used to move a system board among multiple domains as needed.
For example, a system board can be added from the system board pool to a domain where CPU or memory has a high load. When the added system board becomes unnecessary, the system board can be returned to the system board pool.

All system boards that are targets of DR operations must be registered in the target domain’s Domain Component List (DCL). A domain’s DCL, managed by XSCF, is a list of system boards that are, or are to be, attached to that domain. The DCL of each domain contains not only information of registered system boards but also domain information and option information of each system board.

Moreover, a pooled system board can be assigned to a domain only when it is registered in that domain's DCL. Pooled system boards must be properly managed.
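The registration requirement can be expressed as a simple lookup. The following Python sketch assumes a hypothetical in-memory representation of the DCL (a mapping from domain ID to a mapping of LSB number to XSB identifier); it does not reflect how XSCF stores the DCL, and the identifiers are sample values only.

```python
def registered_in_dcl(dcl, domain_id, xsb):
    """Return True if the XSB appears in the DCL of the given domain."""
    return xsb in dcl.get(domain_id, {}).values()


# Hypothetical DCLs: domain ID -> {LSB number: XSB identifier}.
dcl = {
    0: {0: "00-0", 1: "01-0"},
    1: {0: "01-0"},
}

print(registered_in_dcl(dcl, 0, "01-0"))  # True
print(registered_in_dcl(dcl, 1, "00-0"))  # False
```

In this sample, XSB 01-0 is registered in the DCLs of both domains, so it could be moved between them, whereas XSB 00-0 could be added only to domain 0.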

You can add and delete system boards by combining the system board pooling function with the floating board, omit-memory, and omit-I/O options described in Conditions and Settings Using XSCF.

2.1.4 Checklists for System Configuration

This section describes the prerequisites and the checklists for configuring the system for DR.

1. Redundant Configuration of I/O Devices - Before a system board can be replaced, any I/O device connected to that board must be temporarily disconnected.

You should use redundant-configuration software to prevent any problem that might be caused by disconnection of an I/O device that would affect a job process. You should also confirm that the driver and software support DR before performing a DR operation.

2. Selection of PCI Cards Supporting DR - All PCI cards and I/O device interfaces on a system board must support DR. If not, you cannot execute DR operations on that system board. You must turn off the power supply to the domain before performing maintenance and installation.

3. Confirmation of DR Compliance of Drivers and Other Software - You must confirm that all I/O device drivers and software installed in the system support DR and allow the I/O device operations of DR.

You should also apply the latest patches to the drivers and other software before performing DR.

4. Allocation of Sufficient Memory and Distributed Swap Areas - You must allocate sufficient memory resources to be used when the memory on a system board is disconnected. Performing a DR operation with a high load already applied to memory may significantly lower job process performance and DR operability.

5. Consideration of Hardware Configuration and System Boards on Which Kernel Memory is Loaded - Before determining the hardware configuration and operations, you must understand how job processes are affected by DR operations on system boards on which CPUs, memory, and I/O devices are mounted.

You can perform DR operations on system boards that contain kernel memory. When disconnecting a system board on which kernel memory is loaded, DR copies kernel memory into the memory on another system board. The copy operation is based on the premise that the copy-destination system board does not already contain any kernel memory.

When kernel memory is copied, the Oracle Solaris OS is temporarily suspended. Therefore, you must understand the effect of disconnecting the network connection with remote systems and other influences of the DR operation on job processes before determining system operations.

2.1.5 Reservation of Domain Configuration Changes

Besides letting you add, delete, or move system boards dynamically, DR also lets you order such reconfiguration to take place the next time the affected domains are turned on or turned off, or the domain is rebooted. Use the addboard(8), deleteboard(8), or moveboard(8) command with the -c reserve option to specify these actions.

Some of the reasons you might want to reserve a domain change include:

For how to reserve domain changes, see Reserving a Domain Configuration Change.


2.2 Conditions and Settings Using XSCF

This section describes the operating conditions required for XSCF to start DR operations and the settings that are established by XSCF.

2.2.1 Conditions Using XSCF

The DR operation to add a system board cannot be executed when the system board has only been mounted. The DR operation is enabled by registering the system board in the DCL by using the XSCF shell or XSCF Web. You must confirm that the system board to be added is registered in the DCL before performing the DR operation.
As a matter of course, system boards to be deleted, moved, or replaced have already been registered in the DCL. You need not confirm that these boards have been registered in the DCL.

For details about the DCL and how to register system boards in the DCL and to confirm registration, see the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers XSCF User’s Guide.

2.2.2 Settings Using XSCF

The DR functions provide users with some options to avoid the complexities of reconfiguration and memory allocation with the Oracle Solaris OS, and make DR operations smoother. You can set up these options using the XSCF shell or XSCF Web. This section describes the following options:

These options are set using the setdcl(8) command. For details of how to set the options, see the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers XSCF User’s Guide or the setdcl(8) man page.

2.2.2.1 Configuration Policy Option

DR operations involve automatic hardware diagnosis to add or move a system board safely. If a hardware error is detected during this diagnosis, components are degraded according to the setting of this option, which specifies the range of degradation. The option applies to the initial diagnosis at domain startup as well as to DR operations.
The unit of degradation can be a component where a hardware error is detected, the system board (XSB) where the component is mounted, or a domain.

Values that can be set and units of degradation are explained in TABLE 2-1.

The default value of the configuration policy option is FRU.



Note - Enable the configuration policy option when the power supply of the domain is turned off.



TABLE 2-1 Unit of Degradation

| Value | Unit of degradation |
|---|---|
| FRU | Hardware is degraded in units of components such as CPU and memory. |
| XSB | Hardware is degraded in units of system boards (XSB). |
| System | Hardware is degraded in units of domains or the relevant domain is stopped without degradation. |
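As a rough illustration of TABLE 2-1, the following Python sketch maps each configuration policy value to the unit that is degraded when a hardware error is detected. The enum, the function name degraded_scope, and the sample component and board names are hypothetical; the sketch models the table only and is not an XSCF interface.

```python
from enum import Enum


class ConfigPolicy(Enum):
    FRU = "FRU"
    XSB = "XSB"
    SYSTEM = "System"


# The default value of the configuration policy option is FRU.
DEFAULT_POLICY = ConfigPolicy.FRU


def degraded_scope(component, xsb, domain, policy=DEFAULT_POLICY):
    """Return (unit, name) describing what is degraded for a faulty component."""
    if policy is ConfigPolicy.FRU:
        return ("component", component)
    if policy is ConfigPolicy.XSB:
        return ("system board", xsb)
    # System: degradation in units of domains, or the domain is stopped.
    return ("domain", domain)


print(degraded_scope("CPU#1", "00-0", "domain 0"))
# ('component', 'CPU#1')
print(degraded_scope("CPU#1", "00-0", "domain 0", ConfigPolicy.XSB))
# ('system board', '00-0')
```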


2.2.2.2 Floating Board Option

The floating board option controls kernel memory allocation.
Upon deletion of a system board on which kernel memory is loaded, the OS is temporarily suspended. The suspended status affects job processes and may disable DR operations. To avoid this problem, use the floating board option to set the priority of kernel loading into the memory of each system board, which increases the likelihood of successful DR operations.
To move a system board among multiple domains, this option can be enabled for the system board to facilitate the system board move.

The value of this option is “true” (to enable the floating board setting) or “false” (to disable the floating board setting). The default is “false”.
A system board with “true” set for this option is called a floating board. A system board with “false” set for this option is called a non-floating board.

Kernel memory is allocated to the non-floating boards in a domain by priority in ascending order of LSB number. When only floating boards are set in the domain, one of them is selected and used as a kernel memory board. In that case, the status of the board is changed from floating board to non-floating board. When a copy-rename operation is performed as part of a system board deletion or removal, and only a floating board is available as the copy destination because no non-floating board can be used, specify the -f (force) option. Using the force option does not change the floating board option setting.



Note - Enable the floating board option when the system board is in the system board pool or when the system board is not connected to the domain configuration.


2.2.2.3 Omit-memory Option

When the omit-memory option is enabled, the memory on a system board cannot be used in the domain.
Even when a system board actually has memory, this option enables you to make the memory on the system board unavailable through a DR operation to add or move the system board.
This option can be used when the target domain needs only the CPU (and not the memory) of the system board to be added.
If a domain has a high load on memory, an attempt to delete a system board from the domain may fail. This failure results if a timeout occurs in memory deletion processing (saving of the memory of the system board to be disconnected onto a disk by paging) when many memory pages are locked because of high load. To prevent this situation, you can enable the omit-memory option to facilitate the DR operation beforehand.



Note - For diagnosis and management of a system board, memory must be mounted on the system board even if the omit-memory option is enabled. Enabling the omit-memory option reduces available memory in the domain and may lower system performance. This option must be used in consideration of the influence on jobs.


The value of this option is “true” (omit memory) or “false” (do not omit memory). The default value is “false”.



Note - Enable the omit-memory option when the system board is in the system board pool or when the system board is not connected to the domain configuration.


2.2.2.4 Omit-I/O Option

The omit-I/O option disables the PCI cards, disk drives, and basic local-area network (LAN) ports on a system board to prevent the target domain from using them.

Set this option to “true” if the domain needs to use only the system board’s CPU and memory.

Set this option to “false” if the domain needs to use the system board’s PCI cards and I/O units. In this case you must fully understand the restrictions on use of these I/O components. And you must stop the software (e.g. application programs or daemons) that uses them before you attempt to delete or move the system board.

The value of this option is “true” (omit I/O units) or “false” (do not omit I/O units). The default value is “false”.



Note - Enable the omit-I/O option when the system board is in the system board pool or when the system board is not connected to the domain configuration.



2.3 Conditions and Settings Using Oracle Solaris OS

This section describes the operating conditions and settings required for DR operations.

2.3.1 I/O and Software Requirements

As described in System Configuration, all I/O device drivers and software installed in a domain where DR is to be used must support DR. The device drivers that support DR must also support the following DDI and DKI entries:

attach(9E): DDI_ATTACH and DDI_RESUME

detach(9E): DDI_DETACH and DDI_SUSPEND

If a device driver that does not support DR is present, the deletion of a system board might fail.

Even if the DDI_DETACH interface is supported, DDI_DETACH processing fails when the relevant driver is in use. Before starting the deletion of a system board, you must stop using all devices on the system board to be deleted.

Device drivers that do not support DR must be unloaded before a system board is deleted. Before unloading a device driver, stop using all I/O devices that it controls, and then unload it with the Oracle Solaris command modunload(1M). After deleting the system board, you can reload the driver for the remaining instances and resume using them.

2.3.2 Settings of Kernel Cage Memory

Kernel cage memory is a function used to minimize the number of system boards to which kernel memory is allocated. Kernel cage memory is enabled by default in the Oracle Solaris 10 OS.

If the kernel cage is disabled, the system may run more efficiently, but kernel memory will be spread among all boards and DR operations will not work on memory.

To determine whether kernel cage memory is enabled after the system has been rebooted, check the following message output from the /var/adm/messages file:


NOTICE: DR kernel Cage is ENABLED

If the kernel cage is disabled, the message will be:


NOTICE: DR kernel Cage is DISABLED
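The check can be scripted. The following Python sketch scans log lines for the two notices shown above and reports the most recent one. The function name and the sample log lines (timestamps and host name) are hypothetical; only the NOTICE text comes from this section.

```python
import re

# Matches either notice; case-insensitive in case the capitalization differs.
CAGE_PATTERN = re.compile(
    r"NOTICE: DR kernel Cage is (ENABLED|DISABLED)", re.IGNORECASE
)


def kernel_cage_enabled(log_lines):
    """Return True or False for the most recent cage notice, or None if none is found."""
    state = None
    for line in log_lines:
        match = CAGE_PATTERN.search(line)
        if match:
            state = match.group(1).upper() == "ENABLED"
    return state


# Hypothetical excerpt from /var/adm/messages.
sample = [
    "Jan 10 09:00:01 dom0 unix: NOTICE: DR kernel Cage is DISABLED",
    "Jan 12 14:30:45 dom0 unix: NOTICE: DR kernel Cage is ENABLED",
]
print(kernel_cage_enabled(sample))  # True
```

On a live domain, the lines could be read from /var/adm/messages and passed to the same function.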

In most cases the kernel cage should be enabled. However, you must consider actual operations before changing the setting. If you do not need to perform DR operations, you do not need to enable the kernel cage.
To enable kernel cage memory, remove or comment out the following setting from the /etc/system file:


set kernel_cage_enable=0

The OS must be rebooted to make the new setting effective.
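Before rebooting, you might want to confirm that no active line in /etc/system still disables the kernel cage. The following Python sketch checks a list of lines for such a setting. It assumes that lines beginning with an asterisk (*) or a hash mark (#) are comments; verify this against the system(4) man page for your release. The function name is hypothetical.

```python
def cage_disabled_in_system_file(lines):
    """Return True if an active 'set kernel_cage_enable=0' line is present."""
    for raw in lines:
        line = raw.strip()
        # Skip blank lines and comment lines (assumed to start with * or #).
        if not line or line[0] in "*#":
            continue
        tokens = line.replace("=", " = ").split()
        if tokens == ["set", "kernel_cage_enable", "=", "0"]:
            return True
    return False


print(cage_disabled_in_system_file(["set kernel_cage_enable=0"]))    # True
print(cage_disabled_in_system_file(["* set kernel_cage_enable=0"]))  # False
```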

2.3.3 Setting of Oracle Solaris Service Management Facility (SMF)

Certain DR operations succeed only when the following Oracle Solaris Service Management Facility (SMF) services are active on the domain:

For details, see the Notes about SMF services in Displaying Device Information, Adding a System Board, Deleting a System Board, and Moving a System Board.


2.4 Status Management

The success of DR operations depends on the status of domains and system boards.
This section describes the status information on the domains and system boards managed by XSCF, and the points to be noted for a better understanding of DR operation conditions.

2.4.1 Domain Status

XSCF manages the status of each domain.
You can display and reference the status of each domain through a user interface provided by XSCF. For details of the user interface, see Chapter 3, DR User Interface.

XSCF manages the following aspects of domain status:


TABLE 2-2 Domain Status

| Status | Description |
|---|---|
| Powered Off | Domain power is off. |
| Initialization Phase | POST processing or OpenBoot PROM initialization is in progress. |
| OpenBoot Executing Completed | Initialization of OpenBoot PROM is completed. |
| Booting | Oracle Solaris OS is being booted or, due to the domain being shut down or reset, the system is in the OpenBoot PROM running state or is suspended in the OpenBoot PROM (ok prompt) state. |
| Running | Oracle Solaris OS is running. |
| Shutdown Started | Oracle Solaris OS is being shut down. |
| Panic State | Oracle Solaris OS has panicked. |


To perform a DR operation for a system board, you must determine the method of DR operation according to the status of the relevant domain. The conditions of domain status available for DR operation are described in individual sections of Chapter 3, DR User Interface. For details of each method used for DR, see the relevant section.

2.4.2 System Board Status

XSCF manages system board status in units of XSB for the following management items:


TABLE 2-3 System Board Management Items

| Management item | Description |
|---|---|
| Power | Power on/off status of system board |
| Test | Diagnostic status of system board |
| Assignment | Status of assignment to domain |
| Connectivity | Status of connection to domain |
| Configuration | Status of addition into Oracle Solaris OS |


The table below lists the status types available for individual management items.


TABLE 2-4 System Board Management Items

| Management item | Status | Description |
|---|---|---|
| Power | Power Off | The system board is powered off and cannot be used. |
| | Power On | The system board is powered on. |
| Test | Unmount | The system board is not mounted or cannot be recognized, perhaps because it is faulty. |
| | Unknown | The system board is not being diagnosed. |
| | Testing | Testing. |
| | Passed | Passed. |
| | Failed | A system board error was detected and the board has been deconfigured. |
| Assignment | Unavailable | The system board is in the system board pool (not assigned to a domain) and its status is one of the following: not-yet diagnosed, under diagnosis, or diagnosis error. All system boards that are not mounted are also shown as Unavailable. |
| | Available | The system board is in the system board pool and its diagnosis has completed normally. |
| | Assigned | The system board is reserved or assigned to the domain. |
| Connectivity | Disconnected | The system board is disconnected from the domain configuration and is in the system board pool. |
| | Connected | The system board is connected to the domain configuration. |
| Configuration | Unconfigured | The hardware resources of the system board have been deleted from the Oracle Solaris OS. |
| | Configured | The hardware resources of the system board have been added into the Oracle Solaris OS. |
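The relationship between the Test and Assignment items in the table can be summarized in code. The following Python sketch derives the Assignment status from whether a board is reserved or assigned to a domain and from its Test status. The names DiagStatus and assignment_status are hypothetical; the sketch restates the table and is not an XSCF interface.

```python
from enum import Enum


class DiagStatus(Enum):
    """Values of the Test management item."""
    UNMOUNT = "Unmount"
    UNKNOWN = "Unknown"
    TESTING = "Testing"
    PASSED = "Passed"
    FAILED = "Failed"


def assignment_status(assigned_or_reserved, diag):
    """Derive the Assignment status as described in the table."""
    if assigned_or_reserved:
        return "Assigned"
    # The board is in the system board pool.
    if diag is DiagStatus.PASSED:
        return "Available"
    # Not mounted, not yet diagnosed, under diagnosis, or diagnosis error.
    return "Unavailable"


print(assignment_status(False, DiagStatus.PASSED))   # Available
print(assignment_status(False, DiagStatus.TESTING))  # Unavailable
print(assignment_status(True, DiagStatus.PASSED))    # Assigned
```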


XSCF changes and configures system board status according to the conditions under which a system board is installed, removed, or registered in the DCL, or when a domain is started or stopped. System board status also changes when the system board is added, deleted, or moved by DR.

To perform a DR operation for a system board, you must determine the method of DR operation according to the status of the target system board.
You can display and reference the status of each system board via a user interface provided by XSCF. For details of the user interface, see Chapter 3, DR User Interface.

2.4.3 Flow of DR Processing

This section describes the flow of DR processing and the changes in system board status during individual DR operations.

2.4.3.1 Flowchart: Adding a System Board

The flow of DR operations and the transition of system board status when a system board has been added or reserved for addition are described in the schematic flowchart, below.

Each system board status indicated in FIGURE 2-5 is the main status that is changed.

FIGURE 2-5 Flow of System Board Addition Processing


2.4.3.2 Flowchart: Deleting a System Board

The flow of DR operations and the transition of system board status when a system board has been deleted or reserved for deletion are described in the schematic flowchart, below.

Each system board status indicated in FIGURE 2-6 is the main status that is changed.

FIGURE 2-6 Flow of System Board Deletion Processing


 

2.4.3.3 Flowchart: Moving a System Board

The flow of DR operations and the transition of system board status when a system board has been moved or reserved for a move are described in the schematic flowchart, below.

Each system board status indicated in FIGURE 2-7 is the main status that is changed.

For the flow of system board addition processing or deletion processing and the related system board status, see Flowchart: Adding a System Board or Flowchart: Deleting a System Board, respectively.

FIGURE 2-7 Flow of System Board Move Processing


 

2.4.3.4 Flowchart: Replacing System Board

The flow of DR operations and the transition of system board status when a system board has been replaced are described using the schematic flowchart.

Each system board state indicated in FIGURE 2-8 is the main status that is changed.

The sample status before and after replacement as shown in the figure are explained below. The actual status after hardware replacement may not match the indicated status.

For the flow of system board addition processing or deletion processing and the related system board status, see Flowchart: Adding a System Board or Flowchart: Deleting a System Board, respectively.

For details of hardware replacement operations, see the Service Manual for your server.

FIGURE 2-8 Flow of System Board Replacement Processing



2.5 Operation Management

This section describes the premises and the actions for DR operations.

2.5.1 I/O Device Management

Upon the addition of a system board, device information is reconfigured automatically. However, addition of the system board and the reconfiguration of device information do not end at the same time.

Sometimes a device link in the /dev directory is not cleaned up automatically by the devfsadmd(1M) daemon. You can clean up such a link manually by using devfsadm(1M). See the devfsadm(1M) Oracle Solaris man page for details.

2.5.2 Swap Area

The size of available virtual memory is the sum of the size of memory mounted in the system and the size of the swap area on the disk. You must ensure that the size of available memory is sufficient for all necessary operations.

2.5.2.1 Swap Area at System Board Addition

By default in Oracle Solaris, the swap area is also used to store a system crash dump. You should use a dedicated dump device, instead. See the Oracle Solaris man page dumpadm(1M). The default swap area used to store the crash dump varies in size according to the size of mounted memory.

The size of the dump device used to store the crash dump must be larger than the size of mounted memory. When a system board is added, thereby increasing the size of mounted memory, the dump device must be reconfigured as required. For details, see the dumpadm(1M) Oracle Solaris man page.
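The sizing rule can be written as a one-line comparison. The following Python sketch, with a hypothetical function name and sample sizes in gigabytes, reports whether the dump device would need to be reconfigured after a board is added.

```python
def dump_device_needs_resize(dump_device_gb, mounted_memory_gb, added_board_memory_gb):
    """Return True if the dump device is not larger than memory after the addition."""
    new_memory_gb = mounted_memory_gb + added_board_memory_gb
    return dump_device_gb <= new_memory_gb


# A 128 GB dump device, 96 GB of mounted memory, and a 32 GB board being added.
print(dump_device_needs_resize(128, 96, 32))  # True
# A 160 GB dump device is still larger than the resulting 128 GB of memory.
print(dump_device_needs_resize(160, 96, 32))  # False
```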

2.5.2.2 Swap Area at System Board Deletion

When you delete a system board, the memory of the system board is swapped to the swap area of the disks. The available swap area is decreased by the memory size to be deleted. So, before you execute a delete board command, check the total swap area to verify that enough free swap space is available to hold the board's physical memory contents. Be aware that some of the total swap space may be supplied by disks that are attached to the board to be deleted. When making your assessment, be certain to also account for the swap space that will be lost.

To determine the size of the currently available swap area, execute the swap -s command on the OS and check the size reported as available. For details, see the Oracle Solaris man page swap(1M). You can confirm the physical memory size of the system board to be deleted, and the I/O devices connected to it, by using the showdevices(8) command. See Displaying Device Information, or the showdevices(8) man page. See Appendix B for a more complete example.
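The check described in this section can be sketched in shell as follows. This is a minimal illustration, not a supported tool: the helper names swap_avail_kb and enough_swap are hypothetical, and the sketch assumes that swap -s prints a single line ending in "Nk available". The board memory size and the swap lost with the board's disks are values you supply yourself, for example from showdevices(8) output.

```shell
# Hypothetical helper: read one line of swap -s output on stdin and
# print the "available" figure in KB.
swap_avail_kb() {
    sed -n 's/.*, *\([0-9][0-9]*\)k available.*/\1/p'
}

# Hypothetical helper: succeed only if the swap that remains after the
# board's own swap devices are lost can hold the board's memory.
#   $1 = available swap (KB), from swap -s
#   $2 = physical memory on the board to be deleted (KB)
#   $3 = swap (KB) supplied by disks attached to that board
enough_swap() {
    avail_kb=$1
    board_mem_kb=$2
    lost_swap_kb=$3
    [ $((avail_kb - lost_swap_kb)) -ge "$board_mem_kb" ]
}

# Possible use on the domain (board and lost-swap sizes are assumed values):
#   avail=$(swap -s | swap_avail_kb)
#   enough_swap "$avail" 16777216 0 || echo "not enough free swap" >&2
```

Subtracting the swap that will be lost before comparing keeps the check conservative when part of the swap area resides on disks attached to the board being deleted.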

2.5.3 Real-time Processes

The Oracle Solaris OS is temporarily suspended when a kernel memory board is deleted or moved. If your system has any real-time requirements (such as might be indicated by the presence of real-time processes), be aware that such a DR operation could significantly affect these processes.

2.5.4 Memory Mirror Mode

The memory mirror mode is a function used to duplex memory to ensure the hardware reliability of memory. When memory mirror mode is enabled, the domain can continue operation even if a fault occurs in a part of memory (provided that the fault is recoverable).

Memory mirror mode cannot be set in some division types of PSB. For more information, see the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers XSCF User’s Guide.

Enabling memory mirror mode does not restrict any DR functions. However, you must consider the domain configuration and operation when enabling memory mirror mode.

For example, when a kernel memory board with memory mirror mode enabled is deleted or moved, kernel memory is moved from the kernel memory board to another system board. Kernel memory is moved normally even if memory mirror mode is disabled for the move-destination system board. However, this operation results in lowered reliability of memory on the new kernel memory board.

You must properly plan and decide the setting of memory mirror mode by fully considering the requirements for the domain configuration and operations.

2.5.5 Capacity on Demand (COD)

DR works the same on COD boards as on other system boards, but standard COD restrictions still apply.

For detailed information on COD boards, see the SPARC Enterprise M4000/M5000/M8000/M9000 Servers Capacity on Demand (COD) User’s Guide.

2.5.6 XSCF Failover

An XSCF reset or failover might prevent a DR operation from completing. Log in to the active XSCF to determine if DR succeeded. If not, try it again.

2.5.7 Kernel Memory Board Deletion

An XSCF reset or failover during the Copy-rename phase of a deleteboard(8) or moveboard(8) operation might cause the domain to panic and display the following message:


Irrecoverable FMEM error error_code

If the XSCF reset or failover results in a domain panic, check the active XSCF to determine if the DR operation succeeded. If not, try it again.

2.5.8 Deletion of Board with CD-RW/DVD-RW Drive

To delete the system board to which the server’s CD-RW/DVD-RW drive is connected, execute the following steps:

1. Stop the vold(1M) daemon by disabling the volfs service.


# /usr/sbin/svcadm disable volfs

2. Execute the DR operation.

3. Restart the vold(1M) daemon by enabling the volfs service.


# /usr/sbin/svcadm enable volfs

For details, see the vold(1M) Oracle Solaris man page.

2.5.9 SPARC64 VII+, SPARC64 VII, and SPARC64 VI Processors and CPU Operational Modes



Note - This section applies only to M4000/M5000/M8000/M9000 servers that run or will run SPARC64 VII+ or SPARC64 VII processors.


The M4000/M5000/M8000/M9000 servers support system boards that contain any mix of SPARC64 VII+, SPARC64 VII, and SPARC64 VI processors.



Note - Supported firmware releases and Oracle Solaris releases vary based on processor type. For details, see the Product Notes that apply to the XCP release running on your server and the latest version of the Product Notes (no earlier than XCP version 1100).


FIGURE 2-9 shows an example of a mixed configuration of SPARC64 VII and SPARC64 VI processors.

FIGURE 2-9 CPUs on CPU/Memory Board Unit (CMU) and Domain Configuration


Different types of processors can be mounted on a single CMU, as shown in CMU#2 and CMU#3 in FIGURE 2-9. A single domain can also be configured with different types of processors, as shown in Domain 2 in FIGURE 2-9.

2.5.9.1 CPU Operational Modes

An M4000/M5000/M8000/M9000 server domain runs in one of the following CPU operational modes:

SPARC64 VI Compatible Mode - All processors in the domain behave like and are treated by the Oracle Solaris OS as SPARC64 VI processors. The extended capabilities of SPARC64 VII+ and SPARC64 VII processors are not available in this mode. Domains 1 and 2 in FIGURE 2-9 correspond to this mode.

SPARC64 VII Enhanced Mode - All boards in the domain must contain only SPARC64 VII+ or SPARC64 VII processors. In this mode, the server utilizes the extended capabilities of these processors. Domain 0 in FIGURE 2-9 corresponds to this mode.

To check the CPU operational mode, execute the prtdiag(1M) command on the Oracle Solaris OS. If the domain is in SPARC64 VII Enhanced Mode, the output will display SPARC64-VII on the System Processor Mode line. If the domain is in SPARC64 VI Compatible Mode, nothing is displayed on that line.
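The check above could be scripted roughly as follows. The helper name cpu_mode is hypothetical, and the sketch assumes only what is stated above: the string SPARC64-VII appears on the System Processor Mode line in SPARC64 VII Enhanced Mode and does not appear otherwise. Verify the exact prtdiag output on your own domain before relying on it.

```shell
# Hypothetical helper: read prtdiag output on stdin and print
# "enhanced" if the System Processor Mode line shows SPARC64-VII,
# otherwise "compatible".
cpu_mode() {
    if grep 'System Processor Mode' | grep 'SPARC64-VII' >/dev/null; then
        echo enhanced
    else
        echo compatible
    fi
}

# Possible use on the domain:
#   prtdiag | cpu_mode
```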

By default, the Oracle Solaris OS automatically sets a domain’s CPU operational mode each time the domain is booted based on the types of processors it contains. It does this when the cpumode variable - which can be viewed or changed by using the setdomainmode(8) command - is set to auto.

You can override the above process by using the setdomainmode(8) command to change the cpumode from auto to compatible, which forces the OS to set the CPU operational mode to SPARC64 VI Compatible Mode on reboot. To do so, power off the domain, execute the setdomainmode(8) command to change the cpumode setting from auto to compatible, then reboot the domain.

DR operations work normally on domains running in SPARC64 VI Compatible Mode. You can use DR to add, delete or move boards with any of the processor types, which are all treated as if they are SPARC64 VI processors.

DR also operates normally on domains running in SPARC64 VII Enhanced Mode, with one exception: You cannot use DR to add or move into the domain a system board that contains any SPARC64 VI processors. To add a SPARC64 VI processor you must power off the domain, change it to SPARC64 VI Compatible Mode, then reboot the domain.

In an exception to the above rule, you can use the DR addboard(8) command with its -c reserve or -c assign option to reserve or register a board with one or more SPARC64 VI processors in a domain running in SPARC64 VII Enhanced Mode. The next time the domain is powered off then rebooted, it comes up running in SPARC64 VI Compatible Mode and can accept the reserved or registered board.
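The DR rules for the two CPU operational modes can be summarized in a small shell sketch. The function dr_add_allowed is a hypothetical planning aid, not an XSCF or Oracle Solaris command. It covers only the question of whether a board can be added or moved into a running domain by DR.

```shell
# Hypothetical helper.
#   $1 = domain mode: "compatible" (SPARC64 VI Compatible Mode)
#        or "enhanced" (SPARC64 VII Enhanced Mode)
#   $2 = "yes" if the board carries any SPARC64 VI processor, else "no"
# Succeeds if DR can add or move the board into the running domain.
dr_add_allowed() {
    mode=$1
    board_has_vi=$2
    case "$mode" in
        compatible) return 0 ;;                   # any processor mix is accepted
        enhanced)   [ "$board_has_vi" = "no" ] ;; # SPARC64 VI boards are refused
        *)          return 2 ;;                   # unknown mode
    esac
}

# Example:
#   dr_add_allowed enhanced yes || echo "reserve or assign the board, then reboot"
```

When the function fails for a domain in SPARC64 VII Enhanced Mode, the board can still be reserved or registered with the -c reserve or -c assign option as described above, and it is accepted after the domain is powered off and rebooted.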



Note - Change the cpumode from auto to compatible for any domain that has or is expected to have SPARC64 VI processors. If you leave the domain in auto mode and all the SPARC64 VI processors later fail, the Oracle Solaris OS will see only the SPARC64 VII+ and SPARC64 VII processors - because the failed SPARC64 VI processors will have been degraded - and it will reboot the domain in SPARC64 VII Enhanced Mode. You will be able to use DR to delete the bad SPARC64 VI boards so you can remove them. But you will not be able to use DR to add replacement or repaired SPARC64 VI boards until you change the domain from SPARC64 VII Enhanced Mode to SPARC64 VI Compatible Mode, which requires a reboot.

Setting cpumode to compatible in advance enables you to avoid possible failure of a later DR add operation and one or more reboots.


The SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers XSCF User’s Guide contains the above information, as well as more detailed instructions.