Sun Enterprise 10000 Dynamic Reconfiguration User Guide

Chapter 3 DR 3.0 Model

This chapter contains information about the DR 3.0 model of dynamic reconfiguration (DR) on the Sun Enterprise™ 10000 server.

Introduction to the DR 3.0 Model

The DR 3.0 model is based on the use of the domain configuration server, dcs(1M), to control DR operations. This model includes the Automated DR (ADR) commands, such as addboard(1M), deleteboard(1M), and moveboard(1M). The showusage(1M) command is no longer supported. DR 3.0 includes three new commands:

Automatic DR

Automatic DR enables an application to execute DR operations without requiring user interaction. This ability is provided by an enhanced DR framework that includes a reconfiguration coordination manager (RCM) and a system event facility called sysevent. The RCM enables application-specific loadable modules to register callbacks with the RCM. The callbacks perform preparatory tasks before a DR operation, error recovery during a DR operation, or clean-up after a DR operation. The sysevent facility enables applications to register for system events and to receive notifications of those events. The automatic DR framework interfaces with the RCM and with sysevent to notify applications to give up resources automatically prior to unconfiguring them, and to capture new resources as they are configured into the domain.


Note -

Automatic DR is a different feature from Automated DR (ADR).


For more information about RCM, refer to the Solaris 8 System Administration Supplement in the Solaris 8 4/01 Update Collection.

Enhanced System Availability

The DR feature enables you to hot-swap system boards without bringing the server down. It is used to unconfigure the resources on a faulty system board from a domain so that the system board can be removed from the server. The repaired, or replacement, board can be inserted into the domain while the Solaris Operating Environment is running. DR then configures the resources on the board into the domain.

DR on I/O Boards

You must use caution when you add or remove system boards with I/O devices. Before you can remove a board with I/O devices, all of its devices must be closed and all its file systems must be unmounted.

If you remove a board with I/O devices from a domain temporarily and then re-add it before any other boards with I/O devices are added, reconfiguration is not necessary. In this case, the device paths to the devices on the board remain unchanged. However, if you add another board with I/O devices after the first board was removed and then re-add the first board, reconfiguration is required because the paths to the devices on the first board have changed.
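The rule in the preceding paragraph can be sketched as a small shell function. This is only an illustration of the decision, not part of the DR software: the function name needs_reconfig and the board names IO1 and IO2 are hypothetical.

```shell
#!/bin/sh
# needs_reconfig REMOVED_BOARD [BOARDS_ADDED_SINCE_REMOVAL...]
# Hypothetical helper: succeeds (exit status 0) when a different I/O
# board was added to the domain after REMOVED_BOARD was taken out,
# which is the case in which reconfiguration is required.
needs_reconfig() {
    removed=$1
    shift
    for added in "$@"; do
        if [ "$added" != "$removed" ]; then
            return 0    # another I/O board was added in the meantime
        fi
    done
    return 1            # nothing else was added; device paths are unchanged
}

# IO1 was removed, then IO2 was added before IO1 is re-added.
if needs_reconfig IO1 IO2; then
    echo "reconfiguration required"
else
    echo "reconfiguration not required"
fi
```

Calling the function with only the removed board (for example, needs_reconfig IO1) models the temporary-removal case, in which no reconfiguration is needed.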

Sun Enterprise 10000 Domains

The Sun Enterprise 10000 server can be divided into domains that contain system boards, I/O boards, and components such as CPUs, memory chips, and CompactPCI cards that are connected to the boards. Each domain is electrically isolated into hardware partitions, which ensures that any failure in one domain does not affect the other domains in the server.

DR 3.0 Procedures

This section contains procedures that describe how to use the DR 3.0 commands. The following procedures are included:

Showing Platform Information

Before you attempt to add, move, or delete a board to or from a specific domain, you can use the domain_status(1M) command to determine the domain name and the board number.

To Show the Platform Information
  1. Use the domain_status(1M) command to obtain the domain information.


    % domain_status -m
    

    Using the domain_status(1M) command with the -m option displays the domain name, the DR model, and the numbers of the system boards in each domain, as in the following example.


    % domain_status -m
    
    DOMAIN     TYPE                     PLATFORM   DR-MODEL   OS   SYSBDS
    A          Ultra-Enterprise-10000   all-A      2.0        5.8  2
    B          Ultra-Enterprise-10000   all-A      3.0        5.8  3 4
    C          Ultra-Enterprise-10000   all-A      2.0        5.7  5 6
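If you want to pick out only the domains that run the DR 3.0 model, the output can be filtered with awk. The following is a minimal sketch that assumes the column layout shown in the example above (DOMAIN in field 1, DR-MODEL in field 4); the function name dr3_domains is hypothetical, and the sample input is copied from the example rather than taken from a live system.

```shell
#!/bin/sh
# Print the names of the domains whose DR-MODEL column is 3.0.
# Assumes the column order of the example output in this chapter.
dr3_domains() {
    awk '$4 == "3.0" { print $1 }'
}

# Sample input copied from the example; on the SSP you would pipe the
# output of "domain_status -m" into the function instead.
dr3_domains <<'EOF'
DOMAIN     TYPE                     PLATFORM   DR-MODEL   OS   SYSBDS
A          Ultra-Enterprise-10000   all-A      2.0        5.8  2
B          Ultra-Enterprise-10000   all-A      3.0        5.8  3 4
C          Ultra-Enterprise-10000   all-A      2.0        5.7  5 6
EOF
# prints: B
```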

Showing Device Information

Before you attempt to perform any DR operation, use the showdevices(1M) command to display the device information, especially when you are removing devices.

To Show Device Information
  1. Use the showdevices(1M) command to display the device information for the domain.


    % showdevices -v -d A
    

    The above command displays the device information for all of the devices in the domain. Refer to the showdevices(1M) man page to learn how to display device-specific information.

    The above command produces the following output for CPUs in domain A (the following is an example only).


    CPU
    ----
    domain   board   id   state     speed    ecache    usage                
    A        C1      40   online    400      4
    A        C1      41   online    400      4
    A        C1      42   online    400      4
    A        C1      43   online    400      4
    A        C2      55   online    400      4
    A        C2      56   online    400      4
    A        C2      57   online    400      4
    A        C2      58   online    400      4
    

The following output represents an example of the memory output for the showdevices(1M) command above.


Memory
drain in progress:
-----------------
               board   perm    base      domain  target  deleted  remaining
domain  board  mem MB  mem MB  addr      mem MB  board   MB       MB
A       C1     2048    933     0x600000  4096    C2      250      1500
A       C2     2048    0       0x200000  4096

The following output represents an example of the I/O devices output for the showdevices(1M) command above.


IO Devices
----------
domain   board   device  resource             usage
A        IO1     sd0
A        IO1     sd1
A        IO1     sd2
A        IO1     sd3     /dev/dsk/c0t3d0s0    mounted filesystem "/"
A        IO1     sd3     /dev/dsk/c0t3d0s1    dump device (swap)
A        IO1     sd3     /dev/dsk/c0t3d0s1    swap area
A        IO1     sd3     /dev/dsk/c0t3d0s3    mounted filesystem "/var"
A        IO1     sd3     /var/run             mounted filesystem "/var/run"
A        IO1     sd4
A        IO1     sd5

Refer to the showdevices(1M) man page for a complete list of the options and arguments for this command.
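Because a board with I/O devices cannot be removed while its devices are in use, it can help to list only the resources that report a usage. The following sketch assumes that each I/O device entry occupies a single line with the columns domain, board, device, resource, and usage; the function name busy_io_resources is hypothetical, and the sample lines are taken from the example output in this section.

```shell
#!/bin/sh
# busy_io_resources BOARD
# Read showdevices-style I/O device lines on standard input and print
# each resource on BOARD that has a non-empty usage column.
busy_io_resources() {
    awk -v board="$1" '$2 == board && NF >= 5 { print $4 }' | sort -u
}

# Sample input; on the SSP you would pipe "showdevices -v -d A" instead.
busy_io_resources IO1 <<'EOF'
domain   board   device  resource             usage
A        IO1     sd0
A        IO1     sd3     /dev/dsk/c0t3d0s0    mounted filesystem "/"
A        IO1     sd3     /dev/dsk/c0t3d0s1    dump device (swap)
A        IO1     sd3     /dev/dsk/c0t3d0s1    swap area
A        IO1     sd3     /dev/dsk/c0t3d0s3    mounted filesystem "/var"
A        IO1     sd3     /var/run             mounted filesystem "/var/run"
A        IO1     sd4
EOF
```

An empty result suggests that no resource on the board is reported as busy, although you should still confirm that all devices are closed before you remove the board.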

Adding Boards

Adding a board to a domain moves the board through several state changes. It is connected to the domain and then configured into the Solaris Operating Environment. After it is connected, it is considered to be part of the physical domain and available to be used by the operating system.

To Add a Board to a Domain
  1. Use the addboard(1M) command to add the board to the domain.

    The following example of the addboard(1M) command adds system board 2 to the domain specified by domain_id. Two retries are performed, if necessary, with a wait time of 10 minutes between retries.


    % addboard -d domain_id -r 2 -t 600 SB2
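The examples in this chapter pair a wait time of 10 minutes with -t 600 and a wait time of 15 minutes with -t 900, which suggests that the -t value is given in seconds. Under that assumption, the following sketch converts minutes to seconds and prints, but does not run, the resulting command line. The function name minutes_to_seconds is hypothetical.

```shell
#!/bin/sh
# Convert a wait time in minutes to the seconds value used with -t.
minutes_to_seconds() {
    echo $(( $1 * 60 ))
}

# Print the command line only; nothing is executed on the SSP.
echo "addboard -d domain_id -r 2 -t $(minutes_to_seconds 10) SB2"
# prints: addboard -d domain_id -r 2 -t 600 SB2
```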
    

Deleting Boards

Deleting a board from a domain removes the board from the domain.

You should always check the usage of the components on a board before you delete it from a domain. If the board hosts permanent memory, the memory is moved to another board within the same domain before the board is deleted from the domain. Likewise, if any busy devices are present, you must wait or ensure that the device is no longer being used by the system before you attempt to remove the board.
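One way to check for permanent memory is to look at the perm mem MB column of the memory output from showdevices(1M). The following sketch assumes that each memory entry occupies a single line with the domain in field 1, the board in field 2, and the permanent memory size in field 4; the function name boards_with_perm_memory is hypothetical, and the sample values come from the example earlier in this chapter.

```shell
#!/bin/sh
# Print the boards that report a non-zero amount of permanent memory.
# Header lines are skipped because their fourth field is not a number.
boards_with_perm_memory() {
    awk '$4 ~ /^[0-9]+$/ && $4 + 0 > 0 { print $2 }'
}

# Sample input; on the SSP you would pipe "showdevices -v -d A" instead.
boards_with_perm_memory <<'EOF'
domain  board  mem MB  mem MB  addr      mem MB  board   MB       MB
A       C1     2048    933     0x600000  4096    C2      250      1500
A       C2     2048    0       0x200000  4096
EOF
# prints: C1
```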


Caution -

You must use the power(1M) command to power off the board before you physically remove it from the server. The deleteboard(1M) command does not power off the board. For information about the power command, refer to the power(1M) man page or see the section "To Physically Replace a System Board". In addition, the Sun Enterprise 10000 Systems Service Manual contains complete information about physically removing and replacing boards.


To Delete a Board From a Domain
  1. Use the deleteboard(1M) command to delete the board from the domain.

    The following example of the deleteboard(1M) command deletes system board 2 from its current domain. Two retries are performed, if necessary, with a wait time of 15 minutes between retries.


    % deleteboard -r 2 -t 900 SB2

Moving Boards

Moving a board from one domain to another domain removes the board from the first domain and then connects and configures it into the target domain.

You should always check the usage of the memory and devices on a board before you move it out of a domain. If the board hosts permanent memory, the memory must be moved to another board within the same domain before the board can be moved to another domain. Likewise, if any busy devices are present, you must wait or ensure that the device is no longer being used by the system before you attempt to move the board.

To Move a Board
  1. Use the moveboard(1M) command to move the board from one domain to another domain.

    The following example of the moveboard(1M) command moves system board 2 from its current domain to the domain specified by domain_id. Two retries are performed, if necessary, with a wait time of 15 minutes between retries.


    % moveboard -d domain_id -r 2 -t 900 SB2

Replacing System Boards

This section describes how to physically replace a board in a domain by using the commands described in this chapter.

To Physically Replace a System Board

In the following steps, system board 2 is removed from its current domain and replaced by system board 3.

  1. Delete the board from the domain.


    % deleteboard -r 2 -t 900 SB2
    

  2. Power off system board 2.

    Refer to the power(1M) man page for information about the power command.


    % power -off -sb 2
    

    Caution -

    For complete information about physically removing and replacing boards, refer to the Sun Enterprise 10000 Systems Service Manual. Failure to follow the stated procedures can result in damage to system boards and other components.


  3. Physically remove system board 2 and replace it with system board 3.

  4. Power on system board 3.


    % power -on -sb 3
    
  5. Add system board 3 to the domain.


    % addboard -d domain_id -r 2 -t 900 SB3
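The order of the steps in this procedure can be captured as a dry-run checklist. The following sketch only prints the commands and does not execute them; the function name replace_board_plan is hypothetical, and the -d option on the addboard line follows the addboard(1M) example earlier in this chapter.

```shell
#!/bin/sh
# replace_board_plan DOMAIN OLD_BOARD_NUMBER NEW_BOARD_NUMBER
# Print, in order, the commands used to replace one system board
# with another. Nothing is executed.
replace_board_plan() {
    domain=$1
    old=$2
    new=$3
    echo "deleteboard -r 2 -t 900 SB$old"
    echo "power -off -sb $old"
    echo "# physically remove system board $old and insert system board $new"
    echo "power -on -sb $new"
    echo "addboard -d $domain -r 2 -t 900 SB$new"
}

replace_board_plan domain_id 2 3
```

Printing the plan first makes it easier to confirm that the board is powered off before it is physically removed.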