CHAPTER 3

Preparing to Use DR

This chapter, along with chapters 1 and 2, provides information and some procedures you should understand to use DR successfully.




Caution - An improperly executed DR operation can cause DR to fail and, in some cases, damage system components.



This chapter covers the following topics:


The cfgadm(1M) Command

The cfgadm(1M) command performs DR operations on the domain. DR operations are passed to the libcfgadm(3LIB) library interface, which dynamically loads a hardware-specific library plug-in that actually performs the DR operations.



Note - If the cfgadm(1M) command fails during a DR operation, the board does not return to its original state. If the error is recoverable, you can retry the command. If the error is unrecoverable, you must reboot the domain to use the board.



The sbd.so.1 hardware-specific plug-in provides the DR functions for system boards: connect, configure, unconfigure, and disconnect. These functions let you connect a system board to, or disconnect it from, a running system without rebooting the system.

The cfgadm(1M) command resides in the /usr/sbin directory. (See the cfgadm(1M) man page for more information.)

Each board slot appears as a single attachment point in the device tree. You can view the type, state, and condition of each component, and the state and condition of each board slot, by using the cfgadm(1M) command with its -a option.

The following options and operands are supported for the functions shown, where ap_id specifies the attachment point of the system board or component:


TABLE 3-1 cfgadm Options

Options and Operands

Specifies

-c connect ap_id

Change the receptacle state to connected.

-c disconnect ap_id

Change the receptacle state to disconnected.

-c configure ap_id

Change the occupant state to configured.

-c unconfigure ap_id

Change the occupant state to unconfigured.

-x assign ap_id

Change the occupant state to assigned.

-x unassign ap_id

Change the occupant state to unassigned.

-x poweron ap_id

Change the occupant state to powered on.

-x poweroff ap_id

Change the occupant state to powered off.

-l ap_id

Display the state, status, and condition of system boards and components.

-h [ap_id]

Print the help message text. If ap_id is specified, the help routine of the hardware-specific library for that attachment point is called.

-v

Execute in verbose mode.

-n

Automatically answer No to all prompts without displaying them.

-y

Automatically answer Yes to all prompts without displaying them.

-s listing_options

Supply listing options to the list (-l) command. The state of the attachment points is displayed according to listing_options. The listing_options argument conforms to the syntax conventions described in the getsubopt(3C) man page, and specifies:

  • Attachment point selection criteria (i.e., select=select_string)
  • Type of matching desired (i.e., match=match_type)
  • Order of listing (i.e., sort=field_spec)
  • Data displayed (i.e., cols=field_spec and cols2=field_spec)
  • Column delimiter (i.e., delim=string)
  • Column-heading suppression (i.e., noheadings).

-o hardware_options

Supply hardware-specific options to the main command option. The format and content of the hardware_options string are completely hardware-specific, and the string conforms to the syntax conventions described in the getsubopt(3C) man page.

-t ap_id

Perform a test of one or more attachment points. The test function is used to re-evaluate the condition of the attachment point. Without a test-level specifier in hardware_options, the fastest test that identifies hard faults is used.
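
As a rough illustration of how the state-change options in TABLE 3-1 pair up, the following sketch prints, but does not run, the cfgadm commands for moving a board through both state changes as separate steps. The function name dr_plan and its add and remove keywords are hypothetical and are not part of cfgadm(1M); only the -c options come from the table. Whether your platform needs each step issued separately is not covered here.

```shell
#!/bin/sh
# dr_plan ACTION AP_ID
# Print, without executing, the cfgadm commands for a two-step state change.
# "add" and "remove" are illustrative keywords, not cfgadm options.
dr_plan() {
    action=$1
    ap_id=$2
    case $action in
        add)
            # Receptacle state first, then occupant state.
            echo "cfgadm -c connect $ap_id"
            echo "cfgadm -c configure $ap_id"
            ;;
        remove)
            # Reverse order: occupant state first, then receptacle state.
            echo "cfgadm -c unconfigure $ap_id"
            echo "cfgadm -c disconnect $ap_id"
            ;;
        *)
            echo "usage: dr_plan add|remove ap_id" >&2
            return 1
            ;;
    esac
}
```

For example, dr_plan remove N0.SB2 prints the unconfigure command followed by the disconnect command, which you can review before typing them at the domain prompt.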



The rcfgadm(1M) Command (High-End Only)

The SMS command rcfgadm(1M) is executed on the SC and takes the same options and operands as cfgadm(1M), but often requires addition of the -d domain_id option. See rcfgadm(1M).


Checking Device Type, State and Condition

Before you attempt to perform any DR operation on a board or component from the domain, determine its state and condition.


To Display States, Types, and Conditions

• Use the cfgadm(1M) command with the -la options.


# cfgadm -la


To Display Information About Board Slots and Components

• Use the prtdiag(1M) command.


# prtdiag 

The prtdiag(1M) command displays board numbers.


Preparing to Use DR on a Domain

Before you perform DR operations for the first time on a domain after it has been booted, make sure the board is available to the domain.


To Display Boards Available to the Domain

• Use the cfgadm(1M) command with its -l option.


# cfgadm -l

On high-end systems each domain maintains an available component list. On midrange systems, domains maintain access control lists. Both are referred to as ACLs.

An error might occur if you attempt to perform DR operations on a board that is one of the following:

In either of these cases, the board is not available to the domain. For more information about viewing the available component list on high-end systems, see the System Management Services (SMS) Administrator Guide. For more information about ACLs on midrange systems, see the Sun Fire Midrange Systems Platform Administration Manual.


Displaying System Board Status


To Display System Board Status

• Use the cfgadm(1M) command.


# cfgadm -a -s "select=class(sbd)"

The cfgadm(1M) command displays information about boards that are either assigned to the domain, or appear in the ACL and are not assigned to another domain. The -a option tells the command to list all known attachment points, including board slots, SCSI buses, and PCI slots.

The following display shows typical output on a midrange system domain.


TABLE 3-2 System Board Status Sample Display

Ap_Id     Type          Receptacle     Occupant       Condition
N0.IB6    PCI_I/O_Boa   connected      configured     ok
N0.IB7    PCI_I/O_Boa   connected      configured     ok
N0.IB8    PCI_I/O_Boa   connected      configured     ok
N0.IB9    PCI_I/O_Boa   disconnected   unconfigured   unknown
N0.SB0    CPU_Board     connected      configured     unknown
N0.SB1    CPU_Board     disconnected   unconfigured   failed
N0.SB2    CPU_Board     connected      configured     ok
N0.SB3    unknown       empty          unconfigured   unknown
N0.SB4    unknown       empty          unconfigured   unknown
N0.SB5    unknown       empty          unconfigured   unknown

To display more detailed information, add the -v option to cfgadm(1M).
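
When a listing such as the one in TABLE 3-2 is long, it can help to filter it for boards that need attention. The following is a minimal sketch, assuming the five-column layout shown above with no spaces inside any field. The function name not_ok_boards is hypothetical. It reads a listing on standard input and prints the Ap_Id and Condition of each occupied slot whose condition is anything other than ok.

```shell
#!/bin/sh
# not_ok_boards: read a cfgadm listing on standard input and print the
# Ap_Id and Condition of every non-empty slot whose condition is not "ok".
# Assumes columns: Ap_Id Type Receptacle Occupant Condition.
not_ok_boards() {
    awk '$1 != "Ap_Id" && NF >= 5 && $3 != "empty" && $5 != "ok" { print $1, $5 }'
}
```

One way to use it is to pipe the status command from the procedure above into the filter, for example cfgadm -a -s "select=class(sbd)" | not_ok_boards. For the sample display, this would report N0.IB9, N0.SB0, and N0.SB1.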


Testing Boards


To Test a System Board

• Use the cfgadm(1M) command with its -t option.


# cfgadm -t ap_id

where ap_id is an attachment point identifier.

• Use the cfgadm(1M) command with its -t and -o options to test at a specified diagnostic level (midrange systems only).


# cfgadm -o platform=diag=<level> -t ap_id

where level is a diagnostic level and ap_id is an attachment point identifier.

If you do not specify a level on midrange systems, the default diagnostic level set by the setupdomain command is used, as described in both the Sun Fire Midrange Systems Platform Administration Manual and the Sun Fire Midrange System Controller Command Reference Manual. The diagnostic levels are:


TABLE 3-3 Diagnostic Levels

Diagnostic Level

Description

init

Run, but do not test, system board initialization code, for a quick pass through POST.

quick

Test all system board components, but with few tests and test patterns.

default or max

Test all system board components, except memory and Ecache modules, with all tests and test patterns.

mem1

Run all tests at the default level, plus more exhaustive DRAM and SRAM test algorithms. For memory and Ecache modules, test all locations with multiple patterns. More extensive, time-consuming algorithms are not run at this level.

mem2

Run all tests in mem1, plus a DRAM test that does explicit compare operations of the DRAM data.
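
Because the level is passed as free text inside the -o string, a typing mistake is easy to make. The following sketch, which is only an illustration, checks a level name against the names in TABLE 3-3 and prints the corresponding test command without running it. The function name diag_test_cmd is hypothetical; the command format is the one shown in the procedure above.

```shell
#!/bin/sh
# diag_test_cmd LEVEL AP_ID
# Print the cfgadm test command for a diagnostic level listed in TABLE 3-3,
# or fail if LEVEL is not one of the documented names.
diag_test_cmd() {
    level=$1
    ap_id=$2
    case $level in
        init|quick|default|max|mem1|mem2)
            echo "cfgadm -o platform=diag=$level -t $ap_id"
            ;;
        *)
            echo "unknown diagnostic level: $level" >&2
            return 1
            ;;
    esac
}
```

For example, diag_test_cmd mem1 N0.SB2 prints a command that you can then enter at the domain prompt on a midrange system.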



To Test an I/O Board (Midrange Only)



Note - You cannot use the DR connect and configure operations to add an I/O board to a domain in a single-partition midrange system that is configured with one or more UltraSPARC IV+ system boards. This restriction is due to the absence of a second domain in which the I/O board can be tested. However, you can use the DR unconfigure and disconnect commands on an I/O board in the described system. For more information see the Sun Fire Midrange Systems Platform Administration Manual, Firmware Release 5.19.0.



In this procedure, domain A is the current, active domain and domain B is the spare domain.

1. Enter the domain shell of the spare domain (B).

2. Press and hold the CTRL key while pressing the ] key to bring up the telnet> prompt.

3. At the telnet> prompt, type send break to display the system controller domain shell.

4. In the spare domain (B) shell, add the I/O assembly to the domain.


schostname:B> addboard IBx

where x is 6, 7, 8, or 9.

5. Set the virtual keyswitch in the spare domain to on.


schostname:B> setkeyswitch on
.
.
{x} ok

where x represents the CPU. POST is run on the domain when you turn the virtual keyswitch to on. If you see the ok prompt, the I/O board or I/O assembly is functioning properly.

6. Set the mode to standby.


schostname:B> setkeyswitch standby

7. Delete the board.


schostname:B> deleteboard IBx

8. Add the board to the active domain (A).


# cfgadm -c configure N0.IBx


To Prepare an I/O Board for DR (High-End Only)

Before you attempt to perform DR operations on an I/O board in a high-end system domain, verify all the following are true:

See the pbind(1M) man page for more information about bound processes.

When you use DR to configure an I/O board into a domain (or test an I/O board explicitly using the cfgadm(1M) command with its -t option), one CPU on a system board in the same domain is selected to test the board. No process can be bound to that CPU, and at least one additional CPU must remain in the domain. If no such CPU is available to perform the test, a message such as the following is displayed:


WARNING: No CPU available for I/O cage test

During the test, the selected CPU is unconfigured from the domain and the I/O board is tested. After the test is complete, the CPU is configured back into the domain. Once the CPU is successfully reconfigured, its timestamp, as displayed by the psrinfo(1M) command, differs from the timestamps of the other CPUs in the domain.
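
The requirement that at least one additional CPU remain in the domain can be checked roughly before you start. The following sketch assumes the default psrinfo(1M) output, in which each line begins with a processor ID followed by a state such as on-line. The function names online_cpu_count and enough_cpus_for_io_test are hypothetical. The sketch counts on-line CPUs only; it does not check for bound processes, which is a separate condition.

```shell
#!/bin/sh
# online_cpu_count: read psrinfo output on standard input and print the
# number of processors whose state (second field) is "on-line".
online_cpu_count() {
    awk '$2 == "on-line" { n++ } END { print n + 0 }'
}

# enough_cpus_for_io_test: succeed only if at least two CPUs are on-line,
# so that one can run the test while another remains in the domain.
enough_cpus_for_io_test() {
    [ "$(online_cpu_count)" -ge 2 ]
}
```

One way to use it is psrinfo | enough_cpus_for_io_test, checking the exit status before you run the configure or test operation.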