Chapter 3
Preparing to Use DR
Together with Chapters 1 and 2, this chapter provides the information and procedures you need to understand to use DR successfully.
This chapter covers the following topics:
The cfgadm(1M) command performs DR operations on the domain. It passes each DR request to the libcfgadm(3LIB) library interface, which dynamically loads the hardware-specific library plug-in that actually carries out the operation.
Note - If the cfgadm(1M) command fails during a DR operation, the board does not return to its original state. If the error is recoverable, you can retry the command. If the error is unrecoverable, you must reboot the domain to use the board.
The sbd.so.1 hardware-specific plug-in provides the DR functions connect, configure, unconfigure, and disconnect for system boards. These functions let you connect a system board to, or disconnect it from, a running system without rebooting the system.
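The four DR functions move a board between receptacle states (connected, disconnected) and occupant states (configured, unconfigured). The following Python sketch models that sequence as a small state machine. It is illustrative only: the state names follow cfgadm(1M) terminology, but the strict one-step transitions (for example, refusing to configure a board that is still disconnected) are a simplifying assumption, and the real sbd.so.1 plug-in may perform intermediate steps on its own.

```python
from enum import Enum


class Receptacle(Enum):
    DISCONNECTED = "disconnected"
    CONNECTED = "connected"


class Occupant(Enum):
    UNCONFIGURED = "unconfigured"
    CONFIGURED = "configured"


# Simplified model (assumption): each DR function is valid from exactly one
# (receptacle, occupant) state and moves the board to exactly one new state.
TRANSITIONS = {
    "connect": ((Receptacle.DISCONNECTED, Occupant.UNCONFIGURED),
                (Receptacle.CONNECTED, Occupant.UNCONFIGURED)),
    "configure": ((Receptacle.CONNECTED, Occupant.UNCONFIGURED),
                  (Receptacle.CONNECTED, Occupant.CONFIGURED)),
    "unconfigure": ((Receptacle.CONNECTED, Occupant.CONFIGURED),
                    (Receptacle.CONNECTED, Occupant.UNCONFIGURED)),
    "disconnect": ((Receptacle.CONNECTED, Occupant.UNCONFIGURED),
                   (Receptacle.DISCONNECTED, Occupant.UNCONFIGURED)),
}


def apply_dr(function, state):
    """Return the state after `function`, or raise if it is not allowed."""
    required, result = TRANSITIONS[function]
    if state != required:
        raise ValueError(f"cannot {function} a board in state "
                         f"{state[0].value}/{state[1].value}")
    return result


def run_sequence(functions, state):
    """Apply DR functions in order, e.g. connect then configure to add a board."""
    for function in functions:
        state = apply_dr(function, state)
    return state


if __name__ == "__main__":
    start = (Receptacle.DISCONNECTED, Occupant.UNCONFIGURED)
    added = run_sequence(["connect", "configure"], start)
    print(added[0].value, added[1].value)  # connected configured
    removed = run_sequence(["unconfigure", "disconnect"], added)
    print(removed[0].value, removed[1].value)  # disconnected unconfigured
```

Adding a board corresponds to connect followed by configure; removing it reverses the order with unconfigure followed by disconnect.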
The cfgadm(1M) command resides in the /usr/sbin directory. (See the cfgadm(1M) man page for more information.)
Each board slot appears as a single attachment point in the device tree. You can view the type, state, and condition of each component, and the state and condition of each board slot, by using the cfgadm(1M) command with its -a option.
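Because the listing is column-oriented, a script can extract the receptacle state, occupant state, and condition of each attachment point. The following Python sketch assumes the default five columns (Ap_Id, Type, Receptacle, Occupant, Condition), separated by whitespace with no spaces inside a field. The sample listing is hypothetical and was not captured from a real domain.

```python
from dataclasses import dataclass


@dataclass
class AttachmentPoint:
    ap_id: str
    ap_type: str
    receptacle: str
    occupant: str
    condition: str


def parse_cfgadm_listing(text):
    """Parse cfgadm(1M) listing text into AttachmentPoint records.

    Assumes five whitespace-separated columns with no spaces inside a field.
    """
    points = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] == "Ap_Id":
            continue  # skip blank lines and the column header
        if len(fields) != 5:
            raise ValueError(f"unexpected listing line: {line!r}")
        points.append(AttachmentPoint(*fields))
    return points


# Hypothetical listing for illustration only.
SAMPLE = """\
Ap_Id       Type     Receptacle    Occupant      Condition
N0.SB0      CPU      connected     configured    ok
N0.IB6      PCI_I/O  connected     configured    ok
N0.SB2      CPU      disconnected  unconfigured  unknown
"""

if __name__ == "__main__":
    for ap in parse_cfgadm_listing(SAMPLE):
        print(ap.ap_id, ap.receptacle, ap.occupant, ap.condition)
```

A script could, for example, refuse to start a DR operation on any record whose condition is not ok.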
The following options and operands are supported for the functions shown, where ap_id specifies the attachment point of the system board or component:
List the state of attachment points according to listing_options. Supply listing options to the list (-l) flag. The listing_options argument conforms to the syntax conventions of the getsubopt(3C) man page.
Supply hardware-specific options to the main command option. The format and content of the hardware_options string are completely hardware-specific, and the string conforms to the syntax conventions of the getsubopt(3C) man page.
Perform a test of one or more attachment points. The test function re-evaluates the condition of the attachment point. If hardware_options contains no test-level specifier, the fastest test that identifies hard faults is used.
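Both listing_options and hardware_options follow the getsubopt(3C) convention: a comma-separated list of suboptions, each either a bare name or a name=value pair. The following Python sketch mimics that splitting so you can inspect an option string before passing it to cfgadm(1M). It is not a binding to getsubopt(3C), it skips empty suboptions as a simplification, and the option string in the example is illustrative only.

```python
def parse_subopts(option_string):
    """Split a getsubopt(3C)-style string into (name, value) pairs.

    A suboption without '=' gets the value None. Only the first '='
    separates name from value, so a value may itself contain '='.
    Empty suboptions (for example from a trailing comma) are skipped,
    which is a simplification of the real library behavior.
    """
    pairs = []
    for token in option_string.split(","):
        if not token:
            continue
        name, sep, value = token.partition("=")
        pairs.append((name, value if sep else None))
    return pairs


if __name__ == "__main__":
    # Illustrative string only; see cfgadm(1M) for the supported suboptions.
    print(parse_subopts("noheadings,cols=ap_id:condition"))
    # [('noheadings', None), ('cols', 'ap_id:condition')]
```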
The SMS command rcfgadm(1M) runs on the system controller (SC) and takes the same options and operands as cfgadm(1M), but it often also requires the -d domain_id option. See the rcfgadm(1M) man page.
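Because rcfgadm(1M) accepts the same options and operands as cfgadm(1M), a script can build either command line from the same inputs and add -d domain_id only for rcfgadm. The following Python sketch only assembles an argument list and executes nothing. The helper name build_dr_argv and its parameters are hypothetical, the option ordering is an assumption, and hardware_options is passed through unchanged because its content is hardware-specific.

```python
def build_dr_argv(function_args, ap_ids, hardware_options=None,
                  domain_id=None):
    """Assemble a cfgadm or rcfgadm argument list (hypothetical helper).

    function_args: e.g. ["-t"] to test, or ["-c", "configure"].
    domain_id: if given, build an rcfgadm command for the SC with -d.
    """
    if not ap_ids:
        raise ValueError("at least one ap_id is required")
    if domain_id is not None:
        argv = ["rcfgadm", "-d", domain_id]
    else:
        argv = ["cfgadm"]
    if hardware_options is not None:
        argv += ["-o", hardware_options]  # passed through unchanged
    argv += list(function_args)
    argv += list(ap_ids)
    return argv


if __name__ == "__main__":
    # Attachment point and domain names below are hypothetical.
    print(build_dr_argv(["-t"], ["N0.IB6"]))
    # ['cfgadm', '-t', 'N0.IB6']
    print(build_dr_argv(["-c", "configure"], ["SB3"], domain_id="A"))
    # ['rcfgadm', '-d', 'A', '-c', 'configure', 'SB3']
```

On a host where the command exists, the resulting list could be handed to subprocess.run; keeping it as a list avoids shell quoting problems with option strings.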
Before you attempt to perform any DR operation on a board or component from the domain, determine its state and condition.
Use the cfgadm(1M) command with the -la options.
Use the prtdiag(1M) command.
The prtdiag(1M) command displays board numbers.
Before you perform DR operations for the first time on a domain after it has been booted, make sure the board is available to the domain.
Use the cfgadm(1M) command with its -l option.
On high-end systems, each domain maintains an available component list. On midrange systems, domains maintain access control lists. Both are referred to as ACLs.
An error might occur if you attempt to perform DR operations on a board that is one of the following:
In either of these cases, the board is not available to the domain. For more information about viewing the available component list on high-end systems, see the System Management Services (SMS) Administrator Guide. For more information about ACLs on midrange systems, see the Sun Fire Midrange Systems Platform Administration Manual.
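A board can be used by a domain when it is already assigned to that domain, or when it appears in the domain's ACL and is not assigned to another domain. The following Python sketch expresses that check. The function name and the way the ACL and board assignments are represented (a set and a dictionary) are assumptions for illustration; on a real system this information comes from SMS or the midrange system controller.

```python
def board_available(board, domain, acl, assignments):
    """Return True if `board` is available to `domain` for DR.

    acl: set of boards in the domain's ACL (available component list on
         high-end systems, access control list on midrange systems).
    assignments: dict mapping board -> domain it is assigned to; boards
         absent from the dict are unassigned.
    """
    owner = assignments.get(board)
    if owner == domain:
        return True  # already assigned to this domain
    # Otherwise it must be in the ACL and not owned by another domain.
    return board in acl and owner is None


if __name__ == "__main__":
    # Hypothetical boards and domains.
    acl = {"SB1", "SB2"}
    assignments = {"SB0": "A", "SB2": "B"}
    for b in ["SB0", "SB1", "SB2", "SB3"]:
        print(b, board_available(b, "A", acl, assignments))
```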
Use the cfgadm(1M) command.
The cfgadm(1M) command displays information about boards that are either assigned to the domain, or appear in the ACL and are not assigned to another domain. The -a option tells the command to list all known attachment points, including board slots, SCSI buses, and PCI slots.
The following display shows typical output on a midrange system domain.
To display more detailed information, add the -v option to cfgadm(1M).
Use the cfgadm(1M) command with its -t option.
where ap_id is an attachment point identifier.
Use the cfgadm(1M) command with its -t and -o options to test at a specified diagnostic level (midrange systems only).
where level is a diagnostic level and ap_id is an attachment point identifier.
On midrange systems, if you do not specify a level, the default diagnostic level set with the setupdomain command is used, as described in both the Sun Fire Midrange Systems Platform Administration Manual and the Sun Fire Midrange System Controller Command Reference Manual. The diagnostic levels are:
Run all tests at the default level, plus more exhaustive DRAM and SRAM test algorithms. For Memory and Ecache modules, test all locations with multiple patterns. More extensive, time-consuming algorithms are not run at this level.
Note - You cannot use the DR connect and configure operations to add an I/O board to a domain in a single-partition midrange system that is configured with one or more UltraSPARC IV+ system boards. This restriction exists because there is no second domain in which the I/O board can be tested. However, you can use the DR unconfigure and disconnect operations on an I/O board in such a system. For more information, see the Sun Fire Midrange Systems Platform Administration Manual, Firmware Release 5.19.0.
In this procedure, domain A is the current, active domain and domain B is the spare domain.
1. Enter the domain shell of the spare domain (B).
2. Press and hold the CTRL key while pressing the ] key to bring up the telnet> prompt.
3. At the telnet> prompt, type send break to display the system controller domain shell.
4. In the spare domain (B) shell, add the I/O assembly to the domain.
where x is 6, 7, 8, or 9.
5. Set the virtual keyswitch in the spare domain to on.
where x represents the CPU. POST is run on the domain when you turn the virtual keyswitch to on. If you see the ok prompt, the I/O board or I/O assembly is functioning properly.
6. Set the mode to standby.
7. Delete the board.
8. Add the board to the active domain (A).
Before you attempt to perform DR operations on an I/O board in a high-end system domain, verify that all of the following are true:
See the pbind(1M) man page for more information about bound processes.
When you use DR to configure an I/O board into a domain (or explicitly test an I/O board by using the cfgadm(1M) command with its -t option), one CPU that resides on a system board in the same domain is selected to test the I/O board. No process can be bound to that CPU, and at least one additional CPU must remain in the domain. If no such CPU is available to perform the test, a message such as the following is displayed:
The selected CPU is unconfigured from the domain, and the I/O board is tested. After the test is complete, the CPU is configured back into the domain. Once the CPU is successfully reconfigured, its timestamp, as displayed by the psrinfo(1M) command, differs from the timestamps of the other CPUs in the domain.
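The two CPU requirements above (no bound processes on the chosen CPU, and at least one other CPU left in the domain) can be sketched as a simple selection function. This Python example is hypothetical: the Cpu record and its bound_processes field are invented for illustration, and picking the first eligible CPU is an assumption, since the actual selection policy is not described here.

```python
from dataclasses import dataclass


@dataclass
class Cpu:
    cpu_id: int
    bound_processes: int  # hypothetical count of processes bound via pbind(1M)


def select_test_cpu(domain_cpus):
    """Return a CPU eligible to test an I/O board, or None if none qualifies.

    Rules from the text: no process may be bound to the chosen CPU, and at
    least one additional CPU must remain in the domain. Choosing the first
    eligible CPU is an assumption made for this sketch.
    """
    if len(domain_cpus) < 2:
        return None  # no other CPU would remain in the domain
    for cpu in domain_cpus:
        if cpu.bound_processes == 0:
            return cpu
    return None  # every CPU has bound processes


if __name__ == "__main__":
    cpus = [Cpu(0, 2), Cpu(1, 0), Cpu(2, 0)]
    chosen = select_test_cpu(cpus)
    print(chosen.cpu_id if chosen else "no CPU available")
```

A None result corresponds to the case in which DR reports that no suitable CPU is available; unbinding processes with pbind(1M) or adding a CPU would make a CPU eligible under this model.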