APPENDIX B
Troubleshooting
This appendix discusses common types of failure in unconfigure and configure operations.
The following are examples of cfgadm diagnostic messages. (Syntax error messages are not included here.)
See the following man pages for additional error message detail: cfgadm(1M), cfgadm_sbd(1M), cfgadm_pci(1M), and config_admin(3CFGADM).
An unconfigure operation for a system board or I/O board can fail if the system is not in a correct state when you begin the operation.
If you try to unconfigure a system board whose memory is interleaved across system boards, the system displays an error message such as:
cfgadm: Hardware specific failure: unconfigure N0.SB2::memory: Memory is interleaved across boards: /ssm@0,0/memory-controller@b,400000
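The sample messages in this appendix appear to share one layout: the operation, the attachment point, a reason, and a device path. The following Python sketch splits a message into those fields. The layout is inferred from the samples here, not taken from the cfgadm man pages, and the function name parse_cfgadm_failure is hypothetical.

```python
import re

# Assumed layout, inferred from the sample messages in this appendix:
#   cfgadm: Hardware specific failure: <operation> <ap_id>: <reason>: <device path>
MESSAGE_RE = re.compile(
    r"^cfgadm: Hardware specific failure: "
    r"(?P<operation>\S+) (?P<ap_id>\S+?): "
    r"(?P<reason>.+): (?P<device>/\S+)$"
)


def parse_cfgadm_failure(line):
    """Return the message fields as a dict, or None if the line does not match."""
    match = MESSAGE_RE.match(line.strip())
    return match.groupdict() if match else None


sample = (
    "cfgadm: Hardware specific failure: unconfigure N0.SB2::memory: "
    "Memory is interleaved across boards: "
    "/ssm@0,0/memory-controller@b,400000"
)
fields = parse_cfgadm_failure(sample)
print(fields)
```

Messages that do not follow this layout return None, so a caller can fall back to reading the raw text.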
If you try to unconfigure a CPU to which a process is bound, the system displays an error message such as the following:
cfgadm: Hardware specific failure: unconfigure N0.SB2::cpu3: Failed to off-line: /ssm@0,0/SUNW,UltraSPARC-III
Unbind the process from the CPU and retry the unconfigure operation.
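As a rough illustration of the unbind step, the sketch below reads processor binding output and builds one pbind -u command per process bound to a given processor. It assumes that binding lines have the form "process id <pid>: <processor id>", as pbind -q is expected to print them; verify this against pbind(1M) on your system. The sample output and the helper name unbind_commands are hypothetical, and the processor ID is the one the operating system reports, which is not necessarily the component number in the attachment point name.

```python
import re

# Assumption: each binding line looks like "process id <pid>: <processor id>".
BINDING_RE = re.compile(r"^process id (\d+): (\d+)$")


def unbind_commands(pbind_output, processor_id):
    """Build a 'pbind -u <pid>' command for each process bound to processor_id."""
    commands = []
    for line in pbind_output.splitlines():
        match = BINDING_RE.match(line.strip())
        if match and int(match.group(2)) == processor_id:
            commands.append(f"pbind -u {match.group(1)}")
    return commands


# Hypothetical binding output, for illustration only.
sample_output = """process id 204: 3
process id 511: 1
process id 730: 3"""

for command in unbind_commands(sample_output, 3):
    print(command)
```

The sketch only prints the commands. Review them before running any, because unbinding changes where a process is allowed to run.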
All memory on a system board must be unconfigured before you try to unconfigure a CPU. If you try to unconfigure a CPU before all memory on the board is unconfigured, the system displays an error message such as:
cfgadm: Hardware specific failure: unconfigure N0.SB2::cpu0: Can't unconfig cpu if mem online: /ssm@0,0/memory-controller
Unconfigure all memory on the board and then unconfigure the CPU.
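The ordering rule above can be sketched in Python as follows. The sketch sorts the components of a board so that memory comes before any CPU, then builds the matching cfgadm -c unconfigure commands as strings without running them. The component list and the helper names are hypothetical.

```python
def unconfigure_order(components):
    """Sort attachment points so memory components come before CPUs."""
    def rank(ap_id):
        # The component name follows "::" in a dynamic attachment point.
        name = ap_id.split("::", 1)[-1]
        return 0 if name.startswith("memory") else 1

    # sorted() is stable, so CPUs keep their original relative order.
    return sorted(components, key=rank)


def unconfigure_commands(components):
    """Build the cfgadm command lines in a safe order; nothing is executed."""
    return [f"cfgadm -c unconfigure {ap_id}"
            for ap_id in unconfigure_order(components)]


# Hypothetical component list for one system board.
board = ["N0.SB2::cpu0", "N0.SB2::cpu3", "N0.SB2::memory"]
for command in unconfigure_commands(board):
    print(command)
```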
To unconfigure the memory on a board that has permanent memory, move the permanent memory pages to another board that has enough available memory to hold them. Such an additional board must be available before the unconfigure operation begins.
If the unconfigure operation fails with a message such as the following, the memory on the board could not be unconfigured:
cfgadm: Hardware specific failure: unconfigure N0.SB0: No available memory target: /ssm@0,0/memory-controller@3,400000
Add enough memory to another board to hold the permanent memory pages, and then retry the unconfigure operation.
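The precondition for moving permanent memory can be pictured with a small Python sketch: some other board must have at least as much available memory as the permanent pages need. The board names, the memory figures, and the function find_memory_target are hypothetical. The system performs its own target selection, and this sketch does not reproduce that logic.

```python
def find_memory_target(available_mb, source_board, permanent_mb):
    """Return a board other than source_board that can hold permanent_mb, or None.

    available_mb maps board names to available memory in megabytes.
    """
    candidates = [
        (free, name)
        for name, free in available_mb.items()
        if name != source_board and free >= permanent_mb
    ]
    if not candidates:
        return None
    # Prefer the board with the most available memory.
    return max(candidates)[1]


# Hypothetical figures, for illustration only.
boards = {"N0.SB0": 512, "N0.SB2": 2048, "N0.SB4": 4096}
print(find_memory_target(boards, "N0.SB0", 1024))
print(find_memory_target(boards, "N0.SB0", 8192))
```

A result of None corresponds to the "No available memory target" case: no other board can hold the pages, so memory must be added elsewhere first.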
Confirm the memory page cannot be moved.
Look for the word "permanent" in the listing.
If the unconfigure fails with one of the messages below, removal of the board would not leave enough available memory in the system.
Reduce the memory load on the system and try again; if practical, install more memory in another board slot.
If the unconfigure fails with the following message, the memory demand has increased while the unconfigure operation was proceeding:
Reduce the memory load on the system and try again.
CPU unconfiguration is part of the unconfiguration operation for a system board. If the operation fails to take the CPU offline, the following message is logged to the console:
It is possible to unconfigure a board and then discover that it cannot be disconnected. The cfgadm status display lists the board as not detachable. This problem occurs when the board is supplying an essential hardware service that cannot be relocated to an alternate board.
A device cannot be unconfigured or disconnected while it is in use. Many failures to unconfigure I/O boards occur because activity on the boards has not been stopped, or because an I/O device becomes active again after it has been stopped.
Disks attached to an I/O board must be idled before you attempt to unconfigure or disconnect that board. Any attempt to unconfigure or disconnect a board whose devices are still in use is rejected.
If an unconfiguration operation fails because an I/O board has a busy or open device, the board is left only partially unconfigured, because the operation sequence stops at the busy device.
To regain access to the devices that were not unconfigured, the board must be completely unconfigured, then reconfigured.
If a device on the board is busy, the system logs a message such as the following after an attempt to unconfigure:
cfgadm: Hardware specific failure: unconfigure N0.IB6: Device busy: /ssm@0,0/pci@18,700000/pci@1/SUNW,isptwo@4/sd@6,0
To continue the unconfigure operation, unmount the device and retry the unconfigure operation. The board must be in the unconfigured state before you try to reconfigure this board.
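The busy-device message reports a physical device path, while mounts are usually expressed with /dev/dsk names. Those names are symbolic links into /devices, so the two can be matched. The Python sketch below does the matching on a table of links supplied by the caller, as could be gathered from a long listing of /dev/dsk. The link table shown is hypothetical, including the controller numbers, and the function name logical_names is an assumption.

```python
def logical_names(busy_path, dev_links):
    """Return the /dev/dsk names whose link target matches busy_path.

    dev_links maps each /dev/dsk name to its symbolic link target.
    """
    matches = []
    for name, target in sorted(dev_links.items()):
        # Targets look like ../../devices/<physical path>:<slice letter>.
        physical = target.split("/devices", 1)[-1].rsplit(":", 1)[0]
        if physical == busy_path:
            matches.append(name)
    return matches


busy = "/ssm@0,0/pci@18,700000/pci@1/SUNW,isptwo@4/sd@6,0"

# Hypothetical link table, for illustration only.
links = {
    "/dev/dsk/c1t6d0s0": "../../devices" + busy + ":a",
    "/dev/dsk/c1t6d0s1": "../../devices" + busy + ":b",
    "/dev/dsk/c1t5d0s0":
        "../../devices/ssm@0,0/pci@18,700000/pci@1/SUNW,isptwo@4/sd@5,0:a",
}
print(logical_names(busy, links))
```

The resulting names can then be compared against mounted file systems and swap devices to find what is keeping the device busy.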
1. Use the fuser(1M) command to identify the processes that have these devices open.
2. Kill the vold daemon gracefully.
3. Disconnect all SCSI controllers that are associated with the card you are trying to unconfigure.
To get a list of all connected SCSI controllers use the following command.
4. If the redundancy features of Solaris Volume Manager mirroring are used to access a device connected to the board, reconfigure these subsystems so that the device or network is accessible by way of controllers on other system boards.
5. Unmount file systems, including volume manager meta-devices that have a board-resident partition.
6. Remove the volume manager database from board-resident partitions.
The location of the volume manager database is explicitly chosen by the user and can be changed.
7. Remove any private regions used by Solaris Volume Manager or Veritas Volume Manager.
Solaris Volume Manager by default uses a private region on each device that it controls, so such devices must be removed from Solaris Volume Manager control before they can be detached.
8. Remove disk partitions from the swap configuration.
9. Either kill any process that directly opens a device or raw partition, or direct it to close the open device on the board.
Note - Unmounting file systems might affect NFS client systems.
Time-outs occur by default after two minutes. Administrators might need to increase this time-out value to avoid time-outs during a DR-induced operating system quiescence, which might take longer than two minutes. Quiescing a system makes the system and related network services unavailable for a period of time that can exceed two minutes. These changes affect both the client and server machines.
Before configuring memory, all CPUs on the system board must be configured. If you try to configure memory while one or more CPUs are unconfigured, the system displays an error message such as:
cfgadm: Hardware specific failure: configure N0.SB2::memory: Can't config memory if not all cpus are online: /ssm@0,0/memory-controller
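Before configuring memory, it can help to list which CPUs on the board are still unconfigured. The Python sketch below scans a cfgadm-style listing with the columns Ap_Id, Type, Receptacle, Occupant, and Condition, and returns the CPU attachment points whose occupant state is not "configured". The listing text is a hypothetical sample, and the function name cpus_to_configure is an assumption.

```python
def cpus_to_configure(listing, board):
    """Return CPU attachment points on board that are not yet configured."""
    pending = []
    for line in listing.splitlines():
        columns = line.split()
        if len(columns) < 4:
            continue  # skip blank or short lines
        ap_id, kind, _receptacle, occupant = columns[:4]
        # The header row is skipped because its Type column is not "cpu".
        if (ap_id.startswith(board + "::") and kind == "cpu"
                and occupant != "configured"):
            pending.append(ap_id)
    return pending


# Hypothetical listing, for illustration only.
sample_listing = """Ap_Id           Type    Receptacle  Occupant      Condition
N0.SB2::cpu0    cpu     connected   configured    ok
N0.SB2::cpu1    cpu     connected   unconfigured  ok
N0.SB2::memory  memory  connected   unconfigured  ok
"""
print(cpus_to_configure(sample_listing, "N0.SB2"))
```

An empty result means that every CPU on the board is configured, so the memory configure operation should not fail for this reason.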
A configure operation might fail because a device on the I/O board does not currently support hot-plugging. In that case the board is left only partially configured, because the operation stops at the unsupported device. The board must be brought back to the unconfigured state before another configure attempt. The system logs a message, such as:
To continue the configure operation, either remove the unsupported device driver or replace it with a new version of the driver that supports hot-plugging.
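The remedies in this appendix can be gathered into a small lookup keyed on the reason text of each message. The Python sketch below covers only the messages quoted in full in this appendix. The table REMEDIES and the function suggest_remedy are hypothetical names, and the matching is a simple substring test.

```python
# Reason fragments and remedies, both taken from the text of this appendix.
REMEDIES = [
    ("Failed to off-line",
     "Unbind the process from the CPU and retry the unconfigure operation."),
    ("Can't unconfig cpu if mem online",
     "Unconfigure all memory on the board, then unconfigure the CPU."),
    ("No available memory target",
     "Add enough memory to another board to hold the permanent memory "
     "pages, then retry the unconfigure operation."),
    ("Device busy",
     "Unmount the device and retry the unconfigure operation."),
    ("Can't config memory if not all cpus are online",
     "Configure all CPUs on the board before configuring memory."),
]


def suggest_remedy(message):
    """Return the remedy for the first matching reason, or None."""
    for fragment, remedy in REMEDIES:
        if fragment in message:
            return remedy
    return None


message = ("cfgadm: Hardware specific failure: unconfigure N0.IB6: "
           "Device busy: /ssm@0,0/pci@18,700000/pci@1/SUNW,isptwo@4/sd@6,0")
print(suggest_remedy(message))
```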
Copyright © 2005, Sun Microsystems, Inc. All Rights Reserved.