Sun Enterprise 6x00, 5x00, 4x00, and 3x00 Systems Dynamic Reconfiguration User's Guide

Chapter 4 Troubleshooting

Diagnostic Messages

The following list shows examples of cfgadm diagnostic messages. (Syntax messages are not included.)


cfgadm: Configuration administration not supported on this machine
cfgadm: hardware component is busy, try again
cfgadm: operation: configuration operation not supported on this machine
cfgadm: operation: Data error: error_text
cfgadm: operation: Hardware specific failure: error_text
cfgadm: operation: Insufficient privileges
cfgadm: operation: Operation requires a service interruption
cfgadm: System is busy, try again

See config_admin(3X) for additional error message detail.
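
Two of the messages above, "cfgadm: hardware component is busy, try again" and "cfgadm: System is busy, try again", describe a transient condition. The others call for a change before the operation can succeed. The following is a minimal sketch of a shell wrapper that reruns a command only while its output contains "try again". The function name retry_while_busy, its arguments, and the attachment point in the usage note are hypothetical. The code assumes a POSIX-compatible shell such as ksh rather than the Bourne /bin/sh, which lacks $(...) and $((...)).

```shell
# Hypothetical helper: rerun a command while it fails with a transient
# "try again" message such as "cfgadm: System is busy, try again".
# Assumes a POSIX-compatible shell (for example ksh), not the Bourne /bin/sh.
# Usage: retry_while_busy MAX_TRIES DELAY_SECONDS command [args...]
retry_while_busy() {
    max_tries=$1
    delay=$2
    shift 2
    try=1
    while :; do
        # Run the command, capturing standard output and error together.
        status=0
        out=$("$@" 2>&1) || status=$?
        if [ "$status" -eq 0 ]; then
            [ -n "$out" ] && printf '%s\n' "$out"
            return 0
        fi
        case $out in
            *"try again"*)
                # Transient condition: wait and retry if attempts remain.
                if [ "$try" -lt "$max_tries" ]; then
                    try=$((try + 1))
                    sleep "$delay"
                    continue
                fi
                ;;
        esac
        # Permanent failure, or attempts exhausted: report and give up.
        printf '%s\n' "$out" >&2
        return "$status"
    done
}
```

For example, retry_while_busy 5 10 cfgadm -c unconfigure sysctrl0:slot7 makes up to five attempts, ten seconds apart, and gives up at once on a message such as "Insufficient privileges".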

Troubleshooting Specific Failures

There are several common types of failure:

Driver Does Not Support DR

  1. Some drivers do not yet support DR operations. A DR-compatible driver must be suspendable. Use this command to test whether the drivers on a board are suspendable:


    # cfgadm -x quiesce-test sysctrl#:slot#
    

  2. DR may not yet support some types of I/O and CPU/memory boards in Enterprise 3x00, 4x00, 5x00, and 6x00 systems. Use the quiesce test (above) or refer to the latest release notes.
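
When several boards are candidates for DR, the quiesce test can be run against each attachment point in turn. This minimal sketch assumes root privileges and a POSIX-compatible shell. The function name quiesce_check and the attachment point names in the usage note are hypothetical. It relies only on the exit status of cfgadm -x quiesce-test and lets any diagnostic text from cfgadm pass through to the terminal.

```shell
# Hypothetical helper: run the DR quiesce test on each attachment point
# named on the command line and summarize the results.
# Assumes root privileges and a POSIX-compatible shell (for example ksh).
# Returns a nonzero status if any attachment point fails the test.
quiesce_check() {
    failed=0
    for ap_id in "$@"; do
        # Only cfgadm's exit status is used; its messages go to the terminal.
        if cfgadm -x quiesce-test "$ap_id"; then
            printf '%s: quiesce test passed\n' "$ap_id"
        else
            printf '%s: quiesce test FAILED\n' "$ap_id"
            failed=$((failed + 1))
        fi
    done
    [ "$failed" -eq 0 ]
}
```

For example, quiesce_check sysctrl0:slot1 sysctrl0:slot3 prints one line per board and returns a nonzero status if either test fails.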

Unable to Unconfigure

Before you attempt a DR unconfigure operation:

A device cannot be unconfigured or disconnected while it is in use. Disks attached to an I/O board must be unmounted before any attempt is made to unconfigure or disconnect that board. Any attempt to unconfigure or disconnect a board whose devices are still in use generates an error.

If an unconfigure operation fails because an I/O board has a busy or open device, the board is left only partially unconfigured; the operation sequence stops at the busy device.

To regain access to the devices that were not unconfigured, you must completely unconfigure the board and then reconfigure it.

In this case, the system logs messages similar to the following:


NOTICE: unconfiguring dual-pci board in slot 7
NOTICE: dual-pci board in slot 7 partially unconfigured 
reason:sysc iohelp unconfigure: Device busy
output from sysctrl unconfigure is:detach failed: /pci@f,4000/SUNW,isptwo@3/sd@2,0
is busy

To complete the unconfigure operation, unmount the device and retry the unconfigure operation. The board must be fully unconfigured before you try to configure it again.
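
The physical path of the device that stopped the operation appears after "detach failed:" in messages like those above. The following sketch pulls those paths out of a message log so that you can identify which device to unmount. The function name busy_devices is hypothetical, and the pattern assumes the message wording shown above; lines logged through syslog, for example in /var/adm/messages, carry an extra prefix, which the pattern tolerates.

```shell
# Hypothetical helper: read DR messages on standard input and print the
# physical path of each device reported as "detach failed", that is, the
# device that stopped an unconfigure operation.
busy_devices() {
    sed -n 's/.*detach failed: *\([^ ]*\).*/\1/p'
}
```

Given the example messages, busy_devices prints /pci@f,4000/SUNW,isptwo@3/sd@2,0. On Solaris, the entries in /dev/dsk are symbolic links into the /devices tree, so searching their targets for this path is one way to find the corresponding disk name and, from it, the mounted file system.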

Unable to Configure

A configure operation may fail if an I/O board has a device that does not support hot-plugging. In that case, the board is left only partially configured, because the operation stops at the device that does not support hot-plugging. The board must be returned to the unconfigured state before you attempt another configure operation. The system logs messages similar to the following:


NOTICE: configuring dual-sbus-soc+ board in slot 4
NOTICE: dual-sbus-soc+ board in slot 4 partially configured
reason:sysc iohelp configure: Bad address
output from sysctrl configure is:attach failed: /sbus@8,0/SUNW,foo@d,10000/bar

Before you retry the configure operation, either remove the driver for the unsupported device or replace it with a version of the driver that supports hot-plugging.
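
After the driver has been removed or replaced, the board must be returned to the unconfigured state before it is configured again. The following minimal sketch performs that sequence with cfgadm -c unconfigure followed by cfgadm -c configure, and stops if the first step fails. The function name reconfigure_board and the attachment point in the usage note are hypothetical; a POSIX-compatible shell is assumed.

```shell
# Hypothetical helper: return a partially configured board to the
# unconfigured state, then attempt the configure operation again.
# Run it only after the driver that blocked the first attempt has been
# removed or replaced with a version that supports hot-plugging.
reconfigure_board() {
    ap_id=$1
    if ! cfgadm -c unconfigure "$ap_id"; then
        printf '%s: could not unconfigure; board left as is\n' "$ap_id" >&2
        return 1
    fi
    cfgadm -c configure "$ap_id"
}
```

For the board in the example messages, the call would be reconfigure_board sysctrl0:slot4.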

Problems with Network Devices

DR does not automatically terminate use of all network interfaces on the board that is being disconnected. You must manually terminate the use of each interface.
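
One way to terminate use of an interface on Solaris is to take it down and then unplumb it with ifconfig(1M). The following sketch does that for each interface named on the command line. It does not determine which interfaces reside on the board; the function name stop_interfaces and the interface names in the usage note are hypothetical. Running it against the interface that carries your own login session will disconnect that session.

```shell
# Hypothetical helper: stop using each network interface named on the
# command line, for example the interfaces on a board being disconnected.
# Assumes root privileges and a POSIX-compatible shell.
stop_interfaces() {
    rc=0
    for ifname in "$@"; do
        # Take the interface down, then remove it from the IP stack.
        if ifconfig "$ifname" down && ifconfig "$ifname" unplumb; then
            printf '%s: unplumbed\n' "$ifname"
        else
            printf '%s: could not be unplumbed\n' "$ifname" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

For example, stop_interfaces hme1 qfe0 processes both interfaces and returns a nonzero status if either could not be unplumbed.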

DR does not allow an unconfigure operation on any interface that meets one of the following conditions; in such cases the operation fails and DR displays an error message:

Problems with I/O Devices

All I/O devices must be closed before they are unconfigured. To see which processes have these devices open, use the fuser(1M) command.
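
Before unmounting a file system on a disk attached to the board, you can check whether any process still has files open in it. The sketch below uses fuser -c, which reports the processes using files in the file system mounted at the given directory, and unmounts only when fuser reports none. It relies on fuser writing process IDs to standard output and all other text to standard error. The function name unmount_if_idle and the mount point in the usage note are hypothetical; root privileges and a POSIX-compatible shell are assumed.

```shell
# Hypothetical helper: unmount a file system only if no process is using it.
# Relies on fuser writing PIDs to standard output and everything else to
# standard error. Assumes root privileges and a POSIX-compatible shell.
unmount_if_idle() {
    mnt=$1
    # Some fuser implementations exit nonzero when no process is found,
    # so judge by the output rather than the exit status.
    pids=$(fuser -c "$mnt" 2>/dev/null) || :
    case $pids in
        *[0-9]*)
            # At least one PID was reported: leave the file system mounted.
            printf '%s is in use by process(es):%s\n' "$mnt" "$pids" >&2
            return 1
            ;;
    esac
    umount "$mnt"
}
```

For example, unmount_if_idle /export/home either unmounts the file system or lists the processes that still use it.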

Perform the following tasks for I/O devices.


Caution -

Unmounting file systems may affect NFS client systems.


RPC Time-out or Loss of Connection

RPC time-outs occur by default after two minutes. Administrators may need to increase this time-out value to avoid time-outs during a DR-induced operating system quiescence, which may take longer than two minutes. These changes affect both the client and server machines.