Sun Enterprise 10000 Dynamic Reconfiguration User Guide

DR Driver Error Messages

The following table contains DR driver error messages that are sent to the console window, to the /var/adm/messages file, and to the $SSPLOGGER/domain_name/messages file.
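As a rough illustration of how these messages might be picked out of a log, the following Python sketch filters lines by the driver prefixes that appear in Table A-15 (dr:, ngdr:, and ngdrmach:). The function name, the regular expression, and the sample log lines are assumptions made for illustration only. Real log lines can carry additional fields, so the pattern may need adjusting to match what your logs actually contain.

```python
import re

# Prefixes taken from the messages listed in Table A-15. This sketch assumes
# the prefix appears verbatim in the log line, at the start of the line or
# after whitespace, and is followed by a colon and a space.
DR_MESSAGE = re.compile(r"(?:^|\s)(dr|ngdr|ngdrmach): (.+)$")


def find_dr_messages(lines):
    """Return (prefix, text) pairs for lines that look like DR driver messages."""
    found = []
    for line in lines:
        match = DR_MESSAGE.search(line.rstrip("\n"))
        if match:
            found.append((match.group(1), match.group(2)))
    return found


if __name__ == "__main__":
    # Hypothetical sample lines, not real log output.
    sample = [
        "Jan  1 00:00:00 domain_name unix: dr: Device busy: resource",
        "Jan  1 00:00:01 domain_name unix: some unrelated message",
    ]
    for prefix, text in find_dr_messages(sample):
        print(prefix, "->", text)
```

The text returned for each match can then be compared against the Error Message column to find the corresponding probable cause and suggested action.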

Table A-15 DR Driver Error Messages

Error Message 

Probable Cause 

Suggested Action 

dr: Internal error: dr.c line_number

An internal error has occurred in the DR driver.  

Retry the operation that failed. If the error persists, exit and restart various DR software components, then retry the operation. If the problem still persists, reboot the domain. Check the console or the system logs for any additional information.

dr: Insufficient memory: resource

The DR framework was unable to configure or unconfigure resources because a KPHYSM_ERESOURCE error or cpu_configure()/cpu_unconfigure() error with the ENOMEM errno occurred.

This condition might be transient. Retry the DR operation. If the error persists and the failing operation is an unconfigure operation, try configuring more memory into the domain from a different domain. If the error still persists, reboot the domain.

dr: Device busy: resource

Translation of possible EBUSY errno from cpu_configure() or cpu_unconfigure(); or an I/O device cannot be detached because it is busy. This error message is also returned if a CPU to be detached is online when dr_pre_detach_cpu is called. A CPU cannot be detached while a delete memory operation is in progress.

Use showdevices(1M) on the system controller to find out why the resource is busy. Or, on the domain, use fuser(1M), psrinfo(1M), prtdiag(1M), or similar tools to find out why the device is busy. Also check whether another memory deletion is already in progress. Depending on the cause of the error, either reconfigure or shut down whatever is consuming the resource, or wait for the previous memory deletion to complete. Then retry the DR operation.

dr: Operation already in progress: resource

Translation of possible EALREADY errno from cpu_configure() or cpu_unconfigure().

Use showdevices(1M) on the system controller to examine the configuration of the specified resource. Or, on the domain, use cfgadm(1M), pbind(1M), psrinfo(1M), and similar commands to examine the configuration of the resource. Determine what operations are already in progress on this resource, and either wait for them to complete or cancel them. Then, retry the DR operation. The operation already in progress might already have terminated, so retrying the operation might succeed, or might produce another error.
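The three preceding entries each describe a message as the translation of an errno value from cpu_configure() or cpu_unconfigure(). The Python sketch below restates that correspondence as a lookup table. It is only an illustration built from the Probable Cause column, not the driver's actual implementation, and the function name dr_message_for is hypothetical.

```python
import errno

# Correspondence drawn from the Probable Cause column above: the errno value
# and the message text under which it is reported.
ERRNO_TO_DR_MESSAGE = {
    errno.ENOMEM: "Insufficient memory",
    errno.EBUSY: "Device busy",
    errno.EALREADY: "Operation already in progress",
}


def dr_message_for(err, resource):
    """Format a message in the 'dr: <text>: <resource>' shape used in the table."""
    text = ERRNO_TO_DR_MESSAGE.get(err)
    if text is None:
        # This sketch covers only the three errno values listed above.
        return None
    return "dr: {}: {}".format(text, resource)
```

For example, dr_message_for(errno.EBUSY, "resource") produces the string shown in the "Device busy" entry.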

dr: I/O error: resource

An unexpected error code resulted from a call to kphysm_del_start. A more verbose cmn_err message is also printed.

Check the verbose error message from cmn_err in the system logs or on the console for a more specific condition and suggested action.

dr: Bad address: resource

kphysm_add_memory_dynamic returned KPHYSM_EFAULT.

Retry the DR operation. If this error persists, contact your Sun service representative.

dr: No device(s) on board: board_path

The board is connected or disconnected with no devices (I/O, memory, or CPU).  

If devices were expected to be on the board, disconnect it. A qualified technician should then remove the board from the server and reseat its components.

dr: Invalid argument: attachment_point

DR was passed an invalid argument.  

Retry the DR operation. If this error persists, contact your Sun service representative.  

dr: Invalid state transition: attachment_point

A DR operation was sequenced out of order. This could be operator error if the cfgadm(1M) functions were issued out of order. Or, the DR driver could be confused due to some internal error conditions.

Retry the DR operation. If this error persists, stop and start (or unload and load) DR software components to recover from this error condition. If the error continues to persist, reboot the domain.

dr: Device in fatal state

The device could not be suspended, or it refused to be suspended.  

Retry the DR operation. If this error persists, the device could be suspend-unsafe. Check the list of suspend-unsafe devices. If the device is unsafe, use showdevices(1M) or fuser(1M) to determine how the device is in use, and manually reconfigure the consumers of the resource. Then, manually unload the driver, or if needed, unplug the cables attached to the device. It should now be safe to retry the operation. Do not plug the cables back into the device, reload its driver, or reconfigure its consumers before the DR operation has succeeded.

dr: Device failed to resume: path

A previously suspended device could not be resumed.  

 

dr: Cannot stop user thread

DR could not stop one or more user threads while preparing a device to be suspended.

Retry the DR operation. If this error persists, examine the user threads that failed to suspend, and determine why they could not be suspended. You might have to kill the threads to enable the DR operation to proceed.  

dr: Cannot quiesce realtime thread

A realtime thread was encountered in an attempt to suspend the operating system. Suspending, or quiescence, of realtime threads is not allowed. All realtime threads must be stopped or changed to non-realtime before a suspend can succeed.  

Kill the realtime thread(s), or adjust their priority by using the priocntl(1M) command. (You must obtain the PID to adjust the priority of realtime threads.)
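As a hedged illustration of the priocntl(1M) step, the Python sketch below only builds the argument list for moving a set of processes into the time-sharing (TS) scheduling class. It does not execute anything. The choice of TS as the target class and the function name are assumptions for this example; confirm the options against the priocntl(1M) man page on your system before running the resulting command.

```python
def priocntl_to_timeshare(pids):
    """Build a priocntl(1M) argument list that moves the given PIDs to the TS class.

    The resulting command has the form:
        priocntl -s -c TS -i pid <pid> [<pid> ...]
    """
    if not pids:
        raise ValueError("at least one PID is required")
    # int() rejects anything that is not a whole-number process ID.
    return ["priocntl", "-s", "-c", "TS", "-i", "pid"] + [str(int(p)) for p in pids]


if __name__ == "__main__":
    # 1234 is a placeholder PID, not a value taken from a real system.
    print(" ".join(priocntl_to_timeshare([1234])))
```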

dr: Cannot stop kernel thread: name

DR could not stop a kernel thread.  

Retry the DR operation. If this error persists, examine the kernel threads that failed to suspend, and determine why they could not be suspended. Kill the kernel threads, if possible, to enable the DR operation to proceed.

dr: Failed to off-line: cpu

A CPU could not be off-lined, preventing it from being unconfigured. The CPU might have one or more threads bound to it. An additional cmn_err message is logged if there are threads bound to the CPU. DR must be able to off-line CPUs and/or to power off CPUs before the board can be disconnected.

Check the console and system log messages to determine if threads are bound to the CPU. If they are, they can be manually unbound or rebound to CPUs on other boards in the domain. If threads are not bound to the CPU, use psrset(1M), pbind(1M), and psrinfo(1M) to determine what changes are required to enable DR to off-line the CPU. For example, you might have to add more CPUs to the domain from different boards. Or, you may have to online other CPUs. Finally, you might have to add more CPU boards to take over the CPU workload.
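When reviewing psrinfo(1M) output to decide whether other CPUs need to be brought online, a small parser can help. The Python sketch below assumes the default psrinfo output of one line per processor, beginning with the processor ID followed by its state (for example, on-line or off-line). The function names and the sample text are hypothetical.

```python
def parse_psrinfo(output):
    """Return {cpu_id: state} from psrinfo-style text, one processor per line."""
    states = {}
    for line in output.splitlines():
        fields = line.split()
        # Expect at least "<id> <state>"; skip anything that does not fit.
        if len(fields) >= 2 and fields[0].isdigit():
            states[int(fields[0])] = fields[1]
    return states


def cpus_not_online(states):
    """Return the sorted IDs of processors whose state is not 'on-line'."""
    return sorted(cpu for cpu, state in states.items() if state != "on-line")


if __name__ == "__main__":
    # Hypothetical sample text in the assumed format, not real output.
    sample = (
        "0\ton-line   since 01/01/2000 00:00:00\n"
        "1\toff-line  since 01/01/2000 00:00:00\n"
    )
    print(cpus_not_online(parse_psrinfo(sample)))
```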

dr: Failed to on-line: cpu

DR could not online a CPU on a newly-connected or previously unconfigured board.  

 

dr: Failed to start CPU: cpu

DR could not start a CPU on a newly-connected or previously unconfigured board.  

 

dr: Failed to stop CPU: cpu

DR could not power off a CPU on a board to be unconfigured. All of the CPUs on a board to be unconfigured must be taken offline and powered off before the operation can succeed.  

 

dr: Kernel cage is disabled: resource

When the kernel cage is disabled, boards hosting permanent memory cannot be detached.  

You must enable the kernel cage in /etc/system and reboot the domain.
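The table does not name the /etc/system setting. The Python sketch below assumes it is the commonly documented line set kernel_cage_enable=1 and checks whether a copy of the file's text contains it. Treat the variable name, the function name, and the parsing rules (lines beginning with * are comments, and the last matching setting wins) as assumptions to verify against your Solaris release.

```python
def kernel_cage_enabled(etc_system_text):
    """Return True if the text contains an active 'set kernel_cage_enable=1' line."""
    enabled = False
    for raw in etc_system_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines, which start with an asterisk.
        if not line or line.startswith("*"):
            continue
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0] == "set":
            name, sep, value = parts[1].partition("=")
            if sep and name.strip() == "kernel_cage_enable":
                # A later setting overrides an earlier one in this sketch.
                enabled = value.strip() == "1"
    return enabled


if __name__ == "__main__":
    # Hypothetical file content used only to exercise the function.
    print(kernel_cage_enabled("* enable the cage\nset kernel_cage_enable=1\n"))
```

Remember that a change to /etc/system takes effect only after the domain is rebooted.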

dr: No available memory target: resource

DR could not detach the board because it hosts permanent memory and there is no available target for the memory. Permanent memory must be moved to another memory component within the domain before the DR operation can succeed.  

Configure an additional memory component that contains an adequate amount of memory to act as a target for this board. Then, retry the DR operation.  

dr: VM viability test failed: resource

Translation of error code returned by kphysm_del_start.

Configure additional memory components into the domain to relieve memory resource pressure. Then, retry the DR operation.  

dr: kphysm_pre_del failed: resource

Translation of error code returned by kphysm_del_start.

Configure additional memory components into the domain to relieve memory resource pressure. Then, retry the DR operation.  

dr: Non-relocatable pages in span: resource

 

 

dr: kphysm_del_cancel: resource

 

 

dr: Memory operation failed: resource

DR failed to attach the memory on a newly attached board.  

 

dr: Can't unconfig cpu if mem online

DR cannot unconfigure a CPU if the memory on the board is online.  

You must off-line the memory before you can unconfigure the board.  

ngdrmach: Cannot read property value: Device Node node_address: property property_name

DR could not get the specified property of a particular device node.  

 

ngdrmach: Cannot determine property length: board::slot:property

DR could not get the length of the specified property for a particular device node. 

 

ngdrmach: No CPU specified for connect: slot

 

 

ngdrmach: Cannot move SIGB assignment

 

 

ngdrmach: Cannot disconnect CPU; SIGB is currently assigned: slot::board

 

 

ngdrmach: Device driver failure: path

 

 

ngdr: Must specify a CPU on the given board: cpu_id

 

 

ngdrmach: No such device: board::slot

 

 

ngdrmach: Memory configured with inter-board interleaving: board::slot

 

 

ngdrmach: Invalid board number: board_number

An invalid board number was specified for the assign board operation.  

Use a different board number, or fix the available components list on the system controller for the domain to include the board for which the assign function is failing.  

ngdrmach: Cannot proceed; Board is configured or busy: component_name

DR cannot power off or unassign a board that is still configured or busy.  

Unconfigure the board, or wait for any previous DR operations on the board to complete. Then, retry the DR operation.

ngdrmach: Firmware probe failed: attachment_point

OBP failed to probe the board.  

 

ngdrmach: Firmware deprobe failed: attachment_point

OBP failed to deprobe the board.  

 

ngdrmach: Operation not supported

The operation you attempted is not supported.  

None 

ngdrmach: Unrecognized platform command: command/options

An unrecognized command was passed to DR.  

Refer to the cfgadm_sbd(1M) man page to ensure that you use a valid argument. If you used a valid argument and this error persists, contact your Sun service representative.

ngdrmach: drmach parameter is not a valid ID

An invalid drmachid_t value was encountered.

 

ngdrmach: drmach parameter is inappropriate for operation

The wrong type of drmachid_t was passed to a function.

 

ngdrmach: Unexpected internal condition: drmach.c line_number

An internal drmach error occurred.  

Use modunload(1M) and modload(1M) to unload and then reload the drmach driver. Then, retry the DR operation. If this error persists, reboot the domain.

ngdrmach: No CPU specified for connect.

 

 

ngdrmach: Firmware move_cpu0 failed: CPU cpu_id 

 

 

ngdrmach: Cannot disconnect CPU; SIGB is currently assigned