Sun Enterprise 10000 Dynamic Reconfiguration User Guide

Appendix A DR Error Messages

This appendix contains a list of some of the error messages that you might see while you are performing DR operations. The list does not include Platform Independent Module (PIM) layer errors, which are more generic than the error messages in the following tables.

All DR error messages are sent to one or both of the following locations:

SSP applications
System error logs

Searching This Appendix

Before you use this appendix, take time to read the following list of search tips so that you can find a specific message.

Search on a specific string of text in the error message.
Avoid using numeric values. They are treated as replaceable text in this appendix.
Avoid using text that is replaceable. In this appendix, the following names are used to represent replaceable text in the error messages: descriptive message, errno_description, device_name, target_path, mount_point, interface_name_instance, interface_name, and partition_name.
If you are reading this text in hard-copy form, the tables are presented in order by the type of error or failure. The contents of the tables are sorted alphabetically in descending order.
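
The tips above about numeric values can be applied mechanically before you search. The following Python sketch is only an illustration: the function name `search_key` and the placeholder `N` are assumptions made here, not part of the DR or SSP software. It replaces stand-alone numbers in a logged message with a placeholder and collapses extra white space, leaving the fixed text to search on.

```python
import re

# Hypothetical helper (not part of DR or the SSP): build a search string
# from a logged error message by masking stand-alone numeric values.
def search_key(message: str) -> str:
    # \b\d+\b matches only numbers that stand alone, so names such as
    # cpu0_move_finished or A3000 are left untouched.
    masked = re.sub(r"\b\d+\b", "N", message)
    # Collapse runs of white space so copied console text compares cleanly.
    return " ".join(masked.split())


if __name__ == "__main__":
    print(search_key("NGDR Error: Cannot detach board 3."))
```

A board number in a logged message is masked, while a digit that is part of a function name is kept, so the result can be searched for as fixed text.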

Error-Type Links

The following are different types of errors:

SSP Errors

Use one of the following links to start your search of SSP-related error messages:

"Protocol and Communication Error Messages"

"Attach-Related Error Messages"

"Detach-Related Error Messages"

"Auto-Configuration Error Messages"

Domain Errors

Use one of the following links to start your search of domain-related error messages:

"DR Daemon Start-Up Error Messages"

"Memory Allocation Error Messages"

"DR Driver Error Messages"

"Platform Specific Module (PSM) Error Messages"

"DR General Domain Error Messages"

"DR Domain Exploration Error Messages"

"OpenBoot PROM Error Messages"

"Unsafe-Device Query Error Messages"

"Alternate Pathing (AP) Error Messages"

"DCS Error Messages"

"DR Driver Error Messages"

"Plugin Error Messages"

SSP Error Messages

The following sections contain SSP-related error messages:

Protocol and Communication Error Messages

The following table contains the protocol and communication error messages that are sent to the system logs and/or the SSP applications.

Table A-1 Protocol and Communication Failure Error Messages


Error Message	Probable Cause	Suggested Action
`NGDR Error: abort_attach_board: invalid board number`	The RPC is attempting to perform a DR operation on a board number that is not in the range of valid numbers. The DR applications carefully filter the user input to catch out-of-range board numbers before they send the RPC. Therefore, this error indicates a breakdown on the SSP or in the network connection to the SSP. Or, it indicates an incompatibility between the SSP applications and the DR daemon.	Check the SSP network connection and/or the SSP and DR applications to ensure that they are operating properly.
`NGDR Error: abort_detach_board: invalid board number`	The RPC is attempting to perform a DR operation on a board number that is not in the range of valid numbers. The DR applications carefully filter the user input to catch out-of-range board numbers before they send the RPC. Therefore, this error indicates a breakdown on the SSP or in the network connection to the SSP. Or, it indicates an incompatibility between the SSP applications and the DR daemon.	Check the SSP network connection and/or the SSP and DR applications to ensure that they are operating properly.
`NGDR Error: attach_finished: invalid board number`	The RPC is attempting to perform a DR operation on a board number that is not in the range of valid numbers. The DR applications carefully filter the user input to catch out-of-range board numbers before they send the RPC. Therefore, this error indicates a breakdown on the SSP or in the network connection to the SSP. Or, it indicates an incompatibility between the SSP applications and the DR daemon.	Check the SSP network connection and/or the SSP and DR applications to ensure that they are operating properly.
`NGDR Error: complete_attach_board: invalid board number`	The RPC is attempting to perform a DR operation on a board number that is not in the range of valid numbers. The DR applications carefully filter the user input to catch out-of-range board numbers before they send the RPC. Therefore, this error indicates a breakdown on the SSP or in the network connection to the SSP. Or, it indicates an incompatibility between the SSP applications and the DR daemon.	Check the SSP network connection and/or the SSP and DR applications to ensure that they are operating properly.
`NGDR Error: cpu0_move_finished: invalid board number`	The RPC is attempting to perform a DR operation on a board number that is not in the range of valid numbers. The DR applications carefully filter the user input to catch out-of-range board numbers before they send the RPC. Therefore, this error indicates a breakdown on the SSP or in the network connection to the SSP. Or, it indicates an incompatibility between the SSP applications and the DR daemon.	Check the SSP network connection and/or the SSP and DR applications to ensure that they are operating properly.
`NGDR Error: detach_board: invalid board number`	The RPC is attempting to perform a DR operation on a board number that is not in the range of valid numbers. The DR applications carefully filter the user input to catch out-of-range board numbers before they send the RPC. Therefore, this error indicates a breakdown on the SSP or in the network connection to the SSP. Or, it indicates an incompatibility between the SSP applications and the DR daemon.	Check the SSP network connection and/or the SSP and DR applications to ensure that they are operating properly.
`NGDR Error: detach_finished: invalid board number`	The RPC is attempting to perform a DR operation on a board number that is not in the range of valid numbers. The DR applications carefully filter the user input to catch out-of-range board numbers before they send the RPC. Therefore, this error indicates a breakdown on the SSP or in the network connection to the SSP. Or, it indicates an incompatibility between the SSP applications and the DR daemon.	Check the SSP network connection and/or the SSP and DR applications to ensure that they are operating properly.
`NGDR Error: detachable_board: invalid board number`	The RPC is attempting to perform a DR operation on a board number that is not in the range of valid numbers. The DR applications carefully filter the user input to catch out-of-range board numbers before they send the RPC. Therefore, this error indicates a breakdown on the SSP or in the network connection to the SSP. Or, it indicates an incompatibility between the SSP applications and the DR daemon.	Check the SSP network connection and/or the SSP and DR applications to ensure that they are operating properly.
`NGDR Error: drain_board_resources: invalid board number`	The RPC is attempting to perform a DR operation on a board number that is not in the range of valid numbers. The DR applications carefully filter the user input to catch out-of-range board numbers before they send the RPC. Therefore, this error indicates a breakdown on the SSP or in the network connection to the SSP. Or, it indicates an incompatibility between the SSP applications and the DR daemon.	Check the SSP network connection and/or the SSP and DR applications to ensure that they are operating properly.
`NGDR Error: get_board_config: invalid board number`	The RPC is attempting to perform a DR operation on a board number that is not in the range of valid numbers. The DR applications carefully filter the user input to catch out-of-range board numbers before they send the RPC. Therefore, this error indicates a breakdown on the SSP or in the network connection to the SSP. Or, it indicates an incompatibility between the SSP applications and the DR daemon.	Check the SSP network connection and/or the SSP and DR applications to ensure that they are operating properly.
`NGDR Error: get_board_state: invalid board number`	The RPC is attempting to perform a DR operation on a board number that is not in the range of valid numbers. The DR applications carefully filter the user input to catch out-of-range board numbers before they send the RPC. Therefore, this error indicates a breakdown on the SSP or in the network connection to the SSP. Or, it indicates an incompatibility between the SSP applications and the DR daemon.	Check the SSP network connection and/or the SSP and DR applications to ensure that they are operating properly.
`NGDR Error: get_cpu_info: invalid board number`	The RPC is attempting to perform a DR operation on a board number that is not in the range of valid numbers. The DR applications carefully filter the user input to catch out-of-range board numbers before they send the RPC. Therefore, this error indicates a breakdown on the SSP or in the network connection to the SSP. Or, it indicates an incompatibility between the SSP applications and the DR daemon.	Check the SSP network connection and/or the SSP and DR applications to ensure that they are operating properly.
`NGDR Error: get_obp_board_config: invalid board number`	The RPC is attempting to perform a DR operation on a board number that is not in the range of valid numbers. The DR applications carefully filter the user input to catch out-of-range board numbers before they send the RPC. Therefore, this error indicates a breakdown on the SSP or in the network connection to the SSP. Or, it indicates an incompatibility between the SSP applications and the DR daemon.	Check the SSP network connection and/or the SSP and DR applications to ensure that they are operating properly.
`NGDR Error: initiate_attach_board: invalid board number`	The RPC is attempting to perform a DR operation on a board number that is not in the range of valid numbers. The DR applications carefully filter the user input to catch out-of-range board numbers before they send the RPC. Therefore, this error indicates a breakdown on the SSP or in the network connection to the SSP. Or, it indicates an incompatibility between the SSP applications and the DR daemon.	Check the SSP network connection and/or the SSP and DR applications to ensure that they are operating properly.
`NGDR Error: initiate_attach_board: invalid cpu number`	The RPC is attempting to initiate an attach of a board and specifies a CPU that is not on the board. The DR applications carefully filter the user input to catch invalid CPU numbers before they send the RPC. Therefore, this error indicates a breakdown on the SSP or in the network connection to the SSP. Or, it indicates an incompatibility between the SSP applications and the DR daemon.	Check the SSP network connection and/or the SSP and DR applications to ensure that they are operating properly.
`NGDR Error: Unauthorized RPC call . . . Not owner`	The DR daemon received an RPC that failed authentication.	Check the system log for more information about this error. Also, make sure that the version numbers match for the SSP and the DR daemon and that the SSP user and network services are properly configured.

Attach-Related Error Messages

The following table contains attach-related failure error messages that are sent to the system logs and/or the SSP applications.

Table A-2 Attach-Related Failure Error Messages


Error Message	Probable Cause	Suggested Action
`NGDR Error: abort_attach_board: invalid board state`	The attach operation could not be aborted because the board is not in the init_attach state, waiting to be configured into the domain.	Wait for the board to enter the init_attach state. Only then can the attach operation be aborted.
`NGDR Error: attach_finished: invalid board state`	Communication protocol has been breached over the state of the attach operation. The DR driver and daemon disagree with the SSP that the board was waiting for the confirmation of the attach operation from the SSP.	Exit and restart the current DR application, then retry the operation. If this error persists, stop and restart the DR daemon. You may need to reboot the domain to recover from this error.
`NGDR Error: Cannot abort attach. Board ineligible for further DR operations.`	The board entered the FATAL state after the abort command was issued, causing the abort operation to fail and the board to be lost from the system.	Reboot the domain.
`dr_attach: failure executing A3000 hot_add script . . . error message`	The Sun(TM) StorEdge(TM) A3000 `hot_add` script is executed directly after a DR attach operation. If the script exists, but it cannot be executed, the error message explains why.	If you are not using, and do not plan to use, A3000 devices, you can rename the script so that it will not be found.
`initiate_attach_board: already init_attached`	You attempted to initiate the attach of a board that was already initiated.	Go to the complete attach window and continue the attach process.
`NGDR Error: complete_attach_board: invalid board state`	You tried to complete an attach operation on a board that is not eligible--the board is not in the init_attach state awaiting attachment to the domain.	Wait for the board to enter the init_attach state. Only then can the attach operation be completed.
`NGDR Error: initiate_attach_board: invalid board state`	You tried to initiate an attach operation on a board that is not eligible--the board is not in the PRESENT state awaiting attachment to the domain.	Wait for the board to enter the PRESENT state. Only then can the attach operation be initiated.
`NGDR Error: Some devices not attached. Examine the host syslog for details . . . errno_description`	Some of the devices were not configured into the domain.	Look at the system logs for more details about what devices were not configured into the domain and why they were not configured. Some devices on the board may not be supported by the operating environment or by the DR feature. You should blacklist unsupported devices.

Detach-Related Error Messages

The following table contains detach-related error messages that are sent to the system logs and/or to the SSP applications.

Table A-3 Detach-Related Failure Error Messages


Error Message	Probable Cause	Suggested Action
`NGDR Error: Cannot detach board board_number. It has interface_name interfaces configured.`	The board is not eligible to be detached because it has one or more network interfaces attached to it that are critical to the operation of the domain. The network interfaces can be any mix of primary, SSP, AP, or PBF interfaces.	Use the `ifconfig`(1M) command to determine the role of the interface(s). If the configured interface is the primary network or the SSP, manually switch the interface to the alternate interface if one exists. For an interface other than the primary and the SSP, unplumbing it may enable the detach operation to succeed. Otherwise, the domain must be shut down, and the interfaces must be moved to another board.
`NGDR Error: cpu0_move_finished: invalid board state`	Communication protocol has been breached over the eligibility of a CPU. To the SSP, the CPU has been moved off of the board. To the DR driver, the move operation is an invalid operation for that board.	None
`ifconfig down failed.`	The `ifconfig`(1M) command failed to bring down the network interfaces. The `ifconfig`(1M) command unplumbs and brings down the network interfaces before the board is detached. One of the network interfaces on the board could be busy, so manual intervention may be needed.	Log in to the domain, and, if possible, bring down the network interfaces on the board manually by using the `ifconfig`(1M) command with the `down` option. The manual execution of the command may yield more detailed information about the failure.
`ifconfig unplumb failed.`	The `ifconfig`(1M) command failed to unplumb the network interfaces. The `ifconfig`(1M) command unplumbs and brings down the network interfaces before the board is detached. One of the network interfaces on the board could be busy, so manual intervention may be needed.	Log in to the domain, and, if possible, unplumb the network interfaces manually by using the `ifconfig`(1M) command with the `unplumb` option. The manual execution of the command may yield more detailed information about the failure.
`Warning: Error return from /opt/SUNWconn/bin/nf_snmd_kill (return_value)`	The `kill`(1) command failed. Certain daemons keep network interfaces open continuously. Those daemons must be stopped before the devices they control can be detached.	Analyze the `return_value` to determine why the `kill`(1) command failed, and try to correct the problem. If necessary, use the `ps`(1) command to obtain the PID number for the daemons, and use the `kill`(1) command to stop the daemons manually.
`Warning: Error return from /opt/SUNWconn/bin/pf_snmd_kill (return_value)`	The `kill`(1) command failed. The daemons that are used to control certain network devices must be stopped before the devices can be detached because the daemons keep the interfaces open continually.	Analyze the `return_value` to determine why the `kill`(1) command failed, and try to correct the problem. If necessary, use the `ps`(1) command to obtain the PID number for the daemons, and use the `kill`(1) command to stop the daemons manually.
`NGDR Error: abort_detach: board already drained`	The CANCEL `ioctl()` failed while the DR daemon was trying to abort the detach operation. The failure caused the board to be reported as being in the UNREFERENCED state, indicating that the memory has already been drained.	The board must be completely detached before you can recover from this error. Retry the DR operation after the board has been successfully detached.
`NGDR Error: abort_detach_board: invalid board state`	Communication protocol has been breached over the eligibility of a board. To the SSP, the board is part of the domain and has been, or is being, drained of its resources. The SSP, therefore, issues the abort command to stop the detach operation. However, to the DR driver and daemon, the board is not part of the domain.	Exit and restart the DR application.
`NGDR Error: board configuration query failed.`	The DR daemon failed to ascertain the eligibility of the configuration of the board.	Stop and start the DR daemon and/or the DR driver. If this error persists, use the `modinfo`(1M), `modload`(1M), and `modunload`(1M) commands to work with the driver after you have stopped the DR daemon. Also, check the size of the DR daemon with the `ps`(1) command. If it is not between 300 and 400 Kbytes, report this error, providing as much information from the system logs as possible.
`NGDR Error: Cannot abort detach. Board detached from OS (detach completed).`	This message indicates that the detach operation has completed. It follows the message that is displayed for the `NGDR Error: abort_detach: board already drained` error message.	See the `NGDR Error: abort_detach: board already drained` message.
`NGDR Error: couldn't query cpu configuration`	The complete_detach operation has failed because the DR daemon could not ascertain the CPU configuration just prior to the beginning of the complete_detach operation. After a board is detached, the DR daemon uses the information about the CPU configuration to update the `utmp` and `wtmp` entries for each CPU on the board. Although the complete_detach operation does not depend on the updates, if the mechanisms through which the CPU configuration is queried are broken, serious problems exist, so the detach operation should not be completed.	Stop and start the DR daemon and/or the DR driver. Also, check the size of the DR daemon with the `ps`(1) command. If it is not between 300 and 400 Kbytes, report this error, providing as much information from the system logs as possible.
`NGDR Error: detach_board: invalid board state`	Communication protocol has been breached over the eligibility of a board. To the SSP, the board is part of the domain, and its resources have been drained, causing the SSP to attempt to complete the detach operation. However, to the DR driver and daemon, the board is not part of the domain.	Examine the state of the board by using the `showdevices`(1M) command, and determine the cause of the problem. Retry the drain and/or complete_detach operations to determine if the error is recoverable. Stop and start the DR daemon and driver.
`NGDR Error: detach_board: invalid board state`	The proper sequence of board states has not been followed, meaning that the board went into the error state or that an earlier failure in the drain-detach sequence of events was not properly reported.	Examine the state of the board by using the `showdevices`(1M) command, and determine the cause of the problem. Retry the drain and/or complete_detach operations to determine if the error is recoverable. Stop and start the DR daemon and driver.
`NGDR Error: detach_finished: invalid board state`	Communication protocol has been breached over the eligibility of a board. To the SSP, the board has been detached. However, to the DR driver and daemon, the board has not been detached from the domain.	Examine the state of the board by using the `showdevices`(1M) command, and determine the cause of the problem. Retry the drain and/or complete_detach operations to determine if the error is recoverable. Stop and start the DR daemon and driver.
`NGDR Error: detachable_board: invalid board state`	Communication protocol has been breached over the eligibility of a board. To the SSP, the board is part of the domain, so the SSP attempts to drain the resources. However, to the DR driver and daemon, the board is not part of the domain.	Examine the state of the board by using the `showdevices`(1M) command, and determine the cause of the problem. Retry the drain and/or complete_detach operations to determine if the error is recoverable. Stop and start the DR daemon and driver.
`NGDR Error: detaching board would leave no online CPUs`	The detach operation failed because no CPUs would be left online after the board is detached.	Bring more CPUs online on other boards in the domain, or add more boards with online CPUs to the domain, so that the domain will have enough online CPUs after the board is detached.
`NGDR Error: drain_board_resources: invalid board state`	Communication protocol has been breached over the eligibility of a board. To the SSP, the board is part of the domain, so the SSP attempts to drain the resources. However, to the DR driver and daemon, the board is not part of the domain.	Examine the state of the board by using the `showdevices`(1M) command, and determine the cause of the problem. Retry the drain and/or complete_detach operations to determine if the error is recoverable. Stop and start the DR daemon and driver.
`NGDR Error: Remaining system memory (memory_size mb) below minimum threshold (minimum_memory_size mb) . . . .Not enough space`	The domain must have enough memory to accommodate the memory of the board that is being detached. The detach operation failed because the domain does not have enough memory to detach the board.	Attach as many boards as necessary so that the memory in the domain will hold the memory on the board being detached.
`NGDR Error: Some devices not re-attached. Examine the host syslog for details . . . errno_description`	Devices could not be reattached to the operating environment during an abort detach operation. Errors were encountered while the DR daemon tried to communicate with the device drivers for one or more devices on the board.	Examine the system logs to determine which devices were not reattached. If possible, fix the problem then issue the `complete_attach`(1M) command again to fully configure the board. If this action fails, the failure may be caused by an unsupported device for which a state cannot be resolved until the domain is rebooted.
`NGDR Error: sysconf failed (_SC_NPROCESSORS_ONLN) . . . errno_description`	The `sysconf`(3C) function failed to return the total number of online CPUs in the domain. Thus, the DR daemon cannot determine if the domain would be left with any online CPUs after the board is detached.	See the `sysconf`(3C) man page for more details about this error. Use those details and the `errno_description` to diagnose and solve the error. Retry the DR operation after you have solved the error. If no fix is apparent, stop and restart the DR daemon, then retry the DR operation.
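
The online-CPU check behind the last two CPU-related messages in Table A-3 can be illustrated with a short Python sketch. It is not the DR daemon's code: the function names, and the idea of passing in the number of online CPUs on the board, are assumptions made for illustration. Python's `os.sysconf` issues the same `_SC_NPROCESSORS_ONLN` query through `sysconf`(3C) and raises an exception if the query fails.

```python
import os


def online_cpu_count() -> int:
    """Return the number of online CPUs, as reported by sysconf(3C).

    Raises OSError if the query fails, which corresponds to the
    "sysconf failed (_SC_NPROCESSORS_ONLN)" situation described above.
    """
    return os.sysconf("SC_NPROCESSORS_ONLN")


def detach_leaves_online_cpu(online_total: int, online_on_board: int) -> bool:
    """Hypothetical check: does at least one CPU stay online after the
    board's online CPUs are removed from the domain?"""
    return online_total - online_on_board >= 1


if __name__ == "__main__":
    total = online_cpu_count()
    # Assume, for illustration only, that the board holds 4 online CPUs.
    print(total, detach_leaves_online_cpu(total, 4))
```

If the check returns false, the situation matches the "detaching board would leave no online CPUs" message, and more CPUs must be brought online elsewhere in the domain first.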

Auto-Configuration Error Messages

The following table contains the list of auto-configuration error messages that are sent to the system logs and/or to the SSP applications.

Table A-4 Auto-Configuration Error Messages


Error Message	Probable Cause	Suggested Action
`NGDR Error: Complete pending DR operation prior to running autoconfig . . . Invalid argument`	The `autoconfig`(1M) command failed because a DR operation was still pending (that is, the board was not fully detached or attached before you issued the `autoconfig`(1M) command to reconfigure the operating environment).	Use the `showdevices`(1M) command to determine the state of the board. Decide to abort or complete the pending operation before you try to use the `autoconfig`(1M) command to reconfigure the operating environment.
`NGDR Error: Could not get /tmp/AdDrEm.lck lock . . . errno_description`	The DR daemon failed to get the lock it needs so that it can reconfigure the operating environment.	Check the additional `errno_description` and/or error number that is sent with the error message to determine why the lock could not be acquired.
`NGDR Error: Could not unlock /tmp/AdDrEm.lck lock . . . errno_description`	The DR daemon could not release the lock.	Check the additional `errno_description` and/or error number that is sent with the error message to determine why the lock was not released.
`NGDR Error: devlinks cmd failed. . . error descriptions`	The `devlinks`(1M) command failed to reconfigure the operating environment.	Check the additional `error descriptions` and/or error number that is sent with the error message to determine why the command failed. Manually run the command on the domain.
`NGDR Error: disks cmd failed . . . error descriptions`	The `disks`(1M) command failed to reconfigure the operating environment.	Check the additional `error descriptions` and/or error number that is sent with the error message to determine why the command failed. Manually run the command on the domain.
`NGDR Error: drvconfig cmd failed. . . error description`	The `drvconfig`(1M) command failed to reconfigure the operating environment.	Check the additional `error description` and/or error number that is sent with the error message to determine why the command failed. Manually run the command on the domain.
`NGDR Error: ports cmd failed . . . error description`	The `ports`(1M) command failed to reconfigure the operating environment.	Check the additional `error description` and/or error number that is sent with the error message to determine why the command failed. Manually run the command on the domain.
`NGDR Error: sync cmd failed . . . error description`	The `sync`(1M) command failed to reconfigure the operating environment.	Check the additional `error description` and/or error number that is sent with the error message to determine why the command failed. Manually run the command on the domain.
`NGDR Error: tapes cmd failed . . . error descriptions`	The `tapes`(1M) command failed to reconfigure the operating environment.	Check the additional `error descriptions` and/or error number that is sent with the error message to determine why the command failed. Manually run the command on the domain.

Domain Error Messages

DR Daemon Start-Up Error Messages

The following table contains a list of the DR daemon start-up errors. These messages are sent only to the domain console window.

Table A-5 DR Daemon Start-Up Error Messages


Error Message	Probable Cause	Suggested Action
`Cannot create server handle`	The DR daemon could not start up the RPC server. You will see this message only if you manually execute the DR daemon without properly configuring the network services on the domain. Normally, network services spawn the DR daemon in response to an incoming RPC from the SSP.	On the domain, fix the `inetd.conf` entry for the DR daemon.
`Cannot fork: descriptive message`	The DR daemon could not fork a process from which to run its RPC server.	The `descriptive message` corresponds to an `errno_value` and offers clues as to why the DR daemon could not fork off the RPC server. Check the resource limits and the load of the system to find a way to fix this error.
`Permission denied`	A user other than root tried to run the DR daemon.	Only the superuser (root) can run the DR daemon because the daemon needs all of the root privileges to fully explore the system and to access the driver to detach and attach boards.
`Unable to register (300326, 4)`	The DR daemon was executed without being properly registered with the network services in the domain. The first number represents the RPC number that is registered for the DR daemon. The second number represents the RPC version used by the DR daemon.	On the domain, fix the `inetd.conf` entry for the DR daemon.
`Unable to create (300326, 4) for netpath`	The DR daemon was executed without being properly registered with the network services in the domain. The first number represents the RPC number that is registered for the DR daemon. The second number represents the RPC version used by the DR daemon.	On the domain, fix the `inetd.conf` entry for the DR daemon.

DR Driver Error Messages

The following table contains the DR driver failures that are sent to the system logs and to the SSP applications. In general, refer to the descriptions of the daemon and PSM errors for details about what goes to the system logs and what goes to the SSP.

Note -

All of the possible DR driver failure messages are related to the three probable causes given in the table. Likewise, all of the failure messages have one suggested action.

Table A-6 DR Driver Error Messages


Error Message	Probable Cause	Suggested Action
`DR: Error: initiate_attach: ioctl failed` `DR: Error: complete_attach: ioctl failed` `DR: Error: abort_attach: ioctl failed` `DR: Error: get_cpu_info: ioctl failed` `DR: Error: get_mem_config: ioctl failed`	An `ioctl()` failure (that is, a failure that was encountered by the DR daemon when it tried to use the DR driver) can occur at three separate levels. At the first level, within the DR daemon, errors occur when the DR daemon and the DR driver are not interacting properly. The driver could be missing; the DR driver files in the `/devices/pseudo` directory could be missing, or the file permissions could be wrong. The DR daemon could also be experiencing memory corruption or resource limitations. The `ioctl()` failure message is followed by a message in the form: `Daemon (errno #error_number): error description`.	The context of the `ioctl()` failure (that is, which function precedes the `ioctl()` failed portion of the message), combined with the text of the error message, indicates what failed. Use the error number to identify the probable cause by checking the information on the `ioctl`(2) man page. You can also use the `/usr/include/errno.h` header file if the `ioctl`(2) man page does not have a specific reference for the error number.
`DR: Error: get_mem_cost: ioctl failed` `DR: Error: get_mem_drain: ioctl failed` `DR: Error: update_attach: ioctl failed` `DR: Error: ioctl failed, error draining resources` `DR: Error: detach_board: UNCONFIGURE ioctl failed` `DR: Error: detach_board: DISCONNECT ioctl failed` `DR: Error: abort_detach: CANCEL ioctl failed` `DR: Error: abort_detach: CONFIGURE ioctl failed` `DR: Error: get_dr_state: ioctl failed` `DR: Error: get_dr_status: ioctl failed`	At the second level, within the platform independent module (PIM) layer of the DR driver, an `ioctl()` failure could indicate busy resources, failing I/O devices on the system board, or improper interaction between the PIM and the platform specific module (PSM) layers. The `ioctl()` failure message is followed by a PIM message in the form: `PIM (error #errornumber): errno_description`. At the third level, the PSM layer, an `ioctl()` failure could indicate busy resources, failing I/O devices on the system board, memory detach failures, CPU detach failures, or internal failures encountered by the PSM driver. The error description usually cites specific physical devices that are failing or includes detailed explanations for a memory or CPU detachment failure. The `ioctl()` failure message is followed by a PSM message that appears in the following form: `PSM (error #errornumber): errno_description`. Note that failures in the PSM layer do not have corresponding `errno` values. PSM failure messages use an error number. You can find explanations of the error numbers in the `/usr/include/sys/sfdr.h` header file.	The context of the `ioctl()` failure (that is, which function precedes the `ioctl()` failed portion of the message), combined with the text of the error message, indicates what failed. Use the error number to identify the probable cause by checking the information on the `ioctl`(2) man page. You can also use the `/usr/include/errno.h` header file if the `ioctl`(2) man page does not have a specific reference for the error number.
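
Looking up a daemon-level error number by hand in `/usr/include/errno.h` can also be done programmatically. The following Python sketch assumes that the logged line follows the `Daemon (errno #error_number): error description` form quoted in Table A-6; the function name `describe_errno` and the regular expression are illustrative assumptions. It applies only to daemon-level `errno` values, not to PSM error numbers, which are explained in `/usr/include/sys/sfdr.h`.

```python
import errno
import os
import re

# Assumed message shape, based on the form quoted in Table A-6:
#   Daemon (errno #<number>): <description>
_DAEMON_ERRNO = re.compile(r"Daemon \(errno\s*#(\d+)\)")


def describe_errno(line: str):
    """Return (number, symbolic name, system description) for a daemon-level
    ioctl() failure line, or None if the line does not match."""
    match = _DAEMON_ERRNO.search(line)
    if match is None:
        return None
    number = int(match.group(1))
    # errno.errorcode maps numbers to names such as "ENOMEM" on this system.
    name = errno.errorcode.get(number, "UNKNOWN")
    return number, name, os.strerror(number)


if __name__ == "__main__":
    sample = f"Daemon (errno #{errno.EBUSY}): sample description"
    print(describe_errno(sample))
```

The symbolic names and numbers come from the system on which the sketch runs, so it should be run on the domain that logged the message.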

Memory Allocation Error Messages

The following table contains the memory allocation error messages that are sent to the system logs and to the SSP applications. Although the list contains several error messages, each of them describes one of two possible errors: ENOMEM or EAGAIN. All of the ENOMEM errors have the same suggested action, as do the EAGAIN errors.

Table A-7 Memory Allocation Error Messages


Error Message	Probable Cause	Suggested Action
`NGDR Error: malloc failed (add notnet ap info)errno_description`	While it queried the system information, the DR daemon could not allocate enough memory for a structure in which to return the requested information. The daemon may have encountered a resource limit. If the DR daemon cannot allocate memory, then it cannot continue to work. The `errno_description` usually describes an ENOMEM or EAGAIN error.	First, check the size of the daemon by using the `ps`(1) command. Normally, the daemon uses about 300- to 400- Kbytes of memory. If the daemon is larger than the above memory sizes, then it may have a memory leak. If it does, you should report this problem. An ENOMEM error means that the DR daemon is in a state from which it cannot recover. An EAGAIN error means that the problem may have been temporary. You can retry the operation, which may succeed eventually, or you may have to stop and restart the daemon.
`NGDR Error: malloc failed (alias_namelen)` `errno_description`	While it queried the system information, the DR daemon could not allocate enough memory for a structure in which to return the requested information. The daemon may have encountered a resource limit. If the DR daemon cannot allocate memory, then it cannot continue to work. The errno_description usually describes an ENOMEM or EAGAIN error.	First, check the size of the daemon by using the `ps`(1) command. Normally, the daemon uses about 300- to 400- Kbytes of memory. If the daemon is larger than the above memory sizes, then it may have a memory leak. If it does, you should report this problem. An ENOMEM error means that the DR daemon is in a state from which it cannot recover. An EAGAIN error means that the problem may have been temporary. You can retry the operation, which may succeed eventually, or you may have to stop and restart the daemon.
`NGDR Error: malloc failed (AP ctlr_t array)` `errno_description`	While it queried the system information, the DR daemon could not allocate enough memory for a structure in which to return the requested information. The daemon may have encountered a resource limit. If the DR daemon cannot allocate memory, then it cannot continue to work. The `errno_description` usually describes an ENOMEM or EAGAIN error.	First, check the size of the daemon by using the `ps`(1) command. Normally, the daemon uses about 300- to 400- Kbytes of memory. If the daemon is larger than the above memory sizes, then it may have a memory leak. If it does, you should report this problem. An ENOMEM error means that the DR daemon is in a state from which it cannot recover. An EAGAIN error means that the problem may have been temporary. You can retry the operation, which may succeed eventually, or you may have to stop and restart the daemon.
`NGDR Error: malloc failed (ap_controller)errno_description`	While it queried the system information, the DR daemon could not allocate enough memory for a structure in which to return the requested information. The daemon may have encountered a resource limit. If the DR daemon cannot allocate memory, then it cannot continue to work. The `errno_description` usually describes an ENOMEM or EAGAIN error.	First, check the size of the daemon by using the `ps`(1) command. Normally, the daemon uses about 300- to 400- Kbytes of memory. If the daemon is larger than the above memory sizes, then it may have a memory leak. If it does, you should report this problem. An ENOMEM error means that the DR daemon is in a state from which it cannot recover. An EAGAIN error means that the problem may have been temporary. You can retry the operation, which may succeed eventually, or you may have to stop and restart the daemon.
`NGDR Error: malloc failed (board_cpu_config_t) errno_description`	While it queried the system information, the DR daemon could not allocate enough memory for a structure in which to return the requested information. The daemon may have encountered a resource limit. If the DR daemon cannot allocate memory, then it cannot continue to work. The `errno_description` usually describes an ENOMEM or EAGAIN error.	First, check the size of the daemon by using the `ps`(1) command. Normally, the daemon uses about 300- to 400- Kbytes of memory. If the daemon is larger than the above memory sizes, then it may have a memory leak. If it does, you should report this problem. An ENOMEM error means that the DR daemon is in a state from which it cannot recover. An EAGAIN error means that the problem may have been temporary. You can retry the operation, which may succeed eventually, or you may have to stop and restart the daemon.
`NGDR Error: malloc failed (board_mem_config_t) errno_description`	While it queried the system information, the DR daemon could not allocate enough memory for a structure in which to return the requested information. The daemon may have encountered a resource limit. If the DR daemon cannot allocate memory, then it cannot continue to work. The `errno_description` usually describes an ENOMEM or EAGAIN error.	First, check the size of the daemon by using the `ps`(1) command. Normally, the daemon uses about 300- to 400- Kbytes of memory. If the daemon is larger than the above memory sizes, then it may have a memory leak. If it does, you should report this problem. An ENOMEM error means that the DR daemon is in a state from which it cannot recover. An EAGAIN error means that the problem may have been temporary. You can retry the operation, which may succeed eventually, or you may have to stop and restart the daemon.
`NGDR Error: malloc failed (board_mem_cost_t)errno_description`	While it queried the system information, the DR daemon could not allocate enough memory for a structure in which to return the requested information. The daemon may have encountered a resource limit. If the DR daemon cannot allocate memory, then it cannot continue to work. The `errno_description` usually describes an ENOMEM or EAGAIN error.	First, check the size of the daemon by using the `ps`(1) command. Normally, the daemon uses about 300- to 400-Kbytes of memory. If the daemon is larger than the above memory sizes, then it may have a memory leak. If it does, you should report this problem. An ENOMEM error means that the DR daemon is in a state from which it cannot recover. An EAGAIN error means that the problem may have been temporary. You can retry the operation, which may succeed eventually, or you may have to stop and restart the daemon.
`NGDR Error: malloc failed (board_mem_drain_t)` `errno_description`	While it queried the system information, the DR daemon could not allocate enough memory for a structure in which to return the requested information. The daemon may have encountered a resource limit. If the DR daemon cannot allocate memory, then it cannot continue to work. The `errno_description` usually describes an ENOMEM or EAGAIN error.	First, check the size of the daemon by using the `ps`(1) command. Normally, the daemon uses about 300- to 400-Kbytes of memory. If the daemon is larger than the above memory sizes, then it may have a memory leak. If it does, you should report this problem. An ENOMEM error means that the DR daemon is in a state from which it cannot recover. An EAGAIN error means that the problem may have been temporary. You can retry the operation, which may succeed eventually, or you may have to stop and restart the daemon.
`NGDR Error: malloc failed (dr_io)errno_description`	While it queried the system information, the DR daemon could not allocate enough memory for a structure in which to return the requested information. The daemon may have encountered a resource limit. If the DR daemon cannot allocate memory, then it cannot continue to work. The `errno_description` usually describes an ENOMEM or EAGAIN error.	First, check the size of the daemon by using the `ps`(1) command. Normally, the daemon uses about 300- to 400-Kbytes of memory. If the daemon is larger than the above memory sizes, then it may have a memory leak. If it does, you should report this problem. An ENOMEM error means that the DR daemon is in a state from which it cannot recover. An EAGAIN error means that the problem may have been temporary. You can retry the operation, which may succeed eventually, or you may have to stop and restart the daemon.
`NGDR Error: malloc failed (leaf array)` `errno_description`	While it queried the system information, the DR daemon could not allocate enough memory for a structure in which to return the requested information. The daemon may have encountered a resource limit. If the DR daemon cannot allocate memory, then it cannot continue to work. The `errno_description` usually describes an ENOMEM or EAGAIN error.	First, check the size of the daemon by using the `ps`(1) command. Normally, the daemon uses about 300- to 400-Kbytes of memory. If the daemon is larger than the above memory sizes, then it may have a memory leak. If it does, you should report this problem. An ENOMEM error means that the DR daemon is in a state from which it cannot recover. An EAGAIN error means that the problem may have been temporary. You can retry the operation, which may succeed eventually, or you may have to stop and restart the daemon.
`NGDR Error: malloc failed (leaf)errno_description`	While it queried the system information, the DR daemon could not allocate enough memory for a structure in which to return the requested information. The daemon may have encountered a resource limit. If the DR daemon cannot allocate memory, then it cannot continue to work. The `errno_description` usually describes an ENOMEM or EAGAIN error.	First, check the size of the daemon by using the `ps`(1) command. Normally, the daemon uses about 300- to 400-Kbytes of memory. If the daemon is larger than the above memory sizes, then it may have a memory leak. If it does, you should report this problem. An ENOMEM error means that the DR daemon is in a state from which it cannot recover. An EAGAIN error means that the problem may have been temporary. You can retry the operation, which may succeed eventually, or you may have to stop and restart the daemon.
`NGDR Error: malloc failed (net_leaf_array)` `errno_description`	While it queried the system information, the DR daemon could not allocate enough memory for a structure in which to return the requested information. The daemon may have encountered a resource limit. If the DR daemon cannot allocate memory, then it cannot continue to work. The `errno_description` usually describes an ENOMEM or EAGAIN error.	First, check the size of the daemon by using the `ps`(1) command. Normally, the daemon uses about 300- to 400-Kbytes of memory. If the daemon is larger than the above memory sizes, then it may have a memory leak. If it does, you should report this problem. An ENOMEM error means that the DR daemon is in a state from which it cannot recover. An EAGAIN error means that the problem may have been temporary. You can retry the operation, which may succeed eventually, or you may have to stop and restart the daemon.
`NGDR Error: malloc failed (sbus_cntrl_t)errno_description`	While it queried the system information, the DR daemon could not allocate enough memory for a structure in which to return the requested information. The daemon may have encountered a resource limit. If the DR daemon cannot allocate memory, then it cannot continue to work. The `errno_description` usually describes an ENOMEM or EAGAIN error.	First, check the size of the daemon by using the `ps`(1) command. Normally, the daemon uses about 300- to 400-Kbytes of memory. If the daemon is larger than the above memory sizes, then it may have a memory leak. If it does, you should report this problem. An ENOMEM error means that the DR daemon is in a state from which it cannot recover. An EAGAIN error means that the problem may have been temporary. You can retry the operation, which may succeed eventually, or you may have to stop and restart the daemon.
`NGDR Error: malloc failed (sbus_config)` `errno_description`	While it queried the system information, the DR daemon could not allocate enough memory for a structure in which to return the requested information. The daemon may have encountered a resource limit. If the DR daemon cannot allocate memory, then it cannot continue to work. The `errno_description` usually describes an ENOMEM or EAGAIN error.	First, check the size of the daemon by using the `ps`(1) command. Normally, the daemon uses about 300- to 400-Kbytes of memory. If the daemon is larger than the above memory sizes, then it may have a memory leak. If it does, you should report this problem. An ENOMEM error means that the DR daemon is in a state from which it cannot recover. An EAGAIN error means that the problem may have been temporary. You can retry the operation, which may succeed eventually, or you may have to stop and restart the daemon.
`NGDR Error: malloc failed (sbus_device_t)` `errno_description`	While it queried the system information, the DR daemon could not allocate enough memory for a structure in which to return the requested information. The daemon may have encountered a resource limit. If the DR daemon cannot allocate memory, then it cannot continue to work. The `errno_description` usually describes an ENOMEM or EAGAIN error.	First, check the size of the daemon by using the `ps`(1) command. Normally, the daemon uses about 300- to 400-Kbytes of memory. If the daemon larger than the above memory sizes, then it may have a memory leak. If it does, you should report this problem. An ENOMEM error means that the DR daemon is in a state from which it cannot recover. An EAGAIN error means that the problem may have been temporary. You can retry the operation, which may succeed eventually, or you may have to stop and restart the daemon.
`NGDR Error: malloc failed (sbus_usage_t)errno_description`	While it queried the system information, the DR daemon could not allocate enough memory for a structure in which to return the requested information. The daemon may have encountered a resource limit. If the DR daemon cannot allocate memory, then it cannot continue to work. You may have to stop and restart the daemon. The `errno_description` usually describes an ENOMEM or EAGAIN error.	First, check the size of the daemon by using the `ps`(1) command. Normally, the daemon uses about 300- to 400-Kbytes of memory. If the daemon is larger than the above memory sizes, then it may have a memory leak. If it does, you should report this problem. An ENOMEM error means that the DR daemon is in a state from which it cannot recover. An EAGAIN error means that the problem may have been temporary. You can retry the operation, which may succeed eventually, or you may have to stop and restart the daemon.
`NGDR Error: malloc failed (struct devnm)` `errno_description`	While it queried the system information, the DR daemon could not allocate enough memory for a structure in which to return the requested information. The daemon may have encountered a resource limit. If the DR daemon cannot allocate memory, then it cannot continue to work. The `errno_description` usually describes an ENOMEM or EAGAIN error.	First, check the size of the daemon by using the `ps`(1) command. Normally, the daemon uses about 300- to 400-Kbytes of memory. If the daemon larger than the above memory sizes, then it may have a memory leak. If it does, you should report this problem. An ENOMEM error means that the DR daemon is in a state from which it cannot recover. An EAGAIN error means that the problem may have been temporary. You can retry the operation, which may succeed eventually, or you may have to stop and restart the daemon.
`NGDR Error: malloc failed (swap name entries)errno_description`	While it queried the system information, the DR daemon could not allocate enough memory for a structure in which to return the requested information. The daemon may have encountered a resource limit. If the DR daemon cannot allocate memory, then it cannot continue to work. The `errno_description` usually describes an ENOMEM or EAGAIN error.	First, check the size of the daemon by using the `ps`(1) command. Normally, the daemon uses about 300- to 400-Kbytes of memory. If the daemon larger than the above memory sizes, then it may have a memory leak. If it does, you should report this problem. An ENOMEM error means that the DR daemon is in a state from which it cannot recover. An EAGAIN error means that the problem may have been temporary. You can retry the operation, which may succeed eventually, or you may have to stop and restart the daemon.
`NGDR Error: malloc failed (swaptbl)errno_description`	While it queried the system information, the DR daemon could not allocate enough memory for a structure in which to return the requested information. The daemon may have encountered a resource limit. If the DR daemon cannot allocate memory, then it cannot continue to work. The `errno_description` usually describes an ENOMEM or EAGAIN error.	First, check the size of the daemon by using the `ps`(1) command. Normally, the daemon uses about 300- to 400-Kbytes of memory. If the daemon is larger than the above memory sizes, then it may have a memory leak. If it does, you should report this problem. An ENOMEM error means that the DR daemon is in a state from which it cannot recover. An EAGAIN error means that the problem may have been temporary. You can retry the operation, which may succeed eventually, or you may have to stop and restart the daemon.
`NGDR Error: malloc failed (unsafe_devs)errno_description`	While it queried the system information, the DR daemon could not allocate enough memory for a structure in which to return the requested information. The daemon may have encountered a resource limit. If the DR daemon cannot allocate memory, then it cannot continue to work. The `errno_description` usually describes an ENOMEM or EAGAIN error.	First, check the size of the daemon by using the `ps`(1) command. Normally, the daemon uses about 300- to 400-Kbytes of memory. If the daemon larger than the above memory sizes, then it may have a memory leak. If it does, you should report this problem. An ENOMEM error means that the DR daemon is in a state from which it cannot recover. An EAGAIN error means that the problem may have been temporary. You can retry the operation, which may succeed eventually, or you may have to stop and restart the daemon.
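
Because every row in this table shares the same suggested action, the decision can be summarized in a few lines. The following is a minimal sketch under stated assumptions: the function name `suggest_malloc_action` is hypothetical, the daemon size is assumed to have been read separately (for example, from `ps`(1) output), and the 400-Kbyte threshold is simply the upper end of the normal range given in the table.

```python
import errno

# Upper end of the 300 to 400 Kbyte range that the table gives as normal.
NORMAL_MAX_KBYTES = 400


def suggest_malloc_action(err, daemon_kbytes):
    """Map an errno value and the daemon size (in Kbytes) to the table's advice."""
    actions = []
    if daemon_kbytes > NORMAL_MAX_KBYTES:
        actions.append("daemon is larger than normal: possible memory leak, report it")
    if err == errno.ENOMEM:
        actions.append("ENOMEM: the daemon is in a state from which it cannot recover")
    elif err == errno.EAGAIN:
        actions.append("EAGAIN: possibly temporary, retry or stop and restart the daemon")
    else:
        # The table only covers ENOMEM and EAGAIN.
        actions.append("other errno: consult the errno_description")
    return actions
```

A caller would pass the `errno` value that corresponds to the `errno_description` in the message, together with the measured daemon size.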

Platform Specific Module (PSM) Error Messages

The following table contains a list of PSM error messages that are sent to the system logs and to the SSP applications.

Table A-8 PSM Error Messages


Error Message	Probable Cause	Suggested Action
`1 SFDR_ERR_INTERNAL`	An internal driver failed.	None
`2 SFDR_ERR_SUSPEND`	Failed to suspend devices.	None
`3 SFDR_ERR_RESUME`	Failed to resume suspended devices.	None
`4 SFDR_ERR_UNSAFE`	Failed to quiesce the operating system due to referenced suspend-unsafe devices.	Determine the I/O usage of unsafe devices in the domain, and manually suspend the unsafe devices.
`5 SFDR_ERR_UTHREAD`	User thread could not be stopped.	Retry the operation. If this error persists, try stopping the process with the `kill`(1) command.
`6 SFDR_ERR_RTTHREAD`	Realtime thread could not be stopped.	Retry the operation. If this error persists, try stopping the process with the `kill`(1) command.
`7 SFDR_ERR_KTHREAD`	Kernel thread could not be stopped.	Retry the operation. If this error persists, try stopping the process with the `kill`(1) command.
`8 SFDR_ERR_OSFAILURE`	The kernel is not processing DR operations properly for the DR driver.	None
`9 SFDR_ERR_OUTSTANDING`	The `ioctl()` failed because an error from a previous DR drain operation still has not been reported through the DR status command.	Retry the operation.
`11 SFDR_ERR_CONFIG`	The current system configuration will not allow the DR operation to execute.	Check the `/etc/system` file to ensure that memory detach is enabled.
`12 SFDR_ERR_NOMEM`	Not enough memory	None
`13 SFDR_ERR_PROTO`	Protocol failure	None
`14 SFDR_ERR_BUSY`	The device is busy.	Check the I/O usage of the device to determine the cause of this error (for example, a mounted file system or the last path to an AP device). If possible, manually adjust the system to correct this error (for instance, unmount the file system). If the cause of the error is not apparent, contact your Sun service provider.
`15 SFDR_ERR_NODEV`	No devices are present.	None
`16 SFDR_ERR_INVAL`	Invalid argument and/or operation	None
`17 SFDR_ERR_STATE`	Invalid board state (transition)	None
`18 SFDR_ERR_PROBE`	Failed to probe OBP nodes for a board.	None
`19 SFDR_ERR_DEPROBE`	Failed to deprobe OBP nodes for a board.	None
`20 SFDR_ERR_HW_INTERCONNECT`	Interconnect hardware failed.	None
`21 SFDR_ERR_OFFLINE`	Failed to place a CPU offline.	None
`22 SFDR_ERR_ONLINE`	Failed to bring a CPU online.	None
`23 SFDR_ERR_CPUSTART`	Failed to start a CPU.	None
`24 SFDR_ERR_CPUSTOP`	Failed to stop a CPU.	None
`25 SFDR_ERR_JUGGLE_BOOTPROC`	Failed to move the clock-signal CPU.	None
`26 SFDR_ERR_CANCEL`	Could not cancel a RELEASE operation.	Retry the Abort Detach operation after the Drain operation is complete.
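
When a PSM message reports only an error number, a small lookup keyed on the numbers in Table A-8 can save a trip to the header file. The following is a minimal sketch: the names and numbers are copied from the table above (which lists no entry for number 10), while the function name `lookup_psm_error` and the `RETRY_FIRST` set are hypothetical conveniences. The header file `/usr/include/sys/sfdr.h` remains the authoritative source.

```python
# Number-to-name mapping copied from Table A-8. The table has no entry for 10.
PSM_ERRORS = {
    1: "SFDR_ERR_INTERNAL", 2: "SFDR_ERR_SUSPEND", 3: "SFDR_ERR_RESUME",
    4: "SFDR_ERR_UNSAFE", 5: "SFDR_ERR_UTHREAD", 6: "SFDR_ERR_RTTHREAD",
    7: "SFDR_ERR_KTHREAD", 8: "SFDR_ERR_OSFAILURE", 9: "SFDR_ERR_OUTSTANDING",
    11: "SFDR_ERR_CONFIG", 12: "SFDR_ERR_NOMEM", 13: "SFDR_ERR_PROTO",
    14: "SFDR_ERR_BUSY", 15: "SFDR_ERR_NODEV", 16: "SFDR_ERR_INVAL",
    17: "SFDR_ERR_STATE", 18: "SFDR_ERR_PROBE", 19: "SFDR_ERR_DEPROBE",
    20: "SFDR_ERR_HW_INTERCONNECT", 21: "SFDR_ERR_OFFLINE",
    22: "SFDR_ERR_ONLINE", 23: "SFDR_ERR_CPUSTART", 24: "SFDR_ERR_CPUSTOP",
    25: "SFDR_ERR_JUGGLE_BOOTPROC", 26: "SFDR_ERR_CANCEL",
}

# Rows whose suggested action in Table A-8 begins with "Retry the operation."
RETRY_FIRST = {5, 6, 7, 9}


def lookup_psm_error(number):
    """Return (name, retry_first) for a PSM error number, or None if unlisted."""
    name = PSM_ERRORS.get(number)
    if name is None:
        return None
    return name, number in RETRY_FIRST
```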

DR General Domain Error Messages

The following table contains a list of the general failure error messages that are sent to the system logs and/or to the SSP applications.

Table A-9 DR General Domain Error Messages


Error Message	Probable Cause	Suggested Action
`NGDR Error: Cannot fork() process . . .` `errno_description`	The DR daemon could not fork off a process for the command to run in. A message in the form "running command" appears in the system logs prior to this error message, or any other error message about failed commands.	The `errno_description` offers hints on how to fix the command that you want to run. Also check the man page for the command. It may have an explanation of the error.
`DR Error:` `command` `has continued`	While the DR daemon was running external commands, one of the commands failed or exited abnormally. The DR feature executes external commands (for example, `drvconf`) to configure the software subsystems.	Run the program manually on the domain. If the command fails again, refer to the man page for the command. It may have an explanation of the error.
`DR Error:` `command` `stopped by signal` `signal_number`	While the DR daemon was running external commands, one of the commands failed or exited abnormally. The DR feature executes external commands (for example, `drvconf`) to configure the software subsystems.	Run the program manually on the domain. If the command fails again, refer to the man page for the command. It may have an explanation of the error.
`DR Error:` `command` `terminated due to signal` `signal_number`	While the DR daemon was running external commands, one of the commands failed or exited abnormally. The DR feature executes external commands (for example, `drvconf`) to configure the software subsystems.	Run the program manually on the domain. If the command fails again, refer to the man page for the command. It may have an explanation of the error.
`DR Error:` `command` `terminated due to signal` `signal_number`. `Core dumped.`	While the DR daemon was running external commands, one of the commands failed or exited abnormally. The DR feature executes external commands (for example, `drvconf`) to configure the software subsystems.	Run the program manually on the domain. If the command fails again, refer to the man page for the command. It may have an explanation of the error.
`NGDR Error: dr_issue_ioctl: failed closing driver . . .` `errno_description`	The DR daemon encountered a failure while it tried to close a DR driver entry point. A more detailed explanation of this failure accompanies the error message.	Use the `close`(2) man page and the `errno_description` to determine what caused this error and how to solve it.
`Cannot exec command (errno =` `errno_value`).	The DR daemon could not execute the external command. A more detailed explanation of this failure accompanies the error message.	Check the system logs to determine which command failed. See the `exec`(2) man page for more details about the specified `errno_value`. Use this information to solve the error.
`dr_get_sysbrd_info: NULL parameter`	An invalid pointer was given to the DR daemon during a query of the slot-to-memory address mapping. Either an RPC gave an incorrect value, or the DR daemon called itself with an invalid parameter.	You should gather as much information about this problem as possible from the system logs so that you can determine the cause of the failure. Try stopping and starting the DR daemon and the SSP application. If this error persists, report it to your Sun service representative.
`update_cpu_info: bad board number`	A problem within the DR daemon occurred, causing it to call its internal routines with incorrect values.	You should gather as much information about this problem as possible from the system logs so that you can determine the cause of the failure. You should also report this problem, and if it persists, you may have to stop and restart the daemon.
`WARNING: Failed to update board` `board_number` `'s modification time [non-fatal].`	Updating the board modification time has failed. After a board has been modified (for example, memory or CPUs added), it is probed or deprobed by OBP so that OBP can inform other programs of the change. Then, the modification time is updated.	This error is non-fatal.
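
The four `DR Error:` `command` messages in this table read like the standard ways in which a child process can report an abnormal status to its parent. The following sketch shows how such a status could be turned into those message texts. It is an illustration only: this guide does not describe how the DR daemon builds these messages, so the mapping to the POSIX wait-status checks and the function name `describe_wait_status` are assumptions.

```python
import os


def describe_wait_status(command, status):
    """Render a wait status in the style of the DR Error messages in Table A-9.

    Returns None when the status does not match any of the four messages
    (for example, when the command exited normally).
    """
    if os.WIFCONTINUED(status):
        return f"DR Error: {command} has continued"
    if os.WIFSTOPPED(status):
        return f"DR Error: {command} stopped by signal {os.WSTOPSIG(status)}"
    if os.WIFSIGNALED(status):
        text = f"DR Error: {command} terminated due to signal {os.WTERMSIG(status)}"
        if os.WCOREDUMP(status):
            text += ". Core dumped."
        return text
    return None
```

The `status` argument is the raw value returned by a call such as `os.waitpid()`, not a shell exit code.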

OpenBoot PROM Error Messages

The following table contains the list of OpenBoot(TM) PROM (OBP) error messages that are sent to the system logs and/or to the SSP applications.

Table A-10 OpenBoot PROM Error Messages


Error Message	Probable Cause	Suggested Action
`cpu unit without upa-portid [non-fatal]`	This message indicates that corrupted or incorrect values were found in the OBP structures, meaning that the information in the OBP Configuration window will not be correct.	This is a non-fatal error. If this error persists, reboot the domain. If the error persists after the reboot, report it to your Sun service representative, providing as much information about the error as possible.
`OBP_info: bad child units [non-fatal]`	This message indicates that corrupted or incorrect values were found in the OBP structures, meaning that the information in the OBP Configuration window will not be correct.	This is a non-fatal error. If this error persists, reboot the domain. If the error persists after the reboot, report it to your Sun service representative, providing as much information about the error as possible.
`obp_info: bad slot number [non-fatal]`	This message indicates that corrupted or incorrect values were found in the OBP structures, meaning that the information in the OBP Configuration window will not be correct.	This is a non-fatal error. If this error persists, reboot the domain. If the error persists after the reboot, report it to your Sun service representative, providing as much information about the error as possible.
`obp_info: missing sbus name [non-fatal]`	This message indicates that corrupted or incorrect values were found in the OBP structures, meaning that the information in the OBP Configuration window will not be correct.	This is a non-fatal error. If this error persists, reboot the domain. If the error persists after the reboot, report it to your Sun service representative, providing as much information about the error as possible.
`obp_info: missing slot number [non-fatal]`	This message indicates that corrupted or incorrect values were found in the OBP structures, meaning that the information in the OBP Configuration window will not be correct.	This is a non-fatal error. If this error persists, reboot the domain. If the error persists after the reboot, report it to your Sun service representative, providing as much information about the error as possible.
`sbus node without upa-portid [non-fatal]`	This message indicates that corrupted or incorrect values were found in the OBP structures, meaning that the information in the OBP Configuration window will not be correct.	This is a non-fatal error. If this error persists, reboot the domain. If the error persists after the reboot, report it to your Sun service representative, providing as much information about the error as possible.
`sysio_num out of range [non-fatal]`	This message indicates that corrupted or incorrect values were found in the OBP structures, meaning that the information in the OBP Configuration window will not be correct.	This is a non-fatal error. If this error persists, reboot the domain. If the error persists after the reboot, report it to your Sun service representative, providing as much information about the error as possible.
`NGDR Error: cannot open /dev/openprom. . .` `errno_description`	The DR daemon could not open the entry point for the domain OBP information, meaning that no information will appear in the OBP Configuration window. This error is not fatal.	Determine what caused this error by using the `open`(2) man page and the `errno_description`. The DR daemon may have encountered a resource limit. If so, stop the daemon, then restart it. Also, check the size of the DR daemon. It should be between 300 and 400 Kbytes. If it is not within this range, stop the daemon, then restart it. If you cannot recover the domain from this error or symptoms of a memory leak exist, report this error to your Sun service representative, providing as much information from the system logs as possible.
`NGDR Error: close error on /dev/openprom`	The DR daemon failed to close the entry point for the OBP driver.	Determine what caused this error by using the error messages that preceded this error message. Fix the error if possible.
`NGDR Error: dev/openprom busy. Cannot open.`	The entry point for the domain OBP information was busy, meaning that no information will appear in the OBP Configuration window. This error is non-fatal.	Retry the operation. Check for processes that may be keeping the entry point open by using the `ps`(1) command. Stop any processes that are keeping the entry point open.
`NGDR Error: get_obp_board_config: invalid board state`	Communication protocol was breached over the eligibility of a board when the SSP application tried to query the OBP information for a board. To the SSP, the board is part of the domain, so the SSP attempts to drain the board resources. However, to the DR driver and daemon, the board is not part of the domain.	None
`NGDR Error: OBP config: too many CPUs`	The DR daemon found too many CPUs attributed to a system board in the OBP structures. To OBP, the board has more CPUs than it could possibly have (for instance, five or more).	Ensure that OBP is operating properly. If it is not, reboot the domain.
`NGDR Error: OPROMCHILD. . . errno_description`	An `ioctr()` performed on the OBP driver entry point failed, specifically the `ioctr()` used to walk the child OBP node in the device tree, meaning that the information in the OBP Configuration window will not be complete.	Determine what caused this error by using the `errno_value` or the `errno_description` that accompanies this error message. Fix the error is possible.
`NGDR Error: OPROMGETPROP. . . errno_description`	An `ioctl()` performed on the OBP driver entry point failed, specifically the `ioctl()` used to acquire the OBP properties, meaning that the information in the OBP Configuration window will be incomplete.	Determine what caused this error by using the `ioctl(2)` man page and the `errno_description` that accompanies this error message. Fix the error if possible.
`NGDR Error: OPROMNEXT. . . errno_description`	An `ioctl()` performed on the OBP driver entry point failed, specifically the `ioctl()` used to walk to the next OBP node in the device tree, meaning that the information in the OBP Configuration window will not be complete.	Determine what caused this error by using the `ioctl`(2) man page and the `errno_description` that accompanies this error message. Fix the error if possible.
`NGDR Error: System architecture does not support this option of this command.`	An unsupported option was given to the DR daemon as it walked the OBP tree for the domain, meaning that part of the information in the OBP Configuration window will be incorrect. This error is non-fatal.	None
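
Many messages in this appendix end with an `errno_value` or an `errno_description`. As a minimal sketch (it is not part of the DR software), the following Python helper translates a numeric errno into its symbolic name and description so that it can be matched against the relevant man page. The function name `describe_errno` is hypothetical, and the description text depends on the C library of the system where the script runs.

```python
import errno
import os


def describe_errno(value):
    """Return (symbolic_name, description) for a numeric errno value.

    describe_errno is an illustrative helper, not a DR or SSP command.
    """
    # errno.errorcode maps numbers to names such as "EBUSY"
    name = errno.errorcode.get(value, "UNKNOWN")
    # os.strerror returns the platform's description of the error
    return name, os.strerror(value)


if __name__ == "__main__":
    for value in (errno.EBUSY, errno.EMFILE):
        name, text = describe_errno(value)
        print(f"errno={value} {name}: {text}")
```

Errno numbers are platform-specific, so run the lookup on the same operating environment that logged the message.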

DR Domain Exploration Error Messages

The following table contains the system exploration error messages that are sent to the system logs and/or to the SSP applications.

Table A-11 DR Domain Exploration Error Messages


Error Message	Probable Cause	Suggested Action
`Cannot open /etc/driver_aliases; dr_daemon may not operate correctly without driver alias mappings . . .errno_description`	The DR daemon made an incorrect decision about the detachability and usage of devices in the domain. It is a non-fatal error.	Analyze what caused this error by using the `errno_description`, and try to correct the error. Look for incorrect file permissions or some kind of resource limit that has been encountered. After you correct the error, you must stop the DR daemon, then restart it so that it attempts to read the driver alias mappings again.
`Cannot open mnttab (errno=errno_value)`	The DR daemon does not allow a detachability test to pass if the `mnttab` file cannot be opened and examined to determine which file systems are mounted. If the test is not stopped, a mounted file system could be detached from the domain.	Analyze the cause of this error by using the `errno_value`, and try to correct the error. The DR daemon may have encountered a resource limit. If so, stop the daemon then restart it. Also, check the size of the DR daemon. It should be between 300- and 400-Kbytes. If it is not within this range, stop the daemon then restart it.
`Cannot open socket (errno=errno_value)` This error message is sent only to the system logs.	The DR daemon could not open a network device. All network devices are opened to test their usage.	Determine what caused this error by using the `errno_value`. The DR daemon may have encountered a resource limit. If so, stop the daemon then restart it. Also, check the size of the DR daemon. It should be between 300- and 400-Kbytes. If it is not within this range, stop the daemon then restart it. If you cannot recover the domain from this error or if symptoms of a memory leak exist, report this error to your Sun service representative, providing as much information from the system logs as possible.
`get_cpu_bindings: can't access /proc filesystem [non-fatal].`	The `/proc` filesystem cannot be opened. When the DR daemon explores the domain to determine the CPU information for a board, the `/proc` filesystem is examined to determine which PIDs, if any, are bound to the CPUs on the board. Bound processes negatively affect the detachability of a board. A complete detach operation will fail if processes are bound to a CPU.	Check to see why the `/proc` filesystem cannot be accessed. In the domain, process binding and processor set management programs, or processor management programs, can be used to manually determine the CPU information for a board.
`get_mem_config: couldn't determine total system memory size; only 1 board counted [non-fatal].`	When the DR daemon tried to count the amount of total memory, it could report only the amount of memory on the selected board, meaning that the system memory field reported by the drshow `board_number` `mem` command is inaccurate. The inaccuracy also negatively affects the eligibility of a board for a Detach operation because if the total memory cannot be calculated, then the effects of removing a board from the domain cannot be calculated either.	Stop and restart the DR daemon and driver. Report this error, providing as much information from the system logs as possible. A memory leak could also have occurred over time. Check the size of the DR daemon by using the `ps`(1) command. The size should be between 300- and 400-Kbytes. If the size is not within this range, stop and start the DR daemon and driver.
`get_net_config_info: interface_name no address (errno=errno_value)`	The DR daemon encountered a failure while it tried to obtain information about a network interface that was configured by using the `ifconfig`(1M) command.	Determine what caused this error by using the `errno_value`, then correct the error.
`getmntent returned error`	The `getmntent`(3c) system call failed because the mount-point entries could not be properly examined. If the mount-point entries cannot be properly examined, a mounted file system could be detached from the domain.	Analyze the `mnttab` file for possible corruption. If any exists, correct it. Also, the DR daemon may have encountered a resource limit. If so, stop the daemon then restart it. Finally, check the size of the DR daemon. It should be between 300- and 400-Kbytes. If it is not within this range, stop the daemon then restart it. If you cannot recover the domain from this error or symptoms of a memory leak exist, report this error to your Sun service representative, providing as much information from the system logs as possible.
`Host addr for interface_name not found (h_errno=errno_value)`	The file that is needed to test each active network device may not exist, or it may be corrupted. While the network devices are examined, each active network device is tested to determine if it is the primary network interface for the domain. The DR daemon will not allow the detachability test to pass if it cannot determine which active network device is the primary network interface for the domain.	Use the `errno_value` to determine if the file exists or if it is corrupted, and correct the error as necessary. The file is named `/etc/hostname`.`interface_name`, where `interface_name` is the interface named in the error message.
`Host address field for interface_name is null!!`	The IP address for the primary interface (`interface_name`) is not set properly. While the network devices are examined, each active network device is tested to determine if it is the primary network interface for the domain. The DR daemon will not allow the detachability test to pass if it cannot determine which active network device is the primary network interface for the domain.	Reconfigure the network setup for the domain. You may need to reboot the domain to configure network devices.
`Host address for interface_name must be internet address.`	The file that is needed to test each active network device may have a corrupted value or an incorrect network address. While the network devices are examined, each active network device is tested to determine if it is the primary network interface for the domain. The DR daemon will not allow the detachability test to pass if it cannot determine which active network device is the primary network interface for the domain.	Make sure that the hostname file for the primary network interface contains an IP address in the proper form (that is, xxx.xxx.xxx.xxx). The file is named `/etc/hostname.interface_name`, where `interface_name` is the interface named in the error message.
`I/O bus device tree not built.`	This error message contains additional information about the `NGDR Error: device tree not built` error message, in which the `libdevinfo` API failed to build the device tree for the system board.	See the `NGDR Error: device tree not built` error message.
`minor_walk: failed to build net leaf.`	This error message contains additional information about the `NGDR Error: device tree not built` error message, in which the `libdevinfo` API failed to build the device tree for the system board. This message indicates that the `libdevinfo` API at least started to look at the minor devices for a network leaf node.	See the `NGDR Error: device tree not built` error message.
`minor_walk: failed to build non-net leaf.`	This error message contains additional information about the `NGDR Error: device tree not built` error message, indicating that the `libdevinfo` API at least started to look at the minor devices for a non-network leaf node.	See the `NGDR Error: device tree not built` error message.
`Partition partition_name does not have parent.`	The device tree is in error because it includes a disk partition that does not have a parent device, such as the disk to which the partition belongs.	A device could be bad, or a reboot may be necessary. If this error continues to appear, report the error to your Sun service representative, providing as much information from the system logs as possible.
Recursive symlink found `symbolic_link_name'. Please remove it.	The DR daemon found a symbolic link that creates a recursive loop as it walked the `/dev` and `/devices` directories. The DR daemon will not allow the detachability test to pass if it finds a recursive symbolic link in one of these directories.	Remove the symbolic link so that the test can be retried.
`swapctl SC_GETNSWP failed (errno=errno_value)`	The `swapctl`(2) system call failed. This system call is used to determine which disk partitions are in use as swap space. The DR daemon will not allow the detachability test to pass if the use of swap partitions cannot be determined.	Analyze what caused this error by using the `swapctl`(2) man page and the `errno_value`, and try to correct it. The DR daemon may have encountered a resource limit. If so, stop the daemon then restart it. Also, check the size of the DR daemon. It should be between 300- and 400-Kbytes. If it is not within this range, stop the daemon then restart it. If you cannot recover the domain from this error or if symptoms of a memory leak exist, report this error to your Sun service representative, providing as much information from the system logs as possible.
`Unable to find cwd errno_value`	The DR daemon could not save the current working directory. The daemon switches into the `/dev` and `/devices` directories to produce the real pathnames that correspond to device drivers.	Determine what caused this error by using the `getcwd`(3c) man page and the `errno_value`, then correct the error.
`Unable to find the cwd errno_value`	The DR daemon could not determine the name of the driver directory. The daemon switches into the `/dev` and `/devices` directories to produce the real pathnames that correspond to device drivers.	Determine what caused this error by using the `getcwd`(3c) man page and the `errno_value`, then correct the error.
`Unable to get swap entries (errno=errno_value)`	The `swapctl`(2) system call failed. This system call is used to determine which disk partitions are in use as swap space. The DR daemon will not allow the detachability test to pass if swap partitions cannot be determined.	Analyze what caused this error by using the `swapctl`(2) man page and the `errno_value`, and try to correct it. The DR daemon may have encountered a resource limit. If so, stop the daemon then restart it. Also, check the size of the DR daemon. It should be between 300- and 400-Kbytes. If it is not within this range, stop the daemon then restart it. If you cannot recover from this error or if symptoms of a memory leak exist, report this error to your Sun service representative, providing as much information from the system logs as possible.
`Unable to lstat devlink_file errno_value`	The `lstat`(2) system call failed when it encountered the `devlink_file`, where `devlink` is the name of the symbolic link in the `/dev` directory.	Determine what caused this error by using the `lstat`(2) man page and the `errno_value`. The DR daemon may have encountered a resource limit. If so, stop the daemon then restart it. Also, check the size of the DR daemon. It should be between 300- and 400-Kbytes. If it is not within this range, stop the daemon then restart it. If you cannot recover the domain from this error or if symptoms of a memory leak exist, report this error to your Sun service representative, providing as much information from the system logs as possible.
`Unable to open hostname_file (errno=errno_value)`	The information that is needed to test each active network device could not be acquired. While the network devices are examined, each active network device is tested to determine if it is the primary network interface for the domain. The DR daemon will not allow the detachability test to pass if it cannot determine which active network device is the primary network interface for the domain.	Analyze what caused this error by using the `open`(2) man page and the `errno_value`, and try to correct it. Look for incorrect file permissions or non-existent files. The `hostname_file` value consists of a file named `/etc/hostname`.`ifname`, where `ifname` is a device name, such as `hme0` or `le0`.
`Unable to read host name from hostname_file`	The file that is needed to test each active network device could not be read. While the network devices are examined, each active network device is tested to determine if it is the primary network interface for the domain. The DR daemon will not allow the detachability test to pass if it cannot determine which active network device is the primary network interface for the domain.	Ensure that the file has the correct permissions and that it has not been corrupted.
`Unable to readlink devlink_file errno_value`	The `readlink`(2) system call failed when it encountered the `devlink_file`, where `devlink` is the name of the symbolic link in the `/dev` directory.	Determine what caused this error by using the `readlink`(2) man page and the `errno_value`. The DR daemon may have encountered a resource limit. If so, stop the daemon, then restart it. Also, check the size of the DR daemon. It should be between 300- and 400-Kbytes. If it is not within this range, stop the daemon, then restart it. If you cannot recover the domain from this error or if symptoms of a memory leak exist, report this error to your Sun service representative, providing as much information from the system logs as possible.
`Unable to restore cwd errno_value`	The DR daemon was unable to change back to the original directory after it changed into the `/dev` or `/devices` directory. The DR daemon changes into the `/dev` and `/devices` directories to explore the relationships of the device driver with other drivers.	This error should not pose a problem for the domain, but you should determine what caused the error by using the `errno_value`.
`Unable to set cwd errno_value`	The DR daemon could not change into the `/dev` and `/devices` directories. The daemon switches into these directories to produce the real pathnames that correspond to device drivers.	Determine what caused this error by using the `chdir`(2) man page and the `errno_value`, then correct the error.
`unknown node type`	The device tree was built incorrectly. Several functions create the device tree for a system board by using the `libdevinfo` API and by searching the `/dev` and `/devices` directories. After the tree is constructed, it is passed on to the `rpc_info()` function, which builds the tree, performs some verifications, then translates the tree into a structure that can be returned from an RPC.	Check the size of the DR daemon. It should be between 300- and 400-Kbytes. If it is not within this range, stop the daemon, then restart it. If you cannot recover the domain from this error, report this error to your Sun service representative, providing as much information from the system logs as possible.
`utssys failed (errno_value) for mount_point`	The `utssys()` system call failed. This system call is used to determine the usage count for a mounted partition. The DR daemon will not allow the detachability test to pass if the usage count cannot be determined.	Analyze what caused this error by using the `errno_value`, and try to correct it. The DR daemon may have encountered a resource limit. If so, stop the daemon then restart it. Also, check the size of the DR daemon. It should be between 300- and 400-Kbytes. If it is not within this range, stop the daemon then restart it. If you cannot recover the domain from this error or if symptoms of a memory leak exist, report this error to your Sun service representative, providing as much information from the system logs as possible.
`walk_dir: dirlist buffer overflow.`	As it walked the `/dev` and `/devices` directories, the DR daemon encountered too many directories, causing a buffer overflow. If this message occurs, detection of or protection against recursive symbolic links is disabled.	Check the `/dev` and `/devices` directories for recursive symbolic links. Remove any recursive symbolic links that you find.
`walk_dir: tpath buffer overflow. target_path, device_name`	The DR daemon cannot add another directory to the `target_path`. The daemon walks the `/dev` and `/devices` directories to discover device name links so that it can add them to the target path. If the daemon encounters this limit, it cannot explore any more directories because the buffer is full. If the daemon stops its search, some of the devices will not appear in the views (DR daemon and SSP) of the domain device tree. You may also see improper autoswitching of AP devices if this error occurs.	Devices that are not added to the target path must be manually unconfigured and switched to other boards in the domain. You may also need to stop any daemon that is keeping a device open.
`WARNING: cannot check for cvc/ssp interface.`	The information that is needed to test each active network device could not be acquired. While the network devices are examined, each active network device is tested to determine if it corresponds to the SSP network interface for the domain. The DR daemon will not allow the detachability test to pass if it cannot determine the SSP network interface. If the network loses the SSP network interface during a detach operation, DR operations are disabled in the domain, and `netcon`(1M) sessions are disabled.	Switch the suspected interface to a redundant network connection on another board. You may have to reboot the domain to recover from this error.
`WARNING: Cannot check for primary interface`	The information that is needed to test each active network device could not be acquired. While the network devices are examined, each active network device is tested to determine if it is the primary network interface for the domain. The DR daemon will not allow the detachability test to pass if it cannot determine which active network device is the primary network interface for the domain.	Determine which board hosts the primary network interface and re-attach the board to the domain. Or, switch the interface to a redundant network connection on another board in the domain. You may have to reboot the domain to recover from this error.
`WARNING: Cannot determine if interface_name_instance is cvc/ssp interface. SIOCGIFNETMASK errno=errno_value`	The DR daemon failed to obtain the necessary information to test an active network interface to determine if it is the SSP connection. While the network devices are examined, each active network device is tested to determine if it is the SSP connection for the domain. The DR daemon will not allow the detachability test to pass if it cannot determine which active network device is the SSP connection for the domain. If the network loses the SSP connection during a DR Detach operation, DR operations and `netcon`(1M) sessions are disabled.	Switch the network interface (`interface_name`) to another board. If you cannot correct this error, you may have to reboot the domain.
`WARNING: cannot stat device_name errno=errno_value`	The `stat`(2) system call cannot access the `/dev` entry point for a device in the system device tree.	Use the `stat`(2) man page and the `errno_value` to determine why the file `device_name` could not be accessed.
`NGDR Error: Bad page size from sysconf . . . errno_description`	The `sysconf`(3c) system call returned an incorrect value for the system page size, meaning that the system call is broken or that it is not providing a required feature. This error may also explain why queries for memory information or detachability tests are failing due to incorrect reporting of memory sizes.	Use the `sysconf`(3c) man page and the `errno_value` to determine the cause of the error.
`NGDR Error: device tree not built.`	The `libdevinfo` API failed to build the device tree for the system board. More detailed information about this error accompanies the error message.	Make sure that the correct version of the `libdevinfo` API is included on the domain and that a version mismatch does not exist between the DR daemon's libraries, the operating environment on the domain, or the DR daemon itself. If no cause can be found, report this error to your Sun service representative.
`NGDR Error: dr_get_partn_cpus: cannot get cpu's partition . . . errno_description`	The DR daemon tried to use the `pset_assign`(2) function, but the function failed. The DR daemon uses this function to obtain the processor set and partitioning information, which it sends to the CPU Configuration window.	Use the `pset_assign`(2) man page and the `errno_description` to determine and correct the cause of this error.
`NGDR Error: dr_get_partn_cpus: failed to get cpu partition info . . . errno_description`	The DR daemon tried to use the `pset_info`(2) function, but the function failed. The DR daemon uses this function to obtain the processor set and partitioning information, which it sends to the CPU Configuration window.	Use the `pset_info`(2) man page and the `errno_description` to determine and correct the causes of this error.
`NGDR Error: dr_page_to_kb: page size smaller than a KB`	A math error occurred, or an incorrect memory value was used in a memory calculation.	Report this error to your Sun service representative.
`NGDR Error: get_board_config: invalid board state`	A communication protocol has been breached over the eligibility of a board. To the SSP, the board is part of the domain. However, to the DR daemon and driver, the board is not part of the domain.	Stop and start the DR application, then retry the operation. If the error persists, use the `kill`(1M) command to stop the DR daemon, then start the DR daemon and retry the DR operation.
`NGDR Error: get_board_config: invalid flag`	The SSP passed an invalid or unsupported flag to the DR daemon when the daemon tried to ascertain the configuration of a board.	Make sure that the version numbers match for the SSP and the DR daemon. Also, check the size of the daemon by using the `ps`(1) command. Normally, the daemon uses about 300- to 400-Kbytes of memory. If the daemon has grown far beyond the above memory sizes, then an internal error may have occurred within it. You may have to stop and restart the DR daemon to recover from this error.
`NGDR Error: libdevinfo failed.`	The initial routine used to open the `libdevinfo` API failed, so the DR daemon could not explore the device tree for that board. The `libdevinfo` API builds a tree of dev-info nodes for a board as part of the DR daemon's exploration of the domain devices and their usage. The tree is required by AP and DR operations to test the detachability of a board's I/O devices. It is also used to inform the user of what devices are on what system boards.	Make sure that the correct version of the `libdevinfo` API is included on the domain and that a version mismatch does not exist between the DR daemon's libraries, the operating environment on the domain, or the DR daemon itself. If no cause can be found, report this error to your Sun service representative.
`get_cpu_info: cpu state info is incomplete [non-fatal].`	The DR daemon could not gather the states of the CPUs (either online or offline). Therefore, the information about each CPU in the CPU Configuration window will not be accurate.	None
`NGDR Error: build_rpc_info: bad slot number`	The device tree was built incorrectly. Several functions create the device tree for a system board by searching through the `/dev` and `/devices` directories and by using the `libdevinfo` API. After the tree is built, it is passed to the `build_rpc_info()` function, which performs some verification of the tree as it translates the DR daemon device tree into a structure that can be returned from an RPC.	Check the size of the DR daemon by using the `ps`(1) command. Normally, the daemon uses about 300- to 400-Kbytes of memory. If the daemon has grown far beyond the above memory sizes, then an internal error may have occurred within it. You may have to stop and restart the DR daemon to resolve this error. Report this error to your Sun service representative, providing as much information from the system logs as possible.
`NGDR Error: build_rpc_info: device address format error`	The device tree was built incorrectly. Several functions create the device tree for a system board by searching through the `/dev` and `/devices` directories and by using the `libdevinfo` API. After the tree is built, it is passed to the `build_rpc_info()` function, which performs some verification of the tree as it translates the DR daemon device tree into a structure that can be returned from an RPC.	Check the size of the DR daemon by using the `ps`(1) command. Normally, the daemon uses about 300- to 400-Kbytes of memory. If the daemon has grown far beyond the above memory sizes, then an internal error may have occurred within it. You may have to stop and restart the DR daemon to resolve this error. Report this error to your Sun service representative, providing as much information from the system logs as possible.
`NGDR Error: build_rpc_info: I/O bus node address format error`	The device tree was built incorrectly. Several functions create the device tree for a system board by searching through the `/dev` and `/devices` directories and by using the `libdevinfo` API. After the tree is built, it is passed to the `build_rpc_info()` function, which performs some verification of the tree as it translates the DR daemon device tree into a structure that can be returned from an RPC.	Check the size of the DR daemon by using the `ps`(1) command. Normally, the daemon uses about 300- to 400-Kbytes of memory. If the daemon has grown far beyond the above memory sizes, then an internal error may have occurred within it. You may have to stop and restart the DR daemon to resolve this error. Report this error to your Sun service representative, providing as much information from the system logs as possible.
`NGDR Error: build_rpc_info: psycho number out of range`	The device tree was built incorrectly. Several functions create the device tree for a system board by searching through the `/dev` and `/devices` directories and by using the `libdevinfo` API. After the tree is built, it is passed to the `build_rpc_info()` function, which performs some verification of the tree as it translates the DR daemon device tree into a structure that can be returned from an RPC.	Check the size of the DR daemon by using the `ps`(1) command. Normally, the daemon uses about 300- to 400-Kbytes of memory. If the daemon has grown far beyond the above memory sizes, then an internal error may have occurred within it. You may have to stop and restart the DR daemon to resolve this error. Report this error to your Sun service representative, providing as much information from the system logs as possible.
`NGDR Error: build_rpc_info: sysio number out of range`	The device tree was built incorrectly. Several functions create the device tree for a system board by searching through the `/dev` and `/devices` directories and by using the `libdevinfo` API. After the tree is built, it is passed to the `build_rpc_info()` function, which performs some verification of the tree as it translates the DR daemon device tree into a structure that can be returned from an RPC.	Check the size of the DR daemon by using the `ps`(1) command. Normally, the daemon uses about 300- to 400-Kbytes of memory. If the daemon has grown far beyond the above memory sizes, then an internal error may have occurred within it. You may have to stop and restart the DR daemon to resolve this error. Report this error to your Sun service representative, providing as much information from the system logs as possible.
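
Two messages in the preceding table (the recursive symlink message and `walk_dir: dirlist buffer overflow`) ask you to look for recursive symbolic links under `/dev` and `/devices`. The following Python sketch shows one way such links could be located. It is not the algorithm used by the DR daemon, and the function name `find_recursive_symlinks` is hypothetical. It treats a link as recursive when the link resolves to the directory that contains it or to one of that directory's ancestors.

```python
import os


def find_recursive_symlinks(root):
    """Return symlinks under root that point back at their own directory or an ancestor.

    Illustrative only: the DR daemon's own detection logic is not documented here.
    """
    loops = []
    # os.walk does not follow symlinks by default, so this walk cannot loop.
    for dirpath, dirnames, filenames in os.walk(root):
        real_dir = os.path.realpath(dirpath)
        # Links to directories show up in dirnames, other links in filenames.
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                continue
            target = os.path.realpath(path)
            if not os.path.isdir(target):
                continue
            prefix = target.rstrip(os.sep) + os.sep
            # Following such a link re-enters the directory tree that contains it.
            if real_dir == target or real_dir.startswith(prefix):
                loops.append(path)
    return sorted(loops)


if __name__ == "__main__":
    for top in ("/dev", "/devices"):
        if os.path.isdir(top):
            for link in find_recursive_symlinks(top):
                print("recursive symlink:", link)
```

The script only reports candidates. Review each link before you remove it, because most symbolic links in `/dev` are legitimate.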

Unsafe-Device Query Error Messages

The following table contains the list of unsafe-device query failure error messages that are sent to the system logs and/or to the SSP applications.

Table A-12 Unsafe-Device Query Error Messages


Error Message	Probable Cause	Suggested Action
`unsafe_devices: couldn't determine name of unsafe device major_number`	The mechanism that the DR daemon uses to combine a driver name with a major number failed so that no name could be discovered. If this failure occurs, the DR daemon constructs a string for the device, marking it as "(unknown, `major_number`)".	This message notifies the user that the DR daemon was unable to find the name of one of the devices, but it does not constitute a correctable error. The daemon can use the major number to identify the device.
`WARNING: board board_number not checked for unsafe devices.`	While the DR daemon was examining the system boards for unsafe devices, the daemon encountered a failure that prevented it from examining one of the system boards (`board_number`). This error message may be indicative of a more serious problem.	You may have to stop and restart the DR daemon to recover the domain from this error. Check the size of the DR daemon. It should be between 300- and 400-Kbytes. If it is not within this range, stop the daemon, then restart it. If you cannot recover the domain from this error, you should report this error to your Sun service representative, providing as much information from the system logs as possible.
`NGDR Error: unsafe_devices: libdevinfo failed.`	The DR daemon cannot determine the names of unsafe major devices because it cannot use the `libdevinfo` API. This API must be used to search the device tree for the names of all of the unsafe major devices.	Make sure that the domain contains the correct version of the `libdevinfo` API and that the domain does not contain version mismatches between any of the DR daemon's libraries, the operating environment on the domain, or the daemon itself. If you cannot determine the cause of this error, report it to your Sun service representative, providing as much information from the system logs as possible.
`NGDR Error: create_ctlr_array: count mismatch [internal error]`	Communication protocol was breached over the existence of AP controllers. To the AP librarian, the domain has a certain number of AP controllers. However, to the DR daemon, the domain has a different number of AP controllers.	Determine the correct number of AP controllers in the domain, and correct the error. Also, check the size of the DR daemon. It should be between 300- and 400-Kbytes. If it is not within this range, stop the daemon then restart it.
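
Many suggested actions in this appendix ask you to check whether the size of the DR daemon is between 300 and 400 Kbytes by using the `ps`(1) command. The following Python sketch shows how that check could be scripted. The function names are hypothetical, the process ID of the daemon is assumed to be known already, and the use of the `vsz` column as the measure of size is an assumption, because this guide does not name the `ps` field to read.

```python
import subprocess

# Range quoted in the suggested actions of this appendix, in Kbytes.
LOW_KB, HIGH_KB = 300, 400


def size_in_expected_range(size_kb, low=LOW_KB, high=HIGH_KB):
    """Return True when size_kb falls inside the expected range (inclusive)."""
    return low <= size_kb <= high


def process_size_kb(pid):
    """Return the virtual size of a process as reported by ps.

    Assumption: 'ps -o vsz= -p PID' prints the size in Kbytes with no header.
    """
    result = subprocess.run(
        ["ps", "-o", "vsz=", "-p", str(pid)],
        capture_output=True, text=True, check=True,
    )
    return int(result.stdout.strip())


if __name__ == "__main__":
    import sys
    pid = int(sys.argv[1])  # PID of the DR daemon, supplied by the caller
    size = process_size_kb(pid)
    verdict = "within" if size_in_expected_range(size) else "outside"
    print(f"pid {pid}: {size} Kbytes, {verdict} the {LOW_KB}-{HIGH_KB} Kbyte range")
```

A size outside the range is only a hint that the daemon should be stopped and restarted, as described in the suggested actions above.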

Alternate Pathing (AP) Error Messages

The following table contains the list of Alternate Pathing error messages that are sent to the system logs and/or to the SSP applications.

Table A-13 AP-related Error Messages


Error Message	Probable Cause	Suggested Action
`add_net_ap_info: multiple AP aliases ignored`	An AP device has multiple AP aliases. Only one alias is used. The other aliases were ignored. This is not an error.	If this message persists, remove all but one of the AP aliases.
`AP daemon call failed: error_message OR error = error_number`	An attempt to notify and/or query the AP librarian failed.	A descriptive error message may be available to provide specific details about this failure, or an error number may be available. Also, check the `ap_daemon`(1M) man page for more details about this error.
`AP daemon comm init failed: error_message OR error = error_number`	The DR daemon encountered a failure when it tried to establish a channel of communication with the AP librarian.	A descriptive error message may be available to provide specific details about this failure, or an error number may be available. Also, check the `ap_daemon`(1M) man page for more details about this error.
`AP daemon query failed: error_message OR error = error_number`	The DR daemon could not successfully query the AP librarian on the usage of a specific I/O controller.	A descriptive error message may be available to provide specific details about this failure, or an error number may be available. Also, check the `ap_daemon`(1M) man page for more details about this error.
`AP daemon query failed: length mismatch`	The DR daemon queried the AP librarian about the usage of a specific I/O controller, but the response was incorrect.	A descriptive error message may be available to provide specific details about this failure, or an error number may be available. Also, check the `ap_daemon`(1M) man page for more details about this error.
`Cannot find physical device for AP_alias` (This error message is sent only to the system logs.)	The physical device name that corresponds with the AP alias could not be found. AP may be confused about the device name, or the `/dev` and `/devices` directories are incomplete.	Make sure that AP works properly. Check to see if all of the device entries are present in the `/dev` and `/devices` directories. If they are not present, add them to the appropriate directories.
`create_ap_net_leaf: interface instance not found`	The DR daemon tries to match the AP meta-network interfaces with the physical device they represent. This error indicates that the DR daemon could not successfully match a network interface with the physical device it represents for this board.	Make sure that AP works properly if you observe abnormal behavior regarding the availability of devices during and after DR operations. If this error persists, report it to your Sun service representative with as much information from the system logs as possible.
`dr_ap_notify: unknown state state_number`	The DR daemon called one of its internal functions with a bad value. However, this error is indicative of a more serious problem.	Report this error to your Sun service representative with as much information as possible from the system logs.
`dr_daemon operating in NO AP interaction mode`	The AP software is not working, or it is not installed. This message means that the DR daemon will not notify AP about attach and detach operations.	Ignore this error if you do not have AP installed. If it is installed, make sure that it is properly installed and that the AP software version is compatible with the version of the DR daemon that is running in the domain.
`init_ap_rpc: Unable to get hostname`	The `uname`(2) system call returned a null hostname. Consequently, the DR daemon could not establish a connection to the AP librarian.	None
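
Several rows in Table A-13 end with either a descriptive error message or an error number (`error_message OR error = error_number`). When you scan the system logs, it can help to separate those two forms. The following Python sketch shows one way to do that. It is an illustration only: the function name `parse_ap_failure`, the returned dictionary keys, and the assumption that the error number is a decimal integer are not taken from the DR or AP software.

```python
import re

# Matches the three "AP daemon ... failed:" messages listed in Table A-13.
_AP_FAILURE = re.compile(r"AP daemon (call|comm init|query) failed: (.*)")
# Assumption: the numeric form is written as "error = <decimal integer>".
_ERROR_NUMBER = re.compile(r"error = (-?\d+)$")


def parse_ap_failure(line):
    """Return details of an AP daemon failure message, or None if no match."""
    found = _AP_FAILURE.search(line)
    if found is None:
        return None
    stage = found.group(1)
    detail = found.group(2).strip()
    number = _ERROR_NUMBER.match(detail)
    if number:
        # Only an error number was logged.
        return {"stage": stage, "error_number": int(number.group(1)), "description": None}
    # Otherwise treat the remainder as the descriptive error message.
    return {"stage": stage, "error_number": None, "description": detail}
```

For example, `parse_ap_failure("AP daemon query failed: length mismatch")` reports the `query` stage with the description `length mismatch`, which corresponds to the fourth AP daemon row in the table.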

DCS Error Messages

The following table contains DCS error messages that are sent to the console window, to the /var/adm/messages file, and to the $SSPLOGGER/domain_name/messages file.

Table A-14 DCS Error Messages


Error Message	Probable Cause	Suggested Action
`DCS ERROR: permission denied`	Only the superuser on the domain can run the DCS.	Check the `inetd.conf` file on the domain to ensure that the DCS is started with superuser UID.
`DCS ERROR: internal error: operation: error_description`	An internal error occurred within the DCS.	Use the `error_description`, which corresponds with the `errno_value`, to diagnose the error. The `operation` field refers to the function call that caused the error.
`DCS NOTICE: unrecognized error reported`	The DCS reported an unknown error condition.	Use the log file on the domain to help determine what caused the error.
`DCS ERROR: network initialization failed`	The DCS failed to initialize the network connection used to accept DR requests from the DCA.	Retry the DR operation.
`DCS ERROR: failed to acquire reserved port`	The DCS uses port 665, which is reserved through sun-dr. The error occurred because another process is using the port.	Determine if another process is still using the port. If so, kill the process, if possible, then retry the DR operation.
`DCS ERROR: connection attempt failed`	The DCS failed to establish a connection with the DCA.	Retry the DR operation.
`DCS ERROR: unable to receive message`	The DCS failed to receive a message from the DCA.	Retry the DR operation.
`DCS ERROR: unable to send message for operation_name operation`	The DCS failed to send a message to the DCA.	Retry the DR operation.
`DCS NOTICE: sun-dr service not found, using reserved port 665`	The DCS failed to find the sun-dr service in `/etc/services`.	None
`DCS NOTICE: client disconnected`	The client unexpectedly disconnected.	None
`DCS ERROR: unknown operation requested`	The DCA requested an operation that is not recognized by the DCS.	Retry the DR operation.
`DCS ERROR: operation failed`	The current DCS operation failed to complete. The DR operation itself might have succeeded if the DCS failed only to send the results to the DCA.	Check the status of the operation manually. If the DR operation did not succeed, retry the operation.
`DCS ERROR: invalid session establishment sequence`	The session establishment sequence, the initialization handshake, between the DCA and the DCS failed.	Retry the DR operation.
`DCS ERROR: operation_name operation issued before session established`	A DR operation was requested before the session was established.	Retry the DR operation.
`DCS ERROR: received an invalid message`	The DCS received unexpected information in the message.	Retry the DR operation.
`DCS NOTICE: confirm callback failed, aborting operation`	The DCS was unable to display the confirmation prompt to the user.	None
`DCS NOTICE: message callback failed, continuing`	The DCS was unable to display a message to the user.	None
`DCS NOTICE: retry value invalid (retry_value)`	The value given for the `retry_value` was invalid, so the operation proceeded with the retry value set to zero.	None
`DCS NOTICE: timeout value invalid (timeout_value)`	The value given for the `timeout_value` was invalid, so the operation proceeded with the timeout value set to zero.	None
`DCS INFO: retrying operation, attempt attempt_number`	The DCS is retrying the operation. The `attempt_number` field represents the current attempt.	None
`DCS ERROR: failed to start a new session handler`	The DCS failed to start a concurrent session handler to process the incoming DR request.	Retry the DR operation.
`DCS ERROR: abort attempt of session, session_id, unsuccessful`	The DCS failed to abort session, `session_id`.	Retry the abort request.
`DCS ERROR: unsupported message protocol version: version_number`	The DCS does not support the reported protocol version, `version_number`.	Check the DR software on the domain and on the SSP. If the versions are not compatible, reinstall the proper version of the software on the domain.
`DCS INFO: session aborted`	The current DR operation was aborted by the user.	None
`DCS ERROR: illegal option option, exiting`	The DCS was passed the illegal option `option`.	Check the `inetd.conf` file on the domain and remove the illegal option from the entries for the DCS.
`DCS NOTICE: illegal argument to option flag (argument), action`	The option `option` was given the illegal argument `argument`. The DCS will perform the action specified by `action`.	Check the `inetd.conf` file on the domain and fix the entries for the DCS.
`DCS ERROR: resource info init error (error_code)`	The DCS failed to initialize the module responsible for providing resource usage information.	Retry the operation.
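
The port-related rows in Table A-14 imply a simple lookup order: the sun-dr service is looked up in `/etc/services`, and reserved port 665 is used when no entry is found. The following Python sketch illustrates that order so you can check what a domain would resolve. It is not DCS source code. The function name `resolve_sun_dr_port`, the injectable `lookup` parameter, and the choice of the `tcp` protocol are assumptions made for this example.

```python
import socket

DEFAULT_SUN_DR_PORT = 665  # reserved port named in Table A-14


def resolve_sun_dr_port(lookup=socket.getservbyname):
    """Return (port, notice) for the sun-dr service.

    `lookup` is injectable so that the fallback path can be exercised
    without editing the services database. Assumption: sun-dr is
    registered for the tcp protocol.
    """
    try:
        return lookup("sun-dr", "tcp"), None
    except OSError:
        # No service entry was found. This mirrors the NOTICE text in
        # the table and is not treated as an error.
        return DEFAULT_SUN_DR_PORT, "sun-dr service not found, using reserved port 665"
```

If the returned notice is not `None`, the domain has no sun-dr entry and the reserved port is used, which matches the `DCS NOTICE` row that requires no action.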

DR Driver Error Messages

The following table contains DR driver error messages that are sent to the console window, to the /var/adm/messages file, and to the $SSPLOGGER/domain_name/messages file.

Table A-15 DR Driver Error Messages


Error Message	Probable Cause	Suggested Action
`dr: Internal error: dr.c line_number`	An internal error has occurred in the DR driver.	Retry the operation that failed. If the error persists, exit and restart various DR software components, then retry the operation. If the problem still persists, reboot the domain. Check the console or the system logs for any additional information.
`dr: Insufficient memory: resource`	The DR framework was unable to configure or unconfigure resources because a KPHYSM_ERESOURCE error or `cpu_configure()`/`cpu_unconfigure()` error with the ENOMEM `errno` occurred.	This condition might be transient. Retry the DR operation. If the error persists and if the operation that is failing is the unconfigure operation, then try configuring more memory into the domain from a different domain. If the error still persists, reboot the domain.
`dr: Device busy: resource`	Translation of possible EBUSY errno from `cpu_configure()` or `cpu_unconfigure()`; or an I/O device cannot be detached because it is busy. This error message is also returned if a CPU to be detached is online when `dr_pre_detach_cpu` is called. A CPU cannot be detached while a delete memory operation is in progress.	Use `showdevices`(1M) on the system controller to find out why the resource is busy. Or, on the domain, use `fuser`(1M), `psrinfo`(1M), `prtdiag`(1M), or similar tools to find out why the device is busy. Also check whether another memory deletion is already in progress. Depending on the cause of the error, either reconfigure or shut down whatever is consuming the resource, or wait for the previous memory deletion to complete. Then retry the DR operation.
`dr: Operation already in progress: resource`	Translation of possible EALREADY errno from `cpu_configure()` or `cpu_unconfigure()`.	Use `showdevices`(1M) on the system controller to examine the configuration of the specified resource. Or, on the domain, use `cfgadm`(1M), `pbind`(1M), `psrinfo`(1M), and similar commands to examine the configuration of the resource. Determine what operations are already in progress on this resource, and either wait for them to complete or cancel them. Then, retry the DR operation. The operation already in progress might already have terminated, so retrying the operation might succeed or might produce another error.
`dr: I/O error: resource`	An unexpected error code resulted from a call to `kphysm_del_start`. A more verbose `cmn_err` message is also printed.	Check the verbose error message from `cmn_err` in the system logs and/or on the console for a more specific condition and suggested action.
`dr: Bad address: resource`	`kphysm_add_memory_dynamic` returned `KPHYSM_EFAULT`.	Retry the DR operation. If this error persists, contact your Sun service representative.
`dr: No device(s) on board: board_path`	The board is connected or disconnected with no devices (I/O, memory, or CPU).	If devices were expected to be on the board, then disconnect it. The board should be removed from the server, and its components should be reseated by a qualified technician.
`dr: Invalid argument: attachment_point`	DR was passed an invalid argument.	Retry the DR operation. If this error persists, contact your Sun service representative.
`dr: Invalid state transition: attachment_point`	A DR operation was sequenced out of order. This could be operator error if the `cfgadm`(1M) functions were issued out of order. Or, the DR driver could be confused due to some internal error conditions.	Retry the DR operation. If this error persists, stop and start (or unload and load) DR software components to recover from this error condition. If the error continues to persist, reboot the domain.
`dr: Device in fatal state`	The device could not be suspended, or it refused to be suspended.	Retry the DR operation. If this error persists, the device could be suspend-unsafe. Check the list of suspend-unsafe devices. If the device is unsafe, use `showdevices`(1M) or `fuser`(1M) to determine how the device is in use, and manually reconfigure the consumers of the resource. Then, manually unload the driver, or if needed, unplug the cables attached to the device. It should now be safe to retry the operation. Do not plug the cables back into the device, reload its driver, or reconfigure its consumers before the DR operation has succeeded.
`dr: Device failed to resume: path`	A previously suspended device could not be resumed.
`dr: Cannot stop user thread`	DR could not stop a user thread(s) in preparing a device to be suspended.	Retry the DR operation. If this error persists, examine the user threads that failed to suspend, and determine why they could not be suspended. You might have to kill the threads to enable the DR operation to proceed.
`dr: Cannot quiesce realtime thread`	A realtime thread was encountered in an attempt to suspend the operating system. Suspending, or quiescence, of realtime threads is not allowed. All realtime threads must be stopped or changed to non-realtime before a suspend can succeed.	Kill the realtime thread(s), or adjust their priority by using the `priocntl`(1M) command. (You must obtain the PID to adjust the priority of realtime threads.)
`dr: Cannot stop kernel thread: name`	DR could not stop a kernel thread.	Retry the DR operation. If this error persists, examine the kernel threads that failed to stop, and determine why they could not be stopped. Kill the kernel threads, if possible, to enable the DR operation to proceed.
`dr: Failed to off-line: cpu`	A CPU could not be off-lined, preventing it from being unconfigured. The CPU might have a thread(s) bound to it. An additional `cmn_err` message is logged if there are threads bound to the CPU. DR must be able to off-line CPUs and/or to power off CPUs before the board can be disconnected.	Check the console and system log messages to determine if threads are bound to the CPU. If they are, they can be manually unbound or rebound to CPUs on other boards in the domain. If threads are not bound to the CPU, use `psrset`(1M), `pbind`(1M), and `psrinfo`(1M) to determine what changes are required to enable DR to off-line the CPU. For example, you might have to add more CPUs to the domain from different boards. Or, you might have to online other CPUs. Finally, you might have to add more CPU boards to take over the CPU workload.
`dr: Failed to on-line: cpu`	DR could not online a CPU on a newly-connected or previously unconfigured board.
`dr: Failed to start CPU: cpu`	DR could not start a CPU on a newly-connected or previously unconfigured board.
`dr: Failed to stop CPU: cpu`	DR could not power off a CPU on a board to be unconfigured. All of the CPUs on a board to be unconfigured must be taken offline and powered off before the operation can succeed.
`dr: Kernel cage is disabled: resource`	When the kernel cage is disabled, boards hosting permanent memory cannot be detached.	You must enable the kernel cage in `/etc/system` and reboot the domain.
`dr: No available memory target: resource`	DR could not detach the board because it hosts permanent memory and there is no available target for the memory. Permanent memory must be moved to another memory component within the domain before the DR operation can succeed.	Configure an additional memory component that contains an adequate amount of memory to act as a target for this board. Then, retry the DR operation.
`dr: VM viability test failed: resource`	Translation of error code returned by `kphysm_del_start`.	Configure additional memory components into the domain to relieve memory resource pressure. Then, retry the DR operation.
`dr: kphysm_pre_del failed: resource`	Translation of error code returned by `kphysm_del_start`.	Configure additional memory components into the domain to relieve memory resource pressure. Then, retry the DR operation.
`dr: Non-relocatable pages in span: resource`
`dr: kphysm_del_cancel: resource`
`dr: Memory operation failed: resource`	DR failed to attach the memory on a newly attached board.
`dr: Can't unconfig cpu if mem online`	DR cannot unconfigure a CPU if the memory on the board is online.	You must off-line the memory before you can unconfigure the board.
`ngdrmach: Cannot read property value: Device Node node_address: property property_name`	DR could not get the specified property of a particular device node.
`ngdrmach: Cannot determine property length: board::slot:property`	DR could not get the length of the specified property for a particular device node.
`ngdrmach: No CPU specified for connect: slot`
`ngdrmach: Cannot move SIGB assignment`
`ngdrmach: Cannot disconnect CPU; SIGB is currently assigned: slot::board`
`ngdrmach: Device driver failure: path`
`ngdr: Must specify a CPU on the given board: cpu_id`
`ngdrmach: No such device: board::slot`
`ngdrmach: Memory configured with inter-board interleaving: board::slot`
`ngdrmach: Invalid board number: board_number`	An invalid board number was specified for the assign board operation.	Use a different board number, or fix the available components list on the system controller for the domain to include the board for which the assign function is failing.
`ngdrmach:: Cannot proceed; Board is configured or busy: component_name`	DR cannot power off or unassign a board that is still configured or busy.	Unconfigure the board, or wait for any previous DR operations on the board to complete. Then, retry the DR operation.
`ngdrmach: Firmware probe failed: attachment_point`	OBP failed to probe the board.
`ngdrmach: Firmware deprobe failed: attachment_point`	OBP failed to deprobe the board.
`ngdrmach: Operation not supported`	The operation you attempted is not supported.	None
`ngdrmach: Unrecognized platform command: command/options`	An unrecognized command was passed to DR.	Refer to the `cfgadm_sbd`(1M) man page to ensure that you use a valid argument. If you used a valid argument and this error persists, contact your Sun service representative.
`ngdrmach: drmach parameter is not a valid ID`	An invalid `drmachid_t` value was encountered.
`ngdrmach: drmach parameter is inappropriate for operation`	The wrong type of `drmachid_t` was passed to a function.
`ngdrmach: Unexpected internal condition: drmach.c line_number`	An internal drmach error occurred.	Use `modunload`(1M) and `modload`(1M) to unload and then reload the drmach driver. Then, retry the DR operation. If this error persists, you must reboot the domain.
`ngdrmach: No CPU specified for connect.`
`ngdrmach: Firmware move_cpu0 failed: CPU cpu_id`
`ngdrmach: Cannot move SIGB assignment`
`ngdrmach: Cannot disconnect CPU; SIGB is currently assigned`
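
Table A-15 describes three messages as translations of `errno` values returned by `cpu_configure()` or `cpu_unconfigure()`: ENOMEM, EBUSY, and EALREADY. The following Python sketch restates that mapping as a lookup table, which can be handy when you correlate an `errno` value from another log with the DR driver message you expect to see. It illustrates the correspondence documented in the table and is not the driver's implementation. The names `DR_ERRNO_TEXT` and `dr_message` are hypothetical.

```python
import errno

# errno value -> message text, as documented in Table A-15.
DR_ERRNO_TEXT = {
    errno.ENOMEM: "Insufficient memory",
    errno.EBUSY: "Device busy",
    errno.EALREADY: "Operation already in progress",
}


def dr_message(err, resource):
    """Return the expected 'dr:' message for an errno value, or None.

    None means the table does not document a translation for `err`.
    """
    text = DR_ERRNO_TEXT.get(err)
    if text is None:
        return None
    return "dr: {}: {}".format(text, resource)
```

For example, `dr_message(errno.EBUSY, "resource")` produces `dr: Device busy: resource`, which is the form shown in the table.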

Plugin Error Messages

The following error messages are generated by the libcfgadm system board plugin. They are sent to the netcon(1M) console window, to the /var/adm/messages file, and to the $SSPLOGGER/domain_name/messages file.

Table A-16 Plugin Error Messages


Error Message	Probable Cause	Suggested Action
`Configuration operation cancelled: command ap_id`	You did not confirm a configuration operation that requires confirmation.	See the `cfgadm`(1M) and/or the `cfgadm_sbd`(1M) man page for more information about which configuration operations require confirmation.
`Hardware specific failure: command ap_id: error: resource`	A system error occurred during the execution of the command. The error message, `error`, can be a standard error (that is, an `errno`), or it can be a more specific error message that is returned by the DR driver (see the DR driver error messages for more information about DR driver errors). The name of the resource, `resource`, that is causing the error (for example, a busy device) can also be returned by the DR driver.	For busy devices, identify and remove the usage of the device. For other errors, refer to the driver's documentation for possible recovery actions.
`Library Error: command invalid: command`	The specified command is invalid for system boards.	Refer to the `cfgadm_sbd`(1M) man page for a list of valid commands.
`Library Error: command not supported: command ap_id`	The specified command is not supported for the specified attachment point. For example, the connect operation is not allowed for CPUs.	Refer to the `cfgadm_sbd`(1M) man page for a list of supported commands.
`Library Error: command aborted: command`	You aborted the command.	None
`Library Error: option invalid: option`	The specified option, `option`, is invalid.	Refer to the `cfgadm_sbd`(1M) man page for a list of the valid options.
`Library Error: option requires value: option`	The specified option, `option`, requires a value.	Refer to the `cfgadm_sbd`(1M) man page for a list of the option values.
`Library Error: option requires no value: option`	The specified option, `option`, does not require a value.	Refer to the `cfgadm_sbd`(1M) man page for a list of options that do not require values.
`Library Error: option value invalid: option value`	The specified value, `value`, for the option, `option`, is invalid.	Refer to the `cfgadm_sbd`(1M) man page for a list of valid option values.
`Library Error: attachment point invalid: ap_id`	The specified attachment point, `ap_id`, could not be parsed correctly. This error is rare and could indicate an internal error.	Refer to the `cfgadm_sbd`(1M) man page for a list of valid attachment points. If this error persists, contact your Sun service representative.
`Library Error: component invalid: ap_id`	The specified component, `ap_id`, is invalid.	Refer to the `cfgadm_sbd`(1M) man page for a list of valid dynamic attachment points.
`Library Error: sequence invalid: command (rstate ostate) ap_id`	The specified command, `command`, is invalid for the receptacle and/or occupant state of the specified attachment point. For example, trying to connect an empty slot results in an invalid sequence error.	Refer to the `cfgadm_sbd`(1M) man page for a list of valid operations.
`Library Error: offline ap_id (path): error`	The Reconfiguration Coordination Manager (RCM) failed to take the resource, `ap_id`, offline. The error message, `error`, returned by the RCM will indicate the reason for the failure. Usually, the reason is a busy device.	For busy devices, identify and remove the usage of the device.
`Library Error: suspend ap_id (path): error`	The Reconfiguration Coordination Manager (RCM) failed to suspend the resource, `ap_id`. The error message, `error`, returned by the RCM will indicate the reason for the failure. Usually, the reason is a busy device.	For busy devices, identify and remove the usage of the device.
`Library Error: not enough memory`	The plugin operation failed due to a lack of memory.	Check the memory usage.
`Library Error: change signal disposition failed`	The plugin failed to set up the signals before it started the DR operation.	None
`Library Error: cannot get RCM handle`	The Reconfiguration Coordination Manager (RCM) failed to initialize.	None
`Library Error: cannot open library: error`	The Reconfiguration Coordination Manager (RCM) library, `library`, was found, but an error occurred when it was opened. The error message, `error`, is returned by `dlopen`(3DL).	Check for proper installation of the RCM.
`Library Error: cannot find symbol symbol in library`	A required symbol, `symbol`, was not found in the Reconfiguration Coordination Manager (RCM) library, `library`.	Check for proper installation of the RCM.
`Library Error: cannot stat library: error`	The Reconfiguration Coordination Manager (RCM) library, `library`, exists, but the `stat`(2) function failed to get the file status. The error message, `error`, will be returned by the Solaris operating environment.	None
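
The messages in Tables A-14 through A-16 begin with a small set of fixed prefixes, so a log line can usually be matched to the right table before you search for the full text. The following Python sketch shows one way to do that. It is a convenience illustration under stated assumptions: the function name `find_table` and the prefix list are derived only from the message texts in this appendix, and messages without a distinctive prefix (for example, most AP messages) are reported as `None`.

```python
import re

# Prefix -> table, derived from the message texts in Tables A-14 to A-16.
_TABLE_FOR_PREFIX = {
    "DCS ERROR": "Table A-14 DCS Error Messages",
    "DCS NOTICE": "Table A-14 DCS Error Messages",
    "DCS INFO": "Table A-14 DCS Error Messages",
    "dr": "Table A-15 DR Driver Error Messages",
    "ngdr": "Table A-15 DR Driver Error Messages",
    "ngdrmach": "Table A-15 DR Driver Error Messages",
    "Library Error": "Table A-16 Plugin Error Messages",
    "Hardware specific failure": "Table A-16 Plugin Error Messages",
    "Configuration operation cancelled": "Table A-16 Plugin Error Messages",
}

# Longer alternatives come first so that "ngdrmach" is not read as "ngdr".
# The lookbehind keeps "dr:" from matching inside words such as "sun-dr:".
_PREFIX = re.compile(
    r"(?<![\w-])(DCS (?:ERROR|NOTICE|INFO)|ngdrmach|ngdr|dr|Library Error"
    r"|Hardware specific failure|Configuration operation cancelled):"
)


def find_table(log_line):
    """Return the table that documents the message in `log_line`, or None."""
    found = _PREFIX.search(log_line)
    if found is None:
        return None
    return _TABLE_FOR_PREFIX[found.group(1)]
```

Because the pattern is searched anywhere in the line, it tolerates the timestamp and host fields that precede the message in the system logs.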