Sun Enterprise 10000 Dynamic Reconfiguration User Guide

Appendix A DR Error Messages

This appendix lists some of the error messages that you might see while you perform DR operations. The list does not include Platform Independent Module (PIM) layer errors, which are more generic than the error messages in the following tables.

All DR error messages are sent to one or both of the following locations:

Searching This Appendix

Before you use this appendix, take time to read the following list of search tips so that you can find a specific message.

Error-Type Links

The following are different types of errors:

SSP Errors

Use one of the following links to start your search of SSP-related error messages:

"Protocol and Communication Error Messages"

"Attach-Related Error Messages"

"Detach-Related Error Messages"

"Auto-Configuration Error Messages"

Domain Errors

Use one of the following links to start your search of domain-related error messages:

"DR Daemon Start-Up Error Messages"

"Memory Allocation Error Messages"

"DR Driver Error Messages"

"Platform Specific Module (PSM) Error Messages"

"DR General Domain Error Messages"

"DR Domain Exploration Error Messages"

"OpenBoot PROM Error Messages"

"Unsafe-Device Query Error Messages"

"Alternate Pathing (AP) Error Messages"

"DCS Error Messages"

"DR Driver Error Messages"

"Plugin Error Messages"

SSP Error Messages

The following sections contain SSP-related error messages:

Protocol and Communication Error Messages

The following table contains the protocol and communication error messages that are sent to the system logs and/or the SSP applications.

Table A-1 Protocol and Communication Failure Error Messages

Error Message 

Probable Cause 

Suggested Action 

NGDR Error: abort_attach_board: invalid board number

The RPC is attempting to perform a DR operation on a board number that is not in the range of valid numbers. The DR applications carefully filter the user input to catch out-of-range board numbers before they send the RPC. Therefore, this error indicates a breakdown on the SSP or in the network connection to the SSP. Or, it indicates an incompatibility between the SSP applications and the DR daemon. 

Check the SSP network connection and/or the SSP and DR applications to ensure that they are operating properly. 

NGDR Error: abort_detach_board: invalid board number

The RPC is attempting to perform a DR operation on a board number that is not in the range of valid numbers. The DR applications carefully filter the user input to catch out-of-range board numbers before they send the RPC. Therefore, this error indicates a breakdown on the SSP or in the network connection to the SSP. Or, it indicates an incompatibility between the SSP applications and the DR daemon. 

Check the SSP network connection and/or the SSP and DR applications to ensure that they are operating properly. 

NGDR Error: attach_finished: invalid board number

The RPC is attempting to perform a DR operation on a board number that is not in the range of valid numbers. The DR applications carefully filter the user input to catch out-of-range board numbers before they send the RPC. Therefore, this error indicates a breakdown on the SSP or in the network connection to the SSP. Or, it indicates an incompatibility between the SSP applications and the DR daemon. 

Check the SSP network connection and/or the SSP and DR applications to ensure that they are operating properly. 

NGDR Error: complete_attach_board: invalid board number

The RPC is attempting to perform a DR operation on a board number that is not in the range of valid numbers. The DR applications carefully filter the user input to catch out-of-range board numbers before they send the RPC. Therefore, this error indicates a breakdown on the SSP or in the network connection to the SSP. Or, it indicates an incompatibility between the SSP applications and the DR daemon. 

Check the SSP network connection and/or the SSP and DR applications to ensure that they are operating properly. 

NGDR Error: cpu0_move_finished: invalid board number

The RPC is attempting to perform a DR operation on a board number that is not in the range of valid numbers. The DR applications carefully filter the user input to catch out-of-range board numbers before they send the RPC. Therefore, this error indicates a breakdown on the SSP or in the network connection to the SSP. Or, it indicates an incompatibility between the SSP applications and the DR daemon. 

Check the SSP network connection and/or the SSP and DR applications to ensure that they are operating properly. 

NGDR Error: detach_board: invalid board number

The RPC is attempting to perform a DR operation on a board number that is not in the range of valid numbers. The DR applications carefully filter the user input to catch out-of-range board numbers before they send the RPC. Therefore, this error indicates a breakdown on the SSP or in the network connection to the SSP. Or, it indicates an incompatibility between the SSP applications and the DR daemon. 

Check the SSP network connection and/or the SSP and DR applications to ensure that they are operating properly. 

NGDR Error: detach_finished: invalid board number

The RPC is attempting to perform a DR operation on a board number that is not in the range of valid numbers. The DR applications carefully filter the user input to catch out-of-range board numbers before they send the RPC. Therefore, this error indicates a breakdown on the SSP or in the network connection to the SSP. Or, it indicates an incompatibility between the SSP applications and the DR daemon. 

Check the SSP network connection and/or the SSP and DR applications to ensure that they are operating properly. 

NGDR Error: detachable_board: invalid board number

The RPC is attempting to perform a DR operation on a board number that is not in the range of valid numbers. The DR applications carefully filter the user input to catch out-of-range board numbers before they send the RPC. Therefore, this error indicates a breakdown on the SSP or in the network connection to the SSP. Or, it indicates an incompatibility between the SSP applications and the DR daemon. 

Check the SSP network connection and/or the SSP and DR applications to ensure that they are operating properly. 

NGDR Error: drain_board_resources: invalid board number

The RPC is attempting to perform a DR operation on a board number that is not in the range of valid numbers. The DR applications carefully filter the user input to catch out-of-range board numbers before they send the RPC. Therefore, this error indicates a breakdown on the SSP or in the network connection to the SSP. Or, it indicates an incompatibility between the SSP applications and the DR daemon. 

Check the SSP network connection and/or the SSP and DR applications to ensure that they are operating properly. 

NGDR Error: get_board_config: invalid board number

The RPC is attempting to perform a DR operation on a board number that is not in the range of valid numbers. The DR applications carefully filter the user input to catch out-of-range board numbers before they send the RPC. Therefore, this error indicates a breakdown on the SSP or in the network connection to the SSP. Or, it indicates an incompatibility between the SSP applications and the DR daemon.

Check the SSP network connection and/or the SSP and DR applications to ensure that they are operating properly. 

NGDR Error: get_board_state: invalid board number

The RPC is attempting to perform a DR operation on a board number that is not in the range of valid numbers. The DR applications carefully filter the user input to catch out-of-range board numbers before they send the RPC. Therefore, this error indicates a breakdown on the SSP or in the network connection to the SSP. Or, it indicates an incompatibility between the SSP applications and the DR daemon. 

Check the SSP network connection and/or the SSP and DR applications to ensure that they are operating properly. 

NGDR Error: get_cpu_info: invalid board number

The RPC is attempting to perform a DR operation on a board number that is not in the range of valid numbers. The DR applications carefully filter the user input to catch out-of-range board numbers before they send the RPC. Therefore, this error indicates a breakdown on the SSP or in the network connection to the SSP. Or, it indicates an incompatibility between the SSP applications and the DR daemon. 

Check the SSP network connection and/or the SSP and DR applications to ensure that they are operating properly. 

NGDR Error: get_obp_board_config: invalid board number

The RPC is attempting to perform a DR operation on a board number that is not in the range of valid numbers. The DR applications carefully filter the user input to catch out-of-range board numbers before they send the RPC. Therefore, this error indicates a breakdown on the SSP or in the network connection to the SSP. Or, it indicates an incompatibility between the SSP applications and the DR daemon. 

Check the SSP network connection and/or the SSP and DR applications to ensure that they are operating properly. 

NGDR Error: initiate_attach_board: invalid board number

The RPC is attempting to perform a DR operation on a board number that is not in the range of valid numbers. The DR applications carefully filter the user input to catch out-of-range board numbers before they send the RPC. Therefore, this error indicates a breakdown on the SSP or in the network connection to the SSP. Or, it indicates an incompatibility between the SSP applications and the DR daemon. 

Check the SSP network connection and/or the SSP and DR applications to ensure that they are operating properly. 

NGDR Error: initiate_attach_board: invalid cpu number

The RPC is attempting to initiate an attach of a board and specifies a CPU that is not on the board. The DR applications carefully filter the user input to catch invalid CPU numbers before they send the RPC. Therefore, this error indicates a breakdown on the SSP or in the network connection to the SSP. Or, it indicates an incompatibility between the SSP applications and the DR daemon.

Check the SSP network connection and/or the SSP and DR applications to ensure that they are operating properly. 

NGDR Error: Unauthorized RPC call . . . Not owner

The DR daemon received an RPC that failed authentication. 

Check the system log for more information about this error. Also, make sure that the version numbers match for the SSP and the DR daemon and that the SSP user and network services are properly configured. 
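Most of the messages in this appendix share the form NGDR Error: function: reason, so a log line can be split into its parts before you search the tables. The following Python sketch is illustrative only and is not part of the DR software. The names DrMessage, parse_ngdr_error, and suggested_table are hypothetical, and the mapping covers only the "invalid board number", "invalid cpu number", and "invalid board state" reasons listed in Tables A-1 through A-3.

```python
import re
from typing import NamedTuple, Optional


class DrMessage(NamedTuple):
    function: str  # function named in the message, for example detach_board
    reason: str    # text that follows the function name


# Matches lines of the form "NGDR Error: <function>: <reason>".
_PATTERN = re.compile(r"^NGDR Error:\s*(?P<function>\w+):\s*(?P<reason>.+?)\s*$")


def parse_ngdr_error(line: str) -> Optional[DrMessage]:
    """Return the function and reason from an NGDR error line, or None."""
    match = _PATTERN.match(line.strip())
    if match is None:
        return None
    return DrMessage(match.group("function"), match.group("reason"))


def suggested_table(message: DrMessage) -> str:
    """Name the table in this appendix that covers the message (assumed mapping)."""
    if message.reason in ("invalid board number", "invalid cpu number"):
        return "Table A-1"
    if message.reason == "invalid board state":
        # Attach functions appear in Table A-2; all others appear in Table A-3.
        return "Table A-2" if "attach" in message.function else "Table A-3"
    return "unknown"
```

Messages that do not name a function, such as the Unauthorized RPC call message, do not match the pattern and return None, so they must be looked up by their full text.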

Attach-Related Error Messages

The following table contains attach-related failure error messages that are sent to the system logs and/or the SSP applications.

Table A-2 Attach-Related Failure Error Messages

Error Message 

Probable Cause 

Suggested Action 

NGDR Error: abort_attach_board: invalid board state

The attach operation could not be aborted because the board is not in the init_attach state, waiting to be configured into the domain.

Wait for the board to enter the init_attach state. Only then can the attach operation be aborted. 

NGDR Error: attach_finished: invalid board state

Communication protocol has been breached over the state of the attach operation. The DR driver and daemon disagree with the SSP that the board was waiting for the confirmation of the attach operation from the SSP. 

Exit and restart the current DR application, then retry the operation. If this error persists, stop and restart the DR daemon. You may need to reboot the domain to recover from this error. 

NGDR Error: Cannot abort attach. Board ineligible for further DR operations.

The board entered the FATAL state after the abort command was issued, causing the abort operation to fail and the board to be lost from the system. 

Reboot the domain. 

dr_attach: failure executing A3000 hot_add script . . . error message

The Sun(TM) StorEdge(TM) A3000 hot_add script is executed directly after a DR attach operation. If the script exists, but it cannot be executed, the error message explains why.

If you neither use nor plan to use A3000 devices, you can rename the script so that it will not be found.

initiate_attach_board: already init_attached

You attempted to initiate the attach of a board that was already initiated. 

Go to the complete attach window and continue the attach process. 

NGDR Error: complete_attach_board: invalid board state

You tried to complete an attach operation on a board that is not eligible--the board is not in the init_attach state awaiting attachment to the domain.

Wait for the board to enter the init_attach state. Only then can the attach operation be completed.

NGDR Error: initiate_attach_board: invalid board state

You tried to initiate an attach operation on a board that is not eligible--the board is not in the PRESENT state awaiting attachment to the domain.

Wait for the board to enter the PRESENT state. Only then can the attach operation be initiated.

NGDR Error: Some devices not attached. Examine the host syslog for details . . . errno_description

Some of the devices were not configured into the domain. 

Look at the system logs for more details about what devices were not configured into the domain and why they were not configured. Some devices on the board may not be supported by the operating environment or by the DR feature. You should blacklist unsupported devices. 
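The state requirements in Table A-2 can be summarized as a small lookup: an attach can be initiated only from the PRESENT state, and it can be completed or aborted only from the init_attach state. The following Python sketch restates those rules for illustration. It is not the DR daemon's implementation, and the names REQUIRED_STATE and check_attach_request are hypothetical.

```python
# Board state that each attach function requires, as described in Table A-2.
REQUIRED_STATE = {
    "initiate_attach_board": "PRESENT",
    "complete_attach_board": "init_attach",
    "abort_attach_board": "init_attach",
}


def check_attach_request(function: str, board_state: str) -> str:
    """Return "ok" or the message that Table A-2 lists for the request."""
    required = REQUIRED_STATE.get(function)
    if required is None:
        raise ValueError(f"unknown attach function: {function}")
    # Initiating an attach twice has its own message in the table.
    if function == "initiate_attach_board" and board_state == "init_attach":
        return "initiate_attach_board: already init_attached"
    if board_state != required:
        return f"NGDR Error: {function}: invalid board state"
    return "ok"
```

For example, check_attach_request("abort_attach_board", "PRESENT") returns the invalid board state message, because an attach that has not been initiated cannot be aborted.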

Detach-Related Error Messages

The following table contains detach-related error messages that are sent to the system logs and/or to the SSP applications.

Table A-3 Detach-Related Failure Error Messages

Error Message 

Probable Cause 

Suggested Action 

NGDR Error: Cannot detach board board_number. It has interface_name interfaces configured.

The board is not eligible to be detached because it has one or more network interfaces attached to it that are critical to the operation of the domain. The network interfaces can be any mix of primary, SSP, AP, or PBF interfaces. 

Use the ifconfig(1M) command to determine the role of the interface(s). If the configured interface is the primary network or the SSP, manually switch the interface to the alternate interface if one exists. For an interface other than the primary and the SSP, unplumbing it may enable the detach operation to succeed. Otherwise, the domain must be shut down, and the interfaces must be moved to another board.

NGDR Error: cpu0_move_finished: invalid board state

Communication protocol has been breached over the eligibility of a CPU. To the SSP, the CPU has been moved off of the board. To the DR driver, the move operation is an invalid operation for that board. 

None 

ifconfig down failed.

The ifconfig(1M) command failed to bring down the network interfaces. The ifconfig(1M) command unplumbs and brings down the network interfaces before the board is detached. One of the network interfaces on the board could be busy, so manual intervention may be needed.

Log in to the domain, and, if possible, bring down the network interfaces on the board manually by using the ifconfig(1M) command with the down option. The manual execution of the command may yield more detailed information about the failure.

ifconfig unplumb failed.

The ifconfig(1M) command failed to unplumb the network interfaces. The ifconfig(1M) command unplumbs and brings down the network interfaces before the board is detached. One of the network interfaces on the board could be busy, so manual intervention may be needed.

Log in to the domain, and, if possible, unplumb the network interfaces manually by using the ifconfig(1M) command with the unplumb option. The manual execution of the command may yield more detailed information about the failure.

Warning: Error return from /opt/SUNWconn/bin/nf_snmd_kill (return_value)

The kill(1) command failed. Certain daemons keep network interfaces open continuously. Those daemons must be stopped before the devices they control can be detached.

Analyze the return_value to determine why the kill(1) command failed, and try to correct the problem. If necessary, use the ps(1) command to obtain the PID number for the daemons, and use the kill(1) command to stop the daemons manually.

Warning: Error return from /opt/SUNWconn/bin/pf_snmd_kill (return_value)

The kill(1) command failed. The daemons that are used to control certain network devices must be stopped before the devices can be detached because the daemons keep the interfaces open continually.

Analyze the return_value to determine why the kill(1) command failed, and try to correct the problem. If necessary, use the ps(1) command to obtain the PID number for the daemons, and use the kill(1) command to stop the daemons manually.

NGDR Error: abort_detach: board already drained

The CANCEL ioctl() failed while the DR daemon was trying to abort the detach operation. The failure caused the board to be reported as being in the UNREFERENCED state, indicating that the memory has already been drained.

The board must be completely detached before you can recover from this error. Retry the DR operation after the board has been successfully detached. 

NGDR Error: abort_detach_board: invalid board state

Communication protocol has been breached over the eligibility of a board. To the SSP, the board is part of the domain and has been, or is being, drained of its resources. The SSP, therefore, issues the abort command to stop the detach operation. However, to the DR driver and daemon, the board is not part of the domain. 

Exit and restart the DR application. 

NGDR Error: board configuration query failed.

The DR daemon failed to ascertain the eligibility of the configuration of the board. 

Stop and start the DR daemon and/or the DR driver. If this error persists, use the modinfo(1M), modload(1M), and modunload(1M) commands to work with the driver after you have stopped the DR daemon. Also, check the size of the DR daemon with the ps(1) command. If it is not between 300 and 400 Kbytes, report this error, providing as much information from the system logs as possible.

NGDR Error: Cannot abort detach. Board detached from OS (detach completed).

This message indicates that the detach operation has completed. It follows the NGDR Error: abort_detach: board already drained message.

See the NGDR Error: abort_detach: board already drained message.

NGDR Error: couldn't query cpu configuration

The complete_detach operation has failed because the DR daemon could not ascertain the CPU configuration just prior to the beginning of the complete_detach operation. After a board is detached, the DR daemon uses the information about the CPU configuration to update the utmp and wtmp entries for each CPU on the board. Although the complete_detach operation does not depend on the updates, a failure of the mechanisms that query the CPU configuration indicates a serious problem, so the detach operation should not be completed.

Stop and start the DR daemon and/or the DR driver. Also, check the size of the DR daemon with the ps(1) command. If it is not between 300 and 400 Kbytes, report this error, providing as much information from the system logs as possible.

NGDR Error: detach_board: invalid board state

Communication protocol has been breached over the eligibility of a board. To the SSP, the board is part of the domain, and its resources have been drained, causing the SSP to attempt to complete the detach operation. However, to the DR driver and daemon, the board is not part of the domain. 

Examine the state of the board by using the showdevices(1M) command, and determine the cause of the problem. Retry the drain and/or complete_detach operations to determine if the error is recoverable. Stop and start the DR daemon and driver.

NGDR Error: detach_board: invalid board state

The proper sequence of board states has not been followed, meaning that the board went into the error state or that an earlier failure in the drain-detach sequence of events was not properly reported. 

Examine the state of the board by using the showdevices(1M) command, and determine the cause of the problem. Retry the drain and/or complete_detach operations to determine if the error is recoverable. Stop and start the DR daemon and driver.

NGDR Error: detach_finished: invalid board state

Communication protocol has been breached over the eligibility of a board. To the SSP, the board has been detached. However, to the DR driver and daemon, the board has not been detached from the domain. 

Examine the state of the board by using the showdevices(1M) command, and determine the cause of the problem. Retry the drain and/or complete_detach operations to determine if the error is recoverable. Stop and start the DR daemon and driver.

NGDR Error: detachable_board: invalid board state

Communication protocol has been breached over the eligibility of a board. To the SSP, the board is part of the domain, so the SSP attempts to drain the resources. However, to the DR driver and daemon, the board is not part of the domain. 

Examine the state of the board by using the showdevices(1M) command, and determine the cause of the problem. Retry the drain and/or complete_detach operations to determine if the error is recoverable. Stop and start the DR daemon and driver.

NGDR Error: detaching board would leave no online CPUs

The detach operation failed because no CPUs would be left online after the board is detached. 

Bring more CPUs online on other boards in the domain, or add more boards with online CPUs to the domain, so that the domain will have enough online CPUs after the board is detached. 

NGDR Error: drain_board_resources: invalid board state

Communication protocol has been breached over the eligibility of a board. To the SSP, the board is part of the domain, so the SSP attempts to drain the resources. However, to the DR driver and daemon, the board is not part of the domain. 

Examine the state of the board by using the showdevices(1M) command, and determine the cause of the problem. Retry the drain and/or complete_detach operations to determine if the error is recoverable. Stop and start the DR daemon and driver.

NGDR Error: Remaining system memory (memory_size mb) below minimum threshold (minimum_memory_size mb) . . . .Not enough space

The domain must have enough memory to accommodate the memory of the board that is being detached. The detach operation failed because the domain does not have enough memory to detach the board. 

Attach as many boards as necessary so that the memory in the domain will hold the memory on the board being detached. 

NGDR Error: Some devices not re-attached. Examine the host syslog for details . . . errno_description

Devices could not be reattached to the operating environment during an abort detach operation. Errors were encountered while the DR daemon tried to communicate with the device drivers for one or more devices on the board. 

Examine the system logs to determine which devices were not reattached. If possible, fix the problem, then issue the complete_attach(1M) command again to fully configure the board. If this action fails, the failure may be caused by an unsupported device whose state cannot be resolved until the domain is rebooted.

NGDR Error: sysconf failed (_SC_NPROCESSORS_ONLN) . . . errno_description

The sysconf(3C) function failed to return the total number of online CPUs in the domain. Thus, the DR daemon cannot determine if the domain would be left with any online CPUs after the board is detached.

See the sysconf(3C) man page for more details about this error. Use those details and the errno_description to diagnose and solve the error. Retry the DR operation after you have solved the error. If no fix is apparent, stop and restart the DR daemon, then retry the DR operation.
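The online-CPU check described in Table A-3 can be sketched in Python, whose os.sysconf() function wraps the same sysconf query for _SC_NPROCESSORS_ONLN. This is a minimal illustration, not the DR daemon's code. The function names are hypothetical, and the count of online CPUs on the board is assumed to be known from another source, such as the board configuration display.

```python
import os


def online_cpu_count() -> int:
    """Return the number of online CPUs that sysconf reports.

    os.sysconf raises OSError if the query fails and ValueError if the
    platform does not define SC_NPROCESSORS_ONLN.
    """
    return os.sysconf("SC_NPROCESSORS_ONLN")


def detach_leaves_online_cpus(online_total: int, online_on_board: int) -> bool:
    """Return True if at least one CPU stays online after the board is detached."""
    return online_total - online_on_board > 0
```

If detach_leaves_online_cpus() returns False, the detach would fail with the "detaching board would leave no online CPUs" message, so more CPUs must be brought online on other boards first.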

Auto-Configuration Error Messages

The following table contains the list of auto-configuration error messages that are sent to the system logs and/or to the SSP applications.

Table A-4 Auto-Configuration Error Messages

Error Message 

Probable Cause 

Suggested Action 

NGDR Error: Complete pending DR operation prior to running autoconfig . . . Invalid argument

The autoconfig(1M) command failed because a DR operation was still pending (that is, the board was not fully detached or attached before you issued the autoconfig(1M) command to reconfigure the operating environment).

Use the showdevices(1M) command to determine the state of the board. Decide to abort or complete the pending operation before you try to use the autoconfig(1M) command to reconfigure the operating environment.

NGDR Error: Could not get /tmp/AdDrEm.lck lock . . . errno_description

The DR daemon failed to get the lock it needs so that it can reconfigure the operating environment. 

Check the additional errno_description and/or error number that is sent with the error message to determine why the lock could not be acquired.

NGDR Error: Could not unlock /tmp/AdDrEm.lck lock . . . errno_description

The DR daemon could not release the lock. 

Check the additional errno_description and/or error number that is sent with the error message to determine why the lock was not released.

NGDR Error: devlinks cmd failed . . . error descriptions

The devlinks(1M) command failed to reconfigure the operating environment.

Check the additional error descriptions and/or error number that is sent with the error message to determine why the command failed. Manually run the command on the domain.

NGDR Error: disks cmd failed . . . error descriptions

The disks(1M) command failed to reconfigure the operating environment.

Check the additional error descriptions and/or error number that is sent with the error message to determine why the command failed. Manually run the command on the domain.

NGDR Error: drvconfig cmd failed . . . error description

The drvconfig(1M) command failed to reconfigure the operating environment.

Check the additional error description and/or error number that is sent with the error message to determine why the command failed. Manually run the command on the domain.

NGDR Error: ports cmd failed . . . error description

The ports(1M) command failed to reconfigure the operating environment.

Check the additional error description and/or error number that is sent with the error message to determine why the command failed. Manually run the command on the domain.

NGDR Error: sync cmd failed . . . error description

The sync(1M) command failed to reconfigure the operating environment.

Check the additional error description and/or error number that is sent with the error message to determine why the command failed. Manually run the command on the domain.

NGDR Error: tapes cmd failed . . . error descriptions

The tapes(1M) command failed to reconfigure the operating environment.

Check the additional error description and/or error number that is sent with the error message to determine why the command failed. Manually run the command on the domain.

Domain Error Messages

DR Daemon Start-Up Error Messages

The following table contains a list of the DR daemon start-up errors. These messages are sent only to the domain console window.

Table A-5 DR Daemon Start-Up Error Messages

Error Message 

Probable Cause 

Suggested Action 

Cannot create server handle

The DR daemon could not start up the RPC server. You will see this message only if you manually execute the DR daemon without properly configuring the network services on the domain. Normally, network services spawn the DR daemon in response to an incoming RPC from the SSP. 

On the domain, fix the inetd.conf entry for the DR daemon.

Cannot fork: descriptive message

The DR daemon could not fork a process from which to run its RPC server. 

The descriptive error message corresponds to an errno_value and offers clues as to why the DR daemon could not fork off the RPC server. Check the resource limits and the load of the system to find a way to fix this error.

Permission denied

A user other than root tried to run the DR daemon. 

Only the superuser (root) can run the DR daemon because the daemon needs all of the root privileges to fully explore the system and to access the driver to detach and attach boards. 

Unable to register (300326, 4)

The DR daemon was executed without being properly registered with the network services in the domain. The first number represents the RPC number that is registered for the DR daemon. The second number represents the RPC version used by the DR daemon. 

On the domain, fix the inetd.conf entry for the DR daemon.

Unable to create (300326, 4) for netpath

The DR daemon was executed without being properly registered with the network services in the domain. The first number represents the RPC number that is registered for the DR daemon. The second number represents the RPC version used by the DR daemon. 

On the domain, fix the inetd.conf entry for the DR daemon.

DR Driver Error Messages

The following table contains the DR driver failures that are sent to the system logs and to the SSP applications. In general, refer to the descriptions of the daemon and PSM errors for details about what goes to the system logs and what goes to the SSP.


Note -

All of the possible DR driver failure messages are related to the three probable causes given in the table. Likewise, all of the failure messages have one suggested action.


Table A-6 DR Driver Error Messages

Error Message 

Probable Cause 

Suggested Action 

DR: Error: initiate_attach: ioctl failed
DR: Error: complete_attach: ioctl failed
DR: Error: abort_attach: ioctl failed
DR: Error: get_cpu_info: ioctl failed
DR: Error: get_mem_config: ioctl failed

An ioctl() failure (that is, a failure that was encountered by the DR daemon when it tried to use the DR driver) can occur at three separate levels.
At the first level, within the DR daemon, errors occur when the DR daemon and the DR driver are not interacting properly. The driver could be missing; the DR driver files in the /devices/pseudo directory could be missing, or the file permissions could be wrong. The DR daemon could also be experiencing memory corruption or resource limitations. The ioctl() failure message is followed by a message in the form: Daemon (errno #error_number): error description.

The context of the ioctl() failure (that is, which function precedes the ioctl() failed portion of the message), combined with the text of the error message, indicates what failed. Use the error number to identify the probable cause by checking the information on the ioctl(2) man page. You can also use the /usr/include/errno.h header file if the ioctl(2) man page does not have a specific reference for the error number.

DR: Error: get_mem_cost: ioctl failedDR: Error: get_mem_drain: ioctl failedDR: Error: update_attach: ioctl failedDR: Error: ioctl failed, error draining resourcesDR: Error: detach_board: UNCONFIGURE ioctl failedDR: Error: detach_board: DISCONNECT ioctl failedDR: Error: abort_detach: CANCEL ioctl failedDR: Error: abort_detach: CONFIGURE ioctl failedDR: Error: get_dr_state: ioctl failedDR: Error: get_dr_status: ioctl failed

At the second level--within the platform independent module (PIM) layer of the DR driver, an ioctl failure could indicate busy resources, failing I/O devices on the system board, or improper interaction between the PIM and the platform specific module (PSM) layers. The ioctl() failure message is followed by a PIM message in the form: PIM (error #errornumber): errno_description.At the third level--the PSM layer, an ioctl() failure could indicate busy resources, failing I/O devices on the system board, memory detach failures, CPU detach failures, or internal failures encountered by the PSM driver. The error description usually cites specific physical devices that are failing or includes detailed explanations for a memory or CPU detachment failure. The ioctl() failure message is followed by a PSM message that appears in the following form: PSM (error #errornumber): errno_description.Note that failures in the PSM layer do not have corresponding errno values. PSM failure messages use an error number. You can find explanations of the error numbers in the /usr/include/sys/sfdr.h header file.

The context of the ioctl() failure (that is, which function precedes the ioctl() failed portion of the message), combined with the text of the error message, indicates what failed. Use the error number to identify the probable cause by checking the information on the ioctl(2) man page. You can also use the /usr/include/errno.h header file if the ioctl(2) man page does not have a specific reference for the error number.
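As a rough illustration of the three message forms described above, the following Python sketch splits a follow-up line into its layer, error number, and a lookup hint. The function name classify_ioctl_failure and the regular expression are hypothetical, the exact spacing of real log lines is an assumption, and errno names are resolved on the machine that runs the script, which may not match the numbering on the domain. Treating PIM numbers as errno values is also an assumption drawn from the errno_description wording.

```python
import errno
import os
import re

# Pattern for the line that follows an "ioctl failed" message. The three
# prefixes (Daemon, PIM, PSM) come from the text above; the exact spacing
# in real log lines is an assumption.
FOLLOW_UP = re.compile(
    r"(?P<layer>Daemon|PIM|PSM)\s*\((?:errno|error)\s*#(?P<num>\d+)\):\s*(?P<text>.*)"
)


def classify_ioctl_failure(line):
    """Return (layer, number, hint) for a follow-up line, or None if no match."""
    match = FOLLOW_UP.search(line)
    if match is None:
        return None
    layer = match.group("layer")
    number = int(match.group("num"))
    if layer == "PSM":
        # PSM numbers are not errno values; they are explained in sfdr.h.
        hint = "look up the error number in /usr/include/sys/sfdr.h"
    else:
        # Daemon (and, by assumption, PIM) numbers are treated as errno
        # values and resolved with the local errno table.
        name = errno.errorcode.get(number, "unknown errno")
        hint = f"{name}: {os.strerror(number)}"
    return layer, number, hint
```

A call such as classify_ioctl_failure("PSM (error #14): ...") returns the layer and number together with a pointer to the header file, while a Daemon line returns the symbolic errno name that can then be checked against the ioctl(2) man page.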

Memory Allocation Error Messages

The following table contains the memory allocation error messages that are sent to the system logs and to the SSP applications. Although the list contains several error messages, each of them describes one of two possible errors: ENOMEM or EAGAIN. All of the ENOMEM errors have the same suggested action, as do the EAGAIN errors.

Table A-7 Memory Allocation Error Messages

Error Message 

Probable Cause 

Suggested Action 

NGDR Error: malloc failed (add notnet ap info) errno_description

While it queried the system information, the DR daemon could not allocate enough memory for a structure in which to return the requested information. The daemon may have encountered a resource limit. If the DR daemon cannot allocate memory, then it cannot continue to work. The errno_description usually describes an ENOMEM or EAGAIN error.

First, check the size of the daemon by using the ps(1) command. Normally, the daemon uses about 300- to 400-Kbytes of memory. If the daemon is larger than the above memory sizes, then it may have a memory leak. If it does, you should report this problem. An ENOMEM error means that the DR daemon is in a state from which it cannot recover. An EAGAIN error means that the problem may have been temporary. You can retry the operation, which may succeed eventually, or you may have to stop and restart the daemon.

NGDR Error: malloc failed (alias_namelen) errno_description

While it queried the system information, the DR daemon could not allocate enough memory for a structure in which to return the requested information. The daemon may have encountered a resource limit. If the DR daemon cannot allocate memory, then it cannot continue to work. The errno_description usually describes an ENOMEM or EAGAIN error. 

First, check the size of the daemon by using the ps(1) command. Normally, the daemon uses about 300- to 400-Kbytes of memory. If the daemon is larger than the above memory sizes, then it may have a memory leak. If it does, you should report this problem. An ENOMEM error means that the DR daemon is in a state from which it cannot recover. An EAGAIN error means that the problem may have been temporary. You can retry the operation, which may succeed eventually, or you may have to stop and restart the daemon.

NGDR Error: malloc failed (AP ctlr_t array) errno_description

While it queried the system information, the DR daemon could not allocate enough memory for a structure in which to return the requested information. The daemon may have encountered a resource limit. If the DR daemon cannot allocate memory, then it cannot continue to work. The errno_description usually describes an ENOMEM or EAGAIN error.

First, check the size of the daemon by using the ps(1) command. Normally, the daemon uses about 300- to 400-Kbytes of memory. If the daemon is larger than the above memory sizes, then it may have a memory leak. If it does, you should report this problem. An ENOMEM error means that the DR daemon is in a state from which it cannot recover. An EAGAIN error means that the problem may have been temporary. You can retry the operation, which may succeed eventually, or you may have to stop and restart the daemon.

NGDR Error: malloc failed (ap_controller) errno_description

While it queried the system information, the DR daemon could not allocate enough memory for a structure in which to return the requested information. The daemon may have encountered a resource limit. If the DR daemon cannot allocate memory, then it cannot continue to work. The errno_description usually describes an ENOMEM or EAGAIN error.

First, check the size of the daemon by using the ps(1) command. Normally, the daemon uses about 300- to 400-Kbytes of memory. If the daemon is larger than the above memory sizes, then it may have a memory leak. If it does, you should report this problem. An ENOMEM error means that the DR daemon is in a state from which it cannot recover. An EAGAIN error means that the problem may have been temporary. You can retry the operation, which may succeed eventually, or you may have to stop and restart the daemon.

NGDR Error: malloc failed (board_cpu_config_t) errno_description

While it queried the system information, the DR daemon could not allocate enough memory for a structure in which to return the requested information. The daemon may have encountered a resource limit. If the DR daemon cannot allocate memory, then it cannot continue to work. The errno_description usually describes an ENOMEM or EAGAIN error.

First, check the size of the daemon by using the ps(1) command. Normally, the daemon uses about 300- to 400-Kbytes of memory. If the daemon is larger than the above memory sizes, then it may have a memory leak. If it does, you should report this problem. An ENOMEM error means that the DR daemon is in a state from which it cannot recover. An EAGAIN error means that the problem may have been temporary. You can retry the operation, which may succeed eventually, or you may have to stop and restart the daemon.

NGDR Error: malloc failed (board_mem_config_t) errno_description

While it queried the system information, the DR daemon could not allocate enough memory for a structure in which to return the requested information. The daemon may have encountered a resource limit. If the DR daemon cannot allocate memory, then it cannot continue to work. The errno_description usually describes an ENOMEM or EAGAIN error.

First, check the size of the daemon by using the ps(1) command. Normally, the daemon uses about 300- to 400-Kbytes of memory. If the daemon is larger than the above memory sizes, then it may have a memory leak. If it does, you should report this problem. An ENOMEM error means that the DR daemon is in a state from which it cannot recover. An EAGAIN error means that the problem may have been temporary. You can retry the operation, which may succeed eventually, or you may have to stop and restart the daemon.

NGDR Error: malloc failed (board_mem_cost_t) errno_description

While it queried the system information, the DR daemon could not allocate enough memory for a structure in which to return the requested information. The daemon may have encountered a resource limit. If the DR daemon cannot allocate memory, then it cannot continue to work. The errno_description usually describes an ENOMEM or EAGAIN error.

First, check the size of the daemon by using the ps(1) command. Normally, the daemon uses about 300- to 400-Kbytes of memory. If the daemon is larger than the above memory sizes, then it may have a memory leak. If it does, you should report this problem. An ENOMEM error means that the DR daemon is in a state from which it cannot recover. An EAGAIN error means that the problem may have been temporary. You can retry the operation, which may succeed eventually, or you may have to stop and restart the daemon.

NGDR Error: malloc failed (board_mem_drain_t) errno_description

While it queried the system information, the DR daemon could not allocate enough memory for a structure in which to return the requested information. The daemon may have encountered a resource limit. If the DR daemon cannot allocate memory, then it cannot continue to work. The errno_description usually describes an ENOMEM or EAGAIN error.

First, check the size of the daemon by using the ps(1) command. Normally, the daemon uses about 300- to 400-Kbytes of memory. If the daemon is larger than the above memory sizes, then it may have a memory leak. If it does, you should report this problem. An ENOMEM error means that the DR daemon is in a state from which it cannot recover. An EAGAIN error means that the problem may have been temporary. You can retry the operation, which may succeed eventually, or you may have to stop and restart the daemon.

NGDR Error: malloc failed (dr_io) errno_description

While it queried the system information, the DR daemon could not allocate enough memory for a structure in which to return the requested information. The daemon may have encountered a resource limit. If the DR daemon cannot allocate memory, then it cannot continue to work. The errno_description usually describes an ENOMEM or EAGAIN error.

First, check the size of the daemon by using the ps(1) command. Normally, the daemon uses about 300- to 400-Kbytes of memory. If the daemon is larger than the above memory sizes, then it may have a memory leak. If it does, you should report this problem. An ENOMEM error means that the DR daemon is in a state from which it cannot recover. An EAGAIN error means that the problem may have been temporary. You can retry the operation, which may succeed eventually, or you may have to stop and restart the daemon.

NGDR Error: malloc failed (leaf array) errno_description

While it queried the system information, the DR daemon could not allocate enough memory for a structure in which to return the requested information. The daemon may have encountered a resource limit. If the DR daemon cannot allocate memory, then it cannot continue to work. The errno_description usually describes an ENOMEM or EAGAIN error.

First, check the size of the daemon by using the ps(1) command. Normally, the daemon uses about 300- to 400-Kbytes of memory. If the daemon is larger than the above memory sizes, then it may have a memory leak. If it does, you should report this problem. An ENOMEM error means that the DR daemon is in a state from which it cannot recover. An EAGAIN error means that the problem may have been temporary. You can retry the operation, which may succeed eventually, or you may have to stop and restart the daemon.

NGDR Error: malloc failed (leaf) errno_description

While it queried the system information, the DR daemon could not allocate enough memory for a structure in which to return the requested information. The daemon may have encountered a resource limit. If the DR daemon cannot allocate memory, then it cannot continue to work. The errno_description usually describes an ENOMEM or EAGAIN error.

First, check the size of the daemon by using the ps(1) command. Normally, the daemon uses about 300- to 400-Kbytes of memory. If the daemon is larger than the above memory sizes, then it may have a memory leak. If it does, you should report this problem. An ENOMEM error means that the DR daemon is in a state from which it cannot recover. An EAGAIN error means that the problem may have been temporary. You can retry the operation, which may succeed eventually, or you may have to stop and restart the daemon.

NGDR Error: malloc failed (net_leaf_array) errno_description

While it queried the system information, the DR daemon could not allocate enough memory for a structure in which to return the requested information. The daemon may have encountered a resource limit. If the DR daemon cannot allocate memory, then it cannot continue to work. The errno_description usually describes an ENOMEM or EAGAIN error.

First, check the size of the daemon by using the ps(1) command. Normally, the daemon uses about 300- to 400-Kbytes of memory. If the daemon is larger than the above memory sizes, then it may have a memory leak. If it does, you should report this problem. An ENOMEM error means that the DR daemon is in a state from which it cannot recover. An EAGAIN error means that the problem may have been temporary. You can retry the operation, which may succeed eventually, or you may have to stop and restart the daemon.

NGDR Error: malloc failed (sbus_cntrl_t) errno_description

While it queried the system information, the DR daemon could not allocate enough memory for a structure in which to return the requested information. The daemon may have encountered a resource limit. If the DR daemon cannot allocate memory, then it cannot continue to work. The errno_description usually describes an ENOMEM or EAGAIN error.

First, check the size of the daemon by using the ps(1) command. Normally, the daemon uses about 300- to 400-Kbytes of memory. If the daemon is larger than the above memory sizes, then it may have a memory leak. If it does, you should report this problem. An ENOMEM error means that the DR daemon is in a state from which it cannot recover. An EAGAIN error means that the problem may have been temporary. You can retry the operation, which may succeed eventually, or you may have to stop and restart the daemon.

NGDR Error: malloc failed (sbus_config) errno_description

While it queried the system information, the DR daemon could not allocate enough memory for a structure in which to return the requested information. The daemon may have encountered a resource limit. If the DR daemon cannot allocate memory, then it cannot continue to work. The errno_description usually describes an ENOMEM or EAGAIN error.

First, check the size of the daemon by using the ps(1) command. Normally, the daemon uses about 300- to 400-Kbytes of memory. If the daemon is larger than the above memory sizes, then it may have a memory leak. If it does, you should report this problem. An ENOMEM error means that the DR daemon is in a state from which it cannot recover. An EAGAIN error means that the problem may have been temporary. You can retry the operation, which may succeed eventually, or you may have to stop and restart the daemon.

NGDR Error: malloc failed (sbus_device_t) errno_description

While it queried the system information, the DR daemon could not allocate enough memory for a structure in which to return the requested information. The daemon may have encountered a resource limit. If the DR daemon cannot allocate memory, then it cannot continue to work. The errno_description usually describes an ENOMEM or EAGAIN error.

First, check the size of the daemon by using the ps(1) command. Normally, the daemon uses about 300- to 400-Kbytes of memory. If the daemon is larger than the above memory sizes, then it may have a memory leak. If it does, you should report this problem. An ENOMEM error means that the DR daemon is in a state from which it cannot recover. An EAGAIN error means that the problem may have been temporary. You can retry the operation, which may succeed eventually, or you may have to stop and restart the daemon.

NGDR Error: malloc failed (sbus_usage_t) errno_description

While it queried the system information, the DR daemon could not allocate enough memory for a structure in which to return the requested information. The daemon may have encountered a resource limit. If the DR daemon cannot allocate memory, then it cannot continue to work. You may have to stop and restart the daemon. The errno_description usually describes an ENOMEM or EAGAIN error.

First, check the size of the daemon by using the ps(1) command. Normally, the daemon uses about 300- to 400-Kbytes of memory. If the daemon is larger than the above memory sizes, then it may have a memory leak. If it does, you should report this problem. An ENOMEM error means that the DR daemon is in a state from which it cannot recover. An EAGAIN error means that the problem may have been temporary. You can retry the operation, which may succeed eventually, or you may have to stop and restart the daemon.

NGDR Error: malloc failed (struct devnm) errno_description

While it queried the system information, the DR daemon could not allocate enough memory for a structure in which to return the requested information. The daemon may have encountered a resource limit. If the DR daemon cannot allocate memory, then it cannot continue to work. The errno_description usually describes an ENOMEM or EAGAIN error.

First, check the size of the daemon by using the ps(1) command. Normally, the daemon uses about 300- to 400-Kbytes of memory. If the daemon is larger than the above memory sizes, then it may have a memory leak. If it does, you should report this problem. An ENOMEM error means that the DR daemon is in a state from which it cannot recover. An EAGAIN error means that the problem may have been temporary. You can retry the operation, which may succeed eventually, or you may have to stop and restart the daemon.

NGDR Error: malloc failed (swap name entries) errno_description

While it queried the system information, the DR daemon could not allocate enough memory for a structure in which to return the requested information. The daemon may have encountered a resource limit. If the DR daemon cannot allocate memory, then it cannot continue to work. The errno_description usually describes an ENOMEM or EAGAIN error.

First, check the size of the daemon by using the ps(1) command. Normally, the daemon uses about 300- to 400-Kbytes of memory. If the daemon is larger than the above memory sizes, then it may have a memory leak. If it does, you should report this problem. An ENOMEM error means that the DR daemon is in a state from which it cannot recover. An EAGAIN error means that the problem may have been temporary. You can retry the operation, which may succeed eventually, or you may have to stop and restart the daemon.

NGDR Error: malloc failed (swaptbl) errno_description

While it queried the system information, the DR daemon could not allocate enough memory for a structure in which to return the requested information. The daemon may have encountered a resource limit. If the DR daemon cannot allocate memory, then it cannot continue to work. The errno_description usually describes an ENOMEM or EAGAIN error.

First, check the size of the daemon by using the ps(1) command. Normally, the daemon uses about 300- to 400-Kbytes of memory. If the daemon is larger than the above memory sizes, then it may have a memory leak. If it does, you should report this problem. An ENOMEM error means that the DR daemon is in a state from which it cannot recover. An EAGAIN error means that the problem may have been temporary. You can retry the operation, which may succeed eventually, or you may have to stop and restart the daemon.

NGDR Error: malloc failed (unsafe_devs) errno_description

While it queried the system information, the DR daemon could not allocate enough memory for a structure in which to return the requested information. The daemon may have encountered a resource limit. If the DR daemon cannot allocate memory, then it cannot continue to work. The errno_description usually describes an ENOMEM or EAGAIN error.

First, check the size of the daemon by using the ps(1) command. Normally, the daemon uses about 300- to 400-Kbytes of memory. If the daemon is larger than the above memory sizes, then it may have a memory leak. If it does, you should report this problem. An ENOMEM error means that the DR daemon is in a state from which it cannot recover. An EAGAIN error means that the problem may have been temporary. You can retry the operation, which may succeed eventually, or you may have to stop and restart the daemon.
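Because every row in Table A-7 shares the same decision logic, it can be summarized in a few lines. The Python sketch below is only an illustration of that logic: the function name suggest_action and the wording of the returned strings are hypothetical, the daemon size is assumed to have been read from ps(1) output beforehand, and the 400-Kbyte threshold is the upper end of the range quoted in the table.

```python
# Normal daemon size range quoted in Table A-7, in Kbytes.
NORMAL_KBYTES = (300, 400)


def suggest_action(errno_name, daemon_kbytes):
    """Summarize the Table A-7 guidance for one malloc failure.

    errno_name    -- "ENOMEM" or "EAGAIN", taken from the errno_description
    daemon_kbytes -- daemon size as read from ps(1) output by the caller
    """
    actions = []
    if daemon_kbytes > NORMAL_KBYTES[1]:
        # A daemon above the normal range may be leaking memory.
        actions.append("possible memory leak: report the problem")
    if errno_name == "ENOMEM":
        # The table states only that the daemon cannot recover from ENOMEM.
        actions.append("daemon is in a state from which it cannot recover")
    elif errno_name == "EAGAIN":
        # EAGAIN may be temporary, so a retry is worth attempting first.
        actions.append("retry the operation; stop and restart the daemon if retries fail")
    else:
        # Anything else falls outside the two cases the table describes.
        actions.append("unexpected errno: check the system logs")
    return actions
```

For example, suggest_action("EAGAIN", 350) returns only the retry advice, whereas an oversized daemon adds the memory-leak note in front of the errno-specific entry.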

Platform Specific Module (PSM) Error Messages

The following table contains a list of PSM error messages that are sent to the system logs and to the SSP applications.

Table A-8 PSM Error Messages

Error Message 

Probable Cause 

Suggested Action 

1 SFDR_ERR_INTERNAL

An internal driver failed. 

None 

2 SFDR_ERR_SUSPEND

Failed to suspend devices. 

None 

3 SFDR_ERR_RESUME

Failed to resume suspended devices. 

None 

4 SFDR_ERR_UNSAFE

Failed to quiesce the operating system due to referenced suspend-unsafe devices. 

Determine the I/O usage of unsafe devices in the domain, and manually suspend the unsafe devices. 

5 SFDR_ERR_UTHREAD

User thread could not be stopped. 

Retry the operation. If this error persists, try stopping the process with the kill(1) command.

6 SFDR_ERR_RTTHREAD

Realtime thread could not be stopped. 

Retry the operation. If this error persists, try stopping the process with the kill(1) command.

7 SFDR_ERR_KTHREAD

Kernel thread could not be stopped. 

Retry the operation. If this error persists, try stopping the process with the kill(1) command.

8 SFDR_ERR_OSFAILURE

The kernel is not processing DR operations properly for the DR driver. 

None 

9 SFDR_ERR_OUTSTANDING

The ioctl() failed because an error from a previous DR drain operation still has not been reported through the DR status command.

Retry the operation. 

11 SFDR_ERR_CONFIG

The current system configuration will not allow the DR operation to execute. 

Check the /etc/system file to ensure that memory detach is enabled.

12 SFDR_ERR_NOMEM

Not enough memory 

None 

13 SFDR_ERR_PROTO

Protocol failure 

None 

14 SFDR_ERR_BUSY

The device is busy. 

Check the I/O usage of the device to determine the cause of this error (for example, a mounted file system or the last path to an AP device). If possible, manually adjust the system to correct this error (for instance, unmount the file system). If the cause of the error is not apparent, contact your Sun service provider. 

15 SFDR_ERR_NODEV

No devices are present. 

None 

16 SFDR_ERR_INVAL

Invalid argument and/or operation 

None 

17 SFDR_ERR_STATE

Invalid board state (transition) 

None 

18 SFDR_ERR_PROBE

Failed to probe OBP nodes for a board. 

None 

19 SFDR_ERR_DEPROBE

Failed to deprobe OBP nodes for a board. 

None 

20 SFDR_ERR_HW_INTERCONNECT

Interconnect hardware failed. 

None 

21 SFDR_ERR_OFFLINE

Failed to place a CPU offline. 

None 

22 SFDR_ERR_ONLINE

Failed to bring a CPU online. 

None 

23 SFDR_ERR_CPUSTART

Failed to start a CPU. 

None 

24 SFDR_ERR_CPUSTOP

Failed to stop a CPU. 

None 

25 SFDR_ERR_JUGGLE_BOOTPROC

Failed to move the clock-signal CPU. 

None 

26 SFDR_ERR_CANCEL

Could not cancel a RELEASE operation. 

Retry the Abort Detach operation after the Drain operation is complete. 
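When scanning logs for PSM messages, it can help to keep Table A-8 in a lookup structure. The following Python sketch copies the error numbers, names, and suggested actions from the table (actions are abbreviated); the names PSM_ERRORS and describe_psm_error are hypothetical. Number 10 does not appear in the table, so the sketch leaves it out rather than guess at its meaning.

```python
RETRY_THEN_KILL = "Retry the operation. If this error persists, try stopping the process with kill(1)."

# (name, suggested action) keyed by PSM error number, copied from Table A-8.
# None means the table lists no suggested action. Number 10 is not listed.
PSM_ERRORS = {
    1: ("SFDR_ERR_INTERNAL", None),
    2: ("SFDR_ERR_SUSPEND", None),
    3: ("SFDR_ERR_RESUME", None),
    4: ("SFDR_ERR_UNSAFE", "Determine the I/O usage of unsafe devices and suspend them manually."),
    5: ("SFDR_ERR_UTHREAD", RETRY_THEN_KILL),
    6: ("SFDR_ERR_RTTHREAD", RETRY_THEN_KILL),
    7: ("SFDR_ERR_KTHREAD", RETRY_THEN_KILL),
    8: ("SFDR_ERR_OSFAILURE", None),
    9: ("SFDR_ERR_OUTSTANDING", "Retry the operation."),
    11: ("SFDR_ERR_CONFIG", "Check /etc/system to ensure that memory detach is enabled."),
    12: ("SFDR_ERR_NOMEM", None),
    13: ("SFDR_ERR_PROTO", None),
    14: ("SFDR_ERR_BUSY", "Check the I/O usage of the device and correct it if possible."),
    15: ("SFDR_ERR_NODEV", None),
    16: ("SFDR_ERR_INVAL", None),
    17: ("SFDR_ERR_STATE", None),
    18: ("SFDR_ERR_PROBE", None),
    19: ("SFDR_ERR_DEPROBE", None),
    20: ("SFDR_ERR_HW_INTERCONNECT", None),
    21: ("SFDR_ERR_OFFLINE", None),
    22: ("SFDR_ERR_ONLINE", None),
    23: ("SFDR_ERR_CPUSTART", None),
    24: ("SFDR_ERR_CPUSTOP", None),
    25: ("SFDR_ERR_JUGGLE_BOOTPROC", None),
    26: ("SFDR_ERR_CANCEL", "Retry the Abort Detach operation after the Drain operation is complete."),
}


def describe_psm_error(number):
    """Return the name and suggested action for a PSM error number."""
    entry = PSM_ERRORS.get(number)
    if entry is None:
        return f"PSM error #{number}: not listed in Table A-8"
    name, action = entry
    return f"{name}: {action or 'no suggested action listed'}"
```

The /usr/include/sys/sfdr.h header file remains the authoritative source for these numbers; the dictionary is only a convenience copy of the table.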

DR General Domain Error Messages

The following table contains a list of the general failure error messages that are sent to the system logs and/or to the SSP applications.

Table A-9 DR General Domain Error Messages

Error Message 

Probable Cause 

Suggested Action 

NGDR Error: Cannot fork() process . . . errno_description

The DR daemon could not fork off a process for the command to run in. A message in the form "running command" appears in the system logs prior to this error message, or any other error message about failed commands. 

The errno_description offers hints on how to fix the command that you want to run. Also check the man page for the command. It may have an explanation of the error.

DR Error: command has continued

While the DR daemon was running external commands, one of the commands failed or exited abnormally. The DR feature executes external commands (for example, drvconf) to configure the software subsystems. 

Run the program manually on the domain. If the command fails again, refer to the man page for the command. It may have an explanation of the error. 

DR Error: command stopped by signal signal_number

While the DR daemon was running external commands, one of the commands failed or exited abnormally. The DR feature executes external commands (for example, drvconf) to configure the software subsystems.

Run the program manually on the domain. If the command fails again, refer to the man page for the command. It may have an explanation of the error. 

DR Error: command terminated due to signal signal_number

While the DR daemon was running external commands, one of the commands failed or exited abnormally. The DR feature executes external commands (for example, drvconf) to configure the software subsystems.

Run the program manually on the domain. If the command fails again, refer to the man page for the command. It may have an explanation of the error. 

DR Error: command terminated due to signal signal_number. Core dumped.

While the DR daemon was running external commands, one of the commands failed or exited abnormally. The DR feature executes external commands (for example, drvconf) to configure the software subsystems.

Run the program manually on the domain. If the command fails again, refer to the man page for the command. It may have an explanation of the error. 

NGDR Error: dr_issue_ioctl: failed closing driver . . . errno_description

The DR daemon encountered a failure while it tried to close a DR driver entry point. A more detailed explanation of this failure accompanies the error message. 

Use the close(2) man page and the errno_description to determine what caused this error and how to solve it.

Cannot exec command (errno = errno_value).

The DR daemon could not execute the external command. A more detailed explanation of this failure accompanies the error message. 

Check the system logs to determine which command failed. See the exec(2) man page for more details about the specified errno_value. Use this information to solve the error.

dr_get_sysbrd_info: NULL parameter

An invalid pointer was given to the DR daemon during a query of the slot-to-memory address mapping. Either an RPC gave an incorrect value, or the DR daemon called itself with an invalid parameter. 

You should gather as much information about this problem as possible from the system logs so that you can determine the cause of the failure. Try stopping and starting the DR daemon and the SSP application. If this error persists, report it to your Sun service representative. 

update_cpu_info: bad board number

A problem within the DR daemon occurred, causing it to call its internal routines with incorrect values. 

You should gather as much information about this problem as possible from the system logs so that you can determine the cause of the failure. You should also report this problem, and if it persists, you may have to stop and restart the daemon. 

WARNING: Failed to update board board_number's modification time [non-fatal].

Updating the board modification time has failed. After a board has been modified (for example, memory or CPUs added), it is probed or deprobed by OBP so that OBP can inform other programs of the change. Then, the modification time is updated. 

This error is non-fatal. 
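Several rows in Table A-9 suggest running the failed command manually on the domain. The Python sketch below shows one way to do that and to report whether the command exited normally or was ended by a signal. The function names describe_exit and run_manually are hypothetical, the report strings are modeled loosely on the table and are not the daemon's actual output, and the negative return code convention for signals applies to POSIX systems.

```python
import signal
import subprocess


def describe_exit(returncode):
    """Turn a subprocess return code into a short description."""
    if returncode == 0:
        return "command succeeded"
    if returncode < 0:
        # On POSIX systems, subprocess reports death by signal N as -N.
        signum = -returncode
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown signal"
        return f"command terminated due to signal {signum} ({name})"
    return f"command exited with status {returncode}"


def run_manually(argv):
    """Run a command and return (description, captured stderr)."""
    completed = subprocess.run(argv, capture_output=True, text=True)
    return describe_exit(completed.returncode), completed.stderr
```

The captured standard error output is returned alongside the description because it is usually the quickest pointer to the relevant section of the man page for the command.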

OpenBoot PROM Error Messages

The following table contains the list of OpenBoot(TM) PROM (OBP) error messages that are sent to the system logs and/or to the SSP applications.

Table A-10 OpenBoot PROM Error Messages

Error Message 

Probable Cause 

Suggested Action 

cpu unit without upa-portid [non-fatal]

This message indicates that corrupted or incorrect values were found in the OBP structures, meaning that the information in the OBP Configuration window will not be correct. 

This is a non-fatal error. If this error persists, reboot the domain. If the error persists after the reboot, report it to your Sun service representative, providing as much information about the error as possible. 

OBP_info: bad child units [non-fatal]

This message indicates that corrupted or incorrect values were found in the OBP structures, meaning that the information in the OBP Configuration window will not be correct. 

This is a non-fatal error. If this error persists, reboot the domain. If the error persists after the reboot, report it to your Sun service representative, providing as much information about the error as possible. 

obp_info: bad slot number [non-fatal]

This message indicates that corrupted or incorrect values were found in the OBP structures, meaning that the information in the OBP Configuration window will not be correct. 

This is a non-fatal error. If this error persists, reboot the domain. If the error persists after the reboot, report it to your Sun service representative, providing as much information about the error as possible. 

obp_info: missing sbus name [non-fatal]

This message indicates that corrupted or incorrect values were found in the OBP structures, meaning that the information in the OBP Configuration window will not be correct. 

This is a non-fatal error. If this error persists, reboot the domain. If the error persists after the reboot, report it to your Sun service representative, providing as much information about the error as possible. 

obp_info: missing slot number [non-fatal]

This message indicates that corrupted or incorrect values were found in the OBP structures, meaning that the information in the OBP Configuration window will not be correct. 

This is a non-fatal error. If this error persists, reboot the domain. If the error persists after the reboot, report it to your Sun service representative, providing as much information about the error as possible. 

sbus node without upa-portid [non-fatal]

This message indicates that corrupted or incorrect values were found in the OBP structures, meaning that the information in the OBP Configuration window will not be correct. 

This is a non-fatal error. If this error persists, reboot the domain. If the error persists after the reboot, report it to your Sun service representative, providing as much information about the error as possible. 

sysio_num out of range [non-fatal]

This message indicates that corrupted or incorrect values were found in the OBP structures, meaning that the information in the OBP Configuration window will not be correct. 

This is a non-fatal error. If this error persists, reboot the domain. If the error persists after the reboot, report it to your Sun service representative, providing as much information about the error as possible. 

NGDR Error: cannot open /dev/openprom. . . errno_description

The DR daemon could not open the entry point for the domain OBP information, meaning that no information will appear in the OBP Configuration window. This error is not fatal. 

Determine what caused this error by using the open(2) man page and the errno_description. The DR daemon may have encountered a resource limit. If so, stop the daemon then restart it. Also, check the size of the DR daemon. It should be between 300- and 400-Kbytes. If it is not within this range, stop the daemon then restart it. If you cannot recover the domain from this error or symptoms of a memory leak exist, report this error to your Sun service representative, providing as much information from the system logs as possible.

NGDR Error: close error on /dev/openprom

The DR daemon failed to close the entry point for the OBP driver. 

Determine what caused this error by using the error messages that preceded this error message. Fix the error if possible. 

NGDR Error: dev/openprom busy. Cannot open.

The entry point for the domain OBP information was busy, meaning that no information will appear in the OBP Configuration window. This error is non-fatal. 

Retry the operation. Check for processes that may be keeping the entry point open by using the ps(1) command. Stop any processes that are keeping the entry point open.

NGDR Error: get_obp_board_config: invalid board state

Communication protocol was breached over the eligibility of a board when the SSP application tried to query the OBP information for a board. To the SSP, the board is part of the domain, so the SSP attempts to drain the board resources. However, to the DR driver and daemon, the board is not part of the domain. 

None 

NGDR Error: OBP config: too many CPUs

The DR daemon found too many CPUs attributed to a system board in the OBP structures. To OBP, the board has more CPUs than it could possibly have (for instance, five or more). 

Ensure that OBP is operating properly. If it is not, reboot the domain. 

NGDR Error: OPROMCHILD. . . errno_description

An ioctl() performed on the OBP driver entry point failed, specifically the ioctl() used to walk to the child OBP node in the device tree, meaning that the information in the OBP Configuration window will not be complete.

Determine what caused this error by using the errno_value or the errno_description that accompanies this error message. Fix the error if possible.

NGDR Error: OPROMGETPROP. . . errno_description

An ioctl() performed on the OBP driver entry point failed, specifically the ioctl() used to acquire the OBP properties, meaning that the information in the OBP Configuration window will be incomplete.

Determine what caused this error by using the ioctl(2) man page and the errno_description that accompanies this error message. Fix the error if possible.

NGDR Error: OPROMNEXT. . . errno_description

An ioctl() performed on the OBP driver entry point failed, specifically the ioctl() used to walk to the next OBP node in the device tree, meaning that the information in the OBP Configuration window will not be complete.

Determine what caused this error by using the ioctl(2) man page and the errno_description that accompanies this error message. Fix the error if possible.

NGDR Error: System architecture does not support this option of this command.

An unsupported option was given to the DR daemon as it walked the OBP tree for the domain, meaning that part of the information in the OBP Configuration window will be incorrect. This error is non-fatal. 

None 
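
Many of the suggested actions in this appendix start from the errno_value or errno_description that accompanies a message. As a minimal sketch, and not part of the DR or SSP software, the following Python helper shows one way to turn a numeric errno_value into its symbolic name and description. The function name describe_errno is hypothetical, and the description text comes from the C library of the machine where the script runs, so it may be worded differently from the text that the DR daemon logs.

```python
import errno
import os


def describe_errno(value):
    """Return (symbolic_name, description) for a numeric errno value.

    describe_errno is a hypothetical helper for illustration only.
    """
    # errno.errorcode maps numbers to names such as "ENOENT"; values the
    # platform does not define fall back to "UNKNOWN".
    name = errno.errorcode.get(value, "UNKNOWN")
    # os.strerror returns the local C library's message for the value.
    return name, os.strerror(value)


if __name__ == "__main__":
    for value in (errno.ENOENT, errno.EBUSY, errno.EMFILE):
        name, text = describe_errno(value)
        print(f"{value}: {name}: {text}")
```

A value that maps to EMFILE or ENFILE would be consistent with the resource-limit case that several suggested actions mention, whereas EACCES points toward file permissions.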

DR Domain Exploration Error Messages

The following table contains the system exploration error messages that are sent to the system logs and/or to the SSP applications.

Table A-11 DR Domain Exploration Error Messages

Error Message 

Probable Cause 

Suggested Action 

Cannot open /etc/driver_aliases; dr_daemon may not operate correctly without driver alias mappings . . . errno_description

The DR daemon made an incorrect decision about the detachability and usage of devices in the domain. It is a non-fatal error. 

Analyze what caused this error by using the errno_description, and try to correct the error. Look for incorrect file permissions or some kind of resource limit that has been encountered. After you correct the error, you must stop the DR daemon, then restart it so that it attempts to read the driver alias mappings again.

Cannot open mnttab (errno=errno_value)

The DR daemon does not allow a detachability test to pass if the mnttab file cannot be opened and examined to determine which file systems are mounted. If the test is not stopped, a mounted file system could be detached from the domain.

Analyze the cause of this error by using the errno_value, and try to correct the error. The DR daemon may have encountered a resource limit. If so, stop the daemon then restart it. Also, check the size of the DR daemon. It should be between 300- and 400-Kbytes. If it is not within this range, stop the daemon then restart it.

Cannot open socket (errno=errno_value). This error message is sent only to the system logs.

The DR daemon could not open a network device. All network devices are opened to test their usage. 

Determine what caused this error by using the errno_value. The DR daemon may have encountered a resource limit. If so, stop the daemon then restart it. Also, check the size of the DR daemon. It should be between 300- and 400-Kbytes. If it is not within this range, stop the daemon then restart it. If you cannot recover the domain from this error or if symptoms of a memory leak exist, report this error to your Sun service representative, providing as much information from the system logs as possible.

get_cpu_bindings: can't access /proc filesystem [non-fatal].

The /proc filesystem cannot be opened. When the DR daemon explores the domain to determine the CPU information for a board, the /proc filesystem is examined to determine which PIDs, if any, are bound to the CPUs on the board. Bound processes negatively affect the detachability of a board. A complete detach operation will fail if processes are bound to a CPU.

Check to see why the /proc filesystem cannot be accessed. In the domain, process binding and processor set management programs, or processor management programs, can be used to manually determine the CPU information for a board.

get_mem_config: couldn't determine total system memory size; only 1 board counted [non-fatal].

When the DR daemon tried to count the amount of total memory, it could report only the amount of memory on the selected board, meaning that the system memory field reported by the drshow board_number mem command is inaccurate. The inaccuracy also negatively affects the eligibility of a board for a Detach operation because if the total memory cannot be calculated, then the effects of removing a board from the domain cannot be calculated either.

Stop and restart the DR daemon and driver. Report this error, providing as much information from the system logs as possible. A memory leak could also have occurred over time. Check the size of the DR daemon by using the ps(1) command. The size should be between 300- and 400-Kbytes. If the size is not within this range, stop and start the DR daemon and driver.

get_net_config_info: interface_name no address (errno=errno_value)

The DR daemon encountered a failure while it tried to obtain information about a network interface that was configured by using the ifconfig(1M) command.

Determine what caused this error by using the errno_value, then correct the error.

getmntent returned error

The getmntent(3c) system call failed because the mount-point entries could not be properly examined. If the mount-point entries cannot be properly examined, a mounted file system could be detached from the domain.

Analyze the mnttab file for possible corruption. If any exists, correct it. Also, the DR daemon may have encountered a resource limit. If so, stop the daemon then restart it. Finally, check the size of the DR daemon. It should be between 300- and 400-Kbytes. If it is not within this range, stop the daemon then restart it. If you cannot recover the domain from this error or symptoms of a memory leak exist, report this error to your Sun service representative, providing as much information from the system logs as possible.

Host addr for interface_name not found (h_errno=errno_value)

The file that is needed to test each active network device may not exist, or it may be corrupted. While the network devices are examined, each active network device is tested to determine if it is the primary network interface for the domain. The DR daemon will not allow the detachability test to pass if it cannot determine which active network device is the primary network interface for the domain. 

Use the errno_value to determine if the file exists or if it is corrupted, and correct the error as necessary. The file is named /etc/hostname.interface_name, where interface_name is the interface named in the error message.

Host address field for interface_name is null!!

The IP address for the primary interface (interface_name) is not set properly. While the network devices are examined, each active network device is tested to determine if it is the primary network interface for the domain. The DR daemon will not allow the detachability test to pass if it cannot determine which active network device is the primary network interface for the domain.

Reconfigure the network setup for the domain. You may need to reboot the domain to configure network devices. 

Host address for interface_name must be internet address.

The file that is needed to test each active network device may have a corrupted value or an incorrect network address. While the network devices are examined, each active network device is tested to determine if it is the primary network interface for the domain. The DR daemon will not allow the detachability test to pass if it cannot determine which active network device is the primary network interface for the domain. 

Make sure that the hostname file for the primary network interface contains an IP address in the proper form (that is, xxx.xxx.xxx.xxx). The file is named /etc/hostname.interface_name, where interface_name is the interface named in the error message.

I/O bus device tree not built.

This error message contains additional information about the NGDR Error: device tree not built error message, in which the libdevinfo API failed to build the device tree for the system board.

See the NGDR Error: device tree not built error message.

minor_walk: failed to build net leaf.

This error message contains additional information about the NGDR Error: device tree not built error message, in which the libdevinfo API failed to build the device tree for the system board. This message indicates that the libdevinfo API at least started to look at the minor devices for a network leaf node.

See the NGDR Error: device tree not built error message.

minor_walk: failed to build non-net leaf.

This error message contains additional information about the NGDR Error: device tree not built error message, indicating that the libdevinfo API at least started to look at the minor devices for a non-network leaf node.

See the NGDR Error: device tree not built error message.

Partition partition_name does not have parent.

The device tree is in error because it includes a disk partition that does not have a parent device, such as the disk to which the partition belongs. 

A device could be bad, or a reboot may be necessary. If this error continues to appear, report the error to your Sun service representative, providing as much information from the system logs as possible. 

Recursive symlink found `symbolic_link_name'. Please remove it.

The DR daemon found a symbolic link as it walked the /dev and /devices directories. Some symbolic links create a recursive loop. The DR daemon will not allow the detachability test to pass if it finds a symbolic link in one of these directories.

Remove the symbolic link so that the test can be retried. 

swapctl SC_GETNSWP failed (errno=errno_value)

The swapctl(2) system call failed. This system call is used to determine which disk partitions are in use as swap space. The DR daemon will not allow the detachability test to pass if the use of swap partitions cannot be determined.

Analyze what caused this error by using the errno_value, and try to correct it. Use the swapctl(2) man page and the errno_value to determine why the command failed. The DR daemon may have encountered a resource limit. If so, stop the daemon then restart it. Also, check the size of the DR daemon. It should be between 300- and 400-Kbytes. If it is not within this range, stop the daemon then restart it. If you cannot recover the domain from this error or if symptoms of a memory leak exist, report this error to your Sun service representative, providing as much information from the system logs as possible.

Unable to find cwd errno_value

The DR daemon could not save the current working directory. The daemon switches into the /dev and /devices directories to produce the real pathnames that correspond to device drivers.

Determine what caused this error by using the getcwd(3c) man page and the errno_value, then correct the error.

Unable to find the cwd errno_value

The DR daemon could not determine the name of the driver directory. The daemon switches into the /dev and /devices directories to produce the real pathnames that correspond to device drivers.

Determine what caused this error by using the getcwd(3c) man page and the errno_value, then correct the error.

Unable to get swap entries (errno=errno_value)

The swapctl(2) system call failed. This system call is used to determine which disk partitions are in use as swap space. The DR daemon will not allow the detachability test to pass if swap partitions cannot be determined.

Analyze what caused this error by using the swapctl(2) man page and the errno_value, and try to correct it. The DR daemon may have encountered a resource limit. If so, stop the daemon then restart it. Also, check the size of the DR daemon. It should be between 300- and 400-Kbytes. If it is not within this range, stop the daemon then restart it. If you cannot recover from this error or if symptoms of a memory leak exist, report this error to your Sun service representative, providing as much information from the system logs as possible.

Unable to lstat devlink_file errno_value

The lstat(2) system call failed when it encountered the devlink_file, where devlink_file is the name of the symbolic link in the /dev directory.

Determine what caused this error by using the lstat(2) man page and the errno_value. The DR daemon may have encountered a resource limit. If so, stop the daemon then restart it. Also, check the size of the DR daemon. It should be between 300- and 400-Kbytes. If it is not within this range, stop the daemon then restart it. If you cannot recover the domain from this error or if symptoms of a memory leak exist, report this error to your Sun service representative, providing as much information from the system logs as possible.

Unable to open hostname_file (errno=errno_value)

The information that is needed to test each active network device could not be acquired. While the network devices are examined, each active network device is tested to determine if it is the primary network interface for the domain. The DR daemon will not allow the detachability test to pass if it cannot determine which active network device is the primary network interface for the domain. 

Analyze what caused this error by using the open(2) man page and the errno_value, and try to correct it. Look for incorrect file permissions or non-existent files. The hostname_file value consists of a file named /etc/hostname.ifname, where ifname is a device name, such as hme0 or le0.

Unable to read host name from hostname_file

The file that is needed to test each active network device could not be read. While the network devices are examined, each active network device is tested to determine if it is the primary network interface for the domain. The DR daemon will not allow the detachability test to pass if it cannot determine which active network device is the primary network interface for the domain. 

Ensure that the file has the correct permissions and that it has not been corrupted. 

Unable to readlink devlink_file errno_value

The readlink(2) system call failed when it encountered the devlink_file, where devlink_file is the name of the symbolic link in the /dev directory.

Determine what caused this error by using the readlink(2) man page and the errno_value. The DR daemon may have encountered a resource limit. If so, stop the daemon, then restart it. Also, check the size of the DR daemon. It should be between 300- and 400-Kbytes. If it is not within this range, stop the daemon, then restart it. If you cannot recover the domain from this error or if symptoms of a memory leak exist, report this error to your Sun service representative, providing as much information from the system logs as possible.

Unable to restore cwd errno_value

The DR daemon was unable to change back to the original directory after it changed into the /dev or /devices directory. The DR daemon changes into the /dev and /devices directories to explore the relationships of the device driver with other drivers.

This error should not pose a problem for the domain, but you should determine what caused the error by using the errno_value.

Unable to set cwd errno_value

The DR daemon could not change into the /dev and /devices directories. The daemon switches into these directories to produce the real pathnames that correspond to device drivers.

Determine what caused this error by using the chdir(2) man page and the errno_value, then correct the error.

unknown node type

The device tree was built incorrectly. Several functions create the device tree for a system board by using the libdevinfo API and by searching the /dev and /devices directories. After the tree is constructed, it is passed on to the rpc_info() function, which builds the tree, performs some verifications, then translates the tree into a structure that can be returned from an RPC.

Check the size of the DR daemon. It should be between 300- and 400-Kbytes. If it is not within this range, stop the daemon, then restart it. If you cannot recover the domain from this error, report this error to your Sun service representative, providing as much information from the system logs as possible. 

utssys failed (errno_value) for mount_point

The utssys() system call failed. This system call is used to determine the usage count for a mounted partition. The DR daemon will not allow the detachability test to pass if the usage count cannot be determined.

Analyze what caused this error by using the errno_value, and try to correct it. The DR daemon may have encountered a resource limit. If so, stop the daemon then restart it. Also, check the size of the DR daemon. It should be between 300- and 400-Kbytes. If it is not within this range, stop the daemon then restart it. If you cannot recover the domain from this error or if symptoms of a memory leak exist, report this error to your Sun service representative, providing as much information from the system logs as possible.

walk_dir: dirlist buffer overflow.

As it walked the /dev and /devices directories, the DR daemon encountered too many directories, causing a buffer overflow. If this message occurs, detection of or protection against recursive symbolic links is disabled.

Check the /dev and /devices directories for recursive symbolic links. Remove any recursive symbolic links that you find.

walk_dir: tpath buffer overflow. target_path, device_name

The DR daemon cannot add another directory to the target_path. The daemon walks the /dev and /devices directories to discover device name links so that it can add them to the target path. If the daemon encounters this limit, it cannot explore any more directories because the buffer is full. If the daemon stops its search, some of the devices will not appear in the views (DR daemon and SSP) of the domain device tree. You may also see improper autoswitching of AP devices if this error occurs.

Devices that are not added to the target path must be manually unconfigured and switched to other boards in the domain. You may also need to stop any daemon that is keeping a device open. 

WARNING: cannot check for cvc/ssp interface.

The information that is needed to test each active network device could not be acquired. While the network devices are examined, each active network device is tested to determine if it corresponds to the SSP network interface for the domain. The DR daemon will not allow the detachability test to pass if it cannot determine the SSP network interface. If the network loses the SSP network interface during a detach operation, DR operations are disabled in the domain, and netcon(1M) sessions are disabled.

Switch the suspected interface to a redundant network connection on another board. You may have to reboot the domain to recover from this error. 

WARNING: Cannot check for primary interface

The information that is needed to test each active network device could not be acquired. While the network devices are examined, each active network device is tested to determine if it is the primary network interface for the domain. The DR daemon will not allow the detachability test to pass if it cannot determine which active network device is the primary network interface for the domain. 

Determine which board hosts the primary network interface and re-attach the board to the domain. Or, switch the interface to a redundant network connection on another board in the domain. You may have to reboot the domain to recover from this error. 

WARNING: Cannot determine if interface_name_instance is cvc/ssp interface. SIOCGIFNETMASK errno=errno_value

The DR daemon failed to obtain the necessary information to test an active network interface to determine if it is the SSP connection. While the network devices are examined, each active network device is tested to determine if it is the SSP connection for the domain. The DR daemon will not allow the detachability test to pass if it cannot determine which active network device is the SSP connection for the domain. If the network loses the SSP connection during a DR Detach operation, DR operations and netcon(1M) sessions are disabled.

Switch the network interface (interface_name) to another board. If you cannot correct this error, you may have to reboot the domain.

WARNING: cannot stat device_name errno=errno_value

The stat(2) system call cannot access the /dev entry point for a device in the system device tree.

Use the stat(2) man page and the errno_value to determine why the file device_name could not be accessed.

NGDR Error: Bad page size from sysconf . . . errno_description

The sysconf(3c) system call returned an incorrect value for the system page size, meaning that the system call is broken or that it is not providing a required feature. This error may also explain why queries for memory information or detachability tests are failing due to incorrect reporting of memory sizes.

Use the sysconf(3c) man page and the errno_value to determine the cause of the error.

NGDR Error: device tree not built.

The libdevinfo API failed to build the device tree for the system board. More detailed information about this error accompanies the error message.

Make sure that the correct version of the libdevinfo API is included on the domain and that a version mismatch does not exist between the DR daemon's libraries, the operating environment on the domain, or the DR daemon itself. If no cause can be found, report this error to your Sun service representative.

NGDR Error: dr_get_partn_cpus: cannot get cpu's partition . . . errno_description

The DR daemon tried to use the pset_assign(2) function, but the function failed. The DR daemon uses this function to obtain the processor set and partitioning information, which it sends to the CPU Configuration window.

Use the pset_assign(2) man page and the errno_description to determine and correct the cause of this error.

NGDR Error: dr_get_partn_cpus: failed to get cpu partition info . . . errno_description

The DR daemon tried to use the pset_info(2) function, but the function failed. The DR daemon uses this function to obtain the processor set and partitioning information, which it sends to the CPU Configuration window.

Use the pset_info(2) man page and the errno_description to determine and correct the causes of this error.

NGDR Error: dr_page_to_kb: page size smaller than a KB

A math error occurred, or an incorrect memory value was used in a memory calculation. 

Report this error to your Sun service representative. 

NGDR Error: get_board_config: invalid board state

A communication protocol has been breached over the eligibility of a board. To the SSP, the board is part of the domain. However, to the DR daemon and driver, the board is not part of the domain. 

Stop and start the DR application, then retry the operation. If the error persists, use the kill(1M) command to stop the DR daemon, then start the DR daemon and retry the DR operation.

NGDR Error: get_board_config: invalid flag

The SSP passed an invalid or unsupported flag to the DR daemon when the daemon tried to ascertain the configuration of a board. 

Make sure that the version numbers match for the SSP and the DR daemon. Also, check the size of the daemon by using the ps(1) command. Normally, the daemon uses about 300- to 400-Kbytes of memory. If the daemon has grown far beyond the above memory sizes, then an internal error may have occurred within it. You may have to stop and restart the DR daemon to recover from this error.

NGDR Error: libdevinfo failed.

The initial routine used to open the libdevinfo API failed, so the DR daemon could not explore the device tree for that board. The libdevinfo API builds a tree of dev-info nodes for a board as part of the DR daemon's exploration of the domain devices and their usage. The tree is required by AP and DR operations to test the detachability of a board's I/O devices. It is also used to inform the user of what devices are on what system boards.

Make sure that the correct version of the libdevinfo is included on the domain and that a version mismatch does not exist between the DR daemon's libraries, the operating environment on the domain, or the DR daemon itself. If no cause can be found, report this error to your Sun service provider.

get_cpu_info: cpu state info is incomplete [non-fatal].

The DR daemon could not gather the states of the CPUs (either online or offline). Therefore, the information about each CPU in the CPU Configuration window will not be accurate. 

None 

NGDR Error: build_rpc_info: bad slot number

The device tree was built incorrectly. Several functions create the device tree for a system board by searching through the /dev and /devices directories and by using the libdevinfo API. After the tree is built, it is passed to the build_rpc_info() function that performs some verification of the tree as it translates the DR daemon device tree into a structure that can be returned from an RPC.

Check the size of the DR daemon by using the ps(1) command. Normally, the daemon uses about 300- to 400-Kbytes of memory. If the daemon has grown far beyond the above memory sizes, then an internal error may have occurred within it. You may have to stop and restart the DR daemon to resolve this error. Report this error to your Sun service representative, providing as much information from the system logs as possible.

NGDR Error: build_rpc_info: device address format error

The device tree was built incorrectly. Several functions create the device tree for a system board by searching through the /dev and /devices directories and by using the libdevinfo API. After the tree is built, it is passed to the build_rpc_info() function that performs some verification of the tree as it translates the DR daemon device tree into a structure that can be returned from an RPC.

Check the size of the DR daemon by using the ps(1) command. Normally, the daemon uses about 300- to 400-Kbytes of memory. If the daemon has grown far beyond the above memory sizes, then an internal error may have occurred within it. You may have to stop and restart the DR daemon to resolve this error. Report this error to your Sun service representative, providing as much information from the system logs as possible.

NGDR Error: build_rpc_info: I/O bus node address format error

The device tree was built incorrectly. Several functions create the device tree for a system board by searching through the /dev and /devices directories and by using the libdevinfo API. After the tree is built, it is passed to the build_rpc_info() function that performs some verification of the tree as it translates the DR daemon device tree into a structure that can be returned from an RPC.

Check the size of the DR daemon by using the ps(1) command. Normally, the daemon uses about 300- to 400-Kbytes of memory. If the daemon has grown far beyond the above memory sizes, then an internal error may have occurred within it. You may have to stop and restart the DR daemon to resolve this error. Report this error to your Sun service representative, providing as much information from the system logs as possible.

NGDR Error: build_rpc_info: psycho number out of range

The device tree was built incorrectly. Several functions create the device tree for a system board by searching through the /dev and /devices directories and by using the libdevinfo API. After the tree is built, it is passed to the build_rpc_info() function that performs some verification of the tree as it translates the DR daemon device tree into a structure that can be returned from an RPC.

Check the size of the DR daemon by using the ps(1) command. Normally, the daemon uses about 300- to 400-Kbytes of memory. If the daemon has grown far beyond the above memory sizes, then an internal error may have occurred within it. You may have to stop and restart the DR daemon to resolve this error. Report this error to your Sun service representative, providing as much information from the system logs as possible.

NGDR Error: build_rpc_info: sysio number out of range

The device tree was built incorrectly. Several functions create the device tree for a system board by searching through the /dev and /devices directories and by using the libdevinfo API. After the tree is built, it is passed to the build_rpc_info() function that performs some verification of the tree as it translates the DR daemon device tree into a structure that can be returned from an RPC.

Check the size of the DR daemon by using the ps(1) command. Normally, the daemon uses about 300- to 400-Kbytes of memory. If the daemon has grown far beyond the above memory sizes, then an internal error may have occurred within it. You may have to stop and restart the DR daemon to resolve this error. Report this error to your Sun service representative, providing as much information from the system logs as possible.
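
Two messages in Table A-11 (Recursive symlink found and walk_dir: dirlist buffer overflow) ask you to look for recursive symbolic links under the /dev and /devices directories. The following Python sketch shows one possible way to list such links. It is not how the DR daemon performs its own walk, which the source does not describe. The function name find_recursive_symlinks is hypothetical, and the sketch treats a link as recursive only when it resolves to the directory that holds it or to one of that directory's ancestors.

```python
import os


def find_recursive_symlinks(root):
    """Return symlinks under root that point back to an enclosing directory.

    Hypothetical helper for illustration; it reports and never deletes.
    """
    loops = []
    # followlinks=False keeps the walk itself from looping through links.
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        parent = os.path.realpath(dirpath)
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                continue
            target = os.path.realpath(path)
            if not os.path.isdir(target):
                continue
            # Recursive when the target is this directory or an ancestor.
            prefix = target.rstrip(os.sep) + os.sep
            if parent == target or parent.startswith(prefix):
                loops.append(path)
    return sorted(loops)


if __name__ == "__main__":
    for top in ("/dev", "/devices"):
        if os.path.isdir(top):
            for link in find_recursive_symlinks(top):
                print(link)
```

The sketch only prints candidates. Review each link before you remove it, because ordinary symbolic links in /dev are expected and only the recursive ones need attention.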

Unsafe-Device Query Error Messages

The following table contains the list of unsafe-device query failure error messages that are sent to the system logs and/or to the SSP applications.

Table A-12 Unsafe-Device Query Error Messages

Error Message 

Probable Cause 

Suggested Action 

unsafe_devices: couldn't determine name of unsafe device major_number

The mechanism that the DR daemon uses to combine a driver name with a major number failed so that no name could be discovered. If this failure occurs, the DR daemon constructs a string for the device, marking it as "(unknown, major_number)".

This message notifies the user that the DR daemon was unable to find the name of one of the devices, but it does not constitute a correctable error. The daemon can use the major number to identify the device. 

WARNING: board board_number not checked for unsafe devices.

While the DR daemon was examining the system boards for unsafe devices, the daemon encountered a failure that prevented it from examining one of the system boards (board_number). This error message may be indicative of a more serious problem.

You may have to stop and restart the DR daemon to recover the domain from this error. Check the size of the DR daemon. It should be between 300- and 400-Kbytes. If it is not within this range, stop the daemon, then restart it. If you cannot recover the domain from this error, you should report this error to your Sun service representative, providing as much information from the system logs as possible. 

NGDR Error: unsafe_devices: libdevinfo failed.

The DR daemon cannot determine the names of unsafe major devices because it cannot use the libdevinfo API. This API must be used to search the device tree for the names of all of the unsafe major devices.

Make sure that the domain contains the correct version of the libdevinfo API and that the domain does not contain version mismatches between any of the DR daemon's libraries, the operating environment on the domain, or the daemon itself. If you cannot determine the cause of this error, report it to your Sun service representative, providing as much information from the system logs as possible.

NGDR Error: create_ctlr_array: count mismatch [internal error]

Communication protocol was breached over the existence of AP controllers. To the AP librarian, the domain has a certain number of AP controllers. However, to the DR daemon, the domain has a different number of AP controllers. 

Check to determine the correct number of AP controllers in the domain, and correct the error. Also, check the size of the DR daemon. It should be between 300- and 400-Kbytes. If it is not within this range, stop the daemon then restart it. 
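
Many suggested actions in these tables ask you to check whether the size of the DR daemon is between 300 and 400 Kbytes. As a rough sketch of that check, the following Python reads the size of a process from ps(1) and compares it with that range. Several points are assumptions: the helper names are hypothetical, the ps(1) on the domain is assumed to accept the POSIX -o vsz= format, and the sketch uses the virtual size in Kbytes because the source does not say which size column it means.

```python
import subprocess

# Range that this appendix quotes for the DR daemon, in Kbytes.
EXPECTED_MIN_KB = 300
EXPECTED_MAX_KB = 400


def parse_vsz(ps_output):
    """Convert the output of `ps -o vsz= -p PID` to an integer (Kbytes)."""
    return int(ps_output.strip())


def process_size_kb(pid):
    """Ask ps(1) for the virtual size of one process, in Kbytes."""
    result = subprocess.run(
        ["ps", "-o", "vsz=", "-p", str(pid)],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_vsz(result.stdout)


def size_in_expected_range(size_kb, low=EXPECTED_MIN_KB, high=EXPECTED_MAX_KB):
    """Return True when size_kb lies within the quoted range, inclusive."""
    return low <= size_kb <= high
```

To use the sketch, pass the process ID of the DR daemon to process_size_kb() and the result to size_in_expected_range(). A result of False corresponds to the case in which the suggested actions tell you to stop the daemon, then restart it.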

Alternate Pathing (AP) Error Messages

The following table contains the list of Alternate Pathing error messages that are sent to the system logs and/or to the SSP applications.

Table A-13 AP-related Error Messages

Error Message 

Probable Cause 

Suggested Action 

add_net_ap_info: multiple AP aliases ignored

An AP device has multiple AP aliases. Only one alias is used. The other aliases were ignored. This is not an error. 

If this message persists, remove all but one of the AP aliases. 

AP daemon call failed: error_message *OR* error = error_number

An attempt to notify and/or query the AP librarian failed.  

A descriptive error message may be available to provide specific details about this failure, or an error number may be available. Also, check the ap_daemon(1M) man page for more details about this error.

AP daemon comm init failed: error_message *OR* error = error_number

The DR daemon encountered a failure when it tried to establish a channel of communication with the AP librarian.  

A descriptive error message may be available to provide specific details about this failure, or an error number may be available. Also, check the ap_daemon(1M) man page for more details about this error.

AP daemon query failed: error_message *OR* error = error_number

The DR daemon could not successfully query the AP librarian on the usage of a specific I/O controller. 

A descriptive error message may be available to provide specific details about this failure, or an error number may be available. Also, check the ap_daemon(1M) man page for more details about this error.

AP daemon query failed: length mismatch

The DR daemon queried the AP librarian about the usage of a specific I/O controller, but the response was incorrect. 

A descriptive error message may be available to provide specific details about this failure, or an error number may be available. Also, check the ap_daemon(1M) man page for more details about this error.

Cannot find physical device for AP_alias. This error message is sent only to the system logs.

The physical device name that corresponds with the AP alias could not be found. AP may be confused about the device name, or the /dev and /devices directories are incomplete.

Make sure that AP works properly. Check to see if all of the device entries are present in the /dev and /devices directories. If they are not present, add them to the appropriate directories.

create_ap_net_leaf: interface instance not found

The DR daemon tries to match the AP meta-network interfaces with the physical device they represent. This error indicates that the DR daemon could not successfully match a network interface with the physical device it represents for this board. 

Make sure that AP works properly if you observe abnormal behavior regarding the availability of devices during and after DR operations. If this error persists, report it to your Sun service representative with as much information from the system logs as possible. 

dr_ap_notify: unknown state state_number

The DR daemon called one of its internal functions with a bad value. However, this error is indicative of a more serious problem. 

Report this error to your Sun service representative with as much information as possible from the system logs. 

dr_daemon operating in NO AP interaction mode

The AP software is not working, or it is not installed. This message means that the DR daemon will not notify AP about attach and detach operations. 

Ignore this error if you do not have AP installed. If it is installed, make sure that it is properly installed and that the AP software version is compatible with the version of the DR daemon that is running in the domain. 

init_ap_rpc: Unable to get hostname

The uname(2) system call returned a null hostname. Consequently, the DR daemon could not establish a connection to the AP librarian.

None 
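
The check of the /dev and /devices directories that is suggested for the "Cannot find physical device" message can be partly automated. The following Python sketch assumes that entries under /dev are symbolic links into /devices, and it reports links whose targets are missing. The function name is hypothetical, and the sketch does not detect entries that were never created.

```python
import os


def dangling_links(root):
    """Return the sorted paths of symbolic links under root whose targets are missing."""
    broken = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # os.path.exists() follows links, so it is False for a dangling link.
            if os.path.islink(path) and not os.path.exists(path):
                broken.append(path)
    return sorted(broken)
```

For example, dangling_links("/dev") lists the /dev entries that no longer point to an existing node. An empty list does not prove that the directories are complete.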

DCS Error Messages

The following table contains DCS error messages that are sent to the console window, to the /var/adm/messages file, and to the $SSPLOGGER/domain_name/messages file.

Table A-14 DCS Error Messages

Error Message 

Probable Cause 

Suggested Action 

DCS ERROR: permission denied

Only the superuser on the domain can run the DCS.  

Check the inetd.conf file on the domain to ensure that the DCS is started with superuser UID.

DCS ERROR: internal error: operation: error_description

An internal error occurred within the DCS.  

Use the error_description, which corresponds to the errno value, to diagnose the error. The operation field refers to the function call that caused the error.

DCS NOTICE: unrecognized error reported

The DCS reported an unknown error condition.  

Use the log file on the domain to help determine what caused the error.  

DCS ERROR: network initialization failed

The DCS failed to initialize the network connection used to accept DR requests from the DCA.  

Retry the DR operation.  

DCS ERROR: failed to acquire reserved port

The DCS uses port 665, which is reserved through sun-dr. The error occurred because another process is using the port.  

Determine if another process is still using the port. If so, kill the process, if possible, then retry the DR operation.  

DCS ERROR: connection attempt failed

The DCS failed to establish a connection with the DCA.  

Retry the DR operation.  

DCS ERROR: unable to receive message

The DCS failed to receive a message from the DCA.  

Retry the DR operation.  

DCS ERROR: unable to send message for operation_name operation

The DCS failed to send a message to the DCA.  

Retry the DR operation.  

DCS NOTICE: sun-dr service not found, using reserved port 665

The DCS failed to find the sun-dr service in /etc/services.

None 

DCS NOTICE: client disconnected

The client unexpectedly disconnected.  

None 

DCS ERROR: unknown operation requested

The DCA requested an operation that is not recognized by the DCS.  

Retry the DR operation.  

DCS ERROR: operation failed

The current DCS operation failed to complete. The DR operation itself might have succeeded if the DCS failed only to send the results to the DCA.  

Check the status of the operation manually. If the DR operation did not succeed, retry the operation.  

DCS ERROR: invalid session establishment sequence

The session establishment sequence (the initialization handshake) between the DCA and the DCS failed.  

Retry the DR operation.  

DCS ERROR: operation_name operation issued before session established

A DR operation was requested before the session was established.  

Retry the DR operation.  

DCS ERROR: received an invalid message

The DCS received unexpected information in the message.  

Retry the DR operation.  

DCS NOTICE: confirm callback failed, aborting operation

The DCS was unable to display the confirmation prompt to the user.  

None 

DCS NOTICE: message callback failed, continuing

The DCS was unable to display a message to the user.  

None 

DCS NOTICE: retry value invalid (retry_value)

The value given for the retry_value was invalid, so the operation proceeded with the retry value set to zero.

None 

DCS NOTICE: timeout value invalid (timeout_value)

The value given for the timeout_value was invalid, so the operation proceeded with the timeout value set to zero.

None 

DCS INFO: retrying operation, attempt attempt_number

The DCS is retrying the operation. The attempt_number field represents the current attempt.

None 

DCS ERROR: failed to start a new session handler

The DCS failed to start a concurrent session handler to process the incoming DR request.  

Retry the DR operation.  

DCS ERROR: abort attempt of session, session_id, unsuccessful

The DCS failed to abort session, session_id.

Retry the abort request.  

DCS ERROR: unsupported message protocol version: version_number

The DCS does not support the reported protocol version, version_number.

Check the DR software on the domain and on the SSP. If the versions are not compatible, reinstall the proper version of the software on the domain.  

DCS INFO: session aborted

The current DR operation was aborted by the user.  

None 

DCS ERROR: illegal option option, exiting

The DCS was passed the illegal option option.

Check the inetd.conf file on the domain and remove the illegal option from the entries for the DCS.

DCS NOTICE: illegal argument to option flag (argument), action

The option option was given the illegal argument argument. The DCS will perform the action specified by action.

Check the inetd.conf file on the domain and fix the entries for the DCS.

DCS ERROR: resource info init error (error_code)

The DCS failed to initialize the module responsible for providing resource usage information. 

Retry the operation. 
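
The port behavior described for the "failed to acquire reserved port" and "sun-dr service not found" messages can be sketched in Python. This is an illustration under stated assumptions, not the DCS implementation: the function names are hypothetical, and TCP is assumed as the transport because the table does not name one.

```python
import errno
import socket

FALLBACK_PORT = 665  # reserved port named in Table A-14


def resolve_service_port(name="sun-dr", proto="tcp", fallback=FALLBACK_PORT):
    """Look up a service in the services database; use the fallback if absent."""
    try:
        return socket.getservbyname(name, proto)
    except OSError:
        return fallback


def port_is_taken(port, host="127.0.0.1"):
    """Return True if binding to the port fails because it is already in use.

    Other bind failures (for example, insufficient privilege for ports
    below 1024) are re-raised so they are not mistaken for a busy port.
    """
    probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        probe.bind((host, port))
    except OSError as exc:
        if exc.errno == errno.EADDRINUSE:
            return True
        raise
    finally:
        probe.close()
    return False
```

Probing a port below 1024, such as 665, normally requires superuser privilege, which is why the sketch re-raises permission errors instead of reporting the port as busy.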

DR Driver Error Messages

The following table contains DR driver error messages that are sent to the console window, to the /var/adm/messages file, and to the $SSPLOGGER/domain_name/messages file.

Table A-15 DR Driver Error Messages

Error Message 

Probable Cause 

Suggested Action 

dr: Internal error: dr.c line_number

An internal error has occurred in the DR driver.  

Retry the operation that failed. If the error persists, exit and restart the DR software components, then retry the operation. If the problem still persists, reboot the domain. Check the console or the system logs for any additional information.  

dr: Insufficient memory: resource

The DR framework was unable to configure or unconfigure resources because a KPHYSM_ERESOURCE error or cpu_configure()/cpu_unconfigure() error with the ENOMEM errno occurred.

This condition might be transient. Retry the DR operation. If the error persists and if the operation that is failing is the unconfigure operation, then try configuring more memory into the domain from a different domain. If the error still persists, reboot the domain.  

dr: Device busy: resource

cpu_configure() or cpu_unconfigure() returned the EBUSY errno, or an I/O device cannot be detached because it is busy. This error message is also returned if a CPU to be detached is online when dr_pre_detach_cpu is called. A CPU cannot be detached while a delete memory operation is in progress.

Use showdevices(1M) on the system controller to find out why the resource is busy. Or, on the domain, use fuser(1M), psrinfo(1M), prtdiag(1M), or similar tools to find out why the device is busy. Also check whether another memory deletion is already in progress. Depending on the cause of the error, either reconfigure or shut down whatever is consuming the resource, or wait for the previous memory deletion to complete. Then retry the DR operation.

dr: Operation already in progress: resource

cpu_configure() or cpu_unconfigure() returned the EALREADY errno.

Use showdevices(1M) on the system controller to examine the configuration of the specified resource. Or, on the domain, use cfgadm(1M), pbind(1M), psrinfo(1M), and similar commands to examine the configuration of the resource. Determine what operations are already in progress on this resource, and either wait for them to complete or cancel them. Then, retry the DR operation. The operation already in progress might already have terminated, so retrying the operation might succeed or might produce another error.

dr: I/O error: resource

An unexpected error code resulted from a call to kphysm_del_start. A more verbose cmn_err message is also printed.

Check the verbose error message from cmn_err in the system logs, and/or on the console for a more specific condition and suggested action.

dr: Bad address: resource

kphysm_add_memory_dynamic returned KPHYSM_EFAULT.

Retry the DR operation. If this error persists, contact your Sun service representative.  

dr: No device(s) on board: board_path

The board is connected or disconnected with no devices (I/O, memory, or CPU).  

If devices were expected to be on the board, then disconnect it. The board should be removed from the server, and its components should be reseated by a qualified technician.  

dr: Invalid argument: attachment_point

DR was passed an invalid argument.  

Retry the DR operation. If this error persists, contact your Sun service representative.  

dr: Invalid state transition: attachment_point

A DR operation was sequenced out of order. This could be operator error if the cfgadm(1M) functions were issued out of order. Or, the DR driver could be confused due to some internal error conditions.

Retry the DR operation. If this error persists, stop and start (or unload and load) the DR software components to recover from this error condition. If the error continues to persist, reboot the domain.  

dr: Device in fatal state

The device could not be suspended, or it refused to be suspended.  

Retry the DR operation. If this error persists, the device could be suspend-unsafe. Check the list of suspend-unsafe devices. If the device is unsafe, use showdevices(1M) or fuser(1M) to determine how the device is in use, and manually reconfigure the consumers of the resource. Then, manually unload the driver or, if needed, unplug the cables attached to the device. It should now be safe to retry the operation. Do not plug the cables back into the device, reload its driver, or reconfigure its consumers before the DR operation has succeeded.

dr: Device failed to resume: path

A previously suspended device could not be resumed.  

 

dr: Cannot stop user thread

DR could not stop one or more user threads while preparing a device to be suspended.  

Retry the DR operation. If this error persists, examine the user threads that failed to suspend, and determine why they could not be suspended. You might have to kill the threads to enable the DR operation to proceed.  

dr: Cannot quiesce realtime thread

A realtime thread was encountered in an attempt to suspend the operating system. Suspending, or quiescence, of realtime threads is not allowed. All realtime threads must be stopped or changed to non-realtime before a suspend can succeed.  

Kill the realtime thread(s), or adjust their priority by using the priocntl(1M) command. (You must obtain the PID to adjust the priority of realtime threads.)

dr: Cannot stop kernel thread: name

DR could not stop a kernel thread.  

Retry the DR operation. If this error persists, examine the kernel threads that failed to suspend, and determine why they could not be suspended. Kill the kernel threads, if possible, to enable the DR operation to proceed.  

dr: Failed to off-line: cpu

A CPU could not be off-lined, which prevents it from being unconfigured. The CPU might have one or more threads bound to it. An additional cmn_err message is logged if there are threads bound to the CPU. DR must be able to off-line CPUs and/or to power off CPUs before the board can be disconnected.

Check the console and system log messages to determine if threads are bound to the CPU. If they are, they can be manually unbound or rebound to CPUs on other boards in the domain. If threads are not bound to the CPU, use psrset(1M), pbind(1M), and psrinfo(1M) to determine what changes are required to enable DR to off-line the CPU. For example, you might have to add more CPUs to the domain from different boards. Or, you may have to online other CPUs. Finally, you might have to add more CPU boards to take over the CPU workload.

dr: Failed to on-line: cpu

DR could not online a CPU on a newly-connected or previously unconfigured board.  

 

dr: Failed to start CPU: cpu

DR could not start a CPU on a newly-connected or previously unconfigured board.  

 

dr: Failed to stop CPU: cpu

DR could not power off a CPU on a board to be unconfigured. All of the CPUs on a board to be unconfigured must be taken offline and powered off before the operation can succeed.  

 

dr: Kernel cage is disabled: resource

When the kernel cage is disabled, boards hosting permanent memory cannot be detached.  

You must enable the kernel cage in /etc/system and reboot the domain.

dr: No available memory target: resource

DR could not detach the board because it hosts permanent memory and there is no available target for the memory. Permanent memory must be moved to another memory component within the domain before the DR operation can succeed.  

Configure an additional memory component that contains an adequate amount of memory to act as a target for this board. Then, retry the DR operation.  

dr: VM viability test failed: resource

kphysm_del_start returned an error code, which the DR driver translated into this message.

Configure additional memory components into the domain to relieve memory resource pressure. Then, retry the DR operation.  

dr: kphysm_pre_del failed: resource

kphysm_del_start returned an error code, which the DR driver translated into this message.

Configure additional memory components into the domain to relieve memory resource pressure. Then, retry the DR operation.  

dr: Non-relocatable pages in span: resource

 

 

dr: kphysm_del_cancel: resource

 

 

dr: Memory operation failed: resource

DR failed to attach the memory on a newly attached board.  

 

dr: Can't unconfig cpu if mem online

DR cannot unconfigure a CPU if the memory on the board is online.  

You must off-line the memory before you can unconfigure the board.  

ngdrmach: Cannot read property value: Device Node node_address: property property_name

DR could not get the specified property of a particular device node.  

 

ngdrmach: Cannot determine property length: board::slot:property

DR could not get the length of the specified property for a particular device node. 

 

ngdrmach: No CPU specified for connect: slot

 

 

ngdrmach: Cannot move SIGB assignment

 

 

ngdrmach: Cannot disconnect CPU; SIGB is currently assigned: slot::board

 

 

ngdrmach: Device driver failure: path

 

 

ngdr: Must specify a CPU on the given board: cpu_id

 

 

ngdrmach: No such device: board::slot

 

 

ngdrmach: Memory configured with inter-board interleaving: board::slot

 

 

ngdrmach: Invalid board number: board_number

An invalid board number was specified for the assign board operation.  

Use a different board number, or fix the available components list on the system controller for the domain to include the board for which the assign function is failing.  

ngdrmach:: Cannot proceed; Board is configured or busy: component_name

DR cannot power off or unassign a board that is still configured or busy.  

Unconfigure the board, or wait for any previous DR operations on the board to complete. Then, retry the DR operation. 

ngdrmach: Firmware probe failed: attachment_point

OBP failed to probe the board.  

 

ngdrmach: Firmware deprobe failed: attachment_point

OBP failed to deprobe the board.  

 

ngdrmach: Operation not supported

The operation you attempted is not supported.  

None 

ngdrmach: Unrecognized platform command: command/options

An unrecognized command was passed to DR.  

Refer to the cfgadm_sbd(1M) man page to ensure that you use a valid argument. If you used a valid argument and this error persists, contact your Sun service representative.

ngdrmach: drmach parameter is not a valid ID

An invalid drmachid_t value was encountered.

 

ngdrmach: drmach parameter is inappropriate for operation

The wrong type of drmachid_t was passed to a function.

 

ngdrmach: Unexpected internal condition: drmach.c line_number

An internal drmach error occurred.  

Use modunload(1M) and modload(1M) to unload and then reload the drmach driver. Then, retry the DR operation. If this error persists, reboot the domain.

ngdrmach: No CPU specified for connect.

 

 

ngdrmach: Firmware move_cpu0 failed: CPU cpu_id 

 

 

ngdrmach: Cannot move SIGB assignment 

 

 

ngdrmach: Cannot disconnect CPU; SIGB is currently assigned 

 

 
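
The messages in Table A-15 share a common shape: a source prefix, a fixed message, and, for most messages, a trailing resource, path, or attachment point. When you search the system logs, a small parser can separate these parts. The following Python sketch is an illustration only. The names DriverMessage and parse_driver_message are hypothetical, and the sketch assumes that any syslog timestamp and host prefix have already been stripped from the line.

```python
from typing import NamedTuple, Optional

KNOWN_SOURCES = ("dr", "ngdr", "ngdrmach")  # prefixes that appear in Table A-15


class DriverMessage(NamedTuple):
    source: str            # "dr", "ngdr", or "ngdrmach"
    text: str              # fixed part of the message
    detail: Optional[str]  # resource, path, or attachment point, if present


def parse_driver_message(line):
    """Split a DR driver message into source, text, and detail.

    Returns None if the line does not start with a known source prefix.
    """
    parts = line.strip().split(": ", 2)
    if len(parts) < 2:
        return None
    # Tolerate a doubled colon after the prefix, as in "ngdrmach:: ...".
    source = parts[0].rstrip(":")
    if source not in KNOWN_SOURCES:
        return None
    detail = parts[2] if len(parts) == 3 else None
    return DriverMessage(source, parts[1], detail)
```

For example, parse_driver_message("dr: Device busy: resource") yields the source "dr", the text "Device busy", and the detail "resource". The text field can then be matched against the Error Message column of the table.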

Plugin Error Messages

The following error messages are generated by the libcfgadm system board plugin. They are sent to the netcon(1M) console window, to the /var/adm/messages file, and to the $SSPLOGGER/domain_name/messages file.

Table A-16 Plugin Error Messages

Error Message 

Probable Cause 

Suggested Action 

Configuration operation cancelled: command ap_id

You did not confirm a configuration operation that requires confirmation.  

See the cfgadm(1M) and/or the cfgadm_sbd(1M) man page for more information about which configuration operations require confirmation.

Hardware specific failure: command ap_id: error: resource

A system error occurred during the execution of the command. The error message, error, can be a standard error (that is, an errno), or it can be a more specific error message that is returned by the DR driver (see the DR driver error messages for more information about DR driver errors). The name of the resource, resource, that is causing the error (for example, a busy device) can also be returned by the DR driver.

For busy devices, identify and remove the usage of the device. For other errors, refer to the driver's documentation for possible recovery actions.  

Library Error: command invalid: command

The specified command is invalid for system boards.  

Refer to the cfgadm_sbd(1M) man page for a list of valid commands.

Library Error: command not supported: command ap_id

The specified command is not supported for the specified attachment point. For example, the connect operation is not allowed for CPUs.  

Refer to the cfgadm_sbd(1M) man page for a list of supported commands.

Library Error: command aborted: command

You aborted the command.  

N/A 

Library Error: option invalid: option

The specified option, option, is invalid.

Refer to the cfgadm_sbd(1M) man page for a list of the valid options.

Library Error: option requires value: option

The specified option, option, requires a value.

Refer to the cfgadm_sbd(1M) man page for a list of the option values.

Library Error: option requires no value: option

The specified option, option, does not require a value.

Refer to the cfgadm_sbd(1M) man page for a list of options that do not require values.

Library Error: option value invalid: option value

The specified value, value, for the option, option, is invalid.

Refer to the cfgadm_sbd(1M) man page for a list of valid option values.

Library Error: attachment point invalid: ap_id

The specified attachment point, ap_id, could not be parsed correctly. This error is rare and could indicate an internal error.

Refer to the cfgadm_sbd(1M) man page for a list of valid attachment points. If this error persists, contact your service representative.

Library Error: component invalid: ap_id

The specified component, ap_id, is invalid.

Refer to the cfgadm_sbd(1M) man page for a list of valid dynamic attachment points.

Library Error: sequence invalid: command (rstate ostate) ap_id

The specified command, command, is invalid for the receptacle and/or occupant state of the specified attachment point. For example, trying to connect an empty slot results in an invalid sequence error.

Refer to the cfgadm_sbd(1M) man page for a list of valid operations.

Library Error: offline ap_id (path): error

The Reconfiguration Coordination Manager (RCM) failed to take the resource, ap_id, offline. The error message, error, returned by the RCM will indicate the reason for the failure. Usually, the reason is a busy device.

For busy devices, identify and remove the usage of the device.  

Library Error: suspend ap_id (path): error

The Reconfiguration Coordination Manager (RCM) failed to suspend the resource, ap_id. The error message, error, returned by the RCM will indicate the reason for the failure. Usually, the reason is a busy device.

For busy devices, identify and remove the usage of the device.  

Library Error: not enough memory

The plugin operation failed due to a lack of memory.  

Check the memory usage.  

Library Error: change signal disposition failed

The plugin failed to set up the signals before it started the DR operation.  

None 

Library Error: cannot get RCM handle

The Reconfiguration Coordination Manager (RCM) failed to initialize.  

None 

Library Error: cannot open library: error

The Reconfiguration Coordination Manager (RCM) library, library, was found, but an error occurred when it was opened. The error message, error, is returned by dlopen(3DL).

Check for proper installation of the RCM.  

Library Error: cannot find symbol symbol in library

A required symbol, symbol, was not found in the Reconfiguration Coordination Manager (RCM) library, library.

Check for proper installation of the RCM.  

Library Error: cannot stat library: error

The Reconfiguration Coordination Manager (RCM) library, library, exists, but the stat(2) function failed to get the file status. The error message, error, will be returned by the Solaris operating environment.

None
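
The three library-related messages at the end of Table A-16 correspond to three distinct failure stages: the file status cannot be read, the library cannot be opened, or a required symbol is missing. The following Python sketch reproduces those stages for any shared library so that you can see which stage fails. It is not the plugin's own logic. The function name is hypothetical, the order of the checks is an assumption, and the path and symbol are supplied by the caller.

```python
import ctypes
import os


def check_library(path, symbol):
    """Report the first stage at which loading a symbol from a library fails.

    Returns a (stage, detail) pair. The stage strings mirror the message
    text in Table A-16; "ok" means that every stage succeeded.
    """
    try:
        os.stat(path)                 # stage 1: file status, as with stat(2)
    except OSError as exc:
        return ("cannot stat library", str(exc))
    try:
        handle = ctypes.CDLL(path)    # stage 2: open the library (dlopen)
    except OSError as exc:
        return ("cannot open library", str(exc))
    if not hasattr(handle, symbol):   # stage 3: symbol lookup (dlsym)
        return ("cannot find symbol", symbol)
    return ("ok", "")
```

If stage 1 or stage 2 fails for the RCM library on the domain, the table's advice applies: check for proper installation of the RCM.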