Sun Enterprise 10000 Dynamic Reconfiguration User Guide

DR Domain Exploration Error Messages

The following table contains the system exploration error messages that are sent to the system logs and/or to the SSP applications.

Table A-11 DR Domain Exploration Error Messages

Error Message 

Probable Cause 

Suggested Action 

Cannot open /etc/driver_aliases; dr_daemon may not operate correctly without driver alias mappings . . . errno_description

The DR daemon could not open the /etc/driver_aliases file. Without the driver alias mappings, the daemon may make incorrect decisions about the detachability and usage of devices in the domain. This is a non-fatal error.

Analyze what caused this error by using the errno_description, and try to correct the error. Look for incorrect file permissions or some kind of resource limit that has been encountered. After you correct the error, you must stop the DR daemon, then restart it so that it attempts to read the driver alias mappings again.

Cannot open mnttab (errno=errno_value)

The DR daemon does not allow a detachability test to pass if the mnttab file cannot be opened and examined to determine which file systems are mounted. If the test is not stopped, a mounted file system could be detached from the domain.

Analyze the cause of this error by using the errno_value, and try to correct the error. The DR daemon may have encountered a resource limit. If so, stop the daemon then restart it. Also, check the size of the DR daemon. It should be between 300- and 400-Kbytes. If it is not within this range, stop the daemon then restart it.
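Many of the suggested actions in this table begin with interpreting an errno_value. As a minimal sketch of that step, the following Python helper translates a numeric errno into its symbolic name and description. The function name describe_errno is hypothetical and is not part of the DR software. Errno numbering is platform specific, so run the helper on the domain that logged the message.

```python
import errno
import os


def describe_errno(value):
    """Return (symbolic_name, description) for a numeric errno value."""
    # errno.errorcode maps numbers to names such as "ENOENT" on this platform.
    name = errno.errorcode.get(value, "UNKNOWN")
    # os.strerror returns the platform's message text for the value.
    return name, os.strerror(value)


if __name__ == "__main__":
    # Hypothetical example: a log line that ended with (errno=13).
    name, text = describe_errno(13)
    print(f"errno=13 -> {name}: {text}")
```

A name such as EACCES points toward file permissions, while EMFILE or ENOMEM points toward a resource limit, which is the case where restarting the daemon is suggested.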

Cannot open socket (errno=errno_value)
This error message is sent only to the system logs.

The DR daemon could not open a network device. All network devices are opened to test their usage. 

Determine what caused this error by using the errno_value. The DR daemon may have encountered a resource limit. If so, stop the daemon then restart it. Also, check the size of the DR daemon. It should be between 300- and 400-Kbytes. If it is not within this range, stop the daemon then restart it. If you cannot recover the domain from this error or if symptoms of a memory leak exist, report this error to your Sun service representative, providing as much information from the system logs as possible.

get_cpu_bindings: can't access /proc filesystem [non-fatal].

The /proc filesystem cannot be opened. When the DR daemon explores the domain to determine the CPU information for a board, the /proc filesystem is examined to determine which PIDs, if any, are bound to the CPUs on the board. Bound processes negatively affect the detachability of a board. A complete detach operation will fail if processes are bound to a CPU.

Check to see why the /proc filesystem cannot be accessed. In the meantime, you can use the process binding, processor set management, or processor management programs in the domain to determine the CPU information for a board manually.

get_mem_config: couldn't determine total system memory size; only 1 board counted [non-fatal].

When the DR daemon tried to count the total amount of memory, it could report only the amount of memory on the selected board, meaning that the system memory field reported by the drshow board_number mem command is inaccurate. The inaccuracy also negatively affects the eligibility of a board for a Detach operation: if the total memory cannot be calculated, the effects of removing a board from the domain cannot be calculated either.

Stop and restart the DR daemon and driver. Report this error, providing as much information from the system logs as possible. A memory leak could also have occurred over time. Check the size of the DR daemon by using the ps(1) command. The size should be between 300- and 400-Kbytes. If the size is not within this range, stop and restart the DR daemon and driver.
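The size check that recurs throughout this table can be sketched as follows. This is an illustration under stated assumptions, not a documented procedure: it assumes that the virtual size reported by ps -o vsz= (in Kbytes) is the figure the guide refers to, and all function names are hypothetical. The process ID of the DR daemon must be supplied by the caller.

```python
import subprocess


def parse_size_kbytes(ps_output):
    """Parse the single number printed by `ps -o vsz= -p PID`."""
    return int(ps_output.strip())


def size_in_expected_range(size_kbytes, low=300, high=400):
    """Return True if the size falls in the 300- to 400-Kbyte range."""
    return low <= size_kbytes <= high


def daemon_size_kbytes(pid):
    """Ask ps(1) for the size of one process (assumption: vsz, in Kbytes)."""
    out = subprocess.run(
        ["ps", "-o", "vsz=", "-p", str(pid)],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_size_kbytes(out)
```

If size_in_expected_range(daemon_size_kbytes(pid)) returns False, the suggested action is to stop the daemon, then restart it.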

get_net_config_info: interface_name no address (errno=errno_value)

The DR daemon encountered a failure while it tried to obtain information about a network interface that was configured by using the ifconfig(1M) command.

Determine what caused this error by using the errno_value, then correct the error.

getmntent returned error

The getmntent(3C) library function failed because the mount-point entries could not be properly examined. If the mount-point entries cannot be properly examined, a mounted file system could be detached from the domain.

Analyze the mnttab file for possible corruption. If any exists, correct it. Also, the DR daemon may have encountered a resource limit. If so, stop the daemon then restart it. Finally, check the size of the DR daemon. It should be between 300- and 400-Kbytes. If it is not within this range, stop the daemon then restart it. If you cannot recover the domain from this error or symptoms of a memory leak exist, report this error to your Sun service representative, providing as much information from the system logs as possible.
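One rough way to look for corruption in the mnttab file is to check that every entry has the expected number of fields. The sketch below assumes the conventional layout of five tab-separated fields per line (special, mount point, file system type, options, time); confirm the layout against the mnttab(4) man page on the domain before relying on it. The function name is hypothetical.

```python
def find_malformed_mnttab_lines(lines, expected_fields=5):
    """Return the 1-based numbers of lines without the expected field count."""
    bad = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if not text:
            continue  # ignore blank lines
        # Assumption: fields are separated by single tab characters.
        if len(text.split("\t")) != expected_fields:
            bad.append(number)
    return bad
```

To examine the live file, pass an open file object, for example find_malformed_mnttab_lines(open("/etc/mnttab")). An empty result means only that the field counts look plausible, not that the contents are correct.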

Host addr for interface_name not found (h_errno=errno_value)

The file that is needed to test each active network device may not exist, or it may be corrupted. While the network devices are examined, each active network device is tested to determine if it is the primary network interface for the domain. The DR daemon will not allow the detachability test to pass if it cannot determine which active network device is the primary network interface for the domain. 

Use the errno_value to determine if the file exists or if it is corrupted, and correct the error as necessary. The file is named /etc/hostname.interface_name, where interface_name is the interface named in the error message.

Host address field for interface_name is null!!

The IP address for the primary interface (interface_name) is not set properly. While the network devices are examined, each active network device is tested to determine if it is the primary network interface for the domain. The DR daemon will not allow the detachability test to pass if it cannot determine which active network device is the primary network interface for the domain.

Reconfigure the network setup for the domain. You may need to reboot the domain to configure network devices. 

Host address for interface_name must be internet address.

The file that is needed to test each active network device may have a corrupted value or an incorrect network address. While the network devices are examined, each active network device is tested to determine if it is the primary network interface for the domain. The DR daemon will not allow the detachability test to pass if it cannot determine which active network device is the primary network interface for the domain. 

Make sure that the hostname file for the primary network interface contains an IP address in the proper form (that is, xxx.xxx.xxx.xxx). The file is named /etc/hostname.interface_name, where interface_name is the interface named in the error message.
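The form check described above can be sketched with the Python standard library. The helper name is_dotted_quad is hypothetical; it tests only whether a string is a well-formed dotted-decimal IPv4 address, not whether the address is correct for the domain.

```python
import ipaddress


def is_dotted_quad(text):
    """Return True if text is a well-formed xxx.xxx.xxx.xxx IPv4 address."""
    try:
        ipaddress.IPv4Address(text.strip())
    except ValueError:
        # Raised for empty strings, host names, and out-of-range octets.
        return False
    return True
```

For example, read the first line of the hostname file for the interface named in the error message and pass it to is_dotted_quad().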

I/O bus device tree not built.

This error message contains additional information about the NGDR Error: device tree not built error message, in which the libdevinfo API failed to build the device tree for the system board.

See the NGDR Error: device tree not built error message.

minor_walk: failed to build net leaf.

This error message contains additional information about the NGDR Error: device tree not built error message, in which the libdevinfo API failed to build the device tree for the system board. This message indicates that the libdevinfo API at least started to look at the minor devices for a network leaf node.

See the NGDR Error: device tree not built error message.

minor_walk: failed to build non-net leaf.

This error message contains additional information about the NGDR Error: device tree not built error message, indicating that the libdevinfo API at least started to look at the minor devices for a non-network leaf node.

See the NGDR Error: device tree not built error message.

Partition partition_name does not have parent.

The device tree is in error because it includes a disk partition that does not have a parent device, such as the disk to which the partition belongs. 

A device could be bad, or a reboot may be necessary. If this error continues to appear, report the error to your Sun service representative, providing as much information from the system logs as possible. 

Recursive symlink found `symbolic_link_name'. Please remove it.

The DR daemon found a recursive symbolic link as it walked the /dev and /devices directories. A recursive symbolic link points back into its own directory hierarchy and therefore creates a loop. The DR daemon will not allow the detachability test to pass if it finds a recursive symbolic link in one of these directories.

Remove the symbolic link so that the test can be retried. 

swapctl SC_GETNSWP failed (errno=errno_value)

The swapctl(2) system call failed. This system call is used to determine which disk partitions are in use as swap space. The DR daemon will not allow the detachability test to pass if the use of swap partitions cannot be determined.

Analyze what caused this error by using the errno_value, and try to correct it. Use the swapctl(2) man page and the errno_value to determine why the system call failed. The DR daemon may have encountered a resource limit. If so, stop the daemon then restart it. Also, check the size of the DR daemon. It should be between 300- and 400-Kbytes. If it is not within this range, stop the daemon then restart it. If you cannot recover the domain from this error or if symptoms of a memory leak exist, report this error to your Sun service representative, providing as much information from the system logs as possible.

Unable to find cwd errno_value

The DR daemon could not save the current working directory. The daemon switches into the /dev and /devices directories to produce the real pathnames that correspond to device drivers.

Determine what caused this error by using the getcwd(3C) man page and the errno_value, then correct the error.

Unable to find the cwd errno_value

The DR daemon could not determine the name of the driver directory. The daemon switches into the /dev and /devices directories to produce the real pathnames that correspond to device drivers.

Determine what caused this error by using the getcwd(3C) man page and the errno_value, then correct the error.

Unable to get swap entries (errno=errno_value)

The swapctl(2) system call failed. This system call is used to determine which disk partitions are in use as swap space. The DR daemon will not allow the detachability test to pass if the use of swap partitions cannot be determined.

Analyze what caused this error by using the swapctl(2) man page and the errno_value, and try to correct it. The DR daemon may have encountered a resource limit. If so, stop the daemon then restart it. Also, check the size of the DR daemon. It should be between 300- and 400-Kbytes. If it is not within this range, stop the daemon then restart it. If you cannot recover from this error or if symptoms of a memory leak exist, report this error to your Sun service representative, providing as much information from the system logs as possible.

Unable to lstat devlink_file errno_value

The lstat(2) system call failed when it encountered the devlink_file, where devlink_file is the name of the symbolic link in the /dev directory.

Determine what caused this error by using the lstat(2) man page and the errno_value. The DR daemon may have encountered a resource limit. If so, stop the daemon then restart it. Also, check the size of the DR daemon. It should be between 300- and 400-Kbytes. If it is not within this range, stop the daemon then restart it. If you cannot recover the domain from this error or if symptoms of a memory leak exist, report this error to your Sun service representative, providing as much information from the system logs as possible.

Unable to open hostname_file (errno=errno_value)

The information that is needed to test each active network device could not be acquired. While the network devices are examined, each active network device is tested to determine if it is the primary network interface for the domain. The DR daemon will not allow the detachability test to pass if it cannot determine which active network device is the primary network interface for the domain. 

Analyze what caused this error by using the open(2) man page and the errno_value, and try to correct it. Look for incorrect file permissions or non-existent files. The hostname_file value consists of a file named /etc/hostname.ifname, where ifname is a device name, such as hme0 or le0.

Unable to read host name from hostname_file

The file that is needed to test each active network device could not be read. While the network devices are examined, each active network device is tested to determine if it is the primary network interface for the domain. The DR daemon will not allow the detachability test to pass if it cannot determine which active network device is the primary network interface for the domain. 

Ensure that the file has the correct permissions and that it has not been corrupted. 

Unable to readlink devlink_file errno_value

The readlink(2) system call failed when it encountered the devlink_file, where devlink_file is the name of the symbolic link in the /dev directory.

Determine what caused this error by using the readlink(2) man page and the errno_value. The DR daemon may have encountered a resource limit. If so, stop the daemon, then restart it. Also, check the size of the DR daemon. It should be between 300- and 400-Kbytes. If it is not within this range, stop the daemon, then restart it. If you cannot recover the domain from this error or if symptoms of a memory leak exist, report this error to your Sun service representative, providing as much information from the system logs as possible.
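To reproduce an lstat(2) or readlink(2) failure outside the daemon, you can apply the same two calls to the devlink_file named in the message. The following is a minimal sketch; the function name inspect_devlink is hypothetical, and the failure it reports is the one seen by the user who runs it, which may differ from what the DR daemon saw.

```python
import errno
import os
import stat


def inspect_devlink(path):
    """Return (target, None) for a symbolic link, or (None, reason) on failure."""
    try:
        info = os.lstat(path)           # same call as lstat(2)
        if not stat.S_ISLNK(info.st_mode):
            return None, "not a symbolic link"
        return os.readlink(path), None  # same call as readlink(2)
    except OSError as exc:
        name = errno.errorcode.get(exc.errno, "UNKNOWN")
        return None, f"{name}: {exc.strerror}"
```

A reason that starts with ENOENT or EACCES suggests a missing or protected link, which can be corrected directly. If both calls succeed here, a resource limit inside the daemon is a more likely cause.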

Unable to restore cwd errno_value

The DR daemon was unable to change back to the original directory after it changed into the /dev or /devices directory. The DR daemon changes into the /dev and /devices directories to explore the relationships of the device driver with other drivers.

This error should not pose a problem for the domain, but you should determine what caused the error by using the errno_value.

Unable to set cwd errno_value

The DR daemon could not change into the /dev and /devices directories. The daemon switches into these directories to produce the real pathnames that correspond to device drivers.

Determine what caused this error by using the chdir(2) man page and the errno_value, then correct the error.
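The four cwd messages in this table correspond to a save, change, and restore pattern around the /dev and /devices walks. The following sketch illustrates that pattern in Python; it is not the daemon's implementation, and the name working_directory is hypothetical.

```python
import contextlib
import os


@contextlib.contextmanager
def working_directory(path):
    """Temporarily change the current working directory to path."""
    saved = os.getcwd()   # a failure here corresponds to "Unable to find cwd"
    os.chdir(path)        # a failure here corresponds to "Unable to set cwd"
    try:
        yield
    finally:
        os.chdir(saved)   # a failure here corresponds to "Unable to restore cwd"
```

For example, inside a block that begins with "with working_directory(some_directory):", a relative link name can be passed to os.path.realpath() to produce the real pathname. Each step raises OSError with an errno value that can be looked up in the getcwd(3C) or chdir(2) man page.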

unknown node type

The device tree was built incorrectly. Several functions create the device tree for a system board by using the libdevinfo API and by searching the /dev and /devices directories. After the tree is constructed, it is passed on to the rpc_info() function, which builds the tree, performs some verifications, then translates the tree into a structure that can be returned from an RPC.

Check the size of the DR daemon. It should be between 300- and 400-Kbytes. If it is not within this range, stop the daemon, then restart it. If you cannot recover the domain from this error, report this error to your Sun service representative, providing as much information from the system logs as possible.

utssys failed (errno_value) for mount_point

The utssys() system call failed. This system call is used to determine the usage count for a mounted partition. The DR daemon will not allow the detachability test to pass if the usage count cannot be determined.

Analyze what caused this error by using the errno_value, and try to correct it. The DR daemon may have encountered a resource limit. If so, stop the daemon then restart it. Also, check the size of the DR daemon. It should be between 300- and 400-Kbytes. If it is not within this range, stop the daemon then restart it. If you cannot recover the domain from this error or if symptoms of a memory leak exist, report this error to your Sun service representative, providing as much information from the system logs as possible.

walk_dir: dirlist buffer overflow.

As it walked the /dev and /devices directories, the DR daemon encountered too many directories, causing a buffer overflow. If this message occurs, detection of or protection against recursive symbolic links is disabled.

Check the /dev and /devices directories for recursive symbolic links. Remove any recursive symbolic links that you find.
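The check for recursive symbolic links can be sketched as follows. The sketch treats a link as recursive when it resolves to the directory that contains it or to one of that directory's ancestors; this definition and the function name find_recursive_symlinks are assumptions made for illustration and may not match the daemon's own test exactly.

```python
import os


def find_recursive_symlinks(root):
    """Return symbolic links under root that point at their own ancestry."""
    hits = []
    # followlinks=False keeps the walk itself from entering a loop.
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        parent_real = os.path.realpath(dirpath)
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                continue
            target = os.path.realpath(path)
            if not os.path.isdir(target):
                continue  # only links to directories can create a loop
            # The link is recursive if target is parent_real or an ancestor.
            if os.path.commonpath([parent_real, target]) == target:
                hits.append(path)
    return hits
```

Run the function once for /dev and once for /devices, review each reported link, and remove only the links that are confirmed to be recursive.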

walk_dir: tpath buffer overflow. target_path, device_name

The DR daemon cannot add another directory to the target_path. The daemon walks the /dev and /devices directories to discover device name links so that it can add them to the target path. If the daemon encounters this limit, it cannot explore any more directories because the buffer is full. If the daemon stops its search, some of the devices will not appear in the views (DR daemon and SSP) of the domain device tree. You may also see improper autoswitching of AP devices if this error occurs.

Devices that are not added to the target path must be manually unconfigured and switched to other boards in the domain. You may also need to stop any daemon that is keeping a device open. 

WARNING: cannot check for cvc/ssp interface.

The information that is needed to test each active network device could not be acquired. While the network devices are examined, each active network device is tested to determine if it corresponds to the SSP network interface for the domain. The DR daemon will not allow the detachability test to pass if it cannot determine the SSP network interface. If the network loses the SSP network interface during a detach operation, DR operations are disabled in the domain, and netcon(1M) sessions are disabled.

Switch the suspected interface to a redundant network connection on another board. You may have to reboot the domain to recover from this error. 

WARNING: Cannot check for primary interface

The information that is needed to test each active network device could not be acquired. While the network devices are examined, each active network device is tested to determine if it is the primary network interface for the domain. The DR daemon will not allow the detachability test to pass if it cannot determine which active network device is the primary network interface for the domain. 

Determine which board hosts the primary network interface and re-attach the board to the domain. Or, switch the interface to a redundant network connection on another board in the domain. You may have to reboot the domain to recover from this error. 

WARNING: Cannot determine if interface_name_instance is cvc/ssp interface. SIOCGIFNETMASK errno=errno_value

The DR daemon failed to obtain the necessary information to test an active network interface to determine if it is the SSP connection. While the network devices are examined, each active network device is tested to determine if it is the SSP connection for the domain. The DR daemon will not allow the detachability test to pass if it cannot determine which active network device is the SSP connection for the domain. If the network loses the SSP connection during a DR Detach operation, DR operations and netcon(1M) sessions are disabled.

Switch the network interface (interface_name) to another board. If you cannot correct this error, you may have to reboot the domain.

WARNING: cannot stat device_name errno=errno_value

The stat(2) system call cannot access the /dev entry point for a device in the system device tree.

Use the stat(2) man page and the errno_value to determine why the file device_name could not be accessed.

NGDR Error: Bad page size from sysconf . . . errno_description

The sysconf(3C) library function returned an incorrect value for the system page size, meaning that the function is broken or that it is not providing a required feature. This error may also explain why queries for memory information or detachability tests are failing due to incorrect reporting of memory sizes.

Use the sysconf(3C) man page and the errno_description to determine the cause of the error.

NGDR Error: device tree not built.

The libdevinfo API failed to build the device tree for the system board. More detailed information about this error accompanies the error message.

Make sure that the correct version of the libdevinfo API is included on the domain and that a version mismatch does not exist among the DR daemon's libraries, the operating environment on the domain, and the DR daemon itself. If no cause can be found, report this error to your Sun service representative.

NGDR Error: dr_get_partn_cpus: cannot get cpu's partition . . . errno_description

The DR daemon tried to use the pset_assign(2) function, but the function failed. The DR daemon uses this function to obtain the processor set and partitioning information, which it sends to the CPU Configuration window.

Use the pset_assign(2) man page and the errno_description to determine and correct the cause of this error.

NGDR Error: dr_get_partn_cpus: failed to get cpu partition info . . . errno_description

The DR daemon tried to use the pset_info(2) function, but the function failed. The DR daemon uses this function to obtain the processor set and partitioning information, which it sends to the CPU Configuration window.

Use the pset_info(2) man page and the errno_description to determine and correct the causes of this error.

NGDR Error: dr_page_to_kb: page size smaller than a KB

A math error occurred, or an incorrect memory value was used in a memory calculation. 

Report this error to your Sun service representative. 
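The page-size messages in this table concern the conversion of page counts into Kbytes. The sketch below shows the kind of sanity check involved; it is an illustration only, the function names are hypothetical, and the requirement that the page size be a whole multiple of 1024 bytes is an added assumption.

```python
import os


def system_page_size():
    """Return the page size in bytes as reported by sysconf."""
    return os.sysconf("SC_PAGE_SIZE")


def pages_to_kbytes(pages, page_size):
    """Convert a page count to Kbytes, rejecting implausible page sizes."""
    # Assumption: a valid page size is at least 1 Kbyte and a multiple of it.
    if page_size < 1024 or page_size % 1024 != 0:
        raise ValueError(f"bad page size: {page_size}")
    return pages * (page_size // 1024)
```

For example, three pages of 8192 bytes each convert to 24 Kbytes. A page size of 512 bytes raises ValueError, which is the condition that the dr_page_to_kb message describes.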

NGDR Error: get_board_config: invalid board state

A communication protocol has been breached over the eligibility of a board. To the SSP, the board is part of the domain. However, to the DR daemon and driver, the board is not part of the domain. 

Stop and restart the DR application, then retry the operation. If the error persists, use the kill(1) command to stop the DR daemon, then start the DR daemon and retry the DR operation.

NGDR Error: get_board_config: invalid flag

The SSP passed an invalid or unsupported flag to the DR daemon when the daemon tried to ascertain the configuration of a board. 

Make sure that the version numbers match for the SSP and the DR daemon. Also, check the size of the daemon by using the ps(1) command. Normally, the daemon uses about 300- to 400-Kbytes of memory. If the daemon has grown far beyond this range, an internal error may have occurred within it. You may have to stop and restart the DR daemon to recover from this error.

NGDR Error: libdevinfo failed.

The initial routine used to open the libdevinfo API failed, so the DR daemon could not explore the device tree for that board. The libdevinfo API builds a tree of dev-info nodes for a board as part of the DR daemon's exploration of the domain devices and their usage. The tree is required by AP and DR operations to test the detachability of a board's I/O devices. It is also used to inform the user of which devices are on which system boards.

Make sure that the correct version of the libdevinfo API is included on the domain and that a version mismatch does not exist among the DR daemon's libraries, the operating environment on the domain, and the DR daemon itself. If no cause can be found, report this error to your Sun service representative.

get_cpu_info: cpu state info is incomplete [non-fatal].

The DR daemon could not gather the states of the CPUs (either online or offline). Therefore, the information about each CPU in the CPU Configuration window will not be accurate. 

None 

NGDR Error: build_rpc_info: bad slot number

The device tree was built incorrectly. Several functions create the device tree for a system board by searching through the /dev and /devices directories and by using the libdevinfo API. After the tree is built, it is passed to the build_rpc_info() function that performs some verification of the tree as it translates the DR daemon device tree into a structure that can be returned from an RPC.

Check the size of the DR daemon by using the ps(1) command. Normally, the daemon uses about 300- to 400-Kbytes of memory. If the daemon has grown far beyond this range, an internal error may have occurred within it. You may have to stop and restart the DR daemon to resolve this error. Report this error to your Sun service representative, providing as much information from the system logs as possible.

NGDR Error: build_rpc_info: device address format error

The device tree was built incorrectly. Several functions create the device tree for a system board by searching through the /dev and /devices directories and by using the libdevinfo API. After the tree is built, it is passed to the build_rpc_info() function that performs some verification of the tree as it translates the DR daemon device tree into a structure that can be returned from an RPC.

Check the size of the DR daemon by using the ps(1) command. Normally, the daemon uses about 300- to 400-Kbytes of memory. If the daemon has grown far beyond this range, an internal error may have occurred within it. You may have to stop and restart the DR daemon to resolve this error. Report this error to your Sun service representative, providing as much information from the system logs as possible.

NGDR Error: build_rpc_info: I/O bus node address format error

The device tree was built incorrectly. Several functions create the device tree for a system board by searching through the /dev and /devices directories and by using the libdevinfo API. After the tree is built, it is passed to the build_rpc_info() function that performs some verification of the tree as it translates the DR daemon device tree into a structure that can be returned from an RPC.

Check the size of the DR daemon by using the ps(1) command. Normally, the daemon uses about 300- to 400-Kbytes of memory. If the daemon has grown far beyond this range, an internal error may have occurred within it. You may have to stop and restart the DR daemon to resolve this error. Report this error to your Sun service representative, providing as much information from the system logs as possible.

NGDR Error: build_rpc_info: psycho number out of range

The device tree was built incorrectly. Several functions create the device tree for a system board by searching through the /dev and /devices directories and by using the libdevinfo API. After the tree is built, it is passed to the build_rpc_info() function that performs some verification of the tree as it translates the DR daemon device tree into a structure that can be returned from an RPC.

Check the size of the DR daemon by using the ps(1) command. Normally, the daemon uses about 300- to 400-Kbytes of memory. If the daemon has grown far beyond this range, an internal error may have occurred within it. You may have to stop and restart the DR daemon to resolve this error. Report this error to your Sun service representative, providing as much information from the system logs as possible.

NGDR Error: build_rpc_info: sysio number out of range

The device tree was built incorrectly. Several functions create the device tree for a system board by searching through the /dev and /devices directories and by using the libdevinfo API. After the tree is built, it is passed to the build_rpc_info() function that performs some verification of the tree as it translates the DR daemon device tree into a structure that can be returned from an RPC.

Check the size of the DR daemon by using the ps(1) command. Normally, the daemon uses about 300- to 400-Kbytes of memory. If the daemon has grown far beyond this range, an internal error may have occurred within it. You may have to stop and restart the DR daemon to resolve this error. Report this error to your Sun service representative, providing as much information from the system logs as possible.