Sun Cluster 3.0 Error Messages Guide

Error Message List

The following list is ordered by the message ID.


700161 :Fault monitor is already running.

Description:

The resource's fault monitor is already running.

Solution:

This is an internal error. Save the /var/adm/messages file from all the nodes. Contact your authorized Sun service provider.


700321 :exec() of %s failed: %m.

Description:

The exec() system call failed for the given reason.

Solution:

Verify that the pathname given is valid.


702368 :Failed to register callback for NAFO group %s with tag %s and callback command %s (request failed with %d).

Description:

An unexpected error occurred while trying to communicate with the network monitoring daemon (pnmd).

Solution:

Make sure the network monitoring daemon (pnmd) is running.


703476 :clcomm: unable to create desired unref threads

Description:

The system was unable to create thread that deal with no longer needed objects. The system fails to create threads when memory is not available. This message can be generated by the inability of either the kernel or a user level process. The kernel creates unref threads when the cluster starts. A user level process creates threads when it initializes.

Solution:

Take steps to increase memory availability. The installation of more memory will avoid the problem with a kernel inability to create threads. For a user level process problem: install more memory, increase swap space, or reduce the peak work load.


703553 :Resource group name or resource name is too long.

Description:

Process monitor facility is failed to execute the command. Resource group name or resource name is too long for the process monitor facility command.

Solution:

Check the resource group name and resource name. Give short name for resource group or resource .


704795 :in libsecurity could not negotiate uid on any transport in NETPATH

Description:

A server (rpc.pmfd, rpc.fed or rgmd) was not able to start because it could not establish a rpc connection for the network specified, because it couldn't find any transport. This happened because either there are no available transports at all, or there are but none is a loopback. An error message is output to syslog.

Solution:

Save the /var/adm/messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.


705163 :load balancer thread failed to start for %s

Description:

The system has run out of resources that is required to create a thread. The system could not create the load balancer thread.

Solution:

The service group is created with the default load balancing policy. If rebalancing is required, free up resources by shutting down some processes. Then delete the service group and re-create it.


705379 :resource %s state on node %s change to R_ONLINE_UNMON

Description:

This is a notification from the rgmd that a resource has changed state to online-not-monitored on the given node. This may be used by system monitoring tools.

Solution:

This is an informational message, no user action is needed.


705629 :clutil: Can't allocate hash table

Description:

The system attempted unsuccessfully to allocate a hash table. There was insufficient memory.

Solution:

Install more memory, increase swap space, or reduce peak memory consumption.


706314 :clexecd: Error %d from open(/dev/zero). Exiting.

Description:

clexecd program has encountered a failed open(2) system call. The error message indicates the error number for the failure.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available.


707301 :lkcm_reg: Unix DLM version (???) and the OSD library version (%d) are not compatible. Unix DLM versions accepatble to this library are: %d

Description:

UNIX DLM and Oracle DLM are not compatibale. Compatible versions will be printed as part of this message.

Solution:

Check installation procedure to make sure you have the correct versions of Oracle DLM and Unix DLM. Contact Sun service representative if versions cannot be resolved.


707421 :%s: Cannot create a thread.

Description:

Solaris has run out of its limit on threads. Either too many clients are requesting a service, causing many threads to be created at once or system is overloaded with processes.

Solution:

Reduce system load by reducing number of requestors of this service or halting other processes on the system.


707881 :clcomm: thread_create failed for autom_thread

Description:

The system could not create the needed thread, because there is inadequate memory.

Solution:

There are two possible solutions. Install more memory. Alternatively, reduce memory usage. Since this happens during system startup, application memory usage is normally not a factor.


707948 :launching method <%s> for resource <%s>, resource group <%s>, timeout <%d> seconds

Description:

RGM has invoked a callback method for the named resource, as a result of a cluster reconfiguration, scha_control GIVEOVER, or scswitch.

Solution:

This is an informational message, no user action is needed.


710143 :Failed to add node %d to scalable service group %s: %s.

Description:

A call to the underlying scalable networking code failed.

Solution:

Save a copy of the /var/adm/messages files on all nodes. Contact your authorized Sun service provider for assistance in diagnosing the problem.


711956 :open /dev/ip failed: %s.

Description:

System was attempting to open the specified device, but was unable to do so.

Solution:

This might be the result of lack of the system resources. Check whether the system is low in memory and take appropriate action. For specific error information check the syslog message.


712367 :clcomm: Endpoint %p: deferred task not allowed in state %d

Description:

The system maintains information about the state of an Endpoint. A deferred task is not allowed in this state.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available.


714838 :reservation fatal error(%s) - Unable to open name file '%s', errno %d

Description:

The device fencing program has suffered an internal error.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available. Copies of /var/adm/messages from all nodes should be provided for diagnosis. It may be possible to retry the failed operation, depending on the nature of the error. If the message specifies the 'node_join' transition, then this node may be unable to access shared devices. If the failure occurred during the 'release_shared_scsi2' transition, then a node which was joining the cluster may be unable to access shared devices. In either case, it may be possible to reacquire access to shared devices by executing '/usr/cluster/lib/sc/run_reserve -c node_join' on all cluster nodes. If the failure occurred during the 'make_primary' transition, then a device group has failed to start on this node. If another node was available to host the device group, then it should have been started on that node. If desired, it may be possible to switch the device group to this node with the scswitch command. If no other node was available, then the device group will not have been started. The scswitch command may be used to retry the attempt to start the device group. If the failure occurred during the 'primary_to_secondary' transition, then the shutdown or switchover of a device group has failed. The desired action may be retried.


715958 :Method <%s> on resource <%s> stopped due to receipt of signal <%d>

Description:

A resource method was stopped by a signal, most likely resulting from an operator-issued kill(1). The method is considered to have failed.

Solution:

The operator must kill the stopped method. The operator may then choose to issue an scswitch(1M) command to bring resource groups onto desired primaries, or re-try the administrative action that was interrupted by the method failure.


716253 :launch_fed_prog: fe_set_env_vars() failed for program <%s>, step <%s>

Description:

The ucmmd server was not able to get the locale environment. An error message is output to syslog.

Solution:

Investigate if the host is running out of memory. If not save the /var/adm/messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.


717089 :NAFO group %s has unknown status %d. Skipping this NAFO group.

Description:

The status of the NAFO group is not among the set of statuses that is known.

Solution:

Save a copy of the /var/adm/messages files on all nodes. Contact your authorized Sun service provider for assistance in diagnosing the problem.


719114 :Failed to parse key/value pair from command line for %s.

Description:

The validate method for the scalable resource network configuration code was unable to convert the property information given to a usable format.

Solution:

Verify the property information was properly set when configuring the resource.


719497 :clcomm: path_manager using RT lwp rather than clock interrupt

Description:

The system has been built to use a real time thread to support path_manager heart beats instead of the clock interrupt.

Solution:

No user action is required.


719735 :resource %s state on node %s change to R_PENDING_BOOT

Description:

This is a notification from the rgmd that the BOOT method is running on the given resource, on a node that has recently joined the cluster. This may be used by system monitoring tools.

Solution:

This is an informational message, no user action is needed.


721252 :cm2udlm: cm_getclustmbyname: %s

Description:

Could not create a structure for communication with the cluster monitor process.

Solution:

This is an internal error. Save the contents of /var/adm/messages, /var/cluster/ucmm/ucmm_reconf.log and /var/cluster/ucmm/dlm*/*logs/* from all the nodes and contact your Sun service representative.


721881 :dl_attach: kstr_msg failed %d error

Description:

Could not attach to the private interconnect.

Solution:

Reboot of the node might fix the problem.


722270 :fatal: cannot create state machine thread

Description:

The rgmd was unable to create a thread upon starting up. This is a fatal error. The rgmd will produce a core file and will force the node to halt or reboot to avoid the possibility of data corruption.

Solution:

Make sure that the hardware configuration meets documented minimum requirements. Save a copy of the /var/adm/messages files on all nodes, and of the rgmd core file. Contact your authorized Sun service provider for assistance in diagnosing the problem.


722904 :Failed to open the resource group handle: %s.

Description:

An API operation has failed while retrieving the resource group property. Low memory or API call failure might be the reasons.

Solution:

In case of low memory, the problem will probably cured by rebooting. If the problem reoccurs, you might need to increase swap space by configuring additional swap devices. Otherwise, if it is API call failure, check the syslog messages from other components. For resource group name and the property name, check the current syslog message.


722984 :call to rpc.fed failed for resource <%s>, resource group <%s>, method <%s>

Description:

The rgmd failed in an attempt to execute a method, due to a failure to communicate with the rpc.fed daemon. Depending on which method was being invoked and the Failover_mode setting on the resource, this might cause the resource group to fail over or move to an error state. If the rpc.fed process died, this might lead to a subsequent reboot of the node.

Solution:

Examine other syslog messages occurring at about the same time to see if the problem can be identified. Save a copy of the /var/adm/messages files on all nodes and contact your authorized Sun service provider for assistance in diagnosing the problem.


724035 :Failed to connect to %s secure port %d.

Description:

An error occured while the fault monitor was trying to connect to a secure port specified in the Port_list property for this resource.

Solution:

Check to make sure that the Port_list property is correctly set to the same port number that the Netscape Directory Server is running on.


726417 :read %d for %sport

Description:

Could not get the port information from config file udlm.conf.

Solution:

Check to make sure udlm.conf file exist and has entry for udlm.port. If everything looks normal and the problem persists, contact your Sun service representative.


727065 :CMM: Enabling failfast on quorum device %s failed with error %d.

Description:

An attempt to enable failfast on the specified quorum device failed with the specified error.

Solution:

Check if the specified quorum disk has failed. This message may also be logged when a node is booting up and has been preempted from the cluster, in which case no user action is necessary.


727160 :msg of wrong version %d, expected %d

Description:

udlmctl received an illegal message.

Solution:

None. udlm will handle this error.


728216 :reservation error(%s) - did_get_path() error

Description:

The device fencing program has suffered an internal error.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available. Copies of /var/adm/messages from all nodes should be provided for diagnosis. It may be possible to retry the failed operation, depending on the nature of the error. If the message specifies the 'node_join' transition, then this node may be unable to access shared devices. If the failure occurred during the 'release_shared_scsi2' transition, then a node which was joining the cluster may be unable to access shared devices. In either case, it may be possible to reacquire access to shared devices by executing '/usr/cluster/lib/sc/run_reserve -c node_join' on all cluster nodes. If the failure occurred during the 'make_primary' transition, then a device group has failed to start on this node. If another node was available to host the device group, then it should have been started on that node. If desired, it may be possible to switch the device group to this node with the scswitch command. If no other node was available, then the device group will not have been started. The scswitch command may be used to retry the attempt to start the device group. If the failure occurred during the 'primary_to_secondary' transition, then the shutdown or switchover of a device group has failed. The desired action may be retried.


728425 :INTERNAL ERROR: bad state <%s> (%d) for resource group <%s> in rebalance()

Description:

An internal error has occurred in the rgmd. This may prevent the rgmd from bringing the affected resource group online.

Solution:

Look for other syslog error messages on the same node. Save a copy of the /var/adm/messages files on all nodes, and report the problem to your authorized Sun service provider.


728928 :CCR: Can't access table %s on node %s errno = %d.

Description:

The indicated error occurred when CCR was tried to access the indicated table on the nodes in the cluster. The errno value indicates the nature of the problem. errno values are defined in the file /usr/include/sys/errno.h. An errno value of 28(ENOSPC) indicates that the root files system on the node is full. Other values of errno can be returned when the root disk has failed(EIO).

Solution:

There may be other related messages on the node where the failure occurred. They may help diagnose the problem. If the root file system is full on the node, then free up some space by removing unnecessary files. If the root disk on the afflicted node has failed, then it needs to be replaced. If the indicated table was accidently removed, boot the indicated node in -x mode to restore the indicated table from backup. The CCR tables are located at /etc/cluster/ccr/.


729152 :clexecd: Error %d from F_SETFD. Exiting.

Description:

clexecd program has encountered a failed fcntl(2) system call. The error message indicates the error number for the failure.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available.


730190 :scvxvmlg error - found non device-node or non link %s, directory not removed

Description:

The program responsible for maintaining the VxVM namespace had detected suspicious entries in the global device namespace.

Solution:

The global device namespace should only contain diskgroup directories and volume device nodes for registered diskgroups. The specified path was not recognized as either of these and should be removed from the global device namespace.


730685 :PCSTATUS: %s

Description:

The rpc.pmfd server was not able to monitor a process, and the system error is shown. An error message is output to syslog.

Solution:

Save the /var/adm/messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.


730782 :Failed to update scalable service: Error %d.

Description:

Update to a property related to scalability was not successfully applied to the system.

Solution:

Use scswitch to try to bring resource offline and online again on this node. If the error persists, reboot the node and contact your Sun service representative.


730956 :%d entries found in property %s. For a nonsecure Netscape Directory Server instance %s should have exactly one entry.

Description:

Since a nonsecure Netscape Directory Server instance only listens on a single port, the list property should only have a single entry. A different number of entries was found.

Solution:

Change the number of entries to be exactly one.


732069 :dl_attach: DL_ERROR_ACK protocol error

Description:

Could not attach to the physical device.

Solution:

Check the documentation for the driver associated with the private interconnect. It might be that the message returned is too small to be valid.


732822 :clconf: Invalid group name

Description:

An invalid group name has been encountered while converting a group name to clconf_obj type. Valid group names are "cluster", "nodes", "adapters", "ports", "blackboxes", "cables", and "quorum_devices".

Solution:

This is an unrecoverable error, and the cluster needs to be rebooted. Also contact your authorized Sun service provider to determine whether a workaround or patch is available.


733367 :lkcm_act: %s: %s cm_reconfigure failed

Description:

ucmm reconfiguration failed.

Solution:

None if the next reconfiguration succeeds. If not, save the contents of /var/adm/messages, /var/cluster/ucmm/ucmm_reconf.log and /var/cluster/ucmm/dlm*/*logs/* from all the nodes and contact your Sun service representative.


734057 :clcomm: Duplicate TypeId's: %s : %s

Description:

The system records type identifiers for multiple kinds of type data. The system checks for type identifiers when loading type information. This message identifies two items having the same type identifiers. This checking only occurs on debug systems.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available.


734832 :clutil: Created insufficient threads in threadpool

Description:

There was insufficient memory to create the desired number of threads.

Solution:

Install more memory, increase swap space, or reduce peak memory consumption.


734890 :pthread_detach: %s

Description:

The rpc.pmfd server was not able to detach a thread, possibly due to low memory. The message contains the system error. The server does not perform the action requested by the client, and an error message is output to syslog.

Solution:

Investigate if the machine is running out of memory. If all looks correct, save the /var/adm/messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.


737104 :Received unexpected result <%d> from rpc.fed, aborting node

Description:

This node encountered an unexpected error while communicating with other cluster nodes during a cluster reconfiguration. The ucmmd will produce a core file and will cause the node to halt or reboot.

Solution:

Save a copy of the /var/adm/messages files on all nodes. Contact your authorized Sun service provider for assistance in diagnosing the problem.


737125 :INTERNAL ERROR: PENDING_OFF_STOP_FAILED or ERROR_STOP_FAILED in rebalance()

Description:

An internal error has occurred in the rgmd. This may prevent the rgmd from bringing affected resource groups online.

Solution:

Look for other syslog error messages on the same node. Save a copy of the /var/adm/messages files on all nodes, and report the problem to your authorized Sun service provider.


737445 :rebalance: not attempting to start resource group <%s> on node <%s> because this resource group has already failed to start on this node %d or more times in the past %d seconds

Description:

The rgmd is preventing "ping-pong" failover of the resource group, i.e., repeated failover of the resource group between two or more nodes.

Solution:

The time interval in seconds that is mentioned in the message can be adjusted by using scrgadm(1M) to set the Pingpong_interval property of the resource group.


738465 :Malformed adapter specification %s.

Description:

Failed to retrieve the nafo information. The given adapter specification is invalid.

Solution:

Check whether the adapter specification is in the form of nafogroup@nodename. If not, recreate the resource with the properly formatted adapter information.


738847 :clexecd: unable to create failfast object.

Description:

clexecd problem could not enable one of the mechanisms which causes the node to be shutdown to prevent data corruption, when clexecd program dies.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available.


739356 :warning: cannot store start_failed timestamp for resource group <%s>: time() failed, errno <%d> (%s)

Description:

The specified resource group failed to come online on some node, but this node is unable to record that fact due to the failure of the time(2) system call. The consequence of this is that the resource group may continue to pingpong between nodes for longer than the Pingpong_interval property setting.

Solution:

Examine other syslog messages occurring around the same time on the same node, to see if the cause of the problem can be identified. If the same error recurs, you might have to reboot the affected node.


739653 :Port number %d is listed twice in property %s, at entries %d and %d.

Description:

The port number in the message was listed twice in the named property, at the list entry locations given in the message. A port number should only appear once in the property.

Solution:

Specify the property with only one occurrence of the port number.


740373 :Failed to get the scalable service related properties for resource %s.

Description:

An unexpected error occurred while trying to collect the properties related to scalable networking for the named resource.

Solution:

Save a copy of the /var/adm/messages files on all nodes. Contact your authorized Sun service provider for assistance in diagnosing the problem.


740972 :in fe_set_env_vars setlocale failed

Description:

The rgmd server was not able to get the locale environment, while trying to connect to the rpc.fed server. An error message is output to syslog.

Solution:

Investigate if the host is running out of memory. If not save the /var/adm/messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.


742337 :Node %d is in the %s for resource %s, but property %s identifies resource %s which cannot host an address on node %d.

Description:

All IP addresses used by this resource must be configured to be available on all nodes that the scalable resource can run on.

Solution:

Either change the resource group nodelist to exclude the nodes that cannot host the SharedAddress IP address, or select a different network resource whose IP address will be available on all nodes where this scalable resource can run.


743362 :could not read failfast mode, using panic

Description:

/opt/SUNWudlm/etc/udlm.conf did not have an entry for failfast mode. Default mode of 'panic' will be used.

Solution:

None.


745100 :CMM: Quorum device %ld has been changed from %s to %s.

Description:

The name of the specified quorum device has been changed as indicated. This can happen if while this node was down, the previous quorum device was removed from the cluster, and a new one was added (and assigned the same id as the old one) to the cluster.

Solution:

This is an informational message, no user action is needed.


747567 :Unable to complete any share commands.

Description:

None of the paths specified in the dfstab. file were shared successfully.

Solution:

The prenet_start method would fail. Sun Cluster resource management would attempt to bring the resource on-line on some other node. Manually check that the paths specifed in the dfstab. file are correct.


749409 :clcomm: validate_policy: high not enough. high %d low %d in c %d nodes %d pool %d

Description:

The system checks the proposed flow control policy parameters at system startup and when processing a change request. For a variable size resource pool, the high server thread level must be large enough to allow all of the nodes identified in the message join the cluster and receive a minimal number of server threads.

Solution:

No user action required.


749958 :CMM: Unable to create %s thread.

Description:

The CMM was unable to create its specified thread and the system can not continue. This is caused by inadequate memory on the system.

Solution:

Add more memory to the system. If that does not resolve the problem, contact your authorized Sun service provider to determine whether a workaround or patch is available.


751934 :scswitch: rgm_change_mastery() failed with NOREF, UNKNOWN, or invalid error on node %d

Description:

An inter-node communication failed with an unknown exception while the rgmd was attempting to execute an operator-requested switch of the primaries of a resource group, or was attempting to "fail back" a resource group onto a node that just rejoined the cluster. This will cause the attempted switching action to fail.

Solution:

Examine other syslog messages occurring around the same time on the same node, to see if the cause of the problem can be identified. If the switch was operator-requested, retry it. If the same error recurs, you might have to reboot the affected node. Since this problem might indicate an internal logic error in the clustering software, please save a copy of the /var/adm/messages files on all nodes, the output of an scstat -g command, and the output of a scrgadm -pvv command. Report the problem to your authorized Sun service provider.


753898 :resource group %s state on node %s change to RG_OFF_PENDING_BOOT

Description:

This is a notification from the rgmd that BOOT methods are running on resources in the given resource group, on a node that has recently joined the cluster. This may be used by system monitoring tools.

Solution:

This is an informational message, no user action is needed.


754521 :Property %s does not have a value. This property must have exactly one value.

Description:

The property named does not have a value specified for it.

Solution:

Set the property to have exactly one value.


754848 :The property %s must contain at least one SharedAddress network resource.

Description:

The named property must contain at least one SharedAddress.

Solution:

Specify a SharedAddress resource for this property.


756369 :NAFO group %s has failed, so scalable resource %s in resource group %s may not be able to respond to client requests. A request will be issued to relocate resource %s off of this node.

Description:

The named NAFO group has failed, so the node may not be able to respond to client requests. It would be desirable to move the resource to another node that has functioning NAFO groups. A request will be issued on behalf of this resource to relocate the resource to another node.

Solution:

Check the status of the NAFO group on the node. Try to fix the adapters in the NAFO group.


756650 :Failed to set the global interface node to %d for IP %s: %s.

Description:

A call to the underlying scalable networking code failed.

Solution:

Save a copy of the /var/adm/messages files on all nodes. Contact your authorized Sun service provider for assistance in diagnosing the problem.


757581 :Failed to stop daemon %s.

Description:

The HA-NFS implementation was unable to stop the specified daemon.

Solution:

The resource could be in a STOP_FAILED state. If the failover mode is set to HARD, the node would get automatically rebooted by the SunCluster resource management. If the Failover_mode is set to SOFT or NONE, please check that the specified daemon is indeed stopped (by killing it by hand, if necessary). Then clear the STOP_FAILED status on the resource and bring it on-line again using the scswitch command.


760001 :(%s) netconf error: cannot get transport info for 'ticlts' %s

Description:

Call to getnetconfigent failed and udlmctl could not get network information. udlmctl will exit.

Solution:

Make sure the internconnect does not have any problems. Save the contents of /var/adm/messages, /var/cluster/ucmm/ucmm_reconf.log and /var/cluster/ucmm/dlm*/*logs/* from all the nodes and contact your Sun service representative.


761076 :dl_bind: DL_ERROR_ACK protocol error

Description:

Could not bind to the physical device. We are trying to open a fast path to the private transport adapters.

Solution:

Reboot of the node might fix the problem.


762902 :Failed to restart fault monitor.

Description:

The resource property that was updated needed the fault monitor to be restarted inorder for the change to take effect, but the attempt to restart the fault monitor failed.

Solution:

Look at the prior syslog messages for specific problems. Correct the errors if possible. Look for the process _probe operating on the desired resource (indicated by the argument to "-R" option). This can be found from the command: ps -ef | egrep _probe | grep "\-R " Send a kill signal to this process. If the process does not get killed and restarted by the process monitor facility, reboot the node.


763929 :HA: rm_service_thread_create failed

Description:

The system could not create the needed thread, because there is inadequate memory.

Solution:

There are two possible solutions. Install more memory. Alternatively, reduce memory usage.


765395 :clcomm: RT class not configured in this system

Description:

Sun Cluster requires that the real time thread scheduling class be configured in the kernel.

Solution:

Configure Solaris with the RT thread scheduling class in the kernel.


766093 :IP address (hostname) and Port pairs %s%c%d%c%s and %s%c%d%c%s in property %s, at entries %d and %d, effectively duplicate each other. The port numbers are the same and the resolved IP addresses are the same.

Description:

The two list entries at the named locations in the named property have port numbers that are identical, and also have IP address (hostname) strings that resolve to the same underlying IP address. An IP address (hostname) string and port entry should only appear once in the property.

Solution:

Specify the property with only one occurrence of the IP address (hostname) string and port entry.


767363 :CMM: Disconnected from node %ld; aborting using %s rule.

Description:

Due to a connection failure between the local and the specified node, the local node must be halted to avoid a "split brain" configuration. The CMM used the specified rule to decide which node to fail. Rules are: rebootee: If one node is rebooting and the other was a member of the cluster, the node that is rebooting must abort. quorum: The node with greater control of quorum device votes survives and the other node aborts. node number: The node with higher node number aborts.

Solution:

The cause of the failure should be resolved and the node should be rebooted if node failure is unexpected.


767488 :reservation fatal error(UNKNOWN) - Command not specified

Description:

The device fencing program has suffered an internal error.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available. Copies of /var/adm/messages from all nodes should be provided for diagnosis. It may be possible to retry the failed operation, depending on the nature of the error. If the message specifies the 'node_join' transition, then this node may be unable to access shared devices. If the failure occurred during the 'release_shared_scsi2' transition, then a node which was joining the cluster may be unable to access shared devices. In either case, it may be possible to reacquire access to shared devices by executing '/usr/cluster/lib/sc/run_reserve -c node_join' on all cluster nodes. If the failure occurred during the 'make_primary' transition, then a device group has failed to start on this node. If another node was available to host the device group, then it should have been started on that node. If desired, it may be possible to switch the device group to this node with the scswitch command. If no other node was available, then the device group will not have been started. The scswitch command may be used to retry the attempt to start the device group. If the failure occurred during the 'primary_to_secondary' transition, then the shutdown or switchover of a device group has failed. The desired action may be retried.


767629 :lkcm_reg: Unix DLM version (%d) and the OSD library version (%d) are not compatible. Unix DLM versions accepatble to this library are: %d

Description:

Unix DLM and Oracle DLM are not compatibale. Compatible versions will be printed as part of this message.

Solution:

Check installation procedure to make sure you have the correct versions of Oracle DLM and Unix DLM. Contact Sun service representative if versions cannot be resolved.


767858 :in libsecurity unknown security type %d

Description:

This is an internal error which shouldn't occur. An error message is output to syslog.

Solution:

Save the /var/adm/messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.


770355 :fatal: received signal %d

Description:

The daemon indicated in the message tag (rgmd or ucmmd) has received a SIGTERM signal, possibly caused by an operator-initiated kill(1) command. The daemon will produce a core file and will force the node to halt or reboot to avoid the possibility of data corruption.

Solution:

The operator must use scswitch(1M) and shutdown(1M) to take down a node, rather than directly killing the daemon.


770675 :monitor_check: fe_method_full_name() failed for resource <%s>, resource group <%s>

Description:

During execution of a scha_control(1HA,3HA) function, the rgmd was unable to assemble the full method pathname for the MONITOR_CHECK method. This is considered a MONITOR_CHECK method failure. This in turn will prevent the attempted failover of the resource group from its current master to a new master.

Solution:

Examine other syslog messages occurring at about the same time to see if the problem can be identified. Save a copy of the /var/adm/messages files on all nodes and contact your authorized Sun service provider for assistance in diagnosing and correcting the problem.


770776 :INTERNAL ERROR: process_resource: Resource <%s> is R_BOOTING in PENDING_ONLINE resource group

Description:

The rgmd is attempting to bring a resource group online on a node where BOOT methods are still being run on its resources. This should not occur and may indicate an internal logic error in the rgmd.

Solution:

Look for other syslog error messages on the same node. Save a copy of the /var/adm/messages files on all nodes, and report the problem to your authorized Sun service provider.


771340 :fatal: Resource group <%s> update failed with error <%d>; aborting node

Description:

Rgmd failed to read updated resource group from the CCR on this node.

Solution:

Save a copy of the /var/adm/messages files on all nodes, and of the rgmd core file. Contact your authorized Sun service provider for assistance in diagnosing the problem.


772294 :%s requests reconfiguration in step %s

Description:

Return status at the end of a step execution indicates that a reconfiguration is required.

Solution:

None.


772395 :shutdown immediate did not succeed. (%s)

Description:

Failed to shutdown Oracle server using 'shutdown immediate' command.

Solution:

Examine 'Stop_timeout' property of the resource and increase 'Stop_timeout' if Oracle server takes long time to shutdown. and if you don't wish to use 'shutdown abort' for stopping Oracle server.


773078 :Error in configuration file lookup (%s, ...): %s

Description:

Could not read configuration file udlm.conf.

Solution:

Make sure udlm.conf exists under /opt/SUNWudlm/etc and has the correct permissions.


773366 :thread create for hb_threadpool failed

Description:

The system was unable to create thread used for heartbeat processing.

Solution:

Take steps to increase memory availability. The installation of more memory will avoid the problem with a kernel inability to create threads. For a user level process problem: install more memory, increase swap space, or reduce the peak work load.


774752 :reservation error(%s) - do_scsi3_inresv() error for disk %s

Description:

The device fencing program has encountered errors while trying to access a device. All retry attempts have failed.

Solution:

For the user action required by this message, see the user action for message 192619.


776199 :(%s) reconfigure: cm error %s

Description:

ucmm reconfiguration failed.

Solution:

None if the next reconfiguration succeeds. If not, save the contents of /var/adm/messages, /var/cluster/ucmm/ucmm_reconf.log and /var/cluster/ucmm/dlm*/*logs/* from all the nodes and contact your Sun service representative.


776339 :INTERNAL ERROR: postpone_stop_r: meth type <%d>

Description:

A non-fatal internal error has occurred in the rgmd state machine.

Solution:

Since this problem might indicate an internal logic error in the rgmd, please save a copy of the /var/adm/messages files on all nodes, the output of an scstat -g command, and the output of a scrgadm -pvv command. Report the problem to your authorized Sun service provider.


778629 :ERROR: MONITOR_STOP method is not registered for ONLINE resource <%s>

Description:

A non-fatal internal error has occurred in the rgmd state machine.

Solution:

Since this problem might indicate an internal logic error in the rgmd, please save a copy of the /var/adm/messages files on all nodes, the output of an scstat -g command, and the output of a scrgadm -pvv command. Report the problem to your authorized Sun service provider.


779073 :in fe_set_env_vars malloc of env_name[%d] failed

Description:

The rgmd server was not able to allocate memory for an environment variable, while trying to connect to the rpc.fed server, possibly due to low memory. An error message is output to syslog.

Solution:

Investigate if the host is running out of memory. If not save the /var/adm/messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.


779089 :Could not start up DCS client because we could not contact the name server.

Description:

There was a fatal error while this node was booting.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available.


780283 :clcomm: Exception in coalescing region - Lost data

Description:

While supporting an invocation, the system wanted to combine buffers and failed. The system identifies the exception prior to this message.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available.


780792 :Failed to retrieve the resource type information.

Description:

A Sun cluster dataservice has failed to retrieve the resource type's property information. Low memory or API call failure might be the reasons.

Solution:

In case of low memory, the problem will probably cured by rebooting. If the problem reoccurs, you might need to increase swap space by configuring additional swap devices. Otherwise, if it is API call failure, check the syslog messages from other components.


781731 :Failed to retrieve the cluster handle: %s.

Description:

An API operation has failed while retrieving the cluster information.

Solution:

This may be solved by rebooting the node. For more details about API failure, check the messages from other components.


782111 :This list element in System property %s is missing a protocol: %s.

Description:

The system property that was named does not have a valid format. The value of the property must include a protocol.

Solution:

Add a protocol to the property value.


782694 :The value returned for property %s for resource %s was invalid.

Description:

An unexpected value was returned for the named property.

Solution:

Save a copy of the /var/adm/messages files on all nodes. Contact your authorized Sun service provider for assistance in diagnosing the problem.


783130 :Failed to retrieve the node id for node %s: %s.

Description:

API operation has failed while retrieving the node id for the given node.

Solution:

Check whether the node name is valid. For more information about API call failure, check the messages from other components.


783581 :scvxvmlg fatal error - clconf_lib_init failed, returned %d

Description:

The program responsible for maintaining the VxVM namespace has suffered an internal error. If configuration changes were recently made to VxVM diskgroups or volumes, this node may be unaware of those changes. Recently created volumes may be unaccessible from this node.

Solution:

If no configuration changes have been recently made to VxVM diskgroups or volumes and all volumes continue to be accessible from this node, then no action is required. If changes have been made, the device namespace on this node can be updated to reflect those changes by executing '/usr/cluster/lib/dcs/scvxvmlg'. If the problem persists, contact your authorized Sun service provider to determine whether a workaround or patch is available.


784560 :resource %s status on node %s change to %s

Description:

This is a notification from the rgmd that a resource's fault monitor status has changed.

Solution:

This is an informational message, no user action is needed.


785101 :transition '%s' failed for cluster '%s': unknown code %d

Description:

The mentioned state transition failed for the cluster because of an unexpected command line option. udlmctl will exit.

Solution:

This is an internal error. Save the contents of /var/adm/messages, /var/cluster/ucmm/ucmm_reconf.log and /var/cluster/ucmm/dlm*/*logs/* from all the nodes and contact your Sun service representative.


785154 :Could not look up IP because IP was NULL.

Description:

The mapping for the given ip address in the local host files can't be done: the specified ip address is NULL.

Solution:

Check whether the ip address has NULL value. If this is the case, recreate the resource with valid host name. If this is not the reason, treat it as an internal error and contact Sun service provider.


786412 :reservation fatal error(UNKNOWN) - clconf_lib_init() error, returned %d

Description:

The device fencing program has suffered an internal error.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available. Copies of /var/adm/messages from all nodes should be provided for diagnosis. It may be possible to retry the failed operation, depending on the nature of the error. If the message specifies the 'node_join' transition, then this node may be unable to access shared devices. If the failure occurred during the 'release_shared_scsi2' transition, then a node which was joining the cluster may be unable to access shared devices. In either case, it may be possible to reacquire access to shared devices by executing '/usr/cluster/lib/sc/run_reserve -c node_join' on all cluster nodes. If the failure occurred during the 'make_primary' transition, then a device group has failed to start on this node. If another node was available to host the device group, then it should have been started on that node. If desired, it may be possible to switch the device group to this node with the scswitch command. If no other node was available, then the device group will not have been started. The scswitch command may be used to retry the attempt to start the device group. If the failure occurred during the 'primary_to_secondary' transition, then the shutdown or switchover of a device group has failed. The desired action may be retried.


786765 :Failed to get host names from resource %s.

Description:

The networking information for the resource could not be retrieved.

Solution:

Save a copy of the /var/adm/messages files on all nodes. Contact your authorized Sun service provider for assistance in diagnosing the problem.


787063 :Error in getting parameters for global service <%s> of path <%s>: %s

Description:

Can not get information of global service.

Solution:

Save a copy of /var/adm/messages and contact your authorized Sun service provider to determine what is the cause of the problem.


787529 :NAFO group %s has status %s. The status of the NAFO group will be checked again in %d seconds.

Description:

The NAFO group named is in a transition state. The status will be checked again.

Solution:

This is an informational message, no user action is needed.


788145 :gethostbyname() failed: %s.

Description:

gethostbyname() failed with unexpected error.

Solution:

Check if name servcie is configured correctly. Try some commands to query name serves, such as ping and nslookup, and correct the problem. If the error still persists, then reboot the node.


789223 :lkcm_sync: caller is not registered

Description:

udlm is not registered with ucmm.

Solution:

This is an internal error. Save the contents of /var/adm/messages, /var/cluster/ucmm/ucmm_reconf.log and /var/cluster/ucmm/dlm*/*logs/* from all the nodes and contact your Sun service representative.


789460 :monitor_check: call to rpc.fed failed for resource <%s>, resource group <%s>, method <%s>

Description:

A scha_control(1HA,3HA) GIVEOVER attempt failed, due to a failure of the rgmd to communicate with the rpc.fed daemon. If the rpc.fed process died, this might lead to a subsequent reboot of the node. Otherwise, this will prevent a resource group on the local node from failing over to an alternate primary node

Solution:

Examine other syslog messages occurring at about the same time to see if the problem can be identified and if it recurs. Save a copy of the /var/adm/messages files on all nodes and contact your authorized Sun service provider for assistance.


791495 :Unregistered syscall (%d)

Description:

An internal error has occured. This should not happen. An error message is output to syslog.

Solution:

Save the /var/adm/messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.


792109 :Unable to set number of file descriptors.

Description:

rpc.pmfd was unable to set the number of file descriptors used in the RPC server.

Solution:

Look for other syslog error messages on the same node. Save a copy of the /var/adm/messages files on all nodes, and report the problem to your authorized Sun service provider.


792338 :The property %s must contain at least one value.

Description:

The named property does not have a legal value.

Solution:

Assign the property a value.


792967 :Unable to parse configuration file.

Description:

While parsing the Netscape configuration file an error occured in while either reading the file, or one of the fields within the file.

Solution:

Make sure that the appropriate configuration file is located in its default location with respect to the Confdir_list property.


794220 :switchback: bad nodename <%s>

Description:

The rgmd encountered a bad node name in the Nodelist of a resource group it was trying to failback. This might indicate corruption of CCR data or rgmd in-memory state.

Solution:

Use scrgadm(1M) -pvv to examine resource group properties. If the values appear corrupted, the CCR might have to be rebuilt. If values appear correct, this may indicate an internal error in the rgmd. Please contact your authorized Sun service provider for assistance in diagnosing and correcting the problem.


794535 :clcomm: Marshal Type mismatch. Expecting type %d got type %d

Description:

When MARSHAL_DEBUG is enabled, the system tags every data item marshalled to support an invocation. This reports that the current data item in the received message does not have the expected type. The received message format is wrong.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available.


795047 :Stop fault monitor using pmfadm failed. tag %s error=%d

Description:

Failed to stop fault monitor will be stopped using Process Monitoring Facility (PMF), with the tag indicated in message. Error returned by PMF is indicated in message.

Solution:

Stop fault monitor processes. Please report this problem.


795062 :Stop fault monitor using pmfadm failed. tag %s error=%s

Description:

Failed to stop fault monitor will be stopped using Process Monitoring Facility (PMF), with the tag indicated in message. Error returned by PMF is indicated in message.

Solution:

Stop fault monitor processes. Please report this problem.


795381 :t_open: %s

Description:

Call to t_open() failed. The "t_open" man page describes possible error codes. udlm exits and the node will abort and panic.

Solution:

This is an internal error. Save the contents of /var/adm/messages, /var/cluster/ucmm/ucmm_reconf.log and /var/cluster/ucmm/dlm*/*logs/* from all the nodes and contact your Sun service representative.


796491 :Error in checking file system mount point for path <%s>

Description:

The file system mount point can not be mapped to global service correctly.

Solution:

Check the definition of mountpoint path in extention property "ServicePaths" of SUNW.HAStorage type resource and make sure they are for global file system.


796536 :Password file %s is not readable: %s

Description:

For the secure server to run, a password file named keypass is required. This file could not be read, which resulted in an error when trying to start the Data Service.

Solution:

Create the keypass file and place it under the Confdir_list path for this resource. Make sure that the file is readable.


796771 :check_for_ccrdata failed malloc of size %d

Description:

Call to malloc failed. The "malloc" man page describes possible reasons.

Solution:

Install more memory, increase swap space or reduce peak memory consumption.


797604 :CMM: Connectivity of quorum device %ld (%s) has been changed from 0x%llx to 0x%llx.

Description:

The number of configured paths to the specified quorum device has been changed as indicated. The connectivity information is depicted as bitmasks.

Solution:

This is an informational message, no user action is needed.


798060 :Error opening procfs status file <%s> for tag <%s>: %s

Description:

The rpc.pmfd server was not able to open a procfs status file, and the system error is shown. procfs status files are required in order to monitor user processes.

Solution:

Investigate if the machine is running out of memory. If this is not the case, save the /var/adm/messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.


798175 :sema_wait: %s

Description:

The rpc.pmfd server was not able to act on a semaphore. The message contains the system error. The server does not perform the action requested by the client, and an error message is output to syslog.

Solution:

Save the /var/adm/messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.


798514 :Starting fault monitor. pmf tag %s

Description:

Informational message. Fault monitor is being started under control of Process Monitoring Facility (PMF), with the tag indicated in message.

Solution:

None


798658 :Failed to get the resource type name: %s.

Description:

While retrieving the resource information, API operation has failed to retrieve the resource type name.

Solution:

This is internal error. Contact your authorized Sun service provider. For more error description, check the syslog messages.


799348 :INTERNAL ERROR: MONITOR_START method is not registered for resource <%s>

Description:

A non-fatal internal error has occurred in the rgmd state machine.

Solution:

Since this problem might indicate an internal logic error in the rgmd, please save a copy of the /var/adm/messages files on all nodes, the output of an scstat -g command, and the output of a scrgadm -pvv command. Report the problem to your authorized Sun service provider.


799426 :clcomm: can't ifkconfig private interface: %s:%d cmd %d error %d

Description:

The system failed to configure private network device for IP communications across the private interconnect of this device and IP address, resulting in the error identified in the message.

Solution:

Ensure that the network interconnect device is supported. Otherwise, Contact your authorized Sun service provider to determine whether a workaround or patch is available.