Sun Cluster 3.0 5/02 Error Messages Guide

Error Message List

The following list is ordered by the message ID.


802295 :monitor_check: resource group <%s> changed while running MONITOR_CHECK methods.

Description:

An internal error has occurred in the locking logic of the rgmd, such that a resource group was erroneously allowed to be edited while a failover was pending on it, causing the scha_control call to return early with an error. This in turn will prevent the attempted failover of the resource group from its current master to a new master. This should not occur and may indicate an internal logic error in the rgmd.

Solution:

Examine other syslog messages occurring at about the same time to see if the problem can be identified. Save a copy of the /var/adm/messages files on all nodes and contact your authorized Sun service provider for assistance in diagnosing and correcting the problem.


803391 :Could not validate the settings in %s. It is recommended that the settings for host lookup consult `files` before a name server.

Description:

The validation callback method failed to validate the hostname list. There may be a syntax error in the nsswitch.conf file.

Solution:

Check the following syntax rules in the nsswitch.conf file. 1) The lookup order for "hosts" must include "files". 2) "cluster" is the only entry that can come before "files". 3) Everything between '[' and ']' is ignored. 4) Lines with leading whitespace are skipped, so no entry may begin with a whitespace character. Correct the syntax in the nsswitch.conf file and try again.
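
For example, a "hosts" entry that satisfies these rules (assuming DNS is the name service in use; adjust to your configuration) looks like the following in /etc/nsswitch.conf:

    hosts:      cluster files dns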


803570:lkcm_parm: invalid handle was passed %s %d.

Description:

An invalid handle was passed to the UNIX DLM (udlm) interface. This is an internal error.

Solution:

This is an internal error. Save the contents of /var/adm/messages, /var/cluster/ucmm/ucmm_reconf.log and /var/cluster/ucmm/dlm*/*logs/* from all the nodes and contact your Sun service representative.


803719 :host %s failed, and clnt_spcreateerror returned NULL

Description:

The rgm is not able to establish an rpc connection to the rpc.fed server on the host shown, and the rpc error could not be read. An error message is output to syslog.

Solution:

Save the /var/adm/messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.


804658 :clexecd: close returned %d while exec'ing (%s). Exiting.

Description:

clexecd program has encountered a failed close(2) system call. The error message indicates the error number for the failure.

Solution:

The clexecd program will exit and the node will be halted or rebooted to prevent data corruption. Contact your authorized Sun service provider to determine whether a workaround or patch is available.


804791 :A warm restart of rpcbind may be in progress.

Description:

The HA-NFS probe detected that the rpcbind daemon is not running; however, it also detected that a warm restart of rpcbind is in progress.

Solution:

If a warm restart is indeed in progress, ignore this message. Otherwise, check whether the rpcbind daemon is running; if it is not, reboot the node. Note that if the rpcbind process is not running and the Failover_mode property on the resource is set to HARD, the HA-NFS probe will itself reboot the node.


804820 :clcomm: path_manager failed to create RT lwp (%d)

Description:

The system failed to create a real-time thread to support path manager heartbeats.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available.


805735 :Failed to connect to the host <%s> and port <%d>.

Description:

An error occurred while the fault monitor attempted to make a connection to the specified hostname and port.

Solution:

Wait for the fault monitor to correct this by restarting or failing over the data service. For more detailed error information, check the syslog messages.


805788 :reservation fatal error(%s) - service_name not specified

Description:

The device fencing program has suffered an internal error.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available. Copies of /var/adm/messages from all nodes should be provided for diagnosis. It may be possible to retry the failed operation, depending on the nature of the error. If the message specifies the 'node_join' transition, then this node may be unable to access shared devices. If the failure occurred during the 'release_shared_scsi2' transition, then a node which was joining the cluster may be unable to access shared devices. In either case, it may be possible to reacquire access to shared devices by executing '/usr/cluster/lib/sc/run_reserve -c node_join' on all cluster nodes. If the failure occurred during the 'make_primary' transition, then a device group has failed to start on this node. If another node was available to host the device group, then it should have been started on that node. If desired, it may be possible to switch the device group to this node with the scswitch command. If no other node was available, then the device group will not have been started. The scswitch command may be used to retry the attempt to start the device group. If the failure occurred during the 'primary_to_secondary' transition, then the shutdown or switchover of a device group has failed. The desired action may be retried.
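
For example, depending on which transition failed, the recovery commands mentioned above might be run as follows (the device group and node names are placeholders):

    # Reacquire access to shared devices; run on all cluster nodes
    /usr/cluster/lib/sc/run_reserve -c node_join

    # Retry starting a device group on this node
    scswitch -z -D <device-group> -h <this-node>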


806365 :monitor_check: getlocalhostname() failed for resource <%s>, resource group <%s>

Description:

While attempting to process a scha_control(1HA,3HA) call, the rgmd failed in an attempt to obtain the hostname of the local node. This is considered a MONITOR_CHECK method failure. This in turn will prevent the attempted failover of the resource group from its current master to a new master.

Solution:

Examine other syslog messages occurring at about the same time to see if the problem can be identified. Save a copy of the /var/adm/messages files on all nodes and contact your authorized Sun service provider for assistance in diagnosing and correcting the problem.


806618 :Resource group name is null.

Description:

This is an internal error. While attempting to retrieve the resource information, a null value was retrieved for the resource group name.

Solution:

Save a copy of the /var/adm/messages files on all nodes. Contact your authorized Sun service provider for assistance in diagnosing the problem.


806902 :clutil: Could not create lwp during respawn

Description:

There was insufficient memory to support this operation.

Solution:

Install more memory, increase swap space, or reduce peak memory consumption.


807249 :CMM: Node %s (nodeid = %d) with votecount = %d removed.

Description:

The specified node with the specified votecount has been removed from the cluster.

Solution:

This is an informational message, no user action is needed.


808746 :Node id %d is higher than the maximum node id of %d in the cluster.

Description:

In one of the scalable networking properties, a node id was encountered that was higher than expected.

Solution:

Verify that the nodes listed in the scalable networking properties are still valid cluster members.


809322 :Couldn't create deleted directory: error (%d)

Description:

The file system is unable to create temporary copies of deleted files.

Solution:

Mount the affected file system as a local file system, and ensure that there is no file system entry with name "._" at the root level of that file system. Alternatively, run fsck on the device to ensure that the file system is not corrupt.
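
For example, the checks described above might be performed as follows (the device name and mount point are placeholders for the affected file system):

    # Mount the file system locally and look for a "._" entry at its root
    mount -F ufs /dev/dsk/c1t0d0s6 /mnt
    ls -a /mnt | grep '^\._'
    umount /mnt

    # Alternatively, check the file system for corruption
    fsck -F ufs /dev/rdsk/c1t0d0s6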


809329 :No adapter for node %s.

Description:

No NAFO group has been specified for this node.

Solution:

If this error message has occurred during resource creation, supply valid adapter information and retry it. If this message has occurred after resource creation, remove the LogicalHostname resource and recreate it with the correct NAFO group for each node which is a potential master of the resource group.


809554 :Unable to access directory %s:%s.

Description:

An HA-NFS method attempted to access the specified directory but was unable to do so. The reason for the failure is also logged.

Solution:

If the directory is on a mounted filesystem, make sure the filesystem is currently mounted. If the pathname of the directory is not what you expected, check whether the Pathprefix property of the resource group is set correctly. If this error occurs in any method other than VALIDATE, HA-NFS will attempt to recover by either failing over to another node or (in the case of Stop and Postnet_stop) by rebooting the node.


809858 :ERROR: method <%s> timeout for resource <%s> is not an integer

Description:

The indicated resource method timeout, as stored in the CCR, is not an integer value. This might indicate corruption of CCR data or rgmd in-memory state. The method invocation will fail; depending on which method was being invoked and the Failover_mode setting on the resource, this might cause the resource group to fail over or move to an error state.

Solution:

Use scstat(1M) -g and scrgadm(1M) -pvv to examine resource properties. If the values appear corrupted, the CCR might have to be rebuilt. If values appear correct, this may indicate an internal error in the rgmd. Contact your authorized Sun service provider for assistance in diagnosing and correcting the problem.


809956 :PCSEXIT: %s

Description:

The rpc.pmfd server was not able to monitor a process, and the system error is shown. An error message is output to syslog.

Solution:

Save the /var/adm/messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.


809985 :Elements in Confdir_list and Port_list must be 1-1 mapping

Description:

The Confdir_list and Port_list properties must contain the same number of entries, thus maintaining a 1-1 mapping between the two.

Solution:

Using the appropriate scrgadm command, configure this resource to contain the same number of entries in the Confdir_list and the Port_list properties.
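
For example, a sketch of such a change for a web-server style resource, assuming Confdir_list is an extension property (-x) and Port_list is a standard property (-y) for your resource type (verify the property types with scrgadm -pvv); the resource name and paths are placeholders:

    scrgadm -c -j <web-resource> \
            -x Confdir_list=/global/web/conf1,/global/web/conf2 \
            -y Port_list=80/tcp,8080/tcp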


810551 :fatal: Unable to bind president to nameserver

Description:

The low-level cluster machinery has encountered a fatal error. The rgmd will produce a core file and will cause the node to halt or reboot to avoid the possibility of data corruption.

Solution:

Save a copy of the /var/adm/messages files on all nodes, and of the rgmd core file. Contact your authorized Sun service provider for assistance in diagnosing the problem.


811254 :VALIDATE failed on resource <%s>, resource group <%s>

Description:

The resource's VALIDATE method exited with a non-zero exit code. This indicates that an attempted update of a resource or resource group is invalid.

Solution:

Examine syslog messages occurring just before this one to determine the cause of validation failure. Retry the update.


811357:Successfully started BV servers on $HOSTNAME.

Description:

The Sun Cluster HA for BroadVision One-To-One Enterprise processes on the specified host successfully started.

Solution:

No user action required.


811463 :match_online_key failed strdup for (%s)

Description:

Call to strdup failed. The "strdup" man page describes possible reasons.

Solution:

Install more memory, increase swap space or reduce peak memory consumption.


812706 :dl_attach: DL_OK_ACK protocol error

Description:

Could not attach to the private interconnect interface.

Solution:

Rebooting the node might fix the problem.


813317 :Failed to open the cluster handle: %s.

Description:

An internal error occurred while attempting to open a handle for an object.

Solution:

Save a copy of the /var/adm/messages files on all nodes. Contact your authorized Sun service provider for assistance in diagnosing the problem.


813831 :reservation warning(%s) - MHIOCSTATUS error will retry in %d seconds

Description:

The device fencing program has encountered errors while trying to access a device. The failed operation will be retried.

Solution:

This is an informational message, no user action is needed.


813866 :Property %s has no hostnames for resource %s.

Description:

The named property does not have any hostnames set for it.

Solution:

Re-create the named resource with one or more hostnames.


813977 :Node %d is listed twice in property %s.

Description:

The node in the message was listed twice in the named property.

Solution:

Specify the property with only one occurrence of the node.


813990 :Started the HA-NFS system fault monitor.

Description:

The HA-NFS system fault monitor was started successfully.

Solution:

No action required.


814232 :fork() failed: %m.

Description:

The fork() system call failed for the given reason.

Solution:

If system resources are not available, consider rebooting the node.


814905 :Could not start up DCS client because major numbers on this node do not match the ones on other nodes. See /var/adm/messages for previous errors.

Description:

Some drivers identified in previous messages do not have the same major number across cluster nodes, and devices owned by the driver are being used in global device services.

Solution:

Look in the /etc/name_to_major file on each cluster node to see if the major number for the driver matches across the cluster. If a driver is missing from the /etc/name_to_major file on some of the nodes, then most likely, the package the driver ships in was not installed successfully on all nodes. If this is the case, install that package on the nodes that don't have it. If the driver exists on all nodes but has different major numbers, see the documentation that shipped with this product for ways to correct this problem.
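
For example, to compare the major number of a driver across nodes (here the VxVM vxio driver, a common case; substitute the driver named in the earlier messages), run the following on each cluster node and compare the output:

    grep vxio /etc/name_to_major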


815551 :System property %s with value %s has an empty list element.

Description:

The system property that was named does not have a value for one of its list elements.

Solution:

Assign the property a value in which every list element is non-empty.


816002 :The port number %d from entry %s in property %s was not for a nonsecure port.

Description:

The Netscape Directory Server instance has been configured as nonsecure, but the port number given in the list property is for a secure port.

Solution:

Remove the entry from the list or change its port number to correspond to a nonsecure port.


816578 :Node %u attempting to join cluster has incompatible cluster software. %s not compatible with %s

Description:

A node is attempting to join the cluster but it is either using an incompatible software version or is booted in a different mode (32-bit vs. 64-bit).

Solution:

Ensure that all nodes have the same clustering software installed and are booted in the same mode.


817592 :HA: rma::admin_impl failed to bind

Description:

An HA framework component failed to register with the name server.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available.


818821 :Value %d is listed twice in property %s.

Description:

The value listed occurs twice in the named property.

Solution:

Specify the property with only one occurrence of the value.


818824 :HA: rma::reconf can't talk to RM

Description:

An HA framework component failed to register with the Replica Manager.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available.


818836 :Value %s is listed twice in property %s.

Description:

The value listed occurs twice in the named property.

Solution:

Specify the property with only one occurrence of the value.


819642 :fatal: unable to register RPC service; aborting node

Description:

The rgmd was unable to start up successfully because it failed to register an RPC service. It will produce a core file and will force the node to halt or reboot.

Solution:

If rebooting the node doesn't fix the problem, examine other syslog messages occurring at about the same time to see if the problem can be identified and if it recurs. Save a copy of the /var/adm/messages files on all nodes and contact your authorized Sun service provider for assistance.


819721 :Failed to start %s.

Description:

Sun Cluster could not start the application. If possible, it will attempt to start the service on another node.

Solution:

1) Check prior syslog messages for specific problems and correct them. 2) This problem may occur when the cluster is under load and Sun Cluster cannot start the application within the timeout period specified. You may consider increasing the Start_timeout property. 3) If the resource was unable to start on any node, resource would be in START_FAILED state. In this case, use scswitch to bring the resource ONLINE on this node. 4) If the service was successfully started on another node, attempt to restart the service on this node using scswitch. 5) If the above steps do not help, disable the resource using scswitch. Check to see that the application can run outside of the Sun Cluster framework. If it cannot, fix any problems specific to the application, until the application can run outside of the Sun Cluster framework. Enable the resource using scswitch. If the application runs outside of the Sun Cluster framework but not in response to starting the data service, contact your authorized Sun service provider for assistance in diagnosing the problem.
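
For example, the scswitch operations mentioned in steps 3 through 5 might look like the following; the resource and resource group names are placeholders:

    # Bring the resource group (and a START_FAILED resource) online on this node
    scswitch -z -g <resource-group> -h <this-node>

    # Disable the resource while debugging the application outside Sun Cluster
    scswitch -n -j <resource>

    # Re-enable the resource once the application runs correctly
    scswitch -e -j <resource>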


820394:Cannot check online status. Server processes are not running.

Description:

Sun Cluster HA for Sybase could not check the online status of the Sybase Adaptive Server. The Sybase Adaptive Server process is not running.

Solution:

Examine the Connect_string resource property. Make sure that the userid and password specified in the connect string are correct and that permissions are granted to the user connecting to the server. Check the Sybase Adaptive Server log for error messages. Other syslog messages and the log file should provide additional information.


821304 :Failed to retrieve the resource group information.

Description:

A Sun Cluster data service has failed to retrieve the resource group property information. Low memory or an API call failure might be the cause.

Solution:

In the case of low memory, the problem will probably be cured by rebooting. If the problem recurs, you might need to increase swap space by configuring additional swap devices. Otherwise, if it is an API call failure, check the syslog messages from other components.


821754 :A SCHA API error occurred. Retrying the retrieval of cluster information.

Description:

The SCHA APIs are used to interface with the Resource Group Manager component. It is likely that the RGM is experiencing problems.

Solution:

Inspect the syslog for errors. Contact your authorized Sun service provider to determine whether a workaround or patch is available.


821781 :Fencing shared disk groups: %s

Description:

A reservation failfast will be set so nodes which share these disk groups will be brought down if they are fenced off by other nodes.

Solution:

None.


822385 :Failed to retrieve process monitor facility tag.

Description:

Failed to create the tag that is used to register with the process monitor facility.

Solution:

Check the syslog messages that occurred just before this message. In the case of an internal error, save the /var/adm/messages file and contact your authorized Sun service provider.


824468 :Invalid probe values.

Description:

The values for system defined properties Retry_count and Retry_interval are not consistent with the property Thorough_Probe_Interval.

Solution:

Change the values of the properties to satisfy the following relationship: Thorough_Probe_Interval * Retry_count <= Retry_interval.
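
For example (hypothetical values), with Thorough_probe_interval set to 60 and Retry_count set to 2, Retry_interval must be at least 120. Such a value could be set with scrgadm; the resource name is a placeholder:

    scrgadm -c -j <resource> -y Retry_interval=120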


824550 :clcomm: Invalid flow control parameters

Description:

The flow control policy is controlled by a set of parameters. These parameters do not satisfy the guidelines. Another message from validate_policy will have already identified the specific problem.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available.


824861 :Resource %s named in property %s is not a SharedAddress resource.

Description:

The resource given for the named property is not a SharedAddress resource. All resources for that property must be SharedAddress resources.

Solution:

Specify only SharedAddress resources for the named property.


825274 :idl_scha_control_checkall(): IDL Exception on node <%d>

Description:

During a failover attempt, the scha_control function was unable to check the health of the indicated node, because of an error in inter-node communication. This was probably caused by the death of the indicated node during scha_control execution. The RGM will still attempt to master the resource group on another node, if available.

Solution:

No action is required; the rgmd should recover automatically. Identify what caused the node to die by examining syslog output. The syslog output might indicate further remedial actions.


826050 :Failed to retrieve the cluster property %s for %s: %s.

Description:

The query for a property failed. The reason for the failure is given in the message.

Solution:

Save a copy of the /var/adm/messages files on all nodes. Contact your authorized Sun service provider for assistance in diagnosing the problem.


826397 :Invalid values for probe related parameters.

Description:

Validation of the probe related parameters has failed. Invalid values are specified for these parameters.

Solution:

Retry_interval must be greater than or equal to the product of Thorough_probe_interval and Retry_count. Use scrgadm(1M) to modify the values of these parameters so that they satisfy this relationship.


826747 :reservation error(%s) - do_scsi3_inkeys() error for disk %s

Description:

The device fencing program has encountered errors while trying to access a device. All retry attempts have failed.

Solution:

For the user action required by this message, see the user action for message 192619.


827525 :reservation message(%s) - Fencing other node from disk %s

Description:

The device fencing program is taking access to the specified device away from a non-cluster node.

Solution:

This is an informational message, no user action is needed.


828170 :CCR: Unrecoverable failure during updating table %s.

Description:

CCR encountered an unrecoverable error while updating the indicated table on this node.

Solution:

The node needs to be rebooted. Also contact your authorized Sun service provider to determine whether a workaround or patch is available.


828171:stat of file %s failed.

Description:

Status of the named file could not be obtained.

Solution:

Verify the permissions of the file and all components in the path prefix.


828283 :clconf: No memory to read quorum configuration table

Description:

Could not allocate memory while converting the quorum configuration information into the quorum table.

Solution:

This is an unrecoverable error, and the cluster needs to be rebooted. Also contact your authorized Sun service provider to determine whether a workaround or patch is available.


828407 :WARNING: lkcm_sync failed: unknown message type %d

Description:

A message of unknown type was sent to udlm. It will be ignored.

Solution:

None.


828474 :resource group %s property changed.

Description:

This is a notification from the rgmd that the operator has edited a property of a resource group. This may be used by system monitoring tools.

Solution:

This is an informational message, no user action is needed.


828739 :transition '%s' timed out for cluster, forcing reconfiguration.

Description:

Step transition failed. A reconfiguration will be initiated.

Solution:

Save the contents of /var/adm/messages, /var/cluster/ucmm/ucmm_reconf.log and /var/cluster/ucmm/dlm*/*logs/* from all the nodes and contact your Sun service representative.


829132 :scha_control GIVEOVER failed. error %s

Description:

The fault monitor detected problems in the RDBMS server. An attempt to switch the resource over to another node failed. The error returned by the scha_control API call is indicated in the message.

Solution:

None.


829262 :Switchover (%s) error: cannot find clexecd

Description:

The file system specified in the message could not be hosted on the node the message came from. Check to see if the user program "clexecd" is running on that node.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available.


829384 :INTERNAL ERROR: launch_method: state machine attempted to launch invalid method <%s> (method <%d>) for resource <%s>; aborting node

Description:

An internal error occurred when the rgmd attempted to launch an invalid method for the named resource. The rgmd will produce a core file and will force the node to halt or reboot.

Solution:

Look for other syslog error messages on the same node. Save a copy of the /var/adm/messages files on all nodes, and report the problem to your authorized Sun service provider.


830211 :Failed to accept connection on socket: %s.

Description:

While determining the health of the data service, the fault monitor failed to communicate with the process monitor facility.

Solution:

This is an internal error. Save the /var/adm/messages file and contact your authorized Sun service provider. For more details about the error, check the syslog messages.


831036 :Service object [%s, %s, %d] created in group '%s'

Description:

A specific service, known by its unique service access point (SAP) name (the three-tuple shown), has been created in the designated group.

Solution:

This is an informational message, no user action is needed.


833126:Monitor server successfully started.

Description:

Sun Cluster HA for Sybase successfully started the Monitor Server.

Solution:

No user action required.


833212:Attempting to start the data service under process monitor facility.

Description:

The function is going to request the PMF to start the data service. If the request fails, refer to the syslog messages that appear after this message.

Solution:

This is an informational message, no user action is required.


833229 :Couldn't remove deleted directory file, '%s' error: (%d)

Description:

The file system is unable to create temporary copies of deleted files.

Solution:

Mount the affected file system as a local file system, and ensure that there is no file system entry with name "._" at the root level of that file system. Alternatively, run fsck on the device to ensure that the file system is not corrupt.


833970 :clcomm: getrlimit(RLIMIT_NOFILE): %s

Description:

During cluster initialization within this user process, the getrlimit call failed with the specified error.

Solution:

Read the man page for getrlimit for a more detailed description of the error.


836461 :Entry for file system mount point %s absent in %s.

Description:

One or more file system mount points specified via the FilesystemMountPoint extension property are absent in the /etc/vfstab file.

Solution:

Ensure that each file system mount point is a valid entry in /etc/vfstab.
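
For example, a global UFS mount entry in /etc/vfstab has the following form; the metadevice and mount point shown are placeholders for your configuration:

    /dev/md/nfsset/dsk/d100  /dev/md/nfsset/rdsk/d100  /global/nfs  ufs  2  yes  global,logging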


837169 :Starting listener %s.

Description:

This is an informational message. HA-Oracle will start the Oracle listener.

Solution:

None


837211 :Resource is already online.

Description:

An error occurred while attempting to restart the resource: the resource is already online.

Solution:

This is an internal error. Save the /var/adm/messages file from all the nodes. Contact your authorized Sun service provider.


837752 :Failed to retrieve the resource group handle for %s while querying for property %s: %s.

Description:

Access to the object named failed. The reason for the failure is given in the message.

Solution:

Save a copy of the /var/adm/messages files on all nodes. Contact your authorized Sun service provider for assistance in diagnosing the problem.


837760 :monitored processes forked failed (errno=%d)

Description:

The rpc.pmfd server was not able to start (fork) the application, probably due to low memory, and the system error number is shown. An error message is output to syslog.

Solution:

Investigate if the machine is running out of memory. If this is not the case, save the /var/adm/messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.


839641 :t_alloc (reqp): %s

Description:

Call to t_alloc() failed. The "t_alloc" man page describes possible error codes. udlm will exit and the node will abort.

Solution:

This is an internal error. Save the contents of /var/adm/messages, /var/cluster/ucmm/ucmm_reconf.log and /var/cluster/ucmm/dlm*/*logs/* from all the nodes and contact your Sun service representative.


839649 :t_alloc (resp): %s

Description:

Call to t_alloc() failed. The "t_alloc" man page describes possible error codes. udlm will exit and the node will abort.

Solution:

This is an internal error. Save the contents of /var/adm/messages, /var/cluster/ucmm/ucmm_reconf.log and /var/cluster/ucmm/dlm*/*logs/* from all the nodes and contact your Sun service representative.


839936 :Some ip addresses may not be plumbed.

Description:

Some of the IP addresses managed by the LogicalHostname resource were not successfully brought online on this node.

Solution:

Use the ifconfig command to make sure that the IP addresses are indeed absent. Check for any error messages preceding this one for a more precise reason for this error. Use scswitch to move the resource group to another node (see the example below).
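
For example (the resource group and node names are placeholders):

    # Verify which addresses are currently plumbed on this node
    ifconfig -a

    # Move the resource group to another node
    scswitch -z -g <resource-group> -h <other-node>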


840542 :OFF_PENDING_BOOT: bad resource state <%s> (%d) for resource <%s>

Description:

The rgmd state machine has discovered a resource in an unexpected state on the local node. This should not occur and may indicate an internal logic error in the rgmd.

Solution:

Look for other syslog error messages on the same node. Save a copy of the /var/adm/messages files on all nodes, and report the problem to your authorized Sun service provider.


840619 :Invalid value was returned for resource group property %s for %s.

Description:

The value returned for the named property was not valid.

Solution:

Save a copy of the /var/adm/messages files on all nodes. Contact your authorized Sun service provider for assistance in diagnosing the problem.


840696 :DNS database directory %s is not readable: %s

Description:

The DNS database directory is not readable. This may be due to the directory not existing or the permissions not being set properly.

Solution:

Make sure the directory exists and has read permission set appropriately. Look at the prior syslog messages for any specific problems and correct them.


841616 :CMM: This node has been preempted from quorum device %s.

Description:

This node's reservation key was on the specified quorum device, but is no longer present, implying that this node has been preempted by another cluster partition. If a cluster gets divided into two or more disjoint subclusters, exactly one of these must survive as the operational cluster. The surviving cluster forces the other subclusters to abort by grabbing enough votes to grant it majority quorum. This is referred to as preemption of the losing subclusters.

Solution:

There may be other related messages that may indicate why quorum was lost. Determine why quorum was lost on this node, resolve the problem and reboot this node.


841719 :listener %s is not running. restart limit reached. Stopping fault monitor.

Description:

The listener is not running, and the listener monitor has reached the restart limit specified by the 'Retry_count' and 'Retry_interval' properties. The listener monitor will be stopped.

Solution:

Check the Oracle listener setup. Make sure that the Listener_name specified in the resource property is configured in the listener.ora file. Check the 'Host' property of the listener in the listener.ora file. Examine the log file and syslog messages for additional information. Then stop and start the listener monitor.


841875 :remote node died

Description:

An inter-node communication failed because another cluster node died.

Solution:

No action is required. The cluster will reconfigure automatically. Examine syslog output on the rebooted node to determine the cause of node death.


842313 :clexecd: Sending fd on common channel returned %d. Exiting.

Description:

clexecd program has encountered a failed fcntl(2) system call. The error message indicates the error number for the failure.

Solution:

The node will halt or reboot itself to prevent data corruption. Contact your authorized Sun service provider to determine whether a workaround or patch is available.


842382 :fcntl: %s

Description:

A server (rpc.pmfd or rpc.fed) was not able to execute the action shown, and the process associated with the tag is not started. The error message is shown.

Solution:

Save the /var/adm/messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.


842712 :clcomm: solaris xdoor door_create failed

Description:

A door_create operation failed. Refer to the "door_create" man page for more information.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available.


843070 :Failed to disconnect from port %d of resource %s.

Description:

An error occurred while the fault monitor attempted to disconnect from the specified hostname and port.

Solution:

Wait for the fault monitor to correct this by restarting or failing over the data service. For more detailed error information, check the syslog messages.


843093 :fatal: Got error <%d> trying to read CCR when enabling monitor of resource <%s>; aborting node

Description:

Rgmd failed to read updated resource from the CCR on this node.

Solution:

Save a copy of the /var/adm/messages files on all nodes, and of the rgmd core file. Contact your authorized Sun service provider for assistance in diagnosing the problem.


843978 :Socket creation failed: %s.

Description:

System is unable to create a socket.

Solution:

This might result from a lack of system resources. Check whether the system is low on memory and take appropriate action. For specific error information, check the syslog message.


843983 :CMM: Node %s: attempting to join cluster.

Description:

The specified node is attempting to become a member of the cluster.

Solution:

This is an informational message, no user action is needed.


845866 :Failover attempt failed: %s.

Description:

The failover attempt for the resource was rejected or encountered an error.

Solution:

For a more detailed error message, check the syslog messages. Check whether the Pingpong_interval property has an appropriate value; if not, adjust it using scrgadm(1M). Otherwise, use scswitch to switch the resource group to a healthy node.


846053 :Fast path enable failed on %s%d, could cause path timeouts

Description:

DLPI fast path could not be enabled on the device.

Solution:

Check if the right version of the driver is in use.


846376 :fatal: Got error <%d> trying to read CCR when making resource group <%s> unmanaged; aborting node

Description:

Rgmd failed to read updated resource from the CCR on this node.

Solution:

Save a copy of the /var/adm/messages files on all nodes, and of the rgmd core file. Contact your authorized Sun service provider for assistance in diagnosing the problem.


846420 :CMM: Nodes %ld and %ld are disconnected from each other; node %ld will abort using %s rule.

Description:

Due to a connection failure between the two specified non-local nodes, one of the nodes must be halted to avoid a "split brain" configuration. The CMM used the specified rule to decide which node to fail. The rules are: rebootee - if one node is rebooting and the other was a member of the cluster, the rebooting node aborts; quorum - the node with greater control of quorum device votes survives and the other node aborts; node number - the node with the higher node number aborts.

Solution:

The cause of the failure should be resolved and the node should be rebooted if node failure is unexpected.


846813 :Switchover (%s) error (%d) converting to primary

Description:

The file system specified in the message could not be hosted on the node the message came from.

Solution:

Check /var/adm/messages to make sure there were no device errors. If not, contact your authorized Sun service provider to determine whether a workaround or patch is available.


847065 :Failed to start listener %s.

Description:

Failed to start Oracle listener.

Solution:

Check the Oracle listener setup. Make sure that the Listener_name specified in the resource property is configured in the listener.ora file. Check the 'Host' property of the listener in the listener.ora file. Examine the log file and syslog messages for additional information.


847124 :getnetconfigent: %s

Description:

The call to getnetconfigent() during udlm port setup failed. udlm fails to start and the node will eventually panic.

Solution:

This is an internal error. Save the contents of /var/adm/messages, /var/cluster/ucmm/ucmm_reconf.log and /var/cluster/ucmm/dlm*/*logs/* from all the nodes and contact your Sun service representative.


847496 :CMM: Reading reservation keys from quorum device %s failed with error %d.

Description:

The specified error was encountered while trying to read reservation keys on the specified quorum device.

Solution:

There may be other related messages on this and other nodes connected to this quorum device that may indicate the cause of this problem. Refer to the quorum disk repair section of the administration guide for resolving this problem.


847656 :Command %s is not executable.

Description:

The specified pathname, which was passed to a libdsdev routine such as scds_timerun or scds_pmf_start, does not refer to an executable file. This could be the result of 1) incorrectly configuring the name of a START or MONITOR_START method or other property, 2) a programming error made by the resource type developer, or 3) a problem with the specified pathname in the file system itself.

Solution:

Ensure that the pathname refers to a regular, executable file.
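
For example, the permissions of a configured method pathname could be verified and corrected as follows (the path shown is a placeholder):

    ls -lL /opt/<pkg>/bin/<method>
    chmod u+x /opt/<pkg>/bin/<method>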


847809 :Must be in cluster to start %s

Description:

The machine on which this command or daemon is running is not part of a cluster.

Solution:

Run the command on another machine, or make this machine part of a cluster by following the appropriate steps.


847916 :(%s) netdir error: uaddr2taddr: %s

Description:

Call to uaddr2taddr() failed. The "uaddr2taddr" man page describes possible error codes. udlmctl will exit.

Solution:

Save the contents of /var/adm/messages, /var/cluster/ucmm/ucmm_reconf.log and /var/cluster/ucmm/dlm*/*logs/* from all the nodes and contact your Sun service representative.


847978 :reservation fatal error(UNKNOWN) - cluster_get_quorum_status() error, returned %d

Description:

The device fencing program has suffered an internal error.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available. Copies of /var/adm/messages from all nodes should be provided for diagnosis. It may be possible to retry the failed operation, depending on the nature of the error. If the message specifies the 'node_join' transition, then this node may be unable to access shared devices. If the failure occurred during the 'release_shared_scsi2' transition, then a node which was joining the cluster may be unable to access shared devices. In either case, it may be possible to reacquire access to shared devices by executing '/usr/cluster/lib/sc/run_reserve -c node_join' on all cluster nodes. If the failure occurred during the 'make_primary' transition, then a device group has failed to start on this node. If another node was available to host the device group, then it should have been started on that node. If desired, it may be possible to switch the device group to this node with the scswitch command. If no other node was available, then the device group will not have been started. The scswitch command may be used to retry the attempt to start the device group. If the failure occurred during the 'primary_to_secondary' transition, then the shutdown or switchover of a device group has failed. The desired action may be retried.


847994 :Plumb failed. tried to unplumb %s%d, unplumbed failed with rc %d.

Description:

The Topology Manager failed to plumb an adapter for the private network. A possible reason for the plumb to fail is that the adapter is already plumbed. Solaris Clustering tries to unplumb the adapter and plumb it for private use, but it could not unplumb the adapter.

Solution:

Check if the adapter by that name exists.


848033 :SharedAddress online.

Description:

The status of the SharedAddress resource is online.

Solution:

This is an informational message. No user action is required.


848652 :CMM aborting.

Description:

The node is going down due to a decision by the cluster membership monitor.

Solution:

This message is preceded by other messages indicating the specific cause of the abort, and the documentation for these preceding messages will explain what action should be taken. The node should be rebooted if node failure is unexpected.


848881 :Received notice that NAFO group %s has failed.

Description:

The status of the named NAFO group has become degraded. If the NAFO group stays in a degraded state, the scalable resources currently running on this node with monitoring enabled will, if possible, be relocated off of this node.

Solution:

Check the status of the NAFO group on the node. Try to fix the adapters in the NAFO group.


848943 :clconf: No valid gdevname field for quorum device %d

Description:

The gdevname field for the quorum device was found to be incorrect while converting the quorum configuration information into the quorum table.

Solution:

Check the quorum configuration information.


852212 :reservation message(%s) - Taking ownership of disk %s away from non-cluster node

Description:

The device fencing program is taking access to the specified device away from a non-cluster node.

Solution:

This is an informational message, no user action is needed.


852497 :scvxvmlg error - readlink(%s) failed

Description:

The program responsible for maintaining the VxVM namespace was unable to access the global device namespace. If configuration changes were recently made to VxVM diskgroups or volumes, this node may be unaware of those changes. Recently created volumes may be inaccessible from this node.

Solution:

Verify that the /global/.devices/node@N (N = this node's node number) is mounted globally and is accessible. If no configuration changes have been recently made to VxVM diskgroups or volumes and all volumes continue to be accessible from this node, then no further action is required. If changes have been made, the device namespace on this node can be updated to reflect those changes by executing '/usr/cluster/lib/dcs/scvxvmlg'. If the problem persists, contact your authorized Sun service provider to determine whether a workaround or patch is available.


852615 :reservation error(%s) - Unable to gain access to device '%s'

Description:

The device fencing program has encountered errors while trying to access a device.

Solution:

Another cluster node has fenced this node from the specified device, preventing this node from accessing that device. Access should have been reacquired when this node joined the cluster, but that operation must have experienced problems. If the message specifies the 'node_join' transition, this node will be unable to access the specified device. If the failure occurred during the 'make_primary' transition, then this node will be unable to access the specified device, and a device group containing the specified device may have failed to start on this node. An attempt can be made to acquire access to the device by executing '/usr/cluster/lib/sc/run_reserve -c node_join' on this node. If a device group failed to start on this node, the scswitch command can be used to start the device group on this node if access can be reacquired. If the problem persists, please contact your authorized Sun service provider to determine whether a workaround or patch is available.


853478 :Received non interrupt heartbeat on %s - path timeouts are likely.

Description:

Solaris Clustering requires network drivers to deliver heartbeat messages in interrupt context. A heartbeat message has unexpectedly arrived in non-interrupt context.

Solution:

Check if the right version of the driver is in use.


854468 :failfast arm error: %d

Description:

Error during failfast device arm operation.

Solution:

This is an internal error. Save the contents of /var/adm/messages, /var/cluster/ucmm/ucmm_reconf.log and /var/cluster/ucmm/dlm*/*logs/* from all the nodes and contact your Sun service representative.


854792 :clcomm: error in copyin for cl_change_threads_min

Description:

The system failed a copy operation supporting a flow control state change.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available.


854894 :No LogicalHostname resource in resource group.

Description:

The probe method for this data service could not find a LogicalHostname resource in the same resource group as the data service.

Solution:

Use scrgadm to configure the resource group to hold both the data service and the LogicalHostname.
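
For example, a LogicalHostname resource can be added to the data service's resource group with scrgadm; the group and hostname shown are placeholders:

    scrgadm -a -L -g <resource-group> -l <logical-hostname>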


856492 :waitpid() failed: %m.

Description:

The waitpid() system call failed for the given reason.

Solution:

Save a copy of the /var/adm/messages files on all nodes. Contact your authorized Sun service provider for assistance in diagnosing the problem.


856880 :Desired_primaries for resource group %s should be 0. Current value is %d.

Description:

The number of desired primaries for this resource group should be zero. Otherwise, in the event that a node dies or joins the cluster, the resource group might come online on some node, even if it was previously switched offline and was intended to remain offline.

Solution:

Set the Desired_primaries property for the resource group to zero.
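
For example (the resource group name is a placeholder):

    scrgadm -c -g <resource-group> -y Desired_primaries=0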


856919 :INTERNAL ERROR: process_resource: resource group <%s> is pending_methods but contains resource <%s> in STOP_FAILED state

Description:

During a resource creation, deletion, or update, the rgmd has discovered a resource in STOP_FAILED state. This may indicate an internal logic error in the rgmd, since updates are not permitted on the resource group until the STOP_FAILED error condition is cleared.

Solution:

Look for other syslog error messages on the same node. Save a copy of the /var/adm/messages files on all nodes, and report the problem to your authorized Sun service provider.


857573 :scvxvmlg error - rmdir(%s) failed

Description:

The program responsible for maintaining the VxVM namespace was unable to access the global device namespace. If configuration changes were recently made to VxVM diskgroups or volumes, this node may be unaware of those changes. Recently created volumes may be inaccessible from this node.

Solution:

Verify that the /global/.devices/node@N (N = this node's node number) is mounted globally and is accessible. If no configuration changes have been recently made to VxVM diskgroups or volumes and all volumes continue to be accessible from this node, then no further action is required. If changes have been made, the device namespace on this node can be updated to reflect those changes by executing '/usr/cluster/lib/dcs/scvxvmlg'. If the problem persists, contact your authorized Sun service provider to determine whether a workaround or patch is available.


857792 :UNIX DLM initiating cluster abort.

Description:

Due to an error it encountered, the UNIX DLM is initiating an abort.

Solution:

This is an internal error. Save the contents of /var/adm/messages, /var/cluster/ucmm/ucmm_reconf.log and /var/cluster/ucmm/dlm*/*logs/* from all the nodes and contact your Sun service representative.


859126 :System property %s is empty.

Description:

The system property that was named does not have a value.

Solution:

Assign the property a value.


862493 :in libsecurity could not register on any transport in NETPATH

Description:

A server (rpc.pmfd, rpc.fed or rgmd) was not able to start because it could not establish an RPC connection for the specified network: no transport could be found. An error message is output to syslog. This happened either because no transports are available at all, or because none of the available transports is a loopback.

Solution:

Save the /var/adm/messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.


862716 :sema_init: %s

Description:

The rpc.pmfd server was not able to initialize a semaphore, possibly due to low memory, and the system error is shown. The server does not perform the action requested by the client, and pmfadm returns error. An error message is also output to syslog.

Solution:

Investigate if the machine is running out of memory. If this is not the case, save the /var/adm/messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.


865635 :lkcm_act: caller is not registered

Description:

udlm is not currently registered with ucmm.

Solution:

This is an internal error. Save the contents of /var/adm/messages, /var/cluster/ucmm/ucmm_reconf.log and /var/cluster/ucmm/dlm*/*logs/* from all the nodes and contact your Sun service representative.


866624 :clcomm: validate_policy: threads_low not big enough low %d pool %d

Description:

The system checks the proposed flow control policy parameters at system startup and when processing a change request. The low server thread level must not be less than twice the thread increment level for resource pools whose number of threads varies dynamically.

Solution:

No user action required.


867059 :Could not shutdown replica for device service (%s). Some file system replicas that depend on this device service may already be shutdown. Future switchovers to this device service will not succeed unless this node is rebooted.

Description:

See message.

Solution:

If mounts or node reboots are in progress at the time this message is displayed, wait for that activity to complete, and then retry the command to shut down the device service replica. If not, contact your authorized Sun service provider to determine whether a workaround or patch is available.


868467:Process %s did not die in %d seconds.

Description:

Sun Cluster HA for NFS attempted to stop the specified process id but was unable to stop the process in time. Since Sun Cluster HA for NFS uses the SIGKILL signal to kill processes, this indicates a serious overload or kernel problem with the system.

Solution:

If this error occurs in a STOP method, the node should be rebooted. Increase the timeout on the appropriate method.


869406 :Failed to communicate with server %s port %d: %s.

Description:

The data service fault monitor probe was trying to read from or write to the service specified and failed. Sun Cluster will attempt to correct the situation by either doing a restart or a failover of the data service. The problem may be due to an overloaded system or other problems, causing a timeout to occur before communications could be completed.

Solution:

If this problem is due to an overloaded system, you may consider increasing the Probe_timeout property.


870181 :Failed to retrieve the resource handle for %s while querying for property %s: %s.

Description:

Access to the object named failed. The reason for the failure is given in the message.

Solution:

Save a copy of the /var/adm/messages files on all nodes. Contact your authorized Sun service provider for assistance in diagnosing the problem.


870317 :INTERNAL ERROR: START method is not registered for resource <%s>

Description:

A non-fatal internal error has occurred in the rgmd state machine.

Solution:

Since this problem might indicate an internal logic error in the rgmd, please save a copy of the /var/adm/messages files on all nodes, the output of an scstat -g command, and the output of a scrgadm -pvv command. Report the problem to your authorized Sun service provider.


870566 :clutil: Scheduling class %s not configured

Description:

An attempt to change the thread scheduling class failed, because the scheduling class was not configured.

Solution:

Configure the system to support the desired thread scheduling class.


871642 :Validation failed. Invalid command line %s %s

Description:

Unable to process parameters passed to the callback method. This is an internal error.

Solution:

Please report this problem.


872086 :Service is degraded.

Description:

The probe detected a failure in the data service and is setting the resource's status to degraded.

Solution:

Wait for the fault monitor to restart the data service. Check the syslog messages and configuration of the data service.


872599 :Error in getting service name for device path <%s>

Description:

Cannot map the device path to a valid global service name.

Solution:

Check the path passed to the "ServicePaths" extension property of the SUNW.HAStorage resource.


872695:Could not start the adaptive server.

Description:

Sun Cluster HA for Sybase failed to start the Sybase Adaptive Server. Other syslog messages and the log file should provide additional information on possible reasons for failure.

Solution:

Manually start the Sybase Adaptive Server. Examine the log files and setup. See if the START method timeout value is set too low.


872839 :Resource is already stopped.

Description:

Sun Cluster attempted to stop the resource, but found it already stopped.

Solution:

No user action required.


874879 :clcomm: Path %s being deleted

Description:

A communication link is being removed with another node. The interconnect may have failed or the remote node may be down.

Solution:

Any interconnect failure should be resolved, and/or the failed node rebooted.


875171 :clcomm: Pathend %p: %d is not a pathend state

Description:

The system maintains state information about a path. The state information is invalid.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available.


875345 :None of the shared paths in file %s are valid.

Description:

All the paths specified in the dfstab.<resource_name> file are invalid.

Solution:

Check that those paths are valid. This might result from an underlying disk failure or an unavailable file system. The monitor_check method will thus fail, and the HA-NFS resource will not be brought online on this node. However, it is advisable to bring the file system online soon.


875595 :CMM: Shutdown timer expired. Halting.

Description:

The node could not complete its shutdown sequence within the halt timeout, and is aborting to enable another node to safely take over its services.

Solution:

This is an informational message, no user action is needed.


875796 :CMM: Reconfiguration callback timed out; node aborting.

Description:

One or more CMM client callbacks timed out and the node will be aborted.

Solution:

There may be other related messages on this node which may help diagnose the problem. Resolve the problem and reboot the node if node failure is unexpected. If unable to resolve the problem, contact your authorized Sun service provider to determine whether a workaround or patch is available.


875939 :ERROR: Failed to initialize callbacks for Global_resources_used, error code <%d>

Description:

The rgmd encountered an error while trying to initialize the Global_resources_used mechanism on this node. This is not considered a fatal error, but probably means that method timeouts will not be suspended while a device service is failing over. This could cause unneeded failovers of resource groups when device groups are switched over.

Solution:

Save a copy of the /var/adm/messages files on all nodes. Contact your authorized Sun service provider for assistance in diagnosing the problem. This error might be cleared by rebooting the node.


876090 :fatal: must be superuser to start %s

Description:

The rgmd can only be executed by the super-user.

Solution:

This probably occurred because a non-root user attempted to start the rgmd manually. Normally, the rgmd is started automatically when the node is booted.


876324 :CCR: CCR transaction manager failed to register with the cluster HA framework.

Description:

The CCR transaction manager failed to register with the cluster HA framework.

Solution:

This is an unrecoverable error, and the cluster needs to be rebooted. Also contact your authorized Sun service provider to determine whether a workaround or patch is available.


876485:No execute permissions to the file %s.

Description:

The specified file does not have execute permission set.

Solution:

Set execute permission on the file.


876834 :Could not start server

Description:

HA-Oracle failed to start the Oracle server. Syslog messages and the log file will provide additional information on possible reasons for the failure.

Solution:

Check whether Oracle server can be started manually. Examine the log files and setup.


877905 :ff_ioctl: %s

Description:

A server (rpc.pmfd or rpc.fed) was not able to arm or disarm the failfast device, which ensures that the host aborts if the server dies. The error message is shown. An error message is output to syslog.

Solution:

Save the /var/adm/messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.


878089 :fatal: realloc: %s (UNIX error %d)

Description:

The rgmd failed to allocate memory, most likely because the system has run out of swap space. The rgmd will produce a core file and will force the node to halt or reboot to avoid the possibility of data corruption.

Solution:

The problem was probably cured by rebooting. If the problem recurs, you might need to increase swap space by configuring additional swap devices. See swap(1M) for more information.
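
As a hedged sketch (the file name and size are illustrative assumptions), an additional swap file can be configured on Solaris as follows:

    # Create a 512 MB swap file and add it as a swap device
    mkfile 512m /export/swapfile
    swap -a /export/swapfile

    # Verify that the new swap device is in use
    swap -l

To retain the additional swap across reboots, also add a corresponding entry to /etc/vfstab.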


878135 :WARNING: udlm_update_from_saved_msg

Description:

There is no saved message to update udlm.

Solution:

None. This is a warning only.


879301 :reservation error(%s) - clconf_do_execution() error. Node %d is not in the cluster.

Description:

The device fencing program has suffered an internal error.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available. Copies of /var/adm/messages from all nodes should be provided for diagnosis. It may be possible to retry the failed operation, depending on the nature of the error.

If the message specifies the 'node_join' transition, then this node may be unable to access shared devices. If the failure occurred during the 'release_shared_scsi2' transition, then a node which was joining the cluster may be unable to access shared devices. In either case, it may be possible to reacquire access to shared devices by executing '/usr/cluster/lib/sc/run_reserve -c node_join' on all cluster nodes.

If the failure occurred during the 'make_primary' transition, then a device group has failed to start on this node. If another node was available to host the device group, then it should have been started on that node. If desired, it may be possible to switch the device group to this node with the scswitch command. If no other node was available, then the device group will not have been started. The scswitch command may be used to retry the attempt to start the device group.

If the failure occurred during the 'primary_to_secondary' transition, then the shutdown or switchover of a device group has failed. The desired action may be retried.
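
As a hypothetical example of the recovery commands mentioned above (the device group and node names are assumptions, not values from this message), the following might be run after the failure has been diagnosed:

    # Reacquire access to shared devices after a 'node_join' or
    # 'release_shared_scsi2' failure (run on all cluster nodes)
    /usr/cluster/lib/sc/run_reserve -c node_join

    # After a 'make_primary' failure, retry starting the device group
    # on this node
    scswitch -z -D nfs-dg -h phys-node-1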


879380 :pmf_monitor_children: Error stopping <%s>: %s

Description:

An error occurred while rpc.pmfd attempted to send a KILL signal to one of the processes of the given tag. The reason for the failure is also given. rpc.pmfd attempted to kill the process because a previous error occurred while creating a monitor process for the process to be monitored.

Solution:

Save the /var/adm/messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.


879511 :reservation fatal error(%s) - service_class not specified

Description:

The device fencing program has suffered an internal error.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available. Copies of /var/adm/messages from all nodes should be provided for diagnosis. It may be possible to retry the failed operation, depending on the nature of the error.

If the message specifies the 'node_join' transition, then this node may be unable to access shared devices. If the failure occurred during the 'release_shared_scsi2' transition, then a node which was joining the cluster may be unable to access shared devices. In either case, it may be possible to reacquire access to shared devices by executing '/usr/cluster/lib/sc/run_reserve -c node_join' on all cluster nodes.

If the failure occurred during the 'make_primary' transition, then a device group has failed to start on this node. If another node was available to host the device group, then it should have been started on that node. If desired, it may be possible to switch the device group to this node with the scswitch command. If no other node was available, then the device group will not have been started. The scswitch command may be used to retry the attempt to start the device group.

If the failure occurred during the 'primary_to_secondary' transition, then the shutdown or switchover of a device group has failed. The desired action may be retried.


880317 :scvxvmlg fatal error - %s does not exist, VxVM not installed?

Description:

The program responsible for maintaining the VxVM namespace was unable to access the local VxVM device namespace. If configuration changes were recently made to VxVM diskgroups or volumes, this node may be unaware of those changes. Recently created volumes may be inaccessible from this node.

Solution:

If VxVM is used to manage shared devices, it must be installed on all cluster nodes. If VxVM is installed on this node, but the local VxVM namespace does not exist, VxVM may have to be re-installed on this node. If VxVM is installed on this node and the local VxVM device namespace does exist, the namespace management can be run manually on this node by executing '/usr/cluster/lib/dcs/scvxvmlg' on this node. If the problem persists, please contact your authorized Sun service provider to determine whether a workaround or patch is available. If VxVM is not being used on this cluster, then no user action is required.


880835 :pmf_search_children: Error stopping <%s>: %s

Description:

An error occurred while rpc.pmfd attempted to send a KILL signal to one of the processes of the given tag. The reason for the failure is also given. rpc.pmfd attempted to kill the process because a previous error occurred while creating a monitor process for the process to be monitored.

Solution:

Save the /var/adm/messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.


883690 :Failed to start Monitor server.

Description:

Sun Cluster HA for Sybase failed to start the monitor server. Other syslog messages and the log file will provide additional information on possible reasons for the failure.

Solution:

Determine whether the server can be started manually. Examine the HA-Sybase log files, monitor server log files and setup.


884114 :clcomm: Adapter %s constructed

Description:

A network adapter has been initialized.

Solution:

No action required.


884438:A component of NFS did not start completely in %d seconds: prognum %lu, progversion %lu.

Description:

A daemon associated with the NFS service did not finish registering with RPC within the specified timeout.

Solution:

Increase the timeout associated with the method during which this failure occurred.
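
As one hedged illustration (the resource name, method, and timeout value below are assumptions, not values from this message), the timeout of the method that failed can be raised with scrgadm:

    # Raise the START method timeout for a hypothetical HA-NFS resource to 300 seconds
    scrgadm -c -j nfs-rs -y START_TIMEOUT=300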


884482 :clconf: Quorum device ID %ld is invalid. The largest supported ID is %ld

Description:

An invalid quorum device ID was found while converting the quorum configuration information into the quorum table.

Solution:

Check the quorum configuration information.


884823 :Prog <%s> step <%s>: stat of program file failed.

Description:

A step points to a file that is not executable. This may have been caused by incorrect installation of the package.

Solution:

Identify the program for the step. Check the permissions on the program. Reinstall the package if necessary.


884979 :(%s) aborting, but got a message of type %d

Description:

While processing a udlm abort, a message of the indicated type was received unexpectedly.

Solution:

None.


887666 :clcomm: sxdoor: op %d fcntl failed: %s

Description:

A user level process is unmarshalling a door descriptor and creating a new door. The specified fcntl operation failed. The "fcntl" man page describes possible error codes.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available.


887669 :clcomm: coalesce_region request(%d) > MTUsize(%d)

Description:

While supporting an invocation, the system wanted to create one buffer that could hold the data from two buffers. The system cannot create a big enough buffer. After generating another system error message, the system will panic. This message only appears on debug systems.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available.


888259 :clcomm: Path %s being deleted and cleaned

Description:

A communication link is being removed with another node. The interconnect may have failed or the remote node may be down.

Solution:

Any interconnect failure should be resolved, and/or the failed node rebooted.


889303:Failed to read from kstat:%s.

Description:

Sun Cluster HA for NFS fault monitor failed to look up the specified kstat parameter. The specific cause is logged with the message.

Solution:

Run the following command on the cluster node where this problem was encountered: /usr/bin/kstat -m nfs -i 0 -n nfs_server -s calls. Barring resource availability issues, this call should complete successfully. If it fails without generating any output, contact your authorized Sun service provider for assistance.


889899 :scha_control RESTART failed. error %s

Description:

The fault monitor has detected problems in the RDBMS server. The attempt to restart the RDBMS server on the same node failed. The error returned by the API call scha_control is indicated in the message.

Solution:

None.


890129 :dl_attach: DL_ERROR_ACK access error

Description:

Could not attach to the physical device. We are trying to open a fast path to the private transport adapters.

Solution:

Rebooting the node might fix the problem.


890927 :HA: repl_mgr_impl: thr_create failed

Description:

The system could not create the needed thread, because there is inadequate memory.

Solution:

There are two possible solutions. Install more memory. Alternatively, reduce memory usage.


891030 :resource group %s state on node %s change to RG_OFF_BOOTED

Description:

This is a notification from the rgmd that a resource group has completed running its resources' BOOT methods on the given node. This may be used by system monitoring tools.

Solution:

This is an informational message, no user action is needed.


891362 :scha_resource_open error (%d)

Description:

Error occurred in API call scha_resource_open.

Solution:

Check syslog messages for errors logged from other system modules. Stop and restart the fault monitor. If the error persists, disable the fault monitor and report the problem.
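
As a hedged sketch of restarting the fault monitor (the resource name is a hypothetical assumption, and the standard scswitch monitor-control flags are assumed), monitoring can be disabled and re-enabled as follows:

    # Disable the fault monitor for the resource
    scswitch -n -M -j oracle-rs

    # Re-enable the fault monitor for the resource
    scswitch -e -M -j oracle-rs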


891462 :in libsecurity caller is %d, not the desired uid %d

Description:

A server (rpc.pmfd, rpc.fed or rgmd) refused an rpc connection from a client because it has the wrong uid. The actual and desired uids are shown. An error message is output to syslog.

Solution:

Save the /var/adm/messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.


891920 :rgm_clear_util called on resource <%s> with unknown flag <%d>

Description:

An internal rgmd error has occurred while attempting to carry out an operator request to clear an error flag on a resource. The attempted clear action will fail.

Solution:

Since this problem might indicate an internal logic error in the rgmd, please save a copy of the /var/adm/messages files on all nodes, the output of an scstat -g command, and the output of a scrgadm -pvv command. Report the problem to your authorized Sun service provider.


892183 :libsecurity: NULL RPC to program %ld failed will not retry %s

Description:

A client of the rpc.pmfd, rpc.fed or rgmd server was unable to initiate an rpc connection, because it could not execute a test rpc call. The program will not retry because the time limit of 1 hour was exceeded. The message shows the specific rpc error. The program number is shown. To find out what program corresponds to this number, use the rpcinfo command. An error message is output to syslog.

Solution:

Save the /var/adm/messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.


893019 :%d-bit saposcol is running on %d-bit Solaris.

Description:

The architecture of saposcol is not compatible with the currently running Solaris version. For example, your configuration is not compatible if you have a 64-bit saposcol running on a 32-bit Solaris machine.

Solution:

Make sure the correct saposcol is installed on the cluster.


893095 :Service <%s> with path <%s> is not available. Retrying...

Description:

The service is not yet available. The prenet_start method of SUNW.HAStorage is still testing and waiting.

Solution:

No user action is required.


894418 :reservation warning(%s) - Found invalid key, preempting

Description:

The device fencing program has discovered an invalid scsi-3 key on the specified device and is removing it.

Solution:

This is an informational message, no user action is needed.


894711 :Could not resolve '%s' in the name server. Exiting.

Description:

clexecd program was unable to start due to an error in registering itself with the low-level clustering software.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available.


894800:Dependent hosts are not up.Not starting BV servers on $HOSTNAME.

Description:

The hosts in the startup order on which the specified hosts depend have not started.

Solution:

Bring the resource group containing the specified host online, if it is not running. If the resource group is online, no user action is required because the Sun Cluster HA for BroadVision One-To-One Enterprise probe should take appropriate action.


895149 :(%s) t_open: tli error: %s

Description:

Call to t_open() failed. The "t_open" man page describes possible error codes. udlmctl will exit.

Solution:

Save the contents of /var/adm/messages, /var/cluster/ucmm/ucmm_reconf.log and /var/cluster/ucmm/dlm*/*logs/* from all the nodes and contact your Sun service representative.


895159 :clcomm: solaris xdoor dup failed: %s

Description:

A dup operation failed. The "dup" man page describes possible error codes.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available.


895821 :INTERNAL ERROR: cannot get nodeid for node <%s>

Description:

The scha_control function is unable to obtain the node id number for one of the resource group's potential masters. This node will not be considered a candidate destination for the scha_control giveover.

Solution:

Try issuing an scstat(1M) -n command and see if it successfully reports status for all nodes. If not, then the cluster configuration data may be corrupted. If it does report all nodes correctly, then there may be an internal logic error in the rgmd. In either case, please save copies of the /var/adm/messages files on all nodes, and report the problem to your authorized Sun service provider.


896275 :CCR: Ignoring override field for table %s on joining node %s.

Description:

The override flag for a table indicates that the CCR should use this copy as the final version when the cluster is coming up. If the cluster already has a valid copy while the indicated node is joining the cluster, then the override flag on the joining node is ignored.

Solution:

This is an informational message, no user action is needed.


896441 :Unknown scalable service method code: %d.

Description:

The method code given is not a method code that was expected.

Solution:

Save a copy of the /var/adm/messages files on all nodes. Contact your authorized Sun service provider for assistance in diagnosing the problem.


896532 :Invalid variable name in Environment_file. Ignoring %s

Description:

HA-Sybase reads the Environment_file and exports the variables declared in it. The syntax for declaring a variable is VARIABLE=VALUE. Lines starting with "#" are treated as comments. Lines starting with "export" are ignored. VARIABLE is expected to be a valid Korn shell variable name that starts with a letter or "_" and contains only alphanumeric characters and "_".

Solution:

Please check the syntax and correct the Environment_file.
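
As a hypothetical illustration (the variable names and values below are assumptions, not values from your installation), a valid Environment_file might look like this:

    # Environment_file for a Sybase resource
    SYBASE=/opt/sybase
    DSQUERY=SYBSRV1
    LANG=C

A line such as "export SYBASE" would be ignored, and a line whose variable name does not follow the Korn shell naming rules would produce this message.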


896799 :INTERNAL ERROR: resource group <%s> is PENDING_BOOT or ERROR_STOP_FAILED, but contains no resources

Description:

The operator is attempting to delete the indicated resource group. Although the group is empty of resources, it was found to be in an unexpected state. This will cause the resource group deletion to fail.

Solution:

Use scswitch(1M) -z to switch the resource group offline on all nodes, then retry the deletion operation. Since this problem might indicate an internal logic error in the rgmd, please save a copy of the /var/adm/messages files on all nodes, and report the problem to your authorized Sun service provider.


897348 :%s: must be run in secure mode using -S flag

Description:

rpc.sccheckd should always be invoked in secure mode. If this message shows up, someone has modified the configuration files that affect server startup.

Solution:

Reinstall cluster packages or contact your service provider.


898001 :launch_fed_prog: getlocalhostname() failed for program <%s>

Description:

The ucmmd was unable to obtain the name of the local host. Launching of a method failed.

Solution:

Examine other syslog messages occurring at about the same time to see if the problem can be identified. Save a copy of the /var/adm/messages files on all nodes and contact your authorized Sun service provider for assistance in diagnosing the problem.


898738 :Aborting node because pm_tick delay of %lld ms exceeds %lld ms

Description:

The system has been unable to send heartbeats for a long time. (This limit is half of the minimum of the timeout values of all the paths. If the timeout values for all the paths are 10 seconds, then this value is 5 seconds.) There is probably heavy interrupt activity causing the clock thread to get delayed, which in turn causes irregular heartbeats. The node is aborted because it is considered to be in a 'sick' condition, and it is better to abort this node than to cause other nodes (or the cluster) to go down.

Solution:

Check to see what is causing high interrupt activity and configure the system accordingly.
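
As a hedged starting point for this diagnosis (these are generic Solaris observation commands, not steps prescribed by this message), interrupt activity can be examined as follows:

    # Report per-CPU statistics every 5 seconds; watch the intr and ithr columns
    mpstat 5

    # Report the number of interrupts taken per device
    vmstat -i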


899278 :Retry_count exceeded in Retry_interval

Description:

The fault monitor has detected problems in the RDBMS server. The number of restarts performed by the fault monitor exceeded the count specified by the 'Retry_count' parameter within 'Retry_interval'. The database server is unable to survive on this node, so the resource group is being switched over to another node.

Solution:

Please check the RDBMS setup and server configuration.


899305 :clexecd: Daemon exiting because child died.

Description:

Child process in the clexecd program is dead.

Solution:

If this message is seen when the node is shutting down, ignore the message. If that's not the case, the node will halt or reboot itself to prevent data corruption. Contact your authorized Sun service provider to determine whether a workaround or patch is available.


899648 :Failed to process the resource information.

Description:

A Sun Cluster data service is unable to retrieve the resource property information. Low memory or an API call failure might be the reason.

Solution:

In the case of low memory, the problem will probably be cured by rebooting. If the problem recurs, you might need to increase swap space by configuring additional swap devices. Otherwise, if it is an API call failure, check the syslog messages from other components.


899776 :ERROR: scha_control() was called on resource group <%s>, resource <%s> before the RGM started

Description:

This message most likely indicates that a program called scha_control(1ha,3ha) before the RGM had started up. Normally, scha_control is called by a resource monitor to request failover or restart of a resource group. If the RGM had not yet started up on the cluster, no resources or resource monitors should have been running on any node. The scha_control call will fail with a SCHA_ERR_CLRECONF error.

Solution:

On the node where this message appeared, confirm that rgmd was not yet running (i.e., the cluster was just booting up) when this message was produced. Find out what program called scha_control. If it was a customer-supplied program, this most likely represents an incorrect program behavior which should be corrected. If there is no such customer-supplied program, or if the cluster was not just starting up when the message appeared, contact your authorized Sun service provider for assistance in diagnosing the problem.