Sun Cluster 3.1 Error Messages Guide

Message IDs 800000–899999


800040 Error deleting PidFile <%s> (%s) for Apache service with apachectl file <%s>.

Description:

The data service was not able to delete the specified PidFile file.

Solution:

Delete the PidFile file manually and start the resource group.


800320 Fencing %s from shared disk devices.

Description:

A reservation has been performed to fence off nonmember nodes from disks that are shared between the cluster nodes.

Solution:

None.


801519 connect: %s

Description:

Solution:


802295 monitor_check: resource group <%s> changed while running MONITOR_CHECK methods

Description:

An internal error has occurred in the locking logic of the rgmd, such that a resource group was erroneously allowed to be edited while a failover was pending on it, causing the scha_control call to return early with an error. This in turn will prevent the attempted failover of the resource group from its current master to a new master. This should not occur and may indicate an internal logic error in the rgmd.

Solution:

Examine other syslog messages occurring at about the same time to see if the problem can be identified. Save a copy of the /var/adm/messages files on all nodes and contact your authorized Sun service provider for assistance in diagnosing and correcting the problem.


802539 No permission for owner to read %s.

Description:

The owner of the file does not have read permission on it.

Solution:

Set the permissions on the file so the owner can read it.
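The message does not say exactly how the data service tests the permission. The following is a minimal sketch that assumes it checks the owner read bit (S_IRUSR) in the file mode. It shows the check and the fix using Python's os and stat modules. The helper names owner_can_read and grant_owner_read are hypothetical. From a shell, chmod u+r <file> has the same effect as grant_owner_read.

```python
import os
import stat


def owner_can_read(path: str) -> bool:
    """Return True if the owner read bit (S_IRUSR) is set in the file mode."""
    mode = os.stat(path).st_mode
    return bool(mode & stat.S_IRUSR)


def grant_owner_read(path: str) -> None:
    """Add the owner read bit, leaving every other permission bit unchanged."""
    current = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, current | stat.S_IRUSR)
```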


803339 Prog <%s> step <%s>: program file is not executable.


803391 Could not validate the settings in %s. It is recommended that the settings for host lookup consult `files` before a name server.

Description:

The validation callback method failed to validate the hostname list. There may be a syntax error in the nsswitch.conf file.

Solution:

Check that the nsswitch.conf file follows these syntax rules: 1) The lookup order for "hosts" must include "files". 2) "cluster" is the only entry that can come before "files". 3) Everything between '[' and ']' is ignored. 4) Lines that begin with a whitespace character are illegal and are skipped. Correct the syntax in the nsswitch.conf file and try again.
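The rules above can be expressed as a small checker. This is a hedged Python sketch of those four rules only; it is not the validation callback itself. Several details are assumptions: the function name hosts_lookup_ok, the treatment of '#' as a comment character, and the requirement that the entry begin exactly with "hosts:".

```python
import re


def hosts_lookup_ok(nsswitch_text: str) -> bool:
    """Apply the four documented rules to the "hosts" entry of nsswitch.conf text."""
    for line in nsswitch_text.splitlines():
        # Rule 4: lines that begin with whitespace are skipped.
        if not line or line[0].isspace():
            continue
        # Assumption: '#' starts a comment that runs to the end of the line.
        line = line.split("#", 1)[0]
        # Assumption: the entry begins exactly with "hosts:".
        if not line.startswith("hosts:"):
            continue
        # Rule 3: everything between '[' and ']' is ignored.
        sources = re.sub(r"\[[^\]]*\]", " ", line[len("hosts:"):]).split()
        # Rule 1: "files" must appear in the lookup order.
        if "files" not in sources:
            return False
        # Rule 2: "cluster" is the only source allowed before "files".
        return all(src == "cluster" for src in sources[: sources.index("files")])
    # Assumption: text with no usable "hosts" entry fails validation.
    return False
```

For example, "hosts: cluster files nis [NOTFOUND=return]" passes, while "hosts: dns files" fails because dns comes before files.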


803570 lkcm_parm: invalid handle was passed %s %d

Description:

Solution:


803649 Failed to check whether the resource is a logical host resource.

Description:

While retrieving the IP addresses from the network resources in the resource group, the attempt to check whether the resource is a logical host resource or not has failed.

Solution:

An internal error or an API call failure might be the cause. Check the error messages that occurred just before this message. If there is an internal error, contact your authorized Sun service provider. For an API call failure, check the syslog messages from other components. The syslog tag identifies the resource name and resource group name.


803719 host %s failed, and clnt_spcreateerror returned NULL

Description:

The rgm is not able to establish an rpc connection to the rpc.fed server on the host shown, and the rpc error could not be read. An error message is output to syslog.

Solution:

Save the /var/adm/messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.


804457 Error reading properties; using old properties

Description:

Solution:


804658 clexecd: close returned %d while exec'ing (%s). Exiting.

Description:

The clexecd program encountered a failed close(2) system call. The error message indicates the error number for the failure.

Solution:

The clexecd program will exit and the node will be halted or rebooted to prevent data corruption. Contact your authorized Sun service provider to determine whether a workaround or patch is available.


804791 A warm restart of rpcbind may be in progress.

Description:

The HA-NFS probe detected that the rpcbind daemon is not running. However, it also detected that a warm restart of rpcbind is in progress.

Solution:

If a warm restart is indeed in progress, ignore this message. Otherwise, check whether the rpcbind daemon is running. If it is not, reboot the node. If the rpcbind process is not running and the Failover_mode property of the resource is set to HARD, the HA-NFS probe will reboot the node itself.


804820 clcomm: path_manager failed to create RT lwp (%d)

Description:

The system failed to create a real-time thread to support path manager heartbeats.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available.


805735 Failed to connect to the host <%s> and port <%d>.

Description:

An error occurred while the fault monitor attempted to connect to the specified hostname and port.

Solution:

Wait for the fault monitor to correct this by performing a restart or failover. For more error descriptions, look at the syslog messages.


805788 reservation fatal error(%s) - service_name not specified

Description:

The device fencing program has suffered an internal error.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available. Provide copies of /var/adm/messages from all nodes for diagnosis. Depending on the nature of the error, it may be possible to retry the failed operation.

If the message specifies the 'node_join' transition, this node may be unable to access shared devices. If the failure occurred during the 'release_shared_scsi2' transition, a node that was joining the cluster may be unable to access shared devices. In either case, it may be possible to reacquire access to shared devices by executing '/usr/cluster/lib/sc/run_reserve -c node_join' on all cluster nodes.

If the failure occurred during the 'make_primary' transition, a device group has failed to start on this node. If another node was available to host the device group, the device group should have been started on that node. If desired, it may be possible to switch the device group to this node with the scswitch command. If no other node was available, the device group will not have been started. The scswitch command may be used to retry the attempt to start the device group.

If the failure occurred during the 'primary_to_secondary' transition, the shutdown or switchover of a device group has failed. The desired action may be retried.


806365 monitor_check: getlocalhostname() failed for resource <%s>, resource group <%s>

Description:

While attempting to process a scha_control(1HA,3HA) call, the rgmd failed in an attempt to obtain the hostname of the local node. This is considered a MONITOR_CHECK method failure. This in turn will prevent the attempted failover of the resource group from its current master to a new master.

Solution:

Examine other syslog messages occurring at about the same time to see if the problem can be identified. Save a copy of the /var/adm/messages files on all nodes and contact your authorized Sun service provider for assistance in diagnosing and correcting the problem.


806618 Resource group name is null.

Description:

This is an internal error. While the resource information was being retrieved, a null value was returned for the resource group name.

Solution:

Save a copy of the /var/adm/messages files on all nodes. Contact your authorized Sun service provider for assistance in diagnosing the problem.


806902 clutil: Could not create lwp during respawn

Description:

There was insufficient memory to support this operation.

Solution:

Install more memory, increase swap space, or reduce peak memory consumption.


807015 Validation of URI %s failed

Description:

The validation of the URI entered in the monitor_uri_list extension property failed.

Solution:

Make sure that a valid URI is entered. Check the syslog and /var/adm/messages for the exact error. Fix the error and set the monitor_uri_list extension property again.


807249 CMM: Node %s (nodeid = %d) with votecount = %d removed.

Description:

The specified node with the specified votecount has been removed from the cluster.

Solution:

This is an informational message, no user action is needed.


808444 lkcm_reg: Unix DLM version (?) and the OSD library version (%d) are not compatible. Unix DLM versions acceptable to this library are: %d

Description:

UNIX DLM and Oracle DLM are not compatible. Compatible versions will be printed as part of this message.

Solution:

Check the installation procedure to make sure that you have the correct versions of Oracle DLM and UNIX DLM. Contact your Sun service representative if the versions cannot be resolved.


808746 Node id %d is higher than the maximum node id of %d in the cluster.

Description:

In one of the scalable networking properties, a node id was encountered that was higher than expected.

Solution:

Verify that the nodes listed in the scalable networking properties are still valid cluster members.


809322 Couldn't create deleted directory: error (%d)

Description:

The file system is unable to create temporary copies of deleted files.

Solution:

Mount the affected file system as a local file system, and ensure that there is no file system entry with name "._" at the root level of that file system. Alternatively, run fsck on the device to ensure that the file system is not corrupt.


809329 No adapter for node %s.

Description:

No IPMP group has been specified for this node.

Solution:

If this error message occurred during resource creation, supply valid adapter information and retry the operation. If this message occurred after resource creation, remove the LogicalHostname resource and recreate it with the correct IPMP group for each node that is a potential master of the resource group.


809554 Unable to access directory %s:%s.

Description:

An HA-NFS method attempted to access the specified directory but was unable to do so. The reason for the failure is also logged.

Solution:

If the directory is on a mounted file system, make sure that the file system is currently mounted. If the pathname of the directory is not what you expected, check whether the Pathprefix property of the resource group is set correctly. If this error occurs in any method other than VALIDATE, HA-NFS attempts to recover by either failing over to another node or, in the case of Stop and Postnet_stop, rebooting the node.


809858 ERROR: method <%s> timeout for resource <%s> is not an integer

Description:

The indicated resource method timeout, as stored in the CCR, is not an integer value. This might indicate corruption of CCR data or rgmd in-memory state. The method invocation will fail; depending on which method was being invoked and the Failover_mode setting on the resource, this might cause the resource group to fail over or move to an error state.

Solution:

Use scstat(1M) -g and scrgadm(1M) -pvv to examine resource properties. If the values appear corrupted, the CCR might have to be rebuilt. If values appear correct, this may indicate an internal error in the rgmd. Contact your authorized Sun service provider for assistance in diagnosing and correcting the problem.


809956 PCSEXIT: %s

Description:

The rpc.pmfd server was not able to monitor a process, and the system error is shown. An error message is output to syslog.

Solution:

Save the /var/adm/messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.


809985 Elements in Confdir_list and Port_list must be 1-1 mapping

Description:

The Confdir_list and Port_list properties must contain the same number of entries, thus maintaining a 1-1 mapping between the two.

Solution:

Using the appropriate scrgadm command, configure this resource to contain the same number of entries in the Confdir_list and the Port_list properties.
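The following Python helper illustrates what the 1-1 mapping means. It is a sketch that assumes entries are paired by position, so the first Confdir_list entry goes with the first Port_list entry, and so on. The helper either pairs the two lists or reports the length mismatch. The helper name and the example paths and ports are hypothetical.

```python
from typing import List, Tuple


def pair_confdirs_with_ports(
    confdir_list: List[str], port_list: List[str]
) -> List[Tuple[str, str]]:
    """Pair each Confdir_list entry with the Port_list entry at the same position.

    Raises ValueError when the lists differ in length, that is, when the
    1-1 mapping required by this message does not hold.
    """
    if len(confdir_list) != len(port_list):
        raise ValueError(
            f"Confdir_list has {len(confdir_list)} entries "
            f"but Port_list has {len(port_list)}"
        )
    return list(zip(confdir_list, port_list))
```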


810318 Unable to get the resource group handle: %s


810551 fatal: Unable to bind president to nameserver

Description:

The low-level cluster machinery has encountered a fatal error. The rgmd will produce a core file and will cause the node to halt or reboot to avoid the possibility of data corruption.

Solution:

Save a copy of the /var/adm/messages files on all nodes, and of the rgmd core file. Contact your authorized Sun service provider for assistance in diagnosing the problem.


811254 VALIDATE failed on resource <%s>, resource group <%s>

Description:

The resource's VALIDATE method exited with a non-zero exit code. This indicates that an attempted update of a resource or resource group is invalid.

Solution:

Examine syslog messages occurring just before this one to determine the cause of the validation failure. Retry the update.


811357 Successfully started BV servers on $HOSTNAME.

Description:

This is an informational message indicating that the BV servers on the specified host have started.

Solution:

No action needed.


811463 match_online_key failed strdup for (%s)

Description:

Call to strdup failed. The "strdup" man page describes possible reasons.

Solution:

Install more memory, increase swap space or reduce peak memory consumption.


812706 dl_attach: DL_OK_ACK protocol error

Description:

Could not attach to the private interconnect interface.

Solution:

Rebooting the node might fix the problem.


812742 read: %s

Description:

The rpc.fed server was not able to execute the read system call properly. The message contains the system error. The server will not capture the output from methods it runs.

Solution:

Save the /var/adm/messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.


813317 Failed to open the cluster handle: %s.

Description:

An internal error occurred while attempting to open a handle for an object.

Solution:

Save a copy of the /var/adm/messages files on all nodes. Contact your authorized Sun service provider for assistance in diagnosing the problem.


813831 reservation warning(%s) - MHIOCSTATUS error will retry in %d seconds

Description:

The device fencing program has encountered errors while trying to access a device. The failed operation will be retried.

Solution:

This is an informational message, no user action is needed.


813866 Property %s has no hostnames for resource %s.

Description:

The named property does not have any hostnames set for it.

Solution:

Re-create the named resource with one or more hostnames.


813977 Node %d is listed twice in property %s.

Description:

The node in the message was listed twice in the named property.

Solution:

Specify the property with only one occurrence of the node.
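Messages 813977, 818821, and 818836 all report a value that appears more than once in a list property. Below is a minimal Python sketch for finding such duplicates before you set the property. The helper duplicated_values is hypothetical. The sketch assumes the property's list has already been split into individual elements.

```python
from collections import Counter
from typing import Hashable, Iterable, List


def duplicated_values(values: Iterable[Hashable]) -> List[Hashable]:
    """Return every value that occurs more than once, in first-seen order."""
    counts = Counter(values)  # Counter keeps first-insertion order (Python 3.7+)
    return [value for value, n in counts.items() if n > 1]
```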


813990 Started the HA-NFS system fault monitor.

Description:

The HA-NFS system fault monitor was started successfully.

Solution:

No action required.


814232 fork() failed: %m.

Description:

The fork() system call failed for the given reason.

Solution:

If system resources are not available, consider rebooting the node.


814420 bind: %s

Description:

Solution:


814905 Could not start up DCS client because major numbers on this node do not match the ones on other nodes. See /var/adm/messages for previous errors.

Description:

Some drivers identified in previous messages do not have the same major number across cluster nodes, and devices owned by the driver are being used in global device services.

Solution:

Look in the /etc/name_to_major file on each cluster node to see if the major number for the driver matches across the cluster. If a driver is missing from the /etc/name_to_major file on some of the nodes, then most likely, the package the driver ships in was not installed successfully on all nodes. If this is the case, install that package on the nodes that don't have it. If the driver exists on all nodes but has different major numbers, see the documentation that shipped with this product for ways to correct this problem.
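Comparing the files by eye is error-prone on larger clusters. The hedged Python sketch below assumes you have already copied the contents of /etc/name_to_major from each node. The node names and the major numbers in the example are hypothetical. The sketch parses the "driver major" pairs and reports every driver whose major number differs between nodes or is missing on some node. A missing driver is shown as None.

```python
from typing import Dict, Optional


def parse_name_to_major(text: str) -> Dict[str, int]:
    """Parse /etc/name_to_major content: one "driver major" pair per line."""
    majors: Dict[str, int] = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank or malformed lines rather than failing on them.
        if len(fields) >= 2 and fields[1].isdigit():
            majors[fields[0]] = int(fields[1])
    return majors


def major_mismatches(per_node: Dict[str, str]) -> Dict[str, Dict[str, Optional[int]]]:
    """Map each driver whose major number is missing or differs on some node
    to its per-node major number (None where the driver is absent)."""
    tables = {node: parse_name_to_major(text) for node, text in per_node.items()}
    drivers = set().union(*tables.values())
    problems: Dict[str, Dict[str, Optional[int]]] = {}
    for driver in sorted(drivers):
        seen = {node: table.get(driver) for node, table in tables.items()}
        if len(set(seen.values())) > 1:
            problems[driver] = seen
    return problems
```

For example, suppose node1 lists "sd 32" and "did 239", and node2 lists "sd 32" and "did 240". The result is {"did": {"node1": 239, "node2": 240}}.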


815551 System property %s with value %s has an empty list element.

Description:

One of the list elements of the named system property has an empty value.

Solution:

Set the property to a value in which every list element is non-empty.
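The following minimal Python sketch locates empty elements, such as the middle one in "a,,b". It assumes that the list property value is comma-separated. The helper name is hypothetical. Treating whitespace-only elements as empty is also an assumption.

```python
from typing import List


def empty_element_positions(value: str, separator: str = ",") -> List[int]:
    """Return the zero-based positions of empty elements in a delimited list value.

    Assumption: whitespace-only elements are treated as empty.
    """
    return [i for i, element in enumerate(value.split(separator)) if not element.strip()]
```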


815833 Malformed property value pair %s.


816002 The port number %d from entry %s in property %s was not for a nonsecure port.

Description:

The Netscape Directory Server instance has been configured as nonsecure, but the port number given in the list property is for a secure port.

Solution:

Remove the entry from the list or change its port number to correspond to a nonsecure port.


816578 Node %u attempting to join cluster has incompatible cluster software. %s not compatible with %s

Description:

A node is attempting to join the cluster but it is either using an incompatible software version or is booted in a different mode (32-bit vs. 64-bit).

Solution:

Ensure that all nodes have the same clustering software installed and are booted in the same mode.


817592 HA: rma::admin_impl failed to bind

Description:

An HA framework component failed to register with the name server.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available.


818821 Value %d is listed twice in property %s.

Description:

The value listed occurs twice in the named property.

Solution:

Specify the property with only one occurrence of the value.


818824 HA: rma::reconf can't talk to RM

Description:

An HA framework component failed to register with the Replica Manager.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available.


818836 Value %s is listed twice in property %s.

Description:

The value listed occurs twice in the named property.

Solution:

Specify the property with only one occurrence of the value.


819642 fatal: unable to register RPC service; aborting node

Description:

The rgmd was unable to start up successfully because it failed to register an RPC service. It will produce a core file and will force the node to halt or reboot.

Solution:

If rebooting the node doesn't fix the problem, examine other syslog messages occurring at about the same time to see if the problem can be identified and if it recurs. Save a copy of the /var/adm/messages files on all nodes and contact your authorized Sun service provider for assistance.


819721 Failed to start %s.

Description:

Sun Cluster could not start the application. It will attempt to start the service on another node if possible.

Solution:

1) Check prior syslog messages for specific problems and correct them. 2) This problem may occur when the cluster is under load and Sun Cluster cannot start the application within the specified timeout period. Consider increasing the Start_timeout property. 3) If the resource was unable to start on any node, the resource will be in the START_FAILED state. In this case, use scswitch to bring the resource ONLINE on this node. 4) If the service was successfully started on another node, attempt to restart the service on this node by using scswitch. 5) If the above steps do not help, disable the resource by using scswitch. Check whether the application can run outside of the Sun Cluster framework. If it cannot, fix any problems specific to the application until it can. Enable the resource by using scswitch. If the application runs outside of the Sun Cluster framework but not in response to starting the data service, contact your authorized Sun service provider for assistance in diagnosing the problem.


819738 Property %s is not set - %s.

Description:

The property is required, but it has not been set.

Solution:

Reissue the scrgadm command with the required property and value.


820394 Cannot check online status. Server processes are not running.

Description:

HA-Oracle could not check the online status of the Oracle server. The Oracle server processes are not running.

Solution:

Examine the 'Connect_string' property of the resource. Make sure that the user ID and password specified in the connect string are correct and that the user has been granted permission to connect to the server. Check whether the Oracle server can be started manually. Examine the log files and setup.


821304 Failed to retrieve the resource group information.

Description:

A Sun Cluster data service failed to retrieve the resource group property information. Low memory or an API call failure might be the cause.

Solution:

In case of low memory, rebooting will probably cure the problem. If the problem recurs, you might need to increase swap space by configuring additional swap devices. If the cause is an API call failure, check the syslog messages from other components.


821781 Fencing shared disk groups: %s

Description:

A reservation failfast will be set so nodes which share these disk groups will be brought down if they are fenced off by other nodes.

Solution:

None.


822385 Failed to retrieve process monitor facility tag.

Description:

Failed to create the tag that is used to register with the process monitor facility.

Solution:

Check the syslog messages that occurred just before this message. In case of an internal error, save the /var/adm/messages file and contact your authorized Sun service provider.


824468 Invalid probe values.

Description:

The values of the system-defined properties Retry_count and Retry_interval are not consistent with the value of the Thorough_Probe_Interval property.

Solution:

Change the values of the properties to satisfy the following relationship: Thorough_Probe_Interval * Retry_count <= Retry_interval.
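The relationship is simple arithmetic. For example, if Thorough_Probe_Interval is set to 60 seconds and Retry_count is set to 2, Retry_interval must be at least 60 * 2 = 120 seconds. A minimal Python sketch of the same check follows. The function names are hypothetical, and the values are illustrative rather than defaults.

```python
def probe_values_consistent(
    thorough_probe_interval: int, retry_count: int, retry_interval: int
) -> bool:
    """True when Thorough_Probe_Interval * Retry_count <= Retry_interval."""
    return thorough_probe_interval * retry_count <= retry_interval


def min_retry_interval(thorough_probe_interval: int, retry_count: int) -> int:
    """Smallest Retry_interval (in seconds) that satisfies the relationship."""
    return thorough_probe_interval * retry_count
```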


824550 clcomm: Invalid flow control parameters

Description:

The flow control policy is controlled by a set of parameters, and these parameters do not satisfy the guidelines. Another message from validay_policy will already have identified the specific problem.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available.


824861 Resource %s named in property %s is not a SharedAddress resource.

Description:

The resource given for the named property is not a SharedAddress resource. All resources for that property must be SharedAddress resources.

Solution:

Specify only SharedAddress resources for the named property.


825274 idl_scha_control_checkall(): IDL Exception on node <%d>

Description:

During a failover attempt, the scha_control function was unable to check the health of the indicated node, because of an error in inter-node communication. This was probably caused by the death of the indicated node during scha_control execution. The RGM will still attempt to master the resource group on another node, if available.

Solution:

No action is required; the rgmd should recover automatically. Identify what caused the node to die by examining syslog output. The syslog output might indicate further remedial actions.


826050 Failed to retrieve the cluster property %s for %s: %s.

Description:

The query for a property failed. The reason for the failure is given in the message.

Solution:

Save a copy of the /var/adm/messages files on all nodes. Contact your authorized Sun service provider for assistance in diagnosing the problem.


826353 Unable to open /dev/console: %s

Description:

While starting up, one of the rgmd daemons was not able to open /dev/console. The message contains the system error. This will prevent the daemon from starting on this node.

Solution:

Examine other syslog messages occurring at about the same time to see if the problem can be identified. Save a copy of the /var/adm/messages files on all nodes and contact your authorized Sun service provider for assistance in diagnosing and correcting the problem.


826397 Invalid values for probe related parameters.

Description:

Validation of the probe-related parameters failed because invalid values are specified for these parameters.

Solution:

Retry_interval must be greater than or equal to the product of Thorough_probe_interval and Retry_count. Use scrgadm(1M) to modify the values of these parameters so that they satisfy this relationship.


826747 reservation error(%s) - do_scsi3_inkeys() error for disk %s

Description:

The device fencing program has encountered errors while trying to access a device. All retry attempts have failed.

Solution:

For the user action required by this message, see the user action for message 192619.


827525 reservation message(%s) - Fencing other node from disk %s

Description:

The device fencing program is taking access to the specified device away from a non-cluster node.

Solution:

This is an informational message, no user action is needed.


828140 Starting %s.

Description:

Sun Cluster is starting the specified application.

Solution:

This is an informational message, no user action is needed.


828170 CCR: Unrecoverable failure during updating table %s.

Description:

CCR encountered an unrecoverable error while updating the indicated table on this node.

Solution:

The node needs to be rebooted. Also contact your authorized Sun service provider to determine whether a workaround or patch is available.


828171 stat of file %s failed.

Description:

Status of the named file could not be obtained.

Solution:

Check the permissions of the file and all components in the path prefix.
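To stat a file, a process needs search (execute) permission on every directory in the file's path. A failure can therefore come from any path component. The hedged Python sketch below stats each prefix of the path, starting from the root, and reports the first prefix that fails. The helper first_failing_component is hypothetical.

```python
import os
from typing import Optional, Tuple


def first_failing_component(path: str) -> Optional[Tuple[str, str]]:
    """Stat each prefix of the path from the root down.

    Returns (prefix, reason) for the first prefix whose stat fails, or None
    when every component, including the file itself, can be stat'ed.
    """
    prefix = os.sep
    for part in os.path.abspath(path).split(os.sep):
        if not part:
            continue  # skip the empty string produced by the leading separator
        prefix = os.path.join(prefix, part)
        try:
            os.stat(prefix)
        except OSError as exc:
            return prefix, exc.strerror or str(exc)
    return None
```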


828283 clconf: No memory to read quorum configuration table

Description:

Could not allocate memory while converting the quorum configuration information into the quorum table.

Solution:

This is an unrecoverable error, and the cluster needs to be rebooted. Also contact your authorized Sun service provider to determine whether a workaround or patch is available.


828407 WARNING: lkcm_sync failed: unknown message type %d

Description:

A message of unknown type was sent to udlm. The message will be ignored.

Solution:

None.


828474 resource group %s property changed.

Description:

This is a notification from the rgmd that the operator has edited a property of a resource group. This may be used by system monitoring tools.

Solution:

This is an informational message, no user action is needed.


828739 transition '%s' timed out for cluster, forcing reconfiguration.

Description:

Step transition failed. A reconfiguration will be initiated.

Solution:

Save the contents of /var/adm/messages, /var/cluster/ucmm/ucmm_reconf.log and /var/cluster/ucmm/dlm*/*logs/* from all the nodes and contact your Sun service representative.


829117 scha_control GIVEOVER failed. error %d

Description:

The fault monitor detected problems in the RDBMS server. An attempt to switch over the resource to another node failed. The error returned by the scha_control API call is indicated in the message.

Solution:

None.


829132 scha_control GIVEOVER failed. error %d

Description:

The fault monitor detected problems in the RDBMS server. An attempt to switch over the resource to another node failed. The error returned by the scha_control API call is indicated in the message.

Solution:

None.


829262 Switchover (%s) error: cannot find clexecd

Description:

The file system specified in the message could not be hosted on the node the message came from. Check to see if the user program "clexecd" is running on that node.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available.


829384 INTERNAL ERROR: launch_method: state machine attempted to launch invalid method <%s> (method <%d>) for resource <%s>; aborting node

Description:

An internal error occurred when the rgmd attempted to launch an invalid method for the named resource. The rgmd will produce a core file and will force the node to halt or reboot.

Solution:

Look for other syslog error messages on the same node. Save a copy of the /var/adm/messages files on all nodes, and report the problem to your authorized Sun service provider.


830211 Failed to accept connection on socket: %s.

Description:

While determining the health of the data service, the fault monitor failed to communicate with the process monitor facility.

Solution:

This is an internal error. Save the /var/adm/messages file and contact your authorized Sun service provider. For more details about the error, check the syslog messages.


831036 Service object [%s, %s, %d] created in group '%s'

Description:

A specific service has been created in the designated group. The service is identified by a three-tuple that serves as its unique service access point (SAP) name.

Solution:

This is an informational message, no user action is needed.


833126 Monitor server successfully started.

Description:

The Sybase Monitor server has been successfully started by Sun Cluster HA for Sybase.

Solution:

This is an informational message, no user action is needed.


833212 Attempting to start the data service under process monitor facility.

Description:

The function is going to request the PMF to start the data service. If the request fails, refer to the syslog messages that appear after this message.

Solution:

This is an informational message, no user action is required.


833229 Couldn't remove deleted directory file, '%s' error: (%d)

Description:

The file system is unable to create temporary copies of deleted files.

Solution:

Mount the affected file system as a local file system, and ensure that there is no file system entry with name "._" at the root level of that file system. Alternatively, run fsck on the device to ensure that the file system is not corrupt.


833729 Setup error. Unable to monitor database.

Description:

The fault monitor is unable to continue monitoring the database. This error can result from an incorrect setup, such as a wrong password for the fault monitor user or incorrect database access permissions. It can also result from an internal error in the fault monitor. The fault monitor stops after logging this syslog message. More information and error codes are available in other syslog messages that the fault monitor logs before this message.

Solution:

Check the syslog messages logged by the fault monitor. After correcting the setup, restart the fault monitor by disabling and then re-enabling monitoring of the resource: run 'scswitch -n -M -j <resource>', followed by 'scswitch -e -M -j <resource>'.


833970 clcomm: getrlimit(RLIMIT_NOFILE): %s

Description:

During cluster initialization within this user process, the getrlimit call failed with the specified error.

Solution:

Read the man page for getrlimit for a more detailed description of the error.
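Python's resource module wraps the same getrlimit(2) call. This makes it a convenient way to see the soft and hard RLIMIT_NOFILE limits that a process runs with on the node. The following is a minimal sketch for inspection only, and the resource module is available only on Unix systems. The sketch does not reproduce the cluster's initialization code, and the helper name open_file_limits is hypothetical.

```python
import resource
from typing import Tuple


def open_file_limits() -> Tuple[int, int]:
    """Return the (soft, hard) RLIMIT_NOFILE limits of the calling process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft, hard


if __name__ == "__main__":
    soft, hard = open_file_limits()
    # resource.RLIM_INFINITY stands for "no limit".
    print("soft:", "unlimited" if soft == resource.RLIM_INFINITY else soft)
    print("hard:", "unlimited" if hard == resource.RLIM_INFINITY else hard)
```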


834530 Failed to parse xml: invalid element %s

Description:

Solution:


834589 Error while executing scsblconfig.

Description:

There was an error while attempting to execute (source) the specified file. This may be due to improper permissions, or improper settings in this file.

Solution:

Please verify that the file has correct permissions. If permissions are correct, verify all the settings in this file. Try to manually source this file in korn shell ('. scsblconfig'), and correct any errors.


836593 Received a connect request from a node not configured in the cluster. Nodeid %u ipaddr 0x%x

Description:

CCR tables are temporarily out of sync.


837169 Starting listener %s.

Description:

Informational message. HA-Oracle is starting the Oracle listener.

Solution:

None.


837211 Resource is already online.

Description:

An error occurred while attempting to restart the resource. The resource is already online.

Solution:

This is an internal error. Save the /var/adm/messages file from all the nodes. Contact your authorized Sun service provider.


837752 Failed to retrieve the resource group handle for %s while querying for property %s: %s.

Description:

Access to the object named failed. The reason for the failure is given in the message.

Solution:

Save a copy of the /var/adm/messages files on all nodes. Contact your authorized Sun service provider for assistance in diagnosing the problem.


837760 monitored processes forked failed (errno=%d)

Description:

The rpc.pmfd server was not able to start (fork) the application, probably due to low memory, and the system error number is shown. An error message is output to syslog.

Solution:

Investigate whether the machine is running out of memory. If it is not, save the /var/adm/messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.


838695 Unable to process client registration

Description:

Solution:


839641 t_alloc (reqp): %s

Description:

Call to t_alloc() failed. The "t_alloc" man page describes possible error codes. udlm will exit and the node will abort.

Solution:

This is an internal error. Save the contents of /var/adm/messages, /var/cluster/ucmm/ucmm_reconf.log and /var/cluster/ucmm/dlm*/*logs/* from all the nodes and contact your Sun service representative.


839649 t_alloc (resp): %s

Description:

Call to t_alloc() failed. The "t_alloc" man page describes possible error codes. udlm will exit and the node will abort.

Solution:

This is an internal error. Save the contents of /var/adm/messages, /var/cluster/ucmm/ucmm_reconf.log and /var/cluster/ucmm/dlm*/*logs/* from all the nodes and contact your Sun service representative.


839881 Media error encountered, but Auto_end_bkp failed.

Description:

The HA-Oracle start method identified that one or more datafiles are in need of recovery. This was caused by the file(s) being left in hot backup mode. The Auto_end_bkp extension property is enabled, but the attempt to recover the database failed.

Solution:

Examine the log files for the cause of the failure to recover the database.


839936 Some ip addresses may not be plumbed.

Description:

Some of the IP addresses managed by the LogicalHostname resource were not successfully brought online on this node.

Solution:

Use the ifconfig command to verify that the IP addresses are indeed absent. Check for any error messages logged before this one for a more precise reason for this failure. Use scswitch to move the resource group to another node.


840542 OFF_PENDING_BOOT: bad resource state <%s> (%d) for resource <%s>

Description:

The rgmd state machine has discovered a resource in an unexpected state on the local node. This should not occur and may indicate an internal logic error in the rgmd.

Solution:

Look for other syslog error messages on the same node. Save a copy of the /var/adm/messages files on all nodes, and report the problem to your authorized Sun service provider.


840619 Invalid value was returned for resource group property %s for %s.

Description:

The value returned for the named property was not valid.

Solution:

Save a copy of the /var/adm/messages files on all nodes. Contact your authorized Sun service provider for assistance in diagnosing the problem.


840696 DNS database directory %s is not readable: %s

Description:

The DNS database directory is not readable. This may be due to the directory not existing or the permissions not being set properly.

Solution:

Make sure the directory exists and has read permission set appropriately. Look at the prior syslog messages for any specific problems and correct them.


841616 CMM: This node has been preempted from quorum device %s.

Description:

This node's reservation key was on the specified quorum device, but is no longer present, implying that this node has been preempted by another cluster partition. If a cluster gets divided into two or more disjoint subclusters, exactly one of these must survive as the operational cluster. The surviving cluster forces the other subclusters to abort by grabbing enough votes to grant it majority quorum. This is referred to as preemption of the losing subclusters.
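
The survival rule above comes down to majority arithmetic: a partition keeps running only if it holds more than half of the configured quorum votes. The following Python sketch illustrates that check, assuming that every node and quorum device contributes an integer vote count; `has_majority_quorum` is a hypothetical helper, not a Sun Cluster interface.

```python
def has_majority_quorum(partition_votes: int, total_configured_votes: int) -> bool:
    """Return True if a partition holds a strict majority of all configured votes.

    Hypothetical helper for illustration; it is not part of Sun Cluster.
    """
    if total_configured_votes <= 0:
        raise ValueError("total_configured_votes must be positive")
    if not 0 <= partition_votes <= total_configured_votes:
        raise ValueError("partition_votes must be between 0 and the total")
    # A strict majority needs at least floor(total / 2) + 1 votes,
    # so two disjoint partitions can never both satisfy it.
    return partition_votes >= total_configured_votes // 2 + 1


# Example (assumed configuration): two nodes with one vote each and one
# quorum device with one vote, for three configured votes in total.
print(has_majority_quorum(2, 3))  # partition that reserved the quorum device
print(has_majority_quorum(1, 3))  # partition that was preempted
```

Because the votes held by disjoint partitions can never sum to more than the total, at most one partition can pass this test, which is why exactly one subcluster survives.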

Solution:

Other related messages may indicate why quorum was lost. Determine why quorum was lost on this node, resolve the problem, and reboot this node.


841719 listener %s is not running. restart limit reached. Stopping fault monitor.

Description:

The listener is not running. The listener monitor has reached the restart limit specified by the 'Retry_count' and 'Retry_interval' properties and will be stopped.

Solution:

Check the Oracle listener setup. Make sure that the Listener_name specified in the resource property is configured in the listener.ora file. Check the 'Host' property of the listener in the listener.ora file. Examine the log file and syslog messages for additional information. Stop and start the listener monitor.


841875 remote node died

Description:

An inter-node communication failed because another cluster node died.

Solution:

No action is required. The cluster will reconfigure automatically. Examine syslog output on the rebooted node to determine the cause of node death.


842059 Cannot create monitor child process. fork failed with %m

Description:

The fault monitor is not able to create a child process. The fault monitor will be restarted. If the problem persists, the fault monitor will be stopped.


842313 clexecd: Sending fd on common channel returned %d. Exiting.

Description:

The clexecd program has encountered a failed fcntl(2) system call. The error message indicates the error number for the failure.

Solution:

The node will halt or reboot itself to prevent data corruption. Contact your authorized Sun service provider to determine whether a workaround or patch is available.


842382 fcntl: %s

Description:

A server (rpc.pmfd or rpc.fed) was not able to execute the action shown, and the process associated with the tag is not started. The error message is shown.

Solution:

Save the /var/adm/messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.


842712 clcomm: solaris xdoor door_create failed

Description:

A door_create operation failed. Refer to the "door_create" man page for more information.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available.


843013 Data service failed to stay up. Start method failed.

Description:

The data service may have failed to start up completely.

Solution:

Look in /var/adm/messages for the cause of failure. Save a copy of the /var/adm/messages files on all nodes. Contact your authorized Sun service provider for assistance in diagnosing the problem.


843070 Failed to disconnect from port %d of resource %s.

Description:

An error occurred while the fault monitor attempted to disconnect from the specified hostname and port.

Solution:

Wait for the fault monitor to correct this by performing a restart or failover. For more detailed error descriptions, check the syslog messages.


843093 fatal: Got error <%d> trying to read CCR when enabling monitor of resource <%s>; aborting node

Description:

Rgmd failed to read updated resource from the CCR on this node.

Solution:

Save a copy of the /var/adm/messages files on all nodes, and of the rgmd core file. Contact your authorized Sun service provider for assistance in diagnosing the problem.


843876 Media error encountered, and Auto_end_bkp was successful.

Description:

The HA-Oracle start method identified that one or more datafiles were in need of recovery. This was caused by the file(s) being left in hot backup mode. The Auto_end_bkp extension property is enabled, and it successfully recovered and opened the database.

Solution:

None. This is an informational message. Oracle server is online.


843978 Socket creation failed: %s.

Description:

The system is unable to create a socket.

Solution:

This might be the result of a lack of system resources. Check whether the system is low on memory and take appropriate action. For specific error information, check the syslog messages.


843983 CMM: Node %s: attempting to join cluster.

Description:

The specified node is attempting to become a member of the cluster.

Solution:

This is an informational message, no user action is needed.


845866 Failover attempt failed: %s.

Description:

The attempt to fail over the resource was rejected or encountered an error.

Solution:

For a more detailed error message, check the syslog messages. Check whether the Pingpong_interval property has an appropriate value. If not, adjust it using scrgadm(1M). Otherwise, use scswitch to switch the resource group to a healthy node.


845977 Failed to parse xml: low memory

Description:

Solution:


846053 Fast path enable failed on %s%d, could cause path timeouts

Description:

DLPI fast path could not be enabled on the device.

Solution:

Check if the right version of the driver is in use.


846376 fatal: Got error <%d> trying to read CCR when making resource group <%s> unmanaged; aborting node

Description:

Rgmd failed to read updated resource from the CCR on this node.

Solution:

Save a copy of the /var/adm/messages files on all nodes, and of the rgmd core file. Contact your authorized Sun service provider for assistance in diagnosing the problem.


846420 CMM: Nodes %ld and %ld are disconnected from each other; node %ld will abort using %s rule.

Description:

Due to a connection failure between the two specified non-local nodes, one of the nodes must be halted to avoid a "split brain" configuration. The CMM used the specified rule to decide which node to fail. The rules are:
rebootee: If one node is rebooting and the other was a member of the cluster, the node that is rebooting must abort.
quorum: The node with greater control of quorum device votes survives and the other node aborts.
node number: The node with the higher node number aborts.
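
The three rules can be read as an ordered decision procedure. The Python sketch below assumes that the rules are tried in the order listed and that each node's state reduces to a rebooting flag and a count of controlled quorum device votes; the `NodeView` type and `node_to_abort` function are hypothetical illustrations, not the CMM's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class NodeView:
    node_id: int
    rebooting: bool    # True if the node is still rebooting into the cluster
    quorum_votes: int  # quorum device votes this node controls


def node_to_abort(a: NodeView, b: NodeView) -> tuple[int, str]:
    """Return (node_id_to_abort, rule_name) for two disconnected nodes.

    Assumes the rules are applied in the order the message lists them.
    """
    # rebootee: a rebooting node loses to a node that was already a member.
    if a.rebooting != b.rebooting:
        loser = a if a.rebooting else b
        return loser.node_id, "rebootee"
    # quorum: the node controlling more quorum device votes survives.
    if a.quorum_votes != b.quorum_votes:
        loser = a if a.quorum_votes < b.quorum_votes else b
        return loser.node_id, "quorum"
    # node number: the node with the higher node number aborts.
    loser = a if a.node_id > b.node_id else b
    return loser.node_id, "node number"


# Node 2 is still rebooting, so the rebootee rule selects it.
print(node_to_abort(NodeView(1, False, 0), NodeView(2, True, 0)))
```

The node number rule acts only as a final deterministic tie-breaker, so both nodes reach the same decision without exchanging messages.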

Solution:

The cause of the failure should be resolved and the node should be rebooted if node failure is unexpected.


846813 Switchover (%s) error (%d) converting to primary

Description:

The file system specified in the message could not be hosted on the node the message came from.

Solution:

Check /var/adm/messages to make sure there were no device errors. If not, contact your authorized Sun service provider to determine whether a workaround or patch is available.


847065 Failed to start listener %s.

Description:

HA-Oracle failed to start the Oracle listener.

Solution:

Check the Oracle listener setup. Make sure that the Listener_name specified in the resource property is configured in the listener.ora file. Check the 'Host' property of the listener in the listener.ora file. Examine the log file and syslog messages for additional information.


847124 getnetconfigent: %s

Description:

The call to getnetconfigent in udlm port setup failed. udlm fails to start, and the node will eventually panic.

Solution:

This is an internal error. Save the contents of /var/adm/messages, /var/cluster/ucmm/ucmm_reconf.log and /var/cluster/ucmm/dlm*/*logs/* from all the nodes and contact your Sun service representative.


847496 CMM: Reading reservation keys from quorum device %s failed with error %d.

Description:

The specified error was encountered while trying to read reservation keys on the specified quorum device.

Solution:

There may be other related messages on this and other nodes connected to this quorum device that may indicate the cause of this problem. Refer to the quorum disk repair section of the administration guide for resolving this problem.


847656 Command %s is not executable.

Description:

The specified pathname, which was passed to a libdsdev routine such as scds_timerun or scds_pmf_start, does not refer to an executable file. This could be the result of 1) misconfiguring the name of a START or MONITOR_START method or other property, 2) a programming error made by the resource type developer, or 3) a problem with the specified pathname in the file system itself.

Solution:

Ensure that the pathname refers to a regular, executable file.
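
One way to perform this check from a script is sketched below in Python. It tests the condition stated in the message (a regular file that the caller may execute); `is_regular_executable` is a hypothetical helper and is not taken from libdsdev, which may perform different checks.

```python
import os
import stat


def is_regular_executable(path: str) -> bool:
    """Return True if path names a regular file that this process may execute.

    Hypothetical helper; symbolic links are followed, as exec(2) would do.
    """
    try:
        st = os.stat(path)
    except OSError:
        return False  # missing file, bad path component, or no search permission
    return stat.S_ISREG(st.st_mode) and os.access(path, os.X_OK)
```

A directory, a missing file, or a regular file without any execute bit all return False, which covers the three causes listed in the description.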


847809 Must be in cluster to start %s

Description:

The machine on which this command or daemon is running is not part of a cluster.

Solution:

Run the command on another machine, or make this machine part of a cluster by following the appropriate steps.


847916 (%s) netdir error: uaddr2taddr: %s

Description:

Call to uaddr2taddr() failed. The "uaddr2taddr" man page describes possible error codes. udlmctl will exit.

Solution:

Save the contents of /var/adm/messages, /var/cluster/ucmm/ucmm_reconf.log and /var/cluster/ucmm/dlm*/*logs/* from all the nodes and contact your Sun service representative.


847978 reservation fatal error(UNKNOWN) - cluster_get_quorum_status() error, returned %d

Description:

The device fencing program has suffered an internal error.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available. Copies of /var/adm/messages from all nodes should be provided for diagnosis. It may be possible to retry the failed operation, depending on the nature of the error. If the message specifies the 'node_join' transition, then this node may be unable to access shared devices. If the failure occurred during the 'release_shared_scsi2' transition, then a node which was joining the cluster may be unable to access shared devices. In either case, it may be possible to reacquire access to shared devices by executing '/usr/cluster/lib/sc/run_reserve -c node_join' on all cluster nodes. If the failure occurred during the 'make_primary' transition, then a device group has failed to start on this node. If another node was available to host the device group, then it should have been started on that node. If desired, it may be possible to switch the device group to this node with the scswitch command. If no other node was available, then the device group will not have been started. The scswitch command may be used to retry the attempt to start the device group. If the failure occurred during the 'primary_to_secondary' transition, then the shutdown or switchover of a device group has failed. The desired action may be retried.


847994 Plumb failed. tried to unplumb %s%d, unplumb failed with rc %d

Description:

The Topology Manager failed to plumb an adapter for the private network. One possible reason for the failure is that the adapter is already plumbed. Solaris Clustering tried to unplumb the adapter and plumb it for private use, but the unplumb operation failed.

Solution:

Check whether an adapter by that name exists.


848033 SharedAddress online.

Description:

The status of the SharedAddress resource is online.

Solution:

This is an informational message, no user action is needed.


848652 CMM aborting.

Description:

The node is going down due to a decision by the cluster membership monitor.

Solution:

This message is preceded by other messages indicating the specific cause of the abort, and the documentation for these preceding messages will explain what action should be taken. The node should be rebooted if node failure is unexpected.


848854 Failed to retrieve WLS extension properties.

Description:

The WLS Extension properties could not be retrieved.

Solution:

Check syslog and /var/adm/messages for other messages that give details of the failure.


848943 clconf: No valid gdevname field for quorum device %d

Description:

While converting the quorum configuration information into the quorum table, the system found that the gdevname field for the quorum device is incorrect.

Solution:

Check the quorum configuration information.


849856 sigemptyset: %s

Description:

The rpc.pmfd server was not able to initialize a signal set. The message contains the system error. This happens while the server is starting up, at boot time. The server does not come up, and an error message is output to syslog.

Solution:

Save the syslog messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.


850108 Validation failed. PARAMETER_FILE: %s does not exist

Description:

Oracle parameter file (typically init<sid>.ora) specified in property 'Parameter_file' does not exist or is not readable.

Solution:

Please make sure that the 'Parameter_file' property is set to an existing Oracle parameter file. Reissue the command to create or update the resource.


852212 reservation message(%s) - Taking ownership of disk %s away from non-cluster node

Description:

The device fencing program is taking access to the specified device away from a non-cluster node.

Solution:

This is an informational message, no user action is needed.


852497 scvxvmlg error - readlink(%s) failed

Description:

The program responsible for maintaining the VxVM namespace was unable to access the global device namespace. If configuration changes were recently made to VxVM diskgroups or volumes, this node may be unaware of those changes. Recently created volumes may be inaccessible from this node.

Solution:

Verify that the /global/.devices/node@N (N = this node's node number) is mounted globally and is accessible. If no configuration changes have been recently made to VxVM diskgroups or volumes and all volumes continue to be accessible from this node, then no further action is required. If changes have been made, the device namespace on this node can be updated to reflect those changes by executing '/usr/cluster/lib/dcs/scvxvmlg'. If the problem persists, contact your authorized Sun service provider to determine whether a workaround or patch is available.


852615 reservation error(%s) - Unable to gain access to device '%s'

Description:

The device fencing program has encountered errors while trying to access a device.

Solution:

Another cluster node has fenced this node from the specified device, preventing this node from accessing that device. Access should have been reacquired when this node joined the cluster, but that process must have encountered problems. If the message specifies the 'node_join' transition, this node will be unable to access the specified device. If the failure occurred during the 'make_primary' transition, then this node will be unable to access the specified device, and a device group containing the specified device may have failed to start on this node. An attempt can be made to acquire access to the device by executing '/usr/cluster/lib/sc/run_reserve -c node_join' on this node. If a device group failed to start on this node, the scswitch command can be used to start the device group on this node if access can be reacquired. If the problem persists, please contact your authorized Sun service provider to determine whether a workaround or patch is available.


853478 Received non interrupt heartbeat on %s - path timeouts are likely.

Description:

Solaris Clustering requires network drivers to deliver heartbeat messages in interrupt context. A heartbeat message has unexpectedly arrived in non-interrupt context.

Solution:

Check if the right version of the driver is in use.


853956 INTERNAL ERROR: WLS extension properties structure is NULL.

Description:

This is an internal error.

Solution:

Save a copy of the /var/adm/messages files on all nodes. Contact your authorized Sun service provider for assistance in diagnosing the problem.


854468 failfast arm error: %d

Description:

An error occurred during the failfast device arm operation.

Solution:

This is an internal error. Save the contents of /var/adm/messages, /var/cluster/ucmm/ucmm_reconf.log and /var/cluster/ucmm/dlm*/*logs/* from all the nodes and contact your Sun service representative.


854792 clcomm: error in copyin for cl_change_threads_min

Description:

The system failed a copy operation supporting a flow control state change.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available.


854894 No LogicalHostname resource in resource group.

Description:

The probe method for this data service could not find a LogicalHostname resource in the same resource group as the data service.

Solution:

Use scrgadm to configure the resource group to hold both the data service and the LogicalHostname.


856492 waitpid() failed: %m.

Description:

The waitpid() system call failed for the given reason.

Solution:

Save a copy of the /var/adm/messages files on all nodes. Contact your authorized Sun service provider for assistance in diagnosing the problem.


856880 Desired_primaries for resource group %s should be 0. Current value is %d.

Description:

The number of desired primaries for this resource group should be zero. Otherwise, in the event that a node dies or joins the cluster, the resource group might come online on some node even if it was previously switched offline and was intended to remain offline.

Solution:

Set the Desired_primaries property for the resource group to zero.


856919 INTERNAL ERROR: process_resource: resource group <%s> is pending_methods but contains resource <%s> in STOP_FAILED state

Description:

During a resource creation, deletion, or update, the rgmd has discovered a resource in STOP_FAILED state. This may indicate an internal logic error in the rgmd, since updates are not permitted on the resource group until the STOP_FAILED error condition is cleared.

Solution:

Look for other syslog error messages on the same node. Save a copy of the /var/adm/messages files on all nodes, and report the problem to your authorized Sun service provider.


857573 scvxvmlg error - rmdir(%s) failed

Description:

The program responsible for maintaining the VxVM namespace was unable to access the global device namespace. If configuration changes were recently made to VxVM diskgroups or volumes, this node may be unaware of those changes. Recently created volumes may be inaccessible from this node.

Solution:

Verify that the /global/.devices/node@N (N = this node's node number) is mounted globally and is accessible. If no configuration changes have been recently made to VxVM diskgroups or volumes and all volumes continue to be accessible from this node, then no further action is required. If changes have been made, the device namespace on this node can be updated to reflect those changes by executing '/usr/cluster/lib/dcs/scvxvmlg'. If the problem persists, contact your authorized Sun service provider to determine whether a workaround or patch is available.


857792 UNIX DLM initiating cluster abort.

Description:

Because of an error, the UNIX DLM is initiating an abort.

Solution:

This is an internal error. Save the contents of /var/adm/messages, /var/cluster/ucmm/ucmm_reconf.log and /var/cluster/ucmm/dlm*/*logs/* from all the nodes and contact your Sun service representative.


858256 Stopping %s with command %s.

Description:

Sun Cluster is stopping the specified application with the specified command.

Solution:

This is an informational message, no user action is needed.


859126 System property %s is empty.

Description:

The system property that was named does not have a value.

Solution:

Assign the property a value.


861738 Error: unknown error code\n

Description:

Solution:


862493 in libsecurity could not register on any transport in NETPATH

Description:

A server (rpc.pmfd, rpc.fed, or rgmd) was not able to start because it could not establish an RPC connection for the specified network, as it could not find any transport. An error message is output to syslog. This happens either because no transports are available at all, or because none of the available transports is a loopback transport.

Solution:

Save the /var/adm/messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.


862716 sema_init: %s

Description:

The rpc.pmfd server was not able to initialize a semaphore, possibly due to low memory, and the system error is shown. The server does not perform the action requested by the client, and pmfadm returns error. An error message is also output to syslog.

Solution:

Investigate whether the machine is running out of memory. If it is not, save the /var/adm/messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.


862999 Siebel server components maybe unavailable or offline. No action will be taken.

Description:

Not all of the enabled Siebel server components are running.

Solution:

This is an informational message. The fault monitor will not take any action. Please manually start any Siebel component(s) that may have gone down to ensure complete service.


865183 Cannot open pipe to child process. pipe() failed with %m

Description:

The fault monitor is not able to communicate with its child process. The fault monitor will be restarted. If the problem persists, the fault monitor will be stopped.


865292 File %s should be owned by %s.

Description:

A program required the specified file to be owned by the specified user.

Solution:

Use the chown command to change the owner as suggested.
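
Before running chown, the current ownership can be confirmed with a check like the following Python sketch. `is_owned_by` is a hypothetical helper: it compares the file's numeric owner with the UID that the local password database reports for the expected user name.

```python
import os
import pwd


def is_owned_by(path: str, user: str) -> bool:
    """Return True if path is owned by the named user.

    Hypothetical helper; raises OSError if path cannot be stat'ed.
    """
    try:
        expected_uid = pwd.getpwnam(user).pw_uid
    except KeyError:
        return False  # the user name is not known to this node
    return os.stat(path).st_uid == expected_uid
```

Comparing UIDs rather than names avoids surprises when several names map to the same UID; an unknown user name is reported as a mismatch rather than an error.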


865635 lkcm_act: caller is not registered

Description:

udlm is not currently registered with ucmm.

Solution:

This is an internal error. Save the contents of /var/adm/messages, /var/cluster/ucmm/ucmm_reconf.log and /var/cluster/ucmm/dlm*/*logs/* from all the nodes and contact your Sun service representative.


866371 The listener %s is not running; retry_count <%s> exceeded. Attempting to switchover resource group.

Description:

The listener is not running. The listener monitor has reached the restart limit specified by the 'Retry_count' and 'Retry_interval' properties. The listener and its resource group will be moved to another node.

Solution:

Check the Oracle listener setup. Make sure that the Listener_name specified in the resource property is configured in the listener.ora file. Check the 'Host' property of the listener in the listener.ora file. Examine the log file and syslog messages for additional information.


866624 clcomm: validate_policy: threads_low not big enough low %d pool %d

Description:

The system checks the proposed flow control policy parameters at system startup and when processing a change request. The low server thread level must not be less than twice the thread increment level for resource pools whose number of threads varies dynamically.
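
The rule stated above reduces to a single inequality. This Python sketch expresses it with hypothetical parameter names (`threads_low`, `threads_increment`, `dynamic_pool`); the actual clcomm policy structure and field names are not described in this message.

```python
def threads_low_is_valid(threads_low: int, threads_increment: int, dynamic_pool: bool) -> bool:
    """Apply the stated rule: in a pool whose thread count varies dynamically,
    the low thread level must be at least twice the thread increment.

    Hypothetical helper; parameter names are illustrative only.
    """
    if not dynamic_pool:
        return True  # the rule, as described, applies only to dynamic pools
    return threads_low >= 2 * threads_increment


print(threads_low_is_valid(8, 4, True))  # 8 >= 2 * 4, accepted
print(threads_low_is_valid(6, 4, True))  # 6 < 2 * 4, would trigger this message
```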

Solution:

No user action required.


867059 Could not shutdown replica for device service (%s). Some file system replicas that depend on this device service may already be shutdown. Future switchovers to this device service will not succeed unless this node is rebooted.

Description:

See message.

Solution:

If mounts or node reboots are in progress at the time this message is displayed, wait for that activity to complete, and then retry the command to shut down the device service replica. If not, contact your authorized Sun service provider to determine whether a workaround or patch is available.


868245 Unable to process dbms log file.

Description:

An error occurred while processing the DBMS log file. As a result, the fault monitor could not scan the log file for errors. This error can occur as a result of memory allocation problems.


868467 Process %s did not die in %d seconds.

Description:

HA-NFS attempted to stop the specified process ID but was unable to stop the process in a timely fashion. Because HA-NFS uses the SIGKILL signal to kill processes, this indicates a serious overload or kernel problem on the system.

Solution:

HA-NFS will take appropriate action. If this error occurs in a STOP method, the node will be rebooted. Increase the timeout on the appropriate method.
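
The stop pattern described in this entry (send SIGKILL, then wait a bounded time for the process to exit) can be sketched in Python as follows. This is an illustration built on the standard `subprocess` module, not the HA-NFS STOP method; `kill_and_wait` is a hypothetical name, and it only applies to processes the caller started itself.

```python
import subprocess


def kill_and_wait(proc: subprocess.Popen, timeout_s: float) -> bool:
    """Send SIGKILL to proc and wait up to timeout_s seconds for it to exit.

    Returns True if the process exited in time, False otherwise.
    """
    proc.kill()  # On POSIX systems, Popen.kill() sends SIGKILL.
    try:
        proc.wait(timeout=timeout_s)  # Also reaps the child, so no zombie remains.
        return True
    except subprocess.TimeoutExpired:
        # SIGKILL cannot be caught or ignored, so a process that survives it
        # is typically blocked inside the kernel or the system is overloaded.
        return False
```

A False result corresponds to the situation this message reports; raising the method timeout simply gives the wait more time before giving up.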


869196 Failed to get IPMP status for group %s (request failed with %d).

Description:

A query to get the state of an IPMP group failed. This may cause a method failure to occur.

Solution:

Make sure the network monitoring daemon (pnmd) is running. Save a copy of the /var/adm/messages files on all nodes. Contact your authorized Sun service provider for assistance in diagnosing the problem.


869406 Failed to communicate with server %s port %d: %s.

Description:

The data service fault monitor probe was trying to read from or write to the service specified and failed. Sun Cluster will attempt to correct the situation by either doing a restart or a failover of the data service. The problem may be due to an overloaded system or other problems, causing a timeout to occur before communications could be completed.

Solution:

If this problem is due to an overloaded system, you may consider increasing the Probe_timeout property.
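
A probe of this kind connects, optionally exchanges a request and a reply, and treats any step that does not finish within the timeout as a failure. The Python sketch below shows that pattern with the standard `socket` module; `probe_tcp` is a hypothetical helper, and its `probe_timeout` argument merely stands in for the Probe_timeout property rather than reading it.

```python
import socket


def probe_tcp(host: str, port: int, probe_timeout: float, request: bytes = b"") -> bool:
    """Return True if the service at host:port answers within probe_timeout.

    Connects, and if request is non-empty, sends it and waits for at least
    one byte of reply. The timeout applies to each socket operation
    separately, not to the exchange as a whole.
    """
    try:
        with socket.create_connection((host, port), timeout=probe_timeout) as sock:
            if request:
                sock.sendall(request)
                if not sock.recv(1):
                    return False  # peer closed the connection without replying
        return True
    except OSError:
        # Covers refused connections, resets, and timeouts (socket.timeout
        # is a subclass of OSError).
        return False
```

On a heavily loaded server, a larger timeout lets the same exchange succeed, which is the effect of increasing Probe_timeout. A probe with a strict total time budget would also need to track elapsed time across operations.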


870181 Failed to retrieve the resource handle for %s while querying for property %s: %s.

Description:

Access to the named object failed. The reason for the failure is given in the message.

Solution:

Save a copy of the /var/adm/messages files on all nodes. Contact your authorized Sun service provider for assistance in diagnosing the problem.


870317 INTERNAL ERROR: START method is not registered for resource <%s>

Description:

A non-fatal internal error has occurred in the rgmd state machine.

Solution:

Since this problem might indicate an internal logic error in the rgmd, please save a copy of the /var/adm/messages files on all nodes, the output of an scstat -g command, and the output of an scrgadm -pvv command. Report the problem to your authorized Sun service provider.


870566 clutil: Scheduling class %s not configured

Description:

An attempt to change the thread scheduling class failed, because the scheduling class was not configured.

Solution:

Configure the system to support the desired thread scheduling class.


871642 Validation failed. Invalid command line %s %s

Description:

Unable to process parameters passed to the call back method. This is an internal error.

Solution:

Please report this problem.


872086 Service is degraded.

Description:

The probe has detected a failure in the data service and is setting the resource's status to degraded.

Solution:

Wait for the fault monitor to restart the data service. Check the syslog messages and configuration of the data service.


872599 Error in getting service name for device path <%s>

Description:

Cannot map the device path to a valid global service name.

Solution:

Check the path passed into extension property "ServicePaths" of SUNW.HAStorage type resource.


872839 Resource is already stopped.

Description:

Sun Cluster attempted to stop the resource, but found it already stopped.

Solution:

No user action required.


874012 Command %s timed out. Will continue to start up liveCache.

Description:

The listed command timed out. HA-liveCache will continue to start up liveCache.

Solution:

This is an informational message. HA-liveCache will continue to start up liveCache, and no immediate action is required. This could be caused by heavy system load. However, if the system load is not heavy, check the installation and configuration of liveCache. Make sure the listed command can be run manually on the system.


874167 Multi-IP group '%s' updated

Description:

The Multi-IP group by that name was modified.

Solution:

This is an informational message, no user action is needed.


874879 clcomm: Path %s being deleted

Description:

A communication link is being removed with another node. The interconnect may have failed or the remote node may be down.

Solution:

Any interconnect failure should be resolved, and/or the failed node rebooted.


875171 clcomm: Pathend %p: %d is not a pathend state

Description:

The system maintains state information about a path. The state information is invalid.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available.


875345 None of the shared paths in file %s are valid.

Description:

All the paths specified in the dfstab.<resource_name> file are invalid.

Solution:

Check that those paths are valid. This might be the result of an underlying disk failure that has left a file system unavailable. As a result, the monitor_check method fails and the HA-NFS resource is not brought online on this node. However, the file system should be brought online as soon as possible.
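
To find out which entries are at fault, the share commands in the dfstab file can be parsed and each path tested. The Python sketch below assumes that each entry is a `share` command of the form `share [-F fstype] [-o options] [-d description] pathname`, and it treats a path as valid if it is an existing directory. The helper names are hypothetical, and HA-NFS may apply different or additional checks.

```python
import os
import shlex

# Options of the share command assumed to take a separate argument.
OPTIONS_WITH_ARGUMENT = {"-F", "-o", "-d"}


def shared_paths(dfstab_text: str) -> list[str]:
    """Return the pathname operand of every share command in dfstab_text."""
    paths = []
    for line in dfstab_text.splitlines():
        # shlex handles quoted descriptions and strips '#' comments.
        tokens = shlex.split(line, comments=True)
        if not tokens or tokens[0] != "share":
            continue
        args = tokens[1:]
        i = 0
        while i < len(args):
            if args[i] in OPTIONS_WITH_ARGUMENT:
                i += 2  # skip the option and its argument
            elif args[i].startswith("-"):
                i += 1  # other flags are assumed to take no argument
            else:
                paths.append(args[i])  # first operand is the pathname
                break
    return paths


def valid_shared_paths(dfstab_text: str) -> list[str]:
    """Return only the shared paths that exist as directories on this node."""
    return [p for p in shared_paths(dfstab_text) if os.path.isdir(p)]
```

If `valid_shared_paths` returns an empty list for the resource's dfstab file, every entry points at a missing or unmounted location, which matches the condition this message reports.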


875595 CMM: Shutdown timer expired. Halting.

Description:

The node could not complete its shutdown sequence within the halt timeout, and is aborting to enable another node to safely take over its services.

Solution:

This is an informational message, no user action is needed.


875796 CMM: Reconfiguration callback timed out; node aborting.

Description:

One or more CMM client callbacks timed out and the node will be aborted.

Solution:

There may be other related messages on this node which may help diagnose the problem. Resolve the problem and reboot the node if node failure is unexpected. If unable to resolve the problem, contact your authorized Sun service provider to determine whether a workaround or patch is available.


875939 ERROR: Failed to initialize callbacks for Global_resources_used, error code <%d>

Description:

The rgmd encountered an error while trying to initialize the Global_resources_used mechanism on this node. This is not considered a fatal error, but probably means that method timeouts will not be suspended while a device service is failing over. This could cause unneeded failovers of resource groups when device groups are switched over.

Solution:

Save a copy of the /var/adm/messages files on all nodes. Contact your authorized Sun service provider for assistance in diagnosing the problem. This error might be cleared by rebooting the node.


876090 fatal: must be superuser to start %s

Description:

The rgmd can only be executed by the superuser.

Solution:

This probably occurred because a non-root user attempted to start the rgmd manually. Normally, the rgmd is started automatically when the node is booted.


876324 CCR: CCR transaction manager failed to register with the cluster HA framework.

Description:

The CCR transaction manager failed to register with the cluster HA framework.

Solution:

This is an unrecoverable error, and the cluster needs to be rebooted. Also contact your authorized Sun service provider to determine whether a workaround or patch is available.


876485 No execute permissions to the file %s.

Description:

Execute permission is not set on the specified file.

Solution:

Set execute permission on this file.
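As a minimal sketch, assuming that owner execute permission (the equivalent of `chmod u+x`) is what the data service requires, the following Python adds that bit while leaving all other permission bits unchanged. The function name `ensure_owner_execute` is hypothetical and not part of Sun Cluster.

```python
import os
import stat


def ensure_owner_execute(path: str) -> bool:
    """Add owner execute permission (u+x) to `path` if it is missing.

    Returns True if the mode was changed, False if the bit was already set.
    """
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode & stat.S_IXUSR:
        return False
    # Keep the existing permission bits and add only owner execute.
    os.chmod(path, mode | stat.S_IXUSR)
    return True
```

For a file with mode 644, the call leaves it at 744; calling it again changes nothing.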


876834 Could not start server

Description:

HA-Oracle failed to start the Oracle server. Syslog messages and the log file will provide additional information on possible reasons for the failure.

Solution:

Check whether the Oracle server can be started manually. Examine the log files and the setup.


877905 ff_ioctl: %s

Description:

A server (rpc.pmfd or rpc.fed) was not able to arm or disarm the failfast device, which ensures that the host aborts if the server dies. The error message is shown. An error message is output to syslog.

Solution:

Save the /var/adm/messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.


878089 fatal: realloc: %s (UNIX error %d)

Description:

The rgmd failed to allocate memory, most likely because the system has run out of swap space. The rgmd will produce a core file and will force the node to halt or reboot to avoid the possibility of data corruption.

Solution:

The problem was probably cured by rebooting. If the problem recurs, you might need to increase swap space by configuring additional swap devices. See swap(1M) for more information.


878135 WARNING: udlm_update_from_saved_msg

Description:

There is no saved message to update udlm.

Solution:

None. This is a warning only.


878447 Multi-IP group '%s' created

Description:

A Multi-IP group with that name has been created.

Solution:

This is an informational message; no user action is needed.


879106 Failed to complete command %s. Will continue to start up liveCache.

Description:

The listed command failed to complete. HA-liveCache will continue to start up liveCache.

Solution:

Look for other syslog error messages on the same node. Save a copy of the /var/adm/messages files on all nodes, and report the problem to your authorized Sun service provider.


879380 pmf_monitor_children: Error stopping <%s>: %s

Description:

An error occurred while rpc.pmfd attempted to send a KILL signal to one of the processes of the given tag. The reason for the failure is also given. rpc.pmfd attempted to kill the process because a previous error occurred while creating a monitor process for the process to be monitored.

Solution:

Save the /var/adm/messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.


879511 reservation fatal error(%s) - service_class not specified

Description:

The device fencing program has suffered an internal error.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available, and provide copies of /var/adm/messages from all nodes for diagnosis. Depending on the nature of the error, it may be possible to retry the failed operation.

If the message specifies the 'node_join' transition, this node may be unable to access shared devices. If the failure occurred during the 'release_shared_scsi2' transition, a node that was joining the cluster may be unable to access shared devices. In either case, it may be possible to reacquire access to shared devices by executing '/usr/cluster/lib/sc/run_reserve -c node_join' on all cluster nodes.

If the failure occurred during the 'make_primary' transition, a device group has failed to start on this node. If another node was available to host the device group, the device group should have been started on that node; if desired, it may be possible to switch the device group to this node with the scswitch command. If no other node was available, the device group will not have been started, and the scswitch command may be used to retry the attempt to start it.

If the failure occurred during the 'primary_to_secondary' transition, the shutdown or switchover of a device group has failed. The desired action may be retried.


880317 scvxvmlg fatal error - %s does not exist, VxVM not installed?

Description:

The program responsible for maintaining the VxVM namespace was unable to access the local VxVM device namespace. If configuration changes were recently made to VxVM disk groups or volumes, this node may be unaware of those changes. Recently created volumes may be inaccessible from this node.

Solution:

If VxVM is used to manage shared devices, it must be installed on all cluster nodes. If VxVM is installed on this node but the local VxVM namespace does not exist, VxVM may have to be reinstalled on this node. If VxVM is installed on this node and the local VxVM device namespace does exist, namespace management can be run manually on this node by executing '/usr/cluster/lib/dcs/scvxvmlg'. If the problem persists, contact your authorized Sun service provider to determine whether a workaround or patch is available. If VxVM is not being used on this cluster, no user action is required.


880651 No hostnames specified.

Description:

An attempt was made to create a Network resource without specifying a hostname.

Solution:

At least one hostname must be specified with the -l option to scrgadm(1M).


880835 pmf_search_children: Error stopping <%s>: %s

Description:

An error occurred while rpc.pmfd attempted to send a KILL signal to one of the processes of the given tag. The reason for the failure is also given. rpc.pmfd attempted to kill the process because a previous error occurred while creating a monitor process for the process to be monitored.

Solution:

Save the /var/adm/messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.


883690 Failed to start Monitor server.

Description:

Sun Cluster HA for Sybase failed to start the monitor server. Other syslog messages and the log file will provide additional information on possible reasons for the failure.

Solution:

Check whether the server can be started manually. Examine the HA-Sybase log files, the monitor server log files, and the setup.


884114 clcomm: Adapter %s constructed

Description:

A network adapter has been initialized.

Solution:

No action required.


884438 A component of NFS did not start completely in %d seconds: prognum %lu, progversion %lu.

Description:

A daemon associated with NFS service did not finish registering with RPC within the specified timeout.

Solution:

Increase the timeout associated with the method during which this failure occurred.


884482 clconf: Quorum device ID %ld is invalid. The largest supported ID is %ld

Description:

An invalid quorum device ID was found while converting the quorum configuration information into the quorum table.

Solution:

Check the quorum configuration information.


884821 Unparsable registration

Description:

Solution:


884823 Prog <%s> step <%s>: stat of program file failed.

Description:

A step points to a file that is not executable. This may have been caused by incorrect installation of the package.

Solution:

Identify the program for the step. Check the permissions on the program. Reinstall the package if necessary.


884979 (%s) aborting, but got a message of type %d

Description:

The udlm received an unexpected message of the indicated type while it was aborting.

Solution:

None.


887138 Extension property <Child_mon_level> has a value of <%d>

Description:

Resource property Child_mon_level is set to the given value.

Solution:

This is an informational message; no user action is needed.


887282 Mode for file %s needs to be %03o

Description:

The file needs to have the indicated mode.

Solution:

Set the mode of the file correctly.
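The following Python sketch shows one way to compare a file's permission bits with the required mode and report the mismatch in the same `%03o` format the message uses (three octal digits, zero-padded). It assumes an exact comparison of the permission bits; the real data-service check is not documented here, and the names `check_mode` and `fix_mode` are hypothetical.

```python
import os
import stat
from typing import Optional


def check_mode(path: str, required: int) -> Optional[str]:
    """Return a message in the style of 887282 if the permission bits of
    `path` differ from `required`, or None if they already match."""
    actual = stat.S_IMODE(os.stat(path).st_mode)
    if actual != required:
        # %03o prints the mode as at least three octal digits, e.g. 0o44 -> "044".
        return "Mode for file %s needs to be %03o" % (path, required)
    return None


def fix_mode(path: str, required: int) -> None:
    """Set the permission bits of `path` to exactly `required`."""
    os.chmod(path, required)
```

From a shell, `chmod` with the octal mode shown in the message has the same effect as `fix_mode`.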


887666 clcomm: sxdoor: op %d fcntl failed: %s

Description:

A user-level process is unmarshalling a door descriptor and creating a new door. The specified fcntl operation failed. The "fcntl" man page describes possible error codes.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available.


887669 clcomm: coalesce_region request(%d) > MTUsize(%d)

Description:

While supporting an invocation, the system wanted to create one buffer that could hold the data from two buffers. The system cannot create a big enough buffer. After generating another system error message, the system will panic. This message only appears on debug systems.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available.


888259 clcomm: Path %s being deleted and cleaned

Description:

A communication link with another node is being removed. The interconnect may have failed or the remote node may be down.

Solution:

Any interconnect failure should be resolved, and/or the failed node rebooted.


889303 Failed to read from kstat:%s

Description:

See 176151

Solution:

See 176151


889884 scha_control RESTART failed. error %d

Description:

The fault monitor detected problems in the RDBMS server. An attempt to restart the RDBMS server on the same node failed. The error returned by the scha_control API call is indicated in the message.

Solution:

None.


889899 scha_control RESTART failed. error %d

Description:

The fault monitor detected problems in the RDBMS server. An attempt to restart the RDBMS server on the same node failed. The error returned by the scha_control API call is indicated in the message.

Solution:

None.


890129 dl_attach: DL_ERROR_ACK access error

Description:

The system could not attach to the physical device while trying to open a fast path to the private transport adapters.

Solution:

Rebooting the node might fix the problem.


890413 %s: state transition from %s to %s

Description:

A state transition has happened for the IPMP group. Transition to DOWN happens when all adapters in an IPMP group are determined to be faulty.

Solution:

If an IPMP group transitions to the DOWN state, check for error messages about faulty adapters and take the suggested user actions accordingly. No user action is needed for other state transitions.


890927 HA: repl_mgr_impl: thr_create failed

Description:

The system could not create the needed thread, because there is inadequate memory.

Solution:

There are two possible solutions. Install more memory. Alternatively, reduce memory usage.


891362 scha_resource_open error (%d)

Description:

An error occurred in the scha_resource_open API call.

Solution:

Check syslog messages for errors logged from other system modules. Stop and restart the fault monitor. If the error persists, disable the fault monitor and report the problem.


891424 Starting %s with command %s.

Description:

Sun Cluster is starting the specified application with the specified command.

Solution:

This is an informational message; no user action is needed.


891462 in libsecurity caller is %d, not the desired uid %d

Description:

A server (rpc.pmfd, rpc.fed or rgmd) refused an rpc connection from a client because it has the wrong uid. The actual and desired uids are shown. An error message is output to syslog.

Solution:

Save the /var/adm/messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.


892183 libsecurity: NULL RPC to program %ld failed will not retry %s

Description:

A client of the rpc.pmfd, rpc.fed or rgmd server was not able to initiate an rpc connection, because it could not execute a test rpc call. The program will not retry because the time limit of 1 hr was exceeded. The message shows the specific rpc error. The program number is shown. To find out what program corresponds to this number, use the rpcinfo command. An error message is output to syslog.

Solution:

Save the /var/adm/messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.
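The retry behavior described above (keep retrying the test call, then give up once the 1-hour time limit has passed) can be illustrated with a small Python sketch. It is not the libsecurity code: the function name, the fixed delay between attempts, and the injectable `clock` and `sleep` parameters are hypothetical and exist only to make the idea concrete and testable.

```python
import time
from typing import Callable


def retry_until_deadline(attempt: Callable[[], bool],
                         limit_s: float = 3600.0,
                         delay_s: float = 5.0,
                         clock: Callable[[], float] = time.monotonic,
                         sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call `attempt` until it returns True or `limit_s` seconds have passed.

    Returns True on success, False once the time limit is reached.
    """
    start = clock()
    while True:
        if attempt():
            return True
        # Stop retrying once the overall time limit has been used up.
        if clock() - start >= limit_s:
            return False
        sleep(delay_s)
```

Passing a fake clock and sleep function lets the one-hour limit be exercised without actually waiting.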


893019 %d-bit saposcol is running on %d-bit Solaris.

Description:

The architecture of saposcol is not compatible with the currently running Solaris version. For example, a 64-bit saposcol is running on a 32-bit Solaris machine, or vice versa.

Solution:

Make sure the correct saposcol is installed on the cluster.
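To see which variant of saposcol is installed, you can inspect its ELF header: byte 4 of the header (EI_CLASS) is 1 for a 32-bit object and 2 for a 64-bit object. The Python sketch below reads that byte; the result can then be compared with the output of `isainfo -b`, which on Solaris prints the number of bits in the native address space. The function names are hypothetical, and the location of the saposcol binary depends on your installation.

```python
def elf_bits(header: bytes) -> int:
    """Return 32 or 64 based on the EI_CLASS byte of an ELF header."""
    if len(header) < 5 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class = header[4]
    if ei_class == 1:  # ELFCLASS32
        return 32
    if ei_class == 2:  # ELFCLASS64
        return 64
    raise ValueError("unknown EI_CLASS value %d" % ei_class)


def binary_bits(path: str) -> int:
    """Read the first five bytes of the binary at `path` and classify it."""
    with open(path, "rb") as f:
        return elf_bits(f.read(5))
```

If `binary_bits()` on the installed saposcol does not match the number printed by `isainfo -b`, install the matching saposcol.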


893095 Service <%s> with path <%s> is not available. Retrying...

Description:

The service is not available yet. The prenet_start method of SUNW.HAStorage is still testing and waiting.

Solution:

No user action is required.


894418 reservation warning(%s) - Found invalid key, preempting

Description:

The device fencing program has discovered an invalid scsi-3 key on the specified device and is removing it.

Solution:

This is an informational message; no user action is needed.


894711 Could not resolve '%s' in the name server. Exiting.

Description:

clexecd program was unable to start due to an error in registering itself with the low-level clustering software.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available.


894800 Dependent hosts are not up. Not starting BV servers on $HOSTNAME.

Description:

The hosts that the specified host depends on in the startup order have not started.

Solution:

Bring the resource group containing the specified host online if it is not yet running. If the resource group is already online, the probe will take appropriate action.


895149 (%s) t_open: tli error: %s

Description:

Call to t_open() failed. The "t_open" man page describes possible error codes. udlmctl will exit.

Solution:

Save the contents of /var/adm/messages, /var/cluster/ucmm/ucmm_reconf.log and /var/cluster/ucmm/dlm*/*logs/* from all the nodes and contact your Sun service representative.


895159 clcomm: solaris xdoor dup failed: %s

Description:

A dup operation failed. The "dup" man page describes possible error codes.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available.


895821 INTERNAL ERROR: cannot get nodeid for node <%s>

Description:

The scha_control function is unable to obtain the node id number for one of the resource group's potential masters. This node will not be considered a candidate destination for the scha_control giveover.

Solution:

Try issuing an scstat(1M) -n command and see whether it successfully reports status for all nodes. If it does not, the cluster configuration data may be corrupted. If it does, there may be an internal logic error in the rgmd. In either case, save copies of the /var/adm/messages files on all nodes, and report the problem to your authorized Sun service provider.


896275 CCR: Ignoring override field for table %s on joining node %s.

Description:

The override flag for a table indicates that the CCR should use this copy as the final version when the cluster is coming up. If the cluster already has a valid copy while the indicated node is joining the cluster, then the override flag on the joining node is ignored.

Solution:

This is an informational message; no user action is needed.


896441 Unknown scalable service method code: %d.

Description:

The method code given is not a method code that was expected.

Solution:

Save a copy of the /var/adm/messages files on all nodes. Contact your authorized Sun service provider for assistance in diagnosing the problem.


896532 Invalid variable name in Environment_file. Ignoring %s

Description:

HA-Sybase reads the Environment_file and exports the variables declared in it. The syntax for declaring a variable is VARIABLE=VALUE. Lines starting with 'export' are ignored. VARIABLE is expected to be a valid Korn shell variable name that starts with a letter or '_' and contains only alphanumerics and '_'.

Solution:

Check the syntax and correct the Environment_file.
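The parsing rules in the description can be sketched in Python to check an Environment_file before HA-Sybase reads it. This is an illustration, not the HA-Sybase implementation: stripping surrounding whitespace and treating a line without '=' as invalid are assumptions, and the function name is hypothetical.

```python
import re

# A valid Korn shell variable name starts with a letter or '_'
# and contains only letters, digits, and '_'.
VALID_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def parse_environment_file(text: str):
    """Return (variables, ignored) for the VARIABLE=VALUE lines in `text`.

    `ignored` collects the lines that would trigger
    "Invalid variable name in Environment_file. Ignoring %s".
    """
    variables = {}
    ignored = []
    for raw in text.splitlines():
        line = raw.strip()  # assumption: surrounding whitespace is ignored
        if not line or line.startswith("export"):
            continue  # blank lines and 'export' lines are skipped
        name, sep, value = line.partition("=")
        if not sep or not VALID_NAME.fullmatch(name):
            ignored.append(raw)
            continue
        variables[name] = value
    return variables, ignored
```

Any line returned in `ignored` is a candidate for correction in the Environment_file.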


897348 %s: must be run in secure mode using -S flag

Description:

rpc.sccheckd should always be invoked in secure mode. If this message appears, someone has modified configuration files that affect server startup.

Solution:

Reinstall cluster packages or contact your service provider.


898001 launch_fed_prog: getlocalhostname() failed for program <%s>

Description:

The ucmmd was unable to obtain the name of the local host. Launching of a method failed.

Solution:

Examine other syslog messages occurring at about the same time to see if the problem can be identified. Save a copy of the /var/adm/messages files on all nodes and contact your authorized Sun service provider for assistance in diagnosing the problem.


898738 Aborting node because pm_tick delay of %lld ms exceeds %lld ms

Description:

The system has been unable to send heartbeats for too long: the pm_tick delay exceeded the limit, which is half of the smallest timeout value among all the paths. For example, if the timeout value for every path is 10 seconds, the limit is 5 seconds. There is probably heavy interrupt activity delaying the clock thread, which in turn causes irregular heartbeats. The node is aborted because it is considered to be in a 'sick' condition, and it is better to abort this node than to cause other nodes (or the cluster) to go down.

Solution:

Check to see what is causing high interrupt activity and configure the system accordingly.
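The abort threshold in the description is simple arithmetic: half of the smallest path timeout. The Python sketch below works through it, assuming whole milliseconds rounded down and a strict "exceeds" comparison; the kernel's exact computation is not shown in this guide, and the function names are hypothetical.

```python
from typing import Sequence


def pm_tick_limit_ms(path_timeouts_ms: Sequence[int]) -> int:
    """Half of the smallest path timeout, in milliseconds (rounded down)."""
    if not path_timeouts_ms:
        raise ValueError("at least one path timeout is required")
    return min(path_timeouts_ms) // 2


def should_abort(pm_tick_delay_ms: int, path_timeouts_ms: Sequence[int]) -> bool:
    """True when the observed pm_tick delay exceeds the limit (message 898738)."""
    return pm_tick_delay_ms > pm_tick_limit_ms(path_timeouts_ms)


# Worked example from the description: every path times out after 10 seconds.
print(pm_tick_limit_ms([10_000, 10_000, 10_000]))  # 5000
```

Because the minimum is taken over all paths, a single path with a short timeout lowers the limit for the whole node.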


899278 Retry_count exceeded in Retry_interval

Description:

The fault monitor has detected problems in the RDBMS server. The number of restarts performed through the fault monitor exceeded the count specified by the 'Retry_count' parameter within 'Retry_interval'. The database server is unable to survive on this node, so the resource group is being switched over to another node.

Solution:

Please check the RDBMS setup and server configuration.
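The Retry_count and Retry_interval check can be pictured as a sliding time window over recent restarts. The Python sketch below is only an illustration of that idea, under the assumptions that times are in seconds and that a failover is requested once more than Retry_count restarts fall inside the window; the class name and method are hypothetical and not part of the fault monitor.

```python
from collections import deque


class RestartTracker:
    """Track restarts and report when more than `retry_count` of them
    fall within the last `retry_interval` seconds."""

    def __init__(self, retry_count: int, retry_interval: float):
        self.retry_count = retry_count
        self.retry_interval = retry_interval
        self._restarts = deque()

    def record_restart(self, now: float) -> bool:
        """Record a restart at time `now`; return True if the resource
        group should be switched over instead of restarted again."""
        self._restarts.append(now)
        # Forget restarts that are older than the interval.
        while self._restarts and now - self._restarts[0] > self.retry_interval:
            self._restarts.popleft()
        return len(self._restarts) > self.retry_count
```

With `RestartTracker(2, 300)`, restarts at 0, 100 and 200 seconds trigger a switchover on the third call, while restarts spread over 0, 100 and 500 seconds do not.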


899305 clexecd: Daemon exiting because child died.

Description:

A child process of the clexecd program has died.

Solution:

If this message is seen when the node is shutting down, ignore it. Otherwise, the node will halt or reboot itself to prevent data corruption. Contact your authorized Sun service provider to determine whether a workaround or patch is available.


899648 Failed to process the resource information.

Description:

A Sun Cluster data service is unable to retrieve the resource property information. Possible reasons are low memory or an API call failure.

Solution:

In the case of low memory, the problem will probably be cured by rebooting. If the problem recurs, you might need to increase swap space by configuring additional swap devices. If the cause is an API call failure, check the syslog messages from other components.


899776 ERROR: scha_control() was called on resource group <%s>, resource <%s> before the RGM started

Description:

This message most likely indicates that a program called scha_control(1ha,3ha) before the RGM had started up. Normally, scha_control is called by a resource monitor to request failover or restart of a resource group. If the RGM had not yet started up on the cluster, no resources or resource monitors should have been running on any node. The scha_control call will fail with a SCHA_ERR_CLRECONF error.

Solution:

On the node where this message appeared, confirm that rgmd was not yet running (i.e., the cluster was just booting up) when this message was produced. Find out what program called scha_control. If it was a customer-supplied program, this most likely represents an incorrect program behavior which should be corrected. If there is no such customer-supplied program, or if the cluster was not just starting up when the message appeared, contact your authorized Sun service provider for assistance in diagnosing the problem.