Sun Cluster 3.0 Error Messages Guide

Error Message List

The following list is ordered by the message ID.
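
Because every entry heading in this list has the shape "ID :message template", a heading can be split mechanically and its printf-style template turned into a pattern for matching logged text. The following is a minimal sketch, not a tool shipped with Sun Cluster. The function names are hypothetical, and only the %s, %d, and %c conversions are handled (a template containing %m, for example, would need extra work).

```python
import re

# Heading lines in this guide look like "<6-digit ID> :<message template>".
HEADING = re.compile(r"^(\d{6}) :(.*)$")

def parse_heading(line):
    """Return (message_id, template) for a heading line, or None."""
    m = HEADING.match(line.strip())
    if m is None:
        return None
    return int(m.group(1)), m.group(2)

def template_to_regex(template):
    """Turn a printf-style template into a regex for the logged text.

    Only %s, %d, and %c are translated; everything else is matched literally.
    """
    out = []
    for part in re.split(r"(%[sdc])", template):
        if part == "%s":
            out.append(r"(.+?)")
        elif part == "%d":
            out.append(r"(-?\d+)")
        elif part == "%c":
            out.append(r"(.)")
        else:
            out.append(re.escape(part))
    return re.compile("^" + "".join(out) + "$")

# Example using entry 204584 from this list; the logged text is made up.
entry = parse_heading("204584 :clexecd: Going down on signal %d.")
pattern = template_to_regex(entry[1])
match = pattern.match("clexecd: Going down on signal 15.")
```

Here match.group(1) holds the signal number captured from the illustrative log text.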

201878 :clconf: Key length is more than max supported length in clconf_file_io

Description:

While reading configuration data through the CCR FILE interface, the system found data whose length exceeds the maximum supported length.

Solution:

Check the CCR configuration information.

203680 :fatal: Unable to bind to nameserver

Description:

The low-level cluster machinery has encountered a fatal error. The rgmd will produce a core file and will cause the node to halt or reboot to avoid the possibility of data corruption.

Solution:

Save a copy of the /var/adm/messages files on all nodes, and of the rgmd core file. Contact your authorized Sun service provider for assistance in diagnosing the problem.

203739 :Resource %s uses network resource %s in resource group %s, but the property %s for resource group %s does not include resource group %s. This dependency must be set.

Description:

For all network resources used by a scalable resource, a dependency on the resource group containing the network resource should be created for the resource group of the scalable resource.

Solution:

Use the scrgadm(1M) command to update the RG_dependencies property of the scalable resource's resource group to include the resource groups of all network resources that the scalable resource uses.

204163 :clcomm: error in copyin for state_balancer

Description:

The system failed a copy operation supporting statistics reporting.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available.

204584 :clexecd: Going down on signal %d.

Description:

The clexecd program received the signal indicated in the error message.

Solution:

The clexecd program will exit and the node will be halted or rebooted to prevent data corruption. Contact your authorized Sun service provider to determine whether a workaround or patch is available.

205445 :check_and_start(): Out of memory

Description:

The system ran out of memory in the function check_and_start().

Solution:

Install more memory, increase swap space, or reduce peak memory consumption.

205873 :Permissions incorrect for %s. s bit not set.

Description:

Permissions of $ORACLE_HOME/bin/oracle are expected to be '-rwsr-s--x' (set-user-ID and set-group-ID bits set). These permissions are set at the time of Oracle installation. The fault monitor will not function correctly without these permissions.

Solution:

Check file permissions. Check the Oracle installation. Relink Oracle, if necessary.
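
The permission check described above can be sketched with Python's standard stat module. This is an illustration only, not the fault monitor's own check; the helper name has_setid_bits is hypothetical.

```python
import stat

def has_setid_bits(mode):
    """True when both the set-user-ID and set-group-ID bits are set."""
    return bool(mode & stat.S_ISUID) and bool(mode & stat.S_ISGID)

# A regular file with octal mode 6751 corresponds to the expected
# permissions; stat.filemode() gives the symbolic form for comparison.
expected_mode = stat.S_IFREG | 0o6751
symbolic = stat.filemode(expected_mode)
```

On a live system the mode would come from os.stat(path).st_mode, where path is the location of the oracle binary under the ORACLE_HOME configured for the resource.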

206501 :CMM: Monitoring re-enabled.

Description:

Transport path monitoring has been re-enabled in the cluster after being disabled.

Solution:

This is an informational message, no user action is needed.

206947 :ON_PENDING_MON_DISABLED: bad resource state <%s> (%d) for resource <%s>

Description:

The rgmd state machine has discovered a resource in an unexpected state on the local node. This should not occur and may indicate an internal logic error in the rgmd.

Solution:

Look for other syslog error messages on the same node. Save a copy of the /var/adm/messages files on all nodes, and report the problem to your authorized Sun service provider.

207481 :getlocalhostname() failed for resource <%s>, resource group <%s>, method <%s>

Description:

The rgmd was unable to obtain the name of the local host, causing a method invocation to fail. Depending on which method is being invoked and the Failover_mode setting on the resource, this might cause the resource group to fail over or move to an error state.

Solution:

Examine other syslog messages occurring at about the same time to see if the problem can be identified. Save a copy of the /var/adm/messages files on all nodes and contact your authorized Sun service provider for assistance in diagnosing the problem.

208216 :ERROR: resource group <%s> has RG_dependency on non-existent resource group <%s>

Description:

A non-existent resource group is listed in the RG_dependencies of the indicated resource group. This should not occur and may indicate an internal logic error in the rgmd.

Solution:

Look for other syslog error messages on the same node. Save a copy of the /var/adm/messages files on all nodes, and report the problem to your authorized Sun service provider.

208596 :clcomm: Path %s being initiated

Description:

A communication link is being established with another node.

Solution:

No action required.

208701 :%s error status ignored in step %s

Description:

The error status from step execution is ignored because it does not affect the outcome of the step.

Solution:

None.

209274 :path_check_start(): Out of memory

Description:

The system ran out of memory in the function path_check_start().

Solution:

Install more memory, increase swap space, or reduce peak memory consumption.

210725 :Warning: While trying to lookup host %s, the length of the returned address (%d) was longer than expected (%d). The address will be truncated.

Description:

The value of the resolved address for the named host was longer than expected.

Solution:

Save a copy of the /var/adm/messages files on all nodes. Contact your authorized Sun service provider for assistance in diagnosing the problem.

211198 :Completed successfully.

Description:

Data service method completed successfully.

Solution:

No action required.

212337 :(%s) scan of seqnum failed on "%s", ret = %d

Description:

Could not get the sequence number from the udlm message received.

Solution:

This is an internal error. Save the contents of /var/adm/messages, /var/cluster/ucmm/ucmm_reconf.log and /var/cluster/ucmm/dlm*/*logs/* from all the nodes and contact your Sun service representative.
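
Gathering the files named above can be scripted. The sketch below is an assumption-laden convenience, not a Sun-supplied utility: the function name collect_logs and its root parameter are hypothetical, and it archives files on one node only, so it would have to be run on each node in turn.

```python
import glob
import os
import tarfile

# Paths named in the Solution text above.
LOG_PATTERNS = [
    "/var/adm/messages",
    "/var/cluster/ucmm/ucmm_reconf.log",
    "/var/cluster/ucmm/dlm*/*logs/*",
]

def collect_logs(archive_path, patterns=LOG_PATTERNS, root="/"):
    """Archive every regular file matching the patterns; return paths added.

    `root` is a hypothetical convenience for trying the function against a
    copy of the tree; on a cluster node it would stay "/".
    """
    added = []
    with tarfile.open(archive_path, "w:gz") as tar:
        for pattern in patterns:
            full = os.path.join(root, pattern.lstrip("/"))
            for path in sorted(glob.glob(full)):
                if os.path.isfile(path):
                    # Store paths relative to root inside the archive.
                    tar.add(path, arcname=os.path.relpath(path, root))
                    added.append(path)
    return added
```

Patterns that match nothing are skipped silently, so the returned list is worth checking before the archive is sent to a service representative.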

212667 :PNM: could not start due to lock %s

Description:

An attempt was made to start multiple instances of the PNM daemon pnmd(1M), or pnmd(1M) has a problem acquiring a lock on the named file.

Solution:

Check if another instance of pnmd is already running. If not, remove the named lock file and start pnmd using the /etc/init.d/pnm script.

213112 :latch_intention(): IDL exception when communicating to node %d

Description:

An inter-node communication failed, probably because a node died.

Solution:

No action is required; the rgmd should recover automatically.

215538 :Not all hostnames brought online.

Description:

Failed to bring all the hostnames online. Only some of the IP addresses are online.

Solution:

Use the ifconfig command to make sure that the IP addresses are available. Check for any error messages before this one for a more precise reason for the error. Use the scswitch command to move the resource group to a different node. If the problem persists, reboot.

216087 :rebalance: resource group <%s> is being switched updated or failed back, cannot assign new primaries

Description:

The indicated resource group has lost a master due to a node death. However, the RGM is unable to switch the resource group to a new master because the resource group is currently in the process of being modified by an operator action, or is currently in the process of "failing back" onto a node that recently joined the cluster.

Solution:

Use scstat(1M) -g to determine the current mastery of the resource group. If necessary, use scswitch(1M) -z to switch the resource group online on desired nodes.

216244 :CCR: Table %s has invalid checksum field. Reported: %s, actual: %s.

Description:

The indicated table has an invalid checksum that does not match the table contents. This causes the consistency check on the indicated table to fail.

Solution:

Boot the offending node in -x mode to restore the indicated table from backup or other nodes in the cluster. The CCR tables are located at /etc/cluster/ccr/.

216379 :Stopping fault monitor using pmfadm tag %s

Description:

Informational message. Fault monitor will be stopped using Process Monitoring Facility (PMF), with the tag indicated in message.

Solution:

None.

216774 :WARNING: update_state:udlm_send_reply failed

Description:

A warning was issued for a udlm state update. This results in a udlm abort.

Solution:

This is an internal error. Save the contents of /var/adm/messages, /var/cluster/ucmm/ucmm_reconf.log and /var/cluster/ucmm/dlm*/*logs/* from all the nodes and contact your Sun service representative.

217093 :Call failed: %s

Description:

A client was not able to make an rpc connection to a server (rpc.pmfd, rpc.fed or rgmd) to execute the action shown. The rpc error message is shown. An error message is output to syslog.

Solution:

Save the /var/adm/messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.

218227 :Error accessing policy string

Description:

This message appears when the customer is initializing or changing a scalable services load balancer, by starting or updating a service. The Load_Balancing_String is missing.

Solution:

Add a Load_Balancing_String parameter when creating the resource group.

220849 :CCR: Create table %s failed.

Description:

The CCR failed to create the indicated table.

Solution:

The failure can have many causes. For some of them no user action is required, because the CCR client handles the failure. The cases that require user action depend on other CCR messages on the node, and include the following: If the failure occurred because the cluster lost quorum, reboot the cluster. If the root file system is full on the node, free up some space by removing unnecessary files. If the root disk on the afflicted node has failed, replace it. If the cluster repository is corrupted, as indicated by other CCR messages, boot the offending node(s) in -x mode to restore the cluster repository from backup. The cluster repository is located at /etc/cluster/ccr/.

222512 :fatal: could not create death_ff

Description:

The daemon indicated in the message tag (rgmd or ucmmd) was unable to create a failfast device. The failfast device kills the node if the daemon process dies either due to hitting a fatal bug or due to being killed inadvertently by an operator. This is a requirement to avoid the possibility of data corruption. The daemon will produce a core file and will cause the node to halt or reboot.

Solution:

Save a copy of the /var/adm/messages files on all nodes, and of the core file generated by the daemon. Contact your authorized Sun service provider for assistance in diagnosing the problem.

223145 :gethostbyname failed for (%s)

Description:

Failed to get information about a host. The "gethostbyname" man page describes possible reasons.

Solution:

Make sure entries in /etc/hosts, /etc/nsswitch.conf and /etc/netconfig are correct to get information about this host.
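
After correcting the name service files, a quick lookup confirms whether the host now resolves. The sketch below uses Python's socket.gethostbyname, which goes through the system resolver; the wrapper name check_host and its resolver parameter are hypothetical and exist only to make the function easy to exercise without a network.

```python
import socket

def check_host(name, resolver=socket.gethostbyname):
    """Return (True, address) if the name resolves, else (False, reason)."""
    try:
        return True, resolver(name)
    except OSError as exc:
        # socket.gaierror and socket.herror are subclasses of OSError.
        return False, str(exc)
```

A call such as check_host("node1") returns a pair whose first element tells whether the lookup succeeded; "node1" is a placeholder for the hostname reported in the message.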

223458 :INTERNAL ERROR CMM: quorum_algorithm_init called already.

Description:

This is an internal error during node initialization, and the system cannot continue.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available.

224718 :Failed to create scalable service in group %s for IP %s Port %d%c%s: %s.

Description:

A call to the underlying scalable networking code failed. This call may fail because the IP, Port, and Protocol combination listed in the message conflicts with the configuration of an existing scalable resource. A conflict can occur if the same combination exists in a scalable resource that is already configured on the cluster. A combination may also conflict if there is a resource that uses Load_balancing_policy LB_STICKY_WILD with the same IP address as a different resource that also uses LB_STICKY_WILD.

Solution:

Try using a different IP, Port, and Protocol combination. Otherwise, save a copy of the /var/adm/messages files on all nodes. Contact your authorized Sun service provider for assistance in diagnosing the problem.

224783 :clcomm: Path %s has been deleted

Description:

A communication link is being removed with another node. The interconnect may have failed or the remote node may be down.

Solution:

Any interconnect failure should be resolved, and/or the failed node rebooted.

225882 :Internal: Unknown command type (%d)

Description:

An internal error has occurred in the rgmd while trying to connect to the rpc.fed server.

Solution:

Save the /var/adm/messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.

226914 :scswitch: internal error: bad nodename %s in nodelist of resource group %s

Description:

The indicated resource group's Nodelist property, as stored in the CCR, contains an invalid nodename. This might indicate corruption of CCR data or rgmd in-memory state. The scswitch command will fail.

Solution:

Use scstat(1M) -g and scrgadm(1M) -pvv to examine resource group properties. If the values appear corrupted, the CCR might have to be rebuilt. If values appear correct, this may indicate an internal error in the rgmd. Contact your authorized Sun service provider for assistance in diagnosing and correcting the problem.

227214 :Error: duplicate method <%s> launched on resource <%s> in resource group <%s>

Description:

Due to an internal error, the rgmd state machine has attempted to launch two different methods on the same resource on the same node, simultaneously. The rgmd will reject the second attempt and treat it as a method failure.

Solution:

Save a copy of the /var/adm/messages files on all nodes. Contact your authorized Sun service provider for assistance in diagnosing the problem.

228461 :CMM: Issuing a SCSI2 Release failed on quorum device %s with error %d.

Description:

This node encountered the specified error while issuing a SCSI2 Release operation on the specified quorum device. The quorum code will either retry this operation or will ignore this quorum device.

Solution:

There may be other related messages that may provide more information regarding the cause of this problem. SCSI2 operations fail with an error code of EACCES if SCSI3 keys are present on the device. Scrub the SCSI3 keys off of the quorum device.

228999 :reservation error(%s) - Ignoring device %s because it requires scsi-3 support

Description:

The device fencing program has discovered a device which is connected to more than 2 nodes.

Solution:

Connecting shared devices to more than 2 nodes is not yet supported. The connectivity of the device should be reduced to 2 nodes.

231770 :ns: Could not initialize ORB: %d

Description:

Could not initialize the ORB.

Solution:

Please make sure the nodes are booted in cluster mode.

231991 :WARNING: lkcm_dreg: udlm_send_reply failed

Description:

Could not deregister udlm with ucmm.

Solution:

None.

232201 :Invalid port number returned.

Description:

Invalid port number was retrieved for the Port_list property of the resource.

Solution:

Any of the following situations may occur; each requires a different user action. 1) If a resource has been created or updated, check whether it has a valid port number. If the port number is not valid, provide a valid port number using the scrgadm(1M) command. 2) Check the syslog messages that occurred just before this message. If they report an "Out of memory" problem, correct it. 3) For all other cases, treat this as an internal error. Contact your authorized Sun service provider.

232501 :Validation failed. ORACLE_HOME/bin/svrmgrl not found ORACLE_HOME=%s

Description:

Oracle binaries (svrmgrl) not found in ORACLE_HOME/bin directory. ORACLE_HOME specified for the resource is indicated in the message. HA-Oracle will not be able to manage resource if ORACLE_HOME is incorrect.

Solution:

Specify correct ORACLE_HOME when creating resource. If resource is already created, please update resource property 'ORACLE_HOME'.

232565 :Scalable services enabled.

Description:

This means that the scalable services framework is set up in the cluster. Specifically, it is printed out for the node that has joined the cluster and for which services have been downloaded. Once the services have been downloaded, those services are ready to participate as scalable services.

Solution:

This is an informational message, no user action is needed.

232920 :-d must be followed by a hex bitmask

Description:

Incorrect arguments were used while setting up Sun-specific startup parameters for the Oracle UNIX dlm.

Solution:

This is an internal error. Save the contents of /var/adm/messages, /var/cluster/ucmm/ucmm_reconf.log and /var/cluster/ucmm/dlm*/*logs/* from all the nodes and contact your Sun service representative.

233017 :Successfully stopped %s.

Description:

The resource was successfully stopped by Sun Cluster.

Solution:

No user action is required.

233053 :SharedAddress offline.

Description:

The status of the sharedaddress resource is offline.

Solution:

This is an informational message. No user action is required.

233327 :Switchover (%s) error: failed to mount FS (%d)

Description:

The file system specified in the message could not be hosted on the node the message came from.

Solution:

Check /var/adm/messages to make sure there were no device errors. If not, contact your authorized Sun service provider to determine whether a workaround or patch is available.

233956 :Error in reading message in child process: %m

Description:

Error occurred when reading message in fault monitor child process. Child process will be stopped and restarted.

Solution:

If the error persists, disable the fault monitor and report the problem.

233961 :scvxvmlg error - symlink(%s, %s) failed

Description:

The program responsible for maintaining the VxVM namespace was unable to access the global device namespace. If configuration changes were recently made to VxVM diskgroups or volumes, this node may be unaware of those changes. Recently created volumes may be inaccessible from this node.

Solution:

Verify that the /global/.devices/node@N (N = this node's node number) is mounted globally and is accessible. If no configuration changes have been recently made to VxVM diskgroups or volumes and all volumes continue to be accessible from this node, then no further action is required. If changes have been made, the device namespace on this node can be updated to reflect those changes by executing '/usr/cluster/lib/dcs/scvxvmlg'. If the problem persists, contact your authorized Sun service provider to determine whether a workaround or patch is available.

234438 :INTERNAL ERROR: Invalid resource property type <%d> on resource <%s>; aborting node

Description:

An attempted creation or update of a resource has failed because of invalid resource type data. This may indicate CCR data corruption or an internal logic error in the rgmd. The rgmd will produce a core file and will force the node to halt or reboot.

Solution:

Use scrgadm(1M) -pvv to examine resource properties. If the resource or resource type properties appear to be corrupted, the CCR might have to be rebuilt. If values appear correct, this may indicate an internal error in the rgmd. Re-try the creation or update operation. If the problem recurs, save a copy of the /var/adm/messages files on all nodes and contact your authorized Sun service provider for assistance.

234463 :INTERNAL ERROR: process_resource: resource group <%s> is pending_mon_disable but contains resource <%s> in STOP_FAILED state

Description:

During a resource monitor disable (scswitch -M -n), the rgmd has discovered a resource in STOP_FAILED state. This may indicate an internal logic error in the rgmd, since updates are not permitted on the resource group until the STOP_FAILED error condition is cleared.

Solution:

Look for other syslog error messages on the same node. Save a copy of the /var/adm/messages files on all nodes, and report the problem to your authorized Sun service provider.

236733 :lookup of oracle dba gid failed.

Description:

Could not find the group ID for dba. udlm will not start up.

Solution:

Make sure the /etc/nsswitch.conf and /etc/group files are valid and have correct information to get the group ID of dba.

237149 :clcomm: Path %s being constructed

Description:

A communication link is being established with another node.

Solution:

No action required.

237724 :Failed to retrieve hostname: %s.

Description:

The callback method failed to determine the hostname. The callback methods will now be executed in the /var/core directory.

Solution:

No user action is needed. For detailed error message, look at the syslog message.

239415 :Failed to retrieve the cluster handle: %s.

Description:

Access to the object named failed. The reason for the failure is given in the message.

Solution:

Save a copy of the /var/adm/messages files on all nodes. Contact your authorized Sun service provider for assistance in diagnosing the problem.

239735 :Couldn't parse policy string %s

Description:

This message appears when the customer is initializing or changing a scalable services load balancer, by starting or updating a service. The Load_Balancing_String is invalid.

Solution:

Check the Load_Balancing_String value specified when creating the resource group and make sure that a valid value is used.

240107 :resource %s state on node %s change to R_ONLINE

Description:

This is a notification from the rgmd that a resource's state has changed. This may be used by system monitoring tools.

Solution:

This is an informational message, no user action is needed.

240376 :No protocol was given as part of property %s for element %s. The property must be specified as %s=PortNumber%cProtocol,PortNumber%cProtocol,...

Description:

The property named does not have a legal value.

Solution:

Assign the property a legal value.
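
The expected shape of the property value, PortNumber, separator, Protocol, repeated in a comma-separated list, can be checked before the value is assigned. The following is a minimal sketch. The separator character is printed by the message's %c and is not stated in this entry, so the "/" default here is an assumption, as are the function name parse_port_list and the 1 to 65535 port range.

```python
def parse_port_list(value, sep="/"):
    """Parse "Port<sep>Protocol,..." into a list of (port, protocol) pairs.

    Raises ValueError when an element lacks a protocol or has a bad port.
    The default separator is an assumption, not taken from the message.
    """
    pairs = []
    for element in value.split(","):
        element = element.strip()
        port_text, found, protocol = element.partition(sep)
        if not found or not protocol:
            raise ValueError(f"no protocol given for element {element!r}")
        if not port_text.isdigit() or not 0 < int(port_text) <= 65535:
            raise ValueError(f"invalid port number in element {element!r}")
        pairs.append((int(port_text), protocol))
    return pairs
```

An element such as "80" with no protocol raises ValueError, which mirrors the condition this message reports.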

240388 :Prog <%s> step <%s>: timed out.

Description:

A step has exceeded its configured timeout and was killed by ucmmd. This in turn will cause a reconfiguration of OPS.

Solution:

Other syslog messages occurring just before this one might indicate the reason for the failure. After correcting the problem that caused the step to fail, the operator may retry reconfiguration of OPS.

241147 :Invalid value %s for property %s.

Description:

An invalid value was supplied for the property.

Solution:

Supply "conf" or "boot" as the value for DNS_mode property.

241441 :clexecd: ioctl(I_RECVFD) returned %d. Returning %d to clexecd.

Description:

clexecd program has encountered a failed ioctl(2) system call. The error message indicates the error number for the failure.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available.

241948 :Failed to retrieve resource <%s> extension property <%s>: %s.

Description:

An internal error occurred in the rgmd while checking a resource property.

Solution:

Look for other syslog error messages on the same node. Save a copy of the /var/adm/messages files on all nodes, and report the problem to your authorized Sun service provider.

242214 :clexecd: fork1 returned %d. Returning %d to clexecd.

Description:

clexecd program has encountered a failed fork1(2) system call. The error message indicates the error number for the failure.

Solution:

If the error number is 12 (ENOMEM), install more memory, increase swap space, or reduce peak memory consumption. If error number is something else, contact your authorized Sun service provider to determine whether a workaround or patch is available.
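
The branch on the error number can be expressed with Python's errno module, which maps numbers to symbolic names for the platform it runs on. This is only an illustrative sketch; the function name fork_failure_advice is hypothetical, and the returned strings paraphrase the Solution text above.

```python
import errno

def fork_failure_advice(error_number):
    """Map the error number from the clexecd message to the suggested action."""
    if error_number == errno.ENOMEM:
        return ("Install more memory, increase swap space, "
                "or reduce peak memory consumption.")
    # Look up the symbolic name when the platform defines one.
    name = errno.errorcode.get(error_number, "unknown")
    return (f"Error {error_number} ({name}): contact your authorized "
            "Sun service provider.")
```

The comparison uses errno.ENOMEM rather than a literal number so that the check follows the platform's own definition.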

243444 :CMM: Issuing a SCSI2 Tkown failed for quorum device with error %d.

Description:

This node encountered the specified error while issuing a SCSI2 Tkown operation on a quorum device. This will cause the node to conclude that it has been unsuccessful in preempting keys from the quorum device, and therefore the partition to which it belongs has been preempted. If a cluster gets divided into two or more disjoint subclusters, exactly one of these must survive as the operational cluster. The surviving cluster forces the other subclusters to abort by grabbing enough votes to grant it majority quorum. This is referred to as preemption of the losing subclusters.

Solution:

There will be other related messages that will identify the quorum device for which this error has occurred. If the error encountered is EACCES, then the SCSI2 command could have failed due to the presence of SCSI3 keys on the quorum device. Scrub the SCSI3 keys off of it, and reboot the preempted nodes.
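
The majority rule described in this entry can be illustrated with simple vote arithmetic. The sketch below assumes that "majority quorum" means strictly more than half of all configured votes; the function names and the vote counts in the example are illustrative and are not taken from the CMM implementation.

```python
def has_majority(partition_votes, total_votes):
    """True when a partition holds more than half of all configured votes."""
    return 2 * partition_votes > total_votes

def surviving_partition(partitions, total_votes):
    """Return the name of the partition that holds majority, or None.

    `partitions` maps a partition name to the votes it holds, counting
    node votes plus any quorum device votes it has acquired.
    """
    for name, votes in partitions.items():
        if has_majority(votes, total_votes):
            return name  # at most one partition can exceed half the total
    return None

# Two nodes with one vote each plus a quorum device with one vote: the
# node that acquires the device holds 2 of 3 votes and survives.
winner = surviving_partition({"A": 2, "B": 1}, total_votes=3)
```

If neither side acquires the quorum device in this example, each holds 1 of 3 votes and neither has majority.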

243639 :Scalable service instance [%s,%s,%d] deregistered on node %s.

Description:

The specified scalable service has been deregistered on the specified node. The gif node can no longer redirect packets for the specified service to this node.

Solution:

This is an informational message, no user action is needed.

243965 :udlm_ack_msg: udp is null!

Description:

Cannot acknowledge a message received from udlmctl because the address to acknowledge to is null.

Solution:

None.

243996 :Failed to retrieve resource <%s> extention property <%s>

Description:

Cannot get the extension property.

Solution:

Look for other syslog error messages on the same node. Save a copy of the /var/adm/messages files on all nodes, and report the problem to your authorized Sun service provider.

244116 :clcomm: socreate on routing socket failed with error = %d

Description:

The system prepares IP communications across the private interconnect. A socket create operation on the routing socket failed.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available.

245186 :reservation warning(%s) - MHIOCGRP_PREEMPTANDABORT error will retry in %d seconds

Description:

The device fencing program has encountered errors while trying to access a device. The failed operation will be retried.

Solution:

This is an informational message, no user action is needed.

247682 :recv_message: cm_reconfigure: %s

Description:

udlm received a message to reconfigure.

Solution:

None. OPS is going to reconfigure.

247868 :in libsecurity: file %s not readable or bad content

Description:

The rpc.pmfd, rpc.fed or rgmd server was not able to read an rpcbind information cache file, or the file's contents are corrupted. The affected component should continue to function by calling rpcbind directly.

Solution:

Save the /var/adm/messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.

248031 :scvxvmlg warning - %s does not exist, creating it

Description:

The program responsible for maintaining the VxVM device namespace has discovered inconsistencies between the VxVM device namespace on this node and the VxVM configuration information stored in the cluster device configuration system. If configuration changes were made recently, then this message should reflect one of the configuration changes. If no changes were made recently or if this message does not correctly reflect a change that has been made, the VxVM device namespace on this node may be in an inconsistent state. VxVM volumes may be inaccessible from this node.

Solution:

If this message correctly reflects a configuration change to VxVM diskgroups then no action is required. If the change this message reflects is not correct, then the information stored in the device configuration system for each VxVM diskgroup should be examined for correctness. If the information in the device configuration system is accurate, then executing '/usr/cluster/lib/dcs/scvxvmlg' on this node should restore the device namespace. If the information stored in the device configuration system is not accurate, it must be updated by executing '/usr/cluster/bin/scconf -c -D name=diskgroup_name' for each VxVM diskgroup with inconsistent information.

249804 :INTERNAL ERROR CMM: Failure creating sender thread.

Description:

An instance of the userland CMM encountered an internal initialization error. This is caused by inadequate memory on the system.

Solution:

Add more memory to the system. If that does not resolve the problem, contact your authorized Sun service provider to determine whether a workaround or patch is available.

249934 :Method <%s> failed to execute on resource <%s> in resource group <%s>, error: <%d>

Description:

A resource method failed to execute, due to a system error number identified in the message. The indicated error number appears not to match any of the known errno values described in intro(2). This is considered a method failure. Depending on which method is being invoked and the Failover_mode setting on the resource, this might cause the resource group to fail over or move to an error state, or it might cause an attempted edit of a resource group or its resources to fail.

Solution:

Other syslog messages occurring at about the same time might provide evidence of the source of the problem. If not, save a copy of the /var/adm/messages files on all nodes, and (if the rgmd did crash) a copy of the rgmd core file, and contact your authorized Sun service provider for assistance.

250133 :Failed to open the device %s: %s.

Description:

This is an internal error. System failed to perform the specified operation.

Solution:

For specific error information check the syslog message. Provide the following information to your authorized Sun service provider to diagnose the problem. 1) Saved copy of /var/adm/messages file 2) Output of "ls -l /dev/sad" command 3) Output of "modinfo | grep sad" command.

250709 :CMM: Initialization for quorum device %s failed with error EACCES. Will retry.

Description:

This node is not able to access the specified quorum device because the node is still fenced off. A retry will be attempted.

Solution:

This is an informational message, no user action is needed.

250800 :clconf: Not found clexecd on node %d for %d seconds. Giving up!

Description:

Could not find clexecd to execute the program on a node. The message indicates that the attempt was abandoned after retries.

Solution:

No action required. This is an informational message.

251552 :Failed to validate configuration.

Description:

The data service is not properly configured.

Solution:

Look at the prior syslog messages for specific problems and take corrective action.

254131 :resource group %s removed.

Description:

This is a notification from the rgmd that the operator has deleted a resource group. This may be used by system monitoring tools.

Solution:

This is an informational message, no user action is needed.

254692 :scswitch: internal error: bad state <%s> (<%d>) for resource group <%s>

Description:

While attempting to execute an operator-requested switch of the primaries of a resource group, the rgmd has discovered the indicated resource group to be in an invalid state. The switch action will fail.

Solution:

This may indicate an internal error or bug in the rgmd. Contact your authorized Sun service provider for assistance in diagnosing and correcting the problem.

254744 :CMM: Open failed for quorum device %s. Unable to scrub device.

Description:

The open operation failed for the specified quorum device while it was being added into the cluster. The add of this quorum device will fail.

Solution:

The quorum device has failed or the path to this device may be broken. Refer to the disk repair section of the administration guide for resolving this problem. Retry adding the quorum device after the problem has been resolved.

255115 :Retrying to retrieve the resource type information.

Description:

An update to the cluster configuration occurred while resource type properties were being retrieved.

Solution:

Ignore the message.

255929 :in libsecurity authsys_create_default failed

Description:

A client was not able to make an rpc connection to a server (rpc.pmfd, rpc.fed or rgmd) because it failed the authentication process. An error message is output to syslog.

Solution:

Save the /var/adm/messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.

258357 :Method <%s> failed to execute on resource <%s> in resource group <%s>, error: <%s>

Description:

A resource method failed to execute, due to a system error described in the message. For an explanation of the error message, consult intro(2). This is considered a method failure. Depending on which method was being invoked and the Failover_mode setting on the resource, this might cause the resource group to fail over or move to an error state, or it might cause an attempted edit of a resource group or its resources to fail.

Solution:

If the error message is not self-explanatory, other syslog messages occurring at about the same time might provide evidence of the source of the problem. If not, save a copy of the /var/adm/messages files on all nodes, and (if the rgmd did crash) a copy of the rgmd core file, and contact your authorized Sun service provider for assistance.

258909 :clexecd: sigfillset returned %d. Exiting.

Description:

clexecd program has encountered a failed sigfillset(3C) system call. The error message indicates the error number for the failure.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available.
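
Several messages in this list, including this one, report a numeric error number. As a reading aid, the following Python sketch translates an error number into its symbolic name and description. It is not part of Sun Cluster: describe_errno is a hypothetical helper, and the values come from the platform where the script runs, so it is only meaningful when run on the same operating system as the affected node.

```python
import errno
import os


def describe_errno(number):
    """Return (symbolic name, description) for an error number.

    describe_errno is a hypothetical helper, not a Sun Cluster tool.
    The mapping comes from the local platform's errno table.
    """
    name = errno.errorcode.get(number, "UNKNOWN")
    return name, os.strerror(number)


if __name__ == "__main__":
    # Example: decode the number that follows "returned" in the message.
    name, text = describe_errno(12)
    print(f"errno 12: {name} ({text})")
```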

259455 :in fe_set_env_vars malloc failed

Description:

The rgmd server was not able to allocate memory for the environment name, while trying to connect to the rpc.fed server, possibly due to low memory. An error message is output to syslog.

Solution:

Investigate whether the host is running out of memory. If it is not, save the /var/adm/messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.

259810 :reservation error(%s) - do_scsi3_reserve() error for disk %s

Description:

The device fencing program has encountered errors while trying to access a device. All retry attempts have failed.

Solution:

For the user action required by this message, see the user action for message 192619.

261123 :resource group %s state change to managed.

Description:

This is a notification from the rgmd that a resource group's state has changed. This may be used by system monitoring tools.

Solution:

This is an informational message, no user action is needed.

262295 :Failback bailing out because resource group <%s>is being updated or switched

Description:

The rgmd was unable to failback the specified resource group to a more preferred node because the resource group was already in the process of being updated or switched.

Solution:

This is an informational message, no user action is needed.

262898 :Name service not available.

Description:

The monitor_check method detected that name service is not responsive.

Solution:

Check whether the name service is configured correctly. Try some commands that query name servers, such as ping and nslookup, and correct the problem. If the error persists, reboot the node.

263258 :CCR: More than one copy of table %s has the same version but different checksums. Using the table from node %s.

Description:

The CCR detects that two valid copies of the indicated table have the same version but different contents. The copy on the indicated node will be used by the CCR.

Solution:

This is an informational message, no user action is needed.

263606 :unpack_rg_seq: rname_to_r error <%s>

Description:

Due to an internal error, the rgmd was unable to find the specified resource data in memory.

Solution:

Save a copy of the /var/adm/messages files on all nodes. Contact your authorized Sun service provider for assistance in diagnosing the problem.

265925 :CMM: Cluster lost operational quorum; aborting.

Description:

Not enough nodes are operational to maintain a majority quorum, causing the cluster to fail to avoid a potential split brain.

Solution:

The nodes should be rebooted.

266059 :security_svc_reg failed.

Description:

The rpc.pmfd server was not able to initialize authentication and rpc initialization. This happens while the server is starting up, at boot time. The server does not come up, and an error message is output to syslog.

Solution:

Save the /var/adm/messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.

266834 :CMM: Our partition has been preempted.

Description:

The cluster partition to which this node belongs has been preempted by another partition during a reconfiguration. The preempted partition will abort. If a cluster gets divided into two or more disjoint subclusters, exactly one of these must survive as the operational cluster. The surviving cluster forces the other subclusters to abort by grabbing enough votes to grant it majority quorum. This is referred to as preemption of the losing subclusters.

Solution:

Other related messages may indicate why quorum was lost. Determine why quorum was lost on this node partition, resolve the problem, and reboot the nodes in this partition.
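
The majority rule described in this entry can be illustrated with a small Python sketch. It assumes that a partition survives only when it holds more than half of all configured quorum votes. The function name and the vote counts are hypothetical and are not taken from the cluster software.

```python
def has_majority_quorum(votes_held, total_votes):
    """Return True if a partition holds a strict majority of all votes."""
    if total_votes <= 0 or not 0 <= votes_held <= total_votes:
        raise ValueError("vote counts are out of range")
    # Strict majority: exactly half of the votes is not enough.
    return 2 * votes_held > total_votes


if __name__ == "__main__":
    # Hypothetical two-node cluster with one quorum device: 3 votes total.
    total = 3
    # The partition that acquires the quorum device holds 2 votes.
    print(has_majority_quorum(2, total))
    # The other partition holds 1 vote and is the one that must abort.
    print(has_majority_quorum(1, total))
```

Because the comparison is strict, two partitions can never both hold a majority at the same time, which is why exactly one subcluster survives.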

267558 :Error when reading property %s.

Description:

Unable to read property value using API. Property name is indicated in message. Syslog messages may give more information on errors in other modules.

Solution:

Check syslog messages. Please report this problem.

267589 :launch_fed_prog: call to rpc.fed failed for program <%s>, step <%s>

Description:

Launching of fed program failed due to a failure of ucmmd to communicate with the rpc.fed daemon. If the rpc.fed process died, this might lead to a subsequent reboot of the node.

Solution:

Examine other syslog messages occurring at about the same time to see if the problem can be identified and if it recurs. Save a copy of the /var/adm/messages files on all nodes and contact your authorized Sun service provider for assistance.

267673 :Validation failed. ORACLE binaries not found ORACLE_HOME=%s

Description:

Oracle binaries not found under ORACLE_HOME. ORACLE_HOME specified for the resource is indicated in the message. HA-Oracle will not be able to manage Oracle if ORACLE_HOME is incorrect.

Solution:

Specify the correct ORACLE_HOME when creating the resource. If the resource has already been created, update the resource property 'ORACLE_HOME'.

267724 :stat of file system %s failed.

Description:

HA-NFS fault monitor reports a probe failure on a specified file system.

Solution:

Make sure the specified path exists.

269240 :clconf: Write_ccr routine shouldn't be called from kernel

Description:

Routine write_ccr that writes a clconf tree out to CCR should not be called from kernel.

Solution:

No action is required. This is an informational message.

269902 :reservation fatal error(%s) - Unable to find gdev property

Description:

A required rawdisk device group property is missing.

Solution:

Executing '/usr/cluster/bin/scgdevs -L' on this node should generate the required property. If this successfully creates the required property, it should be possible to retry the failed operation. If the message specifies the 'node_join' transition, then this node may be unable to access shared devices. If the failure occurred during the 'release_shared_scsi2' transition, then a node which was joining the cluster may be unable to access shared devices. In either case, it may be possible to reacquire access to shared devices by executing '/usr/cluster/lib/sc/run_reserve -c node_join' on all cluster nodes. If the failure occurred during the 'make_primary' transition, then a device group has failed to start on this node. If another node was available to host the device group, then it should have been started on that node. If desired, it may be possible to switch the device group to this node with the scswitch command. If no other node was available, then the device group will not have been started. The scswitch command may be used to retry the attempt to start the device group. If the failure occurred during the 'primary_to_secondary' transition, then the shutdown or switchover of a device group has failed. The desired action may be retried. If the problem persists, contact your authorized Sun service provider to determine whether a workaround or patch is available.
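
The transition-specific guidance in this entry can be condensed into a lookup table. The following Python sketch only restates the text of the entry. The dictionary and function names are hypothetical, and the transition name is assumed to be available as a plain string taken from the message.

```python
# Recovery hints keyed by transition name, restated from the entry above.
# RECOVERY_HINTS and recovery_hint are hypothetical names for illustration.
RUN_RESERVE = "/usr/cluster/lib/sc/run_reserve -c node_join"

RECOVERY_HINTS = {
    "node_join": (
        "This node may be unable to access shared devices. "
        f"Try '{RUN_RESERVE}' on all cluster nodes."
    ),
    "release_shared_scsi2": (
        "A node that was joining the cluster may be unable to access "
        f"shared devices. Try '{RUN_RESERVE}' on all cluster nodes."
    ),
    "make_primary": (
        "A device group failed to start on this node. Use the scswitch "
        "command to switch the device group here or to retry the start."
    ),
    "primary_to_secondary": (
        "The shutdown or switchover of a device group failed. "
        "Retry the desired action."
    ),
}


def recovery_hint(transition):
    """Return the hint for a transition, or a fallback if none is listed."""
    return RECOVERY_HINTS.get(
        transition, "No transition-specific guidance is listed."
    )


if __name__ == "__main__":
    print(recovery_hint("make_primary"))
```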

270043 :reservation warning(%s) - MHIOCENFAILFAST error will retry in %d seconds

Description:

The device fencing program has encountered errors while trying to access a device. The failed operation will be retried.

Solution:

This is an informational message, no user action is needed.

272238 :reservation warning(%s) - MHIOCGRP_RESERVE error will retry in %d seconds

Description:

The device fencing program has encountered errors while trying to access a device. The failed operation will be retried.

Solution:

This is an informational message, no user action is needed.

272732 :scvxvmlg warning - chmod(%s) failed

Description:

Solution:

273018 :INTERNAL ERROR CMM: Failure starting CMM.

Description:

An instance of the userland CMM encountered an internal initialization error.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available.

273354 :CMM: Node %s (nodeid = %d) is dead.

Description:

The specified node has died. It is guaranteed to be no longer running and it is safe to take over services from the dead node.

Solution:

The cause of the node failure should be resolved and the node should be rebooted if node failure is unexpected.

273638 :The entry %s and entry %s in property %s have the same port number: %d.

Description:

The two entries named in the message, both in the same list property, specify the same port number.

Solution:

Remove one of the entries or change its port number.

274421 :Port %d%c%s is listed twice in property %s, at entries %d and %d.

Description:

The port number in the message was listed twice in the named property, at the list entry locations given in the message. A port number should only appear once in the property.

Solution:

Specify the property with only one occurrence of the port number.
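
A repeated entry of this kind can be found before the property is set. The following Python sketch assumes that the property value is available as a list of strings of the form port/protocol, such as 80/tcp, and it reports positions counted from 0. Both the entry format and the index base are assumptions, and find_duplicate_ports is a hypothetical helper. It compares whole entries, so it does not flag the same port number used with two different protocols.

```python
def find_duplicate_ports(entries):
    """Return (entry, first_position, repeat_position) for each repeat.

    Entries are compared after trimming whitespace and lowercasing.
    Positions are 0-based (an assumption made for this sketch).
    """
    first_seen = {}
    duplicates = []
    for position, entry in enumerate(entries):
        key = entry.strip().lower()
        if key in first_seen:
            duplicates.append((key, first_seen[key], position))
        else:
            first_seen[key] = position
    return duplicates


if __name__ == "__main__":
    # Hypothetical property value with one repeated entry.
    for entry, first, repeat in find_duplicate_ports(
        ["80/tcp", "443/tcp", "80/tcp"]
    ):
        print(f"{entry} is listed at positions {first} and {repeat}")
```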

274605 :Server is online.

Description:

Informational message. Oracle server is online.

Solution:

None

274887 :clcomm: solaris xdoor: rejected invo: door_return returned, errno = %d

Description:

An unusual but harmless event occurred. System operations continue unaffected.

Solution:

No user action is required.

274901 :Invalid protocol %s given as part of property %s.

Description:

The property named does not have a legal value.

Solution:

Assign the property a legal value.

276380 :"pmfadm -k": Error signaling <%s>: %s

Description:

An error occurred while rpc.pmfd attempted to send a signal to one of the processes of the given tag. The reason for the failure is also given. The signal was sent as a result of a 'pmfadm -k' command.

Solution:

Save the /var/adm/messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.

276672 :reservation error(%s) - did_get_did_path() error

Description:

The device fencing program has suffered an internal error.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available. Copies of /var/adm/messages from all nodes should be provided for diagnosis. It may be possible to retry the failed operation, depending on the nature of the error. If the message specifies the 'node_join' transition, then this node may be unable to access shared devices. If the failure occurred during the 'release_shared_scsi2' transition, then a node which was joining the cluster may be unable to access shared devices. In either case, it may be possible to reacquire access to shared devices by executing '/usr/cluster/lib/sc/run_reserve -c node_join' on all cluster nodes. If the failure occurred during the 'make_primary' transition, then a device group has failed to start on this node. If another node was available to host the device group, then it should have been started on that node. If desired, it may be possible to switch the device group to this node with the scswitch command. If no other node was available, then the device group will not have been started. The scswitch command may be used to retry the attempt to start the device group. If the failure occurred during the 'primary_to_secondary' transition, then the shutdown or switchover of a device group has failed. The desired action may be retried.

277995 :(%s) msg of wrong version %d, expected %d

Description:

Expected to receive a message of a different version. udlmctl will fail.

Solution:

Save the contents of /var/adm/messages, /var/cluster/ucmm/ucmm_reconf.log and /var/cluster/ucmm/dlm*/*logs/* from all the nodes and contact your Sun service representative.

279084 :CMM: node reconfiguration #%lld completed.

Description:

The cluster membership monitor has processed a change in node or quorum status.

Solution:

This is an informational message, no user action is needed.

279152 :listener %s probe successful.

Description:

Informational message. Listener monitor successfully completed first probe.

Solution:

None

279309 :Failfast: Invalid failfast mode %s specified. Returning default mode PANIC.

Description:

An invalid value was supplied for the failfast mode. The software will use the default PANIC mode instead.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available.

280108 :clcomm: unable to rebind %s to name server

Description:

The name server would not rebind this entity.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available.

280256 :clnt_tp_create failed: %s

Description:

A client was not able to make an rpc connection to a server (rpc.pmfd, rpc.fed or rgmd) because it could not create the rpc handle. The rpc error is shown. An error message is output to syslog.

Solution:

Save the /var/adm/messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.

281386 :dl_attach: DL_OK_ACK rtnd prim %u

Description:

Wrong primitive returned to the DL_ATTACH_REQ.

Solution:

Reboot the node. If the problem persists, check the documentation for the private interconnect.

281428 :Failed to retrieve the resource group handle: %s.

Description:

An API operation on the resource group has failed.

Solution:

For the resource group name, check the syslog tag. For more details, check the syslog messages from other components. If the error persists, reboot the node.

281680 :fatal: couldn't initialize ORB, possibly because machine is booted in non-cluster mode

Description:

The rgmd was unable to initialize its interface to the low-level cluster machinery. This might occur because the operator has attempted to start the rgmd on a node that is booted in non-cluster mode. The rgmd will produce a core file, and in some cases it might cause the node to halt or reboot to avoid data corruption.

Solution:

If the node is in non-cluster mode, boot it into cluster mode before attempting to start the rgmd. If the node is already in cluster mode, save a copy of the /var/adm/messages files on all nodes, and of the rgmd core file. Contact your authorized Sun service provider for assistance in diagnosing the problem.

281819 :%s exited with error %s in step %s

Description:

A ucmm step execution failed in the indicated step.

Solution:

None. See /var/adm/messages for previous errors and report this problem if it occurs again during the next reconfiguration.

282406 :fork1 returned %d. Exiting.

Description:

clexecd program has encountered a failed fork1(2) system call. The error message indicates the error number for the failure.

Solution:

282508 :INTERNAL ERROR: r_state_at_least: state <%s> (%d)

Description:

A non-fatal internal error has occurred in the rgmd state machine.

Solution:

Since this problem might indicate an internal logic error in the rgmd, please save a copy of the /var/adm/messages files on all nodes, the output of an scstat -g command, and the output of a scrgadm -pvv command. Report the problem to your authorized Sun service provider.

282828 :reservation warning(%s) - MHIOCRELEASE error will retry in %d seconds

Description:

The device fencing program has encountered errors while trying to access a device. The failed operation will be retried.

Solution:

This is an informational message, no user action is needed.

283262 :HA: rm_state_machine::service_suicide() not yet implemented

Description:

Unimplemented feature was activated.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available.

284006 :reservation fatal error(UNKNOWN) - Out of memory

Description:

The device fencing program has been unable to allocate required memory.

Solution:

Memory usage should be monitored on this node and steps taken to provide more available memory if problems persist. Once memory has been made available, the following steps may need to be taken: If the message specifies the 'node_join' transition, then this node may be unable to access shared devices. If the failure occurred during the 'release_shared_scsi2' transition, then a node which was joining the cluster may be unable to access shared devices. In either case, access to shared devices can be reacquired by executing '/usr/cluster/lib/sc/run_reserve -c node_join' on all cluster nodes. If the failure occurred during the 'make_primary' transition, then a device group has failed to start on this node. If another node was available to host the device group, then it should have been started on that node. The device group can be switched back to this node if desired by using the scswitch command. If no other node was available, then the device group will not have been started. The scswitch command may be used to start the device group. If the failure occurred during the 'primary_to_secondary' transition, then the shutdown or switchover of a device group has failed. The desired action may be retried.

284644 :Warning: node %d has a weight assigned to it for property %s, but node %d is not in the %s for resource %s.

Description:

A node has a weight assigned but the resource can never be active on that node, therefore it doesn't make sense to assign that node a weight.

Solution:

This is an informational message, no user action is needed. Optionally, the weight that is assigned to the node can be omitted.

286722 :scvxvmlg error - remove(%s) failed

Description:

Solution:

286807 :clnt_tp_create_timed of program %s failed %s.

Description:

HA-NFS fault monitor was not able to make an rpc connection to an nfs server.

Solution:

Save the /var/adm/messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.

289194 :Can't perform failover: Failover mode set to NONE.

Description:

Cannot perform failover of the data service. Failover mode is set to NONE.

Solution:

This is an informational message. If failover is desired, then set the Failover_mode value to SOFT or HARD using scrgadm(1M).

289503 :Unable to re-compute NFS resource list.

Description:

The list of HA-NFS resources online on the node has gotten corrupted.

Solution:

Make sure there is space available in /tmp. If the error still appears after that, reboot the node.
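
One way to check the available space is sketched below in Python, using only the standard library. The helper name free_megabytes is hypothetical, and the sketch does not say how much free space HA-NFS needs, because this guide does not state a figure.

```python
import shutil


def free_megabytes(path="/tmp"):
    """Return the free space, in whole megabytes, on the file system
    that holds the given path."""
    usage = shutil.disk_usage(path)
    return usage.free // (1024 * 1024)


if __name__ == "__main__":
    print(f"/tmp has {free_megabytes()} MB free")
```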

290735 :Conversion of hostnames failed.

Description:

Data service is unable to convert the specified hostname into an IP address.

Solution:

Check the syslog messages that occurred just before this one to see whether there is an internal error. If there is, contact your authorized Sun service provider. Otherwise, if the logical host and shared address entries are specified in the /etc/inet/hosts file, check that these entries are correct. If this is not the cause, check the health of the name server.
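
The conversion that failed can be reproduced outside the data service to help narrow down the cause. The following Python sketch asks the system resolver to convert a hostname to IP addresses; the resolver consults the local hosts file and the name service according to the system configuration. The helper name resolve_hostname and the example hostname are hypothetical.

```python
import socket


def resolve_hostname(hostname):
    """Return a sorted list of IP addresses for hostname.

    An empty list means the system resolver could not convert the name.
    """
    try:
        results = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return []
    # Each result is (family, type, proto, canonname, sockaddr);
    # the address string is the first element of sockaddr.
    return sorted({info[4][0] for info in results})


if __name__ == "__main__":
    # "logical-host-1" is a placeholder; substitute the logical host or
    # shared address name configured for the resource.
    addresses = resolve_hostname("logical-host-1")
    print(addresses if addresses else "conversion failed")
```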

291986 :dl_bind ack bad len %d

Description:

Sanity check. The message length in the acknowledgment to the bind request is different from what was expected. We are trying to open a fast path to the private transport adapters.

Solution:

Rebooting the node might fix the problem.

292013 :clcomm: UioBuf: uio was too fragmented - %d

Description:

The system attempted to use a uio that had more than DEF_IOV_MAX fragments.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available.

295666 :clcomm: setrlimit(RLIMIT_NOFILE): %s

Description:

During Galileo initialization within this user process, the setrlimit call failed with the specified error.

Solution:

Read the man page for setrlimit for a more detailed description of the error.
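
To see the limits that the call operates on, the following Python sketch reads the current soft and hard limits on open files and attempts to set the soft limit, returning the error text if the attempt fails. It uses Python's standard resource module and is not how the cluster software makes the call; the function names are hypothetical.

```python
import resource


def nofile_limits():
    """Return the (soft, hard) limits on open file descriptors."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def try_set_soft_nofile(soft):
    """Try to set the soft limit, keeping the hard limit unchanged.

    Return None on success, or the error text on failure.
    """
    _, hard = nofile_limits()
    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))
    except (ValueError, OSError) as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    soft, hard = nofile_limits()
    print(f"soft={soft} hard={hard}")
    # Re-applying the current soft limit is expected to succeed.
    print(try_set_soft_nofile(soft))
```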

295838 :Listener %s started.

Description:

Informational message. HA-Oracle successfully started Oracle listener.

Solution:

None

297061 :clcomm: can't get new reference

Description:

An attempt was made to obtain a new reference on a revoked handler.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available.

297139 :CCR: More than one data server has override flag set for the table %s. Using the table from node %s.

297178 :Error opening procfs control file <%s> for tag <%s>: %s

Description:

The rpc.pmfd server was not able to open a procfs control file, and the system error is shown. procfs control files are required in order to monitor user processes.

Solution:

Investigate if the machine is running out of memory. If this is not the case, save the /var/adm/messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.

297325 :The node portion of %s at position %d in property %s is not a valid node identifier or node name.

Description:

An invalid node was specified for the named property. The position index, which starts at 0 for the first element in the list, indicates which element in the property list was invalid.

Solution:

Specify a valid node instead.

297536 :Could not host device service %s because this node is being shut down

Description:

An attempt was made to start a device group on this node while the node was being shut down.

Solution:

If the node was not being shut down during this time, or if the problem persists, please contact your authorized Sun service provider to determine whether a workaround or patch is available.

297867 :(%s) t_bind: tli error: %s

Description:

Call to t_bind() failed. The "t_bind" man page describes possible error codes. udlmctl will exit.

Solution:

Save the contents of /var/adm/messages, /var/cluster/ucmm/ucmm_reconf.log and /var/cluster/ucmm/dlm*/*logs/* from all the nodes and contact your Sun service representative.

298911 :setrlimit: %s

Description:

The rpc.pmfd server was not able to set the limit of files open. The message contains the system error. This happens while the server is starting up, at boot time. The server does not come up, and an error message is output to syslog.

Solution:

Save the /var/adm/messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.

299417 :in libsecurity strong Unix authorization failed

Description:

A server (rgmd) refused an rpc connection from a client because it failed the Unix authentication. This happens if a caller program using scha public api, either in its C form or its CLI form, is not running as root or is not making the rpc call over the loopback interface. An error message is output to syslog.

Solution:

Check that the calling program using the scha public api is running as root and is calling over the loopback interface. If both are correct, save the /var/adm/messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.