Sun Cluster 3.0 5/02 Error Messages Guide

Error Message List

The following list is ordered by the message ID.
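
When triaging a log, it can help to pull the message ID out of a syslog line and look it up in this list. The sketch below assumes that syslog lines carry the ID in the form "[ID <number> <facility>.<level>]", which this guide does not itself specify. The sample line, the tag "cl_runtime", and the small KNOWN_MESSAGES table are illustrative only.

```python
import re

# Assumption: the message ID appears as "[ID <number> <facility>.<level>]".
MSG_ID_PATTERN = re.compile(r"\[ID (\d+) [\w.]+\]")

# Illustrative subset of this guide, keyed by message ID.
KNOWN_MESSAGES = {
    201878: "clconf: Key length is more than max supported length in clconf_file_io",
    206501: "CMM: Monitoring re-enabled.",
}

def extract_message_id(line):
    """Return the numeric message ID found in a syslog line, or None."""
    match = MSG_ID_PATTERN.search(line)
    return int(match.group(1)) if match else None

# Hypothetical syslog line used only for demonstration.
sample = "Jun  1 12:00:00 node1 cl_runtime: [ID 206501 kern.notice] CMM: Monitoring re-enabled."
msg_id = extract_message_id(sample)
print(msg_id, KNOWN_MESSAGES.get(msg_id, "not in table"))
```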


201348 :Resource <%s> must be in the same resource group as <%s>.

Description:

There is at least one HAStoragePlus resource that this resource depends on that is not in this resource's resource group.

Solution:

Move the resources identified by the message into the same resource group and retry the operation that caused this error.


201878 :clconf: Key length is more than max supported length in clconf_file_io

Description:

While reading configuration data through the CCR FILE interface, the system found that the data length exceeds the maximum supported length.

Solution:

Check the CCR configuration information.


203680 :fatal: Unable to bind to nameserver

Description:

The low-level cluster machinery has encountered a fatal error. The rgmd will produce a core file and will cause the node to halt or reboot to avoid the possibility of data corruption.

Solution:

Save a copy of the /var/adm/messages files on all nodes, and of the rgmd core file. Contact your authorized Sun service provider for assistance in diagnosing the problem.


203739 :Resource %s uses network resource %s in resource group %s, but the property %s for resource group %s does not include resource group %s. This dependency must be set.

Description:

For all network resources used by a scalable resource, a dependency on the resource group containing the network resource should be created for the resource group of the scalable resource.

Solution:

Use the scrgadm(1M) command to update the RG_dependencies property of the scalable resource's resource group to include the resource groups of all network resources that the scalable resource uses.


204163 :clcomm: error in copying for state_balancer

Description:

The system failed a copy operation supporting statistics reporting.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available.


204584 :clexecd: Going down on signal %d.

Description:

The clexecd program received the signal indicated in the error message.

Solution:

The clexecd program will exit and the node will be halted or rebooted to prevent data corruption. Contact your authorized Sun service provider to determine whether a workaround or patch is available.


205445 :check_and_start(): Out of memory

Description:

The system ran out of memory in the function check_and_start().

Solution:

Install more memory, increase swap space, or reduce peak memory consumption.


205754 :All specified device services validated successfully.

Description:

All specified global device services, file system entries, DCS, RGM and SCHA components were found to be in order.

Solution:

An informational message only. No action is needed.


205873 :Permissions incorrect for %s. s bit not set.

Description:

Permissions of $ORACLE_HOME/bin/oracle are expected to be '-rwsr-s--x' (set-user-ID and set-group-ID both set). These permissions are set at the time of Oracle installation. The fault monitor will not function correctly without these permissions.

Solution:

Check file permissions. Check Oracle installation. Relink Oracle, if necessary.
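
The permission check described above can be sketched in Python. This is a minimal illustration, not the fault monitor's own logic: the function names are hypothetical, and the demonstration uses constructed mode values rather than a real Oracle installation.

```python
import os
import stat

EXPECTED = "-rwsr-s--x"  # permission string quoted in the description above

def describe_mode(mode):
    """Return (ls-style string, setuid set?, setgid set?) for a st_mode value."""
    return stat.filemode(mode), bool(mode & stat.S_ISUID), bool(mode & stat.S_ISGID)

def check_oracle_binary(path):
    """Return True if the file at path has exactly the expected permissions."""
    return stat.filemode(os.stat(path).st_mode) == EXPECTED

# Demonstration on constructed mode values, so no Oracle install is needed.
good = stat.S_IFREG | 0o6751  # regular file, setuid + setgid, rwxr-x--x
bad = stat.S_IFREG | 0o0751   # same file with the s bits cleared
print(describe_mode(good))  # ('-rwsr-s--x', True, True)
print(describe_mode(bad))   # ('-rwxr-x--x', False, False)
```

On a live node, check_oracle_binary would be called with the full path to the oracle binary under ORACLE_HOME/bin.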


206501 :CMM: Monitoring re-enabled.

Description:

Transport path monitoring has been re-enabled in the cluster after being disabled.

Solution:

This is an informational message. No user action is needed.


206947 :ON_PENDING_MON_DISABLED: bad resource state <%s> (%d) for resource <%s>

Description:

The rgmd state machine has discovered a resource in an unexpected state on the local node. This should not occur and may indicate an internal logic error in the rgmd.

Solution:

Look for other syslog error messages on the same node. Save a copy of the /var/adm/messages files on all nodes, and report the problem to your authorized Sun service provider.


207186 :Text server shutdown did not succeed.

Description:

Sun Cluster HA for Sybase did not successfully shut down the Text Server.

Solution:

Manually stop the Text Server. Examine the log files and setup. See if the STOP method timeout values are set too low.


207481 :getlocalhostname() failed for resource <%s>, resource group <%s>, method <%s>

Description:

The rgmd was unable to obtain the name of the local host, causing a method invocation to fail. Depending on which method is being invoked and the Failover_mode setting on the resource, this might cause the resource group to fail over or move to an error state.

Solution:

Examine other syslog messages occurring at about the same time to see if the problem can be identified. Save a copy of the /var/adm/messages files on all nodes and contact your authorized Sun service provider for assistance in diagnosing the problem.


207510 :Some BV servers could not be launched on $HOSTNAME.Check BV logs.

Description:

The orbix servers could not be launched. The agent should not return any error because the orbix daemon should try to relaunch the servers at a later time.

Solution:

Check the Sun Cluster HA for BroadVision One-To-One Enterprise logs for the cause of the failure. Refer to your Sun Cluster HA for BroadVision One-To-One Enterprise documentation if the daemons continue to fail and you cannot start the data service.


208216 :ERROR: resource group <%s> has RG_dependency on non-existent resource group <%s>

Description:

A non-existent resource group is listed in the RG_dependencies of the indicated resource group. This should not occur and may indicate an internal logic error in the rgmd.

Solution:

Look for other syslog error messages on the same node. Save a copy of the /var/adm/messages files on all nodes, and report the problem to your authorized Sun service provider.


208596 :clcomm: Path %s being initiated

Description:

A communication link is being established with another node.

Solution:

No action required.


208701 :%s error status ignored in step %s

Description:

Ignoring the error status from step execution since this does not affect outcome of the step.

Solution:

None.


209274 :path_check_start(): Out of memory

Description:

The system ran out of memory in the function path_check_start().

Solution:

Install more memory, increase swap space, or reduce peak memory consumption.


210725 :Warning: While trying to lookup host %s, the length of the returned address (%d) was longer than expected (%d). The address will be truncated.

Description:

The value of the resolved address for the named host was longer than expected.

Solution:

Save a copy of the /var/adm/messages files on all nodes. Contact your authorized Sun service provider for assistance in diagnosing the problem.


210975 :Stop monitoring saposcol under PMF times out.

Description:

Sun Cluster HA for SAP timed out while trying to stop monitoring the OS collector process under the control of the Process Monitor Facility (PMF). This might happen under heavy system load.

Solution:

Increase the stop timeout value.


211198 :Completed successfully.

Description:

Data service method completed successfully.

Solution:

No action required.


212337 :(%s) scan of seqnum failed on "%s", ret = %d

Description:

Could not get the sequence number from the udlm message received.

Solution:

This is an internal error. Save the contents of /var/adm/messages, /var/cluster/ucmm/ucmm_reconf.log and /var/cluster/ucmm/dlm*/*logs/* from all the nodes and contact your Sun service representative.
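
Gathering the files named above from each node can be scripted. The following is a rough sketch, not a Sun-supplied tool: the function name, the archive name, and the "root" parameter (which exists only so the demonstration can run against a throwaway directory) are assumptions.

```python
import glob
import os
import tarfile
import tempfile

# Paths named in the solution above, relative to the filesystem root.
LOG_PATTERNS = [
    "var/adm/messages",
    "var/cluster/ucmm/ucmm_reconf.log",
    "var/cluster/ucmm/dlm*/*logs/*",
]

def collect_logs(archive_path, root="/"):
    """Archive every file matching LOG_PATTERNS under root; return archived names."""
    archived = []
    with tarfile.open(archive_path, "w:gz") as archive:
        for pattern in LOG_PATTERNS:
            for path in sorted(glob.glob(os.path.join(root, pattern))):
                if os.path.isfile(path):
                    name = os.path.relpath(path, root)
                    archive.add(path, arcname=name)
                    archived.append(name)
    return archived

# Demonstration against a temporary directory tree instead of a live node.
with tempfile.TemporaryDirectory() as fake_root:
    os.makedirs(os.path.join(fake_root, "var/adm"))
    with open(os.path.join(fake_root, "var/adm/messages"), "w") as handle:
        handle.write("sample\n")
    print(collect_logs(os.path.join(fake_root, "logs.tar.gz"), root=fake_root))
```

Run once per node, this produces one archive per node that can be handed to the service representative.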


212667 :PNM: could not start due to lock %s

Description:

An attempt was made to start multiple instances of the PNM daemon pnmd(1M), or pnmd(1M) had a problem acquiring a lock on the named file.

Solution:

Check if another instance of pnmd is already running. If not, remove the named lock file and start pnmd using the /etc/init.d/pnm script.


212697 :CMM: Open failed with error `(%s)' and errno = %d for quorum device %s

Description:

The open operation failed for the specified quorum device while it was being added into the cluster. The add of this quorum device will fail.

Solution:

The quorum device has failed or the path to this device may be broken. Refer to the disk repair section of the administration guide for resolving this problem. Retry adding the quorum device after the problem has been resolved.


213112 :latch_intention(): IDL exception when communicating to node %d

Description:

An inter-node communication failed, probably because a node died.

Solution:

No action is required; the rgmd should recover automatically.


213583 :Failed to stop Backup server.

Description:

Sun Cluster HA for Sybase failed to stop backup server using KILL signal.

Solution:

Please examine whether any Sybase server processes are running on the server. Please manually shut down the server.


213599 :Failed to stop backup server.

Description:

Sun Cluster HA for Sybase failed to stop backup server using KILL signal.

Solution:

Please examine whether any Sybase server processes are running on the server. Please manually shut down the server.


213973 :Resource depends on a SUNW.HAStoragePlus type resource that is not online anywhere.

Description:

The resource depends on a SUNW.HAStoragePlus resource that is not online on any cluster node.

Solution:

Bring online all SUNW.HAStoragePlus resources that this HA-NFS resource depends on before performing the operation that caused this error.


213991 :No Network resource in resource group.

Description:

This message indicates that there is no network resource configured for a "network-aware" service. A network-aware service cannot function without a network address.

Solution:

Configure a network resource and retry the command.


215525 :Failed to stop orbixd.

Description:

The stop method could not stop the orbix daemon. This failure might be an internal error.

Solution:

Manually stop the orbixd daemon, and clear all the Sun Cluster HA for BroadVision One-To-One Enterprise processes running on the node where the resource failed. If all the resources on the node where the stop method failed are turned off, delete the /var/run/cluster/bv/bv_orbixd_lock_file file.


215538 :Not all hostnames brought online.

Description:

Failed to bring all the hostnames online. Only some of the IP addresses are online.

Solution:

Use the ifconfig command to make sure that the IP addresses are available. Check for any error message before this error message for a more precise reason for this error. Use the scswitch command to move the resource group to a different node. If the problem persists, reboot.


216087 :rebalance: resource group <%s> is being switched updated or failed back, cannot assign new primaries

Description:

The indicated resource group has lost a master due to a node death. However, the RGM is unable to switch the resource group to a new master because the resource group is currently in the process of being modified by an operator action, or is currently in the process of "failing back" onto a node that recently joined the cluster.

Solution:

Use scstat(1M) -g to determine the current mastery of the resource group. If necessary, use scswitch(1M) -z to switch the resource group online on desired nodes.


216244 :CCR: Table %s has invalid checksum field. Reported: %s, actual: %s.

Description:

The indicated table has an invalid checksum that does not match the table contents. This causes the consistency check on the indicated table to fail.

Solution:

Boot the offending node in -x mode to restore the indicated table from backup or other nodes in the cluster. The CCR tables are located at /etc/cluster/ccr/.


216379 :Stopping fault monitor using pmfadm tag %s

Description:

Informational message. The fault monitor will be stopped using the Process Monitoring Facility (PMF), with the tag indicated in the message.

Solution:

None.


216774 :WARNING: update_state:udlm_send_reply failed

Description:

This is a warning about a udlm state update. It results in a udlm abort.

Solution:

This is an internal error. Save the contents of /var/adm/messages, /var/cluster/ucmm/ucmm_reconf.log and /var/cluster/ucmm/dlm*/*logs/* from all the nodes and contact your Sun service representative.


217093 :Call failed: %s

Description:

A client was not able to make an rpc connection to a server (rpc.pmfd, rpc.fed or rgmd) to execute the action shown. The rpc error message is shown. An error message is output to syslog.

Solution:

Save the /var/adm/messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.


218227 :Error accessing policy string

Description:

This message appears when the customer is initializing or changing a scalable services load balancer, by starting or updating a service. The Load_Balancing_String is missing.

Solution:

Add a Load_Balancing_String parameter when creating the resource group.


218780 :Stopping the monitor server.

Description:

Sun Cluster HA for Sybase is shutting down the Monitor Server.

Solution:

No user action required.


219058 :Failed to stop the backup server using %s.

Description:

Sun Cluster HA for Sybase failed to stop the backup server using the file specified in the STOP_FILE property. Other syslog messages and the log file will provide additional information on possible reasons for the failure.

Solution:

Please check the permissions of the file specified in the STOP_FILE extension property. The file should be executable by the Sybase owner and the root user.


219930 :Cannot determine if the server is secure: assuming secure.

Description:

An error occurred while attempting to determine whether the Netscape server is running in secure or non-secure mode. Because of this error, the data service assumes a secure Netscape server and probes the server as such.

Solution:

This message is issued after an internal error has occurred. Refer to syslog for that message.


220849 :CCR: Create table %s failed.

Description:

The CCR failed to create the indicated table.

Solution:

The failure can have many causes. For some of them no user action is required, because the CCR client handles the failure. The cases that do require user action depend on other messages from the CCR on the node, and include the following. If the create failed because the cluster lost quorum, reboot the cluster. If the root file system is full on the node, free up space by removing unnecessary files. If the root disk on the affected node has failed, replace it. If the cluster repository is corrupted, as indicated by other CCR messages, boot the offending node(s) in -x mode to restore the cluster repository from backup. The cluster repository is located at /etc/cluster/ccr/.


222512 :fatal: could not create death_ff

Description:

The daemon indicated in the message tag (rgmd or ucmmd) was unable to create a failfast device. The failfast device kills the node if the daemon process dies either due to hitting a fatal bug or due to being killed inadvertently by an operator. This is a requirement to avoid the possibility of data corruption. The daemon will produce a core file and will cause the node to halt or reboot.

Solution:

Save a copy of the /var/adm/messages files on all nodes, and of the core file generated by the daemon. Contact your authorized Sun service provider for assistance in diagnosing the problem.


223145 :gethostbyname failed for (%s)

Description:

Failed to get information about a host. The "gethostbyname" man page describes possible reasons.

Solution:

Make sure entries in /etc/hosts, /etc/nsswitch.conf and /etc/netconfig are correct to get information about this host.
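
After correcting the name service files, a quick lookup confirms whether the host now resolves. The sketch below uses Python's standard socket.gethostbyname call; the wrapper name resolve_host and its injectable "resolver" parameter are assumptions made so the behavior can be exercised without a live name service.

```python
import socket

def resolve_host(name, resolver=socket.gethostbyname):
    """Return (address, None) on success or (None, error text) on failure."""
    try:
        return resolver(name), None
    except OSError as exc:  # socket.gaierror and socket.herror are subclasses
        return None, str(exc)

# "localhost" is used here only as a stand-in for the host named in the message.
address, error = resolve_host("localhost")
print(address or f"lookup failed: {error}")
```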


223458 :INTERNAL ERROR CMM: quorum_algorithm_init called already.

Description:

This is an internal error during node initialization, and the system cannot continue.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available.


224682 :Failed to initialize the probe history.

Description:

A process has failed to allocate memory for the probe history structure, most likely because the system has run out of swap space.

Solution:

To solve this problem, increase swap space by configuring additional swap devices. See swap(1M) for more information.


224718 :Failed to create scalable service in group %s for IP %s Port %d%c%s: %s.

Description:

A call to the underlying scalable networking code failed. This call may fail because the IP, Port, and Protocol combination listed in the message conflicts with the configuration of an existing scalable resource. A conflict can occur if the same combination exists in a scalable resource that is already configured on the cluster. A combination may also conflict if there is a resource that uses Load_balancing_policy LB_STICKY_WILD with the same IP address as a different resource that also uses LB_STICKY_WILD.

Solution:

Try using a different IP, Port, and Protocol combination. Otherwise, save a copy of the /var/adm/messages files on all nodes. Contact your authorized Sun service provider for assistance in diagnosing the problem.
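
The two conflict cases described above can be checked before a new scalable resource is created. This is an illustrative model only, not the scalable networking code: the ScalableService record, the find_conflict function, the resource names, and the placeholder policy name "OTHER_POLICY" are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ScalableService:
    resource: str
    ip: str
    port: int
    protocol: str
    policy: str  # "LB_STICKY_WILD" or any other policy name

def find_conflict(new, existing):
    """Return a reason string if `new` conflicts with an existing service, else None."""
    for other in existing:
        if other.resource == new.resource:
            continue  # a resource does not conflict with itself
        if (other.ip, other.port, other.protocol) == (new.ip, new.port, new.protocol):
            return f"same IP/port/protocol as {other.resource}"
        if new.policy == other.policy == "LB_STICKY_WILD" and new.ip == other.ip:
            return f"LB_STICKY_WILD already used on {new.ip} by {other.resource}"
    return None

# Hypothetical configuration; addresses come from the documentation range.
existing = [ScalableService("web-rs", "192.0.2.10", 80, "tcp", "LB_STICKY_WILD")]
candidate = ScalableService("app-rs", "192.0.2.10", 8080, "tcp", "LB_STICKY_WILD")
print(find_conflict(candidate, existing))
```

Here the candidate uses a different port, yet it still conflicts because both resources use LB_STICKY_WILD on the same IP address.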


224783 :clcomm: Path %s has been deleted

Description:

A communication link is being removed with another node. The interconnect may have failed or the remote node may be down.

Solution:

Any interconnect failure should be resolved, and/or the failed node rebooted.


225882 :Internal: Unknown command type (%d)

Description:

An internal error has occurred in the rgmd while trying to connect to the rpc.fed server.

Solution:

Save the /var/adm/messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.


226914 :scswitch: internal error: bad nodename %s in nodelist of resource group %s

Description:

The indicated resource group's Nodelist property, as stored in the CCR, contains an invalid nodename. This might indicate corruption of CCR data or rgmd in-memory state. The scswitch command will fail.

Solution:

Use scstat(1M) -g and scrgadm(1M) -pvv to examine resource group properties. If the values appear corrupted, the CCR might have to be rebuilt. If values appear correct, this may indicate an internal error in the rgmd. Contact your authorized Sun service provider for assistance in diagnosing and correcting the problem.


227214 :Error: duplicate method <%s> launched on resource <%s> in resource group <%s>

Description:

Due to an internal error, the rgmd state machine has attempted to launch two different methods on the same resource on the same node, simultaneously. The rgmd will reject the second attempt and treat it as a method failure.

Solution:

Save a copy of the /var/adm/messages files on all nodes. Contact your authorized Sun service provider for assistance in diagnosing the problem.


227820 :Attempting to stop the data service running under process monitor facility.

Description:

The function is going to request the PMF to stop the data service. If the request fails, refer to the syslog messages that appear after this message.

Solution:

This is an informational message. No user action is required.


228021 :Failed to retrieve the extension property <%s> for NetBackup, error : %s.

Description:

The extension <%s> is missing in the RTR File. This is a serious error. The RTR file might be corrupted.

Solution:

Reload the Sun Cluster HA for NetBackup package. If this problem persists contact your authorized Sun service provider for assistance.


228212 :reservation fatal error(%s) - unable to get local node id.

Description:

The device fencing program has suffered an internal error.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available. Copies of /var/adm/messages from all nodes should be provided for diagnosis. It may be possible to retry the failed operation, depending on the nature of the error. If the message specifies the `node_join' transition, then this node may be unable to access shared devices. If the failure occurred during the `release_shared_scsi2' transition, then a node which was joining the cluster may be unable to access shared devices. In either case, it may be possible to reacquire access to shared devices by executing `/usr/cluster/lib/sc/run_reserve -c node_join' on all cluster nodes. If the failure occurred during the `make_primary' transition, then a device group has failed to start on this node. If another node was available to host the device group, then it should have been started on that node. If desired, it may be possible to switch the device group to this node with the scswitch command. If no other node was available, then the device group will not have been started. The scswitch command may be used to retry the attempt to start the device group. If the failure occurred during the `primary_to_secondary' transition, then the shutdown or switchover of a device group has failed. The desired action may be retried.


228399 :Unable to stop processes running under PMF tag %s.

Description:

Sun Cluster HA for Sybase failed to stop processes using Process Monitoring Facility. Please examine if the PMF tag indicated in the message exists on the node. (command: pmfadm -l <tag>). Other syslog messages and the log file will provide additional information on possible reasons for the failure.

Solution:

Please examine the Sybase logs and syslog messages. If this message is seen under heavy system load, it will be necessary to increase the Stop_timeout property of the resource.


228461 :CMM: Issuing a SCSI2 Release failed on quorum device %s with error %d.

Description:

This node encountered the specified error while issuing a SCSI2 Release operation on the specified quorum device. The quorum code will either retry this operation or will ignore this quorum device.

Solution:

There may be other related messages that may provide more information regarding the cause of this problem. SCSI2 operations fail with an error code of EACCES if SCSI3 keys are present on the device. Scrub the SCSI3 keys off of the quorum device.


231770 :ns: Could not initialize ORB: %d

Description:

Could not initialize the ORB.

Solution:

Please make sure the nodes are booted in cluster mode.


231991 :WARNING: lkcm_dreg: udlm_send_reply failed

Description:

Could not deregister udlm with ucmm.

Solution:

None.


232201 :Invalid port number returned.

Description:

Invalid port number was retrieved for the Port_list property of the resource.

Solution:

Any of the following situations may have occurred, and each requires a different user action. 1) If a new resource has been created or updated, check whether it has a valid port number. If the port number is not valid, provide a valid port number using the scrgadm(1M) command. 2) Check the syslog messages that occurred just before this message. If they indicate an "Out of memory" problem, correct it. 3) For all other cases, treat it as an internal error. Contact your authorized Sun service provider.


232501 :Validation failed. ORACLE_HOME/bin/svrmgrl not found ORACLE_HOME=%s

Description:

Oracle binaries (svrmgrl) were not found in the ORACLE_HOME/bin directory. The ORACLE_HOME specified for the resource is indicated in the message. HA-Oracle will not be able to manage the resource if ORACLE_HOME is incorrect.

Solution:

Specify the correct ORACLE_HOME when creating the resource. If the resource is already created, update the resource property 'ORACLE_HOME'.


232565 :Scalable services enabled.

Description:

This means that the scalable services framework is set up in the cluster. Specifically, this message is printed for the node that has joined the cluster and for which services have been downloaded. Once the services have been downloaded, those services are ready to participate as scalable services.

Solution:

This is an informational message. No user action is needed.


232920 :-d must be followed by a hex bitmask

Description:

Incorrect arguments were used while setting up Sun-specific startup parameters for the Oracle UNIX dlm.

Solution:

This is an internal error. Save the contents of /var/adm/messages, /var/cluster/ucmm/ucmm_reconf.log and /var/cluster/ucmm/dlm*/*logs/* from all the nodes and contact your Sun service representative.


233017 :Successfully stopped %s.

Description:

The resource was successfully stopped by Sun Cluster.

Solution:

No user action is required.


233053 :SharedAddress offline.

Description:

The status of the sharedaddress resource is offline.

Solution:

This is an informational message. No user action is required.


233327 :Switchover (%s) error: failed to mount FS (%d)

Description:

The file system specified in the message could not be hosted on the node the message came from.

Solution:

Check /var/adm/messages to make sure there were no device errors. If not, contact your authorized Sun service provider to determine whether a workaround or patch is available.


233956 :Error in reading message in child process: %m

Description:

Error occurred when reading message in fault monitor child process. Child process will be stopped and restarted.

Solution:

If error persists, then disable the fault monitor and report the problem.


233961 :scvxvmlg error - symlink(%s, %s) failed

Description:

The program responsible for maintaining the VxVM namespace was unable to access the global device namespace. If configuration changes were recently made to VxVM diskgroups or volumes, this node may be unaware of those changes. Recently created volumes may be inaccessible from this node.

Solution:

Verify that the /global/.devices/node@N (N = this node's node number) is mounted globally and is accessible. If no configuration changes have been recently made to VxVM diskgroups or volumes and all volumes continue to be accessible from this node, then no further action is required. If changes have been made, the device namespace on this node can be updated to reflect those changes by executing '/usr/cluster/lib/dcs/scvxvmlg'. If the problem persists, contact your authorized Sun service provider to determine whether a workaround or patch is available.


234438 :INTERNAL ERROR: Invalid resource property type <%d> on resource <%s>; aborting node

Description:

An attempted creation or update of a resource has failed because of invalid resource type data. This may indicate CCR data corruption or an internal logic error in the rgmd. The rgmd will produce a core file and will force the node to halt or reboot.

Solution:

Use scrgadm(1M) -pvv to examine resource properties. If the resource or resource type properties appear to be corrupted, the CCR might have to be rebuilt. If values appear correct, this may indicate an internal error in the rgmd. Retry the creation or update operation. If the problem recurs, save a copy of the /var/adm/messages files on all nodes and contact your authorized Sun service provider for assistance.


234463 :INTERNAL ERROR: process_resource: resource group <%s> is pending_mon_disable but contains resource <%s> in STOP_FAILED state

Description:

During a resource monitor disable (scswitch -M -n), the rgmd has discovered a resource in STOP_FAILED state. This may indicate an internal logic error in the rgmd, since updates are not permitted on the resource group until the STOP_FAILED error condition is cleared.

Solution:

Look for other syslog error messages on the same node. Save a copy of the /var/adm/messages files on all nodes, and report the problem to your authorized Sun service provider.


236733 :lookup of oracle dba gid failed.

Description:

Could not find the group id for dba. The udlm will not start up.

Solution:

Make sure the /etc/nsswitch.conf and /etc/group files are valid and have the correct information to get the group id of dba.


237149 :clcomm: Path %s being constructed

Description:

A communication link is being established with another node.

Solution:

No action required.


237724 :Failed to retrieve hostname: %s.

Description:

The callback method failed to determine the hostname. The callback methods will now be executed in the /var/core directory.

Solution:

No user action is needed. For detailed error message, look at the syslog message.


237744 :SAP was brought up outside of HA-SAP, HA-SAP will not shut it down.

Description:

Sun Cluster HA for SAP started up outside of the control of the Sun Cluster software. It should not shut down automatically.

Solution:

Shut down Sun Cluster HA for SAP, before trying to start Sun Cluster HA for SAP under the control of the Sun Cluster software.


239415 :Failed to retrieve the cluster handle: %s.

Description:

Access to the object named failed. The reason for the failure is given in the message.

Solution:

Save a copy of the /var/adm/messages files on all nodes. Contact your authorized Sun service provider for assistance in diagnosing the problem.


239735 :Couldn't parse policy string %s

Description:

This message appears when the customer is initializing or changing a scalable services load balancer, by starting or updating a service. The Load_Balancing_String is invalid.

Solution:

Check the Load_Balancing_String value specified when creating the resource group and make sure that a valid value is used.


240107 :resource %s state on node %s change to R_ONLINE

Description:

This is a notification from the rgmd that a resource's state has changed. This may be used by system monitoring tools.

Solution:

This is an informational message. No user action is needed.


240376 :No protocol was given as part of property %s for element %s. The property must be specified as %s=PortNumber%cProtocol,PortNumber%cProtocol,...

Description:

The property named does not have a legal value.

Solution:

Assign the property a legal value.
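
A value can be checked against the "PortNumber, separator, Protocol" layout shown in the message before it is assigned. The sketch below is not the validation code used by the data service. The message does not say which character the %c separator is, so "/" is an assumption, as are the accepted protocol names and the function name.

```python
def parse_port_list(value, separator="/", protocols=("tcp", "udp")):
    """Parse 'port<sep>proto,port<sep>proto,...' into a list of (port, proto).

    Raises ValueError for a missing protocol, an unknown protocol, or a bad port.
    The separator and protocol names are assumptions, not taken from the guide.
    """
    entries = []
    for element in value.split(","):
        port_text, sep, proto = element.strip().partition(separator)
        if not sep or not proto:
            raise ValueError(f"no protocol given for element {element!r}")
        if proto.lower() not in protocols:
            raise ValueError(f"unknown protocol {proto!r} in element {element!r}")
        if not port_text.isdecimal() or not 1 <= int(port_text) <= 65535:
            raise ValueError(f"invalid port number in element {element!r}")
        entries.append((int(port_text), proto.lower()))
    return entries

print(parse_port_list("80/tcp,53/udp"))  # [(80, 'tcp'), (53, 'udp')]
```

A value such as "80" with no protocol raises ValueError, which mirrors the condition this message reports.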


240388 :Prog <%s> step <%s>: timed out.

Description:

A step has exceeded its configured timeout and was killed by ucmmd. This in turn will cause a reconfiguration of OPS.

Solution:

Other syslog messages occurring just before this one might indicate the reason for the failure. After correcting the problem that caused the step to fail, the operator may retry reconfiguration of OPS.


241147 :Invalid value %s for property %s.

Description:

An invalid value was supplied for the property.

Solution:

Supply "conf" or "boot" as the value for DNS_mode property.


241441 :clexecd: ioctl(I_RECVFD) returned %d. Returning %d to clexecd.

Description:

clexecd program has encountered a failed ioctl(2) system call. The error message indicates the error number for the failure.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available.


241948 :Failed to retrieve resource <%s> extension property <%s>: %s.

Description:

An internal error occurred in the rgmd while checking a resource property.

Solution:

Look for other syslog error messages on the same node. Save a copy of the /var/adm/messages files on all nodes, and report the problem to your authorized Sun service provider.


242214 :clexecd: fork1 returned %d. Returning %d to clexecd.

Description:

clexecd program has encountered a failed fork1(2) system call. The error message indicates the error number for the failure.

Solution:

If the error number is 12 (ENOMEM), install more memory, increase swap space, or reduce peak memory consumption. If error number is something else, contact your authorized Sun service provider to determine whether a workaround or patch is available.
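
The decision in the solution above amounts to a lookup on the error number. As a small sketch, Python's standard errno module can translate the number into its symbolic name; the function name advise_on_fork_failure is hypothetical, and the advice strings paraphrase this entry.

```python
import errno

def advise_on_fork_failure(error_number):
    """Map the error number from the message to the action suggested in this entry."""
    if error_number == errno.ENOMEM:
        return "Install more memory, increase swap space, or reduce peak memory consumption."
    # errno.errorcode maps numbers to names such as 'EAGAIN' on the local platform.
    name = errno.errorcode.get(error_number, "unknown")
    return f"Error {error_number} ({name}): contact your authorized Sun service provider."

print(advise_on_fork_failure(errno.ENOMEM))
print(advise_on_fork_failure(errno.EAGAIN))
```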


243444 :CMM: Issuing a SCSI2 Tkown failed for quorum device with error %d.

Description:

This node encountered the specified error while issuing a SCSI2 Tkown operation on a quorum device. This will cause the node to conclude that it has been unsuccessful in preempting keys from the quorum device, and therefore the partition to which it belongs has been preempted. If a cluster gets divided into two or more disjoint subclusters, exactly one of these must survive as the operational cluster. The surviving cluster forces the other subclusters to abort by grabbing enough votes to grant it majority quorum. This is referred to as preemption of the losing subclusters.

Solution:

There will be other related messages that will identify the quorum device for which this error has occurred. If the error encountered is EACCES, then the SCSI2 command could have failed due to the presence of SCSI3 keys on the quorum device. Scrub the SCSI3 keys off of it, and reboot the preempted nodes.
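
The majority rule behind preemption can be illustrated with simple arithmetic: a subcluster survives only if it holds more than half of the configured votes. The sketch below is a toy model, not the CMM quorum algorithm, and the two-node vote counts are a hypothetical example.

```python
def has_majority(partition_votes, total_votes):
    """Return True if a partition holds more than half of the configured votes."""
    return 2 * partition_votes > total_votes

def surviving_partitions(partitions, total_votes):
    """Return the names of partitions that hold majority quorum (at most one)."""
    return [name for name, votes in partitions.items() if has_majority(votes, total_votes)]

# Hypothetical two-node cluster with one quorum device worth one vote: 3 votes total.
partitions = {
    "node1 + quorum device": 2,  # node vote plus the device vote it acquired
    "node2": 1,                  # failed to acquire the device, so it is preempted
}
print(surviving_partitions(partitions, total_votes=3))
```

In this model, a node whose Tkown operation fails ends up in the position of "node2": without the device vote it cannot reach a majority.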


243639 :Scalable service instance [%s,%s,%d] deregistered on node %s.

Description:

The specified scalable service has been deregistered on the specified node. The gif node can no longer redirect packets for the specified service to this node.

Solution:

This is an informational message, no user action is needed.


243965 :udlm_ack_msg: udp is null!

Description:

Cannot acknowledge a message received from udlmctl because the address to acknowledge to is null.

Solution:

None.


243996 :Failed to retrieve resource <%s> extension property <%s>

Description:

Cannot retrieve the extension property.

Solution:

Look for other syslog error messages on the same node. Save a copy of the /var/adm/messages files on all nodes, and report the problem to your authorized Sun service provider.


244116 :clcomm: socreate on routing socket failed with error = %d

Description:

The system prepares IP communications across the private interconnect. A socket create operation on the routing socket failed.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available.


244218 :Could not stop the BV processes on $HOSTNAME.

Description:

The Sun Cluster HA for BroadVision One-To-One Enterprise processes on the specified host could not be stopped.

Solution:

Manually stop the Sun Cluster HA for BroadVision One-To-One Enterprise processes. If the orbixd daemon does not start, contact your authorized Sun service provider for assistance. Provide your authorized Sun service provider a copy of the /var/adm/messages files from all nodes.


245186 :reservation warning(%s) - MHIOCGRP_PREEMPTANDABORT error will retry in %d seconds

Description:

The device fencing program has encountered errors while trying to access a device. The failed operation will be retried.

Solution:

This is an informational message, no user action is needed.


245679 :An error occured while obtaining the global service name associated with the file system mount point %s.

Description:

The device special file associated with a file system mount point may not be a valid DCS device. Only global devices can be used as device special files for specifying local file system mount points.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available. Inspect the syslog for other errors.


247682 :recv_message: cm_reconfigure: %s

Description:

udlm received a message to reconfigure.

Solution:

None. OPS is going to reconfigure.


247752 :Failed to start the service %s.

Description:

Specified data service failed to start.

Solution:

Check the /var/adm/messages files for the cause of the failure. Contact your authorized Sun service provider for assistance. Provide your authorized Sun service provider a copy of the /var/adm/messages files from all nodes.


247868 :in libsecurity: file %s not readable or bad content

Description:

The rpc.pmfd, rpc.fed or rgmd server was not able to read an rpcbind information cache file, or the file's contents are corrupted. The affected component should continue to function by calling rpcbind directly.

Solution:

Save the /var/adm/messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.


248031 :scvxvmlg warning - %s does not exist, creating it

Description:

The program responsible for maintaining the VxVM device namespace has discovered inconsistencies between the VxVM device namespace on this node and the VxVM configuration information stored in the cluster device configuration system. If configuration changes were made recently, then this message should reflect one of the configuration changes. If no changes were made recently or if this message does not correctly reflect a change that has been made, the VxVM device namespace on this node may be in an inconsistent state. VxVM volumes may be inaccessible from this node.

Solution:

If this message correctly reflects a configuration change to VxVM diskgroups then no action is required. If the change this message reflects is not correct, then the information stored in the device configuration system for each VxVM diskgroup should be examined for correctness. If the information in the device configuration system is accurate, then executing '/usr/cluster/lib/dcs/scvxvmlg' on this node should restore the device namespace. If the information stored in the device configuration system is not accurate, it must be updated by executing '/usr/cluster/bin/scconf -c -D name=diskgroup_name' for each VxVM diskgroup with inconsistent information.


249804 :INTERNAL ERROR CMM: Failure creating sender thread.

Description:

An instance of the userland CMM encountered an internal initialization error. This is caused by inadequate memory on the system.

Solution:

Add more memory to the system. If that does not resolve the problem, contact your authorized Sun service provider to determine whether a workaround or patch is available.


249934 :Method <%s> failed to execute on resource <%s> in resource group <%s>, error: <%d>

Description:

A resource method failed to execute, due to a system error number identified in the message. The indicated error number appears not to match any of the known error values described in intro(2). This is considered a method failure. Depending on which method is being invoked and the Failover_mode setting on the resource, this might cause the resource group to fail over or move to an error state, or it might cause an attempted edit of a resource group or its resources to fail.

Solution:

Other syslog messages occurring at about the same time might provide evidence of the source of the problem. If not, save a copy of the /var/adm/messages files on all nodes, and (if the rgmd did crash) a copy of the rgmd core file, and contact your authorized Sun service provider for assistance.


250047 :Failed to start Broadvision servers on %s.

Description:

The Sun Cluster HA for BroadVision One-To-One Enterprise servers could not start on the specified host. This failure occurs if the orbix daemon did not properly start or if there are configuration errors.

Solution:

See if there are any internal errors. Verify the Sun Cluster HA for BroadVision One-To-One Enterprise configuration. Manually start Sun Cluster HA for BroadVision One-To-One Enterprise on the specified host. If you cannot manually start the orbixd daemon, contact your authorized Sun service provider for assistance. Provide your authorized Sun service provider a copy of the /var/adm/messages files from all nodes.


250133 :Failed to open the device %s: %s.

Description:

This is an internal error. System failed to perform the specified operation.

Solution:

For specific error information check the syslog message. Provide the following information to your authorized Sun service provider to diagnose the problem. 1) Saved copy of /var/adm/messages file 2) Output of "ls -l /dev/sad" command 3) Output of "modinfo | grep sad" command.


250387 :Stop fault monitor using pmfadm failed. tag %s error=%s.

Description:

The Process Monitor Facility (PMF) could not stop the Sun Cluster HA for Sybase fault monitor. The fault monitor tag is provided in the message. The error message returned by the PMF is indicated in the message.

Solution:

Stop the fault monitor processes. Contact your authorized Sun service provider to report this problem.


250709 :CMM: Initialization for quorum device %s failed with error EACCES. Will retry.

Description:

This node is not able to access the specified quorum device because the node is still fenced off. A retry will be attempted.

Solution:

This is an informational message, no user action is needed.


250800 :clconf: Not found clexecd on node %d for %d seconds. Giving up!

Description:

clexecd could not be found to execute the program on the indicated node. The system has given up after retrying for the indicated number of seconds.

Solution:

No action required. This is an informational message.


251472 :Validation failed. SYBASE directory %s does not exist.

Description:

The indicated directory does not exist. The SYBASE environment variable might be incorrectly set or the installation might be incorrect.

Solution:

Verify the SYBASE environment variable value and the Sybase installation.


251552 :Failed to validate configuration.

Description:

The data service is not properly configured.

Solution:

Look at the prior syslog messages for specific problems and take corrective action.


252457 :The %s command does not have execute permissions: <%s>.

Description:

This command input to the agent builder does not have the expected default execute permissions.

Solution:

Reset the permissions to allow execute permissions using the chmod command.
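
The effect of such a chmod change can be sketched in Python as follows. This is a minimal illustration, not the agent builder's own logic: it assumes that execute permission should be granted to owner, group, and others (the equivalent of chmod a+x), which the message does not state, and it operates on a temporary file that stands in for the command. The helper name add_execute_permission is hypothetical.

```python
import os
import stat
import tempfile

def add_execute_permission(path):
    """Add execute permission for owner, group, and others."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return stat.S_IMODE(os.stat(path).st_mode)

# Demonstration on a temporary file standing in for the command.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    demo_path = handle.name
os.chmod(demo_path, 0o644)
new_mode = add_execute_permission(demo_path)
print(oct(new_mode))
os.remove(demo_path)
```

If only the owner needs to run the command, restrict the added bits to stat.S_IXUSR.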


254131 :resource group %s removed.

Description:

This is a notification from the rgmd that the operator has deleted a resource group. This may be used by system monitoring tools.

Solution:

This is an informational message, no user action is needed.


254388 :Failed to retrieve Message server pid.

Description:

Failed to retrieve the process ID for the message server, indicating the message server process is not running.

Solution:

No user action required. The fault monitor should detect that the message server process is not running, and take appropriate action.


254692 :scswitch: internal error: bad state <%s> (<%d>) for resource group <%s>

Description:

While attempting to execute an operator-requested switch of the primaries of a resource group, the rgmd has discovered the indicated resource group to be in an invalid state. The switch action will fail.

Solution:

This may indicate an internal error or bug in the rgmd. Contact your authorized Sun service provider for assistance in diagnosing and correcting the problem.


255115 :Retrying to retrieve the resource type information.

Description:

An update to cluster configuration occurred while resource type properties were being retrieved.

Solution:

Ignore the message.


255929 :in libsecurity authsys_create_default failed

Description:

A client was not able to make an rpc connection to a server (rpc.pmfd, rpc.fed or rgmd) because it failed the authentication process. An error message is output to syslog.

Solution:

Save the /var/adm/messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.


258357 :Method <%s> failed to execute on resource <%s> in resource group <%s>, error: <%s>

Description:

A resource method failed to execute, due to a system error described in the message. For an explanation of the error message, consult intro(2). This is considered a method failure. Depending on which method was being invoked and the Failover_mode setting on the resource, this might cause the resource group to fail over or move to an error state, or it might cause an attempted edit of a resource group or its resources to fail.

Solution:

If the error message is not self-explanatory, other syslog messages occurring at about the same time might provide evidence of the source of the problem. If not, save a copy of the /var/adm/messages files on all nodes, and (if the rgmd did crash) a copy of the rgmd core file, and contact your authorized Sun service provider for assistance.


258909 :clexecd: sigfillset returned %d. Exiting.

Description:

clexecd program has encountered a failed sigfillset(3C) library call. The error message indicates the error number for the failure.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available.


259455 :in fe_set_env_vars malloc failed

Description:

The rgmd server was not able to allocate memory for the environment name, while trying to connect to the rpc.fed server, possibly due to low memory. An error message is output to syslog.

Solution:

Investigate whether the host is running out of memory. If not, save the /var/adm/messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.


259810 :reservation error(%s) - do_scsi3_reserve() error for disk %s

Description:

The device fencing program has encountered errors while trying to access a device. All retry attempts have failed.

Solution:

For the user action required by this message, see the user action for message 192619.


261123 :resource group %s state change to managed.

Description:

This is a notification from the rgmd that a resource group's state has changed. This may be used by system monitoring tools.

Solution:

This is an informational message, no user action is needed.


262295 :Failback bailing out because resource group <%s>is being updated or switched

Description:

The rgmd was unable to failback the specified resource group to a more preferred node because the resource group was already in the process of being updated or switched.

Solution:

This is an informational message, no user action is needed.


262898 :Name service not available.

Description:

The monitor_check method detected that name service is not responsive.

Solution:

Check whether the name service is configured correctly. Try some commands to query name servers, such as ping and nslookup, and correct the problem. If the error persists, reboot the node.
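
A quick name lookup can also be scripted. The following Python sketch only approximates this kind of check and is not the monitor_check method itself. It assumes that a failed lookup through the system resolver indicates an unresponsive name service; the helper name name_service_ok and the resolver parameter are hypothetical.

```python
import socket

def name_service_ok(hostname, resolver=socket.gethostbyname):
    """Return True if hostname resolves, False on a lookup failure."""
    try:
        resolver(hostname)
        return True
    except OSError:
        # socket.gaierror and socket.herror are subclasses of OSError.
        return False

print(name_service_ok("localhost"))
```

Replace "localhost" with a host name that must be resolved through the configured name service, because local names may be answered from local files without contacting a name server.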


263258 :CCR: More than one copy of table %s has the same version but different checksums. Using the table from node %s.

Description:

The CCR detects that two valid copies of the indicated table have the same version but different contents. The copy on the indicated node will be used by the CCR.

Solution:

This is an informational message, no user action is needed.


263606 :unpack_rg_seq: rname_to_r error <%s>

Description:

Due to an internal error, the rgmd was unable to find the specified resource data in memory.

Solution:

Save a copy of the /var/adm/messages files on all nodes. Contact your authorized Sun service provider for assistance in diagnosing the problem.


263706 :Could not allocate memory.

Description:

An attempt to allocate dynamic memory failed.

Solution:

Install more memory, increase swap space or reduce peak memory consumption.


265925 :CMM: Cluster lost operational quorum; aborting.

Description:

Not enough nodes are operational to maintain a majority quorum, causing the cluster to fail to avoid a potential split brain.

Solution:

The nodes should be rebooted.
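
The majority rule behind this message can be illustrated with a short Python sketch. It assumes that "majority" means strictly more than half of all configured quorum votes; the vote counts and the helper name has_majority are hypothetical.

```python
def has_majority(partition_votes, total_configured_votes):
    """True if the partition holds more than half of all configured votes."""
    return 2 * partition_votes > total_configured_votes

# Hypothetical two-node cluster with one quorum device: 3 votes in total.
total = 3
print(has_majority(2, total))  # a node plus the quorum device
print(has_majority(1, total))  # a node on its own
```

Under this assumption, at most one partition can hold a majority at a time, which is why the partition without a majority aborts.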


266059 :security_svc_reg failed.

Description:

The rpc.pmfd server was not able to initialize authentication and rpc initialization. This happens while the server is starting up, at boot time. The server does not come up, and an error message is output to syslog.

Solution:

Save the /var/adm/messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.


266834 :CMM: Our partition has been preempted.

Description:

The cluster partition to which this node belongs has been preempted by another partition during a reconfiguration. The preempted partition will abort. If a cluster gets divided into two or more disjoint subclusters, exactly one of these must survive as the operational cluster. The surviving cluster forces the other subclusters to abort by grabbing enough votes to grant it majority quorum. This is referred to as preemption of the losing subclusters.

Solution:

There may be other related messages that may indicate why quorum was lost. Determine why quorum was lost on this node partition, resolve the problem and reboot the nodes in this partition.


267558 :Error when reading property %s.

Description:

Unable to read property value using API. Property name is indicated in message. Syslog messages may give more information on errors in other modules.

Solution:

Check syslog messages. Please report this problem.


267589 :launch_fed_prog: call to rpc.fed failed for program <%s>, step <%s>

Description:

Launching of fed program failed due to a failure of ucmmd to communicate with the rpc.fed daemon. If the rpc.fed process died, this might lead to a subsequent reboot of the node.

Solution:

Examine other syslog messages occurring at about the same time to see if the problem can be identified and if it recurs. Save a copy of the /var/adm/messages files on all nodes and contact your authorized Sun service provider for assistance.


267673 :Validation failed. ORACLE binaries not found ORACLE_HOME=%s

Description:

Oracle binaries not found under ORACLE_HOME. ORACLE_HOME specified for the resource is indicated in the message. HA-Oracle will not be able to manage Oracle if ORACLE_HOME is incorrect.

Solution:

Specify correct ORACLE_HOME when creating resource. If resource is already created, please update resource property 'ORACLE_HOME'.


267724 :stat of file system %s failed.

Description:

HA-NFS fault monitor reports a probe failure on a specified file system.

Solution:

Make sure the specified path exists.


268593 :Failed to take the resource out of PMF control. Sending SIGKILL now.

Description:

Process monitor facility failed to stop monitoring the application. The stop method will send SIGKILL to stop the application.

Solution:

No action required.


269240 :clconf: Write_ccr routine shouldn't be called from kernel

Description:

Routine write_ccr that writes a clconf tree out to CCR should not be called from kernel.

Solution:

No action required. This is an informational message.


269902 :reservation fatal error(%s) - Unable to find gdev property

Description:

A required rawdisk device group property is missing.

Solution:

Executing '/usr/cluster/bin/scgdevs -L' on this node should generate the required property. If this successfully creates the required property, it should be possible to retry the failed operation. If the message specifies the 'node_join' transition, then this node may be unable to access shared devices. If the failure occurred during the 'release_shared_scsi2' transition, then a node which was joining the cluster may be unable to access shared devices. In either case, it may be possible to reacquire access to shared devices by executing '/usr/cluster/lib/sc/run_reserve -c node_join' on all cluster nodes. If the failure occurred during the 'make_primary' transition, then a device group has failed to start on this node. If another node was available to host the device group, then it should have been started on that node. If desired, it may be possible to switch the device group to this node with the scswitch command. If no other node was available, then the device group will not have been started. The scswitch command may be used to retry the attempt to start the device group. If the failure occurred during the 'primary_to_secondary' transition, then the shutdown or switchover of a device group has failed. The desired action may be retried. If the problem persists, contact your authorized Sun service provider to determine whether a workaround or patch is available.


270043 :reservation warning(%s) - MHIOCENFAILFAST error will retry in %d seconds

Description:

The device fencing program has encountered errors while trying to access a device. The failed operation will be retried.

Solution:

This is an informational message, no user action is needed.


272139 :Message Server Process is not running. pid was %d.

Description:

Message server process is not present on the process list, indicating message server process is not running on this node.

Solution:

No action needed. The fault monitor should detect that message server process is not running, and take appropriate action.


272238 :reservation warning(%s) - MHIOCGRP_RESERVE error will retry in %d seconds

Description:

The device fencing program has encountered errors while trying to access a device. The failed operation will be retried.

Solution:

This is an informational message, no user action is needed.


272434 :Validation failed. SYBASE text server startup file RUN_%s not found SYBASE=%s.

Description:

Text server was specified in the extension property Text_Server_Name. However, text server startup file was not found. Text server startup file is expected to be: $SYBASE/$SYBASE_ASE/install/RUN_<Text_Server_Name>

Solution:

Check the text server name specified in the Text_Server_Name property. Verify that the SYBASE and SYBASE_ASE environment variables are set properly in the Environment_file. Verify that the RUN_<Text_Server_Name> file exists.
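
The expected location of the startup file can be checked with a short script. The following Python sketch builds the path $SYBASE/$SYBASE_ASE/install/RUN_<Text_Server_Name> from its three parts and reports whether the file exists. The directory names, the server name, and the helper name text_server_run_file are hypothetical examples.

```python
import os

def text_server_run_file(sybase, sybase_ase, text_server_name):
    """Build the expected path: $SYBASE/$SYBASE_ASE/install/RUN_<name>."""
    return os.path.join(sybase, sybase_ase, "install", "RUN_" + text_server_name)

# Hypothetical values; substitute the settings from your Environment_file
# and the value of the Text_Server_Name property.
path = text_server_run_file("/opt/sybase", "ASE-12_0", "txtsrv")
print(path, "exists" if os.path.isfile(path) else "not found")
```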


272732 :scvxvmlg warning - chmod(%s) failed

Description:

The program responsible for maintaining the VxVM namespace was unable to access the global device namespace. If configuration changes were recently made to VxVM diskgroups or volumes, this node may be unaware of those changes. Recently created volumes may be inaccessible from this node.

Solution:

Verify that the /global/.devices/node@N (N = this node's node number) is mounted globally and is accessible. If no configuration changes have been recently made to VxVM diskgroups or volumes and all volumes continue to be accessible from this node, then no further action is required. If changes have been made, the device namespace on this node can be updated to reflect those changes by executing '/usr/cluster/lib/dcs/scvxvmlg'. If the problem persists, contact your authorized Sun service provider to determine whether a workaround or patch is available.


273018 :INTERNAL ERROR CMM: Failure starting CMM.

Description:

An instance of the userland CMM encountered an internal initialization error.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available.


273354 :CMM: Node %s (nodeid = %d) is dead.

Description:

The specified node has died. It is guaranteed to be no longer running and it is safe to take over services from the dead node.

Solution:

The cause of the node failure should be resolved and the node should be rebooted if node failure is unexpected.


273638 :The entry %s and entry %s in property %s have the same port number: %d.

Description:

The two indicated entries in the list property have the same port number.

Solution:

Remove one of the entries or change its port number.


274386 :reservation error(%s) - Could not determine controller number for device %s.

Description:

The device fencing program has suffered an internal error.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available. Copies of /var/adm/messages from all nodes should be provided for diagnosis. It may be possible to retry the failed operation, depending on the nature of the error. If the message specifies the 'node_join' transition, then this node may be unable to access shared devices. If the failure occurred during the 'release_shared_scsi2' transition, then a node which was joining the cluster may be unable to access shared devices. In either case, it may be possible to reacquire access to shared devices by executing '/usr/cluster/lib/sc/run_reserve -c node_join' on all cluster nodes. If the failure occurred during the 'make_primary' transition, then a device group has failed to start on this node. If another node was available to host the device group, then it should have been started on that node. If desired, it may be possible to switch the device group to this node with the scswitch command. If no other node was available, then the device group will not have been started. The scswitch command may be used to retry the attempt to start the device group. If the failure occurred during the 'primary_to_secondary' transition, then the shutdown or switchover of a device group has failed. The desired action may be retried.


274421 :Port %d%c%s is listed twice in property %s, at entries %d and %d.

Description:

The port number in the message was listed twice in the named property, at the list entry locations given in the message. A port number should only appear once in the property.

Solution:

Specify the property with only one occurrence of the port number.
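
Before setting the property, a proposed list can be checked for repeated ports. The following Python sketch is illustrative and is not the validation code that produces this message. It assumes that each entry has the form port/protocol and that only the port number is compared; the helper name find_duplicate_ports is hypothetical, and its entry positions are zero-based, which may differ from the numbering used in the message.

```python
def find_duplicate_ports(entries):
    """Return (port, first_index, second_index) for each repeated port."""
    seen = {}
    duplicates = []
    for index, entry in enumerate(entries):
        # Assumed entry format: "<port>/<protocol>", for example "80/tcp".
        port = int(entry.split("/", 1)[0])
        if port in seen:
            duplicates.append((port, seen[port], index))
        else:
            seen[port] = index
    return duplicates

print(find_duplicate_ports(["80/tcp", "443/tcp", "80/tcp"]))
```

An empty result means that no port number is repeated in the list.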


274506 :Wrong data format from kstat: Expecting %d, Got %d.

Description:

Sun Cluster HA for NFS fault monitor failed to look up the specified kstat parameter. The specific cause is logged with the message.

Solution:

On the cluster node where this problem was encountered, run the command '/usr/bin/kstat -m nfs -i 0 -n nfs_server -s calls'. Barring resource availability issues, this call should complete successfully. If it fails without generating any output, contact your authorized Sun service provider for assistance.


274605 :Server is online.

Description:

The Sybase Adaptive Server is online.

Solution:

No user action required.


274887 :clcomm: solaris xdoor: rejected invo: door_return returned, errno = %d

Description:

An unusual but harmless event occurred. System operations continue unaffected.

Solution:

No user action is required.


274901 :Invalid protocol %s given as part of property %s.

Description:

The property named does not have a legal value.

Solution:

Assign the property a legal value.


276380 :"pmfadm -k": Error signaling <%s>: %s

Description:

An error occurred while rpc.pmfd attempted to send a signal to one of the processes of the given tag. The reason for the failure is also given. The signal was sent as a result of a 'pmfadm -k' command.

Solution:

Save the /var/adm/messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.


276672 :reservation error(%s) - did_get_did_path() error

Description:

The device fencing program has suffered an internal error.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available. Copies of /var/adm/messages from all nodes should be provided for diagnosis. It may be possible to retry the failed operation, depending on the nature of the error. If the message specifies the 'node_join' transition, then this node may be unable to access shared devices. If the failure occurred during the 'release_shared_scsi2' transition, then a node which was joining the cluster may be unable to access shared devices. In either case, it may be possible to reacquire access to shared devices by executing '/usr/cluster/lib/sc/run_reserve -c node_join' on all cluster nodes. If the failure occurred during the 'make_primary' transition, then a device group has failed to start on this node. If another node was available to host the device group, then it should have been started on that node. If desired, it may be possible to switch the device group to this node with the scswitch command. If no other node was available, then the device group will not have been started. The scswitch command may be used to retry the attempt to start the device group. If the failure occurred during the 'primary_to_secondary' transition, then the shutdown or switchover of a device group has failed. The desired action may be retried.


277995 :(%s) msg of wrong version %d, expected %d

Description:

Expected to receive a message of a different version. udlmctl will fail.

Solution:

Save the contents of /var/adm/messages, /var/cluster/ucmm/ucmm_reconf.log and /var/cluster/ucmm/dlm*/*logs/* from all the nodes and contact your Sun service representative.


278240 :Stopping fault monitor using pmf tag %s.

Description:

The fault monitor should be stopped using the Process Monitor Facility (PMF), with the tag indicated in the message.

Solution:

No user action required.


279084 :CMM: node reconfiguration #%lld completed.

Description:

The cluster membership monitor has processed a change in node or quorum status.

Solution:

This is an informational message, no user action is needed.


279152 :listener %s probe successful.

Description:

Informational message. Listener monitor successfully completed first probe.

Solution:

None.


279309 :Failfast: Invalid failfast mode %s specified. Returning default mode PANIC.

Description:

An invalid value was supplied for the failfast mode. The software will use the default PANIC mode instead.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available.


280108 :clcomm: unable to rebind %s to name server

Description:

The name server would not rebind this entity.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available.


280256 :clnt_tp_create failed: %s

Description:

A client was not able to make an rpc connection to a server (rpc.pmfd, rpc.fed or rgmd) because it could not create the rpc handle. The rpc error is shown. An error message is output to syslog.

Solution:

Save the /var/adm/messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.


280962 :No ROOT HOST CONFIGURED.

Description:

The ROOT HOST is not configured in the bv1to1.conf file.

Solution:

Reconfigure the Sun Cluster HA for BroadVision One-To-One Enterprise site with the proper ROOT HOST.


281386 :dl_attach: DL_OK_ACK rtnd prim %u

Description:

Wrong primitive returned to the DL_ATTACH_REQ.

Solution:

Reboot the node. If the problem persists, check the documentation for the private interconnect.


281428 :Failed to retrieve the resource group handle: %s.

Description:

An API operation on the resource group has failed.

Solution:

For the resource group name, check the syslog tag. For more details, check the syslog messages from other components. If the error persists, reboot the node.


281680 :fatal: couldn't initialize ORB, possibly because machine is booted in non-cluster mode

Description:

The rgmd was unable to initialize its interface to the low-level cluster machinery. This might occur because the operator has attempted to start the rgmd on a node that is booted in non-cluster mode. The rgmd will produce a core file, and in some cases it might cause the node to halt or reboot to avoid data corruption.

Solution:

If the node is in non-cluster mode, boot it into cluster mode before attempting to start the rgmd. If the node is already in cluster mode, save a copy of the /var/adm/messages files on all nodes, and of the rgmd core file. Contact your authorized Sun service provider for assistance in diagnosing the problem.


281819 :%s exited with error %s in step %s

Description:

A ucmm step execution failed in the indicated step.

Solution:

None. See /var/adm/messages for previous errors and report this problem if it occurs again during the next reconfiguration.


282406 :fork1 returned %d. Exiting.

Description:

clexecd program has encountered a failed fork1(2) system call. The error message indicates the error number for the failure.

Solution:

If the error number is 12 (ENOMEM), install more memory, increase swap space, or reduce peak memory consumption. If error number is something else, contact your authorized Sun service provider to determine whether a workaround or patch is available.


282508 :INTERNAL ERROR: r_state_at_least: state <%s> (%d)

Description:

A non-fatal internal error has occurred in the rgmd state machine.

Solution:

Since this problem might indicate an internal logic error in the rgmd, please save a copy of the /var/adm/messages files on all nodes, the output of an scstat -g command, and the output of a scrgadm -pvv command. Report the problem to your authorized Sun service provider.


282828 :reservation warning(%s) - MHIOCRELEASE error will retry in %d seconds

Description:

The device fencing program has encountered errors while trying to access a device. The failed operation will be retried.

Solution:

This is an informational message, no user action is needed.


283262 :HA: rm_state_machine::service_suicide() not yet implemented

Description:

An unimplemented feature was activated.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available.


284006 :reservation fatal error(UNKNOWN) - Out of memory

Description:

The device fencing program has been unable to allocate required memory.

Solution:

Memory usage should be monitored on this node and steps taken to provide more available memory if problems persist. Once memory has been made available, the following steps may need to be taken: If the message specifies the 'node_join' transition, then this node may be unable to access shared devices. If the failure occurred during the 'release_shared_scsi2' transition, then a node which was joining the cluster may be unable to access shared devices. In either case, access to shared devices can be reacquired by executing '/usr/cluster/lib/sc/run_reserve -c node_join' on all cluster nodes. If the failure occurred during the 'make_primary' transition, then a device group has failed to start on this node. If another node was available to host the device group, then it should have been started on that node. The device group can be switched back to this node if desired by using the scswitch command. If no other node was available, then the device group will not have been started. The scswitch command may be used to start the device group. If the failure occurred during the 'primary_to_secondary' transition, then the shutdown or switchover of a device group has failed. The desired action may be retried.


284635 :BV Update Completed successfully.

Description:

The update method completed successfully.

Solution:

No user action required.


284560 :Failed to offload resource group %s: %s

Description:

An attempt to offload the specified resource group failed. The reason for the failure is logged.

Solution:

Look for the message indicating the reason for this failure. This should help in the diagnosis of the problem. Otherwise, save a copy of the /var/adm/messages files on all nodes. Contact your authorized Sun service provider for assistance in diagnosing the problem.


284644 :Warning: node %d has a weight assigned to it for property %s, but node %d is not in the %s for resource %s.

Description:

A node has a weight assigned, but the resource can never be active on that node, so it does not make sense to assign that node a weight.

Solution:

This is an informational message, no user action is needed. Optionally, the weight that is assigned to the node can be omitted.


286722 :scvxvmlg error - remove(%s) failed

Description:

The program responsible for maintaining the VxVM namespace was unable to access the global device namespace. If configuration changes were recently made to VxVM diskgroups or volumes, this node may be unaware of those changes. Recently created volumes may be inaccessible from this node.

Solution:

Verify that the /global/.devices/node@N (N = this node's node number) is mounted globally and is accessible. If no configuration changes have been recently made to VxVM diskgroups or volumes and all volumes continue to be accessible from this node, then no further action is required. If changes have been made, the device namespace on this node can be updated to reflect those changes by executing '/usr/cluster/lib/dcs/scvxvmlg'. If the problem persists, contact your authorized Sun service provider to determine whether a workaround or patch is available.


286807 :clnt_tp_create_timed of program %s failed %s.

Description:

The HA-NFS fault monitor was not able to make an RPC connection to an NFS server.

Solution:

Save the /var/adm/messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.


289194 :Can't perform failover: Failover mode set to NONE.

Description:

Cannot perform failover of the data service. Failover mode is set to NONE.

Solution:

This is an informational message. If failover is desired, then set the Failover_mode value to SOFT or HARD using scrgadm(1M).


289278 :Error while retrieving resource group name.

Description:

An error occurred during the invocation of a DSDL API to obtain the resource group name.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available.


289503 :Unable to re-compute NFS resource list.

Description:

The list of HA-NFS resources online on the node has become corrupted.

Solution:

Make sure there is space available in /tmp. If the error persists despite that, reboot the node.


290644 :Started sap processes under PMF successfully.

Description:

Sun Cluster HA for SAP started the SAP processes under the control of the Process Monitor Facility (PMF).

Solution:

No user action required.


290735 :Conversion of hostnames failed.

Description:

Data service is unable to convert the specified hostname into an IP address.

Solution:

Check the syslog messages that occurred just before this one to determine whether there is an internal error. If there is, contact your authorized Sun service provider. Otherwise, if the logical host and shared address entries are specified in the /etc/inet/hosts file, check that these entries are correct. If they are correct, check the health of the name server.


290926 :Successful validation.

Description:

The validation of the configuration for the data service was successful.

Solution:

No user action required.


291077 :Invalid variable name in the Environment file %s. Ignoring %s

Description:

HA-Oracle reads the file specified in the USER_ENV property and exports the variables declared in the file. The syntax for declaring variables is VARIABLE=VALUE. Lines starting with "#" are treated as comment lines. VARIABLE is expected to be a valid Korn shell variable name that starts with an alphabetic character or "_" and contains only alphanumeric characters and "_".

Solution:

Check the environment file and correct the syntax errors. Do not use the export statement in the environment file.
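The syntax rules in the Description can be checked before the file is put into use. The following is a minimal sketch under stated assumptions: it is not HA-Oracle's parser, the function name check_env_lines is hypothetical, and skipping blank lines and ignoring leading whitespace are assumptions that the source does not confirm.

```python
import re

# VARIABLE must start with a letter or "_" and contain only letters, digits, and "_".
_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\Z")


def check_env_lines(lines):
    """Return (line_number, text) pairs for lines that do not follow VARIABLE=VALUE."""
    bad = []
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # comment lines; blank lines are skipped by assumption
        name, sep, _value = line.partition("=")
        if not sep or not _NAME.match(name):
            bad.append((number, line))
    return bad


sample = ["# Oracle settings", "ORACLE_SID=db1", "export PATH=/bin", "9LIVES=x"]
for number, text in check_env_lines(sample):
    print(f"line {number}: invalid declaration: {text}")
```

A line that begins with an export statement is reported because "export PATH" is not a valid variable name, which matches the advice in the Solution.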


291245 :Invalid type %d passed.

Description:

An invalid value was passed for the program_type argument in the pmf routines.

Solution:

This is a programming error. Verify the value specified for program type argument and correct it. The valid types are: SCDS_PMF_TYPE_SVC: data service application, SCDS_PMF_TYPE_MON: fault monitor, and SCDS_PMF_TYPE_OTHER: other.


291986 :dl_bind ack bad len %d

Description:

Sanity check. The message length in the acknowledgment to the bind request is different from what was expected. We are trying to open a fast path to the private transport adapters.

Solution:

Rebooting the node might fix the problem.


292013 :clcomm: UioBuf: uio was too fragmented - %d

Description:

The system attempted to use a uio that had more than DEF_IOV_MAX fragments.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available.


295366 :Unable to mark the interface %s%d down, rc %d.

Description:

The Topology Manager has finished using the adapter and failed when it tried to mark the interface down.

Solution:

A user action is needed for this message.


295666 :clcomm: setrlimit(RLIMIT_NOFILE): %s

Description:

During cluster initialization within this user process, the setrlimit call failed with the specified error.

Solution:

Read the man page for getrlimit for a more detailed description of the error.
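The RLIMIT_NOFILE limit named in the message is the per-process limit on open file descriptors, and the same man page covers both reading and setting it. The sketch below shows the two calls through Python's standard resource module. It is an illustration only and does not reflect how the cluster software itself adjusts the limit; the function name set_nofile_soft_limit is hypothetical.

```python
import resource


def set_nofile_soft_limit(target):
    """Try to set the soft RLIMIT_NOFILE to target; return the (soft, hard) pair in effect."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)  # the soft limit may not exceed the hard limit
    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    except (ValueError, OSError) as err:
        # Report the failure in the same shape as the message above.
        print(f"setrlimit(RLIMIT_NOFILE): {err}")
    return resource.getrlimit(resource.RLIMIT_NOFILE)


current_soft, _current_hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(set_nofile_soft_limit(current_soft))  # re-applies the current soft limit
```

The error text printed on failure corresponds to the %s portion of the message, so it is the first thing to compare against the man page.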


295838 :Listener %s started.

Description:

Informational message. HA-Oracle successfully started Oracle listener.

Solution:

None


297061 :clcomm: can't get new reference

Description:

An attempt was made to obtain a new reference on a revoked handler.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available.


297139 :CCR: More than one data server has override flag set for the table %s. Using the table from node %s.

Description:

The override flag for a table indicates that the CCR should use this copy as the final version when the cluster is coming up. In this case, the CCR detected multiple nodes having the override flag set for the indicated table. It chose the copy on the indicated node as the final version.

Solution:

This is an informational message, no user action is needed.


297178 :Error opening procfs control file <%s> for tag <%s>: %s

Description:

The rpc.pmfd server was not able to open a procfs control file, and the system error is shown. procfs control files are required in order to monitor user processes.

Solution:

Investigate if the machine is running out of memory. If this is not the case, save the /var/adm/messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.


297325 :The node portion of %s at position %d in property %s is not a valid node identifier or node name.

Description:

An invalid node was specified for the named property. The position index, which starts at 0 for the first element in the list, indicates which element in the property list was invalid.

Solution:

Specify a valid node instead.
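The position reported in the message counts from 0, so position 2 refers to the third element of the property list. The sketch below illustrates that indexing. The function name first_invalid_node, the sample values, and the 'value@node' element form are assumptions made for illustration; the source does not state the element syntax.

```python
def first_invalid_node(entries, valid_nodes):
    """Return (position, entry) for the first entry whose node part is unknown, else None."""
    for position, entry in enumerate(entries):  # positions start at 0
        # Assumed 'value@node' form; an entry without '@' is treated as a bare node.
        node = entry.rpartition("@")[2]
        if node not in valid_nodes:
            return position, entry
    return None


valid = {"1", "2", "phys-a"}  # hypothetical node identifiers and node names
print(first_invalid_node(["3@1", "1@phys-a", "2@7"], valid))
```

Here the third element is the invalid one, so the reported position is 2 rather than 3.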


297536 :Could not host device service %s because this node is being shut down

Description:

An attempt was made to start a device group on this node while the node was being shut down.

Solution:

If the node was not being shut down during this time, or if the problem persists, please contact your authorized Sun service provider to determine whether a workaround or patch is available.


297867 :(%s) t_bind: tli error: %s

Description:

Call to t_bind() failed. The "t_bind" man page describes possible error codes. udlmctl will exit.

Solution:

Save the contents of /var/adm/messages, /var/cluster/ucmm/ucmm_reconf.log and /var/cluster/ucmm/dlm*/*logs/* from all the nodes and contact your Sun service representative.


298320 :Command %s is too long.

Description:

The command string passed to the function is too long.

Solution:

Use a shorter command name or shorter path to the command.


298911 :getrlimit: %s

Description:

The rpc.pmfd server was not able to get the limit of files open. The message contains the system error. This happens while the server is starting up, at boot time. The server does not come up, and an error message is output to syslog.

Solution:

Save the /var/adm/messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.


298911 :setrlimit: %s

Description:

The rpc.pmfd server was not able to set the limit of files open. The message contains the system error. This happens while the server is starting up, at boot time. The server does not come up, and an error message is output to syslog.

Solution:

Save the /var/adm/messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.


299417 :in libsecurity strong Unix authorization failed

Description:

A server (rgmd) refused an rpc connection from a client because it failed the Unix authentication. This happens if a caller program using scha public api, either in its C form or its CLI form, is not running as root or is not making the rpc call over the loopback interface. An error message is output to syslog.

Solution:

Check that the calling program using the scha public api is running as root and is calling over the loopback interface. If both are correct, save the /var/adm/messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.