Sun Cluster 3.0 5/02 Error Messages Guide

Error Message List

The following list is ordered by the message ID.

100088 :fatal: Got error <%d> trying to read CCR when making resource group <%s> managed; aborting node

Description:

Rgmd failed to read updated resource from the CCR on this node.

Solution:

Save a copy of the /var/adm/messages files on all nodes, and of the rgmd core file. Contact your authorized Sun service provider for assistance in diagnosing the problem.

100293 :dl_bind: kstr_msg failed %d error

Description:

Could not bind to the private interconnect.

Solution:

Reboot of the node might fix the problem.

100396 :clexecd: unable to arm failfast.

Description:

The clexecd program could not enable one of the mechanisms that causes the node to be shut down to prevent data corruption when the clexecd program dies.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available.

100555 :libsecurity: getnetconfigent error: %s

Description:

A client of the rpc.pmfd, rpc.fed or rgmd server was not able to initiate an rpc connection, because it could not get the network information. The pmfadm or scha command exits with error. The rpc error is shown. An error message is output to syslog.

Solution:

Save the /var/adm/messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.

102218 :couldn't initialize ORB, possibly because machine is booted in non-cluster mode

Description:

Could not initialize the ORB.

Solution:

Please make sure the nodes are booted in cluster mode.

102340 :Prog <%s> step <%s>: authorization error.

Description:

An attempted program execution failed, apparently due to a security violation; this error should not occur. This failure is considered a program failure.

Solution:

Correct the problem identified in the error message. If necessary, examine other syslog messages occurring at about the same time to see if the problem can be diagnosed. Save a copy of the /var/adm/messages files on all nodes and contact your authorized Sun service provider for assistance in diagnosing the problem.

103196:An error occured while obtaining the global service name associated with global device path %s.

Description:

An error code was returned by a DCS API while translating the device path to the DCS service name.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available.

104035:Failed to start sap processes with command %s.

Description:

Sun Cluster HA for SAP Central Instance failed to start on this cluster node. It should start on some other cluster node if there is another cluster node available.

Solution:

If the Central Instance failed to start on any other node, disable the Sun Cluster HA for SAP Central Instance resource, then try to run the same command manually, and fix any problem found. Save the /var/adm/messages files from all nodes. Contact your authorized Sun service provider.

104165 :Resource <%s> of Resource Group <%s> failed pingpong check on node <%s>.

Description:

A scha_control(1HA,3HA) call has failed because no healthy new master could be found for the resource group. A given node is considered unhealthy for a given resource if that same resource has recently initiated a failover of that node by a previous scha_control call. In this context, "recently" means within the past Pingpong_interval seconds, where Pingpong_interval is a user-configurable property of the resource group. The default value of Pingpong_interval is 3600 seconds. This check is performed to avoid the situation where a resource group repeatedly "ping-pongs" or moves back and forth between two or more nodes, which might occur if some external problem prevents the resource group from running successfully on *any* node.

Solution:

A properly implemented resource monitor, upon encountering the failure of a scha_control call, should sleep for a while and restart its probes. If the resource remains unhealthy, the problem that caused the scha_control call to fail (such as the ping-pong check described above) will eventually resolve, permitting a later scha_control request to succeed. Therefore, no user action is required. If the system administrator wishes to permit failovers to be attempted even at the risk of ping-pong behavior, the Pingpong_interval property of the resource group should be set to a smaller value.
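The ping-pong check described in this entry can be sketched as follows. This is a minimal illustration in Python, not the rgmd implementation: the function name is_node_healthy, the last_failover mapping, and the treatment of the exact boundary (a failover exactly Pingpong_interval seconds old is treated as no longer recent) are assumptions made for the example.

```python
import time

DEFAULT_PINGPONG_INTERVAL = 3600  # seconds; the documented default value


def is_node_healthy(last_failover, resource, node, now=None,
                    pingpong_interval=DEFAULT_PINGPONG_INTERVAL):
    """Return False if `resource` recently initiated a failover off `node`.

    `last_failover` is a hypothetical mapping of (resource, node) to the
    time, in seconds since the epoch, of the most recent failover that the
    resource initiated off that node through scha_control.
    """
    if now is None:
        now = time.time()
    stamp = last_failover.get((resource, node))
    if stamp is None:
        return True  # no recorded failover, so the node remains a candidate
    # "Recently" means within the past pingpong_interval seconds.
    return (now - stamp) >= pingpong_interval
```

With a failover recorded at time 1000, the node is rejected as a new master until time 4600 under the default interval. Lowering pingpong_interval shortens that window, which is the trade-off the Solution text describes.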

104914 :CCR: Failed to set epoch on node %s errno = %d.

Description:

The CCR was unable to set the epoch number on the indicated node. The epoch was set by CCR to record the number of times a cluster has come up. This information is part of the CCR metadata.

Solution:

There may be other related messages on the indicated node, which may help diagnose the problem, for example: If the root file system is full on the node, then free up some space by removing unnecessary files. If the root disk on the afflicted node has failed, then it needs to be replaced.

105337 :WARNING: thr_getspecific %d

Description:

The rgmd has encountered a failed call to thr_getspecific(3T). The error message indicates the reason for the failure. This error is non-fatal.

Solution:

If the error message is not self-explanatory, contact your authorized Sun service provider for assistance in diagnosing the problem.

105450:Validation failed. ASE directory %s does not exist.

Description:

The Sybase Adaptive Server Environment directory does not exist. The SYBASE_ASE environment variable might be incorrectly set or the installation might be incorrect.

Solution:

Verify the SYBASE_ASE environment variable value and the Sybase installation.

106181 :WARNING: lkcm_act: %d returned from udlm_recv_message (the error was successfully masked from upper layers).

Description:

Unexpected error during a poll for dlm messages.

Solution:

This is an internal error. Save the contents of /var/adm/messages, /var/cluster/ucmm/ucmm_reconf.log and /var/cluster/ucmm/dlm*/*logs/* from all the nodes and contact your Sun service representative.

108357 :lookup: unknown binding type <%d>

Description:

During a name server lookup an unknown binding type was encountered.

Solution:

No action required. This is an informational message.

108990 :CMM: Cluster members: %s.

Description:

This message identifies the nodes currently in the cluster.

Solution:

This is an informational message, no user action is needed.

109102 :%s should be larger than %s.

Description:

The value of Thorough_Probe_Interval specified in scrgadm command or in CCR table was smaller than Cheap_Probe_Interval.

Solution:

Reissue the scrgadm command with appropriate values as indicated.

109105 :(%s) setitimer failed: %d: %s (UNIX errno %d)

Description:

Call to setitimer() failed. The "setitimer" man page describes possible error codes. udlmctl will exit.

Solution:

Save a copy of the /var/adm/messages files on all nodes, and of the rgmd core file. Contact your authorized Sun service provider for assistance in diagnosing the problem.

110012 :lkcm_dreg failed to communicate to CMM ... will probably failfast: %s

Description:

Could not deregister udlm from ucmm. This node will probably failfast.

Solution:

This is an internal error. Save the contents of /var/adm/messages, /var/cluster/ucmm/ucmm_reconf.log and /var/cluster/ucmm/dlm*/*logs/* from all the nodes and contact your Sun service representative.

110097 :Major number for driver (%s) does not match the one on other nodes.

Description:

The driver identified in this message does not have the same major number across cluster nodes, and devices owned by the driver are being used in global device services.

Solution:

Look in the /etc/name_to_major file on each cluster node to see if the major number for the driver matches across the cluster. If a driver is missing from the /etc/name_to_major file on some of the nodes, then most likely, the package the driver ships in was not installed successfully on all nodes. If this is the case, install that package on the nodes that don't have it. If the driver exists on all nodes but has different major numbers, see the documentation that shipped with this product for ways to correct this problem.
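The cross-node comparison described in this entry can be scripted once the file has been collected from each node. The following Python sketch assumes each line of /etc/name_to_major holds a driver name followed by its major number, separated by whitespace. The function names, the node names, and the sample major numbers are hypothetical.

```python
def parse_name_to_major(text):
    """Parse the contents of a name_to_major file into {driver: major}."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, comments, and anything not shaped "name number".
        if len(fields) != 2 or fields[0].startswith("#"):
            continue
        if not fields[1].isdigit():
            continue
        table[fields[0]] = int(fields[1])
    return table


def find_mismatches(tables):
    """Compare per-node tables; `tables` maps node name -> {driver: major}.

    Returns {driver: {node: major or None}} for every driver that is
    missing on some node or has different major numbers across nodes.
    """
    drivers = set()
    for table in tables.values():
        drivers.update(table)
    mismatches = {}
    for driver in sorted(drivers):
        per_node = {node: table.get(driver) for node, table in tables.items()}
        if len(set(per_node.values())) > 1:
            mismatches[driver] = per_node
    return mismatches
```

A value of None in the result marks a node where the driver is missing, which points to the package installation case. Differing integers point to the major number conflict case.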

111527 :Method <%s> on resource <%s>: unknown command.

Description:

An internal logic error in the rgmd has prevented it from successfully executing a resource method.

Solution:

Save a copy of the /var/adm/messages files on all nodes. Contact your authorized Sun service provider for assistance in diagnosing the problem.

111697 :Failed to delete scalable service in group %s for IP %s Port %d%c%s: %s.

Description:

A call to the underlying scalable networking code failed.

Solution:

Save a copy of the /var/adm/messages files on all nodes. Contact your authorized Sun service provider for assistance in diagnosing the problem.

114036 :clexecd: Error %d from putmsg

Description:

clexecd program has encountered a failed putmsg(2) system call. The error message indicates the error number for the failure.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available.

114550 :Unable to create <%s>: %s.

Description:

The HA-NFS stop method attempted to create the specified file but failed.

Solution:

Check the error message for the reason of failure and correct the situation. If unable to correct the situation, reboot the node.

114568:Adaptive server successfully started.

Description:

Sun Cluster HA for Sybase successfully started the Sybase Adaptive Server.

Solution:

No user action required.

115256 :file specified in USER_ENV %s doesn't exist

Description:

The 'User_env' property was set when configuring the resource. The file specified in the 'User_env' property does not exist or is not readable. The file should be specified with a fully qualified path.

Solution:

Specify an existing file with a fully qualified file name when creating the resource. If the resource is already created, update the resource property 'User_env'.

115461 :in libsecurity __rpc_get_local_uid failed

Description:

A server (rpc.pmfd, rpc.fed or rgmd) refused an rpc connection from a client because it failed the Unix authentication, because it is not making the rpc call over the loopback interface. An error message is output to syslog.

Solution:

Save the /var/adm/messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.

115987 :execvp: %s

Description:

The rpc.pmfd server was not able to exec a new process, possibly due to bad arguments. The message contains the system error. The server does not perform the action requested by the client, and an error message is output to syslog.

Solution:

Verify that the file path to be executed exists. If all looks correct, save the /var/adm/messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.

117498 :scha_resource_get error (%d) when reading extension property %s

Description:

Error occurred in API call scha_resource_get.

Solution:

Check syslog messages for errors logged from other system modules. Stop and start fault monitor. If error persists then disable fault monitor and report the problem.

118046 :rebalance: no primary node could be found for resource group <%s>.

Description:

The rgmd is unable to bring the resource group online because all of its potential masters are down.

Solution:

Repair and reboot broken nodes so they may rejoin the cluster; or use scrgadm(1M) to edit the Nodelist property of the resource group so that it includes nodes that are cluster members.

118261:Successfully stopped the service %s.

Description:

Specified data service successfully stopped.

Solution:

No user action required.

119120 :clconf: Key length is more than max supported length in clconf_ccr read

Description:

While reading configuration data through the CCR, the system found that the key length exceeds the maximum supported length.

Solution:

Check the CCR configuration information.

119649 :clcomm: Unregister of pathend state proxy failed

Description:

The system failed to unregister the pathend state proxy.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available.

120470 :(%s) t_sndudata: tli error: %s

Description:

Call to t_sndudata() failed. The "t_sndudata" man page describes possible error codes. udlmctl will exit.

Solution:

Save the contents of /var/adm/messages, /var/cluster/ucmm/ucmm_reconf.log and /var/cluster/ucmm/dlm*/*logs/* from all the nodes and contact your Sun service representative.

121513:Successfully restarted service.

Description:

This message indicates that the RGM successfully restarted the resource.

Solution:

This is an informational message, no user action is required.

121858 :tag %s: not suspended, cannot resume

Description:

The user sent a resume command to the rpc.fed server for a tag that is not suspended. An error message is output to syslog.

Solution:

Check the tag name.

123526 :Prog <%s> step <%s>: Execution failed: no such method tag.

Description:

An internal error has occurred in the rpc.fed daemon which prevents step execution. This is considered a step failure.

Solution:

Examine other syslog messages occurring at about the same time to see if the problem can be identified. Save a copy of the /var/adm/messages files on all nodes and contact your authorized Sun service provider for assistance in diagnosing the problem. Re-try the edit operation.

123984 :All specified global device services are available.

Description:

All global device services associated with both GlobalDevicePaths and FilesystemMountPoint extension properties have been validated successfully and are found to be in the normal state. The RGM and DSDL components are found to be in the normal state.

Solution:

An informational message only. No action is needed.

124232 :clcomm: solaris xdoor fcntl failed: %s

Description:

A fcntl operation failed. The "fcntl" man page describes possible error codes.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available.

124810 :fe_method_full_name() failed for resource <%s>, resource group <%s>, method <%s>

Description:

Due to an internal error, the rgmd was unable to assemble the full method pathname. This is considered a method failure. Depending on which method was being invoked and the Failover_mode setting on the resource, this might cause the resource group to fail over or move to an error state.

Solution:

125159 :Load balancer setting distribution on %s:

Description:

The load balancer is setting the distribution for the specified service group.

Solution:

This is an informational message, no user action is needed.

125356 :Failed to connect to %s:%d:%s.

Description:

The data service fault monitor probe was trying to connect to the host and port specified and failed. There may be a prior message in syslog with further information.

Solution:

Make sure that the port configuration for the data service matches the port configuration for the underlying application.

126142 :fatal: new_str strcpy: %s (UNIX error %d)

Description:

The rgmd failed to allocate memory, most likely because the system has run out of swap space. The rgmd will produce a core file and will force the node to halt or reboot to avoid the possibility of data corruption.

Solution:

The problem is probably cured by rebooting. If the problem recurs, you might need to increase swap space by configuring additional swap devices. See swap(1M) for more information.

126318 :fatal: Unknown object type bound to %s

Description:

The low-level cluster machinery has encountered a fatal error. The rgmd will produce a core file and will cause the node to halt or reboot to avoid the possibility of data corruption.

Solution:

Save a copy of the /var/adm/messages files on all nodes, and of the rgmd core file. Contact your authorized Sun service provider for assistance in diagnosing the problem.

126467 :HA: not implemented for userland

Description:

An invocation was made on an HA server object in user land. This is not currently supported.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available.

127182 :fatal: thr_create returned error: %s (UNIX error %d)

Description:

The rgmd failed in an attempt to create a thread. The rgmd will produce a core file and will force the node to halt or reboot to avoid the possibility of data corruption.

Solution:

Fix the problem described by the UNIX error message. The problem may have already been corrected by the node reboot.

127411 :Error in reading /etc/mnttab: getmntent() returns <%d>

Description:

Failed to read /etc/mnttab.

Solution:

Check with system administrator and make sure /etc/mnttab is properly defined.

127624 :must be superuser to start %s

Description:

The ucmmd process was not started by the superuser. ucmmd will now exit.

Solution:

None. This is an internal error.

129832 :Incorrect syntax in Environment_file.Ignoring %s

Description:

Incorrect syntax in Environment_file. Correct syntax is: VARIABLE=VALUE

Solution:

Please check the Environment_file and correct the syntax errors.
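Before pointing a resource at an Environment_file, the file can be checked for lines that do not follow the VARIABLE=VALUE form. The Python sketch below is an illustration only: the exact rules the data service applies are not documented here, so the accepted variable-name pattern and the skipping of blank and '#' comment lines are assumptions, and check_environment_lines is a hypothetical name.

```python
import re

# Assumed shape of a valid line: a shell-style variable name, "=", any value.
_ASSIGNMENT = re.compile(r"([A-Za-z_][A-Za-z0-9_]*)=(.*)")


def check_environment_lines(lines):
    """Split lines into accepted {name: value} and rejected (number, text)."""
    accepted = {}
    rejected = []
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # assumption: blank and comment lines are harmless
        match = _ASSIGNMENT.fullmatch(line)
        if match:
            accepted[match.group(1)] = match.group(2)
        else:
            rejected.append((number, line))
    return accepted, rejected
```

Each rejected line is reported with its line number, and these are the lines most likely to trigger the "Ignoring %s" message.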

130822 :CMM: join_cluster: failed to register ORB callbacks with CMM.

Description:

The system can not continue when callback registration fails.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available.

131492 :pxvfs::mount(): global mounts are not enabled (need to run "clconfig -g" first)

Description:

A global mount command was attempted before the node had initialized the global file system name space. Typically this is caused by trying to perform a global mount while the system is booted in single-user mode.

Solution:

If the system is not at run level 2 or 3, change to run level 2 or 3 using the init(1M) command. Otherwise, check message logs for errors during boot.

132032 :clexecd: strdup returned %d. Exiting.

Description:

clexecd program has encountered a failed strdup(3C) system call. The error message indicates the error number for the failure.

Solution:

If the error number is 12 (ENOMEM), install more memory, increase swap space, or reduce peak memory consumption. If error number is something else, contact your authorized Sun service provider to determine whether a workaround or patch is available.
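The error number in this message is a standard UNIX errno value, so it can be translated to its symbolic name before deciding which branch of the Solution applies. The Python sketch below uses the standard errno and os modules. The helper names are hypothetical, and the numeric value of ENOMEM is taken from the platform the script runs on.

```python
import errno
import os


def describe_errno(number):
    """Return (symbolic name, description) for a UNIX error number."""
    name = errno.errorcode.get(number, "unknown")
    return name, os.strerror(number)


def is_out_of_memory(number):
    """True when the error number means the allocation failed for lack of memory."""
    return number == errno.ENOMEM
```

For example, describe_errno(12) returns the name ENOMEM on systems where ENOMEM is 12, which matches the case called out in the Solution.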

134417 :Global service <%s> of path <%s> is in maintenance.

Description:

Service is not supported by HA replica.

Solution:

Resume the service by using scswitch(1M).

135918 :CMM: Quorum device %ld (%s) added; votecount = %d, bitmask of nodes with configured paths = 0x%llx.

Description:

The specified quorum device with the specified votecount and configured paths bitmask has been added to the cluster. The quorum subsystem treats a quorum device in maintenance state as being removed from the cluster, so this message will be logged when a quorum device is taken out of maintenance state as well as when it is actually added to the cluster.

Solution:

This is an informational message, no user action is needed.

136330:This resource depends on a HAStoragePlus resource that is not online. Unable to perform validations.

Description:

The resource depends on a HAStoragePlus resource that is not online on any node. Some of the files required for validation checks are not accessible. Validations cannot be performed on any node.

Solution:

Enable the HAStoragePlus resource that the resource depends on and reissue the command.

136955:Failed to retrieve main dispatcher pid.

Description:

Failed to retrieve the process ID for the main dispatcher process, indicating the main dispatcher process is not running.

Solution:

No action needed. The fault monitor should detect that the main dispatcher process is not running, and take appropriate action.

137294 :method_full_name: strdup failed

Description:

The rgmd server was not able to create the full name of the method, while trying to connect to the rpc.fed server, possibly due to low memory. An error message is output to syslog.

Solution:

Investigate whether the host is running out of memory. If not, save the /var/adm/messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.

137606 :clcomm: Pathend %p: disconnect_node not allowed

Description:

The system maintains state information about a path. The disconnect_node operation is not allowed in this state.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available.

138261 :File system associated with mount point %s is to be locally mounted. The AffinityOn value cannot be FALSE.

Description:

A local file system mount point is specified in the FilesystemMountPoint extension property. The AffinityOn extension property is specified to be FALSE. Local file systems cannot be specified with the AffinityOn value set to FALSE.

Solution:

Ensure that AffinityOn is set to TRUE in case one or more local file systems are to be managed by HAStoragePlus.

138972 :could not set timeout: %s

Description:

A client was not able to make an rpc connection to a server (rpc.pmfd, rpc.fed or rgmd) because it could not set the rpc call timeout. The rpc error is shown. An error message is output to syslog.

Solution:

Save the /var/adm/messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.

139773 :clexecd: Error %d from strdup

Description:

clexecd program has encountered a failed strdup(3C) system call. The error message indicates the error number for the failure.

Solution:

140225 :The request to relocate resource %s completed successfully.

Description:

The resource named was relocated to a different node.

Solution:

This is an informational message, no user action is needed.

141062 :Failed to connect to host %s and port %d: %s.

Description:

An error occurred while the fault monitor attempted to probe the health of the data service.

Solution:

Wait for the fault monitor to correct this by performing a restart or failover. For a more detailed error description, look at the syslog messages.

141236 :Failed to format stringarray for property %s from value %s.

Description:

The validate method for the scalable resource network configuration code was unable to convert the property information given to a usable format.

Solution:

Verify the property information was properly set when configuring the resource.

141242 :HA: revoke not implemented for replica_handler

Description:

An attempt was made to use a feature that has not been implemented.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available.

141970 :in libsecurity caller has bad uid: get_local_uid=%d authsys=%d desired uid=%d

Description:

A server (rpc.pmfd, rpc.fed or rgmd) refused an rpc connection from a client because it has the wrong uid. The actual and desired uids are shown. An error message is output to syslog.

Solution:

Save the /var/adm/messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.

142779 :Unable to open failfast device

Description:

A server (rpc.pmfd or rpc.fed) was not able to establish a link to the failfast device, which ensures that the host aborts if the server dies. An error message is output to syslog.

Solution:

Save the /var/adm/messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.

142889:Starting up saposcol process under PMF times out.

Description:

The Sun Cluster HA for SAP timed out while the Sun Cluster HA for SAP start method started the OS collector process under the control of the Process Monitor Facility (PMF). This might happen under heavy system load.

Solution:

Increase the start timeout value.

143622 :PNM: adapter %s is %s

Description:

A network adapter has been determined to be either "ok" or "faulty" by PNM, based on a network adapter fault detection algorithm.

Solution:

For a network adapter determined to be "faulty", check that the physical connections between the adapter and its router are intact, including the adapter, any cables, hubs, and switches. Replace any broken component accordingly.

143694 :lkcm_act: caller is already registered

Description:

Message indicating that udlm is already registered with ucmm.

Solution:

This is an internal error. Save the contents of /var/adm/messages, /var/cluster/ucmm/ucmm_reconf.log and /var/cluster/ucmm/dlm*/*logs/* from all the nodes and contact your Sun service representative.

144303 :fatal: uname: %s (UNIX error %d)

Description:

A uname(2) system call failed. The rgmd will produce a core file and will force the node to halt or reboot to avoid the possibility of data corruption.

Solution:

Save a copy of the /var/adm/messages files on all nodes, and of the rgmd core file. Contact your authorized Sun service provider for assistance in diagnosing the problem.

145270 :Cannot determine if the server is secure: assuming non-secure.

Description:

An error occurred while parsing the Netscape configuration file to determine whether the Netscape server is running in secure or non-secure mode. As a result, the data service assumes a non-secure Netscape server and probes the server as such.

Solution:

Check the Netscape configuration file to make sure that it exists and that it contains information about whether the server is running as a secure server or not.

145770 :CMM: Monitoring disabled.

Description:

Transport path monitoring has been disabled in the cluster. It is enabled by default.

Solution:

This is an informational message, no user action is needed.

145893 :CMM: Unable to read quorum information. Error = %d.

Description:

The specified error was encountered while trying to read the quorum information from the CCR. This is probably because the CCR tables were modified by hand, which is an unsupported operation. The node will panic.

Solution:

Reboot the node in non-cluster (-x) mode and restore the CCR tables from the other nodes in the cluster or from backup. Reboot the node back in cluster mode. The problem should not reappear.

146238 :CMM: Halting to prevent split brain with node %ld.

Description:

Due to a connection failure with the specified node, the CMM is failing this node to prevent split brain partial connectivity.

Solution:

Any interconnect failure should be resolved, and/or the failed node rebooted.

146961 :Signal %d terminated the child process.

Description:

An unexpected signal caused the termination of the program that checks the availability of name service.

Solution:

Save a copy of the /var/adm/messages files on all nodes. If a core file was generated, submit the core to your service provider. Contact your authorized Sun service provider for assistance in diagnosing the problem.

148023 :method <%s> completed successfully for resource <%s>, resource group <%s>

Description:

RGM invoked a callback method for the named resource, as a result of a cluster reconfiguration, scha_control GIVEOVER, or scswitch. The method completed successfully.

Solution:

This is an informational message, no user action is needed.

148393 :Unable to create thread. Exiting.\n

Description:

clexecd program has encountered a failed thr_create(2) system call. The error message indicates the error number for the failure.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available.

148465 :Prog <%s> step <%s>: RPC connection error.

Description:

An attempted program execution failed, due to an RPC connection problem. This failure is considered a program failure.

Solution:

Examine other syslog messages occurring around the same time on the same node, to see if the cause of the problem can be identified. If the same error recurs, you might have to reboot the affected node.

148526 :fatal: Cannot get local nodename

Description:

An internal error has occurred. The rgmd will produce a core file and will force the node to halt or reboot to avoid the possibility of data corruption.

Solution:

Save a copy of the /var/adm/messages files on all nodes, and of the rgmd core file. Contact your authorized Sun service provider for assistance in diagnosing the problem.

148821 :fatal: Error in trying to access the configured network resources : %s.

Description:

Failed to get the available network address resources for this resource.

Solution:

This is an internal error. Save the /var/adm/messages file and contact an authorized Sun service provider.

148902 :No node was specified as part of property %s for element %s. The property must be specified as %s=Weight%cNode,Weight%cNode,...

Description:

The property was specified incorrectly.

Solution:

Set the property using the correct syntax.

149184 :clcomm: inbound_invo::signal:_state is 0x%x

Description:

The internal state describing the server side of a remote invocation is invalid when a signal arrives during processing of the remote invocation.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available.

150105 :This list element in System property %s has an invalid IP address (hostname): %s.

Description:

The system property that was named does not have a valid hostname or dotted-decimal IP address string.

Solution:

Change the value of the property to use a valid hostname or dotted-decimal IP address string.
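A property value can be checked against the two accepted forms before it is set. The Python sketch below is an approximation, not the validation the cluster software performs. It accepts a dotted-decimal IPv4 address, or a name whose dot-separated labels contain only letters, digits, and hyphens. It does not check whether a hostname actually resolves. All function names are hypothetical.

```python
import ipaddress
import re

# One hostname label: letters, digits, hyphens; no leading or trailing hyphen.
_LABEL = re.compile(r"[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_dotted_decimal(value):
    """True if `value` is a well-formed dotted-decimal IPv4 address."""
    try:
        ipaddress.IPv4Address(value)
    except ipaddress.AddressValueError:
        return False
    return True


def looks_like_hostname(value):
    """True if `value` is syntactically plausible as a hostname."""
    if not value or len(value) > 253:
        return False
    if re.fullmatch(r"[0-9.]+", value):
        return False  # digits and dots only: must pass the IP check instead
    return all(_LABEL.fullmatch(label) for label in value.split("."))


def is_valid_entry(value):
    """Accept either form named in the message description."""
    return is_dotted_decimal(value) or looks_like_hostname(value)
```

Strings made only of digits and dots are deliberately held to the IP address check, so a malformed address such as 256.1.1.1 is not accepted as a hostname by accident.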

150535 :clcomm: Could not find %s(): %s

Description:

The function get_libc_func could not find the specified function for the reason specified. Refer to the man pages for "dlsym" and "dlerror" for more information.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available.

152159 :WARNING: lkcm_sync: udlm_send_reply failed, forcing reconfiguration

Description:

A reconfiguration will start.

Solution:

None.

152478 :Monitor_retry_count or Monitor_retry_interval is not set.

Description:

The resource property Monitor_retry_count or Monitor_retry_interval has not been set. These properties control the restarts of the fault monitor.

Solution:

Check whether the properties are set. If not, set these values using scrgadm(1M).

152546 :ucm_callback for stop_trans generated exception %d

Description:

ucmm callback for stop transition failed.

Solution:

Save a copy of the /var/adm/messages files on all nodes. Contact your authorized Sun service provider for assistance in diagnosing the problem.

153018 :WARNING: missing msg, expected: don't_care, %d, %d, but received: %d %d, %d. FORCING reconfiguration.

Description:

Unexpected message received by udlm. This will trigger an OPS reconfiguration.

Solution:

None.

154317 :launch_validate: fe_method_full_name() failed for resource <%s>, resource group <%s>, method <%s>

Description:

Due to an internal error, the rgmd was unable to assemble the full method pathname for the VALIDATE method. This is considered a VALIDATE method failure. This in turn will cause the failure of a creation or update operation on a resource or resource group.

Solution:

Examine other syslog messages occurring at about the same time to see if the problem can be identified. Retry the creation or update operation. If the problem recurs, save a copy of the /var/adm/messages files on all nodes and contact your authorized Sun service provider for assistance.

155479 :ERROR: VALIDATE method timeout property of resource <%s> is not an integer

Description:

The indicated resource's VALIDATE method timeout, as stored in the CCR, is not an integer value. This might indicate corruption of CCR data or rgmd in-memory state; the VALIDATE method invocation will fail. This in turn will cause the failure of a creation or update operation on a resource or resource group.

Solution:

Use scrgadm(1M) -pvv to examine resource properties. If the VALIDATE method timeout or other property values appear corrupted, the CCR might have to be rebuilt. If values appear correct, this may indicate an internal error in the rgmd. Retry the creation or update operation. If the problem recurs, save a copy of the /var/adm/messages files on all nodes and contact your authorized Sun service provider for assistance.

155830 :Invalid probe values. Retry_interval must be greater than or equal to the product of Thorough_probe_interval, and Retry_count.

Description:

Validation of the probe related parameters failed because invalid values were specified.

Solution:

Retry_interval must be greater than or equal to the product of Thorough_probe_interval and Retry_count. Use scrgadm(1M) to modify the values of these parameters so that they satisfy this relationship.
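
The relationship above can be checked before running scrgadm(1M). The following is a minimal sketch in Python, not part of Sun Cluster; the function name and the sample values are assumptions chosen for illustration.

```python
def probe_values_valid(retry_interval: int,
                       thorough_probe_interval: int,
                       retry_count: int) -> bool:
    """Return True when Retry_interval >= Thorough_probe_interval * Retry_count."""
    return retry_interval >= thorough_probe_interval * retry_count


# Illustrative values only, not recommended settings.
print(probe_values_valid(300, 60, 2))  # 300 compared with 60 * 2
print(probe_values_valid(100, 60, 2))  # 100 compared with 60 * 2
```

A result of False means the combination would be rejected with this message, so either Retry_interval must be raised or one of the other two values lowered.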

156527 :Unable to execute <%s>: <%s>.

Description:

Sun Cluster was unable to execute a command.

Solution:

The problem could be caused by: 1) no more process table entries for a fork(), or 2) no available memory. For these two causes, the only option is to reboot the node. The problem might also be caused by: 3) the command that could not execute is not correctly installed. For this cause, the command might have the wrong path or file permissions. Correctly install the command.
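
The second <%s> in the message typically carries a system error string. As a rough aid to triage, the sketch below maps common errno values to the three causes listed above. It is an assumption that the message reports these particular errors; the mapping and the function name are hypothetical and not taken from Sun Cluster.

```python
import errno

# Hypothetical mapping from errno values to the causes described above.
_CAUSES = {
    errno.EAGAIN: "no more process table entries for fork(); reboot the node",
    errno.ENOMEM: "no available memory; reboot the node",
    errno.ENOENT: "command not correctly installed (wrong path)",
    errno.EACCES: "command not correctly installed (wrong file permissions)",
}


def classify_exec_failure(err: int) -> str:
    """Return a short description of the likely cause for an errno value."""
    return _CAUSES.get(err, "unrecognized cause; examine other syslog messages")


print(classify_exec_failure(errno.ENOENT))
```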

156889 :Specified global device path %s is invalid.

Description:

One or more global device paths specified in the GlobalDevicePaths extension property are not recognized by the Device Configuration Service (DCS) component.

Solution:

Ensure that all entries are valid DCS global device paths.

157213 :CCR: The repository on the joining node %s could not be recovered, join aborted.

Description:

The indicated node failed to update its repository from the repositories of the nodes in the current membership, so it cannot join the current membership.

Solution:

There may be other related messages on the indicated node that help diagnose the problem. For example, if the root disk failed, it needs to be replaced. If the root disk is full, remove some unnecessary files to free up space.

158530 :CMM: Halting because this node is severely short of resident physical memory; availrmem = %ld pages, tune.t_minarmem = %ld pages.

Description:

The local node does not have sufficient resident physical memory, which might cause it to declare other nodes down. To prevent this, the local node is going to halt.

Solution:

There may be other related messages that indicate why the node reached the low memory state. Resolve the problem and reboot the node. If you are unable to resolve the problem, contact your authorized Sun service provider to determine whether a workaround or patch is available.

158836 :Endpoint %s initialization error - errno = %d, failing associated pathend.

Description:

Communication with another node could not be established over the path.

Solution:

Any interconnect failure should be resolved, and/or the failed node rebooted.

158981 :Path <%s> is not valid file system mount point specified in /etc/vfstab.

Description:

The "ServicePaths" property of the hastorage resource should be a valid disk group, device special file, or global file system mount point specified in the /etc/vfstab file.

Solution:

Check the definition of the extension property "ServicePaths" of SUNW.HAStorage type resource. If they are file system mount points, verify that the /etc/vfstab file contains correct entries.
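
The mount point check can be approximated offline. The sketch below, in Python, parses vfstab-format text and tests whether a path appears in the mount point field (the third field). The function names and the sample entries are assumptions for illustration; it does not reproduce the validation that SUNW.HAStorage actually performs.

```python
def vfstab_mount_points(vfstab_text: str) -> set:
    """Collect the mount point field from each non-comment vfstab line."""
    points = set()
    for line in vfstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # Field order: device to mount, device to fsck, mount point, ...
        if len(fields) >= 3 and fields[2] != "-":
            points.add(fields[2])
    return points


def is_vfstab_mount_point(path: str, vfstab_text: str) -> bool:
    return path in vfstab_mount_points(vfstab_text)


# Sample entries are made up for this example.
SAMPLE = """\
#device to mount   device to fsck       mount point   FS type
/dev/md/dg/dsk/d10 /dev/md/dg/rdsk/d10  /global/data  ufs  2  yes  global,logging
swap               -                    /tmp          tmpfs -  yes  -
"""

print(is_vfstab_mount_point("/global/data", SAMPLE))
```

In practice the text would be read from /etc/vfstab on each node that can master the resource group.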

159059 :IP address (hostname) %s from %s at entry %d in list property %s does not belong to any network resource used by resource %s.

Description:

The hostname or dotted-decimal IP address string in the message does not resolve to an IP address equal to any resolved IP address from the named resource's Network_resources_used property. Any explicitly named hostname or dotted-decimal IP address string in the named list property must resolve to an IP address equal to a resolved IP address from Network_resources_used.

Solution:

Either modify the hostname or dotted-decimal IP address string from the entry in the named property or modify Network_resources_used so that the entry resolves to an IP address equal to a resolved IP address from Network_resources_used.

159501 :host %s failed: %s

Description:

The rgm is not able to establish an rpc connection to the rpc.fed server on the host shown, and the error message is shown. An error message is output to syslog.

Solution:

Save the /var/adm/messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.

159592 :clcomm: Cannot make high %d less than current total %d

Description:

An attempt was made to change the flow control policy parameter specifying the high number of server threads for a resource pool. The system does not allow the high number to be reduced below current total number of server threads.

Solution:

No user action required.

160167 :Server successfully started.

Description:

Informational message. Oracle server has been successfully started by HA-Oracle.

Solution:

None

160400 :fatal: fcntl(F_SETFD): %s (UNIX error %d)

Description:

This error should not occur. The rgmd will produce a core file and will force the node to halt or reboot to avoid the possibility of data corruption.

Solution:

Save a copy of the /var/adm/messages files on all nodes, and of the rgmd core file. Contact your authorized Sun service provider for assistance in diagnosing the problem.

160619 :Could not enlarge buffer for DBMS log messages: %m

Description:

Fault monitor could not allocate memory for reading RDBMS log file. As a result of this error, fault monitor will not scan errors from log file. However it will continue fault monitoring.

Solution:

Check if system is low on memory. If problem persists, please stop and start the fault monitor.

161104 :Adaptive server stopped.

Description:

Sun Cluster HA for Sybase shut down the Sybase Adaptive Server.

Solution:

No user action required.

161275 :reservation fatal error (UNKNOWN) - Illegal command line option

Description:

The device fencing program has suffered an internal error.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available. Copies of /var/adm/messages from all nodes should be provided for diagnosis. It may be possible to retry the failed operation, depending on the nature of the error. If the message specifies the 'node_join' transition, then this node may be unable to access shared devices. If the failure occurred during the 'release_shared_scsi2' transition, then a node which was joining the cluster may be unable to access shared devices. In either case, it may be possible to reacquire access to shared devices by executing '/usr/cluster/lib/sc/run_reserve -c node_join' on all cluster nodes. If the failure occurred during the 'make_primary' transition, then a device group has failed to start on this node. If another node was available to host the device group, then it should have been started on that node. If desired, it may be possible to switch the device group to this node with the scswitch command. If no other node was available, then the device group will not have been started. The scswitch command may be used to retry the attempt to start the device group. If the failure occurred during the 'primary_to_secondary' transition, then the shutdown or switchover of a device group has failed. The desired action may be retried.

161683 :%s/%s/install/startserver does not have execute permissions set.

Description:

The Sybase Adaptive Server starts by execution of the startserver file. The file's current permissions prevent its execution. The full path name of the startserver file is specified as a part of this message. This file is located in the $SYBASE/$ASE/install directory.

Solution:

Verify the permissions of the startserver file and ensure that it can be executed. If not, modify its execute permissions.
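
As an illustration of the check and the fix, the Python sketch below tests whether any execute bit is set on a file and, if needed, adds the owner execute bit. The helper names are hypothetical, and which execute bits the data service actually requires is an assumption here; adjust the mode to match your site's policy.

```python
import os
import stat


def has_execute_permission(path: str) -> bool:
    """Return True if any execute bit (owner, group, or other) is set."""
    mode = os.stat(path).st_mode
    return bool(mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH))


def add_owner_execute(path: str) -> None:
    """Add the owner execute bit, leaving the other mode bits unchanged."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode | stat.S_IXUSR)
```

The path to pass in is the full startserver path reported in the message.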

161934 :pid %d is stopped.

Description:

Sun Cluster HA for NFS fault monitor detected that the specified process stopped with a signal.

Solution:

No user action required. Sun Cluster HA for NFS fault monitor should kill and restart the process.

161991 :Load balancer for group '%s' setting weight for node %s to %d

Description:

This message indicates that the user has set a new weight for a particular node from an old value.

Solution:

This is an informational message, no user action is needed.

162419 :ERROR: launch_method: cannot get Failover_mode for resource <%s>, assuming NONE.

Description:

A method execution has failed or timed out. For some reason, the rgmd is unable to obtain the Failover_mode property of the resource. The rgmd assumes a setting of NONE for this property, therefore avoiding the outcome of rebooting the node (for STOP method failure) or failing over the resource group (for START method failure). For these cases, the resource is placed into a STOP_FAILED or START_FAILED state, respectively.

Solution:

Save a copy of the /var/adm/messages files on all nodes, and contact your authorized Sun service provider for assistance in diagnosing the problem.

162851 :Unable to lookup nfs:nfs_server:calls from kstat.

Description:

Sun Cluster HA for NFS fault monitor failed to look up the specified kstat parameter. The specific cause is logged with the message.

Solution:

Run the following command on the cluster node where this problem was encountered: '/usr/bin/kstat -m nfs -i 0 -n nfs_server -s calls'. Barring resource availability issues, this call should complete successfully. If it fails without generating any output, contact your authorized Sun service provider for assistance.

163027 :CMM: Quorum device %s: owner set to node %ld.

Description:

The specified node has taken ownership of the specified quorum device.

Solution:

This is an informational message, no user action is needed.

164164 :Starting Sybase %s: %s. Startup file: %s

Description:

Sybase server is going to be started by Sun Cluster HA for Sybase.

Solution:

This is an informational message, no user action is needed.

164168 :PNM: nafo%d: state transition from %s to %s on %s

Description:

A state transition has happened for a NAFO group. Transition to DOUBT happens when the active adapter is determined to be faulty by PNM. Transition to DOWN happens when all adapters in a NAFO group are determined to be faulty, resulting in a loss of network connectivity to a given subnet.

Solution:

If a NAFO group transitions to DOWN state, check for error messages about adapters being faulty and take the suggested user actions accordingly. No user action is needed for other state transitions.

164757 :reservation fatal error(%s) - realloc() error, errno %d

Description:

The device fencing program has been unable to allocate required memory.

Solution:

Memory usage should be monitored on this node and steps taken to provide more available memory if problems persist. Once memory has been made available, the following steps may need to be taken: If the message specifies the 'node_join' transition, then this node may be unable to access shared devices. If the failure occurred during the 'release_shared_scsi2' transition, then a node which was joining the cluster may be unable to access shared devices. In either case, access to shared devices can be reacquired by executing '/usr/cluster/lib/sc/run_reserve -c node_join' on all cluster nodes. If the failure occurred during the 'make_primary' transition, then a device group has failed to start on this node. If another node was available to host the device group, then it should have been started on that node. The device group can be switched back to this node if desired by using the scswitch command. If no other node was available, then the device group will not have been started. The scswitch command may be used to start the device group. If the failure occurred during the 'primary_to_secondary' transition, then the shutdown or switchover of a device group has failed. The desired action may be retried.

165512 :reservation error(%s) - my_map_to_did_device() error in other_node_status()

Description:

The device fencing program has suffered an internal error.

Solution:

165527 :Oracle UDLM package is not properly installed. %s not found.

Description:

Oracle udlm package installation problem.

Solution:

Make sure Oracle UDLM package is properly installed.

165731 :Backup server successfully started.

Description:

Sun Cluster HA for Sybase successfully started the Backup Server.

Solution:

No user action required.

166362 :clexecd: Got back %d from I_RECVFD. Looks like parent is dead.

Description:

Parent process in the clexecd program is dead.

Solution:

If the node is shutting down, ignore the message. If not, the node on which this message is seen will shut down to prevent data corruption. Contact your authorized Sun service provider to determine whether a workaround or patch is available.

166489 :reservation error(%s) error. Node %d is not in the cluster.

Description:

A node with which the device fencing program was communicating is not in the cluster.

Solution:

This is an informational message, no user action is needed.

166560 :Maximum Primaries is %d. It should be 1.

Description:

An invalid value has been set for Maximum Primaries. The value should be 1.

Solution:

Reset this value using scrgadm(1M).

166590 :NULL value returned for the extension property <%s>.

Description:

The extension property <%s> is set to NULL in the RTR File. This is a serious error. The RTR file might be corrupted.

Solution:

Reload the package for Sun Cluster HA for NetBackup. If this problem persists, contact your authorized Sun service provider for assistance.

167108 :Starting Oracle server.

Description:

Informational message. Oracle server is being started by HA-Oracle.

Solution:

None

167253 :Server stopped successfully.

Description:

Informational message. Oracle server successfully stopped.

Solution:

None

168150 :INTERNAL ERROR CMM: Cannot bind quorum algorithm object to local name server.

Description:

There was an error while binding the quorum subsystem object to the local name server.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available.

168383 :Service not started

Description:

There was a problem detected in the initial startup of the service.

Solution:

Attempt to start the service by hand to see if there are any apparent problems with the application. Correct these problems and attempt to start the data service again.

168630 :could not read cluster name

Description:

Could not get cluster name. Perhaps the system is not booted as part of the cluster.

Solution:

Make sure the node is booted as part of a cluster.

168970 :sun_udlm_read_oracle_cfg: open failed: %s ... will use default values

Description:

Database connection check failed indicating the database might be down. HA-SAP will not take any action, but will check the database connection again after the time specified.

Solution:

Make sure the database and the HA software for the database are functioning properly.

169308 :Database might be down, HA-SAP will not take any action. Will check again in %d seconds.

Description:

The database connection check failed, indicating the database might be down. Sun Cluster HA for SAP should not take any action, but should check the database connection again after the time specified.

Solution:

Ensure that the database and the HA software for the database are functioning properly.

169606 :Unable to create thread. Exiting.

Description:

clexecd program has encountered a failed thr_create(2) system call. The error message indicates the error number for the failure.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available.

169608 :INTERNAL ERROR: scha_control_action: invalid action <%d>

Description:

The scha_control function has encountered an internal logic error. This will cause scha_control to fail with a SCHA_ERR_INTERNAL error, thereby preventing a resource-initiated failover.

Solution:

Please save a copy of the /var/adm/messages files on all nodes, and report the problem to your authorized Sun service provider.

169765 :Configuration file not found.

Description:

Internal error. Configuration file for online_check not found.

Solution:

Please report this problem.

171031 :reservation fatal error(%s) - get_control() failure.

Description:

The device fencing program has suffered an internal error.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available. Copies of /var/adm/messages from all nodes should be provided for diagnosis. It may be possible to retry the failed operation, depending on the nature of the error. If the message specifies the 'node_join' transition, then this node may be unable to access shared devices. If the failure occurred during the 'release_shared_scsi2' transition, then a node which was joining the cluster may be unable to access shared devices. In either case, it may be possible to reacquire access to shared devices by executing '/usr/cluster/lib/sc/run_reserve -c node_join' on all cluster nodes. If the failure occurred during the 'make_primary' transition, then a device group has failed to start on this node. If another node was available to host the device group, then it should have been started on that node. If desired, it may be possible to switch the device group to this node with the scswitch command. If no other node was available, then the device group will not have been started. The scswitch command may be used to retry the attempt to start the device group. If the failure occurred during the 'primary_to_secondary' transition, then the shutdown or switchover of a device group has failed. The desired action may be retried.

171786 :listener %s is not running. Attempting restart.

Description:

Listen monitor has detected failure of listener. Monitor will attempt to restart the listener.

Solution:

None

171878 :in libsecurity setnetconfig failed when initializing the client: %s - %s

Description:

A client was not able to make an rpc connection to a server (rpc.pmfd, rpc.fed or rgmd) because it could not establish a rpc connection for the network specified. The rpc error and the system error are shown. An error message is output to syslog.

Solution:

Save the /var/adm/messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.

172566 :Stopping oracle server using shutdown abort

Description:

Informational message. Oracle server will be stopped using 'shutdown abort' command.

Solution:

Examine 'Stop_timeout' property of the resource and increase 'Stop_timeout' if you don't wish to use 'shutdown abort' for stopping Oracle server.

173733 :Failed to retrieve the resource type property %s for %s: %s.

Description:

The query for a property failed. The reason for the failure is given in the message.

Solution:

Save a copy of the /var/adm/messages files on all nodes. Contact your authorized Sun service provider for assistance in diagnosing the problem.

174568 :Error while retrieving property %s: %s.

Description:

An error occurred during the invocation of a DSDL API to obtain a resource extension property.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available.

174751 :Failed to retrieve the process monitor facility tag.

Description:

Failed to create the tag that is used to register with the process monitor facility.

Solution:

Check the syslog messages that occurred just before this message. In case of an internal error, save the /var/adm/messages file and contact your authorized Sun service provider.

174928 :ERROR: process_resource: resource <%s> is offline pending boot, but no BOOT method is registered

Description:

A non-fatal internal error has occurred in the rgmd state machine.

Solution:

Since this problem might indicate an internal logic error in the rgmd, please save a copy of the /var/adm/messages files on all nodes, the output of an scstat -g command, and the output of a scrgadm -pvv command. Report the problem to your authorized Sun service provider.

175370 :svc_restore_priority: Could not restore original scheduling parameters: %s

Description:

The server was not able to restore the original scheduling mode. The system error message is shown. An error message is output to syslog.

Solution:

Save the /var/adm/messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.

175553 :clconf: Your configuration file is incorrect! The type of property %s is not found

Description:

Could not find the type of property in the configuration file.

Solution:

Check the configuration file.

176151 :Unable to lookup nfs:nfs_server from kstat:%s.

Description:

Sun Cluster HA for NFS fault monitor failed to look up the specified kstat parameter. The specific cause is logged with the message.

Solution:

176860 :Error: Unable to update scha_control timestamp file <%s> for resource <%s>

Description:

The rgmd failed in a call to utime(2) on the local node. This may prevent the anti-"pingpong" feature from working, which may permit a resource group to fail over repeatedly between two or more nodes. The failure of the utime call might indicate a more serious problem on the node.

Solution:

Examine other syslog messages occurring around the same time on the same node, to see if the source of the problem can be identified.

177070 :Got back %d in revents of the control fd. Exiting.

Description:

clexecd program has encountered an error.

Solution:

The clexecd program will exit and the node will be halted or rebooted to prevent data corruption. Contact your authorized Sun service provider to determine whether a workaround or patch is available.

177252 :reservation warning(%s) - MHIOCGRP_INRESV error will retry in %d seconds

Description:

The device fencing program has encountered errors while trying to access a device. The failed operation will be retried.

Solution:

This is an informational message, no user action is needed.

177899 :t_bind (open_cmd_port) failed

Description:

Call to t_bind() failed. The "t_bind" man page describes possible error codes. ucmmd will exit and the node will abort.

Solution:

Save the /var/adm/messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.

179364 :CCR: Invalid CCR metadata.

Description:

The CCR could not find valid metadata on all nodes of the cluster.

Solution:

Boot the cluster in -x mode to restore the cluster repository on all the nodes in the cluster from backup. The cluster repository is located at /etc/cluster/ccr/.

180002 :Failed to stop the monitor server using %s.

Description:

Sun Cluster HA for Sybase failed to stop the monitor server using the file specified in the STOP_FILE property. Other syslog messages and the log file will provide additional information on possible reasons for the failure. It is likely that the adaptive server terminated prior to the shutdown of the monitor server.

Solution:

Check the permissions of the file specified in the STOP_FILE extension property. The file should be executable by the Sybase owner and the root user.

181193 :Cannot access file <%s>, err = <%s>

Description:

The rgmd has failed in an attempt to stat(2) a file used for the anti-"pingpong" feature. This may prevent the anti-pingpong feature from working, which may permit a resource group to fail over repeatedly between two or more nodes. The failure to access the file might indicate a more serious problem on the node.

Solution:

Examine other syslog messages occurring around the same time on the same node, to see if the source of the problem can be identified.

183071 :Cannot Execute %s: %s.

Description:

Failure in executing the command.

Solution:

Check the syslog message for the command description. Check whether the system is low in memory or the process table is full and take appropriate action. Make sure that the executable exists.

183799 :clconf: CSR not initialized

Description:

While executing task in clconf and modifying the state of proxy, found component CSR not initialized.

Solution:

Check the CSR component in the configuration file.

184139 :scvxvmlg warning - found no match for %s, removing it

Description:

The program responsible for maintaining the VxVM device namespace has discovered inconsistencies between the VxVM device namespace on this node and the VxVM configuration information stored in the cluster device configuration system. If configuration changes were made recently, then this message should reflect one of the configuration changes. If no changes were made recently or if this message does not correctly reflect a change that has been made, the VxVM device namespace on this node may be in an inconsistent state. VxVM volumes may be inaccessible from this node.

Solution:

If this message correctly reflects a configuration change to VxVM diskgroups then no action is required. If the change this message reflects is not correct, then the information stored in the device configuration system for each VxVM diskgroup should be examined for correctness. If the information in the device configuration system is accurate, then executing '/usr/cluster/lib/dcs/scvxvmlg' on this node should restore the device namespace. If the information stored in the device configuration system is not accurate, it must be updated by executing '/usr/cluster/bin/scconf -c -D name=diskgroup_name' for each VxVM diskgroup with inconsistent information.

185089 :CCR: Updating table %s failed to startup on node %s.

Description:

The operation to update the indicated table failed to start on the indicated node.

Solution:

There may be other related messages on the nodes where the failure occurred, which may help diagnose the problem. If the root disk failed, it needs to be replaced. If the indicated table was deleted by accident, boot the offending node(s) in -x mode to restore the indicated table from other nodes in the cluster. The CCR tables are located at /etc/cluster/ccr/. If the root disk is full, remove some unnecessary files to free up some space.

185465 :No action on DBMS Error %s: %ld

Description:

Database server returned error. Fault monitor does not take any action on this error.

Solution:

No action required.

185720 :lkdb_parm: lib initialization failed

Description:

Initializing a library to get the static lock manager parameters failed.

Solution:

This is an internal error. Save the contents of /var/adm/messages, /var/cluster/ucmm/ucmm_reconf.log and /var/cluster/ucmm/dlm*/*logs/* from all the nodes and contact your Sun service representative.

185839 :IP address (hostname) and Port pairs %s%c%d and %s%c%d in property %s, at entries %d and %d, effectively duplicate each other. The port numbers are the same and the resolved IP addresses are the same.

Description:

The two list entries at the named locations in the named property have port numbers that are identical, and also have IP address (hostname) strings that resolve to the same underlying IP address. An IP address (hostname) string and port entry should only appear once in the property.

Solution:

Specify the property with only one occurrence of the IP address (hostname) string and port entry.
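
The duplicate condition described above can be reproduced with a short Python sketch: two entries collide when their port numbers match and their hostname or address strings resolve to the same IP address. The function name, the injectable resolve callback, and the sample hostnames and addresses are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple


def find_duplicate_entries(entries: List[Tuple[str, int]],
                           resolve: Callable[[str], str]) -> List[Tuple[int, int]]:
    """Return (first_index, duplicate_index) pairs for colliding entries."""
    seen: Dict[Tuple[str, int], int] = {}
    duplicates = []
    for index, (host, port) in enumerate(entries):
        key = (resolve(host), port)  # compare resolved address plus port
        if key in seen:
            duplicates.append((seen[key], index))
        else:
            seen[key] = index
    return duplicates


# Made-up lookup table standing in for name resolution.
TABLE = {"web-lh": "192.0.2.10", "192.0.2.10": "192.0.2.10", "db-lh": "192.0.2.20"}
entries = [("web-lh", 80), ("192.0.2.10", 80), ("db-lh", 80)]
print(find_duplicate_entries(entries, TABLE.__getitem__))
```

On a live node, socket.gethostbyname from the Python standard library could be passed as the resolve argument instead of the lookup table.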

185974 :Default Oracle parameter file %s does not exist

Description:

The Oracle parameter file has not been specified, and the default parameter file indicated in the message does not exist.

Solution:

Make sure that the parameter file exists at the location indicated in the message, or specify the 'Parameter_file' property for the resource.

186306 :Conversion of hostnames failed for %s.

Description:

The hostname or IP address given could not be converted to an integer.

Solution:

Add the hostname to the /etc/inet/hosts file. Verify the settings in the /etc/nsswitch.conf file include "files" for host lookup.
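
The nsswitch.conf check can be sketched in Python as follows. The function reads nsswitch.conf-format text and reports whether the hosts entry lists "files" as a source. The function name is hypothetical and the parsing is deliberately simple.

```python
def hosts_lookup_uses_files(nsswitch_text: str) -> bool:
    """Return True if the hosts: line lists 'files' as a lookup source."""
    for line in nsswitch_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if line.startswith("hosts:"):
            sources = line[len("hosts:"):].split()
            return "files" in sources
    return False  # no hosts: line found


print(hosts_lookup_uses_files("passwd: files\nhosts: files dns\n"))
```

In practice the text would come from /etc/nsswitch.conf on the node that logged the message.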

186484 :PENDING_METHODS: bad resource state <%s> (%d) for resource <%s>

Description:

The rgmd state machine has discovered a resource in an unexpected state on the local node. This should not occur and may indicate an internal logic error in the rgmd.

Solution:

Look for other syslog error messages on the same node. Save a copy of the /var/adm/messages files on all nodes, and report the problem to your authorized Sun service provider.

186524 :reservation error(%s) - do_scsi2_release() error for disk %s

Description:

The device fencing program has encountered errors while trying to access a device. All retry attempts have failed.

Solution:

The action which failed is a scsi-2 ioctl. These can fail if there are scsi-3 keys on the disk. To remove invalid scsi-3 keys from a device, use 'scdidadm -R' to repair the disk (see scdidadm man page for details). If there were no scsi-3 keys present on the device, then this error is indicative of a hardware problem, which should be resolved as soon as possible. Once the problem has been resolved, the following actions may be necessary: If the message specifies the 'node_join' transition, then this node may be unable to access the specified device. If the failure occurred during the 'release_shared_scsi2' transition, then a node which was joining the cluster may be unable to access the device. In either case, access can be reacquired by executing '/usr/cluster/lib/sc/run_reserve -c node_join' on all cluster nodes. If the failure occurred during the 'make_primary' transition, then a device group may have failed to start on this node. If the device group was started on another node, it may be moved to this node with the scswitch command. If the device group was not started, it may be started with the scswitch command. If the failure occurred during the 'primary_to_secondary' transition, then the shutdown or switchover of a device group may have failed. If so, the desired action may be retried.

186612 :_cladm CL_GET_CLUSTER_NAME failed; perhaps system is not booted as part of cluster

Description:

Could not get cluster name. Perhaps the system is not booted as part of the cluster.

Solution:

Make sure the node is booted as part of a cluster.

187307 :invalid debug_level: '%s'

Description:

Invalid debug_level argument passed to udlmctl. udlmctl will not start up.

Solution:

This is an internal error. Save the contents of /var/adm/messages, /var/cluster/ucmm/ucmm_reconf.log and /var/cluster/ucmm/dlm*/*logs/* from all the nodes and contact your Sun service representative.

190918 :Failed to start orbixd.

Description:

The orbix daemon could not be started.

Solution:

As the Sun Cluster HA for BroadVision One-To-One Enterprise user, manually start the orbix daemon. If you cannot manually start the orbixd daemon, contact your authorized Sun service provider. Provide your authorized Sun service provider a copy of the /var/adm/messages files from all nodes and a copy of the orbixd log files, which are located in /var/run/cluster/bv/.

191225 :clcomm: Created %d threads, wanted %d for pool %d

Description:

The system creates server threads to support requests from other nodes in the cluster. The system could not create the desired minimum number of server threads. However, the system did succeed in creating at least 1 server thread. The system will have further opportunities to create more server threads. The system cannot create server threads when there is inadequate memory. This message indicates either inadequate memory or an incorrect configuration.

Solution:

There are multiple possible root causes. If the system administrator specified the value of "maxusers", try reducing the value of "maxusers". This reduces memory usage and results in the creation of fewer server threads. If the system administrator specified the value of "cl_comm:min_threads_default_pool" in "/etc/system", try reducing this value. This directly reduces the number of server threads. Alternatively, do not specify this value. The system can automatically select an appropriate number of server threads. Another alternative is to install more memory. If the system administrator did not modify either "maxusers" or "min_threads_default_pool", then the system should have selected an appropriate number of server threads. Contact your authorized Sun service provider to determine whether a workaround or patch is available.

191270 :IP address (hostname) string %s in property %s, entry %d does not resolve to an IP address that belongs to one of the resources named in property %s.

Description:

The IP address or hostname named does not belong to one of the network resources designated for use by this resource.

Solution:

Either select a different IP address that is in one of the network resources used by this resource, or create a network resource that contains the named IP address and designate that resource as one of the network resources used by this resource.

191409 :scvxvmlg warning - chown(%s) failed

Description:

The program responsible for maintaining the VxVM namespace was unable to access the global device namespace. If configuration changes were recently made to VxVM diskgroups or volumes, this node may be unaware of those changes. Recently created volumes may be inaccessible from this node.

Solution:

Verify that /global/.devices/node@N (where N is this node's node number) is mounted globally and is accessible. If no configuration changes have been recently made to VxVM diskgroups or volumes and all volumes continue to be accessible from this node, then no further action is required. If changes have been made, the device namespace on this node can be updated to reflect those changes by executing '/usr/cluster/lib/dcs/scvxvmlg'. If the problem persists, contact your authorized Sun service provider to determine whether a workaround or patch is available.

191492 :CCR: CCR unable to read root file system.

Description:

The CCR failed to read the repository due to a root file system failure on this node.

Solution:

The root file system needs to be replaced on the offending node.

191506 :ERROR: enabled resource <%s> in resource group <%s> depends on disabled resource <%s>

Description:

An enabled resource was found to depend on a disabled resource. This should not occur and may indicate an internal logic error in the rgmd.

Solution:

Look for other syslog error messages on the same node. Save a copy of the /var/adm/messages files on all nodes, and report the problem to your authorized Sun service provider.

191772 :Failed to configure the networking components for scalable resource %s for method %s.

Description:

The processing that is required for scalable services did not complete successfully.

Solution:

Save a copy of the /var/adm/messages files on all nodes. Contact your authorized Sun service provider for assistance in diagnosing the problem.

191957 :The property %s does not have a legal value.

Description:

The property named does not have a legal value.

Solution:

Assign the property a legal value.

192183 :freeze_adjust_timeouts: call to rpc.fed failed, tag <%s> err <%d> result <%d>

Description:

The rgmd failed in its attempt to suspend timeouts on an executing method during temporary unavailability of a global device group. This could cause the resource method to time out. Depending on which method was being invoked and the Failover_mode setting on the resource, this might cause the resource group to fail over or move to an error state.

Solution:

No action is required if the resource method execution succeeds. If the problem recurs, rebooting this node might cure it. Save a copy of the /var/adm/messages files on all nodes and contact your authorized Sun service provider for assistance in diagnosing the problem.

192518 :Cannot access start script %s: %s

Description:

The start script is not accessible and executable. This may be due to the script not existing or the permissions not being set properly.

Solution:

Make sure the script exists, is in the proper directory, and has read and execute permissions set appropriately.
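The checks named in this Solution can be sketched as a small shell function. This is an illustration only: check_start_script is a hypothetical name, and the path to test is whatever the message reported.

```shell
# check_start_script is a hypothetical helper; pass the path from the message.
# It reports whether the file exists and is both readable and executable
# for the user running the check.
check_start_script() {
    script="$1"
    if [ ! -f "$script" ]; then
        echo "missing: $script"
        return 1
    fi
    if [ ! -r "$script" ] || [ ! -x "$script" ]; then
        echo "not readable and executable: $script"
        ls -l "$script"    # show the current owner and permission bits
        return 1
    fi
    echo "ok: $script"
}
```

Run the check as the same user that the data service uses to start the application, because permissions are evaluated for the calling user.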

192619 :reservation error(%s) - Unable to open device %s

Description:

The device fencing program has encountered errors while trying to access a device. All retry attempts have failed.

Solution:

This may indicate a hardware problem, which should be resolved as soon as possible. Once the problem has been resolved, the following actions may be necessary:

If the message specifies the 'node_join' transition, then this node may be unable to access the specified device. If the failure occurred during the 'release_shared_scsi2' transition, then a node that was joining the cluster may be unable to access the device. In either case, access can be reacquired by executing '/usr/cluster/lib/sc/run_reserve -c node_join' on all cluster nodes.

If the failure occurred during the 'make_primary' transition, then a device group may have failed to start on this node. If the device group was started on another node, it may be moved to this node with the scswitch command. If the device group was not started, it may be started with the scswitch command.

If the failure occurred during the 'primary_to_secondary' transition, then the shutdown or switchover of a device group may have failed. If so, the desired action may be retried.
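The mapping from transition name to follow-up action can be summarized as a lookup. The sketch below only prints a reminder and runs nothing; suggest_fencing_followup is a hypothetical helper name, and the transition names and the run_reserve command line are taken from the Solution text.

```shell
# suggest_fencing_followup is a hypothetical helper: it prints the follow-up
# described in the Solution for a given transition name and executes nothing.
suggest_fencing_followup() {
    case "$1" in
        node_join|release_shared_scsi2)
            echo "run on all cluster nodes: /usr/cluster/lib/sc/run_reserve -c node_join" ;;
        make_primary)
            echo "use scswitch to move or start the device group on this node" ;;
        primary_to_secondary)
            echo "retry the device group shutdown or switchover" ;;
        *)
            echo "unrecognized transition: $1"
            return 1 ;;
    esac
}
```

For example, calling suggest_fencing_followup with the transition shown in the parentheses of the logged message prints the matching action.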

193137 :Service group '%s' deleted

Description:

The service group by that name is no longer known by the scalable services framework.

Solution:

This is an informational message; no user action is needed.

193167 :Adaptive server shutdown did not succeed.

Description:

The Sybase Adaptive Server shutdown process did not succeed.

Solution:

Manually stop the Sybase Adaptive Server. Examine the log files and setup. Check whether the STOP method timeout value is set too low.

193263 :Service is online.

Description:

While attempting to check the health of the data service, probe detected that the resource status is fine and it is online.

Solution:

This is an informational message. No user action is needed.

193933 :CMM: Votecount changed from %d to %d for node %s.

Description:

The specified node's votecount has been changed as indicated.

Solution:

This is an informational message; no user action is needed.

194179 :Failed to stop the service %s.

Description:

The specified data service failed to stop.

Solution:

Check the /var/adm/messages files for the cause of the failure. Contact your authorized Sun service provider for assistance. Provide your authorized Sun service provider a copy of the /var/adm/messages files from all nodes.

194512 :Failed to stop HA-NFS system fault monitor.

Description:

The process monitor facility failed to stop the HA-NFS system fault monitor.

Solution:

Use pmfadm(1M) with the -s option to stop the HA-NFS system fault monitor with the tag name "cluster.nfs.daemons". If the error persists, reboot the node.
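As a minimal sketch, the stop request can be wrapped as follows. The function name stop_nfs_fault_monitor and the PMFADM override variable are hypothetical conveniences; only the -s option and the tag name come from the Solution text.

```shell
# PMFADM can be overridden, for example to point at a full path;
# by default the command is looked up on PATH.
PMFADM="${PMFADM:-pmfadm}"

# stop_nfs_fault_monitor is a hypothetical wrapper name.
stop_nfs_fault_monitor() {
    # Ask the process monitor facility to stop the HA-NFS system fault
    # monitor, identified by its tag name.
    "$PMFADM" -s cluster.nfs.daemons
}
```

Run the function as superuser on the node that logged the message, and check its exit status before deciding whether a reboot is needed.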

194810 :clcomm: thread_create failed for resource_thread

Description:

The system could not create the needed thread, because there is inadequate memory.

Solution:

There are two possible solutions. Install more memory. Alternatively, reduce memory usage. Since this happens during system startup, application memory usage is normally not a factor.

195286 :CMM: Placing reservation on quorum device %s failed with error %d.

Description:

The specified error was encountered while trying to place a reservation on the specified quorum device; therefore, this node cannot take ownership of this quorum device.

Solution:

There may be other related messages on this and other nodes connected to this quorum device that may indicate the cause of this problem. Refer to the quorum disk repair section of the administration guide for resolving this problem.

195538 :Null value is passed for the handle.

Description:

A null handle was passed for the function parameter. No further processing can be done without a proper handle.

Solution:

This is a programming error; a core file is generated. Specify a non-null handle in the function call.

195867 :clexecd: Unexpected eventmask %x in revents of the control fd.

Description:

The clexecd program has encountered an error.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available.

196233 :INTERNAL ERROR: launch_method: method tag <%s> not found in method invocation list for resource group <%s>

Description:

An internal error has occurred. The rgmd will produce a core file and will force the node to halt or reboot to avoid the possibility of data corruption.

Solution:

Save a copy of the /var/adm/messages files on all nodes, and of the rgmd core file. Contact your authorized Sun service provider for assistance in diagnosing the problem.

197165 :Monitor server shutdown did not succeed. Using pkill.

Description:

Sun Cluster HA for Sybase could not gracefully shut down the Monitor Server. The process is being terminated with the UNIX kill directive.

Solution:

Manually shut down the Monitor Server. Examine the log files and setup.

197307 :Resource contains invalid hostnames.

Description:

The hostnames that must be made available by this logical host resource are invalid.

Solution:

It is advised to keep the hostnames in the /etc/inet/hosts file and to enable "files" for host lookup in the nsswitch.conf file. Any of the following situations might have occurred. 1) If the hosts are not in the /etc/inet/hosts file, make sure that the name server is reachable and has the host name entries specified. 2) Invalid hostnames might have been specified while creating the logical host resource. If this is the case, use the scrgadm command to respecify the hostnames for this logical host resource.
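The two local checks can be sketched in shell. Both function names are hypothetical, the default file paths follow the Solution text (with /etc/nsswitch.conf assumed as the location of nsswitch.conf), and the awk call assumes a POSIX awk that accepts -v (on Solaris, nawk or /usr/xpg4/bin/awk).

```shell
# hosts_uses_files is a hypothetical helper: it succeeds if the "hosts:" line
# of an nsswitch.conf-style file lists the "files" source.
hosts_uses_files() {
    conf="${1:-/etc/nsswitch.conf}"
    grep '^hosts:' "$conf" | grep -w files >/dev/null
}

# host_in_hosts_file is a hypothetical helper: it succeeds if the name appears
# as a host name or alias (not as the address field) on a non-comment line.
host_in_hosts_file() {
    name="$1"
    hosts="${2:-/etc/inet/hosts}"
    awk -v n="$name" '
        $1 !~ /^#/ {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break      # rest of the line is a comment
                if ($i == n) found = 1
            }
        }
        END { exit !found }
    ' "$hosts"
}
```

If host_in_hosts_file fails for a hostname that the resource uses, the name is being resolved only through the name server, which corresponds to situation 1) in the Solution.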

197456 :CCR: Fatal error: Node will be killed.

Description:

A fatal error occurred on this node during the synchronization of the cluster repository. This node will be killed to allow the synchronization to continue.

Solution:

Look for other messages on this node that indicate the fatal error that occurred. For example, if the root disk on the afflicted node has failed, then it needs to be replaced.

197997 :clexecd: dup2 of stdin returned with errno %d while exec'ing (%s). Exiting.

Description:

The clexecd program has encountered a failed dup2(2) system call. The error message indicates the error number for the failure.

Solution:

The clexecd program will exit and the node will be halted or rebooted to prevent data corruption. Contact your authorized Sun service provider to determine whether a workaround or patch is available.

198216 :t_bind cannot bind to requested address

Description:

Call to t_bind() failed. The "t_bind" man page describes possible error codes. ucmmd will exit and the node will abort.

Solution:

Save the /var/adm/messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.

198284 :Failed to start fault monitor.

Description:

The fault monitor for this data service was not started. There may be prior messages in syslog indicating specific problems.

Solution:

Correct the problems specified in prior syslog messages. This problem may occur when the cluster is under load and Sun Cluster cannot start the application within the specified timeout period. Consider increasing the Monitor_Start_timeout property. Try switching the resource group to another node using scswitch(1M).
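The two suggestions can be sketched as a dry run that prints the commands for review. The function name, the RUN variable, and all argument values are hypothetical. The option letters shown (scrgadm -c -j ... -y and scswitch -z -g ... -h) are assumptions based on the usual Sun Cluster 3.0 command forms and should be confirmed against scrgadm(1M) and scswitch(1M) before use.

```shell
# RUN defaults to echo, so the commands are only printed.
# Set RUN to an empty string to execute them instead.
RUN="${RUN-echo}"

# raise_monitor_timeout_and_switch is a hypothetical helper.
# Arguments: resource name, resource group name, target node, new timeout (seconds).
raise_monitor_timeout_and_switch() {
    resource="$1"; group="$2"; node="$3"; timeout="$4"
    # Assumed form: change a standard property on an existing resource.
    $RUN scrgadm -c -j "$resource" -y Monitor_Start_timeout="$timeout"
    # Assumed form: switch the resource group to the named node.
    $RUN scswitch -z -g "$group" -h "$node"
}
```

For example, raise_monitor_timeout_and_switch nfs-rs nfs-rg node2 600 prints the two command lines without changing the cluster; the names and the value 600 are placeholders.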

198542 :No network resources found for resource.

Description:

No network resources were found for the resource.

Solution:

Declare network resources used by the resource explicitly using the property Network_resources_used. For the resource name and resource group name, check the syslog tag.

198851 :fatal: Got error <%d> trying to read CCR when disabling resource <%s>; aborting node

Description:

Rgmd failed to read updated resource from the CCR on this node.

Solution:

Save a copy of the /var/adm/messages files on all nodes, and of the rgmd core file. Contact your authorized Sun service provider for assistance in diagnosing the problem.

199467 :clcomm::ObjectHandler::_unreferenced called

Description:

This operation should never be executed.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available.