Sun Cluster 3.1 Error Messages Guide

Message IDs 900000–999999

900102 Failed to retrieve the resource type property %s: %s.

Description:

An API operation has failed while retrieving the resource type property. Low memory or API call failure might be the reasons.

Solution:

In case of low memory, the problem will probably be cured by rebooting. If the problem recurs, you might need to increase swap space by configuring additional swap devices. Otherwise, if it is an API call failure, check the syslog messages from other components.

900206 parameter '%s%s' must be an integer "%s". Using default value of %d.

Description:

Using a default value for a parameter.

Solution:

None.

900499 Error: low memory

Description:

The rpc.fed server was not able to allocate memory. The server may not be able to capture the output from methods it runs.

Solution:

Investigate whether the host is running out of memory. If not, save the /var/adm/messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.

900501 pthread_sigmask: %s

Description:

Solution:

900675 cluster volume manager shared access mode enabled

Description:

Message indicating shared access availability of the volume manager.

Solution:

None.

900843 Retrying to retrieve the cluster information.

Description:

An update to the cluster configuration occurred while cluster properties were being retrieved.

Solution:

Ignore the message.

900954 fatal: Unable to open CCR

Description:

The rgmd was unable to open the cluster configuration repository (CCR). The rgmd will produce a core file and will force the node to halt or reboot to avoid the possibility of data corruption.

Solution:

Save a copy of the /var/adm/messages files on all nodes, and of the rgmd core file. Contact your authorized Sun service provider for assistance in diagnosing the problem.

901030 clconf: Data length is more than max supported length in clconf_file_io

Description:

While reading configuration data through the CCR FILE interface, the data length was found to exceed the maximum supported length.

Solution:

Check the CCR configuration information.

902721 Switching over resource group using scha_control GIVEOVER

Description:

The fault monitor has detected problems in the RDBMS server and has determined that the RDBMS server cannot be restarted on this node. An attempt will be made to switch the resource over to another node, if a healthy node is available.

Solution:

Check the cause of RDBMS failure.

903007 txmit_common: udp is null!

Description:

Cannot transmit a message and communicate with udlmctl because the address to send to is null.

Solution:

None.

903317 Entry at position %d in property %s was invalid.

Description:

An invalid entry was found in the named property. The position index, which starts at 0 for the first element in the list, indicates which element in the property list was invalid.

Solution:

Make sure the property has a valid value.

903370 Command %s failed to run: %s.

Description:

HA-NFS attempted to run the specified command to perform some action which failed. The specific reason for failure is also provided.

Solution:

HA-NFS will take action to recover from this failure, if possible. If the failure persists and service does not recover, contact your service provider. If an immediate repair is desired, reboot the cluster node where this failure is occurring repeatedly.

903734 Failed to create lock directory %s: %s.

Description:

This network resource failed to create a directory in which to store lock files. These lock files are needed to serialize the running of the same callback method on the same adapter for multiple resources.

Solution:

This might be the result of a lack of system resources. Check whether the system is low in memory and take appropriate action. For specific error information, check the syslog message.

905023 clexecd: dup2 of stderr returned with errno %d while exec'ing (%s). Exiting.

Description:

The clexecd program has encountered a failed dup2(2) system call. The error message indicates the error number for the failure.

Solution:

The clexecd program will exit and the node will be halted or rebooted to prevent data corruption. Contact your authorized Sun service provider to determine whether a workaround or patch is available.

905591 Could not determine volume configuration daemon mode

Description:

Could not get information about the volume manager daemon mode.

Solution:

Check whether the volume manager has been started correctly.

906435 Encountered an error while validating configuration.

Description:

An error occurred during the validations of specified HAStoragePlus global device paths and/or FilesystemMountPoints.

Solution:

Investigate possible RGM, DSDL, DCS errors. Contact your authorized Sun service provider for assistance in diagnosing the problem.

906589 Error retrieving network address resource in resource group.

Description:

An error occurred reading the indicated extension property.

Solution:

Check syslog messages for errors logged from other system modules. If the error persists, please report the problem.

906838 reservation warning(%s) - do_scsi3_registerandignorekey() error for disk %s, attempting do_scsi3_register()

Description:

The device fencing program has encountered errors while trying to access a device. It is now trying to run do_scsi3_register().

Solution:

This is an informational message, no user action is needed.

907960 scvxvmlg error - stat(%s) failed with errno %d

Description:

The program responsible for maintaining the VxVM namespace was unable to access the global device namespace. If configuration changes were recently made to VxVM diskgroups or volumes, this node may be unaware of those changes. Recently created volumes may be inaccessible from this node.

Solution:

Verify that the /global/.devices/node@N (N = this node's node number) is mounted globally and is accessible. If no configuration changes have been recently made to VxVM diskgroups or volumes and all volumes continue to be accessible from this node, then no further action is required. If changes have been made, the device namespace on this node can be updated to reflect those changes by executing '/usr/cluster/lib/dcs/scvxvmlg'. If the problem persists, contact your authorized Sun service provider to determine whether a workaround or patch is available.

908240 in libsecurity realloc failed

Description:

A server (rpc.pmfd, rpc.fed or rgmd) was not able to start, or a client was not able to make an rpc connection to the server, probably due to low memory. An error message is output to syslog.

Solution:

Investigate if the host is low on memory. If not, save the /var/adm/messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.

908387 ucm_callback for step %d generated exception %d

Description:

ucmm callback for a step failed.

Solution:

Save a copy of the /var/adm/messages files on all nodes. Contact your authorized Sun service provider for assistance in diagnosing the problem.

908591 Failed to stop fault monitor.

Description:

An attempt was made to stop the fault monitor and it failed. There may be prior messages in syslog indicating specific problems.

Solution:

If there are prior messages in syslog indicating specific problems, correct them. If that does not resolve the issue, try the following. Use the process monitor facility (pmfadm(1M)) with the -L option to retrieve all the tags that are running on the server. Identify the tag name for the fault monitor of this resource; the tag ends in the string ".mon" and contains the resource group name and the resource name. Then use pmfadm(1M) with the -s option to stop the fault monitor. This problem may occur when the cluster is under load and Sun Cluster cannot stop the fault monitor within the specified timeout period. Consider increasing the Monitor_Stop_timeout property. If the error still persists, reboot the node.
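
The tag lookup described here can be sketched as a small shell filter. This is an illustration only: find_monitor_tags is a hypothetical helper, it assumes that pmfadm -L prints tags separated by whitespace and that a POSIX awk is available, and the names in the usage comment are placeholders.

```shell
# Hypothetical helper: read process monitor tags on stdin and print those
# that end in ".mon" and contain both the resource group name ($1) and the
# resource name ($2). Assumes tags are separated by whitespace.
find_monitor_tags() {
    rg=$1
    rs=$2
    tr -s ' \t' '\n\n' | awk -v rg="$rg" -v rs="$rs" '
        /\.mon$/ && index($0, rg) > 0 && index($0, rs) > 0 { print }
    '
}

# Example use on a cluster node (names are placeholders):
#   pmfadm -L | find_monitor_tags nfs-rg nfs-res
#   pmfadm -s <tag printed by the line above>
```

The filter only narrows the list; confirm that the tag belongs to the intended resource before stopping it.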

908716 fatal: Aborting this node because method <%s> failed on resource <%s> and Failover_mode is set to HARD

Description:

A STOP method has failed or timed out on a resource, and the Failover_mode property of that resource is set to HARD.

Solution:

No action is required. This is normal behavior of the RGM. With Failover_mode set to HARD, the rgmd reboots the node to force the resource group offline so that it can be switched onto another node. Other syslog messages that occurred just before this one might indicate the cause of the STOP method failure.

909069 tcpmodopen: Could not allocate private data

Description:

Machine is out of memory.

909656 Unable to open /dev/kmem:%s

Description:

The HA-NFS fault monitor attempted to open the device but failed. The specific cause of the failure is logged with the message. The /dev/kmem interface is used to read NFS activity counters from the kernel.

Solution:

No action. HA-NFS fault monitor would ignore this error and try to open the device again later. Since it is unable to read NFS activity counters from kernel, HA-NFS would attempt to contact nfsd by means of a NULL RPC. A likely cause of this error is lack of resources. Attempt to free memory by terminating any programs which are using large amounts of memory and swap. If this error persists, reboot the node.

909737 Error loading dtd for %s

Description:

Solution:

910546 Although there are no other potential masters, RGM is failing resource group <%s> off of node <%d> because there are other current masters.

Description:

A scha_control(1HA,3HA) GIVEOVER attempt succeeded, even though no candidate node was available to host the resource group, because the resource group was currently mastered by another node.

Solution:

No action required.

911176 Successfully started BV on %s

Description:

An informational message indicating that the BV servers and daemons on the specified host have started.

Solution:

No action needed.

912352 Could not unplumb some IP addresses.

Description:

Some of the IP addresses managed by the LogicalHostname resource were not successfully brought offline on this node.

Solution:

Use the ifconfig command to make sure that the IP addresses are indeed absent. Check the error messages that precede this one for a more precise reason for this error. Use the scswitch command to move the resource group to a different node. If the problem persists, reboot.
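
The ifconfig check can be scripted along the following lines. This is a sketch under stated assumptions: ip_still_plumbed is a hypothetical helper, it assumes Solaris-style ifconfig output in which each IPv4 address appears as the second field of an "inet" line, and the address in the usage comment is a placeholder.

```shell
# Hypothetical helper: read "ifconfig -a" output on stdin and succeed
# (exit status 0) if the IPv4 address $1 still appears on an "inet" line.
ip_still_plumbed() {
    awk -v addr="$1" '
        $1 == "inet" && $2 == addr { found = 1 }
        END { exit(found ? 0 : 1) }
    '
}

# Example use (address is a placeholder):
#   ifconfig -a | ip_still_plumbed 192.168.10.5 && echo "address still present"
```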

912696 The action to be taken as determined by scds_fm_action is failover. However the application is not being failed over because the failover_enabled extension property is set to false. Restarting the application instead.

Description:

The failover_enabled property is set to false. The probe is trying to restart the application locally instead of failing over.

Solution:

This is an informational message, no user action is needed.

912866 Could not validate CCR tables; halting node

Description:

The rgmd was unable to check the validity of the CCR tables representing Resource Types and Resource Groups. The node will be halted to prevent further errors.

Solution:

Save a copy of the /var/adm/messages files on all nodes. Contact your authorized Sun service provider for assistance in diagnosing the problem.

913654 %s not specified on command line.

Description:

Required property not included in a scrgadm command.

Solution:

Reissue the scrgadm command with the required property and value.

914248 Listener status probe timed out after %s seconds.

Description:

An attempt to query the status of the Oracle listener using command 'lsnrctl status <listener_name>' did not complete in the time indicated, and was abandoned. HA-Oracle will attempt to kill the listener and then restart it.

Solution:

None, HA-Oracle will attempt to restart the listener. However, the cause of the listener hang should be investigated further. Examine the log file and syslog messages for additional information.

914519 Error when sending message to child %m

Description:

Error occurred when communicating with fault monitor child process. Child process will be stopped and restarted.

Solution:

If the error persists, disable the fault monitor and report the problem.

914655 Restarting the resource %s.

Description:

The process monitoring facility tried to send a message to the fault monitor noting that the data service application died. It was unable to do so.

Solution:

Since some part (daemon) of the application has failed, it would be restarted. If fault monitor is not yet started, wait for it to be started by Sun Cluster framework. If fault monitor has been disabled, enable it using scswitch.

914866 Unable to complete some unshare commands.

Description:

HA-NFS postnet_stop method was unable to complete the unshare(1M) command for some of the paths specified in the dfstab file.

Solution:

The exact pathnames which failed to be unshared would have been logged in earlier messages. Run those unshare commands by hand. If problem persists, reboot the node.

915389 Failed to create socket: %s.

Description:

Failure in communication between fault monitor and process monitor facility.

Solution:

This is an internal error. Save the /var/adm/messages file and contact your authorized Sun service provider.

917591 fatal: Resource type <%s> update failed with error <%d>; aborting node

Description:

The rgmd failed to read the updated resource type from the CCR on this node.

Solution:

Save a copy of the /var/adm/messages files on all nodes, and of the rgmd core file. Contact your authorized Sun service provider for assistance in diagnosing the problem.

918018 load balancer for group '%s' released

Description:

This message indicates that the service group has been deleted.

Solution:

This is an informational message, no user action is needed.

918488 Validation failed. Invalid command line parameter %s %s.

Description:

Unable to process parameters passed to the callback method specified. This is a Sun Cluster HA for Sybase internal error.

Solution:

Report this problem to your authorized Sun service provider.

919740 WARNING: error in translating address (%s) for nodeid %d

Description:

Could not get address for a node.

Solution:

Make sure the node is booted as part of a cluster.

919860 scvxvmlg warning - %s does not link to %s, changing it

Description:

The program responsible for maintaining the VxVM device namespace has discovered inconsistencies between the VxVM device namespace on this node and the VxVM configuration information stored in the cluster device configuration system. If configuration changes were made recently, then this message should reflect one of the configuration changes. If no changes were made recently or if this message does not correctly reflect a change that has been made, the VxVM device namespace on this node may be in an inconsistent state. VxVM volumes may be inaccessible from this node.

Solution:

If this message correctly reflects a configuration change to VxVM diskgroups then no action is required. If the change this message reflects is not correct, then the information stored in the device configuration system for each VxVM diskgroup should be examined for correctness. If the information in the device configuration system is accurate, then executing '/usr/cluster/lib/dcs/scvxvmlg' on this node should restore the device namespace. If the information stored in the device configuration system is not accurate, it must be updated by executing '/usr/cluster/bin/scconf -c -D name=diskgroup_name' for each VxVM diskgroup with inconsistent information.

920103 created %d threads to handle resource group switchback; desired number = %d

Description:

The rgmd was unable to create the desired number of threads upon starting up. This is not a fatal error, but it may cause RGM reconfigurations to take longer because it will limit the number of tasks that the rgmd can perform concurrently.

Solution:

Make sure that the hardware configuration meets documented minimum requirements. Examine other syslog messages on the same node to see if the cause of the problem can be determined.

920736 Unknown transport type: %s

Description:

The transport type used is not known to Solaris Clustering.

922085 INTERNAL ERROR CMM: Memory allocation error.

Description:

The CMM failed to allocate memory during initialization.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available.

922330 getservbyname() failed : %s.

Description:

Entries for required services are missing.

Solution:

Verify the sources for the services specified in the /etc/nsswitch.conf file. These should have the entry for the required service.
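
One way to carry out this check is sketched below. Both helpers are hypothetical and take the file to inspect as an argument; they assume the usual nsswitch.conf and /etc/services layouts, match only the primary service name (not aliases), and cover only the "files" source. The service name in the usage comment is a placeholder.

```shell
# Hypothetical helper: print the lookup sources configured for "services"
# in the nsswitch.conf-format file $1, one per line (for example: files, nis).
services_sources() {
    awk '$1 == "services:" { for (i = 2; i <= NF; i++) print $i }' "$1"
}

# Hypothetical helper: succeed if the services-format file $2 has an entry
# whose primary name is $1. Comment lines are ignored.
service_in_file() {
    awk -v name="$1" '
        /^[ \t]*#/ { next }
        $1 == name { found = 1 }
        END { exit(found ? 0 : 1) }
    ' "$2"
}

# Example use (service name is a placeholder):
#   services_sources /etc/nsswitch.conf
#   service_in_file some-service /etc/services || echo "entry missing"
```

If a source other than "files" is listed, the entry must be checked in that name service with its own tools.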

922363 resource %s status msg on node %s change to <%s>

Description:

This is a notification from the rgmd that a resource's fault monitor status message has changed.

Solution:

This is an informational message, no user action is needed.

922870 tag %s: unable to kill process with SIGKILL

Description:

The rpc.fed server is not able to kill the process with a SIGKILL. This means the process is stuck in the kernel.

Solution:

Save the syslog messages file. Examine other syslog messages occurring around the same time on the same node, to see if the cause of the problem can be identified.

923184 CMM: Scrub failed for quorum device %s.

Description:

The scrub operation for the specified quorum device failed, due to which this quorum device will not be added to the cluster.

Solution:

There may be other related messages on this node that may indicate the cause of this problem. Refer to the disk repair section of the administration guide for resolving this problem. After the problem has been resolved, retry adding the quorum device.

923618 Prog <%s>: unknown command.

Description:

An internal error in ucmmd has prevented it from successfully executing a program.

Solution:

Save a copy of /var/adm/messages files on all nodes. Contact your authorized Sun service provider for assistance in diagnosing the problem.

923712 CCR: Table %s on joining node %s has the same version but different checksum as the copy in the current membership. The table on the joining node will be replaced by the one in the current membership.

Description:

The indicated table on the joining node has the same version but different contents as the one in the current membership. It will be replaced by the one in the current membership.

Solution:

This is an informational message, no user action is needed.

925260 Unlock failed: %s.

925379 resource group <%s> in illegal state <%s>, will not run %s on resource <%s>

Description:

While creating or deleting a resource, the rgmd discovered the containing resource group to be in an unexpected state on the local node. As a result, the rgmd did not run the INIT or FINI method (as indicated in the message) on that resource on the local node. This should not occur, and may indicate an internal logic error in the rgmd.

Solution:

The error is non-fatal, but it may prevent the indicated resource from functioning correctly on the local node. Try deleting the resource, and if appropriate, re-creating it. If those actions succeed, then the problem was probably transitory. Since this problem may indicate an internal logic error in the rgmd, please save a copy of the /var/adm/messages files on all nodes, the output of an scstat -g command, and the output of a scrgadm -pvv command. Report the problem to your authorized Sun service provider.

925953 reservation error(%s) - do_scsi3_register() error for disk %s

Description:

The device fencing program has encountered errors while trying to access a device. All retry attempts have failed.

Solution:

For the user action required by this message, see the user action for message 192619.

926201 RGM aborting

Description:

A fatal error has occurred in the rgmd. The rgmd will produce a core file and will force the node to halt or reboot to avoid the possibility of data corruption.

Solution:

Save a copy of the /var/adm/messages files on all nodes, and of the rgmd core file. Contact your authorized Sun service provider for assistance in diagnosing the problem.

926550 scha_cluster_get failed for (%s) with %d

Description:

Call to get cluster information failed. The second part of the message gives the error code.

Solution:

The calling program should handle this error. If it is not recoverable, it will exit.

926749 Resource group nodelist is empty.

Description:

Empty value was specified for the nodelist property of the resource group.

Solution:

Any of the following situations might have occurred. Different user action is required for each scenario. 1) If a new resource is created or updated, check the value of the nodelist property. If it is empty or not valid, provide a valid value using the scrgadm(1M) command. 2) For all other cases, treat it as an internal error. Contact your authorized Sun service provider.

927042 Validation failed. SYBASE backup server startup file RUN_%s not found SYBASE=%s.

Description:

Backup server was specified in the extension property Backup_Server_Name. However, Backup server startup file was not found. Backup server startup file is expected to be: $SYBASE/$SYBASE_ASE/install/RUN_<Backup_Server_Name>

Solution:

Check the Backup server name specified in the Backup_Server_Name property. Verify that the SYBASE and SYBASE_ASE environment variables are set properly in the Environment_file. Verify that the RUN_<Backup_Server_Name> file exists.
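
The file check can be sketched as follows. check_backup_run_file is a hypothetical helper; it builds the path from the pattern given in the description and takes the three values as arguments rather than reading the Environment_file. The values in the usage comment are placeholders.

```shell
# Hypothetical helper: print the expected startup file path and succeed only
# if it exists as a regular file.
#   $1 = value of SYBASE, $2 = value of SYBASE_ASE, $3 = Backup_Server_Name
check_backup_run_file() {
    run_file="$1/$2/install/RUN_$3"
    printf '%s\n' "$run_file"
    [ -f "$run_file" ]
}

# Example use (values are placeholders):
#   check_backup_run_file "$SYBASE" "$SYBASE_ASE" my_backup_server || echo "not found"
```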

927753 Fault monitor does not exist or is not executable

Description:

Fault monitor program specified in support file is not executable or does not exist. Recheck your installation.

Solution:

Please report this problem.

927846 fatal: Received unexpected result <%d> from rpc.fed, aborting node

Description:

A serious error has occurred in the communication between rgmd and rpc.fed while attempting to execute a VALIDATE method. The rgmd will produce a core file and will force the node to halt or reboot to avoid the possibility of data corruption.

Solution:

Save a copy of the /var/adm/messages files on all nodes, and of the rgmd core file. Contact your authorized Sun service provider for assistance in diagnosing the problem.

928235 Validation failed. Adaptive_Server_Log_File %s not found.

Description:

File specified in the Adaptive_Server_Log_File extension property was not found. The Adaptive_Server_Log_File is used by the fault monitor for monitoring the server.

Solution:

Check that the file specified in the Adaptive_Server_Log_File extension property is accessible on all the nodes.

928382 CCR: Failed to read table %s on node %s.

Description:

The CCR failed to read the indicated table on the indicated node. The CCR will attempt to recover this table from other nodes in the cluster.

Solution:

This is an informational message, no user action is needed.

928455 clcomm: Couldn't write to routing socket: %d

Description:

The system prepares IP communications across the private interconnect. A write operation to the routing socket failed.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available.

929100 No permission for others to execute %s.

Description:

The specified path does not have the correct permissions as expected by a program.

Solution:

Set the permissions for the file so that it is readable and executable by others (world).
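
A minimal sketch of that permission change, assuming a POSIX chmod and ls. allow_others_rx is a hypothetical helper name, and the path in the usage comment is a placeholder.

```shell
# Hypothetical helper: grant read and execute permission to others on the
# file named by $1, then print its mode string for confirmation.
allow_others_rx() {
    chmod o+rx "$1" && ls -l "$1" | cut -c1-10
}

# Example use (path is a placeholder):
#   allow_others_rx /path/to/program
```

The last three characters of the printed mode string should read r-x (or rwx if others could already write).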

929252 Failed to start HA-NFS system fault monitor.

Description:

Process monitor facility has failed to start the HA-NFS system fault monitor.

Solution:

Check whether the system is low in memory or the process table is full and correct these problems. If the error persists, use scswitch to switch the resource group to another node.

929712 Share path %s: file system %s is not global.

Description:

The specified share path exists on a file system which is neither mounted via /etc/vfstab nor is a global file system.

Solution:

Share paths in HA-NFS must reside on a file system that is either mounted via /etc/vfstab or is a global file system.

930059 %s: %s.

Description:

HA-SAP failed to access a file. The file in question is specified with the first '%s'. The reason it failed is provided with the second '%s'.

Solution:

Check and make sure the file is accessible via the path list.

930851 ERROR: process_resource: resource <%s> is pending_fini but no FINI method is registered

Description:

A non-fatal internal error has occurred in the rgmd state machine.

Solution:

Since this problem might indicate an internal logic error in the rgmd, please save a copy of the /var/adm/messages files on all nodes, the output of an scstat -g command, and the output of a scrgadm -pvv command. Report the problem to your authorized Sun service provider.

931677 Could not reset SCSI buses on CMM reconfiguration. Could not find clexecd in nameserver.

Description:

An error occurred when the SC 3.0 software was in the process of resetting SCSI buses with shared nodes that are down.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available.

932126 CCR: Quorum regained.

Description:

The cluster lost quorum at some time after the cluster came up, and quorum has now been regained.

Solution:

This is an informational message, no user action is needed.

933073 INTERNAL ERROR: unable to obtain static node membership of the cluster; continuing

Description:

This is a non-fatal internal error. The rgmd is unable to get the static node membership. A complete set of all possible nodes will be used instead. This may cause some spurious state transitions for non-existent nodes to be syslogged on a new president or when a Resource or a Resource Group is created.

Solution:

934249 Invalid value for property %s: %d.

Description:

An invalid value may have been specified for the property.

Solution:

Please set correct value for the property and retry the operation.

935576 $HOSTNAME is not configured for BV processes.

Description:

The specified host is not configured for BV processes.

Solution:

Configure BV processes to run on this host, or create the resource group properly with the right network resource and BV resource in the resource group.

936306 svc_setschedprio: Could not setup RT (real time) scheduling parameters: %s

Description:

The server was not able to set the scheduling mode parameters, and the system error is shown. An error message is output to syslog.

Solution:

Save the /var/adm/messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.

937669 CCR: Failed to update table %s.

Description:

The CCR data server failed to update the indicated table.

Solution:

There may be other related messages on this node, which may help diagnose the problem. If the root file system is full on the node, then free up some space by removing unnecessary files. If the root disk on the afflicted node has failed, then it needs to be replaced.

938163 %s %s server startup encountered errors, errno = %d.

Description:

TCP server to accept connections on the private interconnect could not be started.

938189 rpc.fed: program not registered or server not started

Description:

The rpc.fed daemon was not correctly initialized, or has died. This has caused a step invocation failure, and may cause the node to reboot.

Solution:

Check that the Solaris Clustering software has been installed correctly. Use ps(1) to check if rpc.fed is running. If the installation is correct, the reboot should restart all the required daemons including rpc.fed.

938618 Couldn't create deleted subdir %s: error (%d)

Description:

While mounting this file system, PXFS was unable to create some directories that it reserves for internal use.

Solution:

If the error is 28 (ENOSPC), mount this file system non-globally, free some space, and then mount it globally again. If there is some other error, and you are unable to correct it, contact your authorized Sun service provider to determine whether a workaround or patch is available.

938836 invalid value for parameter '%sfailfast': "%s". Using default value of 'panic'

Description:

/opt/SUNWudlm/etc/udlm.conf did not have a valid entry for failfast mode. Default mode of 'panic' will be used.

Solution:

None.

939374 CCR: Failed to access cluster repository during synchronization. ABORT node.

Description:

This node failed to access its cluster repository when it first came up in cluster mode and tried to synchronize its repository with other nodes in the cluster.

Solution:

This is usually caused by an unrecoverable failure such as disk failure. There may be other related messages on this node, which may help diagnose the problem. If the root disk on the afflicted node has failed, then it needs to be replaced. If the root disk is full on this node, boot the node into non-cluster mode and free up some space by removing unnecessary files.

940685 Configuration file %s missing for NetBackup.

Description:

The configuration file for NetBackup is missing or does not have correct permissions.

Solution:

Check whether the NetBackup configuration file bp.conf, or a link to it exists under /usr/openv/netbackup, and that the file has correct permissions.
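
The existence check can be sketched as below. check_bp_conf is a hypothetical helper; because the text does not say which permissions count as correct, the sketch only tests that the file is readable by the current user, which is an assumption.

```shell
# Hypothetical helper: check that bp.conf (or a link that resolves to it)
# exists in the directory $1 and is readable by the current user.
check_bp_conf() {
    conf="$1/bp.conf"
    if [ ! -e "$conf" ]; then
        echo "missing: $conf"
        return 1
    fi
    if [ ! -r "$conf" ]; then
        echo "not readable: $conf"
        return 1
    fi
    echo "ok: $conf"
}

# Example use:
#   check_bp_conf /usr/openv/netbackup
```

A dangling symbolic link is reported as missing, because the -e test follows the link.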

941267 Cannot determine command passed in: <%s>.

Description:

An invalid pathname, displayed within the angle brackets, was passed to a libdsdev routine such as scds_timerun or scds_pmf_start. This could be the result of mis-configuring the name of a START or MONITOR_START method or other property, or a programming error made by the resource type developer.

Solution:

Supply a valid pathname to a regular, executable file.

941367 open failed: %s

Description:

Failed to open /dev/console. The "open" man page describes possible error codes.

Solution:

None. ucmmd will exit.

941416 One or more of the SUNW.HAStoragePlus resources that this resource depends on is not online anywhere.

Description:

It is an invalid configuration to create an application resource that depends on one or more SUNW.HAStoragePlus resource(s) that are not online on any node.

Solution:

Bring the SUNW.HAStoragePlus resource(s) online before creating the application resource that depends on them, and then try the command again.

941693 "%s" Failed to stay up.

Description:

The tag shown, being run by the rpc.pmfd server, has exited. Either the user has decided to stop monitoring this process, or the process exceeded the number of retries. An error message is output to syslog.

Solution:

This message is informational; no user action is needed.

943168 pmf_monitor_suspend: pmf_add_triggers: %s

Description:

The rpc.pmfd server was not able to resume the monitoring of a process, and the monitoring of this process has been aborted. An error message is output to syslog.

Solution:

Save the syslog messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.

944068 clcomm: validate_policy: invalid relationship moderate %d high %d pool %d

Description:

The system checks the proposed flow control policy parameters at system startup and when processing a change request. The moderate server thread level cannot be higher than the high server thread level.

Solution:

No user action required.

944121 Incorrect permissions set for %s

Description:

This file does not have the expected default execute permissions.

Solution:

Reset the permissions to allow execute permissions using the chmod command.

945325 Livecache instance name is not defined in script lccluster.

Description:

The livecache instance name (in upper case) is not defined in the script 'lccluster'.

Solution:

Make sure livecache instance name is defined in script lccluster. See the instructions in script file lccluster for details.

946660 Failed to create sap state file %s:%s Might put sap resource in stop-failed state.

Description:

If SAP is brought up outside the control of Sun Cluster, HA-SAP will create the state file to signal the stop method not to try to stop sap via Sun Cluster. Now if SAP was brought up outside of Sun Cluster, and the state file creation failed, then the SAP resource might end in the stop-failed state when Sun Cluster tries to stop SAP.

Solution:

This is an internal error. No user action needed. Save the /var/adm/messages from all nodes. Contact your authorized Sun service provider.

946873 The Host $i is not yet up.

Description:

The host specified is not running.

Solution:

Bring the resource group containing the specified host online if it is not yet running. If the resource group is already online, the probe will take appropriate action.

947007 Error initializing the cluster version manager (error %d).

Description:

This message can occur when the system is booting if incompatible versions of cluster software are installed.

Solution:

Verify that any recent software installations completed without errors and that the installed packages or patches are compatible with the rest of the installed software. Also, contact your authorized Sun service provider to determine whether a workaround or patch is available.

947401 reservation error(%s) - Unable to open device %s, errno %d

Description:

The device fencing program has encountered errors while trying to access a device. All retry attempts have failed.

Solution:

This may be indicative of a hardware problem, which should be resolved as soon as possible. Once the problem has been resolved, the following actions may be necessary: If the message specifies the 'node_join' transition, then this node may be unable to access the specified device. If the failure occurred during the 'release_shared_scsi2' transition, then a node which was joining the cluster may be unable to access the device. In either case, access can be reacquired by executing '/usr/cluster/lib/sc/run_reserve -c node_join' on all cluster nodes. If the failure occurred during the 'make_primary' transition, then a device group may have failed to start on this node. If the device group was started on another node, it may be moved to this node with the scswitch command. If the device group was not started, it may be started with the scswitch command. If the failure occurred during the 'primary_to_secondary' transition, then the shutdown or switchover of a device group may have failed. If so, the desired action may be retried.
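
The transition-specific guidance above can be summarized as a lookup table. This is only an illustrative sketch in Python: the names RECOVERY_ACTIONS and suggested_action are hypothetical, and the action strings paraphrase the solution text rather than any output of the cluster software.

```python
# Command quoted in the solution text for the two join-related transitions.
RUN_RESERVE = "/usr/cluster/lib/sc/run_reserve -c node_join"

# Hypothetical table: transition name -> follow-up action once the
# underlying hardware problem has been resolved.
RECOVERY_ACTIONS = {
    "node_join": "Run '" + RUN_RESERVE + "' on all cluster nodes.",
    "release_shared_scsi2": "Run '" + RUN_RESERVE + "' on all cluster nodes.",
    "make_primary": "Use scswitch to move the device group to this node, or to start it.",
    "primary_to_secondary": "Retry the shutdown or switchover of the device group.",
}


def suggested_action(transition):
    """Return the follow-up action for a transition named in the message."""
    return RECOVERY_ACTIONS.get(
        transition, "No transition-specific action is listed."
    )
```

For example, suggested_action("make_primary") returns the scswitch guidance, while an unrecognized transition name yields the fallback text.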

948847 ucm_callback for start_trans generated exception %d

Description:

The ucmm callback for the start transition failed. The step may have timed out.

Solution:

Save a copy of the /var/adm/messages files on all nodes. Contact your authorized Sun service provider for assistance in diagnosing the problem.

949148 "%s" requeued

Description:

The tag shown has exited and was restarted by the rpc.pmfd server. An error message is output to syslog.

Solution:

This message is informational; no user action is needed.

949565 reservation error(%s) - do_scsi2_tkown() error for disk %s

Description:

The device fencing program has encountered errors while trying to access a device. All retry attempts have failed.

Solution:

The action which failed is a scsi-2 ioctl. These can fail if there are scsi-3 keys on the disk. To remove invalid scsi-3 keys from a device, use 'scdidadm -R' to repair the disk (see scdidadm man page for details). If there were no scsi-3 keys present on the device, then this error is indicative of a hardware problem, which should be resolved as soon as possible. Once the problem has been resolved, the following actions may be necessary: If the message specifies the 'node_join' transition, then this node may be unable to access the specified device. If the failure occurred during the 'release_shared_scsi2' transition, then a node which was joining the cluster may be unable to access the device. In either case, access can be reacquired by executing '/usr/cluster/lib/sc/run_reserve -c node_join' on all cluster nodes. If the failure occurred during the 'make_primary' transition, then a device group may have failed to start on this node. If the device group was started on another node, it may be moved to this node with the scswitch command. If the device group was not started, it may be started with the scswitch command. If the failure occurred during the 'primary_to_secondary' transition, then the shutdown or switchover of a device group may have failed. If so, the desired action may be retried.

949937 Out of memory.

Description:

A process has failed to allocate new memory, most likely because the system has run out of swap space.

Solution:

The problem will probably be cured by rebooting. If the problem reoccurs, you might need to increase swap space by configuring additional swap devices. See swap(1M) for more information.

949937 Out of memory.

Description:

The data service has failed to allocate memory, most likely because the system has run out of swap space.

Solution:

The problem will probably be cured by rebooting. If the problem reoccurs, you might need to increase swap space by configuring additional swap devices. See swap(1M) for more information.

950747 resource %s monitor disabled.

Description:

This is a notification from the rgmd that the operator has disabled monitoring on a resource. This may be used by system monitoring tools.

Solution:

This is an informational message, no user action is needed.

950760 reservation fatal error(%s) - get_resv_lock() error

Description:

The device fencing program has suffered an internal error.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available. Copies of /var/adm/messages from all nodes should be provided for diagnosis. It may be possible to retry the failed operation, depending on the nature of the error. If the message specifies the 'node_join' transition, then this node may be unable to access shared devices. If the failure occurred during the 'release_shared_scsi2' transition, then a node which was joining the cluster may be unable to access shared devices. In either case, it may be possible to reacquire access to shared devices by executing '/usr/cluster/lib/sc/run_reserve -c node_join' on all cluster nodes. If the failure occurred during the 'make_primary' transition, then a device group has failed to start on this node. If another node was available to host the device group, then it should have been started on that node. If desired, it may be possible to switch the device group to this node with the scswitch command. If no other node was available, then the device group will not have been started. The scswitch command may be used to retry the attempt to start the device group. If the failure occurred during the 'primary_to_secondary' transition, then the shutdown or switchover of a device group has failed. The desired action may be retried.

951501 CCR: Could not initialize CCR data server.

Description:

The CCR data server could not initialize on this node. This usually happens when the CCR is unable to read its metadata entries on this node. There is a CCR data server per cluster node.

Solution:

There may be other related messages on this node, which may help diagnose this problem. If the root disk failed, it needs to be replaced. If there was cluster repository corruption, then the cluster repository needs to be restored from backup or other nodes in the cluster. Boot the offending node in -x mode to restore the repository. The cluster repository is located at /etc/cluster/ccr/.

951520 Validation failed. SYBASE ASE runserver file RUN_%s not foundSYBASE=%s.

Description:

Sybase Adaptive Server is started by specifying the Adaptive Server 'runserver' file named RUN_<Server Name> located under $SYBASE/$SYBASE_ASE/install. This file is missing.

Solution:

Verify that the Sybase installation includes the 'runserver' file and that permissions are set correctly on the file. The file should reside in the $SYBASE/$SYBASE_ASE/install directory.

951634 INTERNAL ERROR CMM: clconf_get_quorum_table() returned error %d.

Description:

The node encountered an internal error during initialization of the quorum subsystem object.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available.

951733 Incorrect usage: %s.

Description:

The usage of the program was incorrect for the reason given.

Solution:

Use the correct syntax for the program.

952006 received signal %d: exiting

Description:

Solution:

952237 Method <%s>: unknown command.

Description:

An internal error has occurred in the interface between the rgmd and fed daemons. This in turn will cause a method invocation to fail. This should not occur and may indicate an internal logic error in the rgmd.

Solution:

Look for other syslog error messages on the same node. Save a copy of the /var/adm/messages files on all nodes, and report the problem to your authorized Sun service provider.

952465 HA: exception adding secondary

Description:

A failure occurred while attempting to add a secondary provider for an HA service.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available.

952857 Confdir_list is not defined correctly in script lccluster.

Description:

Confdir_list path is not defined correctly in the script 'lccluster'.

Solution:

Make sure the path for Confdir_list is defined in the script lccluster using parameter 'CONFDIR_LIST'. The value should be defined inside the double quotes, and it is the same as what is defined for extension property 'Confdir_list'.

953642 Server is not running. Calling shutdown abort to clear shared memory (if any)

Description:

Informational message. Oracle server is not running. However if Oracle processes are aborted without clearing shared memory, it can cause problems when starting Oracle server. Clearing leftover shared memory if any.

Solution:

None

954497 clcomm: Unable to find %s in name server

Description:

The specified entity is unknown to the name server.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available.

955930 Attempt to connect from addr %s port %d

Description:

There was a connection from the named IP address and port number (> 1024), which means that a non-privileged process is trying to talk to the PNM daemon.

Solution:

This message is informational; no user action is needed. However, it would be a good idea to determine which non-privileged process is trying to talk to the PNM daemon and why.

956501 Issuing a failover request.

Description:

This message indicates that the function is about to make a failover request to the RGM. If the request fails, refer to the syslog messages that appear after this message.

Solution:

This is an informational message, no user action is required.

957086 Prog <%s> failed to execute step <%s> - error=<%d>

Description:

ucmmd failed to execute a step.

Solution:

Examine other syslog messages occurring at about the same time to see if the problem can be identified and if it recurs. Save a copy of the /var/adm/messages files on all nodes and contact your authorized Sun service provider for assistance.

957535 Smooth_shutdown flag is not set to TRUE. The WebLogic Server will be shutdown using sigkill.

Description:

This is an informational message. The Smooth_shutdown flag is not set to TRUE, so the WebLogic Server (WLS) will be stopped using SIGKILL.

Solution:

None. To enable smooth shutdown, set the Smooth_shutdown extension property to TRUE. Smooth shutdown also requires that the username and password that are passed to the "java weblogic.Admin .." command be set in the start script. Refer to your Admin guide for details.

958832 INTERNAL ERROR: monitoring is enabled, but MONITOR_STOP method is not registered for resource <%s>

Description:

A non-fatal internal error has occurred in the rgmd state machine.

Solution:

958888 clcomm: Failed to allocate simple xdoor client %d

Description:

The system could not allocate a simple xdoor client. This can happen when the xdoor number is already in use. This message is only possible on debug systems.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available.

959384 Possible syntax error in hosts entry in %s.

Description:

The validation callback method has failed to validate the hostname list. There may be a syntax error in the nsswitch.conf file.

Solution:

Check for the following syntax rules in the nsswitch.conf file. 1) Check if the lookup order for "hosts" has "files". 2) "cluster" is the only entry that can come before "files". 3) Everything in between '[' and ']' is ignored. 4) It is illegal to have any leading whitespace character at the beginning of the line; these lines are skipped. Correct the syntax in the nsswitch.conf file and try again.
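
The four rules above can be checked mechanically. The following is a minimal sketch in Python, not the validation code that the cluster software runs. The function name check_hosts_entry is hypothetical, and the handling of '#' comments is an assumption based on the usual nsswitch.conf format.

```python
import re


def check_hosts_entry(nsswitch_text):
    """Return a list of rule violations for the "hosts" lookup order."""
    for line in nsswitch_text.splitlines():
        # Rule 4: lines with leading whitespace are skipped.
        if not line or line[0].isspace():
            continue
        # Assumption: '#' starts a comment, as in standard nsswitch.conf files.
        line = line.split("#", 1)[0]
        if not line.startswith("hosts:"):
            continue
        # Rule 3: everything between '[' and ']' is ignored.
        entry = re.sub(r"\[[^\]]*\]", " ", line[len("hosts:"):])
        sources = entry.split()
        # Rule 1: the lookup order must include "files".
        if "files" not in sources:
            return ['"files" is missing from the hosts lookup order']
        # Rule 2: only "cluster" may come before "files".
        before = sources[:sources.index("files")]
        bad = [s for s in before if s != "cluster"]
        if bad:
            return ['only "cluster" may precede "files"; found: ' + ", ".join(bad)]
        return []
    return ['no "hosts" entry found']
```

An empty list means that the entry satisfies the rules as stated here. A line such as "hosts: dns files" is reported because "dns" precedes "files".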

959610 Property %s should have only one value.

Description:

A multi-valued (comma-separated) list was provided to the scrgadm command for the property, while the implementation supports only one value for this property.

Solution:

Specify a single value for the property on the scrgadm command.

960308 clcomm: Pathend %p: remove_path called twice

Description:

The system maintains state information about a path. The remove_path operation is not allowed in this state.

Solution:

No user action is required.

960344 ERROR: process_resource: resource <%s> is pending_init but no INIT method is registered

Description:

A non-fatal internal error has occurred in the rgmd state machine.

Solution:

960862 (%s) sigaction failed: %s (UNIX errno %d)

Description:

The udlm has failed to initialize signal handlers by a call to sigaction(2). The error message indicates the reason for the failure.

Solution:

Save a copy of the /var/adm/messages files on all nodes, and of the rgmd core file. Contact your authorized Sun service provider for assistance in diagnosing the problem.

960932 Switchover (%s) error: failed to fsck disk

Description:

The file system specified in the message could not be hosted on the node the message came from because an fsck on the file system revealed errors.

Solution:

Unmount the PXFS file system (if mounted), fsck the device, and then mount the PXFS file system again.

961551 Signal %d terminated the scalable service configuration process.

Description:

An unexpected signal caused the termination of the program that configures the networking components for a scalable resource. This premature termination will cause the scalable service configuration to be aborted for this resource.

Solution:

Save a copy of the /var/adm/messages files on all nodes. If a core file was generated, submit the core to your service provider. Contact your authorized Sun service provider for assistance in diagnosing the problem.

961768 Failed to unshare %s.

962746 Usage: %s [-c|-u] -R <resource-name> -T <type-name> -G <group-name> [-r sys_def_prop=values ...] [-x ext_prop=values ...].

Description:

Incorrect arguments are passed to the callback methods.

Solution:

This is an internal error. Contact your authorized Sun service provider for assistance in diagnosing and correcting the problem.

963465 fatal: rpc_control() failed to set automatic MT mode; aborting node

Description:

The rgmd failed in a call to rpc_control(3N). This error should never occur. If it did, it would cause the failure of subsequent invocations of scha_cmds(1HA) and scha_calls(3HA). This would most likely lead to resource method failures and prevent RGM reconfigurations from occurring. The rgmd will produce a core file and will force the node to halt or reboot.

Solution:

Examine other syslog messages occurring at about the same time to see if the source of the problem can be identified. Save a copy of the /var/adm/messages files on all nodes and contact your authorized Sun service provider for assistance in diagnosing the problem. Reboot the node to restart the clustering daemons.

963755 lkcm_cfg: caller is not registered

Description:

udlm is not registered with ucmm.

Solution:

This is an internal error. Save the contents of /var/adm/messages, /var/cluster/ucmm/ucmm_reconf.log and /var/cluster/ucmm/dlm*/*logs/* from all the nodes and contact your Sun service representative.

964072 Unable to resolve %s.

Description:

The data service has failed to resolve the host information.

Solution:

If the logical host and shared address entries are specified in the /etc/inet/hosts file, check that these entries are correct. If this is not the reason, then check the health of the name server. For more error information, check the syslog messages.

964083 t_open (open_cmd_port) failed

Description:

Call to t_open() failed. The "t_open" man page describes possible error codes. ucmmd will exit and the node will abort.

Solution:

Save the /var/adm/messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.

964399 udlm seq no (%d) does not match library's (%d).

Description:

Mismatch in sequence numbers between udlm and the library code is causing an abort.

Solution:

This is an internal error. Save the contents of /var/adm/messages, /var/cluster/ucmm/ucmm_reconf.log and /var/cluster/ucmm/dlm*/*logs/* from all the nodes and contact your Sun service representative.

964521 Failed to retrieve the resource handle: %s.

Description:

An API operation on the resource has failed.

Solution:

For the resource name, check the syslog tag. For more details, check the syslog messages from other components. If the error persists, reboot the node.

965873 CMM: Node %s (nodeid = %d) with votecount = %d added.

Description:

The specified node with the specified votecount has been added to the cluster.

Solution:

This is an informational message, no user action is needed.

965935 Node %d: weight %d

966245 %d entries found in property %s. For a secure Netscape Directory Server instance %s should have one or two entries.

Description:

Since a secure Netscape Directory Server instance can listen on only one or two ports, the list property should have either one or two entries. A different number of entries was found.

Solution:

Change the number of entries to be either one or two.

966416 This list element in System property %s has an invalid protocol: %s.

Description:

The system property that was named does not have a valid protocol.

Solution:

Change the value of the property to use a valid protocol.

966670 did discovered faulty path, ignoring: %s

Description:

scdidadm has discovered a suspect logical path under /dev/rdsk. It will not add it to subpaths for a given instance.

Solution:

Check to see that the symbolic links under /dev/rdsk are correct.

966842 in libsecurity unknown security flag %d

Description:

This is an internal error which shouldn't happen. An error message is output to syslog.

Solution:

Save the /var/adm/messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.

967050 Validation failed. Listener binaries not found ORACLE_HOME=%s

Description:

Oracle listener binaries not found under ORACLE_HOME. ORACLE_HOME specified for the resource is indicated in the message. HA-Oracle will not be able to manage Oracle listener if ORACLE_HOME is incorrect.

Solution:

Specify the correct ORACLE_HOME when creating the resource. If the resource has already been created, update the resource property 'ORACLE_HOME'.

967080 This resource depends on a HAStoragePlus resouce that is not online on this node. Ignoring validation errors.

Description:

The resource depends on a HAStoragePlus resource. Some of the files required for validation checks are not accessible from this node, because the HAStoragePlus resource is not online on this node. Validations will be performed on the node that has the HAStoragePlus resource online. Validation errors are being ignored on this node by this callback method.

Solution:

Check the validation errors logged in the syslog messages. Please verify that these errors are not configuration errors.

967372 Directory %s is not readable.

967970 Modification of resource <%s> failed because none of the nodes on which VALIDATE would have run are currently up

Description:

In order to change the properties of a resource whose type has a registered VALIDATE method, the rgmd must be able to run VALIDATE on at least one node. However, all of the candidate nodes are down. "Candidate nodes" are either members of the resource group's Nodelist or members of the resource type's Installed_nodes list, depending on the setting of the resource's Init_nodes property.

Solution:

Boot one of the resource group's potential masters and retry the resource change operation.

968426 arguments to bv_utils are $*

Description:

Just a debug message.

Solution:

No action needed.

968557 Could not unplumb any ip addresses.

Description:

Failed to unplumb any IP addresses. The resource cannot be brought offline. The node will be rebooted by Sun Cluster.

Solution:

Check the syslog messages from other components for a possible root cause. Save a copy of /var/adm/messages and contact your authorized Sun service provider for assistance in diagnosing and correcting the problem.

968853 scha_resource_get error (%d) when reading system property %s

Description:

Error occurred in API call scha_resource_get.

Solution:

Check syslog messages for errors logged from other system modules. Stop and start fault monitor. If error persists then disable fault monitor and report the problem.

969008 t_alloc (open_cmd_port-T_ADDR) %d

Description:

Call to t_alloc() failed. The "t_alloc" man page describes possible error codes. ucmmd will exit and the node will abort.

Solution:

Save the /var/adm/messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.

969827 Failover attempt has failed.

Description:

The failover attempt of the resource is rejected or encountered an error.

Solution:

For a more detailed error message, check the syslog messages. Check whether the Pingpong_interval has an appropriate value. If not, adjust it using scrgadm(1M). Otherwise, use scswitch to switch the resource group to a healthy node.

970018 Probe for %s returned error.

Description:

Probe for the specified service returned error.

970912 execve: %s

Description:

The rpc.pmfd server was not able to exec a new process, possibly due to bad arguments. The message contains the system error. The server does not perform the action requested by the client, and an error message is output to syslog.

Solution:

Verify that the file path to be executed exists. If all looks correct, save the /var/adm/messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.

971233 Property %s is not set.

Description:

The property has not been set by the user and must be.

Solution:

Reissue the scrgadm command with the required property and value.

971412 Error in getting global service name for path <%s>

Description:

The path cannot be mapped to a valid service name.

Solution:

Check the path passed into extension property "ServicePaths" of SUNW.HAStorage type resource.

972580 CCR: Highest epoch is < 0, highest_epoch = %d.

Description:

The epoch indicates the number of times a cluster has come up. It should not be less than 0. It could happen due to corruption in the cluster repository.

Solution:

Boot the cluster in -x mode to restore the cluster repository on all the members of the cluster from backup. The cluster repository is located at /etc/cluster/ccr/.

972610 fork: %s

Description:

The rgmd, rpc.pmfd or rpc.fed daemon was not able to fork a process, possibly due to low swap space. The message contains the system error. This can happen while the daemon is starting up (during the node boot process), or when executing a client call. If it happens when starting up, the daemon does not come up. If it happens during a client call, the server does not perform the action requested by the client.

Solution:

Investigate if the machine is running out of swap space. If this is not the case, save the /var/adm/messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.

972716 Failed to stop the application with SIGKILL. Returning with failure from stop method.

Description:

The stop method failed to stop the application with SIGKILL.

Solution:

Use pmfadm(1M) with the -L option to retrieve all the tags that are running on the server. Identify the tag name for the application in this resource. This can be easily identified as the tag ends in the string ".svc" and contains the resource group name and the resource name. Then use pmfadm(1M) with the -s option to stop the application. If the error still persists, then reboot the node.
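
The tag selection described above can be sketched as a simple filter. This Python sketch takes a list of tag names that has already been obtained (for example, from pmfadm -L) and does not parse command output. The function name find_service_tags is hypothetical.

```python
def find_service_tags(tags, resource_group, resource):
    """Select tags that end in ".svc" and contain both names.

    `tags` is a list of tag strings already obtained from the process
    monitor; how they are collected is outside the scope of this sketch.
    """
    return [
        tag
        for tag in tags
        if tag.endswith(".svc") and resource_group in tag and resource in tag
    ]
```

If the filter returns exactly one tag, that is the candidate to pass to pmfadm(1M) with the -s option.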

972908 Unable to get the name of the local cluster node: %s.

Description:

An internal error occurred while attempting to obtain the local cluster nodename.

Solution:

Save a copy of the /var/adm/messages files on all nodes. Contact your authorized Sun service provider for assistance in diagnosing the problem.

973615 Node %s: weight %d

Description:

The load balancer set the specified weight for the specified node.

Solution:

This is an informational message, no user action is needed.

973933 resource %s added.

Description:

This is a notification from the rgmd that the operator has created a new resource. This may be used by system monitoring tools.

Solution:

This is an informational message, no user action is needed.

974106 lkcm_parm: caller is not registered

Description:

udlm is not registered with ucmm.

Solution:

This is an internal error. Save the contents of /var/adm/messages, /var/cluster/ucmm/ucmm_reconf.log and /var/cluster/ucmm/dlm*/*logs/* from all the nodes and contact your Sun service representative.

974129 Cannot stat %s: %s.

Description:

The stat(2) system call failed on the specified pathname, which was passed to a libdsdev routine such as scds_timerun or scds_pmf_start. The reason for the failure is stated in the message. The error could be the result of 1) mis-configuring the name of a START or MONITOR_START method or other property, 2) a programming error made by the resource type developer, or 3) a problem with the specified pathname in the file system itself.

Solution:

Ensure that the pathname refers to a regular, executable file.
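
A quick way to confirm both conditions is to stat the path and test for execute access. The following Python sketch is illustrative only. The function name is_regular_executable is hypothetical, and the check runs with the permissions of the calling user, which may differ from the user that runs the method.

```python
import os
import stat


def is_regular_executable(path):
    """Return True if `path` is a regular file that the caller may execute."""
    try:
        file_status = os.stat(path)  # follows symbolic links, like stat(2)
    except OSError:
        # stat failed: the path is missing or cannot be accessed.
        return False
    return stat.S_ISREG(file_status.st_mode) and os.access(path, os.X_OK)
```

A directory or a nonexistent path returns False, which corresponds to the third cause listed in the description (a problem with the pathname itself).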

974664 HA: no valid secondary provider in rmm - aborting

Description:

This node joined an existing cluster. Then all of the other nodes in the cluster died before the HA framework components on this node could be properly initialized.

Solution:

This node must be rebooted.

976495 fork failed: %s

Description:

The call to fork() failed. The "fork" man page describes possible error codes.

Solution:

Some system resource has been exceeded. Install more memory, increase swap space or reduce peak memory consumption.

976914 fctl: %s

Description:

Solution:

977371 Backup server terminated.

Description:

Graceful shutdown did not succeed. Backup server processes were killed in STOP method. It is likely that adaptive server terminated prior to shutdown of backup server.

Solution:

Please check the permissions of file specified in the STOP_FILE extension property. File should be executable by the Sybase owner and root user.

978125 in libsecurity setnetconfig failed when initializing the server: %s - %s

Description:

A server (rpc.pmfd, rpc.fed or rgmd) was not able to start because it could not establish a rpc connection for the network specified. An error message is output to syslog.

Solution:

Save the /var/adm/messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.

978534 %s: lookup failed.

Description:

Could not get the hostname for a node. This could also be because the node is not booted as part of a cluster.

Solution:

Make sure the node is booted as part of a cluster.

978829 t_bind, did not bind to desired addr

Description:

Call to t_bind() failed. The "t_bind" man page describes possible error codes. udlm will exit and the node will abort.

Solution:

This is an internal error. Save the contents of /var/adm/messages, /var/cluster/ucmm/ucmm_reconf.log and /var/cluster/ucmm/dlm*/*logs/* from all the nodes and contact your Sun service representative.

979343 Error: duplicate prog <%s> launched step <%s>

Description:

Due to an internal error, ucmmd has attempted to launch the same step by duplicate programs. ucmmd will reject the second program and treat it as a step failure.

Solution:

Save a copy of the /var/adm/messages files on all nodes. Contact your authorized Sun service provider for assistance in diagnosing the problem.

979803 CMM: Node being shut down.

Description:

This node is being shut down.

Solution:

This is an informational message, no user action is needed.

980307 reservation fatal error(%s) - Illegal command

Description:

The device fencing program has suffered an internal error.

Solution:

980425 Aborting startup: could not determine whether failover of NFS resource groups is in progress.

Description:

Startup of an NFS resource was aborted because it was not possible to determine if failover of any NFS resource groups is in progress.

Solution:

Save a copy of the /var/adm/messages files on all nodes. Contact your authorized Sun service provider for assistance in diagnosing the problem.

980477 LogicalHostname online.

Description:

The status of the logicalhost resource is online.

Solution:

This is an informational message. No user action required.

980681 clconf: CSR removal failed

Description:

While executing task in clconf and modifying the state of proxy, received failure from trying to remove CSR.

Solution:

This is an informational message. No user action required.

980942 CMM: Cluster doesn't have operational quorum yet; waiting for quorum.

Description:

Not enough nodes are operational to obtain a majority quorum; the cluster is waiting for more nodes before starting.

Solution:

If nodes are booting, wait for them to finish booting and join the cluster. Boot nodes that are down.
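
The idea of a majority quorum can be illustrated with a short calculation. This Python sketch assumes that quorum means strictly more than half of the configured votes are present. The function name has_majority_quorum is hypothetical, and how votes are assigned and counted is not described in this entry.

```python
def has_majority_quorum(present_votes, configured_votes):
    """Return True if the present votes form a strict majority.

    Assumption: a majority means more than half of all configured votes.
    """
    return 2 * present_votes > configured_votes
```

With four configured votes, two present votes are not a majority under this assumption, but three are.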

981739 CCR: Updating invalid table %s.

Description:

This joining node carries a valid copy of the indicated table with the override flag set, while the current cluster membership doesn't have a valid copy of this table. This node will propagate its copy of the indicated table to the other nodes in the cluster.

Solution:

This is an informational message, no user action is needed.

981931 INTERNAL ERROR: postpone_start_r: meth type <%d>

Description:

A non-fatal internal error has occurred in the rgmd state machine.

Solution:

984704 reset_rg_state: unable to change state of resource group <%s> on node <%d>; assuming that node died

Description:

The rgmd was unable to reset the state of the specified resource group to offline on the specified node, presumably because the node died.

Solution:

Examine syslog output on the specified node to determine the cause of node death. The syslog output might indicate further remedial actions.

985111 lkcm_reg: illegal %s value

Description:

Cluster information that is being used during udlm registration with ucmm is incorrect.

Solution:

This is an internal error. Save the contents of /var/adm/messages, /var/cluster/ucmm/ucmm_reconf.log and /var/cluster/ucmm/dlm*/*logs/* from all the nodes and contact your Sun service representative.

985417 %s: Invalid arguments, restarting service.

Description:

The PMF action script supplied by the DSDL while launching the process tree was called with invalid arguments.

Solution:

This is an internal error. Contact your authorized Sun service provider for assistance in diagnosing and correcting the problem.

986190 Entry at position %d in property %s with value %s is not a valid node identifier or node name.

Description:

The value given for the named property has an invalid node specified for it. The position index, which starts at 0 for the first element in the list, indicates which element in the property list was invalid.

Solution:

Specify a valid node for the property.
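
The zero-based position reported in this message can be illustrated with a small check. This is only a sketch: the function `find_invalid_entries`, the node names, and the node identifiers are hypothetical, and the real validation is performed by the cluster software.

```python
def find_invalid_entries(values, known_nodes):
    """Return (position, value) pairs for entries that are not known nodes.

    Positions are zero-based, matching the index reported in the message.
    `known_nodes` maps node names to node identifiers (hypothetical data).
    """
    valid = set(known_nodes) | {str(node_id) for node_id in known_nodes.values()}
    return [(pos, val) for pos, val in enumerate(values) if val not in valid]


# Hypothetical two-node cluster; the second list element is invalid.
nodes = {"phys-node-1": 1, "phys-node-2": 2}
for pos, val in find_invalid_entries(["phys-node-1", "3", "2"], nodes):
    print(f"Entry at position {pos} with value {val} is not a valid node.")
```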

986197 reservation fatal error(%s) - malloc() error, errno %d

Description:

The device fencing program has been unable to allocate required memory.

Solution:

Memory usage should be monitored on this node and steps taken to provide more available memory if problems persist. Once memory has been made available, the following steps may need to be taken:

If the message specifies the 'node_join' transition, then this node may be unable to access shared devices. If the failure occurred during the 'release_shared_scsi2' transition, then a node which was joining the cluster may be unable to access shared devices. In either case, access to shared devices can be reacquired by executing '/usr/cluster/lib/sc/run_reserve -c node_join' on all cluster nodes.

If the failure occurred during the 'make_primary' transition, then a device group has failed to start on this node. If another node was available to host the device group, then it should have been started on that node. The device group can be switched back to this node if desired by using the scswitch command. If no other node was available, then the device group will not have been started. The scswitch command may be used to start the device group.

If the failure occurred during the 'primary_to_secondary' transition, then the shutdown or switchover of a device group has failed. The desired action may be retried.
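
The transition-specific guidance in this Solution can be summarized as a lookup table. The sketch below only paraphrases the text of this entry. The names `RECOVERY_BY_TRANSITION` and `recovery_for` are hypothetical, and no scswitch options are shown because this entry does not specify them.

```python
# Recovery guidance keyed by the transition named in the message,
# paraphrased from the Solution text of message 986197.
_REACQUIRE = "Run '/usr/cluster/lib/sc/run_reserve -c node_join' on all cluster nodes."

RECOVERY_BY_TRANSITION = {
    "node_join": _REACQUIRE,
    "release_shared_scsi2": _REACQUIRE,
    "make_primary": (
        "Use the scswitch command to switch the device group back to this "
        "node, or to start it if no other node could host it."
    ),
    "primary_to_secondary": "Retry the device group shutdown or switchover.",
}


def recovery_for(transition: str) -> str:
    """Return the suggested action for a transition, or a fallback note."""
    return RECOVERY_BY_TRANSITION.get(
        transition, "Transition not covered by this message's guidance."
    )


print(recovery_for("make_primary"))
```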

986466 clexecd: stat of '%s' failed

Description:

The clexecd program failed to stat the directory indicated in the error message.

Solution:

Make sure the directory exists. If the problem persists, contact your authorized Sun service provider to determine whether a workaround or patch is available.

987455 in libsecurity weak Unix authorization failed

Description:

A server (rgmd) refused an rpc connection from a client because it failed the Unix authentication. This happens if a caller program using scha public api, either in its C form or its CLI form, is not running as root. An error message is output to syslog.

Solution:

Check that the calling program using the scha public api is running as root. If the program is running as root, save the /var/adm/messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.
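
A calling program can check its own privileges before using the scha API. The following is a minimal sketch, assuming that the effective user ID is what the server examines. The helper name `is_root` is hypothetical.

```python
import os


def is_root(euid=None):
    """Return True if the given (or current) effective user ID is 0.

    Assumption: the effective UID is the credential that matters here.
    """
    if euid is None:
        euid = os.geteuid()
    return euid == 0


if not is_root():
    print("warning: not running as root; scha calls may be refused")
```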

987601 scvxvmlg error - opendir(%s) failed

Description:

Solution:

988355 scha_control: rejecting RESOURCE_IS_RESTARTED call on resource <%s> because it does not have a Retry_interval property

Description:

A resource monitor (or some other program) is attempting to notify the RGM that the indicated resource has been restarted, by calling scha_control(1ha),(3ha) with the RESOURCE_IS_RESTARTED option. This request is rejected because the resource type does not declare the Retry_interval property for its resources. This represents a bug in the calling program. To enable the RESOURCE_IS_RESTARTED functionality, the resource type registration (RTR) file must declare the Retry_interval property.

Solution:

Contact the author of the data service (or of whatever program is attempting to call scha_control) and report the error.
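
A data service author could check whether an RTR file declares the property before relying on RESOURCE_IS_RESTARTED. This is a rough sketch, assuming declarations take the form `PROPERTY = name;` and that comment lines begin with `#`. The helper `declares_property` and the sample fragment are hypothetical, and the check is a text search rather than a full RTR parser.

```python
import re


def declares_property(rtr_text: str, name: str) -> bool:
    """Return True if the RTR text contains a 'PROPERTY = <name>;' line.

    Comment lines (starting with '#') are skipped. Matching is
    case-insensitive, which is an assumption of this sketch.
    """
    pattern = re.compile(
        r"^\s*PROPERTY\s*=\s*" + re.escape(name) + r"\s*;", re.IGNORECASE
    )
    for line in rtr_text.splitlines():
        if line.lstrip().startswith("#"):
            continue
        if pattern.match(line):
            return True
    return False


# Hypothetical RTR fragment that declares Retry_count but not Retry_interval.
sample = """
# PROPERTY = Retry_interval;
{
    PROPERTY = Retry_count;
    DEFAULT = 2;
}
"""
print(declares_property(sample, "Retry_count"))
print(declares_property(sample, "Retry_interval"))
```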

988416 t_sndudata (2) in send_reply: %s

Description:

Call to t_sndudata() failed. The "t_sndudata" man page describes possible error codes.

Solution:

None.

988719 Warning: Unexpected result returned while checking for the existence of scalable service group %s: %d.

Description:

A call to the underlying scalable networking code failed.

Solution:

Save a copy of the /var/adm/messages files on all nodes. Contact your authorized Sun service provider for assistance in diagnosing the problem.

988762 Invalid connection attempted from %s: %s

Description:

Solution:

988885 libpnm error: %s

Description:

This means that there is an error either in libpnm being able to send the command to the PNM daemon or in libpnm receiving a response from the PNM daemon.

Solution:

The user of libpnm should handle these errors. However, if the message is:

network is too slow - it means that libpnm was not able to read data from the network - either the network is congested or the resources on the node are dangerously low.

scha_cluster_open failed - it means that the call to initialize a handle to get cluster information failed. This means that the command will not be sent to the PNM daemon.

scha_cluster_get failed - it means that the call to get cluster information failed. This means that the command will not be sent to the PNM daemon.

can't connect to PNMd - it means that libpnm was not able to connect to the PNM daemon through the private interconnects. There could be other related error messages.

wrong version of PNMd - it means that we connected to a PNM daemon which did not give us the correct version number.

988937 Extension property <failover_enabled> has a value of <%d>

Description:

Resource property failover_enabled is set to a value or has a default value. The value '1' means TRUE and '0' means FALSE.

Solution:

This is an informational message, no user action is needed.

989693 thr_create failed

Description:

Could not create a new thread. The "thr_create" man page describes possible error codes.

Solution:

Some system resource has been exceeded. Install more memory, increase swap space or reduce peak memory consumption.

989846 ERROR: unpack_rg_seq(): rgname_to_rg failed <%s>

Description:

Due to an internal error, the rgmd was unable to find the specified resource group data in memory.

Solution:

Save a copy of the /var/adm/messages files on all nodes. Contact your authorized Sun service provider for assistance in diagnosing the problem.

989958 wrong command length received %d

Description:

This means that the PNM daemon received a command from libpnm, but not all of the bytes were received.

Solution:

This is not a serious error. It might be caused by network problems. If the error persists, send a KILL (9) signal to pnmd. PMF will restart pnmd automatically. If the problem persists, restart the node with scswitch -S and shutdown(1M).
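
The general situation behind a short read can be illustrated with a receive loop that insists on a fixed number of bytes. This is a generic sketch and does not reproduce the actual pnmd or libpnm protocol. The function `recv_exact` is hypothetical.

```python
import socket


def recv_exact(sock: socket.socket, length: int) -> bytes:
    """Read exactly `length` bytes, or raise if the peer closes early."""
    chunks = []
    remaining = length
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:  # an empty read means the peer closed the connection
            received = length - remaining
            raise ConnectionError(
                f"peer closed after {received} of {length} bytes"
            )
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


# Demonstration over a local socket pair: the sender writes in two pieces,
# and the loop keeps reading until all six bytes have arrived.
left, right = socket.socketpair()
left.sendall(b"abc")
left.sendall(b"def")
print(recv_exact(right, 6))
left.close()
right.close()
```

A single `recv` call may return fewer bytes than requested, which is why the loop tracks the remaining count instead of assuming one read is enough.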

990215 HA: repl_mgr: exception while invoking RMA reconf object

Description:

An unrecoverable failure occurred in the HA framework.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available.

990418 received signal %d

Description:

The daemon indicated in the message tag (rgmd or ucmmd) has received a signal, possibly caused by an operator-initiated kill(1) command. The signal is ignored.

Solution:

The operator must use scswitch(1M) and shutdown(1M) to take down a node, rather than directly killing the daemon.

991108 uaddr2taddr (open_cmd_port) failed

Description:

Call to uaddr2taddr() failed. The "uaddr2taddr" man page describes possible error codes. ucmmd will exit and the node will abort.

Solution:

Save the /var/adm/messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.

991130 pthread_create: %s

Description:

The rpc.pmfd server was not able to allocate a new thread, probably due to low memory, and the system error is shown. This can happen when a new tag is started, or when monitoring for a process is set up. If the error occurs when a new tag is started, the tag is not started and pmfadm returns error. If the error occurs when monitoring for a process is set up, the process is not monitored. An error message is output to syslog.

Solution:

Investigate if the machine is running out of memory. If this is not the case, save the /var/adm/messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.

991800 in libsecurity transport %s is not a loopback transport

Description:

A server (rpc.pmfd, rpc.fed or rgmd) refused an rpc connection from a client because the named transport is not a loopback. An error message is output to syslog.

Solution:

Save the /var/adm/messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.

991864 putenv: %s

Description:

The rpc.pmfd server was not able to change environment variables. The message contains the system error. The server does not perform the action requested by the client, and an error message is output to syslog.

Solution:

Save the /var/adm/messages file. Contact your authorized Sun service provider to determine whether a workaround or patch is available.

992912 clexecd: thr_sigsetmask returned %d. Exiting.

Description:

The clexecd program has encountered a failed thr_sigsetmask(3THR) system call. The error message indicates the error number for the failure.

Solution:

Contact your authorized Sun service provider to determine whether a workaround or patch is available.

992998 clconf: CSR registration failed

Description:

While executing a task in clconf and modifying the state of the proxy, a failure was received from registering the CSR.

Solution:

This is an informational message. No user action is required.

994915 %s: Cannot get transport information.

Description:

The daemon is unable to get the information it needs about the transport over which it provides RPC service.

995026 lkcm_cfg: invalid handle was passed %s %d

Description:

Handle for communication with udlmctl during a call to return the current DLM configuration is invalid.

Solution:

This is an internal error. Save the contents of /var/adm/messages, /var/cluster/ucmm/ucmm_reconf.log and /var/cluster/ucmm/dlm*/*logs/* from all the nodes and contact your Sun service representative.

995339 Restarting using scha_control RESTART

Description:

The fault monitor has detected problems in the RDBMS server. An attempt will be made to restart the RDBMS server on the same node.

Solution:

Determine the cause of the RDBMS failure.

995532 Could not bring up some ip addresses.

995859 scha_cluster_get failed

Description:

Call to get cluster information failed. This means that the incoming connection to the PNM daemon will not be accepted.

Solution:

There could be other related error messages which might be helpful. Contact your authorized Sun service provider to determine whether a workaround or patch is available.

996075 fatal: Unable to resolve %s from nameserver

Description:

The low-level cluster machinery has encountered a fatal error. The rgmd will produce a core file and will cause the node to halt or reboot to avoid the possibility of data corruption.

Solution:

Save a copy of the /var/adm/messages files on all nodes, and of the rgmd core file. Contact your authorized Sun service provider for assistance in diagnosing the problem.

996887 reservation message(%s) - attempted removal of scsi-3 keys from non-scsi-3 device %s

Description:

The device fencing program has detected scsi-3 registration keys on a device which is not configured for scsi-3 PGR use. The keys have been removed.

Solution:

This is an informational message, no user action is needed.

996897 Method <%s> on resource <%s>: stat of program file failed.

Description:

The rgmd was unable to access the indicated resource method file. This may be caused by incorrect installation of the resource type.

Solution:

Consult resource type documentation; [re-]install the resource type, if necessary.

996902 Stopped the HA-NFS system fault monitor.

Description:

The HA-NFS system fault monitor was stopped successfully.

Solution:

No action required.

997568 modinstall of tcpmod failed

Description:

The Streams module that intercepts private interconnect communication could not be installed.

997689 IP address %s is an IP address in resource %s and in resource %s.

Description:

The same IP address is being used in two resources. This is not a correct configuration.

Solution:

Delete one of the resources that is using the duplicated IP address.
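
Before creating resources, a planned configuration can be checked for addresses shared between resources. The sketch below is illustrative only: the function `find_shared_addresses`, the resource names, and the addresses are hypothetical, and it compares address strings literally rather than querying the cluster.

```python
from collections import defaultdict


def find_shared_addresses(resources):
    """Map each IP address used by more than one resource to those resources.

    `resources` maps a resource name to the list of IP addresses it uses.
    """
    users = defaultdict(list)
    for name, addresses in resources.items():
        for address in set(addresses):  # ignore repeats within one resource
            users[address].append(name)
    return {
        address: sorted(names)
        for address, names in users.items()
        if len(names) > 1
    }


# Hypothetical plan in which two resources claim the same address.
planned = {
    "lh-rs-a": ["192.0.2.10", "192.0.2.11"],
    "lh-rs-b": ["192.0.2.10"],
    "lh-rs-c": ["192.0.2.12"],
}
for address, names in find_shared_addresses(planned).items():
    print(f"IP address {address} is used by: {', '.join(names)}")
```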

998022 Failed to restart the service: %s.

Description:

Restart attempt of the data service has failed.

Solution:

Check the syslog messages that occurred just before this message to determine whether there is an internal error. In case of an internal error, contact your Sun service provider. Otherwise, one of the following situations may have occurred. 1) The Start_timeout and Stop_timeout values are not appropriate. Check them and adjust them if necessary. 2) The system lacks resources. Check whether the system is low on memory or the process table is full, and take appropriate action.
