Reading Error Information for Debugging

In the Foundation Services, standard error and alert messages are sent to system log files. In error scenarios, you can refer to the system log files to determine the history of a process. Critical errors are written on the console in addition to being logged in the system log files.

Errors can cause notifications to be sent, but notifications are events and are not errors in themselves. For information on notifications, see EXAMPLE 6-1.

The NMA enables you to receive information on notifications. Statistics are available to diagnose the cause of errors received. See the Netra High Availability Suite 3.0 1/08 Foundation Services NMA Programming Guide.

Note - The NMA is supported only on the Solaris OS. It is not available on the Linux platform.

For information about using and configuring system log files, see the Netra High Availability Suite 3.0 1/08 Foundation Services Cluster Administration Guide.

Stopping the Daemon Monitor for Debugging

You cannot debug critical services, such as the CMM or Reliable NFS, on a running cluster. Debugging would interrupt the regular messages that these services send between nodes. Debugging tools, such as the truss command, cannot be used on daemons while they are being monitored by the Daemon Monitor.

Before debugging a Foundation Services daemon or a monitored Solaris daemon, stop the Daemon Monitor from monitoring the daemon that you want to debug. When you have finished debugging, restart the Daemon Monitor.

For information about how to stop and restart the Daemon Monitor, see the Netra High Availability Suite 3.0 1/08 Foundation Services Cluster Administration Guide. For a list of monitored daemons, see the nhpmd(1M) man page on the Solaris OS or the nhpmd(8) man page on the Linux OS.

Broken Pipe Error Messages

If one of the applications you are running on your cluster terminates suddenly, CMM notification pipes that this application opened are kept on the nhcmmd side. You can be left with a broken pipe from the CMM to the dead application. If the CMM later sends a notification to this dead application, the CMM realizes that the application is dead and closes the broken pipe. Alternatively, the CMM frequently checks to see if a client application is dead and if necessary, closes associated pipes.

If many of your applications die suddenly, without notifying the CMM, the following can happen:

Many pipes are broken.

Unless the CMM has a notification to emit, it identifies neither the dead applications nor the broken pipes.

Each broken pipe is associated with a file descriptor. As the number of open file descriptors grows, the CMM can run short of file descriptors and become saturated.

If one of your applications has died suddenly, you receive a system log message such as this:

#  Dec 23 09:56:07 machine_name CMM[839]: S-CMM notif to /var/run/CMM_884_00000000 fails: Broken pipe

The CMM detects the problem and closes the notification pipe. For further information on accessing system log files, see “Accessing and Maintaining System Log Messages” in the Netra High Availability Suite 3.0 1/08 Foundation Services Cluster Administration Guide and the syslog.conf(4) man page.
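
When you review system log files for this condition, it can help to pick out the broken pipe messages and the pipe path that each one names. The following is a minimal sketch, not part of the Foundation Services. It assumes that the message is logged on a single line in the form shown above, and `find_broken_pipe_path` is a hypothetical helper name chosen for this illustration.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the CMM API).
 * Returns 1 and copies the notification pipe path into `path` if `line`
 * looks like the CMM broken pipe message shown above; returns 0 otherwise.
 * Assumption: the message text is "... CMM[pid]: S-CMM notif to <path> fails: Broken pipe".
 */
int find_broken_pipe_path(const char *line, char *path, size_t path_len)
{
    static const char prefix[] = "notif to ";
    static const char suffix[] = " fails: Broken pipe";
    const char *start;
    const char *end;
    size_t len;

    if (line == NULL || path == NULL || path_len == 0)
        return 0;

    /* Only consider messages logged by the CMM. */
    if (strstr(line, "CMM[") == NULL)
        return 0;

    start = strstr(line, prefix);
    if (start == NULL)
        return 0;
    start += sizeof prefix - 1;          /* skip past "notif to " */

    end = strstr(start, suffix);
    if (end == NULL)
        return 0;

    len = (size_t)(end - start);
    if (len == 0 || len >= path_len)     /* empty path or buffer too small */
        return 0;

    memcpy(path, start, len);
    path[len] = '\0';
    return 1;
}
```

Calling this helper on each line of a log file and counting distinct paths gives a rough indication of how many applications died without closing their notification pipes.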

Return Values of the CMM API

The CMM API provides extensive return values for errors and successful function calls. They are listed in TABLE 8-1.

**TABLE 8-1 Common Return Values of the CMM API**
Return Value	Result	Possible Responses
`CMM_OK`	The function call succeeded.	None required.
`CMM_EAGAIN`	Returned information is based on a cluster view that has not been updated by the master node for more than 10 seconds.	Retry the function call.
`CMM_EBADF`	An identifier or descriptor that corresponds to a file descriptor is invalid. The connection to the CMM is no longer valid. Perhaps the CMM is dead.	Verify that data in your program is not corrupted. Call the `cmm_cmc_register` and the `cmm_notify_getfd` functions to fetch a new connection.
`CMM_EBUSY`	For all functions: The CMM API server is temporarily out of resources to respond to the requested operation. For `cmm_cmc_unregister`: The attempt to unregister a callback failed because the caller's callback function is active. See the `cmm_cmc_unregister`(3CMM) man page.	Wait, then retry the function call. You can decide the length of the wait, based on the application's characteristics.
`CMM_ECANCELED`	A switchover operation was cancelled. For example, when trying to demote the master, no vice-master can take over the master role.	Continue.
`CMM_ECONN`	The local CMM API process is unreachable.	Check that the process is currently running. Perhaps it is not running yet. Retry the function call.
`CMM_EEXIST`	Only one callback function can be registered at a time. Calling the `cmm_cmc_register` function when a callback is already registered returns this value.	The calling process has already registered a callback. Verify that the existing function is required for the purpose of your program.
`CMM_EINVAL`	A function parameter has an invalid value. For example, the `nodeid` is not a master-eligible node.	Ensure that the type of each parameter matches the type in the function prototype. Cast variables to the expected type if necessary and verify that the area of memory that stores the parameter is valid.
`CMM_ENOCLUSTER`	One of the following has occurred: The local node is not configured in an active cluster. This occurs, for example, when the cluster election is in progress. The local node has been removed from the cluster node table on the master node. For more information, see the `cluster_nodes_table`(4) man page. There is more than one master node. The master node has been disqualified and no vice-master node has taken over the master role. A failover has been triggered by the disqualification of the master node. During the failover, there is a brief time when there is no master node, and the `CMM_ENOCLUSTER` error is returned during this time.	Any combination of the following: Add an entry for the node to the cluster node table. Requalify the node. Assign only one master.
`CMM_ENOENT`	An attempted operation on an item failed because the item does not exist. For example, when calling the `cmm_cmc_unregister` function, no callback has been registered. Not critical.	Any combination of the following: Verify that the area of memory that stores the item is valid. If you want to delete the item, continue.
`CMM_ENOMSG`	An attempt to dispatch an event failed because there are no events to be dispatched.	Continue.
`CMM_ENOTSUP`	The operation could not be correctly executed. This error can be the result of a system problem such as a file that cannot be created or a problem with Remote Procedure Call (RPC) services.	Examine the system log files.
`CMM_EPERM`	The call tried to execute on a node other than the master node, but it can execute only on the master node. For more information, see the `cmm_mastership_release`(3CMM), `cmm_member_setqualif`(3CMM), and `cmm_member_seizequalif`(3CMM) man pages.	Execute the function only on the master node.
`CMM_ERANGE`	The number of cells in the table is smaller than the number of nodes in the cluster. Returned by the `cmm_member_getall` function. See the `cmm_member_getall`(3CMM) man page.	Add an entry in the table for each potential peer node.
`CMM_ESRCH`	One of the following has occurred: The `cmm_member_getinfo` function was used to obtain information about a node that is either not in the local cluster node table, or is in the local cluster node table but currently has the `CMM_OUT_OF_CLUSTER` role. The `cmm_potential_getinfo` function was used to obtain information about a node that is not in the local cluster node table. The `cmm_vicemaster_getinfo` function was used while the cluster has no vice-master.	Any combination of the following: Examine why the master-eligible node is down or isolated. Add an entry for this node to the cluster node table. See the `cluster_nodes_table`(4) man page. Change the node's role to master or vice-master.
`CMM_ETIMEDOUT`	The function call timed out. No response was received before the delay expired, even though the operation was retried.	Any combination of the following: Retry the function call. Reduce the load on the system.
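
The responses in TABLE 8-1 fall into a few broad groups: continue, retry, reconnect, correct the caller, examine the cluster, or examine the system log files. The following sketch shows one way an application might encode that grouping and apply the "wait, then retry" response. It is an illustration under stated assumptions, not the CMM API itself: the `demo_cmm_error_t` enum is a stand-in whose numeric values are arbitrary (a real program would use the error type from the CMM API header instead), and `suggested_action`, `call_with_retry`, and `busy_twice_then_ok` are hypothetical names.

```c
#include <unistd.h>

/* Stand-in for the CMM API error type. Names follow TABLE 8-1;
 * the numeric values here are arbitrary and for illustration only. */
typedef enum {
    CMM_OK, CMM_EAGAIN, CMM_EBADF, CMM_EBUSY, CMM_ECANCELED, CMM_ECONN,
    CMM_EEXIST, CMM_EINVAL, CMM_ENOCLUSTER, CMM_ENOENT, CMM_ENOMSG,
    CMM_ENOTSUP, CMM_EPERM, CMM_ERANGE, CMM_ESRCH, CMM_ETIMEDOUT
} demo_cmm_error_t;

/* Hypothetical grouping of the "Possible Responses" column. */
typedef enum {
    ACTION_NONE,          /* call succeeded */
    ACTION_CONTINUE,      /* not critical, carry on */
    ACTION_RETRY,         /* wait if needed, then retry the call */
    ACTION_RECONNECT,     /* register again and fetch a new descriptor */
    ACTION_FIX_CALLER,    /* parameter, table size, or calling node is wrong */
    ACTION_CHECK_CLUSTER, /* examine node table, roles, qualification */
    ACTION_CHECK_LOGS     /* examine the system log files */
} demo_action_t;

demo_action_t suggested_action(demo_cmm_error_t rc)
{
    switch (rc) {
    case CMM_OK:         return ACTION_NONE;
    case CMM_ECANCELED:
    case CMM_ENOMSG:
    case CMM_ENOENT:     return ACTION_CONTINUE;
    case CMM_EAGAIN:
    case CMM_EBUSY:
    case CMM_ECONN:
    case CMM_ETIMEDOUT:  return ACTION_RETRY;
    case CMM_EBADF:      return ACTION_RECONNECT;
    case CMM_EEXIST:
    case CMM_EINVAL:
    case CMM_EPERM:
    case CMM_ERANGE:     return ACTION_FIX_CALLER;
    case CMM_ENOCLUSTER:
    case CMM_ESRCH:      return ACTION_CHECK_CLUSTER;
    case CMM_ENOTSUP:
    default:             return ACTION_CHECK_LOGS;
    }
}

/* Any operation that returns a CMM-style status. */
typedef demo_cmm_error_t (*demo_call_fn)(void *arg);

/* Calls fn(arg) up to max_attempts times, sleeping wait_seconds between
 * attempts, for as long as the result falls in the ACTION_RETRY group.
 * Returns the last status, or CMM_EINVAL if max_attempts is not positive. */
demo_cmm_error_t call_with_retry(demo_call_fn fn, void *arg,
                                 int max_attempts, unsigned wait_seconds)
{
    demo_cmm_error_t rc = CMM_EINVAL;
    int attempt;

    for (attempt = 0; attempt < max_attempts; attempt++) {
        rc = fn(arg);
        if (suggested_action(rc) != ACTION_RETRY)
            return rc;
        if (attempt + 1 < max_attempts && wait_seconds > 0)
            sleep(wait_seconds);
    }
    return rc;
}

/* Demo stub: reports CMM_EBUSY on the first two calls, then CMM_OK.
 * `arg` points to an int call counter owned by the caller. */
demo_cmm_error_t busy_twice_then_ok(void *arg)
{
    int *calls = (int *)arg;
    *calls += 1;
    return (*calls <= 2) ? CMM_EBUSY : CMM_OK;
}
```

For example, `call_with_retry(busy_twice_then_ok, &calls, 5, 1)` invokes the stub three times and returns `CMM_OK`. The wait length and attempt limit are left to the caller because, as the table notes for `CMM_EBUSY`, the appropriate wait depends on the application's characteristics.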
