Sun Cluster Geographic Edition System Administration Guide

Appendix D Troubleshooting Sun Cluster Geographic Edition Software

This appendix describes procedures for troubleshooting your use of the Sun Cluster Geographic Edition software.

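This appendix contains the following sections: Troubleshooting Monitoring and Logging, Troubleshooting Migration Problems, and Troubleshooting Cluster Start and Restart.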

Troubleshooting Monitoring and Logging

This section provides information about setting up logging and about problems that you might encounter when monitoring the Sun Cluster Geographic Edition software.

Configuring the Logger File to Avoid Too Many Traces

Configure the logger file, /etc/opt/SUNWcacao/logger.properties, according to the messages that you want logged.

The enabled traces are copied to the /var/opt/SUNWcacao/logs/cacao.0 file.

Configuring the Log File to Avoid Detailed Messages From the gcr Agent

To avoid overly detailed messages from the gcr agent in your log file, use entries similar to the following in your logger file, /etc/opt/SUNWcacao/logger.properties:


com.sun.cluster.level=WARNING
com.sun.cluster.agent.geocontrol.gcr.level=INFO
com.sun.cluster.agent.geocontrol.level=ALL

This property file is updated each time you reinstall the SUNWscmasa package.
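Because a reinstall of the SUNWscmasa package updates this file, it can be useful to check afterward that your custom levels are still present. The following is a minimal sketch under stated assumptions, not part of the product: it assumes a POSIX shell, the function name check_logger_levels is hypothetical, and the entries it looks for are the example entries shown above.

```shell
# Report any expected logger entries that are missing from the properties file.
# Usage: check_logger_levels [file]
# The default path is the logger file named in this section.
check_logger_levels() {
    file=${1:-/etc/opt/SUNWcacao/logger.properties}
    missing=0
    for entry in \
        'com.sun.cluster.level=WARNING' \
        'com.sun.cluster.agent.geocontrol.gcr.level=INFO' \
        'com.sun.cluster.agent.geocontrol.level=ALL'
    do
        # -F treats the entry as a fixed string; -x requires a whole-line match.
        if ! grep -Fx "$entry" "$file" >/dev/null 2>&1; then
            echo "missing: $entry"
            missing=1
        fi
    done
    # Exit status 0 means every entry was found; 1 means at least one is missing.
    return $missing
}
```

Calling check_logger_levels with no argument checks the default file. Adjust the list of entries to match the levels that you actually configured.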

Configuring the Log File to Avoid jmx Remote Traces

To avoid jmx remote traces, add the following lines to the beginning of your logger.properties file:


javax.management.remote.level=OFF
com.sun.jmx.remote.level=OFF
java.io.level=OFF
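The edit described above can be scripted. The following is a minimal sketch, assuming a POSIX shell; the function name prepend_jmx_off and the .bak backup suffix are hypothetical choices, not product conventions. The function inserts the three lines at the beginning of the file and does nothing if the first entry is already present, so running it twice does not duplicate the lines.

```shell
# Prepend the three OFF entries to a logger.properties file.
# Usage: prepend_jmx_off [file]
# The default path is the logger file named in this appendix.
prepend_jmx_off() {
    file=${1:-/etc/opt/SUNWcacao/logger.properties}
    # Skip the edit if the first entry already exists as a whole line.
    if grep -Fx 'javax.management.remote.level=OFF' "$file" >/dev/null 2>&1; then
        return 0
    fi
    tmp="$file.new.$$"
    {
        printf '%s\n' \
            'javax.management.remote.level=OFF' \
            'com.sun.jmx.remote.level=OFF' \
            'java.io.level=OFF'
        cat "$file"
    } > "$tmp" || return 1
    # Keep a backup, then overwrite in place so that the original file's
    # ownership and permissions are preserved.
    cp "$file" "$file.bak" && cat "$tmp" > "$file" && rm -f "$tmp"
}
```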

Troubleshooting Migration Problems

This section provides information about problems that you might encounter when services are migrated by using Sun Cluster Geographic Edition software.

Resolving Problems With Application Resource Group Failover When Communication Is Lost With the Storage Device

When a loss of communication occurs between a node on which the application is online and the storage device, some application resource groups might not fail over gracefully to the nodes from which the storage is accessible. The application resource group might end up in an ERROR_STOP_FAILED state.

Solution or Workaround

The Sun Cluster infrastructure does not initiate a switchover when I/O errors occur in a volume or its underlying devices. Because no switchover or failover occurs, the device service remains online on this node even though the storage is inaccessible.

If this problem occurs, restart the application resource group on the correct nodes by using the standard Sun Cluster procedures. For information about recovering from the ERROR_STOP_FAILED state and restarting the application, see Clearing the STOP_FAILED Error Flag on Resources in Sun Cluster Data Services Planning and Administration Guide for Solaris OS.

The Sun Cluster Geographic Edition software detects state changes in the application resource group and displays the states in the output of the geoadm status command. For more information about using this command, see Monitoring the Runtime Status of the Sun Cluster Geographic Edition Software.
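The recovery described above might look like the following sketch. It assumes the Sun Cluster 3.2 command set (clresource and clresourcegroup); earlier releases use scswitch instead, and the referenced guide remains the authoritative procedure. The function name recover_stop_failed, the helper run, the DRY_RUN variable, and the resource, resource group, and node names are all hypothetical. The sketch does not cover manually stopping the failed resource, which the referenced procedure describes as a step before clearing the flag.

```shell
# Print the command when DRY_RUN=1; otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: recover_stop_failed resource resource-group node
recover_stop_failed() {
    rs=$1
    rg=$2
    node=$3
    # Clear the STOP_FAILED error flag on the resource.
    run clresource clear -f STOP_FAILED "$rs" || return 1
    # Take the resource group offline to leave the ERROR_STOP_FAILED state.
    run clresourcegroup offline "$rg" || return 1
    # Bring the resource group online on a node that can reach the storage.
    run clresourcegroup online -n "$node" "$rg" || return 1
    # Confirm that Sun Cluster Geographic Edition reports the new state.
    run geoadm status
}
```

For example, `DRY_RUN=1; recover_stop_failed app-rs app-rg node2` prints the four commands without executing them, which lets you review the sequence before you run it on a cluster.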

Troubleshooting Cluster Start and Restart

This section provides information about troubleshooting problems that you might encounter with starting and restarting the Sun Cluster Geographic Edition software.

Validating Protection Groups in an Error State

After a cluster reboot, the protection group configuration might be in an error state. This problem might be caused by the common agent container process not being available on one of the nodes of the cluster when the protection group is initialized after the reboot.

Solution or Workaround

To fix the configuration error, use the geopg validate command on the protection group that is in an error state.
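As a brief illustration, the following sketch wraps the validation in a shell function and then displays the runtime status. The function name validate_pg and the protection group name sales-pg in the usage note are hypothetical; substitute the name of the protection group that is in the error state.

```shell
# Usage: validate_pg protection-group-name
validate_pg() {
    if [ -z "$1" ]; then
        echo "usage: validate_pg protection-group-name" >&2
        return 2
    fi
    # Revalidate the protection group configuration, then show the runtime status.
    geopg validate "$1" && geoadm status
}
```

For example, `validate_pg sales-pg` validates the hypothetical protection group sales-pg and, if validation succeeds, prints the output of geoadm status.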

Restarting the Common Agent Container

The Sun Cluster software enables the common agent container only during the Sun Cluster software installation. Therefore, if you disable the common agent container at any time after the installation, the common agent container remains disabled.

Solution or Workaround

To enable the common agent container after a node reboot, use the /opt/SUNWcacao/bin/cacaoadm enable command.
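The following minimal sketch runs the enable command and then checks the result with the status subcommand of cacaoadm. The function name enable_cacao and the CACAOADM override variable are hypothetical conveniences; the default path is the one given above.

```shell
# Usage: enable_cacao
# Set CACAOADM to override the location of the cacaoadm binary.
enable_cacao() {
    cacaoadm=${CACAOADM:-/opt/SUNWcacao/bin/cacaoadm}
    # Enable the common agent container, then report its status.
    "$cacaoadm" enable && "$cacaoadm" status
}
```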