This appendix describes procedures for troubleshooting the Sun Cluster Geographic Edition software.
This section provides information about setting up logging and problems that you might encounter with monitoring the Sun Cluster Geographic Edition software.
Configure the logger file, /etc/opt/SUNWcacao/logger.properties, as follows, depending on which messages you want logged:
To select only WARNING and SEVERE cmass messages, the first line of the file should read as follows:
com.sun.cluster.level=WARNING
To enable all geocontrol messages, the second line of the file should read as follows:
com.sun.cluster.agent.geocontrol.level=ALL
The enabled traces are copied to the /var/opt/SUNWcacao/logs/cacao.0 file.
To avoid overly detailed messages from the gcr agent in your log file, use entries similar to the following in your logger file, /etc/opt/SUNWcacao/logger.properties:
com.sun.cluster.level=WARNING
com.sun.cluster.agent.geocontrol.gcr.level=INFO
com.sun.cluster.agent.geocontrol.level=ALL
This property file is updated each time you reinstall the SUNWscmasa package.
To suppress JMX remote traces, add the following lines to the beginning of your logger.properties file:
javax.management.remote.level=OFF
com.sun.jmx.remote.level=OFF
java.io.level=OFF
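Because the property file is rewritten when the SUNWscmasa package is reinstalled, it can be convenient to reapply the entries from this section with a small script. The following is a minimal sketch, not a product command: the function name apply_log_levels is hypothetical, the combination of entries is only one possible choice, and the script assumes POSIX utilities (on Solaris, /usr/xpg4/bin ahead of /usr/bin in PATH) and entries written as key=value without spaces around the equal sign.

```shell
#!/bin/sh
# Sketch only: apply_log_levels is a hypothetical helper, not a product command.
# Assumes POSIX grep, printf, cp, and cat.

# Usage: apply_log_levels [properties-file]
apply_log_levels() {
    file=${1:-/etc/opt/SUNWcacao/logger.properties}
    backup="${file}.bak"
    tmp="${file}.new.$$"

    # Keep a copy of the current file so the change can be undone.
    cp -p "$file" "$backup" || return 1

    {
        # Entries from this section, with the JMX and java.io entries first.
        printf '%s\n' \
            'javax.management.remote.level=OFF' \
            'com.sun.jmx.remote.level=OFF' \
            'java.io.level=OFF' \
            'com.sun.cluster.level=WARNING' \
            'com.sun.cluster.agent.geocontrol.gcr.level=INFO' \
            'com.sun.cluster.agent.geocontrol.level=ALL'
        # Every other original line, minus earlier settings of the same keys.
        # grep exits with 1 when it prints nothing; that is not an error here.
        grep -v -E '^(javax\.management\.remote|com\.sun\.jmx\.remote|java\.io|com\.sun\.cluster|com\.sun\.cluster\.agent\.geocontrol|com\.sun\.cluster\.agent\.geocontrol\.gcr)\.level=' "$backup" || [ $? -eq 1 ]
    } > "$tmp" || { rm -f "$tmp"; return 1; }

    # Overwrite in place so the file keeps its owner and permissions.
    cat "$tmp" > "$file" && rm -f "$tmp"
}
```

Calling apply_log_levels with no argument targets /etc/opt/SUNWcacao/logger.properties and leaves the previous version in logger.properties.bak. Passing a path to a copy of the file lets you inspect the result before touching the live configuration.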
This section provides information about problems that you might encounter when services are migrated by using Sun Cluster Geographic Edition software.
When a loss of communication occurs between a node on which the application is online and the storage device, some application resource groups might not fail over gracefully to the nodes from which the storage is accessible. The application resource group might end up in an ERROR_STOP_FAILED state.
The Sun Cluster infrastructure does not initiate a switchover when I/O errors occur in a volume or its underlying devices. Because no switchover or failover occurs, the device service remains online on this node despite the fact that storage has been rendered inaccessible.
If this problem occurs, restart the application resource group on the correct nodes by using the standard Sun Cluster procedures. For information about recovering from the ERROR_STOP_FAILED state and restarting the application, see "Clearing the STOP_FAILED Error Flag on Resources" in Sun Cluster Data Services Planning and Administration Guide for Solaris OS.
The Sun Cluster Geographic Edition software detects state changes in the application resource group and displays the states in the output of the geoadm status command. For more information about using this command, see Monitoring the Runtime Status of the Sun Cluster Geographic Edition Software.
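The recovery described above might look like the following sketch. It assumes the Sun Cluster 3.1 scswitch command syntax; the helper names run and recover_stop_failed are hypothetical, as are the node, resource, and resource group names in the usage note. By default the script only prints the commands, so that you can compare them with the procedure in the referenced guide before running anything.

```shell
#!/bin/sh
# Sketch only: run and recover_stop_failed are hypothetical helpers.
# DRY_RUN=1 (the default) prints each command instead of executing it.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: recover_stop_failed <failed-node> <resource> <resource-group> <target-node>
recover_stop_failed() {
    if [ $# -ne 4 ]; then
        echo "usage: recover_stop_failed failed-node resource resource-group target-node" >&2
        return 2
    fi
    # Clear the STOP_FAILED error flag on the resource on the node where the stop failed.
    run scswitch -c -h "$1" -j "$2" -f STOP_FAILED || return 1
    # Bring the resource group online on a node from which the storage is accessible.
    run scswitch -z -g "$3" -h "$4" || return 1
    # Check how Sun Cluster Geographic Edition now reports the resource group.
    run geoadm status
}
```

For example, recover_stop_failed phys-node-1 app-rs app-rg phys-node-2 prints the three commands in order. Setting DRY_RUN=0 in the environment executes them instead. Depending on the state of the resource group, the referenced guide might require additional steps that this sketch does not cover.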
This section provides information about troubleshooting problems that you might encounter with starting and restarting the Sun Cluster Geographic Edition software.
After a cluster reboot, the protection group configuration might be in an error state. This problem might be caused by the common agent container process not being available on one of the nodes of the cluster when the protection group is initialized after the reboot.
To fix the configuration error, use the geopg validate command on the protection group that is in an error state.
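As an illustration, the validation step can be wrapped as follows. The helper names run and revalidate_pg are hypothetical, and the protection group name in the usage note is a placeholder. The script prints the commands by default instead of executing them.

```shell
#!/bin/sh
# Sketch only: run and revalidate_pg are hypothetical helpers.
# DRY_RUN=1 (the default) prints each command instead of executing it.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: revalidate_pg <protection-group-name>
revalidate_pg() {
    if [ $# -ne 1 ]; then
        echo "usage: revalidate_pg protection-group-name" >&2
        return 2
    fi
    # Revalidate the protection group that is in an error state.
    run geopg validate "$1" || return 1
    # Display the runtime status to confirm that the error has cleared.
    run geoadm status
}
```

For example, revalidate_pg example-pg prints the geopg validate and geoadm status commands for a protection group named example-pg. Run with DRY_RUN=0 to execute them on a cluster node.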
The Sun Cluster software enables the common agent container only during the Sun Cluster software installation. Therefore, if you disable the common agent container at any time after the installation, the common agent container remains disabled.
To enable the common agent container after a node reboot, use the /opt/SUNWcacao/bin/cacaoadm enable command.
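A minimal sketch of this step follows. The enable_cacao function and the CACAOADM variable are hypothetical names introduced for illustration. The follow-up call to the cacaoadm status subcommand is an assumption added here to confirm the result and is not part of the procedure above.

```shell
#!/bin/sh
# Sketch only: enable_cacao and CACAOADM are hypothetical names.
CACAOADM=${CACAOADM:-/opt/SUNWcacao/bin/cacaoadm}

enable_cacao() {
    if [ ! -x "$CACAOADM" ]; then
        echo "cacaoadm not found or not executable: $CACAOADM" >&2
        return 1
    fi
    # Enable the common agent container, as described in this section.
    "$CACAOADM" enable || return 1
    # Assumption: the status subcommand reports the container state.
    "$CACAOADM" status
}
```

Run enable_cacao as superuser on each node where the common agent container was disabled.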