Oracle® Solaris Cluster Geographic Edition System Administration Guide

Updated: July 2014, E39667-01
 
 

Troubleshooting Geographic Edition Software

This appendix describes procedures for troubleshooting your application of the Geographic Edition software.

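This appendix contains the following sections:

  • Troubleshooting Monitoring and Logging
  • Troubleshooting Migration Problems
  • Troubleshooting Cluster Start and Restart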

Troubleshooting Monitoring and Logging

This section provides information about setting up logging and about problems that you might encounter when monitoring the Geographic Edition software.

For information about logging, see Viewing the Geographic Edition Log Messages.

Configuring the Logger File to Avoid Too Many Traces

Configure the logger file, /etc/cacao/instances/default/private/logger.properties, as follows, depending on which cmass messages you want logged:

  • To select only WARNING and SEVERE messages, the first line of the file should read as follows:

    com.sun.cluster.level=WARNING
  • To enable all geocontrol messages, the second line of the file should read as follows:

    com.sun.cluster.agent.geocontrol.level=ALL

The enabled traces are copied to the /var/cacao/instances/default/logs/cacao.0 file.

Configuring the Logger File to Avoid Detailed Messages From the gcr Agent

To avoid overly detailed messages from the gcr agent in your log file, use entries similar to the following in your logger file, /etc/cacao/instances/default/private/logger.properties:

com.sun.cluster.level=WARNING
com.sun.cluster.agent.geocontrol.gcr.level=INFO
com.sun.cluster.agent.geocontrol.level=ALL

This property file is updated each time you reinstall the SUNWscmasa package.
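
How the three entries above interact can be sketched in Python. This is a minimal illustration, not part of the Geographic Edition software. It assumes that the logger file follows the usual java.util.logging convention, in which a logger without its own level inherits the level of its nearest configured ancestor. The function names and the logger names being queried are hypothetical.

```python
def parse_levels(text):
    """Collect '<logger>.level=<LEVEL>' entries from properties-style text."""
    levels = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and lines that are not key=value pairs.
        if not line or line.startswith(("#", "!")) or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key.endswith(".level"):
            levels[key[: -len(".level")]] = value
    return levels


def effective_level(levels, logger_name):
    """Walk up the dotted name until a configured ancestor is found."""
    name = logger_name
    while name:
        if name in levels:
            return levels[name]
        name = name.rpartition(".")[0]
    return None  # no entry for this logger or any of its ancestors


# The entries shown in this section.
SAMPLE = """\
com.sun.cluster.level=WARNING
com.sun.cluster.agent.geocontrol.gcr.level=INFO
com.sun.cluster.agent.geocontrol.level=ALL
"""

levels = parse_levels(SAMPLE)

# Hypothetical logger names, used only to illustrate the lookup.
gcr_level = effective_level(levels, "com.sun.cluster.agent.geocontrol.gcr.Example")
geocontrol_level = effective_level(levels, "com.sun.cluster.agent.geocontrol.other")
cluster_level = effective_level(levels, "com.sun.cluster.cmass")
```

Under that assumption, the order of the lines does not change the result: the most specific matching entry determines the level of each logger.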

Configuring the Logger File to Avoid jmx Remote Traces

To avoid jmx remote traces, add the following lines to the beginning of your logger.properties file:

javax.management.remote.level=OFF
com.sun.jmx.remote.level=OFF
java.io.level=OFF
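
If you maintain the logger file with a script, the edit above can be sketched as follows. This is a minimal Python illustration that works on the file content as a string and does not touch the file itself. The function name is hypothetical, and the sample content is the example from this appendix. Back up the real file before you write any changes to it.

```python
JMX_OFF_LINES = [
    "javax.management.remote.level=OFF",
    "com.sun.jmx.remote.level=OFF",
    "java.io.level=OFF",
]


def prepend_jmx_off(text):
    """Return the properties text with the three OFF lines at the top.

    Existing copies of those lines are dropped first, so running the
    function twice gives the same result as running it once.
    """
    wanted = set(JMX_OFF_LINES)
    rest = [line for line in text.splitlines() if line.strip() not in wanted]
    return "\n".join(JMX_OFF_LINES + rest) + "\n"


# Sample content taken from the examples in this appendix.
original = (
    "com.sun.cluster.level=WARNING\n"
    "com.sun.cluster.agent.geocontrol.level=ALL\n"
)
updated = prepend_jmx_off(original)
```

Dropping existing copies before prepending keeps the file free of duplicate entries if the script is run again after a package reinstall.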

Troubleshooting Migration Problems

This section provides information about problems that you might encounter when services are migrated by using Geographic Edition software.

Resolving Problems With Application Resource Group Failover When Communication Is Lost With the Storage Device

When communication is lost between the storage device and a node on which the application is online, some application resource groups might not fail over gracefully to the nodes from which the storage is accessible. The application resource group might end up in an ERROR_STOP_FAILED state.

Solution or Workaround

The Oracle Solaris Cluster infrastructure does not initiate a switchover when I/O errors occur in a volume or its underlying devices. Because no switchover or failover occurs, the device service remains online on this node despite the fact that storage has been rendered inaccessible.

If this problem occurs, restart the application resource group on the correct nodes by using the standard Oracle Solaris Cluster procedures. For information about recovering from the ERROR_STOP_FAILED state and restarting the application, see Clearing the STOP_FAILED Error Flag on Resources in Oracle Solaris Cluster Data Services Planning and Administration Guide.

The Geographic Edition software detects state changes in the application resource group and displays the states in the output of the geoadm status command. For more information about using this command, see Monitoring the Runtime Status of the Geographic Edition Software.

Troubleshooting Cluster Start and Restart

This section provides information about problems that you might encounter when starting and restarting the Geographic Edition software.

Validating Protection Groups in an Error State

After a cluster reboot, the protection group configuration might be in an error state. This problem can occur if the common agent container process is not available on one of the cluster nodes when the protection group is initialized after the reboot.

Solution or Workaround

To fix the configuration error, use the geopg validate command on the protection group that is in an error state.

Restarting the Common Agent Container

The Oracle Solaris Cluster software enables the common agent container only during the Oracle Solaris Cluster software installation. Therefore, if you disable the common agent container at any time after the installation, the common agent container remains disabled.

Solution or Workaround

To enable the common agent container after a node reboot, use the /usr/lib/cacao/bin/cacaoadm enable command.

Matching the Nodelist Property of an Availability Suite Protection Group to Those of Its Device Group and Resource Group

When you add resource groups or Availability Suite device groups to a protection group, or when you run the geopg get command on a protection group, the order of the hosts in the nodelist property of each device group and resource group in the protection group must match the order of the hosts in the nodelist property of the protection group. Otherwise, the operation fails with a message similar to the following:

Application resource group app-rg must have a nodelist whose physical host components match those
of protection group app-pg and the resources it contains.

The Geographic Edition software requires that the entries in the nodelist property of an Availability Suite protection group match those of any device group or resource group added to the protection group. The order of the entries in their nodelist properties must also be identical.

Solution or Workaround

Ensure that the entries in the nodelist properties of the protection group, its device groups, and its resource groups are identical and appear in the same order.
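
The comparison that this requirement implies can be sketched in Python. This is a minimal illustration, assuming that you have already collected the nodelist values by other means. The function name, the node names, and the device group name are hypothetical. The sketch compares the entries as plain strings and does not query the cluster.

```python
def find_nodelist_mismatches(pg_nodelist, member_nodelists):
    """Return {member name: reason} for each member whose nodelist
    differs from the nodelist of the protection group."""
    problems = {}
    for name, nodelist in member_nodelists.items():
        if list(nodelist) == list(pg_nodelist):
            continue  # same entries in the same order
        if sorted(nodelist) == sorted(pg_nodelist):
            problems[name] = "same entries, different order"
        else:
            problems[name] = "different entries"
    return problems


# Hypothetical values for a protection group and its members.
pg_nodelist = ["phys-node-1", "phys-node-2"]
members = {
    "app-rg": ["phys-node-2", "phys-node-1"],
    "avs-dg": ["phys-node-1", "phys-node-2"],
}
problems = find_nodelist_mismatches(pg_nodelist, members)
```

Separating the two reasons shows whether a member only needs its entries reordered or whether its set of hosts must change.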