Oracle® Solaris Cluster 4.3 Geographic Edition System Administration Guide

Updated: June 2017
 
 

Troubleshooting Cluster Start and Restart

This section describes how to troubleshoot problems that you might encounter when you start or restart the Geographic Edition framework.

Validating Protection Groups in an Error State

After a cluster reboot, the protection group configuration might be in an error state. This problem might be caused by the common agent container process not being available on one of the nodes of the cluster when the protection group is initialized after the reboot.

Solution or Workaround – To fix the configuration error, use the geopg validate command on the protection group that is in an error state.
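The workaround above can be sketched as a small shell wrapper. This is only an illustration: the function name validate_pg and the protection group name app-pg are hypothetical, and the only Geographic Edition command it calls is geopg validate, as named above.

```shell
#!/bin/sh
# Minimal sketch: run "geopg validate" on one protection group.
# validate_pg is a hypothetical helper name, not part of the product.
validate_pg() {
    pg_name="$1"
    if [ -z "$pg_name" ]; then
        echo "usage: validate_pg protection-group-name" >&2
        return 2
    fi
    # Ask the framework to revalidate the configuration of the
    # protection group that is in an error state.
    geopg validate "$pg_name"
}
```

For example, validate_pg app-pg would run geopg validate app-pg on a cluster node where the protection group is configured.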

Administering Stopped Protection Groups After a Cluster Restart

If an entire cluster goes down and then comes back up, Geographic Edition by design does not restart the protection groups when the local cluster boots. After an entire cluster restarts, the protection groups are deactivated on the local cluster, and the application resource groups in those protection groups are in the unmanaged state on the local cluster. The administrator must then determine whether data replication can be safely restarted and, if so, which cluster should have the Primary role.

Solution or Workaround – After a cluster has rebooted, evaluate the state of the clusters and the condition of the storage or data. Then either manually reset the roles of the protection groups and restart them, or perform a failback procedure, whichever is appropriate for the situation. In a single-node cluster, protection groups must always be manually restarted after a node reboot, because only one node exists in the cluster. For more information, see Migrating Services and the failback and post-takeover procedures in the Geographic Edition guide for the replication component that you are using.

Restarting the Common Agent Container

The Oracle Solaris Cluster software enables the common agent container only during the Oracle Solaris Cluster software installation. Therefore, if you disable the common agent container at any time after the installation, the common agent container remains disabled.

Solution or Workaround – To enable the common agent container after a node reboot, use the /usr/lib/cacao/bin/cacaoadm enable command.
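As a minimal sketch of this workaround, the following shell function checks that the cacaoadm binary is present before it runs the enable subcommand. The function name enable_cacao and the CACAOADM override variable are assumptions introduced for illustration. The path and the enable subcommand are the ones given above.

```shell
#!/bin/sh
# Path to cacaoadm as given in this section. The CACAOADM variable is a
# hypothetical override so that the path can be changed if needed.
CACAOADM="${CACAOADM:-/usr/lib/cacao/bin/cacaoadm}"

# enable_cacao is a hypothetical helper name, not part of the product.
enable_cacao() {
    if [ ! -x "$CACAOADM" ]; then
        echo "cacaoadm not found or not executable: $CACAOADM" >&2
        return 1
    fi
    # Re-enable the common agent container on this node.
    "$CACAOADM" enable
}
```

Run enable_cacao on each node where the common agent container was disabled.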

Matching the Nodelist Property of an Availability Suite Protection Group to Those of Its Device Group and Resource Group

When you add resource groups or Availability Suite device groups to a protection group, or when you run the geopg get command on a protection group, the hosts in the Nodelist property of each device group and resource group must appear in the same order as the hosts in the Nodelist property of the protection group. Otherwise, the operation fails with a message similar to the following example:

Application resource group app-rg must have a nodelist whose physical host components match those
of protection group app-pg and the resources it contains.

The Geographic Edition framework requires that the entries in the Nodelist property of an Availability Suite protection group match those of any device group or resource group added to the protection group. The order of the entries in their Nodelist properties must also be identical.

Solution or Workaround – Ensure that the entries, and the order of the entries, in the Nodelist properties of a protection group, its device groups, and its resource groups are identical.
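The order-sensitive comparison that this requirement implies can be sketched in plain shell. The helper names normalize_nodelist and nodelists_match and the node names phys-node-1 and phys-node-2 are hypothetical. The sketch assumes that you have already collected the Nodelist values yourself, and it does not query the cluster.

```shell
#!/bin/sh
# Normalize a Nodelist value: treat commas and runs of spaces as a
# single space, and trim leading and trailing spaces.
normalize_nodelist() {
    printf '%s\n' "$1" | tr ',' ' ' | tr -s ' ' | sed 's/^ //; s/ $//'
}

# Return 0 only if both lists contain the same entries in the same order.
nodelists_match() {
    [ "$(normalize_nodelist "$1")" = "$(normalize_nodelist "$2")" ]
}

# Example with hypothetical node names: the same entries in a different
# order do not match.
if nodelists_match "phys-node-1,phys-node-2" "phys-node-2,phys-node-1"; then
    echo "Nodelist values match"
else
    echo "Nodelist values differ in entries or order"
fi
```

With the values shown, the example reports a difference, because the entries are the same but their order is not.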