Administering the Disaster Recovery Framework for Oracle® Solaris Cluster 4.4

Updated: June 2019

Troubleshooting Cluster Start and Restart

This section describes how to troubleshoot problems that you might encounter when starting and restarting the disaster recovery framework: validating protection groups that are in an error state, administering stopped protection groups after a cluster restart, and restarting the common agent container.

Validating Protection Groups in an Error State

After a cluster reboot, the protection group configuration might be in an error state. This problem can occur if the common agent container process is not available on one of the cluster nodes when the protection group is initialized after the reboot.

Solution or Workaround – To fix the configuration error, run the geopg validate command on the protection group that is in the error state.
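For example, if the protection group that reports the error is named sales-pg (a hypothetical name used here for illustration), you would run the following command from a node of the cluster:

# geopg validate sales-pg

If validation completes without errors, the configuration error on the protection group should clear.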

Administering Stopped Protection Groups After a Cluster Restart

If an entire cluster goes down and then comes back up, the disaster recovery framework, by design, does not restart the protection groups when the local cluster boots. After the restart, the protection groups are deactivated on the local cluster, and the application resource groups in those protection groups are in the Unmanaged state on the local cluster. The administrator must then determine whether data replication can be safely restarted and, if so, which cluster should have the primary role.

Solution or Workaround – After a cluster has rebooted, evaluate the state of the clusters and the condition of the storage or data. Then either manually reset the roles of the protection groups and restart them, or perform a failback procedure, whichever is appropriate for the situation. If the cluster is a single-node cluster, you must always manually restart the protection groups after a node reboot, because the cluster has only one node. For more information, see Migrating Services and the failback and post-takeover procedures in the disaster recovery framework guide for the replication component that you are using.
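As a minimal sketch, assuming you have verified the data and decided that the current role assignments are correct for a hypothetical protection group named sales-pg, you might first check the overall framework state and then restart the protection group on both partner clusters:

# geoadm status
# geopg start -e global sales-pg

The -e global option activates the protection group on both clusters in the partnership; use -e local instead to activate it only on the local cluster. If the roles must change first, follow the migration or failback procedure for your replication component before starting the protection group.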

Restarting the Common Agent Container

The Oracle Solaris Cluster software enables the common agent container only during Oracle Solaris Cluster software installation. If you disable the common agent container at any time after installation, it remains disabled, even after node reboots.

Solution or Workaround – To enable the common agent container after a node reboot, use the /usr/lib/cacao/bin/cacaoadm enable command.
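For example, run the following on each node where the common agent container is disabled; the status subcommand lets you confirm afterward whether the container daemon is running:

# /usr/lib/cacao/bin/cacaoadm enable
# /usr/lib/cacao/bin/cacaoadm status

Enabling the container also configures it to start automatically at subsequent node boots, which reverses the disabled state described above.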