Administering the Disaster Recovery Framework for Oracle® Solaris Cluster 4.4

Updated: June 2019

Troubleshooting Migration Problems

This section provides information about problems that you might encounter when services are migrated by using the disaster recovery framework.

Resolving Problems With Application Resource Group Failover When Communication Lost With the Storage Device

When communication is lost between the storage device and a node on which the application is online, some application resource groups might not fail over gracefully to the nodes from which the storage is accessible. The application resource group might end up in an ERROR_STOP_FAILED state.

Solution or Workaround – The disaster recovery framework does not initiate a switchover when I/O errors occur in a volume or its underlying devices. Because no switchover or failover occurs, the device service remains online on the affected node even though the storage is inaccessible from that node.

If this problem occurs, restart the application resource group on the correct nodes by using the standard Oracle Solaris Cluster procedures. For information about recovering from the ERROR_STOP_FAILED state and restarting the application, see Clearing the STOP_FAILED Error Flag on Resources in Planning and Administering Data Services for Oracle Solaris Cluster 4.4.

The disaster recovery framework detects state changes in the application resource group and displays the states in the output of the geoadm status command. For more information about using this command, see Monitoring the Runtime Status of the Disaster Recovery Framework.
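The recovery described above can be sketched roughly as the following command sequence. This is a minimal illustration, not the documented procedure itself: the resource, resource group, and node names (app-rs, app-rg, node2) are hypothetical placeholders, and the sketch assumes that the failed resource and its monitor have already been stopped manually as described in the referenced procedure. By default the script only prints the commands so that you can review them; set DRY_RUN=0 to run them on a cluster node.

```shell
#!/bin/sh
# Hypothetical names: replace with your own resource, resource group,
# and a node from which the storage is accessible.
RESOURCE=app-rs
RESOURCE_GROUP=app-rg
TARGET_NODE=node2

# With DRY_RUN=1 (the default in this sketch) commands are printed, not run.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

recover_resource_group() {
    # 1. Clear the STOP_FAILED error flag on the affected resource.
    run clresource clear -f STOP_FAILED "$RESOURCE"
    # 2. Check the state of the resource group after the flag is cleared.
    run clresourcegroup status "$RESOURCE_GROUP"
    # 3. Bring the resource group online on a node that can reach the storage.
    run clresourcegroup switch -n "$TARGET_NODE" "$RESOURCE_GROUP"
    # 4. Confirm that the disaster recovery framework reports the new state.
    run geoadm status
}

recover_resource_group
```

Reviewing the printed commands before running them is a deliberate choice here: which node should host the resource group depends on which nodes can still reach the storage, and that has to be verified on the actual cluster.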

Copyright © 2004, 2019, Oracle and/or its affiliates.