Troubleshooting Oracle Data Guard in Exadata Cloud Service Systems

When troubleshooting Oracle Data Guard in Exadata Cloud Service systems, you must first determine whether the problem occurs during the Data Guard setup and initialization or during Data Guard operation, when lifecycle commands are entered. The steps to identify and resolve the issues are different, depending on the scenario in which they are used.

There are three lifecycle operations: switchover, failover, and reinstate. The Data Guard broker is used for all of these commands. The broker command line interface (dgmgrl) is the main tool used to identify and troubleshoot the issues. Although you can use logfiles to identify root causes, dgmgrl is faster and easier to use to check and identify an issue.

Setting up and enabling Data Guard though the Console in Oracle Cloud Infrastructure consists of multiple steps. If the step fails, each step creates a log file that can be viewed to get information about errors that occurred.

  • Validation of the primary cloud VM cluster (or DB system) and database
  • Validation of the standby cloud VM cluster (or DB system)
  • Recreating and copying files to the standby database (passwordfile and wallets)
  • Creating Data Guard through Network (RMAN Duplicate command)
  • Configuring Data Guard broker
  • Finalizing the setup

Troubleshooting Data Guard using logfiles

The tools used to identify the issue and the locations of relevant logfiles are different, depending on the scenario in which they are used.

Use the following procedures to collect relevant log files to investigate issues. If you are unable to resolve the problem after investigating the log files, contact My Oracle Support.

Note

When preparing collected files for Oracle Support, bundle them into a compressed archive, such as a ZIP file.

Collecting Log Files for Investigating Enablement Errors

On each compute node associated with the Data Guard configuration, gather log files pertaining to the problem you experienced.

  • Enablement stage log files (such as those documenting the Create Standby Database operation) and the logs for the corresponding primary or standby system.
  • Enablement job ID logfiles. For example: 23.
  • Locations of enablement log files by enablement stage and Exadata system (primary or standby).
  • Database name logfiles (db_name or db_unique_name, depending on the file path).
Important

Check all nodes of the corresponding primary and standby Exadata systems. Commands executed on a system may have been run on any of its nodes.

Log Files from Configure Primary Database

Data Guard Deployer (DGdeployer) is the process that performs the configuration. When configuring the primary database, it creates the /var/opt/oracle/log/<dbname>/dgdeployer/dgdeployer.log file.

This log should contain the root cause of a failure to configure the primary database.

Log Files from TDE Wallet

  • The primary log from the dbaasapi command line utility is: /var/opt/oracle/log/dbaasapi/db/dg/<job_ID>.log. Look for entries that contain dg_api
  • One standby log from the dbaasapi command line utility is: /var/opt/oracle/log/dbaasapi/db/dg/<job_ID>.log In this log, look for entries that contain dg_api
  • The other standby log is: /var/opt/oracle/log/<dbname>/dgcc/dgcc.log This log is the Data Guard Configuration log

Standby System Log files from Create Standby Database

  • The Oracle Cloud Deployment Engine (ODCE) creates the /var/opt/oracle/log/<dbname>/ocde/ocde.log file. This log should contain the cause of a failure to create the standby database.
  • The dbaasapi command line utility creates the var/opt/oracle/log/dbaasapi/db/dg/<job_ID>.log file. Look for entries that contain "dg_api".
  • The Data Guard configuration log file is /var/opt/oracle/log/<dbname>/dgcc/dgcc.log.

Log files from Configure Standby Database

  • DGdeployer is the process that performs the configuration. It creates the following /var/opt/oracle/log/<dbname>/dgdeployer/dgdeployer.log file. This log should contain the root cause of a failure to configure the standby database.
  • The dbaasapi command line utility creates the /var/opt/oracle/log/dbaasapi/db/dg/<job_ID>.log file Look for entries that contain dg_api
  • The Data Guard configuration log is /var/opt/oracle/log/<dbname>/dgcc/dgcc.log.

Log Files from Configure Data Guard

DGdeployer is the process that performs the configuration. While configuring Data Guard, it creates the /var/opt/oracle/log/<dbname>/dgdeployer/dgdeployer.log file. This log should contain the root cause of a failure to configure the primary database.

Collecting Log Files for Investigating Lifecycle Management Errors

On each node of the primary and standby sites, gather log files for the related database name (db_name).

Important

Check all nodes on both primary and standby Exadata systems. A lifecycle management operation may impact both primary and standby systems.

Locations of Lifecycle Management Log Files

  • Database alert log: /u02/app/oracle/diag/rdbms/<dbname>/<dbinstance>/trace/alert_<dbinstance>.log
  • Data Guard Broker log: /u02/app/oracle/diag/rdbms/<dbname>/<dbinstance>/trace/drc<dbinstance>.log
  • Cloud tooling log file for Data Guard: /var/opt/oracle/log/<dbname>/odg/odg.log

Troubleshooting the Data Guard Setup Process

The following errors might occur in the different steps of the Data Guard setup process. While some errors are displayed within the Console, most of the root causes can be found in the logfiles listed above.

ERROR: VALIDATE_DG sys passwd validation failed

The password entered for enabling Data Guard didn't match the primary admin password for the SYS user. This error occurs during the Validate Primary stage of enablement.

ERROR: VALIDATE_DG failed with TNS

The database may not be running. This error occurs during the Validate Primary stage of enablement. Check with srvctl and sql on the host to verify that the database is up and running on all nodes.

#### Completed OCDE with errors, please check logs ####

The primary database could not be configured. Invalid Data Guard commands or failed listener reconfiguration can cause this error.

_dg_send_wallet: Failed to create info file

The TDE wallet could not be created. The Oracle Transparent Database Encryption (TDE) keystore (wallet) files could not be prepared for transportation to the standby site. This error occurs during the create TDE Wallet stage of enablement. Either of the following items can cause failure at this stage:

  • The TDE wallet files could not be accessed
  • The enablement commands could not create an archive containing the wallet files

Troubleshooting procedure:

  1. Ensure that the cluster is accessible. To check the status of a cluster, run the following command: crsctl check cluster -all
  2. If the cluster is down, run the following command to restart it: crsctl start crs -wait
  3. If this error occurs when the cluster is accessible, check the logs for create TDE Wallet (enablement stage) to determine cause and resolution for the error.

Operating system-level command failures. Receive TDE Wallet stage fails.

The archive containing the TDE wallet was likely not transmitted to the standby site. Retrying usually solves the problem.

_dg_config_standby failed to configure standby _dg_config_dg failed to enable Data Guard

  • The primary and standby sites may not be able to communicate with each other to configure the standby database. These errors occur during the configure standby database stage of enablement. In this stage, configurations are performed on the standby database, including the rman duplicate of the primary database. To resolve this issue:
    1. Verify the connectivity status for the primary and standby sites.
    2. Ensure that the host can communicate from port 1521 to all ports. Check the network setup, including Network Security Groups (NSGs), Network Security Lists, and the remote VCN peering setup (if applicable). The best way to test communication between the host and other nodes is to access the databases using SQL*PLUS from the primary to standby and from the standby to the primary.
  • The SCAN VIPs or listeners may not be running. Use the test above to help identify the issue.

TNS "Target" or "Auxiliary" Error After Completing Enablement of Data Guard on an Exadata System

Possible causes:

  • SCAN VIPs or listeners may not be running. You can confirm this issue by using the following commands on any cluster node.
    • [grid@exa1-****** ~]$ srvctl status scan
    • [grid@exa1-****** ~]$ srvctl status scan_listener
  • Databases may not be reachable. You can confirm this issue by attempting to connect using an existing Oracle Net alias.

Troubleshooting procedure:

  1. As the oracle OS user, check for the existence of an Oracle Net alias for the container database (CDB). Look for an alias in $ORACLE_HOME/network/admin/<dbname>/tnsnames.ora.

    The following example shows an entry for a container database named db12c:

    [oracle@exa1-****** db12c]$ cat $ORACLE_HOME/network/admin/
            db12c/tnsnames.ora DB12C = (DESCRIPTION =(ADDRESS = (PROTOCOL = TCP)(HOST = exa1-*****-scan.********.******.******.com)(PORT = 1521)) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = db12c.********.******.******.com) (FAILOVER_MODE = (TYPE = select) (METHOD = basic))))
  2. Verify that you can use the alias to connect to the database. For example, as sysdba, enter the following command: [oracle@exa1-****** db12c]$ sqlplus sys@db12c

Invalid Password

A possible cause for this error is that the Oracle Database sys or system user passwords for the database and the TDE wallet may not be the same. To compare the passwords:

  1. Connect to the database as the sys user and check the TDE status in V$ENCRYPTION_WALLET.
  2. Connect to the database as the system user and check the TDE status in V$ENCRYPTION_WALLET.
  3. Update the applicable passwords to match. Log on to the system host as opc and run the following commands:
    1. To change the SYS password: sudo dbaascli database changepassword --dbname <database_name>
    2. To change the TDE wallet password: sudo dbaascli tde changepassword --dbname <database_name>

General TDE Issues

For possible causes and resolutions to TDE wallet issues, see TDE Wallet and Backup Failures.

Lifecycle Error Messages

When the switchover, failover, and reinstate commands are run, multiple error messages may occur. Refer to the Oracle Database documentation for these error messages.

Note

Oracle recommends using the Data Guard broker command line interface (dgmgrl) to validate the configurations.
  1. As the Oracle User, connect to the primary or standby database with dgmgrl and verify the configuration and the database:

    $ dgmgrl sys/<pwd>@<database>
    DGMGRL> VALIDATE CONFIGURATION VERBOSE
    DGMGRL> VALIDATE DATABASE VERBOSE <PRIMARY>
    DGMGRL> VALIDATE DATABASE VERBOSE <STANDBY>
  2. Consult the Oracle Database documentation to check for the respective error message. For example:
    • ORA-16766: Redo apply is stopped.
    • ORA-16853: Apply lag has exceeded specified threshold.
    • ORA-16664: Unable to receive the result from a member (under the standby database).
    • ORA-12541: TNS: no listener (under the primary database)

For cause and resolution, review the errors in Database Error Messages.