Troubleshooting Exadata Cloud Infrastructure Systems
These topics cover some common issues you might run into and how to address them.
- Known Issues for Exadata Cloud Infrastructure
General known issues.
- Troubleshoot Network Connectivity
To determine if a VM Cluster is properly configured to access the Oracle Cloud Infrastructure (OCI) Services Network, you need to perform the following steps on each virtual machine in the VM Cluster.
- Backup Failures in Exadata Database Service on Dedicated Infrastructure
If your Exadata managed backup does not successfully complete, you can use the procedures in this topic to troubleshoot and fix the issue.
- Troubleshooting Oracle Data Guard
Learn to identify and resolve Oracle Data Guard issues.
- Patching Failures on Exadata Cloud Infrastructure Systems
- Obtaining Further Assistance
- Standby Database Fails to Restart After Switchover in Oracle Database 11g Oracle Data Guard Setup
Known Issues for Exadata Cloud Infrastructure
General known issues.
CPU Offline Scaling Fails
Description: CPU offline scaling fails with the following error:
** CPU Scale Update ** An error occurred during module execution. Please refer to the log file for more information
Cause: After provisioning a VM cluster, the /var/opt/oracle/cprops/cprops.ini file, which is automatically generated by the database as a service (DBaaS), is not updated with the common_dcs_agent_bindHost and common_dcs_agent_port parameters, and this causes CPU offline scaling to fail.
Action: As the root user, manually add the following entries to the /var/opt/oracle/cprops/cprops.ini file:
common_dcs_agent_bindHost=<IP_Address>
common_dcs_agent_port=7070
Note:
The common_dcs_agent_port value is always 7070.
To find the IP address, run the following command:
netstat -tunlp | grep 7070
For example:
netstat -tunlp | grep 7070
tcp 0 0 <IP address 1>:7070 0.0.0.0:* LISTEN 42092/java
tcp 0 0 <IP address 2>:7070 0.0.0.0:* LISTEN 42092/java
You can specify either of the two IP addresses, <IP address 1> or <IP address 2>, for the common_dcs_agent_bindHost parameter.
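The lookup above can be scripted. The following is a minimal Python sketch, not part of the Oracle tooling: it assumes the netstat output has the column layout shown in the example, and the function names listening_addresses and cprops_entries are hypothetical helpers introduced only for illustration.

```python
def listening_addresses(netstat_output: str, port: int = 7070) -> list[str]:
    """Return local IP addresses in LISTEN state on the given port.

    Assumes the column layout shown in the example above:
    proto, recv-q, send-q, local address, foreign address, state, pid/program.
    """
    addresses = []
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) < 6 or fields[5] != "LISTEN":
            continue  # skip headers, UDP rows, and non-listening sockets
        host, _, local_port = fields[3].rpartition(":")
        if host and local_port == str(port):
            addresses.append(host)
    return addresses


def cprops_entries(bind_host: str, port: int = 7070) -> str:
    """Build the two lines to add to cprops.ini (hypothetical helper)."""
    return (
        f"common_dcs_agent_bindHost={bind_host}\n"
        f"common_dcs_agent_port={port}\n"
    )
```

Either address returned by listening_addresses can be passed to cprops_entries; the resulting text would still need to be added to the file manually as the root user.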
Adding a VM to a VM Cluster Fails
[FATAL] [INS-32156] Installer has detected that there are non-readable files in oracle home. CAUSE: Following files are non-readable, due to insufficient permission oracle.ahf/data/scaqak03dv0104/diag/tfa/tfactl/user_root/tfa_client.trc ACTION: Ensure the above files are readable by grid.
Cause: The installer has detected a non-readable trace file, oracle.ahf/data/scaqak03dv0104/diag/tfa/tfactl/user_root/tfa_client.trc, created by Autonomous Health Framework (AHF) in the Oracle home, which causes adding a cluster VM to fail.
AHF, running as root, created a trc file with root ownership, which the grid user is not able to read.
Action: Ensure that the AHF trace files are readable by the grid user before you add VMs to a VM cluster. To fix the permission issue, run the following commands as root on all the existing VM cluster VMs:
chown grid:oinstall /u01/app/19.0.0.0/grid/srvm/admin/logging.properties
chown -R grid:oinstall /u01/app/19.0.0.0/grid/oracle.ahf*
chown -R grid:oinstall /u01/app/grid/oracle.ahf*
Troubleshoot Network Connectivity
To determine if a VM Cluster is properly configured to access the Oracle Cloud Infrastructure (OCI) Services Network, you need to perform the following steps on each virtual machine in the VM Cluster.
Validation check for Identity and Access Management (IAM) connectivity:
- Use ssh to connect to a virtual machine on your ExaDB-D VM Cluster as the opc user.
- Execute the command:
curl https://identity.<region>.oci.oraclecloud.com
where <region> corresponds to the OCI region where your VM Cluster is deployed. If your VM Cluster is deployed in the Ashburn region, use “us-ashburn-1” for <region>. The curl command then looks like this:
curl https://identity.us-ashburn-1.oci.oraclecloud.com
- If your Virtual Cloud Network (VCN) is properly configured for accessing the OCI Services Network, you will get an immediate response that looks like this:
{ "code" : "NotAuthorizedOrNotFound", "message" : "Authorization failed or requested resource not found." }
- If your network is not configured for accessing the OCI Services Network, the curl command will hang in the ssh session and eventually time out.
- Depending on your VCN setup, follow the steps outlined in the Action section below to configure access to the OCI Services Network.
Validation check for Object Storage Service (OSS) connectivity:
- Use ssh to connect to a virtual machine on your ExaDB-D VM Cluster as the opc user.
- Execute the command:
curl https://objectstorage.<region>.oraclecloud.com
where <region> corresponds to the OCI region where your VM Cluster is deployed. If your VM Cluster is deployed in the Ashburn region, use “us-ashburn-1” for <region>. The curl command then looks like this:
curl https://objectstorage.us-ashburn-1.oraclecloud.com
- If your Virtual Cloud Network (VCN) is properly configured for accessing the OCI Services Network, you will get an immediate response that looks like this:
{ "code" : "NotAuthorizedOrNotFound", "message" : "Authorization failed or requested resource not found." }
- If your network is not configured for accessing the OCI Services Network, the curl command will hang in the ssh session and eventually time out.
- Depending on your VCN setup, follow the steps outlined in the Action section below to configure access to the OCI Services Network.
Action:
- This action is applicable to customers who have deployed their VM Cluster on a private subnet.
If you haven’t already configured a Service Gateway to reach the OCI Services Network, use the instructions in the documentation to configure a Service Gateway for use by the VM Cluster to reach the OCI Services Network: https://docs.oracle.com/en/engineered-systems/exadata-cloud-service/ecscm/ecs-network-setup.html#GUID-51C3EC2C-20DA-4EE5-B882-CD500FA6F7C6
- This action is applicable to customers who have deployed their VM Cluster on a public subnet.
If you haven’t already configured an Internet Gateway to reach the OCI Services Network, use the instructions in the documentation to configure the Internet Gateway for use by the VM Cluster to reach the OCI Services Network: https://docs.oracle.com/en/engineered-systems/exadata-cloud-service/ecscm/ecs-network-setup.html#GUID-D8296957-E344-4688-B626-42A99E1D164B
Once you configure your VCN to reach the OCI Services network following the above instructions, execute the steps in both the Validation check sections to ensure that you have established connectivity to the OCI Services network from your VM Cluster.
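The two validation checks can also be sketched in Python for repeated use on each virtual machine. This is an illustrative sketch only: service_endpoints and is_reachable are hypothetical helper names, and the 10-second timeout is an assumed value, not one given in this topic. The sketch treats any HTTP response, including an error status such as the NotAuthorizedOrNotFound reply shown above, as proof of connectivity, and treats a timeout or connection failure as a missing route to the OCI Services Network.

```python
import urllib.error
import urllib.request


def service_endpoints(region: str) -> dict[str, str]:
    """Build the IAM and Object Storage URLs used by the validation checks."""
    return {
        "identity": f"https://identity.{region}.oci.oraclecloud.com",
        "objectstorage": f"https://objectstorage.{region}.oraclecloud.com",
    }


def is_reachable(url: str, timeout_seconds: float = 10.0) -> bool:
    """Return True if the endpoint answers at all (assumed 10 s timeout)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_seconds):
            return True
    except urllib.error.HTTPError:
        # An HTTP error status still means the endpoint was reached.
        return True
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, or timeout: no connectivity.
        return False
```

For example, calling is_reachable on each value of service_endpoints("us-ashburn-1") mirrors running both curl commands for the Ashburn region.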
Additional Information:
You can find instructions to update a service gateway here (https://docs.oracle.com/en-us/iaas/Content/Network/Tasks/servicegateway.htm#switch_label)
Backup Failures in Exadata Database Service on Dedicated Infrastructure
If your Exadata managed backup does not successfully complete, you can use the procedures in this topic to troubleshoot and fix the issue.
The most common causes of backup failure are the following:
- The host cannot access Object Storage
- The database configuration on the host is not correct
The information that follows is organized by the error condition. If you already know the cause, you can skip to the section with the suggested solution. Otherwise, use the procedure in Determining the Problem to get started.
- Determining the Problem
In the Console, a failed database backup either displays a status of Failed or hangs in the Backup in Progress or Creating state. If the error message does not contain enough information to point you to a solution, you can gather more information by using dbaascli and by viewing the log files. Then, refer to the applicable section in this topic for a solution.
- Database Service Agent Issues
Your Oracle Cloud Infrastructure database makes use of an agent framework to allow you to manage your database through the cloud platform. Use the following to check and restart the dbcsagent.
- Object Store Connectivity Issues
Backing up your database to Oracle Cloud Infrastructure Object Storage requires that the host can connect to the applicable Swift endpoint.
- Host Issues
One or more of the following conditions on the database host can cause backups to fail:
- Database Issues
An improper database state or configuration can lead to failed backups.
- TDE Wallet and Backup Failures
Learn to identify the root cause of TDE wallet and backup failures.
Determining the Problem
In the Console, a failed database backup either displays a status of Failed or hangs in the Backup in Progress or Creating state. If the error message does not contain enough information to point you to a solution, you can gather more information by using dbaascli and by viewing the log files. Then, refer to the applicable section in this topic for a solution.
Database backups can fail during the RMAN configuration stage or during a running RMAN backup job. RMAN configuration tasks include validating object store connectivity, backup module installation, and RMAN configuration changes. The log files you examine depend on the stage at which the failure occurs.
- Log on to the host as the root user.
- Check the applicable log file:
  - If the failure occurred during RMAN configuration, navigate to the /var/opt/oracle/log/<database_name>/bkup/ directory and check the bkup.log file.
  - If the failure occurred during the backup job, navigate to the /var/opt/oracle/log/<database_name>/obkup/ directory and check the obkup.log file.
Note:
- Each execution of the bkup and obkup commands generates a separate log file, but bkup.log and obkup.log are symbolic links that point to the most recently generated log file.
- Ensure that you check the log files on all of the Exadata DB system compute nodes because all nodes send backup pieces to Object Storage.
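The mapping from failure stage to log file described above can be expressed as a small lookup. This is a sketch under stated assumptions: the function name backup_log_path and the stage labels "configuration" and "backup" are hypothetical; only the directory and file names come from this topic.

```python
from pathlib import PurePosixPath

LOG_ROOT = PurePosixPath("/var/opt/oracle/log")

# Stage labels are illustrative; the directory names come from the steps above.
_STAGE_DIRECTORIES = {
    "configuration": "bkup",   # failure during RMAN configuration
    "backup": "obkup",         # failure during the running backup job
}


def backup_log_path(database_name: str, stage: str) -> PurePosixPath:
    """Return the symbolic-link log file to inspect for the given stage."""
    if stage not in _STAGE_DIRECTORIES:
        raise ValueError(f"unknown stage: {stage!r}")
    directory = _STAGE_DIRECTORIES[stage]
    return LOG_ROOT / database_name / directory / f"{directory}.log"
```

Because the same path exists on every compute node, the result can be reused when checking each node in turn.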
Database Service Agent Issues
Your Oracle Cloud Infrastructure database makes use of an agent framework to allow you to manage your database through the cloud platform. Use the following to check and restart the dbcsagent.
To resolve a backup failure, you might occasionally need to restart the dbcsagent program if it has the stop/waiting status. View the /opt/oracle/dcs/log/dcs-agent.log file to identify issues with the agent.
- From a command prompt, check the status of the agent:
systemctl status dbcsagent.service
- If the agent is in the stop/waiting state, try to restart the agent:
systemctl start dbcsagent.service
- Check the status of the agent again to confirm that it is now running:
systemctl status dbcsagent.service
Object Store Connectivity Issues
Backing up your database to Oracle Cloud Infrastructure Object Storage requires that the host can connect to the applicable Swift endpoint.
Though Oracle controls the actual Swift user credentials for the storage bucket for managed backups, verifying general connectivity to Object Storage in your region is a good indicator that object store connectivity is not the issue. You can test this connectivity by using another Swift user.
- Create a Swift user in your tenancy. See Working with Auth Tokens.
- With the user you created in the previous step, use the following command to verify the host can access the object store:
curl -v -X HEAD -u <user_ID>:'<auth_token>' https://swiftobjectstorage.<region_name>.oraclecloud.com/v1/<object_storage_namespace>
See Object Storage FAQ for the correct region to use. See Understanding Object Storage Namespaces for information about your Object Storage namespace.
- If you cannot connect to the object store, refer to the Prerequisites for Backups on Exadata Cloud Service topic for information on configuring object store connectivity.
Host Issues
One or more of the following conditions on the database host can cause backups to fail:
If an interactive command such as oraenv, or any command that might return an error or warning message, was added to the .bash_profile file for the grid or oracle user, Database service operations like automatic backups can be interrupted and fail to complete. Check the .bash_profile file for these commands, and remove them.
Backup operations require space in the /u01 directory on the host file system. Use the df -h command on the host to check the space available for backups. If the file system has insufficient space, you can remove old log or trace files to free up space.
Your system might not have the required version of the backup module (opc_installer.jar). See Unable to use Managed Backups in your DB System for details about this known issue. To fix the problem, you can follow the procedure in that section or simply update your DB system and database with the latest bundle patch.
Customizing the site profile file ($ORACLE_HOME/sqlplus/admin/glogin.sql) can cause managed backups to fail in Oracle Cloud Infrastructure. In particular, interactive commands can lead to backup failures. Oracle recommends that you not modify this file for databases hosted in Oracle Cloud Infrastructure.
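The free-space check on /u01 described above can be automated alongside df -h. The following Python sketch uses the standard shutil.disk_usage call; the helper names and the required_gib threshold are assumptions for illustration, because this topic does not state how much space a backup needs.

```python
import shutil


def free_gib(path: str = "/u01") -> float:
    """Return the free space on the file system containing path, in GiB."""
    return shutil.disk_usage(path).free / 1024 ** 3


def has_enough_space(path: str, required_gib: float) -> bool:
    """Compare free space against a caller-chosen threshold (an assumption)."""
    return free_gib(path) >= required_gib
```

For instance, has_enough_space("/u01", 5.0) would report whether at least 5 GiB is free, where 5 GiB is only an example value.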
Database Issues
An improper database state or configuration can lead to failed backups.
The database must be active and running (ideally on all nodes) while the backup is in progress.
Use the following command to check the state of your database:
srvctl status database -d <db_unique_name> -verbose
The instance status must be Open for the backup to succeed. If the database is not running, use the following command to start it:
srvctl start database -d <db_unique_name> -o open
If the database is mounted but does not have the Open status, use the following commands to access the SQL*Plus command prompt and set the status to Open:
sqlplus / as sysdba
alter database open;
When you provision a new database, the archiving mode is set to ARCHIVELOG by default. This is the required archiving mode for backup operations. Check the archiving mode setting for the database and change it to ARCHIVELOG, if applicable.
select log_mode from v$database;
If the archiving mode is not ARCHIVELOG, start the database in MOUNT status (and not OPEN status), and use the following command at the SQL*Plus command prompt:
alter database archivelog;
Confirm that the db_recovery_file_dest parameter points to +RECO, and that the log_archive_dest_1 parameter is set to USE_DB_RECOVERY_FILE_DEST.
For RAC databases, one instance must have the MOUNT status when enabling archivelog mode. To enable archivelog mode for a RAC database, perform the following steps:
- Shut down all database instances:
srvctl stop database -d <db_unique_name>
- Start one of the database instances in mount state:
srvctl start instance -d <db_unique_name> -i <instance_name> -o mount
- Access the SQL*Plus command prompt:
sqlplus / as sysdba
- Enable archive log mode:
alter database archivelog;
exit;
- Stop the database instance:
srvctl stop instance -d <db_unique_name> -i <instance_name>
- Restart all database instances:
srvctl start database -d <db_unique_name>
- At the SQL*Plus command prompt, confirm the archiving mode is set to ARCHIVELOG:
select log_mode from v$database;
To check whether an instance has a stuck archiver process, run the srvctl status database -db <db_unique_name> -v command. If the command returns the following output, you must resolve the stuck archiver process issue before backups can succeed:
Instance <instance_identifier> is running on node *<node_identifier>. Instance status: Stuck Archiver
Refer to ORA-00257:Archiver Error (Doc ID 2014425.1) for information on resolving a stuck archiver process.
After you resolve the stuck process, the command should return the following output:
Instance <instance_identifier> is running on node *<node_identifier>. Instance status: Open
If the instance status does not change after you resolve the underlying issue with the device or resource being full or unavailable, try restarting the database using the srvctl command to update the status of the database in the clusterware.
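On a cluster with many instances, the status lines can be scanned programmatically. The sketch below is illustrative: it assumes the srvctl output follows the line format shown in the two examples above, and the names instance_statuses and stuck_instances are hypothetical helpers, not part of any Oracle utility.

```python
import re

# Matches lines of the form shown above:
# "Instance <name> is running on node <node>. Instance status: <status>"
_STATUS_LINE = re.compile(
    r"Instance (?P<instance>\S+) is running on node (?P<node>\S+?)\.\s+"
    r"Instance status: (?P<status>.+)"
)


def instance_statuses(srvctl_output: str) -> dict[str, str]:
    """Map each instance name to its reported status."""
    statuses = {}
    for line in srvctl_output.splitlines():
        match = _STATUS_LINE.search(line)
        if match:
            # Tolerate a trailing period after the status text.
            statuses[match.group("instance")] = match.group("status").strip().rstrip(".")
    return statuses


def stuck_instances(srvctl_output: str) -> list[str]:
    """Return the instances whose status is Stuck Archiver."""
    return [
        name
        for name, status in instance_statuses(srvctl_output).items()
        if status == "Stuck Archiver"
    ]
```

An empty list from stuck_instances suggests that a stuck archiver is not the cause of the backup failure, subject to the output-format assumption.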
Editing certain RMAN configuration parameters can lead to backup failures in Oracle Cloud Infrastructure. To check your RMAN configuration, use the show all command at the RMAN command line prompt.
See the following list of parameters for details about the RMAN configuration settings that should not be altered for databases in Oracle Cloud Infrastructure.
CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF 30 DAYS;
CONFIGURE CONTROLFILE AUTOBACKUP ON;
CONFIGURE DEVICE TYPE 'SBT_TAPE' PARALLELISM 5 BACKUP TYPE TO COMPRESSED BACKUPSET;
CONFIGURE CHANNEL DEVICE TYPE DISK MAXPIECESIZE 2 G;
CONFIGURE CHANNEL DEVICE TYPE 'SBT_TAPE' PARMS 'SBT_LIBRARY=/var/opt/oracle/dbaas_acfs/<db_name>/opc/libopc.so, ENV=(OPC_PFILE=/var/opt/oracle/dbaas_acfs/<db_name>/opc/opc<db_name>.ora)';
CONFIGURE ARCHIVELOG DELETION POLICY TO BACKED UP 1 TIMES TO 'SBT_TAPE';
CONFIGURE ENCRYPTION FOR DATABASE ON;
RMAN backups fail when an object store wallet file is lost. The wallet file is necessary to enable connectivity to the object store.
- Get the name of the database with the backup failure using SQL*Plus:
show parameter db_name
- Determine the file path of the backup config parameter file that contains the RMAN wallet information at the Linux command line:
locate opc<database_name>.ora
For example:
find / -name "opctestdb30.ora" -print
/var/opt/oracle/dbaas_acfs/testdb30/opc/opctestdb30.ora
- Find the file path to the wallet file in the backup config parameter file by inspecting the value stored in the OPC_WALLET parameter. To do this, navigate to the directory containing the backup config parameter file and use the following cat command:
cat opc<database_name>.ora
For example:
cd /var/opt/oracle/dbaas_acfs/testdb30/opc/
ls -altr *.ora
opctestdb30.ora
cat opctestdb30.ora
OPC_HOST=https://swiftobjectstorage.us-phoenix-1.oraclecloud.com/v1/dbbackupphx
OPC_WALLET='LOCATION=file:/var/opt/oracle/dbaas_acfs/testdb30/opc/opc_wallet CREDENTIAL_ALIAS=alias_opc'
OPC_CONTAINER=bUG3TFsSi8QzjWfuTxqqExample
_OPC_DEFERRED_DELETE=false
- Confirm that the cwallet.sso file exists in the directory specified in the OPC_WALLET parameter, and confirm that the file has the correct permissions. The file permissions should have the octal value of "600" (-rw-------). Use the following command:
ls -ltr /var/opt/oracle/dbaas_acfs/<database_name>/opc/opc_wallet
For example:
ls -altr /var/opt/oracle/dbaas_acfs/testdb30/opc/opc_wallet
-rw------- 1 oracle oinstall 0 Oct 29 01:59 cwallet.sso.lck
-rw------- 1 oracle oinstall 111231 Oct 29 01:59 cwallet.sso
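The last two steps above (reading OPC_WALLET and checking the wallet file mode) can be sketched in Python. This is an illustration under assumptions: it expects the OPC_WALLET line to look like the example in this topic, and wallet_directory, is_mode_600, and wallet_file_ok are hypothetical helper names.

```python
import os
import re
import stat
from typing import Optional


def wallet_directory(opc_config_text: str) -> Optional[str]:
    """Extract the wallet directory from the OPC_WALLET parameter, if present."""
    match = re.search(r"OPC_WALLET='LOCATION=file:([^\s']+)", opc_config_text)
    return match.group(1) if match else None


def is_mode_600(st_mode: int) -> bool:
    """True when the permission bits equal octal 600 (-rw-------)."""
    return stat.S_IMODE(st_mode) == 0o600


def wallet_file_ok(directory: str, file_name: str = "cwallet.sso") -> bool:
    """Check that the wallet file exists and has mode 600."""
    path = os.path.join(directory, file_name)
    return os.path.isfile(path) and is_mode_600(os.stat(path).st_mode)
```

A typical use is to pass the contents of the opc<database_name>.ora file to wallet_directory and then call wallet_file_ok on the returned directory.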
TDE Wallet and Backup Failures
Learn to identify the root cause of TDE wallet and backup failures.
The $ORACLE_HOME/network/admin/sqlnet.ora file must contain the ENCRYPTION_WALLET_LOCATION parameter formatted exactly as follows:
ENCRYPTION_WALLET_LOCATION=(SOURCE=(METHOD=FILE)(METHOD_DATA=(DIRECTORY=/var/opt/oracle/dbaas_acfs/<database_name>/tde_wallet)))
Use the cat command to check the TDE wallet location specification. For example:
$ cat $ORACLE_HOME/network/admin/sqlnet.ora
ENCRYPTION_WALLET_LOCATION=(SOURCE=(METHOD=FILE)(METHOD_DATA=(DIRECTORY=/var/opt/oracle/dbaas_acfs/<database_name>/tde_wallet)))
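Because the parameter must match the required format exactly, a strict string comparison is a reasonable way to check it. The following Python sketch is illustrative only; expected_wallet_location and has_expected_wallet_location are hypothetical helper names, and the check assumes the parameter sits on a single line as in the example above.

```python
def expected_wallet_location(database_name: str) -> str:
    """Build the exact ENCRYPTION_WALLET_LOCATION line for a database."""
    return (
        "ENCRYPTION_WALLET_LOCATION=(SOURCE=(METHOD=FILE)"
        f"(METHOD_DATA=(DIRECTORY=/var/opt/oracle/dbaas_acfs/{database_name}/tde_wallet)))"
    )


def has_expected_wallet_location(sqlnet_text: str, database_name: str) -> bool:
    """True if some line of sqlnet.ora matches the expected text exactly."""
    expected = expected_wallet_location(database_name)
    return any(line.strip() == expected for line in sqlnet_text.splitlines())
```

A False result means the line is missing or differs in spacing, line breaks, or path, and the file should be reviewed by hand.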
Database backups fail if the TDE wallet is not in the proper state. The following scenarios can cause this problem:
If the database was started using SQL*Plus, and the ORACLE_UNQNAME environment variable was not set, the wallet is not opened correctly.
To fix the problem, start the database using the srvctl utility:
srvctl start database -d <db_unique_name>
In a multitenant environment for Oracle Database versions that support PDB-level keystore, each PDB has its own master encryption key. For Oracle 18c databases, this encryption key is stored in a single keystore used by all containers. (Oracle Database 19c does not support a keystore at the PDB level.) After you create or plug in a new PDB, you must create and activate a master encryption key for it. If you do not do so, the STATUS column in the v$encryption_wallet view shows the value OPEN_NO_MASTER_KEY.
To check the master encryption key status and create a master key, do the following:
- Review the STATUS column in the v$encryption_wallet view, as shown in the following example:
SQL> alter session set container=pdb2;
Session altered.
SQL> select WRL_TYPE,WRL_PARAMETER,STATUS,WALLET_TYPE from v$encryption_wallet;
WRL_TYPE   WRL_PARAMETER                                   STATUS             WALLET_TYPE
---------- ----------------------------------------------- ------------------ -----------
FILE       /var/opt/oracle/dbaas_acfs/testdb30/tde_wallet/ OPEN_NO_MASTER_KEY AUTOLOGIN
- Confirm that the PDB is in READ WRITE open mode and is not restricted, as shown in the following example:
SQL> show pdbs
CON_ID CON_NAME     OPEN MODE              RESTRICTED
------ ------------ ---------------------- ---------------
2      PDB$SEED     READ ONLY              NO
3      PDB1         READ WRITE             NO
4      PDB2         READ WRITE             NO
The PDB cannot be open in restricted mode (the RESTRICTED column must show NO). If the PDB is currently in restricted mode, review the information in the PDB_PLUG_IN_VIOLATIONS view and resolve the issue before continuing. For more information on the PDB_PLUG_IN_VIOLATIONS view and the restricted status, review the Oracle Multitenant Administrator’s Guide on pluggable database for your Oracle Database version.
- Create and activate a master encryption key for the PDB:
  - Set the container to the PDB:
  ALTER SESSION SET CONTAINER = <pdb>;
  - Create and activate a master encryption key in the PDB by executing the following command:
  ADMINISTER KEY MANAGEMENT SET KEY USING TAG '<tag>' FORCE KEYSTORE IDENTIFIED BY <keystore-password> WITH BACKUP USING '<backup_identifier>';
  Note the following:
    - The USING TAG clause is optional and can be used to associate a tag with the new master encryption key.
    - The WITH BACKUP clause is optional and can be used to create a backup of the keystore before the new master encryption key is created.
  You can also use the dbaascli commands dbaascli tde status and dbaascli tde rotate masterkey to investigate and manage your keys.
- Confirm that the status of the wallet has changed from OPEN_NO_MASTER_KEY to OPEN by querying the v$encryption_wallet view as shown in step 1.
Configuration parameters related to the TDE wallet can cause backups to fail.
Check that the wallet status is open and the wallet type is auto login by checking the v$encryption_wallet view. For example:
SQL> select status, wrl_parameter,wallet_type from v$encryption_wallet;
STATUS WRL_PARAMETER WALLET_TYPE
------- ---------------------------------------------- --------------
OPEN /var/opt/oracle/dbaas_acfs/testdb30/tde_wallet/ AUTOLOGIN
For pluggable databases (PDBs), switch to the appropriate container before querying the v$encryption_wallet view. For example:
$ sqlplus / as sysdba
SQL> alter session set container=pdb1;
Session altered.
SQL> select WRL_TYPE,WRL_PARAMETER,STATUS,WALLET_TYPE from v$encryption_wallet;
WRL_TYPE WRL_PARAMETER STATUS WALLET_TYPE
--------- ----------------------------------------------- -------- -----------
FILE /var/opt/oracle/dbaas_acfs/testdb30/tde_wallet/ OPEN AUTOLOGIN
The TDE wallet file (ewallet.p12) can cause backups to fail if it is missing, or if it has incompatible file system permissions or ownership. As the root user, check the file as shown in the following example:
# ls -altr /var/opt/oracle/dbaas_acfs/<database_name>/tde_wallet/ewallet.p12
total 76
-rw------- 1 oracle oinstall 5467 Oct 1 20:17 ewallet.p12
The TDE wallet file should have file permissions with the octal value "600" (-rw-------), and the owner of this file should be a part of the oinstall operating system group.
The auto login wallet file (cwallet.sso) can cause backups to fail if it is missing, or if it has incompatible file system permissions or ownership. As the root user, check the file as shown in the following example:
# ls -altr /var/opt/oracle/dbaas_acfs/<database_name>/tde_wallet/cwallet.sso
total 76
-rw------- 1 oracle oinstall 5512 Oct 1 20:18 cwallet.sso
The auto login wallet file should have file permissions with the octal value "600" (-rw-------), and the owner of this file should be a part of the oinstall operating system group.
Troubleshooting Oracle Data Guard
Learn to identify and resolve Oracle Data Guard issues.
When troubleshooting Oracle Data Guard, you must first determine whether the problem occurs during the Data Guard setup and initialization or during Data Guard operation, when lifecycle commands are entered. The steps to identify and resolve the issues are different, depending on the scenario in which they are used.
There are three lifecycle operations: switchover, failover, and reinstate. The Data Guard broker is used for all of these commands. The broker command line interface (dgmgrl) is the main tool used to identify and troubleshoot the issues. Although you can use logfiles to identify root causes, dgmgrl is faster and easier to use to check and identify an issue.
Setting up and enabling Data Guard involves multiple steps. Log files are created for each step. If any of the steps fail, review the relevant log file to identify and fix the problem.
- Validation of the primary cloud VM Cluster and database
- Validation of the standby cloud VM Cluster
- Recreating and copying files to the standby database (passwordfile and wallets)
- Creating Data Guard through Network (RMAN Duplicate command)
- Configuring Data Guard broker
- Finalizing the setup
- Troubleshooting Data Guard using logfiles
The tools used to identify the issue and the locations of relevant logfiles are different, depending on the scenario in which they are used.
- Troubleshooting the Data Guard Setup Process
The following errors might occur in the different steps of the Data Guard setup process. While some errors are displayed within the Console, most of the root causes can be found in the logfiles.
Troubleshooting Data Guard using logfiles
The tools used to identify the issue and the locations of relevant logfiles are different, depending on the scenario in which they are used.
Use the following procedures to collect relevant log files to investigate issues. If you are unable to resolve the problem after investigating the log files, contact My Oracle Support.
Note:
When preparing collected files for Oracle Support, bundle them into a compressed archive, such as a ZIP file.
On each compute node associated with the Data Guard configuration, gather log files pertaining to the problem you experienced.
- Enablement stage log files (such as those documenting the Create Standby Database operation) and the logs for the corresponding primary or standby system.
- Enablement job ID logfiles. For example: 23.
- Locations of enablement log files by enablement stage and Exadata system (primary or standby).
- Database name logfiles (db_name or db_unique_name, depending on the file path).
Note:
Check all nodes of the corresponding primary and standby Exadata systems. Commands executed on a system may have been run on any of its nodes.
Data Guard Deployer (DGdeployer) is the process that performs the configuration. When configuring the primary database, it creates the /var/opt/oracle/log/<dbname>/dgdeployer/dgdeployer.log file. This log should contain the root cause of a failure to configure the primary database.
- The primary log from the dbaasapi command-line utility is: /var/opt/oracle/log/dbaasapi/db/dg/<job_ID>.log. Look for entries that contain dg_api.
- One standby log from the dbaasapi command-line utility is: /var/opt/oracle/log/dbaasapi/db/dg/<job_ID>.log. In this log, look for entries that contain dg_api.
- The other standby log is: /var/opt/oracle/log/<dbname>/dgcc/dgcc.log. This log is the Data Guard configuration log.
- The Oracle Cloud Deployment Engine (OCDE) creates the /var/opt/oracle/log/<dbname>/ocde/ocde.log file. This log should contain the cause of a failure to create the standby database.
- The dbaasapi command-line utility creates the /var/opt/oracle/log/dbaasapi/db/dg/<job_ID>.log file. Look for entries that contain dg_api.
- The Data Guard configuration log file is /var/opt/oracle/log/<dbname>/dgcc/dgcc.log.
DGdeployer is the process that performs the configuration. It creates the /var/opt/oracle/log/<dbname>/dgdeployer/dgdeployer.log file. This log should contain the root cause of a failure to configure the standby database.
- The dbaasapi command-line utility creates the /var/opt/oracle/log/dbaasapi/db/dg/<job_ID>.log file. Look for entries that contain dg_api.
- The Data Guard configuration log is /var/opt/oracle/log/<dbname>/dgcc/dgcc.log.
DGdeployer is the process that performs the configuration. While configuring Data Guard, it creates the /var/opt/oracle/log/<dbname>/dgdeployer/dgdeployer.log file. This log should contain the root cause of a failure to configure the primary database.
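Several of the items above say to look for entries that contain dg_api in the dbaasapi job log. A minimal Python sketch of that filter follows; the helper names dbaasapi_log_path and dg_api_entries are hypothetical, and the sketch assumes one log entry per line, which this topic does not confirm.

```python
def dbaasapi_log_path(job_id: str) -> str:
    """Build the dbaasapi Data Guard log path for an enablement job ID."""
    return f"/var/opt/oracle/log/dbaasapi/db/dg/{job_id}.log"


def dg_api_entries(log_text: str) -> list[str]:
    """Return the log lines that mention dg_api (assumes one entry per line)."""
    return [line for line in log_text.splitlines() if "dg_api" in line]
```

For the example job ID 23 mentioned earlier, dbaasapi_log_path("23") gives the file to read on each node before passing its contents to dg_api_entries.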
On each node of the primary and standby sites, gather log files for the related database name (db_name).
Note:
Check all nodes on both primary and standby Exadata systems. A lifecycle management operation may impact both primary and standby systems.
- Database alert log:
/u02/app/oracle/diag/rdbms/<dbname>/<dbinstance>/trace/alert_<dbinstance>.log
- Data Guard Broker log:
/u02/app/oracle/diag/rdbms/<dbname>/<dbinstance>/trace/drc<dbinstance>.log
- Cloud tooling log file for Data Guard:
/var/opt/oracle/log/<dbname>/odg/odg.log
Troubleshooting the Data Guard Setup Process
The following errors might occur in the different steps of the Data Guard setup process. While some errors are displayed within the Console, most of the root causes can be found in the logfiles.
The password entered for enabling Data Guard didn't match the primary admin password for the SYS user. This error occurs during the Validate Primary stage of enablement.
The database may not be running. This error occurs during the Validate Primary stage of enablement. Check with srvctl and sql on the host to verify that the database is up and running on all nodes.
The primary database could not be configured. Invalid Data Guard commands or failed listener reconfiguration can cause this error.
The TDE wallet could not be created. The Oracle Transparent Database Encryption (TDE) keystore (wallet) files could not be prepared for transportation to the standby site. This error occurs during the create TDE Wallet stage of enablement. Either of the following items can cause failure at this stage:
- The TDE wallet files could not be accessed
- The enablement commands could not create an archive containing the wallet files
Troubleshooting procedure:
- Ensure that the cluster is accessible. To check the status of a cluster, run the following command:
crsctl check cluster -all
- If the cluster is down, run the following command to restart it:
crsctl start crs -wait
- If this error occurs when the cluster is accessible, check the logs for create TDE Wallet (enablement stage) to determine the cause and resolution for the error.
The archive containing the TDE wallet was likely not transmitted to the standby site. Retrying usually solves the problem.
- The primary and standby sites may not be able to communicate with each other to configure the standby database. These errors occur during the configure standby database stage of enablement. In this stage, configurations are performed on the standby database, including the RMAN duplicate of the primary database. To resolve this issue:
  - Verify the connectivity status for the primary and standby sites.
  - Ensure that the host can communicate from port 1521 to all ports. Check the network setup, including Network Security Groups (NSGs), Network Security Lists, and the remote VCN peering setup (if applicable). The best way to test communication between the host and other nodes is to access the databases using SQL*Plus from the primary to standby and from the standby to the primary.
- The SCAN VIPs or listeners may not be running. Use the test above to help identify the issue.
Possible causes:
- SCAN VIPs or listeners may not be running. You can confirm this issue by using the following commands on any cluster node.
[grid@exa1-****** ~]$ srvctl status scan
[grid@exa1-****** ~]$ srvctl status scan_listener
- Databases may not be reachable. You can confirm this issue by attempting to connect using an existing Oracle Net alias.
Troubleshooting procedure:
- As the oracle OS user, check for the existence of an Oracle Net alias for the
container database (CDB). Look for an alias in
$ORACLE_HOME/network/admin/<dbname>/tnsnames.ora.
The following example shows an entry for a container database named db12c:
cat $ORACLE_HOME/network/admin/db12c/tnsnames.ora
DB12C =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = exa1-*****-scan.********.******.******.com)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = db12c.********.******.******.com)
      (FAILOVER_MODE =
        (TYPE = select)
        (METHOD = basic))))
- Verify that you can use the alias to connect to the database. For example, as
sysdba, enter the following command:
sqlplus sys@db12c
A possible cause for this error is that the Oracle Database sys or system user passwords for the database and the TDE wallet may not be the same. To compare the passwords:
- Connect to the database as the sys user and check the TDE status in
V$ENCRYPTION_WALLET.
- Connect to the database as the system user and check the TDE status in
V$ENCRYPTION_WALLET.
- Update the applicable passwords to match. Log on to the system host as opc
and run the following commands:
- To change the SYS password:
sudo dbaascli database changepassword --dbname <database_name>
- To change the TDE wallet password:
sudo dbaascli tde changepassword --dbname <database_name>
For possible causes and resolutions to TDE wallet issues, see TDE Wallet and Backup Failures.
When the switchover, failover, and reinstate commands are run, multiple error messages may occur. Refer to the Oracle Database documentation for these error messages.
Note
Oracle recommends using the Data Guard broker command line interface (dgmgrl) to validate the configurations.
- As the oracle user, connect to the primary or standby database with dgmgrl
and verify the configuration and the database:
dgmgrl sys/<pwd>@<database>
DGMGRL> VALIDATE CONFIGURATION VERBOSE
DGMGRL> VALIDATE DATABASE VERBOSE <PRIMARY>
DGMGRL> VALIDATE DATABASE VERBOSE <STANDBY>
- Consult the Oracle Database documentation to check for the respective error
message. For example:
- ORA-16766: Redo apply is stopped.
- ORA-16853: Apply lag has exceeded specified threshold.
- ORA-16664: Unable to receive the result from a member (under the standby database).
- ORA-12541: TNS: no listener (under the primary database).
For cause and resolution, review the errors in Database Error Messages.
Patching Failures on Exadata Cloud Infrastructure Systems
Patching operations can fail for various reasons. Typically, an operation fails because a database node is down, there is insufficient space on the file system, or the virtual machine cannot access the object store.
- Determining the Problem
In the Console, you can identify a failed patching operation by viewing the patch history of an Exadata Cloud Infrastructure system or an individual database.
- Troubleshooting and Diagnosis
Diagnose the most common issues that can occur during the patching process of any of the Exadata Cloud Infrastructure components.
Determining the Problem
In the Console, you can identify a failed patching operation by viewing the patch history of an Exadata Cloud Infrastructure system or an individual database.
A patch that was not successfully applied displays a status of Failed and
includes a brief description of the error that caused the failure. If the
error message does not contain enough information to point you to a solution,
you can use the database CLI and log files to gather more data. Then, refer to
the applicable section in this topic for a solution.
Troubleshooting and Diagnosis
Diagnose the most common issues that can occur during the patching process of any of the Exadata Cloud Infrastructure components.
- Database Server VM Issues
One or more of the following conditions on the database server VM can cause patching operations to fail.
- Oracle Grid Infrastructure Issues
One or more of the following conditions on Oracle Grid Infrastructure can cause patching operations to fail.
- Oracle Databases Issues
An improper database state can lead to patching failures.
Database Server VM Issues
One or more of the following conditions on the database server VM can cause patching operations to fail.
Database Server VM Connectivity Problems
Cloud tooling relies on proper networking and connectivity configuration between the virtual machines of a given VM cluster. If the configuration is not set properly, operations that require cross-node processing can fail. For example, the tooling might be unable to download the files required to apply a given patch.
In that case, you can perform the following actions:
- Verify that your DNS configuration is correct so that the relevant virtual machine addresses are resolvable within the VM cluster.
- Refer to the relevant Cloud Tooling logs as instructed in the Obtaining Further Assistance section and contact Oracle Support for further assistance.
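The DNS check in the first action can be scripted along these lines. This is only a sketch under stated assumptions: the function name unresolved_hosts is hypothetical, the getent utility is assumed to be present, and the node names in the usage line are placeholders for the VM names in your cluster.

```shell
# unresolved_hosts NAME...   (hypothetical helper name)
# Prints each name that getent cannot resolve; returns 1 if any lookup failed.
unresolved_hosts() {
  local rc=0 name
  for name in "$@"; do
    if ! getent hosts "$name" >/dev/null 2>&1; then
      echo "$name"
      rc=1
    fi
  done
  return "$rc"
}

# Example (placeholder names), run on each virtual machine of the VM cluster:
# unresolved_hosts <node1_hostname> <node2_hostname>
```

Running the check on every virtual machine matters, because a name can resolve on one node and fail on another.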
Oracle Grid Infrastructure Issues
One or more of the following conditions on Oracle Grid Infrastructure can cause patching operations to fail.
Oracle Grid Infrastructure is Down
Oracle Clusterware enables servers to communicate with each other so that they can function as a collective unit. The cluster software program must be up and running on the VM Cluster for patching operations to complete. Occasionally you might need to restart the Oracle Clusterware to resolve a patching failure.
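To check the status of Oracle Clusterware, run the following command:
./crsctl check cluster
If Oracle Clusterware is running, the output is similar to the following:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
If Oracle Clusterware is not running, start it on all nodes:
crsctl start cluster -all
Then check the status again:
crsctl check cluster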
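The status check can be automated by looking for the three "online" messages shown above. The following is a minimal sketch, assuming the output format matches that sample; the function name cluster_stack_online is hypothetical and is not part of crsctl.

```shell
# cluster_stack_online   (hypothetical helper name)
# Reads `crsctl check cluster` output on stdin and returns 0 only if the
# online messages for CRS, CSS, and EVM are all present.
cluster_stack_online() {
  local output code
  output="$(cat)"
  for code in CRS-4537 CRS-4529 CRS-4533; do
    printf '%s\n' "$output" | grep -q "^${code}:" || return 1
  done
}

# Example: start the cluster stack only when the check does not pass.
# crsctl check cluster | cluster_stack_online || crsctl start cluster -all
```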
Oracle Databases Issues
An improper database state can lead to patching failures.
Oracle Database is Down
The database must be active and running on all the active nodes so the patching operations can be completed successfully across the cluster.
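To check the status of the database, run the following command:
srvctl status database -d db_unique_name -verbose
The system returns a message including the database instance status. The instance status must be Open for the patching operation to succeed.
If the database is not running, start it with the following command:
srvctl start database -d db_unique_name -o open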
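On clusters with several nodes, it can help to list only the instances that are not open. The sketch below assumes that each instance line in the verbose output begins with "Instance " and contains "Instance status: Open" when the instance is open; treat that format as an assumption and compare it with the output on your system. The function name instances_not_open is hypothetical.

```shell
# instances_not_open   (hypothetical helper name)
# Reads `srvctl status database -d <db_unique_name> -verbose` output on stdin,
# prints every instance line that does not report status Open, and returns 1
# if any such line exists.
instances_not_open() {
  local bad
  bad="$(grep '^Instance ' | grep -v 'Instance status: Open' || true)"
  if [ -n "$bad" ]; then
    printf '%s\n' "$bad"
    return 1
  fi
}

# Example:
# srvctl status database -d db_unique_name -verbose | instances_not_open
```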
Obtaining Further Assistance
If you were unable to resolve the problem using the information in this topic, follow the procedures below to collect relevant database and diagnostic information. After you have collected this information, contact Oracle Support.
- Collecting Cloud Tooling Logs
Use the relevant log files that could assist Oracle Support for further investigation and resolution of a given issue.
- Collecting Oracle Diagnostics
Collecting Cloud Tooling Logs
Use the relevant log files that could assist Oracle Support for further investigation and resolution of a given issue.
DBAASCLI Logs
/var/opt/oracle/log/dbaascli
dbaascli.log
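To hand the logs to Oracle Support, the directory can be bundled into a single archive. This is a sketch, not an Oracle-provided procedure: the function name collect_dbaascli_logs, the default output directory /tmp, and the archive naming scheme are all assumptions chosen for illustration.

```shell
# collect_dbaascli_logs [LOG_DIR] [OUT_DIR]   (hypothetical helper name)
# Creates a timestamped tar.gz of LOG_DIR in OUT_DIR and prints the archive path.
collect_dbaascli_logs() {
  local log_dir="${1:-/var/opt/oracle/log/dbaascli}"
  local out_dir="${2:-/tmp}"
  local archive="$out_dir/dbaascli_logs_$(uname -n)_$(date +%Y%m%d_%H%M%S).tar.gz"
  [ -d "$log_dir" ] || { echo "log directory not found: $log_dir" >&2; return 1; }
  # -C keeps the archive paths relative (dbaascli/...) instead of absolute.
  tar -czf "$archive" -C "$(dirname "$log_dir")" "$(basename "$log_dir")" && echo "$archive"
}

# Example, run on the affected virtual machine:
# collect_dbaascli_logs
```

Including the node name in the file name makes it easier to tell archives apart when logs are collected from several virtual machines.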
Standby Database Fails to Restart After Switchover in Oracle Database 11g Oracle Data Guard Setup
Description: After performing the switchover, the new standby (old primary) database remains shut down and fails to restart.
Action: After performing switchover, do the following:
- Restart the standby database using the
srvctl start database -db <standby dbname>
command.
- Reload the listener as the grid user on all primary and standby cluster nodes.
  - To reload the listener using high availability, download and apply patch 25075940 to the Grid home, and then run lsnrctl reload -with_ha.
  - To reload the listener, run lsnrctl reload.
After reloading the listener, verify that the <dbname>_DGMGRL services are
loaded into the listener using the lsnrctl status command.
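The listener verification can be scripted by searching the lsnrctl status output for the service name. The sketch below assumes that registered services appear on lines of the form Service "<name>" has N instance(s), possibly with a domain suffix after the name; the function name dgmgrl_service_registered is hypothetical.

```shell
# dgmgrl_service_registered DBNAME   (hypothetical helper name)
# Reads `lsnrctl status` output on stdin and returns 0 if a service whose name
# starts with DBNAME_DGMGRL is listed (case-insensitive, fixed-string match).
dgmgrl_service_registered() {
  local dbname="$1"
  grep -qiF "Service \"${dbname}_DGMGRL"
}

# Example, run as the grid user on each cluster node:
# lsnrctl status | dgmgrl_service_registered <dbname> || echo "DGMGRL service missing"
```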
To download patch 25075940
- Log in to My Oracle Support.
- Click Patches & Updates.
- Select Bug Number from the Number/Name or Bug Number (Simple) drop-down list.
- Enter the bug number 34741066, and then click Search.
- From the search results, click the name of the latest patch.
You will be redirected to the Patch 34741066: LSNRCTL RELOAD -WITH_HA FAILED TO READ THE STATIC ENTRY IN LISTENER.ORA page.
- Click Download.