Backup Failures on Bare Metal and Virtual Machine DB Systems
Database backups can fail for various reasons. Typically, a backup fails because either the database host cannot access the object store, or there are problems on the host or with the database configuration.
This topic includes information to help you determine the cause of a failure and fix the problem. The section that includes troubleshooting information is organized into several subsections, based on the error condition. If you already know the cause, you can skip to the section with the suggested solution. Otherwise, use the procedure in Finding the Problem to get started.
Finding the Problem
In the Console, a failed database backup either displays a status of Failed or hangs in the Backup in Progress or Creating state. If the error message does not contain enough information to point you to a solution, you can use the database CLI and log files to gather more data. Then, refer to the applicable section in this topic for a solution.
-
Log on to the host as the root user and navigate to the /opt/oracle/dcs/bin/ directory.
-
Determine the sequence of operations performed on the database.
dbcli list-jobs | grep -i <dbname>
Note the last job ID listed with a status other than Success.
-
With the job ID you noted from the previous step, use the following command to check the details of that job:
dbcli describe-job -i <job_ID> -j
Typically, running this command is enough to reveal the root cause of the failure.
-
If you require more information, review the /opt/oracle/dcs/log/dcs-agent.log file. You can find the job ID in this file by using the timestamp returned by the job report in step 2.
-
If the problem details suggest an RMAN issue, review the RMAN logs in the /opt/oracle/dcs/log/<hostname>/rman/bkup/<db_unique_name>/rman_backup/<yyyy-mm-dd> directory.
Note: If the database failure is on a 2-node RAC database, perform steps 3 and 4 on both nodes.
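The job lookup in the steps above can be partly scripted. The following Python sketch picks the last listed job that mentions the database and has a status other than Success. It assumes that dbcli list-jobs -j prints a JSON array whose entries carry jobId, status, and description keys; those field names and the sample data are assumptions, so check them against the output on your host.

```python
import json


def last_unsuccessful_job(jobs_json, dbname):
    """Return the last listed job for dbname whose status is not Success."""
    jobs = json.loads(jobs_json)
    matches = [
        job for job in jobs
        # Case-insensitive match, similar in spirit to `grep -i <dbname>`.
        if dbname.lower() in (job.get("description") or "").lower()
        and job.get("status") != "Success"
    ]
    # The procedure uses the last job listed, so keep the last match.
    return matches[-1] if matches else None


# Hypothetical sample shaped like the assumed JSON output.
sample_jobs = json.dumps([
    {"jobId": "job-1", "status": "Success", "description": "Create Backup for db orcl"},
    {"jobId": "job-2", "status": "Failure", "description": "Create Backup for db orcl"},
    {"jobId": "job-3", "status": "Failure", "description": "Create Backup for db test"},
])

job = last_unsuccessful_job(sample_jobs, "ORCL")
print(job["jobId"] if job else "no unsuccessful job found")
```

The returned job ID is the value to pass to dbcli describe-job -i <job_ID> -j.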
Database Service Agent Issues
Your Oracle Cloud Infrastructure Database uses an agent framework that lets you manage your database through the cloud platform. Occasionally, to resolve a backup failure, you might need to restart the dcsagent program if it has the stop/waiting status.
-
From a command prompt, check the status of the agent:
initctl status initdcsagent
-
If the agent is in the stop/waiting state, try to restart the agent:
initctl start initdcsagent
-
Check the status of the agent again to confirm that it has the start/running status:
initctl status initdcsagent
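As an illustration of how the status line can be interpreted, the following Python sketch splits the text printed by initctl status into its job name, goal, and state. It assumes the usual Upstart layout of "<job> <goal>/<state>[, process <pid>]"; the sample strings and the process ID are made up for the example.

```python
def parse_agent_status(output):
    """Split an Upstart status line into (job, goal, state)."""
    job, _, rest = output.strip().partition(" ")
    # Drop the optional ", process <pid>" suffix before splitting goal/state.
    goal, _, state = rest.split(",")[0].partition("/")
    return job, goal, state


def agent_needs_restart(output):
    """True when the agent reports the stop/waiting status."""
    _, goal, state = parse_agent_status(output)
    return (goal, state) == ("stop", "waiting")


print(agent_needs_restart("initdcsagent stop/waiting"))
print(agent_needs_restart("initdcsagent start/running, process 4123"))
```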
Oracle Clusterware Issues
Oracle Clusterware enables servers to communicate with each other so that they can function as a collective unit. Occasionally you might need to restart the Clusterware program to resolve a backup failure.
-
From a command prompt, check the status of Oracle Clusterware:
crsctl check crs
crsctl stat res -t
-
If Oracle Clusterware is not online, try to restart the program:
crsctl start crs
-
Check the status of Oracle Clusterware to confirm that it is online:
crsctl check crs
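If you capture the output of crsctl check crs in a script, a check like the following Python sketch can flag any service line that does not report "is online". The sample messages are illustrative and may differ from what your Clusterware version prints.

```python
def offline_crs_services(check_output):
    """Return the CRS message lines that do not end with 'is online'."""
    problems = []
    for line in check_output.splitlines():
        line = line.strip()
        if line.startswith("CRS-") and not line.endswith("is online"):
            problems.append(line)
    return problems


# Illustrative samples of healthy and unhealthy output.
healthy_output = """CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online"""
down_output = "CRS-4639: Could not contact Oracle High Availability Services"

print(offline_crs_services(healthy_output))
print(offline_crs_services(down_output))
```

An empty list suggests that every reported service is online; any returned line is a candidate reason to try crsctl start crs.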
Object Store Connectivity Issues
Backing up your database to Oracle Cloud Infrastructure Object Storage requires that the host can connect to the applicable Swift endpoint. You can test this connectivity by using a Swift user.
- Create a Swift user in your tenancy. See Working with Auth Tokens.
-
With the user you created in the previous step, use the following command to verify the host can access the object store.
curl -v -X HEAD -u <user_ID>:'<auth_token>' https://swiftobjectstorage.<region_name>.oraclecloud.com/v1/<object_storage_namespace>
See Object Storage FAQ for the correct region to use. See Understanding Object Storage Namespaces for information about your Object Storage namespace.
-
If you cannot connect to the object store, refer to Back Up a Database to Object Storage for how to configure object store connectivity.
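The same connectivity test can be sketched in Python with the standard library. The function names below are hypothetical; the URL pattern and the user ID plus auth token credentials come from the curl command above, sent here as HTTP basic authentication in a HEAD request.

```python
import base64
import urllib.error
import urllib.request


def swift_url(region_name, namespace):
    """Build the Swift endpoint URL used in the curl command."""
    return f"https://swiftobjectstorage.{region_name}.oraclecloud.com/v1/{namespace}"


def basic_auth_header(user_id, auth_token):
    """Encode the credentials the way `curl -u user:token` does."""
    raw = f"{user_id}:{auth_token}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def can_reach_object_store(region_name, namespace, user_id, auth_token, timeout=10):
    """Send a HEAD request and report whether a 2xx response came back."""
    request = urllib.request.Request(swift_url(region_name, namespace), method="HEAD")
    request.add_header("Authorization", basic_auth_header(user_id, auth_token))
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Covers DNS failures, refused connections, timeouts, and HTTP errors.
        return False


print(swift_url("us-ashburn-1", "<object_storage_namespace>"))
```

A False result does not distinguish a network problem from bad credentials, so the verbose curl output remains the better diagnostic when the check fails.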
Host Issues
One or more of the following conditions on the database host can cause backups to fail:
Interactive Commands in the Oracle Profile
If an interactive command such as oraenv, or any command that might return an error or warning message, was added to the .bash_profile file for the grid or oracle user, Database service operations like automatic backups can be interrupted and fail to complete. Check the .bash_profile file for these commands, and remove them.
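A quick scan of the profile text can help locate such commands. The Python sketch below reports uncommented lines that contain a given pattern. Only oraenv comes from the text above; any other pattern you add is your own choice, and the helper name is hypothetical.

```python
def risky_profile_lines(profile_text, patterns=("oraenv",)):
    """Return (line_number, line) pairs for uncommented lines matching a pattern."""
    hits = []
    for number, line in enumerate(profile_text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # Blank lines and comments cannot run anything.
        if any(pattern in stripped for pattern in patterns):
            hits.append((number, stripped))
    return hits


# Made-up profile content for illustration.
sample_profile = """export PATH=$PATH:/usr/local/bin
. oraenv
# . oraenv
"""

print(risky_profile_lines(sample_profile))
```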
The File System Is Full
Backup operations require space in the /u01 directory on the host file system. Use the df -h command on the host to check the space available for backups. If the file system has insufficient space, you can remove old log or trace files to free up space.
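For scripted monitoring, a similar check can be sketched in Python with shutil.disk_usage. The 5 GiB threshold below is an arbitrary placeholder, not a documented requirement; choose a value that suits your backup sizes.

```python
import shutil


def free_space_report(path="/u01", min_free_bytes=5 * 1024**3):
    """Summarize disk usage for path and flag it when free space is low."""
    usage = shutil.disk_usage(path)
    percent_used = round(100 * usage.used / usage.total, 1) if usage.total else 0.0
    return {
        "path": path,
        "free_bytes": usage.free,
        "percent_used": percent_used,
        # min_free_bytes is an assumed threshold, not an Oracle-stated value.
        "ok": usage.free >= min_free_bytes,
    }


# "." is used here only so the example runs on any machine.
print(free_space_report("."))
```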
Incorrect Version of the Oracle Database Cloud Backup Module
Your system might not have the required version of the backup module (opc_installer.jar). See Unable to use Managed Backups in your DB System for details about this known issue. To fix the problem, you can follow the procedure in that section or simply update your DB system and database with the latest bundle patch.
Changes to the Site Profile File (glogin.sql)
Customizing the site profile file ($ORACLE_HOME/sqlplus/admin/glogin.sql) can cause managed backups to fail in Oracle Cloud Infrastructure. In particular, interactive commands can lead to backup failures. Oracle recommends that you not modify this file for databases hosted in Oracle Cloud Infrastructure.
Database Issues
An improper database state or configuration can lead to failed backups.
Database Not Running During Backup
The database must be active and running (ideally on all nodes) while the backup is in progress.
Use the following command to check the state of your database, and ensure that any problems that might have put the database in an improper state are resolved:
srvctl status database -d <db_unique_name> -verbose
The system returns a message including the database's instance status. The instance status must be Open for the backup to succeed. If the database is not running, use the following command to start it:
srvctl start database -d <db_unique_name> -o open
If the database is mounted but does not have the Open status, use the following commands to access the SQL*Plus command prompt and set the status to Open:
sqlplus / as sysdba
alter database open;
Archiving Mode Set to NOARCHIVELOG
When you provision a new database, the archiving mode is set to ARCHIVELOG by default. This is the required archiving mode for backup operations. Check the archiving mode setting for the database and change it to ARCHIVELOG, if applicable.
Open an SQL*Plus command prompt and enter the following command:
select log_mode from v$database;
If you need to set the archiving mode to ARCHIVELOG, start the database in Mount status (and not Open status), and use the following command at the SQL*Plus command prompt:
alter database archivelog;
Confirm that the db_recovery_file_dest parameter points to +RECO, and that the log_archive_dest_1 parameter is set to USE_DB_RECOVERY_FILE_DEST.
For RAC databases, one instance must have the Mount status when enabling archivelog mode. To enable archivelog mode for a RAC database, perform the following steps:
-
Shut down all database instances:
srvctl stop database -d <db_unique_name>
-
Start one of the database instances in mount state:
srvctl start instance -d <db_unique_name> -i <instance_name> -o mount
-
Access the SQL*Plus command prompt:
sqlplus / as sysdba
-
Enable archive log mode:
alter database archivelog;
exit;
-
Stop the database instance:
srvctl stop instance -d <db_unique_name> -i <instance_name>
-
Restart all database instances:
srvctl start database -d <db_unique_name>
-
At the SQL*Plus command prompt, confirm the archiving mode is set to ARCHIVELOG:
select log_mode from v$database;
Stuck Database Archiver Process and Backup Failures
Backups can fail when the database instance has a stuck archiver process. For example, this can happen when the flash recovery area (FRA) is full. You can check for this condition using the srvctl status database -db <db_unique_name> -v command. If the command returns the following output, you must resolve the stuck archiver process issue before backups can succeed:
Instance <instance_identifier> is running on node <node_identifier>. Instance status: Stuck Archiver
Refer to ORA-00257:Archiver Error (Doc ID 2014425.1) for information on resolving a stuck archiver process.
After resolving the stuck process, the command should return the following output:
Instance <instance_identifier> is running on node <node_identifier>. Instance status: Open
If the instance status does not change after you resolve the underlying issue with the device or resource being full or unavailable, try one of the following workarounds:
- Restart the database using the srvctl command to update the status of the database in the clusterware
- Upgrade the database to the latest patchset levels
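The instance status lines shown above follow a regular pattern, so a script can extract them. The Python sketch below maps each instance to its reported status and lists the instances that are not Open. The sample instance and node names are made up, and the optional trailing period is handled as a precaution.

```python
import re

INSTANCE_LINE = re.compile(
    r"Instance (?P<instance>\S+) is running on node (?P<node>\S+)\. "
    r"Instance status: (?P<status>.+)"
)


def instance_statuses(srvctl_output):
    """Map each instance name to the status reported by srvctl."""
    statuses = {}
    for line in srvctl_output.splitlines():
        match = INSTANCE_LINE.search(line)
        if match:
            statuses[match.group("instance")] = match.group("status").strip().rstrip(".")
    return statuses


def instances_blocking_backup(srvctl_output):
    """Return the instances whose status is anything other than Open."""
    return {
        name: status
        for name, status in instance_statuses(srvctl_output).items()
        if status != "Open"
    }


# Made-up two-node sample in the format shown above.
sample_status = (
    "Instance orcl1 is running on node node1. Instance status: Open.\n"
    "Instance orcl2 is running on node node2. Instance status: Stuck Archiver"
)

print(instances_blocking_backup(sample_status))
```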
Temporary Tablespace Errors
If fixed table statistics are not up to date on the database, backups can fail with errors in the dcs-agent.log file that reference the temporary tablespace. For example:
select status from v$rman_status where COMMAND_ID=<backup_id>
ERROR at line 1:
ORA-01652: unable to extend temp segment by 128 in tablespace TEMP
Gather your fixed table statistics as follows to resolve this issue:
conn / as sysdba
exec dbms_stats.gather_fixed_objects_stats();
RMAN Configuration and Backup Failures
Editing certain RMAN configuration parameters can lead to backup failures in Oracle Cloud Infrastructure. To check your RMAN configuration, use the show all command at the RMAN command line prompt.
The following list shows the RMAN configuration settings that should not be altered for databases in Oracle Cloud Infrastructure.
CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF 30 DAYS;
CONFIGURE CONTROLFILE AUTOBACKUP ON;
CONFIGURE DEVICE TYPE 'SBT_TAPE' PARALLELISM 5 BACKUP TYPE TO COMPRESSED BACKUPSET;
CONFIGURE CHANNEL DEVICE TYPE DISK MAXPIECESIZE 2 G;
CONFIGURE CHANNEL DEVICE TYPE 'SBT_TAPE' MAXPIECESIZE 2 G FORMAT '%d_%I_%U_%T_%t' PARMS 'SBT_LIBRARY=/opt/oracle/dcs/commonstore/pkgrepos/oss/odbcs/libopc.so ENV=(OPC_PFILE=/opt/oracle/dcs/commonstore/objectstore/opc_pfile/1578318329/opc_tiger_iad3c8.ora)';
CONFIGURE ARCHIVELOG DELETION POLICY TO BACKED UP 1 TIMES TO 'SBT_TAPE';
CONFIGURE ENCRYPTION FOR DATABASE ON;
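To compare a captured show all listing against the settings above, a sketch like the following Python function can report which expected lines are absent. It checks only three of the listed settings, since the channel lines contain host-specific paths, and the sample listing is made up.

```python
# Subset of the settings listed above that do not contain host-specific values.
REQUIRED_PREFIXES = (
    "CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF 30 DAYS",
    "CONFIGURE CONTROLFILE AUTOBACKUP ON",
    "CONFIGURE ENCRYPTION FOR DATABASE ON",
)


def missing_rman_settings(show_all_output):
    """Return the required settings that do not appear in the listing."""
    # Collapse repeated whitespace so spacing differences do not matter.
    lines = [" ".join(line.split()) for line in show_all_output.splitlines()]
    return [
        prefix for prefix in REQUIRED_PREFIXES
        if not any(line.startswith(prefix) for line in lines)
    ]


# Made-up listing in which the retention policy was changed.
sample_show_all = """CONFIGURE RETENTION POLICY TO REDUNDANCY 1;
CONFIGURE CONTROLFILE AUTOBACKUP ON;
CONFIGURE ENCRYPTION FOR DATABASE ON;
"""

print(missing_rman_settings(sample_show_all))
```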
RMAN Retention Policy and Backup Failures
The RMAN retention policy configuration can be the source of backup failures. Using the REDUNDANCY retention policy configuration instead of the RECOVERY WINDOW policy can lead to backup failures. Be sure to use the RECOVERY WINDOW OF 30 DAYS configuration.
-
Find the database ID using the following command:
dbcli list-databases
-
Find the BackupConfigId value for the database using the following command:
dbcli describe-database -i <database_id>
-
Update the retention policy configuration to RECOVERY WINDOW OF 30 DAYS:
dbcli update-backupconfig -i <backup_config_id> --recoverywindow 30
Loss of Object Store Wallet File and Backup Failures
RMAN backups fail when an object store wallet file is lost. The wallet file is necessary to enable connectivity to the object store.
-
Find the database ID using the following command:
dbcli list-databases
-
Find the BackupConfigId value for the database using the following command:
dbcli describe-database -i <database_id>
-
Find the BackupLocation value for the database using the following command:
dbcli describe-backupconfig -i <backup_config_id>
-
Find the file path of the backup config parameter file (opc_<backup_location_value>_BC.ora) using the following command:
locate opc_<backup_location_value>_BC.ora
For example:
[root@orcl 13aef284-9d6b-4eb6-8751-2988aexample]# locate opc_b9naijWMAXzi9example_BC.ora
/opt/oracle/dcs/commonstore/objectstore/opc_pfile/13aef284-9d6b-4eb6-8751-2988a9example/opc_b9naijWMAXzi9example_BC.ora
-
Find the file path to the wallet file in the backup config parameter file by inspecting the value stored in the OPC_WALLET parameter. To do this, navigate to the directory containing the backup config parameter file and use the following cat command:
cat <backup_config_parameter_file>
For example:
[root@orcl 13aef284-9d6b-4eb6-8751-2988aexample]# cat opc_b9naijWMAXzi9example_BC.ora
OPC_HOST=https://swiftobjectstorage.us-ashburn-1.oraclecloud.com/v1/dbbackupiad
OPC_WALLET='LOCATION=file:/opt/oracle/dcs/commonstore/objectstore/wallets/13aef284-9d6b-4eb6-8751-2988aexample CREDENTIAL_ALIAS=alias_opc'
OPC_CONTAINER=b9naijWMAXzi9example
-
Confirm that the cwallet.sso file exists in the directory specified in the OPC_WALLET parameter, and confirm that the file has the correct permissions. The file permissions should have the octal value of "600" (-rw-------). Use the following command:
ls -ltr /opt/oracle/dcs/commonstore/objectstore/wallets/<backup_config_id>
For example:
[root@orcl 13aef284-9d6b-4eb6-8751-2988aexample]# ls -ltr /opt/oracle/dcs/commonstore/objectstore/wallets/13aef284-9d6b-4eb6-8751-2988aexample
total 4
-rw------- 1 oracle oinstall 0 Apr 20 06:45 cwallet.sso.lck
-rw------- 1 oracle oinstall 1941 Apr 20 06:45 cwallet.sso
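The last two steps can be sketched in Python: read the wallet directory from the OPC_WALLET parameter, then confirm that cwallet.sso exists with mode 600. The function names are hypothetical, and the regular expression assumes the parameter is written as in the example above.

```python
import os
import re
import stat


def wallet_dir_from_pfile(pfile_text):
    """Extract the directory named in OPC_WALLET='LOCATION=file:<dir> ...'."""
    match = re.search(r"OPC_WALLET='LOCATION=file:(\S+)", pfile_text)
    return match.group(1) if match else None


def mode_is_600(st_mode):
    """True when the permission bits equal octal 600 (-rw-------)."""
    return stat.S_IMODE(st_mode) == 0o600


def wallet_problems(wallet_dir):
    """Return a list of problems found with cwallet.sso in wallet_dir."""
    path = os.path.join(wallet_dir, "cwallet.sso")
    if not os.path.isfile(path):
        return [f"{path} is missing"]
    st_mode = os.stat(path).st_mode
    if not mode_is_600(st_mode):
        return [f"{path} has mode {stat.S_IMODE(st_mode):o}, expected 600"]
    return []


# Parameter file content copied from the example above.
sample_pfile = (
    "OPC_HOST=https://swiftobjectstorage.us-ashburn-1.oraclecloud.com/v1/dbbackupiad\n"
    "OPC_WALLET='LOCATION=file:/opt/oracle/dcs/commonstore/objectstore/wallets/"
    "13aef284-9d6b-4eb6-8751-2988aexample CREDENTIAL_ALIAS=alias_opc'\n"
    "OPC_CONTAINER=b9naijWMAXzi9example\n"
)

print(wallet_dir_from_pfile(sample_pfile))
```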
TDE Wallet and Backup Failures
Incorrect TDE Wallet Location Specification
For backup operations to work, the $ORACLE_HOME/network/admin/sqlnet.ora file must contain the ENCRYPTION_WALLET_LOCATION parameter formatted exactly as follows:
ENCRYPTION_WALLET_LOCATION=(SOURCE=(METHOD=FILE)(METHOD_DATA=(DIRECTORY=/opt/oracle/dcs/commonstore/wallets/tde/$ORACLE_UNQNAME)))
In this wallet location entry, $ORACLE_UNQNAME is an environment variable and should not be replaced with an actual value.
Use the cat command to check the TDE wallet location specification. For example:
[oracle@orcl tde]$ cat $ORACLE_HOME/network/admin/sqlnet.ora
ENCRYPTION_WALLET_LOCATION=(SOURCE=(METHOD=FILE)(METHOD_DATA=(DIRECTORY=/opt/oracle/dcs/commonstore/wallets/tde/$ORACLE_UNQNAME)))
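Because the entry must match exactly, a strict line comparison is a reasonable way to script this check. The Python sketch below looks for a line in the sqlnet.ora text that equals the required entry; the helper name and the sample with a substituted database name are illustrative.

```python
EXPECTED_ENTRY = (
    "ENCRYPTION_WALLET_LOCATION=(SOURCE=(METHOD=FILE)(METHOD_DATA="
    "(DIRECTORY=/opt/oracle/dcs/commonstore/wallets/tde/$ORACLE_UNQNAME)))"
)


def has_expected_wallet_location(sqlnet_text):
    """True when some line equals the required entry, ignoring outer whitespace."""
    return any(line.strip() == EXPECTED_ENTRY for line in sqlnet_text.splitlines())


# A made-up file in which $ORACLE_UNQNAME was replaced with a literal name.
bad_sqlnet = EXPECTED_ENTRY.replace("$ORACLE_UNQNAME", "example_iadxyz")

print(has_expected_wallet_location(EXPECTED_ENTRY + "\n"))
print(has_expected_wallet_location(bad_sqlnet + "\n"))
```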
Incorrect State of the TDE Wallet
Database backups fail if the TDE wallet is not in the proper state. The following scenarios can cause this problem:
If the database was started using SQL*Plus, and the ORACLE_UNQNAME environment variable was not set, the wallet is not opened correctly.
To fix the problem, start the database using the srvctl utility:
srvctl start database -d <db_unique_name>
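The STATUS column in the v$encryption_wallet view shows the value OPEN_NO_MASTER_KEY, which means that the wallet is open but no master encryption key is set for that container (for example, a PDB).
To check the master encryption key status and create a master key, do the following: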
-
Review the STATUS column in the v$encryption_wallet view, as shown in the following example:
SQL> alter session set container=pdb2;
Session altered.
SQL> select WRL_TYPE,WRL_PARAMETER,STATUS,WALLET_TYPE from v$encryption_wallet;
WRL_TYPE  WRL_PARAMETER                                            STATUS              WALLET_TYPE
--------  -------------------------------------------------------  ------------------  -----------
FILE      /opt/oracle/dcs/commonstore/wallets/tde/example_iadxyz/  OPEN_NO_MASTER_KEY  AUTOLOGIN
-
Confirm that the PDB is in READ WRITE open mode and is not restricted, as shown in the following example:
SQL> show pdbs
CON_ID  CON_NAME  OPEN MODE   RESTRICTED
------  --------  ----------  ----------
2       PDB$SEED  READ ONLY   NO
3       PDB1      READ WRITE  NO
4       PDB2      READ WRITE  NO
The PDB cannot be open in restricted mode (the RESTRICTED column must show NO). If the PDB is currently in restricted mode, review the information in the PDB_PLUG_IN_VIOLATIONS view and resolve the issue before continuing. For more information on the PDB_PLUG_IN_VIOLATIONS view and the restricted status, review the documentation on pluggable databases for your Oracle Database version.
-
Run the following DBCLI commands to change the status to OPEN:
$ sudo su -
# dbcli list-databases
# dbcli update-tdekey -i <database_ID> -n <PDB_name> -p
The update-tdekey command shown will prompt you for the admin password.
-
Confirm that the status of the wallet has changed from OPEN_NO_MASTER_KEY to OPEN by querying the v$encryption_wallet view as shown in step 1.
Incorrect Configuration Related to the TDE Wallet
Several configuration parameters related to the TDE wallet can cause backups to fail.
Other Causes of Backup Failures
Unmounted Commonstore Mount Point
The mount point /opt/oracle/dcs/commonstore must be mounted, or backups will fail.
Confirm that the mount point /opt/oracle/dcs/commonstore is mounted, as shown in the following example:
[root@orcl ~]# srvctl config filesystem -volume commonstore -diskgroup data
Volume device: /dev/asm/commonstore-5
Diskgroup name: data
Volume name: commonstore
Canonical volume device: /dev/asm/commonstore-5
Accelerator volume devices:
Mountpoint path: /opt/oracle/dcs/commonstore
Mount point owner: oracle
Mount users:
Type: ACFS
The state for ora.data.commonstore.acfs must be online, or backups will fail. Confirm as shown in the following example:
[root@orcl ~]# crsctl stat resource ora.data.commonstore.acfs -v
NAME=ora.data.commonstore.acfs
TYPE=ora.acfs.type
LAST_SERVER=orcl
STATE=OFFLINE
TARGET=OFFLINE
...
STATE_DETAILS=admin unmounted /opt/oracle/dcs/commonstore
...
[root@orcl ~]# ls -ltr /opt/oracle/dcs/commonstore
total 0
If the STATE_DETAILS value is unmounted, mount the file system as shown in the following example:
[root@orcl ~]# srvctl start filesystem -volume commonstore -diskgroup data
Confirm that the change was successful as shown in the following example:
[root@orcl ~]# crsctl stat resource ora.data.commonstore.acfs -v
NAME=ora.data.commonstore.acfs
TYPE=ora.acfs.type
LAST_SERVER=orcl
STATE=ONLINE on orcl
TARGET=ONLINE
CARDINALITY_ID=ONLINE
...
STATE_DETAILS=mounted on /opt/oracle/dcs/commonstore
List the contents of the commonstore directory to confirm that it is mounted, as shown in the following example:
[root@orcl ~]# ls -ltr /opt/oracle/dcs/commonstore
total 220
drwx------ 2 root root 65536 Apr 18 10:50 lost+found
drwx------ 3 oracle oinstall 20480 Apr 18 11:02 wallets
drwxr-xr-x 3 root root 20480 Apr 20 06:41 pkgrepos
drwxr-xr-x 4 oracle oinstall 20480 Apr 20 06:41 objectstore
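The crsctl stat resource output above is a list of KEY=VALUE lines, which makes it easy to check in a script. The following Python sketch parses those lines and reports whether the resource looks mounted. The helper names are hypothetical, and the two samples are abridged from the examples above.

```python
def parse_resource_attributes(crsctl_output):
    """Collect the KEY=VALUE lines of `crsctl stat resource ... -v` into a dict."""
    attributes = {}
    for line in crsctl_output.splitlines():
        key, separator, value = line.strip().partition("=")
        if separator:  # Skip prompts, "..." lines, and anything without "=".
            attributes[key] = value
    return attributes


def commonstore_is_mounted(crsctl_output):
    """True when STATE is ONLINE and STATE_DETAILS reports a mounted path."""
    attributes = parse_resource_attributes(crsctl_output)
    return (
        attributes.get("STATE", "").startswith("ONLINE")
        and attributes.get("STATE_DETAILS", "").startswith("mounted on")
    )


offline_sample = """NAME=ora.data.commonstore.acfs
STATE=OFFLINE
TARGET=OFFLINE
STATE_DETAILS=admin unmounted /opt/oracle/dcs/commonstore"""

online_sample = """NAME=ora.data.commonstore.acfs
STATE=ONLINE on orcl
TARGET=ONLINE
STATE_DETAILS=mounted on /opt/oracle/dcs/commonstore"""

print(commonstore_is_mounted(offline_sample))
print(commonstore_is_mounted(online_sample))
```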
The Database Is Not Properly Registered
Database backups fail if the database is not registered with the dcs-agent. This scenario can occur if you manually migrate the database to Oracle Cloud Infrastructure and do not run the dbcli register-database command.
To check whether the database is properly registered, review the information returned by running the srvctl config database command and the dbcli list-databases command. If either command does not return a record of the database, contact Oracle Support Services.
For instructions on how to register the database, refer to the Oracle Cloud Infrastructure documentation for the dbcli register-database command.
Getting Help
If you were unable to resolve the problem using the information in this topic, follow the procedures below to collect relevant database and diagnostic information. After you have collected this information, contact Oracle Support.
Use the following commands to collect details about your database. Record the output of each command for reference:
dbcli list-databases
dbcli describe-database -i <database_id>
dbcli describe-component
-
Log on to the host as the root user and navigate to the /opt/oracle/dcs/bin/ directory.
-
Run the following two commands to generate information about the failed job:
dbcli list-jobs | grep -i <dbname>
dbcli describe-job -i <job_ID> -j
The <job_ID> in the second command should be the ID of the latest failed job reported from the first command.
-
Run the diagnostics collector script to create a zip file with the diagnostic information for Oracle Support Services.
diagcollector.py
This command creates a file named diagLogs-<timestamp>.zip in the /tmp directory.
To collect DCS agent log files, do the following:
- Log in as the opc user.
-
Run the following command:
sudo /opt/oracle/dcs/bin/diagcollector.py
-
The system returns a message indicating that agent logs are available in a zip file at a specified directory. For example:
[opc@prodpr ~]$ sudo /opt/oracle/dcs/bin/diagcollector.py
Log files collected to :/tmp/dcsdiag/diagLogs-1234567890.zip
Logs are being collected to: /tmp/dcsdiag/diagLogs-1234567890.zip
- Run the srvctl getenv database -d <db_unique_name> command and record the output for reference.
-
Record the output of the view v$encryption_wallet. For example:
SQL> select status, wrl_parameter,wallet_type from v$encryption_wallet;
STATUS  WRL_PARAMETER                                            WALLET_TYPE
------  -------------------------------------------------------  -----------
OPEN    /opt/oracle/dcs/commonstore/wallets/tde/example_iadxyz/  AUTOLOGIN
-
Record the output of the ls -ltr <wrl_parameter> command. For example:
[oracle@patchtst ~]$ ls -ltr /opt/oracle/dcs/commonstore/wallets/tde/example_iadxyz/
total 28
-rw------- 1 oracle asmadmin 2400 May 2 09:42 ewallet_2018050209420381_defaultTag.p12
-rw------- 1 oracle asmadmin 5680 May 2 09:42 ewallet.p12
-rw------- 1 oracle asmadmin 5723 May 2 09:42 cwallet.sso
Generate an RMAN backup report file using the following command:
dbcli create-rmanbackupreport -i <db_id> -w detailed -rn <report_name>
For example:
[root@patchtst ~]# dbcli create-rmanbackupreport -i 57fvwxyz-9dc4-45d3-876b-5f850example -w detailed -rn bkpreport1
Locate the report file using the dbcli describe-rmanbackupreport -in <report_name> command. The location of the report is given in the output. For example:
[root@patchtst ~]# dbcli describe-rmanbackupreport -in bkpreport1
Backup Report details
----------------------------------------------------------------
ID: b55vwxyz-c49f-4af3-a956-acccdexample
Report Type: detailed
Location: Node patchtst: /opt/oracle/dcs/log/patchtst/rman/bkup/example_iadxyz/rman_list_backup_detail/2018-05-02/rman_list_backup_detail_2018-05-02_11-46-51.0359.log
Database ID: 57fvwxyz-9dc4-45d3-876b-5f850example
CreatedTime: May 2, 2018 11:46:38 AM UTC
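When collecting files for a support request, the report path can be pulled out of this output with a short script. The Python sketch below extracts the node name and file path from the Location line; it assumes the "Location: Node <node>: <path>" layout shown in the example, and the helper name is hypothetical.

```python
import re


def report_location(describe_output):
    """Return (node, path) from the Location line, or None if it is absent."""
    match = re.search(
        r"^\s*Location:\s*Node\s+(\S+):\s*(\S+)", describe_output, re.MULTILINE
    )
    return (match.group(1), match.group(2)) if match else None


# Abridged from the example output above.
sample_report = """Backup Report details
ID: b55vwxyz-c49f-4af3-a956-acccdexample
Report Type: detailed
Location: Node patchtst: /opt/oracle/dcs/log/patchtst/rman/bkup/example_iadxyz/rman_list_backup_detail/2018-05-02/rman_list_backup_detail_2018-05-02_11-46-51.0359.log
Database ID: 57fvwxyz-9dc4-45d3-876b-5f850example"""

print(report_location(sample_report))
```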