Backup Failures on Bare Metal and Virtual Machine DB Systems

Database backups can fail for various reasons. Typically, a backup fails because either the database host cannot access the object store, or there are problems on the host or with the database configuration.

This topic includes information to help you determine the cause of a failure and fix the problem. The troubleshooting information is organized into sections based on the error condition. If you already know the cause, you can skip to the section with the suggested solution. Otherwise, use the procedure in Finding the Problem to get started.

Finding the Problem

In the Console, a failed database backup either displays a status of Failed or hangs in the Backup in Progress or Creating state. If the error message does not contain enough information to point you to a solution, you can use the database CLI and log files to gather more data. Then, refer to the applicable section in this topic for a solution.

To identify the root cause of the backup failure
  1. Log on to the host as the root user and navigate to /opt/oracle/dcs/bin/.

  2. Determine the sequence of operations performed on the database.

    dbcli list-jobs | grep -i <dbname>

    Note the last job ID listed with a status other than Success.

  3. With the job ID you noted from the previous step, use the following command to check the details of that job:

    dbcli describe-job -i <job_ID> -j

    Typically, running this command is enough to reveal the root cause of the failure.

  4. If you require more information, review the /opt/oracle/dcs/log/dcs-agent.log file.

    You can find the job ID in this file by using the timestamp returned by the job report in step 2.

  5. If the problem details suggest an RMAN issue, review the RMAN logs in the /opt/oracle/dcs/log/<hostname>/rman/bkup/<db_unique_name>/rman_backup/<yyyy-mm-dd> directory.

    Note

    If the database failure is on a 2-node RAC database, perform steps 3 and 4 on both nodes.
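The lookup in step 2 can be scripted. The following is a minimal sketch, not part of the Database service tooling. It assumes that each job row printed by dbcli list-jobs starts with the job ID and ends with the status, and that header and separator rows can be recognized by their first field. The function name last_failed_job_id is hypothetical.

```shell
#!/bin/sh
# Print the ID of the last listed job whose status is not "Success".
# Assumption: every job row starts with the job ID and ends with the status.
# Header rows (first field "ID"), dashed separator rows, and blank lines are skipped.
last_failed_job_id() {
    awk '
        $1 == "ID" || $1 ~ /^-+$/ || NF < 2 { next }
        $NF != "Success" { id = $1 }
        END { if (id != "") print id }
    '
}

# Hypothetical usage on the DB system host, as root:
#   cd /opt/oracle/dcs/bin
#   ./dbcli list-jobs | grep -i "$DBNAME" | last_failed_job_id
```

The function prints nothing when every listed job succeeded, so an empty result means there is no job to pass to dbcli describe-job.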

Database Service Agent Issues

Your Oracle Cloud Infrastructure Database uses an agent framework that lets you manage your database through the cloud platform. Occasionally, to resolve a backup failure, you might need to restart the dcsagent program if it has the status stop/waiting.

To restart the database service agent
  1. From a command prompt, check the status of the agent:

    initctl status initdcsagent
  2. If the agent is in the stop/waiting state, try to restart the agent:

    initctl start initdcsagent
  3. Check the status of the agent again to confirm that it has the start/running status:

    initctl status initdcsagent
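The check-then-restart sequence above can be wrapped in a small script. This is a sketch under the assumption that the status text contains the stop/waiting or start/running strings named in this procedure. The function names agent_action and restart_agent_if_stopped are hypothetical.

```shell
#!/bin/sh
# Classify the text printed by `initctl status initdcsagent`.
# Prints "restart" for stop/waiting, "ok" for start/running, "unknown" otherwise.
agent_action() {
    case "$1" in
        *stop/waiting*)  echo restart ;;
        *start/running*) echo ok ;;
        *)               echo unknown ;;
    esac
}

# Hypothetical wrapper: start the agent only when it is stopped,
# then print the status again so that you can confirm the result.
restart_agent_if_stopped() {
    if [ "$(agent_action "$(initctl status initdcsagent)")" = "restart" ]; then
        initctl start initdcsagent
    fi
    initctl status initdcsagent
}
```

Keeping the classification separate from the restart makes it possible to review what the script would do before it changes anything on the host.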

Oracle Clusterware Issues

Oracle Clusterware enables servers to communicate with each other so that they can function as a collective unit. Occasionally you might need to restart the Clusterware program to resolve a backup failure.

To restart the Oracle Clusterware
  1. From a command prompt, check the status of Oracle Clusterware:

    crsctl check crs
    crsctl stat res -t
  2. If Oracle Clusterware is not online, try to restart the program:

    crsctl start crs
  3. Check the status of Oracle Clusterware to confirm that it is online:

    crsctl check crs

Object Store Connectivity Issues

Backing up your database to Oracle Cloud Infrastructure Object Storage requires that the host can connect to the applicable Swift endpoint. You can test this connectivity by using a Swift user.

To ensure your database host can connect to the object store
  1. Create a Swift user in your tenancy. See Working with Auth Tokens.
  2. With the user you created in the previous step, use the following command to verify the host can access the object store.

    curl -v -X HEAD -u <user_ID>:'<auth_token>' https://swiftobjectstorage.<region_name>.oraclecloud.com/v1/<object_storage_namespace>

    See Object Storage FAQ for the correct region to use. See Understanding Object Storage Namespaces for information about your Object Storage namespace.

  3. If you cannot connect to the object store, refer to Prerequisites for how to configure object store connectivity.
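If you run this check often, a small helper can assemble the endpoint URL and summarize the result. The following sketch builds the URL in the format shown in step 2. The mapping from HTTP status codes to messages is an assumption based on general HTTP semantics, not on Oracle documentation, and both function names are hypothetical.

```shell
#!/bin/sh
# Build the Swift endpoint URL used by the connectivity check.
# $1 = region name, $2 = Object Storage namespace
swift_url() {
    printf 'https://swiftobjectstorage.%s.oraclecloud.com/v1/%s\n' "$1" "$2"
}

# Summarize an HTTP status code (hypothetical helper; the mapping is an assumption).
# curl reports 000 when it receives no HTTP response at all.
describe_http_code() {
    case "$1" in
        2??)     echo "reachable" ;;
        401|403) echo "reachable, but credentials were rejected" ;;
        000)     echo "no HTTP response (network or DNS problem)" ;;
        *)       echo "unexpected status $1" ;;
    esac
}

# Hypothetical usage (USER_ID, AUTH_TOKEN, REGION, and NAMESPACE are placeholders you set):
#   code=$(curl -s -o /dev/null -I -w '%{http_code}' \
#          -u "$USER_ID:$AUTH_TOKEN" "$(swift_url "$REGION" "$NAMESPACE")")
#   describe_http_code "$code"
```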

Host Issues

One or more of the following conditions on the database host can cause backups to fail:

Interactive Commands in the Oracle Profile

If an interactive command such as oraenv, or any command that might return an error or warning message, was added to the .bash_profile file for the grid or oracle user, Database service operations like automatic backups can be interrupted and fail to complete. Check the .bash_profile file for these commands, and remove them.

The File System Is Full

Backup operations require space in the /u01 directory on the host file system. Use the df -h command on the host to check the space available for backups. If the file system has insufficient space, you can remove old log or trace files to free up space.
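The space check can also be scripted. The sketch below reads the output of df -P, which prints one line per file system in the POSIX format, and extracts the usage percentage. The 90 percent default threshold is an arbitrary example value, not an Oracle recommendation, and the function names are hypothetical.

```shell
#!/bin/sh
# Read `df -P <path>` output on stdin and print the usage percentage as digits only.
# In the POSIX format, the fifth column of the second line is the capacity, such as "98%".
used_pct() {
    awk 'NR == 2 { sub(/%$/, "", $5); print $5 }'
}

# Succeed when usage is at or above the threshold.
# $1 = usage percentage, $2 = optional threshold (defaults to 90, an example value)
is_nearly_full() {
    [ "$1" -ge "${2:-90}" ]
}

# Hypothetical usage:
#   pct=$(df -P /u01 | used_pct)
#   is_nearly_full "$pct" && echo "/u01 is ${pct}% full"
```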

Incorrect Version of the Oracle Database Cloud Backup Module

Your system might not have the required version of the backup module (opc_installer.jar). See Unable to use Managed Backups in your DB System for details about this known issue. To fix the problem, you can follow the procedure in that section or simply update your DB system and database with the latest bundle patch.

Changes to the Site Profile File (glogin.sql)

Customizing the site profile file ($ORACLE_HOME/sqlplus/admin/glogin.sql) can cause managed backups to fail in Oracle Cloud Infrastructure. In particular, interactive commands can lead to backup failures. Oracle recommends that you not modify this file for databases hosted in Oracle Cloud Infrastructure.

Database Issues

An improper database state or configuration can lead to failed backups.

Database Not Running During Backup

The database must be active and running (ideally on all nodes) while the backup is in progress.

To check that the database is active and running

Use the following command to check the state of your database, and ensure that any problems that might have put the database in an improper state are resolved:

srvctl status database -d <db_unique_name> -verbose

The system returns a message including the database's instance status. The instance status must be Open for the backup to succeed. If the database is not running, use the following command to start it:

srvctl start database -d <db_unique_name> -o open

If the database is mounted but does not have the Open status, use the following commands to access the SQL*Plus command prompt and set the status to Open:

sqlplus / as sysdba
alter database open;

Archiving Mode Set to NOARCHIVELOG

When you provision a new database, the archiving mode is set to ARCHIVELOG by default. This is the required archiving mode for backup operations. Check the archiving mode setting for the database and change it to ARCHIVELOG, if applicable.

To check and set the archiving mode

Open an SQL*Plus command prompt and enter the following command:

select log_mode from v$database;

If you need to set the archiving mode to ARCHIVELOG, start the database in Mount status (and not Open status), and use the following command at the SQL*Plus command prompt:

alter database archivelog;

Confirm that the db_recovery_file_dest parameter points to +RECO, and that the log_archive_dest_1 parameter is set to USE_DB_RECOVERY_FILE_DEST.

For RAC databases, one instance must have the Mount status when enabling archivelog mode. To enable archivelog mode for a RAC database, perform the following steps:

  1. Shut down all database instances:

    srvctl stop database -d <db_unique_name>
  2. Start one of the database instances in mount state:

    srvctl start instance -d <db_unique_name> -i <instance_name> -o mount
  3. Access the SQL*Plus command prompt:

    sqlplus / as sysdba
  4. Enable archive log mode:

    alter database archivelog;
    exit;
  5. Stop the database instance:

    srvctl stop instance -d <db_unique_name> -i <instance_name>
  6. Restart all database instances:

    srvctl start database -d <db_unique_name>
  7. At the SQL*Plus command prompt, confirm the archiving mode is set to ARCHIVELOG:

    select log_mode from v$database;

Stuck Database Archiver Process and Backup Failures

Backups can fail when the database instance has a stuck archiver process. For example, this can happen when the flash recovery area (FRA) is full. You can check for this condition using the srvctl status database -db <db_unique_name> -v command. If the command returns the following output, you must resolve the stuck archiver process issue before backups can succeed:

Instance <instance_identifier> is running on node <node_identifier>. Instance status: Stuck Archiver

Refer to ORA-00257:Archiver Error (Doc ID 2014425.1) for information on resolving a stuck archiver process.

After resolving the stuck process, the command should return the following output:

Instance <instance_identifier> is running on node <node_identifier>. Instance status: Open

If the instance status does not change after you resolve the underlying issue with the device or resource being full or unavailable, try one of the following workarounds:

  • Restart the database using the srvctl command to update the status of the database in the clusterware
  • Upgrade the database to the latest patchset levels
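To check several instances at once, you can filter the status output for the Stuck Archiver state. This sketch assumes the output lines have the form shown above, with the instance identifier as the second word. The function name stuck_archiver_instances is hypothetical.

```shell
#!/bin/sh
# Read `srvctl status database ... -v` output on stdin and print the identifier
# of each instance whose status is "Stuck Archiver".
# The exit status is 0 when at least one such instance is found, and 1 otherwise.
stuck_archiver_instances() {
    awk '/Instance status: Stuck Archiver/ { print $2; found = 1 } END { exit !found }'
}

# Hypothetical usage:
#   srvctl status database -db "$DB_UNIQUE_NAME" -v | stuck_archiver_instances
```

Because the function returns a nonzero status when no instance is stuck, it can be used directly in an if statement before you start a backup.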

Temporary Tablespace Errors

If fixed table statistics are not up to date on the database, backups can fail with errors in the dcs-agent.log file that reference the temporary tablespace. For example:


select status from v$rman_status where COMMAND_ID=<backup_id>

ERROR at line 1:
ORA-01652: unable to extend temp segment by 128 in tablespace TEMP

Gather your fixed table statistics as follows to resolve this issue:


conn / as sysdba
exec dbms_stats.gather_fixed_objects_stats();

RMAN Configuration and Backup Failures

Editing certain RMAN configuration parameters can lead to backup failures in Oracle Cloud Infrastructure. To check your RMAN configuration, use the show all command at the RMAN command line prompt.

See the following list of parameters for details about the RMAN configuration settings that should not be altered for databases in Oracle Cloud Infrastructure.

RMAN configuration settings that should not be altered

CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF 30 DAYS;
CONFIGURE CONTROLFILE AUTOBACKUP ON;
CONFIGURE DEVICE TYPE 'SBT_TAPE' PARALLELISM 5 BACKUP TYPE TO COMPRESSED BACKUPSET;
CONFIGURE CHANNEL DEVICE TYPE DISK MAXPIECESIZE 2 G;
CONFIGURE CHANNEL DEVICE TYPE 'SBT_TAPE' MAXPIECESIZE 2 G FORMAT   '%d_%I_%U_%T_%t' PARMS  'SBT_LIBRARY=/opt/oracle/dcs/commonstore/pkgrepos/oss/odbcs/libopc.so ENV=(OPC_PFILE=/opt/oracle/dcs/commonstore/objectstore/opc_pfile/1578318329/opc_tiger_iad3c8.ora)';
CONFIGURE ARCHIVELOG DELETION POLICY TO BACKED UP 1 TIMES TO 'SBT_TAPE';
CONFIGURE ENCRYPTION FOR DATABASE ON;

RMAN Retention Policy and Backup Failures

The RMAN retention policy configuration can be the source of backup failures. Using the REDUNDANCY retention policy configuration instead of the RECOVERY WINDOW policy can lead to backup failures. Be sure to use the RECOVERY WINDOW OF 30 DAYS configuration.

To configure the RMAN retention policy setting
  1. Find the database ID using the following command:

    dbcli list-databases
  2. Find the BackupConfigId value for the database using the following command:

    dbcli describe-database -i <database_id>
  3. Update the retention policy configuration to RECOVERY WINDOW OF 30 DAYS:

    dbcli update-backupconfig -i <backup_config_id> --recoverywindow 30
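Before you change the setting, you might want to confirm which retention policy is currently in effect. The following sketch reads the output of the RMAN show all command and extracts the retention policy. It assumes the setting appears on a line that begins with CONFIGURE RETENTION POLICY TO, as in the list of RMAN settings earlier in this topic. The function names are hypothetical.

```shell
#!/bin/sh
# Read RMAN `show all` output on stdin and print the retention policy setting,
# for example "RECOVERY WINDOW OF 30 DAYS" or "REDUNDANCY 1".
retention_policy() {
    sed -n 's/^CONFIGURE RETENTION POLICY TO \(.*\);.*$/\1/p'
}

# Succeed when the given policy text is a RECOVERY WINDOW policy.
uses_recovery_window() {
    case "$1" in
        "RECOVERY WINDOW OF "*) return 0 ;;
        *)                      return 1 ;;
    esac
}

# Hypothetical usage, as the oracle user with the database environment set:
#   policy=$(echo 'show all;' | rman target / | retention_policy)
#   uses_recovery_window "$policy" || echo "Retention policy is: $policy"
```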

Loss of Object Store Wallet File and Backup Failures

RMAN backups fail when an object store wallet file is lost. The wallet file is necessary to enable connectivity to the object store.

To confirm that the object store wallet file exists and has the correct permissions
  1. Find the database ID using the following command:

    dbcli list-databases
  2. Find the BackupConfigId value for the database using the following command:

    dbcli describe-database -i <database_id>
  3. Find the BackupLocation value for the database using the following command:

    dbcli describe-backupconfig -i <backup_config_id>
  4. Find the file path of the backup config parameter file (opc_<backup_location_value>_BC.ora) using the following command:

    locate opc_<backup_location_value>_BC.ora

    For example:

    
    [root@orcl 13aef284-9d6b-4eb6-8751-2988aexample]# locate opc_b9naijWMAXzi9example_BC.ora
    /opt/oracle/dcs/commonstore/objectstore/opc_pfile/13aef284-9d6b-4eb6-8751-2988a9example/opc_b9naijWMAXzi9example_BC.ora
    						
  5. Find the file path to the wallet file in the backup config parameter file by inspecting the value stored in the OPC_WALLET parameter. To do this, navigate to the directory containing the backup config parameter file and use the following cat command:

    cat <backup_config_parameter_file>

    For example:

    
    [root@orcl 13aef284-9d6b-4eb6-8751-2988aexample]# cat opc_b9naijWMAXzi9example_BC.ora
    OPC_HOST=https://swiftobjectstorage.us-ashburn-1.oraclecloud.com/v1/dbbackupiad
    OPC_WALLET='LOCATION=file:/opt/oracle/dcs/commonstore/objectstore/wallets/13aef284-9d6b-4eb6-8751-2988aexample CREDENTIAL_ALIAS=alias_opc'
    OPC_CONTAINER=b9naijWMAXzi9example
  6. Confirm that the cwallet.sso file exists in the directory specified in the OPC_WALLET parameter, and confirm that the file has the correct permissions. The file permissions should have the octal value of "600" (-rw-------). Use the following command:

    ls -ltr /opt/oracle/dcs/commonstore/objectstore/wallets/<backup_config_id>

    For example:

    
    [root@orcl 13aef284-9d6b-4eb6-8751-2988aexample]# ls -ltr /opt/oracle/dcs/commonstore/objectstore/wallets/13aef284-9d6b-4eb6-8751-2988aexample
    total 4
    -rw------- 1 oracle oinstall    0 Apr 20 06:45 cwallet.sso.lck
    -rw------- 1 oracle oinstall 1941 Apr 20 06:45 cwallet.sso
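Steps 5 and 6 can be combined into a short script. This is a sketch that assumes the OPC_WALLET line has the form shown in the example above, with the directory following LOCATION=file: and ending at the next space. The function names wallet_dir and wallet_ok are hypothetical.

```shell
#!/bin/sh
# Extract the wallet directory from the OPC_WALLET line of the parameter file.
# $1 = path to the backup config parameter file (opc_<backup_location_value>_BC.ora)
wallet_dir() {
    sed -n "s/^[[:space:]]*OPC_WALLET='LOCATION=file:\([^ ']*\).*/\1/p" "$1"
}

# Succeed only if cwallet.sso exists in the directory and its permission
# string is -rw------- (octal 600).
# $1 = wallet directory
# Note: f and mode are global variables, because POSIX sh has no "local".
wallet_ok() {
    f="$1/cwallet.sso"
    [ -f "$f" ] || return 1
    mode=$(ls -ld "$f" | cut -c1-10)
    [ "$mode" = "-rw-------" ]
}

# Hypothetical usage:
#   dir=$(wallet_dir "$PARAMETER_FILE")
#   wallet_ok "$dir" || echo "cwallet.sso is missing or has the wrong permissions in $dir"
```

The permission comparison uses only the first ten characters of the ls output, so a trailing ACL or SELinux marker does not affect the result.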

TDE Wallet and Backup Failures

Incorrect TDE Wallet Location Specification

For backup operations to work, the $ORACLE_HOME/network/admin/sqlnet.ora file must contain the ENCRYPTION_WALLET_LOCATION parameter formatted exactly as follows:

ENCRYPTION_WALLET_LOCATION=(SOURCE=(METHOD=FILE)(METHOD_DATA=(DIRECTORY=/opt/oracle/dcs/commonstore/wallets/tde/$ORACLE_UNQNAME)))

Important

In this wallet location entry, $ORACLE_UNQNAME is an environment variable and should not be replaced with an actual value.

To check the TDE wallet location specification

Use the cat command to check the TDE wallet location specification. For example:


[oracle@orcl tde]$ cat $ORACLE_HOME/network/admin/sqlnet.ora
ENCRYPTION_WALLET_LOCATION=(SOURCE=(METHOD=FILE)(METHOD_DATA=(DIRECTORY=/opt/oracle/dcs/commonstore/wallets/tde/$ORACLE_UNQNAME)))
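Because the entry must match exactly, a fixed-string comparison is less error-prone than reading the file by eye. The following sketch checks for the entry as a whole line. It assumes the entry occupies a single line with no leading or trailing spaces. The variable EXPECTED_ENTRY and the function has_wallet_entry are hypothetical names.

```shell
#!/bin/sh
# The expected entry. Single quotes keep $ORACLE_UNQNAME literal,
# because the file must contain the variable name and not its value.
EXPECTED_ENTRY='ENCRYPTION_WALLET_LOCATION=(SOURCE=(METHOD=FILE)(METHOD_DATA=(DIRECTORY=/opt/oracle/dcs/commonstore/wallets/tde/$ORACLE_UNQNAME)))'

# Succeed if the file contains the expected entry as a complete line.
# -F treats the pattern as a fixed string, -x requires a whole-line match, -q suppresses output.
# $1 = path to sqlnet.ora
has_wallet_entry() {
    grep -Fxq -e "$EXPECTED_ENTRY" "$1"
}

# Hypothetical usage:
#   has_wallet_entry "$ORACLE_HOME/network/admin/sqlnet.ora" || echo "Wallet location entry not found"
```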

Incorrect State of the TDE Wallet

Database backups fail if the TDE wallet is not in the proper state. The following scenarios can cause this problem:

The ORACLE_UNQNAME environment variable was not set when the database was started using SQL*Plus

If the database was started using SQL*Plus, and the ORACLE_UNQNAME environment variable was not set, the wallet is not opened correctly.

To fix the problem, start the database using the srvctl utility:

srvctl start database -d <db_unique_name>

A pluggable database was added with an incorrectly configured master encryption key

In a multitenant environment for Oracle Database versions that support PDB-level keystore, each PDB has its own master encryption key. This encryption key is stored in a single keystore used by all containers. After you create or plug in a new PDB, you must create and activate a master encryption key for it. If you do not do so, the STATUS column in the v$encryption_wallet view shows the value OPEN_NO_MASTER_KEY.

To check the master encryption key status and create a master key, do the following:

  1. Review the STATUS column in the v$encryption_wallet view, as shown in the following example:

    
    SQL> alter session set container=pdb2;

    Session altered.

    SQL> select WRL_TYPE,WRL_PARAMETER,STATUS,WALLET_TYPE from v$encryption_wallet;

    WRL_TYPE        WRL_PARAMETER                                           STATUS             WALLET_TYPE
    --------------- ------------------------------------------------------- ------------------ ---------
    FILE            /opt/oracle/dcs/commonstore/wallets/tde/example_iadxyz/ OPEN_NO_MASTER_KEY AUTOLOGIN
  2. Confirm that the PDB is in READ WRITE open mode and is not restricted, as shown in the following example:

    
    SQL> show pdbs

    CON_ID CON_NAME     OPEN MODE              RESTRICTED
    ------ ------------ ---------------------- ---------------
    2      PDB$SEED     READ ONLY              NO
    3      PDB1         READ WRITE             NO
    4      PDB2         READ WRITE             NO

    The PDB cannot be open in restricted mode (the RESTRICTED column must show NO). If the PDB is currently in restricted mode, review the information in the PDB_PLUG_IN_VIOLATIONS view and resolve the issue before continuing. For more information on the PDB_PLUG_IN_VIOLATIONS view and the restricted status, review the documentation on pluggable databases for your Oracle Database version.

  3. Run the following DBCLI commands to change the status to OPEN:

    $ sudo su -
    # dbcli list-databases
    # dbcli update-tdekey -i <database_ID> -n <PDB_name> -p
    						

    The update-tdekey command shown will prompt you for the admin password.

  4. Confirm that the status of the wallet has changed from OPEN_NO_MASTER_KEY to OPEN by querying the v$encryption_wallet view as shown in step 1.

Incorrect Configuration Related to the TDE Wallet

Several configuration parameters related to the TDE wallet can cause backups to fail.

Other Causes of Backup Failures

Unmounted Commonstore Mount Point

The mount point /opt/oracle/dcs/commonstore must be mounted, or backups will fail.

To check the commonstore mount point

Confirm that the mount point /opt/oracle/dcs/commonstore is mounted, as shown in the following example:


[root@orcl ~]# srvctl config filesystem -volume commonstore -diskgroup data
Volume device: /dev/asm/commonstore-5
Diskgroup name: data
Volume name: commonstore
Canonical volume device: /dev/asm/commonstore-5
Accelerator volume devices:
Mountpoint path: /opt/oracle/dcs/commonstore
Mount point owner: oracle
Mount users:
Type: ACFS

To confirm that ora.data.commonstore.acfs is online

The state for ora.data.commonstore.acfs must be online, or backups will fail. Confirm as shown in the following example:


[root@orcl ~]# crsctl stat resource ora.data.commonstore.acfs -v
NAME=ora.data.commonstore.acfs
TYPE=ora.acfs.type
LAST_SERVER=orcl
STATE=OFFLINE
TARGET=OFFLINE
...
STATE_DETAILS=admin unmounted /opt/oracle/dcs/commonstore
...
[root@orcl ~]# ls -ltr /opt/oracle/dcs/commonstore
total 0

If the STATE_DETAILS value is unmounted, mount the file system as shown in the following example:

[root@orcl ~]# srvctl start filesystem -volume commonstore -diskgroup data	

Confirm that the change was successful as shown in the following example:


[root@orcl ~]# crsctl stat resource ora.data.commonstore.acfs -v
NAME=ora.data.commonstore.acfs
TYPE=ora.acfs.type
LAST_SERVER=orcl
STATE=ONLINE on orcl
TARGET=ONLINE
CARDINALITY_ID=ONLINE
...
STATE_DETAILS=mounted on /opt/oracle/dcs/commonstore

List the contents of the commonstore directory to confirm that it is mounted, as shown in the following example:


[root@orcl ~]# ls -ltr /opt/oracle/dcs/commonstore
total 220
drwx------ 2 root   root     65536 Apr 18 10:50 lost+found
drwx------ 3 oracle oinstall 20480 Apr 18 11:02 wallets
drwxr-xr-x 3 root   root     20480 Apr 20 06:41 pkgrepos
drwxr-xr-x 4 oracle oinstall 20480 Apr 20 06:41 objectstore
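The state check can be scripted by reading the STATE line from the crsctl output. This sketch assumes the output contains a line beginning with STATE=, as in the examples above. The function names resource_state and is_online are hypothetical.

```shell
#!/bin/sh
# Read `crsctl stat resource ora.data.commonstore.acfs -v` output on stdin
# and print the value of the STATE line, for example "ONLINE on orcl" or "OFFLINE".
# The STATE_DETAILS line is not matched, because the pattern requires "STATE=".
resource_state() {
    sed -n 's/^[[:space:]]*STATE=//p'
}

# Succeed when the given state value begins with ONLINE.
is_online() {
    case "$1" in
        ONLINE*) return 0 ;;
        *)       return 1 ;;
    esac
}

# Hypothetical usage, as root:
#   state=$(crsctl stat resource ora.data.commonstore.acfs -v | resource_state)
#   is_online "$state" || srvctl start filesystem -volume commonstore -diskgroup data
```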

The Database Is Not Properly Registered

Database backups fail if the database is not registered with the dcs-agent. This scenario can occur if you manually migrate the database to Oracle Cloud Infrastructure and do not run the dbcli register-database command.

To check whether the database is properly registered, review the information returned by running the srvctl config database command and the dbcli list-databases command. If either command does not return a record of the database, contact Oracle Support Services.

For instructions on how to register the database, refer to the following topics:

Getting Help

If you were unable to resolve the problem using the information in this topic, follow the procedures below to collect relevant database and diagnostic information. After you have collected this information, contact Oracle Support.

To collect database information for use in problem reports

Use the following commands to collect details about your database. Record the output of each command for reference:

dbcli list-databases
dbcli describe-database -i <database_id>
dbcli describe-component

To collect diagnostic information regarding failed jobs
  1. Log on to the host as the root user and navigate to the /opt/oracle/dcs/bin/ directory.

  2. Run the following two commands to generate information about the failed job:

    dbcli list-jobs | grep -i <dbname>
    dbcli describe-job -i <job_ID> -j

    The <job_ID> in the second command should be the ID of the latest failed job reported from the first command.

  3. Run the diagnostics collector script to create a zip file with the diagnostic information for Oracle Support Services.

    diagcollector.py

    This command creates a file named diagLogs-<timestamp>.zip in the /tmp directory.

To collect DCS agent log files

To collect DCS agent log files, do the following:

  1. Log in as the opc user.
  2. Run the following command:

    sudo /opt/oracle/dcs/bin/diagcollector.py
  3. The system returns a message indicating that agent logs are available in a zip file at a specified directory. For example:

    
    [opc@prodpr ~]$ sudo /opt/oracle/dcs/bin/diagcollector.py
    Log files collected to :/tmp/dcsdiag/diagLogs-1234567890.zip
    Logs are being collected to:
    /tmp/dcsdiag/diagLogs-1234567890.zip

To collect TDE configuration details
  1. Run the srvctl getenv database -d <db_unique_name> command and record the output for reference.
  2. Record the output of the view v$encryption_wallet. For example:

    
    SQL> select status, wrl_parameter,wallet_type from v$encryption_wallet;

    STATUS          WRL_PARAMETER                                           WALLET_TYPE
    --------------- ------------------------------------------------------- ---------
    OPEN            /opt/oracle/dcs/commonstore/wallets/tde/example_iadxyz/ AUTOLOGIN
  3. Record the output of the ls -ltr <wrl_parameter> command. For example:

    
    [oracle@patchtst ~]$ ls -ltr /opt/oracle/dcs/commonstore/wallets/tde/example_iadxyz/
    total 28
    -rw------- 1 oracle asmadmin 2400 May  2 09:42 ewallet_2018050209420381_defaultTag.p12
    -rw------- 1 oracle asmadmin 5680 May  2 09:42 ewallet.p12
    -rw------- 1 oracle asmadmin 5723 May  2 09:42 cwallet.sso

To collect the RMAN backup report file

Generate the RMAN backup report file using the following command:

dbcli create-rmanbackupreport -i <db_id> -w detailed -rn <report_name>

For example:

[root@patchtst ~]# dbcli create-rmanbackupreport -i 57fvwxyz-9dc4-45d3-876b-5f850example -w detailed -rn bkpreport1

Locate the report file using the dbcli describe-rmanbackupreport -in <report_name> command. The location of the report is given in the output. For example:


[root@patchtst ~]# dbcli describe-rmanbackupreport -in bkpreport1
Backup Report details
----------------------------------------------------------------
ID: b55vwxyz-c49f-4af3-a956-acccdexample
Report Type: detailed
Location: Node patchtst: /opt/oracle/dcs/log/patchtst/rman/bkup/example_iadxyz/rman_list_backup_detail/2018-05-02/rman_list_backup_detail_2018-05-02_11-46-51.0359.log
Database ID: 57fvwxyz-9dc4-45d3-876b-5f850example
CreatedTime: May 2, 2018 11:46:38 AM UTC