23 Troubleshooting RMAN Operations

Use RMAN message output and dynamic performance views to troubleshoot RMAN operations.

23.1 Interpreting RMAN Message Output

Recovery Manager provides detailed error messages that can aid in troubleshooting problems.

Also, Oracle Database and the third-party media vendors generate useful debugging output of their own. The following discussion explains how to identify and interpret the different errors that you may encounter.

23.1.1 Identifying Types of RMAN Message Output

Output that is useful for troubleshooting failed or unresponsive RMAN jobs is located in several different places.

The following table provides an overview of where to locate message output that can be used to troubleshoot RMAN backup problems.

Table 23-1 Types of Message Output

Type of Output Produced By Location Description

RMAN messages

RMAN

Completed job information is in V$RMAN_STATUS and RC_RMAN_STATUS. Current job information is in V$RMAN_OUTPUT.

When running RMAN from the command line, you can direct output to the following places:

  • Standard output

  • A log file specified by LOG on the command line or the SPOOL LOG command

  • A file created by redirecting RMAN output (for example, in UNIX, using the '>' operator)

Contains actions relevant to the RMAN job and error messages generated by RMAN, the database server, and the media vendor. RMAN error messages have an RMAN- prefix. Normal action descriptions do not have a prefix.

You can execute the following SQL statement to remove all entries from V$RMAN_STATUS:

update node set high_rsr_recid=0
where db_key = our_target_database_db_key ;

The preceding statement removes all job-related entries. No rows are visible until new backup jobs are shown in V$RMAN_BACKUP_JOB_DETAILS.

alert_SID.log

Oracle Database

The alert subdirectory of the Automatic Diagnostic Repository (ADR) home

Contains a chronological log of errors, initialization parameter settings, and administration operations. Records values for overwritten control file records.

Oracle trace file

Oracle Database

The trace subdirectory of the ADR home

Contains detailed output generated by Oracle Database processes. This file is created when an ORA-600 or ORA-3113 error message occurs, whenever RMAN cannot allocate a channel, and when the database fails to load the media management library.

sbtio.log

Third-party media management software

The trace subdirectory of the ADR home

Contains vendor-specific information written by the media management software. This log does not contain Oracle Database or RMAN errors.

Media manager log file

Third-party media management software

The file names for any media manager logs other than sbtio.log are determined by the media management software.

Contains information about the functioning of the media management device.

23.1.2 Troubleshooting Long-Running RMAN Operations

RMAN message output provides information about the progress of backup and recovery operations. Use this information to take any required actions to troubleshoot operations that are stuck or awaiting resources.

Certain operations such as backup, restore, recovery, and duplication for large databases typically take a long time to complete. However, it is not always clear whether the operation is progressing or waiting on some resources. Starting with Oracle Database Release 18c, RMAN message output contains additional logging information that indicates if a job is waiting on resources. Every 10 minutes, RMAN checks if there is a change in the number of blocks processed. If there is no change in the blocks processed, then RMAN displays a message with the associated wait event.

The following is an example of the RMAN output for a RESTORE operation:

allocated channel: c1
channel c1: SID=123 device type=SBT_TAPE
channel c1: WARNING: Oracle Test Disk API

Starting restore at 18-JAN-18

channel c1: starting datafile backup set restore
channel c1: specifying datafile(s) to restore from backup set
channel c1: restoring datafile 00002 to /ade/b/2776899351/oracle/dbs/tbs_ax1.f
channel c1: reading from backup piece 01sov1t4_1_1

***** Hang Detected ***** at 2018-01-18 04:11:23 for channel c1, INSTID: 1, SID: 123, serial: 35831
No change in read blocks, thus showing wait event[Total blocks = 192000, Blocks read/recovered = 41530]
Seq_No   Event                            Waiting Time(mirco secs)
602      Backup: MML read backup piece    38094371

***** Hang Detected ***** at 2018-01-18 04:11:33 for channel c1, INSTID: 1, SID: 123, serial: 35831
No change in read blocks, thus showing wait event[Total blocks = 192000, Blocks read/recovered = 41530]
Seq_No   Event                            Waiting Time(mirco secs)
602      Backup: MML read backup piece    48106104

channel c1: piece handle=01sov1t4_1_1 tag=TAG20180118T040804
channel c1: restored backup piece 1
channel c1: restore complete, elapsed time: 00:02:35
Finished restore at 18-JAN-18
released channel: c1

The output indicates that the restore was stuck because of a problem with a media manager read operation. After the read operation completed, the RMAN restore was successful.
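When restores run unattended, it can help to pull these stall reports out of a saved log. The following Python sketch is illustrative only: the names StallReport and parse_stall_reports are hypothetical, and the patterns assume that the report layout matches the sample output shown above, which may differ between releases.

```python
import re
from dataclasses import dataclass
from typing import List

# Banner printed when a channel has made no progress (layout assumed from the sample output).
BANNER = re.compile(
    r"\*+ Hang Detected \*+ at (?P<when>[\d-]+ [\d:]+) "
    r"for channel (?P<channel>\w+), INSTID: \d+, SID: (?P<sid>\d+)"
)
# Wait-event row: sequence number, event name, waiting time in microseconds.
EVENT_ROW = re.compile(r"^(\d+)\s+(.+?)\s{2,}(\d+)$")


@dataclass
class StallReport:
    when: str
    channel: str
    sid: int
    event: str
    waited_seconds: float


def parse_stall_reports(output: str) -> List[StallReport]:
    """Pair each 'Hang Detected' banner with the wait-event row that follows it."""
    reports = []
    banner = None
    for raw in output.splitlines():
        line = raw.strip()
        found = BANNER.search(line)
        if found:
            banner = found
            continue
        row = EVENT_ROW.match(line)
        if banner and row:
            reports.append(StallReport(
                when=banner["when"],
                channel=banner["channel"],
                sid=int(banner["sid"]),
                event=row.group(2),
                waited_seconds=int(row.group(3)) / 1_000_000,
            ))
            banner = None
    return reports


SAMPLE = """\
***** Hang Detected ***** at 2018-01-18 04:11:23 for channel c1, INSTID: 1, SID: 123, serial: 35831
No change in read blocks, thus showing wait event[Total blocks = 192000, Blocks read/recovered = 41530]
Seq_No   Event                            Waiting Time(mirco secs)
602      Backup: MML read backup piece    38094371
"""

for report in parse_stall_reports(SAMPLE):
    print(report.when, report.channel, report.event, round(report.waited_seconds, 1))
```

A channel that keeps reporting the same event with a growing wait time, as in the restore above, points at the resource named in the event rather than at RMAN itself.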

23.1.3 Recognizing RMAN Error Message Stacks

RMAN reports errors as they occur. If an error is not retryable, that is, if RMAN cannot perform failover to another channel to complete a particular job step, then RMAN also reports a summary of the errors after all job sets complete. This feature is known as deferred error reporting.

One way to determine whether RMAN encountered an error is to examine its return code. A second way is to search the RMAN output for the string RMAN-00569, which is the message number for the error stack banner. All RMAN errors are preceded by this error message. If you do not see an RMAN-00569 message in the output, then there are no errors.
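The banner search described above can be scripted against a saved log. This is a minimal Python sketch, not an official tool: the function name extract_error_stack is hypothetical, and the scan complements the return code check rather than replacing it.

```python
from typing import List

ERROR_BANNER = "RMAN-00569"   # error stack banner
SEPARATOR = "RMAN-00571"      # line of '=' characters printed around the banner


def extract_error_stack(output: str) -> List[str]:
    """Return the message lines after the first RMAN-00569 banner, or [] if there is none."""
    lines = output.splitlines()
    for index, line in enumerate(lines):
        if line.startswith(ERROR_BANNER):
            return [
                entry for entry in lines[index + 1:]
                if entry.strip() and not entry.startswith(SEPARATOR)
            ]
    return []


SAMPLE = """\
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-00558: error encountered while parsing input commands
RMAN-01007: at line 1 column 18 file: standard input
"""

stack = extract_error_stack(SAMPLE)
print("errors found" if stack else "no errors")
for entry in stack:
    print("  " + entry)
```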

Starting with Oracle Database 23ai, RMAN uses an in-memory tracing mechanism to diagnose intermittent, hard-to-diagnose RMAN failures. RMAN has the debug mode enabled by default, and creates an in-memory record of RMAN client and server process traces. When a failure occurs, RMAN dumps the client and server traces in separate files. The RMAN message output provides information about the location of the trace files using these error codes:
  • RMAN-01109

    RMAN client failures are logged as an RMAN-01109 message in the output. RMAN-01109 indicates the file name where the client diagnostics traces are logged.

  • RMAN-01110

    RMAN failures during the command execution phase are logged as an RMAN-01110 message in the output. RMAN-01110 indicates the file name where the server diagnostics traces are logged.

  • If the RMAN client fails to connect to a server, then the RMAN output includes the error code RMAN-01111, indicating that RMAN server traces were not dumped.

Example 23-1 RMAN Syntax Error

This example shows an RMAN syntax error. The RMAN-00569 message is followed by other error messages that indicate the reason for the error.

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-00558: error encountered while parsing input commands
RMAN-01005: syntax error: found ")": expecting one of: "archivelog, backup,
 backupset, controlfilecopy, current, database, datafile, datafilecopy, (, plus, ;, tablespace"
RMAN-01007: at line 1 column 18 file: standard input

Example 23-2 RMAN Error Indicating the Trace File Names

This example shows an RMAN error message with RMAN-01109 and RMAN-01110 error codes indicating the location of the RMAN client and server process trace files.

RMAN-00571: ===========================================================
RMAN-03002: failure of sql command at 06/08/2023 10:33:29
RMAN-03014: implicit resync of recovery catalog failed
RMAN-03009: failure of partial resync command on default channel at 06/08/2023 10:33:29
RMAN-20110: set_stamp set_count conflict
RMAN-01109: RMAN Client Diagnostic Trace file : 
/ade/b/368400417/oracle/log/diag/clients/user_name/host_name/trace/ora_570883_140446658743104.trc
RMAN-01110: RMAN Server Diagnostic Trace file : 
/ade/b/368400417/oracle/log/diag/rdbms/tst/tst/trace/tst_ora_570897.trc
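To collect the dumped traces automatically, a script can look for the RMAN-01109 and RMAN-01110 messages and read the path that follows each one. The Python sketch below is a hedged illustration: find_trace_files is a hypothetical helper, the sample paths are invented, and the code assumes that the path appears either after the final colon or on the next line.

```python
from typing import Dict

# Message numbers taken from the text above; the labels are this sketch's own.
TRACE_CODES = {"RMAN-01109": "client", "RMAN-01110": "server"}


def find_trace_files(output: str) -> Dict[str, str]:
    """Map 'client' and 'server' to the trace file path reported in the output."""
    found = {}
    lines = output.splitlines()
    for index, line in enumerate(lines):
        label = TRACE_CODES.get(line[:10])
        if label is None:
            continue
        # The path is either after the final ':' or on the following line.
        path = line.rpartition(":")[2].strip()
        if not path and index + 1 < len(lines):
            candidate = lines[index + 1].strip()
            if not candidate.startswith("RMAN-"):
                path = candidate
        if path:
            found[label] = path
    return found


# Hypothetical paths, for illustration only.
SAMPLE = """\
RMAN-20110: set_stamp set_count conflict
RMAN-01109: RMAN Client Diagnostic Trace file :
/tmp/example/clients/trace/ora_client_example.trc
RMAN-01110: RMAN Server Diagnostic Trace file :
/tmp/example/rdbms/trace/ora_server_example.trc
"""

for side, path in find_trace_files(SAMPLE).items():
    print(side, path)
```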

Note:

To avoid storage space related issues, you must ensure that the in-memory trace files are regularly purged. Use the Automatic Diagnostic Repository Command Interpreter (ADRCI) utility to set policies that control automatic purging of diagnostic trace files. See Oracle Database Utilities to learn how to purge diagnostic data.

23.1.4 Identifying RMAN Error Codes

You can use the error codes in RMAN message stacks to troubleshoot problems with RMAN commands.

Typically, you find the following types of error codes in RMAN message stacks:

  • Errors prefixed with RMAN-

    These are RMAN errors.

  • Errors prefixed with ORA-

    These are Oracle Database errors. Media manager errors also use the ORA- prefix because they are reported through ORA-19511.

  • Errors preceded by the line Additional information:

23.1.4.1 RMAN Error Message Numbers

RMAN error messages are prefixed with RMAN-.

The following table indicates the error ranges for common RMAN error messages, all of which are described in Oracle Database Error Messages Reference.

Table 23-2 RMAN Error Message Ranges

Error Range    Cause

01108-01112    Information about in-memory debug tracing of RMAN client and server processes
0550-0999      Command-line interpreter
1000-1999      Keyword analyzer
2000-2999      Syntax analyzer
3000-3999      Main layer
4000-4999      Services layer
5000-5499      Compilation of RESTORE or RECOVER command
5500-5999      Compilation of DUPLICATE command
6000-6999      General compilation
7000-7999      General execution
8000-8999      PL/SQL programs
9000-9999      Low-level keyword analyzer
10000-10999    Server-side execution
11000-11999    Interphase errors between PL/SQL and RMAN
12000-12999    Recovery catalog packages
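When triaging a long message stack, it can be convenient to label each RMAN- message with the component from Table 23-2. The following Python sketch is one way to do that; classify is a hypothetical name, and the sketch deliberately returns None for numbers that fall outside the listed ranges.

```python
import re
from typing import Optional

# (low, high, cause) rows copied from Table 23-2. The narrow tracing range is
# listed first because it falls inside the keyword analyzer range.
RANGES = [
    (1108, 1112, "In-memory debug tracing"),
    (550, 999, "Command-line interpreter"),
    (1000, 1999, "Keyword analyzer"),
    (2000, 2999, "Syntax analyzer"),
    (3000, 3999, "Main layer"),
    (4000, 4999, "Services layer"),
    (5000, 5499, "Compilation of RESTORE or RECOVER command"),
    (5500, 5999, "Compilation of DUPLICATE command"),
    (6000, 6999, "General compilation"),
    (7000, 7999, "General execution"),
    (8000, 8999, "PL/SQL programs"),
    (9000, 9999, "Low-level keyword analyzer"),
    (10000, 10999, "Server-side execution"),
    (11000, 11999, "Interphase errors between PL/SQL and RMAN"),
    (12000, 12999, "Recovery catalog packages"),
]


def classify(message: str) -> Optional[str]:
    """Return the Table 23-2 cause for an 'RMAN-nnnnn' message, or None if it is not listed."""
    match = re.match(r"RMAN-(\d+)", message)
    if not match:
        return None
    number = int(match.group(1))
    for low, high, cause in RANGES:
        if low <= number <= high:
            return cause
    return None


for line in ("RMAN-03002: failure of backup command", "RMAN-01109", "RMAN-20110"):
    print(line.split(":")[0], "->", classify(line))
```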

23.1.4.2 ORA-19511: Media Manager Errors

If a media manager error occurs, ORA-19511 is signaled, and the media manager is expected to provide RMAN a descriptive error. RMAN displays the error passed back to it by the media manager.

For example, you might see this:

ORA-19511: Error received from media manager layer, error text:
   sbtpvt_open_input: file .* does not exist or cannot be accessed, errno = 2

The message from the media manager should provide you with enough information to let you fix the root problem. If it does not, then refer to the documentation for your media manager or contact your media management vendor support representative for further information. ORA-19511 errors originate with the media manager, not with Oracle Database. The database just passes on the message from the media manager. The cause can be addressed only by the media management vendor.

If you are still using an SBT 1.1-compliant media management layer, you may see some additional error message text. Output from an SBT 1.1-compliant media management layer is similar to the following:

ORA-19507: failed to retrieve sequential file, handle="c-140148591-20031014-06", parms=""
ORA-27007: failed to open file
Additional information: 7000
Additional information: 2
ORA-19511: Error received from media manager layer, error text:
   SBT error = 7000, errno = 0, sbtopen: backup file not found

The "Additional information" lines use error codes specific to SBT 1.1. The values displayed correspond to the media manager message numbers and error text listed in Table 23-3. RMAN then signals the error again as ORA-19511 (Error received from media manager layer) and displays a general error message for the code returned by the media manager, including the SBT 1.1 error number.

The SBT 1.1 error messages are listed here for your reference. Table 23-3 lists media manager message numbers and their corresponding error text. In the error codes, O/S stands for operating system. The errors marked with an asterisk (*) are internal and are not typically seen during normal operation.

Table 23-3 Media Manager Error Message Ranges

Cause                 No.      Message

sbtopen               7000     Backup file not found (only returned for read)
                      7001     File exists (only returned for write)
                      7002*    Bad mode specified
                      7003     Invalid block size specified
                      7004     No tape device found
                      7005     Device found, but busy; try again later
                      7006     Tape volume not found
                      7007     Tape volume is in-use
                      7008     I/O Error
                      7009     Can't connect with Media Manager
                      7010     Permission denied
                      7011     O/S error, for example malloc, fork error
                      7012*    Invalid argument(s) to sbtopen

sbtclose              7020*    Invalid file handle or file not open
                      7021*    Invalid flags to sbtclose
                      7022     I/O error
                      7023     O/S error
                      7024*    Invalid argument(s) to sbtclose
                      7025     Can't connect with Media Manager

sbtwrite              7040*    Invalid file handle or file not open
                      7041     End of volume reached
                      7042     I/O error
                      7043     O/S error
                      7044*    Invalid argument(s) to sbtwrite

sbtread               7060*    Invalid file handle or file not open
                      7061     EOF encountered
                      7062     End of volume reached
                      7063     I/O error
                      7064     O/S error
                      7065*    Invalid argument(s) to sbtread

sbtremove             7080     Backup file not found
                      7081     Backup file in use
                      7082     I/O Error
                      7083     Can't connect with Media Manager
                      7084     Permission denied
                      7085     O/S error
                      7086*    Invalid argument(s) to sbtremove

sbtinfo               7090     Backup file not found
                      7091     I/O Error
                      7092     Can't connect with Media Manager
                      7093     Permission denied
                      7094     O/S error
                      7095*    Invalid argument(s) to sbtinfo

sbtinit               7110*    Invalid argument(s) to sbtinit
                      7111     O/S error
                      7112     The current RDBMS version does not support RA_FORMAT=TRUE.
                      7113     Auto allocated buffers required when using RA_FORMAT=TRUE.
                      7114     RA_REPLICATION and RA_FORMAT cannot both be set to true.

common error codes    7500     Operator intervention requested
                      7501     Media Management Error
                      7502     File not found
                      7503     File already exists
                      7504     End of file
                      7505     cannot proxy copy the specified file
                      7506     no proxy work in progress
                      7507     no buffer available
                      7508     aborting due to signal
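The SBT 1.1 detail line that accompanies ORA-19511 can be split into its parts and matched against the number ranges in Table 23-3. The Python sketch below assumes that the line follows the "SBT error = n, errno = n, function: text" layout of the earlier sample output; parse_sbt_error and family_for are hypothetical names.

```python
import re
from typing import Optional, Tuple

# Function families and number ranges copied from Table 23-3.
FAMILIES = [
    ("sbtopen", 7000, 7012),
    ("sbtclose", 7020, 7025),
    ("sbtwrite", 7040, 7044),
    ("sbtread", 7060, 7065),
    ("sbtremove", 7080, 7086),
    ("sbtinfo", 7090, 7095),
    ("sbtinit", 7110, 7114),
    ("common error codes", 7500, 7508),
]

# Layout assumed from the sample output in this section.
SBT_LINE = re.compile(r"SBT error = (\d+), errno = (\d+), (\w+): (.+)")


def family_for(number: int) -> Optional[str]:
    """Return the Table 23-3 cause column for an SBT 1.1 message number."""
    for name, low, high in FAMILIES:
        if low <= number <= high:
            return name
    return None


def parse_sbt_error(line: str) -> Optional[Tuple[int, int, str, str]]:
    """Return (SBT error number, errno, reporting function, text), or None if the line does not match."""
    match = SBT_LINE.search(line)
    if not match:
        return None
    return (int(match.group(1)), int(match.group(2)),
            match.group(3), match.group(4).strip())


parsed = parse_sbt_error("   SBT error = 7000, errno = 0, sbtopen: backup file not found")
print(parsed, family_for(parsed[0]))
```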

Starting with Oracle Database 23ai, native SBT libraries are available in the Oracle home directory as part of the database installation. After installing the database, you can configure RMAN channels to use the native SBT libraries for backup and restore operations with Oracle Cloud, Recovery Appliance, and Amazon S3.

If any issues occur when Oracle uses the native SBT media libraries, RMAN signals ORA-19511 followed by a descriptive error message that is prefixed with KBHS. Error messages prefixed with KBHS provide a clear indication of the cause of the error.

Example 23-3 KBHS Error Message Displayed when Oracle uses Native SBT Libraries to Access Cloud Storage Locations

This example shows an error that occurs when Oracle accesses a native SBT library. The ORA-19511 error is followed by the KBHS-01025 message that indicates the cause.

RMAN-03090: Starting backup at APR 17 2018 16:00:21
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of backup command at 04/17/2018 16:00:23
ORA-19554: error allocating device, device type: SBT_TAPE, device name: 
ORA-27023: skgfqsbi: media manager protocol error
ORA-19511: non RMAN, but media manager or vendor specific failure, error text:
KBHS-01025: Parameter OPC_WALLET and OPC_CREDENTIAL_OBJECT cannot both be specified
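Because the KBHS message carries the actual cause, a log scanner can pick out the KBHS lines that follow ORA-19511. This Python sketch is an illustration under the assumption that each KBHS message occupies a single line in the form "KBHS-nnnnn: text"; kbhs_messages is a hypothetical name.

```python
import re
from typing import List, Tuple

# Assumed layout: five-digit KBHS number, a colon, then the message text.
KBHS_LINE = re.compile(r"^(KBHS-\d{5}): (.+)$")


def kbhs_messages(output: str) -> List[Tuple[str, str]]:
    """Return (code, text) pairs for KBHS lines that appear after an ORA-19511 line."""
    messages = []
    seen_19511 = False
    for raw in output.splitlines():
        line = raw.strip()
        if line.startswith("ORA-19511"):
            seen_19511 = True
            continue
        match = KBHS_LINE.match(line)
        if seen_19511 and match:
            messages.append((match.group(1), match.group(2)))
    return messages


SAMPLE = """\
RMAN-03002: failure of backup command at 04/17/2018 16:00:23
ORA-27023: skgfqsbi: media manager protocol error
ORA-19511: non RMAN, but media manager or vendor specific failure, error text:
KBHS-01025: Parameter OPC_WALLET and OPC_CREDENTIAL_OBJECT cannot both be specified
"""

for code, text in kbhs_messages(SAMPLE):
    print(code, "->", text)
```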

KBHS error messages are listed here for your reference, and also described in the Oracle Database Error Messages Reference.

Table 23-4 KBHS Error Messages (Range 00100 - 00900)

KBHS - Error No - Message Cause Action

KBHS - 00100 - Parameter string value string is not a multiple of BLKSIZE string specified in SBT parms

CHUNK_SIZE specified in configuration was not a multiple of BLKSIZE

Change BLKSIZE or CHUNK_SIZE configuration and retry the command.

KBHS - 00101 - file string already exists; use FORMAT '%d_%U' option for unique name

An attempt was made to create a file with a name that is already used.

Use %d_%U in the FORMAT option of the BACKUP command and retry the command.

KBHS - 00103 - Version string of string is not licensed

The indicated version of the library was not licensed.

Contact Oracle Support Services for assistance.

KBHS - 00104 - License of string version string expired on string

The indicated version of the library has expired.

Renew your license. Contact Oracle Support Services for assistance.

KBHS - 00200 - Product string is not licensed

The system backup tape (SBT) library was not licensed.

Contact Oracle Support Services for assistance.

KBHS - 00201 - The license of string expired on string

The system backup tape (SBT) library license has expired.

Renew your license. Contact Oracle Support Services for assistance.

KBHS - 00202 - cannot create log bucket string; retry after library registration

An attempt was made to re-create the log bucket specified in the license file. This attempt failed.

Retry the command after library registration.

KBHS - 00203 - maximum number of session licenses (string) exceeded

All licenses were in use.

Increase the value of the LICENSE_MAX_SESSIONS parameter.

KBHS - 00204 - data bucket string location conflicts with log bucket string location

Log and Data buckets were in different geographic locations.

Delete the bucket that is in conflict and retry the command.

KBHS - 00205 - location of bucket string is string, which conflicts with location string specified

The log or data bucket was in a geographic location different from the one specified.

Re-run installer with location and reRegister option to change geographic location and retry the command.

KBHS - 00206 - file string already exists; use AS option to rename the piece

An attempt was made to create a file with a name that is already used.

Use 'AS backuppiece name' option to rename the backup piece and retry the command.

KBHS - 00207 - parameter string value string conflicts with bucket/container string property string

A bucket or container with a different property was specified.

Check the specified parameter in the configuration file.

KBHS - 00208 - server side encryption container string is not supported

Container specified has server side encryption enabled. This is not supported.

Disable server side encryption or retry the command with different container name.

KBHS - 00209 - life cycle tiering policy for container string must exclude XML objects

The specified tiering-enabled container would archive XML objects. This is not supported.

Exclude XML objects in the container LTP.

KBHS - 00400* - NETTEST BACKUP: number bytes sent in number microseconds

Displays NETTEST BACKUP result.

This is an undocumented output for internal usage only. No action required.

KBHS - 00401* - NETTEST RESTORE: number bytes received in number microseconds

Displays NETTEST RESTORE result.

This is an undocumented output for internal usage only. No action required.

KBHS - 00402* - NETTEST successfully completed

NETTEST command returns successfully.

This is an undocumented output for internal usage only. No action required.

KBHS - 00600 - internal error

An internal error in Oracle HTTP SBT occurred.

Contact Oracle Customer Support.

KBHS - 00601 - fatal error in Oracle HTTP SBT

A fatal error has occurred.

This message should be accompanied by other error messages indicating the cause of the error. See the sbtio.log file for more information.

KBHS - 00700 - HTTP response error 'string'

An HTTP operation returned an error.

See accompanying error messages for more information.

KBHS - 00701 - no host name or port number or host name too long

Host name absent or too long, port number absent or invalid.

Ensure the host name is correct and the port number is valid.

KBHS - 00702 - connect failed because host string is unreachable

The address specified was unreachable or not valid, or the service at the address was unavailable.

Verify that HOST and PROXY parameters are valid.

KBHS - 00703 - unable to connect to HTTP server string; received ORA-string

Connect to HTTP server failed or timed out.

Verify HOST and PROXY parameter values. See ORA error description for more information.

KBHS - 00704 - unable to do register connection; received ORA-string

Registering the connection failed.

See ORA error description for more information.

KBHS - 00705 - connection closed or lost contact; received ORA-string

HTTP server has disconnected.

See ORA error description for more information.

KBHS - 00706 - unable to wait for notification; received ORA-string

Wait for event notification on registered connection failed.

See ORA error description for more information.

KBHS - 00707 - HTTP response timed out; waited string seconds

The HTTP server did not respond within the specified timeout.

This is usually an HTTP server issue. Contact the HTTP server administrator or increase the timeout value.

KBHS - 00708 - unable to receive data from HTTP server; error ORA-string

Receive from HTTP server failed.

See ORA error description for more information.

KBHS - 00709 - unable to send data to HTTP server; error ORA-string

HTTP send operation to server failed.

See ORA error description for more information.

KBHS - 00710 - unable to flush data from buffer; error ORA-string

Flushing data from local buffer to HTTP server failed.

See ORA error description for more information.

KBHS - 00711 - unable to close HTTP session; error ORA-string

HTTP close session operation failed.

See ORA error description for more information.

KBHS - 00712 - ORA-string received from local HTTP service

An error occurred during HTTP processing.

See accompanying error messages for more information.

KBHS - 00713 - HTTP client error 'string'

The HTTP response indicated an HTTP client error.

Fix the HTTP client error and retry the HTTP request.

KBHS - 00714 - HTTP server error 'string'

The HTTP response indicated an HTTP server error.

Fix the HTTP server error and retry the HTTP request. Contact the administrator of the HTTP server when necessary.

KBHS - 00715 - HTTP error occurred 'string'

The specified HTTP protocol error happened.

See accompanying error messages for more information.

KBHS - 00717 - file name string contains an illegal character string

The specified file name contained a character that was not allowed within an SBT backup file name.

Use a file name that does not contain the specified illegal character and retry the command.

KBHS - 00718 - operation failed, retry possible

An HTTP HEAD, DELETE, PUT or GET operation failed with an error. The operation may be retried.

This message is used by SBT HTTP API to decide whether or not to retry the operation.

KBHS - 00719 - Error 'string'; string

Description of error returned by HTTP operation.

See accompanying error messages for more information.

KBHS - 00720 - bucket cannot be accessed using end-point for request string

The end-point specified in the HOST parameter was not the home location for the bucket in the URL.

Change HOST to the home location of the bucket.

KBHS - 00721 - resource busy, retry possible

An HTTP HEAD, DELETE, PUT or GET operation failed because the resource was busy. The operation may be retried.

This message is used by SBT HTTP API to decide whether or not to retry the operation.

KBHS - 00722 - missing or invalid EC2 instance metadata for IAM role string

The metadata corresponding to the specified IAM role does not exist or the temporary security credentials are invalid.

Verify IAM role name.

KBHS - 00723 - missing header string in HTTP response

The expected header is not available in the HTTP response.

Check if the submitted HTTP request is appropriate.

KBHS - 00724 - invalid value string for header string

The header value is invalid.

Check if the submitted HTTP request is appropriate.

KBHS - 00900 - virtual file initialization failed

Unable to initialize virtual file.

Contact Oracle Customer Support.

Table 23-5 KBHS Error Messages (Range 01000-01803)

KBHS - Error No - Message Cause Action

KBHS-01000 - value of parameter string must be a number; specified string

The value given for the specified parameter was not numeric.

Change the value of the parameter to numeric and retry the command.

KBHS-01001 - value of parameter string must be TRUE or FALSE; specified string

The value given for the specified parameter was not TRUE or FALSE.

Change the parameter to TRUE or FALSE and retry the command.

KBHS - 01002 - unable to open string parameter file string

Encountered an error while trying to open the parameter file.

Ensure that the specified file exists and has read permission.

KBHS - 01003 - string

An error occurred when parsing a parameter.

See accompanying error messages indicating the cause of error.

KBHS-01004 - LRM-string occurred when parsing a parameter

An error occurred when parsing a parameter.

See accompanying error messages indicating the cause of error.

KBHS-01005 - syntax error was encountered while parsing string parameter file string

An error occurred when parsing parameter file.

See accompanying error messages for more information.

KBHS-01006 - Parameter string was not specified

Failed to specify the indicated parameter.

Specify the value of the parameter and retry the command.

KBHS-01008 - syntax error was encountered while parsing string parameter string

An error occurred when parsing parameter string.

See accompanying error messages for more information.

KBHS-01009 - Parameter string must not be specified as sub-domain format

The specified parameter cannot be used in sub-domain format.

Remove the sub-domain format and retry the command.

KBHS-01010 - user name or password was not specified in the wallet

The user name or password specified in the wallet was null.

Ensure that the user name and password are not null and retry the command.

KBHS-01012 - ORA-string occurred during wallet operation; WRL string

An operation on the wallet failed due to indicated error.

Refer to the indicated Oracle message for more information.

KBHS-01013 - specified string alias string not found in wallet

The specified WALLET alias did not appear in the wallet.

Check the WALLET alias or create an alias in the wallet for the specified attribute and retry the command.

KBHS-01014 - Parameter string must contain string attribute

The WALLET parameter was missing the mandatory attribute.

Add the missing attribute and retry the command.

KBHS-01015 - value of parameter string is out of range (string - string)

The value given for the specified parameter was not valid.

Change the value of the parameter within the range and retry the command.

KBHS-01016 - syntax error was encountered while parsing parameter string

An error occurred when parsing parameter.

See accompanying error messages for more information.

KBHS-01017 - Parameter string value string contains too many attributes

The WALLET parameter contains too many attributes.

Check the syntax of WALLET parameter and retry the command.

KBHS-01018 - parameter string must not be specified

The indicated parameter should not be specified.

Remove the indicated parameter and retry the command.

KBHS-01019 - value of parameter string is invalid; specified string

The value given for the specified parameter was not valid.

Change the parameter to valid value and retry the command.

KBHS-01020 - value of parameter string must be TRUE for authentication string

The value given for the specified parameter was not valid.

Change the parameter to valid value and retry the command.

KBHS-01021 - AWS region could not be determined

AWS region could not be determined.

Change authentication scheme and retry the command.

KBHS-01022 - endpoint to restore archive objects cannot be constructed

HOST parameter is not under the v1 namespace.

Use HOST parameter under v1 namespace.

KBHS - 01023 - Either parameter string or parameter string must be specified

Specifying one of the parameters is mandatory.

Specify one of the parameters and retry the command.

KBHS - 01024 - invalid credential object [schema = string, name = string]

The user name or password was not specified in the credential object.

Ensure that the credential object is valid and retry the command.

KBHS - 01025 - Parameter string and string cannot both be specified

Both parameters cannot be specified together.

Specify only one of the parameters and retry the command.

KBHS - 01026 - SSLWallet location could not be retrieved from database

SSLWallet location was not found in database.

Ensure that SSLWallet location is specified in database and retry the command.

KBHS - 01100 - end-of-file for backup piece string

There were no more chunks for this backup piece to read or read bytes specified in the metadata file.

If the end-of-file was reached prematurely, check if all the chunks are present in the backup piece.

KBHS - 01102 - unable to read backup piece string because of missing chunk string

Unable to read the backup piece because the specified chunk was found to be missing.

Retry the command after restoring the chunk file in the specified location.

KBHS - 01103 - backup piece string missing

Unable to read the backup piece because the specified piece was found to be missing.

Retry the command after restoring the file in the specified location.

KBHS - 01104 - file string is not in backup piece format

Unable to read the backup piece header because the specified file was not in backup piece format.

Retry the command after restoring the file in the specified location.

KBHS - 01105 - Block Pool format restore does not support password decryption

When a backup is performed using RA_FORMAT=TRUE and encryption, the restore requires Transparent Data Encryption/Decryption.

Retry the command without password.

KBHS - 01300 - unable to read backup piece string because of missing metadata file string

The backup piece could not be read because the metadata file was missing.

Retry the command after restoring the metadata file in the specified location.

KBHS - 01400 - Unable to load message file from virtual file system

Unable to find the message file in virtual file system.

Contact Oracle Customer Support.

KBHS - 01401 - memory allocation failure

Insufficient memory available to satisfy requested allocation.

Terminate other processes to free up memory or add memory to the system.

KBHS - 01402 - Oracle error occurred while converting a date: ORA-number: string

An internal error occurred while converting a date.

Contact Oracle Support Services.

KBHS - 01403 - could not open trace file string

An error occurred when trying to open the trace file.

Check that the directory exists and that the file has write permissions.

KBHS - 01405 - Automatic retry wait time limit string reached

An attempt was made to automatically retry the operation but the configured wait time limit was reached.

Retry the operation manually.

KBHS - 01406 - value of parameter string must be between string and string; specified string

The length of the parameter was not within the limits.

Specify a value of length within the limits and retry the command.

KBHS - 01407 - bucket name string contains an illegal character string

One of the following restrictions was not followed in the specified bucket name:
  • Can contain only lowercase letters, numbers, periods (.), and dashes (-).
  • Must start with a number or letter.
  • Should not end with a dash (-).
  • Cannot have dashes next to periods.
  • Cannot be in IP address style (for example, 192.168.5.4).

Use a bucket name that follows the restrictions and retry the command.

KBHS - 01408 - log in to Recovery Appliance failed with Oracle error:string

An Oracle error was reported while attempting to log in to the Recovery Appliance.

Follow the actions for the specified Oracle error.

KBHS - 01409 - Oracle error reported from Recovery Appliance while executing string

An Oracle error was reported when executing an Oracle Call Interface (OCI) operation in Recovery Appliance.

Follow the actions for the specified Oracle error.

KBHS - 01410 - Oracle HTTP Server not running in Recovery Appliance

Oracle HTTP Server was not running in Recovery Appliance.

Start the HTTP Server in Recovery Appliance and retry the command.

KBHS - 01411 - Recovery Appliance access was not granted

Recovery Appliance access was not granted to this user.

Grant database access to this user and then retry the command.

KBHS - 01412 - failed for connection string (@string)

This is an informational message. This precedes error 1410 or 1411.

No action is required.

KBHS - 01413 - connect string could not be resolved

The specified host is not resolvable.

Check the spelling of the host name or the IP address. Make sure that the host name or the IP address is resolvable on the Recovery Appliance.

KBHS - 01500 - syntax error was encountered while parsing SBT command string

An error occurred when parsing the SBT command.

See accompanying error messages for more information.

KBHS - 01600 - backup piece string header validation failed

The header was not recognized as a valid backup piece header. One reason could be that the backup piece was converted at the target and a converted backup piece cannot be transmitted.

Ensure the backup piece is not converted and retry the backup operation.

KBHS - 01601 - Data Pump dump file backup piece string is not supported

Data Pump dump file backup was requested. This cannot be transmitted and is not supported.

Ensure the backup piece does not include a Data Pump dump file and retry the backup operation.

KBHS - 01602 - backup piece string is not encrypted

A backup sent to Oracle Public Cloud Storage was not encrypted. Only encrypted backups can be stored in Oracle Public Cloud Storage.

Configure RMAN to create encrypted backups and retry the command.

KBHS - 01603 - Incremental backups that include control or spfile not supported

Backups with RA_FORMAT=TRUE do not support mixing data files with either control files or server parameter files.

Configure RMAN to use AUTOBACKUP ON when using RA_FORMAT=TRUE.

KBHS - 01605 - The current ZDLRA version does not support RA_FORMAT=TRUE.

When RA_FORMAT=TRUE is set, the ZDLRA needs to be patched to accept the new formatted backups.

Update the ZDLRA software to the appropriate version.

KBHS - 01606 - Invalid compression algorithm when using RA_FORMAT=TRUE.

When RA_FORMAT=TRUE is set, the only supported compression algorithm is LOW.

Set the RMAN compression algorithm to LOW.

KBHS - 01607 - Password encryption is not supported when using RA_FORMAT=TRUE.

When RA_FORMAT=TRUE is set, the only supported encryption method is Transparent Data Encryption.

Use Transparent Data Encryption instead of password encryption.

KBHS - 01608 - Previously compressed backups not supported with RA_FORMAT=TRUE.

When RA_FORMAT=TRUE is set, backups that are already compressed are not supported.

Take the initial backup without compression.

KBHS - 01609 - Previously encrypted backups not supported with RA_FORMAT=TRUE.

When RA_FORMAT=TRUE is set, backups that are already encrypted are not supported.

Take the initial backup without encryption.

KBHS - 01610 - MAXPIECESIZE backups not supported

Backups with RA_FORMAT=TRUE do not support backups that also have MAXPIECESIZE.

If you need MAXPIECESIZE backups, configure RMAN without RA_FORMAT=TRUE.

KBHS - 01700 - Error occurred in XML processing string

An error occurred when processing the XML document.

Check the given error message and fix the appropriate problem.

KBHS - 01701 - XML parsing failed

XML parser returned an error while trying to parse the document.

Check if the document to be parsed is valid.

KBHS - 01800 - unmatched quote in parameter string

A closing quote was missing.

Add the missing quote and retry the command.

KBHS - 01801 - invalid keyword (string)

This is an informational message indicating that an unrecognized keyword was encountered.

Correct the syntax and retry the command.

KBHS - 01802 - incomplete/malformed command

This is an informational message indicating the command was not complete.

Correct the syntax and retry the command.

KBHS - 01803 - invalid parameter (string)

This is an informational message indicating the identifier token that caused a syntax error.

Correct the syntax and retry the command.

23.1.5 Interpreting RMAN Error Stacks

It is important to identify the relevant messages in the RMAN error stack.

Note the following tips and suggestions while interpreting RMAN messages:

  • Read the messages from the bottom up, because this is the order in which RMAN issues the messages. The last one or two errors displayed in the stack are often the most informative.

  • When you are using an SBT 1.1 media management layer and you are presented with SBT 1.1 style error messages containing the "Additional information:" numeric error codes, look for the ORA-19511 message that follows for the text of error messages passed back to RMAN by the media manager. These messages identify the real failure in the media management layer.

  • Look for the RMAN-03002 or RMAN-03009 message immediately following the error banner. (RMAN-03009 is the same as RMAN-03002 but includes the channel ID.) These messages indicate which command failed. Syntax errors generate RMAN-00558.

  • Identify the basic type of error according to the error range chart in Table 23-2 and then refer to the error messages for information about the most important messages.

23.1.5.1 Interpreting RMAN Errors: Example

Errors prefixed by RMAN- indicate errors caused by RMAN commands.

You attempt a backup of tablespace users and receive the following message:

Starting backup at 29-AUG-13
using channel ORA_DISK_1
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of backup command at 08/29/2013 15:14:03
RMAN-20202: tablespace not found in the recovery catalog
RMAN-06019: could not translate tablespace name "USESR"

The RMAN-03002 error indicates that the BACKUP command failed. You read the last two messages in the stack first and immediately see the problem: no tablespace users appears in the recovery catalog because you mistyped the name as usesr.
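
The reading order described in the tips can be sketched in a few lines of Python. This is an illustration under stated assumptions, not an Oracle-supplied tool: the function name read_stack is hypothetical, and the sketch assumes that every message starts at the beginning of a line with an RMAN- or ORA- prefix followed by five digits and a colon, as in the example output.

```python
import re

# A message line such as "RMAN-03002: failure of backup command ...".
_MESSAGE = re.compile(r"^(RMAN|ORA)-(\d{5}):\s*(.*)$")
# The banner lines carry no diagnostic information.
_BANNER = {"RMAN-00571", "RMAN-00569"}

SAMPLE = """\
Starting backup at 29-AUG-13
using channel ORA_DISK_1
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of backup command at 08/29/2013 15:14:03
RMAN-20202: tablespace not found in the recovery catalog
RMAN-06019: could not translate tablespace name "USESR"
"""


def read_stack(output):
    """Return (failed-command message, list of (code, text) read bottom up)."""
    messages = []
    for line in output.splitlines():
        match = _MESSAGE.match(line.strip())
        if match:
            code = match.group(1) + "-" + match.group(2)
            if code not in _BANNER:
                messages.append((code, match.group(3)))
    # RMAN-03002 or RMAN-03009 names the command that failed.
    failed = next((text for code, text in messages
                   if code in ("RMAN-03002", "RMAN-03009")), None)
    return failed, list(reversed(messages))


failed, bottom_up = read_stack(SAMPLE)
print(failed)
for code, text in bottom_up:
    print(code, text)
```

The first entries of the bottom-up list are the last messages in the stack, which are often the most informative ones.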

23.1.5.2 Interpreting Server Errors: Example

Errors from the server are prefixed with ORA-.

Assume that you attempt to recover a tablespace and receive the following errors:

RMAN> RECOVER TABLESPACE users;

Starting recover at 29-AUG-13
using channel ORA_DISK_1

starting media recovery
media recovery failed
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of recover command at 08/29/2013 15:18:43
RMAN-11003: failure during parse/execution of SQL statement: alter database recover if needed tablespace USERS
ORA-00283: recovery session canceled due to errors
ORA-01124: cannot recover data file 8 - file is in use or recovery
ORA-01110: data file 8: '/oracle/oradata/trgt/users01.dbf'

As suggested, you start reading from the bottom up. The ORA-01110 message identifies the data file involved, users01.dbf. The ORA-01124 message above it indicates that the database cannot recover the data file because it is in use or already being recovered. The remaining messages indicate that the recovery session was canceled because of these server errors. Because you were not already recovering this data file, you conclude that the data file must be online, so you must take it offline and restore a backup.

23.1.5.3 Interpreting SBT 2.0 Media Management Errors: Example

This example shows how to interpret errors caused at the media manager level.

Assume that you use a tape drive and see the following output during a backup job:

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
ORA-19624: operation failed, retry possible
ORA-19507: failed to retrieve sequential file, handle="/tmp/mydir", parms=""
ORA-27029: skgfrtrv: sbtrestore returned error
ORA-19511: Error received from media manager layer, error text:
  sbtpvt_open_input:file /tmp/mydir does not exist or cannot be accessed, errno=2

The error text displayed following the ORA-19511 error is generated by the media manager and describes the real source of the failure. See the media manager documentation to interpret this error.

23.1.5.4 Interpreting SBT 1.1 Media Management Errors: Example

This example shows the output of a backup job that has media management errors.

Assume that you use a tape drive and see the following output during a backup job:

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of backup command on c1 channel at 09/04/2013 13:18:19
ORA-19506: failed to create sequential file, name="07d36ecp_1_1", parms=""
ORA-27007: failed to open file
SVR4 Error: 2: No such file or directory
Additional information: 7005
Additional information: 1
ORA-19511: Error received from media manager layer, error text:
   SBT error = 7005, errno = 2, sbtopen: system error

The main information of interest returned by SBT 1.1 media managers is the error code in the "Additional information" line:

Additional information: 7005

Referring to Table 23-3, you discover that error 7005 means that the media management device is busy. So, the media management software is not able to write to the device because it is in use or there is a problem with it.
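
The lookup just described can be sketched as follows. This Python fragment is a hedged illustration: the function names are hypothetical, the dictionary contains only the one code discussed in this example (the remaining values must come from the error-code table), and the sketch assumes that the media manager text appears on indented lines directly after the ORA-19511 line, as in the sample output.

```python
import re

# Only the code discussed in the text; the others are deliberately omitted.
SBT_11_MEANINGS = {7005: "media management device is busy"}

SAMPLE = """\
RMAN-03009: failure of backup command on c1 channel at 09/04/2013 13:18:19
ORA-19506: failed to create sequential file, name="07d36ecp_1_1", parms=""
ORA-27007: failed to open file
SVR4 Error: 2: No such file or directory
Additional information: 7005
Additional information: 1
ORA-19511: Error received from media manager layer, error text:
   SBT error = 7005, errno = 2, sbtopen: system error
"""


def additional_info_codes(output):
    """Return the numbers from every 'Additional information:' line, in order."""
    pattern = r"^Additional information:[ \t]*(\d+)[ \t]*$"
    return [int(n) for n in re.findall(pattern, output, re.MULTILINE)]


def media_manager_text(output):
    """Return the text passed back by the media manager after ORA-19511, if any."""
    lines = output.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("ORA-19511:"):
            text = []
            for following in lines[i + 1:]:
                if not following.startswith((" ", "\t")):
                    break  # the indented continuation lines have ended
                text.append(following.strip())
            return " ".join(text) or None
    return None


codes = additional_info_codes(SAMPLE)
print(codes[0], SBT_11_MEANINGS.get(codes[0], "see the error-code table"))
print(media_manager_text(SAMPLE))
```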

Note:

The sbtio.log contains information written by the media management software, not Oracle Database. Thus, you must consult your media vendor documentation to interpret the error codes and messages. If no information is written to the sbtio.log, then contact your media manager support to ask whether they are writing error messages in some other location, or whether there are steps you must take to have the media manager errors appear in sbtio.log.

23.1.6 Identifying RMAN Return Codes

One way to determine whether RMAN encountered an error is to examine its return code or exit status. The RMAN client returns 0 to the shell from which it was invoked if no errors occurred, and a nonzero error value otherwise.

How you access this return code depends upon the environment from which you invoked the RMAN client. For example, if you run UNIX with the C shell, then, when RMAN completes, the return code is placed in a shell variable called $status. The method of returning exit status is a detail specific to the host operating system rather than the RMAN client.
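
When RMAN is started from a script rather than an interactive shell, the exit status is available from whatever facility launched the process. The following Python sketch shows one way to test it. The helper name run_and_check is hypothetical, and the RMAN command line in the comment is a placeholder that assumes a command file named backup.rman; the runnable part uses the Python interpreter as a stand-in process so that the sketch works without an Oracle installation.

```python
import subprocess
import sys


def run_and_check(command):
    """Run `command` and return (succeeded, exit_status); 0 means no errors."""
    completed = subprocess.run(command)
    return completed.returncode == 0, completed.returncode


# Hypothetical RMAN invocation (connection and file name are placeholders):
#   ok, status = run_and_check(["rman", "target", "/", "@backup.rman"])

# Stand-in process that exits with a nonzero status, as RMAN does on error.
ok, status = run_and_check([sys.executable, "-c", "import sys; sys.exit(3)"])
print(ok, status)
```

Only the distinction between zero and nonzero is stated in this section, so the sketch does not attach a meaning to any particular nonzero value.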

23.2 Using V$ Views for RMAN Troubleshooting

When LIST, REPORT, and SHOW do not provide all the information that you need for RMAN operations, some V$ views can provide useful details.

Sometimes it is useful to identify exactly what a server session performing a backup and recovery job is doing. The views described in the following table are useful for obtaining information about RMAN jobs.

Table 23-6 Useful V$ Views for Troubleshooting

View Description

V$PROCESS

Identifies currently active processes

V$SESSION

Identifies currently active sessions. Use this view to determine which database server sessions correspond to which RMAN allocated channels.

V$SESSION_WAIT

Lists the events or resources for which sessions are waiting

You can use the preceding views to perform the tasks described in the following sections.

23.2.1 Monitoring RMAN Interaction with the Media Manager

You can use the event names in the dynamic performance event views to monitor RMAN calls to the media management API. The event names have a one-to-one correspondence with SBT functions.

See the following example:

Backup: MML v1 open backup piece
Backup: MML v1 read backup piece
Backup: MML v1 write backup piece
Backup: MML v1 query backup piece
Backup: MML v1 delete backup piece
Backup: MML v1 close backup piece
.
.
.

To obtain the complete list of SBT events, you can use the following query:

SELECT NAME 
FROM   V$EVENT_NAME 
WHERE  NAME LIKE '%MML%';

Before making a call to any of the functions in the media management API, the server adds a row in V$SESSION_WAIT, with the STATE column including the string WAITING. The V$SESSION_WAIT.SECONDS_IN_WAIT column shows the number of seconds that the server has been waiting for this call to return. After the SBT function returns from the media manager, this row disappears.

A row in V$SESSION_WAIT corresponding to an SBT event name does not indicate a problem, because the server updates these rows at run time. The rows appear and disappear as calls are made and returned. However, if the value in the SECONDS_IN_WAIT column is high, then the media manager may have stopped responding.

To monitor the SBT events, you can run the following SQL query:

COLUMN EVENT FORMAT a17
COLUMN SECONDS_IN_WAIT FORMAT 999
COLUMN STATE FORMAT a15
COLUMN CLIENT_INFO FORMAT a30

SELECT p.SPID, s.EVENT, s.SECONDS_IN_WAIT AS SEC_WAIT, 
       sw.STATE, s.CLIENT_INFO
FROM   V$SESSION_WAIT sw, V$SESSION s, V$PROCESS p
WHERE  sw.EVENT LIKE '%MML%'
AND    s.SID=sw.SID
AND    s.PADDR=p.ADDR;

Examine the SQL output to determine which SBT functions are waiting. For example, the following output indicates that RMAN has been waiting 10 minutes (600 seconds) for the sbtbackup function to return:

SPID EVENT             SEC_WAIT   STATE           CLIENT_INFO
---- ----------------- ---------- --------------- ------------------------------
8642 Backup: MML creat 600        WAITING         rman channel=ORA_SBT_TAPE_1
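
A monitoring script can apply the same reasoning to the rows that the query returns. The following Python sketch is an illustration only: the WaitRow type and the long_waits function are hypothetical names, the 300-second threshold is an arbitrary assumption (this section says only that a high value is suspicious), and the second row is invented for contrast with the sample row.

```python
from typing import List, NamedTuple


class WaitRow(NamedTuple):
    """One row of the monitoring query (column order as in the SELECT list)."""
    spid: str
    event: str
    sec_wait: int
    state: str
    client_info: str


def long_waits(rows: List[WaitRow], threshold_seconds: int = 300) -> List[WaitRow]:
    """Return rows still WAITING on an SBT call for at least the threshold."""
    return [row for row in rows
            if row.state == "WAITING" and row.sec_wait >= threshold_seconds]


rows = [
    # Row taken from the sample output.
    WaitRow("8642", "Backup: MML creat", 600, "WAITING", "rman channel=ORA_SBT_TAPE_1"),
    # Hypothetical row: a call that has only just been made.
    WaitRow("8374", "Backup: MML write", 2, "WAITING", "rman channel=ORA_SBT_TAPE_2"),
]

for row in long_waits(rows):
    print(row.spid, row.client_info, row.sec_wait)
```

A short wait is normal because rows appear and disappear as calls are made and returned, which is why the sketch filters on duration rather than on the mere presence of a row.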

Note:

The V$SESSION_WAIT view shows only database events, not media manager events.

See Also:

Oracle Database Reference for a description of the V$SESSION_WAIT view.

23.2.2 Correlating Server Sessions with RMAN Channels

To identify which server sessions correspond to which RMAN channels, you can query V$SESSION and V$PROCESS.

The SPID column of V$PROCESS identifies the operating system ID number for the process or thread. For example, on UNIX the SPID column shows the process ID, whereas on Windows the SPID column shows the thread ID. You have two basic methods for obtaining this information, depending on whether you have multiple RMAN sessions active concurrently.

23.2.2.1 Matching Server Sessions with Channels When One RMAN Session Is Active

When only one RMAN session is active, the easiest method for determining the server session ID for an RMAN channel is to query the target database.

Run the following query on the target database while the RMAN job is executing:

COLUMN CLIENT_INFO FORMAT a30
COLUMN SID FORMAT 999
COLUMN SPID FORMAT 9999

SELECT s.SID, p.SPID, s.CLIENT_INFO
FROM   V$PROCESS p, V$SESSION s
WHERE  p.ADDR = s.PADDR
AND    CLIENT_INFO LIKE 'rman%';

The following shows sample output:

 SID SPID         CLIENT_INFO
---- ------------ ------------------------------
  14 8374         rman channel=ORA_SBT_TAPE_1

If you set an ID using the RMAN SET COMMAND ID command instead of using the system-generated default ID, then search for that value in the CLIENT_INFO column instead of 'rman%'.

23.2.2.2 Matching Server Sessions with Channels in Multiple RMAN Sessions

If multiple RMAN sessions are active, then the V$SESSION.CLIENT_INFO column can yield the same information for a channel in each session.

For example:

 SID SPID         CLIENT_INFO
---- ------------ ------------------------------
  14 8374         rman channel=ORA_SBT_TAPE_1
   9 8642         rman channel=ORA_SBT_TAPE_1

In this case, you have the following methods for determining which channel corresponds to which SID value.

23.2.2.2.1 Obtaining the Channel ID from the RMAN Output

You must first obtain the sid values from the RMAN output and then use these values in your SQL query.

To correlate a process with a channel during a backup:

  1. In an active session, run the RMAN job as usual and examine the output to get the SID for the channel. For example, the output may show:
    Starting backup at 21-AUG-13
    allocated channel: ORA_SBT_TAPE_1
    channel ORA_SBT_TAPE_1: sid=14 devtype=SBT_TAPE
    
  2. Start a SQL*Plus session and then query the joined V$SESSION and V$PROCESS views while the RMAN job is executing. For example, enter:
    COLUMN CLIENT_INFO FORMAT a30
    COLUMN SID FORMAT 999
    COLUMN SPID FORMAT 9999
    
    SELECT s.SID, p.SPID, s.CLIENT_INFO
    FROM   V$PROCESS p, V$SESSION s
    WHERE  p.ADDR = s.PADDR
    AND    CLIENT_INFO LIKE 'rman%'
    /
    

    Use the sid value obtained from the first step to determine which channel corresponds to which server session:

           SID SPID         CLIENT_INFO
    ---------- ------------ ------------------------------
            14 2036         rman channel=ORA_SBT_TAPE_1
            12 2066         rman channel=ORA_SBT_TAPE_1

23.2.2.2.2 Correlating Server Sessions with Channels by Using SET COMMAND ID

You specify a command ID string in the RMAN backup script. You can then query V$SESSION.CLIENT_INFO for this string.

To correlate a process with a channel during a backup:

  1. In each session, set the COMMAND ID to a different value after allocating the channels and then back up the desired object. For example, enter the following in session 1:
    RUN 
    {
      ALLOCATE CHANNEL c1 TYPE disk;
      SET COMMAND ID TO 'sess1';
      BACKUP DATABASE;
    }
    

    Set the command ID to a string such as sess2 in the job running in session 2:

    RUN 
    {
      ALLOCATE CHANNEL c1 TYPE sbt;
      SET COMMAND ID TO 'sess2';
      BACKUP DATABASE;
    }
    
  2. Start a SQL*Plus session and then query the joined V$SESSION and V$PROCESS views while the RMAN job is executing. For example, enter:
    SELECT SID, SPID, CLIENT_INFO 
    FROM   V$PROCESS p, V$SESSION s 
    WHERE  p.ADDR = s.PADDR 
    AND    CLIENT_INFO LIKE '%id=sess%';
    

    If you run the SET COMMAND ID command in the RMAN job, then the CLIENT_INFO column displays in the following format:

    id=command_id,rman channel=channel_id
    

    For example, the following shows sample output:

     SID SPID         CLIENT_INFO
    ---- ------------ ------------------------------
      11 8358         id=sess1
      15 8638         id=sess2
      14 8374         id=sess1,rman channel=c1
       9 8642         id=sess2,rman channel=c1
    

    The rows that contain the string rman channel show the channel performing the backup. The remaining rows are for the connections to the target database.
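
The CLIENT_INFO format shown in this procedure can be split into its parts programmatically. The following Python sketch is a minimal illustration: parse_client_info is a hypothetical helper, and it assumes that the command ID itself contains no comma or equal sign.

```python
def parse_client_info(client_info):
    """Split CLIENT_INFO into (command_id, channel); either may be None."""
    command_id = None
    channel = None
    for part in client_info.split(","):
        key, _, value = part.strip().partition("=")
        if key == "id":
            command_id = value
        elif key == "rman channel":
            channel = value
    return command_id, channel


# (SID, SPID, CLIENT_INFO) rows from the sample output.
rows = [
    (11, "8358", "id=sess1"),
    (15, "8638", "id=sess2"),
    (14, "8374", "id=sess1,rman channel=c1"),
    (9, "8642", "id=sess2,rman channel=c1"),
]

# Keep only the rows for channels, keyed by command ID.
channels = {}
for sid, spid, info in rows:
    command_id, channel = parse_client_info(info)
    if channel is not None:
        channels[command_id] = (sid, spid, channel)

print(channels)
```

Keying the result by command ID separates the two sessions even though both use the channel name c1.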

See Also:

Oracle Database Backup and Recovery Reference for SET COMMAND ID syntax, and Oracle Database Reference for more information about V$SESSION and V$PROCESS

23.3 Testing the Media Management API

On some platforms, Oracle provides a diagnostic tool called sbttest. This utility performs a simple test of the media management software by acting as the Oracle database server and attempting to communicate with the media manager.

23.3.1 Obtaining the sbttest Utility

The default location of the sbttest utility depends on the platform.

On UNIX, the sbttest utility is typically located in $ORACLE_HOME/bin. If for some reason the utility is not included with your platform, then contact Oracle Support Services to obtain the C version of the program. You can compile this version of the program on all UNIX platforms.

On platforms such as Solaris, you do not have to relink when using sbttest. On other platforms, relinking may be necessary.

23.3.2 Obtaining Online Documentation for the sbttest Utility

Use the sbttest command, without arguments, to list the various arguments for this program.

For online documentation of sbttest, issue the following on the command line:

% sbttest

The program displays the list of possible arguments for the program:

Error: backup file name must be specified
Usage: sbttest backup_file_name # this is the only required parameter
               <-dbname database_name>
               <-trace trace_file_name>
               <-remove_before>
               <-no_remove_after> 
               <-read_only>
               <-no_regular_backup_restore>
               <-no_proxy_backup>
               <-no_proxy_restore>
               <-file_type n>
               <-copy_number n>
               <-media_pool n>
               <-os_res_size n>
               <-pl_res_size n>
               <-block_size block_size> 
               <-block_count block_count>
               <-proxy_file os_file_name bk_file_name 
                           [os_res_size pl_res_size block_size block_count]>
               <-libname sbt_library_name>

The display also indicates the meaning of each argument. For example, following is the description for two optional parameters:

Optional parameters:
  -dbname  specifies the database name which will be used by SBT
           to identify the backup file. The default is "sbtdb"
  -trace   specifies the name of a file where the Media Management 
           software will write diagnostic messages.

23.3.3 Using the sbttest Utility

Use sbttest to perform a quick test of the media manager.

If sbttest returns 0, then the test ran without error, which means that the media manager is correctly installed and can accept a data stream and return the same data when requested. If sbttest returns a nonzero value, then either the media manager is not installed or it is not configured correctly.

To use sbttest:

  1. Confirm that the program is installed and included in the system path by typing sbttest at the command line:
    % sbttest
    

    If the program is operational, then you see a display of the online documentation.

  2. Execute the program, specifying any of the arguments described in the online documentation. For example, enter the following to create test file some_file.f and write the output to sbtio.log:
    % sbttest some_file.f -trace sbtio.log
    

    You can also test a backup of an existing data file. For example, this command tests data file tbs_33.f of database prod:

    % sbttest tbs_33.f -dbname prod
    
  3. Examine the output. If the program encounters an error, then it provides messages describing the failure. For example, if the program cannot find the media management library, you see:
    libobk.so could not be loaded. Check that it is installed properly, and that
     LD_LIBRARY_PATH environment variable (or its equivalent on your platform)
     includes the directory where this file can be found. Here is some additional
     information on the cause of this error:
    ld.so.1: sbttest: fatal: libobk.so: open failed: No such file or directory
    

In some cases, sbttest can work but an RMAN backup does not. The reasons can be the following:

  • The user who starts sbttest is not the owner of the Oracle Database processes.

  • If the database server is not linked with the media management library or cannot load it dynamically when needed, then RMAN backups to the media manager fail, but sbttest may still work.

  • The sbttest program passes all environment parameters from the shell but RMAN does not.

23.4 Terminating an RMAN Command

There are several ways to terminate an RMAN command in the middle of execution.

They include the following:

  • The preferred method is to press Ctrl+C (or the equivalent "attention" key combination for your system) in the RMAN interface. This also terminates allocated channels, unless they are suspended in the media management code, as happens when, for example, they are waiting for a tape to be mounted.

  • You can end the server session corresponding to the RMAN channel by running the SQL ALTER SYSTEM KILL SESSION statement as described in Terminating the Session with ALTER SYSTEM KILL SESSION.

  • You can terminate the server session corresponding to the RMAN channel on the operating system as described in Terminating the Session at the Operating System Level.

23.4.1 Terminating the Session with ALTER SYSTEM KILL SESSION

To terminate an RMAN session by using the ALTER SYSTEM statement, you need the Oracle session ID for the RMAN channel and the serial number. The session ID is shown in the RMAN message log, and the serial number can be obtained from V$SESSION.

Search for messages with the format shown in the following example:

channel ch1: sid=15 devtype=SBT_TAPE

The sid and devtype are displayed for each allocated channel. The Oracle Database sid is different from the operating system process ID. You can end the session using a SQL ALTER SYSTEM KILL SESSION statement.

ALTER SYSTEM KILL SESSION takes two arguments, the sid printed in the RMAN message and a serial number, both of which can be obtained by querying V$SESSION.

For example, run the following statement, where sid_in_rman_output is the number from the RMAN message:

SELECT SERIAL# 
FROM   V$SESSION 
WHERE  SID=sid_in_rman_output;

Then, run the following statement, substituting the sid_in_rman_output and serial number obtained from the query:

ALTER SYSTEM KILL SESSION 'sid_in_rman_output,serial#';

This statement has no effect on the session if the session is stopped in media manager code.
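
The two pieces of information can be assembled into the statement with a short script. The following Python sketch is illustrative only: channel_sids and kill_session_sql are hypothetical helpers, the pattern assumes the channel message format shown in this section (other releases may format the line differently), and the serial number 42 is a made-up value standing in for the result of the V$SESSION query.

```python
import re

# Matches lines such as "channel ch1: sid=15 devtype=SBT_TAPE".
_CHANNEL = re.compile(r"^channel (\S+): sid=(\d+) devtype=(\S+)")


def channel_sids(rman_log):
    """Map each channel name found in the log text to its sid."""
    sids = {}
    for line in rman_log.splitlines():
        match = _CHANNEL.match(line.strip())
        if match:
            sids[match.group(1)] = int(match.group(2))
    return sids


def kill_session_sql(sid, serial):
    """Build the statement text; int() rejects anything that is not a number."""
    return "ALTER SYSTEM KILL SESSION '%d,%d';" % (int(sid), int(serial))


sids = channel_sids("channel ch1: sid=15 devtype=SBT_TAPE")
print(sids)
# 42 is a placeholder for the SERIAL# value returned by the V$SESSION query.
print(kill_session_sql(sids["ch1"], 42))
```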

23.4.2 Terminating the Session at the Operating System Level

Finding and terminating the processes that are associated with the server sessions is operating system-specific. On some platforms, the server sessions are not associated with any processes at all. See your operating system-specific documentation for more information.

23.4.3 Terminating an RMAN Session That Is Not Responding in the Media Manager

You may sometimes need to terminate an RMAN job that is not responding in the media manager. The best way to terminate RMAN when the channel connections are not responding in the media manager is to terminate the session in the media manager.

If this action does not solve the problem, then on some platforms, such as Linux, you may be able to terminate the Oracle Database processes of the connections. (Terminating the Oracle processes may cause problems with the media manager. See your media manager documentation for details.)

23.4.3.1 Components of an RMAN Session

The nature of an RMAN session depends on the operating system.

In UNIX, an RMAN session has the following processes associated with it:

  • The RMAN client process itself

  • The default channel, the initial connection to the target database

  • One target connection to the target database corresponding to each allocated channel

  • The catalog connection to the recovery catalog database, if you use a recovery catalog

  • An auxiliary connection to an auxiliary instance, during DUPLICATE or TSPITR operations

  • A polling connection to the target database, used for monitoring RMAN command execution on the various allocated channels. By default, RMAN makes one polling connection. RMAN makes additional polling connections if you use different connect strings in the ALLOCATE CHANNEL or CONFIGURE CHANNEL commands. One polling connection exists for each distinct connect string used in the ALLOCATE CHANNEL or CONFIGURE CHANNEL command.

23.4.3.2 Process Behavior During a Suspended Job

RMAN usually stops responding because a channel connection is waiting in the media manager code for a tape resource. The catalog connection and the default channel appear to suspend, because they are waiting for RMAN to tell them what to do. Polling connections seem to be in an infinite loop while polling the RPC under the control of the RMAN process.

If you terminate the RMAN process itself, then you also terminate the catalog connection, the auxiliary connection, the default channel, and the polling connections. If target and auxiliary connections are suspended but not while executing media manager code, they also terminate. If either the target connection or any of the auxiliary connections are executing in the media management layer, then they do not terminate until the processes are manually terminated at the operating system level.

Not all media managers can detect the termination of the Oracle Database process. Those that cannot may keep resources busy or continue processing. Consult your media manager documentation for details.

Terminating the catalog connection does not cause the RMAN process to terminate because RMAN is not performing catalog operations while the backup or restore is in progress. Removing default channel and polling connections causes the RMAN process to detect that a channel is no longer present and then to exit. In this case, the connections to the unresponsive channels remain active as described previously.

23.4.3.3 Terminating an RMAN Session: Basic Steps

After the unresponsive channels in the media manager code are terminated, the RMAN process detects this termination and exits, removing all connections except target connections that are still operative in the media management layer.

The warning about the media manager resources still applies in this case.

To terminate an Oracle Database process that is not responding in the media manager:

  1. Query V$SESSION and V$SESSION_WAIT, as described in "Using V$ Views for RMAN Troubleshooting". For example, execute the following query:
    COLUMN EVENT FORMAT a17
    COLUMN SECONDS_IN_WAIT FORMAT 999
    COLUMN STATE FORMAT a10
    COLUMN CLIENT_INFO FORMAT a30
    
    SELECT p.SPID, s.EVENT, s.SECONDS_IN_WAIT AS SEC_WAIT, 
           sw.STATE, s.CLIENT_INFO
    FROM   V$SESSION_WAIT sw, V$SESSION s, V$PROCESS p
    WHERE  sw.EVENT LIKE '%MML%'
    AND    s.SID=sw.SID
    AND    s.PADDR=p.ADDR;
    

    Examine the SQL output to determine which SBT functions are waiting. For example, the output may be as follows:

    SPID EVENT             SEC_WAIT   STATE      CLIENT_INFO
    ---- ----------------- ---------- ---------- -----------------------------
    8642 Backup: MML write 600        WAITING    rman channel=ORA_SBT_TAPE_1
    8374 Backup: MML write 600        WAITING    rman channel=ORA_SBT_TAPE_2
    
  2. Using operating system-level tools appropriate to your platform, end the unresponsive sessions. For example, on Linux execute a kill -9 command:
    % kill -9 8642 8374
    

    Some platforms include a command-line utility called orakill that enables you to terminate a specific thread. From a command prompt, run the following command, where sid identifies the database instance to target, and the thread_id is the SPID value from the query in Step 1:

    orakill sid thread_id
    
  3. Check that the media manager also clears its processes. If any remain, the next backup or restore operation may freeze again, due to the previous problems in the backup or restore operation. In some media managers, the only solution is to shut down and restart the media manager. If the documentation from the media manager does not provide the needed information, contact technical support for the media manager.

    See Also:

    Your operating system-specific documentation for the relevant commands