Corruption Prevention, Detection, and Repair

Data block corruptions can be very disruptive and challenging to repair. When encountered, corruptions can cause serious application and database downtime and data loss. Worse yet, they can go undetected for hours, days, or even weeks, leading to even longer application downtime once detected.

Unfortunately, there is no single way to comprehensively prevent, detect, and repair data corruptions within the database, because the source of a corruption can be anywhere: memory, hardware, firmware, storage, operating system, software, or user error. Third-party solutions that do not understand Oracle data block semantics and how Oracle changes data blocks do not prevent or detect data block corruptions well. Third-party remote mirroring technologies can propagate data corruptions to the database replica (standby), leading to a double failure, data loss, and much longer downtime. Third-party backup and restore solutions cannot detect corrupted backups or bad sectors until a restore or validate operation is issued, resulting in longer restore times and, once again, potential data loss.

Oracle MAA has a comprehensive plan to prevent, detect, and repair all forms of data block corruptions, including physical block corruptions, logical block corruptions, stray writes, and lost writes. Together, these safeguards provide the most comprehensive Oracle data block corruption prevention, detection, and repair solution. Details of this plan are described in the My Oracle Support note "Best Practices for Corruption Detection, Prevention, and Automatic Repair - in a Data Guard Configuration" (Doc ID 1302539.1).

Table 3-1 summarizes the block corruption checks, grouped into manual operational checks, runtime checks, and background checks. Database administrators and the operations team can incorporate manual checks such as running Oracle Recovery Manager (RMAN) backups, running RMAN validations with the CHECK LOGICAL option, or running the ANALYZE statement with the VALIDATE STRUCTURE clause on important objects. Manual checks are especially important for validating data that is rarely updated or queried.

Runtime checks are far superior because they catch corruptions almost immediately, at run time, for actively queried and updated data. Runtime checks can prevent corruptions or fix them automatically, resulting in better data protection and higher application availability. Exadata also introduces a background check that automatically scans and scrubs disks with no application overhead and automatically repairs physically corrupted blocks.

Table 3-1 Summary of Block Corruption Checks

| Checks | Capabilities | Physical Block Corruption | Logical Block Corruption |
|---|---|---|---|
| Manual checks | Dbverify, Analyze | Physical block checks | Logical intra-block and inter-object consistency checks |
| Manual checks | RMAN | Physical block checks during backup and restore operations | Intra-block logical checks |
| Manual checks | ASM Scrub | Physical block checks | Some logical intra-block checks |
| Runtime checks | Oracle Active Data Guard | (1) Continuous physical block checking at standby during transport and apply. (2) Strong database isolation eliminates the database as a single point of failure. (3) Automatic repair of block corruptions, including file block headers in Oracle Database 12c Release 2. (4) Automatic database failover. | (1) With DB_LOST_WRITE_PROTECT enabled, detection of lost writes (11.2 and higher). With 11.2.0.4 and Data Guard broker, ability to shut down the primary when lost writes are detected on the primary database. (2) With DB_BLOCK_CHECKING enabled on the standby, additional intra-block logical checks. |
| Runtime checks | Database | With DB_BLOCK_CHECKSUM, in-memory data block and redo checksum validation | With DB_BLOCK_CHECKING, in-memory intra-block check validation. Starting in Oracle Database 18c, and with Shadow Lost Write Protection enabled, Oracle tracks system change numbers (SCNs) for tracked data files and enables early lost write detection. When lost writes are detected, an error is returned immediately. See Shadow Lost Write Protection description following this table. |
| Runtime checks | ASM and ASM software mirroring (inherent in Exadata, Supercluster, and Zero Data Loss Recovery Appliance) | Implicit data corruption detection for reads and writes and automatic repair if good ASM extent block pair is available during writes | |
| Runtime checks | DIX + T10 DIF | Checksum validation from operating system to HBA controller to disk (firmware). Validation for reads and writes for certified Linux, HBA, and disks. | |
| Runtime checks | Hardware and Storage | Limited checks due to lack of Oracle integration. Checksum is most common. | Limited checks due to lack of Oracle integration. Checksum is most common. |
| Runtime checks | Exadata | Comprehensive HARD checks on writes | HARD checks on writes |
| Background checks | Exadata | Automatic HARD disk scrub and repair. Detects and fixes bad sectors. | |
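Several of the physical block checks in Table 3-1 rest on a common idea: a checksum is stored with each block when it is written and recomputed when the block is read, and a mismatch can be repaired from a mirrored copy if one is intact. The following Python sketch is a toy model of that idea only. It is not Oracle's block format, checksum algorithm, or ASM repair logic; the 4-byte CRC32 prefix and the names `seal`, `is_intact`, and `read_with_repair` are assumptions made for illustration.

```python
import zlib


def seal(payload):
    """Prefix the payload with a 4-byte CRC32 checksum, as a writer might."""
    return zlib.crc32(payload).to_bytes(4, "big") + payload


def is_intact(block):
    """Recompute the checksum over the payload and compare it to the stored one."""
    stored = int.from_bytes(block[:4], "big")
    return zlib.crc32(block[4:]) == stored


def read_with_repair(copies):
    """Return the payload from the first intact mirrored copy.

    Any copy that fails validation is overwritten in place with the intact
    copy. Raises IOError if no copy passes validation.
    """
    good = next((c for c in copies if is_intact(c)), None)
    if good is None:
        raise IOError("all mirrored copies failed checksum validation")
    for i, copy in enumerate(copies):
        if not is_intact(copy):
            copies[i] = good  # repair the corrupt copy from the good one
    return good[4:]
```

For example, if one of two copies has a flipped bit in its payload, `read_with_repair` returns the payload from the other copy and replaces the damaged one. A check of this kind only catches physical damage; a block that is stale but internally consistent still passes, which is why lost writes need a separate mechanism.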

Shadow Lost Write Protection

A lost write occurs when the I/O subsystem acknowledges the completion of a block write, but the write did not occur in persistent storage. Introduced in Oracle Database 18c, shadow lost write protection detects a lost write before it can result in a major data corruption. You can enable shadow lost write protection for a database, a tablespace, or a data file without requiring an Oracle Data Guard standby database. Shadow lost write protection provides fast detection of, and immediate response to, a lost write, minimizing the data loss that can occur in a database due to data corruption.
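The detection principle can be illustrated with a small Python sketch: keep a separate record of the latest SCN written for each block, and compare it with the SCN found in the block when it is read. This is a toy model under stated assumptions, not Oracle's implementation. The names `Block`, `ToyStore`, and `LostWriteError`, and the `lose_write` flag that simulates storage dropping an acknowledged write, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Block:
    scn: int       # system change number of the last change to this block
    payload: str


class LostWriteError(Exception):
    """Raised when a block on disk is older than its shadow SCN record."""


class ToyStore:
    def __init__(self):
        self.disk = {}        # block_id -> Block actually held by storage
        self.shadow_scn = {}  # block_id -> latest SCN recorded separately

    def write(self, block_id, scn, payload, lose_write=False):
        # The shadow record is always updated when a write is issued.
        self.shadow_scn[block_id] = scn
        # Simulated fault: the write is acknowledged but never reaches disk.
        if not lose_write:
            self.disk[block_id] = Block(scn, payload)

    def read(self, block_id):
        block = self.disk.get(block_id)
        expected = self.shadow_scn[block_id]
        # A missing or stale block means an acknowledged write was lost.
        if block is None or block.scn < expected:
            found = None if block is None else block.scn
            raise LostWriteError(
                f"block {block_id}: found SCN {found}, expected {expected}"
            )
        return block
```

In this model, writing a block at SCN 100 and then issuing a second write at SCN 200 with `lose_write=True` leaves the SCN 100 version on disk. The stale block is internally consistent, so a checksum alone would not flag it, but the next `read` raises `LostWriteError` because the shadow record expects SCN 200.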

See Also:

Oracle AI Database Reference for more information about the views and initialization parameters

My Oracle Support Note 1302539.1