Corruption Prevention, Detection, and Repair
Data block corruptions can be very disruptive and challenging to repair. When encountered, corruptions can cause serious application and database downtime and data loss. Worse yet, they can go undetected for hours, days, or even weeks, leading to even longer application downtime once detected.
Unfortunately, there is no single way to comprehensively prevent, detect, and repair data corruptions within the database, because the source and cause of a corruption can be anywhere in memory, hardware, firmware, storage, operating system, software, or user error. Third-party solutions that do not understand Oracle data block semantics and how Oracle changes data blocks do not prevent or detect data block corruptions well. Third-party remote mirroring technologies can propagate data corruptions to the database replica (standby), leading to a double failure, data loss, and much longer downtime. Third-party backup and restore solutions cannot detect corrupted backups or bad sectors until a restore or validate operation is issued, resulting in longer restore times and, once again, potential data loss.
Oracle MAA has a comprehensive plan to prevent, detect, and repair all forms of data block corruptions including physical block corruptions, logical block corruptions, stray writes, and lost writes. These additional safeguards provide the most comprehensive Oracle data block corruption prevention, detection, and repair solution. Details of this plan are described in the My Oracle Support note "Best Practices for Corruption Detection, Prevention, and Automatic Repair - in a Data Guard Configuration (Doc ID 1302539.1)."
The following table summarizes the block corruption checks performed by manual operational checks, runtime checks, and background checks. Database administrators and the operations team can incorporate manual checks such as running Oracle Recovery Manager (RMAN) backups, RMAN "check logical" validations, or running the ANALYZE VALIDATE STRUCTURE command on important objects. Manual checks are especially important for validating data that is rarely updated or queried.
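The difference between a physical and a logical check during a manual validation sweep can be illustrated with a toy model. The sketch below is not Oracle's block format or validation algorithm; the `Block` class, its `row_count` header field, and the `validate` function are hypothetical names invented for illustration. It assumes a physical check means "the stored checksum still matches the bytes" and a logical check means "the block's contents are internally consistent even though the checksum is valid".

```python
import zlib
from dataclasses import dataclass


@dataclass
class Block:
    """Toy data block (hypothetical layout, not Oracle's)."""
    row_count: int      # header field: number of rows the block claims to hold
    rows: list          # row payloads actually stored, each a bytes object
    checksum: int = 0   # CRC32 computed over header and rows when sealed

    def payload(self) -> bytes:
        return self.row_count.to_bytes(4, "big") + b"".join(self.rows)

    def seal(self) -> "Block":
        # Record the checksum of the block as it is written.
        self.checksum = zlib.crc32(self.payload())
        return self


def physical_ok(block: Block) -> bool:
    # Physical check: do the stored bytes still match the stored checksum?
    return zlib.crc32(block.payload()) == block.checksum


def logical_ok(block: Block) -> bool:
    # Logical intra-block check: does the header agree with the contents?
    return block.row_count == len(block.rows)


def validate(blocks) -> dict:
    """Scan every block, including ones no query would touch, and report findings."""
    findings = {}
    for block_no, block in enumerate(blocks):
        if not physical_ok(block):
            findings[block_no] = "physical"
        elif not logical_ok(block):
            findings[block_no] = "logical"
    return findings


good = Block(2, [b"alice", b"bob"]).seal()
bit_rot = Block(2, [b"alice", b"bob"]).seal()
bit_rot.rows[1] = b"bxb"                            # bytes damaged after sealing
bad_header = Block(3, [b"alice", b"bob"]).seal()    # valid checksum, wrong row count

print(validate([good, bit_rot, bad_header]))        # {1: 'physical', 2: 'logical'}
```

The third block passes the checksum test because it was written in a logically inconsistent state, which is why a checksum alone does not cover logical corruption.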
Runtime checks are far superior in that they catch corruptions almost immediately or during runtime for actively queried and updated data. Runtime checks can prevent corruptions or automatically fix corruptions resulting in better data protection and higher application availability. A new background check has been introduced in Exadata to automatically scan and scrub disks intelligently with no application overhead and to automatically fix physically corrupted blocks.
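As a rough illustration of how a runtime check can both detect and automatically fix a corruption, the following sketch verifies a checksum on every read and repairs the primary copy from a mirror when the mirror is still good. It is a simplified model under stated assumptions, not the behavior of any specific Oracle component; `MirroredStore` and its methods are hypothetical names.

```python
import zlib


class MirroredStore:
    """Toy two-way mirrored block store (hypothetical, for illustration only)."""

    def __init__(self):
        self.primary = {}   # block_no -> (checksum, data)
        self.mirror = {}    # block_no -> (checksum, data)

    def write(self, block_no: int, data: bytes) -> None:
        record = (zlib.crc32(data), data)
        self.primary[block_no] = record
        self.mirror[block_no] = record

    @staticmethod
    def _valid(record) -> bool:
        checksum, data = record
        return zlib.crc32(data) == checksum

    def read(self, block_no: int) -> bytes:
        record = self.primary[block_no]
        if self._valid(record):
            return record[1]
        # Primary copy failed its checksum: fall back to the mirror.
        candidate = self.mirror[block_no]
        if not self._valid(candidate):
            raise OSError(f"block {block_no}: both copies are corrupt")
        # Repair the primary copy in place so later reads succeed directly.
        self.primary[block_no] = candidate
        return candidate[1]


store = MirroredStore()
store.write(7, b"payload")
checksum, _ = store.primary[7]
store.primary[7] = (checksum, b"pAyload")   # simulate damage to the primary copy

print(store.read(7))                        # b'payload'
print(store.primary[7] == store.mirror[7])  # True
```

The application never sees the damaged bytes in this model: the read that would have returned them is the same read that triggers the repair.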
Table 3-1 Summary of Block Corruption Checks
| Checks | Capabilities | Physical Block Corruption | Logical Block Corruption |
|---|---|---|---|
| Manual checks | Dbverify, Analyze | Physical block checks | Logical intra-block and inter-object consistency checks |
| Manual checks | RMAN | Physical block checks during backup and restore operations | Intra-block logical checks |
| Manual checks | ASM Scrub | Physical block checks | Some logical intra-block checks |
| Runtime checks | Oracle Active Data Guard | 1. Continuous physical block checking at standby during transport and apply 2. Strong database isolation eliminates single point database failure 3. Automatic repair of block corruptions, including file block headers in Oracle Database 12c Release 2 4. Automatic database failover | 1. With 2. With |
| Runtime checks | Database | With | With Starting in Oracle Database 18c, and with Shadow Lost Write Protection enabled, Oracle tracks system change numbers (SCNs) for tracked data files and enables early lost write detection. When lost writes are detected, an error is returned immediately. See Shadow Lost Write Protection description following this table. |
| Runtime checks | ASM and ASM software mirroring (inherent in Exadata, Supercluster, and Zero Data Loss Recovery Appliance) | Implicit data corruption detection for reads and writes and automatic repair if good ASM extent block pair is available during writes | |
| Runtime checks | DIX + T10 DIF | Checksum validation from operating system to HBA controller to disk (firmware). Validation for reads and writes for certified Linux, HBA and disks. | |
| Runtime checks | Hardware and Storage | Limited checks due to lack of Oracle integration. Checksum is most common. | Limited checks due to lack of Oracle integration. Checksum is most common. |
| Runtime checks | Exadata | Comprehensive HARD checks on writes | HARD checks on writes |
| Background checks | Exadata | Automatic HARD disk scrub and repair. Detects and fixes bad sectors. | |
Shadow Lost Write Protection
New in Oracle Database 18c, shadow lost write protection detects a lost write before it can result in a major data corruption. You can enable shadow lost write protection for a database, a tablespace, or a data file without requiring an Oracle Data Guard standby database. Shadow lost write protection provides fast detection and immediate response to a lost write, thus minimizing the data loss that can occur in a database due to data corruption.
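The idea of detecting a lost write by tracking SCNs separately from the data can be sketched as a toy model. This is an assumption-laden illustration, not Oracle's internal implementation: `TrackedFile`, `LostWriteError`, and the `lose` flag are hypothetical names, and the model assumes a lost write is one that storage acknowledges but never persists, leaving a stale but otherwise valid block on disk.

```python
class LostWriteError(Exception):
    """Raised when a block on disk is older than its last acknowledged write."""


class TrackedFile:
    """Toy data file with a separate record of per-block SCNs (hypothetical)."""

    def __init__(self):
        self.disk = {}     # block_no -> (scn, data) as actually persisted
        self.shadow = {}   # block_no -> SCN of the most recent acknowledged write

    def write(self, block_no: int, scn: int, data: bytes, lose: bool = False) -> None:
        # lose=True simulates storage that acknowledges the write but drops it.
        if not lose:
            self.disk[block_no] = (scn, data)
        self.shadow[block_no] = scn

    def read(self, block_no: int) -> bytes:
        scn, data = self.disk.get(block_no, (0, b""))
        expected = self.shadow.get(block_no, scn)
        if scn < expected:
            # The block is stale: report it immediately instead of returning old data.
            raise LostWriteError(
                f"block {block_no}: disk SCN {scn} is older than tracked SCN {expected}"
            )
        return data


f = TrackedFile()
f.write(1, scn=100, data=b"balance=50")
f.write(1, scn=200, data=b"balance=75", lose=True)   # this write never reaches disk

try:
    f.read(1)
except LostWriteError as err:
    print(err)   # block 1: disk SCN 100 is older than tracked SCN 200
```

A checksum would not catch this case, because the stale block is internally intact; only the comparison against an independently tracked SCN reveals that a newer version should exist.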
See Also:
Oracle AI Database Reference for more information about the views and initialization parameters
My Oracle Support Note 1302539.1