Restarting a Primary Extract after System Failure or Corruption

This procedure enables Oracle GoldenGate to recover from certain conditions, such as a file system corruption or a system failure, that corrupt the Extract checkpoint file, trail, or both, and which prevent Extract from being able to start. It enables you to establish a safe starting point in the transaction log for the primary Extract after the system has been restored. It also shows you how to reposition downstream data pumps and Replicat to read from the correct Extract write position in the trails, and to filter out any transactions that Replicat already applied to the target.

Details of this procedure

Extract passes a log begin sequence number, or LOGBSN, to the trail files. The BSN is the native database sequence number that identifies the oldest uncommitted transaction that is held in Extract memory. For example, the BSN in an Oracle installation would be the Oracle system change number (SCN). Each trail file contains the lowest LOGBSN value for all of the transactions in that trail file. Once you know the LOGBSN value, you can reposition Extract at the correct read position to ensure that the appropriate transactions are re-generated to the trail and propagated to Replicat.

Note:

In an Oracle RAC environment, the lowest SCN of all of the threads is transmitted to Replicat. Transactions that may already have been committed by Replicat are handled as duplicates at startup. However, any thread that has been idle past a certain threshold will not be considered for the BSN value, to avoid Extract having to read too far back in the log stream when restarted.

The bounded recovery checkpoint is not taken into account when calculating the LOGBSN. The failure that affected the Extract checkpoint file may also involve a loss of the persisted bounded recovery data files and bounded recovery checkpoint information.

Performing the Recovery

Follow these steps in the order shown to recover the Oracle GoldenGate processes.

  1. In GGSCI on the target system, issue the DBLOGIN command.
    DBLOGIN {USERID Replicat_user | USERIDALIAS alias_of_Replicat_user} 
    
  2. On the target, obtain the LOGBSN value by issuing the INFO REPLICAT command with the DETAIL option.
    INFO REPLICAT group, DETAIL
    

    The BSN is included in the output as a line similar to the following:

    Current Log BSN value: 1151679
    
  3. (Classic capture mode only. Skip if using integrated capture mode.) Query the source database to find the sequence number of the transaction log file that contains the value of the LOGBSN that you identified in the previous step. This example assumes 1855798 is the LOGBSN value and shows that the sequence number of the transaction log that contains that LOGBSN value is 163.
    SQL> select name, thread#, sequence# from v$archived_log 
    where 1855798 between first_change# and next_change#; 
    
    NAME                                  THREAD#    SEQUENCE# 
    ------------------------------------- ---------- ----------/oracle/dbs/arch1_163_800262442.dbf   1          163 
    
  4. Issue the following commands in GGSCI to reposition the primary Extract to the LOGBSN start position.
    • (Classic capture mode)

      ALTER EXTRACT group EXTSEQNO 163
      ALTER EXTRACT group EXTRBA 0 
      ALTER EXTRACT group ETROLLOVER
      
    • (Integrated capture mode)

      ALTER EXTRACT group SCN 1151679
      ALTER EXTRACT group ETROLLOVER
      

    Note:

    There is a limit on how far back Extract can go in the transaction stream, when in integrated mode. If the required SCN is no longer available, the ALTER EXTRACT command fails.

  5. Issue the following command in GGSCI to the primary Extract to view the new sequence number of the Extract Write Checkpoint. This command shows the trail and RBA where Extract will begin to write new data. Because a rollover was issued, the start point is at the beginning (RBA 0) of the new trail file, in this example file number 7.
    INFO EXTRACT group SHOWCH
    Sequence #: 7
    RBA: 0 
    
  6. Issue the following command in GGSCI to reposition the downstream data pump and start a new output trail file.
    ALTER EXTRACT pump EXTSEQNO 7
    ALTER EXTRACT pump EXTRBA 0
    ALTER EXTRACT pump ETROLLOVER
    
  7. Issue the following command in GGSCI to the data pump Extract to view the new sequence number of the data pump Write Checkpoint, in this example trail number 9.
    INFO EXTRACT pump SHOWCH
    Sequence #: 9
    RBA: 0 
    
  8. Reposition Replicat to start reading the trail at the new Write Checkpoint of the data pump.
    ALTER REPLICAT group EXTSEQNO 9
    ALTER REPLICAT group EXTRBA 0
    
  9. Start the primary Extract and the data pump.
    START EXTRACT group
    START REPLICAT group
    
  10. Issue the following command in GGSCI to start Replicat. If Replicat is operating in integrated mode (Oracle targets only), omit the FILTERDUPTRANSACTIONS option. Integrated Replicat handles duplicate transactions transparently.
    START REPLICAT group[, FILTERDUPTRANSACTIONS]

Note:

The LOGBSN gives you the information needed to set Extract back in time to reprocess transactions. Some filtering by Replicat is necessary because Extract will likely re-generate a small amount of data that was already applied by Replicat. FILTERDUPTRANSACTIONS directs Replicat to find and filter duplicates at the beginning of the run.