3 Oracle GoldenGate Processes and Key Terms

Oracle GoldenGate has common data replication processes and architecture-specific processes as well.

Specific components of Classic Architecture and Microservices Architecture are discussed in Components of Classic Architecture and Components of Microservices Architecture. However, there are various processes and key terms that are common to both the architectures of Oracle GoldenGate.

3.1 Common Data Replication Processes

There are a number of data replication processes that are common to both Oracle GoldenGate architectures.

Topics:

3.1.1 What is an Extract?

Extract is a process that is configured to run against the source database or configured to run on a downstream mining database (Oracle only) with capturing data generated in the true source database located somewhere else. This process is the extraction or the data capture mechanism of Oracle GoldenGate.

You can configure an Extract for the following use cases:
  • Initial Loads: When you set up Oracle GoldenGate for initial loads, the Extract process captures the current, static set of data directly from the source objects.

  • Change Synchronization: When you set up Oracle GoldenGate to keep the source data synchronized with another set of data, the Extract process captures the DML and DDL operations performed on the configured objects after the initial synchronization has taken place. Extracts can run locally on the same server as the database or on another server using the downstream Integrated Extract for reduced overhead. It stores these operations until it receives commit records or rollbacks for the transactions that contain them. If it receives a rollback, it discards the operations for that transaction. If it receives a commit, it persists the transaction to disk in a series of files called a trail, where it is queued for propagation to the target system. All of the operations in each transaction are written to the trail as a sequentially organized transaction unit. This design ensures both speed and data integrity.

    Note:

    Extract ignores operations on objects that are not in the Extract configuration, even though a transaction may also include operations on objects that are in the Extract configuration.
The Extract process can be configured to extract data from three types of data sources:
  • Source tables: This source type is used for initial loads.

  • Database recovery logs or transaction logs: While capturing from the logs, the actual method varies depending on the database type. Some examples of this source type are the Oracle Database redo logs or SQL/MX audit trails.

  • Third-party capture modules: This method provides a communication layer that passes data and metadata from an external API to the Extract API. The database vendor or a third-party vendor provides the components that extract the data operations and pass them to Extract.

3.1.2 What is a Trail?

A trail is a series of files on disk where Oracle GoldenGate stores the captured changes to support the continuous extraction and replication of database changes.

A trail can exist on the source system, an intermediary system, the target system, or any combination of those systems, depending on how you configure Oracle GoldenGate. On the local system, it is known as an Extract trail (or local trail). On a remote system, it is known as a remote trail. By using a trail for storage, Oracle GoldenGate supports data accuracy and fault tolerance. The use of a trail also allows extraction and replication activities to occur independently of each other. With these processes separated, you have more choices for how data is processed and delivered. For example, instead of extracting and replicating changes continuously, you could extract changes continuously and store them in the trail for replication to the target later, whenever the target application needs them.

In addition, trails allow Oracle Database to operate in heterogeneous environment. The data is stored in a trail file in a consistent format, so it can be read by Replicat process for all supported databases. For more information , see About the Oracle GoldenGate Trail.

Processes that write to the trail file:

The Extract and the data pump processes write to the trail. Only one Extract process can write to a given local trail. All local trails must have different full-path names though you can use the same trail names in different paths.

Multiple data pump processes can each write to a trail of the same name, but the physical trails themselves must reside on different remote systems, such as in a data-distribution topology. For example, a data pump named pump1 and a data pump named pump2 can both reside on sys01 and write to a remote trail named aa. Pump1 can write to trail aa on sys02, while pump2 can write to trail aa on sys03.

Processes that read from the trail file:

The data pump and Replicat processes read from the trail files. The data pump extracts DML and DDL operations from a local trail that is linked to an Extract process, performs further processing if needed, and transfers the data to a trail that is read by the next Oracle GoldenGate process downstream (typically Replicat, but could be another data pump if required).

The Replicat process reads the trail and applies the replicated DML and DDL operations to the target database.

Trail file creation and maintenance:

The trail files are created as needed during processing. You specify a two-character name for the trail when you add it to the Oracle GoldenGate configuration with the ADD RMTTRAIL or ADD EXTTRAIL command. By default, trails are stored in the dirdat sub-directory of the Oracle GoldenGate directory. You can specify a six or nine digit sequence number using the TRAIL_SEQLEN_9D | TRAIL_SEQLEN_6D GLOBALS parameter; TRAIL_SEQLEN_9D is set by default.

Trail files age automatically to allow processing to continue without interruption for file maintenance. As each new file is created, it inherits the two-character trail name appended with a unique nine digit sequence number from 000000000 through 999999999 (for example c:\ggs\dirdat\tr000000001). When the sequence number reaches 999999999, the numbering starts over at 000000000, and previous trail files are overwritten. trail files can be purged on a routine basis by using the Manager parameter PURGEOLDEXTRACTS.

You can create more than one trail to separate the data from different objects or applications. You link the objects that are specified in a TABLE or SEQUENCE parameter to a trail that is specified with an EXTTRAIL or RMTTRAIL parameter in the Extract parameter file. To maximize throughput, and to minimize I/O load on the system, extracted data is sent into and out of a trail in large blocks. Transactional order is preserved.

Converting Existing Trails to 9 Digit Sequence Numbers

You can convert trail files from 9-digit to 6-digit checkpoint record for the named extract groups. Use convchk native command to convert to 9-digit trail by stopping your Extract gracefully then using convchk to upgrade as follows:

convchk extract trail seqlen_9d

Start your Extract

You can downgrade from a 9 to 6 digit trail with the same process using this convchk command:

convchk extract trail seqlen_6d

Note:

Extract Files: You can configure Oracle GoldenGate to store extracted data in an extract file instead of a trail. The extract file can be a single file, or it can be configured to roll over into multiple files in anticipation of limitations on file size that are imposed by the operating system. It is similar to a trail, except that checkpoints are not recorded. The file or files are created automatically during the run. The same versioning features that apply to trails also apply to extract files.

3.1.3 What is a Replicat?

Replicat is a process that delivers data to a target database. It reads the trail file on the target database, reconstructs the DML or DDL operations, and applies them to the target database.

The Replicat process uses dynamic SQL to compile a SQL statement once and then executes it many times with different bind variables. You can configure the Replicat process so that it waits a specific amount of time before applying the replicated operations to the target database. For example, a delay may be desirable to prevent the propagation of errant SQL, to control data arrival across different time zones, or to allow time for other planned events to occur.

For the two common uses cases of Oracle GoldenGate, the function of the Replicat process is as follows:
  • Initial Loads: When you set up Oracle GoldenGate for initial loads, the Replicat process applies a static data copy to target objects or routes the data to a high-speed bulk-load utility.

  • Change Synchronization: When you set up Oracle GoldenGate to keep the target database synchronized with the source database, the Replicat process applies the source operations to the target objects using a native database interface or ODBC, depending on the database type.

You can configure multiple Replicat processes with one or more Extract processes and Data Pumps in parallel to increase throughput. To preserve data integrity, each set of processes handles a different set of objects. To differentiate among Replicat processes, you assign each one a group name

If you don't want to use multiple Replicat processes, you can configure a single Replicat process in coordinated or integrated mode.

  • Coordinated mode is supported on all databases that Oracle GoldenGate supports. In coordinated mode, the Replicat process is threaded. One coordinator thread spawns and coordinates one or more threads that execute replicated SQL operations in parallel. A coordinated Replicat process uses one parameter file and is monitored and managed as one unit. See olink:GWUAD-GUID-6DE3925D-65A1-4604-B253-0A2CC7FEC74F for more information.

  • Integrated mode is supported for Oracle Database releases 11.2.0.4 or later. In integrated mode, the Replicat process leverages the apply processing functionality that is available within the Oracle Database. Within a single Replicat configuration, multiple inbound server child processes known as apply servers apply transactions in parallel while preserving the original transaction atomicity. See About Integrated Replicat for more information about integrated mode.

You can delay Replicat so that it waits a specific amount of time before applying the replicated operations to the target database. A delay may be desirable, for example, to prevent the propagation of errant SQL, to control data arrival across different time zones, or to allow time for other planned events to occur. The length of the delay is controlled by the DEFERAPPLYINTERVAL parameter.

Various parameters control the way that Replicat converts source transactions to target transactions. These parameters include BATCHSQL, GROUPTRANSOPS, and MAXTRANSOPS. For more information about these and other Replicat parameters, see Oracle GoldenGate Parameters in Reference for Oracle GoldenGate.

3.1.4 What are Checkpoints?

Checkpoints store the current read and write positions of a process to disk for recovery purposes. Checkpoints ensure that data changes that are marked for synchronization actually are captured by Extract and applied to the target by Replicat, and they prevent redundant processing. They provide fault tolerance by preventing the loss of data should the system, the network, or an Oracle GoldenGate process need to be restarted. For complex synchronization configurations, checkpoints enable multiple Extract or Replicat processes to read from the same set of trails.

Checkpoints work with inter-process acknowledgments to prevent messages from being lost in the network. Oracle GoldenGate has a proprietary guaranteed-message delivery technology.

Extract creates checkpoints for its positions in the data source and in the trail. Because Extract only captures committed transactions, it keeps track of the operations in all open transactions, in the event that any of them are committed. This requires Extract to record a checkpoint where it is currently reading in a transaction log, plus the position of the start of the oldest open transaction, which can be in the current or any preceding log.

To control the amount of transaction log that must be re-processed after an outage in an Oracle database, Extract also persists the current state and data of processing to disk at specific intervals, including the state and data (if any) of long-running transactions. If Extract stops after one of these intervals, it can recover from a position within the previous interval or at the last checkpoint, instead of having to return to the log position where the oldest open long-running transaction first appeared. See the BR parameter in Reference for Oracle GoldenGate for more information.

Replicat creates checkpoints for its position in the trail. Replicat stores these checkpoints in a table, known as the checkpoint table, in the target database and also in a checkpoint file on disk. The checkpoint table is stored with a user-specified name and location. The checkpoint file is stored in the dirchk sub-directory of the Oracle GoldenGate directory.

At the completion of each transaction, Replicat writes information about that transaction to a row in the checkpoint table, linking the transaction with a unique position in a specific trail file. Replicat also writes a checkpoint to the checkpoint file when it completes a transaction. At regular intervals, Replicat also writes its current read position to the checkpoint file. These positions are typically not at a transaction boundary, but at some point within a transaction. The interval length is controlled by the CHECKPOINTSECS parameter.

Because the checkpoint table is part of the database, and benefits from the database recovery system, it provides a more efficient recovery point for Replicat. The last checkpoint in the checkpoint file may not be the most recent transaction boundary. It could be the middle of a transaction not yet applied by Replicat or an earlier transaction that was already applied. The checkpoint table ensures that Replicat starts at the correct transaction boundary, so that each transaction is applied only once. The information in the checkpoint table can be used for recovery in some cases, but is primarily used for purposes, such as for the INFO commands in GGSCI.

Regular backups of the Oracle GoldenGate environment, including the trails, should match your database backup, recovery, and retention policies. Restoring the database (and with it the checkpoint table) to an earlier period of time causes Replicat to reposition to an earlier checkpoint that matches that time. If the required trail files for this time period are already aged off the system, they must be restored from backup. To understand how trails are maintained and aged, see "What is a Trail?".

Checkpoints are not required for non-continuous types of configurations, such as a batch load or initial load. If there is a failure, these processes can be started again from the original start point.

See olink:GWUAD-GUID-52E9EEEE-FF27-48BC-8F9F-1A638576C7AA for additional information about checkpoints and the checkpoint table.

3.2 Oracle GoldenGate Key Terms and Concepts

Apart from the two architectures and their components, there are some key terms that you should get familiar with.

Topics:

3.2.1 Overview of Process Types

Depending on the requirement, Oracle GoldenGate can be configured with the following processing types.

  • An online Extract or Replicat process runs until stopped by a user. Online processes maintain recovery checkpoints in the trail so that processing can resume after interruptions. You use online processes to continuously extract and replicate DML and DDL operations (where supported) to keep source and target objects synchronized. The EXTRACT and REPLICAT parameters apply to this process type.

  • A source-is-table (also known as in initial-load Extract) Extract process extracts a current set of static data directly from the source objects in preparation for an initial load to another database. This process type does not use checkpoints. The SOURCEISTABLE parameter applies to this process type.

  • A special-run Replicat process applies data within known begin and end points. You use a special Replicat run for initial data loads, and it also can be used with an online Extract to apply data changes from the trail in batches, such as once a day rather than continuously. This process type does not maintain checkpoints, because the run can be started over with the same begin and end points. The SPECIALRUN parameter applies to this process type.

  • A remote task is a special type of initial-load process in which Extract communicates directly with Replicat over TCP/IP. Neither a Collector process nor temporary disk storage in a trail or file is used. The task is defined in the Extract parameter file with the RMTTASK parameter.

3.2.2 Overview of Groups

To differentiate among multiple Extract or Replicat processes on a system, you define processing groups. For example, to replicate different sets of data in parallel, you would create two Replicat groups.

A processing group consists of a process (either Extract or Replicat), its parameter file, its checkpoint file, and any other files associated with the process. For Replicat, a group may also include an associated checkpoint table. You define groups by using the ADD EXTRACT and ADD REPLICAT commands in the Oracle GoldenGate command interface, GGSCI.

All files and checkpoints relating to a group share the name that is assigned to the group itself. Any time that you issue a command to control or view processing, you supply a group name or multiple group names by means of a wildcard.

3.2.3 Overview of the Commit Sequence Number (CSN)

When working with Oracle GoldenGate, you might need to refer to a Commit Sequence Number, or CSN. A CSN is an identifier that Oracle GoldenGate constructs to identify a transaction for the purpose of maintaining transactional consistency and data integrity. It uniquely identifies a point in time in which a transaction commits to the database.

The CSN can be required to position Extract in the transaction log, to reposition Replicat in the trail, or for other purposes. It is returned by some conversion functions and is included in reports and certain GGSCI output.

A CSN is a monotonically increasing identifier generated by Oracle GoldenGate that uniquely identifies a point in time when a transaction commits to the database. It purpose is to ensure transactional consistency and data integrity as transactions are replicated from source to target. Each kind of database management system generates some kind of unique serial number of its own at the completion of each transaction, which uniquely identifies the commit of that transaction. For example, the Oracle RDBMS generates a System Change Number, which is a monotonically increasing sequence number assigned to every event by Oracle RDBMS. The CSN captures this same identifying information and represents it internally as a series of bytes, but the CSN is processed in a platform-independent manner. A comparison of any two CSN numbers, each of which is bound to a transaction-commit record in the same log stream, reliably indicates the order in which the two transactions completed.

The CSN is cross-checked with the transaction ID (displayed as XID in Oracle GoldenGate informational output). The XID-CSN combination uniquely identifies a transaction even in cases where there are multiple transactions that commit at the same time, and thus have the same CSN. For example, this can happen in an Oracle RAC environment, where there is parallelism and high transaction concurrency.

The CSN value is stored as a token in any trail record that identifies the commit of a transaction. This value can be retrieved with the @GETENV column conversion function and viewed with the Logdump utility.

See olink:GWUAD-GUID-C8A3CB58-2A18-4A1E-9A2F-5AADCD9BF599 for more information about the CSN and a list of CSN values per database.