This chapter describes the capabilities of Oracle GoldenGate, its basic terminology, and its processing logic and architecture.
Oracle GoldenGate enables the exchange and manipulation of data at the transaction level among multiple, heterogeneous platforms across the enterprise. It moves committed transactions with transaction integrity and minimal overhead on your existing infrastructure. Its modular architecture gives you the flexibility to extract and replicate selected data records, transactional changes, and changes to DDL (data definition language) across a variety of topologies.
Support for DDL, certain topologies, and capture or delivery configurations varies by the database type. See Using Oracle GoldenGate for Oracle Database and Using Oracle GoldenGate for Heterogeneous Databases for detailed information about supported features and configurations.
With this flexibility, and the filtering, transformation, and custom processing features of Oracle GoldenGate, you can support numerous business requirements:
Business continuance and high availability.
Initial load and database migration.
Decision support and data warehousing.
Figure 1-1 Oracle GoldenGate Supported Topologies
Table 1-1 Supported Processing Methods
Database | Log-Based Extraction (capture) | Non-Log-Based Extraction (capture) (1) | Replication (delivery)
DB2 for i
Footnote 1 Non-Log-Based Extraction uses a capture module that communicates with the Oracle GoldenGate API to send change data to Oracle GoldenGate.
Oracle GoldenGate can be configured for the following purposes:
A static extraction of data records from one database and the loading of those records to another database.
Continuous extraction and replication of transactional Data Manipulation Language (DML) operations and data definition language (DDL) changes (for supported databases) to keep source and target data consistent.
Extraction from a database and replication to a file outside the database.
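As a hedged illustration of the change-synchronization case, a minimal Extract parameter file might look like the following sketch. The group name, credential alias, trail path, and table names are all hypothetical; consult Reference for Oracle GoldenGate for the exact parameters that your database type requires.

```
-- Hypothetical primary Extract group for continuous change synchronization.
EXTRACT fin
-- Credential-store alias for the database login (assumed to exist).
USERIDALIAS ggadmin
-- Local trail that receives the captured DML (and DDL, where supported).
EXTTRAIL ./dirdat/aa
-- Objects to capture.
TABLE hr.employees;
TABLE hr.departments;
```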
Oracle GoldenGate is composed of the following components:
Trails or extract files
Figure 1-2 illustrates the logical architecture of Oracle GoldenGate for initial data loads and for the synchronization of DML and DDL operations. This is the basic configuration. Variations of this model are recommended depending on business needs.
Figure 1-2 Oracle GoldenGate Logical Architecture
The Extract process is the extraction (capture) mechanism of Oracle GoldenGate. Extract runs on a source system or on a downstream database, or both, depending on the database and the implementation requirements.
You can configure Extract in one of the following ways:
Initial loads: For initial data loads, Extract extracts (captures) a current, static set of data directly from the source objects.
Change synchronization: To keep source data synchronized with another set of data, Extract captures DML and DDL operations after the initial synchronization has taken place.
Extract captures from a data source that can be one of the following:
Source tables, if the run is an initial load.
The database recovery logs or transaction logs (such as the Oracle redo logs). The actual method of capturing from the logs varies depending on the database type. For example, Oracle GoldenGate for Oracle offers an integrated capture mode, in which Extract interacts directly with a database logmining server that mines the Oracle transaction stream. See "About Integrated Capture" in Using Oracle GoldenGate for Oracle Database for more information about integrated capture.
A third-party capture module. This method provides a communication layer that passes data and metadata from an external API to the Extract API. The database vendor or a third-party vendor provides the components that extract the data operations and pass them to Extract.
When configured for change synchronization, Extract captures the DML and DDL operations that are performed on objects in the Extract configuration. Extract stores these operations until it receives commit records or rollbacks for the transactions that contain them. When a rollback is received, Extract discards the operations for that transaction. When a commit is received, Extract persists the transaction to disk in a series of files called a trail, where it is queued for propagation to the target system. All of the operations in each transaction are written to the trail as a sequentially organized transaction unit. This design ensures both speed and data integrity.
Extract ignores operations on objects that are not in the Extract configuration, even though the same transaction may also include operations on objects that are in the Extract configuration.
Multiple Extract processes can operate on different objects at the same time. For example, two Extract processes can extract and transmit in parallel to two Replicat processes (with two persistence trails) to minimize target latency when the databases are large. To differentiate among different Extract processes, you assign each one a group name (see Overview of Groups).
A data pump is a secondary Extract group within the source Oracle GoldenGate configuration. If a data pump is not used, Extract must send the captured data operations to a remote trail on the target. In a typical configuration with a data pump, however, the primary Extract group writes to a trail on the source system. The data pump reads this trail and sends the data operations over the network to a remote trail on the target. The data pump adds storage flexibility and also serves to isolate the primary Extract process from TCP/IP activity.
In general, a data pump can perform data filtering, mapping, and conversion, or it can be configured in pass-through mode, in which data is passively transferred as-is, without manipulation.

Though configuring a data pump is optional, Oracle recommends it for most configurations. Some reasons for using a data pump include the following:
Protection against network and target failures: In a basic Oracle GoldenGate configuration, with only a trail on the target system, there is nowhere on the source system to store the data operations that Extract continuously extracts into memory. If the network or the target system becomes unavailable, Extract could run out of memory and abend. However, with a trail and data pump on the source system, captured data can be moved to disk, preventing the abend of the primary Extract. When connectivity is restored, the data pump captures the data from the source trail and sends it to the target system(s).
Implementing several phases of data filtering or transformation: When using complex filtering or data transformation configurations, you can configure a data pump to perform the first transformation either on the source system, on the target system, or even on an intermediary system, and then use another data pump or the Replicat group to perform the second transformation.
Consolidating data from many sources to a central target. When synchronizing multiple source databases with a central target database, you can store extracted data operations on each source system and use data pumps on each of those systems to send the data to a trail on the target system. Dividing the storage load between the source and target systems reduces the need for massive amounts of space on the target system to accommodate data arriving from multiple sources.
Synchronizing one source with multiple targets. When sending data to multiple target systems, you can configure data pumps on the source system for each target. If network connectivity to any of the targets fails, data can still be sent to the other targets.
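The primary-Extract-plus-data-pump arrangement described above can be sketched as two parameter files on the source system. All group names, hosts, and paths here are hypothetical, and PASSTHRU is shown only for the case where the pump performs no filtering or conversion.

```
-- Primary Extract (file fin.prm): capture to a local trail.
EXTRACT fin
USERIDALIAS ggadmin
EXTTRAIL ./dirdat/aa
TABLE hr.*;

-- Data pump (file pump1.prm): read the local trail and send it
-- over TCP/IP to a remote trail on the target system.
EXTRACT pump1
-- No data manipulation in this pump, so object-definition
-- lookups are bypassed for better throughput.
PASSTHRU
RMTHOST tgthost, MGRPORT 7809
RMTTRAIL ./dirdat/bb
TABLE hr.*;
```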
The Replicat process runs on the target system, reads the trail on that system, and then reconstructs the DML or DDL operations and applies them to the target database. Replicat uses dynamic SQL to compile a SQL statement once, and then execute it many times with different bind variables.
You can configure Replicat in one of the following ways:
Initial loads: For initial data loads, Replicat can apply a static data copy to target objects or route it to a high-speed bulk-load utility.
Change synchronization: When configured for change synchronization, Replicat applies the replicated source operations to the target objects using a native database interface or ODBC, depending on the database type.
You can use multiple Replicat processes with one or more Extract processes and data pumps in parallel to increase throughput. To preserve data integrity, each set of processes handles a different set of objects. To differentiate among Replicat processes, you assign each one a group name (see Overview of Groups).
Rather than use multiple Replicat processes, you can configure one Replicat in coordinated or integrated mode.
Coordinated mode is supported on all databases that Oracle GoldenGate supports. In coordinated mode, Replicat is threaded. One coordinator thread spawns and coordinates one or more threads that execute replicated SQL operations in parallel. A coordinated Replicat uses one parameter file and is monitored and managed as one unit. See Creating an Online Replicat Group for more information.
Integrated mode is supported for Oracle versions 11.2.0.4 or later. In integrated mode, Replicat leverages the apply processing functionality that is available within the Oracle database. Within a single Replicat configuration, multiple inbound server child processes known as apply servers apply transactions in parallel while preserving the original transaction atomicity. See About Integrated Replicat for more information about integrated mode.
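For example, a coordinated Replicat might be created and parameterized along these lines. The group name, trail, and thread assignments are hypothetical; see Creating an Online Replicat Group for the supported options.

```
-- GGSCI: create a coordinated (threaded) Replicat reading a trail.
ADD REPLICAT rep1, COORDINATED, EXTTRAIL ./dirdat/bb

-- Parameter file rep1.prm: partition the workload across threads.
REPLICAT rep1
USERIDALIAS ggadmin
-- Operations on hr.employees are applied by thread 1...
MAP hr.employees, TARGET hr.employees, THREAD(1);
-- ...while hr.orders operations are spread across threads 2-4.
MAP hr.orders, TARGET hr.orders, THREADRANGE(2-4);
```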
You can delay Replicat so that it waits a specific amount of time before applying the replicated operations to the target database. A delay may be desirable, for example, to prevent the propagation of errant SQL, to control data arrival across different time zones, or to allow time for other planned events to occur. The length of the delay is controlled by the DEFERAPPLYINTERVAL parameter.
Various parameters control the way that Replicat converts source transactions to target transactions. These parameters include GROUPTRANSOPS and MAXTRANSOPS. For more information about these and other Replicat parameters, see Oracle GoldenGate Parameters in Reference for Oracle GoldenGate.
To support the continuous extraction and replication of database changes, Oracle GoldenGate stores records of the captured changes temporarily on disk in a series of files called a trail. A trail can exist on the source system, an intermediary system, the target system, or any combination of those systems, depending on how you configure Oracle GoldenGate. On the local system it is known as an extract trail (or local trail). On a remote system it is known as a remote trail.
By using a trail for storage, Oracle GoldenGate supports data accuracy and fault tolerance (see Overview of Checkpoints). The use of a trail also allows extraction and replication activities to occur independently of each other. With these processes separated, you have more choices for how data is processed and delivered. For example, instead of extracting and replicating changes continuously, you could extract changes continuously but store them in the trail for replication to the target later, whenever the target application needs them.
The primary Extract and the data-pump Extract write to a trail. Every online Extract process must be linked to a trail. Only one primary Extract process can write to a given local trail. All local trails must have different names.
Multiple data pump Extract processes can each write to a trail of the same name, but the physical trails themselves must reside on different remote systems, such as in a data-distribution topology. For example, a data pump named 1pump and a data pump named 2pump can both reside on sys01 and write to a remote trail named aa: data pump 1pump can write to trail aa on sys02, while data pump 2pump writes to trail aa on sys03.
Processes that read the trail are:
Data-pump Extract: Extracts DML and DDL operations from a local trail that is linked to a previous Extract (typically the primary Extract), performs further processing if needed, and transfers the data to a trail that is read by the next Oracle GoldenGate process downstream (typically Replicat, but could be another data pump if required).
Replicat: Reads the trail and applies replicated DML and DDL operations to the target database.
The trail files themselves are created as needed during processing, but you specify a two-character name for the trail when you add it to the Oracle GoldenGate configuration with the ADD RMTTRAIL or ADD EXTTRAIL command. By default, trails are stored in the dirdat sub-directory of the Oracle GoldenGate directory. You can specify a six-digit or nine-digit sequence number with the TRAIL_SEQLEN_9D | TRAIL_SEQLEN_6D GLOBALS parameter; TRAIL_SEQLEN_9D is set by default.
Trail files are aged automatically to allow processing to continue without interruption for file maintenance. As each new file is created, it inherits the two-character trail name appended with a unique nine-digit sequence number from 000000000 through 999999999 (for example, c:\ggs\dirdat\tr000000001). When the sequence number reaches 999999999, the numbering starts over at 000000000, and previous trail files are overwritten. Trail files can be purged on a routine basis by using the Manager parameter PURGEOLDEXTRACTS. The nine-digit trail file format is the default.
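The sequence-length and purging behavior described above is controlled through the GLOBALS file and the Manager parameter file. A hedged sketch follows; the trail path and retention values are illustrative only.

```
-- GLOBALS file: use six-digit trail sequence numbers
-- (nine digits, TRAIL_SEQLEN_9D, is the default).
TRAIL_SEQLEN_6D

-- Manager parameter file: purge trail files that all readers
-- have fully processed, keeping at least three days of files.
PURGEOLDEXTRACTS ./dirdat/*, USECHECKPOINTS, MINKEEPDAYS 3
```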
You can create more than one trail to separate the data from different objects or applications. You link the objects that are specified in a TABLE or SEQUENCE parameter to a trail that is specified with an EXTTRAIL or RMTTRAIL parameter in the Extract parameter file.
To maximize throughput, and to minimize I/O load on the system, extracted data is sent into and out of a trail in large blocks. Transactional order is preserved.
See About the Oracle GoldenGate Trail for more information about the trail and the records it contains.
You can convert the trail sequence length between nine digits and six digits in the checkpoint record of named Extract groups by using the convchk native command. To upgrade to a nine-digit trail, stop your Extract gracefully, then run convchk as follows:

convchk extract trail seqlen_9d

Then start your Extract. You can downgrade from a nine-digit to a six-digit trail with the same process, using this convchk command:

convchk extract trail seqlen_6d
In some configurations, Oracle GoldenGate stores extracted data in an extract file instead of a trail. The extract file can be a single file, or it can be configured to roll over into multiple files in anticipation of limitations on file size that are imposed by the operating system. In this sense, it is similar to a trail, except that checkpoints are not recorded. The file or files are created automatically during the run. The same versioning features that apply to trails also apply to extract files.
Checkpoints store the current read and write positions of a process to disk for recovery purposes. Checkpoints ensure that data changes that are marked for synchronization actually are captured by Extract and applied to the target by Replicat, and they prevent redundant processing. They provide fault tolerance by preventing the loss of data should the system, the network, or an Oracle GoldenGate process need to be restarted. For complex synchronization configurations, checkpoints enable multiple Extract or Replicat processes to read from the same set of trails.
Checkpoints work with inter-process acknowledgments to prevent messages from being lost in the network. Oracle GoldenGate has a proprietary guaranteed-message delivery technology.
Extract creates checkpoints for its positions in the data source and in the trail. Because Extract only captures committed transactions, it keeps track of the operations in all open transactions, in the event that any of them are committed. This requires Extract to record a checkpoint where it is currently reading in a transaction log, plus the position of the start of the oldest open transaction, which can be in the current or any preceding log.
To control the amount of transaction log that must be re-processed after an outage in an Oracle database, Extract also persists the current state and data of processing to disk at specific intervals, including the state and data (if any) of long-running transactions. If Extract stops after one of these intervals, it can recover from a position within the previous interval or at the last checkpoint, instead of having to return to the log position where the oldest open long-running transaction first appeared. See the BR parameter in Reference for Oracle GoldenGate for more information.
Replicat creates checkpoints for its position in the trail. Replicat stores these checkpoints in a table, known as the checkpoint table, in the target database and also in a checkpoint file on disk. The checkpoint table is stored with a user-specified name and location. The checkpoint file is stored in the dirchk sub-directory of the Oracle GoldenGate directory.
At the completion of each transaction, Replicat writes information about that transaction to a row in the checkpoint table, linking the transaction with a unique position in a specific trail file. Replicat also writes a checkpoint to the checkpoint file when it completes a transaction. At regular intervals, Replicat also writes its current read position to the checkpoint file. These positions are typically not at a transaction boundary, but at some point within a transaction. The interval length is controlled by the CHECKPOINTSECS parameter.
Because the checkpoint table is part of the database, and benefits from the database recovery system, it provides a more efficient recovery point for Replicat. The last checkpoint in the checkpoint file may not be the most recent transaction boundary. It could be the middle of a transaction not yet applied by Replicat or an earlier transaction that was already applied. The checkpoint table ensures that Replicat starts at the correct transaction boundary, so that each transaction is applied only once. The information in the checkpoint table can be used for recovery in some cases, but is primarily used for informational purposes, such as for the INFO commands in GGSCI.
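In practice, the checkpoint table is created and inspected with GGSCI commands along these lines; the schema, table, and group names below are hypothetical.

```
-- GLOBALS file: register a default checkpoint table (one-time setup).
CHECKPOINTTABLE ggadmin.ggs_checkpoint

-- GGSCI: create the table, then associate it when adding Replicat.
DBLOGIN USERIDALIAS ggadmin
ADD CHECKPOINTTABLE ggadmin.ggs_checkpoint
ADD REPLICAT rep1, EXTTRAIL ./dirdat/bb, CHECKPOINTTABLE ggadmin.ggs_checkpoint

-- View a process's checkpoint details, including its trail position.
INFO REPLICAT rep1, SHOWCH
```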
Regular backups of the Oracle GoldenGate environment, including the trails, should match your database backup, recovery, and retention policies. Restoring the database (and with it the checkpoint table) to an earlier period of time causes Replicat to reposition to an earlier checkpoint that matches that time. If the required trail files for this time period are already aged off the system, they must be restored from backup. To understand how trails are maintained and aged, see "Overview of Trails".
Checkpoints are not required for non-continuous types of configurations, such as a batch load or initial load. If there is a failure, these processes can be started again from the original start point.
See About Checkpoints for additional information about checkpoints and the checkpoint table.
Manager is the control process of Oracle GoldenGate. Manager must be running on each system in the Oracle GoldenGate configuration before Extract or Replicat can be started, and Manager must remain running while those processes are running so that resource management functions are performed. Manager performs the following functions:
Start Oracle GoldenGate processes
Start dynamic processes
Maintain port numbers for processes
Perform trail management
Create event, error, and threshold reports
One Manager process can control many Extract or Replicat processes. On Windows systems, Manager can run as a service. See Configuring Manager and Network Communications for more information about the Manager process and configuring TCP/IP connections.
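A minimal Manager parameter file covering the functions listed above might look like the following sketch; the port numbers and retention settings are illustrative only.

```
-- TCP/IP port on which Manager listens for requests.
PORT 7809
-- Ports that Manager may assign to dynamic processes such as Collector.
DYNAMICPORTLIST 7810-7820
-- Trail management: purge trail files that all readers have processed.
PURGEOLDEXTRACTS ./dirdat/*, USECHECKPOINTS, MINKEEPDAYS 3
-- Restart Extract and Replicat processes that terminate abnormally.
AUTORESTART ER *, RETRIES 3, WAITMINUTES 5
```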
Collector is a process that runs in the background on the target system when continuous, online change synchronization is active. Collector does the following:
Upon a connection request from a remote Extract to Manager, scan for and bind to an available port, and then send the port number to Manager for assignment to the requesting Extract process.
Receive extracted database changes that are sent by Extract and write them to a trail file. Manager starts Collector automatically when a network connection is required, so Oracle GoldenGate users do not interact with it. Collector can receive information from only one Extract process, so there is one Collector for each Extract that you use. Collector terminates when the associated Extract process terminates.
Collector can be run manually, if needed. This is known as a static Collector (as opposed to the regular, dynamic Collector). Several Extract processes can share one static Collector; however, a one-to-one ratio is optimal. A static Collector can be used to ensure that the process runs on a specific port. See Reference for Oracle GoldenGate for more information about the static Collector. See Configuring Manager and Network Communications for more information about how Manager assigns ports.
By default, Extract initiates TCP/IP connections from the source system to Collector on the target, but Oracle GoldenGate can be configured so that Collector initiates connections from the target. Initiating connections from the target might be required if, for example, the target is in a trusted network zone, but the source is in a less trusted zone.
Depending on the requirement, Oracle GoldenGate can be configured with the following processing types.
An online Extract or Replicat process runs until stopped by a user. Online processes maintain recovery checkpoints in the trail so that processing can resume after interruptions. You use online processes to continuously extract and replicate DML and DDL operations (where supported) to keep source and target objects synchronized. The EXTRACT and REPLICAT parameters apply to this process type.
A source-is-table Extract process (also known as an initial-load Extract) extracts a current set of static data directly from the source objects in preparation for an initial load to another database. This process type does not use checkpoints. The SOURCEISTABLE parameter applies to this process type.
A special-run Replicat process applies data within known begin and end points. You use a special Replicat run for initial data loads, and it also can be used with an online Extract to apply data changes from the trail in batches, such as once a day rather than continuously. This process type does not maintain checkpoints, because the run can be started over with the same begin and end points. The SPECIALRUN parameter applies to this process type.
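As a hedged sketch, an initial load using these two process types might pair parameter files like the following; the object names, target host, and file path are hypothetical.

```
-- Initial-load Extract parameter file: read directly from source tables
-- and write to an extract file on the target system.
SOURCEISTABLE
USERIDALIAS ggadmin
RMTHOST tgthost, MGRPORT 7809
RMTFILE ./dirdat/initld, MEGABYTES 2000, PURGE
TABLE hr.*;

-- Special-run Replicat parameter file: apply the extract file once.
SPECIALRUN
END RUNTIME
USERIDALIAS ggadmin
EXTFILE ./dirdat/initld
MAP hr.*, TARGET hr.*;
```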
A remote task is a special type of initial-load process in which Extract communicates directly with Replicat over TCP/IP. Neither a Collector process nor temporary disk storage in a trail or file is used. The task is defined in the Extract parameter file with the RMTTASK parameter.
To differentiate among multiple Extract or Replicat processes on a system, you define processing groups. For example, to replicate different sets of data in parallel, you would create two Replicat groups.
A processing group consists of a process (either Extract or Replicat), its parameter file, its checkpoint file, and any other files associated with the process. For Replicat, a group may also include an associated checkpoint table. You define groups by using the ADD EXTRACT and ADD REPLICAT commands in the Oracle GoldenGate command interface, GGSCI.
All files and checkpoints relating to a group share the name that is assigned to the group itself. Any time that you issue a command to control or view processing, you supply a group name or multiple group names by means of a wildcard.
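For example, two parallel Replicat groups could be defined and then controlled with a wildcard as follows; the group names and trail paths are hypothetical.

```
-- GGSCI: define two Replicat groups that read different trails.
ADD REPLICAT repa, EXTTRAIL ./dirdat/aa
ADD REPLICAT repb, EXTTRAIL ./dirdat/bb

-- Control and view both groups at once by means of a wildcard.
START REPLICAT rep*
INFO REPLICAT rep*
```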
When working with Oracle GoldenGate, you might need to refer to a Commit Sequence Number, or CSN. A CSN is an identifier that Oracle GoldenGate constructs to identify a transaction for the purpose of maintaining transactional consistency and data integrity. It uniquely identifies a point in time in which a transaction commits to the database.
The CSN can be required to position Extract in the transaction log, to reposition Replicat in the trail, or for other purposes. It is returned by some conversion functions and is included in reports and certain GGSCI output.
See About the Commit Sequence Number for more information about the CSN and a list of CSN values per database.
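For example, a CSN reported in GGSCI output can be used to reposition a process. The group name and CSN value below are hypothetical; the CSN format itself varies by database type.

```
-- GGSCI: start Replicat so that it skips any transaction with a
-- commit position at or before the specified CSN.
START REPLICAT rep1, AFTERCSN 6488359
```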