19 Tuning the Performance of Oracle GoldenGate

This chapter contains suggestions for improving the performance of Oracle GoldenGate components.

19.1 Using Multiple Process Groups

Typically, only one Extract group is required to efficiently capture from a database. However, depending on the database size and type, or the data and operation types, you may find that you need to add one or more Extract groups to the configuration.

Similarly, only one Replicat group is typically needed to apply data to a target database if using Replicat in coordinated mode. (See Section 14.7.2, "About Coordinated Replicat Mode" for more information.) However, even in some cases when using Replicat in coordinated mode, you may be required to use multiple Replicat groups. If you are using Replicat in classic mode and your applications generate a high transaction volume, you probably will need to use parallel Replicat groups.

Because each Oracle GoldenGate component — Extract, data pump, trail, Replicat — is an independent module, you can combine them in ways that suit your needs. You can use multiple trails and parallel Extract and Replicat processes (with or without data pumps) to handle large transaction volume, improve performance, eliminate bottlenecks, reduce latency, or isolate the processing of specific data.

Figure 19-1 shows some of the ways that you can configure Oracle GoldenGate to improve throughput speed and overcome network bandwidth issues.

Figure 19-1 Load-balancing configurations that improve performance


19.1.1 Considerations for using multiple process groups

Before configuring multiple processing groups, review the following considerations to ensure that your configuration produces the desired results and maintains data integrity.

19.1.1.1 Maintaining data integrity

Not all workloads can be partitioned across multiple groups and still preserve the original transaction atomicity. You must determine whether the objects in one group will ever have dependencies on objects in any other group, transactional or otherwise. For example, tables for which the workload routinely updates the primary key cannot easily be partitioned in this manner. DDL replication (if supported for the database) is not viable in this mode, nor is the use of some SQLEXEC or EVENTACTIONS features that base their actions on a specific record.

If your tables do not have any foreign-key dependencies or updates to primary keys, you may be able to use multiple processes. Keep related DML together in the same process stream to ensure data integrity.

19.1.1.2 Number of groups

The number of concurrent Extract and Replicat process groups that can run on a system depends on how much system memory is available. Each Extract and Replicat process needs approximately 25-55 MB of memory, or more depending on the size of the transactions and the number of concurrent transactions.
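As a rough planning aid, the per-process range quoted above can be turned into a baseline memory budget. The following is a minimal sketch, not an Oracle-supplied tool: the function name `estimate_memory_mb` is hypothetical, and the result covers only the 25-55 MB baseline, not the additional memory that large or highly concurrent transactions require.

```python
from typing import Tuple

# Baseline per-process range taken from the text above (megabytes).
MIN_MB_PER_PROCESS = 25
MAX_MB_PER_PROCESS = 55


def estimate_memory_mb(process_count: int) -> Tuple[int, int]:
    """Return the (low, high) baseline memory estimate in megabytes.

    Hypothetical helper: it ignores the extra memory needed for large
    or numerous concurrent transactions.
    """
    if process_count < 0:
        raise ValueError("process_count must be non-negative")
    return (process_count * MIN_MB_PER_PROCESS,
            process_count * MAX_MB_PER_PROCESS)


if __name__ == "__main__":
    low, high = estimate_memory_mb(20)
    print(f"20 processes: roughly {low}-{high} MB before transaction data")
```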

The Oracle GoldenGate GGSCI command interface fully supports up to 5,000 concurrent Extract and Replicat groups per instance of Oracle GoldenGate Manager. At the supported level, all groups can be controlled and viewed in full with GGSCI commands such as the INFO and STATUS commands. Beyond the supported level, group information is not displayed and errors can occur. Oracle GoldenGate recommends keeping the number of Extract and Replicat groups (combined) at the default level of 300 or below in order to manage your environment effectively. The number of groups is controlled by the MAXGROUPS parameter.

Note:

When creating the groups, keep tables that have relational constraints to each other in the same group.

19.1.1.3 Memory

The system must have sufficient swap space for each Oracle GoldenGate Extract and Replicat process that will be running. To determine the required swap space:

  1. Start up one Extract or Replicat.

  2. Run GGSCI.

  3. View the report file and find the line PROCESS VM AVAIL FROM OS (min).

  4. Round up the value to the next full gigabyte if needed. For example, round up 1.76 GB to 2 GB.

  5. Multiply that value by the number of Extract and Replicat processes that will be running. The result is the maximum amount of swap space that could be required.
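The arithmetic in steps 4 and 5 can be sketched as follows. The function name `required_swap_gb` is hypothetical, and the input is assumed to be the PROCESS VM AVAIL FROM OS (min) value from the report file, already converted to gigabytes.

```python
import math


def required_swap_gb(vm_avail_gb: float, process_count: int) -> int:
    """Round the reported value up to a whole gigabyte, then multiply
    by the number of Extract and Replicat processes."""
    if vm_avail_gb < 0 or process_count < 0:
        raise ValueError("inputs must be non-negative")
    return math.ceil(vm_avail_gb) * process_count


if __name__ == "__main__":
    # 1.76 GB rounds up to 2 GB; ten processes need at most 20 GB.
    print(required_swap_gb(1.76, 10))  # 20
```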

See the CACHEMGR parameter in Reference for Oracle GoldenGate for Windows and UNIX for more information about how memory is managed.

19.1.1.4 Isolating processing-intensive tables

You can use multiple process groups to support certain kinds of tables that tend to interfere with normal processing and cause latency to build on the target. For example:

  • Extract may need to perform a fetch from the database because of the data type of the column, because of parameter specifications, or to perform SQL procedures. When data must be fetched from the database, it affects the performance of Extract. You can get fetch statistics from the STATS EXTRACT command if you include the STATOPTIONS REPORTFETCH parameter in the Extract parameter file. You can then isolate those tables into their own Extract groups, assuming that transactional integrity can be maintained.

  • In its classic mode, the Replicat process can be a source of performance bottlenecks because it is a single-threaded process that applies operations one at a time by using regular SQL. Even with BATCHSQL enabled (see Reference for Oracle GoldenGate for Windows and UNIX), Replicat may take longer to process tables that have large or long-running transactions, heavy volume, a very large number of columns that change, or LOB data. You can then isolate those tables into their own Replicat groups, assuming that transactional integrity can be maintained.

19.1.2 Using parallel Replicat groups on a target system

This section contains instructions for creating a configuration that pairs one Extract group with multiple Replicat groups. Although it is possible for multiple Replicat processes to read a single trail (no more than three of them, to avoid disk contention), it is recommended that you pair each Replicat with its own trail and corresponding Extract process.

19.1.2.1 To create the Extract group

Note:

This configuration includes Extract data pumps.

  1. On the source, use the ADD EXTRACT command to create a primary Extract group.

  2. On the source, use the ADD EXTTRAIL command to specify as many local trails as the number of Replicat groups that you will be creating. All trails must be associated with the primary Extract group.

  3. On the source, create a data-pump Extract group.

  4. On the source, use the ADD RMTTRAIL command to specify as many remote trails as the number of Replicat groups that you will be creating. All trails must be associated with the data-pump Extract group.

  5. On the source, use the EDIT PARAMS command to create Extract parameter files, one for the primary Extract and one for the data pump, that contain the parameters required for your database environment. When configuring Extract, do the following:

    • Divide the source tables among different TABLE parameters.

    • Link each TABLE statement to a different trail. This is done by placing the TABLE statements after the EXTTRAIL or RMTTRAIL parameter that specifies the trail you want those statements to be associated with.

19.1.2.2 To create the Replicat groups

  1. On the target, create a Replicat checkpoint table. For instructions, see Section 14.3, "Creating a Checkpoint Table." All Replicat groups can use the same checkpoint table.

  2. On the target, use the ADD REPLICAT command to create a Replicat group for each trail that you created. Use the EXTTRAIL argument of ADD REPLICAT to link the Replicat group to the appropriate trail.

  3. On the target, use the EDIT PARAMS command to create a Replicat parameter file for each Replicat group that contains the parameters required for your database environment. All MAP statements for a given Replicat group must specify the same objects that are contained in the trail that is linked to that group.

  4. In the Manager parameter file on the target system, use the PURGEOLDEXTRACTS parameter to control the purging of files from the trails.

19.1.3 Using multiple Extract groups with multiple Replicat groups

Multiple Extract groups write to their own trails. Each trail is read by a dedicated Replicat group.

19.1.3.1 To create the Extract groups

Note:

This configuration includes data pumps.

  1. On the source, use the ADD EXTRACT command to create the primary Extract groups.

  2. On the source, use the ADD EXTTRAIL command to specify a local trail for each of the Extract groups that you created.

  3. On the source, create a data-pump Extract group to read each local trail that you created.

  4. On the source, use the ADD RMTTRAIL command to specify a remote trail for each of the data pumps that you created.

  5. On the source, use the EDIT PARAMS command to create an Extract parameter file for each primary Extract group and each data-pump Extract group.

19.1.3.2 To create the Replicat groups

  1. On the target, create a Replicat checkpoint table. For instructions, see Section 14.3, "Creating a Checkpoint Table." All Replicat groups can use the same checkpoint table.

  2. On the target, use the ADD REPLICAT command to create a Replicat group for each trail. Use the EXTTRAIL argument of ADD REPLICAT to link the group to the trail.

  3. On the target, use the EDIT PARAMS command to create a Replicat parameter file for each Replicat group. All MAP statements for a given Replicat group must specify the same objects that are contained in the trail that is linked to the group.

  4. In the Manager parameter files on the source system and the target system, use the PURGEOLDEXTRACTS parameter to control the purging of files from the trails.

19.2 Splitting large tables into row ranges across process groups

You can use the @RANGE function to divide the rows of any table across two or more Oracle GoldenGate processes. It can be used to increase the throughput of large and heavily accessed tables and also can be used to divide data into sets for distribution to different destinations. Specify each range in a FILTER clause in a TABLE or MAP statement.

@RANGE is safe and scalable. It preserves data integrity by guaranteeing that the same row will always be processed by the same process group.
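The property described above (the same row always lands in the same process group) is what any deterministic hash partition provides. The sketch below illustrates the idea only; it is not the algorithm that @RANGE uses internally, which is not described here. CRC32 is a stand-in hash, and the names `assign_range` and `belongs_to` are hypothetical.

```python
import zlib


def assign_range(key: str, total_ranges: int) -> int:
    """Map a row key to a range number from 1 to total_ranges.

    CRC32 is an assumed stand-in for whatever hash the product uses.
    Because the hash is deterministic, a given key always maps to the
    same range.
    """
    if total_ranges < 1:
        raise ValueError("total_ranges must be at least 1")
    return zlib.crc32(key.encode("utf-8")) % total_ranges + 1


def belongs_to(key: str, my_range: int, total_ranges: int) -> bool:
    """Filter test that one process group would apply to each row."""
    return assign_range(key, total_ranges) == my_range


if __name__ == "__main__":
    for account_id in ("1001", "1002", "1003", "1004"):
        print(account_id, "-> range", assign_range(account_id, 2))
```

Each process group evaluates the same test with its own range number, so every row passes exactly one group's filter.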

It might be more efficient to use the primary Extract or a data pump to calculate the ranges than to use Replicat. To calculate ranges, Replicat must filter through the entire trail to find data that meets the range specification. However, your business case should determine where this filtering is performed.

Figure 19-2 Dividing rows of a table between two Extract groups

Figure 19-3 Dividing rows of a table between two Replicat groups

19.3 Configuring Oracle GoldenGate to use the network efficiently

Inefficiencies in the transfer of data across the network can cause lag in the Extract process and latency on the target. If not corrected, they can eventually cause process failures.

When you first start a new Oracle GoldenGate configuration:

  1. Establish benchmarks for what you consider to be acceptable lag and throughput volume for Extract and for Replicat. Keep in mind that Extract will normally be faster than Replicat because of the kind of tasks that each one performs. Over time you will know whether the difference is normal or one that requires tuning or troubleshooting.

  2. Set a regular schedule to monitor those processes for lag and volume, as compared to the benchmarks. Look for lag that remains constant or is growing, as opposed to occasional spikes. Continuous, excess lag indicates a bottleneck somewhere in the Oracle GoldenGate configuration. It is a critical first indicator that Oracle GoldenGate needs tuning or that there is an error condition.

To view volume statistics, use the STATS EXTRACT or STATS REPLICAT command. To view lag statistics, use the LAG EXTRACT or LAG REPLICAT command. See Reference for Oracle GoldenGate for Windows and UNIX for more information.

19.3.1 Detecting a network bottleneck that is affecting Oracle GoldenGate

To detect a network bottleneck that is affecting the throughput of Oracle GoldenGate, follow these steps.

  1. Issue the following command to view the ten most recent Extract checkpoints. If you are using a data-pump Extract on the source system, issue the command for the primary Extract and also for the data pump.

    INFO EXTRACT group, SHOWCH 10
    
  2. Look for the Write Checkpoint statistic. This is the place where Extract is writing to the trail.

    Write Checkpoint #1
    
    GGS Log Trail
    Current Checkpoint (current write position):
       Sequence #: 2
       RBA: 2142224
       Timestamp: 2011-01-09 14:16:50.567638
       Extract Trail: ./dirdat/eh
    
  3. For both the primary Extract and data pump:

    • Determine whether there are more than one or two checkpoints. There can be up to ten.

    • Find the Write Checkpoint n heading that has the highest increment number (for example, Write Checkpoint #8) and make a note of the Sequence, RBA, and Timestamp values. This is the most recent checkpoint.

  4. Refer to the information that you noted, and make the following validation:

    • Is the primary Extract generating a series of checkpoints, or just the initial checkpoint?

    • If a data pump is in use, is it generating a series of checkpoints, or just one?

  5. Issue INFO EXTRACT for the primary and data pump Extract processes again.

    • Has the most recent write checkpoint increased? Look at the most recent Sequence, RBA, and Timestamp values to see if their values were incremented forward since the previous INFO EXTRACT command.

  6. Issue the following command to view the status of the Replicat process.

    SEND REPLICAT group, STATUS
    
    • The status indicates whether Replicat is delaying (waiting for data to process), processing data, or at the end of the trail (EOF).

There is a network bottleneck if the status of Replicat is either in delay mode or at the end of the trail file and either of the following is true:

  • You are only using a primary Extract and its write checkpoint is not increasing or is increasing too slowly. Because this Extract process is responsible for sending data across the network, it will eventually run out of memory to contain the backlog of extracted data and abend.

  • You are using a data pump, and its write checkpoint is not increasing, but the write checkpoint of the primary Extract is increasing. In this case, the primary Extract can write to its local trail, but the data pump cannot write to the remote trail. The data pump will abend when it runs out of memory to contain the backlog of extracted data. The primary Extract will run until it reaches the last file in the trail sequence and will abend because it cannot make a checkpoint.

Note:

Even when there is a network outage, Replicat will process in a normal manner until it applies all of the remaining data from the trail to the target. Eventually, it will report that it reached the end of the trail file.
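The comparison in steps 3 through 6 and the decision rule that follows them can be sketched in code. This is a minimal illustration under stated assumptions: the function names are hypothetical, the parser relies on the layout of the sample output shown in step 2 (real SHOWCH output contains more sections), and the "increasing too slowly" case is not modeled because it depends on your own benchmarks.

```python
import re
from typing import Optional, Tuple

Checkpoint = Tuple[int, int]  # (trail sequence number, RBA)


def latest_write_checkpoint(showch_text: str) -> Optional[Checkpoint]:
    """Return (sequence, RBA) of the highest-numbered Write Checkpoint."""
    best = None
    # Splitting on the heading yields [preamble, n1, body1, n2, body2, ...].
    parts = re.split(r"Write Checkpoint #(\d+)", showch_text)
    for number, body in zip(parts[1::2], parts[2::2]):
        seq = re.search(r"Sequence #:\s*(\d+)", body)
        rba = re.search(r"RBA:\s*(\d+)", body)
        if seq and rba:
            candidate = (int(number), int(seq.group(1)), int(rba.group(1)))
            if best is None or candidate[0] > best[0]:
                best = candidate
    return None if best is None else (best[1], best[2])


def network_bottleneck(replicat_idle: bool,
                       primary_before: Checkpoint,
                       primary_after: Checkpoint,
                       pump_before: Optional[Checkpoint] = None,
                       pump_after: Optional[Checkpoint] = None) -> bool:
    """Apply the rule above to two successive checkpoint samples.

    replicat_idle is True when Replicat is delaying or at end of trail.
    Tuples compare by sequence first, then RBA.
    """
    if not replicat_idle:
        return False
    primary_advanced = primary_after > primary_before
    if pump_before is None or pump_after is None:
        # No data pump: the primary Extract sends data itself.
        return not primary_advanced
    pump_advanced = pump_after > pump_before
    return primary_advanced and not pump_advanced
```

For example, if the primary Extract moved from (2, 2142224) to (2, 2990100) while the data pump stayed at the same position and Replicat reports end of trail, `network_bottleneck` returns True.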

19.3.2 Working around bandwidth limitations by using data pumps

Using parallel data pumps may enable you to work around bandwidth limitations that are imposed on a per-process basis in the network configuration. You can use parallel data pumps to send data to the same target system or to different target systems. Data pumps also remove TCP/IP responsibilities from the primary Extract, and their local trails provide fault tolerance.

19.3.3 Reducing the bandwidth requirements of Oracle GoldenGate

Use the compression options of the RMTHOST parameter to compress data before it is sent across the network. Weigh the benefits of compression against the CPU resources that are required to perform the compression. See Reference for Oracle GoldenGate for Windows and UNIX for more information.

19.3.4 Increasing the TCP/IP packet size

Use the TCPBUFSIZE option of the RMTHOST parameter to control the size of the TCP socket buffer that Extract maintains. By increasing the size of the buffer, you can send larger packets to the target system. See Reference for Oracle GoldenGate for Windows and UNIX for more information.

Use the following steps as a guideline to determine the optimum buffer size for your network.

  1. Use the ping command from the command shell to obtain the average round trip time (RTT), shown in the following example:

    C:\home\ggs>ping ggsoftware.com
    Pinging ggsoftware.com [192.168.116.171] with 32 bytes of data:
    Reply from 192.168.116.171: bytes=32 time=31ms TTL=56
    Reply from 192.168.116.171: bytes=32 time=61ms TTL=56
    Reply from 192.168.116.171: bytes=32 time=32ms TTL=56
    Reply from 192.168.116.171: bytes=32 time=34ms TTL=56
    Ping statistics for 192.168.116.171:
        Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
    Approximate round trip times in milli-seconds:
        Minimum = 31ms, Maximum = 61ms, Average = 39ms
    
  2. Multiply that value by the network bandwidth. For example, if the average RTT is 0.08 seconds, and the bandwidth is 100 megabits per second, then the optimum buffer size is:

    0.08 second * 100 megabits per second = 8 megabits
    
  3. Divide the result by 8 to determine the number of bytes (8 bits to a byte). For example:

    8 megabits / 8 = 1 megabyte
    

    The required unit for TCPBUFSIZE is bytes, so you would set it to a value of 1000000.
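The three steps amount to computing the bandwidth-delay product and converting it to bytes. A minimal sketch follows; the function name `tcp_buffer_bytes` is hypothetical, and a megabit is taken as 1,000,000 bits to match the worked example.

```python
def tcp_buffer_bytes(rtt_seconds: float,
                     bandwidth_megabits_per_second: float) -> int:
    """Return a candidate TCPBUFSIZE value in bytes.

    RTT multiplied by bandwidth gives the number of bits in flight;
    dividing by 8 converts bits to bytes.
    """
    if rtt_seconds < 0 or bandwidth_megabits_per_second < 0:
        raise ValueError("inputs must be non-negative")
    bits = rtt_seconds * bandwidth_megabits_per_second * 1_000_000
    return round(bits / 8)


if __name__ == "__main__":
    # 0.08 second RTT on a 100 megabit-per-second link.
    print(tcp_buffer_bytes(0.08, 100))  # 1000000
```

Treat the result as a starting point for testing rather than a final setting, because the operating system may cap the socket buffer size.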

The maximum socket buffer size for non-Windows systems is usually limited by default. Ask your system administrator to increase the default value on the source and target systems so that Oracle GoldenGate can increase the buffer size configured with TCPBUFSIZE.

19.4 Eliminating disk I/O bottlenecks

I/O activity can cause bottlenecks for both Extract and Replicat.

  • A regular Extract generates disk writes to a trail and disk reads from a data source.

  • A data pump and Replicat generate disk reads from a local trail.

  • Each process writes a recovery checkpoint to its checkpoint file on a regular schedule.

19.4.1 Improving I/O performance within the system configuration

If there are I/O waits on the disk subsystems that contain the trail files, put the trails on the fastest disk controller possible.

Check the RAID configuration. Because Oracle GoldenGate writes data sequentially, RAID 0+1 (striping and mirroring) is a better choice than RAID 5, which uses checksums that slow down I/O and are not necessary for these types of files.

19.4.2 Improving I/O performance within the Oracle GoldenGate configuration

You can improve I/O performance by making configuration changes within Oracle GoldenGate. Try increasing the values of the following parameters.

  • Use the CHECKPOINTSECS parameter to control how often Extract and Replicat make their routine checkpoints.

    Note:

    CHECKPOINTSECS is not valid for an integrated Replicat on an Oracle database system.

  • Use the GROUPTRANSOPS parameter to control the number of SQL operations that are contained in a Replicat transaction when operating in its normal mode. Increasing the number of operations in a Replicat transaction improves the performance of Oracle GoldenGate by reducing the number of transactions executed by Replicat, and by reducing I/O activity to the checkpoint file and the checkpoint table, if used. Replicat issues a checkpoint whenever it applies a transaction to the target, in addition to its scheduled checkpoints.

    Note:

    GROUPTRANSOPS is not valid for an integrated Replicat on an Oracle database system, unless the inbound server parameter parallelism is set to 1.

  • Use the EOFDELAY or EOFDELAYCSECS parameter to control how often Extract, a data pump, or Replicat checks for new data after it has reached the end of the current data in its data source. You can reduce the system I/O overhead of these reads by increasing the value of this parameter.

Note:

Increasing the values of these parameters improves performance, but it also increases the amount of data that must be reprocessed if the process fails. This has an effect on overall latency between source and target. Some testing will help you determine the optimal balance between recovery and performance.

19.5 Managing Virtual Memory and Paging

Because Oracle GoldenGate replicates only committed transactions, it stores the operations of each transaction in a managed virtual-memory pool known as a cache until it receives either a commit or a rollback for that transaction. One global cache operates as a shared resource of an Extract or Replicat process. The Oracle GoldenGate cache manager takes advantage of the memory management functions of the operating system to ensure that Oracle GoldenGate processes work in a sustained and efficient manner. The CACHEMGR parameter controls the amount of virtual memory and temporary disk space that is available for caching uncommitted transaction data that is being processed by Oracle GoldenGate.

When a process starts, the cache manager checks the availability of resources for virtual memory, as shown in the following example:

    CACHEMGR virtual memory values (may have been adjusted)
    CACHESIZE: 32G
    CACHEPAGEOUTSIZE (normal): 8M
    PROCESS VM AVAIL FROM OS (min): 63.97G
    CACHESIZEMAX (strict force to disk): 48G

If the current resources are not sufficient, a message like the following may be returned:

    2013-11-11 14:16:22 WARNING OGG-01842 CACHESIZE PER DYNAMIC DETERMINATION (32G) LESS THAN RECOMMENDED: 64G (64bit system)
    vm found: 63.97G
    Check swap space. Recommended swap/extract: 128G (64bit system).

If the system exhibits excessive paging and the performance of critical processes is affected, you can reduce the CACHESIZE option of the CACHEMGR parameter. You can also control the maximum amount of disk space that can be allocated to the swap directory with the CACHEDIRECTORY option. For more information about CACHEMGR, see Reference for Oracle GoldenGate for Windows and UNIX.

19.6 Optimizing data filtering and conversion

Heavy amounts of data filtering or data conversion add processing overhead. The following are suggestions for minimizing the impact of this overhead on the other processes on the system.

  • Avoid using the primary Extract to filter and convert data. Keep it dedicated to data capture. It will perform better and is less vulnerable to any process failures that result from those activities. The objective is to make certain the primary Extract process is running and keeping pace with the transaction volume.

  • Use Replicat or a data pump to perform filtering and conversion. Consider any of the following configurations:

    • Use a data pump on the source if the system can tolerate the overhead. This configuration works well when there is a high volume of data to be filtered, because it uses less network bandwidth. Only filtered data gets sent to the target, which also can help with security considerations.

    • Use a data pump on an intermediate system. This configuration keeps the source and target systems free of the overhead, but uses more network bandwidth because unfiltered data is sent from the source to the intermediate system.

    • Use a data pump or Replicat on the target if the system can tolerate the overhead, and if there is adequate network bandwidth for sending large amounts of unfiltered data.

  • If you have limited system resources, a least-best option is to divide the filtering and conversion work between Extract and Replicat.

19.7 Tuning Replicat transactions

Replicat uses regular SQL, so its performance depends on the performance of the target database and the type of SQL that is being applied (inserts, versus updates or deletes). However, you can take certain steps to maximize Replicat efficiency.

19.7.1 Tuning coordination performance against barrier transactions

In a coordinated Replicat configuration, barrier transactions such as updates to the primary key cause an increased number of commits to the database, and they interrupt the benefit of the GROUPTRANSOPS feature of Replicat. When there is a high number of barrier transactions in the overall workload of the coordinated Replicat, using a high number of threads can actually degrade Replicat performance.

To maintain high performance when large numbers of barrier transactions are expected, you can do the following:

  • Reduce the number of active threads in the group. This reduces the overall number of commits that Replicat performs.

  • Move the tables that account for the majority of the barrier transactions, and any tables with which they have dependencies, to a separate coordinated Replicat group that has a small number of threads. Keep the tables that have minimal barrier transactions in the original Replicat group with the higher number of threads, so that parallel performance is maintained without interruption by barrier transactions.

  • (Oracle RAC) In a new Replicat configuration, you can increase the PCTFREE attribute of the Replicat checkpoint table. However, this must be done before Replicat is started for the first time. The recommended value of PCTFREE is 90.

19.7.2 Applying similar SQL statements in arrays

Use the BATCHSQL parameter to increase the performance of Replicat. BATCHSQL causes Replicat to organize similar SQL statements into arrays and apply them at an accelerated rate. In its normal mode, Replicat applies one SQL statement at a time.

When Replicat is in BATCHSQL mode, smaller row changes will show a higher gain in performance than larger row changes. At 100 bytes of data per row change, BATCHSQL has been known to improve the performance of Replicat by up to 300 percent, but actual performance benefits will vary, depending on the mix of operations. At around 5,000 bytes of data per row change, the benefits of using BATCHSQL diminish.

The gathering of SQL statements into batches improves efficiency but also consumes memory. To maintain optimum performance, use the following BATCHSQL options:

BATCHESPERQUEUE 
BYTESPERQUEUE 
OPSPERBATCH 
OPSPERQUEUE 

As a benchmark for setting values, assume that a batch of 1,000 SQL statements at 500 bytes each would require less than 10 megabytes of memory.
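To relate the benchmark to other batch sizes, the sketch below scales it linearly. Linear scaling is an assumption made for illustration, not a documented property of BATCHSQL, and the function name `batch_memory_ceiling_mb` is hypothetical. Use the result only as a first guess when choosing option values, and confirm it by testing.

```python
# Benchmark figures quoted in the text above.
BENCHMARK_OPS = 1_000
BENCHMARK_BYTES_PER_OP = 500
BENCHMARK_MEMORY_MB = 10  # quoted upper bound for the benchmark batch


def batch_memory_ceiling_mb(ops: int, bytes_per_op: int) -> float:
    """Scale the benchmark ceiling by the payload size of a batch.

    Assumes memory grows in proportion to ops * bytes_per_op, which
    is a simplification.
    """
    if ops < 0 or bytes_per_op < 0:
        raise ValueError("inputs must be non-negative")
    payload = ops * bytes_per_op
    benchmark_payload = BENCHMARK_OPS * BENCHMARK_BYTES_PER_OP
    return BENCHMARK_MEMORY_MB * payload / benchmark_payload


if __name__ == "__main__":
    # Twice the benchmark payload gives twice the ceiling.
    print(batch_memory_ceiling_mb(2_000, 500))  # 20.0
```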

You can use BATCHSQL with the BATCHTRANSOPS option to tune array sizing. BATCHTRANSOPS controls the maximum number of batch operations that can be grouped into a transaction before requiring a commit. The default for non-integrated Replicat is 1000. The default for integrated Replicat is 50. If there are many wait dependencies when using integrated Replicat, try reducing the value of BATCHTRANSOPS. To determine the number of wait dependencies, view the TOTAL_WAIT_DEPS column of the V$GG_APPLY_COORDINATOR database view in the Oracle database.

See Reference for Oracle GoldenGate for Windows and UNIX for additional usage considerations and syntax.

19.7.3 Preventing full table scans in the absence of keys

If a target table does not have a primary key, a unique key, or a unique index, Replicat uses all of the columns to build its WHERE clause. This is, essentially, a full table scan.

To make row selection more efficient, use a KEYCOLS clause in the TABLE and MAP statements to identify one or more columns as unique. Replicat will use the specified columns as a key. The following example shows a KEYCOLS clause in a TABLE statement:

TABLE hr.emp, KEYCOLS (FIRST_NAME, LAST_NAME, DOB, ID_NO);

For usage guidelines and syntax, see the TABLE and MAP parameters in Reference for Oracle GoldenGate for Windows and UNIX.

19.7.4 Splitting large transactions

If the target database cannot handle large transactions from the source database, you can split them into a series of smaller ones by using the Replicat parameter MAXTRANSOPS. See Reference for Oracle GoldenGate for Windows and UNIX for more information.

Note:

MAXTRANSOPS is not valid for an integrated Replicat on an Oracle database system.

19.7.5 Adjusting open cursors

The Replicat process maintains cursors for cached SQL statements and for SQLEXEC operations. Without enough cursors, Replicat must age more statements. By default, Replicat maintains as many cursors as allowed by the MAXSQLSTATEMENTS parameter. You might find that the value of this parameter needs to be increased. If so, you might also need to adjust the maximum number of open cursors that are permitted by the database. See Reference for Oracle GoldenGate for Windows and UNIX for more information.

19.7.6 Improving update speed

Excessive block fragmentation causes Replicat to apply SQL statements at a slower than normal speed. Reorganize heavily fragmented tables, and then stop and start Replicat to register the new object ID.

19.7.7 Setting a Replicat transaction timeout

Use the TRANSACTIONTIMEOUT parameter to prevent an uncommitted Replicat target transaction from holding locks on the target database and consuming its resources unnecessarily. You can change the value of this parameter so that Replicat can work within existing application timeouts and other database requirements on the target.

TRANSACTIONTIMEOUT limits the amount of time that Replicat can hold a target transaction open if it has not received the end-of-transaction record for the last source transaction in that transaction. By default, Replicat groups multiple source transactions into one target transaction to improve performance, but it will not commit a partial source transaction and will wait indefinitely for that last record. The Replicat parameter GROUPTRANSOPS controls the minimum size of a grouped target transaction.
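The grouping behavior described above can be modeled in a few lines. This is a simplified sketch, not Replicat's actual implementation: the function name `group_transactions` is hypothetical, each source transaction is represented only by its operation count, and the threshold plays the role of GROUPTRANSOPS. The point it illustrates is that a source transaction is never split, so a group closes only on a transaction boundary once the minimum is reached.

```python
from typing import Iterable, List


def group_transactions(op_counts: Iterable[int],
                       min_group_ops: int) -> List[List[int]]:
    """Group whole source transactions into target transactions.

    A group is closed as soon as it holds at least min_group_ops
    operations. Transactions are never split across groups.
    """
    groups: List[List[int]] = []
    current: List[int] = []
    current_ops = 0
    for ops in op_counts:
        current.append(ops)
        current_ops += ops
        if current_ops >= min_group_ops:
            groups.append(current)
            current, current_ops = [], 0
    if current:
        # Simplification: leftover transactions form a final, smaller group.
        groups.append(current)
    return groups


if __name__ == "__main__":
    print(group_transactions([400, 400, 400, 50], 1000))
    # [[400, 400, 400], [50]]
```

In this model, a group that is still open is the situation TRANSACTIONTIMEOUT guards against: if the end of the last source transaction never arrives, the target transaction stays open and holds its locks.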

The following events could last long enough to trigger TRANSACTIONTIMEOUT:

  • Network problems prevent trail data from being delivered to the target system.

  • Disk space runs out on any system, preventing trail data from being written.

  • Collector abends (a rare event).

  • Extract abends or is terminated in the middle of writing records for a transaction.

  • An Extract data pump abends or is terminated.

  • There is a source system failure, such as a power outage or system crash.

See Reference for Oracle GoldenGate for Windows and UNIX for more information.