7 Configuring Oracle GoldenGate for Real-time Data Warehousing

This chapter describes how to configure Oracle GoldenGate for real-time data warehousing.

7.1 Overview of the Data Warehousing Configuration

A data warehousing configuration is a many-to-one configuration. Multiple source databases send data to one target warehouse database. Oracle GoldenGate supports like-to-like or heterogeneous transfer of data, with capabilities for filtering and conversion on any system in the configuration (support varies by database platform).

7.2 Considerations for a Data Warehousing Configuration

This section describes considerations for a data warehousing configuration.

7.2.1 Isolation of Data Records

This configuration assumes that each source database contributes different records to the target system. If the same record exists in the same table on two or more source systems and can be changed on any of those systems, conflict resolution routines are needed to resolve the conflicts that arise when that record is changed on more than one source at the same time and the changes are replicated to the target table. See Configuring Oracle GoldenGate for Active-Active High Availability for more information about resolving conflicts.

7.2.2 Data Storage

You can divide the data storage between the source systems and the target system to reduce the need for massive amounts of disk space on the target system. This is accomplished by using a data pump on each source, rather than sending data directly from each Extract across the network to the target.

  • A primary Extract writes to a local trail on each source.

  • A data pump Extract on each source reads the local trail and sends the data across TCP/IP to a remote trail on the target, where a dedicated Replicat group reads it.

7.2.3 Filtering and Conversion

If not all of the data from a source system will be sent to the data warehouse, you can use the data pump to perform the filtering. This removes that processing overhead from the primary Extract group, and it reduces the amount of data that is sent across the network. See Mapping and Manipulating Data for filtering and conversion options.
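
As a minimal sketch of filtering in the data pump, the following parameter file sends only a subset of rows from one table to the warehouse and passes a second table through unchanged. The schema, table, and column names (sales.orders, sales.customers, region_id) are hypothetical, and the remaining values are the placeholders used elsewhere in this chapter. Depending on your release, a data pump that filters or converts data may also need NOPASSTHRU and a database login so that it can look up table definitions.

    -- Identify the data pump group:
    EXTRACT pump_1
    -- Login is needed when the pump must look up table definitions:
    [SOURCEDB dsn_1][, USERIDALIAS alias]
    -- Target system and remote trail:
    RMTHOST target, MGRPORT port_number
    RMTTRAIL remote_trail_1
    -- Send only the rows for one region (hypothetical table and column):
    TABLE sales.orders, WHERE (region_id = 10);
    -- Send all rows of this table (hypothetical table):
    TABLE sales.customers;

Because the WHERE clause is evaluated by the data pump, the primary Extract still captures every row to the local trail, but only the matching rows cross the network.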

7.2.4 Additional Information

The following documentation provides additional information of relevance to configuring Oracle GoldenGate.

7.3 Creating a Data Warehousing Configuration

Refer to Figure 7-1 for a visual representation of the objects you will be creating.

Figure 7-1 Configuration for Data Warehousing


7.3.1 Source Systems

Configure the Manager process and primary Extract groups for the source systems.

To Configure the Manager Process

  1. On each source, configure the Manager process according to the instructions in Configuring Manager and Network Communications.

  2. In each Manager parameter file, use the PURGEOLDEXTRACTS parameter to control the purging of files from the trail on the local system.
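
As a minimal sketch, a Manager parameter file on source_1 might look like the following. The port number, the trail path ./dirdat/lt, and the 24-hour retention are assumptions chosen for illustration, not recommended values.

    -- Port on which Manager listens (assumed value):
    PORT 7809
    -- Purge local trail files only after every process that reads them
    -- has checkpointed past them, and keep them at least 24 hours:
    PURGEOLDEXTRACTS ./dirdat/lt*, USECHECKPOINTS, MINKEEPHOURS 24

With USECHECKPOINTS, Manager does not purge a trail file that the data pump has not yet finished reading, which matters when the network link to the target is down for a while.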

To Configure the Primary Extract Groups

  1. On each source, use the ADD EXTRACT command to create a primary Extract group. For documentation purposes, these groups are called ext_1 and ext_2.

    Command on source_1:

    ADD EXTRACT ext_1, {TRANLOG | INTEGRATED TRANLOG}, BEGIN time [option[, ...]]
    

    Command on source_2:

    ADD EXTRACT ext_2, {TRANLOG | INTEGRATED TRANLOG}, BEGIN time [option[, ...]]
    

    See Reference for Oracle GoldenGate for detailed information about these and other ADD EXTRACT options that may be required for your installation.

  2. On each source, use the ADD EXTTRAIL command to create a local trail.

    Command on source_1:

    ADD EXTTRAIL local_trail_1, EXTRACT ext_1
    

    Command on source_2:

    ADD EXTTRAIL local_trail_2, EXTRACT ext_2
    

    Use the EXTRACT argument to link each Extract group to the local trail on the same system. The primary Extract writes to this trail, and the data pump reads it.

  3. On each source, use the EDIT PARAMS command to create a parameter file for the primary Extract. Include the following parameters plus any others that apply to your database environment. For possible additional required parameters, see the Oracle GoldenGate installation and setup guide for your database.

    Parameter file for ext_1:

    -- Identify the Extract group:
    EXTRACT ext_1
    -- Specify database login information as needed for the database:
    [SOURCEDB dsn_1][, USERIDALIAS alias]
    -- Log all scheduling columns if using integrated Replicat
    LOGALLSUPCOLS
    -- Specify the local trail that this Extract writes to
    -- and the encryption algorithm:
    ENCRYPTTRAIL algorithm
    EXTTRAIL local_trail_1
    -- Specify tables and sequences to be captured:
    SEQUENCE [container.|catalog.]owner.sequence;
    TABLE [container.|catalog.]owner.table;
    

    Parameter file for ext_2:

    -- Identify the Extract group:
    EXTRACT ext_2
    -- Specify database login information as needed for the database:
    [SOURCEDB dsn_2][, USERIDALIAS alias]
    -- Log all scheduling columns if using integrated Replicat or CDR
    LOGALLSUPCOLS
    -- Specify the local trail that this Extract writes to
    -- and the encryption algorithm:
    ENCRYPTTRAIL algorithm
    EXTTRAIL local_trail_2
    -- Specify tables and sequences to be captured:
    SEQUENCE [container.|catalog.]owner.sequence;
    TABLE [container.|catalog.]owner.table;
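
To make the three steps above concrete, the following sketch shows them for source_1 with the placeholders filled in. The trail path ./dirdat/lt, the credential alias ggadmin_src, and the sales tables are hypothetical, and TRANLOG with BEGIN NOW is only one of the possible choices. Trail encryption and LOGALLSUPCOLS are left out for brevity.

    Commands on source_1:

    ADD EXTRACT ext_1, TRANLOG, BEGIN NOW
    ADD EXTTRAIL ./dirdat/lt, EXTRACT ext_1

    Parameter file for ext_1:

    EXTRACT ext_1
    -- Hypothetical alias stored in the credential store:
    USERIDALIAS ggadmin_src
    -- Local trail created with ADD EXTTRAIL:
    EXTTRAIL ./dirdat/lt
    -- Hypothetical tables to capture:
    TABLE sales.orders;
    TABLE sales.customers;

The configuration for source_2 would follow the same pattern with ext_2 and its own local trail.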
    

To Configure the Data Pumps

  1. On each source, use the ADD EXTRACT command to create a data pump Extract group. For documentation purposes, these pumps are called pump_1 and pump_2.

    Command on source_1:

    ADD EXTRACT pump_1, EXTTRAILSOURCE local_trail_1, BEGIN time
    

    Command on source_2:

    ADD EXTRACT pump_2, EXTTRAILSOURCE local_trail_2, BEGIN time
    

    Use EXTTRAILSOURCE as the data source option, and specify the name of the trail on the local system.

  2. On each source, use the ADD RMTTRAIL command to create a remote trail on the target.

    Command on source_1:

    ADD RMTTRAIL remote_trail_1, EXTRACT pump_1
    

    Command on source_2:

    ADD RMTTRAIL remote_trail_2, EXTRACT pump_2
    

    Use the EXTRACT argument to link each remote trail to a different data pump. The data pump writes to this trail over TCP/IP, and a Replicat reads from it.

    See Reference for Oracle GoldenGate for additional ADD RMTTRAIL options.

  3. On each source, use the EDIT PARAMS command to create a parameter file for the data pump group. Include the following parameters plus any others that apply to your database environment.

    Parameter file for pump_1:

    -- Identify the data pump group:
    EXTRACT pump_1
    -- Specify database login information as needed for the database:
    [SOURCEDB dsn_1][, USERIDALIAS alias]
    -- Decrypt the data only if the data pump must process it.
    -- DECRYPTTRAIL
    -- Specify the name or IP address of the target system
    -- and optional encryption of data over TCP/IP:
    RMTHOST target, MGRPORT port_number, ENCRYPT encryption_options
    -- Specify the remote trail and encryption algorithm on the target system:
    ENCRYPTTRAIL algorithm
    RMTTRAIL remote_trail_1
    -- Specify tables and sequences to be captured:
    SEQUENCE [container.|catalog.]owner.sequence;
    TABLE [container.|catalog.]owner.table;
    

    Parameter file for pump_2:

    -- Identify the data pump group:
    EXTRACT pump_2
    -- Specify database login information as needed for the database:
    [SOURCEDB dsn_2][, USERIDALIAS alias]
    -- Decrypt the data only if the data pump must process it.
    -- DECRYPTTRAIL
    -- Specify the name or IP address of the target system
    -- and optional encryption of data over TCP/IP:
    RMTHOST target, MGRPORT port_number, ENCRYPT encryption_options
    -- Specify the remote trail and encryption algorithm on the target system:
    ENCRYPTTRAIL algorithm
    RMTTRAIL remote_trail_2
    -- Specify tables and sequences to be captured:
    SEQUENCE [container.|catalog.]owner.sequence;
    TABLE [container.|catalog.]owner.table;
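
The following sketch shows how the two data pumps might be defined so that each one writes to its own remote trail on the same target. The host name dwhost.example.com, the port 7809, the trail paths (./dirdat/lt, ./dirdat/ra, ./dirdat/rb), and the sales schema are assumptions for illustration. The local trail on source_2 is assumed to use the same relative path as on source_1, which is possible because the two trails reside on different systems.

    Commands on source_1:

    ADD EXTRACT pump_1, EXTTRAILSOURCE ./dirdat/lt
    ADD RMTTRAIL ./dirdat/ra, EXTRACT pump_1

    Commands on source_2:

    ADD EXTRACT pump_2, EXTTRAILSOURCE ./dirdat/lt
    ADD RMTTRAIL ./dirdat/rb, EXTRACT pump_2

    Parameter file for pump_1 (pump_2 differs only in the group name and RMTTRAIL):

    EXTRACT pump_1
    -- Assumed target host and Manager port:
    RMTHOST dwhost.example.com, MGRPORT 7809
    -- Remote trail that only this pump writes to:
    RMTTRAIL ./dirdat/ra
    -- Pass all tables in the hypothetical sales schema through unchanged:
    TABLE sales.*;

Giving each pump a distinct remote trail keeps the data from each source separate on the target, so that one Replicat group can be dedicated to each source.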
    

7.3.2 Target System

Configure the Manager process and Replicat groups for the target system.

To Configure the Manager Process

  1. Configure the Manager process. See Configuring Manager and Network Communications for instructions.

  2. In the Manager parameter file, use the PURGEOLDEXTRACTS parameter to control the purging of files from the trail.
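
As a minimal sketch, the Manager parameter file on the target might cover both remote trails, because one Manager serves every data pump that connects. The port numbers, the trail paths ./dirdat/ra and ./dirdat/rb, and the retention period are assumptions for illustration.

    -- Port that the data pumps name in MGRPORT (assumed value):
    PORT 7809
    -- Ports that Manager can assign to the Collector processes
    -- started for incoming data pump connections (assumed range):
    DYNAMICPORTLIST 7810-7820
    -- Purge each remote trail once its Replicat has checkpointed past a file:
    PURGEOLDEXTRACTS ./dirdat/ra*, USECHECKPOINTS, MINKEEPHOURS 24
    PURGEOLDEXTRACTS ./dirdat/rb*, USECHECKPOINTS, MINKEEPHOURS 24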

To Configure the Replicat Groups

  1. On the target, create a Replicat checkpoint table (unless using Oracle integrated Replicat). See Creating a Checkpoint Table for instructions.

  2. On the target, use the ADD REPLICAT command to create a Replicat group for each remote trail that you created. For documentation purposes, these groups are called rep_1 and rep_2.

    Command to add rep_1:

    ADD REPLICAT rep_1
    [, INTEGRATED | COORDINATED [MAXTHREADS number]]
    , EXTTRAIL remote_trail_1, BEGIN time
    

    Command to add rep_2:

    ADD REPLICAT rep_2
    [, INTEGRATED | COORDINATED [MAXTHREADS number]]
    , EXTTRAIL remote_trail_2, BEGIN time
    

    Use the EXTTRAIL argument to link the Replicat group to the trail.

    See Reference for Oracle GoldenGate for detailed information about these and other options that may be required for your installation.

  3. On the target, use the EDIT PARAMS command to create a parameter file for each Replicat group. Include the following parameters plus any others that apply to your database environment. For possible additional required parameters, see the Oracle GoldenGate installation and setup guide for your database.

    Parameter file for rep_1:

    -- Identify the Replicat group:
    REPLICAT rep_1
    -- Specify database login information as needed for the database:
    [TARGETDB dsn_3][, USERIDALIAS alias]
    -- Specify error handling rules:
    REPERROR (error, response)
    -- Specify tables for delivery and threads if using coordinated Replicat:
    MAP [container.|catalog.]owner.table, TARGET owner.table[, DEF template]
    [, THREAD (thread_ID)][, THREADRANGE (thread_range[, column_list])]
    ;
    

    Parameter file for rep_2:

    -- Identify the Replicat group:
    REPLICAT rep_2
    -- Specify database login information as needed for the database:
    [TARGETDB dsn_3][, USERIDALIAS alias]
    -- Specify error handling rules:
    REPERROR (error, response)
    -- Specify tables for delivery and threads if using coordinated Replicat:
    MAP [container.|catalog.]owner.table, TARGET owner.table[, DEF template]
    [, THREAD (thread_ID)][, THREADRANGE (thread_range[, column_list])]
    ;
    

    You can use any number of MAP statements for any given Replicat group. All MAP statements for a given Replicat group must specify the same objects that are contained in the trail that is linked to the group.
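
The following sketch shows the target-side steps with the placeholders filled in for a nonintegrated Replicat. The credential alias ggadmin_tgt, the checkpoint table ggadmin.ggs_checkpoint, the trail paths, and the sales and dw schemas are hypothetical. Older releases may additionally require SOURCEDEFS or ASSUMETARGETDEFS in the Replicat parameter file.

    Commands on the target:

    DBLOGIN USERIDALIAS ggadmin_tgt
    ADD CHECKPOINTTABLE ggadmin.ggs_checkpoint
    ADD REPLICAT rep_1, EXTTRAIL ./dirdat/ra, CHECKPOINTTABLE ggadmin.ggs_checkpoint
    ADD REPLICAT rep_2, EXTTRAIL ./dirdat/rb, CHECKPOINTTABLE ggadmin.ggs_checkpoint

    Parameter file for rep_1:

    REPLICAT rep_1
    -- Hypothetical alias stored in the credential store:
    USERIDALIAS ggadmin_tgt
    -- Stop processing on any error that has no more specific rule:
    REPERROR (DEFAULT, ABEND)
    -- Map the hypothetical source tables to the warehouse schema:
    MAP sales.orders, TARGET dw.orders;
    MAP sales.customers, TARGET dw.customers;

The parameter file for rep_2 would follow the same pattern. Both groups can share one checkpoint table and can map to the same target tables, provided that each source contributes different records as described in Isolation of Data Records.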