3 Configuring Initial Data Synchronization
3.1 Initial Data Synchronization
You can use Oracle GoldenGate to load data in any of the following ways.
- Using database utilities. The utility program performs the initial load. Examples include loading the target using FUP or SQL, and using backup and restore.
- Loading from a file to a database utility. Extract writes records to an extract file in external ASCII format. The file is used as input to a bulk load utility that writes to the target. Replicat creates the run and control files.
- Loading from a file to Replicat. Extract writes records to an extract file and Replicat applies them to the target tables.
- Using an Oracle GoldenGate direct load. Extract communicates with Replicat directly without using a Collector process or files. Replicat applies the data to the target.
- Using direct bulk load. Extract writes records in external ASCII format and delivers them directly to Replicat, which delivers them to Oracle's SQL*Loader bulk load utility. This is the fastest method of loading Oracle data with Oracle GoldenGate.
The method you choose depends, in part, on the database types of your source and target. For example, if you are replicating from a NonStop source to an Oracle target, your choices include direct bulk load, which interacts with SQL*Loader.
Regardless of the method chosen, the initial data synchronization should be run after initiating change capture and before configuring delivery. The steps are:
- Configure and run Extract to capture all active database changes to an Oracle GoldenGate trail.
- Perform the initial load.
- Configure and run Replicat.
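In GGSCI, these three steps might be sketched as follows (a hedged outline, not exact syntax for every platform; the group names CHGEXT and CHGREP and the trail location are hypothetical, and the ADD EXTRACT options depend on your source):

```
GGSCI> ADD EXTRACT CHGEXT, BEGIN NOW
GGSCI> ADD EXTTRAIL $DATA.GGSDAT.ET, EXTRACT CHGEXT
GGSCI> START EXTRACT CHGEXT

(perform the initial load here)

GGSCI> ADD REPLICAT CHGREP, EXTTRAIL $DATA.GGSDAT.ET
GGSCI> START REPLICAT CHGREP
```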
3.1.1 Example Steps for Initial Data Load
This example shows how you might follow these three steps to configure change capture and delivery and perform an initial data load.
3.1.1.1 Configure and Run Extract
You can configure this Extract process to read change records from a TMF audit trail, log trail, or flat file, and process them to an Oracle GoldenGate trail. This step is necessary before doing the initial load because it extracts and delivers ongoing changes to the source database, preserving data integrity for your business operations.
Instructions for configuring and running Extract can be found in "Planning the Configuration".
3.1.1.3 Configure and Run Replicat
When your initial data load finishes writing to its trails, configure Replicat on the target system. This can be the same parameter file you use for ongoing Replicat work; however, you must add the HANDLECOLLISIONS and END parameters, run the batch, then remove those parameters before beginning ongoing change delivery.
- Configure Replicat on the target system, including HANDLECOLLISIONS and END in the parameter file. The END parameter is the time recorded for the completion of the Extract process.
- Start Replicat with the START REPLICAT command.
- When Replicat stops, remove the HANDLECOLLISIONS and END parameters.
- Start Replicat for incremental data synchronization.
3.1.2 Direct Load
Using direct load, you can extract data directly from the source tables and send it, in a large block, to Replicat. You can do this on any operating system and database combination Oracle GoldenGate supports (such as NonStop to NonStop, NonStop to OpenSys (Classic), and OpenSys (Classic) to NonStop). Direct load is not supported to or from OpenSys MicroServices.
3.1.2.2 Using Wildcards
Wildcards can be used for the FILE and TABLE statements in direct load parameter files, but not for views.
Refer back to the example of an Extract group added with the SOURCEISFILE parameter in "To run direct load:". If the ACCOUNT, PRODUCT, and CUSTOMER files are the only files on $DATA.MASTER, the Extract parameters could be changed to use wildcards, as shown in the following direct load parameter file:
EXTRACT INITEXT
RMTHOST targethost, MGRPORT 7809
RMTTASK REPLICAT, GROUP INITREP
TABLE $DATA.MASTER.*, AUTOTRUNCATE;
3.1.3 Direct Bulk Load
If you are loading to an Oracle target, you may choose to use direct bulk load. Direct bulk load is the fastest technique for capturing and delivering data to SQL*Loader. Extract sends the data, in a large block, to Replicat. Manager dynamically starts Replicat, which communicates directly with SQL*Loader using an API.
Note:
You can use direct bulk load only from NonStop to Oracle Classic.
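A direct bulk load setup might pair parameter files like these (a hedged sketch; the host, group, definitions-file, and table names are hypothetical, and it assumes the BULKLOAD Replicat parameter is what invokes the SQL*Loader API in this scenario):

```
Extract parameter file:
EXTRACT INITEXT
RMTHOST orahost, MGRPORT 7809
RMTTASK REPLICAT, GROUP INITREP
TABLE $DATA.MASTER.ACCOUNT;

Replicat parameter file (Manager starts this group dynamically):
REPLICAT INITREP
BULKLOAD
USERID me, PASSWORD orapw
SOURCEDEFS /ggs/mydb.def
MAP $DATA.MASTER.ACCOUNT, TARGET SCOTT.ACCOUNT;
```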
3.2 Controlling the IP Process for Replicat
Although you can configure multiple initial-load Extracts and Replicats, by default the Replicats will inherit the IP process of the Manager running on the target. This results in a single IP channel that does not spread the load across the processors.
To configure the Extract and Replicat pairs to use different channels, you can use static Replicats.
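One possible sketch (an assumption-laden illustration, not a definitive configuration: it presumes each Replicat's parameter file can name its own NonStop TCP/IP process with TCPIPPROCESSNAME, and all group and process names are hypothetical) gives each initial-load pair its own channel:

```
Parameter file for the first static Replicat:
REPLICAT INITREP1
TCPIPPROCESSNAME $ZTC0
...

Parameter file for the second static Replicat:
REPLICAT INITREP2
TCPIPPROCESSNAME $ZTC1
...
```

Each pair then moves data over a separate IP channel instead of sharing the one inherited from Manager.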
3.3 Loading Oracle, Microsoft, or Sybase SQL Server Tables
NonStop tables and files can be synchronized with Oracle or SQL Server tables in very much the same way as NonStop-to-NonStop synchronization.
3.3.1 Loading to Oracle or SQL Server
To load to Oracle or SQL Server:
- Run DEFGEN to export source data definitions to the target system. Be sure to satisfy any other prerequisites.
- Start the Collector on the target system.
  For UNIX:
  $ server -d /ggs/mydb.def 2> server.log &
  For Windows:
  server -d \ggs\mydb.def 2> server.log
  Use the -d option to specify the definitions file created by DEFGEN (mydb.def).
- Create an Extract parameter file to perform the initial table extract over TCP/IP.
- Run the Extract program to extract the data into a flat file on the target system:
  TACL> RUN GGS.EXTRACT /IN GGSPARM.TRANINI, OUT $S.#TRAN/
  This command creates a flat file on the target. If you specified FORMATASCII, SQLLOADER in a parameter file for Oracle, Oracle GoldenGate generates the flat file in a format that SQL*Loader can read. If you specified FORMATASCII, BCP in the parameter file for SQL Server, Oracle GoldenGate generates a flat file that is compatible with the BCP utility.
- Create a Replicat parameter file using the MAP parameter to map between source and target tables.
- Run Replicat to create files called TRANSTAB.ctl and TRANSTAB.run for Oracle, or TRANSTAB.bat and TRANSTAB.fmt for SQL Server. These files contain the mapping definitions and run commands required to load.
  For UNIX:
  replicat paramfile /ggs/dirprm/tranini.prm
  For Windows:
  replicat paramfile \ggs\dirprm\tranini
- Load the data.
  For UNIX:
  $ TRANSTAB.run
  For Windows:
  TRANSTAB.bat
3.3.2 Initial Sync Parameter File Examples
This section contains these examples:
- An Extract parameter file for Oracle running on UNIX
- A Replicat parameter file for Oracle running on UNIX
- An Extract parameter file for SQL Server running on Windows
- A Replicat parameter file for SQL Server running on Windows
3.3.2.1 Sample NonStop to Oracle Parameter Files
Following are example NonStop to Oracle parameter files.
Extract Parameter File GGSPARM.ORAINI:
SOURCEISFILE, FASTUNLOAD
FORMATASCII, SQLLOADER
RMTHOST ntbox12, MGRPORT 7809, PARAMS "-d c:\ggs\dirdef\source.def"
RMTFILE TRANSTAB.dat, PURGE
FILE \SRC.$D2.MYDB.TRANSTAB;
- FORMATASCII, SQLLOADER specifies that the data format is compatible with Oracle's SQL*Loader utility.
- RMTFILE identifies TRANSTAB.dat, the flat file that serves as the data source for the load.
Replicat Parameter File /ggs/dirprm/tranini.prm:
GENLOADFILES
USERID me, PASSWORD orapw
SOURCEDEFS /ggs/mydb.def
MAP $D2.MYDB.TRANSTAB, TARGET ORATRANS;
RMTFILE /ggsdat/tranini, PURGE
FILE \SRC.$D2.MYDB.TRANSTAB;
- GENLOADFILES generates the load control files and then quits. These control files generate maps, even between dissimilar tables.
- USERID and PASSWORD specify the database logon.
- SOURCEDEFS specifies the location of the NonStop definitions exported by DEFGEN. (These are required to generate a load map.)
- MAP specifies the source-to-target relationship of the NonStop to Oracle table.
- Errors are displayed to the screen, and detailed messages are written to the TRANSTAB.err and TRANSTAB.log files.
3.3.2.2 Sample SQL Server Parameter Files
Following are examples of parameter files for SQL Server.
Extract parameter file GGSPARM.SQLINI:
SOURCEISFILE, FASTUNLOAD
FORMATASCII, BCP
RMTHOST ntbox12, MGRPORT 7809, PARAMS "-d c:\ggs\dirdef\source.def"
RMTFILE C:\GGS\TRANSTAB.dat, PURGE
TABLE $DATA.MASTER.TRANS
- FORMATASCII, BCP specifies that the data format is compatible with the Microsoft BCP utility.
- RMTFILE identifies TRANSTAB.dat, the flat file that serves as the data source for the load. Using the dat extension makes it compatible with the load functions.
To load data to SQL Server, you must use the BCP template provided by Oracle GoldenGate. You can call BCP from your Replicat parameter file or run it interactively from the operating system shell. The template tells Replicat how data is laid out in the SQL Server target.
Replicat parameter file GGSPARM.TRANINI:
GENLOADFILES BCPFMT.TPL
TARGETDB MYAPP, USERID MYNAME, PASSWORD MSPW
SOURCEDEFS c:\ggs\mydb.def
MAP $D2.MYDB.TRANSTAB, TARGET SCHEMA.ORATRANS;
3.4 Limiting the Enscribe Source Key Range for Initial Load
If your parameters meet the requirements, the FILE parameter options STARTKEY and ENDKEY can be used to limit the range of Enscribe records selected for a SOURCEISFILE initial-load process. This allows you to load subsets of the data for different purposes, or to break up a large initial data load. Refer to the FILE | TABLE parameter in Reference for Oracle GoldenGate on HP NonStop (Guardian) for specifics on the requirements and how to use STARTKEY and ENDKEY.
3.5 Restarting an Initial Load
You can restart initial loads using the RESTARTCHECKPOINTS option of the SOURCEISFILE or SOURCEISTABLE parameter if your Extract is added from GGSCI.
You can use RESTARTCHECKPOINTS for:
- SQL/MP source tables, with or without the SQLPREDICATE option
- Enscribe files, whether or not you use the FILE STARTKEY and ENDKEY options
- Both SQL/MP and Enscribe, with or without FASTUNLOAD
Refer to Reference for Oracle GoldenGate on HP NonStop (Guardian) for additional conditions and restrictions for using the SOURCEISFILE RESTARTCHECKPOINTS option.
The messages generated when the SOURCEISFILE Extract restarts vary based on the type of database and the parameters and options that are used. Several examples are shown next.
Example 1: SQL/MP tables produced without using FASTUNLOAD
A message similar to the following is produced for SQL/MP source tables without FASTUNLOAD. In this example, the SQLPREDICATE option is being used and WHERE (STATE = "CA") is the user's predicate. AC_KEY is the multi-column key used for the restart.
Output extract file \NY.$DATA02.ACDAT.PA000009
Write Position: RBA 19126
Extract SourceIsFile process is restartable
Processing File \NY.$DATA02.ACDAT.ACCT
Using this SQL statement to retrieve data:
SELECT * FROM \NY.$DATA02.ACDAT.ACCT WHERE (STATE = "CA")
AND AC_KEY1, AC_KEY2, AC_KEY3 > 13, 4781, 27
BROWSE ACCESS
Example 2: SQL/MP or Enscribe tables produced using FASTUNLOAD
A message similar to the following is produced for SQL/MP or Enscribe source tables using FASTUNLOAD. The restart key is RBA 9555968 of partition $DATA03.
Output extract file \NY.$DATA02.ACDAT.PA000009
Write Position: RBA 19126
Extract SourceIsFile process is restartable
Processing File \NY.$DATA02.ACDAT.ACCT2
Processing Partition \NY.$DATA03.ACDAT.ACCT2
Positioning Restart at RBA 9555968
Example 3: Enscribe tables produced without FASTUNLOAD or STARTKEY
A message similar to the following is produced for Enscribe without FASTUNLOAD and without STARTKEY. The CUST-KEY used for the restart is 1234.
Output extract file \NY.$DATA02.ACDAT.PA000009
Write Position: RBA 19126
Extract SourceIsFile process is restartable
Processing File \NY.$DATA02.ACDAT.ALTPART
Processing using restart values ( CUST-KEY = 1234 )
Example 4: Enscribe tables produced without FASTUNLOAD and with a STARTKEY
A message similar to the following is produced for Enscribe without FASTUNLOAD and with STARTKEY. The CUST-KEY used for the restart is 1234.
file $data02.acdat.altpart, startkey (CUST-key = 0000), def ens-rec, endkey (CUST-key = 5555);
file $data02.acdat.altpart, startkey (CUST-key = 5556), def ens-rec, endkey (CUST-key = 999999);
Output extract file \NY.$DATA02.ACDAT.PA000009
Write Position: RBA 19126
Extract SourceIsFile process is restartable
Processing File \NY.$DATA02.ACDAT.ALTPART
Processing using restart values ( CUST-KEY = 1234 )
Finished to EndKey ( CUST-KEY = 5555 )
Processing from StartKey ( CUST-KEY = 5556 )
Finished to EndKey ( CUST-KEY = 999999 )
3.6 Loading Initial Data from Windows and UNIX
Use Replicat to load data from a Windows or UNIX system into a NonStop target database.
3.7 Integrating Source and Target Data
When only a subset of source rows or columns is needed in the target, you can use one of the following techniques to integrate selected data into your target:
- Selecting rows on the source with WHERE or FILTER
- Mapping columns on the target with COLMAP
When the data source is a SQL table, you can specify SQL views. This allows automatic filtering of columns before transmission.
Data transformation (such as six-to-eight digit date conversion) takes a little extra effort during the load process. There are a couple of ways to achieve initial loads in this situation.
The first solution involves extracting the entire table into a flat file. In this case, do not specify FORMATASCII. Next, use Replicat to load the table using the SPECIALRUN parameter. This method, while slower than native loads, is often sufficient and allows field conversion functions to be used during replication.
The second solution is to perform all data mapping on the NonStop system before transmission, rather than on the target side. This means that all conversion work is performed by Extract. Using this strategy can result in less network traffic, because filtering can be performed before data reaches the pipe. However, it can also require the creation of a dummy table or DDL definition on the NonStop side that mimics the structure of the real target table.
3.8 Distributing Extracted Data
In addition to extracting and replicating database changes, Extract can forward and distribute changes that have already been extracted. This process is known as data pumping.
Use data pumping in the following scenarios:
- A network or target system may be down for an extended time, but extraction or logging activities must occur constantly.
- Data extracted by Logger must be forwarded over TCP/IP to non-NonStop systems.
Running Extract for these purposes is nearly identical to capturing data from TMF audit trails. To run Extract in this manner, perform the following tasks.
- Create an initial Extract checkpoint with the GGSCI ADD EXTRACT command, using the EXTTRAILSOURCE or LOGTRAILSOURCE option.
- Add a local or remote Oracle GoldenGate trail with the GGSCI ADD EXTTRAIL or ADD RMTTRAIL command. By adding the trails, you direct Extract where to write the data you need.
- Set up an Extract parameter file.
- Start Extract using the GGSCI START EXTRACT command.
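The tasks above might be sketched as the following GGSCI session (hedged; the group name PUMP1 and the trail locations are hypothetical, and your ADD EXTRACT options will differ by source):

```
GGSCI> ADD EXTRACT PUMP1, EXTTRAILSOURCE $DATA01.GGSDAT.AA
GGSCI> ADD RMTTRAIL /ggsdat/bb, EXTRACT PUMP1
GGSCI> START EXTRACT PUMP1
```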
3.9 Direct File Extraction
Rather than capturing from trails, you can extract directly from a file or a sequence of files. You can read a file directly only when the following conditions are true:
- Only inserts occur against the files (no updates).
- Records are inserted only at the end of the file.
Use this feature when:
- The input is an entry-sequenced file, a BASE24 TFL/PTLF file, or a Connex Advantage file.
- The input files meet the conditions described above.
- You want to transfer the batch file contents a little at a time throughout the day ("trickle" transfer), rather than all at once at the end of the day.
To extract directly from a file:
- Enter a GGSCI ADD EXTRACT command, specifying the FILETYPE parameter. FILETYPE indicates the type of file from which you are reading.
- If more than one file in a sequence might be open at a time, start an Extract process for each file in use simultaneously. Enter an ALTINPUT parameter in each process's parameter file with a RANGE option to distribute the files among the processes.
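For example, two processes might divide a file sequence like this (a hedged sketch; the group names, trail names, and the RANGE syntax shown are assumptions used to illustrate the idea):

```
Parameter file for the first Extract:
EXTRACT BATCH1
RMTHOST targethost, MGRPORT 7809
RMTTRAIL /ggsdat/b1
FILE $DATA.BATCH.TL*;
ALTINPUT RANGE (1 OF 2)

Parameter file for the second Extract:
EXTRACT BATCH2
RMTHOST targethost, MGRPORT 7809
RMTTRAIL /ggsdat/b2
FILE $DATA.BATCH.TL*;
ALTINPUT RANGE (2 OF 2)
```

With the files split this way, neither process ever handles two files in sequence.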
3.10 Batch Processing
You can configure Extract and Replicat to run in batch when capturing and delivering incremental changes is not appropriate for the application. You can configure ongoing batch runs for a specific time daily, or special, one-time batch runs.
3.10.2 Trickle Batch Processing
When you are extracting batch files using RMTBATCH, you may need to perform the following steps:
3.10.2.1 Determining the Next File
Use ALTINPUT for direct file extraction. With ACI files, multiple files can be in use at one time. For example, processing can continue for Monday's file after midnight, while Tuesday's file is opened for new data. To handle a multiple-file situation, run more than one Extract process for the file sequence. Use the ALTINPUT RANGE option to distribute the files across the processes so that Extract never processes two files in sequence. You can also use ALTINPUT to specify the access mode of the file open, and to move Extract to the next sequential file if an application has a file open that it is not updating.
By default, Extract looks for the next alphabetical file name. The file name must conform to a template for the file type, which defaults to predefined characteristics. You can also specify the template by parameter.
If the file type is ACITLF or ACIPTLF, the template is in the form $VOL.SUBVOL.XXYYMMDD, where XX is a two-character prefix followed by a six-digit date.
If the file type is ACITLFX or ACIPTLFX, the template is in the form $VOL.SUBVOL.XMMDDNNN, where X is a one-character prefix followed by a month, day, and three-digit sequence number.
When specifying any of the ACI file types in the FILETYPE option, do not include the date or sequence number. The next file is the one following the current file in name order, and it must also satisfy any RANGE criteria in the ALTINPUT parameter.
If the file type is ENTRY, you specify the template in the ALTINPUT parameter TEMPLATE option. NonStop wildcards are acceptable. For example, the template $DATA*.MYDAT.FL* processes files starting with FL that reside on different $DATA volumes.
When using FILETYPE ADVANTAGE, specify the first file to process, not the file prefix. By default, the next file is the next file name that fits the template.
As an alternative, you can use ALTINPUT USENEXTMODIFIED. This option selects the next file modified after the current file that also fits the template.
3.10.2.2 When the Next File is Processed
Before moving to the next file in a sequence, Extract must process the entire contents of the current file. By default, Extract uses the following rules to determine that the current file has been exhausted and the next file is ready for processing.
- End-of-file was reached in the current file at least five seconds earlier, and no new data has appeared since.
- No processes have the current file open for write access.
- The next file exists and has at least one record in it.
- The next file was modified after the current file.
You can modify these rules with the NOWAITNEXTMODIFIED, WAITNEXTRBA, and OPENTIMEOUT options of the ALTINPUT parameter.
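Combined, these options might appear in a parameter file like this (a hedged sketch; the file names, the timeout value, and the option ordering are illustrative assumptions, not verified syntax):

```
FILE $DATA.BATCH.TL*;
ALTINPUT TEMPLATE $DATA*.BATCH.TL*, NOWAITNEXTMODIFIED, OPENTIMEOUT 5
```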