18 Monitoring Oracle GoldenGate Processing

This chapter describes the monitoring of Oracle GoldenGate processing.

This chapter includes the following sections:

18.1 Using the Information Commands in GGSCI

The primary way to view processing information is through GGSCI. For more information about these commands, see Reference for Oracle GoldenGate for Windows and UNIX.

Table 18-1 Commands to View Process Information

Command What it shows

INFO {EXTRACT | REPLICAT} group [DETAIL]

Run status, checkpoints, approximate lag, and environmental information.

INFO MANAGER

Run status and port number

INFO ALL

INFO output for all Oracle GoldenGate processes on the system

STATS {EXTRACT | REPLICAT} group

Statistics on processing volume, such as number of operations performed.

STATUS {EXTRACT | REPLICAT} group

Run status (starting, running, stopped, abended)

STATUS MANAGER

Run status

LAG {EXTRACT | REPLICAT} group

Latency between last record processed and timestamp in the data source

INFO {EXTTRAIL | RMTTRAIL} trail

Name of associated process, position of last data processed, maximum file size

SEND MANAGER

Run status, information about child processes, port information, trail purge settings

SEND {EXTRACT | REPLICAT} group

Depending on the process and selected options, returns information about memory pool, lag, TCP statistics, long-running transactions, process status, recovery progress, and more.

VIEW REPORT group

Contents of the discard file or process report

VIEW GGSEVT

Contents of the Oracle GoldenGate error log

COMMAND ER wildcard

Information dependent on the COMMAND type:

INFO

LAG

SEND

STATS

STATUS

wildcard is a wildcard specification for the process groups to be affected, for example:

INFO ER ext*
STATS ER *

18.2 Monitoring an Extract Recovery

Note:

This topic applies to all types of databases except Oracle, for which a different recovery mechanism known as Bounded Recovery is used. For more information, see the BR parameter in Reference for Oracle GoldenGate for Windows and UNIX.

If Extract abends when a long-running transaction is open, it can seem to take a long time to recover when it is started again. To recover its processing state, Extract must search back through the online and archived logs (if necessary) to find the first log record for that long-running transaction. The farther back in time that the transaction started, the longer the recovery takes, in general, and Extract can appear to be stalled.

To confirm that Extract is recovering properly, use the SEND EXTRACT command with the STATUS option. One of the following status notations appears, and you can follow the progress as Extract changes its log read position over the course of the recovery.

  • In recovery[1] – Extract is recovering to its checkpoint in the transaction log.

  • In recovery[2] – Extract is recovering from its checkpoint to the end of the trail.

  • Recovery complete – The recovery is finished, and normal processing will resume.

18.3 Monitoring Lag

Lag statistics show you how well the Oracle GoldenGate processes are keeping pace with the amount of data that is being generated by the business applications. With this information, you can diagnose suspected problems and tune the performance of the Oracle GoldenGate processes to minimize the latency between the source and target databases. See Section 19, "Tuning the Performance of Oracle GoldenGate" for help with tuning Oracle GoldenGate to minimize lag.

18.3.1 About Lag

For Extract, lag is the difference, in seconds, between the time that a record was processed by Extract (based on the system clock) and the timestamp of that record in the data source.

For Replicat, lag is the difference, in seconds, between the time that the last record was processed by Replicat (based on the system clock) and the timestamp of the record in the trail.

To view lag statistics, use either the LAG or SEND command in GGSCI. For more information, see Reference for Oracle GoldenGate for Windows and UNIX.

Note:

The INFO command also returns a lag statistic, but this statistic is taken from the last record that was checkpointed, not the current record that is being processed. It is less accurate than LAG or INFO.

18.3.2 Controlling How Lag is Reported

Use the LAGREPORTMINUTES or LAGREPORTHOURS parameter to specify the interval at which Manager checks for Extract and Replicat lag. See Reference for Oracle GoldenGate for Windows and UNIX.

Use the LAGCRITICALSECONDS, LAGCRITICALMINUTES, or LAGCRITICALHOURS parameter to specify a lag threshold that is considered critical, and to force a warning message to the error log when the threshold is reached. This parameter affects Extract and Replicat processes on the local system. See Reference for Oracle GoldenGate for Windows and UNIX.

Use the LAGINFOSECONDS, LAGINFOMINUTES, or LAGINFOHOURS parameter to specify a lag threshold; if lag exceeds the specified value, Oracle GoldenGate reports lag information to the error log. If the lag exceeds the value specified with the LAGCRITICAL parameter, Manager reports the lag as critical; otherwise, it reports the lag as an informational message. A value of zero (0) forces a message at the frequency specified with the LAGREPORTMINUTES or LAGREPORTHOURS parameter. See Reference for Oracle GoldenGate for Windows and UNIX.

18.4 Monitoring Processing Volume

The STATS commands in GGSCI show you the amount of data that is being processed by an Oracle GoldenGate process, and how fast it is being moved through the Oracle GoldenGate system. With this information, you can diagnose suspected problems and tune the performance of the Oracle GoldenGate processes. These commands provide a variety of options to select and filter the output.

The STATS commands are: STATS EXTRACT, STATS REPLICAT, or STATS ER command. See Reference for Oracle GoldenGate for Windows and UNIX.

You can send interim statistics to the report file at any time with the SEND EXTRACT or SEND REPLICAT command with the REPORT option. See Reference for Oracle GoldenGate for Windows and UNIX.

18.5 Using the Error Log

Use the Oracle GoldenGate error log to view:

  • a history of GGSCI commands

  • Oracle GoldenGate processes that started and stopped

  • processing that was performed

  • errors that occurred

  • informational and warning messages

Because the error log shows events as they occurred in sequence, it is a good tool for detecting the cause (or causes) of an error. For example, you might discover that:

  • someone stopped a process

  • a process failed to make a TCP/IP or database connection

  • a process could not open a file

To view the error log, use any of the following:

Because the error log will continue to grow as you use Oracle GoldenGate, consider archiving and deleting the oldest entries in the file.

18.6 Using the Process Report

Use the process report to view (depending on the process):

  • parameters in use

  • table and column mapping

  • database information

  • runtime messages and errors

  • runtime statistics for the number of operations processed

Every Extract, Replicat, and Manager process generates a report file. The report can help you diagnose problems that occurred during the run, such as invalid mapping syntax, SQL errors, and connection errors.

To view a process report, use any of the following:

  • standard shell command for viewing a text file

  • Oracle GoldenGate Director or Oracle GoldenGate Monitor

  • VIEW REPORT command in GGSCI. See Reference for Oracle GoldenGate for Windows and UNIX.

  • To view information if a process abends without generating a report, use the following command to run the process from the command shell of the operating system (not GGSCI) to send the information to the terminal.

    process paramfile path.prm
    

    Where:

    • The value for process is either extract or replicat.

    • The value for path.prm is the fully qualified name of the parameter file, for example:

      replicat paramfile /ogg/dirdat/repora.prm
      

By default, reports have a file extension of .rpt, for example EXTORA.rpt. The default location is the dirrpt sub-directory of the Oracle GoldenGate directory. However, these properties can be changed when the group is created. Once created, a report file must remain in its original location for Oracle GoldenGate to operate properly after processing has started.

To determine the name and location of a process report, use the INFO EXTRACT, INFO REPLICAT, or INFO MANAGER command in GGSCI. See Reference for Oracle GoldenGate for Windows and UNIX.

18.6.1 Scheduling Runtime Statistics in the Process Report

By default, runtime statistics are written to the report once, at the end of each run. For long or continuous runs, you can use optional parameters to view these statistics on a regular basis, without waiting for the end of the run.

To set a schedule for reporting runtime statistics, use the REPORT parameter in the Extract or Replicat parameter file to specify a day and time to generate runtime statistics in the report. See Reference for Oracle GoldenGate for Windows and UNIX.

To send runtime statistics to the report on demand, use the SEND EXTRACT or SEND REPLICAT command with the REPORT option to view current runtime statistics when needed. See Reference for Oracle GoldenGate for Windows and UNIX.

18.6.2 Viewing Record Counts in the Process Report

Use the REPORTCOUNT parameter to report a count of transaction records that Extract or Replicat processed since startup. Each transaction record represents a logical database operation that was performed within a transaction that was captured by Oracle GoldenGate. The record count is printed to the report file and to the screen. For more information, see Reference for Oracle GoldenGate for Windows and UNIX.

18.6.3 Preventing SQL Errors from Filling the Replicat Report File

Use the WARNRATE parameter to set a threshold for the number of SQL errors that can be tolerated on any target table before being reported to the process report and to the error log. The errors are reported as a warning. If your environment can tolerate a large number of these errors, increasing WARNRATE helps to minimize the size of those files. For more information, see Reference for Oracle GoldenGate for Windows and UNIX.

18.7 Using the Discard File

By default, a discard file is generated whenever a process is started with the START command through GGSCI. The discard file captures information about Oracle GoldenGate operations that failed. This information can help you resolve data errors, such as those that involve invalid column mapping.

The discard file reports such information as:

  • The database error message

  • The sequence number of the data source or trail file

  • The relative byte address of the record in the data source or trail file

  • The details of the discarded operation, such as column values of a DML statement or the text of a DDL statement.

To view the discard file, use a text editor or use the VIEW REPORT command in GGSCI. See Reference for Oracle GoldenGate for Windows and UNIX.

The default discard file has the following properties:

  • The file is named after the process that creates it, with a default extension of .dsc. Example: finance.dsc.

  • The file is created in the dirrpt sub-directory of the Oracle GoldenGate installation directory.

  • The maximum file size is 50 megabytes.

  • At startup, if a discard file exists, it is purged before new data is written.

You can change these properties by using the DISCARDFILE parameter. You can disable the use of a discard file by using the NODISCARDFILE parameter. See Reference for Oracle GoldenGate for Windows and UNIX.

If a proces is started from the command line of the operating system, it does not generate a discard file by default. You can use the DISCARDFILE parameter to specify the use of a discard file and its properties.

Once created, a discard file must remain in its original location for Oracle GoldenGate to operate properly after processing has started.

18.8 Maintaining the Discard and Report Files

By default, discard files and report files are aged the same way. A new discard or report file is created at the start of a new process run. Old files are aged by appending a sequence number from 0 (the most recent) to 9 (the oldest) to their names.

If the active report or discard file reaches its maximum file size before the end of a run (or over a continuous run), the process abends unless there is an aging schedule in effect. Use the DISCARDROLLOVER and REPORTROLLOVER parameters to set aging schedules for the discard and report files respectively. These parameters set instructions for rolling over the files at regular intervals, in addition to when the process starts. Not only does this control the size of the files and prevent process outages, but it also provides a predictable set of archives that can be included in your archiving routine. For more information, see the following documentation:

No process ever has more than ten aged reports or discard files and one active report or discard file. After the tenth aged file, the oldest is deleted when a new report is created. It is recommended that you establish an archiving schedule for aged reports and discard files in case they are needed to resolve a service request.

Table 18-2 Current Extract and Manager Reports Plus Aged Reports

Permissions X Date Report

-rw-rw-rw-

1 ggs ggs
1193 Oct 11 14:59
MGR.rpt

-rw-rw-rw-

1 ggs ggs
3996 Oct 5  14:02
MGR0.rpt
-rw-rw-rw-
1 ggs ggs
4384 Oct 5  14:02
TCUST.rpt
-rw-rw-rw-
1 ggs ggs
1011 Sep 27 14:10
TCUST0.rpt
-rw-rw-rw-
1 ggs ggs
3184 Sep 27 14:10
TCUST1.rpt
-rw-rw-rw-
1 ggs ggs
2655 Sep 27 14:06
TCUST2.rpt
-rw-rw-rw-
1 ggs ggs
2655 Sep 27 14:04
TCUST3.rpt
-rw-rw-rw-
1 ggs ggs
2744 Sep 27 13:56
TCUST4.rpt
-rw-rw-rw-
1 ggs ggs
3571 Aug 29 14:27
TCUST5.rpt

18.9 Using the System Logs

Oracle GoldenGate writes errors that are generated at the level of the operating system to the Event Viewer on Windows or to the syslog on UNIX and Linux. Oracle GoldenGate events are basically the same format in the UNIX, Linux, and Windows system logs. Oracle GoldenGate errors that appear in the system logs also appear in the Oracle GoldenGate error log.

On UNIX and Linux, Oracle GoldenGate messaging to the syslog is enabled by default. On Windows, Oracle GoldenGate messaging to the Event Viewer must be installed by registering the Oracle GoldenGate message DLL during the installation process. You can add this functionality at any time by running the install program with the addevents option. This program is stored in the root of the Oracle GoldenGate directory.

Use the SYSLOG parameter to control the types of messages that Oracle GoldenGate sends to the system logs on a Windows or UNIX system. You can:

  • include all Oracle GoldenGate messages

  • suppress all Oracle GoldenGate messages

  • filter to include information, warning, or error messages, or any combination of those types

You can use SYSLOG as a GLOBALS or Manager parameter, or both. For more information, see Reference for Oracle GoldenGate for Windows and UNIX.

18.10 Reconciling Time Differences

To account for time differences between source and target systems, use the TCPSOURCETIMER parameter in the Extract parameter file. This parameter adjusts the timestamps of replicated records for reporting purposes, making it easier to interpret synchronization lag. For more information, see Reference for Oracle GoldenGate for Windows and UNIX.

18.11 Sending Event Messages to a NonStop System

Event messages created by Collector and Replicat processes on a Windows or UNIX system can be captured and sent to EMS on NonStop systems. This feature enables centralized viewing of Oracle GoldenGate messages across platforms. To use this feature, there are two procedures:

  • Run the EMS client on the Windows or UNIX system

  • Start a Collector process on the NonStop system

18.11.1 Running EMSCLNT on a Windows or UNIX System

The EMSCLNT utility captures Oracle GoldenGate event messages that originate on a Windows or UNIX system and sends them to a TCP/IP Collector process on the NonStop system. EMSCLNT reads a designated error log and runs indefinitely, waiting for more messages to send.

Run emsclnt from the Oracle GoldenGate directory on the Windows or UNIX system, using the following syntax:

emsclnt -h host_name | IP_address
-p port_number
-f file_name
-c Collector

Where:

  • -h host_name | IP_address is either the name or IP address of the NonStop server to which EMS messages will be sent.

  • -p port_number is the port number of the NonStop Collector process.

  • -f file_name is the name of the local file from which to distribute error messages. Use the full path name if the file resides somewhere other than the Oracle GoldenGate directory.

  • -c Collector is the EMS Collector for this client.

The following Windows example, executed from the DOS prompt, reads the file D:\ogg\ggserr.log for error messages. Error messages are sent to the Collector on NonStop host myhost.us.example.com listening on port 9876. The Collector process on NonStop writes formatted messages to EMS Collector $0.

Example 18-1 Reading the Log File for Error Messages (WIndows)

> emsclnt –h myhost.us.example.com –p 9876 –f d:\ogg\ggserr.log –c $0

The following UNIX example reads the file ggserr.log for error messages. Error messages are sent to the Collector on the NonStop server at IP address 10.0.0.0 listening on port 7850. The Collector on NonStop writes formatted messages to ems Collector $0.

Example 18-2 Reading the Log File for Error Messages (UNIX)

emsclnt –h 10.0.0.0 –p 7850 –f ggserr.log –c '$0'

Note:

Because the dollar sign on UNIX denotes a variable, the $0 must be within single quotes.

18.11.2 Running the Collector on NonStop

The Collector on the NonStop system (called Server-Collector on that platform) collects and distributes the ems messages. To start the Collector, run the server program. For each EMSCLNT process that you will be running on a Windows or UNIX system, start one server process on the NonStop system.

For example, the following runs server and outputs messages to $DATA1.GGSERRS.SERVLOG.

> ASSIGN STDERR, $DATA1.GGSERRS.SERVLOG
> RUN SERVER /NOWAIT/ –p 7880

See Administering Oracle GoldenGate for Mainframe for HP NonStop Guardian for more information about running Collector on NonStop.

18.12 Getting Help with Performance Tuning

See Chapter 19, "Tuning the Performance of Oracle GoldenGate" for help with tuning the performance of Oracle GoldenGate.