1 Introduction

Change data capture (CDC) is an integration technology that enables efficient near real-time integration by capturing changes made to enterprise data sources and providing them for processing in integration middleware, such as Oracle Fusion Middleware.

The Oracle CDC Adapter for SQL Server provides such capabilities for SQL Server 2000 and 2005, capturing changes in a non-intrusive manner leveraging the SQL Server Transaction Logs (TLOG). The CDC Adapter can be used by Oracle SOA Suite (BPEL, ESB), and Oracle Data Integrator (ODI).

This chapter provides an overview of the features and architecture of the OracleAS CDC Adapter for SQL Server. It includes the following topics:

OracleAS CDC Adapter for SQL Server Overview

The OracleAS CDC Adapter for SQL Server captures and delivers changes (such as insert, update, and delete operations) made to tables in SQL Server 2000 and 2005. By using the CDC Adapter, you can build efficient data integration solutions that use SQL Server data, synchronize systems to ensure data integrity, update data warehouses and operational data stores in near real-time, and enable event-driven business activity monitoring and processing.

The OracleAS CDC Adapter for SQL Server has the following capabilities:

  • Non-intrusive change capture: The OracleAS CDC Adapter for SQL Server is a log-based CDC technology. It leverages the SQL Server Transaction Logs (TLOG) to identify changes and does not add triggers or tables into the database.

    The CDC agent accesses a logstream that is polled to access the changes. The agent begins polling from a specific point called the stream position (or agent context). The last position read is saved as the stream position. This next time the agent polls the logstream for information, it starts at the saved stream position.

  • Real-time data capture: The OracleAS CDC Adapter for SQL Server captures changes to data in near-real-time for applications that demand zero latency and require the most up-to-date data. Unlike scheduled CDC solutions, near real-time data capture guarantees that a change event is can be made available in a matter of seconds at the consumer. The term near-real-time data capture is used because after a period of inactivity changes may be available after a few seconds.

  • Change storage: The Oracle CDC Adapter for SQL Server provides a staging area that stores the captured changes in a format that is easy to access and process.The staging area ensures that changes are persisted until they are processed, and provides facilities to clean up old change records.

  • Change access and delivery: The Oracle CDC Adapter for SQL Server includes a JCA Resource Adapter that can be configured in BPEL and ESB as an inbound endpoint, delivering changes in XML format.

  • Filtering. The OracleAS CDC Adapter for SQL Server enables you to filter captured data based on the type of operation (INSERT, UPDATE, DELETE), based on column selection and even based on specific column values.

  • Reliable delivery and recovery: The Oracle CDC Adapter for SQL Server stores changes in a reliable persistent storage and uses the concept of a 'stream position' that indicates the location in the change stream that was successfully moved and processed. By storing stream positions, the CDC Adapter can always restart after failure, recover, and continue moving and delivering changes from wherever it was stopped

  • Ease of use: The OracleAS CDC Adapter for SQL Server provides an easy and intuitive way to confugure CDC solutions using Oracle Studio.

Using the OracleAS CDC Adapter for SQL Server makes ETL (extract, transform, and load) processes more efficient and less disruptive to the operation of the source database systems, eliminating the need for the traditional ETL downtime window

OracleAS Adapter for SQL Server CDC Agent Architecture

The OracleAS CDC Adapter for SQL Server CDC agent uses the normal SQL Server transactin logs (TLOG) and a TLOG miner to extract the necessary data and place it into the staging area where it is available for consumption by BPEL/ESB. The following diagram shows this process:

Figure 1-1 SQL Server CDC Agent Architecture

SQL Server CDC agent architecture

The agent is a part of the overall OracleAS Adapter architecture that is described in OracleAS Adapter for SQL Server CDC Agent Architecture. The following describes the components shown in the diagram:

  • SQL Server: The Microsoft SQL server creates transaction logs that log server activity for recovery purposes. When using Replication, the logs hold information in a format that is consumable by Oracle's CDC agents.

    The active section of the TLOG contains the changes made to the currently active transaction.

    The reusable or inactive section has the information from older transactions, which do not require further processing by MS SQL server. This space is reusable. It is possible to back up the data or the MS SQL Server might truncate the log to create more space. When a TLOG is truncated, some of the data is dropped and is no longer available for the CDC agent to use. The Oracle CDC solution for the MS SQL Server is designed to prevent potential data loss.

  • TLOG miner (LGR): This component is installed as a stand-alone Microsoft Windows service. It reads the TLOG file and extracts or mines the change data and stores it in the Transient Storage area. It has two parts.

    • TLOG Detainer: The Microsoft SQL Server management policy periodically reorganizes the log files when necessary. In this case, the log files are truncated when their data is no longer active. The truncated data is erased from the system and cannot be used. Occasional truncation of the transaction LOG can expose the LGR to potential loss of data. The detainer is used to ensure the TLOG files are not truncated before being read by the TLOG miner.

      Truncation only takes place in the non-active section of the TLOG. The detainer places a detained transaction behind the logged records to be read. This creates a limit for the TLOGs active portion, which protects records from being truncated before they are read.

    • TLOG Parser: The parser parses the TLOG information and then writes it into Transient Storage.

  • Transient Storage: This is a set of binary files located in a specified working directory where the LGR stores the data it mines from the TLOG into Transient Storage. The log records are kept in Transient Storage according to the LGR cleanup policy and the maximum size limit specified at coniguation time.

    The OracleAS CDC Adapter for SQL Server CDC agent reads the changes from Transient Storage, parses it, orders it in committed transaction order, and places it in the staging area where it can be consumed by BPEL and ESB.

    For more information see, Setting Up a Change Data Capture with the OracleAS CDC Adapter for SQL Server.

  • SQL Server CDC agent: This is part of the OracleAS Adapter for SQL Server CDC Agent Architecture.

  • BPEL/ESB: This is part of the OracleAS Adapter for SQL Server CDC Agent Architecture.

  • Staging Area: This is part of the OracleAS Adapter for SQL Server CDC Agent Architecture.

OracleAS CDC Adapter for SQL Server Architecture

The following figure illustrates the system components used for change data capture using the OracleAS CDC Adapter for SQL Server.

Figure 1-2 Component Architecture used with the SQL Server CDC adapter

Architecture of the Oracle CDC Agent for IMS/DB.

The OracleAS CDC Adapter for SQL Server component architecture includes the following components:

  • Database Platform: The database platform is the data source that contains the data to be captured.

  • Database Log: The database log is a log stream that contains the raw change data.

  • Oracle Connect: Oracle Connect runs on the back-end system and handles from the special J2CA adapter that runs on the Oracle Application Server.

  • CDC agent: This is an OracleAS CDC Adapter for SQL Server component that runs under the control of the daemon (a listener process) on the computer where the back-end data is located. The capture agent provides access to change events in the transaction log for the change router.

    The agent is located on the same computer as the changes to be captured. It maintains the last position read in the journal (the stream position or context) and starts at this point the next time it polls the journal. The context is stored in the repository where the adapter definition is stored. The adapter definition includes a single primary interaction which is used to access the appropriate journal and includes the list of tables to monitor for changes.

  • BPEL/ESB: This is where the change events are consumed by means of a JCA provider that sends change records to an end point configured in either ESB or BPEL.

  • JC2A Resource Adapter: A special version of the J2CA 1.5 SQL Server adapter that can read change records from the staging area. It implements inbound endpoints for BPEL and ESB that pulls change events from the staging area and deliver them as XML change documents. Each endpoint handles change records for a single captured table at the source database.

    The J2CA adapter uses multiple connections to the staging area to process changes from multiple tables in parallel.

  • JDBC Driver: Enables access to changes in the staging area using SQL queries.

  • WebLogic / BPEL Components: Any OracleAS component that hosts and gets messages from the JC2A adapter.

  • Staging Area: This is a set of change tables where change records are stored before being sent to BPEL/ESB. Old records from the Staging Area are deleted based upon the specified aging policy.

    The Staging Area contains a DISAM data source for SQL access and a Database Events adapter for the XML access.

    For more information, see The Staging Area.

    The staging area includes the following components:

    • Change Router: This is an OracleAS CDC Adapter for SQL Server service that runs under the control of the daemon (a listener process) on the Oracle platform. The change router reads the change stream from the CDC agent, sorts them in transaction/commit order eliminating rolled-back changes, and distrubutes the changes into the various change tables.

      Only one instance of the change router is active at a time against a captured database. The change router also deletes old changes from the change tables according to a configurable aging policy. policy (48 hours is the default, which means that records that have been in the staging area for more than 48 hours are deleted).

    • Change Tables: The staging area is made of change tables, each containing changes to one captured table in the source database. These change tables maintain change events in transaction order and in occurrence order within a transaction for each captured table. The change tables are DISAM files that are indexed to support fast access to the change data, that can be accessed directly using the stream position.

    • Change Access Service. This is an instance Oracle Connect that is installed on a separate computer or the same computer as BPEL/ESB that allows access to the change tables. This instance of Oracle Connect is configure as the staging area.

  • Oracle Studio: Provides the developer or administrator with a graphical user interface for setting up the CDC Agent and Staging Area,using intuitive wizards. Oracle Studio can be installed on Linux, UNIX, or Windows.

The Staging Area

The staging area is an area used by Oracle Connect to store captured data from a journal. The journal is scanned once and changes for every required table read during that scan are passed to the staging area, where they are stored. This means that the journal is scanned once each time it is polled. After the changes are written to the staging area, processing of these changes is carried out independently of the journal.

The staging area is a benefit when transactions are used. The changed data is not written to the change queue until the transaction is committed. Thus, if the transaction fails, there is no overhead of having to back out any processing done with the steps in the failed transaction.

The staging area can be on any computer with a Windows, Linux, or UNIX operating system that is running the Oracle Application Server and does not need to be run on the same server as the SQL Server CDC adapter. Once the information is extracted from the journal and written to the staging area, processing of changes is carried out on the staging area only. Thus, the staging area should be set up to consider the network configuration and where the consumer application runs.

The staging area maintains the last position read by the consumer application (the staging area context) and starts at this point the next time a request from the consumer application is received. The context is stored in the repository where the staging area is maintained.

The staging area is indexed so that access to the staging area for a specific stream is quick.

The staging area is cleared by default every 48 hours. All events that have been in the staging area for more than 48 hours are deleted.