This chapter contains the following topics:
As a company evolves, it becomes increasingly important for it to be able to share information among multiple databases and applications. Companies need to share OLTP updates, database events, and application messages, as customers place orders online, through the sales force, or even with a partner. This information must be routed to a variety of destinations including heterogeneous replicated databases, message queuing systems, data warehouse staging areas, operational data stores, other applications, and a standby database.
There are three basic approaches to sharing information. You can consolidate the information into a single database, which eliminates the need for further integration. You can leave information distributed, and provide tools to federate that information, making it appear to be in a single virtual database. Or, you can share information, which lets you maintain the information in multiple data stores and applications. This chapter focuses on federating and sharing information.
See Also:
Chapter 16, "Business Intelligence" for more information on features to consolidate informationOracle provides distributed SQL for federating distributed information. Distributed SQL synchronously accesses and updates data distributed among multiple databases, while maintaining location transparency and data integrity.
Oracle Streams is the asynchronous information sharing infrastructure in the Oracle database. Oracle Streams can mine the Oracle redo logs to capture DML and DDL changes to Oracle data, and it makes that changed data available to other applications and databases. Thus, Oracle Streams can provide an extremely flexible asynchronous replication solution, as well as an event notification framework. Because Streams supports applications explicitly enqueuing and dequeuing messages, it also provides a complete asynchronous messaging solution. That solution, Oracle Streams Advanced Queuing, can be used to exchange information with customers, partners, and suppliers, and to coordinate business processes.Both Streams and distributed SQL can access and update data in non-Oracle systems using Oracle Transparent Gateways, Generic Connectivity, and the Messaging Gateway. Oracle can work with non-Oracle data sources, non-Oracle message queuing systems, and non-SQL applications, ensuring interoperability with other vendor's products and technologies. Each of the solutions are described in detail in the following sections.
A distributed environment is a network of disparate systems that seamlessly communicate with each other. Each system in the distributed environment is called a node. The system to which a user is directly connected is called the local system. Any additional systems accessed by this user are called remote systems. A distributed environment allows applications to access and exchange data from the local and remote systems. All the data can be simultaneously accessed and modified.
While a distributed environment enables increased access to a large amount of data across a network, it must also hide the location of the data and the complexity of accessing it across the network.
In order for a company to operate successfully in a distributed environment, it must be able to do the following:
Exchange data between Oracle databases
Communicate between applications
Exchange information with customers, partners, and suppliers
Replicate data between databases
Communicate with non-Oracle databases
A homogeneous distributed database system is a network of two or more Oracle databases that reside on one or more computers.
Distributed SQL enables applications and users to simultaneously access or modify the data in several databases as easily as they access or modify a single database.
An Oracle distributed database system can be transparent to users, making it appear as though it is a single Oracle database. Companies can use this distributed SQL feature to make all its Oracle databases look like one and thus reduce some of the complexity of the distributed system.
Oracle uses database links to enable users on one database to access objects in a remote database. A local user can access a link to a remote database without having to be a user on the remote database.
See Also:
Oracle Database Administrator's Guide for more information on database linksAn Oracle distributed database system lets application developers and administrators hide the physical location of database objects from applications and users. Location transparency exists when a user can universally refer to a database object, such as a table, regardless of the node to which an application connects. Location transparency has several benefits, including the following:
Access to remote data is simple, because database users do not need to know the physical location of database objects.
Administrators can move database objects with no impact on users or existing database applications. Typically, administrators and developers use synonyms to establish location transparency for the tables and supporting objects in an application schema.
In addition to synonyms, developers can use views and stored procedures to establish location transparency for applications that work in a distributed database system.
See Also:
Chapter 5, "Schema Objects" for more information on synonyms and views
Chapter 24, "SQL, PL/SQL, and Java" for more information on stored procedures
Oracle's distributed database architecture also provides query, update, and transaction transparency. For example, standard SQL statements such as SELECT, INSERT, UPDATE, and DELETE work just as they do in a non-distributed database environment. Additionally, applications control transactions using the standard SQL statements COMMIT, SAVEPOINT, and ROLLBACK.
Unlike a transaction on a local database, a distributed transaction involves altering data on multiple databases. Consequently, distributed transaction processing is more complicated, because Oracle must coordinate the committing or undo of the changes in a transaction as a self-contained unit. In other words, the entire transaction commits, or the entire transaction rolls back.
Oracle ensures the integrity of data in a distributed transaction using the two-phase commit mechanism. In the prepare phase, the initiating node in the transaction tasks the other participating nodes to promise to commit or undo the transaction. During the commit phase, the initiating node asks all participating nodes to commit the transaction. If this outcome is not possible, then all nodes undo. The two-phase commit mechanism is completely transparent, requiring no complex programming or other special operations to provide distributed transaction control.
See Also:
"The Two-Phase Commit Mechanism"Distributed query optimization reduces the amount of data transfer required between sites when a transaction retrieves data from remote tables referenced in a distributed SQL statement. Distributed query optimization uses Oracle's optimizer to find or generate SQL expressions that extract only the necessary data from remote tables, process that data at a remote site (or sometimes at the local site) and send the results to the local site for final processing.
This operation reduces the amount of required data transfer when compared to the time it takes to transfer all the table data to the local site for processing. Using various optimizer hints, such as DRIVING_SITE, NO_MERGE, and INDEX, you can control where Oracle processes the data and how it accesses the data.
See Also:
Oracle Database Performance Tuning Guide for more information on the optimizer and hintsAt the heart of any integration is the sharing of data among various applications in the enterprise.
Replication is the maintenance of database objects in two or more databases. It provides a solution to the scalability, availability, and performance issues facing many companies. For example, replication can improve the performance of a company's Web site. By locally replicating remote tables that are frequently queried by local users, such as the inventory table, the amount of data going across the network is greatly reduced. By having local users access the local copies instead of one central copy, the distributed database does not need to send information across a network repeatedly, thus helping to maximize the performance of the database application. Oracle Streams provides powerful replication features that can be used to keep multiple copies of distributed objects synchronized.
Many companies have developed a variety of autonomous and distributed applications to automate business processes and manage business tasks. However, these applications need to communicate with each other, coordinating business processes and tasks in a consistent manner. They also need to exchange information efficiently with customers, partners, and suppliers over low-cost channels such as the Internet, while preserving a traceable history of events—a requirement previously satisfied through now obsolete paper forms.For loose application coupling, Oracle offers Oracle Streams Advanced Queuing, which is built on top of the flexible Oracle Streams infrastructure. Oracle Streams Advanced Queuing provides a unified framework for processing events.Events generated in applications, in workflow, or implicitly captured from redo logs or in database triggers can be captured and staged in a queue. These events can be consumed in a variety of ways. They can be applied automatically with a user-defined function or database table operation, or they can be dequeued explicitly. Also, notifications can be sent to the consuming application. These events can be transformed at any stage. If the consuming application is on a different database, then the events can be propagated to the appropriate database automatically. Operations on these events can be automatically audited, and the history can be retained for the user-specified duration.
Oracle Streams enables the propagation and management of data, transactions, and events in a data stream either within a database, or from one database to another. The stream routes published information to subscribed destinations. As users' needs change, they can implement a new capability of Oracle Streams, without sacrificing existing capabilities.
Oracle Streams provides a set of elements that allows users to control what information is put into a stream, how the stream flows or is routed from node to node, what happens to events in the stream as they flow into each node, and how the stream terminates. By specifying the configuration of the elements acting on the stream, a user can address specific requirements, such as message queuing or data replication.
Oracle Streams satisfies the information sharing requirements for a variety of usage scenarios. Oracle Streams Advanced Queuing provides the database-integrated message queuing and event management capabilities. In addition, Oracle includes tools to help users build event notification, replication and data warehouse loading solutions using Oracle Streams. Using the full power of Oracle Streams, you can create configurations that span multiple use cases, enabling new classes of applications. Most deployments and their associated metadata are compatible. For example, a system configured to load a data warehouse easily can be extended to enable bi-directional replication. A complete reconfiguration is not required.
The architecture of Oracle Streams is very flexible. As shown in Figure 23-1, Streams contains three basic elements.
Capture
Staging
Consumption
Oracle Streams can capture events implicitly and explicitly and place these events in a staging area. Database events, such as DML and DDL changes, can be implicitly captured by mining the redo log files. Rules determine what events are captured. Information representing a captured event is formatted as a logical change record (LCR) and placed in the staging area.
Oracle Streams supports mining the online redo log, as well as mining archived log files. In the case of online redo log mining, redo information is mined for change data at the same time it is written, reducing the latency of capture.
User applications explicitly can enqueue messages representing events into the staging area. These messages can be formatted as LCRs, which will allow them to be consumed by the apply engine, or they can be formatted for consumption by another user application using an explicit dequeue.
Once captured, events are placed in a staging area. The staging area is a queue that stores and manages captured events. LCRs and other types of events are stored in a staging area until subscribers consume them. LCR staging provides a holding area with security, as well as auditing and tracking of LCR data.
Subscribers examine the contents of the staging area and determine whether they have an interest in the message representing that event. A subscriber can either be a user application, a propagation to another staging area, usually on another system, or an apply process. The subscriber optionally can evaluate a set of rules to determine whether the message meets the criteria set forth in the subscription. If so, then the message will be consumed by the subscriber.
If the subscriber is a user application, then the application will dequeue the message from the staging area in order to consume the message. If the subscriber is a propagation to another staging area, then the message will be propagated to that staging area. If the subscriber is an apply process, then it will be dequeued and consumed by the apply process.
Events in the staging area optionally may be propagated to other staging areas in the same database, or to staging areas in remote databases. To simplify network routing and reduce network traffic, events need not be sent to all databases and applications. Rather, they can be directed through staging areas on one or more systems until they reach the subscribing system. Not all systems need subscribe to the events, providing flexibility regarding what events are applied at a particular system. A single staging area can stage events from multiple databases, simplifying setup and configuration.
As events enter the staging area, are propagated, or exit the staging area, they can be transformed. A transformation is a change in the form of an object participating in capture and apply or a change in the data it holds. Transformations can include changing the datatype representation of a particular column in a table at a particular database, adding a column to a table at one database only, or including a subset of the data in a table at a particular database.
Messages in a staging area are consumed by the apply engine, where the changes they represent are applied to a database, or they are consumed by an application. An Oracle Streams apply process is flexible. It enables standard or custom apply of events. A custom apply can manipulate the data or perform other actions during apply. Support for explicit dequeue allows application developers to use Oracle Streams to reliably exchange messages. They also can notify applications of changes to data, by leveraging the change capture and propagation features of Oracle Streams.
Oracle Streams is an information sharing technology that automatically determines what information is relevant and shares that information with those who need it. This active sharing of information includes capturing and managing events in the database including DML and DDL changes and propagating those events to other databases and applications. Data changes can be applied directly to the replica database or can call a user-defined procedure to perform alternative work at the destination database. For example, such a procedure can populate a staging table used to load a data warehouse.
The basic elements of the Oracle Streams technology used in replication environments include the following:
Configuring Streams for replication begins with specifying an object or set of objects to be replicated. Using the implicit capture mechanism of Oracle Streams, changes made to these objects can be captured efficiently and replicated to one or more remote systems with little impact to the originating system. This capture mechanism can extract both data changes (DML) and structure changes (DDL) from the redo log. The captured changes are published to a staging area. Log-based capture leverages the fact that changes made to tables are logged in the redo log to guarantee recoverability in the event of a malfunction or media failure.
Capturing changes directly from the redo log minimizes the overhead on the system. Oracle can read, analyze, and interpret redo information, which contains information about the history of activity on a database. Oracle Streams can mine the information and deliver change data to the capture process.
Replicated databases utilizing Oracle Streams technology need not be identical. Participating databases can maintain different data structures using Streams to transform the data into the appropriate format. Streams provides the ability to transform the stream at multiple points: during change capture at the source database, while propagating to another database, or during application at the destination site. These transformations are user-defined functions registered within the Oracle Streams framework. For example, the transformation can be used to change the datatype representation of a particular column in a table or to change the name of a column in a table or change a table name.
The data at each site can be subsetted based on content as well. For example, the replica can use a rule which specifies that only the employees for a particular division based on the department identifier column be contained within the table. Oracle Streams automatically manages the changes to ensure that the data within the replica matches the subset rule criteria.
Events in a staging area can be sent to staging areas in other databases. The directed network capability of Streams allows changes to be directed through intermediate databases as a pass-through. Changes at any database can be published and propagated to or through other databases anywhere on the network. By using the rules-based publish and subscribe capabilities of the staging area queues, database administrators can choose which changes are propagated to each destination database and can specify the route messages traverse on their way to a destination.
Thus, for example, a company could configure replication to capture all changes to a particular schema, propagate only changes to European customers to their European headquarters in London, apply only those changes relevant to the London office, and forward site-specific information to be applied at each field office.
This directed network approach is also friendly to Wide Area Networks (WAN), enabling changes to subsequent destinations to traverse the network once to a single site for later fan-out to other destinations, rather than sending to each destination directly.
Messages in a staging area can be consumed by an apply process, where the changes they represent are applied to database objects, or they can be consumed by an application. User-defined apply procedures enable total control over the events to be applied.
Using custom apply, separate procedures can be defined for handling each type of DML operation (inserts, updates, or deletes) on a table. For example, using this custom apply capability, a user could write a procedure to skip the apply of all deletes for the employees table, for employees with a salary greater than $100,000, based on a value for the employee in the salary table. Inserts and updates to the employees table would continue to be applied using the default apply engine, as would deletes for employees with salaries less than $100,000.
Custom apply also could be used to perform custom transformations of data. For example, changes to one table at the originating site might need to be applied to three different tables at the remote location.
The remote databases in a replication environment are fully open for read/write, and need not be identical copies of the source database. Because the remote database can be updated by other means, an apply process detects conflicts before changes are applied. These conflicts also can be automatically resolved using built-in or custom resolution mechanisms.
Beyond database integration, Oracle Streams Advanced Queuing provides many features that make it the most robust and feature rich message queuing system. These features improve developer productivity and reduce the operational burden on administrators, which reduces the cost of building and maintaining Oracle-based distributed applications. These features are described in the following sections.
Oracle Streams Advanced Queuing provides asynchronous integration of distributed applications. It offers several ways to enqueue messages. A capture process can capture the messages from redo logs implicitly, or applications and users can capture messages explicitly.Messages can be enqueued with delay and expiration. Delay allows an enqueued message to be visible at a later date. Advanced Queuing also supports several ways to order messages before consumption. It supports first-in first-out ordering and priority-based ordering of messages.Advanced Queuing also offers multiple ways to consume a message. Automatic apply lets users invoke a user-specified action for the message. Consuming applications can dequeue a message explicitly. Both blocking and nonblocking dequeue is supported. The consuming applications can choose to receive notifications either procedurally using PL/SQL, OCI, or Java callback functions. Alternatively, they can get notifications in an e-mail or by HTTP post. Consuming applications can also choose to perform automatic apply.
Oracle Streams Advanced Queuing offers an extensible framework for developing and integrating distributed applications. Many applications are integrated with a distributed hub and spoke model with the Oracle database server as the hub.
The distributed applications on an Oracle database communicate with queues in the same Oracle database server hub. Oracle's extensible framework lets multiple applications share the same queue, eliminating the need to add additional queues to support additional applications.
Also, Advanced Queuing supports multiconsumer queues, where a single message can be consumed by multiple applications. As additional applications are added, these applications can coordinate business transactions using the same queues and even the same messages in the Oracle database server hub. It offers the benefits of extensibility without losing guaranteed once and only once delivery of a message.
Advanced Queuing supports a content-based publish and subscribe model, where applications publish messages and consumers subscribe to the messages without knowledge of the publishing application. With such a model, it is possible to add consuming applications to a hub with no change required to existing applications.
If the distributed applications are running on different Oracle databases, then business communications can be automatically propagated to the appropriate Oracle database. The propagation is managed automatically by the Oracle Streams Advanced Queuing system and is transparent to the application.
Traditionally, different applications had to use a common data model for communication. This data model was further restricted by the limited datatype support of the message-oriented middleware. Oracle Streams Advanced Queuing supports AnyData queues that can store messages of multiple datatypes.
Advanced Queuing provides applications with the full power of the Oracle type system. It includes support for scalar datatypes such as number, date, varchar, and so on, Oracle object types with inheritance, XMLType with additional operators for XML data, and AnyData support. In particular, with XMLType type support, application developers can make use of the full power of XML for extensibility and flexibility in business communications.
Oracle Streams Advanced Queuing also offers transformation capabilities. Applications with different data models can transform the messages while dequeuing or enqueuing the messages to or from their own data model. These transformation mappings are defined as SQL expressions, which can involve PL/SQL functions, Java functions, or external C callouts.
The Oracle Messaging Gateway integrates Oracle database applications with other message queuing systems, such as Websphere MQ (formerly called MQ Series) and Tibco. Because many legacy applications on mainframes communicate with Websphere MQ, there is a need for integrating these applications into an Oracle environment. The message gateway makes non-Oracle message queues appear as if they were Oracle Streams queues, and automatically propagates messages between Oracle Streams queues and Websphere MQ or Tibco queues.
Distributed applications spanning multiple partners can coordinate using the Internet access features of Oracle Streams Advanced Queuing. Using these features, a business partner or application securely can place an order into an advanced queuing queue over the Internet. Only authorized and authenticated business partners can perform these operations.
Advanced Queuing Internet operations utilize an XML-based protocol over Internet transports, such as HTTP(S), allowing messages to flow through firewalls without compromising security. Supporting the Internet for communications drastically reduces the cost of communications, and thus the cost of the entire solution.
Oracle Streams Advanced Queuing supports industry-standard APIs: SQL, JMS, and SOAP. Database changes made using SQL are captured automatically as messages.
Similarly, the distributed messages and database changes can be applied to database tables, which can be seen using SQL. The messages can be enqueued and dequeued using industry-standard JMS. Advanced queuing also has a SOAP-based XML API and supports OCI and OCCI to enqueue and dequeue messages.
Client applications can receive notifications when the result set of a registered query changes. For example, if the client registers a query of the hr.employees table, and if a user adds an employee, then the application can receive a database change notification when a new row is added to the table. A new query of hr.employees returns the changed result set. Database Change Notification is relevant in many development contexts, but is particularly useful to mid-tier applications that rely on cached data.
See Also:
Oracle Database Application Developer's Guide - Fundamentals for more information on using Database Change NotificationChange Data Capture, a feature built on the Oracle Streams infrastructure, efficiently identifies and captures data that has been added to, updated, or removed from Oracle relational tables, and it makes the change data available for use by ETL tools and applications. Using the Change Data Capture capabilities of Oracle Streams, it quickly identifies and processes only the data that has changed, not entire tables.
Oracle Streams is an open information sharing solution, supporting heterogeneous replication between Oracle and non-Oracle systems. Using a transparent gateway, DML changes initiated at Oracle databases can be applied to non-Oracle databases.
To implement capture and apply of DML changes from an Oracle source to a non-Oracle destination, an Oracle system functions as a proxy and runs the apply process that would normally be running at an Oracle destination site. The Oracle system then communicates with the non-Oracle system with a transparent gateway.
The changes are dequeued in an Oracle database itself and the local apply process applies the changes to a non-Oracle system across a network connection through a gateway.
Users who want to propagate changes from a non-Oracle database to an Oracle database write an application to capture the changes made to the non-Oracle database. The application can capture the changes by reading from transaction logs or by using triggers. The application is then responsible for assembling and ordering these changes into transactions, converting them into the Oracle defined logical change record (LCR) format, and publishing them into the target Oracle database staging area. These changes can be applied with a Streams apply process.
See Also:
"Oracle Transparent Gateways"Use Oracle Streams to create configurations that enable new classes of applications. In addition, all deployments and their associated metadata are compatible. For example, a replication installation easily can be extended to load a data warehouse or enable bi-directional replication—a complete reconfiguration is not required.
Suppose that a company uses Oracle Streams to maintain multiple copies of a corporate Web site for improved availability, scalability, and performance. Additional requirements could include a reporting database containing the most current information for analysts in a company headquarters office in New York to perform ad-hoc querying as well as a disaster recovery database separately maintained from their New York office. Additionally, updatable materialized views can be used to support the field sales staff. A final requirement is to share data with existing applications that are hosted on a Sybase database.
Figure 23-2 illustrates this Streams configuration.
Oracle Streams is used to replicate data in an N-way configuration consisting of three regional sites: New York, London, and Tokyo. At each of these sites, Streams log-based capture collects any changes that occur for subscribed tables in each local region, and stages them locally in the queue. All changes captured in each region are then forwarded to each of the other region's databases with the goal that all changes made at each database be reflected at every other database, providing complete data for the subscribed objects throughout the world.
Because the updates are applied automatically when received at each regional database, an Oracle Streams apply process is used to apply the changes. As changes are applied, Oracle Streams checks for and resolves any conflicts that are detected. Streams also can be used to exchange data for particular tables with non-Oracle databases. Utilizing the Oracle Transparent Gateway for Sybase, a Streams apply process applies the changes to a Sybase database using the same mechanisms as it does for Oracle databases.
The databases for reporting and disaster recovery are hosted from the New York database site. The reporting database is a fully functional Oracle database that has a read-only copy of the relevant application tables. The reporting site is not configured to capture changes on these application tables. Streams imposes no restrictions on the configuration or usage of this reporting database.
The London site also serves as the master site for several updatable materialized view sites. Each salesperson receives an updatable copy of just the portion of the data that he requires. These sites typically only connect once a day to upload their orders and download any changes since their last refresh.
Oracle Streams is fully inter-operational with materialized views, or snapshots, which can be used to maintain updatable or read-only, point-in-time copies of data. They can be defined to contain a full copy of a table or a defined subset of the rows in the master table that satisfy a value-based selection criterion. There can be multitier materialized views as well, where one materialized view is based on another materialized view. Materialized views are periodically updated, or refreshed, from their associated master tables through transactionally consistent batch updates.
Read-only materialized views can be used to periodically propagate the updated product catalog to the various sales offices, because the product catalog is only updated at the headquarters location.
Because materialized views do not require a dedicated connection, they are ideal for disconnected computing. For example, a company might choose to use updatable materialized views for the members of their sales force. A salesperson could enter orders into his or her laptop throughout the day, then simply dial up the regional sales office at the end of the day to upload these changes and download any updates.
Oracle provides two solutions for integrating the Oracle database server with non-Oracle databases--Generic Connectivity and Transparent Gateways. These solutions enable Oracle clients to access non-Oracle data stores. They translate third party SQL dialects, data dictionaries, and datatypes into Oracle formats, thus making the non-Oracle data store appear as a remote Oracle database. These technologies enable companies to integrate seamlessly the different systems and provide a consolidated view of the company as a whole.
Generic Connectivity and Oracle Transparent Gateways can be used for synchronous access, using distributed SQL. In addition, Oracle Transparent Gateways can be used for asynchronous access, using Oracle Streams. Introducing a Transparent Gateway into an Oracle Streams environment enables replication of data from an Oracle database to a non-Oracle database.
Both Generic Connectivity and Oracle Transparent Gateways transparently access data in non-Oracle systems from an Oracle environment. As with an Oracle distributed database environment, location transparency can be extended to objects residing in non-Oracle systems as well. Therefore, users can create synonyms for the objects in the non-Oracle database and refer to them without having to specify its physical location. This transparency eliminates the need for application developers to customize their applications to access data from different non-Oracle systems, thus decreasing development efforts and increasing the mobility of the application. Instead of requiring applications to interoperate with non-Oracle systems using their native interfaces (which can result in intensive application-side processing), applications can be built upon a consistent Oracle interface for both Oracle and non-Oracle systems.
Generic Connectivity is a generic solution that uses an ODBC or OLEDB driver to access any ODBC or OLEDB compliant non-Oracle system. It provides data access to many data stores for which Oracle does not have a gateway solution. This enables transparent connectivity using industry standards, such as ODBC and OLEDB. Generic connectivity makes it possible to access low-end data stores, such as Foxpro, Access, dBase, and non-relational targets like Excel.
In contrast to Generic Connectivity, which is a generic solution, Oracle Transparent Gateways are tailored solutions, specifically coded for the non-Oracle system. They provide an optimized solution, with more functionality and better performance than Generic Connectivity.
Generic Connectivity relies on industry standards, whereas Oracle Transparent Gateways accesses the non-Oracle systems using their native interface. The Transparent Gateways are also end-to-end certified. Oracle has Transparent Gateways to many sources, including Sybase, DB2, Informix, and Microsoft SQL Server.