1 Introduction to Oracle Data Integrator

Learn what you can do with Oracle Data Integrator and understand the important concepts before you get started.

About Oracle Data Integrator

A brief overview of Oracle Data Integrator.

Oracle Data Integrator (ODI) provides a fully unified solution for building, deploying, and managing complex data warehouses or as part of data- centric architectures in a SOA or business intelligence environment. In addition, it combines all the elements of data integration - data movement, data synchronization, data quality, data management, and data services - to ensure that information is timely, accurate, and consistent across complex systems.

Traditional ETL (Extract, Transform, Load) tools operate by, first, extracting the data from various sources, then, transforming the data in a proprietary, middle-tier ETL engine that is used as the staging area, and finally loading the transformed data into the target data warehouse, integration server, or Hadoop cluster. Hence the term ETL represents both the names and the order of the operations performed. ODI extracts the data from the various sources, and then loads it to the target where it's transformed. This is known as E-LT: Extract, Load, Transform.

ODI features an active integration platform that includes all styles of data integration: data-based, event-based and service-based. ODI transforms large volumes of data efficiently, processing events in real time through its advanced Changed Data Capture (CDC) framework, and providing data services to the Oracle SOA Suite. It also provides robust data integrity control features, assuring the consistency and correctness of data. With powerful core differentiators - heterogeneous E-LT, Declarative Design and Knowledge Modules - ODI meets the performance, flexibility, productivity, modularity and hot-pluggability requirements of an integration platform.

See Overview of Oracle Data Integrator in Understanding Oracle Data Integrator.

Oracle Data Integrator Repositories

An overview of Oracle Data Integrator Repositories.

The Oracle Data Integrator (ODI) Repository is composed of a master repository and at least one work repository. Objects developed or configured through the users are stored in one of these repository types. It's installed on an OLTP relational database. There's usually only one master repository that stores the Security information, Topology information, and versioned and archived objects. The work repository is the one that contains actual developed objects. Several work repositories may coexist in the same ODI installation (for example, to have separate environments or to match a particular versioning life cycle). A work repository stores information for Models, Projects, and Scenario execution. When the work repository contains only the execution information (typically for production purposes), it's called an execution repository.

Create Repositories and Connections for Oracle Data Integrator

Oracle Data Integrator Navigators

An overview of Oracle Data Integrator Navigators.

ODI Studio provides four Navigators for managing the different aspects and steps of an ODI integration project, Topology Navigator, Designer Navigator, Operator Navigator, and Security Navigator.

The Topology Navigator is used to manage the data describing the information system's physical and logical architecture. Through it you can manage the topology of your information system, the technologies and their data types, the data servers linked to these technologies and the schemas they contain, the contexts, the language and the agents, as well as the repositories. The site, machine, and data server descriptions enable Oracle Data Integrator to execute the same mappings in different environments.

The Designer Navigator is used to design data integrity checks and to build transformations, such as automatic reverse-engineering of existing applications or databases, graphical development and maintenance of transformations and mappings, visualization of data flows in the mappings, automatic documentation generation, and customization of the generated code. The main objects handled through it are Models and Projects.

The Operator Navigator is the production management and monitoring tool. It's designed for IT production operators. Through it you can manage your executions in the sessions, as well as the scenarios in production.

The Security Navigator is the tool for managing the security information in ODI. Through it you can create users and profiles and assign user rights for methods on generic objects, and fine-tune these rights on the object instances.

The Oracle Data Integrator User Interface

Oracle Data Integrator Run-Time Agent

An overview of Oracle Data Integrator's Run-Time Agent.

At design time, developers generate scenarios from the business rules that they have designed. The code of these scenarios is then retrieved from the repository by the Run-Time Agent. This agent then connects to the data servers and orchestrates the code execution on these servers. It retrieves the return codes and messages for the execution, as well as additional logging information - such as the number of processed records, and execution time - in the repository.

The Agent comes in three different flavors, standalone Agent, standalone Colocated Agent, and Java EE Agent. The standalone agent is more appropriate when you need to use a resource that is local to one of your data servers, and you do not want to install a Java EE application server on this machine. Standalone colocated agents can be installed on any server with a Java Virtual Machine installed, but require a connection to the WebLogic Administration Server. This type of agent is more appropriate when you need to use a resource that is local to one of your data servers but you want to centralize management of all applications in an enterprise application server. A Java EE agent is deployed as a web application in a Java EE application server, and can benefit from all the features of the application server. This type of agent is more appropriate when there is a need for centralizing the deployment and management of all applications in an enterprise application server, or when you have requirements for high availability.

Oracle Data Integrator Agents

Oracle Data Integrator Data Servers

Learn about Data Servers in Oracle Data Integrator.

The physical components that store and expose structured data in Oracle Data Integrator (ODI) are defined as data servers. Each data server is always linked to a single technology. It stores information according to a specific technical logic, which is declared in the physical schemas attached to it. Every database server, JMS message file, group of flat files, or other form of data source or target, that is used in ODI, must be declared as a data server. Similarly, every schema, database, JMS Topic, and so forth used in ODI, must be declared as a physical schema.

Data Servers

Oracle Data Integrator Models

An overview of Oracle Data Integrator models.

A model is the description of a set of datastores. It corresponds to a group of tabular data structures stored in a data server. A model is based on a Logical Schema defined in the topology. In a given Context, this Logical Schema is mapped to a physical schema. The data schema of this physical schema contains physical data structure: tables, files, JMS messages, elements from an XML file, that are represented as datastores. Models, as well as all their components, are based on the relational paradigm (table, attributes, keys, and so on). Models in Oracle Data Integrator only contain metadata - that is, the description of the data structures. They do not contain a copy of the actual data.

To automatically populate datastores into the model, you reverse-engineer the model. A standard reverse-engineering uses the capacities of the JDBC driver used to connect the data server to retrieve the model metadata. A customized reverse-engineering uses a reverse-engineering Knowledge Module (RKM), to retrieve metadata for a specific type of technology and create the corresponding datastore definition in the data model.


Oracle Data Integrator Datastores

An overview of Oracle Data Integrator Datastores.

A datastore is a data structure that can be used as a source or a target in a mapping. It can be a table stored in a relational database, a Hive table in a Hadoop cluster, an ASCII or EBCDIC file (delimited, or fixed length), a node from a XML file, a JMS topic or queue from a Message Oriented Middleware, a node from a enterprise directory, or an API that returns data in the form of an array of records. Regardless of the underlying technology, all data sources appear in Oracle Data Integrator in the form of datastores that can be manipulated and integrated in the same way. The datastores are grouped into data models.

Data Stores

Oracle Data Integrator Data Flow

An introduction to data flow in Oracle Data Integrator.

Business rules defined in the mapping are automatically converted into a data flow that carries out the joins, filters, mappings, and constraints from source data to target tables.

By default, Oracle Data Integrator (ODI) uses the Target RDBMS as a staging area for loading source data into temporary tables and applying all the required mappings, staging filters, joins and constraints. The staging area is a separate area in the RDBMS (a user/database) where ODI creates its temporary objects and executes some of the rules. When performing the operations this way, ODI uses an E-LT strategy as it first extracts and loads the temporary tables and then finishes the transformations in the target RDBMS.

In cases when source volumes are small (less than 500,000 records), the staging area can be located in ODI's In-Memory Engine - its own in-memory relational database. ODI then behaves like a traditional ETL tool.

Data Flow

Oracle Data Integrator Integration Projects

An introduction to integration projects in Oracle Data Integrator.

An integration project is composed of several types of components. These components include organizational objects, such as folders, procedures, and packages, and development objects such as mappings, sequences, and variables. A project's components can be defined in the project and/or global components referenced by the project. Also, a project can use components defined in the models and the topology.

The package is a large unit of execution in ODI. A package is made up of a sequence of steps organized into an execution diagram. Each step can either succeed or fail its execution. Depending on the execution result (success or failure), a step can branch to another step.

When a package, mapping, procedure, or variable component has been fully developed, it's compiled in a scenario. A scenario is the execution unit for production environments. Scenarios can be scheduled for automated execution.




Oracle Data Integrator Mappings

An overview of Oracle Data Integrator Mappings.

A mapping connects sources to targets through a flow of components such as Join, Filter, Aggregate, Set, Split, and so on. A mapping references the Knowledge Modules that will be used to generate the integration process. A mapping is made up of the source and target datastores. Optionally, you can use datasets within mappings as data sources. A dataset is a logical container organizing datastores by an entity relationship declared as joins and filters, rather than the flow mechanism used elsewhere in mappings.

Reusable mappings are modular, encapsulated flows of components which you can save and re-use. You can place a reusable mapping inside another mapping, or another reusable mapping (reusable mappings can be nested). A reusable mapping can also include datastores as sources and targets, like other mapping components.


Oracle Data Integrator Knowledge Modules

An overview of Oracle Data Integrator Knowledge Modules.

Knowledge Modules (KM) implement "how" the integration processes occur. There are six Knowledge Module types, each referring to a specific integration task:
  • Reverse-engineering metadata from the heterogeneous systems for Oracle Data Integrator (RKM). See Creating and Using Data Models and Datastores in Developing Integration Projects with Oracle Data Integrator.
  • Journalizing Knowledge Modules handling Changed Data Capture (CDC) on a given system (JKM). See Using Journalizing in Developing Integration Projects with Oracle Data Integrator.
  • Integrating Knowledge Modules (IKM) are used in mappings, to integrate data in a target system, using specific strategies (insert/update, slowly changing dimensions). See Creating and Using Mappings in Developing Integration Projects with Oracle Data Integrator.
  • Controlling Knowledge Modules, control data integrity on the data flow (CKM) in a data model's static check and mappings flow checks. See Creating and Using Data Models and Datastores in Developing Integration Projects with Oracle Data Integrator.
  • Loading Knowledge Modules (LKM) load data from one system to another, using system-optimized methods. These KMs are used in mappings. See Creating and Using Mappings in Developing Integration Projects with Oracle Data Integrator.
  • Service Knowledge Modules (SKM) expose data in the form of web services. See Creating and Using Data Services in Administering Oracle Data Integrator.

A Knowledge Module is a code template for a given integration task. At run time, Oracle Data Integrator sends this code for execution to the source and target systems. Knowledge Modules are also fully extensible. Their code is open and can be edited through a graphical user interface.

Knowledge Modules

Oracle Data Integrator Scenarios

Learn about Oracle Data Integrator Scenarios.

In Oracle Data Integrator (ODI), you can generate scenarios for packages, procedures, mappings, and variables. Scenarios are saved to the development work repository, which you can export to other development or execution repositories. You can run them from ODI Studio, or from the command line.


Oracle Data Integrator Load Plans

An introduction to load plans in Oracle Data Integrator.

When Oracle Data Integrator is used to populate very large data warehouses, it's common to have thousands of tables populated by using hundreds of scenarios. The execution of these scenarios must be organized in such a way that data throughput from the sources to the target is the most efficient within the batch window. Load plans help to organize the execution of scenarios in a hierarchy of sequential and parallel steps for these types of use cases. They provide features for parallel, sequential, and conditional scenario execution, restartability, and exception handling. Load plans can be created and modified in production environments.

Load Plans

Oracle Data Integrator Console

An overview of the Oracle Data Integrator Console.

Oracle Data Integrator Console allows you to manage and monitor Oracle Data Integrator run-time architecture, and browse run-time objects in a web-based console. It integrates with Oracle Fusion Middleware Console allowing Fusion Middeware administrators access to details of the Oracle Data Integrator sessions and components.