What's New In Oracle Data Integrator?

This section summarizes the new features and significant product changes for Oracle Data Integrator (ODI) in the Oracle Fusion Middleware 12c release.

This chapter includes the following sections:

  • New and Changed Features for Release 12c (12.1.3.0.1)

  • New and Changed Features for Release 12c (12.1.3)

  • New and Changed Features for Release 12c (12.1.2)

New and Changed Features for Release 12c (12.1.3.0.1)

Oracle Data Integrator 12c (12.1.3.0.1) introduces the following enhancements:

Execution of ODI Mappings using Spark and Pig

ODI allows mappings to be defined through a logical design, which is independent of the implementation language. For Hadoop-based transformations, you can select among Hive, Spark, and Pig as the generated transformation language. This lets you pick the best implementation for the environment and use case; you can also use different implementations simultaneously through multiple physical designs. This selection makes development for Big Data flexible and future-proof.

  • Generate Pig Latin transformations: You can choose Pig Latin as the transformation language and execution engine for ODI mappings. Apache Pig is a platform for analyzing large data sets in Hadoop and uses the high-level language Pig Latin for expressing data analysis programs. Pig transformations can be executed in either Local or MapReduce mode. Custom Pig code can be added through user-defined functions or the table function component.

  • Generate Spark transformations: ODI mappings can also generate PySpark code, which exposes the Spark programming model in the Python language. Apache Spark is a transformation engine for large-scale data processing that provides fast in-memory processing of large data sets. Custom PySpark code can be added through user-defined functions or the table function component.
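
A generated PySpark script typically expresses each mapping component (filter, aggregate, and so on) as a chained transformation. The sketch below mimics that filter-then-aggregate style in plain Python so it runs without a Spark cluster; the record layout and names (orders, region, amount) are invented for illustration and are not taken from ODI.

```python
# Plain-Python analogue of the filter -> aggregate chain a generated
# PySpark script would express with transformations such as
# rdd.filter(...) and rdd.reduceByKey(...). All names are hypothetical.
from collections import defaultdict

orders = [
    {"region": "EMEA", "amount": 120.0},
    {"region": "APAC", "amount": 80.0},
    {"region": "EMEA", "amount": 40.0},
]

# FILTER component: keep only rows matching a condition
emea = [row for row in orders if row["region"] == "EMEA"]

# AGGREGATE component: sum amounts per region
totals = defaultdict(float)
for row in emea:
    totals[row["region"]] += row["amount"]

print(dict(totals))  # {'EMEA': 160.0}
```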

Orchestration of ODI Jobs using Oozie

You can now choose between the traditional ODI Agent or Apache Oozie as the orchestration engine for ODI jobs such as mappings, packages, scenarios, and procedures. Apache Oozie allows a fully native execution on a Hadoop infrastructure without installing an ODI environment for orchestration. You can utilize Oozie tools to schedule, manage, and monitor ODI jobs. ODI uses Oozie's native actions to execute Hadoop processes and conditional branching logic.

Enhanced Hive Driver and Knowledge Modules

ODI includes the WebLogic Hive JDBC driver, which provides a number of advantages over the Apache Hive driver, such as total JDBC compliance and improved performance. All Hive Knowledge Modules have been rewritten to benefit from this new driver. In addition, the Knowledge Modules whose main purpose is to load from a source are now provided as Load Knowledge Modules, enabling them to be combined in a single mapping with other Load Knowledge Modules. A new class of "direct load" Load Knowledge Modules also allows targets to be loaded without intermediate staging. The table function component has been extended to support Hive constructs.

Retrieval of Hadoop Audit Logs

ODI integrates results from Hadoop Audit Logs in Operator tasks for executions of Oozie, Pig, and other tasks. The log results show MapReduce statistics and provide a link to Hadoop statistics in native web consoles.

HDFS access in ODI File Tools

The file based tools used in ODI packages and procedures have been enhanced to include Hadoop Distributed File System (HDFS) file processing. This includes copying, moving, appending, and deleting files, detecting file changes, managing folders, and transferring files using FTP directly into HDFS.

Flatten and Jagged Components

The new Flatten component for mappings allows complex sub-structures to be processed as part of a flat list of attributes. The new Jagged component converts key-value lists into named attributes for further processing.
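
To make the two components concrete, here is a plain-Python sketch of both operations. The record shapes and attribute names are invented for illustration; they are not ODI APIs.

```python
# Hypothetical illustration of the Flatten and Jagged components;
# all field names below are invented.

def flatten(record, nested_key):
    """Flatten: promote a nested sub-structure to top-level attributes."""
    flat = {k: v for k, v in record.items() if k != nested_key}
    flat.update(record[nested_key])          # lift sub-attributes up
    return flat

def jagged(pairs):
    """Jagged: turn a key-value list into named attributes."""
    return {key: value for key, value in pairs}

customer = {"id": 1, "address": {"city": "Lyon", "zip": "69001"}}
print(flatten(customer, "address"))
# {'id': 1, 'city': 'Lyon', 'zip': '69001'}

props = [("color", "red"), ("size", "XL")]
print(jagged(props))
# {'color': 'red', 'size': 'XL'}
```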

New Big Data Guide added to the ODI Documentation Set

A new guide, Integrating Big Data with Oracle Data Integrator, has been added to the ODI documentation set. This guide provides information on how to integrate Big Data, deploy and execute Oozie workflows, and generate code in languages such as Pig Latin and Spark.

New and Changed Features for Release 12c (12.1.3)

Oracle Data Integrator 12c (12.1.3) introduces the following enhancements:

ODI FIPS Compliance

ODI now uses Advanced Encryption Standard (AES) as the standard encryption algorithm for encrypting Knowledge Modules, procedures, scenarios, actions, and passwords. You can configure the encryption algorithm and key length to meet requirements. Passwords and other sensitive information included in repository exports are now encrypted and secured by a password.

For more information, see "Advanced Encryption Standard".

ODI XML Driver Enhancements

The following XML Schema support enhancements have been added:

  • Recursion: ODI now supports recursion inside XML Schemas.

  • any, anyType, and anyAttribute: Data defined by these types is stored in string type columns with XML markup from the original document.

  • Metadata annotations: Annotations can be added inside an XML Schema to instruct the ODI XML Driver which table name, column name, type, length, and precision to use.

For more information, see "Oracle Data Integrator Driver for XML Reference" in Connectivity and Knowledge Modules Guide for Oracle Data Integrator.

JSON Support

The ODI Complex File Driver can now read and write files in JSON format. The JSON structure is defined through an nXSD schema.

For more information, see "JSON Support" in Connectivity and Knowledge Modules Guide for Oracle Data Integrator.
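
The driver maps a hierarchical JSON document onto relational tables. The following plain-Python sketch shows the kind of JSON-to-rows flattening involved; the sample document and table layout are invented and do not come from an nXSD schema.

```python
# Invented sketch of flattening a JSON document into parent and child
# rows, analogous to the relational view the Complex File Driver exposes.
import json

doc = json.loads("""
{"order": {"id": 7,
           "lines": [{"sku": "A", "qty": 2},
                     {"sku": "B", "qty": 1}]}}
""")

# Parent table row
order_row = {"id": doc["order"]["id"]}

# Child table rows, keyed back to the parent
line_rows = [{"order_id": order_row["id"], **line}
             for line in doc["order"]["lines"]]

print(order_row)   # {'id': 7}
print(line_rows)
```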

Hadoop SQOOP Integration

ODI can now load the following sources and targets using Hadoop SQOOP:

  • From relational databases to HDFS, Hive, and HBase through Knowledge Module IKM SQL to Hive-HBase-File (SQOOP)

  • From HDFS and Hive to relational databases through Knowledge Module IKM File-Hive to SQL (SQOOP)

SQOOP enables load and unload mechanisms using parallel JDBC connections in Hadoop MapReduce processes.

For more information, see "Hadoop" in Application Adapters Guide for Oracle Data Integrator.

Hadoop HBase Integration

ODI now supports Hadoop HBase through a new technology and the following knowledge modules:

  • LKM HBase to Hive (HBase-SerDe)

  • IKM Hive to HBase Incremental Update (HBase-SerDe)

  • RKM HBase

For more information, see "Hadoop" in Application Adapters Guide for Oracle Data Integrator.

Hive Append Optimization

Knowledge Modules writing to Hive now take advantage of the append capability in Hive 0.8 and later, appending data to existing data files rather than copying the existing data into a new appended file.

For more information, see "Hadoop" in Application Adapters Guide for Oracle Data Integrator.

Multi-threaded Target Table Load in ODI Engine

ODI can now load a target table using multiple parallel connections. This capability is controlled through the Degree of Parallelism for Target property in the data server.
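
Conceptually, the rows bound for the target are distributed across several writer connections that run in parallel. The following is a toy in-memory sketch of that idea, not ODI's actual implementation; the property name is echoed only as a comment.

```python
# Toy sketch of a multi-connection target load: rows are split
# round-robin across a fixed number of parallel "connections".
# Illustration only; not ODI's actual implementation.
from concurrent.futures import ThreadPoolExecutor

DEGREE_OF_PARALLELISM = 4          # analogous to the data server property

def write_batch(batch, target):
    target.extend(batch)           # stand-in for a batched INSERT

rows = list(range(100))
batches = [rows[i::DEGREE_OF_PARALLELISM] for i in range(DEGREE_OF_PARALLELISM)]
target = []                        # list.extend is atomic in CPython

with ThreadPoolExecutor(max_workers=DEGREE_OF_PARALLELISM) as pool:
    for batch in batches:
        pool.submit(write_batch, batch, target)

print(len(target))  # 100
```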

For more information, see "Creating a Data Server".

Improved Control for Scenario and Load Plan Concurrent Execution

You can now limit concurrent executions in a scenario or load plan and force a concurrent execution to either wait or raise an execution error.

For more information, see "Controlling Concurrent Execution of Scenarios and Load Plans" in Developing Integration Projects with Oracle Data Integrator.

Create New Model and Topology Objects

When connected to a work repository, the Create New Model and Topology Objects dialog in the Designer Navigator lets you create a new model and associate it with new or existing topology objects. This dialog enables you to create topology objects without using the Topology editors unless more advanced options are required.

For more information, see "Creating a Model and Topology Objects" in Developing Integration Projects with Oracle Data Integrator.

Documentation Changes

The information that was previously available in the Oracle Data Integrator Developer's Guide is now reorganized. The following new guides have been added to the ODI documentation library:

  • Understanding Oracle Data Integrator

  • Administering Oracle Data Integrator

  • Oracle Data Integrator Tool Reference

For more information, see "What's New In Oracle Data Integrator?" in Developing Integration Projects with Oracle Data Integrator.

New and Changed Features for Release 12c (12.1.2)

Oracle Data Integrator 12c (12.1.2) introduces the following enhancements:

Declarative Flow-Based User Interface

The new declarative flow-based user interface combines the simplicity and ease-of-use of the declarative approach with the flexibility and extensibility of configurable flows. Mappings (the successor of the Interface concept in Oracle Data Integrator 11g) connect sources to targets through a flow of components such as Join, Filter, Aggregate, Set, Split, and so on.

Reusable Mappings

Reusable Mappings can be used to encapsulate flow sections that can then be reused in multiple mappings. A reusable mapping can have input and output signatures to connect to an enclosing flow; it can also contain sources and targets that are encapsulated inside the reusable mapping.

Multiple Target Support

A mapping can now load multiple targets as part of a single flow. The order of target loading can be specified, and the Split component can be optionally used to route rows into different targets, based on one or several conditions.
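
The Split component's routing can be pictured as condition-based dispatch of each row to a target. The minimal Python sketch below uses first-match routing for simplicity; the conditions and target names are invented, and a real Split can also send one row to several targets.

```python
# Invented example: route each row to the first target whose
# condition matches, with a fallback target for unmatched rows.
targets = {"high_value": [], "low_value": [], "default": []}

routes = [
    ("high_value", lambda r: r["amount"] >= 1000),
    ("low_value",  lambda r: r["amount"] < 1000),
]

rows = [{"id": 1, "amount": 1500}, {"id": 2, "amount": 300}]

for row in rows:
    for name, cond in routes:
        if cond(row):
            targets[name].append(row)
            break                      # first-match routing
    else:
        targets["default"].append(row)

print({k: len(v) for k, v in targets.items()})
# {'high_value': 1, 'low_value': 1, 'default': 0}
```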

Step-by-Step Debugger

Mappings, Packages, Procedures, and Scenarios can now be debugged in a step-by-step debugger. You can manually traverse task execution within these objects and set breakpoints to interrupt execution at pre-defined locations. Values of variables can be introspected and changed during a debugging session, and data of underlying sources and targets can be queried, including the content of uncommitted transactions.

Runtime Performance Enhancements

The runtime execution has been improved to enhance performance. Various changes reduce the overhead of session execution, including the introduction of blueprints, which are cached execution plans for sessions.

Performance is also improved by loading sources in parallel into the staging area. The parallelism of loads can be customized in the physical view of a mapping.

You also have the option to use unique names for temporary database objects, allowing parallel execution of the same mapping.

Oracle GoldenGate Integration Improvements

The integration of Oracle GoldenGate as a source for the Change Data Capture (CDC) framework has been improved in the following areas:

  • Oracle GoldenGate source and target systems are now configured as data servers in Topology. Extract and replicate processes are represented by physical and logical schemas. This representation in Topology allows separate configuration of multiple contexts, following the general context philosophy.

  • Most Oracle GoldenGate parameters can now be added to extract and replicate processes in the physical schema configuration. The UI provides support for selecting parameters from lists. This minimizes the need for the modification of Oracle GoldenGate parameter files after generation.

  • A single mapping can now be used for journalized CDC load and bulk load of a target. This is enabled by applying the Oracle GoldenGate JKM to the source model rather than to the Oracle GoldenGate replication target, and by configuring journalizing in the mapping as part of a deployment specification. Multiple deployment specifications can be used in a single mapping for journalized load and bulk load.

  • Oracle GoldenGate parameter files can now be automatically deployed to, and started on, source and target Oracle GoldenGate instances through the JAgent technology.

Standalone Agent Management with WebLogic Management Framework

Oracle Data Integrator Standalone agents are now managed through the WebLogic Management Framework. This has the following advantages:

  • UI-driven configuration through Configuration Wizard

  • Multiple configurations can be maintained in separate domains

  • Node Manager can be used to control and automatically restart agents

Integration with OPSS Enterprise Roles

Oracle Data Integrator can now use the authorization model in Oracle Platform Security Services (OPSS) to control access to resources. Enterprise roles can be mapped into Oracle Data Integrator roles to authorize enterprise users across different tools.

XML Improvements

The following XML Schema constructs are now supported:

  • list and union - List or union-based elements are mapped into VARCHAR columns.

  • substitutionGroup - Elements based on substitution groups create one table for each type in the substitution group.

  • Mixed content - Elements with mixed content map into a VARCHAR column that contains text and markup content of the element.

  • Annotation - The content of XML Schema annotations is stored in the table metadata.

Oracle Warehouse Builder Integration

Oracle Warehouse Builder (OWB) jobs can now be executed in Oracle Data Integrator through the OdiStartOwbJob tool. The OWB repository is configured as a data server in Topology. All the details of the OWB job execution are displayed as a session in the Operator tree.

Unique Repository IDs

Master and work repositories now use unique IDs following the GUID convention. This avoids collisions during import of artifacts and allows for easier management and consolidation of multiple repositories in an organization.