2 Overview of Oracle Data Integrator Topology

This chapter provides an overview of Oracle Data Integrator topology concepts and components relevant to ODI developers.

This chapter includes the following section:

Introduction to the Oracle Data Integrator Topology

Introduction to the Oracle Data Integrator Topology

The Oracle Data Integrator Topology is the physical and logical representation of the Oracle Data Integrator architecture and components.

Note:

The Installation Guide for Oracle Data Integrator uses the term "topology" in some sections to refer to the organization of servers, folders, and files on your physical servers. This chapter refers to the "topology" configured using the Topology Navigator in ODI Studio.

This section contains these topics:

Physical Architecture
Contexts
Logical Architecture
Agents
Languages
Repositories

Physical Architecture

The physical architecture defines the elements of the information system and those of their characteristics that Oracle Data Integrator takes into account. Each type of database (Oracle, DB2, etc.), Big Data source (Hive, HBase), file format (XML, Flat File), or application software is represented in Oracle Data Integrator by a technology.

A technology handles formatted data. Therefore, each technology is associated with one or more data types that allow Oracle Data Integrator to generate data handling scripts.

The physical components that store and expose structured data are defined as data servers. A data server is always linked to a single technology. A data server stores information according to a specific technical logic, which is declared in the physical schemas attached to this data server. Every database server, JMS message file, group of flat files, and so forth, that is used in Oracle Data Integrator must be declared as a data server. Every schema, database, JMS Topic, and so forth, that is used in Oracle Data Integrator must be declared as a physical schema.

Finally, the physical architecture includes the definition of the Physical Agents. These are the Java software components that run Oracle Data Integrator jobs.
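The relationships between technologies, data servers, and physical schemas described above can be pictured as a small data model. The following Python sketch is illustrative only: the class names, fields, and sample values are hypothetical and do not correspond to the ODI SDK or to the repository tables. It shows only that a data server is linked to exactly one technology and that each physical schema is attached to a data server.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Technology:
    """A kind of source or target (hypothetical model, not an ODI class)."""
    name: str
    data_types: tuple = ()


@dataclass
class DataServer:
    """A physical component that stores data; linked to a single technology."""
    name: str
    technology: Technology
    physical_schemas: list = field(default_factory=list)

    def add_physical_schema(self, schema_name):
        """Declare a physical schema attached to this data server."""
        schema = PhysicalSchema(name=schema_name, data_server=self)
        self.physical_schemas.append(schema)
        return schema


@dataclass
class PhysicalSchema:
    """A schema, database, JMS topic, and so forth, on one data server."""
    name: str
    # Back-reference excluded from repr/compare to keep the model simple.
    data_server: DataServer = field(repr=False, compare=False)

    @property
    def technology(self):
        # A physical schema inherits its technology from its data server.
        return self.data_server.technology
```

For example, calling add_physical_schema("ACCOUNTING_SAMPLE") on a DataServer built with an "Oracle" Technology returns a physical schema whose technology is that same Oracle technology.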

Contexts

Contexts bring together components of the physical architecture (the real architecture) of the information system with components of the Oracle Data Integrator logical architecture (the architecture on which the user works).

For example, contexts may correspond to different execution environments (Development, Test, and Production) or different execution locations (Boston Site, New-York Site, and so forth) where similar physical resources exist.

Note that the default GLOBAL context is created during installation.

Logical Architecture

The logical architecture allows you to identify a group of similar physical schemas as a single logical schema. These physical schemas contain structurally identical datastores but are located in different physical locations. Logical schemas, like their physical counterparts, are attached to a technology.

Contexts allow logical schemas to resolve to physical schemas. In a given context, a logical schema resolves to a single physical schema.

For example, the Oracle logical schema Accounting may correspond to two Oracle physical schemas:

Accounting Sample used in the Development context
Accounting Corporate used in the Production context

These two physical schemas are structurally identical (they contain accounting data), but are located in different physical locations. These locations are two different Oracle schemas (Physical Schemas), possibly located on two different Oracle instances (Data Servers).

All the components developed in Oracle Data Integrator are designed on top of the logical architecture. For example, a data model is always attached to a logical schema, and data flows are defined with this model. By specifying a context at run-time (either Development or Production), the model's logical schema (Accounting) resolves to a single physical schema (either Accounting Sample or Accounting Corporate), and the data contained in this schema in the data server can be accessed by the integration processes.
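The resolution rule can be sketched as a lookup keyed by logical schema and context. This is a minimal illustration using the Accounting example above; the function name and the dictionary layout are assumptions made for the sketch and are not part of any ODI API.

```python
# Hypothetical topology: (logical schema, context) -> physical schema.
# The entries mirror the Accounting example in the text.
SCHEMA_MAPPINGS = {
    ("Accounting", "Development"): "Accounting Sample",
    ("Accounting", "Production"): "Accounting Corporate",
}


def resolve_physical_schema(logical_schema, context, mappings=SCHEMA_MAPPINGS):
    """Return the single physical schema for a logical schema in a context."""
    try:
        return mappings[(logical_schema, context)]
    except KeyError:
        # No mapping was declared for this pair in the sketch's topology.
        raise LookupError(
            f"Logical schema {logical_schema!r} is not mapped "
            f"in context {context!r}"
        ) from None
```

Because the key is the pair (logical schema, context), a logical schema can resolve to at most one physical schema in a given context, which is the property the text describes.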

Agents

Oracle Data Integrator run-time Agents orchestrate the execution of jobs. These agents are Java components.

The run-time agent functions as a listener and a scheduler agent. The agent executes jobs on demand (model reverses, packages, scenarios, mappings, and so forth), for example when the job is manually launched from a user interface or from a command line. The agent is also used to start the execution of scenarios according to a schedule defined in Oracle Data Integrator.

Third party scheduling systems can also trigger executions on the agent. See "Scheduling a Scenario or a Load Plan with an External Scheduler" in Administering Oracle Data Integrator for more information.

Typical projects only require a single Agent in production; however, "Load balancing Agents" in Administering Oracle Data Integrator describes how to set up multiple load-balanced agents.

ODI Studio can also directly execute jobs on demand. This internal "agent" can be used for development and initial testing. However, it does not have the full production features of external agents, and is therefore unsuitable for production data integration. To execute a job directly using ODI Studio, select Local (No Agent) as the Logical Agent in the Run dialog. Note that the following features are not available when running a job locally:

Stale session cleanup
Ability to stop a running session
Load balancing

If you need any of these features, you should use an external agent.

Agent Lifecycle

The lifecycle of an agent is as follows:

1. When the agent starts, it connects to the master repository.
2. Through the master repository it connects to any work repository attached to the master repository and performs the following tasks at startup:
   - Execute any outstanding tasks in all work repositories that need to be executed upon startup of this agent.
   - Clean stale sessions in each work repository. These are the sessions left incorrectly in a running state after an agent or repository crash.
   - Retrieve its list of scheduled scenarios in each work repository, and compute its schedule.
3. The agent starts listening on its port.
   - When an execution request is received by the agent, the agent acknowledges this request and starts the session.
   - The agent launches sessions according to the schedule.
   - The agent is also able to process other administrative requests in order to update its schedule, stop a session, respond to a ping, or clean stale sessions. The standalone agent can also process a stop signal to terminate its lifecycle.
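As an illustration of the stale-session cleanup performed at startup, the following Python sketch marks sessions that are still recorded as running for a given agent. It is a toy model under stated assumptions: the dictionary keys ("id", "agent", "status") and the status names "RUNNING" and "ERROR" are hypothetical, and moving a stale session to an error status is an assumption of the sketch, not a description of how the agent is implemented.

```python
def clean_stale_sessions(sessions, agent_name):
    """Flag sessions left in a running state for an agent that just started.

    `sessions` is a list of dicts with hypothetical keys "id", "agent",
    and "status". Returns the ids of the sessions that were cleaned.
    """
    cleaned = []
    for session in sessions:
        # In this sketch, an agent that is only now starting cannot own a
        # live session, so any session recorded as running for it is stale.
        if session["agent"] == agent_name and session["status"] == "RUNNING":
            session["status"] = "ERROR"
            cleaned.append(session["id"])
    return cleaned
```

Sessions that belong to other agents, or that already finished, are left untouched.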

Refer to "Running Integration Processes" in Administering Oracle Data Integrator for more information about a session lifecycle.

Agent Features

Agents are not data transformation servers. They do not perform any data transformation, but instead only orchestrate integration processes. They delegate data transformation to database servers, operating systems, and scripting engines.

Agents are multi-threaded lightweight components. An agent can run multiple sessions in parallel. When declaring a physical agent, Oracle recommends that you adjust the maximum number of sessions it is allowed to execute concurrently from a work repository. When this maximum number is reached, any new incoming session is queued by the agent and executed later, when other sessions have terminated. If you plan to run multiple parallel sessions, consider load balancing executions, as described in "Load balancing Agents" in Administering Oracle Data Integrator.
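The effect of the concurrent-session limit can be sketched with a small queue model. This is not ODI code: the class and method names are hypothetical, and the first-in, first-out order in which queued sessions are started is an assumption of the sketch (the text states only that queued sessions run later).

```python
from collections import deque


class AgentSessionSlots:
    """Toy model of an agent's concurrent-session limit (not ODI code)."""

    def __init__(self, max_sessions):
        if max_sessions < 1:
            raise ValueError("max_sessions must be at least 1")
        self.max_sessions = max_sessions
        self.running = set()
        self.queued = deque()

    def submit(self, session_id):
        """Start the session if a slot is free; otherwise queue it."""
        if len(self.running) < self.max_sessions:
            self.running.add(session_id)
            return "running"
        self.queued.append(session_id)
        return "queued"

    def finish(self, session_id):
        """End a running session and start the oldest queued one, if any.

        Returns the id of the session that was started, or None.
        """
        self.running.remove(session_id)
        if self.queued:
            next_id = self.queued.popleft()
            self.running.add(next_id)
            return next_id
        return None
```

With a limit of 2, a third submitted session is queued and starts only when one of the first two finishes.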

Agent Types

Oracle Data Integrator agents are available in three types: standalone agents, standalone colocated agents, and Java EE agents.

For more information about agent types, see "Run-Time Agent" in Understanding Oracle Data Integrator.

Physical and Logical Agents

A physical agent corresponds to a single standalone agent or a Java EE agent. A physical agent should have a unique name in the Topology.

As with schemas, physical agents that have an identical role in different environments can be grouped under the same logical agent. A logical agent is related to physical agents through contexts. When starting an execution, you indicate the logical agent and the context. Oracle Data Integrator translates this information into a single physical agent that receives the execution request.

Agent URL

An agent runs on a host and a port and is identified on this port by an application name. The agent URL also indicates the protocol to use for the agent connection. Possible values for the protocol are http or https. These four components (protocol, host, port, and application name) make up the agent URL. The agent is reached using this URL.

For example:

A standalone agent started on port 8080 on the odi_production machine will be reachable at the following URL:

http://odi_production:8080/oraclediagent

Note:

The application name for a standalone agent is always oraclediagent and cannot be changed.

A Java EE agent started as an application called oracledi on port 8000 in a WLS server deployed on the odi_wls host will be reachable at the following URL:

http://odi_wls:8000/oracledi
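The way the four components combine into an agent URL can be sketched as follows. The helper name build_agent_url and its validation rules are hypothetical and serve only to illustrate the URL layout; the default application name is the standalone agent's oraclediagent.

```python
def build_agent_url(host, port, application="oraclediagent", protocol="http"):
    """Assemble an agent URL as protocol://host:port/application."""
    if protocol not in ("http", "https"):
        raise ValueError("protocol must be 'http' or 'https'")
    port = int(port)
    if not 1 <= port <= 65535:
        raise ValueError("port must be between 1 and 65535")
    return f"{protocol}://{host}:{port}/{application}"
```

Called with the values from the two examples above, build_agent_url("odi_production", 8080) and build_agent_url("odi_wls", 8000, application="oracledi") produce the two URLs shown.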

Apache Oozie

Apache Oozie is a workflow scheduler that helps you manage Apache Hadoop jobs. It is a server-based workflow engine specialized in running workflow jobs with actions that run Hadoop MapReduce jobs. Refer to Integrating Big Data with Oracle Data Integrator for more information.

Languages

Languages defines the programming and scripting languages, and language elements, available when creating and editing expressions during integration development. Languages provided by default in Oracle Data Integrator do not require any user change.

Repositories

The topology contains information about the Oracle Data Integrator repositories. Repository definition, configuration, and installation are described in "Creating the Oracle Data Integrator Master and Work Repository Schema" in Installing and Configuring Oracle Data Integrator.