Overview of Data Integration
Administrators, data engineers, ETL developers, and operators are among the different types of data professionals who use Oracle Cloud Infrastructure Data Integration.
You may perform one or more of the following roles:
- Administrators: Oversee, manage, and monitor lifecycle management and security policies for the service.
- Data engineers and ETL developers: Develop, build, and test data integration solutions.
- Operators: Manage, monitor, and diagnose data integration executions.
Here are some resources that you can use to learn about Oracle Cloud Infrastructure Data Integration.
Watch a video introduction to the service.
About the Service
Before you get started, the administrator must satisfy connectivity requirements so that the Data Integration service can establish a connection to your data sources. The administrator then creates workspaces and gives you access to them. You use workspaces to stay organized and easily manage different data integration environments.
For each data integration solution, you register data assets to identify the data sources to use as sources and targets. When you're ready to start designing a data integration solution, Data Integration provides integration tasks and data loader tasks.
To create an integration task, start with a data flow. The designer in Data Integration is an easy-to-use graphical user interface where you can select from different operators and visually build the data flow. It includes validation and debug features to help you identify and correct potential issues before running the task.
When you create a data loader task, you specify your source data asset, and then configure transformations to cleanse and process the data as it is loaded into the target data asset.
To execute a specific set of processes in a sequence, you create a pipeline. Designing a pipeline is similar to building a data flow, where you use operators to add the tasks and activities you want. After building a pipeline, you create a pipeline task that uses the pipeline.
After you create tasks, you publish them to the Default Application or to your own Application. From the Application, you can run tasks and then monitor their progress and status. You can also schedule tasks for automated runs.
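The publish-then-run lifecycle described above can be sketched as a small in-memory model. This is only an illustration of the ordering rule (a task must be published to an Application before it can be run there); the class names, fields, and the always-successful run are hypothetical and are not the service's API.

```python
from dataclasses import dataclass, field
from enum import Enum


class RunStatus(Enum):
    SUCCESS = "SUCCESS"
    ERROR = "ERROR"


@dataclass
class Task:
    """Hypothetical design-time task (integration, data loader, or pipeline)."""
    name: str
    kind: str


@dataclass
class TaskRun:
    """Hypothetical runtime record of one execution of a published task."""
    task_name: str
    status: RunStatus


@dataclass
class Application:
    """Hypothetical container for published tasks and their runs."""
    name: str
    published: dict = field(default_factory=dict)
    runs: list = field(default_factory=list)

    def publish(self, task: Task) -> None:
        # Publishing makes the task available to run from this Application.
        self.published[task.name] = task

    def run(self, task_name: str) -> TaskRun:
        # Only published tasks can be run; design-time tasks cannot.
        if task_name not in self.published:
            raise LookupError(f"{task_name!r} is not published to {self.name!r}")
        # Toy behavior: every run succeeds. A real run reports its own status.
        task_run = TaskRun(task_name, RunStatus.SUCCESS)
        self.runs.append(task_run)
        return task_run


app = Application("Default Application")
app.publish(Task("load_customers", "data_loader"))
first_run = app.run("load_customers")
```

Keeping the runs on the Application mirrors the idea that monitoring happens from the Application rather than from the design-time project.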
Data Integration Concepts
The following concepts are helpful to know when using the Data Integration service:
- Workspace
- The container for all Data Integration resources, such as projects, folders, data assets, tasks, data flows, pipelines, applications, and schedules, associated with a data integration solution.
- Project
- A container for design-time resources, such as tasks, data flows, and pipelines.
- Folder
- A container within a project or another folder to organize your design-time resources.
- Data Asset
- Represents a data source, such as a database, an object store, or a file or document store, and contains the data source's metadata and connection details.
- Connection
- Includes the necessary details to establish a connection to a data source. A connection is always associated with one data asset. A data asset can have more than one connection.
- Data Entity
- A collection of data, such as a database table or view, or a single logical file, with many attributes that describe its data.
- Schema
- A collection of data entities within a data asset.
- Data Flow
- A design-time resource that defines the flow of data and any operations on the data between the source and target systems. To run a data flow, you add the data flow to an integration task.
- Pipeline
- A design-time resource for orchestrating tasks and activities in a sequence or in parallel to facilitate a process from start to finish. To run a pipeline, you add the pipeline to a pipeline task.
- Operator
- In a data flow, an operator represents an input source, an output target, or a transformation. In a pipeline, an operator represents a published task or an activity such as merge or end.
- Parameter
- A type of variable you can assign to an operator's details so that you can reuse the data flow or pipeline design with different resources and values. When you use parameters and set default values during design time, you can then change the values later, either in tasks that wrap the data flow or pipeline, or when you run the tasks.
- Task
- A design-time resource that specifies a set of actions to perform on data. You can create data loader tasks, integration tasks for data flows, and pipeline tasks for pipelines. You can also create SQL tasks and OCI Data Flow tasks. To run a task, you publish the task into an Application to test it or roll it out to production.
- Application
- A container for runtime artifacts, such as tasks that have been published along with their dependencies. You use Applications for testing and eventually roll them out into production.
- Patch
- An update to an Application. When you publish a single task or a group of tasks, or when you unpublish a task, these activities are logged as patches in the Application. When you create a target Application by copying existing resources from a source Application, a patch is added to the target Application. Each later refresh of the target Application that syncs changes from the source Application also creates a patch in the target Application.
- Task Run
- A runtime artifact that represents the execution of a task.
- Schedule
- A runtime resource that defines when and how often any published tasks should run automatically.
- Task Schedule
- A runtime resource that is associated with a specific published task and an existing schedule to define when and how often the task should run automatically.
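The parameter concept in the list above (defaults set at design time, then changed in the wrapping task or when the task runs) can be illustrated with a short sketch. The function name, the parameter names, and the precedence order (run-time values win over task-level values, which win over design-time defaults) are assumptions made for illustration; the source text only says that values can be changed at either point.

```python
def resolve_parameters(design_defaults, task_overrides=None, run_overrides=None):
    """Merge parameter values; later layers replace earlier ones.

    Assumed precedence: design-time defaults < task-level values < run-time values.
    """
    resolved = dict(design_defaults)
    for layer in (task_overrides or {}, run_overrides or {}):
        # Only parameters declared at design time can be overridden.
        unknown = set(layer) - set(design_defaults)
        if unknown:
            raise KeyError(f"unknown parameter(s): {sorted(unknown)}")
        resolved.update(layer)
    return resolved


# Hypothetical parameter names and values.
design = {"SOURCE_ENTITY": "customers.csv", "TARGET_SCHEMA": "STAGING"}
resolved = resolve_parameters(
    design,
    task_overrides={"TARGET_SCHEMA": "TEST"},
    run_overrides={"SOURCE_ENTITY": "customers_2024.csv"},
)
```

With these inputs, one data flow design serves a different target schema and source file without being edited, which is the reuse that parameters are meant to provide.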
Ways to Access Oracle Cloud Infrastructure
You can access Oracle Cloud Infrastructure using the Console (a browser-based interface) or the REST API.
Instructions for the Console and API are included in topics throughout this guide. For a list of available SDKs, see Software Development Kits and Command Line Interface.
To access the Console, you must use a supported browser. From the navigation menu at the top of this help page, you can use the Infrastructure Console link to go to the sign-in page. You are prompted to enter your cloud tenant, your user name, and your password.
Most types of Oracle Cloud Infrastructure resources have a unique, Oracle-assigned identifier called an Oracle Cloud ID (OCID).
For information about the OCID format and other ways to identify your resources, see Resource Identifiers.
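As a rough illustration of working with OCIDs in scripts, the sketch below splits an OCID into its dot-separated parts. It assumes the layout `ocid1.<resource type>.<realm>.[region][.future use].<unique ID>`; confirm the exact format in the Resource Identifiers topic before relying on it. The helper names and the example OCID are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ocid:
    version: str
    resource_type: str
    realm: str
    region: str      # empty when the OCID carries no region
    future_use: str  # empty unless the optional segment is present
    unique_id: str


def parse_ocid(value: str) -> Ocid:
    """Split an OCID string into named parts (assumed 5- or 6-part layout)."""
    parts = value.split(".")
    if len(parts) == 5:
        version, resource_type, realm, region, unique_id = parts
        future_use = ""
    elif len(parts) == 6:
        version, resource_type, realm, region, future_use, unique_id = parts
    else:
        raise ValueError(f"unexpected number of OCID parts: {len(parts)}")
    if version != "ocid1":
        raise ValueError(f"unsupported OCID version: {version!r}")
    if not (resource_type and realm and unique_id):
        raise ValueError("resource type, realm, and unique ID must be non-empty")
    return Ocid(version, resource_type, realm, region, future_use, unique_id)


# Hypothetical OCID used only to exercise the parser.
example = parse_ocid("ocid1.disworkspace.oc1.iad.exampleuniqueid")
```

Here `example.resource_type` is `"disworkspace"` and `example.region` is `"iad"`; an OCID with an empty region segment (two consecutive dots) parses with `region == ""`.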
Service Limits and Quotas
Data Integration limits you to five workspaces per region.
You can limit the number of workspace resources in a compartment by creating a quota limit. For example:
set dataintegration quota workspace-count to 10 in compartment <compartment_name>
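If you generate quota statements from a script, a small helper such as the following can build the statement shown above and reject obviously invalid input. The function name is hypothetical, and the helper only formats text; it does not call any Oracle Cloud Infrastructure API.

```python
def workspace_quota_statement(limit: int, compartment_name: str) -> str:
    """Build the workspace-count quota statement for one compartment."""
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(limit, bool) or not isinstance(limit, int) or limit < 0:
        raise ValueError("limit must be a non-negative integer")
    if not compartment_name.strip():
        raise ValueError("compartment_name must not be empty")
    return (
        f"set dataintegration quota workspace-count to {limit} "
        f"in compartment {compartment_name.strip()}"
    )


# "MyCompartment" is a placeholder compartment name.
statement = workspace_quota_statement(10, "MyCompartment")
```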
Data Integration is integrated with various Oracle Cloud Infrastructure services and features.
Data Integration integrates with IAM for authentication and authorization, for all interfaces (the Console, SDK, CLI, and REST API).
An administrator sets up groups, compartments, and policies. Policies control who can create users, create and manage the cloud network, launch instances, create buckets, download objects, and so on.
If you're a regular user (not an administrator) who needs to use Oracle Cloud Infrastructure resources that your company owns, ask your administrator to set up a user ID for you. The administrator can confirm which compartment or compartments you can use.
The administrator can create common policies to authorize Data Integration users. They can also create Data Integration Policies to control user access to the Data Integration service.
The tenancy explorer lets you view all resources in a specific compartment, across all regions. The tenancy explorer is powered by the Search service and supports the Data Integration resource type.
About Data Security
In addition to the control and transparency you get with Oracle Cloud Infrastructure security, the Data Integration service also handles your data with care.
Oracle Cloud Infrastructure customer isolation ensures that each Data Integration workspace you create gets its own reserved compute instance. Your workspace is isolated from other workspaces within the same tenancy, and from other tenancies. To ensure that your data is secure, Data Integration doesn't store any data in this compute instance beyond task runs.
Data Integration uses the Oracle Cloud Infrastructure Vault service to store and encrypt sensitive information, such as passwords and wallet files for data asset and connection information, as secrets. Schemas and data entities are accessed in real time, when needed. When a data sample is loaded in Data Xplorer, whether in the Data tab for a data flow or while configuring transformations in a data loader task, the data is loaded from the data entity in real time.
Assign only the required privileges to the accounts you use. For example, Data Integration only requires read access to ingest data from data assets.
Typical Data Integration User Activities
Here are some of the activities you're likely to perform as a Data Integration user.
|Accessing a Workspace||Open your Data Integration instance|
|Creating a Data Asset||Register your data sources as Data Integration data assets|
|Adding a Connection||Add new connections to data assets|
|Creating Projects and Folders||Create projects and folders to organize your design-time artifacts|
|Creating a Data Flow||Design a data flow|
|Creating a Pipeline||Design a pipeline|
|Creating a Pipeline Task||Create a pipeline task|
|Using Data Integration Applications||Create an Application for task runs|
|Publishing Design Tasks||Publish tasks to Applications for testing and running|
| ||Run tasks and then monitor their progress|
|Scheduling Published Tasks||Create a schedule and task schedules for automating runs|
|Monitoring Workspaces||Monitor a workspace|
Using the Console's Data Integration Overview Page
When you access Data Integration in the Console and click Overview, the Data Integration overview page opens.
The overview page describes the capabilities of Data Integration, helps you get started with the service, and points you to resources that help you use Data Integration efficiently.