11 Lineage (Preview)

Lineage in Oracle AI Data Platform Workbench shows how data artifacts are related through notebook and workflow executions. The lineage graph helps you trace upstream sources, downstream consumers, and column-level derivations for supported artifacts.

Note:

Lineage metadata is captured from notebook and workflow executions. For each process run, the service currently displays the latest captured lineage and does not yet expose historical lineage.

Lineage capture is enabled or disabled at the compute level as part of the Spark configuration. By default, lineage is enabled in any compute you create. To manually disable lineage, add spark.aidp.lineage.enabled = false to the Spark configuration field in your compute, under Advanced options. To re-enable lineage, use spark.aidp.lineage.enabled = true. This setting is compute-specific: if you disable lineage in one compute, workflows that run on another compute where lineage is still enabled are still captured.
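As an illustration of the default-on behavior described above, the following Python sketch reads the text of a Spark configuration field and reports whether lineage capture would be on for that compute. Only the property name spark.aidp.lineage.enabled comes from this chapter. The helper name lineage_enabled, the one-property-per-line layout, and the acceptance of both "key = value" and "key value" forms are assumptions made for the sketch, not documented Workbench behavior.

```python
LINEAGE_KEY = "spark.aidp.lineage.enabled"


def lineage_enabled(spark_conf_text: str) -> bool:
    """Return the effective lineage setting for one compute (hypothetical helper).

    Lineage is on by default, so a missing property means True.
    """
    enabled = True
    for raw_line in spark_conf_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        # Accept "key = value" as well as "key value" (assumed layouts).
        key, _, value = line.replace("=", " ", 1).partition(" ")
        if key == LINEAGE_KEY:
            enabled = value.strip().lower() != "false"
    return enabled


# Two computes with independent settings: disabling one does not affect the other.
compute_a = "spark.executor.memory = 4g\nspark.aidp.lineage.enabled = false"
compute_b = "spark.executor.memory = 4g"
print(lineage_enabled(compute_a), lineage_enabled(compute_b))
```

Because the setting is read per compute, the second compute in this sketch still reports lineage as enabled even though the first one has it turned off.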

You can view the lineage of AI Data Platform artifacts from the Master Catalog by right-clicking an artifact and selecting Lineage. You can view the lineage of any data artifact in AI Data Platform, such as tables and volumes. Lineage currently supports tables as anchor nodes, but displays both tables and volumes as part of the lineage diagram.


Lineage diagram.

The Lineage view displays a lineage graph with upstream and downstream artifacts for the selected data artifact. You can switch between the full graph, upstream-only view, and downstream-only view.


Lineage diagram navigator bar. Downstream, upstream, lineage graph, anchor, and zoom drop-down menu are indicated by red text.

You can view column-level lineage to trace how columns in one data artifact are derived from, transformed by, or propagated to columns in other artifacts.

You can hide the filters at the top of your canvas by clicking the Filter icon in the top-left.


Lineage diagram filter bar.

To expand a data artifact in your lineage flow, click the down arrow at the bottom of the artifact card. When the artifact is expanded, you can see upstream and downstream inheritance of specific data columns. This function works only for artifacts that contain data columns, like tables and volumes.


Lineage diagram is displayed. The table node content_engagement is selected and expanded.

When you expand a table or volume, its card lists the columns and the column-level lineage relationships connected to them.

You can expand multiple tables and volumes in your lineage graph to see the data flow from each. When you expand the data artifact, blue arrows show how columns in source artifacts contribute to columns in target artifacts through notebook or workflow executions. You highlight the path of an individual column by double-clicking it.

These column-level relationships indicate how data is derived, transformed, or propagated across tables, volumes, notebooks, tasks, and workflows.


Lineage diagram is displayed. The content_engagement node is expanded and the engagement_date data column is selected. Dark blue arrows connect the data column to upstream and downstream nodes.

You can select multiple data columns by Shift- or Ctrl-clicking them to highlight multiple paths.

From the Actions menu in the top-right of the Lineage window, you can open Lineage settings, which control the depth of upstream and downstream artifacts displayed. You can also share your lineage diagram, either by copying a link or by exporting a PNG image.


Lineage actions button expanded and showing Lineage settings, Copy link, and Export current lineage view options.

Lineage Details

Double-clicking an artifact in the lineage diagram shows details for that artifact. For tasks, the details page provides details for both the task and the job it belongs to. For tables and volumes, the details page provides information on the table or volume and its columns.

You can right-click data artifacts to either View Details or Set as Anchor. Setting the data artifact as anchor changes the currently displayed diagram to center on that node instead.

At the top of the Details window, you can see the artifact type, the schema it belongs to, and the number of upstream and downstream artifacts. In the Description pane, clicking the Asset link takes you to the artifact in your workspace.


Lineage details page for content_engagement_clean node is displayed. The Details tab is selected.

For Data artifacts, the Details window shows when the artifact was last updated, information on data columns, format, and the catalog to which the data artifact belongs. You can search for specific data columns by name and filter by data type using the drop-down menu.

For Process artifacts, which include tasks and notebooks, the Details window displays information related to the artifact, including the most recent task and job status, duration, task type, job or notebook name and ID, and the attached cluster. In the right pane, you can search for source and target artifacts by artifact name, or use the drop-down menu to filter by transformation type.

Transformation Types

AI Data Platform Workbench supports the following transformation types when tracking lineage:

Type | Meaning | Example Scenario | Example Field Mapping
AGGREGATION | The output field is computed by aggregating multiple input records. | Creating summary tables or metrics. | total_sales = SUM(amount)
IDENTITY | The output field is exactly the same as the input field (no change). | Copying a dataset from one table to another. | customer_id → customer_id
TRANSFORMATION | The output is derived from input fields using functions, casts, concatenation, etc. | Standardizing or cleaning data. | full_name = CONCAT(first_name, ' ', last_name)
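To make the three types concrete, the following Python sketch produces one output field of each kind from a small list of records. The sample records and variable names are invented for illustration. The sketch shows only the shape of each mapping, not how Workbench classifies expressions in a real notebook or workflow run.

```python
from collections import defaultdict

# Hypothetical input records.
orders = [
    {"customer_id": 1, "first_name": "Ada", "last_name": "Lovelace", "amount": 40.0},
    {"customer_id": 1, "first_name": "Ada", "last_name": "Lovelace", "amount": 60.0},
    {"customer_id": 2, "first_name": "Alan", "last_name": "Turing", "amount": 25.0},
]

# IDENTITY: customer_id is copied unchanged.
# TRANSFORMATION: full_name is derived from two input fields by concatenation.
customers = [
    {
        "customer_id": row["customer_id"],
        "full_name": row["first_name"] + " " + row["last_name"],
    }
    for row in orders
]

# AGGREGATION: total_sales is computed over multiple input records per customer.
total_sales = defaultdict(float)
for row in orders:
    total_sales[row["customer_id"]] += row["amount"]

print(customers[0])
print(dict(total_sales))
```

In this sketch, customer_id maps one-to-one to its source, full_name depends on first_name and last_name, and each total_sales value depends on the amount field of several input rows.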

Impact Analysis

Data artifacts selected as the anchor node have an additional tab in their Details window for Impact Analysis. From the Impact Analysis tab, you can search for specific artifact names or filter by artifact type. You can select Upstream or Downstream to only show artifacts that are upstream or downstream of the currently selected artifact.


Lineage node content_engagement_clean details page is displayed. Impact Analysis tab is selected.

Use upstream impact analysis to understand dependencies. Use downstream impact analysis to identify consumers that may be affected by changes to the selected artifact.

Click Export impact analysis to export the artifacts related to the selected data artifact. You can export upstream artifacts, downstream artifacts, or all related artifacts.
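Conceptually, upstream and downstream impact analysis are traversals of the lineage graph in opposite directions. The Python sketch below illustrates that idea on a small hand-written edge list. The function name related and all artifact names other than content_engagement_clean are hypothetical, and the sketch does not represent how the service computes or exports impact analysis.

```python
from collections import deque

# Hypothetical lineage edges: (source artifact, target artifact).
EDGES = [
    ("raw_events", "clean_notebook"),
    ("clean_notebook", "content_engagement_clean"),
    ("content_engagement_clean", "report_task"),
    ("report_task", "engagement_summary"),
]


def related(anchor, edges, direction):
    """Collect every artifact reachable from anchor, upstream or downstream."""
    neighbors = {}
    for source, target in edges:
        if direction == "downstream":
            neighbors.setdefault(source, []).append(target)
        else:
            neighbors.setdefault(target, []).append(source)
    seen, queue = set(), deque([anchor])
    while queue:
        for nxt in neighbors.get(queue.popleft(), []):
            if nxt not in seen and nxt != anchor:
                seen.add(nxt)
                queue.append(nxt)
    return seen


upstream = related("content_engagement_clean", EDGES, "upstream")
downstream = related("content_engagement_clean", EDGES, "downstream")
print(sorted(upstream), sorted(downstream))
```

The upstream set corresponds to the dependencies of the anchor, and the downstream set corresponds to the consumers that a change to the anchor could affect. Their union corresponds to exporting all related artifacts.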

Entity and Column Lineage

In some lineage scenarios where multiple upstream datasets participate in producing a target dataset, only some of those upstream datasets contribute actual column values to the target.

The key distinction between entity lineage and column lineage is the question they answer:
  • Entity lineage answers: Which datasets participated in creating the target?
  • Column lineage answers: Which source columns supplied the target column values?
Because these questions are different, entity lineage and column lineage can look different for the same pipeline.
In some transformations, one input provides the rows and column values written to the target, while another input is used only as a reference for filtering. In these cases:
  • Entity lineage should show all upstream datasets that the target depends on.
  • Column lineage may show column-level flow only from the value-providing input.
  • A reference input can affect the target row set without contributing values to target columns.
This behavior is expected.

Example: Entity and Column Lineage

Assume two source datasets contain the same columns, but not the same rows:
  • source_table_1 contains the primary dataset.
  • source_table_2 contains a reference set of rows.
  • The target table is created by keeping only the rows that exist in both source tables.
For example:

Table 11-1 source_table_1

product_id sales_date quantity total_amount
101 2025-06-01 10 150.0
102 2025-06-02 20 300.0
103 2025-06-03 15 225.0
104 2025-06-04 12 180.0

Table 11-2 source_table_2

product_id sales_date quantity total_amount
102 2025-06-02 20 300.0
103 2025-06-03 15 225.0
105 2025-06-05 18 270.0

Table 11-3 target_table

product_id sales_date quantity total_amount
102 2025-06-02 20 300.0
103 2025-06-03 15 225.0

In this example, both source tables participate in creating the target because both are required to determine the final row set.
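The row selection in this example can be reproduced with a short Python sketch. It assumes that rows are matched on all four columns. The notebook that builds target_table is not shown in this chapter, so the code illustrates the pattern only.

```python
COLUMNS = ("product_id", "sales_date", "quantity", "total_amount")

source_table_1 = [
    (101, "2025-06-01", 10, 150.0),
    (102, "2025-06-02", 20, 300.0),
    (103, "2025-06-03", 15, 225.0),
    (104, "2025-06-04", 12, 180.0),
]
source_table_2 = [
    (102, "2025-06-02", 20, 300.0),
    (103, "2025-06-03", 15, 225.0),
    (105, "2025-06-05", 18, 270.0),
]

# source_table_2 is consulted only to decide which rows qualify.
reference_rows = set(source_table_2)

# Every row written to the target is taken from source_table_1.
target_table = [row for row in source_table_1 if row in reference_rows]

for row in target_table:
    print(dict(zip(COLUMNS, row)))
```

Removing either input changes the result, which is why both tables are upstream of the target at the dataset level.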


Lineage canvas is displayed with source_table_1 and source_table_2 nodes connected to instersect ipynb node which is connected to target_table node.

However, from a column lineage perspective, the target column values may be attributed only to the value-providing input, such as source_table_1. The second input, source_table_2, is used to determine which rows qualify for the target, but its values are not necessarily copied into the target columns.


Lineage canvas with source_table_1 node expanded and blue arrows connecting columns to the instersect notebook node, which is connected to the four columns inherited by target_table

For these reasons, when the lineage view is anchored on source_table_2, no column-level lineage links are displayed, as shown below.


Lineage canvas showing source_table_2 as the anchor node and no column-level lineage links connecting it to target_table.

Why Entity Lineage Shows Both Inputs

Entity lineage captures dataset-level dependency. If a processing job reads two datasets and the result depends on both, both datasets are legitimate upstream entities. In this pattern:
  • The target cannot be fully explained without Source Dataset A.
  • The target also cannot be fully explained without Source Dataset B, because Source Dataset B determines which records from Source Dataset A are retained.
  • Therefore, both Source Dataset A and Source Dataset B should appear as upstream entities for Target Dataset C.
This is dependency lineage, not value lineage.

Why Column Lineage Shows Only the Value-Providing Input

Column lineage captures value provenance. It describes where the values in each target column came from.

For example, if the target table is written using rows from Source Dataset A after they are filtered against the rows in Source Dataset B, then the target column values still originate from Source Dataset A.

Example column mappings:

Target Column Source Column
target.product_id source_a.product_id
target.sales_date source_a.sales_date
target.quantity source_a.quantity
target.total_amount source_a.total_amount

Source Dataset B influences whether a row is present, but its column values are not copied into the target. As a result, Source Dataset B may appear in entity lineage while not appearing in column lineage.
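The difference between the two views can be modeled with two small data structures, as in the Python sketch below. The dictionary layout and the helper name reference_only_inputs are assumptions made for illustration and do not reflect how lineage metadata is stored by the service.

```python
# Entity lineage: every dataset the target depends on.
entity_upstreams = {"target": {"source_a", "source_b"}}

# Column lineage: the (dataset, column) pairs that supplied each target column.
column_lineage = {
    ("target", "product_id"): [("source_a", "product_id")],
    ("target", "sales_date"): [("source_a", "sales_date")],
    ("target", "quantity"): [("source_a", "quantity")],
    ("target", "total_amount"): [("source_a", "total_amount")],
}


def reference_only_inputs(target):
    """Upstream datasets that shape the row set but supply no column values."""
    value_providers = {
        dataset
        for (tgt, _col), sources in column_lineage.items()
        if tgt == target
        for dataset, _src_col in sources
    }
    return entity_upstreams[target] - value_providers


print(reference_only_inputs("target"))
```

A dataset returned by this helper appears in entity lineage but has no column-level links, which matches the behavior described for Source Dataset B.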

View Data Lineage

You can see the inheritance of data in your workspace as it moves between different Oracle AI Data Platform Workbench artifacts.

  1. Navigate to the artifact in your Master Catalog you want to view the lineage for.
  2. Right-click the artifact then click Lineage. You can also select the artifact and click Actions in the top-right, then click Lineage.

    Master catalog view of an AI Data Platform Workbench workspace is displayed. A table has been right-clicked and displays the menu options Sharing and Lineage. Lineage is highlighted.

  3. The lineage diagram is displayed.

View Lineage for Specific Data Columns

You can trace the lineage of a specific data column through your lineage diagram.

  1. Navigate to the artifact in your Master Catalog you want to view the lineage for.
  2. Right-click the artifact then click Lineage. You can also select the artifact and click Actions in the top-right, then click Lineage.
  3. Click the arrow at the bottom of a table or volume artifact to expand it.
  4. Double-click the data column you want to highlight the lineage for.

View Details for a Lineage Artifact

You can see additional details for an artifact in your lineage diagrams.

  1. Navigate to the artifact in your Master Catalog you want to view the lineage for.
  2. Right-click the artifact then click Lineage. You can also select the artifact and click Actions in the top-right, then click Lineage.
  3. Double-click an artifact on the lineage diagram to view additional details. You can also right-click and click View Details.
  4. Click the Impact Analysis tab to view the upstream and downstream impact of the artifact. This tab is only available for the anchor node.

Export Impact Analysis

You can export the impact analysis for data artifacts while viewing the details of a lineage artifact.

Note:

You can only export impact analysis for data artifacts.
  1. Navigate to the artifact in your Master Catalog you want to view the lineage for.
  2. Right-click the artifact then click Lineage. You can also select the artifact and click Actions in the top-right, then click Lineage.
  3. Double-click a data artifact in the lineage diagram. Select the Impact Analysis tab.
  4. Click Export impact analysis.
  5. From the drop-down menu, select whether upstream, downstream, or all artifacts should be included.
  6. Click Export.

Filter Lineage Flow Diagram

You can filter your lineage diagram to help focus on more specific data points when examining lineage.

  1. Navigate to the artifact in your Master Catalog you want to view the lineage for.
  2. Right-click the artifact then click Lineage. You can also select the artifact and click Actions in the top-right, then click Lineage.
  3. From the drop-down menus, select specific catalogs, schemas, volumes, or workspaces to filter out results from.

Search for Artifacts in Lineage Flow Diagram

You can search for strings to locate specific artifacts in the lineage diagram when viewing artifact lineage.

  1. Navigate to the artifact in your Master Catalog you want to view the lineage for.
  2. Right-click the artifact then click Lineage. You can also select the artifact and click Actions in the top-right, then click Lineage.
  3. In the Search field at the top of your lineage diagram, enter the string to search for.
  4. Click a result in the list to center the diagram on that artifact.

Change Lineage Flow Depth

You can alter how many levels of upstream or downstream artifacts your lineage diagram displays to help you either expand or narrow the focus of your diagram.

  1. Navigate to the artifact in your Master Catalog you want to view the lineage for.
  2. Right-click the artifact then click Lineage. You can also select the artifact and click Actions in the top-right, then click Lineage.
  3. Click Actions (three-dot icon) in the top-right.
  4. Click Lineage Settings.

    Lineage three-dot actions menu is displayed. Lineage settings is highlighted.

  5. Modify Upstream depth and Downstream depth as needed.
  6. Click Save.
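The effect of a depth setting can be pictured as a level-by-level walk that stops after a fixed number of hops from the anchor. The Python sketch below shows this for the downstream direction. The edge list and the function name within_depth are hypothetical, and counting every node (including notebooks and tasks) as one level is an assumption rather than documented behavior.

```python
# Hypothetical lineage edges: (source artifact, target artifact).
EDGES = [
    ("table_a", "notebook_1"),
    ("notebook_1", "table_b"),
    ("table_b", "notebook_2"),
    ("notebook_2", "table_c"),
]


def within_depth(anchor, edges, depth):
    """Map each downstream artifact to its distance from anchor, up to depth levels."""
    levels = {}
    frontier = [anchor]
    for level in range(1, depth + 1):
        next_frontier = []
        for source, target in edges:
            if source in frontier and target not in levels and target != anchor:
                levels[target] = level
                next_frontier.append(target)
        frontier = next_frontier
    return levels


print(within_depth("table_a", EDGES, 2))
print(within_depth("table_a", EDGES, 4))
```

Passing the edges with source and target swapped gives the same walk in the upstream direction.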

Share a Lineage Flow Diagram

You can share the lineage diagram showing the lineage of a specific object as either a direct link or a PNG image.

  1. Navigate to the artifact in your Master Catalog you want to share the lineage for.
  2. Right-click the artifact then click Lineage. You can also select the artifact and click Actions in the top-right, then click Lineage.
  3. Click Actions (three-dot icon) in the top-right.

    Lineage three-dot actions menu is selected. Copy link and Export current lineage view are highlighted.

  4. Choose how you want to share your lineage diagram:
    • Click Copy link to copy a link directly to your clipboard. Paste the link to share it.
    • Click Export current lineage view (.png) to export the current view of your lineage diagram, including any filters you have applied.