Query Apache Iceberg Tables
Autonomous Database supports querying Apache Iceberg tables.
About Querying Apache Iceberg Tables
Autonomous Database supports querying Apache Iceberg tables.
Supported Configurations
The following compatibility matrix lists the supported configurations:
Catalog | Object store(s) | Catalog auth (REST) | Storage auth | Notes
---|---|---|---|---
Unity (Databricks) | Amazon S3, Azure ADLS Gen2 | OAuth2 service principal (/oidc/v1/token) recommended; PAT for quick tests | S3 access/secret key; ADLS Gen2 SAS key | 
Polaris (Snowflake) | Amazon S3, Azure ADLS Gen2 | OAuth2 (client credentials) or Polaris-supported token | S3 access/secret key; ADLS Gen2 SAS key | Object store credential vending is not supported.
AWS Glue | Amazon S3 | N/A (uses AWS account auth) | S3 access/secret key | Object store credential vending is not supported. The same credentials must be used for S3 and Glue. S3 and Glue must be in the same AWS region.
JSON metadata (non-catalog option) | Amazon S3, Azure ADLS Gen2, OCI Object Storage | N/A (no REST) | S3 access/secret key; ADLS Gen2 SAS key; OCI native credentials | Point Autonomous Database to the table’s metadata.json (root manifest). Point-in-time snapshot; recreate the external table after a schema change or new snapshot.
Hadoop (non-catalog option) | OCI Object Storage | N/A (no REST) | OCI native credentials | Points to a warehouse folder that contains data and metadata files.
Restrictions of Querying Apache Iceberg Tables
This chapter lists the restrictions on querying Apache Iceberg tables.
- Unity native Iceberg (REST): Not supported. Workaround: Use Delta + UniForm to publish an Iceberg-readable view via Unity Catalog’s Iceberg REST endpoint.
- Certified REST catalogs: ADB is certified with Snowflake Polaris and Databricks Unity Catalog (UniForm only) for Iceberg read access.
- Catalog credential vending: Not supported. Cloud-native role-based vending, such as automatic role assumption or temporary credentials issued by STS, is not available. Use explicit access/secret keys or static tokens.
- AWS ARN credentials: Not supported. IAM role ARNs and AssumeRole via ARN are not accepted.
- Partitioned Iceberg tables are not supported; only non-partitioned tables are allowed.
- Row-level updates (merge-on-read): Not supported. If Iceberg metadata references delete files, queries will fail.
- The schema of a fixed external table is determined at creation and must align with the Iceberg schema version in the metadata. If the Iceberg schema is updated, the external table must be recreated.
- No query-time time travel: Querying by snapshot, version, or timestamp is not supported.
- Non-catalog only: New snapshots are not picked up automatically. To read a specific snapshot, target that snapshot’s metadata.json and recreate the external table.
- Credential alignment: The same credentials must be used for AWS S3 and AWS Glue.
- Region co-location: S3 buckets and the AWS Glue catalog must be in the same AWS region.
Concepts Related to Querying Apache Iceberg Tables
An understanding of the following concepts is helpful for querying Apache Iceberg tables.
Iceberg Catalog
The Iceberg catalog is a service that manages table metadata, such as table snapshots, the table schema, and partitioning information. In order to query the latest snapshot for an Iceberg table, query engines must first access the catalog and obtain the location of the most recent metadata file. There are a number of available catalog implementations, including AWS Glue, Hive, Nessie, and Hadoop. Autonomous Database supports the AWS Glue catalog and the HadoopCatalog implementation used by Spark.
For more information, see Optimistic Concurrency.
Metadata Files
The metadata file is a JSON document that keeps track of the table snapshots, partitioning scheme, and schema information. The metadata file is the entry point to a hierarchy of manifest lists and manifest files. The manifests track the table's data files along with information including partitioning and column statistics. See the Iceberg Table Specification for more information.
Transactions
Iceberg supports row-level updates to tables using either copy-on-write or merge-on-read. Copy-on-write generates new data files that reflect the updated rows, while merge-on-read generates new "delete files" that must be merged with the data files during reading. Oracle supports copy-on-write. Queries on Iceberg tables fail if they encounter a delete file. For more information, see RowLevelOperationMode.
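Because queries fail when delete files are present, writers should keep copy-on-write enabled for tables that Autonomous Database will query. The following Spark SQL sketch shows one way to do this using the standard Iceberg write-mode table properties; the table name is a placeholder, and property handling depends on your engine version.
-- Hedged sketch: pin row-level operations to copy-on-write so no delete files
-- are produced; the table name below is a placeholder.
ALTER TABLE db.customers_iceberg SET TBLPROPERTIES (
  'write.delete.mode' = 'copy-on-write',
  'write.update.mode' = 'copy-on-write',
  'write.merge.mode'  = 'copy-on-write'
);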
Schema Evolution
Iceberg supports schema evolution. Schema changes are reflected in the Iceberg metadata using a schema ID. Note that Oracle external tables have a fixed schema, determined by the most current schema version at table creation time. Iceberg queries fail when the queried metadata points to a different schema version compared to the one used at table creation time. For more information, see Schema Evolution.
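Because the external table's schema is fixed at creation time, the usual remedy after an Iceberg schema change is to drop and recreate the external table. A minimal PL/SQL sketch, assuming an existing external table named CUSTOMERS_ICEBERG and reusing the original creation parameters (shown in the provider quickstarts below):
BEGIN
  -- Drop the stale external table; ignore the error if it does not exist.
  BEGIN EXECUTE IMMEDIATE 'DROP TABLE CUSTOMERS_ICEBERG'; EXCEPTION WHEN OTHERS THEN NULL; END;
  -- Recreate it with the same parameters used originally; the new external
  -- table picks up the current Iceberg schema version.
  DBMS_CLOUD.CREATE_EXTERNAL_TABLE(
    table_name      => 'CUSTOMERS_ICEBERG',
    credential_name => '<your_object_store_credential>',
    format          => '<same format JSON as in the original call>'
  );
END;
/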
Partitioning
Iceberg supports advanced partitioning options such as hidden partitioning and partition evolution that rely on processing/altering the table's metadata without costly data layout changes.
Credentials for Iceberg: REST Catalog vs. Object Store
This topic explains the two kinds of credentials used to manage and access Apache Iceberg data: REST catalog credentials and object store credentials. It also compares the two ways of managing table information in data lake table formats like Apache Iceberg: catalog-managed and direct-metadata external tables.
Catalog-Managed vs Direct-Metadata External Tables
The following section compares catalog-managed external tables with direct-metadata external tables, highlighting their key differences.
- Catalog-Managed (Unity / Polaris / AWS Glue)
  What it is: Metadata, schema, and “current” snapshot resolved via a REST catalog.
  Behavior: Automatically reflects the catalog’s latest snapshot; centralized permissions, tags, and lineage.
  Best for: Enterprise data products, cross-engine sharing, consistent governance, discoverability (the catalog is the single point of truth).
- Direct-Metadata (Filesystem via metadata.json)
  What it is: External table points directly to a specific metadata.json.
  Behavior: Fixed, reproducible snapshot; no automatic advances; governance limited to object-store ACLs.
  Best for: Experiments, tests, audits.
REST vs. Object Store Credentials
REST Catalog Credentials
REST credentials are required when connecting to an Apache Iceberg REST catalog. The REST catalog manages metadata for Iceberg tables by exposing RESTful endpoints. To authenticate, REST credentials are often OAuth-based, requiring you to obtain a bearer token from a token endpoint using a client ID and secret.
- rest_auth_cred: authenticates with the catalog service (e.g., Unity or Polaris).
- credential_name: authenticates to the object store where the Iceberg data and metadata reside.

Vending credentials are not supported at this time. Vending credentials refers to the controlled process of distributing or fetching access credentials (like usernames and passwords, API keys, or tokens) when they are needed, often automatically or on demand, rather than storing them statically in configuration files or scripts.
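To illustrate the distinction, both are ordinary DBMS_CLOUD credentials; they differ only in where they are referenced in the CREATE_EXTERNAL_TABLE call. A minimal sketch with hypothetical credential names (the complete format parameter appears in the provider quickstarts below):
BEGIN
  -- Referenced as rest_auth_cred: authenticates to the Iceberg REST catalog.
  DBMS_CLOUD.CREATE_CREDENTIAL(
    credential_name => 'MY_CATALOG_CRED',
    username        => '<client_id>',
    password        => '<client_secret>'
  );
  -- Referenced as credential_name: authenticates to the object store that holds
  -- the table's data and metadata files.
  DBMS_CLOUD.CREATE_CREDENTIAL(
    credential_name => 'MY_STORE_CRED',
    username        => '<access_key_or_storage_account>',
    password        => '<secret_key_or_sas_token>'
  );
END;
/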
Object Store Credentials
Object store credentials are used when Apache Iceberg tables are stored directly on cloud object storage like Oracle Cloud Infrastructure (OCI) Object Storage or Amazon S3.
These credentials allow Autonomous Database to access and read files (such as Parquet data files and metadata manifests) directly from the cloud object store.
Use object store credentials when defining external tables that directly point to Parquet/metadata files in OCI/S3 buckets.
Typical Workflow of Querying Apache Iceberg Tables
Before you begin querying Apache Iceberg tables, you should be familiar with the typical workflow. This section explains how to set up external tables to access Iceberg data, presented as an end-to-end setup workflow with five main steps.
- Decide Your Access Model:
- Catalog-Managed: Use this model when you want a continuously updated, governed catalog to serve as the single source of truth for data metadata. This central catalog helps maintain consistency and governance over your data.
- Direct-Metadata: Use this model when you work with a fixed snapshot of metadata (via metadata.json). This model is simpler but static, not recommended for production since it lacks automatic updates and governance.
- Gather What You Need:
- Catalog-Managed: You must have access to the catalog endpoint (if applicable), the exact table path, and the object store location where the actual data files reside.
- Direct-Metadata: You only need the URI pointing to the root metadata.json file plus the object store location of the data files.
- Prepare Credentials:
- For Catalog-Managed setups, acquire credentials to access the catalog.
- Object store credentials are always necessary regardless of model to read both data and metadata files.
Note
Automated credential vending and AWS AssumeRole for catalog access are unsupported.
- Create the External Table:
- In Catalog-Managed, the table queries data through the catalog and accesses files in the object store.
- In Direct-Metadata, the table points directly to the specific metadata.json file without catalog involvement.
- Quick Check & Expectations: Run a simple query such as COUNT(*) to verify your table setup and ensure it can access data correctly, as shown in the sketch below.
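For example, once the external table is created, the following checks (with a placeholder table name) confirm that the credentials resolve and that the columns were derived from the Iceberg schema as expected:
-- Verify that the external table can reach its data files.
SELECT COUNT(*) FROM <your_external_table>;

-- Optionally inspect the columns derived from the Iceberg schema.
SELECT column_name, data_type
FROM   user_tab_columns
WHERE  table_name = '<YOUR_EXTERNAL_TABLE>';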
Provider Quickstarts
This chapter describes the process for setting up external data access with different cloud data providers.
Topics:
Databricks Unity Catalog
This section explains the workflow that links Databricks with open table formats through UniForm, making it easier to access Delta Lake data in environments that support Iceberg.
Databricks Unity Catalog (UniForm path)
- A Delta table created with UniForm so Iceberg clients can read it.
- Table files in Azure ADLS Gen2 or AWS S3.
- Unity Catalog privileges for external access (e.g., External data access enabled; grant EXTERNAL USE SCHEMA to your principal).
- Auth: OAuth2 (recommended) or Personal Access Token (for quick tests).
Native Iceberg via Iceberg REST is not supported by our integration yet. Use Delta with UniForm (Iceberg-readable) and expose it via Unity Iceberg REST: https://<workspace-host>/api/2.1/unity-catalog/iceberg.
Create a UniForm (Iceberg-readable) table in Databricks:
The following statements create a table named customers_iceberg in Databricks within the specified Unity Catalog catalog and schema:
USE CATALOG <your_catalog>;
USE SCHEMA <your_schema>;
CREATE TABLE customers_iceberg (
id INT,
name STRING
)
TBLPROPERTIES(
'delta.columnMapping.mode'='name',
'delta.enableIcebergCompatV2'='true',
'delta.universalFormat.enabledFormats'='iceberg'
);
INSERT INTO customers_iceberg (id, name) VALUES
(1,'Alice'), (2,'Bob'), (3,'Carol');
Object store credential (ADLS Gen2)
BEGIN
BEGIN DBMS_CLOUD.DROP_CREDENTIAL('AZURE_BLOB_CRED'); EXCEPTION WHEN OTHERS THEN NULL; END;
DBMS_CLOUD.CREATE_CREDENTIAL(
credential_name => 'AZURE_BLOB_CRED',
username => '<storage-account-or-sas-username>',
password => '<storage-key-or-sas-token>'
);
END;
/
Create REST Catalog Credentials with OAuth2
-- Databricks service principal (client_id / client_secret)
BEGIN
BEGIN DBMS_CLOUD.DROP_CREDENTIAL('UNITY_OAUTH'); EXCEPTION WHEN OTHERS THEN NULL; END;
DBMS_CLOUD.CREATE_CREDENTIAL(
credential_name => 'UNITY_OAUTH',
username => '<client_id>',
password => '<client_secret>'
);
END;
/
BEGIN
DBMS_CLOUD.CREATE_EXTERNAL_TABLE(
table_name => 'CUSTOMERS_ICEBERG',
credential_name => 'AZURE_BLOB_CRED',
format => '{
"access_protocol": {
"protocol_type": "iceberg-rest",
"protocol_config": {
"iceberg_catalog_type": "unity",
"rest_catalog_endpoint": "https://<workspace-host>/api/2.1/unity-catalog/iceberg",
"rest_authentication": {
"rest_auth_cred": "UNITY_OAUTH",
"rest_auth_endpoint": "https://<workspace-host>/oidc/v1/token"
},
"table_path": ["<your_catalog>","<your_schema>","customers_iceberg"]
}
}
}'
);
END;
/
SELECT COUNT(*) FROM CUSTOMERS_ICEBERG;
Create REST Catalog Credentials with Personal Access Token (PAT)
BEGIN
BEGIN DBMS_CLOUD.DROP_CREDENTIAL('UNITY_PAT'); EXCEPTION WHEN OTHERS THEN NULL; END;
DBMS_CLOUD.CREATE_CREDENTIAL(
credential_name => 'UNITY_PAT',
username => 'token',
password => '<dapiXXXXXXXXXXXXXXXXXXXXXXXX>'
);
END;
/
BEGIN
DBMS_CLOUD.CREATE_EXTERNAL_TABLE(
table_name => 'CUSTOMERS_ICEBERG',
credential_name => 'AZURE_BLOB_CRED',
format => '{
"access_protocol": {
"protocol_type": "iceberg-rest",
"protocol_config": {
"iceberg_catalog_type": "unity",
"rest_catalog_endpoint": "https://<workspace-host>/api/2.1/unity-catalog/iceberg",
"rest_authentication": { "rest_auth_cred": "UNITY_PAT" },
"table_path": ["<your_catalog>","<your_schema>","customers_iceberg"]
}
}
}'
);
END;
/
SELECT COUNT(*) FROM CUSTOMERS_ICEBERG;
Snowflake Polaris
This topic describes how to access Apache Iceberg tables managed by the Snowflake Polaris REST catalog via its REST API using OAuth2 authentication.
Snowflake Polaris (REST catalog)
- Polaris Iceberg catalog and endpoint available to your account.
- Table files accessible in your object store (S3/ADLS as applicable).
- Auth: OAuth2 recommended (client credentials) or another Polaris-supported token mechanism.
Create an OAuth2 Credential:
The following procedure creates an OAuth2 credential with the name POLARIS_OAUTH for authenticating access to an Apache Polaris Iceberg catalog.
BEGIN
BEGIN DBMS_CLOUD.DROP_CREDENTIAL('POLARIS_OAUTH'); EXCEPTION WHEN OTHERS THEN NULL; END;
DBMS_CLOUD.CREATE_CREDENTIAL(
credential_name => 'POLARIS_OAUTH',
username => '<client_id>',
password => '<client_secret>'
);
END;
/
Create Storage Credential
The following procedure creates a storage credential named S3_CRED for accessing object storage (example: Amazon S3) with an AWS access key ID and secret access key.
-- Storage credential for your object store (example: S3)
BEGIN
BEGIN DBMS_CLOUD.DROP_CREDENTIAL('S3_CRED'); EXCEPTION WHEN OTHERS THEN NULL; END;
DBMS_CLOUD.CREATE_CREDENTIAL(
credential_name => 'S3_CRED',
username => '<aws_access_key_id>',
password => '<aws_secret_access_key>'
);
END;
/
Create External Table
The following procedure creates an external table named SALES_POLARIS in Autonomous Database that accesses data stored in Iceberg format and managed by the Polaris catalog.
BEGIN
DBMS_CLOUD.CREATE_EXTERNAL_TABLE(
table_name => 'SALES_POLARIS',
credential_name => 'S3_CRED',
format => '{
"access_protocol": {
"protocol_type": "iceberg-rest",
"protocol_config": {
"iceberg_catalog_type": "polaris",
"rest_catalog_endpoint": "<https://<your-polaris-endpoint>/...>",
"rest_authentication": {
"rest_auth_cred": "POLARIS_OAUTH",
"rest_auth_endpoint": "<https://<your-oauth-token-endpoint>>"
},
"table_path": ["<db>","<schema>","<table>"]
}
}
}'
);
END;
/
Quick Functionality Check
SELECT COUNT(*) FROM SALES_POLARIS;
The endpoint and token URL are shown as placeholders because they vary by Polaris configuration; substitute the values for your environment.
AWS Glue Catalog
This topic describes how to access Iceberg tables registered in the AWS Glue Data Catalog, with data files in Amazon S3, using AWS credentials.
- Glue Data Catalog with the Iceberg table registered (S3 objects accessible).
- Region name for Glue (e.g., us-east-1).
- Auth: Access key/secret for S3; Glue access via catalog configuration.
- Credential vending via AWS ARN is unsupported. Explicit credentials must be supplied.
Create a storage credential
The following procedure creates a storage credential named S3_CRED in Autonomous Database to enable access to data stored in an Amazon S3 bucket.
-- S3 credential
BEGIN
BEGIN DBMS_CLOUD.DROP_CREDENTIAL('S3_CRED'); EXCEPTION WHEN OTHERS THEN NULL; END;
DBMS_CLOUD.CREATE_CREDENTIAL(
credential_name => 'S3_CRED',
username => '<aws_access_key_id>',
password => '<aws_secret_access_key>'
);
END;
/
Create an external Iceberg table
The following procedure creates an external table named ORDERS_GLUE in Autonomous Database.
BEGIN
DBMS_CLOUD.CREATE_EXTERNAL_TABLE(
table_name => 'ORDERS_GLUE',
credential_name => 'S3_CRED',
format => '{
"access_protocol": {
"protocol_type": "iceberg",
"protocol_config": {
"iceberg_catalog_type": "aws_glue",
"iceberg_glue_region": "us-east-1",
"table_path": ["<database>","<table>"]
}
}
}'
);
END;
/
Quick Functionality Check
The following query counts the rows in ORDERS_GLUE, verifying the connection and data accessibility.
SELECT COUNT(*) FROM ORDERS_GLUE;
Hadoop/Filesystem (direct metadata file)
This topic explains how to create a storage credential for accessing an Iceberg table’s metadata file directly from object stores such as ADLS, S3, or OCI, and how to query Iceberg tables stored directly in the filesystem (usually Hadoop-compatible file systems) without using a catalog service.
Example: Query an Iceberg table using JSON Metadata
- You can access the table’s Iceberg root manifest (metadata.json) in your object store (ADLS/S3/OCI).
- This path is point-in-time. To follow new snapshots, recreate the external table.
Create a Storage Credential
This procedure first tries to drop an existing credential called STORE_CRED if it exists (ignoring any errors). Then, it creates a new credential named STORE_CRED.
-- Storage credential for wherever the metadata.json lives
BEGIN
BEGIN DBMS_CLOUD.DROP_CREDENTIAL('STORE_CRED'); EXCEPTION WHEN OTHERS THEN NULL; END;
DBMS_CLOUD.CREATE_CREDENTIAL(
credential_name => 'STORE_CRED',
username => '<user-or-key>',
password => '<secret-or-token>'
);
END;
/
Create an External Table
The following procedure creates an external table named CUSTOMERS_META.
BEGIN
DBMS_CLOUD.CREATE_EXTERNAL_TABLE(
table_name => 'CUSTOMERS_META',
credential_name => 'STORE_CRED',
file_uri_list => 'https://<bucket-or-container>/<path>/metadata.json',
format => '{"access_protocol":{"protocol_type":"iceberg"}}'
);
END;
/
Quick Functionality Check
The following query counts all rows in the external table.
SELECT COUNT(*) FROM CUSTOMERS_META;
Example: Query an Iceberg table using Hadoop Catalog on OCI
In this example, we query the Iceberg table db.icebergTablePy, created using OCI Data Flow, where Spark uses the HadoopCatalog implementation for the Iceberg catalog. HadoopCatalog uses a warehouse folder iceberg in bucket my-iceberg-bucket and puts the Iceberg metadata in a $database_name/$table_name subfolder under this directory. It also uses a version-hint.text file that contains the version number of the most recent metadata file.
Create an external table over db.icebergTablePy as follows:
BEGIN
DBMS_CLOUD.CREATE_EXTERNAL_TABLE (
table_name => 'iceberg_parquet_time_dim3',
credential_name => 'OCI_CRED',
format => '{
"access_protocol": {
"protocol_type": "iceberg",
"protocol_config": {
"iceberg_catalog_type": "hadoop",
"iceberg_warehouse": "https://objectstorage.uk-cardiff-1.oraclecloud.com/n/my-tenancy/b/my-iceberg-bucket/o/iceberg",
"iceberg_table_path": "db.icebergTablePy"
}
}
}'
);
END;
/