Use Lakehouse with Autonomous AI Database

Learn the benefits of using Lakehouse with Autonomous AI Database.

About Lakehouse with Autonomous AI Database

Oracle Autonomous AI Database is a versatile solution for accommodating any type of data and workload.

Autonomous AI Database provides cost-efficient storage, with a cost per TB comparable to object stores, while supporting diverse data types like JSON, Graph, and Vector. With Autonomous AI Database, businesses can consolidate their data onto a single platform. They can leverage converged capabilities such as Oracle Machine Learning (OML), Graph, Spatial, Vector, and Blockchain to manage their data comprehensively.

For organizations that already have existing Lakehouses on other platforms, Oracle Autonomous AI Database integrates seamlessly, allowing businesses to benefit from Autonomous AI Database’s advanced features without disrupting their current setups.

To learn more, try the LiveLabs workshop Build a Lakehouse with Autonomous AI Lakehouse.

What is a Lakehouse?

Lakehouses are centralized repositories designed to store vast amounts of raw data in their native format until the data is needed for analysis.

They are highly flexible and scalable, making them a powerful complement to traditional data warehouses by enabling organizations to store and process various types of data, including structured, semi-structured, and unstructured data.

Key attributes of a Lakehouse:

Key Lakehouse Features of Autonomous AI Database

Oracle Autonomous AI Database is designed to seamlessly support Lakehouse workloads, eliminating the need for management or installation. It delivers robust capabilities to handle various data formats across different cloud environments, ensuring flexible and comprehensive data analysis.

Ready for Lakehouse Workloads

Oracle Autonomous AI Database is ready for Lakehouse workloads out of the box and requires no additional components. This readiness extends to key Lakehouse tasks such as data transformation, metadata management, and integration with popular Lakehouse tools, all available from day one without extra setup.

This readiness gives Autonomous AI Database an integrated experience that accelerates time-to-insight for Lakehouse workloads. Users can start handling Lakehouse tasks immediately, without any setup or configuration. This built-in capability simplifies operations, reduces maintenance costs, and improves reliability.

Autonomous AI Database provides a set of tools for all user types, from developers to business analysts, making the platform universal and accessible.

Description of the illustration data-lake-workloads.png

Developers can use tools such as the PL/SQL API for advanced operations, scripting, and automation, which allows integration with existing tools and efficient creation of customized database solutions. See Autonomous AI Database Supplied Package Reference for more information.

Business users can use Data Studio, a web-based interface that simplifies data interaction, exploration, and visualization. Data Studio enables non-technical users to derive insights, create reports, and collaborate effectively, reducing complexity and supporting informed decision-making. See The Data Studio Overview Page for more information.

Multi-Cloud Support

For organizations that already have Lakehouses on other platforms, Autonomous AI Database integrates seamlessly, allowing businesses to benefit from Autonomous AI Database’s advanced features without disrupting their current setups.

To connect Autonomous AI Database to your Lakehouse, grant the privileges and access that the connection requires. Once you’ve provided the necessary credentials, Autonomous AI Database can connect to Lakehouses across various cloud environments, including AWS, Azure, Google Cloud, and Oracle Cloud Infrastructure (OCI) Object Storage.

This capability allows you to securely access and manage your data, leveraging the native security features of each cloud provider. With this multi-cloud support, you gain the flexibility to deploy and scale your Lakehouse across different cloud platforms while maintaining a unified and secure environment.

Oracle Autonomous AI Database supports native security for other clouds. To learn more, see Use Amazon Resource Names (ARNs) to Access AWS Resources, Use Azure Service Principal to Access Azure Resources, or Use Google Service Account to Access Google Cloud Platform Resources, depending on your cloud platform.

Description of the illustration data-lake-multicloud.png

End-To-End Data Format Support

Oracle Autonomous AI Database is designed with the flexibility to handle a broad spectrum of data formats, making it a universal solution for diverse data sources and workloads.

Whether your data resides in structured, semi-structured, or unstructured formats, Autonomous AI Database seamlessly supports them across various cloud environments. This allows businesses to ingest, store, and analyze data without worrying about format compatibility.

Autonomous AI Database provides native support for traditional formats like CSV and JSON, as well as advanced formats such as AVRO, Parquet, and ORC. It supports the following file formats: CSV, JSON, XML, AVRO, ORC, Parquet, Delta Sharing, Iceberg, Word, and PDF. See Query External Data with Autonomous AI Database for more information.

With the added support for the Iceberg table format, Autonomous AI Database offers enhanced capabilities for large-scale Lakehouse environments. Iceberg allows for optimized, high-performance querying, better version control, and easier data management, making it a good fit for large, evolving datasets. See Query Apache Iceberg Tables for more information.

Enhanced Capabilities: Autonomous AI Database for Unstructured Data Management

While Oracle AI Database is recognized for its powerful processing of structured and semi-structured data, Autonomous AI Database extends its capabilities to handle unstructured datasets as well.

These capabilities include managing and analyzing a wide range of formats like JPG, PDF, Word documents, and more. With these advancements, Autonomous AI Database brings a comprehensive solution for businesses dealing with unstructured data sources.

These expanded capabilities position Autonomous AI Database as a powerful tool for handling the growing demands of unstructured data, while also tapping into AI-powered solutions, making it a versatile and future-proof platform for modern data challenges.

Flexible Metadata Management

Oracle Autonomous AI Database provides users with various ways to define metadata for their datasets, making data management more adaptable and efficient.

Federated Metadata Support

Autonomous AI Database supports a federated metadata catalog that unifies metadata from different sources into a single view for metadata management.

This approach simplifies metadata management across various environments by connecting data sources across multiple clouds and platforms. Whether using catalog-based metadata or defining it manually, all information is available in a unified catalog for easy browsing. For example, an organization can use this federated view to manage data assets from both AWS and Oracle Cloud, ensuring consistent governance and discoverability across platforms.

Description of the illustration data-lake-uni-dcat.png

Collaboration

After users finish their analysis, they often need to share their results with others. Oracle Autonomous AI Database makes sharing easy by offering several ways to collaborate, providing unique advantages over other databases, such as integrated security features, open protocols, and seamless cloud connectivity.

These options are made to be flexible and secure, so they fit different collaboration needs:

Description of the illustration data-lake-data-share.png

Broad Compatibility with Oracle AI Database Tools

The Autonomous AI Database environment is fully compatible with a wide array of Oracle database tools.

Any tool you already use to interact with Oracle databases, whether for data visualization, analytics, ETL, or administration, can also be used to analyze datasets within Autonomous AI Database. This compatibility lets users integrate Autonomous AI Database into their existing workflows without adopting new tools or processes, which reduces the learning curve.

See The Data Studio Overview Page for information on a few of the tools available to use with Oracle databases.

Performance

Autonomous AI Database includes numerous optimizations specifically designed for querying data stored in Object Store and utilizing open table formats, such as Apache Iceberg.

Data Lake Accelerator

The Data Lake Accelerator is a dynamic scale-out service that significantly enhances query performance by offloading intensive scan operations, including filtering, projection, and decompression, from your Autonomous AI Database to a dedicated pool of compute resources. This service dynamically provisions and adds ECPUs only for the duration of query execution, enabling large scans to finish faster by parallelizing data processing directly at the source, without requiring the data to be loaded into the database. Upon query completion, the allocated resources are automatically released, ensuring efficient, consumption-based utilization. See Data Lake Accelerator for more information.

External Table Cache

The external table cache lets you store frequently accessed external data locally. When you use the cache, queries on external tables can retrieve data directly from within the Autonomous AI Database, making them significantly faster. You don’t need to change existing SQL statements or workflows to benefit from faster access, as this caching mechanism is fully transparent to applications. You can create an external table cache for partitioned and nonpartitioned external tables created on Parquet, ORC, AVRO, and CSV files and on Iceberg tables. See Use Lake Cache to Improve Performance for External Tables for more information.
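
The transparency described above can be pictured with a small read-through cache. This is a conceptual sketch in Python, not the implementation or API of Autonomous AI Database: `CachedExternalTable`, `fake_object_store_read`, `total_amount`, and the sample rows are hypothetical names and data made up for illustration. The point it shows is that the calling code stays the same whether the rows come from the external source or from the local copy.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


class CachedExternalTable:
    """Read-through cache in front of a slow external reader (illustrative only)."""

    def __init__(self, read_external: Callable[[str], List[Row]]) -> None:
        self._read_external = read_external
        self._local: Dict[str, List[Row]] = {}
        self.external_reads = 0  # counts trips to the external source

    def rows(self, file_name: str) -> List[Row]:
        # On a miss, fetch from the external source and keep a local copy.
        if file_name not in self._local:
            self.external_reads += 1
            self._local[file_name] = self._read_external(file_name)
        return self._local[file_name]


def fake_object_store_read(file_name: str) -> List[Row]:
    # Hypothetical stand-in for reading a file from an object store.
    return [{"file": file_name, "amount": 10}, {"file": file_name, "amount": 32}]


table = CachedExternalTable(fake_object_store_read)


def total_amount(file_name: str) -> int:
    # The "query": identical code for cached and uncached reads.
    return sum(row["amount"] for row in table.rows(file_name))


first = total_amount("sales_2024.parquet")
second = total_amount("sales_2024.parquet")
print(first, second, table.external_reads)
```

Both calls return the same total, but only the first one reaches the external reader; the second is served from the local copy.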

Implicit Partitioning

Implicit partitioning in Autonomous AI Database automatically recognizes common folder and file naming patterns in your Object Store paths, for example, '.../country=US/year=2024/month=01/'. The database treats these naming conventions as partition keys, enabling it to skip files and folders that are irrelevant to your query filters. This delivers partition pruning benefits without requiring you to manually define partitions in your table DDL or alter your existing directory structure. As a result, queries scan less data from Object Store and deliver faster results, especially when working with large datasets. See Query External Tables with Implicit Partitioning for more information.
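
The pruning idea described above can be sketched in a few lines of Python. This is only an illustration of the general technique (reading `key=value` folder names as partition keys and skipping files that cannot match a filter), not how Autonomous AI Database implements it. The functions `partition_keys` and `prune` and the object names are hypothetical.

```python
from typing import Dict, List


def partition_keys(object_path: str) -> Dict[str, str]:
    """Extract key=value folder segments from an object path."""
    keys: Dict[str, str] = {}
    # Only folder segments are examined; the last segment is the file name.
    for segment in object_path.strip("/").split("/")[:-1]:
        name, sep, value = segment.partition("=")
        if sep and name and value:
            keys[name] = value
    return keys


def prune(paths: List[str], filters: Dict[str, str]) -> List[str]:
    """Keep only the paths whose partition keys match every filter."""
    kept = []
    for path in paths:
        keys = partition_keys(path)
        # A path that lacks a filtered key is treated as a non-match here.
        if all(keys.get(name) == value for name, value in filters.items()):
            kept.append(path)
    return kept


# Hypothetical object names that follow the pattern shown in the text.
paths = [
    "sales/country=US/year=2024/month=01/part-0001.parquet",
    "sales/country=US/year=2024/month=02/part-0001.parquet",
    "sales/country=DE/year=2024/month=01/part-0001.parquet",
]

selected = prune(paths, {"country": "US", "month": "01"})
print(selected)
```

With the filter `country = US` and `month = 01`, only the first of the three files has to be read. The other two are skipped based on their paths alone, which is the saving that partition pruning provides.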

Choosing the Right Feature

     
Feature | Use Case | Data Volume
External Table Cache | Use for repeated, interactive, or scheduled dashboards. | Medium (GBs to low TBs)
Data Lake Accelerator | Use to scale out heavy or ad hoc scans over extensive data. | Very Large (TBs to PBs)
Implicit Partitioning | Use to query or analyze large datasets in Object Store that are organized by folder or file naming patterns, for example, by date, region, or other attributes. | Medium to Large (GBs to TBs)
Hybrid | Use External Table Cache for frequently accessed (hot) data subsets and Data Lake Accelerator for queries against the full historical data. | All volumes