Machine learning platform on Autonomous Data Warehouse

To keep pace with rapidly changing information needs, organizations are looking for every opportunity to quickly train, deploy, and manage machine learning (ML) models.

With Oracle Autonomous Data Warehouse (ADW), you have all the built-in tools you need to load and prepare data, and to train, deploy, and manage machine learning models. These services are included with Autonomous Data Warehouse, but you also have the flexibility to mix and match other tools to best fit your organization’s needs.

This reference architecture positions the technology solution within the overall business context:

Figure: Data-driven business context (data-driven-business-context.png)

When organizations implement a data warehouse or data mart together with a machine learning platform in the cloud, they usually need to cobble together multiple services to build an end-to-end solution. Some organizations can achieve this, but for those that lack the experience or resources, it can be a daunting task.

A comprehensive machine learning platform should, at a minimum, include the following:

  • Easy access to both structured and unstructured data
  • Ability to build and manage data engineering pipelines
  • Ability to build models and score data at scale to meet business objectives
  • Collaborative platform for building machine learning models
  • Simple process for managing and deploying models
  • AutoML that expands the pool of people able to build machine learning models and accelerates data scientists' work

The machine learning tools included in Autonomous Data Warehouse give departments and organizations an effective way to deliver the benefits of machine learning without relying heavily on IT resources and availability. In addition, Autonomous Data Warehouse applies product updates and security patches automatically.

Architecture

This architecture uses data science and machine learning features embedded in Oracle Autonomous Data Warehouse to analyze data from a broad range of enterprise data resources for business analysis and machine learning.

The following diagram shows multiple paths a user can follow, depending on the use case. The easiest path (solid lines) provides a simple method for performing data engineering tasks, for building machine learning models, and for managing and deploying models with tools embedded in Autonomous Data Warehouse (ADW). For more advanced use cases (dashed lines), we have included other Oracle Cloud Infrastructure (OCI) services that seamlessly integrate with the services included in ADW (enclosed in the gray-lined box).

Figure: Architecture for machine learning on Autonomous Data Warehouse (ml-adw-architecture.png)

The architecture focuses on the following logical divisions:

  • Ingest, Transform

    Ingests and refines the data for use in each of the data layers in the architecture.

  • Persist, Curate, Create

    Facilitates access to and navigation of the data to show the current business view. For relational technologies, data may be logically or physically structured in simple relational, longitudinal, dimensional or OLAP forms. For non-relational data, this layer contains one or more pools of data, either output from an analytical process or data optimized for a specific analytical task.

  • Analyze, Learn, Predict

    Abstracts the logical business view of the data for the consumers. This abstraction facilitates agile approaches to development, migration to the target architecture, and the provision of a single reporting layer from multiple federated sources.

The following diagram shows a mapping of the architecture to services provided on Oracle Cloud Infrastructure using security best practices.

Figure: Mapping of the architecture to Oracle Cloud Infrastructure services (oci-adb-oac-arch-gw-oracle.zip)

The architecture has the following components:

  • Data integration

    Autonomous Data Warehouse comes with the embedded tools needed to acquire, load, and transform your data for many departmental scenarios and for specific advanced use cases. A built-in load capability lets you quickly load data from local files or from object storage. Also included is Autonomous Data Transforms, which lets you connect to many different types of data sources and provides ELT-style functionality.

    For more advanced use cases, use Oracle Cloud Infrastructure Data Integration, a fully managed, serverless, native cloud service that helps you with common extract, transform, and load (ETL) tasks, such as ingesting data from different sources; cleansing, transforming, and reshaping that data; and then efficiently loading it into target data sources on Oracle Cloud Infrastructure.

  • Autonomous Data Warehouse

    Oracle Autonomous Data Warehouse is a self-driving, self-securing, self-repairing database service that is optimized for data warehousing workloads. You do not need to configure or manage any hardware, or install any software. Oracle Cloud Infrastructure handles creating the database, as well as backing up, patching, upgrading, and tuning the database.

    With Autonomous Data Warehouse, you have the flexibility to load data in multiple formats, including structured, JSON, XML, graph, and spatial. Bundled with this service are the Autonomous Database Tools, which let you easily load data into tables and do light ETL work.

    Oracle Machine Learning is built into the core of Autonomous Data Warehouse. Its in-database algorithms run in the database kernel, and the models they produce are first-class database objects that are ready for immediate deployment.

  • Object storage

    Oracle Cloud Infrastructure Object Storage is an internet-scale, high-performance storage platform that offers reliable and cost-efficient data durability. Oracle Cloud Infrastructure Object Storage can store an unlimited amount of unstructured data of any content type, including analytic data. You can safely and securely store or retrieve data directly from the internet or from within the cloud platform. Multiple management interfaces let you easily start small and scale seamlessly, without experiencing any degradation in performance or service reliability.

  • Predict

    Oracle Machine Learning Services extends Oracle Machine Learning (OML) functionality to support model deployment and model lifecycle management, through REST APIs, for both in-database Oracle Machine Learning models and third-party Open Neural Network Exchange (ONNX) machine learning models. Oracle Machine Learning Services supports real-time and small-batch scoring for applications and dashboards.

    The REST API for Oracle Machine Learning Services provides REST endpoints with authentication via Autonomous Data Warehouse. These endpoints enable the storage and management of machine learning models and their metadata. These endpoints also allow for the creation of scoring endpoints for models.

    Oracle Machine Learning Services supports third-party classification or regression models that are built with packages such as scikit-learn and TensorFlow, among others, and then exported in ONNX format. It also supports integrated cognitive text analytics for topic discovery, keywords, summary, sentiment, and similarity, as well as image classification through deployment of third-party ONNX-format models, with scoring using images or tensors.

    Users can also make predictions directly in the database using in-database models from SQL, R, and Python, for singleton, small-batch, and large-scale batch scoring. With OML4Py embedded Python execution, users can invoke user-defined Python functions that use models produced by third-party packages, and make predictions from Python and REST interfaces.

  • Learn

    Oracle Machine Learning Notebooks provide a collaborative user interface for data scientists and business and data analysts to work with SQL and Python interpreters while also performing machine learning in Oracle Autonomous Database—which includes Autonomous Data Warehouse (ADW), Autonomous Transaction Processing (ATP), and Autonomous JSON Database (AJD). Oracle Machine Learning Notebooks enable the broader data science team (data scientists, citizen data scientists, data analysts, data engineers, DBAs) to work together to explore their data visually and to develop analytical methodologies using OML4SQL and OML4Py. The Notebooks interface provides access to Oracle's high-performance, parallel, and scalable in-database implementations of machine learning algorithms via Python, SQL, and PL/SQL. In-database functionality can also be accessed through connection to Autonomous Database via external interfaces, such as SQL Developer, open source notebook environments, and third-party IDEs.

    OML4Py also provides a Python API for automated machine learning (AutoML) for automated algorithm and feature selection, and for automated model tuning and selection.

    Oracle Machine Learning AutoML User Interface (OML AutoML UI) is a no-code user interface that provides automated machine learning and easy deployment to Oracle Machine Learning Services. Business users without an extensive data science background can use OML AutoML UI to create and deploy machine learning models, and to generate an OML notebook containing the corresponding OML4Py code to rebuild the model and score data programmatically.

    Expert data scientists may use OML AutoML UI as a productivity accelerator for faster model exploration, for ease of deployment, and for starter notebook generation.

  • Analytics

    Oracle Analytics Cloud is a scalable and secure public cloud service that provides a full set of capabilities to explore and perform collaborative analytics for you, your workgroup, and your enterprise.

    Oracle Analytics Cloud is integrated with Oracle Machine Learning and provides access to in-database models, which can be searched, visualized, and deployed within Oracle Analytics Cloud workflows and dashboards.

    With Oracle Analytics Cloud you also get flexible service management capabilities, including fast setup, easy scaling and patching, and automated lifecycle management.
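The Predict component above describes scoring through the Oracle Machine Learning Services REST API. The following Python sketch, which uses only the standard library, builds (but does not send) a scoring request for a deployed model. The `/omlmod/v1/deployment/{uri}/score` path and the `inputRecords` body field are assumptions based on the OML Services REST API and should be checked against its reference for your release; the host name, model URI, attribute names, and token are hypothetical placeholders, and obtaining the access token is a separate step not shown here.

```python
import json
import urllib.request

# Assumed endpoint layout for Oracle Machine Learning Services; confirm it
# against the OML Services REST API reference for your Autonomous Database.
SCORE_PATH = "/omlmod/v1/deployment/{uri}/score"


def build_score_request(base_url: str, model_uri: str, token: str,
                        records: list) -> urllib.request.Request:
    """Build (but do not send) a POST request that scores `records`."""
    url = base_url.rstrip("/") + SCORE_PATH.format(uri=model_uri)
    # Assumed body shape: a list of input records keyed by attribute name.
    body = json.dumps({"inputRecords": records}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",  # token obtained in a separate call
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    req = build_score_request(
        "https://oml.example.com",
        "hypothetical_model_uri",
        "<access-token>",
        [{"AGE": 42, "INCOME": 55000}],
    )
    print(req.get_method(), req.full_url)
    # To actually score: pass req to urllib.request.urlopen and json.load the response.
```

Keeping request construction separate from sending makes it easy to inspect the URL, headers, and payload before pointing the call at a real deployment.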
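The Learn component above mentions AutoML for automated algorithm selection and model tuning. The sketch below is not the OML4Py AutoML API, which runs in the database; it is a toy, pure-Python illustration of the core idea behind automated model tuning: train several candidate configurations, score each on held-out data, and keep the best. The k-nearest-neighbors classifier, the candidate values of k, and the data set are all hypothetical choices made for illustration.

```python
from collections import Counter

# Toy two-cluster data set (hypothetical): (features, label) pairs.
# The point (1, 1) is deliberately mislabeled "b" to act as noise.
TRAIN = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"), ((1, 1), "b"),
         ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b")]
VALID = [((1, 1.2), "a"), ((5.5, 5.5), "b")]


def knn_predict(train, x, k):
    """Classify x by majority vote among its k nearest training points."""
    def sq_dist(row):
        return sum((a - b) ** 2 for a, b in zip(row[0], x))
    nearest = sorted(train, key=sq_dist)[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train, valid, k):
    """Fraction of validation points classified correctly with this k."""
    hits = sum(knn_predict(train, x, k) == y for x, y in valid)
    return hits / len(valid)


def select_model(train, valid, candidate_ks):
    """Try every candidate k and keep the one with the best validation accuracy."""
    scores = {k: accuracy(train, valid, k) for k in candidate_ks}
    best_k = max(scores, key=scores.get)  # on ties, the earliest candidate wins
    return best_k, scores[best_k]


if __name__ == "__main__":
    print(select_model(TRAIN, VALID, [1, 3, 5]))  # → (3, 1.0)
```

Here k = 1 is fooled by the mislabeled point at (1, 1), while k = 3 and k = 5 both classify the validation points correctly, and the selector keeps the first of the tied winners. A real AutoML run would also search across algorithms and feature subsets and would use cross-validation rather than a single small validation split.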

Recommendations

Use the following recommendations as a starting point for creating a platform that serves both as an advanced cloud data warehouse and as a machine learning operations framework.

Your requirements might differ from the architecture described here.

  • Ingest, Transform

    Autonomous Database Tools, embedded in Oracle Autonomous Data Warehouse, let you load, transform, and catalog data, gain insights, and even develop business models in a simple, straightforward way.

  • Analyze, Learn, Predict

    Before you connect Oracle Analytics Cloud to Oracle Autonomous Data Warehouse, have a database administrator allow access from the IP address (or address range) of your Oracle Analytics Cloud instance. The database administrator must add a security rule that allows TCP/IP traffic from Oracle Analytics Cloud to the database.
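The security rule above effectively requires that the address Oracle Analytics Cloud connects from falls within one of the entries on the database's allowed list. The Python sketch below expresses that check with the standard `ipaddress` module, which can be handy for sanity-checking an allowed list before handing it to the database administrator. The addresses are hypothetical documentation ranges; the actual rule is configured in the Autonomous Data Warehouse network access settings, not in code.

```python
import ipaddress

# Hypothetical allowed list: one CIDR range and one single address.
HYPOTHETICAL_ACL = ["198.51.100.0/24", "203.0.113.7"]


def is_allowed(client_ip: str, acl_entries: list) -> bool:
    """Return True if client_ip matches any address or CIDR range in acl_entries."""
    addr = ipaddress.ip_address(client_ip)
    for entry in acl_entries:
        # ip_network treats a bare address such as "203.0.113.7" as a /32;
        # strict=False tolerates ranges written with host bits set.
        if addr in ipaddress.ip_network(entry, strict=False):
            return True
    return False


if __name__ == "__main__":
    print(is_allowed("198.51.100.25", HYPOTHETICAL_ACL))  # True: inside the /24
    print(is_allowed("203.0.113.8", HYPOTHETICAL_ACL))    # False: not listed
```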

Considerations

When creating a machine learning operations framework in conjunction with your cloud data warehouse, consider these implementation options.

  • Data gravity: Keep your machine learning operations framework close to your data to limit the high cost of data movement, both monetarily and in terms of machine learning model development time (even for data scoring using machine learning models).
  • Quicker time to value: The following recommendations will help you get started faster and reduce the time it takes to begin realizing the value of your solution.

  • Ingest, Transform

    Recommended: Autonomous Database Tools

    Other options: Oracle Cloud Infrastructure Data Integration

    Rationale: This is use-case dependent. For easy loading of data from files in Object Storage or in local storage, use Autonomous Database Tools. As mentioned previously, Autonomous Data Warehouse Data Transforms can also be used, depending on the use case. For more advanced cases, use Oracle Cloud Infrastructure Data Integration, which is an on-demand service.

  • Persist

    Recommended: Oracle Autonomous Data Warehouse

    Rationale: Autonomous Data Warehouse is a cloud data warehouse that not only meets the analytics needs of a data warehouse but also includes the functionality to deploy an advanced Oracle Machine Learning operations framework. You can also access data in object storage directly through external tables, over files in any number of formats and types.

  • Learn

    Recommended: Oracle Machine Learning Notebooks with OML4SQL, OML4Py, and OML4R; Oracle Machine Learning AutoML UI

    Other options: Third-party tools; OCI Data Science

    Rationale: OML Notebooks is a collaborative notebook environment included in the Autonomous Data Warehouse platform. Using OML4SQL, OML4Py, and OML4R, users can build models directly in the database. In-database models can be exported and imported between Oracle Database and Autonomous Data Warehouse. Users can build Python and R models by using third-party tools in custom conda environments within Autonomous Database, or build them outside the Oracle Machine Learning framework and store these native models in the database datastore for use with OML4Py and OML4R embedded execution.

  • Predict

    Recommended: Oracle Machine Learning Services; Oracle Machine Learning Notebooks with OML4SQL, OML4Py, and OML4R

    Other options: Oracle Cloud Infrastructure Data Science

    Rationale: In-database models can be scored using SQL queries and the OML4R and OML4Py interfaces. Models can also be scored through the REST API, with model deployment managed by Oracle Machine Learning Services. Oracle Machine Learning Services also allows you to import models created outside the Oracle Machine Learning framework in ONNX format, including models produced within Oracle Cloud Infrastructure Data Science.

  • Access and Interpretation

    Recommended: Oracle Analytics Cloud

    Other options: Third-party tools

    Rationale: Oracle Analytics Cloud is fully managed and tightly integrated with the Oracle Machine Learning framework. One of its key capabilities is the ability to deploy models built in Oracle Machine Learning to Oracle Analytics Cloud for scalable machine learning and for use in dashboards.
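The data gravity consideration can be made concrete with a back-of-the-envelope estimate: copying D bytes over a link with a nominal bandwidth of B bits per second, of which a fraction e is actually achieved, takes roughly t = 8D / (eB) seconds. The 1 TB data set, 1 Gbit/s link, and 80% efficiency in the sketch below are illustrative assumptions, not measurements.

```python
def transfer_seconds(size_bytes: float, link_bits_per_s: float,
                     efficiency: float = 0.8) -> float:
    """Estimate wall-clock time to copy size_bytes over a network link.

    efficiency is an assumed fraction of the nominal bandwidth actually
    achieved (protocol overhead, contention); 0.8 is an illustrative guess.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return size_bytes * 8 / (link_bits_per_s * efficiency)


if __name__ == "__main__":
    one_tb = 10**12   # bytes (decimal terabyte)
    one_gbps = 10**9  # bits per second
    t = transfer_seconds(one_tb, one_gbps)
    print(f"{t:.0f} s, about {t / 3600:.1f} hours per copy")  # 10000 s, about 2.8 hours per copy
```

At these assumed figures, each full copy costs close to three hours, and that cost recurs every time data is moved out of the warehouse for retraining or batch scoring, which is the overhead that keeping machine learning next to the data avoids.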

Deploy

The code required to deploy this reference architecture is available on GitHub. You can pull the code into Oracle Cloud Infrastructure Resource Manager with a single click, create the stack, and deploy it. Alternatively, download the code from GitHub to your computer, customize it, and deploy the architecture by using the Terraform CLI.

  • Deploy by using Oracle Cloud Infrastructure Resource Manager:
    1. Click Deploy to Oracle Cloud.

      If you aren't already signed in, enter the tenancy and user credentials.

    2. Review and accept the terms and conditions.
    3. Select the region where you want to deploy the stack.
    4. Follow the on-screen prompts and instructions to create the stack.
    5. After creating the stack, click Terraform Actions, and select Plan.
    6. Wait for the job to complete, and review the plan.

      To make any changes, return to the Stack Details page, click Edit Stack, and make the required changes. Then, run the Plan action again.

    7. If no further changes are necessary, return to the Stack Details page, click Terraform Actions, and select Apply.
  • Deploy using the Terraform code in GitHub:
    1. Go to GitHub.
    2. Clone or download the repository to your local computer.
    3. Follow the instructions in the README document.
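For the Terraform CLI path, the README in the repository remains the authoritative guide. As a rough sketch, assuming the repository is cloned locally and your OCI provider credentials and any required variables are already configured as the README describes, the standard Terraform workflow mirrors the Plan-then-Apply sequence used in Resource Manager: `terraform init`, `terraform plan -out=<file>`, then `terraform apply <file>`. The Python wrapper below only sequences those standard commands; the plan-file name and working directory are hypothetical.

```python
import subprocess

PLAN_FILE = "tfplan"  # hypothetical name for the saved plan


def plan_commands(plan_file: str = PLAN_FILE) -> list:
    """Commands that initialize the working directory and save a plan."""
    return [
        ["terraform", "init"],
        ["terraform", "plan", f"-out={plan_file}"],
    ]


def apply_command(plan_file: str = PLAN_FILE) -> list:
    """Command that applies exactly the changes recorded in the saved plan."""
    return ["terraform", "apply", plan_file]


def run(commands: list, workdir: str, dry_run: bool = True) -> None:
    """Run each command in workdir, stopping at the first failure."""
    for cmd in commands:
        if dry_run:
            print("would run:", " ".join(cmd), "in", workdir)
        else:
            # check=True raises CalledProcessError if a command fails.
            subprocess.run(cmd, cwd=workdir, check=True)


if __name__ == "__main__":
    repo = "path/to/cloned-repo"  # hypothetical location of the cloned repository
    run(plan_commands(), repo)    # step 1: init and plan, then review the plan output
    run([apply_command()], repo)  # step 2: apply only after reviewing the plan
```

Splitting planning from applying keeps the review step from the Resource Manager instructions: inspect the `terraform plan` output first, and only then run the apply command, which executes the changes recorded in the saved plan. With the default `dry_run=True`, nothing is executed and the commands are only printed.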

Change Log

This log lists significant changes: