Oracle Machine Learning is embedded in both Oracle Autonomous Data Warehouse and Oracle Autonomous Transaction Processing. Because the machine learning algorithms are resident in the database, data scientists can avoid the time, effort, and expense of moving the data to external systems for analysis, model building, scoring, and deployment.
The architecture uses a region with one availability domain and regional subnets. You can use the same architecture in a region that has multiple availability domains. We recommend that you use regional subnets for your deployment, regardless of the number of availability domains.
We assume that data from different sources is already loaded into Autonomous Data Warehouse. This data can be loaded by using customer-provided or Oracle-provided data integration tools.
Although this reference architecture uses Autonomous Data Warehouse, Autonomous Transaction Processing can be used instead.
The following diagram illustrates this reference architecture.
(Illustration: oci-ml-modeling.png)
The architecture has the following components:
- Region
A region is a localized geographic area composed of one or more availability domains. Regions are independent of other regions, and vast distances can separate them (across countries or continents).
- Availability domains
Availability domains are standalone, independent data centers within a region. The physical resources in each availability domain are isolated from the resources in the other availability domains, which provides fault tolerance. Availability domains don’t share infrastructure such as power or cooling, or the internal availability domain network. So, a failure at one availability domain is unlikely to affect the other availability domains in the region.
- Fault domains
A fault domain is a grouping of hardware and infrastructure within an availability domain. Each availability domain has three fault domains with independent power and hardware. When you place Compute instances across multiple fault domains, applications can tolerate physical server failure, system maintenance, and many common networking and power failures inside the availability domain.
- Virtual cloud network (VCN) and subnets
A VCN is a software-defined network that you set up in an Oracle Cloud Infrastructure region. VCNs can be segmented into subnets, which can be specific to a region or to an availability domain. Both region-specific and availability domain-specific subnets can coexist in the same VCN. A subnet can be public or private.
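Planning how a VCN's address range is divided into subnets can be sketched with Python's standard `ipaddress` module. This is a minimal sketch: the VCN range `10.0.0.0/16`, the `/24` subnet size, and the subnet names are hypothetical. It only plans address ranges; whether a subnet is regional or availability domain-specific, public or private, is a setting you choose when you create the subnet, not something the CIDR block determines.

```python
import ipaddress


def carve_subnets(vcn_cidr: str, new_prefix: int, names: list[str]) -> dict[str, ipaddress.IPv4Network]:
    """Assign consecutive, non-overlapping /new_prefix blocks of the VCN range to each name."""
    vcn = ipaddress.IPv4Network(vcn_cidr)
    blocks = vcn.subnets(new_prefix=new_prefix)  # generator of equal-size child networks
    plan = {}
    for name in names:
        try:
            plan[name] = next(blocks)
        except StopIteration:
            raise ValueError(f"{vcn_cidr} has no room for subnet {name!r} at /{new_prefix}") from None
    return plan


# Hypothetical subnets for this architecture.
plan = carve_subnets("10.0.0.0/16", 24, ["public-app", "private-db"])
for name, net in plan.items():
    print(name, net)  # public-app 10.0.0.0/24, then private-db 10.0.1.0/24
```

Because the blocks come from a single `subnets()` generator, they never overlap each other, which matches the requirement that subnets in one VCN have distinct ranges.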
- Autonomous Data Warehouse
An Oracle autonomous database that includes Oracle Machine Learning. Data scientists can build, evaluate, score, and deploy machine learning models using in-database Oracle Machine Learning features and the related Notebooks interface. Note that Autonomous Transaction Processing can also be used.
- User Application VM - Oracle Cloud Infrastructure
An Oracle Cloud Infrastructure Compute instance with Oracle Linux installed and ready for installation of tools and applications that need access to the database.
- Oracle Cloud Infrastructure Data Catalog
Oracle Cloud Infrastructure Data Catalog is a fully managed, self-service data discovery and governance solution for your enterprise data. Data Catalog provides a single collaborative environment to manage technical, business, and operational metadata.
- Oracle Analytics Cloud
Oracle Analytics Cloud empowers business analysts with modern, AI-powered, self-service analytics capabilities for data preparation, visualization, enterprise reporting, augmented analysis, and natural language processing and generation.
Oracle Analytics Cloud is integrated with Oracle Machine Learning. This allows analysts to list available in-database models and use those models in Oracle Analytics Cloud analytics and dashboards.
- Oracle Application Express (APEX)
APEX is a low-code development platform that enables you to build scalable and secure enterprise applications that can be deployed anywhere. APEX users can access models and results from Oracle Machine Learning.
- Security lists
For each subnet, you can create security rules that specify the source, destination, and type of traffic that must be allowed in and out of the subnet.
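The matching logic of a security list can be modeled locally as an allow-list: traffic is permitted only if at least one rule matches its address, protocol, and port. This is a simplified sketch, not the OCI API; the `Rule` and `SecurityList` classes and the example CIDR ranges are hypothetical, and real security rules also distinguish stateful from stateless rules and support ICMP options, which this model ignores.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    cidr: str          # source CIDR for ingress rules, destination CIDR for egress rules
    protocol: str      # "tcp", "udp", or "all" in this simplified model
    port_min: int = 0
    port_max: int = 65535

    def matches(self, addr: str, protocol: str, port: int) -> bool:
        in_range = ipaddress.ip_address(addr) in ipaddress.ip_network(self.cidr)
        proto_ok = self.protocol in ("all", protocol)
        return in_range and proto_ok and self.port_min <= port <= self.port_max


@dataclass
class SecurityList:
    ingress: list[Rule]
    egress: list[Rule]

    def allows_ingress(self, src: str, protocol: str, port: int) -> bool:
        # Allow-list semantics: anything that no rule matches is dropped.
        return any(r.matches(src, protocol, port) for r in self.ingress)

    def allows_egress(self, dst: str, protocol: str, port: int) -> bool:
        return any(r.matches(dst, protocol, port) for r in self.egress)


# Hypothetical rules: SSH only from a corporate range, HTTPS egress to anywhere.
app_subnet = SecurityList(
    ingress=[Rule("203.0.113.0/24", "tcp", 22, 22)],
    egress=[Rule("0.0.0.0/0", "tcp", 443, 443)],
)
print(app_subnet.allows_ingress("203.0.113.10", "tcp", 22))  # True
print(app_subnet.allows_ingress("198.51.100.7", "tcp", 22))  # False
```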
- Internet Gateway
An internet gateway is a virtual router that provides direct access to the internet. It supports connections initiated from within the VCN (egress) and connections initiated from the internet (ingress).
- Dynamic Routing Gateway
A dynamic routing gateway connects an on-premises network to the VCN through either IPSec VPN or Oracle Cloud Infrastructure FastConnect. It allows traffic that uses private IPv4 addresses to flow between the VCN and networks outside the VCN's region.
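How a subnet's route table sends traffic either to the internet gateway or to the dynamic routing gateway can be sketched as a longest-prefix match: the most specific rule that contains the destination wins. This sketch assumes that routing behavior; the route table, the on-premises range `192.168.0.0/16`, and the target labels are hypothetical and stand in for real route rules and gateway OCIDs.

```python
import ipaddress
from typing import Optional

# Hypothetical route table for the public application subnet:
# on-premises traffic goes to the DRG, everything else to the internet gateway.
ROUTES = [
    ("0.0.0.0/0", "internet-gateway"),
    ("192.168.0.0/16", "dynamic-routing-gateway"),
]


def next_hop(dest: str, routes=ROUTES) -> Optional[str]:
    """Return the target of the most specific (longest-prefix) rule that contains dest."""
    addr = ipaddress.ip_address(dest)
    best = None
    for cidr, target in routes:
        net = ipaddress.ip_network(cidr)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, target)
    return best[1] if best else None


print(next_hop("8.8.8.8"))       # internet-gateway
print(next_hop("192.168.10.5"))  # dynamic-routing-gateway
```

The `/16` rule beats the `/0` default route for on-premises addresses because it is more specific, which is why both rules can coexist in one table.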
Your requirements might differ from the architecture described here. Use the following recommendations as a starting point.
- VCN
When you create the VCN, determine how many IP addresses your cloud resources in each subnet require. Using the Classless Inter-Domain Routing (CIDR) notation, specify a subnet mask and a network address range that's large enough for the required IP addresses. Use an address range that's within the standard private IP address space.
Select an address range that doesn’t overlap with your on-premises network, so that you can set up a connection between the VCN and your on-premises network, if necessary.
After you create a VCN, you can't change its address range.
When you design the subnets, consider your traffic flow and security requirements. Attach all the compute instances within the same tier or role to the same subnet, which can serve as a security boundary.
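The sizing and overlap checks described above can be sketched with Python's `ipaddress` module. Two values here are assumptions to verify against the OCI Networking documentation: that each subnet reserves three addresses (the first two and the last), and that /30 is the smallest allowed subnet. The on-premises range is hypothetical, and "standard private IP address space" is taken to mean the RFC 1918 ranges.

```python
import ipaddress

RESERVED_PER_SUBNET = 3  # assumption: first two and last address of each subnet are reserved
SMALLEST_PREFIX = 30     # assumption: smallest subnet size allowed

RFC1918 = [ipaddress.IPv4Network(c) for c in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


def smallest_prefix(hosts_needed: int, reserved: int = RESERVED_PER_SUBNET) -> int:
    """Largest prefix length whose block still holds hosts_needed usable addresses."""
    total = hosts_needed + reserved
    host_bits = (total - 1).bit_length()  # smallest n with 2**n >= total
    return min(32 - host_bits, SMALLEST_PREFIX)


def is_private(cidr: str) -> bool:
    net = ipaddress.IPv4Network(cidr)
    return any(net.subnet_of(r) for r in RFC1918)


def overlaps_any(cidr: str, others: list[str]) -> bool:
    net = ipaddress.IPv4Network(cidr)
    return any(net.overlaps(ipaddress.IPv4Network(o)) for o in others)


on_prem = ["192.168.0.0/16"]  # hypothetical on-premises ranges
vcn = "10.0.0.0/16"
print(smallest_prefix(200))                           # 24
print(is_private(vcn), overlaps_any(vcn, on_prem))    # True False
```

Running these checks before creating the VCN matters because the address range can't be changed afterward.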
- Security lists
Use security lists to define ingress and egress rules that apply to the entire subnet.
- Compute Shapes
Use the VM.Standard2.4 Compute shape, which has 4 OCPUs and 60 GB of memory. Use autoscaling to adjust the number of instances up or down automatically.
Assign public IP addresses so that data scientists can log in and install their tools and applications. Set up security lists and network security groups to prevent unauthorized access.
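A threshold-based autoscaling policy can be sketched as follows: scale out when average CPU is above an upper threshold, scale in when it is below a lower one, and always stay within the minimum and maximum instance counts. This is a local model of the decision logic, not an OCI API call; the `ThresholdPolicy` class and all threshold and limit values are hypothetical and should be tuned to your workload.

```python
from dataclasses import dataclass


@dataclass
class ThresholdPolicy:
    # Hypothetical values; tune to your workload.
    min_instances: int = 1
    max_instances: int = 4
    scale_out_above: float = 75.0  # average CPU %, add instances when exceeded
    scale_in_below: float = 25.0   # average CPU %, remove instances when below
    step: int = 1

    def desired_count(self, current: int, avg_cpu: float) -> int:
        if avg_cpu > self.scale_out_above:
            target = current + self.step
        elif avg_cpu < self.scale_in_below:
            target = current - self.step
        else:
            target = current
        # Clamp to the configured pool limits.
        return max(self.min_instances, min(self.max_instances, target))


policy = ThresholdPolicy()
print(policy.desired_count(2, 90.0))  # 3
print(policy.desired_count(4, 90.0))  # 4, already at the maximum
print(policy.desired_count(1, 10.0))  # 1, already at the minimum
```

The gap between the two thresholds keeps the pool from oscillating when utilization hovers near a single cutoff.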
- Autonomous Data Warehouse
Create a separate schema for exclusive use by data scientists. Grant the schema read-only access to the main data warehouse schema. This arrangement allows data scientists to create local views of data for exploration, analysis, and model building. Where needed, data scientists can copy shared data into their own schema and modify it locally.
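The grants for this arrangement can be generated as Oracle SQL statements for an administrator to run. This is a minimal sketch: the schema names `ds_work` and `dw_main`, the table names, and the `data_science_grants` helper are hypothetical, and the data science schema is assumed to exist already. The `SELECT` privileges are granted directly to the schema rather than through a role, because Oracle requires direct grants on another schema's tables for a user to create views over them.

```python
import re

# Simple unquoted Oracle identifiers: a letter, then letters, digits, _, $, or #.
_IDENT = re.compile(r"^[A-Za-z][A-Za-z0-9_$#]{0,127}$")


def _ident(name: str) -> str:
    # Reject anything that isn't a plain identifier to avoid SQL injection.
    if not _IDENT.match(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name.upper()


def data_science_grants(ds_schema: str, dw_schema: str, tables: list[str]) -> list[str]:
    ds, dw = _ident(ds_schema), _ident(dw_schema)
    # Let data scientists connect and build their own tables and views.
    stmts = [f"GRANT CREATE SESSION, CREATE TABLE, CREATE VIEW TO {ds}"]
    # Read-only access to the shared warehouse tables.
    stmts += [f"GRANT SELECT ON {dw}.{_ident(t)} TO {ds}" for t in tables]
    return stmts


for stmt in data_science_grants("ds_work", "dw_main", ["sales", "customers"]):
    print(stmt + ";")
```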
- Security
Use policies to restrict who can access the Oracle Cloud Infrastructure resources that your company has, and how they can access them.
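OCI IAM policy statements follow the form `Allow group <group> to <verb> <resource-type> in compartment <compartment>`, where the verbs, from least to most permissive, are `inspect`, `read`, `use`, and `manage`. The sketch below only builds statement strings; the group `DataScientists`, the compartment `ml-project`, and the chosen verbs are hypothetical examples of least-privilege access, not a recommended policy set.

```python
VERBS = ("inspect", "read", "use", "manage")  # least to most permissive


def policy_statement(group: str, verb: str, resource_type: str, compartment: str) -> str:
    if verb not in VERBS:
        raise ValueError(f"unknown verb: {verb}")
    return f"Allow group {group} to {verb} {resource_type} in compartment {compartment}"


# Hypothetical group and compartment names.
statements = [
    policy_statement("DataScientists", "use", "autonomous-database-family", "ml-project"),
    policy_statement("DataScientists", "read", "virtual-network-family", "ml-project"),
]
for s in statements:
    print(s)
```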
- Application availability
Fault domains provide the best resilience within a single availability domain. You can deploy Compute instances that perform the same tasks in multiple availability domains. This design removes a single point of failure by introducing redundancy.
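Spreading instances that perform the same task can be sketched as a round-robin over (availability domain, fault domain) slots, ordered so that consecutive instances land in different availability domains when several exist, and in different fault domains within each. The fault domain names match OCI's `FAULT-DOMAIN-1` to `FAULT-DOMAIN-3`; the availability domain labels, instance names, and the `spread` helper are hypothetical.

```python
from itertools import cycle

FAULT_DOMAINS = ("FAULT-DOMAIN-1", "FAULT-DOMAIN-2", "FAULT-DOMAIN-3")


def spread(instances: list[str], availability_domains: list[str]) -> dict[str, tuple[str, str]]:
    """Assign each instance an (AD, fault domain) slot, cycling so neighbors don't share a slot."""
    if not availability_domains:
        raise ValueError("need at least one availability domain")
    # Vary the AD fastest, then the fault domain.
    slots = [(ad, fd) for fd in FAULT_DOMAINS for ad in availability_domains]
    return dict(zip(instances, cycle(slots)))


placement = spread(["app-1", "app-2", "app-3", "app-4"], ["AD-1"])
for name, slot in placement.items():
    print(name, slot)  # app-4 wraps around to ("AD-1", "FAULT-DOMAIN-1")
```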
Evaluate your requirements to choose the appropriate Compute shapes.
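One way to turn estimated requirements into a shape choice is to pick the smallest shape that covers both the OCPU and memory estimates. This sketch uses a few fixed VM.Standard2 shapes (15 GB of memory per OCPU); verify the values against the current OCI shape documentation, and treat the `smallest_shape` helper and the requirement figures as hypothetical.

```python
from typing import Optional

# (name, OCPUs, memory in GB); verify against current OCI shape documentation.
SHAPES = [
    ("VM.Standard2.1", 1, 15),
    ("VM.Standard2.2", 2, 30),
    ("VM.Standard2.4", 4, 60),
    ("VM.Standard2.8", 8, 120),
]


def smallest_shape(ocpus_needed: float, memory_gb_needed: float) -> Optional[str]:
    """Return the smallest listed shape that meets both requirements, or None."""
    for name, ocpus, mem in sorted(SHAPES, key=lambda s: s[1]):
        if ocpus >= ocpus_needed and mem >= memory_gb_needed:
            return name
    return None


print(smallest_shape(2, 40))  # VM.Standard2.4, since memory rules out VM.Standard2.2
```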
- Monitoring and alerts
Set up monitoring and alerts on CPU and memory usage for your nodes so that you can scale the shape up or down as needed.
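The evaluation behind such alerts can be sketched locally: average the most recent samples of each metric and compare the result with an upper and a lower threshold. In OCI Monitoring this would typically be an alarm on compute agent metrics such as `CpuUtilization` and `MemoryUtilization`; the sketch does not call that service, and the `recommend` helper, sample values, window size, and thresholds are all hypothetical.

```python
from statistics import mean


def recommend(samples: dict[str, list[float]], high: dict[str, float],
              low: dict[str, float], window: int = 5) -> dict[str, str]:
    """Classify each metric by the mean of its last `window` samples."""
    result = {}
    for metric, values in samples.items():
        recent = values[-window:]
        if not recent:
            continue  # no data yet for this metric
        avg = mean(recent)
        if avg > high[metric]:
            result[metric] = "scale up"
        elif avg < low[metric]:
            result[metric] = "scale down"
        else:
            result[metric] = "ok"
    return result


# Hypothetical utilization percentages, oldest first.
samples = {"cpu": [60, 85, 90, 92, 88, 91], "memory": [20, 18, 22, 19, 21]}
print(recommend(samples, high={"cpu": 80, "memory": 85}, low={"cpu": 20, "memory": 30}))
# {'cpu': 'scale up', 'memory': 'scale down'}
```

Averaging over a window rather than reacting to a single sample avoids resizing the shape because of a short spike.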