Data Science Service: Health care use cases
Oracle Cloud Infrastructure (OCI) Data Science is a fully managed, serverless platform that data science teams use to build, train, and manage machine learning models.
Data Science integrates with the rest of the OCI stack, including Oracle Functions, Data Flow, Autonomous Data Warehouse, and Object Storage. The Oracle Accelerated Data Science (ADS) software development kit (SDK) is a Python library that's included with the Data Science service. Its functions and objects automate or simplify the steps in the data science workflow, including connecting to data, exploring and visualizing data, training a model with AutoML, evaluating models, and explaining models. ADS also provides a simple interface to the Data Science service model catalog and other OCI services, including Object Storage.
Architecture
This flexible architecture is based on the Oracle Machine Learning service and combines the Autonomous Data Warehouse and Data Science platforms to support multiple scenarios across integrated health networks.
In addition to Data Science and Autonomous Data Warehouse, this architecture also uses Data Catalog, Oracle APEX Application Development, and Oracle Analytics Cloud. It also uses OCI Compute instances to host applications that can dynamically stream wearable device data to either Autonomous Data Warehouse or Object Storage. This architecture serves multiple purposes, including storing important data in secure, reliable, and quick-retrieval storage, and building and deploying the applications and machine learning modules in short periods of time.
The following diagram illustrates this reference architecture.
Description of the illustration architecture-datascience-use-cases.png
The architecture has the following components:
- Region
An Oracle Cloud Infrastructure region is a localized geographic area that contains one or more data centers, called availability domains. Regions are independent of other regions, and vast distances can separate them (across countries or even continents).
- Availability domains
Availability domains are standalone, independent data centers within a region. The physical resources in each availability domain are isolated from the resources in the other availability domains, which provides fault tolerance. Availability domains don’t share infrastructure such as power or cooling, or the internal availability domain network. So, a failure at one availability domain is unlikely to affect the other availability domains in the region.
- Fault domains
A fault domain is a grouping of hardware and infrastructure within an availability domain. Each availability domain has three fault domains with independent power and hardware. When you distribute resources across multiple fault domains, your applications can tolerate physical server failure, system maintenance, and power failures inside a fault domain.
- Virtual cloud network (VCN) and subnets
A VCN is a customizable, software-defined network that you set up in an Oracle Cloud Infrastructure region. Like traditional data center networks, VCNs give you complete control over your network environment. A VCN can have multiple non-overlapping CIDR blocks that you can change after you create the VCN. You can segment a VCN into subnets, which can be scoped to a region or to an availability domain. Each subnet consists of a contiguous range of addresses that don't overlap with the other subnets in the VCN. You can change the size of a subnet after creation. A subnet can be public or private.
- Data Science service
A fully managed, serverless platform for data science teams to build, train, and manage machine learning models. It can easily integrate with other OCI services such as Autonomous Data Warehouse, Object Storage, and more.
- Autonomous Data Warehouse
An Oracle autonomous database that includes Oracle Machine Learning. Data scientists can build, evaluate, score, and deploy machine learning models using in-database Oracle Machine Learning features and the related Notebooks interface. You can also use Autonomous Transaction Processing.
- Application VM
An OCI Compute instance with Oracle Linux installed and ready for installation of tools and applications that need access to the database.
- Data Catalog
OCI Data Catalog is a fully managed, self-service data discovery and governance solution for your enterprise data. Data Catalog provides a single collaborative environment to manage technical, business, and operational metadata.
- Oracle Analytics Cloud
Oracle Analytics Cloud empowers business analysts with modern, AI-powered, self-service analytics capabilities for data preparation, visualization, enterprise reporting, augmented analysis, and natural language processing and generation.
Oracle Analytics Cloud is integrated with Oracle Machine Learning. This integration allows analysts to list available in-database models and use those models in Oracle Analytics Cloud analytics and dashboards.
- APEX
Oracle APEX Application Development is a low-code development platform that enables you to build scalable and secure enterprise applications that you can deploy anywhere. It's included with Autonomous Database and requires no installation. APEX users can access models and results from Oracle Machine Learning.
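The fault domain component described above can be illustrated with a short placement sketch. This is a minimal example, not part of the reference architecture's code: it assigns a set of hypothetical application VM names to the three fault domains of one availability domain in round-robin order. The instance names and the `assign_fault_domains` helper are assumptions made for illustration, and nothing here calls an OCI API.

```python
from collections import Counter
from itertools import cycle

# Each availability domain has three fault domains, named like this in OCI.
FAULT_DOMAINS = ["FAULT-DOMAIN-1", "FAULT-DOMAIN-2", "FAULT-DOMAIN-3"]


def assign_fault_domains(instance_names):
    """Map each instance name to a fault domain in round-robin order."""
    return dict(zip(instance_names, cycle(FAULT_DOMAINS)))


# Hypothetical application VMs; the names are placeholders.
placement = assign_fault_domains(["app-vm-1", "app-vm-2", "app-vm-3", "app-vm-4"])

# Count how many instances landed in each fault domain.
per_domain = Counter(placement.values())

for name, domain in placement.items():
    print(f"{name} -> {domain}")
```

Round-robin assignment keeps the instance count per fault domain within one of each other, so a failure in a single fault domain affects at most roughly a third of the instances.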
Recommendations
Your requirements might differ from the architecture described here. Use the following recommendations as a starting point.
- VCN
When you create a VCN, determine the number of CIDR blocks required and the size of each block based on the number of resources that you plan to attach to subnets in the VCN. Use CIDR blocks that are within the standard private IP address space.
Select CIDR blocks that don't overlap with any other network (in Oracle Cloud Infrastructure, your on-premises data center, or another cloud provider) to which you intend to set up private connections.
After you create a VCN, you can change, add, and remove its CIDR blocks.
When you design the subnets, consider your traffic flow and security requirements. Attach all the resources within a specific tier or role to the same subnet, which can serve as a security boundary.
- Security
Use Oracle Cloud Guard to monitor and maintain the security of your resources in OCI proactively. Cloud Guard uses detector recipes that you can define to examine your resources for security weaknesses and to monitor operators and users for risky activities. When any misconfiguration or insecure activity is detected, Cloud Guard recommends corrective actions and assists with those actions, based on responder recipes that you can define.
For resources that require maximum security, Oracle recommends that you use security zones. A security zone is a compartment associated with an Oracle-defined recipe of security policies that are based on best practices. For example, the resources in a security zone must not be accessible from the public internet and they must be encrypted using customer-managed keys. When you create and update resources in a security zone, OCI validates the operations against the policies in the security-zone recipe, and denies operations that violate any of the policies.
- Autonomous Data Warehouse
Create a separate schema for exclusive use by data scientists. Grant the schema read-only access to the main data warehouse schema. This arrangement allows data scientists to create local views of data for exploration, analysis, and model building. Where needed, shared data can be copied into their own schema where they can modify it locally.
- Virtual Machines
The VMs are distributed across multiple fault domains for high availability. We recommend using a flexible VM shape for the compute instance; this will allow you to increase or decrease the capacity of the VMs in minutes.
- Object Storage
Object Storage offers reliable and cost-efficient data durability. It provides quick access to large amounts of structured and unstructured data of any content type, including database data, analytic data, images, and videos. We recommend using standard storage to ingest data from external sources because applications and users can access it quickly. You can build a lifecycle policy to move the data from standard storage to archive storage when it no longer needs to be accessed frequently.
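The VCN recommendations above (use private address space, avoid overlap with networks you plan to connect to, and segment the VCN into subnets) can be checked before you create any resources. The following sketch uses only Python's standard `ipaddress` module. The CIDR values, the `on_prem` list, and the `check_vcn_cidr` helper are hypothetical examples, not values taken from this architecture.

```python
import ipaddress

# RFC 1918 private IPv4 ranges.
PRIVATE_RANGES = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def check_vcn_cidr(candidate, existing_networks):
    """Return a list of problems found with a candidate VCN CIDR block."""
    block = ipaddress.ip_network(candidate)
    problems = []
    # The block should sit entirely inside one of the private ranges.
    if not any(block.subnet_of(private) for private in PRIVATE_RANGES):
        problems.append(f"{block} is outside the private address space")
    # The block should not overlap any network you plan to connect to.
    for other in existing_networks:
        other_net = ipaddress.ip_network(other)
        if block.overlaps(other_net):
            problems.append(f"{block} overlaps {other_net}")
    return problems


# Hypothetical networks that the VCN will connect to privately.
on_prem = ["10.0.0.0/16", "192.168.10.0/24"]

print(check_vcn_cidr("10.1.0.0/16", on_prem))
print(check_vcn_cidr("10.0.8.0/21", on_prem))

# Carve the first two /24 subnets out of an accepted VCN block.
vcn = ipaddress.ip_network("10.1.0.0/16")
public_subnet, private_subnet = list(vcn.subnets(new_prefix=24))[:2]
print(public_subnet, private_subnet)
```

An empty list from `check_vcn_cidr` means the candidate block passed both checks. Because `subnets()` splits the parent block into equal, contiguous ranges, the resulting subnets can't overlap each other.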
Considerations
Consider the following points when deploying this reference architecture.
- Security
Use policies to restrict who can access the OCI resources that your company has and how they can access them.
- Application availability
Fault domains provide the best resilience within a single availability domain. You can deploy Compute instances that perform the same tasks in multiple fault domains. This design removes a single point of failure by introducing redundancy.
- Cost
Evaluate your requirements to choose the appropriate Compute shapes.
- Monitoring and alerts
Set up monitoring and alerts on CPU and memory usage for your nodes so that you can scale the shape up or down as needed.
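The monitoring consideration above amounts to comparing recent CPU and memory utilization against thresholds and deciding whether to change the shape. The sketch below shows one simple way to express that decision. The threshold values, the sample data, and the `recommend_scaling` helper are assumptions for illustration; in practice the samples would come from your monitoring service, and the thresholds depend on your workload.

```python
from statistics import mean

# Illustrative thresholds, in percent. Tune them to your workload.
SCALE_UP_AT = 80.0
SCALE_DOWN_AT = 20.0


def recommend_scaling(cpu_samples, memory_samples):
    """Return 'scale up', 'scale down', or 'no change' from utilization samples."""
    cpu = mean(cpu_samples)
    memory = mean(memory_samples)
    # Scale up if either resource is under sustained pressure.
    if cpu >= SCALE_UP_AT or memory >= SCALE_UP_AT:
        return "scale up"
    # Scale down only if both resources are mostly idle.
    if cpu <= SCALE_DOWN_AT and memory <= SCALE_DOWN_AT:
        return "scale down"
    return "no change"


# Hypothetical utilization samples, in percent.
print(recommend_scaling([85, 90, 88], [40, 45, 50]))
print(recommend_scaling([10, 12, 8], [15, 18, 14]))
print(recommend_scaling([50, 55], [60, 58]))
```

Averaging over a window of samples, rather than reacting to a single reading, avoids resizing the instance in response to short spikes.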
Deploy
The code required to deploy this reference architecture is available in GitHub. You can pull the code into Oracle Cloud Infrastructure Resource Manager with a single click, create the stack, and deploy it. Alternatively, download the code from GitHub to your computer, customize the code, and deploy the architecture by using the Terraform CLI.
- Deploy by using Oracle Cloud Infrastructure Resource Manager:
- Click
If you aren't already signed in, enter the tenancy and user credentials.
- Review and accept the terms and conditions.
- Select the region where you want to deploy the stack.
- Follow the on-screen prompts and instructions to create the stack.
- After creating the stack, click Terraform Actions, and select Plan.
- Wait for the job to be completed, and review the plan.
To make any changes, return to the Stack Details page, click Edit Stack, and make the required changes. Then, run the Plan action again.
- If no further changes are necessary, return to the Stack Details page, click Terraform Actions, and select Apply.
- Deploy using the Terraform code in GitHub:
- Go to GitHub.
- Clone or download the repository to your local computer.
- Follow the instructions in the README document.