Standardize Healthcare Data Using Analytics and AI Architecture

Modernize and standardize healthcare data, apply data models, and extract actionable intelligence to gain insights and improve customer experience.

Payer, provider, and claims data can be enriched and then analyzed with advanced analytics techniques, including artificial intelligence. Use cases include patient care and disease prevention, evidence-based decision making in pre-authorization, claim fraud analysis, detection, and prevention, and optimization of medical alarm parameters for hospitals and healthcare providers.

Architecture

This architecture helps healthcare organizations accelerate the digitization and modernization of their business functions using their existing data.

Oracle Cloud Infrastructure (OCI) services can be used to ingest, process, and analyze data to gain business intelligence, improve customer experience, and enhance operational efficiency. Oracle offers a comprehensive and fully integrated stack of cloud applications and cloud platform services.

OCI provides an easy and flexible way to deploy and scale large language models. Oracle provides various choices for applying artificial intelligence to your business applications and accelerating innovation: Oracle’s SaaS solutions, the Data and AI platform, and high-performance compute, storage, and network infrastructure, which Oracle positions as lower cost and higher performance than other cloud providers, for building, testing, deploying, and using state-of-the-art AI applications. If you’re new to OCI, you can try out this solution for free using Oracle Cloud Free Tier, which provides US$300 free trial credits for a 30-day period. Free Tier also includes several Always Free services that are available for an unlimited time, even after your free credits expire.

In this reference architecture on Oracle Cloud, you can implement Zero Trust security, data protection and privacy, and automated logging and monitoring solutions. Data at rest and in transit can be encrypted using industry-standard encryption technologies. System logging and application performance monitoring can be implemented using OCI Logging, and a web application firewall can be used along with OCI API Gateway to protect against potential DDoS attacks and other cyber threats.

The following diagram illustrates this reference architecture.



oci-healthcare-lifescience-aiml-oracle.zip

The architecture has the following components:

  • Data Integration

    Oracle Cloud Infrastructure Data Integration is a fully managed, serverless, cloud-native service that extracts, loads, transforms, cleanses, and reshapes data from a variety of data sources into target Oracle Cloud Infrastructure services, such as Autonomous Data Warehouse and Oracle Cloud Infrastructure Object Storage. Users design data integration processes using an intuitive, codeless user interface that optimizes integration flows to generate the most efficient engine and orchestration, automatically allocating and scaling the execution environment.

    ETL (extract transform load) leverages fully-managed, scale-out processing on Spark, and ELT (extract load transform) leverages full SQL push-down capabilities of the Autonomous Data Warehouse in order to minimize data movement and to improve the time to value for newly ingested data.

    Oracle Cloud Infrastructure Data Integration provides interactive exploration and data preparation, and helps data engineers protect against schema drift by defining rules to handle schema changes.

  • GoldenGate

    Oracle Cloud Infrastructure GoldenGate is a managed service that provides a real-time data mesh platform. It uses replication to keep data highly available and enables real-time analysis. Customers can design, execute, and monitor their data replication and stream data processing solutions without the need to allocate or manage compute environments.

  • Object storage

    Oracle Cloud Infrastructure Object Storage provides quick access to large amounts of structured and unstructured data of any content type, including database backups, analytic data, and rich content such as images and videos. You can safely and securely store and then retrieve data directly from the internet or from within the cloud platform. You can scale storage without experiencing any degradation in performance or service reliability. Use standard storage for "hot" data that you need to access quickly and frequently. Use archive storage for "cold" data that you retain for long periods of time and rarely access.

  • Functions

    Oracle Cloud Infrastructure Functions is a fully managed, multitenant, highly scalable, on-demand, Functions-as-a-Service (FaaS) platform. It is powered by the Fn Project open source engine. OCI Functions enables you to deploy your code, and either call it directly or trigger it in response to events. OCI Functions uses Docker containers hosted in Oracle Cloud Infrastructure Registry.

  • Data Flow

    Oracle Cloud Infrastructure Data Flow is a fully managed service for running Apache Spark applications. It lets developers focus on their applications and provides an easy runtime environment to run them. It has a simple user interface with API support for integration with applications and workflows.

  • Autonomous Data Warehouse

    Oracle Autonomous Data Warehouse is a self-driving, self-securing, self-repairing database service that is optimized for data warehousing workloads. You do not need to configure or manage any hardware, or install any software. Oracle Cloud Infrastructure handles creating, backing up, patching, upgrading, and tuning the database.

  • File storage

    Oracle Cloud Infrastructure File Storage provides a durable, scalable, secure, enterprise-grade network file system. You can connect to OCI File Storage from any bare metal, virtual machine, or container instance in a VCN. You can also access OCI File Storage from outside the VCN by using Oracle Cloud Infrastructure FastConnect and IPSec VPN.

  • Slurm scheduler and database (open source)

    Slurm is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small Linux clusters.

  • Monitoring

    The Oracle Cloud Infrastructure Monitoring service actively and passively monitors your cloud resources. It uses metrics to track resources and alarms to notify you when these metrics meet alarm-specified triggers.

  • Logging

    Logging is a highly scalable and fully managed service that provides access to the following types of logs from your resources in the cloud:

    • Audit logs: Logs related to events emitted by the Audit service.
    • Service logs: Logs emitted by individual services such as API Gateway, Events, Functions, Load Balancing, Object Storage, and VCN flow logs.
    • Custom logs: Logs that contain diagnostic information from custom applications, other cloud providers, or an on-premises environment.

  • Compute

    With Oracle Cloud Infrastructure Compute, you can provision and manage compute hosts in the cloud. You can launch compute instances with shapes that meet your resource requirements for CPU, memory, network bandwidth, and storage. After creating a compute instance, you can access it securely, restart it, attach and detach volumes, and terminate it when you no longer need it.

  • Vector Database (Oracle DB 23ai)

    A vector database is any database that can natively store and manage vector embeddings and handle the unstructured data they describe, such as documents, images, video, or audio.

  • Data Catalog

    Oracle Cloud Infrastructure Data Catalog is a fully managed, self-service data discovery and governance solution for your enterprise data. It provides data engineers, data scientists, data stewards, and chief data officers a single collaborative environment to manage the organization's technical, business, and operational metadata.

  • LangChain

    LangChain is an open source, modular framework for creating applications from large language models (LLMs). You can use LangChain to build chatbots, analyze text, perform Q&A from structured data, interact with APIs, and create applications that use generative AI.

  • Integration

    Oracle Integration is a fully managed, preconfigured environment that allows you to integrate cloud and on-premises applications, automate business processes, and develop visual applications. It uses an SFTP-compliant file server to store and retrieve files and allows you to exchange documents with business-to-business trading partners by using a portfolio of hundreds of adapters and recipes to connect with Oracle and third-party applications.
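
To make the vector database component more concrete, the following is a minimal sketch of the similarity search that such a database performs over stored embeddings. It is plain Python, not the Oracle Database 23ai API. The document names and the three-dimensional vectors are hypothetical; real embeddings come from an embedding model and have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, store, k=2):
    """Return the k (doc_id, score) pairs whose vectors are most similar to the query."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in store.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical embeddings, made up for illustration only.
STORE = {
    "discharge_summary": [0.9, 0.1, 0.0],
    "claim_form": [0.1, 0.9, 0.1],
    "radiology_report": [0.8, 0.2, 0.1],
}

if __name__ == "__main__":
    for doc_id, score in top_k([1.0, 0.0, 0.0], STORE):
        print(f"{doc_id}: {score:.3f}")
```

A production vector store replaces the linear scan in top_k with an index, but the ranking idea is the same: the nearest vectors are treated as the most relevant content.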

Data ingestion and processing

  • Oracle Cloud provides comprehensive hybrid and multicloud solutions that integrate data across on-premises systems, other cloud platforms, and the internet. OCI GoldenGate and the Data Integration platform-as-a-service can be used to ingest data from a variety of source systems, depending on the type of source. OCI GoldenGate can be used to replicate data, keep it in sync, and repair it to maintain data integrity and consistency. Oracle Integration can connect to various enterprise applications and ingest data. Bulk data transfers can be done using secure FTP, HL7 v2 over MLLP (Minimal Lower Layer Protocol), and standard Fast Healthcare Interoperability Resources (FHIR) web services.
  • Data from healthcare systems, such as electronic health records (EHRs), patient information, claims and provider data, data from medical devices, and genomic information, can be moved to Oracle's highly available, durable, and low-cost object storage as a staging area.
  • OCI Functions can trigger OCI Data Flow to process raw data as new data arrives in the staging area. Data Flow provides a serverless, Spark-based, accelerated data preparation and processing service. You can write code in PySpark, SQL, or Java, based on your preference, without managing or maintaining any infrastructure.
  • Prepared and processed data can be written into Oracle Autonomous Data Warehouse and OCI Object Storage as a curated stage for downstream processing and consumption. Oracle Autonomous Data Warehouse is an industry-leading, fully managed analytical database platform with built-in scalability, security, management, and high availability. For healthcare data, privacy and protection of personally identifiable information (PII) is of the utmost importance. Oracle Autonomous Data Warehouse always encrypts data at rest (AES-256). Data is also encrypted in transit using TLS 1.2 or later. Oracle Data Safe, which is included with Autonomous Database, provides a unified control center that helps you manage the day-to-day security and compliance requirements of Oracle databases. Oracle Data Safe provides advanced data security features required by healthcare, such as data masking, data obfuscation, activity auditing, and SQL firewall management.
  • In the AI layer, the solution consists of data integration, AI integration, GPU and CPU clusters for LLM training and inference, AI development tools and libraries, and context and catalog. For models, the OCI Generative AI service provides industry-leading, state-of-the-art AI models from Cohere and Meta (Llama 3.1), a dedicated high-performance GPU cluster, a Chat API and playground, and integration with LangChain, LlamaIndex, and other open source tools. Oracle offers a broad range of GPUs, such as L40S, A10, A100, and H100, with attractive price-performance compared with other hyperscalers.
  • Oracle offers a fully automated Slurm scheduler that is ready to deploy as part of cluster deployment automation.
  • OCI offers Retrieval-Augmented Generation (RAG) as a managed service with OCI GenAI Agents (the service is still in beta and supports only OpenSearch as the knowledge base repository). Oracle Database 23ai and Oracle HeatWave MySQL are well suited to storing vectors and running AI vector search. Using RAG, organizations can enrich large language model (LLM) responses with a knowledge base built from their existing data. For example, when a user asks a question, the system retrieves pieces of information that provide additional context, adds them to the question, and then supplies the question and the retrieved text to an LLM to augment the LLM’s response and reduce hallucination.
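
The RAG flow described in the last item (retrieve, add to the question, send to the LLM) can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the OCI GenAI Agents API: the retriever is a toy word-overlap ranker standing in for vector search, echo_llm is a placeholder for a real model call, and the knowledge base passages are made up.

```python
def tokenize(text):
    """Lowercase, split on whitespace, and strip basic punctuation."""
    return {word.strip(".,?!:;").lower() for word in text.split()} - {""}


def retrieve(question, passages, k=2):
    """Toy retriever: rank passages by the number of words shared with the question."""
    q_words = tokenize(question)
    ranked = sorted(passages, key=lambda p: len(q_words & tokenize(p)), reverse=True)
    return ranked[:k]


def build_prompt(question, context_passages):
    """Add the retrieved passages to the question as context."""
    context = "\n".join(f"- {p}" for p in context_passages)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )


def answer(question, passages, llm, k=2):
    """Retrieve, augment, then generate. `llm` is any callable from prompt to text."""
    return llm(build_prompt(question, retrieve(question, passages, k)))


def echo_llm(prompt):
    """Placeholder for a real model call (assumption): returns the prompt unchanged."""
    return prompt


# Hypothetical knowledge base passages, made up for illustration only.
KNOWLEDGE_BASE = [
    "Pre-authorization requests are reviewed within two business days.",
    "Claims must be submitted within 90 days of the date of service.",
    "Alarm thresholds for heart rate are reviewed by clinical staff every quarter.",
]

if __name__ == "__main__":
    question = "How quickly are pre-authorization requests reviewed?"
    print(answer(question, KNOWLEDGE_BASE, echo_llm, k=1))
```

Because the model only sees the passages that the retriever selected, the quality of the retrieval step largely determines how well the response is grounded.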

Recommendations

Use the following recommendations as a starting point. Your requirements might differ from the architecture described here.
  • Use a private virtual cloud network (VCN) to deploy services, and use security lists and network security groups (NSGs) to restrict unintended access.
  • Use OCI Identity and Access Management to apply the principle of least privilege and role-based access controls.
  • OCI API Gateway enables you to publish APIs with private endpoints that are accessible from within your network, and which you can expose to the public internet if required. The endpoints support API validation, request and response transformation, CORS, authentication and authorization, and request limiting.
  • OCI supports HIPAA, FedRAMP, and other standard compliance programs to help you meet regulatory and data protection obligations.
  • Use open source technologies and open standards, such as LangChain, REST APIs, and Functions, to avoid vendor lock-in on OCI, and build an abstraction layer on top of them to accelerate innovation and transformation.
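
One way to read the abstraction-layer recommendation is that application code should depend on a small interface rather than on a specific model provider. The sketch below shows that idea in plain Python. All names (TextGenerator, UppercaseStub, PrefixStub, summarize_note) are hypothetical, and the two backends are stand-ins for real provider clients.

```python
from typing import Protocol


class TextGenerator(Protocol):
    """Minimal interface that the application depends on (hypothetical)."""

    def generate(self, prompt: str) -> str: ...


class UppercaseStub:
    """Stand-in backend; a real one would call a hosted model."""

    def generate(self, prompt: str) -> str:
        return prompt.upper()


class PrefixStub:
    """Second stand-in backend, used to show that backends are swappable."""

    def __init__(self, prefix: str) -> None:
        self.prefix = prefix

    def generate(self, prompt: str) -> str:
        return f"{self.prefix}{prompt}"


def summarize_note(note: str, backend: TextGenerator) -> str:
    """Application logic: knows only the TextGenerator interface, not the provider."""
    return backend.generate(f"Summarize: {note}")


if __name__ == "__main__":
    print(summarize_note("patient stable", UppercaseStub()))
    print(summarize_note("patient stable", PrefixStub("[stub] ")))
```

Switching providers then means writing one new class that implements generate, while summarize_note and the rest of the application stay unchanged.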

Acknowledgments

  • Authors: Gautam Karmakar
  • Contributors: John Sulyok