Design a data lakehouse for health insurance analytics

Health insurance providers need to analyze data across varied data sources for improving the claims management customer experience and preventing fraud. The data sources can include web transactions, office visits, phone transcriptions, and more; often siloed in multiple systems, without common storage, processing, or visualization tools.

Health insurance providers can capture a wealth of data and turn to cloud-based big data solutions to aggregate and manage data. An Oracle Cloud Infrastructure (OCI) data lakehouse architecture can be used to capture, manage and gain insights from data. With a lakehouse architecture, you can store and curate structured and unstructured data and access closely integrated services for organizing large volumes of highly diverse data from multiple sources.

An OCI data lakehouse architecture provides data lake and data warehouse capabilities, along with tightly integrated services such as Data Science and Oracle Analytics Cloud for a complete, advanced analytics experience.

The following reference architecture introduces a platform topology, component overview, and recommended best practices.

Architecture

The two main elements of this reference architecture are Oracle Health Insurance (OHI) and Oracle Revenue Management and Billing (RMB), delivered as Software-as-a-Service on Oracle Cloud. The architecture supports multiple scenarios across integrated healthcare networks based on the Data Science service, combining Autonomous Data Warehouse (ADW) and Data Lake capabilities. In addition, this architecture also uses services such as Data Catalog and Oracle Analytics Cloud.

  • Oracle Health Insurance Management System

    Enables insurers to simplify healthcare IT, achieve greater levels of operational efficiency, reduce costs, and adapt quickly to ongoing market and regulatory demands. It has the flexibility to manage different business rules on a single platform, it is scalable for small and large healthcare providers, and provides a unified experience for professionals and customers. It is transparent for claim adjudications and claims processing for a better customer experience, while ensuring the demanded security on the cloud.

  • Oracle Revenue Management and Billing for Healthcare Payers

    State-of-the-art business service that provides:

    • Streamlined and automated billing, payments, and collections processes to enable accurate and timely access to billing information and controlling the revenue leakages.
    • Supports exchange, individual billing, and group billing, Administrative Services Only billing with stop loss, government plans, and more, while minimizing cost and risk via a HIPAA-compliant cloud solution.
    • Improves customer satisfaction with timely and accurate billing and transparency, with the ability to scale to meet the demands of large volumes.
  • Oracle Integration Cloud (OIC)

    Enables integration for cloud and on-premises applications and can be used to:

    • Automate business processes
    • Gain insight into business processes
    • Develop visual applications
    • Use an SFTP-compliant file server to store and retrieve files, and exchange business documents with a B2B trading partner
Oracle Cloud Infrastructure is categorized as a no-view cloud service provider and can support customers who are in scope for HIPAA.

Note:

A Business Associate Agreement is required for identifying and establishing the respective responsibilities of Oracle Cloud Infrastructure and the customer for appropriately safeguarding patient health information in accordance with HIPAA and any amending legislation.

The following diagram introduces the conceptual health insurance lakehouse reference architecture.

Description of healthcare-lakehouse-arch.png follows
Description of the illustration healthcare-lakehouse-arch.png

healthcare-lakehouse-arch.zip

In the representation above, we can find the following integrations:
  • OHI integrates with OIC via the Oracle Insurance Gateway (OIG).
  • OHI can be integrated directly with the OCI technical adapters (e.g. REST, SOAP, File/SFTP, B2B, ERP), or integrated through API Gateway, using serverless Oracle Functions service for custom transformations.
  • OIC sends data to the ADW in the data lakehouse through Data Integration.
  • Oracle RMB integrates with the data lakehouse / ADW using Oracle GoldenGate, with the possibility of additional integration via API Gateway to the OIC technical adapters.
  • Oracle ERP and EPM are integrated with each other and ERP integrates directly with the data lakehouse and OIC.
  • OIC, in this example, plays the additional role of integration with the customer data center, including PeopleSoft and Siebel CRM, and third-party applications.
  • This integration uses a Dynamic Routing Gateway (DRG) and FastConnect / VPN connectivity.
  • ADW uses a query accelerator for fast and seamless querying of the Object Storage data lake.
  • Oracle Analytics Cloud and Data Science integrate seamlessly with other services from the data lakehouse.

The architecture serves multiple purposes, including storing important data in a secure, reliable and quick retrieval storage, being the source for machine learning modules, and providing advanced visualization and reporting capabilities for internal and external usage.

The architecture has the following components:

  • Autonomous Data Warehouse

    Oracle Autonomous Data Warehouse is a self-driving, self-securing, self-repairing database service that is optimized for data warehousing workloads. You do not need to configure or manage any hardware, or install any software. Oracle Cloud Infrastructure handles creating the database, as well as backing up, patching, upgrading, and tuning the database.

  • Object Storage

    Object storage provides quick access to large amounts of structured and unstructured data of any content type, including database backups, analytic data, and rich content such as images and videos. You can safely and securely store and then retrieve data directly from the internet or from within the cloud platform. You can seamlessly scale storage without experiencing any degradation in performance or service reliability. Use standard storage for "hot" storage that you need to access quickly, immediately, and frequently. Use archive storage for "cold" storage that you retain for long periods of time and seldom or rarely access.

  • Data Catalog

    Oracle Cloud Infrastructure Data Catalog is a fully managed, self-service data discovery and governance solution for your enterprise data. It provides data engineers, data scientists, data stewards, and chief data officers a single collaborative environment to manage the organization's technical, business, and operational metadata.

  • Oracle Analytics Cloud

    Oracle Analytics Cloud is a scalable and secure public cloud service that empowers business analysts with modern, AI-powered, self-service analytics capabilities for data preparation, visualization, enterprise reporting, augmented analysis, and natural language processing and generation. With Oracle Analytics Cloud, you also get flexible service management capabilities, including fast setup, easy scaling and patching, and automated lifecycle management.

  • Data Science

    OCI Data Science is a fully managed, serverless platform for data science teams to build, train, and manage machine learning (ML) models using OCI. It can easily integrate with other OCI services like Autonomous Data Warehouse, Object Storage, and more. You can build and evaluate high-quality machine learning models, Increase business flexibility by putting enterprise-trusted data to work quickly, and support data-driven business objectives with easier deployment of ML models.

  • Oracle Data Integration

    Oracle Cloud Infrastructure Data Integration is a fully managed, serverless, cloud-native service that extracts, loads, transforms, cleanses, and reshapes data from a variety of data sources into target Oracle Cloud Infrastructure services, such as Autonomous Data Warehouse and Oracle Cloud Infrastructure Object Storage. ETL (extract transform load) leverages fully-managed scale-out processing on Spark, and ELT (extract load transform) leverages full SQL push-down capabilities of the Autonomous Data Warehouse in order to minimize data movement and to improve the time to value for newly ingested data. Users design data integration processes using an intuitive, codeless user interface that optimizes integration flows to generate the most efficient engine and orchestration, automatically allocating and scaling the execution environment. Oracle Cloud Infrastructure Data Integration provides interactive exploration and data preparation and helps data engineers protect against schema drift by defining rules to handle schema changes.

  • GoldenGate

    Oracle Cloud Infrastructure GoldenGate is a fully managed service that allows data ingestion from sources residing on premises or in any cloud, leveraging the GoldenGate CDC technology for a non intrusive and efficient capture of data and delivery to Oracle Autonomous Data Warehouse in real time and at scale in order to make relevant information available to consumers as quickly as possible.

  • Oracle Functions

    Oracle Functions is a fully managed, multitenant, highly scalable, on-demand, Functions-as-a-Service (FaaS) platform. It is powered by the Fn Project open source engine. Functions enable you to deploy your code, and either call it directly or trigger it in response to events. Oracle Functions uses Docker containers hosted in Oracle Cloud Infrastructure Registry.

  • API Gateway

    Oracle API Gateway service enables you to publish APIs with private endpoints that are accessible from within your network, and which you can expose to the public internet if required. The endpoints support API validation, request and response transformation, CORS, authentication and authorization, and request limiting.

  • Virtual cloud network (VCN) and subnets

    A VCN is a customizable, software-defined network that you set up in an Oracle Cloud Infrastructure region. Like traditional data center networks, VCNs give you complete control over your network environment. A VCN can have multiple non-overlapping CIDR blocks that you can change after you create the VCN. You can segment a VCN into subnets, which can be scoped to a region or to an availability domain. Each subnet consists of a contiguous range of addresses that don't overlap with the other subnets in the VCN. You can change the size of a subnet after creation. A subnet can be public or private.

  • Dynamic routing gateway (DRG)

    The DRG is a virtual router that provides a path for private network traffic between a VCN and a network outside the region, such as a VCN in another Oracle Cloud Infrastructure region, an on-premises network, or a network in another cloud provider.

  • FastConnect

    Oracle Cloud Infrastructure FastConnect provides an easy way to create a dedicated, private connection between your data center and Oracle Cloud Infrastructure. FastConnect provides higher-bandwidth options and a more reliable networking experience when compared with internet-based connections.

  • VPN Connect

    VPN Connect provides site-to-site IPSec VPN connectivity between your on-premises network and VCNs in Oracle Cloud Infrastructure. The IPSec protocol suite encrypts IP traffic before the packets are transferred from the source to the destination and decrypts the traffic when it arrives.

Recommendations

Your requirements might differ from the architecture described here. Use the following recommendations as a starting point.
  • VCN

    When you create a VCN, determine the number of CIDR blocks required and the size of each block based on the number of resources that you plan to attach to subnets in the VCN. Use CIDR blocks that are within the standard private IP address space.

    Select CIDR blocks that don't overlap with any other network (in Oracle Cloud Infrastructure, your on-premises data center, or another cloud provider) to which you intend to set up private connections.

    After you create a VCN, you can change, add, and remove its CIDR blocks.

    When you design the subnets, consider your traffic flow and security requirements. Attach all the resources within a specific tier or role to the same subnet, which can serve as a security boundary.

  • Security

    Use policies to restrict who can access the OCI resources that your company has and how they can access them.

    Use Oracle Cloud Guard to monitor and maintain the security of your resources in OCI proactively. Cloud Guard uses detector recipes that you can define to examine your resources for security weaknesses and to monitor operators and users for risky activities. When any misconfiguration or insecure activity is detected, Cloud Guard recommends corrective actions and assists with those actions, based on responder recipes that you can define. For resources that require maximum security, Oracle recommends that you use security zones. A security zone is a compartment associated with an Oracle-defined recipe of security policies that are based on best practices. For example, the resources in a security zone must not be accessible from the public internet and they must be encrypted using customer-managed keys. When you create and update resources in a security zone, OCI validates the operations against the policies in the security-zone recipe, and denies operations that violate any of the policies.

  • Autonomous Data Warehouse

    This architecture uses Oracle Autonomous Data Warehouse on shared infrastructure. Enable auto scaling to give the database workloads up to three times the processing power.

    Consider using Oracle Autonomous Data Warehouse on dedicated infrastructure if you want the self-service database capability within a private database cloud environment running on the public cloud.

    Consider using the hybrid partitioned tables feature of Autonomous Data Warehouse to move partitions of data to Oracle Cloud Infrastructure Object Storage and serve them to users and applications transparently. We recommend that you use this feature for data that is not often consumed and for which you don't need the same performance as for data stored within Autonomous Data Warehouse.

    Consider using the external tables feature to consume data stored in Oracle Cloud Infrastructure Object Storage in real time without the need to replicate it to Autonomous Data Warehouse. This feature transparently and seamlessly joins data sets curated outside of Autonomous Data Warehouse, regardless of the format (parquet, avro, orc, json, csv, and so on), with data residing Autonomous Data Warehouse.

    Consider using ADW query accelerator when consuming object storage data to deliver an improved and faster experience to users consuming and joining data between the data warehouse and the data lake.

  • Object Storage

    Object Storage offers reliable and cost-efficient data durability, it provides quick access to large amounts of structured and unstructured data of any content type, including database data, analytic data, images, videos and more. We recommend using standard storage to ingest data from external sources and use it for further processing since it can access quickly and frequently. You can build a lifecycle policy to move the data to cold storage from standard when it's no longer required frequently.

  • Data Catalog

    To have a complete and holistic end-to-end view of the data stored and flowing on the platform, consider harvesting not only data stores supporting the data persistence layer but also the source data stores. Mapping this harvested technical metadata to the business glossary and enriching it with custom properties allows you to map business concepts and to document and govern security and access definitions.

    To facilitate the creation of Oracle Autonomous Data Warehouse external tables that virtualize data stored on Oracle Cloud Infrastructure Object Storage, leverage the metadata previously harvested by Oracle Cloud Infrastructure Data Catalog. This simplifies the creation of external tables, enforces consistency of metadata across data stores, and is less susceptible to human error.

Explore More

Learn more about Data Lakehouse and Oracle Cloud Infrastructure.

Review these additional resources: