Application of a Data Lakehouse

A data lakehouse can accelerate and improve the diagnostic process, which increases a patient's chances of recovery.

The main difference between a PACS and a VNA is that a VNA can handle a greater variety of image formats beyond DICOM. However, either one can present challenges. The images demand a lot of storage capacity, so the storage system often holds only current case images. Older images are likely to be moved to lower-cost, possibly offline, storage, which makes accessing a patient’s image history challenging and time-consuming. The RIS and PACS/VNA solutions are likely to reside on-premises at the medical center, so access to the center's IT infrastructure may be required. This also means the medical center needs to employ storage specialists to ensure data security. A VNA solution is more likely to support participation in health information exchange (HIE), which enables healthcare providers to collaborate in sharing data, building secure networks, and using Fast Healthcare Interoperability Resources (FHIR) and related standards. Such networks may be regulated in some countries, as with the Health and Social Care Network (HSCN) in the UK, and proving compliance with regulations could add cost.

A data lakehouse can significantly accelerate the patient-journey process. Adopting a cloud-provided storage solution provides quick access to very high volumes of storage (even archived storage) at an economic scale that a medical center is unlikely to match with an on-premises solution. A cloud solution also reduces or removes the need for medical centers to maintain the specialist skills needed to manage redundancy and recovery problems to keep the solution operating.

Quicker access to images means that clinicians have more time to evaluate images, which reduces the time a patient spends waiting for a diagnosis and treatment. It also means that large volumes of images can be used to train artificial intelligence (AI) and machine learning (ML) capabilities (delivered by Oracle Cloud Infrastructure Vision and Oracle Cloud Infrastructure Data Science) on healthy versus unhealthy chests. This not only allows the AI/ML to investigate the primary issue (the patient’s current symptoms), but could also detect other issues for which the patient has yet to develop symptoms. Whether or not the AI/ML detects any issues while assessing the x-rays, the final say should always rest with a human. The image processing AI can help with that in several ways based on its assessment:

  • The AI can prioritize the x-rays that suggest significant issues where rapid intervention can make a difference to the patient’s prognosis. This could lead to the patient being admitted immediately rather than being sent home.
  • The AI can act as a second pair of eyes by highlighting anomalies to the assessing clinician. This can help identify subtle signs of secondary issues that a human may miss. A digital experience can be created with Oracle Functions (or Oracle Container Engine for Kubernetes), or a dashboard can be rendered with Oracle Visual Builder, Oracle JET, or APEX Service.
  • The data lakehouse will have a diverse range of related data that can be mixed with the medical analysis to draw additional insights from the patient data. This can help discover unexpected trends or insights the clinician may not be aware of.

    For example, notes in the patient’s medical records may add variables like whether the patient is a smoker, was a smoker, or comes from a household that includes smokers; or the patient’s contact details combined with geographical data may add variables associated with living conditions that could have complicating factors (e.g., the presence of dampness or asbestos).

    Without this assistance, assessing these factors would require the clinician to also think like a social worker and a realtor, and to spend more time reviewing the patient’s medical history in detail in addition to assessing the images.

    To achieve this mixing of data, we would store the semi-structured or fully structured data in a database like the Autonomous Database service, using Data Integration to link the datasets together. The unstructured and semi-structured data, such as the images in Oracle Cloud Infrastructure Object Storage, would be integrated by adding links to the structured data, such as the metadata associated with the images. The process of linking images to searchable data would be supported by the Oracle Cloud Infrastructure Data Labeling (data labeling) service. Data labeling can also be used to label structured data so that records representing outliers can be easily tagged and then explicitly included in or excluded from AI/ML processes.

  • The AI/ML’s findings would be reported as preliminary (or subject to a secondary clinician check) to the patient and their care provider, so other causes for the symptoms could be examined. The net result is that the patient is another step closer to a diagnosis. This communication could be achieved by pulling contact information from the patient’s records in the Autonomous Database and using services such as SMS or email, or by notifying the clinical staff through the dashboard or digital frontends mentioned previously.
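
The prioritization and data mixing described in the list above can be sketched roughly as follows. This is a minimal illustration in plain Python, not a use of the Oracle Cloud Infrastructure Vision or Data Labeling APIs. The names `ImageFinding`, `anomaly_score`, `object_uri`, and `build_worklist`, as well as the sample records, are hypothetical and exist only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageFinding:
    patient_id: str
    object_uri: str       # hypothetical link to the image in object storage
    anomaly_score: float  # hypothetical model output between 0 and 1


def build_worklist(findings, patient_records):
    """Order findings for human review, most urgent first, with patient context attached."""
    ordered = sorted(findings, key=lambda f: f.anomaly_score, reverse=True)
    worklist = []
    for finding in ordered:
        record = patient_records.get(finding.patient_id, {})
        worklist.append({
            "patient_id": finding.patient_id,
            "image": finding.object_uri,
            "anomaly_score": finding.anomaly_score,
            "smoking_status": record.get("smoking_status", "unknown"),
            # Every AI result stays preliminary until a clinician reviews it.
            "status": "preliminary",
        })
    return worklist


# Hypothetical sample data: image findings plus structured patient records.
findings = [
    ImageFinding("P1", "images/p1/chest-01.dcm", 0.35),
    ImageFinding("P2", "images/p2/chest-01.dcm", 0.92),
    ImageFinding("P3", "images/p3/chest-01.dcm", 0.10),
]
patient_records = {
    "P1": {"smoking_status": "current"},
    "P2": {"smoking_status": "former"},
}

worklist = build_worklist(findings, patient_records)
for item in worklist:
    print(item["patient_id"], item["anomaly_score"], item["smoking_status"])
```

The sketch only orders and annotates the review queue; it deliberately makes no decision about the patient, in keeping with the principle that a human has the final say.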

Because the patient’s data is not bound to the physical IT constraints of a medical center, external medical experts in other locations can refer to it. This can happen quickly and efficiently because the infrastructure to work with one or more HIEs will likely be in place, which extends capacity beyond that of a single care provider. This type of automation can be achieved through Oracle Integration Cloud Service (OIC) and an API gateway (in an outbound position, so we can control and audit data egress; if external APIs are being used on a pay-per-call basis, the outbound management allows those services to be controlled and overage charges to be avoided). Bulky data sharing would be managed through FTP (provided through OIC), but communication of non-API data would be signaled through API calls in the first instance.
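
The idea of an outbound control point that audits egress and caps pay-per-call usage can be illustrated with a small sketch. This is plain Python and assumes nothing about how the API gateway or OIC is actually configured; the class `OutboundGate`, the quota values, and the service names are hypothetical.

```python
from collections import Counter


class OutboundGate:
    """Allow outbound calls only while a per-service quota has headroom; log every decision."""

    def __init__(self, quotas):
        self.quotas = dict(quotas)   # service name -> maximum number of calls allowed
        self.used = Counter()        # service name -> calls made so far
        self.audit_log = []          # (service, description, decision) tuples

    def request(self, service, description):
        # Services without a configured quota are treated as not permitted.
        allowed = self.used[service] < self.quotas.get(service, 0)
        if allowed:
            self.used[service] += 1
        decision = "allowed" if allowed else "blocked"
        self.audit_log.append((service, description, decision))
        return allowed


# Hypothetical external service with a quota of two calls.
gate = OutboundGate({"second-opinion-api": 2})
results = [gate.request("second-opinion-api", f"case-{i}") for i in range(3)]
unknown = gate.request("unlisted-service", "case-9")
print(results, unknown)
print(gate.audit_log)
```

Keeping the quota check and the audit record in one place means blocked requests are logged as well as allowed ones, which is what makes the egress auditable.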

Figure - Patient experience scenario without data lakehouse on OCI



Figure - Patient experience scenario with data lakehouse on OCI



About Linking Socio-Economic Data to Clinical Image Processes

Data lakehouses, which can be used to ingest large volumes of data, can be connected to other data sources to facilitate efficient analysis for clinicians assessing their patients.

We’ve highlighted how using a data lakehouse allows us to ingest large volumes of data and mix different data sources to assist in providing suitable clinical treatment. The ability to link non-medical data with patients is crucial. Consider the COVID-19 pandemic or the Ebola outbreaks in West Africa, where such linked data has helped to identify and limit the spread of these viruses. Comparing a patient against other patients with similar infections and diseases, and against their outcomes, can also identify optimal treatments for those already infected: treatments that improve their chances and speed of recovery and reduce post-infection health issues (conditions ranging from depression and anxiety to exercise intolerance and fatigue).

To achieve this kind of linkage, we need to understand the data available to us. This is where the Oracle Cloud Infrastructure Data Catalog helps: it tracks not only the data stored within the lakehouse itself, but also data held in other sources that can be accessed and used through mechanisms such as APIs. The catalog can then be used to manage the data lakehouse’s content and to inform the analysis performed with the Oracle Cloud Infrastructure Data Science tools to determine relations in the data.

More common applications of mixing data include the identification and isolation of outbreaks such as Legionnaires’ disease, which has a 10% fatality rate and an estimated 20,000 cases per year in the United States alone. This type of identification would entail extracting relevant EMR/EHR records in the Autonomous Database, together with notes from patients about their movements (digitized with Oracle Cloud Infrastructure Vision), and combining them with geographical data using the Data Science toolset and the visualization capabilities of any of these products: Oracle Analytics Cloud, Oracle Visual Builder, Oracle JET, or APEX Service. The choice of tools depends on the desired user experience and the data to be presented.
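
As a rough illustration of combining patient movements with location data, the sketch below flags places visited by several distinct patients. It is a simplified stand-in for the analysis described above, not the Data Science toolset itself; the function `shared_locations`, the threshold `min_patients`, and the sample movements are hypothetical.

```python
from collections import defaultdict


def shared_locations(movements, min_patients=2):
    """Return locations visited by at least `min_patients` distinct patients."""
    visitors = defaultdict(set)
    for patient_id, location in movements:
        visitors[location].add(patient_id)  # a set ignores repeat visits by one patient
    return {
        location: sorted(patients)
        for location, patients in visitors.items()
        if len(patients) >= min_patients
    }


# Hypothetical (patient, location) pairs extracted from records and digitized notes.
movements = [
    ("P1", "Hotel A"),
    ("P2", "Hotel A"),
    ("P2", "Gym B"),
    ("P3", "Hotel A"),
    ("P1", "Hotel A"),
]

candidates = shared_locations(movements)
print(candidates)
```

A location that appears here is only a candidate for investigation; real outbreak analysis would also need to account for timing, exposure, and how many people visit the location overall.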

Organizations such as the Centers for Disease Control and Prevention (CDC) and the World Health Organization (WHO) provide APIs connected to data sets about contributing factors to health. For example, air quality data can be retrieved from API services like the World Air Quality Index. This data is vital for people with lung impairments like pneumonia, as air quality or toxicity levels are a significant aggravating factor and can be difficult to assess because they vary significantly even over the short distances between sensors. The data can be assessed using a combination of Oracle Integration Cloud Service, API gateway, Oracle Functions (or Oracle Container Engine for Kubernetes), Autonomous Database (structured content), and Oracle Cloud Infrastructure Object Storage (unstructured content). Data preparation or cleansing processes may use Oracle Cloud Infrastructure Data Science, Oracle Cloud Infrastructure Data Flow, or Oracle Cloud Infrastructure Data Integration to regulate data flow in or out of the platform. Streams would then complement these services with a Kafka-compatible API, allowing us to handle data as a series of events. This means that if an external service provides data in highly concentrated bursts, the data can be staged until we are ready to load it into the data lakehouse.
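
The staging behavior described above, where a burst of events is held until the lakehouse is ready to load it, can be sketched with an in-memory queue. This is a conceptual illustration in plain Python and does not use the Kafka API or any Oracle streaming SDK; `StagingBuffer`, `publish`, `drain`, and the sample sensor readings are hypothetical.

```python
from collections import deque


class StagingBuffer:
    """Hold incoming events in arrival order until a consumer is ready to load them."""

    def __init__(self):
        self._events = deque()

    def publish(self, event):
        self._events.append(event)

    def drain(self, max_batch):
        """Remove and return up to `max_batch` of the oldest staged events."""
        batch = []
        while self._events and len(batch) < max_batch:
            batch.append(self._events.popleft())
        return batch

    def __len__(self):
        return len(self._events)


# A burst of ten hypothetical air quality readings arrives at once.
buffer = StagingBuffer()
for seq in range(10):
    buffer.publish({"sensor": "S1", "seq": seq})

# The loader consumes the backlog at its own pace, four events at a time.
loaded = []
while len(buffer):
    loaded.append(buffer.drain(max_batch=4))

print([len(batch) for batch in loaded])
```

The producer and the loader are decoupled: the burst is accepted immediately, and the batch size is chosen by the consumer rather than by the external service.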

To make this external data usable, we need to:

  • Capture and cleanse relevant data to avoid “garbage in, garbage out”.
  • Translate unstructured data to semi-structured data to make it easier to search and interrogate.
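
The two steps above can be illustrated with a small sketch that cleanses free-text notes and translates them into semi-structured records. It assumes simple keyword rules purely for illustration; a real pipeline would use the data preparation services mentioned earlier. The names `cleanse`, `to_semi_structured`, `SMOKING_PATTERNS`, and the sample notes are hypothetical.

```python
import re

# Hypothetical keyword rules. More specific patterns come first because
# "ex-smoker" and "non-smoker" both contain the word "smoker".
SMOKING_PATTERNS = [
    ("former", re.compile(r"\b(ex-smoker|former smoker|quit smoking)\b", re.I)),
    ("never", re.compile(r"\b(non-smoker|never smoked)\b", re.I)),
    ("current", re.compile(r"\b(smoker|smokes)\b", re.I)),
]


def cleanse(note):
    """Collapse runs of whitespace; return None for notes with no content."""
    text = " ".join(note.split())
    return text or None


def to_semi_structured(patient_id, note):
    """Turn a free-text note into a searchable record, or None if the note is empty."""
    text = cleanse(note)
    if text is None:
        return None
    status = "unknown"
    for label, pattern in SMOKING_PATTERNS:
        if pattern.search(text):
            status = label
            break
    return {"patient_id": patient_id, "smoking_status": status, "note": text}


# Hypothetical raw notes, including one that is empty and should be dropped.
raw_notes = [
    ("P1", "  Patient is an ex-smoker,\n quit smoking in 2015. "),
    ("P2", "   "),
    ("P3", "Non-smoker; reports damp housing."),
    ("P4", "Smokes 10 a day."),
]

records = [
    record
    for record in (to_semi_structured(pid, note) for pid, note in raw_notes)
    if record is not None
]
for record in records:
    print(record)
```

Dropping empty notes before extraction is a small instance of avoiding "garbage in, garbage out", and the resulting dictionaries can be searched or filtered by `smoking_status` in a way the raw text cannot.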

Using standards such as HL7, FHIR, and SNOMED (clinical terminology), together with practices from HIE and with DICOM for imaging, to represent the data allows greater interoperability. These domain standards are built on common industry technologies such as XML, JSON, and REST. As a result, Oracle products can handle the data out-of-the-box, with domain-specific solutions delivered on top of these technologies.