Monitor devices in real time based on IoT actions using OCI Generative AI and Oracle E-Business Suite

Modern applications often rely on many devices and data points connected to a central server for data processing. These data points continuously emit metrics that, when monitored and analyzed, yield useful insights. Those insights can be used to make predictions, such as when a device is likely to fail, and can be integrated with a system like Oracle E-Business Suite to automatically place an order to replace the faulty device.

The proposed architecture, shown here for healthcare, takes its input from events emitted by devices. These events carry data about device health. For example, the data emitted by an oxygen monitor running in a hospital includes the device's age, operating system, applied security patches, historical and current memory and storage usage, and the load it is serving.
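
As a rough illustration of what such an event might look like, the following Python sketch models one health reading and serializes it to JSON. The class name, every field name, and the sample values are assumptions made for this example; the actual event schema depends on the devices you monitor.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class DeviceHealthEvent:
    """One health reading emitted by a device (all field names are illustrative)."""
    device_id: str
    device_type: str
    age_days: int
    os_version: str
    patches_applied: List[str]
    memory_used_pct: float
    storage_used_pct: float
    load_pct: float

    def to_json(self) -> str:
        # Serialize to a compact JSON string suitable for a stream message body.
        return json.dumps(asdict(self), separators=(",", ":"))


# Hypothetical reading from an oxygen monitor.
event = DeviceHealthEvent(
    device_id="oxy-monitor-0042",
    device_type="oxygen_monitor",
    age_days=1460,
    os_version="4.2.1",
    patches_applied=["2024-01", "2024-04"],
    memory_used_pct=81.5,
    storage_used_pct=64.0,
    load_pct=72.3,
)
payload = event.to_json()
```

Keeping the event flat and JSON-serializable makes it straightforward to clean, feed to a model, and store for reporting later in the flow.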

We clean this data and pass it to an ML model running in the Oracle Cloud Infrastructure Data Science service, which computes the probability that the device will stop working and estimates when. We aggregate this data and, as required, push it to Oracle Autonomous Data Warehouse for further reporting. We can also integrate the data with Oracle E-Business Suite so that an order is placed automatically as soon as the specified device failure criteria are met.

Architecture

This reference architecture demonstrates how to utilize cloud capabilities in Oracle Cloud Infrastructure (OCI) to create a device monitoring solution hosted on OCI.

This architecture shows how the device monitoring solution is hosted on OCI and how admin users access the solution for both business and administration or operations purposes.

The following diagram illustrates this reference architecture data flow.



oci-genai-iot-ebs-arch-oracle.zip

As soon as data is generated at a device, the client application running on the device sends it to OCI Streaming through an endpoint exposed by API Gateway. These endpoints are protected by Web Application Firewall (WAF), so front-end security applies to the application by default. A service connector in Service Connector Hub reads from the same stream. As soon as a device produces new data, the service connector consumes it and triggers OCI Functions for further processing.
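
The following Python sketch shows how a device client might package one reading before posting it to the streaming endpoint. It assumes the request body follows the OCI Streaming PutMessages shape, a `messages` list whose `key` and `value` fields are base64-encoded; verify this against the Streaming API reference and your API Gateway route before relying on it. The function name and the sample reading are hypothetical, and the HTTP call itself is omitted.

```python
import base64
import json


def build_put_messages_body(device_id: str, reading: dict) -> dict:
    """Wrap one device reading in a PutMessages-style request body (assumed shape)."""
    # The device ID is used as the message key so readings from one device stay together.
    key = base64.b64encode(device_id.encode("utf-8")).decode("ascii")
    # The reading itself is sent as base64-encoded JSON.
    value = base64.b64encode(json.dumps(reading).encode("utf-8")).decode("ascii")
    return {"messages": [{"key": key, "value": value}]}


body = build_put_messages_body("oxy-monitor-0042", {"load_pct": 72.3})
```

The client would then send `body` as JSON in an HTTPS POST to the API Gateway endpoint that fronts the stream.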

OCI Functions takes the consumed data and starts processing it. Depending on the input traffic, a single consume call can return multiple records, and the function handles each record separately. For each record, the function performs the following tasks:

  1. Clean the record data and gather the required parameters from it.
  2. Create an API request for the ML model hosted on an endpoint. The input for this request is the set of parameters the model requires to make a device failure prediction. The response is the device failure prediction, a score ranging from 0.00 to 10.00, where 0.00 means the lowest chance of device failure and 10.00 means the highest.
  3. After receiving the prediction, add it to the input record and push the record to Autonomous Data Warehouse for future reporting and for continuous retraining of the ML model.
  4. Based on the prediction value, trigger the next task. If the prediction is for non-failure, the function exits the run for that record because there is nothing else to do. If the prediction is for failure, the function performs the following sub-tasks:
    1. Access the Autonomous Data Warehouse reference table for all details for the new order like order submitter and approver details, data related to device and all other stakeholders.
    2. Use OCI Generative AI to generate the order details summary.
    3. Submit the order details to Oracle E-Business Suite, or any other ERP, CRM software.
    4. Use OCI Generative AI to draft an email for stakeholder summary.
    5. Send a notification to the corresponding stakeholders to inform them about the order placement.
  5. Once this flow is completed, the function will mark the record as processed, and will move to the next record.
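
The per-record tasks above can be sketched in Python as follows. This is an illustration under several assumptions, not the function's actual implementation: each consumed message is assumed to carry its payload as base64-encoded JSON in a `value` field, the feature names are hypothetical, and `FAILURE_THRESHOLD` is an assumed cutoff on the 0.00 to 10.00 scale. The `predict`, `store`, and `place_order` callables are placeholders for the model endpoint call, the Autonomous Data Warehouse insert, and the order sub-tasks (OCI Generative AI summary, Oracle E-Business Suite submission, and stakeholder notification).

```python
import base64
import json
from typing import Callable, Dict, List

FAILURE_THRESHOLD = 7.0  # assumed cutoff on the 0.00-10.00 scale; tune per model
FEATURES = ("age_days", "memory_used_pct", "storage_used_pct", "load_pct")  # hypothetical


def decode_record(message: Dict) -> Dict:
    """Decode one consumed stream message whose value is base64-encoded JSON."""
    return json.loads(base64.b64decode(message["value"]))


def process_batch(
    messages: List[Dict],
    predict: Callable[[Dict], float],
    store: Callable[[Dict], None],
    place_order: Callable[[Dict], None],
) -> List[Dict]:
    """Run tasks 1-5 for every record in a batch, one record at a time."""
    results = []
    for message in messages:
        record = decode_record(message)
        features = {name: record[name] for name in FEATURES}  # 1. gather parameters
        record["failure_score"] = predict(features)           # 2. call the model endpoint
        store(record)                                         # 3. persist for reporting
        if record["failure_score"] >= FAILURE_THRESHOLD:      # 4. failure predicted
            place_order(record)
        record["processed"] = True                            # 5. mark as processed
        results.append(record)
    return results
```

Passing the external calls in as parameters keeps the control flow testable with simple stubs and makes it easy to swap Oracle E-Business Suite for another ERP or CRM system.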

The solution includes a self-retraining ML model that keeps updating itself with the new data arriving in Autonomous Data Warehouse. All three tiers of the application are hosted in different subnets so that only the security ports each tier requires are opened. The data stored in the databases is pulled from another subnet to ensure proper security.

The architecture diagram also illustrates a separate access flow for admin users, who are responsible for operating the device monitoring application on OCI. They access the application resources using SSH over Site-to-Site VPN or FastConnect, which creates a secure tunnel connecting the CPE device in the customer data center with the DRG on OCI. Using this path, administrators access the application resources on OCI from the data center computers. This access is required so that operations jobs such as patching, application upgrades, operating system security upgrades, and other tasks are done securely and on time.

This architecture can also incorporate high availability and disaster recovery on OCI. High availability means the application is deployed in multiple availability domains in the same region, so it stays available even if one availability domain goes down because of an issue such as a fire or a power failure. Disaster recovery means the application is also deployed in multiple OCI regions, so it stays available even if one region goes down because of an event such as a tsunami, cyclone, or earthquake. In addition, place resources in multiple fault domains to protect the architecture from rack-level failures inside the Oracle data center. These topics are extremely important because these applications are expected to run continuously for long periods with zero downtime.
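
As a small illustration of the fault domain guidance, the following Python sketch assigns instances to fault domains in round-robin order so that no single fault domain holds every instance. The instance names are hypothetical, and the fault domain labels are assumed to follow the `FAULT-DOMAIN-n` naming used in OCI; in practice the placement would be set in your Terraform configuration or at instance launch.

```python
from itertools import cycle
from typing import Dict, List

# Assumed labels for the three fault domains in one availability domain.
FAULT_DOMAINS = ["FAULT-DOMAIN-1", "FAULT-DOMAIN-2", "FAULT-DOMAIN-3"]


def spread_across_fault_domains(instance_names: List[str]) -> Dict[str, str]:
    """Assign instances to fault domains in round-robin order."""
    return dict(zip(instance_names, cycle(FAULT_DOMAINS)))


placement = spread_across_fault_domains(["app-1", "app-2", "app-3", "app-4"])
```

With four instances, the fourth wraps around to the first fault domain, so losing any one fault domain leaves at least two instances running.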

The same architecture shows the deployment created with the OCI DevOps service, which ensures that all the components are deployed in an agile way. All components are deployed using Terraform and maintained through Ansible. This showcases how you can leverage automation on OCI and follow the Infrastructure as Code (IaC) approach to make the device monitoring application more agile and easier to maintain.

The architecture has the following components:

  • Tenancy

    A tenancy is a secure and isolated partition that Oracle sets up within Oracle Cloud when you sign up for Oracle Cloud Infrastructure. You can create, organize, and administer your resources in Oracle Cloud within your tenancy. A tenancy is synonymous with a company or organization. Usually, a company will have a single tenancy and reflect its organizational structure within that tenancy. A single tenancy is usually associated with a single subscription, and a single subscription usually only has one tenancy.

  • Region

    An Oracle Cloud Infrastructure region is a localized geographic area that contains one or more data centers, called availability domains. Regions are independent of other regions, and vast distances can separate them (across countries or even continents).

  • Compartment

    Compartments are cross-region logical partitions within an Oracle Cloud Infrastructure tenancy. Use compartments to organize your resources in Oracle Cloud, control access to the resources, and set usage quotas. To control access to the resources in a given compartment, you define policies that specify who can access the resources and what actions they can perform.

  • Availability domains

    Availability domains are standalone, independent data centers within a region. The physical resources in each availability domain are isolated from the resources in the other availability domains, which provides fault tolerance. Availability domains don’t share infrastructure such as power or cooling, or the internal availability domain network. So, a failure at one availability domain shouldn't affect the other availability domains in the region.

  • Fault domains

    A fault domain is a grouping of hardware and infrastructure within an availability domain. Each availability domain has three fault domains with independent power and hardware. When you distribute resources across multiple fault domains, your applications can tolerate physical server failure, system maintenance, and power failures inside a fault domain.

  • Virtual cloud network (VCN) and subnets

    A VCN is a customizable, software-defined network that you set up in an Oracle Cloud Infrastructure region. Like traditional data center networks, VCNs give you control over your network environment. A VCN can have multiple non-overlapping CIDR blocks that you can change after you create the VCN. You can segment a VCN into subnets, which can be scoped to a region or to an availability domain. Each subnet consists of a contiguous range of addresses that don't overlap with the other subnets in the VCN. You can change the size of a subnet after creation. A subnet can be public or private.

  • FastConnect

    Oracle Cloud Infrastructure FastConnect provides an easy way to create a dedicated, private connection between your data center and Oracle Cloud Infrastructure. FastConnect provides higher-bandwidth options and a more reliable networking experience when compared with internet-based connections.

  • IPSec VPN

    VPN Connect provides site-to-site IPSec VPN connectivity between your on-premises network and VCNs in Oracle Cloud Infrastructure. The IPSec protocol suite encrypts IP traffic before the packets are transferred from the source to the destination and decrypts the traffic when it arrives.

  • Security list

    For each subnet, you can create security rules that specify the source, destination, and type of traffic that must be allowed in and out of the subnet.

  • Service gateway

    The service gateway provides access from a VCN to other services, such as Oracle Cloud Infrastructure Object Storage. The traffic from the VCN to the Oracle service travels over the Oracle network fabric and does not traverse the internet.

  • Autonomous database systems

    Oracle Autonomous Database is a fully automated service that makes it easy for all organizations to develop and deploy application workloads, regardless of complexity, scale, or criticality. The service’s converged engine supports diverse data types, simplifying application development and deployment from modeling and coding to ETL, database optimization, and data analysis. With machine learning–driven automated tuning, scaling, and patching, Autonomous Database delivers the highest performance, availability, and security for OLTP, analytics, batch, and Internet of Things (IoT) workloads. Built on Oracle Database and Oracle Exadata, Autonomous Database is available on Oracle Cloud Infrastructure (OCI) for serverless or dedicated deployments as well as on-premises with Oracle Exadata Cloud@Customer and OCI Dedicated Region.

  • Streaming

    Oracle Cloud Infrastructure Streaming provides a fully managed, scalable, and durable storage solution for ingesting continuous, high-volume streams of data that you can consume and process in real time. You can use Streaming for ingesting high-volume data, such as application logs, operational telemetry, web click-stream data; or for other use cases where data is produced and processed continually and sequentially in a publish-subscribe messaging model.

  • Service connectors

    Oracle Cloud Infrastructure Service Connector Hub is a cloud message bus platform that orchestrates data movement between services in OCI. You can use service connectors to move data from a source service to a target service. Service connectors also enable you to optionally specify a task (such as a function) to perform on the data before it is delivered to the target service.

    You can use Oracle Cloud Infrastructure Service Connector Hub to quickly build a logging aggregation framework for security information and event management (SIEM) systems.

  • Data Science

    Oracle Cloud Infrastructure Data Science is a fully managed, serverless platform that data science teams can use to build, train, and manage machine learning (ML) models on Oracle Cloud Infrastructure (OCI). It can easily integrate with other OCI services such as Oracle Autonomous Data Warehouse, Oracle Cloud Infrastructure Object Storage, and more. You can build and evaluate high-quality machine learning models that increase business flexibility by putting enterprise-trusted data to work quickly, and you can support data-driven business objectives with easier deployment of ML models.

  • Functions

    Oracle Cloud Infrastructure Functions is a fully managed, multitenant, highly scalable, on-demand, Functions-as-a-Service (FaaS) platform. It is powered by the Fn Project open source engine. Functions enable you to deploy your code, and either call it directly or trigger it in response to events. Oracle Functions uses Docker containers hosted in Oracle Cloud Infrastructure Registry.

  • Logging

    Logging is a highly scalable and fully managed service that provides access to the following types of logs from your resources in the cloud:
    • Audit logs: Logs related to events emitted by the Audit service.
    • Service logs: Logs emitted by individual services such as API Gateway, Events, Functions, Load Balancing, Object Storage, and VCN flow logs.
    • Custom logs: Logs that contain diagnostic information from custom applications, other cloud providers, or an on-premises environment.

Recommendations

Use the following recommendations as a starting point to implement this reference architecture using OCI Functions and OCI Events. Your requirements might differ from the architecture described here.

  • VCN

    When you create a VCN, determine the number of CIDR blocks required and the size of each block based on the number of resources that you plan to attach to subnets in the VCN. Use CIDR blocks that are within the standard private IP address space.

    Select CIDR blocks that don't overlap with any other network (in Oracle Cloud Infrastructure, your on-premises data center, or another cloud provider) to which you intend to set up private connections.

    After you create a VCN, you can change, add, and remove its CIDR blocks.

    When you design the subnets, consider your traffic flow and security requirements. Attach all the resources within a specific tier or role to the same subnet, which can serve as a security boundary.

  • Application Design

    This reference architecture uses OCI Functions for all the processing. OCI Functions has a few limitations, such as a maximum execution time of 300 seconds. If you intend to use this architecture for a large volume of input data, consider running the application on OCI Compute instances. You can run multiple runners, each consuming and processing input events separately and in parallel.

  • Disaster Recovery

    A standby disaster recovery instance in a different OCI region is recommended for enterprise applications. The DR strategy must be consistent across the three tiers to meet SLA and data durability requirements. The disaster recovery Oracle Exadata Database Service on Dedicated Infrastructure is synced up with production by using Oracle Data Guard. The standby Oracle Exadata Database Service on Dedicated Infrastructure is a transactionally consistent copy of the primary database. Oracle Data Guard automatically maintains synchronization between the databases by transmitting and applying redo data from the primary database to the standby. In the event of a disaster in the primary region, Oracle Data Guard automatically fails over to the standby database in the secondary region. Front-end load balancers are deployed either in a standby mode for network load balancers, or with high availability by using Load Balancer as a Service.
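
For the VCN recommendation above, a quick way to confirm that candidate CIDR blocks do not overlap is Python's standard `ipaddress` module. The following is a minimal sketch; the function name and the sample CIDR blocks are illustrative only.

```python
import ipaddress
from itertools import combinations
from typing import List, Tuple


def find_overlaps(cidrs: List[str]) -> List[Tuple[str, str]]:
    """Return every pair of CIDR blocks in the list that overlap."""
    networks = [ipaddress.ip_network(cidr) for cidr in cidrs]
    return [
        (str(a), str(b))
        for a, b in combinations(networks, 2)
        if a.overlaps(b)
    ]


# Hypothetical plan: one VCN block, one on-premises block, one candidate subnet.
conflicts = find_overlaps(["10.0.0.0/16", "192.168.0.0/24", "10.0.1.0/24"])
```

Here `conflicts` contains the pair `10.0.0.0/16` and `10.0.1.0/24`, because the second block falls inside the first. Running a check like this across your VCN, on-premises, and other cloud networks before creating the VCN helps avoid routing conflicts on private connections.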

Considerations

When implementing this reference architecture, it is important to consider the following aspects.

  • Performance

    OCI Functions, Autonomous Data Warehouse, and other important services are highly scalable. Consider adjusting the number of compute and storage resources based on the size and requirements of your device monitoring application.

  • Security

    Use policies to restrict who can access the OCI resources. For OCI Object Storage, encryption is enabled by default and cannot be turned off. All access to functions deployed in OCI Functions is controlled through Oracle Cloud Infrastructure Identity and Access Management (OCI IAM), which allows both function management and function invocation privileges to be assigned to specific users and user groups. Store secrets and sensitive data, such as the API keys and auth tokens used for authorization with OCI services, in OCI Vault.

  • Availability

    Oracle ensures high availability of OCI Functions, Autonomous Data Warehouse, and other services, which are cloud native and fully managed. For workloads deployed within a single availability domain, you can ensure resilience by distributing the resources across the fault domains as shown in this architecture. If you plan to deploy your workload in a region that has more than one availability domain, you can distribute the resources across multiple availability domains.

  • Scalability

    You can scale the application servers vertically by changing the shape of the compute instances. A shape with a higher core count provides more memory and network bandwidth. If you need more storage, increase the size of the block volumes attached to the application server. You can scale the databases vertically by enabling more cores for the Autonomous Data Warehouse DB system. You can add OCPUs in multiples of two for a quarter rack. The databases remain available during a scaling operation. If your workload outgrows the available CPUs and storage, you can migrate to a larger rack.

Acknowledgments

  • Author: Lovelesh Saxena