UniQreate is a data extraction automation company that helps businesses maximize the value of unorganized data by using the most efficient workflows and the least intrusive interactions.
Many organizations have data spread across millions of documents that vary in structure, context, layout, and format. Extracting relevant data from these unorganized documents using manual resources or other tools is time-consuming and scales poorly. UniQreate solves this problem by using artificial intelligence (AI), intelligent workflows, and web interfaces to improve their deep learning models. With this approach, an organization's data has better context and representation and can be used readily without dependence on manual processing or custom-built tools.
UniQreate was looking to build their data management platform and chose Oracle Cloud Infrastructure (OCI) for the following reasons:
- Agility when scaling compute and storage bandwidth
- Off-the-shelf higher compute power at a competitive cost
- Scalable file storage and managed MySQL services
- Object storage addresses all project storage needs
- Compartment feature provides a clean way to segregate and manage separate environments
UniQreate has been part of the Oracle Cloud Infrastructure startup program since 2020 and has been running 16 OCPU instances and 3 GPU instances for multiple client environments. This setup allows them to run 200 extraction cycles per day, with model training running every 24 hours.
With the features, functionality, and competitive cost provided by Oracle Cloud Infrastructure, UniQreate has been able to achieve 20% monthly savings on overall cost.
This reference architecture shows UniQreate's multi-region disaster recovery deployment on Oracle Cloud Infrastructure. The solution has the following parts:
- Web server: Provides extraction user interface and administrative capabilities
- Code manager: Determines the shapes of virtual machines (VMs) that need to be launched for the prediction engine
- Prediction engine: Runs artificial intelligence (AI) and machine learning (ML) modules
- Monitoring servers: Monitor the health and performance of the entire solution
- File system: Provides scalable, low-latency storage for model and document metadata, independent from web servers and database servers
- Database server: Provides persistent storage for web servers
Automation is achieved using Ansible scripts that dynamically launch the prediction engine with the preferred shape. This scalable prediction engine helps process large documents based on the client's needs.
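As a rough illustration of how a code manager might pick a shape before handing off to an Ansible playbook, the following Python sketch maps a workload to a compute shape. The thresholds, the `Workload` type, the `instance_shape` variable name, and the specific shapes chosen are all assumptions made for this example; the source does not say which shapes or rules UniQreate uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Hypothetical description of one extraction cycle."""
    pages: int            # total pages queued for extraction
    needs_training: bool  # True when the cycle includes model training


def choose_shape(workload: Workload) -> str:
    """Return a compute shape name for the prediction engine.

    The thresholds and shape choices are illustrative assumptions.
    """
    if workload.needs_training:
        return "VM.GPU3.1"        # GPU shape for model training
    if workload.pages > 10_000:
        return "VM.Standard2.16"  # larger CPU shape for big batches
    return "VM.Standard2.4"       # default CPU shape


def ansible_extra_vars(workload: Workload) -> dict:
    """Build variables that a launch playbook could receive as extra vars."""
    return {
        "instance_shape": choose_shape(workload),  # hypothetical variable name
        "page_count": workload.pages,
    }


if __name__ == "__main__":
    print(ansible_extra_vars(Workload(pages=25_000, needs_training=False)))
```

Keeping the shape decision in one small function means the playbook itself stays generic: it only needs to read the shape from its variables.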
The virtual cloud network (VCN) where the virtual machines (VMs) are hosted is segmented into two subnets: a public subnet for the bastion host (for SSH connectivity), and a private subnet for resources such as the code manager, prediction engine, file storage, and MySQL database servers. The public subnet also hosts a Jenkins server for continuous integration and deployment (CI/CD) requirements.
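The two-subnet layout can be sanity-checked before any resources are created. The sketch below uses Python's standard `ipaddress` module to confirm that each subnet sits inside the VCN range and that the subnets don't overlap. The CIDR ranges are assumed for illustration; the source does not give an address plan.

```python
import ipaddress

# Assumed address plan, for illustration only.
VCN = ipaddress.ip_network("10.0.0.0/16")
PUBLIC_SUBNET = ipaddress.ip_network("10.0.0.0/24")   # bastion host, Jenkins
PRIVATE_SUBNET = ipaddress.ip_network("10.0.1.0/24")  # code manager, prediction engine, database


def validate_plan(vcn, subnets):
    """Return True if every subnet fits in the VCN and no two subnets overlap.

    Raises ValueError describing the first problem found.
    """
    for subnet in subnets:
        if not subnet.subnet_of(vcn):
            raise ValueError(f"{subnet} is outside the VCN range {vcn}")
    for i, first in enumerate(subnets):
        for second in subnets[i + 1:]:
            if first.overlaps(second):
                raise ValueError(f"{first} overlaps {second}")
    return True


if __name__ == "__main__":
    print(validate_plan(VCN, [PUBLIC_SUBNET, PRIVATE_SUBNET]))
```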
Resources are deployed in multiple fault domains for high availability.
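One simple way to spread instances across fault domains is round-robin assignment. The sketch below is an assumption about how such a placement could be computed, not a description of UniQreate's tooling; the instance names are hypothetical. The fault domain labels follow the naming OCI uses for the three fault domains in an availability domain.

```python
from itertools import cycle

# Each availability domain has three fault domains.
FAULT_DOMAINS = ("FAULT-DOMAIN-1", "FAULT-DOMAIN-2", "FAULT-DOMAIN-3")


def spread_across_fault_domains(instance_names):
    """Assign each instance to a fault domain in round-robin order."""
    return dict(zip(instance_names, cycle(FAULT_DOMAINS)))


if __name__ == "__main__":
    # Hypothetical instance names.
    placement = spread_across_fault_domains(["web-1", "web-2", "web-3", "web-4"])
    for name, fault_domain in placement.items():
        print(name, fault_domain)
```

With this assignment, losing one fault domain takes out at most about a third of the instances in a tier.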
Object Storage is used as backup for the entire environment, including images of each VM. A public load balancer distributes the traffic load across VMs. The environment uses two layers of security: one for network security implemented using network security lists and another that's application-specific, implemented using a network security group (NSG) for each network segment. Separate availability domains and separate fault domains within each availability domain are used for deployment, which provides high availability and greater fault tolerance within the region. The entire environment is also hosted in another region for disaster recovery. User access is managed using Identity and Access Management (IAM) policies.
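IAM policies in OCI are written as statements of the form `Allow group <group> to <verb> <resource-type> in compartment <compartment>`. As a minimal sketch of how per-environment access could be expressed, the helper below generates such statements. The group and compartment names are hypothetical, and the choice of resource types is an assumption; the source does not list UniQreate's actual policies.

```python
def policy_statements(group: str, compartment: str, read_only: bool) -> list[str]:
    """Build IAM policy statements for one group in one compartment.

    The resource types covered here (compute and object storage) are an
    illustrative subset, not a complete policy.
    """
    verb = "read" if read_only else "manage"
    return [
        f"Allow group {group} to {verb} instance-family in compartment {compartment}",
        f"Allow group {group} to {verb} object-family in compartment {compartment}",
    ]


if __name__ == "__main__":
    # Hypothetical names: operators manage production, analysts only read it.
    for line in policy_statements("ProdOperators", "prod", read_only=False):
        print(line)
    for line in policy_statements("Analysts", "prod", read_only=True):
        print(line)
```

Generating statements per compartment keeps each environment's permissions separate, which matches the way compartments are used to segregate environments.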
The entire setup was deployed on Oracle Cloud Infrastructure in four days, providing maximum availability and uptime. For future deployments, UniQreate plans to use Oracle Cloud Infrastructure's GPU offering to improve and refine the deep learning models that customers use to process their documents, at a volume of 5–10M documents per year per client, and to generate much better context and representation of the text.
The following diagram illustrates this reference architecture.
The architecture has the following components:
- Region
An Oracle Cloud Infrastructure region is a localized geographic area that contains one or more data centers, called availability domains. Regions are independent of other regions, and vast distances can separate them (across countries or even continents).
The resources in this architecture are deployed in a primary region, and the entire environment is also hosted in a second region for disaster recovery.
- Availability domains
Availability domains are standalone, independent data centers within a region. The physical resources in each availability domain are isolated from the resources in the other availability domains, which provides fault tolerance. Availability domains don’t share infrastructure such as power or cooling, or the internal availability domain network. So, a failure at one availability domain is unlikely to affect the other availability domains in the region.
The resources in this architecture are deployed across separate availability domains within the region.
- Fault domain
A fault domain is a grouping of hardware and infrastructure within an availability domain. Each availability domain has three fault domains with independent power and hardware. When you distribute resources across multiple fault domains, your applications can tolerate physical server failure, system maintenance, and power failures inside a fault domain.
- Compartment
Compartments are cross-region logical partitions within an Oracle Cloud Infrastructure tenancy. Use compartments to organize your resources in Oracle Cloud, control access to the resources, and set usage quotas. To control access to the resources in a given compartment, you define policies that specify who can access the resources and what actions they can perform.
- Virtual cloud network (VCN) and subnets
A VCN is a customizable, software-defined network that you set up in an Oracle Cloud Infrastructure region. Like traditional data center networks, VCNs give you complete control over your network environment. A VCN can have multiple non-overlapping CIDR blocks that you can change after you create the VCN. You can segment a VCN into subnets, which can be scoped to a region or to an availability domain. Each subnet consists of a contiguous range of addresses that don't overlap with the other subnets in the VCN. You can change the size of a subnet after creation. A subnet can be public or private.
- Security lists
For each subnet, you can create security rules that specify the source, destination, and type of traffic that must be allowed in and out of the subnet.
- Remote peering
Remote peering allows the VCNs' resources to communicate using private IP addresses without routing the traffic over the internet or through your on-premises network. Remote peering eliminates the need for an internet gateway and public IP addresses for the instances that need to communicate with another VCN in a different region.
- Bastion host
The bastion host is a compute instance that serves as a secure, controlled entry point to the topology from outside the cloud. The bastion host is provisioned typically in a demilitarized zone (DMZ). It enables you to protect sensitive resources by placing them in private networks that can't be accessed directly from outside the cloud. The topology has a single, known entry point that you can monitor and audit regularly. So, you can avoid exposing the more sensitive components of the topology without compromising access to them.
- Load balancer
The Oracle Cloud Infrastructure Load Balancing service provides automated traffic distribution from a single entry point to multiple servers in the back end.
This architecture includes a public load balancer.
- Object storage
Object storage provides quick access to large amounts of structured and unstructured data of any content type, including database backups, analytic data, and rich content such as images and videos. You can safely and securely store and then retrieve data directly from the internet or from within the cloud platform. You can seamlessly scale storage without experiencing any degradation in performance or service reliability. Use standard storage for "hot" storage that you need to access quickly, immediately, and frequently. Use archive storage for "cold" storage that you retain for long periods of time and seldom or rarely access.
- File storage
The Oracle Cloud Infrastructure File Storage service provides a durable, scalable, secure, enterprise-grade network file system. You can connect to a File Storage service file system from any bare metal, virtual machine, or container instance in a VCN. You can also access a file system from outside the VCN by using Oracle Cloud Infrastructure FastConnect and IPSec VPN.
- Oracle MySQL Database Service
Oracle MySQL Database Service is a fully managed Oracle Cloud Infrastructure (OCI) database service that lets developers quickly develop and deploy secure, cloud native applications. Optimized for and exclusively available in OCI, Oracle MySQL Database Service is 100% built, managed, and supported by the OCI and MySQL engineering teams.
Oracle MySQL Database Service has an integrated, high-performance analytics engine (HeatWave) to run sophisticated real-time analytics directly against an operational MySQL database.