Text data, such as social media posts, news, and surveys, provides valuable business and customer insights. Manually analyzing large volumes of text is usually too time-consuming, so companies turn to natural language processing (NLP) to gain insights effectively and at scale. Traditionally, using NLP has meant relying on data scientists to build and train custom machine learning models and then deploy those models into applications, a process that is often slow and expensive.
Oracle Cloud Infrastructure (OCI) Language reduces this time and effort by providing key language processing capabilities as production-ready, pre-trained models that uncover insights in unstructured text, eliminating the need for machine learning expertise. You can use OCI Language to automate text analysis at scale and to understand unstructured text in documents, customer feedback, support tickets, or reviews. This enables you to extract insights for improving customer experience and increasing efficiency.
OCI Language enables developers to build capabilities such as sentiment analysis, key-phrase extraction, text classification, named entity recognition, and more into their applications, without needing data scientists to create customized models. You can access OCI Language through the OCI Console, the OCI SDKs (Python, Java, Go, TypeScript, and .NET), the REST APIs, or the OCI CLI.
- Improve customer experience: Explore how customers use the product(s), extract sentiments about certain areas of interest, and identify key frustrations to address them in a timely manner.
- Identify important data: Extract named entities from customer feedback to identify people, products, and organizations mentioned.
- Ensure security and privacy: OCI Language upholds customer privacy with language models that do not store any data for training, debugging, or other purposes. In addition, OCI Language can be used to identify any potential personally identifiable information (PII) to protect customer privacy.
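The PII use case above can be illustrated with a minimal sketch of how detected spans might be masked before a review is stored. It assumes the detection step returns, for each entity, a character offset, a length, and a type label. The `PiiSpan` class, the `mask_pii` helper, and the label names are hypothetical and are not part of any OCI SDK.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class PiiSpan:
    """Hypothetical record for one detected PII entity (not an OCI SDK type)."""
    offset: int  # index of the first character of the entity in the text
    length: int  # number of characters the entity covers
    type: str    # illustrative label such as "PERSON" or "EMAIL"


def mask_pii(text: str, spans: Iterable[PiiSpan]) -> str:
    """Replace each detected span with a bracketed type label."""
    masked = text
    # Work from the end of the string so that earlier offsets stay valid.
    for span in sorted(spans, key=lambda s: s.offset, reverse=True):
        end = span.offset + span.length
        masked = masked[:span.offset] + f"[{span.type}]" + masked[end:]
    return masked


if __name__ == "__main__":
    review = "Great stay, thanks to Ana. Reach me at ana@example.com."
    email = "ana@example.com"
    spans = [
        PiiSpan(review.find("Ana"), len("Ana"), "PERSON"),
        PiiSpan(review.find(email), len(email), "EMAIL"),
    ]
    print(mask_pii(review, spans))
```

Replacing from the last span backward keeps the sketch simple, because each substitution changes the string length only to the right of the spans that remain to be processed.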
This architecture demonstrates the relationship among the various components in a typical system that has OCI Language at its core.
About 80% of the world's data is unstructured, and most of it is written in natural language. This reference architecture applies to any system that receives feedback from customers. As a specific example, consider a booking application for the hospitality industry, which includes accommodation, food and beverage service, event planning, theme parks, hotels, travel agencies, restaurants, and bars. The following diagram describes how you can use the pre-built AI capabilities to analyze, explore, and visualize customer feedback to extract insights for improving customer experience.
A hotel chain asks customers for feedback after they have checked out, which produces a continuous stream of feedback that needs to be analyzed. The hotel uses a spreadsheet to capture thousands of reviews, with each row containing one customer review that expresses satisfaction or dissatisfaction with the service, location, or food. This information can be used to improve products, services sold, or the whole business. Because there is so much unstructured information, the data must be ingested, and the insights must be extracted from it, analyzed, and visualized. The Data Integration service orchestrates the data flow in this reference architecture.
The following diagram illustrates this reference architecture.
Description of the illustration oci-ai-language-arch.png
- Aggregate the customer review data in a data source such as a database or file. For the purposes of this example, we assume the data is in a .csv file in Object Storage.
- The Data Integration service reads the data from the data source and, for each customer review, calls OCI Language through a serverless function.
- OCI Language extracts a list of aspects and their related sentiments (positive, negative, neutral) from each record. In addition, OCI Language extracts the list of entities mentioned in the record sent to it, such as the names of people or organizations mentioned in each review.
For example, suppose one of the reviews says: "Hey the hotel was beautiful, the staff was very kind to me but the breakfast food was not so great". OCI Language extracts aspects such as "hotel", "staff", and "breakfast", and reports that "hotel" and "staff" have positive sentiment and "breakfast" has negative sentiment.
It is also possible to extract entities, such as names of people, locations, organizations, and events using OCI Language.
- Once the aspects and entities are received by data integration, this information is projected as a set of tables into the Autonomous Data Warehouse. Three different tables are projected: a table for the raw review data, a table for each of the aspects detected and their sentiment, and a table with the entities identified.
The target database can also be a different type of database, such as MySQL.
- You can then use Oracle Analytics Cloud to visualize the extracted insights. Oracle Analytics Cloud allows you to create charts from the extracted tables and filter the data. For instance, you could plot the sentiment over time in a chart, or visualize the aspects that are the most likely to cause positive or negative sentiment in a word cloud.
The end-to-end flow, from reading the file in Object Storage to displaying the insights in Oracle Analytics Cloud, is as follows: Object Storage → Data Integration Service → Oracle Functions → OCI Language → Oracle Functions → Data Integration Service → Autonomous Data Warehouse → Oracle Analytics Cloud.
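The projection step in this flow can be sketched as follows. This is a minimal illustration, not the data flow that Data Integration generates: the CSV column names (`review_id`, `review`), the shape of the analysis result, and the `fake_analyze` stand-in are all assumptions. In a real deployment, the `analyze` callable would wrap the call that the serverless function makes to OCI Language and map the response into this simplified shape.

```python
import csv
import io
from typing import Callable

# Hypothetical, simplified result for one review:
# {"aspects": [{"text": ..., "sentiment": ...}], "entities": [{"text": ..., "type": ...}]}
Analysis = dict


def project_reviews(csv_text: str, analyze: Callable[[str], Analysis]):
    """Turn raw review rows into three row sets: reviews, aspects, entities."""
    reviews, aspects, entities = [], [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        review_id, text = row["review_id"], row["review"]
        reviews.append({"review_id": review_id, "review": text})
        result = analyze(text)  # stands in for the call to OCI Language
        for aspect in result.get("aspects", []):
            aspects.append({
                "review_id": review_id,
                "aspect": aspect["text"],
                "sentiment": aspect["sentiment"],
            })
        for entity in result.get("entities", []):
            entities.append({
                "review_id": review_id,
                "entity": entity["text"],
                "type": entity["type"],
            })
    return reviews, aspects, entities


def fake_analyze(text: str) -> Analysis:
    """Canned stand-in so the sketch runs without cloud credentials."""
    return {
        "aspects": [
            {"text": "hotel", "sentiment": "Positive"},
            {"text": "staff", "sentiment": "Positive"},
            {"text": "breakfast", "sentiment": "Negative"},
        ],
        "entities": [],
    }


if __name__ == "__main__":
    sample = (
        "review_id,review\n"
        '1,"Hey the hotel was beautiful, the staff was very kind to me '
        'but the breakfast food was not so great"\n'
    )
    for table in project_reviews(sample, fake_analyze):
        print(table)
```

Each of the three returned lists corresponds to one of the target tables, and the shared `review_id` column lets the aspects and entities be joined back to the raw review.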
The architecture has the following components:
- Region
An Oracle Cloud Infrastructure region is a localized geographic area that contains one or more data centers, called availability domains. Regions are independent of other regions, and vast distances can separate them (across countries or even continents).
- Availability domains
Availability domains are standalone, independent data centers within a region. The physical resources in each availability domain are isolated from the resources in the other availability domains, which provides fault tolerance. Availability domains don’t share infrastructure such as power or cooling, or the internal availability domain network. So, a failure at one availability domain is unlikely to affect the other availability domains in the region.
- Virtual cloud network (VCN) and subnets
A VCN is a customizable, software-defined network that you set up in an Oracle Cloud Infrastructure region. Like traditional data center networks, VCNs give you complete control over your network environment. A VCN can have multiple non-overlapping CIDR blocks that you can change after you create the VCN. You can segment a VCN into subnets, which can be scoped to a region or to an availability domain. Each subnet consists of a contiguous range of addresses that don't overlap with the other subnets in the VCN. You can change the size of a subnet after creation. A subnet can be public or private.
- Load Balancer
The Oracle Cloud Infrastructure Load Balancing service provides automated traffic distribution from one entry point to multiple servers reachable from your virtual cloud network (VCN). The service offers a load balancer with your choice of a public or private IP address, and provisioned bandwidth. A load balancer improves resource utilization, facilitates scaling, and helps ensure high availability.
You can configure multiple load balancing policies and application-specific health checks to ensure that the load balancer directs traffic only to healthy instances. The load balancer can reduce your maintenance window by draining traffic from an unhealthy application server before you remove it from service for maintenance.
The Load Balancing service enables you to create a public or private load balancer within your VCN. A public load balancer has a public IP address that is accessible from the internet. A private load balancer has an IP address from the hosting subnet, which is visible only within your VCN. Plan dedicated subnets for private and public load balancers to accommodate future requirements. For any internet-facing web application or HTTP-based API, consider a public load balancer combined with Oracle Cloud Infrastructure Web Application Firewall (WAF).
- Security lists
For each subnet, you can create security rules that specify the source, destination, and traffic type that must be allowed in and out of the subnet.
- Route tables
Virtual route tables contain rules to route traffic from subnets to destinations outside a VCN, typically through gateways.
- Internet gateway
The internet gateway allows traffic between the public subnets in a VCN and the public internet.
- VPN Connect
VPN Connect provides site-to-site IPSec VPN connectivity between your on-premises network and VCNs in Oracle Cloud Infrastructure. The IPSec protocol suite encrypts IP traffic before the packets are transferred from the source to the destination and decrypts the traffic when it arrives.
- Identity and access management (IAM)
Oracle Cloud Infrastructure Identity and Access Management (IAM) enables you to control who can access your resources in Oracle Cloud Infrastructure and the operations that they can perform on those resources.
- Object storage
Object storage provides quick access to large amounts of structured and unstructured data of any content type, including database backups, analytic data, and rich content such as images and videos. You can safely and securely store and then retrieve data directly from the internet or from within the cloud platform. You can seamlessly scale storage without experiencing any degradation in performance or service reliability. Use standard storage for "hot" data that you need to access quickly and frequently. Use archive storage for "cold" data that you retain for long periods of time and rarely access.
- Data Integration Service
Oracle Cloud Infrastructure Data Integration is a fully managed, serverless, cloud-native service that extracts, loads, transforms, cleanses, and reshapes data from a variety of data sources into target Oracle Cloud Infrastructure services, such as Autonomous Data Warehouse and Oracle Cloud Infrastructure Object Storage. ETL (extract transform load) leverages fully-managed scale-out processing on Spark, and ELT (extract load transform) leverages full SQL push-down capabilities of the Autonomous Data Warehouse in order to minimize data movement and to improve the time to value for newly ingested data. Users design data integration processes using an intuitive, codeless user interface that optimizes integration flows to generate the most efficient engine and orchestration, automatically allocating and scaling the execution environment. Oracle Cloud Infrastructure Data Integration provides interactive exploration and data preparation and helps data engineers protect against schema drift by defining rules to handle schema changes.
For the hotel reviews example, you can create a data flow to read your unstructured data, call OCI Language to extract insights from the text, and then project the extracted insights into structured tables in a database. For more information, see the linked blog post: Extracting insights from unstructured data using AI services in the "More Information" section.
- Functions
Oracle Functions is a fully managed, multitenant, highly scalable, on-demand, Functions-as-a-Service (FaaS) platform. It is powered by the Fn Project open source engine. Functions enable you to deploy your code, and either call it directly or trigger it in response to events. Oracle Functions uses Docker containers hosted in Oracle Cloud Infrastructure Registry.
- OCI Language
OCI Language is a serverless and multi-tenant service that is accessible using REST API calls. It provides pre-trained models that are frequently retrained and monitored to give you the best results. Language provides you with artificial intelligence and machine learning capabilities to detect the language in your unstructured text. Also, it provides other tools to help you further gain insights into your text.
- Autonomous Data Warehouse
Oracle Autonomous Data Warehouse is a self-driving, self-securing, self-repairing database service that is optimized for data warehousing workloads. You do not need to configure or manage any hardware, or install any software. Oracle Cloud Infrastructure handles creating the database, as well as backing up, patching, upgrading, and tuning the database.
- Oracle Analytics Cloud
Oracle Analytics Cloud is a scalable and secure public cloud service that empowers business analysts with modern, AI-powered, self-service analytics capabilities for data preparation, visualization, enterprise reporting, augmented analysis, and natural language processing and generation. With Oracle Analytics Cloud, you also get flexible service management capabilities, including fast setup, easy scaling and patching, and automated lifecycle management.
Transforming thousands of unstructured reviews into structured formats, such as the aspects table, enables you to use the data for scenarios, such as data analytics, training machine learning models, and search. For the hotel reviews example, you can load the data into Oracle Analytics Cloud to visualize the insights and explore the information in a way that allows you to identify actionable tasks. For more information, see the linked blog post: Extracting insights from unstructured data using AI services in the "More Information" section.
- Fault domains
A fault domain is a grouping of hardware and infrastructure within an availability domain. Each availability domain has three fault domains with independent power and hardware. When you distribute resources across multiple fault domains, your applications can tolerate physical server failure, system maintenance, and power failures inside a fault domain.
- Analytics, ML, and custom applications
Analytics services and custom applications that will catalog, prepare, process, and analyze data.
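To illustrate the kind of analysis that the structured aspects table enables, the following minimal sketch counts sentiments per aspect and ranks aspects by negative mentions. In this architecture, Oracle Analytics Cloud would normally do this work. The row layout (`aspect` and `sentiment` keys) and both helper functions are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def sentiment_by_aspect(aspect_rows):
    """Count sentiments per aspect from rows like {"aspect": ..., "sentiment": ...}."""
    counts = defaultdict(Counter)
    for row in aspect_rows:
        # Normalize case so "Breakfast" and "breakfast" are counted together.
        counts[row["aspect"].lower()][row["sentiment"].lower()] += 1
    return {aspect: dict(counter) for aspect, counter in counts.items()}


def most_negative(aspect_rows, top_n=3):
    """Rank aspects by number of negative mentions, breaking ties by name."""
    counts = sentiment_by_aspect(aspect_rows)
    ranked = sorted(
        counts.items(),
        key=lambda item: (-item[1].get("negative", 0), item[0]),
    )
    return [(aspect, c.get("negative", 0)) for aspect, c in ranked[:top_n]]


if __name__ == "__main__":
    rows = [
        {"aspect": "breakfast", "sentiment": "Negative"},
        {"aspect": "Breakfast", "sentiment": "Negative"},
        {"aspect": "staff", "sentiment": "Positive"},
        {"aspect": "hotel", "sentiment": "Positive"},
    ]
    print(sentiment_by_aspect(rows))
    print(most_negative(rows, top_n=1))
```

A ranking like this is one way to surface the "key frustrations" to address first; a word cloud or a chart over time would use the same underlying counts.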
When you create a VCN, determine the number of CIDR blocks required and the size of each block based on the number of resources that you plan to attach to subnets in the VCN. Use CIDR blocks that are within the standard private IP address space.
Select CIDR blocks that don't overlap with any other network (in Oracle Cloud Infrastructure, your on-premises data center, or another cloud provider) to which you intend to set up private connections.
After you create a VCN, you can change, add, and remove its CIDR blocks.
When you design the subnets, consider your traffic flow and security requirements. Attach all the resources within a specific tier or role to the same subnet, which can serve as a security boundary.
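The CIDR guidance above can be checked before you create the VCN. The following minimal sketch uses Python's standard `ipaddress` module and assumes IPv4 only, treating the "standard private IP address space" as the RFC 1918 ranges. The `check_cidr_plan` helper and the sample addresses are illustrative.

```python
import ipaddress
from itertools import combinations

# RFC 1918 private IPv4 ranges.
PRIVATE_RANGES = [
    ipaddress.IPv4Network("10.0.0.0/8"),
    ipaddress.IPv4Network("172.16.0.0/12"),
    ipaddress.IPv4Network("192.168.0.0/16"),
]


def check_cidr_plan(vcn_cidrs, other_networks=()):
    """Return a list of problems found in a proposed set of VCN CIDR blocks."""
    problems = []
    vcn = [ipaddress.IPv4Network(c) for c in vcn_cidrs]
    others = [ipaddress.IPv4Network(c) for c in other_networks]

    # Each block should sit inside one of the private ranges.
    for net in vcn:
        if not any(net.subnet_of(private) for private in PRIVATE_RANGES):
            problems.append(f"{net} is outside the private address space")

    # Blocks within the VCN must not overlap each other.
    for a, b in combinations(vcn, 2):
        if a.overlaps(b):
            problems.append(f"{a} overlaps {b}")

    # Blocks must not overlap networks you plan to connect privately.
    for net in vcn:
        for other in others:
            if net.overlaps(other):
                problems.append(f"{net} overlaps connected network {other}")
    return problems


if __name__ == "__main__":
    # The on-premises range here is a made-up example.
    print(check_cidr_plan(["10.0.0.0/16", "10.1.0.0/16"], ["10.1.128.0/20"]))
```

An empty list means the plan passed all three checks. `IPv4Network` raises `ValueError` for a block with host bits set, which catches another common planning mistake early.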
Use policies to restrict who can access your company's OCI resources and how they can access them. When any misconfiguration or insecure activity is detected, Oracle Cloud Guard recommends corrective actions and assists with those actions, based on responder recipes that you can define.
For resources that require maximum security, Oracle recommends that you use security zones. A security zone is a compartment associated with an Oracle-defined recipe of security policies that are based on best practices. For example, the resources in a security zone must not be accessible from the public internet and they must be encrypted using customer-managed keys.
When you create and update resources in a security zone, OCI validates the operations against the policies in the security-zone recipe, and denies operations that violate any of the policies.
- Autonomous Data Warehouse
This architecture uses Oracle Autonomous Data Warehouse on shared infrastructure. Enable auto scaling to give the database workloads up to three times the processing power.
Consider using the hybrid partitioned tables feature of Autonomous Data Warehouse to move partitions of data to Oracle Cloud Infrastructure Object Storage and serve them to users and applications transparently. We recommend that you use this feature for data that is not often consumed and for which you don't need the same performance as for data stored within Autonomous Data Warehouse.
Consider using the external tables feature to consume data stored in Oracle Cloud Infrastructure Object Storage in real time without the need to replicate it to Autonomous Data Warehouse. This feature transparently and seamlessly joins data sets curated outside of Autonomous Data Warehouse, regardless of the format (parquet, avro, orc, json, csv, and so on), with data residing in the Autonomous Data Warehouse.
Consider using ADW query accelerator when consuming object storage data to deliver an improved and faster experience to users.
- Object Storage
This architecture uses standard Oracle Cloud Infrastructure Object Storage to store processed output so that other cloud services can access the output for further analysis and display.
- Load balancer bandwidth
While creating the load balancer, you can either select a predefined shape that provides a fixed bandwidth, or specify a custom (flexible) shape where you set a bandwidth range and let the service scale the bandwidth automatically based on traffic patterns. With either approach, you can change the shape at any time after creating the load balancer.
Consider the following points when deploying this reference architecture.
- Resource limits
Consider the best practices, limits by service, and compartment quotas for your tenancy.
Consider using FastConnect if you want a dedicated, private connection between your premises and OCI; otherwise, use VPN Connect.
- OCI Monitoring
The OCI Monitoring service enables you to actively and passively monitor your cloud resources using the metrics and alarms features.
Use flexible shapes to select the number of CPUs and the amount of memory you need for the workloads that run on the instance. This flexibility enables you to build VMs that match your workload, allowing you to optimize performance and minimize cost.
- Chatbots with real-time sentiment analysis
As a future project, this architecture can be adapted to use chatbots. Sentiment analysis has evolved to include real-time narrative mapping that allows the chatbot to look at the important words in a sentence and assign them a relative value of positive, neutral, or negative, giving the bot an understanding of the entire tenor of the conversation.
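One simple way to track the tenor of a conversation is a running score over per-message sentiment labels. The following sketch is a hypothetical illustration and does not describe how any Oracle chatbot product works: the numeric scores, the decay factor, and the threshold are all assumed values, and the labels are presumed to come from a sentiment analysis step.

```python
# Assumed numeric value for each sentiment label.
SCORES = {"positive": 1.0, "neutral": 0.0, "negative": -1.0}


def conversation_tenor(labels, decay=0.5):
    """Exponentially weighted running score; recent messages count more."""
    tenor = 0.0
    for label in labels:
        tenor = decay * tenor + (1 - decay) * SCORES[label.lower()]
    return tenor


def tenor_label(score, threshold=0.25):
    """Map a running score back to a coarse label."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    messages = ["positive", "neutral", "negative", "negative"]
    score = conversation_tenor(messages)
    print(score, tenor_label(score))
```

Weighting recent messages more heavily lets the bot react when a conversation that started well turns negative, which a plain average would smooth over.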
To learn more about OCI Language, review these additional resources:
- Author: Gabriel Grigorie
- Contributors: Hassan Ajan, Luis Cabrera-Cordon, Mari Messinger