Use OCI Language for Customer Feedback Analytics

Text data, such as social media posts, news, and surveys, provide valuable business and customer insights. It is usually too time-consuming to analyze large amounts of text manually, so companies turn to natural language processing (NLP) to gain insights effectively and at scale. Traditionally, using NLP capabilities means relying on data scientists to build and train custom machine learning models, and then deploy these models into applications. This process is often time-consuming and expensive.

Oracle Cloud Infrastructure (OCI) Language reduces this time and effort by providing key language processing capabilities as production-ready, pre-trained models that uncover insights in unstructured text without requiring machine learning expertise. You can use OCI Language to automate text analysis at scale and understand unstructured text in documents, customer feedback, support tickets, and reviews. This enables you to extract insights for improving customer experience and increasing efficiency.

OCI Language lets developers add capabilities such as sentiment analysis, key-phrase extraction, text classification, named entity recognition, and more to their applications. Developers can integrate pre-trained NLP capabilities into applications without needing data scientists to create customized models. OCI Language can be accessed through the OCI console, REST APIs, the OCI CLI, or the OCI SDKs for Python, Java, Go, TypeScript, and .NET.

Using OCI Language provides the following benefits:
  • Improve customer experience: Explore how customers use the product(s), extract sentiments about certain areas of interest, and identify key frustrations to address them in a timely manner.
  • Identify important data: Extract named entities from customer feedback to identify people, products, and organizations mentioned.
  • Ensure security and privacy: OCI Language upholds customer privacy with language models that do not store any data for training, debugging, or other purposes. In addition, OCI Language can be used to identify any potential personally identifiable information (PII) to protect customer privacy.

Architecture

This architecture demonstrates the relationship among the various components in a typical system that has OCI Language at its core.

About 80% of the world's data is unstructured, and most of it is written in natural language. This reference architecture applies to any system that receives feedback from customers. This specific example uses a booking application for the hospitality industry, which includes accommodation, food and beverage service, event planning, theme parks, hotels, travel agencies, restaurants, and bars. The following diagram describes how you can use the pre-built AI capabilities to analyze, explore, and visualize customer feedback to extract insights for improving customer experience.

A hotel chain asks customers for feedback after they have checked out, which produces a continuous stream of feedback that needs to be analyzed. The hotel uses a spreadsheet to capture thousands of reviews, with each row containing one customer review that expresses satisfaction or dissatisfaction with the service, location, or food. This information can be used to improve products, services sold, or the whole business. Because there is so much unstructured information, the data must be ingested, insights must be extracted from it, and the results must be analyzed and visualized. The data integration service is used to orchestrate the data flow in this reference architecture.

The following diagram illustrates this reference architecture.

[Figure: oci-ai-language-arch.png]

The following section describes the customer feedback analytics flow in this reference architecture.
  1. Aggregate the customer review data in a data source such as a database or file. For the purposes of this example, we will assume the data is in a .csv file in object storage.
  2. Data Integration reads the data from the data source and, for each customer review, calls OCI Language through a serverless function.
  3. OCI Language extracts a list of aspects and their related sentiments (positive, negative, neutral) from each record. In addition, OCI Language extracts the entities mentioned in each review, such as the names of people or organizations.

    For example, one of the reviews says: "Hey the hotel was beautiful, the staff was very kind to me but the breakfast food was not so great". OCI Language extracts aspects such as "hotel", "breakfast", and "staff", and reports that "hotel" and "staff" have positive sentiment and "breakfast" has negative sentiment.

    It is also possible to extract entities, such as names of people, locations, organizations, and events using OCI Language.

  4. Once the aspects and entities are received by Data Integration, this information is projected as a set of tables into the Autonomous Data Warehouse. Three different tables are projected: a table for the raw review data, a table for the detected aspects and their sentiment, and a table for the identified entities.

    The target database can also be a different type of database, such as MySQL.

  5. You can then use Oracle Analytics Cloud to visualize the extracted insights. Oracle Analytics Cloud allows you to create charts from the extracted tables and filter the data. For instance, you could plot the sentiment over time in a chart, or visualize the aspects that are the most likely to cause positive or negative sentiment in a word cloud.

    The end-to-end flow, from the source file to the insights displayed in Oracle Analytics Cloud, is as follows: Object Storage → Data Integration Service → Oracle Functions → OCI Language → Oracle Functions → Data Integration Service → Autonomous Data Warehouse → Oracle Analytics Cloud.
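
The projection in step 4 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Data Integration implementation: the `ReviewAnalysis` container, the column names, the entity type labels, and the second sample review are hypothetical, while the first review and its aspect sentiments come from the example in step 3.

```python
from dataclasses import dataclass, field


@dataclass
class ReviewAnalysis:
    """Hypothetical container for what the function returns for one review."""
    review_id: str
    text: str
    aspects: list = field(default_factory=list)   # (aspect_text, sentiment) pairs
    entities: list = field(default_factory=list)  # (entity_text, entity_type) pairs


def project_to_tables(analyses):
    """Split per-review results into three row lists: reviews, aspects, entities."""
    reviews, aspects, entities = [], [], []
    for analysis in analyses:
        reviews.append({"review_id": analysis.review_id, "review_text": analysis.text})
        for aspect_text, sentiment in analysis.aspects:
            aspects.append({"review_id": analysis.review_id,
                            "aspect": aspect_text,
                            "sentiment": sentiment})
        for entity_text, entity_type in analysis.entities:
            entities.append({"review_id": analysis.review_id,
                             "entity": entity_text,
                             "entity_type": entity_type})
    return reviews, aspects, entities


samples = [
    # Review and sentiments taken from the example in step 3.
    ReviewAnalysis(
        review_id="r1",
        text=("Hey the hotel was beautiful, the staff was very kind to me "
              "but the breakfast food was not so great"),
        aspects=[("hotel", "positive"), ("staff", "positive"), ("breakfast", "negative")],
    ),
    # Hypothetical review, added only to show rows in the entities table.
    ReviewAnalysis(
        review_id="r2",
        text="Maria at the front desk in Lisbon was very helpful",
        aspects=[("front desk", "positive")],
        entities=[("Maria", "PERSON"), ("Lisbon", "LOCATION")],
    ),
]

reviews_table, aspects_table, entities_table = project_to_tables(samples)
```

Keeping `review_id` in every row lets the aspects and entities tables be joined back to the raw review table in the target database.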

The architecture has the following components:

  • Region

    An Oracle Cloud Infrastructure region is a localized geographic area that contains one or more data centers, called availability domains. Regions are independent of other regions, and vast distances can separate them (across countries or even continents).

  • Availability domains

    Availability domains are standalone, independent data centers within a region. The physical resources in each availability domain are isolated from the resources in the other availability domains, which provides fault tolerance. Availability domains don’t share infrastructure such as power or cooling, or the internal availability domain network. So, a failure at one availability domain is unlikely to affect the other availability domains in the region.

  • Virtual cloud network (VCN) and subnets

    A VCN is a customizable, software-defined network that you set up in an Oracle Cloud Infrastructure region. Like traditional data center networks, VCNs give you complete control over your network environment. A VCN can have multiple non-overlapping CIDR blocks that you can change after you create the VCN. You can segment a VCN into subnets, which can be scoped to a region or to an availability domain. Each subnet consists of a contiguous range of addresses that don't overlap with the other subnets in the VCN. You can change the size of a subnet after creation. A subnet can be public or private.

  • Load Balancer

    The Oracle Cloud Infrastructure Load Balancing service provides automated traffic distribution from one entry point to multiple servers reachable from your virtual cloud network (VCN). The service offers a load balancer with your choice of a public or private IP address, and provisioned bandwidth. A load balancer improves resource utilization, facilitates scaling, and helps ensure high availability.

    You can configure multiple load balancing policies and application-specific health checks to ensure that the load balancer directs traffic only to healthy instances. The load balancer can reduce your maintenance window by draining traffic from an unhealthy application server before you remove it from service for maintenance.

    The Load Balancing service enables you to create a public or private load balancer within your VCN. A public load balancer has a public IP address that is accessible from the internet. A private load balancer has an IP address from the hosting subnet, which is visible only within your VCN. Consider creating dedicated subnets for private and public load balancers to accommodate future requirements. For any internet-facing web application or HTTP-based API, consider using a public load balancer together with Oracle Cloud Infrastructure Web Application Firewall (WAF).

  • Security lists

    For each subnet, you can create security rules that specify the source, destination, and traffic type that must be allowed in and out of the subnet.

  • Route tables

    Virtual route tables contain rules to route traffic from subnets to destinations outside a VCN, typically through gateways.

  • Internet gateway

    The internet gateway allows traffic between the public subnets in a VCN and the public internet.

  • VPN Connect

    VPN Connect provides site-to-site IPSec VPN connectivity between your on-premises network and VCNs in Oracle Cloud Infrastructure. The IPSec protocol suite encrypts IP traffic before the packets are transferred from the source to the destination and decrypts the traffic when it arrives.

  • Identity and access management (IAM)

    Oracle Cloud Infrastructure Identity and Access Management (IAM) enables you to control who can access your resources in Oracle Cloud Infrastructure and the operations that they can perform on those resources.

  • Object storage

    Object storage provides quick access to large amounts of structured and unstructured data of any content type, including database backups, analytic data, and rich content such as images and videos. You can safely and securely store and then retrieve data directly from the internet or from within the cloud platform. You can seamlessly scale storage without experiencing any degradation in performance or service reliability. Use standard storage for "hot" storage that you need to access quickly, immediately, and frequently. Use archive storage for "cold" storage that you retain for long periods of time and rarely access.

  • Data Integration Service

    Oracle Cloud Infrastructure Data Integration is a fully managed, serverless, cloud-native service that extracts, loads, transforms, cleanses, and reshapes data from a variety of data sources into target Oracle Cloud Infrastructure services, such as Autonomous Data Warehouse and Oracle Cloud Infrastructure Object Storage. ETL (extract transform load) leverages fully-managed scale-out processing on Spark, and ELT (extract load transform) leverages full SQL push-down capabilities of the Autonomous Data Warehouse in order to minimize data movement and to improve the time to value for newly ingested data. Users design data integration processes using an intuitive, codeless user interface that optimizes integration flows to generate the most efficient engine and orchestration, automatically allocating and scaling the execution environment. Oracle Cloud Infrastructure Data Integration provides interactive exploration and data preparation and helps data engineers protect against schema drift by defining rules to handle schema changes.

    For the hotel reviews example, you can create a data flow to read your unstructured data, call OCI Language to extract insights from the text, and then project the extracted insights into structured tables in a database. For more information, see the linked blog post: Extracting insights from unstructured data using AI services in the "More Information" section.

  • Functions

    Oracle Functions is a fully managed, multitenant, highly scalable, on-demand, Functions-as-a-Service (FaaS) platform. It is powered by the Fn Project open source engine. Functions enable you to deploy your code, and either call it directly or trigger it in response to events. Oracle Functions uses Docker containers hosted in Oracle Cloud Infrastructure Registry.

  • OCI Language

    OCI Language is a serverless and multi-tenant service that is accessible using REST API calls. It provides pre-trained models that are frequently retrained and monitored to give you the best results. OCI Language provides artificial intelligence and machine learning capabilities to detect the language of your unstructured text, along with other tools, such as sentiment analysis, key-phrase extraction, text classification, and named entity recognition, that help you gain further insights from your text.

  • Autonomous Data Warehouse

    Oracle Autonomous Data Warehouse is a self-driving, self-securing, self-repairing database service that is optimized for data warehousing workloads. You do not need to configure or manage any hardware, or install any software. Oracle Cloud Infrastructure handles creating the database, as well as backing up, patching, upgrading, and tuning the database.

  • Oracle Analytics Cloud

    Oracle Analytics Cloud is a scalable and secure public cloud service that empowers business analysts with modern, AI-powered, self-service analytics capabilities for data preparation, visualization, enterprise reporting, augmented analysis, and natural language processing and generation. With Oracle Analytics Cloud, you also get flexible service management capabilities, including fast setup, easy scaling and patching, and automated lifecycle management.

    Transforming thousands of unstructured reviews into structured formats, such as the aspects table, enables you to use the data for scenarios, such as data analytics, training machine learning models, and search. For the hotel reviews example, you can load the data into Oracle Analytics Cloud to visualize the insights and explore the information in a way that allows you to identify actionable tasks. For more information, see the linked blog post: Extracting insights from unstructured data using AI services in the "More Information" section.

  • Fault domains

    A fault domain is a grouping of hardware and infrastructure within an availability domain. Each availability domain has three fault domains with independent power and hardware. When you distribute resources across multiple fault domains, your applications can tolerate physical server failure, system maintenance, and power failures inside a fault domain.

  • Analytics, ML, and custom applications

    Analytics services and custom applications that will catalog, prepare, process, and analyze data.
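
As a rough sketch of how code running in a function might call OCI Language for aspect-based sentiment, the following Python assumes the OCI Python SDK's `oci.ai_language` module, with `AIServiceLanguageClient.batch_detect_language_sentiments`, `TextDocument`, and `BatchDetectLanguageSentimentsDetails` as found in recent SDK versions. Verify the names and parameters against the SDK reference for your version. The helper name `detect_aspect_sentiments` is hypothetical, and the client and models module are passed in as arguments so that the helper can be exercised without credentials.

```python
def detect_aspect_sentiments(client, models, reviews):
    """Send reviews to OCI Language and return {review_key: [(aspect, sentiment), ...]}.

    client:  assumed to be an oci.ai_language.AIServiceLanguageClient
    models:  assumed to be the oci.ai_language.models module
    reviews: dict that maps a review key to the review text
    """
    # One TextDocument per review; English is assumed for this example.
    documents = [
        models.TextDocument(key=key, text=text, language_code="en")
        for key, text in reviews.items()
    ]
    details = models.BatchDetectLanguageSentimentsDetails(documents=documents)

    # level=["ASPECT"] requests aspect-level (rather than sentence-level) results.
    response = client.batch_detect_language_sentiments(details, level=["ASPECT"])

    return {
        doc.key: [(aspect.text, aspect.sentiment) for aspect in doc.aspects]
        for doc in response.data.documents
    }


# Example wiring (assumes the OCI SDK is installed and a config file exists):
#
#   import oci
#   config = oci.config.from_file()
#   client = oci.ai_language.AIServiceLanguageClient(config)
#   detect_aspect_sentiments(client, oci.ai_language.models,
#                            {"r1": "The staff was very kind to me"})
```

Inside Oracle Functions, a resource principal signer would typically replace the config file, but that wiring is outside the scope of this sketch.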

Recommendations

Use the following recommendations as a starting point. Your requirements might differ from the architecture described here.
  • VCN

    When you create a VCN, determine the number of CIDR blocks required and the size of each block based on the number of resources that you plan to attach to subnets in the VCN. Use CIDR blocks that are within the standard private IP address space.

    Select CIDR blocks that don't overlap with any other network (in Oracle Cloud Infrastructure, your on-premises data center, or another cloud provider) to which you intend to set up private connections.

    After you create a VCN, you can change, add, and remove its CIDR blocks.

    When you design the subnets, consider your traffic flow and security requirements. Attach all the resources within a specific tier or role to the same subnet, which can serve as a security boundary.

  • Security

    Use policies to restrict who can access your company's OCI resources and how they can access them. Use Oracle Cloud Guard to monitor the security of those resources. When any misconfiguration or insecure activity is detected, Cloud Guard recommends corrective actions and assists with those actions, based on responder recipes that you can define.

    For resources that require maximum security, Oracle recommends that you use security zones. A security zone is a compartment associated with an Oracle-defined recipe of security policies that are based on best practices. For example, the resources in a security zone must not be accessible from the public internet and they must be encrypted using customer-managed keys.

    When you create and update resources in a security zone, OCI validates the operations against the policies in the security-zone recipe, and denies operations that violate any of the policies.

  • Autonomous Data Warehouse

    This architecture uses Oracle Autonomous Data Warehouse on shared infrastructure. Enable auto scaling to give the database workloads up to three times the processing power.

    Consider using the hybrid partitioned tables feature of Autonomous Data Warehouse to move partitions of data to Oracle Cloud Infrastructure Object Storage and serve them to users and applications transparently. We recommend that you use this feature for data that is not often consumed and for which you don't need the same performance as for data stored within Autonomous Data Warehouse.

    Consider using the external tables feature to consume data stored in Oracle Cloud Infrastructure Object Storage in real time without the need to replicate it to Autonomous Data Warehouse. This feature transparently and seamlessly joins data sets curated outside of Autonomous Data Warehouse, regardless of the format (parquet, avro, orc, json, csv, and so on), with data residing in the Autonomous Data Warehouse.

    Consider using ADW query accelerator when consuming object storage data to deliver an improved and faster experience to users.

  • Object Storage

    This architecture uses standard Oracle Cloud Infrastructure Object Storage to store processed output so that other cloud services can access the output for further analysis and display.

  • Load balancer bandwidth

    While creating the load balancer, you can either select a predefined shape that provides a fixed bandwidth, or specify a custom (flexible) shape where you set a bandwidth range and let the service scale the bandwidth automatically based on traffic patterns. With either approach, you can change the shape at any time after creating the load balancer.
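
The VCN recommendation above (use private address space and avoid overlapping CIDR blocks) can be checked before any network is created. The following is a minimal sketch that uses Python's standard `ipaddress` module. It assumes that "standard private IP address space" means the RFC 1918 ranges, and the address plan and helper names are hypothetical.

```python
import ipaddress
from itertools import combinations

# RFC 1918 private IPv4 ranges (assumed meaning of "standard private IP address space").
RFC1918 = [ipaddress.ip_network(n)
           for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


def is_rfc1918(block):
    """Return True if the IPv4 CIDR block lies entirely inside an RFC 1918 range."""
    network = ipaddress.ip_network(block)
    return any(network.subnet_of(private) for private in RFC1918)


def find_overlaps(cidr_blocks):
    """Return every pair of CIDR blocks that share at least one address."""
    networks = [ipaddress.ip_network(block) for block in cidr_blocks]
    return [(str(a), str(b)) for a, b in combinations(networks, 2) if a.overlaps(b)]


# Hypothetical plan: a VCN block, an on-premises range, and a second VCN block.
planned = ["10.0.0.0/16", "172.16.0.0/12", "10.0.128.0/20"]

overlaps = find_overlaps(planned)  # [("10.0.0.0/16", "10.0.128.0/20")]
non_private = [block for block in planned if not is_rfc1918(block)]  # []
```

In this plan the second VCN block falls inside the first, so it would need a different range before the two networks could be connected privately.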

Considerations

Consider the following points when deploying this reference architecture.

  • Resource limits

    Consider the best practices, limits by service, and compartment quotas for your tenancy.

  • Connectivity

    Consider using FastConnect if you want a dedicated, private connection between your premises and OCI; otherwise, use VPN Connect.

  • OCI Monitoring

    The OCI Monitoring service enables you to actively and passively monitor your cloud resources using the metrics and alarms features.

  • Cost

    Use flexible shapes to select the number of CPUs and the amount of memory you need for the workloads that run on the instance. This flexibility enables you to build VMs that match your workload, allowing you to optimize performance and minimize cost.

  • Chatbots with real-time sentiment analysis

    As a future project, this architecture can be adapted to use chatbots. Sentiment analysis has evolved to include real-time narrative mapping, which allows the chatbot to look at the important words in a sentence and assign them a relative value of positive, neutral, or negative, giving the bot an understanding of the overall tenor of the conversation.
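
One simple way to picture how a chatbot could track the tenor of a conversation is a rolling average over the most recent sentiment labels. The sketch below is purely illustrative: the numeric mapping, the window size, and the `ConversationTenor` class are assumptions, not part of OCI Language or of this architecture.

```python
from collections import deque

# Hypothetical mapping from sentiment label to a relative numeric value.
SENTIMENT_VALUES = {"positive": 1.0, "neutral": 0.0, "negative": -1.0}


class ConversationTenor:
    """Track the average sentiment of the most recent messages."""

    def __init__(self, window=5):
        # Only the last `window` values are kept; older ones drop off.
        self._recent = deque(maxlen=window)

    def update(self, sentiment_label):
        """Record one message's sentiment and return the current tenor in [-1, 1]."""
        self._recent.append(SENTIMENT_VALUES[sentiment_label])
        return sum(self._recent) / len(self._recent)


tenor = ConversationTenor(window=3)
scores = [tenor.update(label)
          for label in ["positive", "negative", "negative", "negative"]]
# The final score is -1.0 because the window now holds three negative messages.
```

A bot could, for example, hand the conversation to a human agent when the tenor stays below a chosen threshold.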

Acknowledgements

  • Author: Gabriel Grigorie
  • Contributors: Hassan Ajan, Luis Cabrera-Cordon, Mari Messinger

Change Log

This log lists significant changes: