Perform multicloud data analytics with Oracle modern data platform

Organizations can build an enterprise data warehouse to store both live and archived data in one location.

Data is generated whenever a business process is completed, an operation is performed, or a product is sold. Because that data comes from heterogeneous sources, organizations want a simplified approach to building a centralized data store: a one-stop shop for all their data analytics needs. The sources vary widely and include data reported by users, manufacturers, distributors, partners, and third-party vendors, along with day-to-day retail orders and customer feedback. The goal is a centralized enterprise data warehouse whose data has been extracted and analyzed by business function, giving the organization end-to-end business visibility and the ability to respond with data-driven information.

Organizations want to leverage the information and make data-driven decisions to run their businesses more efficiently. This multicloud data analytics solution enables organizations to effectively run analytics using a central data warehouse on Oracle modern data platform with integrations to multiple data sources, such as Oracle Fusion Cloud Enterprise Resource Planning, on-premises Microsoft Azure SQL Server (SQL Server), Salesforce, eBay, and Google Analytics.

Benefits include:

  • A unified data analytics pipeline

    Simplified access to all data across clouds and on-premises installations, including data stored in databases and object stores.

  • Ease of integration

    Integrate data from disparate systems: federate, orchestrate, synchronize, and mash up data. Integrate any data, in any format, through any API, at any speed, with any application or device. All of this is done while enabling secure collaboration and honoring security rules, without writing any code.

  • High performance analytics

    Fast access to data using query tools enables quick decisions and better customer service.

  • Advanced analytics capabilities

    Enable advanced analytic techniques such as data and text mining, machine learning, forecasting, sentiment analysis, network and cluster analysis, graph analysis, complex event processing, and neural networks.

  • Single platform

    A single, cloud-based platform to increase collaboration within teams, improve execution and time-to-market, and accelerate innovation.

  • Cost, security, and availability

    Organizations want to reduce capital expenditures (CapEx) and operational expenditures (OpEx), while still achieving a good balance of cost and performance with security and availability.

Architecture

This reference architecture shows an enterprise multicloud data analytics pipeline that takes and formats data from different sources, moves it into the enterprise data warehouse on Oracle Cloud Infrastructure (OCI), and analyzes it using Oracle Analytics Cloud (OAC).

The data is integrated from various sources using Oracle Integration and OCI integration services. The data sources shown are Salesforce, eBay, SQL Server, Oracle Fusion Cloud Service, and Google Analytics, but the solution applies to any data source that accepts API calls or database connections. OCI integration services connect any application and data source to automate end-to-end processes and centralize management. The broad array of integrations, with pre-built adapters and low-code customization, simplifies migration to the cloud while streamlining multicloud operations.

Oracle Integration (OIC) connects applications, data, and services, including Salesforce, eBay, Oracle Fusion Cloud Service, and partner ecosystems for business-to-business (B2B) communications. Once data from all the sources is available in the staging layer, it is cleansed, standardized, merged, and transformed using Data Integration. Autonomous Data Warehouse (ADW) stores the staging layer, reference data, and analytical layer. OAC is the analytical tool used to generate dashboards, reports, and KPIs, and to drive self-service analytics across the organization. Pre-built connectors from OAC can be used to replicate and merge data from Google Analytics.
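
As a rough illustration of the cleanse, standardize, and merge steps described above, the following Python sketch processes a handful of staging records in memory. In the architecture itself this work is done in Data Integration without hand-written code; the record layout, the field names (`customer_id`, `email`, `country`, `segment`), and the `COUNTRY_CODES` lookup are hypothetical and chosen only for the example.

```python
from typing import Dict, Iterable, List

# Hypothetical reference data: maps free-text country values to a standard code.
COUNTRY_CODES = {"united states": "US", "usa": "US", "us": "US", "india": "IN"}


def cleanse(record: Dict[str, str]) -> Dict[str, str]:
    """Trim whitespace and drop fields whose value is empty or missing."""
    cleaned = {}
    for key, value in record.items():
        value = value.strip() if isinstance(value, str) else value
        if value not in ("", None):
            cleaned[key] = value
    return cleaned


def standardize(record: Dict[str, str]) -> Dict[str, str]:
    """Bring values into one agreed representation."""
    out = dict(record)
    if "email" in out:
        out["email"] = out["email"].lower()
    if "country" in out:
        out["country"] = COUNTRY_CODES.get(out["country"].lower(), out["country"])
    return out


def merge(records: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Combine records sharing a customer_id; later records only fill gaps."""
    merged: Dict[str, Dict[str, str]] = {}
    for record in records:
        target = merged.setdefault(record["customer_id"], {})
        for field_name, value in record.items():
            target.setdefault(field_name, value)
    return list(merged.values())


def build_analytical_rows(staging_rows: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    prepared = [standardize(cleanse(row)) for row in staging_rows]
    return merge(row for row in prepared if "customer_id" in row)


# Two staging rows for the same customer, as they might arrive from two sources.
staging = [
    {"customer_id": "42", "email": " Ana@Example.com ", "country": "USA"},
    {"customer_id": "42", "email": "", "country": "united states", "segment": "retail"},
]
rows = build_analytical_rows(staging)
```

Here the second row contributes only the `segment` field, because the first row already supplied a usable email and country. A real implementation would need an explicit survivorship rule for conflicting values; "first source wins" is just the simplest choice for a sketch.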



[Architecture diagram: oci-multicloud-data-analytics-diagram-oracle.zip]

Data source integration:

  • Google Analytics integration

    Use the built-in Google Analytics data connector in OAC to create a connection to Google Drive or Google Analytics. The connection requires the Google authorized redirect URIs, the client secret, and the Google Analytics view name. Once the connection is authorized, you can read and transform the data and create dashboards in OAC.

  • Salesforce integration

    The Salesforce adapter enables users to create a simplified bidirectional integration with Salesforce.com. It allows the discovery of business objects and operations and provides easy mapping to and from Salesforce.com business objects. OIC builds a workflow that creates a connection with the Salesforce adapter, then pulls the data into ADW.

  • Microsoft Azure SQL Server integration

    SQL Server data is integrated with ADW using Data Integration. You create a data pipeline from SQL Server to ADW, specify the source data asset, and then configure transformations to cleanse and process the data as it is loaded into the target data asset. To execute a specific set of processes in a sequence, you create a pipeline. Designing a pipeline is similar to building a data flow: you use operators to add the tasks and activities you want. After building a pipeline, you create a pipeline task that uses it. After you create tasks, you publish them to the default application or to your own application. Applications run tasks and monitor their progress and status. You can also schedule tasks for automated runs.

  • Manual data feed (flat files)

    Oracle Cloud Infrastructure Object Storage is used as a business file store, where business and operational users upload manual data feed files such as targets, forecasts, monthly customer markers, and tentative workforce alignment metrics. Once data files are available in Oracle Cloud Infrastructure Object Storage buckets, they are automatically picked up for processing using Data Integration.

  • Oracle Fusion Service integration

    OCI Data Integration uses the Oracle Business Intelligence Cloud Connector (BICC) to enable connections to Fusion Applications as data sources. You use a Fusion Applications data asset as a source to extract data from Fusion Applications, such as an ERP or HCM cloud. OCI Data Integration loads the extracted data into a predefined external storage location that’s configured in BICC. In this architecture, the data is loaded from Fusion Applications into Oracle Cloud Infrastructure Object Storage in Parquet format and then into the ADW staging layer.
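
The pipeline idea described under the SQL Server integration (tasks run in a set order, are published to an application, and have their status tracked) can be modeled in a few lines of Python. This is a conceptual sketch only, not the Data Integration API: the `Task` and `Application` classes, the `run_pipeline` method, the status strings, and the task names are all hypothetical, and the task bodies are placeholders that simply report success.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Task:
    """One step in a pipeline; `action` returns True on success."""
    name: str
    action: Callable[[], bool]


@dataclass
class Application:
    """Hypothetical stand-in for an application that runs published tasks."""
    name: str
    published: List[Task] = field(default_factory=list)
    status: Dict[str, str] = field(default_factory=dict)

    def publish(self, task: Task) -> None:
        self.published.append(task)
        self.status[task.name] = "NOT_STARTED"

    def run_pipeline(self) -> bool:
        """Run tasks in publish order and stop at the first failure."""
        for task in self.published:
            self.status[task.name] = "RUNNING"
            ok = task.action()
            self.status[task.name] = "SUCCESS" if ok else "ERROR"
            if not ok:
                return False
        return True


app = Application("default-application")
app.publish(Task("extract_sql_server", lambda: True))      # placeholder action
app.publish(Task("cleanse_and_transform", lambda: True))   # placeholder action
app.publish(Task("load_adw_staging", lambda: True))        # placeholder action
succeeded = app.run_pipeline()
```

Stopping at the first failed task keeps a partially transformed batch from reaching the staging layer; tasks after the failure stay in the `NOT_STARTED` state, which makes it easy to see where a run halted.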

The architecture has the following components:

  • Tenancy

    A tenancy is a secure and isolated partition that Oracle sets up within Oracle Cloud when you sign up for Oracle Cloud Infrastructure. You can create, organize, and administer your resources in Oracle Cloud within your tenancy. A tenancy is synonymous with a company or organization. Usually, a company will have a single tenancy and reflect its organizational structure within that tenancy. A single tenancy is usually associated with a single subscription, and a single subscription usually only has one tenancy.

  • Region

    An Oracle Cloud Infrastructure region is a localized geographic area that contains one or more data centers, called availability domains. Regions are independent of other regions, and vast distances can separate them (across countries or even continents).

  • Compartment

    Compartments are cross-region logical partitions within an Oracle Cloud Infrastructure tenancy. Use compartments to organize your resources in Oracle Cloud, control access to the resources, and set usage quotas. To control access to the resources in a given compartment, you define policies that specify who can access the resources and what actions they can perform.

  • Autonomous Database

    Oracle Autonomous Database is a fully managed, preconfigured database environment that you can use for transaction processing and data warehousing workloads. You do not need to configure or manage any hardware, or install any software. Oracle Cloud Infrastructure handles creating the database, as well as backing up, patching, upgrading, and tuning the database.

  • Analytics

    Oracle Analytics Cloud is a scalable and secure public cloud service that empowers business analysts with modern, AI-powered, self-service analytics capabilities for data preparation, visualization, enterprise reporting, augmented analysis, and natural language processing and generation. With Oracle Analytics Cloud, you also get flexible service management capabilities, including fast setup, easy scaling and patching, and automated lifecycle management.

  • Object storage

    Object storage provides quick access to large amounts of structured and unstructured data of any content type, including database backups, analytic data, and rich content such as images and videos. You can safely and securely store and then retrieve data directly from the internet or from within the cloud platform. You can seamlessly scale storage without experiencing any degradation in performance or service reliability. Use standard storage for "hot" storage that you need to access quickly and frequently. Use archive storage for "cold" storage that you retain for long periods of time and rarely access.

  • Availability domains

    Availability domains are standalone, independent data centers within a region. The physical resources in each availability domain are isolated from the resources in the other availability domains, which provides fault tolerance. Availability domains don’t share infrastructure such as power or cooling, or the internal availability domain network. So, a failure at one availability domain is unlikely to affect the other availability domains in the region.

  • Virtual cloud network (VCN) and subnets

    A VCN is a customizable, software-defined network that you set up in an Oracle Cloud Infrastructure region. Like traditional data center networks, VCNs give you complete control over your network environment. A VCN can have multiple non-overlapping CIDR blocks that you can change after you create the VCN. You can segment a VCN into subnets, which can be scoped to a region or to an availability domain. Each subnet consists of a contiguous range of addresses that don't overlap with the other subnets in the VCN. You can change the size of a subnet after creation. A subnet can be public or private.

  • OCI integration services

    OCI integration services connect any application and data source to automate end-to-end processes and centralize management. The broad array of integrations, with pre-built adapters and low-code customization, simplifies migration to the cloud while streamlining multicloud operations.

  • OCI Application Integration

    OCI Application Integration provides pre-built connectivity to SaaS and on-premises applications, run-ready process automation templates, and a low-code visual builder for web and mobile application development. It gives you native access to events in Oracle Cloud ERP, HCM, and CX. Connect app-specific analytic silos to simplify requisition-to-receipt, recruit-to-pay, lead-to-invoice, and other critical processes, providing your IT and business leaders with end-to-end visibility.

  • Data Integration

    Oracle Cloud Infrastructure Data Integration is a fully managed, serverless, cloud-native service that extracts, loads, transforms, cleanses, and reshapes data from a variety of data sources into target Oracle Cloud Infrastructure services, such as Autonomous Data Warehouse and Oracle Cloud Infrastructure Object Storage. ETL (extract, transform, load) leverages fully managed scale-out processing on Spark, and ELT (extract, load, transform) leverages the full SQL push-down capabilities of Autonomous Data Warehouse to minimize data movement and improve the time to value for newly ingested data. Users design data integration processes using an intuitive, codeless user interface that optimizes integration flows to generate the most efficient engine and orchestration, automatically allocating and scaling the execution environment. Oracle Cloud Infrastructure Data Integration provides interactive exploration and data preparation and helps data engineers protect against schema drift by defining rules to handle schema changes.
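
To make the ELT pattern mentioned under Data Integration more concrete, the sketch below loads raw rows into a staging table first and then runs the transformation as SQL inside the database, so the data does not leave the database to be transformed. It uses Python's built-in `sqlite3` module purely as a local stand-in for Autonomous Data Warehouse; the table names, column names, and sample rows are hypothetical, and this is not the SQL that Data Integration itself generates for push-down.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # local stand-in for the warehouse
conn.executescript(
    """
    CREATE TABLE stg_orders (order_id TEXT, region TEXT, amount TEXT);
    CREATE TABLE fact_sales_by_region (region TEXT PRIMARY KEY, total_amount REAL);
    """
)

# Extract + Load: raw values land in the staging table unchanged.
raw_rows = [
    ("1001", " emea ", "120.50"),
    ("1002", "EMEA", "79.50"),
    ("1003", "apac", "200"),
    ("1004", "apac", None),  # missing amount; filtered out during transform
]
conn.executemany("INSERT INTO stg_orders VALUES (?, ?, ?)", raw_rows)

# Transform: cleansing and aggregation run as one SQL statement in the database.
conn.execute(
    """
    INSERT INTO fact_sales_by_region (region, total_amount)
    SELECT UPPER(TRIM(region)), SUM(CAST(amount AS REAL))
    FROM stg_orders
    WHERE amount IS NOT NULL
    GROUP BY UPPER(TRIM(region))
    """
)
conn.commit()

totals = dict(conn.execute(
    "SELECT region, total_amount FROM fact_sales_by_region ORDER BY region"
))
```

In an ETL variant, the trimming, casting, and aggregation would instead happen in an external processing engine before the rows reach the target table. Keeping the transform in SQL means only the raw load crosses the network.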

Acknowledgments

Author: Parag Pardhi

Contributors: Wei Han, Daryl Eicher, John Sulyok