About Cloud SQL

Oracle Cloud SQL supports queries against non-relational data stored in multiple big data sources, including Apache Hive, HDFS, Oracle NoSQL Database, Apache Kafka, Apache HBase, and other object stores (Oracle Object Store and S3). It enables unified query for distributed data and therefore the ability to view and analyze data from disparate data stores seamlessly, as if it were all stored in an Oracle database.

Note:

Cloud SQL is not included with Oracle Big Data Service. You must pay an extra fee for use of Cloud SQL. See Add Oracle Cloud SQL to a Cluster.

Cloud SQL provides a Query Server that enables you to execute complex Oracle SQL statements against data in the Hadoop ecosystem, either manually or through your applications.

Cloud SQL provides enhancements to Oracle external tables that are specially designed for scale-out processing. An external table is an Oracle Database object that identifies and describes the location of data outside of a database. You can query an external table using the same SQL SELECT syntax that you use for any other database tables.

External tables use access drivers to parse the data outside the database. Each type of external data requires a unique access driver. Cloud SQL includes three access drivers for big data. There are drivers for:

  • Data that has metadata defined in Apache Hive

  • Accessing data stored in HDFS, with metadata specified only by an Oracle data administrator

  • Accessing data stored in the object stores, again with metadata specified only by an Oracle data administrator

Cloud SQL automatically defines external tables for metadata defined in the Hive metastore. Oracle Database schemas are created for Hive databases, and external tables are created for Hive tables. Cloudera Sentry authorization policies are used to control access to data.

Components of a Cloud SQL Deployment

The Cloud SQL architecture consists of a Cloud SQL Query Server — an Oracle Database 19c-compatible query engine that works in conjunction with Cloud SQL processes (known as Cloud SQL cells) that run on the worker nodes of your Big Data Service cluster. The Query Server is installed on its own node of Big Data Service.

Since data in HDFS is stored in an undetermined format, SQL queries require some constructs to parse and interpret data for it to be processed in rows and columns. Cloud SQL leverages available Hadoop constructs to accomplish this, notably the InputFormat and SerDe Java classes, optionally through Hive metadata definitions. The Cloud SQL processing cells on the DataNodes are a layer on top of this generic Hadoop infrastructure. Three key features provided by the cells are Smart Scan, Storage Indexes, and Aggregation Offload. See Cloud SQL Query Processing for information about those.