5.1 About Oracle Big Data SQL Query Server

Oracle Big Data SQL Query Server is an Oracle Database instance that you can optionally install as a component of Oracle Big Data SQL on an edge node in your Hadoop cluster. You use Query Server to primarily query data stored in the cluster (in HDFS and Hive formats) or object stores using Oracle external tables. This enables you to take advantage of the full SQL capabilities provided by the Oracle Database.

You can define external tables using the ORACLE_HDFS and ORACLE_HIVE or ORACLE_BIGDATA access drivers or have the Query Server automatically define external tables based on the metadata in the Hive metastore. In the latter case, Hive databases map to Oracle Database schemas – and the corresponding Hive tables are defined as Oracle external tables in those schemas. All data authorization is based on authorization rules in Hadoop such as Apache Sentry or HDFS Access Controls Lists (ACLs).

Once installed, Query Server provides an Oracle Database deployment that is automatically configured to query data in your Hadoop cluster using SQL. Restarting the Query Server restores the database to a “clean” state, eliminating management overhead. A restart preserves external tables (ORACLE_HIVE, ORACLE_HDFS, and ORACLE_BIGDATA types), associated statistics, user defined views, and credentials. A restart deletes regular tables containing user data.

If your solution requires High Availability (HA), advanced Oracle security policies, or combining data in Oracle Database with data in Hadoop, then you should leverage a full-blown Oracle Database with Big Data SQL. Oracle supports using both Query Server and a Big Data SQL enabled Oracle Database for a single Hadoop deployment.

To install Query Server, you must specify an existing edge node in your Hadoop cluster in the bds-config.json configuration file. You use the same configuration file to specify a list of Hive databases. Query Server automatically creates Oracle external tables corresponding to the tables in the Hive metastore database(s) so that they are ready for querying after a successful installation. The set of external tables in the Query Server can be automatically kept up-to-date with the corresponding Hive metastore tables by running either the Restart this Big Data SQL Query Server or the Synchronize Hive Databases commands in Cloudera Manager or Apache Ambari cluster management software. You can also use the dbms_bdsqs.sync_hive_databases PL/SQL API package procedure. See SYNC_HIVE_DATABASES.

See Also:

See Introduction in the Oracle Big Data SQL Installation Guide, which describes how to install and configure the software on the two sides of an Oracle Big Data SQL configuration.

See Store Oracle Tablespaces in HDFS for instructions on how to set up data files for smart scanning.

5.1.1 Query Server Features

Big Data SQL Query Server provides automatic installation and configuration, integration with Hadoop cluster managers, and automatic integration of cluster metadata:

  • Automatic installation and configuration: Oracle Big Data SQL installer automatically installs and configures Query Server, if you specify an existing target edge node in the Hadoop cluster in the bds-config.json configuration file. To specify the edge node where to install Query Server, you add the edgedb parameter and the node and enabled attributes to the bds-config.json configuration file to as shown in the following example where <edgenode_host_name> is the name of your edge node:
    "edgedb": {
            "node" : "dbnode.domain.com",  
            "enabled" : "true",  
            "sync_hive_db_list" : "my_hive_db_1,my_hive_db2"  
        }

    Note:

    If the bds-config.json configuration file does not include the edgedb subsection, then Query Server is not installed.

    See Also:

    bds-config.json Configuration Example in Oracle Big Data SQL Installation Guide shows a fully-populated bds-config.json file. The example includes all available configuration parameters.
  • Integration with Hadoop cluster managers: You can monitor and manage Query Server as a service using Cloudera Manager or Apache Ambari Hadoop cluster management tools.
  • Synchronization with Hive: When you start the Oracle Big Data service, Query Server automatically refreshes its metadata from the Hive metastore. After the initial refresh, users can synchronize the Query Server with the latest metadata in the Hive metastore.