Synchronize Query Server With Hive

5.4 Synchronize Query Server With Hive

You can synchronize the Query Server with the Hive databases that you specified by using Cloudera Manager, Apache Ambari, or the dbms_bdsqs.sync_hive_databases PL/SQL API.

You can synchronize Query Server with the Hive databases in the metastore using one of the following methods:

Execute the Restart this Big Data SQL Query Server command in Cloudera Manager or Apache Ambari.
Execute the Synchronize Hive Databases command in Cloudera Manager or Apache Ambari.
Invoke the dbms_bdsqs.sync_hive_databases PL/SQL API locally on the edge node.

You must specify the Hive databases to use in the synchronization either by using the bds-config.json configuration file or by using the sync_hive_db_list configuration parameter in Cloudera Manager.

Note that the dbms_bdsqs.sync_hive_databases PL/SQL API will only refresh the Hive table definitions for the Hive databases that have already been synchronized through the other two methods.

5.4.1 Restart Query Server Manually Using Cloudera Manager

You can synchronize Query Server with the Hive databases that you specified by restarting Query Server in Cloudera Manager or Apache Ambari.

You can use Cloudera Manager or Apache Ambari to manage Query Server such as starting, stopping, and restarting it. When you restart or start Query Server, it synchronizes the metadata with the Hive databases that you specified. Any changes in the Hive databases in the metatore such as dropping or adding tables will be reflected in Query Server. For example, you can restart Query Server in Cloudera Manager as follows:

Login to Cloudera Manager using your login credentials.
In the list of available services, click the Big Data SQL link to display the Big Data SQL details page.
From the Status Summary section, click the Big Data SQL Query Server link to display the Big Data SQL Query Server details page.
From the Actions drop-down list, select Restart this Big Data SQL Query Server.
A dialog box is displayed. Click Restart this Big Data SQL Query Server. Another dialog box is displayed to monitor the status of the synchronization job.

5.4.2 Synchronize Query Server Manually Using Cloudera Manager

You can use Cloudera Manager or Apache Ambari to manually synchronize Query Server with the Hive databases that you specified.

After the sychronization, any changes in the Hive databases in the metastore such as dropped or added tables will be reflected in Query Server. For example, you can synchronize Query Server in Cloudera Manager as follows:

Login to Cloudera Manager by using your login credentials.
In the list of available services, click the Big Data SQL link to display the Big Data SQL details page.
From the Status Summary section, click the Big Data SQL Query Server link to display the Big Data SQL Query Server details page.
From the Actions drop-down list, select Synchronize Hive Databases.
A dialog box is displayed. Click Synchronize Hive Databases. Another dialog box is displayed to monitor the status of the synchronization job.

5.4.3 Synchronize Query Server Using the PL/SQL API

You can synchronize Query Server with the Hive databases that you specified by using the PL/SQL API.

To do so, invoke the dbms_bdsqs.sync_hive_databases PL/SQL API locally on the edge node where the Query Server is installed.

The procedure contains no parameters. It synchronizes all of the Hive databases that are already in Query Server. The API will refresh the Query Server with only the Hive databases listed in the sync_hive_db_list configuration parameter. Each successive synchronization (also known as a refresh) will process changes since the last Query Server metadata refresh.

A synchronization captures any tables that were added or dropped in the Hive metastore since the last refresh. This also includes any tables whose schemas might have changed.

5.4.4 Enable Query Server Full Synchronization

You can specify whether Query Server performs a delta (default) or a full synchronization.

During the Query Server installation process, the Oracle schemas and the appropriate external tables are created based on the Hive databases list that you can specify either in the bds-config.json configuration file or the sync_hive_db_list configuration parameter. In that case, Query Server performs a full synchronization. By default, Query Server performs a delta synchronization during subsequent restarts or synchronizations.

You can control whether Query Server performs a full or a delta synchronization by using the Enable full synchronization configuration parameter in Cloudera Manager or Apache Ambari. This configuration parameter is de-selected by default. To enable Query Server to perform a full synchronization, select this checkbox in Cloudera Manager or Apache Ambari. For example, you can use Cloudera Manager to enable Query Server to perform a full synchronization during a restart or a manual synchronization as follows:

Login to Cloudera Manager by using your login credentials.
In Cloudera Manager, use the Search field to search for the Enable full synchronization configuration parameter. Enter / Enable full synchronization (or enter part of the name until it is displayed in the list) in the Search field, and then press Enter.
Click Big Data SQL: Enable full synchronization. The checkbox is de-selected by default. This indicates that Query Sever will perform a delta synchronization.
To enable full synchronization, select the checkbox, and then click Save Changes.

A full synchronization drops all of the existing schemas and external tables from Query Server, and then re-creates new schemas and new external tables based on the Hive databases list that you specified in the sync_hive_db_list configuration parameter.

By default, Query Server performs a delta synchronization between the Hive databases in the metastore that you specify and Query Server. Any changes in the Hive databases such as dropping or adding tables will be reflected in Query Server. However, When you start Query Server for the very first time, it will create Oracle Schemas based on the Hive databases that you specify either in the bds-config.json configuration file or in sync_hive_db_list configuration parameter in Cloudera Manager or Apache Ambari.

The first time the Query Server synchronizes with Hive the process will be slower than usual. This is because it is importing all of the tables for the specified databases (configured in Cloudera Manager or Apache Ambari) in the Hadoop cluster. Subsequent refreshes should be much faster as it would only refresh the changes that were made to the Hive Metadata such as additions of new tables. During a delta import, the Query Server will also gather new statistics for tables that have been added/modified.