Specify the Hive Databases to Synchronize With Query Server

5.3 Specify the Hive Databases to Synchronize With Query Server

Before Query Server can synchronize with Hive databases in the metastore, you must specify which Hive databases to synchronize.

Use either of these methods:

During installation, specify the sync_hive_db_list parameter in the bds-config.json configuration file.
After installation, you can update the sync_hive_db_list configuration parameter in Cloudera Manager or Apache Ambari.

After installation, Query Server automatically creates schemas and external tables based on the list of Hive metastore databases that you specified. Every subsequent Query Server restart performs a delta synchronization.

5.3.1 Specify the Hive Databases in the bds-config.json Configuration File

You can provide the initial list of Hive databases to synchronize with Query Server as part of the installation process using the bds-config.json configuration file.

In the configuration file, set the sync_hive_db_list configuration parameter to a comma-separated list of Hive database names. The following example specifies two Hive databases for the sync_hive_db_list configuration parameter: htdb0 and htdb1. Only these two databases will be synchronized with Query Server, even if the Hive metastore contains other databases.

"edgedb": {
     "node": "<edgenode_host_name>",
     "enabled": "true",
     "sync_hive_db_list": "htdb0,htdb1"
     . . .
}

To synchronize all Hive databases in the metastore with Query Server, use the "*" wildcard character as follows:

"edgedb": {
     "node": "<edgenode_host_name>",
     "enabled": "true",
     "sync_hive_db_list": "*"
     . . .
}
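The two forms of the sync_hive_db_list value shown above can also be generated programmatically. The following Python sketch is illustrative only: the helper name with_sync_list is hypothetical and not part of Big Data SQL, and it handles only the keys shown in the examples, not the full layout of bds-config.json.

```python
import json


def with_sync_list(edgedb, databases):
    """Return a copy of an "edgedb" section with sync_hive_db_list set.

    databases is either the string "*" (all Hive databases) or a list
    of Hive database names. This helper is hypothetical and only builds
    the value format shown in the examples above.
    """
    if databases == "*":
        value = "*"
    elif isinstance(databases, str):
        # A bare string other than "*" is almost certainly a mistake.
        raise TypeError('pass a list of names, or "*" for all databases')
    else:
        names = [name.strip() for name in databases]
        if not names or any(not name or "," in name for name in names):
            raise ValueError("database names must be non-empty and contain no commas")
        value = ",".join(names)
    updated = dict(edgedb)  # leave the caller's dictionary untouched
    updated["sync_hive_db_list"] = value
    return updated


# Placeholder host name, as in the examples above.
edgedb = {"node": "<edgenode_host_name>", "enabled": "true"}
print(json.dumps({"edgedb": with_sync_list(edgedb, ["htdb0", "htdb1"])}, indent=4))
```

Building the value from a list rather than by hand avoids stray commas or empty entries in the parameter.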

If the bds-config.json configuration file does not contain the sync_hive_db_list configuration parameter, then no synchronization will take place between the Hive databases and Query Server. In that case, you must specify the Hive databases using the sync_hive_db_list configuration parameter in Cloudera Manager or Apache Ambari.
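The three cases described so far (a named list, the "*" wildcard, and a missing parameter) can be summarized in a short Python sketch. It models the documented behavior only and is not Query Server's implementation. The function name databases_to_sync is hypothetical, and two details are assumptions: surrounding whitespace in names is trimmed, and names that are not present in the metastore are ignored.

```python
def databases_to_sync(edgedb, metastore_databases):
    """Model which Hive databases a given "edgedb" section selects.

    edgedb: dict holding the keys of the "edgedb" configuration section.
    metastore_databases: names of all databases in the Hive metastore.
    """
    value = edgedb.get("sync_hive_db_list")
    if value is None:
        # Parameter absent: no synchronization takes place.
        return []
    if value.strip() == "*":
        # Wildcard: every database in the metastore.
        return sorted(metastore_databases)
    # Named list: only the listed databases. Ignoring names missing
    # from the metastore is an assumption made for this sketch.
    requested = {name.strip() for name in value.split(",") if name.strip()}
    return sorted(requested & set(metastore_databases))


metastore = ["default", "htdb0", "htdb1", "sales"]
print(databases_to_sync({"sync_hive_db_list": "htdb0,htdb1"}, metastore))
print(databases_to_sync({"sync_hive_db_list": "*"}, metastore))
print(databases_to_sync({}, metastore))
```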

Note:

Query Server is not intended to store internal data in Oracle tables. Whenever Query Server is restarted, it is reset to its initial, clean state. This eliminates typical database maintenance tasks such as storage management and database configuration. The goal of Query Server is to provide a SQL front end for data in Hadoop, Object Store, Kafka, and NoSQL databases, not to serve as a general-purpose RDBMS.

5.3.2 Updating the Hive Databases With the sync_hive_db_list Configuration Parameter

You can update the list of Hive databases to synchronize with Query Server by using the sync_hive_db_list configuration parameter in Cloudera Manager as follows:

1. Log in to Cloudera Manager by using your login credentials.
2. In Cloudera Manager, use the Search field to search for the Synchronized Hive Databases configuration parameter. Enter /Synchronized Hive Databases (or enter part of the name until it is displayed in the list) in the Search field, and then press Enter.
3. Click the Big Data SQL: Synchronized Hive Databases parameter.
4. In the Synchronized Hive Databases text box, enter the names of the Hive databases separated by commas, such as htdb0,htdb1, and then click Save Changes. In this example, only these two Hive databases will be synchronized with Query Server.
5. Alternatively, to synchronize all Hive databases in the metastore with Query Server, enter the "*" wildcard character in the Synchronized Hive Databases text box, and then click Save Changes.