About Accessing Thrift

Oracle Big Data Cloud deploys two Thrift servers to provide JDBC connectivity to Hive and Spark: Spark Thrift Server and Hive Thrift Server.

JDBC clients can connect to Hive or Spark servers and execute SQL. Spark Thrift Server provides a way to submit Spark jobs via SQL, and Hive Thrift Server provides a way to submit Hadoop jobs via SQL. A common use for this capability is to allow business intelligence (BI) tools to leverage the power of Apache Spark and Apache Hive.

Thrift servers are automatically started when a cluster is provisioned in Big Data Cloud and are made available by default for the Full deployment profile. Thrift servers are not available with the Basic deployment profile.

Create a Keystore and Certificate

Note:

This section about creating a keystore and certificate does not apply to clusters that use Oracle Identity Cloud Service (IDCS) for authentication. Certificates associated with the load balancing service are typically signed by a certificate authority rather than self-signed, so the following steps generally aren't necessary for IDCS-enabled clusters.

Before you can access a Thrift server, a keystore must be created with the appropriate certificate:

  1. Download the certificate locally (on *nix environments):

    echo | \
      openssl s_client -connect ip_address:1080 2>/dev/null | \
      openssl x509 >nginx.crt

    where ip_address is the IP address of the Big Data Cloud Console (cluster console) or any of the master nodes in the cluster.

  2. Create a TrustStore:

    /usr/java/default/bin/keytool -import -trustcacerts \
      -keystore /tmp/bdcsce.jks \
      -storepass truststore_password -noprompt \
      -alias bdcsce-certs \
      -file nginx.crt

    where truststore_password is a password of your choosing.

  3. (Optional) Verify the certificate is properly added:

    /usr/java/default/bin/keytool \
      -keystore /tmp/bdcsce.jks \
      -storepass truststore_password \
      -list -v
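The same check can be done from Java by a client that wants to confirm the truststore before opening a JDBC connection. This is only a sketch; the hasAlias helper is not part of any Big Data Cloud API, and to stay self-contained the example checks an empty in-memory keystore rather than a real file. In practice you would pass /tmp/bdcsce.jks and the truststore password.

```java
import java.io.FileInputStream;
import java.io.InputStream;
import java.security.KeyStore;

public class TrustStoreCheck {
    // Returns true if the keystore contains the given alias.
    // Pass path == null to load an empty in-memory keystore instead of a file.
    static boolean hasAlias(String path, char[] password, String alias) throws Exception {
        KeyStore ks = KeyStore.getInstance("JKS");
        try (InputStream in = (path == null) ? null : new FileInputStream(path)) {
            ks.load(in, password);
        }
        return ks.containsAlias(alias);
    }

    public static void main(String[] args) throws Exception {
        // Real usage: hasAlias("/tmp/bdcsce.jks", truststorePassword, "bdcsce-certs")
        boolean present = hasAlias(null, "truststore_password".toCharArray(), "bdcsce-certs");
        System.out.println("bdcsce-certs present: " + present); // false for the empty keystore
    }
}
```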

Access Spark or Hive Thrift Servers

Most JDBC clients can access the Spark and Hive Thrift Servers. The examples in this section use the Beeline client to show how to connect. The Spark Thrift Server can be accessed using the Beeline client both inside and outside of the cluster, as well as programmatically.

About the JDBC URL

Inside the cluster:

Spark and MapReduce jobs can read the Thrift URLs as system properties. Applications can also read the URL and user name from the /etc/bdcsce/datasources.properties file inside the cluster.
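As a sketch of that in-cluster lookup: the load helper and the sample contents below are illustrative, with key names and values taken from the system properties documented later in this section; inside the cluster the real file is /etc/bdcsce/datasources.properties.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Properties;

public class DatasourceLookup {
    // Parse connection settings from a properties source.
    // Inside the cluster, open a FileReader on /etc/bdcsce/datasources.properties.
    static Properties load(Reader source) throws IOException {
        Properties props = new Properties();
        props.load(source);
        return props;
    }

    public static void main(String[] args) throws IOException {
        // Sample contents mirroring the documented property keys (values illustrative).
        String sample =
            "bdcsce.hivethrift.default_connect=jdbc:hive2://host:10002/default;transportMode=http;httpPath=hs2service\n"
          + "bdcsce.hivethrift.default_user=bdcsce_admin\n";
        Properties props = load(new StringReader(sample));
        System.out.println(props.getProperty("bdcsce.hivethrift.default_connect"));
        System.out.println(props.getProperty("bdcsce.hivethrift.default_user"));
    }
}
```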

For external clients (external to the cluster), the URL must be manually constructed and use one of the following formats. Note that the URLs are almost identical and vary only by the value of the hive.server2.thrift.http.path attribute.

Note:

Thrift URLs are listed on the JDBC URLs tab on the Settings page in the Big Data Cloud Console and can be copied from there. See Access the Big Data Cloud Console.

The URLs differ depending on whether a cluster uses Basic authentication or uses IDCS for authentication. For IDCS-enabled clusters, interactions are routed through the load balancing server instead of going directly to the cluster, and that difference is reflected in the URL.

Basic authentication cluster

URL for Spark Thrift Server:

jdbc:hive2://ip_address:1080/default;ssl=true;sslTrustStore=path_to_truststore;trustStorePassword=truststore_password?hive.server2.transport.mode=http;hive.server2.thrift.http.path=cliservice

URL for Hive Thrift Server:

jdbc:hive2://ip_address:1080/default;ssl=true;sslTrustStore=path_to_truststore;trustStorePassword=truststore_password?hive.server2.transport.mode=http;hive.server2.thrift.http.path=hs2service

where:

  • ip_address is the IP address of the desired endpoint

  • path_to_truststore is the absolute path to the Java Trust Store that holds the certificate

  • truststore_password is the password used with the trust store
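Because the two Basic-authentication URLs differ only in the hive.server2.thrift.http.path value, they can be assembled from the pieces above. The buildThriftUrl helper is illustrative, not part of any product API, and the IP address shown is a placeholder.

```java
public class ThriftUrlBuilder {
    // httpPath is "cliservice" for the Spark Thrift Server
    // and "hs2service" for the Hive Thrift Server.
    static String buildThriftUrl(String ipAddress, String trustStorePath,
                                 String trustStorePassword, String httpPath) {
        return "jdbc:hive2://" + ipAddress + ":1080/default"
             + ";ssl=true"
             + ";sslTrustStore=" + trustStorePath
             + ";trustStorePassword=" + trustStorePassword
             + "?hive.server2.transport.mode=http"
             + ";hive.server2.thrift.http.path=" + httpPath;
    }

    public static void main(String[] args) {
        // Placeholder endpoint address; substitute a real cluster IP.
        System.out.println(buildThriftUrl("192.0.2.10", "/tmp/bdcsce.jks", "changeit", "cliservice"));
        System.out.println(buildThriftUrl("192.0.2.10", "/tmp/bdcsce.jks", "changeit", "hs2service"));
    }
}
```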

IDCS-enabled cluster

URL for Spark Thrift Server:

jdbc:hive2://cluster_name-load_balancing_server_URI/default;ssl=true?hive.server2.transport.mode=http;hive.server2.thrift.http.path=cliservice

URL for Hive Thrift Server:

jdbc:hive2://cluster_name-load_balancing_server_URI/default;ssl=true?hive.server2.transport.mode=http;hive.server2.thrift.http.path=hs2service

where:

  • cluster_name is the name of the cluster

  • load_balancing_server_URI is the URI assigned to the cluster by the load balancing service

Access Using the Beeline CLI

The following examples show how to access the Thrift servers using Beeline.

Note:

The URLs shown in the examples are for clusters that use Basic authentication. For IDCS-enabled clusters, substitute the URLs listed above, and use IDCS credentials (user name and password).

Access Spark Thrift Server (example):

beeline -u \
'jdbc:hive2://ip_address:1080/default;ssl=true;sslTrustStore=/tmp/bdcsce.jks;trustStorePassword=truststore_password?hive.server2.transport.mode=http;hive.server2.thrift.http.path=cliservice' \
   -n user_name  \
   -p password

Access Hive Thrift Server (example):

beeline -u \
'jdbc:hive2://ip_address:1080/default;ssl=true;sslTrustStore=/tmp/bdcsce.jks;trustStorePassword=truststore_password?hive.server2.transport.mode=http;hive.server2.thrift.http.path=hs2service' \
   -n user_name  \
   -p password

where:

  • ip_address is the IP address of the desired endpoint

  • truststore_password is the password used with the trust store

  • user_name is the name of the user that was specified when the cluster was created

  • password is the password specified for the cluster when the cluster was created

Access Thrift Programmatically

Thrift can easily be accessed programmatically. The following code snippet illustrates how Thrift can be accessed within the cluster using the available system properties:

import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

String url = System.getProperty("bdcsce.hivethrift.default_connect");
Properties prop = new Properties();
prop.put("user", System.getProperty("bdcsce.hivethrift.default_user"));
prop.put("password", password);

System.out.println("Connecting to url: " + url);
Connection conn = DriverManager.getConnection(url, prop);
System.out.println("connected");

The Hive Thrift Server system properties used in the snippet:

bdcsce.hivethrift.default_connect
bdcsce.hivethrift.default_user

can be replaced with their Spark Thrift Server equivalents to connect to the Spark Thrift Server instead:

bdcsce.sparkthrift.default_connect
bdcsce.sparkthrift.default_user

The password can be an empty string if the client runs within the cluster. If the client runs outside the cluster over HTTPS, the password should be the Big Data Cloud Console password.

CLASSPATH specifies any additional JAR files required by the job. When running an application outside the cluster, CLASSPATH should include all libraries under ${spark_home}/jars, where spark_home is the Spark2 installation directory.

System Properties Related to Thrift

The following system properties can be used by applications or in Zeppelin to facilitate simpler connection to Thrift.

  • bdcsce.hivethrift.default_user
    Example value: bdcsce_admin
    Description: User name that should be used for connecting to the Hive Thrift Server.

  • bdcsce.hivethrift.default_connect
    Example value: jdbc:hive2://host:10002/default;transportMode=http;httpPath=hs2service
    Description: URL to connect to the Hive Thrift Server within the cluster. This can be used by jobs to execute queries against the Hive Thrift Server.

  • bdcsce.sparkthrift.default_user
    Example value: bdcsce_admin
    Description: User name that should be used for connecting to the Spark Thrift Server.

  • bdcsce.sparkthrift.default_connect
    Example value: jdbc:hive2://host:10001/default;transportMode=http;httpPath=cliservice
    Description: URL to connect to the Spark Thrift Server within the cluster. This can be used by jobs to execute queries against the Spark Thrift Server.

  • oscs.default.container
    Example value: https://storage.oraclecorp.com/v1/Storage-tenant/container
    Description: REST URL to connect to the object store. Applications running inside the cluster can query this system property to get the URL.