About Accessing Thrift
Oracle Big Data Cloud deploys two Thrift servers to provide JDBC connectivity to Hive and Spark: Spark Thrift Server and Hive Thrift Server.
JDBC clients can connect to Hive or Spark servers and execute SQL. Spark Thrift Server provides a way to submit Spark jobs via SQL, and Hive Thrift Server provides a way to submit Hadoop jobs via SQL. A common use for this capability is to allow business intelligence (BI) tools to leverage the power of Apache Spark and Apache Hive.
Thrift servers are automatically started when a cluster is provisioned in Big Data Cloud and are made available by default for the Full deployment profile. Thrift servers are not available with the Basic deployment profile.
Create a Keystore and Certificate
Note:
This section about creating a keystore and certificate does not apply to clusters that use Oracle Identity Cloud Service (IDCS) for authentication. Certificates associated with the load balancing service are typically signed by a certificate authority (that is, they are not self-signed), so the following steps generally aren't necessary for IDCS-enabled clusters.

Before you can access a Thrift server, a keystore must be created with the appropriate certificate:
1. Download the certificate locally (on *nix environments):

   echo | \
     openssl s_client -connect ip_address:1080 2>/dev/null | \
     openssl x509 > nginx.crt

   where ip_address is the IP address of the Big Data Cloud Console (cluster console) or any of the master nodes in the cluster.

2. Create a TrustStore:

   /usr/java/default/bin/keytool -import -trustcacerts \
     -keystore /tmp/bdcsce.jks \
     -storepass truststore_password -noprompt \
     -alias bdcsce-certs \
     -file nginx.crt

   where truststore_password is a password of your choosing.

3. (Optional) Verify the certificate was properly added:

   /usr/java/default/bin/keytool \
     -keystore /tmp/bdcsce.jks \
     -storepass truststore_password \
     -list -v
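The optional keytool verification can also be done from Java, which doubles as a check that a JDBC client JVM can read the truststore. This is an illustrative sketch, not part of the product; the /tmp/bdcsce.jks path, the truststore_password value, and the bdcsce-certs alias are the example values from the keytool steps above:

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import java.util.Collections;
import java.util.List;

public class TrustStoreCheck {
    // Load a JKS truststore (such as the one created by keytool above)
    // and return the aliases of the entries it contains.
    static List<String> aliases(String path, char[] password)
            throws IOException, GeneralSecurityException {
        KeyStore ks = KeyStore.getInstance("JKS");
        try (FileInputStream in = new FileInputStream(path)) {
            ks.load(in, password);
        }
        return Collections.list(ks.aliases());
    }

    public static void main(String[] args) throws Exception {
        // Example path and password from the keytool steps above
        File ts = new File("/tmp/bdcsce.jks");
        if (ts.exists()) {
            // Expect to see the alias used in the -import step, e.g. bdcsce-certs
            System.out.println("Aliases: "
                + aliases(ts.getPath(), "truststore_password".toCharArray()));
        } else {
            System.out.println("Truststore not found; run the keytool steps first.");
        }
    }
}
```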
Access Spark or Hive Thrift Servers
Most JDBC clients can access the Spark and Hive Thrift Servers. The examples in this section use the Beeline client to show how to connect. The Spark Thrift Server can be accessed using the Beeline client both inside and outside of the cluster, as well as programmatically.
About the JDBC URL
If inside the cluster: Spark and MapReduce jobs can read the Hive URL as a system property. Applications can access the URL and the user name from the /etc/bdcsce/datasources.properties file inside the cluster.
For external clients (external to the cluster), the URL must be constructed manually and use one of the following formats. Note that the URLs are almost identical and vary only in the value of the hive.server2.thrift.http.path attribute.
Note:
Thrift URLs are listed on the JDBC URLs tab on the Settings page in the Big Data Cloud Console and can be copied from there. See Access the Big Data Cloud Console.

The URLs differ depending on whether a cluster uses Basic authentication or IDCS for authentication. For IDCS-enabled clusters, interactions are routed through the load balancing server instead of going directly to the cluster, and that difference is reflected in the URL.
Basic authentication cluster
URL for Spark Thrift Server:
jdbc:hive2://ip_address:1080/default;ssl=true;sslTrustStore=path_to_truststore;trustStorePassword=truststore_password?hive.server2.transport.mode=http;hive.server2.thrift.http.path=cliservice
URL for Hive Thrift Server:
jdbc:hive2://ip_address:1080/default;ssl=true;sslTrustStore=path_to_truststore;trustStorePassword=truststore_password?hive.server2.transport.mode=http;hive.server2.thrift.http.path=hs2service
where:

- ip_address is the IP address of the desired endpoint
- path_to_truststore is the absolute path to the Java Trust Store that holds the certificate
- truststore_password is the password used with the trust store
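When scripting connections, the Basic-authentication URL can be assembled from these parts. The helper below is an illustrative sketch, not a product API; the placeholder values in the usage example stand in for your cluster's endpoint and truststore details:

```java
public class ThriftUrl {
    // Build the Basic-authentication JDBC URL in the format shown above.
    // httpPath is "cliservice" for the Spark Thrift Server and
    // "hs2service" for the Hive Thrift Server.
    static String thriftUrl(String ipAddress, String trustStorePath,
                            String trustStorePassword, String httpPath) {
        return "jdbc:hive2://" + ipAddress + ":1080/default"
            + ";ssl=true"
            + ";sslTrustStore=" + trustStorePath
            + ";trustStorePassword=" + trustStorePassword
            + "?hive.server2.transport.mode=http"
            + ";hive.server2.thrift.http.path=" + httpPath;
    }

    public static void main(String[] args) {
        // Placeholder arguments; substitute real values for your cluster
        System.out.println(thriftUrl("ip_address", "/tmp/bdcsce.jks",
                                     "truststore_password", "cliservice"));
    }
}
```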
IDCS-enabled cluster
URL for Spark Thrift Server:
jdbc:hive2://cluster_name-load_balancing_server_URI/default;ssl=true?hive.server2.transport.mode=http;hive.server2.thrift.http.path=cliservice
URL for Hive Thrift Server:
jdbc:hive2://cluster_name-load_balancing_server_URI/default;ssl=true?hive.server2.transport.mode=http;hive.server2.thrift.http.path=hs2service
where:

- cluster_name is the name of the cluster
- load_balancing_server_URI is the URI assigned to the cluster by the load balancing service
Access Using the Beeline CLI
The following examples show how to access the Thrift servers using Beeline.
Note:
The URLs shown in the examples are for clusters that use Basic authentication. For IDCS-enabled clusters, substitute the URLs listed above and use IDCS credentials (user name and password).

Access Spark Thrift Server (example):
beeline -u \
  'jdbc:hive2://ip_address:1080/default;ssl=true;sslTrustStore=/tmp/bdcsce.jks;trustStorePassword=truststore_password?hive.server2.transport.mode=http;hive.server2.thrift.http.path=cliservice' \
  -n user_name \
  -p password
Access Hive Thrift Server (example):
beeline -u \
  'jdbc:hive2://ip_address:1080/default;ssl=true;sslTrustStore=/tmp/bdcsce.jks;trustStorePassword=truststore_password?hive.server2.transport.mode=http;hive.server2.thrift.http.path=hs2service' \
  -n user_name \
  -p password
where:

- ip_address is the IP address of the desired endpoint
- truststore_password is the password used with the trust store
- user_name is the name of the user that was specified when the cluster was created
- password is the password specified for the cluster when the cluster was created
Access Thrift Programmatically
Thrift can easily be accessed programmatically. The following code snippet illustrates how Thrift can be accessed within the cluster using the available system properties:
import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

String url = System.getProperty("bdcsce.hivethrift.default_connect");
Properties prop = new Properties();
prop.put("user", System.getProperty("bdcsce.hivethrift.default_user"));
prop.put("password", password);
System.out.println("Connecting to url: " + url);
Connection conn = DriverManager.getConnection(url, prop);
System.out.println("connected");
Note that the Hive Thrift Server system properties used in the snippet, bdcsce.hivethrift.default_connect and bdcsce.hivethrift.default_user, can be replaced with the Spark Thrift Server equivalents, bdcsce.sparkthrift.default_connect and bdcsce.sparkthrift.default_user, to connect to the Spark Thrift Server instead of the Hive Thrift Server.

password can be an empty string if the client is run within the cluster. If the client is run outside of the cluster over HTTPS, password should be the Big Data Cloud Console password.

CLASSPATH is used to specify any additional jars required by the job. When running an application outside the cluster, CLASSPATH should include all libraries under ${spark_home}/jars, where spark_home points to the Spark2 install directory.
System Properties Related to Thrift
The following table summarizes system properties that can be used by applications or in Zeppelin to facilitate simpler connection to Thrift.
| Property | Example Value | Description |
|---|---|---|
| bdcsce.hivethrift.default_user | bdcsce_admin | User name that should be used for connecting to the Hive Thrift Server. |
| bdcsce.hivethrift.default_connect | jdbc:hive2://host:10002/default;transportMode=http;httpPath=hs2service | URL to connect to the Hive Thrift Server within the cluster. This can be used by jobs to execute queries against the Hive Thrift Server. |
| bdcsce.sparkthrift.default_user | bdcsce_admin | User name that should be used for connecting to the Spark Thrift Server. |
| bdcsce.sparkthrift.default_connect | jdbc:hive2://host:10001/default;transportMode=http;httpPath=cliservice | URL to connect to the Spark Thrift Server within the cluster. This can be used by jobs to execute queries against the Spark Thrift Server. |
| oscs.default.container | https://storage.oraclecorp.com/v1/Storage-tenant/container | REST URL to connect to the object store. Applications running inside the cluster can query this system property to get the URL. |