Note:
- This tutorial requires access to Oracle Cloud. To sign up for a free account, see Get started with Oracle Cloud Infrastructure Free Tier.
- It uses example values for Oracle Cloud Infrastructure credentials, tenancy, and compartments. When completing your lab, substitute these values with ones specific to your cloud environment.
Enable Trino Editor in Oracle Big Data Service Hue for High Availability Clusters Without Kerberos
Introduction
Oracle Big Data Service (BDS) is a cloud-based service that enables users to create and manage Hadoop clusters, Spark clusters, Trino, and other big data services. Trino is a high-performance, distributed SQL query engine designed for running interactive analytic queries on large datasets. It can query data across multiple sources, including Hadoop Distributed File System (HDFS), OCI buckets, and traditional relational databases, all through a unified SQL interface. Its ability to handle large-scale data with low latency makes it a powerful tool for data engineers and analysts alike.
Integrating Trino with BDS Hue can greatly enhance your data querying capabilities by providing a seamless interface for querying data. In this tutorial, we will walk you through the steps to enable the Trino editor in a high availability (HA) cluster environment using Hue, assuming no Kerberos authentication is in place. By following these tasks, you will be able to configure your Hue environment to connect with Trino and leverage its powerful querying features effectively.
Objectives
- Learn how to configure the Trino editor in Hue for an HA cluster environment.
- Understand the setup required for enabling seamless connectivity between Hue and Trino.
- Verify and troubleshoot the configuration to ensure successful querying.
Prerequisites
- An Oracle Big Data Service cluster running on Oracle Cloud Infrastructure (OCI) with Trino and Hue enabled.
- Access to the Hue server and necessary permissions to modify configurations.
- The Trino Java Database Connectivity (JDBC) driver Java archive (JAR) file downloaded and accessible.
Note: This tutorial assumes you are working with a non-Kerberos HA cluster. If you are using a Kerberized environment, additional configuration steps related to Kerberos authentication will be required.
Task 1: Download and Install the Trino JDBC Driver
- Download the Trino JDBC driver JAR file from Maven and save it on the UNO node (where Hue is running) in the BDS environment. In this tutorial, the file is placed in the /tmp directory.
- Configure Hue for Trino integration.
  - Log in to Apache Ambari and navigate to Hue, Configs, and Advanced.
  - Click Advanced pseudo-distributed.ini and look for interpreters.
  - Add a Trino interpreter entry within the interpreters section. Ensure the JDBC URL matches your Trino coordinator’s Fully Qualified Domain Name (FQDN) and that the driver class name is correct.
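The interpreter entry itself is not reproduced in this text. As a rough sketch only, the Python helper below prints an entry in the shape Hue's generic JDBC interpreter usually takes. The section name `trino`, the display name, the host `trino-coordinator.example.com`, and port `8080` are all placeholders and assumptions, not values from this tutorial. `io.trino.jdbc.TrinoDriver` is the driver class shipped in the Trino JDBC JAR. Credentials are left out so that Hue prompts for them, which matches the verification step later in this tutorial. Check the result against your own cluster before pasting it into Ambari.

```python
import json


def hue_trino_interpreter(coordinator_fqdn, port):
    """Return text for a Hue JDBC interpreter entry (illustrative sketch).

    coordinator_fqdn and port are placeholders for your Trino coordinator.
    """
    options = {
        # JDBC URL pointing at the Trino coordinator (assumed host and port).
        "url": "jdbc:trino://{0}:{1}".format(coordinator_fqdn, port),
        # Driver class provided by the Trino JDBC JAR.
        "driver": "io.trino.jdbc.TrinoDriver",
    }
    lines = [
        "[[[trino]]]",
        "name = Trino",
        "interface = jdbc",
        "options = '{0}'".format(json.dumps(options)),
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical coordinator address; substitute your own FQDN and port.
    print(hue_trino_interpreter("trino-coordinator.example.com", 8080))
```

Generating the entry from a small function keeps the JSON inside `options` well-formed. A malformed `options` value is an easy mistake to make when editing the file by hand.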
Task 2: Update Python Configuration for Hue
- Locate the Python gateway configuration. On the Hue server (UNO), navigate to the /usr/odh/2.0.7/hue/build/env/lib/python2.7/site-packages/py4j-0.9-py2.7.egg/py4j/java_gateway.py file.
  Note: Before editing, create a backup of this file.
- Edit the Python file.
  - Open java_gateway.py and find where the classpath is defined.
  - Add lines that include the JDBC driver path in the classpath. This ensures that Hue can locate and use the Trino JDBC driver.
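The exact lines to add are not reproduced in this text, and they depend on how your copy of java_gateway.py builds its classpath. In py4j the gateway launcher typically assembles a classpath string and passes it to `java -classpath`. The sketch below shows one way to append the driver JAR to such a string. The helper name `with_jdbc_driver` and the file name `/tmp/trino-jdbc.jar` are hypothetical; use the actual JAR file name you saved in Task 1.

```python
import os

# Hypothetical JAR location; replace with the file you saved in Task 1.
TRINO_JDBC_JAR = "/tmp/trino-jdbc.jar"


def with_jdbc_driver(classpath, jar_path=TRINO_JDBC_JAR):
    """Return classpath with jar_path appended, unless it is already listed."""
    # Split on the platform separator (":" on Linux) and drop empty entries.
    entries = [entry for entry in classpath.split(os.pathsep) if entry]
    if jar_path not in entries:
        entries.append(jar_path)
    return os.pathsep.join(entries)
```

Inside java_gateway.py, the equivalent change would be a single reassignment such as `classpath = with_jdbc_driver(classpath)` placed just before the Java command is assembled, assuming the variable there is named `classpath`. Appending rather than overwriting keeps the py4j JAR itself on the classpath.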
Task 3: Restart the Hue Service
- Return to Apache Ambari and restart the Hue service to apply the new configurations.
- Verify editor enablement in Hue.
  - After restarting, open the Hue server interface and check that the Trino editor appears in the menu.
  - When prompted for credentials, enter Username as trino and Password as trino.
Task 4: Query Data Using the Trino Editor
- Access the Trino editor and run SQL queries.
  - Navigate to the Trino editor and choose the database you want to run the query against.
  - You can now run SQL queries against your Trino instance from within Hue.
Troubleshooting and Tips
- Driver Issues: Ensure the JAR file is in the expected directory, is readable by the user that runs Hue, and that the file path in the Python configuration is accurate.
- Connection Errors: Verify the JDBC URL and ensure the Trino coordinator is reachable from your Hue server.
- Configuration Verification: Double-check all configuration changes in Apache Ambari and confirm that the Hue service is properly restarted.
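The first two checks above can be partly automated from the Hue server. The following is a minimal sketch using only the Python standard library. The function names are illustrative, and it assumes the JDBC URL includes an explicit port. It confirms that the JAR is a readable file and that a TCP connection to the coordinator can be opened. It does not validate credentials or the driver class.

```python
import os
import socket


def parse_jdbc_host_port(jdbc_url):
    """Extract (host, port) from a jdbc:trino://host:port[/...] URL."""
    prefix = "jdbc:trino://"
    if not jdbc_url.startswith(prefix):
        raise ValueError("not a Trino JDBC URL: {0}".format(jdbc_url))
    # Keep only the host:port part, dropping any path or query string.
    authority = jdbc_url[len(prefix):].split("/", 1)[0].split("?", 1)[0]
    host, _, port = authority.partition(":")
    if not host or not port:
        raise ValueError("expected host:port in: {0}".format(jdbc_url))
    return host, int(port)


def jar_is_readable(jar_path):
    """True if jar_path is an existing file the current user can read."""
    return os.path.isfile(jar_path) and os.access(jar_path, os.R_OK)


def coordinator_is_reachable(jdbc_url, timeout=5.0):
    """True if a TCP connection to the coordinator host and port succeeds."""
    host, port = parse_jdbc_host_port(jdbc_url)
    try:
        sock = socket.create_connection((host, port), timeout)
    except socket.error:
        return False
    sock.close()
    return True
```

Run the checks as the same operating system user that runs Hue, for example `jar_is_readable("/tmp/<your-jar-file>")`, so that the permission result reflects what Hue actually sees.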
Next Steps
By following these tasks, you should have successfully integrated the Trino editor into your BDS Hue environment. This integration enhances your data querying capabilities, allowing you to leverage Trino’s advanced querying features directly from Hue. If you encounter any issues, review the troubleshooting tips or seek further assistance from documentation or community forums.
Acknowledgements
- Authors - Pavan Upadhyay (Principal Cloud Engineer), Saket Bihari (Principal Cloud Engineer)
More Learning Resources
Explore other labs on docs.oracle.com/learn or access more free learning content on the Oracle Learning YouTube channel. Additionally, visit education.oracle.com/learning-explorer to become an Oracle Learning Explorer.
For product documentation, visit Oracle Help Center.
G13918-01
September 2024