Connect Alteryx to Oracle Big Data Service Hive Using ODBC and Kerberos

Introduction

Oracle Big Data Service is a cloud-based service provided by Oracle that enables users to create and manage Hadoop clusters, Spark clusters, and other big data services. By default, an Oracle Big Data Service cluster uses simple authentication mechanisms, such as username and password, to authenticate users and services. However, this may not be sufficient for organizations that require higher levels of security.

Kerberos authentication is a widely used mechanism for securing distributed computing systems. It provides a secure way to authenticate users and services, allowing them to communicate securely over a network. In a Kerberos-enabled environment, users and services are issued cryptographic keys that are used to verify their identities and authorize access to resources.

Kerberos authentication can provide an additional layer of security for an Oracle Big Data Service cluster by enabling strong authentication and encryption between nodes.

Objectives

Explore the process of establishing seamless connectivity between Alteryx and Oracle Big Data Service Hive within a kerberized cluster. By implementing Kerberos authentication, we can ensure a secure environment for data processing and analysis.

Prerequisites

An Oracle Big Data Service cluster running on Oracle Cloud Infrastructure (OCI) with required services configured and ports opened.

Note: A highly available (HA) Oracle Big Data Service cluster comes with Kerberos installed. If you want to use a non-HA cluster, you need the following prerequisites.

Note: In this tutorial, we are using an Oracle Big Data Service HA cluster. You should also have a basic understanding of the following concepts and terminology:

Task 1: Kerberize an Oracle Big Data Service cluster

  1. Log in to the Oracle Big Data Service cluster master node either through an SSH command or by using PuTTY with the .ppk file and the opc user credentials. Once logged in, elevate your permissions to the root user. In this tutorial, we used PuTTY to log in to the nodes.

  2. Stop all the running Hadoop and Spark services using Ambari.

    Ambari Stop Services

  3. Create the bdsuser Administrator Kerberos Principal by following these steps.

    • Enter the kadmin prompt using the kadmin.local command.

    • At the command line prompt of kadmin.local, execute the addprinc command to create a new Kerberos principal called bdsuser. When prompted, choose a password of your preference, and confirm it.

      Execute addprinc command

    • Enter exit to return to the original prompt. By default, the realm used here is selected from krb5.conf. However, you have the option to customize the realm by updating the configuration.
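    The principal creation in this step can also be scripted non-interactively. A minimal sketch, assuming you are root on the master node (the password placeholder is yours to fill in):

    ```shell
    # Create the bdsuser principal without the interactive
    # password prompt shown in the step above.
    kadmin.local -q "addprinc -pw <password> bdsuser"

    # Confirm that the principal now exists in the KDC database.
    kadmin.local -q "listprincs" | grep bdsuser
    ```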

  4. Create a group bdsusergroup by executing the command dcli -C "groupadd bdsusergroup".

  5. Enter the command dcli -C "useradd -g bdsusergroup -G hdfs,hadoop,hive bdsuser" at the prompt to create the bdsuser administrator user and add it to the listed groups on each node in the cluster.

  6. Use the Linux id command to confirm the creation of bdsuser and to list its group memberships.
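    Steps 4 to 6 can be run as one short sequence. This is a sketch that assumes the dcli utility is available on the cluster, as it is on Oracle Big Data Service:

    ```shell
    # Create the group and the user on every node via dcli, adding
    # bdsuser to the hdfs, hadoop, and hive groups.
    dcli -C "groupadd bdsusergroup"
    dcli -C "useradd -g bdsusergroup -G hdfs,hadoop,hive bdsuser"

    # Confirm the account and its group memberships.
    id bdsuser
    ```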

  7. Start all the Oracle Big Data Service services using Ambari.

    Ambari Start Services

  8. Test bdsuser by generating a Kerberos ticket with the kinit bdsuser command and listing files with the hadoop fs -ls / command.

    Command Prompt
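    The validation in step 8 can be sketched as a short sequence; the klist call is an optional extra check not shown in the tutorial:

    ```shell
    # Obtain a Kerberos ticket for bdsuser (prompts for the password
    # chosen when the principal was created).
    kinit bdsuser

    # Optionally inspect the ticket cache, then run an authenticated
    # HDFS command to confirm end-to-end access.
    klist
    hadoop fs -ls /
    ```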

Task 2: Connect Alteryx to Oracle Big Data Service Hive

Now that we have explored the kerberization of the Oracle Big Data Service cluster, we will dive into connecting Alteryx to the Kerberized Oracle Big Data Service Hive using the ODBC driver.

To unleash the full potential of your Oracle Big Data Service Hive data, it is crucial to establish a connection with your preferred tools. In this section, we will guide you through the process of connecting Alteryx to Oracle Big Data Service Hive. To achieve this, we will configure an ODBC connection that utilizes Kerberos authentication.

  1. Configure MIT Kerberos by following these steps.

    • Copy the krb5.conf and krb5.keytab files from the cluster master node to C:\Program Files\MIT\Kerberos.

    • Change the file extension from .conf to .ini for Windows compatibility.

    • Create a new folder C:\temp (the folder and path can be of your choice).

    • Set up the following environment variable.

      Environment Variable

    • Run the kinit bdsuser command from the command prompt to verify the Kerberos setup and generate a ticket.
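    The Windows-side setup above can be sketched from an elevated command prompt. KRB5_CONFIG and KRB5CCNAME are standard MIT Kerberos variable names, but treat them as assumptions and verify them against the environment variable screenshot for your installation:

    ```shell
    :: Point MIT Kerberos for Windows at the copied configuration
    :: and at a file-based ticket cache under C:\temp.
    setx KRB5_CONFIG "C:\Program Files\MIT\Kerberos\krb5.ini"
    setx KRB5CCNAME "C:\temp\krb5cache"

    :: Open a new command prompt so the variables take effect,
    :: then request a ticket for bdsuser to verify the setup.
    kinit bdsuser
    ```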

  2. Install the ODBC driver on your Windows machine. Follow the prompts to install the driver.

  3. Configure the ODBC driver by following these steps.

    • Add hostname and IP entries to the hosts file C:\Windows\System32\drivers\etc\hosts for the master and utility nodes.

      Sample hosts File

    • Test the hosts file entries using the ping command.

      Ping Test
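    The hosts file entries and the ping check might look like the following; the IP addresses and hostnames here are hypothetical placeholders for your cluster's master and utility nodes:

    ```shell
    :: Hypothetical lines appended to C:\Windows\System32\drivers\etc\hosts:
    ::   10.0.0.2   mycluster-mn0.sub.example.oraclevcn.com
    ::   10.0.0.3   mycluster-un0.sub.example.oraclevcn.com

    :: Verify that each entry resolves and the node responds.
    ping mycluster-mn0.sub.example.oraclevcn.com
    ping mycluster-un0.sub.example.oraclevcn.com
    ```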

  4. Open the ODBC Data Source Administrator tool on your machine and follow these steps.

    • Choose the System DSN tab. Click Add, choose Cloudera ODBC Driver for Apache Hive, and then click Finish.

      System DSN Tab

    • Follow the screenshot below for the Hive connection configuration.

      Hive connection configuration

    • Click Test to check connectivity. If everything is set up correctly, you should see a message indicating that the test succeeded. Click OK to save the settings.

      Test Hive Connection

  5. Open Alteryx and follow these steps.

    • Go to File, select Manage Connection and click Add datasource. Choose Hive from the available technology options.

      Manage Connection

    • Once you have selected the technology option, you will be taken to the next screen, where you need to provide the DSN name. Select the ODBC DSN that you created in the previous step and click Save.

      Save Connection

    • We are done with the connection setup. Now let’s build a sample workflow to read data from Hive. To create a sample workflow, click New Workflow, then drag and drop the Input Data tool.

      New Workflow

    • Select Hive ODBC from the connection list.

      Connection List

    • Select the Hive table to fetch data.

      Hive Tables

    • Run the workflow, and you can see that Alteryx successfully retrieves data from Oracle Big Data Service Hive.

      Run Workflow

Next Steps

Kerberos authentication is an essential component of a secure Hadoop cluster. By kerberizing your Oracle Big Data Service cluster, you can better protect your sensitive data and applications from unauthorized access. While kerberizing a cluster can be challenging, careful planning and attention to detail help ensure a successful implementation. Similarly, connecting Alteryx to Oracle Big Data Service Hive using ODBC and Kerberos authentication can be tricky, but once it is set up, you can use Alteryx to analyze and visualize your Oracle Big Data Service data. By following the steps outlined in this tutorial, you should be able to set up the connection easily.

If you do run into issues, verify your Kerberos ticket with klist, check the hosts file entries, and retest the ODBC DSN. With a bit of persistence, you will be able to connect Alteryx to Oracle Big Data Service Hive and unlock the full potential of your data.

Acknowledgements

More Learning Resources

Explore other labs on docs.oracle.com/learn or access more free learning content on the Oracle Learning YouTube channel. Additionally, visit education.oracle.com/learning-explorer to become an Oracle Learning Explorer.

For product documentation, visit Oracle Help Center.