Configuring Apache Livy with Spark and Hive

Apache Livy provides a REST interface to a Spark and Hive cluster. Through this interface or an RPC client library, you can submit Spark jobs or snippets of Spark code, retrieve results synchronously or asynchronously, and manage Spark contexts. To enable Apache Livy, perform the following configurations:

·       Configuring Apache Livy

·       Configuring a Cluster

Configuring Apache Livy

To configure Apache Livy, perform the following steps:

1.      Download Apache Livy version 0.4 or 0.5 from the Apache website.

File Name: apache-livy-0.5.0-incubating-bin.zip

 

NOTE

The file name for version 0.4 is apache-livy-0.4.0-incubating-bin.zip.

 

2.     Extract the file on the edge node or name node of the CDH cluster, where you have access to the HADOOP_HOME and SPARK_HOME directories.

The extracted folder is referred to as LIVY_HOME.

3.     Configure the $LIVY_HOME/conf/livy.conf file for the following properties:

livy.server.port = 8998

livy.spark.master = yarn

livy.repl.enable-hive-context = true

livy.server.launch.kerberos.keytab = /scratch/ofsaa/ofsaa.keytab

livy.server.launch.kerberos.principal = ofsaa@OFS682.ORACLE.COM

livy.repl.enableHiveContext = true

4.    Configure the following environment variables for Apache Livy in the $LIVY_HOME/conf/livy-env.sh file:

### directory of JDK used for CDH

export JAVA_HOME=/scratch/software/jdk1.8.0_101

### spark 2.x home directory from CDH

export SPARK_HOME=/scratch/cloudera/parcels/SPARK2/lib/spark2

export SPARK_CONF_DIR=/etc/spark2/conf

export HADOOP_CONF_DIR=/etc/hadoop/conf:/etc/hive/conf

5.     After configuring Apache Livy, restart the server using the following commands:

$LIVY_HOME/bin/livy-server stop

$LIVY_HOME/bin/livy-server start

To verify that the server started, check the $LIVY_HOME/logs/livy-<user>-server.out file.
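Before restarting, it can help to confirm that livy.conf contains the properties from step 3. The following Python sketch is illustrative only: the function names and the EXPECTED_KEYS list are assumptions made for this example, and the parser handles only simple "key = value" lines with "#" comments.

```python
from pathlib import Path

# Properties listed in step 3 of this section (illustrative list).
EXPECTED_KEYS = [
    "livy.server.port",
    "livy.spark.master",
    "livy.repl.enable-hive-context",
    "livy.server.launch.kerberos.keytab",
    "livy.server.launch.kerberos.principal",
]


def parse_livy_conf(text):
    """Parse 'key = value' lines, skipping blank lines and '#' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines that have no '=' separator
            props[key.strip()] = value.strip()
    return props


def missing_keys(props, expected=EXPECTED_KEYS):
    """Return the expected property names that are absent from props."""
    return [key for key in expected if key not in props]


def check_livy_conf(livy_home):
    """Read <livy_home>/conf/livy.conf and list the missing properties."""
    conf_path = Path(livy_home) / "conf" / "livy.conf"
    return missing_keys(parse_livy_conf(conf_path.read_text()))
```

Calling check_livy_conf with the path of the extracted LIVY_HOME folder returns an empty list when every expected property is present.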

Configuring Apache Livy to use HTTPS or SSL-TLS 1.2

If you want to use Sparkmagic to communicate with Apache Livy through HTTPS or SSL-TLS 1.2, you must perform the following actions to configure Apache Livy as a secure endpoint:

·       Generate a keystore file, certificate, and truststore file for the Apache Livy server or use a third-party SSL certificate.

·       Update Apache Livy with the keystore details.

·       Restart the Apache Livy server.

Perform the following steps to create a self-signed certificate and configure Apache Livy to use HTTPS or SSL-TLS 1.2:

1.      Generate a keystore file for the Apache Livy server using the following command:

keytool -genkey -alias <host> -keyalg RSA -keysize 1024 -dname CN=<host>,OU=ofsaa,O=ofsaa,L=redwood,ST=ca,C=us -keypass <keyPassword> -keystore <keystore_file> -storepass <storePassword>

2.     Create a certificate using the following command:

keytool -export -alias <host> -keystore <keystore_file> -rfc -file <cert_file> -storepass <storePassword>

3.     Create a truststore file using the following command:

keytool -import -noprompt -alias <host> -file <cert_file> -keystore <truststore_file> -storepass <truststorePassword>

4.    Update the livy.conf file with the keystore details. For example:

livy.keystore = /home/ofsaa/livy-0.5.0-incubating-bin/keystore.jks

livy.keystore.password = storepass123

livy.key-password = keypass123

5.     After configuring the Apache Livy server, restart it using the following commands:

$LIVY_HOME/bin/livy-server stop

$LIVY_HOME/bin/livy-server start

To verify that the server started, check the $LIVY_HOME/logs/livy-<user>-server.out file.
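After the restart, a client can confirm that the HTTPS endpoint responds. The following Python sketch is one possible check, not part of the product. It assumes that the certificate exported with -rfc (<cert_file>) is available to the client as a PEM file, that the host name in the URL matches the CN used in the keytool command, and that the server listens on port 8998 without additional REST authentication. The function names are hypothetical.

```python
import json
import ssl
import urllib.request


def build_tls_context(cert_file=None):
    """Create a TLS context; with cert_file set, only that certificate is trusted."""
    context = ssl.create_default_context(cafile=cert_file)
    # Refuse protocol versions older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


def sessions_url(host, port=8998):
    """Build the URL of the Livy session list (GET /sessions)."""
    return f"https://{host}:{port}/sessions"


def list_sessions(host, cert_file, port=8998, timeout=10):
    """Call GET /sessions over HTTPS and return the decoded JSON response."""
    context = build_tls_context(cert_file)
    url = sessions_url(host, port)
    with urllib.request.urlopen(url, context=context, timeout=timeout) as response:
        return json.load(response)
```

A call such as list_sessions("<host>", "<cert_file>") raises an SSL error if the certificate is not trusted or the host name does not match, which makes it a quick way to catch keystore mistakes.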

Configuring a Cluster

 

NOTE

·       This section applies only to Stage and Results on Hive installations.

·       Ensure that you have the proper role to access this screen.

 

To configure a cluster, configure DMT with the Apache Livy interface details to add a new cluster, and then add the appropriate roles to the user:

1.      Navigate to Data Management Framework, select Data Management Tools, select DMT Configuration, select Register Cluster, and then select Edit Cluster.

2.     Specify the following details in the Cluster Configurations window:

§       Name

§       Description

§       Livy Details

3.     In the Livy Service URL field, enter the Apache Livy Server URL (HTTP or HTTPS) of your environment.

4.    Click Save to save the Cluster Configurations. The service URL enables easy interaction with a Spark and Hive cluster over a REST interface.
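To illustrate what the Livy Service URL is used for, the following Python sketch builds the two REST requests that start an interactive session and submit a code snippet to it. It is an illustration under stated assumptions, not the way the application itself calls Apache Livy: the function names are hypothetical, the URL is a placeholder, and the server is assumed to accept unauthenticated requests.

```python
import json
import urllib.request

JSON_HEADERS = {"Content-Type": "application/json"}


def session_request(livy_url, kind="spark"):
    """Build the POST /sessions request that starts an interactive session."""
    body = json.dumps({"kind": kind}).encode("utf-8")
    return urllib.request.Request(
        f"{livy_url.rstrip('/')}/sessions",
        data=body,
        headers=JSON_HEADERS,
        method="POST",
    )


def statement_request(livy_url, session_id, code):
    """Build the POST /sessions/<id>/statements request that runs a snippet."""
    body = json.dumps({"code": code}).encode("utf-8")
    return urllib.request.Request(
        f"{livy_url.rstrip('/')}/sessions/{session_id}/statements",
        data=body,
        headers=JSON_HEADERS,
        method="POST",
    )


def send(request, timeout=30):
    """Send a prepared request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For example, send(session_request("<Livy Service URL>")) returns a JSON object whose id field identifies the new session. After the session reaches the idle state, that id can be passed to statement_request together with a Spark snippet.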