Installing Jupyter
Install Jupyter on the same node where you set up PySpark integration.
-
Install Jupyter.
$ sudo python3 -m pip install jupyter
-
Upgrade the Pygments package.
$ pip3 install --upgrade Pygments
-
Check Jupyter install location.
$ which jupyter
/usr/local/bin/jupyter
-
Check the available kernels.
$ /usr/local/bin/jupyter kernelspec list
Available kernels:
  python3    /usr/local/share/jupyter/kernels/python3
-
Check Jupyter package versions.
$ /usr/local/bin/jupyter --version
Selected Jupyter core packages...
IPython          : 7.16.2
ipykernel        : 5.5.6
ipywidgets       : 7.6.5
jupyter_client   : 7.1.0
jupyter_core     : 4.9.1
jupyter_server   : not installed
jupyterlab       : not installed
nbclient         : 0.5.9
nbconvert        : 6.0.7
nbformat         : 5.1.3
notebook         : 6.4.6
qtconsole        : 5.2.2
traitlets        : 4.3.3
-
Request a Kerberos ticket.
kinit -kt <spark-user-keytab-file> <principal>
Keytab file location: /etc/security/keytabs/**.keytab
Example:
$ kinit -kt /etc/security/keytabs/spark.headless.keytab spark-trainingcl@BDACLOUDSERVICE.ORACLE.COM
A Kerberos ticket is required only on highly available clusters. Request the ticket as a user that has the appropriate Ranger permissions on HDFS, YARN, and so on. The ticket is valid for 24 hours only.
For clusters that are not highly available, Ranger permissions and a Kerberos ticket are not required.
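To confirm that the ticket was granted before continuing, you can list the Kerberos credential cache (an optional check, not required by this procedure):
$ klist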
-
Launch Jupyter from the utility node.
<jupyter-location> notebook --ip=0.0.0.0 --allow-root
Example:
/usr/local/bin/jupyter notebook --ip=0.0.0.0 --allow-root
Example output:
[xxxx NotebookApp] To access the notebook, open this file in a browser:
    file:////xxxx
Or copy and paste one of these URLs:
    xxxx
 or http://<some link>
 or http://127.0.0.1:8888/?token=<your-token>
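If you want the notebook server to keep running after you close your SSH session, one option (a convenience, not part of this procedure) is to start it in the background with nohup:
$ nohup /usr/local/bin/jupyter notebook --ip=0.0.0.0 --allow-root > jupyter.log 2>&1 &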
-
From the output, copy the URL for the notebook, and replace 127.0.0.1 with the public IP address of the utility node:
http://<utility-node-public-ip-address>:8888/?token=<your-token>
-
Run the following commands in your notebook.
import findspark
findspark.init()
import pyspark
from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .enableHiveSupport() \
    .appName("ODH-ML-WorkBench") \
    .getOrCreate()
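If findspark cannot locate the Spark installation on its own, findspark.init() also accepts the Spark home as an argument. The path below is illustrative only; substitute your cluster's actual Spark installation path:
import findspark
findspark.init("/usr/lib/spark")  # hypothetical path; adjust to your cluster's Spark home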
-
Test by getting the Spark version:
spark.version
Example output:
'3.0.2'
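Beyond checking the version, you can confirm that the session actually schedules work by running a small job. This sketch uses only standard PySpark DataFrame APIs; the sample data is illustrative:
# Create a two-row DataFrame and run simple actions through Spark.
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
df.count()  # expected result: 2
df.show()   # prints the rows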