Installing Jupyter

Install Jupyter on the same node that you set up for PySpark integration.

  1. Install Jupyter.
    $ sudo python3 -m pip install jupyter
  2. Upgrade the Pygments package.
    $ pip3 install --upgrade Pygments
  3. Check the Jupyter install location.
    $ which jupyter
    /usr/local/bin/jupyter
  4. Check the available kernels.
    $ /usr/local/bin/jupyter kernelspec list
    Available kernels:
      python3    /usr/local/share/jupyter/kernels/python3
  5. Check the Jupyter package versions.
    $ /usr/local/bin/jupyter --version
    Selected Jupyter core packages...
    IPython          : 7.16.2
    ipykernel        : 5.5.6
    ipywidgets       : 7.6.5
    jupyter_client   : 7.1.0
    jupyter_core     : 4.9.1
    jupyter_server   : not installed
    jupyterlab       : not installed
    nbclient         : 0.5.9
    nbconvert        : 6.0.7
    nbformat         : 5.1.3
    notebook         : 6.4.6
    qtconsole        : 5.2.2
    traitlets        : 4.3.3
  6. Request a Kerberos ticket.
    $ kinit -kt <spark-user-keytabfile> <principal>
    Keytab file location: /etc/security/keytabs/**.keytab
    Example:
    $ kinit -kt /etc/security/keytabs/spark.headless.keytab spark-trainingcl@BDACLOUDSERVICE.ORACLE.COM

    A Kerberos ticket is required only on highly available (HA) clusters. Request the ticket as a user that has the appropriate Ranger permissions on HDFS, YARN, and so on. The ticket is valid for 24 hours only; you can verify it with the klist command.

    For non-HA clusters, Ranger permissions and a Kerberos ticket are not required.

  7. Launch Jupyter from the utility node.
    $ <jupyter-location> notebook --ip=0.0.0.0 --allow-root

    Example:

    $ /usr/local/bin/jupyter notebook --ip=0.0.0.0 --allow-root

    Example output:

    [xxxx NotebookApp] To access the notebook, open this file in a browser:
        file:///xxxx
    Or copy and paste one of these URLs:
        xxxx
     or http://<some link>
     or http://127.0.0.1:8888/?token=<your-token>
  8. From the output, copy the notebook URL and replace 127.0.0.1 with the public IP address of the utility node.
    http://<utility-node-public-ip-address>:8888/?token=<your-token>
  9. Run the following commands in your notebook.
    # findspark locates the Spark installation and adds PySpark to sys.path
    import findspark
    findspark.init()

    import pyspark
    from pyspark.sql import SparkSession

    # Create a SparkSession with Hive support so Hive tables are accessible
    spark = SparkSession \
        .builder \
        .enableHiveSupport() \
        .appName("ODH-ML-WorkBench") \
        .getOrCreate()
  10. Test by getting the Spark version (a further smoke test is shown after the example output):
    spark.version

    Example output:

    '3.0.2'
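
    As a further check beyond the version string, you can run a small job and a Hive query from the same notebook. The following is a minimal sketch that reuses the spark session created in step 9; the sample DataFrame is illustrative, and no tables from your cluster are assumed.

    # Run a small job to confirm the session can schedule work (sample data only)
    df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "letter"])
    df.count()    # expected: 2
    df.show()

    # Because the session was created with enableHiveSupport(), this lists
    # the databases visible to your user through the Hive metastore
    spark.sql("SHOW DATABASES").show()

    If the SHOW DATABASES query fails with a permissions error on an HA cluster, re-check the Kerberos ticket and Ranger permissions from step 6.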