Configure Data Science Service Notebooks to Use PySpark

The Python3 conda environment is preinstalled in the notebook session. It is a Python 3 based environment with only a minimal set of Python libraries installed.

To install a PySpark conda environment:

  1. Open the Data Science Service notebook session and enter your credentials. The notebook is displayed.
  2. In the notebook, select File, then New Launcher, then Terminal, or select the Terminal icon if the launcher is already open. A terminal session window opens.
  3. In the terminal, enter the following command to install the PySpark conda environment:
    odsc conda install -s pyspark35_p311_cpu_x86_64_v1
  4. To create a notebook that uses the PySpark kernel, select File, then New, and select the pyspark35_p311_cpu_x86_64_v1 kernel.
  5. From the file browser on the left, open spark_config_dir, select the spark-defaults.conf file, add the configurations below to the bottom of the file, then save.
    
    spark.hadoop.fs.oci.client.hostname=https://objectstorage.us-ashburn-1.oraclecloud.com
    spark.hadoop.oci.metastore.uris=https://datacatalog.us-ashburn-1.oci.oraclecloud.com/
    spark.hadoop.fs.oci.client.custom.authenticator=com.oracle.bmc.hdfs.auth.ResourcePrincipalsCustomAuthenticator
    spark.hadoop.oracle.dcat.metastore.client.custom.authentication_provider=com.oracle.bmc.hdfs.auth.ResourcePrincipalsCustomAuthenticator
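Each line above follows Spark's standard properties format: a key, an `=` separator, and a value, one property per line. As a quick sanity check before saving, the edited file can be parsed the same way. The helper below is a minimal sketch in plain Python, an approximation of how Spark reads the file rather than Spark's own loader:

```python
def parse_spark_defaults(text):
    """Parse spark-defaults-style key=value lines into a dict,
    skipping blank lines and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, since values (URLs, class names)
        # may themselves contain '=' or other punctuation.
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

conf = parse_spark_defaults("""
spark.hadoop.fs.oci.client.hostname=https://objectstorage.us-ashburn-1.oraclecloud.com
spark.hadoop.fs.oci.client.custom.authenticator=com.oracle.bmc.hdfs.auth.ResourcePrincipalsCustomAuthenticator
""")
print(conf["spark.hadoop.fs.oci.client.hostname"])
```

Once the file is saved, a notebook cell running the PySpark kernel picks these properties up automatically when the Spark session is created with `SparkSession.builder.getOrCreate()`; no properties need to be repeated in code.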