To use Python Virtual Environments with PySpark, follow these steps:
1. Creating a Virtual Environment with Conda
2. Including Virtual Environment in the Init Container
3. Updating Interpreter Properties
NOTE: You can also use virtualenv instead of conda to create your virtual environment.
To create a virtual environment with Conda, follow these steps:
1. Ensure that you have conda and conda-pack installed.
2. Create your virtual environment using the following command:
conda create -y -n <environment-name> python=<python-version> <additional-packages>
NOTE: You can choose any <environment-name>, but you must use the same name in all subsequent commands.
3. Activate your virtual environment using the following command:
conda activate <environment-name>
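4. Execute the following command to locate your virtual environment:
which python
The command prints the path of the environment's Python interpreter, which ends in /bin/python. The part of that path before /bin/python is the root directory of the environment and is referred to as <environment-abs-path>.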
5. Compress your virtual environment using the following command:
conda pack -n <environment-name> -o <environment-abs-path>/<environment-name>.tar.gz
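Steps 2 to 5 can be sketched as one shell function. This is a minimal sketch, not an official script: the function names env_root_from_python and build_env_archive are hypothetical, the sketch assumes a Linux shell in which conda activate is already initialized, and it takes <environment-abs-path> to be the environment root, that is, the directory two levels above the interpreter path printed by which python.

```shell
# Hypothetical helper: derive the environment root from the interpreter path
# printed by `which python` by stripping the trailing /bin/python.
env_root_from_python() {
    dirname "$(dirname "$1")"
}

# Hypothetical wrapper around steps 2 to 5.
#   $1 environment name, $2 Python version, remaining arguments: additional packages
build_env_archive() {
    env_name="$1"
    py_version="$2"
    shift 2
    # Step 2: create the environment.
    conda create -y -n "$env_name" "python=$py_version" "$@" || return 1
    # Step 3: activate it (assumes the conda shell hook is already loaded).
    conda activate "$env_name" || return 1
    # Step 4: resolve the environment root from the active interpreter.
    env_abs_path="$(env_root_from_python "$(which python)")"
    # Step 5: pack the environment into an archive inside the environment root.
    conda pack -n "$env_name" -o "$env_abs_path/$env_name.tar.gz"
}

# Example call (requires conda and conda-pack; name, version, and package are placeholders):
# build_env_archive my-env 3.8 numpy
```

Keeping the path derivation in its own function makes it easy to check without conda installed; the conda calls themselves are exactly the commands from the steps above.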
To include the virtual environment in the Init Container, place the virtual environment in the same path as the Spark libraries. For more information, see Provide Custom Spark libraries.
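One way to stage the environment is sketched below. It is an illustration under stated assumptions rather than the documented procedure: the function name stage_env is hypothetical, its third argument stands for the custom Spark libraries directory described in Provide Custom Spark libraries (the actual location is not given in this section), and the target layout, with the archive and bin/python together under <environment-name>/, is inferred from the interpreter property values listed later in this section.

```shell
# Hypothetical helper: copy a packed virtual environment next to the Spark libraries.
#   $1 <environment-abs-path> (environment root that also contains the archive)
#   $2 <environment-name>
#   $3 custom Spark libraries directory (assumed; see "Provide Custom Spark libraries")
stage_env() {
    env_abs_path="$1"
    env_name="$2"
    libs_dir="$3"
    target="$libs_dir/$env_name"
    # The archive created with conda pack is expected inside the environment root.
    if [ ! -f "$env_abs_path/$env_name.tar.gz" ]; then
        echo "archive not found: $env_abs_path/$env_name.tar.gz" >&2
        return 1
    fi
    mkdir -p "$target" || return 1
    # Copy the contents of the environment root, archive included, into
    # <libs-dir>/<environment-name>/.
    cp -R "$env_abs_path/." "$target/"
}

# Example call (all three arguments are placeholders):
# stage_env /opt/conda/envs/my-env my-env /path/to/spark/libs
```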
You can configure all of these properties either in the interpreter JSON files or, after starting the FCC Studio application, on the Interpreters page of the FCC Studio application UI.
· On the Spark Interpreter Settings page of the FCC Studio application UI (or in the spark.json file), change the following values:
  - Set the spark.yarn.dist.archives property to /var/olds-spark-interpreter/interpreter/spark/libs/<environment-name>/<environment-name>.tar.gz#<environment-name>
  - Set the spark.pyspark.python property to ./<environment-name>/bin/python
· On the PySpark Interpreter Settings page of the FCC Studio application UI (or in the pyspark.json file), set the zeppelin.pyspark.python property to /var/olds-spark-interpreter/interpreter/spark/libs/<environment-name>/bin/python
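Because the same <environment-name> appears in all three values, it can help to generate them from a single name. The sketch below does that; the function name print_interpreter_properties and the key=value output format are hypothetical and for illustration only, since the structure of spark.json and pyspark.json is not shown in this section. The suffix after # in spark.yarn.dist.archives is the name under which YARN unpacks the archive in each container's working directory, which is why spark.pyspark.python is a relative path starting with ./<environment-name>.

```shell
# Base path taken from the property values listed above.
SPARK_LIBS=/var/olds-spark-interpreter/interpreter/spark/libs

# Hypothetical helper: print the three interpreter property values for one
# environment name, so the name is typed only once.
print_interpreter_properties() {
    env_name="$1"
    # Archive shipped to YARN; "#<name>" is the directory alias it is unpacked under.
    printf 'spark.yarn.dist.archives=%s/%s/%s.tar.gz#%s\n' \
        "$SPARK_LIBS" "$env_name" "$env_name" "$env_name"
    # Interpreter inside the unpacked archive, relative to the container working directory.
    printf 'spark.pyspark.python=./%s/bin/python\n' "$env_name"
    # Interpreter used by the PySpark interpreter process, as an absolute path.
    printf 'zeppelin.pyspark.python=%s/%s/bin/python\n' "$SPARK_LIBS" "$env_name"
}

# Example call with a placeholder environment name:
print_interpreter_properties my-env
```

Copy each printed value into the corresponding property on the Interpreters page or in the matching JSON file.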