Use Python Virtual Environments with PySpark
If your components run on different machines, use Python virtual environments with PySpark to ensure that the Python versions on all machines match.
Create a Virtual Environment with Conda
Note:
You can also use virtualenv to create your virtual environment instead of conda.

To create a virtual environment with conda, follow these steps:
- Ensure that you have conda and conda-pack installed.
Note:
To check whether conda is installed, execute the following command:
conda --version
- Create your virtual environment using the following command:
conda create -y -n <environment-name> python=<python-version> <additional-packages>
Note:
You can choose <environment-name> freely, but you must substitute the same name in all subsequent commands.
- Activate your virtual environment using the following command:
conda activate <environment-name>
- Execute the following command to obtain the path to your virtual environment:
which python
The result is referred to as <environment-abs-path> in the remaining steps.
- Compress your virtual environment using the following command:
conda pack -n <environment-name> -o <environment-abs-path>/<environment-name>.tar.gz
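A packed environment stores its contents (bin/, lib/, and so on) at the top level of the archive. The sketch below imitates that layout with a dummy directory, since conda pack itself requires conda; the name dummyenv is a placeholder. Extracting the archive into a fresh directory is a quick way to verify a packed environment before distributing it:

```shell
# Placeholder sketch: "dummyenv" stands in for a real conda environment.
mkdir -p dummyenv/bin
echo '#!/bin/sh' > dummyenv/bin/python    # stand-in for the interpreter

# conda pack stores the environment's contents at the archive's top level
# (bin/, lib/, ...), which this tar invocation imitates.
tar -czf dummyenv.tar.gz -C dummyenv .

# Extracting into a fresh directory therefore yields bin/python directly.
mkdir -p unpacked
tar -xzf dummyenv.tar.gz -C unpacked
test -f unpacked/bin/python && echo "archive layout OK"
```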
Update Interpreter Properties
The interpreter properties can be configured either in the interpreter JSON files or from the Interpreters page of the OFS MMG application UI after starting the OFS MMG application.
- In the Spark Interpreter Settings page of the OFS MMG application UI (or spark.json), change the following:
  - Change the value of the spark.yarn.dist.archives property to <environment-abs-path>/<environment-name>.tar.gz#<environment-name>
  - Change the value of the spark.pyspark.python property to ./<environment-name>/bin/python
- In the PySpark Interpreter Settings page of the OFS MMG application UI (or pyspark.json), change the value of the zeppelin.pyspark.python parameter to <environment-abs-path>/bin/python.
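The #<environment-name> suffix on spark.yarn.dist.archives tells YARN to unpack the archive into a directory of that name inside each container's working directory, which is why spark.pyspark.python uses the relative path ./<environment-name>/bin/python. A minimal sketch of that convention, using placeholder names (staging, env.tar.gz, myenv) and plain tar in place of YARN:

```shell
# Build a stand-in archive with the same layout as a packed environment.
mkdir -p staging/bin
echo '#!/bin/sh' > staging/bin/python

tar -czf env.tar.gz -C staging .

# YARN's "archive#alias" syntax unpacks env.tar.gz into ./myenv inside the
# container's working directory; plain tar simulates that here.
mkdir -p myenv
tar -xzf env.tar.gz -C myenv

# Executors can then reach the interpreter via the relative path that
# spark.pyspark.python is set to.
test -f ./myenv/bin/python && echo "relative interpreter path OK"
```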