3.1.6.3 Use Python Virtual Environments with PySpark
If your components run on different machines, the Python versions they use can differ. To ensure that the versions match, use Python virtual environments with PySpark.
Create a Virtual Environment with Conda
Note:
You can also use virtualenv instead of conda to create your virtual environment.
To create a virtual environment with Conda, follow these steps:
- Ensure that you have conda and conda-pack installed.
Note:
To check whether conda is installed, execute the following command:
conda --version
- Create your virtual environment using the following command:
conda create -y -n <environment-name> python=<python-version> <additional-packages>
Note:
The <environment-name> can be chosen freely and must be substituted consistently in the subsequent commands.
- Activate your virtual environment using the following command:
conda activate <environment-name>
- Execute the following command to obtain the path to the Python interpreter of your virtual environment:
which python
The environment's root directory (the obtained path without the trailing /bin/python) is referred to as <environment-abs-path> in the following steps.
- Compress your virtual environment using the following command:
conda pack -n <environment-name> -o <environment-abs-path>/<environment-name>.tar.gz
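The steps above can be collected into a single script. The following sketch assumes conda and conda-pack are installed; the environment name "cs_pyspark" and Python 3.9 are illustrative values only, so substitute your own.

```shell
#!/bin/sh
# Sketch of the packaging steps above; values are illustrative.
ENV_NAME="cs_pyspark"
PY_VERSION="3.9"
ARCHIVE="$ENV_NAME.tar.gz"

pack_env() {
    conda create -y -n "$ENV_NAME" python="$PY_VERSION"

    # `conda activate` works inside scripts only after loading the hook
    eval "$(conda shell.posix hook)"
    conda activate "$ENV_NAME"

    # `which python` prints <environment-abs-path>/bin/python; strip the
    # trailing /bin/python to get the environment's root directory
    ENV_ABS_PATH="$(dirname "$(dirname "$(which python)")")"

    conda pack -n "$ENV_NAME" -o "$ENV_ABS_PATH/$ARCHIVE"
}
```

Run pack_env on a machine whose Python version matches the one required by your Spark executors.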
Update Interpreter Properties
The interpreter properties can be configured either in the interpreter JSON files or from the Interpreters page of the Compliance Studio application UI after starting the application.
- In the Spark Interpreter Settings page of the Compliance Studio application UI (or in spark.json), change the following:
  - Change the value of the spark.yarn.dist.archives property to <environment-abs-path>/<environment-name>.tar.gz#<environment-name>
  - Change the value of the spark.pyspark.python property to ./<environment-name>/bin/python
- In the PySpark Interpreter Settings page of the Compliance Studio application UI (or in pyspark.json), change the value of the zeppelin.pyspark.python parameter to <environment-abs-path>/bin/python.
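As a concrete illustration, assume the environment is named cs_pyspark and was packed under /opt/conda/envs/cs_pyspark (both values are hypothetical). The relevant entries in spark.json would then read:

```json
{
  "spark.yarn.dist.archives": "/opt/conda/envs/cs_pyspark/cs_pyspark.tar.gz#cs_pyspark",
  "spark.pyspark.python": "./cs_pyspark/bin/python"
}
```

and zeppelin.pyspark.python in pyspark.json would be set to /opt/conda/envs/cs_pyspark/bin/python. The part after # in spark.yarn.dist.archives is the alias under which YARN unpacks the archive in each container's working directory, which is why spark.pyspark.python uses the relative path ./<environment-name>/bin/python.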