Installing Python on non-BDA platforms

Python 2.7 must be installed on the Admin Server and the YARN NodeManager servers.

You can install either the Anaconda distribution of Python or Miniconda, which contains only the conda package manager and Python. If you intend to use 3rd-party Python packages, Anaconda is recommended because it includes Python, pandas, Jupyter, and over 150 other Python modules, which simplifies installing both Python and those modules.

To install the Python package on the Admin Server:

  1. Download the Python 2.7 installer:
  2. Run the Python installer, as documented on the download page.

As a result, you should have Python installed on the Admin Server machine. For example, it could be installed in the /localdisk/anaconda2 directory.
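
One way to confirm that the installed interpreter is a 2.7 release is to parse the output of its --version option. The following is a minimal sketch, not part of the product: the function names are hypothetical, and it merges stderr into stdout because Python 2.7 prints its version to stderr.

```python
import re
import subprocess


def parse_python_version(output):
    """Return (major, minor) parsed from `python --version` output, or None."""
    match = re.search(r"Python (\d+)\.(\d+)", output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def interpreter_version(python_path):
    """Run `<python_path> --version` and return the parsed (major, minor)."""
    proc = subprocess.Popen(
        [python_path, "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # Python 2.7 writes the version to stderr
    )
    out, _ = proc.communicate()
    return parse_python_version(out.decode("utf-8", "replace"))


def is_python27(version):
    """True only when the parsed version is exactly 2.7."""
    return version == (2, 7)
```

For example, is_python27(interpreter_version("/localdisk/anaconda2/bin/python")) should be true if the example installation directory above is in use; substitute your own path otherwise.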

After the Anaconda directory is created, set that directory as the LOCAL_PYTHON_HOME property in the bdd-shell.conf file, as in this example:

## Path to the python 2.7 and 3rd party libs on the server running BDD Shell
## Suggest to use Anaconda 2.5
LOCAL_PYTHON_HOME=/localdisk/anaconda2

Also set the SPARK_EXECUTOR_PYTHON property in the bdd-shell.conf file to the location of the Python executable on the YARN NodeManager servers, as in this example:

## Path to the python 2.7 binary on the Yarn Node Manager servers
SPARK_EXECUTOR_PYTHON=/localdisk/anaconda2/bin/python
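
Before starting BDD Shell, it can be useful to check that both properties are present in the file. The sketch below assumes that bdd-shell.conf consists of plain KEY=VALUE lines with # comments, as in the excerpts above; the real file may allow other syntax. The function names parse_conf and check_python_settings are hypothetical, and the sample text simply repeats the example values.

```python
def parse_conf(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines that contain no '='
            settings[key.strip()] = value.strip()
    return settings


def check_python_settings(settings):
    """Return a list of problems with the two Python-related properties."""
    problems = []
    for key in ("LOCAL_PYTHON_HOME", "SPARK_EXECUTOR_PYTHON"):
        if not settings.get(key):
            problems.append(key + " is not set")
    return problems


# Sample content copied from the examples in this topic.
SAMPLE = """\
## Path to the python 2.7 and 3rd party libs on the server running BDD Shell
LOCAL_PYTHON_HOME=/localdisk/anaconda2
## Path to the python 2.7 binary on the Yarn Node Manager servers
SPARK_EXECUTOR_PYTHON=/localdisk/anaconda2/bin/python
"""

settings = parse_conf(SAMPLE)
print(check_python_settings(settings))  # prints []
```

To check a real file, read its contents and pass them to parse_conf in place of SAMPLE.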

Optionally, install any 3rd-party Python packages that you need, such as pandas and Jupyter. Installing Anaconda 2.5 can simplify the installation of Python and 3rd-party packages.
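
To see which of the packages you need are not yet importable, you can run a short check with the installed interpreter. This is an illustrative sketch only: the function name missing_packages is hypothetical, and the list of module names is an assumption to be replaced with the modules you actually need.

```python
import importlib


def missing_packages(names):
    """Return the module names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


# Example list; adjust it to the modules your work requires.
print(missing_packages(["pandas"]))
```

Run the script with the interpreter named in SPARK_EXECUTOR_PYTHON so that the result reflects the installation that BDD Shell will use.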