Do you want to run Python? I can help you out! This documentation is specific to the farmshare2 cluster at Stanford, on which several versions of Python are available. The convention is that Python 2 is called ‘python’, and Python 3 is called ‘python3’. They are not directly compatible, and in fact can be thought of as entirely different software.
How do I know which python I’m calling?
As with most Linux software, when you issue a command your shell searches the folders listed in a variable called $PATH, in order, and runs the first executable it finds with that name. The same is true for python and python3. Let’s take a look at some of the defaults:
# What python executable is found first?
rice05:~> which python
/usr/bin/python
# What version of python is this?
rice05:~> python --version
Python 2.7.12
# And what about python3?
rice05:~> which python3
/usr/bin/python3
# And python3 version
rice05:~> python3 --version
Python 3.5.2
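The lookup that which performs can also be sketched from inside Python. The snippet below is only an illustration and assumes Python 3 on a Linux-like system: it builds two throwaway folders that each hold an executable with the hypothetical name mytool, and uses the standard library function shutil.which to show that the folder listed first in the search path wins.

```python
import os
import shutil
import stat
import sys
import tempfile


def make_fake_tool(folder, name="mytool"):
    """Create a tiny executable stub called `name` inside `folder`."""
    path = os.path.join(folder, name)
    with open(path, "w") as handle:
        handle.write("#!/bin/sh\n")
    # Set the executable bit so that the file counts as a command.
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
    return path


with tempfile.TemporaryDirectory() as first, tempfile.TemporaryDirectory() as second:
    tool_in_first = make_fake_tool(first)
    tool_in_second = make_fake_tool(second)
    # Same two folders, searched in opposite orders.
    found = shutil.which("mytool", path=os.pathsep.join([first, second]))
    found_reversed = shutil.which("mytool", path=os.pathsep.join([second, first]))

print("first on the path wins:", found == tool_in_first)
print("this interpreter is:", sys.executable)
```

Swapping the order of the two folders changes which copy is found, which is exactly why the order of entries in $PATH matters.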
This is great, but what if you want to use a different version? As a reminder, most clusters like Farmshare2 come with packages and modules, and also let you install your own custom software (here’s a refresher if you need it). Let’s talk about the different options for extending the provided environments, or creating your own environment. First, remember that for all of your scripts, the first line (the shebang) tells the system which executable to run the script with. So make sure to have this at the top of your script:
#!/usr/bin/env python
Now, what to do when the default python doesn’t fit your needs? You have many choices:
- Install to a User Library if you want to continue using a provided python, but add a module of your choice to a personal library
- Install a conda environment if you need standard scientific software modules, and don’t want the hassle of compiling and installing them.
- Create a virtual environment if you want more control over the version and modules
1. Install to a User Library
The reason that you can’t install to the shared python or python3 is that you don’t have write access to the site-packages folder, which is where python automatically looks for modules. But don’t despair! You can install to your (very own) site-packages by simply appending the --user argument to the install command. For example:
# Install the pokemon-ascii package
pip install pokemon --user
# Where did it install to?
rice05:~> python
Python 2.7.12 (default, Jul 1 2016, 15:12:24)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pokemon
>>> pokemon.__file__
'/home/vsochat/.local/lib/python2.7/site-packages/pokemon/__init__.pyc'
As you can see above, your --user packages install to a site-packages folder for the python version under .local/lib. You can always peek into this folder to see what you have installed.
rice05:~> ls $HOME/.local/lib/python2.7/site-packages/
nibabel pokemon virtualenv.py
nibabel-2.1.0.dist-info pokemon-0.32.dist-info virtualenv.pyc
nisext virtualenv-15.0.3.dist-info virtualenv_support
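If you would rather ask python where the user folder is than remember the path, the standard library site module can report it. This is a small sketch, assuming Python 3 (the functions also exist in Python 2.7); the exact paths printed will depend on your account and python version.

```python
import site
import sys

# Base folder for --user installs (usually ~/.local on Linux).
user_base = site.getuserbase()
# The site-packages folder under that base for this python version.
user_site = site.getusersitepackages()

print("user base:          ", user_base)
print("user site-packages: ", user_site)
# ENABLE_USER_SITE is False or None when user installs are disabled,
# which is typically the case inside a virtual environment.
print("user site enabled:  ", site.ENABLE_USER_SITE)
print("already on sys.path:", user_site in sys.path)
```

Note that the folder is only added to sys.path if it already exists, so the last line can print False on a fresh account.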
You probably now have two questions.
- How does python know to look here, and
- How do I check what other folders are being checked?
How does Python find modules?
You can look at the sys.path variable, a list of paths on your machine, to see where Python is going to look for modules:
rice05:~> python
Python 2.7.12 (default, Jul 1 2016, 15:12:24)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.path
['', '/usr/lib/python2.7', '/usr/lib/python2.7/plat-x86_64-linux-gnu', '/usr/lib/python2.7/lib-tk',
'/usr/lib/python2.7/lib-old', '/usr/lib/python2.7/lib-dynload', '/home/vsochat/.local/lib/python2.7/site-packages',
'/usr/local/lib/python2.7/dist-packages', '/usr/lib/python2.7/dist-packages']
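Python searches these paths in order. Above we can see that the standard library folders come first, then your user site-packages, and then the system dist-packages. So if you install a module to your user folder, it is found before a system dist-packages module with the same name, but after anything in the standard library. Did you notice that the first entry is an empty string? In an interactive session like this one, it means that your present working directory will be searched first (when you run a script, the first entry is the folder containing the script instead). If you have a file called pokemon.py in this directory and then you do import pokemon, it’s going to use the file in the present working directory.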
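To check where an import would come from without actually importing anything, you can ask the import machinery directly. The helper below is a minimal sketch that assumes Python 3 (importlib.util.find_spec is not available in Python 2.7); the name locate is made up for this example.

```python
import importlib.util


def locate(module_name):
    """Return where a top-level module would be loaded from, or None."""
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return None
    # For regular modules this is a file path; for modules compiled
    # into the interpreter (like sys) it is the string "built-in".
    return spec.origin


for name in ("json", "sys", "pokemon"):
    print(name, "->", locate(name))
```

If a pokemon.py is sitting in your working directory, the last line will point at that file rather than at an installed package, which is a quick way to spot accidental shadowing.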
How can I dynamically change the paths?
The fact that these paths are stored in a variable means that you can dynamically add / tweak paths in your scripts. For example, when I fire up python3 and load numpy, it is loaded from the first path in sys.path that contains it:
>>> import numpy
>>> numpy.__path__
['/usr/lib/python3/dist-packages/numpy']
And I can change this behavior by removing or appending paths to this list before importing. Additionally, you can add folders with modules to the environment variable $PYTHONPATH (read about PYTHONPATH here). First you set the variable:
# Here is setting an environment variable with csh
rice05:~> setenv PYTHONPATH /home/vsochat:$PYTHONPATH
# And here with bash
rice05:~> export PYTHONPATH=/home/vsochat:$PYTHONPATH
# Did it work?
rice05:~> echo $PYTHONPATH
/home/vsochat
Now when we run python, we see the path has been added near the beginning of sys.path, right after the empty string:
rice05:~> python
Python 2.7.12 (default, Jul 1 2016, 15:12:24)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.path
['', '/home/vsochat', '/usr/lib/python2.7', '/usr/lib/python2.7/plat-x86_64-linux-gnu',
'/usr/lib/python2.7/lib-tk', '/usr/lib/python2.7/lib-old', '/usr/lib/python2.7/lib-dynload',
'/home/vsochat/.local/lib/python2.7/site-packages', '/usr/local/lib/python2.7/dist-packages',
'/usr/lib/python2.7/dist-packages']
Awesome!
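Both approaches can be tried safely in a scratch folder. The following is a sketch under a few assumptions: Python 3.7 or later, and a throwaway module with the made-up name demo_helper. It first makes the module importable by editing sys.path in the running process, and then does the same for a child python process by setting PYTHONPATH in its environment.

```python
import os
import subprocess
import sys
import tempfile

with tempfile.TemporaryDirectory() as scratch:
    # Write a tiny module into a folder that is not on sys.path yet.
    with open(os.path.join(scratch, "demo_helper.py"), "w") as handle:
        handle.write("GREETING = 'hello from demo_helper'\n")

    # Option 1: edit sys.path in this process before importing.
    sys.path.insert(0, scratch)
    import demo_helper
    sys.path.remove(scratch)

    # Option 2: set PYTHONPATH for a child python process.
    child_env = dict(os.environ, PYTHONPATH=scratch)
    child = subprocess.run(
        [sys.executable, "-c", "import demo_helper; print(demo_helper.GREETING)"],
        env=child_env,
        capture_output=True,
        text=True,
    )
    child_output = child.stdout.strip()

print("in-process import:", demo_helper.GREETING)
print("child via PYTHONPATH:", child_output)
```

Editing sys.path only affects the current process, while PYTHONPATH is inherited by every python started from that shell, so the second option is the one to be careful with in your login files.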
How do I see more information about my modules?
You can look to see if a module has a __version__, a __path__, or a __file__, each of which will tell you details that you might need for debugging. Keep in mind that not every module has a version defined.
rice05:~> python
Python 2.7.12 (default, Jul 1 2016, 15:12:24)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> numpy.__version__
'1.11.0'
>>> numpy.__file__
'/usr/lib/python2.7/dist-packages/numpy/__init__.pyc'
>>> numpy.__path__
['/usr/lib/python2.7/dist-packages/numpy']
>>> numpy.__dict__
If you are really desperate to see what functions the module has available, take a look at (for example, for numpy) numpy.__dict__.keys(). While it doesn’t work on the cluster, if you load a module in IPython you can press TAB to autocomplete for available options, and add a single or double _ to see the hidden ones like __path__.
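Since not every module defines all of these attributes, it can help to collect them defensively. Here is a small sketch, assuming Python 3; the function name describe_module is invented for this example, and it uses the standard library json and sys modules so that it runs anywhere.

```python
import importlib


def describe_module(name):
    """Collect the debugging attributes of a module, using None when absent."""
    module = importlib.import_module(name)
    return {
        "version": getattr(module, "__version__", None),
        "file": getattr(module, "__file__", None),
        # Only packages (folders of modules) have a __path__.
        "path": list(getattr(module, "__path__", [])) or None,
        # dir() lists the available names; drop the underscore ones.
        "public_names": [n for n in dir(module) if not n.startswith("_")],
    }


info = describe_module("json")
print(info["file"])
print(info["public_names"])

# A module built into the interpreter has no file at all.
print(describe_module("sys")["file"])
```

The dir() call is a gentler alternative to digging through __dict__, and works in the plain python shell without IPython.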
How do I ensure that my package manager is up to date?
We’ve hit a conundrum! How does one “pip install pip”? And further, how do we ensure we are using the pip version associated with the currently active python? We upgrade it the same way that we would upgrade any other module, using the --upgrade flag, and we run pip as python -m pip so that it belongs to whichever python is first on our path:
rice05:~> python -m pip install --user --upgrade pip
rice05:~> python -m pip install --user --upgrade virtualenv
And note that you can do this for virtual environments (virtualenv) as well.
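The same idea carries over to scripts that need to call pip: building the command from sys.executable ties pip to the interpreter that is running the script. This is a minimal sketch, assuming Python 3.7 or later; pip_command is a made-up helper name, and only the harmless --version subcommand is actually run here.

```python
import subprocess
import sys


def pip_command(*args):
    """Build a pip command line bound to the interpreter running this script."""
    return [sys.executable, "-m", "pip"] + list(args)


# Ask the matching pip which python it belongs to.
result = subprocess.run(pip_command("--version"), capture_output=True, text=True)
print(result.stdout.strip() or result.stderr.strip())

# The upgrade from above would be built like this (not executed here).
upgrade = pip_command("install", "--user", "--upgrade", "pip")
print(" ".join(upgrade))
```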
2. Install a conda environment
There is a core set of scientific software modules that are quite annoying to install, and this is where anaconda and miniconda come in. These are packaged virtual environments that you can easily install with pre-compiled versions of all your favorite modules (numpy, scikit-learn, pandas, matplotlib, etc.). We are going to be following instructions from the miniconda installation documentation. Generally we are going to do the following:
- Download the installer
- Run it to install, and install to our home folder
- (optional) add it to our path
- Install additional modules with conda
First get the installer from here. You can use wget to download the file to your home folder:
rice05:~> cd $HOME
rice05:~> wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
# Make it executable
rice05:~> chmod u+x Miniconda3-latest-Linux-x86_64.sh
Then run it! If you run it without any command line arguments, it’s going to ask you to agree to the license, and then interactively specify installation parameters. The easiest thing to do is skip this: the -b parameter automatically agrees to the license and installs to miniconda3 in your home directory:
rice05:~> ./Miniconda3-latest-Linux-x86_64.sh -b
PREFIX=/home/vsochat/miniconda3
...
(installation continues here)
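If you want to add miniconda to your path, meaning that its python will be found in preference to all other pythons, then you can add it to your .profile. Note that the redirect goes outside of the quotes, and that single quotes keep $PATH from being expanded until the profile is sourced:
echo 'export PATH=$HOME/miniconda3/bin:$PATH' >> $HOME/.profile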
Then source your profile to make the python path active, or log in and out of the terminal to do the same:
source /home/vsochat/.profile
Finally, to install additional modules to your miniconda environment, you can use either conda (for pre-compiled binaries) or the pip that comes installed with the miniconda environment (in the case that the conda package manager doesn’t include the module you want).
# Scikit learn is included in the conda package manager
/home/vsochat/miniconda3/bin/conda install -y scikit-learn
# Pokemon ascii is not
/home/vsochat/miniconda3/bin/pip install pokemon
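Once there are several pythons on your path, it is worth confirming which one a script is actually running under. The check below is a heuristic sketch, not an official API: it assumes Python 3 and relies on the fact that conda installations keep their package records in a conda-meta folder under the install prefix. The function name looks_like_conda is invented for this example.

```python
import os
import sys


def looks_like_conda(prefix):
    """Heuristic: conda installs keep package records in <prefix>/conda-meta."""
    return os.path.isdir(os.path.join(prefix, "conda-meta"))


print("interpreter:  ", sys.executable)
print("prefix:       ", sys.prefix)
print("conda-managed:", looks_like_conda(sys.prefix))
```

Run with /home/vsochat/miniconda3/bin/python this should report a conda-managed prefix, and with /usr/bin/python3 it should not.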
3. Install a virtual environment
If you don’t want the bells and whistles that come with anaconda or miniconda, then you probably should go for a virtual environment. The Hitchhiker’s Guide to Python has a great introduction, and we will go through the steps here as well. First, let’s make sure we have the most up to date version of virtualenv for our current python:
rice05:~> python -m pip install --user --upgrade virtualenv
Since we are installing this to our user (.local) folder, we need to make sure the bin (with executables for the install) is on our path, because it usually won’t be:
# Ruhroh!
rice05:~/myproject> which virtualenv
virtualenv: Command not found.
# That's ok, we know where it is!
rice05:~/myproject> export PATH=/home/vsochat/.local/bin:$PATH
# (and for csh)
rice05:~/myproject> setenv PATH /home/vsochat/.local/bin:$PATH
# Did we add it?
rice05:~/myproject> which virtualenv
/home/vsochat/.local/bin/virtualenv
You can also add this to your $HOME/.profile if you want it sourced each time.
Now we can make and use virtual environments! It is as simple as creating it, and activating it:
rice05:~> mkdir myproject
rice05:~> cd myproject
rice05:~/myproject> virtualenv venv
New python executable in /home/vsochat/myproject/venv/bin/python
Installing setuptools, pip, wheel...done.
rice05:~/myproject> ls
venv
To activate our environment, we source the activate script in the bin folder provided. If you take a look at the files in bin, there is an activate file for each kind of shell, and there are also the executables for python and the package manager pip:
rice05:~/myproject> ls venv/bin/
activate activate_this.py pip python python-config
activate.csh easy_install pip2 python2 wheel
activate.fish easy_install-2.7 pip2.7 python2.7
Here is how we would activate for csh (for bash, source venv/bin/activate instead):
rice05:~/myproject> source venv/bin/activate.csh
[venv] rice05:~/myproject>
Notice any changes? The name of the active virtual environment is added to the terminal prompt! Now if we look at the python and pip versions running, we see we are in our virtual environment:
[venv] rice05:~/myproject> which python
/home/vsochat/myproject/venv/bin/python
[venv] rice05:~/myproject> which pip
/home/vsochat/myproject/venv/bin/pip
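A script can make the same check for itself. The sketch below assumes Python 3.3 or later for sys.base_prefix, and also looks for sys.real_prefix, which older virtualenv releases (like the 15.x one used here) set instead. The function name in_virtual_environment is made up for this example.

```python
import sys


def in_virtual_environment():
    """Return True when the running python belongs to a virtual environment."""
    # Older virtualenv releases record the original python in sys.real_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    # The standard library venv and newer virtualenv keep the original
    # location in sys.base_prefix and point sys.prefix at the environment.
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


print("prefix:", sys.prefix)
print("inside a virtual environment:", in_virtual_environment())
```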
Again, you can add the source command to your $HOME/.profile if you want it to be loaded automatically on login. From here you can move forward with using python setup.py install (for local module files) and pip install MODULE to install software to your virtual environment.
To exit from your environment, just type deactivate:
[venv] rice05:~/myproject> deactivate
rice05:~/myproject>
PROTIP You can pass an option when creating your virtualenv to include the system site packages in your environment. This is useful for modules like numpy that require compilation (lib/blas, anyone?) that you don’t want to deal with:
rice05:~/myproject> virtualenv venv --system-site-packages
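For python3 there is a closely related tool in the standard library, the venv module, and it offers the same switch. To be clear, this sketch swaps in venv rather than the virtualenv command used above, and it assumes Python 3 with the venv module available. It creates a throwaway environment in a temporary folder (without pip, to stay fast and offline) and reads back the setting that records whether system site packages are visible.

```python
import os
import tempfile
import venv


def read_pyvenv_cfg(env_dir):
    """Parse the 'key = value' lines of an environment's pyvenv.cfg file."""
    settings = {}
    with open(os.path.join(env_dir, "pyvenv.cfg")) as handle:
        for line in handle:
            key, separator, value = line.partition("=")
            if separator:
                settings[key.strip()] = value.strip()
    return settings


with tempfile.TemporaryDirectory() as workdir:
    env_dir = os.path.join(workdir, "venv")
    # system_site_packages=True plays the role of --system-site-packages.
    venv.create(env_dir, system_site_packages=True, with_pip=False)
    settings = read_pyvenv_cfg(env_dir)

print("include-system-site-packages:", settings.get("include-system-site-packages"))
```

The command line form of the same thing is python3 -m venv --system-site-packages venv.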
Reproducible Practices
Whether you are a researcher or a software engineer, you are going to run into the issue of wanting to share your code and have someone on a different cluster run it. The best solution is to containerize everything, and for this we recommend using Singularity. However, let’s say that you’ve been a bit disorganized, and you want to quickly capture your current python environment, either for a requirements.txt file or for a container configuration. If you just want to glance and get a “human readable” version, then you can do:
rice05:~> pip list
biopython (1.66)
decorator (4.0.6)
gbp (0.7.2)
nibabel (2.1.0)
numpy (1.11.0)
pip (8.1.2)
pokemon (0.32)
Pyste (0.9.10)
python-dateutil (2.4.2)
reportlab (3.3.0)
scipy (0.18.1)
setuptools (28.0.0)
six (1.10.0)
virtualenv (15.0.1)
wheel (0.29.0)
If you want your software printed in the format that will populate the requirements.txt file, then you want:
rice05:~> pip freeze
biopython==1.66
decorator==4.0.6
gbp==0.7.2
nibabel==2.1.0
numpy==1.11.0
pokemon==0.32
Pyste==0.9.10
python-dateutil==2.4.2
reportlab==3.3.0
scipy==0.18.1
six==1.10.0
virtualenv==15.0.1
And you can print this right to file:
# Write to new file
rice05:~> pip freeze > requirements.txt
# Append to file
rice05:~> pip freeze >> requirements.txt
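If you need the same kind of listing from inside a script, the standard library can produce something similar. This is a rough sketch, assuming Python 3.8 or later for importlib.metadata; it is not a replacement for pip freeze, which applies its own filtering and handles special cases like editable installs. The function name freeze_lines is invented for this example.

```python
from importlib import metadata


def freeze_lines():
    """Return installed distributions as sorted 'name==version' strings."""
    pins = set()
    for dist in metadata.distributions():
        try:
            name = dist.metadata["Name"]
            version = dist.version
        except Exception:
            # Skip installs whose metadata cannot be read.
            continue
        if name and version:
            pins.add("{}=={}".format(name, version))
    return sorted(pins, key=str.lower)


lines = freeze_lines()
for line in lines:
    print(line)
```

Because it uses the interpreter that runs the script, the listing always describes the currently active environment.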
Suggested Citation:
Sochat, Vanessa. "Python Environments, A User Guide." @vsoch (blog), 18 Oct 2016, https://vsoch.github.io/2016/python-install/ (accessed 22 Dec 24).