Here is a quick getting started for using pytorch on the Sherlock cluster! We have pre-built Docker containers and pulled them onto the cluster as Singularity containers that can help you out:
For detailed tutorials using the containers, see the README in the first link. In the following example, we will use the Python 2.7 container.
In the getting started snippet below, we will show you how to grab an interactive GPU node using srun, load the needed libraries and software, and then interact with torch (the module import name for pytorch) to verify that we have GPU support.
Copy the container that you need from the @vsoch shared folder:
# Pytorch with python 2.7 (this is used in tutorial below)
cp /scratch/users/vsochat/share/pytorch-dev-py2.7.simg $SCRATCH
# Pytorch with python 3
cp /scratch/users/vsochat/share/pytorch-dev.simg $SCRATCH
# Pytorch with python 3 (provided by pytorch/pytorch on Docker Hub)
cp /scratch/users/vsochat/share/pytorch-0.4.1-cuda9-cudnn7-devel.simg $SCRATCH
Grab an interactive node with a GPU
$ srun -p gpu --gres=gpu:1 --pty bash
Load the Singularity module and the CUDA libraries
$ module use system
$ module load singularity
$ module load cuda
Don’t forget to copy the container, if you didn’t above!
cp /scratch/users/vsochat/share/pytorch-dev-py2.7.simg $SCRATCH
cd $SCRATCH
Shell into the container
Now we are going to shell into the Singularity container! Note the --nv flag here:
it means "nvidia" and exposes the NVIDIA libraries on the host so the container can see them.
You can use this flag with "exec" and "run" too!
$ singularity shell --nv pytorch-dev-py2.7.simg
Singularity: Invoking an interactive shell within container...
Singularity pytorch-dev-py2.7.simg:~> python
We just launched python! Let's now import torch and create a variable called "device" that will be "cuda:0" if we are using the GPU, or "cpu" if not.
Python 2.7.14 |Intel Corporation| (default, May 4 2018, 04:27:35)
[GCC 4.8.2 20140120 (Red Hat 4.8.2-15)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Intel(R) Distribution for Python is brought to you by Intel Corporation.
Please check out: https://software.intel.com/en-us/python-distribution
>>> import torch
>>> import torch.nn as nn
>>> from torch.autograd import Variable
>>> device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
>>> print(device)
cuda:0
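Once the device variable is set, the same pattern lets you move tensors and models onto whatever hardware is available. Here is a small follow-up sketch (not part of the original session; it also runs fine on a CPU-only node, since the device falls back to "cpu"):

```python
import torch
import torch.nn as nn

# Pick the GPU if available, otherwise fall back to the CPU
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Move a tensor and a tiny model onto the chosen device
x = torch.randn(3, 3).to(device)
model = nn.Linear(3, 2).to(device)

# The forward pass runs on whichever device both live on
y = model(x)
print(y.shape)
```

The key idea is that once everything is `.to(device)`, the same script works on both GPU and CPU nodes without changes.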
Tada! And from the above we see that we have successfully set up the container to work with Python 2.7 and GPU. You can now read more about pytorch to get started with machine learning.
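The interactive steps above can also be wrapped into a batch script for sbatch, so the test runs without holding an interactive node. Here is a minimal sketch (the job name, time limit, and script layout are assumptions; adjust them for your own allocation):

```shell
#!/bin/bash
#SBATCH --job-name=pytorch-test
#SBATCH -p gpu
#SBATCH --gres=gpu:1
#SBATCH --time=00:10:00

# Load the same modules as in the interactive session
module use system
module load singularity
module load cuda

# exec runs a single command inside the container;
# --nv exposes the host NVIDIA libraries, as before
singularity exec --nv $SCRATCH/pytorch-dev-py2.7.simg \
    python -c "import torch; print(torch.cuda.is_available())"
```

Save it (for example as pytorch-test.sbatch, a hypothetical name) and submit with sbatch; the output file should show True if the GPU was visible.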
Thanks to one of our awesome users in the CoCoLab for helping to develop this container and tutorial!
If you need a refresher with job submission, check out our post on SLURM. Do you have questions or want to see another tutorial? Please reach out!
This series guides you through getting started with HPC cluster computing.
Slurm Manage, for submitting and reporting on job arrays run on slurm
A Quick Start to using Singularity on the Sherlock Cluster
Use the Containershare templates and containers on the Stanford Clusters
A custom built pytorch and Singularity image on the Sherlock cluster
Use Jupyter Notebooks via Singularity Containers on Sherlock with Port Forwarding
Use R via a Jupyter Notebook on Sherlock
Use Jupyter Notebooks (optionally with GPU) on Sherlock with Port Forwarding
Use Jupyter Notebooks on Sherlock with Port Forwarding
A native and container-based approach to using Keras for Machine learning with R
How to create and extract rar archives with Python and containers
Getting started with SLURM
Getting started with the Sherlock Cluster at Stanford University
Using Kerberos to authenticate to a set of resources