Simple Linux Utility for Resource Management (SLURM)

SLURM is a job scheduling tool. If you read our Sherlock docs, you will remember this setup. You submit jobs to SLURM from the set of machines that you work from, the login nodes. The submission goes into a queue on a master node, and the jobs are dispatched to the workers, which are the other machines on the cluster.

                +------------+
                |            |
                |  Worker 2  |
+------------+  |  (GPU)     |
|            |  |            |
|  Worker 1  |  +------------+
|            |
+----^-------+   +----------------------+
     |           | Priority Queue       |
     +-----------+ 1. Job A UserMark    |
     2. Run Job  | 2. Job B UserFred    |
                 |                      |
                 +-^--------------------+
                   |
                   |
                   |
                   |  1. Submit Job (sbatch...)
                   |
     +-------------+-+  +---------------+
     |               |  |               |
     |               |  |               |
     |  Login Node 1 |  |  Login Node 2 |
     |               |  |               |
     |               |  |               |
     +---------------+  +---------------+
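
In practice, that round trip looks roughly like the sketch below (the script name analysis.job and the job ID 1234567 are placeholders for illustration):

# 1. From a login node, hand a job script to the scheduler
sbatch analysis.job          # prints something like: Submitted batch job 1234567

# 2. The job waits in the priority queue until a worker is free
squeue -u $USER

# 3. After the job finishes on a worker, check how it went
sacct -j 1234567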

Jobs

How do I run jobs? This is the most important thing you will do here. See the SLURM guide for a detailed introduction and walkthrough. There are two kinds of jobs:

  • interactive jobs: you connect to a worker node and issue commands interactively. If you want to test or otherwise run commands manually, you should ask for an interactive node. Do not run things on the login node, because it hinders performance for other users; your processes will be killed if they take up too much memory.
  • batch jobs: you hand a submission script to the SLURM batch scheduler with instructions for running one or more jobs. We provide detailed instructions in the link above, quick command references below, and a minimal sketch of both job types right after this list.
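
The minimal versions of both look like this (myscript.job is a placeholder for your own submission script):

# Interactive: get a bash shell on a worker node
srun --pty bash

# Batch: hand a script to the scheduler and return to your prompt
sbatch myscript.job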

Command Quick Reference

Jobs

# Submit a job
sbatch .jobs/jobFile.job

# See the entire job queue (note that you are allowed 5000 in queue at once)
squeue

# See only jobs for a given user
squeue -u username

# Crap, kill a job with ID $PID
scancel $PID

# Kill ALL jobs for a user
scancel -u username

# Kill all pending jobs
scancel -u username --state=pending

# Pause and resume jobs with SLURM's scancel (send a signal to the job with ID $PID)
scancel -s SIGSTOP $PID
scancel -s SIGCONT $PID

# Run interactive node with 12 cores and 64GB memory on 1 node for 4 hours:
srun -n 12 -N 1 --mem=64000 --time 4:0:0 --pty bash

# Claim interactive node for exclusive use, 8 hours
srun --exclusive --time 8:0:0 --pty bash

# Same as above, but with X11 forwarding
srun --exclusive --time 8:0:0 --x11 --pty bash

# Same as above, but niced down so your other jobs keep priority over it
srun --nice=9999 --exclusive --time 8:0:0 --x11 --pty bash

# Count number of running / in queue jobs (-h drops the header line from the count)
squeue -h -u username | wc -l

# Get estimated start times for your jobs (when Sherlock is busy)
squeue --start -u username

# Request big memory node for 4 hours
srun -N 1 -p bigmem -t 4:0:0 --pty bash

# Get an interactive session with 1 node, 4 CPU cores, and 2 GPUs
srun -N 1 -n 4 --gres=gpu:2 -p gpu --qos=gpu --pty bash

Big Memory Nodes

# Submit a job
sbatch -p bigmem --qos bigmem /path/to/job/gica.job

# Interactive job
srun -p bigmem --qos bigmem --time 1:00:00 --exclusive --x11 --pty bash

Software Modules

# What modules are available?
module avail

# Load a module
module load MODULE_NAME

# Some modules need to be loaded under a group first
module load system
module load singularity/2.4.5

Disk Usage

# Get usage for file systems
df -k

# Get usage for your home directory (useful if you are close to quota)
du -sh $HOME

Interactive Jobs

srun -t 2-00 --pty bash

With this command you get an interactive bash shell (--pty bash) on a node for up to 2 days (-t 2-00). If you want to be the exclusive user of the node ("I want all the memory!" *diabolical laugh*), add --exclusive, as shown below.
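
For example, the exclusive version of the same request:

# Interactive bash shell for 2 days, with the whole node to yourself
srun -t 2-00 --exclusive --pty bash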

You can also designate how much memory you would like to use:

srun --mem=32G --pty bash

Job Status

  • sinfo: show the overall status of each partition, such as how many nodes are idle in each one.
  • squeue: display jobs currently in the queue (pending and running), optionally filtered by conditions such as user or partition.
  • sstat -j jobID: show the status of a currently running job.
  • sacct -j jobID: show the final status of a finished job (see the example after this list).
  • scancel: cancel job(s). You can specify a job state/condition to cancel (e.g., --state=PENDING).
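
A quick way to see how a finished job went, assuming job ID 1234567 (a placeholder), is to ask sacct for a few specific fields:

# Exit code, wall time, and peak memory for a finished job
sacct -j 1234567 --format=JobID,JobName,State,ExitCode,Elapsed,MaxRSS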

Show jobs from userID

squeue -u userID

Batch Jobs

Let’s say that we have a script that we want to run, analysis.sh. We can hand the script off to be run on a node as follows:

sbatch  -o out.%j -e err.%j analysis.sh arg1 value1

(Standard) output of this script will be saved as out.%j (%j is replaced with the job ID) in the current directory, and standard error will be written to err.%j. The .out and .job files can be written wherever you will easily find them, and this is a good strategy for debugging. I usually keep .out and .job folders in my script directory so everything stays organized and easy to find. The string %j produces a unique identifier for the job, and arg1 and value1 are example input arguments and values for your script.
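
A minimal sketch of that layout (the folder and script names are placeholders following the convention above):

# Keep output, error, and job files organized in their own folders
mkdir -p .out .job
sbatch -o .out/out.%j -e .out/err.%j .job/analysis.job arg1 value1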

An example job script is shown below.

#!/bin/bash
#SBATCH --job-name=TRM000_phonetic.job
#SBATCH --output=TRM000_phonetic.job.out
#SBATCH --error=TRM000_phonetic.job.err
#SBATCH --time=2-00:00
#SBATCH --mem=12000
#SBATCH --qos=normal
#SBATCH --mail-type=ALL
# Note: environment variables like $USER are not expanded in #SBATCH lines, so spell out your address
#SBATCH --mail-user=username@stanford.edu
Rscript $HOME/shapleyProbeSets.R $SCRATCH/DATA/input.tab 0.05 1000 myJob.R

The above script submits a job named TRM000_phonetic.job and writes to corresponding error and output files. It will have a maximum run time of 2 days, ask for 12GB of memory, and be submitted to the normal queue.
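
To run it, submit the script from a login node and watch it in the queue (assuming you saved it as TRM000_phonetic.job; any file name works, and jobID is whatever sbatch prints back):

# Submit the batch script and note the job ID that sbatch reports
sbatch TRM000_phonetic.job

# Watch it move from pending to running
squeue -u $USER

# After it finishes, check the final status and resources used
sacct -j jobID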