Today I had about 10,000 scripts I wanted to run on a SLURM cluster, and I also have an upper limit of jobs I’m allowed to run at once. I’m too lazy to wait and monitor the jobs, so I prepared the sbatch commands in a large file that looks something like this:

sbatch /scratch/users/vsochat/zenodo-ml/slurm/jobs/
sbatch /scratch/users/vsochat/zenodo-ml/slurm/jobs/
sbatch /scratch/users/vsochat/zenodo-ml/slurm/jobs/
sbatch /scratch/users/vsochat/zenodo-ml/slurm/jobs/

Normally I can just run this bash script and the number is within the limit (and I’m fine):

$ /bin/bash

but today I was way over the limit, and had I done this would have just hit the limit and found a bunch of error messages when I returned from my afternoon dinosaur frolicking. Instead of re-writing the script to generate the commands on demand as space opens up (I’ve done this before) I decided to write a script to read in the commands, check the count in the queue, and submit jobs when there is an opening.

I can easily check which jobs still need to run by way of the output being organized by the identifier that is also captured in the script name. If I needed to run this again (and not redo runs) I would simply check for the existence of this folder, and skip if it’s found.

Yes, I’m spamming the queue, and I’m lazy. I’m really that terrible.

Suggested Citation:
Sochat, Vanessa. "Spam the Queue." @vsoch (blog), 28 May 2018, (accessed 12 Jun 24).