Batch Computing Guide
Batch jobs can be submitted via several methods. The most basic is a simple Slurm job. Slurm can also run job arrays. We also provide access to a Globus Compute endpoint, which can be used to submit jobs.
The Slurm scheduler utilises Fair Share to help with job prioritisation. We also impose general limits on the size and number of jobs submitted by any user.
Depending on the needs of your batch jobs, you may need to specify the partition you want the job to run on. Please see the Hardware page for specifics about the system. If you need to use GPUs, the Using GPUs page will provide generic information to get started.
Slurm job basics¶
Please see Submitting your first job for detailed instructions and examples.
Batch script formatting¶
Jobs on the HPC are submitted in the form of a batch script containing the code you want to run and a header of information needed by our job scheduler, Slurm. A minimal example:
#!/bin/bash -e
#SBATCH --job-name SerialJob # job name (shows up in the queue)
#SBATCH --time 00:01:00 # Walltime (HH:MM:SS)
#SBATCH --mem 512MB # Memory in MB
#SBATCH --cpus-per-task 1 # CPUs
<code to be run goes here>
Note: if you are a member of multiple accounts (projects), add the following line to the header:
#SBATCH --account=<projectcode>
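As a sketch of how the same header extends to the job arrays mentioned at the top of this guide, the script below adds the `--array` option. Slurm runs one copy of the script per index and sets `SLURM_ARRAY_TASK_ID` in each copy. The job name, the index range and the `input_<n>.txt` naming scheme are assumptions made up for illustration; the fallback value of 1 is only there so the script can also be tried outside Slurm.

```shell
#!/bin/bash -e
#SBATCH --job-name ArrayJob      # job name (shows up in the queue)
#SBATCH --time 00:01:00          # Walltime (HH:MM:SS), per array task
#SBATCH --mem 512MB              # Memory in MB, per array task
#SBATCH --cpus-per-task 1        # CPUs, per array task
#SBATCH --array 1-5              # run five tasks with indices 1 to 5

# Slurm sets SLURM_ARRAY_TASK_ID for each array task.
# Fall back to 1 when the script is run by hand, outside Slurm.
task_id="${SLURM_ARRAY_TASK_ID:-1}"

# Hypothetical naming scheme: each task handles its own input file.
input_file="input_${task_id}.txt"

echo "Task ${task_id} would process ${input_file}"
```

Each array task is scheduled as its own job, so the time, memory and CPU requests apply to every task individually rather than to the array as a whole.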
Submitting¶
Jobs are submitted to the scheduler using:
sbatch myjob.sl
You should receive output similar to:
Submitted batch job 1748836
sbatch also accepts command-line arguments equivalent to the #SBATCH directives used in the batch script. Options given on the command line take precedence over those set in the script.
You can find more details on the use of sbatch in the Slurm documentation.
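A minimal sketch of capturing the job ID at submission time, for later use with squeue, sacct or scancel. The helper name `extract_job_id` is made up for this example and relies only on the "Submitted batch job <id>" message format shown above. The final part assumes a script called `myjob.sl` exists and only runs where `sbatch` is available; the `--time` and `--mem` values are arbitrary and illustrate passing options on the command line.

```shell
#!/bin/bash

# Hypothetical helper: pull the job ID (the fourth word) out of
# the "Submitted batch job <id>" message printed by sbatch.
extract_job_id() {
    echo "$1" | awk '{ print $4 }'
}

# Works on the example message from this guide.
example_id="$(extract_job_id "Submitted batch job 1748836")"
echo "Parsed job ID: ${example_id}"

# Only attempt a real submission where Slurm is installed and the script exists.
if command -v sbatch > /dev/null 2>&1 && [ -f myjob.sl ]; then
    # These options correspond to #SBATCH lines in the script header.
    submit_message="$(sbatch --time 00:10:00 --mem 1GB myjob.sl)"
    job_id="$(extract_job_id "${submit_message}")"
    echo "Submitted job ${job_id}"
fi
```

sbatch also has a `--parsable` option that prints the job ID without the surrounding text (followed by the cluster name on multi-cluster setups), which avoids the parsing step.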
Managing and reviewing your Slurm jobs¶
Job Queue¶
The currently queued jobs can be checked using
squeue
You can filter to just your jobs by adding the -u flag followed by your username:
squeue -u usr9999
Alternatively, the --me flag does the same without needing your username:
squeue --me
You can find more details on the use of squeue in the Slurm documentation.
You can check all jobs you have submitted since midnight today using:
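For a quick overview of a long queue, the squeue output can be reduced to a count per job state. This is a sketch rather than a site-provided tool: the function name `count_states` is made up, and it relies on the squeue options `-h` (no header) and `-o "%T"` (print only the job state).

```shell
#!/bin/bash

# Hypothetical helper: read one job state per line on stdin and
# print "<STATE>: <count>" for each distinct state.
count_states() {
    sort | uniq -c | awk '{ print $2 ": " $1 }'
}

# Only query the scheduler where squeue is available.
if command -v squeue > /dev/null 2>&1; then
    # -h drops the header, -o "%T" prints only the state of each job.
    squeue --me -h -o "%T" | count_states
fi
```

Because `count_states` only reads standard input, it can be tried without Slurm, for example with `printf 'RUNNING\nPENDING\nRUNNING\n' | count_states`.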
sacct
Or since a specified date using:
sacct -S YYYY-MM-DD
Each job will show as multiple lines, one line for the parent job and then additional lines for each job step.
Tip
sacct -X: only show parent processes.
sacct --state=<state>: filter jobs by state, where <state> is, for example, PENDING, RUNNING, FAILED, CANCELLED or TIMEOUT.
You can find more details on the use of sacct in the Slurm documentation.
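The options above can be combined with machine-readable output to pick out jobs that need attention. The sketch below is an illustration under stated assumptions: `unsuccessful_job_ids` is a made-up helper, the sample records are invented, and the sacct call uses `-n` (no header), `-P` (pipe-delimited output) and `--format=JobID,State` to produce one `JobID|State` line per job.

```shell
#!/bin/bash

# Hypothetical helper: read "JobID|State" lines on stdin and print the
# IDs of jobs whose state starts with FAILED or TIMEOUT.
unsuccessful_job_ids() {
    awk -F'|' '$2 ~ /^(FAILED|TIMEOUT)/ { print $1 }'
}

# Made-up sample in the same pipe-delimited layout, for illustration only.
sample='1001|COMPLETED
1002|FAILED
1003|TIMEOUT'
echo "${sample}" | unsuccessful_job_ids

# Only query the accounting records where sacct is available.
if command -v sacct > /dev/null 2>&1; then
    # -X: parent jobs only; -n: no header; -P: pipe-delimited output.
    # Add -S YYYY-MM-DD to look further back than the default window.
    sacct -X -n -P --format=JobID,State | unsuccessful_job_ids
fi
```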
Cancelling¶
scancel <jobid> will cancel the job described by <jobid>.
You can obtain the job ID by using sacct or squeue.
Tip
scancel -u [username]: kill all jobs submitted by you.
scancel {[n1]..[n2]}: kill all jobs with an ID between [n1] and [n2].
You can find more details on the use of scancel in the Slurm documentation.
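The range form in the tip relies on Bash brace expansion: the shell turns `{n1..n2}` into a list of every integer in the range before scancel runs. The sketch below prints the expanded command with echo instead of cancelling anything. The job IDs are made up, and `job_id_range` is a hypothetical helper for the case where the bounds are held in variables, since brace expansion does not accept variables as bounds.

```shell
#!/bin/bash

# Brace expansion happens in the shell, so scancel receives one
# argument per job ID. echo shows the command without running it.
expanded="$(echo scancel {1748836..1748839})"
echo "${expanded}"

# Hypothetical helper: print every integer from $1 to $2, one per line,
# for use when the bounds are stored in variables.
job_id_range() {
    id="$1"
    while [ "${id}" -le "$2" ]; do
        echo "${id}"
        id=$((id + 1))
    done
}

first=1748836
last=1748839
# Word splitting of the unquoted substitution is intended here.
dry_run="$(echo scancel $(job_id_range "${first}" "${last}"))"
echo "${dry_run}"
```

Removing the `echo` in front of `scancel` would turn the dry run into a real cancellation, so it is worth checking the printed command first.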