
The Slurm job scheduler

Sophia uses the Slurm cluster management and job scheduling system. Computations on Sophia are executed through interactive Slurm sessions or via batch compute job scripts.

Job queues a.k.a. partitions

Access to the particular compute resources available on Sophia is governed by the choice of job queue, or partition in Slurm lingo. The partition can be specified either on the command line as an argument to the srun command, or in a batch job script that is submitted to the scheduler with the sbatch command.
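
For example, the partition can be passed directly to srun or set in the header of a batch job script; a minimal illustration, using the workq partition from the table below:

srun --partition workq --pty bash

#SBATCH --partition=workq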

Sophia compute nodes are organised in the following job queues/partitions:

Partitions open to all Sophia users:

Name      Hardware per node
workq     AMD EPYC 7351 (1st gen, 32 cores), 128 GB memory
romeq     AMD EPYC 7302 (2nd gen, 32 cores), 128 GB memory
fatq      AMD EPYC 7351, 256 GB memory
gpuq      1 Nvidia Quadro P4000 GPU per node
v100      1 Nvidia Tesla V100 GPU per node

Partitions with exclusive access for DTU Wind Energy staff:

Name      Hardware per node
windq     AMD EPYC 7351 (1st gen, 32 cores), 128 GB memory
windfatq  AMD EPYC 7351 (1st gen, 32 cores), 256 GB memory

Use the sinfo command to list information about the configured Slurm partitions, and squeue to list the compute jobs currently in the queue.
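
For example, to summarise the nodes in a given partition and to list one's own pending and running jobs (both commands accept further filtering options):

[<username>@sophia1 ~]$ sinfo --partition workq
[<username>@sophia1 ~]$ squeue --user <username>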

Interactive jobs

For code testing purposes, an interactive terminal session is convenient; it can be requested with the srun command. The following example illustrates the procedure:

srun --partition windq --time 06:00:00 --nodes 2 --ntasks-per-node 32 --pty bash

which requests an interactive job with 2 compute nodes from the windq partition, with all 32 physical cores available on each, for 6 hours. The session is granted when the resources become available, and it ends when the user exits the terminal or when the time limit (here 6 hours) is reached.
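
Once the session starts, the shell runs on one of the allocated compute nodes. As a minimal sketch, the allocation can be inspected from inside the session via the environment variables Slurm sets, for example:

echo $SLURM_JOB_NODELIST                        # compact list of the allocated nodes
scontrol show hostnames "$SLURM_JOB_NODELIST"   # expand to one hostname per line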

Batch jobs

Once the simulation workflow has been tested via interactive jobs, one may wish to run a number of jobs unsupervised. To dispatch jobs programmatically, a job script must be prepared:

Slurm job script example

[<username>@sophia1 ~]$ cat slurm.job

#!/bin/bash
#SBATCH --time=2:00:00           # wall-clock time limit (hh:mm:ss)
#SBATCH --partition=workq        # partition (job queue) to submit to
#SBATCH --nodes=1                # number of compute nodes
#SBATCH --ntasks-per-node=1      # tasks (processes) per node
echo "hello, Sophia!"

which can then be submitted with the sbatch command:

[<username>@sophia1 ~]$ sbatch slurm.job
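
sbatch replies with the ID assigned to the job. By default, Slurm writes the job's output to a file named slurm-<jobid>.out in the submission directory (this can be changed with the --output option), and the job's progress can be followed with squeue:

[<username>@sophia1 ~]$ squeue --user <username>
[<username>@sophia1 ~]$ cat slurm-<jobid>.out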