Temporary storage performance hierarchy
The temporary storage options are listed below.
Each compute node's local /tmp directory
Sophia compute nodes are ephemeral, meaning they fetch a minimal
CentOS installation at each reboot and run it from the node's memory.
This means that the underlying XFS file system lives in the compute node's
random access memory, with orders of magnitude higher bandwidth and lower
latency than disk drives.
For example, to eliminate I/O as the limiting factor in an application
performance benchmark, a Sophia user can run test code from the /tmp
directory on up to 32 MPI processes on a single Sophia node with 32
physical cores, as sketched below.
The RAM-based storage provides approximately 64 GB of capacity per node (minus what the operating system uses).
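A minimal sketch of such a single-node benchmark run follows; the binary name io_benchmark and the input file are placeholders, and the $TMPRAM job directory is described under Automatic Temporary Directory Management further down.

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=32
#SBATCH [your other job parameters]

# Stage the input into the RAM-backed /tmp area so all I/O stays in memory
cp $HOME/bench/input.dat $TMPRAM/

# One MPI process per physical core on a single node
mpirun -n 32 ./io_benchmark --workdir=$TMPRAM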
Each compute node's local /scratch directory
Each compute node is equipped with a 1TB spinning disk mounted at
/scratch
. This provides significantly more storage space than the
RAM-based /tmp
directory, while still offering good I/O performance
for local storage needs. This storage is node-local and is ideal for
larger temporary datasets that will not fit in RAM but do not require
shared access across nodes.
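As a hedged sketch, a job can check the free space on the node-local disk and then stage a dataset that is too large for RAM into the $TMPDISK job directory described below (file and program names are placeholders):

#!/bin/bash
#SBATCH --nodes=1
#SBATCH [your job parameters]

# Report how much space is free on the node-local disk
df -h /scratch

# Stage a dataset that would not fit in RAM, process it locally,
# and copy only the results back to permanent storage
cp $HOME/data/big_input.h5 $TMPDISK/
./process_data --input=$TMPDISK/big_input.h5 --output=$TMPDISK/out.h5
cp $TMPDISK/out.h5 $HOME/results/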
Burst buffer
Two servers with NVMe disks, Sophia's burst buffer nodes, are connected to
the Sophia compute nodes via the HPC cluster's Mellanox EDR InfiniBand interconnect,
and each server connects to the Ceph file system via bonded 10 Gbps links
(20 Gbps in total).
Sophia's burst buffer runs the BeeGFS file system, striping data written from
the compute nodes across BeeGFS storage targets.
The burst buffer is mounted at /work on Sophia's head node and on all compute
nodes, and provides approximately 50 TB of shared storage space.
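The sketch below illustrates the shared nature of /work for a hypothetical two-node job: every rank writes into the same $TMPSHARE directory (described in the next section), and the files are visible from all nodes while the job runs; program and directory names are placeholders.

#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=32
#SBATCH [your other job parameters]

# All 64 ranks on both nodes see the same BeeGFS-backed directory under /work
mkdir -p $TMPSHARE/checkpoints
mpirun -n 64 ./parallel_solver --checkpoint-dir=$TMPSHARE/checkpoints

# Keep what you need: $TMPSHARE is deleted when the job ends
cp -r $TMPSHARE/checkpoints $HOME/results/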
Automatic Temporary Directory Management
To simplify temporary storage usage and ensure proper cleanup, we have implemented automatic temporary directory management through predefined SLURM variables:
$TMPRAM - Points to /tmp/users/$SLURM_JOB_UID/$SLURM_JOB_ID
$TMPDISK - Points to /scratch/users/$SLURM_JOB_UID/$SLURM_JOB_ID
$TMPSHARE - Points to /work/users/$SLURM_JOB_UID/$SLURM_JOB_ID
These directories are created automatically when your job starts and are deleted when the job ends, whether it completes successfully or fails. This eliminates the need for manual cleanup and helps maintain system performance.
Usage Example
#!/bin/bash
#SBATCH [your job parameters]

# Use RAM-based storage ($TMPRAM) for very fast I/O
./fast_io_program --workdir=$TMPRAM

# Use local disk ($TMPDISK) for larger datasets
cp large_input.dat $TMPDISK/
./analysis_program --input=$TMPDISK/large_input.dat --output=$TMPDISK/results.dat
# Copy results to permanent storage before the job ends
mkdir -p $HOME/results
cp $TMPDISK/results.dat $HOME/results/

# Use shared storage ($TMPSHARE) for multi-node access
mkdir -p $TMPSHARE/simulation_output
mpirun -n 64 ./parallel_sim --output=$TMPSHARE/simulation_output/
cp -r $TMPSHARE/simulation_output $HOME/results/
Important: Any data you wish to keep must be copied to your home directory or permanent storage before your job ends, as the temporary directories will be automatically removed.
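One way to make the copy-back robust, assuming a results directory under $HOME and placeholder file names, is a shell trap that runs even if the program itself fails:

#!/bin/bash
#SBATCH [your job parameters]

# Copy results back whenever the batch script exits, successful or not
copy_back() {
    mkdir -p $HOME/results/$SLURM_JOB_ID
    cp -r $TMPDISK/* $HOME/results/$SLURM_JOB_ID/ 2>/dev/null
}
trap copy_back EXIT

./analysis_program --input=$HOME/data/input.dat --output=$TMPDISK/results.dat

Note that a trap like this only covers exits of the batch script itself; if the scheduler kills the job, for example at the wall-time limit, the trap may not get a chance to run, so budget copy-back time within the job.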