SLURM#
SLURM (Simple Linux Utility for Resource Management) is an open-source job scheduler used in most large supercomputers — including TACC — to manage who runs what, where, and when.
Think of SLURM as the traffic cop and air traffic controller for a supercomputer:
You ask SLURM to run your program.
It finds the right hardware (nodes/cores).
It decides when your job can run based on availability and fairness.
It starts your job, monitors it, and lets you know when it finishes.
What Does SLURM Do?#
Schedules jobs across a supercomputer cluster.
Allocates resources (CPU cores, memory, GPUs).
Queues jobs if resources aren’t currently available.
Monitors and tracks job progress.
Logs results and resource usage for accounting and reproducibility.
SLURM in DesignSafe Apps#
In DesignSafe Web Portal Apps, SLURM is completely hidden from the user — but it’s still doing all the work behind the scenes.
When you launch a job through the web interface:
You select the app (OpenSees, MATLAB, Jupyter, etc.).
You set parameters (e.g. number of cores, run time, input file).
DesignSafe:
Fills in a JSON job description.
Maps it to a SLURM template script.
Submits the job via the Tapis API to TACC.
Monitors the run and stages files back when complete.
Behind the scenes, your job runs on a TACC machine like Stampede3, with SLURM allocating the compute resources and managing the run.
Why Learn to Write Your Own SLURM Scripts?#
Once you understand how the DesignSafe apps stage files, load modules, and run on TACC, you’re ready to take the next step: Writing and submitting SLURM job scripts directly on Stampede3 (or another TACC system).
This unlocks full flexibility:
Launch parameter sweeps with shell loops or job arrays
Fine-tune MPI or OpenMP configuration
Chain multiple programs (preprocessing → simulation → postprocessing)
Coordinate complex workflows or custom software stacks
Writing your own SLURM scripts gives you low-level control of HPC resources — perfect for advanced studies, large batch workflows, or research automation.