# SLURM-Job Workflow
In this section we explain the workflow from two complementary perspectives:

- **Platform-level view**: what happens when you submit a job from DesignSafe
- **Scheduler-level view**: how SLURM actually queues, schedules, and runs the job
## Where Job Submission Happens
SLURM job submission commands (such as `sbatch`) are issued from a login or submission node, not from compute nodes. When using DesignSafe, this step is handled automatically for you. When working directly on a TACC system, you typically access a login node via SSH to prepare files and submit jobs. A dedicated section later in this book covers job submission environments and workflows in detail.
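For example, working directly on a TACC machine typically looks like the sketch below (a minimal illustration; the username and script name are placeholders, and Stampede3's login address is shown as an example):

```bash
# Connect to a login node (Stampede3's login address shown as an example)
ssh my_username@stampede3.tacc.utexas.edu

# On the login node: prepare input files, then hand the job to SLURM.
# Login nodes are shared resources -- submit jobs here, don't run computations.
sbatch job_script.sh
```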
## 1. Platform-level view: SLURM-Job Submission Workflow
**How are jobs run for DesignSafe on TACC? The big picture:** TACC (Texas Advanced Computing Center) provides the high-performance computing (HPC) resources that DesignSafe uses to run jobs. Specifically, jobs submitted from DesignSafe are executed on TACC's systems, typically Stampede3 or similar clusters.
**Step-by-step process:**
1. **Log into DesignSafe**
   - You access it through https://www.designsafe-ci.org.
   - You don't need to interact with TACC directly unless you want to.
2. **Use the Workspace**
   DesignSafe has an interactive workspace that includes:
   - Jupyter Notebooks
   - MATLAB
   - SimCenter tools
   - Other scientific apps such as OpenSees and ADCIRC
   These are all powered on the backend by TACC resources.
3. **Submit a Job**
   When you submit a job via DesignSafe:
   - The job is packaged with your script, your input data, and your run parameters (such as CPU cores, memory, and wall time).
   - DesignSafe translates this into a SLURM job (TACC uses SLURM as its job scheduler), along the lines of the header sketched below.
   - The job is then sent to a TACC queue (often the designsafe or community queue on Stampede3).
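   For illustration only, the parameters entered in the web form might map to SLURM directives roughly like this sketch (the exact directives DesignSafe generates may differ, and all values here are placeholders):

   ```bash
   #SBATCH -J designsafe_job   # job name (placeholder)
   #SBATCH -p skx              # partition/queue selected for the run
   #SBATCH -N 1                # number of nodes requested in the form
   #SBATCH -n 16               # total tasks (CPU cores) requested
   #SBATCH -t 02:00:00         # wall-time limit requested
   ```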
4. **Execution on TACC**
   - TACC runs the job exactly like any other SLURM-managed HPC job.
   - The job runs on actual compute nodes (on Stampede3, for example).
   - Output files are saved in your DesignSafe project or data workspace.
5. **Results**
   - Once the job completes, results are sent back to DesignSafe automatically.
   - You can view, download, or analyze the data directly in the platform.
## 2. Scheduler-level view: SLURM-Job Submission Workflow

How does SLURM manage your job?
1. **Job Submission**
   You submit a job with a job script that specifies:
   - Number of nodes and cores (e.g., 32 cores on 4 nodes: `--ntasks=32`, `--nodes=4`)
   - Maximum runtime (e.g., 4 hours: `--time=04:00:00`)
   - Memory requirements (e.g., `--mem=16G`)
   - Partition (queue) to submit to, which selects the node type (e.g., `--partition=skx`)
   - Allocation (project) to charge (`--account=<project>`)
   Submit with `sbatch job_script.sh`; a complete example script is sketched below.
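   Putting these pieces together, a complete job script might look like the following sketch. The application name and allocation are placeholders, and `ibrun` is TACC's MPI launcher (elsewhere `srun` or `mpirun` is typical); also note that some sites, including TACC, allocate whole nodes and may ignore or reject `--mem`:

   ```bash
   #!/bin/bash
   #SBATCH -J my_analysis           # job name (placeholder)
   #SBATCH -o slurm-%j.out          # output file; %j expands to the job ID
   #SBATCH --partition=skx          # partition (node type)
   #SBATCH --nodes=4                # 4 nodes
   #SBATCH --ntasks=32              # 32 tasks (cores) total, i.e., 8 per node
   #SBATCH --time=04:00:00          # maximum runtime: 4 hours
   #SBATCH --mem=16G                # memory per node (omit where whole nodes are allocated)
   #SBATCH --account=MY-ALLOCATION  # allocation/project to charge (placeholder)

   # Launch the application; ./my_app is a placeholder for your executable.
   ibrun ./my_app
   ```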
2. **Job Enters the Queue**
   - Your job enters the SLURM queue and is assigned a priority.
   - You can monitor the queue status using SLURM commands such as `squeue` and `sacct`, as shown below.
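   For example (the job ID 123456 is hypothetical):

   ```bash
   squeue -u $USER   # all of your pending and running jobs
   squeue -j 123456  # status of one specific job
   sacct  -j 123456  # accounting record; also available after the job ends
   ```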
3. **SLURM Schedules the Job**
   SLURM decides when and where to run your job based on:
   - Requested resources: number of nodes, memory, and runtime.
   - Queue priority: system policies may prioritize shorter or smaller jobs.
   - Current system load: jobs may wait until the required nodes become free.
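   You can ask SLURM about these factors directly. For instance (hypothetical job ID; `sprio` requires the multifactor priority plugin, which most large clusters use):

   ```bash
   squeue --start -j 123456  # SLURM's current estimate of the job's start time
   sprio -j 123456           # breakdown of the factors behind the job's priority
   ```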
4. **Job Execution**
   - Once sufficient resources are available, SLURM starts the job.
   - Your job runs on the assigned compute nodes for the allocated time.
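   While the job runs, you can inspect it with standard SLURM commands, for example (hypothetical job ID; `sstat` requires job accounting to be enabled):

   ```bash
   scontrol show job 123456  # full job record, including the assigned node list
   sstat -j 123456.batch --format=JobID,AveCPU,MaxRSS  # live usage of the batch step
   ```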
5. **Job Completion**
   When the job finishes:
   - Output files and logs (e.g., `slurm-<jobid>.out`) are generated.
   - You can check the results and performance.
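   A typical post-run check, assuming SLURM's default output-file name and a hypothetical job ID:

   ```bash
   tail slurm-123456.out  # end of the job's captured stdout/stderr
   sacct -j 123456 --format=JobID,State,Elapsed,MaxRSS,ExitCode  # final accounting
   ```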