# Accessing HPC
How to Use TACC’s HPC on DesignSafe
High Performance Computing (HPC) systems like Stampede3 at TACC are powerful clusters built for large-scale simulations and data analysis. Unlike on a personal computer, you don't run programs directly; instead, you interact with a layered environment designed to manage thousands of users fairly and efficiently.
This section explains the architecture of HPC systems, your access options (via JupyterHub, SSH, or Tapis), and how jobs are submitted and executed.
## HPC Architecture: Components and Roles
| Component | Purpose |
|---|---|
| Login Nodes | Where users connect via SSH or JupyterHub to prepare and submit jobs |
| Compute Nodes | Where your jobs actually run (scheduled by SLURM) |
| Storage Systems | Shared file systems where input files, results, and other data are stored |
When you connect to TACC via SSH or JupyterHub, you’re entering a login node — a shared environment for lightweight tasks like file editing, module loading, and job submission. Actual computation happens on compute nodes, which are allocated by the SLURM scheduler when you submit a job.
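As a rough illustration, a typical login-node session might look like the following. The module names are placeholders; run `module avail` to see what is actually installed on the system you are using.

```bash
# On a Stampede3 login node: lightweight setup only, no heavy computation
module avail                 # list the software modules available on this system
module load intel impi       # placeholder modules; load whatever your code needs
squeue -u $USER              # check the status of your queued and running jobs
sbatch job.slurm             # hand a batch job to the SLURM scheduler
```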
## Submitting Jobs: The Queueing Model
On HPC systems, you do not run scripts directly as you would on your desktop. Instead:

1. You write a SLURM job script specifying resources and commands.
2. You submit it with `sbatch job.slurm`.
3. SLURM places your job in a queue, schedules resources, and runs it.

The job runs independently, without real-time input, like launching a rocket.
Think of it like checking a bag at the airport: you hand it off, and the system handles the rest.
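As a concrete sketch, a minimal job script might look like the one below. The queue, allocation, module, and executable names are placeholders; substitute the values that apply to your project and the system you are using.

```bash
#!/bin/bash
#SBATCH -J my_simulation         # job name
#SBATCH -o my_simulation.%j.out  # stdout file (%j expands to the job ID)
#SBATCH -p skx                   # queue/partition (placeholder; check your system's queues)
#SBATCH -N 1                     # number of nodes
#SBATCH -n 48                    # total number of MPI tasks
#SBATCH -t 02:00:00              # wall-clock limit (hh:mm:ss)
#SBATCH -A MyAllocation          # allocation/project to charge (placeholder)

module load intel impi           # placeholder modules; adjust to your software stack
ibrun ./my_simulation input.tcl  # ibrun is TACC's MPI launcher
```

Submit it from a login node with `sbatch job.slurm`, note the job ID that SLURM prints, and monitor progress with `squeue -u $USER`.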
## Interactive Options

### idev: Temporary Shell on a Compute Node
To get a real shell on a compute node for compiling, testing, or debugging:
```bash
idev -N 1 -n 4 -p development -t 00:30:00
```
This launches an interactive SLURM job, granting temporary access to a compute node where you can:
- Test MPI/OpenMP configurations
- Debug scripts and workflows
- Run small-scale exploratory jobs
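Once the interactive session starts, you are dropped into a shell on the compute node itself. A quick sanity check might look like this; the module names and the test program are placeholders:

```bash
# Inside an idev session, the prompt is on a compute node, not the login node
hostname                       # confirm you are on a compute node
module load intel impi         # placeholder modules; load whatever your code needs
ibrun -n 4 ./my_test_program   # run a small MPI test on the cores you requested
exit                           # leave the session and release the node
```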
### JupyterHub: Web-Based Interactive Access
TACC’s JupyterHub (e.g., TAP) and DesignSafe’s JupyterLab HPC environments both provide browser-based interfaces to HPC, but with important distinctions.
**TACC JupyterHub (TAP)**

- Submits your Jupyter session as a job that runs on a container or virtual node
- Gives you a fixed allocation of resources (e.g., 8 cores, 20 GB RAM)
- Sessions may queue before launching due to resource limits
**DesignSafe JupyterLab HPC (CPU/GPU)**

- Offers dedicated access to Stampede3 compute nodes
- Originally intended for machine learning workflows, now widely used for:
  - OpenSees
  - Python simulations
  - Other research code needing HPC scale
- Comes in multiple flavors (CPU, GPU, Native), and may evolve over time
Despite being interactive, both systems are still subject to HPC constraints like queueing, resource limits, and node availability.
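One way to see this is to open a terminal inside your Jupyter session and look at the SLURM job it is running under. The sketch below assumes the session is backed by a standard SLURM job, which is the usual setup for these environments:

```bash
# Inside a terminal of an HPC-backed Jupyter session
echo $SLURM_JOB_ID   # the Jupyter session itself runs as a SLURM job
squeue -u $USER      # your session appears in the queue like any other job
nproc                # cores actually visible to the session
```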
## The HPC Environment via SSH
For full control and flexibility:

1. SSH to the login node
2. Prepare scripts, stage files, and manage your data
3. Submit jobs with SLURM manually
This approach is ideal for:

- Complex configurations
- Advanced Linux workflows
- Fine-grained resource requests
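A typical command-line round trip might look like the sketch below. The username, paths, and file names are illustrative; use your own TACC account and directories.

```bash
# From your local machine: stage input files on Stampede3 and submit a job
scp input.tcl job.slurm username@stampede3.tacc.utexas.edu:~/runs/case01/

ssh username@stampede3.tacc.utexas.edu
cd ~/runs/case01
sbatch job.slurm     # SLURM returns a job ID
squeue -u $USER      # watch the job move from PD (pending) to R (running)

# After the job finishes, pull results back from your local machine
scp username@stampede3.tacc.utexas.edu:~/runs/case01/results.out .
```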
Even in this mode, you can still leverage Tapis to automate file transfers, script generation, and job submission.
## Integrating with Tapis
Whether you use Jupyter, SSH, or the Web Portal, Tapis allows you to:

- Upload/download files
- Generate SLURM job scripts
- Submit jobs programmatically
- Retrieve output automatically
This is especially useful for parametric studies, automated campaigns, and reproducible workflows.
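As a rough sketch of what programmatic submission can look like, the example below calls the Tapis v3 REST API with curl. The tenant URL, app ID, and token workflow are assumptions made for illustration; consult the Tapis and DesignSafe documentation for the values and authentication steps that apply to your account.

```bash
# Hypothetical job submission through the Tapis v3 REST API
TENANT_URL="https://designsafe.tapis.io"   # assumed tenant base URL
TOKEN="..."                                # a valid Tapis access token (JWT)

curl -s -X POST "$TENANT_URL/v3/jobs/submit" \
  -H "X-Tapis-Token: $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
        "name": "opensees-run-01",
        "appId": "my-opensees-app",
        "appVersion": "1.0"
      }'
```

Because the request is just data, the same call can be looped over a parameter set, which is what makes Tapis well suited to parametric studies and automated campaigns.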
## Summary: HPC Access Model
| Method | Best For |
|---|---|
| Login Node (SSH) | Setup, editing, compiling, staging data |
| Compute Node (SLURM) | Actual execution of simulations |
| idev | Real-time debugging and test runs on compute nodes |
| JupyterHub (TACC) | Interactive sessions via queued containerized jobs |
| DesignSafe JupyterLab | Full-node interactive access for OpenSees, ML, and Python workflows |
| Tapis Automation | Orchestrating multi-job pipelines and managing files programmatically |
Understanding this model explains why:

- Jobs don't start instantly: they're queued
- You can't "just run" a script from your terminal: you must submit it
- Jupyter sessions are powerful but also resource-limited
- All methods, whether batch or interactive, follow the same SLURM-backed model