# Ranks in MPI Jobs
*MPI ranks, nodes, and scratch space: what you need to know first.*
Before using node-local /tmp storage safely in an MPI job, it is important to understand how parallel processes are organized and identified at runtime. The file-management patterns that follow rely on these concepts to avoid file collisions, race conditions, and data loss.
This section introduces the core MPI and Slurm concepts you will see referenced throughout the storage chapter.
## What is an MPI rank?
In an MPI job, your program is launched multiple times in parallel. Each running instance is called a rank.
- Every rank executes the same program.
- Each rank has a unique integer ID.
- Ranks typically cooperate by exchanging data via MPI.
The rank ID is commonly referred to as the **MPI rank** (0, 1, 2, ..., N−1). In Slurm-launched jobs (including Tapis jobs on TACC systems), this value is exposed as the environment variable `SLURM_PROCID`.
Example:

- `SLURM_PROCID=0` → rank 0 (often used for coordination or aggregation)
- `SLURM_PROCID=7` → the 8th MPI process
If two ranks write to the same filename, one silently overwrites the other's data unless their outputs are separated. This is why rank-aware paths are essential.
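A minimal sketch in bash of a rank-aware output path (the `/tmp/${USER}/${SLURM_JOB_ID}` layout is illustrative, not a site convention; the script is assumed to be launched once per rank, e.g. via `srun`):

```bash
#!/bin/bash
# Each rank derives its own output path from SLURM_PROCID, so two
# ranks on the same node never write to the same file in /tmp.
RANK="${SLURM_PROCID:-0}"
OUTDIR="/tmp/${USER}/${SLURM_JOB_ID:-nojob}"   # illustrative layout
mkdir -p "$OUTDIR"
echo "partial result from rank ${RANK}" > "${OUTDIR}/rank_${RANK}.dat"
```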
## Nodes vs. ranks: why this distinction matters
A node is a physical (or virtual) machine in the cluster.
Each node has:

- its own CPUs
- its own memory
- its own `/tmp` directory

Multiple MPI ranks typically run on the same node.
Slurm provides three important identifiers:

| Concept | Meaning | Slurm variable |
|---|---|---|
| Rank ID | Global MPI process index | `SLURM_PROCID` |
| Node ID | Which node in the allocation | `SLURM_NODEID` |
| Local rank | Rank index within a node | `SLURM_LOCALID` |
Example: rank 12 may have:

- `SLURM_NODEID=1` (the second node)
- `SLURM_LOCALID=4` (the 5th rank on that node)
This matters because all ranks on the same node share the same `/tmp` directory.
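A quick way to see these identifiers in action is to print them from every rank; a sketch, run once per rank under `srun`:

```bash
#!/bin/bash
# Each rank reports its global rank, node index, and node-local rank.
# Ranks that print the same SLURM_NODEID share one /tmp.
echo "host=$(hostname) global_rank=${SLURM_PROCID}" \
     "node=${SLURM_NODEID} local_rank=${SLURM_LOCALID}"
```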
## Common environment variables
These variables are automatically defined by Slurm (and therefore by Tapis on Slurm systems):
| Variable | Description |
|---|---|
| `SLURM_PROCID` | Global MPI rank ID |
| `SLURM_LOCALID` | Rank index within a node |
| `SLURM_NODEID` | Node index within the job allocation |
| `SLURM_JOB_ID` | Scheduler job identifier |
| `USER` | Unix username |
| `SLURM_SUBMIT_DIR` | Shared job execution directory |
Throughout this chapter, these variables are used to:

- create unique per-rank or per-node directories
- coordinate file copies safely
- control cleanup behavior
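Combined, they let each rank build a collision-free scratch path and remove it when the rank exits. A minimal sketch (the directory layout is an assumption, not a site convention):

```bash
#!/bin/bash
# Per-rank scratch directory built from the job ID and global rank;
# the trap removes it when this rank's script exits, for any reason.
SCRATCH="/tmp/${USER}/${SLURM_JOB_ID}/rank_${SLURM_PROCID}"
mkdir -p "$SCRATCH"
trap 'rm -rf "$SCRATCH"' EXIT
# ... write temporary files under "$SCRATCH" here ...
```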
## Why rank-aware file management is essential
Without rank-aware file paths:

- ranks overwrite each other's temporary files
- partial files are read before being fully written
- jobs fail intermittently and are difficult to debug

By explicitly separating files by rank and node, you gain:

- deterministic behavior
- reproducibility
- safe use of high-performance node-local storage
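One common pattern that addresses both collisions and partial reads is to let a single rank per node stage shared input into `/tmp` and publish a sentinel file once the copy is complete. A hedged sketch (`$SHARED_INPUT` is a hypothetical path on the shared filesystem; production code would also add a timeout):

```bash
#!/bin/bash
LOCAL_COPY="/tmp/${SLURM_JOB_ID}/input.dat"
DONE_FLAG="${LOCAL_COPY}.done"

if [ "${SLURM_LOCALID}" -eq 0 ]; then
    # Exactly one rank per node performs the copy ...
    mkdir -p "$(dirname "$LOCAL_COPY")"
    cp "$SHARED_INPUT" "$LOCAL_COPY"
    touch "$DONE_FLAG"   # ... then signals that the file is complete.
else
    # Other ranks on the node wait for the sentinel, so they never
    # read a partially written copy.
    while [ ! -f "$DONE_FLAG" ]; do sleep 1; done
fi
```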
## How this fits into the broader storage model
In summary:

- **Shared filesystems** (e.g., Work, Scratch):
  - single job directory
  - visible to all nodes
  - ideal for inputs and final outputs
- **Node-local `/tmp`:**
  - fast but ephemeral
  - requires explicit management
  - ideal for temporary, I/O-intensive data
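As a preview, here is a minimal copy-in/compute/copy-out sketch that combines the two tiers (`$WORKDIR` and `my_solver` are hypothetical placeholders for a shared directory and your application):

```bash
#!/bin/bash
LOCAL="/tmp/${SLURM_JOB_ID}/rank_${SLURM_PROCID}"
mkdir -p "$LOCAL"

cp "${WORKDIR}/input.dat" "${LOCAL}/"              # stage in from shared storage
my_solver "${LOCAL}/input.dat" "${LOCAL}/out.dat"  # compute with fast node-local I/O
cp "${LOCAL}/out.dat" "${WORKDIR}/out_rank_${SLURM_PROCID}.dat"  # stage out results
rm -rf "$LOCAL"                                    # /tmp is ephemeral: clean up
```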
The MPI-safe patterns that follow build directly on these concepts and show how to combine correctness and performance in large-scale parallel workflows.