Job Execution Details#
How Tapis v3 Apps Work: Internal Runtime Workflow
This section explains exactly what happens when you submit a Tapis app (runtime type ZIP or Singularity/Apptainer) to a Slurm-based execution system.
You do not need to know this to use Tapis, but understanding it is invaluable for:
debugging failed or slow jobs
developing custom Tapis apps
profiling performance and file-transfer overhead
reasoning about where time is actually spent
At a high level, Tapis does not run jobs itself. It automates what you would otherwise do manually:
SSH → mkdir → stage files → write batch script → sbatch → poll → archive
What follows is the full, end-to-end execution timeline, from submission to close-out.
Conceptual Overview (Big Picture)#
When you submit a Tapis job:
Service-side orchestration happens on the login node
SSH, directory creation, file staging, script generation, submission
Actual computation happens on compute nodes
Slurm executes your batch script
Your application runs here, not on the login node
Archiving and file delivery happen after completion
Two key artifacts define the control boundary:
Tapis-controlled
tapisjob.sh (scheduler-facing batch wrapper)
tapisjob.env (resolved parameters and environment)
User-controlled
tapisjob_app.sh (your app entrypoint)
Everything it loads, runs, installs, or launches
Detailed Execution Timeline#
1. From Job Request to Internal Job Record
When you submit a job to a Tapis App:
POST /v3/jobs
Content-Type: application/json
{
"appId": "opensees-mp-s3-latest",
"execSystemId": "stampede3.compute",
"name": "my-opensees-job",
"parameterSet": { ... },
"fileInputs": [ ... ]
}
Tapis:
Validates the request against the App:
Required inputs present?
Parameter types correct?
Values within allowed ranges?
Resolves the execution system (execSystemId) and checks:
Do the system's allowed runtimes include the App's runtime? (ZIP vs SINGULARITY)
Does the system support the App's jobType (BATCH vs FORK)?
Creates a job record in its database with:
Job UUID
App + system references
Initial status (PENDING)
A copy of the effective ArgString / command line it will eventually run.
At this point, nothing has touched the HPC cluster yet. That happens in the next stage.
2. Establishing SSH Context on the Execution System
Once the job is ready to be launched, the Jobs service:
Looks up system-level credentials associated with the Execution System:
Typically an SSH keypair managed by Tapis on behalf of the DesignSafe project.
A mapping from Tapis user identity → HPC account (e.g., a TACC username).
Opens an SSH session to the login node of the execution system:
ssh -i /tapis/keys/system-key \
    <hpc-user>@login.stampede3.tacc.utexas.edu
In practice, Tapis uses a pool of SSH connections and reuses them where possible, but conceptually it's a normal SSH login as the target HPC user.
Sets the working environment:
Default shell = /bin/bash (usually)
$HOME on the HPC system
Any system-level profile scripts (.bashrc, .bash_profile) that the cluster loads automatically.
From TACC's perspective, this is just another user logging in and running commands.
Where: login node
Purpose: orchestration, not computation
3. Directory Creation on the Execution System
Next, Tapis creates a structured job workspace on the execution system. The exact paths are defined in the Execution System JSON, but conceptually:
# Job root (execution directory)
JOB_ROOT=/work2/<proj>/<user>/tapis/jobs/${JOB_UUID}
# Tapis often keeps input/output subdirs as well:
INPUT_DIR=$JOB_ROOT/input
OUTPUT_DIR=$JOB_ROOT/output
LOG_DIR=$JOB_ROOT
mkdir -p "$JOB_ROOT" "$INPUT_DIR" "$OUTPUT_DIR"
These directories correspond to:
execSystemExecDir → $JOB_ROOT
execSystemInputDir → $INPUT_DIR
execSystemOutputDir → $OUTPUT_DIR
On TACC systems, all of these are on a shared parallel filesystem (/work2, /scratch, etc.), so every compute node can see them.
Where: login node
Purpose: isolate inputs, runtime, and outputs
4. Input Staging (SSH + Tapis Files subsystem)
Tapis now needs to copy the user's input files into the job's input directory. There are a few cases:
4.1. Inputs Already on the Execution System
If a file input has a path like:
"sourceUrl": "tapis://stampede3.compute/work2/05072/silvia/data/model.tcl"
Tapis maps this to a local path on the HPC filesystem and issues an SSH command:
cp /path/to/source /work2/.../job-12345/input/
for example:
cp /work2/05072/silvia/data/model.tcl \
/work2/05072/silvia/tapis/jobs/${JOB_UUID}/input/
For directories, it uses cp -r or rsync (implementation detail), but the idea is: the job's input dir collects everything in one place.
4.2. Inputs Coming from Another Tapis System (e.g., DesignSafe Storage)
If the file is stored on a different Tapis system (e.g., a DesignSafe data system) or a generic HTTP(S) URL (e.g. S3 URL), the Jobs service hands the work off to the Files/Streams subsystem, which:
Either copies data server-to-server (without shipping it through the client), or
Streams it from the external source into the execSystemInputDir path over SSH/SCP.
Conceptually, you can think of it as:
curl -o $INPUT_DIR/model.tcl https://...
# or
scp data-host:/path/to/model.tcl $INPUT_DIR
The actual mechanism is more integrated, but the result is straightforward: all the inputs end up sitting under $INPUT_DIR on a shared filesystem.
Common source of slowdowns: Large numbers of small files staged individually.
Best practices:
Zip/tar inputs
Reuse shared data already in work or scratch
Keep long-lived common files outside the job directory
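The bundling advice above can be sketched in a few lines of shell. Directory and file names here are hypothetical stand-ins for real model inputs:

```shell
# Sketch: bundle many small input files into one archive before submission,
# so Tapis stages a single transfer instead of thousands of tiny ones.
WORK=$(mktemp -d) && cd "$WORK"
mkdir -p model-inputs
for i in 1 2 3; do echo "record $i" > "model-inputs/part_$i.dat"; done

# One tarball = one staged file
tar -czf model-inputs.tgz model-inputs

# Inside the job, the app wrapper would unpack it once:
mkdir -p unpacked && tar -xzf model-inputs.tgz -C unpacked
ls unpacked/model-inputs
```

The same idea applies in reverse for outputs: one archive moves far faster than a directory tree of small files.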
5. Preparing the Runtime Environment (ZIP vs Singularity/Apptainer)
Now Tapis applies the App's runtime and containerImage settings.
5.1. ZIP Runtime
For runtime: "ZIP", containerImage is something like:
"containerImage": "/work2/05072/silvia/apps/opensees-mp-s3/opensees-mp-s3.zip"
Tapis copies and unpacks the archive:
cp /work2/05072/silvia/apps/opensees-mp-s3/opensees-mp-s3.zip \
/work2/05072/silvia/tapis/jobs/${JOB_UUID}/
cd /work2/05072/silvia/tapis/jobs/${JOB_UUID}
unzip opensees-mp-s3.zip
# or tar -xzf app.tgz, depending on the archive type
What's inside the ZIP is entirely up to the App author: shell scripts, Tcl files, Python scripts, small binaries, templates, etc.
There is no container at this point.
Everything will run directly in the HPC environment (with modules loaded).
Modules are loaded in the batch script or app wrapper
5.2. Singularity / Apptainer Runtime
For runtime: "SINGULARITY", containerImage might be:
"containerImage": "/work2/05072/silvia/containers/opensees-env.sif"
Tapis ensures that the .sif file is present (copying it if needed). It then plans to use Apptainer with appropriate bind mounts. The actual execution is deferred to the batch script, but Tapis determines:
The path to the .sif file
The directories that must be bound into the container:
execution directory: $JOB_ROOT (exec dir)
input directory: $INPUT_DIR
output directory: $OUTPUT_DIR
A typical command line in the Slurm script looks like:
apptainer run \
--bind $JOB_ROOT:/TapisExec \
--bind $INPUT_DIR:/TapisInput \
--bind $OUTPUT_DIR:/TapisOutput \
/work2/05072/silvia/containers/opensees-env.sif \
./run-opensees-mp.sh "${PARAMS[@]}"
The important points:
Tapis does not run Apptainer; Slurm does, via the batch script.
Tapis decides what to mount and which .sif to use, then encodes that in the launch script.
6. Generate Launch Artifacts (wrapper + environment)
Tapis generates:
tapisjob.sh – scheduler-facing Slurm batch script
tapisjob.env – resolved parameters, paths, and environment variables
These files are often archived for reproducibility and debugging.
What you gain: You can inspect exactly what Tapis submitted.
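Inspecting the generated artifacts is just reading files in the job directory. The sketch below uses a mock tapisjob.env; the real file is generated by Tapis and its exact variable names may differ:

```shell
# Mock tapisjob.env for illustration only; Tapis writes the real one
# with the fully resolved values for your job.
cat > tapisjob.env << 'EOF'
_tapisJobUUID=1234-abcd-5678
_tapisExecSystemId=stampede3.compute
APP_NP=112
EOF

# List what was resolved for this job, comments stripped
grep -v '^#' tapisjob.env | sort
```

Comparing this file across a working and a failing job is often the fastest way to spot a bad parameter.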
7. Building the Slurm Launch Script
At this stage, Tapis has:
A job directory ($JOB_ROOT)
Inputs staged ($INPUT_DIR)
Either:
A ZIP unpacked into $JOB_ROOT, or
A .sif image accessible from the cluster
Now it generates the batch script for jobType: "BATCH".
Tapis writes the batch script line-by-line over SSH:
cat > /work2/05072/silvia/tapis/jobs/${JOB_UUID}/tapisjob.sh << 'EOF'
#!/bin/bash
#SBATCH -J my-opensees-job
#SBATCH -o tapisjob.out
#SBATCH -e tapisjob.err
#SBATCH -N 2
#SBATCH --ntasks-per-node=56
#SBATCH -t 02:00:00
#SBATCH -p normal
set -o pipefail
# Note: no `set -e` here, so a failing application still reaches the
# exit-code bookkeeping at the bottom of this script.
# In the generated script these are concrete, fully resolved paths
JOB_ROOT=/work2/05072/silvia/tapis/jobs/${JOB_UUID}
INPUT_DIR=$JOB_ROOT/input
OUTPUT_DIR=$JOB_ROOT/output
cd $JOB_ROOT
# Load modules for the ZIP runtime, or for inside-container tooling if needed
module purge
module load hdf5
module load opensees
module list
echo "Starting job at $(date)" > $JOB_ROOT/tapisjob.log
# ---- Runtime-specific execution ----
# Example A: ZIP runtime
# (the ZIP unpacked a wrapper script, run-opensees-mp.sh)
./run-opensees-mp.sh \
  --input $INPUT_DIR/model.tcl \
  --output $OUTPUT_DIR \
  --np $SLURM_NTASKS \
  >> $JOB_ROOT/tapisjob.log 2>&1
# Example B: Singularity runtime
# apptainer run --bind ... opensees-env.sif ./run-opensees-mp.sh ...
# ------------------------------------
EXIT_CODE=$?
echo $EXIT_CODE > $JOB_ROOT/tapisjob.exitcode
echo "Job finished with exit code $EXIT_CODE at $(date)" >> $JOB_ROOT/tapisjob.log
EOF
chmod +x /work2/05072/silvia/tapis/jobs/${JOB_UUID}/tapisjob.sh
A few things to notice:
Tapis writes the script via a here-document (cat << 'EOF') over SSH.
The script itself encapsulates:
Scheduler directives (#SBATCH lines)
Module loading
Runtime-specific command (ZIP vs Singularity)
A convention for recording exit codes in tapisjob.exitcode.
Mental model: This is the same script you would write manually, just generated automatically.
8. Submitting the Job to Slurm (sbatch)
To launch the job, Tapis issues:
cd /work2/05072/silvia/tapis/jobs/${JOB_UUID}
sbatch tapisjob.sh
Slurm responds with something like:
Submitted batch job 9834756
Tapis parses the Slurm job ID (9834756) and stores it in the Tapis job record. From now on, the authoritative compute-side state lives in the scheduler.
The job is now in the SLURM queue, waiting for the requested resources.
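Extracting the job ID is a one-line parse of sbatch's stdout. The sketch below mocks the scheduler response, since it only illustrates the parsing; a real call would capture `sbatch tapisjob.sh` instead:

```shell
# Mocked scheduler response; in a real session this would be:
#   SBATCH_OUT=$(sbatch tapisjob.sh)
SBATCH_OUT="Submitted batch job 9834756"

# The job ID is the last whitespace-separated field
SLURM_ID=$(echo "$SBATCH_OUT" | awk '{print $NF}')
echo "$SLURM_ID"
```

This parsed ID is what Tapis stores and later passes to squeue and sacct.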
9. Scheduler Execution on Compute Nodes
When scheduled, Slurm runs the batch script on allocated compute nodes.
Critical clarification:
SSH, staging, and submission → login/service side
Heavy computation → compute nodes
This is where your application actually runs.
10. Handoff: Tapis Wrapper → Your App Wrapper
On compute nodes, tapisjob.sh:
sources tapisjob.env
changes into the execution directory
invokes your app entrypoint (commonly ./tapisjob_app.sh)
This boundary is crucial:
Tapis controls: tapisjob.sh, tapisjob.env
The app author controls: tapisjob_app.sh and everything it does (modules, MPI launch, Python envs, containers, etc.)
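The three handoff steps can be sketched as a tiny wrapper. File names match the document; the env contents and app body below are stand-ins, since Tapis generates the real versions:

```shell
JOB_DIR=$(mktemp -d)    # stand-in for the Tapis execution directory
cd "$JOB_DIR"

# Stand-in artifacts; Tapis generates the real ones
cat > tapisjob.env << 'EOF'
export _tapisJobName=demo
EOF
cat > tapisjob_app.sh << 'EOF'
#!/bin/bash
echo "app running as $_tapisJobName"
EOF
chmod +x tapisjob_app.sh

# What tapisjob.sh conceptually does on the compute node:
source ./tapisjob.env       # 1. load the resolved environment
cd "$JOB_DIR"               # 2. enter the execution directory
./tapisjob_app.sh           # 3. hand off to the user-controlled entrypoint
echo $? > tapisjob.exitcode # record the app's exit code
```

Everything after step 3 is your code; everything before it is Tapis's.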
11. Monitoring Job State
While the job is pending or running, Tapis periodically reconnects via SSH and queries Slurm:
squeue -j 9834756 -h -o "%T"
# or for more detail:
sacct -j 9834756 --format=JobIDRaw,State,ExitCode
States are mapped into Tapis job states (QUEUED, RUNNING, FINISHED, FAILED):
Slurm PENDING → Tapis QUEUED
Slurm RUNNING → Tapis RUNNING
Slurm COMPLETED → Tapis FINISHED (assuming exit code 0)
Slurm FAILED / CANCELLED → Tapis FAILED (with details)
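The mapping above can be sketched as a small shell function (the real mapping lives inside the Tapis Jobs service and covers more states than shown here):

```shell
# Sketch: map a Slurm state string to the corresponding Tapis job state
map_slurm_state() {
  case "$1" in
    PENDING)          echo "QUEUED" ;;
    RUNNING)          echo "RUNNING" ;;
    COMPLETED)        echo "FINISHED" ;;
    FAILED|CANCELLED) echo "FAILED" ;;
    *)                echo "UNKNOWN" ;;
  esac
}

map_slurm_state PENDING    # QUEUED
map_slurm_state COMPLETED  # FINISHED
```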
It may also inspect:
tapisjob.out
tapisjob.err
running Apptainer processes (if applicable)
ls -l tapisjob.out tapisjob.err tapisjob.log
tail -n 40 tapisjob.log
or check for the presence of tapisjob.exitcode.
12. Completion, Exit Codes, and Error Handling
Once Slurm reports a terminal state (COMPLETED, FAILED, CANCELLED, etc.), Tapis:
Reads tapisjob.exitcode if it exists:
cat /work2/05072/silvia/tapis/jobs/${JOB_UUID}/tapisjob.exitcode
If tapisjob.exitcode is missing, it falls back to the Slurm exit code from sacct.
It enumerates files in the output directory:
ls -l /work2/05072/silvia/tapis/jobs/${JOB_UUID}/output/
Harvests output metadata. It collects:
File names and sizes
Timestamps
Any additional metadata the system can provide
Tapis then updates the job record:
status = FINISHED if exit code == 0
status = FAILED if exit code != 0 or Slurm reported a failure
If something goes wrong in earlier stages (input staging, script generation, sbatch failure), Tapis marks the job as FAILED with an appropriate reason and error message in its metadata; the failure is still surfaced via the same job record.
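The exit-code resolution logic can be sketched as a small function. Note that sacct reports ExitCode as "exit:signal" (e.g. "1:0"); `resolve_exit_code` is a hypothetical name for illustration:

```shell
# Sketch: prefer tapisjob.exitcode; otherwise fall back to the
# scheduler-reported code from sacct.
resolve_exit_code() {
  local job_dir=$1 sacct_code=$2
  if [ -f "$job_dir/tapisjob.exitcode" ]; then
    cat "$job_dir/tapisjob.exitcode"
  else
    echo "${sacct_code%%:*}"   # keep only the exit part of "exit:signal"
  fi
}

demo=$(mktemp -d)
resolve_exit_code "$demo" "1:0"   # no file yet: falls back to 1
echo 0 > "$demo/tapisjob.exitcode"
resolve_exit_code "$demo" "1:0"   # file present: reports 0
```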
13a. Archiving Outputs (file-transfer phase #2)
Tapis transfers outputs to the configured archive system/path.
Common source of slowdowns:
Very large output directories
Thousands of small files
Best practices:
Bundle outputs (tar/zip)
Filter aggressively
Write intermediate data to work/scratch, collect later
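The bundle-and-filter advice can be sketched as the final step of an app wrapper. Paths and file names are stand-ins:

```shell
# Sketch: at the end of the run, archive only final artifacts so the
# archiving phase moves one file instead of thousands.
OUT=$(mktemp -d)    # stand-in for $OUTPUT_DIR
for i in $(seq 1 50); do echo "tmp $i" > "$OUT/checkpoint_$i.dat"; done
echo "final result" > "$OUT/results.txt"

# Keep only the final artifact in the archive, drop the checkpoints
tar -czf "$OUT/archive.tgz" -C "$OUT" results.txt
rm -f "$OUT"/checkpoint_*.dat
tar -tzf "$OUT/archive.tgz"
```

A filter like this can cut archiving time from minutes to seconds for checkpoint-heavy runs.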
13b. Output Retrieval (Userβs Perspective)
From the user side, once the job is FINISHED:
Users can call GET /v3/jobs/{jobUuid} to see:
Job metadata
Slurm/job status
Output file listing
Each output file is available via the Files API, which again uses SSH/SCP under the hood to stream the file contents.
Conceptually, Files calls do something like:
# For a download:
cat /work2/05072/silvia/tapis/jobs/${JOB_UUID}/output/results.h5
wrapped in HTTP responses.
14. Final Metadata + Job Closeout
Tapis records:
Exit code
Slurm status
Runtime duration
Output file list
Available system logs
Accessible via:
GET /v3/jobs/{jobUuid}
No further SSH occurs unless files are requested.
15. Optional: File Retrieval via Files Service
When a user requests output content:
GET /v3/files/content?path=/work2/.../job-12345/output/results.h5
Tapis either:
Streams directly from the filesystem, or
Uses SCP/SSH internally (system-dependent)
Summary: What's Really Going Over SSH#
If you compress all of that down to a "shell view," a single Tapis job triggers SSH commands that look roughly like this (in order):
# 1. Directories
mkdir -p $JOB_ROOT $INPUT_DIR $OUTPUT_DIR
# 2. Input staging
cp /some/existing/path $INPUT_DIR/
# or scp/curl from remote source
# 3. Runtime prep
cp /path/to/app.zip $JOB_ROOT/
cd $JOB_ROOT && unzip app.zip
# or ensure /path/to/image.sif exists
# 4. Slurm script
cat > $JOB_ROOT/tapisjob.sh << 'EOF'
#!/bin/bash
#SBATCH ...
...
EOF
chmod +x $JOB_ROOT/tapisjob.sh
# 5. Submit
cd $JOB_ROOT
sbatch tapisjob.sh
# 6. Polling
squeue -j <slurmID> -h -o "%T"
# eventually, sacct for final state + exit code
# 7. Completion
ls -l $OUTPUT_DIR
cat tapisjob.exitcode
Everything else (the REST API, the App schema, the Job JSON) sits on top of this fairly standard SSH + scheduler automation.
Condensed Timeline (Reference)#
SSH → mkdir job directories
SSH + Files → stage inputs
SSH → unpack ZIP or locate .sif
SSH → write Slurm script
SSH → sbatch
Slurm → execute on compute nodes
SSH → monitor via squeue
SSH → collect output metadata
Files Service → deliver outputs
Visual Swim-Lane Diagram: Who Does What, Where, and When#
Mental Model (Read Top → Bottom)
Think of a Tapis job as moving vertically in time, while responsibility is split horizontally across four lanes:
┌──────────────────────┬──────────────────────┬──────────────────────┬──────────────────────┐
│ You (User)           │ Tapis Services       │ Login Node           │ Compute Nodes        │
├──────────────────────┼──────────────────────┼──────────────────────┼──────────────────────┤
│ Submit job           │ Validate job         │                      │                      │
│ (CLI / API / UI)     │ Resolve inputs       │                      │                      │
│                      │ Open SSH             │ SSH session active   │                      │
│                      │ Create directories   │ mkdir /work/...      │                      │
│                      │ Stage inputs         │ cp / file transfer   │                      │
│                      │ Unpack ZIP / locate  │ unzip / locate .sif  │                      │
│                      │ Write Slurm script   │ tapisjob.sh written  │                      │
│                      │ sbatch               │ sbatch invoked       │                      │
│                      │ Monitor job          │ squeue polling       │                      │
│                      │                      │                      │ Job starts           │
│                      │                      │                      │ tapisjob.sh runs     │
│                      │                      │                      │ → tapisjob_app.sh    │
│                      │                      │                      │ Your computation     │
│                      │                      │                      │ runs here            │
│                      │ Detect completion    │                      │ Job ends             │
│                      │ Harvest metadata     │ ls output dir        │                      │
│                      │ Archive outputs      │ file transfer out    │                      │
│ Retrieve outputs     │ Files service        │                      │                      │
└──────────────────────┴──────────────────────┴──────────────────────┴──────────────────────┘
Key Clarification Reinforced#
Login node: orchestration only (SSH, file staging, script generation, submission)
Compute nodes: all real computation (MPI, OpenSees, Python, containers, etc.)
Tapis: never executes your science; it automates and observes
Where Performance Tuning Matters Most#
Not all stages are equal. Below are the highest-impact tuning points, ranked roughly by how often they dominate wall-clock time in real workflows.
🔴 Stages 6–7: Slurm Script Generation & Resource Mapping#
Why it matters
Incorrect resource requests waste queue time
Over-requesting nodes = long queue waits
Under-requesting memory = runtime failure
What to tune
Nodes vs tasks vs threads
Memory per node
Walltime realism
Rule of thumb
Profile first, then scale. One well-profiled job beats 100 blindly submitted ones.
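Once a representative run has been profiled (for example, with sacct's Elapsed field), the walltime request can be derived from the measured runtime plus a safety margin. The numbers below are illustrative:

```shell
# Measured runtime of a profiled run, in seconds (illustrative value;
# a real number could come from sacct's Elapsed for a past job)
MEASURED=4980

# Request the measured time plus a 25% safety margin, as HH:MM:SS
REQUEST=$(( MEASURED + MEASURED / 4 ))
printf -v WALLTIME '%02d:%02d:%02d' \
  $(( REQUEST / 3600 )) $(( (REQUEST % 3600) / 60 )) $(( REQUEST % 60 ))
echo "$WALLTIME"    # usable as: #SBATCH -t $WALLTIME
```

A margin like this keeps jobs from being killed at the limit without inflating queue wait the way a blanket 48-hour request does.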
🔴 Stage 10: Your App Wrapper (tapisjob_app.sh)#
This is the most important boundary.
Everything below this line is your responsibility:
tapisjob.sh       ← Tapis-controlled
───────────────────
tapisjob_app.sh   ← YOU
Common tuning opportunities
Avoid repeated module loads
Pre-install Python packages (donβt pip install every run)
Use node-local scratch for temporary files
Launch MPI correctly (srun vs mpirun)
Control logging verbosity
Performance reality
90% of runtime inefficiencies live here.
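The tuning points above can be sketched as a hypothetical tapisjob_app.sh skeleton. Module names and launcher choices are placeholders, and the HPC-only parts are guarded so the sketch runs anywhere:

```shell
#!/bin/bash
# Hypothetical tapisjob_app.sh sketch; app and module names are placeholders.
set -u

# Node-local scratch for temporaries (keeps the shared filesystem quiet)
SCRATCH="${TMPDIR:-/tmp}/myapp.$$"
mkdir -p "$SCRATCH"

# Load modules once, up front, never inside a loop
if command -v module > /dev/null 2>&1; then
  module purge
  module load hdf5 opensees
fi

# Choose the MPI launcher: srun under Slurm, mpirun otherwise
if [ -n "${SLURM_JOB_ID:-}" ]; then
  LAUNCHER="srun"
else
  LAUNCHER="mpirun -np ${NP:-1}"
fi
echo "launcher=$LAUNCHER scratch=$SCRATCH" > app-setup.log

# A real run would follow, e.g.: $LAUNCHER OpenSeesMP model.tcl
rm -rf "$SCRATCH"
```

Pre-installed environments belong in this same file: activate an existing Python env rather than running pip install on every job.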
🔴 Stage 13a: Output Archiving (File-Transfer Phase #2)#
Why it matters
Archiving happens after your job finishes
Users often forget this counts toward perceived job time
Thousands of tiny files are disastrous
Best Practices
✅ Write raw outputs to scratch/work
✅ Collect + bundle at the end
✅ Archive only final artifacts
✅ Avoid archiving intermediate checkpoints unless needed
🟡 Stage 11: Monitoring Overhead (Usually Minor)#
Polling via squeue is lightweight
Rarely a performance issue unless job states flap rapidly
Performance Tuning Cheat Sheet#
| Stage            | Risk         | What to Optimize            |
|------------------|--------------|-----------------------------|
| Input staging    | 🔴 High      | File count, reuse, bundling |
| Slurm requests   | 🔴 High      | Nodes, memory, walltime     |
| App wrapper      | 🔴 Very High | MPI launch, I/O, env setup  |
| Output archiving | 🔴 High      | Bundle outputs              |
| Monitoring       | 🟡 Low       | Rarely critical             |
One-Sentence Mental Model (Worth Remembering)#
If a Tapis job is slow, it's usually not Slurm; it's file movement or app-level decisions.