Job Execution Details#
How Tapis v3 Apps Work: Internal Runtime Workflow
This section explains exactly what happens when you submit a Tapis app (runtime type ZIP or Singularity/Apptainer) to a Slurm-based execution system.
You do not need to know this to use Tapis, but understanding it is invaluable for:
debugging failed or slow jobs
developing custom Tapis apps
profiling performance and file-transfer overhead
reasoning about where time is actually spent
At a high level, Tapis does not run jobs itself. It automates what you would otherwise do manually:
SSH → mkdir → stage files → write batch script → sbatch → poll → archive
What follows is the full, end-to-end execution timeline, from submission to close-out.
Conceptual Overview (Big Picture)#
When you submit a Tapis job:
Service-side orchestration happens on the login node
SSH, directory creation, file staging, script generation, submission
Actual computation happens on compute nodes
Slurm executes your batch script
Your application runs here, not on the login node
Archiving and file delivery happen after completion
Two key artifacts define the control boundary:
Tapis-controlled
tapisjob.sh (scheduler-facing batch wrapper)
tapisjob.env (resolved parameters and environment)
User-controlled
tapisjob_app.sh (your app entrypoint)
Everything it loads, runs, installs, or launches
Detailed Execution Timeline#
1. From Job Request to Internal Job Record
When you submit a job to a Tapis App:
POST /v3/jobs
Content-Type: application/json
{
"appId": "opensees-mp-s3-latest",
"execSystemId": "stampede3.compute",
"name": "my-opensees-job",
"parameterSet": { ... },
"fileInputs": [ ... ]
}
Tapis:
Validates the request against the App:
Required inputs present?
Parameter types correct?
Values within allowed ranges?
Resolves the execution system (execSystemId) and checks:
Do the system's allowed runtimes include the App's runtime? (ZIP vs SINGULARITY)
Does the system support the App's jobType (BATCH vs FORK)?
Creates a job record in its database with:
Job UUID
App + system references
Initial status (PENDING)
A copy of the effective ArgString / command line it will eventually run.
At this point, nothing has touched the HPC cluster yet. That happens in the next stage.
2. Establishing SSH Context on the Execution System
Once the job is ready to be launched, the Jobs service:
Looks up system-level credentials associated with the Execution System:
Typically an SSH keypair managed by Tapis on behalf of the DesignSafe project.
A mapping from Tapis user identity → HPC account (e.g., a TACC username).
Opens an SSH session to the login node of the execution system:
ssh -i /tapis/keys/system-key \
    <hpc-user>@login.stampede3.tacc.utexas.edu
In practice, Tapis uses a pool of SSH connections and reuses them where possible, but conceptually it's a normal SSH login as the target HPC user.
Sets the working environment:
Default shell = /bin/bash (usually)
$HOME on the HPC system
Any system-level profile scripts (.bashrc, .bash_profile) that the cluster loads automatically.
From TACC's perspective, this is just another user logging in and running commands.
Where: login node
Purpose: orchestration, not computation
3. Directory Creation on the Execution System
Next, Tapis creates a structured job workspace on the execution system. The exact paths are defined in the Execution System JSON, but conceptually:
# Job root (execution directory)
JOB_ROOT=/work2/<proj>/<user>/tapis/jobs/${JOB_UUID}
# Tapis often keeps input/output subdirs as well:
INPUT_DIR=$JOB_ROOT/input
OUTPUT_DIR=$JOB_ROOT/output
LOG_DIR=$JOB_ROOT
mkdir -p "$JOB_ROOT" "$INPUT_DIR" "$OUTPUT_DIR"
These directories correspond to:
execSystemExecDir → $JOB_ROOT
execSystemInputDir → $INPUT_DIR
execSystemOutputDir → $OUTPUT_DIR
On TACC systems, all of these are on a shared parallel filesystem (/work2, /scratch, etc.), so every compute node can see them.
Where: login node
Purpose: isolate inputs, runtime, and outputs
4. Input Staging (SSH + Tapis Files subsystem)
Tapis now needs to copy the user's input files into the job's input directory. There are a few cases:
4.1. Inputs Already on the Execution System
If a file input has a path like:
"sourceUrl": "tapis://stampede3.compute/work2/05072/silvia/data/model.tcl"
Tapis maps this to a local path on the HPC filesystem and issues an SSH command:
cp /path/to/source /work2/.../job-12345/input/
for example:
cp /work2/05072/silvia/data/model.tcl \
/work2/05072/silvia/tapis/jobs/${JOB_UUID}/input/
For directories, it uses cp -r or rsync (implementation detail), but the idea is: the job's input dir collects everything in one place.
4.2. Inputs Coming from Another Tapis System (e.g., DesignSafe Storage)
If the file is stored on a different Tapis system (e.g., a DesignSafe data system) or a generic HTTP(S) URL (e.g. S3 URL), the Jobs service hands the work off to the Files/Streams subsystem, which:
Either copies data server-to-server (without shipping it through the client), or
Streams it from the external source into the execSystemInputDir path over SSH/SCP.
Conceptually, you can think of it as:
curl -o $INPUT_DIR/model.tcl https://...
# or
scp data-host:/path/to/model.tcl $INPUT_DIR
The actual mechanism is more integrated, but the result is straightforward: all the inputs end up sitting under $INPUT_DIR on a shared filesystem.
Common source of slowdowns: Large numbers of small files staged individually.
Best practices:
Zip/tar inputs
Reuse shared data already in work or scratch
Keep long-lived common files outside the job directory
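The bundling advice above can be sketched in a few lines of shell. Directory and file names here are hypothetical stand-ins for real model inputs:

```shell
# Sketch: bundle many small input files into one archive before submission,
# so Tapis stages a single transfer instead of thousands of tiny ones.
WORK=$(mktemp -d) && cd "$WORK"
mkdir -p model-inputs
for i in 1 2 3; do echo "record $i" > "model-inputs/part_$i.dat"; done

# One tarball = one staged file
tar -czf model-inputs.tgz model-inputs

# Inside the job, the app wrapper would unpack it once:
mkdir -p unpacked && tar -xzf model-inputs.tgz -C unpacked
ls unpacked/model-inputs
```

The same idea applies in reverse for outputs: one archive moves far faster than a directory tree of small files.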
5. Preparing the Runtime Environment (ZIP vs Singularity/Apptainer)
Now Tapis applies the App's runtime and containerImage settings.
5.1. ZIP Runtime
For runtime: "ZIP", containerImage is something like:
"containerImage": "/work2/05072/silvia/apps/opensees-mp-s3/opensees-mp-s3.zip"
Tapis copies and unpacks the archive:
cp /work2/05072/silvia/apps/opensees-mp-s3/opensees-mp-s3.zip \
/work2/05072/silvia/tapis/jobs/${JOB_UUID}/
cd /work2/05072/silvia/tapis/jobs/${JOB_UUID}
unzip opensees-mp-s3.zip
# or tar -xzf app.tgz, depending on the archive type
What's inside the ZIP is entirely up to the App author: shell scripts, Tcl files, Python scripts, small binaries, templates, etc.
There is no container at this point.
Everything will run directly in the HPC environment (with modules loaded).
Modules are loaded in the batch script or app wrapper
5.2. Singularity / Apptainer Runtime
For runtime: "SINGULARITY", containerImage might be:
"containerImage": "/work2/05072/silvia/containers/opensees-env.sif"
Tapis ensures that the .sif file is present (copying it if needed). It then plans to use Apptainer with appropriate bind mounts. The actual execution is deferred to the batch script, but Tapis determines:
The path to the .sif file
The directories that must be bound into the container:
execution directory: $JOB_ROOT (exec dir)
input directory: $INPUT_DIR
output directory: $OUTPUT_DIR
A typical command line in the Slurm script looks like:
apptainer run \
--bind $JOB_ROOT:/TapisExec \
--bind $INPUT_DIR:/TapisInput \
--bind $OUTPUT_DIR:/TapisOutput \
/work2/05072/silvia/containers/opensees-env.sif \
./run-opensees-mp.sh "${PARAMS[@]}"
The important points:
Tapis does not run Apptainer; Slurm does, via the batch script.
Tapis decides what to mount and which .sif to use, then encodes that in the launch script.
6. Generate Launch Artifacts (wrapper + environment)
Tapis generates:
tapisjob.sh – scheduler-facing Slurm batch script
tapisjob.env – resolved parameters, paths, and environment variables
These files are often archived for reproducibility and debugging.
What you gain: You can inspect exactly what Tapis submitted.
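Inspecting the generated artifacts is just reading files in the job directory. The sketch below uses a mock tapisjob.env; the real file is generated by Tapis and its exact variable names may differ:

```shell
# Mock tapisjob.env for illustration only; Tapis writes the real one
# with the fully resolved values for your job.
cat > tapisjob.env << 'EOF'
_tapisJobUUID=1234-abcd-5678
_tapisExecSystemId=stampede3.compute
APP_NP=112
EOF

# List what was resolved for this job, comments stripped
grep -v '^#' tapisjob.env | sort
```

Comparing this file across a working and a failing job is often the fastest way to spot a bad parameter.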
7. Building the Slurm Launch Script
At this stage, Tapis has:
A job directory ($JOB_ROOT)
Inputs staged ($INPUT_DIR)
Either:
A ZIP unpacked into $JOB_ROOT, or
A .sif image accessible from the cluster
Now it generates the batch script for jobType: "BATCH".
Tapis writes the batch script line-by-line over SSH:
cat > /work2/05072/silvia/tapis/jobs/${JOB_UUID}/tapisjob.sh << 'EOF'
#!/bin/bash
#SBATCH -J my-opensees-job
#SBATCH -o tapisjob.out
#SBATCH -e tapisjob.err
#SBATCH -N 2
#SBATCH --ntasks-per-node=56
#SBATCH -t 02:00:00
#SBATCH -p normal
set -o pipefail
# Note: no `set -e` here, so a failing application still reaches the
# exit-code bookkeeping at the bottom of this script.
# In the generated script these are concrete, fully resolved paths
JOB_ROOT=/work2/05072/silvia/tapis/jobs/${JOB_UUID}
INPUT_DIR=$JOB_ROOT/input
OUTPUT_DIR=$JOB_ROOT/output
cd $JOB_ROOT
# Load modules for the ZIP runtime, or for inside-container tooling if needed
module purge
module load hdf5
module load opensees
module list
echo "Starting job at $(date)" > $JOB_ROOT/tapisjob.log
# ---- Runtime-specific execution ----
# Example A: ZIP runtime
# (the ZIP unpacked a wrapper script, run-opensees-mp.sh)
./run-opensees-mp.sh \
  --input $INPUT_DIR/model.tcl \
  --output $OUTPUT_DIR \
  --np $SLURM_NTASKS \
  >> $JOB_ROOT/tapisjob.log 2>&1
# Example B: Singularity runtime
# apptainer run --bind ... opensees-env.sif ./run-opensees-mp.sh ...
# ------------------------------------
EXIT_CODE=$?
echo $EXIT_CODE > $JOB_ROOT/tapisjob.exitcode
echo "Job finished with exit code $EXIT_CODE at $(date)" >> $JOB_ROOT/tapisjob.log
EOF
chmod +x /work2/05072/silvia/tapis/jobs/${JOB_UUID}/tapisjob.sh
A few things to notice:
Tapis writes the script via a here-document (cat << 'EOF') over SSH.
The script itself encapsulates:
Scheduler directives (#SBATCH lines)
Module loading
Runtime-specific command (ZIP vs Singularity)
A convention for recording exit codes in tapisjob.exitcode.
Mental model: This is the same script you would write manually, just generated automatically.
8. Submitting the Job to Slurm (sbatch)
To launch the job, Tapis issues:
cd /work2/05072/silvia/tapis/jobs/${JOB_UUID}
sbatch tapisjob.sh
Slurm responds with something like:
Submitted batch job 9834756
Tapis parses the Slurm job ID (9834756) and stores it in the Tapis job record. From now on, the authoritative compute-side state lives in the scheduler.
The job is now in the SLURM queue, waiting for the requested resources.
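Extracting the job ID is a one-line parse of sbatch's stdout. The sketch below mocks the scheduler response, since it only illustrates the parsing; a real call would capture `sbatch tapisjob.sh` instead:

```shell
# Mocked scheduler response; in a real session this would be:
#   SBATCH_OUT=$(sbatch tapisjob.sh)
SBATCH_OUT="Submitted batch job 9834756"

# The job ID is the last whitespace-separated field
SLURM_ID=$(echo "$SBATCH_OUT" | awk '{print $NF}')
echo "$SLURM_ID"
```

This parsed ID is what Tapis stores and later passes to squeue and sacct.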
9. Scheduler Execution on Compute Nodes
When scheduled, Slurm runs the batch script on allocated compute nodes.
Critical clarification:
SSH, staging, and submission → login/service side
Heavy computation → compute nodes
This is where your application actually runs.
10. Handoff: Tapis Wrapper → Your App Wrapper
On compute nodes, tapisjob.sh:
sources tapisjob.env
changes into the execution directory
invokes your app entrypoint (commonly ./tapisjob_app.sh)
This boundary is crucial:
Tapis controls: tapisjob.sh, tapisjob.env
The app author controls: tapisjob_app.sh and everything it does (modules, MPI launch, Python envs, containers, etc.)
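The three handoff steps can be sketched as a tiny wrapper. File names match the document; the env contents and app body below are stand-ins, since Tapis generates the real versions:

```shell
JOB_DIR=$(mktemp -d)    # stand-in for the Tapis execution directory
cd "$JOB_DIR"

# Stand-in artifacts; Tapis generates the real ones
cat > tapisjob.env << 'EOF'
export _tapisJobName=demo
EOF
cat > tapisjob_app.sh << 'EOF'
#!/bin/bash
echo "app running as $_tapisJobName"
EOF
chmod +x tapisjob_app.sh

# What tapisjob.sh conceptually does on the compute node:
source ./tapisjob.env       # 1. load the resolved environment
cd "$JOB_DIR"               # 2. enter the execution directory
./tapisjob_app.sh           # 3. hand off to the user-controlled entrypoint
echo $? > tapisjob.exitcode # record the app's exit code
```

Everything after step 3 is your code; everything before it is Tapis's.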
11. Monitoring Job State
While the job is pending or running, Tapis periodically reconnects via SSH and queries Slurm:
squeue -j 9834756 -h -o "%T"
# or for more detail:
sacct -j 9834756 --format=JobIDRaw,State,ExitCode
States are mapped into Tapis job states (QUEUED, RUNNING, FINISHED, FAILED):
Slurm PENDING → Tapis QUEUED
Slurm RUNNING → Tapis RUNNING
Slurm COMPLETED → Tapis FINISHED (assuming exit code 0)
Slurm FAILED / CANCELLED → Tapis FAILED (with details)
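The mapping above can be sketched as a small shell function (the real mapping lives inside the Tapis Jobs service and covers more states than shown here):

```shell
# Sketch: map a Slurm state string to the corresponding Tapis job state
map_slurm_state() {
  case "$1" in
    PENDING)          echo "QUEUED" ;;
    RUNNING)          echo "RUNNING" ;;
    COMPLETED)        echo "FINISHED" ;;
    FAILED|CANCELLED) echo "FAILED" ;;
    *)                echo "UNKNOWN" ;;
  esac
}

map_slurm_state PENDING    # QUEUED
map_slurm_state COMPLETED  # FINISHED
```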
It may also inspect:
tapisjob.out
tapisjob.err
running Apptainer processes (if applicable)
ls -l tapisjob.out tapisjob.err tapisjob.log
tail -n 40 tapisjob.log
or check for the presence of tapisjob.exitcode.
12. Completion, Exit Codes, and Error Handling
Once Slurm reports a terminal state (COMPLETED, FAILED, CANCELLED, etc.), Tapis:
Reads tapisjob.exitcode if it exists:
cat /work2/05072/silvia/tapis/jobs/${JOB_UUID}/tapisjob.exitcode
If tapisjob.exitcode is missing, it falls back to the Slurm exit code from sacct.
It enumerates files in the output directory:
ls -l /work2/05072/silvia/tapis/jobs/${JOB_UUID}/output/
Harvests output metadata. It collects:
File names and sizes
Timestamps
Any additional metadata the system can provide
Tapis then updates the job record:
status = FINISHED if exit code == 0
status = FAILED if exit code != 0 or Slurm reported a failure
If something goes wrong in earlier stages (input staging, script generation, sbatch failure), Tapis marks the job as FAILED with an appropriate reason and error message in its metadata; the failure is still surfaced via the same job record.
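The exit-code resolution logic can be sketched as a small function. Note that sacct reports ExitCode as "exit:signal" (e.g. "1:0"); `resolve_exit_code` is a hypothetical name for illustration:

```shell
# Sketch: prefer tapisjob.exitcode; otherwise fall back to the
# scheduler-reported code from sacct.
resolve_exit_code() {
  local job_dir=$1 sacct_code=$2
  if [ -f "$job_dir/tapisjob.exitcode" ]; then
    cat "$job_dir/tapisjob.exitcode"
  else
    echo "${sacct_code%%:*}"   # keep only the exit part of "exit:signal"
  fi
}

demo=$(mktemp -d)
resolve_exit_code "$demo" "1:0"   # no file yet: falls back to 1
echo 0 > "$demo/tapisjob.exitcode"
resolve_exit_code "$demo" "1:0"   # file present: reports 0
```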
13a. Archiving Outputs (file-transfer phase #2)
Tapis transfers outputs to the configured archive system/path.
Common source of slowdowns:
Very large output directories
Thousands of small files
Best practices:
Bundle outputs (tar/zip)
Filter aggressively
Write intermediate data to work/scratch, collect later
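The bundle-and-filter advice can be sketched as the final step of an app wrapper. Paths and file names are stand-ins:

```shell
# Sketch: at the end of the run, archive only final artifacts so the
# archiving phase moves one file instead of thousands.
OUT=$(mktemp -d)    # stand-in for $OUTPUT_DIR
for i in $(seq 1 50); do echo "tmp $i" > "$OUT/checkpoint_$i.dat"; done
echo "final result" > "$OUT/results.txt"

# Keep only the final artifact in the archive, drop the checkpoints
tar -czf "$OUT/archive.tgz" -C "$OUT" results.txt
rm -f "$OUT"/checkpoint_*.dat
tar -tzf "$OUT/archive.tgz"
```

A filter like this can cut archiving time from minutes to seconds for checkpoint-heavy runs.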
13b. Output Retrieval (Userβs Perspective)
From the user side, once the job is FINISHED:
Users can call GET /v3/jobs/{jobUuid} to see:
Job metadata
Slurm/job status
Output file listing
Each output file is available via the Files API, which again uses SSH/SCP under the hood to stream the file contents.
Conceptually, Files calls do something like:
# For a download:
cat /work2/05072/silvia/tapis/jobs/${JOB_UUID}/output/results.h5
wrapped in HTTP responses.
14. Final Metadata + Job Closeout
Tapis records:
Exit code
Slurm status
Runtime duration
Output file list
Available system logs
Accessible via:
GET /v3/jobs/{jobUuid}
No further SSH occurs unless files are requested.
15. Optional: File Retrieval via Files Service
When a user requests output content:
GET /v3/files/content?path=/work2/.../job-12345/output/results.h5
Tapis either:
Streams directly from the filesystem, or
Uses SCP/SSH internally (system-dependent)
Summary: What's Really Going Over SSH#
If you compress all of that down to a "shell view," a single Tapis job triggers SSH commands that look roughly like this (in order):
# 1. Directories
mkdir -p $JOB_ROOT $INPUT_DIR $OUTPUT_DIR
# 2. Input staging
cp /some/existing/path $INPUT_DIR/
# or scp/curl from remote source
# 3. Runtime prep
cp /path/to/app.zip $JOB_ROOT/
cd $JOB_ROOT && unzip app.zip
# or ensure /path/to/image.sif exists
# 4. Slurm script
cat > $JOB_ROOT/tapisjob.sh << 'EOF'
#!/bin/bash
#SBATCH ...
...
EOF
chmod +x $JOB_ROOT/tapisjob.sh
# 5. Submit
cd $JOB_ROOT
sbatch tapisjob.sh
# 6. Polling
squeue -j <slurmID> -h -o "%T"
# eventually, sacct for final state + exit code
# 7. Completion
ls -l $OUTPUT_DIR
cat tapisjob.exitcode
Everything else (the REST API, the App schema, the Job JSON) sits on top of this fairly standard SSH + scheduler automation.
Condensed Timeline (Reference)#
SSH → mkdir job directories
SSH + Files → stage inputs
SSH → unpack ZIP or locate .sif
SSH → write Slurm script
SSH → sbatch
Slurm → execute on compute nodes
SSH → monitor via squeue
SSH → collect output metadata
Files Service → deliver outputs
Visual Swim-Lane Diagram: Who Does What, Where, and When#
Mental Model (Read Top → Bottom)
Think of a Tapis job as moving vertically in time, while responsibility is split horizontally across four lanes:
┌──────────────────────┬──────────────────────┬──────────────────────┬──────────────────────┐
│ You (User)           │ Tapis Services       │ Login Node           │ Compute Nodes        │
├──────────────────────┼──────────────────────┼──────────────────────┼──────────────────────┤
│ Submit job           │ Validate job         │                      │                      │
│ (CLI / API / UI)     │ Resolve inputs       │                      │                      │
│                      │ Open SSH             │ SSH session active   │                      │
│                      │ Create directories   │ mkdir /work/...      │                      │
│                      │ Stage inputs         │ cp / file transfer   │                      │
│                      │ Unpack ZIP / locate  │ unzip / locate .sif  │                      │
│                      │ Write Slurm script   │ tapisjob.sh written  │                      │
│                      │ sbatch               │ sbatch invoked       │                      │
│                      │ Monitor job          │ squeue polling       │                      │
│                      │                      │                      │ Job starts           │
│                      │                      │                      │ tapisjob.sh runs     │
│                      │                      │                      │ → tapisjob_app.sh    │
│                      │                      │                      │ Your computation     │
│                      │                      │                      │ runs here            │
│                      │ Detect completion    │                      │ Job ends             │
│                      │ Harvest metadata     │ ls output dir        │                      │
│                      │ Archive outputs      │ file transfer out    │                      │
│ Retrieve outputs     │ Files service        │                      │                      │
└──────────────────────┴──────────────────────┴──────────────────────┴──────────────────────┘
Key Clarification Reinforced#
Login node: orchestration only (SSH, file staging, script generation, submission)
Compute nodes: all real computation (MPI, OpenSees, Python, containers, etc.)
Tapis: never executes your science; it automates and observes
Where Performance Tuning Matters Most#
Not all stages are equal. Below are the highest-impact tuning points, ranked roughly by how often they dominate wall-clock time in real workflows.
🔴 Stages 6–7: Slurm Script Generation & Resource Mapping#
Why it matters
Incorrect resource requests waste queue time
Over-requesting nodes = long queue waits
Under-requesting memory = runtime failure
What to tune
Nodes vs tasks vs threads
Memory per node
Walltime realism
Rule of thumb
Profile first, then scale. One well-profiled job beats 100 blindly submitted ones.
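Once a representative run has been profiled (for example, with sacct's Elapsed field), the walltime request can be derived from the measured runtime plus a safety margin. The numbers below are illustrative:

```shell
# Measured runtime of a profiled run, in seconds (illustrative value;
# a real number could come from sacct's Elapsed for a past job)
MEASURED=4980

# Request the measured time plus a 25% safety margin, as HH:MM:SS
REQUEST=$(( MEASURED + MEASURED / 4 ))
printf -v WALLTIME '%02d:%02d:%02d' \
  $(( REQUEST / 3600 )) $(( (REQUEST % 3600) / 60 )) $(( REQUEST % 60 ))
echo "$WALLTIME"    # usable as: #SBATCH -t $WALLTIME
```

A margin like this keeps jobs from being killed at the limit without inflating queue wait the way a blanket 48-hour request does.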
🔴 Stage 10: Your App Wrapper (tapisjob_app.sh)#
This is the most important boundary.
Everything below this line is your responsibility:
tapisjob.sh       ← Tapis-controlled
───────────────────
tapisjob_app.sh   ← YOU
Common tuning opportunities
Avoid repeated module loads
Pre-install Python packages (donβt pip install every run)
Use node-local scratch for temporary files
Launch MPI correctly (srun vs mpirun)
Control logging verbosity
Performance reality
90% of runtime inefficiencies live here.
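The tuning points above can be sketched as a hypothetical tapisjob_app.sh skeleton. Module names and launcher choices are placeholders, and the HPC-only parts are guarded so the sketch runs anywhere:

```shell
#!/bin/bash
# Hypothetical tapisjob_app.sh sketch; app and module names are placeholders.
set -u

# Node-local scratch for temporaries (keeps the shared filesystem quiet)
SCRATCH="${TMPDIR:-/tmp}/myapp.$$"
mkdir -p "$SCRATCH"

# Load modules once, up front, never inside a loop
if command -v module > /dev/null 2>&1; then
  module purge
  module load hdf5 opensees
fi

# Choose the MPI launcher: srun under Slurm, mpirun otherwise
if [ -n "${SLURM_JOB_ID:-}" ]; then
  LAUNCHER="srun"
else
  LAUNCHER="mpirun -np ${NP:-1}"
fi
echo "launcher=$LAUNCHER scratch=$SCRATCH" > app-setup.log

# A real run would follow, e.g.: $LAUNCHER OpenSeesMP model.tcl
rm -rf "$SCRATCH"
```

Pre-installed environments belong in this same file: activate an existing Python env rather than running pip install on every job.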
🔴 Stage 13a: Output Archiving (File-Transfer Phase #2)#
Why it matters
Archiving happens after your job finishes
Users often forget this counts toward perceived job time
Thousands of tiny files are disastrous
Best Practices
✅ Write raw outputs to scratch/work
✅ Collect + bundle at the end
✅ Archive only final artifacts
✅ Avoid archiving intermediate checkpoints unless needed
🟡 Stage 11: Monitoring Overhead (Usually Minor)#
Polling via squeue is lightweight
Rarely a performance issue unless job states flap rapidly
Performance Tuning Cheat Sheet#
| Stage            | Risk         | What to Optimize            |
|------------------|--------------|-----------------------------|
| Input staging    | 🔴 High      | File count, reuse, bundling |
| Slurm requests   | 🔴 High      | Nodes, memory, walltime     |
| App wrapper      | 🔴 Very High | MPI launch, I/O, env setup  |
| Output archiving | 🔴 High      | Bundle outputs              |
| Monitoring       | 🟡 Low       | Rarely critical             |
One-Sentence Mental Model (Worth Remembering)#
If a Tapis job is slow, it's usually not Slurm; it's file movement or app-level decisions.