Run a Tapis App#
Two Views of the Same Process (User Workflow + Tapis Runtime)
A Tapis job submission has one lifecycle, but it can be described from two perspectives:
A. What the app-user does (the “front” process)#
User-facing workflow: what you choose and what you provide (Portal / CLI / Tapipy)
Install Tapipy
Run this once to install the SDK:
pip install tapipy
Tapipy may already be installed in JupyterHub.
Connect to Tapis
Create the client and log in:
from tapipy.tapis import Tapis
# Replace with your credentials
t = Tapis(
    base_url="https://tacc.tapis.io",
    username="your-username",
    password="your-password",
    account_type="tacc"
)
t.get_tokens() # Log in to Tapis
Tip: You only need to call get_tokens() once per session.
1) Choose an app (and version)
You specify an appId and version (e.g., opensees-mp-s3 or ...-latest).
This determines:
the input schema (what files/params are allowed)
the runtime style (ZIP vs container)
the wrapper entrypoint (what actually runs on the cluster)
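Not sure what is available? You can query the Apps service first. A minimal sketch (the appId shown is illustrative):
# List apps you can run, then inspect one (appId is illustrative)
apps = t.apps.getApps()
for app in apps:
    print(app.id, app.version)

app = t.apps.getAppLatestVersion(appId="opensees-mp-s3")
print(app.version, app.runtime)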
2) Provide inputs and parameters
Supply input files/directories and any runtime parameters defined by the app schema.
You provide:
file inputs (files/dirs that must be staged to the execution system); see the sketch after this list
parameters (simple values like flags, paths, numeric settings)
optional environment variables (modules, pip installs, custom toggles)
archive system ID (e.g., “tacc-archive”)
archive directory where outputs should be stored (archiveSystemDir)
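As a sketch, file inputs and parameters are structured values in the submission request. The field names (sourceUrl, targetPath, appArgs, envVariables) follow the Tapis v3 job-submission schema; the specific names and paths below are illustrative:
# Illustrative structures; the app's schema defines which inputs/args are allowed
file_inputs = [
    {
        "name": "inputDirectory",                            # input name defined by the app
        "sourceUrl": "tapis://mysystem/work/myuser/model/",  # where Tapis stages from
        "targetPath": "inputDirectory"                       # path inside the job directory
    }
]
parameter_set = {
    "appArgs": [{"name": "mainProgram", "arg": "OpenSeesMP"}],
    "envVariables": [{"key": "MY_FLAG", "value": "1"}]
}
# Later: t.jobs.submitJob(..., fileInputs=file_inputs, parameterSet=parameter_set)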
3) Define execution settings (job attributes)
You request compute resources:
nodes, tasks/cores, walltime, queue/partition
optional scheduler extras (reservation, constraints), as sketched below
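These map to top-level job attributes in the submission request. A sketch with illustrative values (queue names and counts are system-specific):
# Resource requests as job attributes (illustrative values)
resource_attrs = dict(
    nodeCount=2,
    coresPerNode=48,
    maxMinutes=120,                  # walltime, in minutes
    execSystemLogicalQueue="skx"     # logical queue defined on the execution system
)
# Scheduler extras go in parameterSet.schedulerOptions:
scheduler_extras = {
    "schedulerOptions": [{"name": "reservation", "arg": "--reservation=myres"}]
}
# Later: t.jobs.submitJob(..., **resource_attrs, parameterSet=scheduler_extras)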
4) Submit the job
Tapis submits the job to the execution system and tracks status.
Example Submission using Tapipy
job = t.jobs.submitJob(
    name="my-first-job",
    appId="hello-world",
    appVersion="1.0",
    parameterSet={},    # populate per the app's schema if needed
    fileInputs=[],      # or provide input files here
    archiveSystemId="tacc-archive",
    archiveSystemDir="myuser/outputs/hello-job",
    archiveOnAppError=True
)
print("Job submitted!")
print("Job UUID:", job.uuid)
print("Status:", job.status)
5) Monitor execution
You track status from:
Portal, CLI, or API/Tapipy
logs produced by Slurm (stdout/stderr) and by your wrapper script (summary logs)
You can check on your job:
job = t.jobs.getJob(jobUuid=job.uuid)
print("Current Status:", job.status)
Or just the status field directly:
status = t.jobs.getJobStatus(jobUuid=job.uuid)
print(status.status)
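For unattended runs, a simple polling loop works. This sketch reuses the job object from the submission above and assumes the terminal statuses listed in the next section:
import time

# Poll until the job reaches a terminal state
terminal = {"FINISHED", "FAILED", "CANCELLED"}
while True:
    status = t.jobs.getJobStatus(jobUuid=job.uuid).status
    print("Status:", status)
    if status in terminal:
        break
    time.sleep(30)  # be gentle; status rarely changes second-to-second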
Job Status Values (for Filtering)
Common values you can use for the status field:
PENDING
QUEUED
STAGING_INPUTS
RUNNING
FINISHED
FAILED
CANCELLED
PAUSED
BLOCKED
You can filter jobs by status like:
jobs = t.jobs.listJobs(status='FINISHED')
Or via search:
search_query = json.dumps({"status": "FAILED"})
jobs = t.jobs.listJobs(search=search_query)
6) Retrieve outputs
When complete, outputs are available in the archive location and via the Files service for browsing/downloading/reuse.
Outputs are archived to the configured archive system/path.
You can browse, download, and reuse results in later workflows.
List available files
files = t.jobs.getJobOutputList(jobUuid=job.uuid, outputPath="/")
for f in files:
    print(f.name, f.size)
Download a file:
output = t.jobs.getJobOutputDownload(
    jobUuid=job.uuid,
    outputPath="stdout.txt"
)
with open("stdout.txt", "wb") as f:
    f.write(output)
The file paths (like “stdout.txt”) depend on how your app writes output.
Full Example Script
Submit → poll → list outputs
from tapipy.tapis import Tapis
import json
t = Tapis(
    base_url="https://tacc.tapis.io",
    username="your-username",
    password="your-password",
    account_type="tacc"
)
t.get_tokens()
job = t.jobs.submitJob(
    name="my-first-job",
    appId="hello-world",
    appVersion="1.0",
    # parameterSet / fileInputs structure varies by app definition
    parameterSet={},
    fileInputs=[],
    archiveSystemId="tacc-archive",
    archiveSystemDir="myuser/outputs/hello-job",
    archiveOnAppError=True
)
print("Job UUID:", job.uuid)
print("Status:", job.status)
# Poll status
job2 = t.jobs.getJob(jobUuid=job.uuid)
print("Current Status:", job2.status)
# Filter jobs (example)
search_query = json.dumps({"status": "FAILED"})
failed_jobs = t.jobs.listJobs(search=search_query)
print("Failed jobs returned:", len(failed_jobs))
B. What Tapis does (the “internal” runtime process)#
Runtime workflow: what Tapis automates on the execution system (SSH + filesystem + scheduler + archiving)
The internal runtime workflow (stage → submit → run → archive)
This is the same lifecycle, described by the system actions that occur after you click “Run Job” (or submit via the API). The exact details vary by execution system and runtime type, but the pattern is consistent:
validate → stage → unpack/prepare → submit → monitor → archive
1) Job Definition: Validation + job record creation
A Tapis App is defined by:
app.json → inputs, parameters, environment, runtime type
tapisjob_app.sh → wrapper script executed on the HPC system
optional supporting files (profiles, modules, docs)
When you submit a job, Tapis:
Validates your request against app.json
required inputs present
parameter types correct
enums/schema constraints satisfied
strictFileInputs enforced (if enabled)
Creates a job UUID and stores the resolved configuration (effective values)
Only after validation does the job move into staging.
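For orientation, here is a heavily abridged sketch of what an app definition might contain, expressed as a Python dict. Field names follow the Tapis v3 Apps schema; all values are illustrative, not a real app:
# Abridged, illustrative app definition (real apps define much more)
app_def = {
    "id": "hello-world",
    "version": "1.0",
    "runtime": "ZIP",            # or SINGULARITY for container apps
    "containerImage": "tapis://mysystem/apps/hello-world/app.zip",
    "jobType": "BATCH",
    "jobAttributes": {
        "execSystemId": "mysystem",
        "fileInputs": [{"name": "inputDirectory", "inputMode": "REQUIRED"}],
        "parameterSet": {
            "appArgs": [{"name": "mainProgram", "inputMode": "REQUIRED"}]
        }
    }
}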
2) Staging inputs (file-transfer phase #1)
Tapis prepares the execution environment on the HPC system:
Creates a job working directory (location depends on the Execution System definition)
Stages your input directory/files into the job’s working directory
Stages the runtime asset (ZIP bundle or container image reference)
Applies permissions and writes internal metadata for tracking
No execution occurs in staging — this phase is file preparation.
3) Runtime preparation (ZIP unpack / container plan)
A. ZIP runtime
Tapis copies the ZIP into the job directory
Extracts it in place
Makes tapisjob_app.sh executable
In a ZIP runtime, the extracted bundle is effectively your “app container” — just implemented as a portable archive.
B. Container runtime (Singularity/Apptainer)
Tapis ensures the image is available on the system
Plans bind mounts (exec/input/output paths)
Encodes the container command into the scheduler script
Tapis itself does not “run the container”; the scheduler-run script does.
4) Scheduler submission
Tapis constructs a scheduler batch script (e.g., for Slurm) using:
queue/partition
node/task/core counts
time limits
scheduler options (reservations, constraints)
execution-system profile settings
It injects your resource requests and the runtime command into that script, submits it (e.g., via sbatch), and stores the scheduler job ID so it can poll state.
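Conceptually, the generated script looks something like the sketch below (illustrative Slurm directives, not the exact script Tapis writes):
# Illustrative shape of the generated batch script (not Tapis's actual output)
batch_script = """#!/bin/bash
#SBATCH -J my-first-job
#SBATCH -p skx            # queue/partition from the job request
#SBATCH -N 2              # nodeCount
#SBATCH -n 96             # total tasks
#SBATCH -t 02:00:00       # walltime from maxMinutes
./tapisjob_app.sh         # wrapper entrypoint (ZIP runtime)
"""
print(batch_script)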
5) Wrapper script execution (your code runs here)
This is where the app logic lives.
The app’s tapisjob_app.sh typically:
Initializes timers and logs
Loads modules (from defaults and/or user-provided lists)
Configures Python (optionally installs packages)
Chooses a launcher
MPI → ibrun / srun
serial → direct execution
Runs the main executable
OpenSeesMP, OpenSeesSP, OpenSees, python3, etc.
Writes summary logs and exits with a code that Tapis can capture
Tapis does not interfere with what happens inside the wrapper — it only observes job state and outputs.
The app’s wrapper script (e.g., tapisjob_app.sh) controls:
module loads
python environment / pip installs
serial vs MPI launchers (ibrun, srun, etc.)
what gets written to output/logs
cleanup
6) Archiving outputs (file-transfer phase #2)
After the app’s wrapper exits:
Tapis creates the archive directory on the archive system/path
Copies outputs (excluding anything filtered by archiveFilter)
Includes Slurm logs (stdout/stderr) and wrapper logs
If archiveOnAppError=true, it still archives even when the job fails
This archiving phase can be slow if the output contains many small files.
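You can steer what gets copied with the archiveFilter block of parameterSet. A sketch (the patterns are illustrative):
# Keep results, skip bulky scratch files (patterns are illustrative)
parameter_set = {
    "archiveFilter": {
        "includes": ["results/*", "*.out"],
        "excludes": ["scratch/*"],
        "includeLaunchFiles": True   # also keep the generated batch script and logs
    }
}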
7) Completion and user visibility
Once archived, you can:
browse outputs in the portal or via the Files API
download logs and results
compare runs across job UUIDs
rerun with modified parameters
share results with collaborators (where supported)
This completes the Tapis lifecycle: stage → prepare → run → archive
The key idea is that Tapis is an orchestrator: it stages files, generates a scheduler script, submits to Slurm, monitors, then archives outputs.
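Because outputs land on the archive system, you can also reach them through the Files service directly. A sketch reusing the illustrative archive system/path from the earlier examples:
# Browse and fetch archived outputs via the Files service
listing = t.files.listFiles(systemId="tacc-archive", path="myuser/outputs/hello-job")
for item in listing:
    print(item.name, item.size)

data = t.files.getContents(systemId="tacc-archive", path="myuser/outputs/hello-job/stdout.txt")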
The lifecycle at a glance (swimlane)#
USER (Portal / CLI / Tapipy) TAPIS (Jobs Service + Files) HPC (Stampede3 / Slurm)
─────────────────────────────────── ───────────────────────────────────────── ─────────────────────────
1) Pick app + version ───────────────▶ Validate request (app schema)
2) Provide inputs/params ─────────────▶ Create job record (UUID, config)
3) Request resources ─────────────────▶ Stage inputs + runtime (file transfer)
Build batch script
sbatch batch_script ─────────────────────▶ Queue (PENDING)
Poll scheduler status ◀────────────────── Run (RUNNING)
4) Monitor status ◀─────────────────── Map scheduler states to Tapis states
5) Get results ◀────────────────────── Archive outputs (file transfer)
Provide outputs via Files API
On shared systems like Stampede3, jobs may queue before running due to demand — this delay is the trade-off for accessing powerful resources.
Appendix: Tapis job execution (SSH + Slurm timeline)#
Tapis does not run jobs internally. The Jobs service automates what you would otherwise do manually on an HPC system:
SSH into the execution system (as the effective HPC user)
Create job directories
Stage inputs and runtime assets
Write a scheduler batch script
Submit and monitor the scheduler job
Archive outputs and expose them via the Files service
Condensed behind-the-scenes timeline
SSH → mkdir job directories
SSH + Files → stage inputs
SSH → copy/unpack ZIP (or locate container image)
SSH → write scheduler script
SSH → submit (sbatch)
SSH → poll (squeue / sacct)
SSH → collect output metadata
Files Service → deliver outputs
Where to look when debugging#
If stuck in STAGING_INPUTS → input transfers, too many files, remote source delays
If stuck in QUEUED/PENDING → scheduler wait time (partition, allocation, walltime)
If failing in RUNNING → wrapper logic, module loads, env vars, executable errors
If slow after FINISHED → archiving overhead (again: too many files)
Important performance note: many “slow jobs” are not slow because compute is slow — they’re slow because file transfer is slow. The input staging and output archiving phases can dominate runtime when there are many small files. When possible: reduce file counts, reuse common datasets from Work/Scratch, or bundle inputs/outputs as a ZIP/TAR that you unpack/pack inside your wrapper.
Practical debugging: “Where is my time going?” Use the lifecycle to localize bottlenecks
When users say “Tapis is slow,” it usually means one of these stages:
Slow before RUNNING → input staging or queue wait
Slow after FINISHED → archiving (lots of files or large directories)
Slow during RUNNING → your executable/runtime environment
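To see which stage is eating time, the job history records a timestamp for each state change. A minimal sketch (exact event fields can vary by version, hence the getattr fallback):
# Print state transitions with timestamps to localize the bottleneck
history = t.jobs.getJobHistory(jobUuid=job.uuid)
for event in history:
    print(event.created, event.event, getattr(event, "description", ""))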
File-transfer advice (high impact):
Minimize file count (thousands of small files is worse than one big file)
Keep common datasets (e.g., ground motions) in Work/Scratch, and reuse them
Bundle inputs/outputs as ZIP/TAR and extract/pack inside the wrapper
Consider writing intermediate results to Work/Scratch and collecting only what you need at the end
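The bundling idea needs nothing beyond the standard library. A sketch (paths are illustrative); pair it with a matching extract step inside your wrapper script:
import tarfile

# Pack a directory of many small files into one archive before staging
with tarfile.open("inputs.tar.gz", "w:gz") as tar:
    tar.add("model_inputs", arcname=".")   # one file to transfer instead of thousands
# Stage inputs.tar.gz as a single file input; extract it in tapisjob_app.sh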