App Files

tapisjob.sh and tapisjob_app.sh – Tapis Job Scripts: How Your Application Actually Runs

When you submit a job through Tapis—whether from the DesignSafe web portal, a Jupyter notebook, or an automated workflow—you are not directly submitting a traditional Slurm script. Instead, Tapis inserts itself as an orchestration layer between your application and the HPC scheduler. One of the most important manifestations of this orchestration is Tapis’s two-script execution model, which cleanly separates scheduler control from application logic.

This separation is intentional. It allows Tapis to manage resource requests, environment injection, logging, monitoring, and portability across execution systems, while allowing you—as the app developer—to focus entirely on the scientific or computational workflow. Understanding how these two scripts interact is essential for debugging jobs, developing custom Tapis apps, and reasoning about where (and how) your code actually runs.


The Two-Script Model: Launcher vs Application

When running a ZIP-based Tapis application on an HPC system, two scripts work together at runtime:

tapisjob.sh: the scheduler-facing launcher

tapisjob.sh is a Tapis-generated script created automatically for every job submission. Conceptually, it plays the same role as a Slurm batch script that you would submit manually with sbatch. It contains the scheduler directives that describe the job’s resource requirements—such as node count, cores per node, queue/partition, and walltime—based on the app definition and the job request.

This script is what Slurm actually executes on the compute node once the job leaves the queue. Before launching your application, tapisjob.sh prepares the runtime environment. This typically includes sourcing or exporting variables from a companion file called tapisjob.env, which contains job metadata and resolved parameters provided by Tapis (for example, job UUIDs, allocated resources, and input/output paths).

Once the environment is ready, tapisjob.sh invokes the application entrypoint. In other words, tapisjob.sh is the glue between Tapis and the HPC scheduler: it translates your high-level job description into a concrete batch execution and then hands control to your application logic.

Example tapisjob.sh file
#!/bin/bash

# This script was auto-generated by the Tapis Jobs Service for the purpose
# of running a Tapis application.  The order of execution is as follows:
#
#   1. The batch scheduler options are passed to the scheduler, including any
#      user-specified, scheduler-managed environment variables.
#   2. The application container is run with container options, environment
#      variables and application parameters as supplied in the Tapis job,
#      application and system definitions.

# Slurm directives.
#SBATCH --account DS-HPC1
#SBATCH --job-name tapisjob.sh
#SBATCH --nodes 2
#SBATCH --ntasks 96
#SBATCH --output /scratch/05072/silvia/tapis/966244f7-de44-4404-ac54-9f1da33cda3e-007/tapisjob.out
#SBATCH --partition skx
#SBATCH --time 120

module load opensees/3.6.0

# Issue launch command for application executable.
# Format: nohup ./tapisjob_app.sh > tapisjob.out 2>&1 &

# Export Tapis and user defined environment variables.
. ./tapisjob.env

# Launch app executable.
./tapisjob_app.sh OpenSeesMP simpleMP_WebSubmit.tcl > /scratch/05072/silvia/tapis/966244f7-de44-4404-ac54-9f1da33cda3e-007/tapisjob.out 2>&1
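
The last two steps of the launcher (source the environment file, then run the application script with redirected output) can be tried outside Tapis with a small stand-in. This is only a sketch: demo.env, demo_app.sh, demo.out, and the DEMO_JOB_ID variable are hypothetical names invented for the illustration, and the real tapisjob.env is generated by Tapis with its own contents. The app script is invoked through bash rather than ./ so the demo also works where the temporary directory is mounted noexec.

```shell
#!/bin/bash
# Stand-in for the launcher's "source env file, then run the app" sequence.
# All file and variable names below are hypothetical.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A made-up environment file playing the role of tapisjob.env.
cat > demo.env <<'EOF'
export DEMO_JOB_ID="job-123"
export inputDirectory="inputs"
EOF

# A made-up application script playing the role of tapisjob_app.sh.
cat > demo_app.sh <<'EOF'
echo "job=${DEMO_JOB_ID} input=${inputDirectory} args=$*"
EOF

# Source the env file into the launcher's shell; exported variables
# are then inherited by the child process.
. ./demo.env

# Run the app with its arguments, sending stdout and stderr to one log file.
bash ./demo_app.sh OpenSeesMP model.tcl > demo.out 2>&1

launcher_output=$(cat demo.out)
echo "$launcher_output"
```

Because the variables are exported before the child starts, the application script sees them without any argument passing, which is how a script like tapisjob_app.sh can refer to values it never defines itself.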

tapisjob_app.sh: your application workflow

tapisjob_app.sh is the user-provided script that contains the actual commands you want to run. If your ZIP archive includes a file named tapisjob_app.sh at the top level, Tapis treats it as the application entrypoint and does not modify it in any way. This script is analogous to the execution section of a hand-written Slurm script—the place where you load modules, activate environments, launch MPI jobs, and run your analysis codes.

During execution, tapisjob.sh typically calls it directly (for example, ./tapisjob_app.sh), redirecting standard output and error streams to Tapis-managed log files. From your perspective as an app developer, tapisjob_app.sh is your job: it defines the workflow, ordering, and logic of your computation.

If your ZIP archive does not include tapisjob_app.sh, Tapis instead looks for a tapisjob.manifest file. This manifest can specify an alternate executable using a key such as tapisjob_executable=. If neither tapisjob_app.sh nor a valid manifest is present, the job will fail because Tapis has no executable to launch. If both are present, tapisjob_app.sh takes precedence.
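
The lookup order described above can be written out as a short shell function. This is an illustration of the rule only, not Tapis's actual implementation: resolve_entrypoint and run_model.sh are hypothetical names, and the manifest is assumed to hold a plain tapisjob_executable= line.

```shell
#!/bin/bash
# Illustrative only: mimics the entrypoint lookup order described in the text.
# resolve_entrypoint is a hypothetical helper, not part of Tapis.
resolve_entrypoint() {
    local dir=$1 exe
    # tapisjob_app.sh wins whenever it is present.
    if [ -f "$dir/tapisjob_app.sh" ]; then
        echo "tapisjob_app.sh"
        return 0
    fi
    # Otherwise fall back to the executable named in the manifest.
    if [ -f "$dir/tapisjob.manifest" ]; then
        exe=$(sed -n 's/^tapisjob_executable=//p' "$dir/tapisjob.manifest" | head -n 1)
        if [ -n "$exe" ]; then
            echo "$exe"
            return 0
        fi
    fi
    echo "no executable found in $dir" >&2
    return 1
}

# Three hypothetical unpacked archives: both files, manifest only, neither.
demo=$(mktemp -d)
mkdir "$demo/a" "$demo/b" "$demo/c"
touch "$demo/a/tapisjob_app.sh"
echo "tapisjob_executable=run_model.sh" > "$demo/a/tapisjob.manifest"
echo "tapisjob_executable=run_model.sh" > "$demo/b/tapisjob.manifest"

both=$(resolve_entrypoint "$demo/a")
manifest_only=$(resolve_entrypoint "$demo/b")
if resolve_entrypoint "$demo/c" 2>/dev/null; then neither=found; else neither=missing; fi
echo "$both | $manifest_only | $neither"
```

Running a check like this against an unpacked copy of your ZIP archive before submitting is a cheap way to catch a missing entrypoint.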

Example tapisjob_app.sh file
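#!/bin/bash
# Echo each command to the log as it runs (useful when debugging).
set -x

# Positional arguments passed by tapisjob.sh: the binary name and the input script.
BINARYNAME=$1
INPUTSCRIPT=$2
echo "INPUTSCRIPT is $INPUTSCRIPT"

# Strip any leading directory components, leaving only the file name.
TCLSCRIPT="${INPUTSCRIPT##*/}"
echo "TCLSCRIPT is $TCLSCRIPT"

# inputDirectory is not set in this script; it is expected to come from the job environment.
cd "${inputDirectory}" || exit 1

echo "Running $BINARYNAME"

# Launch the MPI job and save its exit status before another command overwrites $?.
ibrun "$BINARYNAME" "$TCLSCRIPT"
status=$?
if [ "$status" -ne 0 ]; then
      echo "OpenSees exited with an error status. $status" >&2
      exit "$status"
fi

cd ..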
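
Two shell idioms in the script above are easy to misread: the ${VAR##*/} expansion that reduces a path to its file name, and the handling of $?, which holds the status of only the most recent command. The sketch below shows both in isolation. The path is hypothetical, and false stands in for a failing application command.

```shell
#!/bin/bash
# Idiom 1: "##*/" removes the longest prefix ending in "/", leaving the file name.
INPUTSCRIPT="/hypothetical/path/to/simpleMP_WebSubmit.tcl"
TCLSCRIPT="${INPUTSCRIPT##*/}"
echo "TCLSCRIPT is $TCLSCRIPT"

# Idiom 2: $? must be saved right away, because every later command replaces it.
false                 # stand-in for an application command that fails
status=$?             # saved immediately: this is the failure status
echo "saved status: $status"
overwritten=$?        # now reflects the echo above, which succeeded
echo "status after echo: $overwritten"
```

Saving the status in a variable also lets the script pass the same value to exit, so the failure stays visible to whatever launched the script.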

Why Two Scripts Exist

At first glance, this separation may seem redundant—but it is precisely what gives Tapis its flexibility and portability.

By owning tapisjob.sh, Tapis can automatically inject monitoring, logging, and lifecycle management without requiring any changes to your application logic. For example, Tapis redirects stdout and stderr to standardized files (such as tapisjob.out), records exit status in a tapisjob.exitcode file, and can insert additional bookkeeping or cleanup steps around your application. In container-based runtimes, tapisjob.sh may also handle container invocation and then execute tapisjob_app.sh inside the container environment.
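
The kind of bookkeeping a wrapper can add around an application can be sketched in a few lines. This is an assumption-laden illustration rather than the code Tapis generates: demo_app.sh, demo.out, and demo.exitcode are hypothetical stand-ins for the application script, the output log, and the exit-code file.

```shell
#!/bin/bash
# Illustrative wrapper: capture output and record the exit status of an app script.
# File names are hypothetical stand-ins, not the files Tapis manages.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A made-up application that writes to both streams and then fails with status 3.
cat > demo_app.sh <<'EOF'
echo "to stdout"
echo "to stderr" >&2
exit 3
EOF

# Send stdout and stderr to one log file, then record the exit status.
bash ./demo_app.sh > demo.out 2>&1
echo "$?" > demo.exitcode

recorded=$(cat demo.exitcode)
captured=$(cat demo.out)
echo "exit code on disk: $recorded"
echo "$captured"
```

The application script needs no knowledge of the log file or the status file, which is the point of keeping this logic in the launcher.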

All of this mirrors setup work that advanced users often embed manually in Slurm scripts—but here it is standardized, repeatable, and abstracted away. Meanwhile, you retain full control over the scientific workflow in tapisjob_app.sh, without worrying about scheduler mechanics or Tapis internals.


Reserved Filenames and Best Practices

All filenames beginning with tapisjob are reserved by Tapis. You should not create or rename arbitrary files using this prefix, as Tapis will overwrite or regenerate them during job execution. As an application developer, you are expected to supply only:

  • tapisjob_app.sh (optional but strongly recommended), or

  • tapisjob.manifest (only if you are not using tapisjob_app.sh)

Everything else—tapisjob.sh, tapisjob.env, output logs, and status files—is managed by Tapis itself.


Where Execution Actually Happens

A critical point that often causes confusion is where these scripts run.

Although users interact with DesignSafe through a web interface or JupyterHub, neither tapisjob.sh nor tapisjob_app.sh runs on a login node. Both scripts execute entirely inside a Slurm job allocation on compute nodes.

When a Tapis job is submitted:

  1. Tapis sends the job request to the HPC scheduler (e.g., Slurm on Stampede3).

  2. Slurm allocates the requested compute nodes.

  3. The ZIP runtime is unpacked into the job execution directory.

  4. Slurm launches tapisjob.sh on the first allocated compute node.

  5. tapisjob.sh invokes tapisjob_app.sh.

  6. tapisjob_app.sh launches the main application binaries.
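
To confirm where a script is running, a few lines like the following could be added near the top of tapisjob_app.sh. SLURM_JOB_ID and SLURM_JOB_NODELIST are standard Slurm environment variables that are set inside a job allocation; describe_location is a hypothetical helper name, and the "none" fallback is an assumption made so the snippet also runs outside Slurm.

```shell
#!/bin/bash
# Report the host and Slurm allocation this script is running in.
# describe_location is a hypothetical helper; outside Slurm the variables
# are unset and the fallback value "none" is printed instead.
describe_location() {
    echo "host=$(uname -n) slurm_job=${SLURM_JOB_ID:-none} nodes=${SLURM_JOB_NODELIST:-none}"
}

where=$(describe_location)
echo "$where"
```

Inside a job, the host should be a compute node and the Slurm fields should be populated; seeing "none" means the script ran outside a Slurm allocation.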

The compute-node environment is intentionally minimal. Especially on systems using a tacc-no-modules profile, no modules are preloaded and no Python environment is configured. This is why environment setup (module loads, virtual environments, MPI configuration) should be done explicitly inside tapisjob_app.sh, apart from anything Tapis itself writes into tapisjob.sh, such as the module load line in the example launcher above. Relying on login-node defaults will lead to failures.

For this reason, a typical tapisjob_app.sh explicitly loads everything it needs, such as:

module load python/3.12.11
module load opensees
module load hdf5/1.14.4

This guarantees that your job runs reproducibly, regardless of the login environment or submission method.
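
One defensive pattern that fits this advice is to verify, right after the module loads, that the executables you expect are actually on PATH, and to stop early if they are not. The sketch below is an assumption rather than part of any Tapis or TACC tooling: missing_commands is a hypothetical helper, and the command names in the demo are placeholders for whatever your workflow needs.

```shell
#!/bin/bash
# Hypothetical helper: print the names of any commands that are not on PATH.
missing_commands() {
    local cmd missing=""
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || missing="$missing $cmd"
    done
    # Trim the leading space left by the first append.
    echo "${missing# }"
}

# Demo with placeholder names: two commands that exist, then one that does not.
found_all=$(missing_commands sh ls)
found_some=$(missing_commands sh definitely-not-a-real-command-xyz)
echo "missing (first check): '${found_all}'"
echo "missing (second check): '${found_some}'"
```

In a real tapisjob_app.sh, a non-empty result would be a reason to print a message to stderr and exit with a non-zero status before any compute time is spent.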


Mental model to remember

Think of tapisjob.sh as Tapis’s Slurm script, and tapisjob_app.sh as your application’s workflow. One manages how the job runs on the system; the other defines what the job actually does.