Paths in Tapis Applications#

How Storage Is Referenced in Jobs

Tapis provides a unified way to access storage systems, but paths are never standalone. Every path is interpreted relative to a system.


System + Path (The Core Concept)#

In Tapis, file locations are defined using:

  • systemId → where the file lives

  • path → where it is on that system

Conceptually:

(systemId, path)

Documentation often shows this as:

tapis://systemId/path/to/file

…but this is shorthand. APIs and job definitions always separate the two.


Common DesignSafe Storage Systems#

Storage Location

Typical System ID

MyData

designsafe.storage.default

Work

designsafe.work.stampede3

Community

designsafe.storage.community

Scratch (HPC)

designsafe.compute.stampede3


Example: Input File Reference#

{
  "sourceSystemId": "designsafe.storage.default",
  "sourcePath": "/myfolder/input/model.tcl"
}

This tells Tapis exactly where to fetch the file from.


Output Locations Matter#

If your app writes output to:

/scratch/05072/silvia/run01

Those files will not automatically appear in JupyterHub or the Data Depot.

Best practice:

  • Write final outputs to WORK or MyData

  • Or explicitly copy them at the end of your job


Why Tapis Paths Feel Strict (and That’s Good)#

Tapis enforces clarity:

  • No ambiguous relative paths

  • No implicit working directories

  • No guessing which system you meant

This discipline is what allows:

  • Web-submitted jobs

  • Automated pipelines

  • Reproducible workflows


Summary: Tapis Path Rules of Thumb#

  • Always think system first, path second

  • Scratch is fast but temporary

  • Work is the bridge

  • Final results belong in Work or MyData


Designing Path-Safe, Reproducible Workflows#

From Interactive Exploration to Scalable Runs

Paths are not just technical details—they encode workflow intent.

A good path strategy answers:

  • Where do inputs live?

  • Where do intermediate results go?

  • Which outputs should persist?



Example Workflow Pattern#

MyData/
 └── projectA/
     ├── inputs/
     ├── scripts/
     └── notebooks/

Work/
 └── projectA/
     ├── run01/
     ├── run02/
     └── results/

Scratch/
 └── projectA/
     └── run01_tmp/

Portability Principle#

Your code should not change when moving between:

  • Jupyter exploration

  • Single-node tests

  • Large-scale HPC runs

Only paths and environment variables should differ.


Common Anti-Patterns to Avoid#

  • Hard-coding /home/jupyter/… in batch jobs

  • Writing final results only to Scratch

  • Relying on implicit working directories

  • Mixing inputs and outputs in the same folder


The Big Picture#

Path-safe workflows enable:

  • Debugging in Jupyter

  • Scaling on HPC

  • Automation through Tapis

  • Sharing with collaborators

They are foundational—not optional—for serious computational research on DesignSafe.