
Tapis Paths#

How DesignSafe File Storage and Tapis Work Together

by Silvia Mazzoni, DesignSafe, 2025

Tapis powers file access and job submission on DesignSafe. It provides a consistent interface to interact with multiple storage systems and compute environments, making it easier to manage data before, during, and after simulation workflows.

With Tapis, you can:

  • List, upload, download, move, and delete files across storage systems

  • Stage input files (e.g., move from long-term storage to a compute node)

  • Collect outputs automatically and return them to Corral (MyData)

  • Use the same scripting or automation tools across locations

Tapis acts as the glue between DesignSafe’s storage and compute environments, streamlining data movement and improving reproducibility.

What is a URI?#

A URI (Uniform Resource Identifier) is the formal way Tapis identifies the location of your files and directories. Think of it as the “address” for data in the Tapis ecosystem. Instead of just using a simple path (like /home/user/file.txt), a Tapis URI encodes both where the file lives (which storage system) and what the path to the file is on that system.

For example:

tapis://designsafe.storage.default/username/project/data/input.txt
  • tapis:// → tells us we’re using Tapis to access this resource

  • designsafe.storage.default → identifies the storage system (here MyData; other systems cover Community Data, Work on Stampede3, etc.)

  • /username/project/data/input.txt → the actual path on that system

This consistent URI format allows you to write scripts or automation that work across different storage systems without changing code every time.
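As a minimal sketch of that two-part structure, the snippet below splits a Tapis URI into its system ID and its path using Python's standard `urllib.parse`. The helper name `split_tapis_uri` is hypothetical (it is not part of Tapis or OpsUtils), and the URI is the Community Data example used later on this page.

```python
from typing import Tuple
from urllib.parse import urlsplit


def split_tapis_uri(uri: str) -> Tuple[str, str]:
    """Split a Tapis URI into (system_id, path_on_system).

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(uri)
    if parts.scheme != "tapis" or not parts.netloc:
        raise ValueError(f"not a Tapis URI: {uri!r}")
    # netloc is the storage system ID; path is the location on that system
    return parts.netloc, parts.path


system_id, path = split_tapis_uri(
    "tapis://designsafe.storage.community/Records/ATC-63/groundmotion.at2"
)
print("system:", system_id)
print("path:  ", path)
```

Because the system ID and the path come back as separate values, a script can swap one without touching the other.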


In this notebook, we’ll assemble a JSON dictionary of your Tapis paths and save it to MyData (~/MyData/.tapis_user_paths.json), so any script can read it later, including scripts in new Jupyter sessions or running elsewhere.

Make sure you have Established Your System Credentials first!


The Two Parts of a Tapis Path#

A Tapis path is conceptually:

tapis://<SYSTEM_ID>/<RELATIVE_PATH>
  1. SYSTEM_ID: which storage system you’re targeting. Examples:

    • designsafe.storage.default (MyData),

    • designsafe.storage.community (Community),

    • cloud.data (Work allocations), or

    • a project-scoped system (e.g., project-)

  2. RELATIVE_PATH: the location inside that system’s root. This is not relative to your program’s current directory; it’s relative to the system root.

    Tapis ignores CWD entirely; it’s always “(system, path-within-that-system).”

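When building your Tapis paths, it’s a good idea to define these two components separately and join them only at the point of use (for example, when you fill in a job input). This keeps your script portable and reusable: to target a different system you change one variable, not every path.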
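One way to keep the two components separate is sketched below. The function name `join_tapis_path` is hypothetical; it simply normalizes stray slashes and joins the pieces into the `tapis://<SYSTEM_ID>/<RELATIVE_PATH>` form shown above.

```python
def join_tapis_path(system_id: str, relative_path: str) -> str:
    """Join a system ID and a system-rooted path into a Tapis URI.

    Hypothetical helper for illustration only.
    """
    system_id = system_id.strip().strip("/")
    relative_path = relative_path.strip().lstrip("/")
    return f"tapis://{system_id}/{relative_path}"


# Define the two parts separately...
SYSTEM_ID = "designsafe.storage.community"
RELATIVE_PATH = "Records/ATC-63/groundmotion.at2"

# ...and join them only where the full URI is needed.
input_uri = join_tapis_path(SYSTEM_ID, RELATIVE_PATH)
print(input_uri)
```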


How Tapis Paths Differ from Jupyter Paths#

  • Jupyter/OS paths (e.g., /home/jovyan/… or data/run1.csv) resolve relative to your current working directory (CWD) on that machine.

  • Tapis paths ignore CWD. They always mean (system, path-within-that-system), no matter where your code runs.

  • For batch/automation (Tapis jobs, SLURM), Tapis paths are more portable than CWD-dependent relative paths.
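The contrast can be illustrated without touching any real files. In this sketch the same relative OS path resolves to two different locations depending on the working directory, while the Tapis pair stays fixed. The directory `/home/jovyan/Work` and the `username` segment are placeholders chosen for the example.

```python
import posixpath

relative = "data/run1.csv"

# The same relative OS path, resolved from two different working directories
from_home = posixpath.normpath(posixpath.join("/home/jovyan", relative))
from_work = posixpath.normpath(posixpath.join("/home/jovyan/Work", relative))
print(from_home)
print(from_work)

# A Tapis path is a (system, path-within-that-system) pair; no CWD is involved
tapis_pair = ("designsafe.storage.default", "username/data/run1.csv")
tapis_uri = "tapis://{}/{}".format(*tapis_pair)
print(tapis_uri)
```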


Obtaining File-System Paths#

Start with the Easy, Fixed Bases#

These don’t depend on an HPC system or allocation. Once you know your username, the bases are stable, and because Tapis can return your username programmatically, you don’t need to hard-code it.

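| Storage   | Typical Base (Tapis)                            | Notes                                  |
|-----------|-------------------------------------------------|----------------------------------------|
| MyData    | tapis://designsafe.storage.default/<username>/  | Your personal storage (aka Corral)     |
| Community | tapis://designsafe.storage.community/           | Public community content (read-mostly) |
| Published | tapis://designsafe.storage.published/           | Published content (read-only)          |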

Examples

  • MyData:

    MyData/inputs/model.tcl → tapis://designsafe.storage.default/username/inputs/model.tcl

    Don’t forget to include your username in the path!

  • CommunityData:

    CommunityData/Records/ATC-63/groundmotion.at2 → tapis://designsafe.storage.community/Records/ATC-63/groundmotion.at2

  • Published:

    Published/Records/ATC-63/groundmotion.at2 → tapis://designsafe.storage.published/Records/ATC-63/groundmotion.at2

    You can find the relative path in the Data Depot.

Use these as bases, then append project/job-specific subpaths.
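The mappings in the examples above can be sketched as a small lookup. The dictionary `DEPOT_BASES` and the function `depot_to_tapis` are hypothetical names introduced for illustration; the bases themselves are the ones listed in this section.

```python
# Data Depot area name -> Tapis base (bases taken from the table above)
DEPOT_BASES = {
    "MyData": "tapis://designsafe.storage.default/{username}",
    "CommunityData": "tapis://designsafe.storage.community",
    "Published": "tapis://designsafe.storage.published",
}


def depot_to_tapis(depot_path: str, username: str) -> str:
    """Convert a path such as 'MyData/inputs/model.tcl' to a Tapis URI.

    Hypothetical helper for illustration only.
    """
    area, _, rest = depot_path.strip("/").partition("/")
    if area not in DEPOT_BASES:
        raise ValueError(f"unknown storage area: {area!r}")
    base = DEPOT_BASES[area].format(username=username)
    return f"{base}/{rest}" if rest else base


print(depot_to_tapis("MyData/inputs/model.tcl", "username"))
print(depot_to_tapis("CommunityData/Records/ATC-63/groundmotion.at2", "username"))
```

Only the MyData base uses the username; the Community and Published bases ignore it.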

# Connect to Tapis (OpsUtils is this training module's utility package)
from OpsUtils import OpsUtils
t = OpsUtils.connect_tapis()
 -- Checking Tapis token --
 Token loaded from file. Token is still valid!
 Token expires at: 2025-09-05T23:57:32+00:00
 Token expires in: 3:39:50.881346
-- LOG IN SUCCESSFUL! --
# Initialize Json dictionary
TapisPaths = {}

Obtain your username programmatically#

Using a utility function

username = OpsUtils.get_tapis_username(t)
print('username:',username)
username: silvia
# we will make the keys lower case, as they'll be easier to match
TapisPaths['mydata'] = f'tapis://designsafe.storage.default/{username}'
TapisPaths['community'] = 'tapis://designsafe.storage.community'
TapisPaths['published'] = 'tapis://designsafe.storage.published'

Next: Work (User & System Dependent)#

Work is the shared, high-performance project area mounted on both JupyterHub and HPC—ideal for staging inputs and storing outputs for jobs. Its base includes allocation and username, and differs by system (e.g., Stampede3 vs. LS6).

Typical form:

tapis://cloud.data/work/<allocation>/<username>/<system>/

Because this base is user/system-specific, it’s best to discover it once and save it.
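To make the typical form concrete, the sketch below fills in the template by hand. The function `work_base_uri` is a hypothetical name, and the allocation ID and username are the sample values that appear in this notebook's output. In practice the allocation is discovered rather than typed in; the utility function shown later on this page calls OpsUtils.get_user_work_tapis_uri for that.

```python
def work_base_uri(allocation: str, username: str, system: str) -> str:
    """Fill in tapis://cloud.data/work/<allocation>/<username>/<system>/.

    Hypothetical helper for illustration only.
    """
    return f"tapis://cloud.data/work/{allocation}/{username}/{system}/"


# Sample values only; your allocation ID and username will differ
work_bases = {
    f"work/{system}": work_base_uri("05072", "silvia", system)
    for system in ("stampede3", "ls6", "frontera")
}
for key, value in work_bases.items():
    print(key, value)
```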

Save to a File#

We save the dictionary to “~/MyData/.tapis_user_paths.json”, which is the default location the utility functions (OpsUtils) in this training module read from.

print('TapisPaths',TapisPaths)
TapisPaths {'mydata': 'tapis://designsafe.storage.default/silvia', 'community': 'tapis://designsafe.storage.community', 'published': 'tapis://designsafe.storage.published', 'work/stampede3': 'tapis://cloud.data/work/05072/silvia/stampede3/', 'work/ls6': 'tapis://cloud.data/work/05072/silvia/ls6/', 'work/frontera': 'tapis://cloud.data/work/05072/silvia/frontera/'}
import json
from pathlib import Path
Path("~/MyData/.tapis_user_paths.json").expanduser().write_text(json.dumps(TapisPaths, indent=2))
368
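Reading the file back in a later session might look like the sketch below. So that it runs anywhere, it writes a one-entry stand-in file to a temporary directory; on DesignSafe you would point `cache_file` at ~/MyData/.tapis_user_paths.json instead. The subpath `inputs/model.tcl` is just an example.

```python
import json
import tempfile
from pathlib import Path

# Stand-in for ~/MyData/.tapis_user_paths.json (assumption: same JSON layout)
cache_file = Path(tempfile.mkdtemp()) / ".tapis_user_paths.json"
cache_file.write_text(
    json.dumps({"mydata": "tapis://designsafe.storage.default/silvia"}, indent=2)
)

# Later, in any script: load the saved bases and append a subpath
saved_paths = json.loads(cache_file.read_text())
model_uri = saved_paths["mydata"].rstrip("/") + "/inputs/model.tcl"
print(model_uri)
```

Stripping any trailing slash before appending avoids a doubled "/" regardless of how the base was stored.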

Use Utility Function#

We have put the above steps into a utility function, get_user_path_tapis_uri(t, file_system), that you can call at the beginning of your script.

get_user_path_tapis_uri.py
# /home/jupyter/CommunityData/OpenSees/TrainingMaterial/training-OpenSees-on-DesignSafe/OpsUtils/OpsUtils/Tapis/get_user_path_tapis_uri.py
from __future__ import annotations

def get_user_path_tapis_uri(
    t,
    file_system: str = "none",                  # "none" | "mydata" | "community" | "published" | "work/stampede3" | "work/ls6" | "work/frontera"
    paths_file_path: str = "~/MyData/.tapis_user_paths.json",
    force_refresh: bool = False,
) -> Union[str, Dict]:
    """
    Discover and cache user-specific Tapis base URIs for DesignSafe storage systems,
    then return either the entire dictionary or a single base URI.

    Author
    -------
    Silvia Mazzoni, DesignSafe (silviamazzoni@yahoo.com)

    Parameters
    ----------
    t : Tapipy client
        An authenticated Tapipy v3 client.
    file_system : {"none","mydata","community","published","work/stampede3","work/ls6","work/frontera"}, optional
        Which base to return (case-insensitive). Use "none" to return the full dictionary.
        For Work, name the HPC system in the key, e.g. "work/stampede3".
    paths_file_path : str, optional
        Location (on MyData or local home) where the JSON cache is stored.
        Default: "~/MyData/.tapis_user_paths.json".
    force_refresh : bool, optional
        If True, (re)discover all bases and overwrite the cache file.

    Returns
    -------
    Union[str, dict]
        - If file_system == "none": the full dict of bases (flat, with one "work/<system>" key per HPC system).
        - Else: a single base URI string for the requested system.

    Notes
    -----
    - Stored values are full Tapis URIs (start with "tapis://", no trailing "/").
    - Keys are lowercase: "mydata", "community", "published", and one flat
      "work/<system>" key per HPC system ("stampede3", "ls6", "frontera").
    """
    import json
    import os
    from pathlib import Path
    from typing import Dict, Optional, Union
    from OpsUtils import OpsUtils

    # ----------------------------
    # normalize & validate inputs
    # ----------------------------
    fs = (file_system or "none").strip().lower()
    # print('fs',fs)

    # Handle loose input like "CommunityData"
    if "community" in fs:
        fs = "community"

    valid_file_systems = ["mydata", "community", "published", "none"]
    valid_work_systems = ["stampede3", "ls6", "frontera"]  # no "none": "work/none" is never a stored key
    for thisW in valid_work_systems:
        valid_file_systems.append(f'work/{thisW}')
    # print('valid_file_systems',valid_file_systems)

    if fs not in valid_file_systems:
        raise ValueError(f"file_system='{file_system}' not in {sorted(valid_file_systems)}")

    cache_path = Path(os.path.expanduser(paths_file_path))
    # print('cache_path',cache_path)

    # ----------------------------
    # helper: normalize URIs
    # ----------------------------
    def _with_scheme(u: str) -> str:
        u = u.strip()
        if not u:
            return u
        if not u.startswith("tapis://"):
            u = "tapis://" + u.lstrip("/")
        # if not u.endswith("/"):
        #     u += "/"
        u = u.rstrip("/")
        return u

    # ----------------------------
    # try reading existing cache
    # ----------------------------
    paths: Dict = {}
    if cache_path.exists() and not force_refresh:
        try:
            with cache_path.open("r", encoding="utf-8") as f:
                paths = json.load(f)
                print(f'found paths file: {cache_path}')
        except Exception:
            paths = {}
            
    # quick return if cache satisfies the request
    def _maybe_return_from_cache() -> Optional[Union[str, Dict]]:
        if not paths:
            return None
        if fs == "none":
            return paths
        if fs in {"mydata", "community", "published"}:
            val = paths.get(fs)
            if isinstance(val, str) and val:
                return _with_scheme(val)
        if "work" in fs:
            val = paths.get(fs)
            if isinstance(val, str) and val:
                return _with_scheme(val)
        return None

    cached = _maybe_return_from_cache()
    if cached is not None:
        return cached

    # ----------------------------
    # (re)discover all bases
    # ----------------------------
    try:
        username = OpsUtils.get_tapis_username(t)
    except Exception as e:
        raise RuntimeError(f"Could not determine Tapis username: {e}") from e

    discovered: Dict = {
        "mydata": _with_scheme(f"designsafe.storage.default/{username}"),
        "community": _with_scheme("designsafe.storage.community"),
        "published": _with_scheme("designsafe.storage.published"),
    }

    # Discover Work bases using the new inner helper
    for system in ("stampede3", "ls6", "frontera"):
        try:
            base_uri = OpsUtils.get_user_work_tapis_uri(t, system_id=system)
            discovered[f'work/{system}'] = _with_scheme(base_uri)  # idempotent
        except Exception:
            # Skip systems we can't resolve; they can be refreshed later
            continue

    # Persist to cache

    cache_path.parent.mkdir(parents=True, exist_ok=True)
    with cache_path.open("w", encoding="utf-8") as f:
        json.dump(discovered, f, indent=2)
        print(f'saved data to {cache_path}')

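    # Return per request
    if fs == "none":
        return discovered
    if fs not in discovered:
        # e.g. a Work system that could not be resolved for this user
        raise KeyError(f"No base path could be discovered for file_system='{file_system}'")
    return discovered[fs]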
allPaths = OpsUtils.get_user_path_tapis_uri(t,force_refresh=True)
for key,value in allPaths.items():
    print(key,value)
saved data to /home/jupyter/MyData/.tapis_user_paths.json
mydata tapis://designsafe.storage.default/silvia
community tapis://designsafe.storage.community
published tapis://designsafe.storage.published
work/stampede3 tapis://cloud.data/work/05072/silvia/stampede3
work/ls6 tapis://cloud.data/work/05072/silvia/ls6
work/frontera tapis://cloud.data/work/05072/silvia/frontera
# request a single base by passing file_system (matching is case-insensitive)
thisPath = OpsUtils.get_user_path_tapis_uri(t,file_system='MyData')
print('thisPath:',thisPath)
found paths file: /home/jupyter/MyData/.tapis_user_paths.json
thisPath: tapis://designsafe.storage.default/silvia
# request the Work base for one HPC system
thisPath = OpsUtils.get_user_path_tapis_uri(t,file_system='Work/stampede3')
print('thisPath:',thisPath)
found paths file: /home/jupyter/MyData/.tapis_user_paths.json
thisPath: tapis://cloud.data/work/05072/silvia/stampede3

Alternate Semi-Automatic Method: Copy from the Web Portal#

If you don’t have a base yet (or you’re exploring):

  1. Open an app page (e.g., OpenSeesMP on Stampede3).

  2. Click the folder icon to browse.

  3. Navigate to the target directory and select it.

  4. Copy the displayed Tapis URI, e.g.

    tapis://cloud.data/work/05072/jdoe/stampede3/somefolder
    
  5. Keep the base portion for reuse:

    tapis://cloud.data/work/05072/jdoe/stampede3/
    

Once you get the hang of this, it’s quick—then move that base into your saved JSON so you never have to browse again.
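Trimming a copied URI down to its reusable base (step 5) can be sketched as follows. The function `work_base_from_uri` is a hypothetical helper, and it assumes the Work layout work/<allocation>/<username>/<system>/ described earlier on this page.

```python
def work_base_from_uri(uri: str) -> str:
    """Keep only tapis://cloud.data/work/<allocation>/<username>/<system>/.

    Hypothetical helper for illustration only.
    """
    prefix = "tapis://cloud.data/"
    if not uri.startswith(prefix):
        raise ValueError(f"not a cloud.data URI: {uri!r}")
    parts = uri[len(prefix):].strip("/").split("/")
    # Expected layout: work/<allocation>/<username>/<system>/...
    if len(parts) < 4 or parts[0] != "work":
        raise ValueError(f"unexpected Work layout: {uri!r}")
    return prefix + "/".join(parts[:4]) + "/"


copied = "tapis://cloud.data/work/05072/jdoe/stampede3/somefolder"
print(work_base_from_uri(copied))
```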


Quick Reminders#

  • Tapis paths are system-rooted, not CWD-rooted (different from Jupyter).

  • Use Work for HPC I/O; copy anything you want to keep from Scratch into Work or MyData.

  • Save your bases once; append relative subpaths in all your scripts and job submissions.