get_tapis_job_all_files()

get_tapis_job_all_files(t, jobUuid, displayIt=10, target_dir=False, overwrite=False, display_file_content=True)

Overview

get_tapis_job_all_files recursively walks the output directory of a given Tapis job and returns:

Relative local-style file paths (helpful for mirroring directory structures when downloading).
Full absolute Tapis system paths (needed for direct API calls or future metadata checks).
Raw item metadata returned by Tapis (includes type, size, modification times, etc.).
Total file count.

Additionally, it can automatically download all found files into a local directory, preserving the directory hierarchy of the remote Tapis job.
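
The recursive walk described above can be sketched without a live Tapis connection. This is an illustration under stated assumptions, not the function's actual implementation: FakeJobs, its in-memory tree, and walk_job_outputs are hypothetical names, and only the getJobOutputList(jobUuid=..., outputPath=...) call shape and the name, path, and type item attributes are taken from the source file listed under Files.

```python
import posixpath
from types import SimpleNamespace


class FakeJobs:
    """Hypothetical in-memory stand-in for `t.jobs` (not part of any Tapis library)."""

    def __init__(self, tree):
        # tree maps a directory path ("." for the root) to a list of (name, type)
        self.tree = tree

    def getJobOutputList(self, jobUuid, outputPath):
        base = "" if outputPath == "." else outputPath
        return [
            SimpleNamespace(
                name=name,
                type=kind,
                path=posixpath.join("/fake/archive", base, name),
            )
            for name, kind in self.tree.get(outputPath, [])
        ]


def walk_job_outputs(jobs, jobUuid, path=""):
    """Return (relative_paths, full_paths, items) for every file below `path`."""
    rel_paths, full_paths, items = [], [], []
    for item in jobs.getJobOutputList(jobUuid=jobUuid, outputPath=path or "."):
        rel = posixpath.join(path, item.name) if path else item.name
        if item.type == "dir":
            # descend into the subdirectory and merge its results
            sub_rel, sub_full, sub_items = walk_job_outputs(jobs, jobUuid, rel)
            rel_paths += sub_rel
            full_paths += sub_full
            items += sub_items
        else:
            rel_paths.append(rel)
            full_paths.append(item.path)
            items.append(item)
    return rel_paths, full_paths, items


fake_jobs = FakeJobs({
    ".": [("run.log", "file"), ("results", "dir")],
    "results": [("output.txt", "file")],
})
rel, full, items = walk_job_outputs(fake_jobs, "example-job-uuid")
print(rel)
```

The relative paths play the role of 'LocalPath' and the item paths the role of 'FullPath' in the returned dictionary.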

Function Signature

get_tapis_job_all_files(
    t, jobUuid,
    displayIt=10,
    target_dir=False,
    overwrite=False,
    display_file_content=True
)

Parameters

Parameter	Type	Description
t	Tapis	An authenticated Tapis client (typically from connect_tapis()).
jobUuid	str	The UUID of the Tapis job whose outputs you wish to list or download.
displayIt	bool or int, optional	Controls printing of the directory tree. False or 0: silent (prints nothing). True or 1: prints all files in all directories. 2 or more: prints up to displayIt files per directory, then shows a message indicating suppressed output. Default is 10.
target_dir	bool, None, or str, optional	Determines download behavior. False or None: does not download (just lists). True: downloads files into ./OutFiles_{jobUuid}. str: downloads files into the specified directory. In the source listed under Files, relative directories are created under the user's home directory. Default is False.
overwrite	bool, optional	If True, existing local files are overwritten. Default is False (existing files are skipped).
display_file_content	bool, optional	If True and no download is requested, each listed file is shown in a collapsible viewer (binary files are reported but not displayed). If False, only the file path is printed. Default is True.
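
As a reading aid, the documented meaning of displayIt can be written as a small helper. This is a sketch of the behavior described in the table, not code taken from the function; the name interpret_displayIt is hypothetical, and treating negative integers as silent is an assumption.

```python
def interpret_displayIt(displayIt):
    """Map displayIt to (show_output, per_directory_limit) as documented."""
    # bool must be checked before int, because bool is a subclass of int
    if isinstance(displayIt, bool):
        return displayIt, None
    if isinstance(displayIt, int):
        if displayIt <= 0:
            return False, None      # 0 is silent (negatives: assumed silent)
        if displayIt == 1:
            return True, None       # 1 prints every file
        return True, displayIt      # 2 or more caps the files shown per directory
    return False, None              # any other type is treated as silent


for value in (False, 0, True, 1, 10):
    print(value, "->", interpret_displayIt(value))
```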

Returns

Returns a dict with:

Key	Type	Description
'Nfiles'	int	Total number of files found (excluding directories).
'LocalPath'	list[str]	Relative file paths (suitable for recreating the directory structure locally).
'FullPath'	list[str]	Absolute paths on the Tapis system.
'Items'	list	Raw Tapis file metadata objects (contains type, size, lastModified, etc.).

Behavior Summary

target_dir value	What happens
False or None	Only lists files, no downloads performed.
True	Downloads files into a default local folder ./OutFiles_{jobUuid}.
"mydir" (string)	Downloads files into the specified directory.

All downloads preserve the original remote directory structure.
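
The table above can be condensed into a small helper that resolves the download directory. This is a sketch that mirrors the logic in the source file listed under Files, including its placement of relative directories under the home directory; the name resolve_download_dir and the use of os.path.normpath are additions made here for illustration.

```python
import os


def resolve_download_dir(target_dir, jobUuid):
    """Return the local download directory, or None when nothing is downloaded."""
    if target_dir is True:
        chosen = f"OutFiles_{jobUuid}"      # default folder name
    elif isinstance(target_dir, str):
        chosen = target_dir                 # user-specified folder
    else:
        return None                         # False or None: list only
    # Relative directories are placed under the home directory;
    # os.path.join leaves an absolute `chosen` unchanged.
    return os.path.normpath(os.path.join(os.path.expanduser("~"), chosen))


print(resolve_download_dir(True, "example-job-uuid"))
print(resolve_download_dir("MyDownloads", "example-job-uuid"))
print(resolve_download_dir(False, "example-job-uuid"))
```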

Example Usage

Just list the files, print up to 5 per directory

outputs = get_tapis_job_all_files(t, jobUuid, displayIt=5)
print(outputs['Nfiles'], "files found.")

List all files silently

outputs = get_tapis_job_all_files(t, jobUuid, displayIt=False)

Download into default folder

outputs = get_tapis_job_all_files(t, jobUuid, target_dir=True)

This downloads all files into a local folder, for example:

./OutFiles_{jobUuid}/results/output.txt
./OutFiles_{jobUuid}/logs/run.log
...

Download into a custom folder, overwriting any existing files

outputs = get_tapis_job_all_files(
    t, jobUuid,
    target_dir="MyDownloads",
    overwrite=True
)

Notes

The 'Items' list provides the original Tapis metadata objects for each file, which can include:
- type (file or directory)
- size (bytes)
- lastModified

Each file's content is written locally in binary mode, and the function creates all necessary subfolders under target_dir.
If overwrite=False (default), existing local files are skipped with a message.
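
The skip-or-overwrite rule and the creation of subfolders can be illustrated in isolation. The helper below is a minimal sketch, assuming the file content is already available as bytes; save_output is a hypothetical name and is not part of the function.

```python
import os
import tempfile


def save_output(data, local_file_path, overwrite=False):
    """Write bytes to local_file_path; return False if an existing file was kept."""
    parent = os.path.dirname(local_file_path)
    if parent:
        os.makedirs(parent, exist_ok=True)      # create missing subfolders
    if os.path.exists(local_file_path) and not overwrite:
        return False                            # skip: file already present
    with open(local_file_path, "wb") as f:      # binary mode, bytes written as-is
        f.write(data)
    return True


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "results", "output.txt")
    first = save_output(b"first", target)                    # written
    second = save_output(b"second", target)                  # skipped
    third = save_output(b"third", target, overwrite=True)    # replaced
    with open(target, "rb") as f:
        final_content = f.read()

print(first, second, third, final_content)
```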

Recommended next steps:

Use 'LocalPath' + 'FullPath' pairs for further file processing, logs, or reporting.
Integrate with data analysis pipelines that take a structured folder of downloaded results.
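
One way to pair 'LocalPath' and 'FullPath' for reporting is a CSV manifest. The sketch below uses a hand-built sample dictionary rather than a real job, build_manifest is a hypothetical helper, and the size attribute on each item is an assumption about the Tapis metadata objects.

```python
import csv
import io
from types import SimpleNamespace

# Hypothetical sample shaped like the documented return value
outputs = {
    "Nfiles": 2,
    "LocalPath": ["results/data.csv", "logs/run.log"],
    "FullPath": ["/example/archive/results/data.csv", "/example/archive/logs/run.log"],
    "Items": [SimpleNamespace(size=1200), SimpleNamespace(size=85)],
}


def build_manifest(outputs):
    """Return CSV text with one row per file: local path, remote path, size."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["local_path", "remote_path", "size"])
    rows = zip(outputs["LocalPath"], outputs["FullPath"], outputs["Items"])
    for local, remote, item in rows:
        # `size` is an assumed attribute name; use "" when it is missing
        writer.writerow([local, remote, getattr(item, "size", "")])
    return buffer.getvalue()


manifest = build_manifest(outputs)
print(manifest)
```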

Quickstart Panel: Using get_tapis_job_all_files

Typical workflows

Goal	Code
Just list files, print up to 5 per directory	outputs = get_tapis_job_all_files(t, jobUuid, displayIt=5); print(f"Found {outputs['Nfiles']} files.")
List all files without printing anything	outputs = get_tapis_job_all_files(t, jobUuid, displayIt=False)
Download all files into the default folder (creates ./OutFiles_{jobUuid}/…, preserving the directory structure)	outputs = get_tapis_job_all_files(t, jobUuid, target_dir=True)
Download into a custom folder, overwriting existing files	outputs = get_tapis_job_all_files(t, jobUuid, target_dir="MyResults", overwrite=True)

What’s returned?

You always get back a dictionary like:

{
    'Nfiles': 42,
    'LocalPath': ['results/data.csv', 'logs/run.log', ...],
    'FullPath': ['/tapis/jobs/v2/job-outputs/.../data.csv', ...],
    'Items': [<TapisItem>, <TapisItem>, ...]
}
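
A quick way to summarize such a result is to count files per directory from the 'LocalPath' entries. The paths below are a hypothetical sample, and count_by_directory is an illustrative helper rather than part of the function.

```python
import posixpath
from collections import Counter

# Hypothetical sample; a real call returns these paths in outputs['LocalPath']
local_paths = ["results/data.csv", "results/plot.png", "logs/run.log", "summary.txt"]


def count_by_directory(paths):
    """Count files per parent directory; top-level files are grouped under '.'."""
    return Counter(posixpath.dirname(p) or "." for p in paths)


counts = count_by_directory(local_paths)
for directory, n in sorted(counts.items()):
    print(f"{directory}: {n}")
```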

Use this to:

Build download or analysis pipelines.
Create logs of what was produced by your HPC jobs.
Or simply verify that all expected outputs were generated.
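
Verifying expected outputs amounts to a set comparison against 'LocalPath'. The following is a minimal sketch with a hand-built sample result; missing_outputs and the file names are hypothetical.

```python
def missing_outputs(outputs, expected):
    """Return the expected relative paths that are absent from outputs['LocalPath']."""
    found = set(outputs["LocalPath"])
    return sorted(set(expected) - found)


# Hypothetical sample shaped like the documented return value
outputs = {"Nfiles": 2, "LocalPath": ["results/data.csv", "logs/run.log"]}
expected = ["results/data.csv", "results/summary.json", "logs/run.log"]

missing = missing_outputs(outputs, expected)
print("Missing:", missing)
```

An empty list means every expected file was produced.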

Pro Tip

Keep displayIt at a small value, such as the default of 10, so that only the first files in each directory are shown (helps with large jobs).
Set overwrite=True if you’re rerunning analyses and want to ensure fresh files.

Files

You can find these files in Community Data.

get_tapis_job_all_files.py

import os
import ipywidgets as widgets
from IPython.display import display, clear_output
from html import escape


def get_tapis_job_all_files(
    t, jobUuid, 
    displayIt=10, 
    target_dir=False, 
    overwrite=False,
    display_file_content=True
):
    """
    Recursively retrieves all output files from a Tapis job, optionally downloading them.

    This function connects to the Tapis job output system, traverses the job's complete 
    output directory structure recursively, and collects:

    - Local-style relative file paths (to recreate directory structure on disk),
    - Full absolute Tapis paths for direct API use or metadata,
    - Raw item objects returned by Tapis (which include size, lastModified, etc.),
    - The total file count.

    It can also automatically download these files into a local directory, preserving
    the folder hierarchy.

    Parameters
    ----------
    t : Tapis
        An authenticated Tapis client object (typically created with connect_tapis()).

    jobUuid : str
        The UUID of the Tapis job whose output files you want to inspect or download.

    displayIt : bool or int, optional
        Controls printed output:
            - False or 0: completely silent.
            - True or 1: prints all files in all directories.
            - int >= 2: prints at most `displayIt` files per directory,
                        then indicates suppression.

    target_dir : bool, None, or str, optional
        Determines whether to download files:
            - False or None: does not download files, only lists them.
            - True: downloads files into a default directory './OutFiles_{jobUuid}'.
            - str: downloads files into the specified local directory.

    overwrite : bool, optional
        If True, overwrites existing local files. If False (default), skips already
        existing files.

    display_file_content : bool, optional
        If True (default) and no download is requested, each listed file is shown
        in a collapsible viewer; if False, only the file path is printed.

    Returns
    -------
    dict
        {
            'Nfiles': total number of files found,
            'LocalPath': list of relative paths (like 'results/output.txt'),
            'FullPath': list of absolute Tapis paths (like '/tapis/jobs/v2/...'),
            'Items': list of raw Tapis file objects (metadata)
        }

    Examples
    --------
    # Just list files, print up to 5 per directory
    >>> outputs = get_tapis_job_all_files(t, jobUuid, displayIt=5)

    # List all files without printing anything
    >>> outputs = get_tapis_job_all_files(t, jobUuid, displayIt=False)

    # Download into default './OutFiles_{jobUuid}'
    >>> outputs = get_tapis_job_all_files(t, jobUuid, target_dir=True)

    # Download into a custom directory, overwriting if needed
    >>> outputs = get_tapis_job_all_files(t, jobUuid, target_dir="my_results", overwrite=True)

    Notes
    -----
    - Downloads replicate the Tapis directory structure inside the chosen local folder.
    - Use 'LocalPath' and 'FullPath' together to pair local save paths with original remote locations.
    - The 'Items' list provides full Tapis metadata for each file, which can be useful for logs.
    """
    # Silvia Mazzoni, 2025

    # thresholds you can tune
    TEXTAREA_CHAR_LIMIT = 200_000     # ~200 KB of text
    TEXTAREA_MAX_LINE   = 2_000       # lines wider than this favor <pre>
    
    def _bytes_to_text(data: bytes, path: str = "") -> str | None:
        """Return decoded text, or None if this looks binary."""
        if not isinstance(data, (bytes, bytearray)):
            return str(data)
        lower = path.lower()
        if lower.endswith((
            ".zip",".gz",".bz2",".xz",".tgz",".tar",
            ".png",".jpg",".jpeg",".gif",".pdf",".h5",".npy",".npz",
            ".so",".exe",".xlsx",".pptx",".docx"
        )):
            return None
        if b"\x00" in data:
            return None
        for enc in ("utf-8", "utf-16", "latin-1"):
            try:
                return data.decode(enc)
            except UnicodeDecodeError:
                pass
        return data.decode("utf-8", errors="replace")
    
    def _should_use_pre(text: str, nbytes: int) -> bool:
        """Heuristic chooser: True => HTML <pre>, False => Textarea."""
        if nbytes > TEXTAREA_CHAR_LIMIT:
            return True
        if len(text) > TEXTAREA_CHAR_LIMIT:
            return True
        # Check longest line (cap sample to avoid O(N) on huge files)
        it = iter(text.splitlines())
        longest = 0
        for _ in range(50_000):  # sample first 50k lines max
            try:
                ln = next(it)
            except StopIteration:
                break
            if len(ln) > longest:
                longest = len(ln)
                if longest > TEXTAREA_MAX_LINE:
                    return True
        return False
    
    def view_tapis_file_in_accordion(selected_path):
        view_select_out = widgets.Output()
        acc = widgets.Accordion(children=[view_select_out])
        acc.set_title(0, f" View File: {selected_path or '<none>'}")
        display(acc)
    
        with view_select_out:
            clear_output()
            if not selected_path:
                print(" No output file selected to download.")
                return
    
            data = t.jobs.getJobOutputDownload(jobUuid=jobUuid, outputPath=selected_path)
            text = _bytes_to_text(data, selected_path)
            print(f" Viewing: {selected_path}")
    
            if text is None:
                print(" (binary file; not displayed here)")
                return
    
            nbytes = len(data) if isinstance(data, (bytes, bytearray)) else len(text.encode("utf-8", "ignore"))
            default_view = "pre" if _should_use_pre(text, nbytes) else "ta"
    
            # prepare both widgets once
            html_w = widgets.HTML(
                value=f'<pre style="margin:0;white-space:pre;overflow:auto;max-height:500px;'
                      f'font-family:monospace;">{escape(text)}</pre>'
            )
            ta_w = widgets.Textarea(
                value=text,
                layout=widgets.Layout(width="100%", height="500px"),
                disabled=False
            )
    
            # tiny toggle to let the user switch
            toggle = widgets.ToggleButtons(
                options=[("HTML <pre>", "pre"), ("Textarea", "ta")],
                value=default_view, description="Viewer:"
            )
            holder = widgets.Output()
            display(toggle, holder)
    
            def _render(kind):
                holder.clear_output()
                with holder:
                    display(html_w if kind == "pre" else ta_w)
    
            _render(default_view)
            toggle.observe(lambda ch: (ch["name"] == "value") and _render(ch["new"]), names="value")


    # normalize displayIt
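    if isinstance(displayIt, bool):
        # bool is checked first because bool is a subclass of int
        displayLevel = 1 if displayIt else 0
        displayLimit = None
    elif isinstance(displayIt, int):
        # 0 (or a negative value) is silent, 1 prints everything, >= 2 caps the listing
        displayLevel = 1 if displayIt >= 1 else 0
        displayLimit = displayIt if displayIt >= 2 else None
    else:
        displayLevel = 0
        displayLimit = None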

    if displayLevel >= 1:
        # top-level accordion that holds all printed output for this job
        filedata_out = widgets.Output()
        filedata_accordion = widgets.Accordion(children=[filedata_out])
        filedata_accordion.set_title(0, f'Job Filedata   ({jobUuid})')
        filedata_accordion.selected_index = 0
        display(filedata_accordion)

        with filedata_out:
            print('----------------------------')
            print(f'JOB: {jobUuid}')
            print('----------------------------')
    
    # determine local download dir
    if target_dir is True:
        download_dir = f"./OutFiles_{jobUuid}"
    elif isinstance(target_dir, str):
        download_dir = target_dir
    else:
        download_dir = None  # no download

    if displayLevel >= 1 and download_dir is not None:
        with filedata_out:
            print('----------------------------')
            print(f'TARGET DIR: {download_dir}')
            print('----------------------------')
    view_direct_out = widgets.Output()
    def get_files_recursive(view_direct_out,path=""):
        Nfiles = 0
        returnFiles = []
        returnFilesPath = []
        returnItems = []

        output_path = path if path else "."
        output_items = t.jobs.getJobOutputList(jobUuid=jobUuid, outputPath=output_path)

        # split into dirs vs files
        output_items_dirs = [item for item in output_items if getattr(item, "type", "") == "dir"]
        output_items_files = [item for item in output_items if getattr(item, "type", "") != "dir"]
        output_items_ordered = output_items_files + output_items_dirs

        
        printed_count = 0
        Nstopp = 0
        hereDisplay = True
        if displayLevel >= 1:
            if len(output_items_files)>0:
                firstCase = output_items_files[0].path
                dirr = os.path.dirname(firstCase)
                with view_direct_out:
                    print(f' {dirr}')
            with view_direct_out:
                print(f'  {len(output_items_files)} files & {len(output_items_dirs)} directories:')
            print(f'      {len(output_items_files)} files & {len(output_items_dirs)} directories')

        for item in output_items_ordered:
            remote_path = os.path.join(path, item.name) if path else item.name

            if getattr(item, "type", "") == "dir":
                if displayLevel >= 1:
                    # print('----------------------------')
                    # print(f'DIRECTORY: {remote_path}')
                    # print(f'DIRECTORY: {remote_path}\n{item.path}')
                    view_direct_out = widgets.Output()
                    view_direct_out_acc = widgets.Accordion(children=[view_direct_out])
                    view_direct_out_acc.set_title(0, f"DIRECTORY: {remote_path}")
                    # view_direct_out_acc.selected_index = 0
                    display(view_direct_out_acc)

                Nhere, hereFiles, hereFilesPath, hereItems = get_files_recursive(view_direct_out,remote_path)
                Nfiles += Nhere
                returnFiles.extend(hereFiles)
                returnFilesPath.extend(hereFilesPath)
                returnItems.extend(hereItems)
            else:
                returnFiles.append(remote_path)
                returnFilesPath.append(item.path)
                returnItems.append(item)
                Nfiles += 1

                # print tree
                if displayLevel >= 1 and (displayLimit is None or printed_count < displayLimit):
                    with view_direct_out:
                        if not download_dir:
                            if display_file_content:
                                view_tapis_file_in_accordion(remote_path)
                            else:
                                print(f'    FILE: {remote_path}')
                        printed_count += 1
                        if displayLimit is not None and printed_count == displayLimit:
                            Nstopp = Nfiles

                # download if needed
                if download_dir:
                    # relative target directories are placed under the user's home
                    # directory; os.path.join keeps an absolute download_dir unchanged
                    local_file_path = os.path.join(download_dir, remote_path)
                    homePath = os.path.expanduser('~')
                    local_file_path = os.path.join(homePath, local_file_path)
                    local_dir = os.path.dirname(local_file_path)
                    # create any missing subfolders before writing
                    os.makedirs(local_dir, exist_ok=True)

                    if os.path.exists(local_file_path) and not overwrite:
                        if hereDisplay:
                            print(f"    [SKIP] {local_file_path} (already exists)")
                        continue

                    if hereDisplay:
                        print(f"        [DOWNLOADING] {remote_path} -> {local_file_path}")
                    data = t.jobs.getJobOutputDownload(jobUuid=jobUuid, outputPath=remote_path)
                    with open(local_file_path, "wb") as f:
                        f.write(data)

                if displayLevel >= 1 and hereDisplay and Nstopp != 0:
                    print(f'\n          ........(suppressing additional-file display beyond {displayLimit})')
                    hereDisplay = False

        
        
        return Nfiles, returnFiles, returnFilesPath, returnItems

    if displayLevel >= 1:  # filedata_out only exists when displayLevel >= 1
        with filedata_out:
            print('----------------------------')
            # print('DIRECTORY: "."')
            view_direct_out = widgets.Output()
            view_direct_out_acc = widgets.Accordion(children=[view_direct_out])
            view_direct_out_acc.set_title(0, f'DIRECTORY: "."')
            view_direct_out_acc.selected_index = 0
            display(view_direct_out_acc)
            Nfiles, FileList, FilesPathList, itemsList = get_files_recursive(view_direct_out)
    else:
        Nfiles, FileList, FilesPathList, itemsList = get_files_recursive(view_direct_out)

    if displayLevel >= 1:  # filedata_out only exists when displayLevel >= 1
        with filedata_out:
            print(f"\nA total of {Nfiles} job-output files have been found"
                  f"{' and downloaded' if download_dir else ''}"
                "!")

    return {
        'Nfiles': Nfiles,
        'LocalPath': FileList,
        'FullPath': FilesPathList,
        'Items': itemsList
    }