get_tapis_jobs_df()


get_tapis_jobs_df(t, displayIt=False, NmaxJobs=500)

This function retrieves your jobs from Tapis using the standard Tapis utility getJobList(), then converts them into a pandas DataFrame for easy exploration and filtering.

It provides a tabular, familiar way to inspect your Tapis job metadata, whether you want to quickly list all jobs, filter by app, status, or date, or prepare data for more advanced analysis.

How it works step by step

1. Retrieves jobs using `getJobList()`

jobslist = t.jobs.getJobList(limit=NmaxJobs)

getJobList() is a core Tapis utility that fetches job metadata.
It pulls high-level info about all your jobs, whether they were submitted via the Tapis API or through web portals (like OpenSeesMP on DesignSafe).
This is typically your first step in exploring job data.
We intentionally do not pass search criteria to getJobList(), because in practice they can be unreliable. Instead, we fetch up to NmaxJobs jobs and filter afterward in pandas.

2. Converts TapisResult objects to dictionaries

jobsdicts = [job.__dict__ for job in jobslist]

Each returned item is a TapisResult, a custom Python object.
Accessing its `__dict__` attribute exposes the object's fields as a plain Python dictionary, which pandas can consume directly.
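To try this step without a live Tapis connection, here is a minimal sketch that uses `types.SimpleNamespace` as a stand-in for TapisResult. The stand-in and the field values are assumptions for illustration; the sketch relies only on the fact that both kinds of object keep their fields as instance attributes.

```python
from types import SimpleNamespace

# Stand-in for the TapisResult objects returned by t.jobs.getJobList().
# Field names follow this page; the values are made up.
jobslist = [
    SimpleNamespace(uuid="uuid-0001", name="job-a", status="FINISHED"),
    SimpleNamespace(uuid="uuid-0002", name="job-b", status="FAILED"),
]

# __dict__ is an attribute, not a method call: it holds each object's fields
jobsdicts = [job.__dict__ for job in jobslist]

print(jobsdicts[0])
```

The built-in `vars(job)` returns the same dictionary and can be used interchangeably.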

3. Builds a pandas DataFrame

import pandas as pd
df = pd.DataFrame(jobsdicts)

Turns the list of dictionaries into a DataFrame with one row per job and one column per metadata field.
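As a small illustration of how pandas handles a list of dictionaries, the sketch below uses made-up job records in which one record lacks a field. The records are assumptions, not real Tapis output.

```python
import pandas as pd

# Made-up job dictionaries; the second one has no 'appVersion' key
jobsdicts = [
    {"uuid": "uuid-0001", "name": "job-a", "status": "FINISHED", "appVersion": "1.0"},
    {"uuid": "uuid-0002", "name": "job-b", "status": "FAILED"},
]

df = pd.DataFrame(jobsdicts)

# One row per dictionary, one column per distinct key;
# keys missing from a record are filled with NaN
print(df.shape)
print(df.columns.tolist())
print(df["appVersion"].isna().tolist())
```

Because missing keys become NaN rather than raising an error, records with slightly different fields can still share one table.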

4. Adds a numeric index column

df["index_column"] = df.index

Useful for quickly referencing rows.

5. Reorders columns for convenience

startCols = ['index_column', 'name', 'uuid', 'status', 'appId', 'appVersion']

Puts the most important metadata up front, if those columns exist.
The function first filters startCols down to the columns actually present (existingStartCols), so the reorder does not fail if any are missing.
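The reorder logic can be tried on a small made-up DataFrame. The sketch below follows the same pattern as the function's source; the sample data is an assumption and deliberately omits appId and appVersion to exercise the safe check.

```python
import pandas as pd

# Made-up job table without 'appId' or 'appVersion'
df = pd.DataFrame(
    {
        "created": ["2025-01-01T00:00:00Z", "2025-01-02T00:00:00Z"],
        "status": ["FINISHED", "FAILED"],
        "uuid": ["uuid-0001", "uuid-0002"],
        "name": ["job-a", "job-b"],
    }
)
df["index_column"] = df.index

startCols = ['index_column', 'name', 'uuid', 'status', 'appId', 'appVersion']
# Keep only the preferred columns that are actually present
existingStartCols = [col for col in startCols if col in df.columns]
# Everything else keeps its original relative order
remainingCols = [col for col in df.columns if col not in existingStartCols]
df = df[existingStartCols + remainingCols]

print(df.columns.tolist())
```

Indexing with `df[startCols]` directly would raise a KeyError here, which is why the membership check comes first.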

6. Optionally displays results

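Controlled by the displayIt argument (string values are case-insensitive):
- False (the default) displays nothing; the dataframe is only returned.
- True, 'display', 'displayAll', or 'all' shows the entire dataframe.
- 'head' or 'displayHead' shows just the first few rows.
Whenever display is enabled, the function also prints the number of jobs found.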
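One way to reason about these options is to normalize them to a single mode before deciding what to show. The helper below, `resolve_display_mode`, is hypothetical and not part of get_tapis_jobs_df; it is a sketch of the mapping, including the extra 'display' and 'all' spellings that the function's source accepts.

```python
def resolve_display_mode(displayIt):
    """Hypothetical helper: map a displayIt value to 'all', 'head', or None."""
    if displayIt is True:
        return "all"
    if isinstance(displayIt, str):
        option = displayIt.lower()  # compare case-insensitively
        if option in ("display", "displayall", "all"):
            return "all"
        if option in ("head", "displayhead"):
            return "head"
    # False, None, and unrecognized values mean "do not display"
    return None


for value in (True, "displayAll", "displayHead", "head", False):
    print(repr(value), "->", resolve_display_mode(value))
```

Lowercasing both the input and the accepted spellings keeps mixed-case options such as 'displayHead' from silently falling through.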

7. Returns the dataframe

So you can filter it, search by app, group by date, or pass it to another function (such as get_tapis_job).

Why this is useful

This function gives you a fast way to pull your Tapis job history (up to NmaxJobs jobs) into a familiar pandas DataFrame.
It's the foundation for:
- Finding jobs by status, app, or submission window.
- Generating summary tables.
- Preparing lists of job UUIDs to drill down with functions like getJob, getJobOutputList, or getJobOutputDownload.
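As a sketch of two of these uses, the example below builds a summary table and a UUID list from a made-up DataFrame that stands in for the output of get_tapis_jobs_df. The app IDs, UUIDs, and statuses are assumptions for illustration.

```python
import pandas as pd

# Made-up stand-in for the DataFrame returned by get_tapis_jobs_df()
df = pd.DataFrame(
    {
        "uuid": ["uuid-0001", "uuid-0002", "uuid-0003", "uuid-0004"],
        "status": ["FINISHED", "FAILED", "FINISHED", "RUNNING"],
        "appId": ["app-a", "app-a", "app-b", "app-a"],
    }
)

# Summary table: number of jobs per app (rows) and status (columns)
summary = df.groupby(["appId", "status"]).size().unstack(fill_value=0)
print(summary)

# UUIDs of finished jobs, ready to loop over with getJob or getJobOutputList
finished_uuids = df.loc[df["status"] == "FINISHED", "uuid"].tolist()
print(finished_uuids)
```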

Example usage

df = get_tapis_jobs_df(t, displayIt='head', NmaxJobs=1000)

Prints the top of the dataframe so you can quickly see your recent jobs.

Notes on getJobList() vs direct search

getJobList() is the standard Tapis method to list jobs.
It pulls high-level metadata on all jobs, including:
- uuid, name, status, created, appId, and more.
We avoid filtering in the API call itself and instead rely on pandas for local, flexible, reliable filtering.
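Local filtering might look like the sketch below, which selects finished jobs created inside a date window. The sample rows are made up, and the `created` values are assumed to be ISO 8601 strings in UTC.

```python
import pandas as pd

# Made-up stand-in for the job table; 'created' is assumed to be an ISO 8601 string
df = pd.DataFrame(
    {
        "uuid": ["uuid-0001", "uuid-0002", "uuid-0003"],
        "status": ["FINISHED", "FAILED", "FINISHED"],
        "created": [
            "2025-01-05T10:00:00Z",
            "2025-02-10T12:30:00Z",
            "2025-03-01T08:15:00Z",
        ],
    }
)

# Parse the strings into timezone-aware datetimes so they can be compared
df["created_dt"] = pd.to_datetime(df["created"], utc=True)

# Half-open window: start is included, end is excluded
start = pd.Timestamp("2025-02-01", tz="UTC")
end = pd.Timestamp("2025-04-01", tz="UTC")
in_window = (df["created_dt"] >= start) & (df["created_dt"] < end)

recent_finished = df[in_window & (df["status"] == "FINISHED")]
print(recent_finished["uuid"].tolist())
```

Combining boolean masks with `&` keeps each condition readable and easy to change without another round trip to the API.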

In short

get_tapis_jobs_df uses Tapis getJobList() to fetch your jobs (up to NmaxJobs), converts them to a pandas DataFrame, puts the most important columns up front, and lets you immediately explore or process your job history.

Example process in Python:

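import pandas as pd

# pull jobs (t is an authenticated Tapis client)
jobslist = t.jobs.getJobList(limit=500)
# convert TapisResult objects to dicts
jobsdicts = [job.__dict__ for job in jobslist]
# build dataframe
df = pd.DataFrame(jobsdicts)
# add index column
df["index_column"] = df.index
# move key columns to the front, keeping only those that exist
startCols = ['index_column', 'name', 'uuid', 'status', 'appId', 'appVersion']
existingStartCols = [col for col in startCols if col in df.columns]
remainingCols = [col for col in df.columns if col not in existingStartCols]
df = df[existingStartCols + remainingCols]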
This gives you a powerful local snapshot of your jobs, ready for filtering, querying, or driving downstream workflows.

Files

You can find these files in Community Data.

get_tapis_jobs_df.py

def get_tapis_jobs_df(t, displayIt=False, NmaxJobs=500):
    """
    Retrieve a list of jobs from Tapis and organize them into a Pandas DataFrame.

    This function fetches up to NmaxJobs from the user's Tapis account, converts the 
    results into a structured DataFrame, adds a convenient index column, and moves key 
    metadata columns (like name, uuid, status) to the front for easier exploration.

    It can also optionally display the DataFrame (entire or just the head) right in 
    the notebook for quick inspection.

    Parameters
    ----------
    t : Tapis
        An authenticated Tapis client (from connect_tapis()).

    displayIt : bool or str, default=False
        If 'head' or 'displayHead', displays only the first few rows.
        If True or 'displayAll', displays the entire DataFrame.
        If False, no display output (just returns the DataFrame).

    NmaxJobs : int, default=500
        Maximum number of jobs to retrieve from Tapis.

    Returns
    -------
    pandas.DataFrame
        DataFrame containing metadata for the fetched jobs.

    Example
    -------
    df = get_tapis_jobs_df(t, displayIt='head', NmaxJobs=1000)
    """
    # Silvia Mazzoni, 2025

    from datetime import datetime, timezone
    import pandas as pd

    # Get jobs from Tapis
    jobslist = t.jobs.getJobList(limit=NmaxJobs)
    
    # Convert TapisResult objects to dictionaries
    jobsdicts = [job.__dict__ for job in jobslist]
    
    # Build DataFrame
    df = pd.DataFrame(jobsdicts)
    
    # Add index column for convenience
    df["index_column"] = df.index
    
    # add formatted data (skip any timestamp field missing from the response,
    # e.g. when the job list is empty)
    for thisK in ['created','remoteStarted', 'ended','lastUpdated']:
        if thisK not in df.columns:
            continue
        df[f'{thisK}_dt'] = pd.to_datetime(df[thisK], utc=True)
        df[f'{thisK}_unix'] = df[f'{thisK}_dt'].astype('int64') // 10**9
        df[f'{thisK}_date'] = df[f'{thisK}_unix'].apply(
                    lambda x: datetime.fromtimestamp(x, tz=timezone.utc).date()
                )

    
    # Reorder columns: put key ones first if they exist
    startCols = ['index_column', 'name', 'uuid', 'status', 'appId', 'appVersion']
    existingStartCols = [col for col in startCols if col in df.columns]
    remainingCols = [col for col in df.columns if col not in existingStartCols]
    columns = existingStartCols + remainingCols
    df = df[columns]
    
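    # Optional display logic
    if displayIt:
        # Import explicitly rather than relying on IPython injecting display()
        from IPython.display import display
        print(f'Found {len(df)} jobs')

        # True means "show everything"; string options are case-insensitive
        mode = 'all' if isinstance(displayIt, bool) else str(displayIt).lower()
        if mode in ['display', 'displayall', 'all']:
            display(df)
        elif mode in ['head', 'displayhead']:
            display(df.head())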
    
    return df