get_tapis_jobs_df()#
get_tapis_jobs_df(t, displayIt=False, NmaxJobs=500)
This function retrieves your jobs from Tapis using the standard Tapis utility getJobList(), then converts them into a pandas DataFrame for easy exploration and filtering.
It provides a tabular, familiar way to inspect your Tapis job metadata, whether you want to quickly list all jobs, filter by app, status, or date, or prepare data for more advanced analysis.
How it works step by step#
1. Retrieves jobs using getJobList()#
jobslist = t.jobs.getJobList(limit=NmaxJobs)
getJobList() is a core Tapis utility that fetches job metadata.
It returns summary-level metadata for your jobs (up to the limit you pass as NmaxJobs), whether they were submitted via the Tapis API or through web portals (like OpenSeesMP on DesignSafe).
This is typically your first step in exploring job data.
We intentionally do not use search criteria on getJobList, because in practice they can be unreliable. Instead, we fetch everything and filter later in pandas.
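As a minimal sketch of the "fetch everything, filter later" approach, the snippet below filters a small hand-made DataFrame by status and appId. The rows are made-up placeholders standing in for what get_tapis_jobs_df would return, and the appId values are hypothetical; only the column names come from the function's own column list.

```python
import pandas as pd

# Hypothetical stand-in for the DataFrame returned by get_tapis_jobs_df(t);
# the values are illustrative placeholders, not real job records.
df = pd.DataFrame(
    {
        "name": ["run-a", "run-b", "run-c"],
        "status": ["FINISHED", "FAILED", "FINISHED"],
        "appId": ["opensees-mp", "opensees-mp", "other-app"],
    }
)

# Filter locally with boolean masks instead of API-side search criteria.
finished = df[df["status"] == "FINISHED"]
finished_opensees = df[(df["status"] == "FINISHED") & (df["appId"] == "opensees-mp")]

print(finished_opensees)
```

Because the filtering happens on a local copy, you can change the criteria and re-run without another round trip to Tapis.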
2. Converts TapisResult objects to dictionaries#
jobsdicts = [job.__dict__ for job in jobslist]
Each returned item is a TapisResult — a custom Python object.
Accessing its .__dict__ attribute exposes the object's fields as a plain Python dictionary, which pandas can consume directly.
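To illustrate the conversion without a live Tapis connection, the sketch below uses types.SimpleNamespace as a stand-in for a TapisResult. That substitution is an assumption made for illustration only; the real class comes from the Tapis client, and the field values here are placeholders.

```python
from types import SimpleNamespace

# SimpleNamespace stands in for a TapisResult object here
# (an assumption for illustration; it is not the real Tapis class).
job = SimpleNamespace(uuid="example-uuid", name="demo-job", status="FINISHED")

# Both forms return the object's attribute dictionary.
as_dict = job.__dict__
same_dict = vars(job)

print(as_dict)
```

vars(obj) is the built-in equivalent of obj.__dict__ and may read more naturally in a list comprehension.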
3. Builds a pandas DataFrame#
import pandas as pd
df = pd.DataFrame(jobsdicts)
Turns your list of plain dictionaries into a rich, tabular dataframe.
4. Adds a numeric index column#
df["index_column"] = df.index
Useful for quickly referencing rows.
5. Reorders columns for convenience#
startCols = ['index_column', 'name', 'uuid', 'status', 'appId', 'appVersion']
Puts the most important metadata up front — if those columns exist.
Uses a safe check (existingStartCols) so it doesn’t break if any are missing.
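The reordering logic can be tried on its own. The sketch below applies the same list comprehensions used in the function to a small illustrative frame that deliberately lacks index_column, appId, and appVersion; the data values and the extra owner column are placeholders.

```python
import pandas as pd

# Illustrative frame; several of the preferred columns are deliberately missing.
df = pd.DataFrame(
    {
        "status": ["FINISHED"],
        "owner": ["someone"],
        "name": ["demo-job"],
        "uuid": ["example-uuid"],
    }
)

startCols = ['index_column', 'name', 'uuid', 'status', 'appId', 'appVersion']

# Keep only the preferred columns that are actually present, in preferred order.
existingStartCols = [col for col in startCols if col in df.columns]
# Everything else follows in its original order.
remainingCols = [col for col in df.columns if col not in existingStartCols]

df = df[existingStartCols + remainingCols]
print(list(df.columns))
```

Missing columns are simply skipped, so no KeyError is raised.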
6. Optionally displays results#
Controlled by the displayIt argument:
True, 'display', 'displayAll', or 'all' shows the entire dataframe.
'head' or 'displayHead' shows just the first few rows (df.head()).
String values are compared case-insensitively.
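One way to think about the displayIt options is as a mapping from the argument to a display mode. The helper below, resolve_display_mode, is hypothetical and not part of get_tapis_jobs_df; it is only a sketch of how the accepted values could be normalized.

```python
def resolve_display_mode(displayIt):
    """Hypothetical helper: map a displayIt value to 'all', 'head', or None."""
    if displayIt is True:
        return "all"
    if isinstance(displayIt, str):
        key = displayIt.lower()  # compare case-insensitively
        if key in ("display", "displayall", "all"):
            return "all"
        if key in ("head", "displayhead"):
            return "head"
    # False, None, or any unrecognized value: show nothing.
    return None


print(resolve_display_mode("displayHead"))
```

Checking isinstance(displayIt, str) before calling .lower() avoids an AttributeError if a non-string value other than True is passed.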
7. Returns the dataframe#
So you can filter it, search by app, group by date, or pass it to another function (like your get_tapis_job).
Why this is useful#
This function gives you a fast, robust way to pull your Tapis job history (up to NmaxJobs jobs) into a familiar pandas DataFrame.
It’s the foundation for:
Finding jobs by status, app, or submission window.
Generating summary tables.
Preparing lists of job UUIDs to drill down with functions like getJob, getJobOutputList, or getJobOutputDownload.
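The sketch below shows what two of these follow-up steps might look like: a summary table of job counts and a list of UUIDs for finished jobs. The DataFrame is a hand-made placeholder, and the uuid and appId values are hypothetical.

```python
import pandas as pd

# Placeholder rows standing in for the output of get_tapis_jobs_df(t).
df = pd.DataFrame(
    {
        "uuid": ["uuid-1", "uuid-2", "uuid-3"],
        "status": ["FINISHED", "FAILED", "FINISHED"],
        "appId": ["app-a", "app-a", "app-b"],
    }
)

# Summary table: number of jobs per app (rows) and status (columns).
summary = df.groupby(["appId", "status"]).size().unstack(fill_value=0)

# UUIDs of finished jobs, ready to pass to per-job calls such as getJob.
finished_uuids = df.loc[df["status"] == "FINISHED", "uuid"].tolist()

print(summary)
print(finished_uuids)
```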
Example usage#
df = get_tapis_jobs_df(t, displayIt='head', NmaxJobs=1000)
Prints the number of jobs found and displays the first five rows of the dataframe, so you can quickly check the result.
Notes on getJobList() vs direct search#
getJobList() is the standard Tapis method to list jobs.
It pulls high-level metadata on all jobs, including:
uuid, name, status, created, appId, and more.
We avoid filtering in the API call itself and instead rely on pandas for local, flexible, reliable filtering.
In short#
get_tapis_jobs_df uses Tapis getJobList() to fetch your jobs (up to NmaxJobs), converts them to a pandas DataFrame, adds an index column and parsed timestamp columns, puts the most important columns up front, and lets you immediately explore or process your job history.
Example process in Python:#
# pull jobs
jobslist = t.jobs.getJobList(limit=500)
# convert to dicts
jobsdicts = [job.__dict__ for job in jobslist]
# build dataframe
import pandas as pd
df = pd.DataFrame(jobsdicts)
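# add index column
df["index_column"] = df.index
# move key columns to the front, if present
startCols = ['index_column', 'name', 'uuid', 'status', 'appId', 'appVersion']
existingStartCols = [col for col in startCols if col in df.columns]
df = df[existingStartCols + [col for col in df.columns if col not in existingStartCols]]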
This gives you a powerful local snapshot of your jobs, ready for filtering, querying, or driving downstream workflows.
Files#
You can find these files in Community Data.
get_tapis_jobs_df.py
def get_tapis_jobs_df(t, displayIt=False, NmaxJobs=500):
    """
    Retrieve a list of jobs from Tapis and organize them into a Pandas DataFrame.

    This function fetches up to NmaxJobs from the user's Tapis account, converts the
    results into a structured DataFrame, adds a convenient index column, and moves key
    metadata columns (like name, uuid, status) to the front for easier exploration.
    It can also optionally display the DataFrame (entire or just the head) right in
    the notebook for quick inspection.

    Parameters
    ----------
    t : Tapis
        An authenticated Tapis client (from connect_tapis()).
    displayIt : bool or str, default=False
        If 'head' or 'displayHead', displays only the first few rows.
        If True, 'display', 'displayAll', or 'all', displays the entire DataFrame.
        If False, no display output (just returns the DataFrame).
        String values are compared case-insensitively.
    NmaxJobs : int, default=500
        Maximum number of jobs to retrieve from Tapis.

    Returns
    -------
    pandas.DataFrame
        DataFrame containing metadata for the fetched jobs.

    Example
    -------
    df = get_tapis_jobs_df(t, displayIt='head', NmaxJobs=1000)
    """
    # Silvia Mazzoni, 2025
    from datetime import datetime, timezone
    import pandas as pd

    # Get jobs from Tapis
    jobslist = t.jobs.getJobList(limit=NmaxJobs)

    # Convert TapisResult objects to dictionaries
    jobsdicts = [job.__dict__ for job in jobslist]

    # Build DataFrame
    df = pd.DataFrame(jobsdicts)

    # Add index column for convenience
    df["index_column"] = df.index

    # Add parsed datetime, Unix-timestamp, and date columns for each time field
    for thisK in ['created', 'remoteStarted', 'ended', 'lastUpdated']:
        df[f'{thisK}_dt'] = pd.to_datetime(df[thisK], utc=True)
        df[f'{thisK}_unix'] = df[f'{thisK}_dt'].astype('int64') // 10**9
        df[f'{thisK}_date'] = df[f'{thisK}_unix'].apply(
            lambda x: datetime.fromtimestamp(x, tz=timezone.utc).date()
        )

    # Reorder columns: put key ones first if they exist
    startCols = ['index_column', 'name', 'uuid', 'status', 'appId', 'appVersion']
    existingStartCols = [col for col in startCols if col in df.columns]
    remainingCols = [col for col in df.columns if col not in existingStartCols]
    columns = existingStartCols + remainingCols
    df = df[columns]

    # Optional display logic
    if displayIt != False:
        # Imported here so the function still works without IPython when nothing is displayed
        from IPython.display import display
        print(f'Found {len(df)} jobs')
        # Compare against lowercase names, since displayIt is lowercased first
        if displayIt in [True] or displayIt.lower() in ['display', 'displayall', 'all']:
            display(df)
        elif displayIt.lower() in ['head', 'displayhead']:
            display(df.head())

    return df
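The parsed timestamp columns added by the function (for example created_dt) make it straightforward to select jobs from a submission window. The sketch below rebuilds such a column on placeholder data; the timestamps are illustrative, and the ISO 8601 string format shown for created is an assumption about what Tapis returns.

```python
import pandas as pd

# Placeholder rows; the 'created' strings are illustrative, not real job data.
df = pd.DataFrame(
    {
        "name": ["job-1", "job-2", "job-3"],
        "created": [
            "2025-01-05T10:00:00Z",
            "2025-02-10T12:30:00Z",
            "2025-03-01T08:15:00Z",
        ],
    }
)

# Same parsing step the function uses to build the *_dt columns.
df["created_dt"] = pd.to_datetime(df["created"], utc=True)

# Half-open window [start, end): everything created during February 2025 (UTC).
start = pd.Timestamp("2025-02-01", tz="UTC")
end = pd.Timestamp("2025-03-01", tz="UTC")
in_window = df[(df["created_dt"] >= start) & (df["created_dt"] < end)]

print(in_window)
```

Using timezone-aware bounds keeps the comparison consistent with the UTC-aware created_dt column.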