# get_tapis_jobs()

`get_tapis_jobs(t, SelectCriteria, displayIt=False, NmaxJobs=500)`

This function retrieves up to `NmaxJobs` of your Tapis jobs on a platform such as DesignSafe (through the authenticated Python Tapis client `t`) and filters them locally according to the selection criteria you provide. It returns:

- a list of the matching job UUIDs, and
- a dataframe containing the metadata of all matching jobs.

It is designed to support:

- filtering on time ranges or specific dates for job lifecycle fields (`created`, `remoteStarted`, `ended`, `lastUpdated`), and
- filtering on exact matches, or on any of several values, for all other metadata fields.

It can also optionally display the results.
## How it works, step by step

1. **Loads job data.** Calls the helper `OpsUtils.get_tapis_jobs_df()` to retrieve a dataframe of up to `NmaxJobs` jobs from Tapis, with full metadata.
2. **Loops through `SelectCriteria`.** Each key is a field name (such as `created`, `status`, or `appId`) and each value is either:
   - a list (for ranges or multiple matches), or
   - a single value (for exact matching).
3. **Handles time fields specially.** For `created`, `remoteStarted`, `ended`, or `lastUpdated`, it converts the field to Unix timestamps with `OpsUtils.convert_time_unix` and then:
   - if you provide a list of two dates, keeps the jobs between them (inclusive on the converted timestamps; a list of any other length is rejected);
   - if you provide a single date (`YYYY-MM-DD`), keeps the jobs whose timestamp falls on that calendar day in UTC.
4. **Handles other fields as standard filters.**
   - If you give a list, it uses `.isin()` to match any of the values.
   - If you give a single value, it matches exactly.
5. **Collects the UUIDs** of the remaining jobs into `filtered_uuid`. Because each criterion is applied to the result of the previous one, the criteria combine with a logical AND.
6. **Optionally displays** the UUID list and the filtered dataframe.
7. **Returns** both as a tuple: `(filtered_uuid, filtered_df)`.
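The steps above can be sketched without Tapis, pandas, or `OpsUtils`. This is a minimal stdlib-only sketch, assuming each job is a plain dict with ISO 8601 timestamp strings and that a bare date such as `'2025-06-01'` means 00:00 UTC (an assumption about `convert_time_unix`, whose implementation is not shown here). `filter_jobs`, `to_unix`, and the sample records are hypothetical.

```python
from datetime import datetime, timezone

TIME_FIELDS = {"created", "remoteStarted", "ended", "lastUpdated"}


def to_unix(value):
    """Hypothetical stand-in for OpsUtils.convert_time_unix.

    Assumes ISO 8601 strings; naive values (including bare dates) are read as UTC.
    """
    dt = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.timestamp()


def filter_jobs(jobs, criteria):
    """Apply each criterion in turn (logical AND) and return (uuids, jobs)."""
    selected = list(jobs)
    for key, values in criteria.items():
        if key in TIME_FIELDS:
            if isinstance(values, list):
                if len(values) != 2:
                    raise ValueError(f"{key}: expected two dates, got {values!r}")
                lo, hi = to_unix(values[0]), to_unix(values[1])
                # Inclusive range on the converted timestamps.
                selected = [j for j in selected if lo <= to_unix(j[key]) <= hi]
            else:
                # Single date: compare the UTC calendar day of each job.
                target = datetime.strptime(values, "%Y-%m-%d").date()
                selected = [
                    j for j in selected
                    if datetime.fromtimestamp(to_unix(j[key]), tz=timezone.utc).date() == target
                ]
        elif isinstance(values, list):
            selected = [j for j in selected if j[key] in values]  # like .isin()
        else:
            selected = [j for j in selected if j[key] == values]  # exact match
    return [j["uuid"] for j in selected], selected


# Hypothetical sample records for illustration only.
jobs = [
    {"uuid": "a1", "status": "FINISHED", "appId": "opensees-mp", "created": "2025-06-15T10:23:45Z"},
    {"uuid": "b2", "status": "FAILED", "appId": "opensees-mp", "created": "2025-06-20T14:18:02Z"},
    {"uuid": "c3", "status": "FINISHED", "appId": "opensees-sp", "created": "2025-06-21T09:00:00Z"},
    {"uuid": "d4", "status": "RUNNING", "appId": "opensees-mp", "created": "2025-07-02T08:00:00Z"},
]
criteria = {
    "created": ["2025-06-01", "2025-06-30"],
    "status": ["FINISHED", "FAILED"],
    "appId": "opensees-mp",
}
uuids, matched = filter_jobs(jobs, criteria)
print(uuids)  # ['a1', 'b2']
```

Successive narrowing is what makes the criteria act as an AND: `c3` passes the date and status filters but is dropped by the `appId` filter.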
## Example use

```python
SelectCriteria = {
    'created': ['2025-06-01', '2025-06-30'],
    'status': ['FINISHED', 'FAILED'],
    'appId': 'opensees-mp'
}
uuids, df = get_tapis_jobs(t, SelectCriteria, displayIt=True)
```

Among the jobs retrieved, this would select those that:

- were created between 2025-06-01 and 2025-06-30 (inclusive on the converted timestamps),
- have a status of either `FINISHED` or `FAILED`, and
- were submitted to the `opensees-mp` app.
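Whether jobs created late on 2025-06-30 are included depends on how `convert_time_unix` turns the end date into a timestamp, which is not shown here. The sketch below assumes a bare date maps to 00:00 UTC; in that case the upper bound is the very start of June 30, and a job created at 14:00 that day falls outside the range. `date_to_unix` is a hypothetical stand-in.

```python
from datetime import datetime, timezone


def date_to_unix(text):
    """Hypothetical stand-in: parse 'YYYY-MM-DD' (or an ISO time) as UTC seconds."""
    dt = datetime.fromisoformat(text.replace("Z", "+00:00"))
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # assume UTC when no offset is given
    return dt.timestamp()


job_created = date_to_unix("2025-06-30T14:00:00Z")

# End bound as written in the example: midnight at the start of June 30.
in_range = date_to_unix("2025-06-01") <= job_created <= date_to_unix("2025-06-30")
print(in_range)  # False under this assumption

# One workaround: end the range at the start of the following day.
in_range_widened = date_to_unix("2025-06-01") <= job_created <= date_to_unix("2025-07-01")
print(in_range_widened)  # True
```

Because the comparison is inclusive, a `'2025-07-01'` end bound also admits a job stamped exactly at 00:00:00 UTC on July 1.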
## How it handles dates vs. lists

| `SelectCriteria` value | Behavior |
|---|---|
| `['2025-06-01', '2025-06-30']` (time field) | Filter between two dates (time range) |
| `'2025-06-15'` (time field) | Match jobs on that calendar date (UTC) |
| `['FINISHED', 'FAILED']` | Match any of these statuses |
| `'opensees-mp'` | Exact match on `appId` |
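A single date is compared against the UTC calendar day of each timestamp; the function uses `datetime.fromtimestamp(x, tz=timezone.utc).date()` for this. For a job submitted in the evening in a US time zone, that UTC day can be the next calendar day. The sketch below mirrors the comparison with the standard library; the timestamp is a hypothetical value.

```python
from datetime import datetime, timezone

target = datetime.strptime("2025-06-15", "%Y-%m-%d").date()

# 21:30 on June 15 at UTC-05:00 is 02:30 on June 16 in UTC.
local_evening = datetime.fromisoformat("2025-06-15T21:30:00-05:00")
utc_day = datetime.fromtimestamp(local_evening.timestamp(), tz=timezone.utc).date()

print(utc_day)            # 2025-06-16
print(utc_day == target)  # False: this job is not matched by '2025-06-15'
```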
## Why this is powerful

It lets you run multi-field searches over your Tapis job history, filtering by date ranges, statuses, app IDs, or any other job metadata, in a single call. Only the jobs retrieved (at most `NmaxJobs`) are searched, so increase `NmaxJobs` if older jobs are missing from the results.

This is useful for managing large-scale or repeated computational workflows on platforms such as DesignSafe.
## How SelectCriteria values work

| Field | `SelectCriteria` value | Behavior |
|---|---|---|
| `created`, `remoteStarted`, `ended`, `lastUpdated` | `['YYYY-MM-DD', 'YYYY-MM-DD']` | Filters jobs between two dates (inclusive) |
| | `'YYYY-MM-DD'` | Filters jobs on that exact day |
| `status`, `appId`, etc. | `['val1', 'val2']` | Filters jobs matching any of the values |
| | `'val'` | Filters jobs matching exactly that value |
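For non-time fields, the function checks `isinstance(values, list)`: only a Python list triggers the any-of (`.isin()`) behavior, and anything else, including a tuple, falls through to the exact-match branch, which does not perform an any-of match. The stdlib sketch below mirrors that rule with a hypothetical `matches` helper so the difference is visible.

```python
def matches(field_value, criterion):
    """Hypothetical mirror of the non-time branch of get_tapis_jobs."""
    if isinstance(criterion, list):
        return field_value in criterion  # any-of, like Series.isin()
    return field_value == criterion      # exact match for everything else


print(matches("FAILED", ["FINISHED", "FAILED"]))   # True
print(matches("FAILED", "FAILED"))                 # True
print(matches("FAILED", ("FINISHED", "FAILED")))   # False: a tuple is not a list
print(matches("FINISHED", "finished"))             # False: matching is case-sensitive
```

Passing a list, even for a single value (`['FINISHED']`), keeps the intent unambiguous.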
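## Example snippet of the returned dataframe

| uuid | name | status | created | appId | … |
|---|---|---|---|---|---|
| 12ab-34cd-56ef… | job_case_001 | FINISHED | 2025-06-15T10:23:45Z | opensees-mp | … |
| 78gh-90ij-12kl… | job_case_002 | FAILED | 2025-06-20T14:18:02Z | opensees-mp | … |

When a time field is used as a criterion, the returned dataframe also carries the helper columns added during filtering, such as `created_unix` (and `created_date` for a single-date filter).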
In short:

- **Lists** are interpreted as ranges for time fields, or as multiple acceptable values for other fields.
- **Single values** are exact matches, except that a single date on a time field matches the whole UTC calendar day.

Together, these rules let you filter on any combination of time and job metadata in Tapis.
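Because the same list syntax means different things for time and non-time fields, it can help to preview how a `SelectCriteria` dict will be read before querying Tapis. `describe_criteria` below is a hypothetical helper, not part of `OpsUtils`, that follows the rules summarized above.

```python
TIME_FIELDS = ("created", "remoteStarted", "ended", "lastUpdated")


def describe_criteria(criteria):
    """Return one plain-English line per criterion, following get_tapis_jobs' rules."""
    lines = []
    for key, value in criteria.items():
        if key in TIME_FIELDS and isinstance(value, list):
            if len(value) != 2:
                lines.append(f"{key}: invalid, a time range needs exactly two dates")
            else:
                lines.append(f"{key} between {value[0]} and {value[1]}")
        elif key in TIME_FIELDS:
            lines.append(f"{key} on {value} (UTC day)")
        elif isinstance(value, list):
            lines.append(f"{key} is any of {value}")
        else:
            lines.append(f"{key} equals {value!r}")
    return lines


preview = describe_criteria({
    "created": ["2025-06-01", "2025-06-30"],
    "ended": "2025-06-15",
    "status": ["FINISHED", "FAILED"],
    "appId": "opensees-mp",
})
for line in preview:
    print(line)
# created between 2025-06-01 and 2025-06-30
# ended on 2025-06-15 (UTC day)
# status is any of ['FINISHED', 'FAILED']
# appId equals 'opensees-mp'
```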
## Files

You can find these files in Community Data.

**get_tapis_jobs.py**
```python
def get_tapis_jobs(t, SelectCriteria, displayIt=False, NmaxJobs=500):
    """
    Filter Tapis jobs based on flexible selection criteria, including time ranges,
    specific dates, status, appId, or any other metadata field.

    This function pulls a DataFrame of your jobs via get_tapis_jobs_df, then applies
    the filters specified in SelectCriteria to return only the matching jobs.

    Supports
    --------
    - Date range filtering (for 'created', 'remoteStarted', 'ended', 'lastUpdated')
      by providing a list like ['YYYY-MM-DD', 'YYYY-MM-DD'].
    - Single date filtering by providing a single 'YYYY-MM-DD' string.
    - Filtering on any other field using exact match or a list of values.

    Parameters
    ----------
    t : Tapis
        An authenticated Tapis client (from connect_tapis()).
    SelectCriteria : dict
        Dictionary where keys are field names and values are:
        - A single value for exact matching, e.g. {'status': 'FINISHED'}
        - A list of values for multiple match, e.g. {'status': ['FINISHED', 'FAILED']}
        - For time fields, a list of two dates for range filtering,
          or a single date string for exact date matching.
    displayIt : bool or str, default=False
        If True, prints and displays all filtered UUIDs and metadata.
        If 'head', displays only the first rows of the filtered DataFrame.
    NmaxJobs : int, default=500
        Max number of jobs to retrieve from Tapis before filtering.

    Returns
    -------
    (list, DataFrame)
        A tuple containing:
        - List of UUIDs of jobs that matched the filters.
        - The filtered Pandas DataFrame itself.

    Raises
    ------
    ValueError
        If a time field is given a list that does not contain exactly two dates.

    Example
    -------
    SelectCriteria = {
        'created': ['2025-06-01', '2025-06-30'],
        'status': ['FINISHED', 'FAILED'],
        'appId': 'opensees-mp'
    }
    uuids, df = get_tapis_jobs(t, SelectCriteria, displayIt=True)
    """
    # Silvia Mazzoni, 2025
    from datetime import datetime, timezone

    from IPython.display import display
    from OpsUtils import OpsUtils

    time_fields = ['created', 'remoteStarted', 'ended', 'lastUpdated']

    # Retrieve up to NmaxJobs jobs from Tapis; all filtering below is done locally.
    filtered_df = OpsUtils.get_tapis_jobs_df(t, displayIt=False, NmaxJobs=NmaxJobs)

    for key, values in SelectCriteria.items():
        if key in time_fields:
            # Helper column with Unix timestamps for numeric comparison.
            filtered_df[f'{key}_unix'] = filtered_df[key].apply(OpsUtils.convert_time_unix)
            if isinstance(values, list):
                if len(values) != 2:
                    raise ValueError(
                        f"'{key}' range must be [start_date, end_date], got {values!r}"
                    )
                min_time = OpsUtils.convert_time_unix(values[0])
                max_time = OpsUtils.convert_time_unix(values[1])
                # Inclusive on both converted timestamps.
                mask = filtered_df[f'{key}_unix'].between(min_time, max_time)
            else:
                # Single date: compare each job's UTC calendar date with the target day.
                filtered_df[f'{key}_date'] = filtered_df[f'{key}_unix'].apply(
                    lambda x: datetime.fromtimestamp(x, tz=timezone.utc).date()
                )
                target_date = datetime.strptime(values, "%Y-%m-%d").date()
                mask = filtered_df[f'{key}_date'] == target_date
        elif isinstance(values, list):
            # Any-of match for non-time fields.
            mask = filtered_df[key].isin(values)
        else:
            # Exact match for non-time fields.
            mask = filtered_df[key] == values
        # Copy so that columns added in later iterations do not trigger
        # pandas' SettingWithCopyWarning on a slice.
        filtered_df = filtered_df[mask].copy()

    filtered_uuid = list(filtered_df['uuid'])

    if displayIt == 'head':
        display(filtered_df.head())
    elif displayIt:
        print('-- uuid --')
        display(filtered_uuid)
        print('-- Job Metadata --')
        display(filtered_df)

    return filtered_uuid, filtered_df
```
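`get_tapis_jobs` only discovers a malformed criterion after the jobs have been fetched from Tapis. A pre-check such as the hypothetical `validate_criteria` below (not part of the Community Data files) can catch a wrong-length time range, or a single date that is not in `YYYY-MM-DD` form, before contacting Tapis. It assumes the same four time fields as the function and deliberately does not check the format of range endpoints, since what `convert_time_unix` accepts is not shown here.

```python
from datetime import datetime

TIME_FIELDS = ("created", "remoteStarted", "ended", "lastUpdated")


def validate_criteria(criteria):
    """Raise ValueError for criteria that get_tapis_jobs would reject or fail on."""
    if not isinstance(criteria, dict):
        raise ValueError("SelectCriteria must be a dict")
    for key, value in criteria.items():
        if key not in TIME_FIELDS:
            continue  # non-time fields accept any single value or list
        if isinstance(value, list):
            if len(value) != 2:
                raise ValueError(f"{key}: a range needs exactly two dates, got {value!r}")
        else:
            try:
                # Single dates are parsed with this exact format by get_tapis_jobs.
                datetime.strptime(value, "%Y-%m-%d")
            except (TypeError, ValueError) as exc:
                raise ValueError(f"{key}: expected 'YYYY-MM-DD', got {value!r}") from exc


validate_criteria({"created": ["2025-06-01", "2025-06-30"], "status": "FINISHED"})  # passes
try:
    validate_criteria({"created": "06/15/2025"})
except ValueError as err:
    print(err)  # created: expected 'YYYY-MM-DD', got '06/15/2025'
```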