get_tapis_jobs()

get_tapis_jobs(t, SelectCriteria, displayIt=False, NmaxJobs=500)

This function searches for Tapis jobs on a platform such as DesignSafe, using an authenticated Python Tapis client t, and keeps only the jobs that match the selection criteria you provide. It returns:

a list of matching job UUIDs, and
a DataFrame with the metadata of all matching jobs.

It’s designed to support:

Filtering on time ranges or specific dates for job lifecycle fields (like created, remoteStarted, ended, lastUpdated).
Filtering other metadata fields on an exact value or on a list of accepted values.

It can also optionally print out the results.

How it works, step by step

1. Loads job data.
   - Calls OpsUtils.get_tapis_jobs_df() to get a DataFrame of up to NmaxJobs jobs from Tapis, with full metadata.
2. Loops through the SelectCriteria dictionary, where each key is a field name (such as created, status, or appId) and each value is:
   - a list (for a range or multiple matches), or
   - a single value (for an exact match).
3. Handles time fields specially. For created, remoteStarted, ended, or lastUpdated, it:
   - converts the field to Unix timestamps with OpsUtils.convert_time_unix and stores them in a new {key}_unix column (for example, created_unix), which remains in the returned DataFrame;
   - if you provide a list of two dates, keeps jobs whose timestamp lies between the converted start and end values, bounds included (if convert_time_unix maps a bare date to midnight, jobs later on the end date fall outside the range);
   - if you provide a single date (YYYY-MM-DD), keeps jobs whose timestamp falls on that calendar day in UTC, using a helper {key}_date column;
   - treats a list of any other length as invalid.
4. Handles all other fields as standard filters:
   - a list uses .isin() to match any of the values;
   - a single value must match exactly.
5. Collects the UUIDs of the matching jobs into filtered_uuid.
6. Optionally displays the UUID list and the filtered DataFrame.
7. Returns both:

(filtered_uuid, filtered_df)
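The steps above can be tried without a Tapis connection. The following is a minimal, self-contained approximation of the filtering loop applied to a small hand-made DataFrame. It assumes ISO 8601 timestamps like those shown in the example output, and it uses to_unix, a hypothetical stand-in for OpsUtils.convert_time_unix (whose exact behavior is not shown on this page) that treats a bare 'YYYY-MM-DD' as midnight UTC. The names filter_jobs, TIME_FIELDS, and the sample jobs are illustrative only, not part of OpsUtils.

```python
from datetime import datetime, timezone

import pandas as pd

TIME_FIELDS = {"created", "remoteStarted", "ended", "lastUpdated"}


def to_unix(value):
    """Hypothetical stand-in for OpsUtils.convert_time_unix.

    Parses an ISO 8601 string into Unix seconds; a bare 'YYYY-MM-DD' (or any
    string without an offset) is treated as UTC, so a date means midnight UTC.
    """
    dt = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.timestamp()


def filter_jobs(df, criteria):
    """Apply SelectCriteria-style filters; return (uuid list, filtered DataFrame)."""
    out = df.copy()  # leave the caller's DataFrame untouched
    for key, values in criteria.items():
        if key in TIME_FIELDS:
            unix = out[key].apply(to_unix)  # one Unix timestamp per job
            if isinstance(values, list):
                if len(values) != 2:  # a time range needs exactly [start, end]
                    raise ValueError(f"{key}: expected [start, end], got {values!r}")
                lo, hi = to_unix(values[0]), to_unix(values[1])
                out = out[(unix >= lo) & (unix <= hi)]  # inclusive bounds
            else:
                # Single date: compare each job's UTC calendar day to the target day
                target = datetime.strptime(values, "%Y-%m-%d").date()
                days = unix.apply(lambda x: datetime.fromtimestamp(x, tz=timezone.utc).date())
                out = out[days == target]
        elif isinstance(values, list):
            out = out[out[key].isin(values)]  # match any listed value
        else:
            out = out[out[key] == values]  # exact match
    return list(out["uuid"]), out


# Small hand-made job table (illustrative values only)
jobs = pd.DataFrame({
    "uuid": ["job-a", "job-b", "job-c", "job-d"],
    "status": ["FINISHED", "FAILED", "FINISHED", "RUNNING"],
    "appId": ["opensees-mp", "opensees-mp", "opensees-sp", "opensees-mp"],
    "created": ["2025-06-15T10:23:45Z", "2025-06-20T14:18:02Z",
                "2025-06-21T09:00:00Z", "2025-07-02T08:00:00Z"],
})

uuids, matched = filter_jobs(jobs, {
    "created": ["2025-06-01", "2025-06-30"],
    "status": ["FINISHED", "FAILED"],
    "appId": "opensees-mp",
})
print(uuids)  # ['job-a', 'job-b']
```

Each criterion narrows the result of the previous one, so the order of keys in the dictionary does not change which jobs survive. Calling filter_jobs(jobs, {"created": "2025-06-20"}) instead exercises the single-date branch and keeps only job-b.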

Example use

SelectCriteria = {
    'created': ['2025-06-01', '2025-06-30'],
    'status': ['FINISHED', 'FAILED'],
    'appId': 'opensees-mp'
}

uuids, df = get_tapis_jobs(t, SelectCriteria, displayIt=True)

This would:

Find jobs created between 2025-06-01 and 2025-06-30,
with status either FINISHED or FAILED,
and with appId equal to opensees-mp,
then display the matching UUIDs and job metadata (because displayIt=True).

How it handles dates vs. lists

SelectCriteria value	Behavior
['2025-06-01', '2025-06-30']	Keeps jobs between the two dates (time fields only)
'2025-06-15'	Keeps jobs on that calendar day, in UTC (time fields only)
['FINISHED', 'FAILED']	Keeps jobs matching any of the listed values
'opensees-mp'	Keeps jobs whose value matches exactly (here, appId)
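The single-date row compares calendar days in UTC, because the code converts each {key}_unix value with datetime.fromtimestamp(x, tz=timezone.utc). The short sketch below, using a hypothetical helper utc_day, shows why that matters: a job started late in the evening in a time zone west of UTC is counted under the next UTC day.

```python
from datetime import datetime, timezone


def utc_day(iso_timestamp):
    """Hypothetical helper: UTC calendar date of an ISO 8601 timestamp,
    mirroring the single-date filter's timestamp -> UTC date conversion."""
    dt = datetime.fromisoformat(iso_timestamp.replace("Z", "+00:00"))
    return datetime.fromtimestamp(dt.timestamp(), tz=timezone.utc).date()


# 23:30 on June 15 at UTC-5 is 04:30 on June 16 in UTC
print(utc_day("2025-06-15T23:30:00-05:00"))  # 2025-06-16
print(utc_day("2025-06-15T10:23:45Z"))       # 2025-06-15
```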

Why this is powerful

It lets you run multi-field searches over the jobs retrieved from Tapis (up to NmaxJobs of them), filtering by date ranges, statuses, app IDs, or any other job metadata field in a single call.

This is useful for managing large-scale or repeated computational workflows on platforms like DesignSafe, where picking out specific runs by hand quickly becomes tedious.

How SelectCriteria values work

Field	SelectCriteria value	Behavior
created, remoteStarted, ended, lastUpdated	['YYYY-MM-DD', 'YYYY-MM-DD']	Keeps jobs between the two dates (bounds inclusive)
	'YYYY-MM-DD'	Keeps jobs on that calendar day (UTC)
status, appId, etc.	['val1', 'val2']	Keeps jobs matching any of the values
	'val'	Keeps jobs matching exactly that value
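In get_tapis_jobs, the upper bound of a date range is whatever OpsUtils.convert_time_unix returns for the end date. If that function maps a bare 'YYYY-MM-DD' to midnight (an assumption, not confirmed on this page), jobs created later on the end date are excluded. One common way to cover the whole end day is a half-open interval [start 00:00, day after end 00:00). The sketch below shows this with pandas on ISO 8601 strings; in_date_range is a hypothetical helper, not part of OpsUtils.

```python
from datetime import datetime, timedelta, timezone

import pandas as pd


def in_date_range(timestamps, start_date, end_date):
    """Hypothetical helper: True where an ISO 8601 timestamp falls on any UTC day
    from start_date through end_date, including the whole end day.
    Uses the half-open interval [start 00:00, day after end 00:00)."""
    start = datetime.strptime(start_date, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    stop = datetime.strptime(end_date, "%Y-%m-%d").replace(tzinfo=timezone.utc) + timedelta(days=1)
    ts = pd.to_datetime(timestamps, utc=True)  # parse strings as tz-aware UTC times
    return (ts >= start) & (ts < stop)


created = pd.Series(["2025-05-31T23:59:59Z", "2025-06-01T00:00:00Z",
                     "2025-06-30T18:45:00Z", "2025-07-01T00:00:00Z"])
print(in_date_range(created, "2025-06-01", "2025-06-30").tolist())
# [False, True, True, False]
```

The strict < on the upper bound avoids the off-by-one-second issues of an "end of day at 23:59:59" bound when timestamps carry fractional seconds.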

Example snippet of the returned DataFrame

uuid	name	status	created	appId	…
12ab-34cd-56ef…	job_case_001	FINISHED	2025-06-15T10:23:45Z	opensees-mp	…
78gh-90ij-12kl…	job_case_002	FAILED	2025-06-20T14:18:02Z	opensees-mp	…
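The second return value is an ordinary pandas DataFrame, so standard pandas operations apply to it. The sketch below builds a small stand-in with the same columns as the snippet above (the UUIDs are placeholders, not real job IDs), counts jobs per status, and lists the UUIDs of failed jobs for follow-up.

```python
import pandas as pd

# Stand-in for the filtered DataFrame returned by get_tapis_jobs (placeholder UUIDs)
df = pd.DataFrame({
    "uuid": ["uuid-001", "uuid-002", "uuid-003"],
    "name": ["job_case_001", "job_case_002", "job_case_003"],
    "status": ["FINISHED", "FAILED", "FINISHED"],
    "created": ["2025-06-15T10:23:45Z", "2025-06-20T14:18:02Z", "2025-06-22T08:05:10Z"],
    "appId": ["opensees-mp"] * 3,
})

# Number of jobs per status, most frequent first
counts = df["status"].value_counts()
print(counts.to_dict())  # {'FINISHED': 2, 'FAILED': 1}

# UUIDs of the failed jobs
failed = df.loc[df["status"] == "FAILED", "uuid"].tolist()
print(failed)  # ['uuid-002']
```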

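In short:

Lists are interpreted as [start, end] ranges for date fields, or as sets of accepted values for other fields.
Single values are matched exactly; for date fields, a single 'YYYY-MM-DD' string matches every job on that UTC calendar day.

This makes get_tapis_jobs flexible enough to filter on any combination of time and job metadata fields on Tapis.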

Files

You can find this file in Community Data.

get_tapis_jobs.py

def get_tapis_jobs(t, SelectCriteria, displayIt=False, NmaxJobs=500):
    """
    Filter Tapis jobs based on flexible selection criteria, including time ranges, 
    specific dates, status, appId, or any other metadata field.

    This function pulls a DataFrame of your jobs via get_tapis_jobs_df, then applies
    the filters specified in SelectCriteria to return only the matching jobs.

    Supports:
    ----------
    - Date range filtering (for 'created', 'remoteStarted', 'ended', 'lastUpdated')
      by providing a list like ['YYYY-MM-DD', 'YYYY-MM-DD'].
    - Single date filtering by providing a single 'YYYY-MM-DD' string.
    - Filtering on any other field using exact match or a list of values.

    Parameters
    ----------
    t : Tapis
        An authenticated Tapis client (from connect_tapis()).

    SelectCriteria : dict
        Dictionary where keys are field names and values are:
        - A single value for exact matching, e.g. {'status': 'FINISHED'}
        - A list of values for multiple match, e.g. {'status': ['FINISHED', 'FAILED']}
        - For time fields, a list of two dates for range filtering, 
          or a single date string for exact date matching.

    displayIt : bool, str, default=False
        If True, prints and displays all filtered UUIDs and metadata.
        If 'head', only prints the top of the filtered DataFrame.

    NmaxJobs : int, default=500
        Max number of jobs to retrieve from Tapis before filtering.

    Returns
    -------
    (list, DataFrame)
        A tuple containing:
        - List of UUIDs of jobs that matched the filters.
        - The filtered Pandas DataFrame itself.

    Example
    -------
    SelectCriteria = {
        'created': ['2025-06-01', '2025-06-30'],
        'status': ['FINISHED', 'FAILED'],
        'appId': 'opensees-mp'
    }
    uuids, df = get_tapis_jobs(t, SelectCriteria, displayIt=True)
    """
    # Silvia Mazzoni, 2025
    from datetime import datetime, timezone
    from IPython.display import display
    from OpsUtils import OpsUtils

    # Retrieve up to NmaxJobs jobs, with full metadata, as a DataFrame
    filtered_df = OpsUtils.get_tapis_jobs_df(t, displayIt=False, NmaxJobs=NmaxJobs)

    time_fields = ['created', 'remoteStarted', 'ended', 'lastUpdated']

    for key, values in SelectCriteria.items():
        if key in time_fields:
            # Work on a copy so adding helper columns does not write into a slice
            filtered_df = filtered_df.copy()
            # Unix-timestamp version of the time field, for numeric comparison
            filtered_df[f'{key}_unix'] = filtered_df[key].apply(OpsUtils.convert_time_unix)
            if isinstance(values, list):
                if len(values) != 2:
                    raise ValueError(
                        f"SelectCriteria['{key}'] must be a [start, end] pair, got {values!r}"
                    )
                # Range filter: keep jobs with start <= time <= end
                min_time = OpsUtils.convert_time_unix(values[0])
                max_time = OpsUtils.convert_time_unix(values[1])
                filtered_df = filtered_df[
                    (filtered_df[f'{key}_unix'] >= min_time) &
                    (filtered_df[f'{key}_unix'] <= max_time)
                ]
            else:
                # Single date: keep jobs whose UTC calendar date equals the target date
                filtered_df[f'{key}_date'] = filtered_df[f'{key}_unix'].apply(
                    lambda x: datetime.fromtimestamp(x, tz=timezone.utc).date()
                )
                target_date = datetime.strptime(values, "%Y-%m-%d").date()
                filtered_df = filtered_df[filtered_df[f'{key}_date'] == target_date]
        else:
            if isinstance(values, list):
                # Keep rows whose value is any of the listed values
                filtered_df = filtered_df[filtered_df[key].isin(values)]
            else:
                # Keep rows whose value matches exactly
                filtered_df = filtered_df[filtered_df[key] == values]

    filtered_uuid = list(filtered_df['uuid'])

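    if displayIt:
        print('-- uuid --')
        display(filtered_uuid)
        print('-- Job Metadata --')
        # displayIt='head' shows only the first rows of the filtered DataFrame
        display(filtered_df.head() if displayIt == 'head' else filtered_df)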

    return filtered_uuid, filtered_df