get_tapis_jobs()

get_tapis_jobs(t, SelectCriteria, displayIt=False, NmaxJobs=500)

This function searches your Tapis jobs on a platform such as DesignSafe (through the authenticated Python Tapis client t), using selection criteria you provide. It returns:

  • a list of matching job UUIDs, and

  • a pandas DataFrame with the metadata of all matching jobs.

It’s designed to support:

  • Filtering on time ranges or specific dates for the job lifecycle fields created, remoteStarted, ended, and lastUpdated.

  • Filtering on exact matches, or on any of several listed values, for other metadata fields.

It can also optionally display the results (displayIt).


How it works, step by step

  1. Loads job data

    • Calls the utility OpsUtils.get_tapis_jobs_df() to retrieve a DataFrame of up to NmaxJobs jobs from Tapis, with full metadata. Only these retrieved jobs are searched.

  2. Loops through your SelectCriteria dictionary, where each key is a field name (like created, status, appId) and the value is:

    • either a list (for ranges or multiple matches), or

    • a single value (for exact matching).

  3. Handles time fields specially:

    • For created, remoteStarted, ended, or lastUpdated:

      • Converts the field to Unix timestamps using OpsUtils.convert_time_unix, stored in a new <field>_unix column (e.g. created_unix).

      • If you provide a list of two dates, filters between them (inclusive time range).

      • If you provide a single date (YYYY-MM-DD), keeps jobs whose timestamp falls on that calendar day in UTC.

  4. Handles other fields as standard filters:

    • If you give a list, uses .isin() to match any of the values.

    • If you give a single value, matches exactly.

  5. Collects the UUIDs of matching jobs into filtered_uuid.

  6. Optionally displays the UUID list and the filtered dataframe.

  7. Returns both:

(filtered_uuid, filtered_df)
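The steps above can be tried offline on an in-memory DataFrame, without a Tapis connection. The sketch below is a minimal re-implementation for experimentation, not the OpsUtils code: to_unix is a hypothetical stand-in for OpsUtils.convert_time_unix that assumes ISO-8601 strings and treats a string without timezone (such as a bare 'YYYY-MM-DD') as UTC, and filter_jobs and the sample rows are made up for illustration.

```python
from datetime import datetime, timezone

import pandas as pd

TIME_FIELDS = ['created', 'remoteStarted', 'ended', 'lastUpdated']


def to_unix(value):
    """Hypothetical stand-in for OpsUtils.convert_time_unix.

    Assumes ISO-8601 strings; a trailing 'Z' means UTC, and a string without
    timezone information (e.g. a bare 'YYYY-MM-DD') is treated as UTC.
    """
    dt = datetime.fromisoformat(value.replace('Z', '+00:00'))
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.timestamp()


def filter_jobs(df, criteria):
    """Apply SelectCriteria-style filters and return (uuid_list, filtered_df)."""
    out = df
    for key, values in criteria.items():
        if key in TIME_FIELDS:
            unix = out[key].apply(to_unix)
            if isinstance(values, list):
                if len(values) != 2:
                    raise ValueError(f'{key}: expected [start, end], got {values!r}')
                # Inclusive range on Unix timestamps
                out = out[(unix >= to_unix(values[0])) & (unix <= to_unix(values[1]))]
            else:
                # Keep rows whose UTC calendar day equals the target day
                target = datetime.strptime(values, '%Y-%m-%d').date()
                days = unix.apply(lambda x: datetime.fromtimestamp(x, tz=timezone.utc).date())
                out = out[days == target]
        elif isinstance(values, list):
            out = out[out[key].isin(values)]  # match any listed value
        else:
            out = out[out[key] == values]     # exact match
    return list(out['uuid']), out


# Made-up sample jobs
jobs = pd.DataFrame({
    'uuid': ['a1', 'b2', 'c3', 'd4'],
    'status': ['FINISHED', 'FAILED', 'FINISHED', 'RUNNING'],
    'appId': ['opensees-mp', 'opensees-mp', 'opensees-sp', 'opensees-mp'],
    'created': ['2025-06-15T10:23:45Z', '2025-06-20T14:18:02Z',
                '2025-06-21T09:00:00Z', '2025-07-02T08:00:00Z'],
})

uuids, subset = filter_jobs(jobs, {
    'created': ['2025-06-01', '2025-06-30'],
    'status': ['FINISHED', 'FAILED'],
    'appId': 'opensees-mp',
})
print(uuids)  # ['a1', 'b2']
```

Each criterion narrows the previous result, so the filters combine with a logical AND, as in get_tapis_jobs.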

Example use

SelectCriteria = {
    'created': ['2025-06-01', '2025-06-30'],
    'status': ['FINISHED', 'FAILED'],
    'appId': 'opensees-mp'
}

uuids, df = get_tapis_jobs(t, SelectCriteria, displayIt=True)

This would:

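  • Find all jobs created from 2025-06-01 through 2025-06-30 (if convert_time_unix maps a bare date to the start of that day, jobs created later on June 30 are not included),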

  • with status either FINISHED or FAILED,

  • and submitted to the opensees-mp app.
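Whether the upper bound covers all of June 30 depends on how OpsUtils.convert_time_unix interprets a bare date. The stdlib-only sketch below assumes, hypothetically, that 'YYYY-MM-DD' maps to midnight UTC (date_to_unix and in_range are illustrative helpers, not part of OpsUtils). Under that assumption, a job created at noon on June 30 falls outside ['2025-06-01', '2025-06-30'], while using the next day as the upper bound includes it.

```python
from datetime import datetime, timezone


def date_to_unix(text):
    """Hypothetical helper: parse 'YYYY-MM-DD' or 'YYYY-MM-DDTHH:MM:SSZ' as UTC."""
    fmt = '%Y-%m-%dT%H:%M:%SZ' if 'T' in text else '%Y-%m-%d'
    return datetime.strptime(text, fmt).replace(tzinfo=timezone.utc).timestamp()


def in_range(created, start, end):
    """Inclusive range check, mirroring the >= / <= comparison in get_tapis_jobs."""
    return date_to_unix(start) <= date_to_unix(created) <= date_to_unix(end)


job_created = '2025-06-30T12:00:00Z'
print(in_range(job_created, '2025-06-01', '2025-06-30'))  # False: end bound is June 30, 00:00 UTC
print(in_range(job_created, '2025-06-01', '2025-07-01'))  # True
```

Because both bounds are inclusive, the '2025-07-01' bound would also admit a job stamped exactly at midnight on July 1.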


How it handles dates vs. lists

Field value                     Behavior
['2025-06-01', '2025-06-30']    Filter between two dates (time range)
'2025-06-15'                    Exact match on that date
['FINISHED', 'FAILED']          Filter matching any of these statuses
'opensees-mp'                   Exact match on appId
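For a single date, get_tapis_jobs compares the UTC calendar date of each job timestamp with the target day. The stdlib sketch below (same_utc_day is a hypothetical helper, and it assumes every timestamp carries a timezone, either 'Z' or an offset) shows why that matters: a timestamp recorded with a non-UTC offset can land on a different UTC day than its local date suggests.

```python
from datetime import datetime, timezone


def same_utc_day(timestamp, day):
    """Return True if a timezone-aware ISO-8601 timestamp falls on `day` ('YYYY-MM-DD') in UTC."""
    dt = datetime.fromisoformat(timestamp.replace('Z', '+00:00'))
    utc_date = datetime.fromtimestamp(dt.timestamp(), tz=timezone.utc).date()
    return utc_date == datetime.strptime(day, '%Y-%m-%d').date()


print(same_utc_day('2025-06-15T10:23:45Z', '2025-06-15'))       # True
print(same_utc_day('2025-06-15T23:30:00-05:00', '2025-06-15'))  # False: 04:30 UTC on June 16
print(same_utc_day('2025-06-15T23:30:00-05:00', '2025-06-16'))  # True
```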


Why this is powerful

It lets you run multi-field searches over the jobs retrieved from Tapis (up to NmaxJobs), filtering by date ranges, statuses, app IDs, or any other job metadata field in a single call.

This is especially helpful when managing large or repeated computational workflows on platforms such as DesignSafe.


How SelectCriteria values work

Field                  SelectCriteria value            Behavior
time field*            ['YYYY-MM-DD', 'YYYY-MM-DD']    Filters jobs between the two dates (inclusive)
time field*            'YYYY-MM-DD'                    Filters jobs on that exact day (UTC)
status, appId, etc.    ['val1', 'val2']                Filters jobs matching any of the values
status, appId, etc.    'val'                           Filters jobs matching exactly that value

* created, remoteStarted, ended, or lastUpdated
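Because a malformed value (for example a three-element list for a time field) is only detected inside the filtering loop, it can help to check SelectCriteria before calling get_tapis_jobs. validate_criteria below is a hypothetical helper, not part of OpsUtils; it is deliberately strict, encoding the rules in the table above (time-field values must be 'YYYY-MM-DD' strings), and returns a list of problems.

```python
from datetime import datetime

TIME_FIELDS = {'created', 'remoteStarted', 'ended', 'lastUpdated'}


def _is_date(text):
    """True if text parses as a 'YYYY-MM-DD' string."""
    try:
        datetime.strptime(text, '%Y-%m-%d')
        return True
    except (TypeError, ValueError):
        return False


def validate_criteria(criteria):
    """Return human-readable problems; an empty list means the criteria look valid."""
    problems = []
    for key, value in criteria.items():
        if key not in TIME_FIELDS:
            continue  # other fields accept any single value or list
        if isinstance(value, list):
            if len(value) != 2:
                problems.append(f'{key}: range needs exactly 2 dates, got {len(value)}')
            elif not all(_is_date(v) for v in value):
                problems.append(f'{key}: range values must be YYYY-MM-DD strings')
        elif not _is_date(value):
            problems.append(f'{key}: single value must be a YYYY-MM-DD string')
    return problems


print(validate_criteria({'created': ['2025-06-01'], 'status': 'FINISHED'}))
# ['created: range needs exactly 2 dates, got 1']
```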



Example snippet of the returned DataFrame

uuid              name           status     created                 appId
12ab-34cd-56ef…   job_case_001   FINISHED   2025-06-15T10:23:45Z    opensees-mp
78gh-90ij-12kl…   job_case_002   FAILED     2025-06-20T14:18:02Z    opensees-mp
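Once you have the returned DataFrame, ordinary pandas operations apply. The sketch below rebuilds a small frame shaped like the snippet above (the UUIDs are replaced by hypothetical placeholders, since the real ones are truncated) and shows three common follow-ups: counting jobs per status, pulling out the UUIDs of failed jobs for closer inspection, and sorting by creation time.

```python
import pandas as pd

# Hypothetical stand-in for the filtered_df returned by get_tapis_jobs
df = pd.DataFrame({
    'uuid': ['uuid-001', 'uuid-002'],
    'name': ['job_case_001', 'job_case_002'],
    'status': ['FINISHED', 'FAILED'],
    'created': ['2025-06-15T10:23:45Z', '2025-06-20T14:18:02Z'],
    'appId': ['opensees-mp', 'opensees-mp'],
})

# Number of jobs in each status
status_counts = df['status'].value_counts().to_dict()

# UUIDs of the failed jobs only
failed_uuids = df.loc[df['status'] == 'FAILED', 'uuid'].tolist()

# Parse the ISO-8601 'created' strings into timezone-aware timestamps, newest first
df['created_dt'] = pd.to_datetime(df['created'], utc=True)
newest_first = df.sort_values('created_dt', ascending=False)['name'].tolist()

print(status_counts)
print(failed_uuids)   # ['uuid-002']
print(newest_first)   # ['job_case_002', 'job_case_001']
```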


In short:

  • Lists: interpreted as ranges for date fields, or as multiple allowed values for other fields.

  • Single values: an exact-day match for date fields, or an exact match for other fields.

Together, these rules let you filter on any combination of time and job metadata fields on Tapis in one call.

Files

You can find these files in Community Data.

get_tapis_jobs.py
def get_tapis_jobs(t, SelectCriteria, displayIt=False, NmaxJobs=500):
    """
    Filter Tapis jobs based on flexible selection criteria, including time ranges, 
    specific dates, status, appId, or any other metadata field.

    This function pulls a DataFrame of your jobs via get_tapis_jobs_df, then applies
    the filters specified in SelectCriteria to return only the matching jobs.

    Supports:
    ----------
    - Date range filtering (for 'created', 'remoteStarted', 'ended', 'lastUpdated')
      by providing a list like ['YYYY-MM-DD', 'YYYY-MM-DD'].
    - Single date filtering by providing a single 'YYYY-MM-DD' string.
    - Filtering on any other field using exact match or a list of values.

    Parameters
    ----------
    t : Tapis
        An authenticated Tapis client (from connect_tapis()).

    SelectCriteria : dict
        Dictionary where keys are field names and values are:
        - A single value for exact matching, e.g. {'status': 'FINISHED'}
        - A list of values for multiple match, e.g. {'status': ['FINISHED', 'FAILED']}
        - For time fields, a list of two dates for range filtering, 
          or a single date string for exact date matching.

    displayIt : bool or str, default=False
        If True, prints and displays all filtered UUIDs and metadata.
        If 'head', only prints the top of the filtered DataFrame.

    NmaxJobs : int, default=500
        Max number of jobs to retrieve from Tapis before filtering.

    Returns
    -------
    (list, DataFrame)
        A tuple containing:
        - List of UUIDs of jobs that matched the filters.
        - The filtered Pandas DataFrame itself.

    Example
    -------
    SelectCriteria = {
        'created': ['2025-06-01', '2025-06-30'],
        'status': ['FINISHED', 'FAILED'],
        'appId': 'opensees-mp'
    }
    uuids, df = get_tapis_jobs(t, SelectCriteria, displayIt=True)
    """
    # Silvia Mazzoni, 2025
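    from datetime import datetime, timezone
    from IPython.display import display  # display() is only built in inside Jupyter/IPython
    from OpsUtils import OpsUtils

    # Pull up to NmaxJobs jobs (with full metadata) before any filtering
    filtered_df = OpsUtils.get_tapis_jobs_df(t, displayIt=False, NmaxJobs=NmaxJobs)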

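    time_fields = ['created', 'remoteStarted', 'ended', 'lastUpdated']

    for key, values in SelectCriteria.items():
        if key in time_fields:
            # Add a Unix-timestamp column for this time field; assign() returns a new
            # DataFrame, which avoids SettingWithCopyWarning on an already-filtered slice
            filtered_df = filtered_df.assign(
                **{f'{key}_unix': filtered_df[key].apply(OpsUtils.convert_time_unix)}
            )
            if isinstance(values, list):
                if len(values) != 2:
                    # A time range must be exactly [start, end]
                    raise ValueError(
                        f"SelectCriteria['{key}'] must be [start, end], got {values!r}"
                    )
                # Inclusive time-range filter
                min_time = OpsUtils.convert_time_unix(values[0])
                max_time = OpsUtils.convert_time_unix(values[1])
                filtered_df = filtered_df[
                    (filtered_df[f'{key}_unix'] >= min_time) &
                    (filtered_df[f'{key}_unix'] <= max_time)
                ]
            else:
                # Single date filtering: keep jobs whose UTC calendar date equals 'YYYY-MM-DD'
                filtered_df = filtered_df.assign(
                    **{f'{key}_date': filtered_df[f'{key}_unix'].apply(
                        lambda x: datetime.fromtimestamp(x, tz=timezone.utc).date()
                    )}
                )
                target_date = datetime.strptime(values, "%Y-%m-%d").date()
                filtered_df = filtered_df[filtered_df[f'{key}_date'] == target_date]
        else:
            # Any other field: a list matches any of its values, a single value matches exactly
            if isinstance(values, list):
                filtered_df = filtered_df[filtered_df[key].isin(values)]
            else:
                filtered_df = filtered_df[filtered_df[key] == values]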

    filtered_uuid = list(filtered_df['uuid'])

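    if displayIt:
        print('-- uuid --')
        display(filtered_uuid)
        print('-- Job Metadata --')
        # displayIt='head' shows only the first rows of the filtered DataFrame
        display(filtered_df.head() if displayIt == 'head' else filtered_df)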

    return filtered_uuid, filtered_df