queryDF()#

queryDF(outKey, JobsData_df, key, values, displayIt=False)

This function provides a simple way to query a Pandas DataFrame for rows matching a set of values, then returns the corresponding values from another column (keyed by outKey).

In other words, it’s a flexible filter + select utility, designed for quickly extracting specific metadata (like a UUID) based on other column values (like an index or status).

It also includes:

  • Checks to ensure your column names are valid.

  • Automatic wrapping of values into a list if you pass a single value.

  • A compact optional display printout for quick interactive exploration.

Typical use#

This is especially helpful with Tapis job metadata DataFrames, for quickly extracting things like:

  • The uuid of a job by filtering on index_column

  • The status of a job by filtering on appId

Example#

uuid_result = queryDF(**uuid**, filtered_df, **index_column**, 388, True)

This will:

  • Filter filtered_df where index_column == 388

  • Return the corresponding uuid

  • Print the found mapping if displayIt=True.

Example DataFrame#

The Dataframe could be a slice of your Tapis jobs dataframe:

import pandas as pd

data = {
    'index_column': [101, 102, 103, 104],
    'uuid': ['abc-1', 'def-2', 'ghi-3', 'jkl-4'],
    'status': ['FINISHED', 'FAILED', 'FINISHED', 'QUEUED']
}
df = pd.DataFrame(data)

# Example 1: Single value match
uuid_result = OpsUtils.queryDF('uuid', df, 'index_column', 103, displayIt=True)

# Example 2: Multiple value match
uuids_for_status = OpsUtils.queryDF('uuid', df, 'status', ['FINISHED', 'QUEUED'], displayIt=True)

Produces:

outKey=ghi-3  for index_column=[103]
outKey=['abc-1', 'ghi-3', 'jkl-4']  for status=['FINISHED', 'QUEUED']

Files#

You can find these files in Community Data.

queryDF.py
def queryDF(outKey, JobsData_df, key, values, displayIt=False):
    """
    Query a Pandas DataFrame to extract values from one column (outKey)
    where another column (key) matches specified values.

    This function checks that both specified columns exist, automatically
    wraps a single value into a list, filters the DataFrame, and returns
    the resulting list of values from the outKey column.

    If only one result is found, returns it as a scalar instead of a list.

    Parameters
    ----------
    outKey : str
        The column name to extract values from.

    JobsData_df : pandas.DataFrame
        The DataFrame to query.

    key : str
        The column name to filter on.

    values : list, set, tuple, or single value
        The values to filter by in the `key` column.

    displayIt : bool, default=False
        If True, prints a summary of the extracted values.

    Returns
    -------
    list or single value
        The extracted values from `outKey` for the matching rows.
        Returns a scalar if only one match.

    Example
    -------
    uuid_result = queryDF('uuid', filtered_df, 'index_column', 388, True)

    Author
    ------
    Silvia Mazzoni, DesignSafe (silviamazzoni@yahoo.com)

    Date
    ----
    2025-08-14

    Version
    -------
    1.0
    """

    keyList = list(JobsData_df.keys())
    if not isinstance(values, (list, set, tuple)):
        values = [values]
    if not outKey in keyList:
        print(f'{outKey} is not a valid key for the dataframe! Please select from the following:',keyList)
        return []    
    if not key in JobsData_df.keys():
        print(f'{key} is not a valid key for the dataframe! Please select from the following:',keyList)
        return []
    thisOut = list(JobsData_df[JobsData_df[key].isin(values)][outKey])
    if len(thisOut)==1:
        thisOut = thisOut[0]
    if displayIt:
        print(f'outKey={thisOut}  for {key}={values}')
    return thisOut

# example:
# a=queryDF('uuid',filtered_df,'index_column',388,True)