queryDF()#
queryDF(outKey, JobsData_df, key, values, displayIt=False)
This function provides a simple way to query a Pandas DataFrame for rows matching a set of values, then returns the corresponding values from another column (keyed by outKey).
In other words, it’s a flexible filter + select utility, designed for quickly extracting specific metadata (like a UUID) based on other column values (like an index or status).
It also includes:
Checks to ensure your column names are valid.
Automatic wrapping of values into a list if you pass a single value.
A compact optional display printout for quick interactive exploration.
Typical use#
This is especially helpful with Tapis job metadata DataFrames, for quickly extracting things like:
The uuid of a job by filtering on index_column
The status of a job by filtering on appId
Example#
uuid_result = queryDF(**uuid**, filtered_df, **index_column**, 388, True)
This will:
Filter filtered_df where index_column == 388
Return the corresponding uuid
Print the found mapping if displayIt=True.
Example DataFrame#
The Dataframe could be a slice of your Tapis jobs dataframe:
import pandas as pd
data = {
'index_column': [101, 102, 103, 104],
'uuid': ['abc-1', 'def-2', 'ghi-3', 'jkl-4'],
'status': ['FINISHED', 'FAILED', 'FINISHED', 'QUEUED']
}
df = pd.DataFrame(data)
# Example 1: Single value match
uuid_result = OpsUtils.queryDF('uuid', df, 'index_column', 103, displayIt=True)
# Example 2: Multiple value match
uuids_for_status = OpsUtils.queryDF('uuid', df, 'status', ['FINISHED', 'QUEUED'], displayIt=True)
Produces:
outKey=ghi-3 for index_column=[103]
outKey=['abc-1', 'ghi-3', 'jkl-4'] for status=['FINISHED', 'QUEUED']
Files#
You can find these files in Community Data.
queryDF.py
def queryDF(outKey, JobsData_df, key, values, displayIt=False):
"""
Query a Pandas DataFrame to extract values from one column (outKey)
where another column (key) matches specified values.
This function checks that both specified columns exist, automatically
wraps a single value into a list, filters the DataFrame, and returns
the resulting list of values from the outKey column.
If only one result is found, returns it as a scalar instead of a list.
Parameters
----------
outKey : str
The column name to extract values from.
JobsData_df : pandas.DataFrame
The DataFrame to query.
key : str
The column name to filter on.
values : list, set, tuple, or single value
The values to filter by in the `key` column.
displayIt : bool, default=False
If True, prints a summary of the extracted values.
Returns
-------
list or single value
The extracted values from `outKey` for the matching rows.
Returns a scalar if only one match.
Example
-------
uuid_result = queryDF('uuid', filtered_df, 'index_column', 388, True)
Author
------
Silvia Mazzoni, DesignSafe (silviamazzoni@yahoo.com)
Date
----
2025-08-14
Version
-------
1.0
"""
keyList = list(JobsData_df.keys())
if not isinstance(values, (list, set, tuple)):
values = [values]
if not outKey in keyList:
print(f'{outKey} is not a valid key for the dataframe! Please select from the following:',keyList)
return []
if not key in JobsData_df.keys():
print(f'{key} is not a valid key for the dataframe! Please select from the following:',keyList)
return []
thisOut = list(JobsData_df[JobsData_df[key].isin(values)][outKey])
if len(thisOut)==1:
thisOut = thisOut[0]
if displayIt:
print(f'outKey={thisOut} for {key}={values}')
return thisOut
# example:
# a=queryDF('uuid',filtered_df,'index_column',388,True)