flatten_dict()

flatten_dict(d, parent_key='', sep='.')

flatten_dict is a recursive utility that takes a nested dictionary (for example a parsed JSON response, the attributes of a Tapis job object, or any multi-level Python dictionary) and flattens it into a single-level dictionary with compound keys.

It handles:

  • Nested dictionaries, joining keys with a separator (like .).

  • Lists, using index notation (e.g. key[0], key[1]); dictionaries inside a list are flattened further, while other list items are stored as-is.

  • TapisResult objects, by flattening their internal attributes.

  • JSON strings that are themselves dictionaries, parsing them and continuing to flatten.

  • Simple values (None, plain strings, numbers, booleans), which are left unchanged.
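The cases listed above can be illustrated with a simplified stand-in. The sketch below is not the DesignSafe implementation: `flatten_sketch` is a hypothetical name, it omits TapisResult handling so that it runs without tapipy, and it is only meant to show how nested keys, list indices, and embedded JSON strings map to compound keys. The `/` separator is chosen here purely to show the effect of `sep`.

```python
import json


def flatten_sketch(d, parent_key="", sep="."):
    """Simplified, hypothetical stand-in for flatten_dict (no TapisResult support)."""
    flat = {}
    for key, value in d.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, str):
            # A string that parses to a JSON object is treated like a nested dict.
            try:
                parsed = json.loads(value)
            except json.JSONDecodeError:
                parsed = None
            if isinstance(parsed, dict):
                value = parsed
        if isinstance(value, dict):
            flat.update(flatten_sketch(value, new_key, sep))
        elif isinstance(value, list):
            for idx, item in enumerate(value):
                indexed_key = f"{new_key}[{idx}]"
                if isinstance(item, dict):
                    flat.update(flatten_sketch(item, indexed_key, sep))
                else:
                    # Non-dict list items are stored as-is under the indexed key.
                    flat[indexed_key] = item
        else:
            # None, numbers, booleans, and plain strings are kept unchanged.
            flat[new_key] = value
    return flat


sample = {
    "job": {"id": "123", "tags": ["a", "b"]},
    "params": '{"nodes": 2, "queue": "normal"}',  # JSON string holding an object
    "note": "plain text",
    "result": None,
}

flat_sample = flatten_sketch(sample, sep="/")
for key, value in flat_sample.items():
    print(f"{key} = {value!r}")
```

Note that the string "123" is valid JSON but parses to a number rather than an object, so it stays a string in the output.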

This is especially useful for:

  • Preparing complex API responses for DataFrame creation or CSV export.

  • Searching or filtering on nested keys.

  • Logging structured data in a flat, human-readable form.
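As one illustration of the CSV-export use case, the sketch below writes already-flattened records with the standard-library `csv` module. The records are hypothetical and are written out by hand in the shape a flattened dictionary would have; nothing here is part of the flatten_dict code itself.

```python
import csv
import io

# Hypothetical records, written as they would look after flattening.
flat_records = [
    {"job.id": "123", "job.status": "RUNNING", "job.details.nodes": 2},
    {"job.id": "124", "job.status": "QUEUED", "owner": "smazzoni"},
]

# Collect the union of keys, preserving first-seen order, so that records
# with different shapes still share one header row.
fieldnames = []
for record in flat_records:
    for key in record:
        if key not in fieldnames:
            fieldnames.append(key)

# Write to an in-memory buffer; a real export would open a file instead.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
writer.writeheader()
writer.writerows(flat_records)

csv_text = buffer.getvalue()
print(csv_text)
```

Because every compound key becomes a column name, records that lack a given key simply get an empty cell in that column.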


Example input vs. output

nested_test = {
    "job": {
        "id": "123",
        "status": "RUNNING",
        "details": {"queue": "normal", "nodes": 2},
        "history": [
            {"time": "2025-06-30T12:00:00Z", "event": "QUEUED"},
            {"time": "2025-06-30T12:10:00Z", "event": "RUNNING"}
        ]
    },
    "owner": "smazzoni"
}

flattened_test = OpsUtils.flatten_dict(nested_test)

print(flattened_test)

Produces:

{
    'job.id': '123',
    'job.status': 'RUNNING',
    'job.details.queue': 'normal',
    'job.details.nodes': 2,
    'job.history[0].time': '2025-06-30T12:00:00Z',
    'job.history[0].event': 'QUEUED',
    'job.history[1].time': '2025-06-30T12:10:00Z',
    'job.history[1].event': 'RUNNING',
    'owner': 'smazzoni'
}

You can also loop over the flattened dictionary and print each key with its value:

print("Flattened dictionary keys:")
for key in flattened_test:
    print(f" - {key}: {flattened_test[key]}")

Produces:

Flattened dictionary keys:
 - job.id: 123
 - job.status: RUNNING
 - job.details.queue: normal
 - job.details.nodes: 2
 - job.history[0].time: 2025-06-30T12:00:00Z
 - job.history[0].event: QUEUED
 - job.history[1].time: 2025-06-30T12:10:00Z
 - job.history[1].event: RUNNING
 - owner: smazzoni
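Once the data is flat, searching or filtering on nested keys reduces to plain string matching. The following sketch repeats the flattened output from above as a literal so that it runs on its own; the prefix and the regular expression are illustrative choices, not part of flatten_dict.

```python
import re

# The flattened output shown above, repeated here so the example is self-contained.
flattened_test = {
    "job.id": "123",
    "job.status": "RUNNING",
    "job.details.queue": "normal",
    "job.details.nodes": 2,
    "job.history[0].time": "2025-06-30T12:00:00Z",
    "job.history[0].event": "QUEUED",
    "job.history[1].time": "2025-06-30T12:10:00Z",
    "job.history[1].event": "RUNNING",
    "owner": "smazzoni",
}

# Every entry that belongs to the job.history list.
history = {k: v for k, v in flattened_test.items() if k.startswith("job.history[")}

# The "event" field of each history entry, matched with a regular expression.
event_pattern = re.compile(r"^job\.history\[(\d+)\]\.event$")
events = [v for k, v in flattened_test.items() if event_pattern.match(k)]

print(sorted(history))
print(events)
```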

Files

You can find this file in Community Data.

flatten_dict.py
def flatten_dict(d, parent_key: str = "", sep: str = ".") -> dict:
    """
    Recursively flattens a nested mapping into a single-level dict with compound keys.

    Handles:
    - Nested dictionaries, joining keys with `sep`.
    - Lists, producing keys like `key[0]`, `key[1]`.
    - TapisResult values (from tapipy), by flattening their internal `__dict__`.
    - JSON strings that are dictionaries, parsing and continuing to flatten.

    Notes:
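    - Scalars (`None`, `int`, `float`, `bool`, and strings that are not JSON
      objects) are left as-is.
    - Dict keys are coerced to strings when forming compound keys.
    - Tuples/sets are treated as scalar values (not expanded); list items that
      are not dictionaries (including nested lists) are also stored as-is.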

    Useful for preparing complex Tapis job objects or deeply nested JSON
    for DataFrame creation, CSV export, or simplified logging.

    Parameters
    ----------
    d : dict
        The nested dictionary to flatten. (If you pass a TapisResult at top level,
        convert it to `obj.__dict__` first, or extend this function accordingly.)
    parent_key : str, default=''
        Used internally to build up compound keys.
    sep : str, default='.'
        Separator used to join keys.

    Returns
    -------
    dict
        A flattened dictionary with compound keys.

    Example
    -------
    flattened = flatten_dict(nested_dict)
    print(flattened['job.history[0].event'])  # → 'QUEUED'

    Author
    ------
    Silvia Mazzoni, DesignSafe (silviamazzoni@yahoo.com)

    Date
    ----
    2025-08-14

    Version
    -------
    1.0
    """
    import json
    from tapipy.tapis import TapisResult
    items = []
    for k, v in d.items():
        # Coerce top-level keys to str too, so all output keys are strings.
        new_key = f"{parent_key}{sep}{k}" if parent_key else str(k)
        if v is None or isinstance(v, (int, float)):
            items.append((new_key, v))
            continue
        if isinstance(v, dict):
            items.extend(flatten_dict(v, new_key, sep=sep).items())
            continue
        if isinstance(v, list):
            for idx, item in enumerate(v):
                indexed_key = f"{new_key}[{idx}]"
                if isinstance(item, dict):
                    items.extend(flatten_dict(item, indexed_key, sep=sep).items())
                else:
                    items.append((indexed_key, item))
            continue
        if isinstance(v, TapisResult):
            items.extend(flatten_dict(v.__dict__, new_key, sep=sep).items())
            continue
        if isinstance(v, str):
            try:
                parsed_json = json.loads(v)
                if isinstance(parsed_json, dict):
                    items.extend(flatten_dict(parsed_json, new_key, sep=sep).items())
                    continue
            except (json.JSONDecodeError, TypeError):
                pass
        items.append((new_key, v))
    return dict(items)