generate_task_commands()

generate_task_commands()#

Generating PyLauncher Task Commands for Parameter Sweeps

This utility provides a general, reusable way to generate PyLauncher tasklists (for example, a runsList.txt file) by expanding a single command template into all combinations of parameter values.

Instead of manually writing dozens or hundreds of nearly identical command lines, you define:

one base command, and
a dictionary of parameters to sweep,

and the utility produces one shell command per parameter combination, ready to be executed by PyLauncher.

Why This Utility Exists#

Large parametric studies are common in HPC workflows, but manually managing tasklists is:

error-prone,
hard to scale, and
difficult to maintain.

This utility:

keeps sweeps explicit and readable,
guarantees collision-free task generation, and
integrates cleanly with SLURM + PyLauncher + Tapis-style workflows.

Function Overview#

generate_task_commands(base_command, sweep, *, placeholder_style="token")

Purpose#

Expand a command template into a list of fully resolved shell commands, one for each combination of sweep parameters.

Parameters#

`base_command : str`#

A command template containing placeholders for parameters to be swept.

Example:

python3 -u simulate.py --alpha ALPHA --beta BETA --gamma GAMMA --output "$WORK/sweep_$SLURM_JOB_ID/line_$LAUNCHER_JID/slot_$LAUNCHER_TSK_ID"

The placeholders (ALPHA, BETA, GAMMA) must match the keys in the sweep dictionary.

Environment variables such as $WORK, $SLURM_JOB_ID, and $LAUNCHER_TSK_ID are intentionally left unresolved here and will be expanded by the shell at runtime.

`sweep : Mapping[str, Sequence[Any]]`#

A dictionary defining the parameter sweep.

Each key corresponds to a placeholder in base_command, and each value is a sequence of values to sweep over.

Example:

{
    "ALPHA": [0.3, 0.5, 3.7],
    "BETA": [1.1, 2, 3],
    "GAMMA": ["a", "b", "c"],
}

All combinations are generated using a Cartesian product.

`placeholder_style : {"token", "braces"}, optional`#

Controls how placeholders are written in base_command.

"token" (default): Placeholders appear as bare tokens:
```
--alpha ALPHA
```
"braces": Placeholders appear inside braces:
```
--alpha {ALPHA}
```

The brace style can be safer if parameter names might overlap with other text.

Returns#

`List[str]`#

A list of fully expanded command strings.

Each element corresponds to one unique combination of sweep parameters, in deterministic order based on the insertion order of the sweep dictionary.

Example Usage#

inputFilename = "simulate.py"

base_command = (
    f'python3 -u {inputFilename} '
    f'--alpha ALPHA --beta BETA --gamma GAMMA '
    f'--output "$WORK/sweep_$SLURM_JOB_ID/line_$LAUNCHER_JID/slot_$LAUNCHER_TSK_ID"'
)

sweep_params = {
    "ALPHA": [0.3, 0.5],
    "BETA": [1, 2],
    "GAMMA": ["a", "b"],
}

commands = generate_task_commands(base_command, sweep_params)

for cmd in commands:
    print(cmd)

This produces one command per parameter combination, suitable for use in a PyLauncher tasklist.

Writing a PyLauncher Tasklist#

The companion helper function can write the generated commands directly to a file:

write_tasklist(commands, "runsList.txt")

Each command is written on its own line, matching PyLauncher’s expected format.

Notes on Environment Variables and Output Paths#

Environment Variables#

Variables such as:

$WORK
$SLURM_JOB_ID
$LAUNCHER_JID
$LAUNCHER_TSK_ID

are not interpreted by Python. They are expanded by the shell at runtime when PyLauncher executes each command. This makes them ideal for constructing unique, collision-free output directories per task.

Output Location Strategy#

The example command writes outputs to:

$WORK/sweep_<jobid>/line_<launcher_jid>/slot_<launcher_task_id>

This path is outside the job execution directory. This design choice:

keeps the execution directory lightweight,
reduces I/O and packaging overhead at job completion, and
avoids unnecessarily archiving large sweep results when using Tapis-based workflows.

When to Use This Utility#

Use this pattern when you need to:

run large parametric sweeps,
generate PyLauncher runsList.txt files programmatically,
keep job outputs well-organized and scalable, and
minimize archiving overhead in HPC and Tapis workflows.

Files#

You can find these files in Community Data.

flatten_dict.py

from __future__ import annotations

from itertools import product
from pathlib import Path
from typing import Any, Dict, Iterable, List, Mapping, Sequence


import pandas as pd

def generate_task_commands(
    base_command: str,
    sweep: Mapping[str, Sequence[Any]],
    *,
    placeholder_style: str = "token",
) -> List[str]:
    """
    Expand a command template into a list of commands for all combinations.

    Parameters
    ----------
    base_command
        Command template containing placeholders for parameters. Example:

        'python3 -u simulate.py --alpha ALPHA --beta BETA --gamma GAMMA --output ".../slot_$LAUNCHER_TSK_ID"'

        Placeholders must match keys in `sweep`.

    sweep
        Mapping of placeholder -> list/tuple of values to sweep over.
        Example: {"ALPHA": [0.3, 0.5], "BETA": [1, 2]}

    placeholder_style
        How placeholders appear in `base_command`:
        - "token": placeholders are bare tokens like ALPHA, BETA (default)
        - "braces": placeholders are in braces like {ALPHA}, {BETA}

    Returns
    -------
    list of str
        One command per combination of values, in deterministic order based
        on the insertion order of `sweep`.

    Notes
    -----
    - This function does *string substitution only*; it does not validate that
      the command is runnable on your system.
    - Environment variables such as $WORK or $SLURM_JOB_ID are left untouched.
    """
    if not sweep:
        return [base_command]

    keys = list(sweep.keys())
    value_lists = [sweep[k] for k in keys]

    # Basic validation
    for k, vals in sweep.items():
        if not isinstance(vals, Sequence) or isinstance(vals, (str, bytes)):
            raise TypeError(f"sweep[{k!r}] must be a non-string sequence of values.")
        if len(vals) == 0:
            raise ValueError(f"sweep[{k!r}] is empty; provide at least one value.")

    commands: List[str] = []
    for combo in product(*value_lists):
        cmd = base_command
        for k, v in zip(keys, combo):
            if placeholder_style == "token":
                cmd = cmd.replace(k, str(v))
            elif placeholder_style == "braces":
                cmd = cmd.replace("{" + k + "}", str(v))
            else:
                raise ValueError("placeholder_style must be 'token' or 'braces'.")
        commands.append(cmd)

    return commands


def write_tasklist(commands: Iterable[str], outfile: str | Path) -> None:
    """
    Write commands to a PyLauncher tasklist file (one command per line).
    """
    outpath = Path(outfile)
    outpath.parent.mkdir(parents=True, exist_ok=True)
    outpath.write_text("\n".join(commands) + "\n", encoding="utf-8")

# # ---------------------------------------------------------------------
# # Example usage (edit these for your sweep)
# # ---------------------------------------------------------------------

# inputFilename = "simulate.py"

# base_command = (
#     f'python3 -u {inputFilename} '
#     f'--alpha ALPHA --beta BETA --gamma GAMMA '
#     f'--output "$WORK/sweep_$SLURM_JOB_ID/line_$LAUNCHER_JID/slot_$LAUNCHER_TSK_ID"'
# )

# # Define parameters for dynamic generation (add/remove keys freely)
# sweep_params: Dict[str, Sequence[Any]] = {
#     "ALPHA": [0.3, 0.5, 3.7],
#     "BETA": [1.1, 2, 3],
#     "GAMMA": ["a", "b", "c"],
# }

# generated_tasks = generate_task_commands(base_command, sweep_params, placeholder_style="token")

# print(f"Generated {len(generated_tasks)} task commands:")
# print("-" * 60)
# for cmd in generated_tasks:
#     print(cmd)

# # Optional: write a runsList file for PyLauncher
# # write_tasklist(generated_tasks, "runsList.txt")


def preview_sweep_table(sweep: Mapping[str, Sequence[Any]]) -> pd.DataFrame:
    """
    Create a preview table of all parameter combinations in a sweep.

    Parameters
    ----------
    sweep
        Mapping of parameter name -> sequence of values to sweep over.
        Example:
            {"ALPHA": [0.3, 0.5], "BETA": [1, 2], "GAMMA": ["a", "b"]}

    Returns
    -------
    pandas.DataFrame
        A table with one row per parameter combination. Column order follows
        the insertion order of `sweep`.

    Notes
    -----
    - This is intended for interactive notebooks. For large sweeps, consider
      displaying only `.head()` or sampling rows.
    - Numeric formatting (e.g., 3 significant figures) is handled by display
      settings; see example below.
    """
    if not sweep:
        return pd.DataFrame()

    keys = list(sweep.keys())
    value_lists = [sweep[k] for k in keys]

    # Basic validation
    for k, vals in sweep.items():
        if not isinstance(vals, Sequence) or isinstance(vals, (str, bytes)):
            raise TypeError(f"sweep[{k!r}] must be a non-string sequence of values.")
        if len(vals) == 0:
            raise ValueError(f"sweep[{k!r}] is empty; provide at least one value.")

    rows = [dict(zip(keys, combo)) for combo in product(*value_lists)]
    return pd.DataFrame(rows)

# # Example
# df = preview_sweep_table(sweep_params)
# print(f"Total runs: {len(df)}")
# df.head(10)

# # Optional: Display Numeric Values with 3 Significant Figures
# # If you want the notebook preview to show numbers with 3 significant figures, you can set a pandas display format:
# import pandas as pd
# pd.options.display.float_format = "{:.3g}".format
# df = preview_sweep_table(sweep_params)
# df

# # Tip for Large Sweeps
# # If your sweep is very large, preview only a small portion:
# df.sample(20, random_state=0)   # random sample of 20 rows
# # or
# df.head(20)                     # first 20 rows