Execution Strategies


How workloads are mapped onto compute systems

An execution strategy describes how a workload is launched, distributed, coordinated, and completed on a computing system. While the workload defines computational behavior, the execution strategy defines control flow, parallel structure, and resource usage.

Crucially, execution strategies are independent of tools. The same strategy may be implemented using JupyterHub, SLURM scripts, or Tapis apps — what changes is automation, not intent.

Why Execution Strategies Matter

Execution strategies sit between scientific intent and computing tools.

They answer questions such as:

Should tasks run independently or in coordination?
Should the workflow be one long job or many small jobs?
Is performance limited by CPU, memory, communication, or I/O?
Does scaling mean more tasks, larger tasks, or longer runs?

Understanding execution strategies prevents common pitfalls such as:

oversubscribing memory,
underutilizing nodes,
overwhelming the filesystem with small files,
or adding resources that reduce performance.
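The first two pitfalls can be checked with simple arithmetic before a job is submitted. The following is a minimal sketch, not a scheduler feature: the function name `max_tasks_per_node` and the node and task sizes are hypothetical, chosen only to show how a memory limit can leave cores idle.

```python
def max_tasks_per_node(node_mem_gb, node_cores, task_mem_gb, task_cores=1):
    """Return how many tasks fit on one node and which resource limits them."""
    by_memory = int(node_mem_gb // task_mem_gb)  # tasks that fit in memory
    by_cores = node_cores // task_cores          # tasks that fit on the cores
    limit = "memory" if by_memory < by_cores else "cores"
    return min(by_memory, by_cores), limit


# Hypothetical node: 48 cores, 192 GB. Hypothetical task: 1 core, 8 GB.
count, limit = max_tasks_per_node(node_mem_gb=192, node_cores=48, task_mem_gb=8)
idle_cores = 48 - count
print(count, limit, idle_cores)  # 24 memory 24
```

Under these assumed numbers, launching 48 tasks would oversubscribe memory, while launching the 24 that fit leaves half the cores unused. Either outcome is a property of how the work is packed, not of the hardware.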

The Three Core Execution Dimensions

Every execution strategy is shaped by how a workload behaves along three fundamental dimensions:

These dimensions—not the software—determine the correct execution strategy.

Importantly, a single workflow may change execution strategy over its lifetime — for example, starting as embarrassingly parallel during exploration and evolving into a tightly coupled execution at scale.
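As an illustration of the embarrassingly parallel starting point, the sketch below runs a small parameter sweep in which no task depends on any other. Everything here is assumed for the example: `run_case` is a hypothetical stand-in for one simulation, its formula is a placeholder, and threads stand in for what would typically be separate processes or jobs on a cluster.

```python
from concurrent.futures import ThreadPoolExecutor


def run_case(damping_ratio):
    """Hypothetical stand-in for one independent simulation run."""
    # Placeholder formula; a real case would call the actual solver.
    peak_response = 1.0 / (2.0 * damping_ratio)
    return {"damping_ratio": damping_ratio, "peak_response": peak_response}


def run_sweep(values, max_workers=4):
    """Run every case independently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_case, values))


results = run_sweep([0.01, 0.02, 0.05])
for row in results:
    print(row)
```

Because the cases share no state, the same sweep could be distributed across cores, nodes, or separate jobs without changing `run_case`. A tightly coupled execution would not have this property, since its tasks must exchange data while they run.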

Common Execution Strategies

Below are the most common execution strategies used on DesignSafe and similar HPC platforms.

Execution Strategy ≠ Platform

A critical distinction:

Execution strategies describe structure — not tools.

The same strategy can be implemented using:

JupyterHub (interactive, exploratory)
SLURM batch scripts (manual control)
Tapis apps (automated, repeatable workflows)

The strategy stays the same; only the level of automation and orchestration changes.

Execution Strategy ≠ Resource Size

A common misconception is that scaling a workload simply means adding more resources.

Many workloads fail to scale because the execution strategy does not match the workload structure.

Examples:

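Adding nodes to a tightly coupled simulation may slow it down, because communication overhead grows with the node count
Running many tiny tasks as one job may waste cores if the tasks are not spread across every allocated core
GPU jobs whose preprocessing cannot keep the device supplied with data may leave accelerators idle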
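The first example can be made concrete with a toy cost model. This is a sketch under stated assumptions, not a measurement: it assumes compute time divides evenly across nodes while communication cost grows linearly with the node count, and the default values of `compute_seconds` and `comm_seconds_per_node` are hypothetical.

```python
def estimated_runtime(nodes, compute_seconds=3600.0, comm_seconds_per_node=4.0):
    """Toy model: compute shrinks with node count, communication grows with it."""
    compute = compute_seconds / nodes
    communication = comm_seconds_per_node * (nodes - 1)
    return compute + communication


def best_node_count(max_nodes, **model):
    """Return the node count with the lowest estimated runtime."""
    return min(range(1, max_nodes + 1),
               key=lambda n: estimated_runtime(n, **model))


for n in (1, 8, 32, 64, 128):
    print(n, round(estimated_runtime(n), 1))
print("fastest at", best_node_count(128), "nodes")
```

With these assumed constants the estimated runtime bottoms out at about 30 nodes and rises beyond that, so a 128-node run is slower than a 32-node run. Real communication costs rarely follow a simple linear law, but the shape of the curve is the point: past some node count, added resources cost more than they save.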

Choosing the right execution strategy is often more important than choosing the largest system.

Looking Ahead

In later chapters, these execution strategies will be mapped to:

Interactive environments (e.g., JupyterHub)
Batch systems (SLURM)
Automated pipelines (Tapis applications)

The goal is not to lock you into a single approach, but to give you a strategy-first mindset for building scalable, reusable computational workflows.

Guiding Principle

Performance problems are usually strategy problems, not hardware problems.