Concepts · Jun 2026 · 3 min read

Coroutines vs Process-Based Parallelism

When to reach for async coroutines and when to fork real OS processes — a decisive verdict on the two dominant concurrency models, judged on CPU work, I/O work, fault isolation, and what actually breaks in production.

The short answer

Coroutines over Process Based Parallelism in most cases. Most real-world concurrency is I/O-bound (network calls, disk, database round-trips), and that is exactly where coroutines crush processes on memory, latency, and scale.

  • Pick Coroutines if your workload is I/O-bound: web servers, API gateways, scrapers, chat backends, anything that spends its life waiting on the network. Tens of thousands of concurrent connections on one box.
  • Pick Process Based Parallelism if your workload is genuinely CPU-bound (image processing, numeric crunching, ML preprocessing) or you need hard fault isolation so one crash can't take down siblings.
  • Also consider: A hybrid: a coroutine event loop in front handling I/O, dispatching CPU-heavy chunks to a process pool. This is what mature systems actually run, and it beats picking one religion.

— Nice Pick, opinionated tool recommendations

The only question that matters: I/O or CPU?

Stop arguing taste and answer one question — what is your program waiting on? If it waits on the network, disk, or a database, it is I/O-bound, and coroutines are the answer. A coroutine parks itself the instant it hits a yield point and the event loop services another. One thread, one core, tens of thousands of in-flight operations, kilobytes of memory each. Processes can't touch that. If instead your program is pinning a core doing math — resizing images, running a tokenizer, FFTs — then no amount of async will help, because there's nothing to wait on. You need more cores, which means more processes. The mistake almost everyone makes is reaching for processes 'to go faster' on an I/O-bound service. That doesn't go faster. It just burns RAM and context-switch budget to do the same waiting in a more expensive way.
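A minimal sketch of the I/O-bound case, assuming Python's asyncio and using asyncio.sleep as a stand-in for a network or database wait. The names fake_request, run_many, and timed_run are hypothetical, chosen only for illustration:

```python
import asyncio
import time


async def fake_request(i: int, delay: float = 0.1) -> int:
    # asyncio.sleep stands in for a network or database wait (an assumption
    # for illustration); the coroutine yields to the event loop here.
    await asyncio.sleep(delay)
    return i


async def run_many(n: int) -> list[int]:
    # gather schedules all n coroutines on the same event loop and thread.
    return await asyncio.gather(*(fake_request(i) for i in range(n)))


def timed_run(n: int) -> tuple[list[int], float]:
    # Runs n simulated requests and reports the wall-clock time taken.
    start = time.perf_counter()
    results = asyncio.run(run_many(n))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = timed_run(5_000)
    print(f"{len(results)} waits finished in {elapsed:.2f}s")
```

Run one after another, the same waits would take roughly n times the delay. Here the total should stay close to a single delay, because every coroutine is parked at the same time on one thread.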

Where processes earn their keep

Process-based parallelism has two genuine, non-negotiable advantages, and honesty demands naming them. First: true parallelism. In runtimes with a global lock (CPython's GIL being the famous offender), coroutines and threads cannot run Python bytecode on two cores at once. Processes each get their own interpreter and can each run on their own core. For CPU-bound work on an eight-core machine, that is the difference between using your hardware and wasting seven-eighths of it. Second: fault and memory isolation. A coroutine that segfaults, leaks state, or corrupts a global in the shared address space takes the whole event loop with it. A worker process that dies just dies: the supervisor respawns it and the siblings never notice. That isolation is why Nginx, Postgres, and Gunicorn run multiple worker processes. If a crash in one unit of work must not poison the others, processes are not optional.
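The isolation point can be sketched with the standard library alone. This is an illustration under stated assumptions, not how any of the servers named above are built: the worker bodies are hypothetical one-liners run in fresh interpreters, os.abort stands in for a native crash, and the respawn step a real supervisor would perform is left out.

```python
import subprocess
import sys

# Hypothetical worker bodies, passed to a fresh interpreter with -c.
CRASHING_WORKER = "import os; os.abort()"  # dies hard, like a native crash
HEALTHY_WORKER = "print(sum(range(10)))"


def run_worker(source: str) -> subprocess.CompletedProcess:
    # Each worker is a separate OS process with its own address space.
    return subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True,
        text=True,
    )


def supervise() -> tuple[int, str]:
    crashed = run_worker(CRASHING_WORKER)
    # The parent is still running here; only the child died.
    healthy = run_worker(HEALTHY_WORKER)
    return crashed.returncode, healthy.stdout.strip()


if __name__ == "__main__":
    code, output = supervise()
    print(f"crashed worker exit code: {code}, sibling output: {output}")
```

The crashed worker reports a non-zero exit code and the sibling still produces its result. The same abort inside a coroutine would have ended the whole interpreter, event loop included.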

The costs nobody puts on the slide

Processes are expensive and people pretend otherwise. Each one is megabytes of resident memory, a full fork or spawn, and a separate copy of your imports. Spin up 10,000 and you've eaten the machine. Worse, they don't share memory by default, so in Python every object that crosses between them gets pickled, copied through a pipe, and unpickled: serialization overhead that quietly dominates if your tasks are small or your payloads are fat. Coroutines have their own tax, and it's the one that actually bites teams: one blocking call (a synchronous DB driver, a CPU-heavy loop, a careless time.sleep) stalls the entire event loop and freezes every other task on it. Async is cooperative; one rude function ruins the party. And async code is colored: await propagates up your whole call stack, and bridging sync and async is a recurring source of deadlocks. Neither model is free.
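The blocking-call tax is easy to reproduce. The sketch below assumes asyncio and uses hypothetical names (heartbeat, rude_task, polite_task, count_ticks): a background heartbeat records a tick every 10 ms, and the tick count shows whether the loop was free while another task "worked" for 0.3 seconds.

```python
import asyncio
import time


async def heartbeat(ticks: list[float], interval: float = 0.01) -> None:
    # Records a timestamp every `interval` seconds while the loop is free.
    while True:
        ticks.append(time.perf_counter())
        await asyncio.sleep(interval)


async def rude_task(duration: float) -> None:
    time.sleep(duration)  # blocking: the event loop cannot run anything else


async def polite_task(duration: float) -> None:
    await asyncio.sleep(duration)  # cooperative: yields to the loop


async def count_ticks(task_fn, duration: float = 0.3) -> int:
    ticks: list[float] = []
    hb = asyncio.create_task(heartbeat(ticks))
    await asyncio.sleep(0)  # let the heartbeat record its first tick
    await task_fn(duration)
    hb.cancel()
    return len(ticks)


if __name__ == "__main__":
    print("ticks during blocking sleep:", asyncio.run(count_ticks(rude_task)))
    print("ticks during awaited sleep: ", asyncio.run(count_ticks(polite_task)))
```

With the blocking version the heartbeat should get its first tick and then nothing until the sleep returns. With the awaited version it keeps ticking throughout. Every other task on the loop is in the heartbeat's position.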

What I'd actually ship

Default to coroutines and treat processes as the specialist tool you summon with evidence. The honest production architecture is a hybrid: an async event loop terminates connections and orchestrates all the I/O (the thing you have thousands of), and when a request needs real CPU work, you hand that chunk to a small, fixed-size process pool sized to your core count. That's exactly the shape of a modern Python web stack (uvicorn/asyncio out front, a ProcessPoolExecutor for the heavy bits) and of Node running in cluster mode under PM2. The anti-patterns are symmetric and both common: spawning a process per request to 'parallelize' a service that's really just waiting on Postgres, and cramming a 200 ms CPU loop directly into an event loop and wondering why p99 latency exploded. Profile first. If you're waiting, go async. If you're computing, go parallel. Don't guess: the profiler already knows.
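A minimal sketch of that hybrid shape, assuming asyncio plus concurrent.futures from the standard library. The web server is left out, and count_primes and handle_request are hypothetical stand-ins for "real CPU work" and a request handler:

```python
import asyncio
import os
from concurrent.futures import ProcessPoolExecutor


def count_primes(limit: int) -> int:
    # Deliberately CPU-bound stand-in for real CPU work (hypothetical).
    return sum(
        1 for n in range(2, limit)
        if all(n % d for d in range(2, int(n ** 0.5) + 1))
    )


async def handle_request(pool: ProcessPoolExecutor, limit: int) -> int:
    loop = asyncio.get_running_loop()
    # The event loop stays free while a worker process does the computing.
    return await loop.run_in_executor(pool, count_primes, limit)


async def main() -> list[int]:
    # Fixed-size pool sized to the core count, as described above.
    with ProcessPoolExecutor(max_workers=os.cpu_count()) as pool:
        return await asyncio.gather(
            *(handle_request(pool, limit) for limit in (10_000, 20_000, 30_000))
        )


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The `if __name__ == "__main__":` guard matters here: with the spawn or forkserver start methods, worker processes re-import the main module, and unguarded top-level code would run again in each of them.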

Quick Comparison

Factor | Coroutines | Process Based Parallelism
I/O-bound throughput | Excellent: tens of thousands of concurrent ops on one thread, KB each | Poor: each waiter costs a full process; memory caps you fast
CPU-bound parallelism | None under a GIL: bound to one core for bytecode | True multi-core; each process can own a core
Memory per unit of work | Kilobytes per coroutine, shared address space | Megabytes per process, no shared memory by default
Fault isolation | Weak: one crash kills the whole event loop, one blocking call stalls it | Strong: a worker dies alone, supervisor respawns it
Data sharing cost | Free: same address space, no serialization | Expensive: pickle/IPC copy across every boundary
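To make the data sharing cost concrete, here is a rough sketch of the round trip an object makes when Python sends it to a worker process. The payload is hypothetical, and only the pickle step is shown, not the pipe transfer between processes:

```python
import pickle

# Hypothetical payload: 100,000 small records, as might be sent to a worker.
payload = [{"id": i, "score": i * 0.5} for i in range(100_000)]

blob = pickle.dumps(payload)   # serialize on the sending side
received = pickle.loads(blob)  # deserialize on the receiving side

print(f"serialized size: {len(blob) / 1e6:.1f} MB")
print("equal contents:", received == payload)
print("same object:   ", received is payload)
```

The receiver ends up with an equal but separate copy, so the data is paid for twice in memory and once more in serialization time. Two coroutines would simply hold a reference to the same list.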

The Verdict

Use Coroutines if: Your workload is I/O-bound: web servers, API gateways, scrapers, chat backends, anything that spends its life waiting on the network. Tens of thousands of concurrent connections on one box.

Use Process Based Parallelism if: Your workload is genuinely CPU-bound — image processing, numeric crunching, ML preprocessing — or you need hard fault isolation so one crash can't take down siblings.

Consider: A hybrid: a coroutine event loop in front handling I/O, dispatching CPU-heavy chunks to a process pool. This is what mature systems actually run, and it beats picking one religion.

Coroutines vs Process Based Parallelism: FAQ

Is Coroutines or Process Based Parallelism better?

Coroutines is the Nice Pick. Most real-world concurrency is I/O-bound — network calls, disk, database round-trips — and that is exactly where coroutines crush processes on memory, latency, and scale. Process-based parallelism wins exactly one fight (CPU-bound work on multiple cores), and that fight is narrower than people pretend. Default to coroutines; reach for processes only when a profiler proves you're CPU-bound.

When should you use Coroutines?

Your workload is I/O-bound: web servers, API gateways, scrapers, chat backends, anything that spends its life waiting on the network. Tens of thousands of concurrent connections on one box.

When should you use Process Based Parallelism?

Your workload is genuinely CPU-bound — image processing, numeric crunching, ML preprocessing — or you need hard fault isolation so one crash can't take down siblings.

What's the main difference between Coroutines and Process Based Parallelism?

Coroutines interleave many tasks cooperatively on one thread inside a shared address space, which suits I/O-bound work. Processes run in separate address spaces and can run on separate cores, which suits CPU-bound work and hard fault isolation.

How do Coroutines and Process Based Parallelism compare on I/O-bound throughput?

Coroutines: Excellent — tens of thousands of concurrent ops on one thread, KB each. Process Based Parallelism: Poor — each waiter costs a full process; memory caps you fast. Coroutines wins here.

Are there alternatives to consider beyond Coroutines and Process Based Parallelism?

A hybrid: a coroutine event loop in front handling I/O, dispatching CPU-heavy chunks to a process pool. This is what mature systems actually run, and it beats picking one religion.

The Bottom Line
Coroutines wins

Most real-world concurrency is I/O-bound — network calls, disk, database round-trips — and that is exactly where coroutines crush processes on memory, latency, and scale. Process-based parallelism wins exactly one fight (CPU-bound work on multiple cores), and that fight is narrower than people pretend. Default to coroutines; reach for processes only when a profiler proves you're CPU-bound.


Disagree? nice@nicepick.dev