Coroutines vs Process-Based Parallelism
When to reach for async coroutines and when to fork real OS processes — a decisive verdict on the two dominant concurrency models, judged on CPU work, I/O work, fault isolation, and what actually breaks in production.
The short answer
Coroutines over Process Based Parallelism in most cases. Most real-world concurrency is I/O-bound (network calls, disk, database round-trips), and that is exactly where coroutines crush processes on memory, latency, and scale.
- Pick Coroutines if your workload is I/O-bound: web servers, API gateways, scrapers, chat backends, anything that spends its life waiting on the network. Tens of thousands of concurrent connections on one box.
- Pick Process Based Parallelism if your workload is genuinely CPU-bound (image processing, numeric crunching, ML preprocessing) or you need hard fault isolation so one crash can't take down siblings.
- Also consider: A hybrid: a coroutine event loop in front handling I/O, dispatching CPU-heavy chunks to a process pool. This is what mature systems actually run, and it beats picking one religion.
— Nice Pick, opinionated tool recommendations
The only question that matters: I/O or CPU?
Stop arguing taste and answer one question — what is your program waiting on? If it waits on the network, disk, or a database, it is I/O-bound, and coroutines are the answer. A coroutine parks itself the instant it hits a yield point and the event loop services another. One thread, one core, tens of thousands of in-flight operations, kilobytes of memory each. Processes can't touch that. If instead your program is pinning a core doing math — resizing images, running a tokenizer, FFTs — then no amount of async will help, because there's nothing to wait on. You need more cores, which means more processes. The mistake almost everyone makes is reaching for processes 'to go faster' on an I/O-bound service. That doesn't go faster. It just burns RAM and context-switch budget to do the same waiting in a more expensive way.
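To make the "one thread, thousands of in-flight operations" claim concrete, here is a minimal sketch using Python's asyncio. The names `fake_io_call` and `run_many` are made up for illustration, and `asyncio.sleep` stands in for a real network or database wait, so treat the timing as indicative only.

```python
import asyncio
import time


async def fake_io_call(task_id: int, delay: float) -> int:
    # asyncio.sleep stands in for a network or database wait (an assumption);
    # the coroutine parks here and the event loop services another one.
    await asyncio.sleep(delay)
    return task_id


async def run_many(count: int, delay: float) -> tuple[list[int], float]:
    # Start `count` waits at once on a single thread and time the whole batch.
    start = time.perf_counter()
    results = await asyncio.gather(*(fake_io_call(i, delay) for i in range(count)))
    elapsed = time.perf_counter() - start
    return results, elapsed


if __name__ == "__main__":
    results, elapsed = asyncio.run(run_many(count=1000, delay=0.1))
    print(f"{len(results)} waits finished in {elapsed:.2f}s on one thread")
```

Because the waits overlap, the batch should take roughly as long as one wait rather than the sum of all of them. Swap the sleep for CPU work and that advantage disappears, which is the whole point of the question above.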
Where processes earn their keep
Process-based parallelism has two genuine, non-negotiable advantages, and honesty demands naming them. First: true parallelism. In runtimes with a global lock (CPython's GIL being the famous offender), coroutines and threads cannot run Python bytecode on two cores at once. Processes each get their own interpreter and can each run on their own core. For CPU-bound work on an eight-core machine, that is the difference between using your hardware and wasting seven-eighths of it. Second: fault and memory isolation. A coroutine that segfaults in a shared address space, leaks state, or corrupts a global takes the whole event loop with it. A worker process that dies just dies: the supervisor respawns it and the siblings never notice. That isolation is why Nginx, Postgres, and Gunicorn are built around worker processes. If a crash in one unit of work must not poison the others, processes are not optional.
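The isolation argument can be sketched with nothing but the standard library. This is an illustration, not how Nginx or Gunicorn supervise workers: `run_worker`, `CRASHING_WORKER`, and `HEALTHY_WORKER` are hypothetical names, and `os._exit(3)` stands in for a real crash.

```python
import subprocess
import sys

# Hypothetical worker bodies: the first exits abruptly to stand in for a crash.
CRASHING_WORKER = "import os; os._exit(3)"
HEALTHY_WORKER = "print(sum(range(10)))"


def run_worker(source: str) -> subprocess.CompletedProcess:
    # Each worker is a separate OS process with its own interpreter and memory,
    # so whatever happens inside it cannot corrupt the parent's state.
    return subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True,
        text=True,
    )


if __name__ == "__main__":
    crashed = run_worker(CRASHING_WORKER)
    healthy = run_worker(HEALTHY_WORKER)
    print("crashed worker return code:", crashed.returncode)
    print("healthy worker output:", healthy.stdout.strip())
```

The parent sees a non-zero return code from the dead worker and carries on. A real supervisor would respawn the worker at that point; the sibling is unaffected either way.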
The costs nobody puts on the slide
Processes are expensive and people pretend otherwise. Each one is megabytes of resident memory, a full fork or spawn, and a separate copy of your imports. Spin up 10,000 and you've eaten the machine. Worse, they don't share memory, so every byte that crosses between them gets pickled, copied through a pipe, and unpickled — serialization overhead that quietly dominates if your tasks are small or your payloads are fat. Coroutines have their own tax, and it's the one that actually bites teams: one blocking call — a synchronous DB driver, a CPU-heavy loop, a careless time.sleep — stalls the entire event loop and freezes every other task on it. Async is cooperative; one rude function ruins the party. And async code is colored: await propagates up your whole call stack, and bridging sync and async is a recurring source of deadlocks. Neither model is free.
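The "one rude function" problem is easy to reproduce. The sketch below runs a heartbeat coroutine next to either a blocking or a cooperative task and reports the longest gap between heartbeat ticks. All function names are hypothetical, and the 0.2 second delay is an arbitrary choice for the demo.

```python
import asyncio
import time


async def heartbeat(gaps: list[float], ticks: int, interval: float) -> None:
    # Records how long each tick really took; a healthy loop stays near `interval`.
    last = time.monotonic()
    for _ in range(ticks):
        await asyncio.sleep(interval)
        now = time.monotonic()
        gaps.append(now - last)
        last = now


async def rude_task(delay: float) -> None:
    await asyncio.sleep(0)  # let the other tasks start first
    time.sleep(delay)  # blocking call: nothing else on the loop can run


async def polite_task(delay: float) -> None:
    await asyncio.sleep(delay)  # cooperative wait: yields to the event loop


async def worst_gap(task, delay: float = 0.2) -> float:
    # Run the heartbeat alongside the given task and return the longest stall.
    gaps: list[float] = []
    await asyncio.gather(heartbeat(gaps, ticks=10, interval=0.01), task(delay))
    return max(gaps)


if __name__ == "__main__":
    print(f"worst gap with time.sleep:    {asyncio.run(worst_gap(rude_task)):.3f}s")
    print(f"worst gap with asyncio.sleep: {asyncio.run(worst_gap(polite_task)):.3f}s")
```

With the blocking call, the heartbeat should stall for about the full delay. With the cooperative one, it keeps ticking near its interval. In a real service that stall is paid by every request sharing the loop.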
What I'd actually ship
Default to coroutines and treat processes as the specialist tool you summon with evidence. The honest production architecture is a hybrid: an async event loop terminates connections and orchestrates all the I/O — the thing you have thousands of — and when a request needs real CPU work, you hand that chunk to a small, fixed-size process pool sized to your core count. That's exactly the shape of a modern Python web stack (uvicorn/asyncio out front, a ProcessPoolExecutor for the heavy bits) and of Node clustered behind PM2. The anti-patterns are symmetric and both common: spawning a process per request to 'parallelize' a service that's really just waiting on Postgres, and cramming a 200ms CPU loop directly into an event loop and wondering why p99 latency exploded. Profile first. If you're waiting, go async. If you're computing, go parallel. Don't guess — the profiler already knows.
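The hybrid shape described above can be sketched in a few lines with asyncio and `concurrent.futures`. This is a minimal illustration under assumptions, not a production server: `cpu_heavy`, `handle_request`, and `serve` are made-up names, the sum of squares stands in for real CPU work, and `asyncio.sleep` stands in for request I/O.

```python
import asyncio
from concurrent.futures import Executor, ProcessPoolExecutor


def cpu_heavy(n: int) -> int:
    # Stand-in for real CPU work (image resize, tokenizing, FFT).
    return sum(i * i for i in range(n))


async def handle_request(pool: Executor, n: int) -> int:
    await asyncio.sleep(0.01)  # simulated I/O, e.g. reading the request
    loop = asyncio.get_running_loop()
    # Hand the CPU-bound chunk to the pool; the event loop stays free meanwhile.
    return await loop.run_in_executor(pool, cpu_heavy, n)


async def serve(pool: Executor, sizes: list[int]) -> list[int]:
    # Many requests in flight at once, all sharing one fixed-size pool.
    return await asyncio.gather(*(handle_request(pool, n) for n in sizes))


if __name__ == "__main__":
    # With no max_workers argument the pool defaults to the machine's CPU count.
    with ProcessPoolExecutor() as pool:
        print(asyncio.run(serve(pool, [10_000, 20_000, 30_000])))
```

Passing the pool in as a parameter keeps the pool size a deployment decision and lets a thread pool be substituted in tests. The `if __name__ == "__main__":` guard matters on platforms where worker processes are spawned rather than forked.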
Quick Comparison
| Factor | Coroutines | Process Based Parallelism |
|---|---|---|
| I/O-bound throughput | Excellent — tens of thousands of concurrent ops on one thread, KB each | Poor — each waiter costs a full process; memory caps you fast |
| CPU-bound parallelism | None under a GIL — bound to one core for bytecode | True multi-core; each process owns a core |
| Memory per unit of work | Kilobytes per coroutine, shared address space | Megabytes per process, no shared memory |
| Fault isolation | Weak — one crash or block kills the whole event loop | Strong — a worker dies alone, supervisor respawns it |
| Data sharing cost | Free — same address space, no serialization | Expensive — pickle/IPC copy across every boundary |
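The data sharing row is the one people underestimate, so here is a rough way to see it. The sketch pickles and unpickles a payload in one process, which approximates the serialization half of an IPC hop but not the pipe copy itself. `ipc_round_trip` and the sample rows are hypothetical.

```python
import pickle
import time


def ipc_round_trip(payload: object) -> tuple[object, int, float]:
    # Mimics the serialize and deserialize steps of crossing a process boundary.
    start = time.perf_counter()
    wire_bytes = pickle.dumps(payload, protocol=pickle.HIGHEST_PROTOCOL)
    copy = pickle.loads(wire_bytes)
    elapsed = time.perf_counter() - start
    return copy, len(wire_bytes), elapsed


if __name__ == "__main__":
    rows = [{"id": i, "name": f"user-{i}"} for i in range(100_000)]
    copy, size, elapsed = ipc_round_trip(rows)
    print(f"{size / 1e6:.1f} MB pickled, round trip took {elapsed * 1000:.0f} ms")
    print("same object after the trip?", copy is rows)
```

A coroutine handing the same list to another coroutine passes a reference and pays none of this. If the measured round trip rivals the work you planned to offload, the process pool is costing more than it saves.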
The Verdict
Use Coroutines if: Your workload is I/O-bound: web servers, API gateways, scrapers, chat backends, anything that spends its life waiting on the network. Tens of thousands of concurrent connections on one box.
Use Process Based Parallelism if: Your workload is genuinely CPU-bound — image processing, numeric crunching, ML preprocessing — or you need hard fault isolation so one crash can't take down siblings.
Consider: A hybrid: a coroutine event loop in front handling I/O, dispatching CPU-heavy chunks to a process pool. This is what mature systems actually run, and it beats picking one religion.
Coroutines vs Process Based Parallelism: FAQ
Which is better: Coroutines or Process Based Parallelism?
Coroutines is the Nice Pick. Most real-world concurrency is I/O-bound — network calls, disk, database round-trips — and that is exactly where coroutines crush processes on memory, latency, and scale. Process-based parallelism wins exactly one fight (CPU-bound work on multiple cores), and that fight is narrower than people pretend. Default to coroutines; reach for processes only when a profiler proves you're CPU-bound.
When should you use Coroutines?
Your workload is I/O-bound: web servers, API gateways, scrapers, chat backends, anything that spends its life waiting on the network. Tens of thousands of concurrent connections on one box.
When should you use Process Based Parallelism?
Your workload is genuinely CPU-bound — image processing, numeric crunching, ML preprocessing — or you need hard fault isolation so one crash can't take down siblings.
What's the main difference between Coroutines and Process Based Parallelism?
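Coroutines multiplex many tasks on one thread by yielding cooperatively whenever a task waits, which makes them cheap and well suited to I/O-bound work. Processes run in separate address spaces and can occupy separate cores, which buys true CPU parallelism and fault isolation at the cost of memory and serialization overhead.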
How do Coroutines and Process Based Parallelism compare on I/O-bound throughput?
Coroutines: Excellent, with tens of thousands of concurrent ops on one thread at KB each. Process Based Parallelism: Poor, since each waiter costs a full process and memory caps you fast. Coroutines win here.
Are there alternatives to consider beyond Coroutines and Process Based Parallelism?
A hybrid: a coroutine event loop in front handling I/O, dispatching CPU-heavy chunks to a process pool. This is what mature systems actually run, and it beats picking one religion.
Disagree? nice@nicepick.dev