A repeatable framework for thinking about a new system, sized for an hour-long whiteboard or a one-week design doc. The interesting work is not the boxes — it’s the trade-offs you make explicit and the failure modes you commit to handle.
The six questions to answer before drawing anything
- What problem are we solving, for whom? A system that solves the wrong problem perfectly is still wrong.
- What’s the scale? RPS, concurrent users, data volume, growth rate. An order of magnitude is fine — you need it to choose between regions of the design space.
- What’s the latency budget? End-to-end, by percentile (p50, p99). “Fast” isn’t a number.
- What’s the consistency model? Linearizable? Read-your-writes? Eventually consistent with bounded staleness? This is the single most expensive decision to reverse later.
- What’s the durability requirement? RPO (data loss tolerance), RTO (recovery time). “We can’t lose anything” doubles the cost.
- What’s outside scope? Be explicit. Multi-region, GDPR, mobile, offline — name what you’re not solving so you don’t accidentally promise it.
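To make the checklist concrete, here is what answered questions might look like for a hypothetical image-sharing service. Every number and string below is illustrative, not prescriptive; the point is that writing answers down as data forces concrete values instead of adjectives.

```python
# Hypothetical answers for an imagined image-sharing service.
# All values are assumptions chosen for illustration.
requirements = {
    "problem": "let users upload photos and browse feeds of accounts they follow",
    "scale": {"peak_rps": 5_000, "daily_uploads": 2_000_000, "growth_per_year": 2.0},
    "latency_budget_ms": {"feed_read_p50": 100, "feed_read_p99": 400},
    "consistency": "read-your-writes for the author; eventual (< 5 s) for followers",
    "durability": {"rpo_seconds": 0, "rto_minutes": 15},
    "out_of_scope": ["multi-region active-active", "offline mode", "video"],
}

# Sanity check: every question has a non-empty answer.
assert all(requirements.values())
```

Note what the format forbids: “fast”, “big”, and “reliable” don’t fit in these fields.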
Then, top-down
Start at the highest abstraction and only descend when the level above is settled.
- Boxes and arrows. Clients, edge, services, data. Don’t pick technologies yet — say “queue” not “Kafka.”
- Data flow. Trace a representative request and a representative write end-to-end. Latency adds up; mark each hop.
- Data model. What entities, what cardinalities, what access patterns. Choose the storage shape (relational / KV / document / graph / search / blob) from the queries, not from defaults.
- Concrete tech. Now name Postgres, Redis, Kafka, etc. — and justify it from the constraints above. “We chose X because of latency and consistency; we accept that this means Y.”
- Failure modes. For each component, ask: what happens when it dies, when it’s slow, when it lies (returns wrong data)? What’s the blast radius?
- Capacity. Back-of-envelope: per-request CPU, memory, disk, network. Multiply by RPS. Compare to single-instance limits. Decide where you’ll scale horizontally.
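The capacity pass above can be sketched in a few lines. Every per-request cost and instance limit here is an assumption for a hypothetical read-heavy service; the method, not the numbers, is the point.

```python
# Back-of-envelope capacity pass. All per-request costs and the
# single-instance limits are illustrative assumptions.
peak_rps = 5_000
cpu_ms_per_req = 2.0           # CPU time per request
egress_kb_per_resp = 20        # response payload size

# Aggregate demand at peak.
cpu_cores_needed = peak_rps * cpu_ms_per_req / 1_000      # ms/s -> cores
egress_mbps = peak_rps * egress_kb_per_resp * 8 / 1_000   # KB/s -> Mbit/s

# Compare to an assumed single instance: 16 cores, 10 Gbit NIC.
instances_for_cpu = max(1, -(-int(cpu_cores_needed) // 16))  # ceiling division
print(f"cores: {cpu_cores_needed:.0f}, egress: {egress_mbps:.0f} Mbit/s, "
      f"instances (CPU-bound): {instances_for_cpu}")
```

With these numbers one instance covers CPU but egress is the tighter margin; redundancy, not raw capacity, is what forces a second instance.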
The design-space cheat sheet
| Concern | Lever | Cost |
|---|---|---|
| Read latency | Cache (CDN, app, in-memory) | Cache invalidation, consistency staleness |
| Write throughput | Async via queue | Eventual consistency, dedupe complexity |
| Geographic latency | Multi-region replicas | Replication lag, conflict resolution |
| Search at scale | Dedicated index (Elastic, OpenSearch) | Sync drift between source-of-truth and index |
| Cross-service workflow | Event-driven / saga | Compensation logic, observability harder |
| Cost per request | Batch + columnar storage | Latency from batching window |
| Failure isolation | Bulkheads / circuit breakers | More moving parts, more config |
Every lever is a trade. State the trade explicitly when you pull the lever.
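As one example of the failure-isolation row, here is a minimal circuit-breaker sketch. The thresholds (consecutive-failure count, cooldown) are assumptions; production breakers also track half-open probe budgets and per-error-type policies.

```python
import time

class CircuitBreaker:
    """Illustrative circuit breaker: open after N consecutive failures,
    then fail fast until a cooldown elapses and allow a probe call."""

    def __init__(self, max_failures=3, reset_after_s=30.0):
        self.max_failures = max_failures
        self.reset_after_s = reset_after_s
        self.failures = 0
        self.opened_at = None            # None means circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after_s:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None        # half-open: allow one probe
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0                # any success resets the count
        return result
```

The trade is visible in the code: the breaker protects the caller’s latency, but adds state that has to be tuned and observed per dependency.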
The components you’ll always end up drawing
- Edge / API gateway — TLS termination, rate limiting, auth, routing.
- Stateless service tier — horizontally scalable, no local state.
- Caching layer — CDN at the edge, Redis-class in front of the DB, in-process for hot lookups.
- Source-of-truth datastore — usually relational; choose NoSQL with a specific reason (scale, schema flexibility, access pattern).
- Async pipeline — queue + workers — for anything that can be deferred from the request path.
- Search index — when the access pattern is “find by free text or facets”, separate from the source of truth.
- Object storage — for blobs, backups, and the data lake.
- Observability — metrics, logs, traces. Not optional. Build it in from day one or pay 10× to add later.
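The caching layer above is most often wired as cache-aside. A sketch, with an in-process dict standing in for a Redis-class store and a stub `db_lookup` standing in for the source-of-truth query (both assumptions for illustration):

```python
import time

CACHE: dict = {}          # key -> (value, expires_at); stand-in for Redis
TTL_S = 60.0              # illustrative TTL

def db_lookup(key):
    # Placeholder for the real source-of-truth query.
    return f"row-for-{key}"

def get(key):
    hit = CACHE.get(key)
    if hit is not None and hit[1] > time.monotonic():
        return hit[0]                               # fresh cache hit
    value = db_lookup(key)                          # miss or stale: hit the DB
    CACHE[key] = (value, time.monotonic() + TTL_S)  # populate on the way out
    return value
```

The trade from the cheat sheet shows up immediately: a write that bypasses `get` leaves a stale entry for up to `TTL_S` seconds unless you also invalidate.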
ML systems: the extra concerns
Beyond the standard system, ML adds:
- Training vs serving — different latency / throughput / consistency profiles, often different infrastructure entirely.
- Feature store — to keep training and serving features identical (skew is the silent killer).
- Model versioning + rollout — same problem as code deploys, but with a longer feedback loop and bigger blast radius.
- Drift monitoring — input distribution, prediction distribution, label distribution (when labels arrive). Concept drift vs data drift; both kill quietly.
- Evaluation harness — offline metrics (precision, recall, AUC) as gate, online A/B as final word. Offline-online correlation is its own ongoing problem.
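One common, simple drift signal is the population stability index over binned feature values. The bin edges and the 0.2 alert threshold below are conventional rules of thumb, not universal constants:

```python
import math

def psi(expected_counts, actual_counts):
    """Population stability index between two histograms over the same bins."""
    e_total, a_total = sum(expected_counts), sum(actual_counts)
    score = 0.0
    for e, a in zip(expected_counts, actual_counts):
        e_pct = max(e / e_total, 1e-6)   # floor to avoid log(0) on empty bins
        a_pct = max(a / a_total, 1e-6)
        score += (a_pct - e_pct) * math.log(a_pct / e_pct)
    return score

baseline = [100, 300, 400, 200]          # training-time histogram (assumed)
assert psi(baseline, baseline) < 1e-9    # identical distributions: no drift
assert psi(baseline, [400, 300, 200, 100]) > 0.2   # shifted: trips the alert
```

This catches data drift in inputs; concept drift (the input-label relationship changing) needs labels and a delayed evaluation loop.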
The interview / design-review presentation arc
- Clarify (5 min) — the six questions above. Lead with these; never start drawing.
- High-level diagram (10 min) — boxes and arrows, no tech yet.
- Detailed component walk (15 min) — pick the 2–3 hardest components, drill into data model, algorithm, failure modes.
- Scale and bottleneck pass (10 min) — what breaks first, what you’d do about it.
- Trade-offs and what you’d do differently (5 min) — explicit, named.
A senior interviewer is grading the trade-off articulation, not the box drawing.