A repeatable framework for thinking about a new system, sized for an hour-long whiteboard or a one-week design doc. The interesting work is not the boxes — it’s the trade-offs you make explicit and the failure modes you commit to handle.

The six questions to answer before drawing anything

  1. What problem are we solving, for whom? A system that solves the wrong problem perfectly is still wrong.
  2. What’s the scale? RPS, concurrent users, data volume, growth rate. An order of magnitude is fine — you need it to choose between regions of the design space (a worked example follows this list).
  3. What’s the latency budget? End-to-end, by percentile (p50, p99). “Fast” isn’t a number.
  4. What’s the consistency model? Linearizable? Read-your-writes? Eventually consistent with bounded staleness? This is the single most expensive decision.
  5. What’s the durability requirement? RPO (recovery point objective: how much data you can afford to lose), RTO (recovery time objective: how long recovery can take). “We can’t lose anything” roughly doubles the cost.
  6. What’s outside scope? Be explicit. Multi-region, GDPR, mobile, offline — name what you’re not solving so you don’t accidentally promise it.
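
A minimal worked example of question 2’s order-of-magnitude estimate. Every input is an invented placeholder (10M daily actives, 20 requests per user per day, a 5× peak factor); substitute your own numbers:

```python
# Back-of-envelope scale estimate for question 2. All inputs are assumed
# placeholders, not recommendations; plug in your own numbers.
daily_active_users = 10_000_000
requests_per_user_per_day = 20
peak_to_average = 5                     # assumed diurnal peak factor

avg_rps = daily_active_users * requests_per_user_per_day / 86_400
peak_rps = avg_rps * peak_to_average

print(f"avg ~ {avg_rps:,.0f} RPS, peak ~ {peak_rps:,.0f} RPS")
# avg ~ 2,315 RPS, peak ~ 11,574 RPS. What matters is the ~10^4 magnitude, not the digits.
```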

Then, top-down

Start at the highest abstraction and only descend when the level above is settled.

  1. Boxes and arrows. Clients, edge, services, data. Don’t pick technologies yet — say “queue” not “Kafka.”
  2. Data flow. Trace a representative request and a representative write end-to-end. Latency adds up; mark each hop.
  3. Data model. What entities, what cardinalities, what access patterns. Choose the storage shape (relational / KV / document / graph / search / blob) from the queries, not from defaults.
  4. Concrete tech. Now name Postgres, Redis, Kafka, etc. — and justify it from the constraints above. “We chose X because of latency and consistency; we accept that this means Y.”
  5. Failure modes. For each component, ask: what happens when it dies, when it’s slow, when it lies (returns wrong data)? What’s the blast radius?
  6. Capacity. Back-of-envelope: per-request CPU, memory, disk, network. Multiply by RPS. Compare to single-instance limits. Decide where you’ll scale horizontally (sketched below).
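
A sketch of the step-6 arithmetic: multiply assumed per-request costs by peak RPS and compare against one instance’s limits. All figures are invented placeholders; measure your own.

```python
# Step 6, back-of-envelope: per-request cost x peak RPS vs. single-instance limits.
# All numbers below are assumed placeholders; replace them with measurements.
peak_rps = 12_000
cpu_ms_per_request = 2                  # CPU time spent per request
bytes_out_per_request = 8_000           # average response size

cpu_cores_needed = peak_rps * cpu_ms_per_request / 1_000   # 1,000 ms/s per core
egress_gbps = peak_rps * bytes_out_per_request * 8 / 1e9

cores_per_instance = 16
print(f"CPU: {cpu_cores_needed:.0f} cores = {cpu_cores_needed / cores_per_instance:.1f} x 16-core instances")
print(f"Egress: {egress_gbps:.2f} Gbps")
# 24 cores = 1.5 instances; 0.77 Gbps. Whichever dimension saturates first is
# where you scale horizontally.
```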

The design-space cheat sheet

| Concern | Lever | Cost |
| --- | --- | --- |
| Read latency | Cache (CDN, app, in-memory) | Cache invalidation, stale reads |
| Write throughput | Async via queue | Eventual consistency, dedupe complexity |
| Geographic latency | Multi-region replicas | Replication lag, conflict resolution |
| Search at scale | Dedicated index (Elastic, OpenSearch) | Sync drift between source-of-truth and index |
| Cross-service workflow | Event-driven / saga | Compensation logic, harder observability |
| Cost per request | Batch + columnar storage | Latency from batching window |
| Failure isolation | Bulkheads / circuit breakers | More moving parts, more config |

Every lever is a trade. State the trade explicitly when you pull the lever.
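
As one concrete example, take the failure-isolation row. A minimal circuit-breaker sketch in Python; the threshold, timeout, and half-open policy are illustrative assumptions, not tuned values:

```python
import time

class CircuitBreaker:
    """Fail fast once a dependency has failed repeatedly; probe it after a cooldown."""

    def __init__(self, failure_threshold=5, reset_timeout_s=30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self.failures = 0
        self.opened_at = None           # None = circuit closed (normal operation)

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout_s:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None       # half-open: allow one probe request through
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()   # trip: stop hammering a sick dependency
            raise
        self.failures = 0               # any success closes the circuit
        return result
```

The cost column shows up immediately: callers must now handle a third outcome (fast failure), and the threshold and timeout are two more knobs to misconfigure.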

The components you’ll always end up drawing

  • Edge / API gateway — TLS termination, rate limiting, auth, routing.
  • Stateless service tier — horizontally scalable, no local state.
  • Caching layer — CDN at the edge, Redis-class in front of the DB, in-process for hot lookups (a cache-aside sketch follows this list).
  • Source-of-truth datastore — usually relational; choose NoSQL with a specific reason (scale, schema flexibility, access pattern).
  • Async pipeline — queue + workers — for anything that can be deferred from the request path.
  • Search index — when the access pattern is “find by free text or facets”; keep it separate from the source of truth.
  • Object storage — for blobs, backups, and the data lake.
  • Observability — metrics, logs, traces. Not optional. Build it in from day one or pay 10× to add later.
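
For the caching layer, the usual starting point is cache-aside: read through the cache, fall back to the source of truth on a miss, and invalidate (rather than update) on writes. A sketch where `cache` and `db` are hypothetical stand-ins for a Redis-class store and the source-of-truth datastore:

```python
CACHE_TTL_S = 300   # TTL bounds staleness if an invalidation is ever missed

def get_user(cache, db, user_id):
    user = cache.get(f"user:{user_id}")
    if user is None:                            # miss: hit the source of truth
        user = db.fetch_user(user_id)
        cache.set(f"user:{user_id}", user, ttl=CACHE_TTL_S)
    return user

def update_user(cache, db, user_id, fields):
    db.update_user(user_id, fields)             # write the source of truth first
    cache.delete(f"user:{user_id}")             # invalidate; next read repopulates
```

Deleting instead of updating on write keeps the write path simple and narrows (though does not eliminate) the window for caching a stale value; the TTL is the backstop.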

ML systems: the extra concerns

Beyond the standard system, ML adds:

  • Training vs serving — different latency / throughput / consistency profiles, often different infrastructure entirely.
  • Feature store — to keep training and serving features identical (skew is the silent killer).
  • Model versioning + rollout — same problem as code deploys, but with a longer feedback loop and bigger blast radius.
  • Drift monitoring — input distribution, prediction distribution, label distribution (when labels arrive). Concept drift vs data drift; both kill quietly (see the sketch after this list).
  • Evaluation harness — offline metrics (precision, recall, AUC) as gate, online A/B as final word. Offline-online correlation is its own ongoing problem.
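
Drift checks come in many flavors; one common, minimal version for numeric input features is the Population Stability Index between a training-time baseline and live traffic. A sketch assuming numpy; the 10-bin split and the 0.1/0.2 thresholds are conventional rules of thumb, not universal constants:

```python
import numpy as np

def psi(baseline, live, bins=10):
    """Population Stability Index of `live` against quantile bins of `baseline`."""
    edges = np.quantile(baseline, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf       # catch values outside the baseline range
    b = np.histogram(baseline, edges)[0] / len(baseline)
    v = np.histogram(live, edges)[0] / len(live)
    b = np.clip(b, 1e-6, None)                  # avoid log(0) for empty bins
    v = np.clip(v, 1e-6, None)
    return float(np.sum((v - b) * np.log(v / b)))

# Rule of thumb: < 0.1 stable, 0.1-0.2 investigate, > 0.2 alert.
```

Run it per feature and per prediction score; label-distribution drift needs the same check, delayed until labels arrive.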

The interview / design-review presentation arc

  1. Clarify (5 min) — questions 1–6 above. Lead with these; never start by drawing.
  2. High-level diagram (10 min) — boxes and arrows, no tech yet.
  3. Detailed component walk (15 min) — pick the 2–3 hardest components, drill into data model, algorithm, failure modes.
  4. Scale and bottleneck pass (10 min) — what breaks first, what you’d do about it.
  5. Trade-offs and what you’d do differently (5 min) — explicit, named.

A senior interviewer is grading the trade-off articulation, not the box drawing.