Personal notes · May 2026

System Design Trade-Offs

There are no best architectures — only architectures that are right for a set of constraints. I wrote this note to map the trade-off space I navigate in system design: the laws that bound the design, the axes where you spend your one budget, and the framework I use to choose deliberately and defend the choice.

The question behind this note: which trade-offs are we making on purpose — and which ones are we drifting into because no one wrote them down? What follows is how I think through that.
Covers: Distributed systems · data · scaling · resilience
Written: May 2026
By: Linh Truong

Foundations

The Trade-Off Mental Model

Every architectural decision moves a slider. Moving it toward one virtue spends a budget that could have bought another. The skill I care about is not knowing the "right" answer — it is naming the budget you are spending and proving it is the cheapest one to spend for this system.

"There are no solutions, only trade-offs." — Thomas Sowell · Adapted to engineering by every architect who ever shipped at scale.

Principle 1

You cannot maximize everything

Latency, consistency, cost, simplicity, and flexibility pull against each other. Optimizing one past a point degrades another. Pick the two or three that matter for the business and let the rest be "good enough."

Principle 2

Constraints reveal the answer

The "best" design falls out of the requirements + constraints: read/write ratio, scale, consistency needs, team size, latency budget, and money. Quantify these first; the architecture is then mostly a derivation.

Principle 3

Reversible vs one-way doors

Decide fast on reversible choices; deliberate hard on one-way doors (data model, partition key, public API contract, sync↔async boundaries). Spend your analysis budget where the cost of being wrong is highest.

The four budgets you are always spending

[Figure: The Constraint Diamond. Pull one corner and the others move. Corners: Performance (latency · throughput), Scale (availability), Cost ($ · ops · headcount), Simplicity (consistency). "Your system" sits at the center.]
You sit in the middle. Dragging the system toward any corner stretches the cords to the others — more performance often costs more money and simplicity; more scale stresses consistency.

Foundations

Laws & Theorems That Bound Every Design

These are not opinions — they are constraints. Knowing them stops you from promising the impossible (e.g., "strongly consistent, always available, across regions").

Distributed

CAP Theorem

During a network partition, you must choose Consistency or Availability. You cannot have both. When the network is healthy, the choice is moot — which is why PACELC matters more in practice.

Distributed

PACELC

If Partition → trade A vs C; Else → trade Latency vs Consistency. Captures the everyday cost: strong consistency adds round-trips even when nothing is broken.

Scaling

Amdahl's & Universal Scalability Law

Speedup is capped by the serial fraction (Amdahl). USL adds a coherency penalty: past a point, adding nodes makes throughput worse due to cross-node coordination. Contention is the enemy of scale.
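Both laws fit in a few lines, which makes the ceiling easy to see. This is a minimal sketch: the function names are mine, and the USL coefficients `alpha` (contention) and `beta` (coherency) are made-up illustrative values, not measurements from any real system.

```python
def amdahl_speedup(parallel_fraction: float, n: int) -> float:
    """Amdahl's Law: speedup on n workers when only part of the work parallelizes."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n)


def usl_throughput(n: int, alpha: float, beta: float) -> float:
    """Universal Scalability Law: throughput of n nodes relative to one node.

    alpha models contention (serialized work); beta models coherency
    (pairwise cross-node coordination, which grows with n * (n - 1)).
    """
    return n / (1.0 + alpha * (n - 1) + beta * n * (n - 1))


# With 95% parallel work the speedup can never exceed 1 / 0.05 = 20x.
for workers in (1, 8, 64, 1024):
    print(workers, "workers ->", round(amdahl_speedup(0.95, workers), 2), "x")

# Hypothetical coefficients: throughput rises, peaks, then declines.
curve = {n: usl_throughput(n, alpha=0.03, beta=0.0005) for n in range(1, 129)}
peak_nodes = max(curve, key=curve.get)
print("USL throughput peaks at", peak_nodes, "nodes")
```

In practice you would fit `alpha` and `beta` to load-test measurements; the shape of the curve, not these numbers, is the point.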

Queueing

Little's Law

L = λ × W. Concurrency = arrival rate × latency. Cut latency and you cut the in-flight work (threads, connections, memory) you must provision. The cheapest capacity is lower latency.

Queueing

Tail latency & the queueing wall

As utilization → 100%, queue time explodes non-linearly. Run hot (>~70–80%) and p99 latency detonates. Plan capacity against the tail, not the mean.
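To see how steep the wall is, here is a sketch using the simplest textbook model, an M/M/1 queue (Poisson arrivals, exponential service times, one server). That model is my assumption for illustration; real services are burstier, and this formula gives the mean time in system, so the tail is worse than what it shows.

```python
def mm1_time_in_system(service_time_ms: float, utilization: float) -> float:
    """Mean time in system (waiting + service) for an M/M/1 queue: W = S / (1 - rho)."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return service_time_ms / (1.0 - utilization)


# A 10 ms operation at increasing utilization.
for rho in (0.5, 0.7, 0.8, 0.9, 0.95, 0.99):
    print(f"{rho:.0%} busy -> {mm1_time_in_system(10.0, rho):.0f} ms mean")
```

Going from 50% to 90% utilization multiplies mean latency by five under this model, which is why headroom is part of the latency budget rather than waste.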

Org

Conway's Law

Systems mirror the communication structure of the org that builds them. Want clean service boundaries? Shape the teams first. Microservices imposed on a monolithic org produce a distributed monolith.

How I end the debate: quote the law. "We can't be both strongly consistent and multi-region available during a partition; CAP forbids it. Which do we sacrifice when us-east can't reach eu-west?" turns an opinion war into a requirements decision.

Foundations

Latency Numbers Every Engineer Should Know

Order-of-magnitude intuition is what separates a back-of-the-envelope estimate from hand-waving. Memorize the relative gaps, not the exact figures.

Operation latency (log scale, approx.)
  • L1 cache ref: ~1 ns
  • Branch mispredict: ~3 ns
  • L2 cache ref: ~4 ns
  • Mutex lock/unlock: ~17 ns
  • Main memory ref: ~100 ns
  • Compress 1 KB (zippy): ~2 µs
  • SSD random read: ~16 µs
  • Round trip in same DC: ~500 µs
  • HDD seek: ~2 ms
  • RTT CA ⇄ Netherlands: ~150 ms
The gaps that matter: memory is ~100× slower than L1; SSD ~100× slower than memory; a network hop ~30× slower than SSD; crossing the planet ~300× a same-DC round trip. Design data locality accordingly.
How to use these: capacity math in 60 seconds. 1 M requests/day ≈ 12 req/s on average. A service with 200 ms latency needs ≈ 0.2 × peak_rps concurrent workers (Little's Law; the law uses mean latency, so sizing with p99 is a conservative upper bound). 1 KB × 1 M rows ≈ 1 GB. These let you sanity-check any design on a whiteboard.
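The same whiteboard arithmetic as a small sketch. The function names and the 500 req/s peak are hypothetical, chosen only to make the numbers concrete; 1 KB is taken as 1000 bytes.

```python
import math

SECONDS_PER_DAY = 86_400


def avg_rps(requests_per_day: float) -> float:
    """Average request rate implied by a daily volume."""
    return requests_per_day / SECONDS_PER_DAY


def concurrent_workers(peak_rps: float, latency_ms: float) -> int:
    """Little's Law, L = lambda * W, rounded up to whole workers."""
    return math.ceil(peak_rps * latency_ms / 1000)


def storage_gb(row_bytes: int, rows: int) -> float:
    """Raw data size in decimal gigabytes (no indexes, no replication)."""
    return row_bytes * rows / 1e9


print(round(avg_rps(1_000_000), 1), "req/s average for 1 M requests/day")
print(concurrent_workers(peak_rps=500, latency_ms=200), "workers at a 500 req/s peak")
print(storage_gb(row_bytes=1000, rows=1_000_000), "GB for 1 M rows of 1 KB")
```

Peak traffic is usually a multiple of the average, so size workers from the peak and keep the average for cost estimates.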

Core Axis · Distributed State

CAP & PACELC — Consistency vs Availability

The first fork in any distributed datastore decision. State which side you land on, and what happens to the other guarantee during a partition.

[Figure: CAP under a network partition]
  • Node A (us-east): value = 42; a client writes 99.
  • Node B (eu-west): value = 42 (stale); a client reads while the link between A and B is partitioned.
  • Choose CP (Consistency): Node B refuses the read (or errors) rather than return stale 42. Banks, inventory, locks. Sacrifice uptime.
  • Choose AP (Availability): Node B returns 42 now, reconciles to 99 when the link heals. Carts, feeds, DNS, telemetry. Sacrifice freshness.
CAP only forces a choice during a partition. PACELC reminds you that even when healthy ("Else"), strong consistency costs latency via cross-node coordination.
System | PACELC class | Behavior | Use it for
PostgreSQL / MySQL (single primary) | PC/EC | Strong consistency; primary is a SPOF for writes | Transactions, money, relational integrity
Spanner / CockroachDB / YugabyteDB | PC/EC | Global strong consistency via consensus + synced clocks; pays latency | Global OLTP needing strong guarantees
DynamoDB / Cassandra (default) | PA/EL | Available, eventually consistent; tunable quorums | High-scale writes, carts, sessions, feeds
MongoDB (default) | PA/EC | Available under partition; consistent reads from primary when healthy | Flexible documents, mixed workloads
Redis (single) | n/a | In-memory, strong on one node; replication is async (can lose writes on failover) | Cache, ephemeral, leaderboards

Core Axis · Distributed State

Consistency Models — The Spectrum

"Consistency" is not binary. It is a ladder from strict to eventual; each rung trades latency and availability for stronger guarantees. Pick the weakest model that still meets correctness.

[Figure: the consistency ladder, from stronger · slower · less available down to weaker · faster · more available]
  • Linearizable: single up-to-date copy
  • Sequential: global order, maybe stale
  • Causal: cause precedes effect
  • Read-your-writes: session guarantee
  • Eventual: converges, someday
Cost of a rung up ≈ extra coordination round-trips and reduced availability during faults.
The spectrum of consistency guarantees. Most user-facing systems are happy at causal or read-your-writes — full linearizability is expensive and rarely required outside money and locks.
Strong

When you truly need it

Account balances, inventory decrements, unique username claims, distributed locks, "exactly one winner" auctions. Anywhere a stale read causes double-spend or oversell.

Eventual / Weak

When weak is plenty

Like counts, view counts, social feeds, product catalogs, recommendations, analytics, presence indicators. A few seconds of staleness is invisible to users and buys massive scale + availability.

Hidden cost: Eventual consistency pushes complexity to the client and the reconciliation layer: conflict resolution (LWW, vector clocks, CRDTs), idempotent retries, and "undo" UX for rejected writes. It is cheaper on the write path and more expensive everywhere else. Budget for it.

Core Axis · Scale

Vertical vs Horizontal Scaling

Buy a bigger box, or buy more boxes? Vertical is simpler and faster to reach for; horizontal is the only path past a single machine's ceiling — at the price of distributed-systems complexity.

Scale Up vs Scale Out · simplicity ⇄ ceiling
⬆ Vertical (scale up)
  • Bigger CPU / RAM / faster disks on one node
  • No code changes; no distribution problems
  • Strong consistency stays trivial
  • Hard ceiling + super-linear $ per unit as you approach it
  • Single point of failure; disruptive upgrades
VS
➡ Horizontal (scale out)
  • Many commodity nodes behind a balancer
  • Near-linear capacity & built-in redundancy
  • Requires statelessness, sharding, coordination
  • Introduces CAP, partial failure, data movement
  • Operational + cognitive overhead climbs
Decision rule: Scale up first — it is cheaper in engineering time until you hit the box ceiling or need HA. Scale out when a single node can't hold the load/data or you require fault tolerance. Make services stateless early so scaling out is later a config change, not a rewrite.

The prerequisite: statelessness

[Figure: Push state out → scale the compute freely. Clients → Load Balancer → three stateless app nodes (add/remove nodes at will) → Session / cache (Redis) and Database (primary). The hard-to-scale part lives in the state tier, deliberately bounded.]
Statelessness is what makes horizontal scaling cheap. Externalize sessions and data so any node can serve any request; the load balancer then becomes the only thing that "knows" who is where.

Core Axis · Data Model

SQL vs NoSQL — and the Many NoSQLs

The most consequential one-way door in most systems. Choose for your access patterns and consistency needs, not for hype. "NoSQL" is four very different things.

Dimension | Relational (SQL) | Document | Key-Value | Wide-Column | Graph
Examples | Postgres, MySQL | MongoDB, Couchbase | Redis, DynamoDB | Cassandra, ScyllaDB, Bigtable | Neo4j, Neptune
Schema | Rigid, enforced | Flexible | None | Flexible columns | Nodes + edges
Best query | Joins, ad-hoc, aggregates | Whole-document fetch | Get/put by key | Wide rows by partition key | Relationship traversal
Consistency | Strong (ACID) | Tunable | Tunable | Tunable | Strong
Horizontal scale | Harder (sharding) | Good | Excellent | Excellent | Hard
Sweet spot | Transactions, reporting | Catalogs, profiles, CMS | Cache, sessions, counters | Time-series, events, feeds | Social, fraud, recommendations

Normalize vs Denormalize · write cost ⇄ read cost
Normalized
  • One source of truth, no duplication
  • Cheap, safe writes
  • Reads pay join cost
  • Default for OLTP / SQL
VS
Denormalized
  • Pre-joined, read-optimized
  • Fast reads, no joins
  • Writes fan out; risk of drift
  • Default for NoSQL / read-heavy
Rule: Normalize until reads hurt, then denormalize the hot paths. In NoSQL you model the query first and duplicate freely.
ACID vs BASE · guarantee ⇄ scale
ACID
  • Atomic, Consistent, Isolated, Durable
  • Correctness by construction
  • Coordination limits scale
VS
BASE
  • Basically Available, Soft state, Eventual
  • Scales horizontally with ease
  • App handles conflicts/retries
Rule: ACID for money & integrity; BASE for scale-first, tolerant-to-staleness workloads. Many systems use both — ACID core, BASE edges.

OLTP vs OLAP — don't run analytics on your transactional store

Aspect | OLTP (transactional) | OLAP (analytical)
Pattern | Many small reads/writes by key | Few huge scans & aggregations
Storage | Row-oriented | Column-oriented (Parquet, Redshift, BigQuery, ClickHouse)
Goal | Low latency per op | High throughput per query
Move data via | CDC / ETL / ELT from the OLTP store into a warehouse or lakehouse

Common failure: Heavy analytics queries on the production OLTP primary contend for locks, I/O, and buffer cache, and blow the latency budget for users. Replicate to a read replica or warehouse; isolate the workloads.

Core Axis · Scale

Replication & Partitioning

Replication buys availability and read scale (copies of the same data). Partitioning/sharding buys write scale and capacity (splits of different data). Most large systems do both.

Replication topologies

Single-leader

Leader–follower

All writes to one leader, async/sync to followers. Simple, strong-ish. Trade: leader is a write SPOF; failover + replication lag.

Multi-leader

Active–active

Writes accepted in multiple regions. Great for geo-latency & offline. Trade: write conflicts you must resolve (LWW / CRDT / app logic).

Leaderless

Quorum (Dynamo-style)

Read from and write to any of the N replicas; tune W + R > N for consistency. Trade: tunable but complex; needs read-repair & anti-entropy.

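Quorum math: With N replicas, writing to W and reading from R: if W + R > N, every read set overlaps every write set in at least one replica, so a read reaches the latest acknowledged write (with strict quorums; sloppy quorums and concurrent writes weaken this). Example N=3, W=2, R=2: strong-ish with one node down. Lower R for fast reads (accept staleness); lower W for fast writes (accept lost-update risk).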
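The overlap rule can be checked by brute force rather than taken on faith. This sketch (the function name is mine) enumerates every possible write set and read set and confirms they always share a replica exactly when W + R > N.

```python
from itertools import combinations


def quorums_overlap(n: int, w: int, r: int) -> bool:
    """True if every write set of size w shares a replica with every read set of size r."""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


for n, w, r in [(3, 2, 2), (3, 1, 1), (3, 3, 1), (5, 3, 3), (5, 2, 3)]:
    print(f"N={n} W={w} R={r}: overlap={quorums_overlap(n, w, r)}, W+R>N={w + r > n}")
```

The check covers set overlap only. It says nothing about which of two concurrent writes wins, which is where versioning and conflict resolution come in.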

Partitioning (sharding) strategies

Strategy | How | Pro | Con
Hash / consistent hashing | shard = hash(key) mod N, or the key's position on a hash ring | Even distribution; minimal reshuffle on resize (consistent hashing only; plain mod N remaps most keys) | No efficient range scans; a range query must hit every shard
Range | shard = key falls in [a,b) | Efficient range scans & ordering | Hotspots if keys are skewed (e.g. timestamps)
Directory / lookup | explicit map key→shard | Full flexibility, easy rebalancing | Lookup service is a dependency & SPOF
Geo | shard by region | Data locality, residency compliance | Cross-region queries are expensive

The shard key is a one-way door: Choose it for (1) even load, (2) the dominant query, and (3) avoiding cross-shard transactions. A bad shard key (e.g. customer_id when one customer is 40% of traffic) creates a permanent hotspot that is brutally expensive to fix later. Model the access pattern before you commit.
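How much data moves when a shard is added depends heavily on the hashing scheme. The sketch below compares plain `hash(key) mod N` with a small consistent-hash ring when growing from four nodes to five. The node names, the key format, the 100 virtual nodes per node, and the use of MD5 as a stable hash are all illustrative assumptions.

```python
import bisect
import hashlib


def stable_hash(value: str) -> int:
    """Deterministic 64-bit hash (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


def mod_shard(key: str, nodes: list[str]) -> str:
    """Plain modulo placement: simple, but resizing remaps most keys."""
    return nodes[stable_hash(key) % len(nodes)]


class HashRing:
    """Consistent hashing with virtual nodes."""

    def __init__(self, nodes: list[str], vnodes: int = 100) -> None:
        # Each node owns `vnodes` points on the ring to even out its share.
        self._points = sorted(
            (stable_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._hashes = [point_hash for point_hash, _ in self._points]

    def shard(self, key: str) -> str:
        # The owner is the first ring point clockwise from the key, wrapping around.
        idx = bisect.bisect_right(self._hashes, stable_hash(key)) % len(self._points)
        return self._points[idx][1]


def moved_fraction(before, after, keys) -> float:
    """Share of keys whose owning node differs between two placement functions."""
    return sum(before(k) != after(k) for k in keys) / len(keys)


keys = [f"user-{i}" for i in range(10_000)]
old_nodes = ["n1", "n2", "n3", "n4"]
new_nodes = old_nodes + ["n5"]

mod_moved = moved_fraction(
    lambda k: mod_shard(k, old_nodes), lambda k: mod_shard(k, new_nodes), keys
)
old_ring, new_ring = HashRing(old_nodes), HashRing(new_nodes)
ring_moved = moved_fraction(old_ring.shard, new_ring.shard, keys)

print(f"mod N moved {mod_moved:.0%} of keys; the ring moved {ring_moved:.0%}")
```

With the ring, the only keys that move are the ones the new node takes over, which is the "minimal reshuffle" property. Neither scheme fixes a skewed key: one very hot customer still lands on one shard.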

Core Axis · Performance

Caching — The Cheapest Latency Win and the Hardest Bug

Caching trades freshness and complexity for latency and load reduction. "There are only two hard things in CS: cache invalidation and naming things." Plan invalidation before you add the cache.

Write/Read strategies

Pattern | How it works | Pro | Con / risk
Cache-aside (lazy) | App checks cache; on miss, reads DB and populates | Resilient, only caches what's used | First hit slow; stale until TTL; thundering herd on miss
Read-through | Cache library fetches from DB on miss | Transparent to app code | Cache becomes a hard dependency
Write-through | Write to cache & DB synchronously | Cache always fresh | Write latency = cache + DB; caches unread data
Write-back (write-behind) | Write to cache, flush to DB async | Fast writes, absorbs bursts | Data loss if cache dies before flush
Write-around | Write to DB only; cache fills on read | Avoids caching write-once data | Recently written data reads are slow misses
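Cache-aside is the pattern I reach for first, so here is a minimal in-process sketch of it. The class name, the `load_from_db` callback, and the injectable clock are hypothetical; a real deployment would typically put Redis or Memcached where the dictionary is.

```python
import time
from typing import Callable, Optional


class CacheAside:
    """Check the cache first; on a miss, load from the source and populate."""

    def __init__(
        self,
        load_from_db: Callable[[str], Optional[str]],
        ttl_s: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._load = load_from_db
        self._ttl = ttl_s
        self._clock = clock
        self._entries: dict[str, tuple[float, Optional[str]]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._entries.get(key)
        if entry is not None and entry[0] > self._clock():
            return entry[1]  # hit: possibly stale, but within its TTL
        value = self._load(key)  # miss or expired: go to the source of truth
        # A None result is cached too, so repeated lookups of absent keys stay cheap.
        self._entries[key] = (self._clock() + self._ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        """Call after a write so the next read reloads instead of serving stale data."""
        self._entries.pop(key, None)
```

The sketch makes the table's risk column visible: between a database write and either `invalidate` or TTL expiry, readers see the old value.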
Eviction

When the cache is full

  • LRU — evict least recently used (good general default)
  • LFU — evict least frequently used (good for skewed popularity)
  • FIFO / TTL — simple, time-bounded freshness
  • W-TinyLFU — modern hybrid (Caffeine), high hit rates
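LRU, the general default in the list above, is short enough to sketch with the standard library's `OrderedDict`. The class name is mine, and this version is not thread-safe.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._capacity = capacity
        self._items: OrderedDict = OrderedDict()

    def get(self, key, default=None):
        if key not in self._items:
            return default
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)  # drop the least recently used entry
```

For memoizing pure functions, `functools.lru_cache` already provides this policy; a hand-rolled class is mainly useful when you need explicit `put` and eviction hooks.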
Failure modes

Three ways a cache fails under load

  • Penetration — queries for keys that don't exist; cache never helps. Fix: cache negatives / bloom filter.
  • Avalanche — many keys expire at once → DB spike. Fix: jittered TTLs.
  • Stampede / dogpile — hot key expires, N requests rebuild it. Fix: single-flight lock / stale-while-revalidate.
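Two of the fixes above are small enough to sketch: jittered TTLs against avalanche and a single-flight lock against dogpile. The names are hypothetical, and the `SingleFlight` class deliberately omits expiry and lock cleanup to stay short.

```python
import random
import threading


def jittered_ttl(base_s: float, spread: float = 0.1, rng=random) -> float:
    """TTL within base_s +/- spread, so keys written together do not expire together."""
    return base_s * (1.0 + rng.uniform(-spread, spread))


class SingleFlight:
    """Collapse concurrent rebuilds of one key into a single loader call."""

    def __init__(self) -> None:
        self._guard = threading.Lock()  # protects the two dictionaries
        self._key_locks: dict = {}
        self._values: dict = {}

    def get(self, key, loader):
        with self._guard:
            if key in self._values:
                return self._values[key]
            key_lock = self._key_locks.setdefault(key, threading.Lock())
        with key_lock:  # only one caller per key gets past this point at a time
            with self._guard:
                if key in self._values:  # someone rebuilt it while we waited
                    return self._values[key]
            value = loader(key)
            with self._guard:
                self._values[key] = value
            return value
```

Callers that arrive during a rebuild block until it finishes and then read the fresh value. Stale-while-revalidate trades that wait for a slightly old answer, which is often the better deal for user-facing reads.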

Where the cache lives (layers)

[Figure: cache layers, from closest to the client to closest to the data]
  • Browser / client: HTTP cache, service worker
  • CDN / edge: static + cacheable API
  • App / in-memory: local LRU, per-node
  • Distributed: Redis / Memcached
  • Database: buffer pool, materialized views
Closer to the client = lower latency & less origin load, but harder to invalidate. Cache as close as freshness allows.
Each layer trades invalidation control for latency. CDN edge caching is the single biggest lever for global read-heavy workloads; in-memory local caches are fastest but per-node and incoherent.

Core Axis · Communication

Synchronous vs Asynchronous & the API Spectrum

How services talk is as consequential as how they store. Sync is simple and immediate but couples availability; async decouples and absorbs load but adds eventual consistency and operational surface.

Synchronous (request/response) vs Asynchronous (messaging) · simplicity ⇄ resilience
Synchronous · REST/gRPC
  • Immediate result, easy to reason about
  • Linear request flow, simple debugging
  • Temporal coupling: callee down ⇒ caller down
  • Latency adds up across the chain
  • Backpressure must be handled explicitly
VS
Asynchronous · queues/events
  • Decoupled; producer ≠ consumer uptime
  • Absorbs spikes; natural load leveling
  • Eventual consistency & out-of-order delivery
  • Harder to trace; need idempotency + DLQs
  • Enables fan-out & replay
Rule: Use sync when the caller genuinely needs the answer to proceed (a read, a validation). Use async for work that can happen later (emails, thumbnails, indexing, fan-out) or to decouple availability. A user click should rarely block on five synchronous downstream calls.

API styles

Style | Transport / shape | Strengths | Weaknesses | Use for
REST | HTTP/JSON, resources | Ubiquitous, cacheable, simple | Over/under-fetching; chatty | Public APIs, CRUD, broad compatibility
gRPC | HTTP/2, protobuf, RPC | Fast, typed contracts, streaming | Binary, browser/edge friction | Internal service-to-service, low latency
GraphQL | HTTP, query language | Client picks fields; one round-trip | Caching & rate-limiting complexity; N+1 | Aggregating many sources for varied UIs
Webhooks / SSE / WebSocket | Server-push | Real-time, server-initiated | Connection state, scaling fan-out | Notifications, live updates, chat

Queue vs Log (message broker vs event stream)

Message queue

RabbitMQ / SQS — work distribution

Message consumed once and deleted; competing consumers split a task queue. Great for job processing. Trade: no replay, ordering is limited.

Event log

Kafka / Kinesis / Pulsar — event streaming

Append-only, retained log; many consumers read independently and can replay. Great for event sourcing, analytics, fan-out. Trade: more ops, partition ordering only.

Always design for: idempotency (same message twice = same result; use idempotency keys), at-least-once delivery (duplicates happen, so dedupe), and a dead-letter queue (poison messages need somewhere to go). "Exactly once" is mostly a marketing term; engineer for at-least-once + idempotent consumers.
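Those three habits fit in one small consumer. This is a sketch under stated assumptions: the message shape (a dict carrying an `idempotency_key`), the class name, and the in-memory sets are hypothetical, and a real consumer would keep the seen-keys set in a durable store.

```python
class IdempotentConsumer:
    """At-least-once consumer: dedupes by key and parks poison messages in a DLQ."""

    def __init__(self, handler, max_attempts: int = 3) -> None:
        self._handler = handler
        self._max_attempts = max_attempts
        self._seen: set = set()  # keys already handled (durable in production)
        self._attempts: dict = {}
        self.dead_letters: list = []

    def deliver(self, message: dict) -> str:
        key = message["idempotency_key"]
        if key in self._seen:
            return "duplicate"  # redelivery: acknowledge without re-running the effect
        try:
            self._handler(message)
        except Exception:
            self._attempts[key] = self._attempts.get(key, 0) + 1
            if self._attempts[key] >= self._max_attempts:
                self.dead_letters.append(message)  # stop retrying a poison message
                self._seen.add(key)
                return "dead-lettered"
            return "retry"
        self._seen.add(key)
        return "processed"
```

One gap worth naming: if the process crashes after the handler runs but before the key is recorded, the effect happens twice. Closing that gap means recording the key in the same transaction as the side effect.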

Architecture

Monolith vs Microservices

The defining org-and-tech trade-off of the last decade. Microservices trade in-process simplicity for independent deployability — and buy a distributed system's worth of new problems. Most teams should start with a modular monolith.

[Figure: Modular Monolith vs Microservices]
  • Modular Monolith: Orders, Billing, Catalog, Users as modules; one deploy · in-process calls · one DB · one transaction.
  • Microservices: Orders + DB, Billing + DB, Catalog + DB, Users + DB; N deploys · network calls · DB-per-service; no cross-service transaction → use sagas.
The monolith keeps everything in one process and one transaction boundary. Microservices grant independent scaling and deploys per team, but every in-process call becomes a network call that can fail, and every cross-entity write becomes a distributed-transaction problem.
Dimension | Monolith | Microservices
Deployment | One artifact, simple | Independent per service, needs CI/CD maturity
Scaling | Scale the whole app | Scale hot services independently
Team autonomy | Coupled; coordination tax | Teams own & ship independently
Transactions | ACID, trivial | Distributed; sagas + eventual consistency
Debugging | Stack trace, one process | Distributed tracing required
Failure mode | All-or-nothing | Partial failure (good & bad)
Operational cost | Low | High (mesh, observability, infra)

The distributed monolith: the worst of both worlds. Services are split by layer, not by domain, share a database, and deploy together. You pay the network/ops tax of microservices and keep the coupling of a monolith. Split by business capability and data ownership, or don't split. Conway's Law applies: align services to teams.

Architecture

Patterns: CQRS, Event Sourcing, Saga, EDA

Powerful tools that each solve a real problem — and each add real complexity. Reach for them only when the constraint they address is actually present.

Read/write split

CQRS

Separate the write model from one or more read models, each optimized for its job.

Use when: read & write loads/shapes diverge sharply.

Cost: two models to keep in sync; eventual consistency between them.

Audit / replay

Event Sourcing

Store the sequence of events, derive state by replaying. The log is the source of truth.

Use when: you need a full audit trail, time-travel, or to rebuild projections.

Cost: schema evolution of events, snapshots, steep mental model.
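A toy example makes "derive state by replaying" concrete. The account events and function names below are hypothetical; the point is that the balance is never stored, only computed from the log or from a snapshot plus the events after it.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Event:
    kind: str  # "deposited" or "withdrew"
    amount: int


def apply_event(balance: int, event: Event) -> int:
    """Pure transition function: current state + one event -> next state."""
    if event.kind == "deposited":
        return balance + event.amount
    if event.kind == "withdrew":
        return balance - event.amount
    raise ValueError(f"unknown event kind: {event.kind}")


def replay(events, snapshot: int = 0) -> int:
    """Fold the events over a starting state (zero, or a saved snapshot)."""
    return reduce(apply_event, events, snapshot)


log = [Event("deposited", 100), Event("withdrew", 30), Event("deposited", 5)]
print("current balance:", replay(log))
print("balance after the first event:", replay(log[:1]))
print("from a snapshot of 70 plus the last event:", replay(log[2:], snapshot=70))
```

The unknown-kind branch is where schema evolution bites: every event shape ever written must stay replayable for as long as the log is the source of truth.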

Distributed txn

Saga

Replace a cross-service ACID transaction with a sequence of local transactions + compensating actions on failure.

Use when: a business process spans services (order → payment → inventory → ship).

Cost: you write the rollback logic; no isolation — intermediate states are visible.
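A minimal orchestrated saga can be sketched as a list of steps, each paired with its compensation. The runner and step names are hypothetical; a production version would persist progress so that a crashed orchestrator can resume or compensate after restart.

```python
def run_saga(steps):
    """Run (name, action, compensation) steps in order.

    On the first failure, run the compensations of the completed steps in
    reverse order. Returns (succeeded, log).
    """
    completed = []
    log = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception as exc:
            log.append(f"failed {name}: {exc}")
            for done_name, undo in reversed(completed):
                undo()
                log.append(f"undid {done_name}")
            return False, log
        log.append(f"did {name}")
        completed.append((name, undo_fn := compensate))
    return True, log


state = {"order": None, "paid": False}


def reserve_inventory():
    raise RuntimeError("out of stock")


ok, log = run_saga([
    ("create order", lambda: state.update(order="o-1"), lambda: state.update(order=None)),
    ("charge payment", lambda: state.update(paid=True), lambda: state.update(paid=False)),
    ("reserve inventory", reserve_inventory, lambda: None),
])
print(ok, log)
```

Between "did charge payment" and "undid charge payment" the customer really was charged. That visible intermediate state is the missing isolation the cost line above refers to, and compensations themselves must be idempotent and retryable.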

Decoupling

Event-Driven Architecture

Services emit events; others react. Producers don't know consumers.

Use when: you need loose coupling, fan-out, extensibility without changing producers.

Cost: emergent behavior is hard to trace; eventual consistency everywhere.

Saga: choreography vs orchestration

Aspect | Choreography (events) | Orchestration (coordinator)
Control | Decentralized; each service reacts to events | Central orchestrator drives the steps
Pro | Loose coupling, no central bottleneck | Explicit flow, easy to see & debug
Con | Hard to follow the whole flow; cyclic risk | Orchestrator is coupling + a SPOF to manage
Use | Few steps, simple flows | Many steps, complex compensation logic

Architecture

Resilience & Reliability Patterns

At scale, failure is the steady state, not the exception. Reliability is bought with redundancy, isolation, and graceful degradation — each trading cost and complexity for uptime.

Stop the bleeding

Circuit Breaker

Trip open after repeated downstream failures; fail fast instead of piling on. Half-open to test recovery. Prevents cascading failure.
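The three states are easiest to follow in code. This is a single-threaded sketch with hypothetical names and thresholds; libraries that implement the pattern add locking, metrics, and per-error policies on top.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when the breaker is open and the call is rejected without being tried."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout_s=30.0, clock=time.monotonic):
        self._threshold = failure_threshold
        self._reset_timeout = reset_timeout_s
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self._reset_timeout:
            return "half-open"  # allow one trial call through
        return "open"

    def call(self, fn, *args, **kwargs):
        if self.state == "open":
            raise CircuitOpenError("failing fast; downstream presumed unhealthy")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self.state == "half-open" or self._failures >= self._threshold:
                self._opened_at = self._clock()  # trip, or re-open after a failed trial
            raise
        self._failures = 0  # any success closes the circuit
        self._opened_at = None
        return result
```

Callers should catch `CircuitOpenError` and serve a fallback; a breaker that only converts slow failures into fast failures has done half its job.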

Shed load

Rate Limiting & Throttling

Token/leaky bucket caps inbound load. Protects you from abuse & thundering herds. Trade: legitimate spikes get 429'd — tune carefully.
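A token bucket is a counter that refills over time. The sketch below uses hypothetical names and an injectable clock; it is per-process, so a fleet-wide limit would need shared state (for example in Redis).

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate_per_s` tokens per second."""

    def __init__(self, rate_per_s: float, capacity: float, clock=time.monotonic) -> None:
        self._rate = rate_per_s
        self._capacity = capacity
        self._tokens = float(capacity)  # start full so an initial burst is allowed
        self._clock = clock
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self._clock()
        # Refill for the elapsed time, never beyond capacity.
        self._tokens = min(self._capacity, self._tokens + (now - self._last) * self._rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False  # the caller would typically answer HTTP 429
```

`capacity` is the tuning knob behind the trade mentioned above: a larger bucket lets legitimate spikes through, at the price of a larger burst hitting the backend.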

Don't amplify

Retries + Backoff + Jitter

Retry transient errors with exponential backoff and jitter. Naïve retries cause retry storms that turn a blip into an outage.
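Here is a sketch of capped exponential backoff with "full jitter" (each delay drawn uniformly between zero and the current cap). The function names, defaults, and the choice of which exceptions count as transient are my assumptions.

```python
import random
import time


def backoff_delays(attempts: int, base_s: float = 0.1, cap_s: float = 5.0, rng=random):
    """Full-jitter delays: the i-th is uniform in [0, min(cap_s, base_s * 2**i)]."""
    return [rng.uniform(0, min(cap_s, base_s * 2**i)) for i in range(attempts)]


def retry(
    fn,
    attempts: int = 4,
    retry_on=(TimeoutError, ConnectionError),
    sleep=time.sleep,
    base_s: float = 0.1,
    cap_s: float = 5.0,
):
    """Call fn up to `attempts` times, sleeping a jittered delay between tries."""
    delays = backoff_delays(attempts - 1, base_s, cap_s)
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(delays[attempt])
```

Exceptions outside `retry_on` propagate immediately, which is deliberate: retrying a validation error only adds load. Retried operations also need to be idempotent, or the retry itself becomes the bug.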

Contain blast radius

Bulkheads

Isolate resource pools (thread pools, connection pools, cells) so one failing dependency can't drown the whole service.

Degrade gracefully

Fallbacks & Timeouts

Every remote call needs a timeout. On failure, serve cached/default/partial data. A degraded page beats an error page.

Smooth the flow

Backpressure

Let slow consumers signal producers to slow down (bounded queues, reactive streams). Unbounded buffering just relocates the crash to OOM.

Redundancy & the cost of nines

Availability | Downtime / year | Typical cost driver
99% ("two nines") | ~3.65 days | Single region, manual recovery
99.9% ("three nines") | ~8.77 hours | Redundant instances, automated failover
99.99% ("four nines") | ~52.6 minutes | Multi-AZ, no single points of failure
99.999% ("five nines") | ~5.26 minutes | Multi-region active-active, heavy investment

Diminishing returns: Each extra nine multiplies cost and complexity by roughly 10×. Match the SLO to the business: a checkout flow and an internal admin dashboard do not deserve the same investment. Define SLOs and error budgets explicitly, then engineer to them, not beyond.
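The downtime column is plain arithmetic, sketched here so it can be rerun for any SLO. The function name is mine, and a year is taken as 365.25 days, which is what reproduces the figures in the table.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60


def downtime_minutes_per_year(availability_pct: float) -> float:
    """Allowed downtime per year implied by an availability target."""
    return (1.0 - availability_pct / 100.0) * MINUTES_PER_YEAR


for pct in (99.0, 99.9, 99.99, 99.999):
    minutes = downtime_minutes_per_year(pct)
    print(f"{pct}% -> {minutes:.1f} min/year ({minutes / 60:.2f} h)")
```

Read as an error budget, the same number tells you how much failed deployment, maintenance, and incident time the SLO can absorb before it is breached.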

Failure-domain checklist

  • No single point of failure on critical paths (LB, DB, queue all redundant)
  • Health checks + automated failover, tested via game-days / chaos engineering
  • Timeouts on every network call; circuit breakers on every dependency
  • Idempotent writes so retries are safe
  • Graceful degradation paths defined for each dependency outage
  • Backups + restore tested (an untested backup is a hope, not a backup)

Decision Toolkit

Master Trade-Off Matrix

My one-screen reference. For each decision: what you gain, what you give up, and the signal that tells you which way to lean.

Decision | Lean A when… | Lean B when… | The cost you pay
Consistency vs Availability | Stale read causes harm (money, inventory) | Uptime > freshness (feeds, carts) | Latency & coordination, or staleness
Vertical vs Horizontal | Below the box ceiling; want simplicity | Past one machine; need HA | Distributed-systems complexity
SQL vs NoSQL | Relations, transactions, ad-hoc queries | Known access pattern, massive scale | Joins & flexibility, or write scale
Normalize vs Denormalize | Write-heavy, integrity-critical | Read-heavy, latency-critical | Read joins, or write fan-out + drift
Sync vs Async | Caller needs the answer now | Work can defer; decouple uptime | Coupling/latency, or eventual consistency
Monolith vs Microservices | Small team, early product, unclear domain boundaries | Many teams, clear bounded contexts | Coupling, or distributed ops tax
Cache vs no cache | Read-heavy, tolerant of slight staleness | Strong freshness, low read volume | Invalidation complexity & stale bugs
Strong vs eventual replication | Correctness on every read | Geo-latency & availability matter | Write latency, or conflict resolution
Build vs Buy | Core differentiator, special needs | Commodity capability (auth, email, search) | Maintenance burden, or vendor lock-in + $
Batch vs Stream | Periodic, high-throughput, simpler | Real-time needs, freshness matters | Latency, or operational complexity

Decision Toolkit

Decision Framework I Use

How I actually run a design decision in a room full of strong opinions. The output is not "the answer" — it is a defensible answer with the rejected alternatives written down.

  1. Quantify requirements & constraints first. Read/write ratio, QPS (avg & peak), data size & growth, latency budget (p50/p99), consistency needs, availability SLO, budget, team size & skills, compliance/residency. Numbers, not adjectives. Most arguments dissolve once these are on the board.
  2. Estimate on the back of an envelope. Use the latency & capacity numbers to find the binding constraint. Is this storage-bound, compute-bound, bandwidth-bound, or latency-bound? The bottleneck picks the architecture.
  3. Identify the one-way doors. Mark which decisions are expensive to reverse (data model, shard key, sync/async boundaries, public contracts). Spend analysis time proportional to reversal cost.
  4. Generate 2–3 candidate designs. Not one. A single option is a decision in disguise. Include the boring option (often a modular monolith + Postgres) as a baseline.
  5. Score against the constraints, name the trade-off. For each candidate, state explicitly what it optimizes and what it sacrifices. There is no winner without a named loser.
  6. Pick the weakest sufficient option. The simplest design that meets the constraints — not the most impressive. Complexity must be earned by a requirement.
  7. Write the ADR. Record context, options, decision, and consequences. Future-you and new joiners need the why, especially the rejected paths.
  8. Define the trigger to revisit. "We'll shard when the table exceeds X GB / p99 exceeds Y ms." Decisions have expiry conditions; name them so you evolve deliberately, not in a panic.
The question I keep asking: Junior engineers ask "what's the best technology?" Senior engineers ask "what are the requirements?" I ask "what do we have to be wrong about for this to be the wrong choice, and how would we know?" Design for the decision to be reviewable, not just right.

Decision Toolkit

Documenting & Communicating Decisions

A trade-off you can't explain is a trade-off you can't defend. The Architecture Decision Record (ADR) is the artifact I rely on most.

ADR template (keep it to one page)

Title | Short, e.g. "ADR-014: Use Kafka for order events"
Status | Proposed · Accepted · Superseded by ADR-NNN
Context | The forces at play: requirements, constraints, the problem. Numbers here.
Options considered | 2–3 candidates, each with pros/cons. Show your work.
Decision | What we chose and the primary reason (the trade-off we accepted).
Consequences | What gets easier, what gets harder, what we now owe (new ops, risks).
Revisit when | The condition that should make us reopen this (scale, cost, SLA breach).

Communicating up vs down: To executives, frame trade-offs in business terms: cost, time-to-market, risk, customer impact. ("Strong global consistency adds ~80ms to checkout and ~$X/mo; eventual is invisible to users here.") To engineers, frame them in mechanism: round-trips, failure modes, coupling. Same decision, two languages.

Decision Toolkit

Trade-Off Anti-Patterns

The expensive mistakes — usually made by optimizing a virtue no one asked for, or copying a hyperscaler's solution without their problem.

Resume-driven

Resume-driven development

Choosing tech for novelty/CV value, not fit. Kubernetes + microservices + Kafka for a 1000-user app is a tax on every future change.

Premature

Premature optimization & scaling

Architecting for Google scale at startup scale. You pay full complexity now for load that may never arrive. Build for 10× current, not 10000×.

Cargo cult

Cargo-culting hyperscalers

"Netflix does microservices, so we must." Netflix's constraints (thousands of engineers, planetary scale) are not yours. Copy the reasoning, not the architecture.

Golden hammer

One tool for everything

"We use Mongo/Postgres/Kafka for everything." Forcing every workload into one store ignores that storage choice is a trade-off per access pattern.

Distributed monolith

Microservices without boundaries

Services that share a DB and deploy together — all the cost, none of the benefit. Split by data ownership or not at all.

Ignoring ops

"It works on my machine" scale

Designing for the happy path only. No observability, no failure injection, no capacity headroom. The trade-off you forgot to make gets made for you at 3 a.m.

Decision Toolkit

Pre-Flight Checklist

I run this before signing off on any non-trivial system design. If I can't answer a line, that's where the risk is hiding.

Requirements & scale

  • Read/write ratio and peak QPS are quantified
  • Data volume + growth rate estimated (1yr / 3yr)
  • Latency budget set per path (p50/p99)
  • Availability SLO + error budget agreed with the business
  • Consistency requirement named per data type

Data & state

  • Storage chosen per access pattern, not by default
  • Shard/partition key justified for load + queries
  • Replication topology & failover path defined
  • Backup + tested restore exists
  • Schema/contract evolution strategy in place

Resilience

  • No single point of failure on critical paths
  • Timeouts, retries (w/ jitter), circuit breakers everywhere
  • Idempotency keys on all writes that can retry
  • Graceful degradation path per dependency
  • Backpressure / load shedding designed, not assumed

Operability & cost

  • Observability: logs, metrics, traces, alerts on SLOs
  • Cost estimated and matched to the SLO (no over-buying nines)
  • Rollout/rollback & migration plan exists
  • ADR written with rejected options & revisit trigger
  • Simplest design that meets the constraints — complexity earned
The one sentence I leave with: the goal is not the most sophisticated system; it is the simplest system that satisfies the constraints, with every trade-off named, measured, and written down so it can be revisited when the constraints change.

Sources

References & Sources

Annotated bibliography behind this system design trade-offs note — distributed-systems laws, latency intuition, data and scaling axes, communication patterns, resilience, and decision governance. Section tags (e.g. §04) show where each source informed the prose, tables, and diagrams. SVG figures, the master trade-off matrix, constraint diamond, and synthesis prose are my own unless noted.

Scope. Synthesis of textbooks, seminal papers, practitioner blogs, and industry patterns (May 2026). Latency bars, PACELC class labels, and availability-nines tables are teaching aids — re-measure against your workload, region, and SLO before committing to a one-way door (shard key, public API, sync↔async boundary).

Citations are numbered continuously [1]–[n] within this section.

Trade-off mental model & one-way doors (§01, §15, §18)

  1. Sowell, T., A Conflict of Visions / The Vision of the Anointed. Basic Books, 1980s–1990s. "There are no solutions, only trade-offs" — §01 lead-rule quote and §14 matrix framing. — §01, §14, §15.
  2. Richards, M., & Ford, N., Fundamentals of Software Architecture. O'Reilly, 2020. First Law ("everything is a trade-off") — hero lede and §01 budget-spending theme. — §01, hero, §15.
  3. Bezos, J., "Type 1 vs Type 2 decisions" and "Day 1." Amazon letters to shareholders, 2015 and 2016. One-way vs two-way doors — §01 Principle 3 and §15 step 3. — §01, §15, §08.
  4. Truong, L. (synthesis). Constraint diamond SVG (performance · scale · cost · simplicity) — §01 four-budgets figure. LinhTruong.com — §01, §14.

Laws, theorems & org constraints (§02, §04, §11, §17)

  1. Brewer, E. A., "CAP Twelve Years Later: How the 'Rules' Have Changed." IEEE Computer, 2012. CAP during partition vs everyday latency trade-offs — §02 CAP card and §04 opening. — §02, §04.
  2. Gilbert, S., & Lynch, N., "Brewer's Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services." ACM SIGACT News, 2002. Formal CAP proof — background for §02 and §04 partition diagram. — §02, §04.
  3. Abadi, D., "Consistency Tradeoffs in Modern Distributed Database System Design." IEEE Computer, 2012. PACELC (PA/EL vs PC/EC) — §02 PACELC card and §04 PACELC table. — §02, §04.
  4. Amdahl, G. M., "Validity of the Single Processor Approach to Achieving Large Scale Computing Capabilities." AFIPS, 1967. Serial fraction caps speedup — §02 Amdahl card. — §02, §06.
  5. Gunther, N. J., Guerrilla Capacity Planning / USL papers. Universal Scalability Law coherency penalty — §02 USL card. — §02, §06.
  6. Little, J. D. C., "A Proof for the Queuing Formula: L = λW." Operations Research, 1961. Little's Law — §02 queueing card and §03 capacity-math callout. — §02, §03, §15.
  7. Dean, J., & Barroso, L. A., "The Tail at Scale." Communications of the ACM, 2013. Tail latency under high utilization — §02 queueing-wall card. — §02, §13, §18.
  8. Conway, M. E., "How Do Committees Invent?" Datamation, 1968. Conway's Law — §02 org card and §11 distributed-monolith callout. — §02, §11, §17.

Latency numbers & back-of-envelope capacity (§03, §15)

  1. Dean, J., "Numbers Everyone Should Know" (talk slides). Order-of-magnitude latency ladder — §03 bar chart (L1 → cross-continent RTT). — §03.
  2. Bryant, R., & O'Hallaron, D., Computer Systems: A Programmer's Perspective. Pearson, 3rd ed. Memory hierarchy and I/O latency — §03 figcaption gaps. — §03.
  3. High Scalability, "Latency Numbers Every Programmer Should Know" (curated table). Community-maintained latency reference — §03 relative gaps. highscalability.com — §03.
  4. Kleppmann, M., Designing Data-Intensive Applications. O'Reilly, 2017. Envelope math, throughput, and latency budgets — §03 tip and §15 step 2. — §03, §15, §18.
  5. Barroso, L. A., Clidaras, J., & Hölzle, U., The Datacenter as a Computer. Morgan & Claypool, 2013. Scale-out economics — background for §06 horizontal scaling. — §03, §06.

CAP, PACELC & datastore classes (§04, §05, §08, §14)

  1. Kleppmann, Designing Data-Intensive Applications — Ch. 5–9. Replication, consistency, and partition behavior — §04–§05 and §08 replication topologies. — §04, §05, §08.
  2. DeCandia, G., et al., "Dynamo: Amazon's Highly Available Key-value Store." SOSP, 2007. AP/quorum tunability — §04 DynamoDB/Cassandra row and §08 leaderless card. — §04, §08.
  3. Corbett, J. C., et al., "Spanner: Google's Globally-Distributed Database." OSDI, 2012. Global strong consistency — §04 Spanner/Cockroach row. — §04, §05.
  4. Herlihy, M. P., & Wing, J. M., "Linearizability: A Correctness Condition for Concurrent Objects." ACM TOPLAS, 1990. Strongest consistency rung — §05 linearizable tick. — §05.
  5. Lamport, L., "Time, Clocks, and the Ordering of Events in a Distributed System." Communications of the ACM, 1978. Causal ordering — §05 causal consistency tick. — §05.
  6. Shapiro, M., et al., "Conflict-Free Replicated Data Types." SSS, 2011. CRDTs for conflict resolution — §05 hidden-cost callout and §08 multi-leader card. — §05, §08.
  7. Richardson, C., "Pattern: Database per service." microservices.io — §11 microservices diagram caption. microservices.io — §11, §12.

Scaling, statelessness & replication/partitioning (§06, §08, §14, §18)

  1. Kleppmann, DDIA — partitioning & replication chapters. Leader/follower, quorum, sharding strategies — §08 entire section. — §06, §08, §18.
  2. Karger, D., et al., "Consistent Hashing and Random Trees." STOC, 1997. Hash/ring partitioning — §08 hash-sharding row. — §08.
  3. Amazon ElastiCache / AWS Architecture Blog — horizontal scaling patterns. Stateless app tier + externalized state — §06 statelessness figure. aws.amazon.com/architecture — §06.
  4. Google SRE Team, Site Reliability Engineering. O'Reilly, 2016. Capacity planning and utilization targets — §02 tail-latency theme and §13 nines table. sre.google — §02, §13, §18.
  5. Truong, L. (synthesis). Vertical vs horizontal trade-off block and stateless-LB diagram — §06. — §06, §14.

Data models: SQL, NoSQL, OLTP/OLAP (§07, §14, §18)

  1. Codd, E. F., "A Relational Model of Data for Large Shared Data Banks." Communications of the ACM, 1970. Relational/SQL foundation — §07 SQL column. — §07.
  2. Kleppmann, DDIA — data models chapter. Document, key-value, wide-column, graph comparison — §07 NoSQL table. — §07.
  3. Pritchett, D., "BASE: An Acid Alternative." ACM Queue, 2008. BASE vs ACID framing — §07 ACID/BASE trade-off block. — §07.
  4. Stonebraker, M., et al., "The End of an Architectural Era (It's Time for a Complete Rewrite)." VLDB, 2007. OLTP vs specialized stores — background for §07 OLTP/OLAP table. — §07.
  5. Inmon, W. H., & Kimball, R. — data warehousing literature. Column-oriented OLAP and ETL/ELT — §07 OLAP row and analytics-isolation callout. — §07, §18.
  6. Dehghani, Z., "How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh." martinfowler.com, 2019. Analytics across services — background for §07 CDC/warehouse row. martinfowler.com — §07.

Caching strategies (§09, §14, §18)

  1. Fitzpatrick, B., "Distributed Caching with Memcached." Linux Journal, 2004. Cache-aside at scale — §09 cache-layer stack. — §09.
  2. AWS / Azure / Cloudflare CDN documentation (synthesized). Edge vs CDN vs app vs DB cache tiers — §09 caching-layer figure. — §09.
  3. Kleppmann, DDIA — caching & materialized views. Staleness vs latency trade — §09 figcaption and §14 cache row. — §09, §14.
  4. Facebook / Meta engineering posts on cache invalidation (synthesized). "Hard problem" of invalidation — §09 trade-off theme. — §09, §18.

Sync vs async, APIs & messaging (§10, §12, §14, §18)

  1. Fielding, R. T., Architectural Styles and the Design of Network-based Software Architectures (REST). UC Irvine, 2000. REST constraints — §10 REST row. — §10.
  2. gRPC / Protocol Buffers documentation. HTTP/2 RPC and typed contracts — §10 gRPC row. grpc.io — §10.
  3. GraphQL specification & Facebook engineering notes. Client-driven field selection — §10 GraphQL row. graphql.org — §10.
  4. Hohpe, G., & Woolf, B., Enterprise Integration Patterns. Addison-Wesley, 2003. Messaging, pub/sub, competing consumers — §10 queue vs log cards. — §10, §12.
  5. Kreps, J., et al., "Kafka: A Distributed Messaging System for Log Processing." NetDB workshop, 2011. Event log / replay — §10 Kafka card and §12 EDA card. — §10, §12.
  6. Narkhede, N., "Exactly-Once Semantics Are Possible: Here's How Kafka Does It." Confluent blog, 2017. At-least-once + idempotency reality — §10 idempotency callout. confluent.io — §10, §13, §18.
  7. Richardson, C., "Pattern: Messaging." microservices.io. Sync vs async coupling — §10 trade-off block. microservices.io — §10, §14.

Monolith vs microservices (§11, §14, §17)

  1. Fowler, M., & Lewis, J., "Microservices." martinfowler.com, 2014. Service boundaries and distributed costs — §11 table and §17 cargo-cult card. martinfowler.com — §11, §17.
  2. Fowler, M., "MonolithFirst." martinfowler.com, 2015. Modular monolith baseline — §11 opening sub and §15 step 4. martinfowler.com — §11, §15, §17.
  3. Newman, S., Building Microservices (2nd ed.). O'Reilly, 2021. Independent deployability vs ops tax — §11 comparison table. — §11, §14.
  4. Evans, E., Domain-Driven Design. Addison-Wesley, 2003. Bounded contexts and modular monolith — §11 split-by-capability callout. — §11, §15.
  5. Richardson, C., "Antipattern: Shared database." microservices.io — §11 distributed-monolith callout and §17 distributed-monolith card. — §11, §17.
  6. Skelton, M., & Pais, M., Team Topologies. IT Revolution, 2019. Team–service alignment — §11 Conway callout. — §11, §17.

CQRS, event sourcing, saga & EDA (§12, §14, §18)

  1. Young, G., "CQRS Documents" / CQRS pattern posts. Separate read/write models — §12 CQRS card. cqrs.files.wordpress.com — §12.
  2. Fowler, M., "Event Sourcing." martinfowler.com. Event log as source of truth — §12 event-sourcing card. martinfowler.com — §12.
  3. Richardson, C., "Pattern: Saga." microservices.io. Compensating transactions — §12 saga card and §11 diagram caption. microservices.io — §11, §12, §18.
  4. Garcia-Molina, H., & Salem, K., "Sagas." ACM SIGMOD Record, 1987. Original saga concept — background for §12. — §12.
  5. Richardson, C., Microservices Patterns. Manning, 2018. Choreography vs orchestration sagas — §12 saga table. — §12.
  6. Evans, E., Domain-Driven Design Reference — domain events. Event-driven architecture — §12 EDA card. — §12.

Resilience, SLOs & failure modes (§13, §18)

  1. Nygard, M., Release It! (2nd ed.). Pragmatic Bookshelf, 2018. Circuit breaker, bulkhead, timeout, stability patterns — §13 pattern cards. — §13, §18.
  2. Netflix Hystrix / resilience4j documentation (synthesized). Circuit breaker states — §13 circuit-breaker card. — §13.
  3. Google SRE — SLOs, error budgets, nines. Availability vs cost — §13 nines table and diminishing-returns callout. sre.google — §13, §18.
  4. AWS Architecture Blog, "Exponential Backoff And Jitter." Retry-storm prevention — §13 retries card. aws.amazon.com — §13, §18.
  5. Basiri, A., et al., "Chaos Engineering." IEEE Software / Netflix practice. Game-days and failure injection — §13 failure-domain checklist. — §13, §18.
  6. Reactive Streams / backpressure specification. Bounded queues and flow control — §13 backpressure card. reactive-streams.org — §13.
  7. Dean & Barroso, "The Tail at Scale." Cascading latency under load — §13 resilience context. — §13.

Decision framework, ADRs & anti-patterns (§14–§18)

  1. Truong, L. (synthesis). Master trade-off matrix — §14 one-screen reference table. — §14.
  2. Nygard, M., "Documenting Architecture Decisions." Cognitect, 2011. ADR template — §16 entire section and §18 operability checklist. cognitect.com — §15, §16, §18.
  3. Thomson, J., "Architecture Decision Records." adr.github.io — §16 ADR template table. — §16, §18.
  4. Beck, K., Extreme Programming Explained — YAGNI. Simplest sufficient design — §15 step 6 and §17 golden-hammer card. — §15, §17.
  5. Knuth, D., "Structured Programming with go to Statements." ACM Computing Surveys, 1974. "Premature optimization" quote — §17 premature-scaling card. — §17.
  6. Foote, B., & Yoder, J., "Big Ball of Mud." PLoP, 1997. Structureless systems — analogy for §17 resume-driven theme. laputan.org — §17.
  7. Netflix engineering culture posts (synthesized). Hyperscaler context — §17 cargo-cult card. — §17.
  8. Truong, L., System Design Trade-Offs — personal working notes. May 2026. Constraint diamond, latency bar chart, CAP partition diagram, consistency spectrum, stateless scaling figure, monolith/microservices SVG, caching stack, trade-off blocks, pre-flight checklist, and synthesis prose. LinhTruong.com — all sections.

Before you cite externally. PACELC labels and datastore examples in §04 reflect typical defaults — vendors change tunables. Latency numbers in §03 drift with hardware generations; use them for ratios, not SLAs. "Exactly once" in messaging (§10) usually means idempotent at-least-once in practice. Match availability nines (§13) to business impact, not ego. Most teams should start with a modular monolith (Newman, Fowler) and earn microservices with team and load evidence. Re-measure every one-way door against your own constraints before production.
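
The point about "exactly once" usually meaning idempotent at-least-once can be shown with a small consumer that tolerates redelivery. This is a sketch under stated assumptions: `IdempotentConsumer`, its `handle` method, and the message IDs are hypothetical, and the in-memory set stands in for a durable deduplication store that a real system would need to update atomically with the side effect.

```python
class IdempotentConsumer:
    """Applies each message's effect at most once, keyed by a message ID.

    The in-memory set stands in for a durable dedup store; in a real system
    the ID record and the side effect need to commit atomically.
    """

    def __init__(self):
        self.seen = set()
        self.balance = 0

    def handle(self, message_id, amount):
        """Return True if the message was applied, False if it was a duplicate."""
        if message_id in self.seen:
            return False
        self.balance += amount
        self.seen.add(message_id)
        return True


if __name__ == "__main__":
    consumer = IdempotentConsumer()
    # At-least-once delivery: "m1" arrives twice after a redelivery.
    deliveries = [("m1", 50), ("m2", 25), ("m1", 50)]
    applied = [consumer.handle(mid, amt) for mid, amt in deliveries]
    print(applied, consumer.balance)
```

The broker is still free to deliver a message more than once; the effect is applied once because the consumer recognizes the repeat. That is the practical meaning to check for when a vendor or design document claims exactly-once delivery.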