System Design Trade-Offs
There are no best architectures — only architectures that are right for a set of constraints. I wrote this note to map the trade-off space I navigate in system design: the laws that bound the design, the axes where you spend your one budget, and the framework I use to choose deliberately and defend the choice.
Foundations
The Trade-Off Mental Model
Every architectural decision moves a slider. Moving it toward one virtue spends a budget that could have bought another. The skill I care about is not knowing the "right" answer — it is naming the budget you are spending and proving it is the cheapest one to spend for this system.
"There are no solutions, only trade-offs." — Thomas Sowell · Adapted to engineering by every architect who ever shipped at scale.
You cannot maximize everything
Latency, consistency, cost, simplicity, and flexibility pull against each other. Optimizing one past a point degrades another. Pick the two or three that matter for the business and let the rest be "good enough."
Constraints reveal the answer
The "best" design falls out of the requirements + constraints: read/write ratio, scale, consistency needs, team size, latency budget, and money. Quantify these first; the architecture is then mostly a derivation.
Reversible vs one-way doors
Decide fast on reversible choices; deliberate hard on one-way doors (data model, partition key, public API contract, sync↔async boundaries). Spend your analysis budget where the cost of being wrong is highest.
The four budgets you are always spending
Foundations
Laws & Theorems That Bound Every Design
These are not opinions — they are constraints. Knowing them stops you from promising the impossible (e.g., "strongly consistent, always available, across regions").
CAP Theorem
During a network partition, you must choose Consistency or Availability. You cannot have both. When the network is healthy, the choice is moot — which is why PACELC matters more in practice.
PACELC
If Partition → trade A vs C; Else → trade Latency vs Consistency. Captures the everyday cost: strong consistency adds round-trips even when nothing is broken.
Amdahl's & Universal Scalability Law
Speedup is capped by the serial fraction (Amdahl). USL adds a coherency penalty: past a point, adding nodes makes throughput worse due to cross-node coordination. Contention is the enemy of scale.
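Both laws fit in a few lines. This is a minimal sketch assuming the standard textbook forms: Amdahl's speedup 1 / (s + (1 − s)/N) for serial fraction s, and the USL relative capacity N / (1 + α(N − 1) + βN(N − 1)). The function names and the sample coefficients are illustrative assumptions, not measurements.

```python
def amdahl_speedup(serial_fraction: float, n: int) -> float:
    """Upper bound on speedup with n workers when a fixed fraction is serial."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n)


def usl_capacity(n: int, alpha: float, beta: float) -> float:
    """Relative throughput under the Universal Scalability Law.

    alpha models contention (queueing on shared resources);
    beta models coherency (pairwise cross-node coordination).
    """
    return n / (1.0 + alpha * (n - 1) + beta * n * (n - 1))


if __name__ == "__main__":
    # Illustrative coefficients only; fit them to load-test data in practice.
    alpha, beta = 0.05, 0.001
    for n in (1, 8, 32, 64, 128):
        print(n, round(amdahl_speedup(0.05, n), 2), round(usl_capacity(n, alpha, beta), 2))
```

With these sample coefficients the USL curve peaks at roughly 31 nodes and then declines, while the Amdahl bound for a 5% serial fraction never exceeds 20×.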
Little's Law
L = λ × W. Concurrency = arrival rate × latency. Cut latency and you cut the in-flight work (threads, connections, memory) you must provision. The cheapest capacity is lower latency.
Tail latency & the queueing wall
As utilization → 100%, queue time explodes non-linearly. Run hot (>~70–80%) and p99 latency detonates. Plan capacity against the tail, not the mean.
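To see how non-linear the wall is, here is a minimal sketch that uses the M/M/1 queue as a stand-in. That is a simplifying assumption (one server, Poisson arrivals, exponential service times), so treat the numbers as shape, not prediction. Under it, mean time in the system is S / (1 − ρ) for service time S and utilization ρ.

```python
def mm1_response_time(service_time_s: float, utilization: float) -> float:
    """Mean time in system (queueing + service) for an M/M/1 queue."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return service_time_s / (1.0 - utilization)


if __name__ == "__main__":
    for rho in (0.5, 0.7, 0.8, 0.9, 0.95, 0.99):
        # A 10 ms service time is an arbitrary illustrative value.
        print(f"{rho:.0%} busy -> {mm1_response_time(0.010, rho) * 1000:.0f} ms mean")
```

Going from 50% to 90% utilization multiplies mean response time by five in this model, and the tail grows faster than the mean.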
Conway's Law
Systems mirror the communication structure of the org that builds them. Want clean service boundaries? Shape the teams first. Microservices imposed on a monolithic org produce a distributed monolith.
Asking "What should happen when us-east can't reach eu-west?" turns an opinion war into a requirements decision.
Foundations
Latency Numbers Every Engineer Should Know
Order-of-magnitude intuition is what separates a back-of-the-envelope estimate from hand-waving. Memorize the relative gaps, not the exact figures.
1 M requests/day ≈ 12 req/s. A service with a 200 ms p99 needs ≈ 0.2 × peak_rps concurrent workers (Little's Law). 1 KB × 1 M rows ≈ 1 GB. These let you sanity-check any design on a whiteboard.
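The three estimates above can be reproduced in a few lines. This is a sketch under stated assumptions: the request count is taken as per day, storage uses decimal units (1 KB = 1,000 bytes), and the helper names are made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_day_to_rps(requests_per_day: float) -> float:
    """Average request rate implied by a daily volume."""
    return requests_per_day / SECONDS_PER_DAY


def workers_needed(peak_rps: float, latency_s: float) -> float:
    """Little's Law: in-flight requests = arrival rate x time in system."""
    return peak_rps * latency_s


def dataset_bytes(row_bytes: int, rows: int) -> int:
    """Raw data size, ignoring indexes, replication, and overhead."""
    return row_bytes * rows


if __name__ == "__main__":
    print(round(per_day_to_rps(1_000_000), 1))      # average req/s for 1 M/day
    print(workers_needed(500, 0.200))                # workers at 500 rps, 200 ms
    print(dataset_bytes(1_000, 1_000_000) / 1e9)     # size in GB
```

Peak traffic is usually a multiple of the daily average, so size against peak_rps rather than the figure from `per_day_to_rps`.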
Core Axis · Distributed State
CAP & PACELC — Consistency vs Availability
The first fork in any distributed datastore decision. State which side you land on, and what happens to the other guarantee during a partition.
| System | PACELC class | Behavior | Use it for |
|---|---|---|---|
| PostgreSQL / MySQL (single primary) | PC/EC | Strong consistency; primary is a SPOF for writes | Transactions, money, relational integrity |
| Spanner / CockroachDB / YugabyteDB | PC/EC | Global strong consistency via consensus + synced clocks; pays latency | Global OLTP needing strong guarantees |
| DynamoDB / Cassandra (default) | PA/EL | Available, eventually consistent; tunable quorums | High-scale writes, carts, sessions, feeds |
| MongoDB (default) | PA/EC | Available under partition; consistent reads from primary when healthy | Flexible documents, mixed workloads |
| Redis (single) | — | In-memory, strong on one node; replication is async (can lose writes on failover) | Cache, ephemeral, leaderboards |
Core Axis · Distributed State
Consistency Models — The Spectrum
"Consistency" is not binary. It is a ladder from strict to eventual; each rung trades latency and availability for stronger guarantees. Pick the weakest model that still meets correctness.
When you truly need it
Account balances, inventory decrements, unique username claims, distributed locks, "exactly one winner" auctions. Anywhere a stale read causes double-spend or oversell.
When weak is plenty
Like counts, view counts, social feeds, product catalogs, recommendations, analytics, presence indicators. A few seconds of staleness is invisible to users and buys massive scale + availability.
Core Axis · Scale
Vertical vs Horizontal Scaling
Buy a bigger box, or buy more boxes? Vertical is simpler and faster to reach for; horizontal is the only path past a single machine's ceiling — at the price of distributed-systems complexity.
⬆ Vertical (scale up)
- Bigger CPU / RAM / faster disks on one node
- No code changes; no distribution problems
- Strong consistency stays trivial
- Hard ceiling + exponential $ per unit
- Single point of failure; disruptive upgrades
➡ Horizontal (scale out)
- Many commodity nodes behind a balancer
- Near-linear capacity & built-in redundancy
- Requires statelessness, sharding, coordination
- Introduces CAP, partial failure, data movement
- Operational + cognitive overhead climbs
The prerequisite: statelessness
Core Axis · Data Model
SQL vs NoSQL — and the Many NoSQLs
The most consequential one-way door in most systems. Choose for your access patterns and consistency needs, not for hype. "NoSQL" is four very different things.
| Dimension | Relational (SQL) | Document | Key-Value | Wide-Column | Graph |
|---|---|---|---|---|---|
| Examples | Postgres, MySQL | MongoDB, Couchbase | Redis, DynamoDB | Cassandra, ScyllaDB, Bigtable | Neo4j, Neptune |
| Schema | Rigid, enforced | Flexible | None | Flexible columns | Nodes + edges |
| Best query | Joins, ad-hoc, aggregates | Whole-document fetch | Get/put by key | Wide rows by partition key | Relationship traversal |
| Consistency | Strong (ACID) | Tunable | Tunable | Tunable | Strong |
| Horizontal scale | Harder (sharding) | Good | Excellent | Excellent | Hard |
| Sweet spot | Transactions, reporting | Catalogs, profiles, CMS | Cache, sessions, counters | Time-series, events, feeds | Social, fraud, recommendations |
Normalized
- One source of truth, no duplication
- Cheap, safe writes
- Reads pay join cost
- Default for OLTP / SQL
Denormalized
- Pre-joined, read-optimized
- Fast reads, no joins
- Writes fan out; risk of drift
- Default for NoSQL / read-heavy
ACID
- Atomic, Consistent, Isolated, Durable
- Correctness by construction
- Coordination limits scale
BASE
- Basically Available, Soft state, Eventual
- Scales horizontally with ease
- App handles conflicts/retries
OLTP vs OLAP — don't run analytics on your transactional store
| | OLTP (transactional) | OLAP (analytical) |
|---|---|---|
| Pattern | Many small reads/writes by key | Few huge scans & aggregations |
| Storage | Row-oriented | Column-oriented (Parquet, Redshift, BigQuery, ClickHouse) |
| Goal | Low latency per op | High throughput per query |
| Move data via | — | CDC / ETL / ELT into a warehouse or lakehouse |
Core Axis · Scale
Replication & Partitioning
Replication buys availability and read scale (copies of the same data). Partitioning/sharding buys write scale and capacity (splits of different data). Most large systems do both.
Replication topologies
Leader–follower
All writes to one leader, async/sync to followers. Simple, strong-ish. Trade: leader is a write SPOF; failover + replication lag.
Active–active
Writes accepted in multiple regions. Great for geo-latency & offline. Trade: write conflicts you must resolve (LWW / CRDT / app logic).
Quorum (Dynamo-style)
Any of the N replicas can accept reads and writes; tune W + R > N for consistency. Trade: tunable but complex; needs read-repair & anti-entropy.
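The quorum condition is a pigeonhole argument: if W + R > N, any set of W replicas and any set of R replicas must share at least one member. The brute-force check below is only a sketch to make that concrete; `quorums_overlap` is a hypothetical helper, and it ignores real-world complications such as sloppy quorums and concurrent writes.

```python
from itertools import combinations


def quorums_overlap(n: int, w: int, r: int) -> bool:
    """True if every write set of size w intersects every read set of size r."""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


if __name__ == "__main__":
    print(quorums_overlap(3, 2, 2))  # W + R = 4 > 3
    print(quorums_overlap(3, 1, 1))  # W + R = 2 <= 3
```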
With N replicas, writing to W and reading from R: if W + R > N, every read set overlaps every write set, so a read reaches at least one replica holding the latest acknowledged write. Example: N=3, W=2, R=2 stays strong-ish with one node down. Lower R for fast reads (accept staleness); lower W for fast writes (accept lost-update risk).
Partitioning (sharding) strategies
| Strategy | How | Pro | Con |
|---|---|---|---|
| Hash / consistent hashing | shard = position of hash(key) on the ring | Even distribution; minimal reshuffle on resize | No efficient range scans; adjacent keys scatter across shards |
| Range | shard = key falls in [a,b) | Efficient range scans & ordering | Hotspots if keys are skewed (e.g. timestamps) |
| Directory / lookup | explicit map key→shard | Full flexibility, easy rebalancing | Lookup service is a dependency & SPOF |
| Geo | shard by region | Data locality, residency compliance | Cross-region queries are expensive |
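The "minimal reshuffle on resize" property of the consistent-hashing row can be sketched as follows. This is an illustrative ring with virtual nodes, not any particular database's implementation; `HashRing`, the `vnodes` default, and the use of MD5 as a cheap stable hash are all assumptions.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    # MD5 is used only as a stable, well-spread hash, not for security.
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes, vnodes: int = 100):
        # Each node owns many points on the ring to even out the distribution.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key: str) -> str:
        # Walk clockwise to the first point after the key's hash, wrapping around.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


if __name__ == "__main__":
    keys = [f"user-{i}" for i in range(2000)]
    before = HashRing(["a", "b", "c"])
    after = HashRing(["a", "b", "c", "d"])
    moved = sum(before.node_for(k) != after.node_for(k) for k in keys)
    print(f"{moved / len(keys):.0%} of keys moved after adding a node")
```

Only keys that land on the new node move. A plain `hash(key) % node_count` scheme would remap most keys when the node count changes.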
A skewed partition key (e.g. customer_id when one customer is 40% of traffic) creates a permanent hotspot that is brutally expensive to fix later. Model the access pattern before you commit.
Core Axis · Performance
Caching — The Cheapest Latency Win and the Hardest Bug
Caching trades freshness and complexity for latency and load reduction. "There are only two hard things in CS: cache invalidation and naming things." Plan invalidation before you add the cache.
Write/Read strategies
| Pattern | How it works | Pro | Con / risk |
|---|---|---|---|
| Cache-aside (lazy) | App checks cache; on miss, reads DB and populates | Resilient, only caches what's used | First hit slow; stale until TTL; thundering herd on miss |
| Read-through | Cache library fetches from DB on miss | Transparent to app code | Cache becomes a hard dependency |
| Write-through | Write to cache & DB synchronously | Cache always fresh | Write latency = cache + DB; caches unread data |
| Write-back (write-behind) | Write to cache, flush to DB async | Fast writes, absorbs bursts | Data loss if cache dies before flush |
| Write-around | Write to DB only; cache fills on read | Avoids caching write-once data | Recently written data reads are slow misses |
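Cache-aside, the first row of the table, can be sketched in a few lines. This is an in-process illustration under assumptions: `load_from_db` stands in for the real database read, the dictionary stands in for Redis or Memcached, and the class name and injectable `clock` are hypothetical conveniences.

```python
import time


class CacheAside:
    def __init__(self, load_from_db, ttl_s: float = 60.0, clock=time.monotonic):
        self._load = load_from_db
        self._ttl = ttl_s
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]                      # hit: serve from cache
        value = self._load(key)                  # miss or expired: read the DB
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key) -> None:
        """Call after a write so the next read repopulates from the DB."""
        self._entries.pop(key, None)
```

The sketch shows the pattern's two risks directly: the first read of every key pays the database cost, and without `invalidate` a value can stay stale for up to `ttl_s`.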
When the cache is full
- LRU — evict least recently used (good general default)
- LFU — evict least frequently used (good for skewed popularity)
- FIFO / TTL — simple, time-bounded freshness
- W-TinyLFU — modern hybrid (Caffeine), high hit rates
The three cache failure modes
- Penetration — queries for keys that don't exist; cache never helps. Fix: cache negatives / bloom filter.
- Avalanche — many keys expire at once → DB spike. Fix: jittered TTLs.
- Stampede / dogpile — hot key expires, N requests rebuild it. Fix: single-flight lock / stale-while-revalidate.
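Two of the fixes above are small enough to sketch: jittered TTLs for avalanche and a single-flight guard for the dogpile case. This is a minimal in-process illustration; `jittered_ttl` and `SingleFlight` are hypothetical names, and a multi-host deployment would need a distributed lock or a cache that supports request coalescing.

```python
import random
import threading


def jittered_ttl(base_s: float, spread: float = 0.1, rng=random) -> float:
    """Return a TTL within base_s +/- spread so keys do not expire together."""
    return base_s * (1.0 + rng.uniform(-spread, spread))


class SingleFlight:
    """Collapse concurrent loads of the same key into one call."""

    def __init__(self):
        self._lock = threading.Lock()
        self._in_flight = {}  # key -> shared call record

    def do(self, key, load):
        with self._lock:
            call = self._in_flight.get(key)
            leader = call is None
            if leader:
                call = {"done": threading.Event(), "value": None, "error": None}
                self._in_flight[key] = call
        if leader:
            try:
                call["value"] = load()           # only the leader hits the DB
            except Exception as exc:
                call["error"] = exc
            finally:
                with self._lock:
                    del self._in_flight[key]
                call["done"].set()               # wake the waiting followers
        else:
            call["done"].wait()
        if call["error"] is not None:
            raise call["error"]
        return call["value"]
```

Followers share the leader's result or its exception, so a failing rebuild fails all waiters at once instead of triggering N more rebuild attempts.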
Where the cache lives (layers)
Core Axis · Communication
Synchronous vs Asynchronous & the API Spectrum
How services talk is as consequential as how they store. Sync is simple and immediate but couples availability; async decouples and absorbs load but adds eventual consistency and operational surface.
Synchronous · REST/gRPC
- Immediate result, easy to reason about
- Linear request flow, simple debugging
- Temporal coupling: callee down ⇒ caller down
- Latency adds up across the chain
- Backpressure must be handled explicitly
Asynchronous · queues/events
- Decoupled; producer ≠ consumer uptime
- Absorbs spikes; natural load leveling
- Eventual consistency & out-of-order delivery
- Harder to trace; need idempotency + DLQs
- Enables fan-out & replay
API styles
| Style | Transport / shape | Strengths | Weaknesses | Use for |
|---|---|---|---|---|
| REST | HTTP/JSON, resources | Ubiquitous, cacheable, simple | Over/under-fetching; chatty | Public APIs, CRUD, broad compatibility |
| gRPC | HTTP/2, protobuf, RPC | Fast, typed contracts, streaming | Binary, browser/edge friction | Internal service-to-service, low latency |
| GraphQL | HTTP, query language | Client picks fields; one round-trip | Caching & rate-limiting complexity; N+1 | Aggregating many sources for varied UIs |
| Webhooks / SSE / WebSocket | Server-push | Real-time, server-initiated | Connection state, scaling fan-out | Notifications, live updates, chat |
Queue vs Log (message broker vs event stream)
RabbitMQ / SQS — work distribution
Message consumed once and deleted; competing consumers split a task queue. Great for job processing. Trade: no replay, ordering is limited.
Kafka / Kinesis / Pulsar — event streaming
Append-only, retained log; many consumers read independently and can replay. Great for event sourcing, analytics, fan-out. Trade: more ops, partition ordering only.
Architecture
Monolith vs Microservices
The defining org-and-tech trade-off of the last decade. Microservices trade in-process simplicity for independent deployability — and buy a distributed system's worth of new problems. Most teams should start with a modular monolith.
| Dimension | Monolith | Microservices |
|---|---|---|
| Deployment | One artifact, simple | Independent per service, needs CI/CD maturity |
| Scaling | Scale the whole app | Scale hot services independently |
| Team autonomy | Coupled; coordination tax | Teams own & ship independently |
| Transactions | ACID, trivial | Distributed; sagas + eventual consistency |
| Debugging | Stack trace, one process | Distributed tracing required |
| Failure mode | All-or-nothing | Partial failure (good & bad) |
| Operational cost | Low | High (mesh, observability, infra) |
Architecture
Patterns: CQRS, Event Sourcing, Saga, EDA
Powerful tools that each solve a real problem — and each add real complexity. Reach for them only when the constraint they address is actually present.
CQRS
Separate the write model from one or more read models, each optimized for its job.
Use when: read & write loads/shapes diverge sharply.
Cost: two models to keep in sync; eventual consistency between them.
Event Sourcing
Store the sequence of events, derive state by replaying. The log is the source of truth.
Use when: you need a full audit trail, time-travel, or to rebuild projections.
Cost: schema evolution of events, snapshots, steep mental model.
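Deriving state by replay is a fold over the event log. The sketch below uses a hypothetical account with two made-up event kinds ("deposited", "withdrawn"); a real system would add versioned event schemas and snapshots, which are exactly the costs listed above.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Event:
    kind: str    # "deposited" or "withdrawn" (hypothetical event names)
    amount: int


def apply_event(balance: int, event: Event) -> int:
    """Pure transition function: current state + one event -> next state."""
    if event.kind == "deposited":
        return balance + event.amount
    if event.kind == "withdrawn":
        return balance - event.amount
    raise ValueError(f"unknown event kind: {event.kind}")


def replay(events, initial: int = 0) -> int:
    """Rebuild state by folding the log from the beginning."""
    return reduce(apply_event, events, initial)


if __name__ == "__main__":
    log = [Event("deposited", 100), Event("withdrawn", 30), Event("deposited", 5)]
    print(replay(log))       # current balance
    print(replay(log[:2]))   # "time travel": state after the first two events
```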
Saga
Replace a cross-service ACID transaction with a sequence of local transactions + compensating actions on failure.
Use when: a business process spans services (order → payment → inventory → ship).
Cost: you write the rollback logic; no isolation — intermediate states are visible.
Event-Driven Architecture
Services emit events; others react. Producers don't know consumers.
Use when: you need loose coupling, fan-out, extensibility without changing producers.
Cost: emergent behavior is hard to trace; eventual consistency everywhere.
Saga: choreography vs orchestration
| | Choreography (events) | Orchestration (coordinator) |
|---|---|---|
| Control | Decentralized; each service reacts to events | Central orchestrator drives the steps |
| Pro | Loose coupling, no central bottleneck | Explicit flow, easy to see & debug |
| Con | Hard to follow the whole flow; cyclic risk | Orchestrator is coupling + a SPOF to manage |
| Use | Few steps, simple flows | Many steps, complex compensation logic |
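The orchestration column can be sketched as a loop that runs local steps in order and, on failure, runs the compensations of the completed steps in reverse. This is a deliberately small in-process illustration; `run_saga` and the step tuple shape are assumptions, and a production orchestrator would also persist progress and retry compensations.

```python
def run_saga(steps):
    """Run (name, action, compensate) steps; undo completed ones on failure."""
    completed = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception:
            # Compensate in reverse order, then surface the original error.
            for _done_name, undo in reversed(completed):
                undo()
            raise
        completed.append((name, compensate))


if __name__ == "__main__":
    log = []

    def reserve_stock():
        raise RuntimeError("out of stock")

    steps = [
        ("order", lambda: log.append("create order"), lambda: log.append("cancel order")),
        ("payment", lambda: log.append("charge card"), lambda: log.append("refund card")),
        ("inventory", reserve_stock, lambda: log.append("release stock")),
    ]
    try:
        run_saga(steps)
    except RuntimeError:
        pass
    print(log)
```

Between "charge card" and "refund card" the charge is visible to other readers, which is the missing isolation the Saga card warns about.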
Architecture
Resilience & Reliability Patterns
At scale, failure is the steady state, not the exception. Reliability is bought with redundancy, isolation, and graceful degradation — each trading cost and complexity for uptime.
Circuit Breaker
Trip open after repeated downstream failures; fail fast instead of piling on. Half-open to test recovery. Prevents cascading failure.
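The closed, open, and half-open states can be sketched as below. This is a single-threaded illustration with hypothetical names and defaults (`failure_threshold`, `reset_timeout_s`); production libraries add locking, rolling failure windows, and metrics.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when the breaker rejects a call without trying the dependency."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout_s=30.0, clock=time.monotonic):
        self._threshold = failure_threshold
        self._reset_timeout = reset_timeout_s
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_timeout:
                raise CircuitOpenError("failing fast; dependency marked unhealthy")
            # Timeout elapsed: half-open, so let this one trial call through.
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._opened_at is not None or self._failures >= self._threshold:
                self._opened_at = self._clock()  # trip, or re-open after a failed trial
            raise
        self._failures = 0
        self._opened_at = None                   # success closes the circuit
        return result
```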
Rate Limiting & Throttling
Token/leaky bucket caps inbound load. Protects you from abuse & thundering herds. Trade: legitimate spikes get 429'd — tune carefully.
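A token bucket is a counter that refills at a steady rate up to a cap. The sketch below is a minimal single-process version; `TokenBucket`, its parameters, and the injectable `clock` are illustrative assumptions, and a shared limiter across hosts would keep this state in a common store.

```python
import time


class TokenBucket:
    def __init__(self, rate_per_s: float, capacity: float, clock=time.monotonic):
        self._rate = rate_per_s        # sustained requests per second
        self._capacity = capacity      # maximum burst size
        self._clock = clock
        self._tokens = float(capacity)
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self._clock()
        # Refill for the elapsed time, never beyond the burst capacity.
        self._tokens = min(self._capacity, self._tokens + (now - self._last) * self._rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False                   # caller would typically answer HTTP 429
```

`capacity` is the knob for the trade named above: a larger burst allowance lets legitimate spikes through at the cost of weaker protection.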
Retries + Backoff + Jitter
Retry transient errors with exponential backoff and jitter. Naïve retries cause retry storms that turn a blip into an outage.
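A retry loop with capped exponential backoff and jitter might look like the sketch below. It uses the "full jitter" variant (sleep a uniform random time up to the exponential ceiling), which is one common choice rather than the only one; the function name, defaults, and the set of retryable exceptions are assumptions.

```python
import random
import time


def retry_with_backoff(fn, *, attempts=5, base_s=0.1, cap_s=5.0,
                       retry_on=(ConnectionError, TimeoutError),
                       sleep=time.sleep, rng=random):
    """Call fn, retrying transient errors with capped exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise                            # out of attempts: give up
            # Full jitter: random delay up to the exponential ceiling.
            ceiling = min(cap_s, base_s * (2 ** attempt))
            sleep(rng.uniform(0.0, ceiling))
```

Only retry operations that are idempotent or protected by an idempotency key; otherwise a retry after a lost response can apply the same write twice.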
Bulkheads
Isolate resource pools (thread pools, connection pools, cells) so one failing dependency can't drown the whole service.
Fallbacks & Timeouts
Every remote call needs a timeout. On failure, serve cached/default/partial data. A degraded page beats an error page.
Backpressure
Let slow consumers signal producers to slow down (bounded queues, reactive streams). Unbounded buffering just relocates the crash to OOM.
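The simplest form of backpressure is a bounded queue whose "full" signal reaches the producer. The sketch below uses the standard library's `queue.Queue`; `try_enqueue` is a hypothetical helper, and what the producer does on rejection (slow down, retry later, shed the request) is a policy choice outside the sketch.

```python
import queue


def try_enqueue(q: queue.Queue, item) -> bool:
    """Enqueue without blocking; False tells the producer to back off."""
    try:
        q.put_nowait(item)
        return True
    except queue.Full:
        return False


if __name__ == "__main__":
    jobs = queue.Queue(maxsize=2)  # the bound is the backpressure
    print([try_enqueue(jobs, n) for n in range(3)])
```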
Redundancy & the cost of nines
| Availability | Downtime / year | Typical cost driver |
|---|---|---|
| 99% ("two nines") | ~3.65 days | Single region, manual recovery |
| 99.9% ("three nines") | ~8.77 hours | Redundant instances, automated failover |
| 99.99% ("four nines") | ~52.6 minutes | Multi-AZ, no single points of failure |
| 99.999% ("five nines") | ~5.26 minutes | Multi-region active-active, heavy investment |
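The downtime column is plain arithmetic, sketched below with a 365.25-day year. The second helper shows why redundancy buys nines, under the strong assumption that copies fail independently, which correlated failures (shared region, shared deploy, shared bug) routinely break. Both function names are illustrative.

```python
HOURS_PER_YEAR = 365.25 * 24  # 8,766


def downtime_hours_per_year(availability_pct: float) -> float:
    """Allowed downtime per year for a given availability percentage."""
    return (1.0 - availability_pct / 100.0) * HOURS_PER_YEAR


def redundant_availability(single: float, copies: int) -> float:
    """Availability of N redundant copies, assuming independent failures."""
    return 1.0 - (1.0 - single) ** copies


if __name__ == "__main__":
    for pct in (99.0, 99.9, 99.99, 99.999):
        print(pct, round(downtime_hours_per_year(pct), 3), "hours/year")
    print(redundant_availability(0.99, 2))  # two independent 99% copies
```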
Failure-domain checklist
- No single point of failure on critical paths (LB, DB, queue all redundant)
- Health checks + automated failover, tested via game-days / chaos engineering
- Timeouts on every network call; circuit breakers on every dependency
- Idempotent writes so retries are safe
- Graceful degradation paths defined for each dependency outage
- Backups + restore tested (an untested backup is a hope, not a backup)
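The "idempotent writes" item can be sketched with an idempotency key: the first request with a given key applies the write and records the result, and any retry with the same key gets the recorded result back. This is an in-memory illustration; `IdempotentWriter` is a hypothetical name, and a real service would store keys durably, in the same transaction as the write, with an expiry.

```python
class IdempotentWriter:
    def __init__(self, apply_write):
        self._apply = apply_write
        self._results = {}  # in-memory stand-in for a durable key store

    def write(self, idempotency_key: str, payload):
        if idempotency_key in self._results:
            return self._results[idempotency_key]   # retry: replay the result
        result = self._apply(payload)
        self._results[idempotency_key] = result
        return result


if __name__ == "__main__":
    ledger = []

    def charge(amount):
        ledger.append(amount)
        return f"charged {amount}"

    writer = IdempotentWriter(charge)
    writer.write("req-123", 50)
    writer.write("req-123", 50)  # client retry after a timeout
    print(ledger)                # the charge was applied once
```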
Decision Toolkit
Master Trade-Off Matrix
My one-screen reference. For each decision: what you gain, what you give up, and the signal that tells you which way to lean.
| Decision | Lean A when… | Lean B when… | The cost you pay |
|---|---|---|---|
| Consistency vs Availability | Stale read causes harm (money, inventory) | Uptime > freshness (feeds, carts) | Latency & coordination, or staleness |
| Vertical vs Horizontal | Below the box ceiling; want simplicity | Past one machine; need HA | Distributed-systems complexity |
| SQL vs NoSQL | Relations, transactions, ad-hoc queries | Known access pattern, massive scale | Joins & flexibility, or write scale |
| Normalize vs Denormalize | Write-heavy, integrity-critical | Read-heavy, latency-critical | Read joins, or write fan-out + drift |
| Sync vs Async | Caller needs the answer now | Work can defer; decouple uptime | Coupling/latency, or eventual consistency |
| Monolith vs Microservices | Small team, early product, unclear domains | Many teams, clear bounded contexts | Coupling, or distributed ops tax |
| Cache vs no cache | Read-heavy, tolerant of slight staleness | Strong freshness, low read volume | Invalidation complexity & stale bugs |
| Strong vs eventual replication | Correctness on every read | Geo-latency & availability matter | Write latency, or conflict resolution |
| Build vs Buy | Core differentiator, special needs | Commodity capability (auth, email, search) | Maintenance burden, or vendor lock-in + $ |
| Batch vs Stream | Periodic, high-throughput, simpler | Real-time needs, freshness matters | Latency, or operational complexity |
Decision Toolkit
Decision Framework I Use
How I actually run a design decision in a room full of strong opinions. The output is not "the answer" — it is a defensible answer with the rejected alternatives written down.
- Quantify requirements & constraints first. Read/write ratio, QPS (avg & peak), data size & growth, latency budget (p50/p99), consistency needs, availability SLO, budget, team size & skills, compliance/residency. Numbers, not adjectives. Most arguments dissolve once these are on the board.
- Estimate on the back of an envelope. Use the latency & capacity numbers to find the binding constraint. Is this storage-bound, compute-bound, bandwidth-bound, or latency-bound? The bottleneck picks the architecture.
- Identify the one-way doors. Mark which decisions are expensive to reverse (data model, shard key, sync/async boundaries, public contracts). Spend analysis time proportional to reversal cost.
- Generate 2–3 candidate designs. Not one. A single option is a decision in disguise. Include the boring option (often a modular monolith + Postgres) as a baseline.
- Score against the constraints, name the trade-off. For each candidate, state explicitly what it optimizes and what it sacrifices. There is no winner without a named loser.
- Pick the weakest sufficient option. The simplest design that meets the constraints — not the most impressive. Complexity must be earned by a requirement.
- Write the ADR. Record context, options, decision, and consequences. Future-you and new joiners need the why, especially the rejected paths.
- Define the trigger to revisit. "We'll shard when the table exceeds X GB / p99 exceeds Y ms." Decisions have expiry conditions; name them so you evolve deliberately, not in a panic.
Decision Toolkit
Documenting & Communicating Decisions
A trade-off you can't explain is a trade-off you can't defend. The Architecture Decision Record (ADR) is the artifact I rely on most.
ADR template (keep it to one page)
| Field | What goes in it |
|---|---|
| Title | Short, e.g. "ADR-014: Use Kafka for order events" |
| Status | Proposed · Accepted · Superseded by ADR-NNN |
| Context | The forces at play: requirements, constraints, the problem. Numbers here. |
| Options considered | 2–3 candidates, each with pros/cons. Show your work. |
| Decision | What we chose and the primary reason (the trade-off we accepted). |
| Consequences | What gets easier, what gets harder, what we now owe (new ops, risks). |
| Revisit when | The condition that should make us reopen this (scale, cost, SLA breach). |
Decision Toolkit
Trade-Off Anti-Patterns
The expensive mistakes — usually made by optimizing a virtue no one asked for, or copying a hyperscaler's solution without their problem.
Resume-driven development
Choosing tech for novelty/CV value, not fit. Kubernetes + microservices + Kafka for a 1000-user app is a tax on every future change.
Premature optimization & scaling
Architecting for Google scale at startup scale. You pay full complexity now for load that may never arrive. Build for 10× current, not 10000×.
Cargo-culting hyperscalers
"Netflix does microservices, so we must." Netflix's constraints (thousands of engineers, planetary scale) are not yours. Copy the reasoning, not the architecture.
One tool for everything
"We use Mongo/Postgres/Kafka for everything." Forcing every workload into one store ignores that storage choice is a trade-off per access pattern.
Microservices without boundaries
Services that share a DB and deploy together — all the cost, none of the benefit. Split by data ownership or not at all.
"It works on my machine" scale
Designing for the happy path only. No observability, no failure injection, no capacity headroom. The trade-off you forgot to make gets made for you at 3 a.m.
Decision Toolkit
Pre-Flight Checklist
I run this before signing off on any non-trivial system design. If I can't answer a line, that's where the risk is hiding.
Requirements & scale
- Read/write ratio and peak QPS are quantified
- Data volume + growth rate estimated (1yr / 3yr)
- Latency budget set per path (p50/p99)
- Availability SLO + error budget agreed with the business
- Consistency requirement named per data type
Data & state
- Storage chosen per access pattern, not by default
- Shard/partition key justified for load + queries
- Replication topology & failover path defined
- Backup + tested restore exists
- Schema/contract evolution strategy in place
Resilience
- No single point of failure on critical paths
- Timeouts, retries (w/ jitter), circuit breakers everywhere
- Idempotency keys on all writes that can retry
- Graceful degradation path per dependency
- Backpressure / load shedding designed, not assumed
Operability & cost
- Observability: logs, metrics, traces, alerts on SLOs
- Cost estimated and matched to the SLO (no over-buying nines)
- Rollout/rollback & migration plan exists
- ADR written with rejected options & revisit trigger
- Simplest design that meets the constraints — complexity earned
Sources
References & Sources
Annotated bibliography behind this system design trade-offs note — distributed-systems laws, latency intuition, data and scaling axes, communication patterns, resilience, and decision governance. Section tags (e.g. §04) show where each source informed the prose, tables, and diagrams. SVG figures, the master trade-off matrix, constraint diamond, and synthesis prose are my own unless noted.
Scope. Synthesis of textbooks, seminal papers, practitioner blogs, and industry patterns (May 2026). Latency bars, PACELC class labels, and availability-nines tables are teaching aids — re-measure against your workload, region, and SLO before committing to a one-way door (shard key, public API, sync↔async boundary).
Citations are numbered continuously [1]–[n] within this section.
Trade-off mental model & one-way doors (§01, §15, §18)
- Sowell, T., A Conflict of Visions / The Vision of the Anointed. Basic Books, 1980s–1990s. "There are no solutions, only trade-offs" — §01 lead-rule quote and §14 matrix framing. — §01, §14, §15.
- Richards, M., & Ford, N., Fundamentals of Software Architecture. O'Reilly, 2020. First Law ("everything is a trade-off") — hero lede and §01 budget-spending theme. — §01, hero, §15.
- Bezos, J., "Day 1 / Type 1 vs Type 2 decisions." Amazon shareholder letters (the Type 1 / Type 2 framing appears in the 2015 letter). One-way vs two-way doors — §01 Principle 3 and §15 step 3. — §01, §15, §08.
- Truong, L. (synthesis). Constraint diamond SVG (performance · scale · cost · simplicity) — §01 four-budgets figure. LinhTruong.com — §01, §14.
Laws, theorems & org constraints (§02, §04, §11, §17)
- Brewer, E. A., "CAP Twelve Years Later: How the 'Rules' Have Changed." IEEE Computer, 2012. CAP during partition vs everyday latency trade-offs — §02 CAP card and §04 opening. — §02, §04.
- Gilbert, S., & Lynch, N., "Brewer's Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services." ACM SIGACT News, 2002. Formal CAP proof — background for §02 and §04 partition diagram. — §02, §04.
- Abadi, D., "Consistency Tradeoffs in Modern Distributed Database System Design." IEEE Computer, 2012. PACELC (PA/EL vs PC/EC) — §02 PACELC card and §04 PACELC table. — §02, §04.
- Amdahl, G. M., "Validity of the Single Processor Approach to Achieving Large Scale Computing Capabilities." AFIPS, 1967. Serial fraction caps speedup — §02 Amdahl card. — §02, §06.
- Gunther, N. J., Guerrilla Capacity Planning / USL papers. Universal Scalability Law coherency penalty — §02 USL card. — §02, §06.
- Little, J. D. C., "A Proof for the Queuing Formula: L = λW." Operations Research, 1961. Little's Law — §02 queueing card and §03 capacity-math callout. — §02, §03, §15.
- Dean, J., & Barroso, L. A., "The Tail at Scale." Communications of the ACM, 2013. Tail latency under high utilization — §02 queueing-wall card. — §02, §13, §18.
- Conway, M. E., "How Do Committees Invent?" Datamation, 1968. Conway's Law — §02 org card and §11 distributed-monolith callout. — §02, §11, §17.
Latency numbers & back-of-envelope capacity (§03, §15)
- Dean, J., "Numbers Everyone Should Know" (talk slides). Order-of-magnitude latency ladder — §03 bar chart (L1 → cross-continent RTT). — §03.
- Bryant, R., & O'Hallaron, D., Computer Systems: A Programmer's Perspective. Pearson, 3rd ed. Memory hierarchy and I/O latency — §03 figcaption gaps. — §03.
- High Scalability, "Latency Numbers Every Programmer Should Know" (curated table). Community-maintained latency reference — §03 relative gaps. highscalability.com — §03.
- Kleppmann, M., Designing Data-Intensive Applications. O'Reilly, 2017. Envelope math, throughput, and latency budgets — §03 tip and §15 step 2. — §03, §15, §18.
- Barroso, L. A., Clidaras, J., & Hölzle, U., The Datacenter as a Computer. Morgan & Claypool, 2013. Scale-out economics — background for §06 horizontal scaling. — §03, §06.
CAP, PACELC & datastore classes (§04, §05, §08, §14)
- Kleppmann, Designing Data-Intensive Applications — Ch. 5–9. Replication, consistency, and partition behavior — §04–§05 and §08 replication topologies. — §04, §05, §08.
- DeCandia, G., et al., "Dynamo: Amazon's Highly Available Key-value Store." SOSP, 2007. AP/quorum tunability — §04 DynamoDB/Cassandra row and §08 leaderless card. — §04, §08.
- Corbett, J. C., et al., "Spanner: Google's Globally-Distributed Database." OSDI, 2012. Global strong consistency — §04 Spanner/Cockroach row. — §04, §05.
- Herlihy, M. P., & Wing, J. M., "Linearizability: A Correctness Condition for Concurrent Objects." ACM TOPLAS, 1990. Strongest consistency rung — §05 linearizable tick. — §05.
- Lamport, L., "Time, Clocks, and the Ordering of Events in a Distributed System." Communications of the ACM, 1978. Causal ordering — §05 causal consistency tick. — §05.
- Shapiro, M., et al., "Conflict-Free Replicated Data Types." SSS, 2011. CRDTs for conflict resolution — §05 hidden-cost callout and §08 multi-leader card. — §05, §08.
- Richardson, C., "Pattern: Database per service." microservices.io — §11 microservices diagram caption. microservices.io — §11, §12.
Scaling, statelessness & replication/partitioning (§06, §08, §14, §18)
- Kleppmann, DDIA — partitioning & replication chapters. Leader/follower, quorum, sharding strategies — §08 entire section. — §06, §08, §18.
- Karger, D., et al., "Consistent Hashing and Random Trees." STOC, 1997. Hash/ring partitioning — §08 hash-sharding row. — §08.
- Amazon ElastiCache / AWS Architecture Blog — horizontal scaling patterns. Stateless app tier + externalized state — §06 statelessness figure. aws.amazon.com/architecture — §06.
- Google SRE Team, Site Reliability Engineering. O'Reilly, 2016. Capacity planning and utilization targets — §02 tail-latency theme and §13 nines table. sre.google — §02, §13, §18.
- Truong, L. (synthesis). Vertical vs horizontal trade-off block and stateless-LB diagram — §06. — §06, §14.
Data models: SQL, NoSQL, OLTP/OLAP (§07, §14, §18)
- Codd, E. F., "A Relational Model of Data for Large Shared Data Banks." Communications of the ACM, 1970. Relational/SQL foundation — §07 SQL column. — §07.
- Kleppmann, DDIA — data models chapter. Document, key-value, wide-column, graph comparison — §07 NoSQL table. — §07.
- Pritchett, D., "BASE: An Acid Alternative." ACM Queue, 2008. BASE vs ACID framing — §07 ACID/BASE trade-off block. — §07.
- Stonebraker, M., et al., "The End of an Architectural Era (It's Time for a Complete Rewrite)." VLDB, 2007. OLTP vs specialized stores — background for §07 OLTP/OLAP table. — §07.
- Inmon, W. H., & Kimball, R. — data warehousing literature. Column-oriented OLAP and ETL/ELT — §07 OLAP row and analytics-isolation callout. — §07, §18.
- Dehghani, Z., "How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh." martinfowler.com, 2019. Analytics across services — background for §07 CDC/warehouse row. martinfowler.com — §07.
Caching strategies (§09, §14, §18)
- Fitzpatrick, B., "Distributed Caching with Memcached." Linux Journal, 2004. Cache-aside at scale — §09 cache-layer stack. — §09.
- AWS / Azure / Cloudflare CDN documentation (synthesized). Edge vs CDN vs app vs DB cache tiers — §09 caching-layer figure. — §09.
- Kleppmann, DDIA — caching & materialized views. Staleness vs latency trade — §09 figcaption and §14 cache row. — §09, §14.
- Facebook / Meta engineering posts on cache invalidation (synthesized). "Hard problem" of invalidation — §09 trade-off theme. — §09, §18.
Sync vs async, APIs & messaging (§10, §12, §14, §18)
- Fielding, R. T., Architectural Styles and the Design of Network-based Software Architectures (REST). UC Irvine, 2000. REST constraints — §10 REST row. — §10.
- gRPC / Protocol Buffers documentation. HTTP/2 RPC and typed contracts — §10 gRPC row. grpc.io — §10.
- GraphQL specification & Facebook engineering notes. Client-driven field selection — §10 GraphQL row. graphql.org — §10.
- Hohpe, G., & Woolf, B., Enterprise Integration Patterns. Addison-Wesley, 2003. Messaging, pub/sub, competing consumers — §10 queue vs log cards. — §10, §12.
- Kreps, J., et al., "Kafka: A Distributed Messaging System for Log Processing." NetDB workshop, 2011. Event log / replay — §10 Kafka card and §12 EDA card. — §10, §12.
- Narkhede, N., "Exactly-Once Semantics Are Possible: Here's How Kafka Does It." Confluent blog, 2017. At-least-once + idempotency reality — §10 idempotency callout. confluent.io — §10, §13, §18.
- Richardson, C., "Pattern: Messaging." microservices.io. Sync vs async coupling — §10 trade-off block. microservices.io — §10, §14.
Monolith vs microservices (§11, §14, §17)
- Lewis, J., & Fowler, M., "Microservices." martinfowler.com, 2014. Service boundaries and distributed costs — §11 table and §17 cargo-cult card. martinfowler.com — §11, §17.
- Fowler, M., "MonolithFirst." martinfowler.com, 2015. Modular monolith baseline — §11 opening sub and §15 step 4. martinfowler.com — §11, §15, §17.
- Newman, S., Building Microservices (2nd ed.). O'Reilly, 2021. Independent deployability vs ops tax — §11 comparison table. — §11, §14.
- Evans, E., Domain-Driven Design. Addison-Wesley, 2003. Bounded contexts and modular monolith — §11 split-by-capability callout. — §11, §15.
- Richardson, C., "Antipattern: Shared database." microservices.io — §11 distributed-monolith callout and §17 distributed-monolith card. — §11, §17.
- Skelton, M., & Pais, M., Team Topologies. IT Revolution, 2019. Team–service alignment — §11 Conway callout. — §11, §17.
CQRS, event sourcing, saga & EDA (§12, §14, §18)
- Young, G., "CQRS Documents" / CQRS pattern posts. Separate read/write models — §12 CQRS card. cqrs.files.wordpress.com — §12.
- Fowler, M., "Event Sourcing." martinfowler.com. Event log as source of truth — §12 event-sourcing card. martinfowler.com — §12.
- Richardson, C., "Pattern: Saga." microservices.io. Compensating transactions — §12 saga card and §11 diagram caption. microservices.io — §11, §12, §18.
- Garcia-Molina, H., & Salem, K., "Sagas." ACM SIGMOD Record, 1987. Original saga concept — background for §12. — §12.
- Richardson, C., Microservices Patterns. Manning, 2018. Choreography vs orchestration sagas — §12 saga table. — §12.
- Evans, E., Domain-Driven Design Reference — domain events. Event-driven architecture — §12 EDA card. — §12.
Resilience, SLOs & failure modes (§13, §18)
- Nygard, M., Release It! (2nd ed.). Pragmatic Bookshelf, 2018. Circuit breaker, bulkhead, timeout, stability patterns — §13 pattern cards. — §13, §18.
- Netflix Hystrix / resilience4j documentation (synthesized). Circuit breaker states — §13 circuit-breaker card. — §13.
- Google SRE — SLOs, error budgets, nines. Availability vs cost — §13 nines table and diminishing-returns callout. sre.google — §13, §18.
- AWS Architecture Blog, "Exponential Backoff And Jitter." Retry-storm prevention — §13 retries card. aws.amazon.com — §13, §18.
- Basiri, A., et al., "Chaos Engineering." IEEE Software / Netflix practice. Game-days and failure injection — §13 failure-domain checklist. — §13, §18.
- Reactive Streams / backpressure specification. Bounded queues and flow control — §13 backpressure card. reactive-streams.org — §13.
- Dean, J., & Barroso, L. A., "The Tail at Scale." Communications of the ACM, 2013. Cascading latency under load — §13 resilience context. — §13.
Decision framework, ADRs & anti-patterns (§14–§18)
- Truong, L. (synthesis). Master trade-off matrix — §14 one-screen reference table. — §14.
- Nygard, M., "Documenting Architecture Decisions." Cognitect, 2011. ADR template — §16 entire section and §18 operability checklist. cognitect.com — §15, §16, §18.
- Thomson, J., "Architecture Decision Records." adr.github.io — §16 ADR template table. — §16, §18.
- Beck, K., Extreme Programming Explained — YAGNI. Simplest sufficient design — §15 step 6 and §17 golden-hammer card. — §15, §17.
- Knuth, D., "Structured Programming with go to Statements." ACM Computing Surveys, 1974. "Premature optimization" quote — §17 premature-scaling card. — §17.
- Foote, B., & Yoder, J., "Big Ball of Mud." PLoP, 1997. Structureless systems — analogy for §17 resume-driven theme. laputan.org — §17.
- Netflix engineering culture posts (synthesized). Hyperscaler context — §17 cargo-cult card. — §17.
- Truong, L., System Design Trade-Offs — personal working notes. May 2026. Constraint diamond, latency bar chart, CAP partition diagram, consistency spectrum, stateless scaling figure, monolith/microservices SVG, caching stack, trade-off blocks, pre-flight checklist, and synthesis prose. LinhTruong.com — all sections.