Computer science · field notes · May 2026

Computer Science — how I map the stack for working developers

Source / canonical copy: LinhTruong.com. If you forward this HTML, link there so attribution stays with the file.

Notes and SVGs I reuse when someone asks for a single picture of the whole road: foundations, systems, networking, distributed systems, data/ML, security, how we actually ship software, and a tight read on research and career moves. Slanted toward practitioners who present, review designs, or write—not a degree substitute.

Audience: developers, leads, anyone teaching CS-shaped material · Revised: May 2026 · Format: 14 inline diagrams + appendix + references

1 · The Computer Science Landscape (2026)

Three forces keep showing up in the teams I talk to: AI bundled into products (often LLM-shaped), systems that are distributed by default (edge, cloud, device), and pressure to show your work—types, tests, traces, sometimes proof-ish reasoning. None of that is comfortable if you only live in one band of the stack.

The Modern CS Stack — five layers a developer must navigate
  L1 · Theory & Foundations — discrete math · logic · complexity · automata · information theory · numerical methods
  L2 · Algorithms, Data Structures, Programming Languages — asymptotics · graphs · dynamic programming · type systems · compilers · concurrency models
  L3 · Systems: OS, Networks, Databases, Distributed — memory hierarchy · TCP/QUIC · storage engines · consensus · caching · schedulers
  L4 · Applied AI / ML / LLMs & Data Engineering — training · inference · RAG · vectors · evals · fine-tuning · agents · pipelines
  L5 · Products, Security, Ethics & Human Factors — UX · privacy · threat models · accessibility · policy · sustainability
Figure 1 — Five layers; lower layers fund the upper ones. The more bands you can reason about, the less surprised you are in review.
Trend 01

AI-native software

LLMs and agents move from features to infrastructure. Retrieval, evals, and tool use are first-class concerns alongside testing and logging.

Trend 02

Edge + Cloud + Device

WASM, on-device inference, and serverless make "where does code run?" a design decision per request — not per service.

Trend 03

Verified & Observed

Strong types (Rust, TS, Swift 6), property testing, OpenTelemetry, and SBOMs raise the floor for production-grade software.

2 · Core Foundations

Foundations are option value. When you actually remember complexity classes, basic probability, and information theory, architecture arguments get shorter and code reviews get more concrete.

Math

The non-negotiables

  • Discrete math — sets, relations, induction, combinatorics, graph theory.
  • Probability & statistics — distributions, Bayes, A/B testing, confidence intervals.
  • Linear algebra — vectors, matrices, eigenvalues (used everywhere in ML and graphics).
  • Logic & proofs — propositional/first-order logic, invariants, contracts.
  • Information theory — entropy, compression, error correction.
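Entropy from that last bullet is the formula here I reach for most often; a minimal sketch, assuming a discrete distribution given as probabilities that sum to 1:

  import math

  def shannon_entropy(probs):
      # Entropy in bits of a discrete distribution; zero-probability terms are skipped.
      return -sum(p * math.log2(p) for p in probs if p > 0)

  # A fair coin carries exactly 1 bit; a biased coin carries less.
  print(shannon_entropy([0.5, 0.5]))  # 1.0
  print(shannon_entropy([0.9, 0.1]))  # ~0.47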
CS Theory

What every developer should be able to argue

  • Why P vs NP matters for problem framing and approximation.
  • How Turing machines and the halting problem bound what is computable.
  • Why a regex is not a parser, and when to reach for a CFG.
  • What "NP-hard but tractable" means in practice (SAT, ILP, heuristics).
  • How information-theoretic lower bounds drive lossless compression and hashing.

3 · Data Structures & Algorithms

You do not need to memorize every algorithm — you need a map of which structure solves which class of problem, and the cost of each operation. The diagram below organizes the canon by access pattern.

Data structure map by access pattern
  Sequential — array (O(1) index) · linked list (O(1) insert) · stack / queue · ring buffer / deque · rope (large strings) · skip list
  Associative — hash map (avg O(1)) · B-tree / B+ tree · trie / radix tree · Bloom / cuckoo filter · LSM tree (DBs) · HNSW (vector search)
  Hierarchical — binary search tree · red-black / AVL · heap / priority queue · segment / Fenwick tree · Merkle tree · quad / k-d tree
  Graph & probabilistic — adjacency list / matrix · DAG / topological sort · union-find (DSU) · HyperLogLog (cardinality) · Count-Min sketch · CRDT (replicated)
Figure 2 — Pick a structure by access pattern first, then refine by constraints (memory, concurrency, persistence).

The algorithm strategies you must own

Strategy | When to reach for it | Canonical example | Big-O hint
Divide & conquer | Problem splits into independent subproblems | Merge sort, FFT | O(n log n)
Greedy | Local optimum provably implies global optimum | Huffman, Dijkstra | O(n log n)
Dynamic programming | Overlapping subproblems + optimal substructure | LCS, knapsack, edit distance | O(n·m) typical
Backtracking / branch & bound | Search a space with prunable invariants | SAT, N-queens, scheduling | Exponential worst case
Graph traversal | Reachability, shortest path, cycles | BFS, DFS, A* | O(V+E)
Randomized / sketches | Massive data, approximate answers OK | Bloom, HLL, MinHash | Sub-linear memory
Approximation | NP-hard but a 2× or (1+ε) bound is fine | Set cover, TSP | Polynomial, with a ratio guarantee
Online & streaming | Data arrives once, can't store it all | Reservoir sampling, EWMA | O(1) per item
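To make the dynamic-programming row concrete: a minimal edit-distance sketch (the classic O(n·m) table, rolled into two rows; the function name and layout are mine):

  def edit_distance(a: str, b: str) -> int:
      # prev[j] holds the distance between a[:i-1] and b[:j]; curr is the row for a[:i].
      prev = list(range(len(b) + 1))
      for i, ca in enumerate(a, 1):
          curr = [i] + [0] * len(b)
          for j, cb in enumerate(b, 1):
              cost = 0 if ca == cb else 1
              curr[j] = min(prev[j] + 1,         # delete from a
                            curr[j - 1] + 1,     # insert into a
                            prev[j - 1] + cost)  # substitute, or match for free
          prev = curr
      return prev[-1]

  print(edit_distance("kitten", "sitting"))  # 3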

4 · Systems & Architecture

Performance, reliability, and cost all bottom out in physics: how memory hierarchies, kernels, and hardware coordinate. The "latency numbers every programmer should know" table below is a developer's compass.

Latency numbers every developer should know (rough orders of magnitude, 2026)
  L1 cache ~1 ns · L2 cache ~4 ns · branch mispredict ~5 ns · DRAM access ~100 ns · NVMe SSD read ~10 µs · datacenter RTT ~500 µs · cross-region RTT 70–150 ms · cold-start LLM call 0.5–3 s
Figure 3 — The seven-orders-of-magnitude gap between a register and a cross-region call is why where code runs matters as much as what it does.
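These numbers are most useful as arithmetic. A back-of-envelope sketch using the rough 2026 figures from the chart (constants, not measurements):

  # Rough orders of magnitude from Figure 3, in seconds.
  DRAM = 100e-9
  NVME_READ = 10e-6
  CROSS_REGION_RTT = 100e-3

  # 50 sequential cross-region calls vs. one batched call:
  print(50 * CROSS_REGION_RTT)  # ~5 s: unusable in a request path
  print(1 * CROSS_REGION_RTT)   # ~0.1 s: fine
  # Touching 1,000 pages on NVMe vs. finding them in the page cache:
  print(1000 * NVME_READ)       # ~10 ms
  print(1000 * DRAM)            # ~0.1 ms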
OS

Process & memory

Virtual memory, paging, page-cache, mmap, copy-on-write. Know your scheduler (CFS, EEVDF on Linux 6.x) and how io_uring changes I/O.

Concurrency

Three models

Threads + locks, async/await (cooperative), and actors/CSP (message passing). Pick one per service boundary, not per file.
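The cooperative model is the easiest to show in a few lines; a minimal asyncio sketch, where the fetch function fakes I/O with a sleep:

  import asyncio

  async def fetch(name: str, delay: float) -> str:
      # Simulates an I/O-bound call; await yields control to the event loop.
      await asyncio.sleep(delay)
      return f"{name} done"

  async def main() -> None:
      # Tasks interleave at await points, so wall time is ~max(delays), not the sum.
      results = await asyncio.gather(fetch("a", 0.2), fetch("b", 0.2), fetch("c", 0.2))
      print(results)

  asyncio.run(main())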

Hardware

Mechanical sympathy

Cache lines (64B), NUMA, SIMD, branch prediction. A tight loop whose working set fits in L1 can be 100× faster than a naïve one streaming from RAM.

5 · Networking & the Web

The web runs on a small number of layered abstractions. Understanding them lets you debug "why is my page slow?" with first principles instead of guesses.

From bits to browsers — the 2026 web stack
  L7 · Application — HTTP/3, gRPC, GraphQL, WebSocket, WebTransport, SSE
  L6 · Presentation/Security — TLS 1.3, mTLS, ALPN, Encrypted ClientHello, post-quantum KEMs (ML-KEM)
  L5 · Session — QUIC streams, HTTP/2 multiplexing, 0-RTT
  L4 · Transport — TCP (BBRv3), UDP (QUIC), congestion control
  L3/L2 · IP / Ethernet / Wi-Fi 7 / 5G — routing, NAT, anycast, BGP
  L1 · Physical — fibre, wireless, satellite (LEO)
Figure 4 — HTTP/3 over QUIC is now the default on most CDNs. Post-quantum hybrid key exchange is rolling out in TLS 1.3.

What to debug, in order

  1. DNS — resolution time, TTL, regional anycast.
  2. TCP/QUIC handshake — 0-RTT eligibility, head-of-line blocking.
  3. TLS — cert chain, OCSP staple, ALPN.
  4. HTTP semantics — caching headers, range, compression (Brotli/zstd).
  5. Server processing — DB query, CPU, GC pauses.
  6. Client rendering — Critical CSS, hydration, Core Web Vitals (LCP, INP, CLS).
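A minimal sketch for attributing time across steps 1–4 with only the standard library (the host is a placeholder; real tooling such as curl's timing flags or browser devtools gives finer detail):

  import socket, ssl, time

  host, port = "example.com", 443  # placeholder target

  t0 = time.perf_counter()
  addr = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)[0][4][:2]
  t_dns = time.perf_counter()

  sock = socket.create_connection(addr, timeout=5)  # TCP handshake
  t_tcp = time.perf_counter()

  tls = ssl.create_default_context().wrap_socket(sock, server_hostname=host)  # TLS handshake
  t_tls = time.perf_counter()

  tls.sendall(f"GET / HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n".encode())
  tls.recv(1)  # first byte of the response
  t_ttfb = time.perf_counter()
  tls.close()

  for label, start, end in [("DNS", t0, t_dns), ("TCP", t_dns, t_tcp),
                            ("TLS", t_tcp, t_tls), ("TTFB", t_tls, t_ttfb)]:
      print(f"{label}: {(end - start) * 1e3:.1f} ms")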

6 · Distributed Systems

Distributed systems are the discipline of partial failure. The CAP and PACELC framings, plus consensus, are the conceptual core.

CAP + PACELC — choosing trade-offs explicitly
  CAP: consistency, availability, partition tolerance; under a partition you pick C or A.
  PACELC: if Partitioned, choose PA (prefers availability) or PC (prefers consistency); Else, choose EL (low latency) or EC (strong consistency).
  Real systems: DynamoDB · PA/EL; Cassandra · PA/EL; Spanner / CockroachDB · PC/EC; MongoDB (majority) · PC/EC; Redis Cluster · PA/EL
Figure 5 — PACELC extends CAP with the steady-state trade-off between latency and consistency.
Consensus

Paxos, Raft, Viewstamped Replication

All solve the same problem: agree on a value despite failures. Raft is the de facto teaching algorithm; Multi-Paxos and Viewstamped Replication power the largest systems. Modern variants (EPaxos, Flexible Paxos) trade latency for failure-domain flexibility.

Replication

Patterns

  • Leader/follower (Postgres, MySQL).
  • Multi-leader (CRDTs, geo-active-active).
  • Leaderless / quorum (Dynamo-style).
  • Chain replication (CRAQ for read scaling).
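The leaderless row hinges on one inequality: R + W > N, which forces read and write quorums to overlap. A toy sketch (all names are mine):

  N, W, R = 3, 2, 2
  assert R + W > N  # quorum overlap guarantee

  # Each replica stores (version, value); higher version wins.
  replicas = [(0, None)] * N

  def write(value, version):
      for i in range(W):            # pretend replica N-1 is slow or partitioned
          replicas[i] = (version, value)

  def read():
      responses = replicas[N - R:]  # worst case: the R least-recently-written replicas
      return max(responses)         # at least one of them saw the latest write

  write("v1", version=1)
  print(read())  # (1, 'v1'): any 2-of-3 read quorum intersects the 2-of-3 write quorum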

7 · Databases & Data Engineering

The database is usually your most expensive and least reversible decision. Choose by access pattern + consistency need + ops cost, in that order.

Database selection — a decision tree, not a tier list
  What is the workload?
  OLTP (rows, ACID) — Postgres / MySQL; CockroachDB / Spanner at global scale
  OLAP (analytics) — DuckDB · ClickHouse; Snowflake · BigQuery · Databricks
  Specialty — Redis (KV); Neo4j (graph)
  Vector (pgvector, Qdrant, Pinecone) — use when retrieval is semantic; pair with a reranker
  Time-series (Timescale, InfluxDB) — high write rate, append-mostly, range queries
  Search (OpenSearch, Meilisearch) — full-text, faceted, typo-tolerant
Figure 6 — Default to Postgres until a workload forces you out. Mix engines at the storage layer, not at the API.

Data engineering pipeline shape

source  →  ingest  →  store (lake/warehouse)  →  transform (dbt/Spark)  →  serve
                          ↓                              ↓
                       catalog (Iceberg/Delta/Hudi)   metrics + features
                          ↓                              ↓
                       lineage / quality             ML training / RAG

8 · AI / ML / LLMs

In 2026 a software developer is expected to integrate AI competently even if they don't train models. The diagram below distinguishes the four common operating modes.

Four operating modes for LLMs in production
  1 · Direct prompting — stateless call; few-shot examples; cheap, fast iteration. Eval: golden set. Risk: hallucination. Use for: drafting, summarization, classification. Latency: 100 ms–2 s.
  2 · RAG — retrieve and ground; hybrid vector + lexical; reranker on top-k. Eval: faithfulness + recall. Risk: stale or wrong docs. Use for: docs Q&A, support, internal search. Latency: 0.5–4 s.
  3 · Tool use / agents — plan → act → observe; MCP / function calling; loops, memory, budgets. Eval: trajectory + outcome. Risk: runaway loops, prompt injection. Use for: workflows, coding agents, ops. Latency: seconds–minutes.
  4 · Fine-tune / train — LoRA / DPO / RLHF; domain-specific data; owned weights. Eval: holdout + offline. Risk: drift, compute cost. Use for: style, safety, small specialist models. Latency: same as base.
Figure 7 — Start with prompting, add RAG when grounded knowledge is needed, add tools when actions are needed, fine-tune last.

Evaluation is the moat

The most common failure mode in AI features is "looks good in the demo, regresses in prod." Build evals before you ship anything.

A research paper or production launch in 2026 without an eval section is incomplete. The eval is the experiment.
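What "build evals before you ship" means mechanically: a golden set in the repo and a pass-rate gate in CI. A skeleton, where call_model and the cases are stand-ins rather than a real API:

  def call_model(prompt: str) -> str:
      raise NotImplementedError  # hypothetical: swap in your actual client

  GOLDEN_SET = [
      {"prompt": "Summarize: the meeting moved to Tuesday.", "must_contain": "Tuesday"},
      # ...dozens more cases, versioned alongside the prompt they test
  ]

  def run_evals(threshold: float = 0.95) -> bool:
      passed = sum(
          case["must_contain"].lower() in call_model(case["prompt"]).lower()
          for case in GOLDEN_SET
      )
      score = passed / len(GOLDEN_SET)
      print(f"golden-set pass rate: {score:.0%}")
      return score >= threshold  # gate the deploy on this, like any other CI check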

9 · Security & Cryptography

Threat modeling is cheap; incidents are not. Use the STRIDE mnemonic on every new feature, and pair it with the standard mitigations below.

STRIDE — six threat classes and their default mitigations
  Spoofing (identity faked) → strong authN; MFA, passkeys; mutual TLS
  Tampering (data altered) → signed payloads; integrity checks; audit logs
  Repudiation ("I didn't do it") → append-only logs; signed actions; timestamps
  Information disclosure (secrets leak) → encryption; least privilege; data minimization
  Denial of service (resource exhaustion) → rate limits; quotas, queues; circuit breakers
  Elevation of privilege → RBAC/ABAC; sandboxing; capabilities
Figure 8 — Apply STRIDE per data-flow boundary, not per feature.
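One mitigation from the denial-of-service row, sketched as a token bucket (rate and capacity are illustrative):

  import time

  class TokenBucket:
      # Allows `rate` requests per second, with bursts up to `capacity`.
      def __init__(self, rate: float, capacity: float):
          self.rate, self.capacity = rate, capacity
          self.tokens = capacity
          self.last = time.monotonic()

      def allow(self) -> bool:
          now = time.monotonic()
          # Refill in proportion to elapsed time, capped at capacity.
          self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
          self.last = now
          if self.tokens >= 1:
              self.tokens -= 1
              return True
          return False

  bucket = TokenBucket(rate=5, capacity=10)
  print(sum(bucket.allow() for _ in range(15)))  # ~10: the burst allowance, then throttled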

2026 baseline you should be able to defend

10 · Software Engineering Practice

The most underrated CS skill is shipping software that other humans can change. The diagram below shows the modern feedback loop a high-functioning team runs.

The modern delivery loop — each arrow is a feedback signal
  Plan / Spec (definition of done) → Build / Code → Test / Review (PRs, CI gates) → Release / Deploy (canary, feature flags) → Operate / Observe (SLOs, incidents) → Learn / Iterate (post-mortems, user signal) → back to Plan
Figure 9 — The loop closes when production behavior changes the next spec. Teams without that closure stall.
Code quality

The big six

  • Types & static analysis
  • Unit + property + integration tests
  • Code review with checklists
  • Linters & formatters in CI
  • Dependency upgrade bots
  • Mutation testing for hot paths
Workflow

Trunk-based delivery

  • Short-lived branches
  • Feature flags for unfinished work
  • Continuous deploy to staging
  • Canary + blue/green to prod
  • Rollback in <5 min
Docs

What to write

  • ADRs (architecture decisions)
  • RFCs for cross-team work
  • Runbooks per service
  • Onboarding 90-day plan
  • Glossary of domain terms

Testing pyramid (and the modern flip)

Level | Quantity | Speed | Confidence | 2026 note
Unit | Many | ms | Local | Property tests catch what examples miss
Integration | Some | seconds | Module | Use real DBs in containers, not mocks
End-to-end | Few | minutes | System | Playwright/Cypress; run on a deployed env
Eval / behavioral | Always | varies | AI features | The new tier for LLM-powered features
Chaos / load | Periodic | hours | Reliability | Run before launches, not after incidents
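The property-test note in the unit row, shown with Hypothesis (a real library; the round-trip property is a stock example, not from the original):

  from hypothesis import given, strategies as st

  def encode(s: str) -> bytes:
      return s.encode("utf-8")

  def decode(b: bytes) -> str:
      return b.decode("utf-8")

  @given(st.text())
  def test_roundtrip(s):
      # Holds for any string Hypothesis generates, not just hand-picked examples.
      assert decode(encode(s)) == s

  test_roundtrip()  # callable directly, or collected by pytest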

11 · Cloud, DevOps & Platform Engineering

Cloud is the substrate; platform engineering is how you make it usable. A good internal developer platform (IDP) turns "deploy" into a one-line action and bakes in compliance.

Internal developer platform (IDP) — the golden path
  Developer portal — service catalog; templates / scaffolds; docs & ownership (Backstage / Port)
  CI / CD — build & sign; SBOM + scan; progressive delivery (Argo / GitHub Actions)
  Compute — Kubernetes; serverless; edge / WASM; GPUs for inference
  Data & AI — managed DBs; object store; model gateway; feature store
  Observability & security — OpenTelemetry; SLO / SLI / error budgets; secrets vault; policy as code (OPA)
Figure 10 — A working IDP makes the secure, observable, scalable path the easy one.

Infrastructure-as-code stance

12 · Performance & Observability

"Make it work, make it right, make it fast" — in that order. Measure before you optimize. Modern observability is three pillars plus one.

Observability — three pillars plus profiling, unified by OpenTelemetry
  Metrics — counters, gauges, histograms; cardinality matters; RED & USE methods; SLOs / SLIs (Prometheus · Mimir)
  Logs — structured (JSON); sampled / leveled; correlated by trace-id; no PII in logs (Loki · OpenSearch)
  Traces — spans across services; critical-path analysis; tail sampling; propagate W3C tracecontext (Tempo · Jaeger)
  Continuous profiling — CPU, memory, allocations; eBPF for low overhead; flame graphs in prod; tie to deploy diffs (Pyroscope · Parca)
Figure 11 — Pillars are not enough on their own; the value is in correlation through OpenTelemetry trace IDs.

The performance loop

  1. Define the SLI (e.g., p95 request latency).
  2. Measure — instrument, run load tests, capture flame graphs.
  3. Find the bottleneck — Amdahl says fix the biggest slice first.
  4. Change one thing, A/B it, re-measure.
  5. Lock in with a regression test or budget alert.
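Step 1 in code, assuming you already collect per-request latencies (the sample data is invented):

  import statistics

  latencies_ms = [12, 15, 11, 180, 14, 13, 16, 240, 12, 15] * 10

  # statistics.quantiles returns n-1 cut points; index 94 is the 95th percentile.
  p50 = statistics.quantiles(latencies_ms, n=100)[49]
  p95 = statistics.quantiles(latencies_ms, n=100)[94]
  print(f"p50={p50:.0f} ms  p95={p95:.0f} ms")
  # An SLI like "p95 < 200 ms" only means something with enough samples per window.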

13 · Research-Paper Methodology for CS

Whether you are writing a conference paper, a tech report, or a launch retrospective, the structure is the same: question → method → evidence → claim. The diagram below is a research project as a pipeline.

A CS research paper as a pipeline — each box is a deliverable
  1 · Question — specific, falsifiable
  2 · Literature — position vs. prior work
  3 · Hypothesis — predicted effect & size
  4 · Method — design, baselines, metrics
  5 · Experiment — run, log, reproduce
  6 · Analysis — stats, ablations
  7 · Threats — validity, bias, limits
  8 · Artifact — code, data, seed, env
  9 · Write — IMRaD + figures
  10 · Review — internal, external, peer
  11 · Revise — address each comment
  12 · Publish — arXiv, venue, blog
  Loop back to step 1 with the next question — papers seed papers.
Figure 12 — Treat the paper as a pipeline whose artifacts (data, code, figures) are reusable across drafts and venues.

IMRaD structure with CS-specific guidance

Section | Length | Key questions to answer | Common mistakes
Abstract | 150–250 words | What problem, what idea, what evidence, what impact? | Vague claims, no numbers
Introduction | 1–1.5 pages | Why now? What gap? Three-bullet contribution list. | Backstory instead of stakes
Related work | 0.5–1 page | What does this paper do that prior work does not? | A list instead of a comparison
Method | 2–4 pages | Could a competent reader reimplement it? | Skipping non-obvious choices
Experiments | 2–4 pages | Baselines, datasets, metrics, ablations, statistical tests. | Single-seed results
Discussion | 0.5–1 page | What did we learn? When does this fail? | Restating results
Threats to validity | 0.25–0.5 page | What might be wrong, and what would change the conclusion? | Omitting the section
Reproducibility | Appendix | Code, data, hyperparameters, hardware, seeds, exact commands. | "Available on request"

Reproducibility — the 2026 bar

If your reviewer cannot run your experiment with one command on one machine, your paper is incomplete.
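The mechanical part of that bar is small. A sketch of what a one-command entry point pins down (the file name and fields are mine):

  # run.py: "python run.py" reproduces the experiment end to end.
  import json, platform, random

  CONFIG = {"seed": 42, "lr": 3e-4, "epochs": 10}  # illustrative hyperparameters

  def main():
      random.seed(CONFIG["seed"])  # also seed numpy/torch if you use them
      results = {"metric": 0.0}    # placeholder: train / evaluate here
      # Log config and environment next to results so the run is auditable.
      record = {"config": CONFIG, "python": platform.python_version(), "results": results}
      with open("run.json", "w") as f:
          json.dump(record, f, indent=2)

  if __name__ == "__main__":
      main()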

Common CS publication venues

Area | Top venues | Style
Systems | SOSP, OSDI, NSDI, EuroSys, ATC, FAST | Implementation + measurement
Networking | SIGCOMM, NSDI, CoNEXT | Protocols + measurement
Databases | SIGMOD, VLDB, ICDE, CIDR | Engines, query processing
PL | PLDI, POPL, OOPSLA, ICFP | Semantics, compilers, proofs
Theory | STOC, FOCS, SODA | Proofs, complexity
ML / AI | NeurIPS, ICML, ICLR, ACL, EMNLP | Empirical + analysis
Security | USENIX Security, IEEE S&P, CCS, NDSS | Attacks, defenses, measurement
HCI | CHI, UIST, CSCW | User studies, design
SE | ICSE, FSE, ASE | Tools, empirical SE

14 · Career & Learning Strategy for Software Developers

Skills compound; reputations compound faster. Aim for T-shaped early (one deep stem, broad horizontal), then π-shaped mid-career (two deep stems).

Skill shape over a 10-year career
  Year 0–2 · I-shape — one deep skill
  Year 2–5 · T-shape — broad context + one depth
  Year 5–10 · π-shape — two depths bridged by breadth
Figure 13 — Add the second stem deliberately; usually one is technical, one is adjacent (product, security, ML, distributed).
Learning

How to keep up without burning out

  • Spaced practice — weekly, not weekend cram.
  • Build, then read — write the toy version, then read the paper.
  • Teach — internal talks, blog posts, code reviews. Output forces understanding.
  • Cap inputs — 2 newsletters, 1 podcast, 3 papers/month beats infinite feeds.
  • Re-read classics — Designing Data-Intensive Applications, SICP, The Pragmatic Programmer.
Visibility

How to be known for something

  • Pick one topic you are willing to defend for 18 months.
  • Publish — blog, paper, OSS — on a regular cadence.
  • Show your work: benchmarks, reproductions, post-mortems.
  • Speak at one venue per year (lunch & learn counts).
  • Help one person publicly every week (issue, PR, review).

The 6-month focus plan (a template)

A 6-month skill sprint — one theme, three outputs
  M1 · Survey — 5 papers, 1 book; map the field
  M2 · Replicate — build the toy; match a baseline
  M3 · Vary — change 1 thing; measure the effect
  M4 · Ship — into a real product or an OSS lib
  M5 · Publish — blog + talk, or a paper draft
  M6 · Reflect — what next?
Figure 14 — Same shape works for "learn Rust", "ship an ML feature", or "write your first paper".

15 · Master Checklist — Everything You Should Be Able To Do

Code & algorithms

  • Pick a data structure by access pattern and justify the trade-off in writing.
  • Reason about Big-O and amortized cost without looking it up.
  • Write idiomatic code in at least two paradigms (OO + functional or systems).
  • Recognize when a problem is NP-hard and choose an approximation strategy.

Systems

  • Estimate latency, throughput, and storage on the back of an envelope.
  • Design a service that survives a single-AZ outage.
  • Trace a slow request from browser to database and back.
  • Write a runbook that an oncall engineer can use at 3am.

Data & AI

  • Design a RAG pipeline with a sensible eval harness.
  • Explain the difference between fine-tuning, RAG, and prompting to a PM.
  • Identify when a vector database is overkill vs. essential.
  • Diagnose a model regression with a holdout set.

Security & reliability

  • Run STRIDE on a feature design and produce mitigations.
  • Set an SLO and an error budget you actually enforce.
  • Lead a blameless post-mortem with clear action items.
  • Rotate a secret without downtime.

Practice & communication

  • Write an ADR that holds up six months later.
  • Give a 10-minute talk on something you built last quarter.
  • Review a PR with substantive feedback, not nits.
  • Mentor a junior to ship their first production change.

Research

  • Frame a falsifiable research question.
  • Run an experiment with baselines, ablations, and seeds.
  • Produce a one-command reproduction.
  • Publish or present at least one piece of work each year.

Appendix · Curated reading list

Foundations
  • Introduction to Algorithms — Cormen et al.
  • Structure and Interpretation of Computer Programs — Abelson & Sussman
  • Computer Systems: A Programmer's Perspective — Bryant & O'Hallaron
Systems
  • Designing Data-Intensive Applications — Kleppmann
  • Database Internals — Petrov
  • Site Reliability Engineering — Google
AI / ML
  • Deep Learning — Goodfellow et al.
  • Pattern Recognition and Machine Learning — Bishop
  • Papers: Attention Is All You Need, Chinchilla, RAG, Constitutional AI
Practice
  • The Pragmatic Programmer — Hunt & Thomas
  • Accelerate — Forsgren, Humble, Kim
  • A Philosophy of Software Design — Ousterhout
Security
  • Cryptography Engineering — Ferguson, Schneier, Kohno
  • The Tangled Web — Zalewski
  • OWASP Top 10 + LLM Top 10
Research craft
  • The Craft of Research — Booth et al.
  • Writing for Computer Science — Zobel
  • "How to write a great research paper" — Simon Peyton Jones (talk)

16 · References & sources

The diagrams here are my synthesis; the list below is where I point people for citable anchors—textbooks, papers, and standards that match §1–§15. It is not exhaustive; not every vendor or tool named in the diagrams appears below.

Note: Use published editions for bibliographies; arXiv is fine for preprints. RFCs and NIST publications are versioned—always confirm you cite the revision your design actually follows.

Algorithms, complexity & discrete foundations

  1. Cormen, Leiserson, Rivest & Stein, Introduction to Algorithms (CLRS). MIT Press—standard reference for Big-O, data structures, and classical algorithms tied to §3. ISBN 978-0262046305.
  2. Sedgewick & Wayne, Algorithms (4th ed.). Addison-Wesley—practical algorithms with running-cost intuition. https://algs4.cs.princeton.edu/
  3. Knuth, The Art of Computer Programming, Vol. 1 (Fundamental Algorithms). Addison-Wesley—historical rigor for analysis and combinatorial foundations.
  4. Sipser, Introduction to the Theory of Computation. Cengage—automata, computability, complexity classes; connects to “NP-hard” discussions in §3.

Computer systems, OS & architecture

  1. Bryant & O’Hallaron, Computer Systems: A Programmer’s Perspective (CSAPP). Pearson—memory hierarchy, linking, concurrency primitives; backs §4. ISBN 978-0134092669.
  2. Hennessy & Patterson, Computer Architecture: A Quantitative Approach. Morgan Kaufmann—pipeline, caches, and performance modeling vocabulary in §4, §12.
  3. Arpaci-Dusseau & Arpaci-Dusseau, Operating Systems: Three Easy Pieces (OSTEP). Free online textbook—processes, virtualization, persistence. https://pages.cs.wisc.edu/~remzi/OSTEP/

Programming languages & abstraction

  1. Abelson & Sussman, Structure and Interpretation of Computer Programs (SICP). MIT Press—procedures, state, and metalinguistic abstraction; cited in the appendix and useful for §2 foundations.

Networking & web protocols

  1. Tanenbaum & Wetherall, Computer Networks. Pearson—layered stack context for §5.
  2. Stevens, TCP/IP Illustrated, Vol. 1 (and Vol. 2 for implementation detail). Addison-Wesley.
  3. Postel, RFC 793 — Transmission Control Protocol. IETF (since obsoleted by RFC 9293). RFC 793
  4. Iyengar & Thomson (eds.), RFC 9000 — QUIC: A UDP-Based Multiplexed and Secure Transport. RFC 9000
  5. Fielding, Nottingham & Reschke (eds.), RFC 9110 — HTTP Semantics. IETF. RFC 9110

Distributed systems, consensus & consistency

  1. Lamport, “Time, Clocks, and the Ordering of Events in a Distributed System.” CACM 1978. Logical clocks—§6. Author PDF
  2. Fischer, Lynch & Paterson, “Impossibility of Distributed Consensus with One Faulty Process.” J. ACM 1985 (FLP). Explains why asynchronous consensus is impossible with one crash fault without timeouts.
  3. Lamport, “The Part-Time Parliament.” ACM TOCS 1998—the original Paxos paper; see also Lamport’s Paxos Made Simple. Paxos Made Simple (PDF)
  4. Ongaro & Ousterhout, “In Search of an Understandable Consensus Algorithm (Raft).” USENIX ATC 2014. https://raft.github.io/raft.pdf
  5. Gilbert & Lynch, “Brewer’s Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services.” ACM SIGACT News 2002—CAP formalization often referenced with §6–§7.

Databases, storage & data-intensive systems

  1. Kleppmann, Designing Data-Intensive Applications (DDIA). O’Reilly—storage engines, replication, stream processing; core for §7. ISBN 978-1449373320.
  2. Gray, “The Transaction Concept: Virtues and Limitations.” VLDB 1981—transaction semantics vocabulary. Author archive PDF
  3. Petrov, Database Internals. O’Reilly—B-trees, LSM, distributed storage mechanics complementing DDIA.

Machine learning, deep learning & retrieval-augmented generation

  1. Goodfellow, Bengio & Courville, Deep Learning. MIT Press—foundations for §8. https://www.deeplearningbook.org/
  2. Bishop, Pattern Recognition and Machine Learning. Springer—classical probabilistic ML.
  3. Vaswani et al., “Attention Is All You Need.” NeurIPS 2017—transformers underpin LLM sections. https://arxiv.org/abs/1706.03762
  4. Lewis et al., “Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.” NeurIPS 2020—RAG. https://arxiv.org/abs/2005.11401

Security, cryptography & web application safety

  1. Ferguson, Schneier & Kohno, Cryptography Engineering. Wiley—engineering practice for crypto systems; §9.
  2. OWASP Top 10 Web Application Security Risks. Community standard. OWASP Top 10
  3. OWASP Top 10 for Large Language Model Applications. OWASP LLM Top 10

Software engineering, reliability & platform practice

  1. Hunt & Thomas, The Pragmatic Programmer (20th Anniversary). Addison-Wesley—habits for §10.
  2. Forsgren, Humble & Kim, Accelerate. IT Revolution—DORA metrics and delivery science referenced in modern practice discussions. ISBN 978-1942788331.
  3. Ousterhout, A Philosophy of Software Design. Yaknyam Press—module design and complexity control.
  4. Google, Site Reliability Engineering (free). O’Reilly / Google—SLOs, error budgets, incident response; §11–§12. https://sre.google/sre-book/table-of-contents/
  5. Google, The Site Reliability Workbook. Companion for practical patterns. https://sre.google/workbook/table-of-contents/

Observability & performance analysis

  1. OpenTelemetry Project. Vendor-neutral telemetry model. https://opentelemetry.io/
  2. W3C Trace Context. Distributed trace propagation. https://www.w3.org/TR/trace-context/
  3. Gregg, Systems Performance (2nd ed.). Pearson—USE methodology, profiling, and systems observability depth for §12.

Research communication & CS writing

  1. Booth, Colomb & Williams, The Craft of Research (4th ed.). University of Chicago Press—question framing and argument structure; §13.
  2. Zobel, Writing for Computer Science. Springer—conventions for CS papers and reports.
  3. Peyton Jones, “How to Write a Great Research Paper.” Microsoft Research talk (series). Microsoft Research video listing
Disclaimer. Reading list only—links aren’t endorsements. Canonical page for this file: LinhTruong.com (Linh Truong).