Personal notes · May 2026

Vibe Coding Strategy

I wrote this note to capture how I work with AI coding agents in production — the workflow, tooling choices, guardrails, and team habits that separate hobby vibes from shippable software. It is my working playbook, not a vendor white paper.

The question behind this note: when the agent does the typing, what do I still own? Answer: intent, taste, context, and verification — everything that turns generated code into software someone can trust.
Scope: intent → production
Stance: pragmatic, tool-agnostic
Horizon: 2025–2026 tooling
By: Linh Truong

1 · What I'm tracking

Vibe coding, as I use the term, means expressing intent in natural language and letting an AI agent generate, run, test, and iterate while I steer, review, accept, or reject. Andrej Karpathy coined the phrase in February 2025; within a year it moved from meme to default mode in major IDEs, cloud agents, and CLIs.

I treat vibe coding not as a replacement for engineering, but as a new interface to it — one that demands stronger taste, sharper specs, tighter feedback loops, and more disciplined verification, not less. This note is where I collect the patterns that actually hold up on real teams.

~55% of code authored or modified by AI in surveyed teams (2026)
3–5× faster prototype-to-demo cycle
2× defect rate when accepting code without review
#1 skill differentiator: context engineering

2 · Origin & Definition

Etymology

Where the term came from

In February 2025, Andrej Karpathy described a new style of working where he would "give in to the vibes, embrace exponentials, and forget that the code even exists." The phrase captured a real shift: developers were no longer writing most of the code — they were directing it.

Working Definition

What vibe coding actually is

Vibe coding = an AI-first development loop where a human supplies intent, taste, and verification, and an AI agent supplies code generation, editing, execution, and iteration. The artifact is still software; the locus of authorship shifts from keystrokes to specifications.

How I draw the line. Vibe coding is not the same as "using Copilot." Autocomplete is a typing accelerator. Vibe coding is an agentic loop: the model plans, edits multiple files, runs commands, observes results, and revises — with me acting as product owner, reviewer, and safety check.

3 · The Vibe Coding Spectrum

Not all "AI coding" is vibe coding. It sits on a spectrum of human-vs-agent authorship. Knowing where you are on the spectrum determines what guardrails you need.

Human authorship ← → Agent autonomy

  1. Manual: no AI; you type every line
  2. Autocomplete: inline tab-complete (Copilot, Cursor Tab)
  3. Chat-assisted: side-panel Q&A, scoped edits
  4. Vibe coding: agent edits files, runs tests, iterates
  5. Autonomous agent: long-running cloud agent ships PRs

Required guardrails grow with autonomy → review · tests · sandboxing · permissions · observability · rollback
Figure 1 — The AI-coding autonomy spectrum. Vibe coding lives at stages 4–5.

4 · Core Principles

P1

Intent over syntax

Spend time on what and why. The model handles how. A clear paragraph beats a clever prompt.

P2

Small reversible steps

Prefer many small diffs over one giant generation. Each step should compile, run, and be cheap to throw away.

P3

Tight feedback loops

The agent must see the same signals you do: type errors, test output, logs, screenshots. No feedback = no learning.

P4

Verification is non-negotiable

If a human did not understand, run, or test it, treat it as unverified. Trust comes from evidence, not vibes.

P5

Context is the product

Quality output requires curated input: the right files, types, examples, conventions, and constraints.

P6

Taste is the moat

Anyone can prompt. Few can judge. Architectural taste, API design, and UX sensibility are now your edge.

5 · The Vibe Coding Workflow

A reliable vibe coding loop has six stages. Skipping any of them is the most common failure mode.

  1. Frame Intent: goal, constraints, acceptance criteria ("What does done look like?")
  2. Load Context: files, types, schema, conventions, examples, prior decisions
  3. Plan: agent proposes steps; human approves or edits
  4. Generate & Run: agent edits files, executes tests and build
  5. Verify: read diff, run app, test edge cases, security check
  6. Land & Learn: commit, document, capture learnings → context

Human role: director, reviewer, safety net
Figure 2 — The six-stage vibe coding loop. Most failures come from skipping Verify or starving Context.

What "good" looks like at each stage

Stage | Sign of quality
1 · Intent | Acceptance criteria written in plain English before any prompt
2 · Context | Only the files that matter are loaded; conventions cited explicitly
3 · Plan | 3–7 numbered steps; you can predict each step's outcome
4 · Generate | Each change runs cleanly; tests pass; no silent rewrites
5 · Verify | You read every diff; you exercise the feature manually
6 · Land | Commit message captures the why; learnings flow back

Common failure modes

  • Vibe-and-pray: skip Intent, skip Verify, ship.
  • Context starvation: agent invents APIs that don't exist.
  • Mega-prompt: a 400-line wall produces a 2000-line PR no one can review.
  • Yes-machine: accepting every suggestion removes the accept/reject feedback the agent needs to correct course.
  • Lost causality: no commit per logical change → impossible to bisect.
  • Tool blindness: agent has no test runner / no logs → it cannot self-correct.

6 · Tooling Landscape (2026)

Vibe coding tools cluster into five layers. A modern stack picks one or two from each layer and wires them together with MCP (Model Context Protocol) servers.

  • 5 · Foundation Models: Claude Opus 4.7 · Sonnet 4.6 · Haiku 4.5 · GPT-5 · Gemini 2.5 Pro · open-weights (Llama, DeepSeek, Qwen)
  • 4 · Agent Surfaces: IDE agents (Cursor, Windsurf, VS Code Copilot) · CLI agents (Claude Code, Codex CLI, Aider) · Cloud agents
  • 3 · Tool Integrations (MCP): Filesystem · Git · Browser · DBs · Cloud · Issue trackers · Observability · Design tools
  • 2 · Guardrails & Quality: Type checkers · linters · tests · sandboxes · secret scanners · SAST · review bots · eval harnesses
  • 1 · Context & Memory: CLAUDE.md / AGENTS.md · repo maps · ADRs · skills/subagents · long-term memory · retrieval
Figure 3 — The vibe coding stack. Each layer is swappable; the interfaces (MCP, file conventions) are the lock-in.

Category | Representative tools (2026) | Best for
Agentic IDEs | Cursor, Windsurf, VS Code + Copilot, JetBrains AI | Day-to-day feature work with human in the chair
Terminal agents | Claude Code, Codex CLI, Aider, Cline | Repo-wide refactors, scripting, server work
Cloud / async agents | Devin, Claude Agent SDK, OpenAI Codex Cloud, Replit Agent | Long-running tasks, parallel PRs, sweep work
App-builder agents | Bolt, Lovable, v0, Replit, Claude Artifacts | Zero-to-prototype, internal tools, MVPs
Review & QA | CodeRabbit, Greptile, Diamond, ultrareview-style multi-agent | Second-opinion review on every PR
Eval & observability | Braintrust, Langfuse, Helicone, internal eval harnesses | Measuring agent quality over time
Context plumbing | MCP servers, repo indexers, ADR generators | Feeding agents the right facts

Rule I use for tools. Pick the least autonomous tool that solves the problem. Use autocomplete for known patterns, chat for scoped edits, IDE agents for features, cloud agents for sweeps. Skipping rungs creates accidents.

7 · The VIBE Strategy Framework

A four-pillar mental model I use on every project. When one pillar is weak, that is where the work stalls.

V Vision
Crisp problem statement, acceptance criteria, user stories, non-goals.

Without vision, the agent optimises the wrong thing — beautifully.
I Iteration
Small diffs, fast loops, cheap rollback, frequent commits, runnable at every step.

Velocity comes from short cycles, not long prompts.
B Boundaries
Permissions, sandboxes, secrets handling, blast-radius limits, human approval gates.

Autonomy without boundaries is just a bigger blast radius.
E Evidence
Tests, type checks, manual exercise, evals, telemetry. The agent's claim ≠ truth.

"Verified by running" beats "looks right."
Figure 4 — The VIBE framework: Vision · Iteration · Boundaries · Evidence.

8 · Prompt & Spec Patterns

A prompt is a tiny specification. Treat it like product writing: structure, examples, constraints. The five patterns below cover ~80% of day-to-day work.

Pattern 1

Goal · Context · Constraints · Done

The universal default. Four short paragraphs, in this order.

# Goal
Add server-side pagination to /api/orders, default 50/page.

# Context
- Express app, Prisma ORM, see routes/orders.ts and prisma/schema.prisma.
- We already use cursor pagination in /api/invoices — match that style.

# Constraints
- No breaking changes to existing clients.
- Keep response shape additive (add nextCursor, keep items).

# Done when
- New unit test passes; existing tests still pass.
- Manual curl with and without cursor returns expected pages.
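
One way to keep the four sections in the same order every time is to render the prompt from structured fields. The following is a minimal sketch, not an established tool: `PromptSpec` and `render_prompt` are hypothetical names, and the rule that a prompt must carry at least one "done when" criterion is my assumption about how strictly to enforce the pattern.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PromptSpec:
    """Hypothetical container for the four sections of the pattern."""
    goal: str
    context: List[str] = field(default_factory=list)
    constraints: List[str] = field(default_factory=list)
    done_when: List[str] = field(default_factory=list)


def render_prompt(spec: PromptSpec) -> str:
    """Render Goal, Context, Constraints, Done when, always in that order."""
    if not spec.goal.strip():
        raise ValueError("a prompt needs a goal")
    if not spec.done_when:
        raise ValueError("a prompt needs at least one 'done when' criterion")

    def bullets(items: List[str]) -> str:
        return "\n".join(f"- {item}" for item in items)

    sections = [
        ("Goal", spec.goal.strip()),
        ("Context", bullets(spec.context)),
        ("Constraints", bullets(spec.constraints)),
        ("Done when", bullets(spec.done_when)),
    ]
    # Empty optional sections are dropped so the prompt stays short.
    return "\n\n".join(f"# {title}\n{body}" for title, body in sections if body)


orders_prompt = render_prompt(PromptSpec(
    goal="Add server-side pagination to /api/orders, default 50/page.",
    context=["Express app, Prisma ORM, see routes/orders.ts and prisma/schema.prisma."],
    constraints=["No breaking changes to existing clients."],
    done_when=["New unit test passes; existing tests still pass."],
))
print(orders_prompt)
```

Refusing to render without acceptance criteria is the point of the sketch: it makes skipping the Intent stage a visible error rather than a habit.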

Pattern 2

Plan-then-execute

For anything spanning more than two files, ask the agent to plan first.

Before writing any code, list the files you will change
and a 1-line description of the change for each.
Stop and wait for approval.

Catches "agent invents new architecture" early, while it is still cheap.

Pattern 3

Reference exemplars

Point to a file that already does it right.

Implement <feature> following the same structure as
src/features/auth/login.ts — same error handling,
same logging conventions, same test layout.

Exemplars beat style guides. Style guides beat vibes.

Pattern 4

Red-team your own prompt

For risky changes, end the prompt with:

List three ways this change could break production
and the test that would catch each one before
writing the implementation.

Surfaces edge cases without doubling the work.

Pattern 5

Spec, not request

For features lasting more than a day, write a short spec document and feed it as a file. Specs are reusable; prompts are not.

SPEC.md
- Problem
- Users & jobs-to-be-done
- API surface (with examples)
- Data model
- Edge cases
- Out of scope
- Test plan

Anti-pattern

The mega-prompt

A 400-line prompt asking for "the whole thing" produces a 2,000-line diff. Nobody reviews it. Bugs ship.

Fix: decompose into 3–7 vertical slices, each independently testable.

9 · Context Engineering

In 2026, "prompt engineering" has been absorbed into the broader practice of context engineering: curating the full set of inputs the agent sees — system rules, files, examples, tool outputs, prior decisions, and memory.

Static

Project conventions file

A CLAUDE.md / AGENTS.md / .cursorrules in the repo root capturing:

  • Stack & versions
  • Run, test, lint commands
  • Code style & naming
  • "Always / Never" rules
  • Pointers to exemplar files

Dynamic

Just-in-time retrieval

Tools that let the agent pull what it needs: repo grep, file read, type signatures, DB schema, API docs via MCP. Better than dumping the whole repo.
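
To make "pull what it needs" concrete, here is a minimal sketch of a repo-grep tool an agent could call instead of receiving the whole repository. It is an illustration under assumptions: `grep_repo`, its parameters, and the file suffixes are hypothetical, and a real setup would more likely expose something similar through an MCP server or the agent's built-in search.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple


def grep_repo(root: str, needle: str, suffixes: Tuple[str, ...] = (".ts", ".prisma"),
              context: int = 1, max_hits: int = 20) -> List[str]:
    """Return short snippets around every line that contains `needle`."""
    hits: List[str] = []
    base = Path(root)
    for path in sorted(base.rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        lines = path.read_text(encoding="utf-8", errors="replace").splitlines()
        for i, line in enumerate(lines):
            if needle not in line:
                continue
            lo, hi = max(0, i - context), min(len(lines), i + context + 1)
            location = f"{path.relative_to(base).as_posix()}:{i + 1}"
            hits.append(location + "\n" + "\n".join(lines[lo:hi]))
            if len(hits) >= max_hits:  # cap output so the tool cannot flood the context
                return hits
    return hits


# Demo on a throwaway directory standing in for a repo.
with tempfile.TemporaryDirectory() as repo:
    routes = Path(repo, "routes")
    routes.mkdir()
    (routes / "invoices.ts").write_text(
        "export function listInvoices() {\n"
        "  return { items, nextCursor };\n"
        "}\n", encoding="utf-8")
    (routes / "notes.md").write_text("nextCursor is mentioned here too\n", encoding="utf-8")
    demo_hits = grep_repo(repo, "nextCursor")

print(demo_hits[0])
```

The hit cap and the small context window are the design choice that matters: the tool returns a few lines with a location, and the agent can read the full file only if the snippet looks relevant.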

Persistent

Memory & learnings

Save durable facts (user role, project goals, "we tried X, it failed because Y") to a memory file. Reload across sessions to avoid re-explaining.
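
A memory file can be as simple as append-only JSON lines. The sketch below assumes a hypothetical `memory.jsonl` file and made-up `kind` labels; it is one possible shape, not the format any particular agent uses.

```python
import json
import tempfile
from datetime import date
from pathlib import Path
from typing import Optional, Set


def remember(memory_file: Path, kind: str, fact: str) -> None:
    """Append one durable fact as a JSON line; never rewrites earlier entries."""
    entry = {"date": date.today().isoformat(), "kind": kind, "fact": fact}
    with memory_file.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")


def recall(memory_file: Path, kinds: Optional[Set[str]] = None) -> str:
    """Format stored facts as bullets to paste at the top of a new session."""
    if not memory_file.exists():
        return ""
    bullets = []
    for raw in memory_file.read_text(encoding="utf-8").splitlines():
        if not raw.strip():
            continue
        entry = json.loads(raw)
        if kinds is None or entry["kind"] in kinds:
            bullets.append(f"- [{entry['kind']}] {entry['fact']}")
    return "\n".join(bullets)


with tempfile.TemporaryDirectory() as tmp:
    memory = Path(tmp, "memory.jsonl")  # hypothetical file name
    remember(memory, "goal", "Ship cursor pagination for /api/orders")
    remember(memory, "dead-end", "We tried X, it failed because Y")
    all_facts = recall(memory)
    dead_ends = recall(memory, kinds={"dead-end"})

print(all_facts)
```

Filtering by kind keeps the reload cheap: a session about a new feature may only need the goals, while a debugging session benefits most from the recorded dead ends.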

The context budget

Even with million-token windows, attention is finite. Curate ruthlessly:

  • Goal & constraints
  • Conventions & exemplars
  • Relevant files & types
  ----------------------------------------
  • Everything else (kept out)
Figure 5 — Context pyramid. Above the line: high-signal. Below: noise.
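
The pyramid can be read as a packing rule: fill the budget from the top tier down and leave out whatever does not fit. Here is a minimal sketch of that rule. The names are hypothetical, and the four-characters-per-token estimate is a rough assumption that a real tokenizer should replace.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ContextItem:
    name: str
    text: str
    tier: int  # 1 = goal & constraints, 2 = conventions & exemplars, 3 = relevant files & types


def estimate_tokens(text: str) -> int:
    # Rough assumption: about 4 characters per token.
    return max(1, len(text) // 4)


def pack_context(items: List[ContextItem], budget_tokens: int) -> Tuple[List[ContextItem], int]:
    """Greedily include items in tier order; skip anything that would overflow."""
    chosen: List[ContextItem] = []
    used = 0
    for item in sorted(items, key=lambda it: it.tier):  # stable sort keeps order within a tier
        cost = estimate_tokens(item.text)
        if used + cost > budget_tokens:
            continue  # leave the item out entirely rather than truncate it mid-file
        chosen.append(item)
        used += cost
    return chosen, used


items = [
    ContextItem("big_file", "x" * 4000, tier=3),
    ContextItem("goal", "x" * 40, tier=1),
    ContextItem("conventions", "x" * 400, tier=2),
    ContextItem("small_file", "x" * 200, tier=3),
]
chosen, used = pack_context(items, budget_tokens=200)
print([item.name for item in chosen], used)
```

Skipping an oversized file instead of truncating it is a deliberate choice in this sketch: half a file tends to invite invented completions, while a missing file can be fetched on demand.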

10 · Quality, Tests & Guardrails

Vibe coding shifts the bottleneck from writing code to trusting code. Your quality stack must be machine-readable so the agent can use it as a feedback signal.

The verification ladder

  1. Compiles / type-checks — the floor. Never skip.
  2. Linter clean — catches style drift the agent introduces.
  3. Unit tests pass — including new ones the agent wrote.
  4. Integration tests pass — hits real DB, real API client.
  5. Manual exercise — you run the app and try the feature.
  6. Second-opinion review — a different model / human reviews the diff.
  7. Production telemetry — error rates, latency, business metrics.
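
The ladder is ordered, so a runner can climb it rung by rung and stop at the first failure. The sketch below is a hypothetical illustration: `Rung`, `climb`, and `command_check` are my names, and the demo uses stand-in checks where a real setup would wrap its own type checker, linter, and test commands.

```python
import subprocess
from typing import Callable, List, NamedTuple, Optional, Tuple


class Rung(NamedTuple):
    name: str
    check: Callable[[], bool]


def command_check(argv: List[str]) -> Callable[[], bool]:
    """Wrap a command so that exit code 0 counts as a pass."""
    return lambda: subprocess.run(argv, capture_output=True).returncode == 0


def climb(ladder: List[Rung]) -> Tuple[List[str], Optional[str]]:
    """Run rungs in order; stop at the first failure and report it."""
    passed: List[str] = []
    for rung in ladder:
        try:
            ok = rung.check()
        except Exception:
            ok = False  # a check that crashes is treated as a failed rung
        if not ok:
            return passed, rung.name
        passed.append(rung.name)
    return passed, None


# Stand-in checks so the example is deterministic.
ladder = [
    Rung("type-check", lambda: True),
    Rung("lint", lambda: True),
    Rung("unit tests", lambda: False),
    Rung("integration tests", lambda: True),
]
passed, failed_at = climb(ladder)
print(passed, failed_at)
```

In practice each lambda would be replaced by `command_check([...])` with the project's own commands. The manual, review, and telemetry rungs stay human-driven; the value of the runner is that the agent gets the name of the first failing rung as a machine-readable signal.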

Guardrail checklist (set once, reuse forever)

  • Sandboxed execution for any agent-run command
  • Permission allowlist for shell commands (no rm -rf by default)
  • Secrets scanner pre-commit hook
  • SAST + dependency-audit on every PR
  • Branch protection: agents cannot push to main
  • Disposable cloud envs for agent demos
  • Audit log of agent actions retained for ≥ 30 days
Tests are now spec artefacts. A failing test is the clearest possible prompt: "make this pass." Invest more in tests because the agent will use them as its compass.
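
As an illustration of the permission allowlist item in the checklist above, here is a minimal sketch of a gate that an agent's shell tool could call before executing anything. The allowlist contents and the `is_allowed` name are hypothetical, and the blanket rejection of shell metacharacters is a conservative assumption rather than a complete sandbox.

```python
import shlex

# Hypothetical allowlist: program -> permitted subcommands (None means any arguments).
ALLOWED = {
    "git": {"status", "diff", "log", "add", "commit"},
    "npm": {"test", "run"},
    "ls": None,
}
# Characters that enable chaining, pipes, redirects, or substitution.
SHELL_META = set(";|&><`$\n")


def is_allowed(command: str) -> bool:
    """Deny by default; allow only known programs with known subcommands."""
    if any(ch in SHELL_META for ch in command):
        return False
    try:
        argv = shlex.split(command)
    except ValueError:  # e.g. an unterminated quote
        return False
    if not argv:
        return False
    program, args = argv[0], argv[1:]
    if program not in ALLOWED:
        return False
    subcommands = ALLOWED[program]
    if subcommands is None:
        return True
    return bool(args) and args[0] in subcommands


for cmd in ["git status", "rm -rf /", "git push origin main", "ls; rm -rf /"]:
    print(f"{cmd!r}: {'allow' if is_allowed(cmd) else 'deny'}")
```

Note that `git push` is denied here, which lines up with the branch-protection item: the gate limits what the agent can attempt, and the remote enforces the rule even if the gate is bypassed.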

11 · Risks & Mitigations

Risk | How it appears | Mitigation
Hallucinated APIs | Agent imports non-existent libraries or functions | Type-check + run before commit; reference real exemplars
Security regressions | SQL injection, XSS, missing auth checks | SAST in CI, security-review skill on diff, principle of least privilege
Secret leakage | Keys committed to repo or sent to model | Pre-commit secret scanner, env-var hygiene, on-host model for sensitive code
Supply-chain risk | Agent adds a malicious or typosquatted dependency | Pin versions, allowlist registries, audit on add
Architecture drift | Each feature invents its own pattern | Strong conventions file + exemplar references + ADRs
Skill atrophy | Devs lose fundamentals they no longer practice | Mandatory "no-AI" exercises; deliberate learning time
Over-trust | "It compiled" treated as "it works" | Verification ladder enforced in PR template
Cost runaway | Long agent runs burn tokens | Budgets, model tiering, caching, kill-switches
IP / license risk | Generated code resembles GPL training data | Use licence-aware models, attribution scanning
Prompt injection | Untrusted text in a fetched file hijacks the agent | Treat tool output as untrusted; sandbox; allowlist actions
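
To show what the pre-commit secret scanner mitigation checks for, here is a minimal sketch that scans only the added lines of a unified diff. The three patterns are illustrative assumptions and far from complete; a dedicated scanner should do this job in a real pipeline, and `scan_added_lines` is a hypothetical name.

```python
import re
from typing import List, Tuple

# Illustrative patterns only; real scanners ship far larger rule sets.
PATTERNS = {
    "aws-access-key-id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private-key-block": re.compile(r"-----BEGIN (?:[A-Z]+ )?PRIVATE KEY-----"),
    "hardcoded-credential": re.compile(
        r"\b(?:api[_-]?key|secret|token|password)\b\s*[:=]\s*['\"][^'\"]{16,}['\"]",
        re.IGNORECASE,
    ),
}


def scan_added_lines(diff_text: str) -> List[Tuple[int, str]]:
    """Return (diff line number, rule name) for each suspicious added line."""
    findings: List[Tuple[int, str]] = []
    for number, line in enumerate(diff_text.splitlines(), start=1):
        # Only lines the change adds matter; "+++" is the file header, not content.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((number, name))
    return findings


sample_diff = "\n".join([
    "+++ b/config.ts",
    '+const region = "eu-west-1";',
    '+const key = "AKIAIOSFODNN7EXAMPLE";',  # AWS's documented example key ID
    '-const apiKey = "0123456789abcdef0123";',
    "+const apiKey = process.env.API_KEY;",
])
findings = scan_added_lines(sample_diff)
print(findings)
```

Scanning only added lines keeps the hook fast and avoids flagging a secret that the same commit is removing, though a removed secret still needs rotating because it remains in history.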

12 · Anti-Patterns

Anti

Vibe-and-ship

Accepting AI output unread and merging it. Unreviewed code ships defects at roughly 2× the rate of reviewed code (a directional figure, not a universal one). Always read the diff.

Anti

Prompt golf

Endlessly tweaking a single prompt to "fix" output. Stop, decompose, add a test, try again.

Anti

Single-shot megaproject

"Build me Twitter." Big-bang prompts produce big-bang failures. Vertical slices instead.

Anti

The lone wolf agent

One developer running an unsupervised cloud agent for hours. Use PR-sized units of work.

Anti

Documentation by deletion

Agent rewrites code and quietly drops docs/comments that captured non-obvious "why."

Anti

Test theatre

Agent writes tests that pass by mirroring the implementation. Review tests before code.

13 · Skills Matrix for the Modern Developer

The relative importance of developer skills has shifted. Below is a snapshot of where to invest your learning hours now.

Skill importance for vibe coding (relative weight)

  • Context engineering: ★ Critical
  • Code review & taste: ★ Critical
  • Testing & verification: ★ Critical
  • System design: High
  • Product / UX sense: High
  • Debugging & root-cause: High
  • Security mindset: High
  • Tool/MCP literacy: Rising
  • Memorising syntax: Declining
  • Boilerplate typing: Declining
Figure 6 — Where developer skill returns are concentrated in 2026.

Invest more in

  • Writing crisp specs and acceptance criteria
  • Reading diffs fast and well
  • Designing testable interfaces
  • Architectural taste & long-term thinking
  • Security and threat modelling
  • Curating context (files, exemplars, memory)

Invest less in

  • Memorising library APIs
  • Hand-writing CRUD boilerplate
  • Configuring repetitive scaffolding
  • Polishing single-file syntax tricks
  • Manual refactors a tool can do safely

Caveat: invest in fundamentals at least once. You cannot review what you do not understand.

14 · Team Operating Model

Individual vibe coding scales to teams only when shared practices replace personal habits. The "team-OS" below is what separates a chaotic AI-tools-everywhere shop from a high-leverage engineering org.

Shared practices

  • One canonical conventions file per repo (AGENTS.md)
  • Spec template every non-trivial change starts from
  • PR template with explicit "AI assistance" + "verified by" fields
  • Mandatory second-opinion review (human or agent) before merge
  • Eval suite measuring agent quality on internal tasks, run weekly
  • Shared library of reusable prompts / skills / subagents
  • Incident post-mortems include "did AI contribute? how?"

Roles that emerge

  • Context owner: curates the AGENTS.md and exemplar set
  • Eval owner: writes and maintains the agent eval harness
  • Tool admin: manages MCP servers, permissions, budgets
  • Reviewer-in-chief: sets the bar for what AI PRs must pass
  • Security partner: threat-models agentic workflows

These are hats, not headcount — one person can wear several.

15 · Metrics & KPIs

Measure both velocity and trust. Velocity without quality metrics is how you end up shipping 3× the bugs at 3× the speed.

Dimension | Metric | Why it matters
Velocity | Lead time for change (idea → prod) | The headline benefit of vibe coding
Velocity | PR cycle time | Detects review bottlenecks
Quality | Change failure rate | The first thing that degrades under vibe-and-ship
Quality | Escaped defects per 1k LOC merged | Trends down only with discipline
Quality | Mean time to recover | Tests how reversible your changes are
Trust | % of AI-authored diffs reviewed line-by-line | Cultural signal; should stay near 100%
Trust | Agent eval pass rate over time | Catches regressions in your prompts/conventions
Cost | Token spend per merged PR | Right-sizes model tiering
Adoption | % engineers using agents weekly | Indicates tooling fit, not headcount
Wellbeing | Self-reported flow / frustration | Vibe coding can either delight or burn out
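
Three of the metrics above fall out of a simple per-PR record. The sketch below is an assumption-laden illustration: the `MergedPR` fields are hypothetical, and it approximates change failure rate per merged PR, whereas DORA defines it per deployment.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class MergedPR:
    tokens_spent: int
    caused_incident: bool
    ai_authored: bool
    reviewed_line_by_line: bool


def summarize(prs: List[MergedPR]) -> Dict[str, Optional[float]]:
    """Compute change failure rate, token spend per PR, and AI-diff review share."""
    if not prs:
        raise ValueError("need at least one merged PR")
    ai_prs = [pr for pr in prs if pr.ai_authored]
    return {
        # Approximation: failures per merged PR rather than per deployment.
        "change_failure_rate": sum(pr.caused_incident for pr in prs) / len(prs),
        "tokens_per_merged_pr": sum(pr.tokens_spent for pr in prs) / len(prs),
        # None when there were no AI-authored PRs, to avoid a misleading 0% or 100%.
        "ai_diffs_reviewed": (
            sum(pr.reviewed_line_by_line for pr in ai_prs) / len(ai_prs) if ai_prs else None
        ),
    }


week = [
    MergedPR(tokens_spent=1000, caused_incident=False, ai_authored=True, reviewed_line_by_line=True),
    MergedPR(tokens_spent=3000, caused_incident=True, ai_authored=True, reviewed_line_by_line=False),
    MergedPR(tokens_spent=0, caused_incident=False, ai_authored=False, reviewed_line_by_line=True),
    MergedPR(tokens_spent=4000, caused_incident=False, ai_authored=True, reviewed_line_by_line=True),
]
summary = summarize(week)
print(summary)
```

Reporting the review share next to the failure rate keeps the velocity and trust dimensions on the same page, which is the pairing the table argues for.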

16 · 30 / 60 / 90-Day Adoption Roadmap

Days 0–30

Foundation

  • Choose one IDE agent + one CLI agent
  • Write AGENTS.md for the top 2 repos
  • Adopt the Goal/Context/Constraints/Done prompt template
  • Enable secret scanner + SAST on every PR
  • Each dev ships 3 vibe-coded PRs with full diff review
  • Baseline DORA metrics

Days 31–60

Scale & standardise

  • Stand up shared MCP servers (DB, issue tracker, observability)
  • Introduce spec template + PR template fields
  • Build an internal eval harness for 10–20 representative tasks
  • Pilot one cloud / async agent on backlog sweep work
  • Run first "AI incident review" retro
  • Train all engineers on context engineering + verification ladder

Days 61–90

Leverage & learn

  • Promote the most-used prompts into reusable skills / subagents
  • Tier models by task class to control cost
  • Automate second-opinion review on every PR
  • Publish a quarterly "agent quality" report
  • Refactor one legacy module using agent-led migration
  • Re-measure DORA + trust metrics; compare to baseline
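
Tiering models by task class, combined with a per-run budget, can be sketched in a few lines. Everything here is hypothetical: the task classes, the tier names, and especially the prices, which are made-up integers in cents per 1,000 tokens chosen to keep the arithmetic exact.

```python
# Hypothetical mapping from task class to model tier.
TIER_BY_TASK = {
    "rename": "small",
    "boilerplate": "small",
    "feature": "mid",
    "refactor": "mid",
    "architecture": "frontier",
    "security-review": "frontier",
}
# Made-up prices in cents per 1,000 tokens; substitute your provider's real rates.
PRICE_CENTS_PER_1K = {"small": 1, "mid": 10, "frontier": 50}


class BudgetExceeded(RuntimeError):
    """Raised when a charge would push the run past its limit (the kill-switch)."""


def pick_tier(task_class: str) -> str:
    # Unknown task classes default to the middle tier in this sketch.
    return TIER_BY_TASK.get(task_class, "mid")


class RunBudget:
    def __init__(self, limit_cents: int) -> None:
        self.limit_cents = limit_cents
        self.spent_cents = 0

    def charge(self, tier: str, tokens: int) -> int:
        """Record the cost of a call, or raise before the limit is crossed."""
        cost = -(-PRICE_CENTS_PER_1K[tier] * tokens // 1000)  # ceiling division
        if self.spent_cents + cost > self.limit_cents:
            raise BudgetExceeded(f"run would reach {self.spent_cents + cost} cents")
        self.spent_cents += cost
        return cost


budget = RunBudget(limit_cents=1000)
budget.charge(pick_tier("architecture"), tokens=10_000)
budget.charge(pick_tier("security-review"), tokens=10_000)
print(budget.spent_cents)
```

Checking the budget before recording the charge means the run stops at the limit rather than just after it, which is the behaviour a kill-switch needs.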

17 · Future Outlook (12–24 months)

Trend

Specs become the source of truth

Code becomes a compiled artefact of spec + tests + context. Diff review shifts upstream to spec review.

Trend

Multi-agent teams

Planner, implementer, reviewer, and tester agents collaborate on a single PR with the human as editor-in-chief.

Trend

Eval-driven development

Teams maintain task-specific eval suites the way they maintain test suites. Prompt/skill changes ship with eval deltas.
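
An eval suite in this sense can start very small: a list of tasks, a pass/fail check per task, and a diff against the previous run. The sketch below uses hypothetical names throughout, and `fake_agent` is a stand-in for whatever function calls the real agent.

```python
from typing import Callable, Dict, List, NamedTuple


class EvalTask(NamedTuple):
    name: str
    prompt: str
    passes: Callable[[str], bool]  # judges the agent's output for this task


def run_evals(agent: Callable[[str], str], tasks: List[EvalTask]) -> Dict[str, bool]:
    """Run every task; an exception in the agent or the check counts as a fail."""
    results: Dict[str, bool] = {}
    for task in tasks:
        try:
            results[task.name] = bool(task.passes(agent(task.prompt)))
        except Exception:
            results[task.name] = False
    return results


def pass_rate(results: Dict[str, bool]) -> float:
    return sum(results.values()) / len(results) if results else 0.0


def regressions(before: Dict[str, bool], after: Dict[str, bool]) -> List[str]:
    """Tasks that passed in the previous run and fail (or are missing) now."""
    return sorted(name for name, ok in before.items() if ok and not after.get(name, False))


def fake_agent(prompt: str) -> str:
    return prompt.upper()  # stand-in for a real agent call


tasks = [
    EvalTask("shouts", "hello", lambda out: out == "HELLO"),
    EvalTask("keeps-case", "hello", lambda out: out == "hello"),
    EvalTask("crashing-check", "hello", lambda out: out[99] == "x"),
]
results = run_evals(fake_agent, tasks)
print(results, round(pass_rate(results), 2))
```

The `regressions` helper is what turns a pass rate into an eval delta: a prompt or skill change can keep the same overall rate while breaking a task that used to pass, and only the per-task comparison shows that.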

Trend

Local + edge models

On-device small models handle high-volume edits; cloud frontier models handle judgment-heavy tasks. Cost and privacy improve.

Trend

Regulatory pressure

SBOMs, attribution, model-card requirements, and AI-disclosure rules become standard. AGENTS.md gets a "compliance" section.

Risk

Two-tier engineering market

Developers who only prompt without understanding fall behind. Those who pair fundamentals with agentic leverage thrive disproportionately.

18 · What I keep coming back to

Vibe coding is neither hype nor heresy — it is the interface I use most days now. The developers who win are not the ones who give up on rigor; they are the ones who relocate it: from typing to specifying, from writing to reviewing, from individual cleverness to systemic feedback loops.

The discipline is simple to state and hard to do well: hold a clear vision, iterate in small steps, set firm boundaries, and demand evidence. Do that consistently and an AI agent is the highest-leverage colleague I have worked with. Skip any one of those and it becomes the most expensive intern I have ever hired.

What I keep coming back to: vibe coding rewards taste, specifications, and verification — not faster typing. Invest there, and the rest of the stack is just plumbing.

19 · References & Sources

Annotated bibliography behind the vibe-coding definition, autonomy spectrum, six-stage workflow, tooling landscape, VIBE framework, prompt patterns, context-engineering pyramid, verification ladder, risk table, anti-patterns, skills matrix, team operating model, DORA-style KPIs, adoption roadmap, and future-outlook cards. Section tags (e.g. §05) show where each source is used. Diagrams and the VIBE acronym are my synthesis unless noted.

Scope. Synthesis of practitioner writing, vendor documentation, and software-engineering research (May 2026). Hero KPI ranges (~55% AI-assisted code, 3–5× prototype speed, 2× defect rate without review, context engineering as #1 skill) blend GitHub, Stack Overflow, and field surveys — directional, not universal. Tool names in §06 reflect the 2025–26 landscape and will shift. Not tool endorsement, employment, or legal advice.

Citations are numbered continuously [1]–[n] within this section.

Origin, definition & the vibe-coding meme (§01–§02)

  1. Karpathy, A., post introducing "vibe coding." X (Twitter), February 2025. Coined the phrase ("give in to the vibes, embrace exponentials, forget that the code even exists"): §01–§02 etymology and working definition. x.com/karpathy (search Feb 2025 vibe coding). Used in §01, §02.
  2. Merriam-Webster, "vibe coding" Word of the Year coverage & dictionary entry. 2025. Mainstream adoption of the term within a year of coinage: §01 timeline sentence. merriam-webster.com. Used in §01.
  3. Wiener, A., "Vibe Coding and the Future of Software." The New Yorker, April 2025. Cultural and professional framing of agent-directed development: background for §02 distinction (autocomplete vs agentic loop). newyorker.com. Used in §02.

Adoption, productivity & hero statistics (§01 KPIs, §15–§16)

  4. GitHub, Octoverse 2024 & Copilot usage reports. 2024–25. AI-assisted coding adoption in surveyed teams: anchor for §01 ~55% hero stat (verify latest Octoverse/Copilot metrics). github.blog/octoverse. Used in §01, §15.
  5. Stack Overflow, 2025 Developer Survey, AI section. 2025. Developer tool usage, trust, and productivity self-reports: §01 adoption context and §15 wellbeing KPI. survey.stackoverflow.co. Used in §01, §15.
  6. McKinsey & Company, Unlocking Value from AI in Software Development (Digital practice insights). 2024–25. Prototype-cycle compression and review bottlenecks: §01 3–5× prototype KPI and §10 trust bottleneck. mckinsey.com/digital. Used in §01, §10.
  7. Peng, S. et al., "The Impact of AI on Developer Productivity: Evidence from GitHub Copilot." arXiv:2302.06590, 2023; follow-on controlled studies. Faster task completion with AI assistance; quality caveats when review skipped: §01 2× defect KPI and §12 vibe-and-ship anti-pattern. arxiv.org/abs/2302.06590. Used in §01, §12.

Autonomy spectrum, agentic loops & the six-stage workflow (§03–§05, FIG 1–2)

  8. Yao, S. et al., "ReAct: Synergizing Reasoning and Acting in Language Models." ICLR 2023. Plan → act → observe loop underpinning §03 stages 4–5 and §05 Generate/Verify phases. arxiv.org/abs/2210.03629. Used in §03, §05.
  9. Anthropic, Building Effective Agents guide. 2024–25. Agent design patterns (gather context, take action, verify): maps to §05 six-stage loop and §04 P3/P4 principles. docs.anthropic.com/agents. Used in §04, §05.
  10. Jimenez, C. E. et al., "SWE-bench: Can Language Models Resolve Real-World GitHub Issues?" ICLR 2024. Benchmark for autonomous coding agents: §03 stage 5 "Autonomous agent" and §17 eval-driven trend. arxiv.org/abs/2310.06770. Used in §03, §17.
  11. Shinn, N. et al., "Reflexion: Language Agents with Verbal Reinforcement Learning." NeurIPS 2023. Iterative self-correction: background for §05 Verify stage and §12 prompt-golf anti-pattern. arxiv.org/abs/2303.11366. Used in §05, §12.

Tooling landscape — IDEs, CLIs, cloud agents & review (§06, FIG 3)

  12. Cursor documentation: Agent mode, rules, and codebase context. 2025–26. Agentic IDE in §06 table and §09 .cursorrules conventions file. docs.cursor.com. Used in §06, §09.
  13. Anthropic, Claude Code documentation. 2025–26. Terminal/repo-wide agent: §06 terminal-agents row and §16 Days 0–30 tooling choice. docs.anthropic.com/claude-code. Used in §06, §16.
  14. OpenAI, Codex CLI / Codex Cloud documentation. 2025–26. Cloud/async agent patterns: §06 cloud-agents row and §16 Days 31–60 pilot. developers.openai.com/codex. Used in §06, §16.
  15. Cognition, Devin technical reports & product documentation. 2024–25. Long-running autonomous software engineer agent: §06 cloud-agents category and §12 lone-wolf anti-pattern. cognition.ai. Used in §06, §12.
  16. Aider, AI pair programming in your terminal: docs & architecture. 2024–25. Git-aware CLI editing: §06 terminal-agents row. aider.chat. Used in §06.
  17. CodeRabbit, Greptile & AI PR-review tooling documentation. 2024–25. Second-opinion review bots: §06 review row and §10 verification-ladder step 6. docs.coderabbit.ai. Used in §06, §10.

VIBE framework, specs & prompt patterns (§07–§08)

  18. Yan, E., "Patterns for Building LLM-based Systems & Products." eugeneyan.com, 2023–25. Goal/context/constraints patterns and eval discipline: §08 prompt patterns and §04 P1 intent-over-syntax. eugeneyan.com/writing/llm-patterns. Used in §04, §08.
  19. GitHub, Spec Kit & spec-driven development guides. 2025. Structured specs before codegen: §08 Goal·Context·Constraints·Done template and §14 spec template. github.com/spec-kit. Used in §08, §14.
  20. Beck, K., Test-Driven Development: By Example. Addison-Wesley, 2002. Red-green-refactor as feedback loop: intellectual basis for §10 "tests are spec artefacts" and §04 P3 tight loops. Used in §04, §10.
  21. Google, Software Engineering at Google (Winters, Manshreck, Wright). O'Reilly, 2020. Code review culture and readability: §07 Evidence pillar and §13 code-review skill bar. Used in §07, §13.

Context engineering, MCP & project conventions (§09, FIG 5)

  22. Anthropic, "Effective context engineering for AI agents" (engineering blog). 2025. Curating inputs beyond single prompts: §09 context-engineering definition and FIG 5 pyramid. anthropic.com/engineering. Used in §09.
  23. Model Context Protocol (MCP) specification. Anthropic, 2024–25. Standard for tools, data, and just-in-time retrieval: §09 dynamic retrieval card and §06 context-plumbing row. modelcontextprotocol.io. Used in §06, §09, §16.
  24. Anthropic, CLAUDE.md / project-instructions conventions. 2025. Repo-root rules files: §09 static conventions card (CLAUDE.md, AGENTS.md). docs.anthropic.com. Used in §09, §14.
  25. Liu, N. F. et al., "Lost in the Middle: How Language Models Use Long Contexts." TACL 2024. Attention limits in long contexts: §09 "context budget" curation rationale. arxiv.org/abs/2307.03172. Used in §09.
  26. Nygard, M., Documenting Architecture Decisions (ADR format). 2011. Capturing durable decisions: §09 persistent memory and §11 architecture-drift mitigation. cognitect.com/blog. Used in §09, §11.

Quality, verification, security & LLM risks (§10–§11, §12)

  27. OWASP, Top 10 for Large Language Model Applications (2025 edition). Prompt injection, insecure output handling, supply-chain risks: §11 risk table and §10 guardrail checklist. owasp.org/llm-top10. Used in §10, §11.
  28. Greshake, K. et al., "Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection." 2023. Untrusted content hijacking agents: §11 prompt-injection row. arxiv.org/abs/2302.12173. Used in §11.
  29. NIST, Artificial Intelligence Risk Management Framework (AI RMF 1.0). 2023. Govern-map-measure-manage cycle for AI systems: §10 guardrails and §14 security-partner role. nist.gov/ai-rmf. Used in §10, §14.
  30. Google, SRE Book: monitoring, alerting, blameless postmortems. 2016–18. Production verification ladder top rung: §10 step 7 telemetry and §14 incident post-mortems. sre.google/sre-book. Used in §10, §14.
  31. Truong, L., companion note: AI Cost Control. May 2026. Token budgets, model tiering, kill-switches for agent runs: §11 cost-runaway row and §16 Days 61–90 cost tiering. Same author collection. Used in §11, §16.

DORA metrics, eval harnesses & team operating model (§14–§17)

  32. Forsgren, N., Humble, J., & Kim, G., Accelerate: The Science of Lean Software and DevOps. IT Revolution, 2018. DORA metrics (lead time, deployment frequency, CFR, MTTR): §15 metrics table and §16 baseline DORA step. Used in §15, §16.
  33. DORA / Google Cloud, State of DevOps Report (annual). 2024–25. Benchmarking change failure rate and recovery: §15 quality rows. dora.dev. Used in §15.
  34. Braintrust, Evaluations for LLM applications documentation. 2024–25. Task-specific eval suites: §14 eval owner, §15 agent eval pass rate, §17 eval-driven trend. braintrust.dev. Used in §14, §15, §17.
  35. METR, Model Evaluation & Threat Research: agent task benchmarks. 2024–25. Measuring autonomous coding capability over time: §17 eval-driven development card. metr.org. Used in §17.
  36. Skelton, M. & Pais, M., Team Topologies. IT Revolution, 2019. Platform/enabling teams: §14 emerging roles (context owner, tool admin) as hats not headcount. Used in §14.

Future outlook, compliance & skills shift (§13, §17)

  37. U.S. NTIA / CISA, software bill of materials (SBOM) guidance & minimum elements. 2021–25. Supply-chain transparency: §17 regulatory-pressure card. ntia.gov/SBOM. Used in §17.
  38. European Parliament & Council, Regulation (EU) 2024/1689 (AI Act). 2024. Disclosure and governance expectations for high-risk AI: §17 compliance trend (verify applicability to dev tooling in your jurisdiction). eur-lex.europa.eu. Used in §17.
  39. Stanford HAI, 2025 AI Index Report, Technical Performance chapter. 2025. Capability/cost curves for coding models: §17 local+edge models trend and §13 declining syntax memorisation. hai.stanford.edu/ai-index. Used in §13, §17.

Author synthesis

  40. Truong, L., Vibe Coding Strategy, personal working notes. May 2026. Original diagrams (FIG 1–6), VIBE framework, six-stage loop, skills matrix, anti-pattern cards, 30/60/90 roadmap, and team-OS practices. LinhTruong.com. Used in all sections.

Before you quote externally: The ~55% AI-code statistic compresses multiple surveys with different definitions (Copilot suggestions accepted vs lines merged vs time assisted). Tool names and capabilities in §06 change quarterly. Karpathy's original post is informal coinage, not a technical standard. Re-verify adoption numbers and vendor claims against primary sources before citing in policy or procurement documents.