Personal notes · May 2026

Vibe Coding Strategy

I wrote this note to capture how I work with AI coding agents in production — the workflow, tooling choices, guardrails, and team habits that separate hobby vibes from shippable software. It is my working playbook, not a vendor white paper.

Author: Linh Truong, MA (Harvard), MBA · Source: LinhTruong.com · Email: Linh@Alumni.Harvard.edu

The question behind this note: when the agent does the typing, what do I still own? Answer: intent, taste, context, and verification — everything that turns generated code into software someone can trust.

Scope: intent → production · Stance: pragmatic, tool-agnostic · Horizon: 2025–2026 tooling · ✍️ By: Linh Truong

1 · What I'm tracking

Vibe coding is how I describe a way of working in which I express intent in natural language and let an AI agent generate, run, test, and iterate while I steer, review, accept, or reject. Andrej Karpathy coined the phrase in February 2025; within a year it moved from meme to default mode in major IDEs, cloud agents, and CLIs.

I treat vibe coding not as a replacement for engineering, but as a new interface to it — one that demands stronger taste, sharper specs, tighter feedback loops, and more disciplined verification, not less. This note is where I collect the patterns that actually hold up on real teams.

~55% of code authored or modified by AI in surveyed teams (2026)

3–5× faster prototype-to-demo cycle

2× defect rate when accepting code without review

#1 skill differentiator: context engineering

2 · Origin & Definition

Etymology

Where the term came from

In February 2025, Andrej Karpathy described a new style of working where he would "give in to the vibes, embrace exponentials, and forget that the code even exists." The phrase captured a real shift: developers were no longer writing most of the code — they were directing it.

Working Definition

What vibe coding actually is

Vibe coding = an AI-first development loop where a human supplies intent, taste, and verification, and an AI agent supplies code generation, editing, execution, and iteration. The artifact is still software; the locus of authorship shifts from keystrokes to specifications.

How I draw the line. Vibe coding is not the same as "using Copilot." Autocomplete is a typing accelerator. Vibe coding is an agentic loop: the model plans, edits multiple files, runs commands, observes results, and revises — with me acting as product owner, reviewer, and safety check.

3 · The Vibe Coding Spectrum

Not all "AI coding" is vibe coding. It sits on a spectrum of human-vs-agent authorship. Knowing where you are on the spectrum determines what guardrails you need.

Figure 1 — The AI-coding autonomy spectrum. Vibe coding lives at stages 4–5.

4 · Core Principles

Intent over syntax

Spend time on what and why. The model handles how. A clear paragraph beats a clever prompt.

Small reversible steps

Prefer many small diffs over one giant generation. Each step should compile, run, and be cheap to throw away.

Tight feedback loops

The agent must see the same signals you do: type errors, test output, logs, screenshots. No feedback = no learning.

Verification is non-negotiable

If a human did not understand, run, or test it, treat it as unverified. Trust comes from evidence, not vibes.

Context is the product

Quality output requires curated input: the right files, types, examples, conventions, and constraints.

Taste is the moat

Anyone can prompt. Few can judge. Architectural taste, API design, and UX sensibility are now your edge.

5 · The Vibe Coding Workflow

A reliable vibe coding loop has six stages. Skipping any of them is the most common failure mode.

Figure 2 — The six-stage vibe coding loop. Most failures come from skipping Verify or starving Context.

What "good" looks like at each stage

Stage	Sign of quality
1 · Intent	Acceptance criteria written in plain English before any prompt
2 · Context	Only the files that matter are loaded; conventions cited explicitly
3 · Plan	3–7 numbered steps; you can predict each step's outcome
4 · Generate	Each change runs cleanly; tests pass; no silent rewrites
5 · Verify	You read every diff; you exercise the feature manually
6 · Land	Commit message captures the why; learnings flow back

Common failure modes

Vibe-and-pray: skip Intent, skip Verify, ship.
Context starvation: agent invents APIs that don't exist.
Mega-prompt: a 400-line wall produces a 2,000-line PR no one can review.
Yes-machine: accepting every suggestion means your approval stops carrying any signal for the agent.
Lost causality: no commit per logical change → impossible to bisect.
Tool blindness: agent has no test runner / no logs → it cannot self-correct.

6 · Tooling Landscape (2026)

Vibe coding tools cluster into layers; the table below lists seven categories. A modern stack picks one or two tools from each and wires them together with MCP (Model Context Protocol) servers.

Figure 3 — The vibe coding stack. Each layer is swappable; the interfaces (MCP, file conventions) are the lock-in.

Category	Representative tools (2026)	Best for
Agentic IDEs	Cursor, Windsurf, VS Code + Copilot, JetBrains AI	Day-to-day feature work with human in the chair
Terminal agents	Claude Code, Codex CLI, Aider, Cline	Repo-wide refactors, scripting, server work
Cloud / async agents	Devin, Claude Agent SDK, OpenAI Codex Cloud, Replit Agent	Long-running tasks, parallel PRs, sweep work
App-builder agents	Bolt, Lovable, v0, Replit, Claude Artifacts	Zero-to-prototype, internal tools, MVPs
Review & QA	CodeRabbit, Greptile, Diamond, ultrareview-style multi-agent	Second-opinion review on every PR
Eval & observability	Braintrust, Langfuse, Helicone, internal eval harnesses	Measuring agent quality over time
Context plumbing	MCP servers, repo indexers, ADR generators	Feeding agents the right facts

Rule I use for tools. Pick the least autonomous tool that solves the problem. Use autocomplete for known patterns, chat for scoped edits, IDE agents for features, cloud agents for sweeps. Skipping rungs creates accidents.

7 · The VIBE Strategy Framework

A four-pillar mental model I use on every project. When one pillar is weak, that is where the work stalls.

Figure 4 — The VIBE framework: Vision · Iteration · Boundaries · Evidence.

8 · Prompt & Spec Patterns

A prompt is a tiny specification. Treat it like product writing: structure, examples, constraints. The five patterns below cover ~80% of day-to-day work.

Pattern 1

Goal · Context · Constraints · Done

The universal default. Four short paragraphs, in this order.

# Goal
Add server-side pagination to /api/orders, default 50/page.

# Context
- Express app, Prisma ORM, see routes/orders.ts and prisma/schema.prisma.
- We already use cursor pagination in /api/invoices — match that style.

# Constraints
- No breaking changes to existing clients.
- Keep response shape additive (add nextCursor, keep items).

# Done when
- New unit test passes; existing tests still pass.
- Manual curl with and without cursor returns expected pages.
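
To make the "Done when" criteria concrete, here is a minimal sketch of the additive response shape the prompt asks for. It is written in Python rather than the Express/Prisma stack named in the prompt, and the `paginate` function, the in-memory `ORDERS` list, and the choice of the last item's `id` as the cursor are all assumptions for illustration. The real /api/invoices style is not shown in this note.

```python
from typing import Optional, TypedDict


class Order(TypedDict):
    id: int
    total: float


class Page(TypedDict):
    items: list[Order]
    nextCursor: Optional[int]


# Hypothetical in-memory stand-in for the orders table, sorted by id.
ORDERS: list[Order] = [{"id": i, "total": 10.0 * i} for i in range(1, 121)]


def paginate(orders: list[Order], cursor: Optional[int] = None, limit: int = 50) -> Page:
    """Return up to `limit` orders whose id is greater than `cursor`.

    `items` keeps its existing meaning; `nextCursor` is the only new field,
    so clients that ignore it keep working (the "additive" constraint).
    """
    remaining = [o for o in orders if cursor is None or o["id"] > cursor]
    items = remaining[:limit]
    # Only hand out a cursor when at least one more row is left to fetch.
    has_more = len(remaining) > limit
    next_cursor = items[-1]["id"] if has_more else None
    return {"items": items, "nextCursor": next_cursor}
```

Calling `paginate(ORDERS)` and then passing each `nextCursor` back in walks the 120 sample rows in pages of 50, 50, and 20, which is the kind of check the manual curl step is meant to cover.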

Pattern 2

Plan-then-execute

For anything spanning more than two files, ask the agent to plan first.

Before writing any code, list the files you will change
and a 1-line description of the change for each.
Stop and wait for approval.

Catches "agent invents new architecture" early, while it is still cheap.

Pattern 3

Reference exemplars

Point to a file that already does it right.

Implement <feature> following the same structure as
src/features/auth/login.ts — same error handling,
same logging conventions, same test layout.

Exemplars beat style guides. Style guides beat vibes.

Pattern 4

Red-team your own prompt

For risky changes, end the prompt with:

Before writing the implementation, list three ways
this change could break production and the test
that would catch each one.

Surfaces edge cases without doubling the work.

Pattern 5

Spec, not request

For features lasting more than a day, write a short spec document and feed it as a file. Specs are reusable; prompts are not.

SPEC.md
- Problem
- Users & jobs-to-be-done
- API surface (with examples)
- Data model
- Edge cases
- Out of scope
- Test plan

Anti-pattern

The mega-prompt

A 400-line prompt asking for "the whole thing" produces a 2,000-line diff. Nobody reviews it. Bugs ship.

Fix: decompose into 3–7 vertical slices, each independently testable.

9 · Context Engineering

In 2026, "prompt engineering" has been absorbed into the broader practice of context engineering: curating the full set of inputs the agent sees — system rules, files, examples, tool outputs, prior decisions, and memory.

Static

Project conventions file

A CLAUDE.md / AGENTS.md / .cursorrules in the repo root capturing:

Stack & versions
Run, test, lint commands
Code style & naming
"Always / Never" rules
Pointers to exemplar files

Dynamic

Just-in-time retrieval

Tools that let the agent pull what it needs: repo grep, file read, type signatures, DB schema, API docs via MCP. Better than dumping the whole repo.

Persistent

Memory & learnings

Save durable facts (user role, project goals, "we tried X, it failed because Y") to a memory file. Reload across sessions to avoid re-explaining.

The context budget

Even with million-token windows, attention is finite. Curate ruthlessly:

Figure 5 — Context pyramid. Above the line: high-signal. Below: noise.
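
One way to act on a context budget is to rank candidate inputs and keep only what fits. The sketch below is a hedged illustration, not a description of how any particular agent packs its window: `ContextItem`, the numeric `priority` scale, and the four-characters-per-token estimate are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextItem:
    name: str
    text: str
    priority: int  # hypothetical scale: lower number = higher signal


def estimate_tokens(text: str) -> int:
    # Crude assumption: roughly 4 characters per token.
    # Swap in the tokenizer for your model if you need accuracy.
    return max(1, len(text) // 4)


def pack_context(items: list[ContextItem], budget: int) -> list[ContextItem]:
    """Greedily keep the highest-priority items that fit within `budget` tokens."""
    chosen: list[ContextItem] = []
    used = 0
    for item in sorted(items, key=lambda i: i.priority):
        cost = estimate_tokens(item.text)
        # Skip anything that would overflow, but keep looking for
        # smaller, lower-priority items that still fit.
        if used + cost <= budget:
            chosen.append(item)
            used += cost
    return chosen
```

With a conventions file, one exemplar, a short note, and a whole-repo dump as candidates, a small budget keeps the first three and drops the dump, which is the behaviour "curate ruthlessly" asks for.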

10 · Quality, Tests & Guardrails

Vibe coding shifts the bottleneck from writing code to trusting code. Your quality stack must be machine-readable so the agent can use it as a feedback signal.

The verification ladder

1 · Compiles / type-checks: the floor. Never skip.
2 · Linter clean: catches style drift the agent introduces.
3 · Unit tests pass: including new ones the agent wrote.
4 · Integration tests pass: run against a real DB and a real API client.
5 · Manual exercise: you run the app and try the feature.
6 · Second-opinion review: a different model or a human reviews the diff.
7 · Production telemetry: error rates, latency, business metrics.
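
The automatable rungs of the ladder (type-check, lint, unit tests, integration tests) can be chained so that both the human and the agent see the first failure. This is a minimal sketch under stated assumptions: `run_ladder` and `EXAMPLE_RUNGS` are hypothetical names, and the commands are placeholders that only exercise the control flow. Manual exercise, second-opinion review, and telemetry stay outside the script.

```python
import subprocess
import sys
from typing import Optional, Sequence


def run_ladder(rungs: Sequence[tuple[str, Sequence[str]]]) -> Optional[str]:
    """Run each rung in order; return the first failing rung's name, or None."""
    for name, argv in rungs:
        result = subprocess.run(argv, capture_output=True, text=True)
        if result.returncode != 0:
            # Print the captured output so the agent gets the same signal you do.
            print(f"FAILED at rung: {name}\n{result.stdout}{result.stderr}")
            return name
        print(f"ok: {name}")
    return None


# Placeholder commands so the sketch runs anywhere. Replace them with your
# project's real type-check, lint, and test commands.
EXAMPLE_RUNGS = [
    ("type-check", [sys.executable, "-c", "pass"]),
    ("lint", [sys.executable, "-c", "pass"]),
    ("unit tests", [sys.executable, "-c", "raise SystemExit(1)"]),
    ("integration tests", [sys.executable, "-c", "pass"]),
]
```

Stopping at the first failure is a deliberate choice: a higher rung's result means little while a lower one is red.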

Guardrail checklist (set once, reuse forever)

Sandboxed execution for any agent-run command
Permission allowlist for shell commands (no rm -rf by default)
Secrets scanner pre-commit hook
SAST + dependency-audit on every PR
Branch protection: agents cannot push to main
Disposable cloud envs for agent demos
Audit log of agent actions retained for ≥ 30 days
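
A permission allowlist can be as small as a lookup plus a parse step. The sketch below is an illustration, not the mechanism any specific agent tool uses: the `ALLOWED` table, its entries, and `is_allowed` are hypothetical, and the check is deliberately conservative.

```python
import shlex
from typing import Optional

# Hypothetical allowlist: program -> permitted subcommands (None = any arguments).
ALLOWED: dict[str, Optional[frozenset[str]]] = {
    "git": frozenset({"status", "diff", "log", "add", "commit"}),
    "pytest": None,
    "ls": None,
}

# Shell metacharacters that could chain, redirect, or substitute commands.
FORBIDDEN_CHARS = set(";|&><`$")


def is_allowed(command: str) -> bool:
    """Return True only if `command` is a single allowlisted invocation."""
    if any(ch in FORBIDDEN_CHARS for ch in command):
        return False
    try:
        argv = shlex.split(command)
    except ValueError:  # e.g. unbalanced quotes
        return False
    if not argv or argv[0] not in ALLOWED:
        return False
    subcommands = ALLOWED[argv[0]]
    if subcommands is None:
        return True
    return len(argv) > 1 and argv[1] in subcommands
```

Because anything not listed is denied, `rm -rf` is blocked without a dedicated rule. The metacharacter check also rejects harmless quoted text such as a commit message containing `&`; that trade-off favours safety over convenience, and an approved command should still be executed as an argument list rather than through a shell.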

Tests are now spec artefacts. A failing test is the clearest possible prompt: "make this pass." Invest more in tests because the agent will use them as its compass.

11 · Risks & Mitigations

Risk	How it appears	Mitigation
Hallucinated APIs	Agent imports non-existent libraries or functions	Type-check + run before commit; reference real exemplars
Security regressions	SQL injection, XSS, missing auth checks	SAST in CI, security-review skill on diff, principle of least privilege
Secret leakage	Keys committed to repo or sent to model	Pre-commit secret scanner, env-var hygiene, on-host model for sensitive code
Supply-chain risk	Agent adds a malicious or typosquatted dependency	Pin versions, allowlist registries, audit on add
Architecture drift	Each feature invents its own pattern	Strong conventions file + exemplar references + ADRs
Skill atrophy	Devs lose fundamentals they no longer practice	Mandatory "no-AI" exercises; deliberate learning time
Over-trust	"It compiled" treated as "it works"	Verification ladder enforced in PR template
Cost runaway	Long agent runs burn tokens	Budgets, model tiering, caching, kill-switches
IP / license risk	Generated code resembles GPL training data	Use licence-aware models, attribution scanning
Prompt injection	Untrusted text in a fetched file hijacks the agent	Treat tool output as untrusted; sandbox; allowlist actions
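
For the secret-leakage row, a pre-commit scan boils down to matching staged text against known credential shapes. The sketch below is illustrative only: the three patterns and the `scan` helper are assumptions, and a dedicated scanner with a maintained rule set is the better choice in practice.

```python
import re

# Illustrative patterns only; real scanners ship far larger rule sets.
PATTERNS = {
    # AWS access key IDs start with "AKIA" followed by 16 uppercase letters or digits.
    "aws-access-key-id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private-key-header": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
    # Heuristic (assumption): a key-like name assigned a quoted value of 12+ characters.
    "hardcoded-secret": re.compile(
        r"(?i)\b(?:api[_-]?key|secret|token)\b\s*[:=]\s*['\"][^'\"]{12,}['\"]"
    ),
}


def scan(text: str) -> list[tuple[int, str]]:
    """Return (line_number, rule_name) for every line that matches a pattern."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for rule, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, rule))
    return findings
```

A hook would run `scan` over the staged diff and block the commit when the result is non-empty; the same check is worth running on any text about to be sent to a hosted model.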

12 · Anti-Patterns

Anti

Vibe-and-ship

Accepting AI output unread and merging it. Defects ship at roughly 2× the rate of reviewed code. Always read the diff.

Anti

Prompt golf

Endlessly tweaking a single prompt to "fix" output. Stop, decompose, add a test, try again.

Anti

Single-shot megaproject

"Build me Twitter." Big-bang prompts produce big-bang failures. Vertical slices instead.

Anti

The lone wolf agent

One developer running an unsupervised cloud agent for hours. Use PR-sized units of work.

Anti

Documentation by deletion

Agent rewrites code and quietly drops docs/comments that captured non-obvious "why."

Anti

Test theatre

Agent writes tests that pass by mirroring the implementation. Review tests before code.
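
Test theatre is easiest to spot side by side. In this hedged example, `apply_discount` is a hypothetical function invented for illustration; the first test mirrors the implementation and the second checks values worked out by hand.

```python
def apply_discount(total_cents: int, percent: int) -> int:
    """Return the total after a percentage discount, in cents.

    The discount itself is rounded down to a whole cent.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return total_cents - (total_cents * percent) // 100


def test_mirror() -> None:
    # Test theatre: the expected value is re-derived with the same formula,
    # so this passes even if the formula itself is wrong.
    total, percent = 1999, 15
    assert apply_discount(total, percent) == total - (total * percent) // 100


def test_behaviour() -> None:
    # Independent evidence: expected values computed by hand from the spec.
    assert apply_discount(1000, 10) == 900
    # 1999 * 15 = 29985; 29985 // 100 = 299; 1999 - 299 = 1700
    assert apply_discount(1999, 15) == 1700
    assert apply_discount(500, 0) == 500
    assert apply_discount(500, 100) == 0
```

Both tests pass today, but only the second would fail if someone changed the rounding rule, which is why it is worth reading the tests before the code.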

13 · Skills Matrix for the Modern Developer

The relative importance of developer skills has shifted. Below is a snapshot of where to invest your learning hours now.

Figure 6 — Where developer skill returns are concentrated in 2026.

Invest more in

Writing crisp specs and acceptance criteria
Reading diffs fast and well
Designing testable interfaces
Architectural taste & long-term thinking
Security and threat modelling
Curating context (files, exemplars, memory)

Invest less in

Memorising library APIs
Hand-writing CRUD boilerplate
Configuring repetitive scaffolding
Polishing single-file syntax tricks
Manual refactors a tool can do safely

Caveat: invest in fundamentals at least once. You cannot review what you do not understand.

14 · Team Operating Model

Individual vibe coding scales to teams only when shared practices replace personal habits. The "team-OS" below is what separates a chaotic AI-tools-everywhere shop from a high-leverage engineering org.

Shared practices

One canonical conventions file per repo (AGENTS.md)
Spec template every non-trivial change starts from
PR template with explicit "AI assistance" + "verified by" fields
Mandatory second-opinion review (human or agent) before merge
Eval suite measuring agent quality on internal tasks, run weekly
Shared library of reusable prompts / skills / subagents
Incident post-mortems include "did AI contribute? how?"

Roles that emerge

Context owner: curates the AGENTS.md and exemplar set
Eval owner: writes and maintains the agent eval harness
Tool admin: manages MCP servers, permissions, budgets
Reviewer-in-chief: sets the bar for what AI PRs must pass
Security partner: threat-models agentic workflows

These are hats, not headcount — one person can wear several.

15 · Metrics & KPIs

Measure both velocity and trust. Velocity without quality metrics is how you end up shipping 3× the bugs at 3× the speed.

Dimension	Metric	Why it matters
Velocity	Lead time for change (idea → prod)	The headline benefit of vibe coding
Velocity	PR cycle time	Detects review bottlenecks
Quality	Change failure rate	The first thing that degrades under vibe-and-ship
Quality	Escaped defects per 1k LOC merged	Trends down only with discipline
Quality	Mean time to recover	Tests how reversible your changes are
Trust	% of AI-authored diffs reviewed line-by-line	Cultural signal; should stay near 100%
Trust	Agent eval pass rate over time	Catches regressions in your prompts/conventions
Cost	Token spend per merged PR	Right-sizes model tiering
Adoption	% of engineers using agents weekly	Indicates tooling fit, not headcount
Wellbeing	Self-reported flow / frustration	Vibe coding can either delight or burn out
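
Three of the rows above reduce to simple ratios over merged PRs. The sketch below assumes a hypothetical `MergedPR` record with hand-labelled fields. DORA defines change failure rate per deployment, so counting per merged PR is an approximation chosen here for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class MergedPR:
    caused_incident: bool        # needed a hotfix or rollback after release
    tokens_spent: int            # agent tokens attributed to this PR
    reviewed_line_by_line: bool  # the "verified by" field was honoured


def _share(flags: Sequence[bool]) -> float:
    # Fraction of True values; 0.0 for an empty period.
    return sum(flags) / len(flags) if flags else 0.0


def change_failure_rate(prs: Sequence[MergedPR]) -> float:
    return _share([p.caused_incident for p in prs])


def review_coverage(prs: Sequence[MergedPR]) -> float:
    return _share([p.reviewed_line_by_line for p in prs])


def tokens_per_merged_pr(prs: Sequence[MergedPR]) -> float:
    return sum(p.tokens_spent for p in prs) / len(prs) if prs else 0.0
```

Tracking the three together matters: a falling token spend per PR is only good news if review coverage and change failure rate hold steady.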

16 · 30 / 60 / 90-Day Adoption Roadmap

Days 0–30

Foundation

Choose one IDE agent + one CLI agent
Write AGENTS.md for the top 2 repos
Adopt the Goal/Context/Constraints/Done prompt template
Enable secret scanner + SAST on every PR
Each dev ships 3 vibe-coded PRs with full diff review
Baseline DORA metrics

Days 31–60

Scale & standardise

Stand up shared MCP servers (DB, issue tracker, observability)
Introduce spec template + PR template fields
Build an internal eval harness for 10–20 representative tasks
Pilot one cloud / async agent on backlog sweep work
Run first "AI incident review" retro
Train all engineers on context engineering + verification ladder

Days 61–90

Leverage & learn

Promote the most-used prompts into reusable skills / subagents
Tier models by task class to control cost
Automate second-opinion review on every PR
Publish a quarterly "agent quality" report
Refactor one legacy module using agent-led migration
Re-measure DORA + trust metrics; compare to baseline

17 · Future Outlook (12–24 months)

Trend

Specs become the source of truth

Code becomes a compiled artefact of spec + tests + context. Diff review shifts upstream to spec review.

Trend

Multi-agent teams

Planner, implementer, reviewer, and tester agents collaborate on a single PR with the human as editor-in-chief.

Trend

Eval-driven development

Teams maintain task-specific eval suites the way they maintain test suites. Prompt/skill changes ship with eval deltas.
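
An "eval delta" can be as simple as comparing per-task results from two runs. This is a minimal sketch, assuming each run is a mapping from a task name to pass/fail; `pass_rate`, `compare_runs`, and the task names in the usage note are hypothetical.

```python
def pass_rate(results: dict[str, bool]) -> float:
    """Fraction of tasks that passed; 0.0 for an empty run."""
    return sum(results.values()) / len(results) if results else 0.0


def compare_runs(
    baseline: dict[str, bool], candidate: dict[str, bool]
) -> tuple[float, list[str]]:
    """Return (pass-rate delta, tasks that passed before but fail now)."""
    delta = pass_rate(candidate) - pass_rate(baseline)
    # A task missing from the candidate run is treated as a failure.
    regressed = sorted(
        task for task, ok in baseline.items()
        if ok and not candidate.get(task, False)
    )
    return delta, regressed
```

Reporting the regressed tasks alongside the delta is the useful part: if a prompt change fixes `rename-module` but breaks `fix-flaky-test`, the headline pass rate is unchanged while the behaviour is not.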

Trend

Local + edge models

On-device small models handle high-volume edits; cloud frontier models handle judgment-heavy tasks. Cost and privacy improve.

Trend

Regulatory pressure

SBOMs, attribution, model-card requirements, and AI-disclosure rules become standard. AGENTS.md gets a "compliance" section.

Risk

Two-tier engineering market

Developers who only prompt without understanding fall behind. Those who pair fundamentals with agentic leverage thrive disproportionately.

18 · What I keep coming back to

Vibe coding is neither hype nor heresy — it is the interface I use most days now. The developers who win are not the ones who give up on rigor; they are the ones who relocate it: from typing to specifying, from writing to reviewing, from individual cleverness to systemic feedback loops.

The discipline is simple to state and hard to do well: hold a clear vision, iterate in small steps, set firm boundaries, and demand evidence. Do that consistently and an AI agent is the highest-leverage colleague I have worked with. Skip any one of those and it becomes the most expensive intern I have ever hired.

What I keep coming back to: vibe coding rewards taste, specifications, and verification — not faster typing. Invest there, and the rest of the stack is just plumbing.

19 · References & Sources

Annotated bibliography behind the vibe-coding definition, autonomy spectrum, six-stage workflow, tooling landscape, VIBE framework, prompt patterns, context-engineering pyramid, verification ladder, risk table, anti-patterns, skills matrix, team operating model, DORA-style KPIs, adoption roadmap, and future-outlook cards. Section tags (e.g. §05) show where each source is used. Diagrams and the VIBE acronym are my synthesis unless noted.

Scope. Synthesis of practitioner writing, vendor documentation, and software-engineering research (May 2026). Hero KPI ranges (~55% AI-assisted code, 3–5× prototype speed, 2× defect rate without review, context engineering as #1 skill) blend GitHub, Stack Overflow, and field surveys — directional, not universal. Tool names in §06 reflect the 2025–26 landscape and will shift. Not tool endorsement, employment, or legal advice.

Citations are numbered continuously [1]–[n] within this section.

Origin, definition & the vibe-coding meme (§01–§02)

Karpathy, A., post introducing "vibe coding." X (Twitter), February 2025. Coined the phrase — "give in to the vibes, embrace exponentials, forget that the code even exists" — §01–§02 etymology and working definition. x.com/karpathy (search Feb 2025 vibe coding). — §01, §02.
Merriam-Webster, "vibe coding" dictionary entry (slang & trending), 2025; Collins Dictionary, Word of the Year 2025 coverage. Mainstream adoption of the term within a year of coinage — §01 timeline sentence. merriam-webster.com, collinsdictionary.com — §01.
Wiener, A., "Vibe Coding and the Future of Software." The New Yorker, April 2025. Cultural and professional framing of agent-directed development — background for §02 distinction (autocomplete vs agentic loop). newyorker.com — §02.

Adoption, productivity & hero statistics (§01 KPIs, §15–§16)

GitHub, Octoverse 2024 & Copilot usage reports. 2024–25. AI-assisted coding adoption in surveyed teams — anchor for §01 ~55% hero stat (verify latest Octoverse/Copilot metrics). github.blog/octoverse — §01, §15.
Stack Overflow, 2025 Developer Survey — AI section. 2025. Developer tool usage, trust, and productivity self-reports — §01 adoption context and §15 wellbeing KPI. survey.stackoverflow.co — §01, §15.
McKinsey & Company, Unlocking Value from AI in Software Development (Digital practice insights). 2024–25. Prototype-cycle compression and review bottlenecks — §01 3–5× prototype KPI and §10 trust bottleneck. mckinsey.com/digital — §01, §10.
Peng, S. et al., "The Impact of AI on Developer Productivity: Evidence from GitHub Copilot." arXiv:2302.06590, 2023; follow-on controlled studies. Faster task completion with AI assistance; quality caveats when review skipped — §01 2× defect KPI and §12 vibe-and-ship anti-pattern. arxiv.org/abs/2302.06590 — §01, §12.

Autonomy spectrum, agentic loops & the six-stage workflow (§03–§05, FIG 1–2)

Yao, S. et al., "ReAct: Synergizing Reasoning and Acting in Language Models." ICLR 2023. Plan → act → observe loop underpinning §03 stages 4–5 and §05 Generate/Verify phases. arxiv.org/abs/2210.03629 — §03, §05.
Anthropic, Building Effective Agents guide. 2024–25. Agent design patterns: gather context, take action, verify — maps to §05 six-stage loop and §04 P3/P4 principles. docs.anthropic.com/agents — §04, §05.
Jimenez, C. E. et al., "SWE-bench: Can Language Models Resolve Real-World GitHub Issues?" ICLR 2024. Benchmark for autonomous coding agents — §03 stage 5 "Autonomous agent" and §17 eval-driven trend. arxiv.org/abs/2310.06770 — §03, §17.
Shinn, N. et al., "Reflexion: Language Agents with Verbal Reinforcement Learning." NeurIPS 2023. Iterative self-correction — background for §05 Verify stage and §12 prompt-golf anti-pattern. arxiv.org/abs/2303.11366 — §05, §12.

Tooling landscape — IDEs, CLIs, cloud agents & review (§06, FIG 3)

Cursor documentation — Agent mode, rules, and codebase context. 2025–26. Agentic IDE in §06 table and §09 .cursorrules conventions file. docs.cursor.com — §06, §09.
Anthropic, Claude Code documentation. 2025–26. Terminal/repo-wide agent — §06 terminal-agents row and §16 Days 0–30 tooling choice. docs.anthropic.com/claude-code — §06, §16.
OpenAI, Codex CLI / Codex Cloud documentation. 2025–26. Cloud/async agent patterns — §06 cloud-agents row and §16 Days 31–60 pilot. developers.openai.com/codex — §06, §16.
Cognition, Devin technical reports & product documentation. 2024–25. Long-running autonomous software engineer agent — §06 cloud-agents category and §12 lone-wolf anti-pattern. cognition.ai — §06, §12.
Aider, AI pair programming in your terminal — docs & architecture. 2024–25. Git-aware CLI editing — §06 terminal-agents row. aider.chat — §06.
CodeRabbit, Greptile & AI PR-review tooling documentation. 2024–25. Second-opinion review bots — §06 review row and §10 verification-ladder step 6. docs.coderabbit.ai — §06, §10.

VIBE framework, specs & prompt patterns (§07–§08)

Yan, E., "Patterns for Building LLM-based Systems & Products." eugeneyan.com, 2023–25. Goal/context/constraints patterns and eval discipline — §08 prompt patterns and §04 P1 intent-over-syntax. eugeneyan.com/writing/llm-patterns — §04, §08.
GitHub, Spec Kit & spec-driven development guides. 2025. Structured specs before codegen — §08 Goal·Context·Constraints·Done template and §14 spec template. github.com/spec-kit — §08, §14.
Beck, K., Test-Driven Development: By Example. Addison-Wesley, 2002. Red-green-refactor as feedback loop — intellectual basis for §10 "tests are spec artefacts" and §04 P3 tight loops. — §04, §10.
Google, Software Engineering at Google (Winters, Manshreck, Wright). O'Reilly, 2020. Code review culture and readability — §07 Evidence pillar and §13 code-review skill bar. — §07, §13.

Context engineering, MCP & project conventions (§09, FIG 5)

Anthropic, "Effective context engineering for AI agents" (engineering blog). 2025. Curating inputs beyond single prompts — §09 context-engineering definition and FIG 5 pyramid. anthropic.com/engineering — §09.
Model Context Protocol (MCP) specification — Anthropic, 2024–25. Standard for tools, data, and just-in-time retrieval — §09 dynamic retrieval card and §06 context-plumbing row. modelcontextprotocol.io — §06, §09, §16.
Anthropic, CLAUDE.md / project-instructions conventions. 2025. Repo-root rules files — §09 static conventions card (CLAUDE.md, AGENTS.md). docs.anthropic.com — §09, §14.
Liu, N. F. et al., "Lost in the Middle: How Language Models Use Long Contexts." TACL 2024. Attention limits in long contexts — §09 "context budget" curation rationale. arxiv.org/abs/2307.03172 — §09.
Nygard, M., Documenting Architecture Decisions (ADR format). 2011. Capturing durable decisions — §09 persistent memory and §11 architecture-drift mitigation. cognitect.com/blog — §09, §11.

Quality, verification, security & LLM risks (§10–§11, §12)

OWASP, Top 10 for Large Language Model Applications (2025 edition). Prompt injection, insecure output handling, supply-chain risks — §11 risk table and §10 guardrail checklist. owasp.org/llm-top10 — §10, §11.
Greshake, K. et al., "Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection." 2023. Untrusted content hijacking agents — §11 prompt-injection row. arxiv.org/abs/2302.12173 — §11.
NIST, Artificial Intelligence Risk Management Framework (AI RMF 1.0). 2023. Govern-map-measure-manage cycle for AI systems — §10 guardrails and §14 security-partner role. nist.gov/ai-rmf — §10, §14.
Google, SRE Book — monitoring, alerting, blameless postmortems. 2016–18. Production verification ladder top rung — §10 step 7 telemetry and §14 incident post-mortems. sre.google/sre-book — §10, §14.
Truong, L., companion note: AI Cost Control. May 2026. Token budgets, model tiering, kill-switches for agent runs — §11 cost-runaway row and §16 Days 61–90 cost tiering. Same author collection. — §11, §16.

DORA metrics, eval harnesses & team operating model (§14–§17)

Forsgren, N., Humble, J., & Kim, G., Accelerate: The Science of Lean Software and DevOps. IT Revolution, 2018. DORA metrics (lead time, deployment frequency, CFR, MTTR) — §15 metrics table and §16 baseline DORA step. — §15, §16.
DORA / Google Cloud, State of DevOps Report (annual). 2024–25. Benchmarking change failure rate and recovery — §15 quality rows. dora.dev — §15.
Braintrust, Evaluations for LLM applications documentation. 2024–25. Task-specific eval suites — §14 eval owner, §15 agent eval pass rate, §17 eval-driven trend. braintrust.dev — §14, §15, §17.
METR, Model Evaluation & Threat Research — agent task benchmarks. 2024–25. Measuring autonomous coding capability over time — §17 eval-driven development card. metr.org — §17.
Skelton, M. & Pais, M., Team Topologies. IT Revolution, 2019. Platform/enabling teams — §14 emerging roles (context owner, tool admin) as hats not headcount. — §14.

Future outlook, compliance & skills shift (§13, §17)

U.S. NTIA / CISA, software bill of materials (SBOM) guidance & minimum elements. 2021–25. Supply-chain transparency — §17 regulatory-pressure card. ntia.gov/SBOM — §17.
European Parliament & Council, Regulation (EU) 2024/1689 (AI Act). 2024. Disclosure and governance expectations for high-risk AI — §17 compliance trend (verify applicability to dev tooling in your jurisdiction). eur-lex.europa.eu — §17.
Stanford HAI, 2025 AI Index Report — Technical Performance chapter. 2025. Capability/cost curves for coding models — §17 local+edge models trend and §13 declining syntax memorisation. hai.stanford.edu/ai-index — §13, §17.

Author synthesis

Truong, L., Vibe Coding Strategy — personal working notes. May 2026. Original diagrams (FIG 1–6), VIBE framework, six-stage loop, skills matrix, anti-pattern cards, 30/60/90 roadmap, and team-OS practices. LinhTruong.com — all sections.

Before you quote externally: The ~55% AI-code statistic compresses multiple surveys with different definitions (Copilot suggestions accepted vs lines merged vs time assisted). Tool names and capabilities in §06 change quarterly. Karpathy's original post is informal coinage, not a technical standard. Re-verify adoption numbers and vendor claims against primary sources before citing in policy or procurement documents.