I wrote this note to capture how I work with AI coding agents in production — the workflow, tooling choices,
guardrails, and team habits that separate hobby vibes from shippable software. It is my working playbook,
not a vendor white paper.
The question behind this note: when the agent does the typing, what do I still own? Answer: intent, taste, context, and verification — everything that turns generated code into software someone can trust.
By vibe coding I mean a way of working in which I express intent in natural language
and let an AI agent generate, run, test, and iterate, while I steer, review, accept, or reject.
Andrej Karpathy coined the phrase in February 2025; within a year it moved from meme to default mode
in major IDEs, cloud agents, and CLIs.
I treat vibe coding not as a replacement for engineering, but as a new interface to it — one that
demands stronger taste, sharper specs, tighter feedback loops, and more disciplined verification, not less.
This note is where I collect the patterns that actually hold up on real teams.
~55% code authored or modified by AI in surveyed teams (2026)
3–5× faster prototype-to-demo cycle
2× defect rate when accepting code without review
#1 skill differentiator: context engineering
2 · Origin & Definition
Etymology
Where the term came from
In February 2025, Andrej Karpathy described a new style of working where he would
"give in to the vibes, embrace exponentials, and forget that the code even exists."
The phrase captured a real shift: developers were no longer writing most of the code
— they were directing it.
Working Definition
What vibe coding actually is
Vibe coding = an AI-first development loop where a human supplies
intent, taste, and verification, and an AI agent supplies
code generation, editing, execution, and iteration. The artifact is still
software; the locus of authorship shifts from keystrokes to specifications.
How I draw the line. Vibe coding is not the same as
"using Copilot." Autocomplete is a typing accelerator. Vibe coding is an
agentic loop: the model plans, edits multiple files, runs commands, observes
results, and revises — with me acting as product owner, reviewer, and safety check.
3 · The Vibe Coding Spectrum
Not all "AI coding" is vibe coding. It sits on a spectrum of human-vs-agent authorship.
Knowing where you are on the spectrum determines what guardrails you need.
Figure 1 — The AI-coding autonomy spectrum. Vibe coding lives at stages 4–5.
4 · Core Principles
P1
Intent over syntax
Spend time on what and why. The model handles how. A clear paragraph beats a clever prompt.
P2
Small reversible steps
Prefer many small diffs over one giant generation. Each step should compile, run, and be cheap to throw away.
P3
Tight feedback loops
The agent must see the same signals you do: type errors, test output, logs, screenshots. No feedback = no learning.
P4
Verification is non-negotiable
If a human did not understand, run, or test it, treat it as unverified. Trust comes from evidence, not vibes.
P5
Context is the product
Quality output requires curated input: the right files, types, examples, conventions, and constraints.
P6
Taste is the moat
Anyone can prompt. Few can judge. Architectural taste, API design, and UX sensibility are now your edge.
5 · The Vibe Coding Workflow
A reliable vibe coding loop has six stages. Skipping any of them is the most common failure mode.
Figure 2 — The six-stage vibe coding loop. Most failures come from skipping Verify or starving Context.
What "good" looks like at each stage
| Stage | Sign of quality |
| 1 · Intent | Acceptance criteria written in plain English before any prompt |
| 2 · Context | Only the files that matter are loaded; conventions cited explicitly |
| 3 · Plan | 3–7 numbered steps; you can predict each step's outcome |
| 4 · Generate | Each change runs cleanly; tests pass; no silent rewrites |
| 5 · Verify | You read every diff; you exercise the feature manually |
| 6 · Land | Commit message captures the why; learnings flow back |
Common failure modes
Vibe-and-pray: skip Intent, skip Verify, ship.
Context starvation: agent invents APIs that don't exist.
Mega-prompt: a 400-line wall produces a 2000-line PR no one can review.
Yes-machine: accepting every suggestion gives the agent no corrective signal, so its output drifts.
Lost causality: no commit per logical change → impossible to bisect.
Tool blindness: agent has no test runner / no logs → it cannot self-correct.
6 · Tooling Landscape (2026)
Vibe coding tools cluster into five layers. A modern stack picks one or two from each layer
and wires them together with MCP (Model Context Protocol) servers.
Figure 3 — The vibe coding stack. Each layer is swappable; the interfaces (MCP, file conventions) are the lock-in.
| Category | Representative tools (2026) | Best for |
| Agentic IDEs | Cursor, Windsurf, VS Code + Copilot, JetBrains AI | Day-to-day feature work with human in the chair |
| Terminal agents | Claude Code, Codex CLI, Aider, Cline | Repo-wide refactors, scripting, server work |
| Cloud / async agents | Devin, Claude Agent SDK, OpenAI Codex Cloud, Replit Agent | |
Rule I use for tools. Pick the least autonomous tool that solves the
problem. Use autocomplete for known patterns, chat for scoped edits, IDE agents for features,
cloud agents for sweeps. Skipping rungs creates accidents.
7 · The VIBE Strategy Framework
A four-pillar mental model I use on every project. When one pillar is weak, that is where the work stalls.
8 · Prompt Patterns
A prompt is a tiny specification. Treat it like product writing: structure, examples,
constraints. The five patterns below cover ~80% of day-to-day work.
Pattern 1
Goal · Context · Constraints · Done
The universal default. Four short paragraphs, in this order.
# Goal
Add server-side pagination to /api/orders, default 50/page.
# Context
- Express app, Prisma ORM, see routes/orders.ts and prisma/schema.prisma.
- We already use cursor pagination in /api/invoices — match that style.
# Constraints
- No breaking changes to existing clients.
- Keep response shape additive (add nextCursor, keep items).
# Done when
- New unit test passes; existing tests still pass.
- Manual curl with and without cursor returns expected pages.
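To make the "additive response shape" constraint concrete, here is a minimal Python sketch of cursor pagination over an in-memory list. It is not the Express/Prisma code the prompt refers to: the `paginate` function, the `id` field, and the sample orders are hypothetical, and only the `items` and `nextCursor` keys and the default of 50 per page come from the prompt above.

```python
from typing import Optional


def paginate(rows: list[dict], cursor: Optional[int] = None, limit: int = 50) -> dict:
    """Return one page of rows ordered by id, starting after `cursor`."""
    ordered = sorted(rows, key=lambda r: r["id"])
    if cursor is not None:
        # The cursor is the id of the last row the client has already seen.
        ordered = [r for r in ordered if r["id"] > cursor]
    page = ordered[:limit]
    has_more = len(ordered) > limit
    return {
        "items": page,  # existing key, unchanged for old clients
        "nextCursor": page[-1]["id"] if has_more else None,  # new, additive key
    }


orders = [{"id": i} for i in range(1, 121)]  # 120 hypothetical orders
first = paginate(orders)
second = paginate(orders, cursor=first["nextCursor"])
third = paginate(orders, cursor=second["nextCursor"])
```

A client that ignores `nextCursor` still receives `items` as before, which is the sense in which the change is additive.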
Pattern 2
Plan-then-execute
For anything spanning more than two files, ask the agent to plan first.
Before writing any code, list the files you will change
and a 1-line description of the change for each.
Stop and wait for approval.
Catches "agent invents new architecture" early, while it is still cheap.
Pattern 3
Reference exemplars
Point to a file that already does it right.
Implement <feature> following the same structure as
src/features/auth/login.ts — same error handling,
same logging conventions, same test layout.
Pattern 4
List three ways this change could break production
and the test that would catch each one before
writing the implementation.
Surfaces edge cases without doubling the work.
Pattern 5
Spec, not request
For features lasting more than a day, write a short spec document and feed it as a file. Specs are reusable; prompts are not.
SPEC.md
- Problem
- Users & jobs-to-be-done
- API surface (with examples)
- Data model
- Edge cases
- Out of scope
- Test plan
Anti-pattern
The mega-prompt
A 400-line prompt asking for "the whole thing" produces a 2,000-line diff. Nobody reviews it. Bugs ship.
Fix: decompose into 3–7 vertical slices, each independently testable.
9 · Context Engineering
In 2026, "prompt engineering" has been absorbed into the broader practice of context
engineering: curating the full set of inputs the agent sees — system rules, files,
examples, tool outputs, prior decisions, and memory.
Static
Project conventions file
A CLAUDE.md / AGENTS.md / .cursorrules in the repo root capturing:
Stack & versions
Run, test, lint commands
Code style & naming
"Always / Never" rules
Pointers to exemplar files
Dynamic
Just-in-time retrieval
Tools that let the agent pull what it needs: repo grep, file read, type signatures, DB schema, API docs via MCP. Better than dumping the whole repo.
Persistent
Memory & learnings
Save durable facts (user role, project goals, "we tried X, it failed because Y") to a memory file. Reload across sessions to avoid re-explaining.
The context budget
Even with million-token windows, attention is finite. Curate ruthlessly.
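One way to picture a context budget is a greedy selection: rank candidate inputs by importance and add them until the budget is spent. The sketch below assumes that framing. `ContextItem`, `select_context`, the priorities, and the token counts are all hypothetical, and a real agent would estimate sizes with its own tokenizer.

```python
from dataclasses import dataclass


@dataclass
class ContextItem:
    name: str
    tokens: int    # estimated size; the numbers below are made up
    priority: int  # lower number = more important


def select_context(items: list[ContextItem], budget: int) -> list[ContextItem]:
    """Greedily keep the most important items that still fit in the budget."""
    chosen: list[ContextItem] = []
    used = 0
    for item in sorted(items, key=lambda i: (i.priority, i.tokens)):
        if used + item.tokens <= budget:
            chosen.append(item)
            used += item.tokens
    return chosen


candidates = [
    ContextItem("AGENTS.md", 800, 0),
    ContextItem("routes/orders.ts", 1500, 1),
    ContextItem("prisma/schema.prisma", 2000, 1),
    ContextItem("previous chat transcript", 6000, 2),
    ContextItem("whole-repo dump", 400_000, 3),
]
selected = [item.name for item in select_context(candidates, budget=8000)]
```

With these numbers the conventions file and the two relevant source files fit, while the transcript and the repo dump are left out.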
10 · Verification & Guardrails
Vibe coding shifts the bottleneck from writing code to trusting code.
Your quality stack must be machine-readable so the agent can use it as a feedback signal.
The verification ladder
1. Compiles / type-checks: the floor. Never skip.
2. Linter clean: catches style drift the agent introduces.
3. Unit tests pass: including new ones the agent wrote.
4. Integration tests pass: hits real DB, real API client.
5. Manual exercise: you run the app and try the feature.
6. Second-opinion review: a different model / human reviews the diff.
7. Production telemetry: error rates, latency, business metrics.
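The ladder can be read as an ordered list of checks where a failure on one rung stops the climb. Here is a minimal sketch of that idea. The `climb` helper and the stub checks are hypothetical, and in a real project each check would wrap the repo's own type-check, lint, and test commands.

```python
from typing import Callable, Optional

Rung = tuple[str, Callable[[], bool]]


def climb(ladder: list[Rung]) -> tuple[list[str], Optional[str]]:
    """Run rungs in order and stop at the first failure.

    Returns the names of the rungs that passed and the failing rung, if any.
    """
    passed: list[str] = []
    for name, check in ladder:
        if not check():
            return passed, name
        passed.append(name)
    return passed, None


# Stub checks standing in for real commands (assumption, for illustration only).
ladder: list[Rung] = [
    ("type-check", lambda: True),
    ("lint", lambda: True),
    ("unit tests", lambda: False),
    ("integration tests", lambda: True),
]
passed, failed_at = climb(ladder)
```

Because the result names the failing rung, it can be handed back to the agent as a machine-readable signal.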
Guardrail checklist (set once, reuse forever)
Sandboxed execution for any agent-run command
Permission allowlist for shell commands (no rm -rf by default)
Secrets scanner pre-commit hook
SAST + dependency-audit on every PR
Branch protection: agents cannot push to main
Disposable cloud envs for agent demos
Audit log of agent actions retained for ≥ 30 days
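As an illustration of the permission allowlist, the sketch below accepts a command only if its program is on a short list and it contains no shell metacharacters. The allowlist contents and the `is_allowed` name are hypothetical. This is a first filter under those assumptions, not a substitute for sandboxed execution.

```python
import shlex

ALLOWED_PROGRAMS = {"git", "ls", "cat", "pytest"}  # hypothetical allowlist
BLOCKED_GIT_SUBCOMMANDS = {"push"}                 # agents do not push
SHELL_METACHARS = set(";&|<>`$\n")                 # chaining, redirection, substitution


def is_allowed(command: str) -> bool:
    """Return True only for a single, plain command from the allowlist."""
    if any(ch in SHELL_METACHARS for ch in command):
        return False
    try:
        argv = shlex.split(command)
    except ValueError:  # e.g. unbalanced quotes
        return False
    if not argv or argv[0] not in ALLOWED_PROGRAMS:
        return False
    if argv[0] == "git" and len(argv) > 1 and argv[1] in BLOCKED_GIT_SUBCOMMANDS:
        return False
    return True
```

The check is deliberately conservative: it rejects anything that is not obviously a single allowlisted command, and it only inspects the first git argument, so it can still be bypassed by less direct invocations.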
Tests are now spec artefacts. A failing test is the clearest possible
prompt: "make this pass." Invest more in tests because the agent will use them as
its compass.
11 · Risks & Mitigations
| Risk | How it appears | Mitigation |
| Hallucinated APIs | Agent imports non-existent libraries or functions | Type-check + run before commit; reference real exemplars |
| Security regressions | SQL injection, XSS, missing auth checks | SAST in CI, security-review skill on diff, principle of least privilege |
| Secret leakage | Keys committed to repo or sent to model | Pre-commit secret scanner, env-var hygiene, on-host model for sensitive code |
| | | Mandatory "no-AI" exercises; deliberate learning time |
| Over-trust | "It compiled" treated as "it works" | Verification ladder enforced in PR template |
| Cost runaway | Long agent runs burn tokens | Budgets, model tiering, caching, kill-switches |
| IP / license risk | Generated code resembles GPL training data | Use licence-aware models, attribution scanning |
| Prompt injection | Untrusted text in a fetched file hijacks the agent | Treat tool output as untrusted; sandbox; allowlist actions |
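For the cost-runaway row, a budget with a kill-switch can be as small as a counter that refuses to go over its limit. The sketch below assumes the agent loop reports token usage after each step. `TokenBudget`, `BudgetExceeded`, and the numbers are hypothetical.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a step would push the run past its token budget."""


class TokenBudget:
    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.used = 0

    def charge(self, tokens: int) -> int:
        """Record usage and return the remaining budget.

        Raises BudgetExceeded instead of recording a step that would overspend.
        """
        if self.used + tokens > self.limit:
            raise BudgetExceeded(
                f"step needs {tokens} tokens, only {self.limit - self.used} left"
            )
        self.used += tokens
        return self.limit - self.used


budget = TokenBudget(limit=10_000)
remaining_after_plan = budget.charge(4_000)
remaining_after_edit = budget.charge(5_000)
try:
    budget.charge(2_000)  # would exceed the limit, so the run stops here
    stopped = False
except BudgetExceeded:
    stopped = True
```

Raising an exception rather than logging a warning makes the limit a hard stop that the calling loop has to handle explicitly.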
12 · Anti-Patterns
Anti
Vibe-and-ship
Accepting AI output unread and merging it. Defects ship at 2× the rate. Always read the diff.
Anti
Prompt golf
Endlessly tweaking a single prompt to "fix" output. Stop, decompose, add a test, try again.
Anti
Single-shot megaproject
"Build me Twitter." Big-bang prompts produce big-bang failures. Vertical slices instead.
Anti
The lone wolf agent
One developer running an unsupervised cloud agent for hours. Use PR-sized units of work.
Anti
Documentation by deletion
Agent rewrites code and quietly drops docs/comments that captured non-obvious "why."
Anti
Test theatre
Agent writes tests that pass by mirroring the implementation. Review tests before code.
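A small example of the difference, using a hypothetical `apply_discount` function that works in integer cents. The first test repeats the formula it is supposed to check. The second pins values worked out by hand from the intended behaviour.

```python
def apply_discount(price_cents: int, percent: int) -> int:
    """Return the price reduced by `percent`, never below zero."""
    return max(price_cents * (100 - percent) // 100, 0)


def test_mirrors_implementation() -> None:
    # Test theatre: the expected value is computed with the same formula,
    # so a mistake copied into both places would still pass.
    price, percent = 20_000, 10
    assert apply_discount(price, percent) == max(price * (100 - percent) // 100, 0)


def test_pins_behaviour() -> None:
    # Expected values come from the spec, not from the code under test.
    assert apply_discount(20_000, 10) == 18_000  # 10% off 200.00
    assert apply_discount(5_000, 0) == 5_000     # no discount
    assert apply_discount(5_000, 150) == 0       # never negative
```

When reviewing agent-written tests, the question to ask is whether the expected values could have been written without reading the implementation.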
13 · Skills Matrix for the Modern Developer
The relative importance of developer skills has shifted. Below is a snapshot of where to
invest your learning hours now.
Figure 6 — Where developer skill returns are concentrated in 2026.
Invest more in
Writing crisp specs and acceptance criteria
Reading diffs fast and well
Designing testable interfaces
Architectural taste & long-term thinking
Security and threat modelling
Curating context (files, exemplars, memory)
Invest less in
Memorising library APIs
Hand-writing CRUD boilerplate
Configuring repetitive scaffolding
Polishing single-file syntax tricks
Manual refactors a tool can do safely
Caveat: invest in fundamentals at least once. You cannot review what you do not understand.
14 · Team Operating Model
Individual vibe coding scales to teams only when shared practices replace personal habits.
The "team-OS" below is what separates a chaotic AI-tools-everywhere shop from a high-leverage
engineering org.
Shared practices
One canonical conventions file per repo (AGENTS.md)
Spec template every non-trivial change starts from
PR template with explicit "AI assistance" + "verified by" fields
Mandatory second-opinion review (human or agent) before merge
Eval suite measuring agent quality on internal tasks, run weekly
Shared library of reusable prompts / skills / subagents
Incident post-mortems include "did AI contribute? how?"
Roles that emerge
Context owner: curates the AGENTS.md and exemplar set
Eval owner: writes and maintains the agent eval harness
Reviewer-in-chief: sets the bar for what AI PRs must pass
Security partner: threat-models agentic workflows
These are hats, not headcount — one person can wear several.
15 · Metrics & KPIs
Measure both velocity and trust. Velocity without quality metrics is how you end
up shipping 3× the bugs at 3× the speed.
| Dimension | Metric | Why it matters |
| Velocity | Lead time for change (idea → prod) | The headline benefit of vibe coding |
| Velocity | PR cycle time | Detects review bottlenecks |
| Quality | Change failure rate | The first thing that degrades under vibe-and-ship |
| Quality | Escaped defects per 1k LOC merged | Trends down only with discipline |
| Quality | Mean time to recover | Tests how reversible your changes are |
| Trust | % of AI-authored diffs reviewed line-by-line | Cultural signal; should stay near 100% |
| Trust | Agent eval pass rate over time | Catches regressions in your prompts/conventions |
| Cost | Token spend per merged PR | Right-sizes model tiering |
| Adoption | % engineers using agents weekly | Indicates tooling fit, not headcount |
| Wellbeing | Self-reported flow / frustration | Vibe coding can either delight or burn out |
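Three of these metrics can be computed from a simple log of merged PRs. The sketch below is a simplification: the `MergedPR` record and its fields are hypothetical, every PR in the sample is assumed to be AI-authored, and change failure rate is approximated per merged PR rather than per deployment.

```python
from dataclasses import dataclass


@dataclass
class MergedPR:
    tokens_spent: int
    caused_incident: bool        # needed a rollback or hotfix after release
    reviewed_line_by_line: bool  # taken from the PR template's "verified by" field


def summarize(prs: list[MergedPR]) -> dict[str, float]:
    """Compute quality, cost, and trust metrics for one reporting window."""
    if not prs:
        raise ValueError("no merged PRs in the window")
    n = len(prs)
    return {
        "change_failure_rate": sum(p.caused_incident for p in prs) / n,
        "tokens_per_merged_pr": sum(p.tokens_spent for p in prs) / n,
        "reviewed_share": sum(p.reviewed_line_by_line for p in prs) / n,
    }


window = [
    MergedPR(12_000, False, True),
    MergedPR(8_000, False, True),
    MergedPR(30_000, True, False),
    MergedPR(10_000, False, True),
]
report = summarize(window)
```

In this made-up window the one unreviewed PR is also the one that caused an incident, which is the kind of correlation the trust metrics are meant to surface.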
16 · 30 / 60 / 90-Day Adoption Roadmap
Days 0–30
Foundation
Choose one IDE agent + one CLI agent
Write AGENTS.md for the top 2 repos
Adopt the Goal/Context/Constraints/Done prompt template
Enable secret scanner + SAST on every PR
Each dev ships 3 vibe-coded PRs with full diff review
Baseline DORA metrics
Days 31–60
Scale & standardise
Stand up shared MCP servers (DB, issue tracker, observability)
Introduce spec template + PR template fields
Build an internal eval harness for 10–20 representative tasks
Pilot one cloud / async agent on backlog sweep work
Run first "AI incident review" retro
Train all engineers on context engineering + verification ladder
Days 61–90
Leverage & learn
Promote the most-used prompts into reusable skills / subagents
Tier models by task class to control cost
Automate second-opinion review on every PR
Publish a quarterly "agent quality" report
Refactor one legacy module using agent-led migration
Re-measure DORA + trust metrics; compare to baseline
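When comparing eval runs against a baseline, the aggregate pass rate alone can hide a regression. The sketch below assumes each eval run is stored as a mapping from task name to pass/fail. The task names and both helper functions are hypothetical.

```python
def pass_rate(results: dict[str, bool]) -> float:
    """Share of eval tasks that passed (0.0 for an empty run)."""
    return sum(results.values()) / len(results) if results else 0.0


def regressions(baseline: dict[str, bool], current: dict[str, bool]) -> list[str]:
    """Tasks that passed in the baseline run but fail or are missing now."""
    return sorted(
        task for task, ok in baseline.items() if ok and not current.get(task, False)
    )


baseline_run = {
    "add-endpoint": True,
    "fix-flaky-test": True,
    "rename-module": False,
    "bump-dependency": True,
}
current_run = {
    "add-endpoint": True,
    "fix-flaky-test": False,
    "rename-module": True,
    "bump-dependency": True,
}
regressed = regressions(baseline_run, current_run)
```

Both runs pass three of four tasks, yet one task that used to pass now fails, so a per-task comparison is worth reporting alongside the headline rate.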
17 · Future Outlook (12–24 months)
Trend
Specs become the source of truth
Code becomes a compiled artefact of spec + tests + context. Diff review shifts upstream to spec review.
Trend
Multi-agent teams
Planner, implementer, reviewer, and tester agents collaborate on a single PR with the human as editor-in-chief.
Trend
Eval-driven development
Teams maintain task-specific eval suites the way they maintain test suites. Prompt/skill changes ship with eval deltas.
Trend
Local + edge models
On-device small models handle high-volume edits; cloud frontier models handle judgment-heavy tasks. Cost and privacy improve.
Trend
Regulatory pressure
SBOMs, attribution, model-card requirements, and AI-disclosure rules become standard. AGENTS.md gets a "compliance" section.
Risk
Two-tier engineering market
Developers who only prompt without understanding fall behind. Those who pair fundamentals with agentic leverage thrive disproportionately.
18 · What I keep coming back to
Vibe coding is neither hype nor heresy — it is the interface I use most days now. The developers who
win are not the ones who give up on rigor; they are the ones who relocate it: from typing to
specifying, from writing to reviewing, from individual cleverness to systemic feedback loops.
The discipline is simple to state and hard to do well: hold a clear vision, iterate in small steps,
set firm boundaries, and demand evidence. Do that consistently and an AI agent is the highest-leverage
colleague I have worked with. Skip any one of those and it becomes the most expensive intern I have ever hired.
What I keep coming back to: vibe coding rewards taste, specifications, and
verification — not faster typing. Invest there, and the rest of the stack is just plumbing.
19 · References & Sources
Annotated bibliography behind the vibe-coding definition, autonomy spectrum, six-stage workflow, tooling landscape, VIBE framework, prompt patterns, context-engineering pyramid, verification ladder, risk table, anti-patterns, skills matrix, team operating model, DORA-style KPIs, adoption roadmap, and future-outlook cards. Section tags (e.g. §05) show where each source is used. Diagrams and the VIBE acronym are my synthesis unless noted.
Scope. Synthesis of practitioner writing, vendor documentation, and software-engineering research (May 2026). Hero KPI ranges (~55% AI-assisted code, 3–5× prototype speed, 2× defect rate without review, context engineering as #1 skill) blend GitHub, Stack Overflow, and field surveys — directional, not universal. Tool names in §06 reflect the 2025–26 landscape and will shift. Not tool endorsement, employment, or legal advice.
Citations are numbered continuously [1]–[n] within this section.
Origin, definition & the vibe-coding meme (§01–§02)
Karpathy, A., post introducing "vibe coding." X (Twitter), February 2025. Coined the phrase — "give in to the vibes, embrace exponentials, forget that the code even exists" — §01–§02 etymology and working definition. x.com/karpathy (search Feb 2025 vibe coding). — §01, §02.
Merriam-Webster, "vibe coding" Word of the Year coverage & dictionary entry. 2025. Mainstream adoption of the term within a year of coinage — §01 timeline sentence. merriam-webster.com — §01.
Wiener, A., "Vibe Coding and the Future of Software." The New Yorker, April 2025. Cultural and professional framing of agent-directed development — background for §02 distinction (autocomplete vs agentic loop). newyorker.com — §02.
GitHub, Octoverse 2024 & Copilot usage reports. 2024–25. AI-assisted coding adoption in surveyed teams — anchor for §01 ~55% hero stat (verify latest Octoverse/Copilot metrics). github.blog/octoverse — §01, §15.
Stack Overflow, 2025 Developer Survey — AI section. 2025. Developer tool usage, trust, and productivity self-reports — §01 adoption context and §15 wellbeing KPI. survey.stackoverflow.co — §01, §15.
McKinsey & Company, Unlocking Value from AI in Software Development (Digital practice insights). 2024–25. Prototype-cycle compression and review bottlenecks — §01 3–5× prototype KPI and §10 trust bottleneck. mckinsey.com/digital — §01, §10.
Peng, S. et al., "The Impact of AI on Developer Productivity: Evidence from GitHub Copilot." arXiv:2302.06590, 2023; follow-on controlled studies. Faster task completion with AI assistance; quality caveats when review skipped — §01 2× defect KPI and §12 vibe-and-ship anti-pattern. arxiv.org/abs/2302.06590 — §01, §12.
Yao, S. et al., "ReAct: Synergizing Reasoning and Acting in Language Models." ICLR 2023. Plan → act → observe loop underpinning §03 stages 4–5 and §05 Generate/Verify phases. arxiv.org/abs/2210.03629 — §03, §05.
Anthropic, Building Effective Agents guide. 2024–25. Agent design patterns: gather context, take action, verify — maps to §05 six-stage loop and §04 P3/P4 principles. docs.anthropic.com/agents — §04, §05.
Jimenez, C. E. et al., "SWE-bench: Can Language Models Resolve Real-World GitHub Issues?" ICLR 2024. Benchmark for autonomous coding agents — §03 stage 5 "Autonomous agent" and §17 eval-driven trend. arxiv.org/abs/2310.06770 — §03, §17.
Shinn, N. et al., "Reflexion: Language Agents with Verbal Reinforcement Learning." NeurIPS 2023. Iterative self-correction — background for §05 Verify stage and §12 prompt-golf anti-pattern. arxiv.org/abs/2303.11366 — §05, §12.
Yan, E., "Patterns for Building LLM-based Systems & Products." eugeneyan.com, 2023–25. Goal/context/constraints patterns and eval discipline — §08 prompt patterns and §04 P1 intent-over-syntax. eugeneyan.com/writing/llm-patterns — §04, §08.
GitHub, Spec Kit & spec-driven development guides. 2025. Structured specs before codegen — §08 Goal·Context·Constraints·Done template and §14 spec template. github.com/spec-kit — §08, §14.
Beck, K., Test-Driven Development: By Example. Addison-Wesley, 2002. Red-green-refactor as feedback loop — intellectual basis for §10 "tests are spec artefacts" and §04 P3 tight loops. — §04, §10.
Google, Software Engineering at Google (Winters, Manshreck, Wright). O'Reilly, 2020. Code review culture and readability — §07 Evidence pillar and §13 code-review skill bar. — §07, §13.
Liu, N. F. et al., "Lost in the Middle: How Language Models Use Long Contexts." TACL 2024. Attention limits in long contexts — §09 "context budget" curation rationale. arxiv.org/abs/2307.03172 — §09.
OWASP, Top 10 for Large Language Model Applications (2025 edition). Prompt injection, insecure output handling, supply-chain risks — §11 risk table and §10 guardrail checklist. owasp.org/llm-top10 — §10, §11.
Greshake, K. et al., "Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection." 2023. Untrusted content hijacking agents — §11 prompt-injection row. arxiv.org/abs/2302.12173 — §11.
NIST, Artificial Intelligence Risk Management Framework (AI RMF 1.0). 2023. Govern-map-measure-manage cycle for AI systems — §10 guardrails and §14 security-partner role. nist.gov/ai-rmf — §10, §14.
Google, SRE Book — monitoring, alerting, blameless postmortems. 2016–18. Production verification ladder top rung — §10 step 7 telemetry and §14 incident post-mortems. sre.google/sre-book — §10, §14.
Truong, L., companion note: AI Cost Control. May 2026. Token budgets, model tiering, kill-switches for agent runs — §11 cost-runaway row and §16 Days 61–90 cost tiering. Same author collection. — §11, §16.
DORA metrics, eval harnesses & team operating model (§14–§17)
Forsgren, N., Humble, J., & Kim, G., Accelerate: The Science of Lean Software and DevOps. IT Revolution, 2018. DORA metrics (lead time, deployment frequency, CFR, MTTR) — §15 metrics table and §16 baseline DORA step. — §15, §16.
DORA / Google Cloud, State of DevOps Report (annual). 2024–25. Benchmarking change failure rate and recovery — §15 quality rows. dora.dev — §15.
METR, Model Evaluation & Threat Research — agent task benchmarks. 2024–25. Measuring autonomous coding capability over time — §17 eval-driven development card. metr.org — §17.
Skelton, M. & Pais, M., Team Topologies. IT Revolution, 2019. Platform/enabling teams — §14 emerging roles (context owner, tool admin) as hats not headcount. — §14.
U.S. NTIA / CISA, software bill of materials (SBOM) guidance & minimum elements. 2021–25. Supply-chain transparency — §17 regulatory-pressure card. ntia.gov/SBOM — §17.
European Parliament & Council, Regulation (EU) 2024/1689 (AI Act). 2024. Disclosure and governance expectations for high-risk AI — §17 compliance trend (verify applicability to dev tooling in your jurisdiction). eur-lex.europa.eu — §17.
Stanford HAI, 2025 AI Index Report — Technical Performance chapter. 2025. Capability/cost curves for coding models — §17 local+edge models trend and §13 declining syntax memorisation. hai.stanford.edu/ai-index — §13, §17.
Author synthesis
Truong, L., Vibe Coding Strategy — personal working notes. May 2026. Original diagrams (FIG 1–6), VIBE framework, six-stage loop, skills matrix, anti-pattern cards, 30/60/90 roadmap, and team-OS practices. LinhTruong.com — all sections.
Before you quote externally: The ~55% AI-code statistic compresses multiple surveys with different definitions (Copilot suggestions accepted vs lines merged vs time assisted). Tool names and capabilities in §06 change quarterly. Karpathy's original post is informal coinage, not a technical standard. Re-verify adoption numbers and vendor claims against primary sources before citing in policy or procurement documents.